The player object and related subclasses
Env player
This module defines a player class exposing the OpenAI Gym API, along with utility functions.
- class poke_env.player.env_player.EnvPlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)
Bases:
OpenAIGymEnv[ObsType, ActType], ABC
Player exposing the OpenAI Gym Env API.
- action_space_size() → int
Returns the size of the action space. Given size x, the action space goes from 0 to x - 1.
- Returns:
The action space size.
- Return type:
int
- get_opponent() → Player | str | List[Player] | List[str]
Returns the opponent (or list of opponents) that will be challenged on the next iteration of the challenge loop. If a list is returned, one of its elements is chosen at random during the challenge loop.
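A minimal sketch of how a value returned by get_opponent() might be resolved into a single opponent. It assumes opponents are plain usernames. pick_opponent is a hypothetical helper for illustration and is not part of poke_env.

```python
import random
from typing import List, Union

# Hypothetical stand-in: opponents are plain usernames here rather than
# poke_env Player objects.
OpponentSpec = Union[str, List[str]]


def pick_opponent(opponent: OpponentSpec, rng: random.Random) -> str:
    """Resolve a get_opponent()-style value into the one opponent to challenge."""
    if isinstance(opponent, list):
        # A list means one element is drawn uniformly at random each iteration.
        return rng.choice(opponent)
    return opponent


rng = random.Random(0)
print(pick_opponent("alice", rng))  # alice
print(pick_opponent(["alice", "bob", "carol"], rng))
```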
- reset_env(opponent: Player | str | None = None, restart: bool = True)
Resets the environment to an inactive state: it will forfeit all unfinished battles, reset the internal battle tracker and optionally change the next opponent and restart the challenge loop.
- Parameters:
opponent (Player or str, optional) – The opponent to use for the next battles. If None, the current opponent is kept.
restart (bool) – If True the challenge loop will be restarted before returning, otherwise the challenge loop will be left inactive and can be started manually.
- reward_computing_helper(battle: AbstractBattle, *, fainted_value: float = 0.0, hp_value: float = 0.0, number_of_pokemons: int = 6, starting_value: float = 0.0, status_value: float = 0.0, victory_value: float = 1.0) → float
A helper function to compute rewards.
The reward is obtained by computing the value of the current game state and comparing it to the value of the last state.
State values are computed by weighting different factors: fainted pokemons, their remaining HP, inflicted statuses and winning are all taken into account.
For instance, if the last time this function was called for battle A it had a state value of 8 and this call leads to a value of 9, the returned reward will be 9 - 8 = 1.
Consider a single battle where each player has 6 pokemons. No opponent pokemon has fainted, but our team has one fainted pokemon. Three opposing pokemons are burned. We have one pokemon missing half of its HP, and our fainted pokemon has no HP left.
The value of this state will be:
- With fainted value: 1, status value: 0.5, hp value: 1:
= - 1 (fainted) + 3 * 0.5 (status) - 1.5 (our hp) = -1
- With fainted value: 3, status value: 0, hp value: 1:
= - 3 + 3 * 0 - 1.5 = -4.5
- Parameters:
battle (AbstractBattle) – The battle for which to compute rewards.
fainted_value (float) – The reward weight for fainted pokemons. Defaults to 0.
hp_value (float) – The reward weight for hp per pokemon. Defaults to 0.
number_of_pokemons (int) – The number of pokemons per team. Defaults to 6.
starting_value (float) – The default reference value evaluation. Defaults to 0.
status_value (float) – The reward value per non-fainted status. Defaults to 0.
victory_value (float) – The reward value for winning. Defaults to 1.
- Returns:
The reward.
- Return type:
float
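The state value described above can be sketched as follows. This is an illustration, not poke_env’s implementation. Mon, state_value and RewardTracker are hypothetical stand-ins. HP is counted as our remaining HP fractions minus the opponent’s, so two full-HP teams cancel out, which matches the “- 1.5 (our hp)” term in the example. number_of_pokemons is omitted because the sketch assumes every pokemon on both teams is already known. The tracker keys the last value by a battle tag string, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mon:
    """Hypothetical stand-in for a Pokemon, holding only what the value needs."""
    hp_fraction: float = 1.0
    fainted: bool = False
    status: Optional[str] = None


def state_value(team: List[Mon], opponent_team: List[Mon], *,
                fainted_value: float = 0.0, hp_value: float = 0.0,
                status_value: float = 0.0, victory_value: float = 1.0,
                won: bool = False, lost: bool = False) -> float:
    value = 0.0
    for mon in team:  # our side: remaining HP helps, faints and statuses hurt
        value += mon.hp_fraction * hp_value
        if mon.fainted:
            value -= fainted_value
        elif mon.status is not None:
            value -= status_value
    for mon in opponent_team:  # their side: the same terms with opposite signs
        value -= mon.hp_fraction * hp_value
        if mon.fainted:
            value += fainted_value
        elif mon.status is not None:
            value += status_value
    if won:
        value += victory_value
    elif lost:
        value -= victory_value
    return value


class RewardTracker:
    """Returns the change in state value since the previous call for a battle."""

    def __init__(self, starting_value: float = 0.0):
        self.starting_value = starting_value
        self.last_values: Dict[str, float] = {}

    def reward(self, battle_tag: str, value: float) -> float:
        previous = self.last_values.get(battle_tag, self.starting_value)
        self.last_values[battle_tag] = value
        return value - previous


# The worked example above: one of our pokemons fainted, another is at half HP,
# and three opposing pokemons are burned.
team = [Mon(hp_fraction=0.0, fainted=True), Mon(hp_fraction=0.5)] + [Mon() for _ in range(4)]
opponent_team = [Mon(status="brn") for _ in range(3)] + [Mon() for _ in range(3)]

print(state_value(team, opponent_team, fainted_value=1, hp_value=1, status_value=0.5))  # -1.0
print(state_value(team, opponent_team, fainted_value=3, hp_value=1, status_value=0))    # -4.5

tracker = RewardTracker()
tracker.reward("battle-A", 8.0)
print(tracker.reward("battle-A", 9.0))  # 1.0
```

The two state values reproduce the worked example. The last call reproduces the 9 - 8 = 1 reward from the description.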
- class poke_env.player.env_player.Gen4EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)
Bases:
EnvPlayer[ObsType, ActType], ABC
- action_to_move(action: int, battle: AbstractBattle) → BattleOrder
Converts actions to move orders.
The conversion is done as follows:
- action = -1:
The battle will be forfeited.
- 0 <= action < 4:
The action-th available move in battle.available_moves is executed.
- 4 <= action < 10:
The (action - 4)-th available switch in battle.available_switches is executed.
If the proposed action is illegal, a random legal move is performed.
- Parameters:
action (int) – The action to convert.
battle (Battle) – The battle in which to act.
- Returns:
The order to send to the server.
- Return type:
BattleOrder
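The Gen 4 mapping above, including the fallback for illegal actions, can be sketched without a server connection. In this sketch, orders are (kind, name) tuples, and moves and switches are plain names. gen4_action_to_order is hypothetical. Returning ("default", "") when nothing is legal is an assumption of the sketch, not documented behavior.

```python
import random
from typing import List, Tuple

# Hypothetical stand-ins: an order is a (kind, name) pair, and moves and
# switches are plain names rather than poke_env Move / Pokemon objects.
Order = Tuple[str, str]


def gen4_action_to_order(action: int, available_moves: List[str],
                         available_switches: List[str],
                         rng: random.Random) -> Order:
    if action == -1:
        return ("forfeit", "")
    if 0 <= action < 4 and action < len(available_moves):
        return ("move", available_moves[action])
    if 4 <= action < 10 and action - 4 < len(available_switches):
        return ("switch", available_switches[action - 4])
    # Illegal action (e.g. slot 3 when only two moves are available):
    # fall back to a uniformly random legal order.
    legal: List[Order] = [("move", m) for m in available_moves]
    legal += [("switch", s) for s in available_switches]
    if not legal:
        return ("default", "")  # assumption: defer to the server's default choice
    return rng.choice(legal)


rng = random.Random(0)
moves = ["thunderbolt", "surf"]
switches = ["garchomp", "blissey"]
print(gen4_action_to_order(1, moves, switches, rng))  # ('move', 'surf')
print(gen4_action_to_order(5, moves, switches, rng))  # ('switch', 'blissey')
print(gen4_action_to_order(3, moves, switches, rng))  # a random legal order
```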
- class poke_env.player.env_player.Gen5EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)
Bases:
Gen4EnvSinglePlayer[ObsType, ActType], ABC
- class poke_env.player.env_player.Gen6EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)
Bases:
EnvPlayer[ObsType, ActType], ABC
- action_to_move(action: int, battle: AbstractBattle) → BattleOrder
Converts actions to move orders.
The conversion is done as follows:
- action = -1:
The battle will be forfeited.
- 0 <= action < 4:
The action-th available move in battle.available_moves is executed.
- 4 <= action < 8:
The (action - 4)-th available move in battle.available_moves is executed, with mega-evolution.
- 8 <= action < 14:
The (action - 8)-th available switch in battle.available_switches is executed.
If the proposed action is illegal, a random legal move is performed.
- Parameters:
action (int) – The action to convert.
battle (Battle) – The battle in which to act.
- Returns:
The order to send to the server.
- Return type:
BattleOrder
- class poke_env.player.env_player.Gen7EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)
Bases:
EnvPlayer[ObsType, ActType], ABC
- action_to_move(action: int, battle: AbstractBattle) → BattleOrder
Converts actions to move orders.
The conversion is done as follows:
- action = -1:
The battle will be forfeited.
- 0 <= action < 4:
The action-th available move in battle.available_moves is executed.
- 4 <= action < 8:
The (action - 4)-th available move in battle.available_moves is executed, with z-move.
- 8 <= action < 12:
The (action - 8)-th available move in battle.available_moves is executed, with mega-evolution.
- 12 <= action < 18:
The (action - 12)-th available switch in battle.available_switches is executed.
If the proposed action is illegal, a random legal move is performed.
- Parameters:
action (int) – The action to convert.
battle (Battle) – The battle in which to act.
- Returns:
The order to send to the server.
- Return type:
BattleOrder
- class poke_env.player.env_player.Gen8EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)
Bases:
EnvPlayer[ObsType, ActType], ABC
- action_to_move(action: int, battle: AbstractBattle) → BattleOrder
Converts actions to move orders.
The conversion is done as follows:
- action = -1:
The battle will be forfeited.
- 0 <= action < 4:
The action-th available move in battle.available_moves is executed.
- 4 <= action < 8:
The (action - 4)-th available move in battle.available_moves is executed, with z-move.
- 8 <= action < 12:
The (action - 8)-th available move in battle.available_moves is executed, with mega-evolution.
- 12 <= action < 16:
The (action - 12)-th available move in battle.available_moves is executed, while dynamaxing.
- 16 <= action < 22:
The (action - 16)-th available switch in battle.available_switches is executed.
If the proposed action is illegal, a random legal move is performed.
- Parameters:
action (int) – The action to convert.
battle (Battle) – The battle in which to act.
- Returns:
The order to send to the server.
- Return type:
BattleOrder
- class poke_env.player.env_player.Gen9EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)
Bases:
EnvPlayer[ObsType, ActType], ABC
- action_to_move(action: int, battle: AbstractBattle) → BattleOrder
Converts actions to move orders.
The conversion is done as follows:
- action = -1:
The battle will be forfeited.
- 0 <= action < 4:
The action-th available move in battle.available_moves is executed.
- 4 <= action < 8:
The (action - 4)-th available move in battle.available_moves is executed, with z-move.
- 8 <= action < 12:
The (action - 8)-th available move in battle.available_moves is executed, with mega-evolution.
- 12 <= action < 16:
The (action - 12)-th available move in battle.available_moves is executed, while dynamaxing.
- 16 <= action < 20:
The (action - 16)-th available move in battle.available_moves is executed, while terastallizing.
- 20 <= action < 26:
The (action - 20)-th available switch in battle.available_switches is executed.
If the proposed action is illegal, a random legal move is performed.
- Parameters:
action (int) – The action to convert.
battle (Battle) – The battle in which to act.
- Returns:
The order to send to the server.
- Return type:
BattleOrder
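The single-player action layouts above differ only in which move variants come before the switches, so they can be summarized as a table of segments. The sketch below is an illustration with hypothetical names (LAYOUTS, layout_size, decode_action). decode_action only maps an index to a segment and slot. It does not check legality, for example whether fewer than four moves are available or whether mega-evolution is possible. Those are the cases where the random-legal-move fallback applies.

```python
from typing import Dict, List, Optional, Tuple

# (variant, number of slots) per generation, transcribed from the
# action_to_move descriptions above. Move-variant segments index into
# battle.available_moves; the final "switch" segment indexes into
# battle.available_switches.
LAYOUTS: Dict[int, List[Tuple[str, int]]] = {
    4: [("move", 4), ("switch", 6)],
    6: [("move", 4), ("mega", 4), ("switch", 6)],
    7: [("move", 4), ("z_move", 4), ("mega", 4), ("switch", 6)],
    8: [("move", 4), ("z_move", 4), ("mega", 4), ("dynamax", 4), ("switch", 6)],
    9: [("move", 4), ("z_move", 4), ("mega", 4), ("dynamax", 4),
        ("terastallize", 4), ("switch", 6)],
}
LAYOUTS[5] = LAYOUTS[4]  # Gen5EnvSinglePlayer derives from Gen4EnvSinglePlayer


def layout_size(gen: int) -> int:
    """Total number of non-negative actions for a generation."""
    return sum(size for _, size in LAYOUTS[gen])


def decode_action(gen: int, action: int) -> Optional[Tuple[str, int]]:
    """Map an action to (variant, slot index); None if it is out of range."""
    if action == -1:
        return ("forfeit", -1)
    start = 0
    for variant, size in LAYOUTS[gen]:
        if start <= action < start + size:
            return (variant, action - start)
        start += size
    return None


for gen in (4, 5, 6, 7, 8, 9):
    print(gen, layout_size(gen))  # sizes in order: 10, 10, 14, 18, 22, 26
print(decode_action(6, 5))   # ('mega', 1)
print(decode_action(9, 17))  # ('terastallize', 1)
```

The computed sizes match the ranges listed above. For example, Gen9EnvSinglePlayer has 26 actions, and its last switch slot is action 25.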
Player
This module defines a base class for players.
- class poke_env.player.player.Player(account_configuration: AccountConfiguration | None = None, *, avatar: str | None = None, battle_format: str = 'gen9randombattle', log_level: int | None = None, max_concurrent_battles: int = 1, accept_open_team_sheet: bool = False, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_timer_on_battle_start: bool = False, start_listening: bool = True, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None)
Bases:
ABC
Base class for players.
- DEFAULT_CHOICE_CHANCE = 0.001
- MESSAGES_TO_IGNORE = {'', 'expire', 't:', 'uhtmlchange'}
- async accept_challenges(opponent: str | List[str] | None, n_challenges: int, packed_team: str | None = None)
Let the player wait for challenges from opponent, and accept them.
If opponent is None, every challenge will be accepted. If opponent is a string, all challenges from the player with that name will be accepted. If opponent is a list, all challenges originating from players whose name is in the list will be accepted.
Up to n_challenges challenges will be accepted, after which the function waits for these battles to finish and then returns.
- Parameters:
opponent (None, str or list of str) – Players from which challenges will be accepted.
n_challenges (int) – Number of challenges that will be accepted.
packed_team (str, optional) – Team to use. Defaults to generating a team with the agent’s teambuilder.
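The opponent filter described for accept_challenges can be expressed as a small predicate. accepts_challenge_from is a hypothetical helper. It compares names exactly as given, whereas the library may normalize usernames before comparing them.

```python
from typing import List, Optional, Union


def accepts_challenge_from(opponent: Optional[Union[str, List[str]]],
                           challenger: str) -> bool:
    """Apply the opponent filter described for accept_challenges."""
    if opponent is None:
        return True                    # None: accept every challenge
    if isinstance(opponent, str):
        return challenger == opponent  # str: only that player
    return challenger in opponent      # list: any player named in it


print(accepts_challenge_from(None, "alice"))              # True
print(accepts_challenge_from("bob", "alice"))             # False
print(accepts_challenge_from(["alice", "bob"], "alice"))  # True
```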
- property accept_open_team_sheet: bool
- async battle_against(opponent: Player, n_battles: int = 1)
Make the player play n_battles against opponent.
This function is a wrapper around send_challenges and accept_challenges.
- Parameters:
opponent (Player) – The opponent to play against.
n_battles (int) – The number of games to play. Defaults to 1.
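The following self-contained asyncio sketch shows one way such a wrapper can run both halves concurrently. The challenger waits on an event (playing the role of to_wait) until the opponent signals that it is ready. The three coroutines are simplified stand-ins that only record messages. They are not poke_env’s implementations, and no server is contacted.

```python
import asyncio
from typing import List


async def send_challenges(log: List[str], opponent: str, n_challenges: int,
                          to_wait: asyncio.Event) -> None:
    await to_wait.wait()  # do not challenge before the opponent is ready
    for i in range(n_challenges):
        log.append(f"sent challenge {i + 1} to {opponent}")
        await asyncio.sleep(0)  # yield control, as real network I/O would


async def accept_challenges(log: List[str], n_challenges: int,
                            ready: asyncio.Event) -> None:
    ready.set()  # e.g. "logged in": the challenger may start now
    for i in range(n_challenges):
        await asyncio.sleep(0)
        log.append(f"accepted challenge {i + 1}")


async def battle_against(n_battles: int = 1) -> List[str]:
    log: List[str] = []
    opponent_ready = asyncio.Event()
    # Both sides run concurrently; gather returns once both have finished.
    await asyncio.gather(
        send_challenges(log, "opponent", n_battles, to_wait=opponent_ready),
        accept_challenges(log, n_battles, ready=opponent_ready),
    )
    return log


print(asyncio.run(battle_against(2)))
```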
- property battles: Dict[str, AbstractBattle]
- choose_default_move() → DefaultBattleOrder
Returns Showdown’s default move order.
This order results in the first legal order, according to Showdown’s ordering, being chosen.
- abstract choose_move(battle: AbstractBattle) → BattleOrder | Awaitable[BattleOrder]
Abstract method to choose a move in a battle.
- Parameters:
battle (AbstractBattle) – The battle.
- Returns:
The move order.
- Return type:
BattleOrder or Awaitable[BattleOrder]
- choose_random_doubles_move(battle: DoubleBattle) → BattleOrder
- choose_random_move(battle: AbstractBattle) → BattleOrder
Returns a random legal move from battle.
- Parameters:
battle (AbstractBattle) – The battle in which to move.
- Returns:
Move order
- Return type:
BattleOrder
- static create_order(order: Move | Pokemon, mega: bool = False, z_move: bool = False, dynamax: bool = False, terastallize: bool = False, move_target: int = 0) → BattleOrder
Formats a move order corresponding to the provided pokemon or move.
- Parameters:
order (Move or Pokemon) – Move to make or Pokemon to switch to.
mega (bool) – Whether to mega evolve the pokemon, if a move is chosen.
z_move (bool) – Whether to make a zmove, if a move is chosen.
dynamax (bool) – Whether to dynamax, if a move is chosen.
terastallize (bool) – Whether to terastallize, if a move is chosen.
move_target (int) – Target Pokemon slot of a given move
- Returns:
Formatted move order
- Return type:
BattleOrder
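For intuition about what a formatted order looks like, the sketch below follows Pokémon Showdown’s “/choose” command syntax: a move id, an optional gimmick keyword (mega, zmove, dynamax or terastallize), then an optional target slot. This reference does not show the exact string poke_env emits, so the helpers are hypothetical illustrations. Treating move_target == 0 as “no explicit target” is an assumption based on create_order’s default.

```python
def format_move_order(move_id: str, *, mega: bool = False, z_move: bool = False,
                      dynamax: bool = False, terastallize: bool = False,
                      move_target: int = 0) -> str:
    """Hypothetical helper building a Showdown-style '/choose move' command."""
    message = f"/choose move {move_id}"
    # Only one gimmick keyword is appended; the first flag that is set wins.
    if mega:
        message += " mega"
    elif z_move:
        message += " zmove"
    elif dynamax:
        message += " dynamax"
    elif terastallize:
        message += " terastallize"
    # Assumption: 0 means "no explicit target" (the create_order default).
    if move_target != 0:
        message += f" {move_target}"
    return message


def format_switch_order(species: str) -> str:
    """Hypothetical helper building a '/choose switch' command."""
    return f"/choose switch {species}"


print(format_move_order("thunderbolt"))               # /choose move thunderbolt
print(format_move_order("earthquake", dynamax=True))  # /choose move earthquake dynamax
print(format_move_order("tackle", move_target=1))     # /choose move tackle 1
print(format_switch_order("pikachu"))                 # /choose switch pikachu
```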
- property format: str
- property format_is_doubles: bool
- async ladder(n_games: int)
Make the player play games on the ladder.
n_games defines how many battles will be played.
- Parameters:
n_games (int) – Number of battles that will be played
- property logger: Logger
- property n_finished_battles: int
- property n_lost_battles: int
- property n_tied_battles: int
- property n_won_battles: int
- property next_team: str | None
- random_teampreview(battle: AbstractBattle) → str
Returns a random valid teampreview order for the given battle.
- Parameters:
battle (AbstractBattle) – The battle.
- Returns:
The random teampreview order.
- Return type:
str
- reset_battles()
Resets the player’s inner battle tracker.
- async send_challenges(opponent: str, n_challenges: int, to_wait: Event | None = None)
Make the player send challenges to opponent.
opponent must be a string, corresponding to the name of the player to challenge.
n_challenges defines how many challenges will be sent.
to_wait is an optional event; if provided, the player waits for it to be set before launching challenges.
- Parameters:
opponent (str) – Player username to challenge.
n_challenges (int) – Number of battles that will be started
to_wait (Event, optional) – Optional event to wait for before launching challenges.
- teampreview(battle: AbstractBattle) → str
Returns a teampreview order for the given battle.
This order must be of the form /team TEAM, where TEAM is a string defining the team chosen by the player. Several formats are supported; for instance, ‘3461’ and ‘3, 4, 6, 1’ are both valid. In single battles they mean leading with pokemon 3, with pokemons 4, 6 and 1 in the back. In double battles they mean leading with pokemons 3 and 4, with pokemons 6 and 1 in the back.
Please refer to Pokemon Showdown’s protocol documentation for more information.
- Parameters:
battle (AbstractBattle) – The battle.
- Returns:
The teampreview order.
- Return type:
str
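A minimal sketch of building a teampreview order in the compact form quoted above, plus a random variant in the spirit of random_teampreview. Both helpers are hypothetical. The random one simply shuffles slots 1..team_size, which is an assumption about how a random valid order can be produced.

```python
import random
from typing import List


def teampreview_order(slots: List[int]) -> str:
    """Format 1-based team slots in the compact '3461' style."""
    # Concatenating digits assumes single-digit slots, which holds for 6-pokemon teams.
    return "/team " + "".join(str(slot) for slot in slots)


def random_teampreview_order(team_size: int, rng: random.Random) -> str:
    """Shuffle the slots 1..team_size and format them as a teampreview order."""
    slots = list(range(1, team_size + 1))
    rng.shuffle(slots)
    return teampreview_order(slots)


print(teampreview_order([3, 4, 6, 1]))                # /team 3461
print(random_teampreview_order(6, random.Random(0)))  # a random permutation of 1..6
```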
- update_team(team: Teambuilder | str)
Updates the team used by the player.
- Parameters:
team (str or Teambuilder) – The new team to use.
- property username: str
- property win_rate: float
OpenAIGymEnv
This module defines a player class with the OpenAI API on the main thread. For a black-box implementation consider using the module env_player.
- class poke_env.player.openai_api.OpenAIGymEnv(account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str = 'gen8randombattle', log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = ('localhost:8000', 'https://play.pokemonshowdown.com/action.php?'), start_timer_on_battle_start: bool = False, start_listening: bool = True, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = False)
Bases:
Env[ObsType, ActType], ABC
Base class implementing the OpenAI Gym API on the main thread.
- abstract action_space_size() → int
Returns the size of the action space. Given size x, the action space goes from 0 to x - 1.
- Returns:
The action space size.
- Return type:
int
- abstract action_to_move(action: int, battle: AbstractBattle) → BattleOrder
Returns the BattleOrder relative to the given action.
- Parameters:
action (int) – The action to take.
battle (AbstractBattle) – The current battle state
- Returns:
The battle order for the given action in context of the current battle.
- Return type:
BattleOrder
- background_accept_challenge(username: str)
Accepts a single challenge from the specified player. The function returns immediately to allow use of the OpenAI gym API.
- Parameters:
username (str) – The username of the player whose challenge will be accepted.
- background_send_challenge(username: str)
Sends a single challenge to the specified player. The function returns immediately to allow use of the OpenAI gym API.
- Parameters:
username (str) – The username of the player to challenge.
- property battles: Dict[str, AbstractBattle]
- abstract calc_reward(last_battle: AbstractBattle, current_battle: AbstractBattle) → float
Returns the reward for the current battle state. The battle state in the previous turn is given as well and can be used for comparisons.
- Parameters:
last_battle (AbstractBattle) – The battle state in the previous turn.
current_battle (AbstractBattle) – The current battle state.
- Returns:
The reward for current_battle.
- Return type:
float
- close(purge: bool = True)
After the user has finished using the environment, close contains the code necessary to “clean up” the environment.
This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.
- abstract describe_embedding() → Space[ObsType]
Returns the description of the embedding. It must return a Space specifying low bounds and high bounds.
- Returns:
The description of the embedding.
- Return type:
Space
- done(timeout: int | None = None) → bool
Returns True if the task is done or becomes done within the timeout, False otherwise.
- Parameters:
timeout (int, optional) – The amount of time to wait if the task is not already done. If None, it waits until the task is done.
- Returns:
True if the task is done or if the task gets completed after the timeout.
- Return type:
bool
- abstract embed_battle(battle: AbstractBattle) → ObsType
Returns the embedding of the current battle state in a format compatible with the OpenAI gym API.
- Parameters:
battle (AbstractBattle) – The current battle state.
- Returns:
The embedding of the current battle state.
- property format: str
- property format_is_doubles: bool
- get_additional_info() → Dict[str, Any]
Returns additional info for the reset method. Override only if you really need it.
- Returns:
Additional information as a Dict
- Return type:
Dict
- abstract get_opponent() → Player | str | List[Player] | List[str]
Returns the opponent (or list of opponents) that will be challenged on the next iteration of the challenge loop. If a list is returned, one of its elements is chosen at random during the challenge loop.
- property logged_in: Event
Event object associated with user login.
- Returns:
The logged-in event
- Return type:
Event
- property logger: Logger
Logger associated with the player.
- Returns:
The logger.
- Return type:
Logger
- property n_finished_battles: int
- property n_lost_battles: int
- property n_tied_battles: int
- property n_won_battles: int
- render(mode: str = 'human')
Compute the render frames as specified by render_mode during the initialization of the environment.
The environment’s metadata render modes (env.metadata["render_modes"]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are available through gymnasium.make, which automatically applies a wrapper to collect rendered frames.
- Note:
As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.
By convention, if the render_mode is:
- None (default): no render is computed.
- "human": The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step(), and render() doesn’t need to be called. Returns None.
- "rgb_array": Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
- "ansi": Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
- "rgb_array_list" and "ansi_list": List-based versions of render modes are possible (except "human") through the wrapper gymnasium.wrappers.RenderCollection, which is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The collected frames are popped after render() or reset() is called.
- Note:
Make sure that your class’s metadata "render_modes" key includes the list of supported modes.
Changed in version 0.25.0: The render function was changed to no longer accept parameters; rather, these parameters should be specified when the environment is initialised, i.e., gymnasium.make("CartPole-v1", render_mode="human").
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) → Tuple[ObsType, Dict[str, Any]]
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. Otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset.
Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.
For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.
- Args:
- seed (optional int): The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again.
- options (optional dict): Additional information to specify how the environment is reset (optional, depending on the specific environment).
- Returns:
- observation (ObsType): Observation of the initial state. This will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step().
- info (dictionary): This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().
- reset_battles()
Resets the player’s inner battle tracker.
- start_challenging(n_challenges: int | None = None, callback: Callable[[AbstractBattle], None] | None = None)
Starts the challenge loop.
- Parameters:
n_challenges (int, optional) – The number of challenges to send. If None, it runs until stopped.
callback (Callable[[AbstractBattle], None], optional) – A function called after each challenge with a copy of the final battle state.
- start_laddering(n_challenges: int | None = None, callback: Callable[[AbstractBattle], None] | None = None)
Starts the laddering loop.
- Parameters:
n_challenges (int, optional) – The number of ladder games to play. If None, it runs until stopped.
callback (Callable[[AbstractBattle], None], optional) – A function called after each ladder game with a copy of the final battle state.
- step(action: ActType) → Tuple[ObsType, float, bool, bool, Dict[str, Any]]
Execute the specified action in the environment.
- Parameters:
action (ActType) – The action to be executed.
- Returns:
A tuple containing the new observation, reward, termination flag, truncation flag, and info dictionary.
- Return type:
Tuple[ObsType, float, bool, bool, Dict[str, Any]]
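The five-tuple returned by step() is normally consumed in the standard Gymnasium episode loop sketched below. ToyEnv is a hypothetical stand-in with the same reset()/step() signatures, so the loop runs without a Showdown server. Its reward and termination rules are arbitrary. Following the reset() notes above, the sketch seeds only the first reset.

```python
import random
from typing import Any, Dict, Optional, Tuple


class ToyEnv:
    """Hypothetical stand-in exposing the reset()/step() signatures documented above."""

    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.turn = 0
        self.rng = random.Random()

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None) -> Tuple[int, Dict[str, Any]]:
        if seed is not None:
            self.rng.seed(seed)  # only reseed when a seed is given
        self.turn = 0
        return self.turn, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.turn += 1
        reward = 1.0 if action == 0 else 0.0
        terminated = self.rng.random() < 0.2     # stands in for "the battle ended"
        truncated = self.turn >= self.max_turns  # stands in for a turn limit
        return self.turn, reward, terminated, truncated, {}


def run_episode(env: ToyEnv, seed: Optional[int] = None) -> float:
    obs, info = env.reset(seed=seed)
    total = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        action = 0  # a trained agent would choose from the action space here
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
    return total


env = ToyEnv()
print(run_episode(env, seed=42))  # seed once, right after creation
print(run_episode(env))           # later episodes keep using the same RNG
```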
- property username: str
The player’s username.
- Returns:
The player’s username.
- Return type:
str
- property websocket_url: str
The websocket url.
It is derived from the server url.
- Returns:
The websocket url.
- Return type:
str
- property win_rate: float
Random Player
This module defines a random player baseline.
- class poke_env.player.random_player.RandomPlayer(account_configuration: AccountConfiguration | None = None, *, avatar: str | None = None, battle_format: str = 'gen9randombattle', log_level: int | None = None, max_concurrent_battles: int = 1, accept_open_team_sheet: bool = False, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_timer_on_battle_start: bool = False, start_listening: bool = True, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None)
Bases:
Player
- choose_move(battle: AbstractBattle) → BattleOrder
Chooses a random legal move in the battle.
- Parameters:
battle (AbstractBattle) – The battle.
- Returns:
The move order.
- Return type:
BattleOrder