The player object and related subclasses

Env player 

This module defines a player class exposing the OpenAI Gym API with utility functions.

Bases: OpenAIGymEnv[ObsType, ActType], ABC

Player exposing the OpenAI Gym Env API.

action_space_size() → int

Returns the size of the action space. Given size x, the action space goes from 0 to x - 1.

Returns:: The action space size.
Return type:: int

get_opponent() → Player | str | List[Player] | List[str]

Returns the opponent (or list of opponents) that will be challenged on the next iteration of the challenge loop. If a list is returned, one element will be chosen at random during the challenge loop.

Returns:: The opponent (or list of opponents).
Return type:: Player or str or list(Player) or list(str)
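
The list behaviour described above can be sketched as follows. This is a minimal illustration, not the library's code: pick_next_opponent is a hypothetical helper, and plain strings stand in for Player objects so that the snippet runs without poke-env installed.

```python
import random
from typing import List, Optional, Union

# `Opponent` stands in for "Player or str"; only strings are used in this sketch.
Opponent = str


def pick_next_opponent(
    opponent: Union[Opponent, List[Opponent]],
    rng: Optional[random.Random] = None,
) -> Opponent:
    """Resolve a get_opponent()-style return value to a single opponent."""
    rng = rng or random.Random()
    if isinstance(opponent, list):
        # A list means "any of these": draw one element uniformly at random.
        return rng.choice(opponent)
    # A single opponent is used as-is.
    return opponent
```

Passing a seeded random.Random makes the draw reproducible, which can help when debugging a training run.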

reset_env(opponent: Player | str | None = None, restart: bool = True)

Resets the environment to an inactive state: it will forfeit all unfinished battles, reset the internal battle tracker and optionally change the next opponent and restart the challenge loop.

Parameters:

opponent (Player or str, optional) – The opponent to use for the next battles. If empty it will not change opponent.
restart (bool) – If True the challenge loop will be restarted before returning, otherwise the challenge loop will be left inactive and can be started manually.

reward_computing_helper(battle: AbstractBattle, *, fainted_value: float = 0.0, hp_value: float = 0.0, number_of_pokemons: int = 6, starting_value: float = 0.0, status_value: float = 0.0, victory_value: float = 1.0) → float

A helper function to compute rewards.

The reward is obtained by computing the value of the current game state and comparing it to the value of the last state.

State values are computed by weighting different factors: fainted pokemons, their remaining HP, inflicted statuses and winning are taken into account.

For instance, if the last time this function was called for battle A it had a state value of 8 and this call leads to a value of 9, the returned reward will be 9 - 8 = 1.

Consider a single battle where each player has 6 pokemons. No opponent pokemon has fainted, but our team has one fainted pokemon. Three opposing pokemons are burned. We have one pokemon missing half of its HP, and our fainted pokemon has no HP left.

The value of this state will be:

With fainted value: 1, status value: 0.5, hp value: 1:
= - 1 (fainted) + 3 * 0.5 (status) - 1.5 (our hp) = -1
With fainted value: 3, status value: 0, hp value: 1:
= - 3 + 3 * 0 - 1.5 = -4.5

Parameters:

battle (AbstractBattle) – The battle for which to compute rewards.
fainted_value (float) – The reward weight for fainted pokemons. Defaults to 0.
hp_value (float) – The reward weight for hp per pokemon. Defaults to 0.
number_of_pokemons (int) – The number of pokemons per team. Defaults to 6.
starting_value (float) – The default reference value evaluation. Defaults to 0.
status_value (float) – The reward value per non-fainted status. Defaults to 0.
victory_value (float) – The reward value for winning. Defaults to 1.

Returns:

The reward.

Return type:

float
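
The worked example above can be reproduced with a small standalone sketch. It is not the library's implementation: Mon and state_value are hypothetical names, and the HP term is assumed to be the sum of our HP fractions minus the sum of the opposing HP fractions, which is one reading consistent with the "- 1.5 (our hp)" term (4.5 for our team against 6 for the opponent). The victory term and starting_value are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mon:
    """Hypothetical stand-in for a pokemon: only what the example needs."""
    hp_fraction: float = 1.0
    fainted: bool = False
    has_status: bool = False


def state_value(
    team: List[Mon],
    opponent_team: List[Mon],
    *,
    fainted_value: float = 0.0,
    hp_value: float = 0.0,
    status_value: float = 0.0,
) -> float:
    """Weighted state value; positive is good for `team`."""
    value = 0.0
    for mon in team:
        value += mon.hp_fraction * hp_value
        if mon.fainted:
            value -= fainted_value
        elif mon.has_status:  # statuses only count on non-fainted pokemons
            value -= status_value
    for mon in opponent_team:
        value -= mon.hp_fraction * hp_value
        if mon.fainted:
            value += fainted_value
        elif mon.has_status:
            value += status_value
    return value


# Our team: four healthy, one at half HP, one fainted.
ours = [Mon() for _ in range(4)] + [Mon(hp_fraction=0.5), Mon(hp_fraction=0.0, fainted=True)]
# Opposing team: three burned, three healthy, none fainted.
theirs = [Mon(has_status=True) for _ in range(3)] + [Mon() for _ in range(3)]

first = state_value(ours, theirs, fainted_value=1, status_value=0.5, hp_value=1)   # -1.0
second = state_value(ours, theirs, fainted_value=3, status_value=0, hp_value=1)    # -4.5
```

The reward for a step would then be the difference between two successive state values, as in the 9 - 8 = 1 example.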

set_opponent(opponent: Player | str)

Sets the next opponent to the specified opponent.

Parameters:: opponent (Player or str) – The next opponent to challenge

class poke_env.player.env_player.Gen4EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)

Bases: EnvPlayer[ObsType, ActType], ABC

action_to_move(action: int, battle: AbstractBattle) → BattleOrder

Converts actions to move orders.

The conversion is done as follows:

action = -1:: The battle will be forfeited.
0 <= action < 4:: The actionth available move in battle.available_moves is executed.
4 <= action < 10: The action - 4th available switch in battle.available_switches is executed.

If the proposed action is illegal, a random legal move is performed.

Parameters:

action (int) – The action to convert.
battle (Battle) – The battle in which to act.

Returns:

the order to send to the server.

Return type:

str
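
The Gen 4 mapping above can be sketched as a pure index decoder. decode_gen4_action is a hypothetical helper written for illustration; it only classifies the integer and does not check legality against a battle, so the random fallback for illegal actions is not modelled.

```python
from typing import Tuple


def decode_gen4_action(action: int) -> Tuple[str, int]:
    """Map an action index to (kind, index) following the documented ranges."""
    if action == -1:
        return ("forfeit", -1)
    if 0 <= action < 4:
        # Index into battle.available_moves
        return ("move", action)
    if 4 <= action < 10:
        # Index into battle.available_switches
        return ("switch", action - 4)
    raise ValueError(f"action {action} is outside the documented Gen 4 range")
```

For example, action 5 decodes to the switch at index 1 of battle.available_switches.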

class poke_env.player.env_player.Gen5EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)

Bases: Gen4EnvSinglePlayer[ObsType, ActType], ABC

class poke_env.player.env_player.Gen6EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)

Bases: EnvPlayer[ObsType, ActType], ABC

action_to_move(action: int, battle: AbstractBattle) → BattleOrder

Converts actions to move orders.

The conversion is done as follows:

action = -1:: The battle will be forfeited.
0 <= action < 4:: The actionth available move in battle.available_moves is executed.
4 <= action < 8:: The action - 4th available move in battle.available_moves is executed, with mega-evolution.
8 <= action < 14: The action - 8th available switch in battle.available_switches is executed.

If the proposed action is illegal, a random legal move is performed.

Parameters:

action (int) – The action to convert.
battle (Battle) – The battle in which to act.

Returns:

the order to send to the server.

Return type:

str

class poke_env.player.env_player.Gen7EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)

Bases: EnvPlayer[ObsType, ActType], ABC

action_to_move(action: int, battle: AbstractBattle) → BattleOrder

Converts actions to move orders.

The conversion is done as follows:

action = -1:: The battle will be forfeited.
0 <= action < 4:: The actionth available move in battle.available_moves is executed.
4 <= action < 8:: The action - 4th available move in battle.available_moves is executed, with z-move.
8 <= action < 12:: The action - 8th available move in battle.available_moves is executed, with mega-evolution.
12 <= action < 18: The action - 12th available switch in battle.available_switches is executed.

If the proposed action is illegal, a random legal move is performed.

Parameters:

action (int) – The action to convert.
battle (Battle) – The battle in which to act.

Returns:

the order to send to the server.

Return type:

str

class poke_env.player.env_player.Gen8EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)

Bases: EnvPlayer[ObsType, ActType], ABC

action_to_move(action: int, battle: AbstractBattle) → BattleOrder

Converts actions to move orders.

The conversion is done as follows:

action = -1:: The battle will be forfeited.
0 <= action < 4:: The actionth available move in battle.available_moves is executed.
4 <= action < 8:: The action - 4th available move in battle.available_moves is executed, with z-move.
8 <= action < 12:: The action - 8th available move in battle.available_moves is executed, with mega-evolution.
12 <= action < 16:: The action - 12th available move in battle.available_moves is executed, while dynamaxing.
16 <= action < 22: The action - 16th available switch in battle.available_switches is executed.

If the proposed action is illegal, a random legal move is performed.

Parameters:

action (int) – The action to convert.
battle (Battle) – The battle in which to act.

Returns:

the order to send to the server.

Return type:

str

class poke_env.player.env_player.Gen9EnvSinglePlayer(opponent: Player | str | None, account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str | None = None, log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_listening: bool = True, start_timer_on_battle_start: bool = False, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = True)

Bases: EnvPlayer[ObsType, ActType], ABC

action_to_move(action: int, battle: AbstractBattle) → BattleOrder

Converts actions to move orders.

The conversion is done as follows:

action = -1:: The battle will be forfeited.
0 <= action < 4:: The actionth available move in battle.available_moves is executed.
4 <= action < 8:: The action - 4th available move in battle.available_moves is executed, with z-move.
8 <= action < 12:: The action - 8th available move in battle.available_moves is executed, with mega-evolution.
12 <= action < 16:: The action - 12th available move in battle.available_moves is executed, while dynamaxing.
16 <= action < 20:: The action - 16th available move in battle.available_moves is executed, while terastallizing.
20 <= action < 26: The action - 20th available switch in battle.available_switches is executed.

If the proposed action is illegal, a random legal move is performed.

Parameters:

action (int) – The action to convert.
battle (Battle) – The battle in which to act.

Returns:

the order to send to the server.

Return type:

str
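
The per-generation ranges documented above follow one pattern: four plain moves, four more slots per battle mechanic, then six switch slots. The sketch below derives the layout from that pattern. GIMMICKS_BY_GEN, action_layout and documented_action_space_size are hypothetical names used for illustration; the numbers come only from the ranges listed in this page, not from the library's source.

```python
from typing import Dict, List, Tuple

MOVE_SLOTS = 4    # available moves per active pokemon
SWITCH_SLOTS = 6  # switch slots listed in the documented ranges

# Mechanics listed for each generation, in the order their ranges appear.
GIMMICKS_BY_GEN: Dict[int, List[str]] = {
    4: [],
    5: [],  # Gen5EnvSinglePlayer inherits the Gen 4 mapping
    6: ["mega-evolution"],
    7: ["z-move", "mega-evolution"],
    8: ["z-move", "mega-evolution", "dynamax"],
    9: ["z-move", "mega-evolution", "dynamax", "terastallize"],
}


def action_layout(gen: int) -> List[Tuple[int, int, str]]:
    """Return (start, stop, label) ranges, with stop exclusive."""
    labels = ["move"] + [f"move with {g}" for g in GIMMICKS_BY_GEN[gen]]
    layout = []
    start = 0
    for label in labels:
        layout.append((start, start + MOVE_SLOTS, label))
        start += MOVE_SLOTS
    layout.append((start, start + SWITCH_SLOTS, "switch"))
    return layout


def documented_action_space_size(gen: int) -> int:
    """Upper bound of the last range, i.e. the number of non-forfeit actions."""
    return action_layout(gen)[-1][1]
```

Under this reading, the documented ranges give 10 actions for Gen 4 and Gen 5, 14 for Gen 6, 18 for Gen 7, 22 for Gen 8 and 26 for Gen 9, with -1 reserved for forfeiting.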

Player 

This module defines a base class for players.

class poke_env.player.player.Player(account_configuration: AccountConfiguration | None = None, *, avatar: str | None = None, battle_format: str = 'gen9randombattle', log_level: int | None = None, max_concurrent_battles: int = 1, accept_open_team_sheet: bool = False, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_timer_on_battle_start: bool = False, start_listening: bool = True, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None)

Bases: ABC

Base class for players.

DEFAULT_CHOICE_CHANCE = 0.001

MESSAGES_TO_IGNORE = {'', 'expire', 't:', 'uhtmlchange'}

async accept_challenges(opponent: str | List[str] | None, n_challenges: int, packed_team: str | None = None)

Let the player wait for challenges from opponent, and accept them.

If opponent is None, every challenge will be accepted. If opponent is a string, all challenges from the player with that name will be accepted. If opponent is a list, all challenges originating from players whose name is in the list will be accepted.

Up to n_challenges challenges will be accepted, after which the function will wait for these battles to finish, and then return.

Parameters:

opponent (None, str or list of str) – Players from which challenges will be accepted.
n_challenges (int) – Number of challenges that will be accepted
packed_team (str, optional) – Team to use. Defaults to generating a team with the agent’s teambuilder.
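
The three cases for opponent can be sketched as a simple predicate. should_accept is a hypothetical helper, and the exact string comparison is a simplification: the library may normalise usernames before comparing them.

```python
from typing import List, Union


def should_accept(challenger: str, opponent: Union[None, str, List[str]]) -> bool:
    """Decide whether a challenge from `challenger` matches the `opponent` filter."""
    if opponent is None:
        # No filter: every challenge is accepted.
        return True
    if isinstance(opponent, str):
        # Single name: only that player's challenges are accepted.
        return challenger == opponent
    # List of names: accept challenges from any listed player.
    return challenger in opponent
```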

property accept_open_team_sheet: bool

async battle_against(opponent: Player, n_battles: int = 1)

Make the player play n_battles against opponent.

This function is a wrapper around send_challenges and accept_challenges.

Parameters:

opponent (Player) – The opponent to play against.
n_battles (int) – The number of games to play. Defaults to 1.

property battles: Dict[str, AbstractBattle]

choose_default_move() → DefaultBattleOrder

Returns showdown’s default move order.

This order will result in the first legal order - according to showdown’s ordering - being chosen.

abstract choose_move(battle: AbstractBattle) → BattleOrder | Awaitable[BattleOrder]

Abstract method to choose a move in a battle.

Parameters:: battle (AbstractBattle) – The battle.
Returns:: The move order.
Return type:: str

choose_random_doubles_move(battle: DoubleBattle) → BattleOrder

choose_random_move(battle: AbstractBattle) → BattleOrder

Returns a random legal move from battle.

Parameters:: battle (AbstractBattle) – The battle in which to move.
Returns:: Move order
Return type:: str

choose_random_singles_move(battle: Battle) → BattleOrder

static create_order(order: Move | Pokemon, mega: bool = False, z_move: bool = False, dynamax: bool = False, terastallize: bool = False, move_target: int = 0) → BattleOrder

Formats a move order corresponding to the provided pokemon or move.

Parameters:

order (Move or Pokemon) – Move to make or Pokemon to switch to.
mega (bool) – Whether to mega evolve the pokemon, if a move is chosen.
z_move (bool) – Whether to make a zmove, if a move is chosen.
dynamax (bool) – Whether to dynamax, if a move is chosen.
terastallize (bool) – Whether to terastallize, if a move is chosen.
move_target (int) – Target Pokemon slot of a given move

Returns:

Formatted move order

Return type:

str

property format: str

property format_is_doubles: bool

async ladder(n_games: int)

Make the player play games on the ladder.

n_games defines how many battles will be played.

Parameters:: n_games (int) – Number of battles that will be played

property logger: Logger

property n_finished_battles: int

property n_lost_battles: int

property n_tied_battles: int

property n_won_battles: int

property next_team: str | None

random_teampreview(battle: AbstractBattle) → str

Returns a random valid teampreview order for the given battle.

Parameters:: battle (AbstractBattle) – The battle.
Returns:: The random teampreview order.
Return type:: str

reset_battles(): Resets the player’s inner battle tracker.

async send_challenges(opponent: str, n_challenges: int, to_wait: Event | None = None)

Make the player send challenges to opponent.

opponent must be a string, corresponding to the name of the player to challenge.

n_challenges defines how many challenges will be sent.

to_wait is an optional event; if provided, the player waits for it to be set before launching challenges.

Parameters:

opponent (str) – Player username to challenge.
n_challenges (int) – Number of battles that will be started
to_wait (Event, optional) – Optional event to wait for before launching challenges.

teampreview(battle: AbstractBattle) → str

Returns a teampreview order for the given battle.

This order must be of the form /team TEAM, where TEAM is a string defining the team chosen by the player. Multiple formats are supported, among which ‘3461’ and ‘3, 4, 6, 1’ are correct and indicate leading with pokemon 3, with pokemons 4, 6 and 1 in the back in single battles or leading with pokemons 3 and 4 with pokemons 6 and 1 in the back in double battles.

Please refer to Pokemon Showdown’s protocol documentation for more information.

Parameters:: battle (AbstractBattle) – The battle.
Returns:: The teampreview order.
Return type:: str
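
The compact '3461' form described above can be produced with a short helper. Both functions are hypothetical and written for illustration; the compact form is only unambiguous for single-digit indices, which holds for teams of six.

```python
import random
from typing import Optional, Sequence


def format_teampreview(order: Sequence[int]) -> str:
    """Build a '/team TEAM' order from 1-based team indices, lead(s) first."""
    return "/team " + "".join(str(index) for index in order)


def random_teampreview_order(team_size: int, rng: Optional[random.Random] = None) -> str:
    """Shuffle all team indices and format them as a teampreview order."""
    rng = rng or random.Random()
    members = list(range(1, team_size + 1))
    rng.shuffle(members)
    return format_teampreview(members)
```

For instance, format_teampreview([3, 4, 6, 1]) gives "/team 3461", the order used as an example in the description.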

update_team(team: Teambuilder | str)

Updates the team used by the player.

Parameters:: team (str or Teambuilder) – The new team to use.

property username: str

property win_rate: float

OpenAIGymEnv 

This module defines a player class with the OpenAI API on the main thread. For a black-box implementation consider using the module env_player.

class poke_env.player.openai_api.OpenAIGymEnv(account_configuration: AccountConfiguration | None = None, *, avatar: int | None = None, battle_format: str = 'gen8randombattle', log_level: int | None = None, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = ('localhost:8000', 'https://play.pokemonshowdown.com/action.php?'), start_timer_on_battle_start: bool = False, start_listening: bool = True, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None, start_challenging: bool = False)

Bases: Env[ObsType, ActType], ABC

Base class implementing the OpenAI Gym API on the main thread.

abstract action_space_size() → int

Returns the size of the action space. Given size x, the action space goes from 0 to x - 1.

Returns:: The action space size.
Return type:: int

abstract action_to_move(action: int, battle: AbstractBattle) → BattleOrder

Returns the BattleOrder relative to the given action.

Parameters:

action (int) – The action to take.
battle (AbstractBattle) – The current battle state

Returns:

The battle order for the given action in context of the current battle.

Return type:

BattleOrder

background_accept_challenge(username: str)

Accepts a single challenge from the specified player. The function returns immediately to allow use of the OpenAI gym API.

Parameters:: username (str) – The username of the player whose challenge will be accepted.

background_send_challenge(username: str)

Sends a single challenge to the specified player. The function returns immediately to allow use of the OpenAI gym API.

Parameters:: username (str) – The username of the player to challenge.

property battles: Dict[str, AbstractBattle]

abstract calc_reward(last_battle: AbstractBattle, current_battle: AbstractBattle) → float

Returns the reward for the current battle state. The battle state in the previous turn is given as well and can be used for comparisons.

Parameters:

last_battle (AbstractBattle) – The battle state in the previous turn.
current_battle (AbstractBattle) – The current battle state.

Returns:

The reward for current_battle.

Return type:

float

close(purge: bool = True)

After the user has finished using the environment, close contains the code necessary to “clean up” the environment.

This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won’t raise an error.

abstract describe_embedding() → Space[ObsType]

Returns the description of the embedding. It must return a Space specifying low bounds and high bounds.

Returns:: The description of the embedding.
Return type:: Space

done(timeout: int | None = None) → bool

Returns True if the task is already done or completes within the timeout, False otherwise.

Parameters:: timeout (int, optional) – The amount of time to wait for if the task is not already done. If empty it will wait until the task is done.
Returns:: True if the task is done or if the task gets completed after the timeout.
Return type:: bool

abstract embed_battle(battle: AbstractBattle) → ObsType

Returns the embedding of the current battle state in a format compatible with the OpenAI gym API.

Parameters:: battle (AbstractBattle) – The current battle state.
Returns:: The embedding of the current battle state.

property format: str

property format_is_doubles: bool

get_additional_info() → Dict[str, Any]

Returns additional info for the reset method. Override only if you really need it.

Returns:: Additional information as a Dict
Return type:: Dict

abstract get_opponent() → Player | str | List[Player] | List[str]

Returns the opponent (or list of opponents) that will be challenged on the next iteration of the challenge loop. If a list is returned, one element will be chosen at random during the challenge loop.

Returns:: The opponent (or list of opponents).
Return type:: Player or str or list(Player) or list(str)

property logged_in: Event

Event object associated with user login.

Returns:: The logged-in event
Return type:: Event

property logger: Logger

Logger associated with the player.

Returns:: The logger.
Return type:: Logger

property n_finished_battles: int

property n_lost_battles: int

property n_tied_battles: int

property n_won_battles: int

render(mode: str = 'human')

Compute the render frames as specified by render_mode during the initialization of the environment.

The environment’s metadata render modes (env.metadata[“render_modes”]) should contain the possible ways to implement the render modes. In addition, list versions of most render modes are available through gymnasium.make, which automatically applies a wrapper to collect rendered frames.

Note:: As the render_mode is known during __init__, the objects used to render the environment state should be initialised in __init__.

By convention, if the render_mode is:

None (default): no render is computed.
“human”: The environment is continuously rendered in the current display or terminal, usually for human consumption. This rendering should occur during step() and render() doesn’t need to be called. Returns None.
“rgb_array”: Return a single frame representing the current state of the environment. A frame is a np.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
“ansi”: Return a string (str) or StringIO.StringIO containing a terminal-style text representation for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
“rgb_array_list” and “ansi_list”: List based version of render modes are possible (except Human) through the wrapper, gymnasium.wrappers.RenderCollection that is automatically applied during gymnasium.make(..., render_mode="rgb_array_list"). The frames collected are popped after render() is called or reset().

Note:: Make sure that your class’s metadata "render_modes" key includes the list of supported modes.

Changed in version 0.25.0: The render function was changed to no longer accept parameters, rather these parameters should be specified in the environment initialised, i.e., gymnasium.make("CartPole-v1", render_mode="human")

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) → Tuple[ObsType, Dict[str, Any]]

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset.

Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.

For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.

Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.

Args:

seed (optional int): The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. Please refer to the minimal example above to see this paradigm in action.
options (optional dict): Additional information to specify how the environment is reset (optional, depending on the specific environment)

Returns:

observation (ObsType): Observation of the initial state. This will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step().
info (dictionary): This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().

reset_battles(): Resets the player’s inner battle tracker.

start_challenging(n_challenges: int | None = None, callback: Callable[[AbstractBattle], None] | None = None)

Starts the challenge loop.

Parameters:

n_challenges (int, optional) – The number of challenges to send. If empty it will run until stopped.
callback (Callable[[AbstractBattle], None], optional) – The function to callback after each challenge with a copy of the final battle state.

start_laddering(n_challenges: int | None = None, callback: Callable[[AbstractBattle], None] | None = None)

Starts the laddering loop.

Parameters:

n_challenges (int, optional) – The number of ladder games to play. If empty it will run until stopped.
callback (Callable[[AbstractBattle], None], optional) – The function to callback after each challenge with a copy of the final battle state.

step(action: ActType) → Tuple[ObsType, float, bool, bool, Dict[str, Any]]

Execute the specified action in the environment.

Parameters:: action (ActType) – The action to be executed.
Returns:: A tuple containing the new observation, reward, termination flag, truncation flag, and info dictionary.
Return type:: Tuple[ObsType, float, bool, bool, Dict[str, Any]]
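
The five-element return value is typically consumed in a loop such as the one below. CountdownEnv is a hypothetical toy environment that only mimics the reset() and step() signatures documented here, so that the loop runs without poke-env, gymnasium or a Showdown server; it is not part of the library.

```python
from typing import Any, Dict, Optional, Tuple


class CountdownEnv:
    """Toy stand-in: the episode terminates after a fixed number of steps."""

    def __init__(self, turns: int = 3) -> None:
        self._turns = turns
        self._remaining = turns

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[int, Dict[str, Any]]:
        self._remaining = self._turns
        return self._remaining, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self._remaining -= 1
        terminated = self._remaining == 0
        reward = 1.0 if terminated else 0.0  # reward only on the final step
        return self._remaining, reward, terminated, False, {}


def run_episode(env: CountdownEnv) -> float:
    """Play one episode with a fixed action and return the summed reward."""
    observation, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        observation, reward, terminated, truncated, info = env.step(0)
        total_reward += reward
        # An episode ends when it terminates or is truncated.
        done = terminated or truncated
    return total_reward
```

With a real environment, the fixed action 0 would be replaced by the agent's policy output.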

property username: str

The player’s username.

Returns:: The player’s username.
Return type:: str

property websocket_url: str

The websocket url.

It is derived from the server url.

Returns:: The websocket url.
Return type:: str

property win_rate: float

Random Player 

This module defines a random player baseline.

class poke_env.player.random_player.RandomPlayer(account_configuration: AccountConfiguration | None = None, *, avatar: str | None = None, battle_format: str = 'gen9randombattle', log_level: int | None = None, max_concurrent_battles: int = 1, accept_open_team_sheet: bool = False, save_replays: bool | str = False, server_configuration: ServerConfiguration | None = None, start_timer_on_battle_start: bool = False, start_listening: bool = True, ping_interval: float | None = 20.0, ping_timeout: float | None = 20.0, team: str | Teambuilder | None = None)

Bases: Player

choose_move(battle: AbstractBattle) → BattleOrder

Abstract method to choose a move in a battle.

Parameters:: battle (AbstractBattle) – The battle.
Returns:: The move order.
Return type:: str
