Action Mapping and Strict/Fake Modes

SinglesEnv and DoublesEnv expose helpers to convert between encoded actions and Showdown orders:

  • action_to_order

  • order_to_action

  • get_action_mask

Both methods support two important flags:

  • strict (default True): raises ValueError when a conversion is invalid.

  • fake (default False): allows best-effort conversions, even if not legal.

Prerequisites

  • You need a live Battle or DoubleBattle object, usually from a player callback or an environment step.

  • This guide is most useful when building custom RL policies, wrappers, or action encoders.

Why This Matters

During RL training or debugging, invalid actions happen frequently. You can choose between hard-failing (strict) and fallback behavior (non-strict).

  • Use strict=True while validating your policy and action encoding.

  • Use strict=False when you prefer robustness and fallback to random legal orders.

Note

The Gymnasium action_space spans the full encoded action range. Use get_action_mask to determine which actions are currently legal. Sentinel values like default/forfeit are accepted by the converters, but they are not emitted by the standard sampled action spaces.

Singles Example

import numpy as np

from poke_env.environment import SinglesEnv

mask = SinglesEnv.get_action_mask(battle)

# Convert action -> order
order = SinglesEnv.action_to_order(
    np.int64(6),
    battle,
    fake=False,
    strict=True,
)

# Convert order -> action
action = SinglesEnv.order_to_action(
    order,
    battle,
    fake=False,
    strict=True,
)

Doubles Example

import numpy as np

from poke_env.environment import DoublesEnv

mask = DoublesEnv.get_action_mask(battle)

action = np.array([7, 27], dtype=np.int64)
order = DoublesEnv.action_to_order(action, battle, fake=False, strict=False)

recovered_action = DoublesEnv.order_to_action(
    order,
    battle,
    fake=False,
    strict=False,
)