

mlagents_envs.envs.unity_gym_env

UnityGymException Objects

class UnityGymException(error.Error)

Any error related to the gym wrapper of ml-agents.

UnityToGymWrapper Objects

class UnityToGymWrapper(gym.Env)

Provides Gym wrapper for Unity Learning Environments.

__init__

 | __init__(unity_env: BaseEnv, uint8_visual: bool = False, flatten_branched: bool = False, allow_multiple_obs: bool = False, action_space_seed: Optional[int] = None)

Environment initialization

Arguments:

  • unity_env: The Unity BaseEnv to be wrapped in the gym. Will be closed when the UnityToGymWrapper closes.
  • uint8_visual: Return visual observations as uint8 (0-255) matrices instead of float (0.0-1.0).
  • flatten_branched: If True, turn branched discrete action spaces into a Discrete space rather than MultiDiscrete.
  • allow_multiple_obs: If True, return a list of np.ndarrays as observations with the first elements containing the visual observations and the last element containing the array of vector observations. If False, returns a single np.ndarray containing either only a single visual observation or the array of vector observations.
  • action_space_seed: If non-None, will be used to set the random seed on created gym.Space instances.
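As a rough illustration of how these arguments fit together, the sketch below wraps a Unity build in the gym interface. The helper name `make_gym_env` and the chosen flag values are assumptions made for this example only. The imports are deferred into the function body so the snippet can be loaded without ml-agents installed, but calling it requires the `mlagents_envs` package and a built Unity environment.

```python
def make_gym_env(file_name, seed=0):
    """Hypothetical helper: wrap a Unity build in UnityToGymWrapper."""
    # Imported lazily so this snippet loads even where ml-agents is absent.
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

    unity_env = UnityEnvironment(file_name=file_name, seed=seed)
    # Closing the wrapper also closes unity_env (see the unity_env argument).
    return UnityToGymWrapper(
        unity_env,
        uint8_visual=True,         # visual observations as uint8 (0-255)
        flatten_branched=True,     # Discrete instead of MultiDiscrete actions
        allow_multiple_obs=False,  # a single np.ndarray per observation
        action_space_seed=seed,    # seeds the created gym.Space instances
    )
```

Calling `make_gym_env` with the path to a Unity executable should return an environment exposing the `reset`, `step`, `render` and `close` methods described in this document.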

reset

 | reset() -> Union[List[np.ndarray], np.ndarray]

Resets the state of the environment and returns an initial observation.

Returns:

  • observation object/list - the initial observation of the space.

step

 | step(action: List[Any]) -> GymStepResult

Run one timestep of the environment's dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment's state. Accepts an action and returns a tuple (observation, reward, done, info).

Arguments:

  • action object/list - the action to apply to the environment

Returns:

  • observation object/list - agent's observation of the current environment.
  • reward float/list - amount of reward returned after previous action.
  • done boolean/list - whether the episode has ended.
  • info dict - contains auxiliary diagnostic information.
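The reset/step contract described above can be sketched as a plain interaction loop. Because a real `UnityToGymWrapper` needs a running Unity build, the example uses `CountdownEnv`, a hypothetical stand-in that only mimics the `reset()` and `step()` return shapes. The names `run_episodes` and `policy` are likewise illustrative and not part of ml-agents.

```python
class CountdownEnv:
    """Hypothetical stand-in for a gym-style environment (not part of ml-agents)."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.steps_left = episode_length

    def reset(self):
        self.steps_left = self.episode_length
        return [float(self.steps_left)]

    def step(self, action):
        self.steps_left -= 1
        done = self.steps_left == 0
        # Same tuple layout as the wrapper: (observation, reward, done, info).
        return [float(self.steps_left)], 1.0, done, {}


def run_episodes(env, policy, num_episodes):
    """Run whole episodes and return the summed reward of each one."""
    returns = []
    for _ in range(num_episodes):
        # The caller is responsible for resetting after an episode ends.
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            obs, reward, done, info = env.step(policy(obs))
            total += reward
        returns.append(total)
    return returns


episode_returns = run_episodes(CountdownEnv(3), lambda obs: 0, num_episodes=2)
```

With a wrapped Unity environment, the same loop should apply, with `policy` replaced by something like `lambda obs: env.action_space.sample()`.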

render

 | render(mode="rgb_array")

Return the latest visual observations. Note that it will not render a new frame of the environment.

close

 | close() -> None

Override _close in your subclass to perform any necessary cleanup. Environments will automatically close() themselves when garbage collected or when the program exits.

seed

 | seed(seed: Any = None) -> None

Sets the seed for this env's random number generator(s). Currently not implemented.

ActionFlattener Objects

class ActionFlattener()

Flattens branched discrete action spaces into single-branch discrete action spaces.

__init__

 | __init__(branched_action_space)

Initialize the flattener.

Arguments:

  • branched_action_space: A List containing the sizes of each branch of the action space, e.g. [2,3,3] for three branches with size 2, 3, and 3 respectively.

lookup_action

 | lookup_action(action)

Convert a scalar discrete action into a unique set of branched actions.

Arguments:

  • action: A scalar value representing one of the discrete actions.

Returns:

The List containing the branched actions.
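To make the flattening concrete, here is a minimal standalone sketch of one way to map scalar actions to branched actions, by enumerating the Cartesian product of the branches. It illustrates the idea only and is not claimed to be the library's exact implementation. The name `build_action_lookup` is an assumption introduced for this example.

```python
import itertools


def build_action_lookup(branched_action_space):
    """Map each scalar action index to one combination of branch choices."""
    branch_ranges = [range(size) for size in branched_action_space]
    # itertools.product varies the last branch fastest.
    combos = itertools.product(*branch_ranges)
    return {scalar: list(combo) for scalar, combo in enumerate(combos)}


# Three branches of size 2, 3 and 3 give 2 * 3 * 3 = 18 flat actions.
lookup = build_action_lookup([2, 3, 3])
first_action = lookup[0]    # [0, 0, 0]
last_action = lookup[17]    # [1, 2, 2]
```

Under this scheme a branched space with sizes [2,3,3] becomes a single discrete space with 18 actions, and looking up a scalar returns the corresponding list of per-branch choices.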