|
# Unity ML-Agents Python Low Level API |
|
|
|
The `mlagents` Python package contains two components: a low level API which |
|
allows you to interact directly with a Unity Environment (`mlagents_envs`) and |
|
an entry point to train (`mlagents-learn`) which allows you to train agents in |
|
Unity Environments using our implementations of reinforcement learning or |
|
imitation learning. This document describes how to use the `mlagents_envs` API. |
|
For information on using `mlagents-learn`, see [here](Training-ML-Agents.md). |
|
For Python Low Level API documentation, see [here](Python-LLAPI-Documentation.md). |
|
|
|
The Python Low Level API can be used to interact directly with your Unity |
|
learning environment. As such, it can serve as the basis for developing and |
|
evaluating new learning algorithms. |
|
|
|
## mlagents_envs |
|
|
|
The ML-Agents Toolkit Low Level API is a Python API for controlling the |
|
simulation loop of an environment or game built with Unity. This API is used by |
|
the training algorithms inside the ML-Agents Toolkit, but you can also write your
|
own Python programs using this API. |
|
|
|
The key objects in the Python API include: |
|
|
|
- **UnityEnvironment** — the main interface between the Unity application and |
|
your code. Use UnityEnvironment to start and control a simulation or training |
|
session. |
|
- **BehaviorName** — a string that identifies a behavior in the simulation.
- **AgentId** — an `int` that serves as a unique identifier for Agents in the
simulation.
|
- **DecisionSteps** — contains the data from Agents belonging to the same |
|
"Behavior" in the simulation, such as observations and rewards. Only Agents |
|
that requested a decision since the last call to `env.step()` are in the |
|
DecisionSteps object. |
|
- **TerminalSteps** — contains the data from Agents belonging to the same |
|
"Behavior" in the simulation, such as observations and rewards. Only Agents |
|
whose episode ended since the last call to `env.step()` are in the |
|
TerminalSteps object. |
|
- **BehaviorSpec** — describes the shape of the observation data inside |
|
DecisionSteps and TerminalSteps as well as the expected action shapes. |
|
|
|
These classes are all defined in the |
|
[base_env](../ml-agents-envs/mlagents_envs/base_env.py) script. |
|
|
|
An Agent "Behavior" is a group of Agents identified by a `BehaviorName` that |
|
share the same observations and action types (described in their |
|
`BehaviorSpec`). You can think of an Agent Behavior as a group of agents that
|
will share the same policy. All Agents with the same behavior have the same goal |
|
and reward signals. |
|
|
|
To communicate with an Agent in a Unity environment from a Python program, the |
|
Agent in the simulation must have `Behavior Parameters` set to communicate. You |
|
must set the `Behavior Type` to `Default` and give it a `Behavior Name`. |
|
|
|
_Notice: Currently communication between Unity and Python takes place over an |
|
open socket without authentication. As such, please make sure that the network |
|
where training takes place is secure. This will be addressed in a future |
|
release._ |
|
|
|
## Loading a Unity Environment |
|
|
|
Python-side communication happens through `UnityEnvironment` which is located in |
|
[`environment.py`](../ml-agents-envs/mlagents_envs/environment.py). To load a |
|
Unity environment from a built binary file, put the file in the same directory |
|
as `envs`. For example, if the filename of your Unity environment is `3DBall`, |
|
in Python, run:
|
|
|
```python |
|
from mlagents_envs.environment import UnityEnvironment |
|
# This is a non-blocking call that only loads the environment. |
|
env = UnityEnvironment(file_name="3DBall", seed=1, side_channels=[]) |
|
# Start interacting with the environment. |
|
env.reset() |
|
behavior_names = env.behavior_specs.keys() |
|
... |
|
``` |
|
**NOTE:** See [Interacting with a Unity Environment](#interacting-with-a-unity-environment)
for more on how you can interact with the Unity environment from Python.
|
|
|
- `file_name` is the name of the environment binary (located in the root |
|
directory of the Python project).
|
- `worker_id` indicates which port to use for communication with the
environment. This is useful in parallel training regimes such as A3C (see the
sketch after this list).
|
- `seed` indicates the seed to use when generating random numbers during the |
|
training process. In environments which are stochastic, setting the seed |
|
enables reproducible experimentation by ensuring that the environment and |
|
trainers utilize the same random seed. |
|
- `side_channels` provides a way to exchange data with the Unity simulation that |
|
is not related to the reinforcement learning loop. For example: configurations |
|
or properties. More on them in the [Side Channels](Custom-SideChannels.md) doc. |
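
As a sketch of the `worker_id` argument, the following starts two instances of
the `3DBall` binary from the example above on separate ports. In practice each
instance usually lives in its own worker process; the variable names here are
just for illustration.

```python
from mlagents_envs.environment import UnityEnvironment

# Each worker_id maps to a distinct communication port, so the two Unity
# processes can run side by side without colliding.
env_a = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1, side_channels=[])
env_b = UnityEnvironment(file_name="3DBall", worker_id=1, seed=2, side_channels=[])

env_a.reset()
env_b.reset()
```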
|
|
|
If you want to directly interact with the Editor, you need to use |
|
`file_name=None`, then press the **Play** button in the Editor when the message |
|
_"Start training by pressing the Play button in the Unity Editor"_ is displayed |
|
on the screen.
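
A minimal sketch of connecting to the Editor (the argument values other than
`file_name=None` are just examples):

```python
from mlagents_envs.environment import UnityEnvironment

# file_name=None connects to the Unity Editor instead of a built binary.
# Press the Play button in the Editor when prompted.
env = UnityEnvironment(file_name=None, seed=1, side_channels=[])
env.reset()
```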
|
|
|
### Interacting with a Unity Environment |
|
|
|
#### The BaseEnv interface |
|
|
|
A `BaseEnv` has the following methods: |
|
|
|
- **Reset : `env.reset()`** Sends a signal to reset the environment. Returns |
|
None. |
|
- **Step : `env.step()`** Sends a signal to step the environment. Returns None. |
|
Note that a "step" for Python does not correspond to either Unity `Update` nor |
|
`FixedUpdate`. When `step()` or `reset()` is called, the Unity simulation will |
|
move forward until an Agent in the simulation needs a input from Python to |
|
act. |
|
- **Close : `env.close()`** Sends a shutdown signal to the environment and |
|
terminates the communication. |
|
- **Behavior Specs : `env.behavior_specs`** Returns a Mapping of |
|
`BehaviorName` to `BehaviorSpec` objects (read only). |
|
A `BehaviorSpec` contains the observation shapes and the |
|
`ActionSpec` (which defines the action shape). Note that |
|
the `BehaviorSpec` for a specific group is fixed throughout the simulation. |
|
The number of entries in the Mapping can change over time if new Agent
behaviors are created during the simulation.
|
- **Get Steps : `env.get_steps(behavior_name: str)`** Returns a tuple
`DecisionSteps, TerminalSteps` corresponding to the `behavior_name` given as
input. The `DecisionSteps` contains information about the state of the agents
**that need an action this step** and have the behavior `behavior_name`. The
`TerminalSteps` contains information about the state of the agents **whose
episode ended** and have the behavior `behavior_name`. Both `DecisionSteps` and
`TerminalSteps` contain information such as the observations, the rewards and
the agent identifiers. `DecisionSteps` also contains action masks for the next
action, while `TerminalSteps` contains the reason for termination (whether the
Agent was interrupted because it reached its maximum step count). The data is
stored in `np.array`s whose first dimension is always the number of agents.
Note that the number of agents is not guaranteed to remain constant during the
simulation, and it is not unusual for either `DecisionSteps` or `TerminalSteps`
to contain no Agents at all. A minimal loop combining `get_steps`,
`set_actions` and `step` is sketched after this list.
|
- **Set Actions :`env.set_actions(behavior_name: str, action: ActionTuple)`** Sets |
|
the actions for a whole agent group. `action` is an `ActionTuple`, which |
|
is made up of a 2D `np.array` of `dtype=np.int32` for discrete actions, and |
|
`dtype=np.float32` for continuous actions. The first dimension of `np.array` |
|
in the tuple is the number of agents that requested a decision since the |
|
last call to `env.step()`. The second dimension is the number of discrete or |
|
continuous actions for the corresponding array. |
|
- **Set Action for Agent : |
|
`env.set_action_for_agent(agent_group: str, agent_id: int, action: ActionTuple)`** |
|
Sets the action for a specific Agent in an agent group. `agent_group` is the |
|
name of the group the Agent belongs to and `agent_id` is the integer |
|
identifier of the Agent. `action` is an `ActionTuple` as described above. |
|
**Note:** If no action is provided for an agent group between two calls to |
|
`env.step()` then the default action will be all zeros. |
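
Putting these methods together, here is a minimal sketch of an interaction
loop. The `3DBall` file name and the all-zeros actions are assumptions for the
sake of the example; substitute your own environment and action logic.

```python
import numpy as np

from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

env = UnityEnvironment(file_name="3DBall", seed=1, side_channels=[])
env.reset()

behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    decision_steps, terminal_steps = env.get_steps(behavior_name)

    # Build an all-zeros ActionTuple for every agent that requested a decision.
    # The first dimension of each array is the number of agents in decision_steps.
    n_agents = len(decision_steps)
    action = ActionTuple(
        continuous=np.zeros((n_agents, spec.action_spec.continuous_size), dtype=np.float32),
        discrete=np.zeros((n_agents, spec.action_spec.discrete_size), dtype=np.int32),
    )
    env.set_actions(behavior_name, action)
    env.step()

env.close()
```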
|
|
|
#### DecisionSteps and DecisionStep |
|
|
|
`DecisionSteps` (with `s`) contains information about a whole batch of Agents |
|
while `DecisionStep` (no `s`) only contains information about a single Agent. |
|
|
|
A `DecisionSteps` has the following fields:
|
|
|
- `obs` is a list of numpy arrays containing the observations collected by the
group of agents. The first dimension of each array corresponds to the batch
size of the group (the number of agents requesting a decision since the last
call to `env.step()`).
|
- `reward` is a float vector of length batch size. Corresponds to the rewards |
|
collected by each agent since the last simulation step. |
|
- `agent_id` is an int vector of length batch size containing the unique
identifiers of the corresponding Agents. This is used to track Agents across
simulation steps.
|
- `action_mask` is an optional list of two dimensional arrays of booleans which is only |
|
available when using multi-discrete actions. Each array corresponds to an |
|
action branch. The first dimension of each array is the batch size and the |
|
second contains a mask for each action of the branch. If true, the action is |
|
not available for the agent during this simulation step. |
|
|
|
It also has the two following methods: |
|
|
|
- `len(DecisionSteps)` Returns the number of agents requesting a decision since |
|
the last call to `env.step()`. |
|
- `DecisionSteps[agent_id]` Returns a `DecisionStep` for the Agent with the |
|
`agent_id` unique identifier. |
|
|
|
A `DecisionStep` has the following fields: |
|
|
|
- `obs` is a list of numpy arrays containing the observations collected by the
agent. (Each array has one fewer dimension than the arrays in `DecisionSteps`.)
|
- `reward` is a float. Corresponds to the rewards collected by the agent since |
|
the last simulation step. |
|
- `agent_id` is an int and a unique identifier for the corresponding Agent.
|
- `action_mask` is an optional list of one dimensional arrays of booleans which is only |
|
available when using multi-discrete actions. Each array corresponds to an |
|
action branch. Each array contains a mask for each action of the branch. If |
|
true, the action is not available for the agent during this simulation step. |
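
As an illustration, the following sketch (assuming `env` and `behavior_name`
are set up as in the earlier examples) reads the per-agent data out of a
`DecisionSteps` batch:

```python
decision_steps, terminal_steps = env.get_steps(behavior_name)

# Batch-level access: the first dimension of every array is the number of agents.
print(len(decision_steps), decision_steps.reward.shape)

# Per-agent access: index the batch by agent_id to get a single DecisionStep.
for agent_id in decision_steps.agent_id:
    step = decision_steps[agent_id]
    print(agent_id, step.reward, [o.shape for o in step.obs])
```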
|
|
|
#### TerminalSteps and TerminalStep |
|
|
|
Similarly to `DecisionSteps` and `DecisionStep`, `TerminalSteps` (with `s`) |
|
contains information about a whole batch of Agents while `TerminalStep` (no `s`) |
|
only contains information about a single Agent. |
|
|
|
A `TerminalSteps` has the following fields:
|
|
|
- `obs` is a list of numpy arrays containing the observations collected by the
group of agents. The first dimension of each array corresponds to the batch
size of the group (the number of agents whose episode ended since the last
call to `env.step()`).
|
- `reward` is a float vector of length batch size. Corresponds to the rewards |
|
collected by each agent since the last simulation step. |
|
- `agent_id` is an int vector of length batch size containing the unique
identifiers of the corresponding Agents. This is used to track Agents across
simulation steps.
|
- `interrupted` is an array of booleans of length batch size. Is true if the |
|
associated Agent was interrupted since the last decision step. For example, |
|
if the Agent reached the maximum number of steps for the episode. |
|
|
|
It also has the two following methods: |
|
|
|
- `len(TerminalSteps)` Returns the number of agents whose episode ended since
the last call to `env.step()`.
|
- `TerminalSteps[agent_id]` Returns a `TerminalStep` for the Agent with the |
|
`agent_id` unique identifier. |
|
|
|
A `TerminalStep` has the following fields: |
|
|
|
- `obs` is a list of numpy arrays containing the observations collected by the
agent. (Each array has one fewer dimension than the arrays in `TerminalSteps`.)
|
- `reward` is a float. Corresponds to the rewards collected by the agent since |
|
the last simulation step. |
|
- `agent_id` is an int and a unique identifier for the corresponding Agent.
|
- `interrupted` is a bool. Is true if the Agent was interrupted since the last |
|
decision step. For example, if the Agent reached the maximum number of steps for |
|
the episode. |
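
For example, a sketch (same assumptions as above) that logs how each finished
episode ended:

```python
decision_steps, terminal_steps = env.get_steps(behavior_name)

for agent_id in terminal_steps.agent_id:
    step = terminal_steps[agent_id]
    # interrupted is True when the episode was cut short, e.g. max steps reached.
    print(f"Agent {agent_id}: reward={step.reward}, interrupted={step.interrupted}")
```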
|
|
|
#### BehaviorSpec |
|
|
|
A `BehaviorSpec` has the following fields:
|
|
|
- `observation_specs` is a List of `ObservationSpec` objects: each `ObservationSpec`
describes an observation's properties: `shape` is a tuple of ints that
corresponds to the shape of the observation (without the number of agents dimension),
`dimension_property` is a tuple of flags containing extra information about how the
data should be processed in the corresponding dimension, and `observation_type` is an
enum corresponding to the type of observation that generates the data (e.g. default,
goal, etc.). Note that the `ObservationSpec` objects have the same ordering as the
observations in `DecisionSteps`, `DecisionStep`, `TerminalSteps` and `TerminalStep`.
|
- `action_spec` is an `ActionSpec` namedtuple that defines the number and types |
|
of actions for the Agent. |
|
|
|
An `ActionSpec` has the following fields and properties:
- `continuous_size` is the number of floats that constitute the continuous actions.
- `discrete_size` is the number of branches (the number of independent actions) that
constitute the multi-discrete actions.
- `discrete_branches` is a Tuple of ints. Each int corresponds to the number of
different options for each branch of the action. For example: in a game with a
direction input (no movement, left, right) and a jump input (no jump, jump),
there will be two branches (direction and jump), the first one with 3 options
and the second with 2 options (`discrete_size = 2` and
`discrete_branches = (3, 2)`).
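
The following sketch (assuming `env` is an already-reset `UnityEnvironment`)
prints the spec of every behavior:

```python
for behavior_name, spec in env.behavior_specs.items():
    print(behavior_name)
    for obs_spec in spec.observation_specs:
        print("  observation:", obs_spec.shape, obs_spec.observation_type)
    print("  continuous actions:", spec.action_spec.continuous_size)
    print("  discrete branches:", spec.action_spec.discrete_branches)
```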
|
|
|
|
|
### Communicating additional information with the Environment |
|
|
|
In addition to the means of communicating between Unity and Python described
|
above, we also provide methods for sharing agent-agnostic information. These |
|
additional methods are referred to as side channels. ML-Agents includes two |
|
ready-made side channels, described below. It is also possible to create custom |
|
side channels to communicate any additional data between a Unity environment and |
|
Python. Instructions for creating custom side channels can be found |
|
[here](Custom-SideChannels.md). |
|
|
|
Side channels exist as separate classes which are instantiated, and then passed |
|
as a list to the `side_channels` argument of the constructor of the
|
`UnityEnvironment` class. |
|
|
|
```python |
|
channel = MyChannel() |
|
|
|
env = UnityEnvironment(side_channels = [channel]) |
|
``` |
|
|
|
**Note:** A side channel will only send/receive messages when `env.step()` or
`env.reset()` is called.
|
|
|
#### EngineConfigurationChannel |
|
|
|
The `EngineConfiguration` side channel allows you to modify the time-scale, |
|
resolution, and graphics quality of the environment. This can be useful for |
|
adjusting the environment to perform better during training, or be more |
|
interpretable during inference. |
|
|
|
`EngineConfigurationChannel` has two methods:
|
|
|
- `set_configuration_parameters` which takes the following arguments: |
|
- `width`: Defines the width of the display. (Must be set alongside height) |
|
- `height`: Defines the height of the display. (Must be set alongside width) |
|
- `quality_level`: Defines the quality level of the simulation. |
|
- `time_scale`: Defines the multiplier for the deltatime in the simulation. If |
|
set to a higher value, time will pass faster in the simulation but the |
|
physics may perform unpredictably. |
|
- `target_frame_rate`: Instructs the simulation to try to render at a specified
frame rate.
- `capture_frame_rate`: Instructs the simulation to consider time between
updates to always be constant, regardless of the actual frame rate.
|
- `set_configuration` which takes a single `config` argument, an `EngineConfig`
NamedTuple object.
|
|
|
For example, the following code would adjust the time-scale of the simulation to |
|
be 2x realtime. |
|
|
|
```python |
|
from mlagents_envs.environment import UnityEnvironment |
|
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel |
|
|
|
channel = EngineConfigurationChannel() |
|
|
|
env = UnityEnvironment(side_channels=[channel]) |
|
|
|
channel.set_configuration_parameters(time_scale = 2.0) |
|
|
|
i = env.reset() |
|
... |
|
``` |
|
|
|
#### EnvironmentParameters |
|
|
|
The `EnvironmentParameters` side channel allows you to set pre-defined numerical
values in the environment. This can be useful for adjusting environment-specific
settings from Python. You can call `set_float_parameter` on the side channel to
write a parameter, and read it on the C# side through
`Academy.Instance.EnvironmentParameters` (see below).
|
|
|
`EnvironmentParametersChannel` has one method:

- `set_float_parameter` Sets a float parameter in the Unity Environment.
  - `key`: The string identifier of the property.
  - `value`: The float value of the property.
|
|
|
```python |
|
from mlagents_envs.environment import UnityEnvironment |
|
from mlagents_envs.side_channel.environment_parameters_channel import EnvironmentParametersChannel |
|
|
|
channel = EnvironmentParametersChannel() |
|
|
|
env = UnityEnvironment(side_channels=[channel]) |
|
|
|
channel.set_float_parameter("parameter_1", 2.0) |
|
|
|
i = env.reset() |
|
... |
|
``` |
|
|
|
Once a property has been modified in Python, you can access it in C# after the |
|
next call to `step` as follows: |
|
|
|
```csharp |
|
var envParameters = Academy.Instance.EnvironmentParameters; |
|
float property1 = envParameters.GetWithDefault("parameter_1", 0.0f); |
|
``` |
|
|
|
#### Custom side channels |
|
|
|
For information on how to make custom side channels for sending additional data |
|
types, see the documentation [here](Custom-SideChannels.md). |
|
|