# Unity ML-Agents Python Low Level API
The `mlagents` Python package contains two components: a low level API which
allows you to interact directly with a Unity Environment (`mlagents_envs`) and
an entry point to train (`mlagents-learn`) which allows you to train agents in
Unity Environments using our implementations of reinforcement learning or
imitation learning. This document describes how to use the `mlagents_envs` API.
For information on using `mlagents-learn`, see [here](Training-ML-Agents.md).
For Python Low Level API documentation, see [here](Python-LLAPI-Documentation.md).
The Python Low Level API can be used to interact directly with your Unity
learning environment. As such, it can serve as the basis for developing and
evaluating new learning algorithms.
## mlagents_envs
The ML-Agents Toolkit Low Level API is a Python API for controlling the
simulation loop of an environment or game built with Unity. This API is used by
the training algorithms inside the ML-Agents Toolkit, but you can also write your
own Python programs using this API.
The key objects in the Python API include:
- **UnityEnvironment** — the main interface between the Unity application and
your code. Use UnityEnvironment to start and control a simulation or training
session.
- **BehaviorName** - a string that identifies a behavior in the simulation.
- **AgentId** - an `int` that serves as a unique identifier for Agents in the
  simulation.
- **DecisionSteps** — contains the data from Agents belonging to the same
"Behavior" in the simulation, such as observations and rewards. Only Agents
that requested a decision since the last call to `env.step()` are in the
DecisionSteps object.
- **TerminalSteps** — contains the data from Agents belonging to the same
"Behavior" in the simulation, such as observations and rewards. Only Agents
whose episode ended since the last call to `env.step()` are in the
TerminalSteps object.
- **BehaviorSpec** — describes the shape of the observation data inside
DecisionSteps and TerminalSteps as well as the expected action shapes.
These classes are all defined in the
[base_env](../ml-agents-envs/mlagents_envs/base_env.py) script.
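For reference, a minimal import sketch for these classes (the names below are as defined in `base_env.py` and `environment.py`):
```python
from mlagents_envs.base_env import (
    ActionTuple,
    BehaviorSpec,
    DecisionSteps,
    TerminalSteps,
)
from mlagents_envs.environment import UnityEnvironment
```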
An Agent "Behavior" is a group of Agents identified by a `BehaviorName` that
share the same observations and action types (described in their
`BehaviorSpec`). You can think about Agent Behavior as a group of agents that
will share the same policy. All Agents with the same behavior have the same goal
and reward signals.
To communicate with an Agent in a Unity environment from a Python program, the
Agent in the simulation must have `Behavior Parameters` set to communicate. You
must set the `Behavior Type` to `Default` and give it a `Behavior Name`.
_Notice: Currently communication between Unity and Python takes place over an
open socket without authentication. As such, please make sure that the network
where training takes place is secure. This will be addressed in a future
release._
## Loading a Unity Environment
Python-side communication happens through `UnityEnvironment` which is located in
[`environment.py`](../ml-agents-envs/mlagents_envs/environment.py). To load a
Unity environment from a built binary file, put the file in the same directory
as `envs`. For example, if the filename of your Unity environment is `3DBall`,
in Python, run:
```python
from mlagents_envs.environment import UnityEnvironment
# This is a non-blocking call that only loads the environment.
env = UnityEnvironment(file_name="3DBall", seed=1, side_channels=[])
# Start interacting with the environment.
env.reset()
behavior_names = env.behavior_specs.keys()
...
```
**NOTE:** See [Interacting with a Unity Environment](#interacting-with-a-unity-environment)
for more about how you can interact with the Unity environment from Python.
- `file_name` is the name of the environment binary (located in the root
  directory of the Python project).
- `worker_id` indicates which port to use for communication with the
environment. For use in parallel training regimes such as A3C.
- `seed` indicates the seed to use when generating random numbers during the
training process. In environments which are stochastic, setting the seed
enables reproducible experimentation by ensuring that the environment and
trainers utilize the same random seed.
- `side_channels` provides a way to exchange data with the Unity simulation that
is not related to the reinforcement learning loop. For example: configurations
or properties. More on them in the [Side Channels](Custom-SideChannels.md) doc.
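As a sketch of how these arguments combine (the binary name `3DBall` and the two-instance setup are illustrative assumptions, not part of the API):
```python
from mlagents_envs.environment import UnityEnvironment

# Two simulation instances running side by side: each worker_id maps to a
# distinct communication port, which is what lets parallel environments coexist.
env_a = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1, side_channels=[])
env_b = UnityEnvironment(file_name="3DBall", worker_id=1, seed=2, side_channels=[])
```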
If you want to directly interact with the Editor, you need to use
`file_name=None`, then press the **Play** button in the Editor when the message
_"Start training by pressing the Play button in the Unity Editor"_ is displayed
on the screen.
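For example, a minimal sketch for connecting to the Editor instead of a binary:
```python
from mlagents_envs.environment import UnityEnvironment

# With file_name=None, this call waits until you press Play in the Editor.
env = UnityEnvironment(file_name=None, seed=1, side_channels=[])
env.reset()
```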
### Interacting with a Unity Environment
#### The BaseEnv interface
A `BaseEnv` has the following methods:
- **Reset : `env.reset()`** Sends a signal to reset the environment. Returns
None.
- **Step : `env.step()`** Sends a signal to step the environment. Returns None.
Note that a "step" for Python does not correspond to either Unity `Update` nor
`FixedUpdate`. When `step()` or `reset()` is called, the Unity simulation will
move forward until an Agent in the simulation needs a input from Python to
act.
- **Close : `env.close()`** Sends a shutdown signal to the environment and
terminates the communication.
- **Behavior Specs : `env.behavior_specs`** Returns a Mapping of
`BehaviorName` to `BehaviorSpec` objects (read only).
A `BehaviorSpec` contains the observation shapes and the
`ActionSpec` (which defines the action shape). Note that
the `BehaviorSpec` for a specific group is fixed throughout the simulation.
  The number of entries in the Mapping can change over time if new Agent
  behaviors are created during the simulation.
- **Get Steps : `env.get_steps(behavior_name: str)`** Returns a tuple of
  `(DecisionSteps, TerminalSteps)` corresponding to the `behavior_name` given as
  input. The `DecisionSteps` contains information about the state of the agents
  **that need an action this step** and have the behavior `behavior_name`. The
  `TerminalSteps` contains information about the state of the agents **whose
  episode ended** and have the behavior `behavior_name`. Both `DecisionSteps` and
  `TerminalSteps` contain information such as the observations, the rewards and
  the agent identifiers. `DecisionSteps` also contains action masks for the next
  action while `TerminalSteps` contains the reason for termination (whether the
  Agent reached its maximum step count and was interrupted). The data is in
  `np.array`s whose first dimension is always the number of agents. Note that
  the number of agents is not guaranteed to remain constant during the
  simulation, and it is not unusual for either `DecisionSteps` or
  `TerminalSteps` to contain no Agents at all.
- **Set Actions :`env.set_actions(behavior_name: str, action: ActionTuple)`** Sets
the actions for a whole agent group. `action` is an `ActionTuple`, which
is made up of a 2D `np.array` of `dtype=np.int32` for discrete actions, and
`dtype=np.float32` for continuous actions. The first dimension of `np.array`
in the tuple is the number of agents that requested a decision since the
last call to `env.step()`. The second dimension is the number of discrete or
continuous actions for the corresponding array.
- **Set Action for Agent :
`env.set_action_for_agent(agent_group: str, agent_id: int, action: ActionTuple)`**
Sets the action for a specific Agent in an agent group. `agent_group` is the
name of the group the Agent belongs to and `agent_id` is the integer
identifier of the Agent. `action` is an `ActionTuple` as described above.
**Note:** If no action is provided for an agent group between two calls to
`env.step()` then the default action will be all zeros.
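Putting these methods together, a minimal interaction loop might look like the following sketch. It assumes a single behavior and a binary named `3DBall` (an illustrative assumption), and uses `ActionSpec.random_action` to generate actions with valid shapes:
```python
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="3DBall", seed=1, side_channels=[])
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for episode in range(3):
    env.reset()
    done = False
    while not done:
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        # Agents listed in terminal_steps just finished their episode.
        if len(terminal_steps) > 0:
            done = True
        if len(decision_steps) > 0:
            # Random ActionTuple whose shapes match the behavior's ActionSpec.
            action = spec.action_spec.random_action(len(decision_steps))
            env.set_actions(behavior_name, action)
        env.step()
env.close()
```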
#### DecisionSteps and DecisionStep
`DecisionSteps` (with `s`) contains information about a whole batch of Agents
while `DecisionStep` (no `s`) only contains information about a single Agent.
A `DecisionSteps` has the following fields:
- `obs` is a list of numpy array observations collected by the group of agents.
  The first dimension of each array corresponds to the batch size of the group
  (the number of agents requesting a decision since the last call to `env.step()`).
- `reward` is a float vector of length batch size. Corresponds to the rewards
collected by each agent since the last simulation step.
- `agent_id` is an int vector of length batch size containing the unique
  identifiers of the corresponding Agents. This is used to track Agents across
  simulation steps.
- `action_mask` is an optional list of two dimensional arrays of booleans which is only
available when using multi-discrete actions. Each array corresponds to an
action branch. The first dimension of each array is the batch size and the
second contains a mask for each action of the branch. If true, the action is
not available for the agent during this simulation step.
It also has the following two methods:
- `len(DecisionSteps)` Returns the number of agents requesting a decision since
the last call to `env.step()`.
- `DecisionSteps[agent_id]` Returns a `DecisionStep` for the Agent with the
`agent_id` unique identifier.
A `DecisionStep` has the following fields:
- `obs` is a list of numpy array observations collected by the agent. (Each
array has one less dimension than the arrays in `DecisionSteps`)
- `reward` is a float. Corresponds to the rewards collected by the agent since
the last simulation step.
- `agent_id` is an int and a unique identifier for the corresponding Agent.
- `action_mask` is an optional list of one dimensional arrays of booleans which is only
available when using multi-discrete actions. Each array corresponds to an
action branch. Each array contains a mask for each action of the branch. If
true, the action is not available for the agent during this simulation step.
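For instance, the batch-level and per-agent views can be combined as in this sketch (continuing from an `env` and `behavior_name` set up as in the loop above):
```python
decision_steps, terminal_steps = env.get_steps(behavior_name)

# Batch view: the first observation array has shape (num_agents, *obs_shape).
print(decision_steps.obs[0].shape)
print(decision_steps.reward)    # shape (num_agents,)
print(decision_steps.agent_id)  # shape (num_agents,)

# Per-agent view: index by agent_id to get a single DecisionStep.
for agent_id in decision_steps.agent_id:
    step = decision_steps[agent_id]
    print(agent_id, step.reward, [o.shape for o in step.obs])
```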
#### TerminalSteps and TerminalStep
Similarly to `DecisionSteps` and `DecisionStep`, `TerminalSteps` (with `s`)
contains information about a whole batch of Agents while `TerminalStep` (no `s`)
only contains information about a single Agent.
A `TerminalSteps` has the following fields:
- `obs` is a list of numpy array observations collected by the group of agents.
  The first dimension of each array corresponds to the batch size of the group
  (the number of agents whose episode ended since the last call to `env.step()`).
- `reward` is a float vector of length batch size. Corresponds to the rewards
collected by each agent since the last simulation step.
- `agent_id` is an int vector of length batch size containing the unique
  identifiers of the corresponding Agents. This is used to track Agents across
  simulation steps.
- `interrupted` is an array of booleans of length batch size. Is true if the
associated Agent was interrupted since the last decision step. For example,
if the Agent reached the maximum number of steps for the episode.
It also has the following two methods:
- `len(TerminalSteps)` Returns the number of agents whose episode ended since
  the last call to `env.step()`.
- `TerminalSteps[agent_id]` Returns a `TerminalStep` for the Agent with the
`agent_id` unique identifier.
A `TerminalStep` has the following fields:
- `obs` is a list of numpy array observations collected by the agent. (Each
array has one less dimension than the arrays in `TerminalSteps`)
- `reward` is a float. Corresponds to the rewards collected by the agent since
the last simulation step.
- `agent_id` is an int and a unique identifier for the corresponding Agent.
- `interrupted` is a bool. Is true if the Agent was interrupted since the last
decision step. For example, if the Agent reached the maximum number of steps for
the episode.
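A common use of `interrupted` is deciding whether to bootstrap the value of the final state when training; the sketch below (reusing `env` and `behavior_name` from the examples above) simply reads the flag:
```python
decision_steps, terminal_steps = env.get_steps(behavior_name)

for agent_id in terminal_steps.agent_id:
    step = terminal_steps[agent_id]
    if step.interrupted:
        # Episode was cut short (e.g. max steps reached), not a natural ending.
        print(f"Agent {agent_id} was interrupted; last reward {step.reward}")
    else:
        print(f"Agent {agent_id} ended its episode; last reward {step.reward}")
```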
#### BehaviorSpec
A `BehaviorSpec` has the following fields:
- `observation_specs` is a List of `ObservationSpec` objects. Each `ObservationSpec`
  corresponds to an observation's properties: `shape` is a tuple of ints that
  corresponds to the shape of the observation (without the number of agents dimension),
  `dimension_property` is a tuple of flags containing extra information about how the
  data should be processed in the corresponding dimension, and `observation_type` is an
  enum corresponding to what type of observation is generating the data (i.e., default,
  goal, etc.). Note that the `ObservationSpec` objects have the same ordering as the
  observations in `DecisionSteps`, `DecisionStep`, `TerminalSteps` and `TerminalStep`.
- `action_spec` is an `ActionSpec` namedtuple that defines the number and types
of actions for the Agent.
An `ActionSpec` has the following fields and properties:
- `continuous_size` is the number of floats that constitute the continuous actions.
- `discrete_size` is the number of branches (the number of independent actions) that
constitute the multi-discrete actions.
- `discrete_branches` is a Tuple of ints. Each int corresponds to the number of
  different options for each branch of the action. For example: in a game with a
  direction input (no movement, left, right) and a jump input (no jump, jump),
  there will be two branches (direction and jump), the first with 3 options and
  the second with 2 options (`discrete_size = 2` and `discrete_branches = (3, 2)`).
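As an illustration, the sketch below builds an all-zeros `ActionTuple` by hand to match a behavior's `ActionSpec` (in practice, `ActionSpec` also provides `empty_action` and `random_action` helpers; `env` and `behavior_name` are assumed set up as in the earlier examples):
```python
import numpy as np
from mlagents_envs.base_env import ActionTuple

decision_steps, _ = env.get_steps(behavior_name)
num_agents = len(decision_steps)
action_spec = env.behavior_specs[behavior_name].action_spec

# Continuous part: shape (num_agents, continuous_size), dtype float32.
continuous = np.zeros((num_agents, action_spec.continuous_size), dtype=np.float32)
# Discrete part: shape (num_agents, discrete_size), dtype int32; zeros select
# the first option of every branch.
discrete = np.zeros((num_agents, action_spec.discrete_size), dtype=np.int32)

env.set_actions(behavior_name, ActionTuple(continuous=continuous, discrete=discrete))
```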
### Communicating additional information with the Environment
In addition to the means of communicating between Unity and Python described
above, we also provide methods for sharing agent-agnostic information. These
additional methods are referred to as side channels. ML-Agents includes two
ready-made side channels, described below. It is also possible to create custom
side channels to communicate any additional data between a Unity environment and
Python. Instructions for creating custom side channels can be found
[here](Custom-SideChannels.md).
Side channels exist as separate classes which are instantiated and then passed
as a list to the `side_channels` argument of the constructor of the
`UnityEnvironment` class.
```python
channel = MyChannel()
env = UnityEnvironment(side_channels=[channel])
```
**Note**: A side channel will only send/receive messages when `env.step()` or
`env.reset()` is called.
#### EngineConfigurationChannel
The `EngineConfigurationChannel` side channel allows you to modify the time-scale,
resolution, and graphics quality of the environment. This can be useful for
adjusting the environment to perform better during training, or be more
interpretable during inference.
`EngineConfigurationChannel` has two methods:
- `set_configuration_parameters` which takes the following arguments:
- `width`: Defines the width of the display. (Must be set alongside height)
- `height`: Defines the height of the display. (Must be set alongside width)
- `quality_level`: Defines the quality level of the simulation.
- `time_scale`: Defines the multiplier for the deltatime in the simulation. If
set to a higher value, time will pass faster in the simulation but the
physics may perform unpredictably.
- `target_frame_rate`: Instructs simulation to try to render at a specified
frame rate.
  - `capture_frame_rate`: Instructs the simulation to consider time between
updates to always be constant, regardless of the actual frame rate.
- `set_configuration`, which takes a `config` argument of type `EngineConfig`, a
  NamedTuple.
For example, the following code would adjust the time-scale of the simulation to
be 2x realtime.
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
channel = EngineConfigurationChannel()
env = UnityEnvironment(side_channels=[channel])
channel.set_configuration_parameters(time_scale=2.0)
env.reset()
...
```
#### EnvironmentParameters
The `EnvironmentParametersChannel` side channel allows you to set pre-defined
numerical values in the environment. This can be useful for adjusting
environment-specific settings. You can call `set_float_parameter` on the side
channel to write properties from Python and read them in C# as shown below.
`EnvironmentParametersChannel` has one method:
- `set_float_parameter` Sets a float parameter in the Unity Environment.
- key: The string identifier of the property.
- value: The float value of the property.
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.environment_parameters_channel import EnvironmentParametersChannel
channel = EnvironmentParametersChannel()
env = UnityEnvironment(side_channels=[channel])
channel.set_float_parameter("parameter_1", 2.0)
env.reset()
...
```
Once a property has been modified in Python, you can access it in C# after the
next call to `step` as follows:
```csharp
var envParameters = Academy.Instance.EnvironmentParameters;
float property1 = envParameters.GetWithDefault("parameter_1", 0.0f);
```
#### Custom side channels
For information on how to make custom side channels for sending additional data
types, see the documentation [here](Custom-SideChannels.md).