|
# Unity ML-Agents PettingZoo Wrapper |
|
|
|
With the increasing interest in multi-agent training with a gym-like API, we provide a wrapper that exposes Unity environments through the [PettingZoo API](https://www.pettingzoo.ml/). The wrapper is built on top of our `UnityEnvironment` class, which is the default way of interfacing with a Unity environment via Python.
|
|
|
## Installation and Examples |
|
|
|
The PettingZoo wrapper is part of the `mlagents_envs` package. Please refer to the [mlagents_envs installation instructions](ML-Agents-Envs-README.md).
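
In most setups, installing the package from PyPI is sufficient (note that the published package name uses a hyphen):

```bash
pip install mlagents-envs
```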
|
|
|
[[Colab] PettingZoo Wrapper Example](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/develop-python-api-ga/ml-agents-envs/colabs/Colab_PettingZoo.ipynb) |
|
|
|
This colab notebook demonstrates example usage of the wrapper, including installation, basic usage, and an example with our [Striker vs Goalie environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#strikers-vs-goalie), a multi-agent environment with multiple different behavior names.
|
|
|
## API Interface
|
|
|
This wrapper is compatible with the PettingZoo API. Please check out the [PettingZoo API page](https://www.pettingzoo.ml/api) for more details. Here's an example of interacting with a wrapped environment:
|
|
|
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs import UnityToPettingZooWrapper

# Launch the Unity executable and wrap it with the PettingZoo interface.
unity_env = UnityEnvironment("StrikersVsGoalie")
env = UnityToPettingZooWrapper(unity_env)
env.reset()
for agent in env.agent_iter():
    # last() returns the observation, reward, done flag, and info dict
    # for the agent that is about to act.
    observation, reward, done, info = env.last()
    # `policy` is a user-supplied function; per the PettingZoo API,
    # done agents must be stepped with a None action.
    action = None if done else policy(observation, agent)
    env.step(action)
```
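
Here, `policy` stands in for any function that maps an agent's observation (and, optionally, the agent's name) to a valid action for that agent; in practice you would replace it with your trained model's inference call.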
|
|
|
## Notes |
|
- Both the [AEC](https://www.pettingzoo.ml/api#interacting-with-environments) and [Parallel](https://www.pettingzoo.ml/api#parallel-api) PettingZoo APIs are supported (a Parallel-style loop is sketched after this list).
- The AEC wrapper is compatible with the PettingZoo (PZ) API interface but works in a slightly different way under the hood. Instead of stepping the environment on every `env.step(action)`, the PZ wrapper stores the action and only performs an environment step once every agent requesting an action in the current step has been assigned one. This is done for performance: communication between Unity and Python is more efficient when data is sent in batches.
- Since the AEC wrapper stores actions without applying them to the environment until all of them are queued, some parts of the API might behave in unexpected ways. For example, a call to `env.reward` should return the instantaneous reward for that particular step, but the true reward is only available once an actual environment step is performed. It's recommended that you follow the API definition for training (access rewards from `env.last()` instead of `env.reward`); then the underlying mechanism shouldn't affect training results.
- The environment automatically resets when it is done, so `env.agent_iter(max_step)` keeps iterating until the specified max step is reached (default: `2**63`). There is no need to call `env.reset()` except once, right after instantiating the environment.
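
For reference, here is a minimal sketch of the same environment driven through the Parallel API. The class name `UnityParallelEnv` and its import path are assumptions (check your installed `mlagents_envs` version for the exact name), and `policies` is a hypothetical mapping from agent name to a policy function:

```python
from mlagents_envs.environment import UnityEnvironment
# Assumed import path for the parallel wrapper; it may differ across
# mlagents_envs versions.
from mlagents_envs.envs import UnityParallelEnv

unity_env = UnityEnvironment("StrikersVsGoalie")
env = UnityParallelEnv(unity_env)
observations = env.reset()
while env.agents:  # `agents` empties once the episode is done
    # In the Parallel API every agent acts simultaneously: actions are
    # passed as a dict keyed by agent name, and step() returns per-agent
    # dicts of observations, rewards, done flags, and infos.
    actions = {
        agent: policies[agent](observations[agent]) for agent in env.agents
    }
    observations, rewards, dones, infos = env.step(actions)
```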
|
|
|
|