|
# Designing a Learning Environment |
|
|
|
This page contains general advice on how to design your learning environment,
along with an overview of the aspects of the ML-Agents Unity SDK that pertain
to setting up your scene and simulation, as opposed to designing your agents
within the scene. We have a dedicated page on
[Designing Agents](Learning-Environment-Design-Agents.md) which covers how to
instrument observations, actions and rewards, define teams for multi-agent
scenarios, and record agent demonstrations for imitation learning.
|
|
|
To help on-board to the entire set of functionality provided by the ML-Agents
Toolkit, we recommend exploring our [API documentation](API-Reference.md).
Additionally, our [example environments](Learning-Environment-Examples.md) are
a great resource as they provide sample usage of almost all of our features.
|
|
|
## The Simulation and Training Process |
|
|
|
Training and simulation proceed in steps orchestrated by the ML-Agents Academy
class. The Academy works with Agent objects in the scene to step through the
simulation.
|
|
|
During training, the external Python training process communicates with the
Academy to run a series of episodes while it collects data and optimizes its
neural network model. When training is completed successfully, you can add the
trained model file to your Unity project for later use.
|
|
|
The ML-Agents Academy class orchestrates the agent simulation loop as follows: |
|
|
|
1. Calls your Academy's `OnEnvironmentReset` delegate.
1. Calls the `OnEpisodeBegin()` function for each Agent in the scene.
1. Gathers information about the scene. This is done by calling the
   `CollectObservations(VectorSensor sensor)` function for each Agent in the
   scene, as well as updating their sensors and collecting the resulting
   observations.
1. Uses each Agent's Policy to decide on the Agent's next action.
1. Calls the `OnActionReceived()` function for each Agent in the scene, passing
   in the action chosen by the Agent's Policy.
1. Calls the Agent's `OnEpisodeBegin()` function if the Agent has reached its
   `Max Step` count or has otherwise ended its episode by calling
   `EndEpisode()`.
|
|
|
To create a training environment, extend the Agent class to implement the
above methods. Whether you need to implement all of them depends on your
specific scenario.
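
As an illustration, here is a minimal sketch of such an Agent subclass. The
class name `RollerAgent` and the `target` field are hypothetical, and the
`ActionBuffers`-based `OnActionReceived` signature is assumed (older versions
of the package use a different action signature):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class RollerAgent : Agent
{
    public Transform target;  // hypothetical: a goal object assigned in the Inspector

    public override void OnEpisodeBegin()
    {
        // Return the Agent to its starting state for the new episode.
        transform.localPosition = Vector3.zero;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Observe the Agent's and the target's positions (6 floats total).
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(target.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Apply the first two continuous actions as movement on the X/Z plane.
        var move = new Vector3(actions.ContinuousActions[0], 0f,
                               actions.ContinuousActions[1]);
        transform.localPosition += move * Time.deltaTime;

        // Reward reaching the target and end the episode.
        if (Vector3.Distance(transform.localPosition, target.localPosition) < 1.5f)
        {
            SetReward(1.0f);
            EndEpisode();
        }
    }
}
```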
|
|
|
## Organizing the Unity Scene |
|
|
|
To train and use the ML-Agents Toolkit in a Unity scene, the scene must
contain as many Agent subclasses as you need. Each Agent instance should be
attached to the GameObject representing that Agent.
|
|
|
### Academy |
|
|
|
The Academy is a singleton which orchestrates Agents and their decision-making
processes. Only a single Academy exists at a time.
|
|
|
#### Academy resetting |
|
|
|
To alter the environment at the start of each episode, add your method to the
Academy's `OnEnvironmentReset` action.
|
|
|
```csharp
using Unity.MLAgents;
using UnityEngine;

public class MySceneBehavior : MonoBehaviour
{
    public void Awake()
    {
        // Register a callback that runs at the start of each episode.
        Academy.Instance.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // Reset the scene here
    }
}
```
|
|
|
For example, you might want to reset an Agent to its starting position or move
a goal to a random position. An environment resets when the `reset()` method
is called on the Python `UnityEnvironment`.
|
|
|
When you reset an environment, consider the factors that should change so that
training is generalizable to different conditions. For example, if you were
training a maze-solving agent, you would probably want to change the maze
itself for each training episode. Otherwise, the agent would probably learn to
solve only that one, particular maze, not mazes in general.
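
As a sketch of this idea, the reset handler shown above could randomize the
parts of the scene that should vary between episodes. The `MazeSceneBehavior`
name and the `goal` field here are hypothetical:

```csharp
using Unity.MLAgents;
using UnityEngine;

public class MazeSceneBehavior : MonoBehaviour
{
    public Transform goal;  // hypothetical: assigned in the Inspector

    public void Awake()
    {
        Academy.Instance.OnEnvironmentReset += EnvironmentReset;
    }

    void EnvironmentReset()
    {
        // Randomize conditions the agent should generalize over,
        // e.g. the goal position.
        goal.localPosition = new Vector3(Random.Range(-4f, 4f), 0.5f,
                                         Random.Range(-4f, 4f));
    }
}
```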
|
|
|
### Multiple Areas |
|
|
|
In many of the example environments, many copies of the training area are
instantiated in the scene. This generally speeds up training, allowing the
environment to gather many experiences in parallel. This can be achieved
simply by instantiating many Agents with the same Behavior Name. If possible,
consider designing your scene to support multiple areas.
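
One possible approach is a small helper that instantiates copies of a
training-area prefab at startup. This is only a sketch; the
`TrainingAreaSpawner` class and its fields are hypothetical:

```csharp
using UnityEngine;

public class TrainingAreaSpawner : MonoBehaviour
{
    public GameObject areaPrefab;  // hypothetical: contains ground, goal and Agent
    public int count = 8;
    public float spacing = 20f;

    void Awake()
    {
        // Lay the copies out in a row. The Agents in each copy share the same
        // Behavior Name, so they all feed experiences to the same Policy.
        for (var i = 0; i < count; i++)
        {
            Instantiate(areaPrefab, new Vector3(i * spacing, 0f, 0f),
                        Quaternion.identity);
        }
    }
}
```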
|
|
|
Check out our example environments to see examples of multiple areas.
Additionally, the
[Making a New Learning Environment](Learning-Environment-Create-New.md#optional-multiple-training-areas-within-the-same-scene)
guide demonstrates this option.
|
|
|
## Environments |
|
|
|
When you create a training environment in Unity, you must set up the scene so
that it can be controlled by the external training process. Considerations
include:
|
|
|
- The training scene must start automatically when your Unity application is
  launched by the training process.
- The Academy must reset the scene to a valid starting point for each episode
  of training.
- A training episode must have a definite end, either by using `Max Step` or
  by each Agent ending its episode manually with `EndEpisode()`.
|
|
|
## Environment Parameters |
|
|
|
Curriculum learning and environment parameter randomization are two training
methods that control specific parameters in your environment. As such, it is
important to ensure that your environment parameters are updated at each step
to the correct values. To enable this, we expose an `EnvironmentParameters` C#
class that you can use to retrieve the values of the parameters defined in the
training configurations for both of those features. Please see our
[documentation](Training-ML-Agents.md#environment-parameters)
for curriculum learning and environment parameter randomization for details.
|
|
|
We recommend modifying the environment from the Agent's `OnEpisodeBegin()`
function by leveraging `Academy.Instance.EnvironmentParameters`. See the
WallJump example environment for a sample usage (specifically,
[WallJumpAgent.cs](../Project/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)).
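
For instance, an Agent could read a curriculum-controlled parameter at the
start of each episode. Below is a minimal sketch assuming a `wall_height`
parameter defined in the training configuration; the `WallAgent` name and the
`wall` field are illustrative:

```csharp
using Unity.MLAgents;
using UnityEngine;

public class WallAgent : Agent
{
    public Transform wall;  // hypothetical: scaled by a curriculum parameter

    public override void OnEpisodeBegin()
    {
        // Read the current value of a parameter defined in the training
        // configuration. The second argument is the default used when the
        // parameter is not set (e.g. during inference).
        var height = Academy.Instance.EnvironmentParameters
            .GetWithDefault("wall_height", 1.0f);
        wall.localScale = new Vector3(wall.localScale.x, height,
                                      wall.localScale.z);
    }
}
```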
|
|
|
## Agent |
|
|
|
The Agent class represents an actor in the scene that collects observations
and carries out actions. The Agent class is typically attached to the
GameObject in the scene that otherwise represents the actor, for example to a
player object in a football game or a car object in a vehicle simulation.
Every Agent must have appropriate `Behavior Parameters`.
|
|
|
Generally, when creating an Agent, you should extend the Agent class and
implement the `CollectObservations(VectorSensor sensor)` and
`OnActionReceived()` methods:
|
|
|
- `CollectObservations(VectorSensor sensor)` - Collects the Agent's
  observations of its environment.
- `OnActionReceived()` - Carries out the action chosen by the Agent's Policy
  and assigns a reward to the current state.
|
|
|
Your implementations of these functions determine how the Behavior Parameters
assigned to this Agent must be set.
|
|
|
You must also determine how an Agent finishes its task or times out. You can
manually terminate an Agent episode in your `OnActionReceived()` function when
the Agent has finished (or irrevocably failed) its task by calling the
`EndEpisode()` function. You can also set the Agent's `Max Step` property to a
positive value, in which case the Agent considers the episode over after it
has taken that many steps. You can use the `Agent.OnEpisodeBegin()` function
to prepare the Agent to start again.
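
A minimal sketch of manual termination follows; the `PlatformAgent` name and
the fall-off condition are illustrative, and the `ActionBuffers`-based
signature is assumed as above:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class PlatformAgent : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        // ... carry out the chosen action here ...

        if (transform.localPosition.y < -1f)
        {
            // The Agent fell off the platform: penalize and end the episode.
            SetReward(-1.0f);
            EndEpisode();  // OnEpisodeBegin() runs before the next episode
        }
    }
}
```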
|
|
|
See [Agents](Learning-Environment-Design-Agents.md) for detailed information
about programming your own Agents.
|
|
|
## Recording Statistics |
|
|
|
We offer developers a mechanism to record statistics from within their Unity
environments. These statistics are aggregated and reported during the training
process. To record statistics, use the `StatsRecorder` C# class.
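
A minimal sketch of recording a statistic; the `ScoreTracker` class and the
`"Score"` key are illustrative:

```csharp
using Unity.MLAgents;

public class ScoreTracker
{
    public void RecordScore(float score)
    {
        // Values written under the same key are aggregated (here, averaged)
        // over the summary period and reported with the training statistics.
        Academy.Instance.StatsRecorder.Add("Score", score,
                                           StatAggregationMethod.Average);
    }
}
```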
|
|
|
See the FoodCollector example environment for a sample usage (specifically,
[FoodCollectorSettings.cs](../Project/Assets/ML-Agents/Examples/FoodCollector/Scripts/FoodCollectorSettings.cs)).
|
|