| # Getting Started Guide | |
| This guide walks through the end-to-end process of opening one of our | |
| [example environments](Learning-Environment-Examples.md) in Unity, training an | |
| Agent in it, and embedding the trained model into the Unity environment. After | |
| reading this tutorial, you should be able to train any of the example | |
| environments. If you are not familiar with the | |
| [Unity Engine](https://unity3d.com/unity), view our | |
| [Background: Unity](Background-Unity.md) page for helpful pointers. | |
| Additionally, if you're not familiar with machine learning, view our | |
| [Background: Machine Learning](Background-Machine-Learning.md) page for a brief | |
| overview and helpful pointers. | |
|  | |
| For this guide, we'll use the **3D Balance Ball** environment which contains a | |
| number of agent cubes and balls (which are all copies of each other). Each agent | |
| cube tries to keep its ball from falling by rotating either horizontally or | |
| vertically. In this environment, an agent cube is an **Agent** that receives a | |
| reward for every step that it balances the ball. An agent is also penalized with | |
| a negative reward for dropping the ball. The goal of the training process is to | |
| have the agents learn to balance the ball on their head. | |
| Let's get started! | |
| ## Installation | |
| If you haven't already, follow the [installation instructions](Installation.md). | |
| Afterwards, open the Unity Project that contains all the example environments: | |
| 1. Open the Package Manager Window by navigating to `Window -> Package Manager` | |
| in the menu. | |
| 1. Navigate to the ML-Agents Package and click on it. | |
| 1. Find the `3D Ball` sample and click `Import`. | |
| 1. In the **Project** window, go to the | |
| `Assets/ML-Agents/Examples/3DBall/Scenes` folder and open the `3DBall` scene | |
| file. | |
| ## Understanding a Unity Environment | |
| An agent is an autonomous actor that observes and interacts with an | |
| _environment_. In the context of Unity, an environment is a scene containing one | |
| or more Agent objects, and, of course, the other entities that an agent | |
| interacts with. | |
|  | |
| **Note:** In Unity, the base object of everything in a scene is the | |
| _GameObject_. The GameObject is essentially a container for everything else, | |
| including behaviors, graphics, physics, etc. To see the components that make up | |
| a GameObject, select the GameObject in the Scene window, and open the Inspector | |
| window. The Inspector shows every component on a GameObject. | |
| The first thing you may notice after opening the 3D Balance Ball scene is that | |
| it contains not one, but several agent cubes. Each agent cube in the scene is an | |
| independent agent, but they all share the same Behavior. 3D Balance Ball does | |
| this to speed up training since all twelve agents contribute to training in | |
| parallel. | |
| ### Agent | |
| The Agent is the actor that observes and takes actions in the environment. In | |
| the 3D Balance Ball environment, the Agent components are placed on the twelve | |
| "Agent" GameObjects. The base Agent object has a few properties that affect its | |
| behavior: | |
| - **Behavior Parameters** — Every Agent must have a Behavior. The Behavior | |
| determines how an Agent makes decisions. | |
| - **Max Step** — Defines how many simulation steps can occur before the Agent's | |
| episode ends. In 3D Balance Ball, an Agent restarts after 5000 steps. | |
| #### Behavior Parameters : Vector Observation Space | |
| Before making a decision, an agent collects its observation about its state in | |
| the world. The vector observation is a vector of floating point numbers which | |
| contain relevant information for the agent to make decisions. | |
| The Behavior Parameters of the 3D Balance Ball example uses a `Space Size` of 8. | |
| This means that the feature vector containing the Agent's observations contains | |
| eight elements: the `x` and `z` components of the agent cube's rotation and the | |
| `x`, `y`, and `z` components of the ball's relative position and velocity. | |
| #### Behavior Parameters : Actions | |
| An Agent is given instructions in the form of actions. | |
| ML-Agents Toolkit classifies actions into two types: continuous and discrete. | |
| The 3D Balance Ball example is programmed to use continuous actions, which | |
| are a vector of floating-point numbers that can vary continuously. More specifically, | |
| it uses a `Space Size` of 2 to control the amount of `x` and `z` rotations to apply to | |
| itself to keep the ball balanced on its head. | |
| ## Running a pre-trained model | |
| We include pre-trained models for our agents (`.onnx` files) and we use the | |
| [Unity Inference Engine](Unity-Inference-Engine.md) to run these models inside | |
| Unity. In this section, we will use the pre-trained model for the 3D Ball | |
| example. | |
| 1. In the **Project** window, go to the | |
| `Assets/ML-Agents/Examples/3DBall/Prefabs` folder. Expand `3DBall` and click | |
| on the `Agent` prefab. You should see the `Agent` prefab in the **Inspector** | |
| window. | |
| **Note**: The platforms in the `3DBall` scene were created using the `3DBall` | |
| prefab. Instead of updating all 12 platforms individually, you can update the | |
| `3DBall` prefab instead. | |
|  | |
| 1. In the **Project** window, drag the **3DBall** Model located in | |
| `Assets/ML-Agents/Examples/3DBall/TFModels` into the `Model` property under | |
| `Behavior Parameters (Script)` component in the Agent GameObject | |
| **Inspector** window. | |
|  | |
| 1. You should notice that each `Agent` under each `3DBall` in the **Hierarchy** | |
| windows now contains **3DBall** as `Model` on the `Behavior Parameters`. | |
| **Note** : You can modify multiple game objects in a scene by selecting them | |
| all at once using the search bar in the Scene Hierarchy. | |
| 1. Set the **Inference Device** to use for this model as `CPU`. | |
| 1. Click the **Play** button in the Unity Editor and you will see the platforms | |
| balance the balls using the pre-trained model. | |
| ## Training a new model with Reinforcement Learning | |
| While we provide pre-trained models for the agents in this environment, any | |
| environment you make yourself will require training agents from scratch to | |
| generate a new model file. In this section we will demonstrate how to use the | |
| reinforcement learning algorithms that are part of the ML-Agents Python package | |
| to accomplish this. We have provided a convenient command `mlagents-learn` which | |
| accepts arguments used to configure both training and inference phases. | |
| ### Training the environment | |
| 1. Open a command or terminal window. | |
| 1. Navigate to the folder where you cloned the `ml-agents` repository. **Note**: | |
| If you followed the default [installation](Installation.md), then you should | |
| be able to run `mlagents-learn` from any directory. | |
| 1. Run `mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun`. | |
| - `config/ppo/3DBall.yaml` is the path to a default training | |
| configuration file that we provide. The `config/ppo` folder includes training configuration | |
| files for all our example environments, including 3DBall. | |
| - `run-id` is a unique name for this training session. | |
| 1. When the message _"Start training by pressing the Play button in the Unity | |
| Editor"_ is displayed on the screen, you can press the **Play** button in | |
| Unity to start training in the Editor. | |
| If `mlagents-learn` runs correctly and starts training, you should see something | |
| like this: | |
| ```console | |
| INFO:mlagents_envs: | |
| 'Ball3DAcademy' started successfully! | |
| Unity Academy name: Ball3DAcademy | |
| INFO:mlagents_envs:Connected new brain: | |
| Unity brain name: 3DBallLearning | |
| Number of Visual Observations (per agent): 0 | |
| Vector Observation space size (per agent): 8 | |
| Number of stacked Vector Observation: 1 | |
| INFO:mlagents_envs:Hyperparameters for the PPO Trainer of brain 3DBallLearning: | |
| batch_size: 64 | |
| beta: 0.001 | |
| buffer_size: 12000 | |
| epsilon: 0.2 | |
| gamma: 0.995 | |
| hidden_units: 128 | |
| lambd: 0.99 | |
| learning_rate: 0.0003 | |
| max_steps: 5.0e4 | |
| normalize: True | |
| num_epoch: 3 | |
| num_layers: 2 | |
| time_horizon: 1000 | |
| sequence_length: 64 | |
| summary_freq: 1000 | |
| use_recurrent: False | |
| memory_size: 256 | |
| use_curiosity: False | |
| curiosity_strength: 0.01 | |
| curiosity_enc_size: 128 | |
| output_path: ./results/first3DBallRun/3DBallLearning | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 1000. Mean Reward: 1.242. Std of Reward: 0.746. Training. | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 2000. Mean Reward: 1.319. Std of Reward: 0.693. Training. | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 3000. Mean Reward: 1.804. Std of Reward: 1.056. Training. | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 4000. Mean Reward: 2.151. Std of Reward: 1.432. Training. | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 5000. Mean Reward: 3.175. Std of Reward: 2.250. Training. | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 6000. Mean Reward: 4.898. Std of Reward: 4.019. Training. | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 7000. Mean Reward: 6.716. Std of Reward: 5.125. Training. | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 8000. Mean Reward: 12.124. Std of Reward: 11.929. Training. | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 9000. Mean Reward: 18.151. Std of Reward: 16.871. Training. | |
| INFO:mlagents.trainers: first3DBallRun: 3DBallLearning: Step: 10000. Mean Reward: 27.284. Std of Reward: 28.667. Training. | |
| ``` | |
| Note how the `Mean Reward` value printed to the screen increases as training | |
| progresses. This is a positive sign that training is succeeding. | |
| **Note**: You can train using an executable rather than the Editor. To do so, | |
| follow the instructions in | |
| [Using an Executable](Learning-Environment-Executable.md). | |
| ### Observing Training Progress | |
| Once you start training using `mlagents-learn` in the way described in the | |
| previous section, the `ml-agents` directory will contain a `results` | |
| directory. In order to observe the training process in more detail, you can use | |
| TensorBoard. From the command line run: | |
| ```sh | |
| tensorboard --logdir results | |
| ``` | |
| Then navigate to `localhost:6006` in your browser to view the TensorBoard | |
| summary statistics as shown below. For the purposes of this section, the most | |
| important statistic is `Environment/Cumulative Reward` which should increase | |
| throughout training, eventually converging close to `100` which is the maximum | |
| reward the agent can accumulate. | |
|  | |
| ## Embedding the model into the Unity Environment | |
| Once the training process completes, and the training process saves the model | |
| (denoted by the `Saved Model` message) you can add it to the Unity project and | |
| use it with compatible Agents (the Agents that generated the model). **Note:** | |
| Do not just close the Unity Window once the `Saved Model` message appears. | |
| Either wait for the training process to close the window or press `Ctrl+C` at | |
| the command-line prompt. If you close the window manually, the `.onnx` file | |
| containing the trained model is not exported into the ml-agents folder. | |
| If you've quit the training early using `Ctrl+C` and want to resume training, | |
| run the same command again, appending the `--resume` flag: | |
| ```sh | |
| mlagents-learn config/ppo/3DBall.yaml --run-id=first3DBallRun --resume | |
| ``` | |
| Your trained model will be at `results/<run-identifier>/<behavior_name>.onnx` where | |
| `<behavior_name>` is the name of the `Behavior Name` of the agents corresponding | |
| to the model. This file corresponds to your model's latest checkpoint. You can | |
| now embed this trained model into your Agents by following the steps below, | |
| which is similar to the steps described [above](#running-a-pre-trained-model). | |
| 1. Move your model file into | |
| `Project/Assets/ML-Agents/Examples/3DBall/TFModels/`. | |
| 1. Open the Unity Editor, and select the **3DBall** scene as described above. | |
| 1. Select the **3DBall** prefab Agent object. | |
| 1. Drag the `<behavior_name>.onnx` file from the Project window of the Editor to | |
| the **Model** placeholder in the **Ball3DAgent** inspector window. | |
| 1. Press the **Play** button at the top of the Editor. | |
| ## Next Steps | |
| - For more information on the ML-Agents Toolkit, in addition to helpful | |
| background, check out the [ML-Agents Toolkit Overview](ML-Agents-Overview.md) | |
| page. | |
| - For a "Hello World" introduction to creating your own Learning Environment, | |
| check out the | |
| [Making a New Learning Environment](Learning-Environment-Create-New.md) page. | |
| - For an overview on the more complex example environments that are provided in | |
| this toolkit, check out the | |
| [Example Environments](Learning-Environment-Examples.md) page. | |
| - For more information on the various training options available, check out the | |
| [Training ML-Agents](Training-ML-Agents.md) page. | |