Unity ML-Agents Custom Trainers Plugin

To bring a wider variety of reinforcement learning algorithms to our users, we have added custom trainer capabilities. We introduce an extensible plugin system for defining new trainers based on the high-level trainer API in the ML-Agents package. This allows you to reroute the mlagents-learn CLI to custom trainers and to extend the config files with hyperparameters specific to your new trainers. We expose high-level extensible trainer (both on-policy and off-policy), optimizer, and hyperparameter classes, with documentation for the use of this plugin. For more information on how the Python plugin system works, see Plugin interfaces.

Overview

Model-free RL algorithms generally fall into two broad categories: on-policy and off-policy. On-policy algorithms perform updates based on data gathered from the current policy. Off-policy algorithms learn a Q function from a buffer of previous data, then use this Q function to make decisions. Off-policy algorithms have two key benefits in the context of ML-Agents. First, they tend to use fewer samples than on-policy algorithms, as they can pull and re-use data from the buffer many times. Second, they allow player demonstrations to be inserted in-line with RL data into the buffer, enabling new ways of doing imitation learning by streaming player data.
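The sample re-use and demonstration mixing described above can be illustrated with a minimal replay buffer sketch. This is not ML-Agents code: the `ReplayBuffer` class, its methods, and the transition tuples are hypothetical names chosen only for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer holding RL and demonstration transitions."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self._items = deque(maxlen=capacity)

    def add(self, transition, from_demo=False):
        # Demonstration data is stored in-line with RL data, tagged by a flag.
        self._items.append((transition, from_demo))

    def sample(self, batch_size, rng=random):
        # Sampling does not remove items, so each transition can be drawn
        # again in later updates (the source of off-policy sample re-use).
        items = list(self._items)
        return rng.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self._items)


buffer = ReplayBuffer(capacity=1000)
buffer.add(("obs_0", "action_0", 0.0, "obs_1"))                   # collected by the policy
buffer.add(("obs_a", "action_a", 1.0, "obs_b"), from_demo=True)   # recorded from a player
batch = buffer.sample(batch_size=2)
```

An on-policy trainer, by contrast, would discard its collected data after each update instead of keeping it in a long-lived buffer.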

To add new custom trainers to ML-Agents, you need to create a new Python package. To give you an idea of how to structure your package, we have created a mlagents_trainer_plugin package as an example, with implementations of the A2C and DQN algorithms. You need a setup.py file to list extra requirements and to register the new RL algorithm in the ML-Agents ecosystem, so that you can call the mlagents-learn CLI with your customized configuration.

├── mlagents_trainer_plugin
│    ├── __init__.py
│    ├── a2c
│    │    ├── __init__.py
│    │    ├── a2c_3DBall.yaml
│    │    ├── a2c_optimizer.py
│    │    └── a2c_trainer.py
│    └── dqn
│        ├── __init__.py
│        ├── dqn_basic.yaml
│        ├── dqn_optimizer.py
│        └── dqn_trainer.py
└── setup.py
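As a rough sketch of what the setup.py in this layout might contain, the snippet below registers the two example trainers as entry points. Several names here are assumptions rather than facts confirmed by this page: the entry point group string `mlagents.trainer_type`, the `get_type_and_setting` function in each trainer module, and the version number. Check them against the Plugin interfaces documentation and the example package before relying on them.

```python
# setup.py (sketch, see the assumptions stated above)

# Assumed name of the entry point group that ML-Agents scans for trainers.
ML_AGENTS_TRAINER_TYPE = "mlagents.trainer_type"


def plugin_setup_kwargs():
    """Build the keyword arguments to pass to setuptools.setup()."""
    return {
        "name": "mlagents_trainer_plugin",
        "version": "0.0.1",  # placeholder version
        # List any extra requirements of your trainers here.
        "install_requires": [],
        "entry_points": {
            ML_AGENTS_TRAINER_TYPE: [
                # "<trainer name>=<module path>:<function>"; the function name
                # get_type_and_setting is an assumption for illustration.
                "a2c=mlagents_trainer_plugin.a2c.a2c_trainer:get_type_and_setting",
                "dqn=mlagents_trainer_plugin.dqn.dqn_trainer:get_type_and_setting",
            ]
        },
    }


def main():
    # Imported here so the sketch can be inspected without setuptools installed.
    from setuptools import find_packages, setup

    setup(packages=find_packages(), **plugin_setup_kwargs())
```

In an actual setup.py, `main()` would be called at the bottom of the file so that pip runs it during installation.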

Installation and Execution

If you haven't already, follow the installation instructions. Once you have the ml-agents-envs and ml-agents packages installed, you can install the plugin package. From the repository's root directory, install ml-agents-trainer-plugin (or replace it with the name of your plugin folder).

pip3 install -e ./ml-agents-trainer-plugin

After these installations, your package is registered as an entry point, and you can use a config file with the new trainers:

mlagents-learn ml-agents-trainer-plugin/mlagents_trainer_plugin/a2c/a2c_3DBall.yaml --run-id <run-id-name> --env <env-executable>
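If mlagents-learn does not recognize the new trainer type, one way to check whether the plugin was registered is to list the installed entry points with the standard library. The sketch below assumes the entry point group is named `mlagents.trainer_type`; that group name and the helper `registered_trainer_types` are assumptions for illustration, not part of the ML-Agents API.

```python
from importlib import metadata


def registered_trainer_types(group="mlagents.trainer_type"):
    """Return the sorted names of entry points registered under `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with select().
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    print(registered_trainer_types())
```

If the plugin installed correctly, the printed list should include the names your setup.py registered; an empty list suggests the package is not installed in the active environment.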

Tutorial

Here’s a step-by-step tutorial on how to write a setup file and extend ml-agents trainers, optimizers, and hyperparameter settings.To extend ML-agents classes see references on trainers and Optimizer.