|
# Unity ML-Agents Custom Trainers Plugin
|
|
|
To bring a wider variety of reinforcement learning algorithms to our users, we have added custom trainer
capabilities. We introduce an extensible plugin system for defining new trainers based on the high-level trainer API
in the `ml-agents` package. This allows you to reroute the `mlagents-learn` CLI to custom trainers and to extend the
config files with hyperparameters specific to your new trainers. We expose high-level extensible trainer (both
on-policy and off-policy), optimizer, and hyperparameter classes, with documentation for the use of this plugin. For
more information on how the Python plugin system works, see [Plugin interfaces](Training-Plugins.md).
|
## Overview |
|
Model-free RL algorithms generally fall into two broad categories: on-policy and off-policy. On-policy algorithms perform updates based on data gathered from the current policy. Off-policy algorithms learn a Q function from a buffer of previous data, then use this Q function to make decisions. Off-policy algorithms have two key benefits in the context of ML-Agents: they tend to use fewer samples than on-policy algorithms, since they can pull and re-use data from the buffer many times, and they allow player demonstrations to be inserted in-line with RL data into the buffer, enabling new ways of doing imitation learning by streaming player data.
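
To make the distinction concrete, here is a minimal, purely illustrative Python sketch (none of these classes are part of the ML-Agents API) contrasting how the two families consume data:

```python
# Illustrative sketch only -- hypothetical names, not ML-Agents API.
import random
from collections import deque

class TinyPolicy:
    """Stand-in policy: just counts how many batches it has trained on."""
    def __init__(self):
        self.updates = 0

    def update(self, batch):
        self.updates += 1  # a real trainer would compute gradients here

class ReplayBuffer:
    """Fixed-size buffer that off-policy trainers sample from repeatedly."""
    def __init__(self, capacity=10_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        # Player-demonstration transitions can be added alongside agent data.
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

policy = TinyPolicy()
rollout = [("obs", "action", "reward")] * 64

# On-policy: each rollout from the *current* policy is used once, then discarded.
policy.update(rollout)

# Off-policy: transitions go into a buffer and are re-used across many updates,
# which is why off-policy trainers tend to need fewer environment samples.
buffer = ReplayBuffer()
for transition in rollout:
    buffer.add(transition)
for _ in range(4):
    policy.update(buffer.sample(32))
```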
|
|
|
To add new custom trainers to ML-Agents, you need to create a new Python package.
To give you an idea of how to structure your package, we have created a [mlagents_trainer_plugin](../ml-agents-trainer-plugin) package ourselves as an
example, with implementations of the `A2C` and `DQN` algorithms. You need a `setup.py` file to list extra requirements and to
register the new RL algorithm in the ML-Agents ecosystem, so that you can call the `mlagents-learn` CLI with your customized
configuration.
|
|
|
|
|
```shell
├── mlagents_trainer_plugin
│   ├── __init__.py
│   ├── a2c
│   │   ├── __init__.py
│   │   ├── a2c_3DBall.yaml
│   │   ├── a2c_optimizer.py
│   │   └── a2c_trainer.py
│   └── dqn
│       ├── __init__.py
│       ├── dqn_basic.yaml
│       ├── dqn_optimizer.py
│       └── dqn_trainer.py
└── setup.py
```
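
The `setup.py` file is where the new trainers are registered so that `mlagents-learn` can discover them. Below is a minimal sketch; the `mlagents.trainer_type` entry-point group and the `get_type_and_setting` hook are assumptions here, so check them against the `setup.py` shipped with the example plugin:

```python
# Sketch of a plugin setup.py -- the entry-point group and hook names are
# assumptions; compare with the setup.py in ml-agents-trainer-plugin.
from setuptools import setup, find_packages

setup(
    name="mlagents_trainer_plugin",
    version="0.0.1",
    packages=find_packages(),
    install_requires=["mlagents"],  # plus any extra requirements your trainer needs
    entry_points={
        # Registering under this group is what lets mlagents-learn resolve
        # the new trainer_type values ("a2c", "dqn") from your config file.
        "mlagents.trainer_type": [
            "a2c=mlagents_trainer_plugin.a2c.a2c_trainer:get_type_and_setting",
            "dqn=mlagents_trainer_plugin.dqn.dqn_trainer:get_type_and_setting",
        ]
    },
)
```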
|
## Installation and Execution |
|
If you haven't already, follow the [installation instructions](Installation.md). Once you have the `ml-agents-envs` and `ml-agents` packages, you can install the plugin package. From the repository's root directory, install `ml-agents-trainer-plugin` (or replace with the name of your plugin folder).
|
|
|
```sh
pip3 install -e ./ml-agents-trainer-plugin
```
|
|
|
After the installation above, your package is registered as an entry point and you can use a config file with the new
trainers:
|
```sh
mlagents-learn ml-agents-trainer-plugin/mlagents_trainer_plugin/a2c/a2c_3DBall.yaml \
    --run-id <run-id-name> --env <env-executable>
```
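
For reference, a config file for a custom trainer follows the standard ML-Agents YAML format, with `trainer_type` set to the name registered in `setup.py`. The sketch below is illustrative; the hyperparameter names and values are assumptions rather than the contents of the shipped `a2c_3DBall.yaml`:

```yaml
# Illustrative config sketch -- hyperparameter names/values are assumptions;
# see a2c_3DBall.yaml in the example plugin for the real settings.
behaviors:
  3DBall:
    trainer_type: a2c        # must match the name registered in setup.py
    hyperparameters:
      learning_rate: 0.0003
      batch_size: 64
    network_settings:
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 12000
```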
|
|
|
## Tutorial |
|
Here's a step-by-step [tutorial](Tutorial-Custom-Trainer-Plugin.md) on how to write a setup file and extend the ML-Agents trainers, optimizers, and
hyperparameter settings. To extend ML-Agents classes, see the references on
[trainers](Python-On-Off-Policy-Trainer-Documentation.md) and [optimizers](Python-Optimizer-Documentation.md).
|
|