# Table of Contents

* [mlagents.trainers.trainer.on\_policy\_trainer](#mlagents.trainers.trainer.on_policy_trainer)
  * [OnPolicyTrainer](#mlagents.trainers.trainer.on_policy_trainer.OnPolicyTrainer)
    * [\_\_init\_\_](#mlagents.trainers.trainer.on_policy_trainer.OnPolicyTrainer.__init__)
    * [add\_policy](#mlagents.trainers.trainer.on_policy_trainer.OnPolicyTrainer.add_policy)
* [mlagents.trainers.trainer.off\_policy\_trainer](#mlagents.trainers.trainer.off_policy_trainer)
  * [OffPolicyTrainer](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer)
    * [\_\_init\_\_](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.__init__)
    * [save\_model](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.save_model)
    * [save\_replay\_buffer](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.save_replay_buffer)
    * [load\_replay\_buffer](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.load_replay_buffer)
    * [add\_policy](#mlagents.trainers.trainer.off_policy_trainer.OffPolicyTrainer.add_policy)
* [mlagents.trainers.trainer.rl\_trainer](#mlagents.trainers.trainer.rl_trainer)
  * [RLTrainer](#mlagents.trainers.trainer.rl_trainer.RLTrainer)
    * [end\_episode](#mlagents.trainers.trainer.rl_trainer.RLTrainer.end_episode)
    * [create\_optimizer](#mlagents.trainers.trainer.rl_trainer.RLTrainer.create_optimizer)
    * [save\_model](#mlagents.trainers.trainer.rl_trainer.RLTrainer.save_model)
    * [advance](#mlagents.trainers.trainer.rl_trainer.RLTrainer.advance)
* [mlagents.trainers.trainer.trainer](#mlagents.trainers.trainer.trainer)
  * [Trainer](#mlagents.trainers.trainer.trainer.Trainer)
    * [\_\_init\_\_](#mlagents.trainers.trainer.trainer.Trainer.__init__)
    * [stats\_reporter](#mlagents.trainers.trainer.trainer.Trainer.stats_reporter)
    * [parameters](#mlagents.trainers.trainer.trainer.Trainer.parameters)
    * [get\_max\_steps](#mlagents.trainers.trainer.trainer.Trainer.get_max_steps)
    * [get\_step](#mlagents.trainers.trainer.trainer.Trainer.get_step)
    * [threaded](#mlagents.trainers.trainer.trainer.Trainer.threaded)
    * [should\_still\_train](#mlagents.trainers.trainer.trainer.Trainer.should_still_train)
    * [reward\_buffer](#mlagents.trainers.trainer.trainer.Trainer.reward_buffer)
    * [save\_model](#mlagents.trainers.trainer.trainer.Trainer.save_model)
    * [end\_episode](#mlagents.trainers.trainer.trainer.Trainer.end_episode)
    * [create\_policy](#mlagents.trainers.trainer.trainer.Trainer.create_policy)
    * [add\_policy](#mlagents.trainers.trainer.trainer.Trainer.add_policy)
    * [get\_policy](#mlagents.trainers.trainer.trainer.Trainer.get_policy)
    * [advance](#mlagents.trainers.trainer.trainer.Trainer.advance)
    * [publish\_policy\_queue](#mlagents.trainers.trainer.trainer.Trainer.publish_policy_queue)
    * [subscribe\_trajectory\_queue](#mlagents.trainers.trainer.trainer.Trainer.subscribe_trajectory_queue)
* [mlagents.trainers.settings](#mlagents.trainers.settings)
  * [deep\_update\_dict](#mlagents.trainers.settings.deep_update_dict)
  * [RewardSignalSettings](#mlagents.trainers.settings.RewardSignalSettings)
    * [structure](#mlagents.trainers.settings.RewardSignalSettings.structure)
  * [ParameterRandomizationSettings](#mlagents.trainers.settings.ParameterRandomizationSettings)
    * [\_\_str\_\_](#mlagents.trainers.settings.ParameterRandomizationSettings.__str__)
    * [structure](#mlagents.trainers.settings.ParameterRandomizationSettings.structure)
    * [unstructure](#mlagents.trainers.settings.ParameterRandomizationSettings.unstructure)
    * [apply](#mlagents.trainers.settings.ParameterRandomizationSettings.apply)
  * [ConstantSettings](#mlagents.trainers.settings.ConstantSettings)
    * [\_\_str\_\_](#mlagents.trainers.settings.ConstantSettings.__str__)
    * [apply](#mlagents.trainers.settings.ConstantSettings.apply)
  * [UniformSettings](#mlagents.trainers.settings.UniformSettings)
    * [\_\_str\_\_](#mlagents.trainers.settings.UniformSettings.__str__)
    * [apply](#mlagents.trainers.settings.UniformSettings.apply)
  * [GaussianSettings](#mlagents.trainers.settings.GaussianSettings)
    * [\_\_str\_\_](#mlagents.trainers.settings.GaussianSettings.__str__)
    * [apply](#mlagents.trainers.settings.GaussianSettings.apply)
  * [MultiRangeUniformSettings](#mlagents.trainers.settings.MultiRangeUniformSettings)
    * [\_\_str\_\_](#mlagents.trainers.settings.MultiRangeUniformSettings.__str__)
    * [apply](#mlagents.trainers.settings.MultiRangeUniformSettings.apply)
  * [CompletionCriteriaSettings](#mlagents.trainers.settings.CompletionCriteriaSettings)
    * [need\_increment](#mlagents.trainers.settings.CompletionCriteriaSettings.need_increment)
  * [Lesson](#mlagents.trainers.settings.Lesson)
  * [EnvironmentParameterSettings](#mlagents.trainers.settings.EnvironmentParameterSettings)
    * [structure](#mlagents.trainers.settings.EnvironmentParameterSettings.structure)
  * [TrainerSettings](#mlagents.trainers.settings.TrainerSettings)
    * [structure](#mlagents.trainers.settings.TrainerSettings.structure)
  * [CheckpointSettings](#mlagents.trainers.settings.CheckpointSettings)
    * [prioritize\_resume\_init](#mlagents.trainers.settings.CheckpointSettings.prioritize_resume_init)
  * [RunOptions](#mlagents.trainers.settings.RunOptions)
    * [from\_argparse](#mlagents.trainers.settings.RunOptions.from_argparse)

# mlagents.trainers.trainer.on\_policy\_trainer

## OnPolicyTrainer Objects

```python
class OnPolicyTrainer(RLTrainer)
```

The OnPolicyTrainer is the base class for on-policy trainers; the PPOTrainer, an implementation of the PPO algorithm, is built on top of it.

#### \_\_init\_\_

```python
 | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
```

Responsible for collecting experiences and training an on-policy model.

**Arguments**:

- `behavior_name`: The name of the behavior associated with the trainer config.
- `reward_buff_cap`: Max reward history to track in the reward buffer.
- `trainer_settings`: The parameters for the trainer.
- `training`: Whether the trainer is set for training.
- `load`: Whether the model should be loaded.
- `seed`: The seed the model will be initialized with.
- `artifact_path`: The directory within which to store artifacts from this trainer.

#### add\_policy

```python
 | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
```

Adds a policy to the trainer.

**Arguments**:

- `parsed_behavior_id`: Behavior identifiers that the policy should belong to.
- `policy`: Policy to associate with name_behavior_id.
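To make the `parsed_behavior_id` and `name_behavior_id` terminology above concrete, here is a small sketch of parsing a fully qualified behavior name. It assumes the `BehaviorIdentifiers.from_name_behavior_id` helper from `mlagents.trainers.behavior_id_utils` and the `?team=` naming convention used for self-play; exact field names may vary between releases.

```python
from mlagents.trainers.behavior_id_utils import BehaviorIdentifiers

# Fully qualified behavior names look like "<behavior name>?team=<team id>".
# from_name_behavior_id splits such a string into its components; the result is
# the kind of object that add_policy receives as parsed_behavior_id.
parsed = BehaviorIdentifiers.from_name_behavior_id("3DBall?team=0")
print(parsed.brain_name)   # the behavior name without the team suffix
print(parsed.behavior_id)  # the original fully qualified name
```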
# mlagents.trainers.trainer.off\_policy\_trainer

## OffPolicyTrainer Objects

```python
class OffPolicyTrainer(RLTrainer)
```

The OffPolicyTrainer is the base class for off-policy trainers; the SACTrainer, an implementation of the SAC algorithm with support for discrete actions and recurrent networks, is built on top of it.

#### \_\_init\_\_

```python
 | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)
```

Responsible for collecting experiences and training an off-policy model.

**Arguments**:

- `behavior_name`: The name of the behavior associated with the trainer config.
- `reward_buff_cap`: Max reward history to track in the reward buffer.
- `trainer_settings`: The parameters for the trainer.
- `training`: Whether the trainer is set for training.
- `load`: Whether the model should be loaded.
- `seed`: The seed the model will be initialized with.
- `artifact_path`: The directory within which to store artifacts from this trainer.

#### save\_model

```python
 | save_model() -> None
```

Saves the final training model to memory. Overrides the default implementation in order to also save the replay buffer.

#### save\_replay\_buffer

```python
 | save_replay_buffer() -> None
```

Saves the training buffer's update buffer to a pickle file.

#### load\_replay\_buffer

```python
 | load_replay_buffer() -> None
```

Loads the last saved replay buffer from a file.

#### add\_policy

```python
 | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
```

Adds a policy to the trainer.

# mlagents.trainers.trainer.rl\_trainer

## RLTrainer Objects

```python
class RLTrainer(Trainer)
```

This class is the base class for trainers that use Reward Signals.

#### end\_episode

```python
 | end_episode() -> None
```

A signal that the episode has ended. The buffer must be reset. Gets called only when the academy resets.

#### create\_optimizer

```python
 | @abc.abstractmethod
 | create_optimizer() -> TorchOptimizer
```

Creates an Optimizer object.

#### save\_model

```python
 | save_model() -> None
```

Saves the policy associated with this trainer.

#### advance

```python
 | advance() -> None
```

Steps the trainer, taking in trajectories and updating if ready. Will block and wait briefly if there are no trajectories.

# mlagents.trainers.trainer.trainer

## Trainer Objects

```python
class Trainer(abc.ABC)
```

This class is the base class for all trainers in mlagents.trainers.

#### \_\_init\_\_

```python
 | __init__(brain_name: str, trainer_settings: TrainerSettings, training: bool, load: bool, artifact_path: str, reward_buff_cap: int = 1)
```

Responsible for collecting experiences and training a neural network model.

**Arguments**:

- `brain_name`: Brain name of the brain to be trained.
- `trainer_settings`: The parameters for the trainer (dictionary).
- `training`: Whether the trainer is set for training.
- `artifact_path`: The directory within which to store artifacts from this trainer.
- `reward_buff_cap`: Max reward history to track in the reward buffer.

#### stats\_reporter

```python
 | @property
 | stats_reporter()
```

Returns the stats reporter associated with this Trainer.

#### parameters

```python
 | @property
 | parameters() -> TrainerSettings
```

Returns the trainer parameters of the trainer.

#### get\_max\_steps

```python
 | @property
 | get_max_steps() -> int
```

Returns the maximum number of steps. Used to know when the trainer should be stopped.

**Returns**:

The maximum number of steps of the trainer.

#### get\_step

```python
 | @property
 | get_step() -> int
```

Returns the number of steps the trainer has performed.

**Returns**:

The step count of the trainer.

#### threaded

```python
 | @property
 | threaded() -> bool
```

Whether or not to run the trainer in a thread. True allows the trainer to update the policy while the environment is taking steps. Set to False to enforce strict on-policy updates (i.e. the policy is not updated while the environment is taking steps).

#### should\_still\_train

```python
 | @property
 | should_still_train() -> bool
```

Returns whether or not the trainer should train. A Trainer could stop training if it wasn't training to begin with, or if max_steps has been reached.

#### reward\_buffer

```python
 | @property
 | reward_buffer() -> Deque[float]
```

Returns the reward buffer. The reward buffer contains the cumulative rewards of the most recent episodes completed by agents using this trainer.

**Returns**:

The reward buffer.
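The reward buffer described above is simply a bounded history of episode returns. A minimal, ML-Agents-independent sketch of the idea, where the deque size plays the role of `reward_buff_cap`:

```python
from collections import deque
from typing import Deque

# Keep only the cumulative rewards of the most recent episodes; once the cap is
# reached, appending a new return silently drops the oldest one.
reward_buff_cap = 3
reward_buffer: Deque[float] = deque(maxlen=reward_buff_cap)

for episode_return in [1.0, 0.5, 2.0, 3.0]:
    reward_buffer.append(episode_return)

print(list(reward_buffer))                      # [0.5, 2.0, 3.0]
print(sum(reward_buffer) / len(reward_buffer))  # mean recent return
```

Curriculum completion criteria (see `CompletionCriteriaSettings.need_increment` below) consume exactly this kind of buffer when deciding whether to advance a lesson.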
#### save\_model

```python
 | @abc.abstractmethod
 | save_model() -> None
```

Saves model file(s) for the policy or policies associated with this trainer.

#### end\_episode

```python
 | @abc.abstractmethod
 | end_episode()
```

A signal that the episode has ended. The buffer must be reset. Gets called only when the academy resets.

#### create\_policy

```python
 | @abc.abstractmethod
 | create_policy(parsed_behavior_id: BehaviorIdentifiers, behavior_spec: BehaviorSpec) -> Policy
```

Creates a Policy object.

#### add\_policy

```python
 | @abc.abstractmethod
 | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None
```

Adds a policy to the trainer.

#### get\_policy

```python
 | get_policy(name_behavior_id: str) -> Policy
```

Gets the policy associated with name_behavior_id.

**Arguments**:

- `name_behavior_id`: Fully qualified behavior name.

**Returns**:

Policy associated with name_behavior_id.

#### advance

```python
 | @abc.abstractmethod
 | advance() -> None
```

Advances the trainer. Typically, this means grabbing trajectories from all subscribed trajectory queues (self.trajectory_queues), updating a policy using the steps in them, and, if needed, pushing a new policy onto the right policy queues (self.policy_queues).

#### publish\_policy\_queue

```python
 | publish_policy_queue(policy_queue: AgentManagerQueue[Policy]) -> None
```

Adds a policy queue to the list of queues to publish to when this Trainer makes a policy update.

**Arguments**:

- `policy_queue`: Policy queue to publish to.

#### subscribe\_trajectory\_queue

```python
 | subscribe_trajectory_queue(trajectory_queue: AgentManagerQueue[Trajectory]) -> None
```

Adds a trajectory queue to the list of queues from which the trainer ingests Trajectories.

**Arguments**:

- `trajectory_queue`: Trajectory queue to read from.

# mlagents.trainers.settings

#### deep\_update\_dict

```python
deep_update_dict(d: Dict, update_d: Mapping) -> None
```

Similar to dict.update(), but works for nested dicts of dicts as well.

## RewardSignalSettings Objects

```python
@attr.s(auto_attribs=True)
class RewardSignalSettings()
```

#### structure

```python
 | @staticmethod
 | structure(d: Mapping, t: type) -> Any
```

Helper method to structure a Dict of RewardSignalSettings objects. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of RewardSignalSettings classes.

## ParameterRandomizationSettings Objects

```python
@attr.s(auto_attribs=True)
class ParameterRandomizationSettings(abc.ABC)
```

#### \_\_str\_\_

```python
 | __str__() -> str
```

Helper method to output sampler stats to the console.

#### structure

```python
 | @staticmethod
 | structure(d: Union[Mapping, float], t: type) -> "ParameterRandomizationSettings"
```

Helper method to structure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of ParameterRandomizationSettings classes.

#### unstructure

```python
 | @staticmethod
 | unstructure(d: "ParameterRandomizationSettings") -> Mapping
```

Helper method to unstructure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_unstructure_hook() and called with cattr.unstructure().
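As a rough illustration of how these structure hooks are meant to be wired up, the sketch below registers the hook explicitly (`mlagents.trainers.settings` is expected to do this itself on import) and uses the `sampler_type`/`sampler_parameters` keys from the trainer configuration format; treat the exact behaviour as an assumption rather than a guarantee.

```python
import cattr

from mlagents.trainers.settings import (
    ConstantSettings,
    ParameterRandomizationSettings,
    UniformSettings,
)

# Register the hook described above so cattr.structure() knows how to build the
# right ParameterRandomizationSettings subclass from raw config data.
cattr.register_structure_hook(
    ParameterRandomizationSettings, ParameterRandomizationSettings.structure
)

# The Union[Mapping, float] signature means a bare number should come back as a
# constant sampler...
constant = cattr.structure(9.81, ParameterRandomizationSettings)
print(type(constant))  # expected: ConstantSettings

# ...while a mapping selects the sampler class via its sampler_type entry.
uniform = cattr.structure(
    {"sampler_type": "uniform", "sampler_parameters": {"min_value": 0.5, "max_value": 3.0}},
    ParameterRandomizationSettings,
)
print(type(uniform))  # expected: UniformSettings
```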
#### apply

```python
 | @abc.abstractmethod
 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
```

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the appropriate sampler type set method.

**Arguments**:

- `key`: The environment parameter to be sampled.
- `env_channel`: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

## ConstantSettings Objects

```python
@attr.s(auto_attribs=True)
class ConstantSettings(ParameterRandomizationSettings)
```

#### \_\_str\_\_

```python
 | __str__() -> str
```

Helper method to output sampler stats to the console.

#### apply

```python
 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
```

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the constant sampler type set method.

**Arguments**:

- `key`: The environment parameter to be sampled.
- `env_channel`: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

## UniformSettings Objects

```python
@attr.s(auto_attribs=True)
class UniformSettings(ParameterRandomizationSettings)
```

#### \_\_str\_\_

```python
 | __str__() -> str
```

Helper method to output sampler stats to the console.

#### apply

```python
 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
```

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the uniform sampler type set method.

**Arguments**:

- `key`: The environment parameter to be sampled.
- `env_channel`: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

## GaussianSettings Objects

```python
@attr.s(auto_attribs=True)
class GaussianSettings(ParameterRandomizationSettings)
```

#### \_\_str\_\_

```python
 | __str__() -> str
```

Helper method to output sampler stats to the console.

#### apply

```python
 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
```

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the Gaussian sampler type set method.

**Arguments**:

- `key`: The environment parameter to be sampled.
- `env_channel`: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

## MultiRangeUniformSettings Objects

```python
@attr.s(auto_attribs=True)
class MultiRangeUniformSettings(ParameterRandomizationSettings)
```

#### \_\_str\_\_

```python
 | __str__() -> str
```

Helper method to output sampler stats to the console.

#### apply

```python
 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None
```

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the multirangeuniform sampler type set method.

**Arguments**:

- `key`: The environment parameter to be sampled.
- `env_channel`: The EnvironmentParametersChannel used to communicate sampler settings to the environment.

## CompletionCriteriaSettings Objects

```python
@attr.s(auto_attribs=True)
class CompletionCriteriaSettings()
```

CompletionCriteriaSettings contains the information needed to figure out if the next lesson must start.

#### need\_increment

```python
 | need_increment(progress: float, reward_buffer: List[float], smoothing: float) -> Tuple[bool, float]
```

Given the measures, this method returns a boolean indicating whether the lesson needs to change now, and a float corresponding to the new smoothed value.

## Lesson Objects

```python
@attr.s(auto_attribs=True)
class Lesson()
```

Gathers the data of one lesson for one environment parameter, including its name, the condition that must be fulfilled for the lesson to be completed, and a sampler for the environment parameter. If the completion_criteria is None, then this is the last lesson in the curriculum.
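A lesson's sampler only takes effect once its settings reach the simulation, which is what the various `apply` methods above do through the `EnvironmentParametersChannel`. Below is a rough sketch of what they boil down to; the setter names come from `mlagents_envs`, and in real use the channel would be passed to a `UnityEnvironment` via `side_channels`.

```python
from mlagents_envs.side_channel.environment_parameters_channel import (
    EnvironmentParametersChannel,
)

channel = EnvironmentParametersChannel()

# ConstantSettings.apply(key, channel) amounts to sending a single fixed value.
channel.set_float_parameter("gravity", 9.81)

# UniformSettings.apply(key, channel) forwards its bounds and seed to the uniform
# sampler setter (arguments: key, min_value, max_value, seed); Gaussian and
# multi-range-uniform samplers use analogous setters on the same channel.
channel.set_uniform_sampler_parameters("scale", 0.5, 3.0, 0)
```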
## EnvironmentParameterSettings Objects

```python
@attr.s(auto_attribs=True)
class EnvironmentParameterSettings()
```

EnvironmentParameterSettings is an ordered list of lessons for one environment parameter.

#### structure

```python
 | @staticmethod
 | structure(d: Mapping, t: type) -> Dict[str, "EnvironmentParameterSettings"]
```

Helper method to structure a Dict of EnvironmentParameterSettings objects. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().

## TrainerSettings Objects

```python
@attr.s(auto_attribs=True)
class TrainerSettings(ExportableSettings)
```

#### structure

```python
 | @staticmethod
 | structure(d: Mapping, t: type) -> Any
```

Helper method to structure a TrainerSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().

## CheckpointSettings Objects

```python
@attr.s(auto_attribs=True)
class CheckpointSettings()
```

#### prioritize\_resume\_init

```python
 | prioritize_resume_init() -> None
```

Prioritizes explicit command-line resume/init over conflicting YAML options. If both resume and init are set in the same place, resume is used.

## RunOptions Objects

```python
@attr.s(auto_attribs=True)
class RunOptions(ExportableSettings)
```

#### from\_argparse

```python
 | @staticmethod
 | from_argparse(args: argparse.Namespace) -> "RunOptions"
```

Takes an argparse.Namespace as specified in `parse_command_line`, loads the input configuration files from the given file paths, and converts them to a RunOptions instance.

**Arguments**:

- `args`: Collection of command-line parameters passed to mlagents-learn.

**Returns**:

RunOptions representing the passed-in arguments, with trainer config, curriculum and sampler configs loaded from files.
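Finally, a short sketch of how the structure hooks documented in this module come together when turning a plain configuration dictionary into typed settings. The explicit registration mirrors what `mlagents.trainers.settings` is meant to do on import, and the field names used here (`max_steps`, `summary_freq`, `checkpoint_settings.run_id`) are assumed to exist on `TrainerSettings`/`RunOptions` in the version at hand.

```python
import cattr

from mlagents.trainers.settings import RunOptions, TrainerSettings

# TrainerSettings.structure is designed to be used as a cattr structure hook.
cattr.register_structure_hook(TrainerSettings, TrainerSettings.structure)

# One "behavior" section of a trainer config YAML, as a plain dict, becomes a
# fully defaulted, typed TrainerSettings object.
ball_settings = cattr.structure({"max_steps": 500000, "summary_freq": 10000}, TrainerSettings)
print(ball_settings.max_steps)

# RunOptions bundles per-behavior TrainerSettings with checkpoint, engine and
# environment settings; from_argparse (above) builds one from the mlagents-learn CLI.
defaults = RunOptions()
print(defaults.checkpoint_settings.run_id)
```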