# Table of Contents

* [mlagents.trainers.optimizer.torch\_optimizer](#mlagents.trainers.optimizer.torch_optimizer)
  * [TorchOptimizer](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer)
    * [create\_reward\_signals](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.create_reward_signals)
    * [get\_trajectory\_value\_estimates](#mlagents.trainers.optimizer.torch_optimizer.TorchOptimizer.get_trajectory_value_estimates)
* [mlagents.trainers.optimizer.optimizer](#mlagents.trainers.optimizer.optimizer)
  * [Optimizer](#mlagents.trainers.optimizer.optimizer.Optimizer)
    * [update](#mlagents.trainers.optimizer.optimizer.Optimizer.update)

# mlagents.trainers.optimizer.torch\_optimizer

## TorchOptimizer Objects

```python
class TorchOptimizer(Optimizer)
```

#### create\_reward\_signals

```python
 | create_reward_signals(reward_signal_configs: Dict[RewardSignalType, RewardSignalSettings]) -> None
```

Create reward signals.

**Arguments**:

- `reward_signal_configs`: Reward signal config.

#### get\_trajectory\_value\_estimates

```python
 | get_trajectory_value_estimates(batch: AgentBuffer, next_obs: List[np.ndarray], done: bool, agent_id: str = "") -> Tuple[Dict[str, np.ndarray], Dict[str, float], Optional[AgentBufferField]]
```

Get value estimates and memories for a trajectory, in batch form.

**Arguments**:

- `batch`: An AgentBuffer that consists of a trajectory.
- `next_obs`: The next observation (after the trajectory). Used for bootstrapping if this is not a terminal trajectory.
- `done`: Set to True if this is a terminal trajectory.
- `agent_id`: Agent ID of the agent that this trajectory belongs to.

**Returns**:

A Tuple of the value estimates as a Dict of [name, np.ndarray(trajectory_len)], the final value estimate as a Dict of [name, float], and optionally (if using memories) an AgentBufferField of initial critic memories to be used during the update.

# mlagents.trainers.optimizer.optimizer

## Optimizer Objects

```python
class Optimizer(abc.ABC)
```

Creates loss functions and auxiliary networks (e.g. Q or Value) needed for training.
Provides methods to update the Policy.

#### update

```python
 | @abc.abstractmethod
 | update(batch: AgentBuffer, num_sequences: int) -> Dict[str, float]
```

Update the Policy based on the batch that was passed in.

**Arguments**:

- `batch`: AgentBuffer that contains the minibatch of data used for this update.
- `num_sequences`: Number of recurrent sequences found in the minibatch.

**Returns**:

A Dict containing statistics (name, value) from the update (e.g. loss).
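For orientation, the sketch below shows the shape of the `update` contract: a concrete `Optimizer` subclass returns a dictionary of named statistics for the trainer to log. This is a minimal sketch only; `ConstantLossOptimizer` is a hypothetical class, the placeholder statistics are illustrative, and the `AgentBuffer` import path reflects the usual location in ml-agents (verify against your installed version).

```python
from typing import Dict

from mlagents.trainers.buffer import AgentBuffer
from mlagents.trainers.optimizer.optimizer import Optimizer


class ConstantLossOptimizer(Optimizer):
    """Hypothetical Optimizer used only to illustrate the `update` contract."""

    def update(self, batch: AgentBuffer, num_sequences: int) -> Dict[str, float]:
        # A real optimizer would compute losses from `batch`, step its
        # underlying torch optimizers, and report the resulting values.
        # Here we only return placeholder statistics in the expected format.
        return {
            "Losses/Policy Loss": 0.0,
            "Policy/Num Sequences": float(num_sequences),
        }
```

Statistics returned from `update` are typically keyed with a `Category/Name` convention so they group cleanly when written to TensorBoard.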