--- library_name: ml-agents tags: - SnowballTarget - deep-reinforcement-learning - reinforcement-learning - ML-Agents-SnowballTarget --- # **ppo** Agent playing **SnowballTarget** This is a trained model of a **ppo** agent playing **SnowballTarget** using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents). ## Usage (with ML-Agents) The Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/ We wrote a complete tutorial to learn to train your first agent using ML-Agents and publish it to the Hub: - A *short tutorial* where you teach Huggy the Dog 🐶 to fetch the stick and then play with him directly in your browser: https://huggingface.co/learn/deep-rl-course/unitbonus1/introduction - A *longer tutorial* to understand how works ML-Agents: https://huggingface.co/learn/deep-rl-course/unit5/introduction ### Resume the training ```bash mlagents-learn --run-id= --resume ``` ### Watch your Agent play You can watch your agent **playing directly in your browser** 1. If the environment is part of ML-Agents official environments, go to https://huggingface.co/unity 2. Step 1: Find your model_id: lambdavi/ppo-SnowballTarget 3. Step 2: Select your *.nn /*.onnx file 4. Click on Watch the agent play 👀 ### Hyperparams used: SnowballTarget: trainer_type: ppo hyperparameters: batch_size: 128 buffer_size: 2048 learning_rate: 0.005 beta: 0.005 epsilon: 0.2 lambd: 0.95 num_epoch: 5 shared_critic: False learning_rate_schedule: linear beta_schedule: linear epsilon_schedule: linear checkpoint_interval: 50000 network_settings: normalize: False hidden_units: 256 num_layers: 2 vis_encode_type: simple memory: None goal_conditioning_type: hyper deterministic: False reward_signals: extrinsic: gamma: 0.99 strength: 1.0 network_settings: normalize: False hidden_units: 128 num_layers: 2 vis_encode_type: simple memory: None goal_conditioning_type: hyper deterministic: False init_path: None keep_checkpoints: 10 even_checkpoints: False max_steps: 500000 time_horizon: 64 summary_freq: 10000 threaded: True self_play: None behavioral_cloning: None