verified: false
---

# PPO Agent Playing LunarLander-v2

This is a trained model of a PPO agent playing LunarLander-v2.
The agent has been trained with a custom PPO implementation inspired by
[a tutorial by Costa Huang](https://www.youtube.com/watch?v=MEt6rrxH8W4).

This work is related to Unit 8, part 1 of the Hugging Face Deep RL course. I had to slightly modify
some pieces of the provided notebook because I used gymnasium and not gym (the main API differences
are sketched below). Furthermore, the PPO implementation is available on GitHub:
[https://github.com/micdestefano/micppo](https://github.com/micdestefano/micppo).
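
The gym-to-gymnasium changes are small but easy to trip on: `reset` now takes the seed and returns
an `(obs, info)` pair, and `step` returns five values instead of four. A minimal rollout sketch,
with a random policy standing in for the trained agent:

```python
import gymnasium as gym  # instead of: import gym

env = gym.make("LunarLander-v2")

# gymnasium: reset takes the seed directly and returns (obs, info);
# classic gym used env.seed(...) plus obs = env.reset().
obs, info = env.reset(seed=1)

done = False
while not done:
    action = env.action_space.sample()  # stand-in for the trained PPO policy
    # gymnasium splits the old `done` flag into `terminated` (the episode
    # actually ended) and `truncated` (a time limit cut it off).
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

env.close()
```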

# Hyperparameters
```python
{
    'exp_name': 'micppo',
    'gym_id': 'LunarLander-v2',
    'learning_rate': 0.00025,
    'min_learning_rate_ratio': 0.01,
    'seed': 1,
    'total_timesteps': 10000000,
    'torch_not_deterministic': False,
    'no_cuda': False,
    'capture_video': True,
    'hidden_size': 256,
    'num_hidden_layers': 3,
    'activation': 'leaky-relu',
    'num_checkpoints': 4,
    'num_envs': 8,
    'num_steps': 2048,
    'no_lr_annealing': False,
    'no_gae': False,
    'gamma': 0.99,
    'gae_lambda': 0.95,
    'num_minibatches': 16,
    'num_update_epochs': 32,
    'no_advantage_normalization': False,
    'clip_coef': 0.2,
    'no_value_loss_clip': False,
    'ent_coef': 0.01,
    'vf_coef': 0.5,
    'max_grad_norm': 0.5,
    'target_kl': None,
    'batch_size': 16384,
    'minibatch_size': 1024,
}
```
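
The last two entries are consistent with the derivation used in CleanRL-style PPO implementations
such as the tutorial above, where the rollout batch and minibatch sizes follow from the other
settings rather than being set directly:

```python
num_envs, num_steps, num_minibatches = 8, 2048, 16

batch_size = num_envs * num_steps               # 8 * 2048 = 16384
minibatch_size = batch_size // num_minibatches  # 16384 // 16 = 1024
```

Likewise, `min_learning_rate_ratio: 0.01` presumably floors the linear learning-rate anneal at 1%
of the initial `learning_rate`; that flag is specific to micppo, so the exact behavior here is an
assumption.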