A PPO agent for the Lunar Lander environment, trained as part of the Hugging Face Reinforcement Learning Course.
{"mean_reward": 244.79613669999998, "std_reward": 45.83244575177973, "is_deterministic": true, "n_eval_episodes": 10, "eval_datetime": "2023-08-15T17:20:16.204579"}
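The evaluation record summarizes the agent's return over 10 episodes with a deterministic policy. The sketch below shows how a record with the same fields could be built from a list of per-episode returns. It makes several assumptions. The helper `summarize_eval` is hypothetical. The returns in the example are made up and are not the data behind the numbers above. It uses the population standard deviation (ddof=0, the default of `numpy.std`), though the card does not say which estimator the original tooling used.

```python
import json
import statistics
from datetime import datetime


def summarize_eval(episode_returns, deterministic=True):
    """Hypothetical helper: build an evaluation record shaped like the one above.

    Assumption: std_reward is the population standard deviation (ddof=0),
    matching numpy.std's default; the original tooling is not stated.
    """
    return {
        "mean_reward": statistics.fmean(episode_returns),
        "std_reward": statistics.pstdev(episode_returns),
        "is_deterministic": deterministic,
        "n_eval_episodes": len(episode_returns),
        "eval_datetime": datetime.now().isoformat(),
    }


if __name__ == "__main__":
    # Made-up per-episode returns for illustration only (NOT the real evaluation data).
    returns = [250.0, 230.0, 270.0, 190.0, 260.0]
    print(json.dumps(summarize_eval(returns)))
```

Reporting the standard deviation alongside the mean matters here: with only 10 episodes and a spread of roughly 46, individual Lunar Lander runs vary widely around the mean.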