PPO Agent playing BreakoutNoFrameskip-v4

This is a trained model of a PPO agent playing BreakoutNoFrameskip-v4 using the stable-baselines3 library and the RL Zoo.

The RL Zoo is a training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.

Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo
SB3: https://github.com/DLR-RM/stable-baselines3
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Install the RL Zoo (with SB3 and SB3-Contrib):

pip install rl_zoo3
# Download model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo ppo --env BreakoutNoFrameskip-v4 -orga MattStammers -f logs/
python -m rl_zoo3.enjoy --algo ppo --env BreakoutNoFrameskip-v4 -f logs/

If you installed the RL Zoo3 via pip (pip install rl_zoo3), the two commands above can be run from any directory.

Training (with the RL Zoo)

python -m rl_zoo3.train --algo ppo --env BreakoutNoFrameskip-v4 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo ppo --env BreakoutNoFrameskip-v4 -f logs/ -orga MattStammers
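
To get a feel for the size of this training run, the arithmetic below works out how many environment steps go into each PPO update and roughly how many updates the full budget implies. It uses the values from the Hyperparameters section of this card. The update count is a sketch: it assumes that PPO always collects whole rollouts of n_envs * n_steps transitions and keeps going until the step budget is reached, so the final rollout may slightly overshoot n_timesteps.

```python
import math

# Values copied from the Hyperparameters section of this card.
n_envs = 8
n_steps = 128
batch_size = 256
n_epochs = 4
n_timesteps = 10_000_000

# Transitions collected across all environments before each policy update.
rollout_size = n_envs * n_steps

# Each epoch splits the rollout into minibatches of batch_size transitions.
minibatches_per_epoch = rollout_size // batch_size

# Gradient steps performed on one rollout.
gradient_steps_per_update = minibatches_per_epoch * n_epochs

# Assumption: rollouts are collected whole until the budget is reached.
n_updates = math.ceil(n_timesteps / rollout_size)

print("rollout size:", rollout_size)
print("minibatches per epoch:", minibatches_per_epoch)
print("gradient steps per update:", gradient_steps_per_update)
print("approximate number of updates:", n_updates)
```

Because the rollout size is an exact multiple of the batch size, every minibatch is full and no transitions are left in a truncated final batch.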

Hyperparameters

OrderedDict([('batch_size', 256),
             ('clip_range', 'lin_0.1'),
             ('ent_coef', 0.01),
             ('env_wrapper',
              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
             ('frame_stack', 4),
             ('learning_rate', 'lin_2.5e-4'),
             ('n_envs', 8),
             ('n_epochs', 4),
             ('n_steps', 128),
             ('n_timesteps', 10000000.0),
             ('normalize', False),
             ('policy', 'CnnPolicy'),
             ('vf_coef', 0.5)])
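
The 'lin_' prefix on clip_range and learning_rate asks the RL Zoo for a linear schedule: the value starts at the given number and decays to zero as training progresses. The sketch below illustrates that behaviour in plain Python. It is not the RL Zoo's own code; parse_schedule is a hypothetical helper written for this example, and the schedule follows the Stable Baselines3 convention of taking progress_remaining, which runs from 1.0 at the start of training to 0.0 at the end.

```python
from typing import Callable, Union


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining is 1.0 at the start of training and 0.0 at the end.
        return progress_remaining * initial_value

    return schedule


def parse_schedule(spec: Union[str, float]) -> Callable[[float], float]:
    """Hypothetical helper: map 'lin_<value>' or a plain number to a schedule."""
    if isinstance(spec, str) and spec.startswith("lin_"):
        return linear_schedule(float(spec[len("lin_"):]))
    value = float(spec)
    # A plain number stays constant for the whole run.
    return lambda progress_remaining: value


learning_rate = parse_schedule("lin_2.5e-4")
clip_range = parse_schedule("lin_0.1")

# Print both schedules at the start, midpoint, and end of training.
for progress in (1.0, 0.5, 0.0):
    print(progress, learning_rate(progress), clip_range(progress))
```

Annealing both values together means the policy takes smaller and more tightly clipped steps late in training, which is a common choice for PPO on Atari.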

Environment Arguments

{'render_mode': 'rgb_array'}

This PPO agent performs better than my past attempts, but it still cannot quite clear the board. I have added an extended video that shows its typical strategy: it punctures the brick wall as early as it can so that the ball gets in behind the bricks.

Evaluation results

  • mean_reward on BreakoutNoFrameskip-v4 (self-reported): 397.30 +/- 24.41