Upload model version 1: PPO trained for 1M steps with no hyperparameter tuning. mean_reward = 263.07035025211746 +/- 15.52574254837321 (std_reward)
ea060f9 · Hans14 committed on May 14, 2023