DeepRL-PPO-LLv2 / results.json
Baseline: LR=5e-4/cosine-100, epochs=1e7/305
ab2dd36
{"mean_reward": 288.9118304, "std_reward": 10.97332868779998, "is_deterministic": true, "n_eval_episodes": 10, "eval_datetime": "2023-06-16T07:05:52.212687"}