|
2023-05-18 23:22:15 - SimpleLog - INFO: - General Configs: |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - ================================================================================ |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - Name Value Type |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - env_name gym <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - algo_name PER_DQN <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - mode train <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - device cuda <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - seed 1 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - max_episode 100 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - max_step 200 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - collect_traj 0 <class 'bool'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - mp_backend single <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - n_workers 2 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - n_learners 1 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - share_buffer 1 <class 'bool'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - online_eval 1 <class 'bool'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - online_eval_episode 10 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - model_save_fre 500 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - load_checkpoint 0 <class 'bool'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - load_path Train_single_CartPole-v1_DQN_20230515-211721 <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - load_model_step best <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - ================================================================================ |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - Algo Configs: |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - ================================================================================ |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - Name Value Type |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - epsilon_start 0.95 <class 'float'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - epsilon_end 0.01 <class 'float'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - epsilon_decay 1000 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - gamma 0.99 <class 'float'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - lr 0.0001 <class 'float'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - buffer_type PER_QUE <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - buffer_size 100000 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - per_alpha 0.6 <class 'float'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - per_beta 0.4 <class 'float'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - per_beta_annealing 0.001 <class 'float'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - per_epsilon 0.01 <class 'float'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - batch_size 64 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - target_update 4 <class 'int'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - value_layers [{'layer_type': 'linear', 'layer_size': [256], 'activation': 'relu'}, {'layer_type': 'linear', 'layer_size': [256], 'activation': 'relu'}] <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - ================================================================================ |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - Env Configs: |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - ================================================================================ |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - Name Value Type |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - id CartPole-v1 <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - render_mode None <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - wrapper None <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - ignore_params ['wrapper', 'ignore_params'] <class 'str'> |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - ================================================================================ |
|
2023-05-18 23:22:15 - SimpleLog - INFO: - obs_space: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), n_actions: Discrete(2) |
|
2023-05-18 23:22:16 - SimpleLog - INFO: - Start training! |
|
2023-05-18 23:22:17 - SimpleLog - INFO: - episode: 0, ep_reward: 34.0, ep_step: 34 |
|
2023-05-18 23:22:17 - SimpleLog - INFO: - episode: 1, ep_reward: 14.0, ep_step: 14 |
|
2023-05-18 23:22:17 - SimpleLog - INFO: - episode: 2, ep_reward: 15.0, ep_step: 15 |
|
2023-05-18 23:22:17 - SimpleLog - INFO: - episode: 3, ep_reward: 17.0, ep_step: 17 |
|
2023-05-18 23:22:17 - SimpleLog - INFO: - episode: 4, ep_reward: 12.0, ep_step: 12 |
|
2023-05-18 23:22:17 - SimpleLog - INFO: - episode: 5, ep_reward: 39.0, ep_step: 39 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 6, ep_reward: 28.0, ep_step: 28 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 7, ep_reward: 33.0, ep_step: 33 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 8, ep_reward: 15.0, ep_step: 15 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 9, ep_reward: 15.0, ep_step: 15 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 10, ep_reward: 20.0, ep_step: 20 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 11, ep_reward: 20.0, ep_step: 20 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 12, ep_reward: 13.0, ep_step: 13 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 13, ep_reward: 19.0, ep_step: 19 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 14, ep_reward: 30.0, ep_step: 30 |
|
2023-05-18 23:22:18 - SimpleLog - INFO: - episode: 15, ep_reward: 15.0, ep_step: 15 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 16, ep_reward: 20.0, ep_step: 20 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 17, ep_reward: 14.0, ep_step: 14 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 18, ep_reward: 11.0, ep_step: 11 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 19, ep_reward: 21.0, ep_step: 21 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 20, ep_reward: 15.0, ep_step: 15 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 21, ep_reward: 18.0, ep_step: 18 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 22, ep_reward: 12.0, ep_step: 12 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 23, ep_reward: 24.0, ep_step: 24 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 24, ep_reward: 23.0, ep_step: 23 |
|
2023-05-18 23:22:19 - SimpleLog - INFO: - episode: 25, ep_reward: 25.0, ep_step: 25 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 26, ep_reward: 17.0, ep_step: 17 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 27, ep_reward: 12.0, ep_step: 12 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - update_step: 500, online_eval_reward: 9.000 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - current update step obtain a better online_eval_reward: 9.000, save the best model! |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 28, ep_reward: 17.0, ep_step: 17 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 29, ep_reward: 12.0, ep_step: 12 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 30, ep_reward: 12.0, ep_step: 12 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 31, ep_reward: 16.0, ep_step: 16 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 32, ep_reward: 11.0, ep_step: 11 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 33, ep_reward: 15.0, ep_step: 15 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 34, ep_reward: 12.0, ep_step: 12 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 35, ep_reward: 18.0, ep_step: 18 |
|
2023-05-18 23:22:20 - SimpleLog - INFO: - episode: 36, ep_reward: 9.0, ep_step: 9 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 37, ep_reward: 50.0, ep_step: 50 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 38, ep_reward: 12.0, ep_step: 12 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 39, ep_reward: 14.0, ep_step: 14 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 40, ep_reward: 15.0, ep_step: 15 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 41, ep_reward: 10.0, ep_step: 10 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 42, ep_reward: 19.0, ep_step: 19 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 43, ep_reward: 13.0, ep_step: 13 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 44, ep_reward: 16.0, ep_step: 16 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 45, ep_reward: 20.0, ep_step: 20 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 46, ep_reward: 16.0, ep_step: 16 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 47, ep_reward: 32.0, ep_step: 32 |
|
2023-05-18 23:22:21 - SimpleLog - INFO: - episode: 48, ep_reward: 25.0, ep_step: 25 |
|
2023-05-18 23:22:22 - SimpleLog - INFO: - episode: 49, ep_reward: 73.0, ep_step: 73 |
|
2023-05-18 23:22:22 - SimpleLog - INFO: - episode: 50, ep_reward: 28.0, ep_step: 28 |
|
2023-05-18 23:22:22 - SimpleLog - INFO: - update_step: 1000, online_eval_reward: 75.000 |
|
2023-05-18 23:22:22 - SimpleLog - INFO: - current update step obtain a better online_eval_reward: 75.000, save the best model! |
|
2023-05-18 23:22:23 - SimpleLog - INFO: - episode: 51, ep_reward: 88.0, ep_step: 88 |
|
2023-05-18 23:22:23 - SimpleLog - INFO: - episode: 52, ep_reward: 58.0, ep_step: 58 |
|
2023-05-18 23:22:23 - SimpleLog - INFO: - episode: 53, ep_reward: 53.0, ep_step: 53 |
|
2023-05-18 23:22:24 - SimpleLog - INFO: - episode: 54, ep_reward: 77.0, ep_step: 77 |
|
2023-05-18 23:22:24 - SimpleLog - INFO: - episode: 55, ep_reward: 48.0, ep_step: 48 |
|
2023-05-18 23:22:25 - SimpleLog - INFO: - episode: 56, ep_reward: 150.0, ep_step: 150 |
|
2023-05-18 23:22:25 - SimpleLog - INFO: - episode: 57, ep_reward: 45.0, ep_step: 45 |
|
2023-05-18 23:22:25 - SimpleLog - INFO: - update_step: 1500, online_eval_reward: 68.000 |
|
2023-05-18 23:22:25 - SimpleLog - INFO: - episode: 58, ep_reward: 53.0, ep_step: 53 |
|
2023-05-18 23:22:26 - SimpleLog - INFO: - episode: 59, ep_reward: 75.0, ep_step: 75 |
|
2023-05-18 23:22:26 - SimpleLog - INFO: - episode: 60, ep_reward: 49.0, ep_step: 49 |
|
2023-05-18 23:22:27 - SimpleLog - INFO: - episode: 61, ep_reward: 127.0, ep_step: 127 |
|
2023-05-18 23:22:27 - SimpleLog - INFO: - episode: 62, ep_reward: 107.0, ep_step: 107 |
|
2023-05-18 23:22:28 - SimpleLog - INFO: - episode: 63, ep_reward: 72.0, ep_step: 72 |
|
2023-05-18 23:22:28 - SimpleLog - INFO: - update_step: 2000, online_eval_reward: 59.000 |
|
2023-05-18 23:22:28 - SimpleLog - INFO: - episode: 64, ep_reward: 70.0, ep_step: 70 |
|
2023-05-18 23:22:29 - SimpleLog - INFO: - episode: 65, ep_reward: 54.0, ep_step: 54 |
|
2023-05-18 23:22:29 - SimpleLog - INFO: - episode: 66, ep_reward: 49.0, ep_step: 49 |
|
2023-05-18 23:22:29 - SimpleLog - INFO: - episode: 67, ep_reward: 56.0, ep_step: 56 |
|
2023-05-18 23:22:29 - SimpleLog - INFO: - episode: 68, ep_reward: 69.0, ep_step: 69 |
|
2023-05-18 23:22:30 - SimpleLog - INFO: - episode: 69, ep_reward: 70.0, ep_step: 70 |
|
2023-05-18 23:22:30 - SimpleLog - INFO: - episode: 70, ep_reward: 65.0, ep_step: 65 |
|
2023-05-18 23:22:30 - SimpleLog - INFO: - episode: 71, ep_reward: 57.0, ep_step: 57 |
|
2023-05-18 23:22:31 - SimpleLog - INFO: - episode: 72, ep_reward: 50.0, ep_step: 50 |
|
2023-05-18 23:22:31 - SimpleLog - INFO: - update_step: 2500, online_eval_reward: 124.000 |
|
2023-05-18 23:22:31 - SimpleLog - INFO: - current update step obtain a better online_eval_reward: 124.000, save the best model! |
|
2023-05-18 23:22:32 - SimpleLog - INFO: - episode: 73, ep_reward: 82.0, ep_step: 82 |
|
2023-05-18 23:22:32 - SimpleLog - INFO: - episode: 74, ep_reward: 74.0, ep_step: 74 |
|
2023-05-18 23:22:33 - SimpleLog - INFO: - episode: 75, ep_reward: 93.0, ep_step: 93 |
|
2023-05-18 23:22:33 - SimpleLog - INFO: - episode: 76, ep_reward: 80.0, ep_step: 80 |
|
2023-05-18 23:22:33 - SimpleLog - INFO: - episode: 77, ep_reward: 56.0, ep_step: 56 |
|
2023-05-18 23:22:34 - SimpleLog - INFO: - episode: 78, ep_reward: 87.0, ep_step: 87 |
|
2023-05-18 23:22:34 - SimpleLog - INFO: - update_step: 3000, online_eval_reward: 68.000 |
|
2023-05-18 23:22:35 - SimpleLog - INFO: - episode: 79, ep_reward: 67.0, ep_step: 67 |
|
2023-05-18 23:22:35 - SimpleLog - INFO: - episode: 80, ep_reward: 80.0, ep_step: 80 |
|
2023-05-18 23:22:35 - SimpleLog - INFO: - episode: 81, ep_reward: 65.0, ep_step: 65 |
|
2023-05-18 23:22:36 - SimpleLog - INFO: - episode: 82, ep_reward: 79.0, ep_step: 79 |
|
2023-05-18 23:22:36 - SimpleLog - INFO: - episode: 83, ep_reward: 66.0, ep_step: 66 |
|
2023-05-18 23:22:37 - SimpleLog - INFO: - episode: 84, ep_reward: 90.0, ep_step: 90 |
|
2023-05-18 23:22:38 - SimpleLog - INFO: - update_step: 3500, online_eval_reward: 146.000 |
|
2023-05-18 23:22:38 - SimpleLog - INFO: - current update step obtain a better online_eval_reward: 146.000, save the best model! |
|
2023-05-18 23:22:38 - SimpleLog - INFO: - episode: 85, ep_reward: 134.0, ep_step: 134 |
|
2023-05-18 23:22:39 - SimpleLog - INFO: - episode: 86, ep_reward: 156.0, ep_step: 156 |
|
2023-05-18 23:22:40 - SimpleLog - INFO: - episode: 87, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:41 - SimpleLog - INFO: - update_step: 4000, online_eval_reward: 185.000 |
|
2023-05-18 23:22:41 - SimpleLog - INFO: - current update step obtain a better online_eval_reward: 185.000, save the best model! |
|
2023-05-18 23:22:42 - SimpleLog - INFO: - episode: 88, ep_reward: 196.0, ep_step: 196 |
|
2023-05-18 23:22:43 - SimpleLog - INFO: - episode: 89, ep_reward: 190.0, ep_step: 190 |
|
2023-05-18 23:22:44 - SimpleLog - INFO: - episode: 90, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:45 - SimpleLog - INFO: - update_step: 4500, online_eval_reward: 200.000 |
|
2023-05-18 23:22:45 - SimpleLog - INFO: - current update step obtain a better online_eval_reward: 200.000, save the best model! |
|
2023-05-18 23:22:46 - SimpleLog - INFO: - episode: 91, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:47 - SimpleLog - INFO: - episode: 92, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:48 - SimpleLog - INFO: - update_step: 5000, online_eval_reward: 200.000 |
|
2023-05-18 23:22:48 - SimpleLog - INFO: - current update step obtain a better online_eval_reward: 200.000, save the best model! |
|
2023-05-18 23:22:49 - SimpleLog - INFO: - episode: 93, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:50 - SimpleLog - INFO: - episode: 94, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:51 - SimpleLog - INFO: - episode: 95, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:52 - SimpleLog - INFO: - update_step: 5500, online_eval_reward: 200.000 |
|
2023-05-18 23:22:52 - SimpleLog - INFO: - current update step obtain a better online_eval_reward: 200.000, save the best model! |
|
2023-05-18 23:22:53 - SimpleLog - INFO: - episode: 96, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:54 - SimpleLog - INFO: - episode: 97, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:55 - SimpleLog - INFO: - update_step: 6000, online_eval_reward: 200.000 |
|
2023-05-18 23:22:55 - SimpleLog - INFO: - current update step obtain a better online_eval_reward: 200.000, save the best model! |
|
2023-05-18 23:22:56 - SimpleLog - INFO: - episode: 98, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:57 - SimpleLog - INFO: - episode: 99, ep_reward: 200.0, ep_step: 200 |
|
2023-05-18 23:22:57 - SimpleLog - INFO: - Finish training! total time consumed: 41.45s |