diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1132 @@
+[2023-02-24 13:43:01,056][11586] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2023-02-24 13:43:01,060][11586] Rollout worker 0 uses device cpu
+[2023-02-24 13:43:01,063][11586] Rollout worker 1 uses device cpu
+[2023-02-24 13:43:01,065][11586] Rollout worker 2 uses device cpu
+[2023-02-24 13:43:01,067][11586] Rollout worker 3 uses device cpu
+[2023-02-24 13:43:01,068][11586] Rollout worker 4 uses device cpu
+[2023-02-24 13:43:01,070][11586] Rollout worker 5 uses device cpu
+[2023-02-24 13:43:01,072][11586] Rollout worker 6 uses device cpu
+[2023-02-24 13:43:01,073][11586] Rollout worker 7 uses device cpu
+[2023-02-24 13:43:01,263][11586] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-24 13:43:01,266][11586] InferenceWorker_p0-w0: min num requests: 2
+[2023-02-24 13:43:01,302][11586] Starting all processes...
+[2023-02-24 13:43:01,306][11586] Starting process learner_proc0
+[2023-02-24 13:43:01,360][11586] Starting all processes...
+[2023-02-24 13:43:01,372][11586] Starting process inference_proc0-0
+[2023-02-24 13:43:01,374][11586] Starting process rollout_proc0
+[2023-02-24 13:43:01,374][11586] Starting process rollout_proc1
+[2023-02-24 13:43:01,374][11586] Starting process rollout_proc2
+[2023-02-24 13:43:01,375][11586] Starting process rollout_proc3
+[2023-02-24 13:43:01,375][11586] Starting process rollout_proc4
+[2023-02-24 13:43:01,375][11586] Starting process rollout_proc5
+[2023-02-24 13:43:01,375][11586] Starting process rollout_proc6
+[2023-02-24 13:43:01,375][11586] Starting process rollout_proc7
+[2023-02-24 13:43:13,148][16334] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-24 13:43:13,148][16334] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2023-02-24 13:43:13,541][16350] Worker 1 uses CPU cores [1]
+[2023-02-24 13:43:13,725][16353] Worker 4 uses CPU cores [0]
+[2023-02-24 13:43:14,142][16348] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-24 13:43:14,148][16348] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2023-02-24 13:43:14,192][16354] Worker 5 uses CPU cores [1]
+[2023-02-24 13:43:14,235][16355] Worker 7 uses CPU cores [1]
+[2023-02-24 13:43:14,284][16352] Worker 2 uses CPU cores [0]
+[2023-02-24 13:43:14,295][16351] Worker 3 uses CPU cores [1]
+[2023-02-24 13:43:14,317][16349] Worker 0 uses CPU cores [0]
+[2023-02-24 13:43:14,329][16356] Worker 6 uses CPU cores [0]
+[2023-02-24 13:43:14,388][16348] Num visible devices: 1
+[2023-02-24 13:43:14,390][16334] Num visible devices: 1
+[2023-02-24 13:43:14,399][16334] Starting seed is not provided
+[2023-02-24 13:43:14,400][16334] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-24 13:43:14,400][16334] Initializing actor-critic model on device cuda:0
+[2023-02-24 13:43:14,400][16334] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-24 13:43:14,403][16334] RunningMeanStd input shape: (1,)
+[2023-02-24 13:43:14,416][16334] ConvEncoder: input_channels=3
+[2023-02-24 13:43:14,698][16334] Conv encoder output size: 512
+[2023-02-24 13:43:14,698][16334] Policy head output size: 512
+[2023-02-24 13:43:14,748][16334] Created Actor Critic model with architecture:
+[2023-02-24 13:43:14,748][16334] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2023-02-24 13:43:21,254][11586] Heartbeat connected on Batcher_0
+[2023-02-24 13:43:21,264][11586] Heartbeat connected on InferenceWorker_p0-w0
+[2023-02-24 13:43:21,275][11586] Heartbeat connected on RolloutWorker_w0
+[2023-02-24 13:43:21,280][11586] Heartbeat connected on RolloutWorker_w1
+[2023-02-24 13:43:21,284][11586] Heartbeat connected on RolloutWorker_w2
+[2023-02-24 13:43:21,287][11586] Heartbeat connected on RolloutWorker_w3
+[2023-02-24 13:43:21,292][11586] Heartbeat connected on RolloutWorker_w4
+[2023-02-24 13:43:21,297][11586] Heartbeat connected on RolloutWorker_w5
+[2023-02-24 13:43:21,300][11586] Heartbeat connected on RolloutWorker_w6
+[2023-02-24 13:43:21,303][11586] Heartbeat connected on RolloutWorker_w7
+[2023-02-24 13:43:21,689][16334] Using optimizer 
+[2023-02-24 13:43:21,690][16334] No checkpoints found
+[2023-02-24 13:43:21,690][16334] Did not load from checkpoint, starting from scratch!
+[2023-02-24 13:43:21,691][16334] Initialized policy 0 weights for model version 0
+[2023-02-24 13:43:21,694][16334] LearnerWorker_p0 finished initialization!
+[2023-02-24 13:43:21,696][11586] Heartbeat connected on LearnerWorker_p0
+[2023-02-24 13:43:21,695][16334] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-24 13:43:21,893][11586] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-24 13:43:21,927][16348] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-24 13:43:21,928][16348] RunningMeanStd input shape: (1,)
+[2023-02-24 13:43:21,941][16348] ConvEncoder: input_channels=3
+[2023-02-24 13:43:22,044][16348] Conv encoder output size: 512
+[2023-02-24 13:43:22,044][16348] Policy head output size: 512
+[2023-02-24 13:43:25,247][11586] Inference worker 0-0 is ready!
+[2023-02-24 13:43:25,250][11586] All inference workers are ready! Signal rollout workers to start!
+[2023-02-24 13:43:25,379][16351] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-24 13:43:25,449][16356] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-24 13:43:25,466][16349] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-24 13:43:25,472][16350] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-24 13:43:25,477][16352] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-24 13:43:25,487][16353] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-24 13:43:25,495][16354] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-24 13:43:25,569][16355] Doom resolution: 160x120, resize resolution: (128, 72)
+[2023-02-24 13:43:26,894][11586] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-24 13:43:26,935][16353] Decorrelating experience for 0 frames...
+[2023-02-24 13:43:26,936][16356] Decorrelating experience for 0 frames...
+[2023-02-24 13:43:27,354][16351] Decorrelating experience for 0 frames...
+[2023-02-24 13:43:27,395][16354] Decorrelating experience for 0 frames...
+[2023-02-24 13:43:27,399][16350] Decorrelating experience for 0 frames...
+[2023-02-24 13:43:27,415][16355] Decorrelating experience for 0 frames...
+[2023-02-24 13:43:28,046][16350] Decorrelating experience for 32 frames...
+[2023-02-24 13:43:28,376][16349] Decorrelating experience for 0 frames...
+[2023-02-24 13:43:28,378][16356] Decorrelating experience for 32 frames...
+[2023-02-24 13:43:28,432][16353] Decorrelating experience for 32 frames...
+[2023-02-24 13:43:28,715][16352] Decorrelating experience for 0 frames...
+[2023-02-24 13:43:28,999][16349] Decorrelating experience for 32 frames...
+[2023-02-24 13:43:29,265][16351] Decorrelating experience for 32 frames...
+[2023-02-24 13:43:29,436][16354] Decorrelating experience for 32 frames...
+[2023-02-24 13:43:29,817][16350] Decorrelating experience for 64 frames...
+[2023-02-24 13:43:29,985][16353] Decorrelating experience for 64 frames...
+[2023-02-24 13:43:30,116][16349] Decorrelating experience for 64 frames...
+[2023-02-24 13:43:30,257][16352] Decorrelating experience for 32 frames...
+[2023-02-24 13:43:30,398][16351] Decorrelating experience for 64 frames...
+[2023-02-24 13:43:30,783][16350] Decorrelating experience for 96 frames...
+[2023-02-24 13:43:31,018][16354] Decorrelating experience for 64 frames...
+[2023-02-24 13:43:31,292][16356] Decorrelating experience for 64 frames...
+[2023-02-24 13:43:31,551][16352] Decorrelating experience for 64 frames...
+[2023-02-24 13:43:31,591][16354] Decorrelating experience for 96 frames...
+[2023-02-24 13:43:31,893][11586] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-24 13:43:32,082][16349] Decorrelating experience for 96 frames...
+[2023-02-24 13:43:32,536][16355] Decorrelating experience for 32 frames...
+[2023-02-24 13:43:32,595][16352] Decorrelating experience for 96 frames...
+[2023-02-24 13:43:32,920][16353] Decorrelating experience for 96 frames...
+[2023-02-24 13:43:32,993][16351] Decorrelating experience for 96 frames...
+[2023-02-24 13:43:33,415][16356] Decorrelating experience for 96 frames...
+[2023-02-24 13:43:33,547][16355] Decorrelating experience for 64 frames...
+[2023-02-24 13:43:33,878][16355] Decorrelating experience for 96 frames...
+[2023-02-24 13:43:36,893][11586] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 3.1. Samples: 46. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2023-02-24 13:43:36,899][11586] Avg episode reward: [(0, '1.695')]
+[2023-02-24 13:43:37,550][16334] Signal inference workers to stop experience collection...
+[2023-02-24 13:43:37,573][16348] InferenceWorker_p0-w0: stopping experience collection
+[2023-02-24 13:43:40,413][16334] Signal inference workers to resume experience collection...
+[2023-02-24 13:43:40,415][16348] InferenceWorker_p0-w0: resuming experience collection
+[2023-02-24 13:43:41,895][11586] Fps is (10 sec: 409.5, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 112.6. Samples: 2252. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2023-02-24 13:43:41,903][11586] Avg episode reward: [(0, '2.277')]
+[2023-02-24 13:43:46,893][11586] Fps is (10 sec: 2048.0, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 20480. Throughput: 0: 214.5. Samples: 5362. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-24 13:43:46,899][11586] Avg episode reward: [(0, '3.573')]
+[2023-02-24 13:43:51,320][16348] Updated weights for policy 0, policy_version 10 (0.0598)
+[2023-02-24 13:43:51,893][11586] Fps is (10 sec: 3687.2, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 40960. Throughput: 0: 280.5. Samples: 8414. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:43:51,896][11586] Avg episode reward: [(0, '4.038')]
+[2023-02-24 13:43:56,893][11586] Fps is (10 sec: 4095.7, 60 sec: 1755.4, 300 sec: 1755.4). Total num frames: 61440. Throughput: 0: 424.1. Samples: 14844. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-24 13:43:56,898][11586] Avg episode reward: [(0, '4.372')]
+[2023-02-24 13:44:01,893][11586] Fps is (10 sec: 3276.8, 60 sec: 1843.2, 300 sec: 1843.2). Total num frames: 73728. Throughput: 0: 482.0. Samples: 19280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:44:01,896][11586] Avg episode reward: [(0, '4.520')]
+[2023-02-24 13:44:03,901][16348] Updated weights for policy 0, policy_version 20 (0.0012)
+[2023-02-24 13:44:06,893][11586] Fps is (10 sec: 2867.4, 60 sec: 2002.5, 300 sec: 2002.5). Total num frames: 90112. Throughput: 0: 473.3. Samples: 21298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:44:06,896][11586] Avg episode reward: [(0, '4.434')]
+[2023-02-24 13:44:11,893][11586] Fps is (10 sec: 3686.4, 60 sec: 2211.8, 300 sec: 2211.8). Total num frames: 110592. Throughput: 0: 592.9. Samples: 26678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:44:11,896][11586] Avg episode reward: [(0, '4.254')]
+[2023-02-24 13:44:11,900][16334] Saving new best policy, reward=4.254!
+[2023-02-24 13:44:14,707][16348] Updated weights for policy 0, policy_version 30 (0.0014)
+[2023-02-24 13:44:16,893][11586] Fps is (10 sec: 4096.0, 60 sec: 2383.1, 300 sec: 2383.1). Total num frames: 131072. Throughput: 0: 733.6. Samples: 33012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:44:16,896][11586] Avg episode reward: [(0, '4.381')]
+[2023-02-24 13:44:16,913][16334] Saving new best policy, reward=4.381!
+[2023-02-24 13:44:21,896][11586] Fps is (10 sec: 3275.8, 60 sec: 2389.2, 300 sec: 2389.2). Total num frames: 143360. Throughput: 0: 779.3. Samples: 35116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:44:21,898][11586] Avg episode reward: [(0, '4.486')]
+[2023-02-24 13:44:21,905][16334] Saving new best policy, reward=4.486!
+[2023-02-24 13:44:26,893][11586] Fps is (10 sec: 2457.6, 60 sec: 2594.2, 300 sec: 2394.6). Total num frames: 155648. Throughput: 0: 818.0. Samples: 39060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:44:26,898][11586] Avg episode reward: [(0, '4.422')]
+[2023-02-24 13:44:28,184][16348] Updated weights for policy 0, policy_version 40 (0.0041)
+[2023-02-24 13:44:31,893][11586] Fps is (10 sec: 3277.8, 60 sec: 2935.5, 300 sec: 2516.1). Total num frames: 176128. Throughput: 0: 876.0. Samples: 44784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:44:31,900][11586] Avg episode reward: [(0, '4.453')]
+[2023-02-24 13:44:36,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 2621.4). Total num frames: 196608. Throughput: 0: 878.4. Samples: 47944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:44:36,900][11586] Avg episode reward: [(0, '4.511')]
+[2023-02-24 13:44:36,936][16334] Saving new best policy, reward=4.511!
+[2023-02-24 13:44:38,401][16348] Updated weights for policy 0, policy_version 50 (0.0022)
+[2023-02-24 13:44:41,896][11586] Fps is (10 sec: 3685.1, 60 sec: 3481.5, 300 sec: 2662.3). Total num frames: 212992. Throughput: 0: 846.9. Samples: 52956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:44:41,899][11586] Avg episode reward: [(0, '4.567')]
+[2023-02-24 13:44:41,905][16334] Saving new best policy, reward=4.567!
+[2023-02-24 13:44:46,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 2650.4). Total num frames: 225280. Throughput: 0: 833.8. Samples: 56802. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-24 13:44:46,895][11586] Avg episode reward: [(0, '4.486')]
+[2023-02-24 13:44:51,495][16348] Updated weights for policy 0, policy_version 60 (0.0016)
+[2023-02-24 13:44:51,893][11586] Fps is (10 sec: 3277.9, 60 sec: 3413.3, 300 sec: 2730.7). Total num frames: 245760. Throughput: 0: 852.5. Samples: 59662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:44:51,901][11586] Avg episode reward: [(0, '4.393')]
+[2023-02-24 13:44:56,893][11586] Fps is (10 sec: 4095.9, 60 sec: 3413.4, 300 sec: 2802.5). Total num frames: 266240. Throughput: 0: 873.6. Samples: 65992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:44:56,895][11586] Avg episode reward: [(0, '4.416')]
+[2023-02-24 13:44:56,903][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_266240.pth...
+[2023-02-24 13:45:01,893][11586] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 2785.3). Total num frames: 278528. Throughput: 0: 837.2. Samples: 70688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:45:01,898][11586] Avg episode reward: [(0, '4.298')]
+[2023-02-24 13:45:03,560][16348] Updated weights for policy 0, policy_version 70 (0.0014)
+[2023-02-24 13:45:06,894][11586] Fps is (10 sec: 2866.8, 60 sec: 3413.2, 300 sec: 2808.6). Total num frames: 294912. Throughput: 0: 834.7. Samples: 72674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:45:06,898][11586] Avg episode reward: [(0, '4.312')]
+[2023-02-24 13:45:11,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 2867.2). Total num frames: 315392. Throughput: 0: 862.9. Samples: 77890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:45:11,899][11586] Avg episode reward: [(0, '4.443')]
+[2023-02-24 13:45:14,601][16348] Updated weights for policy 0, policy_version 80 (0.0035)
+[2023-02-24 13:45:16,895][11586] Fps is (10 sec: 4095.6, 60 sec: 3413.2, 300 sec: 2920.6). Total num frames: 335872. Throughput: 0: 878.6. Samples: 84324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:45:16,898][11586] Avg episode reward: [(0, '4.337')]
+[2023-02-24 13:45:21,893][11586] Fps is (10 sec: 3276.6, 60 sec: 3413.5, 300 sec: 2901.3). Total num frames: 348160. Throughput: 0: 863.0. Samples: 86778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:45:21,898][11586] Avg episode reward: [(0, '4.352')]
+[2023-02-24 13:45:26,893][11586] Fps is (10 sec: 2867.9, 60 sec: 3481.6, 300 sec: 2916.4). Total num frames: 364544. Throughput: 0: 840.3. Samples: 90766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-24 13:45:26,901][11586] Avg episode reward: [(0, '4.355')]
+[2023-02-24 13:45:27,920][16348] Updated weights for policy 0, policy_version 90 (0.0011)
+[2023-02-24 13:45:31,893][11586] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 2961.7). Total num frames: 385024. Throughput: 0: 876.8. Samples: 96260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:45:31,896][11586] Avg episode reward: [(0, '4.426')]
+[2023-02-24 13:45:36,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3003.7). Total num frames: 405504. Throughput: 0: 884.0. Samples: 99440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:45:36,896][11586] Avg episode reward: [(0, '4.539')]
+[2023-02-24 13:45:37,482][16348] Updated weights for policy 0, policy_version 100 (0.0016)
+[2023-02-24 13:45:41,896][11586] Fps is (10 sec: 3275.9, 60 sec: 3413.4, 300 sec: 2984.2). Total num frames: 417792. Throughput: 0: 864.0. Samples: 104872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:45:41,899][11586] Avg episode reward: [(0, '4.483')]
+[2023-02-24 13:45:46,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 2994.3). Total num frames: 434176. Throughput: 0: 848.9. Samples: 108890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:45:46,899][11586] Avg episode reward: [(0, '4.622')]
+[2023-02-24 13:45:46,910][16334] Saving new best policy, reward=4.622!
+[2023-02-24 13:45:51,009][16348] Updated weights for policy 0, policy_version 110 (0.0012)
+[2023-02-24 13:45:51,893][11586] Fps is (10 sec: 3277.7, 60 sec: 3413.3, 300 sec: 3003.7). Total num frames: 450560. Throughput: 0: 860.2. Samples: 111382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:45:51,895][11586] Avg episode reward: [(0, '4.513')]
+[2023-02-24 13:45:56,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3065.4). Total num frames: 475136. Throughput: 0: 884.5. Samples: 117692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:45:56,896][11586] Avg episode reward: [(0, '4.300')]
+[2023-02-24 13:46:01,893][11586] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3046.4). Total num frames: 487424. Throughput: 0: 853.4. Samples: 122726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:46:01,896][11586] Avg episode reward: [(0, '4.235')]
+[2023-02-24 13:46:02,263][16348] Updated weights for policy 0, policy_version 120 (0.0013)
+[2023-02-24 13:46:06,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3053.4). Total num frames: 503808. Throughput: 0: 844.1. Samples: 124762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-24 13:46:06,901][11586] Avg episode reward: [(0, '4.430')]
+[2023-02-24 13:46:11,893][11586] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3060.0). Total num frames: 520192. Throughput: 0: 862.0. Samples: 129556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:46:11,896][11586] Avg episode reward: [(0, '4.491')]
+[2023-02-24 13:46:14,219][16348] Updated weights for policy 0, policy_version 130 (0.0017)
+[2023-02-24 13:46:16,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3089.6). Total num frames: 540672. Throughput: 0: 880.6. Samples: 135888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:46:16,901][11586] Avg episode reward: [(0, '4.493')]
+[2023-02-24 13:46:21,895][11586] Fps is (10 sec: 3685.4, 60 sec: 3481.5, 300 sec: 3094.7). Total num frames: 557056. Throughput: 0: 869.1. Samples: 138550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-24 13:46:21,897][11586] Avg episode reward: [(0, '4.518')]
+[2023-02-24 13:46:26,894][11586] Fps is (10 sec: 2866.8, 60 sec: 3413.3, 300 sec: 3077.5). Total num frames: 569344. Throughput: 0: 835.4. Samples: 142462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-24 13:46:26,903][11586] Avg episode reward: [(0, '4.567')]
+[2023-02-24 13:46:27,154][16348] Updated weights for policy 0, policy_version 140 (0.0021)
+[2023-02-24 13:46:31,893][11586] Fps is (10 sec: 3277.6, 60 sec: 3413.3, 300 sec: 3104.3). Total num frames: 589824. Throughput: 0: 864.6. Samples: 147796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-24 13:46:31,901][11586] Avg episode reward: [(0, '4.339')]
+[2023-02-24 13:46:36,893][11586] Fps is (10 sec: 4096.5, 60 sec: 3413.3, 300 sec: 3129.8). Total num frames: 610304. Throughput: 0: 881.2. Samples: 151036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:46:36,899][11586] Avg episode reward: [(0, '4.255')]
+[2023-02-24 13:46:37,080][16348] Updated weights for policy 0, policy_version 150 (0.0013)
+[2023-02-24 13:46:41,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3133.4). Total num frames: 626688. Throughput: 0: 867.1. Samples: 156712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:46:41,898][11586] Avg episode reward: [(0, '4.387')]
+[2023-02-24 13:46:46,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3117.0). Total num frames: 638976. Throughput: 0: 846.1. Samples: 160800. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:46:46,896][11586] Avg episode reward: [(0, '4.509')]
+[2023-02-24 13:46:50,409][16348] Updated weights for policy 0, policy_version 160 (0.0012)
+[2023-02-24 13:46:51,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3140.3). Total num frames: 659456. Throughput: 0: 855.0. Samples: 163236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-24 13:46:51,902][11586] Avg episode reward: [(0, '4.575')]
+[2023-02-24 13:46:56,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3162.5). Total num frames: 679936. Throughput: 0: 890.8. Samples: 169642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:46:56,895][11586] Avg episode reward: [(0, '4.579')]
+[2023-02-24 13:46:56,966][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000167_684032.pth...
+[2023-02-24 13:47:00,411][16348] Updated weights for policy 0, policy_version 170 (0.0013)
+[2023-02-24 13:47:01,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3165.1). Total num frames: 696320. Throughput: 0: 868.8. Samples: 174984. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:47:01,899][11586] Avg episode reward: [(0, '4.408')]
+[2023-02-24 13:47:06,895][11586] Fps is (10 sec: 3276.0, 60 sec: 3481.5, 300 sec: 3167.5). Total num frames: 712704. Throughput: 0: 855.2. Samples: 177034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-24 13:47:06,900][11586] Avg episode reward: [(0, '4.368')]
+[2023-02-24 13:47:11,894][11586] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3169.9). Total num frames: 729088. Throughput: 0: 876.0. Samples: 181884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:47:11,898][11586] Avg episode reward: [(0, '4.486')]
+[2023-02-24 13:47:13,127][16348] Updated weights for policy 0, policy_version 180 (0.0022)
+[2023-02-24 13:47:16,893][11586] Fps is (10 sec: 4096.9, 60 sec: 3549.8, 300 sec: 3207.1). Total num frames: 753664. Throughput: 0: 901.5. Samples: 188366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:47:16,895][11586] Avg episode reward: [(0, '4.552')]
+[2023-02-24 13:47:21,895][11586] Fps is (10 sec: 3686.1, 60 sec: 3481.6, 300 sec: 3191.4). Total num frames: 765952. Throughput: 0: 893.6. Samples: 191248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:47:21,901][11586] Avg episode reward: [(0, '4.705')]
+[2023-02-24 13:47:21,970][16334] Saving new best policy, reward=4.705!
+[2023-02-24 13:47:25,175][16348] Updated weights for policy 0, policy_version 190 (0.0016)
+[2023-02-24 13:47:26,893][11586] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3193.2). Total num frames: 782336. Throughput: 0: 855.1. Samples: 195190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:47:26,902][11586] Avg episode reward: [(0, '4.601')]
+[2023-02-24 13:47:31,893][11586] Fps is (10 sec: 3277.6, 60 sec: 3481.6, 300 sec: 3194.9). Total num frames: 798720. Throughput: 0: 882.8. Samples: 200524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:47:31,896][11586] Avg episode reward: [(0, '4.591')]
+[2023-02-24 13:47:35,713][16348] Updated weights for policy 0, policy_version 200 (0.0019)
+[2023-02-24 13:47:36,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3228.6). Total num frames: 823296. Throughput: 0: 900.7. Samples: 203766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:47:36,895][11586] Avg episode reward: [(0, '4.746')]
+[2023-02-24 13:47:36,917][16334] Saving new best policy, reward=4.746!
+[2023-02-24 13:47:41,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3229.5). Total num frames: 839680. Throughput: 0: 882.6. Samples: 209358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:47:41,899][11586] Avg episode reward: [(0, '4.573')]
+[2023-02-24 13:47:46,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3215.0). Total num frames: 851968. Throughput: 0: 852.6. Samples: 213350. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:47:46,898][11586] Avg episode reward: [(0, '4.585')]
+[2023-02-24 13:47:48,965][16348] Updated weights for policy 0, policy_version 210 (0.0013)
+[2023-02-24 13:47:51,893][11586] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3216.1). Total num frames: 868352. Throughput: 0: 859.1. Samples: 215690. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-24 13:47:51,896][11586] Avg episode reward: [(0, '4.810')]
+[2023-02-24 13:47:51,903][16334] Saving new best policy, reward=4.810!
+[2023-02-24 13:47:56,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3247.0). Total num frames: 892928. Throughput: 0: 893.5. Samples: 222088. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-24 13:47:56,900][11586] Avg episode reward: [(0, '4.848')]
+[2023-02-24 13:47:56,913][16334] Saving new best policy, reward=4.848!
+[2023-02-24 13:47:58,845][16348] Updated weights for policy 0, policy_version 220 (0.0015)
+[2023-02-24 13:48:01,893][11586] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3232.9). Total num frames: 905216. Throughput: 0: 864.9. Samples: 227286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:48:01,899][11586] Avg episode reward: [(0, '4.655')]
+[2023-02-24 13:48:06,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3233.7). Total num frames: 921600. Throughput: 0: 846.4. Samples: 229336. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-24 13:48:06,900][11586] Avg episode reward: [(0, '4.524')]
+[2023-02-24 13:48:11,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3234.4). Total num frames: 937984. Throughput: 0: 861.7. Samples: 233968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2023-02-24 13:48:11,900][11586] Avg episode reward: [(0, '4.697')]
+[2023-02-24 13:48:12,247][16348] Updated weights for policy 0, policy_version 230 (0.0014)
+[2023-02-24 13:48:16,896][11586] Fps is (10 sec: 3685.3, 60 sec: 3413.2, 300 sec: 3249.0). Total num frames: 958464. Throughput: 0: 885.5. Samples: 240374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-24 13:48:16,900][11586] Avg episode reward: [(0, '4.751')]
+[2023-02-24 13:48:21,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3304.6). Total num frames: 974848. Throughput: 0: 881.1. Samples: 243414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:48:21,896][11586] Avg episode reward: [(0, '4.746')]
+[2023-02-24 13:48:23,535][16348] Updated weights for policy 0, policy_version 240 (0.0012)
+[2023-02-24 13:48:26,894][11586] Fps is (10 sec: 3277.3, 60 sec: 3481.5, 300 sec: 3360.1). Total num frames: 991232. Throughput: 0: 846.1. Samples: 247434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:48:26,897][11586] Avg episode reward: [(0, '5.067')]
+[2023-02-24 13:48:26,908][16334] Saving new best policy, reward=5.067!
+[2023-02-24 13:48:31,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 1007616. Throughput: 0: 867.1. Samples: 252370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:48:31,895][11586] Avg episode reward: [(0, '4.927')]
+[2023-02-24 13:48:35,219][16348] Updated weights for policy 0, policy_version 250 (0.0024)
+[2023-02-24 13:48:36,893][11586] Fps is (10 sec: 3686.9, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1028096. Throughput: 0: 886.1. Samples: 255564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:48:36,895][11586] Avg episode reward: [(0, '5.010')]
+[2023-02-24 13:48:41,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1048576. Throughput: 0: 877.6. Samples: 261578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:48:41,898][11586] Avg episode reward: [(0, '5.279')]
+[2023-02-24 13:48:41,908][16334] Saving new best policy, reward=5.279!
+[2023-02-24 13:48:46,894][11586] Fps is (10 sec: 3276.4, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 1060864. Throughput: 0: 850.5. Samples: 265560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:48:46,901][11586] Avg episode reward: [(0, '5.089')]
+[2023-02-24 13:48:48,081][16348] Updated weights for policy 0, policy_version 260 (0.0038)
+[2023-02-24 13:48:51,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1077248. Throughput: 0: 850.0. Samples: 267588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:48:51,898][11586] Avg episode reward: [(0, '5.068')]
+[2023-02-24 13:48:56,893][11586] Fps is (10 sec: 3686.9, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1097728. Throughput: 0: 889.6. Samples: 273998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:48:56,898][11586] Avg episode reward: [(0, '5.328')]
+[2023-02-24 13:48:56,909][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth...
+[2023-02-24 13:48:57,046][16334] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_266240.pth
+[2023-02-24 13:48:57,058][16334] Saving new best policy, reward=5.328!
+[2023-02-24 13:48:58,244][16348] Updated weights for policy 0, policy_version 270 (0.0014)
+[2023-02-24 13:49:01,896][11586] Fps is (10 sec: 3685.1, 60 sec: 3481.4, 300 sec: 3471.1). Total num frames: 1114112. Throughput: 0: 870.3. Samples: 279540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:49:01,899][11586] Avg episode reward: [(0, '5.518')]
+[2023-02-24 13:49:01,902][16334] Saving new best policy, reward=5.518!
+[2023-02-24 13:49:06,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1126400. Throughput: 0: 845.2. Samples: 281446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:49:06,895][11586] Avg episode reward: [(0, '5.638')]
+[2023-02-24 13:49:06,907][16334] Saving new best policy, reward=5.638!
+[2023-02-24 13:49:11,570][16348] Updated weights for policy 0, policy_version 280 (0.0013)
+[2023-02-24 13:49:11,896][11586] Fps is (10 sec: 3276.9, 60 sec: 3481.4, 300 sec: 3443.4). Total num frames: 1146880. Throughput: 0: 852.1. Samples: 285778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:49:11,899][11586] Avg episode reward: [(0, '5.723')]
+[2023-02-24 13:49:11,902][16334] Saving new best policy, reward=5.723!
+[2023-02-24 13:49:16,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3481.8, 300 sec: 3471.2). Total num frames: 1167360. Throughput: 0: 883.2. Samples: 292112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:49:16,900][11586] Avg episode reward: [(0, '5.731')]
+[2023-02-24 13:49:16,911][16334] Saving new best policy, reward=5.731!
+[2023-02-24 13:49:21,893][11586] Fps is (10 sec: 3687.5, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1183744. Throughput: 0: 880.7. Samples: 295194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:49:21,899][11586] Avg episode reward: [(0, '5.716')]
+[2023-02-24 13:49:22,766][16348] Updated weights for policy 0, policy_version 290 (0.0015)
+[2023-02-24 13:49:26,893][11586] Fps is (10 sec: 2867.1, 60 sec: 3413.4, 300 sec: 3457.3). Total num frames: 1196032. Throughput: 0: 838.0. Samples: 299288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:49:26,896][11586] Avg episode reward: [(0, '5.513')]
+[2023-02-24 13:49:31,893][11586] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1212416. Throughput: 0: 849.8. Samples: 303802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:49:31,896][11586] Avg episode reward: [(0, '5.522')]
+[2023-02-24 13:49:35,077][16348] Updated weights for policy 0, policy_version 300 (0.0017)
+[2023-02-24 13:49:36,893][11586] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 1232896. Throughput: 0: 874.8. Samples: 306956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:49:36,898][11586] Avg episode reward: [(0, '5.655')]
+[2023-02-24 13:49:41,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 1253376. Throughput: 0: 873.6. Samples: 313312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:49:41,900][11586] Avg episode reward: [(0, '5.586')]
+[2023-02-24 13:49:46,893][11586] Fps is (10 sec: 3276.7, 60 sec: 3413.4, 300 sec: 3457.3). Total num frames: 1265664. Throughput: 0: 840.4. Samples: 317354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-24 13:49:46,901][11586] Avg episode reward: [(0, '5.876')]
+[2023-02-24 13:49:46,915][16334] Saving new best policy, reward=5.876!
+[2023-02-24 13:49:47,267][16348] Updated weights for policy 0, policy_version 310 (0.0028)
+[2023-02-24 13:49:51,893][11586] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1282048. Throughput: 0: 840.8. Samples: 319282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-24 13:49:51,898][11586] Avg episode reward: [(0, '5.866')]
+[2023-02-24 13:49:56,893][11586] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1302528. Throughput: 0: 878.9. Samples: 325326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-24 13:49:56,898][11586] Avg episode reward: [(0, '5.982')]
+[2023-02-24 13:49:56,909][16334] Saving new best policy, reward=5.982!
+[2023-02-24 13:49:58,176][16348] Updated weights for policy 0, policy_version 320 (0.0021)
+[2023-02-24 13:50:01,893][11586] Fps is (10 sec: 4096.2, 60 sec: 3481.8, 300 sec: 3485.1). Total num frames: 1323008. Throughput: 0: 866.3. Samples: 331094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:50:01,897][11586] Avg episode reward: [(0, '5.840')]
+[2023-02-24 13:50:06,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1335296. Throughput: 0: 843.0. Samples: 333130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-24 13:50:06,897][11586] Avg episode reward: [(0, '5.878')]
+[2023-02-24 13:50:11,470][16348] Updated weights for policy 0, policy_version 330 (0.0035)
+[2023-02-24 13:50:11,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3413.5, 300 sec: 3443.4). Total num frames: 1351680. Throughput: 0: 843.5. Samples: 337246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-24 13:50:11,896][11586] Avg episode reward: [(0, '5.984')]
+[2023-02-24 13:50:11,905][16334] Saving new best policy, reward=5.984!
+[2023-02-24 13:50:16,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1372160. Throughput: 0: 884.3. Samples: 343594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:50:16,901][11586] Avg episode reward: [(0, '6.574')] +[2023-02-24 13:50:16,916][16334] Saving new best policy, reward=6.574! +[2023-02-24 13:50:21,895][11586] Fps is (10 sec: 3685.6, 60 sec: 3413.2, 300 sec: 3471.2). Total num frames: 1388544. Throughput: 0: 885.6. Samples: 346812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:50:21,906][11586] Avg episode reward: [(0, '6.846')] +[2023-02-24 13:50:21,953][16334] Saving new best policy, reward=6.846! +[2023-02-24 13:50:21,970][16348] Updated weights for policy 0, policy_version 340 (0.0023) +[2023-02-24 13:50:26,895][11586] Fps is (10 sec: 3276.0, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 1404928. Throughput: 0: 840.2. Samples: 351124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-24 13:50:26,904][11586] Avg episode reward: [(0, '6.937')] +[2023-02-24 13:50:26,921][16334] Saving new best policy, reward=6.937! +[2023-02-24 13:50:31,893][11586] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1421312. Throughput: 0: 847.4. Samples: 355488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2023-02-24 13:50:31,896][11586] Avg episode reward: [(0, '7.134')] +[2023-02-24 13:50:31,902][16334] Saving new best policy, reward=7.134! +[2023-02-24 13:50:34,774][16348] Updated weights for policy 0, policy_version 350 (0.0021) +[2023-02-24 13:50:36,893][11586] Fps is (10 sec: 3687.3, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1441792. Throughput: 0: 873.0. Samples: 358568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-24 13:50:36,895][11586] Avg episode reward: [(0, '6.860')] +[2023-02-24 13:50:41,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1462272. Throughput: 0: 880.9. Samples: 364968. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:50:41,895][11586] Avg episode reward: [(0, '6.902')] +[2023-02-24 13:50:46,352][16348] Updated weights for policy 0, policy_version 360 (0.0014) +[2023-02-24 13:50:46,895][11586] Fps is (10 sec: 3276.1, 60 sec: 3481.5, 300 sec: 3471.2). Total num frames: 1474560. Throughput: 0: 843.9. Samples: 369072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:50:46,901][11586] Avg episode reward: [(0, '6.740')] +[2023-02-24 13:50:51,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1490944. Throughput: 0: 843.9. Samples: 371106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:50:51,895][11586] Avg episode reward: [(0, '7.033')] +[2023-02-24 13:50:56,893][11586] Fps is (10 sec: 3687.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1511424. Throughput: 0: 883.2. Samples: 376992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 13:50:56,896][11586] Avg episode reward: [(0, '8.038')] +[2023-02-24 13:50:56,907][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000369_1511424.pth... +[2023-02-24 13:50:57,028][16334] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000167_684032.pth +[2023-02-24 13:50:57,038][16334] Saving new best policy, reward=8.038! +[2023-02-24 13:50:57,663][16348] Updated weights for policy 0, policy_version 370 (0.0016) +[2023-02-24 13:51:01,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1527808. Throughput: 0: 879.9. Samples: 383188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:51:01,899][11586] Avg episode reward: [(0, '8.618')] +[2023-02-24 13:51:01,975][16334] Saving new best policy, reward=8.618! +[2023-02-24 13:51:06,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1544192. Throughput: 0: 851.8. Samples: 385142. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:51:06,899][11586] Avg episode reward: [(0, '9.310')] +[2023-02-24 13:51:06,915][16334] Saving new best policy, reward=9.310! +[2023-02-24 13:51:10,975][16348] Updated weights for policy 0, policy_version 380 (0.0021) +[2023-02-24 13:51:11,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1556480. Throughput: 0: 845.4. Samples: 389164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:51:11,901][11586] Avg episode reward: [(0, '9.794')] +[2023-02-24 13:51:11,906][16334] Saving new best policy, reward=9.794! +[2023-02-24 13:51:16,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1581056. Throughput: 0: 885.0. Samples: 395312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:51:16,896][11586] Avg episode reward: [(0, '9.723')] +[2023-02-24 13:51:20,542][16348] Updated weights for policy 0, policy_version 390 (0.0018) +[2023-02-24 13:51:21,898][11586] Fps is (10 sec: 4093.7, 60 sec: 3481.4, 300 sec: 3485.0). Total num frames: 1597440. Throughput: 0: 888.4. Samples: 398552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:51:21,903][11586] Avg episode reward: [(0, '9.744')] +[2023-02-24 13:51:26,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 1613824. Throughput: 0: 851.8. Samples: 403300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 13:51:26,899][11586] Avg episode reward: [(0, '9.423')] +[2023-02-24 13:51:31,893][11586] Fps is (10 sec: 2868.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 1626112. Throughput: 0: 852.3. Samples: 407424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 13:51:31,902][11586] Avg episode reward: [(0, '8.759')] +[2023-02-24 13:51:33,751][16348] Updated weights for policy 0, policy_version 400 (0.0019) +[2023-02-24 13:51:36,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). 
Total num frames: 1650688. Throughput: 0: 879.3. Samples: 410676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:51:36,896][11586] Avg episode reward: [(0, '8.992')] +[2023-02-24 13:51:41,893][11586] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1671168. Throughput: 0: 894.8. Samples: 417256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:51:41,895][11586] Avg episode reward: [(0, '9.703')] +[2023-02-24 13:51:44,032][16348] Updated weights for policy 0, policy_version 410 (0.0021) +[2023-02-24 13:51:46,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 1683456. Throughput: 0: 856.2. Samples: 421716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 13:51:46,899][11586] Avg episode reward: [(0, '10.794')] +[2023-02-24 13:51:46,909][16334] Saving new best policy, reward=10.794! +[2023-02-24 13:51:51,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1699840. Throughput: 0: 855.7. Samples: 423648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:51:51,901][11586] Avg episode reward: [(0, '11.394')] +[2023-02-24 13:51:51,904][16334] Saving new best policy, reward=11.394! +[2023-02-24 13:51:56,531][16348] Updated weights for policy 0, policy_version 420 (0.0026) +[2023-02-24 13:51:56,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1720320. Throughput: 0: 889.7. Samples: 429202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:51:56,898][11586] Avg episode reward: [(0, '11.226')] +[2023-02-24 13:52:01,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1740800. Throughput: 0: 896.6. Samples: 435660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:52:01,901][11586] Avg episode reward: [(0, '10.589')] +[2023-02-24 13:52:06,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1753088. 
Throughput: 0: 870.7. Samples: 437728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:52:06,895][11586] Avg episode reward: [(0, '10.150')] +[2023-02-24 13:52:08,719][16348] Updated weights for policy 0, policy_version 430 (0.0015) +[2023-02-24 13:52:11,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3443.4). Total num frames: 1769472. Throughput: 0: 856.3. Samples: 441832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 13:52:11,897][11586] Avg episode reward: [(0, '10.410')] +[2023-02-24 13:52:16,893][11586] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1789952. Throughput: 0: 897.1. Samples: 447794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:52:16,903][11586] Avg episode reward: [(0, '10.839')] +[2023-02-24 13:52:19,387][16348] Updated weights for policy 0, policy_version 440 (0.0014) +[2023-02-24 13:52:21,893][11586] Fps is (10 sec: 4095.9, 60 sec: 3550.2, 300 sec: 3485.1). Total num frames: 1810432. Throughput: 0: 896.4. Samples: 451016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 13:52:21,895][11586] Avg episode reward: [(0, '11.202')] +[2023-02-24 13:52:26,898][11586] Fps is (10 sec: 3275.2, 60 sec: 3481.3, 300 sec: 3471.1). Total num frames: 1822720. Throughput: 0: 861.0. Samples: 456004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:52:26,905][11586] Avg episode reward: [(0, '10.794')] +[2023-02-24 13:52:31,893][11586] Fps is (10 sec: 2867.1, 60 sec: 3549.8, 300 sec: 3443.4). Total num frames: 1839104. Throughput: 0: 852.6. Samples: 460082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-24 13:52:31,900][11586] Avg episode reward: [(0, '10.399')] +[2023-02-24 13:52:32,784][16348] Updated weights for policy 0, policy_version 450 (0.0017) +[2023-02-24 13:52:36,893][11586] Fps is (10 sec: 3688.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1859584. Throughput: 0: 874.3. Samples: 462990. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:52:36,895][11586] Avg episode reward: [(0, '11.061')] +[2023-02-24 13:52:41,893][11586] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1880064. Throughput: 0: 893.7. Samples: 469418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:52:41,899][11586] Avg episode reward: [(0, '11.383')] +[2023-02-24 13:52:42,034][16348] Updated weights for policy 0, policy_version 460 (0.0012) +[2023-02-24 13:52:46,894][11586] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3485.1). Total num frames: 1896448. Throughput: 0: 855.3. Samples: 474150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 13:52:46,899][11586] Avg episode reward: [(0, '12.964')] +[2023-02-24 13:52:46,919][16334] Saving new best policy, reward=12.964! +[2023-02-24 13:52:51,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1908736. Throughput: 0: 853.1. Samples: 476118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:52:51,896][11586] Avg episode reward: [(0, '13.172')] +[2023-02-24 13:52:51,900][16334] Saving new best policy, reward=13.172! +[2023-02-24 13:52:55,930][16348] Updated weights for policy 0, policy_version 470 (0.0016) +[2023-02-24 13:52:56,893][11586] Fps is (10 sec: 3277.3, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1929216. Throughput: 0: 874.2. Samples: 481172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:52:56,900][11586] Avg episode reward: [(0, '14.486')] +[2023-02-24 13:52:56,912][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000471_1929216.pth... +[2023-02-24 13:52:57,079][16334] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_1097728.pth +[2023-02-24 13:52:57,091][16334] Saving new best policy, reward=14.486! +[2023-02-24 13:53:01,893][11586] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1949696. 
Throughput: 0: 881.2. Samples: 487448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:53:01,895][11586] Avg episode reward: [(0, '14.201')] +[2023-02-24 13:53:06,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1961984. Throughput: 0: 862.4. Samples: 489826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:53:06,896][11586] Avg episode reward: [(0, '13.964')] +[2023-02-24 13:53:07,498][16348] Updated weights for policy 0, policy_version 480 (0.0020) +[2023-02-24 13:53:11,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1978368. Throughput: 0: 842.5. Samples: 493914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:53:11,898][11586] Avg episode reward: [(0, '13.351')] +[2023-02-24 13:53:16,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1998848. Throughput: 0: 882.1. Samples: 499774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:53:16,901][11586] Avg episode reward: [(0, '13.081')] +[2023-02-24 13:53:18,676][16348] Updated weights for policy 0, policy_version 490 (0.0019) +[2023-02-24 13:53:21,896][11586] Fps is (10 sec: 4094.7, 60 sec: 3481.4, 300 sec: 3485.0). Total num frames: 2019328. Throughput: 0: 888.8. Samples: 502988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 13:53:21,899][11586] Avg episode reward: [(0, '14.499')] +[2023-02-24 13:53:21,905][16334] Saving new best policy, reward=14.499! +[2023-02-24 13:53:26,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3471.2). Total num frames: 2031616. Throughput: 0: 855.6. Samples: 507918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:53:26,895][11586] Avg episode reward: [(0, '15.315')] +[2023-02-24 13:53:26,918][16334] Saving new best policy, reward=15.315! 
+[2023-02-24 13:53:31,715][16348] Updated weights for policy 0, policy_version 500 (0.0033) +[2023-02-24 13:53:31,893][11586] Fps is (10 sec: 2868.1, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 2048000. Throughput: 0: 844.4. Samples: 512148. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 13:53:31,900][11586] Avg episode reward: [(0, '15.565')] +[2023-02-24 13:53:31,904][16334] Saving new best policy, reward=15.565! +[2023-02-24 13:53:36,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 2068480. Throughput: 0: 870.6. Samples: 515294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:53:36,901][11586] Avg episode reward: [(0, '17.455')] +[2023-02-24 13:53:36,914][16334] Saving new best policy, reward=17.455! +[2023-02-24 13:53:41,239][16348] Updated weights for policy 0, policy_version 510 (0.0013) +[2023-02-24 13:53:41,897][11586] Fps is (10 sec: 4094.1, 60 sec: 3481.3, 300 sec: 3485.0). Total num frames: 2088960. Throughput: 0: 899.9. Samples: 521670. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:53:41,900][11586] Avg episode reward: [(0, '16.876')] +[2023-02-24 13:53:46,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3471.2). Total num frames: 2101248. Throughput: 0: 857.7. Samples: 526044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-24 13:53:46,899][11586] Avg episode reward: [(0, '16.652')] +[2023-02-24 13:53:51,893][11586] Fps is (10 sec: 2868.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 2117632. Throughput: 0: 850.7. Samples: 528108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:53:51,896][11586] Avg episode reward: [(0, '16.505')] +[2023-02-24 13:53:54,055][16348] Updated weights for policy 0, policy_version 520 (0.0020) +[2023-02-24 13:53:56,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2138112. Throughput: 0: 894.7. Samples: 534176. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:53:56,895][11586] Avg episode reward: [(0, '17.061')] +[2023-02-24 13:54:01,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2158592. Throughput: 0: 899.2. Samples: 540240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:54:01,895][11586] Avg episode reward: [(0, '16.498')] +[2023-02-24 13:54:05,425][16348] Updated weights for policy 0, policy_version 530 (0.0015) +[2023-02-24 13:54:06,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2170880. Throughput: 0: 873.8. Samples: 542306. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:54:06,903][11586] Avg episode reward: [(0, '17.225')] +[2023-02-24 13:54:11,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2191360. Throughput: 0: 861.6. Samples: 546688. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2023-02-24 13:54:11,903][11586] Avg episode reward: [(0, '16.198')] +[2023-02-24 13:54:16,568][16348] Updated weights for policy 0, policy_version 540 (0.0016) +[2023-02-24 13:54:16,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2211840. Throughput: 0: 914.0. Samples: 553278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:54:16,901][11586] Avg episode reward: [(0, '15.331')] +[2023-02-24 13:54:21,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3499.0). Total num frames: 2228224. Throughput: 0: 915.2. Samples: 556476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:54:21,899][11586] Avg episode reward: [(0, '15.861')] +[2023-02-24 13:54:26,893][11586] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 2244608. Throughput: 0: 867.4. Samples: 560700. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:54:26,901][11586] Avg episode reward: [(0, '14.839')] +[2023-02-24 13:54:29,519][16348] Updated weights for policy 0, policy_version 550 (0.0038) +[2023-02-24 13:54:31,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2260992. Throughput: 0: 882.8. Samples: 565768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:54:31,902][11586] Avg episode reward: [(0, '16.660')] +[2023-02-24 13:54:36,893][11586] Fps is (10 sec: 4096.3, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2285568. Throughput: 0: 909.8. Samples: 569048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:54:36,902][11586] Avg episode reward: [(0, '18.839')] +[2023-02-24 13:54:36,912][16334] Saving new best policy, reward=18.839! +[2023-02-24 13:54:38,997][16348] Updated weights for policy 0, policy_version 560 (0.0021) +[2023-02-24 13:54:41,896][11586] Fps is (10 sec: 3685.1, 60 sec: 3481.7, 300 sec: 3498.9). Total num frames: 2297856. Throughput: 0: 906.1. Samples: 574954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:54:41,899][11586] Avg episode reward: [(0, '19.053')] +[2023-02-24 13:54:41,925][16334] Saving new best policy, reward=19.053! +[2023-02-24 13:54:46,893][11586] Fps is (10 sec: 2867.1, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2314240. Throughput: 0: 861.1. Samples: 578992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-24 13:54:46,896][11586] Avg episode reward: [(0, '20.325')] +[2023-02-24 13:54:46,920][16334] Saving new best policy, reward=20.325! +[2023-02-24 13:54:51,879][16348] Updated weights for policy 0, policy_version 570 (0.0012) +[2023-02-24 13:54:51,893][11586] Fps is (10 sec: 3687.7, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2334720. Throughput: 0: 871.6. Samples: 581528. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 13:54:51,901][11586] Avg episode reward: [(0, '20.596')] +[2023-02-24 13:54:51,904][16334] Saving new best policy, reward=20.596! +[2023-02-24 13:54:56,893][11586] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3499.0). Total num frames: 2355200. Throughput: 0: 916.8. Samples: 587944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-24 13:54:56,895][11586] Avg episode reward: [(0, '21.373')] +[2023-02-24 13:54:56,915][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000575_2355200.pth... +[2023-02-24 13:54:57,032][16334] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000369_1511424.pth +[2023-02-24 13:54:57,055][16334] Saving new best policy, reward=21.373! +[2023-02-24 13:55:01,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2367488. Throughput: 0: 881.8. Samples: 592960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-24 13:55:01,895][11586] Avg episode reward: [(0, '19.517')] +[2023-02-24 13:55:03,513][16348] Updated weights for policy 0, policy_version 580 (0.0022) +[2023-02-24 13:55:06,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2383872. Throughput: 0: 854.9. Samples: 594948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:55:06,898][11586] Avg episode reward: [(0, '19.889')] +[2023-02-24 13:55:11,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2404352. Throughput: 0: 876.4. Samples: 600136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:55:11,899][11586] Avg episode reward: [(0, '20.030')] +[2023-02-24 13:55:14,453][16348] Updated weights for policy 0, policy_version 590 (0.0020) +[2023-02-24 13:55:16,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.9). Total num frames: 2424832. Throughput: 0: 907.0. Samples: 606584. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:55:16,900][11586] Avg episode reward: [(0, '19.976')] +[2023-02-24 13:55:21,894][11586] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3512.9). Total num frames: 2441216. Throughput: 0: 889.2. Samples: 609062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:55:21,898][11586] Avg episode reward: [(0, '19.782')] +[2023-02-24 13:55:26,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2453504. Throughput: 0: 846.6. Samples: 613050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:55:26,900][11586] Avg episode reward: [(0, '20.747')] +[2023-02-24 13:55:27,570][16348] Updated weights for policy 0, policy_version 600 (0.0013) +[2023-02-24 13:55:31,893][11586] Fps is (10 sec: 3277.3, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2473984. Throughput: 0: 889.6. Samples: 619026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:55:31,895][11586] Avg episode reward: [(0, '20.667')] +[2023-02-24 13:55:36,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2494464. Throughput: 0: 904.1. Samples: 622212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 13:55:36,896][11586] Avg episode reward: [(0, '21.303')] +[2023-02-24 13:55:37,239][16348] Updated weights for policy 0, policy_version 610 (0.0017) +[2023-02-24 13:55:41,896][11586] Fps is (10 sec: 3685.1, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2510848. Throughput: 0: 873.3. Samples: 627246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:55:41,898][11586] Avg episode reward: [(0, '22.212')] +[2023-02-24 13:55:41,906][16334] Saving new best policy, reward=22.212! +[2023-02-24 13:55:46,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2523136. Throughput: 0: 853.4. Samples: 631362. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:55:46,901][11586] Avg episode reward: [(0, '22.286')] +[2023-02-24 13:55:46,913][16334] Saving new best policy, reward=22.286! +[2023-02-24 13:55:49,949][16348] Updated weights for policy 0, policy_version 620 (0.0028) +[2023-02-24 13:55:51,893][11586] Fps is (10 sec: 3278.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2543616. Throughput: 0: 880.0. Samples: 634548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 13:55:51,895][11586] Avg episode reward: [(0, '20.926')] +[2023-02-24 13:55:56,893][11586] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 2564096. Throughput: 0: 907.9. Samples: 640990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:55:56,895][11586] Avg episode reward: [(0, '20.281')] +[2023-02-24 13:56:01,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2576384. Throughput: 0: 860.2. Samples: 645292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:56:01,899][11586] Avg episode reward: [(0, '20.289')] +[2023-02-24 13:56:02,015][16348] Updated weights for policy 0, policy_version 630 (0.0032) +[2023-02-24 13:56:06,893][11586] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 2596864. Throughput: 0: 850.0. Samples: 647312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:56:06,901][11586] Avg episode reward: [(0, '19.515')] +[2023-02-24 13:56:11,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2617344. Throughput: 0: 898.0. Samples: 653460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:56:11,899][11586] Avg episode reward: [(0, '19.289')] +[2023-02-24 13:56:12,495][16348] Updated weights for policy 0, policy_version 640 (0.0036) +[2023-02-24 13:56:16,896][11586] Fps is (10 sec: 3685.2, 60 sec: 3481.4, 300 sec: 3512.9). Total num frames: 2633728. Throughput: 0: 897.1. Samples: 659400. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:56:16,900][11586] Avg episode reward: [(0, '19.114')] +[2023-02-24 13:56:21,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3512.8). Total num frames: 2650112. Throughput: 0: 872.0. Samples: 661454. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:56:21,896][11586] Avg episode reward: [(0, '18.714')] +[2023-02-24 13:56:25,321][16348] Updated weights for policy 0, policy_version 650 (0.0017) +[2023-02-24 13:56:26,893][11586] Fps is (10 sec: 3278.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2666496. Throughput: 0: 861.7. Samples: 666020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:56:26,900][11586] Avg episode reward: [(0, '19.984')] +[2023-02-24 13:56:31,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2686976. Throughput: 0: 915.9. Samples: 672576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:56:31,900][11586] Avg episode reward: [(0, '18.826')] +[2023-02-24 13:56:35,027][16348] Updated weights for policy 0, policy_version 660 (0.0018) +[2023-02-24 13:56:36,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2707456. Throughput: 0: 915.4. Samples: 675740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:56:36,899][11586] Avg episode reward: [(0, '18.404')] +[2023-02-24 13:56:41,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3512.8). Total num frames: 2719744. Throughput: 0: 862.0. Samples: 679778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 13:56:41,896][11586] Avg episode reward: [(0, '19.447')] +[2023-02-24 13:56:46,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2740224. Throughput: 0: 884.6. Samples: 685098. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:56:46,900][11586] Avg episode reward: [(0, '19.049')] +[2023-02-24 13:56:47,767][16348] Updated weights for policy 0, policy_version 670 (0.0023) +[2023-02-24 13:56:51,893][11586] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2760704. Throughput: 0: 912.5. Samples: 688374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:56:51,901][11586] Avg episode reward: [(0, '19.234')] +[2023-02-24 13:56:56,897][11586] Fps is (10 sec: 3684.7, 60 sec: 3549.6, 300 sec: 3512.8). Total num frames: 2777088. Throughput: 0: 902.3. Samples: 694068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:56:56,910][11586] Avg episode reward: [(0, '19.270')] +[2023-02-24 13:56:56,920][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000678_2777088.pth... +[2023-02-24 13:56:57,105][16334] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000471_1929216.pth +[2023-02-24 13:56:59,595][16348] Updated weights for policy 0, policy_version 680 (0.0021) +[2023-02-24 13:57:01,894][11586] Fps is (10 sec: 2866.8, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 2789376. Throughput: 0: 859.6. Samples: 698078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:57:01,897][11586] Avg episode reward: [(0, '19.281')] +[2023-02-24 13:57:06,893][11586] Fps is (10 sec: 3278.3, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2809856. Throughput: 0: 875.4. Samples: 700848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-24 13:57:06,899][11586] Avg episode reward: [(0, '18.594')] +[2023-02-24 13:57:10,226][16348] Updated weights for policy 0, policy_version 690 (0.0013) +[2023-02-24 13:57:11,893][11586] Fps is (10 sec: 4096.6, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2830336. Throughput: 0: 917.4. Samples: 707302. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:57:11,899][11586] Avg episode reward: [(0, '19.445')] +[2023-02-24 13:57:16,893][11586] Fps is (10 sec: 3686.3, 60 sec: 3550.1, 300 sec: 3512.8). Total num frames: 2846720. Throughput: 0: 881.0. Samples: 712222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:57:16,899][11586] Avg episode reward: [(0, '19.672')] +[2023-02-24 13:57:21,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 2859008. Throughput: 0: 854.3. Samples: 714184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:57:21,901][11586] Avg episode reward: [(0, '19.183')] +[2023-02-24 13:57:22,938][16348] Updated weights for policy 0, policy_version 700 (0.0024) +[2023-02-24 13:57:26,893][11586] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2883584. Throughput: 0: 889.4. Samples: 719802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:57:26,895][11586] Avg episode reward: [(0, '18.622')] +[2023-02-24 13:57:31,893][11586] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2904064. Throughput: 0: 918.1. Samples: 726412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:57:31,904][11586] Avg episode reward: [(0, '18.527')] +[2023-02-24 13:57:32,834][16348] Updated weights for policy 0, policy_version 710 (0.0023) +[2023-02-24 13:57:36,894][11586] Fps is (10 sec: 3276.4, 60 sec: 3481.5, 300 sec: 3512.8). Total num frames: 2916352. Throughput: 0: 891.8. Samples: 728504. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:57:36,898][11586] Avg episode reward: [(0, '19.253')] +[2023-02-24 13:57:41,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3512.9). Total num frames: 2932736. Throughput: 0: 857.9. Samples: 732668. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:57:41,895][11586] Avg episode reward: [(0, '19.307')] +[2023-02-24 13:57:45,177][16348] Updated weights for policy 0, policy_version 720 (0.0016) +[2023-02-24 13:57:46,893][11586] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2953216. Throughput: 0: 910.7. Samples: 739058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:57:46,895][11586] Avg episode reward: [(0, '20.853')] +[2023-02-24 13:57:51,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2973696. Throughput: 0: 921.5. Samples: 742314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:57:51,897][11586] Avg episode reward: [(0, '22.975')] +[2023-02-24 13:57:51,906][16334] Saving new best policy, reward=22.975! +[2023-02-24 13:57:56,896][11586] Fps is (10 sec: 3275.7, 60 sec: 3481.7, 300 sec: 3512.8). Total num frames: 2985984. Throughput: 0: 880.9. Samples: 746946. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:57:56,909][11586] Avg episode reward: [(0, '22.552')] +[2023-02-24 13:57:57,062][16348] Updated weights for policy 0, policy_version 730 (0.0012) +[2023-02-24 13:58:01,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3540.6). Total num frames: 3006464. Throughput: 0: 876.0. Samples: 751640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:58:01,895][11586] Avg episode reward: [(0, '23.305')] +[2023-02-24 13:58:01,902][16334] Saving new best policy, reward=23.305! +[2023-02-24 13:58:06,893][11586] Fps is (10 sec: 4097.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3026944. Throughput: 0: 902.2. Samples: 754784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:58:06,903][11586] Avg episode reward: [(0, '22.452')] +[2023-02-24 13:58:07,657][16348] Updated weights for policy 0, policy_version 740 (0.0022) +[2023-02-24 13:58:11,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). 
Total num frames: 3043328. Throughput: 0: 917.3. Samples: 761082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:58:11,896][11586] Avg episode reward: [(0, '21.481')] +[2023-02-24 13:58:16,894][11586] Fps is (10 sec: 2866.7, 60 sec: 3481.5, 300 sec: 3512.9). Total num frames: 3055616. Throughput: 0: 861.4. Samples: 765178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:58:16,903][11586] Avg episode reward: [(0, '19.025')] +[2023-02-24 13:58:20,404][16348] Updated weights for policy 0, policy_version 750 (0.0023) +[2023-02-24 13:58:21,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3076096. Throughput: 0: 865.0. Samples: 767426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:58:21,895][11586] Avg episode reward: [(0, '19.891')] +[2023-02-24 13:58:26,893][11586] Fps is (10 sec: 4096.7, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3096576. Throughput: 0: 916.4. Samples: 773906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:58:26,895][11586] Avg episode reward: [(0, '20.137')] +[2023-02-24 13:58:30,305][16348] Updated weights for policy 0, policy_version 760 (0.0011) +[2023-02-24 13:58:31,894][11586] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 3117056. Throughput: 0: 900.0. Samples: 779558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:58:31,899][11586] Avg episode reward: [(0, '21.103')] +[2023-02-24 13:58:36,894][11586] Fps is (10 sec: 3276.5, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 3129344. Throughput: 0: 873.8. Samples: 781634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:58:36,900][11586] Avg episode reward: [(0, '22.161')] +[2023-02-24 13:58:41,893][11586] Fps is (10 sec: 3277.3, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3149824. Throughput: 0: 883.0. Samples: 786678. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:58:41,902][11586] Avg episode reward: [(0, '21.945')] +[2023-02-24 13:58:42,316][16348] Updated weights for policy 0, policy_version 770 (0.0046) +[2023-02-24 13:58:46,895][11586] Fps is (10 sec: 4095.4, 60 sec: 3618.0, 300 sec: 3568.3). Total num frames: 3170304. Throughput: 0: 924.1. Samples: 793228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:58:46,902][11586] Avg episode reward: [(0, '22.936')] +[2023-02-24 13:58:51,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3186688. Throughput: 0: 916.5. Samples: 796026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:58:51,899][11586] Avg episode reward: [(0, '22.030')] +[2023-02-24 13:58:54,142][16348] Updated weights for policy 0, policy_version 780 (0.0013) +[2023-02-24 13:58:56,893][11586] Fps is (10 sec: 2867.8, 60 sec: 3550.0, 300 sec: 3526.7). Total num frames: 3198976. Throughput: 0: 866.3. Samples: 800068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 13:58:56,896][11586] Avg episode reward: [(0, '21.553')] +[2023-02-24 13:58:56,919][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000781_3198976.pth... +[2023-02-24 13:58:57,044][16334] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000575_2355200.pth +[2023-02-24 13:59:01,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3223552. Throughput: 0: 902.8. Samples: 805802. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:59:01,896][11586] Avg episode reward: [(0, '20.684')] +[2023-02-24 13:59:04,511][16348] Updated weights for policy 0, policy_version 790 (0.0021) +[2023-02-24 13:59:06,893][11586] Fps is (10 sec: 4505.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3244032. Throughput: 0: 927.7. Samples: 809172. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:59:06,896][11586] Avg episode reward: [(0, '20.073')] +[2023-02-24 13:59:11,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3260416. Throughput: 0: 900.7. Samples: 814438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 13:59:11,902][11586] Avg episode reward: [(0, '20.213')] +[2023-02-24 13:59:16,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3540.6). Total num frames: 3272704. Throughput: 0: 868.7. Samples: 818646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:59:16,900][11586] Avg episode reward: [(0, '20.585')] +[2023-02-24 13:59:17,341][16348] Updated weights for policy 0, policy_version 800 (0.0023) +[2023-02-24 13:59:21,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3293184. Throughput: 0: 895.5. Samples: 821932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 13:59:21,899][11586] Avg episode reward: [(0, '21.083')] +[2023-02-24 13:59:26,630][16348] Updated weights for policy 0, policy_version 810 (0.0023) +[2023-02-24 13:59:26,893][11586] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3317760. Throughput: 0: 933.0. Samples: 828662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:59:26,896][11586] Avg episode reward: [(0, '21.045')] +[2023-02-24 13:59:31,908][11586] Fps is (10 sec: 3680.6, 60 sec: 3549.0, 300 sec: 3540.4). Total num frames: 3330048. Throughput: 0: 886.6. Samples: 833138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:59:31,917][11586] Avg episode reward: [(0, '21.209')] +[2023-02-24 13:59:36,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 3346432. Throughput: 0: 871.1. Samples: 835226. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 13:59:36,896][11586] Avg episode reward: [(0, '22.539')] +[2023-02-24 13:59:39,089][16348] Updated weights for policy 0, policy_version 820 (0.0025) +[2023-02-24 13:59:41,893][11586] Fps is (10 sec: 4102.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3371008. Throughput: 0: 919.4. Samples: 841440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:59:41,903][11586] Avg episode reward: [(0, '22.704')] +[2023-02-24 13:59:46,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3618.3, 300 sec: 3568.4). Total num frames: 3387392. Throughput: 0: 927.1. Samples: 847522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:59:46,898][11586] Avg episode reward: [(0, '22.720')] +[2023-02-24 13:59:50,477][16348] Updated weights for policy 0, policy_version 830 (0.0012) +[2023-02-24 13:59:51,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 3399680. Throughput: 0: 897.6. Samples: 849564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 13:59:51,898][11586] Avg episode reward: [(0, '22.879')] +[2023-02-24 13:59:56,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3420160. Throughput: 0: 882.4. Samples: 854144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 13:59:56,895][11586] Avg episode reward: [(0, '23.813')] +[2023-02-24 13:59:56,910][16334] Saving new best policy, reward=23.813! +[2023-02-24 14:00:01,390][16348] Updated weights for policy 0, policy_version 840 (0.0024) +[2023-02-24 14:00:01,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3440640. Throughput: 0: 933.1. Samples: 860634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 14:00:01,895][11586] Avg episode reward: [(0, '23.739')] +[2023-02-24 14:00:06,897][11586] Fps is (10 sec: 3684.7, 60 sec: 3549.6, 300 sec: 3568.3). Total num frames: 3457024. Throughput: 0: 930.7. Samples: 863820. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 14:00:06,899][11586] Avg episode reward: [(0, '21.959')] +[2023-02-24 14:00:11,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3473408. Throughput: 0: 871.3. Samples: 867870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:00:11,896][11586] Avg episode reward: [(0, '22.595')] +[2023-02-24 14:00:14,314][16348] Updated weights for policy 0, policy_version 850 (0.0038) +[2023-02-24 14:00:16,893][11586] Fps is (10 sec: 3278.3, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3489792. Throughput: 0: 887.7. Samples: 873070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:00:16,903][11586] Avg episode reward: [(0, '22.773')] +[2023-02-24 14:00:21,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3514368. Throughput: 0: 913.5. Samples: 876332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 14:00:21,899][11586] Avg episode reward: [(0, '22.347')] +[2023-02-24 14:00:23,830][16348] Updated weights for policy 0, policy_version 860 (0.0015) +[2023-02-24 14:00:26,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3568.4). Total num frames: 3526656. Throughput: 0: 906.0. Samples: 882208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:00:26,898][11586] Avg episode reward: [(0, '23.232')] +[2023-02-24 14:00:31,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3550.8, 300 sec: 3554.5). Total num frames: 3543040. Throughput: 0: 862.7. Samples: 886342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 14:00:31,899][11586] Avg episode reward: [(0, '23.298')] +[2023-02-24 14:00:36,300][16348] Updated weights for policy 0, policy_version 870 (0.0018) +[2023-02-24 14:00:36,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3563520. Throughput: 0: 880.0. Samples: 889166. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 14:00:36,896][11586] Avg episode reward: [(0, '22.989')] +[2023-02-24 14:00:41,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3584000. Throughput: 0: 924.5. Samples: 895748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-24 14:00:41,902][11586] Avg episode reward: [(0, '21.634')] +[2023-02-24 14:00:46,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 3600384. Throughput: 0: 892.0. Samples: 900776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 14:00:46,900][11586] Avg episode reward: [(0, '21.791')] +[2023-02-24 14:00:47,707][16348] Updated weights for policy 0, policy_version 880 (0.0037) +[2023-02-24 14:00:51,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3612672. Throughput: 0: 866.1. Samples: 902792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 14:00:51,901][11586] Avg episode reward: [(0, '21.174')] +[2023-02-24 14:00:56,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3637248. Throughput: 0: 903.4. Samples: 908524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-24 14:00:56,901][11586] Avg episode reward: [(0, '20.043')] +[2023-02-24 14:00:56,915][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000888_3637248.pth... +[2023-02-24 14:00:57,041][16334] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000678_2777088.pth +[2023-02-24 14:00:58,394][16348] Updated weights for policy 0, policy_version 890 (0.0016) +[2023-02-24 14:01:01,893][11586] Fps is (10 sec: 4505.5, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3657728. Throughput: 0: 933.6. Samples: 915080. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 14:01:01,896][11586] Avg episode reward: [(0, '22.321')] +[2023-02-24 14:01:06,897][11586] Fps is (10 sec: 3275.3, 60 sec: 3549.9, 300 sec: 3568.3). Total num frames: 3670016. Throughput: 0: 907.2. Samples: 917158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-24 14:01:06,900][11586] Avg episode reward: [(0, '23.048')] +[2023-02-24 14:01:11,190][16348] Updated weights for policy 0, policy_version 900 (0.0034) +[2023-02-24 14:01:11,893][11586] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3686400. Throughput: 0: 868.0. Samples: 921270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:01:11,896][11586] Avg episode reward: [(0, '23.478')] +[2023-02-24 14:01:16,893][11586] Fps is (10 sec: 4097.8, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3710976. Throughput: 0: 919.6. Samples: 927722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:01:16,904][11586] Avg episode reward: [(0, '23.789')] +[2023-02-24 14:01:20,617][16348] Updated weights for policy 0, policy_version 910 (0.0013) +[2023-02-24 14:01:21,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3727360. Throughput: 0: 927.9. Samples: 930922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:01:21,898][11586] Avg episode reward: [(0, '25.225')] +[2023-02-24 14:01:21,903][16334] Saving new best policy, reward=25.225! +[2023-02-24 14:01:26,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3743744. Throughput: 0: 884.3. Samples: 935542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 14:01:26,900][11586] Avg episode reward: [(0, '23.896')] +[2023-02-24 14:01:31,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3760128. Throughput: 0: 879.8. Samples: 940368. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 14:01:31,895][11586] Avg episode reward: [(0, '22.658')] +[2023-02-24 14:01:33,470][16348] Updated weights for policy 0, policy_version 920 (0.0048) +[2023-02-24 14:01:36,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3780608. Throughput: 0: 908.0. Samples: 943650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-24 14:01:36,895][11586] Avg episode reward: [(0, '22.253')] +[2023-02-24 14:01:41,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3801088. Throughput: 0: 919.4. Samples: 949898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:01:41,898][11586] Avg episode reward: [(0, '21.093')] +[2023-02-24 14:01:44,557][16348] Updated weights for policy 0, policy_version 930 (0.0017) +[2023-02-24 14:01:46,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3813376. Throughput: 0: 865.5. Samples: 954026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:01:46,899][11586] Avg episode reward: [(0, '21.203')] +[2023-02-24 14:01:51,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3833856. Throughput: 0: 874.5. Samples: 956508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 14:01:51,901][11586] Avg episode reward: [(0, '20.789')] +[2023-02-24 14:01:55,341][16348] Updated weights for policy 0, policy_version 940 (0.0029) +[2023-02-24 14:01:56,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 3854336. Throughput: 0: 930.0. Samples: 963120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:01:56,901][11586] Avg episode reward: [(0, '22.296')] +[2023-02-24 14:02:01,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3870720. Throughput: 0: 905.4. Samples: 968464. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:02:01,898][11586] Avg episode reward: [(0, '21.884')] +[2023-02-24 14:02:06,893][11586] Fps is (10 sec: 3276.8, 60 sec: 3618.4, 300 sec: 3582.3). Total num frames: 3887104. Throughput: 0: 878.5. Samples: 970454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 14:02:06,899][11586] Avg episode reward: [(0, '22.978')] +[2023-02-24 14:02:08,087][16348] Updated weights for policy 0, policy_version 950 (0.0025) +[2023-02-24 14:02:11,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 3907584. Throughput: 0: 893.6. Samples: 975756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:02:11,902][11586] Avg episode reward: [(0, '23.018')] +[2023-02-24 14:02:16,893][11586] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3928064. Throughput: 0: 932.3. Samples: 982322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 14:02:16,899][11586] Avg episode reward: [(0, '22.471')] +[2023-02-24 14:02:17,413][16348] Updated weights for policy 0, policy_version 960 (0.0024) +[2023-02-24 14:02:21,893][11586] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3944448. Throughput: 0: 916.2. Samples: 984880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-24 14:02:21,898][11586] Avg episode reward: [(0, '22.697')] +[2023-02-24 14:02:26,893][11586] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3568.4). Total num frames: 3956736. Throughput: 0: 868.2. Samples: 988968. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-24 14:02:26,901][11586] Avg episode reward: [(0, '21.365')] +[2023-02-24 14:02:30,041][16348] Updated weights for policy 0, policy_version 970 (0.0019) +[2023-02-24 14:02:31,893][11586] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3977216. Throughput: 0: 915.7. Samples: 995232. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-24 14:02:31,895][11586] Avg episode reward: [(0, '22.285')] +[2023-02-24 14:02:36,893][11586] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 4001792. Throughput: 0: 935.2. Samples: 998592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-24 14:02:36,895][11586] Avg episode reward: [(0, '22.844')] +[2023-02-24 14:02:38,058][16334] Stopping Batcher_0... +[2023-02-24 14:02:38,059][16334] Loop batcher_evt_loop terminating... +[2023-02-24 14:02:38,060][11586] Component Batcher_0 stopped! +[2023-02-24 14:02:38,061][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-24 14:02:38,129][16348] Weights refcount: 2 0 +[2023-02-24 14:02:38,132][11586] Component InferenceWorker_p0-w0 stopped! +[2023-02-24 14:02:38,132][16348] Stopping InferenceWorker_p0-w0... +[2023-02-24 14:02:38,142][16348] Loop inference_proc0-0_evt_loop terminating... +[2023-02-24 14:02:38,198][16353] Stopping RolloutWorker_w4... +[2023-02-24 14:02:38,199][16353] Loop rollout_proc4_evt_loop terminating... +[2023-02-24 14:02:38,198][16349] Stopping RolloutWorker_w0... +[2023-02-24 14:02:38,201][16349] Loop rollout_proc0_evt_loop terminating... +[2023-02-24 14:02:38,197][11586] Component RolloutWorker_w0 stopped! +[2023-02-24 14:02:38,209][11586] Component RolloutWorker_w4 stopped! +[2023-02-24 14:02:38,219][16352] Stopping RolloutWorker_w2... +[2023-02-24 14:02:38,220][16352] Loop rollout_proc2_evt_loop terminating... +[2023-02-24 14:02:38,219][11586] Component RolloutWorker_w3 stopped! +[2023-02-24 14:02:38,223][11586] Component RolloutWorker_w2 stopped! +[2023-02-24 14:02:38,223][16351] Stopping RolloutWorker_w3... +[2023-02-24 14:02:38,230][16351] Loop rollout_proc3_evt_loop terminating... +[2023-02-24 14:02:38,231][16356] Stopping RolloutWorker_w6... +[2023-02-24 14:02:38,232][11586] Component RolloutWorker_w6 stopped! 
+[2023-02-24 14:02:38,249][16354] Stopping RolloutWorker_w5... +[2023-02-24 14:02:38,249][16354] Loop rollout_proc5_evt_loop terminating... +[2023-02-24 14:02:38,249][11586] Component RolloutWorker_w5 stopped! +[2023-02-24 14:02:38,255][16350] Stopping RolloutWorker_w1... +[2023-02-24 14:02:38,255][16350] Loop rollout_proc1_evt_loop terminating... +[2023-02-24 14:02:38,232][16356] Loop rollout_proc6_evt_loop terminating... +[2023-02-24 14:02:38,254][11586] Component RolloutWorker_w1 stopped! +[2023-02-24 14:02:38,264][11586] Component RolloutWorker_w7 stopped! +[2023-02-24 14:02:38,270][16355] Stopping RolloutWorker_w7... +[2023-02-24 14:02:38,271][16355] Loop rollout_proc7_evt_loop terminating... +[2023-02-24 14:02:38,322][16334] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000781_3198976.pth +[2023-02-24 14:02:38,340][16334] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-24 14:02:38,529][16334] Stopping LearnerWorker_p0... +[2023-02-24 14:02:38,529][16334] Loop learner_proc0_evt_loop terminating... +[2023-02-24 14:02:38,537][11586] Component LearnerWorker_p0 stopped! +[2023-02-24 14:02:38,540][11586] Waiting for process learner_proc0 to stop... +[2023-02-24 14:02:40,993][11586] Waiting for process inference_proc0-0 to join... +[2023-02-24 14:02:41,665][11586] Waiting for process rollout_proc0 to join... +[2023-02-24 14:02:42,340][11586] Waiting for process rollout_proc1 to join... +[2023-02-24 14:02:42,342][11586] Waiting for process rollout_proc2 to join... +[2023-02-24 14:02:42,344][11586] Waiting for process rollout_proc3 to join... +[2023-02-24 14:02:42,345][11586] Waiting for process rollout_proc4 to join... +[2023-02-24 14:02:42,347][11586] Waiting for process rollout_proc5 to join... +[2023-02-24 14:02:42,348][11586] Waiting for process rollout_proc6 to join... +[2023-02-24 14:02:42,353][11586] Waiting for process rollout_proc7 to join... 
+[2023-02-24 14:02:42,354][11586] Batcher 0 profile tree view: +batching: 26.6612, releasing_batches: 0.0301 +[2023-02-24 14:02:42,357][11586] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 554.7535 +update_model: 8.8894 + weight_update: 0.0030 +one_step: 0.0065 + handle_policy_step: 543.1596 + deserialize: 16.2346, stack: 3.0119, obs_to_device_normalize: 120.2656, forward: 262.3141, send_messages: 28.3859 + prepare_outputs: 85.6442 + to_cpu: 51.5226 +[2023-02-24 14:02:42,361][11586] Learner 0 profile tree view: +misc: 0.0063, prepare_batch: 16.8706 +train: 76.6896 + epoch_init: 0.0058, minibatch_init: 0.0066, losses_postprocess: 0.6378, kl_divergence: 0.5449, after_optimizer: 33.3831 + calculate_losses: 26.7981 + losses_init: 0.0050, forward_head: 1.8537, bptt_initial: 17.6600, tail: 1.0470, advantages_returns: 0.3146, losses: 3.3213 + bptt: 2.2673 + bptt_forward_core: 2.1825 + update: 14.6233 + clip: 1.4078 +[2023-02-24 14:02:42,362][11586] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.3046, enqueue_policy_requests: 156.2539, env_step: 861.8929, overhead: 23.0260, complete_rollouts: 7.4981 +save_policy_outputs: 21.0416 + split_output_tensors: 10.2705 +[2023-02-24 14:02:42,364][11586] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.3684, enqueue_policy_requests: 154.3714, env_step: 861.9040, overhead: 23.6245, complete_rollouts: 6.8798 +save_policy_outputs: 21.3858 + split_output_tensors: 10.3105 +[2023-02-24 14:02:42,369][11586] Loop Runner_EvtLoop terminating... 
+[2023-02-24 14:02:42,371][11586] Runner profile tree view: +main_loop: 1181.0693 +[2023-02-24 14:02:42,372][11586] Collected {0: 4005888}, FPS: 3391.7 +[2023-02-24 14:02:42,432][11586] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2023-02-24 14:02:42,435][11586] Overriding arg 'num_workers' with value 1 passed from command line +[2023-02-24 14:02:42,437][11586] Adding new argument 'no_render'=True that is not in the saved config file! +[2023-02-24 14:02:42,439][11586] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-02-24 14:02:42,441][11586] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-02-24 14:02:42,443][11586] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-02-24 14:02:42,444][11586] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2023-02-24 14:02:42,450][11586] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2023-02-24 14:02:42,452][11586] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2023-02-24 14:02:42,453][11586] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2023-02-24 14:02:42,454][11586] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-02-24 14:02:42,456][11586] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-02-24 14:02:42,457][11586] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-02-24 14:02:42,458][11586] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2023-02-24 14:02:42,460][11586] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-02-24 14:02:42,493][11586] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-24 14:02:42,497][11586] RunningMeanStd input shape: (3, 72, 128) +[2023-02-24 14:02:42,501][11586] RunningMeanStd input shape: (1,) +[2023-02-24 14:02:42,526][11586] ConvEncoder: input_channels=3 +[2023-02-24 14:02:43,237][11586] Conv encoder output size: 512 +[2023-02-24 14:02:43,239][11586] Policy head output size: 512 +[2023-02-24 14:02:45,713][11586] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-24 14:02:47,013][11586] Num frames 100... +[2023-02-24 14:02:47,133][11586] Num frames 200... +[2023-02-24 14:02:47,257][11586] Num frames 300... +[2023-02-24 14:02:47,389][11586] Num frames 400... +[2023-02-24 14:02:47,508][11586] Num frames 500... +[2023-02-24 14:02:47,632][11586] Num frames 600... +[2023-02-24 14:02:47,755][11586] Num frames 700... +[2023-02-24 14:02:47,874][11586] Num frames 800... +[2023-02-24 14:02:47,997][11586] Num frames 900... +[2023-02-24 14:02:48,110][11586] Num frames 1000... +[2023-02-24 14:02:48,246][11586] Num frames 1100... +[2023-02-24 14:02:48,375][11586] Num frames 1200... +[2023-02-24 14:02:48,490][11586] Num frames 1300... +[2023-02-24 14:02:48,613][11586] Num frames 1400... +[2023-02-24 14:02:48,759][11586] Avg episode rewards: #0: 38.720, true rewards: #0: 14.720 +[2023-02-24 14:02:48,761][11586] Avg episode reward: 38.720, avg true_objective: 14.720 +[2023-02-24 14:02:48,803][11586] Num frames 1500... +[2023-02-24 14:02:48,920][11586] Num frames 1600... +[2023-02-24 14:02:49,039][11586] Num frames 1700... +[2023-02-24 14:02:49,157][11586] Num frames 1800... +[2023-02-24 14:02:49,273][11586] Num frames 1900... +[2023-02-24 14:02:49,397][11586] Num frames 2000... +[2023-02-24 14:02:49,516][11586] Num frames 2100... 
+[2023-02-24 14:02:49,638][11586] Num frames 2200... +[2023-02-24 14:02:49,774][11586] Num frames 2300... +[2023-02-24 14:02:49,896][11586] Num frames 2400... +[2023-02-24 14:02:50,017][11586] Num frames 2500... +[2023-02-24 14:02:50,133][11586] Num frames 2600... +[2023-02-24 14:02:50,251][11586] Num frames 2700... +[2023-02-24 14:02:50,381][11586] Num frames 2800... +[2023-02-24 14:02:50,499][11586] Num frames 2900... +[2023-02-24 14:02:50,620][11586] Num frames 3000... +[2023-02-24 14:02:50,739][11586] Num frames 3100... +[2023-02-24 14:02:50,859][11586] Num frames 3200... +[2023-02-24 14:02:50,992][11586] Num frames 3300... +[2023-02-24 14:02:51,093][11586] Avg episode rewards: #0: 43.679, true rewards: #0: 16.680 +[2023-02-24 14:02:51,095][11586] Avg episode reward: 43.679, avg true_objective: 16.680 +[2023-02-24 14:02:51,175][11586] Num frames 3400... +[2023-02-24 14:02:51,298][11586] Num frames 3500... +[2023-02-24 14:02:51,433][11586] Num frames 3600... +[2023-02-24 14:02:51,549][11586] Num frames 3700... +[2023-02-24 14:02:51,644][11586] Avg episode rewards: #0: 31.443, true rewards: #0: 12.443 +[2023-02-24 14:02:51,646][11586] Avg episode reward: 31.443, avg true_objective: 12.443 +[2023-02-24 14:02:51,725][11586] Num frames 3800... +[2023-02-24 14:02:51,857][11586] Num frames 3900... +[2023-02-24 14:02:51,971][11586] Num frames 4000... +[2023-02-24 14:02:52,103][11586] Num frames 4100... +[2023-02-24 14:02:52,231][11586] Num frames 4200... +[2023-02-24 14:02:52,363][11586] Num frames 4300... +[2023-02-24 14:02:52,533][11586] Num frames 4400... +[2023-02-24 14:02:52,695][11586] Num frames 4500... +[2023-02-24 14:02:52,862][11586] Num frames 4600... +[2023-02-24 14:02:53,022][11586] Num frames 4700... +[2023-02-24 14:02:53,187][11586] Num frames 4800... +[2023-02-24 14:02:53,366][11586] Num frames 4900... +[2023-02-24 14:02:53,530][11586] Num frames 5000... +[2023-02-24 14:02:53,694][11586] Num frames 5100... 
+[2023-02-24 14:02:53,868][11586] Num frames 5200...
+[2023-02-24 14:02:54,035][11586] Num frames 5300...
+[2023-02-24 14:02:54,263][11586] Avg episode rewards: #0: 32.742, true rewards: #0: 13.492
+[2023-02-24 14:02:54,266][11586] Avg episode reward: 32.742, avg true_objective: 13.492
+[2023-02-24 14:02:54,272][11586] Num frames 5400...
+[2023-02-24 14:02:54,439][11586] Num frames 5500...
+[2023-02-24 14:02:54,613][11586] Num frames 5600...
+[2023-02-24 14:02:54,783][11586] Num frames 5700...
+[2023-02-24 14:02:54,966][11586] Num frames 5800...
+[2023-02-24 14:02:55,129][11586] Num frames 5900...
+[2023-02-24 14:02:55,300][11586] Num frames 6000...
+[2023-02-24 14:02:55,449][11586] Num frames 6100...
+[2023-02-24 14:02:55,553][11586] Avg episode rewards: #0: 28.866, true rewards: #0: 12.266
+[2023-02-24 14:02:55,554][11586] Avg episode reward: 28.866, avg true_objective: 12.266
+[2023-02-24 14:02:55,646][11586] Num frames 6200...
+[2023-02-24 14:02:55,775][11586] Num frames 6300...
+[2023-02-24 14:02:55,892][11586] Num frames 6400...
+[2023-02-24 14:02:56,052][11586] Avg episode rewards: #0: 24.808, true rewards: #0: 10.808
+[2023-02-24 14:02:56,053][11586] Avg episode reward: 24.808, avg true_objective: 10.808
+[2023-02-24 14:02:56,079][11586] Num frames 6500...
+[2023-02-24 14:02:56,198][11586] Num frames 6600...
+[2023-02-24 14:02:56,315][11586] Num frames 6700...
+[2023-02-24 14:02:56,431][11586] Num frames 6800...
+[2023-02-24 14:02:56,570][11586] Num frames 6900...
+[2023-02-24 14:02:56,689][11586] Num frames 7000...
+[2023-02-24 14:02:56,807][11586] Num frames 7100...
+[2023-02-24 14:02:56,930][11586] Num frames 7200...
+[2023-02-24 14:02:57,049][11586] Num frames 7300...
+[2023-02-24 14:02:57,168][11586] Num frames 7400...
+[2023-02-24 14:02:57,305][11586] Num frames 7500...
+[2023-02-24 14:02:57,376][11586] Avg episode rewards: #0: 25.158, true rewards: #0: 10.730
+[2023-02-24 14:02:57,382][11586] Avg episode reward: 25.158, avg true_objective: 10.730
+[2023-02-24 14:02:57,492][11586] Num frames 7600...
+[2023-02-24 14:02:57,614][11586] Num frames 7700...
+[2023-02-24 14:02:57,732][11586] Num frames 7800...
+[2023-02-24 14:02:57,861][11586] Num frames 7900...
+[2023-02-24 14:02:57,982][11586] Num frames 8000...
+[2023-02-24 14:02:58,096][11586] Num frames 8100...
+[2023-02-24 14:02:58,210][11586] Num frames 8200...
+[2023-02-24 14:02:58,334][11586] Num frames 8300...
+[2023-02-24 14:02:58,457][11586] Num frames 8400...
+[2023-02-24 14:02:58,581][11586] Num frames 8500...
+[2023-02-24 14:02:58,713][11586] Num frames 8600...
+[2023-02-24 14:02:58,847][11586] Num frames 8700...
+[2023-02-24 14:02:58,964][11586] Num frames 8800...
+[2023-02-24 14:02:59,086][11586] Num frames 8900...
+[2023-02-24 14:02:59,209][11586] Num frames 9000...
+[2023-02-24 14:02:59,327][11586] Num frames 9100...
+[2023-02-24 14:02:59,445][11586] Num frames 9200...
+[2023-02-24 14:02:59,567][11586] Num frames 9300...
+[2023-02-24 14:02:59,698][11586] Num frames 9400...
+[2023-02-24 14:02:59,821][11586] Num frames 9500...
+[2023-02-24 14:02:59,952][11586] Avg episode rewards: #0: 27.948, true rewards: #0: 11.949
+[2023-02-24 14:02:59,953][11586] Avg episode reward: 27.948, avg true_objective: 11.949
+[2023-02-24 14:03:00,006][11586] Num frames 9600...
+[2023-02-24 14:03:00,124][11586] Num frames 9700...
+[2023-02-24 14:03:00,236][11586] Num frames 9800...
+[2023-02-24 14:03:00,367][11586] Num frames 9900...
+[2023-02-24 14:03:00,497][11586] Num frames 10000...
+[2023-02-24 14:03:00,615][11586] Num frames 10100...
+[2023-02-24 14:03:00,718][11586] Avg episode rewards: #0: 25.928, true rewards: #0: 11.261
+[2023-02-24 14:03:00,719][11586] Avg episode reward: 25.928, avg true_objective: 11.261
+[2023-02-24 14:03:00,803][11586] Num frames 10200...
+[2023-02-24 14:03:00,927][11586] Num frames 10300...
+[2023-02-24 14:03:01,062][11586] Num frames 10400...
+[2023-02-24 14:03:01,185][11586] Num frames 10500...
+[2023-02-24 14:03:01,313][11586] Num frames 10600...
+[2023-02-24 14:03:01,448][11586] Num frames 10700...
+[2023-02-24 14:03:01,581][11586] Num frames 10800...
+[2023-02-24 14:03:01,696][11586] Num frames 10900...
+[2023-02-24 14:03:01,759][11586] Avg episode rewards: #0: 25.003, true rewards: #0: 10.903
+[2023-02-24 14:03:01,761][11586] Avg episode reward: 25.003, avg true_objective: 10.903
+[2023-02-24 14:04:07,810][11586] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-02-24 14:04:08,199][11586] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-02-24 14:04:08,201][11586] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-24 14:04:08,203][11586] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-24 14:04:08,206][11586] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-24 14:04:08,208][11586] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-24 14:04:08,210][11586] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-24 14:04:08,211][11586] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-02-24 14:04:08,213][11586] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-02-24 14:04:08,214][11586] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-02-24 14:04:08,215][11586] Adding new argument 'hf_repository'='Roberto/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2023-02-24 14:04:08,217][11586] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-24 14:04:08,218][11586] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-24 14:04:08,219][11586] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-24 14:04:08,220][11586] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-02-24 14:04:08,222][11586] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-24 14:04:08,248][11586] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-24 14:04:08,251][11586] RunningMeanStd input shape: (1,)
+[2023-02-24 14:04:08,272][11586] ConvEncoder: input_channels=3
+[2023-02-24 14:04:08,334][11586] Conv encoder output size: 512
+[2023-02-24 14:04:08,336][11586] Policy head output size: 512
+[2023-02-24 14:04:08,363][11586] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2023-02-24 14:04:09,074][11586] Num frames 100...
+[2023-02-24 14:04:09,250][11586] Num frames 200...
+[2023-02-24 14:04:09,422][11586] Num frames 300...
+[2023-02-24 14:04:09,603][11586] Num frames 400...
+[2023-02-24 14:04:09,809][11586] Num frames 500...
+[2023-02-24 14:04:10,009][11586] Num frames 600...
+[2023-02-24 14:04:10,159][11586] Avg episode rewards: #0: 13.470, true rewards: #0: 6.470
+[2023-02-24 14:04:10,161][11586] Avg episode reward: 13.470, avg true_objective: 6.470
+[2023-02-24 14:04:10,269][11586] Num frames 700...
+[2023-02-24 14:04:10,471][11586] Num frames 800...
+[2023-02-24 14:04:10,653][11586] Num frames 900...
+[2023-02-24 14:04:10,846][11586] Num frames 1000...
+[2023-02-24 14:04:11,012][11586] Num frames 1100...
+[2023-02-24 14:04:11,177][11586] Num frames 1200...
+[2023-02-24 14:04:11,332][11586] Num frames 1300...
+[2023-02-24 14:04:11,487][11586] Num frames 1400...
+[2023-02-24 14:04:11,633][11586] Avg episode rewards: #0: 15.785, true rewards: #0: 7.285
+[2023-02-24 14:04:11,636][11586] Avg episode reward: 15.785, avg true_objective: 7.285
+[2023-02-24 14:04:11,732][11586] Num frames 1500...
+[2023-02-24 14:04:11,892][11586] Num frames 1600...
+[2023-02-24 14:04:12,056][11586] Num frames 1700...
+[2023-02-24 14:04:12,224][11586] Num frames 1800...
+[2023-02-24 14:04:12,415][11586] Num frames 1900...
+[2023-02-24 14:04:12,605][11586] Num frames 2000...
+[2023-02-24 14:04:12,794][11586] Num frames 2100...
+[2023-02-24 14:04:12,997][11586] Num frames 2200...
+[2023-02-24 14:04:13,186][11586] Num frames 2300...
+[2023-02-24 14:04:13,386][11586] Num frames 2400...
+[2023-02-24 14:04:13,560][11586] Num frames 2500...
+[2023-02-24 14:04:13,727][11586] Avg episode rewards: #0: 18.897, true rewards: #0: 8.563
+[2023-02-24 14:04:13,730][11586] Avg episode reward: 18.897, avg true_objective: 8.563
+[2023-02-24 14:04:13,800][11586] Num frames 2600...
+[2023-02-24 14:04:13,974][11586] Num frames 2700...
+[2023-02-24 14:04:14,136][11586] Num frames 2800...
+[2023-02-24 14:04:14,291][11586] Num frames 2900...
+[2023-02-24 14:04:14,446][11586] Num frames 3000...
+[2023-02-24 14:04:14,535][11586] Avg episode rewards: #0: 15.543, true rewards: #0: 7.542
+[2023-02-24 14:04:14,538][11586] Avg episode reward: 15.543, avg true_objective: 7.542
+[2023-02-24 14:04:14,700][11586] Num frames 3100...
+[2023-02-24 14:04:14,907][11586] Num frames 3200...
+[2023-02-24 14:04:15,113][11586] Num frames 3300...
+[2023-02-24 14:04:15,281][11586] Num frames 3400...
+[2023-02-24 14:04:15,475][11586] Num frames 3500...
+[2023-02-24 14:04:15,657][11586] Num frames 3600...
+[2023-02-24 14:04:15,845][11586] Num frames 3700...
+[2023-02-24 14:04:16,042][11586] Num frames 3800...
+[2023-02-24 14:04:16,269][11586] Avg episode rewards: #0: 16.380, true rewards: #0: 7.780
+[2023-02-24 14:04:16,272][11586] Avg episode reward: 16.380, avg true_objective: 7.780
+[2023-02-24 14:04:16,296][11586] Num frames 3900...
+[2023-02-24 14:04:16,490][11586] Num frames 4000...
+[2023-02-24 14:04:16,690][11586] Num frames 4100...
+[2023-02-24 14:04:16,893][11586] Num frames 4200...
+[2023-02-24 14:04:17,073][11586] Num frames 4300...
+[2023-02-24 14:04:17,253][11586] Num frames 4400...
+[2023-02-24 14:04:17,452][11586] Num frames 4500...
+[2023-02-24 14:04:17,659][11586] Num frames 4600...
+[2023-02-24 14:04:17,863][11586] Num frames 4700...
+[2023-02-24 14:04:18,054][11586] Num frames 4800...
+[2023-02-24 14:04:18,263][11586] Num frames 4900...
+[2023-02-24 14:04:18,469][11586] Num frames 5000...
+[2023-02-24 14:04:18,681][11586] Num frames 5100...
+[2023-02-24 14:04:18,887][11586] Num frames 5200...
+[2023-02-24 14:04:19,113][11586] Num frames 5300...
+[2023-02-24 14:04:19,320][11586] Num frames 5400...
+[2023-02-24 14:04:19,554][11586] Avg episode rewards: #0: 19.990, true rewards: #0: 9.157
+[2023-02-24 14:04:19,557][11586] Avg episode reward: 19.990, avg true_objective: 9.157
+[2023-02-24 14:04:19,573][11586] Num frames 5500...
+[2023-02-24 14:04:19,738][11586] Num frames 5600...
+[2023-02-24 14:04:19,926][11586] Num frames 5700...
+[2023-02-24 14:04:20,089][11586] Num frames 5800...
+[2023-02-24 14:04:20,261][11586] Num frames 5900...
+[2023-02-24 14:04:20,436][11586] Num frames 6000...
+[2023-02-24 14:04:20,638][11586] Num frames 6100...
+[2023-02-24 14:04:20,829][11586] Num frames 6200...
+[2023-02-24 14:04:20,983][11586] Num frames 6300...
+[2023-02-24 14:04:21,107][11586] Num frames 6400...
+[2023-02-24 14:04:21,190][11586] Avg episode rewards: #0: 19.746, true rewards: #0: 9.174
+[2023-02-24 14:04:21,193][11586] Avg episode reward: 19.746, avg true_objective: 9.174
+[2023-02-24 14:04:21,329][11586] Num frames 6500...
+[2023-02-24 14:04:21,493][11586] Num frames 6600...
+[2023-02-24 14:04:21,657][11586] Num frames 6700...
+[2023-02-24 14:04:21,819][11586] Num frames 6800...
+[2023-02-24 14:04:21,988][11586] Num frames 6900...
+[2023-02-24 14:04:22,166][11586] Num frames 7000...
+[2023-02-24 14:04:22,331][11586] Num frames 7100...
+[2023-02-24 14:04:22,485][11586] Avg episode rewards: #0: 19.448, true rewards: #0: 8.947
+[2023-02-24 14:04:22,487][11586] Avg episode reward: 19.448, avg true_objective: 8.947
+[2023-02-24 14:04:22,559][11586] Num frames 7200...
+[2023-02-24 14:04:22,721][11586] Num frames 7300...
+[2023-02-24 14:04:22,880][11586] Num frames 7400...
+[2023-02-24 14:04:23,017][11586] Num frames 7500...
+[2023-02-24 14:04:23,135][11586] Num frames 7600...
+[2023-02-24 14:04:23,260][11586] Num frames 7700...
+[2023-02-24 14:04:23,382][11586] Avg episode rewards: #0: 18.380, true rewards: #0: 8.602
+[2023-02-24 14:04:23,384][11586] Avg episode reward: 18.380, avg true_objective: 8.602
+[2023-02-24 14:04:23,458][11586] Num frames 7800...
+[2023-02-24 14:04:23,578][11586] Num frames 7900...
+[2023-02-24 14:04:23,693][11586] Num frames 8000...
+[2023-02-24 14:04:23,802][11586] Avg episode rewards: #0: 17.044, true rewards: #0: 8.044
+[2023-02-24 14:04:23,803][11586] Avg episode reward: 17.044, avg true_objective: 8.044
+[2023-02-24 14:05:13,777][11586] Replay video saved to /content/train_dir/default_experiment/replay.mp4!