[2024-10-16 02:32:58,337][00603] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-10-16 02:32:58,339][00603] Rollout worker 0 uses device cpu
[2024-10-16 02:32:58,340][00603] Rollout worker 1 uses device cpu
[2024-10-16 02:32:58,342][00603] Rollout worker 2 uses device cpu
[2024-10-16 02:32:58,343][00603] Rollout worker 3 uses device cpu
[2024-10-16 02:32:58,344][00603] Rollout worker 4 uses device cpu
[2024-10-16 02:32:58,345][00603] Rollout worker 5 uses device cpu
[2024-10-16 02:32:58,346][00603] Rollout worker 6 uses device cpu
[2024-10-16 02:32:58,347][00603] Rollout worker 7 uses device cpu
[2024-10-16 02:32:58,518][00603] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-16 02:32:58,520][00603] InferenceWorker_p0-w0: min num requests: 2
[2024-10-16 02:32:58,553][00603] Starting all processes...
[2024-10-16 02:32:58,554][00603] Starting process learner_proc0
[2024-10-16 02:32:59,236][00603] Starting all processes...
[2024-10-16 02:32:59,247][00603] Starting process inference_proc0-0
[2024-10-16 02:32:59,247][00603] Starting process rollout_proc0
[2024-10-16 02:32:59,248][00603] Starting process rollout_proc1
[2024-10-16 02:32:59,250][00603] Starting process rollout_proc2
[2024-10-16 02:32:59,250][00603] Starting process rollout_proc3
[2024-10-16 02:32:59,250][00603] Starting process rollout_proc4
[2024-10-16 02:32:59,250][00603] Starting process rollout_proc5
[2024-10-16 02:32:59,250][00603] Starting process rollout_proc6
[2024-10-16 02:32:59,250][00603] Starting process rollout_proc7
[2024-10-16 02:33:13,825][03704] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-16 02:33:13,825][03704] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-10-16 02:33:13,898][03704] Num visible devices: 1
[2024-10-16 02:33:13,941][03704] Starting seed is not provided
[2024-10-16 02:33:13,942][03704] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-16 02:33:13,943][03704] Initializing actor-critic model on device cuda:0
[2024-10-16 02:33:13,943][03704] RunningMeanStd input shape: (3, 72, 128)
[2024-10-16 02:33:13,947][03704] RunningMeanStd input shape: (1,)
[2024-10-16 02:33:14,105][03704] ConvEncoder: input_channels=3
[2024-10-16 02:33:14,539][03718] Worker 0 uses CPU cores [0]
[2024-10-16 02:33:14,826][03724] Worker 6 uses CPU cores [0]
[2024-10-16 02:33:14,844][03723] Worker 4 uses CPU cores [0]
[2024-10-16 02:33:14,938][03719] Worker 1 uses CPU cores [1]
[2024-10-16 02:33:14,959][03725] Worker 7 uses CPU cores [1]
[2024-10-16 02:33:14,981][03721] Worker 2 uses CPU cores [0]
[2024-10-16 02:33:15,091][03704] Conv encoder output size: 512
[2024-10-16 02:33:15,093][03704] Policy head output size: 512
[2024-10-16 02:33:15,095][03722] Worker 5 uses CPU cores [1]
[2024-10-16 02:33:15,134][03717] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-16 02:33:15,134][03717] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-10-16 02:33:15,141][03720] Worker 3 uses CPU cores [1]
[2024-10-16 02:33:15,152][03717] Num visible devices: 1
[2024-10-16 02:33:15,183][03704] Created Actor Critic model with architecture:
[2024-10-16 02:33:15,183][03704] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-10-16 02:33:15,583][03704] Using optimizer
[2024-10-16 02:33:16,273][03704] No checkpoints found
[2024-10-16 02:33:16,273][03704] Did not load from checkpoint, starting from scratch!
[2024-10-16 02:33:16,273][03704] Initialized policy 0 weights for model version 0
[2024-10-16 02:33:16,278][03704] LearnerWorker_p0 finished initialization!
[2024-10-16 02:33:16,279][03704] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-16 02:33:16,472][03717] RunningMeanStd input shape: (3, 72, 128)
[2024-10-16 02:33:16,473][03717] RunningMeanStd input shape: (1,)
[2024-10-16 02:33:16,485][03717] ConvEncoder: input_channels=3
[2024-10-16 02:33:16,586][03717] Conv encoder output size: 512
[2024-10-16 02:33:16,587][03717] Policy head output size: 512
[2024-10-16 02:33:16,637][00603] Inference worker 0-0 is ready!
[2024-10-16 02:33:16,638][00603] All inference workers are ready! Signal rollout workers to start!
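The "Conv encoder output size: 512" lines follow from plain convolution shape arithmetic on the (3, 72, 128) observations. As a sketch (assuming Sample Factory's default `convnet_simple` conv head of [[32, 8, 4], [64, 4, 2], [128, 3, 2]] channels/kernel/stride, which this log does not actually print), the three Conv2d+ELU stages flatten to 2304 features, which the `mlp_layers` Linear then projects to the reported 512-dim encoding:

```python
def conv2d_out(size: int, kernel: int, stride: int) -> int:
    """Output spatial size of a no-padding ("valid") convolution."""
    return (size - kernel) // stride + 1

def encoder_feature_size(h: int = 72, w: int = 128):
    # Assumed conv head: Sample Factory's default "convnet_simple",
    # three layers of (out_channels, kernel, stride).
    layers = [(32, 8, 4), (64, 4, 2), (128, 3, 2)]
    for channels, kernel, stride in layers:
        h = conv2d_out(h, kernel, stride)
        w = conv2d_out(w, kernel, stride)
    # Flattened feature count fed into the Linear in mlp_layers.
    return channels, h, w, channels * h * w

print(encoder_feature_size())  # (128, 3, 6, 2304)
```

Under these assumed filter sizes the 72x128 frame shrinks to a 128x3x6 volume, i.e. 2304 features ahead of the Linear(2304, 512) layer.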
[2024-10-16 02:33:16,832][03720] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-16 02:33:16,836][03725] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-16 02:33:16,833][03719] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-16 02:33:16,839][03718] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-16 02:33:16,840][03723] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-16 02:33:16,840][03724] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-16 02:33:16,836][03722] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-16 02:33:16,836][03721] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-16 02:33:17,841][03720] Decorrelating experience for 0 frames...
[2024-10-16 02:33:17,840][03722] Decorrelating experience for 0 frames...
[2024-10-16 02:33:18,052][00603] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-16 02:33:18,469][03724] Decorrelating experience for 0 frames...
[2024-10-16 02:33:18,473][03718] Decorrelating experience for 0 frames...
[2024-10-16 02:33:18,475][03723] Decorrelating experience for 0 frames...
[2024-10-16 02:33:18,481][03721] Decorrelating experience for 0 frames...
[2024-10-16 02:33:18,510][00603] Heartbeat connected on Batcher_0
[2024-10-16 02:33:18,514][00603] Heartbeat connected on LearnerWorker_p0
[2024-10-16 02:33:18,557][00603] Heartbeat connected on InferenceWorker_p0-w0
[2024-10-16 02:33:18,651][03722] Decorrelating experience for 32 frames...
[2024-10-16 02:33:18,663][03720] Decorrelating experience for 32 frames...
[2024-10-16 02:33:19,557][03724] Decorrelating experience for 32 frames...
[2024-10-16 02:33:19,562][03721] Decorrelating experience for 32 frames...
[2024-10-16 02:33:19,564][03723] Decorrelating experience for 32 frames...
[2024-10-16 02:33:19,666][03719] Decorrelating experience for 0 frames...
[2024-10-16 02:33:20,098][03722] Decorrelating experience for 64 frames...
[2024-10-16 02:33:20,117][03720] Decorrelating experience for 64 frames...
[2024-10-16 02:33:20,805][03718] Decorrelating experience for 32 frames...
[2024-10-16 02:33:20,815][03725] Decorrelating experience for 0 frames...
[2024-10-16 02:33:21,192][03720] Decorrelating experience for 96 frames...
[2024-10-16 02:33:21,264][03721] Decorrelating experience for 64 frames...
[2024-10-16 02:33:21,443][00603] Heartbeat connected on RolloutWorker_w3
[2024-10-16 02:33:21,608][03723] Decorrelating experience for 64 frames...
[2024-10-16 02:33:22,635][03724] Decorrelating experience for 64 frames...
[2024-10-16 02:33:22,642][03722] Decorrelating experience for 96 frames...
[2024-10-16 02:33:22,936][00603] Heartbeat connected on RolloutWorker_w5
[2024-10-16 02:33:23,052][00603] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-16 02:33:23,164][03721] Decorrelating experience for 96 frames...
[2024-10-16 02:33:23,415][00603] Heartbeat connected on RolloutWorker_w2
[2024-10-16 02:33:23,494][03723] Decorrelating experience for 96 frames...
[2024-10-16 02:33:23,785][03719] Decorrelating experience for 32 frames...
[2024-10-16 02:33:23,864][00603] Heartbeat connected on RolloutWorker_w4
[2024-10-16 02:33:26,894][03724] Decorrelating experience for 96 frames...
[2024-10-16 02:33:27,059][03719] Decorrelating experience for 64 frames...
[2024-10-16 02:33:27,311][03718] Decorrelating experience for 64 frames...
[2024-10-16 02:33:27,453][00603] Heartbeat connected on RolloutWorker_w6
[2024-10-16 02:33:28,052][00603] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 96.6. Samples: 966. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-16 02:33:28,057][00603] Avg episode reward: [(0, '2.932')]
[2024-10-16 02:33:28,271][03704] Signal inference workers to stop experience collection...
[2024-10-16 02:33:28,296][03717] InferenceWorker_p0-w0: stopping experience collection
[2024-10-16 02:33:28,585][03718] Decorrelating experience for 96 frames...
[2024-10-16 02:33:28,704][03725] Decorrelating experience for 32 frames...
[2024-10-16 02:33:28,755][00603] Heartbeat connected on RolloutWorker_w0
[2024-10-16 02:33:29,517][03719] Decorrelating experience for 96 frames...
[2024-10-16 02:33:29,664][00603] Heartbeat connected on RolloutWorker_w1
[2024-10-16 02:33:29,694][03725] Decorrelating experience for 64 frames...
[2024-10-16 02:33:30,075][03725] Decorrelating experience for 96 frames...
[2024-10-16 02:33:30,151][00603] Heartbeat connected on RolloutWorker_w7
[2024-10-16 02:33:31,691][03704] Signal inference workers to resume experience collection...
[2024-10-16 02:33:31,692][03717] InferenceWorker_p0-w0: resuming experience collection
[2024-10-16 02:33:33,052][00603] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 151.2. Samples: 2268. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-10-16 02:33:33,056][00603] Avg episode reward: [(0, '2.945')]
[2024-10-16 02:33:38,052][00603] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 32768. Throughput: 0: 434.5. Samples: 8690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:33:38,058][00603] Avg episode reward: [(0, '3.785')]
[2024-10-16 02:33:40,699][03717] Updated weights for policy 0, policy_version 10 (0.0018)
[2024-10-16 02:33:43,052][00603] Fps is (10 sec: 3276.8, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 433.7. Samples: 10842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-16 02:33:43,055][00603] Avg episode reward: [(0, '4.095')]
[2024-10-16 02:33:48,054][00603] Fps is (10 sec: 3276.1, 60 sec: 2184.4, 300 sec: 2184.4). Total num frames: 65536. Throughput: 0: 534.8. Samples: 16046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:33:48,057][00603] Avg episode reward: [(0, '4.423')]
[2024-10-16 02:33:50,556][03717] Updated weights for policy 0, policy_version 20 (0.0022)
[2024-10-16 02:33:53,053][00603] Fps is (10 sec: 4914.7, 60 sec: 2691.6, 300 sec: 2691.6). Total num frames: 94208. Throughput: 0: 672.6. Samples: 23540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:33:53,060][00603] Avg episode reward: [(0, '4.339')]
[2024-10-16 02:33:58,052][00603] Fps is (10 sec: 4096.8, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 655.4. Samples: 26216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:33:58,055][00603] Avg episode reward: [(0, '4.325')]
[2024-10-16 02:33:58,085][03704] Saving new best policy, reward=4.325!
[2024-10-16 02:34:01,555][03717] Updated weights for policy 0, policy_version 30 (0.0025)
[2024-10-16 02:34:03,052][00603] Fps is (10 sec: 3277.1, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 695.8. Samples: 31310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:34:03,055][00603] Avg episode reward: [(0, '4.445')]
[2024-10-16 02:34:03,062][03704] Saving new best policy, reward=4.445!
[2024-10-16 02:34:08,052][00603] Fps is (10 sec: 4505.6, 60 sec: 3031.0, 300 sec: 3031.0). Total num frames: 151552. Throughput: 0: 857.6. Samples: 38592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:34:08,058][00603] Avg episode reward: [(0, '4.372')]
[2024-10-16 02:34:10,144][03717] Updated weights for policy 0, policy_version 40 (0.0029)
[2024-10-16 02:34:13,052][00603] Fps is (10 sec: 4505.7, 60 sec: 3127.9, 300 sec: 3127.9). Total num frames: 172032. Throughput: 0: 910.4. Samples: 41934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:34:13,061][00603] Avg episode reward: [(0, '4.359')]
[2024-10-16 02:34:18,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3140.3). Total num frames: 188416. Throughput: 0: 980.6. Samples: 46396. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:34:18,055][00603] Avg episode reward: [(0, '4.399')]
[2024-10-16 02:34:21,056][03717] Updated weights for policy 0, policy_version 50 (0.0038)
[2024-10-16 02:34:23,052][00603] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 212992. Throughput: 0: 999.4. Samples: 53662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:34:23,058][00603] Avg episode reward: [(0, '4.407')]
[2024-10-16 02:34:28,053][00603] Fps is (10 sec: 4505.0, 60 sec: 3891.1, 300 sec: 3335.2). Total num frames: 233472. Throughput: 0: 1034.0. Samples: 57374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:34:28,059][00603] Avg episode reward: [(0, '4.343')]
[2024-10-16 02:34:30,919][03717] Updated weights for policy 0, policy_version 60 (0.0034)
[2024-10-16 02:34:33,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3331.4). Total num frames: 249856. Throughput: 0: 1032.3. Samples: 62498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:34:33,058][00603] Avg episode reward: [(0, '4.546')]
[2024-10-16 02:34:33,072][03704] Saving new best policy, reward=4.546!
[2024-10-16 02:34:38,052][00603] Fps is (10 sec: 4096.6, 60 sec: 4027.7, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 1008.6. Samples: 68924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-16 02:34:38,058][00603] Avg episode reward: [(0, '4.647')]
[2024-10-16 02:34:38,062][03704] Saving new best policy, reward=4.647!
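Each "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" line above reports frame throughput averaged over trailing windows of different lengths, which is why the 10-second figure is noisy while the 300-second figure rises smoothly from zero. A minimal sketch of that bookkeeping (illustrative only, not Sample Factory's actual reporter) keeps (timestamp, total_frames) samples and divides frame deltas by time deltas:

```python
from collections import deque

class WindowedFps:
    """Trailing-window FPS from (timestamp, total_frames) samples.

    Illustrative sketch; the real Sample Factory reporter differs in detail.
    """

    def __init__(self, max_window: float = 300.0):
        self.samples = deque()  # (t, frames), oldest first
        self.max_window = max_window

    def record(self, t: float, frames: int) -> None:
        self.samples.append((t, frames))
        # Drop samples older than the largest window we ever report.
        while self.samples and t - self.samples[0][0] > self.max_window:
            self.samples.popleft()

    def fps(self, window: float) -> float:
        t_now, f_now = self.samples[-1]
        # Oldest sample still inside the requested window.
        inside = [(t, f) for t, f in self.samples if t_now - t <= window]
        t_old, f_old = inside[0]
        if t_now == t_old:
            return float("nan")  # matches the "10 sec: nan" lines at startup
        return (f_now - f_old) / (t_now - t_old)

meter = WindowedFps()
for step in range(7):             # a sample every 5 s, 20480 frames each
    meter.record(step * 5.0, step * 20480)
print(round(meter.fps(10.0), 1))  # 4096.0 frames/s over the last 10 s
```

With a single sample recorded, every window returns `nan`, which is consistent with the very first progress line in this log.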
[2024-10-16 02:34:40,458][03717] Updated weights for policy 0, policy_version 70 (0.0024)
[2024-10-16 02:34:43,052][00603] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 3517.7). Total num frames: 299008. Throughput: 0: 1029.6. Samples: 72546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:34:43,057][00603] Avg episode reward: [(0, '4.649')]
[2024-10-16 02:34:43,064][03704] Saving new best policy, reward=4.649!
[2024-10-16 02:34:48,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 3458.8). Total num frames: 311296. Throughput: 0: 1045.5. Samples: 78358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:34:48,056][00603] Avg episode reward: [(0, '4.589')]
[2024-10-16 02:34:51,472][03717] Updated weights for policy 0, policy_version 80 (0.0036)
[2024-10-16 02:34:53,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3535.5). Total num frames: 335872. Throughput: 0: 1009.8. Samples: 84032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:34:53,055][00603] Avg episode reward: [(0, '4.506')]
[2024-10-16 02:34:53,065][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000082_335872.pth...
[2024-10-16 02:34:58,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3563.5). Total num frames: 356352. Throughput: 0: 1015.0. Samples: 87610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-16 02:34:58,060][00603] Avg episode reward: [(0, '4.283')]
[2024-10-16 02:34:59,851][03717] Updated weights for policy 0, policy_version 90 (0.0027)
[2024-10-16 02:35:03,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3588.9). Total num frames: 376832. Throughput: 0: 1063.0. Samples: 94230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:35:03,054][00603] Avg episode reward: [(0, '4.278')]
[2024-10-16 02:35:08,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3574.7). Total num frames: 393216. Throughput: 0: 1006.0. Samples: 98930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:35:08,059][00603] Avg episode reward: [(0, '4.491')]
[2024-10-16 02:35:10,946][03717] Updated weights for policy 0, policy_version 100 (0.0025)
[2024-10-16 02:35:13,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3633.0). Total num frames: 417792. Throughput: 0: 1005.4. Samples: 102618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:35:13,061][00603] Avg episode reward: [(0, '4.791')]
[2024-10-16 02:35:13,076][03704] Saving new best policy, reward=4.791!
[2024-10-16 02:35:18,053][00603] Fps is (10 sec: 4505.1, 60 sec: 4164.2, 300 sec: 3652.2). Total num frames: 438272. Throughput: 0: 1046.9. Samples: 109608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:35:18,059][00603] Avg episode reward: [(0, '4.422')]
[2024-10-16 02:35:21,304][03717] Updated weights for policy 0, policy_version 110 (0.0034)
[2024-10-16 02:35:23,052][00603] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3637.2). Total num frames: 454656. Throughput: 0: 1005.9. Samples: 114188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:35:23,057][00603] Avg episode reward: [(0, '4.291')]
[2024-10-16 02:35:28,052][00603] Fps is (10 sec: 4096.4, 60 sec: 4096.1, 300 sec: 3686.4). Total num frames: 479232. Throughput: 0: 995.9. Samples: 117362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:35:28,059][00603] Avg episode reward: [(0, '4.499')]
[2024-10-16 02:35:30,381][03717] Updated weights for policy 0, policy_version 120 (0.0026)
[2024-10-16 02:35:33,052][00603] Fps is (10 sec: 4915.3, 60 sec: 4232.5, 300 sec: 3731.9). Total num frames: 503808. Throughput: 0: 1034.4. Samples: 124906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:35:33,059][00603] Avg episode reward: [(0, '4.639')]
[2024-10-16 02:35:38,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 516096. Throughput: 0: 1020.3. Samples: 129944. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-16 02:35:38,060][00603] Avg episode reward: [(0, '4.736')]
[2024-10-16 02:35:41,845][03717] Updated weights for policy 0, policy_version 130 (0.0027)
[2024-10-16 02:35:43,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3700.5). Total num frames: 536576. Throughput: 0: 991.1. Samples: 132208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:35:43,055][00603] Avg episode reward: [(0, '4.970')]
[2024-10-16 02:35:43,061][03704] Saving new best policy, reward=4.970!
[2024-10-16 02:35:48,052][00603] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 3741.0). Total num frames: 561152. Throughput: 0: 1001.8. Samples: 139310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:35:48,054][00603] Avg episode reward: [(0, '4.802')]
[2024-10-16 02:35:50,770][03717] Updated weights for policy 0, policy_version 140 (0.0027)
[2024-10-16 02:35:53,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3726.0). Total num frames: 577536. Throughput: 0: 1031.7. Samples: 145356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:35:53,057][00603] Avg episode reward: [(0, '4.420')]
[2024-10-16 02:35:58,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3712.0). Total num frames: 593920. Throughput: 0: 999.4. Samples: 147592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:35:58,059][00603] Avg episode reward: [(0, '4.338')]
[2024-10-16 02:36:01,577][03717] Updated weights for policy 0, policy_version 150 (0.0020)
[2024-10-16 02:36:03,052][00603] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3748.5). Total num frames: 618496. Throughput: 0: 993.7. Samples: 154322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:36:03,055][00603] Avg episode reward: [(0, '4.462')]
[2024-10-16 02:36:08,052][00603] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3782.8). Total num frames: 643072. Throughput: 0: 1044.2. Samples: 161176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:36:08,054][00603] Avg episode reward: [(0, '4.644')]
[2024-10-16 02:36:12,228][03717] Updated weights for policy 0, policy_version 160 (0.0040)
[2024-10-16 02:36:13,052][00603] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3744.9). Total num frames: 655360. Throughput: 0: 1018.1. Samples: 163178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:36:13,060][00603] Avg episode reward: [(0, '4.614')]
[2024-10-16 02:36:18,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3754.7). Total num frames: 675840. Throughput: 0: 972.3. Samples: 168660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:36:18,058][00603] Avg episode reward: [(0, '4.684')]
[2024-10-16 02:36:22,184][03717] Updated weights for policy 0, policy_version 170 (0.0033)
[2024-10-16 02:36:23,058][00603] Fps is (10 sec: 4093.5, 60 sec: 4027.3, 300 sec: 3763.8). Total num frames: 696320. Throughput: 0: 1006.3. Samples: 175232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:36:23,063][00603] Avg episode reward: [(0, '4.758')]
[2024-10-16 02:36:28,054][00603] Fps is (10 sec: 3685.6, 60 sec: 3891.1, 300 sec: 3751.0). Total num frames: 712704. Throughput: 0: 1012.8. Samples: 177786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:36:28,057][00603] Avg episode reward: [(0, '4.909')]
[2024-10-16 02:36:33,052][00603] Fps is (10 sec: 3688.7, 60 sec: 3822.9, 300 sec: 3759.9). Total num frames: 733184. Throughput: 0: 965.0. Samples: 182734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:36:33,055][00603] Avg episode reward: [(0, '5.016')]
[2024-10-16 02:36:33,062][03704] Saving new best policy, reward=5.016!
[2024-10-16 02:36:33,527][03717] Updated weights for policy 0, policy_version 180 (0.0020)
[2024-10-16 02:36:38,052][00603] Fps is (10 sec: 4506.6, 60 sec: 4027.7, 300 sec: 3788.8). Total num frames: 757760. Throughput: 0: 991.8. Samples: 189986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:36:38,055][00603] Avg episode reward: [(0, '5.089')]
[2024-10-16 02:36:38,059][03704] Saving new best policy, reward=5.089!
[2024-10-16 02:36:42,930][03717] Updated weights for policy 0, policy_version 190 (0.0038)
[2024-10-16 02:36:43,052][00603] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3796.3). Total num frames: 778240. Throughput: 0: 1020.3. Samples: 193504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:36:43,056][00603] Avg episode reward: [(0, '5.332')]
[2024-10-16 02:36:43,067][03704] Saving new best policy, reward=5.332!
[2024-10-16 02:36:48,052][00603] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3783.9). Total num frames: 794624. Throughput: 0: 968.3. Samples: 197896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:36:48,055][00603] Avg episode reward: [(0, '5.410')]
[2024-10-16 02:36:48,062][03704] Saving new best policy, reward=5.410!
[2024-10-16 02:36:53,052][00603] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3791.2). Total num frames: 815104. Throughput: 0: 968.5. Samples: 204758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:36:53,057][00603] Avg episode reward: [(0, '5.268')]
[2024-10-16 02:36:53,074][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000199_815104.pth...
[2024-10-16 02:36:53,280][03717] Updated weights for policy 0, policy_version 200 (0.0030)
[2024-10-16 02:36:58,052][00603] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3816.7). Total num frames: 839680. Throughput: 0: 1001.1. Samples: 208226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:36:58,057][00603] Avg episode reward: [(0, '5.535')]
[2024-10-16 02:36:58,061][03704] Saving new best policy, reward=5.535!
[2024-10-16 02:37:03,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3786.5). Total num frames: 851968. Throughput: 0: 981.6. Samples: 212832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:37:03,058][00603] Avg episode reward: [(0, '5.463')]
[2024-10-16 02:37:04,940][03717] Updated weights for policy 0, policy_version 210 (0.0033)
[2024-10-16 02:37:08,052][00603] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3793.2). Total num frames: 872448. Throughput: 0: 970.0. Samples: 218878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:37:08,060][00603] Avg episode reward: [(0, '4.946')]
[2024-10-16 02:37:13,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3817.1). Total num frames: 897024. Throughput: 0: 992.3. Samples: 222438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:37:13,054][00603] Avg episode reward: [(0, '5.061')]
[2024-10-16 02:37:13,589][03717] Updated weights for policy 0, policy_version 220 (0.0032)
[2024-10-16 02:37:18,053][00603] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 3805.8). Total num frames: 913408. Throughput: 0: 1004.8. Samples: 227950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:37:18,058][00603] Avg episode reward: [(0, '5.121')]
[2024-10-16 02:37:23,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 3811.8). Total num frames: 933888. Throughput: 0: 970.1. Samples: 233642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:37:23,055][00603] Avg episode reward: [(0, '4.980')]
[2024-10-16 02:37:24,665][03717] Updated weights for policy 0, policy_version 230 (0.0020)
[2024-10-16 02:37:28,052][00603] Fps is (10 sec: 4506.1, 60 sec: 4096.1, 300 sec: 3833.9). Total num frames: 958464. Throughput: 0: 974.1. Samples: 237340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:37:28,055][00603] Avg episode reward: [(0, '5.093')]
[2024-10-16 02:37:33,054][00603] Fps is (10 sec: 4095.1, 60 sec: 4027.6, 300 sec: 3822.9). Total num frames: 974848. Throughput: 0: 1026.1. Samples: 244072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:37:33,058][00603] Avg episode reward: [(0, '4.769')]
[2024-10-16 02:37:34,833][03717] Updated weights for policy 0, policy_version 240 (0.0015)
[2024-10-16 02:37:38,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3812.4). Total num frames: 991232. Throughput: 0: 980.9. Samples: 248898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:37:38,058][00603] Avg episode reward: [(0, '4.607')]
[2024-10-16 02:37:43,052][00603] Fps is (10 sec: 4506.6, 60 sec: 4027.8, 300 sec: 3848.7). Total num frames: 1019904. Throughput: 0: 984.6. Samples: 252534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:37:43,059][00603] Avg episode reward: [(0, '4.920')]
[2024-10-16 02:37:43,878][03717] Updated weights for policy 0, policy_version 250 (0.0033)
[2024-10-16 02:37:48,055][00603] Fps is (10 sec: 4913.6, 60 sec: 4095.8, 300 sec: 3853.2). Total num frames: 1040384. Throughput: 0: 1044.5. Samples: 259840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:37:48,058][00603] Avg episode reward: [(0, '4.959')]
[2024-10-16 02:37:53,055][00603] Fps is (10 sec: 3275.8, 60 sec: 3959.3, 300 sec: 3827.9). Total num frames: 1052672. Throughput: 0: 1008.6. Samples: 264270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:37:53,058][00603] Avg episode reward: [(0, '5.143')]
[2024-10-16 02:37:55,151][03717] Updated weights for policy 0, policy_version 260 (0.0039)
[2024-10-16 02:37:58,052][00603] Fps is (10 sec: 3687.6, 60 sec: 3959.5, 300 sec: 3847.3). Total num frames: 1077248. Throughput: 0: 998.9. Samples: 267390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:37:58,055][00603] Avg episode reward: [(0, '5.407')]
[2024-10-16 02:38:03,052][00603] Fps is (10 sec: 4916.7, 60 sec: 4164.3, 300 sec: 3866.0). Total num frames: 1101824. Throughput: 0: 1036.1. Samples: 274572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-16 02:38:03,055][00603] Avg episode reward: [(0, '5.562')]
[2024-10-16 02:38:03,066][03704] Saving new best policy, reward=5.562!
[2024-10-16 02:38:04,104][03717] Updated weights for policy 0, policy_version 270 (0.0036)
[2024-10-16 02:38:08,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3841.8). Total num frames: 1114112. Throughput: 0: 1021.2. Samples: 279596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-16 02:38:08,057][00603] Avg episode reward: [(0, '5.500')]
[2024-10-16 02:38:13,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1134592. Throughput: 0: 992.9. Samples: 282022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:38:13,057][00603] Avg episode reward: [(0, '5.362')]
[2024-10-16 02:38:15,157][03717] Updated weights for policy 0, policy_version 280 (0.0013)
[2024-10-16 02:38:18,052][00603] Fps is (10 sec: 4505.5, 60 sec: 4096.1, 300 sec: 3929.4). Total num frames: 1159168. Throughput: 0: 997.3. Samples: 288948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:38:18,059][00603] Avg episode reward: [(0, '5.679')]
[2024-10-16 02:38:18,064][03704] Saving new best policy, reward=5.679!
[2024-10-16 02:38:23,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1175552. Throughput: 0: 1012.5. Samples: 294460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:38:23,057][00603] Avg episode reward: [(0, '6.027')]
[2024-10-16 02:38:23,069][03704] Saving new best policy, reward=6.027!
[2024-10-16 02:38:26,789][03717] Updated weights for policy 0, policy_version 290 (0.0021)
[2024-10-16 02:38:28,052][00603] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 1191936. Throughput: 0: 976.9. Samples: 296494. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:38:28,058][00603] Avg episode reward: [(0, '5.939')]
[2024-10-16 02:38:33,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4012.7). Total num frames: 1216512. Throughput: 0: 965.7. Samples: 303292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:38:33,058][00603] Avg episode reward: [(0, '6.244')]
[2024-10-16 02:38:33,068][03704] Saving new best policy, reward=6.244!
[2024-10-16 02:38:35,568][03717] Updated weights for policy 0, policy_version 300 (0.0052)
[2024-10-16 02:38:38,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1236992. Throughput: 0: 1010.8. Samples: 309752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:38:38,056][00603] Avg episode reward: [(0, '6.262')]
[2024-10-16 02:38:38,061][03704] Saving new best policy, reward=6.262!
[2024-10-16 02:38:43,055][00603] Fps is (10 sec: 3275.8, 60 sec: 3822.7, 300 sec: 4012.7). Total num frames: 1249280. Throughput: 0: 987.8. Samples: 311846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:38:43,060][00603] Avg episode reward: [(0, '6.220')]
[2024-10-16 02:38:46,905][03717] Updated weights for policy 0, policy_version 310 (0.0013)
[2024-10-16 02:38:48,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3998.8). Total num frames: 1273856. Throughput: 0: 959.3. Samples: 317740. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:38:48,061][00603] Avg episode reward: [(0, '5.944')]
[2024-10-16 02:38:53,052][00603] Fps is (10 sec: 4916.7, 60 sec: 4096.2, 300 sec: 4040.5). Total num frames: 1298432. Throughput: 0: 1004.4. Samples: 324792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:38:53,054][00603] Avg episode reward: [(0, '5.989')]
[2024-10-16 02:38:53,064][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000317_1298432.pth...
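The "Saving … checkpoint_000000317_1298432.pth" / "Removing … checkpoint_000000082_335872.pth" pairs in this log are a keep-newest rotation: each checkpoint name encodes the zero-padded policy version plus the environment-frame count, and once a new checkpoint lands, the oldest one is pruned. A sketch of that naming and pruning (the retention count of 2 here is an assumption, not something the log states):

```python
import re

def checkpoint_name(policy_version: int, env_steps: int) -> str:
    # Mirrors the filenames in the log, e.g. checkpoint_000000317_1298432.pth
    return f"checkpoint_{policy_version:09d}_{env_steps}.pth"

def prune_checkpoints(filenames, keep_last: int = 2):
    """Return (kept, removed), keeping only the `keep_last` newest checkpoints.

    Lexicographic sort on the zero-padded version field orders oldest-first.
    """
    pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")
    ckpts = sorted(f for f in filenames if pattern.search(f))
    return ckpts[-keep_last:], ckpts[:-keep_last]

names = [checkpoint_name(82, 335872),
         checkpoint_name(199, 815104),
         checkpoint_name(317, 1298432)]
kept, removed = prune_checkpoints(names)
print(removed)  # ['checkpoint_000000082_335872.pth'], as in the "Removing ..." line
```

On the three checkpoints visible in this log, the version-82 file is the one pruned after version 317 is saved, matching the "Removing" entry that follows.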
[2024-10-16 02:38:53,219][03704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000082_335872.pth [2024-10-16 02:38:56,969][03717] Updated weights for policy 0, policy_version 320 (0.0021) [2024-10-16 02:38:58,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 1310720. Throughput: 0: 1002.3. Samples: 327124. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-16 02:38:58,055][00603] Avg episode reward: [(0, '6.342')] [2024-10-16 02:38:58,063][03704] Saving new best policy, reward=6.342! [2024-10-16 02:39:03,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 1331200. Throughput: 0: 957.3. Samples: 332028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:39:03,059][00603] Avg episode reward: [(0, '6.333')] [2024-10-16 02:39:07,233][03717] Updated weights for policy 0, policy_version 330 (0.0025) [2024-10-16 02:39:08,052][00603] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1351680. Throughput: 0: 989.0. Samples: 338966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-16 02:39:08,054][00603] Avg episode reward: [(0, '6.688')] [2024-10-16 02:39:08,103][03704] Saving new best policy, reward=6.688! [2024-10-16 02:39:13,054][00603] Fps is (10 sec: 4095.2, 60 sec: 3959.3, 300 sec: 4012.7). Total num frames: 1372160. Throughput: 0: 1013.2. Samples: 342092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-16 02:39:13,056][00603] Avg episode reward: [(0, '6.162')] [2024-10-16 02:39:18,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3984.9). Total num frames: 1388544. Throughput: 0: 956.5. Samples: 346334. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-16 02:39:18,060][00603] Avg episode reward: [(0, '6.009')] [2024-10-16 02:39:18,836][03717] Updated weights for policy 0, policy_version 340 (0.0033) [2024-10-16 02:39:23,052][00603] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3984.9). 
Total num frames: 1409024. Throughput: 0: 968.3. Samples: 353324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-16 02:39:23,059][00603] Avg episode reward: [(0, '6.313')] [2024-10-16 02:39:27,923][03717] Updated weights for policy 0, policy_version 350 (0.0022) [2024-10-16 02:39:28,054][00603] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 4012.7). Total num frames: 1433600. Throughput: 0: 1003.9. Samples: 357020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:39:28,056][00603] Avg episode reward: [(0, '6.796')] [2024-10-16 02:39:28,063][03704] Saving new best policy, reward=6.796! [2024-10-16 02:39:33,052][00603] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3971.0). Total num frames: 1445888. Throughput: 0: 970.6. Samples: 361418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-16 02:39:33,056][00603] Avg episode reward: [(0, '6.687')] [2024-10-16 02:39:38,052][00603] Fps is (10 sec: 3277.4, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 1466368. Throughput: 0: 946.7. Samples: 367394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:39:38,059][00603] Avg episode reward: [(0, '7.213')] [2024-10-16 02:39:38,064][03704] Saving new best policy, reward=7.213! [2024-10-16 02:39:39,307][03717] Updated weights for policy 0, policy_version 360 (0.0031) [2024-10-16 02:39:43,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 3998.8). Total num frames: 1490944. Throughput: 0: 972.9. Samples: 370904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:39:43,058][00603] Avg episode reward: [(0, '7.003')] [2024-10-16 02:39:48,056][00603] Fps is (10 sec: 3684.8, 60 sec: 3822.7, 300 sec: 3957.1). Total num frames: 1503232. Throughput: 0: 986.7. Samples: 376434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:39:48,059][00603] Avg episode reward: [(0, '7.479')] [2024-10-16 02:39:48,060][03704] Saving new best policy, reward=7.479! 
[2024-10-16 02:39:51,105][03717] Updated weights for policy 0, policy_version 370 (0.0045) [2024-10-16 02:39:53,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 1523712. Throughput: 0: 942.4. Samples: 381374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:39:53,055][00603] Avg episode reward: [(0, '7.632')] [2024-10-16 02:39:53,066][03704] Saving new best policy, reward=7.632! [2024-10-16 02:39:58,052][00603] Fps is (10 sec: 4097.7, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1544192. Throughput: 0: 945.0. Samples: 384616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:39:58,054][00603] Avg episode reward: [(0, '8.528')] [2024-10-16 02:39:58,057][03704] Saving new best policy, reward=8.528! [2024-10-16 02:39:59,974][03717] Updated weights for policy 0, policy_version 380 (0.0034) [2024-10-16 02:40:03,057][00603] Fps is (10 sec: 4094.0, 60 sec: 3890.9, 300 sec: 3971.0). Total num frames: 1564672. Throughput: 0: 990.2. Samples: 390898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-16 02:40:03,059][00603] Avg episode reward: [(0, '8.286')] [2024-10-16 02:40:08,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 1581056. Throughput: 0: 941.0. Samples: 395668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:40:08,059][00603] Avg episode reward: [(0, '8.441')] [2024-10-16 02:40:11,325][03717] Updated weights for policy 0, policy_version 390 (0.0021) [2024-10-16 02:40:13,052][00603] Fps is (10 sec: 4098.0, 60 sec: 3891.3, 300 sec: 3957.2). Total num frames: 1605632. Throughput: 0: 936.7. Samples: 399172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:40:13,058][00603] Avg episode reward: [(0, '8.110')] [2024-10-16 02:40:18,054][00603] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3971.0). Total num frames: 1626112. Throughput: 0: 995.1. Samples: 406198. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:40:18,061][00603] Avg episode reward: [(0, '8.290')] [2024-10-16 02:40:22,163][03717] Updated weights for policy 0, policy_version 400 (0.0033) [2024-10-16 02:40:23,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1638400. Throughput: 0: 957.1. Samples: 410462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:40:23,057][00603] Avg episode reward: [(0, '8.242')] [2024-10-16 02:40:28,053][00603] Fps is (10 sec: 3686.8, 60 sec: 3823.0, 300 sec: 3929.4). Total num frames: 1662976. Throughput: 0: 948.0. Samples: 413564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:40:28,055][00603] Avg episode reward: [(0, '8.181')] [2024-10-16 02:40:31,597][03717] Updated weights for policy 0, policy_version 410 (0.0046) [2024-10-16 02:40:33,052][00603] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1683456. Throughput: 0: 983.5. Samples: 420688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-10-16 02:40:33,054][00603] Avg episode reward: [(0, '8.416')] [2024-10-16 02:40:38,053][00603] Fps is (10 sec: 3686.3, 60 sec: 3891.1, 300 sec: 3943.3). Total num frames: 1699840. Throughput: 0: 993.4. Samples: 426078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:40:38,058][00603] Avg episode reward: [(0, '8.875')] [2024-10-16 02:40:38,065][03704] Saving new best policy, reward=8.875! [2024-10-16 02:40:42,577][03717] Updated weights for policy 0, policy_version 420 (0.0014) [2024-10-16 02:40:43,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 1720320. Throughput: 0: 974.4. Samples: 428462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:40:43,054][00603] Avg episode reward: [(0, '10.845')] [2024-10-16 02:40:43,067][03704] Saving new best policy, reward=10.845! [2024-10-16 02:40:48,052][00603] Fps is (10 sec: 4506.1, 60 sec: 4028.0, 300 sec: 3957.2). 
Total num frames: 1744896. Throughput: 0: 990.5. Samples: 435466. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-16 02:40:48,054][00603] Avg episode reward: [(0, '11.560')] [2024-10-16 02:40:48,057][03704] Saving new best policy, reward=11.560! [2024-10-16 02:40:51,833][03717] Updated weights for policy 0, policy_version 430 (0.0044) [2024-10-16 02:40:53,057][00603] Fps is (10 sec: 4093.9, 60 sec: 3959.1, 300 sec: 3957.1). Total num frames: 1761280. Throughput: 0: 1014.2. Samples: 441310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-16 02:40:53,060][00603] Avg episode reward: [(0, '11.973')] [2024-10-16 02:40:53,073][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000430_1761280.pth... [2024-10-16 02:40:53,348][03704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000199_815104.pth [2024-10-16 02:40:53,364][03704] Saving new best policy, reward=11.973! [2024-10-16 02:40:58,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1777664. Throughput: 0: 980.0. Samples: 443272. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-16 02:40:58,056][00603] Avg episode reward: [(0, '11.443')] [2024-10-16 02:41:02,774][03717] Updated weights for policy 0, policy_version 440 (0.0025) [2024-10-16 02:41:03,052][00603] Fps is (10 sec: 4098.1, 60 sec: 3959.8, 300 sec: 3929.4). Total num frames: 1802240. Throughput: 0: 971.8. Samples: 449928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:41:03,059][00603] Avg episode reward: [(0, '12.051')] [2024-10-16 02:41:03,072][03704] Saving new best policy, reward=12.051! [2024-10-16 02:41:08,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1822720. Throughput: 0: 1028.0. Samples: 456720. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:41:08,059][00603] Avg episode reward: [(0, '11.876')] [2024-10-16 02:41:13,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1839104. Throughput: 0: 1004.9. Samples: 458784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:41:13,055][00603] Avg episode reward: [(0, '11.983')] [2024-10-16 02:41:14,044][03717] Updated weights for policy 0, policy_version 450 (0.0025) [2024-10-16 02:41:18,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3943.3). Total num frames: 1859584. Throughput: 0: 973.4. Samples: 464490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:41:18,056][00603] Avg episode reward: [(0, '12.045')] [2024-10-16 02:41:22,751][03717] Updated weights for policy 0, policy_version 460 (0.0024) [2024-10-16 02:41:23,053][00603] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 3971.1). Total num frames: 1884160. Throughput: 0: 1009.6. Samples: 471508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:41:23,059][00603] Avg episode reward: [(0, '12.369')] [2024-10-16 02:41:23,067][03704] Saving new best policy, reward=12.369! [2024-10-16 02:41:28,052][00603] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 1900544. Throughput: 0: 1014.5. Samples: 474116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:41:28,056][00603] Avg episode reward: [(0, '13.322')] [2024-10-16 02:41:28,061][03704] Saving new best policy, reward=13.322! [2024-10-16 02:41:33,052][00603] Fps is (10 sec: 3277.1, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 1916928. Throughput: 0: 968.0. Samples: 479024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:41:33,059][00603] Avg episode reward: [(0, '13.834')] [2024-10-16 02:41:33,070][03704] Saving new best policy, reward=13.834! 
[2024-10-16 02:41:34,068][03717] Updated weights for policy 0, policy_version 470 (0.0045) [2024-10-16 02:41:38,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 1941504. Throughput: 0: 1001.2. Samples: 486358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-16 02:41:38,054][00603] Avg episode reward: [(0, '14.114')] [2024-10-16 02:41:38,060][03704] Saving new best policy, reward=14.114! [2024-10-16 02:41:43,053][00603] Fps is (10 sec: 4505.1, 60 sec: 4027.7, 300 sec: 3957.1). Total num frames: 1961984. Throughput: 0: 1035.8. Samples: 489886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:41:43,056][00603] Avg episode reward: [(0, '14.096')] [2024-10-16 02:41:43,646][03717] Updated weights for policy 0, policy_version 480 (0.0034) [2024-10-16 02:41:48,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 1978368. Throughput: 0: 986.7. Samples: 494328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:41:48,060][00603] Avg episode reward: [(0, '12.891')] [2024-10-16 02:41:53,052][00603] Fps is (10 sec: 4096.5, 60 sec: 4028.1, 300 sec: 3943.3). Total num frames: 2002944. Throughput: 0: 991.8. Samples: 501352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-10-16 02:41:53,054][00603] Avg episode reward: [(0, '13.058')] [2024-10-16 02:41:53,299][03717] Updated weights for policy 0, policy_version 490 (0.0023) [2024-10-16 02:41:58,052][00603] Fps is (10 sec: 4915.0, 60 sec: 4164.2, 300 sec: 3984.9). Total num frames: 2027520. Throughput: 0: 1030.1. Samples: 505138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-16 02:41:58,059][00603] Avg episode reward: [(0, '12.425')] [2024-10-16 02:42:03,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2039808. Throughput: 0: 1016.5. Samples: 510234. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:42:03,054][00603] Avg episode reward: [(0, '13.419')] [2024-10-16 02:42:04,281][03717] Updated weights for policy 0, policy_version 500 (0.0032) [2024-10-16 02:42:08,052][00603] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2064384. Throughput: 0: 1003.6. Samples: 516670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:42:08,054][00603] Avg episode reward: [(0, '14.499')] [2024-10-16 02:42:08,062][03704] Saving new best policy, reward=14.499! [2024-10-16 02:42:12,619][03717] Updated weights for policy 0, policy_version 510 (0.0021) [2024-10-16 02:42:13,052][00603] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3984.9). Total num frames: 2088960. Throughput: 0: 1025.7. Samples: 520274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:42:13,054][00603] Avg episode reward: [(0, '15.035')] [2024-10-16 02:42:13,066][03704] Saving new best policy, reward=15.035! [2024-10-16 02:42:18,055][00603] Fps is (10 sec: 4094.8, 60 sec: 4095.8, 300 sec: 3971.0). Total num frames: 2105344. Throughput: 0: 1046.6. Samples: 526126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:42:18,058][00603] Avg episode reward: [(0, '16.331')] [2024-10-16 02:42:18,059][03704] Saving new best policy, reward=16.331! [2024-10-16 02:42:23,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2121728. Throughput: 0: 1005.6. Samples: 531612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:42:23,060][00603] Avg episode reward: [(0, '17.236')] [2024-10-16 02:42:23,085][03704] Saving new best policy, reward=17.236! [2024-10-16 02:42:24,043][03717] Updated weights for policy 0, policy_version 520 (0.0049) [2024-10-16 02:42:28,052][00603] Fps is (10 sec: 4097.2, 60 sec: 4096.0, 300 sec: 3971.1). Total num frames: 2146304. Throughput: 0: 1005.2. Samples: 535120. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:42:28,058][00603] Avg episode reward: [(0, '17.672')] [2024-10-16 02:42:28,063][03704] Saving new best policy, reward=17.672! [2024-10-16 02:42:33,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3984.9). Total num frames: 2166784. Throughput: 0: 1054.4. Samples: 541776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:42:33,056][00603] Avg episode reward: [(0, '17.660')] [2024-10-16 02:42:33,812][03717] Updated weights for policy 0, policy_version 530 (0.0016) [2024-10-16 02:42:38,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2183168. Throughput: 0: 1007.2. Samples: 546676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:42:38,058][00603] Avg episode reward: [(0, '17.101')] [2024-10-16 02:42:43,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3957.2). Total num frames: 2207744. Throughput: 0: 1007.3. Samples: 550466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:42:43,059][00603] Avg episode reward: [(0, '16.940')] [2024-10-16 02:42:43,187][03717] Updated weights for policy 0, policy_version 540 (0.0028) [2024-10-16 02:42:48,055][00603] Fps is (10 sec: 4913.6, 60 sec: 4232.3, 300 sec: 3998.8). Total num frames: 2232320. Throughput: 0: 1059.4. Samples: 557910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:42:48,058][00603] Avg episode reward: [(0, '17.073')] [2024-10-16 02:42:53,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2248704. Throughput: 0: 1019.0. Samples: 562526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:42:53,061][00603] Avg episode reward: [(0, '18.839')] [2024-10-16 02:42:53,073][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000549_2248704.pth... 
[2024-10-16 02:42:53,243][03704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000317_1298432.pth [2024-10-16 02:42:53,259][03704] Saving new best policy, reward=18.839! [2024-10-16 02:42:54,244][03717] Updated weights for policy 0, policy_version 550 (0.0024) [2024-10-16 02:42:58,052][00603] Fps is (10 sec: 3687.6, 60 sec: 4027.8, 300 sec: 3957.2). Total num frames: 2269184. Throughput: 0: 1010.2. Samples: 565734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:42:58,063][00603] Avg episode reward: [(0, '17.790')] [2024-10-16 02:43:02,494][03717] Updated weights for policy 0, policy_version 560 (0.0020) [2024-10-16 02:43:03,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3998.8). Total num frames: 2293760. Throughput: 0: 1042.3. Samples: 573026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:43:03,058][00603] Avg episode reward: [(0, '17.401')] [2024-10-16 02:43:08,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 2310144. Throughput: 0: 1037.4. Samples: 578294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:43:08,057][00603] Avg episode reward: [(0, '17.901')] [2024-10-16 02:43:13,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2330624. Throughput: 0: 1016.4. Samples: 580860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-16 02:43:13,057][00603] Avg episode reward: [(0, '16.429')] [2024-10-16 02:43:13,489][03717] Updated weights for policy 0, policy_version 570 (0.0036) [2024-10-16 02:43:18,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.5, 300 sec: 3998.8). Total num frames: 2355200. Throughput: 0: 1029.2. Samples: 588090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:43:18,058][00603] Avg episode reward: [(0, '16.882')] [2024-10-16 02:43:23,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 2371584. Throughput: 0: 1052.6. Samples: 594044. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-16 02:43:23,057][00603] Avg episode reward: [(0, '16.389')] [2024-10-16 02:43:23,315][03717] Updated weights for policy 0, policy_version 580 (0.0033) [2024-10-16 02:43:28,052][00603] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2387968. Throughput: 0: 1016.5. Samples: 596210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-10-16 02:43:28,059][00603] Avg episode reward: [(0, '15.599')] [2024-10-16 02:43:33,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 2412544. Throughput: 0: 999.5. Samples: 602884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-16 02:43:33,059][00603] Avg episode reward: [(0, '16.857')] [2024-10-16 02:43:33,256][03717] Updated weights for policy 0, policy_version 590 (0.0031) [2024-10-16 02:43:38,052][00603] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4026.6). Total num frames: 2437120. Throughput: 0: 1049.6. Samples: 609758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:43:38,058][00603] Avg episode reward: [(0, '17.643')] [2024-10-16 02:43:43,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2449408. Throughput: 0: 1026.1. Samples: 611910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:43:43,054][00603] Avg episode reward: [(0, '17.413')] [2024-10-16 02:43:44,469][03717] Updated weights for policy 0, policy_version 600 (0.0030) [2024-10-16 02:43:48,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4028.0, 300 sec: 3984.9). Total num frames: 2473984. Throughput: 0: 992.2. Samples: 617674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-16 02:43:48,054][00603] Avg episode reward: [(0, '16.980')] [2024-10-16 02:43:53,043][03717] Updated weights for policy 0, policy_version 610 (0.0033) [2024-10-16 02:43:53,058][00603] Fps is (10 sec: 4912.5, 60 sec: 4163.9, 300 sec: 4026.5). Total num frames: 2498560. Throughput: 0: 1036.0. Samples: 624918. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-16 02:43:53,060][00603] Avg episode reward: [(0, '16.509')] [2024-10-16 02:43:58,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2510848. Throughput: 0: 1032.2. Samples: 627310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-10-16 02:43:58,059][00603] Avg episode reward: [(0, '17.036')] [2024-10-16 02:44:03,052][00603] Fps is (10 sec: 3278.6, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2531328. Throughput: 0: 984.8. Samples: 632408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:44:03,057][00603] Avg episode reward: [(0, '17.202')] [2024-10-16 02:44:04,248][03717] Updated weights for policy 0, policy_version 620 (0.0026) [2024-10-16 02:44:08,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 2555904. Throughput: 0: 1010.4. Samples: 639514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:44:08,054][00603] Avg episode reward: [(0, '19.744')] [2024-10-16 02:44:08,061][03704] Saving new best policy, reward=19.744! [2024-10-16 02:44:13,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2572288. Throughput: 0: 1038.0. Samples: 642920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:44:13,054][00603] Avg episode reward: [(0, '19.116')] [2024-10-16 02:44:14,572][03717] Updated weights for policy 0, policy_version 630 (0.0036) [2024-10-16 02:44:18,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2592768. Throughput: 0: 984.6. Samples: 647190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:44:18,060][00603] Avg episode reward: [(0, '17.850')] [2024-10-16 02:44:23,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 2613248. Throughput: 0: 988.8. Samples: 654254. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:44:23,061][00603] Avg episode reward: [(0, '17.408')] [2024-10-16 02:44:24,155][03717] Updated weights for policy 0, policy_version 640 (0.0022) [2024-10-16 02:44:28,052][00603] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2633728. Throughput: 0: 1018.2. Samples: 657728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:44:28,055][00603] Avg episode reward: [(0, '17.810')] [2024-10-16 02:44:33,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2650112. Throughput: 0: 999.0. Samples: 662630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-16 02:44:33,063][00603] Avg episode reward: [(0, '17.718')] [2024-10-16 02:44:35,226][03717] Updated weights for policy 0, policy_version 650 (0.0023) [2024-10-16 02:44:38,052][00603] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2674688. Throughput: 0: 982.3. Samples: 669114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:44:38,060][00603] Avg episode reward: [(0, '19.154')] [2024-10-16 02:44:43,052][00603] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4054.4). Total num frames: 2699264. Throughput: 0: 1011.1. Samples: 672808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-10-16 02:44:43,054][00603] Avg episode reward: [(0, '20.785')] [2024-10-16 02:44:43,070][03704] Saving new best policy, reward=20.785! [2024-10-16 02:44:43,719][03717] Updated weights for policy 0, policy_version 660 (0.0034) [2024-10-16 02:44:48,057][00603] Fps is (10 sec: 4094.0, 60 sec: 4027.4, 300 sec: 4040.4). Total num frames: 2715648. Throughput: 0: 1024.7. Samples: 678524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:44:48,065][00603] Avg episode reward: [(0, '21.257')] [2024-10-16 02:44:48,068][03704] Saving new best policy, reward=21.257! [2024-10-16 02:44:53,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3891.6, 300 sec: 4026.6). 
Total num frames: 2732032. Throughput: 0: 992.0. Samples: 684152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-10-16 02:44:53,055][00603] Avg episode reward: [(0, '22.628')] [2024-10-16 02:44:53,067][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000668_2736128.pth... [2024-10-16 02:44:53,186][03704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000430_1761280.pth [2024-10-16 02:44:53,206][03704] Saving new best policy, reward=22.628! [2024-10-16 02:44:54,956][03717] Updated weights for policy 0, policy_version 670 (0.0027) [2024-10-16 02:44:58,052][00603] Fps is (10 sec: 4098.1, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2756608. Throughput: 0: 991.4. Samples: 687534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-16 02:44:58,055][00603] Avg episode reward: [(0, '21.725')] [2024-10-16 02:45:03,057][00603] Fps is (10 sec: 4503.3, 60 sec: 4095.7, 300 sec: 4054.3). Total num frames: 2777088. Throughput: 0: 1040.7. Samples: 694026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:45:03,060][00603] Avg episode reward: [(0, '21.270')] [2024-10-16 02:45:05,487][03717] Updated weights for policy 0, policy_version 680 (0.0027) [2024-10-16 02:45:08,052][00603] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2793472. Throughput: 0: 991.5. Samples: 698872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-16 02:45:08,054][00603] Avg episode reward: [(0, '20.941')] [2024-10-16 02:45:13,052][00603] Fps is (10 sec: 4098.1, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2818048. Throughput: 0: 995.4. Samples: 702520. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-16 02:45:13,055][00603] Avg episode reward: [(0, '18.184')] [2024-10-16 02:45:14,444][03717] Updated weights for policy 0, policy_version 690 (0.0033) [2024-10-16 02:45:18,053][00603] Fps is (10 sec: 4505.3, 60 sec: 4095.9, 300 sec: 4068.2). Total num frames: 2838528. 
Throughput: 0: 1047.1. Samples: 709752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:45:18,055][00603] Avg episode reward: [(0, '19.100')] [2024-10-16 02:45:23,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2854912. Throughput: 0: 1004.8. Samples: 714332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:45:23,057][00603] Avg episode reward: [(0, '18.755')] [2024-10-16 02:45:25,481][03717] Updated weights for policy 0, policy_version 700 (0.0044) [2024-10-16 02:45:28,052][00603] Fps is (10 sec: 4096.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2879488. Throughput: 0: 994.1. Samples: 717542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:45:28,054][00603] Avg episode reward: [(0, '18.293')] [2024-10-16 02:45:33,052][00603] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 2904064. Throughput: 0: 1032.6. Samples: 724984. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:45:33,055][00603] Avg episode reward: [(0, '19.591')] [2024-10-16 02:45:34,098][03717] Updated weights for policy 0, policy_version 710 (0.0033) [2024-10-16 02:45:38,054][00603] Fps is (10 sec: 3685.6, 60 sec: 4027.6, 300 sec: 4054.3). Total num frames: 2916352. Throughput: 0: 1022.7. Samples: 730176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-10-16 02:45:38,061][00603] Avg episode reward: [(0, '19.568')] [2024-10-16 02:45:43,052][00603] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2936832. Throughput: 0: 1006.1. Samples: 732808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-10-16 02:45:43,055][00603] Avg episode reward: [(0, '21.287')] [2024-10-16 02:45:44,754][03717] Updated weights for policy 0, policy_version 720 (0.0016) [2024-10-16 02:45:48,052][00603] Fps is (10 sec: 4916.3, 60 sec: 4164.6, 300 sec: 4082.2). Total num frames: 2965504. Throughput: 0: 1028.0. Samples: 740282. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:45:48,055][00603] Avg episode reward: [(0, '22.122')]
[2024-10-16 02:45:53,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 2981888. Throughput: 0: 1052.7. Samples: 746242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:45:53,055][00603] Avg episode reward: [(0, '23.105')]
[2024-10-16 02:45:53,073][03704] Saving new best policy, reward=23.105!
[2024-10-16 02:45:55,243][03717] Updated weights for policy 0, policy_version 730 (0.0067)
[2024-10-16 02:45:58,052][00603] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2998272. Throughput: 0: 1019.9. Samples: 748414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:45:58,055][00603] Avg episode reward: [(0, '22.290')]
[2024-10-16 02:46:03,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.3, 300 sec: 4068.2). Total num frames: 3022848. Throughput: 0: 1011.0. Samples: 755248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:46:03,055][00603] Avg episode reward: [(0, '21.451')]
[2024-10-16 02:46:04,222][03717] Updated weights for policy 0, policy_version 740 (0.0015)
[2024-10-16 02:46:08,052][00603] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 3047424. Throughput: 0: 1062.0. Samples: 762120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:46:08,057][00603] Avg episode reward: [(0, '20.886')]
[2024-10-16 02:46:13,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3059712. Throughput: 0: 1039.0. Samples: 764298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:46:13,059][00603] Avg episode reward: [(0, '20.538')]
[2024-10-16 02:46:15,237][03717] Updated weights for policy 0, policy_version 750 (0.0046)
[2024-10-16 02:46:18,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4068.2). Total num frames: 3084288. Throughput: 0: 1007.2. Samples: 770310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:46:18,059][00603] Avg episode reward: [(0, '21.035')]
[2024-10-16 02:46:23,053][00603] Fps is (10 sec: 4914.7, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 3108864. Throughput: 0: 1056.3. Samples: 777708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:46:23,058][00603] Avg episode reward: [(0, '21.623')]
[2024-10-16 02:46:23,575][03717] Updated weights for policy 0, policy_version 760 (0.0034)
[2024-10-16 02:46:28,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3125248. Throughput: 0: 1054.5. Samples: 780260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-16 02:46:28,055][00603] Avg episode reward: [(0, '22.026')]
[2024-10-16 02:46:33,052][00603] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3145728. Throughput: 0: 1004.9. Samples: 785502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:46:33,055][00603] Avg episode reward: [(0, '21.559')]
[2024-10-16 02:46:34,676][03717] Updated weights for policy 0, policy_version 770 (0.0058)
[2024-10-16 02:46:38,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4232.7, 300 sec: 4096.0). Total num frames: 3170304. Throughput: 0: 1037.1. Samples: 792912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-16 02:46:38,055][00603] Avg episode reward: [(0, '21.806')]
[2024-10-16 02:46:43,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3186688. Throughput: 0: 1067.6. Samples: 796454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:46:43,057][00603] Avg episode reward: [(0, '20.282')]
[2024-10-16 02:46:44,635][03717] Updated weights for policy 0, policy_version 780 (0.0029)
[2024-10-16 02:46:48,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3207168. Throughput: 0: 1014.0. Samples: 800880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:46:48,056][00603] Avg episode reward: [(0, '21.101')]
[2024-10-16 02:46:53,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3231744. Throughput: 0: 1024.0. Samples: 808202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:46:53,054][00603] Avg episode reward: [(0, '22.551')]
[2024-10-16 02:46:53,071][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000789_3231744.pth...
[2024-10-16 02:46:53,197][03704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000549_2248704.pth
[2024-10-16 02:46:53,726][03717] Updated weights for policy 0, policy_version 790 (0.0041)
[2024-10-16 02:46:58,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 3252224. Throughput: 0: 1053.0. Samples: 811684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:46:58,061][00603] Avg episode reward: [(0, '22.959')]
[2024-10-16 02:47:03,052][00603] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3264512. Throughput: 0: 1028.6. Samples: 816596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-16 02:47:03,055][00603] Avg episode reward: [(0, '22.802')]
[2024-10-16 02:47:05,124][03717] Updated weights for policy 0, policy_version 800 (0.0022)
[2024-10-16 02:47:08,052][00603] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3289088. Throughput: 0: 1009.0. Samples: 823112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-16 02:47:08,057][00603] Avg episode reward: [(0, '23.428')]
[2024-10-16 02:47:08,064][03704] Saving new best policy, reward=23.428!
[2024-10-16 02:47:13,052][00603] Fps is (10 sec: 4915.3, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 3313664. Throughput: 0: 1032.5. Samples: 826722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:47:13,055][00603] Avg episode reward: [(0, '23.899')]
[2024-10-16 02:47:13,063][03704] Saving new best policy, reward=23.899!
[2024-10-16 02:47:13,331][03717] Updated weights for policy 0, policy_version 810 (0.0034)
[2024-10-16 02:47:18,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3330048. Throughput: 0: 1041.6. Samples: 832376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:47:18,055][00603] Avg episode reward: [(0, '23.801')]
[2024-10-16 02:47:23,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4082.1). Total num frames: 3350528. Throughput: 0: 1005.7. Samples: 838168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:47:23,060][00603] Avg episode reward: [(0, '23.108')]
[2024-10-16 02:47:24,538][03717] Updated weights for policy 0, policy_version 820 (0.0022)
[2024-10-16 02:47:28,052][00603] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3375104. Throughput: 0: 1005.4. Samples: 841696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-16 02:47:28,059][00603] Avg episode reward: [(0, '23.876')]
[2024-10-16 02:47:33,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3391488. Throughput: 0: 1052.2. Samples: 848230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:47:33,057][00603] Avg episode reward: [(0, '23.734')]
[2024-10-16 02:47:34,784][03717] Updated weights for policy 0, policy_version 830 (0.0041)
[2024-10-16 02:47:38,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3411968. Throughput: 0: 1003.9. Samples: 853378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:47:38,056][00603] Avg episode reward: [(0, '23.367')]
[2024-10-16 02:47:43,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.2). Total num frames: 3436544. Throughput: 0: 1007.3. Samples: 857012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:47:43,054][00603] Avg episode reward: [(0, '23.605')]
[2024-10-16 02:47:43,849][03717] Updated weights for policy 0, policy_version 840 (0.0029)
[2024-10-16 02:47:48,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3457024. Throughput: 0: 1059.6. Samples: 864276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:47:48,060][00603] Avg episode reward: [(0, '22.936')]
[2024-10-16 02:47:53,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3473408. Throughput: 0: 1014.2. Samples: 868752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:47:53,055][00603] Avg episode reward: [(0, '22.812')]
[2024-10-16 02:47:54,581][03717] Updated weights for policy 0, policy_version 850 (0.0033)
[2024-10-16 02:47:58,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3493888. Throughput: 0: 1006.8. Samples: 872030. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:47:58,055][00603] Avg episode reward: [(0, '23.696')]
[2024-10-16 02:48:03,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 3518464. Throughput: 0: 1045.1. Samples: 879404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-10-16 02:48:03,054][00603] Avg episode reward: [(0, '24.361')]
[2024-10-16 02:48:03,070][03704] Saving new best policy, reward=24.361!
[2024-10-16 02:48:03,076][03717] Updated weights for policy 0, policy_version 860 (0.0027)
[2024-10-16 02:48:08,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3534848. Throughput: 0: 1025.8. Samples: 884328. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:48:08,055][00603] Avg episode reward: [(0, '23.719')]
[2024-10-16 02:48:13,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3555328. Throughput: 0: 1008.3. Samples: 887070. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:48:13,055][00603] Avg episode reward: [(0, '24.054')]
[2024-10-16 02:48:14,163][03717] Updated weights for policy 0, policy_version 870 (0.0037)
[2024-10-16 02:48:18,052][00603] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3579904. Throughput: 0: 1026.5. Samples: 894424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:48:18,055][00603] Avg episode reward: [(0, '25.121')]
[2024-10-16 02:48:18,060][03704] Saving new best policy, reward=25.121!
[2024-10-16 02:48:23,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3596288. Throughput: 0: 1041.6. Samples: 900250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:48:23,057][00603] Avg episode reward: [(0, '25.685')]
[2024-10-16 02:48:23,075][03704] Saving new best policy, reward=25.685!
[2024-10-16 02:48:24,686][03717] Updated weights for policy 0, policy_version 880 (0.0038)
[2024-10-16 02:48:28,052][00603] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3616768. Throughput: 0: 1008.3. Samples: 902384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:48:28,054][00603] Avg episode reward: [(0, '25.312')]
[2024-10-16 02:48:33,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3641344. Throughput: 0: 999.6. Samples: 909256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:48:33,054][00603] Avg episode reward: [(0, '25.542')]
[2024-10-16 02:48:33,879][03717] Updated weights for policy 0, policy_version 890 (0.0032)
[2024-10-16 02:48:38,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3661824. Throughput: 0: 1052.4. Samples: 916110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:48:38,054][00603] Avg episode reward: [(0, '26.028')]
[2024-10-16 02:48:38,060][03704] Saving new best policy, reward=26.028!
[2024-10-16 02:48:43,052][00603] Fps is (10 sec: 3276.7, 60 sec: 3959.4, 300 sec: 4068.2). Total num frames: 3674112. Throughput: 0: 1026.2. Samples: 918210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:48:43,058][00603] Avg episode reward: [(0, '24.318')]
[2024-10-16 02:48:44,966][03717] Updated weights for policy 0, policy_version 900 (0.0031)
[2024-10-16 02:48:48,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.3). Total num frames: 3698688. Throughput: 0: 999.1. Samples: 924364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:48:48,055][00603] Avg episode reward: [(0, '24.113')]
[2024-10-16 02:48:53,052][00603] Fps is (10 sec: 4915.4, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3723264. Throughput: 0: 1053.3. Samples: 931728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:48:53,054][00603] Avg episode reward: [(0, '24.082')]
[2024-10-16 02:48:53,069][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000909_3723264.pth...
[2024-10-16 02:48:53,188][03704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000668_2736128.pth
[2024-10-16 02:48:53,349][03717] Updated weights for policy 0, policy_version 910 (0.0035)
[2024-10-16 02:48:58,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3739648. Throughput: 0: 1045.7. Samples: 934126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-10-16 02:48:58,056][00603] Avg episode reward: [(0, '23.201')]
[2024-10-16 02:49:03,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3760128. Throughput: 0: 998.0. Samples: 939334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:49:03,054][00603] Avg episode reward: [(0, '23.098')]
[2024-10-16 02:49:04,446][03717] Updated weights for policy 0, policy_version 920 (0.0034)
[2024-10-16 02:49:08,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3784704. Throughput: 0: 1030.9. Samples: 946640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:49:08,054][00603] Avg episode reward: [(0, '23.376')]
[2024-10-16 02:49:13,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3801088. Throughput: 0: 1056.8. Samples: 949938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:49:13,054][00603] Avg episode reward: [(0, '22.386')]
[2024-10-16 02:49:14,466][03717] Updated weights for policy 0, policy_version 930 (0.0017)
[2024-10-16 02:49:18,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4096.0). Total num frames: 3821568. Throughput: 0: 1003.2. Samples: 954398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:49:18,055][00603] Avg episode reward: [(0, '22.402')]
[2024-10-16 02:49:23,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3846144. Throughput: 0: 1015.5. Samples: 961808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:49:23,062][00603] Avg episode reward: [(0, '22.712')]
[2024-10-16 02:49:23,698][03717] Updated weights for policy 0, policy_version 940 (0.0029)
[2024-10-16 02:49:28,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3866624. Throughput: 0: 1051.2. Samples: 965514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-10-16 02:49:28,055][00603] Avg episode reward: [(0, '22.761')]
[2024-10-16 02:49:33,052][00603] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3883008. Throughput: 0: 1021.1. Samples: 970314. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-16 02:49:33,054][00603] Avg episode reward: [(0, '23.160')]
[2024-10-16 02:49:34,642][03717] Updated weights for policy 0, policy_version 950 (0.0047)
[2024-10-16 02:49:38,052][00603] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3907584. Throughput: 0: 1005.5. Samples: 976976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-10-16 02:49:38,054][00603] Avg episode reward: [(0, '24.975')]
[2024-10-16 02:49:43,052][00603] Fps is (10 sec: 4505.6, 60 sec: 4232.6, 300 sec: 4110.0). Total num frames: 3928064. Throughput: 0: 1035.5. Samples: 980722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-16 02:49:43,054][00603] Avg episode reward: [(0, '25.294')]
[2024-10-16 02:49:43,147][03717] Updated weights for policy 0, policy_version 960 (0.0042)
[2024-10-16 02:49:48,054][00603] Fps is (10 sec: 3685.5, 60 sec: 4095.8, 300 sec: 4109.9). Total num frames: 3944448. Throughput: 0: 1044.6. Samples: 986344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-10-16 02:49:48,059][00603] Avg episode reward: [(0, '25.442')]
[2024-10-16 02:49:53,052][00603] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3964928. Throughput: 0: 1015.6. Samples: 992342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-10-16 02:49:53,059][00603] Avg episode reward: [(0, '25.010')]
[2024-10-16 02:49:54,050][03717] Updated weights for policy 0, policy_version 970 (0.0024)
[2024-10-16 02:49:58,052][00603] Fps is (10 sec: 4506.7, 60 sec: 4164.3, 300 sec: 4110.0). Total num frames: 3989504. Throughput: 0: 1025.4. Samples: 996082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-10-16 02:49:58,060][00603] Avg episode reward: [(0, '25.234')]
[2024-10-16 02:50:01,508][03704] Stopping Batcher_0...
[2024-10-16 02:50:01,508][03704] Loop batcher_evt_loop terminating...
[2024-10-16 02:50:01,510][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-16 02:50:01,509][00603] Component Batcher_0 stopped!
[2024-10-16 02:50:01,611][03717] Weights refcount: 2 0
[2024-10-16 02:50:01,612][03717] Stopping InferenceWorker_p0-w0...
[2024-10-16 02:50:01,613][03717] Loop inference_proc0-0_evt_loop terminating...
[2024-10-16 02:50:01,619][00603] Component InferenceWorker_p0-w0 stopped!
[2024-10-16 02:50:01,690][03704] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000789_3231744.pth
[2024-10-16 02:50:01,722][03704] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-16 02:50:01,988][00603] Component LearnerWorker_p0 stopped!
[2024-10-16 02:50:01,988][03704] Stopping LearnerWorker_p0...
[2024-10-16 02:50:01,998][03704] Loop learner_proc0_evt_loop terminating...
[2024-10-16 02:50:02,082][00603] Component RolloutWorker_w3 stopped!
[2024-10-16 02:50:02,089][03720] Stopping RolloutWorker_w3...
[2024-10-16 02:50:02,089][03720] Loop rollout_proc3_evt_loop terminating...
[2024-10-16 02:50:02,165][00603] Component RolloutWorker_w5 stopped!
[2024-10-16 02:50:02,176][03722] Stopping RolloutWorker_w5...
[2024-10-16 02:50:02,177][03722] Loop rollout_proc5_evt_loop terminating...
[2024-10-16 02:50:02,201][00603] Component RolloutWorker_w1 stopped!
[2024-10-16 02:50:02,203][03719] Stopping RolloutWorker_w1...
[2024-10-16 02:50:02,204][03719] Loop rollout_proc1_evt_loop terminating...
[2024-10-16 02:50:02,302][03723] Stopping RolloutWorker_w4...
[2024-10-16 02:50:02,303][03723] Loop rollout_proc4_evt_loop terminating...
[2024-10-16 02:50:02,302][00603] Component RolloutWorker_w7 stopped!
[2024-10-16 02:50:02,312][00603] Component RolloutWorker_w4 stopped!
[2024-10-16 02:50:02,313][03725] Stopping RolloutWorker_w7...
[2024-10-16 02:50:02,318][03724] Stopping RolloutWorker_w6...
[2024-10-16 02:50:02,318][00603] Component RolloutWorker_w6 stopped!
[2024-10-16 02:50:02,328][03718] Stopping RolloutWorker_w0...
[2024-10-16 02:50:02,328][00603] Component RolloutWorker_w0 stopped!
[2024-10-16 02:50:02,329][03725] Loop rollout_proc7_evt_loop terminating...
[2024-10-16 02:50:02,319][03724] Loop rollout_proc6_evt_loop terminating...
[2024-10-16 02:50:02,346][03718] Loop rollout_proc0_evt_loop terminating...
[2024-10-16 02:50:02,352][03721] Stopping RolloutWorker_w2...
[2024-10-16 02:50:02,352][00603] Component RolloutWorker_w2 stopped!
[2024-10-16 02:50:02,354][00603] Waiting for process learner_proc0 to stop...
[2024-10-16 02:50:02,361][03721] Loop rollout_proc2_evt_loop terminating...
[2024-10-16 02:50:04,149][00603] Waiting for process inference_proc0-0 to join...
[2024-10-16 02:50:04,220][00603] Waiting for process rollout_proc0 to join...
[2024-10-16 02:50:06,569][00603] Waiting for process rollout_proc1 to join...
[2024-10-16 02:50:06,577][00603] Waiting for process rollout_proc2 to join...
[2024-10-16 02:50:06,583][00603] Waiting for process rollout_proc3 to join...
[2024-10-16 02:50:06,587][00603] Waiting for process rollout_proc4 to join...
[2024-10-16 02:50:06,593][00603] Waiting for process rollout_proc5 to join...
[2024-10-16 02:50:06,595][00603] Waiting for process rollout_proc6 to join...
[2024-10-16 02:50:06,599][00603] Waiting for process rollout_proc7 to join...
[2024-10-16 02:50:06,602][00603] Batcher 0 profile tree view:
batching: 25.4277, releasing_batches: 0.0298
[2024-10-16 02:50:06,605][00603] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 372.9132
update_model: 9.0106
  weight_update: 0.0024
one_step: 0.0026
  handle_policy_step: 579.1799
    deserialize: 14.7531, stack: 3.1240, obs_to_device_normalize: 118.4604, forward: 306.7456, send_messages: 28.2668
    prepare_outputs: 79.1585
      to_cpu: 45.0736
[2024-10-16 02:50:06,606][00603] Learner 0 profile tree view:
misc: 0.0049, prepare_batch: 13.7569
train: 73.0831
  epoch_init: 0.0058, minibatch_init: 0.0110, losses_postprocess: 0.5940, kl_divergence: 0.6730, after_optimizer: 33.9300
  calculate_losses: 25.5135
    losses_init: 0.0036, forward_head: 1.1781, bptt_initial: 17.3043, tail: 1.0883, advantages_returns: 0.2217, losses: 3.6486
    bptt: 1.7578
      bptt_forward_core: 1.6850
  update: 11.6600
    clip: 0.9013
[2024-10-16 02:50:06,609][00603] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2629, enqueue_policy_requests: 88.0033, env_step: 781.1599, overhead: 12.2023, complete_rollouts: 6.5842
save_policy_outputs: 19.4104
  split_output_tensors: 8.1970
[2024-10-16 02:50:06,610][00603] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3239, enqueue_policy_requests: 83.4413, env_step: 782.2549, overhead: 12.2931, complete_rollouts: 6.0696
save_policy_outputs: 20.5523
  split_output_tensors: 8.1587
[2024-10-16 02:50:06,612][00603] Loop Runner_EvtLoop terminating...
[2024-10-16 02:50:06,613][00603] Runner profile tree view:
main_loop: 1028.0605
[2024-10-16 02:50:06,614][00603] Collected {0: 4005888}, FPS: 3896.5
[2024-10-16 02:50:43,377][00603] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-16 02:50:43,379][00603] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-16 02:50:43,381][00603] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-16 02:50:43,382][00603] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-16 02:50:43,384][00603] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-16 02:50:43,386][00603] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-16 02:50:43,387][00603] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-10-16 02:50:43,389][00603] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-16 02:50:43,390][00603] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-10-16 02:50:43,391][00603] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-10-16 02:50:43,392][00603] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-16 02:50:43,393][00603] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-16 02:50:43,394][00603] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-16 02:50:43,395][00603] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-16 02:50:43,396][00603] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-16 02:50:43,430][00603] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-16 02:50:43,433][00603] RunningMeanStd input shape: (3, 72, 128)
[2024-10-16 02:50:43,435][00603] RunningMeanStd input shape: (1,)
[2024-10-16 02:50:43,452][00603] ConvEncoder: input_channels=3
[2024-10-16 02:50:43,556][00603] Conv encoder output size: 512
[2024-10-16 02:50:43,557][00603] Policy head output size: 512
[2024-10-16 02:50:43,838][00603] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-16 02:50:44,631][00603] Num frames 100...
[2024-10-16 02:50:44,757][00603] Num frames 200...
[2024-10-16 02:50:44,880][00603] Num frames 300...
[2024-10-16 02:50:45,014][00603] Num frames 400...
[2024-10-16 02:50:45,129][00603] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2024-10-16 02:50:45,131][00603] Avg episode reward: 5.480, avg true_objective: 4.480
[2024-10-16 02:50:45,198][00603] Num frames 500...
[2024-10-16 02:50:45,324][00603] Num frames 600...
[2024-10-16 02:50:45,500][00603] Num frames 700...
[2024-10-16 02:50:45,667][00603] Num frames 800...
[2024-10-16 02:50:45,792][00603] Num frames 900...
[2024-10-16 02:50:45,913][00603] Num frames 1000...
[2024-10-16 02:50:46,048][00603] Num frames 1100...
[2024-10-16 02:50:46,190][00603] Num frames 1200...
[2024-10-16 02:50:46,369][00603] Num frames 1300...
[2024-10-16 02:50:46,519][00603] Num frames 1400...
[2024-10-16 02:50:46,641][00603] Num frames 1500...
[2024-10-16 02:50:46,775][00603] Num frames 1600...
[2024-10-16 02:50:46,898][00603] Num frames 1700...
[2024-10-16 02:50:47,031][00603] Num frames 1800...
[2024-10-16 02:50:47,153][00603] Num frames 1900...
[2024-10-16 02:50:47,275][00603] Num frames 2000...
[2024-10-16 02:50:47,426][00603] Num frames 2100...
[2024-10-16 02:50:47,615][00603] Num frames 2200...
[2024-10-16 02:50:47,810][00603] Num frames 2300...
[2024-10-16 02:50:47,998][00603] Num frames 2400...
[2024-10-16 02:50:48,180][00603] Num frames 2500...
[2024-10-16 02:50:48,329][00603] Avg episode rewards: #0: 33.739, true rewards: #0: 12.740
[2024-10-16 02:50:48,335][00603] Avg episode reward: 33.739, avg true_objective: 12.740
[2024-10-16 02:50:48,458][00603] Num frames 2600...
[2024-10-16 02:50:48,691][00603] Num frames 2700...
[2024-10-16 02:50:48,893][00603] Num frames 2800...
[2024-10-16 02:50:49,140][00603] Num frames 2900...
[2024-10-16 02:50:49,543][00603] Num frames 3000...
[2024-10-16 02:50:50,005][00603] Num frames 3100...
[2024-10-16 02:50:50,308][00603] Num frames 3200...
[2024-10-16 02:50:50,809][00603] Avg episode rewards: #0: 29.646, true rewards: #0: 10.980
[2024-10-16 02:50:50,813][00603] Avg episode reward: 29.646, avg true_objective: 10.980
[2024-10-16 02:50:50,839][00603] Num frames 3300...
[2024-10-16 02:50:51,194][00603] Num frames 3400...
[2024-10-16 02:50:51,543][00603] Num frames 3500...
[2024-10-16 02:50:51,852][00603] Num frames 3600...
[2024-10-16 02:50:52,259][00603] Num frames 3700...
[2024-10-16 02:50:52,480][00603] Num frames 3800...
[2024-10-16 02:50:52,663][00603] Num frames 3900...
[2024-10-16 02:50:52,841][00603] Num frames 4000...
[2024-10-16 02:50:53,052][00603] Num frames 4100...
[2024-10-16 02:50:53,216][00603] Avg episode rewards: #0: 26.475, true rewards: #0: 10.475
[2024-10-16 02:50:53,218][00603] Avg episode reward: 26.475, avg true_objective: 10.475
[2024-10-16 02:50:53,235][00603] Num frames 4200...
[2024-10-16 02:50:53,357][00603] Num frames 4300...
[2024-10-16 02:50:53,480][00603] Num frames 4400...
[2024-10-16 02:50:53,604][00603] Num frames 4500...
[2024-10-16 02:50:53,725][00603] Num frames 4600...
[2024-10-16 02:50:53,847][00603] Num frames 4700...
[2024-10-16 02:50:53,982][00603] Num frames 4800...
[2024-10-16 02:50:54,078][00603] Avg episode rewards: #0: 23.660, true rewards: #0: 9.660
[2024-10-16 02:50:54,079][00603] Avg episode reward: 23.660, avg true_objective: 9.660
[2024-10-16 02:50:54,167][00603] Num frames 4900...
[2024-10-16 02:50:54,290][00603] Num frames 5000...
[2024-10-16 02:50:54,412][00603] Num frames 5100...
[2024-10-16 02:50:54,531][00603] Num frames 5200...
[2024-10-16 02:50:54,650][00603] Num frames 5300...
[2024-10-16 02:50:54,771][00603] Num frames 5400...
[2024-10-16 02:50:54,892][00603] Num frames 5500...
[2024-10-16 02:50:55,032][00603] Num frames 5600...
[2024-10-16 02:50:55,153][00603] Num frames 5700...
[2024-10-16 02:50:55,277][00603] Num frames 5800...
[2024-10-16 02:50:55,402][00603] Num frames 5900...
[2024-10-16 02:50:55,568][00603] Avg episode rewards: #0: 24.488, true rewards: #0: 9.988
[2024-10-16 02:50:55,569][00603] Avg episode reward: 24.488, avg true_objective: 9.988
[2024-10-16 02:50:55,581][00603] Num frames 6000...
[2024-10-16 02:50:55,700][00603] Num frames 6100...
[2024-10-16 02:50:55,822][00603] Num frames 6200...
[2024-10-16 02:50:55,950][00603] Num frames 6300...
[2024-10-16 02:50:56,079][00603] Num frames 6400...
[2024-10-16 02:50:56,198][00603] Num frames 6500...
[2024-10-16 02:50:56,322][00603] Num frames 6600...
[2024-10-16 02:50:56,444][00603] Num frames 6700...
[2024-10-16 02:50:56,565][00603] Num frames 6800...
[2024-10-16 02:50:56,686][00603] Num frames 6900...
[2024-10-16 02:50:56,835][00603] Num frames 7000...
[2024-10-16 02:50:56,975][00603] Num frames 7100...
[2024-10-16 02:50:57,105][00603] Num frames 7200...
[2024-10-16 02:50:57,231][00603] Num frames 7300...
[2024-10-16 02:50:57,358][00603] Num frames 7400...
[2024-10-16 02:50:57,480][00603] Num frames 7500...
[2024-10-16 02:50:57,599][00603] Num frames 7600...
[2024-10-16 02:50:57,718][00603] Num frames 7700...
[2024-10-16 02:50:57,841][00603] Num frames 7800...
[2024-10-16 02:50:57,919][00603] Avg episode rewards: #0: 27.453, true rewards: #0: 11.167
[2024-10-16 02:50:57,921][00603] Avg episode reward: 27.453, avg true_objective: 11.167
[2024-10-16 02:50:58,030][00603] Num frames 7900...
[2024-10-16 02:50:58,163][00603] Num frames 8000...
[2024-10-16 02:50:58,284][00603] Num frames 8100...
[2024-10-16 02:50:58,406][00603] Num frames 8200...
[2024-10-16 02:50:58,526][00603] Num frames 8300...
[2024-10-16 02:50:58,645][00603] Num frames 8400...
[2024-10-16 02:50:58,769][00603] Avg episode rewards: #0: 25.446, true rewards: #0: 10.571
[2024-10-16 02:50:58,770][00603] Avg episode reward: 25.446, avg true_objective: 10.571
[2024-10-16 02:50:58,825][00603] Num frames 8500...
[2024-10-16 02:50:58,951][00603] Num frames 8600...
[2024-10-16 02:50:59,074][00603] Num frames 8700...
[2024-10-16 02:50:59,199][00603] Num frames 8800...
[2024-10-16 02:50:59,344][00603] Avg episode rewards: #0: 23.303, true rewards: #0: 9.859
[2024-10-16 02:50:59,345][00603] Avg episode reward: 23.303, avg true_objective: 9.859
[2024-10-16 02:50:59,379][00603] Num frames 8900...
[2024-10-16 02:50:59,497][00603] Num frames 9000...
[2024-10-16 02:50:59,615][00603] Num frames 9100...
[2024-10-16 02:50:59,734][00603] Num frames 9200...
[2024-10-16 02:50:59,856][00603] Num frames 9300...
[2024-10-16 02:50:59,983][00603] Num frames 9400...
[2024-10-16 02:51:00,103][00603] Num frames 9500...
[2024-10-16 02:51:00,234][00603] Num frames 9600...
[2024-10-16 02:51:00,358][00603] Num frames 9700...
[2024-10-16 02:51:00,481][00603] Num frames 9800...
[2024-10-16 02:51:00,600][00603] Num frames 9900...
[2024-10-16 02:51:00,721][00603] Num frames 10000...
[2024-10-16 02:51:00,843][00603] Num frames 10100...
[2024-10-16 02:51:00,969][00603] Num frames 10200...
[2024-10-16 02:51:01,092][00603] Num frames 10300...
[2024-10-16 02:51:01,221][00603] Num frames 10400...
[2024-10-16 02:51:01,345][00603] Num frames 10500...
[2024-10-16 02:51:01,471][00603] Num frames 10600... [2024-10-16 02:51:01,593][00603] Num frames 10700... [2024-10-16 02:51:01,715][00603] Num frames 10800... [2024-10-16 02:51:01,839][00603] Num frames 10900... [2024-10-16 02:51:01,923][00603] Avg episode rewards: #0: 25.921, true rewards: #0: 10.921 [2024-10-16 02:51:01,926][00603] Avg episode reward: 25.921, avg true_objective: 10.921 [2024-10-16 02:52:01,890][00603] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-10-16 02:53:26,631][00603] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-10-16 02:53:26,633][00603] Overriding arg 'num_workers' with value 1 passed from command line [2024-10-16 02:53:26,635][00603] Adding new argument 'no_render'=True that is not in the saved config file! [2024-10-16 02:53:26,637][00603] Adding new argument 'save_video'=True that is not in the saved config file! [2024-10-16 02:53:26,639][00603] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-10-16 02:53:26,643][00603] Adding new argument 'video_name'=None that is not in the saved config file! [2024-10-16 02:53:26,644][00603] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-10-16 02:53:26,646][00603] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-10-16 02:53:26,648][00603] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-10-16 02:53:26,650][00603] Adding new argument 'hf_repository'='77qq/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-10-16 02:53:26,651][00603] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-10-16 02:53:26,652][00603] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
[2024-10-16 02:53:26,653][00603] Adding new argument 'train_script'=None that is not in the saved config file! [2024-10-16 02:53:26,654][00603] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-10-16 02:53:26,655][00603] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-16 02:53:26,684][00603] RunningMeanStd input shape: (3, 72, 128) [2024-10-16 02:53:26,685][00603] RunningMeanStd input shape: (1,) [2024-10-16 02:53:26,698][00603] ConvEncoder: input_channels=3 [2024-10-16 02:53:26,735][00603] Conv encoder output size: 512 [2024-10-16 02:53:26,736][00603] Policy head output size: 512 [2024-10-16 02:53:26,756][00603] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-10-16 02:53:27,331][00603] Num frames 100... [2024-10-16 02:53:27,511][00603] Num frames 200... [2024-10-16 02:53:27,671][00603] Num frames 300... [2024-10-16 02:53:27,832][00603] Num frames 400... [2024-10-16 02:53:28,015][00603] Num frames 500... [2024-10-16 02:53:28,175][00603] Num frames 600... [2024-10-16 02:53:28,340][00603] Num frames 700... [2024-10-16 02:53:28,506][00603] Num frames 800... [2024-10-16 02:53:28,680][00603] Num frames 900... [2024-10-16 02:53:28,850][00603] Num frames 1000... [2024-10-16 02:53:29,020][00603] Num frames 1100... [2024-10-16 02:53:29,197][00603] Num frames 1200... [2024-10-16 02:53:29,369][00603] Num frames 1300... [2024-10-16 02:53:29,490][00603] Num frames 1400... [2024-10-16 02:53:29,610][00603] Num frames 1500... [2024-10-16 02:53:29,734][00603] Num frames 1600... [2024-10-16 02:53:29,855][00603] Num frames 1700... [2024-10-16 02:53:29,985][00603] Num frames 1800... [2024-10-16 02:53:30,114][00603] Num frames 1900... [2024-10-16 02:53:30,236][00603] Num frames 2000... [2024-10-16 02:53:30,357][00603] Num frames 2100... 
[2024-10-16 02:53:30,409][00603] Avg episode rewards: #0: 56.999, true rewards: #0: 21.000 [2024-10-16 02:53:30,411][00603] Avg episode reward: 56.999, avg true_objective: 21.000 [2024-10-16 02:53:30,527][00603] Num frames 2200... [2024-10-16 02:53:30,647][00603] Num frames 2300... [2024-10-16 02:53:30,765][00603] Num frames 2400... [2024-10-16 02:53:30,885][00603] Num frames 2500... [2024-10-16 02:53:31,012][00603] Num frames 2600... [2024-10-16 02:53:31,081][00603] Avg episode rewards: #0: 33.059, true rewards: #0: 13.060 [2024-10-16 02:53:31,083][00603] Avg episode reward: 33.059, avg true_objective: 13.060 [2024-10-16 02:53:31,196][00603] Num frames 2700... [2024-10-16 02:53:31,314][00603] Num frames 2800... [2024-10-16 02:53:31,436][00603] Num frames 2900... [2024-10-16 02:53:31,557][00603] Num frames 3000... [2024-10-16 02:53:31,676][00603] Num frames 3100... [2024-10-16 02:53:31,797][00603] Num frames 3200... [2024-10-16 02:53:31,913][00603] Avg episode rewards: #0: 25.840, true rewards: #0: 10.840 [2024-10-16 02:53:31,915][00603] Avg episode reward: 25.840, avg true_objective: 10.840 [2024-10-16 02:53:31,982][00603] Num frames 3300... [2024-10-16 02:53:32,101][00603] Num frames 3400... [2024-10-16 02:53:32,231][00603] Num frames 3500... [2024-10-16 02:53:32,351][00603] Num frames 3600... [2024-10-16 02:53:32,468][00603] Num frames 3700... [2024-10-16 02:53:32,586][00603] Num frames 3800... [2024-10-16 02:53:32,706][00603] Num frames 3900... [2024-10-16 02:53:32,823][00603] Num frames 4000... [2024-10-16 02:53:32,947][00603] Num frames 4100... [2024-10-16 02:53:33,046][00603] Avg episode rewards: #0: 24.340, true rewards: #0: 10.340 [2024-10-16 02:53:33,048][00603] Avg episode reward: 24.340, avg true_objective: 10.340 [2024-10-16 02:53:33,124][00603] Num frames 4200... [2024-10-16 02:53:33,250][00603] Num frames 4300... [2024-10-16 02:53:33,368][00603] Num frames 4400... [2024-10-16 02:53:33,485][00603] Num frames 4500... 
[2024-10-16 02:53:33,603][00603] Num frames 4600...
[2024-10-16 02:53:33,721][00603] Num frames 4700...
[2024-10-16 02:53:33,842][00603] Num frames 4800...
[2024-10-16 02:53:33,969][00603] Num frames 4900...
[2024-10-16 02:53:34,087][00603] Num frames 5000...
[2024-10-16 02:53:34,214][00603] Num frames 5100...
[2024-10-16 02:53:34,332][00603] Num frames 5200...
[2024-10-16 02:53:34,458][00603] Num frames 5300...
[2024-10-16 02:53:34,510][00603] Avg episode rewards: #0: 24.800, true rewards: #0: 10.600
[2024-10-16 02:53:34,512][00603] Avg episode reward: 24.800, avg true_objective: 10.600
[2024-10-16 02:53:34,630][00603] Num frames 5400...
[2024-10-16 02:53:34,753][00603] Num frames 5500...
[2024-10-16 02:53:34,874][00603] Num frames 5600...
[2024-10-16 02:53:35,002][00603] Num frames 5700...
[2024-10-16 02:53:35,121][00603] Num frames 5800...
[2024-10-16 02:53:35,269][00603] Avg episode rewards: #0: 22.293, true rewards: #0: 9.793
[2024-10-16 02:53:35,271][00603] Avg episode reward: 22.293, avg true_objective: 9.793
[2024-10-16 02:53:35,303][00603] Num frames 5900...
[2024-10-16 02:53:35,426][00603] Num frames 6000...
[2024-10-16 02:53:35,545][00603] Num frames 6100...
[2024-10-16 02:53:35,665][00603] Num frames 6200...
[2024-10-16 02:53:35,787][00603] Num frames 6300...
[2024-10-16 02:53:35,907][00603] Num frames 6400...
[2024-10-16 02:53:36,036][00603] Num frames 6500...
[2024-10-16 02:53:36,154][00603] Num frames 6600...
[2024-10-16 02:53:36,285][00603] Num frames 6700...
[2024-10-16 02:53:36,427][00603] Avg episode rewards: #0: 22.246, true rewards: #0: 9.674
[2024-10-16 02:53:36,429][00603] Avg episode reward: 22.246, avg true_objective: 9.674
[2024-10-16 02:53:36,464][00603] Num frames 6800...
[2024-10-16 02:53:36,581][00603] Num frames 6900...
[2024-10-16 02:53:36,702][00603] Num frames 7000...
[2024-10-16 02:53:36,824][00603] Num frames 7100...
[2024-10-16 02:53:36,950][00603] Num frames 7200...
[2024-10-16 02:53:37,074][00603] Num frames 7300...
[2024-10-16 02:53:37,194][00603] Num frames 7400...
[2024-10-16 02:53:37,265][00603] Avg episode rewards: #0: 20.890, true rewards: #0: 9.265
[2024-10-16 02:53:37,267][00603] Avg episode reward: 20.890, avg true_objective: 9.265
[2024-10-16 02:53:37,375][00603] Num frames 7500...
[2024-10-16 02:53:37,506][00603] Num frames 7600...
[2024-10-16 02:53:37,628][00603] Num frames 7700...
[2024-10-16 02:53:37,748][00603] Num frames 7800...
[2024-10-16 02:53:37,870][00603] Num frames 7900...
[2024-10-16 02:53:38,003][00603] Num frames 8000...
[2024-10-16 02:53:38,122][00603] Num frames 8100...
[2024-10-16 02:53:38,245][00603] Num frames 8200...
[2024-10-16 02:53:38,372][00603] Num frames 8300...
[2024-10-16 02:53:38,493][00603] Num frames 8400...
[2024-10-16 02:53:38,616][00603] Num frames 8500...
[2024-10-16 02:53:38,739][00603] Num frames 8600...
[2024-10-16 02:53:38,860][00603] Num frames 8700...
[2024-10-16 02:53:38,988][00603] Num frames 8800...
[2024-10-16 02:53:39,109][00603] Num frames 8900...
[2024-10-16 02:53:39,227][00603] Num frames 9000...
[2024-10-16 02:53:39,359][00603] Num frames 9100...
[2024-10-16 02:53:39,536][00603] Num frames 9200...
[2024-10-16 02:53:39,702][00603] Num frames 9300...
[2024-10-16 02:53:39,870][00603] Num frames 9400...
[2024-10-16 02:53:40,037][00603] Num frames 9500...
[2024-10-16 02:53:40,114][00603] Avg episode rewards: #0: 25.235, true rewards: #0: 10.569
[2024-10-16 02:53:40,116][00603] Avg episode reward: 25.235, avg true_objective: 10.569
[2024-10-16 02:53:40,259][00603] Num frames 9600...
[2024-10-16 02:53:40,424][00603] Num frames 9700...
[2024-10-16 02:53:40,594][00603] Num frames 9800...
[2024-10-16 02:53:40,762][00603] Num frames 9900...
[2024-10-16 02:53:40,936][00603] Num frames 10000...
[2024-10-16 02:53:41,117][00603] Num frames 10100...
[2024-10-16 02:53:41,262][00603] Avg episode rewards: #0: 23.852, true rewards: #0: 10.152
[2024-10-16 02:53:41,263][00603] Avg episode reward: 23.852, avg true_objective: 10.152
[2024-10-16 02:54:36,945][00603] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-10-16 02:54:51,855][00603] The model has been pushed to https://huggingface.co/77qq/rl_course_vizdoom_health_gathering_supreme
[2024-10-16 02:57:20,711][00603] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json
[2024-10-16 02:57:20,713][00603] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json
[2024-10-16 02:57:20,715][00603] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line
[2024-10-16 02:57:20,717][00603] Overriding arg 'train_dir' with value 'train_dir' passed from command line
[2024-10-16 02:57:20,719][00603] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-16 02:57:20,721][00603] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file!
[2024-10-16 02:57:20,722][00603] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file!
[2024-10-16 02:57:20,724][00603] Adding new argument 'env_gpu_observations'=True that is not in the saved config file!
[2024-10-16 02:57:20,725][00603] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-16 02:57:20,726][00603] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-16 02:57:20,727][00603] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-16 02:57:20,728][00603] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-16 02:57:20,729][00603] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
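The "Overriding arg" and "Adding new argument ... not in the saved config file!" lines above come from merging the experiment's saved config with values passed on the command line: known keys are overridden, unknown keys are added with a warning. A minimal sketch of that merge pattern, with a hypothetical function name (this is not Sample Factory's actual code):

```python
# Sketch of the config-merge behavior behind the "Overriding arg" /
# "Adding new argument" log lines. merge_config is a hypothetical helper.

def merge_config(saved_config, cli_args, log=print):
    """Merge CLI arguments into a saved experiment config, logging each change."""
    cfg = dict(saved_config)
    for key, value in cli_args.items():
        if key in cfg:
            log(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            log(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

merged = merge_config(
    {"num_workers": 8, "experiment": "default_experiment"},
    {"num_workers": 1, "no_render": True},
)
```

Keys that appear only in the saved config (like `experiment` here) pass through unchanged.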
[2024-10-16 02:57:20,730][00603] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-16 02:57:20,731][00603] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-10-16 02:57:20,732][00603] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-10-16 02:57:20,733][00603] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-16 02:57:20,734][00603] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-16 02:57:20,735][00603] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-16 02:57:20,736][00603] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-16 02:57:20,737][00603] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-16 02:57:20,770][00603] RunningMeanStd input shape: (3, 72, 128)
[2024-10-16 02:57:20,772][00603] RunningMeanStd input shape: (1,)
[2024-10-16 02:57:20,783][00603] ConvEncoder: input_channels=3
[2024-10-16 02:57:20,828][00603] Conv encoder output size: 512
[2024-10-16 02:57:20,829][00603] Policy head output size: 512
[2024-10-16 02:57:20,853][00603] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth...
[2024-10-16 02:57:21,290][00603] Num frames 100...
[2024-10-16 02:57:21,418][00603] Num frames 200...
[2024-10-16 02:57:21,542][00603] Num frames 300...
[2024-10-16 02:57:21,661][00603] Num frames 400...
[2024-10-16 02:57:21,782][00603] Num frames 500...
[2024-10-16 02:57:21,906][00603] Num frames 600...
[2024-10-16 02:57:22,054][00603] Num frames 700...
[2024-10-16 02:57:22,182][00603] Num frames 800...
[2024-10-16 02:57:22,309][00603] Num frames 900...
[2024-10-16 02:57:22,433][00603] Num frames 1000...
[2024-10-16 02:57:22,561][00603] Num frames 1100...
[2024-10-16 02:57:22,680][00603] Num frames 1200...
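The two "RunningMeanStd input shape" lines above refer to running normalizers: one over image observations of shape (3, 72, 128) and one over a scalar stream of shape (1,). The standard technique tracks a running mean and variance with a batched (parallel-variance) update; a sketch is below. Sample Factory's actual class differs in detail, so treat this as the generic idea only:

```python
import numpy as np

class RunningMeanStd:
    """Running mean/variance over batches, as used to normalize RL inputs.

    Generic parallel-variance update (Chan et al.), not Sample Factory's
    exact implementation. `shape` matches the logged input shapes,
    e.g. (3, 72, 128) for observations or (1,) for scalars.
    """

    def __init__(self, shape, epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon  # avoids division by zero before the first update

    def update(self, batch):
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count

        self.mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count
              + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)

rms = RunningMeanStd(shape=(1,))
rms.update(np.array([[1.0], [2.0], [3.0]]))
```

After that single batch the tracked mean is close to 2.0 and the variance close to 2/3, slightly biased by the `epsilon` pseudo-count.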
[2024-10-16 02:57:22,805][00603] Num frames 1300...
[2024-10-16 02:57:22,935][00603] Num frames 1400...
[2024-10-16 02:57:23,058][00603] Num frames 1500...
[2024-10-16 02:57:23,180][00603] Num frames 1600...
[2024-10-16 02:57:23,303][00603] Num frames 1700...
[2024-10-16 02:57:23,428][00603] Num frames 1800...
[2024-10-16 02:57:23,560][00603] Num frames 1900...
[2024-10-16 02:57:23,683][00603] Num frames 2000...
[2024-10-16 02:57:23,826][00603] Num frames 2100...
[2024-10-16 02:57:23,879][00603] Avg episode rewards: #0: 62.999, true rewards: #0: 21.000
[2024-10-16 02:57:23,881][00603] Avg episode reward: 62.999, avg true_objective: 21.000
[2024-10-16 02:57:24,058][00603] Num frames 2200...
[2024-10-16 02:57:24,227][00603] Num frames 2300...
[2024-10-16 02:57:24,393][00603] Num frames 2400...
[2024-10-16 02:57:24,561][00603] Num frames 2500...
[2024-10-16 02:57:24,723][00603] Num frames 2600...
[2024-10-16 02:57:24,888][00603] Num frames 2700...
[2024-10-16 02:57:25,058][00603] Num frames 2800...
[2024-10-16 02:57:25,225][00603] Num frames 2900...
[2024-10-16 02:57:25,403][00603] Num frames 3000...
[2024-10-16 02:57:25,585][00603] Num frames 3100...
[2024-10-16 02:57:25,758][00603] Num frames 3200...
[2024-10-16 02:57:25,949][00603] Num frames 3300...
[2024-10-16 02:57:26,124][00603] Num frames 3400...
[2024-10-16 02:57:26,248][00603] Num frames 3500...
[2024-10-16 02:57:26,376][00603] Num frames 3600...
[2024-10-16 02:57:26,499][00603] Num frames 3700...
[2024-10-16 02:57:26,629][00603] Num frames 3800...
[2024-10-16 02:57:26,751][00603] Num frames 3900...
[2024-10-16 02:57:26,874][00603] Num frames 4000...
[2024-10-16 02:57:27,007][00603] Num frames 4100...
[2024-10-16 02:57:27,133][00603] Num frames 4200...
[2024-10-16 02:57:27,185][00603] Avg episode rewards: #0: 63.999, true rewards: #0: 21.000
[2024-10-16 02:57:27,186][00603] Avg episode reward: 63.999, avg true_objective: 21.000
[2024-10-16 02:57:27,310][00603] Num frames 4300...
[2024-10-16 02:57:27,431][00603] Num frames 4400...
[2024-10-16 02:57:27,551][00603] Num frames 4500...
[2024-10-16 02:57:27,684][00603] Num frames 4600...
[2024-10-16 02:57:27,807][00603] Num frames 4700...
[2024-10-16 02:57:27,933][00603] Num frames 4800...
[2024-10-16 02:57:28,056][00603] Num frames 4900...
[2024-10-16 02:57:28,180][00603] Num frames 5000...
[2024-10-16 02:57:28,309][00603] Num frames 5100...
[2024-10-16 02:57:28,442][00603] Num frames 5200...
[2024-10-16 02:57:28,580][00603] Num frames 5300...
[2024-10-16 02:57:28,700][00603] Avg episode rewards: #0: 54.839, true rewards: #0: 17.840
[2024-10-16 02:57:28,702][00603] Avg episode reward: 54.839, avg true_objective: 17.840
[2024-10-16 02:57:28,764][00603] Num frames 5400...
[2024-10-16 02:57:28,888][00603] Num frames 5500...
[2024-10-16 02:57:29,022][00603] Num frames 5600...
[2024-10-16 02:57:29,146][00603] Num frames 5700...
[2024-10-16 02:57:29,268][00603] Num frames 5800...
[2024-10-16 02:57:29,391][00603] Num frames 5900...
[2024-10-16 02:57:29,512][00603] Num frames 6000...
[2024-10-16 02:57:29,635][00603] Num frames 6100...
[2024-10-16 02:57:29,766][00603] Num frames 6200...
[2024-10-16 02:57:29,889][00603] Num frames 6300...
[2024-10-16 02:57:30,015][00603] Num frames 6400...
[2024-10-16 02:57:30,135][00603] Num frames 6500...
[2024-10-16 02:57:30,255][00603] Num frames 6600...
[2024-10-16 02:57:30,383][00603] Num frames 6700...
[2024-10-16 02:57:30,505][00603] Num frames 6800...
[2024-10-16 02:57:30,627][00603] Num frames 6900...
[2024-10-16 02:57:30,760][00603] Num frames 7000...
[2024-10-16 02:57:30,881][00603] Num frames 7100...
[2024-10-16 02:57:31,015][00603] Num frames 7200...
[2024-10-16 02:57:31,138][00603] Num frames 7300...
[2024-10-16 02:57:31,265][00603] Num frames 7400...
[2024-10-16 02:57:31,385][00603] Avg episode rewards: #0: 57.379, true rewards: #0: 18.630
[2024-10-16 02:57:31,387][00603] Avg episode reward: 57.379, avg true_objective: 18.630
[2024-10-16 02:57:31,449][00603] Num frames 7500...
[2024-10-16 02:57:31,568][00603] Num frames 7600...
[2024-10-16 02:57:31,689][00603] Num frames 7700...
[2024-10-16 02:57:31,816][00603] Num frames 7800...
[2024-10-16 02:57:31,943][00603] Num frames 7900...
[2024-10-16 02:57:32,065][00603] Num frames 8000...
[2024-10-16 02:57:32,183][00603] Num frames 8100...
[2024-10-16 02:57:32,312][00603] Num frames 8200...
[2024-10-16 02:57:32,436][00603] Num frames 8300...
[2024-10-16 02:57:32,561][00603] Num frames 8400...
[2024-10-16 02:57:32,683][00603] Num frames 8500...
[2024-10-16 02:57:32,814][00603] Num frames 8600...
[2024-10-16 02:57:32,947][00603] Num frames 8700...
[2024-10-16 02:57:33,071][00603] Num frames 8800...
[2024-10-16 02:57:33,195][00603] Num frames 8900...
[2024-10-16 02:57:33,320][00603] Num frames 9000...
[2024-10-16 02:57:33,448][00603] Num frames 9100...
[2024-10-16 02:57:33,576][00603] Num frames 9200...
[2024-10-16 02:57:33,697][00603] Num frames 9300...
[2024-10-16 02:57:33,831][00603] Num frames 9400...
[2024-10-16 02:57:33,965][00603] Num frames 9500...
[2024-10-16 02:57:34,084][00603] Avg episode rewards: #0: 58.303, true rewards: #0: 19.104
[2024-10-16 02:57:34,086][00603] Avg episode reward: 58.303, avg true_objective: 19.104
[2024-10-16 02:57:34,146][00603] Num frames 9600...
[2024-10-16 02:57:34,274][00603] Num frames 9700...
[2024-10-16 02:57:34,405][00603] Num frames 9800...
[2024-10-16 02:57:34,531][00603] Num frames 9900...
[2024-10-16 02:57:34,654][00603] Num frames 10000...
[2024-10-16 02:57:34,778][00603] Num frames 10100...
[2024-10-16 02:57:34,910][00603] Num frames 10200...
[2024-10-16 02:57:35,049][00603] Num frames 10300...
[2024-10-16 02:57:35,173][00603] Num frames 10400...
[2024-10-16 02:57:35,299][00603] Num frames 10500...
[2024-10-16 02:57:35,421][00603] Num frames 10600...
[2024-10-16 02:57:35,546][00603] Num frames 10700...
[2024-10-16 02:57:35,677][00603] Num frames 10800...
[2024-10-16 02:57:35,803][00603] Num frames 10900...
[2024-10-16 02:57:35,939][00603] Num frames 11000...
[2024-10-16 02:57:36,063][00603] Num frames 11100...
[2024-10-16 02:57:36,223][00603] Num frames 11200...
[2024-10-16 02:57:36,397][00603] Num frames 11300...
[2024-10-16 02:57:36,563][00603] Num frames 11400...
[2024-10-16 02:57:36,735][00603] Num frames 11500...
[2024-10-16 02:57:36,907][00603] Avg episode rewards: #0: 58.110, true rewards: #0: 19.278
[2024-10-16 02:57:36,909][00603] Avg episode reward: 58.110, avg true_objective: 19.278
[2024-10-16 02:57:36,971][00603] Num frames 11600...
[2024-10-16 02:57:37,138][00603] Num frames 11700...
[2024-10-16 02:57:37,305][00603] Num frames 11800...
[2024-10-16 02:57:37,477][00603] Num frames 11900...
[2024-10-16 02:57:37,651][00603] Num frames 12000...
[2024-10-16 02:57:37,829][00603] Num frames 12100...
[2024-10-16 02:57:38,019][00603] Num frames 12200...
[2024-10-16 02:57:38,195][00603] Num frames 12300...
[2024-10-16 02:57:38,374][00603] Num frames 12400...
[2024-10-16 02:57:38,520][00603] Num frames 12500...
[2024-10-16 02:57:38,642][00603] Num frames 12600...
[2024-10-16 02:57:38,767][00603] Num frames 12700...
[2024-10-16 02:57:38,890][00603] Num frames 12800...
[2024-10-16 02:57:39,031][00603] Num frames 12900...
[2024-10-16 02:57:39,154][00603] Num frames 13000...
[2024-10-16 02:57:39,278][00603] Num frames 13100...
[2024-10-16 02:57:39,403][00603] Num frames 13200...
[2024-10-16 02:57:39,526][00603] Num frames 13300...
[2024-10-16 02:57:39,651][00603] Num frames 13400...
[2024-10-16 02:57:39,780][00603] Num frames 13500...
[2024-10-16 02:57:39,906][00603] Num frames 13600...
[2024-10-16 02:57:40,063][00603] Avg episode rewards: #0: 58.380, true rewards: #0: 19.524
[2024-10-16 02:57:40,065][00603] Avg episode reward: 58.380, avg true_objective: 19.524
[2024-10-16 02:57:40,107][00603] Num frames 13700...
[2024-10-16 02:57:40,232][00603] Num frames 13800...
[2024-10-16 02:57:40,355][00603] Num frames 13900...
[2024-10-16 02:57:40,482][00603] Num frames 14000...
[2024-10-16 02:57:40,606][00603] Num frames 14100...
[2024-10-16 02:57:40,728][00603] Num frames 14200...
[2024-10-16 02:57:40,852][00603] Num frames 14300...
[2024-10-16 02:57:40,986][00603] Num frames 14400...
[2024-10-16 02:57:41,113][00603] Num frames 14500...
[2024-10-16 02:57:41,240][00603] Num frames 14600...
[2024-10-16 02:57:41,367][00603] Num frames 14700...
[2024-10-16 02:57:41,493][00603] Num frames 14800...
[2024-10-16 02:57:41,621][00603] Num frames 14900...
[2024-10-16 02:57:41,747][00603] Num frames 15000...
[2024-10-16 02:57:41,876][00603] Num frames 15100...
[2024-10-16 02:57:42,021][00603] Num frames 15200...
[2024-10-16 02:57:42,150][00603] Num frames 15300...
[2024-10-16 02:57:42,279][00603] Num frames 15400...
[2024-10-16 02:57:42,405][00603] Num frames 15500...
[2024-10-16 02:57:42,530][00603] Num frames 15600...
[2024-10-16 02:57:42,655][00603] Num frames 15700...
[2024-10-16 02:57:42,796][00603] Avg episode rewards: #0: 58.957, true rewards: #0: 19.709
[2024-10-16 02:57:42,798][00603] Avg episode reward: 58.957, avg true_objective: 19.709
[2024-10-16 02:57:42,844][00603] Num frames 15800...
[2024-10-16 02:57:42,976][00603] Num frames 15900...
[2024-10-16 02:57:43,106][00603] Num frames 16000...
[2024-10-16 02:57:43,235][00603] Num frames 16100...
[2024-10-16 02:57:43,360][00603] Num frames 16200...
[2024-10-16 02:57:43,483][00603] Num frames 16300...
[2024-10-16 02:57:43,611][00603] Num frames 16400...
[2024-10-16 02:57:43,737][00603] Num frames 16500...
[2024-10-16 02:57:43,864][00603] Num frames 16600...
[2024-10-16 02:57:43,995][00603] Num frames 16700...
[2024-10-16 02:57:44,128][00603] Num frames 16800...
[2024-10-16 02:57:44,254][00603] Num frames 16900...
[2024-10-16 02:57:44,384][00603] Num frames 17000...
[2024-10-16 02:57:44,510][00603] Num frames 17100...
[2024-10-16 02:57:44,635][00603] Num frames 17200...
[2024-10-16 02:57:44,763][00603] Num frames 17300...
[2024-10-16 02:57:44,889][00603] Num frames 17400...
[2024-10-16 02:57:45,033][00603] Num frames 17500...
[2024-10-16 02:57:45,164][00603] Num frames 17600...
[2024-10-16 02:57:45,292][00603] Num frames 17700...
[2024-10-16 02:57:45,419][00603] Num frames 17800...
[2024-10-16 02:57:45,558][00603] Avg episode rewards: #0: 59.851, true rewards: #0: 19.852
[2024-10-16 02:57:45,560][00603] Avg episode reward: 59.851, avg true_objective: 19.852
[2024-10-16 02:57:45,603][00603] Num frames 17900...
[2024-10-16 02:57:45,728][00603] Num frames 18000...
[2024-10-16 02:57:45,853][00603] Num frames 18100...
[2024-10-16 02:57:45,986][00603] Num frames 18200...
[2024-10-16 02:57:46,115][00603] Num frames 18300...
[2024-10-16 02:57:46,248][00603] Num frames 18400...
[2024-10-16 02:57:46,373][00603] Num frames 18500...
[2024-10-16 02:57:46,496][00603] Num frames 18600...
[2024-10-16 02:57:46,619][00603] Num frames 18700...
[2024-10-16 02:57:46,743][00603] Num frames 18800...
[2024-10-16 02:57:46,867][00603] Num frames 18900...
[2024-10-16 02:57:47,001][00603] Num frames 19000...
[2024-10-16 02:57:47,128][00603] Num frames 19100...
[2024-10-16 02:57:47,261][00603] Num frames 19200...
[2024-10-16 02:57:47,388][00603] Num frames 19300...
[2024-10-16 02:57:47,513][00603] Num frames 19400...
[2024-10-16 02:57:47,641][00603] Num frames 19500...
[2024-10-16 02:57:47,769][00603] Num frames 19600...
[2024-10-16 02:57:47,895][00603] Num frames 19700...
[2024-10-16 02:57:48,030][00603] Num frames 19800...
[2024-10-16 02:57:48,158][00603] Num frames 19900...
[2024-10-16 02:57:48,307][00603] Avg episode rewards: #0: 60.566, true rewards: #0: 19.967
[2024-10-16 02:57:48,310][00603] Avg episode reward: 60.566, avg true_objective: 19.967
[2024-10-16 02:59:39,055][00603] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4!
[2024-10-16 03:00:01,234][00603] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-16 03:00:01,237][00603] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-16 03:00:01,239][00603] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-16 03:00:01,240][00603] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-16 03:00:01,242][00603] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-16 03:00:01,244][00603] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-16 03:00:01,246][00603] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-10-16 03:00:01,248][00603] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-16 03:00:01,248][00603] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-10-16 03:00:01,249][00603] Adding new argument 'hf_repository'='77qq/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-10-16 03:00:01,250][00603] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-16 03:00:01,251][00603] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-16 03:00:01,252][00603] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-16 03:00:01,253][00603] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-16 03:00:01,254][00603] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-16 03:00:01,282][00603] RunningMeanStd input shape: (3, 72, 128)
[2024-10-16 03:00:01,285][00603] RunningMeanStd input shape: (1,)
[2024-10-16 03:00:01,297][00603] ConvEncoder: input_channels=3
[2024-10-16 03:00:01,335][00603] Conv encoder output size: 512
[2024-10-16 03:00:01,337][00603] Policy head output size: 512
[2024-10-16 03:00:01,356][00603] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-10-16 03:00:01,779][00603] Num frames 100...
[2024-10-16 03:00:01,899][00603] Num frames 200...
[2024-10-16 03:00:02,041][00603] Num frames 300...
[2024-10-16 03:00:02,129][00603] Avg episode rewards: #0: 5.260, true rewards: #0: 3.260
[2024-10-16 03:00:02,130][00603] Avg episode reward: 5.260, avg true_objective: 3.260
[2024-10-16 03:00:02,224][00603] Num frames 400...
[2024-10-16 03:00:02,344][00603] Num frames 500...
[2024-10-16 03:00:02,470][00603] Num frames 600...
[2024-10-16 03:00:02,588][00603] Num frames 700...
[2024-10-16 03:00:02,710][00603] Num frames 800...
[2024-10-16 03:00:02,834][00603] Num frames 900...
[2024-10-16 03:00:02,960][00603] Num frames 1000...
[2024-10-16 03:00:03,078][00603] Num frames 1100...
[2024-10-16 03:00:03,209][00603] Avg episode rewards: #0: 12.820, true rewards: #0: 5.820
[2024-10-16 03:00:03,211][00603] Avg episode reward: 12.820, avg true_objective: 5.820
[2024-10-16 03:00:03,257][00603] Num frames 1200...
[2024-10-16 03:00:03,376][00603] Num frames 1300...
[2024-10-16 03:00:03,503][00603] Num frames 1400...
[2024-10-16 03:00:03,633][00603] Num frames 1500...
[2024-10-16 03:00:03,756][00603] Num frames 1600...
[2024-10-16 03:00:03,881][00603] Num frames 1700...
[2024-10-16 03:00:04,009][00603] Num frames 1800...
[2024-10-16 03:00:04,131][00603] Num frames 1900...
[2024-10-16 03:00:04,253][00603] Num frames 2000...
[2024-10-16 03:00:04,376][00603] Num frames 2100...
[2024-10-16 03:00:04,513][00603] Num frames 2200...
[2024-10-16 03:00:04,635][00603] Num frames 2300...
[2024-10-16 03:00:04,759][00603] Num frames 2400...
[2024-10-16 03:00:04,887][00603] Num frames 2500...
[2024-10-16 03:00:05,016][00603] Num frames 2600...
[2024-10-16 03:00:05,136][00603] Num frames 2700...
[2024-10-16 03:00:05,263][00603] Num frames 2800...
[2024-10-16 03:00:05,392][00603] Avg episode rewards: #0: 22.200, true rewards: #0: 9.533
[2024-10-16 03:00:05,394][00603] Avg episode reward: 22.200, avg true_objective: 9.533
[2024-10-16 03:00:05,447][00603] Num frames 2900...
[2024-10-16 03:00:05,575][00603] Num frames 3000...
[2024-10-16 03:00:05,699][00603] Num frames 3100...
[2024-10-16 03:00:05,821][00603] Num frames 3200...
[2024-10-16 03:00:05,950][00603] Num frames 3300...
[2024-10-16 03:00:06,070][00603] Num frames 3400...
[2024-10-16 03:00:06,189][00603] Num frames 3500...
[2024-10-16 03:00:06,310][00603] Num frames 3600...
[2024-10-16 03:00:06,436][00603] Num frames 3700...
[2024-10-16 03:00:06,609][00603] Num frames 3800...
[2024-10-16 03:00:06,783][00603] Num frames 3900...
[2024-10-16 03:00:06,951][00603] Num frames 4000...
[2024-10-16 03:00:07,111][00603] Num frames 4100...
[2024-10-16 03:00:07,279][00603] Num frames 4200...
[2024-10-16 03:00:07,441][00603] Num frames 4300...
[2024-10-16 03:00:07,610][00603] Num frames 4400...
[2024-10-16 03:00:07,772][00603] Avg episode rewards: #0: 25.400, true rewards: #0: 11.150
[2024-10-16 03:00:07,774][00603] Avg episode reward: 25.400, avg true_objective: 11.150
[2024-10-16 03:00:07,841][00603] Num frames 4500...
[2024-10-16 03:00:08,017][00603] Num frames 4600...
[2024-10-16 03:00:08,188][00603] Num frames 4700...
[2024-10-16 03:00:08,363][00603] Num frames 4800...
[2024-10-16 03:00:08,539][00603] Num frames 4900...
[2024-10-16 03:00:08,718][00603] Num frames 5000...
[2024-10-16 03:00:08,886][00603] Num frames 5100...
[2024-10-16 03:00:09,021][00603] Avg episode rewards: #0: 23.528, true rewards: #0: 10.328
[2024-10-16 03:00:09,023][00603] Avg episode reward: 23.528, avg true_objective: 10.328
[2024-10-16 03:00:09,070][00603] Num frames 5200...
[2024-10-16 03:00:09,188][00603] Num frames 5300...
[2024-10-16 03:00:09,309][00603] Num frames 5400...
[2024-10-16 03:00:09,427][00603] Num frames 5500...
[2024-10-16 03:00:09,557][00603] Num frames 5600...
[2024-10-16 03:00:09,680][00603] Num frames 5700...
[2024-10-16 03:00:09,812][00603] Num frames 5800...
[2024-10-16 03:00:09,939][00603] Num frames 5900...
[2024-10-16 03:00:10,063][00603] Num frames 6000...
[2024-10-16 03:00:10,184][00603] Num frames 6100...
[2024-10-16 03:00:10,303][00603] Num frames 6200...
[2024-10-16 03:00:10,424][00603] Num frames 6300...
[2024-10-16 03:00:10,547][00603] Num frames 6400...
[2024-10-16 03:00:10,671][00603] Avg episode rewards: #0: 24.590, true rewards: #0: 10.757
[2024-10-16 03:00:10,674][00603] Avg episode reward: 24.590, avg true_objective: 10.757
[2024-10-16 03:00:10,741][00603] Num frames 6500...
[2024-10-16 03:00:10,861][00603] Num frames 6600...
[2024-10-16 03:00:10,991][00603] Num frames 6700...
[2024-10-16 03:00:11,111][00603] Num frames 6800...
[2024-10-16 03:00:11,231][00603] Num frames 6900...
[2024-10-16 03:00:11,352][00603] Num frames 7000...
[2024-10-16 03:00:11,475][00603] Num frames 7100...
[2024-10-16 03:00:11,600][00603] Num frames 7200...
[2024-10-16 03:00:11,683][00603] Avg episode rewards: #0: 22.746, true rewards: #0: 10.317
[2024-10-16 03:00:11,684][00603] Avg episode reward: 22.746, avg true_objective: 10.317
[2024-10-16 03:00:11,791][00603] Num frames 7300...
[2024-10-16 03:00:11,912][00603] Num frames 7400...
[2024-10-16 03:00:12,038][00603] Num frames 7500...
[2024-10-16 03:00:12,157][00603] Num frames 7600...
[2024-10-16 03:00:12,259][00603] Avg episode rewards: #0: 20.673, true rewards: #0: 9.547
[2024-10-16 03:00:12,261][00603] Avg episode reward: 20.673, avg true_objective: 9.547
[2024-10-16 03:00:12,339][00603] Num frames 7700...
[2024-10-16 03:00:12,458][00603] Num frames 7800...
[2024-10-16 03:00:12,577][00603] Num frames 7900...
[2024-10-16 03:00:12,700][00603] Num frames 8000...
[2024-10-16 03:00:12,831][00603] Num frames 8100...
[2024-10-16 03:00:12,959][00603] Num frames 8200...
[2024-10-16 03:00:13,080][00603] Num frames 8300...
[2024-10-16 03:00:13,205][00603] Num frames 8400...
[2024-10-16 03:00:13,328][00603] Num frames 8500...
[2024-10-16 03:00:13,447][00603] Num frames 8600...
[2024-10-16 03:00:13,564][00603] Num frames 8700...
[2024-10-16 03:00:13,684][00603] Num frames 8800...
[2024-10-16 03:00:13,815][00603] Num frames 8900...
[2024-10-16 03:00:13,943][00603] Num frames 9000...
[2024-10-16 03:00:14,062][00603] Num frames 9100...
[2024-10-16 03:00:14,183][00603] Num frames 9200...
[2024-10-16 03:00:14,247][00603] Avg episode rewards: #0: 23.118, true rewards: #0: 10.229
[2024-10-16 03:00:14,248][00603] Avg episode reward: 23.118, avg true_objective: 10.229
[2024-10-16 03:00:14,362][00603] Num frames 9300...
[2024-10-16 03:00:14,482][00603] Num frames 9400...
[2024-10-16 03:00:14,602][00603] Num frames 9500...
[2024-10-16 03:00:14,727][00603] Num frames 9600...
[2024-10-16 03:00:14,854][00603] Num frames 9700...
[2024-10-16 03:00:14,981][00603] Num frames 9800...
[2024-10-16 03:00:15,100][00603] Num frames 9900...
[2024-10-16 03:00:15,221][00603] Num frames 10000...
[2024-10-16 03:00:15,341][00603] Num frames 10100...
[2024-10-16 03:00:15,402][00603] Avg episode rewards: #0: 22.502, true rewards: #0: 10.102
[2024-10-16 03:00:15,404][00603] Avg episode reward: 22.502, avg true_objective: 10.102
[2024-10-16 03:01:10,044][00603] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
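Each evaluation run above logs "Using frameskip 1 and render_action_repeat=4": the policy picks an action once, and the environment repeats it for 4 steps, summing the per-step rewards. A minimal sketch of that loop, using a toy fake environment for illustration (the helper and env names are hypothetical, not Sample Factory's API):

```python
# Sketch of action repeat during evaluation (render_action_repeat=4).
# step_with_action_repeat and _FakeEnv are illustrative names only.

def step_with_action_repeat(env, action, repeat=4):
    """Apply one action for `repeat` env steps, summing the rewards."""
    total_reward, done, obs = 0.0, False, None
    for _ in range(repeat):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:  # stop early if the episode ends mid-repeat
            break
    return obs, total_reward, done

class _FakeEnv:
    """Toy stand-in env: +1 reward per step, never terminates."""
    def step(self, action):
        return None, 1.0, False, {}

obs, reward, done = step_with_action_repeat(_FakeEnv(), action=0)
```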