diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1060 @@ +[2024-09-06 07:54:51,174][01070] Saving configuration to /content/train_dir/default_experiment/config.json... +[2024-09-06 07:54:51,178][01070] Rollout worker 0 uses device cpu +[2024-09-06 07:54:51,179][01070] Rollout worker 1 uses device cpu +[2024-09-06 07:54:51,182][01070] Rollout worker 2 uses device cpu +[2024-09-06 07:54:51,183][01070] Rollout worker 3 uses device cpu +[2024-09-06 07:54:51,184][01070] Rollout worker 4 uses device cpu +[2024-09-06 07:54:51,185][01070] Rollout worker 5 uses device cpu +[2024-09-06 07:54:51,186][01070] Rollout worker 6 uses device cpu +[2024-09-06 07:54:51,187][01070] Rollout worker 7 uses device cpu +[2024-09-06 07:54:51,345][01070] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-06 07:54:51,348][01070] InferenceWorker_p0-w0: min num requests: 2 +[2024-09-06 07:54:51,381][01070] Starting all processes... +[2024-09-06 07:54:51,382][01070] Starting process learner_proc0 +[2024-09-06 07:54:52,097][01070] Starting all processes... +[2024-09-06 07:54:52,107][01070] Starting process inference_proc0-0 +[2024-09-06 07:54:52,108][01070] Starting process rollout_proc0 +[2024-09-06 07:54:52,108][01070] Starting process rollout_proc1 +[2024-09-06 07:54:52,108][01070] Starting process rollout_proc2 +[2024-09-06 07:54:52,108][01070] Starting process rollout_proc3 +[2024-09-06 07:54:52,108][01070] Starting process rollout_proc4 +[2024-09-06 07:54:52,108][01070] Starting process rollout_proc5 +[2024-09-06 07:54:52,108][01070] Starting process rollout_proc6 +[2024-09-06 07:54:52,108][01070] Starting process rollout_proc7 +[2024-09-06 07:55:08,667][06068] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-06 07:55:08,668][06068] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-09-06 07:55:08,754][06069] Worker 0 uses CPU cores [0] +[2024-09-06 07:55:08,849][06068] Num visible devices: 1 +[2024-09-06 07:55:08,857][06055] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-06 07:55:08,862][06055] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-09-06 07:55:08,865][06072] Worker 3 uses CPU cores [1] +[2024-09-06 07:55:08,930][06073] Worker 4 uses CPU cores [0] +[2024-09-06 07:55:08,955][06055] Num visible devices: 1 +[2024-09-06 07:55:08,995][06055] Starting seed is not provided +[2024-09-06 07:55:08,996][06055] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-06 07:55:08,997][06055] Initializing actor-critic model on device cuda:0 +[2024-09-06 07:55:08,998][06055] RunningMeanStd input shape: (3, 72, 128) +[2024-09-06 07:55:09,001][06055] RunningMeanStd input shape: (1,) +[2024-09-06 07:55:09,045][06075] Worker 6 uses CPU cores [0] +[2024-09-06 07:55:09,054][06074] Worker 5 uses CPU cores [1] +[2024-09-06 07:55:09,100][06055] ConvEncoder: input_channels=3 +[2024-09-06 07:55:09,170][06071] Worker 2 uses CPU cores [0] +[2024-09-06 07:55:09,188][06076] Worker 7 uses CPU cores [1] +[2024-09-06 07:55:09,206][06070] Worker 1 uses CPU cores [1] +[2024-09-06 07:55:09,434][06055] Conv encoder output size: 512 +[2024-09-06 07:55:09,434][06055] Policy head output size: 512 +[2024-09-06 07:55:09,504][06055] Created Actor Critic model with architecture: +[2024-09-06 07:55:09,505][06055] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
(running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-09-06 07:55:09,884][06055] Using optimizer +[2024-09-06 07:55:10,611][06055] No checkpoints found +[2024-09-06 07:55:10,612][06055] Did not load from checkpoint, starting from scratch! +[2024-09-06 07:55:10,612][06055] Initialized policy 0 weights for model version 0 +[2024-09-06 07:55:10,618][06055] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-09-06 07:55:10,624][06055] LearnerWorker_p0 finished initialization! +[2024-09-06 07:55:10,711][06068] RunningMeanStd input shape: (3, 72, 128) +[2024-09-06 07:55:10,713][06068] RunningMeanStd input shape: (1,) +[2024-09-06 07:55:10,725][06068] ConvEncoder: input_channels=3 +[2024-09-06 07:55:10,825][06068] Conv encoder output size: 512 +[2024-09-06 07:55:10,826][06068] Policy head output size: 512 +[2024-09-06 07:55:10,875][01070] Inference worker 0-0 is ready! +[2024-09-06 07:55:10,877][01070] All inference workers are ready! Signal rollout workers to start! +[2024-09-06 07:55:11,084][06071] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-06 07:55:11,088][06069] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-06 07:55:11,090][06075] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-06 07:55:11,099][06076] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-06 07:55:11,093][06073] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-06 07:55:11,101][06074] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-06 07:55:11,097][06072] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-06 07:55:11,109][06070] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-06 07:55:11,317][01070] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-06 07:55:11,338][01070] Heartbeat connected on Batcher_0 +[2024-09-06 07:55:11,342][01070] Heartbeat connected on LearnerWorker_p0 +[2024-09-06 07:55:11,375][01070] Heartbeat connected on InferenceWorker_p0-w0 +[2024-09-06 07:55:12,668][06071] Decorrelating experience for 0 frames... +[2024-09-06 07:55:12,666][06069] Decorrelating experience for 0 frames... +[2024-09-06 07:55:12,669][06075] Decorrelating experience for 0 frames... +[2024-09-06 07:55:12,943][06076] Decorrelating experience for 0 frames... 
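The architecture dump above maps onto a small, concrete model. Below is a minimal PyTorch sketch of the same shape; the conv kernel and stride values are assumptions (the log prints only the layer types, the 512-wide encoder and GRU core, and the 5-action head), so treat it as an illustration rather than sample-factory's actual code.

import torch
from torch import nn

class ActorCriticSketch(nn.Module):
    """Sketch of the ActorCriticSharedWeights printout above."""
    def __init__(self, obs_shape=(3, 72, 128), hidden=512, num_actions=5):
        super().__init__()
        # Conv2d/ELU stack; kernel and stride values are assumed, not logged.
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer the flattened conv output size
            n = self.conv_head(torch.zeros(1, *obs_shape)).flatten(1).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)                 # "GRU(512, 512)" core
        self.critic_linear = nn.Linear(hidden, 1)          # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # 5 actions

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # single-step sequence
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

The "RunningMeanStd input shape: (3, 72, 128)" and "(1,)" lines correspond to the observation and returns normalizers wrapped around this model; they are omitted from the sketch.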
+[2024-09-06 07:55:12,952][06072] Decorrelating experience for 0 frames...
+[2024-09-06 07:55:12,955][06074] Decorrelating experience for 0 frames...
+[2024-09-06 07:55:12,961][06070] Decorrelating experience for 0 frames...
+[2024-09-06 07:55:13,724][06076] Decorrelating experience for 32 frames...
+[2024-09-06 07:55:13,727][06074] Decorrelating experience for 32 frames...
+[2024-09-06 07:55:14,072][06071] Decorrelating experience for 32 frames...
+[2024-09-06 07:55:14,074][06069] Decorrelating experience for 32 frames...
+[2024-09-06 07:55:14,077][06075] Decorrelating experience for 32 frames...
+[2024-09-06 07:55:14,540][06073] Decorrelating experience for 0 frames...
+[2024-09-06 07:55:14,891][06070] Decorrelating experience for 32 frames...
+[2024-09-06 07:55:15,287][06074] Decorrelating experience for 64 frames...
+[2024-09-06 07:55:15,383][06072] Decorrelating experience for 32 frames...
+[2024-09-06 07:55:15,566][06071] Decorrelating experience for 64 frames...
+[2024-09-06 07:55:15,585][06069] Decorrelating experience for 64 frames...
+[2024-09-06 07:55:16,059][06075] Decorrelating experience for 64 frames...
+[2024-09-06 07:55:16,321][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-06 07:55:16,546][06071] Decorrelating experience for 96 frames...
+[2024-09-06 07:55:16,581][06070] Decorrelating experience for 64 frames...
+[2024-09-06 07:55:16,677][06076] Decorrelating experience for 64 frames...
+[2024-09-06 07:55:16,724][06074] Decorrelating experience for 96 frames...
+[2024-09-06 07:55:16,739][01070] Heartbeat connected on RolloutWorker_w2
+[2024-09-06 07:55:16,988][01070] Heartbeat connected on RolloutWorker_w5
+[2024-09-06 07:55:17,186][06075] Decorrelating experience for 96 frames...
+[2024-09-06 07:55:17,328][01070] Heartbeat connected on RolloutWorker_w6
+[2024-09-06 07:55:17,605][06069] Decorrelating experience for 96 frames...
+[2024-09-06 07:55:17,810][01070] Heartbeat connected on RolloutWorker_w0
+[2024-09-06 07:55:17,989][06072] Decorrelating experience for 64 frames...
+[2024-09-06 07:55:17,992][06070] Decorrelating experience for 96 frames...
+[2024-09-06 07:55:18,111][06076] Decorrelating experience for 96 frames...
+[2024-09-06 07:55:18,212][01070] Heartbeat connected on RolloutWorker_w1
+[2024-09-06 07:55:18,296][01070] Heartbeat connected on RolloutWorker_w7
+[2024-09-06 07:55:18,667][06073] Decorrelating experience for 32 frames...
+[2024-09-06 07:55:18,736][06072] Decorrelating experience for 96 frames...
+[2024-09-06 07:55:18,820][01070] Heartbeat connected on RolloutWorker_w3
+[2024-09-06 07:55:20,114][06073] Decorrelating experience for 64 frames...
+[2024-09-06 07:55:21,318][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 91.4. Samples: 914. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-06 07:55:21,321][01070] Avg episode reward: [(0, '1.348')]
+[2024-09-06 07:55:22,967][06055] Signal inference workers to stop experience collection...
+[2024-09-06 07:55:22,981][06068] InferenceWorker_p0-w0: stopping experience collection
+[2024-09-06 07:55:23,292][06073] Decorrelating experience for 96 frames...
+[2024-09-06 07:55:23,475][01070] Heartbeat connected on RolloutWorker_w4
+[2024-09-06 07:55:26,317][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 152.5. Samples: 2288. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-06 07:55:26,322][01070] Avg episode reward: [(0, '2.368')]
+[2024-09-06 07:55:26,933][06055] Signal inference workers to resume experience collection...
+[2024-09-06 07:55:26,935][06068] InferenceWorker_p0-w0: resuming experience collection
+[2024-09-06 07:55:31,317][01070] Fps is (10 sec: 2458.0, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 209.5. Samples: 4190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 3.0)
+[2024-09-06 07:55:31,322][01070] Avg episode reward: [(0, '3.545')]
+[2024-09-06 07:55:34,552][06068] Updated weights for policy 0, policy_version 10 (0.0173)
+[2024-09-06 07:55:36,317][01070] Fps is (10 sec: 4505.4, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 445.8. Samples: 11144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 07:55:36,323][01070] Avg episode reward: [(0, '4.241')]
+[2024-09-06 07:55:41,319][01070] Fps is (10 sec: 3276.0, 60 sec: 1911.3, 300 sec: 1911.3). Total num frames: 57344. Throughput: 0: 511.6. Samples: 15350. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 07:55:41,324][01070] Avg episode reward: [(0, '4.401')]
+[2024-09-06 07:55:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 500.7. Samples: 17526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:55:46,319][01070] Avg episode reward: [(0, '4.396')]
+[2024-09-06 07:55:47,170][06068] Updated weights for policy 0, policy_version 20 (0.0030)
+[2024-09-06 07:55:51,317][01070] Fps is (10 sec: 4097.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 607.7. Samples: 24308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 07:55:51,319][01070] Avg episode reward: [(0, '4.360')]
+[2024-09-06 07:55:56,317][01070] Fps is (10 sec: 4096.0, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 684.4. Samples: 30798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 07:55:56,321][01070] Avg episode reward: [(0, '4.546')]
+[2024-09-06 07:55:56,332][06055] Saving new best policy, reward=4.546!
+[2024-09-06 07:55:56,696][06068] Updated weights for policy 0, policy_version 30 (0.0020)
+[2024-09-06 07:56:01,317][01070] Fps is (10 sec: 3686.4, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 728.7. Samples: 32786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 07:56:01,321][01070] Avg episode reward: [(0, '4.463')]
+[2024-09-06 07:56:06,316][01070] Fps is (10 sec: 4096.2, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 840.3. Samples: 38726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-06 07:56:06,319][01070] Avg episode reward: [(0, '4.541')]
+[2024-09-06 07:56:07,255][06068] Updated weights for policy 0, policy_version 40 (0.0021)
+[2024-09-06 07:56:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 933.5. Samples: 44294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 07:56:11,325][01070] Avg episode reward: [(0, '4.557')]
+[2024-09-06 07:56:11,327][06055] Saving new best policy, reward=4.557!
+[2024-09-06 07:56:16,317][01070] Fps is (10 sec: 2457.4, 60 sec: 3072.2, 300 sec: 2835.7). Total num frames: 184320. Throughput: 0: 930.2. Samples: 46048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:56:16,322][01070] Avg episode reward: [(0, '4.495')]
+[2024-09-06 07:56:21,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 859.7. Samples: 49832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 07:56:21,319][01070] Avg episode reward: [(0, '4.274')]
+[2024-09-06 07:56:21,856][06068] Updated weights for policy 0, policy_version 50 (0.0042)
+[2024-09-06 07:56:26,320][01070] Fps is (10 sec: 4094.9, 60 sec: 3754.4, 300 sec: 3003.6). Total num frames: 225280. Throughput: 0: 917.1. Samples: 56622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 07:56:26,326][01070] Avg episode reward: [(0, '4.304')]
+[2024-09-06 07:56:30,559][06068] Updated weights for policy 0, policy_version 60 (0.0031)
+[2024-09-06 07:56:31,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 946.4. Samples: 60114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 07:56:31,322][01070] Avg episode reward: [(0, '4.369')]
+[2024-09-06 07:56:36,317][01070] Fps is (10 sec: 3687.7, 60 sec: 3618.2, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 907.4. Samples: 65140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:56:36,319][01070] Avg episode reward: [(0, '4.407')]
+[2024-09-06 07:56:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3140.3). Total num frames: 282624. Throughput: 0: 891.6. Samples: 70920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:56:41,318][01070] Avg episode reward: [(0, '4.396')]
+[2024-09-06 07:56:42,046][06068] Updated weights for policy 0, policy_version 70 (0.0046)
+[2024-09-06 07:56:46,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3233.7). Total num frames: 307200. Throughput: 0: 924.6. Samples: 74394. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-06 07:56:46,323][01070] Avg episode reward: [(0, '4.273')]
+[2024-09-06 07:56:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth...
+[2024-09-06 07:56:51,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 923.6. Samples: 80288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 07:56:51,321][01070] Avg episode reward: [(0, '4.403')]
+[2024-09-06 07:56:53,243][06068] Updated weights for policy 0, policy_version 80 (0.0022)
+[2024-09-06 07:56:56,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3198.8). Total num frames: 335872. Throughput: 0: 904.1. Samples: 84980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:56:56,321][01070] Avg episode reward: [(0, '4.573')]
+[2024-09-06 07:56:56,329][06055] Saving new best policy, reward=4.573!
+[2024-09-06 07:57:01,316][01070] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 940.5. Samples: 88368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 07:57:01,325][01070] Avg episode reward: [(0, '4.464')]
+[2024-09-06 07:57:02,726][06068] Updated weights for policy 0, policy_version 90 (0.0031)
+[2024-09-06 07:57:06,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 1012.4. Samples: 95392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 07:57:06,319][01070] Avg episode reward: [(0, '4.686')]
+[2024-09-06 07:57:06,327][06055] Saving new best policy, reward=4.686!
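The "Saving new best policy" line above, together with the checkpoint save/remove pairs that appear later in the log, reflects two separate mechanisms: a rotating window of recent checkpoints (each new checkpoint_<version>_<frames>.pth eventually evicts the oldest one) and an unconditional save whenever the average episode reward beats the running best. A hypothetical sketch of that bookkeeping follows; names such as CheckpointTracker and best_policy.pth are illustrative, not sample-factory's actual API.

import os

class CheckpointTracker:
    """Keep the N newest rotating checkpoints plus a separate best-reward copy."""
    def __init__(self, ckpt_dir, keep_last=2):
        self.ckpt_dir = ckpt_dir
        self.keep_last = keep_last       # the log shows two rotating checkpoints kept
        self.recent = []                 # saved paths, oldest first
        self.best_reward = float("-inf")

    def save(self, policy_version, env_steps, avg_reward, save_fn):
        # e.g. checkpoint_000000075_307200.pth: policy_version 75 at 307200 frames
        name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
        path = os.path.join(self.ckpt_dir, name)
        save_fn(path)                    # e.g. torch.save(model.state_dict(), path)
        self.recent.append(path)
        while len(self.recent) > self.keep_last:
            os.remove(self.recent.pop(0))    # the "Removing ..." lines in the log
        if avg_reward > self.best_reward:    # the "Saving new best policy" lines
            self.best_reward = avg_reward
            save_fn(os.path.join(self.ckpt_dir, "best_policy.pth"))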
+[2024-09-06 07:57:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 393216. Throughput: 0: 953.2. Samples: 99512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 07:57:11,320][01070] Avg episode reward: [(0, '4.739')]
+[2024-09-06 07:57:11,330][06055] Saving new best policy, reward=4.739!
+[2024-09-06 07:57:14,413][06068] Updated weights for policy 0, policy_version 100 (0.0021)
+[2024-09-06 07:57:16,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 941.3. Samples: 102472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 07:57:16,324][01070] Avg episode reward: [(0, '4.631')]
+[2024-09-06 07:57:21,317][01070] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3402.8). Total num frames: 442368. Throughput: 0: 983.4. Samples: 109392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-06 07:57:21,324][01070] Avg episode reward: [(0, '4.517')]
+[2024-09-06 07:57:23,713][06068] Updated weights for policy 0, policy_version 110 (0.0022)
+[2024-09-06 07:57:26,318][01070] Fps is (10 sec: 3685.8, 60 sec: 3823.1, 300 sec: 3367.8). Total num frames: 454656. Throughput: 0: 971.4. Samples: 114634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 07:57:26,321][01070] Avg episode reward: [(0, '4.538')]
+[2024-09-06 07:57:31,316][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 942.6. Samples: 116810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:57:31,323][01070] Avg episode reward: [(0, '4.571')]
+[2024-09-06 07:57:34,630][06068] Updated weights for policy 0, policy_version 120 (0.0029)
+[2024-09-06 07:57:36,317][01070] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3418.0). Total num frames: 495616. Throughput: 0: 962.0. Samples: 123580. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-06 07:57:36,324][01070] Avg episode reward: [(0, '4.726')]
+[2024-09-06 07:57:41,317][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3440.6). Total num frames: 516096. Throughput: 0: 999.0. Samples: 129936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 07:57:41,323][01070] Avg episode reward: [(0, '4.768')]
+[2024-09-06 07:57:41,328][06055] Saving new best policy, reward=4.768!
+[2024-09-06 07:57:46,181][06068] Updated weights for policy 0, policy_version 130 (0.0043)
+[2024-09-06 07:57:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3435.4). Total num frames: 532480. Throughput: 0: 968.0. Samples: 131926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-06 07:57:46,324][01070] Avg episode reward: [(0, '4.814')]
+[2024-09-06 07:57:46,339][06055] Saving new best policy, reward=4.814!
+[2024-09-06 07:57:51,316][01070] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3456.0). Total num frames: 552960. Throughput: 0: 937.5. Samples: 137578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 07:57:51,319][01070] Avg episode reward: [(0, '4.845')]
+[2024-09-06 07:57:51,323][06055] Saving new best policy, reward=4.845!
+[2024-09-06 07:57:55,369][06068] Updated weights for policy 0, policy_version 140 (0.0029)
+[2024-09-06 07:57:56,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3500.2). Total num frames: 577536. Throughput: 0: 1000.7. Samples: 144544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:57:56,319][01070] Avg episode reward: [(0, '4.741')]
+[2024-09-06 07:58:01,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3469.6). Total num frames: 589824. Throughput: 0: 990.9. Samples: 147064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-06 07:58:01,322][01070] Avg episode reward: [(0, '4.681')]
+[2024-09-06 07:58:06,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3487.5). Total num frames: 610304. Throughput: 0: 940.4. Samples: 151710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 07:58:06,321][01070] Avg episode reward: [(0, '4.354')]
+[2024-09-06 07:58:06,904][06068] Updated weights for policy 0, policy_version 150 (0.0034)
+[2024-09-06 07:58:11,317][01070] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3504.4). Total num frames: 630784. Throughput: 0: 978.3. Samples: 158654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 07:58:11,319][01070] Avg episode reward: [(0, '4.484')]
+[2024-09-06 07:58:16,317][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3520.3). Total num frames: 651264. Throughput: 0: 1007.8. Samples: 162160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 07:58:16,322][01070] Avg episode reward: [(0, '4.499')]
+[2024-09-06 07:58:17,037][06068] Updated weights for policy 0, policy_version 160 (0.0034)
+[2024-09-06 07:58:21,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3492.4). Total num frames: 663552. Throughput: 0: 953.2. Samples: 166476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:58:21,322][01070] Avg episode reward: [(0, '4.346')]
+[2024-09-06 07:58:26,317][01070] Fps is (10 sec: 3686.5, 60 sec: 3891.3, 300 sec: 3528.9). Total num frames: 688128. Throughput: 0: 952.7. Samples: 172806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:58:26,319][01070] Avg episode reward: [(0, '4.378')]
+[2024-09-06 07:58:27,560][06068] Updated weights for policy 0, policy_version 170 (0.0040)
+[2024-09-06 07:58:31,317][01070] Fps is (10 sec: 4915.1, 60 sec: 3959.4, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 985.9. Samples: 176290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 07:58:31,324][01070] Avg episode reward: [(0, '4.546')]
+[2024-09-06 07:58:36,322][01070] Fps is (10 sec: 3684.3, 60 sec: 3822.6, 300 sec: 3536.4). Total num frames: 724992. Throughput: 0: 981.4. Samples: 181748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:58:36,325][01070] Avg episode reward: [(0, '4.479')]
+[2024-09-06 07:58:39,036][06068] Updated weights for policy 0, policy_version 180 (0.0054)
+[2024-09-06 07:58:41,317][01070] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 745472. Throughput: 0: 945.0. Samples: 187070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 07:58:41,323][01070] Avg episode reward: [(0, '4.487')]
+[2024-09-06 07:58:46,317][01070] Fps is (10 sec: 4508.1, 60 sec: 3959.5, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 963.1. Samples: 190404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 07:58:46,319][01070] Avg episode reward: [(0, '4.599')]
+[2024-09-06 07:58:46,332][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth...
+[2024-09-06 07:58:48,051][06068] Updated weights for policy 0, policy_version 190 (0.0033)
+[2024-09-06 07:58:51,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3574.7). Total num frames: 786432. Throughput: 0: 1001.3. Samples: 196768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 07:58:51,319][01070] Avg episode reward: [(0, '4.716')]
+[2024-09-06 07:58:56,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3568.1). Total num frames: 802816. Throughput: 0: 945.9. Samples: 201218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:58:56,324][01070] Avg episode reward: [(0, '4.677')]
+[2024-09-06 07:58:59,542][06068] Updated weights for policy 0, policy_version 200 (0.0036)
+[2024-09-06 07:59:01,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3579.5). Total num frames: 823296. Throughput: 0: 944.4. Samples: 204656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:59:01,319][01070] Avg episode reward: [(0, '4.666')]
+[2024-09-06 07:59:06,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3608.0). Total num frames: 847872. Throughput: 0: 1001.9. Samples: 211562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 07:59:06,323][01070] Avg episode reward: [(0, '4.550')]
+[2024-09-06 07:59:10,208][06068] Updated weights for policy 0, policy_version 210 (0.0028)
+[2024-09-06 07:59:11,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3584.0). Total num frames: 860160. Throughput: 0: 960.0. Samples: 216004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 07:59:11,327][01070] Avg episode reward: [(0, '4.533')]
+[2024-09-06 07:59:16,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3594.4). Total num frames: 880640. Throughput: 0: 939.2. Samples: 218556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 07:59:16,323][01070] Avg episode reward: [(0, '4.805')]
+[2024-09-06 07:59:20,094][06068] Updated weights for policy 0, policy_version 220 (0.0030)
+[2024-09-06 07:59:21,316][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3620.9). Total num frames: 905216. Throughput: 0: 972.9. Samples: 225522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 07:59:21,319][01070] Avg episode reward: [(0, '4.579')]
+[2024-09-06 07:59:26,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3614.1). Total num frames: 921600. Throughput: 0: 980.2. Samples: 231180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:59:26,322][01070] Avg episode reward: [(0, '4.617')]
+[2024-09-06 07:59:31,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3607.6). Total num frames: 937984. Throughput: 0: 952.4. Samples: 233264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 07:59:31,321][01070] Avg episode reward: [(0, '4.698')]
+[2024-09-06 07:59:31,746][06068] Updated weights for policy 0, policy_version 230 (0.0037)
+[2024-09-06 07:59:36,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3632.3). Total num frames: 962560. Throughput: 0: 960.2. Samples: 239978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:59:36,324][01070] Avg episode reward: [(0, '4.640')]
+[2024-09-06 07:59:40,641][06068] Updated weights for policy 0, policy_version 240 (0.0019)
+[2024-09-06 07:59:41,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3640.9). Total num frames: 983040. Throughput: 0: 1006.4. Samples: 246506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 07:59:41,319][01070] Avg episode reward: [(0, '4.662')]
+[2024-09-06 07:59:46,321][01070] Fps is (10 sec: 3684.9, 60 sec: 3822.7, 300 sec: 3634.2). Total num frames: 999424. Throughput: 0: 975.4. Samples: 248554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 07:59:46,324][01070] Avg episode reward: [(0, '4.517')]
+[2024-09-06 07:59:51,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3642.5). Total num frames: 1019904. Throughput: 0: 945.4. Samples: 254104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 07:59:51,319][01070] Avg episode reward: [(0, '4.585')]
+[2024-09-06 07:59:52,074][06068] Updated weights for policy 0, policy_version 250 (0.0044)
+[2024-09-06 07:59:56,317][01070] Fps is (10 sec: 3687.9, 60 sec: 3891.2, 300 sec: 3636.1). Total num frames: 1036288. Throughput: 0: 969.2. Samples: 259616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 07:59:56,319][01070] Avg episode reward: [(0, '4.734')]
+[2024-09-06 08:00:01,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3615.8). Total num frames: 1048576. Throughput: 0: 955.6. Samples: 261556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:00:01,321][01070] Avg episode reward: [(0, '4.762')]
+[2024-09-06 08:00:06,317][01070] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 888.8. Samples: 265516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:00:06,318][01070] Avg episode reward: [(0, '4.686')]
+[2024-09-06 08:00:06,345][06068] Updated weights for policy 0, policy_version 260 (0.0046)
+[2024-09-06 08:00:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 911.2. Samples: 272186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:00:11,319][01070] Avg episode reward: [(0, '4.470')]
+[2024-09-06 08:00:15,254][06068] Updated weights for policy 0, policy_version 270 (0.0039)
+[2024-09-06 08:00:16,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 939.1. Samples: 275524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:00:16,318][01070] Avg episode reward: [(0, '4.535')]
+[2024-09-06 08:00:21,328][01070] Fps is (10 sec: 3682.2, 60 sec: 3617.4, 300 sec: 3804.3). Total num frames: 1122304. Throughput: 0: 901.1. Samples: 280540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:00:21,339][01070] Avg episode reward: [(0, '4.759')]
+[2024-09-06 08:00:26,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1142784. Throughput: 0: 885.6. Samples: 286356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:00:26,321][01070] Avg episode reward: [(0, '4.852')]
+[2024-09-06 08:00:26,332][06055] Saving new best policy, reward=4.852!
+[2024-09-06 08:00:26,855][06068] Updated weights for policy 0, policy_version 280 (0.0043)
+[2024-09-06 08:00:31,317][01070] Fps is (10 sec: 4510.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1167360. Throughput: 0: 914.9. Samples: 289722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:00:31,318][01070] Avg episode reward: [(0, '4.581')]
+[2024-09-06 08:00:36,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1183744. Throughput: 0: 929.6. Samples: 295936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:00:36,323][01070] Avg episode reward: [(0, '4.600')]
+[2024-09-06 08:00:37,327][06068] Updated weights for policy 0, policy_version 290 (0.0022)
+[2024-09-06 08:00:41,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1200128. Throughput: 0: 911.8. Samples: 300646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:00:41,323][01070] Avg episode reward: [(0, '4.564')]
+[2024-09-06 08:00:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3804.4). Total num frames: 1220608. Throughput: 0: 943.5. Samples: 304012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:00:46,321][01070] Avg episode reward: [(0, '4.603')]
+[2024-09-06 08:00:46,372][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth...
+[2024-09-06 08:00:46,507][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth
+[2024-09-06 08:00:47,243][06068] Updated weights for policy 0, policy_version 300 (0.0033)
+[2024-09-06 08:00:51,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1241088. Throughput: 0: 1005.5. Samples: 310764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:00:51,321][01070] Avg episode reward: [(0, '4.905')]
+[2024-09-06 08:00:51,344][06055] Saving new best policy, reward=4.905!
+[2024-09-06 08:00:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1257472. Throughput: 0: 950.9. Samples: 314976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:00:56,324][01070] Avg episode reward: [(0, '4.923')]
+[2024-09-06 08:00:56,342][06055] Saving new best policy, reward=4.923!
+[2024-09-06 08:00:58,832][06068] Updated weights for policy 0, policy_version 310 (0.0029)
+[2024-09-06 08:01:01,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1277952. Throughput: 0: 942.3. Samples: 317926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-06 08:01:01,322][01070] Avg episode reward: [(0, '4.709')]
+[2024-09-06 08:01:06,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 1302528. Throughput: 0: 984.8. Samples: 324844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:01:06,319][01070] Avg episode reward: [(0, '4.749')]
+[2024-09-06 08:01:08,155][06068] Updated weights for policy 0, policy_version 320 (0.0016)
+[2024-09-06 08:01:11,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1318912. Throughput: 0: 972.0. Samples: 330098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:01:11,320][01070] Avg episode reward: [(0, '4.757')]
+[2024-09-06 08:01:16,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1335296. Throughput: 0: 944.2. Samples: 332210. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-06 08:01:16,324][01070] Avg episode reward: [(0, '4.768')]
+[2024-09-06 08:01:19,327][06068] Updated weights for policy 0, policy_version 330 (0.0026)
+[2024-09-06 08:01:21,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3960.2, 300 sec: 3846.1). Total num frames: 1359872. Throughput: 0: 959.8. Samples: 339126. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-06 08:01:21,320][01070] Avg episode reward: [(0, '4.852')]
+[2024-09-06 08:01:26,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1380352. Throughput: 0: 994.4. Samples: 345396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:01:26,321][01070] Avg episode reward: [(0, '4.673')]
+[2024-09-06 08:01:30,374][06068] Updated weights for policy 0, policy_version 340 (0.0044)
+[2024-09-06 08:01:31,316][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1392640. Throughput: 0: 965.5. Samples: 347460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-06 08:01:31,320][01070] Avg episode reward: [(0, '4.731')]
+[2024-09-06 08:01:36,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1417216. Throughput: 0: 949.2. Samples: 353480. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-06 08:01:36,318][01070] Avg episode reward: [(0, '4.856')]
+[2024-09-06 08:01:39,566][06068] Updated weights for policy 0, policy_version 350 (0.0025)
+[2024-09-06 08:01:41,316][01070] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 1441792. Throughput: 0: 1011.6. Samples: 360496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:01:41,323][01070] Avg episode reward: [(0, '4.687')]
+[2024-09-06 08:01:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1454080. Throughput: 0: 996.0. Samples: 362748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-06 08:01:46,323][01070] Avg episode reward: [(0, '4.580')]
+[2024-09-06 08:01:51,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1470464. Throughput: 0: 945.5. Samples: 367392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-09-06 08:01:51,321][01070] Avg episode reward: [(0, '4.688')]
+[2024-09-06 08:01:51,501][06068] Updated weights for policy 0, policy_version 360 (0.0037)
+[2024-09-06 08:01:56,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1495040. Throughput: 0: 984.5. Samples: 374400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:01:56,324][01070] Avg episode reward: [(0, '4.809')]
+[2024-09-06 08:02:01,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1511424. Throughput: 0: 1012.0. Samples: 377750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:02:01,325][01070] Avg episode reward: [(0, '5.081')]
+[2024-09-06 08:02:01,334][06055] Saving new best policy, reward=5.081!
+[2024-09-06 08:02:01,346][06068] Updated weights for policy 0, policy_version 370 (0.0044)
+[2024-09-06 08:02:06,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1527808. Throughput: 0: 953.3. Samples: 382024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:02:06,324][01070] Avg episode reward: [(0, '5.109')]
+[2024-09-06 08:02:06,335][06055] Saving new best policy, reward=5.109!
+[2024-09-06 08:02:11,317][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1552384. Throughput: 0: 958.0. Samples: 388506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:02:11,327][01070] Avg episode reward: [(0, '4.909')]
+[2024-09-06 08:02:12,040][06068] Updated weights for policy 0, policy_version 380 (0.0043)
+[2024-09-06 08:02:16,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1572864. Throughput: 0: 989.5. Samples: 391988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:02:16,319][01070] Avg episode reward: [(0, '4.791')]
+[2024-09-06 08:02:21,316][01070] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1589248. Throughput: 0: 966.2. Samples: 396960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:02:21,321][01070] Avg episode reward: [(0, '4.771')]
+[2024-09-06 08:02:23,656][06068] Updated weights for policy 0, policy_version 390 (0.0062)
+[2024-09-06 08:02:26,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1609728. Throughput: 0: 936.7. Samples: 402648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:02:26,322][01070] Avg episode reward: [(0, '4.651')]
+[2024-09-06 08:02:31,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1630208. Throughput: 0: 964.6. Samples: 406156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:02:31,322][01070] Avg episode reward: [(0, '4.673')]
+[2024-09-06 08:02:32,348][06068] Updated weights for policy 0, policy_version 400 (0.0035)
+[2024-09-06 08:02:36,317][01070] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1650688. Throughput: 0: 999.3. Samples: 412360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:02:36,318][01070] Avg episode reward: [(0, '4.851')]
+[2024-09-06 08:02:41,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1662976. Throughput: 0: 946.6. Samples: 416998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:02:41,322][01070] Avg episode reward: [(0, '5.188')]
+[2024-09-06 08:02:41,329][06055] Saving new best policy, reward=5.188!
+[2024-09-06 08:02:43,939][06068] Updated weights for policy 0, policy_version 410 (0.0044)
+[2024-09-06 08:02:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1687552. Throughput: 0: 948.0. Samples: 420410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:02:46,322][01070] Avg episode reward: [(0, '5.138')]
+[2024-09-06 08:02:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth...
+[2024-09-06 08:02:46,505][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth
+[2024-09-06 08:02:51,318][01070] Fps is (10 sec: 4504.9, 60 sec: 3959.3, 300 sec: 3832.2). Total num frames: 1708032. Throughput: 0: 1003.2. Samples: 427168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:02:51,321][01070] Avg episode reward: [(0, '4.904')]
+[2024-09-06 08:02:54,708][06068] Updated weights for policy 0, policy_version 420 (0.0014)
+[2024-09-06 08:02:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1724416. Throughput: 0: 952.8. Samples: 431380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 08:02:56,319][01070] Avg episode reward: [(0, '4.841')]
+[2024-09-06 08:03:01,317][01070] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1744896. Throughput: 0: 938.4. Samples: 434216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:03:01,319][01070] Avg episode reward: [(0, '4.777')]
+[2024-09-06 08:03:04,504][06068] Updated weights for policy 0, policy_version 430 (0.0014)
+[2024-09-06 08:03:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1765376. Throughput: 0: 983.9. Samples: 441236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 08:03:06,323][01070] Avg episode reward: [(0, '4.848')]
+[2024-09-06 08:03:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 1781760. Throughput: 0: 978.1. Samples: 446662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:03:11,322][01070] Avg episode reward: [(0, '4.875')]
+[2024-09-06 08:03:16,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 1798144. Throughput: 0: 947.8. Samples: 448806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:03:16,319][01070] Avg episode reward: [(0, '5.049')]
+[2024-09-06 08:03:16,335][06068] Updated weights for policy 0, policy_version 440 (0.0023)
+[2024-09-06 08:03:21,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1822720. Throughput: 0: 954.9. Samples: 455330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:03:21,320][01070] Avg episode reward: [(0, '5.155')]
+[2024-09-06 08:03:25,412][06068] Updated weights for policy 0, policy_version 450 (0.0032)
+[2024-09-06 08:03:26,317][01070] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1843200. Throughput: 0: 993.1. Samples: 461688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:03:26,318][01070] Avg episode reward: [(0, '5.165')]
+[2024-09-06 08:03:31,322][01070] Fps is (10 sec: 3684.2, 60 sec: 3822.6, 300 sec: 3846.1). Total num frames: 1859584. Throughput: 0: 961.7. Samples: 463694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:03:31,325][01070] Avg episode reward: [(0, '5.015')]
+[2024-09-06 08:03:36,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1880064. Throughput: 0: 941.8. Samples: 469546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:03:36,319][01070] Avg episode reward: [(0, '4.801')]
+[2024-09-06 08:03:36,683][06068] Updated weights for policy 0, policy_version 460 (0.0046)
+[2024-09-06 08:03:41,317][01070] Fps is (10 sec: 4098.3, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1900544. Throughput: 0: 995.4. Samples: 476172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:03:41,324][01070] Avg episode reward: [(0, '5.126')]
+[2024-09-06 08:03:46,323][01070] Fps is (10 sec: 3274.6, 60 sec: 3754.2, 300 sec: 3818.2). Total num frames: 1912832. Throughput: 0: 972.6. Samples: 477988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:03:46,326][01070] Avg episode reward: [(0, '5.144')]
+[2024-09-06 08:03:50,595][06068] Updated weights for policy 0, policy_version 470 (0.0043)
+[2024-09-06 08:03:51,317][01070] Fps is (10 sec: 2457.7, 60 sec: 3618.2, 300 sec: 3804.4). Total num frames: 1925120. Throughput: 0: 891.7. Samples: 481364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:03:51,320][01070] Avg episode reward: [(0, '5.180')]
+[2024-09-06 08:03:56,317][01070] Fps is (10 sec: 3688.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1949696. Throughput: 0: 907.3. Samples: 487490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-06 08:03:56,320][01070] Avg episode reward: [(0, '4.899')]
+[2024-09-06 08:03:59,818][06068] Updated weights for policy 0, policy_version 480 (0.0031)
+[2024-09-06 08:04:01,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1970176. Throughput: 0: 936.2. Samples: 490936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:04:01,319][01070] Avg episode reward: [(0, '4.708')]
+[2024-09-06 08:04:06,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1986560. Throughput: 0: 912.2. Samples: 496378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:04:06,319][01070] Avg episode reward: [(0, '4.858')]
+[2024-09-06 08:04:11,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2002944. Throughput: 0: 889.3. Samples: 501706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:04:11,323][01070] Avg episode reward: [(0, '5.187')]
+[2024-09-06 08:04:11,461][06068] Updated weights for policy 0, policy_version 490 (0.0044)
+[2024-09-06 08:04:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 2027520. Throughput: 0: 922.6. Samples: 505206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:04:16,321][01070] Avg episode reward: [(0, '5.216')]
+[2024-09-06 08:04:16,333][06055] Saving new best policy, reward=5.216!
+[2024-09-06 08:04:21,319][01070] Fps is (10 sec: 4094.9, 60 sec: 3686.2, 300 sec: 3804.4). Total num frames: 2043904. Throughput: 0: 931.9. Samples: 511486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:04:21,322][01070] Avg episode reward: [(0, '5.013')]
+[2024-09-06 08:04:21,394][06068] Updated weights for policy 0, policy_version 500 (0.0030)
+[2024-09-06 08:04:26,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2060288. Throughput: 0: 878.6. Samples: 515708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:04:26,324][01070] Avg episode reward: [(0, '4.618')]
+[2024-09-06 08:04:31,317][01070] Fps is (10 sec: 4097.1, 60 sec: 3755.0, 300 sec: 3804.4). Total num frames: 2084864. Throughput: 0: 916.1. Samples: 519206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:04:31,319][01070] Avg episode reward: [(0, '4.710')]
+[2024-09-06 08:04:32,059][06068] Updated weights for policy 0, policy_version 510 (0.0024)
+[2024-09-06 08:04:36,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2105344. Throughput: 0: 995.1. Samples: 526144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:04:36,319][01070] Avg episode reward: [(0, '5.116')]
+[2024-09-06 08:04:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.5). Total num frames: 2121728. Throughput: 0: 964.6. Samples: 530896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:04:41,319][01070] Avg episode reward: [(0, '5.008')]
+[2024-09-06 08:04:43,583][06068] Updated weights for policy 0, policy_version 520 (0.0023)
+[2024-09-06 08:04:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.4, 300 sec: 3804.4). Total num frames: 2142208. Throughput: 0: 945.4. Samples: 533480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:04:46,320][01070] Avg episode reward: [(0, '5.078')]
+[2024-09-06 08:04:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000523_2142208.pth...
+[2024-09-06 08:04:46,489][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth
+[2024-09-06 08:04:51,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2162688. Throughput: 0: 975.2. Samples: 540264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:04:51,321][01070] Avg episode reward: [(0, '5.193')]
+[2024-09-06 08:04:52,623][06068] Updated weights for policy 0, policy_version 530 (0.0036)
+[2024-09-06 08:04:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2179072. Throughput: 0: 981.4. Samples: 545870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:04:56,323][01070] Avg episode reward: [(0, '4.939')]
+[2024-09-06 08:05:01,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2195456. Throughput: 0: 949.8. Samples: 547946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:05:01,323][01070] Avg episode reward: [(0, '5.156')]
+[2024-09-06 08:05:04,000][06068] Updated weights for policy 0, policy_version 540 (0.0028)
+[2024-09-06 08:05:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2220032. Throughput: 0: 959.5. Samples: 554662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:05:06,324][01070] Avg episode reward: [(0, '5.088')]
+[2024-09-06 08:05:11,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2240512. Throughput: 0: 1010.8. Samples: 561196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:05:11,320][01070] Avg episode reward: [(0, '4.885')]
+[2024-09-06 08:05:14,445][06068] Updated weights for policy 0, policy_version 550 (0.0037)
+[2024-09-06 08:05:16,318][01070] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3846.2). Total num frames: 2256896. Throughput: 0: 978.9. Samples: 563260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:05:16,321][01070] Avg episode reward: [(0, '5.189')]
+[2024-09-06 08:05:21,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3891.4, 300 sec: 3846.1). Total num frames: 2277376. Throughput: 0: 948.0. Samples: 568804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 08:05:21,319][01070] Avg episode reward: [(0, '5.640')]
+[2024-09-06 08:05:21,324][06055] Saving new best policy, reward=5.640!
+[2024-09-06 08:05:24,437][06068] Updated weights for policy 0, policy_version 560 (0.0040)
+[2024-09-06 08:05:26,317][01070] Fps is (10 sec: 4506.5, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2301952. Throughput: 0: 998.1. Samples: 575810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:05:26,321][01070] Avg episode reward: [(0, '5.759')]
+[2024-09-06 08:05:26,331][06055] Saving new best policy, reward=5.759!
+[2024-09-06 08:05:31,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2318336. Throughput: 0: 999.7. Samples: 578466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:05:31,322][01070] Avg episode reward: [(0, '5.847')]
+[2024-09-06 08:05:31,326][06055] Saving new best policy, reward=5.847!
+[2024-09-06 08:05:35,920][06068] Updated weights for policy 0, policy_version 570 (0.0024)
+[2024-09-06 08:05:36,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2334720. Throughput: 0: 952.0. Samples: 583102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:05:36,319][01070] Avg episode reward: [(0, '5.772')]
+[2024-09-06 08:05:41,317][01070] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2359296. Throughput: 0: 984.4. Samples: 590166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:05:41,319][01070] Avg episode reward: [(0, '5.772')]
+[2024-09-06 08:05:44,966][06068] Updated weights for policy 0, policy_version 580 (0.0031)
+[2024-09-06 08:05:46,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2375680. Throughput: 0: 1015.8. Samples: 593656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:05:46,326][01070] Avg episode reward: [(0, '5.835')]
+[2024-09-06 08:05:51,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2392064. Throughput: 0: 961.4. Samples: 597926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:05:51,325][01070] Avg episode reward: [(0, '5.707')]
+[2024-09-06 08:05:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2412544. Throughput: 0: 956.4. Samples: 604232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:05:56,325][01070] Avg episode reward: [(0, '5.539')]
+[2024-09-06 08:05:56,404][06068] Updated weights for policy 0, policy_version 590 (0.0035)
+[2024-09-06 08:06:01,317][01070] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2437120. Throughput: 0: 987.2. Samples: 607680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:06:01,321][01070] Avg episode reward: [(0, '5.823')]
+[2024-09-06 08:06:06,317][01070] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2453504. Throughput: 0: 982.3. Samples: 613010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:06:06,323][01070] Avg episode reward: [(0, '5.596')]
+[2024-09-06 08:06:07,575][06068] Updated weights for policy 0, policy_version 600 (0.0034)
+[2024-09-06 08:06:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2469888. Throughput: 0: 945.9. Samples: 618374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:06:11,324][01070] Avg episode reward: [(0, '5.336')]
+[2024-09-06 08:06:16,317][01070] Fps is (10 sec: 4096.2, 60 sec: 3959.6, 300 sec: 3846.1). Total num frames: 2494464. Throughput: 0: 964.4. Samples: 621862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:06:16,324][01070] Avg episode reward: [(0, '5.749')]
+[2024-09-06 08:06:16,804][06068] Updated weights for policy 0, policy_version 610 (0.0036)
+[2024-09-06 08:06:21,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2510848. Throughput: 0: 1000.3. Samples: 628114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:06:21,322][01070] Avg episode reward: [(0, '6.247')]
+[2024-09-06 08:06:21,328][06055] Saving new best policy, reward=6.247!
+[2024-09-06 08:06:26,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2527232. Throughput: 0: 943.9. Samples: 632640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:06:26,324][01070] Avg episode reward: [(0, '6.326')]
+[2024-09-06 08:06:26,335][06055] Saving new best policy, reward=6.326!
+[2024-09-06 08:06:28,439][06068] Updated weights for policy 0, policy_version 620 (0.0026)
+[2024-09-06 08:06:31,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2551808. Throughput: 0: 943.6. Samples: 636116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:06:31,323][01070] Avg episode reward: [(0, '5.950')]
+[2024-09-06 08:06:36,323][01070] Fps is (10 sec: 4502.5, 60 sec: 3959.0, 300 sec: 3832.1). Total num frames: 2572288. Throughput: 0: 1005.1. Samples: 643160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:06:36,326][01070] Avg episode reward: [(0, '5.960')]
+[2024-09-06 08:06:38,101][06068] Updated weights for policy 0, policy_version 630 (0.0023)
+[2024-09-06 08:06:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2588672. Throughput: 0: 964.2. Samples: 647620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-06 08:06:41,319][01070] Avg episode reward: [(0, '5.987')]
+[2024-09-06 08:06:46,317][01070] Fps is (10 sec: 3688.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2609152. Throughput: 0: 950.4. Samples: 650448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:06:46,323][01070] Avg episode reward: [(0, '6.091')]
+[2024-09-06 08:06:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth...
+[2024-09-06 08:06:46,464][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth
+[2024-09-06 08:06:48,887][06068] Updated weights for policy 0, policy_version 640 (0.0049)
+[2024-09-06 08:06:51,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2629632. Throughput: 0: 982.1. Samples: 657202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:06:51,323][01070] Avg episode reward: [(0, '6.072')]
+[2024-09-06 08:06:56,317][01070] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2646016. Throughput: 0: 980.7. Samples: 662504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:06:56,319][01070] Avg episode reward: [(0, '5.924')]
+[2024-09-06 08:07:00,529][06068] Updated weights for policy 0, policy_version 650 (0.0028)
+[2024-09-06 08:07:01,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2662400. Throughput: 0: 950.1. Samples: 664618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:07:01,319][01070] Avg episode reward: [(0, '5.764')]
+[2024-09-06 08:07:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2686976. Throughput: 0: 963.3. Samples: 671462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-09-06 08:07:06,318][01070] Avg episode reward: [(0, '5.901')]
+[2024-09-06 08:07:09,346][06068] Updated weights for policy 0, policy_version 660 (0.0045)
+[2024-09-06 08:07:11,317][01070] Fps is (10 sec: 4505.2, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 2707456. Throughput: 0: 1000.9. Samples: 677680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:07:11,320][01070] Avg episode reward: [(0, '6.609')]
+[2024-09-06 08:07:11,322][06055] Saving new best policy, reward=6.609!
+[2024-09-06 08:07:16,318][01070] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 2719744. Throughput: 0: 967.8. Samples: 679670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:07:16,324][01070] Avg episode reward: [(0, '6.832')]
+[2024-09-06 08:07:16,336][06055] Saving new best policy, reward=6.832!
+[2024-09-06 08:07:21,160][06068] Updated weights for policy 0, policy_version 670 (0.0030)
+[2024-09-06 08:07:21,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 2744320. Throughput: 0: 935.1. Samples: 685232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:07:21,321][01070] Avg episode reward: [(0, '6.827')]
+[2024-09-06 08:07:26,317][01070] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2760704. Throughput: 0: 966.4. Samples: 691106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:07:26,323][01070] Avg episode reward: [(0, '6.439')]
+[2024-09-06 08:07:31,319][01070] Fps is (10 sec: 2866.6, 60 sec: 3686.2, 300 sec: 3804.4). Total num frames: 2772992. Throughput: 0: 942.7. Samples: 692870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:07:31,325][01070] Avg episode reward: [(0, '6.339')]
+[2024-09-06 08:07:35,290][06068] Updated weights for policy 0, policy_version 680 (0.0034)
+[2024-09-06 08:07:36,317][01070] Fps is (10 sec: 2457.6, 60 sec: 3550.3, 300 sec: 3804.4). Total num frames: 2785280. Throughput: 0: 878.4. Samples: 696728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-09-06 08:07:36,321][01070] Avg episode reward: [(0, '6.444')]
+[2024-09-06 08:07:41,317][01070] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2809856. Throughput: 0: 908.9. Samples: 703404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:07:41,319][01070] Avg episode reward: [(0, '6.654')]
+[2024-09-06 08:07:44,151][06068] Updated weights for policy 0, policy_version 690 (0.0028)
+[2024-09-06 08:07:46,317][01070] Fps is (10 sec: 4915.2, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2834432. Throughput: 0: 940.5. Samples: 706940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:07:46,319][01070] Avg episode reward: [(0, '6.718')]
+[2024-09-06 08:07:51,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2846720. Throughput: 0: 901.1. Samples: 712010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:07:51,319][01070] Avg episode reward: [(0, '6.882')]
+[2024-09-06 08:07:51,324][06055] Saving new best policy, reward=6.882!
+[2024-09-06 08:07:55,843][06068] Updated weights for policy 0, policy_version 700 (0.0039)
+[2024-09-06 08:07:56,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2867200. Throughput: 0: 884.7. Samples: 717490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:07:56,325][01070] Avg episode reward: [(0, '7.244')]
+[2024-09-06 08:07:56,336][06055] Saving new best policy, reward=7.244!
+[2024-09-06 08:08:01,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2891776. Throughput: 0: 917.8. Samples: 720972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:08:01,324][01070] Avg episode reward: [(0, '7.391')]
+[2024-09-06 08:08:01,326][06055] Saving new best policy, reward=7.391!
+[2024-09-06 08:08:05,806][06068] Updated weights for policy 0, policy_version 710 (0.0027)
+[2024-09-06 08:08:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 2908160. Throughput: 0: 930.1. Samples: 727084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:08:06,325][01070] Avg episode reward: [(0, '6.836')]
+[2024-09-06 08:08:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3818.3). Total num frames: 2924544. Throughput: 0: 901.6. Samples: 731678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:08:11,319][01070] Avg episode reward: [(0, '6.692')]
+[2024-09-06 08:08:16,037][06068] Updated weights for policy 0, policy_version 720 (0.0022)
+[2024-09-06 08:08:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 2949120. Throughput: 0: 940.1. Samples: 735172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:08:16,319][01070] Avg episode reward: [(0, '6.850')]
+[2024-09-06 08:08:21,321][01070] Fps is (10 sec: 4503.7, 60 sec: 3754.5, 300 sec: 3818.2). Total num frames: 2969600. Throughput: 0: 1006.2. Samples: 742010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-06 08:08:21,323][01070] Avg episode reward: [(0, '7.485')]
+[2024-09-06 08:08:21,325][06055] Saving new best policy, reward=7.485!
+[2024-09-06 08:08:26,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.5). Total num frames: 2981888. Throughput: 0: 953.0. Samples: 746290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:08:26,319][01070] Avg episode reward: [(0, '7.434')]
+[2024-09-06 08:08:27,902][06068] Updated weights for policy 0, policy_version 730 (0.0032)
+[2024-09-06 08:08:31,317][01070] Fps is (10 sec: 3278.2, 60 sec: 3823.1, 300 sec: 3804.4). Total num frames: 3002368. Throughput: 0: 939.8. Samples: 749232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-06 08:08:31,319][01070] Avg episode reward: [(0, '7.716')]
+[2024-09-06 08:08:31,354][06055] Saving new best policy, reward=7.716!
+[2024-09-06 08:08:36,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 3026944. Throughput: 0: 982.7. Samples: 756230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:08:36,319][01070] Avg episode reward: [(0, '7.928')]
+[2024-09-06 08:08:36,335][06055] Saving new best policy, reward=7.928!
+[2024-09-06 08:08:36,786][06068] Updated weights for policy 0, policy_version 740 (0.0040)
+[2024-09-06 08:08:41,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.3). Total num frames: 3043328. Throughput: 0: 975.6. Samples: 761394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-06 08:08:41,319][01070] Avg episode reward: [(0, '7.704')]
+[2024-09-06 08:08:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3059712. Throughput: 0: 946.2. Samples: 763550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:08:46,318][01070] Avg episode reward: [(0, '7.583')]
+[2024-09-06 08:08:46,332][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth...
+[2024-09-06 08:08:46,453][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000523_2142208.pth
+[2024-09-06 08:08:48,365][06068] Updated weights for policy 0, policy_version 750 (0.0018)
+[2024-09-06 08:08:51,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3084288. Throughput: 0: 960.6. Samples: 770310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-06 08:08:51,322][01070] Avg episode reward: [(0, '7.840')]
+[2024-09-06 08:08:56,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3100672. Throughput: 0: 992.8. Samples: 776356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:08:56,320][01070] Avg episode reward: [(0, '7.925')]
+[2024-09-06 08:08:59,558][06068] Updated weights for policy 0, policy_version 760 (0.0058)
+[2024-09-06 08:09:01,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3117056. Throughput: 0: 961.3. Samples: 778430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-06 08:09:01,323][01070] Avg episode reward: [(0, '7.469')]
+[2024-09-06 08:09:06,318][01070] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 3141632. Throughput: 0: 942.7. Samples: 784430.
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-09-06 08:09:06,322][01070] Avg episode reward: [(0, '7.036')] +[2024-09-06 08:09:08,713][06068] Updated weights for policy 0, policy_version 770 (0.0036) +[2024-09-06 08:09:11,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3162112. Throughput: 0: 1005.6. Samples: 791544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:09:11,324][01070] Avg episode reward: [(0, '7.387')] +[2024-09-06 08:09:16,322][01070] Fps is (10 sec: 3684.7, 60 sec: 3822.6, 300 sec: 3846.0). Total num frames: 3178496. Throughput: 0: 991.1. Samples: 793836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-06 08:09:16,325][01070] Avg episode reward: [(0, '7.663')] +[2024-09-06 08:09:20,382][06068] Updated weights for policy 0, policy_version 780 (0.0022) +[2024-09-06 08:09:21,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3860.0). Total num frames: 3198976. Throughput: 0: 946.0. Samples: 798798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:09:21,325][01070] Avg episode reward: [(0, '7.722')] +[2024-09-06 08:09:26,317][01070] Fps is (10 sec: 4508.2, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3223552. Throughput: 0: 988.2. Samples: 805864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:09:26,320][01070] Avg episode reward: [(0, '7.744')] +[2024-09-06 08:09:29,518][06068] Updated weights for policy 0, policy_version 790 (0.0014) +[2024-09-06 08:09:31,319][01070] Fps is (10 sec: 4095.0, 60 sec: 3959.3, 300 sec: 3846.0). Total num frames: 3239936. Throughput: 0: 1014.1. Samples: 809186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-06 08:09:31,322][01070] Avg episode reward: [(0, '7.755')] +[2024-09-06 08:09:36,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3256320. Throughput: 0: 960.5. Samples: 813534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-06 08:09:36,318][01070] Avg episode reward: [(0, '8.140')] +[2024-09-06 08:09:36,331][06055] Saving new best policy, reward=8.140! +[2024-09-06 08:09:40,585][06068] Updated weights for policy 0, policy_version 800 (0.0042) +[2024-09-06 08:09:41,317][01070] Fps is (10 sec: 3687.3, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3276800. Throughput: 0: 974.7. Samples: 820218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-06 08:09:41,323][01070] Avg episode reward: [(0, '8.704')] +[2024-09-06 08:09:41,332][06055] Saving new best policy, reward=8.704! +[2024-09-06 08:09:46,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3301376. Throughput: 0: 1006.8. Samples: 823736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:09:46,322][01070] Avg episode reward: [(0, '8.566')] +[2024-09-06 08:09:51,321][01070] Fps is (10 sec: 3684.6, 60 sec: 3822.6, 300 sec: 3846.0). Total num frames: 3313664. Throughput: 0: 986.4. Samples: 828822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:09:51,327][01070] Avg episode reward: [(0, '8.672')] +[2024-09-06 08:09:51,764][06068] Updated weights for policy 0, policy_version 810 (0.0032) +[2024-09-06 08:09:56,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3334144. Throughput: 0: 955.2. Samples: 834528. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:09:56,323][01070] Avg episode reward: [(0, '8.667')] +[2024-09-06 08:10:00,867][06068] Updated weights for policy 0, policy_version 820 (0.0032) +[2024-09-06 08:10:01,317][01070] Fps is (10 sec: 4507.8, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3358720. Throughput: 0: 982.2. Samples: 838030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:10:01,319][01070] Avg episode reward: [(0, '9.323')] +[2024-09-06 08:10:01,327][06055] Saving new best policy, reward=9.323! +[2024-09-06 08:10:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 3375104. Throughput: 0: 1002.8. Samples: 843922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:10:06,322][01070] Avg episode reward: [(0, '9.281')] +[2024-09-06 08:10:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3391488. Throughput: 0: 951.8. Samples: 848696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:10:11,319][01070] Avg episode reward: [(0, '9.212')] +[2024-09-06 08:10:12,461][06068] Updated weights for policy 0, policy_version 830 (0.0025) +[2024-09-06 08:10:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 3860.0). Total num frames: 3416064. Throughput: 0: 956.3. Samples: 852216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-09-06 08:10:16,319][01070] Avg episode reward: [(0, '9.185')] +[2024-09-06 08:10:21,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3436544. Throughput: 0: 1017.1. Samples: 859302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-06 08:10:21,320][01070] Avg episode reward: [(0, '9.434')] +[2024-09-06 08:10:21,324][06055] Saving new best policy, reward=9.434! +[2024-09-06 08:10:22,066][06068] Updated weights for policy 0, policy_version 840 (0.0030) +[2024-09-06 08:10:26,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3452928. Throughput: 0: 961.2. Samples: 863472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-09-06 08:10:26,322][01070] Avg episode reward: [(0, '9.987')] +[2024-09-06 08:10:26,338][06055] Saving new best policy, reward=9.987! +[2024-09-06 08:10:31,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 3473408. Throughput: 0: 951.3. Samples: 866546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:10:31,322][01070] Avg episode reward: [(0, '10.544')] +[2024-09-06 08:10:31,325][06055] Saving new best policy, reward=10.544! +[2024-09-06 08:10:32,772][06068] Updated weights for policy 0, policy_version 850 (0.0028) +[2024-09-06 08:10:36,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3497984. Throughput: 0: 993.9. Samples: 873542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-06 08:10:36,319][01070] Avg episode reward: [(0, '11.038')] +[2024-09-06 08:10:36,329][06055] Saving new best policy, reward=11.038! +[2024-09-06 08:10:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3510272. Throughput: 0: 980.8. Samples: 878664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:10:41,325][01070] Avg episode reward: [(0, '10.332')] +[2024-09-06 08:10:44,181][06068] Updated weights for policy 0, policy_version 860 (0.0030) +[2024-09-06 08:10:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3530752. Throughput: 0: 953.3. 
Samples: 880928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-06 08:10:46,318][01070] Avg episode reward: [(0, '10.049')] +[2024-09-06 08:10:46,335][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000862_3530752.pth... +[2024-09-06 08:10:46,466][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth +[2024-09-06 08:10:51,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3873.8). Total num frames: 3555328. Throughput: 0: 981.0. Samples: 888068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-06 08:10:51,325][01070] Avg episode reward: [(0, '9.881')] +[2024-09-06 08:10:52,867][06068] Updated weights for policy 0, policy_version 870 (0.0024) +[2024-09-06 08:10:56,318][01070] Fps is (10 sec: 4095.3, 60 sec: 3959.3, 300 sec: 3846.1). Total num frames: 3571712. Throughput: 0: 1010.2. Samples: 894156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:10:56,321][01070] Avg episode reward: [(0, '10.324')] +[2024-09-06 08:11:01,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3588096. Throughput: 0: 978.2. Samples: 896234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-06 08:11:01,324][01070] Avg episode reward: [(0, '10.751')] +[2024-09-06 08:11:04,130][06068] Updated weights for policy 0, policy_version 880 (0.0019) +[2024-09-06 08:11:06,317][01070] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3612672. Throughput: 0: 960.8. Samples: 902540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-06 08:11:06,321][01070] Avg episode reward: [(0, '11.121')] +[2024-09-06 08:11:06,332][06055] Saving new best policy, reward=11.121! +[2024-09-06 08:11:11,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 3629056. Throughput: 0: 994.5. Samples: 908226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-06 08:11:11,319][01070] Avg episode reward: [(0, '11.751')] +[2024-09-06 08:11:11,326][06055] Saving new best policy, reward=11.751! +[2024-09-06 08:11:16,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3641344. Throughput: 0: 964.6. Samples: 909952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-06 08:11:16,319][01070] Avg episode reward: [(0, '12.049')] +[2024-09-06 08:11:16,333][06055] Saving new best policy, reward=12.049! +[2024-09-06 08:11:17,256][06068] Updated weights for policy 0, policy_version 890 (0.0045) +[2024-09-06 08:11:21,316][01070] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3657728. Throughput: 0: 893.7. Samples: 913758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-09-06 08:11:21,318][01070] Avg episode reward: [(0, '13.812')] +[2024-09-06 08:11:21,326][06055] Saving new best policy, reward=13.812! +[2024-09-06 08:11:26,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 3678208. Throughput: 0: 932.5. Samples: 920626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:11:26,325][01070] Avg episode reward: [(0, '15.231')] +[2024-09-06 08:11:26,390][06055] Saving new best policy, reward=15.231! +[2024-09-06 08:11:27,393][06068] Updated weights for policy 0, policy_version 900 (0.0024) +[2024-09-06 08:11:31,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3832.3). Total num frames: 3702784. Throughput: 0: 960.0. Samples: 924128. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:11:31,322][01070] Avg episode reward: [(0, '15.872')] +[2024-09-06 08:11:31,328][06055] Saving new best policy, reward=15.872! +[2024-09-06 08:11:36,317][01070] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 3715072. Throughput: 0: 907.6. Samples: 928912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-09-06 08:11:36,320][01070] Avg episode reward: [(0, '16.561')] +[2024-09-06 08:11:36,333][06055] Saving new best policy, reward=16.561! +[2024-09-06 08:11:38,716][06068] Updated weights for policy 0, policy_version 910 (0.0025) +[2024-09-06 08:11:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3739648. Throughput: 0: 909.8. Samples: 935094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:11:41,318][01070] Avg episode reward: [(0, '14.523')] +[2024-09-06 08:11:46,317][01070] Fps is (10 sec: 4505.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3760128. Throughput: 0: 942.4. Samples: 938640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-06 08:11:46,320][01070] Avg episode reward: [(0, '14.427')] +[2024-09-06 08:11:47,204][06068] Updated weights for policy 0, policy_version 920 (0.0035) +[2024-09-06 08:11:51,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3776512. Throughput: 0: 934.4. Samples: 944586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:11:51,321][01070] Avg episode reward: [(0, '14.547')] +[2024-09-06 08:11:56,317][01070] Fps is (10 sec: 3686.6, 60 sec: 3754.8, 300 sec: 3846.1). Total num frames: 3796992. Throughput: 0: 915.3. Samples: 949416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:11:56,319][01070] Avg episode reward: [(0, '15.007')] +[2024-09-06 08:11:58,865][06068] Updated weights for policy 0, policy_version 930 (0.0040) +[2024-09-06 08:12:01,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3817472. Throughput: 0: 955.3. Samples: 952942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:12:01,319][01070] Avg episode reward: [(0, '16.278')] +[2024-09-06 08:12:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3837952. Throughput: 0: 1027.6. Samples: 959998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:12:06,326][01070] Avg episode reward: [(0, '16.304')] +[2024-09-06 08:12:09,321][06068] Updated weights for policy 0, policy_version 940 (0.0024) +[2024-09-06 08:12:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3854336. Throughput: 0: 969.6. Samples: 964258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:12:11,320][01070] Avg episode reward: [(0, '16.576')] +[2024-09-06 08:12:11,326][06055] Saving new best policy, reward=16.576! +[2024-09-06 08:12:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3878912. Throughput: 0: 961.4. Samples: 967390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:12:16,322][01070] Avg episode reward: [(0, '16.934')] +[2024-09-06 08:12:16,332][06055] Saving new best policy, reward=16.934! +[2024-09-06 08:12:18,962][06068] Updated weights for policy 0, policy_version 950 (0.0061) +[2024-09-06 08:12:21,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3899392. Throughput: 0: 1011.4. Samples: 974424. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:12:21,321][01070] Avg episode reward: [(0, '16.941')] +[2024-09-06 08:12:21,326][06055] Saving new best policy, reward=16.941! +[2024-09-06 08:12:26,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3915776. Throughput: 0: 986.0. Samples: 979462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-09-06 08:12:26,321][01070] Avg episode reward: [(0, '17.171')] +[2024-09-06 08:12:26,332][06055] Saving new best policy, reward=17.171! +[2024-09-06 08:12:30,483][06068] Updated weights for policy 0, policy_version 960 (0.0038) +[2024-09-06 08:12:31,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3932160. Throughput: 0: 955.2. Samples: 981622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:12:31,318][01070] Avg episode reward: [(0, '17.914')] +[2024-09-06 08:12:31,330][06055] Saving new best policy, reward=17.914! +[2024-09-06 08:12:36,317][01070] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3956736. Throughput: 0: 980.8. Samples: 988720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:12:36,319][01070] Avg episode reward: [(0, '17.778')] +[2024-09-06 08:12:39,405][06068] Updated weights for policy 0, policy_version 970 (0.0031) +[2024-09-06 08:12:41,318][01070] Fps is (10 sec: 4504.8, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 3977216. Throughput: 0: 1011.3. Samples: 994928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-09-06 08:12:41,321][01070] Avg episode reward: [(0, '17.411')] +[2024-09-06 08:12:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 3989504. Throughput: 0: 979.5. Samples: 997020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-09-06 08:12:46,319][01070] Avg episode reward: [(0, '16.564')] +[2024-09-06 08:12:46,386][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000975_3993600.pth... +[2024-09-06 08:12:46,505][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth +[2024-09-06 08:12:49,066][06055] Stopping Batcher_0... +[2024-09-06 08:12:49,066][06055] Loop batcher_evt_loop terminating... +[2024-09-06 08:12:49,068][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-06 08:12:49,066][01070] Component Batcher_0 stopped! +[2024-09-06 08:12:49,172][06068] Weights refcount: 2 0 +[2024-09-06 08:12:49,178][06068] Stopping InferenceWorker_p0-w0... +[2024-09-06 08:12:49,180][01070] Component InferenceWorker_p0-w0 stopped! +[2024-09-06 08:12:49,179][06068] Loop inference_proc0-0_evt_loop terminating... +[2024-09-06 08:12:49,262][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000862_3530752.pth +[2024-09-06 08:12:49,282][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-09-06 08:12:49,452][01070] Component LearnerWorker_p0 stopped! +[2024-09-06 08:12:49,458][06055] Stopping LearnerWorker_p0... +[2024-09-06 08:12:49,458][06055] Loop learner_proc0_evt_loop terminating... +[2024-09-06 08:12:49,517][06070] Stopping RolloutWorker_w1... +[2024-09-06 08:12:49,518][06070] Loop rollout_proc1_evt_loop terminating... +[2024-09-06 08:12:49,518][01070] Component RolloutWorker_w1 stopped! +[2024-09-06 08:12:49,528][01070] Component RolloutWorker_w7 stopped! 
+[2024-09-06 08:12:49,528][06076] Stopping RolloutWorker_w7...
+[2024-09-06 08:12:49,534][06076] Loop rollout_proc7_evt_loop terminating...
+[2024-09-06 08:12:49,583][06072] Stopping RolloutWorker_w3...
+[2024-09-06 08:12:49,587][06074] Stopping RolloutWorker_w5...
+[2024-09-06 08:12:49,584][01070] Component RolloutWorker_w3 stopped!
+[2024-09-06 08:12:49,589][06072] Loop rollout_proc3_evt_loop terminating...
+[2024-09-06 08:12:49,589][01070] Component RolloutWorker_w5 stopped!
+[2024-09-06 08:12:49,589][06074] Loop rollout_proc5_evt_loop terminating...
+[2024-09-06 08:12:49,624][01070] Component RolloutWorker_w0 stopped!
+[2024-09-06 08:12:49,629][06069] Stopping RolloutWorker_w0...
+[2024-09-06 08:12:49,636][06069] Loop rollout_proc0_evt_loop terminating...
+[2024-09-06 08:12:49,659][01070] Component RolloutWorker_w6 stopped!
+[2024-09-06 08:12:49,662][06075] Stopping RolloutWorker_w6...
+[2024-09-06 08:12:49,663][06075] Loop rollout_proc6_evt_loop terminating...
+[2024-09-06 08:12:49,674][01070] Component RolloutWorker_w2 stopped!
+[2024-09-06 08:12:49,677][06071] Stopping RolloutWorker_w2...
+[2024-09-06 08:12:49,677][06071] Loop rollout_proc2_evt_loop terminating...
+[2024-09-06 08:12:49,711][01070] Component RolloutWorker_w4 stopped!
+[2024-09-06 08:12:49,714][01070] Waiting for process learner_proc0 to stop...
+[2024-09-06 08:12:49,717][06073] Stopping RolloutWorker_w4...
+[2024-09-06 08:12:49,718][06073] Loop rollout_proc4_evt_loop terminating...
+[2024-09-06 08:12:51,010][01070] Waiting for process inference_proc0-0 to join...
+[2024-09-06 08:12:51,015][01070] Waiting for process rollout_proc0 to join...
+[2024-09-06 08:12:53,043][01070] Waiting for process rollout_proc1 to join...
+[2024-09-06 08:12:53,046][01070] Waiting for process rollout_proc2 to join...
+[2024-09-06 08:12:53,049][01070] Waiting for process rollout_proc3 to join...
+[2024-09-06 08:12:53,051][01070] Waiting for process rollout_proc4 to join...
+[2024-09-06 08:12:53,052][01070] Waiting for process rollout_proc5 to join...
+[2024-09-06 08:12:53,054][01070] Waiting for process rollout_proc6 to join...
+[2024-09-06 08:12:53,057][01070] Waiting for process rollout_proc7 to join...
+[2024-09-06 08:12:53,059][01070] Batcher 0 profile tree view:
+batching: 28.0352, releasing_batches: 0.0261
+[2024-09-06 08:12:53,060][01070] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0001
+  wait_policy_total: 396.0637
+update_model: 9.3731
+  weight_update: 0.0039
+one_step: 0.0115
+  handle_policy_step: 605.9671
+    deserialize: 14.6060, stack: 3.1718, obs_to_device_normalize: 122.4392, forward: 322.4806, send_messages: 28.9576
+    prepare_outputs: 84.3394
+      to_cpu: 49.4062
+[2024-09-06 08:12:53,062][01070] Learner 0 profile tree view:
+misc: 0.0070, prepare_batch: 14.0241
+train: 74.3266
+  epoch_init: 0.0157, minibatch_init: 0.0064, losses_postprocess: 0.6825, kl_divergence: 0.6975, after_optimizer: 33.2803
+  calculate_losses: 26.8972
+    losses_init: 0.0105, forward_head: 1.2733, bptt_initial: 18.0432, tail: 1.1022, advantages_returns: 0.2551, losses: 3.8922
+    bptt: 1.9881
+      bptt_forward_core: 1.8812
+  update: 12.1225
+    clip: 0.8825
+[2024-09-06 08:12:53,063][01070] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.3955, enqueue_policy_requests: 94.3963, env_step: 821.5439, overhead: 13.5076, complete_rollouts: 7.0298
+save_policy_outputs: 20.8305
+  split_output_tensors: 8.5837
+[2024-09-06 08:12:53,065][01070] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.3496, enqueue_policy_requests: 97.5017, env_step: 821.5330, overhead: 13.0956, complete_rollouts: 6.8883
+save_policy_outputs: 20.0181
+  split_output_tensors: 7.8801
+[2024-09-06 08:12:53,066][01070] Loop Runner_EvtLoop terminating...
+[2024-09-06 08:12:53,068][01070] Runner profile tree view:
+main_loop: 1081.6875
+[2024-09-06 08:12:53,069][01070] Collected {0: 4005888}, FPS: 3703.4
+[2024-09-06 08:26:47,354][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-06 08:26:47,356][01070] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-06 08:26:47,359][01070] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-06 08:26:47,361][01070] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-06 08:26:47,363][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-06 08:26:47,365][01070] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-06 08:26:47,366][01070] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-06 08:26:47,367][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-06 08:26:47,368][01070] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-09-06 08:26:47,369][01070] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-09-06 08:26:47,370][01070] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-06 08:26:47,371][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-06 08:26:47,372][01070] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-06 08:26:47,373][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-06 08:26:47,374][01070] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-06 08:26:47,410][01070] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-06 08:26:47,413][01070] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-06 08:26:47,415][01070] RunningMeanStd input shape: (1,)
+[2024-09-06 08:26:47,432][01070] ConvEncoder: input_channels=3
+[2024-09-06 08:26:47,594][01070] Conv encoder output size: 512
+[2024-09-06 08:26:47,596][01070] Policy head output size: 512
+[2024-09-06 08:26:47,888][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-09-06 08:26:48,754][01070] Num frames 100...
+[2024-09-06 08:26:48,875][01070] Num frames 200...
+[2024-09-06 08:26:48,994][01070] Num frames 300...
+[2024-09-06 08:26:49,113][01070] Num frames 400...
+[2024-09-06 08:26:49,254][01070] Num frames 500...
+[2024-09-06 08:26:49,388][01070] Avg episode rewards: #0: 11.440, true rewards: #0: 5.440
+[2024-09-06 08:26:49,391][01070] Avg episode reward: 11.440, avg true_objective: 5.440
+[2024-09-06 08:26:49,511][01070] Num frames 600...
+[2024-09-06 08:26:49,678][01070] Num frames 700...
+[2024-09-06 08:26:49,846][01070] Num frames 800...
+[2024-09-06 08:26:50,007][01070] Num frames 900...
+[2024-09-06 08:26:50,168][01070] Num frames 1000...
+[2024-09-06 08:26:50,331][01070] Num frames 1100...
+[2024-09-06 08:26:50,503][01070] Num frames 1200...
+[2024-09-06 08:26:50,681][01070] Num frames 1300...
+[2024-09-06 08:26:50,858][01070] Num frames 1400...
+[2024-09-06 08:26:51,030][01070] Num frames 1500...
+[2024-09-06 08:26:51,200][01070] Num frames 1600...
+[2024-09-06 08:26:51,364][01070] Avg episode rewards: #0: 19.795, true rewards: #0: 8.295
+[2024-09-06 08:26:51,366][01070] Avg episode reward: 19.795, avg true_objective: 8.295
+[2024-09-06 08:26:51,440][01070] Num frames 1700...
+[2024-09-06 08:26:51,597][01070] Num frames 1800...
+[2024-09-06 08:26:51,716][01070] Num frames 1900...
+[2024-09-06 08:26:51,843][01070] Num frames 2000...
+[2024-09-06 08:26:51,964][01070] Num frames 2100...
+[2024-09-06 08:26:52,083][01070] Num frames 2200...
+[2024-09-06 08:26:52,203][01070] Num frames 2300...
+[2024-09-06 08:26:52,366][01070] Avg episode rewards: #0: 18.633, true rewards: #0: 7.967
+[2024-09-06 08:26:52,367][01070] Avg episode reward: 18.633, avg true_objective: 7.967
+[2024-09-06 08:26:52,382][01070] Num frames 2400...
+[2024-09-06 08:26:52,509][01070] Num frames 2500...
+[2024-09-06 08:26:52,629][01070] Num frames 2600...
+[2024-09-06 08:26:52,748][01070] Num frames 2700...
+[2024-09-06 08:26:52,878][01070] Num frames 2800...
+[2024-09-06 08:26:52,997][01070] Num frames 2900...
+[2024-09-06 08:26:53,117][01070] Num frames 3000...
+[2024-09-06 08:26:53,239][01070] Num frames 3100...
+[2024-09-06 08:26:53,401][01070] Avg episode rewards: #0: 18.478, true rewards: #0: 7.977
+[2024-09-06 08:26:53,402][01070] Avg episode reward: 18.478, avg true_objective: 7.977
+[2024-09-06 08:26:53,416][01070] Num frames 3200...
+[2024-09-06 08:26:53,543][01070] Num frames 3300...
+[2024-09-06 08:26:53,662][01070] Num frames 3400...
+[2024-09-06 08:26:53,779][01070] Num frames 3500...
+[2024-09-06 08:26:53,906][01070] Num frames 3600...
+[2024-09-06 08:26:54,025][01070] Num frames 3700...
+[2024-09-06 08:26:54,144][01070] Num frames 3800...
+[2024-09-06 08:26:54,263][01070] Num frames 3900...
+[2024-09-06 08:26:54,327][01070] Avg episode rewards: #0: 18.210, true rewards: #0: 7.810
+[2024-09-06 08:26:54,328][01070] Avg episode reward: 18.210, avg true_objective: 7.810
+[2024-09-06 08:26:54,438][01070] Num frames 4000...
+[2024-09-06 08:26:54,568][01070] Num frames 4100...
+[2024-09-06 08:26:54,685][01070] Num frames 4200...
+[2024-09-06 08:26:54,803][01070] Num frames 4300...
+[2024-09-06 08:26:54,928][01070] Num frames 4400...
+[2024-09-06 08:26:55,056][01070] Num frames 4500...
+[2024-09-06 08:26:55,190][01070] Num frames 4600...
+[2024-09-06 08:26:55,326][01070] Num frames 4700...
+[2024-09-06 08:26:55,391][01070] Avg episode rewards: #0: 17.675, true rewards: #0: 7.842
+[2024-09-06 08:26:55,392][01070] Avg episode reward: 17.675, avg true_objective: 7.842
+[2024-09-06 08:26:55,513][01070] Num frames 4800...
+[2024-09-06 08:26:55,637][01070] Num frames 4900...
+[2024-09-06 08:26:55,758][01070] Num frames 5000...
+[2024-09-06 08:26:55,827][01070] Avg episode rewards: #0: 15.870, true rewards: #0: 7.156
+[2024-09-06 08:26:55,828][01070] Avg episode reward: 15.870, avg true_objective: 7.156
+[2024-09-06 08:26:55,944][01070] Num frames 5100...
+[2024-09-06 08:26:56,065][01070] Num frames 5200...
+[2024-09-06 08:26:56,186][01070] Num frames 5300...
+[2024-09-06 08:26:56,305][01070] Num frames 5400...
+[2024-09-06 08:26:56,425][01070] Num frames 5500...
+[2024-09-06 08:26:56,557][01070] Num frames 5600...
+[2024-09-06 08:26:56,681][01070] Num frames 5700...
+[2024-09-06 08:26:56,803][01070] Num frames 5800...
+[2024-09-06 08:26:56,931][01070] Num frames 5900...
+[2024-09-06 08:26:57,053][01070] Num frames 6000...
+[2024-09-06 08:26:57,173][01070] Num frames 6100...
+[2024-09-06 08:26:57,294][01070] Num frames 6200...
+[2024-09-06 08:26:57,418][01070] Num frames 6300...
+[2024-09-06 08:26:57,547][01070] Num frames 6400...
+[2024-09-06 08:26:57,675][01070] Num frames 6500...
+[2024-09-06 08:26:57,795][01070] Num frames 6600...
+[2024-09-06 08:26:57,916][01070] Num frames 6700...
+[2024-09-06 08:26:58,045][01070] Num frames 6800...
+[2024-09-06 08:26:58,169][01070] Num frames 6900...
+[2024-09-06 08:26:58,260][01070] Avg episode rewards: #0: 19.911, true rewards: #0: 8.661
+[2024-09-06 08:26:58,261][01070] Avg episode reward: 19.911, avg true_objective: 8.661
+[2024-09-06 08:26:58,349][01070] Num frames 7000...
+[2024-09-06 08:26:58,470][01070] Num frames 7100...
+[2024-09-06 08:26:58,599][01070] Num frames 7200...
+[2024-09-06 08:26:58,720][01070] Num frames 7300...
+[2024-09-06 08:26:58,842][01070] Num frames 7400...
+[2024-09-06 08:26:58,969][01070] Num frames 7500...
+[2024-09-06 08:26:59,094][01070] Num frames 7600...
+[2024-09-06 08:26:59,214][01070] Num frames 7700...
+[2024-09-06 08:26:59,332][01070] Num frames 7800...
+[2024-09-06 08:26:59,457][01070] Num frames 7900...
+[2024-09-06 08:26:59,545][01070] Avg episode rewards: #0: 19.801, true rewards: #0: 8.801
+[2024-09-06 08:26:59,546][01070] Avg episode reward: 19.801, avg true_objective: 8.801
+[2024-09-06 08:26:59,649][01070] Num frames 8000...
+[2024-09-06 08:26:59,793][01070] Num frames 8100...
+[2024-09-06 08:26:59,916][01070] Num frames 8200...
+[2024-09-06 08:27:00,042][01070] Num frames 8300...
+[2024-09-06 08:27:00,161][01070] Num frames 8400...
+[2024-09-06 08:27:00,280][01070] Num frames 8500...
+[2024-09-06 08:27:00,411][01070] Avg episode rewards: #0: 18.961, true rewards: #0: 8.561
+[2024-09-06 08:27:00,412][01070] Avg episode reward: 18.961, avg true_objective: 8.561
+[2024-09-06 08:27:55,072][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-09-06 08:29:44,806][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-06 08:29:44,808][01070] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-06 08:29:44,810][01070] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-06 08:29:44,811][01070] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-06 08:29:44,813][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-06 08:29:44,814][01070] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-06 08:29:44,817][01070] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-09-06 08:29:44,818][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-06 08:29:44,820][01070] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-09-06 08:29:44,822][01070] Adding new argument 'hf_repository'='Re-Re/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-09-06 08:29:44,823][01070] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-06 08:29:44,826][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-06 08:29:44,827][01070] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-06 08:29:44,828][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-06 08:29:44,829][01070] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-06 08:29:44,858][01070] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-06 08:29:44,861][01070] RunningMeanStd input shape: (1,)
+[2024-09-06 08:29:44,875][01070] ConvEncoder: input_channels=3
+[2024-09-06 08:29:44,911][01070] Conv encoder output size: 512
+[2024-09-06 08:29:44,912][01070] Policy head output size: 512
+[2024-09-06 08:29:44,931][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-09-06 08:29:45,358][01070] Num frames 100...
+[2024-09-06 08:29:45,488][01070] Num frames 200...
+[2024-09-06 08:29:45,624][01070] Num frames 300...
+[2024-09-06 08:29:45,744][01070] Num frames 400...
+[2024-09-06 08:29:45,863][01070] Num frames 500...
+[2024-09-06 08:29:45,987][01070] Num frames 600...
+[2024-09-06 08:29:46,108][01070] Num frames 700...
+[2024-09-06 08:29:46,235][01070] Num frames 800...
+[2024-09-06 08:29:46,300][01070] Avg episode rewards: #0: 17.070, true rewards: #0: 8.070
+[2024-09-06 08:29:46,302][01070] Avg episode reward: 17.070, avg true_objective: 8.070
+[2024-09-06 08:29:46,416][01070] Num frames 900...
+[2024-09-06 08:29:46,542][01070] Num frames 1000...
+[2024-09-06 08:29:46,669][01070] Num frames 1100...
+[2024-09-06 08:29:46,794][01070] Num frames 1200...
+[2024-09-06 08:29:46,917][01070] Num frames 1300...
+[2024-09-06 08:29:47,039][01070] Num frames 1400...
+[2024-09-06 08:29:47,189][01070] Avg episode rewards: #0: 14.895, true rewards: #0: 7.395
+[2024-09-06 08:29:47,191][01070] Avg episode reward: 14.895, avg true_objective: 7.395
+[2024-09-06 08:29:47,218][01070] Num frames 1500...
+[2024-09-06 08:29:47,338][01070] Num frames 1600...
+[2024-09-06 08:29:47,461][01070] Num frames 1700...
+[2024-09-06 08:29:47,587][01070] Num frames 1800...
+[2024-09-06 08:29:47,708][01070] Num frames 1900...
+[2024-09-06 08:29:47,826][01070] Num frames 2000...
+[2024-09-06 08:29:47,910][01070] Avg episode rewards: #0: 12.410, true rewards: #0: 6.743
+[2024-09-06 08:29:47,913][01070] Avg episode reward: 12.410, avg true_objective: 6.743
+[2024-09-06 08:29:48,007][01070] Num frames 2100...
+[2024-09-06 08:29:48,128][01070] Num frames 2200...
+[2024-09-06 08:29:48,257][01070] Num frames 2300...
+[2024-09-06 08:29:48,380][01070] Num frames 2400...
+[2024-09-06 08:29:48,524][01070] Avg episode rewards: #0: 11.178, true rewards: #0: 6.177
+[2024-09-06 08:29:48,526][01070] Avg episode reward: 11.178, avg true_objective: 6.177
+[2024-09-06 08:29:48,562][01070] Num frames 2500...
+[2024-09-06 08:29:48,679][01070] Num frames 2600...
+[2024-09-06 08:29:48,802][01070] Num frames 2700...
+[2024-09-06 08:29:48,926][01070] Num frames 2800...
+[2024-09-06 08:29:49,047][01070] Num frames 2900...
+[2024-09-06 08:29:49,168][01070] Num frames 3000...
+[2024-09-06 08:29:49,296][01070] Num frames 3100...
+[2024-09-06 08:29:49,421][01070] Num frames 3200...
+[2024-09-06 08:29:49,551][01070] Num frames 3300...
+[2024-09-06 08:29:49,677][01070] Num frames 3400...
+[2024-09-06 08:29:49,816][01070] Avg episode rewards: #0: 12.736, true rewards: #0: 6.936
+[2024-09-06 08:29:49,819][01070] Avg episode reward: 12.736, avg true_objective: 6.936
+[2024-09-06 08:29:49,860][01070] Num frames 3500...
+[2024-09-06 08:29:49,981][01070] Num frames 3600...
+[2024-09-06 08:29:50,103][01070] Num frames 3700...
+[2024-09-06 08:29:50,228][01070] Num frames 3800...
+[2024-09-06 08:29:50,360][01070] Num frames 3900...
+[2024-09-06 08:29:50,439][01070] Avg episode rewards: #0: 11.862, true rewards: #0: 6.528
+[2024-09-06 08:29:50,441][01070] Avg episode reward: 11.862, avg true_objective: 6.528
+[2024-09-06 08:29:50,548][01070] Num frames 4000...
+[2024-09-06 08:29:50,713][01070] Num frames 4100...
+[2024-09-06 08:29:50,880][01070] Num frames 4200...
+[2024-09-06 08:29:51,043][01070] Num frames 4300...
+[2024-09-06 08:29:51,205][01070] Avg episode rewards: #0: 10.950, true rewards: #0: 6.236
+[2024-09-06 08:29:51,208][01070] Avg episode reward: 10.950, avg true_objective: 6.236
+[2024-09-06 08:29:51,274][01070] Num frames 4400...
+[2024-09-06 08:29:51,443][01070] Num frames 4500...
+[2024-09-06 08:29:51,608][01070] Num frames 4600...
+[2024-09-06 08:29:51,767][01070] Num frames 4700...
+[2024-09-06 08:29:51,938][01070] Num frames 4800...
+[2024-09-06 08:29:52,113][01070] Num frames 4900...
+[2024-09-06 08:29:52,291][01070] Avg episode rewards: #0: 10.716, true rewards: #0: 6.216
+[2024-09-06 08:29:52,294][01070] Avg episode reward: 10.716, avg true_objective: 6.216
+[2024-09-06 08:29:52,349][01070] Num frames 5000...
+[2024-09-06 08:29:52,549][01070] Num frames 5100...
+[2024-09-06 08:29:52,723][01070] Num frames 5200...
+[2024-09-06 08:29:52,895][01070] Num frames 5300...
+[2024-09-06 08:29:53,063][01070] Num frames 5400...
+[2024-09-06 08:29:53,183][01070] Num frames 5500...
+[2024-09-06 08:29:53,304][01070] Num frames 5600...
+[2024-09-06 08:29:53,437][01070] Num frames 5700...
+[2024-09-06 08:29:53,565][01070] Num frames 5800...
+[2024-09-06 08:29:53,688][01070] Num frames 5900...
+[2024-09-06 08:29:53,810][01070] Num frames 6000...
+[2024-09-06 08:29:53,930][01070] Num frames 6100...
+[2024-09-06 08:29:54,050][01070] Num frames 6200...
+[2024-09-06 08:29:54,174][01070] Num frames 6300...
+[2024-09-06 08:29:54,254][01070] Avg episode rewards: #0: 13.130, true rewards: #0: 7.019
+[2024-09-06 08:29:54,255][01070] Avg episode reward: 13.130, avg true_objective: 7.019
+[2024-09-06 08:29:54,356][01070] Num frames 6400...
+[2024-09-06 08:29:54,490][01070] Num frames 6500...
+[2024-09-06 08:29:54,609][01070] Num frames 6600...
+[2024-09-06 08:29:54,726][01070] Num frames 6700...
+[2024-09-06 08:29:54,842][01070] Num frames 6800...
+[2024-09-06 08:29:54,961][01070] Num frames 6900...
+[2024-09-06 08:29:55,080][01070] Num frames 7000...
+[2024-09-06 08:29:55,202][01070] Num frames 7100...
+[2024-09-06 08:29:55,323][01070] Num frames 7200...
+[2024-09-06 08:29:55,437][01070] Avg episode rewards: #0: 13.847, true rewards: #0: 7.247
+[2024-09-06 08:29:55,439][01070] Avg episode reward: 13.847, avg true_objective: 7.247
+[2024-09-06 08:30:39,348][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4!