diff --git "a/sf_log.txt" "b/sf_log.txt"
--- "a/sf_log.txt"
+++ "b/sf_log.txt"
@@ -1,50 +1,50 @@
-[2024-11-28 08:28:46,403][00195] Saving configuration to /content/train_dir/default_experiment/config.json...
-[2024-11-28 08:28:46,405][00195] Rollout worker 0 uses device cpu
-[2024-11-28 08:28:46,411][00195] Rollout worker 1 uses device cpu
-[2024-11-28 08:28:46,412][00195] Rollout worker 2 uses device cpu
-[2024-11-28 08:28:46,413][00195] Rollout worker 3 uses device cpu
-[2024-11-28 08:28:46,414][00195] Rollout worker 4 uses device cpu
-[2024-11-28 08:28:46,416][00195] Rollout worker 5 uses device cpu
-[2024-11-28 08:28:46,417][00195] Rollout worker 6 uses device cpu
-[2024-11-28 08:28:46,418][00195] Rollout worker 7 uses device cpu
-[2024-11-28 08:28:46,576][00195] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-11-28 08:28:46,578][00195] InferenceWorker_p0-w0: min num requests: 2
-[2024-11-28 08:28:46,614][00195] Starting all processes...
-[2024-11-28 08:28:46,616][00195] Starting process learner_proc0
-[2024-11-28 08:28:46,665][00195] Starting all processes...
-[2024-11-28 08:28:46,675][00195] Starting process inference_proc0-0
-[2024-11-28 08:28:46,676][00195] Starting process rollout_proc0
-[2024-11-28 08:28:46,680][00195] Starting process rollout_proc1
-[2024-11-28 08:28:46,680][00195] Starting process rollout_proc2
-[2024-11-28 08:28:46,681][00195] Starting process rollout_proc3
-[2024-11-28 08:28:46,681][00195] Starting process rollout_proc4
-[2024-11-28 08:28:46,681][00195] Starting process rollout_proc5
-[2024-11-28 08:28:46,681][00195] Starting process rollout_proc6
-[2024-11-28 08:28:46,681][00195] Starting process rollout_proc7
-[2024-11-28 08:29:04,320][02276] Worker 6 uses CPU cores [0]
-[2024-11-28 08:29:04,460][02251] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-11-28 08:29:04,467][02251] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2024-11-28 08:29:04,522][02251] Num visible devices: 1
-[2024-11-28 08:29:04,559][02251] Starting seed is not provided
-[2024-11-28 08:29:04,560][02251] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-11-28 08:29:04,561][02251] Initializing actor-critic model on device cuda:0
-[2024-11-28 08:29:04,562][02251] RunningMeanStd input shape: (3, 72, 128)
-[2024-11-28 08:29:04,565][02251] RunningMeanStd input shape: (1,)
-[2024-11-28 08:29:04,644][02251] ConvEncoder: input_channels=3
-[2024-11-28 08:29:04,709][02271] Worker 2 uses CPU cores [0]
-[2024-11-28 08:29:04,708][02270] Worker 1 uses CPU cores [1]
-[2024-11-28 08:29:04,714][02273] Worker 4 uses CPU cores [0]
-[2024-11-28 08:29:04,773][02269] Worker 0 uses CPU cores [0]
-[2024-11-28 08:29:04,840][02272] Worker 3 uses CPU cores [1]
-[2024-11-28 08:29:04,872][02268] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-11-28 08:29:04,873][02268] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2024-11-28 08:29:04,889][02275] Worker 7 uses CPU cores [1]
-[2024-11-28 08:29:04,891][02268] Num visible devices: 1
-[2024-11-28 08:29:04,956][02274] Worker 5 uses CPU cores [1]
-[2024-11-28 08:29:05,017][02251] Conv encoder output size: 512
-[2024-11-28 08:29:05,018][02251] Policy head output size: 512
-[2024-11-28 08:29:05,068][02251] Created Actor Critic model with architecture:
-[2024-11-28 08:29:05,068][02251] ActorCriticSharedWeights(
+[2024-12-01 11:06:37,002][02154] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-12-01 11:06:37,007][02154] Rollout worker 0 uses device cpu
+[2024-12-01 11:06:37,012][02154] Rollout worker 1 uses device cpu
+[2024-12-01 11:06:37,016][02154] Rollout worker 2 uses device cpu
+[2024-12-01 11:06:37,018][02154] Rollout worker 3 uses device cpu
+[2024-12-01 11:06:37,035][02154] Rollout worker 4 uses device cpu
+[2024-12-01 11:06:37,036][02154] Rollout worker 5 uses device cpu
+[2024-12-01 11:06:37,039][02154] Rollout worker 6 uses device cpu
+[2024-12-01 11:06:37,042][02154] Rollout worker 7 uses device cpu
+[2024-12-01 11:06:37,453][02154] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-12-01 11:06:37,458][02154] InferenceWorker_p0-w0: min num requests: 2
+[2024-12-01 11:06:37,569][02154] Starting all processes...
+[2024-12-01 11:06:37,576][02154] Starting process learner_proc0
+[2024-12-01 11:06:37,702][02154] Starting all processes...
+[2024-12-01 11:06:37,795][02154] Starting process inference_proc0-0
+[2024-12-01 11:06:37,796][02154] Starting process rollout_proc0
+[2024-12-01 11:06:37,801][02154] Starting process rollout_proc1
+[2024-12-01 11:06:37,801][02154] Starting process rollout_proc2
+[2024-12-01 11:06:37,801][02154] Starting process rollout_proc3
+[2024-12-01 11:06:37,801][02154] Starting process rollout_proc4
+[2024-12-01 11:06:37,801][02154] Starting process rollout_proc5
+[2024-12-01 11:06:37,801][02154] Starting process rollout_proc6
+[2024-12-01 11:06:37,801][02154] Starting process rollout_proc7
+[2024-12-01 11:06:54,464][04297] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-12-01 11:06:54,468][04297] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-12-01 11:06:54,558][04297] Num visible devices: 1
+[2024-12-01 11:06:54,614][04297] Starting seed is not provided
+[2024-12-01 11:06:54,616][04297] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-12-01 11:06:54,617][04297] Initializing actor-critic model on device cuda:0
+[2024-12-01 11:06:54,624][04297] RunningMeanStd input shape: (3, 72, 128)
+[2024-12-01 11:06:54,636][04297] RunningMeanStd input shape: (1,)
+[2024-12-01 11:06:54,714][04297] ConvEncoder: input_channels=3
+[2024-12-01 11:06:54,762][04318] Worker 7 uses CPU cores [1]
+[2024-12-01 11:06:54,951][04310] Worker 0 uses CPU cores [0]
+[2024-12-01 11:06:54,960][04311] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-12-01 11:06:54,960][04311] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-12-01 11:06:55,034][04311] Num visible devices: 1
+[2024-12-01 11:06:55,226][04314] Worker 3 uses CPU cores [1]
+[2024-12-01 11:06:55,240][04312] Worker 1 uses CPU cores [1]
+[2024-12-01 11:06:55,259][04313] Worker 2 uses CPU cores [0]
+[2024-12-01 11:06:55,286][04317] Worker 6 uses CPU cores [0]
+[2024-12-01 11:06:55,314][04315] Worker 4 uses CPU cores [0]
+[2024-12-01 11:06:55,337][04316] Worker 5 uses CPU cores [1]
+[2024-12-01 11:06:55,372][04297] Conv encoder output size: 512
+[2024-12-01 11:06:55,373][04297] Policy head output size: 512
+[2024-12-01 11:06:55,465][04297] Created Actor Critic model with architecture:
+[2024-12-01 11:06:55,465][04297] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -85,1029 +85,1010 @@
     (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
   )
 )
-[2024-11-28 08:29:05,458][02251] Using optimizer
-[2024-11-28 08:29:06,568][00195] Heartbeat connected on Batcher_0
-[2024-11-28 08:29:06,577][00195] Heartbeat connected on InferenceWorker_p0-w0
-[2024-11-28 08:29:06,586][00195] Heartbeat connected on RolloutWorker_w0
-[2024-11-28 08:29:06,590][00195] Heartbeat connected on RolloutWorker_w1
-[2024-11-28 08:29:06,599][00195] Heartbeat connected on RolloutWorker_w3
-[2024-11-28 08:29:06,600][00195] Heartbeat connected on RolloutWorker_w2
-[2024-11-28 08:29:06,604][00195] Heartbeat connected on RolloutWorker_w4
-[2024-11-28 08:29:06,607][00195] Heartbeat connected on RolloutWorker_w5
-[2024-11-28 08:29:06,610][00195] Heartbeat connected on RolloutWorker_w6
-[2024-11-28 08:29:06,614][00195] Heartbeat connected on RolloutWorker_w7
-[2024-11-28 08:29:09,858][02251] No checkpoints found
-[2024-11-28 08:29:09,859][02251] Did not load from checkpoint, starting from scratch!
-[2024-11-28 08:29:09,860][02251] Initialized policy 0 weights for model version 0
-[2024-11-28 08:29:09,870][02251] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-11-28 08:29:09,877][02251] LearnerWorker_p0 finished initialization!
-[2024-11-28 08:29:09,878][00195] Heartbeat connected on LearnerWorker_p0
-[2024-11-28 08:29:10,137][02268] RunningMeanStd input shape: (3, 72, 128)
-[2024-11-28 08:29:10,139][02268] RunningMeanStd input shape: (1,)
-[2024-11-28 08:29:10,158][02268] ConvEncoder: input_channels=3
-[2024-11-28 08:29:10,334][02268] Conv encoder output size: 512
-[2024-11-28 08:29:10,335][02268] Policy head output size: 512
-[2024-11-28 08:29:10,417][00195] Inference worker 0-0 is ready!
-[2024-11-28 08:29:10,419][00195] All inference workers are ready! Signal rollout workers to start!
-[2024-11-28 08:29:10,650][02276] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-11-28 08:29:10,651][02269] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-11-28 08:29:10,652][02273] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-11-28 08:29:10,653][02271] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-11-28 08:29:10,695][02272] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-11-28 08:29:10,699][02270] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-11-28 08:29:10,700][02274] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-11-28 08:29:10,703][02275] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-11-28 08:29:11,776][02274] Decorrelating experience for 0 frames...
-[2024-11-28 08:29:11,775][02275] Decorrelating experience for 0 frames...
-[2024-11-28 08:29:12,039][02269] Decorrelating experience for 0 frames...
-[2024-11-28 08:29:12,044][02276] Decorrelating experience for 0 frames...
-[2024-11-28 08:29:12,044][02273] Decorrelating experience for 0 frames...
-[2024-11-28 08:29:12,824][02275] Decorrelating experience for 32 frames...
-[2024-11-28 08:29:12,907][02270] Decorrelating experience for 0 frames...
-[2024-11-28 08:29:13,322][02272] Decorrelating experience for 0 frames...
-[2024-11-28 08:29:13,576][02273] Decorrelating experience for 32 frames...
-[2024-11-28 08:29:13,580][02269] Decorrelating experience for 32 frames...
-[2024-11-28 08:29:13,588][02276] Decorrelating experience for 32 frames...
-[2024-11-28 08:29:13,584][02271] Decorrelating experience for 0 frames...
-[2024-11-28 08:29:13,966][02274] Decorrelating experience for 32 frames...
-[2024-11-28 08:29:14,285][02275] Decorrelating experience for 64 frames...
-[2024-11-28 08:29:14,396][00195] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2024-11-28 08:29:14,526][02270] Decorrelating experience for 32 frames...
-[2024-11-28 08:29:14,915][02271] Decorrelating experience for 32 frames...
-[2024-11-28 08:29:15,226][02273] Decorrelating experience for 64 frames...
-[2024-11-28 08:29:15,234][02269] Decorrelating experience for 64 frames...
-[2024-11-28 08:29:15,375][02275] Decorrelating experience for 96 frames...
-[2024-11-28 08:29:15,686][02270] Decorrelating experience for 64 frames...
-[2024-11-28 08:29:16,085][02274] Decorrelating experience for 64 frames...
-[2024-11-28 08:29:16,580][02272] Decorrelating experience for 32 frames...
-[2024-11-28 08:29:17,117][02276] Decorrelating experience for 64 frames...
-[2024-11-28 08:29:17,119][02271] Decorrelating experience for 64 frames...
-[2024-11-28 08:29:17,244][02273] Decorrelating experience for 96 frames...
-[2024-11-28 08:29:17,667][02269] Decorrelating experience for 96 frames...
-[2024-11-28 08:29:19,337][02276] Decorrelating experience for 96 frames...
-[2024-11-28 08:29:19,358][02271] Decorrelating experience for 96 frames...
-[2024-11-28 08:29:19,397][00195] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2024-11-28 08:29:19,399][02274] Decorrelating experience for 96 frames...
-[2024-11-28 08:29:19,403][00195] Avg episode reward: [(0, '2.173')]
-[2024-11-28 08:29:19,868][02272] Decorrelating experience for 64 frames...
-[2024-11-28 08:29:20,507][02270] Decorrelating experience for 96 frames...
-[2024-11-28 08:29:24,400][00195] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 225.9. Samples: 2260. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2024-11-28 08:29:24,408][00195] Avg episode reward: [(0, '2.758')]
-[2024-11-28 08:29:24,555][02272] Decorrelating experience for 96 frames...
-[2024-11-28 08:29:24,599][02251] Signal inference workers to stop experience collection...
-[2024-11-28 08:29:24,633][02268] InferenceWorker_p0-w0: stopping experience collection
-[2024-11-28 08:29:27,096][02251] Signal inference workers to resume experience collection...
-[2024-11-28 08:29:27,097][02268] InferenceWorker_p0-w0: resuming experience collection
-[2024-11-28 08:29:29,397][00195] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 16384. Throughput: 0: 255.2. Samples: 3828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
-[2024-11-28 08:29:29,399][00195] Avg episode reward: [(0, '3.227')]
-[2024-11-28 08:29:34,396][00195] Fps is (10 sec: 3687.8, 60 sec: 1843.2, 300 sec: 1843.2). Total num frames: 36864. Throughput: 0: 334.7. Samples: 6694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-11-28 08:29:34,399][00195] Avg episode reward: [(0, '3.727')]
-[2024-11-28 08:29:34,989][02268] Updated weights for policy 0, policy_version 10 (0.0030)
-[2024-11-28 08:29:39,398][00195] Fps is (10 sec: 3685.7, 60 sec: 2129.8, 300 sec: 2129.8). Total num frames: 53248. Throughput: 0: 531.0. Samples: 13276. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-11-28 08:29:39,403][00195] Avg episode reward: [(0, '4.244')]
-[2024-11-28 08:29:44,397][00195] Fps is (10 sec: 3276.8, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 582.0. Samples: 17460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2024-11-28 08:29:44,401][00195] Avg episode reward: [(0, '4.388')]
-[2024-11-28 08:29:46,789][02268] Updated weights for policy 0, policy_version 20 (0.0040)
-[2024-11-28 08:29:49,397][00195] Fps is (10 sec: 3687.1, 60 sec: 2574.6, 300 sec: 2574.6). Total num frames: 90112. Throughput: 0: 590.1. Samples: 20652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-11-28 08:29:49,400][00195] Avg episode reward: [(0, '4.478')]
-[2024-11-28 08:29:54,396][00195] Fps is (10 sec: 4096.0, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 669.3. Samples: 26772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-11-28 08:29:54,400][00195] Avg episode reward: [(0, '4.395')]
-[2024-11-28 08:29:54,403][02251] Saving new best policy, reward=4.395!
-[2024-11-28 08:29:57,604][02268] Updated weights for policy 0, policy_version 30 (0.0019)
-[2024-11-28 08:29:59,397][00195] Fps is (10 sec: 3686.4, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 702.2. Samples: 31600. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-11-28 08:29:59,402][00195] Avg episode reward: [(0, '4.283')]
-[2024-11-28 08:30:04,397][00195] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 143360. Throughput: 0: 750.9. Samples: 33802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:30:04,399][00195] Avg episode reward: [(0, '4.352')]
-[2024-11-28 08:30:08,292][02268] Updated weights for policy 0, policy_version 40 (0.0019)
-[2024-11-28 08:30:09,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 852.5. Samples: 40620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:30:09,399][00195] Avg episode reward: [(0, '4.472')]
-[2024-11-28 08:30:09,406][02251] Saving new best policy, reward=4.472!
-[2024-11-28 08:30:14,398][00195] Fps is (10 sec: 4095.3, 60 sec: 3071.9, 300 sec: 3071.9). Total num frames: 184320. Throughput: 0: 946.3. Samples: 46414. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-11-28 08:30:14,406][00195] Avg episode reward: [(0, '4.416')]
-[2024-11-28 08:30:19,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3087.7). Total num frames: 200704. Throughput: 0: 927.9. Samples: 48448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-11-28 08:30:19,399][00195] Avg episode reward: [(0, '4.437')]
-[2024-11-28 08:30:20,102][02268] Updated weights for policy 0, policy_version 50 (0.0034)
-[2024-11-28 08:30:24,397][00195] Fps is (10 sec: 3687.0, 60 sec: 3686.6, 300 sec: 3159.8). Total num frames: 221184. Throughput: 0: 915.7. Samples: 54480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:30:24,399][00195] Avg episode reward: [(0, '4.506')]
-[2024-11-28 08:30:24,471][02251] Saving new best policy, reward=4.506!
-[2024-11-28 08:30:29,397][00195] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 971.4. Samples: 61172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:30:29,402][00195] Avg episode reward: [(0, '4.484')]
-[2024-11-28 08:30:29,580][02268] Updated weights for policy 0, policy_version 60 (0.0023)
-[2024-11-28 08:30:34,398][00195] Fps is (10 sec: 3685.8, 60 sec: 3686.3, 300 sec: 3225.5). Total num frames: 258048. Throughput: 0: 944.9. Samples: 63174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:30:34,404][00195] Avg episode reward: [(0, '4.588')]
-[2024-11-28 08:30:34,408][02251] Saving new best policy, reward=4.588!
-[2024-11-28 08:30:39,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3228.6). Total num frames: 274432. Throughput: 0: 917.7. Samples: 68068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:30:39,398][00195] Avg episode reward: [(0, '4.478')]
-[2024-11-28 08:30:39,408][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth...
-[2024-11-28 08:30:41,288][02268] Updated weights for policy 0, policy_version 70 (0.0021)
-[2024-11-28 08:30:44,397][00195] Fps is (10 sec: 4096.7, 60 sec: 3822.9, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 959.0. Samples: 74756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:30:44,398][00195] Avg episode reward: [(0, '4.357')]
-[2024-11-28 08:30:49,397][00195] Fps is (10 sec: 4095.7, 60 sec: 3754.6, 300 sec: 3319.9). Total num frames: 315392. Throughput: 0: 979.3. Samples: 77872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-11-28 08:30:49,404][00195] Avg episode reward: [(0, '4.453')]
-[2024-11-28 08:30:53,021][02268] Updated weights for policy 0, policy_version 80 (0.0019)
-[2024-11-28 08:30:54,397][00195] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3317.7). Total num frames: 331776. Throughput: 0: 920.9. Samples: 82062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-11-28 08:30:54,402][00195] Avg episode reward: [(0, '4.415')]
-[2024-11-28 08:30:59,396][00195] Fps is (10 sec: 4096.3, 60 sec: 3822.9, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 940.3. Samples: 88724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:30:59,401][00195] Avg episode reward: [(0, '4.372')]
-[2024-11-28 08:31:01,999][02268] Updated weights for policy 0, policy_version 90 (0.0015)
-[2024-11-28 08:31:04,402][00195] Fps is (10 sec: 4093.9, 60 sec: 3822.6, 300 sec: 3388.3). Total num frames: 372736. Throughput: 0: 966.9. Samples: 91966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:31:04,405][00195] Avg episode reward: [(0, '4.440')]
-[2024-11-28 08:31:09,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 940.9. Samples: 96822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:31:09,401][00195] Avg episode reward: [(0, '4.442')]
-[2024-11-28 08:31:13,771][02268] Updated weights for policy 0, policy_version 100 (0.0033)
-[2024-11-28 08:31:14,396][00195] Fps is (10 sec: 3688.5, 60 sec: 3754.8, 300 sec: 3413.3). Total num frames: 409600. Throughput: 0: 919.5. Samples: 102548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:31:14,401][00195] Avg episode reward: [(0, '4.408')]
-[2024-11-28 08:31:19,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3473.4). Total num frames: 434176. Throughput: 0: 951.1. Samples: 105974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-11-28 08:31:19,403][00195] Avg episode reward: [(0, '4.485')]
-[2024-11-28 08:31:24,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3434.3). Total num frames: 446464. Throughput: 0: 972.5. Samples: 111830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-11-28 08:31:24,402][00195] Avg episode reward: [(0, '4.426')]
-[2024-11-28 08:31:24,524][02268] Updated weights for policy 0, policy_version 110 (0.0039)
-[2024-11-28 08:31:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 931.0. Samples: 116652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:31:29,403][00195] Avg episode reward: [(0, '4.378')]
-[2024-11-28 08:31:34,397][00195] Fps is (10 sec: 4095.7, 60 sec: 3823.0, 300 sec: 3481.6). Total num frames: 487424. Throughput: 0: 935.3. Samples: 119960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-11-28 08:31:34,404][00195] Avg episode reward: [(0, '4.393')]
-[2024-11-28 08:31:34,571][02268] Updated weights for policy 0, policy_version 120 (0.0027)
-[2024-11-28 08:31:39,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 989.1. Samples: 126570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:31:39,399][00195] Avg episode reward: [(0, '4.437')]
-[2024-11-28 08:31:44,396][00195] Fps is (10 sec: 3277.0, 60 sec: 3686.4, 300 sec: 3467.9). Total num frames: 520192. Throughput: 0: 935.0. Samples: 130798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:31:44,399][00195] Avg episode reward: [(0, '4.362')]
-[2024-11-28 08:31:46,120][02268] Updated weights for policy 0, policy_version 130 (0.0023)
-[2024-11-28 08:31:49,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3514.6). Total num frames: 544768. Throughput: 0: 933.4. Samples: 133962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-11-28 08:31:49,399][00195] Avg episode reward: [(0, '4.470')]
-[2024-11-28 08:31:54,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3532.8). Total num frames: 565248. Throughput: 0: 977.0. Samples: 140786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:31:54,401][00195] Avg episode reward: [(0, '4.399')]
-[2024-11-28 08:31:55,889][02268] Updated weights for policy 0, policy_version 140 (0.0014)
-[2024-11-28 08:31:59,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3525.0). Total num frames: 581632. Throughput: 0: 956.8. Samples: 145606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:31:59,402][00195] Avg episode reward: [(0, '4.375')]
-[2024-11-28 08:32:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3517.7). Total num frames: 598016. Throughput: 0: 927.2. Samples: 147698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:32:04,398][00195] Avg episode reward: [(0, '4.389')]
-[2024-11-28 08:32:07,123][02268] Updated weights for policy 0, policy_version 150 (0.0016)
-[2024-11-28 08:32:09,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3557.7). Total num frames: 622592. Throughput: 0: 949.2. Samples: 154546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:32:09,404][00195] Avg episode reward: [(0, '4.396')]
-[2024-11-28 08:32:14,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 638976. Throughput: 0: 970.6. Samples: 160330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-11-28 08:32:14,399][00195] Avg episode reward: [(0, '4.395')]
-[2024-11-28 08:32:18,987][02268] Updated weights for policy 0, policy_version 160 (0.0027)
-[2024-11-28 08:32:19,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3542.5). Total num frames: 655360. Throughput: 0: 944.0. Samples: 162438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:32:19,404][00195] Avg episode reward: [(0, '4.373')]
-[2024-11-28 08:32:24,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3578.6). Total num frames: 679936. Throughput: 0: 929.7. Samples: 168408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-11-28 08:32:24,399][00195] Avg episode reward: [(0, '4.616')]
-[2024-11-28 08:32:24,406][02251] Saving new best policy, reward=4.616!
-[2024-11-28 08:32:27,716][02268] Updated weights for policy 0, policy_version 170 (0.0016)
-[2024-11-28 08:32:29,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3591.9). Total num frames: 700416. Throughput: 0: 988.8. Samples: 175292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-11-28 08:32:29,399][00195] Avg episode reward: [(0, '4.684')]
-[2024-11-28 08:32:29,408][02251] Saving new best policy, reward=4.684!
-[2024-11-28 08:32:34,397][00195] Fps is (10 sec: 3276.6, 60 sec: 3754.7, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 962.4. Samples: 177270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:32:34,400][00195] Avg episode reward: [(0, '4.497')]
-[2024-11-28 08:32:39,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 921.6. Samples: 182258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-11-28 08:32:39,399][00195] Avg episode reward: [(0, '4.243')]
-[2024-11-28 08:32:39,412][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth...
-[2024-11-28 08:32:39,957][02268] Updated weights for policy 0, policy_version 180 (0.0027)
-[2024-11-28 08:32:44,396][00195] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3608.4). Total num frames: 757760. Throughput: 0: 964.7. Samples: 189018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:32:44,399][00195] Avg episode reward: [(0, '4.377')]
-[2024-11-28 08:32:49,400][00195] Fps is (10 sec: 3685.0, 60 sec: 3754.4, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 988.5. Samples: 192186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:32:49,403][00195] Avg episode reward: [(0, '4.664')]
-[2024-11-28 08:32:50,916][02268] Updated weights for policy 0, policy_version 190 (0.0022)
-[2024-11-28 08:32:54,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3593.3). Total num frames: 790528. Throughput: 0: 930.7. Samples: 196428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:32:54,398][00195] Avg episode reward: [(0, '4.635')]
-[2024-11-28 08:32:59,397][00195] Fps is (10 sec: 4097.4, 60 sec: 3822.9, 300 sec: 3604.5). Total num frames: 811008. Throughput: 0: 952.3. Samples: 203186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-11-28 08:32:59,399][00195] Avg episode reward: [(0, '4.481')]
-[2024-11-28 08:33:00,313][02268] Updated weights for policy 0, policy_version 200 (0.0015)
-[2024-11-28 08:33:04,397][00195] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3615.2). Total num frames: 831488. Throughput: 0: 979.2. Samples: 206504. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-11-28 08:33:04,401][00195] Avg episode reward: [(0, '4.461')]
-[2024-11-28 08:33:09,397][00195] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3590.5). Total num frames: 843776. Throughput: 0: 949.4. Samples: 211130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-11-28 08:33:09,404][00195] Avg episode reward: [(0, '4.604')]
-[2024-11-28 08:33:14,081][02268] Updated weights for policy 0, policy_version 210 (0.0027)
-[2024-11-28 08:33:14,397][00195] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3584.0). Total num frames: 860160. Throughput: 0: 889.8. Samples: 215334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:33:14,401][00195] Avg episode reward: [(0, '4.467')]
-[2024-11-28 08:33:19,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3577.7). Total num frames: 876544. Throughput: 0: 894.5. Samples: 217522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-11-28 08:33:19,399][00195] Avg episode reward: [(0, '4.614')]
-[2024-11-28 08:33:24,398][00195] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 902.3. Samples: 222862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-11-28 08:33:24,403][00195] Avg episode reward: [(0, '4.624')]
-[2024-11-28 08:33:26,715][02268] Updated weights for policy 0, policy_version 220 (0.0024)
-[2024-11-28 08:33:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3565.9). Total num frames: 909312. Throughput: 0: 860.0. Samples: 227718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:33:29,403][00195] Avg episode reward: [(0, '4.620')]
-[2024-11-28 08:33:34,396][00195] Fps is (10 sec: 4096.7, 60 sec: 3686.4, 300 sec: 3591.9). Total num frames: 933888. Throughput: 0: 865.7. Samples: 231138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-11-28 08:33:34,402][00195] Avg episode reward: [(0, '4.471')]
-[2024-11-28 08:33:35,932][02268] Updated weights for policy 0, policy_version 230 (0.0022)
-[2024-11-28 08:33:39,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 915.3. Samples: 237616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:33:39,399][00195] Avg episode reward: [(0, '4.529')]
-[2024-11-28 08:33:44,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3580.2). Total num frames: 966656. Throughput: 0: 857.3. Samples: 241762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
-[2024-11-28 08:33:44,398][00195] Avg episode reward: [(0, '4.545')]
-[2024-11-28 08:33:47,880][02268] Updated weights for policy 0, policy_version 240 (0.0035)
-[2024-11-28 08:33:49,399][00195] Fps is (10 sec: 3685.5, 60 sec: 3618.2, 300 sec: 3589.6). Total num frames: 987136. Throughput: 0: 851.0. Samples: 244800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:33:49,404][00195] Avg episode reward: [(0, '4.359')]
-[2024-11-28 08:33:54,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3613.3). Total num frames: 1011712. Throughput: 0: 901.0. Samples: 251674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:33:54,402][00195] Avg episode reward: [(0, '4.578')]
-[2024-11-28 08:33:58,082][02268] Updated weights for policy 0, policy_version 250 (0.0018)
-[2024-11-28 08:33:59,396][00195] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3593.0). Total num frames: 1024000. Throughput: 0: 920.6. Samples: 256762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-11-28 08:33:59,401][00195] Avg episode reward: [(0, '4.831')]
-[2024-11-28 08:33:59,413][02251] Saving new best policy, reward=4.831!
-[2024-11-28 08:34:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3601.7). Total num frames: 1044480. Throughput: 0: 919.4. Samples: 258894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:34:04,401][00195] Avg episode reward: [(0, '4.669')]
-[2024-11-28 08:34:08,353][02268] Updated weights for policy 0, policy_version 260 (0.0018)
-[2024-11-28 08:34:09,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 1069056. Throughput: 0: 953.0. Samples: 265744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:34:09,401][00195] Avg episode reward: [(0, '4.407')]
-[2024-11-28 08:34:14,399][00195] Fps is (10 sec: 4095.0, 60 sec: 3754.6, 300 sec: 3679.4). Total num frames: 1085440. Throughput: 0: 977.4. Samples: 271702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:34:14,401][00195] Avg episode reward: [(0, '4.758')]
-[2024-11-28 08:34:19,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 947.2. Samples: 273764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:34:19,398][00195] Avg episode reward: [(0, '4.826')]
-[2024-11-28 08:34:20,041][02268] Updated weights for policy 0, policy_version 270 (0.0014)
-[2024-11-28 08:34:24,396][00195] Fps is (10 sec: 4097.0, 60 sec: 3891.3, 300 sec: 3762.8). Total num frames: 1126400. Throughput: 0: 939.5. Samples: 279894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:34:24,399][00195] Avg episode reward: [(0, '4.686')]
-[2024-11-28 08:34:29,283][02268] Updated weights for policy 0, policy_version 280 (0.0026)
-[2024-11-28 08:34:29,403][00195] Fps is (10 sec: 4502.6, 60 sec: 3959.0, 300 sec: 3762.7). Total num frames: 1146880. Throughput: 0: 998.2. Samples: 286688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:34:29,406][00195] Avg episode reward: [(0, '4.658')]
-[2024-11-28 08:34:34,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1159168. Throughput: 0: 976.0. Samples: 288716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:34:34,401][00195] Avg episode reward: [(0, '4.553')]
-[2024-11-28 08:34:39,397][00195] Fps is (10 sec: 3279.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1179648. Throughput: 0: 934.7. Samples: 293736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:34:39,401][00195] Avg episode reward: [(0, '4.513')]
-[2024-11-28 08:34:39,412][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth...
-[2024-11-28 08:34:39,538][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth
-[2024-11-28 08:34:40,832][02268] Updated weights for policy 0, policy_version 290 (0.0030)
-[2024-11-28 08:34:44,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1204224. Throughput: 0: 971.7. Samples: 300490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:34:44,399][00195] Avg episode reward: [(0, '4.423')]
-[2024-11-28 08:34:49,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3748.9). Total num frames: 1216512. Throughput: 0: 989.0. Samples: 303400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:34:49,399][00195] Avg episode reward: [(0, '4.553')]
-[2024-11-28 08:34:52,707][02268] Updated weights for policy 0, policy_version 300 (0.0041)
-[2024-11-28 08:34:54,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1236992. Throughput: 0: 928.0. Samples: 307502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:34:54,399][00195] Avg episode reward: [(0, '4.876')]
-[2024-11-28 08:34:54,403][02251] Saving new best policy, reward=4.876!
-[2024-11-28 08:34:59,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1257472. Throughput: 0: 947.2. Samples: 314322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-11-28 08:34:59,399][00195] Avg episode reward: [(0, '4.982')]
-[2024-11-28 08:34:59,409][02251] Saving new best policy, reward=4.982!
-[2024-11-28 08:35:01,827][02268] Updated weights for policy 0, policy_version 310 (0.0041)
-[2024-11-28 08:35:04,404][00195] Fps is (10 sec: 4092.9, 60 sec: 3890.7, 300 sec: 3762.7). Total num frames: 1277952. Throughput: 0: 976.1. Samples: 317694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:35:04,407][00195] Avg episode reward: [(0, '4.835')]
-[2024-11-28 08:35:09,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1290240. Throughput: 0: 943.0. Samples: 322328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:35:09,402][00195] Avg episode reward: [(0, '4.703')]
-[2024-11-28 08:35:13,672][02268] Updated weights for policy 0, policy_version 320 (0.0024)
-[2024-11-28 08:35:14,396][00195] Fps is (10 sec: 3279.3, 60 sec: 3754.8, 300 sec: 3762.8). Total num frames: 1310720. Throughput: 0: 921.4. Samples: 328144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:35:14,399][00195] Avg episode reward: [(0, '4.770')]
-[2024-11-28 08:35:19,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1335296. Throughput: 0: 953.6. Samples: 331630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-11-28 08:35:19,398][00195] Avg episode reward: [(0, '4.880')]
-[2024-11-28 08:35:24,098][02268] Updated weights for policy 0, policy_version 330 (0.0021)
-[2024-11-28 08:35:24,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1351680. Throughput: 0: 971.8. Samples: 337466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:35:24,401][00195] Avg episode reward: [(0, '4.857')]
-[2024-11-28 08:35:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.8, 300 sec: 3762.8). Total num frames: 1368064. Throughput: 0: 930.1. Samples: 342346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-11-28 08:35:29,405][00195] Avg episode reward: [(0, '4.802')]
-[2024-11-28 08:35:33,967][02268] Updated weights for policy 0, policy_version 340 (0.0020)
-[2024-11-28 08:35:34,399][00195] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3790.5). Total num frames: 1392640. Throughput: 0: 940.0. Samples: 345704.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:35:34,403][00195] Avg episode reward: [(0, '4.791')] -[2024-11-28 08:35:39,400][00195] Fps is (10 sec: 4094.5, 60 sec: 3822.7, 300 sec: 3762.7). Total num frames: 1409024. Throughput: 0: 993.1. Samples: 352194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:35:39,403][00195] Avg episode reward: [(0, '4.716')] -[2024-11-28 08:35:44,396][00195] Fps is (10 sec: 3277.6, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1425408. Throughput: 0: 934.0. Samples: 356354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:35:44,404][00195] Avg episode reward: [(0, '4.752')] -[2024-11-28 08:35:45,884][02268] Updated weights for policy 0, policy_version 350 (0.0017) -[2024-11-28 08:35:49,397][00195] Fps is (10 sec: 3687.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1445888. Throughput: 0: 931.0. Samples: 359582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:35:49,403][00195] Avg episode reward: [(0, '4.883')] -[2024-11-28 08:35:54,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1470464. Throughput: 0: 978.0. Samples: 366336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:35:54,398][00195] Avg episode reward: [(0, '5.083')] -[2024-11-28 08:35:54,403][02251] Saving new best policy, reward=5.083! -[2024-11-28 08:35:55,665][02268] Updated weights for policy 0, policy_version 360 (0.0015) -[2024-11-28 08:35:59,403][00195] Fps is (10 sec: 3683.9, 60 sec: 3754.2, 300 sec: 3762.8). Total num frames: 1482752. Throughput: 0: 954.6. Samples: 371106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2024-11-28 08:35:59,410][00195] Avg episode reward: [(0, '4.896')] -[2024-11-28 08:36:04,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3755.1, 300 sec: 3776.7). Total num frames: 1503232. Throughput: 0: 926.2. Samples: 373308. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-11-28 08:36:04,399][00195] Avg episode reward: [(0, '4.951')] -[2024-11-28 08:36:07,071][02268] Updated weights for policy 0, policy_version 370 (0.0025) -[2024-11-28 08:36:09,397][00195] Fps is (10 sec: 4098.7, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 1523712. Throughput: 0: 946.2. Samples: 380044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-11-28 08:36:09,399][00195] Avg episode reward: [(0, '5.023')] -[2024-11-28 08:36:14,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1540096. Throughput: 0: 966.9. Samples: 385858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-11-28 08:36:14,403][00195] Avg episode reward: [(0, '4.944')] -[2024-11-28 08:36:18,674][02268] Updated weights for policy 0, policy_version 380 (0.0044) -[2024-11-28 08:36:19,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1556480. Throughput: 0: 939.6. Samples: 387984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:36:19,403][00195] Avg episode reward: [(0, '5.061')] -[2024-11-28 08:36:24,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1581056. Throughput: 0: 931.0. Samples: 394086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:36:24,399][00195] Avg episode reward: [(0, '4.747')] -[2024-11-28 08:36:27,618][02268] Updated weights for policy 0, policy_version 390 (0.0021) -[2024-11-28 08:36:29,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1601536. Throughput: 0: 991.4. Samples: 400968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2024-11-28 08:36:29,399][00195] Avg episode reward: [(0, '4.793')] -[2024-11-28 08:36:34,401][00195] Fps is (10 sec: 3275.4, 60 sec: 3686.3, 300 sec: 3748.8). Total num frames: 1613824. Throughput: 0: 964.4. Samples: 402986. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2024-11-28 08:36:34,403][00195] Avg episode reward: [(0, '4.944')] -[2024-11-28 08:36:39,320][02268] Updated weights for policy 0, policy_version 400 (0.0031) -[2024-11-28 08:36:39,400][00195] Fps is (10 sec: 3685.1, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1638400. Throughput: 0: 926.1. Samples: 408012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:36:39,403][00195] Avg episode reward: [(0, '5.264')] -[2024-11-28 08:36:39,414][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth... -[2024-11-28 08:36:39,532][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth -[2024-11-28 08:36:39,550][02251] Saving new best policy, reward=5.264! -[2024-11-28 08:36:44,396][00195] Fps is (10 sec: 4507.5, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1658880. Throughput: 0: 971.2. Samples: 414804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:36:44,400][00195] Avg episode reward: [(0, '5.291')] -[2024-11-28 08:36:44,404][02251] Saving new best policy, reward=5.291! -[2024-11-28 08:36:49,398][00195] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3762.7). Total num frames: 1675264. Throughput: 0: 985.4. Samples: 417652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-11-28 08:36:49,413][00195] Avg episode reward: [(0, '5.184')] -[2024-11-28 08:36:50,487][02268] Updated weights for policy 0, policy_version 410 (0.0043) -[2024-11-28 08:36:54,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1691648. Throughput: 0: 931.3. Samples: 421952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:36:54,404][00195] Avg episode reward: [(0, '5.129')] -[2024-11-28 08:36:59,396][00195] Fps is (10 sec: 4096.5, 60 sec: 3891.6, 300 sec: 3790.5). Total num frames: 1716224. Throughput: 0: 959.8. Samples: 429050. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:36:59,398][00195] Avg episode reward: [(0, '5.233')] -[2024-11-28 08:36:59,862][02268] Updated weights for policy 0, policy_version 420 (0.0021) -[2024-11-28 08:37:04,397][00195] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1736704. Throughput: 0: 989.6. Samples: 432518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:37:04,400][00195] Avg episode reward: [(0, '5.147')] -[2024-11-28 08:37:09,397][00195] Fps is (10 sec: 3276.5, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 1748992. Throughput: 0: 954.8. Samples: 437052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:37:09,404][00195] Avg episode reward: [(0, '5.163')] -[2024-11-28 08:37:11,543][02268] Updated weights for policy 0, policy_version 430 (0.0015) -[2024-11-28 08:37:14,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1773568. Throughput: 0: 934.7. Samples: 443028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:37:14,399][00195] Avg episode reward: [(0, '5.476')] -[2024-11-28 08:37:14,408][02251] Saving new best policy, reward=5.476! -[2024-11-28 08:37:19,397][00195] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1794048. Throughput: 0: 965.9. Samples: 446448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:37:19,402][00195] Avg episode reward: [(0, '5.682')] -[2024-11-28 08:37:19,411][02251] Saving new best policy, reward=5.682! -[2024-11-28 08:37:21,204][02268] Updated weights for policy 0, policy_version 440 (0.0024) -[2024-11-28 08:37:24,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1810432. Throughput: 0: 976.0. Samples: 451930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:37:24,401][00195] Avg episode reward: [(0, '5.462')] -[2024-11-28 08:37:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). 
Total num frames: 1826816. Throughput: 0: 938.6. Samples: 457042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:37:29,404][00195] Avg episode reward: [(0, '5.400')] -[2024-11-28 08:37:32,099][02268] Updated weights for policy 0, policy_version 450 (0.0035) -[2024-11-28 08:37:34,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3790.5). Total num frames: 1851392. Throughput: 0: 950.7. Samples: 460434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:37:34,404][00195] Avg episode reward: [(0, '5.745')] -[2024-11-28 08:37:34,413][02251] Saving new best policy, reward=5.745! -[2024-11-28 08:37:39,399][00195] Fps is (10 sec: 4094.9, 60 sec: 3823.0, 300 sec: 3762.7). Total num frames: 1867776. Throughput: 0: 994.6. Samples: 466712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:37:39,403][00195] Avg episode reward: [(0, '5.710')] -[2024-11-28 08:37:44,025][02268] Updated weights for policy 0, policy_version 460 (0.0031) -[2024-11-28 08:37:44,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1884160. Throughput: 0: 930.0. Samples: 470898. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-11-28 08:37:44,404][00195] Avg episode reward: [(0, '5.802')] -[2024-11-28 08:37:44,407][02251] Saving new best policy, reward=5.802! -[2024-11-28 08:37:49,397][00195] Fps is (10 sec: 3687.4, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 1904640. Throughput: 0: 924.4. Samples: 474118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:37:49,403][00195] Avg episode reward: [(0, '6.391')] -[2024-11-28 08:37:49,412][02251] Saving new best policy, reward=6.391! -[2024-11-28 08:37:53,198][02268] Updated weights for policy 0, policy_version 470 (0.0021) -[2024-11-28 08:37:54,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 1929216. Throughput: 0: 973.8. Samples: 480872. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:37:54,401][00195] Avg episode reward: [(0, '6.495')] -[2024-11-28 08:37:54,403][02251] Saving new best policy, reward=6.495! -[2024-11-28 08:37:59,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1941504. Throughput: 0: 944.8. Samples: 485546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:37:59,401][00195] Avg episode reward: [(0, '6.194')] -[2024-11-28 08:38:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1961984. Throughput: 0: 922.2. Samples: 487948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:38:04,405][00195] Avg episode reward: [(0, '6.302')] -[2024-11-28 08:38:04,907][02268] Updated weights for policy 0, policy_version 480 (0.0019) -[2024-11-28 08:38:09,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 1982464. Throughput: 0: 951.8. Samples: 494762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:38:09,399][00195] Avg episode reward: [(0, '6.429')] -[2024-11-28 08:38:14,399][00195] Fps is (10 sec: 3276.0, 60 sec: 3686.2, 300 sec: 3790.5). Total num frames: 1994752. Throughput: 0: 933.3. Samples: 499042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:38:14,402][00195] Avg episode reward: [(0, '6.653')] -[2024-11-28 08:38:14,406][02251] Saving new best policy, reward=6.653! -[2024-11-28 08:38:19,280][02268] Updated weights for policy 0, policy_version 490 (0.0040) -[2024-11-28 08:38:19,396][00195] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3776.7). Total num frames: 2007040. Throughput: 0: 892.6. Samples: 500600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:38:19,399][00195] Avg episode reward: [(0, '6.352')] -[2024-11-28 08:38:24,397][00195] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 2027520. Throughput: 0: 857.6. Samples: 505302. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:38:24,409][00195] Avg episode reward: [(0, '6.584')] -[2024-11-28 08:38:28,691][02268] Updated weights for policy 0, policy_version 500 (0.0018) -[2024-11-28 08:38:29,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2048000. Throughput: 0: 919.1. Samples: 512258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:38:29,399][00195] Avg episode reward: [(0, '7.282')] -[2024-11-28 08:38:29,409][02251] Saving new best policy, reward=7.282! -[2024-11-28 08:38:34,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3776.7). Total num frames: 2064384. Throughput: 0: 914.5. Samples: 515272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:38:34,403][00195] Avg episode reward: [(0, '7.865')] -[2024-11-28 08:38:34,406][02251] Saving new best policy, reward=7.865! -[2024-11-28 08:38:39,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3776.7). Total num frames: 2080768. Throughput: 0: 853.4. Samples: 519276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:38:39,399][00195] Avg episode reward: [(0, '7.578')] -[2024-11-28 08:38:39,406][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000508_2080768.pth... -[2024-11-28 08:38:39,557][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth -[2024-11-28 08:38:40,821][02268] Updated weights for policy 0, policy_version 510 (0.0040) -[2024-11-28 08:38:44,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2101248. Throughput: 0: 893.9. Samples: 525770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) -[2024-11-28 08:38:44,399][00195] Avg episode reward: [(0, '7.313')] -[2024-11-28 08:38:49,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 2121728. Throughput: 0: 915.3. Samples: 529138. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:38:49,402][00195] Avg episode reward: [(0, '7.074')] -[2024-11-28 08:38:51,008][02268] Updated weights for policy 0, policy_version 520 (0.0021) -[2024-11-28 08:38:54,397][00195] Fps is (10 sec: 3686.1, 60 sec: 3481.6, 300 sec: 3776.6). Total num frames: 2138112. Throughput: 0: 874.4. Samples: 534110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:38:54,400][00195] Avg episode reward: [(0, '7.200')] -[2024-11-28 08:38:59,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2158592. Throughput: 0: 909.7. Samples: 539978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:38:59,399][00195] Avg episode reward: [(0, '7.859')] -[2024-11-28 08:39:01,369][02268] Updated weights for policy 0, policy_version 530 (0.0023) -[2024-11-28 08:39:04,397][00195] Fps is (10 sec: 4505.9, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 2183168. Throughput: 0: 951.7. Samples: 543428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:39:04,399][00195] Avg episode reward: [(0, '8.011')] -[2024-11-28 08:39:04,404][02251] Saving new best policy, reward=8.011! -[2024-11-28 08:39:09,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 2195456. Throughput: 0: 976.0. Samples: 549222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:39:09,399][00195] Avg episode reward: [(0, '7.563')] -[2024-11-28 08:39:12,992][02268] Updated weights for policy 0, policy_version 540 (0.0019) -[2024-11-28 08:39:14,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3776.7). Total num frames: 2215936. Throughput: 0: 928.5. Samples: 554040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:39:14,399][00195] Avg episode reward: [(0, '7.246')] -[2024-11-28 08:39:19,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2240512. Throughput: 0: 938.6. Samples: 557508. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:39:19,398][00195] Avg episode reward: [(0, '7.660')] -[2024-11-28 08:39:21,954][02268] Updated weights for policy 0, policy_version 550 (0.0022) -[2024-11-28 08:39:24,410][00195] Fps is (10 sec: 4090.6, 60 sec: 3822.1, 300 sec: 3762.7). Total num frames: 2256896. Throughput: 0: 999.4. Samples: 564264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-11-28 08:39:24,415][00195] Avg episode reward: [(0, '8.159')] -[2024-11-28 08:39:24,418][02251] Saving new best policy, reward=8.159! -[2024-11-28 08:39:29,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2273280. Throughput: 0: 947.8. Samples: 568420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-11-28 08:39:29,404][00195] Avg episode reward: [(0, '8.909')] -[2024-11-28 08:39:29,416][02251] Saving new best policy, reward=8.909! -[2024-11-28 08:39:33,466][02268] Updated weights for policy 0, policy_version 560 (0.0029) -[2024-11-28 08:39:34,396][00195] Fps is (10 sec: 4101.5, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2297856. Throughput: 0: 942.6. Samples: 571556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-11-28 08:39:34,399][00195] Avg episode reward: [(0, '9.134')] -[2024-11-28 08:39:34,403][02251] Saving new best policy, reward=9.134! -[2024-11-28 08:39:39,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2318336. Throughput: 0: 984.8. Samples: 578426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:39:39,401][00195] Avg episode reward: [(0, '9.669')] -[2024-11-28 08:39:39,416][02251] Saving new best policy, reward=9.669! -[2024-11-28 08:39:44,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2330624. Throughput: 0: 959.4. Samples: 583150. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:39:44,399][00195] Avg episode reward: [(0, '9.713')] -[2024-11-28 08:39:44,406][02251] Saving new best policy, reward=9.713! -[2024-11-28 08:39:44,860][02268] Updated weights for policy 0, policy_version 570 (0.0027) -[2024-11-28 08:39:49,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2351104. Throughput: 0: 932.8. Samples: 585402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:39:49,404][00195] Avg episode reward: [(0, '10.444')] -[2024-11-28 08:39:49,415][02251] Saving new best policy, reward=10.444! -[2024-11-28 08:39:54,366][02268] Updated weights for policy 0, policy_version 580 (0.0018) -[2024-11-28 08:39:54,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2375680. Throughput: 0: 956.3. Samples: 592256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) -[2024-11-28 08:39:54,399][00195] Avg episode reward: [(0, '9.718')] -[2024-11-28 08:39:59,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2392064. Throughput: 0: 980.0. Samples: 598142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:39:59,403][00195] Avg episode reward: [(0, '9.579')] -[2024-11-28 08:40:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2408448. Throughput: 0: 949.7. Samples: 600244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:40:04,398][00195] Avg episode reward: [(0, '9.062')] -[2024-11-28 08:40:05,888][02268] Updated weights for policy 0, policy_version 590 (0.0024) -[2024-11-28 08:40:09,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2428928. Throughput: 0: 941.0. Samples: 606598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:40:09,403][00195] Avg episode reward: [(0, '9.228')] -[2024-11-28 08:40:14,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). 
Total num frames: 2449408. Throughput: 0: 996.5. Samples: 613262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:40:14,402][00195] Avg episode reward: [(0, '8.935')] -[2024-11-28 08:40:15,965][02268] Updated weights for policy 0, policy_version 600 (0.0038) -[2024-11-28 08:40:19,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2465792. Throughput: 0: 973.2. Samples: 615352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:40:19,399][00195] Avg episode reward: [(0, '8.619')] -[2024-11-28 08:40:24,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3823.8, 300 sec: 3790.5). Total num frames: 2486272. Throughput: 0: 935.5. Samples: 620522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:40:24,399][00195] Avg episode reward: [(0, '8.585')] -[2024-11-28 08:40:26,460][02268] Updated weights for policy 0, policy_version 610 (0.0021) -[2024-11-28 08:40:29,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.6). Total num frames: 2510848. Throughput: 0: 983.2. Samples: 627396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:40:29,399][00195] Avg episode reward: [(0, '8.622')] -[2024-11-28 08:40:34,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 2527232. Throughput: 0: 998.7. Samples: 630342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:40:34,399][00195] Avg episode reward: [(0, '8.701')] -[2024-11-28 08:40:38,003][02268] Updated weights for policy 0, policy_version 620 (0.0019) -[2024-11-28 08:40:39,399][00195] Fps is (10 sec: 3275.9, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 2543616. Throughput: 0: 942.7. Samples: 634682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:40:39,403][00195] Avg episode reward: [(0, '9.028')] -[2024-11-28 08:40:39,420][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth... 
-[2024-11-28 08:40:39,558][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth -[2024-11-28 08:40:44,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2568192. Throughput: 0: 962.5. Samples: 641456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:40:44,399][00195] Avg episode reward: [(0, '8.962')] -[2024-11-28 08:40:46,955][02268] Updated weights for policy 0, policy_version 630 (0.0021) -[2024-11-28 08:40:49,397][00195] Fps is (10 sec: 4097.1, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2584576. Throughput: 0: 994.0. Samples: 644976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:40:49,399][00195] Avg episode reward: [(0, '9.096')] -[2024-11-28 08:40:54,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 2600960. Throughput: 0: 955.0. Samples: 649574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:40:54,403][00195] Avg episode reward: [(0, '9.462')] -[2024-11-28 08:40:58,445][02268] Updated weights for policy 0, policy_version 640 (0.0016) -[2024-11-28 08:40:59,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2625536. Throughput: 0: 947.8. Samples: 655912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:40:59,399][00195] Avg episode reward: [(0, '11.201')] -[2024-11-28 08:40:59,409][02251] Saving new best policy, reward=11.201! -[2024-11-28 08:41:04,397][00195] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3804.4). Total num frames: 2646016. Throughput: 0: 977.3. Samples: 659332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:41:04,404][00195] Avg episode reward: [(0, '12.785')] -[2024-11-28 08:41:04,406][02251] Saving new best policy, reward=12.785! 
-[2024-11-28 08:41:09,260][02268] Updated weights for policy 0, policy_version 650 (0.0030) -[2024-11-28 08:41:09,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2662400. Throughput: 0: 985.7. Samples: 664878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:41:09,399][00195] Avg episode reward: [(0, '12.171')] -[2024-11-28 08:41:14,396][00195] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2678784. Throughput: 0: 943.3. Samples: 669844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:41:14,399][00195] Avg episode reward: [(0, '11.355')] -[2024-11-28 08:41:19,035][02268] Updated weights for policy 0, policy_version 660 (0.0016) -[2024-11-28 08:41:19,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2703360. Throughput: 0: 954.8. Samples: 673310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:41:19,399][00195] Avg episode reward: [(0, '10.918')] -[2024-11-28 08:41:24,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2719744. Throughput: 0: 1005.3. Samples: 679920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:41:24,398][00195] Avg episode reward: [(0, '11.943')] -[2024-11-28 08:41:29,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.5). Total num frames: 2736128. Throughput: 0: 947.2. Samples: 684078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:41:29,404][00195] Avg episode reward: [(0, '12.273')] -[2024-11-28 08:41:30,866][02268] Updated weights for policy 0, policy_version 670 (0.0018) -[2024-11-28 08:41:34,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.5). Total num frames: 2760704. Throughput: 0: 941.8. Samples: 687356. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:41:34,398][00195] Avg episode reward: [(0, '13.343')] -[2024-11-28 08:41:34,401][02251] Saving new best policy, reward=13.343! 
-[2024-11-28 08:41:39,398][00195] Fps is (10 sec: 4504.8, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2781184. Throughput: 0: 992.9. Samples: 694254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:41:39,403][00195] Avg episode reward: [(0, '14.360')] -[2024-11-28 08:41:39,419][02251] Saving new best policy, reward=14.360! -[2024-11-28 08:41:40,062][02268] Updated weights for policy 0, policy_version 680 (0.0018) -[2024-11-28 08:41:44,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 2793472. Throughput: 0: 956.1. Samples: 698938. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) -[2024-11-28 08:41:44,403][00195] Avg episode reward: [(0, '13.685')] -[2024-11-28 08:41:49,396][00195] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2813952. Throughput: 0: 933.5. Samples: 701338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:41:49,398][00195] Avg episode reward: [(0, '12.569')] -[2024-11-28 08:41:51,418][02268] Updated weights for policy 0, policy_version 690 (0.0014) -[2024-11-28 08:41:54,397][00195] Fps is (10 sec: 4505.7, 60 sec: 3959.4, 300 sec: 3804.4). Total num frames: 2838528. Throughput: 0: 964.9. Samples: 708300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:41:54,400][00195] Avg episode reward: [(0, '13.141')] -[2024-11-28 08:41:59,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2854912. Throughput: 0: 984.0. Samples: 714126. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:41:59,402][00195] Avg episode reward: [(0, '12.530')] -[2024-11-28 08:42:03,103][02268] Updated weights for policy 0, policy_version 700 (0.0021) -[2024-11-28 08:42:04,396][00195] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2871296. Throughput: 0: 951.0. Samples: 716106. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:42:04,404][00195] Avg episode reward: [(0, '13.076')] -[2024-11-28 08:42:09,400][00195] Fps is (10 sec: 4094.6, 60 sec: 3891.0, 300 sec: 3804.4). Total num frames: 2895872. Throughput: 0: 942.5. Samples: 722336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:42:09,402][00195] Avg episode reward: [(0, '14.494')] -[2024-11-28 08:42:09,412][02251] Saving new best policy, reward=14.494! -[2024-11-28 08:42:12,338][02268] Updated weights for policy 0, policy_version 710 (0.0018) -[2024-11-28 08:42:14,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2912256. Throughput: 0: 995.8. Samples: 728890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:42:14,407][00195] Avg episode reward: [(0, '14.829')] -[2024-11-28 08:42:14,412][02251] Saving new best policy, reward=14.829! -[2024-11-28 08:42:19,397][00195] Fps is (10 sec: 3277.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2928640. Throughput: 0: 967.7. Samples: 730902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:42:19,402][00195] Avg episode reward: [(0, '15.400')] -[2024-11-28 08:42:19,413][02251] Saving new best policy, reward=15.400! -[2024-11-28 08:42:24,016][02268] Updated weights for policy 0, policy_version 720 (0.0037) -[2024-11-28 08:42:24,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2949120. Throughput: 0: 929.3. Samples: 736072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-11-28 08:42:24,400][00195] Avg episode reward: [(0, '15.334')] -[2024-11-28 08:42:29,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2973696. Throughput: 0: 979.9. Samples: 743034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-11-28 08:42:29,399][00195] Avg episode reward: [(0, '15.673')] -[2024-11-28 08:42:29,406][02251] Saving new best policy, reward=15.673! 
-[2024-11-28 08:42:34,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 2985984. Throughput: 0: 990.3. Samples: 745902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:42:34,404][00195] Avg episode reward: [(0, '15.352')] -[2024-11-28 08:42:34,611][02268] Updated weights for policy 0, policy_version 730 (0.0022) -[2024-11-28 08:42:39,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 3006464. Throughput: 0: 931.3. Samples: 750210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:42:39,400][00195] Avg episode reward: [(0, '15.389')] -[2024-11-28 08:42:39,410][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth... -[2024-11-28 08:42:39,552][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000508_2080768.pth -[2024-11-28 08:42:44,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3026944. Throughput: 0: 951.0. Samples: 756922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:42:44,399][00195] Avg episode reward: [(0, '16.028')] -[2024-11-28 08:42:44,407][02251] Saving new best policy, reward=16.028! -[2024-11-28 08:42:44,734][02268] Updated weights for policy 0, policy_version 740 (0.0026) -[2024-11-28 08:42:49,397][00195] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3047424. Throughput: 0: 983.2. Samples: 760350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:42:49,403][00195] Avg episode reward: [(0, '16.306')] -[2024-11-28 08:42:49,413][02251] Saving new best policy, reward=16.306! -[2024-11-28 08:42:54,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3059712. Throughput: 0: 943.4. Samples: 764784. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:42:54,403][00195] Avg episode reward: [(0, '15.748')] -[2024-11-28 08:42:56,518][02268] Updated weights for policy 0, policy_version 750 (0.0021) -[2024-11-28 08:42:59,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3084288. Throughput: 0: 933.5. Samples: 770896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:42:59,405][00195] Avg episode reward: [(0, '16.924')] -[2024-11-28 08:42:59,414][02251] Saving new best policy, reward=16.924! -[2024-11-28 08:43:04,396][00195] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3104768. Throughput: 0: 965.0. Samples: 774326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-11-28 08:43:04,402][00195] Avg episode reward: [(0, '18.483')] -[2024-11-28 08:43:04,412][02251] Saving new best policy, reward=18.483! -[2024-11-28 08:43:06,068][02268] Updated weights for policy 0, policy_version 760 (0.0028) -[2024-11-28 08:43:09,398][00195] Fps is (10 sec: 3686.0, 60 sec: 3754.8, 300 sec: 3818.3). Total num frames: 3121152. Throughput: 0: 971.3. Samples: 779782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:43:09,400][00195] Avg episode reward: [(0, '17.122')] -[2024-11-28 08:43:14,403][00195] Fps is (10 sec: 2865.3, 60 sec: 3686.0, 300 sec: 3818.2). Total num frames: 3133440. Throughput: 0: 914.8. Samples: 784206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-11-28 08:43:14,415][00195] Avg episode reward: [(0, '17.412')] -[2024-11-28 08:43:19,396][00195] Fps is (10 sec: 2867.5, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3149824. Throughput: 0: 895.7. Samples: 786208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-11-28 08:43:19,400][00195] Avg episode reward: [(0, '16.753')] -[2024-11-28 08:43:20,117][02268] Updated weights for policy 0, policy_version 770 (0.0030) -[2024-11-28 08:43:24,396][00195] Fps is (10 sec: 3279.0, 60 sec: 3618.1, 300 sec: 3790.5). 
Total num frames: 3166208. Throughput: 0: 914.9. Samples: 791378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:43:24,401][00195] Avg episode reward: [(0, '17.421')] -[2024-11-28 08:43:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3790.5). Total num frames: 3182592. Throughput: 0: 859.3. Samples: 795592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:43:29,399][00195] Avg episode reward: [(0, '16.994')] -[2024-11-28 08:43:31,735][02268] Updated weights for policy 0, policy_version 780 (0.0041) -[2024-11-28 08:43:34,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3207168. Throughput: 0: 856.8. Samples: 798904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:43:34,401][00195] Avg episode reward: [(0, '18.072')] -[2024-11-28 08:43:39,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3227648. Throughput: 0: 912.8. Samples: 805858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:43:39,402][00195] Avg episode reward: [(0, '19.560')] -[2024-11-28 08:43:39,412][02251] Saving new best policy, reward=19.560! -[2024-11-28 08:43:41,910][02268] Updated weights for policy 0, policy_version 790 (0.0022) -[2024-11-28 08:43:44,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3790.5). Total num frames: 3239936. Throughput: 0: 879.9. Samples: 810492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:43:44,398][00195] Avg episode reward: [(0, '20.542')] -[2024-11-28 08:43:44,401][02251] Saving new best policy, reward=20.542! -[2024-11-28 08:43:49,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3804.4). Total num frames: 3260416. Throughput: 0: 855.2. Samples: 812810. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:43:49,404][00195] Avg episode reward: [(0, '18.871')] -[2024-11-28 08:43:52,506][02268] Updated weights for policy 0, policy_version 800 (0.0022) -[2024-11-28 08:43:54,397][00195] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3284992. Throughput: 0: 887.4. Samples: 819712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:43:54,398][00195] Avg episode reward: [(0, '18.730')] -[2024-11-28 08:43:59,399][00195] Fps is (10 sec: 4095.1, 60 sec: 3618.0, 300 sec: 3790.5). Total num frames: 3301376. Throughput: 0: 917.7. Samples: 825498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:43:59,401][00195] Avg episode reward: [(0, '18.799')] -[2024-11-28 08:44:04,211][02268] Updated weights for policy 0, policy_version 810 (0.0030) -[2024-11-28 08:44:04,396][00195] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 3317760. Throughput: 0: 919.1. Samples: 827566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:44:04,402][00195] Avg episode reward: [(0, '17.752')] -[2024-11-28 08:44:09,397][00195] Fps is (10 sec: 3687.4, 60 sec: 3618.2, 300 sec: 3804.4). Total num frames: 3338240. Throughput: 0: 943.6. Samples: 833838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:44:09,399][00195] Avg episode reward: [(0, '18.057')] -[2024-11-28 08:44:13,240][02268] Updated weights for policy 0, policy_version 820 (0.0024) -[2024-11-28 08:44:14,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3755.1, 300 sec: 3790.5). Total num frames: 3358720. Throughput: 0: 995.5. Samples: 840390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:44:14,401][00195] Avg episode reward: [(0, '18.791')] -[2024-11-28 08:44:19,397][00195] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3790.7). Total num frames: 3375104. Throughput: 0: 969.1. Samples: 842512. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:44:19,400][00195] Avg episode reward: [(0, '19.066')] -[2024-11-28 08:44:24,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3395584. Throughput: 0: 934.6. Samples: 847916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:44:24,400][00195] Avg episode reward: [(0, '18.069')] -[2024-11-28 08:44:24,522][02268] Updated weights for policy 0, policy_version 830 (0.0039) -[2024-11-28 08:44:29,396][00195] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3420160. Throughput: 0: 990.5. Samples: 855064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:44:29,399][00195] Avg episode reward: [(0, '17.541')] -[2024-11-28 08:44:34,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3436544. Throughput: 0: 1003.0. Samples: 857944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:44:34,401][00195] Avg episode reward: [(0, '16.624')] -[2024-11-28 08:44:35,341][02268] Updated weights for policy 0, policy_version 840 (0.0018) -[2024-11-28 08:44:39,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3457024. Throughput: 0: 949.5. Samples: 862440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:44:39,402][00195] Avg episode reward: [(0, '17.563')] -[2024-11-28 08:44:39,411][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000844_3457024.pth... -[2024-11-28 08:44:39,558][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth -[2024-11-28 08:44:44,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3477504. Throughput: 0: 973.5. Samples: 869302. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:44:44,399][00195] Avg episode reward: [(0, '18.362')] -[2024-11-28 08:44:44,945][02268] Updated weights for policy 0, policy_version 850 (0.0032) -[2024-11-28 08:44:49,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3497984. Throughput: 0: 1003.4. Samples: 872718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-11-28 08:44:49,399][00195] Avg episode reward: [(0, '19.933')] -[2024-11-28 08:44:54,397][00195] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 3510272. Throughput: 0: 962.6. Samples: 877156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:44:54,402][00195] Avg episode reward: [(0, '20.494')] -[2024-11-28 08:44:56,301][02268] Updated weights for policy 0, policy_version 860 (0.0028) -[2024-11-28 08:44:59,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 3534848. Throughput: 0: 961.6. Samples: 883660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:44:59,400][00195] Avg episode reward: [(0, '21.358')] -[2024-11-28 08:44:59,408][02251] Saving new best policy, reward=21.358! -[2024-11-28 08:45:04,398][00195] Fps is (10 sec: 4505.1, 60 sec: 3959.4, 300 sec: 3818.3). Total num frames: 3555328. Throughput: 0: 989.6. Samples: 887046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:45:04,404][00195] Avg episode reward: [(0, '21.614')] -[2024-11-28 08:45:04,440][02251] Saving new best policy, reward=21.614! -[2024-11-28 08:45:06,079][02268] Updated weights for policy 0, policy_version 870 (0.0037) -[2024-11-28 08:45:09,397][00195] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 3571712. Throughput: 0: 987.6. Samples: 892358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:45:09,401][00195] Avg episode reward: [(0, '21.624')] -[2024-11-28 08:45:09,413][02251] Saving new best policy, reward=21.624! 
-[2024-11-28 08:45:14,396][00195] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3592192. Throughput: 0: 945.9. Samples: 897628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:45:14,403][00195] Avg episode reward: [(0, '21.365')] -[2024-11-28 08:45:17,062][02268] Updated weights for policy 0, policy_version 880 (0.0018) -[2024-11-28 08:45:19,396][00195] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3612672. Throughput: 0: 955.7. Samples: 900950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) -[2024-11-28 08:45:19,399][00195] Avg episode reward: [(0, '21.279')] -[2024-11-28 08:45:24,398][00195] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3790.5). Total num frames: 3629056. Throughput: 0: 998.8. Samples: 907386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:45:24,401][00195] Avg episode reward: [(0, '21.432')] -[2024-11-28 08:45:28,630][02268] Updated weights for policy 0, policy_version 890 (0.0063) -[2024-11-28 08:45:29,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3645440. Throughput: 0: 942.8. Samples: 911730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:45:29,403][00195] Avg episode reward: [(0, '21.920')] -[2024-11-28 08:45:29,447][02251] Saving new best policy, reward=21.920! -[2024-11-28 08:45:34,396][00195] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3670016. Throughput: 0: 940.0. Samples: 915016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:45:34,399][00195] Avg episode reward: [(0, '22.432')] -[2024-11-28 08:45:34,403][02251] Saving new best policy, reward=22.432! -[2024-11-28 08:45:37,745][02268] Updated weights for policy 0, policy_version 900 (0.0024) -[2024-11-28 08:45:39,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3690496. Throughput: 0: 992.9. Samples: 921836. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) -[2024-11-28 08:45:39,399][00195] Avg episode reward: [(0, '21.587')] -[2024-11-28 08:45:44,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3702784. Throughput: 0: 948.3. Samples: 926334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:45:44,402][00195] Avg episode reward: [(0, '20.588')] -[2024-11-28 08:45:49,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3723264. Throughput: 0: 928.3. Samples: 928816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:45:49,399][00195] Avg episode reward: [(0, '21.814')] -[2024-11-28 08:45:49,654][02268] Updated weights for policy 0, policy_version 910 (0.0024) -[2024-11-28 08:45:54,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3747840. Throughput: 0: 962.9. Samples: 935688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) -[2024-11-28 08:45:54,399][00195] Avg episode reward: [(0, '20.884')] -[2024-11-28 08:45:59,397][00195] Fps is (10 sec: 4095.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3764224. Throughput: 0: 971.8. Samples: 941360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:45:59,402][00195] Avg episode reward: [(0, '20.709')] -[2024-11-28 08:46:00,278][02268] Updated weights for policy 0, policy_version 920 (0.0026) -[2024-11-28 08:46:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3790.5). Total num frames: 3780608. Throughput: 0: 946.0. Samples: 943522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:46:04,402][00195] Avg episode reward: [(0, '20.664')] -[2024-11-28 08:46:09,397][00195] Fps is (10 sec: 4096.3, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 3805184. Throughput: 0: 952.1. Samples: 950230. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:46:09,399][00195] Avg episode reward: [(0, '21.113')] -[2024-11-28 08:46:09,777][02268] Updated weights for policy 0, policy_version 930 (0.0021) -[2024-11-28 08:46:14,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3825664. Throughput: 0: 999.5. Samples: 956708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:46:14,400][00195] Avg episode reward: [(0, '19.813')] -[2024-11-28 08:46:19,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 3837952. Throughput: 0: 973.1. Samples: 958804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:46:19,399][00195] Avg episode reward: [(0, '20.580')] -[2024-11-28 08:46:21,357][02268] Updated weights for policy 0, policy_version 940 (0.0030) -[2024-11-28 08:46:24,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 3862528. Throughput: 0: 945.2. Samples: 964368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:46:24,404][00195] Avg episode reward: [(0, '20.875')] -[2024-11-28 08:46:29,396][00195] Fps is (10 sec: 4915.4, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 3887104. Throughput: 0: 1002.4. Samples: 971440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) -[2024-11-28 08:46:29,399][00195] Avg episode reward: [(0, '21.718')] -[2024-11-28 08:46:30,302][02268] Updated weights for policy 0, policy_version 950 (0.0027) -[2024-11-28 08:46:34,400][00195] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3790.5). Total num frames: 3899392. Throughput: 0: 1007.1. Samples: 974138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:46:34,403][00195] Avg episode reward: [(0, '22.750')] -[2024-11-28 08:46:34,405][02251] Saving new best policy, reward=22.750! -[2024-11-28 08:46:39,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3919872. Throughput: 0: 957.6. Samples: 978782. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:46:39,407][00195] Avg episode reward: [(0, '24.118')] -[2024-11-28 08:46:39,417][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000957_3919872.pth... -[2024-11-28 08:46:39,536][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth -[2024-11-28 08:46:39,552][02251] Saving new best policy, reward=24.118! -[2024-11-28 08:46:41,681][02268] Updated weights for policy 0, policy_version 960 (0.0016) -[2024-11-28 08:46:44,396][00195] Fps is (10 sec: 4507.3, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 3944448. Throughput: 0: 983.2. Samples: 985602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:46:44,399][00195] Avg episode reward: [(0, '25.261')] -[2024-11-28 08:46:44,405][02251] Saving new best policy, reward=25.261! -[2024-11-28 08:46:49,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3960832. Throughput: 0: 1006.4. Samples: 988812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) -[2024-11-28 08:46:49,400][00195] Avg episode reward: [(0, '24.951')] -[2024-11-28 08:46:53,484][02268] Updated weights for policy 0, policy_version 970 (0.0023) -[2024-11-28 08:46:54,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3977216. Throughput: 0: 950.9. Samples: 993022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:46:54,399][00195] Avg episode reward: [(0, '25.118')] -[2024-11-28 08:46:59,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3997696. Throughput: 0: 955.6. Samples: 999712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) -[2024-11-28 08:46:59,402][00195] Avg episode reward: [(0, '24.634')] -[2024-11-28 08:47:00,456][02251] Stopping Batcher_0... -[2024-11-28 08:47:00,456][02251] Loop batcher_evt_loop terminating... -[2024-11-28 08:47:00,456][00195] Component Batcher_0 stopped! 
-[2024-11-28 08:47:00,460][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2024-11-28 08:47:00,516][02268] Weights refcount: 2 0 -[2024-11-28 08:47:00,520][02268] Stopping InferenceWorker_p0-w0... -[2024-11-28 08:47:00,521][02268] Loop inference_proc0-0_evt_loop terminating... -[2024-11-28 08:47:00,520][00195] Component InferenceWorker_p0-w0 stopped! -[2024-11-28 08:47:00,579][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000844_3457024.pth -[2024-11-28 08:47:00,594][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2024-11-28 08:47:00,792][02251] Stopping LearnerWorker_p0... -[2024-11-28 08:47:00,793][00195] Component LearnerWorker_p0 stopped! -[2024-11-28 08:47:00,793][02251] Loop learner_proc0_evt_loop terminating... -[2024-11-28 08:47:00,840][02272] Stopping RolloutWorker_w3... -[2024-11-28 08:47:00,841][02272] Loop rollout_proc3_evt_loop terminating... -[2024-11-28 08:47:00,840][00195] Component RolloutWorker_w3 stopped! -[2024-11-28 08:47:00,892][02274] Stopping RolloutWorker_w5... -[2024-11-28 08:47:00,892][02274] Loop rollout_proc5_evt_loop terminating... -[2024-11-28 08:47:00,889][00195] Component RolloutWorker_w5 stopped! -[2024-11-28 08:47:00,894][02275] Stopping RolloutWorker_w7... -[2024-11-28 08:47:00,895][00195] Component RolloutWorker_w7 stopped! -[2024-11-28 08:47:00,901][02275] Loop rollout_proc7_evt_loop terminating... -[2024-11-28 08:47:00,904][00195] Component RolloutWorker_w1 stopped! -[2024-11-28 08:47:00,906][02270] Stopping RolloutWorker_w1... -[2024-11-28 08:47:00,910][02270] Loop rollout_proc1_evt_loop terminating... -[2024-11-28 08:47:00,943][02271] Stopping RolloutWorker_w2... -[2024-11-28 08:47:00,943][00195] Component RolloutWorker_w2 stopped! -[2024-11-28 08:47:00,944][02271] Loop rollout_proc2_evt_loop terminating... -[2024-11-28 08:47:00,971][02273] Stopping RolloutWorker_w4... 
-[2024-11-28 08:47:00,971][00195] Component RolloutWorker_w4 stopped! -[2024-11-28 08:47:00,972][02273] Loop rollout_proc4_evt_loop terminating... -[2024-11-28 08:47:00,991][00195] Component RolloutWorker_w0 stopped! -[2024-11-28 08:47:00,991][02269] Stopping RolloutWorker_w0... -[2024-11-28 08:47:00,997][02269] Loop rollout_proc0_evt_loop terminating... -[2024-11-28 08:47:01,013][02276] Stopping RolloutWorker_w6... -[2024-11-28 08:47:01,013][00195] Component RolloutWorker_w6 stopped! -[2024-11-28 08:47:01,014][02276] Loop rollout_proc6_evt_loop terminating... -[2024-11-28 08:47:01,015][00195] Waiting for process learner_proc0 to stop... -[2024-11-28 08:47:02,457][00195] Waiting for process inference_proc0-0 to join... -[2024-11-28 08:47:02,462][00195] Waiting for process rollout_proc0 to join... -[2024-11-28 08:47:04,804][00195] Waiting for process rollout_proc1 to join... -[2024-11-28 08:47:04,934][00195] Waiting for process rollout_proc2 to join... -[2024-11-28 08:47:04,937][00195] Waiting for process rollout_proc3 to join... -[2024-11-28 08:47:04,942][00195] Waiting for process rollout_proc4 to join... -[2024-11-28 08:47:04,946][00195] Waiting for process rollout_proc5 to join... -[2024-11-28 08:47:04,950][00195] Waiting for process rollout_proc6 to join... -[2024-11-28 08:47:04,954][00195] Waiting for process rollout_proc7 to join... 
-[2024-11-28 08:47:04,960][00195] Batcher 0 profile tree view: -batching: 28.2691, releasing_batches: 0.0272 -[2024-11-28 08:47:04,961][00195] InferenceWorker_p0-w0 profile tree view: +[2024-12-01 11:06:56,018][04297] Using optimizer +[2024-12-01 11:06:57,417][02154] Heartbeat connected on Batcher_0 +[2024-12-01 11:06:57,454][02154] Heartbeat connected on InferenceWorker_p0-w0 +[2024-12-01 11:06:57,481][02154] Heartbeat connected on RolloutWorker_w0 +[2024-12-01 11:06:57,495][02154] Heartbeat connected on RolloutWorker_w1 +[2024-12-01 11:06:57,510][02154] Heartbeat connected on RolloutWorker_w2 +[2024-12-01 11:06:57,524][02154] Heartbeat connected on RolloutWorker_w3 +[2024-12-01 11:06:57,530][02154] Heartbeat connected on RolloutWorker_w4 +[2024-12-01 11:06:57,540][02154] Heartbeat connected on RolloutWorker_w5 +[2024-12-01 11:06:57,547][02154] Heartbeat connected on RolloutWorker_w6 +[2024-12-01 11:06:57,554][02154] Heartbeat connected on RolloutWorker_w7 +[2024-12-01 11:06:59,419][04297] No checkpoints found +[2024-12-01 11:06:59,420][04297] Did not load from checkpoint, starting from scratch! +[2024-12-01 11:06:59,420][04297] Initialized policy 0 weights for model version 0 +[2024-12-01 11:06:59,424][04297] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-01 11:06:59,431][04297] LearnerWorker_p0 finished initialization! +[2024-12-01 11:06:59,432][02154] Heartbeat connected on LearnerWorker_p0 +[2024-12-01 11:06:59,545][02154] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-12-01 11:06:59,623][04311] RunningMeanStd input shape: (3, 72, 128) +[2024-12-01 11:06:59,624][04311] RunningMeanStd input shape: (1,) +[2024-12-01 11:06:59,636][04311] ConvEncoder: input_channels=3 +[2024-12-01 11:06:59,743][04311] Conv encoder output size: 512 +[2024-12-01 11:06:59,743][04311] Policy head output size: 512 +[2024-12-01 11:06:59,806][02154] Inference worker 0-0 is ready! +[2024-12-01 11:06:59,808][02154] All inference workers are ready! Signal rollout workers to start! +[2024-12-01 11:06:59,998][04318] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-01 11:07:00,001][04314] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-01 11:07:00,002][04316] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-01 11:07:00,003][04312] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-01 11:07:00,012][04315] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-01 11:07:00,013][04310] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-01 11:07:00,008][04313] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-01 11:07:00,018][04317] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-01 11:07:01,031][04317] Decorrelating experience for 0 frames... +[2024-12-01 11:07:01,030][04315] Decorrelating experience for 0 frames... +[2024-12-01 11:07:01,708][04318] Decorrelating experience for 0 frames... +[2024-12-01 11:07:01,718][04312] Decorrelating experience for 0 frames... +[2024-12-01 11:07:01,715][04316] Decorrelating experience for 0 frames... +[2024-12-01 11:07:01,725][04314] Decorrelating experience for 0 frames... +[2024-12-01 11:07:02,165][04315] Decorrelating experience for 32 frames... +[2024-12-01 11:07:02,191][04313] Decorrelating experience for 0 frames... +[2024-12-01 11:07:03,802][04316] Decorrelating experience for 32 frames... 
+[2024-12-01 11:07:03,808][04318] Decorrelating experience for 32 frames... +[2024-12-01 11:07:03,816][04312] Decorrelating experience for 32 frames... +[2024-12-01 11:07:04,130][04317] Decorrelating experience for 32 frames... +[2024-12-01 11:07:04,525][04314] Decorrelating experience for 32 frames... +[2024-12-01 11:07:04,545][02154] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-12-01 11:07:04,624][04315] Decorrelating experience for 64 frames... +[2024-12-01 11:07:05,380][04310] Decorrelating experience for 0 frames... +[2024-12-01 11:07:05,384][04313] Decorrelating experience for 32 frames... +[2024-12-01 11:07:06,428][04316] Decorrelating experience for 64 frames... +[2024-12-01 11:07:06,437][04312] Decorrelating experience for 64 frames... +[2024-12-01 11:07:06,444][04318] Decorrelating experience for 64 frames... +[2024-12-01 11:07:06,902][04313] Decorrelating experience for 64 frames... +[2024-12-01 11:07:06,974][04314] Decorrelating experience for 64 frames... +[2024-12-01 11:07:07,872][04316] Decorrelating experience for 96 frames... +[2024-12-01 11:07:07,878][04310] Decorrelating experience for 32 frames... +[2024-12-01 11:07:07,883][04318] Decorrelating experience for 96 frames... +[2024-12-01 11:07:07,912][04315] Decorrelating experience for 96 frames... +[2024-12-01 11:07:08,694][04312] Decorrelating experience for 96 frames... +[2024-12-01 11:07:09,131][04317] Decorrelating experience for 64 frames... +[2024-12-01 11:07:09,134][04313] Decorrelating experience for 96 frames... +[2024-12-01 11:07:09,447][04310] Decorrelating experience for 64 frames... +[2024-12-01 11:07:09,545][02154] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-12-01 11:07:09,858][04314] Decorrelating experience for 96 frames... 
+[2024-12-01 11:07:10,065][04317] Decorrelating experience for 96 frames... +[2024-12-01 11:07:10,203][04310] Decorrelating experience for 96 frames... +[2024-12-01 11:07:12,812][04297] Signal inference workers to stop experience collection... +[2024-12-01 11:07:12,825][04311] InferenceWorker_p0-w0: stopping experience collection +[2024-12-01 11:07:14,545][02154] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 95.6. Samples: 1434. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-12-01 11:07:14,551][02154] Avg episode reward: [(0, '2.083')] +[2024-12-01 11:07:16,225][04297] Signal inference workers to resume experience collection... +[2024-12-01 11:07:16,227][04311] InferenceWorker_p0-w0: resuming experience collection +[2024-12-01 11:07:19,546][02154] Fps is (10 sec: 1638.2, 60 sec: 819.1, 300 sec: 819.1). Total num frames: 16384. Throughput: 0: 181.1. Samples: 3622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2024-12-01 11:07:19,553][02154] Avg episode reward: [(0, '3.384')] +[2024-12-01 11:07:24,545][02154] Fps is (10 sec: 2867.3, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 312.3. Samples: 7808. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2024-12-01 11:07:24,548][02154] Avg episode reward: [(0, '3.813')] +[2024-12-01 11:07:26,577][04311] Updated weights for policy 0, policy_version 10 (0.0151) +[2024-12-01 11:07:29,545][02154] Fps is (10 sec: 3687.1, 60 sec: 1775.0, 300 sec: 1775.0). Total num frames: 53248. Throughput: 0: 359.9. Samples: 10796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-12-01 11:07:29,551][02154] Avg episode reward: [(0, '4.356')] +[2024-12-01 11:07:34,545][02154] Fps is (10 sec: 4505.6, 60 sec: 2106.6, 300 sec: 2106.6). Total num frames: 73728. Throughput: 0: 509.1. Samples: 17818. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-12-01 11:07:34,550][02154] Avg episode reward: [(0, '4.455')] +[2024-12-01 11:07:36,731][04311] Updated weights for policy 0, policy_version 20 (0.0016) +[2024-12-01 11:07:39,545][02154] Fps is (10 sec: 3276.8, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 560.0. Samples: 22400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:07:39,549][02154] Avg episode reward: [(0, '4.468')] +[2024-12-01 11:07:44,545][02154] Fps is (10 sec: 3276.7, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 544.9. Samples: 24522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:07:44,547][02154] Avg episode reward: [(0, '4.385')] +[2024-12-01 11:07:44,551][04297] Saving new best policy, reward=4.385! +[2024-12-01 11:07:47,758][04311] Updated weights for policy 0, policy_version 30 (0.0019) +[2024-12-01 11:07:49,544][02154] Fps is (10 sec: 4505.7, 60 sec: 2621.5, 300 sec: 2621.5). Total num frames: 131072. Throughput: 0: 697.3. Samples: 31378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:07:49,551][02154] Avg episode reward: [(0, '4.446')] +[2024-12-01 11:07:49,557][04297] Saving new best policy, reward=4.446! +[2024-12-01 11:07:54,545][02154] Fps is (10 sec: 4096.1, 60 sec: 2681.0, 300 sec: 2681.0). Total num frames: 147456. Throughput: 0: 831.2. Samples: 37404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:07:54,550][02154] Avg episode reward: [(0, '4.371')] +[2024-12-01 11:07:59,364][04311] Updated weights for policy 0, policy_version 40 (0.0041) +[2024-12-01 11:07:59,545][02154] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 163840. Throughput: 0: 845.7. Samples: 39488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:07:59,546][02154] Avg episode reward: [(0, '4.379')] +[2024-12-01 11:08:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2835.7). 
Total num frames: 184320. Throughput: 0: 924.0. Samples: 45202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:08:04,549][02154] Avg episode reward: [(0, '4.576')] +[2024-12-01 11:08:04,552][04297] Saving new best policy, reward=4.576! +[2024-12-01 11:08:08,677][04311] Updated weights for policy 0, policy_version 50 (0.0017) +[2024-12-01 11:08:09,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 2925.7). Total num frames: 204800. Throughput: 0: 980.3. Samples: 51922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:08:09,548][02154] Avg episode reward: [(0, '4.483')] +[2024-12-01 11:08:14,544][02154] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 2949.1). Total num frames: 221184. Throughput: 0: 960.1. Samples: 54000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:08:14,551][02154] Avg episode reward: [(0, '4.418')] +[2024-12-01 11:08:19,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3020.8). Total num frames: 241664. Throughput: 0: 920.1. Samples: 59224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:08:19,546][02154] Avg episode reward: [(0, '4.284')] +[2024-12-01 11:08:20,178][04311] Updated weights for policy 0, policy_version 60 (0.0030) +[2024-12-01 11:08:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3084.1). Total num frames: 262144. Throughput: 0: 973.3. Samples: 66198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:08:24,550][02154] Avg episode reward: [(0, '4.324')] +[2024-12-01 11:08:29,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3094.8). Total num frames: 278528. Throughput: 0: 989.1. Samples: 69032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:08:29,547][02154] Avg episode reward: [(0, '4.523')] +[2024-12-01 11:08:29,557][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth... 
+[2024-12-01 11:08:31,493][04311] Updated weights for policy 0, policy_version 70 (0.0023) +[2024-12-01 11:08:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 935.6. Samples: 73478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:08:34,549][02154] Avg episode reward: [(0, '4.364')] +[2024-12-01 11:08:39,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 955.3. Samples: 80394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:08:39,551][02154] Avg episode reward: [(0, '4.338')] +[2024-12-01 11:08:40,743][04311] Updated weights for policy 0, policy_version 80 (0.0022) +[2024-12-01 11:08:44,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3237.8). Total num frames: 339968. Throughput: 0: 985.3. Samples: 83828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:08:44,551][02154] Avg episode reward: [(0, '4.484')] +[2024-12-01 11:08:49,549][02154] Fps is (10 sec: 3684.6, 60 sec: 3754.4, 300 sec: 3239.4). Total num frames: 356352. Throughput: 0: 954.4. Samples: 88156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:08:49,552][02154] Avg episode reward: [(0, '4.416')] +[2024-12-01 11:08:52,127][04311] Updated weights for policy 0, policy_version 90 (0.0015) +[2024-12-01 11:08:54,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 376832. Throughput: 0: 950.7. Samples: 94702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:08:54,551][02154] Avg episode reward: [(0, '4.424')] +[2024-12-01 11:08:59,545][02154] Fps is (10 sec: 4507.7, 60 sec: 3959.5, 300 sec: 3345.1). Total num frames: 401408. Throughput: 0: 981.5. Samples: 98166. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:08:59,546][02154] Avg episode reward: [(0, '4.514')] +[2024-12-01 11:09:02,173][04311] Updated weights for policy 0, policy_version 100 (0.0030) +[2024-12-01 11:09:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3309.6). Total num frames: 413696. Throughput: 0: 979.0. Samples: 103278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:09:04,547][02154] Avg episode reward: [(0, '4.520')] +[2024-12-01 11:09:09,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3339.8). Total num frames: 434176. Throughput: 0: 949.6. Samples: 108928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:09:09,550][02154] Avg episode reward: [(0, '4.629')] +[2024-12-01 11:09:09,558][04297] Saving new best policy, reward=4.629! +[2024-12-01 11:09:12,567][04311] Updated weights for policy 0, policy_version 110 (0.0027) +[2024-12-01 11:09:14,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3398.2). Total num frames: 458752. Throughput: 0: 961.9. Samples: 112318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:09:14,553][02154] Avg episode reward: [(0, '4.475')] +[2024-12-01 11:09:19,548][02154] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3364.5). Total num frames: 471040. Throughput: 0: 992.4. Samples: 118142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:09:19,554][02154] Avg episode reward: [(0, '4.368')] +[2024-12-01 11:09:24,332][04311] Updated weights for policy 0, policy_version 120 (0.0018) +[2024-12-01 11:09:24,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 945.3. Samples: 122934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:09:24,547][02154] Avg episode reward: [(0, '4.358')] +[2024-12-01 11:09:29,545][02154] Fps is (10 sec: 4097.6, 60 sec: 3891.2, 300 sec: 3413.3). Total num frames: 512000. Throughput: 0: 946.1. Samples: 126404. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:09:29,551][02154] Avg episode reward: [(0, '4.650')] +[2024-12-01 11:09:29,616][04297] Saving new best policy, reward=4.650! +[2024-12-01 11:09:33,599][04311] Updated weights for policy 0, policy_version 130 (0.0014) +[2024-12-01 11:09:34,546][02154] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3435.3). Total num frames: 532480. Throughput: 0: 998.9. Samples: 133104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:09:34,551][02154] Avg episode reward: [(0, '4.935')] +[2024-12-01 11:09:34,556][04297] Saving new best policy, reward=4.935! +[2024-12-01 11:09:39,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 945.2. Samples: 137238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:09:39,551][02154] Avg episode reward: [(0, '4.899')] +[2024-12-01 11:09:44,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3822.9, 300 sec: 3450.6). Total num frames: 569344. Throughput: 0: 940.5. Samples: 140488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:09:44,555][02154] Avg episode reward: [(0, '4.829')] +[2024-12-01 11:09:44,762][04311] Updated weights for policy 0, policy_version 140 (0.0035) +[2024-12-01 11:09:49,546][02154] Fps is (10 sec: 4505.0, 60 sec: 3959.7, 300 sec: 3493.6). Total num frames: 593920. Throughput: 0: 982.6. Samples: 147496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:09:49,550][02154] Avg episode reward: [(0, '4.708')] +[2024-12-01 11:09:54,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3822.9, 300 sec: 3464.1). Total num frames: 606208. Throughput: 0: 964.9. Samples: 152350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:09:54,552][02154] Avg episode reward: [(0, '4.756')] +[2024-12-01 11:09:56,202][04311] Updated weights for policy 0, policy_version 150 (0.0026) +[2024-12-01 11:09:59,545][02154] Fps is (10 sec: 3277.3, 60 sec: 3754.7, 300 sec: 3481.6). 
Total num frames: 626688. Throughput: 0: 942.8. Samples: 154744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:09:59,548][02154] Avg episode reward: [(0, '4.559')] +[2024-12-01 11:10:04,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3520.4). Total num frames: 651264. Throughput: 0: 967.1. Samples: 161658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:10:04,547][02154] Avg episode reward: [(0, '4.980')] +[2024-12-01 11:10:04,554][04297] Saving new best policy, reward=4.980! +[2024-12-01 11:10:05,329][04311] Updated weights for policy 0, policy_version 160 (0.0017) +[2024-12-01 11:10:09,546][02154] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3513.9). Total num frames: 667648. Throughput: 0: 985.8. Samples: 167296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:10:09,548][02154] Avg episode reward: [(0, '4.886')] +[2024-12-01 11:10:14,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3507.9). Total num frames: 684032. Throughput: 0: 953.6. Samples: 169316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:10:14,552][02154] Avg episode reward: [(0, '4.873')] +[2024-12-01 11:10:16,937][04311] Updated weights for policy 0, policy_version 170 (0.0030) +[2024-12-01 11:10:19,545][02154] Fps is (10 sec: 4096.5, 60 sec: 3959.7, 300 sec: 3543.1). Total num frames: 708608. Throughput: 0: 947.8. Samples: 175752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-12-01 11:10:19,551][02154] Avg episode reward: [(0, '5.009')] +[2024-12-01 11:10:19,561][04297] Saving new best policy, reward=5.009! +[2024-12-01 11:10:24,572][02154] Fps is (10 sec: 4085.0, 60 sec: 3889.4, 300 sec: 3536.1). Total num frames: 724992. Throughput: 0: 1000.6. Samples: 182290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:10:24,576][02154] Avg episode reward: [(0, '5.080')] +[2024-12-01 11:10:24,651][04297] Saving new best policy, reward=5.080! 
+[2024-12-01 11:10:27,725][04311] Updated weights for policy 0, policy_version 180 (0.0017) +[2024-12-01 11:10:29,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3530.4). Total num frames: 741376. Throughput: 0: 973.6. Samples: 184300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:10:29,550][02154] Avg episode reward: [(0, '5.132')] +[2024-12-01 11:10:29,560][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth... +[2024-12-01 11:10:29,723][04297] Saving new best policy, reward=5.132! +[2024-12-01 11:10:34,545][02154] Fps is (10 sec: 3696.4, 60 sec: 3823.0, 300 sec: 3543.5). Total num frames: 761856. Throughput: 0: 938.6. Samples: 189732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:10:34,551][02154] Avg episode reward: [(0, '5.341')] +[2024-12-01 11:10:34,553][04297] Saving new best policy, reward=5.341! +[2024-12-01 11:10:37,487][04311] Updated weights for policy 0, policy_version 190 (0.0018) +[2024-12-01 11:10:39,545][02154] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3574.7). Total num frames: 786432. Throughput: 0: 981.3. Samples: 196510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:10:39,547][02154] Avg episode reward: [(0, '5.566')] +[2024-12-01 11:10:39,555][04297] Saving new best policy, reward=5.566! +[2024-12-01 11:10:44,548][02154] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3549.8). Total num frames: 798720. Throughput: 0: 985.3. Samples: 199086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:10:44,551][02154] Avg episode reward: [(0, '5.299')] +[2024-12-01 11:10:49,132][04311] Updated weights for policy 0, policy_version 200 (0.0062) +[2024-12-01 11:10:49,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3561.7). Total num frames: 819200. Throughput: 0: 936.8. Samples: 203812. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:10:49,547][02154] Avg episode reward: [(0, '5.720')] +[2024-12-01 11:10:49,558][04297] Saving new best policy, reward=5.720! +[2024-12-01 11:10:54,544][02154] Fps is (10 sec: 4507.4, 60 sec: 3959.5, 300 sec: 3590.5). Total num frames: 843776. Throughput: 0: 965.0. Samples: 210722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:10:54,551][02154] Avg episode reward: [(0, '5.950')] +[2024-12-01 11:10:54,555][04297] Saving new best policy, reward=5.950! +[2024-12-01 11:10:59,072][04311] Updated weights for policy 0, policy_version 210 (0.0021) +[2024-12-01 11:10:59,545][02154] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3584.0). Total num frames: 860160. Throughput: 0: 994.7. Samples: 214078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:10:59,547][02154] Avg episode reward: [(0, '5.960')] +[2024-12-01 11:10:59,553][04297] Saving new best policy, reward=5.960! +[2024-12-01 11:11:04,544][02154] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3577.7). Total num frames: 876544. Throughput: 0: 945.2. Samples: 218284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:11:04,547][02154] Avg episode reward: [(0, '5.842')] +[2024-12-01 11:11:09,553][02154] Fps is (10 sec: 3686.5, 60 sec: 3823.0, 300 sec: 3588.1). Total num frames: 897024. Throughput: 0: 945.7. Samples: 224820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:11:09,560][02154] Avg episode reward: [(0, '6.163')] +[2024-12-01 11:11:09,585][04311] Updated weights for policy 0, policy_version 220 (0.0025) +[2024-12-01 11:11:09,589][04297] Saving new best policy, reward=6.163! +[2024-12-01 11:11:14,550][02154] Fps is (10 sec: 4502.9, 60 sec: 3959.1, 300 sec: 3614.0). Total num frames: 921600. Throughput: 0: 976.0. Samples: 228226. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-12-01 11:11:14,553][02154] Avg episode reward: [(0, '6.746')] +[2024-12-01 11:11:14,555][04297] Saving new best policy, reward=6.746! +[2024-12-01 11:11:19,547][02154] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3591.8). Total num frames: 933888. Throughput: 0: 964.5. Samples: 233136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:11:19,552][02154] Avg episode reward: [(0, '6.237')] +[2024-12-01 11:11:21,326][04311] Updated weights for policy 0, policy_version 230 (0.0025) +[2024-12-01 11:11:24,545][02154] Fps is (10 sec: 3278.7, 60 sec: 3824.7, 300 sec: 3601.4). Total num frames: 954368. Throughput: 0: 948.0. Samples: 239172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:11:24,547][02154] Avg episode reward: [(0, '6.446')] +[2024-12-01 11:11:29,545][02154] Fps is (10 sec: 4506.9, 60 sec: 3959.5, 300 sec: 3625.7). Total num frames: 978944. Throughput: 0: 969.6. Samples: 242714. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:11:29,547][02154] Avg episode reward: [(0, '6.223')] +[2024-12-01 11:11:29,923][04311] Updated weights for policy 0, policy_version 240 (0.0032) +[2024-12-01 11:11:34,545][02154] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3619.4). Total num frames: 995328. Throughput: 0: 995.8. Samples: 248622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:11:34,547][02154] Avg episode reward: [(0, '6.553')] +[2024-12-01 11:11:39,545][02154] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3627.9). Total num frames: 1015808. Throughput: 0: 957.9. Samples: 253830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:11:39,550][02154] Avg episode reward: [(0, '6.609')] +[2024-12-01 11:11:41,322][04311] Updated weights for policy 0, policy_version 250 (0.0025) +[2024-12-01 11:11:44,545][02154] Fps is (10 sec: 4096.1, 60 sec: 3959.7, 300 sec: 3636.1). Total num frames: 1036288. Throughput: 0: 959.3. Samples: 257246. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:11:44,547][02154] Avg episode reward: [(0, '6.984')] +[2024-12-01 11:11:44,553][04297] Saving new best policy, reward=6.984! +[2024-12-01 11:11:49,545][02154] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3644.0). Total num frames: 1056768. Throughput: 0: 1013.1. Samples: 263874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:11:49,550][02154] Avg episode reward: [(0, '6.679')] +[2024-12-01 11:11:52,294][04311] Updated weights for policy 0, policy_version 260 (0.0019) +[2024-12-01 11:11:54,544][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 1073152. Throughput: 0: 963.6. Samples: 268184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:11:54,546][02154] Avg episode reward: [(0, '7.022')] +[2024-12-01 11:11:54,551][04297] Saving new best policy, reward=7.022! +[2024-12-01 11:11:59,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 1093632. Throughput: 0: 965.1. Samples: 271652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:11:59,549][02154] Avg episode reward: [(0, '6.522')] +[2024-12-01 11:12:01,509][04311] Updated weights for policy 0, policy_version 270 (0.0022) +[2024-12-01 11:12:04,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 1011.0. Samples: 278628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:12:04,550][02154] Avg episode reward: [(0, '6.455')] +[2024-12-01 11:12:09,546][02154] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 979.8. Samples: 283264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:12:09,548][02154] Avg episode reward: [(0, '6.511')] +[2024-12-01 11:12:12,954][04311] Updated weights for policy 0, policy_version 280 (0.0022) +[2024-12-01 11:12:14,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3846.1). 
Total num frames: 1150976. Throughput: 0: 962.5. Samples: 286028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:12:14,551][02154] Avg episode reward: [(0, '6.813')] +[2024-12-01 11:12:19,545][02154] Fps is (10 sec: 4506.0, 60 sec: 4027.9, 300 sec: 3887.7). Total num frames: 1175552. Throughput: 0: 988.9. Samples: 293124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:12:19,548][02154] Avg episode reward: [(0, '7.376')] +[2024-12-01 11:12:19,555][04297] Saving new best policy, reward=7.376! +[2024-12-01 11:12:22,420][04311] Updated weights for policy 0, policy_version 290 (0.0014) +[2024-12-01 11:12:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1191936. Throughput: 0: 993.0. Samples: 298514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:12:24,548][02154] Avg episode reward: [(0, '7.498')] +[2024-12-01 11:12:24,556][04297] Saving new best policy, reward=7.498! +[2024-12-01 11:12:29,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1212416. Throughput: 0: 964.0. Samples: 300626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:12:29,547][02154] Avg episode reward: [(0, '8.103')] +[2024-12-01 11:12:29,556][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth... +[2024-12-01 11:12:29,686][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth +[2024-12-01 11:12:29,703][04297] Saving new best policy, reward=8.103! +[2024-12-01 11:12:33,069][04311] Updated weights for policy 0, policy_version 300 (0.0021) +[2024-12-01 11:12:34,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1232896. Throughput: 0: 969.2. Samples: 307488. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:12:34,550][02154] Avg episode reward: [(0, '8.688')] +[2024-12-01 11:12:34,553][04297] Saving new best policy, reward=8.688! +[2024-12-01 11:12:39,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1253376. Throughput: 0: 1010.7. Samples: 313666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:12:39,548][02154] Avg episode reward: [(0, '8.875')] +[2024-12-01 11:12:39,557][04297] Saving new best policy, reward=8.875! +[2024-12-01 11:12:44,545][02154] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1265664. Throughput: 0: 978.8. Samples: 315698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:12:44,551][02154] Avg episode reward: [(0, '8.909')] +[2024-12-01 11:12:44,625][04311] Updated weights for policy 0, policy_version 310 (0.0017) +[2024-12-01 11:12:44,621][04297] Saving new best policy, reward=8.909! +[2024-12-01 11:12:49,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 1290240. Throughput: 0: 958.2. Samples: 321748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:12:49,547][02154] Avg episode reward: [(0, '9.240')] +[2024-12-01 11:12:49,554][04297] Saving new best policy, reward=9.240! +[2024-12-01 11:12:53,504][04311] Updated weights for policy 0, policy_version 320 (0.0019) +[2024-12-01 11:12:54,545][02154] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1310720. Throughput: 0: 1011.2. Samples: 328768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:12:54,550][02154] Avg episode reward: [(0, '9.491')] +[2024-12-01 11:12:54,616][04297] Saving new best policy, reward=9.491! +[2024-12-01 11:12:59,545][02154] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 1327104. Throughput: 0: 995.8. Samples: 330838. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:12:59,548][02154] Avg episode reward: [(0, '9.634')] +[2024-12-01 11:12:59,561][04297] Saving new best policy, reward=9.634! +[2024-12-01 11:13:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1347584. Throughput: 0: 956.2. Samples: 336154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:13:04,547][02154] Avg episode reward: [(0, '9.440')] +[2024-12-01 11:13:04,777][04311] Updated weights for policy 0, policy_version 330 (0.0020) +[2024-12-01 11:13:09,552][02154] Fps is (10 sec: 4502.6, 60 sec: 4027.3, 300 sec: 3901.5). Total num frames: 1372160. Throughput: 0: 994.1. Samples: 343258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:13:09,554][02154] Avg episode reward: [(0, '9.874')] +[2024-12-01 11:13:09,560][04297] Saving new best policy, reward=9.874! +[2024-12-01 11:13:14,546][02154] Fps is (10 sec: 4095.3, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 1388544. Throughput: 0: 1010.3. Samples: 346092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:13:14,551][02154] Avg episode reward: [(0, '9.646')] +[2024-12-01 11:13:15,579][04311] Updated weights for policy 0, policy_version 340 (0.0017) +[2024-12-01 11:13:19,545][02154] Fps is (10 sec: 3279.2, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1404928. Throughput: 0: 956.4. Samples: 350526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:13:19,551][02154] Avg episode reward: [(0, '10.598')] +[2024-12-01 11:13:19,557][04297] Saving new best policy, reward=10.598! +[2024-12-01 11:13:24,545][02154] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1429504. Throughput: 0: 974.2. Samples: 357504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:13:24,551][02154] Avg episode reward: [(0, '10.997')] +[2024-12-01 11:13:24,553][04297] Saving new best policy, reward=10.997! 
+[2024-12-01 11:13:25,121][04311] Updated weights for policy 0, policy_version 350 (0.0022) +[2024-12-01 11:13:29,545][02154] Fps is (10 sec: 4505.2, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 1449984. Throughput: 0: 1006.6. Samples: 360994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:13:29,552][02154] Avg episode reward: [(0, '11.469')] +[2024-12-01 11:13:29,564][04297] Saving new best policy, reward=11.469! +[2024-12-01 11:13:34,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1462272. Throughput: 0: 971.2. Samples: 365450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:13:34,551][02154] Avg episode reward: [(0, '11.156')] +[2024-12-01 11:13:36,611][04311] Updated weights for policy 0, policy_version 360 (0.0024) +[2024-12-01 11:13:39,544][02154] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1486848. Throughput: 0: 960.8. Samples: 372004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:13:39,551][02154] Avg episode reward: [(0, '11.143')] +[2024-12-01 11:13:44,545][02154] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3915.6). Total num frames: 1511424. Throughput: 0: 990.6. Samples: 375416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:13:44,550][02154] Avg episode reward: [(0, '11.309')] +[2024-12-01 11:13:45,885][04311] Updated weights for policy 0, policy_version 370 (0.0018) +[2024-12-01 11:13:49,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1523712. Throughput: 0: 989.6. Samples: 380686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:13:49,554][02154] Avg episode reward: [(0, '12.157')] +[2024-12-01 11:13:49,566][04297] Saving new best policy, reward=12.157! +[2024-12-01 11:13:54,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1544192. Throughput: 0: 960.2. Samples: 386462. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:13:54,552][02154] Avg episode reward: [(0, '11.562')] +[2024-12-01 11:13:56,549][04311] Updated weights for policy 0, policy_version 380 (0.0015) +[2024-12-01 11:13:59,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3915.5). Total num frames: 1568768. Throughput: 0: 974.6. Samples: 389948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:13:59,547][02154] Avg episode reward: [(0, '12.331')] +[2024-12-01 11:13:59,554][04297] Saving new best policy, reward=12.331! +[2024-12-01 11:14:04,547][02154] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3901.6). Total num frames: 1585152. Throughput: 0: 1006.7. Samples: 395828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:14:04,549][02154] Avg episode reward: [(0, '12.222')] +[2024-12-01 11:14:07,963][04311] Updated weights for policy 0, policy_version 390 (0.0019) +[2024-12-01 11:14:09,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3823.4, 300 sec: 3873.8). Total num frames: 1601536. Throughput: 0: 963.6. Samples: 400868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:14:09,549][02154] Avg episode reward: [(0, '12.702')] +[2024-12-01 11:14:09,559][04297] Saving new best policy, reward=12.702! +[2024-12-01 11:14:14,545][02154] Fps is (10 sec: 4097.0, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 1626112. Throughput: 0: 962.2. Samples: 404294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:14:14,552][02154] Avg episode reward: [(0, '14.319')] +[2024-12-01 11:14:14,557][04297] Saving new best policy, reward=14.319! +[2024-12-01 11:14:16,959][04311] Updated weights for policy 0, policy_version 400 (0.0018) +[2024-12-01 11:14:19,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1642496. Throughput: 0: 1013.3. Samples: 411048. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:14:19,547][02154] Avg episode reward: [(0, '15.188')] +[2024-12-01 11:14:19,623][04297] Saving new best policy, reward=15.188! +[2024-12-01 11:14:24,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1658880. Throughput: 0: 958.9. Samples: 415154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:14:24,552][02154] Avg episode reward: [(0, '15.897')] +[2024-12-01 11:14:24,559][04297] Saving new best policy, reward=15.897! +[2024-12-01 11:14:28,708][04311] Updated weights for policy 0, policy_version 410 (0.0018) +[2024-12-01 11:14:29,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 1679360. Throughput: 0: 956.0. Samples: 418436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:14:29,547][02154] Avg episode reward: [(0, '15.752')] +[2024-12-01 11:14:29,579][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth... +[2024-12-01 11:14:29,705][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth +[2024-12-01 11:14:34,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1703936. Throughput: 0: 992.4. Samples: 425344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:14:34,550][02154] Avg episode reward: [(0, '15.207')] +[2024-12-01 11:14:39,440][04311] Updated weights for policy 0, policy_version 420 (0.0030) +[2024-12-01 11:14:39,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1720320. Throughput: 0: 971.0. Samples: 430158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:14:39,550][02154] Avg episode reward: [(0, '15.406')] +[2024-12-01 11:14:44,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1740800. Throughput: 0: 952.2. Samples: 432796. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:14:44,547][02154] Avg episode reward: [(0, '14.643')] +[2024-12-01 11:14:48,797][04311] Updated weights for policy 0, policy_version 430 (0.0019) +[2024-12-01 11:14:49,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1761280. Throughput: 0: 979.0. Samples: 439882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:14:49,547][02154] Avg episode reward: [(0, '16.035')] +[2024-12-01 11:14:49,553][04297] Saving new best policy, reward=16.035! +[2024-12-01 11:14:54,545][02154] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 1777664. Throughput: 0: 988.5. Samples: 445352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-12-01 11:14:54,549][02154] Avg episode reward: [(0, '15.634')] +[2024-12-01 11:14:59,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1798144. Throughput: 0: 959.5. Samples: 447470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:14:59,549][02154] Avg episode reward: [(0, '15.724')] +[2024-12-01 11:15:00,013][04311] Updated weights for policy 0, policy_version 440 (0.0019) +[2024-12-01 11:15:04,545][02154] Fps is (10 sec: 4506.0, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 1822720. Throughput: 0: 963.1. Samples: 454386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:15:04,552][02154] Avg episode reward: [(0, '15.637')] +[2024-12-01 11:15:09,320][04311] Updated weights for policy 0, policy_version 450 (0.0027) +[2024-12-01 11:15:09,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1843200. Throughput: 0: 1017.0. Samples: 460920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:15:09,551][02154] Avg episode reward: [(0, '15.215')] +[2024-12-01 11:15:14,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1855488. Throughput: 0: 989.7. Samples: 462974. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:15:14,548][02154] Avg episode reward: [(0, '16.071')] +[2024-12-01 11:15:14,551][04297] Saving new best policy, reward=16.071! +[2024-12-01 11:15:19,545][02154] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3915.9). Total num frames: 1880064. Throughput: 0: 969.2. Samples: 468956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:15:19,549][02154] Avg episode reward: [(0, '16.809')] +[2024-12-01 11:15:19,558][04297] Saving new best policy, reward=16.809! +[2024-12-01 11:15:20,142][04311] Updated weights for policy 0, policy_version 460 (0.0018) +[2024-12-01 11:15:24,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1900544. Throughput: 0: 1020.2. Samples: 476068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:15:24,547][02154] Avg episode reward: [(0, '16.062')] +[2024-12-01 11:15:29,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1916928. Throughput: 0: 1011.2. Samples: 478302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:15:29,554][02154] Avg episode reward: [(0, '15.872')] +[2024-12-01 11:15:31,341][04311] Updated weights for policy 0, policy_version 470 (0.0016) +[2024-12-01 11:15:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1937408. Throughput: 0: 974.3. Samples: 483726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:15:34,551][02154] Avg episode reward: [(0, '15.878')] +[2024-12-01 11:15:39,545][02154] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1961984. Throughput: 0: 1013.0. Samples: 490938. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:15:39,551][02154] Avg episode reward: [(0, '16.309')] +[2024-12-01 11:15:39,618][04311] Updated weights for policy 0, policy_version 480 (0.0034) +[2024-12-01 11:15:44,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1978368. Throughput: 0: 1034.0. Samples: 493998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:15:44,547][02154] Avg episode reward: [(0, '16.560')] +[2024-12-01 11:15:49,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1998848. Throughput: 0: 978.5. Samples: 498420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:15:49,551][02154] Avg episode reward: [(0, '17.542')] +[2024-12-01 11:15:49,560][04297] Saving new best policy, reward=17.542! +[2024-12-01 11:15:51,079][04311] Updated weights for policy 0, policy_version 490 (0.0034) +[2024-12-01 11:15:54,545][02154] Fps is (10 sec: 4505.3, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 2023424. Throughput: 0: 990.4. Samples: 505490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:15:54,548][02154] Avg episode reward: [(0, '19.366')] +[2024-12-01 11:15:54,557][04297] Saving new best policy, reward=19.366! +[2024-12-01 11:15:59,545][02154] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2039808. Throughput: 0: 1022.3. Samples: 508978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:15:59,550][02154] Avg episode reward: [(0, '19.661')] +[2024-12-01 11:15:59,560][04297] Saving new best policy, reward=19.661! +[2024-12-01 11:16:01,342][04311] Updated weights for policy 0, policy_version 500 (0.0031) +[2024-12-01 11:16:04,545][02154] Fps is (10 sec: 3277.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2056192. Throughput: 0: 990.1. Samples: 513510. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:16:04,549][02154] Avg episode reward: [(0, '19.704')] +[2024-12-01 11:16:04,553][04297] Saving new best policy, reward=19.704! +[2024-12-01 11:16:09,545][02154] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3929.5). Total num frames: 2080768. Throughput: 0: 975.5. Samples: 519964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:16:09,549][02154] Avg episode reward: [(0, '20.702')] +[2024-12-01 11:16:09,557][04297] Saving new best policy, reward=20.702! +[2024-12-01 11:16:11,209][04311] Updated weights for policy 0, policy_version 510 (0.0030) +[2024-12-01 11:16:14,545][02154] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 3957.2). Total num frames: 2101248. Throughput: 0: 1003.8. Samples: 523472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:16:14,550][02154] Avg episode reward: [(0, '21.517')] +[2024-12-01 11:16:14,555][04297] Saving new best policy, reward=21.517! +[2024-12-01 11:16:19,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2117632. Throughput: 0: 999.4. Samples: 528700. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:16:19,548][02154] Avg episode reward: [(0, '20.857')] +[2024-12-01 11:16:22,622][04311] Updated weights for policy 0, policy_version 520 (0.0016) +[2024-12-01 11:16:24,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2138112. Throughput: 0: 964.3. Samples: 534332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:16:24,549][02154] Avg episode reward: [(0, '19.505')] +[2024-12-01 11:16:29,545][02154] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2158592. Throughput: 0: 975.7. Samples: 537906. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:16:29,550][02154] Avg episode reward: [(0, '19.547')] +[2024-12-01 11:16:29,561][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000527_2158592.pth... +[2024-12-01 11:16:29,715][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth +[2024-12-01 11:16:31,705][04311] Updated weights for policy 0, policy_version 530 (0.0018) +[2024-12-01 11:16:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2174976. Throughput: 0: 1013.6. Samples: 544034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-01 11:16:34,550][02154] Avg episode reward: [(0, '19.638')] +[2024-12-01 11:16:39,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2195456. Throughput: 0: 962.5. Samples: 548800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:16:39,549][02154] Avg episode reward: [(0, '19.190')] +[2024-12-01 11:16:42,699][04311] Updated weights for policy 0, policy_version 540 (0.0017) +[2024-12-01 11:16:44,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2215936. Throughput: 0: 965.5. Samples: 552424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:16:44,551][02154] Avg episode reward: [(0, '19.651')] +[2024-12-01 11:16:49,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2236416. Throughput: 0: 1019.4. Samples: 559384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:16:49,548][02154] Avg episode reward: [(0, '21.226')] +[2024-12-01 11:16:54,060][04311] Updated weights for policy 0, policy_version 550 (0.0015) +[2024-12-01 11:16:54,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3929.4). Total num frames: 2252800. Throughput: 0: 972.0. Samples: 563704. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:16:54,551][02154] Avg episode reward: [(0, '21.004')] +[2024-12-01 11:16:59,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2277376. Throughput: 0: 969.0. Samples: 567074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:16:59,546][02154] Avg episode reward: [(0, '20.715')] +[2024-12-01 11:17:02,662][04311] Updated weights for policy 0, policy_version 560 (0.0016) +[2024-12-01 11:17:04,547][02154] Fps is (10 sec: 4914.2, 60 sec: 4095.9, 300 sec: 3971.0). Total num frames: 2301952. Throughput: 0: 1010.5. Samples: 574174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:17:04,549][02154] Avg episode reward: [(0, '21.025')] +[2024-12-01 11:17:09,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2314240. Throughput: 0: 996.4. Samples: 579172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:17:09,547][02154] Avg episode reward: [(0, '21.297')] +[2024-12-01 11:17:13,933][04311] Updated weights for policy 0, policy_version 570 (0.0029) +[2024-12-01 11:17:14,545][02154] Fps is (10 sec: 3277.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2334720. Throughput: 0: 973.3. Samples: 581706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:17:14,553][02154] Avg episode reward: [(0, '20.957')] +[2024-12-01 11:17:19,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2359296. Throughput: 0: 992.8. Samples: 588710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:17:19,550][02154] Avg episode reward: [(0, '21.059')] +[2024-12-01 11:17:23,750][04311] Updated weights for policy 0, policy_version 580 (0.0036) +[2024-12-01 11:17:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2375680. Throughput: 0: 1015.6. Samples: 594500. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:17:24,547][02154] Avg episode reward: [(0, '20.867')] +[2024-12-01 11:17:29,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2392064. Throughput: 0: 982.4. Samples: 596630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:17:29,550][02154] Avg episode reward: [(0, '19.787')] +[2024-12-01 11:17:34,007][04311] Updated weights for policy 0, policy_version 590 (0.0020) +[2024-12-01 11:17:34,545][02154] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2416640. Throughput: 0: 976.0. Samples: 603302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:17:34,551][02154] Avg episode reward: [(0, '19.073')] +[2024-12-01 11:17:39,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2437120. Throughput: 0: 1028.8. Samples: 610000. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:17:39,548][02154] Avg episode reward: [(0, '19.120')] +[2024-12-01 11:17:44,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2453504. Throughput: 0: 999.7. Samples: 612062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:17:44,550][02154] Avg episode reward: [(0, '20.245')] +[2024-12-01 11:17:45,453][04311] Updated weights for policy 0, policy_version 600 (0.0019) +[2024-12-01 11:17:49,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2473984. Throughput: 0: 970.4. Samples: 617838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:17:49,546][02154] Avg episode reward: [(0, '19.747')] +[2024-12-01 11:17:54,070][04311] Updated weights for policy 0, policy_version 610 (0.0020) +[2024-12-01 11:17:54,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2498560. Throughput: 0: 1018.0. Samples: 624984. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:17:54,547][02154] Avg episode reward: [(0, '20.735')] +[2024-12-01 11:17:59,549][02154] Fps is (10 sec: 3684.9, 60 sec: 3890.9, 300 sec: 3943.2). Total num frames: 2510848. Throughput: 0: 1015.1. Samples: 627388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:17:59,551][02154] Avg episode reward: [(0, '21.183')] +[2024-12-01 11:18:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3943.4). Total num frames: 2535424. Throughput: 0: 971.2. Samples: 632416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:18:04,548][02154] Avg episode reward: [(0, '21.601')] +[2024-12-01 11:18:04,550][04297] Saving new best policy, reward=21.601! +[2024-12-01 11:18:05,495][04311] Updated weights for policy 0, policy_version 620 (0.0022) +[2024-12-01 11:18:09,545][02154] Fps is (10 sec: 4507.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2555904. Throughput: 0: 997.4. Samples: 639384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:18:09,547][02154] Avg episode reward: [(0, '21.949')] +[2024-12-01 11:18:09,558][04297] Saving new best policy, reward=21.949! +[2024-12-01 11:18:14,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2572288. Throughput: 0: 1019.5. Samples: 642506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:18:14,548][02154] Avg episode reward: [(0, '22.947')] +[2024-12-01 11:18:14,554][04297] Saving new best policy, reward=22.947! +[2024-12-01 11:18:16,588][04311] Updated weights for policy 0, policy_version 630 (0.0026) +[2024-12-01 11:18:19,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2588672. Throughput: 0: 961.7. Samples: 646580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:18:19,547][02154] Avg episode reward: [(0, '24.463')] +[2024-12-01 11:18:19,557][04297] Saving new best policy, reward=24.463! 
+[2024-12-01 11:18:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2613248. Throughput: 0: 965.3. Samples: 653440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:18:24,546][02154] Avg episode reward: [(0, '23.117')] +[2024-12-01 11:18:26,014][04311] Updated weights for policy 0, policy_version 640 (0.0019) +[2024-12-01 11:18:29,551][02154] Fps is (10 sec: 4503.0, 60 sec: 4027.3, 300 sec: 3971.0). Total num frames: 2633728. Throughput: 0: 996.9. Samples: 656928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:18:29,553][02154] Avg episode reward: [(0, '24.151')] +[2024-12-01 11:18:29,564][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000643_2633728.pth... +[2024-12-01 11:18:29,709][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth +[2024-12-01 11:18:34,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2646016. Throughput: 0: 972.8. Samples: 661616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:18:34,550][02154] Avg episode reward: [(0, '23.539')] +[2024-12-01 11:18:37,485][04311] Updated weights for policy 0, policy_version 650 (0.0018) +[2024-12-01 11:18:39,545][02154] Fps is (10 sec: 3688.6, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2670592. Throughput: 0: 952.4. Samples: 667840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:18:39,547][02154] Avg episode reward: [(0, '23.248')] +[2024-12-01 11:18:44,545][02154] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2695168. Throughput: 0: 978.0. Samples: 671394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:18:44,547][02154] Avg episode reward: [(0, '21.934')] +[2024-12-01 11:18:47,100][04311] Updated weights for policy 0, policy_version 660 (0.0018) +[2024-12-01 11:18:49,549][02154] Fps is (10 sec: 3684.9, 60 sec: 3890.9, 300 sec: 3943.2). 
Total num frames: 2707456. Throughput: 0: 989.2. Samples: 676932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:18:49,551][02154] Avg episode reward: [(0, '23.061')] +[2024-12-01 11:18:54,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2727936. Throughput: 0: 950.4. Samples: 682150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:18:54,547][02154] Avg episode reward: [(0, '24.697')] +[2024-12-01 11:18:54,553][04297] Saving new best policy, reward=24.697! +[2024-12-01 11:18:57,677][04311] Updated weights for policy 0, policy_version 670 (0.0028) +[2024-12-01 11:18:59,545][02154] Fps is (10 sec: 4507.4, 60 sec: 4028.0, 300 sec: 3957.2). Total num frames: 2752512. Throughput: 0: 956.8. Samples: 685564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:18:59,547][02154] Avg episode reward: [(0, '24.392')] +[2024-12-01 11:19:04,545][02154] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3957.1). Total num frames: 2768896. Throughput: 0: 1012.2. Samples: 692130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:19:04,547][02154] Avg episode reward: [(0, '25.435')] +[2024-12-01 11:19:04,555][04297] Saving new best policy, reward=25.435! +[2024-12-01 11:19:09,307][04311] Updated weights for policy 0, policy_version 680 (0.0028) +[2024-12-01 11:19:09,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2785280. Throughput: 0: 955.9. Samples: 696454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:19:09,551][02154] Avg episode reward: [(0, '25.549')] +[2024-12-01 11:19:09,560][04297] Saving new best policy, reward=25.549! +[2024-12-01 11:19:14,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2805760. Throughput: 0: 953.2. Samples: 699818. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:19:14,551][02154] Avg episode reward: [(0, '26.194')] +[2024-12-01 11:19:14,597][04297] Saving new best policy, reward=26.194! +[2024-12-01 11:19:18,346][04311] Updated weights for policy 0, policy_version 690 (0.0038) +[2024-12-01 11:19:19,545][02154] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3957.1). Total num frames: 2826240. Throughput: 0: 998.5. Samples: 706548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:19:19,551][02154] Avg episode reward: [(0, '24.193')] +[2024-12-01 11:19:24,549][02154] Fps is (10 sec: 3684.6, 60 sec: 3822.6, 300 sec: 3943.2). Total num frames: 2842624. Throughput: 0: 959.9. Samples: 711038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:19:24,552][02154] Avg episode reward: [(0, '23.403')] +[2024-12-01 11:19:29,545][02154] Fps is (10 sec: 3686.5, 60 sec: 3823.3, 300 sec: 3929.4). Total num frames: 2863104. Throughput: 0: 946.1. Samples: 713970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:19:29,547][02154] Avg episode reward: [(0, '20.869')] +[2024-12-01 11:19:29,685][04311] Updated weights for policy 0, policy_version 700 (0.0018) +[2024-12-01 11:19:34,545][02154] Fps is (10 sec: 4507.8, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2887680. Throughput: 0: 979.2. Samples: 720990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:19:34,547][02154] Avg episode reward: [(0, '21.022')] +[2024-12-01 11:19:39,545][02154] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2904064. Throughput: 0: 979.6. Samples: 726234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:19:39,549][02154] Avg episode reward: [(0, '21.384')] +[2024-12-01 11:19:40,740][04311] Updated weights for policy 0, policy_version 710 (0.0015) +[2024-12-01 11:19:44,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 2920448. Throughput: 0: 951.6. Samples: 728386. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:19:44,547][02154] Avg episode reward: [(0, '21.556')] +[2024-12-01 11:19:49,550][02154] Fps is (10 sec: 4094.2, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 2945024. Throughput: 0: 958.6. Samples: 735270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:19:49,552][02154] Avg episode reward: [(0, '21.931')] +[2024-12-01 11:19:50,096][04311] Updated weights for policy 0, policy_version 720 (0.0023) +[2024-12-01 11:19:54,545][02154] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3943.3). Total num frames: 2961408. Throughput: 0: 997.4. Samples: 741340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:19:54,549][02154] Avg episode reward: [(0, '23.076')] +[2024-12-01 11:19:59,545][02154] Fps is (10 sec: 3278.4, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 2977792. Throughput: 0: 970.4. Samples: 743484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:19:59,549][02154] Avg episode reward: [(0, '24.049')] +[2024-12-01 11:20:01,310][04311] Updated weights for policy 0, policy_version 730 (0.0030) +[2024-12-01 11:20:04,545][02154] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3002368. Throughput: 0: 961.5. Samples: 749814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:20:04,551][02154] Avg episode reward: [(0, '21.628')] +[2024-12-01 11:20:09,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3022848. Throughput: 0: 1014.2. Samples: 756674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:20:09,547][02154] Avg episode reward: [(0, '22.201')] +[2024-12-01 11:20:11,089][04311] Updated weights for policy 0, policy_version 740 (0.0017) +[2024-12-01 11:20:14,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3039232. Throughput: 0: 994.9. Samples: 758742. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:20:14,555][02154] Avg episode reward: [(0, '21.835')] +[2024-12-01 11:20:19,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3059712. Throughput: 0: 961.8. Samples: 764272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:20:19,550][02154] Avg episode reward: [(0, '24.322')] +[2024-12-01 11:20:21,794][04311] Updated weights for policy 0, policy_version 750 (0.0019) +[2024-12-01 11:20:24,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3957.2). Total num frames: 3084288. Throughput: 0: 997.3. Samples: 771112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:20:24,549][02154] Avg episode reward: [(0, '24.311')] +[2024-12-01 11:20:29,552][02154] Fps is (10 sec: 3683.8, 60 sec: 3890.7, 300 sec: 3929.3). Total num frames: 3096576. Throughput: 0: 1010.2. Samples: 773852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:20:29,556][02154] Avg episode reward: [(0, '24.685')] +[2024-12-01 11:20:29,646][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000757_3100672.pth... +[2024-12-01 11:20:29,806][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000527_2158592.pth +[2024-12-01 11:20:33,133][04311] Updated weights for policy 0, policy_version 760 (0.0013) +[2024-12-01 11:20:34,545][02154] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3117056. Throughput: 0: 962.2. Samples: 778564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:20:34,547][02154] Avg episode reward: [(0, '24.915')] +[2024-12-01 11:20:39,545][02154] Fps is (10 sec: 4508.7, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3141632. Throughput: 0: 983.0. Samples: 785572. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:20:39,547][02154] Avg episode reward: [(0, '25.423')] +[2024-12-01 11:20:41,698][04311] Updated weights for policy 0, policy_version 770 (0.0014) +[2024-12-01 11:20:44,546][02154] Fps is (10 sec: 4504.9, 60 sec: 4027.6, 300 sec: 3943.2). Total num frames: 3162112. Throughput: 0: 1014.3. Samples: 789130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:20:44,549][02154] Avg episode reward: [(0, '24.226')] +[2024-12-01 11:20:49,546][02154] Fps is (10 sec: 3276.5, 60 sec: 3823.2, 300 sec: 3901.6). Total num frames: 3174400. Throughput: 0: 968.5. Samples: 793398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:20:49,550][02154] Avg episode reward: [(0, '22.657')] +[2024-12-01 11:20:53,245][04311] Updated weights for policy 0, policy_version 780 (0.0031) +[2024-12-01 11:20:54,545][02154] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3198976. Throughput: 0: 962.7. Samples: 799996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:20:54,551][02154] Avg episode reward: [(0, '24.286')] +[2024-12-01 11:20:59,545][02154] Fps is (10 sec: 4506.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3219456. Throughput: 0: 993.6. Samples: 803454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:20:59,550][02154] Avg episode reward: [(0, '26.326')] +[2024-12-01 11:20:59,565][04297] Saving new best policy, reward=26.326! +[2024-12-01 11:21:04,096][04311] Updated weights for policy 0, policy_version 790 (0.0021) +[2024-12-01 11:21:04,545][02154] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3235840. Throughput: 0: 984.8. Samples: 808586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:21:04,550][02154] Avg episode reward: [(0, '27.308')] +[2024-12-01 11:21:04,557][04297] Saving new best policy, reward=27.308! 
+[2024-12-01 11:21:09,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3256320. Throughput: 0: 961.7. Samples: 814390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:21:09,553][02154] Avg episode reward: [(0, '26.186')] +[2024-12-01 11:21:13,490][04311] Updated weights for policy 0, policy_version 800 (0.0016) +[2024-12-01 11:21:14,545][02154] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3280896. Throughput: 0: 978.9. Samples: 817896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:21:14,548][02154] Avg episode reward: [(0, '26.529')] +[2024-12-01 11:21:19,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3293184. Throughput: 0: 1005.4. Samples: 823808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:21:19,551][02154] Avg episode reward: [(0, '24.745')] +[2024-12-01 11:21:24,545][02154] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3313664. Throughput: 0: 957.6. Samples: 828664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:21:24,547][02154] Avg episode reward: [(0, '24.575')] +[2024-12-01 11:21:25,039][04311] Updated weights for policy 0, policy_version 810 (0.0018) +[2024-12-01 11:21:29,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4028.2, 300 sec: 3943.3). Total num frames: 3338240. Throughput: 0: 954.4. Samples: 832076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:21:29,547][02154] Avg episode reward: [(0, '23.801')] +[2024-12-01 11:21:34,545][02154] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3354624. Throughput: 0: 1010.1. Samples: 838850. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:21:34,552][02154] Avg episode reward: [(0, '23.884')] +[2024-12-01 11:21:34,583][04311] Updated weights for policy 0, policy_version 820 (0.0014) +[2024-12-01 11:21:39,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3371008. Throughput: 0: 956.9. Samples: 843058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:21:39,550][02154] Avg episode reward: [(0, '24.720')] +[2024-12-01 11:21:44,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 3395584. Throughput: 0: 954.0. Samples: 846386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:21:44,552][02154] Avg episode reward: [(0, '25.642')] +[2024-12-01 11:21:45,310][04311] Updated weights for policy 0, policy_version 830 (0.0023) +[2024-12-01 11:21:49,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 3416064. Throughput: 0: 997.6. Samples: 853476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:21:49,547][02154] Avg episode reward: [(0, '26.876')] +[2024-12-01 11:21:54,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3428352. Throughput: 0: 971.4. Samples: 858102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:21:54,550][02154] Avg episode reward: [(0, '26.875')] +[2024-12-01 11:21:56,845][04311] Updated weights for policy 0, policy_version 840 (0.0034) +[2024-12-01 11:21:59,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3452928. Throughput: 0: 950.2. Samples: 860656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:21:59,552][02154] Avg episode reward: [(0, '26.071')] +[2024-12-01 11:22:04,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3473408. Throughput: 0: 973.4. Samples: 867612. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:22:04,546][02154] Avg episode reward: [(0, '24.160')] +[2024-12-01 11:22:05,733][04311] Updated weights for policy 0, policy_version 850 (0.0041) +[2024-12-01 11:22:09,545][02154] Fps is (10 sec: 3686.1, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3489792. Throughput: 0: 991.1. Samples: 873264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:22:09,553][02154] Avg episode reward: [(0, '23.148')] +[2024-12-01 11:22:14,544][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3510272. Throughput: 0: 961.1. Samples: 875324. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-01 11:22:14,547][02154] Avg episode reward: [(0, '22.031')] +[2024-12-01 11:22:17,172][04311] Updated weights for policy 0, policy_version 860 (0.0023) +[2024-12-01 11:22:19,545][02154] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3530752. Throughput: 0: 959.2. Samples: 882014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:22:19,547][02154] Avg episode reward: [(0, '22.965')] +[2024-12-01 11:22:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3551232. Throughput: 0: 1007.4. Samples: 888392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:22:24,551][02154] Avg episode reward: [(0, '23.408')] +[2024-12-01 11:22:28,284][04311] Updated weights for policy 0, policy_version 870 (0.0023) +[2024-12-01 11:22:29,551][02154] Fps is (10 sec: 3275.0, 60 sec: 3754.3, 300 sec: 3887.7). Total num frames: 3563520. Throughput: 0: 978.8. Samples: 890440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:22:29,552][02154] Avg episode reward: [(0, '23.721')] +[2024-12-01 11:22:29,573][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000870_3563520.pth... 
+[2024-12-01 11:22:29,711][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000643_2633728.pth +[2024-12-01 11:22:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3588096. Throughput: 0: 946.3. Samples: 896058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:22:34,551][02154] Avg episode reward: [(0, '25.978')] +[2024-12-01 11:22:37,852][04311] Updated weights for policy 0, policy_version 880 (0.0026) +[2024-12-01 11:22:39,545][02154] Fps is (10 sec: 4918.1, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3612672. Throughput: 0: 997.2. Samples: 902974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:22:39,546][02154] Avg episode reward: [(0, '25.812')] +[2024-12-01 11:22:44,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3624960. Throughput: 0: 996.3. Samples: 905490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:22:44,552][02154] Avg episode reward: [(0, '25.971')] +[2024-12-01 11:22:49,242][04311] Updated weights for policy 0, policy_version 890 (0.0017) +[2024-12-01 11:22:49,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3645440. Throughput: 0: 949.4. Samples: 910336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:22:49,546][02154] Avg episode reward: [(0, '25.548')] +[2024-12-01 11:22:54,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.6). Total num frames: 3665920. Throughput: 0: 978.2. Samples: 917280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-12-01 11:22:54,552][02154] Avg episode reward: [(0, '24.778')] +[2024-12-01 11:22:59,178][04311] Updated weights for policy 0, policy_version 900 (0.0013) +[2024-12-01 11:22:59,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3686400. Throughput: 0: 1006.4. Samples: 920610. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:22:59,551][02154] Avg episode reward: [(0, '25.657')] +[2024-12-01 11:23:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3702784. Throughput: 0: 952.7. Samples: 924884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-12-01 11:23:04,552][02154] Avg episode reward: [(0, '25.224')] +[2024-12-01 11:23:09,482][04311] Updated weights for policy 0, policy_version 910 (0.0029) +[2024-12-01 11:23:09,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3727360. Throughput: 0: 963.2. Samples: 931736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:23:09,547][02154] Avg episode reward: [(0, '26.063')] +[2024-12-01 11:23:14,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3747840. Throughput: 0: 996.7. Samples: 935284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:23:14,553][02154] Avg episode reward: [(0, '25.871')] +[2024-12-01 11:23:19,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 3760128. Throughput: 0: 978.8. Samples: 940106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:23:19,552][02154] Avg episode reward: [(0, '26.466')] +[2024-12-01 11:23:20,921][04311] Updated weights for policy 0, policy_version 920 (0.0028) +[2024-12-01 11:23:24,544][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.7). Total num frames: 3784704. Throughput: 0: 957.1. Samples: 946044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:23:24,549][02154] Avg episode reward: [(0, '24.950')] +[2024-12-01 11:23:29,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3929.4). Total num frames: 3805184. Throughput: 0: 980.7. Samples: 949622. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:23:29,547][02154] Avg episode reward: [(0, '23.885')] +[2024-12-01 11:23:29,575][04311] Updated weights for policy 0, policy_version 930 (0.0022) +[2024-12-01 11:23:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3821568. Throughput: 0: 999.5. Samples: 955314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-01 11:23:34,551][02154] Avg episode reward: [(0, '23.925')] +[2024-12-01 11:23:39,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3842048. Throughput: 0: 962.5. Samples: 960594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:23:39,547][02154] Avg episode reward: [(0, '23.939')] +[2024-12-01 11:23:41,009][04311] Updated weights for policy 0, policy_version 940 (0.0029) +[2024-12-01 11:23:44,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3866624. Throughput: 0: 967.6. Samples: 964152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:23:44,547][02154] Avg episode reward: [(0, '23.039')] +[2024-12-01 11:23:49,548][02154] Fps is (10 sec: 4094.4, 60 sec: 3959.2, 300 sec: 3915.4). Total num frames: 3883008. Throughput: 0: 1015.5. Samples: 970584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:23:49,551][02154] Avg episode reward: [(0, '22.681')] +[2024-12-01 11:23:51,998][04311] Updated weights for policy 0, policy_version 950 (0.0023) +[2024-12-01 11:23:54,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3899392. Throughput: 0: 960.0. Samples: 974936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:23:54,555][02154] Avg episode reward: [(0, '22.218')] +[2024-12-01 11:23:59,545][02154] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3923968. Throughput: 0: 960.3. Samples: 978498. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-01 11:23:59,547][02154] Avg episode reward: [(0, '23.019')] +[2024-12-01 11:24:01,200][04311] Updated weights for policy 0, policy_version 960 (0.0025) +[2024-12-01 11:24:04,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3944448. Throughput: 0: 1009.2. Samples: 985518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:24:04,551][02154] Avg episode reward: [(0, '22.438')] +[2024-12-01 11:24:09,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3956736. Throughput: 0: 976.3. Samples: 989978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-01 11:24:09,550][02154] Avg episode reward: [(0, '21.696')] +[2024-12-01 11:24:12,606][04311] Updated weights for policy 0, policy_version 970 (0.0051) +[2024-12-01 11:24:14,545][02154] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3981312. Throughput: 0: 960.3. Samples: 992836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-01 11:24:14,552][02154] Avg episode reward: [(0, '22.582')] +[2024-12-01 11:24:19,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 4001792. Throughput: 0: 991.5. Samples: 999932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-12-01 11:24:19,552][02154] Avg episode reward: [(0, '23.238')] +[2024-12-01 11:24:19,673][04297] Stopping Batcher_0... +[2024-12-01 11:24:19,674][04297] Loop batcher_evt_loop terminating... +[2024-12-01 11:24:19,673][02154] Component Batcher_0 stopped! +[2024-12-01 11:24:19,679][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-12-01 11:24:19,732][04311] Weights refcount: 2 0 +[2024-12-01 11:24:19,737][04311] Stopping InferenceWorker_p0-w0... +[2024-12-01 11:24:19,738][04311] Loop inference_proc0-0_evt_loop terminating... +[2024-12-01 11:24:19,737][02154] Component InferenceWorker_p0-w0 stopped! 
+[2024-12-01 11:24:19,799][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000757_3100672.pth +[2024-12-01 11:24:19,825][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-12-01 11:24:19,996][04297] Stopping LearnerWorker_p0... +[2024-12-01 11:24:19,997][04297] Loop learner_proc0_evt_loop terminating... +[2024-12-01 11:24:19,997][02154] Component LearnerWorker_p0 stopped! +[2024-12-01 11:24:20,094][02154] Component RolloutWorker_w1 stopped! +[2024-12-01 11:24:20,100][04312] Stopping RolloutWorker_w1... +[2024-12-01 11:24:20,103][04312] Loop rollout_proc1_evt_loop terminating... +[2024-12-01 11:24:20,121][04310] Stopping RolloutWorker_w0... +[2024-12-01 11:24:20,124][04317] Stopping RolloutWorker_w6... +[2024-12-01 11:24:20,125][04317] Loop rollout_proc6_evt_loop terminating... +[2024-12-01 11:24:20,121][02154] Component RolloutWorker_w0 stopped! +[2024-12-01 11:24:20,126][02154] Component RolloutWorker_w6 stopped! +[2024-12-01 11:24:20,133][04310] Loop rollout_proc0_evt_loop terminating... +[2024-12-01 11:24:20,143][04313] Stopping RolloutWorker_w2... +[2024-12-01 11:24:20,143][04313] Loop rollout_proc2_evt_loop terminating... +[2024-12-01 11:24:20,144][04314] Stopping RolloutWorker_w3... +[2024-12-01 11:24:20,145][04314] Loop rollout_proc3_evt_loop terminating... +[2024-12-01 11:24:20,144][02154] Component RolloutWorker_w2 stopped! +[2024-12-01 11:24:20,153][02154] Component RolloutWorker_w3 stopped! +[2024-12-01 11:24:20,162][02154] Component RolloutWorker_w7 stopped! +[2024-12-01 11:24:20,168][04318] Stopping RolloutWorker_w7... +[2024-12-01 11:24:20,168][04318] Loop rollout_proc7_evt_loop terminating... +[2024-12-01 11:24:20,200][04315] Stopping RolloutWorker_w4... +[2024-12-01 11:24:20,199][02154] Component RolloutWorker_w4 stopped! +[2024-12-01 11:24:20,200][04315] Loop rollout_proc4_evt_loop terminating... 
+[2024-12-01 11:24:20,222][02154] Component RolloutWorker_w5 stopped! +[2024-12-01 11:24:20,226][02154] Waiting for process learner_proc0 to stop... +[2024-12-01 11:24:20,234][04316] Stopping RolloutWorker_w5... +[2024-12-01 11:24:20,234][04316] Loop rollout_proc5_evt_loop terminating... +[2024-12-01 11:24:22,232][02154] Waiting for process inference_proc0-0 to join... +[2024-12-01 11:24:22,238][02154] Waiting for process rollout_proc0 to join... +[2024-12-01 11:24:24,949][02154] Waiting for process rollout_proc1 to join... +[2024-12-01 11:24:25,062][02154] Waiting for process rollout_proc2 to join... +[2024-12-01 11:24:25,067][02154] Waiting for process rollout_proc3 to join... +[2024-12-01 11:24:25,070][02154] Waiting for process rollout_proc4 to join... +[2024-12-01 11:24:25,074][02154] Waiting for process rollout_proc5 to join... +[2024-12-01 11:24:25,077][02154] Waiting for process rollout_proc6 to join... +[2024-12-01 11:24:25,081][02154] Waiting for process rollout_proc7 to join... 
+[2024-12-01 11:24:25,084][02154] Batcher 0 profile tree view: +batching: 25.9505, releasing_batches: 0.0308 +[2024-12-01 11:24:25,086][02154] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 - wait_policy_total: 405.9747 -update_model: 8.6611 - weight_update: 0.0030 + wait_policy_total: 406.1312 +update_model: 8.6124 + weight_update: 0.0022 one_step: 0.0024 - handle_policy_step: 608.9111 - deserialize: 15.0671, stack: 3.4031, obs_to_device_normalize: 128.1769, forward: 305.6480, send_messages: 30.3155 - prepare_outputs: 95.9428 - to_cpu: 58.1498 -[2024-11-28 08:47:04,965][00195] Learner 0 profile tree view: -misc: 0.0050, prepare_batch: 14.0583 -train: 76.6414 - epoch_init: 0.0079, minibatch_init: 0.0124, losses_postprocess: 0.6427, kl_divergence: 0.6697, after_optimizer: 34.0289 - calculate_losses: 28.1813 - losses_init: 0.0055, forward_head: 1.3473, bptt_initial: 18.9848, tail: 1.2203, advantages_returns: 0.3103, losses: 3.9149 - bptt: 2.0505 - bptt_forward_core: 1.9567 - update: 12.5077 - clip: 0.9133 -[2024-11-28 08:47:04,967][00195] RolloutWorker_w0 profile tree view: -wait_for_trajectories: 0.2925, enqueue_policy_requests: 99.5740, env_step: 837.2425, overhead: 13.5808, complete_rollouts: 7.3692 -save_policy_outputs: 21.7104 - split_output_tensors: 8.8833 -[2024-11-28 08:47:04,968][00195] RolloutWorker_w7 profile tree view: -wait_for_trajectories: 0.3746, enqueue_policy_requests: 102.9072, env_step: 835.3827, overhead: 13.0551, complete_rollouts: 6.2838 -save_policy_outputs: 21.1759 - split_output_tensors: 8.6046 -[2024-11-28 08:47:04,971][00195] Loop Runner_EvtLoop terminating... 
-[2024-11-28 08:47:04,972][00195] Runner profile tree view: -main_loop: 1098.3587 -[2024-11-28 08:47:04,974][00195] Collected {0: 4005888}, FPS: 3647.2 -[2024-11-28 09:07:36,401][00195] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2024-11-28 09:07:36,403][00195] Overriding arg 'num_workers' with value 1 passed from command line -[2024-11-28 09:07:36,405][00195] Adding new argument 'no_render'=True that is not in the saved config file! -[2024-11-28 09:07:36,408][00195] Adding new argument 'save_video'=True that is not in the saved config file! -[2024-11-28 09:07:36,409][00195] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2024-11-28 09:07:36,411][00195] Adding new argument 'video_name'=None that is not in the saved config file! -[2024-11-28 09:07:36,413][00195] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! -[2024-11-28 09:07:36,414][00195] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2024-11-28 09:07:36,415][00195] Adding new argument 'push_to_hub'=False that is not in the saved config file! -[2024-11-28 09:07:36,416][00195] Adding new argument 'hf_repository'=None that is not in the saved config file! -[2024-11-28 09:07:36,417][00195] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2024-11-28 09:07:36,418][00195] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2024-11-28 09:07:36,420][00195] Adding new argument 'train_script'=None that is not in the saved config file! -[2024-11-28 09:07:36,421][00195] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
-[2024-11-28 09:07:36,422][00195] Using frameskip 1 and render_action_repeat=4 for evaluation -[2024-11-28 09:07:36,457][00195] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-11-28 09:07:36,460][00195] RunningMeanStd input shape: (3, 72, 128) -[2024-11-28 09:07:36,464][00195] RunningMeanStd input shape: (1,) -[2024-11-28 09:07:36,479][00195] ConvEncoder: input_channels=3 -[2024-11-28 09:07:36,594][00195] Conv encoder output size: 512 -[2024-11-28 09:07:36,596][00195] Policy head output size: 512 -[2024-11-28 09:07:36,914][00195] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2024-11-28 09:07:37,783][00195] Num frames 100... -[2024-11-28 09:07:37,917][00195] Num frames 200... -[2024-11-28 09:07:38,055][00195] Num frames 300... -[2024-11-28 09:07:38,192][00195] Num frames 400... -[2024-11-28 09:07:38,341][00195] Num frames 500... -[2024-11-28 09:07:38,478][00195] Num frames 600... -[2024-11-28 09:07:38,618][00195] Num frames 700... -[2024-11-28 09:07:38,753][00195] Num frames 800... -[2024-11-28 09:07:38,901][00195] Num frames 900... -[2024-11-28 09:07:39,037][00195] Num frames 1000... -[2024-11-28 09:07:39,133][00195] Avg episode rewards: #0: 27.290, true rewards: #0: 10.290 -[2024-11-28 09:07:39,135][00195] Avg episode reward: 27.290, avg true_objective: 10.290 -[2024-11-28 09:07:39,232][00195] Num frames 1100... -[2024-11-28 09:07:39,377][00195] Num frames 1200... -[2024-11-28 09:07:39,514][00195] Num frames 1300... -[2024-11-28 09:07:39,652][00195] Num frames 1400... -[2024-11-28 09:07:39,787][00195] Num frames 1500... -[2024-11-28 09:07:39,927][00195] Num frames 1600... -[2024-11-28 09:07:40,065][00195] Num frames 1700... -[2024-11-28 09:07:40,200][00195] Num frames 1800... 
-[2024-11-28 09:07:40,374][00195] Avg episode rewards: #0: 24.930, true rewards: #0: 9.430 -[2024-11-28 09:07:40,376][00195] Avg episode reward: 24.930, avg true_objective: 9.430 -[2024-11-28 09:07:40,398][00195] Num frames 1900... -[2024-11-28 09:07:40,537][00195] Num frames 2000... -[2024-11-28 09:07:40,682][00195] Num frames 2100... -[2024-11-28 09:07:40,837][00195] Num frames 2200... -[2024-11-28 09:07:40,973][00195] Num frames 2300... -[2024-11-28 09:07:41,106][00195] Num frames 2400... -[2024-11-28 09:07:41,205][00195] Avg episode rewards: #0: 19.100, true rewards: #0: 8.100 -[2024-11-28 09:07:41,207][00195] Avg episode reward: 19.100, avg true_objective: 8.100 -[2024-11-28 09:07:41,312][00195] Num frames 2500... -[2024-11-28 09:07:41,456][00195] Num frames 2600... -[2024-11-28 09:07:41,624][00195] Avg episode rewards: #0: 14.965, true rewards: #0: 6.715 -[2024-11-28 09:07:41,626][00195] Avg episode reward: 14.965, avg true_objective: 6.715 -[2024-11-28 09:07:41,650][00195] Num frames 2700... -[2024-11-28 09:07:41,784][00195] Num frames 2800... -[2024-11-28 09:07:41,933][00195] Num frames 2900... -[2024-11-28 09:07:42,072][00195] Num frames 3000... -[2024-11-28 09:07:42,211][00195] Num frames 3100... -[2024-11-28 09:07:42,349][00195] Num frames 3200... -[2024-11-28 09:07:42,488][00195] Num frames 3300... -[2024-11-28 09:07:42,623][00195] Num frames 3400... -[2024-11-28 09:07:42,762][00195] Num frames 3500... -[2024-11-28 09:07:42,900][00195] Num frames 3600... -[2024-11-28 09:07:42,999][00195] Avg episode rewards: #0: 16.058, true rewards: #0: 7.258 -[2024-11-28 09:07:43,001][00195] Avg episode reward: 16.058, avg true_objective: 7.258 -[2024-11-28 09:07:43,104][00195] Num frames 3700... -[2024-11-28 09:07:43,245][00195] Num frames 3800... -[2024-11-28 09:07:43,389][00195] Num frames 3900... -[2024-11-28 09:07:43,539][00195] Num frames 4000... -[2024-11-28 09:07:43,679][00195] Num frames 4100... -[2024-11-28 09:07:43,818][00195] Num frames 4200... 
-[2024-11-28 09:07:43,954][00195] Num frames 4300... -[2024-11-28 09:07:44,090][00195] Num frames 4400... -[2024-11-28 09:07:44,230][00195] Avg episode rewards: #0: 16.602, true rewards: #0: 7.435 -[2024-11-28 09:07:44,232][00195] Avg episode reward: 16.602, avg true_objective: 7.435 -[2024-11-28 09:07:44,285][00195] Num frames 4500... -[2024-11-28 09:07:44,419][00195] Num frames 4600... -[2024-11-28 09:07:44,563][00195] Num frames 4700... -[2024-11-28 09:07:44,698][00195] Num frames 4800... -[2024-11-28 09:07:44,842][00195] Num frames 4900... -[2024-11-28 09:07:44,983][00195] Num frames 5000... -[2024-11-28 09:07:45,117][00195] Num frames 5100... -[2024-11-28 09:07:45,248][00195] Num frames 5200... -[2024-11-28 09:07:45,380][00195] Num frames 5300... -[2024-11-28 09:07:45,521][00195] Num frames 5400... -[2024-11-28 09:07:45,656][00195] Num frames 5500... -[2024-11-28 09:07:45,798][00195] Num frames 5600... -[2024-11-28 09:07:45,938][00195] Num frames 5700... -[2024-11-28 09:07:46,073][00195] Num frames 5800... -[2024-11-28 09:07:46,206][00195] Num frames 5900... -[2024-11-28 09:07:46,390][00195] Num frames 6000... -[2024-11-28 09:07:46,603][00195] Num frames 6100... -[2024-11-28 09:07:46,765][00195] Avg episode rewards: #0: 19.931, true rewards: #0: 8.789 -[2024-11-28 09:07:46,767][00195] Avg episode reward: 19.931, avg true_objective: 8.789 -[2024-11-28 09:07:46,864][00195] Num frames 6200... -[2024-11-28 09:07:47,052][00195] Num frames 6300... -[2024-11-28 09:07:47,260][00195] Num frames 6400... -[2024-11-28 09:07:47,440][00195] Num frames 6500... -[2024-11-28 09:07:47,641][00195] Num frames 6600... -[2024-11-28 09:07:47,842][00195] Num frames 6700... -[2024-11-28 09:07:48,041][00195] Num frames 6800... -[2024-11-28 09:07:48,215][00195] Avg episode rewards: #0: 19.449, true rewards: #0: 8.574 -[2024-11-28 09:07:48,217][00195] Avg episode reward: 19.449, avg true_objective: 8.574 -[2024-11-28 09:07:48,303][00195] Num frames 6900... 
-[2024-11-28 09:07:48,496][00195] Num frames 7000... -[2024-11-28 09:07:48,695][00195] Num frames 7100... -[2024-11-28 09:07:48,907][00195] Num frames 7200... -[2024-11-28 09:07:49,109][00195] Num frames 7300... -[2024-11-28 09:07:49,250][00195] Num frames 7400... -[2024-11-28 09:07:49,374][00195] Avg episode rewards: #0: 18.501, true rewards: #0: 8.279 -[2024-11-28 09:07:49,376][00195] Avg episode reward: 18.501, avg true_objective: 8.279 -[2024-11-28 09:07:49,440][00195] Num frames 7500... -[2024-11-28 09:07:49,573][00195] Num frames 7600... -[2024-11-28 09:07:49,712][00195] Num frames 7700... -[2024-11-28 09:07:49,864][00195] Num frames 7800... -[2024-11-28 09:07:50,013][00195] Avg episode rewards: #0: 17.267, true rewards: #0: 7.867 -[2024-11-28 09:07:50,015][00195] Avg episode reward: 17.267, avg true_objective: 7.867 -[2024-11-28 09:08:44,584][00195] Replay video saved to /content/train_dir/default_experiment/replay.mp4! -[2024-11-28 09:16:05,200][00195] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2024-11-28 09:16:05,202][00195] Overriding arg 'num_workers' with value 1 passed from command line -[2024-11-28 09:16:05,204][00195] Adding new argument 'no_render'=True that is not in the saved config file! -[2024-11-28 09:16:05,206][00195] Adding new argument 'save_video'=True that is not in the saved config file! -[2024-11-28 09:16:05,208][00195] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2024-11-28 09:16:05,210][00195] Adding new argument 'video_name'=None that is not in the saved config file! -[2024-11-28 09:16:05,211][00195] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2024-11-28 09:16:05,212][00195] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2024-11-28 09:16:05,213][00195] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
-[2024-11-28 09:16:05,214][00195] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2024-11-28 09:16:05,216][00195] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2024-11-28 09:16:05,217][00195] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2024-11-28 09:16:05,218][00195] Adding new argument 'train_script'=None that is not in the saved config file! -[2024-11-28 09:16:05,219][00195] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2024-11-28 09:16:05,220][00195] Using frameskip 1 and render_action_repeat=4 for evaluation -[2024-11-28 09:16:05,254][00195] RunningMeanStd input shape: (3, 72, 128) -[2024-11-28 09:16:05,258][00195] RunningMeanStd input shape: (1,) -[2024-11-28 09:16:05,271][00195] ConvEncoder: input_channels=3 -[2024-11-28 09:16:05,313][00195] Conv encoder output size: 512 -[2024-11-28 09:16:05,315][00195] Policy head output size: 512 -[2024-11-28 09:16:05,336][00195] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2024-11-28 09:16:05,811][00195] Num frames 100... -[2024-11-28 09:16:05,972][00195] Num frames 200... -[2024-11-28 09:16:06,114][00195] Num frames 300... -[2024-11-28 09:16:06,250][00195] Num frames 400... -[2024-11-28 09:16:06,389][00195] Num frames 500... -[2024-11-28 09:16:06,603][00195] Num frames 600... -[2024-11-28 09:16:06,810][00195] Avg episode rewards: #0: 11.720, true rewards: #0: 6.720 -[2024-11-28 09:16:06,813][00195] Avg episode reward: 11.720, avg true_objective: 6.720 -[2024-11-28 09:16:06,878][00195] Num frames 700... -[2024-11-28 09:16:07,069][00195] Num frames 800... -[2024-11-28 09:16:07,261][00195] Num frames 900... -[2024-11-28 09:16:07,457][00195] Num frames 1000... 
-[2024-11-28 09:16:20,450][00195] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2024-11-28 09:16:20,453][00195] Overriding arg 'num_workers' with value 1 passed from command line -[2024-11-28 09:16:20,455][00195] Adding new argument 'no_render'=True that is not in the saved config file! -[2024-11-28 09:16:20,457][00195] Adding new argument 'save_video'=True that is not in the saved config file! -[2024-11-28 09:16:20,459][00195] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2024-11-28 09:16:20,461][00195] Adding new argument 'video_name'=None that is not in the saved config file! -[2024-11-28 09:16:20,463][00195] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2024-11-28 09:16:20,464][00195] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2024-11-28 09:16:20,468][00195] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2024-11-28 09:16:20,469][00195] Adding new argument 'hf_repository'='Farseer-W/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2024-11-28 09:16:20,470][00195] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2024-11-28 09:16:20,472][00195] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2024-11-28 09:16:20,475][00195] Adding new argument 'train_script'=None that is not in the saved config file! -[2024-11-28 09:16:20,476][00195] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
-[2024-11-28 09:16:20,478][00195] Using frameskip 1 and render_action_repeat=4 for evaluation -[2024-11-28 09:16:20,544][00195] RunningMeanStd input shape: (3, 72, 128) -[2024-11-28 09:16:20,547][00195] RunningMeanStd input shape: (1,) -[2024-11-28 09:16:20,570][00195] ConvEncoder: input_channels=3 -[2024-11-28 09:16:20,639][00195] Conv encoder output size: 512 -[2024-11-28 09:16:20,642][00195] Policy head output size: 512 -[2024-11-28 09:16:20,676][00195] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2024-11-28 09:16:21,270][00195] Num frames 100... -[2024-11-28 09:16:21,409][00195] Num frames 200... -[2024-11-28 09:16:21,547][00195] Num frames 300... -[2024-11-28 09:16:21,690][00195] Num frames 400... -[2024-11-28 09:16:21,803][00195] Avg episode rewards: #0: 7.400, true rewards: #0: 4.400 -[2024-11-28 09:16:21,805][00195] Avg episode reward: 7.400, avg true_objective: 4.400 -[2024-11-28 09:16:21,899][00195] Num frames 500... -[2024-11-28 09:16:22,047][00195] Num frames 600... -[2024-11-28 09:16:22,187][00195] Num frames 700... -[2024-11-28 09:16:22,327][00195] Num frames 800... -[2024-11-28 09:16:22,461][00195] Num frames 900... -[2024-11-28 09:16:22,597][00195] Num frames 1000... -[2024-11-28 09:16:22,740][00195] Num frames 1100... -[2024-11-28 09:16:22,884][00195] Num frames 1200... -[2024-11-28 09:16:22,994][00195] Avg episode rewards: #0: 12.200, true rewards: #0: 6.200 -[2024-11-28 09:16:22,997][00195] Avg episode reward: 12.200, avg true_objective: 6.200 -[2024-11-28 09:16:23,086][00195] Num frames 1300... -[2024-11-28 09:16:23,221][00195] Num frames 1400... -[2024-11-28 09:16:23,358][00195] Num frames 1500... -[2024-11-28 09:16:23,499][00195] Num frames 1600... -[2024-11-28 09:16:23,631][00195] Num frames 1700... -[2024-11-28 09:16:23,767][00195] Num frames 1800... -[2024-11-28 09:16:23,918][00195] Num frames 1900... -[2024-11-28 09:16:24,067][00195] Num frames 2000... 
-[2024-11-28 09:16:24,212][00195] Num frames 2100... -[2024-11-28 09:16:24,349][00195] Num frames 2200... -[2024-11-28 09:16:24,490][00195] Num frames 2300... -[2024-11-28 09:16:24,625][00195] Num frames 2400... -[2024-11-28 09:16:24,764][00195] Num frames 2500... -[2024-11-28 09:16:24,917][00195] Num frames 2600... -[2024-11-28 09:16:25,061][00195] Num frames 2700... -[2024-11-28 09:16:25,195][00195] Num frames 2800... -[2024-11-28 09:16:25,347][00195] Avg episode rewards: #0: 22.907, true rewards: #0: 9.573 -[2024-11-28 09:16:25,349][00195] Avg episode reward: 22.907, avg true_objective: 9.573 -[2024-11-28 09:16:25,394][00195] Num frames 2900... -[2024-11-28 09:16:25,532][00195] Num frames 3000... -[2024-11-28 09:16:25,672][00195] Num frames 3100... -[2024-11-28 09:16:25,814][00195] Num frames 3200... -[2024-11-28 09:16:25,953][00195] Num frames 3300... -[2024-11-28 09:16:26,102][00195] Num frames 3400... -[2024-11-28 09:16:26,244][00195] Num frames 3500... -[2024-11-28 09:16:26,318][00195] Avg episode rewards: #0: 20.030, true rewards: #0: 8.780 -[2024-11-28 09:16:26,319][00195] Avg episode reward: 20.030, avg true_objective: 8.780 -[2024-11-28 09:16:26,441][00195] Num frames 3600... -[2024-11-28 09:16:26,571][00195] Num frames 3700... -[2024-11-28 09:16:26,706][00195] Num frames 3800... -[2024-11-28 09:16:26,851][00195] Num frames 3900... -[2024-11-28 09:16:27,026][00195] Num frames 4000... -[2024-11-28 09:16:27,172][00195] Avg episode rewards: #0: 17.512, true rewards: #0: 8.112 -[2024-11-28 09:16:27,174][00195] Avg episode reward: 17.512, avg true_objective: 8.112 -[2024-11-28 09:16:27,239][00195] Num frames 4100... -[2024-11-28 09:16:27,371][00195] Num frames 4200... -[2024-11-28 09:16:27,509][00195] Num frames 4300... -[2024-11-28 09:16:27,676][00195] Num frames 4400... -[2024-11-28 09:16:27,830][00195] Num frames 4500... -[2024-11-28 09:16:27,975][00195] Num frames 4600... -[2024-11-28 09:16:28,128][00195] Num frames 4700... 
-[2024-11-28 09:16:28,270][00195] Num frames 4800... -[2024-11-28 09:16:28,412][00195] Num frames 4900... -[2024-11-28 09:16:28,557][00195] Num frames 5000... -[2024-11-28 09:16:28,705][00195] Num frames 5100... -[2024-11-28 09:16:28,855][00195] Num frames 5200... -[2024-11-28 09:16:28,997][00195] Num frames 5300... -[2024-11-28 09:16:29,148][00195] Num frames 5400... -[2024-11-28 09:16:29,289][00195] Num frames 5500... -[2024-11-28 09:16:29,435][00195] Num frames 5600... -[2024-11-28 09:16:29,560][00195] Avg episode rewards: #0: 20.745, true rewards: #0: 9.412 -[2024-11-28 09:16:29,562][00195] Avg episode reward: 20.745, avg true_objective: 9.412 -[2024-11-28 09:16:29,638][00195] Num frames 5700... -[2024-11-28 09:16:29,781][00195] Num frames 5800... -[2024-11-28 09:16:29,936][00195] Num frames 5900... -[2024-11-28 09:16:30,075][00195] Num frames 6000... -[2024-11-28 09:16:30,225][00195] Num frames 6100... -[2024-11-28 09:16:30,363][00195] Num frames 6200... -[2024-11-28 09:16:30,536][00195] Avg episode rewards: #0: 19.696, true rewards: #0: 8.981 -[2024-11-28 09:16:30,538][00195] Avg episode reward: 19.696, avg true_objective: 8.981 -[2024-11-28 09:16:30,563][00195] Num frames 6300... -[2024-11-28 09:16:30,704][00195] Num frames 6400... -[2024-11-28 09:16:30,857][00195] Num frames 6500... -[2024-11-28 09:16:30,997][00195] Num frames 6600... -[2024-11-28 09:16:31,110][00195] Avg episode rewards: #0: 17.799, true rewards: #0: 8.299 -[2024-11-28 09:16:31,112][00195] Avg episode reward: 17.799, avg true_objective: 8.299 -[2024-11-28 09:16:31,248][00195] Num frames 6700... -[2024-11-28 09:16:31,451][00195] Num frames 6800... -[2024-11-28 09:16:31,650][00195] Num frames 6900... -[2024-11-28 09:16:31,864][00195] Num frames 7000... -[2024-11-28 09:16:32,067][00195] Num frames 7100... -[2024-11-28 09:16:32,281][00195] Num frames 7200... -[2024-11-28 09:16:32,473][00195] Num frames 7300... -[2024-11-28 09:16:32,657][00195] Num frames 7400... 
-[2024-11-28 09:16:32,863][00195] Num frames 7500... -[2024-11-28 09:16:33,072][00195] Num frames 7600... -[2024-11-28 09:16:33,260][00195] Num frames 7700... -[2024-11-28 09:16:33,470][00195] Num frames 7800... -[2024-11-28 09:16:33,681][00195] Num frames 7900... -[2024-11-28 09:16:33,895][00195] Num frames 8000... -[2024-11-28 09:16:34,102][00195] Num frames 8100... -[2024-11-28 09:16:34,243][00195] Num frames 8200... -[2024-11-28 09:16:34,385][00195] Num frames 8300... -[2024-11-28 09:16:34,530][00195] Num frames 8400... -[2024-11-28 09:16:34,670][00195] Num frames 8500... -[2024-11-28 09:16:34,815][00195] Num frames 8600... -[2024-11-28 09:16:34,906][00195] Avg episode rewards: #0: 21.248, true rewards: #0: 9.581 -[2024-11-28 09:16:34,907][00195] Avg episode reward: 21.248, avg true_objective: 9.581 -[2024-11-28 09:16:35,018][00195] Num frames 8700... -[2024-11-28 09:16:35,156][00195] Num frames 8800... -[2024-11-28 09:16:35,293][00195] Num frames 8900... -[2024-11-28 09:16:35,442][00195] Num frames 9000... -[2024-11-28 09:16:35,582][00195] Num frames 9100... -[2024-11-28 09:16:35,644][00195] Avg episode rewards: #0: 19.703, true rewards: #0: 9.103 -[2024-11-28 09:16:35,646][00195] Avg episode reward: 19.703, avg true_objective: 9.103 -[2024-11-28 09:17:36,546][00195] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
+ handle_policy_step: 579.6014 + deserialize: 14.8573, stack: 3.1071, obs_to_device_normalize: 121.6247, forward: 292.0473, send_messages: 28.7061 + prepare_outputs: 89.5644 + to_cpu: 54.2086 +[2024-12-01 11:24:25,087][02154] Learner 0 profile tree view: +misc: 0.0059, prepare_batch: 13.5534 +train: 72.7957 + epoch_init: 0.0134, minibatch_init: 0.0063, losses_postprocess: 0.5706, kl_divergence: 0.6039, after_optimizer: 33.7038 + calculate_losses: 25.5581 + losses_init: 0.0034, forward_head: 1.1618, bptt_initial: 17.0828, tail: 1.0208, advantages_returns: 0.2430, losses: 3.7061 + bptt: 2.0363 + bptt_forward_core: 1.9198 + update: 11.7813 + clip: 0.8637 +[2024-12-01 11:24:25,089][02154] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.2925, enqueue_policy_requests: 97.5265, env_step: 811.8499, overhead: 12.6237, complete_rollouts: 6.9544 +save_policy_outputs: 20.7697 + split_output_tensors: 8.5124 +[2024-12-01 11:24:25,091][02154] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.2677, enqueue_policy_requests: 98.5244, env_step: 808.1773, overhead: 13.5904, complete_rollouts: 6.7081 +save_policy_outputs: 20.8069 + split_output_tensors: 8.5927 +[2024-12-01 11:24:25,092][02154] Loop Runner_EvtLoop terminating... +[2024-12-01 11:24:25,094][02154] Runner profile tree view: +main_loop: 1067.5266 +[2024-12-01 11:24:25,095][02154] Collected {0: 4005888}, FPS: 3752.5 +[2024-12-01 11:24:30,310][02154] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-12-01 11:24:30,312][02154] Overriding arg 'num_workers' with value 1 passed from command line +[2024-12-01 11:24:30,315][02154] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-12-01 11:24:30,317][02154] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-12-01 11:24:30,320][02154] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
+[2024-12-01 11:24:30,321][02154] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-12-01 11:24:30,322][02154] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-12-01 11:24:30,324][02154] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-12-01 11:24:30,325][02154] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-12-01 11:24:30,326][02154] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-12-01 11:24:30,327][02154] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-12-01 11:24:30,329][02154] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-12-01 11:24:30,330][02154] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-12-01 11:24:30,331][02154] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-12-01 11:24:30,332][02154] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-12-01 11:24:30,367][02154] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-12-01 11:24:30,371][02154] RunningMeanStd input shape: (3, 72, 128) +[2024-12-01 11:24:30,373][02154] RunningMeanStd input shape: (1,) +[2024-12-01 11:24:30,395][02154] ConvEncoder: input_channels=3 +[2024-12-01 11:24:30,504][02154] Conv encoder output size: 512 +[2024-12-01 11:24:30,506][02154] Policy head output size: 512 +[2024-12-01 11:24:30,782][02154] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-12-01 11:24:31,542][02154] Num frames 100... +[2024-12-01 11:24:31,659][02154] Num frames 200... +[2024-12-01 11:24:31,779][02154] Num frames 300... +[2024-12-01 11:24:31,905][02154] Num frames 400... 
+[2024-12-01 11:24:31,981][02154] Avg episode rewards: #0: 8.160, true rewards: #0: 4.160 +[2024-12-01 11:24:31,983][02154] Avg episode reward: 8.160, avg true_objective: 4.160 +[2024-12-01 11:24:32,085][02154] Num frames 500... +[2024-12-01 11:24:32,206][02154] Num frames 600... +[2024-12-01 11:24:32,329][02154] Num frames 700... +[2024-12-01 11:24:32,462][02154] Num frames 800... +[2024-12-01 11:24:32,592][02154] Num frames 900... +[2024-12-01 11:24:32,713][02154] Num frames 1000... +[2024-12-01 11:24:32,830][02154] Num frames 1100... +[2024-12-01 11:24:32,947][02154] Num frames 1200... +[2024-12-01 11:24:33,067][02154] Num frames 1300... +[2024-12-01 11:24:33,186][02154] Num frames 1400... +[2024-12-01 11:24:33,307][02154] Num frames 1500... +[2024-12-01 11:24:33,423][02154] Num frames 1600... +[2024-12-01 11:24:33,561][02154] Num frames 1700... +[2024-12-01 11:24:33,688][02154] Num frames 1800... +[2024-12-01 11:24:33,808][02154] Num frames 1900... +[2024-12-01 11:24:33,929][02154] Num frames 2000... +[2024-12-01 11:24:34,053][02154] Num frames 2100... +[2024-12-01 11:24:34,174][02154] Num frames 2200... +[2024-12-01 11:24:34,297][02154] Num frames 2300... +[2024-12-01 11:24:34,469][02154] Avg episode rewards: #0: 30.880, true rewards: #0: 11.880 +[2024-12-01 11:24:34,471][02154] Avg episode reward: 30.880, avg true_objective: 11.880 +[2024-12-01 11:24:34,527][02154] Num frames 2400... +[2024-12-01 11:24:34,700][02154] Num frames 2500... +[2024-12-01 11:24:34,866][02154] Num frames 2600... +[2024-12-01 11:24:35,034][02154] Num frames 2700... +[2024-12-01 11:24:35,205][02154] Num frames 2800... +[2024-12-01 11:24:35,367][02154] Num frames 2900... +[2024-12-01 11:24:35,544][02154] Num frames 3000... +[2024-12-01 11:24:35,716][02154] Num frames 3100... 
+[2024-12-01 11:24:35,796][02154] Avg episode rewards: #0: 26.040, true rewards: #0: 10.373 +[2024-12-01 11:24:35,798][02154] Avg episode reward: 26.040, avg true_objective: 10.373 +[2024-12-01 11:24:35,953][02154] Num frames 3200... +[2024-12-01 11:24:36,124][02154] Num frames 3300... +[2024-12-01 11:24:36,294][02154] Num frames 3400... +[2024-12-01 11:24:36,468][02154] Num frames 3500... +[2024-12-01 11:24:36,659][02154] Num frames 3600... +[2024-12-01 11:24:36,826][02154] Num frames 3700... +[2024-12-01 11:24:36,942][02154] Avg episode rewards: #0: 22.880, true rewards: #0: 9.380 +[2024-12-01 11:24:36,944][02154] Avg episode reward: 22.880, avg true_objective: 9.380 +[2024-12-01 11:24:37,004][02154] Num frames 3800... +[2024-12-01 11:24:37,122][02154] Num frames 3900... +[2024-12-01 11:24:37,239][02154] Num frames 4000... +[2024-12-01 11:24:37,359][02154] Num frames 4100... +[2024-12-01 11:24:37,480][02154] Num frames 4200... +[2024-12-01 11:24:37,612][02154] Num frames 4300... +[2024-12-01 11:24:37,738][02154] Num frames 4400... +[2024-12-01 11:24:37,859][02154] Num frames 4500... +[2024-12-01 11:24:37,977][02154] Num frames 4600... +[2024-12-01 11:24:38,103][02154] Num frames 4700... +[2024-12-01 11:24:38,226][02154] Num frames 4800... +[2024-12-01 11:24:38,346][02154] Num frames 4900... +[2024-12-01 11:24:38,481][02154] Avg episode rewards: #0: 23.936, true rewards: #0: 9.936 +[2024-12-01 11:24:38,482][02154] Avg episode reward: 23.936, avg true_objective: 9.936 +[2024-12-01 11:24:38,526][02154] Num frames 5000... +[2024-12-01 11:24:38,648][02154] Num frames 5100... +[2024-12-01 11:24:38,780][02154] Num frames 5200... +[2024-12-01 11:24:38,921][02154] Avg episode rewards: #0: 21.117, true rewards: #0: 8.783 +[2024-12-01 11:24:38,924][02154] Avg episode reward: 21.117, avg true_objective: 8.783 +[2024-12-01 11:24:38,962][02154] Num frames 5300... +[2024-12-01 11:24:39,094][02154] Num frames 5400... +[2024-12-01 11:24:39,213][02154] Num frames 5500... 
+[2024-12-01 11:24:39,335][02154] Num frames 5600... +[2024-12-01 11:24:39,457][02154] Num frames 5700... +[2024-12-01 11:24:39,592][02154] Num frames 5800... +[2024-12-01 11:24:39,726][02154] Num frames 5900... +[2024-12-01 11:24:39,801][02154] Avg episode rewards: #0: 19.593, true rewards: #0: 8.450 +[2024-12-01 11:24:39,802][02154] Avg episode reward: 19.593, avg true_objective: 8.450 +[2024-12-01 11:24:39,905][02154] Num frames 6000... +[2024-12-01 11:24:40,029][02154] Num frames 6100... +[2024-12-01 11:24:40,149][02154] Num frames 6200... +[2024-12-01 11:24:40,323][02154] Avg episode rewards: #0: 17.624, true rewards: #0: 7.874 +[2024-12-01 11:24:40,325][02154] Avg episode reward: 17.624, avg true_objective: 7.874 +[2024-12-01 11:24:40,329][02154] Num frames 6300... +[2024-12-01 11:24:40,450][02154] Num frames 6400... +[2024-12-01 11:24:40,584][02154] Num frames 6500... +[2024-12-01 11:24:40,705][02154] Num frames 6600... +[2024-12-01 11:24:40,838][02154] Num frames 6700... +[2024-12-01 11:24:40,965][02154] Num frames 6800... +[2024-12-01 11:24:41,088][02154] Num frames 6900... +[2024-12-01 11:24:41,229][02154] Avg episode rewards: #0: 16.857, true rewards: #0: 7.746 +[2024-12-01 11:24:41,232][02154] Avg episode reward: 16.857, avg true_objective: 7.746 +[2024-12-01 11:24:41,269][02154] Num frames 7000... +[2024-12-01 11:24:41,389][02154] Num frames 7100... +[2024-12-01 11:24:41,518][02154] Num frames 7200... +[2024-12-01 11:24:41,651][02154] Num frames 7300... +[2024-12-01 11:24:41,781][02154] Num frames 7400... +[2024-12-01 11:24:41,903][02154] Num frames 7500... +[2024-12-01 11:24:42,025][02154] Num frames 7600... +[2024-12-01 11:24:42,148][02154] Num frames 7700... +[2024-12-01 11:24:42,276][02154] Avg episode rewards: #0: 16.660, true rewards: #0: 7.760 +[2024-12-01 11:24:42,277][02154] Avg episode reward: 16.660, avg true_objective: 7.760 +[2024-12-01 11:25:28,135][02154] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
+[2024-12-01 11:31:45,835][02154] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-12-01 11:31:45,836][02154] Overriding arg 'num_workers' with value 1 passed from command line +[2024-12-01 11:31:45,838][02154] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-12-01 11:31:45,840][02154] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-12-01 11:31:45,842][02154] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-12-01 11:31:45,844][02154] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-12-01 11:31:45,845][02154] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-12-01 11:31:45,846][02154] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-12-01 11:31:45,847][02154] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-12-01 11:31:45,849][02154] Adding new argument 'hf_repository'='Farseer-W/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-12-01 11:31:45,849][02154] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-12-01 11:31:45,850][02154] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-12-01 11:31:45,851][02154] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-12-01 11:31:45,852][02154] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-12-01 11:31:45,853][02154] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-12-01 11:31:45,893][02154] RunningMeanStd input shape: (3, 72, 128) +[2024-12-01 11:31:45,895][02154] RunningMeanStd input shape: (1,) +[2024-12-01 11:31:45,912][02154] ConvEncoder: input_channels=3 +[2024-12-01 11:31:45,973][02154] Conv encoder output size: 512 +[2024-12-01 11:31:45,975][02154] Policy head output size: 512 +[2024-12-01 11:31:46,002][02154] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-12-01 11:31:46,647][02154] Num frames 100... +[2024-12-01 11:31:46,832][02154] Num frames 200... +[2024-12-01 11:31:47,011][02154] Num frames 300... +[2024-12-01 11:31:47,167][02154] Num frames 400... +[2024-12-01 11:31:47,289][02154] Num frames 500... +[2024-12-01 11:31:47,414][02154] Num frames 600... +[2024-12-01 11:31:47,541][02154] Num frames 700... +[2024-12-01 11:31:47,663][02154] Num frames 800... +[2024-12-01 11:31:47,780][02154] Num frames 900... +[2024-12-01 11:31:47,901][02154] Num frames 1000... +[2024-12-01 11:31:48,021][02154] Num frames 1100... +[2024-12-01 11:31:48,142][02154] Num frames 1200... +[2024-12-01 11:31:48,259][02154] Num frames 1300... +[2024-12-01 11:31:48,390][02154] Num frames 1400... +[2024-12-01 11:31:48,512][02154] Num frames 1500... +[2024-12-01 11:31:48,641][02154] Num frames 1600... +[2024-12-01 11:31:48,764][02154] Num frames 1700... +[2024-12-01 11:31:48,889][02154] Num frames 1800... +[2024-12-01 11:31:49,012][02154] Num frames 1900... +[2024-12-01 11:31:49,093][02154] Avg episode rewards: #0: 43.199, true rewards: #0: 19.200 +[2024-12-01 11:31:49,095][02154] Avg episode reward: 43.199, avg true_objective: 19.200 +[2024-12-01 11:31:49,198][02154] Num frames 2000... +[2024-12-01 11:31:49,317][02154] Num frames 2100... +[2024-12-01 11:31:49,447][02154] Num frames 2200... +[2024-12-01 11:31:49,577][02154] Num frames 2300... 
+[2024-12-01 11:31:49,698][02154] Num frames 2400... +[2024-12-01 11:31:49,816][02154] Avg episode rewards: #0: 25.750, true rewards: #0: 12.250 +[2024-12-01 11:31:49,817][02154] Avg episode reward: 25.750, avg true_objective: 12.250 +[2024-12-01 11:31:49,878][02154] Num frames 2500... +[2024-12-01 11:31:50,000][02154] Num frames 2600... +[2024-12-01 11:31:50,125][02154] Num frames 2700... +[2024-12-01 11:31:50,247][02154] Num frames 2800... +[2024-12-01 11:31:50,382][02154] Num frames 2900... +[2024-12-01 11:31:50,550][02154] Avg episode rewards: #0: 19.646, true rewards: #0: 9.980 +[2024-12-01 11:31:50,554][02154] Avg episode reward: 19.646, avg true_objective: 9.980 +[2024-12-01 11:31:50,563][02154] Num frames 3000... +[2024-12-01 11:31:50,686][02154] Num frames 3100... +[2024-12-01 11:31:50,811][02154] Num frames 3200... +[2024-12-01 11:31:50,931][02154] Num frames 3300... +[2024-12-01 11:31:51,054][02154] Num frames 3400... +[2024-12-01 11:31:51,177][02154] Num frames 3500... +[2024-12-01 11:31:51,308][02154] Num frames 3600... +[2024-12-01 11:31:51,455][02154] Avg episode rewards: #0: 17.915, true rewards: #0: 9.165 +[2024-12-01 11:31:51,456][02154] Avg episode reward: 17.915, avg true_objective: 9.165 +[2024-12-01 11:31:51,501][02154] Num frames 3700... +[2024-12-01 11:31:51,635][02154] Num frames 3800... +[2024-12-01 11:31:51,756][02154] Num frames 3900... +[2024-12-01 11:31:51,880][02154] Num frames 4000... +[2024-12-01 11:31:52,000][02154] Num frames 4100... +[2024-12-01 11:31:52,127][02154] Num frames 4200... +[2024-12-01 11:31:52,249][02154] Num frames 4300... +[2024-12-01 11:31:52,372][02154] Num frames 4400... +[2024-12-01 11:31:52,505][02154] Num frames 4500... +[2024-12-01 11:31:52,636][02154] Num frames 4600... +[2024-12-01 11:31:52,759][02154] Num frames 4700... +[2024-12-01 11:31:52,879][02154] Num frames 4800... +[2024-12-01 11:31:52,998][02154] Num frames 4900... +[2024-12-01 11:31:53,124][02154] Num frames 5000... 
+[2024-12-01 11:31:53,244][02154] Num frames 5100... +[2024-12-01 11:31:53,361][02154] Avg episode rewards: #0: 21.288, true rewards: #0: 10.288 +[2024-12-01 11:31:53,362][02154] Avg episode reward: 21.288, avg true_objective: 10.288 +[2024-12-01 11:31:53,432][02154] Num frames 5200... +[2024-12-01 11:31:53,569][02154] Num frames 5300... +[2024-12-01 11:31:53,688][02154] Num frames 5400... +[2024-12-01 11:31:53,808][02154] Num frames 5500... +[2024-12-01 11:31:53,927][02154] Num frames 5600... +[2024-12-01 11:31:54,051][02154] Num frames 5700... +[2024-12-01 11:31:54,175][02154] Num frames 5800... +[2024-12-01 11:31:54,292][02154] Num frames 5900... +[2024-12-01 11:31:54,413][02154] Num frames 6000... +[2024-12-01 11:31:54,548][02154] Num frames 6100... +[2024-12-01 11:31:54,668][02154] Num frames 6200... +[2024-12-01 11:31:54,792][02154] Num frames 6300... +[2024-12-01 11:31:54,916][02154] Num frames 6400... +[2024-12-01 11:31:55,039][02154] Avg episode rewards: #0: 22.760, true rewards: #0: 10.760 +[2024-12-01 11:31:55,041][02154] Avg episode reward: 22.760, avg true_objective: 10.760 +[2024-12-01 11:31:55,102][02154] Num frames 6500... +[2024-12-01 11:31:55,226][02154] Num frames 6600... +[2024-12-01 11:31:55,349][02154] Num frames 6700... +[2024-12-01 11:31:55,471][02154] Num frames 6800... +[2024-12-01 11:31:55,608][02154] Num frames 6900... +[2024-12-01 11:31:55,734][02154] Num frames 7000... +[2024-12-01 11:31:55,858][02154] Num frames 7100... +[2024-12-01 11:31:55,980][02154] Num frames 7200... +[2024-12-01 11:31:56,105][02154] Num frames 7300... +[2024-12-01 11:31:56,230][02154] Num frames 7400... +[2024-12-01 11:31:56,348][02154] Num frames 7500... +[2024-12-01 11:31:56,471][02154] Num frames 7600... +[2024-12-01 11:31:56,611][02154] Num frames 7700... +[2024-12-01 11:31:56,739][02154] Num frames 7800... +[2024-12-01 11:31:56,860][02154] Num frames 7900... +[2024-12-01 11:31:56,983][02154] Num frames 8000... 
+[2024-12-01 11:31:57,113][02154] Num frames 8100... +[2024-12-01 11:31:57,306][02154] Num frames 8200... +[2024-12-01 11:31:57,369][02154] Avg episode rewards: #0: 26.290, true rewards: #0: 11.719 +[2024-12-01 11:31:57,372][02154] Avg episode reward: 26.290, avg true_objective: 11.719 +[2024-12-01 11:31:57,583][02154] Num frames 8300... +[2024-12-01 11:31:57,772][02154] Num frames 8400... +[2024-12-01 11:31:57,969][02154] Num frames 8500... +[2024-12-01 11:31:58,184][02154] Num frames 8600... +[2024-12-01 11:31:58,376][02154] Num frames 8700... +[2024-12-01 11:31:58,584][02154] Num frames 8800... +[2024-12-01 11:31:58,759][02154] Num frames 8900... +[2024-12-01 11:31:58,928][02154] Num frames 9000... +[2024-12-01 11:31:59,097][02154] Num frames 9100... +[2024-12-01 11:31:59,266][02154] Num frames 9200... +[2024-12-01 11:31:59,440][02154] Num frames 9300... +[2024-12-01 11:31:59,627][02154] Num frames 9400... +[2024-12-01 11:31:59,811][02154] Num frames 9500... +[2024-12-01 11:31:59,983][02154] Num frames 9600... +[2024-12-01 11:32:00,169][02154] Num frames 9700... +[2024-12-01 11:32:00,240][02154] Avg episode rewards: #0: 27.509, true rewards: #0: 12.134 +[2024-12-01 11:32:00,242][02154] Avg episode reward: 27.509, avg true_objective: 12.134 +[2024-12-01 11:32:00,351][02154] Num frames 9800... +[2024-12-01 11:32:00,471][02154] Num frames 9900... +[2024-12-01 11:32:00,599][02154] Num frames 10000... +[2024-12-01 11:32:00,729][02154] Num frames 10100... +[2024-12-01 11:32:00,853][02154] Num frames 10200... +[2024-12-01 11:32:00,977][02154] Num frames 10300... +[2024-12-01 11:32:01,106][02154] Num frames 10400... +[2024-12-01 11:32:01,246][02154] Avg episode rewards: #0: 26.299, true rewards: #0: 11.632 +[2024-12-01 11:32:01,247][02154] Avg episode reward: 26.299, avg true_objective: 11.632 +[2024-12-01 11:32:01,287][02154] Num frames 10500... +[2024-12-01 11:32:01,405][02154] Num frames 10600... +[2024-12-01 11:32:01,530][02154] Num frames 10700... 
+[2024-12-01 11:32:01,657][02154] Num frames 10800... +[2024-12-01 11:32:01,787][02154] Num frames 10900... +[2024-12-01 11:32:01,911][02154] Num frames 11000... +[2024-12-01 11:32:02,031][02154] Num frames 11100... +[2024-12-01 11:32:02,156][02154] Num frames 11200... +[2024-12-01 11:32:02,278][02154] Num frames 11300... +[2024-12-01 11:32:02,409][02154] Avg episode rewards: #0: 25.462, true rewards: #0: 11.362 +[2024-12-01 11:32:02,411][02154] Avg episode reward: 25.462, avg true_objective: 11.362 +[2024-12-01 11:33:08,588][02154] Replay video saved to /content/train_dir/default_experiment/replay.mp4!