[2024-11-28 08:28:46,403][00195] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-28 08:28:46,405][00195] Rollout worker 0 uses device cpu
[2024-11-28 08:28:46,411][00195] Rollout worker 1 uses device cpu
[2024-11-28 08:28:46,412][00195] Rollout worker 2 uses device cpu
[2024-11-28 08:28:46,413][00195] Rollout worker 3 uses device cpu
[2024-11-28 08:28:46,414][00195] Rollout worker 4 uses device cpu
[2024-11-28 08:28:46,416][00195] Rollout worker 5 uses device cpu
[2024-11-28 08:28:46,417][00195] Rollout worker 6 uses device cpu
[2024-11-28 08:28:46,418][00195] Rollout worker 7 uses device cpu
[2024-11-28 08:28:46,576][00195] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-28 08:28:46,578][00195] InferenceWorker_p0-w0: min num requests: 2
[2024-11-28 08:28:46,614][00195] Starting all processes...
[2024-11-28 08:28:46,616][00195] Starting process learner_proc0
[2024-11-28 08:28:46,665][00195] Starting all processes...
[2024-11-28 08:28:46,675][00195] Starting process inference_proc0-0
[2024-11-28 08:28:46,676][00195] Starting process rollout_proc0
[2024-11-28 08:28:46,680][00195] Starting process rollout_proc1
[2024-11-28 08:28:46,680][00195] Starting process rollout_proc2
[2024-11-28 08:28:46,681][00195] Starting process rollout_proc3
[2024-11-28 08:28:46,681][00195] Starting process rollout_proc4
[2024-11-28 08:28:46,681][00195] Starting process rollout_proc5
[2024-11-28 08:28:46,681][00195] Starting process rollout_proc6
[2024-11-28 08:28:46,681][00195] Starting process rollout_proc7
[2024-11-28 08:29:04,320][02276] Worker 6 uses CPU cores [0]
[2024-11-28 08:29:04,460][02251] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-28 08:29:04,467][02251] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-28 08:29:04,522][02251] Num visible devices: 1
[2024-11-28 08:29:04,559][02251] Starting seed is not provided
[2024-11-28 08:29:04,560][02251] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-28 08:29:04,561][02251] Initializing actor-critic model on device cuda:0
[2024-11-28 08:29:04,562][02251] RunningMeanStd input shape: (3, 72, 128)
[2024-11-28 08:29:04,565][02251] RunningMeanStd input shape: (1,)
[2024-11-28 08:29:04,644][02251] ConvEncoder: input_channels=3
[2024-11-28 08:29:04,709][02271] Worker 2 uses CPU cores [0]
[2024-11-28 08:29:04,708][02270] Worker 1 uses CPU cores [1]
[2024-11-28 08:29:04,714][02273] Worker 4 uses CPU cores [0]
[2024-11-28 08:29:04,773][02269] Worker 0 uses CPU cores [0]
[2024-11-28 08:29:04,840][02272] Worker 3 uses CPU cores [1]
[2024-11-28 08:29:04,872][02268] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-28 08:29:04,873][02268] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-28 08:29:04,889][02275] Worker 7 uses CPU cores [1]
[2024-11-28 08:29:04,891][02268] Num visible devices: 1
[2024-11-28 08:29:04,956][02274] Worker 5 uses CPU cores [1]
[2024-11-28 08:29:05,017][02251] Conv encoder output size: 512
[2024-11-28 08:29:05,018][02251] Policy head output size: 512
[2024-11-28 08:29:05,068][02251] Created Actor Critic model with architecture:
[2024-11-28 08:29:05,068][02251] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-28 08:29:05,458][02251] Using optimizer
[2024-11-28 08:29:06,568][00195] Heartbeat connected on Batcher_0
[2024-11-28 08:29:06,577][00195] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-28 08:29:06,586][00195] Heartbeat connected on RolloutWorker_w0
[2024-11-28 08:29:06,590][00195] Heartbeat connected on RolloutWorker_w1
[2024-11-28 08:29:06,599][00195] Heartbeat connected on RolloutWorker_w3
[2024-11-28 08:29:06,600][00195] Heartbeat connected on RolloutWorker_w2
[2024-11-28 08:29:06,604][00195] Heartbeat connected on RolloutWorker_w4
[2024-11-28 08:29:06,607][00195] Heartbeat connected on RolloutWorker_w5
[2024-11-28 08:29:06,610][00195] Heartbeat connected on RolloutWorker_w6
[2024-11-28 08:29:06,614][00195] Heartbeat connected on RolloutWorker_w7
[2024-11-28 08:29:09,858][02251] No checkpoints found
[2024-11-28 08:29:09,859][02251] Did not load from checkpoint, starting from scratch!
[2024-11-28 08:29:09,860][02251] Initialized policy 0 weights for model version 0
[2024-11-28 08:29:09,870][02251] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-28 08:29:09,877][02251] LearnerWorker_p0 finished initialization!
[2024-11-28 08:29:09,878][00195] Heartbeat connected on LearnerWorker_p0 [2024-11-28 08:29:10,137][02268] RunningMeanStd input shape: (3, 72, 128) [2024-11-28 08:29:10,139][02268] RunningMeanStd input shape: (1,) [2024-11-28 08:29:10,158][02268] ConvEncoder: input_channels=3 [2024-11-28 08:29:10,334][02268] Conv encoder output size: 512 [2024-11-28 08:29:10,335][02268] Policy head output size: 512 [2024-11-28 08:29:10,417][00195] Inference worker 0-0 is ready! [2024-11-28 08:29:10,419][00195] All inference workers are ready! Signal rollout workers to start! [2024-11-28 08:29:10,650][02276] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-28 08:29:10,651][02269] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-28 08:29:10,652][02273] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-28 08:29:10,653][02271] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-28 08:29:10,695][02272] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-28 08:29:10,699][02270] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-28 08:29:10,700][02274] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-28 08:29:10,703][02275] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-28 08:29:11,776][02274] Decorrelating experience for 0 frames... [2024-11-28 08:29:11,775][02275] Decorrelating experience for 0 frames... [2024-11-28 08:29:12,039][02269] Decorrelating experience for 0 frames... [2024-11-28 08:29:12,044][02276] Decorrelating experience for 0 frames... [2024-11-28 08:29:12,044][02273] Decorrelating experience for 0 frames... [2024-11-28 08:29:12,824][02275] Decorrelating experience for 32 frames... [2024-11-28 08:29:12,907][02270] Decorrelating experience for 0 frames... [2024-11-28 08:29:13,322][02272] Decorrelating experience for 0 frames... [2024-11-28 08:29:13,576][02273] Decorrelating experience for 32 frames... 
[2024-11-28 08:29:13,580][02269] Decorrelating experience for 32 frames... [2024-11-28 08:29:13,588][02276] Decorrelating experience for 32 frames... [2024-11-28 08:29:13,584][02271] Decorrelating experience for 0 frames... [2024-11-28 08:29:13,966][02274] Decorrelating experience for 32 frames... [2024-11-28 08:29:14,285][02275] Decorrelating experience for 64 frames... [2024-11-28 08:29:14,396][00195] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-28 08:29:14,526][02270] Decorrelating experience for 32 frames... [2024-11-28 08:29:14,915][02271] Decorrelating experience for 32 frames... [2024-11-28 08:29:15,226][02273] Decorrelating experience for 64 frames... [2024-11-28 08:29:15,234][02269] Decorrelating experience for 64 frames... [2024-11-28 08:29:15,375][02275] Decorrelating experience for 96 frames... [2024-11-28 08:29:15,686][02270] Decorrelating experience for 64 frames... [2024-11-28 08:29:16,085][02274] Decorrelating experience for 64 frames... [2024-11-28 08:29:16,580][02272] Decorrelating experience for 32 frames... [2024-11-28 08:29:17,117][02276] Decorrelating experience for 64 frames... [2024-11-28 08:29:17,119][02271] Decorrelating experience for 64 frames... [2024-11-28 08:29:17,244][02273] Decorrelating experience for 96 frames... [2024-11-28 08:29:17,667][02269] Decorrelating experience for 96 frames... [2024-11-28 08:29:19,337][02276] Decorrelating experience for 96 frames... [2024-11-28 08:29:19,358][02271] Decorrelating experience for 96 frames... [2024-11-28 08:29:19,397][00195] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-28 08:29:19,399][02274] Decorrelating experience for 96 frames... 
[2024-11-28 08:29:19,403][00195] Avg episode reward: [(0, '2.173')] [2024-11-28 08:29:19,868][02272] Decorrelating experience for 64 frames... [2024-11-28 08:29:20,507][02270] Decorrelating experience for 96 frames... [2024-11-28 08:29:24,400][00195] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 225.9. Samples: 2260. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-28 08:29:24,408][00195] Avg episode reward: [(0, '2.758')] [2024-11-28 08:29:24,555][02272] Decorrelating experience for 96 frames... [2024-11-28 08:29:24,599][02251] Signal inference workers to stop experience collection... [2024-11-28 08:29:24,633][02268] InferenceWorker_p0-w0: stopping experience collection [2024-11-28 08:29:27,096][02251] Signal inference workers to resume experience collection... [2024-11-28 08:29:27,097][02268] InferenceWorker_p0-w0: resuming experience collection [2024-11-28 08:29:29,397][00195] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 16384. Throughput: 0: 255.2. Samples: 3828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2024-11-28 08:29:29,399][00195] Avg episode reward: [(0, '3.227')] [2024-11-28 08:29:34,396][00195] Fps is (10 sec: 3687.8, 60 sec: 1843.2, 300 sec: 1843.2). Total num frames: 36864. Throughput: 0: 334.7. Samples: 6694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:29:34,399][00195] Avg episode reward: [(0, '3.727')] [2024-11-28 08:29:34,989][02268] Updated weights for policy 0, policy_version 10 (0.0030) [2024-11-28 08:29:39,398][00195] Fps is (10 sec: 3685.7, 60 sec: 2129.8, 300 sec: 2129.8). Total num frames: 53248. Throughput: 0: 531.0. Samples: 13276. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-28 08:29:39,403][00195] Avg episode reward: [(0, '4.244')] [2024-11-28 08:29:44,397][00195] Fps is (10 sec: 3276.8, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 582.0. Samples: 17460. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-28 08:29:44,401][00195] Avg episode reward: [(0, '4.388')] [2024-11-28 08:29:46,789][02268] Updated weights for policy 0, policy_version 20 (0.0040) [2024-11-28 08:29:49,397][00195] Fps is (10 sec: 3687.1, 60 sec: 2574.6, 300 sec: 2574.6). Total num frames: 90112. Throughput: 0: 590.1. Samples: 20652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:29:49,400][00195] Avg episode reward: [(0, '4.478')] [2024-11-28 08:29:54,396][00195] Fps is (10 sec: 4096.0, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 669.3. Samples: 26772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:29:54,400][00195] Avg episode reward: [(0, '4.395')] [2024-11-28 08:29:54,403][02251] Saving new best policy, reward=4.395! [2024-11-28 08:29:57,604][02268] Updated weights for policy 0, policy_version 30 (0.0019) [2024-11-28 08:29:59,397][00195] Fps is (10 sec: 3686.4, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 702.2. Samples: 31600. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:29:59,402][00195] Avg episode reward: [(0, '4.283')] [2024-11-28 08:30:04,397][00195] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 143360. Throughput: 0: 750.9. Samples: 33802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:30:04,399][00195] Avg episode reward: [(0, '4.352')] [2024-11-28 08:30:08,292][02268] Updated weights for policy 0, policy_version 40 (0.0019) [2024-11-28 08:30:09,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 852.5. Samples: 40620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:30:09,399][00195] Avg episode reward: [(0, '4.472')] [2024-11-28 08:30:09,406][02251] Saving new best policy, reward=4.472! [2024-11-28 08:30:14,398][00195] Fps is (10 sec: 4095.3, 60 sec: 3071.9, 300 sec: 3071.9). Total num frames: 184320. 
Throughput: 0: 946.3. Samples: 46414. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:30:14,406][00195] Avg episode reward: [(0, '4.416')] [2024-11-28 08:30:19,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3087.7). Total num frames: 200704. Throughput: 0: 927.9. Samples: 48448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:30:19,399][00195] Avg episode reward: [(0, '4.437')] [2024-11-28 08:30:20,102][02268] Updated weights for policy 0, policy_version 50 (0.0034) [2024-11-28 08:30:24,397][00195] Fps is (10 sec: 3687.0, 60 sec: 3686.6, 300 sec: 3159.8). Total num frames: 221184. Throughput: 0: 915.7. Samples: 54480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:30:24,399][00195] Avg episode reward: [(0, '4.506')] [2024-11-28 08:30:24,471][02251] Saving new best policy, reward=4.506! [2024-11-28 08:30:29,397][00195] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 971.4. Samples: 61172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:30:29,402][00195] Avg episode reward: [(0, '4.484')] [2024-11-28 08:30:29,580][02268] Updated weights for policy 0, policy_version 60 (0.0023) [2024-11-28 08:30:34,398][00195] Fps is (10 sec: 3685.8, 60 sec: 3686.3, 300 sec: 3225.5). Total num frames: 258048. Throughput: 0: 944.9. Samples: 63174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:30:34,404][00195] Avg episode reward: [(0, '4.588')] [2024-11-28 08:30:34,408][02251] Saving new best policy, reward=4.588! [2024-11-28 08:30:39,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3228.6). Total num frames: 274432. Throughput: 0: 917.7. Samples: 68068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:30:39,398][00195] Avg episode reward: [(0, '4.478')] [2024-11-28 08:30:39,408][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth... 
[2024-11-28 08:30:41,288][02268] Updated weights for policy 0, policy_version 70 (0.0021) [2024-11-28 08:30:44,397][00195] Fps is (10 sec: 4096.7, 60 sec: 3822.9, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 959.0. Samples: 74756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:30:44,398][00195] Avg episode reward: [(0, '4.357')] [2024-11-28 08:30:49,397][00195] Fps is (10 sec: 4095.7, 60 sec: 3754.6, 300 sec: 3319.9). Total num frames: 315392. Throughput: 0: 979.3. Samples: 77872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:30:49,404][00195] Avg episode reward: [(0, '4.453')] [2024-11-28 08:30:53,021][02268] Updated weights for policy 0, policy_version 80 (0.0019) [2024-11-28 08:30:54,397][00195] Fps is (10 sec: 3276.6, 60 sec: 3686.4, 300 sec: 3317.7). Total num frames: 331776. Throughput: 0: 920.9. Samples: 82062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:30:54,402][00195] Avg episode reward: [(0, '4.415')] [2024-11-28 08:30:59,396][00195] Fps is (10 sec: 4096.3, 60 sec: 3822.9, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 940.3. Samples: 88724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:30:59,401][00195] Avg episode reward: [(0, '4.372')] [2024-11-28 08:31:01,999][02268] Updated weights for policy 0, policy_version 90 (0.0015) [2024-11-28 08:31:04,402][00195] Fps is (10 sec: 4093.9, 60 sec: 3822.6, 300 sec: 3388.3). Total num frames: 372736. Throughput: 0: 966.9. Samples: 91966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:31:04,405][00195] Avg episode reward: [(0, '4.440')] [2024-11-28 08:31:09,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 940.9. Samples: 96822. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:31:09,401][00195] Avg episode reward: [(0, '4.442')] [2024-11-28 08:31:13,771][02268] Updated weights for policy 0, policy_version 100 (0.0033) [2024-11-28 08:31:14,396][00195] Fps is (10 sec: 3688.5, 60 sec: 3754.8, 300 sec: 3413.3). Total num frames: 409600. Throughput: 0: 919.5. Samples: 102548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:31:14,401][00195] Avg episode reward: [(0, '4.408')] [2024-11-28 08:31:19,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3473.4). Total num frames: 434176. Throughput: 0: 951.1. Samples: 105974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:31:19,403][00195] Avg episode reward: [(0, '4.485')] [2024-11-28 08:31:24,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3434.3). Total num frames: 446464. Throughput: 0: 972.5. Samples: 111830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:31:24,402][00195] Avg episode reward: [(0, '4.426')] [2024-11-28 08:31:24,524][02268] Updated weights for policy 0, policy_version 110 (0.0039) [2024-11-28 08:31:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 931.0. Samples: 116652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:31:29,403][00195] Avg episode reward: [(0, '4.378')] [2024-11-28 08:31:34,397][00195] Fps is (10 sec: 4095.7, 60 sec: 3823.0, 300 sec: 3481.6). Total num frames: 487424. Throughput: 0: 935.3. Samples: 119960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:31:34,404][00195] Avg episode reward: [(0, '4.393')] [2024-11-28 08:31:34,571][02268] Updated weights for policy 0, policy_version 120 (0.0027) [2024-11-28 08:31:39,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 989.1. Samples: 126570. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:31:39,399][00195] Avg episode reward: [(0, '4.437')] [2024-11-28 08:31:44,396][00195] Fps is (10 sec: 3277.0, 60 sec: 3686.4, 300 sec: 3467.9). Total num frames: 520192. Throughput: 0: 935.0. Samples: 130798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:31:44,399][00195] Avg episode reward: [(0, '4.362')] [2024-11-28 08:31:46,120][02268] Updated weights for policy 0, policy_version 130 (0.0023) [2024-11-28 08:31:49,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3514.6). Total num frames: 544768. Throughput: 0: 933.4. Samples: 133962. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-28 08:31:49,399][00195] Avg episode reward: [(0, '4.470')] [2024-11-28 08:31:54,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3532.8). Total num frames: 565248. Throughput: 0: 977.0. Samples: 140786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:31:54,401][00195] Avg episode reward: [(0, '4.399')] [2024-11-28 08:31:55,889][02268] Updated weights for policy 0, policy_version 140 (0.0014) [2024-11-28 08:31:59,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3525.0). Total num frames: 581632. Throughput: 0: 956.8. Samples: 145606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:31:59,402][00195] Avg episode reward: [(0, '4.375')] [2024-11-28 08:32:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3517.7). Total num frames: 598016. Throughput: 0: 927.2. Samples: 147698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:32:04,398][00195] Avg episode reward: [(0, '4.389')] [2024-11-28 08:32:07,123][02268] Updated weights for policy 0, policy_version 150 (0.0016) [2024-11-28 08:32:09,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3557.7). Total num frames: 622592. Throughput: 0: 949.2. Samples: 154546. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:32:09,404][00195] Avg episode reward: [(0, '4.396')] [2024-11-28 08:32:14,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 638976. Throughput: 0: 970.6. Samples: 160330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:32:14,399][00195] Avg episode reward: [(0, '4.395')] [2024-11-28 08:32:18,987][02268] Updated weights for policy 0, policy_version 160 (0.0027) [2024-11-28 08:32:19,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3542.5). Total num frames: 655360. Throughput: 0: 944.0. Samples: 162438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:32:19,404][00195] Avg episode reward: [(0, '4.373')] [2024-11-28 08:32:24,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3578.6). Total num frames: 679936. Throughput: 0: 929.7. Samples: 168408. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-28 08:32:24,399][00195] Avg episode reward: [(0, '4.616')] [2024-11-28 08:32:24,406][02251] Saving new best policy, reward=4.616! [2024-11-28 08:32:27,716][02268] Updated weights for policy 0, policy_version 170 (0.0016) [2024-11-28 08:32:29,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3591.9). Total num frames: 700416. Throughput: 0: 988.8. Samples: 175292. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-28 08:32:29,399][00195] Avg episode reward: [(0, '4.684')] [2024-11-28 08:32:29,408][02251] Saving new best policy, reward=4.684! [2024-11-28 08:32:34,397][00195] Fps is (10 sec: 3276.6, 60 sec: 3754.7, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 962.4. Samples: 177270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:32:34,400][00195] Avg episode reward: [(0, '4.497')] [2024-11-28 08:32:39,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 921.6. Samples: 182258. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:32:39,399][00195] Avg episode reward: [(0, '4.243')] [2024-11-28 08:32:39,412][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth... [2024-11-28 08:32:39,957][02268] Updated weights for policy 0, policy_version 180 (0.0027) [2024-11-28 08:32:44,396][00195] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3608.4). Total num frames: 757760. Throughput: 0: 964.7. Samples: 189018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:32:44,399][00195] Avg episode reward: [(0, '4.377')] [2024-11-28 08:32:49,400][00195] Fps is (10 sec: 3685.0, 60 sec: 3754.4, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 988.5. Samples: 192186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:32:49,403][00195] Avg episode reward: [(0, '4.664')] [2024-11-28 08:32:50,916][02268] Updated weights for policy 0, policy_version 190 (0.0022) [2024-11-28 08:32:54,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3593.3). Total num frames: 790528. Throughput: 0: 930.7. Samples: 196428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:32:54,398][00195] Avg episode reward: [(0, '4.635')] [2024-11-28 08:32:59,397][00195] Fps is (10 sec: 4097.4, 60 sec: 3822.9, 300 sec: 3604.5). Total num frames: 811008. Throughput: 0: 952.3. Samples: 203186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:32:59,399][00195] Avg episode reward: [(0, '4.481')] [2024-11-28 08:33:00,313][02268] Updated weights for policy 0, policy_version 200 (0.0015) [2024-11-28 08:33:04,397][00195] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3615.2). Total num frames: 831488. Throughput: 0: 979.2. Samples: 206504. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-28 08:33:04,401][00195] Avg episode reward: [(0, '4.461')] [2024-11-28 08:33:09,397][00195] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3590.5). Total num frames: 843776. 
Throughput: 0: 949.4. Samples: 211130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-28 08:33:09,404][00195] Avg episode reward: [(0, '4.604')] [2024-11-28 08:33:14,081][02268] Updated weights for policy 0, policy_version 210 (0.0027) [2024-11-28 08:33:14,397][00195] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3584.0). Total num frames: 860160. Throughput: 0: 889.8. Samples: 215334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:33:14,401][00195] Avg episode reward: [(0, '4.467')] [2024-11-28 08:33:19,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3577.7). Total num frames: 876544. Throughput: 0: 894.5. Samples: 217522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:33:19,399][00195] Avg episode reward: [(0, '4.614')] [2024-11-28 08:33:24,398][00195] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 902.3. Samples: 222862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:33:24,403][00195] Avg episode reward: [(0, '4.624')] [2024-11-28 08:33:26,715][02268] Updated weights for policy 0, policy_version 220 (0.0024) [2024-11-28 08:33:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3565.9). Total num frames: 909312. Throughput: 0: 860.0. Samples: 227718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:33:29,403][00195] Avg episode reward: [(0, '4.620')] [2024-11-28 08:33:34,396][00195] Fps is (10 sec: 4096.7, 60 sec: 3686.4, 300 sec: 3591.9). Total num frames: 933888. Throughput: 0: 865.7. Samples: 231138. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:33:34,402][00195] Avg episode reward: [(0, '4.471')] [2024-11-28 08:33:35,932][02268] Updated weights for policy 0, policy_version 230 (0.0022) [2024-11-28 08:33:39,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3585.9). Total num frames: 950272. Throughput: 0: 915.3. Samples: 237616. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:33:39,399][00195] Avg episode reward: [(0, '4.529')] [2024-11-28 08:33:44,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3580.2). Total num frames: 966656. Throughput: 0: 857.3. Samples: 241762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:33:44,398][00195] Avg episode reward: [(0, '4.545')] [2024-11-28 08:33:47,880][02268] Updated weights for policy 0, policy_version 240 (0.0035) [2024-11-28 08:33:49,399][00195] Fps is (10 sec: 3685.5, 60 sec: 3618.2, 300 sec: 3589.6). Total num frames: 987136. Throughput: 0: 851.0. Samples: 244800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:33:49,404][00195] Avg episode reward: [(0, '4.359')] [2024-11-28 08:33:54,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3613.3). Total num frames: 1011712. Throughput: 0: 901.0. Samples: 251674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:33:54,402][00195] Avg episode reward: [(0, '4.578')] [2024-11-28 08:33:58,082][02268] Updated weights for policy 0, policy_version 250 (0.0018) [2024-11-28 08:33:59,396][00195] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3593.0). Total num frames: 1024000. Throughput: 0: 920.6. Samples: 256762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:33:59,401][00195] Avg episode reward: [(0, '4.831')] [2024-11-28 08:33:59,413][02251] Saving new best policy, reward=4.831! [2024-11-28 08:34:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3601.7). Total num frames: 1044480. Throughput: 0: 919.4. Samples: 258894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:34:04,401][00195] Avg episode reward: [(0, '4.669')] [2024-11-28 08:34:08,353][02268] Updated weights for policy 0, policy_version 260 (0.0018) [2024-11-28 08:34:09,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3623.9). Total num frames: 1069056. Throughput: 0: 953.0. Samples: 265744. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:34:09,401][00195] Avg episode reward: [(0, '4.407')] [2024-11-28 08:34:14,399][00195] Fps is (10 sec: 4095.0, 60 sec: 3754.6, 300 sec: 3679.4). Total num frames: 1085440. Throughput: 0: 977.4. Samples: 271702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:34:14,401][00195] Avg episode reward: [(0, '4.758')] [2024-11-28 08:34:19,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 947.2. Samples: 273764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:34:19,398][00195] Avg episode reward: [(0, '4.826')] [2024-11-28 08:34:20,041][02268] Updated weights for policy 0, policy_version 270 (0.0014) [2024-11-28 08:34:24,396][00195] Fps is (10 sec: 4097.0, 60 sec: 3891.3, 300 sec: 3762.8). Total num frames: 1126400. Throughput: 0: 939.5. Samples: 279894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:34:24,399][00195] Avg episode reward: [(0, '4.686')] [2024-11-28 08:34:29,283][02268] Updated weights for policy 0, policy_version 280 (0.0026) [2024-11-28 08:34:29,403][00195] Fps is (10 sec: 4502.6, 60 sec: 3959.0, 300 sec: 3762.7). Total num frames: 1146880. Throughput: 0: 998.2. Samples: 286688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:34:29,406][00195] Avg episode reward: [(0, '4.658')] [2024-11-28 08:34:34,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1159168. Throughput: 0: 976.0. Samples: 288716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:34:34,401][00195] Avg episode reward: [(0, '4.553')] [2024-11-28 08:34:39,397][00195] Fps is (10 sec: 3279.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1179648. Throughput: 0: 934.7. Samples: 293736. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:34:39,401][00195] Avg episode reward: [(0, '4.513')] [2024-11-28 08:34:39,412][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth... [2024-11-28 08:34:39,538][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth [2024-11-28 08:34:40,832][02268] Updated weights for policy 0, policy_version 290 (0.0030) [2024-11-28 08:34:44,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1204224. Throughput: 0: 971.7. Samples: 300490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:34:44,399][00195] Avg episode reward: [(0, '4.423')] [2024-11-28 08:34:49,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3748.9). Total num frames: 1216512. Throughput: 0: 989.0. Samples: 303400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:34:49,399][00195] Avg episode reward: [(0, '4.553')] [2024-11-28 08:34:52,707][02268] Updated weights for policy 0, policy_version 300 (0.0041) [2024-11-28 08:34:54,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1236992. Throughput: 0: 928.0. Samples: 307502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:34:54,399][00195] Avg episode reward: [(0, '4.876')] [2024-11-28 08:34:54,403][02251] Saving new best policy, reward=4.876! [2024-11-28 08:34:59,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1257472. Throughput: 0: 947.2. Samples: 314322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:34:59,399][00195] Avg episode reward: [(0, '4.982')] [2024-11-28 08:34:59,409][02251] Saving new best policy, reward=4.982! [2024-11-28 08:35:01,827][02268] Updated weights for policy 0, policy_version 310 (0.0041) [2024-11-28 08:35:04,404][00195] Fps is (10 sec: 4092.9, 60 sec: 3890.7, 300 sec: 3762.7). Total num frames: 1277952. 
Throughput: 0: 976.1. Samples: 317694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:35:04,407][00195] Avg episode reward: [(0, '4.835')] [2024-11-28 08:35:09,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1290240. Throughput: 0: 943.0. Samples: 322328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:35:09,402][00195] Avg episode reward: [(0, '4.703')] [2024-11-28 08:35:13,672][02268] Updated weights for policy 0, policy_version 320 (0.0024) [2024-11-28 08:35:14,396][00195] Fps is (10 sec: 3279.3, 60 sec: 3754.8, 300 sec: 3762.8). Total num frames: 1310720. Throughput: 0: 921.4. Samples: 328144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:35:14,399][00195] Avg episode reward: [(0, '4.770')] [2024-11-28 08:35:19,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1335296. Throughput: 0: 953.6. Samples: 331630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:35:19,398][00195] Avg episode reward: [(0, '4.880')] [2024-11-28 08:35:24,098][02268] Updated weights for policy 0, policy_version 330 (0.0021) [2024-11-28 08:35:24,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1351680. Throughput: 0: 971.8. Samples: 337466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:35:24,401][00195] Avg episode reward: [(0, '4.857')] [2024-11-28 08:35:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.8, 300 sec: 3762.8). Total num frames: 1368064. Throughput: 0: 930.1. Samples: 342346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:35:29,405][00195] Avg episode reward: [(0, '4.802')] [2024-11-28 08:35:33,967][02268] Updated weights for policy 0, policy_version 340 (0.0020) [2024-11-28 08:35:34,399][00195] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3790.5). Total num frames: 1392640. Throughput: 0: 940.0. Samples: 345704. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:35:34,403][00195] Avg episode reward: [(0, '4.791')] [2024-11-28 08:35:39,400][00195] Fps is (10 sec: 4094.5, 60 sec: 3822.7, 300 sec: 3762.7). Total num frames: 1409024. Throughput: 0: 993.1. Samples: 352194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:35:39,403][00195] Avg episode reward: [(0, '4.716')] [2024-11-28 08:35:44,396][00195] Fps is (10 sec: 3277.6, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1425408. Throughput: 0: 934.0. Samples: 356354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:35:44,404][00195] Avg episode reward: [(0, '4.752')] [2024-11-28 08:35:45,884][02268] Updated weights for policy 0, policy_version 350 (0.0017) [2024-11-28 08:35:49,397][00195] Fps is (10 sec: 3687.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1445888. Throughput: 0: 931.0. Samples: 359582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:35:49,403][00195] Avg episode reward: [(0, '4.883')] [2024-11-28 08:35:54,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1470464. Throughput: 0: 978.0. Samples: 366336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:35:54,398][00195] Avg episode reward: [(0, '5.083')] [2024-11-28 08:35:54,403][02251] Saving new best policy, reward=5.083! [2024-11-28 08:35:55,665][02268] Updated weights for policy 0, policy_version 360 (0.0015) [2024-11-28 08:35:59,403][00195] Fps is (10 sec: 3683.9, 60 sec: 3754.2, 300 sec: 3762.8). Total num frames: 1482752. Throughput: 0: 954.6. Samples: 371106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-28 08:35:59,410][00195] Avg episode reward: [(0, '4.896')] [2024-11-28 08:36:04,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3755.1, 300 sec: 3776.7). Total num frames: 1503232. Throughput: 0: 926.2. Samples: 373308. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:36:04,399][00195] Avg episode reward: [(0, '4.951')] [2024-11-28 08:36:07,071][02268] Updated weights for policy 0, policy_version 370 (0.0025) [2024-11-28 08:36:09,397][00195] Fps is (10 sec: 4098.7, 60 sec: 3891.2, 300 sec: 3776.6). Total num frames: 1523712. Throughput: 0: 946.2. Samples: 380044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:36:09,399][00195] Avg episode reward: [(0, '5.023')] [2024-11-28 08:36:14,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1540096. Throughput: 0: 966.9. Samples: 385858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:36:14,403][00195] Avg episode reward: [(0, '4.944')] [2024-11-28 08:36:18,674][02268] Updated weights for policy 0, policy_version 380 (0.0044) [2024-11-28 08:36:19,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1556480. Throughput: 0: 939.6. Samples: 387984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:36:19,403][00195] Avg episode reward: [(0, '5.061')] [2024-11-28 08:36:24,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1581056. Throughput: 0: 931.0. Samples: 394086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:36:24,399][00195] Avg episode reward: [(0, '4.747')] [2024-11-28 08:36:27,618][02268] Updated weights for policy 0, policy_version 390 (0.0021) [2024-11-28 08:36:29,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1601536. Throughput: 0: 991.4. Samples: 400968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-28 08:36:29,399][00195] Avg episode reward: [(0, '4.793')] [2024-11-28 08:36:34,401][00195] Fps is (10 sec: 3275.4, 60 sec: 3686.3, 300 sec: 3748.8). Total num frames: 1613824. Throughput: 0: 964.4. Samples: 402986. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-28 08:36:34,403][00195] Avg episode reward: [(0, '4.944')] [2024-11-28 08:36:39,320][02268] Updated weights for policy 0, policy_version 400 (0.0031) [2024-11-28 08:36:39,400][00195] Fps is (10 sec: 3685.1, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1638400. Throughput: 0: 926.1. Samples: 408012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:36:39,403][00195] Avg episode reward: [(0, '5.264')] [2024-11-28 08:36:39,414][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth... [2024-11-28 08:36:39,532][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth [2024-11-28 08:36:39,550][02251] Saving new best policy, reward=5.264! [2024-11-28 08:36:44,396][00195] Fps is (10 sec: 4507.5, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1658880. Throughput: 0: 971.2. Samples: 414804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:36:44,400][00195] Avg episode reward: [(0, '5.291')] [2024-11-28 08:36:44,404][02251] Saving new best policy, reward=5.291! [2024-11-28 08:36:49,398][00195] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3762.7). Total num frames: 1675264. Throughput: 0: 985.4. Samples: 417652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:36:49,413][00195] Avg episode reward: [(0, '5.184')] [2024-11-28 08:36:50,487][02268] Updated weights for policy 0, policy_version 410 (0.0043) [2024-11-28 08:36:54,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1691648. Throughput: 0: 931.3. Samples: 421952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:36:54,404][00195] Avg episode reward: [(0, '5.129')] [2024-11-28 08:36:59,396][00195] Fps is (10 sec: 4096.5, 60 sec: 3891.6, 300 sec: 3790.5). Total num frames: 1716224. Throughput: 0: 959.8. Samples: 429050. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:36:59,398][00195] Avg episode reward: [(0, '5.233')] [2024-11-28 08:36:59,862][02268] Updated weights for policy 0, policy_version 420 (0.0021) [2024-11-28 08:37:04,397][00195] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 1736704. Throughput: 0: 989.6. Samples: 432518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:37:04,400][00195] Avg episode reward: [(0, '5.147')] [2024-11-28 08:37:09,397][00195] Fps is (10 sec: 3276.5, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 1748992. Throughput: 0: 954.8. Samples: 437052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:37:09,404][00195] Avg episode reward: [(0, '5.163')] [2024-11-28 08:37:11,543][02268] Updated weights for policy 0, policy_version 430 (0.0015) [2024-11-28 08:37:14,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1773568. Throughput: 0: 934.7. Samples: 443028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:37:14,399][00195] Avg episode reward: [(0, '5.476')] [2024-11-28 08:37:14,408][02251] Saving new best policy, reward=5.476! [2024-11-28 08:37:19,397][00195] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1794048. Throughput: 0: 965.9. Samples: 446448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:37:19,402][00195] Avg episode reward: [(0, '5.682')] [2024-11-28 08:37:19,411][02251] Saving new best policy, reward=5.682! [2024-11-28 08:37:21,204][02268] Updated weights for policy 0, policy_version 440 (0.0024) [2024-11-28 08:37:24,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1810432. Throughput: 0: 976.0. Samples: 451930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:37:24,401][00195] Avg episode reward: [(0, '5.462')] [2024-11-28 08:37:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). 
Total num frames: 1826816. Throughput: 0: 938.6. Samples: 457042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:37:29,404][00195] Avg episode reward: [(0, '5.400')] [2024-11-28 08:37:32,099][02268] Updated weights for policy 0, policy_version 450 (0.0035) [2024-11-28 08:37:34,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3790.5). Total num frames: 1851392. Throughput: 0: 950.7. Samples: 460434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:37:34,404][00195] Avg episode reward: [(0, '5.745')] [2024-11-28 08:37:34,413][02251] Saving new best policy, reward=5.745! [2024-11-28 08:37:39,399][00195] Fps is (10 sec: 4094.9, 60 sec: 3823.0, 300 sec: 3762.7). Total num frames: 1867776. Throughput: 0: 994.6. Samples: 466712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:37:39,403][00195] Avg episode reward: [(0, '5.710')] [2024-11-28 08:37:44,025][02268] Updated weights for policy 0, policy_version 460 (0.0031) [2024-11-28 08:37:44,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1884160. Throughput: 0: 930.0. Samples: 470898. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-28 08:37:44,404][00195] Avg episode reward: [(0, '5.802')] [2024-11-28 08:37:44,407][02251] Saving new best policy, reward=5.802! [2024-11-28 08:37:49,397][00195] Fps is (10 sec: 3687.4, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 1904640. Throughput: 0: 924.4. Samples: 474118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:37:49,403][00195] Avg episode reward: [(0, '6.391')] [2024-11-28 08:37:49,412][02251] Saving new best policy, reward=6.391! [2024-11-28 08:37:53,198][02268] Updated weights for policy 0, policy_version 470 (0.0021) [2024-11-28 08:37:54,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 1929216. Throughput: 0: 973.8. Samples: 480872. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:37:54,401][00195] Avg episode reward: [(0, '6.495')] [2024-11-28 08:37:54,403][02251] Saving new best policy, reward=6.495! [2024-11-28 08:37:59,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1941504. Throughput: 0: 944.8. Samples: 485546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:37:59,401][00195] Avg episode reward: [(0, '6.194')] [2024-11-28 08:38:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1961984. Throughput: 0: 922.2. Samples: 487948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:38:04,405][00195] Avg episode reward: [(0, '6.302')] [2024-11-28 08:38:04,907][02268] Updated weights for policy 0, policy_version 480 (0.0019) [2024-11-28 08:38:09,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 1982464. Throughput: 0: 951.8. Samples: 494762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:38:09,399][00195] Avg episode reward: [(0, '6.429')] [2024-11-28 08:38:14,399][00195] Fps is (10 sec: 3276.0, 60 sec: 3686.2, 300 sec: 3790.5). Total num frames: 1994752. Throughput: 0: 933.3. Samples: 499042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:38:14,402][00195] Avg episode reward: [(0, '6.653')] [2024-11-28 08:38:14,406][02251] Saving new best policy, reward=6.653! [2024-11-28 08:38:19,280][02268] Updated weights for policy 0, policy_version 490 (0.0040) [2024-11-28 08:38:19,396][00195] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3776.7). Total num frames: 2007040. Throughput: 0: 892.6. Samples: 500600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:38:19,399][00195] Avg episode reward: [(0, '6.352')] [2024-11-28 08:38:24,397][00195] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 2027520. Throughput: 0: 857.6. Samples: 505302. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:38:24,409][00195] Avg episode reward: [(0, '6.584')] [2024-11-28 08:38:28,691][02268] Updated weights for policy 0, policy_version 500 (0.0018) [2024-11-28 08:38:29,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2048000. Throughput: 0: 919.1. Samples: 512258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:38:29,399][00195] Avg episode reward: [(0, '7.282')] [2024-11-28 08:38:29,409][02251] Saving new best policy, reward=7.282! [2024-11-28 08:38:34,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3776.7). Total num frames: 2064384. Throughput: 0: 914.5. Samples: 515272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:38:34,403][00195] Avg episode reward: [(0, '7.865')] [2024-11-28 08:38:34,406][02251] Saving new best policy, reward=7.865! [2024-11-28 08:38:39,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3776.7). Total num frames: 2080768. Throughput: 0: 853.4. Samples: 519276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:38:39,399][00195] Avg episode reward: [(0, '7.578')] [2024-11-28 08:38:39,406][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000508_2080768.pth... [2024-11-28 08:38:39,557][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth [2024-11-28 08:38:40,821][02268] Updated weights for policy 0, policy_version 510 (0.0040) [2024-11-28 08:38:44,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2101248. Throughput: 0: 893.9. Samples: 525770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-11-28 08:38:44,399][00195] Avg episode reward: [(0, '7.313')] [2024-11-28 08:38:49,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 2121728. Throughput: 0: 915.3. Samples: 529138. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:38:49,402][00195] Avg episode reward: [(0, '7.074')] [2024-11-28 08:38:51,008][02268] Updated weights for policy 0, policy_version 520 (0.0021) [2024-11-28 08:38:54,397][00195] Fps is (10 sec: 3686.1, 60 sec: 3481.6, 300 sec: 3776.6). Total num frames: 2138112. Throughput: 0: 874.4. Samples: 534110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:38:54,400][00195] Avg episode reward: [(0, '7.200')] [2024-11-28 08:38:59,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 2158592. Throughput: 0: 909.7. Samples: 539978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:38:59,399][00195] Avg episode reward: [(0, '7.859')] [2024-11-28 08:39:01,369][02268] Updated weights for policy 0, policy_version 530 (0.0023) [2024-11-28 08:39:04,397][00195] Fps is (10 sec: 4505.9, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 2183168. Throughput: 0: 951.7. Samples: 543428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:39:04,399][00195] Avg episode reward: [(0, '8.011')] [2024-11-28 08:39:04,404][02251] Saving new best policy, reward=8.011! [2024-11-28 08:39:09,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3762.8). Total num frames: 2195456. Throughput: 0: 976.0. Samples: 549222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:39:09,399][00195] Avg episode reward: [(0, '7.563')] [2024-11-28 08:39:12,992][02268] Updated weights for policy 0, policy_version 540 (0.0019) [2024-11-28 08:39:14,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3776.7). Total num frames: 2215936. Throughput: 0: 928.5. Samples: 554040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:39:14,399][00195] Avg episode reward: [(0, '7.246')] [2024-11-28 08:39:19,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2240512. Throughput: 0: 938.6. Samples: 557508. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:39:19,398][00195] Avg episode reward: [(0, '7.660')] [2024-11-28 08:39:21,954][02268] Updated weights for policy 0, policy_version 550 (0.0022) [2024-11-28 08:39:24,410][00195] Fps is (10 sec: 4090.6, 60 sec: 3822.1, 300 sec: 3762.7). Total num frames: 2256896. Throughput: 0: 999.4. Samples: 564264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:39:24,415][00195] Avg episode reward: [(0, '8.159')] [2024-11-28 08:39:24,418][02251] Saving new best policy, reward=8.159! [2024-11-28 08:39:29,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2273280. Throughput: 0: 947.8. Samples: 568420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:39:29,404][00195] Avg episode reward: [(0, '8.909')] [2024-11-28 08:39:29,416][02251] Saving new best policy, reward=8.909! [2024-11-28 08:39:33,466][02268] Updated weights for policy 0, policy_version 560 (0.0029) [2024-11-28 08:39:34,396][00195] Fps is (10 sec: 4101.5, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2297856. Throughput: 0: 942.6. Samples: 571556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-28 08:39:34,399][00195] Avg episode reward: [(0, '9.134')] [2024-11-28 08:39:34,403][02251] Saving new best policy, reward=9.134! [2024-11-28 08:39:39,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 2318336. Throughput: 0: 984.8. Samples: 578426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:39:39,401][00195] Avg episode reward: [(0, '9.669')] [2024-11-28 08:39:39,416][02251] Saving new best policy, reward=9.669! [2024-11-28 08:39:44,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2330624. Throughput: 0: 959.4. Samples: 583150. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:39:44,399][00195] Avg episode reward: [(0, '9.713')] [2024-11-28 08:39:44,406][02251] Saving new best policy, reward=9.713! [2024-11-28 08:39:44,860][02268] Updated weights for policy 0, policy_version 570 (0.0027) [2024-11-28 08:39:49,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2351104. Throughput: 0: 932.8. Samples: 585402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:39:49,404][00195] Avg episode reward: [(0, '10.444')] [2024-11-28 08:39:49,415][02251] Saving new best policy, reward=10.444! [2024-11-28 08:39:54,366][02268] Updated weights for policy 0, policy_version 580 (0.0018) [2024-11-28 08:39:54,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2375680. Throughput: 0: 956.3. Samples: 592256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-28 08:39:54,399][00195] Avg episode reward: [(0, '9.718')] [2024-11-28 08:39:59,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2392064. Throughput: 0: 980.0. Samples: 598142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:39:59,403][00195] Avg episode reward: [(0, '9.579')] [2024-11-28 08:40:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2408448. Throughput: 0: 949.7. Samples: 600244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:40:04,398][00195] Avg episode reward: [(0, '9.062')] [2024-11-28 08:40:05,888][02268] Updated weights for policy 0, policy_version 590 (0.0024) [2024-11-28 08:40:09,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2428928. Throughput: 0: 941.0. Samples: 606598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:40:09,403][00195] Avg episode reward: [(0, '9.228')] [2024-11-28 08:40:14,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3776.7). 
Total num frames: 2449408. Throughput: 0: 996.5. Samples: 613262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:40:14,402][00195] Avg episode reward: [(0, '8.935')] [2024-11-28 08:40:15,965][02268] Updated weights for policy 0, policy_version 600 (0.0038) [2024-11-28 08:40:19,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2465792. Throughput: 0: 973.2. Samples: 615352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:40:19,399][00195] Avg episode reward: [(0, '8.619')] [2024-11-28 08:40:24,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3823.8, 300 sec: 3790.5). Total num frames: 2486272. Throughput: 0: 935.5. Samples: 620522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:40:24,399][00195] Avg episode reward: [(0, '8.585')] [2024-11-28 08:40:26,460][02268] Updated weights for policy 0, policy_version 610 (0.0021) [2024-11-28 08:40:29,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3790.6). Total num frames: 2510848. Throughput: 0: 983.2. Samples: 627396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:40:29,399][00195] Avg episode reward: [(0, '8.622')] [2024-11-28 08:40:34,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 2527232. Throughput: 0: 998.7. Samples: 630342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:40:34,399][00195] Avg episode reward: [(0, '8.701')] [2024-11-28 08:40:38,003][02268] Updated weights for policy 0, policy_version 620 (0.0019) [2024-11-28 08:40:39,399][00195] Fps is (10 sec: 3275.9, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 2543616. Throughput: 0: 942.7. Samples: 634682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:40:39,403][00195] Avg episode reward: [(0, '9.028')] [2024-11-28 08:40:39,420][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth... 
[2024-11-28 08:40:39,558][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000400_1638400.pth [2024-11-28 08:40:44,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2568192. Throughput: 0: 962.5. Samples: 641456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:40:44,399][00195] Avg episode reward: [(0, '8.962')] [2024-11-28 08:40:46,955][02268] Updated weights for policy 0, policy_version 630 (0.0021) [2024-11-28 08:40:49,397][00195] Fps is (10 sec: 4097.1, 60 sec: 3891.2, 300 sec: 3776.7). Total num frames: 2584576. Throughput: 0: 994.0. Samples: 644976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:40:49,399][00195] Avg episode reward: [(0, '9.096')] [2024-11-28 08:40:54,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 2600960. Throughput: 0: 955.0. Samples: 649574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:40:54,403][00195] Avg episode reward: [(0, '9.462')] [2024-11-28 08:40:58,445][02268] Updated weights for policy 0, policy_version 640 (0.0016) [2024-11-28 08:40:59,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2625536. Throughput: 0: 947.8. Samples: 655912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:40:59,399][00195] Avg episode reward: [(0, '11.201')] [2024-11-28 08:40:59,409][02251] Saving new best policy, reward=11.201! [2024-11-28 08:41:04,397][00195] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3804.4). Total num frames: 2646016. Throughput: 0: 977.3. Samples: 659332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:41:04,404][00195] Avg episode reward: [(0, '12.785')] [2024-11-28 08:41:04,406][02251] Saving new best policy, reward=12.785! [2024-11-28 08:41:09,260][02268] Updated weights for policy 0, policy_version 650 (0.0030) [2024-11-28 08:41:09,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). 
Total num frames: 2662400. Throughput: 0: 985.7. Samples: 664878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:41:09,399][00195] Avg episode reward: [(0, '12.171')] [2024-11-28 08:41:14,396][00195] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2678784. Throughput: 0: 943.3. Samples: 669844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:41:14,399][00195] Avg episode reward: [(0, '11.355')] [2024-11-28 08:41:19,035][02268] Updated weights for policy 0, policy_version 660 (0.0016) [2024-11-28 08:41:19,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2703360. Throughput: 0: 954.8. Samples: 673310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:41:19,399][00195] Avg episode reward: [(0, '10.918')] [2024-11-28 08:41:24,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2719744. Throughput: 0: 1005.3. Samples: 679920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:41:24,398][00195] Avg episode reward: [(0, '11.943')] [2024-11-28 08:41:29,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.5). Total num frames: 2736128. Throughput: 0: 947.2. Samples: 684078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:41:29,404][00195] Avg episode reward: [(0, '12.273')] [2024-11-28 08:41:30,866][02268] Updated weights for policy 0, policy_version 670 (0.0018) [2024-11-28 08:41:34,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.5). Total num frames: 2760704. Throughput: 0: 941.8. Samples: 687356. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:41:34,398][00195] Avg episode reward: [(0, '13.343')] [2024-11-28 08:41:34,401][02251] Saving new best policy, reward=13.343! [2024-11-28 08:41:39,398][00195] Fps is (10 sec: 4504.8, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2781184. Throughput: 0: 992.9. Samples: 694254. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:41:39,403][00195] Avg episode reward: [(0, '14.360')] [2024-11-28 08:41:39,419][02251] Saving new best policy, reward=14.360! [2024-11-28 08:41:40,062][02268] Updated weights for policy 0, policy_version 680 (0.0018) [2024-11-28 08:41:44,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 2793472. Throughput: 0: 956.1. Samples: 698938. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-28 08:41:44,403][00195] Avg episode reward: [(0, '13.685')] [2024-11-28 08:41:49,396][00195] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2813952. Throughput: 0: 933.5. Samples: 701338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:41:49,398][00195] Avg episode reward: [(0, '12.569')] [2024-11-28 08:41:51,418][02268] Updated weights for policy 0, policy_version 690 (0.0014) [2024-11-28 08:41:54,397][00195] Fps is (10 sec: 4505.7, 60 sec: 3959.4, 300 sec: 3804.4). Total num frames: 2838528. Throughput: 0: 964.9. Samples: 708300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:41:54,400][00195] Avg episode reward: [(0, '13.141')] [2024-11-28 08:41:59,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2854912. Throughput: 0: 984.0. Samples: 714126. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:41:59,402][00195] Avg episode reward: [(0, '12.530')] [2024-11-28 08:42:03,103][02268] Updated weights for policy 0, policy_version 700 (0.0021) [2024-11-28 08:42:04,396][00195] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2871296. Throughput: 0: 951.0. Samples: 716106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:42:04,404][00195] Avg episode reward: [(0, '13.076')] [2024-11-28 08:42:09,400][00195] Fps is (10 sec: 4094.6, 60 sec: 3891.0, 300 sec: 3804.4). Total num frames: 2895872. Throughput: 0: 942.5. Samples: 722336. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:42:09,402][00195] Avg episode reward: [(0, '14.494')] [2024-11-28 08:42:09,412][02251] Saving new best policy, reward=14.494! [2024-11-28 08:42:12,338][02268] Updated weights for policy 0, policy_version 710 (0.0018) [2024-11-28 08:42:14,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 2912256. Throughput: 0: 995.8. Samples: 728890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:42:14,407][00195] Avg episode reward: [(0, '14.829')] [2024-11-28 08:42:14,412][02251] Saving new best policy, reward=14.829! [2024-11-28 08:42:19,397][00195] Fps is (10 sec: 3277.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2928640. Throughput: 0: 967.7. Samples: 730902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:42:19,402][00195] Avg episode reward: [(0, '15.400')] [2024-11-28 08:42:19,413][02251] Saving new best policy, reward=15.400! [2024-11-28 08:42:24,016][02268] Updated weights for policy 0, policy_version 720 (0.0037) [2024-11-28 08:42:24,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2949120. Throughput: 0: 929.3. Samples: 736072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:42:24,400][00195] Avg episode reward: [(0, '15.334')] [2024-11-28 08:42:29,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 2973696. Throughput: 0: 979.9. Samples: 743034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:42:29,399][00195] Avg episode reward: [(0, '15.673')] [2024-11-28 08:42:29,406][02251] Saving new best policy, reward=15.673! [2024-11-28 08:42:34,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 2985984. Throughput: 0: 990.3. Samples: 745902. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:42:34,404][00195] Avg episode reward: [(0, '15.352')] [2024-11-28 08:42:34,611][02268] Updated weights for policy 0, policy_version 730 (0.0022) [2024-11-28 08:42:39,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3754.8, 300 sec: 3804.4). Total num frames: 3006464. Throughput: 0: 931.3. Samples: 750210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:42:39,400][00195] Avg episode reward: [(0, '15.389')] [2024-11-28 08:42:39,410][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth... [2024-11-28 08:42:39,552][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000508_2080768.pth [2024-11-28 08:42:44,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3026944. Throughput: 0: 951.0. Samples: 756922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:42:44,399][00195] Avg episode reward: [(0, '16.028')] [2024-11-28 08:42:44,407][02251] Saving new best policy, reward=16.028! [2024-11-28 08:42:44,734][02268] Updated weights for policy 0, policy_version 740 (0.0026) [2024-11-28 08:42:49,397][00195] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3047424. Throughput: 0: 983.2. Samples: 760350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:42:49,403][00195] Avg episode reward: [(0, '16.306')] [2024-11-28 08:42:49,413][02251] Saving new best policy, reward=16.306! [2024-11-28 08:42:54,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3059712. Throughput: 0: 943.4. Samples: 764784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:42:54,403][00195] Avg episode reward: [(0, '15.748')] [2024-11-28 08:42:56,518][02268] Updated weights for policy 0, policy_version 750 (0.0021) [2024-11-28 08:42:59,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3084288. 
Throughput: 0: 933.5. Samples: 770896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:42:59,405][00195] Avg episode reward: [(0, '16.924')] [2024-11-28 08:42:59,414][02251] Saving new best policy, reward=16.924! [2024-11-28 08:43:04,396][00195] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3104768. Throughput: 0: 965.0. Samples: 774326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:43:04,402][00195] Avg episode reward: [(0, '18.483')] [2024-11-28 08:43:04,412][02251] Saving new best policy, reward=18.483! [2024-11-28 08:43:06,068][02268] Updated weights for policy 0, policy_version 760 (0.0028) [2024-11-28 08:43:09,398][00195] Fps is (10 sec: 3686.0, 60 sec: 3754.8, 300 sec: 3818.3). Total num frames: 3121152. Throughput: 0: 971.3. Samples: 779782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:43:09,400][00195] Avg episode reward: [(0, '17.122')] [2024-11-28 08:43:14,403][00195] Fps is (10 sec: 2865.3, 60 sec: 3686.0, 300 sec: 3818.2). Total num frames: 3133440. Throughput: 0: 914.8. Samples: 784206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:43:14,415][00195] Avg episode reward: [(0, '17.412')] [2024-11-28 08:43:19,396][00195] Fps is (10 sec: 2867.5, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3149824. Throughput: 0: 895.7. Samples: 786208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:43:19,400][00195] Avg episode reward: [(0, '16.753')] [2024-11-28 08:43:20,117][02268] Updated weights for policy 0, policy_version 770 (0.0030) [2024-11-28 08:43:24,396][00195] Fps is (10 sec: 3279.0, 60 sec: 3618.1, 300 sec: 3790.5). Total num frames: 3166208. Throughput: 0: 914.9. Samples: 791378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:43:24,401][00195] Avg episode reward: [(0, '17.421')] [2024-11-28 08:43:29,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3790.5). Total num frames: 3182592. Throughput: 0: 859.3. Samples: 795592. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:43:29,399][00195] Avg episode reward: [(0, '16.994')] [2024-11-28 08:43:31,735][02268] Updated weights for policy 0, policy_version 780 (0.0041) [2024-11-28 08:43:34,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3207168. Throughput: 0: 856.8. Samples: 798904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:43:34,401][00195] Avg episode reward: [(0, '18.072')] [2024-11-28 08:43:39,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3227648. Throughput: 0: 912.8. Samples: 805858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:43:39,402][00195] Avg episode reward: [(0, '19.560')] [2024-11-28 08:43:39,412][02251] Saving new best policy, reward=19.560! [2024-11-28 08:43:41,910][02268] Updated weights for policy 0, policy_version 790 (0.0022) [2024-11-28 08:43:44,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3790.5). Total num frames: 3239936. Throughput: 0: 879.9. Samples: 810492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:43:44,398][00195] Avg episode reward: [(0, '20.542')] [2024-11-28 08:43:44,401][02251] Saving new best policy, reward=20.542! [2024-11-28 08:43:49,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3804.4). Total num frames: 3260416. Throughput: 0: 855.2. Samples: 812810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:43:49,404][00195] Avg episode reward: [(0, '18.871')] [2024-11-28 08:43:52,506][02268] Updated weights for policy 0, policy_version 800 (0.0022) [2024-11-28 08:43:54,397][00195] Fps is (10 sec: 4505.5, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3284992. Throughput: 0: 887.4. Samples: 819712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:43:54,398][00195] Avg episode reward: [(0, '18.730')] [2024-11-28 08:43:59,399][00195] Fps is (10 sec: 4095.1, 60 sec: 3618.0, 300 sec: 3790.5). 
Total num frames: 3301376. Throughput: 0: 917.7. Samples: 825498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:43:59,401][00195] Avg episode reward: [(0, '18.799')] [2024-11-28 08:44:04,211][02268] Updated weights for policy 0, policy_version 810 (0.0030) [2024-11-28 08:44:04,396][00195] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 3317760. Throughput: 0: 919.1. Samples: 827566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:44:04,402][00195] Avg episode reward: [(0, '17.752')] [2024-11-28 08:44:09,397][00195] Fps is (10 sec: 3687.4, 60 sec: 3618.2, 300 sec: 3804.4). Total num frames: 3338240. Throughput: 0: 943.6. Samples: 833838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:44:09,399][00195] Avg episode reward: [(0, '18.057')] [2024-11-28 08:44:13,240][02268] Updated weights for policy 0, policy_version 820 (0.0024) [2024-11-28 08:44:14,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3755.1, 300 sec: 3790.5). Total num frames: 3358720. Throughput: 0: 995.5. Samples: 840390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:44:14,401][00195] Avg episode reward: [(0, '18.791')] [2024-11-28 08:44:19,397][00195] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3790.7). Total num frames: 3375104. Throughput: 0: 969.1. Samples: 842512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:44:19,400][00195] Avg episode reward: [(0, '19.066')] [2024-11-28 08:44:24,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3395584. Throughput: 0: 934.6. Samples: 847916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:44:24,400][00195] Avg episode reward: [(0, '18.069')] [2024-11-28 08:44:24,522][02268] Updated weights for policy 0, policy_version 830 (0.0039) [2024-11-28 08:44:29,396][00195] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3420160. Throughput: 0: 990.5. Samples: 855064. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:44:29,399][00195] Avg episode reward: [(0, '17.541')] [2024-11-28 08:44:34,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3436544. Throughput: 0: 1003.0. Samples: 857944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:44:34,401][00195] Avg episode reward: [(0, '16.624')] [2024-11-28 08:44:35,341][02268] Updated weights for policy 0, policy_version 840 (0.0018) [2024-11-28 08:44:39,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3457024. Throughput: 0: 949.5. Samples: 862440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:44:39,402][00195] Avg episode reward: [(0, '17.563')] [2024-11-28 08:44:39,411][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000844_3457024.pth... [2024-11-28 08:44:39,558][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000621_2543616.pth [2024-11-28 08:44:44,397][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3477504. Throughput: 0: 973.5. Samples: 869302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:44:44,399][00195] Avg episode reward: [(0, '18.362')] [2024-11-28 08:44:44,945][02268] Updated weights for policy 0, policy_version 850 (0.0032) [2024-11-28 08:44:49,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3497984. Throughput: 0: 1003.4. Samples: 872718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:44:49,399][00195] Avg episode reward: [(0, '19.933')] [2024-11-28 08:44:54,397][00195] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 3510272. Throughput: 0: 962.6. Samples: 877156. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:44:54,402][00195] Avg episode reward: [(0, '20.494')] [2024-11-28 08:44:56,301][02268] Updated weights for policy 0, policy_version 860 (0.0028) [2024-11-28 08:44:59,397][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 3534848. Throughput: 0: 961.6. Samples: 883660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:44:59,400][00195] Avg episode reward: [(0, '21.358')] [2024-11-28 08:44:59,408][02251] Saving new best policy, reward=21.358! [2024-11-28 08:45:04,398][00195] Fps is (10 sec: 4505.1, 60 sec: 3959.4, 300 sec: 3818.3). Total num frames: 3555328. Throughput: 0: 989.6. Samples: 887046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:45:04,404][00195] Avg episode reward: [(0, '21.614')] [2024-11-28 08:45:04,440][02251] Saving new best policy, reward=21.614! [2024-11-28 08:45:06,079][02268] Updated weights for policy 0, policy_version 870 (0.0037) [2024-11-28 08:45:09,397][00195] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 3571712. Throughput: 0: 987.6. Samples: 892358. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:45:09,401][00195] Avg episode reward: [(0, '21.624')] [2024-11-28 08:45:09,413][02251] Saving new best policy, reward=21.624! [2024-11-28 08:45:14,396][00195] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3592192. Throughput: 0: 945.9. Samples: 897628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:45:14,403][00195] Avg episode reward: [(0, '21.365')] [2024-11-28 08:45:17,062][02268] Updated weights for policy 0, policy_version 880 (0.0018) [2024-11-28 08:45:19,396][00195] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 3612672. Throughput: 0: 955.7. Samples: 900950. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-28 08:45:19,399][00195] Avg episode reward: [(0, '21.279')] [2024-11-28 08:45:24,398][00195] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3790.5). Total num frames: 3629056. Throughput: 0: 998.8. Samples: 907386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:45:24,401][00195] Avg episode reward: [(0, '21.432')] [2024-11-28 08:45:28,630][02268] Updated weights for policy 0, policy_version 890 (0.0063) [2024-11-28 08:45:29,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3645440. Throughput: 0: 942.8. Samples: 911730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:45:29,403][00195] Avg episode reward: [(0, '21.920')] [2024-11-28 08:45:29,447][02251] Saving new best policy, reward=21.920! [2024-11-28 08:45:34,396][00195] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3670016. Throughput: 0: 940.0. Samples: 915016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:45:34,399][00195] Avg episode reward: [(0, '22.432')] [2024-11-28 08:45:34,403][02251] Saving new best policy, reward=22.432! [2024-11-28 08:45:37,745][02268] Updated weights for policy 0, policy_version 900 (0.0024) [2024-11-28 08:45:39,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3690496. Throughput: 0: 992.9. Samples: 921836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-28 08:45:39,399][00195] Avg episode reward: [(0, '21.587')] [2024-11-28 08:45:44,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3702784. Throughput: 0: 948.3. Samples: 926334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:45:44,402][00195] Avg episode reward: [(0, '20.588')] [2024-11-28 08:45:49,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 3723264. Throughput: 0: 928.3. Samples: 928816. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:45:49,399][00195] Avg episode reward: [(0, '21.814')] [2024-11-28 08:45:49,654][02268] Updated weights for policy 0, policy_version 910 (0.0024) [2024-11-28 08:45:54,397][00195] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3747840. Throughput: 0: 962.9. Samples: 935688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-28 08:45:54,399][00195] Avg episode reward: [(0, '20.884')] [2024-11-28 08:45:59,397][00195] Fps is (10 sec: 4095.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3764224. Throughput: 0: 971.8. Samples: 941360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:45:59,402][00195] Avg episode reward: [(0, '20.709')] [2024-11-28 08:46:00,278][02268] Updated weights for policy 0, policy_version 920 (0.0026) [2024-11-28 08:46:04,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3790.5). Total num frames: 3780608. Throughput: 0: 946.0. Samples: 943522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:46:04,402][00195] Avg episode reward: [(0, '20.664')] [2024-11-28 08:46:09,397][00195] Fps is (10 sec: 4096.3, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 3805184. Throughput: 0: 952.1. Samples: 950230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:46:09,399][00195] Avg episode reward: [(0, '21.113')] [2024-11-28 08:46:09,777][02268] Updated weights for policy 0, policy_version 930 (0.0021) [2024-11-28 08:46:14,396][00195] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3825664. Throughput: 0: 999.5. Samples: 956708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:46:14,400][00195] Avg episode reward: [(0, '19.813')] [2024-11-28 08:46:19,397][00195] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 3837952. Throughput: 0: 973.1. Samples: 958804. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:46:19,399][00195] Avg episode reward: [(0, '20.580')] [2024-11-28 08:46:21,357][02268] Updated weights for policy 0, policy_version 940 (0.0030) [2024-11-28 08:46:24,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3818.3). Total num frames: 3862528. Throughput: 0: 945.2. Samples: 964368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:46:24,404][00195] Avg episode reward: [(0, '20.875')] [2024-11-28 08:46:29,396][00195] Fps is (10 sec: 4915.4, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 3887104. Throughput: 0: 1002.4. Samples: 971440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-28 08:46:29,399][00195] Avg episode reward: [(0, '21.718')] [2024-11-28 08:46:30,302][02268] Updated weights for policy 0, policy_version 950 (0.0027) [2024-11-28 08:46:34,400][00195] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3790.5). Total num frames: 3899392. Throughput: 0: 1007.1. Samples: 974138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:46:34,403][00195] Avg episode reward: [(0, '22.750')] [2024-11-28 08:46:34,405][02251] Saving new best policy, reward=22.750! [2024-11-28 08:46:39,397][00195] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3919872. Throughput: 0: 957.6. Samples: 978782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:46:39,407][00195] Avg episode reward: [(0, '24.118')] [2024-11-28 08:46:39,417][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000957_3919872.pth... [2024-11-28 08:46:39,536][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000734_3006464.pth [2024-11-28 08:46:39,552][02251] Saving new best policy, reward=24.118! [2024-11-28 08:46:41,681][02268] Updated weights for policy 0, policy_version 960 (0.0016) [2024-11-28 08:46:44,396][00195] Fps is (10 sec: 4507.3, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 3944448. 
Throughput: 0: 983.2. Samples: 985602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:46:44,399][00195] Avg episode reward: [(0, '25.261')] [2024-11-28 08:46:44,405][02251] Saving new best policy, reward=25.261! [2024-11-28 08:46:49,396][00195] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 3960832. Throughput: 0: 1006.4. Samples: 988812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-28 08:46:49,400][00195] Avg episode reward: [(0, '24.951')] [2024-11-28 08:46:53,484][02268] Updated weights for policy 0, policy_version 970 (0.0023) [2024-11-28 08:46:54,396][00195] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3977216. Throughput: 0: 950.9. Samples: 993022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:46:54,399][00195] Avg episode reward: [(0, '25.118')] [2024-11-28 08:46:59,396][00195] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3997696. Throughput: 0: 955.6. Samples: 999712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-28 08:46:59,402][00195] Avg episode reward: [(0, '24.634')] [2024-11-28 08:47:00,456][02251] Stopping Batcher_0... [2024-11-28 08:47:00,456][02251] Loop batcher_evt_loop terminating... [2024-11-28 08:47:00,456][00195] Component Batcher_0 stopped! [2024-11-28 08:47:00,460][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-28 08:47:00,516][02268] Weights refcount: 2 0 [2024-11-28 08:47:00,520][02268] Stopping InferenceWorker_p0-w0... [2024-11-28 08:47:00,521][02268] Loop inference_proc0-0_evt_loop terminating... [2024-11-28 08:47:00,520][00195] Component InferenceWorker_p0-w0 stopped! [2024-11-28 08:47:00,579][02251] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000844_3457024.pth [2024-11-28 08:47:00,594][02251] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
[2024-11-28 08:47:00,792][02251] Stopping LearnerWorker_p0... [2024-11-28 08:47:00,793][00195] Component LearnerWorker_p0 stopped! [2024-11-28 08:47:00,793][02251] Loop learner_proc0_evt_loop terminating... [2024-11-28 08:47:00,840][02272] Stopping RolloutWorker_w3... [2024-11-28 08:47:00,841][02272] Loop rollout_proc3_evt_loop terminating... [2024-11-28 08:47:00,840][00195] Component RolloutWorker_w3 stopped! [2024-11-28 08:47:00,892][02274] Stopping RolloutWorker_w5... [2024-11-28 08:47:00,892][02274] Loop rollout_proc5_evt_loop terminating... [2024-11-28 08:47:00,889][00195] Component RolloutWorker_w5 stopped! [2024-11-28 08:47:00,894][02275] Stopping RolloutWorker_w7... [2024-11-28 08:47:00,895][00195] Component RolloutWorker_w7 stopped! [2024-11-28 08:47:00,901][02275] Loop rollout_proc7_evt_loop terminating... [2024-11-28 08:47:00,904][00195] Component RolloutWorker_w1 stopped! [2024-11-28 08:47:00,906][02270] Stopping RolloutWorker_w1... [2024-11-28 08:47:00,910][02270] Loop rollout_proc1_evt_loop terminating... [2024-11-28 08:47:00,943][02271] Stopping RolloutWorker_w2... [2024-11-28 08:47:00,943][00195] Component RolloutWorker_w2 stopped! [2024-11-28 08:47:00,944][02271] Loop rollout_proc2_evt_loop terminating... [2024-11-28 08:47:00,971][02273] Stopping RolloutWorker_w4... [2024-11-28 08:47:00,971][00195] Component RolloutWorker_w4 stopped! [2024-11-28 08:47:00,972][02273] Loop rollout_proc4_evt_loop terminating... [2024-11-28 08:47:00,991][00195] Component RolloutWorker_w0 stopped! [2024-11-28 08:47:00,991][02269] Stopping RolloutWorker_w0... [2024-11-28 08:47:00,997][02269] Loop rollout_proc0_evt_loop terminating... [2024-11-28 08:47:01,013][02276] Stopping RolloutWorker_w6... [2024-11-28 08:47:01,013][00195] Component RolloutWorker_w6 stopped! [2024-11-28 08:47:01,014][02276] Loop rollout_proc6_evt_loop terminating... [2024-11-28 08:47:01,015][00195] Waiting for process learner_proc0 to stop... 
[2024-11-28 08:47:02,457][00195] Waiting for process inference_proc0-0 to join... [2024-11-28 08:47:02,462][00195] Waiting for process rollout_proc0 to join... [2024-11-28 08:47:04,804][00195] Waiting for process rollout_proc1 to join... [2024-11-28 08:47:04,934][00195] Waiting for process rollout_proc2 to join... [2024-11-28 08:47:04,937][00195] Waiting for process rollout_proc3 to join... [2024-11-28 08:47:04,942][00195] Waiting for process rollout_proc4 to join... [2024-11-28 08:47:04,946][00195] Waiting for process rollout_proc5 to join... [2024-11-28 08:47:04,950][00195] Waiting for process rollout_proc6 to join... [2024-11-28 08:47:04,954][00195] Waiting for process rollout_proc7 to join... [2024-11-28 08:47:04,960][00195] Batcher 0 profile tree view: batching: 28.2691, releasing_batches: 0.0272 [2024-11-28 08:47:04,961][00195] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 405.9747 update_model: 8.6611 weight_update: 0.0030 one_step: 0.0024 handle_policy_step: 608.9111 deserialize: 15.0671, stack: 3.4031, obs_to_device_normalize: 128.1769, forward: 305.6480, send_messages: 30.3155 prepare_outputs: 95.9428 to_cpu: 58.1498 [2024-11-28 08:47:04,965][00195] Learner 0 profile tree view: misc: 0.0050, prepare_batch: 14.0583 train: 76.6414 epoch_init: 0.0079, minibatch_init: 0.0124, losses_postprocess: 0.6427, kl_divergence: 0.6697, after_optimizer: 34.0289 calculate_losses: 28.1813 losses_init: 0.0055, forward_head: 1.3473, bptt_initial: 18.9848, tail: 1.2203, advantages_returns: 0.3103, losses: 3.9149 bptt: 2.0505 bptt_forward_core: 1.9567 update: 12.5077 clip: 0.9133 [2024-11-28 08:47:04,967][00195] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.2925, enqueue_policy_requests: 99.5740, env_step: 837.2425, overhead: 13.5808, complete_rollouts: 7.3692 save_policy_outputs: 21.7104 split_output_tensors: 8.8833 [2024-11-28 08:47:04,968][00195] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3746, 
enqueue_policy_requests: 102.9072, env_step: 835.3827, overhead: 13.0551, complete_rollouts: 6.2838 save_policy_outputs: 21.1759 split_output_tensors: 8.6046 [2024-11-28 08:47:04,971][00195] Loop Runner_EvtLoop terminating... [2024-11-28 08:47:04,972][00195] Runner profile tree view: main_loop: 1098.3587 [2024-11-28 08:47:04,974][00195] Collected {0: 4005888}, FPS: 3647.2 [2024-11-28 09:07:36,401][00195] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-28 09:07:36,403][00195] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-28 09:07:36,405][00195] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-28 09:07:36,408][00195] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-28 09:07:36,409][00195] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-28 09:07:36,411][00195] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-28 09:07:36,413][00195] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-28 09:07:36,414][00195] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-28 09:07:36,415][00195] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-28 09:07:36,416][00195] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-28 09:07:36,417][00195] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-28 09:07:36,418][00195] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-28 09:07:36,420][00195] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-28 09:07:36,421][00195] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-11-28 09:07:36,422][00195] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-28 09:07:36,457][00195] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-28 09:07:36,460][00195] RunningMeanStd input shape: (3, 72, 128) [2024-11-28 09:07:36,464][00195] RunningMeanStd input shape: (1,) [2024-11-28 09:07:36,479][00195] ConvEncoder: input_channels=3 [2024-11-28 09:07:36,594][00195] Conv encoder output size: 512 [2024-11-28 09:07:36,596][00195] Policy head output size: 512 [2024-11-28 09:07:36,914][00195] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-28 09:07:37,783][00195] Num frames 100... [2024-11-28 09:07:37,917][00195] Num frames 200... [2024-11-28 09:07:38,055][00195] Num frames 300... [2024-11-28 09:07:38,192][00195] Num frames 400... [2024-11-28 09:07:38,341][00195] Num frames 500... [2024-11-28 09:07:38,478][00195] Num frames 600... [2024-11-28 09:07:38,618][00195] Num frames 700... [2024-11-28 09:07:38,753][00195] Num frames 800... [2024-11-28 09:07:38,901][00195] Num frames 900... [2024-11-28 09:07:39,037][00195] Num frames 1000... [2024-11-28 09:07:39,133][00195] Avg episode rewards: #0: 27.290, true rewards: #0: 10.290 [2024-11-28 09:07:39,135][00195] Avg episode reward: 27.290, avg true_objective: 10.290 [2024-11-28 09:07:39,232][00195] Num frames 1100... [2024-11-28 09:07:39,377][00195] Num frames 1200... [2024-11-28 09:07:39,514][00195] Num frames 1300... [2024-11-28 09:07:39,652][00195] Num frames 1400... [2024-11-28 09:07:39,787][00195] Num frames 1500... [2024-11-28 09:07:39,927][00195] Num frames 1600... [2024-11-28 09:07:40,065][00195] Num frames 1700... [2024-11-28 09:07:40,200][00195] Num frames 1800... [2024-11-28 09:07:40,374][00195] Avg episode rewards: #0: 24.930, true rewards: #0: 9.430 [2024-11-28 09:07:40,376][00195] Avg episode reward: 24.930, avg true_objective: 9.430 [2024-11-28 09:07:40,398][00195] Num frames 1900... 
[2024-11-28 09:07:40,537][00195] Num frames 2000... [2024-11-28 09:07:40,682][00195] Num frames 2100... [2024-11-28 09:07:40,837][00195] Num frames 2200... [2024-11-28 09:07:40,973][00195] Num frames 2300... [2024-11-28 09:07:41,106][00195] Num frames 2400... [2024-11-28 09:07:41,205][00195] Avg episode rewards: #0: 19.100, true rewards: #0: 8.100 [2024-11-28 09:07:41,207][00195] Avg episode reward: 19.100, avg true_objective: 8.100 [2024-11-28 09:07:41,312][00195] Num frames 2500... [2024-11-28 09:07:41,456][00195] Num frames 2600... [2024-11-28 09:07:41,624][00195] Avg episode rewards: #0: 14.965, true rewards: #0: 6.715 [2024-11-28 09:07:41,626][00195] Avg episode reward: 14.965, avg true_objective: 6.715 [2024-11-28 09:07:41,650][00195] Num frames 2700... [2024-11-28 09:07:41,784][00195] Num frames 2800... [2024-11-28 09:07:41,933][00195] Num frames 2900... [2024-11-28 09:07:42,072][00195] Num frames 3000... [2024-11-28 09:07:42,211][00195] Num frames 3100... [2024-11-28 09:07:42,349][00195] Num frames 3200... [2024-11-28 09:07:42,488][00195] Num frames 3300... [2024-11-28 09:07:42,623][00195] Num frames 3400... [2024-11-28 09:07:42,762][00195] Num frames 3500... [2024-11-28 09:07:42,900][00195] Num frames 3600... [2024-11-28 09:07:42,999][00195] Avg episode rewards: #0: 16.058, true rewards: #0: 7.258 [2024-11-28 09:07:43,001][00195] Avg episode reward: 16.058, avg true_objective: 7.258 [2024-11-28 09:07:43,104][00195] Num frames 3700... [2024-11-28 09:07:43,245][00195] Num frames 3800... [2024-11-28 09:07:43,389][00195] Num frames 3900... [2024-11-28 09:07:43,539][00195] Num frames 4000... [2024-11-28 09:07:43,679][00195] Num frames 4100... [2024-11-28 09:07:43,818][00195] Num frames 4200... [2024-11-28 09:07:43,954][00195] Num frames 4300... [2024-11-28 09:07:44,090][00195] Num frames 4400... 
[2024-11-28 09:07:44,230][00195] Avg episode rewards: #0: 16.602, true rewards: #0: 7.435 [2024-11-28 09:07:44,232][00195] Avg episode reward: 16.602, avg true_objective: 7.435 [2024-11-28 09:07:44,285][00195] Num frames 4500... [2024-11-28 09:07:44,419][00195] Num frames 4600... [2024-11-28 09:07:44,563][00195] Num frames 4700... [2024-11-28 09:07:44,698][00195] Num frames 4800... [2024-11-28 09:07:44,842][00195] Num frames 4900... [2024-11-28 09:07:44,983][00195] Num frames 5000... [2024-11-28 09:07:45,117][00195] Num frames 5100... [2024-11-28 09:07:45,248][00195] Num frames 5200... [2024-11-28 09:07:45,380][00195] Num frames 5300... [2024-11-28 09:07:45,521][00195] Num frames 5400... [2024-11-28 09:07:45,656][00195] Num frames 5500... [2024-11-28 09:07:45,798][00195] Num frames 5600... [2024-11-28 09:07:45,938][00195] Num frames 5700... [2024-11-28 09:07:46,073][00195] Num frames 5800... [2024-11-28 09:07:46,206][00195] Num frames 5900... [2024-11-28 09:07:46,390][00195] Num frames 6000... [2024-11-28 09:07:46,603][00195] Num frames 6100... [2024-11-28 09:07:46,765][00195] Avg episode rewards: #0: 19.931, true rewards: #0: 8.789 [2024-11-28 09:07:46,767][00195] Avg episode reward: 19.931, avg true_objective: 8.789 [2024-11-28 09:07:46,864][00195] Num frames 6200... [2024-11-28 09:07:47,052][00195] Num frames 6300... [2024-11-28 09:07:47,260][00195] Num frames 6400... [2024-11-28 09:07:47,440][00195] Num frames 6500... [2024-11-28 09:07:47,641][00195] Num frames 6600... [2024-11-28 09:07:47,842][00195] Num frames 6700... [2024-11-28 09:07:48,041][00195] Num frames 6800... [2024-11-28 09:07:48,215][00195] Avg episode rewards: #0: 19.449, true rewards: #0: 8.574 [2024-11-28 09:07:48,217][00195] Avg episode reward: 19.449, avg true_objective: 8.574 [2024-11-28 09:07:48,303][00195] Num frames 6900... [2024-11-28 09:07:48,496][00195] Num frames 7000... [2024-11-28 09:07:48,695][00195] Num frames 7100... [2024-11-28 09:07:48,907][00195] Num frames 7200... 
[2024-11-28 09:07:49,109][00195] Num frames 7300... [2024-11-28 09:07:49,250][00195] Num frames 7400... [2024-11-28 09:07:49,374][00195] Avg episode rewards: #0: 18.501, true rewards: #0: 8.279 [2024-11-28 09:07:49,376][00195] Avg episode reward: 18.501, avg true_objective: 8.279 [2024-11-28 09:07:49,440][00195] Num frames 7500... [2024-11-28 09:07:49,573][00195] Num frames 7600... [2024-11-28 09:07:49,712][00195] Num frames 7700... [2024-11-28 09:07:49,864][00195] Num frames 7800... [2024-11-28 09:07:50,013][00195] Avg episode rewards: #0: 17.267, true rewards: #0: 7.867 [2024-11-28 09:07:50,015][00195] Avg episode reward: 17.267, avg true_objective: 7.867 [2024-11-28 09:08:44,584][00195] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-11-28 09:16:05,200][00195] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-28 09:16:05,202][00195] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-28 09:16:05,204][00195] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-28 09:16:05,206][00195] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-28 09:16:05,208][00195] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-28 09:16:05,210][00195] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-28 09:16:05,211][00195] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-28 09:16:05,212][00195] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-28 09:16:05,213][00195] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-28 09:16:05,214][00195] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
[2024-11-28 09:16:05,216][00195] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-28 09:16:05,217][00195] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-28 09:16:05,218][00195] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-28 09:16:05,219][00195] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-28 09:16:05,220][00195] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-28 09:16:05,254][00195] RunningMeanStd input shape: (3, 72, 128) [2024-11-28 09:16:05,258][00195] RunningMeanStd input shape: (1,) [2024-11-28 09:16:05,271][00195] ConvEncoder: input_channels=3 [2024-11-28 09:16:05,313][00195] Conv encoder output size: 512 [2024-11-28 09:16:05,315][00195] Policy head output size: 512 [2024-11-28 09:16:05,336][00195] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-28 09:16:05,811][00195] Num frames 100... [2024-11-28 09:16:05,972][00195] Num frames 200... [2024-11-28 09:16:06,114][00195] Num frames 300... [2024-11-28 09:16:06,250][00195] Num frames 400... [2024-11-28 09:16:06,389][00195] Num frames 500... [2024-11-28 09:16:06,603][00195] Num frames 600... [2024-11-28 09:16:06,810][00195] Avg episode rewards: #0: 11.720, true rewards: #0: 6.720 [2024-11-28 09:16:06,813][00195] Avg episode reward: 11.720, avg true_objective: 6.720 [2024-11-28 09:16:06,878][00195] Num frames 700... [2024-11-28 09:16:07,069][00195] Num frames 800... [2024-11-28 09:16:07,261][00195] Num frames 900... [2024-11-28 09:16:07,457][00195] Num frames 1000... 
[2024-11-28 09:16:20,450][00195] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-28 09:16:20,453][00195] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-28 09:16:20,455][00195] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-28 09:16:20,457][00195] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-28 09:16:20,459][00195] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-28 09:16:20,461][00195] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-28 09:16:20,463][00195] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-28 09:16:20,464][00195] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-28 09:16:20,468][00195] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-28 09:16:20,469][00195] Adding new argument 'hf_repository'='Farseer-W/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-28 09:16:20,470][00195] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-28 09:16:20,472][00195] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-28 09:16:20,475][00195] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-28 09:16:20,476][00195] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-11-28 09:16:20,478][00195] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-28 09:16:20,544][00195] RunningMeanStd input shape: (3, 72, 128) [2024-11-28 09:16:20,547][00195] RunningMeanStd input shape: (1,) [2024-11-28 09:16:20,570][00195] ConvEncoder: input_channels=3 [2024-11-28 09:16:20,639][00195] Conv encoder output size: 512 [2024-11-28 09:16:20,642][00195] Policy head output size: 512 [2024-11-28 09:16:20,676][00195] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-28 09:16:21,270][00195] Num frames 100... [2024-11-28 09:16:21,409][00195] Num frames 200... [2024-11-28 09:16:21,547][00195] Num frames 300... [2024-11-28 09:16:21,690][00195] Num frames 400... [2024-11-28 09:16:21,803][00195] Avg episode rewards: #0: 7.400, true rewards: #0: 4.400 [2024-11-28 09:16:21,805][00195] Avg episode reward: 7.400, avg true_objective: 4.400 [2024-11-28 09:16:21,899][00195] Num frames 500... [2024-11-28 09:16:22,047][00195] Num frames 600... [2024-11-28 09:16:22,187][00195] Num frames 700... [2024-11-28 09:16:22,327][00195] Num frames 800... [2024-11-28 09:16:22,461][00195] Num frames 900... [2024-11-28 09:16:22,597][00195] Num frames 1000... [2024-11-28 09:16:22,740][00195] Num frames 1100... [2024-11-28 09:16:22,884][00195] Num frames 1200... [2024-11-28 09:16:22,994][00195] Avg episode rewards: #0: 12.200, true rewards: #0: 6.200 [2024-11-28 09:16:22,997][00195] Avg episode reward: 12.200, avg true_objective: 6.200 [2024-11-28 09:16:23,086][00195] Num frames 1300... [2024-11-28 09:16:23,221][00195] Num frames 1400... [2024-11-28 09:16:23,358][00195] Num frames 1500... [2024-11-28 09:16:23,499][00195] Num frames 1600... [2024-11-28 09:16:23,631][00195] Num frames 1700... [2024-11-28 09:16:23,767][00195] Num frames 1800... [2024-11-28 09:16:23,918][00195] Num frames 1900... [2024-11-28 09:16:24,067][00195] Num frames 2000... 
[2024-11-28 09:16:24,212][00195] Num frames 2100... [2024-11-28 09:16:24,349][00195] Num frames 2200... [2024-11-28 09:16:24,490][00195] Num frames 2300... [2024-11-28 09:16:24,625][00195] Num frames 2400... [2024-11-28 09:16:24,764][00195] Num frames 2500... [2024-11-28 09:16:24,917][00195] Num frames 2600... [2024-11-28 09:16:25,061][00195] Num frames 2700... [2024-11-28 09:16:25,195][00195] Num frames 2800... [2024-11-28 09:16:25,347][00195] Avg episode rewards: #0: 22.907, true rewards: #0: 9.573 [2024-11-28 09:16:25,349][00195] Avg episode reward: 22.907, avg true_objective: 9.573 [2024-11-28 09:16:25,394][00195] Num frames 2900... [2024-11-28 09:16:25,532][00195] Num frames 3000... [2024-11-28 09:16:25,672][00195] Num frames 3100... [2024-11-28 09:16:25,814][00195] Num frames 3200... [2024-11-28 09:16:25,953][00195] Num frames 3300... [2024-11-28 09:16:26,102][00195] Num frames 3400... [2024-11-28 09:16:26,244][00195] Num frames 3500... [2024-11-28 09:16:26,318][00195] Avg episode rewards: #0: 20.030, true rewards: #0: 8.780 [2024-11-28 09:16:26,319][00195] Avg episode reward: 20.030, avg true_objective: 8.780 [2024-11-28 09:16:26,441][00195] Num frames 3600... [2024-11-28 09:16:26,571][00195] Num frames 3700... [2024-11-28 09:16:26,706][00195] Num frames 3800... [2024-11-28 09:16:26,851][00195] Num frames 3900... [2024-11-28 09:16:27,026][00195] Num frames 4000... [2024-11-28 09:16:27,172][00195] Avg episode rewards: #0: 17.512, true rewards: #0: 8.112 [2024-11-28 09:16:27,174][00195] Avg episode reward: 17.512, avg true_objective: 8.112 [2024-11-28 09:16:27,239][00195] Num frames 4100... [2024-11-28 09:16:27,371][00195] Num frames 4200... [2024-11-28 09:16:27,509][00195] Num frames 4300... [2024-11-28 09:16:27,676][00195] Num frames 4400... [2024-11-28 09:16:27,830][00195] Num frames 4500... [2024-11-28 09:16:27,975][00195] Num frames 4600... [2024-11-28 09:16:28,128][00195] Num frames 4700... [2024-11-28 09:16:28,270][00195] Num frames 4800... 
[2024-11-28 09:16:28,412][00195] Num frames 4900... [2024-11-28 09:16:28,557][00195] Num frames 5000... [2024-11-28 09:16:28,705][00195] Num frames 5100... [2024-11-28 09:16:28,855][00195] Num frames 5200... [2024-11-28 09:16:28,997][00195] Num frames 5300... [2024-11-28 09:16:29,148][00195] Num frames 5400... [2024-11-28 09:16:29,289][00195] Num frames 5500... [2024-11-28 09:16:29,435][00195] Num frames 5600... [2024-11-28 09:16:29,560][00195] Avg episode rewards: #0: 20.745, true rewards: #0: 9.412 [2024-11-28 09:16:29,562][00195] Avg episode reward: 20.745, avg true_objective: 9.412 [2024-11-28 09:16:29,638][00195] Num frames 5700... [2024-11-28 09:16:29,781][00195] Num frames 5800... [2024-11-28 09:16:29,936][00195] Num frames 5900... [2024-11-28 09:16:30,075][00195] Num frames 6000... [2024-11-28 09:16:30,225][00195] Num frames 6100... [2024-11-28 09:16:30,363][00195] Num frames 6200... [2024-11-28 09:16:30,536][00195] Avg episode rewards: #0: 19.696, true rewards: #0: 8.981 [2024-11-28 09:16:30,538][00195] Avg episode reward: 19.696, avg true_objective: 8.981 [2024-11-28 09:16:30,563][00195] Num frames 6300... [2024-11-28 09:16:30,704][00195] Num frames 6400... [2024-11-28 09:16:30,857][00195] Num frames 6500... [2024-11-28 09:16:30,997][00195] Num frames 6600... [2024-11-28 09:16:31,110][00195] Avg episode rewards: #0: 17.799, true rewards: #0: 8.299 [2024-11-28 09:16:31,112][00195] Avg episode reward: 17.799, avg true_objective: 8.299 [2024-11-28 09:16:31,248][00195] Num frames 6700... [2024-11-28 09:16:31,451][00195] Num frames 6800... [2024-11-28 09:16:31,650][00195] Num frames 6900... [2024-11-28 09:16:31,864][00195] Num frames 7000... [2024-11-28 09:16:32,067][00195] Num frames 7100... [2024-11-28 09:16:32,281][00195] Num frames 7200... [2024-11-28 09:16:32,473][00195] Num frames 7300... [2024-11-28 09:16:32,657][00195] Num frames 7400... [2024-11-28 09:16:32,863][00195] Num frames 7500... [2024-11-28 09:16:33,072][00195] Num frames 7600... 
[2024-11-28 09:16:33,260][00195] Num frames 7700... [2024-11-28 09:16:33,470][00195] Num frames 7800... [2024-11-28 09:16:33,681][00195] Num frames 7900... [2024-11-28 09:16:33,895][00195] Num frames 8000... [2024-11-28 09:16:34,102][00195] Num frames 8100... [2024-11-28 09:16:34,243][00195] Num frames 8200... [2024-11-28 09:16:34,385][00195] Num frames 8300... [2024-11-28 09:16:34,530][00195] Num frames 8400... [2024-11-28 09:16:34,670][00195] Num frames 8500... [2024-11-28 09:16:34,815][00195] Num frames 8600... [2024-11-28 09:16:34,906][00195] Avg episode rewards: #0: 21.248, true rewards: #0: 9.581 [2024-11-28 09:16:34,907][00195] Avg episode reward: 21.248, avg true_objective: 9.581 [2024-11-28 09:16:35,018][00195] Num frames 8700... [2024-11-28 09:16:35,156][00195] Num frames 8800... [2024-11-28 09:16:35,293][00195] Num frames 8900... [2024-11-28 09:16:35,442][00195] Num frames 9000... [2024-11-28 09:16:35,582][00195] Num frames 9100... [2024-11-28 09:16:35,644][00195] Avg episode rewards: #0: 19.703, true rewards: #0: 9.103 [2024-11-28 09:16:35,646][00195] Avg episode reward: 19.703, avg true_objective: 9.103 [2024-11-28 09:17:36,546][00195] Replay video saved to /content/train_dir/default_experiment/replay.mp4!