[2024-09-06 07:54:51,174][01070] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-06 07:54:51,178][01070] Rollout worker 0 uses device cpu
[2024-09-06 07:54:51,179][01070] Rollout worker 1 uses device cpu
[2024-09-06 07:54:51,182][01070] Rollout worker 2 uses device cpu
[2024-09-06 07:54:51,183][01070] Rollout worker 3 uses device cpu
[2024-09-06 07:54:51,184][01070] Rollout worker 4 uses device cpu
[2024-09-06 07:54:51,185][01070] Rollout worker 5 uses device cpu
[2024-09-06 07:54:51,186][01070] Rollout worker 6 uses device cpu
[2024-09-06 07:54:51,187][01070] Rollout worker 7 uses device cpu
[2024-09-06 07:54:51,345][01070] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:54:51,348][01070] InferenceWorker_p0-w0: min num requests: 2
[2024-09-06 07:54:51,381][01070] Starting all processes...
[2024-09-06 07:54:51,382][01070] Starting process learner_proc0
[2024-09-06 07:54:52,097][01070] Starting all processes...
[2024-09-06 07:54:52,107][01070] Starting process inference_proc0-0
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc0
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc1
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc2
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc3
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc4
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc5
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc6
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc7
[2024-09-06 07:55:08,667][06068] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:55:08,668][06068] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-06 07:55:08,754][06069] Worker 0 uses CPU cores [0]
[2024-09-06 07:55:08,849][06068] Num visible devices: 1
[2024-09-06 07:55:08,857][06055] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:55:08,862][06055] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-06 07:55:08,865][06072] Worker 3 uses CPU cores [1]
[2024-09-06 07:55:08,930][06073] Worker 4 uses CPU cores [0]
[2024-09-06 07:55:08,955][06055] Num visible devices: 1
[2024-09-06 07:55:08,995][06055] Starting seed is not provided
[2024-09-06 07:55:08,996][06055] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:55:08,997][06055] Initializing actor-critic model on device cuda:0
[2024-09-06 07:55:08,998][06055] RunningMeanStd input shape: (3, 72, 128)
[2024-09-06 07:55:09,001][06055] RunningMeanStd input shape: (1,)
[2024-09-06 07:55:09,045][06075] Worker 6 uses CPU cores [0]
[2024-09-06 07:55:09,054][06074] Worker 5 uses CPU cores [1]
[2024-09-06 07:55:09,100][06055] ConvEncoder: input_channels=3
[2024-09-06 07:55:09,170][06071] Worker 2 uses CPU cores [0]
[2024-09-06 07:55:09,188][06076] Worker 7 uses CPU cores [1]
[2024-09-06 07:55:09,206][06070] Worker 1 uses CPU cores [1]
[2024-09-06 07:55:09,434][06055] Conv encoder output size: 512
[2024-09-06 07:55:09,434][06055] Policy head output size: 512
[2024-09-06 07:55:09,504][06055] Created Actor Critic model with architecture:
[2024-09-06 07:55:09,505][06055] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-06 07:55:09,884][06055] Using optimizer
[2024-09-06 07:55:10,611][06055] No checkpoints found
[2024-09-06 07:55:10,612][06055] Did not load from checkpoint, starting from scratch!
[2024-09-06 07:55:10,612][06055] Initialized policy 0 weights for model version 0
[2024-09-06 07:55:10,618][06055] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:55:10,624][06055] LearnerWorker_p0 finished initialization!
[2024-09-06 07:55:10,711][06068] RunningMeanStd input shape: (3, 72, 128)
[2024-09-06 07:55:10,713][06068] RunningMeanStd input shape: (1,)
[2024-09-06 07:55:10,725][06068] ConvEncoder: input_channels=3
[2024-09-06 07:55:10,825][06068] Conv encoder output size: 512
[2024-09-06 07:55:10,826][06068] Policy head output size: 512
[2024-09-06 07:55:10,875][01070] Inference worker 0-0 is ready!
[2024-09-06 07:55:10,877][01070] All inference workers are ready! Signal rollout workers to start!
[2024-09-06 07:55:11,084][06071] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 07:55:11,088][06069] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 07:55:11,090][06075] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 07:55:11,099][06076] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 07:55:11,093][06073] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 07:55:11,101][06074] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 07:55:11,097][06072] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 07:55:11,109][06070] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 07:55:11,317][01070] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 07:55:11,338][01070] Heartbeat connected on Batcher_0 [2024-09-06 07:55:11,342][01070] Heartbeat connected on LearnerWorker_p0 [2024-09-06 07:55:11,375][01070] Heartbeat connected on InferenceWorker_p0-w0 [2024-09-06 07:55:12,668][06071] Decorrelating experience for 0 frames... [2024-09-06 07:55:12,666][06069] Decorrelating experience for 0 frames... [2024-09-06 07:55:12,669][06075] Decorrelating experience for 0 frames... [2024-09-06 07:55:12,943][06076] Decorrelating experience for 0 frames... [2024-09-06 07:55:12,952][06072] Decorrelating experience for 0 frames... [2024-09-06 07:55:12,955][06074] Decorrelating experience for 0 frames... [2024-09-06 07:55:12,961][06070] Decorrelating experience for 0 frames... [2024-09-06 07:55:13,724][06076] Decorrelating experience for 32 frames... [2024-09-06 07:55:13,727][06074] Decorrelating experience for 32 frames... [2024-09-06 07:55:14,072][06071] Decorrelating experience for 32 frames... [2024-09-06 07:55:14,074][06069] Decorrelating experience for 32 frames... [2024-09-06 07:55:14,077][06075] Decorrelating experience for 32 frames... 
[2024-09-06 07:55:14,540][06073] Decorrelating experience for 0 frames... [2024-09-06 07:55:14,891][06070] Decorrelating experience for 32 frames... [2024-09-06 07:55:15,287][06074] Decorrelating experience for 64 frames... [2024-09-06 07:55:15,383][06072] Decorrelating experience for 32 frames... [2024-09-06 07:55:15,566][06071] Decorrelating experience for 64 frames... [2024-09-06 07:55:15,585][06069] Decorrelating experience for 64 frames... [2024-09-06 07:55:16,059][06075] Decorrelating experience for 64 frames... [2024-09-06 07:55:16,321][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 07:55:16,546][06071] Decorrelating experience for 96 frames... [2024-09-06 07:55:16,581][06070] Decorrelating experience for 64 frames... [2024-09-06 07:55:16,677][06076] Decorrelating experience for 64 frames... [2024-09-06 07:55:16,724][06074] Decorrelating experience for 96 frames... [2024-09-06 07:55:16,739][01070] Heartbeat connected on RolloutWorker_w2 [2024-09-06 07:55:16,988][01070] Heartbeat connected on RolloutWorker_w5 [2024-09-06 07:55:17,186][06075] Decorrelating experience for 96 frames... [2024-09-06 07:55:17,328][01070] Heartbeat connected on RolloutWorker_w6 [2024-09-06 07:55:17,605][06069] Decorrelating experience for 96 frames... [2024-09-06 07:55:17,810][01070] Heartbeat connected on RolloutWorker_w0 [2024-09-06 07:55:17,989][06072] Decorrelating experience for 64 frames... [2024-09-06 07:55:17,992][06070] Decorrelating experience for 96 frames... [2024-09-06 07:55:18,111][06076] Decorrelating experience for 96 frames... [2024-09-06 07:55:18,212][01070] Heartbeat connected on RolloutWorker_w1 [2024-09-06 07:55:18,296][01070] Heartbeat connected on RolloutWorker_w7 [2024-09-06 07:55:18,667][06073] Decorrelating experience for 32 frames... [2024-09-06 07:55:18,736][06072] Decorrelating experience for 96 frames... 
[2024-09-06 07:55:18,820][01070] Heartbeat connected on RolloutWorker_w3 [2024-09-06 07:55:20,114][06073] Decorrelating experience for 64 frames... [2024-09-06 07:55:21,318][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 91.4. Samples: 914. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 07:55:21,321][01070] Avg episode reward: [(0, '1.348')] [2024-09-06 07:55:22,967][06055] Signal inference workers to stop experience collection... [2024-09-06 07:55:22,981][06068] InferenceWorker_p0-w0: stopping experience collection [2024-09-06 07:55:23,292][06073] Decorrelating experience for 96 frames... [2024-09-06 07:55:23,475][01070] Heartbeat connected on RolloutWorker_w4 [2024-09-06 07:55:26,317][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 152.5. Samples: 2288. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 07:55:26,322][01070] Avg episode reward: [(0, '2.368')] [2024-09-06 07:55:26,933][06055] Signal inference workers to resume experience collection... [2024-09-06 07:55:26,935][06068] InferenceWorker_p0-w0: resuming experience collection [2024-09-06 07:55:31,317][01070] Fps is (10 sec: 2458.0, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 209.5. Samples: 4190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 3.0) [2024-09-06 07:55:31,322][01070] Avg episode reward: [(0, '3.545')] [2024-09-06 07:55:34,552][06068] Updated weights for policy 0, policy_version 10 (0.0173) [2024-09-06 07:55:36,317][01070] Fps is (10 sec: 4505.4, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 445.8. Samples: 11144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 07:55:36,323][01070] Avg episode reward: [(0, '4.241')] [2024-09-06 07:55:41,319][01070] Fps is (10 sec: 3276.0, 60 sec: 1911.3, 300 sec: 1911.3). Total num frames: 57344. Throughput: 0: 511.6. Samples: 15350. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 07:55:41,324][01070] Avg episode reward: [(0, '4.401')] [2024-09-06 07:55:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 500.7. Samples: 17526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:55:46,319][01070] Avg episode reward: [(0, '4.396')] [2024-09-06 07:55:47,170][06068] Updated weights for policy 0, policy_version 20 (0.0030) [2024-09-06 07:55:51,317][01070] Fps is (10 sec: 4097.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 607.7. Samples: 24308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 07:55:51,319][01070] Avg episode reward: [(0, '4.360')] [2024-09-06 07:55:56,317][01070] Fps is (10 sec: 4096.0, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 684.4. Samples: 30798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 07:55:56,321][01070] Avg episode reward: [(0, '4.546')] [2024-09-06 07:55:56,332][06055] Saving new best policy, reward=4.546! [2024-09-06 07:55:56,696][06068] Updated weights for policy 0, policy_version 30 (0.0020) [2024-09-06 07:56:01,317][01070] Fps is (10 sec: 3686.4, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 728.7. Samples: 32786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 07:56:01,321][01070] Avg episode reward: [(0, '4.463')] [2024-09-06 07:56:06,316][01070] Fps is (10 sec: 4096.2, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 840.3. Samples: 38726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 07:56:06,319][01070] Avg episode reward: [(0, '4.541')] [2024-09-06 07:56:07,255][06068] Updated weights for policy 0, policy_version 40 (0.0021) [2024-09-06 07:56:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 933.5. Samples: 44294. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 07:56:11,325][01070] Avg episode reward: [(0, '4.557')] [2024-09-06 07:56:11,327][06055] Saving new best policy, reward=4.557! [2024-09-06 07:56:16,317][01070] Fps is (10 sec: 2457.4, 60 sec: 3072.2, 300 sec: 2835.7). Total num frames: 184320. Throughput: 0: 930.2. Samples: 46048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:56:16,322][01070] Avg episode reward: [(0, '4.495')] [2024-09-06 07:56:21,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 859.7. Samples: 49832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 07:56:21,319][01070] Avg episode reward: [(0, '4.274')] [2024-09-06 07:56:21,856][06068] Updated weights for policy 0, policy_version 50 (0.0042) [2024-09-06 07:56:26,320][01070] Fps is (10 sec: 4094.9, 60 sec: 3754.4, 300 sec: 3003.6). Total num frames: 225280. Throughput: 0: 917.1. Samples: 56622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 07:56:26,326][01070] Avg episode reward: [(0, '4.304')] [2024-09-06 07:56:30,559][06068] Updated weights for policy 0, policy_version 60 (0.0031) [2024-09-06 07:56:31,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 946.4. Samples: 60114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 07:56:31,322][01070] Avg episode reward: [(0, '4.369')] [2024-09-06 07:56:36,317][01070] Fps is (10 sec: 3687.7, 60 sec: 3618.2, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 907.4. Samples: 65140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:56:36,319][01070] Avg episode reward: [(0, '4.407')] [2024-09-06 07:56:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3140.3). Total num frames: 282624. Throughput: 0: 891.6. Samples: 70920. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:56:41,318][01070] Avg episode reward: [(0, '4.396')] [2024-09-06 07:56:42,046][06068] Updated weights for policy 0, policy_version 70 (0.0046) [2024-09-06 07:56:46,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3233.7). Total num frames: 307200. Throughput: 0: 924.6. Samples: 74394. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 07:56:46,323][01070] Avg episode reward: [(0, '4.273')] [2024-09-06 07:56:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth... [2024-09-06 07:56:51,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 923.6. Samples: 80288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 07:56:51,321][01070] Avg episode reward: [(0, '4.403')] [2024-09-06 07:56:53,243][06068] Updated weights for policy 0, policy_version 80 (0.0022) [2024-09-06 07:56:56,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3198.8). Total num frames: 335872. Throughput: 0: 904.1. Samples: 84980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:56:56,321][01070] Avg episode reward: [(0, '4.573')] [2024-09-06 07:56:56,329][06055] Saving new best policy, reward=4.573! [2024-09-06 07:57:01,316][01070] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 940.5. Samples: 88368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 07:57:01,325][01070] Avg episode reward: [(0, '4.464')] [2024-09-06 07:57:02,726][06068] Updated weights for policy 0, policy_version 90 (0.0031) [2024-09-06 07:57:06,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 1012.4. Samples: 95392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 07:57:06,319][01070] Avg episode reward: [(0, '4.686')] [2024-09-06 07:57:06,327][06055] Saving new best policy, reward=4.686! 
[2024-09-06 07:57:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 393216. Throughput: 0: 953.2. Samples: 99512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 07:57:11,320][01070] Avg episode reward: [(0, '4.739')] [2024-09-06 07:57:11,330][06055] Saving new best policy, reward=4.739! [2024-09-06 07:57:14,413][06068] Updated weights for policy 0, policy_version 100 (0.0021) [2024-09-06 07:57:16,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 941.3. Samples: 102472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 07:57:16,324][01070] Avg episode reward: [(0, '4.631')] [2024-09-06 07:57:21,317][01070] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3402.8). Total num frames: 442368. Throughput: 0: 983.4. Samples: 109392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 07:57:21,324][01070] Avg episode reward: [(0, '4.517')] [2024-09-06 07:57:23,713][06068] Updated weights for policy 0, policy_version 110 (0.0022) [2024-09-06 07:57:26,318][01070] Fps is (10 sec: 3685.8, 60 sec: 3823.1, 300 sec: 3367.8). Total num frames: 454656. Throughput: 0: 971.4. Samples: 114634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 07:57:26,321][01070] Avg episode reward: [(0, '4.538')] [2024-09-06 07:57:31,316][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 942.6. Samples: 116810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:57:31,323][01070] Avg episode reward: [(0, '4.571')] [2024-09-06 07:57:34,630][06068] Updated weights for policy 0, policy_version 120 (0.0029) [2024-09-06 07:57:36,317][01070] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3418.0). Total num frames: 495616. Throughput: 0: 962.0. Samples: 123580. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 07:57:36,324][01070] Avg episode reward: [(0, '4.726')] [2024-09-06 07:57:41,317][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3440.6). Total num frames: 516096. Throughput: 0: 999.0. Samples: 129936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 07:57:41,323][01070] Avg episode reward: [(0, '4.768')] [2024-09-06 07:57:41,328][06055] Saving new best policy, reward=4.768! [2024-09-06 07:57:46,181][06068] Updated weights for policy 0, policy_version 130 (0.0043) [2024-09-06 07:57:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3435.4). Total num frames: 532480. Throughput: 0: 968.0. Samples: 131926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 07:57:46,324][01070] Avg episode reward: [(0, '4.814')] [2024-09-06 07:57:46,339][06055] Saving new best policy, reward=4.814! [2024-09-06 07:57:51,316][01070] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3456.0). Total num frames: 552960. Throughput: 0: 937.5. Samples: 137578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 07:57:51,319][01070] Avg episode reward: [(0, '4.845')] [2024-09-06 07:57:51,323][06055] Saving new best policy, reward=4.845! [2024-09-06 07:57:55,369][06068] Updated weights for policy 0, policy_version 140 (0.0029) [2024-09-06 07:57:56,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3500.2). Total num frames: 577536. Throughput: 0: 1000.7. Samples: 144544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:57:56,319][01070] Avg episode reward: [(0, '4.741')] [2024-09-06 07:58:01,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3469.6). Total num frames: 589824. Throughput: 0: 990.9. Samples: 147064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 07:58:01,322][01070] Avg episode reward: [(0, '4.681')] [2024-09-06 07:58:06,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3487.5). Total num frames: 610304. Throughput: 0: 940.4. 
Samples: 151710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 07:58:06,321][01070] Avg episode reward: [(0, '4.354')] [2024-09-06 07:58:06,904][06068] Updated weights for policy 0, policy_version 150 (0.0034) [2024-09-06 07:58:11,317][01070] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3504.4). Total num frames: 630784. Throughput: 0: 978.3. Samples: 158654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 07:58:11,319][01070] Avg episode reward: [(0, '4.484')] [2024-09-06 07:58:16,317][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3520.3). Total num frames: 651264. Throughput: 0: 1007.8. Samples: 162160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 07:58:16,322][01070] Avg episode reward: [(0, '4.499')] [2024-09-06 07:58:17,037][06068] Updated weights for policy 0, policy_version 160 (0.0034) [2024-09-06 07:58:21,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3492.4). Total num frames: 663552. Throughput: 0: 953.2. Samples: 166476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:58:21,322][01070] Avg episode reward: [(0, '4.346')] [2024-09-06 07:58:26,317][01070] Fps is (10 sec: 3686.5, 60 sec: 3891.3, 300 sec: 3528.9). Total num frames: 688128. Throughput: 0: 952.7. Samples: 172806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:58:26,319][01070] Avg episode reward: [(0, '4.378')] [2024-09-06 07:58:27,560][06068] Updated weights for policy 0, policy_version 170 (0.0040) [2024-09-06 07:58:31,317][01070] Fps is (10 sec: 4915.1, 60 sec: 3959.4, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 985.9. Samples: 176290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 07:58:31,324][01070] Avg episode reward: [(0, '4.546')] [2024-09-06 07:58:36,322][01070] Fps is (10 sec: 3684.3, 60 sec: 3822.6, 300 sec: 3536.4). Total num frames: 724992. Throughput: 0: 981.4. Samples: 181748. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:58:36,325][01070] Avg episode reward: [(0, '4.479')] [2024-09-06 07:58:39,036][06068] Updated weights for policy 0, policy_version 180 (0.0054) [2024-09-06 07:58:41,317][01070] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 745472. Throughput: 0: 945.0. Samples: 187070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 07:58:41,323][01070] Avg episode reward: [(0, '4.487')] [2024-09-06 07:58:46,317][01070] Fps is (10 sec: 4508.1, 60 sec: 3959.5, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 963.1. Samples: 190404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 07:58:46,319][01070] Avg episode reward: [(0, '4.599')] [2024-09-06 07:58:46,332][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth... [2024-09-06 07:58:48,051][06068] Updated weights for policy 0, policy_version 190 (0.0033) [2024-09-06 07:58:51,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3574.7). Total num frames: 786432. Throughput: 0: 1001.3. Samples: 196768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 07:58:51,319][01070] Avg episode reward: [(0, '4.716')] [2024-09-06 07:58:56,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3568.1). Total num frames: 802816. Throughput: 0: 945.9. Samples: 201218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:58:56,324][01070] Avg episode reward: [(0, '4.677')] [2024-09-06 07:58:59,542][06068] Updated weights for policy 0, policy_version 200 (0.0036) [2024-09-06 07:59:01,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3579.5). Total num frames: 823296. Throughput: 0: 944.4. Samples: 204656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:59:01,319][01070] Avg episode reward: [(0, '4.666')] [2024-09-06 07:59:06,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3608.0). Total num frames: 847872. 
Throughput: 0: 1001.9. Samples: 211562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 07:59:06,323][01070] Avg episode reward: [(0, '4.550')] [2024-09-06 07:59:10,208][06068] Updated weights for policy 0, policy_version 210 (0.0028) [2024-09-06 07:59:11,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3584.0). Total num frames: 860160. Throughput: 0: 960.0. Samples: 216004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 07:59:11,327][01070] Avg episode reward: [(0, '4.533')] [2024-09-06 07:59:16,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3594.4). Total num frames: 880640. Throughput: 0: 939.2. Samples: 218556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 07:59:16,323][01070] Avg episode reward: [(0, '4.805')] [2024-09-06 07:59:20,094][06068] Updated weights for policy 0, policy_version 220 (0.0030) [2024-09-06 07:59:21,316][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3620.9). Total num frames: 905216. Throughput: 0: 972.9. Samples: 225522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 07:59:21,319][01070] Avg episode reward: [(0, '4.579')] [2024-09-06 07:59:26,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3614.1). Total num frames: 921600. Throughput: 0: 980.2. Samples: 231180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:59:26,322][01070] Avg episode reward: [(0, '4.617')] [2024-09-06 07:59:31,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3607.6). Total num frames: 937984. Throughput: 0: 952.4. Samples: 233264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 07:59:31,321][01070] Avg episode reward: [(0, '4.698')] [2024-09-06 07:59:31,746][06068] Updated weights for policy 0, policy_version 230 (0.0037) [2024-09-06 07:59:36,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3632.3). Total num frames: 962560. Throughput: 0: 960.2. Samples: 239978. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:59:36,324][01070] Avg episode reward: [(0, '4.640')] [2024-09-06 07:59:40,641][06068] Updated weights for policy 0, policy_version 240 (0.0019) [2024-09-06 07:59:41,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3640.9). Total num frames: 983040. Throughput: 0: 1006.4. Samples: 246506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 07:59:41,319][01070] Avg episode reward: [(0, '4.662')] [2024-09-06 07:59:46,321][01070] Fps is (10 sec: 3684.9, 60 sec: 3822.7, 300 sec: 3634.2). Total num frames: 999424. Throughput: 0: 975.4. Samples: 248554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 07:59:46,324][01070] Avg episode reward: [(0, '4.517')] [2024-09-06 07:59:51,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3642.5). Total num frames: 1019904. Throughput: 0: 945.4. Samples: 254104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 07:59:51,319][01070] Avg episode reward: [(0, '4.585')] [2024-09-06 07:59:52,074][06068] Updated weights for policy 0, policy_version 250 (0.0044) [2024-09-06 07:59:56,317][01070] Fps is (10 sec: 3687.9, 60 sec: 3891.2, 300 sec: 3636.1). Total num frames: 1036288. Throughput: 0: 969.2. Samples: 259616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 07:59:56,319][01070] Avg episode reward: [(0, '4.734')] [2024-09-06 08:00:01,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3615.8). Total num frames: 1048576. Throughput: 0: 955.6. Samples: 261556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:00:01,321][01070] Avg episode reward: [(0, '4.762')] [2024-09-06 08:00:06,317][01070] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 888.8. Samples: 265516. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:00:06,318][01070] Avg episode reward: [(0, '4.686')] [2024-09-06 08:00:06,345][06068] Updated weights for policy 0, policy_version 260 (0.0046) [2024-09-06 08:00:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 911.2. Samples: 272186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:00:11,319][01070] Avg episode reward: [(0, '4.470')] [2024-09-06 08:00:15,254][06068] Updated weights for policy 0, policy_version 270 (0.0039) [2024-09-06 08:00:16,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 939.1. Samples: 275524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:00:16,318][01070] Avg episode reward: [(0, '4.535')] [2024-09-06 08:00:21,328][01070] Fps is (10 sec: 3682.2, 60 sec: 3617.4, 300 sec: 3804.3). Total num frames: 1122304. Throughput: 0: 901.1. Samples: 280540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:00:21,339][01070] Avg episode reward: [(0, '4.759')] [2024-09-06 08:00:26,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1142784. Throughput: 0: 885.6. Samples: 286356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:00:26,321][01070] Avg episode reward: [(0, '4.852')] [2024-09-06 08:00:26,332][06055] Saving new best policy, reward=4.852! [2024-09-06 08:00:26,855][06068] Updated weights for policy 0, policy_version 280 (0.0043) [2024-09-06 08:00:31,317][01070] Fps is (10 sec: 4510.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1167360. Throughput: 0: 914.9. Samples: 289722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:00:31,318][01070] Avg episode reward: [(0, '4.581')] [2024-09-06 08:00:36,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1183744. Throughput: 0: 929.6. Samples: 295936. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:00:36,323][01070] Avg episode reward: [(0, '4.600')] [2024-09-06 08:00:37,327][06068] Updated weights for policy 0, policy_version 290 (0.0022) [2024-09-06 08:00:41,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1200128. Throughput: 0: 911.8. Samples: 300646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:00:41,323][01070] Avg episode reward: [(0, '4.564')] [2024-09-06 08:00:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3804.4). Total num frames: 1220608. Throughput: 0: 943.5. Samples: 304012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:00:46,321][01070] Avg episode reward: [(0, '4.603')] [2024-09-06 08:00:46,372][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth... [2024-09-06 08:00:46,507][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth [2024-09-06 08:00:47,243][06068] Updated weights for policy 0, policy_version 300 (0.0033) [2024-09-06 08:00:51,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1241088. Throughput: 0: 1005.5. Samples: 310764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:00:51,321][01070] Avg episode reward: [(0, '4.905')] [2024-09-06 08:00:51,344][06055] Saving new best policy, reward=4.905! [2024-09-06 08:00:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1257472. Throughput: 0: 950.9. Samples: 314976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:00:56,324][01070] Avg episode reward: [(0, '4.923')] [2024-09-06 08:00:56,342][06055] Saving new best policy, reward=4.923! [2024-09-06 08:00:58,832][06068] Updated weights for policy 0, policy_version 310 (0.0029) [2024-09-06 08:01:01,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1277952. 
Throughput: 0: 942.3. Samples: 317926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:01:01,322][01070] Avg episode reward: [(0, '4.709')] [2024-09-06 08:01:06,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 1302528. Throughput: 0: 984.8. Samples: 324844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:01:06,319][01070] Avg episode reward: [(0, '4.749')] [2024-09-06 08:01:08,155][06068] Updated weights for policy 0, policy_version 320 (0.0016) [2024-09-06 08:01:11,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1318912. Throughput: 0: 972.0. Samples: 330098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:01:11,320][01070] Avg episode reward: [(0, '4.757')] [2024-09-06 08:01:16,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1335296. Throughput: 0: 944.2. Samples: 332210. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 08:01:16,324][01070] Avg episode reward: [(0, '4.768')] [2024-09-06 08:01:19,327][06068] Updated weights for policy 0, policy_version 330 (0.0026) [2024-09-06 08:01:21,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3960.2, 300 sec: 3846.1). Total num frames: 1359872. Throughput: 0: 959.8. Samples: 339126. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:01:21,320][01070] Avg episode reward: [(0, '4.852')] [2024-09-06 08:01:26,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1380352. Throughput: 0: 994.4. Samples: 345396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:01:26,321][01070] Avg episode reward: [(0, '4.673')] [2024-09-06 08:01:30,374][06068] Updated weights for policy 0, policy_version 340 (0.0044) [2024-09-06 08:01:31,316][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1392640. Throughput: 0: 965.5. Samples: 347460. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 08:01:31,320][01070] Avg episode reward: [(0, '4.731')] [2024-09-06 08:01:36,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1417216. Throughput: 0: 949.2. Samples: 353480. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 08:01:36,318][01070] Avg episode reward: [(0, '4.856')] [2024-09-06 08:01:39,566][06068] Updated weights for policy 0, policy_version 350 (0.0025) [2024-09-06 08:01:41,316][01070] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 1441792. Throughput: 0: 1011.6. Samples: 360496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:01:41,323][01070] Avg episode reward: [(0, '4.687')] [2024-09-06 08:01:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1454080. Throughput: 0: 996.0. Samples: 362748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:01:46,323][01070] Avg episode reward: [(0, '4.580')] [2024-09-06 08:01:51,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1470464. Throughput: 0: 945.5. Samples: 367392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 08:01:51,321][01070] Avg episode reward: [(0, '4.688')] [2024-09-06 08:01:51,501][06068] Updated weights for policy 0, policy_version 360 (0.0037) [2024-09-06 08:01:56,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1495040. Throughput: 0: 984.5. Samples: 374400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:01:56,324][01070] Avg episode reward: [(0, '4.809')] [2024-09-06 08:02:01,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1511424. Throughput: 0: 1012.0. Samples: 377750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:02:01,325][01070] Avg episode reward: [(0, '5.081')] [2024-09-06 08:02:01,334][06055] Saving new best policy, reward=5.081! 
[2024-09-06 08:02:01,346][06068] Updated weights for policy 0, policy_version 370 (0.0044) [2024-09-06 08:02:06,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1527808. Throughput: 0: 953.3. Samples: 382024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:02:06,324][01070] Avg episode reward: [(0, '5.109')] [2024-09-06 08:02:06,335][06055] Saving new best policy, reward=5.109! [2024-09-06 08:02:11,317][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1552384. Throughput: 0: 958.0. Samples: 388506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:02:11,327][01070] Avg episode reward: [(0, '4.909')] [2024-09-06 08:02:12,040][06068] Updated weights for policy 0, policy_version 380 (0.0043) [2024-09-06 08:02:16,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1572864. Throughput: 0: 989.5. Samples: 391988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:02:16,319][01070] Avg episode reward: [(0, '4.791')] [2024-09-06 08:02:21,316][01070] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1589248. Throughput: 0: 966.2. Samples: 396960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:02:21,321][01070] Avg episode reward: [(0, '4.771')] [2024-09-06 08:02:23,656][06068] Updated weights for policy 0, policy_version 390 (0.0062) [2024-09-06 08:02:26,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1609728. Throughput: 0: 936.7. Samples: 402648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:02:26,322][01070] Avg episode reward: [(0, '4.651')] [2024-09-06 08:02:31,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1630208. Throughput: 0: 964.6. Samples: 406156. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:02:31,322][01070] Avg episode reward: [(0, '4.673')] [2024-09-06 08:02:32,348][06068] Updated weights for policy 0, policy_version 400 (0.0035) [2024-09-06 08:02:36,317][01070] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1650688. Throughput: 0: 999.3. Samples: 412360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:02:36,318][01070] Avg episode reward: [(0, '4.851')] [2024-09-06 08:02:41,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1662976. Throughput: 0: 946.6. Samples: 416998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:02:41,322][01070] Avg episode reward: [(0, '5.188')] [2024-09-06 08:02:41,329][06055] Saving new best policy, reward=5.188! [2024-09-06 08:02:43,939][06068] Updated weights for policy 0, policy_version 410 (0.0044) [2024-09-06 08:02:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1687552. Throughput: 0: 948.0. Samples: 420410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:02:46,322][01070] Avg episode reward: [(0, '5.138')] [2024-09-06 08:02:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth... [2024-09-06 08:02:46,505][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth [2024-09-06 08:02:51,318][01070] Fps is (10 sec: 4504.9, 60 sec: 3959.3, 300 sec: 3832.2). Total num frames: 1708032. Throughput: 0: 1003.2. Samples: 427168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:02:51,321][01070] Avg episode reward: [(0, '4.904')] [2024-09-06 08:02:54,708][06068] Updated weights for policy 0, policy_version 420 (0.0014) [2024-09-06 08:02:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1724416. Throughput: 0: 952.8. Samples: 431380. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:02:56,319][01070] Avg episode reward: [(0, '4.841')] [2024-09-06 08:03:01,317][01070] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1744896. Throughput: 0: 938.4. Samples: 434216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:03:01,319][01070] Avg episode reward: [(0, '4.777')] [2024-09-06 08:03:04,504][06068] Updated weights for policy 0, policy_version 430 (0.0014) [2024-09-06 08:03:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1765376. Throughput: 0: 983.9. Samples: 441236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:03:06,323][01070] Avg episode reward: [(0, '4.848')] [2024-09-06 08:03:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 1781760. Throughput: 0: 978.1. Samples: 446662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:03:11,322][01070] Avg episode reward: [(0, '4.875')] [2024-09-06 08:03:16,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 1798144. Throughput: 0: 947.8. Samples: 448806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:03:16,319][01070] Avg episode reward: [(0, '5.049')] [2024-09-06 08:03:16,335][06068] Updated weights for policy 0, policy_version 440 (0.0023) [2024-09-06 08:03:21,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1822720. Throughput: 0: 954.9. Samples: 455330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:03:21,320][01070] Avg episode reward: [(0, '5.155')] [2024-09-06 08:03:25,412][06068] Updated weights for policy 0, policy_version 450 (0.0032) [2024-09-06 08:03:26,317][01070] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1843200. Throughput: 0: 993.1. Samples: 461688. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:03:26,318][01070] Avg episode reward: [(0, '5.165')] [2024-09-06 08:03:31,322][01070] Fps is (10 sec: 3684.2, 60 sec: 3822.6, 300 sec: 3846.1). Total num frames: 1859584. Throughput: 0: 961.7. Samples: 463694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:03:31,325][01070] Avg episode reward: [(0, '5.015')] [2024-09-06 08:03:36,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1880064. Throughput: 0: 941.8. Samples: 469546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:03:36,319][01070] Avg episode reward: [(0, '4.801')] [2024-09-06 08:03:36,683][06068] Updated weights for policy 0, policy_version 460 (0.0046) [2024-09-06 08:03:41,317][01070] Fps is (10 sec: 4098.3, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1900544. Throughput: 0: 995.4. Samples: 476172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:03:41,324][01070] Avg episode reward: [(0, '5.126')] [2024-09-06 08:03:46,323][01070] Fps is (10 sec: 3274.6, 60 sec: 3754.2, 300 sec: 3818.2). Total num frames: 1912832. Throughput: 0: 972.6. Samples: 477988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:03:46,326][01070] Avg episode reward: [(0, '5.144')] [2024-09-06 08:03:50,595][06068] Updated weights for policy 0, policy_version 470 (0.0043) [2024-09-06 08:03:51,317][01070] Fps is (10 sec: 2457.7, 60 sec: 3618.2, 300 sec: 3804.4). Total num frames: 1925120. Throughput: 0: 891.7. Samples: 481364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:03:51,320][01070] Avg episode reward: [(0, '5.180')] [2024-09-06 08:03:56,317][01070] Fps is (10 sec: 3688.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1949696. Throughput: 0: 907.3. Samples: 487490. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:03:56,320][01070] Avg episode reward: [(0, '4.899')] [2024-09-06 08:03:59,818][06068] Updated weights for policy 0, policy_version 480 (0.0031) [2024-09-06 08:04:01,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1970176. Throughput: 0: 936.2. Samples: 490936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:04:01,319][01070] Avg episode reward: [(0, '4.708')] [2024-09-06 08:04:06,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1986560. Throughput: 0: 912.2. Samples: 496378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:04:06,319][01070] Avg episode reward: [(0, '4.858')] [2024-09-06 08:04:11,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2002944. Throughput: 0: 889.3. Samples: 501706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:04:11,323][01070] Avg episode reward: [(0, '5.187')] [2024-09-06 08:04:11,461][06068] Updated weights for policy 0, policy_version 490 (0.0044) [2024-09-06 08:04:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 2027520. Throughput: 0: 922.6. Samples: 505206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:04:16,321][01070] Avg episode reward: [(0, '5.216')] [2024-09-06 08:04:16,333][06055] Saving new best policy, reward=5.216! [2024-09-06 08:04:21,319][01070] Fps is (10 sec: 4094.9, 60 sec: 3686.2, 300 sec: 3804.4). Total num frames: 2043904. Throughput: 0: 931.9. Samples: 511486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:04:21,322][01070] Avg episode reward: [(0, '5.013')] [2024-09-06 08:04:21,394][06068] Updated weights for policy 0, policy_version 500 (0.0030) [2024-09-06 08:04:26,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2060288. Throughput: 0: 878.6. Samples: 515708. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:04:26,324][01070] Avg episode reward: [(0, '4.618')] [2024-09-06 08:04:31,317][01070] Fps is (10 sec: 4097.1, 60 sec: 3755.0, 300 sec: 3804.4). Total num frames: 2084864. Throughput: 0: 916.1. Samples: 519206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:04:31,319][01070] Avg episode reward: [(0, '4.710')] [2024-09-06 08:04:32,059][06068] Updated weights for policy 0, policy_version 510 (0.0024) [2024-09-06 08:04:36,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2105344. Throughput: 0: 995.1. Samples: 526144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:04:36,319][01070] Avg episode reward: [(0, '5.116')] [2024-09-06 08:04:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.5). Total num frames: 2121728. Throughput: 0: 964.6. Samples: 530896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:04:41,319][01070] Avg episode reward: [(0, '5.008')] [2024-09-06 08:04:43,583][06068] Updated weights for policy 0, policy_version 520 (0.0023) [2024-09-06 08:04:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.4, 300 sec: 3804.4). Total num frames: 2142208. Throughput: 0: 945.4. Samples: 533480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:04:46,320][01070] Avg episode reward: [(0, '5.078')] [2024-09-06 08:04:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000523_2142208.pth... [2024-09-06 08:04:46,489][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth [2024-09-06 08:04:51,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2162688. Throughput: 0: 975.2. Samples: 540264. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:04:51,321][01070] Avg episode reward: [(0, '5.193')] [2024-09-06 08:04:52,623][06068] Updated weights for policy 0, policy_version 530 (0.0036) [2024-09-06 08:04:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2179072. Throughput: 0: 981.4. Samples: 545870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:04:56,323][01070] Avg episode reward: [(0, '4.939')] [2024-09-06 08:05:01,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2195456. Throughput: 0: 949.8. Samples: 547946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:05:01,323][01070] Avg episode reward: [(0, '5.156')] [2024-09-06 08:05:04,000][06068] Updated weights for policy 0, policy_version 540 (0.0028) [2024-09-06 08:05:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2220032. Throughput: 0: 959.5. Samples: 554662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:05:06,324][01070] Avg episode reward: [(0, '5.088')] [2024-09-06 08:05:11,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2240512. Throughput: 0: 1010.8. Samples: 561196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:05:11,320][01070] Avg episode reward: [(0, '4.885')] [2024-09-06 08:05:14,445][06068] Updated weights for policy 0, policy_version 550 (0.0037) [2024-09-06 08:05:16,318][01070] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3846.2). Total num frames: 2256896. Throughput: 0: 978.9. Samples: 563260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:05:16,321][01070] Avg episode reward: [(0, '5.189')] [2024-09-06 08:05:21,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3891.4, 300 sec: 3846.1). Total num frames: 2277376. Throughput: 0: 948.0. Samples: 568804. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:05:21,319][01070] Avg episode reward: [(0, '5.640')] [2024-09-06 08:05:21,324][06055] Saving new best policy, reward=5.640! [2024-09-06 08:05:24,437][06068] Updated weights for policy 0, policy_version 560 (0.0040) [2024-09-06 08:05:26,317][01070] Fps is (10 sec: 4506.5, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2301952. Throughput: 0: 998.1. Samples: 575810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:05:26,321][01070] Avg episode reward: [(0, '5.759')] [2024-09-06 08:05:26,331][06055] Saving new best policy, reward=5.759! [2024-09-06 08:05:31,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2318336. Throughput: 0: 999.7. Samples: 578466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:05:31,322][01070] Avg episode reward: [(0, '5.847')] [2024-09-06 08:05:31,326][06055] Saving new best policy, reward=5.847! [2024-09-06 08:05:35,920][06068] Updated weights for policy 0, policy_version 570 (0.0024) [2024-09-06 08:05:36,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2334720. Throughput: 0: 952.0. Samples: 583102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:05:36,319][01070] Avg episode reward: [(0, '5.772')] [2024-09-06 08:05:41,317][01070] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2359296. Throughput: 0: 984.4. Samples: 590166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:05:41,319][01070] Avg episode reward: [(0, '5.772')] [2024-09-06 08:05:44,966][06068] Updated weights for policy 0, policy_version 580 (0.0031) [2024-09-06 08:05:46,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2375680. Throughput: 0: 1015.8. Samples: 593656. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:05:46,326][01070] Avg episode reward: [(0, '5.835')] [2024-09-06 08:05:51,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2392064. Throughput: 0: 961.4. Samples: 597926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:05:51,325][01070] Avg episode reward: [(0, '5.707')] [2024-09-06 08:05:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2412544. Throughput: 0: 956.4. Samples: 604232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:05:56,325][01070] Avg episode reward: [(0, '5.539')] [2024-09-06 08:05:56,404][06068] Updated weights for policy 0, policy_version 590 (0.0035) [2024-09-06 08:06:01,317][01070] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2437120. Throughput: 0: 987.2. Samples: 607680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:06:01,321][01070] Avg episode reward: [(0, '5.823')] [2024-09-06 08:06:06,317][01070] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2453504. Throughput: 0: 982.3. Samples: 613010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:06:06,323][01070] Avg episode reward: [(0, '5.596')] [2024-09-06 08:06:07,575][06068] Updated weights for policy 0, policy_version 600 (0.0034) [2024-09-06 08:06:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2469888. Throughput: 0: 945.9. Samples: 618374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:06:11,324][01070] Avg episode reward: [(0, '5.336')] [2024-09-06 08:06:16,317][01070] Fps is (10 sec: 4096.2, 60 sec: 3959.6, 300 sec: 3846.1). Total num frames: 2494464. Throughput: 0: 964.4. Samples: 621862. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:06:16,324][01070] Avg episode reward: [(0, '5.749')] [2024-09-06 08:06:16,804][06068] Updated weights for policy 0, policy_version 610 (0.0036) [2024-09-06 08:06:21,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2510848. Throughput: 0: 1000.3. Samples: 628114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:06:21,322][01070] Avg episode reward: [(0, '6.247')] [2024-09-06 08:06:21,328][06055] Saving new best policy, reward=6.247! [2024-09-06 08:06:26,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2527232. Throughput: 0: 943.9. Samples: 632640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:06:26,324][01070] Avg episode reward: [(0, '6.326')] [2024-09-06 08:06:26,335][06055] Saving new best policy, reward=6.326! [2024-09-06 08:06:28,439][06068] Updated weights for policy 0, policy_version 620 (0.0026) [2024-09-06 08:06:31,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2551808. Throughput: 0: 943.6. Samples: 636116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:06:31,323][01070] Avg episode reward: [(0, '5.950')] [2024-09-06 08:06:36,323][01070] Fps is (10 sec: 4502.5, 60 sec: 3959.0, 300 sec: 3832.1). Total num frames: 2572288. Throughput: 0: 1005.1. Samples: 643160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:06:36,326][01070] Avg episode reward: [(0, '5.960')] [2024-09-06 08:06:38,101][06068] Updated weights for policy 0, policy_version 630 (0.0023) [2024-09-06 08:06:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2588672. Throughput: 0: 964.2. Samples: 647620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:06:41,319][01070] Avg episode reward: [(0, '5.987')] [2024-09-06 08:06:46,317][01070] Fps is (10 sec: 3688.9, 60 sec: 3891.2, 300 sec: 3860.0). 
Total num frames: 2609152. Throughput: 0: 950.4. Samples: 650448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:06:46,323][01070] Avg episode reward: [(0, '6.091')] [2024-09-06 08:06:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth... [2024-09-06 08:06:46,464][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth [2024-09-06 08:06:48,887][06068] Updated weights for policy 0, policy_version 640 (0.0049) [2024-09-06 08:06:51,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2629632. Throughput: 0: 982.1. Samples: 657202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:06:51,323][01070] Avg episode reward: [(0, '6.072')] [2024-09-06 08:06:56,317][01070] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2646016. Throughput: 0: 980.7. Samples: 662504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:06:56,319][01070] Avg episode reward: [(0, '5.924')] [2024-09-06 08:07:00,529][06068] Updated weights for policy 0, policy_version 650 (0.0028) [2024-09-06 08:07:01,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2662400. Throughput: 0: 950.1. Samples: 664618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:07:01,319][01070] Avg episode reward: [(0, '5.764')] [2024-09-06 08:07:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2686976. Throughput: 0: 963.3. Samples: 671462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-09-06 08:07:06,318][01070] Avg episode reward: [(0, '5.901')] [2024-09-06 08:07:09,346][06068] Updated weights for policy 0, policy_version 660 (0.0045) [2024-09-06 08:07:11,317][01070] Fps is (10 sec: 4505.2, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 2707456. Throughput: 0: 1000.9. Samples: 677680. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:07:11,320][01070] Avg episode reward: [(0, '6.609')] [2024-09-06 08:07:11,322][06055] Saving new best policy, reward=6.609! [2024-09-06 08:07:16,318][01070] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 2719744. Throughput: 0: 967.8. Samples: 679670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:07:16,324][01070] Avg episode reward: [(0, '6.832')] [2024-09-06 08:07:16,336][06055] Saving new best policy, reward=6.832! [2024-09-06 08:07:21,160][06068] Updated weights for policy 0, policy_version 670 (0.0030) [2024-09-06 08:07:21,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 2744320. Throughput: 0: 935.1. Samples: 685232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:07:21,321][01070] Avg episode reward: [(0, '6.827')] [2024-09-06 08:07:26,317][01070] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2760704. Throughput: 0: 966.4. Samples: 691106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:07:26,323][01070] Avg episode reward: [(0, '6.439')] [2024-09-06 08:07:31,319][01070] Fps is (10 sec: 2866.6, 60 sec: 3686.2, 300 sec: 3804.4). Total num frames: 2772992. Throughput: 0: 942.7. Samples: 692870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:07:31,325][01070] Avg episode reward: [(0, '6.339')] [2024-09-06 08:07:35,290][06068] Updated weights for policy 0, policy_version 680 (0.0034) [2024-09-06 08:07:36,317][01070] Fps is (10 sec: 2457.6, 60 sec: 3550.3, 300 sec: 3804.4). Total num frames: 2785280. Throughput: 0: 878.4. Samples: 696728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-06 08:07:36,321][01070] Avg episode reward: [(0, '6.444')] [2024-09-06 08:07:41,317][01070] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2809856. Throughput: 0: 908.9. Samples: 703404. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:07:41,319][01070] Avg episode reward: [(0, '6.654')] [2024-09-06 08:07:44,151][06068] Updated weights for policy 0, policy_version 690 (0.0028) [2024-09-06 08:07:46,317][01070] Fps is (10 sec: 4915.2, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2834432. Throughput: 0: 940.5. Samples: 706940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:07:46,319][01070] Avg episode reward: [(0, '6.718')] [2024-09-06 08:07:51,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2846720. Throughput: 0: 901.1. Samples: 712010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:07:51,319][01070] Avg episode reward: [(0, '6.882')] [2024-09-06 08:07:51,324][06055] Saving new best policy, reward=6.882! [2024-09-06 08:07:55,843][06068] Updated weights for policy 0, policy_version 700 (0.0039) [2024-09-06 08:07:56,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2867200. Throughput: 0: 884.7. Samples: 717490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:07:56,325][01070] Avg episode reward: [(0, '7.244')] [2024-09-06 08:07:56,336][06055] Saving new best policy, reward=7.244! [2024-09-06 08:08:01,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2891776. Throughput: 0: 917.8. Samples: 720972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:08:01,324][01070] Avg episode reward: [(0, '7.391')] [2024-09-06 08:08:01,326][06055] Saving new best policy, reward=7.391! [2024-09-06 08:08:05,806][06068] Updated weights for policy 0, policy_version 710 (0.0027) [2024-09-06 08:08:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 2908160. Throughput: 0: 930.1. Samples: 727084. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:08:06,325][01070] Avg episode reward: [(0, '6.836')] [2024-09-06 08:08:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3818.3). Total num frames: 2924544. Throughput: 0: 901.6. Samples: 731678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:08:11,319][01070] Avg episode reward: [(0, '6.692')] [2024-09-06 08:08:16,037][06068] Updated weights for policy 0, policy_version 720 (0.0022) [2024-09-06 08:08:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 2949120. Throughput: 0: 940.1. Samples: 735172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:08:16,319][01070] Avg episode reward: [(0, '6.850')] [2024-09-06 08:08:21,321][01070] Fps is (10 sec: 4503.7, 60 sec: 3754.5, 300 sec: 3818.2). Total num frames: 2969600. Throughput: 0: 1006.2. Samples: 742010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:08:21,323][01070] Avg episode reward: [(0, '7.485')] [2024-09-06 08:08:21,325][06055] Saving new best policy, reward=7.485! [2024-09-06 08:08:26,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.5). Total num frames: 2981888. Throughput: 0: 953.0. Samples: 746290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:08:26,319][01070] Avg episode reward: [(0, '7.434')] [2024-09-06 08:08:27,902][06068] Updated weights for policy 0, policy_version 730 (0.0032) [2024-09-06 08:08:31,317][01070] Fps is (10 sec: 3278.2, 60 sec: 3823.1, 300 sec: 3804.4). Total num frames: 3002368. Throughput: 0: 939.8. Samples: 749232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:08:31,319][01070] Avg episode reward: [(0, '7.716')] [2024-09-06 08:08:31,354][06055] Saving new best policy, reward=7.716! [2024-09-06 08:08:36,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 3026944. Throughput: 0: 982.7. Samples: 756230. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:08:36,319][01070] Avg episode reward: [(0, '7.928')] [2024-09-06 08:08:36,335][06055] Saving new best policy, reward=7.928! [2024-09-06 08:08:36,786][06068] Updated weights for policy 0, policy_version 740 (0.0040) [2024-09-06 08:08:41,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.3). Total num frames: 3043328. Throughput: 0: 975.6. Samples: 761394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:08:41,319][01070] Avg episode reward: [(0, '7.704')] [2024-09-06 08:08:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3059712. Throughput: 0: 946.2. Samples: 763550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:08:46,318][01070] Avg episode reward: [(0, '7.583')] [2024-09-06 08:08:46,332][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth... [2024-09-06 08:08:46,453][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000523_2142208.pth [2024-09-06 08:08:48,365][06068] Updated weights for policy 0, policy_version 750 (0.0018) [2024-09-06 08:08:51,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3084288. Throughput: 0: 960.6. Samples: 770310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:08:51,322][01070] Avg episode reward: [(0, '7.840')] [2024-09-06 08:08:56,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3100672. Throughput: 0: 992.8. Samples: 776356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:08:56,320][01070] Avg episode reward: [(0, '7.925')] [2024-09-06 08:08:59,558][06068] Updated weights for policy 0, policy_version 760 (0.0058) [2024-09-06 08:09:01,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3117056. Throughput: 0: 961.3. Samples: 778430. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:09:01,323][01070] Avg episode reward: [(0, '7.469')] [2024-09-06 08:09:06,318][01070] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 3141632. Throughput: 0: 942.7. Samples: 784430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-09-06 08:09:06,322][01070] Avg episode reward: [(0, '7.036')] [2024-09-06 08:09:08,713][06068] Updated weights for policy 0, policy_version 770 (0.0036) [2024-09-06 08:09:11,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3162112. Throughput: 0: 1005.6. Samples: 791544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:09:11,324][01070] Avg episode reward: [(0, '7.387')] [2024-09-06 08:09:16,322][01070] Fps is (10 sec: 3684.7, 60 sec: 3822.6, 300 sec: 3846.0). Total num frames: 3178496. Throughput: 0: 991.1. Samples: 793836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:09:16,325][01070] Avg episode reward: [(0, '7.663')] [2024-09-06 08:09:20,382][06068] Updated weights for policy 0, policy_version 780 (0.0022) [2024-09-06 08:09:21,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3860.0). Total num frames: 3198976. Throughput: 0: 946.0. Samples: 798798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:09:21,325][01070] Avg episode reward: [(0, '7.722')] [2024-09-06 08:09:26,317][01070] Fps is (10 sec: 4508.2, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3223552. Throughput: 0: 988.2. Samples: 805864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:09:26,320][01070] Avg episode reward: [(0, '7.744')] [2024-09-06 08:09:29,518][06068] Updated weights for policy 0, policy_version 790 (0.0014) [2024-09-06 08:09:31,319][01070] Fps is (10 sec: 4095.0, 60 sec: 3959.3, 300 sec: 3846.0). Total num frames: 3239936. Throughput: 0: 1014.1. Samples: 809186. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:09:31,322][01070] Avg episode reward: [(0, '7.755')] [2024-09-06 08:09:36,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3256320. Throughput: 0: 960.5. Samples: 813534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:09:36,318][01070] Avg episode reward: [(0, '8.140')] [2024-09-06 08:09:36,331][06055] Saving new best policy, reward=8.140! [2024-09-06 08:09:40,585][06068] Updated weights for policy 0, policy_version 800 (0.0042) [2024-09-06 08:09:41,317][01070] Fps is (10 sec: 3687.3, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3276800. Throughput: 0: 974.7. Samples: 820218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:09:41,323][01070] Avg episode reward: [(0, '8.704')] [2024-09-06 08:09:41,332][06055] Saving new best policy, reward=8.704! [2024-09-06 08:09:46,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3301376. Throughput: 0: 1006.8. Samples: 823736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:09:46,322][01070] Avg episode reward: [(0, '8.566')] [2024-09-06 08:09:51,321][01070] Fps is (10 sec: 3684.6, 60 sec: 3822.6, 300 sec: 3846.0). Total num frames: 3313664. Throughput: 0: 986.4. Samples: 828822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:09:51,327][01070] Avg episode reward: [(0, '8.672')] [2024-09-06 08:09:51,764][06068] Updated weights for policy 0, policy_version 810 (0.0032) [2024-09-06 08:09:56,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3334144. Throughput: 0: 955.2. Samples: 834528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:09:56,323][01070] Avg episode reward: [(0, '8.667')] [2024-09-06 08:10:00,867][06068] Updated weights for policy 0, policy_version 820 (0.0032) [2024-09-06 08:10:01,317][01070] Fps is (10 sec: 4507.8, 60 sec: 4027.7, 300 sec: 3860.0). 
Total num frames: 3358720. Throughput: 0: 982.2. Samples: 838030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:10:01,319][01070] Avg episode reward: [(0, '9.323')] [2024-09-06 08:10:01,327][06055] Saving new best policy, reward=9.323! [2024-09-06 08:10:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 3375104. Throughput: 0: 1002.8. Samples: 843922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:10:06,322][01070] Avg episode reward: [(0, '9.281')] [2024-09-06 08:10:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3391488. Throughput: 0: 951.8. Samples: 848696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:10:11,319][01070] Avg episode reward: [(0, '9.212')] [2024-09-06 08:10:12,461][06068] Updated weights for policy 0, policy_version 830 (0.0025) [2024-09-06 08:10:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 3860.0). Total num frames: 3416064. Throughput: 0: 956.3. Samples: 852216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:10:16,319][01070] Avg episode reward: [(0, '9.185')] [2024-09-06 08:10:21,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3436544. Throughput: 0: 1017.1. Samples: 859302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:10:21,320][01070] Avg episode reward: [(0, '9.434')] [2024-09-06 08:10:21,324][06055] Saving new best policy, reward=9.434! [2024-09-06 08:10:22,066][06068] Updated weights for policy 0, policy_version 840 (0.0030) [2024-09-06 08:10:26,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3452928. Throughput: 0: 961.2. Samples: 863472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:10:26,322][01070] Avg episode reward: [(0, '9.987')] [2024-09-06 08:10:26,338][06055] Saving new best policy, reward=9.987! 
[2024-09-06 08:10:31,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 3473408. Throughput: 0: 951.3. Samples: 866546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:10:31,322][01070] Avg episode reward: [(0, '10.544')] [2024-09-06 08:10:31,325][06055] Saving new best policy, reward=10.544! [2024-09-06 08:10:32,772][06068] Updated weights for policy 0, policy_version 850 (0.0028) [2024-09-06 08:10:36,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3497984. Throughput: 0: 993.9. Samples: 873542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:10:36,319][01070] Avg episode reward: [(0, '11.038')] [2024-09-06 08:10:36,329][06055] Saving new best policy, reward=11.038! [2024-09-06 08:10:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3510272. Throughput: 0: 980.8. Samples: 878664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:10:41,325][01070] Avg episode reward: [(0, '10.332')] [2024-09-06 08:10:44,181][06068] Updated weights for policy 0, policy_version 860 (0.0030) [2024-09-06 08:10:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3530752. Throughput: 0: 953.3. Samples: 880928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:10:46,318][01070] Avg episode reward: [(0, '10.049')] [2024-09-06 08:10:46,335][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000862_3530752.pth... [2024-09-06 08:10:46,466][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth [2024-09-06 08:10:51,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3873.8). Total num frames: 3555328. Throughput: 0: 981.0. Samples: 888068. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:10:51,325][01070] Avg episode reward: [(0, '9.881')] [2024-09-06 08:10:52,867][06068] Updated weights for policy 0, policy_version 870 (0.0024) [2024-09-06 08:10:56,318][01070] Fps is (10 sec: 4095.3, 60 sec: 3959.3, 300 sec: 3846.1). Total num frames: 3571712. Throughput: 0: 1010.2. Samples: 894156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:10:56,321][01070] Avg episode reward: [(0, '10.324')] [2024-09-06 08:11:01,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3588096. Throughput: 0: 978.2. Samples: 896234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:11:01,324][01070] Avg episode reward: [(0, '10.751')] [2024-09-06 08:11:04,130][06068] Updated weights for policy 0, policy_version 880 (0.0019) [2024-09-06 08:11:06,317][01070] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3612672. Throughput: 0: 960.8. Samples: 902540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:11:06,321][01070] Avg episode reward: [(0, '11.121')] [2024-09-06 08:11:06,332][06055] Saving new best policy, reward=11.121! [2024-09-06 08:11:11,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 3629056. Throughput: 0: 994.5. Samples: 908226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:11:11,319][01070] Avg episode reward: [(0, '11.751')] [2024-09-06 08:11:11,326][06055] Saving new best policy, reward=11.751! [2024-09-06 08:11:16,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3641344. Throughput: 0: 964.6. Samples: 909952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:11:16,319][01070] Avg episode reward: [(0, '12.049')] [2024-09-06 08:11:16,333][06055] Saving new best policy, reward=12.049! 
[2024-09-06 08:11:17,256][06068] Updated weights for policy 0, policy_version 890 (0.0045) [2024-09-06 08:11:21,316][01070] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3657728. Throughput: 0: 893.7. Samples: 913758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:11:21,318][01070] Avg episode reward: [(0, '13.812')] [2024-09-06 08:11:21,326][06055] Saving new best policy, reward=13.812! [2024-09-06 08:11:26,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 3678208. Throughput: 0: 932.5. Samples: 920626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:11:26,325][01070] Avg episode reward: [(0, '15.231')] [2024-09-06 08:11:26,390][06055] Saving new best policy, reward=15.231! [2024-09-06 08:11:27,393][06068] Updated weights for policy 0, policy_version 900 (0.0024) [2024-09-06 08:11:31,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3832.3). Total num frames: 3702784. Throughput: 0: 960.0. Samples: 924128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:11:31,322][01070] Avg episode reward: [(0, '15.872')] [2024-09-06 08:11:31,328][06055] Saving new best policy, reward=15.872! [2024-09-06 08:11:36,317][01070] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 3715072. Throughput: 0: 907.6. Samples: 928912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:11:36,320][01070] Avg episode reward: [(0, '16.561')] [2024-09-06 08:11:36,333][06055] Saving new best policy, reward=16.561! [2024-09-06 08:11:38,716][06068] Updated weights for policy 0, policy_version 910 (0.0025) [2024-09-06 08:11:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3739648. Throughput: 0: 909.8. Samples: 935094. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:11:41,318][01070] Avg episode reward: [(0, '14.523')] [2024-09-06 08:11:46,317][01070] Fps is (10 sec: 4505.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3760128. Throughput: 0: 942.4. Samples: 938640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:11:46,320][01070] Avg episode reward: [(0, '14.427')] [2024-09-06 08:11:47,204][06068] Updated weights for policy 0, policy_version 920 (0.0035) [2024-09-06 08:11:51,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3776512. Throughput: 0: 934.4. Samples: 944586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:11:51,321][01070] Avg episode reward: [(0, '14.547')] [2024-09-06 08:11:56,317][01070] Fps is (10 sec: 3686.6, 60 sec: 3754.8, 300 sec: 3846.1). Total num frames: 3796992. Throughput: 0: 915.3. Samples: 949416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:11:56,319][01070] Avg episode reward: [(0, '15.007')] [2024-09-06 08:11:58,865][06068] Updated weights for policy 0, policy_version 930 (0.0040) [2024-09-06 08:12:01,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3817472. Throughput: 0: 955.3. Samples: 952942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:01,319][01070] Avg episode reward: [(0, '16.278')] [2024-09-06 08:12:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3837952. Throughput: 0: 1027.6. Samples: 959998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:06,326][01070] Avg episode reward: [(0, '16.304')] [2024-09-06 08:12:09,321][06068] Updated weights for policy 0, policy_version 940 (0.0024) [2024-09-06 08:12:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3854336. Throughput: 0: 969.6. Samples: 964258. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:11,320][01070] Avg episode reward: [(0, '16.576')] [2024-09-06 08:12:11,326][06055] Saving new best policy, reward=16.576! [2024-09-06 08:12:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3878912. Throughput: 0: 961.4. Samples: 967390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:16,322][01070] Avg episode reward: [(0, '16.934')] [2024-09-06 08:12:16,332][06055] Saving new best policy, reward=16.934! [2024-09-06 08:12:18,962][06068] Updated weights for policy 0, policy_version 950 (0.0061) [2024-09-06 08:12:21,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3899392. Throughput: 0: 1011.4. Samples: 974424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:12:21,321][01070] Avg episode reward: [(0, '16.941')] [2024-09-06 08:12:21,326][06055] Saving new best policy, reward=16.941! [2024-09-06 08:12:26,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3915776. Throughput: 0: 986.0. Samples: 979462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:12:26,321][01070] Avg episode reward: [(0, '17.171')] [2024-09-06 08:12:26,332][06055] Saving new best policy, reward=17.171! [2024-09-06 08:12:30,483][06068] Updated weights for policy 0, policy_version 960 (0.0038) [2024-09-06 08:12:31,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3932160. Throughput: 0: 955.2. Samples: 981622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:31,318][01070] Avg episode reward: [(0, '17.914')] [2024-09-06 08:12:31,330][06055] Saving new best policy, reward=17.914! [2024-09-06 08:12:36,317][01070] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3956736. Throughput: 0: 980.8. Samples: 988720. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:12:36,319][01070] Avg episode reward: [(0, '17.778')] [2024-09-06 08:12:39,405][06068] Updated weights for policy 0, policy_version 970 (0.0031) [2024-09-06 08:12:41,318][01070] Fps is (10 sec: 4504.8, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 3977216. Throughput: 0: 1011.3. Samples: 994928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:12:41,321][01070] Avg episode reward: [(0, '17.411')] [2024-09-06 08:12:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 3989504. Throughput: 0: 979.5. Samples: 997020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:46,319][01070] Avg episode reward: [(0, '16.564')] [2024-09-06 08:12:46,386][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000975_3993600.pth... [2024-09-06 08:12:46,505][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth [2024-09-06 08:12:49,066][06055] Stopping Batcher_0... [2024-09-06 08:12:49,066][06055] Loop batcher_evt_loop terminating... [2024-09-06 08:12:49,068][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:12:49,066][01070] Component Batcher_0 stopped! [2024-09-06 08:12:49,172][06068] Weights refcount: 2 0 [2024-09-06 08:12:49,178][06068] Stopping InferenceWorker_p0-w0... [2024-09-06 08:12:49,180][01070] Component InferenceWorker_p0-w0 stopped! [2024-09-06 08:12:49,179][06068] Loop inference_proc0-0_evt_loop terminating... [2024-09-06 08:12:49,262][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000862_3530752.pth [2024-09-06 08:12:49,282][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:12:49,452][01070] Component LearnerWorker_p0 stopped! [2024-09-06 08:12:49,458][06055] Stopping LearnerWorker_p0... 
[2024-09-06 08:12:49,458][06055] Loop learner_proc0_evt_loop terminating... [2024-09-06 08:12:49,517][06070] Stopping RolloutWorker_w1... [2024-09-06 08:12:49,518][06070] Loop rollout_proc1_evt_loop terminating... [2024-09-06 08:12:49,518][01070] Component RolloutWorker_w1 stopped! [2024-09-06 08:12:49,528][01070] Component RolloutWorker_w7 stopped! [2024-09-06 08:12:49,528][06076] Stopping RolloutWorker_w7... [2024-09-06 08:12:49,534][06076] Loop rollout_proc7_evt_loop terminating... [2024-09-06 08:12:49,583][06072] Stopping RolloutWorker_w3... [2024-09-06 08:12:49,587][06074] Stopping RolloutWorker_w5... [2024-09-06 08:12:49,584][01070] Component RolloutWorker_w3 stopped! [2024-09-06 08:12:49,589][06072] Loop rollout_proc3_evt_loop terminating... [2024-09-06 08:12:49,589][01070] Component RolloutWorker_w5 stopped! [2024-09-06 08:12:49,589][06074] Loop rollout_proc5_evt_loop terminating... [2024-09-06 08:12:49,624][01070] Component RolloutWorker_w0 stopped! [2024-09-06 08:12:49,629][06069] Stopping RolloutWorker_w0... [2024-09-06 08:12:49,636][06069] Loop rollout_proc0_evt_loop terminating... [2024-09-06 08:12:49,659][01070] Component RolloutWorker_w6 stopped! [2024-09-06 08:12:49,662][06075] Stopping RolloutWorker_w6... [2024-09-06 08:12:49,663][06075] Loop rollout_proc6_evt_loop terminating... [2024-09-06 08:12:49,674][01070] Component RolloutWorker_w2 stopped! [2024-09-06 08:12:49,677][06071] Stopping RolloutWorker_w2... [2024-09-06 08:12:49,677][06071] Loop rollout_proc2_evt_loop terminating... [2024-09-06 08:12:49,711][01070] Component RolloutWorker_w4 stopped! [2024-09-06 08:12:49,714][01070] Waiting for process learner_proc0 to stop... [2024-09-06 08:12:49,717][06073] Stopping RolloutWorker_w4... [2024-09-06 08:12:49,718][06073] Loop rollout_proc4_evt_loop terminating... [2024-09-06 08:12:51,010][01070] Waiting for process inference_proc0-0 to join... [2024-09-06 08:12:51,015][01070] Waiting for process rollout_proc0 to join... 
[2024-09-06 08:12:53,043][01070] Waiting for process rollout_proc1 to join...
[2024-09-06 08:12:53,046][01070] Waiting for process rollout_proc2 to join...
[2024-09-06 08:12:53,049][01070] Waiting for process rollout_proc3 to join...
[2024-09-06 08:12:53,051][01070] Waiting for process rollout_proc4 to join...
[2024-09-06 08:12:53,052][01070] Waiting for process rollout_proc5 to join...
[2024-09-06 08:12:53,054][01070] Waiting for process rollout_proc6 to join...
[2024-09-06 08:12:53,057][01070] Waiting for process rollout_proc7 to join...
[2024-09-06 08:12:53,059][01070] Batcher 0 profile tree view:
batching: 28.0352, releasing_batches: 0.0261
[2024-09-06 08:12:53,060][01070] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
wait_policy_total: 396.0637
update_model: 9.3731
weight_update: 0.0039
one_step: 0.0115
handle_policy_step: 605.9671
deserialize: 14.6060, stack: 3.1718, obs_to_device_normalize: 122.4392, forward: 322.4806, send_messages: 28.9576
prepare_outputs: 84.3394
to_cpu: 49.4062
[2024-09-06 08:12:53,062][01070] Learner 0 profile tree view:
misc: 0.0070, prepare_batch: 14.0241
train: 74.3266
epoch_init: 0.0157, minibatch_init: 0.0064, losses_postprocess: 0.6825, kl_divergence: 0.6975, after_optimizer: 33.2803
calculate_losses: 26.8972
losses_init: 0.0105, forward_head: 1.2733, bptt_initial: 18.0432, tail: 1.1022, advantages_returns: 0.2551, losses: 3.8922
bptt: 1.9881
bptt_forward_core: 1.8812
update: 12.1225
clip: 0.8825
[2024-09-06 08:12:53,063][01070] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3955, enqueue_policy_requests: 94.3963, env_step: 821.5439, overhead: 13.5076, complete_rollouts: 7.0298
save_policy_outputs: 20.8305
split_output_tensors: 8.5837
[2024-09-06 08:12:53,065][01070] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3496, enqueue_policy_requests: 97.5017, env_step: 821.5330, overhead: 13.0956, complete_rollouts: 6.8883
save_policy_outputs: 20.0181
split_output_tensors: 7.8801
[2024-09-06 08:12:53,066][01070] Loop Runner_EvtLoop terminating...
[2024-09-06 08:12:53,068][01070] Runner profile tree view:
main_loop: 1081.6875
[2024-09-06 08:12:53,069][01070] Collected {0: 4005888}, FPS: 3703.4
[2024-09-06 08:26:47,354][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-06 08:26:47,356][01070] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-06 08:26:47,359][01070] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-06 08:26:47,361][01070] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-06 08:26:47,363][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-06 08:26:47,365][01070] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-06 08:26:47,366][01070] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-06 08:26:47,367][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-06 08:26:47,368][01070] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-06 08:26:47,369][01070] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-06 08:26:47,370][01070] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-06 08:26:47,371][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-06 08:26:47,372][01070] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-06 08:26:47,373][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-06 08:26:47,374][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 08:26:47,410][01070] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:26:47,413][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 08:26:47,415][01070] RunningMeanStd input shape: (1,) [2024-09-06 08:26:47,432][01070] ConvEncoder: input_channels=3 [2024-09-06 08:26:47,594][01070] Conv encoder output size: 512 [2024-09-06 08:26:47,596][01070] Policy head output size: 512 [2024-09-06 08:26:47,888][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:26:48,754][01070] Num frames 100... [2024-09-06 08:26:48,875][01070] Num frames 200... [2024-09-06 08:26:48,994][01070] Num frames 300... [2024-09-06 08:26:49,113][01070] Num frames 400... [2024-09-06 08:26:49,254][01070] Num frames 500... [2024-09-06 08:26:49,388][01070] Avg episode rewards: #0: 11.440, true rewards: #0: 5.440 [2024-09-06 08:26:49,391][01070] Avg episode reward: 11.440, avg true_objective: 5.440 [2024-09-06 08:26:49,511][01070] Num frames 600... [2024-09-06 08:26:49,678][01070] Num frames 700... [2024-09-06 08:26:49,846][01070] Num frames 800... [2024-09-06 08:26:50,007][01070] Num frames 900... [2024-09-06 08:26:50,168][01070] Num frames 1000... [2024-09-06 08:26:50,331][01070] Num frames 1100... [2024-09-06 08:26:50,503][01070] Num frames 1200... [2024-09-06 08:26:50,681][01070] Num frames 1300... [2024-09-06 08:26:50,858][01070] Num frames 1400... [2024-09-06 08:26:51,030][01070] Num frames 1500... [2024-09-06 08:26:51,200][01070] Num frames 1600... [2024-09-06 08:26:51,364][01070] Avg episode rewards: #0: 19.795, true rewards: #0: 8.295 [2024-09-06 08:26:51,366][01070] Avg episode reward: 19.795, avg true_objective: 8.295 [2024-09-06 08:26:51,440][01070] Num frames 1700... [2024-09-06 08:26:51,597][01070] Num frames 1800... [2024-09-06 08:26:51,716][01070] Num frames 1900... 
[2024-09-06 08:26:51,843][01070] Num frames 2000... [2024-09-06 08:26:51,964][01070] Num frames 2100... [2024-09-06 08:26:52,083][01070] Num frames 2200... [2024-09-06 08:26:52,203][01070] Num frames 2300... [2024-09-06 08:26:52,366][01070] Avg episode rewards: #0: 18.633, true rewards: #0: 7.967 [2024-09-06 08:26:52,367][01070] Avg episode reward: 18.633, avg true_objective: 7.967 [2024-09-06 08:26:52,382][01070] Num frames 2400... [2024-09-06 08:26:52,509][01070] Num frames 2500... [2024-09-06 08:26:52,629][01070] Num frames 2600... [2024-09-06 08:26:52,748][01070] Num frames 2700... [2024-09-06 08:26:52,878][01070] Num frames 2800... [2024-09-06 08:26:52,997][01070] Num frames 2900... [2024-09-06 08:26:53,117][01070] Num frames 3000... [2024-09-06 08:26:53,239][01070] Num frames 3100... [2024-09-06 08:26:53,401][01070] Avg episode rewards: #0: 18.478, true rewards: #0: 7.977 [2024-09-06 08:26:53,402][01070] Avg episode reward: 18.478, avg true_objective: 7.977 [2024-09-06 08:26:53,416][01070] Num frames 3200... [2024-09-06 08:26:53,543][01070] Num frames 3300... [2024-09-06 08:26:53,662][01070] Num frames 3400... [2024-09-06 08:26:53,779][01070] Num frames 3500... [2024-09-06 08:26:53,906][01070] Num frames 3600... [2024-09-06 08:26:54,025][01070] Num frames 3700... [2024-09-06 08:26:54,144][01070] Num frames 3800... [2024-09-06 08:26:54,263][01070] Num frames 3900... [2024-09-06 08:26:54,327][01070] Avg episode rewards: #0: 18.210, true rewards: #0: 7.810 [2024-09-06 08:26:54,328][01070] Avg episode reward: 18.210, avg true_objective: 7.810 [2024-09-06 08:26:54,438][01070] Num frames 4000... [2024-09-06 08:26:54,568][01070] Num frames 4100... [2024-09-06 08:26:54,685][01070] Num frames 4200... [2024-09-06 08:26:54,803][01070] Num frames 4300... [2024-09-06 08:26:54,928][01070] Num frames 4400... [2024-09-06 08:26:55,056][01070] Num frames 4500... [2024-09-06 08:26:55,190][01070] Num frames 4600... [2024-09-06 08:26:55,326][01070] Num frames 4700... 
[2024-09-06 08:26:55,391][01070] Avg episode rewards: #0: 17.675, true rewards: #0: 7.842 [2024-09-06 08:26:55,392][01070] Avg episode reward: 17.675, avg true_objective: 7.842 [2024-09-06 08:26:55,513][01070] Num frames 4800... [2024-09-06 08:26:55,637][01070] Num frames 4900... [2024-09-06 08:26:55,758][01070] Num frames 5000... [2024-09-06 08:26:55,827][01070] Avg episode rewards: #0: 15.870, true rewards: #0: 7.156 [2024-09-06 08:26:55,828][01070] Avg episode reward: 15.870, avg true_objective: 7.156 [2024-09-06 08:26:55,944][01070] Num frames 5100... [2024-09-06 08:26:56,065][01070] Num frames 5200... [2024-09-06 08:26:56,186][01070] Num frames 5300... [2024-09-06 08:26:56,305][01070] Num frames 5400... [2024-09-06 08:26:56,425][01070] Num frames 5500... [2024-09-06 08:26:56,557][01070] Num frames 5600... [2024-09-06 08:26:56,681][01070] Num frames 5700... [2024-09-06 08:26:56,803][01070] Num frames 5800... [2024-09-06 08:26:56,931][01070] Num frames 5900... [2024-09-06 08:26:57,053][01070] Num frames 6000... [2024-09-06 08:26:57,173][01070] Num frames 6100... [2024-09-06 08:26:57,294][01070] Num frames 6200... [2024-09-06 08:26:57,418][01070] Num frames 6300... [2024-09-06 08:26:57,547][01070] Num frames 6400... [2024-09-06 08:26:57,675][01070] Num frames 6500... [2024-09-06 08:26:57,795][01070] Num frames 6600... [2024-09-06 08:26:57,916][01070] Num frames 6700... [2024-09-06 08:26:58,045][01070] Num frames 6800... [2024-09-06 08:26:58,169][01070] Num frames 6900... [2024-09-06 08:26:58,260][01070] Avg episode rewards: #0: 19.911, true rewards: #0: 8.661 [2024-09-06 08:26:58,261][01070] Avg episode reward: 19.911, avg true_objective: 8.661 [2024-09-06 08:26:58,349][01070] Num frames 7000... [2024-09-06 08:26:58,470][01070] Num frames 7100... [2024-09-06 08:26:58,599][01070] Num frames 7200... [2024-09-06 08:26:58,720][01070] Num frames 7300... [2024-09-06 08:26:58,842][01070] Num frames 7400... [2024-09-06 08:26:58,969][01070] Num frames 7500... 
[2024-09-06 08:26:59,094][01070] Num frames 7600... [2024-09-06 08:26:59,214][01070] Num frames 7700... [2024-09-06 08:26:59,332][01070] Num frames 7800... [2024-09-06 08:26:59,457][01070] Num frames 7900... [2024-09-06 08:26:59,545][01070] Avg episode rewards: #0: 19.801, true rewards: #0: 8.801 [2024-09-06 08:26:59,546][01070] Avg episode reward: 19.801, avg true_objective: 8.801 [2024-09-06 08:26:59,649][01070] Num frames 8000... [2024-09-06 08:26:59,793][01070] Num frames 8100... [2024-09-06 08:26:59,916][01070] Num frames 8200... [2024-09-06 08:27:00,042][01070] Num frames 8300... [2024-09-06 08:27:00,161][01070] Num frames 8400... [2024-09-06 08:27:00,280][01070] Num frames 8500... [2024-09-06 08:27:00,411][01070] Avg episode rewards: #0: 18.961, true rewards: #0: 8.561 [2024-09-06 08:27:00,412][01070] Avg episode reward: 18.961, avg true_objective: 8.561 [2024-09-06 08:27:55,072][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-06 08:29:44,806][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 08:29:44,808][01070] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-06 08:29:44,810][01070] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-06 08:29:44,811][01070] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-06 08:29:44,813][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 08:29:44,814][01070] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-06 08:29:44,817][01070] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-06 08:29:44,818][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-06 08:29:44,820][01070] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
[2024-09-06 08:29:44,822][01070] Adding new argument 'hf_repository'='Re-Re/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-06 08:29:44,823][01070] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-06 08:29:44,826][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-06 08:29:44,827][01070] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-06 08:29:44,828][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-06 08:29:44,829][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 08:29:44,858][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 08:29:44,861][01070] RunningMeanStd input shape: (1,) [2024-09-06 08:29:44,875][01070] ConvEncoder: input_channels=3 [2024-09-06 08:29:44,911][01070] Conv encoder output size: 512 [2024-09-06 08:29:44,912][01070] Policy head output size: 512 [2024-09-06 08:29:44,931][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:29:45,358][01070] Num frames 100... [2024-09-06 08:29:45,488][01070] Num frames 200... [2024-09-06 08:29:45,624][01070] Num frames 300... [2024-09-06 08:29:45,744][01070] Num frames 400... [2024-09-06 08:29:45,863][01070] Num frames 500... [2024-09-06 08:29:45,987][01070] Num frames 600... [2024-09-06 08:29:46,108][01070] Num frames 700... [2024-09-06 08:29:46,235][01070] Num frames 800... [2024-09-06 08:29:46,300][01070] Avg episode rewards: #0: 17.070, true rewards: #0: 8.070 [2024-09-06 08:29:46,302][01070] Avg episode reward: 17.070, avg true_objective: 8.070 [2024-09-06 08:29:46,416][01070] Num frames 900... [2024-09-06 08:29:46,542][01070] Num frames 1000... [2024-09-06 08:29:46,669][01070] Num frames 1100... [2024-09-06 08:29:46,794][01070] Num frames 1200... 
[2024-09-06 08:29:46,917][01070] Num frames 1300... [2024-09-06 08:29:47,039][01070] Num frames 1400... [2024-09-06 08:29:47,189][01070] Avg episode rewards: #0: 14.895, true rewards: #0: 7.395 [2024-09-06 08:29:47,191][01070] Avg episode reward: 14.895, avg true_objective: 7.395 [2024-09-06 08:29:47,218][01070] Num frames 1500... [2024-09-06 08:29:47,338][01070] Num frames 1600... [2024-09-06 08:29:47,461][01070] Num frames 1700... [2024-09-06 08:29:47,587][01070] Num frames 1800... [2024-09-06 08:29:47,708][01070] Num frames 1900... [2024-09-06 08:29:47,826][01070] Num frames 2000... [2024-09-06 08:29:47,910][01070] Avg episode rewards: #0: 12.410, true rewards: #0: 6.743 [2024-09-06 08:29:47,913][01070] Avg episode reward: 12.410, avg true_objective: 6.743 [2024-09-06 08:29:48,007][01070] Num frames 2100... [2024-09-06 08:29:48,128][01070] Num frames 2200... [2024-09-06 08:29:48,257][01070] Num frames 2300... [2024-09-06 08:29:48,380][01070] Num frames 2400... [2024-09-06 08:29:48,524][01070] Avg episode rewards: #0: 11.178, true rewards: #0: 6.177 [2024-09-06 08:29:48,526][01070] Avg episode reward: 11.178, avg true_objective: 6.177 [2024-09-06 08:29:48,562][01070] Num frames 2500... [2024-09-06 08:29:48,679][01070] Num frames 2600... [2024-09-06 08:29:48,802][01070] Num frames 2700... [2024-09-06 08:29:48,926][01070] Num frames 2800... [2024-09-06 08:29:49,047][01070] Num frames 2900... [2024-09-06 08:29:49,168][01070] Num frames 3000... [2024-09-06 08:29:49,296][01070] Num frames 3100... [2024-09-06 08:29:49,421][01070] Num frames 3200... [2024-09-06 08:29:49,551][01070] Num frames 3300... [2024-09-06 08:29:49,677][01070] Num frames 3400... [2024-09-06 08:29:49,816][01070] Avg episode rewards: #0: 12.736, true rewards: #0: 6.936 [2024-09-06 08:29:49,819][01070] Avg episode reward: 12.736, avg true_objective: 6.936 [2024-09-06 08:29:49,860][01070] Num frames 3500... [2024-09-06 08:29:49,981][01070] Num frames 3600... 
[2024-09-06 08:29:50,103][01070] Num frames 3700... [2024-09-06 08:29:50,228][01070] Num frames 3800... [2024-09-06 08:29:50,360][01070] Num frames 3900... [2024-09-06 08:29:50,439][01070] Avg episode rewards: #0: 11.862, true rewards: #0: 6.528 [2024-09-06 08:29:50,441][01070] Avg episode reward: 11.862, avg true_objective: 6.528 [2024-09-06 08:29:50,548][01070] Num frames 4000... [2024-09-06 08:29:50,713][01070] Num frames 4100... [2024-09-06 08:29:50,880][01070] Num frames 4200... [2024-09-06 08:29:51,043][01070] Num frames 4300... [2024-09-06 08:29:51,205][01070] Avg episode rewards: #0: 10.950, true rewards: #0: 6.236 [2024-09-06 08:29:51,208][01070] Avg episode reward: 10.950, avg true_objective: 6.236 [2024-09-06 08:29:51,274][01070] Num frames 4400... [2024-09-06 08:29:51,443][01070] Num frames 4500... [2024-09-06 08:29:51,608][01070] Num frames 4600... [2024-09-06 08:29:51,767][01070] Num frames 4700... [2024-09-06 08:29:51,938][01070] Num frames 4800... [2024-09-06 08:29:52,113][01070] Num frames 4900... [2024-09-06 08:29:52,291][01070] Avg episode rewards: #0: 10.716, true rewards: #0: 6.216 [2024-09-06 08:29:52,294][01070] Avg episode reward: 10.716, avg true_objective: 6.216 [2024-09-06 08:29:52,349][01070] Num frames 5000... [2024-09-06 08:29:52,549][01070] Num frames 5100... [2024-09-06 08:29:52,723][01070] Num frames 5200... [2024-09-06 08:29:52,895][01070] Num frames 5300... [2024-09-06 08:29:53,063][01070] Num frames 5400... [2024-09-06 08:29:53,183][01070] Num frames 5500... [2024-09-06 08:29:53,304][01070] Num frames 5600... [2024-09-06 08:29:53,437][01070] Num frames 5700... [2024-09-06 08:29:53,565][01070] Num frames 5800... [2024-09-06 08:29:53,688][01070] Num frames 5900... [2024-09-06 08:29:53,810][01070] Num frames 6000... [2024-09-06 08:29:53,930][01070] Num frames 6100... [2024-09-06 08:29:54,050][01070] Num frames 6200... [2024-09-06 08:29:54,174][01070] Num frames 6300... 
[2024-09-06 08:29:54,254][01070] Avg episode rewards: #0: 13.130, true rewards: #0: 7.019 [2024-09-06 08:29:54,255][01070] Avg episode reward: 13.130, avg true_objective: 7.019 [2024-09-06 08:29:54,356][01070] Num frames 6400... [2024-09-06 08:29:54,490][01070] Num frames 6500... [2024-09-06 08:29:54,609][01070] Num frames 6600... [2024-09-06 08:29:54,726][01070] Num frames 6700... [2024-09-06 08:29:54,842][01070] Num frames 6800... [2024-09-06 08:29:54,961][01070] Num frames 6900... [2024-09-06 08:29:55,080][01070] Num frames 7000... [2024-09-06 08:29:55,202][01070] Num frames 7100... [2024-09-06 08:29:55,323][01070] Num frames 7200... [2024-09-06 08:29:55,437][01070] Avg episode rewards: #0: 13.847, true rewards: #0: 7.247 [2024-09-06 08:29:55,439][01070] Avg episode reward: 13.847, avg true_objective: 7.247 [2024-09-06 08:30:39,348][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-06 08:30:46,004][01070] The model has been pushed to https://huggingface.co/Re-Re/rl_course_vizdoom_health_gathering_supreme [2024-09-06 08:35:27,232][01070] Environment doom_basic already registered, overwriting... [2024-09-06 08:35:27,234][01070] Environment doom_two_colors_easy already registered, overwriting... [2024-09-06 08:35:27,237][01070] Environment doom_two_colors_hard already registered, overwriting... [2024-09-06 08:35:27,238][01070] Environment doom_dm already registered, overwriting... [2024-09-06 08:35:27,240][01070] Environment doom_dwango5 already registered, overwriting... [2024-09-06 08:35:27,241][01070] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-09-06 08:35:27,242][01070] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-09-06 08:35:27,243][01070] Environment doom_my_way_home already registered, overwriting... [2024-09-06 08:35:27,244][01070] Environment doom_deadly_corridor already registered, overwriting... 
[2024-09-06 08:35:27,245][01070] Environment doom_defend_the_center already registered, overwriting... [2024-09-06 08:35:27,246][01070] Environment doom_defend_the_line already registered, overwriting... [2024-09-06 08:35:27,247][01070] Environment doom_health_gathering already registered, overwriting... [2024-09-06 08:35:27,248][01070] Environment doom_health_gathering_supreme already registered, overwriting... [2024-09-06 08:35:27,250][01070] Environment doom_battle already registered, overwriting... [2024-09-06 08:35:27,251][01070] Environment doom_battle2 already registered, overwriting... [2024-09-06 08:35:27,252][01070] Environment doom_duel_bots already registered, overwriting... [2024-09-06 08:35:27,253][01070] Environment doom_deathmatch_bots already registered, overwriting... [2024-09-06 08:35:27,254][01070] Environment doom_duel already registered, overwriting... [2024-09-06 08:35:27,255][01070] Environment doom_deathmatch_full already registered, overwriting... [2024-09-06 08:35:27,256][01070] Environment doom_benchmark already registered, overwriting... [2024-09-06 08:35:27,257][01070] register_encoder_factory: [2024-09-06 08:35:27,282][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 08:35:27,284][01070] Overriding arg 'train_for_env_steps' with value 5000000 passed from command line [2024-09-06 08:35:27,291][01070] Experiment dir /content/train_dir/default_experiment already exists! [2024-09-06 08:35:27,293][01070] Resuming existing experiment from /content/train_dir/default_experiment... 
[2024-09-06 08:35:27,295][01070] Weights and Biases integration disabled [2024-09-06 08:35:27,298][01070] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-09-06 08:35:29,357][01070] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=5000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru 
rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-09-06 08:35:29,359][01070] Saving configuration to /content/train_dir/default_experiment/config.json... 
[2024-09-06 08:35:29,362][01070] Rollout worker 0 uses device cpu [2024-09-06 08:35:29,364][01070] Rollout worker 1 uses device cpu [2024-09-06 08:35:29,365][01070] Rollout worker 2 uses device cpu [2024-09-06 08:35:29,366][01070] Rollout worker 3 uses device cpu [2024-09-06 08:35:29,368][01070] Rollout worker 4 uses device cpu [2024-09-06 08:35:29,369][01070] Rollout worker 5 uses device cpu [2024-09-06 08:35:29,370][01070] Rollout worker 6 uses device cpu [2024-09-06 08:35:29,372][01070] Rollout worker 7 uses device cpu [2024-09-06 08:35:29,446][01070] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 08:35:29,447][01070] InferenceWorker_p0-w0: min num requests: 2 [2024-09-06 08:35:29,485][01070] Starting all processes... [2024-09-06 08:35:29,486][01070] Starting process learner_proc0 [2024-09-06 08:35:29,535][01070] Starting all processes... [2024-09-06 08:35:29,540][01070] Starting process inference_proc0-0 [2024-09-06 08:35:29,541][01070] Starting process rollout_proc0 [2024-09-06 08:35:29,541][01070] Starting process rollout_proc1 [2024-09-06 08:35:29,541][01070] Starting process rollout_proc2 [2024-09-06 08:35:29,541][01070] Starting process rollout_proc3 [2024-09-06 08:35:29,541][01070] Starting process rollout_proc4 [2024-09-06 08:35:29,541][01070] Starting process rollout_proc5 [2024-09-06 08:35:29,736][01070] Starting process rollout_proc7 [2024-09-06 08:35:29,753][01070] Starting process rollout_proc6 [2024-09-06 08:35:43,326][19093] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 08:35:43,327][19093] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-09-06 08:35:43,391][19093] Num visible devices: 1 [2024-09-06 08:35:43,432][19093] Starting seed is not provided [2024-09-06 08:35:43,433][19093] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 08:35:43,434][19093] Initializing actor-critic model on device cuda:0 [2024-09-06 08:35:43,434][19093] 
RunningMeanStd input shape: (3, 72, 128) [2024-09-06 08:35:43,436][19093] RunningMeanStd input shape: (1,) [2024-09-06 08:35:43,522][19093] ConvEncoder: input_channels=3 [2024-09-06 08:35:44,545][19093] Conv encoder output size: 512 [2024-09-06 08:35:44,548][19093] Policy head output size: 512 [2024-09-06 08:35:44,679][19093] Created Actor Critic model with architecture: [2024-09-06 08:35:44,680][19093] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-09-06 08:35:45,399][19110] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 08:35:45,404][19110] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-09-06 08:35:45,627][19116] Worker 5 uses CPU cores [1] [2024-09-06 08:35:45,628][19113] Worker 2 uses CPU cores [0] [2024-09-06 08:35:45,638][19110] Num visible devices: 1 [2024-09-06 
08:35:45,693][19114] Worker 4 uses CPU cores [0] [2024-09-06 08:35:45,750][19093] Using optimizer [2024-09-06 08:35:45,925][19111] Worker 1 uses CPU cores [1] [2024-09-06 08:35:45,964][19117] Worker 7 uses CPU cores [1] [2024-09-06 08:35:46,011][19118] Worker 6 uses CPU cores [0] [2024-09-06 08:35:46,116][19112] Worker 0 uses CPU cores [0] [2024-09-06 08:35:46,119][19115] Worker 3 uses CPU cores [1] [2024-09-06 08:35:46,846][19093] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:35:46,893][19093] Loading model from checkpoint [2024-09-06 08:35:46,895][19093] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2024-09-06 08:35:46,896][19093] Initialized policy 0 weights for model version 978 [2024-09-06 08:35:46,906][19093] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 08:35:46,914][19093] LearnerWorker_p0 finished initialization! [2024-09-06 08:35:47,083][19110] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 08:35:47,085][19110] RunningMeanStd input shape: (1,) [2024-09-06 08:35:47,103][19110] ConvEncoder: input_channels=3 [2024-09-06 08:35:47,256][19110] Conv encoder output size: 512 [2024-09-06 08:35:47,257][19110] Policy head output size: 512 [2024-09-06 08:35:47,299][01070] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 08:35:47,343][01070] Inference worker 0-0 is ready! [2024-09-06 08:35:47,345][01070] All inference workers are ready! Signal rollout workers to start! 
[2024-09-06 08:35:47,666][19115] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,759][19117] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,841][19111] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,847][19116] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,859][19118] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,902][19114] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,911][19113] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,923][19112] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:49,437][01070] Heartbeat connected on Batcher_0 [2024-09-06 08:35:49,445][01070] Heartbeat connected on LearnerWorker_p0 [2024-09-06 08:35:49,476][01070] Heartbeat connected on InferenceWorker_p0-w0 [2024-09-06 08:35:49,689][19112] Decorrelating experience for 0 frames... [2024-09-06 08:35:49,691][19113] Decorrelating experience for 0 frames... [2024-09-06 08:35:49,916][19115] Decorrelating experience for 0 frames... [2024-09-06 08:35:49,947][19117] Decorrelating experience for 0 frames... [2024-09-06 08:35:50,051][19111] Decorrelating experience for 0 frames... [2024-09-06 08:35:50,057][19116] Decorrelating experience for 0 frames... [2024-09-06 08:35:50,424][19112] Decorrelating experience for 32 frames... [2024-09-06 08:35:51,332][19117] Decorrelating experience for 32 frames... [2024-09-06 08:35:51,378][19115] Decorrelating experience for 32 frames... [2024-09-06 08:35:51,423][19111] Decorrelating experience for 32 frames... [2024-09-06 08:35:51,456][19114] Decorrelating experience for 0 frames... [2024-09-06 08:35:51,798][19113] Decorrelating experience for 32 frames... [2024-09-06 08:35:52,103][19112] Decorrelating experience for 64 frames... [2024-09-06 08:35:52,299][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. 
Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 08:35:52,385][19118] Decorrelating experience for 0 frames... [2024-09-06 08:35:52,773][19116] Decorrelating experience for 32 frames... [2024-09-06 08:35:53,059][19117] Decorrelating experience for 64 frames... [2024-09-06 08:35:53,088][19115] Decorrelating experience for 64 frames... [2024-09-06 08:35:53,163][19111] Decorrelating experience for 64 frames... [2024-09-06 08:35:53,227][19114] Decorrelating experience for 32 frames... [2024-09-06 08:35:54,004][19116] Decorrelating experience for 64 frames... [2024-09-06 08:35:54,080][19111] Decorrelating experience for 96 frames... [2024-09-06 08:35:54,169][01070] Heartbeat connected on RolloutWorker_w1 [2024-09-06 08:35:54,232][19113] Decorrelating experience for 64 frames... [2024-09-06 08:35:54,438][19118] Decorrelating experience for 32 frames... [2024-09-06 08:35:54,884][19114] Decorrelating experience for 64 frames... [2024-09-06 08:35:55,752][19112] Decorrelating experience for 96 frames... [2024-09-06 08:35:55,972][01070] Heartbeat connected on RolloutWorker_w0 [2024-09-06 08:35:56,002][19113] Decorrelating experience for 96 frames... [2024-09-06 08:35:56,247][01070] Heartbeat connected on RolloutWorker_w2 [2024-09-06 08:35:56,489][19118] Decorrelating experience for 64 frames... [2024-09-06 08:35:56,773][19114] Decorrelating experience for 96 frames... [2024-09-06 08:35:57,299][01070] Heartbeat connected on RolloutWorker_w4 [2024-09-06 08:35:57,303][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 71.2. Samples: 712. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 08:35:57,307][01070] Avg episode reward: [(0, '5.702')] [2024-09-06 08:35:57,398][19116] Decorrelating experience for 96 frames... [2024-09-06 08:35:57,706][01070] Heartbeat connected on RolloutWorker_w5 [2024-09-06 08:35:57,841][19117] Decorrelating experience for 96 frames... 
[2024-09-06 08:35:58,244][01070] Heartbeat connected on RolloutWorker_w7 [2024-09-06 08:36:00,233][19093] Signal inference workers to stop experience collection... [2024-09-06 08:36:00,248][19110] InferenceWorker_p0-w0: stopping experience collection [2024-09-06 08:36:00,307][19118] Decorrelating experience for 96 frames... [2024-09-06 08:36:00,409][01070] Heartbeat connected on RolloutWorker_w6 [2024-09-06 08:36:00,811][19115] Decorrelating experience for 96 frames... [2024-09-06 08:36:00,904][01070] Heartbeat connected on RolloutWorker_w3 [2024-09-06 08:36:02,107][19093] Signal inference workers to resume experience collection... [2024-09-06 08:36:02,108][19110] InferenceWorker_p0-w0: resuming experience collection [2024-09-06 08:36:02,300][01070] Fps is (10 sec: 409.6, 60 sec: 273.0, 300 sec: 273.0). Total num frames: 4009984. Throughput: 0: 147.9. Samples: 2218. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-09-06 08:36:02,302][01070] Avg episode reward: [(0, '4.701')] [2024-09-06 08:36:07,302][01070] Fps is (10 sec: 1638.4, 60 sec: 819.0, 300 sec: 819.0). Total num frames: 4022272. Throughput: 0: 228.8. Samples: 4576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-09-06 08:36:07,305][01070] Avg episode reward: [(0, '6.521')] [2024-09-06 08:36:12,073][19110] Updated weights for policy 0, policy_version 988 (0.0020) [2024-09-06 08:36:12,300][01070] Fps is (10 sec: 3686.1, 60 sec: 1638.3, 300 sec: 1638.3). Total num frames: 4046848. Throughput: 0: 391.6. Samples: 9790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:36:12,306][01070] Avg episode reward: [(0, '11.236')] [2024-09-06 08:36:17,299][01070] Fps is (10 sec: 4507.3, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 4067328. Throughput: 0: 445.7. Samples: 13370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:36:17,305][01070] Avg episode reward: [(0, '12.296')] [2024-09-06 08:36:22,299][01070] Fps is (10 sec: 3687.1, 60 sec: 2223.5, 300 sec: 2223.5). 
Total num frames: 4083712. Throughput: 0: 556.9. Samples: 19492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:36:22,301][01070] Avg episode reward: [(0, '14.420')] [2024-09-06 08:36:22,439][19110] Updated weights for policy 0, policy_version 998 (0.0030) [2024-09-06 08:36:27,299][01070] Fps is (10 sec: 3686.3, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 4104192. Throughput: 0: 602.8. Samples: 24114. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-09-06 08:36:27,301][01070] Avg episode reward: [(0, '16.055')] [2024-09-06 08:36:32,188][19110] Updated weights for policy 0, policy_version 1008 (0.0021) [2024-09-06 08:36:32,299][01070] Fps is (10 sec: 4505.6, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 4128768. Throughput: 0: 615.9. Samples: 27716. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-09-06 08:36:32,305][01070] Avg episode reward: [(0, '18.102')] [2024-09-06 08:36:32,309][19093] Saving new best policy, reward=18.102! [2024-09-06 08:36:37,300][01070] Fps is (10 sec: 4505.2, 60 sec: 2867.1, 300 sec: 2867.1). Total num frames: 4149248. Throughput: 0: 773.5. Samples: 34808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:36:37,305][01070] Avg episode reward: [(0, '19.063')] [2024-09-06 08:36:37,312][19093] Saving new best policy, reward=19.063! [2024-09-06 08:36:42,299][01070] Fps is (10 sec: 3276.7, 60 sec: 2829.9, 300 sec: 2829.9). Total num frames: 4161536. Throughput: 0: 852.4. Samples: 39066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:36:42,303][01070] Avg episode reward: [(0, '20.446')] [2024-09-06 08:36:42,306][19093] Saving new best policy, reward=20.446! [2024-09-06 08:36:43,965][19110] Updated weights for policy 0, policy_version 1018 (0.0034) [2024-09-06 08:36:47,299][01070] Fps is (10 sec: 3277.1, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 4182016. Throughput: 0: 879.8. Samples: 41808. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:36:47,305][01070] Avg episode reward: [(0, '21.840')] [2024-09-06 08:36:47,316][19093] Saving new best policy, reward=21.840! [2024-09-06 08:36:52,299][01070] Fps is (10 sec: 4505.7, 60 sec: 3345.1, 300 sec: 3087.8). Total num frames: 4206592. Throughput: 0: 982.8. Samples: 48798. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-09-06 08:36:52,304][01070] Avg episode reward: [(0, '21.047')] [2024-09-06 08:36:52,709][19110] Updated weights for policy 0, policy_version 1028 (0.0017) [2024-09-06 08:36:57,304][01070] Fps is (10 sec: 4094.0, 60 sec: 3618.1, 300 sec: 3101.0). Total num frames: 4222976. Throughput: 0: 987.4. Samples: 54226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:36:57,306][01070] Avg episode reward: [(0, '20.552')] [2024-09-06 08:37:02,299][01070] Fps is (10 sec: 2457.6, 60 sec: 3686.5, 300 sec: 3003.7). Total num frames: 4231168. Throughput: 0: 944.6. Samples: 55878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:37:02,301][01070] Avg episode reward: [(0, '21.311')] [2024-09-06 08:37:07,058][19110] Updated weights for policy 0, policy_version 1038 (0.0038) [2024-09-06 08:37:07,299][01070] Fps is (10 sec: 2868.7, 60 sec: 3823.2, 300 sec: 3072.0). Total num frames: 4251648. Throughput: 0: 897.5. Samples: 59880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:37:07,301][01070] Avg episode reward: [(0, '20.332')] [2024-09-06 08:37:12,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3132.2). Total num frames: 4272128. Throughput: 0: 950.4. Samples: 66884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:37:12,301][01070] Avg episode reward: [(0, '19.661')] [2024-09-06 08:37:17,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3140.3). Total num frames: 4288512. Throughput: 0: 928.7. Samples: 69506. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:37:17,302][01070] Avg episode reward: [(0, '20.777')] [2024-09-06 08:37:18,168][19110] Updated weights for policy 0, policy_version 1048 (0.0027) [2024-09-06 08:37:22,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3190.6). Total num frames: 4308992. Throughput: 0: 873.3. Samples: 74104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:37:22,304][01070] Avg episode reward: [(0, '21.018')] [2024-09-06 08:37:27,299][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 4333568. Throughput: 0: 938.8. Samples: 81312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:27,300][19110] Updated weights for policy 0, policy_version 1058 (0.0031) [2024-09-06 08:37:27,301][01070] Avg episode reward: [(0, '21.381')] [2024-09-06 08:37:27,314][19093] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001058_4333568.pth... [2024-09-06 08:37:27,438][19093] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000975_3993600.pth [2024-09-06 08:37:32,300][01070] Fps is (10 sec: 4095.5, 60 sec: 3686.3, 300 sec: 3276.8). Total num frames: 4349952. Throughput: 0: 954.6. Samples: 84768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:37:32,306][01070] Avg episode reward: [(0, '20.498')] [2024-09-06 08:37:37,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3276.8). Total num frames: 4366336. Throughput: 0: 899.8. Samples: 89288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:37:37,302][01070] Avg episode reward: [(0, '20.356')] [2024-09-06 08:37:38,951][19110] Updated weights for policy 0, policy_version 1068 (0.0030) [2024-09-06 08:37:42,302][01070] Fps is (10 sec: 3685.8, 60 sec: 3754.5, 300 sec: 3312.3). Total num frames: 4386816. Throughput: 0: 918.0. Samples: 95532. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:42,306][01070] Avg episode reward: [(0, '18.966')] [2024-09-06 08:37:47,299][01070] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3379.2). Total num frames: 4411392. Throughput: 0: 960.9. Samples: 99118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:47,303][01070] Avg episode reward: [(0, '19.345')] [2024-09-06 08:37:48,240][19110] Updated weights for policy 0, policy_version 1078 (0.0013) [2024-09-06 08:37:52,299][01070] Fps is (10 sec: 3687.5, 60 sec: 3618.1, 300 sec: 3342.3). Total num frames: 4423680. Throughput: 0: 989.1. Samples: 104388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:52,301][01070] Avg episode reward: [(0, '19.443')] [2024-09-06 08:37:57,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3755.0, 300 sec: 3402.8). Total num frames: 4448256. Throughput: 0: 957.7. Samples: 109982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:57,301][01070] Avg episode reward: [(0, '20.099')] [2024-09-06 08:37:59,151][19110] Updated weights for policy 0, policy_version 1088 (0.0024) [2024-09-06 08:38:02,299][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3428.5). Total num frames: 4468736. Throughput: 0: 978.0. Samples: 113514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:02,306][01070] Avg episode reward: [(0, '20.360')] [2024-09-06 08:38:07,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3423.1). Total num frames: 4485120. Throughput: 0: 1019.1. Samples: 119964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:07,302][01070] Avg episode reward: [(0, '20.240')] [2024-09-06 08:38:10,312][19110] Updated weights for policy 0, policy_version 1098 (0.0022) [2024-09-06 08:38:12,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3418.0). Total num frames: 4501504. Throughput: 0: 953.8. Samples: 124232. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:38:12,304][01070] Avg episode reward: [(0, '19.557')] [2024-09-06 08:38:17,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3467.9). Total num frames: 4526080. Throughput: 0: 954.6. Samples: 127726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:38:17,301][01070] Avg episode reward: [(0, '20.239')] [2024-09-06 08:38:19,439][19110] Updated weights for policy 0, policy_version 1108 (0.0041) [2024-09-06 08:38:22,301][01070] Fps is (10 sec: 4504.4, 60 sec: 3959.3, 300 sec: 3488.1). Total num frames: 4546560. Throughput: 0: 1011.7. Samples: 134818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:22,304][01070] Avg episode reward: [(0, '20.506')] [2024-09-06 08:38:27,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 4562944. Throughput: 0: 977.9. Samples: 139534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:27,303][01070] Avg episode reward: [(0, '20.585')] [2024-09-06 08:38:31,030][19110] Updated weights for policy 0, policy_version 1118 (0.0046) [2024-09-06 08:38:32,299][01070] Fps is (10 sec: 3687.4, 60 sec: 3891.3, 300 sec: 3500.2). Total num frames: 4583424. Throughput: 0: 953.9. Samples: 142044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:38:32,306][01070] Avg episode reward: [(0, '20.308')] [2024-09-06 08:38:37,299][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3541.8). Total num frames: 4608000. Throughput: 0: 993.6. Samples: 149102. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:38:37,304][01070] Avg episode reward: [(0, '21.252')] [2024-09-06 08:38:40,626][19110] Updated weights for policy 0, policy_version 1128 (0.0030) [2024-09-06 08:38:42,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3534.3). Total num frames: 4624384. Throughput: 0: 994.4. Samples: 154730. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:38:42,303][01070] Avg episode reward: [(0, '20.914')] [2024-09-06 08:38:47,299][01070] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3527.1). Total num frames: 4640768. Throughput: 0: 963.1. Samples: 156854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:38:47,307][01070] Avg episode reward: [(0, '20.844')] [2024-09-06 08:38:51,335][19110] Updated weights for policy 0, policy_version 1138 (0.0028) [2024-09-06 08:38:52,299][01070] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3564.6). Total num frames: 4665344. Throughput: 0: 963.3. Samples: 163312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:52,301][01070] Avg episode reward: [(0, '22.493')] [2024-09-06 08:38:52,304][19093] Saving new best policy, reward=22.493! [2024-09-06 08:38:57,299][01070] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3578.6). Total num frames: 4685824. Throughput: 0: 1017.3. Samples: 170010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:38:57,305][01070] Avg episode reward: [(0, '21.806')] [2024-09-06 08:39:02,299][01070] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 4698112. Throughput: 0: 985.0. Samples: 172050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:39:02,300][01070] Avg episode reward: [(0, '22.885')] [2024-09-06 08:39:02,305][19093] Saving new best policy, reward=22.885! [2024-09-06 08:39:02,950][19110] Updated weights for policy 0, policy_version 1148 (0.0035) [2024-09-06 08:39:07,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3563.5). Total num frames: 4718592. Throughput: 0: 948.3. Samples: 177488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:39:07,301][01070] Avg episode reward: [(0, '23.190')] [2024-09-06 08:39:07,318][19093] Saving new best policy, reward=23.190! 
[2024-09-06 08:39:11,864][19110] Updated weights for policy 0, policy_version 1158 (0.0024) [2024-09-06 08:39:12,299][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3596.5). Total num frames: 4743168. Throughput: 0: 994.3. Samples: 184276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:39:12,303][01070] Avg episode reward: [(0, '23.069')] [2024-09-06 08:39:17,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3588.9). Total num frames: 4759552. Throughput: 0: 1001.5. Samples: 187112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:39:17,301][01070] Avg episode reward: [(0, '22.125')] [2024-09-06 08:39:22,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3581.6). Total num frames: 4775936. Throughput: 0: 941.3. Samples: 191462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 08:39:22,303][01070] Avg episode reward: [(0, '21.027')] [2024-09-06 08:39:23,456][19110] Updated weights for policy 0, policy_version 1168 (0.0022) [2024-09-06 08:39:27,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3611.9). Total num frames: 4800512. Throughput: 0: 973.6. Samples: 198540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:39:27,301][01070] Avg episode reward: [(0, '19.180')] [2024-09-06 08:39:27,314][19093] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001172_4800512.pth... [2024-09-06 08:39:27,463][19093] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2024-09-06 08:39:32,300][01070] Fps is (10 sec: 4504.9, 60 sec: 3959.4, 300 sec: 3622.7). Total num frames: 4820992. Throughput: 0: 1000.5. Samples: 201878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:39:32,303][01070] Avg episode reward: [(0, '20.069')] [2024-09-06 08:39:33,797][19110] Updated weights for policy 0, policy_version 1178 (0.0024) [2024-09-06 08:39:37,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3597.4). 
Total num frames: 4833280. Throughput: 0: 959.6. Samples: 206492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:39:37,301][01070] Avg episode reward: [(0, '19.170')] [2024-09-06 08:39:42,299][01070] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3625.4). Total num frames: 4857856. Throughput: 0: 947.1. Samples: 212628. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 08:39:42,306][01070] Avg episode reward: [(0, '20.382')] [2024-09-06 08:39:43,732][19110] Updated weights for policy 0, policy_version 1188 (0.0024) [2024-09-06 08:39:47,299][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3635.2). Total num frames: 4878336. Throughput: 0: 980.8. Samples: 216188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:39:47,313][01070] Avg episode reward: [(0, '21.641')] [2024-09-06 08:39:52,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3627.9). Total num frames: 4894720. Throughput: 0: 984.4. Samples: 221786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:39:52,301][01070] Avg episode reward: [(0, '21.803')] [2024-09-06 08:39:55,248][19110] Updated weights for policy 0, policy_version 1198 (0.0031) [2024-09-06 08:39:57,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3637.2). Total num frames: 4915200. Throughput: 0: 951.6. Samples: 227098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:39:57,304][01070] Avg episode reward: [(0, '22.439')] [2024-09-06 08:40:02,299][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3662.3). Total num frames: 4939776. Throughput: 0: 968.4. Samples: 230692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:40:02,305][01070] Avg episode reward: [(0, '23.387')] [2024-09-06 08:40:02,307][19093] Saving new best policy, reward=23.387! [2024-09-06 08:40:04,048][19110] Updated weights for policy 0, policy_version 1208 (0.0019) [2024-09-06 08:40:07,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3654.9). 
Total num frames: 4956160. Throughput: 0: 1015.8. Samples: 237174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:40:07,304][01070] Avg episode reward: [(0, '25.478')] [2024-09-06 08:40:07,312][19093] Saving new best policy, reward=25.478! [2024-09-06 08:40:12,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3647.8). Total num frames: 4972544. Throughput: 0: 950.4. Samples: 241310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:40:12,301][01070] Avg episode reward: [(0, '26.573')] [2024-09-06 08:40:12,304][19093] Saving new best policy, reward=26.573! [2024-09-06 08:40:15,918][19110] Updated weights for policy 0, policy_version 1218 (0.0034) [2024-09-06 08:40:17,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3656.1). Total num frames: 4993024. Throughput: 0: 947.8. Samples: 244528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:40:17,301][01070] Avg episode reward: [(0, '26.655')] [2024-09-06 08:40:17,310][19093] Saving new best policy, reward=26.655! [2024-09-06 08:40:19,413][19093] Stopping Batcher_0... [2024-09-06 08:40:19,413][01070] Component Batcher_0 stopped! [2024-09-06 08:40:19,415][19093] Loop batcher_evt_loop terminating... [2024-09-06 08:40:19,420][19093] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-09-06 08:40:19,470][19110] Weights refcount: 2 0 [2024-09-06 08:40:19,478][01070] Component InferenceWorker_p0-w0 stopped! [2024-09-06 08:40:19,481][19110] Stopping InferenceWorker_p0-w0... [2024-09-06 08:40:19,481][19110] Loop inference_proc0-0_evt_loop terminating... [2024-09-06 08:40:19,543][19093] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001058_4333568.pth [2024-09-06 08:40:19,557][19093] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-09-06 08:40:19,761][01070] Component LearnerWorker_p0 stopped! 
[2024-09-06 08:40:19,761][19093] Stopping LearnerWorker_p0... [2024-09-06 08:40:19,766][19093] Loop learner_proc0_evt_loop terminating... [2024-09-06 08:40:19,774][01070] Component RolloutWorker_w3 stopped! [2024-09-06 08:40:19,777][19115] Stopping RolloutWorker_w3... [2024-09-06 08:40:19,779][19115] Loop rollout_proc3_evt_loop terminating... [2024-09-06 08:40:19,804][01070] Component RolloutWorker_w1 stopped! [2024-09-06 08:40:19,806][19111] Stopping RolloutWorker_w1... [2024-09-06 08:40:19,813][19111] Loop rollout_proc1_evt_loop terminating... [2024-09-06 08:40:19,874][01070] Component RolloutWorker_w7 stopped! [2024-09-06 08:40:19,880][19117] Stopping RolloutWorker_w7... [2024-09-06 08:40:19,888][19117] Loop rollout_proc7_evt_loop terminating... [2024-09-06 08:40:19,902][01070] Component RolloutWorker_w5 stopped! [2024-09-06 08:40:19,904][19116] Stopping RolloutWorker_w5... [2024-09-06 08:40:19,909][19116] Loop rollout_proc5_evt_loop terminating... [2024-09-06 08:40:20,003][19118] Stopping RolloutWorker_w6... [2024-09-06 08:40:20,003][19118] Loop rollout_proc6_evt_loop terminating... [2024-09-06 08:40:20,003][01070] Component RolloutWorker_w6 stopped! [2024-09-06 08:40:20,035][19113] Stopping RolloutWorker_w2... [2024-09-06 08:40:20,037][19113] Loop rollout_proc2_evt_loop terminating... [2024-09-06 08:40:20,035][01070] Component RolloutWorker_w2 stopped! [2024-09-06 08:40:20,058][19114] Stopping RolloutWorker_w4... [2024-09-06 08:40:20,057][01070] Component RolloutWorker_w4 stopped! [2024-09-06 08:40:20,062][19114] Loop rollout_proc4_evt_loop terminating... [2024-09-06 08:40:20,066][19112] Stopping RolloutWorker_w0... [2024-09-06 08:40:20,066][01070] Component RolloutWorker_w0 stopped! [2024-09-06 08:40:20,068][01070] Waiting for process learner_proc0 to stop... [2024-09-06 08:40:20,078][19112] Loop rollout_proc0_evt_loop terminating... [2024-09-06 08:40:21,230][01070] Waiting for process inference_proc0-0 to join... 
[2024-09-06 08:40:21,233][01070] Waiting for process rollout_proc0 to join... [2024-09-06 08:40:24,143][01070] Waiting for process rollout_proc1 to join... [2024-09-06 08:40:24,148][01070] Waiting for process rollout_proc2 to join... [2024-09-06 08:40:24,150][01070] Waiting for process rollout_proc3 to join... [2024-09-06 08:40:24,153][01070] Waiting for process rollout_proc4 to join... [2024-09-06 08:40:24,159][01070] Waiting for process rollout_proc5 to join... [2024-09-06 08:40:24,162][01070] Waiting for process rollout_proc6 to join... [2024-09-06 08:40:24,165][01070] Waiting for process rollout_proc7 to join... [2024-09-06 08:40:24,168][01070] Batcher 0 profile tree view: batching: 7.2186, releasing_batches: 0.0064 [2024-09-06 08:40:24,170][01070] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0054 wait_policy_total: 102.6532 update_model: 2.2548 weight_update: 0.0030 one_step: 0.0026 handle_policy_step: 154.4121 deserialize: 3.6770, stack: 0.8213, obs_to_device_normalize: 31.1067, forward: 82.7946, send_messages: 7.5294 prepare_outputs: 21.0675 to_cpu: 12.5076 [2024-09-06 08:40:24,171][01070] Learner 0 profile tree view: misc: 0.0012, prepare_batch: 5.2823 train: 21.8730 epoch_init: 0.0014, minibatch_init: 0.0016, losses_postprocess: 0.1603, kl_divergence: 0.2048, after_optimizer: 0.9110 calculate_losses: 8.7013 losses_init: 0.0013, forward_head: 0.6601, bptt_initial: 6.1101, tail: 0.3325, advantages_returns: 0.0894, losses: 0.9715 bptt: 0.4678 bptt_forward_core: 0.4488 update: 11.7699 clip: 0.2453 [2024-09-06 08:40:24,173][01070] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0619, enqueue_policy_requests: 23.3356, env_step: 206.1955, overhead: 3.3833, complete_rollouts: 1.8563 save_policy_outputs: 5.2619 split_output_tensors: 2.1953 [2024-09-06 08:40:24,175][01070] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0745, enqueue_policy_requests: 24.4494, env_step: 203.3563, overhead: 3.1667, complete_rollouts: 1.6788 
save_policy_outputs: 4.9515 split_output_tensors: 2.0046 [2024-09-06 08:40:24,177][01070] Loop Runner_EvtLoop terminating... [2024-09-06 08:40:24,179][01070] Runner profile tree view: main_loop: 294.6944 [2024-09-06 08:40:24,180][01070] Collected {0: 5005312}, FPS: 3391.4 [2024-09-06 08:56:44,318][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 08:56:44,320][01070] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-06 08:56:44,322][01070] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-06 08:56:44,324][01070] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-06 08:56:44,326][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 08:56:44,327][01070] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-06 08:56:44,328][01070] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 08:56:44,329][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-06 08:56:44,331][01070] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-09-06 08:56:44,332][01070] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-09-06 08:56:44,333][01070] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-06 08:56:44,334][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-06 08:56:44,339][01070] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-06 08:56:44,340][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-09-06 08:56:44,341][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 08:56:44,368][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 08:56:44,370][01070] RunningMeanStd input shape: (1,) [2024-09-06 08:56:44,390][01070] ConvEncoder: input_channels=3 [2024-09-06 08:56:44,435][01070] Conv encoder output size: 512 [2024-09-06 08:56:44,436][01070] Policy head output size: 512 [2024-09-06 08:56:44,457][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-09-06 08:56:44,883][01070] Num frames 100... [2024-09-06 08:56:45,006][01070] Num frames 200... [2024-09-06 08:56:45,128][01070] Num frames 300... [2024-09-06 08:56:45,247][01070] Num frames 400... [2024-09-06 08:56:45,367][01070] Num frames 500... [2024-09-06 08:56:45,497][01070] Num frames 600... [2024-09-06 08:56:45,618][01070] Num frames 700... [2024-09-06 08:56:45,741][01070] Num frames 800... [2024-09-06 08:56:45,860][01070] Num frames 900... [2024-09-06 08:56:45,978][01070] Num frames 1000... [2024-09-06 08:56:46,102][01070] Num frames 1100... [2024-09-06 08:56:46,224][01070] Num frames 1200... [2024-09-06 08:56:46,348][01070] Num frames 1300... [2024-09-06 08:56:46,479][01070] Num frames 1400... [2024-09-06 08:56:46,606][01070] Num frames 1500... [2024-09-06 08:56:46,726][01070] Num frames 1600... [2024-09-06 08:56:46,845][01070] Num frames 1700... [2024-09-06 08:56:46,967][01070] Num frames 1800... [2024-09-06 08:56:47,110][01070] Avg episode rewards: #0: 46.719, true rewards: #0: 18.720 [2024-09-06 08:56:47,113][01070] Avg episode reward: 46.719, avg true_objective: 18.720 [2024-09-06 08:56:47,150][01070] Num frames 1900... [2024-09-06 08:56:47,268][01070] Num frames 2000... [2024-09-06 08:56:47,388][01070] Num frames 2100... [2024-09-06 08:56:47,523][01070] Num frames 2200... [2024-09-06 08:56:47,671][01070] Num frames 2300... 
[2024-09-06 08:56:47,751][01070] Avg episode rewards: #0: 27.100, true rewards: #0: 11.600 [2024-09-06 08:56:47,752][01070] Avg episode reward: 27.100, avg true_objective: 11.600 [2024-09-06 08:56:47,854][01070] Num frames 2400... [2024-09-06 08:56:47,975][01070] Num frames 2500... [2024-09-06 08:56:48,094][01070] Num frames 2600... [2024-09-06 08:56:48,217][01070] Num frames 2700... [2024-09-06 08:56:48,342][01070] Num frames 2800... [2024-09-06 08:56:48,467][01070] Num frames 2900... [2024-09-06 08:56:48,602][01070] Num frames 3000... [2024-09-06 08:56:48,749][01070] Num frames 3100... [2024-09-06 08:56:48,919][01070] Num frames 3200... [2024-09-06 08:56:49,093][01070] Num frames 3300... [2024-09-06 08:56:49,273][01070] Avg episode rewards: #0: 26.253, true rewards: #0: 11.253 [2024-09-06 08:56:49,277][01070] Avg episode reward: 26.253, avg true_objective: 11.253 [2024-09-06 08:56:49,319][01070] Num frames 3400... [2024-09-06 08:56:49,496][01070] Num frames 3500... [2024-09-06 08:56:49,661][01070] Num frames 3600... [2024-09-06 08:56:49,822][01070] Num frames 3700... [2024-09-06 08:56:50,036][01070] Avg episode rewards: #0: 21.230, true rewards: #0: 9.480 [2024-09-06 08:56:50,038][01070] Avg episode reward: 21.230, avg true_objective: 9.480 [2024-09-06 08:56:50,057][01070] Num frames 3800... [2024-09-06 08:56:50,224][01070] Num frames 3900... [2024-09-06 08:56:50,397][01070] Num frames 4000... [2024-09-06 08:56:50,585][01070] Num frames 4100... [2024-09-06 08:56:50,755][01070] Num frames 4200... [2024-09-06 08:56:50,944][01070] Num frames 4300... [2024-09-06 08:56:51,125][01070] Num frames 4400... [2024-09-06 08:56:51,259][01070] Num frames 4500... [2024-09-06 08:56:51,382][01070] Num frames 4600... [2024-09-06 08:56:51,504][01070] Num frames 4700... [2024-09-06 08:56:51,623][01070] Num frames 4800... [2024-09-06 08:56:51,752][01070] Num frames 4900... [2024-09-06 08:56:51,870][01070] Num frames 5000... [2024-09-06 08:56:51,988][01070] Num frames 5100... 
[2024-09-06 08:56:52,111][01070] Num frames 5200... [2024-09-06 08:56:52,228][01070] Num frames 5300... [2024-09-06 08:56:52,393][01070] Avg episode rewards: #0: 24.784, true rewards: #0: 10.784 [2024-09-06 08:56:52,394][01070] Avg episode reward: 24.784, avg true_objective: 10.784 [2024-09-06 08:56:52,408][01070] Num frames 5400... [2024-09-06 08:56:52,534][01070] Num frames 5500... [2024-09-06 08:56:52,657][01070] Num frames 5600... [2024-09-06 08:56:52,788][01070] Num frames 5700... [2024-09-06 08:56:52,907][01070] Num frames 5800... [2024-09-06 08:56:53,028][01070] Num frames 5900... [2024-09-06 08:56:53,148][01070] Num frames 6000... [2024-09-06 08:56:53,268][01070] Num frames 6100... [2024-09-06 08:56:53,388][01070] Num frames 6200... [2024-09-06 08:56:53,515][01070] Num frames 6300... [2024-09-06 08:56:53,638][01070] Num frames 6400... [2024-09-06 08:56:53,779][01070] Num frames 6500... [2024-09-06 08:56:53,901][01070] Num frames 6600... [2024-09-06 08:56:54,025][01070] Num frames 6700... [2024-09-06 08:56:54,146][01070] Num frames 6800... [2024-09-06 08:56:54,268][01070] Num frames 6900... [2024-09-06 08:56:54,390][01070] Num frames 7000... [2024-09-06 08:56:54,560][01070] Avg episode rewards: #0: 28.313, true rewards: #0: 11.813 [2024-09-06 08:56:54,562][01070] Avg episode reward: 28.313, avg true_objective: 11.813 [2024-09-06 08:56:54,581][01070] Num frames 7100... [2024-09-06 08:56:54,699][01070] Num frames 7200... [2024-09-06 08:56:54,827][01070] Num frames 7300... [2024-09-06 08:56:54,945][01070] Num frames 7400... [2024-09-06 08:56:55,064][01070] Num frames 7500... [2024-09-06 08:56:55,188][01070] Num frames 7600... [2024-09-06 08:56:55,281][01070] Avg episode rewards: #0: 25.760, true rewards: #0: 10.903 [2024-09-06 08:56:55,283][01070] Avg episode reward: 25.760, avg true_objective: 10.903 [2024-09-06 08:56:55,366][01070] Num frames 7700... [2024-09-06 08:56:55,489][01070] Num frames 7800... [2024-09-06 08:56:55,612][01070] Num frames 7900... 
[2024-09-06 08:56:55,732][01070] Num frames 8000... [2024-09-06 08:56:55,892][01070] Avg episode rewards: #0: 23.600, true rewards: #0: 10.100 [2024-09-06 08:56:55,894][01070] Avg episode reward: 23.600, avg true_objective: 10.100 [2024-09-06 08:56:55,921][01070] Num frames 8100... [2024-09-06 08:56:56,039][01070] Num frames 8200... [2024-09-06 08:56:56,159][01070] Num frames 8300... [2024-09-06 08:56:56,280][01070] Num frames 8400... [2024-09-06 08:56:56,402][01070] Num frames 8500... [2024-09-06 08:56:56,532][01070] Num frames 8600... [2024-09-06 08:56:56,656][01070] Num frames 8700... [2024-09-06 08:56:56,782][01070] Num frames 8800... [2024-09-06 08:56:56,911][01070] Num frames 8900... [2024-09-06 08:56:57,036][01070] Num frames 9000... [2024-09-06 08:56:57,162][01070] Num frames 9100... [2024-09-06 08:56:57,286][01070] Num frames 9200... [2024-09-06 08:56:57,409][01070] Num frames 9300... [2024-09-06 08:56:57,498][01070] Avg episode rewards: #0: 24.253, true rewards: #0: 10.364 [2024-09-06 08:56:57,500][01070] Avg episode reward: 24.253, avg true_objective: 10.364 [2024-09-06 08:56:57,588][01070] Num frames 9400... [2024-09-06 08:56:57,710][01070] Num frames 9500... [2024-09-06 08:56:57,848][01070] Num frames 9600... [2024-09-06 08:56:57,973][01070] Num frames 9700... [2024-09-06 08:56:58,097][01070] Num frames 9800... [2024-09-06 08:56:58,209][01070] Avg episode rewards: #0: 22.746, true rewards: #0: 9.846 [2024-09-06 08:56:58,211][01070] Avg episode reward: 22.746, avg true_objective: 9.846 [2024-09-06 08:57:59,802][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-06 09:00:55,778][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 09:00:55,780][01070] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-06 09:00:55,782][01070] Adding new argument 'no_render'=True that is not in the saved config file! 
[2024-09-06 09:00:55,784][01070] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-06 09:00:55,786][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 09:00:55,789][01070] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-06 09:00:55,791][01070] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-06 09:00:55,792][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-06 09:00:55,793][01070] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-06 09:00:55,794][01070] Adding new argument 'hf_repository'='Re-Re/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-06 09:00:55,795][01070] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-06 09:00:55,796][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-06 09:00:55,797][01070] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-06 09:00:55,798][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-06 09:00:55,799][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 09:00:55,830][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 09:00:55,832][01070] RunningMeanStd input shape: (1,) [2024-09-06 09:00:55,846][01070] ConvEncoder: input_channels=3 [2024-09-06 09:00:55,882][01070] Conv encoder output size: 512 [2024-09-06 09:00:55,884][01070] Policy head output size: 512 [2024-09-06 09:00:55,904][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-09-06 09:00:56,320][01070] Num frames 100... [2024-09-06 09:00:56,441][01070] Num frames 200... 
[2024-09-06 09:00:56,597][01070] Num frames 300... [2024-09-06 09:00:56,715][01070] Num frames 400... [2024-09-06 09:00:56,840][01070] Num frames 500... [2024-09-06 09:00:56,960][01070] Num frames 600... [2024-09-06 09:00:57,080][01070] Num frames 700... [2024-09-06 09:00:57,182][01070] Avg episode rewards: #0: 16.380, true rewards: #0: 7.380 [2024-09-06 09:00:57,184][01070] Avg episode reward: 16.380, avg true_objective: 7.380 [2024-09-06 09:00:57,264][01070] Num frames 800... [2024-09-06 09:00:57,399][01070] Num frames 900... [2024-09-06 09:00:57,529][01070] Num frames 1000... [2024-09-06 09:00:57,650][01070] Num frames 1100... [2024-09-06 09:00:57,775][01070] Num frames 1200... [2024-09-06 09:00:57,898][01070] Num frames 1300... [2024-09-06 09:00:58,016][01070] Num frames 1400... [2024-09-06 09:00:58,136][01070] Num frames 1500... [2024-09-06 09:00:58,255][01070] Num frames 1600... [2024-09-06 09:00:58,314][01070] Avg episode rewards: #0: 18.010, true rewards: #0: 8.010 [2024-09-06 09:00:58,316][01070] Avg episode reward: 18.010, avg true_objective: 8.010 [2024-09-06 09:00:58,433][01070] Num frames 1700... [2024-09-06 09:00:58,562][01070] Num frames 1800... [2024-09-06 09:00:58,686][01070] Num frames 1900... [2024-09-06 09:00:58,804][01070] Num frames 2000... [2024-09-06 09:00:58,927][01070] Num frames 2100... [2024-09-06 09:00:59,047][01070] Num frames 2200... [2024-09-06 09:00:59,165][01070] Num frames 2300... [2024-09-06 09:00:59,285][01070] Num frames 2400... [2024-09-06 09:00:59,413][01070] Num frames 2500... [2024-09-06 09:00:59,544][01070] Num frames 2600... [2024-09-06 09:00:59,669][01070] Num frames 2700... [2024-09-06 09:00:59,789][01070] Num frames 2800... [2024-09-06 09:00:59,937][01070] Num frames 2900... [2024-09-06 09:01:00,112][01070] Num frames 3000... [2024-09-06 09:01:00,278][01070] Num frames 3100... [2024-09-06 09:01:00,453][01070] Num frames 3200... [2024-09-06 09:01:00,623][01070] Num frames 3300... 
[2024-09-06 09:01:00,786][01070] Num frames 3400... [2024-09-06 09:01:00,952][01070] Num frames 3500... [2024-09-06 09:01:01,119][01070] Num frames 3600... [2024-09-06 09:01:01,294][01070] Num frames 3700... [2024-09-06 09:01:01,356][01070] Avg episode rewards: #0: 30.673, true rewards: #0: 12.340 [2024-09-06 09:01:01,357][01070] Avg episode reward: 30.673, avg true_objective: 12.340 [2024-09-06 09:01:01,535][01070] Num frames 3800... [2024-09-06 09:01:01,711][01070] Num frames 3900... [2024-09-06 09:01:01,881][01070] Num frames 4000... [2024-09-06 09:01:02,049][01070] Num frames 4100... [2024-09-06 09:01:02,222][01070] Num frames 4200... [2024-09-06 09:01:02,396][01070] Num frames 4300... [2024-09-06 09:01:02,534][01070] Num frames 4400... [2024-09-06 09:01:02,655][01070] Num frames 4500... [2024-09-06 09:01:02,776][01070] Num frames 4600... [2024-09-06 09:01:02,898][01070] Num frames 4700... [2024-09-06 09:01:03,020][01070] Num frames 4800... [2024-09-06 09:01:03,140][01070] Num frames 4900... [2024-09-06 09:01:03,261][01070] Num frames 5000... [2024-09-06 09:01:03,383][01070] Num frames 5100... [2024-09-06 09:01:03,519][01070] Num frames 5200... [2024-09-06 09:01:03,641][01070] Num frames 5300... [2024-09-06 09:01:03,817][01070] Avg episode rewards: #0: 34.225, true rewards: #0: 13.475 [2024-09-06 09:01:03,818][01070] Avg episode reward: 34.225, avg true_objective: 13.475 [2024-09-06 09:01:03,835][01070] Num frames 5400... [2024-09-06 09:01:03,956][01070] Num frames 5500... [2024-09-06 09:01:04,074][01070] Num frames 5600... [2024-09-06 09:01:04,194][01070] Num frames 5700... [2024-09-06 09:01:04,314][01070] Num frames 5800... [2024-09-06 09:01:04,436][01070] Num frames 5900... [2024-09-06 09:01:04,574][01070] Num frames 6000... [2024-09-06 09:01:04,693][01070] Num frames 6100... [2024-09-06 09:01:04,810][01070] Num frames 6200... [2024-09-06 09:01:04,934][01070] Num frames 6300... [2024-09-06 09:01:05,057][01070] Num frames 6400... 
[2024-09-06 09:01:05,177][01070] Num frames 6500... [2024-09-06 09:01:05,296][01070] Num frames 6600... [2024-09-06 09:01:05,424][01070] Num frames 6700... [2024-09-06 09:01:05,562][01070] Num frames 6800... [2024-09-06 09:01:05,698][01070] Num frames 6900... [2024-09-06 09:01:05,816][01070] Num frames 7000... [2024-09-06 09:01:05,936][01070] Num frames 7100... [2024-09-06 09:01:06,057][01070] Num frames 7200... [2024-09-06 09:01:06,179][01070] Num frames 7300... [2024-09-06 09:01:06,303][01070] Num frames 7400... [2024-09-06 09:01:06,469][01070] Avg episode rewards: #0: 38.779, true rewards: #0: 14.980 [2024-09-06 09:01:06,474][01070] Avg episode reward: 38.779, avg true_objective: 14.980 [2024-09-06 09:01:06,493][01070] Num frames 7500... [2024-09-06 09:01:06,629][01070] Num frames 7600... [2024-09-06 09:01:06,753][01070] Num frames 7700... [2024-09-06 09:01:06,875][01070] Num frames 7800... [2024-09-06 09:01:06,995][01070] Num frames 7900... [2024-09-06 09:01:07,118][01070] Num frames 8000... [2024-09-06 09:01:07,240][01070] Num frames 8100... [2024-09-06 09:01:07,359][01070] Num frames 8200... [2024-09-06 09:01:07,476][01070] Avg episode rewards: #0: 34.920, true rewards: #0: 13.753 [2024-09-06 09:01:07,478][01070] Avg episode reward: 34.920, avg true_objective: 13.753 [2024-09-06 09:01:07,541][01070] Num frames 8300... [2024-09-06 09:01:07,670][01070] Num frames 8400... [2024-09-06 09:01:07,789][01070] Num frames 8500... [2024-09-06 09:01:07,912][01070] Num frames 8600... [2024-09-06 09:01:08,030][01070] Num frames 8700... [2024-09-06 09:01:08,150][01070] Num frames 8800... [2024-09-06 09:01:08,275][01070] Num frames 8900... [2024-09-06 09:01:08,400][01070] Avg episode rewards: #0: 31.794, true rewards: #0: 12.794 [2024-09-06 09:01:08,402][01070] Avg episode reward: 31.794, avg true_objective: 12.794 [2024-09-06 09:01:08,457][01070] Num frames 9000... [2024-09-06 09:01:08,587][01070] Num frames 9100... [2024-09-06 09:01:08,723][01070] Num frames 9200... 
[2024-09-06 09:01:08,851][01070] Num frames 9300... [2024-09-06 09:01:08,979][01070] Num frames 9400... [2024-09-06 09:01:09,106][01070] Num frames 9500... [2024-09-06 09:01:09,230][01070] Num frames 9600... [2024-09-06 09:01:09,356][01070] Num frames 9700... [2024-09-06 09:01:09,487][01070] Num frames 9800... [2024-09-06 09:01:09,614][01070] Num frames 9900... [2024-09-06 09:01:09,745][01070] Num frames 10000... [2024-09-06 09:01:09,870][01070] Num frames 10100... [2024-09-06 09:01:09,995][01070] Num frames 10200... [2024-09-06 09:01:10,118][01070] Num frames 10300... [2024-09-06 09:01:10,238][01070] Num frames 10400... [2024-09-06 09:01:10,360][01070] Num frames 10500... [2024-09-06 09:01:10,493][01070] Num frames 10600... [2024-09-06 09:01:10,614][01070] Num frames 10700... [2024-09-06 09:01:10,741][01070] Num frames 10800... [2024-09-06 09:01:10,866][01070] Num frames 10900... [2024-09-06 09:01:10,993][01070] Num frames 11000... [2024-09-06 09:01:11,116][01070] Avg episode rewards: #0: 34.820, true rewards: #0: 13.820 [2024-09-06 09:01:11,118][01070] Avg episode reward: 34.820, avg true_objective: 13.820 [2024-09-06 09:01:11,175][01070] Num frames 11100... [2024-09-06 09:01:11,294][01070] Num frames 11200... [2024-09-06 09:01:11,414][01070] Num frames 11300... [2024-09-06 09:01:11,549][01070] Num frames 11400... [2024-09-06 09:01:11,615][01070] Avg episode rewards: #0: 31.453, true rewards: #0: 12.676 [2024-09-06 09:01:11,617][01070] Avg episode reward: 31.453, avg true_objective: 12.676 [2024-09-06 09:01:11,740][01070] Num frames 11500... [2024-09-06 09:01:11,869][01070] Num frames 11600... [2024-09-06 09:01:11,989][01070] Num frames 11700... [2024-09-06 09:01:12,106][01070] Num frames 11800... 
[2024-09-06 09:01:12,231][01070] Avg episode rewards: #0: 29.156, true rewards: #0: 11.856 [2024-09-06 09:01:12,233][01070] Avg episode reward: 29.156, avg true_objective: 11.856 [2024-09-06 09:02:25,893][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4!