[2024-09-06 07:54:51,174][01070] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-06 07:54:51,178][01070] Rollout worker 0 uses device cpu
[2024-09-06 07:54:51,179][01070] Rollout worker 1 uses device cpu
[2024-09-06 07:54:51,182][01070] Rollout worker 2 uses device cpu
[2024-09-06 07:54:51,183][01070] Rollout worker 3 uses device cpu
[2024-09-06 07:54:51,184][01070] Rollout worker 4 uses device cpu
[2024-09-06 07:54:51,185][01070] Rollout worker 5 uses device cpu
[2024-09-06 07:54:51,186][01070] Rollout worker 6 uses device cpu
[2024-09-06 07:54:51,187][01070] Rollout worker 7 uses device cpu
[2024-09-06 07:54:51,345][01070] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:54:51,348][01070] InferenceWorker_p0-w0: min num requests: 2
[2024-09-06 07:54:51,381][01070] Starting all processes...
[2024-09-06 07:54:51,382][01070] Starting process learner_proc0
[2024-09-06 07:54:52,097][01070] Starting all processes...
[2024-09-06 07:54:52,107][01070] Starting process inference_proc0-0
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc0
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc1
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc2
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc3
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc4
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc5
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc6
[2024-09-06 07:54:52,108][01070] Starting process rollout_proc7
[2024-09-06 07:55:08,667][06068] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:55:08,668][06068] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-06 07:55:08,754][06069] Worker 0 uses CPU cores [0]
[2024-09-06 07:55:08,849][06068] Num visible devices: 1
[2024-09-06 07:55:08,857][06055] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:55:08,862][06055] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-06 07:55:08,865][06072] Worker 3 uses CPU cores [1]
[2024-09-06 07:55:08,930][06073] Worker 4 uses CPU cores [0]
[2024-09-06 07:55:08,955][06055] Num visible devices: 1
[2024-09-06 07:55:08,995][06055] Starting seed is not provided
[2024-09-06 07:55:08,996][06055] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:55:08,997][06055] Initializing actor-critic model on device cuda:0
[2024-09-06 07:55:08,998][06055] RunningMeanStd input shape: (3, 72, 128)
[2024-09-06 07:55:09,001][06055] RunningMeanStd input shape: (1,)
[2024-09-06 07:55:09,045][06075] Worker 6 uses CPU cores [0]
[2024-09-06 07:55:09,054][06074] Worker 5 uses CPU cores [1]
[2024-09-06 07:55:09,100][06055] ConvEncoder: input_channels=3
[2024-09-06 07:55:09,170][06071] Worker 2 uses CPU cores [0]
[2024-09-06 07:55:09,188][06076] Worker 7 uses CPU cores [1]
[2024-09-06 07:55:09,206][06070] Worker 1 uses CPU cores [1]
[2024-09-06 07:55:09,434][06055] Conv encoder output size: 512
[2024-09-06 07:55:09,434][06055] Policy head output size: 512
[2024-09-06 07:55:09,504][06055] Created Actor Critic model with architecture:
[2024-09-06 07:55:09,505][06055] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
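The TorchScript printout above hides the conv kernel sizes and strides inside RecursiveScriptModule wrappers. As a rough orientation, here is a minimal PyTorch sketch of the same topology for the (3, 72, 128) observations and 5 discrete actions shown in the log; the 8/4, 4/2, 3/2 conv parameters are assumed Atari-style defaults rather than values read from the log, and the class name DoomActorCriticSketch is invented for illustration.

```python
import torch
from torch import nn

class DoomActorCriticSketch(nn.Module):
    """Sketch of the printed model, not sample-factory's actual classes."""

    def __init__(self, obs_shape=(3, 72, 128), num_actions=5):
        super().__init__()
        c, h, w = obs_shape
        # conv_head: three Conv2d + ELU pairs (kernel/stride values assumed)
        self.conv_head = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer flattened conv output size from a dummy pass
            n_flat = self.conv_head(torch.zeros(1, c, h, w)).flatten(1).shape[1]
        # mlp_layers: Linear + ELU down to the 512-dim encoder output seen in the log
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                    # ModelCoreRNN
        self.critic_linear = nn.Linear(512, 1)          # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq_len = 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```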
[2024-09-06 07:55:09,884][06055] Using optimizer <class 'torch.optim.adam.Adam'>
[2024-09-06 07:55:10,611][06055] No checkpoints found
[2024-09-06 07:55:10,612][06055] Did not load from checkpoint, starting from scratch!
[2024-09-06 07:55:10,612][06055] Initialized policy 0 weights for model version 0
[2024-09-06 07:55:10,618][06055] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 07:55:10,624][06055] LearnerWorker_p0 finished initialization!
[2024-09-06 07:55:10,711][06068] RunningMeanStd input shape: (3, 72, 128)
[2024-09-06 07:55:10,713][06068] RunningMeanStd input shape: (1,)
[2024-09-06 07:55:10,725][06068] ConvEncoder: input_channels=3
[2024-09-06 07:55:10,825][06068] Conv encoder output size: 512
[2024-09-06 07:55:10,826][06068] Policy head output size: 512
[2024-09-06 07:55:10,875][01070] Inference worker 0-0 is ready!
[2024-09-06 07:55:10,877][01070] All inference workers are ready! Signal rollout workers to start!
[2024-09-06 07:55:11,084][06071] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 07:55:11,088][06069] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 07:55:11,090][06075] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 07:55:11,099][06076] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 07:55:11,093][06073] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 07:55:11,101][06074] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 07:55:11,097][06072] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 07:55:11,109][06070] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 07:55:11,317][01070] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-06 07:55:11,338][01070] Heartbeat connected on Batcher_0
[2024-09-06 07:55:11,342][01070] Heartbeat connected on LearnerWorker_p0
[2024-09-06 07:55:11,375][01070] Heartbeat connected on InferenceWorker_p0-w0
[2024-09-06 07:55:12,668][06071] Decorrelating experience for 0 frames...
[2024-09-06 07:55:12,666][06069] Decorrelating experience for 0 frames...
[2024-09-06 07:55:12,669][06075] Decorrelating experience for 0 frames...
[2024-09-06 07:55:12,943][06076] Decorrelating experience for 0 frames...
[2024-09-06 07:55:12,952][06072] Decorrelating experience for 0 frames...
[2024-09-06 07:55:12,955][06074] Decorrelating experience for 0 frames...
[2024-09-06 07:55:12,961][06070] Decorrelating experience for 0 frames...
[2024-09-06 07:55:13,724][06076] Decorrelating experience for 32 frames...
[2024-09-06 07:55:13,727][06074] Decorrelating experience for 32 frames...
[2024-09-06 07:55:14,072][06071] Decorrelating experience for 32 frames...
[2024-09-06 07:55:14,074][06069] Decorrelating experience for 32 frames...
[2024-09-06 07:55:14,077][06075] Decorrelating experience for 32 frames...
[2024-09-06 07:55:14,540][06073] Decorrelating experience for 0 frames...
[2024-09-06 07:55:14,891][06070] Decorrelating experience for 32 frames...
[2024-09-06 07:55:15,287][06074] Decorrelating experience for 64 frames...
[2024-09-06 07:55:15,383][06072] Decorrelating experience for 32 frames...
[2024-09-06 07:55:15,566][06071] Decorrelating experience for 64 frames...
[2024-09-06 07:55:15,585][06069] Decorrelating experience for 64 frames...
[2024-09-06 07:55:16,059][06075] Decorrelating experience for 64 frames...
[2024-09-06 07:55:16,321][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-06 07:55:16,546][06071] Decorrelating experience for 96 frames...
[2024-09-06 07:55:16,581][06070] Decorrelating experience for 64 frames...
[2024-09-06 07:55:16,677][06076] Decorrelating experience for 64 frames...
[2024-09-06 07:55:16,724][06074] Decorrelating experience for 96 frames...
[2024-09-06 07:55:16,739][01070] Heartbeat connected on RolloutWorker_w2
[2024-09-06 07:55:16,988][01070] Heartbeat connected on RolloutWorker_w5
[2024-09-06 07:55:17,186][06075] Decorrelating experience for 96 frames...
[2024-09-06 07:55:17,328][01070] Heartbeat connected on RolloutWorker_w6
[2024-09-06 07:55:17,605][06069] Decorrelating experience for 96 frames...
[2024-09-06 07:55:17,810][01070] Heartbeat connected on RolloutWorker_w0
[2024-09-06 07:55:17,989][06072] Decorrelating experience for 64 frames...
[2024-09-06 07:55:17,992][06070] Decorrelating experience for 96 frames...
[2024-09-06 07:55:18,111][06076] Decorrelating experience for 96 frames...
[2024-09-06 07:55:18,212][01070] Heartbeat connected on RolloutWorker_w1
[2024-09-06 07:55:18,296][01070] Heartbeat connected on RolloutWorker_w7
[2024-09-06 07:55:18,667][06073] Decorrelating experience for 32 frames...
[2024-09-06 07:55:18,736][06072] Decorrelating experience for 96 frames...
[2024-09-06 07:55:18,820][01070] Heartbeat connected on RolloutWorker_w3
[2024-09-06 07:55:20,114][06073] Decorrelating experience for 64 frames...
[2024-09-06 07:55:21,318][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 91.4. Samples: 914. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-06 07:55:21,321][01070] Avg episode reward: [(0, '1.348')]
[2024-09-06 07:55:22,967][06055] Signal inference workers to stop experience collection...
[2024-09-06 07:55:22,981][06068] InferenceWorker_p0-w0: stopping experience collection
[2024-09-06 07:55:23,292][06073] Decorrelating experience for 96 frames...
[2024-09-06 07:55:23,475][01070] Heartbeat connected on RolloutWorker_w4
[2024-09-06 07:55:26,317][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 152.5. Samples: 2288. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-06 07:55:26,322][01070] Avg episode reward: [(0, '2.368')]
[2024-09-06 07:55:26,933][06055] Signal inference workers to resume experience collection...
[2024-09-06 07:55:26,935][06068] InferenceWorker_p0-w0: resuming experience collection
[2024-09-06 07:55:31,317][01070] Fps is (10 sec: 2458.0, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 24576. Throughput: 0: 209.5. Samples: 4190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 3.0)
[2024-09-06 07:55:31,322][01070] Avg episode reward: [(0, '3.545')]
[2024-09-06 07:55:34,552][06068] Updated weights for policy 0, policy_version 10 (0.0173)
[2024-09-06 07:55:36,317][01070] Fps is (10 sec: 4505.4, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 45056. Throughput: 0: 445.8. Samples: 11144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 07:55:36,323][01070] Avg episode reward: [(0, '4.241')]
[2024-09-06 07:55:41,319][01070] Fps is (10 sec: 3276.0, 60 sec: 1911.3, 300 sec: 1911.3). Total num frames: 57344. Throughput: 0: 511.6. Samples: 15350. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 07:55:41,324][01070] Avg episode reward: [(0, '4.401')]
[2024-09-06 07:55:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 500.7. Samples: 17526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:55:46,319][01070] Avg episode reward: [(0, '4.396')]
[2024-09-06 07:55:47,170][06068] Updated weights for policy 0, policy_version 20 (0.0030)
[2024-09-06 07:55:51,317][01070] Fps is (10 sec: 4097.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 607.7. Samples: 24308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 07:55:51,319][01070] Avg episode reward: [(0, '4.360')]
[2024-09-06 07:55:56,317][01070] Fps is (10 sec: 4096.0, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 684.4. Samples: 30798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 07:55:56,321][01070] Avg episode reward: [(0, '4.546')]
[2024-09-06 07:55:56,332][06055] Saving new best policy, reward=4.546!
[2024-09-06 07:55:56,696][06068] Updated weights for policy 0, policy_version 30 (0.0020)
[2024-09-06 07:56:01,317][01070] Fps is (10 sec: 3686.4, 60 sec: 2703.4, 300 sec: 2703.4). Total num frames: 135168. Throughput: 0: 728.7. Samples: 32786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 07:56:01,321][01070] Avg episode reward: [(0, '4.463')]
[2024-09-06 07:56:06,316][01070] Fps is (10 sec: 4096.2, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 840.3. Samples: 38726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 07:56:06,319][01070] Avg episode reward: [(0, '4.541')]
[2024-09-06 07:56:07,255][06068] Updated weights for policy 0, policy_version 40 (0.0021)
[2024-09-06 07:56:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 933.5. Samples: 44294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 07:56:11,325][01070] Avg episode reward: [(0, '4.557')]
[2024-09-06 07:56:11,327][06055] Saving new best policy, reward=4.557!
[2024-09-06 07:56:16,317][01070] Fps is (10 sec: 2457.6, 60 sec: 3072.2, 300 sec: 2835.7). Total num frames: 184320. Throughput: 0: 930.2. Samples: 46048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:56:16,322][01070] Avg episode reward: [(0, '4.495')]
[2024-09-06 07:56:21,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 859.7. Samples: 49832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 07:56:21,319][01070] Avg episode reward: [(0, '4.274')]
[2024-09-06 07:56:21,856][06068] Updated weights for policy 0, policy_version 50 (0.0042)
[2024-09-06 07:56:26,320][01070] Fps is (10 sec: 4094.9, 60 sec: 3754.4, 300 sec: 3003.6). Total num frames: 225280. Throughput: 0: 917.1. Samples: 56622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 07:56:26,326][01070] Avg episode reward: [(0, '4.304')]
[2024-09-06 07:56:30,559][06068] Updated weights for policy 0, policy_version 60 (0.0031)
[2024-09-06 07:56:31,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 946.4. Samples: 60114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 07:56:31,322][01070] Avg episode reward: [(0, '4.369')]
[2024-09-06 07:56:36,317][01070] Fps is (10 sec: 3687.7, 60 sec: 3618.2, 300 sec: 3084.0). Total num frames: 262144. Throughput: 0: 907.4. Samples: 65140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:56:36,319][01070] Avg episode reward: [(0, '4.407')]
[2024-09-06 07:56:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3140.3). Total num frames: 282624. Throughput: 0: 891.6. Samples: 70920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:56:41,318][01070] Avg episode reward: [(0, '4.396')]
[2024-09-06 07:56:42,046][06068] Updated weights for policy 0, policy_version 70 (0.0046)
[2024-09-06 07:56:46,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3233.7). Total num frames: 307200. Throughput: 0: 924.6. Samples: 74394. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-06 07:56:46,323][01070] Avg episode reward: [(0, '4.273')]
[2024-09-06 07:56:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth...
[2024-09-06 07:56:51,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 923.6. Samples: 80288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 07:56:51,321][01070] Avg episode reward: [(0, '4.403')]
[2024-09-06 07:56:53,243][06068] Updated weights for policy 0, policy_version 80 (0.0022)
[2024-09-06 07:56:56,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3618.2, 300 sec: 3198.8). Total num frames: 335872. Throughput: 0: 904.1. Samples: 84980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:56:56,321][01070] Avg episode reward: [(0, '4.573')]
[2024-09-06 07:56:56,329][06055] Saving new best policy, reward=4.573!
[2024-09-06 07:57:01,316][01070] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 360448. Throughput: 0: 940.5. Samples: 88368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 07:57:01,325][01070] Avg episode reward: [(0, '4.464')]
[2024-09-06 07:57:02,726][06068] Updated weights for policy 0, policy_version 90 (0.0031)
[2024-09-06 07:57:06,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 1012.4. Samples: 95392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 07:57:06,319][01070] Avg episode reward: [(0, '4.686')]
[2024-09-06 07:57:06,327][06055] Saving new best policy, reward=4.686!
[2024-09-06 07:57:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 393216. Throughput: 0: 953.2. Samples: 99512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 07:57:11,320][01070] Avg episode reward: [(0, '4.739')]
[2024-09-06 07:57:11,330][06055] Saving new best policy, reward=4.739!
[2024-09-06 07:57:14,413][06068] Updated weights for policy 0, policy_version 100 (0.0021)
[2024-09-06 07:57:16,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 941.3. Samples: 102472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 07:57:16,324][01070] Avg episode reward: [(0, '4.631')]
[2024-09-06 07:57:21,317][01070] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3402.8). Total num frames: 442368. Throughput: 0: 983.4. Samples: 109392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 07:57:21,324][01070] Avg episode reward: [(0, '4.517')]
[2024-09-06 07:57:23,713][06068] Updated weights for policy 0, policy_version 110 (0.0022)
[2024-09-06 07:57:26,318][01070] Fps is (10 sec: 3685.8, 60 sec: 3823.1, 300 sec: 3367.8). Total num frames: 454656. Throughput: 0: 971.4. Samples: 114634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 07:57:26,321][01070] Avg episode reward: [(0, '4.538')]
[2024-09-06 07:57:31,316][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 942.6. Samples: 116810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:57:31,323][01070] Avg episode reward: [(0, '4.571')]
[2024-09-06 07:57:34,630][06068] Updated weights for policy 0, policy_version 120 (0.0029)
[2024-09-06 07:57:36,317][01070] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3418.0). Total num frames: 495616. Throughput: 0: 962.0. Samples: 123580. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-06 07:57:36,324][01070] Avg episode reward: [(0, '4.726')]
[2024-09-06 07:57:41,317][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3440.6). Total num frames: 516096. Throughput: 0: 999.0. Samples: 129936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 07:57:41,323][01070] Avg episode reward: [(0, '4.768')]
[2024-09-06 07:57:41,328][06055] Saving new best policy, reward=4.768!
[2024-09-06 07:57:46,181][06068] Updated weights for policy 0, policy_version 130 (0.0043)
[2024-09-06 07:57:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3435.4). Total num frames: 532480. Throughput: 0: 968.0. Samples: 131926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 07:57:46,324][01070] Avg episode reward: [(0, '4.814')]
[2024-09-06 07:57:46,339][06055] Saving new best policy, reward=4.814!
[2024-09-06 07:57:51,316][01070] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3456.0). Total num frames: 552960. Throughput: 0: 937.5. Samples: 137578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 07:57:51,319][01070] Avg episode reward: [(0, '4.845')]
[2024-09-06 07:57:51,323][06055] Saving new best policy, reward=4.845!
[2024-09-06 07:57:55,369][06068] Updated weights for policy 0, policy_version 140 (0.0029)
[2024-09-06 07:57:56,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3500.2). Total num frames: 577536. Throughput: 0: 1000.7. Samples: 144544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:57:56,319][01070] Avg episode reward: [(0, '4.741')]
[2024-09-06 07:58:01,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3469.6). Total num frames: 589824. Throughput: 0: 990.9. Samples: 147064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 07:58:01,322][01070] Avg episode reward: [(0, '4.681')]
[2024-09-06 07:58:06,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3487.5). Total num frames: 610304. Throughput: 0: 940.4. Samples: 151710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 07:58:06,321][01070] Avg episode reward: [(0, '4.354')]
[2024-09-06 07:58:06,904][06068] Updated weights for policy 0, policy_version 150 (0.0034)
[2024-09-06 07:58:11,317][01070] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3504.4). Total num frames: 630784. Throughput: 0: 978.3. Samples: 158654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 07:58:11,319][01070] Avg episode reward: [(0, '4.484')]
[2024-09-06 07:58:16,317][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3520.3). Total num frames: 651264. Throughput: 0: 1007.8. Samples: 162160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 07:58:16,322][01070] Avg episode reward: [(0, '4.499')]
[2024-09-06 07:58:17,037][06068] Updated weights for policy 0, policy_version 160 (0.0034)
[2024-09-06 07:58:21,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3492.4). Total num frames: 663552. Throughput: 0: 953.2. Samples: 166476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:58:21,322][01070] Avg episode reward: [(0, '4.346')]
[2024-09-06 07:58:26,317][01070] Fps is (10 sec: 3686.5, 60 sec: 3891.3, 300 sec: 3528.9). Total num frames: 688128. Throughput: 0: 952.7. Samples: 172806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:58:26,319][01070] Avg episode reward: [(0, '4.378')]
[2024-09-06 07:58:27,560][06068] Updated weights for policy 0, policy_version 170 (0.0040)
[2024-09-06 07:58:31,317][01070] Fps is (10 sec: 4915.1, 60 sec: 3959.4, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 985.9. Samples: 176290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 07:58:31,324][01070] Avg episode reward: [(0, '4.546')]
[2024-09-06 07:58:36,322][01070] Fps is (10 sec: 3684.3, 60 sec: 3822.6, 300 sec: 3536.4). Total num frames: 724992. Throughput: 0: 981.4. Samples: 181748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:58:36,325][01070] Avg episode reward: [(0, '4.479')]
[2024-09-06 07:58:39,036][06068] Updated weights for policy 0, policy_version 180 (0.0054)
[2024-09-06 07:58:41,317][01070] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 745472. Throughput: 0: 945.0. Samples: 187070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 07:58:41,323][01070] Avg episode reward: [(0, '4.487')]
[2024-09-06 07:58:46,317][01070] Fps is (10 sec: 4508.1, 60 sec: 3959.5, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 963.1. Samples: 190404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 07:58:46,319][01070] Avg episode reward: [(0, '4.599')]
[2024-09-06 07:58:46,332][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth...
[2024-09-06 07:58:48,051][06068] Updated weights for policy 0, policy_version 190 (0.0033)
[2024-09-06 07:58:51,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3574.7). Total num frames: 786432. Throughput: 0: 1001.3. Samples: 196768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 07:58:51,319][01070] Avg episode reward: [(0, '4.716')]
[2024-09-06 07:58:56,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3568.1). Total num frames: 802816. Throughput: 0: 945.9. Samples: 201218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:58:56,324][01070] Avg episode reward: [(0, '4.677')]
[2024-09-06 07:58:59,542][06068] Updated weights for policy 0, policy_version 200 (0.0036)
[2024-09-06 07:59:01,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3579.5). Total num frames: 823296. Throughput: 0: 944.4. Samples: 204656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:59:01,319][01070] Avg episode reward: [(0, '4.666')]
[2024-09-06 07:59:06,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3608.0). Total num frames: 847872. Throughput: 0: 1001.9. Samples: 211562. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 07:59:06,323][01070] Avg episode reward: [(0, '4.550')]
[2024-09-06 07:59:10,208][06068] Updated weights for policy 0, policy_version 210 (0.0028)
[2024-09-06 07:59:11,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3584.0). Total num frames: 860160. Throughput: 0: 960.0. Samples: 216004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 07:59:11,327][01070] Avg episode reward: [(0, '4.533')]
[2024-09-06 07:59:16,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3594.4). Total num frames: 880640. Throughput: 0: 939.2. Samples: 218556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 07:59:16,323][01070] Avg episode reward: [(0, '4.805')]
[2024-09-06 07:59:20,094][06068] Updated weights for policy 0, policy_version 220 (0.0030)
[2024-09-06 07:59:21,316][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3620.9). Total num frames: 905216. Throughput: 0: 972.9. Samples: 225522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 07:59:21,319][01070] Avg episode reward: [(0, '4.579')]
[2024-09-06 07:59:26,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3614.1). Total num frames: 921600. Throughput: 0: 980.2. Samples: 231180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:59:26,322][01070] Avg episode reward: [(0, '4.617')]
[2024-09-06 07:59:31,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3607.6). Total num frames: 937984. Throughput: 0: 952.4. Samples: 233264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 07:59:31,321][01070] Avg episode reward: [(0, '4.698')]
[2024-09-06 07:59:31,746][06068] Updated weights for policy 0, policy_version 230 (0.0037)
[2024-09-06 07:59:36,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3632.3). Total num frames: 962560. Throughput: 0: 960.2. Samples: 239978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:59:36,324][01070] Avg episode reward: [(0, '4.640')]
[2024-09-06 07:59:40,641][06068] Updated weights for policy 0, policy_version 240 (0.0019)
[2024-09-06 07:59:41,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3640.9). Total num frames: 983040. Throughput: 0: 1006.4. Samples: 246506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 07:59:41,319][01070] Avg episode reward: [(0, '4.662')]
[2024-09-06 07:59:46,321][01070] Fps is (10 sec: 3684.9, 60 sec: 3822.7, 300 sec: 3634.2). Total num frames: 999424. Throughput: 0: 975.4. Samples: 248554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 07:59:46,324][01070] Avg episode reward: [(0, '4.517')]
[2024-09-06 07:59:51,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3642.5). Total num frames: 1019904. Throughput: 0: 945.4. Samples: 254104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 07:59:51,319][01070] Avg episode reward: [(0, '4.585')]
[2024-09-06 07:59:52,074][06068] Updated weights for policy 0, policy_version 250 (0.0044)
[2024-09-06 07:59:56,317][01070] Fps is (10 sec: 3687.9, 60 sec: 3891.2, 300 sec: 3636.1). Total num frames: 1036288. Throughput: 0: 969.2. Samples: 259616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 07:59:56,319][01070] Avg episode reward: [(0, '4.734')]
[2024-09-06 08:00:01,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3615.8). Total num frames: 1048576. Throughput: 0: 955.6. Samples: 261556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:00:01,321][01070] Avg episode reward: [(0, '4.762')]
[2024-09-06 08:00:06,317][01070] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 1060864. Throughput: 0: 888.8. Samples: 265516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:00:06,318][01070] Avg episode reward: [(0, '4.686')]
[2024-09-06 08:00:06,345][06068] Updated weights for policy 0, policy_version 260 (0.0046)
[2024-09-06 08:00:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 911.2. Samples: 272186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:00:11,319][01070] Avg episode reward: [(0, '4.470')]
[2024-09-06 08:00:15,254][06068] Updated weights for policy 0, policy_version 270 (0.0039)
[2024-09-06 08:00:16,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1105920. Throughput: 0: 939.1. Samples: 275524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:00:16,318][01070] Avg episode reward: [(0, '4.535')]
[2024-09-06 08:00:21,328][01070] Fps is (10 sec: 3682.2, 60 sec: 3617.4, 300 sec: 3804.3). Total num frames: 1122304. Throughput: 0: 901.1. Samples: 280540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:00:21,339][01070] Avg episode reward: [(0, '4.759')]
[2024-09-06 08:00:26,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1142784. Throughput: 0: 885.6. Samples: 286356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:00:26,321][01070] Avg episode reward: [(0, '4.852')]
[2024-09-06 08:00:26,332][06055] Saving new best policy, reward=4.852!
[2024-09-06 08:00:26,855][06068] Updated weights for policy 0, policy_version 280 (0.0043)
[2024-09-06 08:00:31,317][01070] Fps is (10 sec: 4510.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1167360. Throughput: 0: 914.9. Samples: 289722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:00:31,318][01070] Avg episode reward: [(0, '4.581')]
[2024-09-06 08:00:36,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1183744. Throughput: 0: 929.6. Samples: 295936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:00:36,323][01070] Avg episode reward: [(0, '4.600')]
[2024-09-06 08:00:37,327][06068] Updated weights for policy 0, policy_version 290 (0.0022)
[2024-09-06 08:00:41,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 1200128. Throughput: 0: 911.8. Samples: 300646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:00:41,323][01070] Avg episode reward: [(0, '4.564')]
[2024-09-06 08:00:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3804.4). Total num frames: 1220608. Throughput: 0: 943.5. Samples: 304012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:00:46,321][01070] Avg episode reward: [(0, '4.603')]
[2024-09-06 08:00:46,372][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth...
[2024-09-06 08:00:46,507][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth
[2024-09-06 08:00:47,243][06068] Updated weights for policy 0, policy_version 300 (0.0033)
[2024-09-06 08:00:51,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1241088. Throughput: 0: 1005.5. Samples: 310764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:00:51,321][01070] Avg episode reward: [(0, '4.905')]
[2024-09-06 08:00:51,344][06055] Saving new best policy, reward=4.905!
[2024-09-06 08:00:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1257472. Throughput: 0: 950.9. Samples: 314976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:00:56,324][01070] Avg episode reward: [(0, '4.923')]
[2024-09-06 08:00:56,342][06055] Saving new best policy, reward=4.923!
[2024-09-06 08:00:58,832][06068] Updated weights for policy 0, policy_version 310 (0.0029)
[2024-09-06 08:01:01,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1277952. Throughput: 0: 942.3. Samples: 317926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 08:01:01,322][01070] Avg episode reward: [(0, '4.709')]
[2024-09-06 08:01:06,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 1302528. Throughput: 0: 984.8. Samples: 324844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:01:06,319][01070] Avg episode reward: [(0, '4.749')]
[2024-09-06 08:01:08,155][06068] Updated weights for policy 0, policy_version 320 (0.0016)
[2024-09-06 08:01:11,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1318912. Throughput: 0: 972.0. Samples: 330098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:01:11,320][01070] Avg episode reward: [(0, '4.757')]
[2024-09-06 08:01:16,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1335296. Throughput: 0: 944.2. Samples: 332210. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-06 08:01:16,324][01070] Avg episode reward: [(0, '4.768')]
[2024-09-06 08:01:19,327][06068] Updated weights for policy 0, policy_version 330 (0.0026)
[2024-09-06 08:01:21,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3960.2, 300 sec: 3846.1). Total num frames: 1359872. Throughput: 0: 959.8. Samples: 339126. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 08:01:21,320][01070] Avg episode reward: [(0, '4.852')]
[2024-09-06 08:01:26,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1380352. Throughput: 0: 994.4. Samples: 345396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:01:26,321][01070] Avg episode reward: [(0, '4.673')]
[2024-09-06 08:01:30,374][06068] Updated weights for policy 0, policy_version 340 (0.0044)
[2024-09-06 08:01:31,316][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1392640. Throughput: 0: 965.5. Samples: 347460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-06 08:01:31,320][01070] Avg episode reward: [(0, '4.731')]
[2024-09-06 08:01:36,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1417216. Throughput: 0: 949.2. Samples: 353480. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-06 08:01:36,318][01070] Avg episode reward: [(0, '4.856')]
[2024-09-06 08:01:39,566][06068] Updated weights for policy 0, policy_version 350 (0.0025)
[2024-09-06 08:01:41,316][01070] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 1441792. Throughput: 0: 1011.6. Samples: 360496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:01:41,323][01070] Avg episode reward: [(0, '4.687')]
[2024-09-06 08:01:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1454080. Throughput: 0: 996.0. Samples: 362748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 08:01:46,323][01070] Avg episode reward: [(0, '4.580')]
[2024-09-06 08:01:51,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1470464. Throughput: 0: 945.5. Samples: 367392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-09-06 08:01:51,321][01070] Avg episode reward: [(0, '4.688')]
[2024-09-06 08:01:51,501][06068] Updated weights for policy 0, policy_version 360 (0.0037)
[2024-09-06 08:01:56,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1495040. Throughput: 0: 984.5. Samples: 374400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:01:56,324][01070] Avg episode reward: [(0, '4.809')]
[2024-09-06 08:02:01,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1511424. Throughput: 0: 1012.0. Samples: 377750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:02:01,325][01070] Avg episode reward: [(0, '5.081')]
[2024-09-06 08:02:01,334][06055] Saving new best policy, reward=5.081!
[2024-09-06 08:02:01,346][06068] Updated weights for policy 0, policy_version 370 (0.0044)
[2024-09-06 08:02:06,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1527808. Throughput: 0: 953.3. Samples: 382024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:02:06,324][01070] Avg episode reward: [(0, '5.109')]
[2024-09-06 08:02:06,335][06055] Saving new best policy, reward=5.109!
[2024-09-06 08:02:11,317][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1552384. Throughput: 0: 958.0. Samples: 388506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:02:11,327][01070] Avg episode reward: [(0, '4.909')]
[2024-09-06 08:02:12,040][06068] Updated weights for policy 0, policy_version 380 (0.0043)
[2024-09-06 08:02:16,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1572864. Throughput: 0: 989.5. Samples: 391988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:02:16,319][01070] Avg episode reward: [(0, '4.791')]
[2024-09-06 08:02:21,316][01070] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1589248. Throughput: 0: 966.2. Samples: 396960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:02:21,321][01070] Avg episode reward: [(0, '4.771')]
[2024-09-06 08:02:23,656][06068] Updated weights for policy 0, policy_version 390 (0.0062)
[2024-09-06 08:02:26,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1609728. Throughput: 0: 936.7. Samples: 402648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:02:26,322][01070] Avg episode reward: [(0, '4.651')]
[2024-09-06 08:02:31,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1630208. Throughput: 0: 964.6. Samples: 406156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:02:31,322][01070] Avg episode reward: [(0, '4.673')]
[2024-09-06 08:02:32,348][06068] Updated weights for policy 0, policy_version 400 (0.0035)
[2024-09-06 08:02:36,317][01070] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1650688. Throughput: 0: 999.3. Samples: 412360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:02:36,318][01070] Avg episode reward: [(0, '4.851')]
[2024-09-06 08:02:41,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1662976. Throughput: 0: 946.6. Samples: 416998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:02:41,322][01070] Avg episode reward: [(0, '5.188')]
[2024-09-06 08:02:41,329][06055] Saving new best policy, reward=5.188!
[2024-09-06 08:02:43,939][06068] Updated weights for policy 0, policy_version 410 (0.0044)
[2024-09-06 08:02:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1687552. Throughput: 0: 948.0. Samples: 420410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:02:46,322][01070] Avg episode reward: [(0, '5.138')]
[2024-09-06 08:02:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth...
[2024-09-06 08:02:46,505][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth
[2024-09-06 08:02:51,318][01070] Fps is (10 sec: 4504.9, 60 sec: 3959.3, 300 sec: 3832.2). Total num frames: 1708032. Throughput: 0: 1003.2. Samples: 427168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:02:51,321][01070] Avg episode reward: [(0, '4.904')]
[2024-09-06 08:02:54,708][06068] Updated weights for policy 0, policy_version 420 (0.0014)
[2024-09-06 08:02:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1724416. Throughput: 0: 952.8. Samples: 431380. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 08:02:56,319][01070] Avg episode reward: [(0, '4.841')]
[2024-09-06 08:03:01,317][01070] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1744896. Throughput: 0: 938.4. Samples: 434216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:03:01,319][01070] Avg episode reward: [(0, '4.777')]
[2024-09-06 08:03:04,504][06068] Updated weights for policy 0, policy_version 430 (0.0014)
[2024-09-06 08:03:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1765376. Throughput: 0: 983.9. Samples: 441236. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 08:03:06,323][01070] Avg episode reward: [(0, '4.848')]
[2024-09-06 08:03:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 1781760. Throughput: 0: 978.1. Samples: 446662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:03:11,322][01070] Avg episode reward: [(0, '4.875')]
[2024-09-06 08:03:16,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 1798144. Throughput: 0: 947.8. Samples: 448806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:03:16,319][01070] Avg episode reward: [(0, '5.049')]
[2024-09-06 08:03:16,335][06068] Updated weights for policy 0, policy_version 440 (0.0023)
[2024-09-06 08:03:21,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1822720. Throughput: 0: 954.9. Samples: 455330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:03:21,320][01070] Avg episode reward: [(0, '5.155')]
[2024-09-06 08:03:25,412][06068] Updated weights for policy 0, policy_version 450 (0.0032)
[2024-09-06 08:03:26,317][01070] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1843200. Throughput: 0: 993.1. Samples: 461688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:03:26,318][01070] Avg episode reward: [(0, '5.165')]
[2024-09-06 08:03:31,322][01070] Fps is (10 sec: 3684.2, 60 sec: 3822.6, 300 sec: 3846.1). Total num frames: 1859584. Throughput: 0: 961.7. Samples: 463694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:03:31,325][01070] Avg episode reward: [(0, '5.015')]
[2024-09-06 08:03:36,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1880064. Throughput: 0: 941.8. Samples: 469546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:03:36,319][01070] Avg episode reward: [(0, '4.801')]
[2024-09-06 08:03:36,683][06068] Updated weights for policy 0, policy_version 460 (0.0046)
[2024-09-06 08:03:41,317][01070] Fps is (10 sec: 4098.3, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1900544. Throughput: 0: 995.4. Samples: 476172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:03:41,324][01070] Avg episode reward: [(0, '5.126')]
[2024-09-06 08:03:46,323][01070] Fps is (10 sec: 3274.6, 60 sec: 3754.2, 300 sec: 3818.2). Total num frames: 1912832. Throughput: 0: 972.6. Samples: 477988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:03:46,326][01070] Avg episode reward: [(0, '5.144')]
[2024-09-06 08:03:50,595][06068] Updated weights for policy 0, policy_version 470 (0.0043)
[2024-09-06 08:03:51,317][01070] Fps is (10 sec: 2457.7, 60 sec: 3618.2, 300 sec: 3804.4). Total num frames: 1925120. Throughput: 0: 891.7. Samples: 481364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:03:51,320][01070] Avg episode reward: [(0, '5.180')]
[2024-09-06 08:03:56,317][01070] Fps is (10 sec: 3688.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1949696. Throughput: 0: 907.3. Samples: 487490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 08:03:56,320][01070] Avg episode reward: [(0, '4.899')]
[2024-09-06 08:03:59,818][06068] Updated weights for policy 0, policy_version 480 (0.0031)
[2024-09-06 08:04:01,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1970176. Throughput: 0: 936.2. Samples: 490936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:04:01,319][01070] Avg episode reward: [(0, '4.708')]
[2024-09-06 08:04:06,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1986560. Throughput: 0: 912.2. Samples: 496378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:04:06,319][01070] Avg episode reward: [(0, '4.858')]
[2024-09-06 08:04:11,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2002944. Throughput: 0: 889.3. Samples: 501706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:04:11,323][01070] Avg episode reward: [(0, '5.187')]
[2024-09-06 08:04:11,461][06068] Updated weights for policy 0, policy_version 490 (0.0044)
[2024-09-06 08:04:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3804.4). Total num frames: 2027520. Throughput: 0: 922.6. Samples: 505206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:04:16,321][01070] Avg episode reward: [(0, '5.216')]
[2024-09-06 08:04:16,333][06055] Saving new best policy, reward=5.216!
[2024-09-06 08:04:21,319][01070] Fps is (10 sec: 4094.9, 60 sec: 3686.2, 300 sec: 3804.4). Total num frames: 2043904. Throughput: 0: 931.9. Samples: 511486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:04:21,322][01070] Avg episode reward: [(0, '5.013')]
[2024-09-06 08:04:21,394][06068] Updated weights for policy 0, policy_version 500 (0.0030)
[2024-09-06 08:04:26,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2060288. Throughput: 0: 878.6. Samples: 515708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:04:26,324][01070] Avg episode reward: [(0, '4.618')]
[2024-09-06 08:04:31,317][01070] Fps is (10 sec: 4097.1, 60 sec: 3755.0, 300 sec: 3804.4). Total num frames: 2084864. Throughput: 0: 916.1. Samples: 519206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:04:31,319][01070] Avg episode reward: [(0, '4.710')]
[2024-09-06 08:04:32,059][06068] Updated weights for policy 0, policy_version 510 (0.0024)
[2024-09-06 08:04:36,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2105344. Throughput: 0: 995.1. Samples: 526144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:04:36,319][01070] Avg episode reward: [(0, '5.116')]
[2024-09-06 08:04:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.5). Total num frames: 2121728. Throughput: 0: 964.6. Samples: 530896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:04:41,319][01070] Avg episode reward: [(0, '5.008')]
[2024-09-06 08:04:43,583][06068] Updated weights for policy 0, policy_version 520 (0.0023)
[2024-09-06 08:04:46,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.4, 300 sec: 3804.4). Total num frames: 2142208. Throughput: 0: 945.4. Samples: 533480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:04:46,320][01070] Avg episode reward: [(0, '5.078')]
[2024-09-06 08:04:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000523_2142208.pth...
[2024-09-06 08:04:46,489][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000299_1224704.pth
[2024-09-06 08:04:51,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2162688. Throughput: 0: 975.2. Samples: 540264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:04:51,321][01070] Avg episode reward: [(0, '5.193')]
[2024-09-06 08:04:52,623][06068] Updated weights for policy 0, policy_version 530 (0.0036)
[2024-09-06 08:04:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2179072. Throughput: 0: 981.4. Samples: 545870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:04:56,323][01070] Avg episode reward: [(0, '4.939')]
[2024-09-06 08:05:01,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2195456. Throughput: 0: 949.8. Samples: 547946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:05:01,323][01070] Avg episode reward: [(0, '5.156')]
[2024-09-06 08:05:04,000][06068] Updated weights for policy 0, policy_version 540 (0.0028)
[2024-09-06 08:05:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2220032. Throughput: 0: 959.5. Samples: 554662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:05:06,324][01070] Avg episode reward: [(0, '5.088')]
[2024-09-06 08:05:11,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2240512. Throughput: 0: 1010.8. Samples: 561196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:05:11,320][01070] Avg episode reward: [(0, '4.885')]
[2024-09-06 08:05:14,445][06068] Updated weights for policy 0, policy_version 550 (0.0037)
[2024-09-06 08:05:16,318][01070] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3846.2). Total num frames: 2256896. Throughput: 0: 978.9. Samples: 563260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:05:16,321][01070] Avg episode reward: [(0, '5.189')]
[2024-09-06 08:05:21,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3891.4, 300 sec: 3846.1). Total num frames: 2277376. Throughput: 0: 948.0. Samples: 568804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 08:05:21,319][01070] Avg episode reward: [(0, '5.640')]
[2024-09-06 08:05:21,324][06055] Saving new best policy, reward=5.640!
[2024-09-06 08:05:24,437][06068] Updated weights for policy 0, policy_version 560 (0.0040)
[2024-09-06 08:05:26,317][01070] Fps is (10 sec: 4506.5, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2301952. Throughput: 0: 998.1. Samples: 575810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:05:26,321][01070] Avg episode reward: [(0, '5.759')]
[2024-09-06 08:05:26,331][06055] Saving new best policy, reward=5.759!
[2024-09-06 08:05:31,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2318336. Throughput: 0: 999.7. Samples: 578466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:05:31,322][01070] Avg episode reward: [(0, '5.847')]
[2024-09-06 08:05:31,326][06055] Saving new best policy, reward=5.847!
[2024-09-06 08:05:35,920][06068] Updated weights for policy 0, policy_version 570 (0.0024)
[2024-09-06 08:05:36,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2334720. Throughput: 0: 952.0. Samples: 583102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:05:36,319][01070] Avg episode reward: [(0, '5.772')]
[2024-09-06 08:05:41,317][01070] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2359296. Throughput: 0: 984.4. Samples: 590166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:05:41,319][01070] Avg episode reward: [(0, '5.772')]
[2024-09-06 08:05:44,966][06068] Updated weights for policy 0, policy_version 580 (0.0031)
[2024-09-06 08:05:46,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2375680. Throughput: 0: 1015.8. Samples: 593656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:05:46,326][01070] Avg episode reward: [(0, '5.835')]
[2024-09-06 08:05:51,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2392064. Throughput: 0: 961.4. Samples: 597926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:05:51,325][01070] Avg episode reward: [(0, '5.707')]
[2024-09-06 08:05:56,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2412544. Throughput: 0: 956.4. Samples: 604232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:05:56,325][01070] Avg episode reward: [(0, '5.539')]
[2024-09-06 08:05:56,404][06068] Updated weights for policy 0, policy_version 590 (0.0035)
[2024-09-06 08:06:01,317][01070] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2437120. Throughput: 0: 987.2. Samples: 607680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:06:01,321][01070] Avg episode reward: [(0, '5.823')]
[2024-09-06 08:06:06,317][01070] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2453504. Throughput: 0: 982.3. Samples: 613010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:06:06,323][01070] Avg episode reward: [(0, '5.596')]
[2024-09-06 08:06:07,575][06068] Updated weights for policy 0, policy_version 600 (0.0034)
[2024-09-06 08:06:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2469888. Throughput: 0: 945.9. Samples: 618374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:06:11,324][01070] Avg episode reward: [(0, '5.336')]
[2024-09-06 08:06:16,317][01070] Fps is (10 sec: 4096.2, 60 sec: 3959.6, 300 sec: 3846.1). Total num frames: 2494464. Throughput: 0: 964.4. Samples: 621862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:06:16,324][01070] Avg episode reward: [(0, '5.749')]
[2024-09-06 08:06:16,804][06068] Updated weights for policy 0, policy_version 610 (0.0036)
[2024-09-06 08:06:21,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2510848. Throughput: 0: 1000.3. Samples: 628114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:06:21,322][01070] Avg episode reward: [(0, '6.247')]
[2024-09-06 08:06:21,328][06055] Saving new best policy, reward=6.247!
[2024-09-06 08:06:26,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2527232. Throughput: 0: 943.9. Samples: 632640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:06:26,324][01070] Avg episode reward: [(0, '6.326')]
[2024-09-06 08:06:26,335][06055] Saving new best policy, reward=6.326!
[2024-09-06 08:06:28,439][06068] Updated weights for policy 0, policy_version 620 (0.0026)
[2024-09-06 08:06:31,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2551808. Throughput: 0: 943.6. Samples: 636116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:06:31,323][01070] Avg episode reward: [(0, '5.950')]
[2024-09-06 08:06:36,323][01070] Fps is (10 sec: 4502.5, 60 sec: 3959.0, 300 sec: 3832.1). Total num frames: 2572288. Throughput: 0: 1005.1. Samples: 643160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:06:36,326][01070] Avg episode reward: [(0, '5.960')]
[2024-09-06 08:06:38,101][06068] Updated weights for policy 0, policy_version 630 (0.0023)
[2024-09-06 08:06:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2588672. Throughput: 0: 964.2. Samples: 647620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 08:06:41,319][01070] Avg episode reward: [(0, '5.987')]
[2024-09-06 08:06:46,317][01070] Fps is (10 sec: 3688.9, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2609152. Throughput: 0: 950.4. Samples: 650448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:06:46,323][01070] Avg episode reward: [(0, '6.091')]
[2024-09-06 08:06:46,333][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth...
[2024-09-06 08:06:46,464][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth
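The Saving/Removing pairs above show the learner's checkpoint rotation: it writes checkpoint_<policy_version>_<env_steps>.pth and then deletes the oldest file, keeping the two most recent. The helper below is a minimal sketch of that rotation pattern, not sample-factory's actual implementation; save_with_rotation is a hypothetical name, and keep_last=2 simply matches what this log shows.

```python
from pathlib import Path
import torch

def save_with_rotation(ckpt_dir: Path, policy_version: int, env_steps: int,
                       state: dict, keep_last: int = 2) -> None:
    """Write checkpoint_<version>_<steps>.pth, then drop the oldest files."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padding the version (as in checkpoint_000000637_2609152.pth)
    # makes lexicographic order equal chronological order.
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    torch.save(state, path)
    checkpoints = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for old in checkpoints[:-keep_last]:
        old.unlink()  # mirrors the "Removing ..." lines in the log
```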
[2024-09-06 08:06:48,887][06068] Updated weights for policy 0, policy_version 640 (0.0049)
[2024-09-06 08:06:51,316][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2629632. Throughput: 0: 982.1. Samples: 657202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:06:51,323][01070] Avg episode reward: [(0, '6.072')]
[2024-09-06 08:06:56,317][01070] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2646016. Throughput: 0: 980.7. Samples: 662504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:06:56,319][01070] Avg episode reward: [(0, '5.924')]
[2024-09-06 08:07:00,529][06068] Updated weights for policy 0, policy_version 650 (0.0028)
[2024-09-06 08:07:01,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2662400. Throughput: 0: 950.1. Samples: 664618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:07:01,319][01070] Avg episode reward: [(0, '5.764')]
[2024-09-06 08:07:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2686976. Throughput: 0: 963.3. Samples: 671462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-06 08:07:06,318][01070] Avg episode reward: [(0, '5.901')]
[2024-09-06 08:07:09,346][06068] Updated weights for policy 0, policy_version 660 (0.0045)
[2024-09-06 08:07:11,317][01070] Fps is (10 sec: 4505.2, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 2707456. Throughput: 0: 1000.9. Samples: 677680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:07:11,320][01070] Avg episode reward: [(0, '6.609')]
[2024-09-06 08:07:11,322][06055] Saving new best policy, reward=6.609!
[2024-09-06 08:07:16,318][01070] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 2719744. Throughput: 0: 967.8. Samples: 679670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:07:16,324][01070] Avg episode reward: [(0, '6.832')]
[2024-09-06 08:07:16,336][06055] Saving new best policy, reward=6.832!
[2024-09-06 08:07:21,160][06068] Updated weights for policy 0, policy_version 670 (0.0030)
[2024-09-06 08:07:21,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 2744320. Throughput: 0: 935.1. Samples: 685232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:07:21,321][01070] Avg episode reward: [(0, '6.827')]
[2024-09-06 08:07:26,317][01070] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2760704. Throughput: 0: 966.4. Samples: 691106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:07:26,323][01070] Avg episode reward: [(0, '6.439')]
[2024-09-06 08:07:31,319][01070] Fps is (10 sec: 2866.6, 60 sec: 3686.2, 300 sec: 3804.4). Total num frames: 2772992. Throughput: 0: 942.7. Samples: 692870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:07:31,325][01070] Avg episode reward: [(0, '6.339')]
[2024-09-06 08:07:35,290][06068] Updated weights for policy 0, policy_version 680 (0.0034)
[2024-09-06 08:07:36,317][01070] Fps is (10 sec: 2457.6, 60 sec: 3550.3, 300 sec: 3804.4). Total num frames: 2785280. Throughput: 0: 878.4. Samples: 696728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-09-06 08:07:36,321][01070] Avg episode reward: [(0, '6.444')]
[2024-09-06 08:07:41,317][01070] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2809856. Throughput: 0: 908.9. Samples: 703404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:07:41,319][01070] Avg episode reward: [(0, '6.654')]
[2024-09-06 08:07:44,151][06068] Updated weights for policy 0, policy_version 690 (0.0028)
[2024-09-06 08:07:46,317][01070] Fps is (10 sec: 4915.2, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2834432. Throughput: 0: 940.5. Samples: 706940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:07:46,319][01070] Avg episode reward: [(0, '6.718')]
[2024-09-06 08:07:51,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2846720. Throughput: 0: 901.1. Samples: 712010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:07:51,319][01070] Avg episode reward: [(0, '6.882')]
[2024-09-06 08:07:51,324][06055] Saving new best policy, reward=6.882!
[2024-09-06 08:07:55,843][06068] Updated weights for policy 0, policy_version 700 (0.0039)
[2024-09-06 08:07:56,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2867200. Throughput: 0: 884.7. Samples: 717490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:07:56,325][01070] Avg episode reward: [(0, '7.244')]
[2024-09-06 08:07:56,336][06055] Saving new best policy, reward=7.244!
[2024-09-06 08:08:01,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2891776. Throughput: 0: 917.8. Samples: 720972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:08:01,324][01070] Avg episode reward: [(0, '7.391')]
[2024-09-06 08:08:01,326][06055] Saving new best policy, reward=7.391!
[2024-09-06 08:08:05,806][06068] Updated weights for policy 0, policy_version 710 (0.0027)
[2024-09-06 08:08:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 2908160. Throughput: 0: 930.1. Samples: 727084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:08:06,325][01070] Avg episode reward: [(0, '6.836')]
[2024-09-06 08:08:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3818.3). Total num frames: 2924544. Throughput: 0: 901.6. Samples: 731678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:08:11,319][01070] Avg episode reward: [(0, '6.692')]
[2024-09-06 08:08:16,037][06068] Updated weights for policy 0, policy_version 720 (0.0022)
[2024-09-06 08:08:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 2949120. Throughput: 0: 940.1. Samples: 735172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:08:16,319][01070] Avg episode reward: [(0, '6.850')]
[2024-09-06 08:08:21,321][01070] Fps is (10 sec: 4503.7, 60 sec: 3754.5, 300 sec: 3818.2). Total num frames: 2969600. Throughput: 0: 1006.2. Samples: 742010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 08:08:21,323][01070] Avg episode reward: [(0, '7.485')]
[2024-09-06 08:08:21,325][06055] Saving new best policy, reward=7.485!
[2024-09-06 08:08:26,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.5). Total num frames: 2981888. Throughput: 0: 953.0. Samples: 746290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:08:26,319][01070] Avg episode reward: [(0, '7.434')]
[2024-09-06 08:08:27,902][06068] Updated weights for policy 0, policy_version 730 (0.0032)
[2024-09-06 08:08:31,317][01070] Fps is (10 sec: 3278.2, 60 sec: 3823.1, 300 sec: 3804.4). Total num frames: 3002368. Throughput: 0: 939.8. Samples: 749232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:08:31,319][01070] Avg episode reward: [(0, '7.716')]
[2024-09-06 08:08:31,354][06055] Saving new best policy, reward=7.716!
[2024-09-06 08:08:36,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 3026944. Throughput: 0: 982.7. Samples: 756230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:08:36,319][01070] Avg episode reward: [(0, '7.928')]
[2024-09-06 08:08:36,335][06055] Saving new best policy, reward=7.928!
[2024-09-06 08:08:36,786][06068] Updated weights for policy 0, policy_version 740 (0.0040)
[2024-09-06 08:08:41,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.3). Total num frames: 3043328. Throughput: 0: 975.6. Samples: 761394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 08:08:41,319][01070] Avg episode reward: [(0, '7.704')]
[2024-09-06 08:08:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3059712. Throughput: 0: 946.2. Samples: 763550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:08:46,318][01070] Avg episode reward: [(0, '7.583')]
[2024-09-06 08:08:46,332][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth...
[2024-09-06 08:08:46,453][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000523_2142208.pth
[2024-09-06 08:08:48,365][06068] Updated weights for policy 0, policy_version 750 (0.0018)
[2024-09-06 08:08:51,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3084288. Throughput: 0: 960.6. Samples: 770310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 08:08:51,322][01070] Avg episode reward: [(0, '7.840')]
[2024-09-06 08:08:56,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3100672. Throughput: 0: 992.8. Samples: 776356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:08:56,320][01070] Avg episode reward: [(0, '7.925')]
[2024-09-06 08:08:59,558][06068] Updated weights for policy 0, policy_version 760 (0.0058)
[2024-09-06 08:09:01,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3117056. Throughput: 0: 961.3. Samples: 778430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 08:09:01,323][01070] Avg episode reward: [(0, '7.469')]
[2024-09-06 08:09:06,318][01070] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 3141632. Throughput: 0: 942.7. Samples: 784430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-09-06 08:09:06,322][01070] Avg episode reward: [(0, '7.036')]
[2024-09-06 08:09:08,713][06068] Updated weights for policy 0, policy_version 770 (0.0036)
[2024-09-06 08:09:11,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3162112. Throughput: 0: 1005.6. Samples: 791544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 08:09:11,324][01070] Avg episode reward: [(0, '7.387')]
[2024-09-06 08:09:16,322][01070] Fps is (10 sec: 3684.7, 60 sec: 3822.6, 300 sec: 3846.0). Total num frames: 3178496. Throughput: 0: 991.1. Samples: 793836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 08:09:16,325][01070] Avg episode reward: [(0, '7.663')]
[2024-09-06 08:09:20,382][06068] Updated weights for policy 0, policy_version 780 (0.0022)
[2024-09-06 08:09:21,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3860.0). Total num frames: 3198976. Throughput: 0: 946.0.
Samples: 798798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:09:21,325][01070] Avg episode reward: [(0, '7.722')] [2024-09-06 08:09:26,317][01070] Fps is (10 sec: 4508.2, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3223552. Throughput: 0: 988.2. Samples: 805864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:09:26,320][01070] Avg episode reward: [(0, '7.744')] [2024-09-06 08:09:29,518][06068] Updated weights for policy 0, policy_version 790 (0.0014) [2024-09-06 08:09:31,319][01070] Fps is (10 sec: 4095.0, 60 sec: 3959.3, 300 sec: 3846.0). Total num frames: 3239936. Throughput: 0: 1014.1. Samples: 809186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:09:31,322][01070] Avg episode reward: [(0, '7.755')] [2024-09-06 08:09:36,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3256320. Throughput: 0: 960.5. Samples: 813534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:09:36,318][01070] Avg episode reward: [(0, '8.140')] [2024-09-06 08:09:36,331][06055] Saving new best policy, reward=8.140! [2024-09-06 08:09:40,585][06068] Updated weights for policy 0, policy_version 800 (0.0042) [2024-09-06 08:09:41,317][01070] Fps is (10 sec: 3687.3, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3276800. Throughput: 0: 974.7. Samples: 820218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:09:41,323][01070] Avg episode reward: [(0, '8.704')] [2024-09-06 08:09:41,332][06055] Saving new best policy, reward=8.704! [2024-09-06 08:09:46,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3301376. Throughput: 0: 1006.8. Samples: 823736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:09:46,322][01070] Avg episode reward: [(0, '8.566')] [2024-09-06 08:09:51,321][01070] Fps is (10 sec: 3684.6, 60 sec: 3822.6, 300 sec: 3846.0). Total num frames: 3313664. Throughput: 0: 986.4. Samples: 828822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:09:51,327][01070] Avg episode reward: [(0, '8.672')] [2024-09-06 08:09:51,764][06068] Updated weights for policy 0, policy_version 810 (0.0032) [2024-09-06 08:09:56,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3334144. Throughput: 0: 955.2. Samples: 834528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:09:56,323][01070] Avg episode reward: [(0, '8.667')] [2024-09-06 08:10:00,867][06068] Updated weights for policy 0, policy_version 820 (0.0032) [2024-09-06 08:10:01,317][01070] Fps is (10 sec: 4507.8, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3358720. Throughput: 0: 982.2. Samples: 838030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:10:01,319][01070] Avg episode reward: [(0, '9.323')] [2024-09-06 08:10:01,327][06055] Saving new best policy, reward=9.323! [2024-09-06 08:10:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 3375104. Throughput: 0: 1002.8. Samples: 843922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:10:06,322][01070] Avg episode reward: [(0, '9.281')] [2024-09-06 08:10:11,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3391488. Throughput: 0: 951.8. Samples: 848696. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:10:11,319][01070] Avg episode reward: [(0, '9.212')] [2024-09-06 08:10:12,461][06068] Updated weights for policy 0, policy_version 830 (0.0025) [2024-09-06 08:10:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 3860.0). Total num frames: 3416064. Throughput: 0: 956.3. Samples: 852216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:10:16,319][01070] Avg episode reward: [(0, '9.185')] [2024-09-06 08:10:21,317][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3436544. Throughput: 0: 1017.1. Samples: 859302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:10:21,320][01070] Avg episode reward: [(0, '9.434')] [2024-09-06 08:10:21,324][06055] Saving new best policy, reward=9.434! [2024-09-06 08:10:22,066][06068] Updated weights for policy 0, policy_version 840 (0.0030) [2024-09-06 08:10:26,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3452928. Throughput: 0: 961.2. Samples: 863472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:10:26,322][01070] Avg episode reward: [(0, '9.987')] [2024-09-06 08:10:26,338][06055] Saving new best policy, reward=9.987! [2024-09-06 08:10:31,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 3473408. Throughput: 0: 951.3. Samples: 866546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:10:31,322][01070] Avg episode reward: [(0, '10.544')] [2024-09-06 08:10:31,325][06055] Saving new best policy, reward=10.544! [2024-09-06 08:10:32,772][06068] Updated weights for policy 0, policy_version 850 (0.0028) [2024-09-06 08:10:36,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3497984. Throughput: 0: 993.9. Samples: 873542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:10:36,319][01070] Avg episode reward: [(0, '11.038')] [2024-09-06 08:10:36,329][06055] Saving new best policy, reward=11.038! [2024-09-06 08:10:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3510272. Throughput: 0: 980.8. Samples: 878664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:10:41,325][01070] Avg episode reward: [(0, '10.332')] [2024-09-06 08:10:44,181][06068] Updated weights for policy 0, policy_version 860 (0.0030) [2024-09-06 08:10:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3530752. Throughput: 0: 953.3. Samples: 880928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:10:46,318][01070] Avg episode reward: [(0, '10.049')] [2024-09-06 08:10:46,335][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000862_3530752.pth... [2024-09-06 08:10:46,466][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000637_2609152.pth [2024-09-06 08:10:51,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3873.8). Total num frames: 3555328. Throughput: 0: 981.0. Samples: 888068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:10:51,325][01070] Avg episode reward: [(0, '9.881')] [2024-09-06 08:10:52,867][06068] Updated weights for policy 0, policy_version 870 (0.0024) [2024-09-06 08:10:56,318][01070] Fps is (10 sec: 4095.3, 60 sec: 3959.3, 300 sec: 3846.1). Total num frames: 3571712. Throughput: 0: 1010.2. Samples: 894156. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:10:56,321][01070] Avg episode reward: [(0, '10.324')] [2024-09-06 08:11:01,317][01070] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3588096. Throughput: 0: 978.2. Samples: 896234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:11:01,324][01070] Avg episode reward: [(0, '10.751')] [2024-09-06 08:11:04,130][06068] Updated weights for policy 0, policy_version 880 (0.0019) [2024-09-06 08:11:06,317][01070] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3612672. Throughput: 0: 960.8. Samples: 902540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:11:06,321][01070] Avg episode reward: [(0, '11.121')] [2024-09-06 08:11:06,332][06055] Saving new best policy, reward=11.121! [2024-09-06 08:11:11,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 3629056. Throughput: 0: 994.5. Samples: 908226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:11:11,319][01070] Avg episode reward: [(0, '11.751')] [2024-09-06 08:11:11,326][06055] Saving new best policy, reward=11.751! [2024-09-06 08:11:16,317][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3641344. Throughput: 0: 964.6. Samples: 909952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:11:16,319][01070] Avg episode reward: [(0, '12.049')] [2024-09-06 08:11:16,333][06055] Saving new best policy, reward=12.049! [2024-09-06 08:11:17,256][06068] Updated weights for policy 0, policy_version 890 (0.0045) [2024-09-06 08:11:21,316][01070] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3657728. Throughput: 0: 893.7. Samples: 913758. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:11:21,318][01070] Avg episode reward: [(0, '13.812')] [2024-09-06 08:11:21,326][06055] Saving new best policy, reward=13.812! [2024-09-06 08:11:26,317][01070] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 3678208. Throughput: 0: 932.5. Samples: 920626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:11:26,325][01070] Avg episode reward: [(0, '15.231')] [2024-09-06 08:11:26,390][06055] Saving new best policy, reward=15.231! [2024-09-06 08:11:27,393][06068] Updated weights for policy 0, policy_version 900 (0.0024) [2024-09-06 08:11:31,316][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3832.3). Total num frames: 3702784. Throughput: 0: 960.0. Samples: 924128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:11:31,322][01070] Avg episode reward: [(0, '15.872')] [2024-09-06 08:11:31,328][06055] Saving new best policy, reward=15.872! [2024-09-06 08:11:36,317][01070] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 3715072. Throughput: 0: 907.6. Samples: 928912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:11:36,320][01070] Avg episode reward: [(0, '16.561')] [2024-09-06 08:11:36,333][06055] Saving new best policy, reward=16.561! [2024-09-06 08:11:38,716][06068] Updated weights for policy 0, policy_version 910 (0.0025) [2024-09-06 08:11:41,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3739648. Throughput: 0: 909.8. Samples: 935094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:11:41,318][01070] Avg episode reward: [(0, '14.523')] [2024-09-06 08:11:46,317][01070] Fps is (10 sec: 4505.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3760128. 
Throughput: 0: 942.4. Samples: 938640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:11:46,320][01070] Avg episode reward: [(0, '14.427')] [2024-09-06 08:11:47,204][06068] Updated weights for policy 0, policy_version 920 (0.0035) [2024-09-06 08:11:51,316][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3776512. Throughput: 0: 934.4. Samples: 944586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:11:51,321][01070] Avg episode reward: [(0, '14.547')] [2024-09-06 08:11:56,317][01070] Fps is (10 sec: 3686.6, 60 sec: 3754.8, 300 sec: 3846.1). Total num frames: 3796992. Throughput: 0: 915.3. Samples: 949416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:11:56,319][01070] Avg episode reward: [(0, '15.007')] [2024-09-06 08:11:58,865][06068] Updated weights for policy 0, policy_version 930 (0.0040) [2024-09-06 08:12:01,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3817472. Throughput: 0: 955.3. Samples: 952942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:01,319][01070] Avg episode reward: [(0, '16.278')] [2024-09-06 08:12:06,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3837952. Throughput: 0: 1027.6. Samples: 959998. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:06,326][01070] Avg episode reward: [(0, '16.304')] [2024-09-06 08:12:09,321][06068] Updated weights for policy 0, policy_version 940 (0.0024) [2024-09-06 08:12:11,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3854336. Throughput: 0: 969.6. Samples: 964258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:11,320][01070] Avg episode reward: [(0, '16.576')] [2024-09-06 08:12:11,326][06055] Saving new best policy, reward=16.576! [2024-09-06 08:12:16,317][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3878912. Throughput: 0: 961.4. Samples: 967390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:16,322][01070] Avg episode reward: [(0, '16.934')] [2024-09-06 08:12:16,332][06055] Saving new best policy, reward=16.934! [2024-09-06 08:12:18,962][06068] Updated weights for policy 0, policy_version 950 (0.0061) [2024-09-06 08:12:21,317][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3899392. Throughput: 0: 1011.4. Samples: 974424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:12:21,321][01070] Avg episode reward: [(0, '16.941')] [2024-09-06 08:12:21,326][06055] Saving new best policy, reward=16.941! [2024-09-06 08:12:26,317][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3915776. Throughput: 0: 986.0. Samples: 979462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:12:26,321][01070] Avg episode reward: [(0, '17.171')] [2024-09-06 08:12:26,332][06055] Saving new best policy, reward=17.171! [2024-09-06 08:12:30,483][06068] Updated weights for policy 0, policy_version 960 (0.0038) [2024-09-06 08:12:31,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3932160. Throughput: 0: 955.2. Samples: 981622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:31,318][01070] Avg episode reward: [(0, '17.914')] [2024-09-06 08:12:31,330][06055] Saving new best policy, reward=17.914! [2024-09-06 08:12:36,317][01070] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3956736. 
Throughput: 0: 980.8. Samples: 988720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:12:36,319][01070] Avg episode reward: [(0, '17.778')] [2024-09-06 08:12:39,405][06068] Updated weights for policy 0, policy_version 970 (0.0031) [2024-09-06 08:12:41,318][01070] Fps is (10 sec: 4504.8, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 3977216. Throughput: 0: 1011.3. Samples: 994928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:12:41,321][01070] Avg episode reward: [(0, '17.411')] [2024-09-06 08:12:46,317][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 3989504. Throughput: 0: 979.5. Samples: 997020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:12:46,319][01070] Avg episode reward: [(0, '16.564')] [2024-09-06 08:12:46,386][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000975_3993600.pth... [2024-09-06 08:12:46,505][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000747_3059712.pth [2024-09-06 08:12:49,066][06055] Stopping Batcher_0... [2024-09-06 08:12:49,066][06055] Loop batcher_evt_loop terminating... [2024-09-06 08:12:49,068][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:12:49,066][01070] Component Batcher_0 stopped! [2024-09-06 08:12:49,172][06068] Weights refcount: 2 0 [2024-09-06 08:12:49,178][06068] Stopping InferenceWorker_p0-w0... [2024-09-06 08:12:49,180][01070] Component InferenceWorker_p0-w0 stopped! [2024-09-06 08:12:49,179][06068] Loop inference_proc0-0_evt_loop terminating... [2024-09-06 08:12:49,262][06055] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000862_3530752.pth [2024-09-06 08:12:49,282][06055] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:12:49,452][01070] Component LearnerWorker_p0 stopped! [2024-09-06 08:12:49,458][06055] Stopping LearnerWorker_p0... [2024-09-06 08:12:49,458][06055] Loop learner_proc0_evt_loop terminating... [2024-09-06 08:12:49,517][06070] Stopping RolloutWorker_w1... [2024-09-06 08:12:49,518][06070] Loop rollout_proc1_evt_loop terminating... [2024-09-06 08:12:49,518][01070] Component RolloutWorker_w1 stopped! [2024-09-06 08:12:49,528][01070] Component RolloutWorker_w7 stopped! [2024-09-06 08:12:49,528][06076] Stopping RolloutWorker_w7... [2024-09-06 08:12:49,534][06076] Loop rollout_proc7_evt_loop terminating... [2024-09-06 08:12:49,583][06072] Stopping RolloutWorker_w3... [2024-09-06 08:12:49,587][06074] Stopping RolloutWorker_w5... [2024-09-06 08:12:49,584][01070] Component RolloutWorker_w3 stopped! [2024-09-06 08:12:49,589][06072] Loop rollout_proc3_evt_loop terminating... [2024-09-06 08:12:49,589][01070] Component RolloutWorker_w5 stopped! [2024-09-06 08:12:49,589][06074] Loop rollout_proc5_evt_loop terminating... [2024-09-06 08:12:49,624][01070] Component RolloutWorker_w0 stopped! [2024-09-06 08:12:49,629][06069] Stopping RolloutWorker_w0... [2024-09-06 08:12:49,636][06069] Loop rollout_proc0_evt_loop terminating... [2024-09-06 08:12:49,659][01070] Component RolloutWorker_w6 stopped! [2024-09-06 08:12:49,662][06075] Stopping RolloutWorker_w6... [2024-09-06 08:12:49,663][06075] Loop rollout_proc6_evt_loop terminating... [2024-09-06 08:12:49,674][01070] Component RolloutWorker_w2 stopped! [2024-09-06 08:12:49,677][06071] Stopping RolloutWorker_w2... [2024-09-06 08:12:49,677][06071] Loop rollout_proc2_evt_loop terminating... 
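The shutdown above is the run ending normally, not a crash: the original command line (echoed in the config dump below) asked for --train_for_env_steps=4000000, and env frames are counted with frameskip, so every train step of batch_size=1024 samples advances the counter by 1024 × 4 = 4096 frames. The numbers in this log check out exactly:

batch_size = 1024       # samples consumed per train step (see config dump below)
env_frameskip = 4       # one sample corresponds to 4 environment frames
final_train_step = 978  # version in checkpoint_000000978_4005888.pth above

env_steps = final_train_step * batch_size * env_frameskip
assert env_steps == 4_005_888   # matches "Collected {0: 4005888}" below
assert env_steps >= 4_000_000   # train_for_env_steps reached -> stop training
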
[2024-09-06 08:12:49,711][01070] Component RolloutWorker_w4 stopped!
[2024-09-06 08:12:49,714][01070] Waiting for process learner_proc0 to stop...
[2024-09-06 08:12:49,717][06073] Stopping RolloutWorker_w4...
[2024-09-06 08:12:49,718][06073] Loop rollout_proc4_evt_loop terminating...
[2024-09-06 08:12:51,010][01070] Waiting for process inference_proc0-0 to join...
[2024-09-06 08:12:51,015][01070] Waiting for process rollout_proc0 to join...
[2024-09-06 08:12:53,043][01070] Waiting for process rollout_proc1 to join...
[2024-09-06 08:12:53,046][01070] Waiting for process rollout_proc2 to join...
[2024-09-06 08:12:53,049][01070] Waiting for process rollout_proc3 to join...
[2024-09-06 08:12:53,051][01070] Waiting for process rollout_proc4 to join...
[2024-09-06 08:12:53,052][01070] Waiting for process rollout_proc5 to join...
[2024-09-06 08:12:53,054][01070] Waiting for process rollout_proc6 to join...
[2024-09-06 08:12:53,057][01070] Waiting for process rollout_proc7 to join...
[2024-09-06 08:12:53,059][01070] Batcher 0 profile tree view:
batching: 28.0352, releasing_batches: 0.0261
[2024-09-06 08:12:53,060][01070] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 396.0637
update_model: 9.3731
  weight_update: 0.0039
one_step: 0.0115
  handle_policy_step: 605.9671
    deserialize: 14.6060, stack: 3.1718, obs_to_device_normalize: 122.4392, forward: 322.4806, send_messages: 28.9576
    prepare_outputs: 84.3394
      to_cpu: 49.4062
[2024-09-06 08:12:53,062][01070] Learner 0 profile tree view:
misc: 0.0070, prepare_batch: 14.0241
train: 74.3266
  epoch_init: 0.0157, minibatch_init: 0.0064, losses_postprocess: 0.6825, kl_divergence: 0.6975, after_optimizer: 33.2803
  calculate_losses: 26.8972
    losses_init: 0.0105, forward_head: 1.2733, bptt_initial: 18.0432, tail: 1.1022, advantages_returns: 0.2551, losses: 3.8922
    bptt: 1.9881
      bptt_forward_core: 1.8812
  update: 12.1225
    clip: 0.8825
[2024-09-06 08:12:53,063][01070] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3955, enqueue_policy_requests: 94.3963, env_step: 821.5439, overhead: 13.5076, complete_rollouts: 7.0298
save_policy_outputs: 20.8305
  split_output_tensors: 8.5837
[2024-09-06 08:12:53,065][01070] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3496, enqueue_policy_requests: 97.5017, env_step: 821.5330, overhead: 13.0956, complete_rollouts: 6.8883
save_policy_outputs: 20.0181
  split_output_tensors: 7.8801
[2024-09-06 08:12:53,066][01070] Loop Runner_EvtLoop terminating...
[2024-09-06 08:12:53,068][01070] Runner profile tree view:
main_loop: 1081.6875
[2024-09-06 08:12:53,069][01070] Collected {0: 4005888}, FPS: 3703.4
[2024-09-06 08:26:47,354][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-06 08:26:47,356][01070] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-06 08:26:47,359][01070] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-06 08:26:47,361][01070] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-06 08:26:47,363][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-06 08:26:47,365][01070] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-06 08:26:47,366][01070] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-06 08:26:47,367][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-06 08:26:47,368][01070] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-09-06 08:26:47,369][01070] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-09-06 08:26:47,370][01070] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-06 08:26:47,371][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-06 08:26:47,372][01070] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-06 08:26:47,373][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-06 08:26:47,374][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 08:26:47,410][01070] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:26:47,413][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 08:26:47,415][01070] RunningMeanStd input shape: (1,) [2024-09-06 08:26:47,432][01070] ConvEncoder: input_channels=3 [2024-09-06 08:26:47,594][01070] Conv encoder output size: 512 [2024-09-06 08:26:47,596][01070] Policy head output size: 512 [2024-09-06 08:26:47,888][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:26:48,754][01070] Num frames 100... [2024-09-06 08:26:48,875][01070] Num frames 200... [2024-09-06 08:26:48,994][01070] Num frames 300... [2024-09-06 08:26:49,113][01070] Num frames 400... [2024-09-06 08:26:49,254][01070] Num frames 500... [2024-09-06 08:26:49,388][01070] Avg episode rewards: #0: 11.440, true rewards: #0: 5.440 [2024-09-06 08:26:49,391][01070] Avg episode reward: 11.440, avg true_objective: 5.440 [2024-09-06 08:26:49,511][01070] Num frames 600... [2024-09-06 08:26:49,678][01070] Num frames 700... [2024-09-06 08:26:49,846][01070] Num frames 800... [2024-09-06 08:26:50,007][01070] Num frames 900... [2024-09-06 08:26:50,168][01070] Num frames 1000... [2024-09-06 08:26:50,331][01070] Num frames 1100... [2024-09-06 08:26:50,503][01070] Num frames 1200... [2024-09-06 08:26:50,681][01070] Num frames 1300... [2024-09-06 08:26:50,858][01070] Num frames 1400... [2024-09-06 08:26:51,030][01070] Num frames 1500... [2024-09-06 08:26:51,200][01070] Num frames 1600... [2024-09-06 08:26:51,364][01070] Avg episode rewards: #0: 19.795, true rewards: #0: 8.295 [2024-09-06 08:26:51,366][01070] Avg episode reward: 19.795, avg true_objective: 8.295 [2024-09-06 08:26:51,440][01070] Num frames 1700... [2024-09-06 08:26:51,597][01070] Num frames 1800... [2024-09-06 08:26:51,716][01070] Num frames 1900... [2024-09-06 08:26:51,843][01070] Num frames 2000... [2024-09-06 08:26:51,964][01070] Num frames 2100... [2024-09-06 08:26:52,083][01070] Num frames 2200... [2024-09-06 08:26:52,203][01070] Num frames 2300... [2024-09-06 08:26:52,366][01070] Avg episode rewards: #0: 18.633, true rewards: #0: 7.967 [2024-09-06 08:26:52,367][01070] Avg episode reward: 18.633, avg true_objective: 7.967 [2024-09-06 08:26:52,382][01070] Num frames 2400... [2024-09-06 08:26:52,509][01070] Num frames 2500... [2024-09-06 08:26:52,629][01070] Num frames 2600... [2024-09-06 08:26:52,748][01070] Num frames 2700... [2024-09-06 08:26:52,878][01070] Num frames 2800... [2024-09-06 08:26:52,997][01070] Num frames 2900... [2024-09-06 08:26:53,117][01070] Num frames 3000... [2024-09-06 08:26:53,239][01070] Num frames 3100... 
[2024-09-06 08:26:53,401][01070] Avg episode rewards: #0: 18.478, true rewards: #0: 7.977 [2024-09-06 08:26:53,402][01070] Avg episode reward: 18.478, avg true_objective: 7.977 [2024-09-06 08:26:53,416][01070] Num frames 3200... [2024-09-06 08:26:53,543][01070] Num frames 3300... [2024-09-06 08:26:53,662][01070] Num frames 3400... [2024-09-06 08:26:53,779][01070] Num frames 3500... [2024-09-06 08:26:53,906][01070] Num frames 3600... [2024-09-06 08:26:54,025][01070] Num frames 3700... [2024-09-06 08:26:54,144][01070] Num frames 3800... [2024-09-06 08:26:54,263][01070] Num frames 3900... [2024-09-06 08:26:54,327][01070] Avg episode rewards: #0: 18.210, true rewards: #0: 7.810 [2024-09-06 08:26:54,328][01070] Avg episode reward: 18.210, avg true_objective: 7.810 [2024-09-06 08:26:54,438][01070] Num frames 4000... [2024-09-06 08:26:54,568][01070] Num frames 4100... [2024-09-06 08:26:54,685][01070] Num frames 4200... [2024-09-06 08:26:54,803][01070] Num frames 4300... [2024-09-06 08:26:54,928][01070] Num frames 4400... [2024-09-06 08:26:55,056][01070] Num frames 4500... [2024-09-06 08:26:55,190][01070] Num frames 4600... [2024-09-06 08:26:55,326][01070] Num frames 4700... [2024-09-06 08:26:55,391][01070] Avg episode rewards: #0: 17.675, true rewards: #0: 7.842 [2024-09-06 08:26:55,392][01070] Avg episode reward: 17.675, avg true_objective: 7.842 [2024-09-06 08:26:55,513][01070] Num frames 4800... [2024-09-06 08:26:55,637][01070] Num frames 4900... [2024-09-06 08:26:55,758][01070] Num frames 5000... [2024-09-06 08:26:55,827][01070] Avg episode rewards: #0: 15.870, true rewards: #0: 7.156 [2024-09-06 08:26:55,828][01070] Avg episode reward: 15.870, avg true_objective: 7.156 [2024-09-06 08:26:55,944][01070] Num frames 5100... [2024-09-06 08:26:56,065][01070] Num frames 5200... [2024-09-06 08:26:56,186][01070] Num frames 5300... [2024-09-06 08:26:56,305][01070] Num frames 5400... [2024-09-06 08:26:56,425][01070] Num frames 5500... [2024-09-06 08:26:56,557][01070] Num frames 5600... [2024-09-06 08:26:56,681][01070] Num frames 5700... [2024-09-06 08:26:56,803][01070] Num frames 5800... [2024-09-06 08:26:56,931][01070] Num frames 5900... [2024-09-06 08:26:57,053][01070] Num frames 6000... [2024-09-06 08:26:57,173][01070] Num frames 6100... [2024-09-06 08:26:57,294][01070] Num frames 6200... [2024-09-06 08:26:57,418][01070] Num frames 6300... [2024-09-06 08:26:57,547][01070] Num frames 6400... [2024-09-06 08:26:57,675][01070] Num frames 6500... [2024-09-06 08:26:57,795][01070] Num frames 6600... [2024-09-06 08:26:57,916][01070] Num frames 6700... [2024-09-06 08:26:58,045][01070] Num frames 6800... [2024-09-06 08:26:58,169][01070] Num frames 6900... [2024-09-06 08:26:58,260][01070] Avg episode rewards: #0: 19.911, true rewards: #0: 8.661 [2024-09-06 08:26:58,261][01070] Avg episode reward: 19.911, avg true_objective: 8.661 [2024-09-06 08:26:58,349][01070] Num frames 7000... [2024-09-06 08:26:58,470][01070] Num frames 7100... [2024-09-06 08:26:58,599][01070] Num frames 7200... [2024-09-06 08:26:58,720][01070] Num frames 7300... [2024-09-06 08:26:58,842][01070] Num frames 7400... [2024-09-06 08:26:58,969][01070] Num frames 7500... [2024-09-06 08:26:59,094][01070] Num frames 7600... [2024-09-06 08:26:59,214][01070] Num frames 7700... [2024-09-06 08:26:59,332][01070] Num frames 7800... [2024-09-06 08:26:59,457][01070] Num frames 7900... 
[2024-09-06 08:26:59,545][01070] Avg episode rewards: #0: 19.801, true rewards: #0: 8.801 [2024-09-06 08:26:59,546][01070] Avg episode reward: 19.801, avg true_objective: 8.801 [2024-09-06 08:26:59,649][01070] Num frames 8000... [2024-09-06 08:26:59,793][01070] Num frames 8100... [2024-09-06 08:26:59,916][01070] Num frames 8200... [2024-09-06 08:27:00,042][01070] Num frames 8300... [2024-09-06 08:27:00,161][01070] Num frames 8400... [2024-09-06 08:27:00,280][01070] Num frames 8500... [2024-09-06 08:27:00,411][01070] Avg episode rewards: #0: 18.961, true rewards: #0: 8.561 [2024-09-06 08:27:00,412][01070] Avg episode reward: 18.961, avg true_objective: 8.561 [2024-09-06 08:27:55,072][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-06 08:29:44,806][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 08:29:44,808][01070] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-06 08:29:44,810][01070] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-06 08:29:44,811][01070] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-06 08:29:44,813][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 08:29:44,814][01070] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-06 08:29:44,817][01070] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-06 08:29:44,818][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-06 08:29:44,820][01070] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-06 08:29:44,822][01070] Adding new argument 'hf_repository'='Re-Re/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-06 08:29:44,823][01070] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-06 08:29:44,826][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-06 08:29:44,827][01070] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-06 08:29:44,828][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-06 08:29:44,829][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 08:29:44,858][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 08:29:44,861][01070] RunningMeanStd input shape: (1,) [2024-09-06 08:29:44,875][01070] ConvEncoder: input_channels=3 [2024-09-06 08:29:44,911][01070] Conv encoder output size: 512 [2024-09-06 08:29:44,912][01070] Policy head output size: 512 [2024-09-06 08:29:44,931][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:29:45,358][01070] Num frames 100... [2024-09-06 08:29:45,488][01070] Num frames 200... [2024-09-06 08:29:45,624][01070] Num frames 300... [2024-09-06 08:29:45,744][01070] Num frames 400... [2024-09-06 08:29:45,863][01070] Num frames 500... [2024-09-06 08:29:45,987][01070] Num frames 600... [2024-09-06 08:29:46,108][01070] Num frames 700... [2024-09-06 08:29:46,235][01070] Num frames 800... 
[2024-09-06 08:29:46,300][01070] Avg episode rewards: #0: 17.070, true rewards: #0: 8.070 [2024-09-06 08:29:46,302][01070] Avg episode reward: 17.070, avg true_objective: 8.070 [2024-09-06 08:29:46,416][01070] Num frames 900... [2024-09-06 08:29:46,542][01070] Num frames 1000... [2024-09-06 08:29:46,669][01070] Num frames 1100... [2024-09-06 08:29:46,794][01070] Num frames 1200... [2024-09-06 08:29:46,917][01070] Num frames 1300... [2024-09-06 08:29:47,039][01070] Num frames 1400... [2024-09-06 08:29:47,189][01070] Avg episode rewards: #0: 14.895, true rewards: #0: 7.395 [2024-09-06 08:29:47,191][01070] Avg episode reward: 14.895, avg true_objective: 7.395 [2024-09-06 08:29:47,218][01070] Num frames 1500... [2024-09-06 08:29:47,338][01070] Num frames 1600... [2024-09-06 08:29:47,461][01070] Num frames 1700... [2024-09-06 08:29:47,587][01070] Num frames 1800... [2024-09-06 08:29:47,708][01070] Num frames 1900... [2024-09-06 08:29:47,826][01070] Num frames 2000... [2024-09-06 08:29:47,910][01070] Avg episode rewards: #0: 12.410, true rewards: #0: 6.743 [2024-09-06 08:29:47,913][01070] Avg episode reward: 12.410, avg true_objective: 6.743 [2024-09-06 08:29:48,007][01070] Num frames 2100... [2024-09-06 08:29:48,128][01070] Num frames 2200... [2024-09-06 08:29:48,257][01070] Num frames 2300... [2024-09-06 08:29:48,380][01070] Num frames 2400... [2024-09-06 08:29:48,524][01070] Avg episode rewards: #0: 11.178, true rewards: #0: 6.177 [2024-09-06 08:29:48,526][01070] Avg episode reward: 11.178, avg true_objective: 6.177 [2024-09-06 08:29:48,562][01070] Num frames 2500... [2024-09-06 08:29:48,679][01070] Num frames 2600... [2024-09-06 08:29:48,802][01070] Num frames 2700... [2024-09-06 08:29:48,926][01070] Num frames 2800... [2024-09-06 08:29:49,047][01070] Num frames 2900... [2024-09-06 08:29:49,168][01070] Num frames 3000... [2024-09-06 08:29:49,296][01070] Num frames 3100... [2024-09-06 08:29:49,421][01070] Num frames 3200... [2024-09-06 08:29:49,551][01070] Num frames 3300... [2024-09-06 08:29:49,677][01070] Num frames 3400... [2024-09-06 08:29:49,816][01070] Avg episode rewards: #0: 12.736, true rewards: #0: 6.936 [2024-09-06 08:29:49,819][01070] Avg episode reward: 12.736, avg true_objective: 6.936 [2024-09-06 08:29:49,860][01070] Num frames 3500... [2024-09-06 08:29:49,981][01070] Num frames 3600... [2024-09-06 08:29:50,103][01070] Num frames 3700... [2024-09-06 08:29:50,228][01070] Num frames 3800... [2024-09-06 08:29:50,360][01070] Num frames 3900... [2024-09-06 08:29:50,439][01070] Avg episode rewards: #0: 11.862, true rewards: #0: 6.528 [2024-09-06 08:29:50,441][01070] Avg episode reward: 11.862, avg true_objective: 6.528 [2024-09-06 08:29:50,548][01070] Num frames 4000... [2024-09-06 08:29:50,713][01070] Num frames 4100... [2024-09-06 08:29:50,880][01070] Num frames 4200... [2024-09-06 08:29:51,043][01070] Num frames 4300... [2024-09-06 08:29:51,205][01070] Avg episode rewards: #0: 10.950, true rewards: #0: 6.236 [2024-09-06 08:29:51,208][01070] Avg episode reward: 10.950, avg true_objective: 6.236 [2024-09-06 08:29:51,274][01070] Num frames 4400... [2024-09-06 08:29:51,443][01070] Num frames 4500... [2024-09-06 08:29:51,608][01070] Num frames 4600... [2024-09-06 08:29:51,767][01070] Num frames 4700... [2024-09-06 08:29:51,938][01070] Num frames 4800... [2024-09-06 08:29:52,113][01070] Num frames 4900... 
[2024-09-06 08:29:52,291][01070] Avg episode rewards: #0: 10.716, true rewards: #0: 6.216 [2024-09-06 08:29:52,294][01070] Avg episode reward: 10.716, avg true_objective: 6.216 [2024-09-06 08:29:52,349][01070] Num frames 5000... [2024-09-06 08:29:52,549][01070] Num frames 5100... [2024-09-06 08:29:52,723][01070] Num frames 5200... [2024-09-06 08:29:52,895][01070] Num frames 5300... [2024-09-06 08:29:53,063][01070] Num frames 5400... [2024-09-06 08:29:53,183][01070] Num frames 5500... [2024-09-06 08:29:53,304][01070] Num frames 5600... [2024-09-06 08:29:53,437][01070] Num frames 5700... [2024-09-06 08:29:53,565][01070] Num frames 5800... [2024-09-06 08:29:53,688][01070] Num frames 5900... [2024-09-06 08:29:53,810][01070] Num frames 6000... [2024-09-06 08:29:53,930][01070] Num frames 6100... [2024-09-06 08:29:54,050][01070] Num frames 6200... [2024-09-06 08:29:54,174][01070] Num frames 6300... [2024-09-06 08:29:54,254][01070] Avg episode rewards: #0: 13.130, true rewards: #0: 7.019 [2024-09-06 08:29:54,255][01070] Avg episode reward: 13.130, avg true_objective: 7.019 [2024-09-06 08:29:54,356][01070] Num frames 6400... [2024-09-06 08:29:54,490][01070] Num frames 6500... [2024-09-06 08:29:54,609][01070] Num frames 6600... [2024-09-06 08:29:54,726][01070] Num frames 6700... [2024-09-06 08:29:54,842][01070] Num frames 6800... [2024-09-06 08:29:54,961][01070] Num frames 6900... [2024-09-06 08:29:55,080][01070] Num frames 7000... [2024-09-06 08:29:55,202][01070] Num frames 7100... [2024-09-06 08:29:55,323][01070] Num frames 7200... [2024-09-06 08:29:55,437][01070] Avg episode rewards: #0: 13.847, true rewards: #0: 7.247 [2024-09-06 08:29:55,439][01070] Avg episode reward: 13.847, avg true_objective: 7.247 [2024-09-06 08:30:39,348][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-06 08:30:46,004][01070] The model has been pushed to https://huggingface.co/Re-Re/rl_course_vizdoom_health_gathering_supreme [2024-09-06 08:35:27,232][01070] Environment doom_basic already registered, overwriting... [2024-09-06 08:35:27,234][01070] Environment doom_two_colors_easy already registered, overwriting... [2024-09-06 08:35:27,237][01070] Environment doom_two_colors_hard already registered, overwriting... [2024-09-06 08:35:27,238][01070] Environment doom_dm already registered, overwriting... [2024-09-06 08:35:27,240][01070] Environment doom_dwango5 already registered, overwriting... [2024-09-06 08:35:27,241][01070] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-09-06 08:35:27,242][01070] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-09-06 08:35:27,243][01070] Environment doom_my_way_home already registered, overwriting... [2024-09-06 08:35:27,244][01070] Environment doom_deadly_corridor already registered, overwriting... [2024-09-06 08:35:27,245][01070] Environment doom_defend_the_center already registered, overwriting... [2024-09-06 08:35:27,246][01070] Environment doom_defend_the_line already registered, overwriting... [2024-09-06 08:35:27,247][01070] Environment doom_health_gathering already registered, overwriting... [2024-09-06 08:35:27,248][01070] Environment doom_health_gathering_supreme already registered, overwriting... [2024-09-06 08:35:27,250][01070] Environment doom_battle already registered, overwriting... [2024-09-06 08:35:27,251][01070] Environment doom_battle2 already registered, overwriting... 
[2024-09-06 08:35:27,252][01070] Environment doom_duel_bots already registered, overwriting...
[2024-09-06 08:35:27,253][01070] Environment doom_deathmatch_bots already registered, overwriting...
[2024-09-06 08:35:27,254][01070] Environment doom_duel already registered, overwriting...
[2024-09-06 08:35:27,255][01070] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-06 08:35:27,256][01070] Environment doom_benchmark already registered, overwriting...
[2024-09-06 08:35:27,257][01070] register_encoder_factory:
[2024-09-06 08:35:27,282][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-06 08:35:27,284][01070] Overriding arg 'train_for_env_steps' with value 5000000 passed from command line
[2024-09-06 08:35:27,291][01070] Experiment dir /content/train_dir/default_experiment already exists!
[2024-09-06 08:35:27,293][01070] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-09-06 08:35:27,295][01070] Weights and Biases integration disabled
[2024-09-06 08:35:27,298][01070] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-09-06 08:35:29,357][01070] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=5000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-06 08:35:29,359][01070] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-06 08:35:29,362][01070] Rollout worker 0 uses device cpu
[2024-09-06 08:35:29,364][01070] Rollout worker 1 uses device cpu
[2024-09-06 08:35:29,365][01070] Rollout worker 2 uses device cpu
[2024-09-06 08:35:29,366][01070] Rollout worker 3 uses device cpu
[2024-09-06 08:35:29,368][01070] Rollout worker 4 uses device cpu
[2024-09-06 08:35:29,369][01070] Rollout worker 5 uses device cpu
[2024-09-06 08:35:29,370][01070] Rollout worker 6 uses device cpu
[2024-09-06 08:35:29,372][01070] Rollout worker 7 uses device cpu
[2024-09-06 08:35:29,446][01070] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 08:35:29,447][01070] InferenceWorker_p0-w0: min num requests: 2
[2024-09-06 08:35:29,485][01070] Starting all processes...
[2024-09-06 08:35:29,486][01070] Starting process learner_proc0
[2024-09-06 08:35:29,535][01070] Starting all processes...
[2024-09-06 08:35:29,540][01070] Starting process inference_proc0-0
[2024-09-06 08:35:29,541][01070] Starting process rollout_proc0
[2024-09-06 08:35:29,541][01070] Starting process rollout_proc1
[2024-09-06 08:35:29,541][01070] Starting process rollout_proc2
[2024-09-06 08:35:29,541][01070] Starting process rollout_proc3
[2024-09-06 08:35:29,541][01070] Starting process rollout_proc4
[2024-09-06 08:35:29,541][01070] Starting process rollout_proc5
[2024-09-06 08:35:29,736][01070] Starting process rollout_proc7
[2024-09-06 08:35:29,753][01070] Starting process rollout_proc6
[2024-09-06 08:35:43,326][19093] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 08:35:43,327][19093] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-06 08:35:43,391][19093] Num visible devices: 1
[2024-09-06 08:35:43,432][19093] Starting seed is not provided
[2024-09-06 08:35:43,433][19093] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 08:35:43,434][19093] Initializing actor-critic model on device cuda:0
[2024-09-06 08:35:43,434][19093] RunningMeanStd input shape: (3, 72, 128)
[2024-09-06 08:35:43,436][19093] RunningMeanStd input shape: (1,)
[2024-09-06 08:35:43,522][19093] ConvEncoder: input_channels=3
[2024-09-06 08:35:44,545][19093] Conv encoder output size: 512
[2024-09-06 08:35:44,548][19093] Policy head output size: 512
[2024-09-06 08:35:44,679][19093] Created Actor Critic model with architecture:
[2024-09-06 08:35:44,680][19093] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule(
original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-09-06 08:35:45,399][19110] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 08:35:45,404][19110] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-09-06 08:35:45,627][19116] Worker 5 uses CPU cores [1] [2024-09-06 08:35:45,628][19113] Worker 2 uses CPU cores [0] [2024-09-06 08:35:45,638][19110] Num visible devices: 1 [2024-09-06 08:35:45,693][19114] Worker 4 uses CPU cores [0] [2024-09-06 08:35:45,750][19093] Using optimizer [2024-09-06 08:35:45,925][19111] Worker 1 uses CPU cores [1] [2024-09-06 08:35:45,964][19117] Worker 7 uses CPU cores [1] [2024-09-06 08:35:46,011][19118] Worker 6 uses CPU cores [0] [2024-09-06 08:35:46,116][19112] Worker 0 uses CPU cores [0] [2024-09-06 08:35:46,119][19115] Worker 3 uses CPU cores [1] [2024-09-06 08:35:46,846][19093] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-09-06 08:35:46,893][19093] Loading model from checkpoint [2024-09-06 08:35:46,895][19093] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2024-09-06 08:35:46,896][19093] Initialized policy 0 weights for model version 978 [2024-09-06 08:35:46,906][19093] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 08:35:46,914][19093] LearnerWorker_p0 finished initialization! [2024-09-06 08:35:47,083][19110] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 08:35:47,085][19110] RunningMeanStd input shape: (1,) [2024-09-06 08:35:47,103][19110] ConvEncoder: input_channels=3 [2024-09-06 08:35:47,256][19110] Conv encoder output size: 512 [2024-09-06 08:35:47,257][19110] Policy head output size: 512 [2024-09-06 08:35:47,299][01070] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 08:35:47,343][01070] Inference worker 0-0 is ready! [2024-09-06 08:35:47,345][01070] All inference workers are ready! Signal rollout workers to start! 
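The resume above picks up exactly where the first run stopped: restart_behavior=resume plus load_checkpoint_kind=latest make the learner restore checkpoint_000000978_4005888.pth, so the frame counter continues from 4,005,888 toward the new train_for_env_steps=5000000 target. A minimal sketch of that restore step; the checkpoint keys here are assumptions for illustration, not sample-factory's exact schema:

import glob
import torch

ckpt_dir = "/content/train_dir/default_experiment/checkpoint_p0"
latest = sorted(glob.glob(f"{ckpt_dir}/checkpoint_*.pth"))[-1]  # load_checkpoint_kind=latest

state = torch.load(latest, map_location="cuda:0")
# Hypothetical keys; the real contents depend on the sample-factory version.
train_step = state.get("train_step", 978)
env_steps = state.get("env_steps", 4_005_888)
# model.load_state_dict(state["model"])  # then restore optimizer state, etc.
print(f"resuming at version {train_step}; "
      f"{5_000_000 - env_steps} env frames left to the new target")
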
[2024-09-06 08:35:47,666][19115] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,759][19117] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,841][19111] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,847][19116] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,859][19118] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,902][19114] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,911][19113] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:47,923][19112] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 08:35:49,437][01070] Heartbeat connected on Batcher_0 [2024-09-06 08:35:49,445][01070] Heartbeat connected on LearnerWorker_p0 [2024-09-06 08:35:49,476][01070] Heartbeat connected on InferenceWorker_p0-w0 [2024-09-06 08:35:49,689][19112] Decorrelating experience for 0 frames... [2024-09-06 08:35:49,691][19113] Decorrelating experience for 0 frames... [2024-09-06 08:35:49,916][19115] Decorrelating experience for 0 frames... [2024-09-06 08:35:49,947][19117] Decorrelating experience for 0 frames... [2024-09-06 08:35:50,051][19111] Decorrelating experience for 0 frames... [2024-09-06 08:35:50,057][19116] Decorrelating experience for 0 frames... [2024-09-06 08:35:50,424][19112] Decorrelating experience for 32 frames... [2024-09-06 08:35:51,332][19117] Decorrelating experience for 32 frames... [2024-09-06 08:35:51,378][19115] Decorrelating experience for 32 frames... [2024-09-06 08:35:51,423][19111] Decorrelating experience for 32 frames... [2024-09-06 08:35:51,456][19114] Decorrelating experience for 0 frames... [2024-09-06 08:35:51,798][19113] Decorrelating experience for 32 frames... [2024-09-06 08:35:52,103][19112] Decorrelating experience for 64 frames... [2024-09-06 08:35:52,299][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 08:35:52,385][19118] Decorrelating experience for 0 frames... [2024-09-06 08:35:52,773][19116] Decorrelating experience for 32 frames... [2024-09-06 08:35:53,059][19117] Decorrelating experience for 64 frames... [2024-09-06 08:35:53,088][19115] Decorrelating experience for 64 frames... [2024-09-06 08:35:53,163][19111] Decorrelating experience for 64 frames... [2024-09-06 08:35:53,227][19114] Decorrelating experience for 32 frames... [2024-09-06 08:35:54,004][19116] Decorrelating experience for 64 frames... [2024-09-06 08:35:54,080][19111] Decorrelating experience for 96 frames... [2024-09-06 08:35:54,169][01070] Heartbeat connected on RolloutWorker_w1 [2024-09-06 08:35:54,232][19113] Decorrelating experience for 64 frames... [2024-09-06 08:35:54,438][19118] Decorrelating experience for 32 frames... [2024-09-06 08:35:54,884][19114] Decorrelating experience for 64 frames... [2024-09-06 08:35:55,752][19112] Decorrelating experience for 96 frames... [2024-09-06 08:35:55,972][01070] Heartbeat connected on RolloutWorker_w0 [2024-09-06 08:35:56,002][19113] Decorrelating experience for 96 frames... [2024-09-06 08:35:56,247][01070] Heartbeat connected on RolloutWorker_w2 [2024-09-06 08:35:56,489][19118] Decorrelating experience for 64 frames... [2024-09-06 08:35:56,773][19114] Decorrelating experience for 96 frames... 
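Before collection starts, each rollout worker steps its environments for a different number of frames (0, 32, 64, 96, as logged above and below) so that episodes across workers start out of phase rather than synchronized; this is the decorrelation phase enabled by decorrelate_envs_on_one_worker=True in the config above. A rough sketch of the idea, assuming the old-style gym step API rather than sample-factory's actual code:

def decorrelate(envs, frames_per_step=32):
    # Stagger each env by a different multiple of 32 random-action frames,
    # matching the 0/32/64/96 counts in the log.
    for i, env in enumerate(envs):
        env.reset()
        for _ in range(i * frames_per_step):
            obs, reward, done, info = env.step(env.action_space.sample())
            if done:
                env.reset()
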
[2024-09-06 08:35:57,299][01070] Heartbeat connected on RolloutWorker_w4 [2024-09-06 08:35:57,303][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 71.2. Samples: 712. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 08:35:57,307][01070] Avg episode reward: [(0, '5.702')] [2024-09-06 08:35:57,398][19116] Decorrelating experience for 96 frames... [2024-09-06 08:35:57,706][01070] Heartbeat connected on RolloutWorker_w5 [2024-09-06 08:35:57,841][19117] Decorrelating experience for 96 frames... [2024-09-06 08:35:58,244][01070] Heartbeat connected on RolloutWorker_w7 [2024-09-06 08:36:00,233][19093] Signal inference workers to stop experience collection... [2024-09-06 08:36:00,248][19110] InferenceWorker_p0-w0: stopping experience collection [2024-09-06 08:36:00,307][19118] Decorrelating experience for 96 frames... [2024-09-06 08:36:00,409][01070] Heartbeat connected on RolloutWorker_w6 [2024-09-06 08:36:00,811][19115] Decorrelating experience for 96 frames... [2024-09-06 08:36:00,904][01070] Heartbeat connected on RolloutWorker_w3 [2024-09-06 08:36:02,107][19093] Signal inference workers to resume experience collection... [2024-09-06 08:36:02,108][19110] InferenceWorker_p0-w0: resuming experience collection [2024-09-06 08:36:02,300][01070] Fps is (10 sec: 409.6, 60 sec: 273.0, 300 sec: 273.0). Total num frames: 4009984. Throughput: 0: 147.9. Samples: 2218. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-09-06 08:36:02,302][01070] Avg episode reward: [(0, '4.701')] [2024-09-06 08:36:07,302][01070] Fps is (10 sec: 1638.4, 60 sec: 819.0, 300 sec: 819.0). Total num frames: 4022272. Throughput: 0: 228.8. Samples: 4576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2024-09-06 08:36:07,305][01070] Avg episode reward: [(0, '6.521')] [2024-09-06 08:36:12,073][19110] Updated weights for policy 0, policy_version 988 (0.0020) [2024-09-06 08:36:12,300][01070] Fps is (10 sec: 3686.1, 60 sec: 1638.3, 300 sec: 1638.3). Total num frames: 4046848. Throughput: 0: 391.6. Samples: 9790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:36:12,306][01070] Avg episode reward: [(0, '11.236')] [2024-09-06 08:36:17,299][01070] Fps is (10 sec: 4507.3, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 4067328. Throughput: 0: 445.7. Samples: 13370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:36:17,305][01070] Avg episode reward: [(0, '12.296')] [2024-09-06 08:36:22,299][01070] Fps is (10 sec: 3687.1, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 4083712. Throughput: 0: 556.9. Samples: 19492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:36:22,301][01070] Avg episode reward: [(0, '14.420')] [2024-09-06 08:36:22,439][19110] Updated weights for policy 0, policy_version 998 (0.0030) [2024-09-06 08:36:27,299][01070] Fps is (10 sec: 3686.3, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 4104192. Throughput: 0: 602.8. Samples: 24114. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-09-06 08:36:27,301][01070] Avg episode reward: [(0, '16.055')] [2024-09-06 08:36:32,188][19110] Updated weights for policy 0, policy_version 1008 (0.0021) [2024-09-06 08:36:32,299][01070] Fps is (10 sec: 4505.6, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 4128768. Throughput: 0: 615.9. Samples: 27716. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-09-06 08:36:32,305][01070] Avg episode reward: [(0, '18.102')] [2024-09-06 08:36:32,309][19093] Saving new best policy, reward=18.102! 
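Each status line reports throughput over trailing 10-, 60-, and 300-second windows, which is why the first report of a run shows nan (there is no earlier sample to difference against). A minimal sketch of one way to produce such windowed rates from (timestamp, total-frames) samples; Sample Factory's actual bookkeeping may differ.

import time
from collections import deque

class FpsTracker:
    """Windowed FPS over trailing 10/60/300-second horizons (a sketch)."""
    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_env_frames)

    def record(self, total_frames):
        now = time.time()
        self.samples.append((now, total_frames))
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()  # keep only what the largest window needs

    def rates(self):
        now, newest = self.samples[-1]
        out = {}
        for w in self.windows:
            # oldest retained sample still inside this window
            old_t, old_f = next((t, f) for t, f in self.samples if now - t <= w)
            dt = now - old_t
            # dt == 0 on the very first report, hence the 'nan' lines in the log
            out[w] = (newest - old_f) / dt if dt > 0 else float("nan")
        return out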
[2024-09-06 08:36:37,300][01070] Fps is (10 sec: 4505.2, 60 sec: 2867.1, 300 sec: 2867.1). Total num frames: 4149248. Throughput: 0: 773.5. Samples: 34808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:36:37,305][01070] Avg episode reward: [(0, '19.063')] [2024-09-06 08:36:37,312][19093] Saving new best policy, reward=19.063! [2024-09-06 08:36:42,299][01070] Fps is (10 sec: 3276.7, 60 sec: 2829.9, 300 sec: 2829.9). Total num frames: 4161536. Throughput: 0: 852.4. Samples: 39066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:36:42,303][01070] Avg episode reward: [(0, '20.446')] [2024-09-06 08:36:42,306][19093] Saving new best policy, reward=20.446! [2024-09-06 08:36:43,965][19110] Updated weights for policy 0, policy_version 1018 (0.0034) [2024-09-06 08:36:47,299][01070] Fps is (10 sec: 3277.1, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 4182016. Throughput: 0: 879.8. Samples: 41808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:36:47,305][01070] Avg episode reward: [(0, '21.840')] [2024-09-06 08:36:47,316][19093] Saving new best policy, reward=21.840! [2024-09-06 08:36:52,299][01070] Fps is (10 sec: 4505.7, 60 sec: 3345.1, 300 sec: 3087.8). Total num frames: 4206592. Throughput: 0: 982.8. Samples: 48798. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-09-06 08:36:52,304][01070] Avg episode reward: [(0, '21.047')] [2024-09-06 08:36:52,709][19110] Updated weights for policy 0, policy_version 1028 (0.0017) [2024-09-06 08:36:57,304][01070] Fps is (10 sec: 4094.0, 60 sec: 3618.1, 300 sec: 3101.0). Total num frames: 4222976. Throughput: 0: 987.4. Samples: 54226. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:36:57,306][01070] Avg episode reward: [(0, '20.552')] [2024-09-06 08:37:02,299][01070] Fps is (10 sec: 2457.6, 60 sec: 3686.5, 300 sec: 3003.7). Total num frames: 4231168. Throughput: 0: 944.6. Samples: 55878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:37:02,301][01070] Avg episode reward: [(0, '21.311')] [2024-09-06 08:37:07,058][19110] Updated weights for policy 0, policy_version 1038 (0.0038) [2024-09-06 08:37:07,299][01070] Fps is (10 sec: 2868.7, 60 sec: 3823.2, 300 sec: 3072.0). Total num frames: 4251648. Throughput: 0: 897.5. Samples: 59880. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:37:07,301][01070] Avg episode reward: [(0, '20.332')] [2024-09-06 08:37:12,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3132.2). Total num frames: 4272128. Throughput: 0: 950.4. Samples: 66884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:37:12,301][01070] Avg episode reward: [(0, '19.661')] [2024-09-06 08:37:17,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3140.3). Total num frames: 4288512. Throughput: 0: 928.7. Samples: 69506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:37:17,302][01070] Avg episode reward: [(0, '20.777')] [2024-09-06 08:37:18,168][19110] Updated weights for policy 0, policy_version 1048 (0.0027) [2024-09-06 08:37:22,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3190.6). Total num frames: 4308992. Throughput: 0: 873.3. Samples: 74104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:37:22,304][01070] Avg episode reward: [(0, '21.018')] [2024-09-06 08:37:27,299][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 4333568. Throughput: 0: 938.8. Samples: 81312. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:27,300][19110] Updated weights for policy 0, policy_version 1058 (0.0031) [2024-09-06 08:37:27,301][01070] Avg episode reward: [(0, '21.381')] [2024-09-06 08:37:27,314][19093] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001058_4333568.pth... [2024-09-06 08:37:27,438][19093] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000975_3993600.pth [2024-09-06 08:37:32,300][01070] Fps is (10 sec: 4095.5, 60 sec: 3686.3, 300 sec: 3276.8). Total num frames: 4349952. Throughput: 0: 954.6. Samples: 84768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:37:32,306][01070] Avg episode reward: [(0, '20.498')] [2024-09-06 08:37:37,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3276.8). Total num frames: 4366336. Throughput: 0: 899.8. Samples: 89288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:37:37,302][01070] Avg episode reward: [(0, '20.356')] [2024-09-06 08:37:38,951][19110] Updated weights for policy 0, policy_version 1068 (0.0030) [2024-09-06 08:37:42,302][01070] Fps is (10 sec: 3685.8, 60 sec: 3754.5, 300 sec: 3312.3). Total num frames: 4386816. Throughput: 0: 918.0. Samples: 95532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:42,306][01070] Avg episode reward: [(0, '18.966')] [2024-09-06 08:37:47,299][01070] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3379.2). Total num frames: 4411392. Throughput: 0: 960.9. Samples: 99118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:47,303][01070] Avg episode reward: [(0, '19.345')] [2024-09-06 08:37:48,240][19110] Updated weights for policy 0, policy_version 1078 (0.0013) [2024-09-06 08:37:52,299][01070] Fps is (10 sec: 3687.5, 60 sec: 3618.1, 300 sec: 3342.3). Total num frames: 4423680. Throughput: 0: 989.1. Samples: 104388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:52,301][01070] Avg episode reward: [(0, '19.443')] [2024-09-06 08:37:57,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3755.0, 300 sec: 3402.8). Total num frames: 4448256. Throughput: 0: 957.7. Samples: 109982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:37:57,301][01070] Avg episode reward: [(0, '20.099')] [2024-09-06 08:37:59,151][19110] Updated weights for policy 0, policy_version 1088 (0.0024) [2024-09-06 08:38:02,299][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3428.5). Total num frames: 4468736. Throughput: 0: 978.0. Samples: 113514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:02,306][01070] Avg episode reward: [(0, '20.360')] [2024-09-06 08:38:07,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3423.1). Total num frames: 4485120. Throughput: 0: 1019.1. Samples: 119964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:07,302][01070] Avg episode reward: [(0, '20.240')] [2024-09-06 08:38:10,312][19110] Updated weights for policy 0, policy_version 1098 (0.0022) [2024-09-06 08:38:12,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3418.0). Total num frames: 4501504. Throughput: 0: 953.8. Samples: 124232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:38:12,304][01070] Avg episode reward: [(0, '19.557')] [2024-09-06 08:38:17,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3467.9). Total num frames: 4526080. Throughput: 0: 954.6. Samples: 127726. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:38:17,301][01070] Avg episode reward: [(0, '20.239')] [2024-09-06 08:38:19,439][19110] Updated weights for policy 0, policy_version 1108 (0.0041) [2024-09-06 08:38:22,301][01070] Fps is (10 sec: 4504.4, 60 sec: 3959.3, 300 sec: 3488.1). Total num frames: 4546560. Throughput: 0: 1011.7. Samples: 134818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:22,304][01070] Avg episode reward: [(0, '20.506')] [2024-09-06 08:38:27,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 4562944. Throughput: 0: 977.9. Samples: 139534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:27,303][01070] Avg episode reward: [(0, '20.585')] [2024-09-06 08:38:31,030][19110] Updated weights for policy 0, policy_version 1118 (0.0046) [2024-09-06 08:38:32,299][01070] Fps is (10 sec: 3687.4, 60 sec: 3891.3, 300 sec: 3500.2). Total num frames: 4583424. Throughput: 0: 953.9. Samples: 142044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 08:38:32,306][01070] Avg episode reward: [(0, '20.308')] [2024-09-06 08:38:37,299][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3541.8). Total num frames: 4608000. Throughput: 0: 993.6. Samples: 149102. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:38:37,304][01070] Avg episode reward: [(0, '21.252')] [2024-09-06 08:38:40,626][19110] Updated weights for policy 0, policy_version 1128 (0.0030) [2024-09-06 08:38:42,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3534.3). Total num frames: 4624384. Throughput: 0: 994.4. Samples: 154730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:38:42,303][01070] Avg episode reward: [(0, '20.914')] [2024-09-06 08:38:47,299][01070] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3527.1). Total num frames: 4640768. Throughput: 0: 963.1. Samples: 156854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:38:47,307][01070] Avg episode reward: [(0, '20.844')] [2024-09-06 08:38:51,335][19110] Updated weights for policy 0, policy_version 1138 (0.0028) [2024-09-06 08:38:52,299][01070] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3564.6). Total num frames: 4665344. Throughput: 0: 963.3. Samples: 163312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:38:52,301][01070] Avg episode reward: [(0, '22.493')] [2024-09-06 08:38:52,304][19093] Saving new best policy, reward=22.493! [2024-09-06 08:38:57,299][01070] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3578.6). Total num frames: 4685824. Throughput: 0: 1017.3. Samples: 170010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:38:57,305][01070] Avg episode reward: [(0, '21.806')] [2024-09-06 08:39:02,299][01070] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 4698112. Throughput: 0: 985.0. Samples: 172050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:39:02,300][01070] Avg episode reward: [(0, '22.885')] [2024-09-06 08:39:02,305][19093] Saving new best policy, reward=22.885! [2024-09-06 08:39:02,950][19110] Updated weights for policy 0, policy_version 1148 (0.0035) [2024-09-06 08:39:07,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3563.5). Total num frames: 4718592. Throughput: 0: 948.3. Samples: 177488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:39:07,301][01070] Avg episode reward: [(0, '23.190')] [2024-09-06 08:39:07,318][19093] Saving new best policy, reward=23.190! 
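A "Saving new best policy, reward=..." line fires whenever the trailing average episode reward (stats_avg=100 in the configuration dumped later in this log) exceeds the best value seen so far, once save_best_after=100000 env steps have elapsed. A hedged sketch of that bookkeeping; the class and method names are invented for illustration, not Sample Factory's API.

import math

class BestPolicySaver:
    """Illustrative stand-in for the 'Saving new best policy' logic."""
    def __init__(self, save_best_after=100_000):
        self.best = -math.inf
        self.save_best_after = save_best_after   # cfg: save_best_after=100000

    def maybe_save(self, avg_reward, env_steps, save_fn):
        # only start tracking once enough env steps have elapsed
        if env_steps >= self.save_best_after and avg_reward > self.best:
            self.best = avg_reward
            print(f"Saving new best policy, reward={avg_reward:.3f}!")
            save_fn()  # persist the current model next to the regular checkpoints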
[2024-09-06 08:39:11,864][19110] Updated weights for policy 0, policy_version 1158 (0.0024) [2024-09-06 08:39:12,299][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3596.5). Total num frames: 4743168. Throughput: 0: 994.3. Samples: 184276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 08:39:12,303][01070] Avg episode reward: [(0, '23.069')] [2024-09-06 08:39:17,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3588.9). Total num frames: 4759552. Throughput: 0: 1001.5. Samples: 187112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 08:39:17,301][01070] Avg episode reward: [(0, '22.125')] [2024-09-06 08:39:22,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3581.6). Total num frames: 4775936. Throughput: 0: 941.3. Samples: 191462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 08:39:22,303][01070] Avg episode reward: [(0, '21.027')] [2024-09-06 08:39:23,456][19110] Updated weights for policy 0, policy_version 1168 (0.0022) [2024-09-06 08:39:27,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3611.9). Total num frames: 4800512. Throughput: 0: 973.6. Samples: 198540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:39:27,301][01070] Avg episode reward: [(0, '19.180')] [2024-09-06 08:39:27,314][19093] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001172_4800512.pth... [2024-09-06 08:39:27,463][19093] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2024-09-06 08:39:32,300][01070] Fps is (10 sec: 4504.9, 60 sec: 3959.4, 300 sec: 3622.7). Total num frames: 4820992. Throughput: 0: 1000.5. Samples: 201878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 08:39:32,303][01070] Avg episode reward: [(0, '20.069')] [2024-09-06 08:39:33,797][19110] Updated weights for policy 0, policy_version 1178 (0.0024) [2024-09-06 08:39:37,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3597.4). Total num frames: 4833280. Throughput: 0: 959.6. Samples: 206492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 08:39:37,301][01070] Avg episode reward: [(0, '19.170')] [2024-09-06 08:39:42,299][01070] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3625.4). Total num frames: 4857856. Throughput: 0: 947.1. Samples: 212628. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 08:39:42,306][01070] Avg episode reward: [(0, '20.382')] [2024-09-06 08:39:43,732][19110] Updated weights for policy 0, policy_version 1188 (0.0024) [2024-09-06 08:39:47,299][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3635.2). Total num frames: 4878336. Throughput: 0: 980.8. Samples: 216188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:39:47,313][01070] Avg episode reward: [(0, '21.641')] [2024-09-06 08:39:52,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3627.9). Total num frames: 4894720. Throughput: 0: 984.4. Samples: 221786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:39:52,301][01070] Avg episode reward: [(0, '21.803')] [2024-09-06 08:39:55,248][19110] Updated weights for policy 0, policy_version 1198 (0.0031) [2024-09-06 08:39:57,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3637.2). Total num frames: 4915200. Throughput: 0: 951.6. Samples: 227098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:39:57,304][01070] Avg episode reward: [(0, '22.439')] [2024-09-06 08:40:02,299][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3662.3). Total num frames: 4939776. 
Throughput: 0: 968.4. Samples: 230692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:40:02,305][01070] Avg episode reward: [(0, '23.387')] [2024-09-06 08:40:02,307][19093] Saving new best policy, reward=23.387! [2024-09-06 08:40:04,048][19110] Updated weights for policy 0, policy_version 1208 (0.0019) [2024-09-06 08:40:07,299][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3654.9). Total num frames: 4956160. Throughput: 0: 1015.8. Samples: 237174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:40:07,304][01070] Avg episode reward: [(0, '25.478')] [2024-09-06 08:40:07,312][19093] Saving new best policy, reward=25.478! [2024-09-06 08:40:12,299][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3647.8). Total num frames: 4972544. Throughput: 0: 950.4. Samples: 241310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 08:40:12,301][01070] Avg episode reward: [(0, '26.573')] [2024-09-06 08:40:12,304][19093] Saving new best policy, reward=26.573! [2024-09-06 08:40:15,918][19110] Updated weights for policy 0, policy_version 1218 (0.0034) [2024-09-06 08:40:17,299][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3656.1). Total num frames: 4993024. Throughput: 0: 947.8. Samples: 244528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 08:40:17,301][01070] Avg episode reward: [(0, '26.655')] [2024-09-06 08:40:17,310][19093] Saving new best policy, reward=26.655! [2024-09-06 08:40:19,413][19093] Stopping Batcher_0... [2024-09-06 08:40:19,413][01070] Component Batcher_0 stopped! [2024-09-06 08:40:19,415][19093] Loop batcher_evt_loop terminating... [2024-09-06 08:40:19,420][19093] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-09-06 08:40:19,470][19110] Weights refcount: 2 0 [2024-09-06 08:40:19,478][01070] Component InferenceWorker_p0-w0 stopped! [2024-09-06 08:40:19,481][19110] Stopping InferenceWorker_p0-w0... [2024-09-06 08:40:19,481][19110] Loop inference_proc0-0_evt_loop terminating... [2024-09-06 08:40:19,543][19093] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001058_4333568.pth [2024-09-06 08:40:19,557][19093] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-09-06 08:40:19,761][01070] Component LearnerWorker_p0 stopped! [2024-09-06 08:40:19,761][19093] Stopping LearnerWorker_p0... [2024-09-06 08:40:19,766][19093] Loop learner_proc0_evt_loop terminating... [2024-09-06 08:40:19,774][01070] Component RolloutWorker_w3 stopped! [2024-09-06 08:40:19,777][19115] Stopping RolloutWorker_w3... [2024-09-06 08:40:19,779][19115] Loop rollout_proc3_evt_loop terminating... [2024-09-06 08:40:19,804][01070] Component RolloutWorker_w1 stopped! [2024-09-06 08:40:19,806][19111] Stopping RolloutWorker_w1... [2024-09-06 08:40:19,813][19111] Loop rollout_proc1_evt_loop terminating... [2024-09-06 08:40:19,874][01070] Component RolloutWorker_w7 stopped! [2024-09-06 08:40:19,880][19117] Stopping RolloutWorker_w7... [2024-09-06 08:40:19,888][19117] Loop rollout_proc7_evt_loop terminating... [2024-09-06 08:40:19,902][01070] Component RolloutWorker_w5 stopped! [2024-09-06 08:40:19,904][19116] Stopping RolloutWorker_w5... [2024-09-06 08:40:19,909][19116] Loop rollout_proc5_evt_loop terminating... [2024-09-06 08:40:20,003][19118] Stopping RolloutWorker_w6... [2024-09-06 08:40:20,003][19118] Loop rollout_proc6_evt_loop terminating... [2024-09-06 08:40:20,003][01070] Component RolloutWorker_w6 stopped! 
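Checkpoint files in this run are named checkpoint_{train_step}_{env_steps}.pth with a nine-digit zero-padded train step (e.g. checkpoint_000001222_5005312.pth), and each new save is paired with removal of the oldest file, consistent with keep_checkpoints=2 in the configuration. A sketch of that rotation; the regex and helper are inferred from the filenames above, not taken from Sample Factory's source.

import re
from pathlib import Path

CKPT_RE = re.compile(r"checkpoint_(\d{9})_(\d+)\.pth")  # train_step, env_steps

def rotate_checkpoints(ckpt_dir, keep=2):
    """Delete the oldest checkpoints so at most `keep` remain, mirroring the
    paired Saving/Removing lines above (cfg: keep_checkpoints=2)."""
    ckpts = sorted(
        (p for p in Path(ckpt_dir).glob("checkpoint_*.pth") if CKPT_RE.match(p.name)),
        key=lambda p: int(CKPT_RE.match(p.name).group(1)),  # sort by train_step
    )
    for stale in ckpts[:-keep]:
        print(f"Removing {stale}")
        stale.unlink()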
[2024-09-06 08:40:20,035][19113] Stopping RolloutWorker_w2... [2024-09-06 08:40:20,037][19113] Loop rollout_proc2_evt_loop terminating... [2024-09-06 08:40:20,035][01070] Component RolloutWorker_w2 stopped! [2024-09-06 08:40:20,058][19114] Stopping RolloutWorker_w4... [2024-09-06 08:40:20,057][01070] Component RolloutWorker_w4 stopped! [2024-09-06 08:40:20,062][19114] Loop rollout_proc4_evt_loop terminating... [2024-09-06 08:40:20,066][19112] Stopping RolloutWorker_w0... [2024-09-06 08:40:20,066][01070] Component RolloutWorker_w0 stopped! [2024-09-06 08:40:20,068][01070] Waiting for process learner_proc0 to stop... [2024-09-06 08:40:20,078][19112] Loop rollout_proc0_evt_loop terminating... [2024-09-06 08:40:21,230][01070] Waiting for process inference_proc0-0 to join... [2024-09-06 08:40:21,233][01070] Waiting for process rollout_proc0 to join... [2024-09-06 08:40:24,143][01070] Waiting for process rollout_proc1 to join... [2024-09-06 08:40:24,148][01070] Waiting for process rollout_proc2 to join... [2024-09-06 08:40:24,150][01070] Waiting for process rollout_proc3 to join... [2024-09-06 08:40:24,153][01070] Waiting for process rollout_proc4 to join... [2024-09-06 08:40:24,159][01070] Waiting for process rollout_proc5 to join... [2024-09-06 08:40:24,162][01070] Waiting for process rollout_proc6 to join... [2024-09-06 08:40:24,165][01070] Waiting for process rollout_proc7 to join... [2024-09-06 08:40:24,168][01070] Batcher 0 profile tree view: batching: 7.2186, releasing_batches: 0.0064 [2024-09-06 08:40:24,170][01070] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0054 wait_policy_total: 102.6532 update_model: 2.2548 weight_update: 0.0030 one_step: 0.0026 handle_policy_step: 154.4121 deserialize: 3.6770, stack: 0.8213, obs_to_device_normalize: 31.1067, forward: 82.7946, send_messages: 7.5294 prepare_outputs: 21.0675 to_cpu: 12.5076 [2024-09-06 08:40:24,171][01070] Learner 0 profile tree view: misc: 0.0012, prepare_batch: 5.2823 train: 21.8730 epoch_init: 0.0014, minibatch_init: 0.0016, losses_postprocess: 0.1603, kl_divergence: 0.2048, after_optimizer: 0.9110 calculate_losses: 8.7013 losses_init: 0.0013, forward_head: 0.6601, bptt_initial: 6.1101, tail: 0.3325, advantages_returns: 0.0894, losses: 0.9715 bptt: 0.4678 bptt_forward_core: 0.4488 update: 11.7699 clip: 0.2453 [2024-09-06 08:40:24,173][01070] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0619, enqueue_policy_requests: 23.3356, env_step: 206.1955, overhead: 3.3833, complete_rollouts: 1.8563 save_policy_outputs: 5.2619 split_output_tensors: 2.1953 [2024-09-06 08:40:24,175][01070] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0745, enqueue_policy_requests: 24.4494, env_step: 203.3563, overhead: 3.1667, complete_rollouts: 1.6788 save_policy_outputs: 4.9515 split_output_tensors: 2.0046 [2024-09-06 08:40:24,177][01070] Loop Runner_EvtLoop terminating... [2024-09-06 08:40:24,179][01070] Runner profile tree view: main_loop: 294.6944 [2024-09-06 08:40:24,180][01070] Collected {0: 5005312}, FPS: 3391.4 [2024-09-06 08:56:44,318][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 08:56:44,320][01070] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-06 08:56:44,322][01070] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-06 08:56:44,324][01070] Adding new argument 'save_video'=True that is not in the saved config file! 
[2024-09-06 08:56:44,326][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 08:56:44,327][01070] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-06 08:56:44,328][01070] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 08:56:44,329][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-06 08:56:44,331][01070] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-09-06 08:56:44,332][01070] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-09-06 08:56:44,333][01070] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-06 08:56:44,334][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-06 08:56:44,339][01070] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-06 08:56:44,340][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-06 08:56:44,341][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 08:56:44,368][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 08:56:44,370][01070] RunningMeanStd input shape: (1,) [2024-09-06 08:56:44,390][01070] ConvEncoder: input_channels=3 [2024-09-06 08:56:44,435][01070] Conv encoder output size: 512 [2024-09-06 08:56:44,436][01070] Policy head output size: 512 [2024-09-06 08:56:44,457][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-09-06 08:56:44,883][01070] Num frames 100... [2024-09-06 08:56:45,006][01070] Num frames 200... [2024-09-06 08:56:45,128][01070] Num frames 300... [2024-09-06 08:56:45,247][01070] Num frames 400... [2024-09-06 08:56:45,367][01070] Num frames 500... [2024-09-06 08:56:45,497][01070] Num frames 600... [2024-09-06 08:56:45,618][01070] Num frames 700... [2024-09-06 08:56:45,741][01070] Num frames 800... [2024-09-06 08:56:45,860][01070] Num frames 900... [2024-09-06 08:56:45,978][01070] Num frames 1000... [2024-09-06 08:56:46,102][01070] Num frames 1100... [2024-09-06 08:56:46,224][01070] Num frames 1200... [2024-09-06 08:56:46,348][01070] Num frames 1300... [2024-09-06 08:56:46,479][01070] Num frames 1400... [2024-09-06 08:56:46,606][01070] Num frames 1500... [2024-09-06 08:56:46,726][01070] Num frames 1600... [2024-09-06 08:56:46,845][01070] Num frames 1700... [2024-09-06 08:56:46,967][01070] Num frames 1800... [2024-09-06 08:56:47,110][01070] Avg episode rewards: #0: 46.719, true rewards: #0: 18.720 [2024-09-06 08:56:47,113][01070] Avg episode reward: 46.719, avg true_objective: 18.720 [2024-09-06 08:56:47,150][01070] Num frames 1900... [2024-09-06 08:56:47,268][01070] Num frames 2000... [2024-09-06 08:56:47,388][01070] Num frames 2100... [2024-09-06 08:56:47,523][01070] Num frames 2200... [2024-09-06 08:56:47,671][01070] Num frames 2300... [2024-09-06 08:56:47,751][01070] Avg episode rewards: #0: 27.100, true rewards: #0: 11.600 [2024-09-06 08:56:47,752][01070] Avg episode reward: 27.100, avg true_objective: 11.600 [2024-09-06 08:56:47,854][01070] Num frames 2400... [2024-09-06 08:56:47,975][01070] Num frames 2500... [2024-09-06 08:56:48,094][01070] Num frames 2600... [2024-09-06 08:56:48,217][01070] Num frames 2700... [2024-09-06 08:56:48,342][01070] Num frames 2800... 
[2024-09-06 08:56:48,467][01070] Num frames 2900... [2024-09-06 08:56:48,602][01070] Num frames 3000... [2024-09-06 08:56:48,749][01070] Num frames 3100... [2024-09-06 08:56:48,919][01070] Num frames 3200... [2024-09-06 08:56:49,093][01070] Num frames 3300... [2024-09-06 08:56:49,273][01070] Avg episode rewards: #0: 26.253, true rewards: #0: 11.253 [2024-09-06 08:56:49,277][01070] Avg episode reward: 26.253, avg true_objective: 11.253 [2024-09-06 08:56:49,319][01070] Num frames 3400... [2024-09-06 08:56:49,496][01070] Num frames 3500... [2024-09-06 08:56:49,661][01070] Num frames 3600... [2024-09-06 08:56:49,822][01070] Num frames 3700... [2024-09-06 08:56:50,036][01070] Avg episode rewards: #0: 21.230, true rewards: #0: 9.480 [2024-09-06 08:56:50,038][01070] Avg episode reward: 21.230, avg true_objective: 9.480 [2024-09-06 08:56:50,057][01070] Num frames 3800... [2024-09-06 08:56:50,224][01070] Num frames 3900... [2024-09-06 08:56:50,397][01070] Num frames 4000... [2024-09-06 08:56:50,585][01070] Num frames 4100... [2024-09-06 08:56:50,755][01070] Num frames 4200... [2024-09-06 08:56:50,944][01070] Num frames 4300... [2024-09-06 08:56:51,125][01070] Num frames 4400... [2024-09-06 08:56:51,259][01070] Num frames 4500... [2024-09-06 08:56:51,382][01070] Num frames 4600... [2024-09-06 08:56:51,504][01070] Num frames 4700... [2024-09-06 08:56:51,623][01070] Num frames 4800... [2024-09-06 08:56:51,752][01070] Num frames 4900... [2024-09-06 08:56:51,870][01070] Num frames 5000... [2024-09-06 08:56:51,988][01070] Num frames 5100... [2024-09-06 08:56:52,111][01070] Num frames 5200... [2024-09-06 08:56:52,228][01070] Num frames 5300... [2024-09-06 08:56:52,393][01070] Avg episode rewards: #0: 24.784, true rewards: #0: 10.784 [2024-09-06 08:56:52,394][01070] Avg episode reward: 24.784, avg true_objective: 10.784 [2024-09-06 08:56:52,408][01070] Num frames 5400... [2024-09-06 08:56:52,534][01070] Num frames 5500... [2024-09-06 08:56:52,657][01070] Num frames 5600... [2024-09-06 08:56:52,788][01070] Num frames 5700... [2024-09-06 08:56:52,907][01070] Num frames 5800... [2024-09-06 08:56:53,028][01070] Num frames 5900... [2024-09-06 08:56:53,148][01070] Num frames 6000... [2024-09-06 08:56:53,268][01070] Num frames 6100... [2024-09-06 08:56:53,388][01070] Num frames 6200... [2024-09-06 08:56:53,515][01070] Num frames 6300... [2024-09-06 08:56:53,638][01070] Num frames 6400... [2024-09-06 08:56:53,779][01070] Num frames 6500... [2024-09-06 08:56:53,901][01070] Num frames 6600... [2024-09-06 08:56:54,025][01070] Num frames 6700... [2024-09-06 08:56:54,146][01070] Num frames 6800... [2024-09-06 08:56:54,268][01070] Num frames 6900... [2024-09-06 08:56:54,390][01070] Num frames 7000... [2024-09-06 08:56:54,560][01070] Avg episode rewards: #0: 28.313, true rewards: #0: 11.813 [2024-09-06 08:56:54,562][01070] Avg episode reward: 28.313, avg true_objective: 11.813 [2024-09-06 08:56:54,581][01070] Num frames 7100... [2024-09-06 08:56:54,699][01070] Num frames 7200... [2024-09-06 08:56:54,827][01070] Num frames 7300... [2024-09-06 08:56:54,945][01070] Num frames 7400... [2024-09-06 08:56:55,064][01070] Num frames 7500... [2024-09-06 08:56:55,188][01070] Num frames 7600... [2024-09-06 08:56:55,281][01070] Avg episode rewards: #0: 25.760, true rewards: #0: 10.903 [2024-09-06 08:56:55,283][01070] Avg episode reward: 25.760, avg true_objective: 10.903 [2024-09-06 08:56:55,366][01070] Num frames 7700... [2024-09-06 08:56:55,489][01070] Num frames 7800... [2024-09-06 08:56:55,612][01070] Num frames 7900... 
[2024-09-06 08:56:55,732][01070] Num frames 8000... [2024-09-06 08:56:55,892][01070] Avg episode rewards: #0: 23.600, true rewards: #0: 10.100 [2024-09-06 08:56:55,894][01070] Avg episode reward: 23.600, avg true_objective: 10.100 [2024-09-06 08:56:55,921][01070] Num frames 8100... [2024-09-06 08:56:56,039][01070] Num frames 8200... [2024-09-06 08:56:56,159][01070] Num frames 8300... [2024-09-06 08:56:56,280][01070] Num frames 8400... [2024-09-06 08:56:56,402][01070] Num frames 8500... [2024-09-06 08:56:56,532][01070] Num frames 8600... [2024-09-06 08:56:56,656][01070] Num frames 8700... [2024-09-06 08:56:56,782][01070] Num frames 8800... [2024-09-06 08:56:56,911][01070] Num frames 8900... [2024-09-06 08:56:57,036][01070] Num frames 9000... [2024-09-06 08:56:57,162][01070] Num frames 9100... [2024-09-06 08:56:57,286][01070] Num frames 9200... [2024-09-06 08:56:57,409][01070] Num frames 9300... [2024-09-06 08:56:57,498][01070] Avg episode rewards: #0: 24.253, true rewards: #0: 10.364 [2024-09-06 08:56:57,500][01070] Avg episode reward: 24.253, avg true_objective: 10.364 [2024-09-06 08:56:57,588][01070] Num frames 9400... [2024-09-06 08:56:57,710][01070] Num frames 9500... [2024-09-06 08:56:57,848][01070] Num frames 9600... [2024-09-06 08:56:57,973][01070] Num frames 9700... [2024-09-06 08:56:58,097][01070] Num frames 9800... [2024-09-06 08:56:58,209][01070] Avg episode rewards: #0: 22.746, true rewards: #0: 9.846 [2024-09-06 08:56:58,211][01070] Avg episode reward: 22.746, avg true_objective: 9.846 [2024-09-06 08:57:59,802][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-06 09:00:55,778][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 09:00:55,780][01070] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-06 09:00:55,782][01070] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-06 09:00:55,784][01070] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-06 09:00:55,786][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 09:00:55,789][01070] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-06 09:00:55,791][01070] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-09-06 09:00:55,792][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-06 09:00:55,793][01070] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-06 09:00:55,794][01070] Adding new argument 'hf_repository'='Re-Re/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-06 09:00:55,795][01070] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-06 09:00:55,796][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-06 09:00:55,797][01070] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-06 09:00:55,798][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
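The overrides above describe an evaluation ("enjoy") run that records a video and pushes the result to the Hub. A plausible reconstruction of the invocation, using only flags that appear in the overrides; the sf_examples.vizdoom.enjoy_vizdoom entry point is an assumption about how the script was launched, not something recorded in this log.

import subprocess

subprocess.run([
    "python", "-m", "sf_examples.vizdoom.enjoy_vizdoom",  # entry point assumed
    "--env=doom_health_gathering_supreme",
    "--num_workers=1",          # 'Overriding arg num_workers with value 1'
    "--no_render",
    "--save_video",
    "--max_num_frames=100000",
    "--max_num_episodes=10",
    "--push_to_hub",
    "--hf_repository=Re-Re/rl_course_vizdoom_health_gathering_supreme",
], check=True)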
[2024-09-06 09:00:55,799][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 09:00:55,830][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 09:00:55,832][01070] RunningMeanStd input shape: (1,) [2024-09-06 09:00:55,846][01070] ConvEncoder: input_channels=3 [2024-09-06 09:00:55,882][01070] Conv encoder output size: 512 [2024-09-06 09:00:55,884][01070] Policy head output size: 512 [2024-09-06 09:00:55,904][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-09-06 09:00:56,320][01070] Num frames 100... [2024-09-06 09:00:56,441][01070] Num frames 200... [2024-09-06 09:00:56,597][01070] Num frames 300... [2024-09-06 09:00:56,715][01070] Num frames 400... [2024-09-06 09:00:56,840][01070] Num frames 500... [2024-09-06 09:00:56,960][01070] Num frames 600... [2024-09-06 09:00:57,080][01070] Num frames 700... [2024-09-06 09:00:57,182][01070] Avg episode rewards: #0: 16.380, true rewards: #0: 7.380 [2024-09-06 09:00:57,184][01070] Avg episode reward: 16.380, avg true_objective: 7.380 [2024-09-06 09:00:57,264][01070] Num frames 800... [2024-09-06 09:00:57,399][01070] Num frames 900... [2024-09-06 09:00:57,529][01070] Num frames 1000... [2024-09-06 09:00:57,650][01070] Num frames 1100... [2024-09-06 09:00:57,775][01070] Num frames 1200... [2024-09-06 09:00:57,898][01070] Num frames 1300... [2024-09-06 09:00:58,016][01070] Num frames 1400... [2024-09-06 09:00:58,136][01070] Num frames 1500... [2024-09-06 09:00:58,255][01070] Num frames 1600... [2024-09-06 09:00:58,314][01070] Avg episode rewards: #0: 18.010, true rewards: #0: 8.010 [2024-09-06 09:00:58,316][01070] Avg episode reward: 18.010, avg true_objective: 8.010 [2024-09-06 09:00:58,433][01070] Num frames 1700... [2024-09-06 09:00:58,562][01070] Num frames 1800... [2024-09-06 09:00:58,686][01070] Num frames 1900... [2024-09-06 09:00:58,804][01070] Num frames 2000... [2024-09-06 09:00:58,927][01070] Num frames 2100... [2024-09-06 09:00:59,047][01070] Num frames 2200... [2024-09-06 09:00:59,165][01070] Num frames 2300... [2024-09-06 09:00:59,285][01070] Num frames 2400... [2024-09-06 09:00:59,413][01070] Num frames 2500... [2024-09-06 09:00:59,544][01070] Num frames 2600... [2024-09-06 09:00:59,669][01070] Num frames 2700... [2024-09-06 09:00:59,789][01070] Num frames 2800... [2024-09-06 09:00:59,937][01070] Num frames 2900... [2024-09-06 09:01:00,112][01070] Num frames 3000... [2024-09-06 09:01:00,278][01070] Num frames 3100... [2024-09-06 09:01:00,453][01070] Num frames 3200... [2024-09-06 09:01:00,623][01070] Num frames 3300... [2024-09-06 09:01:00,786][01070] Num frames 3400... [2024-09-06 09:01:00,952][01070] Num frames 3500... [2024-09-06 09:01:01,119][01070] Num frames 3600... [2024-09-06 09:01:01,294][01070] Num frames 3700... [2024-09-06 09:01:01,356][01070] Avg episode rewards: #0: 30.673, true rewards: #0: 12.340 [2024-09-06 09:01:01,357][01070] Avg episode reward: 30.673, avg true_objective: 12.340 [2024-09-06 09:01:01,535][01070] Num frames 3800... [2024-09-06 09:01:01,711][01070] Num frames 3900... [2024-09-06 09:01:01,881][01070] Num frames 4000... [2024-09-06 09:01:02,049][01070] Num frames 4100... [2024-09-06 09:01:02,222][01070] Num frames 4200... [2024-09-06 09:01:02,396][01070] Num frames 4300... [2024-09-06 09:01:02,534][01070] Num frames 4400... [2024-09-06 09:01:02,655][01070] Num frames 4500... [2024-09-06 09:01:02,776][01070] Num frames 4600... [2024-09-06 09:01:02,898][01070] Num frames 4700... 
[2024-09-06 09:01:03,020][01070] Num frames 4800... [2024-09-06 09:01:03,140][01070] Num frames 4900... [2024-09-06 09:01:03,261][01070] Num frames 5000... [2024-09-06 09:01:03,383][01070] Num frames 5100... [2024-09-06 09:01:03,519][01070] Num frames 5200... [2024-09-06 09:01:03,641][01070] Num frames 5300... [2024-09-06 09:01:03,817][01070] Avg episode rewards: #0: 34.225, true rewards: #0: 13.475 [2024-09-06 09:01:03,818][01070] Avg episode reward: 34.225, avg true_objective: 13.475 [2024-09-06 09:01:03,835][01070] Num frames 5400... [2024-09-06 09:01:03,956][01070] Num frames 5500... [2024-09-06 09:01:04,074][01070] Num frames 5600... [2024-09-06 09:01:04,194][01070] Num frames 5700... [2024-09-06 09:01:04,314][01070] Num frames 5800... [2024-09-06 09:01:04,436][01070] Num frames 5900... [2024-09-06 09:01:04,574][01070] Num frames 6000... [2024-09-06 09:01:04,693][01070] Num frames 6100... [2024-09-06 09:01:04,810][01070] Num frames 6200... [2024-09-06 09:01:04,934][01070] Num frames 6300... [2024-09-06 09:01:05,057][01070] Num frames 6400... [2024-09-06 09:01:05,177][01070] Num frames 6500... [2024-09-06 09:01:05,296][01070] Num frames 6600... [2024-09-06 09:01:05,424][01070] Num frames 6700... [2024-09-06 09:01:05,562][01070] Num frames 6800... [2024-09-06 09:01:05,698][01070] Num frames 6900... [2024-09-06 09:01:05,816][01070] Num frames 7000... [2024-09-06 09:01:05,936][01070] Num frames 7100... [2024-09-06 09:01:06,057][01070] Num frames 7200... [2024-09-06 09:01:06,179][01070] Num frames 7300... [2024-09-06 09:01:06,303][01070] Num frames 7400... [2024-09-06 09:01:06,469][01070] Avg episode rewards: #0: 38.779, true rewards: #0: 14.980 [2024-09-06 09:01:06,474][01070] Avg episode reward: 38.779, avg true_objective: 14.980 [2024-09-06 09:01:06,493][01070] Num frames 7500... [2024-09-06 09:01:06,629][01070] Num frames 7600... [2024-09-06 09:01:06,753][01070] Num frames 7700... [2024-09-06 09:01:06,875][01070] Num frames 7800... [2024-09-06 09:01:06,995][01070] Num frames 7900... [2024-09-06 09:01:07,118][01070] Num frames 8000... [2024-09-06 09:01:07,240][01070] Num frames 8100... [2024-09-06 09:01:07,359][01070] Num frames 8200... [2024-09-06 09:01:07,476][01070] Avg episode rewards: #0: 34.920, true rewards: #0: 13.753 [2024-09-06 09:01:07,478][01070] Avg episode reward: 34.920, avg true_objective: 13.753 [2024-09-06 09:01:07,541][01070] Num frames 8300... [2024-09-06 09:01:07,670][01070] Num frames 8400... [2024-09-06 09:01:07,789][01070] Num frames 8500... [2024-09-06 09:01:07,912][01070] Num frames 8600... [2024-09-06 09:01:08,030][01070] Num frames 8700... [2024-09-06 09:01:08,150][01070] Num frames 8800... [2024-09-06 09:01:08,275][01070] Num frames 8900... [2024-09-06 09:01:08,400][01070] Avg episode rewards: #0: 31.794, true rewards: #0: 12.794 [2024-09-06 09:01:08,402][01070] Avg episode reward: 31.794, avg true_objective: 12.794 [2024-09-06 09:01:08,457][01070] Num frames 9000... [2024-09-06 09:01:08,587][01070] Num frames 9100... [2024-09-06 09:01:08,723][01070] Num frames 9200... [2024-09-06 09:01:08,851][01070] Num frames 9300... [2024-09-06 09:01:08,979][01070] Num frames 9400... [2024-09-06 09:01:09,106][01070] Num frames 9500... [2024-09-06 09:01:09,230][01070] Num frames 9600... [2024-09-06 09:01:09,356][01070] Num frames 9700... [2024-09-06 09:01:09,487][01070] Num frames 9800... [2024-09-06 09:01:09,614][01070] Num frames 9900... [2024-09-06 09:01:09,745][01070] Num frames 10000... [2024-09-06 09:01:09,870][01070] Num frames 10100... 
[2024-09-06 09:01:09,995][01070] Num frames 10200... [2024-09-06 09:01:10,118][01070] Num frames 10300... [2024-09-06 09:01:10,238][01070] Num frames 10400... [2024-09-06 09:01:10,360][01070] Num frames 10500... [2024-09-06 09:01:10,493][01070] Num frames 10600... [2024-09-06 09:01:10,614][01070] Num frames 10700... [2024-09-06 09:01:10,741][01070] Num frames 10800... [2024-09-06 09:01:10,866][01070] Num frames 10900... [2024-09-06 09:01:10,993][01070] Num frames 11000... [2024-09-06 09:01:11,116][01070] Avg episode rewards: #0: 34.820, true rewards: #0: 13.820 [2024-09-06 09:01:11,118][01070] Avg episode reward: 34.820, avg true_objective: 13.820 [2024-09-06 09:01:11,175][01070] Num frames 11100... [2024-09-06 09:01:11,294][01070] Num frames 11200... [2024-09-06 09:01:11,414][01070] Num frames 11300... [2024-09-06 09:01:11,549][01070] Num frames 11400... [2024-09-06 09:01:11,615][01070] Avg episode rewards: #0: 31.453, true rewards: #0: 12.676 [2024-09-06 09:01:11,617][01070] Avg episode reward: 31.453, avg true_objective: 12.676 [2024-09-06 09:01:11,740][01070] Num frames 11500... [2024-09-06 09:01:11,869][01070] Num frames 11600... [2024-09-06 09:01:11,989][01070] Num frames 11700... [2024-09-06 09:01:12,106][01070] Num frames 11800... [2024-09-06 09:01:12,231][01070] Avg episode rewards: #0: 29.156, true rewards: #0: 11.856 [2024-09-06 09:01:12,233][01070] Avg episode reward: 29.156, avg true_objective: 11.856 [2024-09-06 09:02:25,893][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-06 09:02:31,093][01070] The model has been pushed to https://huggingface.co/Re-Re/rl_course_vizdoom_health_gathering_supreme [2024-09-06 09:06:01,283][01070] Environment doom_basic already registered, overwriting... [2024-09-06 09:06:01,285][01070] Environment doom_two_colors_easy already registered, overwriting... [2024-09-06 09:06:01,287][01070] Environment doom_two_colors_hard already registered, overwriting... [2024-09-06 09:06:01,290][01070] Environment doom_dm already registered, overwriting... [2024-09-06 09:06:01,293][01070] Environment doom_dwango5 already registered, overwriting... [2024-09-06 09:06:01,295][01070] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-09-06 09:06:01,296][01070] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-09-06 09:06:01,298][01070] Environment doom_my_way_home already registered, overwriting... [2024-09-06 09:06:01,300][01070] Environment doom_deadly_corridor already registered, overwriting... [2024-09-06 09:06:01,302][01070] Environment doom_defend_the_center already registered, overwriting... [2024-09-06 09:06:01,304][01070] Environment doom_defend_the_line already registered, overwriting... [2024-09-06 09:06:01,306][01070] Environment doom_health_gathering already registered, overwriting... [2024-09-06 09:06:01,308][01070] Environment doom_health_gathering_supreme already registered, overwriting... [2024-09-06 09:06:01,310][01070] Environment doom_battle already registered, overwriting... [2024-09-06 09:06:01,313][01070] Environment doom_battle2 already registered, overwriting... [2024-09-06 09:06:01,315][01070] Environment doom_duel_bots already registered, overwriting... [2024-09-06 09:06:01,316][01070] Environment doom_deathmatch_bots already registered, overwriting... [2024-09-06 09:06:01,318][01070] Environment doom_duel already registered, overwriting... 
[2024-09-06 09:06:01,320][01070] Environment doom_deathmatch_full already registered, overwriting... [2024-09-06 09:06:01,322][01070] Environment doom_benchmark already registered, overwriting... [2024-09-06 09:06:01,323][01070] register_encoder_factory: [2024-09-06 09:06:01,341][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 09:06:01,342][01070] Overriding arg 'train_for_env_steps' with value 7500000 passed from command line [2024-09-06 09:06:01,349][01070] Experiment dir /content/train_dir/default_experiment already exists! [2024-09-06 09:06:01,350][01070] Resuming existing experiment from /content/train_dir/default_experiment... [2024-09-06 09:06:01,352][01070] Weights and Biases integration disabled [2024-09-06 09:06:01,355][01070] Environment var CUDA_VISIBLE_DEVICES is 0 [2024-09-06 09:06:03,516][01070] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/content/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=4 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=7500000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=False wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 
start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=unknown git_repo_name=not a git repository [2024-09-06 09:06:03,519][01070] Saving configuration to /content/train_dir/default_experiment/config.json... [2024-09-06 09:06:03,524][01070] Rollout worker 0 uses device cpu [2024-09-06 09:06:03,526][01070] Rollout worker 1 uses device cpu [2024-09-06 09:06:03,527][01070] Rollout worker 2 uses device cpu [2024-09-06 09:06:03,529][01070] Rollout worker 3 uses device cpu [2024-09-06 09:06:03,530][01070] Rollout worker 4 uses device cpu [2024-09-06 09:06:03,531][01070] Rollout worker 5 uses device cpu [2024-09-06 09:06:03,532][01070] Rollout worker 6 uses device cpu [2024-09-06 09:06:03,534][01070] Rollout worker 7 uses device cpu [2024-09-06 09:06:03,608][01070] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 09:06:03,610][01070] InferenceWorker_p0-w0: min num requests: 2 [2024-09-06 09:06:03,643][01070] Starting all processes... [2024-09-06 09:06:03,645][01070] Starting process learner_proc0 [2024-09-06 09:06:03,693][01070] Starting all processes... [2024-09-06 09:06:03,699][01070] Starting process inference_proc0-0 [2024-09-06 09:06:03,700][01070] Starting process rollout_proc0 [2024-09-06 09:06:03,701][01070] Starting process rollout_proc1 [2024-09-06 09:06:03,701][01070] Starting process rollout_proc2 [2024-09-06 09:06:03,702][01070] Starting process rollout_proc3 [2024-09-06 09:06:03,702][01070] Starting process rollout_proc4 [2024-09-06 09:06:03,702][01070] Starting process rollout_proc5 [2024-09-06 09:06:03,702][01070] Starting process rollout_proc6 [2024-09-06 09:06:03,702][01070] Starting process rollout_proc7 [2024-09-06 09:06:20,401][26905] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 09:06:20,401][26905] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2024-09-06 09:06:20,451][26905] Num visible devices: 1 [2024-09-06 09:06:20,487][26905] Starting seed is not provided [2024-09-06 09:06:20,488][26905] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 09:06:20,489][26905] Initializing actor-critic model on device cuda:0 [2024-09-06 09:06:20,489][26905] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 09:06:20,491][26905] RunningMeanStd input shape: (1,) [2024-09-06 09:06:20,545][26905] ConvEncoder: input_channels=3 [2024-09-06 09:06:20,881][26918] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 09:06:20,882][26918] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2024-09-06 09:06:20,909][26926] Worker 7 uses CPU cores [1] [2024-09-06 09:06:20,990][26918] Num visible devices: 1 [2024-09-06 09:06:21,196][26923] Worker 4 uses CPU cores [0] [2024-09-06 09:06:21,222][26924] Worker 5 uses CPU cores [1] [2024-09-06 09:06:21,235][26919] Worker 0 uses CPU cores [0] [2024-09-06 09:06:21,331][26920] Worker 1 uses CPU cores [1] [2024-09-06 09:06:21,331][26921] Worker 2 uses CPU cores [0] [2024-09-06 09:06:21,344][26905] Conv encoder output size: 512 [2024-09-06 09:06:21,345][26905] Policy head output size: 512 [2024-09-06 09:06:21,393][26922] Worker 3 uses CPU cores [1] [2024-09-06 09:06:21,395][26905] Created Actor 
Critic model with architecture: [2024-09-06 09:06:21,395][26905] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2024-09-06 09:06:21,429][26925] Worker 6 uses CPU cores [0] [2024-09-06 09:06:21,538][26905] Using optimizer [2024-09-06 09:06:22,136][26905] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2024-09-06 09:06:22,176][26905] Loading model from checkpoint [2024-09-06 09:06:22,178][26905] Loaded experiment state at self.train_step=1222, self.env_steps=5005312 [2024-09-06 09:06:22,178][26905] Initialized policy 0 weights for model version 1222 [2024-09-06 09:06:22,183][26905] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2024-09-06 09:06:22,190][26905] LearnerWorker_p0 finished initialization! [2024-09-06 09:06:22,275][26918] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 09:06:22,276][26918] RunningMeanStd input shape: (1,) [2024-09-06 09:06:22,288][26918] ConvEncoder: input_channels=3 [2024-09-06 09:06:22,391][26918] Conv encoder output size: 512 [2024-09-06 09:06:22,391][26918] Policy head output size: 512 [2024-09-06 09:06:22,443][01070] Inference worker 0-0 is ready! [2024-09-06 09:06:22,445][01070] All inference workers are ready! Signal rollout workers to start! [2024-09-06 09:06:22,677][26920] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 09:06:22,689][26922] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 09:06:22,692][26924] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 09:06:22,683][26926] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 09:06:22,720][26919] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 09:06:22,725][26925] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 09:06:22,727][26921] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 09:06:22,715][26923] Doom resolution: 160x120, resize resolution: (128, 72) [2024-09-06 09:06:23,532][26919] Decorrelating experience for 0 frames... [2024-09-06 09:06:23,600][01070] Heartbeat connected on Batcher_0 [2024-09-06 09:06:23,605][01070] Heartbeat connected on LearnerWorker_p0 [2024-09-06 09:06:23,634][01070] Heartbeat connected on InferenceWorker_p0-w0 [2024-09-06 09:06:23,927][26923] Decorrelating experience for 0 frames... 
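Resuming restores both the network weights and the experiment counters (train_step=1222, env_steps=5005312 above) from the .pth file. A minimal sketch of such a load; the dictionary keys are assumptions for illustration rather than Sample Factory's exact checkpoint layout.

import torch

def load_experiment_state(model, checkpoint_path, device="cuda:0"):
    """Illustrative resume: restore weights plus the step counters. Key names
    ('model', 'train_step', 'env_steps') are assumed for this sketch."""
    state = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(state["model"])
    return state["train_step"], state["env_steps"]   # e.g. 1222, 5005312 above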
[2024-09-06 09:06:24,411][26922] Decorrelating experience for 0 frames... [2024-09-06 09:06:24,421][26924] Decorrelating experience for 0 frames... [2024-09-06 09:06:24,414][26920] Decorrelating experience for 0 frames... [2024-09-06 09:06:24,428][26926] Decorrelating experience for 0 frames... [2024-09-06 09:06:26,027][26919] Decorrelating experience for 32 frames... [2024-09-06 09:06:26,059][26921] Decorrelating experience for 0 frames... [2024-09-06 09:06:26,129][26922] Decorrelating experience for 32 frames... [2024-09-06 09:06:26,130][26920] Decorrelating experience for 32 frames... [2024-09-06 09:06:26,135][26924] Decorrelating experience for 32 frames... [2024-09-06 09:06:26,142][26926] Decorrelating experience for 32 frames... [2024-09-06 09:06:26,359][01070] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 5005312. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 09:06:27,542][26921] Decorrelating experience for 32 frames... [2024-09-06 09:06:27,689][26923] Decorrelating experience for 32 frames... [2024-09-06 09:06:27,841][26920] Decorrelating experience for 64 frames... [2024-09-06 09:06:28,170][26919] Decorrelating experience for 64 frames... [2024-09-06 09:06:29,208][26925] Decorrelating experience for 0 frames... [2024-09-06 09:06:29,215][26922] Decorrelating experience for 64 frames... [2024-09-06 09:06:29,356][26921] Decorrelating experience for 64 frames... [2024-09-06 09:06:29,802][26924] Decorrelating experience for 64 frames... [2024-09-06 09:06:29,918][26920] Decorrelating experience for 96 frames... [2024-09-06 09:06:30,160][01070] Heartbeat connected on RolloutWorker_w1 [2024-09-06 09:06:30,247][26919] Decorrelating experience for 96 frames... [2024-09-06 09:06:30,413][01070] Heartbeat connected on RolloutWorker_w0 [2024-09-06 09:06:30,668][26921] Decorrelating experience for 96 frames... [2024-09-06 09:06:30,864][01070] Heartbeat connected on RolloutWorker_w2 [2024-09-06 09:06:30,935][26926] Decorrelating experience for 64 frames... [2024-09-06 09:06:31,356][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 5005312. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 09:06:31,362][01070] Avg episode reward: [(0, '0.380')] [2024-09-06 09:06:31,506][26924] Decorrelating experience for 96 frames... [2024-09-06 09:06:31,519][26923] Decorrelating experience for 64 frames... [2024-09-06 09:06:31,875][01070] Heartbeat connected on RolloutWorker_w5 [2024-09-06 09:06:33,162][26926] Decorrelating experience for 96 frames... [2024-09-06 09:06:33,550][01070] Heartbeat connected on RolloutWorker_w7 [2024-09-06 09:06:33,915][26922] Decorrelating experience for 96 frames... [2024-09-06 09:06:34,378][01070] Heartbeat connected on RolloutWorker_w3 [2024-09-06 09:06:34,636][26905] Signal inference workers to stop experience collection... [2024-09-06 09:06:34,665][26918] InferenceWorker_p0-w0: stopping experience collection [2024-09-06 09:06:34,723][26923] Decorrelating experience for 96 frames... [2024-09-06 09:06:34,840][01070] Heartbeat connected on RolloutWorker_w4 [2024-09-06 09:06:35,103][26925] Decorrelating experience for 32 frames... [2024-09-06 09:06:35,634][26925] Decorrelating experience for 64 frames... [2024-09-06 09:06:36,059][26925] Decorrelating experience for 96 frames... [2024-09-06 09:06:36,142][01070] Heartbeat connected on RolloutWorker_w6 [2024-09-06 09:06:36,355][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 5005312. Throughput: 0: 218.7. Samples: 2186. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 09:06:36,358][01070] Avg episode reward: [(0, '5.207')] [2024-09-06 09:06:37,319][26905] Signal inference workers to resume experience collection... [2024-09-06 09:06:37,321][26918] InferenceWorker_p0-w0: resuming experience collection [2024-09-06 09:06:41,356][01070] Fps is (10 sec: 2048.0, 60 sec: 1365.6, 300 sec: 1365.6). Total num frames: 5025792. Throughput: 0: 364.1. Samples: 5460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:06:41,358][01070] Avg episode reward: [(0, '6.203')] [2024-09-06 09:06:46,356][01070] Fps is (10 sec: 3276.8, 60 sec: 1638.6, 300 sec: 1638.6). Total num frames: 5038080. Throughput: 0: 379.1. Samples: 7580. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 09:06:46,366][01070] Avg episode reward: [(0, '11.223')] [2024-09-06 09:06:47,757][26918] Updated weights for policy 0, policy_version 1232 (0.0226) [2024-09-06 09:06:51,356][01070] Fps is (10 sec: 3686.3, 60 sec: 2294.0, 300 sec: 2294.0). Total num frames: 5062656. Throughput: 0: 512.5. Samples: 12812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:06:51,362][01070] Avg episode reward: [(0, '17.027')] [2024-09-06 09:06:56,356][01070] Fps is (10 sec: 4096.0, 60 sec: 2457.8, 300 sec: 2457.8). Total num frames: 5079040. Throughput: 0: 637.0. Samples: 19108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:06:56,358][01070] Avg episode reward: [(0, '18.689')] [2024-09-06 09:06:57,557][26918] Updated weights for policy 0, policy_version 1242 (0.0039) [2024-09-06 09:07:01,360][01070] Fps is (10 sec: 3275.6, 60 sec: 2574.6, 300 sec: 2574.6). Total num frames: 5095424. Throughput: 0: 621.9. Samples: 21768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:07:01,363][01070] Avg episode reward: [(0, '20.324')] [2024-09-06 09:07:06,356][01070] Fps is (10 sec: 3686.4, 60 sec: 2765.0, 300 sec: 2765.0). Total num frames: 5115904. Throughput: 0: 649.8. Samples: 25990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:07:06,357][01070] Avg episode reward: [(0, '21.032')] [2024-09-06 09:07:08,828][26918] Updated weights for policy 0, policy_version 1252 (0.0020) [2024-09-06 09:07:11,356][01070] Fps is (10 sec: 4097.5, 60 sec: 2912.9, 300 sec: 2912.9). Total num frames: 5136384. Throughput: 0: 737.4. Samples: 33182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:07:11,364][01070] Avg episode reward: [(0, '25.508')] [2024-09-06 09:07:16,356][01070] Fps is (10 sec: 4095.7, 60 sec: 3031.2, 300 sec: 3031.2). Total num frames: 5156864. Throughput: 0: 816.6. Samples: 36746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:07:16,362][01070] Avg episode reward: [(0, '26.378')] [2024-09-06 09:07:19,470][26918] Updated weights for policy 0, policy_version 1262 (0.0038) [2024-09-06 09:07:21,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3053.5, 300 sec: 3053.5). Total num frames: 5173248. Throughput: 0: 874.2. Samples: 41524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:07:21,358][01070] Avg episode reward: [(0, '25.084')] [2024-09-06 09:07:26,356][01070] Fps is (10 sec: 3686.6, 60 sec: 3140.4, 300 sec: 3140.4). Total num frames: 5193728. Throughput: 0: 937.4. Samples: 47642. 
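Each "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" line is the same frame counter averaged over three sliding windows, which is why the 10-second figure jumps around while the 300-second one ramps smoothly. A hedged re-implementation of that readout:

```python
import time
from collections import deque


class FpsMeter:
    """Report frames/sec over several sliding windows, as in the log."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples: deque[tuple[float, int]] = deque()  # (timestamp, total frames)

    def record(self, total_frames: int) -> None:
        now = time.monotonic()
        self.samples.append((now, total_frames))
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self) -> dict[int, float]:
        now, latest = self.samples[-1]
        out = {}
        for w in self.windows:
            # Oldest sample still inside this window.
            old_t, old_frames = next(s for s in self.samples if now - s[0] <= w)
            dt = now - old_t
            # With a single sample dt == 0, hence the initial "nan" reports.
            out[w] = (latest - old_frames) / dt if dt > 0 else float("nan")
        return out
```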
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:07:26,358][01070] Avg episode reward: [(0, '24.867')] [2024-09-06 09:07:28,996][26918] Updated weights for policy 0, policy_version 1272 (0.0035) [2024-09-06 09:07:31,356][01070] Fps is (10 sec: 4505.7, 60 sec: 3549.9, 300 sec: 3277.0). Total num frames: 5218304. Throughput: 0: 967.2. Samples: 51102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:07:31,358][01070] Avg episode reward: [(0, '24.314')] [2024-09-06 09:07:36,356][01070] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3276.9). Total num frames: 5234688. Throughput: 0: 981.9. Samples: 56998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:07:36,360][01070] Avg episode reward: [(0, '22.614')] [2024-09-06 09:07:40,320][26918] Updated weights for policy 0, policy_version 1282 (0.0028) [2024-09-06 09:07:41,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3331.5). Total num frames: 5255168. Throughput: 0: 955.6. Samples: 62112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:07:41,358][01070] Avg episode reward: [(0, '21.784')] [2024-09-06 09:07:46,356][01070] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3430.5). Total num frames: 5279744. Throughput: 0: 978.5. Samples: 65798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:07:46,358][01070] Avg episode reward: [(0, '22.907')] [2024-09-06 09:07:48,978][26918] Updated weights for policy 0, policy_version 1292 (0.0023) [2024-09-06 09:07:51,356][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3421.5). Total num frames: 5296128. Throughput: 0: 1038.5. Samples: 72724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:07:51,363][01070] Avg episode reward: [(0, '23.940')] [2024-09-06 09:07:56,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3413.4). Total num frames: 5312512. Throughput: 0: 974.1. Samples: 77018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:07:56,358][01070] Avg episode reward: [(0, '23.407')] [2024-09-06 09:08:00,195][26918] Updated weights for policy 0, policy_version 1302 (0.0029) [2024-09-06 09:08:01,356][01070] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 3492.5). Total num frames: 5337088. Throughput: 0: 964.8. Samples: 80160. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 09:08:01,362][01070] Avg episode reward: [(0, '23.887')] [2024-09-06 09:08:01,369][26905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001303_5337088.pth... [2024-09-06 09:08:01,513][26905] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001172_4800512.pth [2024-09-06 09:08:06,356][01070] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3563.6). Total num frames: 5361664. Throughput: 0: 1017.4. Samples: 87308. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:08:06,363][01070] Avg episode reward: [(0, '25.013')] [2024-09-06 09:08:10,733][26918] Updated weights for policy 0, policy_version 1312 (0.0019) [2024-09-06 09:08:11,358][01070] Fps is (10 sec: 3685.5, 60 sec: 3959.3, 300 sec: 3510.9). Total num frames: 5373952. Throughput: 0: 993.7. Samples: 92362. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 09:08:11,364][01070] Avg episode reward: [(0, '24.217')] [2024-09-06 09:08:16,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3537.5). Total num frames: 5394432. Throughput: 0: 965.2. Samples: 94536. 
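The Saving/Removing pair at 09:08:01 is checkpoint rotation: write the new checkpoint, then delete the oldest so only the most recent few survive (two, judging by which files remain over this log). A minimal sketch assuming the checkpoint_{version:09d}_{env_steps}.pth naming visible above:

```python
import glob
import os

import torch


def save_rotating_checkpoint(ckpt_dir: str, model, train_step: int,
                             env_steps: int, keep_last: int = 2) -> str:
    path = os.path.join(ckpt_dir, f"checkpoint_{train_step:09d}_{env_steps}.pth")
    print(f"Saving {path}...")
    torch.save({"model": model.state_dict(),
                "train_step": train_step,
                "env_steps": env_steps}, path)
    # Zero-padded versions sort lexicographically, so the oldest come first.
    for stale in sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))[:-keep_last]:
        print(f"Removing {stale}")
        os.remove(stale)
    return path
```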
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:08:16,363][01070] Avg episode reward: [(0, '23.685')] [2024-09-06 09:08:20,626][26918] Updated weights for policy 0, policy_version 1322 (0.0015) [2024-09-06 09:08:21,356][01070] Fps is (10 sec: 4096.9, 60 sec: 4027.8, 300 sec: 3561.8). Total num frames: 5414912. Throughput: 0: 991.2. Samples: 101604. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:08:21,365][01070] Avg episode reward: [(0, '23.700')] [2024-09-06 09:08:26,356][01070] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3584.1). Total num frames: 5435392. Throughput: 0: 1018.9. Samples: 107962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:08:26,358][01070] Avg episode reward: [(0, '22.939')] [2024-09-06 09:08:31,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3571.8). Total num frames: 5451776. Throughput: 0: 982.6. Samples: 110016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:08:31,362][01070] Avg episode reward: [(0, '21.481')] [2024-09-06 09:08:32,058][26918] Updated weights for policy 0, policy_version 1332 (0.0022) [2024-09-06 09:08:36,356][01070] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3623.5). Total num frames: 5476352. Throughput: 0: 964.4. Samples: 116120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:08:36,358][01070] Avg episode reward: [(0, '21.845')] [2024-09-06 09:08:40,559][26918] Updated weights for policy 0, policy_version 1342 (0.0036) [2024-09-06 09:08:41,357][01070] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 3640.9). Total num frames: 5496832. Throughput: 0: 1029.1. Samples: 123328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:08:41,362][01070] Avg episode reward: [(0, '22.820')] [2024-09-06 09:08:46,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3628.0). Total num frames: 5513216. Throughput: 0: 1012.0. Samples: 125700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:08:46,358][01070] Avg episode reward: [(0, '24.266')] [2024-09-06 09:08:51,356][01070] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3644.1). Total num frames: 5533696. Throughput: 0: 961.3. Samples: 130566. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:08:51,359][01070] Avg episode reward: [(0, '26.457')] [2024-09-06 09:08:52,046][26918] Updated weights for policy 0, policy_version 1352 (0.0038) [2024-09-06 09:08:56,355][01070] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3686.5). Total num frames: 5558272. Throughput: 0: 1008.9. Samples: 137758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:08:56,363][01070] Avg episode reward: [(0, '26.386')] [2024-09-06 09:09:01,358][01070] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3673.2). Total num frames: 5574656. Throughput: 0: 1032.2. Samples: 140988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:09:01,363][01070] Avg episode reward: [(0, '26.784')] [2024-09-06 09:09:01,373][26905] Saving new best policy, reward=26.784! [2024-09-06 09:09:02,724][26918] Updated weights for policy 0, policy_version 1362 (0.0027) [2024-09-06 09:09:06,356][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3635.3). Total num frames: 5586944. Throughput: 0: 968.6. Samples: 145192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:09:06,359][01070] Avg episode reward: [(0, '26.871')] [2024-09-06 09:09:06,422][26905] Saving new best policy, reward=26.871! [2024-09-06 09:09:11,356][01070] Fps is (10 sec: 3687.2, 60 sec: 3959.6, 300 sec: 3674.1). 
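"Updated weights for policy 0, policy_version N (…)" appears on the inference worker roughly every ten policy versions: the learner publishes a versioned snapshot of the parameters and the worker copies it in when its local version falls behind. A sketch of that handshake with invented names; reading the parenthesised number as the copy time in seconds is an assumption:

```python
import time

import torch


class WeightStore:
    """Learner side: publish versioned CPU snapshots of the policy."""

    def __init__(self):
        self.version = 0
        self.state = None

    def publish(self, model: torch.nn.Module, version: int) -> None:
        self.state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
        self.version = version


def maybe_refresh(worker_model: torch.nn.Module, store: WeightStore,
                  local_version: int) -> int:
    """Inference-worker side: pull new weights when the version advances."""
    if store.state is not None and store.version > local_version:
        start = time.monotonic()
        worker_model.load_state_dict(store.state)
        print(f"Updated weights for policy 0, policy_version {store.version} "
              f"({time.monotonic() - start:.4f})")
        return store.version
    return local_version
```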
Total num frames: 5611520. Throughput: 0: 970.1. Samples: 151618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:09:11,362][01070] Avg episode reward: [(0, '26.221')] [2024-09-06 09:09:12,605][26918] Updated weights for policy 0, policy_version 1372 (0.0024) [2024-09-06 09:09:16,361][01070] Fps is (10 sec: 4912.6, 60 sec: 4027.4, 300 sec: 3710.4). Total num frames: 5636096. Throughput: 0: 1003.0. Samples: 155156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:09:16,368][01070] Avg episode reward: [(0, '25.283')] [2024-09-06 09:09:21,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3674.8). Total num frames: 5648384. Throughput: 0: 987.0. Samples: 160534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:09:21,361][01070] Avg episode reward: [(0, '25.122')] [2024-09-06 09:09:24,141][26918] Updated weights for policy 0, policy_version 1382 (0.0031) [2024-09-06 09:09:26,356][01070] Fps is (10 sec: 3278.5, 60 sec: 3891.2, 300 sec: 3686.5). Total num frames: 5668864. Throughput: 0: 951.7. Samples: 166152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:09:26,358][01070] Avg episode reward: [(0, '25.810')] [2024-09-06 09:09:31,356][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3719.7). Total num frames: 5693440. Throughput: 0: 974.4. Samples: 169550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:09:31,362][01070] Avg episode reward: [(0, '24.912')] [2024-09-06 09:09:32,863][26918] Updated weights for policy 0, policy_version 1392 (0.0021) [2024-09-06 09:09:36,357][01070] Fps is (10 sec: 4095.7, 60 sec: 3891.1, 300 sec: 3708.0). Total num frames: 5709824. Throughput: 0: 1008.7. Samples: 175958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:09:36,359][01070] Avg episode reward: [(0, '25.858')] [2024-09-06 09:09:41,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3697.0). Total num frames: 5726208. Throughput: 0: 948.8. Samples: 180456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:09:41,361][01070] Avg episode reward: [(0, '25.156')] [2024-09-06 09:09:44,121][26918] Updated weights for policy 0, policy_version 1402 (0.0019) [2024-09-06 09:09:46,356][01070] Fps is (10 sec: 4096.4, 60 sec: 3959.4, 300 sec: 3727.4). Total num frames: 5750784. Throughput: 0: 955.1. Samples: 183966. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:09:46,362][01070] Avg episode reward: [(0, '24.774')] [2024-09-06 09:09:51,356][01070] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3736.4). Total num frames: 5771264. Throughput: 0: 1022.3. Samples: 191196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:09:51,358][01070] Avg episode reward: [(0, '25.942')] [2024-09-06 09:09:54,349][26918] Updated weights for policy 0, policy_version 1412 (0.0027) [2024-09-06 09:09:56,356][01070] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3725.4). Total num frames: 5787648. Throughput: 0: 982.8. Samples: 195844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:09:56,359][01070] Avg episode reward: [(0, '25.874')] [2024-09-06 09:10:01,356][01070] Fps is (10 sec: 2867.3, 60 sec: 3754.8, 300 sec: 3696.0). Total num frames: 5799936. Throughput: 0: 943.3. Samples: 197598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:10:01,363][01070] Avg episode reward: [(0, '26.130')] [2024-09-06 09:10:01,378][26905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001416_5799936.pth... 
[2024-09-06 09:10:01,573][26905] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth [2024-09-06 09:10:06,356][01070] Fps is (10 sec: 2867.4, 60 sec: 3822.9, 300 sec: 3686.5). Total num frames: 5816320. Throughput: 0: 924.6. Samples: 202142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:10:06,357][01070] Avg episode reward: [(0, '25.358')] [2024-09-06 09:10:07,253][26918] Updated weights for policy 0, policy_version 1422 (0.0021) [2024-09-06 09:10:11,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3695.6). Total num frames: 5836800. Throughput: 0: 948.5. Samples: 208836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:10:11,366][01070] Avg episode reward: [(0, '26.176')] [2024-09-06 09:10:16,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3618.5, 300 sec: 3686.4). Total num frames: 5853184. Throughput: 0: 920.9. Samples: 210992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:10:16,358][01070] Avg episode reward: [(0, '25.004')] [2024-09-06 09:10:18,768][26918] Updated weights for policy 0, policy_version 1432 (0.0026) [2024-09-06 09:10:21,356][01070] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3712.6). Total num frames: 5877760. Throughput: 0: 901.2. Samples: 216510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:10:21,363][01070] Avg episode reward: [(0, '25.456')] [2024-09-06 09:10:26,355][01070] Fps is (10 sec: 4915.2, 60 sec: 3891.2, 300 sec: 3737.6). Total num frames: 5902336. Throughput: 0: 961.4. Samples: 223718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:10:26,360][01070] Avg episode reward: [(0, '24.545')] [2024-09-06 09:10:27,436][26918] Updated weights for policy 0, policy_version 1442 (0.0016) [2024-09-06 09:10:31,360][01070] Fps is (10 sec: 3684.9, 60 sec: 3686.2, 300 sec: 3711.5). Total num frames: 5914624. Throughput: 0: 949.5. Samples: 226696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:10:31,370][01070] Avg episode reward: [(0, '25.006')] [2024-09-06 09:10:36,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3719.2). Total num frames: 5935104. Throughput: 0: 885.6. Samples: 231046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:10:36,358][01070] Avg episode reward: [(0, '25.665')] [2024-09-06 09:10:38,547][26918] Updated weights for policy 0, policy_version 1452 (0.0016) [2024-09-06 09:10:41,356][01070] Fps is (10 sec: 4507.4, 60 sec: 3891.2, 300 sec: 3742.7). Total num frames: 5959680. Throughput: 0: 940.1. Samples: 238146. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:10:41,357][01070] Avg episode reward: [(0, '25.912')] [2024-09-06 09:10:46,356][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3749.5). Total num frames: 5980160. Throughput: 0: 981.6. Samples: 241772. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-09-06 09:10:46,362][01070] Avg episode reward: [(0, '25.585')] [2024-09-06 09:10:49,005][26918] Updated weights for policy 0, policy_version 1462 (0.0028) [2024-09-06 09:10:51,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3725.1). Total num frames: 5992448. Throughput: 0: 986.2. Samples: 246522. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:10:51,364][01070] Avg episode reward: [(0, '25.632')] [2024-09-06 09:10:56,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3747.1). Total num frames: 6017024. Throughput: 0: 971.8. Samples: 252568. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:10:56,363][01070] Avg episode reward: [(0, '27.421')] [2024-09-06 09:10:56,365][26905] Saving new best policy, reward=27.421! [2024-09-06 09:10:59,150][26918] Updated weights for policy 0, policy_version 1472 (0.0039) [2024-09-06 09:11:01,356][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3753.5). Total num frames: 6037504. Throughput: 0: 998.2. Samples: 255912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:11:01,358][01070] Avg episode reward: [(0, '26.535')] [2024-09-06 09:11:06,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3745.0). Total num frames: 6053888. Throughput: 0: 997.8. Samples: 261410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:11:06,359][01070] Avg episode reward: [(0, '25.905')] [2024-09-06 09:11:10,964][26918] Updated weights for policy 0, policy_version 1482 (0.0037) [2024-09-06 09:11:11,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3736.7). Total num frames: 6070272. Throughput: 0: 944.5. Samples: 266222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:11:11,358][01070] Avg episode reward: [(0, '26.962')] [2024-09-06 09:11:16,356][01070] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 3757.1). Total num frames: 6094848. Throughput: 0: 958.3. Samples: 269818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:11:16,360][01070] Avg episode reward: [(0, '26.222')] [2024-09-06 09:11:19,722][26918] Updated weights for policy 0, policy_version 1492 (0.0021) [2024-09-06 09:11:21,356][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 6115328. Throughput: 0: 1021.4. Samples: 277008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:11:21,363][01070] Avg episode reward: [(0, '26.579')] [2024-09-06 09:11:26,356][01070] Fps is (10 sec: 3277.0, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 6127616. Throughput: 0: 959.7. Samples: 281332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:11:26,360][01070] Avg episode reward: [(0, '26.111')] [2024-09-06 09:11:30,986][26918] Updated weights for policy 0, policy_version 1502 (0.0029) [2024-09-06 09:11:31,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3887.7). Total num frames: 6152192. Throughput: 0: 948.0. Samples: 284432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:11:31,358][01070] Avg episode reward: [(0, '27.333')] [2024-09-06 09:11:36,356][01070] Fps is (10 sec: 4914.9, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 6176768. Throughput: 0: 998.4. Samples: 291450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:11:36,358][01070] Avg episode reward: [(0, '27.316')] [2024-09-06 09:11:41,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 6189056. Throughput: 0: 983.0. Samples: 296802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:11:41,362][01070] Avg episode reward: [(0, '27.328')] [2024-09-06 09:11:41,444][26918] Updated weights for policy 0, policy_version 1512 (0.0034) [2024-09-06 09:11:46,356][01070] Fps is (10 sec: 3277.0, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6209536. Throughput: 0: 956.0. Samples: 298930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:11:46,363][01070] Avg episode reward: [(0, '27.148')] [2024-09-06 09:11:51,072][26918] Updated weights for policy 0, policy_version 1522 (0.0025) [2024-09-06 09:11:51,356][01070] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3915.5). 
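"Saving new best policy, reward=27.421!" at the top of this stretch is a plain high-watermark check on the running average episode reward; it fires again later at 28.340 once that mark is beaten. Sketch:

```python
class BestPolicyTracker:
    """Snapshot the policy whenever the average episode reward sets a record."""

    def __init__(self):
        self.best_reward = float("-inf")

    def maybe_save(self, avg_episode_reward: float, save_fn) -> bool:
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
            save_fn()  # e.g. torch.save(...) to a best_*.pth kept outside the rotation
            return True
        return False
```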
Total num frames: 6234112. Throughput: 0: 990.5. Samples: 305984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:11:51,358][01070] Avg episode reward: [(0, '27.216')] [2024-09-06 09:11:56,356][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 6254592. Throughput: 0: 1028.2. Samples: 312490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:11:56,361][01070] Avg episode reward: [(0, '26.377')] [2024-09-06 09:12:01,358][01070] Fps is (10 sec: 3276.0, 60 sec: 3822.8, 300 sec: 3901.6). Total num frames: 6266880. Throughput: 0: 995.6. Samples: 314624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:12:01,361][01070] Avg episode reward: [(0, '26.319')] [2024-09-06 09:12:01,378][26905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001530_6266880.pth... [2024-09-06 09:12:01,525][26905] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001303_5337088.pth [2024-09-06 09:12:02,686][26918] Updated weights for policy 0, policy_version 1532 (0.0038) [2024-09-06 09:12:06,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 6291456. Throughput: 0: 960.0. Samples: 320206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:12:06,363][01070] Avg episode reward: [(0, '26.331')] [2024-09-06 09:12:11,270][26918] Updated weights for policy 0, policy_version 1542 (0.0038) [2024-09-06 09:12:11,356][01070] Fps is (10 sec: 4916.4, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 6316032. Throughput: 0: 1024.0. Samples: 327410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:12:11,362][01070] Avg episode reward: [(0, '25.599')] [2024-09-06 09:12:16,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 6328320. Throughput: 0: 1012.3. Samples: 329984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:12:16,364][01070] Avg episode reward: [(0, '26.978')] [2024-09-06 09:12:21,356][01070] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 6348800. Throughput: 0: 960.3. Samples: 334664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:12:21,360][01070] Avg episode reward: [(0, '27.058')] [2024-09-06 09:12:22,708][26918] Updated weights for policy 0, policy_version 1552 (0.0022) [2024-09-06 09:12:26,356][01070] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 6373376. Throughput: 0: 1002.0. Samples: 341894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:12:26,360][01070] Avg episode reward: [(0, '26.297')] [2024-09-06 09:12:31,356][01070] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 6389760. Throughput: 0: 1035.0. Samples: 345504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:12:31,362][01070] Avg episode reward: [(0, '25.448')] [2024-09-06 09:12:32,950][26918] Updated weights for policy 0, policy_version 1562 (0.0029) [2024-09-06 09:12:36,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3901.6). Total num frames: 6406144. Throughput: 0: 975.6. Samples: 349884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:12:36,358][01070] Avg episode reward: [(0, '24.471')] [2024-09-06 09:12:41,356][01070] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 6430720. Throughput: 0: 974.3. Samples: 356334. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:12:41,358][01070] Avg episode reward: [(0, '24.485')] [2024-09-06 09:12:42,679][26918] Updated weights for policy 0, policy_version 1572 (0.0032) [2024-09-06 09:12:46,359][01070] Fps is (10 sec: 4913.6, 60 sec: 4095.8, 300 sec: 3929.3). Total num frames: 6455296. Throughput: 0: 1007.1. Samples: 359942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:12:46,361][01070] Avg episode reward: [(0, '23.821')] [2024-09-06 09:12:51,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 6467584. Throughput: 0: 1006.8. Samples: 365512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:12:51,359][01070] Avg episode reward: [(0, '24.404')] [2024-09-06 09:12:54,080][26918] Updated weights for policy 0, policy_version 1582 (0.0035) [2024-09-06 09:12:56,356][01070] Fps is (10 sec: 3277.8, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 6488064. Throughput: 0: 965.9. Samples: 370876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:12:56,358][01070] Avg episode reward: [(0, '25.540')] [2024-09-06 09:13:01,356][01070] Fps is (10 sec: 4505.6, 60 sec: 4096.2, 300 sec: 3901.6). Total num frames: 6512640. Throughput: 0: 987.1. Samples: 374404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:13:01,360][01070] Avg episode reward: [(0, '25.883')] [2024-09-06 09:13:02,760][26918] Updated weights for policy 0, policy_version 1592 (0.0021) [2024-09-06 09:13:06,356][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 6529024. Throughput: 0: 1028.7. Samples: 380956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:13:06,359][01070] Avg episode reward: [(0, '27.463')] [2024-09-06 09:13:06,363][26905] Saving new best policy, reward=27.463! [2024-09-06 09:13:11,356][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 6541312. Throughput: 0: 953.2. Samples: 384790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:13:11,359][01070] Avg episode reward: [(0, '26.136')] [2024-09-06 09:13:15,232][26918] Updated weights for policy 0, policy_version 1602 (0.0024) [2024-09-06 09:13:16,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 6565888. Throughput: 0: 938.2. Samples: 387722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:13:16,358][01070] Avg episode reward: [(0, '26.648')] [2024-09-06 09:13:21,356][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 6586368. Throughput: 0: 990.0. Samples: 394432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:13:21,362][01070] Avg episode reward: [(0, '26.222')] [2024-09-06 09:13:26,258][26918] Updated weights for policy 0, policy_version 1612 (0.0029) [2024-09-06 09:13:26,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 6602752. Throughput: 0: 956.8. Samples: 399388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:13:26,362][01070] Avg episode reward: [(0, '25.453')] [2024-09-06 09:13:31,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 6619136. Throughput: 0: 922.8. Samples: 401464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:13:31,363][01070] Avg episode reward: [(0, '25.128')] [2024-09-06 09:13:36,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 6639616. Throughput: 0: 946.2. Samples: 408090. 
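The recurring "Policy #0 lag: (min/avg/max)" triple measures how stale the training batch is: for every rollout in the batch, the current policy version minus the version that generated it. The -1.0 values at startup are just the empty-batch placeholder. A hedged sketch:

```python
def policy_lag(current_version: int, rollout_versions: list[int]) -> dict[str, float]:
    lags = [current_version - v for v in rollout_versions]
    if not lags:
        return {"min": -1.0, "avg": -1.0, "max": -1.0}  # placeholder before data arrives
    return {"min": float(min(lags)),
            "avg": sum(lags) / len(lags),
            "max": float(max(lags))}
```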
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:13:36,362][01070] Avg episode reward: [(0, '24.206')] [2024-09-06 09:13:36,380][26918] Updated weights for policy 0, policy_version 1622 (0.0021) [2024-09-06 09:13:41,356][01070] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6660096. Throughput: 0: 955.2. Samples: 413862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:13:41,358][01070] Avg episode reward: [(0, '26.013')] [2024-09-06 09:13:46,359][01070] Fps is (10 sec: 2866.1, 60 sec: 3549.8, 300 sec: 3846.0). Total num frames: 6668288. Throughput: 0: 912.7. Samples: 415480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:13:46,363][01070] Avg episode reward: [(0, '26.088')] [2024-09-06 09:13:51,356][01070] Fps is (10 sec: 2048.0, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 6680576. Throughput: 0: 838.4. Samples: 418684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:13:51,364][01070] Avg episode reward: [(0, '26.938')] [2024-09-06 09:13:51,474][26918] Updated weights for policy 0, policy_version 1632 (0.0031) [2024-09-06 09:13:56,356][01070] Fps is (10 sec: 3687.8, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 6705152. Throughput: 0: 892.4. Samples: 424946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:13:56,361][01070] Avg episode reward: [(0, '24.673')] [2024-09-06 09:14:00,965][26918] Updated weights for policy 0, policy_version 1642 (0.0021) [2024-09-06 09:14:01,356][01070] Fps is (10 sec: 4505.7, 60 sec: 3549.9, 300 sec: 3860.0). Total num frames: 6725632. Throughput: 0: 902.8. Samples: 428348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:14:01,359][01070] Avg episode reward: [(0, '26.432')] [2024-09-06 09:14:01,378][26905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001642_6725632.pth... [2024-09-06 09:14:01,547][26905] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001416_5799936.pth [2024-09-06 09:14:06,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3818.3). Total num frames: 6737920. Throughput: 0: 856.3. Samples: 432966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:14:06,358][01070] Avg episode reward: [(0, '26.729')] [2024-09-06 09:14:11,356][01070] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3804.5). Total num frames: 6758400. Throughput: 0: 871.9. Samples: 438622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:14:11,360][01070] Avg episode reward: [(0, '26.781')] [2024-09-06 09:14:12,518][26918] Updated weights for policy 0, policy_version 1652 (0.0031) [2024-09-06 09:14:16,356][01070] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3846.1). Total num frames: 6782976. Throughput: 0: 901.9. Samples: 442050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:14:16,361][01070] Avg episode reward: [(0, '24.850')] [2024-09-06 09:14:21,356][01070] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3832.2). Total num frames: 6799360. Throughput: 0: 893.2. Samples: 448282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:14:21,363][01070] Avg episode reward: [(0, '24.715')] [2024-09-06 09:14:23,497][26918] Updated weights for policy 0, policy_version 1662 (0.0015) [2024-09-06 09:14:26,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 6815744. Throughput: 0: 869.5. Samples: 452988. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:14:26,363][01070] Avg episode reward: [(0, '24.597')] [2024-09-06 09:14:31,356][01070] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 6840320. Throughput: 0: 910.8. Samples: 456464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:14:31,362][01070] Avg episode reward: [(0, '25.878')] [2024-09-06 09:14:32,781][26918] Updated weights for policy 0, policy_version 1672 (0.0018) [2024-09-06 09:14:36,356][01070] Fps is (10 sec: 4505.4, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 6860800. Throughput: 0: 995.2. Samples: 463466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:14:36,359][01070] Avg episode reward: [(0, '26.414')] [2024-09-06 09:14:41,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3804.4). Total num frames: 6873088. Throughput: 0: 954.0. Samples: 467874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:14:41,360][01070] Avg episode reward: [(0, '24.276')] [2024-09-06 09:14:44,286][26918] Updated weights for policy 0, policy_version 1682 (0.0023) [2024-09-06 09:14:46,356][01070] Fps is (10 sec: 3686.5, 60 sec: 3823.2, 300 sec: 3818.3). Total num frames: 6897664. Throughput: 0: 940.6. Samples: 470674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:14:46,358][01070] Avg episode reward: [(0, '24.187')] [2024-09-06 09:14:51,356][01070] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 6922240. Throughput: 0: 997.2. Samples: 477840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:14:51,358][01070] Avg episode reward: [(0, '24.909')] [2024-09-06 09:14:53,391][26918] Updated weights for policy 0, policy_version 1692 (0.0019) [2024-09-06 09:14:56,356][01070] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 6938624. Throughput: 0: 992.7. Samples: 483292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:14:56,363][01070] Avg episode reward: [(0, '25.437')] [2024-09-06 09:15:01,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 6955008. Throughput: 0: 962.7. Samples: 485372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:15:01,362][01070] Avg episode reward: [(0, '24.558')] [2024-09-06 09:15:04,835][26918] Updated weights for policy 0, policy_version 1702 (0.0034) [2024-09-06 09:15:06,356][01070] Fps is (10 sec: 3686.5, 60 sec: 3959.4, 300 sec: 3860.0). Total num frames: 6975488. Throughput: 0: 964.8. Samples: 491700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:15:06,359][01070] Avg episode reward: [(0, '25.932')] [2024-09-06 09:15:11,359][01070] Fps is (10 sec: 4094.7, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 6995968. Throughput: 0: 999.5. Samples: 497968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:15:11,361][01070] Avg episode reward: [(0, '26.598')] [2024-09-06 09:15:16,356][01070] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 7008256. Throughput: 0: 966.7. Samples: 499964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:15:16,360][01070] Avg episode reward: [(0, '26.669')] [2024-09-06 09:15:16,629][26918] Updated weights for policy 0, policy_version 1712 (0.0029) [2024-09-06 09:15:21,356][01070] Fps is (10 sec: 3687.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 7032832. Throughput: 0: 934.1. Samples: 505502. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:15:21,361][01070] Avg episode reward: [(0, '28.340')] [2024-09-06 09:15:21,369][26905] Saving new best policy, reward=28.340! [2024-09-06 09:15:25,736][26918] Updated weights for policy 0, policy_version 1722 (0.0017) [2024-09-06 09:15:26,356][01070] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3860.0). Total num frames: 7053312. Throughput: 0: 988.7. Samples: 512366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:15:26,361][01070] Avg episode reward: [(0, '26.164')] [2024-09-06 09:15:31,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7069696. Throughput: 0: 981.7. Samples: 514852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:15:31,358][01070] Avg episode reward: [(0, '25.977')] [2024-09-06 09:15:36,356][01070] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 7086080. Throughput: 0: 917.2. Samples: 519116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:15:36,358][01070] Avg episode reward: [(0, '24.462')] [2024-09-06 09:15:37,677][26918] Updated weights for policy 0, policy_version 1732 (0.0020) [2024-09-06 09:15:41,356][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 7110656. Throughput: 0: 950.7. Samples: 526072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:15:41,363][01070] Avg episode reward: [(0, '26.157')] [2024-09-06 09:15:46,356][01070] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7127040. Throughput: 0: 981.0. Samples: 529516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:15:46,359][01070] Avg episode reward: [(0, '26.246')] [2024-09-06 09:15:48,160][26918] Updated weights for policy 0, policy_version 1742 (0.0027) [2024-09-06 09:15:51,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 7143424. Throughput: 0: 937.5. Samples: 533888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:15:51,361][01070] Avg episode reward: [(0, '26.209')] [2024-09-06 09:15:56,355][01070] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 7168000. Throughput: 0: 938.6. Samples: 540204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:15:56,361][01070] Avg episode reward: [(0, '25.364')] [2024-09-06 09:15:58,022][26918] Updated weights for policy 0, policy_version 1752 (0.0021) [2024-09-06 09:16:01,362][01070] Fps is (10 sec: 4502.7, 60 sec: 3890.8, 300 sec: 3846.0). Total num frames: 7188480. Throughput: 0: 974.7. Samples: 543832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:16:01,368][01070] Avg episode reward: [(0, '25.809')] [2024-09-06 09:16:01,384][26905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001755_7188480.pth... [2024-09-06 09:16:01,576][26905] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001530_6266880.pth [2024-09-06 09:16:06,356][01070] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7204864. Throughput: 0: 972.9. Samples: 549282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:16:06,361][01070] Avg episode reward: [(0, '26.285')] [2024-09-06 09:16:09,912][26918] Updated weights for policy 0, policy_version 1762 (0.0018) [2024-09-06 09:16:11,356][01070] Fps is (10 sec: 3278.9, 60 sec: 3754.9, 300 sec: 3818.3). Total num frames: 7221248. Throughput: 0: 929.5. Samples: 554194. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:16:11,362][01070] Avg episode reward: [(0, '25.498')] [2024-09-06 09:16:16,356][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 7245824. Throughput: 0: 949.5. Samples: 557580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:16:16,358][01070] Avg episode reward: [(0, '25.721')] [2024-09-06 09:16:19,014][26918] Updated weights for policy 0, policy_version 1772 (0.0031) [2024-09-06 09:16:21,356][01070] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 7262208. Throughput: 0: 997.7. Samples: 564012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:16:21,363][01070] Avg episode reward: [(0, '26.446')] [2024-09-06 09:16:26,356][01070] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 7274496. Throughput: 0: 934.9. Samples: 568142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:16:26,361][01070] Avg episode reward: [(0, '26.027')] [2024-09-06 09:16:30,972][26918] Updated weights for policy 0, policy_version 1782 (0.0023) [2024-09-06 09:16:31,357][01070] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 7299072. Throughput: 0: 925.0. Samples: 571140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:16:31,361][01070] Avg episode reward: [(0, '27.410')] [2024-09-06 09:16:36,357][01070] Fps is (10 sec: 4914.6, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 7323648. Throughput: 0: 976.6. Samples: 577838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:16:36,365][01070] Avg episode reward: [(0, '26.279')] [2024-09-06 09:16:41,356][01070] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 7335936. Throughput: 0: 945.0. Samples: 582728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:16:41,361][01070] Avg episode reward: [(0, '26.079')] [2024-09-06 09:16:42,668][26918] Updated weights for policy 0, policy_version 1792 (0.0023) [2024-09-06 09:16:46,356][01070] Fps is (10 sec: 2867.6, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 7352320. Throughput: 0: 909.9. Samples: 584772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:16:46,362][01070] Avg episode reward: [(0, '24.573')] [2024-09-06 09:16:51,356][01070] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 7376896. Throughput: 0: 937.6. Samples: 591472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:16:51,358][01070] Avg episode reward: [(0, '24.745')] [2024-09-06 09:16:51,968][26918] Updated weights for policy 0, policy_version 1802 (0.0024) [2024-09-06 09:16:56,356][01070] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 7393280. Throughput: 0: 963.2. Samples: 597536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:16:56,363][01070] Avg episode reward: [(0, '24.638')] [2024-09-06 09:17:01,355][01070] Fps is (10 sec: 3276.9, 60 sec: 3686.8, 300 sec: 3790.5). Total num frames: 7409664. Throughput: 0: 931.6. Samples: 599500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:17:01,361][01070] Avg episode reward: [(0, '23.608')] [2024-09-06 09:17:04,174][26918] Updated weights for policy 0, policy_version 1812 (0.0038) [2024-09-06 09:17:06,356][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 7430144. Throughput: 0: 909.0. Samples: 604916. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:17:06,361][01070] Avg episode reward: [(0, '24.117')] [2024-09-06 09:17:11,356][01070] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 7450624. Throughput: 0: 967.2. Samples: 611664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:17:11,360][01070] Avg episode reward: [(0, '25.639')] [2024-09-06 09:17:15,155][26918] Updated weights for policy 0, policy_version 1822 (0.0022) [2024-09-06 09:17:16,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3776.7). Total num frames: 7462912. Throughput: 0: 950.2. Samples: 613900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:17:16,360][01070] Avg episode reward: [(0, '24.625')] [2024-09-06 09:17:21,356][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 7483392. Throughput: 0: 902.2. Samples: 618438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:17:21,363][01070] Avg episode reward: [(0, '23.896')] [2024-09-06 09:17:26,357][01070] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3762.7). Total num frames: 7499776. Throughput: 0: 928.5. Samples: 624510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:17:26,362][01070] Avg episode reward: [(0, '23.156')] [2024-09-06 09:17:26,513][26918] Updated weights for policy 0, policy_version 1832 (0.0033) [2024-09-06 09:17:28,093][26905] Stopping Batcher_0... [2024-09-06 09:17:28,095][26905] Loop batcher_evt_loop terminating... [2024-09-06 09:17:28,094][01070] Component Batcher_0 stopped! [2024-09-06 09:17:28,112][26905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001833_7507968.pth... [2024-09-06 09:17:28,214][26918] Weights refcount: 2 0 [2024-09-06 09:17:28,227][01070] Component InferenceWorker_p0-w0 stopped! [2024-09-06 09:17:28,231][26918] Stopping InferenceWorker_p0-w0... [2024-09-06 09:17:28,232][26918] Loop inference_proc0-0_evt_loop terminating... [2024-09-06 09:17:28,257][26905] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001642_6725632.pth [2024-09-06 09:17:28,275][26905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001833_7507968.pth... [2024-09-06 09:17:28,542][01070] Component LearnerWorker_p0 stopped! [2024-09-06 09:17:28,549][26905] Stopping LearnerWorker_p0... [2024-09-06 09:17:28,550][26905] Loop learner_proc0_evt_loop terminating... [2024-09-06 09:17:28,936][26924] Stopping RolloutWorker_w5... [2024-09-06 09:17:28,931][01070] Component RolloutWorker_w5 stopped! [2024-09-06 09:17:28,944][01070] Component RolloutWorker_w2 stopped! [2024-09-06 09:17:28,946][26921] Stopping RolloutWorker_w2... [2024-09-06 09:17:28,954][01070] Component RolloutWorker_w0 stopped! [2024-09-06 09:17:28,958][26919] Stopping RolloutWorker_w0... [2024-09-06 09:17:28,958][26919] Loop rollout_proc0_evt_loop terminating... [2024-09-06 09:17:28,963][26922] Stopping RolloutWorker_w3... [2024-09-06 09:17:28,963][26922] Loop rollout_proc3_evt_loop terminating... [2024-09-06 09:17:28,963][01070] Component RolloutWorker_w3 stopped! [2024-09-06 09:17:28,947][26921] Loop rollout_proc2_evt_loop terminating... [2024-09-06 09:17:28,976][26924] Loop rollout_proc5_evt_loop terminating... [2024-09-06 09:17:28,980][01070] Component RolloutWorker_w4 stopped! [2024-09-06 09:17:28,982][26923] Stopping RolloutWorker_w4... [2024-09-06 09:17:28,983][26923] Loop rollout_proc4_evt_loop terminating... [2024-09-06 09:17:28,999][26920] Stopping RolloutWorker_w1... 
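The teardown above is two-phase: every component's event loop is signalled to stop ("Loop … terminating"), the learner writes one final checkpoint, and only then does the runner join the worker processes. A generic sketch of that pattern, not Sample Factory's actual classes:

```python
import multiprocessing as mp
from multiprocessing.synchronize import Event


def shutdown(processes: list[mp.Process], stop_events: list[Event]) -> None:
    for ev in stop_events:          # phase 1: ask every event loop to exit
        ev.set()
    for p in processes:             # phase 2: wait for the OS processes
        print(f"Waiting for process {p.name} to join...")
        p.join()
```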
[2024-09-06 09:17:29,000][26920] Loop rollout_proc1_evt_loop terminating... [2024-09-06 09:17:28,999][01070] Component RolloutWorker_w1 stopped! [2024-09-06 09:17:29,023][01070] Component RolloutWorker_w6 stopped! [2024-09-06 09:17:29,025][26925] Stopping RolloutWorker_w6... [2024-09-06 09:17:29,026][26925] Loop rollout_proc6_evt_loop terminating... [2024-09-06 09:17:29,026][26926] Stopping RolloutWorker_w7... [2024-09-06 09:17:29,034][26926] Loop rollout_proc7_evt_loop terminating... [2024-09-06 09:17:29,035][01070] Component RolloutWorker_w7 stopped! [2024-09-06 09:17:29,037][01070] Waiting for process learner_proc0 to stop... [2024-09-06 09:17:31,285][01070] Waiting for process inference_proc0-0 to join... [2024-09-06 09:17:31,293][01070] Waiting for process rollout_proc0 to join... [2024-09-06 09:17:34,584][01070] Waiting for process rollout_proc1 to join... [2024-09-06 09:17:34,588][01070] Waiting for process rollout_proc2 to join... [2024-09-06 09:17:34,592][01070] Waiting for process rollout_proc3 to join... [2024-09-06 09:17:34,598][01070] Waiting for process rollout_proc4 to join... [2024-09-06 09:17:34,600][01070] Waiting for process rollout_proc5 to join... [2024-09-06 09:17:34,604][01070] Waiting for process rollout_proc6 to join... [2024-09-06 09:17:34,609][01070] Waiting for process rollout_proc7 to join... [2024-09-06 09:17:34,612][01070] Batcher 0 profile tree view: batching: 17.9925, releasing_batches: 0.0162 [2024-09-06 09:17:34,615][01070] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0010 wait_policy_total: 248.3928 update_model: 5.8404 weight_update: 0.0033 one_step: 0.0117 handle_policy_step: 381.6419 deserialize: 9.7064, stack: 1.8896, obs_to_device_normalize: 77.3906, forward: 201.8424, send_messages: 18.9870 prepare_outputs: 52.8040 to_cpu: 30.3389 [2024-09-06 09:17:34,617][01070] Learner 0 profile tree view: misc: 0.0038, prepare_batch: 8.6835 train: 47.1106 epoch_init: 0.0036, minibatch_init: 0.0064, losses_postprocess: 0.4114, kl_divergence: 0.4119, after_optimizer: 2.1648 calculate_losses: 16.5354 losses_init: 0.0030, forward_head: 1.0620, bptt_initial: 10.6952, tail: 0.7308, advantages_returns: 0.2059, losses: 2.4252 bptt: 1.2532 bptt_forward_core: 1.1527 update: 27.2086 clip: 0.5411 [2024-09-06 09:17:34,618][01070] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.1890, enqueue_policy_requests: 59.3218, env_step: 514.9058, overhead: 8.3718, complete_rollouts: 4.3760 save_policy_outputs: 13.4348 split_output_tensors: 5.2857 [2024-09-06 09:17:34,623][01070] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.2244, enqueue_policy_requests: 61.7752, env_step: 509.9034, overhead: 8.3385, complete_rollouts: 4.7589 save_policy_outputs: 12.9599 split_output_tensors: 5.3185 [2024-09-06 09:17:34,624][01070] Loop Runner_EvtLoop terminating... [2024-09-06 09:17:34,626][01070] Runner profile tree view: main_loop: 690.9829 [2024-09-06 09:17:34,629][01070] Collected {0: 7507968}, FPS: 3621.9 [2024-09-06 09:17:45,779][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 09:17:45,780][01070] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-06 09:17:45,781][01070] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-06 09:17:45,782][01070] Adding new argument 'save_video'=True that is not in the saved config file! 
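The Runner summary permits a quick consistency check: this session resumed at 5,005,312 env frames, stopped at 7,507,968, and the main loop ran 690.98 s, which reproduces the reported overall FPS:

```python
start_frames = 5_005_312   # env_steps restored from the checkpoint at startup
end_frames = 7_507_968     # "Collected {0: 7507968}"
main_loop_s = 690.9829     # "main_loop: 690.9829"

print(f"{(end_frames - start_frames) / main_loop_s:.1f}")  # 3621.9, as logged
```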
[2024-09-06 09:17:45,784][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 09:17:45,785][01070] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-06 09:17:45,786][01070] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 09:17:45,787][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-06 09:17:45,788][01070] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-09-06 09:17:45,789][01070] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-09-06 09:17:45,790][01070] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-06 09:17:45,791][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-06 09:17:45,793][01070] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-06 09:17:45,794][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-06 09:17:45,795][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 09:17:45,830][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 09:17:45,831][01070] RunningMeanStd input shape: (1,) [2024-09-06 09:17:45,845][01070] ConvEncoder: input_channels=3 [2024-09-06 09:17:45,883][01070] Conv encoder output size: 512 [2024-09-06 09:17:45,885][01070] Policy head output size: 512 [2024-09-06 09:17:45,905][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001833_7507968.pth... [2024-09-06 09:17:46,321][01070] Num frames 100... [2024-09-06 09:17:46,441][01070] Num frames 200... [2024-09-06 09:17:46,582][01070] Num frames 300... [2024-09-06 09:17:46,701][01070] Num frames 400... [2024-09-06 09:17:46,820][01070] Num frames 500... [2024-09-06 09:17:46,938][01070] Num frames 600... [2024-09-06 09:17:47,062][01070] Num frames 700... [2024-09-06 09:17:47,185][01070] Num frames 800... [2024-09-06 09:17:47,317][01070] Num frames 900... [2024-09-06 09:17:47,445][01070] Avg episode rewards: #0: 19.600, true rewards: #0: 9.600 [2024-09-06 09:17:47,447][01070] Avg episode reward: 19.600, avg true_objective: 9.600 [2024-09-06 09:17:47,505][01070] Num frames 1000... [2024-09-06 09:17:47,628][01070] Num frames 1100... [2024-09-06 09:17:47,749][01070] Num frames 1200... [2024-09-06 09:17:47,870][01070] Num frames 1300... [2024-09-06 09:17:47,994][01070] Num frames 1400... [2024-09-06 09:17:48,115][01070] Num frames 1500... [2024-09-06 09:17:48,237][01070] Num frames 1600... [2024-09-06 09:17:48,367][01070] Num frames 1700... [2024-09-06 09:17:48,500][01070] Num frames 1800... [2024-09-06 09:17:48,628][01070] Num frames 1900... [2024-09-06 09:17:48,753][01070] Num frames 2000... [2024-09-06 09:17:48,873][01070] Num frames 2100... [2024-09-06 09:17:48,996][01070] Num frames 2200... [2024-09-06 09:17:49,120][01070] Num frames 2300... [2024-09-06 09:17:49,242][01070] Num frames 2400... [2024-09-06 09:17:49,375][01070] Num frames 2500... [2024-09-06 09:17:49,505][01070] Num frames 2600... [2024-09-06 09:17:49,630][01070] Num frames 2700... [2024-09-06 09:17:49,753][01070] Num frames 2800... 
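The "Overriding arg" / "Adding new argument" chatter at the start of this evaluation run is the saved config.json being merged with command-line overrides: known keys are overridden, unknown ones appended. A minimal, hedged reconstruction of that merge:

```python
import json


def merge_config(config_path: str, overrides: dict) -> dict:
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in overrides.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value} that is not in the saved config file!")
        cfg[key] = value
    return cfg
```

Called here with overrides such as num_workers=1, no_render=True, and save_video=True, matching the lines above.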
[2024-09-06 09:17:49,829][01070] Avg episode rewards: #0: 33.080, true rewards: #0: 14.080 [2024-09-06 09:17:49,832][01070] Avg episode reward: 33.080, avg true_objective: 14.080 [2024-09-06 09:17:49,934][01070] Num frames 2900... [2024-09-06 09:17:50,057][01070] Num frames 3000... [2024-09-06 09:17:50,185][01070] Num frames 3100... [2024-09-06 09:17:50,309][01070] Num frames 3200... [2024-09-06 09:17:50,439][01070] Num frames 3300... [2024-09-06 09:17:50,572][01070] Num frames 3400... [2024-09-06 09:17:50,696][01070] Num frames 3500... [2024-09-06 09:17:50,818][01070] Num frames 3600... [2024-09-06 09:17:50,943][01070] Num frames 3700... [2024-09-06 09:17:51,066][01070] Num frames 3800... [2024-09-06 09:17:51,193][01070] Num frames 3900... [2024-09-06 09:17:51,319][01070] Num frames 4000... [2024-09-06 09:17:51,454][01070] Num frames 4100... [2024-09-06 09:17:51,590][01070] Num frames 4200... [2024-09-06 09:17:51,712][01070] Num frames 4300... [2024-09-06 09:17:51,829][01070] Avg episode rewards: #0: 34.170, true rewards: #0: 14.503 [2024-09-06 09:17:51,831][01070] Avg episode reward: 34.170, avg true_objective: 14.503 [2024-09-06 09:17:51,892][01070] Num frames 4400... [2024-09-06 09:17:52,013][01070] Num frames 4500... [2024-09-06 09:17:52,134][01070] Num frames 4600... [2024-09-06 09:17:52,254][01070] Num frames 4700... [2024-09-06 09:17:52,392][01070] Num frames 4800... [2024-09-06 09:17:52,519][01070] Num frames 4900... [2024-09-06 09:17:52,643][01070] Num frames 5000... [2024-09-06 09:17:52,767][01070] Num frames 5100... [2024-09-06 09:17:52,887][01070] Num frames 5200... [2024-09-06 09:17:53,047][01070] Avg episode rewards: #0: 31.220, true rewards: #0: 13.220 [2024-09-06 09:17:53,050][01070] Avg episode reward: 31.220, avg true_objective: 13.220 [2024-09-06 09:17:53,067][01070] Num frames 5300... [2024-09-06 09:17:53,188][01070] Num frames 5400... [2024-09-06 09:17:53,311][01070] Num frames 5500... [2024-09-06 09:17:53,439][01070] Num frames 5600... [2024-09-06 09:17:53,569][01070] Num frames 5700... [2024-09-06 09:17:53,688][01070] Num frames 5800... [2024-09-06 09:17:53,820][01070] Num frames 5900... [2024-09-06 09:17:53,942][01070] Num frames 6000... [2024-09-06 09:17:54,064][01070] Num frames 6100... [2024-09-06 09:17:54,186][01070] Num frames 6200... [2024-09-06 09:17:54,305][01070] Num frames 6300... [2024-09-06 09:17:54,429][01070] Num frames 6400... [2024-09-06 09:17:54,564][01070] Num frames 6500... [2024-09-06 09:17:54,684][01070] Num frames 6600... [2024-09-06 09:17:54,803][01070] Num frames 6700... [2024-09-06 09:17:54,921][01070] Num frames 6800... [2024-09-06 09:17:55,042][01070] Num frames 6900... [2024-09-06 09:17:55,161][01070] Num frames 7000... [2024-09-06 09:17:55,283][01070] Num frames 7100... [2024-09-06 09:17:55,406][01070] Num frames 7200... [2024-09-06 09:17:55,568][01070] Avg episode rewards: #0: 36.544, true rewards: #0: 14.544 [2024-09-06 09:17:55,570][01070] Avg episode reward: 36.544, avg true_objective: 14.544 [2024-09-06 09:17:55,620][01070] Num frames 7300... [2024-09-06 09:17:55,785][01070] Num frames 7400... [2024-09-06 09:17:55,948][01070] Num frames 7500... [2024-09-06 09:17:56,111][01070] Num frames 7600... [2024-09-06 09:17:56,313][01070] Avg episode rewards: #0: 31.980, true rewards: #0: 12.813 [2024-09-06 09:17:56,316][01070] Avg episode reward: 31.980, avg true_objective: 12.813 [2024-09-06 09:17:56,341][01070] Num frames 7700... [2024-09-06 09:17:56,516][01070] Num frames 7800... [2024-09-06 09:17:56,682][01070] Num frames 7900... 
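The "Avg episode rewards" lines are running means over the episodes finished so far (19.600 after one episode, then 33.080 after two implies a 46.56 second episode, and so on); "true rewards" appears to be the raw scenario objective rather than the shaped reward optimised during training. Bookkeeping sketch:

```python
episode_rewards: list[float] = []
true_objectives: list[float] = []


def on_episode_end(shaped_return: float, true_objective: float) -> None:
    episode_rewards.append(shaped_return)
    true_objectives.append(true_objective)
    n = len(episode_rewards)
    print(f"Avg episode rewards: #0: {sum(episode_rewards) / n:.3f}, "
          f"true rewards: #0: {sum(true_objectives) / n:.3f}")
```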
[2024-09-06 09:17:56,852][01070] Num frames 8000... [2024-09-06 09:17:57,025][01070] Num frames 8100... [2024-09-06 09:17:57,187][01070] Num frames 8200... [2024-09-06 09:17:57,363][01070] Num frames 8300... [2024-09-06 09:17:57,543][01070] Num frames 8400... [2024-09-06 09:17:57,725][01070] Num frames 8500... [2024-09-06 09:17:57,897][01070] Num frames 8600... [2024-09-06 09:17:58,074][01070] Num frames 8700... [2024-09-06 09:17:58,236][01070] Avg episode rewards: #0: 30.823, true rewards: #0: 12.537 [2024-09-06 09:17:58,239][01070] Avg episode reward: 30.823, avg true_objective: 12.537 [2024-09-06 09:17:58,269][01070] Num frames 8800... [2024-09-06 09:17:58,395][01070] Num frames 8900... [2024-09-06 09:17:58,527][01070] Num frames 9000... [2024-09-06 09:17:58,658][01070] Num frames 9100... [2024-09-06 09:17:58,782][01070] Num frames 9200... [2024-09-06 09:17:58,903][01070] Num frames 9300... [2024-09-06 09:17:59,024][01070] Num frames 9400... [2024-09-06 09:17:59,146][01070] Num frames 9500... [2024-09-06 09:17:59,272][01070] Num frames 9600... [2024-09-06 09:17:59,395][01070] Num frames 9700... [2024-09-06 09:17:59,524][01070] Num frames 9800... [2024-09-06 09:17:59,647][01070] Num frames 9900... [2024-09-06 09:17:59,745][01070] Avg episode rewards: #0: 30.285, true rewards: #0: 12.410 [2024-09-06 09:17:59,746][01070] Avg episode reward: 30.285, avg true_objective: 12.410 [2024-09-06 09:17:59,839][01070] Num frames 10000... [2024-09-06 09:17:59,962][01070] Num frames 10100... [2024-09-06 09:18:00,084][01070] Num frames 10200... [2024-09-06 09:18:00,206][01070] Num frames 10300... [2024-09-06 09:18:00,365][01070] Avg episode rewards: #0: 27.871, true rewards: #0: 11.538 [2024-09-06 09:18:00,366][01070] Avg episode reward: 27.871, avg true_objective: 11.538 [2024-09-06 09:18:00,389][01070] Num frames 10400... [2024-09-06 09:18:00,522][01070] Num frames 10500... [2024-09-06 09:18:00,660][01070] Num frames 10600... [2024-09-06 09:18:00,798][01070] Num frames 10700... [2024-09-06 09:18:00,918][01070] Num frames 10800... [2024-09-06 09:18:01,038][01070] Num frames 10900... [2024-09-06 09:18:01,163][01070] Num frames 11000... [2024-09-06 09:18:01,284][01070] Num frames 11100... [2024-09-06 09:18:01,405][01070] Num frames 11200... [2024-09-06 09:18:01,533][01070] Num frames 11300... [2024-09-06 09:18:01,658][01070] Num frames 11400... [2024-09-06 09:18:01,790][01070] Num frames 11500... [2024-09-06 09:18:01,915][01070] Num frames 11600... [2024-09-06 09:18:02,004][01070] Avg episode rewards: #0: 27.925, true rewards: #0: 11.625 [2024-09-06 09:18:02,006][01070] Avg episode reward: 27.925, avg true_objective: 11.625 [2024-09-06 09:19:12,012][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-06 09:19:36,347][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-09-06 09:19:36,349][01070] Overriding arg 'num_workers' with value 1 passed from command line [2024-09-06 09:19:36,350][01070] Adding new argument 'no_render'=True that is not in the saved config file! [2024-09-06 09:19:36,351][01070] Adding new argument 'save_video'=True that is not in the saved config file! [2024-09-06 09:19:36,355][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-09-06 09:19:36,356][01070] Adding new argument 'video_name'=None that is not in the saved config file! [2024-09-06 09:19:36,359][01070] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
[2024-09-06 09:19:36,359][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-09-06 09:19:36,361][01070] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-09-06 09:19:36,362][01070] Adding new argument 'hf_repository'='Re-Re/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-09-06 09:19:36,364][01070] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-09-06 09:19:36,365][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-09-06 09:19:36,367][01070] Adding new argument 'train_script'=None that is not in the saved config file! [2024-09-06 09:19:36,368][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-09-06 09:19:36,369][01070] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-09-06 09:19:36,399][01070] RunningMeanStd input shape: (3, 72, 128) [2024-09-06 09:19:36,400][01070] RunningMeanStd input shape: (1,) [2024-09-06 09:19:36,414][01070] ConvEncoder: input_channels=3 [2024-09-06 09:19:36,451][01070] Conv encoder output size: 512 [2024-09-06 09:19:36,452][01070] Policy head output size: 512 [2024-09-06 09:19:36,477][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001833_7507968.pth... [2024-09-06 09:19:36,955][01070] Num frames 100... [2024-09-06 09:19:37,079][01070] Num frames 200... [2024-09-06 09:19:37,209][01070] Num frames 300... [2024-09-06 09:19:37,385][01070] Num frames 400... [2024-09-06 09:19:37,559][01070] Num frames 500... [2024-09-06 09:19:37,729][01070] Num frames 600... [2024-09-06 09:19:37,920][01070] Num frames 700... [2024-09-06 09:19:38,094][01070] Num frames 800... [2024-09-06 09:19:38,257][01070] Num frames 900... [2024-09-06 09:19:38,425][01070] Num frames 1000... [2024-09-06 09:19:38,607][01070] Num frames 1100... [2024-09-06 09:19:38,779][01070] Num frames 1200... [2024-09-06 09:19:38,970][01070] Num frames 1300... [2024-09-06 09:19:39,151][01070] Num frames 1400... [2024-09-06 09:19:39,333][01070] Num frames 1500... [2024-09-06 09:19:39,514][01070] Num frames 1600... [2024-09-06 09:19:39,697][01070] Num frames 1700... [2024-09-06 09:19:39,863][01070] Num frames 1800... [2024-09-06 09:19:39,995][01070] Num frames 1900... [2024-09-06 09:19:40,121][01070] Num frames 2000... [2024-09-06 09:19:40,238][01070] Avg episode rewards: #0: 51.479, true rewards: #0: 20.480 [2024-09-06 09:19:40,240][01070] Avg episode reward: 51.479, avg true_objective: 20.480 [2024-09-06 09:19:40,307][01070] Num frames 2100... [2024-09-06 09:19:40,432][01070] Num frames 2200... [2024-09-06 09:19:40,565][01070] Num frames 2300... [2024-09-06 09:19:40,689][01070] Num frames 2400... [2024-09-06 09:19:40,851][01070] Avg episode rewards: #0: 28.429, true rewards: #0: 12.430 [2024-09-06 09:19:40,853][01070] Avg episode reward: 28.429, avg true_objective: 12.430 [2024-09-06 09:19:40,874][01070] Num frames 2500... [2024-09-06 09:19:41,006][01070] Num frames 2600... [2024-09-06 09:19:41,130][01070] Num frames 2700... [2024-09-06 09:19:41,253][01070] Num frames 2800... [2024-09-06 09:19:41,420][01070] Avg episode rewards: #0: 21.636, true rewards: #0: 9.637 [2024-09-06 09:19:41,422][01070] Avg episode reward: 21.636, avg true_objective: 9.637 [2024-09-06 09:19:41,438][01070] Num frames 2900... [2024-09-06 09:19:41,568][01070] Num frames 3000... [2024-09-06 09:19:41,692][01070] Num frames 3100... 
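The checkpoint loaded above, checkpoint_000001833_7507968.pth, encodes the policy version (1833) and the total environment steps (7507968) in its filename. A small sketch, assuming that naming convention, of how the latest checkpoint could be located and parsed:

```python
# Find the newest checkpoint by the version number embedded in its name.
import re
from pathlib import Path

def latest_checkpoint(ckpt_dir):
    pattern = re.compile(r"checkpoint_(\d+)_(\d+)\.pth")
    best = None
    for p in Path(ckpt_dir).glob("checkpoint_*.pth"):
        m = pattern.fullmatch(p.name)
        if m:
            version, env_steps = int(m.group(1)), int(m.group(2))
            if best is None or version > best[0]:
                best = (version, env_steps, p)
    return best  # (policy version, env steps, path) or None

print(latest_checkpoint("/content/train_dir/default_experiment/checkpoint_p0"))
```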
[2024-09-06 09:19:41,815][01070] Num frames 3200... [2024-09-06 09:19:41,942][01070] Num frames 3300... [2024-09-06 09:19:42,072][01070] Num frames 3400... [2024-09-06 09:19:42,225][01070] Avg episode rewards: #0: 19.450, true rewards: #0: 8.700 [2024-09-06 09:19:42,228][01070] Avg episode reward: 19.450, avg true_objective: 8.700 [2024-09-06 09:19:42,256][01070] Num frames 3500... [2024-09-06 09:19:42,378][01070] Num frames 3600... [2024-09-06 09:19:42,511][01070] Num frames 3700... [2024-09-06 09:19:42,635][01070] Num frames 3800... [2024-09-06 09:19:42,758][01070] Num frames 3900... [2024-09-06 09:19:42,882][01070] Num frames 4000... [2024-09-06 09:19:43,006][01070] Num frames 4100... [2024-09-06 09:19:43,138][01070] Num frames 4200... [2024-09-06 09:19:43,263][01070] Num frames 4300... [2024-09-06 09:19:43,384][01070] Num frames 4400... [2024-09-06 09:19:43,510][01070] Num frames 4500... [2024-09-06 09:19:43,637][01070] Num frames 4600... [2024-09-06 09:19:43,807][01070] Avg episode rewards: #0: 20.992, true rewards: #0: 9.392 [2024-09-06 09:19:43,808][01070] Avg episode reward: 20.992, avg true_objective: 9.392 [2024-09-06 09:19:43,817][01070] Num frames 4700... [2024-09-06 09:19:43,941][01070] Num frames 4800... [2024-09-06 09:19:44,073][01070] Num frames 4900... [2024-09-06 09:19:44,194][01070] Num frames 5000... [2024-09-06 09:19:44,313][01070] Num frames 5100... [2024-09-06 09:19:44,458][01070] Avg episode rewards: #0: 19.460, true rewards: #0: 8.627 [2024-09-06 09:19:44,461][01070] Avg episode reward: 19.460, avg true_objective: 8.627 [2024-09-06 09:19:44,499][01070] Num frames 5200... [2024-09-06 09:19:44,619][01070] Num frames 5300... [2024-09-06 09:19:44,747][01070] Num frames 5400... [2024-09-06 09:19:44,868][01070] Num frames 5500... [2024-09-06 09:19:44,985][01070] Num frames 5600... [2024-09-06 09:19:45,160][01070] Avg episode rewards: #0: 18.424, true rewards: #0: 8.139 [2024-09-06 09:19:45,162][01070] Avg episode reward: 18.424, avg true_objective: 8.139 [2024-09-06 09:19:45,169][01070] Num frames 5700... [2024-09-06 09:19:45,291][01070] Num frames 5800... [2024-09-06 09:19:45,409][01070] Num frames 5900... [2024-09-06 09:19:45,542][01070] Num frames 6000... [2024-09-06 09:19:45,659][01070] Num frames 6100... [2024-09-06 09:19:45,778][01070] Num frames 6200... [2024-09-06 09:19:45,899][01070] Num frames 6300... [2024-09-06 09:19:46,016][01070] Num frames 6400... [2024-09-06 09:19:46,141][01070] Num frames 6500... [2024-09-06 09:19:46,260][01070] Num frames 6600... [2024-09-06 09:19:46,384][01070] Num frames 6700... [2024-09-06 09:19:46,514][01070] Num frames 6800... [2024-09-06 09:19:46,639][01070] Num frames 6900... [2024-09-06 09:19:46,763][01070] Num frames 7000... [2024-09-06 09:19:46,887][01070] Num frames 7100... [2024-09-06 09:19:47,010][01070] Num frames 7200... [2024-09-06 09:19:47,139][01070] Num frames 7300... [2024-09-06 09:19:47,260][01070] Num frames 7400... [2024-09-06 09:19:47,388][01070] Num frames 7500... [2024-09-06 09:19:47,517][01070] Num frames 7600... [2024-09-06 09:19:47,639][01070] Num frames 7700... [2024-09-06 09:19:47,813][01070] Avg episode rewards: #0: 23.371, true rewards: #0: 9.746 [2024-09-06 09:19:47,815][01070] Avg episode reward: 23.371, avg true_objective: 9.746 [2024-09-06 09:19:47,821][01070] Num frames 7800... [2024-09-06 09:19:47,942][01070] Num frames 7900... [2024-09-06 09:19:48,066][01070] Num frames 8000... [2024-09-06 09:19:48,202][01070] Num frames 8100... [2024-09-06 09:19:48,326][01070] Num frames 8200... 
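The evaluation above runs with frameskip 1 and render_action_repeat=4: every frame is rendered for the replay video, but each policy action is repeated 4 times so the agent behaves as it did under the env_frameskip=4 used in training. A generic sketch of such an action-repeat wrapper (an illustration, not Sample Factory's actual class):

```python
# Repeat each agent action for `repeat` underlying env steps,
# accumulating the reward, so rendering can stay at frameskip 1.
import gymnasium as gym

class ActionRepeat(gym.Wrapper):
    def __init__(self, env, repeat=4):
        super().__init__(env)
        self.repeat = repeat

    def step(self, action):
        total_reward, terminated, truncated = 0.0, False, False
        for _ in range(self.repeat):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info
```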
[2024-09-06 09:19:48,450][01070] Num frames 8300... [2024-09-06 09:19:48,583][01070] Num frames 8400... [2024-09-06 09:19:48,708][01070] Num frames 8500... [2024-09-06 09:19:48,838][01070] Num frames 8600... [2024-09-06 09:19:48,963][01070] Num frames 8700... [2024-09-06 09:19:49,088][01070] Num frames 8800... [2024-09-06 09:19:49,227][01070] Num frames 8900... [2024-09-06 09:19:49,353][01070] Num frames 9000... [2024-09-06 09:19:49,485][01070] Num frames 9100... [2024-09-06 09:19:49,612][01070] Num frames 9200... [2024-09-06 09:19:49,738][01070] Num frames 9300... [2024-09-06 09:19:49,893][01070] Num frames 9400... [2024-09-06 09:19:50,071][01070] Num frames 9500... [2024-09-06 09:19:50,251][01070] Num frames 9600... [2024-09-06 09:19:50,422][01070] Num frames 9700... [2024-09-06 09:19:50,629][01070] Avg episode rewards: #0: 26.986, true rewards: #0: 10.876 [2024-09-06 09:19:50,631][01070] Avg episode reward: 26.986, avg true_objective: 10.876 [2024-09-06 09:19:50,658][01070] Num frames 9800... [2024-09-06 09:19:50,826][01070] Num frames 9900... [2024-09-06 09:19:50,989][01070] Num frames 10000... [2024-09-06 09:19:51,162][01070] Num frames 10100... [2024-09-06 09:19:51,348][01070] Num frames 10200... [2024-09-06 09:19:51,531][01070] Num frames 10300... [2024-09-06 09:19:51,711][01070] Num frames 10400... [2024-09-06 09:19:51,893][01070] Num frames 10500... [2024-09-06 09:19:52,069][01070] Num frames 10600... [2024-09-06 09:19:52,177][01070] Avg episode rewards: #0: 26.327, true rewards: #0: 10.627 [2024-09-06 09:19:52,179][01070] Avg episode reward: 26.327, avg true_objective: 10.627 [2024-09-06 09:20:57,694][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-09-06 09:21:03,134][01070] The model has been pushed to https://huggingface.co/Re-Re/rl_course_vizdoom_health_gathering_supreme [2024-09-06 09:22:18,448][01070] Environment doom_basic already registered, overwriting... [2024-09-06 09:22:18,451][01070] Environment doom_two_colors_easy already registered, overwriting... [2024-09-06 09:22:18,453][01070] Environment doom_two_colors_hard already registered, overwriting... [2024-09-06 09:22:18,454][01070] Environment doom_dm already registered, overwriting... [2024-09-06 09:22:18,459][01070] Environment doom_dwango5 already registered, overwriting... [2024-09-06 09:22:18,460][01070] Environment doom_my_way_home_flat_actions already registered, overwriting... [2024-09-06 09:22:18,461][01070] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2024-09-06 09:22:18,463][01070] Environment doom_my_way_home already registered, overwriting... [2024-09-06 09:22:18,465][01070] Environment doom_deadly_corridor already registered, overwriting... [2024-09-06 09:22:18,468][01070] Environment doom_defend_the_center already registered, overwriting... [2024-09-06 09:22:18,470][01070] Environment doom_defend_the_line already registered, overwriting... [2024-09-06 09:22:18,471][01070] Environment doom_health_gathering already registered, overwriting... [2024-09-06 09:22:18,472][01070] Environment doom_health_gathering_supreme already registered, overwriting... [2024-09-06 09:22:18,474][01070] Environment doom_battle already registered, overwriting... [2024-09-06 09:22:18,476][01070] Environment doom_battle2 already registered, overwriting... [2024-09-06 09:22:18,477][01070] Environment doom_duel_bots already registered, overwriting... [2024-09-06 09:22:18,480][01070] Environment doom_deathmatch_bots already registered, overwriting... 
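The push-to-hub step above uploads the experiment directory (checkpoint, config.json, replay.mp4) to the repository named in the log. A hedged sketch of the same idea using huggingface_hub directly (assumes a prior login; Sample Factory wraps this with its own helper):

```python
# Upload the experiment folder to the Hugging Face Hub.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="/content/train_dir/default_experiment",
    repo_id="Re-Re/rl_course_vizdoom_health_gathering_supreme",
    repo_type="model",
)
```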
[2024-09-06 09:22:18,481][01070] Environment doom_duel already registered, overwriting...
[2024-09-06 09:22:18,482][01070] Environment doom_deathmatch_full already registered, overwriting...
[2024-09-06 09:22:18,484][01070] Environment doom_benchmark already registered, overwriting...
[2024-09-06 09:22:18,487][01070] register_encoder_factory:
[2024-09-06 09:22:18,527][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-06 09:22:18,530][01070] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line
[2024-09-06 09:22:18,537][01070] Experiment dir /content/train_dir/default_experiment already exists!
[2024-09-06 09:22:18,542][01070] Resuming existing experiment from /content/train_dir/default_experiment...
[2024-09-06 09:22:18,544][01070] Weights and Biases integration disabled
[2024-09-06 09:22:18,549][01070] Environment var CUDA_VISIBLE_DEVICES is 0
[2024-09-06 09:22:21,018][01070] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2024-09-06 09:22:21,020][01070] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-09-06 09:22:21,025][01070] Rollout worker 0 uses device cpu
[2024-09-06 09:22:21,026][01070] Rollout worker 1 uses device cpu
[2024-09-06 09:22:21,028][01070] Rollout worker 2 uses device cpu
[2024-09-06 09:22:21,030][01070] Rollout worker 3 uses device cpu
[2024-09-06 09:22:21,032][01070] Rollout worker 4 uses device cpu
[2024-09-06 09:22:21,033][01070] Rollout worker 5 uses device cpu
[2024-09-06 09:22:21,034][01070] Rollout worker 6 uses device cpu
[2024-09-06 09:22:21,035][01070] Rollout worker 7 uses device cpu
[2024-09-06 09:22:21,108][01070] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 09:22:21,110][01070] InferenceWorker_p0-w0: min num requests: 2
[2024-09-06 09:22:21,142][01070] Starting all processes...
[2024-09-06 09:22:21,144][01070] Starting process learner_proc0
[2024-09-06 09:22:21,192][01070] Starting all processes...
[2024-09-06 09:22:21,199][01070] Starting process inference_proc0-0
[2024-09-06 09:22:21,199][01070] Starting process rollout_proc0
[2024-09-06 09:22:21,202][01070] Starting process rollout_proc1
[2024-09-06 09:22:21,202][01070] Starting process rollout_proc2
[2024-09-06 09:22:21,202][01070] Starting process rollout_proc3
[2024-09-06 09:22:21,202][01070] Starting process rollout_proc4
[2024-09-06 09:22:21,202][01070] Starting process rollout_proc5
[2024-09-06 09:22:21,202][01070] Starting process rollout_proc6
[2024-09-06 09:22:21,202][01070] Starting process rollout_proc7
[2024-09-06 09:22:36,030][31321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 09:22:36,032][31321] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-09-06 09:22:36,093][31321] Num visible devices: 1
[2024-09-06 09:22:36,128][31321] Starting seed is not provided
[2024-09-06 09:22:36,129][31321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 09:22:36,129][31321] Initializing actor-critic model on device cuda:0
[2024-09-06 09:22:36,130][31321] RunningMeanStd input shape: (3, 72, 128)
[2024-09-06 09:22:36,131][31321] RunningMeanStd input shape: (1,)
[2024-09-06 09:22:36,216][31321] ConvEncoder: input_channels=3
[2024-09-06 09:22:36,685][31338] Worker 3 uses CPU cores [1]
[2024-09-06 09:22:36,789][31339] Worker 4 uses CPU cores [0]
[2024-09-06 09:22:36,944][31335] Worker 0 uses CPU cores [0]
[2024-09-06 09:22:37,005][31321] Conv encoder output size: 512
[2024-09-06 09:22:37,007][31321] Policy head output size: 512
[2024-09-06 09:22:37,042][31321] Created Actor Critic model with architecture:
[2024-09-06 09:22:37,044][31321] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-09-06 09:22:37,147][31341] Worker 7 uses CPU cores [1]
[2024-09-06 09:22:37,225][31342] Worker 6 uses CPU cores [0]
[2024-09-06 09:22:37,277][31337] Worker 2 uses CPU cores [0]
[2024-09-06 09:22:37,291][31334] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 09:22:37,292][31334] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-09-06 09:22:37,329][31321] Using optimizer
[2024-09-06 09:22:37,352][31334] Num visible devices: 1
[2024-09-06 09:22:37,367][31340] Worker 5 uses CPU cores [1]
[2024-09-06 09:22:37,377][31336] Worker 1 uses CPU cores [1]
[2024-09-06 09:22:37,959][31321] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001833_7507968.pth...
[2024-09-06 09:22:38,004][31321] Loading model from checkpoint
[2024-09-06 09:22:38,005][31321] Loaded experiment state at self.train_step=1833, self.env_steps=7507968
[2024-09-06 09:22:38,006][31321] Initialized policy 0 weights for model version 1833
[2024-09-06 09:22:38,010][31321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-09-06 09:22:38,016][31321] LearnerWorker_p0 finished initialization!
[2024-09-06 09:22:38,102][31334] RunningMeanStd input shape: (3, 72, 128)
[2024-09-06 09:22:38,103][31334] RunningMeanStd input shape: (1,)
[2024-09-06 09:22:38,115][31334] ConvEncoder: input_channels=3
[2024-09-06 09:22:38,216][31334] Conv encoder output size: 512
[2024-09-06 09:22:38,216][31334] Policy head output size: 512
[2024-09-06 09:22:38,268][01070] Inference worker 0-0 is ready!
[2024-09-06 09:22:38,269][01070] All inference workers are ready! Signal rollout workers to start!
[2024-09-06 09:22:38,464][31338] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 09:22:38,472][31341] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 09:22:38,478][31340] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 09:22:38,479][31337] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 09:22:38,480][31336] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 09:22:38,484][31339] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 09:22:38,477][31342] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 09:22:38,494][31335] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-09-06 09:22:38,550][01070] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 7507968. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-09-06 09:22:40,039][31338] Decorrelating experience for 0 frames...
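The architecture printed above is a 3-layer conv encoder over (3, 72, 128) observations, a 512-unit MLP, a GRU(512, 512) core, and linear critic/policy heads over 5 discrete actions. A self-contained PyTorch sketch with those sizes, assuming the standard convnet_simple layout (32/64/128 filters, kernels 8/4/3, strides 4/2/2); an illustration, not the library class:

```python
# Shared-weights actor-critic: conv encoder -> MLP -> GRU -> two heads.
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, num_actions=5, hidden=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened conv size from a dummy obs
            conv_out = self.conv(torch.zeros(1, 3, 72, 128)).shape[1]
        self.mlp = nn.Sequential(nn.Linear(conv_out, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)
        self.critic = nn.Linear(hidden, 1)
        self.policy = nn.Linear(hidden, num_actions)

    def forward(self, obs, rnn_state):
        x = self.mlp(self.conv(obs)).unsqueeze(0)  # (seq=1, batch, hidden)
        x, rnn_state = self.core(x, rnn_state)
        x = x.squeeze(0)
        return self.policy(x), self.critic(x), rnn_state

model = ActorCriticSketch()
logits, value, h = model(torch.zeros(4, 3, 72, 128), torch.zeros(1, 4, 512))
```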
[2024-09-06 09:22:40,046][31341] Decorrelating experience for 0 frames... [2024-09-06 09:22:40,050][31340] Decorrelating experience for 0 frames... [2024-09-06 09:22:40,067][31335] Decorrelating experience for 0 frames... [2024-09-06 09:22:40,071][31339] Decorrelating experience for 0 frames... [2024-09-06 09:22:40,073][31342] Decorrelating experience for 0 frames... [2024-09-06 09:22:40,457][31342] Decorrelating experience for 32 frames... [2024-09-06 09:22:41,103][01070] Heartbeat connected on Batcher_0 [2024-09-06 09:22:41,108][01070] Heartbeat connected on LearnerWorker_p0 [2024-09-06 09:22:41,149][31341] Decorrelating experience for 32 frames... [2024-09-06 09:22:41,152][31338] Decorrelating experience for 32 frames... [2024-09-06 09:22:41,156][01070] Heartbeat connected on InferenceWorker_p0-w0 [2024-09-06 09:22:41,155][31336] Decorrelating experience for 0 frames... [2024-09-06 09:22:41,251][31340] Decorrelating experience for 32 frames... [2024-09-06 09:22:41,935][31339] Decorrelating experience for 32 frames... [2024-09-06 09:22:41,990][31342] Decorrelating experience for 64 frames... [2024-09-06 09:22:42,475][31336] Decorrelating experience for 32 frames... [2024-09-06 09:22:42,759][31338] Decorrelating experience for 64 frames... [2024-09-06 09:22:42,770][31341] Decorrelating experience for 64 frames... [2024-09-06 09:22:42,938][31337] Decorrelating experience for 0 frames... [2024-09-06 09:22:43,009][31342] Decorrelating experience for 96 frames... [2024-09-06 09:22:43,190][01070] Heartbeat connected on RolloutWorker_w6 [2024-09-06 09:22:43,550][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 7507968. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 09:22:43,730][31341] Decorrelating experience for 96 frames... [2024-09-06 09:22:44,157][01070] Heartbeat connected on RolloutWorker_w7 [2024-09-06 09:22:44,451][31339] Decorrelating experience for 64 frames... [2024-09-06 09:22:44,856][31336] Decorrelating experience for 64 frames... [2024-09-06 09:22:44,864][31335] Decorrelating experience for 32 frames... [2024-09-06 09:22:46,558][31337] Decorrelating experience for 32 frames... [2024-09-06 09:22:47,472][31340] Decorrelating experience for 64 frames... [2024-09-06 09:22:47,615][31339] Decorrelating experience for 96 frames... [2024-09-06 09:22:47,884][31336] Decorrelating experience for 96 frames... [2024-09-06 09:22:48,559][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 7507968. Throughput: 0: 107.7. Samples: 1078. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-09-06 09:22:48,565][01070] Avg episode reward: [(0, '4.944')] [2024-09-06 09:22:48,624][01070] Heartbeat connected on RolloutWorker_w4 [2024-09-06 09:22:48,733][01070] Heartbeat connected on RolloutWorker_w1 [2024-09-06 09:22:49,622][31335] Decorrelating experience for 64 frames... [2024-09-06 09:22:51,819][31321] Signal inference workers to stop experience collection... [2024-09-06 09:22:51,828][31338] Decorrelating experience for 96 frames... [2024-09-06 09:22:51,835][31334] InferenceWorker_p0-w0: stopping experience collection [2024-09-06 09:22:52,119][31340] Decorrelating experience for 96 frames... [2024-09-06 09:22:52,209][01070] Heartbeat connected on RolloutWorker_w3 [2024-09-06 09:22:52,262][01070] Heartbeat connected on RolloutWorker_w5 [2024-09-06 09:22:52,408][31337] Decorrelating experience for 64 frames... [2024-09-06 09:22:52,497][31335] Decorrelating experience for 96 frames... 
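The "Decorrelating experience for N frames" lines above show each rollout worker warming up its environments for a different number of frames (0, 32, 64, 96) before real collection starts, so that workers are not all synchronized at the same episode phase. A hedged sketch of that staggering, using a toy gymnasium env since a VizDoom env needs extra setup; the exact schedule inside Sample Factory may differ:

```python
# Stagger environment warm-up across workers to decorrelate experience.
import gymnasium as gym

def decorrelate(env, warmup_frames):
    # Step the env with random actions for `warmup_frames` frames.
    env.reset()
    for _ in range(warmup_frames):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            env.reset()

# e.g. worker i warms up for i * 32 frames before collection begins
for worker_index in range(4):
    print(f"Decorrelating experience for {worker_index * 32} frames...")
    decorrelate(gym.make("CartPole-v1"), worker_index * 32)
```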
[2024-09-06 09:22:52,617][01070] Heartbeat connected on RolloutWorker_w0 [2024-09-06 09:22:52,947][31337] Decorrelating experience for 96 frames... [2024-09-06 09:22:53,040][01070] Heartbeat connected on RolloutWorker_w2 [2024-09-06 09:22:53,160][31321] Signal inference workers to resume experience collection... [2024-09-06 09:22:53,160][31334] InferenceWorker_p0-w0: resuming experience collection [2024-09-06 09:22:53,550][01070] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 7512064. Throughput: 0: 155.3. Samples: 2330. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-09-06 09:22:53,554][01070] Avg episode reward: [(0, '6.489')] [2024-09-06 09:22:58,550][01070] Fps is (10 sec: 2870.0, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 7536640. Throughput: 0: 280.7. Samples: 5614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:22:58,553][01070] Avg episode reward: [(0, '9.260')] [2024-09-06 09:23:00,818][31334] Updated weights for policy 0, policy_version 1843 (0.0158) [2024-09-06 09:23:03,550][01070] Fps is (10 sec: 4096.0, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 7553024. Throughput: 0: 476.4. Samples: 11910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:23:03,552][01070] Avg episode reward: [(0, '15.401')] [2024-09-06 09:23:08,550][01070] Fps is (10 sec: 3276.7, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 7569408. Throughput: 0: 542.8. Samples: 16284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:23:08,552][01070] Avg episode reward: [(0, '17.480')] [2024-09-06 09:23:13,081][31334] Updated weights for policy 0, policy_version 1853 (0.0030) [2024-09-06 09:23:13,550][01070] Fps is (10 sec: 3686.4, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 7589888. Throughput: 0: 541.2. Samples: 18942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:23:13,555][01070] Avg episode reward: [(0, '23.472')] [2024-09-06 09:23:18,550][01070] Fps is (10 sec: 4505.7, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 7614464. Throughput: 0: 651.7. Samples: 26066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:23:18,556][01070] Avg episode reward: [(0, '24.891')] [2024-09-06 09:23:23,550][01070] Fps is (10 sec: 3686.4, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 7626752. Throughput: 0: 685.1. Samples: 30830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:23:23,554][01070] Avg episode reward: [(0, '25.069')] [2024-09-06 09:23:24,047][31334] Updated weights for policy 0, policy_version 1863 (0.0031) [2024-09-06 09:23:28,550][01070] Fps is (10 sec: 3276.8, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 7647232. Throughput: 0: 741.2. Samples: 33354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:23:28,551][01070] Avg episode reward: [(0, '27.639')] [2024-09-06 09:23:33,156][31334] Updated weights for policy 0, policy_version 1873 (0.0025) [2024-09-06 09:23:33,550][01070] Fps is (10 sec: 4505.6, 60 sec: 2978.9, 300 sec: 2978.9). Total num frames: 7671808. Throughput: 0: 875.4. Samples: 40464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:23:33,552][01070] Avg episode reward: [(0, '30.649')] [2024-09-06 09:23:33,557][31321] Saving new best policy, reward=30.649! [2024-09-06 09:23:38,550][01070] Fps is (10 sec: 4095.7, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 7688192. Throughput: 0: 974.7. Samples: 46194. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-06 09:23:38,553][01070] Avg episode reward: [(0, '32.833')] [2024-09-06 09:23:38,574][31321] Saving new best policy, reward=32.833! [2024-09-06 09:23:43,550][01070] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3024.7). Total num frames: 7704576. Throughput: 0: 947.1. Samples: 48236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:23:43,556][01070] Avg episode reward: [(0, '31.634')] [2024-09-06 09:23:45,051][31334] Updated weights for policy 0, policy_version 1883 (0.0024) [2024-09-06 09:23:48,550][01070] Fps is (10 sec: 4096.3, 60 sec: 3687.0, 300 sec: 3159.8). Total num frames: 7729152. Throughput: 0: 945.8. Samples: 54472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:23:48,555][01070] Avg episode reward: [(0, '30.545')] [2024-09-06 09:23:53,550][01070] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3222.2). Total num frames: 7749632. Throughput: 0: 1005.1. Samples: 61512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:23:53,555][01070] Avg episode reward: [(0, '29.693')] [2024-09-06 09:23:53,894][31334] Updated weights for policy 0, policy_version 1893 (0.0023) [2024-09-06 09:23:58,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3225.6). Total num frames: 7766016. Throughput: 0: 993.6. Samples: 63654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:23:58,552][01070] Avg episode reward: [(0, '29.410')] [2024-09-06 09:24:03,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 7786496. Throughput: 0: 953.6. Samples: 68980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:24:03,553][01070] Avg episode reward: [(0, '29.158')] [2024-09-06 09:24:04,972][31334] Updated weights for policy 0, policy_version 1903 (0.0015) [2024-09-06 09:24:08,550][01070] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3367.8). Total num frames: 7811072. Throughput: 0: 1005.6. Samples: 76080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:24:08,552][01070] Avg episode reward: [(0, '26.234')] [2024-09-06 09:24:13,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3363.0). Total num frames: 7827456. Throughput: 0: 1019.1. Samples: 79212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:24:13,554][01070] Avg episode reward: [(0, '26.548')] [2024-09-06 09:24:15,998][31334] Updated weights for policy 0, policy_version 1913 (0.0045) [2024-09-06 09:24:18,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3358.7). Total num frames: 7843840. Throughput: 0: 955.4. Samples: 83456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:24:18,555][01070] Avg episode reward: [(0, '26.397')] [2024-09-06 09:24:18,565][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001915_7843840.pth... [2024-09-06 09:24:18,702][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001755_7188480.pth [2024-09-06 09:24:23,550][01070] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3393.8). Total num frames: 7864320. Throughput: 0: 976.5. Samples: 90134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:24:23,555][01070] Avg episode reward: [(0, '27.525')] [2024-09-06 09:24:25,519][31334] Updated weights for policy 0, policy_version 1923 (0.0026) [2024-09-06 09:24:28,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3463.0). Total num frames: 7888896. Throughput: 0: 1010.0. Samples: 93684. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:24:28,557][01070] Avg episode reward: [(0, '27.802')] [2024-09-06 09:24:33,550][01070] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3419.3). Total num frames: 7901184. Throughput: 0: 983.9. Samples: 98746. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:24:33,552][01070] Avg episode reward: [(0, '28.191')] [2024-09-06 09:24:36,946][31334] Updated weights for policy 0, policy_version 1933 (0.0021) [2024-09-06 09:24:38,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3447.5). Total num frames: 7921664. Throughput: 0: 955.6. Samples: 104512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:24:38,555][01070] Avg episode reward: [(0, '28.196')] [2024-09-06 09:24:43,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3506.2). Total num frames: 7946240. Throughput: 0: 986.7. Samples: 108056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:24:43,552][01070] Avg episode reward: [(0, '28.823')] [2024-09-06 09:24:45,998][31334] Updated weights for policy 0, policy_version 1943 (0.0024) [2024-09-06 09:24:48,552][01070] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3497.3). Total num frames: 7962624. Throughput: 0: 1001.7. Samples: 114060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:24:48,558][01070] Avg episode reward: [(0, '30.975')] [2024-09-06 09:24:53,550][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3458.8). Total num frames: 7974912. Throughput: 0: 927.9. Samples: 117836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:24:53,552][01070] Avg episode reward: [(0, '30.576')] [2024-09-06 09:24:58,550][01070] Fps is (10 sec: 2867.7, 60 sec: 3754.7, 300 sec: 3452.3). Total num frames: 7991296. Throughput: 0: 904.0. Samples: 119892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:24:58,553][01070] Avg episode reward: [(0, '29.362')] [2024-09-06 09:25:00,366][31334] Updated weights for policy 0, policy_version 1953 (0.0019) [2024-09-06 09:25:03,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3474.5). Total num frames: 8011776. Throughput: 0: 940.5. Samples: 125780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:25:03,552][01070] Avg episode reward: [(0, '27.933')] [2024-09-06 09:25:08,551][01070] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3467.9). Total num frames: 8028160. Throughput: 0: 905.1. Samples: 130864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:25:08,557][01070] Avg episode reward: [(0, '27.199')] [2024-09-06 09:25:11,902][31334] Updated weights for policy 0, policy_version 1963 (0.0032) [2024-09-06 09:25:13,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3461.8). Total num frames: 8044544. Throughput: 0: 874.2. Samples: 133024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:25:13,552][01070] Avg episode reward: [(0, '26.753')] [2024-09-06 09:25:18,550][01070] Fps is (10 sec: 4096.7, 60 sec: 3754.7, 300 sec: 3507.2). Total num frames: 8069120. Throughput: 0: 914.2. Samples: 139886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:25:18,556][01070] Avg episode reward: [(0, '22.844')] [2024-09-06 09:25:20,756][31334] Updated weights for policy 0, policy_version 1973 (0.0025) [2024-09-06 09:25:23,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3525.0). Total num frames: 8089600. Throughput: 0: 923.9. Samples: 146086. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:25:23,552][01070] Avg episode reward: [(0, '23.151')] [2024-09-06 09:25:28,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3493.6). Total num frames: 8101888. Throughput: 0: 891.0. Samples: 148150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:25:28,555][01070] Avg episode reward: [(0, '22.191')] [2024-09-06 09:25:32,254][31334] Updated weights for policy 0, policy_version 1983 (0.0018) [2024-09-06 09:25:33,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3534.3). Total num frames: 8126464. Throughput: 0: 892.6. Samples: 154226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:25:33,553][01070] Avg episode reward: [(0, '21.268')] [2024-09-06 09:25:38,550][01070] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 3572.6). Total num frames: 8151040. Throughput: 0: 966.9. Samples: 161348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 09:25:38,553][01070] Avg episode reward: [(0, '23.448')] [2024-09-06 09:25:42,015][31334] Updated weights for policy 0, policy_version 1993 (0.0023) [2024-09-06 09:25:43,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3542.5). Total num frames: 8163328. Throughput: 0: 973.9. Samples: 163716. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 09:25:43,566][01070] Avg episode reward: [(0, '23.931')] [2024-09-06 09:25:48,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3557.1). Total num frames: 8183808. Throughput: 0: 950.0. Samples: 168532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:25:48,554][01070] Avg episode reward: [(0, '24.324')] [2024-09-06 09:25:52,316][31334] Updated weights for policy 0, policy_version 2003 (0.0041) [2024-09-06 09:25:53,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3591.9). Total num frames: 8208384. Throughput: 0: 994.5. Samples: 175614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:25:53,557][01070] Avg episode reward: [(0, '24.519')] [2024-09-06 09:25:58,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3604.5). Total num frames: 8228864. Throughput: 0: 1026.0. Samples: 179196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:25:58,554][01070] Avg episode reward: [(0, '26.480')] [2024-09-06 09:26:03,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3576.5). Total num frames: 8241152. Throughput: 0: 967.9. Samples: 183442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:26:03,557][01070] Avg episode reward: [(0, '27.819')] [2024-09-06 09:26:03,999][31334] Updated weights for policy 0, policy_version 2013 (0.0022) [2024-09-06 09:26:08,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3608.4). Total num frames: 8265728. Throughput: 0: 975.7. Samples: 189992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:26:08,553][01070] Avg episode reward: [(0, '26.706')] [2024-09-06 09:26:12,456][31334] Updated weights for policy 0, policy_version 2023 (0.0028) [2024-09-06 09:26:13,550][01070] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3638.8). Total num frames: 8290304. Throughput: 0: 1009.5. Samples: 193576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:26:13,552][01070] Avg episode reward: [(0, '27.588')] [2024-09-06 09:26:18,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3611.9). Total num frames: 8302592. Throughput: 0: 990.3. Samples: 198790. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 09:26:18,552][01070] Avg episode reward: [(0, '27.201')] [2024-09-06 09:26:18,571][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002027_8302592.pth... [2024-09-06 09:26:18,738][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001833_7507968.pth [2024-09-06 09:26:23,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3622.7). Total num frames: 8323072. Throughput: 0: 964.0. Samples: 204726. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-09-06 09:26:23,552][01070] Avg episode reward: [(0, '28.094')] [2024-09-06 09:26:23,905][31334] Updated weights for policy 0, policy_version 2033 (0.0024) [2024-09-06 09:26:28,550][01070] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3650.8). Total num frames: 8347648. Throughput: 0: 990.6. Samples: 208294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:26:28,553][01070] Avg episode reward: [(0, '27.199')] [2024-09-06 09:26:33,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3642.8). Total num frames: 8364032. Throughput: 0: 1014.7. Samples: 214192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:26:33,555][01070] Avg episode reward: [(0, '26.943')] [2024-09-06 09:26:34,200][31334] Updated weights for policy 0, policy_version 2043 (0.0015) [2024-09-06 09:26:38,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3635.2). Total num frames: 8380416. Throughput: 0: 965.6. Samples: 219068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:26:38,555][01070] Avg episode reward: [(0, '25.237')] [2024-09-06 09:26:43,550][01070] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3661.3). Total num frames: 8404992. Throughput: 0: 965.8. Samples: 222658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:26:43,557][01070] Avg episode reward: [(0, '26.134')] [2024-09-06 09:26:43,951][31334] Updated weights for policy 0, policy_version 2053 (0.0019) [2024-09-06 09:26:48,554][01070] Fps is (10 sec: 4503.9, 60 sec: 4027.5, 300 sec: 3670.0). Total num frames: 8425472. Throughput: 0: 1024.3. Samples: 229538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:26:48,558][01070] Avg episode reward: [(0, '26.531')] [2024-09-06 09:26:53,551][01070] Fps is (10 sec: 3685.9, 60 sec: 3891.1, 300 sec: 3662.3). Total num frames: 8441856. Throughput: 0: 973.3. Samples: 233794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:26:53,557][01070] Avg episode reward: [(0, '26.496')] [2024-09-06 09:26:55,624][31334] Updated weights for policy 0, policy_version 2063 (0.0027) [2024-09-06 09:26:58,550][01070] Fps is (10 sec: 3687.8, 60 sec: 3891.2, 300 sec: 3670.6). Total num frames: 8462336. Throughput: 0: 961.7. Samples: 236854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:26:58,558][01070] Avg episode reward: [(0, '26.453')] [2024-09-06 09:27:03,550][01070] Fps is (10 sec: 4506.3, 60 sec: 4096.0, 300 sec: 3694.1). Total num frames: 8486912. Throughput: 0: 1003.4. Samples: 243944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:27:03,558][01070] Avg episode reward: [(0, '27.902')] [2024-09-06 09:27:04,386][31334] Updated weights for policy 0, policy_version 2073 (0.0021) [2024-09-06 09:27:08,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3671.2). Total num frames: 8499200. Throughput: 0: 988.0. Samples: 249184. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:27:08,552][01070] Avg episode reward: [(0, '29.399')] [2024-09-06 09:27:13,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3679.0). Total num frames: 8519680. Throughput: 0: 957.6. Samples: 251386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:27:13,552][01070] Avg episode reward: [(0, '30.254')] [2024-09-06 09:27:15,982][31334] Updated weights for policy 0, policy_version 2083 (0.0032) [2024-09-06 09:27:18,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3701.0). Total num frames: 8544256. Throughput: 0: 977.0. Samples: 258158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:27:18,552][01070] Avg episode reward: [(0, '30.973')] [2024-09-06 09:27:23,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3708.0). Total num frames: 8564736. Throughput: 0: 1010.3. Samples: 264532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:27:23,555][01070] Avg episode reward: [(0, '31.711')] [2024-09-06 09:27:26,470][31334] Updated weights for policy 0, policy_version 2093 (0.0036) [2024-09-06 09:27:28,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3686.4). Total num frames: 8577024. Throughput: 0: 976.8. Samples: 266614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:27:28,552][01070] Avg episode reward: [(0, '32.002')] [2024-09-06 09:27:33,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 8597504. Throughput: 0: 951.4. Samples: 272346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:27:33,557][01070] Avg episode reward: [(0, '32.432')] [2024-09-06 09:27:36,269][31334] Updated weights for policy 0, policy_version 2103 (0.0040) [2024-09-06 09:27:38,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3776.7). Total num frames: 8622080. Throughput: 0: 1013.1. Samples: 279382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:27:38,556][01070] Avg episode reward: [(0, '30.337')] [2024-09-06 09:27:43,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.3). Total num frames: 8638464. Throughput: 0: 1001.7. Samples: 281930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:27:43,552][01070] Avg episode reward: [(0, '30.111')] [2024-09-06 09:27:48,104][31334] Updated weights for policy 0, policy_version 2113 (0.0024) [2024-09-06 09:27:48,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3873.8). Total num frames: 8654848. Throughput: 0: 943.9. Samples: 286418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:27:48,552][01070] Avg episode reward: [(0, '28.878')] [2024-09-06 09:27:53,550][01070] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 8679424. Throughput: 0: 983.7. Samples: 293450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:27:53,556][01070] Avg episode reward: [(0, '28.257')] [2024-09-06 09:27:56,774][31334] Updated weights for policy 0, policy_version 2123 (0.0031) [2024-09-06 09:27:58,552][01070] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3887.7). Total num frames: 8699904. Throughput: 0: 1012.7. Samples: 296960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:27:58,554][01070] Avg episode reward: [(0, '28.838')] [2024-09-06 09:28:03,550][01070] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 8712192. Throughput: 0: 961.9. Samples: 301442. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:28:03,553][01070] Avg episode reward: [(0, '28.745')] [2024-09-06 09:28:08,412][31334] Updated weights for policy 0, policy_version 2133 (0.0013) [2024-09-06 09:28:08,550][01070] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 8736768. Throughput: 0: 957.9. Samples: 307636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:28:08,552][01070] Avg episode reward: [(0, '29.904')] [2024-09-06 09:28:13,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 8757248. Throughput: 0: 989.8. Samples: 311156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:28:13,552][01070] Avg episode reward: [(0, '29.493')] [2024-09-06 09:28:18,552][01070] Fps is (10 sec: 3685.4, 60 sec: 3822.8, 300 sec: 3887.7). Total num frames: 8773632. Throughput: 0: 984.0. Samples: 316628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:28:18,556][01070] Avg episode reward: [(0, '27.888')] [2024-09-06 09:28:18,569][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002142_8773632.pth... [2024-09-06 09:28:18,718][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001915_7843840.pth [2024-09-06 09:28:19,217][31334] Updated weights for policy 0, policy_version 2143 (0.0036) [2024-09-06 09:28:23,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 8790016. Throughput: 0: 940.9. Samples: 321724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:28:23,552][01070] Avg episode reward: [(0, '28.035')] [2024-09-06 09:28:28,550][01070] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 8814592. Throughput: 0: 962.2. Samples: 325230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:28:28,554][01070] Avg episode reward: [(0, '25.924')] [2024-09-06 09:28:28,713][31334] Updated weights for policy 0, policy_version 2153 (0.0035) [2024-09-06 09:28:33,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 8835072. Throughput: 0: 1014.2. Samples: 332058. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:28:33,559][01070] Avg episode reward: [(0, '26.391')] [2024-09-06 09:28:38,550][01070] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 8851456. Throughput: 0: 955.3. Samples: 336440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:28:38,554][01070] Avg episode reward: [(0, '26.390')] [2024-09-06 09:28:40,330][31334] Updated weights for policy 0, policy_version 2163 (0.0015) [2024-09-06 09:28:43,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 8871936. Throughput: 0: 948.3. Samples: 339630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:28:43,552][01070] Avg episode reward: [(0, '25.753')] [2024-09-06 09:28:48,550][01070] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 8896512. Throughput: 0: 1000.0. Samples: 346442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:28:48,557][01070] Avg episode reward: [(0, '25.859')] [2024-09-06 09:28:49,169][31334] Updated weights for policy 0, policy_version 2173 (0.0034) [2024-09-06 09:28:53,552][01070] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3873.8). Total num frames: 8908800. Throughput: 0: 974.6. Samples: 351494. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:28:53,558][01070] Avg episode reward: [(0, '25.517')] [2024-09-06 09:28:58,550][01070] Fps is (10 sec: 3276.7, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 8929280. Throughput: 0: 943.7. Samples: 353622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:28:58,553][01070] Avg episode reward: [(0, '26.407')] [2024-09-06 09:29:00,954][31334] Updated weights for policy 0, policy_version 2183 (0.0026) [2024-09-06 09:29:03,555][01070] Fps is (10 sec: 4094.8, 60 sec: 3959.1, 300 sec: 3859.9). Total num frames: 8949760. Throughput: 0: 970.5. Samples: 360304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:29:03,557][01070] Avg episode reward: [(0, '25.470')] [2024-09-06 09:29:08,550][01070] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 8962048. Throughput: 0: 944.0. Samples: 364206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:29:08,555][01070] Avg episode reward: [(0, '25.993')] [2024-09-06 09:29:13,555][01070] Fps is (10 sec: 2457.4, 60 sec: 3617.8, 300 sec: 3832.1). Total num frames: 8974336. Throughput: 0: 907.2. Samples: 366058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:29:13,562][01070] Avg episode reward: [(0, '25.573')] [2024-09-06 09:29:15,599][31334] Updated weights for policy 0, policy_version 2193 (0.0034) [2024-09-06 09:29:18,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3832.2). Total num frames: 8994816. Throughput: 0: 869.2. Samples: 371170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:29:18,557][01070] Avg episode reward: [(0, '25.499')] [2024-09-06 09:29:23,550][01070] Fps is (10 sec: 4508.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 9019392. Throughput: 0: 931.0. Samples: 378336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:29:23,558][01070] Avg episode reward: [(0, '25.004')] [2024-09-06 09:29:24,221][31334] Updated weights for policy 0, policy_version 2203 (0.0023) [2024-09-06 09:29:28,550][01070] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 9035776. Throughput: 0: 928.8. Samples: 381428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:29:28,558][01070] Avg episode reward: [(0, '25.638')] [2024-09-06 09:29:33,550][01070] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 9052160. Throughput: 0: 872.4. Samples: 385698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:29:33,556][01070] Avg episode reward: [(0, '25.467')] [2024-09-06 09:29:35,802][31334] Updated weights for policy 0, policy_version 2213 (0.0024) [2024-09-06 09:29:38,550][01070] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 9076736. Throughput: 0: 917.8. Samples: 392794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:29:38,552][01070] Avg episode reward: [(0, '26.654')] [2024-09-06 09:29:43,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 9097216. Throughput: 0: 949.8. Samples: 396364. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-09-06 09:29:43,554][01070] Avg episode reward: [(0, '27.006')] [2024-09-06 09:29:45,512][31334] Updated weights for policy 0, policy_version 2223 (0.0020) [2024-09-06 09:29:48,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 9113600. Throughput: 0: 906.2. Samples: 401080. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:29:48,555][01070] Avg episode reward: [(0, '27.430')] [2024-09-06 09:29:53,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3873.8). Total num frames: 9134080. Throughput: 0: 950.4. Samples: 406972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:29:53,552][01070] Avg episode reward: [(0, '28.731')] [2024-09-06 09:29:55,896][31334] Updated weights for policy 0, policy_version 2233 (0.0016) [2024-09-06 09:29:58,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 9158656. Throughput: 0: 988.4. Samples: 410532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-09-06 09:29:58,552][01070] Avg episode reward: [(0, '27.817')] [2024-09-06 09:30:03,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3873.9). Total num frames: 9170944. Throughput: 0: 1005.6. Samples: 416422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-09-06 09:30:03,554][01070] Avg episode reward: [(0, '28.267')] [2024-09-06 09:30:07,671][31334] Updated weights for policy 0, policy_version 2243 (0.0043) [2024-09-06 09:30:08,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 9191424. Throughput: 0: 953.2. Samples: 421228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:30:08,556][01070] Avg episode reward: [(0, '26.904')] [2024-09-06 09:30:13,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3873.8). Total num frames: 9211904. Throughput: 0: 962.7. Samples: 424748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:30:13,556][01070] Avg episode reward: [(0, '28.356')] [2024-09-06 09:30:16,331][31334] Updated weights for policy 0, policy_version 2253 (0.0013) [2024-09-06 09:30:18,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9232384. Throughput: 0: 1024.3. Samples: 431792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:30:18,552][01070] Avg episode reward: [(0, '27.318')] [2024-09-06 09:30:18,626][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002255_9236480.pth... [2024-09-06 09:30:18,780][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002027_8302592.pth [2024-09-06 09:30:23,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 9248768. Throughput: 0: 960.3. Samples: 436008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:30:23,554][01070] Avg episode reward: [(0, '27.558')] [2024-09-06 09:30:28,105][31334] Updated weights for policy 0, policy_version 2263 (0.0049) [2024-09-06 09:30:28,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9269248. Throughput: 0: 942.6. Samples: 438782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:30:28,558][01070] Avg episode reward: [(0, '27.140')] [2024-09-06 09:30:33,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 9293824. Throughput: 0: 998.1. Samples: 445996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:30:33,552][01070] Avg episode reward: [(0, '27.243')] [2024-09-06 09:30:37,903][31334] Updated weights for policy 0, policy_version 2273 (0.0040) [2024-09-06 09:30:38,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 9310208. Throughput: 0: 988.6. Samples: 451458. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:30:38,555][01070] Avg episode reward: [(0, '25.930')] [2024-09-06 09:30:43,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 9326592. Throughput: 0: 957.8. Samples: 453632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:30:43,557][01070] Avg episode reward: [(0, '25.857')] [2024-09-06 09:30:48,412][31334] Updated weights for policy 0, policy_version 2283 (0.0029) [2024-09-06 09:30:48,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9351168. Throughput: 0: 974.4. Samples: 460270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:30:48,553][01070] Avg episode reward: [(0, '26.119')] [2024-09-06 09:30:53,551][01070] Fps is (10 sec: 4504.8, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 9371648. Throughput: 0: 1009.6. Samples: 466664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-09-06 09:30:53,554][01070] Avg episode reward: [(0, '26.189')] [2024-09-06 09:30:58,553][01070] Fps is (10 sec: 3275.6, 60 sec: 3754.4, 300 sec: 3873.8). Total num frames: 9383936. Throughput: 0: 979.9. Samples: 468846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:30:58,556][01070] Avg episode reward: [(0, '26.161')] [2024-09-06 09:31:00,090][31334] Updated weights for policy 0, policy_version 2293 (0.0017) [2024-09-06 09:31:03,550][01070] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9408512. Throughput: 0: 946.0. Samples: 474360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-09-06 09:31:03,554][01070] Avg episode reward: [(0, '26.578')] [2024-09-06 09:31:08,551][01070] Fps is (10 sec: 4506.5, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 9428992. Throughput: 0: 1008.8. Samples: 481404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:31:08,557][01070] Avg episode reward: [(0, '27.973')] [2024-09-06 09:31:08,806][31334] Updated weights for policy 0, policy_version 2303 (0.0017) [2024-09-06 09:31:13,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9445376. Throughput: 0: 1010.6. Samples: 484258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-09-06 09:31:13,558][01070] Avg episode reward: [(0, '27.929')] [2024-09-06 09:31:18,550][01070] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9461760. Throughput: 0: 946.3. Samples: 488582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-09-06 09:31:18,553][01070] Avg episode reward: [(0, '29.506')] [2024-09-06 09:31:20,377][31334] Updated weights for policy 0, policy_version 2313 (0.0028) [2024-09-06 09:31:23,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 9486336. Throughput: 0: 979.2. Samples: 495520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:31:23,557][01070] Avg episode reward: [(0, '28.909')] [2024-09-06 09:31:28,550][01070] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9506816. Throughput: 0: 1009.9. Samples: 499078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-09-06 09:31:28,556][01070] Avg episode reward: [(0, '29.850')] [2024-09-06 09:31:30,293][31334] Updated weights for policy 0, policy_version 2323 (0.0023) [2024-09-06 09:31:33,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 9523200. Throughput: 0: 967.2. Samples: 503792. 
[2024-09-06 09:31:33,556][01070] Avg episode reward: [(0, '29.280')]
[2024-09-06 09:31:38,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 9543680. Throughput: 0: 958.8. Samples: 509810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 09:31:38,553][01070] Avg episode reward: [(0, '27.557')]
[2024-09-06 09:31:40,673][31334] Updated weights for policy 0, policy_version 2333 (0.0042)
[2024-09-06 09:31:43,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.9). Total num frames: 9568256. Throughput: 0: 989.4. Samples: 513364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 09:31:43,556][01070] Avg episode reward: [(0, '27.125')]
[2024-09-06 09:31:48,551][01070] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 9584640. Throughput: 0: 995.3. Samples: 519150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-09-06 09:31:48,553][01070] Avg episode reward: [(0, '27.322')]
[2024-09-06 09:31:52,363][31334] Updated weights for policy 0, policy_version 2343 (0.0016)
[2024-09-06 09:31:53,550][01070] Fps is (10 sec: 3276.7, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 9601024. Throughput: 0: 944.7. Samples: 523914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 09:31:53,552][01070] Avg episode reward: [(0, '27.500')]
[2024-09-06 09:31:58,551][01070] Fps is (10 sec: 4095.9, 60 sec: 4027.9, 300 sec: 3859.9). Total num frames: 9625600. Throughput: 0: 961.7. Samples: 527534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 09:31:58,553][01070] Avg episode reward: [(0, '29.126')]
[2024-09-06 09:32:00,889][31334] Updated weights for policy 0, policy_version 2353 (0.0022)
[2024-09-06 09:32:03,550][01070] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 9646080. Throughput: 0: 1019.8. Samples: 534472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 09:32:03,558][01070] Avg episode reward: [(0, '29.982')]
[2024-09-06 09:32:08,550][01070] Fps is (10 sec: 3277.3, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 9658368. Throughput: 0: 960.1. Samples: 538726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 09:32:08,556][01070] Avg episode reward: [(0, '29.952')]
[2024-09-06 09:32:12,587][31334] Updated weights for policy 0, policy_version 2363 (0.0031)
[2024-09-06 09:32:13,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 9682944. Throughput: 0: 949.4. Samples: 541802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 09:32:13,552][01070] Avg episode reward: [(0, '28.555')]
[2024-09-06 09:32:18,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 9703424. Throughput: 0: 1001.6. Samples: 548864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 09:32:18,556][01070] Avg episode reward: [(0, '25.628')]
[2024-09-06 09:32:18,570][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002369_9703424.pth...
[2024-09-06 09:32:18,702][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002142_8773632.pth
[2024-09-06 09:32:22,792][31334] Updated weights for policy 0, policy_version 2373 (0.0016)
[2024-09-06 09:32:23,550][01070] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9719808. Throughput: 0: 977.9. Samples: 553814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-09-06 09:32:23,555][01070] Avg episode reward: [(0, '22.375')]
[2024-09-06 09:32:28,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9736192. Throughput: 0: 948.0. Samples: 556022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 09:32:28,553][01070] Avg episode reward: [(0, '20.360')]
[2024-09-06 09:32:32,965][31334] Updated weights for policy 0, policy_version 2383 (0.0025)
[2024-09-06 09:32:33,550][01070] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 9760768. Throughput: 0: 973.8. Samples: 562968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-09-06 09:32:33,555][01070] Avg episode reward: [(0, '20.884')]
[2024-09-06 09:32:38,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9781248. Throughput: 0: 1009.0. Samples: 569320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 09:32:38,552][01070] Avg episode reward: [(0, '22.478')]
[2024-09-06 09:32:43,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 9797632. Throughput: 0: 975.1. Samples: 571410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 09:32:43,556][01070] Avg episode reward: [(0, '24.007')]
[2024-09-06 09:32:44,350][31334] Updated weights for policy 0, policy_version 2393 (0.0025)
[2024-09-06 09:32:48,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 9818112. Throughput: 0: 954.8. Samples: 577440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 09:32:48,555][01070] Avg episode reward: [(0, '24.930')]
[2024-09-06 09:32:53,552][01070] Fps is (10 sec: 3275.9, 60 sec: 3822.8, 300 sec: 3832.2). Total num frames: 9830400. Throughput: 0: 960.2. Samples: 581938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 09:32:53,557][01070] Avg episode reward: [(0, '25.976')]
[2024-09-06 09:32:57,156][31334] Updated weights for policy 0, policy_version 2403 (0.0037)
[2024-09-06 09:32:58,550][01070] Fps is (10 sec: 2457.6, 60 sec: 3618.2, 300 sec: 3832.2). Total num frames: 9842688. Throughput: 0: 929.7. Samples: 583638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 09:32:58,555][01070] Avg episode reward: [(0, '26.309')]
[2024-09-06 09:33:03,550][01070] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 9863168. Throughput: 0: 873.8. Samples: 588184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-09-06 09:33:03,552][01070] Avg episode reward: [(0, '28.969')]
[2024-09-06 09:33:07,715][31334] Updated weights for policy 0, policy_version 2413 (0.0028)
[2024-09-06 09:33:08,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 9887744. Throughput: 0: 922.7. Samples: 595336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 09:33:08,557][01070] Avg episode reward: [(0, '29.922')]
[2024-09-06 09:33:13,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 9908224. Throughput: 0: 951.1. Samples: 598822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-09-06 09:33:13,556][01070] Avg episode reward: [(0, '30.424')]
[2024-09-06 09:33:18,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 9920512. Throughput: 0: 895.7. Samples: 603274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
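Each "Fps is (...)" entry reports rolling throughput over 10/60/300-second windows plus the cumulative frame count, and it is paired with an "Avg episode reward" entry, so the learning curve of this run can be recovered from the log alone. A minimal standard-library sketch (the log path is hypothetical):

    import re

    FPS_RE = re.compile(
        r"Fps is \(10 sec: (?P<fps10>[\d.]+), 60 sec: [\d.]+, 300 sec: [\d.]+\)\. "
        r"Total num frames: (?P<frames>\d+)"
    )
    REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '(?P<reward>[-\d.]+)'\)\]")

    def learning_curve(log_path):
        """Yield (total_frames, fps_10s, avg_episode_reward) tuples from a Sample Factory log."""
        frames, fps10 = None, None
        with open(log_path) as f:
            for line in f:
                if m := FPS_RE.search(line):
                    frames, fps10 = int(m["frames"]), float(m["fps10"])
                elif (r := REWARD_RE.search(line)) and frames is not None:
                    yield frames, fps10, float(r["reward"])

    # Applied to this log it yields e.g. (9134080, 3686.4, 28.731), (9158656, 4505.6, 27.817), ...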
[2024-09-06 09:33:18,552][01070] Avg episode reward: [(0, '29.949')]
[2024-09-06 09:33:18,995][31334] Updated weights for policy 0, policy_version 2423 (0.0013)
[2024-09-06 09:33:23,550][01070] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 9945088. Throughput: 0: 893.8. Samples: 609540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 09:33:23,552][01070] Avg episode reward: [(0, '32.278')]
[2024-09-06 09:33:27,758][31334] Updated weights for policy 0, policy_version 2433 (0.0015)
[2024-09-06 09:33:28,550][01070] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 9965568. Throughput: 0: 926.0. Samples: 613080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 09:33:28,557][01070] Avg episode reward: [(0, '32.614')]
[2024-09-06 09:33:33,550][01070] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 9981952. Throughput: 0: 917.8. Samples: 618742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-09-06 09:33:33,553][01070] Avg episode reward: [(0, '30.653')]
[2024-09-06 09:33:38,550][01070] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 9998336. Throughput: 0: 930.7. Samples: 623816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-09-06 09:33:38,554][01070] Avg episode reward: [(0, '29.532')]
[2024-09-06 09:33:39,515][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-06 09:33:39,516][31321] Stopping Batcher_0...
[2024-09-06 09:33:39,524][31321] Loop batcher_evt_loop terminating...
[2024-09-06 09:33:39,530][01070] Component Batcher_0 stopped!
[2024-09-06 09:33:39,548][31334] Updated weights for policy 0, policy_version 2443 (0.0018)
[2024-09-06 09:33:39,594][31334] Weights refcount: 2 0
[2024-09-06 09:33:39,598][31334] Stopping InferenceWorker_p0-w0...
[2024-09-06 09:33:39,599][31334] Loop inference_proc0-0_evt_loop terminating...
[2024-09-06 09:33:39,599][01070] Component InferenceWorker_p0-w0 stopped!
[2024-09-06 09:33:39,677][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002255_9236480.pth
[2024-09-06 09:33:39,692][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-06 09:33:39,880][31321] Stopping LearnerWorker_p0...
[2024-09-06 09:33:39,884][31321] Loop learner_proc0_evt_loop terminating...
[2024-09-06 09:33:39,881][01070] Component LearnerWorker_p0 stopped!
[2024-09-06 09:33:39,915][01070] Component RolloutWorker_w1 stopped!
[2024-09-06 09:33:39,920][31336] Stopping RolloutWorker_w1...
[2024-09-06 09:33:39,929][01070] Component RolloutWorker_w3 stopped!
[2024-09-06 09:33:39,933][31338] Stopping RolloutWorker_w3...
[2024-09-06 09:33:39,934][31336] Loop rollout_proc1_evt_loop terminating...
[2024-09-06 09:33:39,937][01070] Component RolloutWorker_w5 stopped!
[2024-09-06 09:33:39,941][31340] Stopping RolloutWorker_w5...
[2024-09-06 09:33:39,942][31340] Loop rollout_proc5_evt_loop terminating...
[2024-09-06 09:33:39,934][31338] Loop rollout_proc3_evt_loop terminating...
[2024-09-06 09:33:39,963][01070] Component RolloutWorker_w7 stopped!
[2024-09-06 09:33:39,967][31341] Stopping RolloutWorker_w7...
[2024-09-06 09:33:39,972][31341] Loop rollout_proc7_evt_loop terminating...
[2024-09-06 09:33:39,988][31337] Stopping RolloutWorker_w2...
[2024-09-06 09:33:39,988][01070] Component RolloutWorker_w2 stopped!
[2024-09-06 09:33:39,989][31337] Loop rollout_proc2_evt_loop terminating...
[2024-09-06 09:33:39,998][31335] Stopping RolloutWorker_w0...
[2024-09-06 09:33:39,998][01070] Component RolloutWorker_w0 stopped!
[2024-09-06 09:33:40,002][31335] Loop rollout_proc0_evt_loop terminating...
[2024-09-06 09:33:40,068][31339] Stopping RolloutWorker_w4...
[2024-09-06 09:33:40,068][01070] Component RolloutWorker_w4 stopped!
[2024-09-06 09:33:40,069][31339] Loop rollout_proc4_evt_loop terminating...
[2024-09-06 09:33:40,097][31342] Stopping RolloutWorker_w6...
[2024-09-06 09:33:40,097][01070] Component RolloutWorker_w6 stopped!
[2024-09-06 09:33:40,102][01070] Waiting for process learner_proc0 to stop...
[2024-09-06 09:33:40,098][31342] Loop rollout_proc6_evt_loop terminating...
[2024-09-06 09:33:41,279][01070] Waiting for process inference_proc0-0 to join...
[2024-09-06 09:33:41,286][01070] Waiting for process rollout_proc0 to join...
[2024-09-06 09:33:43,440][01070] Waiting for process rollout_proc1 to join...
[2024-09-06 09:33:43,450][01070] Waiting for process rollout_proc2 to join...
[2024-09-06 09:33:43,456][01070] Waiting for process rollout_proc3 to join...
[2024-09-06 09:33:43,459][01070] Waiting for process rollout_proc4 to join...
[2024-09-06 09:33:43,464][01070] Waiting for process rollout_proc5 to join...
[2024-09-06 09:33:43,468][01070] Waiting for process rollout_proc6 to join...
[2024-09-06 09:33:43,473][01070] Waiting for process rollout_proc7 to join...
[2024-09-06 09:33:43,477][01070] Batcher 0 profile tree view:
batching: 17.3741, releasing_batches: 0.0206
[2024-09-06 09:33:43,479][01070] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 250.6480
update_model: 5.6099
  weight_update: 0.0018
one_step: 0.0090
  handle_policy_step: 377.0589
    deserialize: 9.3343, stack: 2.0445, obs_to_device_normalize: 77.2195, forward: 198.9774, send_messages: 18.2353
    prepare_outputs: 52.7164
      to_cpu: 30.3890
[2024-09-06 09:33:43,480][01070] Learner 0 profile tree view:
misc: 0.0038, prepare_batch: 9.0030
train: 48.0575
  epoch_init: 0.0094, minibatch_init: 0.0154, losses_postprocess: 0.4238, kl_divergence: 0.4480, after_optimizer: 1.8224
  calculate_losses: 17.3374
    losses_init: 0.0049, forward_head: 1.0561, bptt_initial: 11.6751, tail: 0.7162, advantages_returns: 0.1663, losses: 2.2672
    bptt: 1.2517
      bptt_forward_core: 1.1681
  update: 27.5809
    clip: 0.5703
[2024-09-06 09:33:43,483][01070] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2106, enqueue_policy_requests: 61.5026, env_step: 506.3878, overhead: 8.0540, complete_rollouts: 5.0957
save_policy_outputs: 12.4197
  split_output_tensors: 4.9996
[2024-09-06 09:33:43,485][01070] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1713, enqueue_policy_requests: 59.7714, env_step: 514.1502, overhead: 8.4068, complete_rollouts: 4.1925
save_policy_outputs: 12.9198
  split_output_tensors: 5.3484
[2024-09-06 09:33:43,487][01070] Loop Runner_EvtLoop terminating...
[2024-09-06 09:33:43,488][01070] Runner profile tree view:
main_loop: 682.3462
[2024-09-06 09:33:43,490][01070] Collected {0: 10006528}, FPS: 3661.7
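A quick read of the profiler output above: each rollout worker spent roughly 506-514 s of the 682.3 s main_loop inside env_step (506.3878 / 682.3462 ≈ 0.74), so about three quarters of this session's wall time went to stepping the Doom environments themselves, while on the inference side the policy forward pass (198.9774 s) and observation normalization (77.2195 s) account for most of handle_policy_step (377.0589 s). The learner's 48.1 s of train time is small by comparison, the expected profile for an asynchronous setup that is environment-bound rather than GPU-bound.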
[2024-09-06 09:33:49,612][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-06 09:33:49,613][01070] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-06 09:33:49,615][01070] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-06 09:33:49,616][01070] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-06 09:33:49,617][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-06 09:33:49,618][01070] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-06 09:33:49,619][01070] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-09-06 09:33:49,621][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-06 09:33:49,622][01070] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-09-06 09:33:49,623][01070] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-09-06 09:33:49,624][01070] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-06 09:33:49,625][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-06 09:33:49,626][01070] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-06 09:33:49,627][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-06 09:33:49,628][01070] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-06 09:33:49,660][01070] RunningMeanStd input shape: (3, 72, 128)
[2024-09-06 09:33:49,661][01070] RunningMeanStd input shape: (1,)
[2024-09-06 09:33:49,675][01070] ConvEncoder: input_channels=3
[2024-09-06 09:33:49,713][01070] Conv encoder output size: 512
[2024-09-06 09:33:49,714][01070] Policy head output size: 512
[2024-09-06 09:33:49,733][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-06 09:33:50,158][01070] Num frames 100...
[2024-09-06 09:33:50,294][01070] Num frames 200...
[2024-09-06 09:33:50,435][01070] Num frames 300...
[2024-09-06 09:33:50,580][01070] Num frames 400...
[2024-09-06 09:33:50,701][01070] Num frames 500...
[2024-09-06 09:33:50,828][01070] Num frames 600...
[2024-09-06 09:33:50,948][01070] Num frames 700...
[2024-09-06 09:33:51,068][01070] Num frames 800...
[2024-09-06 09:33:51,192][01070] Num frames 900...
[2024-09-06 09:33:51,316][01070] Num frames 1000...
[2024-09-06 09:33:51,448][01070] Num frames 1100...
[2024-09-06 09:33:51,584][01070] Num frames 1200...
[2024-09-06 09:33:51,708][01070] Num frames 1300...
[2024-09-06 09:33:51,831][01070] Num frames 1400...
[2024-09-06 09:33:51,964][01070] Num frames 1500...
[2024-09-06 09:33:52,096][01070] Num frames 1600...
[2024-09-06 09:33:52,228][01070] Num frames 1700...
[2024-09-06 09:33:52,381][01070] Avg episode rewards: #0: 42.729, true rewards: #0: 17.730
[2024-09-06 09:33:52,382][01070] Avg episode reward: 42.729, avg true_objective: 17.730
[2024-09-06 09:33:52,418][01070] Num frames 1800...
[2024-09-06 09:33:52,551][01070] Num frames 1900...
[2024-09-06 09:33:52,676][01070] Num frames 2000...
[2024-09-06 09:33:52,798][01070] Num frames 2100...
[2024-09-06 09:33:52,927][01070] Num frames 2200...
[2024-09-06 09:33:53,052][01070] Num frames 2300...
[2024-09-06 09:33:53,174][01070] Num frames 2400...
[2024-09-06 09:33:53,325][01070] Avg episode rewards: #0: 27.885, true rewards: #0: 12.385
[2024-09-06 09:33:53,326][01070] Avg episode reward: 27.885, avg true_objective: 12.385
[2024-09-06 09:33:53,356][01070] Num frames 2500...
[2024-09-06 09:33:53,476][01070] Num frames 2600...
[2024-09-06 09:33:53,599][01070] Num frames 2700...
[2024-09-06 09:33:53,724][01070] Num frames 2800...
[2024-09-06 09:33:53,844][01070] Num frames 2900...
[2024-09-06 09:33:53,933][01070] Avg episode rewards: #0: 20.416, true rewards: #0: 9.750
[2024-09-06 09:33:53,934][01070] Avg episode reward: 20.416, avg true_objective: 9.750
[2024-09-06 09:33:54,025][01070] Num frames 3000...
[2024-09-06 09:33:54,146][01070] Num frames 3100...
[2024-09-06 09:33:54,275][01070] Num frames 3200...
[2024-09-06 09:33:54,415][01070] Num frames 3300...
[2024-09-06 09:33:54,544][01070] Num frames 3400...
[2024-09-06 09:33:54,668][01070] Num frames 3500...
[2024-09-06 09:33:54,789][01070] Num frames 3600...
[2024-09-06 09:33:54,917][01070] Num frames 3700...
[2024-09-06 09:33:55,043][01070] Avg episode rewards: #0: 19.895, true rewards: #0: 9.395
[2024-09-06 09:33:55,044][01070] Avg episode reward: 19.895, avg true_objective: 9.395
[2024-09-06 09:33:55,099][01070] Num frames 3800...
[2024-09-06 09:33:55,230][01070] Num frames 3900...
[2024-09-06 09:33:55,361][01070] Num frames 4000...
[2024-09-06 09:33:55,490][01070] Num frames 4100...
[2024-09-06 09:33:55,610][01070] Num frames 4200...
[2024-09-06 09:33:55,712][01070] Avg episode rewards: #0: 17.476, true rewards: #0: 8.476
[2024-09-06 09:33:55,714][01070] Avg episode reward: 17.476, avg true_objective: 8.476
[2024-09-06 09:33:55,790][01070] Num frames 4300...
[2024-09-06 09:33:55,909][01070] Num frames 4400...
[2024-09-06 09:33:56,038][01070] Num frames 4500...
[2024-09-06 09:33:56,162][01070] Num frames 4600...
[2024-09-06 09:33:56,279][01070] Num frames 4700...
[2024-09-06 09:33:56,400][01070] Num frames 4800...
[2024-09-06 09:33:56,534][01070] Num frames 4900...
[2024-09-06 09:33:56,659][01070] Num frames 5000...
[2024-09-06 09:33:56,780][01070] Num frames 5100...
[2024-09-06 09:33:56,902][01070] Num frames 5200...
[2024-09-06 09:33:57,033][01070] Num frames 5300...
[2024-09-06 09:33:57,157][01070] Num frames 5400...
[2024-09-06 09:33:57,279][01070] Num frames 5500...
[2024-09-06 09:33:57,405][01070] Num frames 5600...
[2024-09-06 09:33:57,545][01070] Num frames 5700...
[2024-09-06 09:33:57,669][01070] Num frames 5800...
[2024-09-06 09:33:57,789][01070] Num frames 5900...
[2024-09-06 09:33:57,910][01070] Num frames 6000...
[2024-09-06 09:33:58,039][01070] Num frames 6100...
[2024-09-06 09:33:58,162][01070] Num frames 6200...
[2024-09-06 09:33:58,322][01070] Num frames 6300...
[2024-09-06 09:33:58,444][01070] Avg episode rewards: #0: 24.730, true rewards: #0: 10.563
[2024-09-06 09:33:58,446][01070] Avg episode reward: 24.730, avg true_objective: 10.563
[2024-09-06 09:33:58,553][01070] Num frames 6400...
[2024-09-06 09:33:58,719][01070] Num frames 6500...
[2024-09-06 09:33:58,886][01070] Num frames 6600...
[2024-09-06 09:33:59,057][01070] Num frames 6700...
[2024-09-06 09:33:59,222][01070] Num frames 6800...
[2024-09-06 09:33:59,382][01070] Num frames 6900...
[2024-09-06 09:33:59,564][01070] Num frames 7000...
[2024-09-06 09:33:59,740][01070] Num frames 7100...
[2024-09-06 09:33:59,914][01070] Num frames 7200...
[2024-09-06 09:34:00,087][01070] Num frames 7300...
[2024-09-06 09:34:00,258][01070] Num frames 7400...
[2024-09-06 09:34:00,429][01070] Num frames 7500...
[2024-09-06 09:34:00,610][01070] Num frames 7600...
[2024-09-06 09:34:00,786][01070] Num frames 7700...
[2024-09-06 09:34:00,911][01070] Num frames 7800...
[2024-09-06 09:34:01,031][01070] Num frames 7900...
[2024-09-06 09:34:01,159][01070] Num frames 8000...
[2024-09-06 09:34:01,279][01070] Num frames 8100...
[2024-09-06 09:34:01,402][01070] Num frames 8200...
[2024-09-06 09:34:01,537][01070] Num frames 8300...
[2024-09-06 09:34:01,664][01070] Num frames 8400...
[2024-09-06 09:34:01,766][01070] Avg episode rewards: #0: 29.768, true rewards: #0: 12.054
[2024-09-06 09:34:01,767][01070] Avg episode reward: 29.768, avg true_objective: 12.054
[2024-09-06 09:34:01,845][01070] Num frames 8500...
[2024-09-06 09:34:01,964][01070] Num frames 8600...
[2024-09-06 09:34:02,084][01070] Num frames 8700...
[2024-09-06 09:34:02,213][01070] Num frames 8800...
[2024-09-06 09:34:02,332][01070] Num frames 8900...
[2024-09-06 09:34:02,455][01070] Num frames 9000...
[2024-09-06 09:34:02,585][01070] Num frames 9100...
[2024-09-06 09:34:02,710][01070] Num frames 9200...
[2024-09-06 09:34:02,832][01070] Num frames 9300...
[2024-09-06 09:34:02,955][01070] Num frames 9400...
[2024-09-06 09:34:03,078][01070] Num frames 9500...
[2024-09-06 09:34:03,208][01070] Num frames 9600...
[2024-09-06 09:34:03,326][01070] Num frames 9700...
[2024-09-06 09:34:03,450][01070] Num frames 9800...
[2024-09-06 09:34:03,583][01070] Num frames 9900...
[2024-09-06 09:34:03,708][01070] Num frames 10000...
[2024-09-06 09:34:03,829][01070] Num frames 10100...
[2024-09-06 09:34:03,949][01070] Num frames 10200...
[2024-09-06 09:34:04,072][01070] Num frames 10300...
[2024-09-06 09:34:04,200][01070] Num frames 10400...
[2024-09-06 09:34:04,323][01070] Num frames 10500...
[2024-09-06 09:34:04,414][01070] Avg episode rewards: #0: 33.533, true rewards: #0: 13.159
[2024-09-06 09:34:04,416][01070] Avg episode reward: 33.533, avg true_objective: 13.159
[2024-09-06 09:34:04,519][01070] Num frames 10600...
[2024-09-06 09:34:04,640][01070] Num frames 10700...
[2024-09-06 09:34:04,761][01070] Num frames 10800...
[2024-09-06 09:34:04,884][01070] Num frames 10900...
[2024-09-06 09:34:05,003][01070] Num frames 11000...
[2024-09-06 09:34:05,126][01070] Num frames 11100...
[2024-09-06 09:34:05,263][01070] Num frames 11200...
[2024-09-06 09:34:05,384][01070] Num frames 11300...
[2024-09-06 09:34:05,512][01070] Num frames 11400...
[2024-09-06 09:34:05,637][01070] Num frames 11500...
[2024-09-06 09:34:05,761][01070] Num frames 11600...
[2024-09-06 09:34:05,884][01070] Num frames 11700...
[2024-09-06 09:34:06,007][01070] Num frames 11800...
[2024-09-06 09:34:06,131][01070] Num frames 11900...
[2024-09-06 09:34:06,262][01070] Num frames 12000...
[2024-09-06 09:34:06,384][01070] Num frames 12100...
[2024-09-06 09:34:06,477][01070] Avg episode rewards: #0: 34.477, true rewards: #0: 13.478
[2024-09-06 09:34:06,478][01070] Avg episode reward: 34.477, avg true_objective: 13.478
[2024-09-06 09:34:06,568][01070] Num frames 12200...
[2024-09-06 09:34:06,688][01070] Num frames 12300...
[2024-09-06 09:34:06,808][01070] Num frames 12400...
[2024-09-06 09:34:06,928][01070] Num frames 12500...
[2024-09-06 09:34:07,058][01070] Num frames 12600...
[2024-09-06 09:34:07,184][01070] Num frames 12700...
[2024-09-06 09:34:07,317][01070] Num frames 12800...
[2024-09-06 09:34:07,443][01070] Avg episode rewards: #0: 32.758, true rewards: #0: 12.858
[2024-09-06 09:34:07,444][01070] Avg episode reward: 32.758, avg true_objective: 12.858
[2024-09-06 09:35:25,862][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
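The run above is the evaluation ("enjoy") pass: the saved training config is reloaded, evaluation-only arguments are injected, the final checkpoint (policy version 2443 at 10,006,528 frames) is restored, ten episodes are rolled out, and a replay video is written. In the per-episode summaries, "Avg episode reward" appears to include the shaped VizDoom reward, while "true rewards" / "avg true_objective" tracks the scenario's own score. A minimal sketch of how such a pass is launched: sample_factory.enjoy.enjoy is real Sample Factory 2.x API, but parse_vizdoom_cfg and register_vizdoom_components are assumed helper names (modeled on the Hugging Face Deep RL course notebook this log resembles), and the env name is inferred from the repository name used later in the log:

    from sample_factory.enjoy import enjoy

    register_vizdoom_components()  # assumed helper: registers the Doom envs/models
    cfg = parse_vizdoom_cfg(       # assumed helper: wraps Sample Factory's argument parsing
        argv=[
            "--env=doom_health_gathering_supreme",
            "--num_workers=1",
            "--save_video",
            "--no_render",
            "--max_num_episodes=10",
        ],
        evaluation=True,
    )
    status = enjoy(cfg)  # restores the newest checkpoint, rolls out episodes, saves replay.mp4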
[2024-09-06 09:35:27,895][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-09-06 09:35:27,897][01070] Overriding arg 'num_workers' with value 1 passed from command line
[2024-09-06 09:35:27,899][01070] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-09-06 09:35:27,901][01070] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-09-06 09:35:27,903][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-09-06 09:35:27,904][01070] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-09-06 09:35:27,906][01070] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-09-06 09:35:27,908][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-09-06 09:35:27,909][01070] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-09-06 09:35:27,910][01070] Adding new argument 'hf_repository'='Re-Re/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-09-06 09:35:27,911][01070] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-09-06 09:35:27,912][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-09-06 09:35:27,913][01070] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-09-06 09:35:27,914][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-09-06 09:35:27,915][01070] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-09-06 09:35:27,944][01070] RunningMeanStd input shape: (3, 72, 128)
[2024-09-06 09:35:27,946][01070] RunningMeanStd input shape: (1,)
[2024-09-06 09:35:27,960][01070] ConvEncoder: input_channels=3
[2024-09-06 09:35:27,997][01070] Conv encoder output size: 512
[2024-09-06 09:35:27,998][01070] Policy head output size: 512
[2024-09-06 09:35:28,018][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
[2024-09-06 09:35:28,432][01070] Num frames 100...
[2024-09-06 09:35:28,573][01070] Num frames 200...
[2024-09-06 09:35:28,697][01070] Num frames 300...
[2024-09-06 09:35:28,814][01070] Num frames 400...
[2024-09-06 09:35:28,932][01070] Num frames 500...
[2024-09-06 09:35:29,050][01070] Num frames 600...
[2024-09-06 09:35:29,170][01070] Num frames 700...
[2024-09-06 09:35:29,290][01070] Num frames 800...
[2024-09-06 09:35:29,422][01070] Avg episode rewards: #0: 17.640, true rewards: #0: 8.640
[2024-09-06 09:35:29,424][01070] Avg episode reward: 17.640, avg true_objective: 8.640
[2024-09-06 09:35:29,487][01070] Num frames 900...
[2024-09-06 09:35:29,613][01070] Num frames 1000...
[2024-09-06 09:35:29,733][01070] Num frames 1100...
[2024-09-06 09:35:29,853][01070] Num frames 1200...
[2024-09-06 09:35:29,974][01070] Num frames 1300...
[2024-09-06 09:35:30,108][01070] Num frames 1400...
[2024-09-06 09:35:30,275][01070] Num frames 1500...
[2024-09-06 09:35:30,436][01070] Num frames 1600...
[2024-09-06 09:35:30,626][01070] Num frames 1700...
[2024-09-06 09:35:30,791][01070] Num frames 1800...
[2024-09-06 09:35:30,958][01070] Num frames 1900...
[2024-09-06 09:35:31,122][01070] Num frames 2000...
[2024-09-06 09:35:31,289][01070] Num frames 2100...
[2024-09-06 09:35:31,465][01070] Num frames 2200...
[2024-09-06 09:35:31,646][01070] Num frames 2300...
[2024-09-06 09:35:31,818][01070] Num frames 2400...
[2024-09-06 09:35:32,050][01070] Avg episode rewards: #0: 28.480, true rewards: #0: 12.480
[2024-09-06 09:35:32,052][01070] Avg episode reward: 28.480, avg true_objective: 12.480
[2024-09-06 09:35:32,066][01070] Num frames 2500...
[2024-09-06 09:35:32,252][01070] Num frames 2600...
[2024-09-06 09:35:32,423][01070] Num frames 2700...
[2024-09-06 09:35:32,607][01070] Num frames 2800...
[2024-09-06 09:35:32,773][01070] Num frames 2900...
[2024-09-06 09:35:32,896][01070] Num frames 3000...
[2024-09-06 09:35:33,016][01070] Num frames 3100...
[2024-09-06 09:35:33,138][01070] Num frames 3200...
[2024-09-06 09:35:33,262][01070] Num frames 3300...
[2024-09-06 09:35:33,382][01070] Num frames 3400...
[2024-09-06 09:35:33,467][01070] Avg episode rewards: #0: 26.080, true rewards: #0: 11.413
[2024-09-06 09:35:33,469][01070] Avg episode reward: 26.080, avg true_objective: 11.413
[2024-09-06 09:35:33,567][01070] Num frames 3500...
[2024-09-06 09:35:33,697][01070] Num frames 3600...
[2024-09-06 09:35:33,817][01070] Num frames 3700...
[2024-09-06 09:35:33,963][01070] Avg episode rewards: #0: 21.190, true rewards: #0: 9.440
[2024-09-06 09:35:33,964][01070] Avg episode reward: 21.190, avg true_objective: 9.440
[2024-09-06 09:35:33,997][01070] Num frames 3800...
[2024-09-06 09:35:34,117][01070] Num frames 3900...
[2024-09-06 09:35:34,241][01070] Num frames 4000...
[2024-09-06 09:35:34,361][01070] Num frames 4100...
[2024-09-06 09:35:34,490][01070] Num frames 4200...
[2024-09-06 09:35:34,616][01070] Num frames 4300...
[2024-09-06 09:35:34,748][01070] Num frames 4400...
[2024-09-06 09:35:34,870][01070] Num frames 4500...
[2024-09-06 09:35:34,990][01070] Num frames 4600...
[2024-09-06 09:35:35,113][01070] Num frames 4700...
[2024-09-06 09:35:35,236][01070] Num frames 4800...
[2024-09-06 09:35:35,355][01070] Num frames 4900...
[2024-09-06 09:35:35,477][01070] Num frames 5000...
[2024-09-06 09:35:35,600][01070] Num frames 5100...
[2024-09-06 09:35:35,717][01070] Avg episode rewards: #0: 22.704, true rewards: #0: 10.304
[2024-09-06 09:35:35,719][01070] Avg episode reward: 22.704, avg true_objective: 10.304
[2024-09-06 09:35:35,781][01070] Num frames 5200...
[2024-09-06 09:35:35,899][01070] Num frames 5300...
[2024-09-06 09:35:36,021][01070] Num frames 5400...
[2024-09-06 09:35:36,145][01070] Num frames 5500...
[2024-09-06 09:35:36,267][01070] Num frames 5600...
[2024-09-06 09:35:36,386][01070] Num frames 5700...
[2024-09-06 09:35:36,515][01070] Num frames 5800...
[2024-09-06 09:35:36,637][01070] Num frames 5900...
[2024-09-06 09:35:36,764][01070] Num frames 6000...
[2024-09-06 09:35:36,889][01070] Num frames 6100...
[2024-09-06 09:35:37,009][01070] Num frames 6200...
[2024-09-06 09:35:37,131][01070] Num frames 6300...
[2024-09-06 09:35:37,191][01070] Avg episode rewards: #0: 23.507, true rewards: #0: 10.507
[2024-09-06 09:35:37,194][01070] Avg episode reward: 23.507, avg true_objective: 10.507
[2024-09-06 09:35:37,306][01070] Num frames 6400...
[2024-09-06 09:35:37,424][01070] Num frames 6500...
[2024-09-06 09:35:37,552][01070] Num frames 6600...
[2024-09-06 09:35:37,668][01070] Num frames 6700...
[2024-09-06 09:35:37,791][01070] Num frames 6800...
[2024-09-06 09:35:37,943][01070] Avg episode rewards: #0: 21.400, true rewards: #0: 9.829
[2024-09-06 09:35:37,945][01070] Avg episode reward: 21.400, avg true_objective: 9.829
[2024-09-06 09:35:37,971][01070] Num frames 6900...
[2024-09-06 09:35:38,088][01070] Num frames 7000...
[2024-09-06 09:35:38,212][01070] Num frames 7100...
[2024-09-06 09:35:38,333][01070] Num frames 7200...
[2024-09-06 09:35:38,453][01070] Num frames 7300...
[2024-09-06 09:35:38,583][01070] Num frames 7400...
[2024-09-06 09:35:38,700][01070] Num frames 7500...
[2024-09-06 09:35:38,825][01070] Num frames 7600...
[2024-09-06 09:35:38,943][01070] Num frames 7700...
[2024-09-06 09:35:39,063][01070] Num frames 7800...
[2024-09-06 09:35:39,184][01070] Num frames 7900...
[2024-09-06 09:35:39,306][01070] Num frames 8000...
[2024-09-06 09:35:39,424][01070] Num frames 8100...
[2024-09-06 09:35:39,552][01070] Num frames 8200...
[2024-09-06 09:35:39,648][01070] Avg episode rewards: #0: 23.416, true rewards: #0: 10.291
[2024-09-06 09:35:39,650][01070] Avg episode reward: 23.416, avg true_objective: 10.291
[2024-09-06 09:35:39,731][01070] Num frames 8300...
[2024-09-06 09:35:39,860][01070] Num frames 8400...
[2024-09-06 09:35:39,978][01070] Num frames 8500...
[2024-09-06 09:35:40,096][01070] Num frames 8600...
[2024-09-06 09:35:40,219][01070] Num frames 8700...
[2024-09-06 09:35:40,336][01070] Num frames 8800...
[2024-09-06 09:35:40,464][01070] Num frames 8900...
[2024-09-06 09:35:40,596][01070] Num frames 9000...
[2024-09-06 09:35:40,719][01070] Num frames 9100...
[2024-09-06 09:35:40,847][01070] Num frames 9200...
[2024-09-06 09:35:40,971][01070] Num frames 9300...
[2024-09-06 09:35:41,093][01070] Num frames 9400...
[2024-09-06 09:35:41,220][01070] Num frames 9500...
[2024-09-06 09:35:41,339][01070] Num frames 9600...
[2024-09-06 09:35:41,465][01070] Num frames 9700...
[2024-09-06 09:35:41,597][01070] Num frames 9800...
[2024-09-06 09:35:41,723][01070] Num frames 9900...
[2024-09-06 09:35:41,853][01070] Num frames 10000...
[2024-09-06 09:35:41,979][01070] Num frames 10100...
[2024-09-06 09:35:42,103][01070] Num frames 10200...
[2024-09-06 09:35:42,228][01070] Num frames 10300...
[2024-09-06 09:35:42,326][01070] Avg episode rewards: #0: 27.259, true rewards: #0: 11.481
[2024-09-06 09:35:42,329][01070] Avg episode reward: 27.259, avg true_objective: 11.481
[2024-09-06 09:35:42,414][01070] Num frames 10400...
[2024-09-06 09:35:42,552][01070] Num frames 10500...
[2024-09-06 09:35:42,675][01070] Num frames 10600...
[2024-09-06 09:35:42,830][01070] Num frames 10700...
[2024-09-06 09:35:43,000][01070] Num frames 10800...
[2024-09-06 09:35:43,164][01070] Num frames 10900...
[2024-09-06 09:35:43,294][01070] Avg episode rewards: #0: 25.741, true rewards: #0: 10.941
[2024-09-06 09:35:43,299][01070] Avg episode reward: 25.741, avg true_objective: 10.941
[2024-09-06 09:36:50,704][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
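This second evaluation pass differs from the first only in its arguments: max_num_frames=100000, push_to_hub=True, and hf_repository='Re-Re/rl_course_vizdoom_health_gathering_supreme', so after the ten episodes and the replay video it uploads the checkpoint, config, and video to the Hugging Face Hub. A sketch of the invocation, under the same assumptions as the previous one (parse_vizdoom_cfg is an assumed helper; the flag values are taken from the config entries logged above):

    from sample_factory.enjoy import enjoy

    cfg = parse_vizdoom_cfg(  # assumed helper, as in the earlier sketch
        argv=[
            "--env=doom_health_gathering_supreme",
            "--num_workers=1",
            "--save_video",
            "--no_render",
            "--max_num_episodes=10",
            "--max_num_frames=100000",
            "--push_to_hub",
            "--hf_repository=Re-Re/rl_course_vizdoom_health_gathering_supreme",
        ],
        evaluation=True,
    )
    status = enjoy(cfg)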