[2024-08-22 18:24:17,978][01647] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-08-22 18:24:17,980][01647] Rollout worker 0 uses device cpu
[2024-08-22 18:24:17,982][01647] Rollout worker 1 uses device cpu
[2024-08-22 18:24:17,983][01647] Rollout worker 2 uses device cpu
[2024-08-22 18:24:17,984][01647] Rollout worker 3 uses device cpu
[2024-08-22 18:24:17,986][01647] Rollout worker 4 uses device cpu
[2024-08-22 18:24:17,987][01647] Rollout worker 5 uses device cpu
[2024-08-22 18:24:17,988][01647] Rollout worker 6 uses device cpu
[2024-08-22 18:24:17,989][01647] Rollout worker 7 uses device cpu
[2024-08-22 18:24:18,143][01647] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-22 18:24:18,145][01647] InferenceWorker_p0-w0: min num requests: 2
[2024-08-22 18:24:18,178][01647] Starting all processes...
[2024-08-22 18:24:18,179][01647] Starting process learner_proc0
[2024-08-22 18:24:18,228][01647] Starting all processes...
[2024-08-22 18:24:18,236][01647] Starting process inference_proc0-0
[2024-08-22 18:24:18,236][01647] Starting process rollout_proc0
[2024-08-22 18:24:18,238][01647] Starting process rollout_proc1
[2024-08-22 18:24:18,238][01647] Starting process rollout_proc2
[2024-08-22 18:24:18,238][01647] Starting process rollout_proc3
[2024-08-22 18:24:18,238][01647] Starting process rollout_proc4
[2024-08-22 18:24:18,238][01647] Starting process rollout_proc5
[2024-08-22 18:24:18,238][01647] Starting process rollout_proc6
[2024-08-22 18:24:18,238][01647] Starting process rollout_proc7
[2024-08-22 18:24:28,653][05181] Worker 2 uses CPU cores [0]
[2024-08-22 18:24:29,024][05182] Worker 0 uses CPU cores [0]
[2024-08-22 18:24:29,059][05165] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-22 18:24:29,060][05165] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-08-22 18:24:29,095][05165] Num visible devices: 1
[2024-08-22 18:24:29,131][05165] Starting seed is not provided
[2024-08-22 18:24:29,132][05165] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-22 18:24:29,132][05165] Initializing actor-critic model on device cuda:0
[2024-08-22 18:24:29,133][05165] RunningMeanStd input shape: (3, 72, 128)
[2024-08-22 18:24:29,134][05165] RunningMeanStd input shape: (1,)
[2024-08-22 18:24:29,261][05165] ConvEncoder: input_channels=3
[2024-08-22 18:24:29,407][05185] Worker 6 uses CPU cores [0]
[2024-08-22 18:24:29,484][05186] Worker 7 uses CPU cores [1]
[2024-08-22 18:24:29,536][05184] Worker 5 uses CPU cores [1]
[2024-08-22 18:24:29,553][05183] Worker 4 uses CPU cores [0]
[2024-08-22 18:24:29,659][05178] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-22 18:24:29,662][05178] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-08-22 18:24:29,666][05180] Worker 3 uses CPU cores [1]
[2024-08-22 18:24:29,679][05179] Worker 1 uses CPU cores [1]
[2024-08-22 18:24:29,715][05178] Num visible devices: 1
[2024-08-22 18:24:29,864][05165] Conv encoder output size: 512
[2024-08-22 18:24:29,865][05165] Policy head output size: 512
[2024-08-22 18:24:29,890][05165] Created Actor Critic model with architecture:
[2024-08-22 18:24:29,891][05165] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-08-22 18:24:33,820][05165] Using optimizer
[2024-08-22 18:24:33,821][05165] No checkpoints found
[2024-08-22 18:24:33,822][05165] Did not load from checkpoint, starting from scratch!
[2024-08-22 18:24:33,822][05165] Initialized policy 0 weights for model version 0
[2024-08-22 18:24:33,827][05165] LearnerWorker_p0 finished initialization!
[2024-08-22 18:24:33,829][05165] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-22 18:24:33,958][05178] RunningMeanStd input shape: (3, 72, 128)
[2024-08-22 18:24:33,960][05178] RunningMeanStd input shape: (1,)
[2024-08-22 18:24:33,977][05178] ConvEncoder: input_channels=3
[2024-08-22 18:24:34,074][01647] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-22 18:24:34,098][05178] Conv encoder output size: 512
[2024-08-22 18:24:34,099][05178] Policy head output size: 512
[2024-08-22 18:24:35,619][01647] Inference worker 0-0 is ready!
[2024-08-22 18:24:35,620][01647] All inference workers are ready! Signal rollout workers to start!
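The architecture dump above can be approximated with a short PyTorch sketch. This is an illustration only, not Sample Factory's actual classes: the 512-dim core, GRU cell, ELU activations, and the 5-action / 1-value heads come from the log, while the conv kernel sizes, strides, and channel counts are assumptions.

```python
import torch
import torch.nn as nn

class SharedWeightsActorCritic(nn.Module):
    """Illustrative stand-in for the logged ActorCriticSharedWeights model."""

    def __init__(self, num_actions: int = 5, core_size: int = 512):
        super().__init__()
        # Three Conv2d+ELU pairs, as in the logged conv_head; the exact
        # kernels/strides here are assumptions for a (3, 72, 128) input.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # Measure the flattened conv output so the Linear layer matches.
        with torch.no_grad():
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        # mlp_layers: Linear + ELU down to the 512-d encoder output.
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, core_size), nn.ELU())
        # ModelCoreRNN equivalent: GRU(512, 512).
        self.core = nn.GRU(core_size, core_size)
        # Value and action-logit heads (5 discrete actions per the log).
        self.critic_linear = nn.Linear(core_size, 1)
        self.distribution_linear = nn.Linear(core_size, num_actions)

    def forward(self, obs, rnn_state):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```

Because policy and value share the encoder and core ("shared weights"), one forward pass yields both the action logits and the state-value estimate.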
[2024-08-22 18:24:35,774][05181] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-22 18:24:35,776][05183] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-22 18:24:35,801][05179] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-22 18:24:35,803][05185] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-22 18:24:35,809][05180] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-22 18:24:35,810][05184] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-22 18:24:35,807][05182] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-22 18:24:35,811][05186] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-22 18:24:36,977][05181] Decorrelating experience for 0 frames...
[2024-08-22 18:24:36,979][05182] Decorrelating experience for 0 frames...
[2024-08-22 18:24:37,272][05180] Decorrelating experience for 0 frames...
[2024-08-22 18:24:37,275][05179] Decorrelating experience for 0 frames...
[2024-08-22 18:24:37,277][05186] Decorrelating experience for 0 frames...
[2024-08-22 18:24:37,283][05184] Decorrelating experience for 0 frames...
[2024-08-22 18:24:37,374][05185] Decorrelating experience for 0 frames...
[2024-08-22 18:24:37,757][05182] Decorrelating experience for 32 frames...
[2024-08-22 18:24:38,135][01647] Heartbeat connected on Batcher_0
[2024-08-22 18:24:38,139][01647] Heartbeat connected on LearnerWorker_p0
[2024-08-22 18:24:38,181][01647] Heartbeat connected on InferenceWorker_p0-w0
[2024-08-22 18:24:38,235][05182] Decorrelating experience for 64 frames...
[2024-08-22 18:24:38,366][05180] Decorrelating experience for 32 frames...
[2024-08-22 18:24:38,367][05179] Decorrelating experience for 32 frames...
[2024-08-22 18:24:38,368][05184] Decorrelating experience for 32 frames...
[2024-08-22 18:24:38,975][05182] Decorrelating experience for 96 frames...
[2024-08-22 18:24:39,074][01647] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-22 18:24:39,130][01647] Heartbeat connected on RolloutWorker_w0
[2024-08-22 18:24:39,216][05183] Decorrelating experience for 0 frames...
[2024-08-22 18:24:39,793][05185] Decorrelating experience for 32 frames...
[2024-08-22 18:24:39,814][05186] Decorrelating experience for 32 frames...
[2024-08-22 18:24:40,054][05180] Decorrelating experience for 64 frames...
[2024-08-22 18:24:40,057][05184] Decorrelating experience for 64 frames...
[2024-08-22 18:24:40,542][05179] Decorrelating experience for 64 frames...
[2024-08-22 18:24:41,119][05183] Decorrelating experience for 32 frames...
[2024-08-22 18:24:41,414][05185] Decorrelating experience for 64 frames...
[2024-08-22 18:24:42,402][05184] Decorrelating experience for 96 frames...
[2024-08-22 18:24:42,404][05180] Decorrelating experience for 96 frames...
[2024-08-22 18:24:42,790][01647] Heartbeat connected on RolloutWorker_w5
[2024-08-22 18:24:42,797][01647] Heartbeat connected on RolloutWorker_w3
[2024-08-22 18:24:43,220][05186] Decorrelating experience for 64 frames...
[2024-08-22 18:24:43,999][05179] Decorrelating experience for 96 frames...
[2024-08-22 18:24:44,075][01647] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.8. Samples: 8. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-22 18:24:44,077][01647] Avg episode reward: [(0, '1.120')]
[2024-08-22 18:24:44,581][01647] Heartbeat connected on RolloutWorker_w1
[2024-08-22 18:24:44,751][05185] Decorrelating experience for 96 frames...
[2024-08-22 18:24:45,160][01647] Heartbeat connected on RolloutWorker_w6
[2024-08-22 18:24:45,488][05183] Decorrelating experience for 64 frames...
[2024-08-22 18:24:48,619][05186] Decorrelating experience for 96 frames...
[2024-08-22 18:24:48,665][05181] Decorrelating experience for 32 frames...
[2024-08-22 18:24:49,075][01647] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 115.2. Samples: 1728. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-22 18:24:49,081][01647] Avg episode reward: [(0, '2.464')]
[2024-08-22 18:24:49,322][01647] Heartbeat connected on RolloutWorker_w7
[2024-08-22 18:24:49,407][05165] Signal inference workers to stop experience collection...
[2024-08-22 18:24:49,429][05178] InferenceWorker_p0-w0: stopping experience collection
[2024-08-22 18:24:49,737][05183] Decorrelating experience for 96 frames...
[2024-08-22 18:24:49,828][01647] Heartbeat connected on RolloutWorker_w4
[2024-08-22 18:24:49,942][05181] Decorrelating experience for 64 frames...
[2024-08-22 18:24:50,353][05181] Decorrelating experience for 96 frames...
[2024-08-22 18:24:50,417][01647] Heartbeat connected on RolloutWorker_w2
[2024-08-22 18:24:51,114][05165] Signal inference workers to resume experience collection...
[2024-08-22 18:24:51,115][05178] InferenceWorker_p0-w0: resuming experience collection
[2024-08-22 18:24:54,074][01647] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 208.3. Samples: 4166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:24:54,080][01647] Avg episode reward: [(0, '3.223')]
[2024-08-22 18:24:59,074][01647] Fps is (10 sec: 3277.1, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 32768. Throughput: 0: 292.7. Samples: 7318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:24:59,079][01647] Avg episode reward: [(0, '3.792')]
[2024-08-22 18:25:00,756][05178] Updated weights for policy 0, policy_version 10 (0.0017)
[2024-08-22 18:25:04,076][01647] Fps is (10 sec: 3276.3, 60 sec: 1638.3, 300 sec: 1638.3). Total num frames: 49152. Throughput: 0: 401.0. Samples: 12032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:25:04,080][01647] Avg episode reward: [(0, '4.133')]
[2024-08-22 18:25:09,074][01647] Fps is (10 sec: 3276.8, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 65536. Throughput: 0: 489.3. Samples: 17126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:25:09,077][01647] Avg episode reward: [(0, '4.331')]
[2024-08-22 18:25:12,113][05178] Updated weights for policy 0, policy_version 20 (0.0019)
[2024-08-22 18:25:14,074][01647] Fps is (10 sec: 4096.6, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 90112. Throughput: 0: 509.2. Samples: 20370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:25:14,076][01647] Avg episode reward: [(0, '4.323')]
[2024-08-22 18:25:19,074][01647] Fps is (10 sec: 4096.0, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 579.9. Samples: 26094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:25:19,077][01647] Avg episode reward: [(0, '4.195')]
[2024-08-22 18:25:19,082][05165] Saving new best policy, reward=4.195!
[2024-08-22 18:25:24,074][01647] Fps is (10 sec: 2867.2, 60 sec: 2375.7, 300 sec: 2375.7). Total num frames: 118784. Throughput: 0: 665.6. Samples: 29954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:25:24,082][01647] Avg episode reward: [(0, '4.293')]
[2024-08-22 18:25:24,092][05165] Saving new best policy, reward=4.293!
[2024-08-22 18:25:24,903][05178] Updated weights for policy 0, policy_version 30 (0.0030)
[2024-08-22 18:25:29,074][01647] Fps is (10 sec: 3276.8, 60 sec: 2532.1, 300 sec: 2532.1). Total num frames: 139264. Throughput: 0: 735.5. Samples: 33106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:25:29,076][01647] Avg episode reward: [(0, '4.299')]
[2024-08-22 18:25:29,086][05165] Saving new best policy, reward=4.299!
[2024-08-22 18:25:34,074][01647] Fps is (10 sec: 4096.0, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 159744. Throughput: 0: 833.1. Samples: 39218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:25:34,077][01647] Avg episode reward: [(0, '4.347')]
[2024-08-22 18:25:34,087][05165] Saving new best policy, reward=4.347!
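The recurring status lines above follow a fixed shape, so the training curve can be recovered from the log with a small parser. This helper is not part of Sample Factory; it is a hedged sketch that pulls the 10-second FPS, total frame count, and average episode reward out of lines formatted like the ones in this log.

```python
import re

# Matches the "Fps is (...). Total num frames: N." status lines.
STATUS_RE = re.compile(
    r"Fps is \(10 sec: (?P<fps10>[\d.]+|nan),.*?"
    r"Total num frames: (?P<frames>\d+)"
)
# Matches the "Avg episode reward: [(0, '4.331')]" lines.
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '(?P<reward>-?[\d.]+)'\)\]")

def parse_log_line(line: str):
    """Return ('status', fps10, frames) or ('reward', value), else None."""
    m = STATUS_RE.search(line)
    if m:
        return ("status", float(m.group("fps10")), int(m.group("frames")))
    m = REWARD_RE.search(line)
    if m:
        return ("reward", float(m.group("reward")))
    return None
```

For example, feeding it a status line and a reward line from the log:

```python
status = parse_log_line(
    "[2024-08-22 18:25:14,074][01647] Fps is (10 sec: 4096.6, 60 sec: 2252.8, "
    "300 sec: 2252.8). Total num frames: 90112."
)
reward = parse_log_line("[2024-08-22 18:25:14,076][01647] Avg episode reward: [(0, '4.323')]")
print(status)  # → ('status', 4096.6, 90112)
print(reward)  # → ('reward', 4.323)
```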
[2024-08-22 18:25:35,558][05178] Updated weights for policy 0, policy_version 40 (0.0030)
[2024-08-22 18:25:39,076][01647] Fps is (10 sec: 2866.8, 60 sec: 2798.9, 300 sec: 2583.6). Total num frames: 167936. Throughput: 0: 866.2. Samples: 43146. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:25:39,085][01647] Avg episode reward: [(0, '4.436')]
[2024-08-22 18:25:39,177][05165] Saving new best policy, reward=4.436!
[2024-08-22 18:25:44,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2691.7). Total num frames: 188416. Throughput: 0: 840.8. Samples: 45154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:25:44,078][01647] Avg episode reward: [(0, '4.412')]
[2024-08-22 18:25:48,251][05178] Updated weights for policy 0, policy_version 50 (0.0019)
[2024-08-22 18:25:49,074][01647] Fps is (10 sec: 3686.9, 60 sec: 3413.4, 300 sec: 2730.7). Total num frames: 204800. Throughput: 0: 869.9. Samples: 51176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:25:49,080][01647] Avg episode reward: [(0, '4.354')]
[2024-08-22 18:25:54,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 2764.8). Total num frames: 221184. Throughput: 0: 865.5. Samples: 56072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:25:54,076][01647] Avg episode reward: [(0, '4.445')]
[2024-08-22 18:25:54,091][05165] Saving new best policy, reward=4.445!
[2024-08-22 18:25:59,077][01647] Fps is (10 sec: 2866.4, 60 sec: 3344.9, 300 sec: 2746.6). Total num frames: 233472. Throughput: 0: 832.7. Samples: 57844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:25:59,081][01647] Avg episode reward: [(0, '4.452')]
[2024-08-22 18:25:59,084][05165] Saving new best policy, reward=4.452!
[2024-08-22 18:26:01,415][05178] Updated weights for policy 0, policy_version 60 (0.0018)
[2024-08-22 18:26:04,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 2821.7). Total num frames: 253952. Throughput: 0: 823.4. Samples: 63148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:26:04,080][01647] Avg episode reward: [(0, '4.672')]
[2024-08-22 18:26:04,089][05165] Saving new best policy, reward=4.672!
[2024-08-22 18:26:09,074][01647] Fps is (10 sec: 4097.1, 60 sec: 3481.6, 300 sec: 2888.8). Total num frames: 274432. Throughput: 0: 872.9. Samples: 69234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:26:09,083][01647] Avg episode reward: [(0, '4.688')]
[2024-08-22 18:26:09,085][05165] Saving new best policy, reward=4.688!
[2024-08-22 18:26:13,451][05178] Updated weights for policy 0, policy_version 70 (0.0016)
[2024-08-22 18:26:14,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2867.2). Total num frames: 286720. Throughput: 0: 843.0. Samples: 71040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:26:14,077][01647] Avg episode reward: [(0, '4.654')]
[2024-08-22 18:26:14,094][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000070_286720.pth...
[2024-08-22 18:26:19,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2886.7). Total num frames: 303104. Throughput: 0: 807.2. Samples: 75542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:26:19,080][01647] Avg episode reward: [(0, '4.629')]
[2024-08-22 18:26:24,075][01647] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 2941.7). Total num frames: 323584. Throughput: 0: 858.0. Samples: 81756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:26:24,077][01647] Avg episode reward: [(0, '4.714')]
[2024-08-22 18:26:24,089][05165] Saving new best policy, reward=4.714!
[2024-08-22 18:26:24,506][05178] Updated weights for policy 0, policy_version 80 (0.0019)
[2024-08-22 18:26:29,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2956.2). Total num frames: 339968. Throughput: 0: 872.8. Samples: 84430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-08-22 18:26:29,078][01647] Avg episode reward: [(0, '4.508')]
[2024-08-22 18:26:34,074][01647] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 2935.5). Total num frames: 352256. Throughput: 0: 825.1. Samples: 88304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:26:34,077][01647] Avg episode reward: [(0, '4.508')]
[2024-08-22 18:26:37,172][05178] Updated weights for policy 0, policy_version 90 (0.0015)
[2024-08-22 18:26:39,075][01647] Fps is (10 sec: 3276.7, 60 sec: 3413.4, 300 sec: 2981.9). Total num frames: 372736. Throughput: 0: 851.0. Samples: 94368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:26:39,078][01647] Avg episode reward: [(0, '4.404')]
[2024-08-22 18:26:44,075][01647] Fps is (10 sec: 4505.5, 60 sec: 3481.6, 300 sec: 3056.2). Total num frames: 397312. Throughput: 0: 882.9. Samples: 97574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:26:44,077][01647] Avg episode reward: [(0, '4.601')]
[2024-08-22 18:26:48,824][05178] Updated weights for policy 0, policy_version 100 (0.0021)
[2024-08-22 18:26:49,074][01647] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3034.1). Total num frames: 409600. Throughput: 0: 866.0. Samples: 102120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:26:49,077][01647] Avg episode reward: [(0, '4.669')]
[2024-08-22 18:26:54,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3042.7). Total num frames: 425984. Throughput: 0: 841.6. Samples: 107104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:26:54,080][01647] Avg episode reward: [(0, '4.585')]
[2024-08-22 18:26:59,075][01647] Fps is (10 sec: 3686.3, 60 sec: 3550.0, 300 sec: 3079.1). Total num frames: 446464. Throughput: 0: 871.4. Samples: 110254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:26:59,079][01647] Avg episode reward: [(0, '4.355')]
[2024-08-22 18:26:59,430][05178] Updated weights for policy 0, policy_version 110 (0.0037)
[2024-08-22 18:27:04,076][01647] Fps is (10 sec: 3685.8, 60 sec: 3481.5, 300 sec: 3085.6). Total num frames: 462848. Throughput: 0: 894.9. Samples: 115812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:27:04,080][01647] Avg episode reward: [(0, '4.368')]
[2024-08-22 18:27:09,074][01647] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3091.8). Total num frames: 479232. Throughput: 0: 850.3. Samples: 120020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:27:09,079][01647] Avg episode reward: [(0, '4.396')]
[2024-08-22 18:27:11,749][05178] Updated weights for policy 0, policy_version 120 (0.0012)
[2024-08-22 18:27:14,074][01647] Fps is (10 sec: 3687.0, 60 sec: 3549.9, 300 sec: 3123.2). Total num frames: 499712. Throughput: 0: 860.7. Samples: 123162. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:27:14,076][01647] Avg episode reward: [(0, '4.557')]
[2024-08-22 18:27:19,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3127.9). Total num frames: 516096. Throughput: 0: 912.7. Samples: 129376. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-22 18:27:19,080][01647] Avg episode reward: [(0, '4.603')]
[2024-08-22 18:27:24,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3108.1). Total num frames: 528384. Throughput: 0: 862.8. Samples: 133194. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:27:24,077][01647] Avg episode reward: [(0, '4.656')]
[2024-08-22 18:27:24,112][05178] Updated weights for policy 0, policy_version 130 (0.0016)
[2024-08-22 18:27:29,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3136.4). Total num frames: 548864. Throughput: 0: 847.9. Samples: 135728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:27:29,079][01647] Avg episode reward: [(0, '4.652')]
[2024-08-22 18:27:34,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3163.0). Total num frames: 569344. Throughput: 0: 885.7. Samples: 141976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:27:34,080][01647] Avg episode reward: [(0, '4.654')]
[2024-08-22 18:27:34,514][05178] Updated weights for policy 0, policy_version 140 (0.0022)
[2024-08-22 18:27:39,080][01647] Fps is (10 sec: 3684.2, 60 sec: 3549.5, 300 sec: 3166.0). Total num frames: 585728. Throughput: 0: 882.4. Samples: 146818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:27:39,083][01647] Avg episode reward: [(0, '4.532')]
[2024-08-22 18:27:44,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3147.5). Total num frames: 598016. Throughput: 0: 854.7. Samples: 148716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:27:44,077][01647] Avg episode reward: [(0, '4.529')]
[2024-08-22 18:27:47,164][05178] Updated weights for policy 0, policy_version 150 (0.0033)
[2024-08-22 18:27:49,074][01647] Fps is (10 sec: 3278.8, 60 sec: 3481.6, 300 sec: 3171.8). Total num frames: 618496. Throughput: 0: 860.1. Samples: 154516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:27:49,077][01647] Avg episode reward: [(0, '4.370')]
[2024-08-22 18:27:54,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3174.4). Total num frames: 634880. Throughput: 0: 889.9. Samples: 160064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:27:54,079][01647] Avg episode reward: [(0, '4.542')]
[2024-08-22 18:27:59,075][01647] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3176.9). Total num frames: 651264. Throughput: 0: 860.9. Samples: 161904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:27:59,078][01647] Avg episode reward: [(0, '4.534')]
[2024-08-22 18:28:00,314][05178] Updated weights for policy 0, policy_version 160 (0.0014)
[2024-08-22 18:28:04,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3179.3). Total num frames: 667648. Throughput: 0: 832.3. Samples: 166828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:28:04,077][01647] Avg episode reward: [(0, '4.555')]
[2024-08-22 18:28:09,074][01647] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3200.6). Total num frames: 688128. Throughput: 0: 888.2. Samples: 173162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:28:09,076][01647] Avg episode reward: [(0, '4.510')]
[2024-08-22 18:28:10,214][05178] Updated weights for policy 0, policy_version 170 (0.0018)
[2024-08-22 18:28:14,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3202.3). Total num frames: 704512. Throughput: 0: 885.9. Samples: 175592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:28:14,076][01647] Avg episode reward: [(0, '4.435')]
[2024-08-22 18:28:14,092][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000172_704512.pth...
[2024-08-22 18:28:19,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3204.0). Total num frames: 720896. Throughput: 0: 834.8. Samples: 179540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:28:19,084][01647] Avg episode reward: [(0, '4.593')]
[2024-08-22 18:28:22,851][05178] Updated weights for policy 0, policy_version 180 (0.0020)
[2024-08-22 18:28:24,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3223.4). Total num frames: 741376. Throughput: 0: 862.2. Samples: 185612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:28:24,077][01647] Avg episode reward: [(0, '4.618')]
[2024-08-22 18:28:29,085][01647] Fps is (10 sec: 3682.5, 60 sec: 3481.0, 300 sec: 3224.4). Total num frames: 757760. Throughput: 0: 888.3. Samples: 188700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:28:29,087][01647] Avg episode reward: [(0, '4.580')]
[2024-08-22 18:28:34,074][01647] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3208.5). Total num frames: 770048. Throughput: 0: 851.2. Samples: 192822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:28:34,077][01647] Avg episode reward: [(0, '4.603')]
[2024-08-22 18:28:35,791][05178] Updated weights for policy 0, policy_version 190 (0.0017)
[2024-08-22 18:28:39,074][01647] Fps is (10 sec: 3280.2, 60 sec: 3413.7, 300 sec: 3226.6). Total num frames: 790528. Throughput: 0: 848.7. Samples: 198254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:28:39,076][01647] Avg episode reward: [(0, '4.546')]
[2024-08-22 18:28:44,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3244.0). Total num frames: 811008. Throughput: 0: 878.5. Samples: 201436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:28:44,077][01647] Avg episode reward: [(0, '4.445')]
[2024-08-22 18:28:45,442][05178] Updated weights for policy 0, policy_version 200 (0.0018)
[2024-08-22 18:28:49,075][01647] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3244.7). Total num frames: 827392. Throughput: 0: 885.0. Samples: 206654. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:28:49,079][01647] Avg episode reward: [(0, '4.439')]
[2024-08-22 18:28:54,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3245.3). Total num frames: 843776. Throughput: 0: 843.2. Samples: 211108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:28:54,078][01647] Avg episode reward: [(0, '4.456')]
[2024-08-22 18:28:57,909][05178] Updated weights for policy 0, policy_version 210 (0.0018)
[2024-08-22 18:28:59,074][01647] Fps is (10 sec: 3686.6, 60 sec: 3549.9, 300 sec: 3261.3). Total num frames: 864256. Throughput: 0: 859.1. Samples: 214252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:28:59,076][01647] Avg episode reward: [(0, '4.630')]
[2024-08-22 18:29:04,076][01647] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3261.6). Total num frames: 880640. Throughput: 0: 909.9. Samples: 220486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:29:04,081][01647] Avg episode reward: [(0, '4.703')]
[2024-08-22 18:29:09,077][01647] Fps is (10 sec: 2866.5, 60 sec: 3413.2, 300 sec: 3247.0). Total num frames: 892928. Throughput: 0: 860.8. Samples: 224348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:29:09,081][01647] Avg episode reward: [(0, '4.628')]
[2024-08-22 18:29:10,598][05178] Updated weights for policy 0, policy_version 220 (0.0035)
[2024-08-22 18:29:14,074][01647] Fps is (10 sec: 3277.3, 60 sec: 3481.6, 300 sec: 3262.2). Total num frames: 913408. Throughput: 0: 853.4. Samples: 227096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:29:14,077][01647] Avg episode reward: [(0, '4.627')]
[2024-08-22 18:29:19,074][01647] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 933888. Throughput: 0: 902.8. Samples: 233446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:29:19,077][01647] Avg episode reward: [(0, '4.439')]
[2024-08-22 18:29:20,715][05178] Updated weights for policy 0, policy_version 230 (0.0022)
[2024-08-22 18:29:24,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 950272. Throughput: 0: 886.0. Samples: 238124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:29:24,078][01647] Avg episode reward: [(0, '4.488')]
[2024-08-22 18:29:29,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3482.2, 300 sec: 3276.8). Total num frames: 966656. Throughput: 0: 858.8. Samples: 240084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:29:29,077][01647] Avg episode reward: [(0, '4.562')]
[2024-08-22 18:29:32,611][05178] Updated weights for policy 0, policy_version 240 (0.0026)
[2024-08-22 18:29:34,074][01647] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3346.2). Total num frames: 987136. Throughput: 0: 880.8. Samples: 246290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:29:34,082][01647] Avg episode reward: [(0, '4.478')]
[2024-08-22 18:29:39,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3401.8). Total num frames: 1003520. Throughput: 0: 911.3. Samples: 252116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:29:39,078][01647] Avg episode reward: [(0, '4.426')]
[2024-08-22 18:29:44,075][01647] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1019904. Throughput: 0: 883.4. Samples: 254004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:29:44,080][01647] Avg episode reward: [(0, '4.489')]
[2024-08-22 18:29:45,071][05178] Updated weights for policy 0, policy_version 250 (0.0023)
[2024-08-22 18:29:49,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 1040384. Throughput: 0: 866.0. Samples: 259454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:29:49,078][01647] Avg episode reward: [(0, '4.491')]
[2024-08-22 18:29:54,074][01647] Fps is (10 sec: 4096.4, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 1060864. Throughput: 0: 919.4. Samples: 265718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:29:54,082][01647] Avg episode reward: [(0, '4.509')]
[2024-08-22 18:29:54,626][05178] Updated weights for policy 0, policy_version 260 (0.0023)
[2024-08-22 18:29:59,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1073152. Throughput: 0: 907.2. Samples: 267918. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-22 18:29:59,077][01647] Avg episode reward: [(0, '4.518')]
[2024-08-22 18:30:04,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3485.1). Total num frames: 1093632. Throughput: 0: 864.1. Samples: 272332. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-08-22 18:30:04,076][01647] Avg episode reward: [(0, '4.402')]
[2024-08-22 18:30:06,924][05178] Updated weights for policy 0, policy_version 270 (0.0017)
[2024-08-22 18:30:09,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3686.5, 300 sec: 3471.2). Total num frames: 1114112. Throughput: 0: 903.4. Samples: 278778. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:30:09,079][01647] Avg episode reward: [(0, '4.384')]
[2024-08-22 18:30:14,076][01647] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3471.2). Total num frames: 1130496. Throughput: 0: 930.0. Samples: 281936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:30:14,079][01647] Avg episode reward: [(0, '4.632')]
[2024-08-22 18:30:14,091][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000276_1130496.pth...
[2024-08-22 18:30:14,282][05165] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000070_286720.pth
[2024-08-22 18:30:19,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1142784. Throughput: 0: 877.7. Samples: 285788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:30:19,079][01647] Avg episode reward: [(0, '4.861')]
[2024-08-22 18:30:19,082][05165] Saving new best policy, reward=4.861!
[2024-08-22 18:30:19,704][05178] Updated weights for policy 0, policy_version 280 (0.0014)
[2024-08-22 18:30:24,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 1163264. Throughput: 0: 876.4. Samples: 291554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:30:24,083][01647] Avg episode reward: [(0, '4.890')]
[2024-08-22 18:30:24,097][05165] Saving new best policy, reward=4.890!
[2024-08-22 18:30:29,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3471.2). Total num frames: 1183744. Throughput: 0: 898.9. Samples: 294454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:30:29,078][01647] Avg episode reward: [(0, '4.910')]
[2024-08-22 18:30:29,084][05165] Saving new best policy, reward=4.910!
[2024-08-22 18:30:29,988][05178] Updated weights for policy 0, policy_version 290 (0.0018)
[2024-08-22 18:30:34,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1196032. Throughput: 0: 885.2. Samples: 299286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:30:34,077][01647] Avg episode reward: [(0, '4.844')]
[2024-08-22 18:30:39,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1212416. Throughput: 0: 854.0. Samples: 304146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:30:39,076][01647] Avg episode reward: [(0, '4.907')]
[2024-08-22 18:30:42,061][05178] Updated weights for policy 0, policy_version 300 (0.0052)
[2024-08-22 18:30:44,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1232896. Throughput: 0: 874.9. Samples: 307290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:30:44,079][01647] Avg episode reward: [(0, '4.796')]
[2024-08-22 18:30:49,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1253376. Throughput: 0: 907.8. Samples: 313184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:30:49,082][01647] Avg episode reward: [(0, '4.779')]
[2024-08-22 18:30:54,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 1265664. Throughput: 0: 851.6. Samples: 317100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:30:54,078][01647] Avg episode reward: [(0, '4.776')]
[2024-08-22 18:30:54,532][05178] Updated weights for policy 0, policy_version 310 (0.0020)
[2024-08-22 18:30:59,075][01647] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3499.0). Total num frames: 1286144. Throughput: 0: 849.9. Samples: 320182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:30:59,082][01647] Avg episode reward: [(0, '4.821')]
[2024-08-22 18:31:04,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1306624. Throughput: 0: 906.7. Samples: 326590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:31:04,083][01647] Avg episode reward: [(0, '4.829')]
[2024-08-22 18:31:04,592][05178] Updated weights for policy 0, policy_version 320 (0.0012)
[2024-08-22 18:31:09,075][01647] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 1318912. Throughput: 0: 874.2. Samples: 330894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:31:09,080][01647] Avg episode reward: [(0, '4.723')]
[2024-08-22 18:31:14,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1339392. Throughput: 0: 862.4. Samples: 333262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:31:14,081][01647] Avg episode reward: [(0, '4.657')]
[2024-08-22 18:31:16,568][05178] Updated weights for policy 0, policy_version 330 (0.0016)
[2024-08-22 18:31:19,074][01647] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 1359872. Throughput: 0: 895.0. Samples: 339560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:31:19,077][01647] Avg episode reward: [(0, '4.891')]
[2024-08-22 18:31:24,076][01647] Fps is (10 sec: 3685.7, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 1376256. Throughput: 0: 904.0. Samples: 344826.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:31:24,083][01647] Avg episode reward: [(0, '4.682')] [2024-08-22 18:31:29,059][05178] Updated weights for policy 0, policy_version 340 (0.0048) [2024-08-22 18:31:29,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1392640. Throughput: 0: 876.2. Samples: 346718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:31:29,077][01647] Avg episode reward: [(0, '4.358')] [2024-08-22 18:31:34,074][01647] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 1413120. Throughput: 0: 873.6. Samples: 352498. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-22 18:31:34,076][01647] Avg episode reward: [(0, '4.479')] [2024-08-22 18:31:39,014][05178] Updated weights for policy 0, policy_version 350 (0.0026) [2024-08-22 18:31:39,079][01647] Fps is (10 sec: 4094.2, 60 sec: 3686.1, 300 sec: 3512.8). Total num frames: 1433600. Throughput: 0: 926.1. Samples: 358778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-22 18:31:39,081][01647] Avg episode reward: [(0, '4.728')] [2024-08-22 18:31:44,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1445888. Throughput: 0: 901.2. Samples: 360734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-22 18:31:44,079][01647] Avg episode reward: [(0, '4.751')] [2024-08-22 18:31:49,074][01647] Fps is (10 sec: 3278.3, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1466368. Throughput: 0: 865.4. Samples: 365534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:31:49,078][01647] Avg episode reward: [(0, '4.539')] [2024-08-22 18:31:51,033][05178] Updated weights for policy 0, policy_version 360 (0.0037) [2024-08-22 18:31:54,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 1486848. Throughput: 0: 912.1. Samples: 371940. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-22 18:31:54,076][01647] Avg episode reward: [(0, '4.582')] [2024-08-22 18:31:59,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.9). Total num frames: 1499136. Throughput: 0: 920.7. Samples: 374692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-22 18:31:59,079][01647] Avg episode reward: [(0, '4.551')] [2024-08-22 18:32:03,644][05178] Updated weights for policy 0, policy_version 370 (0.0017) [2024-08-22 18:32:04,075][01647] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1515520. Throughput: 0: 866.6. Samples: 378556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-22 18:32:04,077][01647] Avg episode reward: [(0, '4.598')] [2024-08-22 18:32:09,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3512.8). Total num frames: 1536000. Throughput: 0: 890.0. Samples: 384874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:32:09,079][01647] Avg episode reward: [(0, '4.822')] [2024-08-22 18:32:13,086][05178] Updated weights for policy 0, policy_version 380 (0.0013) [2024-08-22 18:32:14,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 1556480. Throughput: 0: 919.1. Samples: 388078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:32:14,076][01647] Avg episode reward: [(0, '4.698')] [2024-08-22 18:32:14,088][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000380_1556480.pth... [2024-08-22 18:32:14,263][05165] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000172_704512.pth [2024-08-22 18:32:19,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1568768. Throughput: 0: 890.1. Samples: 392552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-22 18:32:19,077][01647] Avg episode reward: [(0, '4.729')] [2024-08-22 18:32:24,075][01647] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3526.7). 
Total num frames: 1589248. Throughput: 0: 867.5. Samples: 397812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:32:24,082][01647] Avg episode reward: [(0, '4.725')] [2024-08-22 18:32:25,824][05178] Updated weights for policy 0, policy_version 390 (0.0025) [2024-08-22 18:32:29,076][01647] Fps is (10 sec: 4095.5, 60 sec: 3618.0, 300 sec: 3526.7). Total num frames: 1609728. Throughput: 0: 894.0. Samples: 400964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-22 18:32:29,082][01647] Avg episode reward: [(0, '4.570')] [2024-08-22 18:32:34,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 1626112. Throughput: 0: 907.6. Samples: 406376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:32:34,081][01647] Avg episode reward: [(0, '4.580')] [2024-08-22 18:32:38,357][05178] Updated weights for policy 0, policy_version 400 (0.0017) [2024-08-22 18:32:39,074][01647] Fps is (10 sec: 2867.6, 60 sec: 3413.6, 300 sec: 3526.7). Total num frames: 1638400. Throughput: 0: 859.3. Samples: 410610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:32:39,077][01647] Avg episode reward: [(0, '4.723')] [2024-08-22 18:32:44,076][01647] Fps is (10 sec: 3276.4, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 1658880. Throughput: 0: 867.7. Samples: 413740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-22 18:32:44,082][01647] Avg episode reward: [(0, '4.861')] [2024-08-22 18:32:48,112][05178] Updated weights for policy 0, policy_version 410 (0.0024) [2024-08-22 18:32:49,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1679360. Throughput: 0: 922.4. Samples: 420066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:32:49,079][01647] Avg episode reward: [(0, '4.690')] [2024-08-22 18:32:54,074][01647] Fps is (10 sec: 3277.3, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 1691648. Throughput: 0: 867.6. Samples: 423916. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-22 18:32:54,080][01647] Avg episode reward: [(0, '4.862')] [2024-08-22 18:32:59,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1712128. Throughput: 0: 853.5. Samples: 426486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-22 18:32:59,077][01647] Avg episode reward: [(0, '4.747')] [2024-08-22 18:33:00,775][05178] Updated weights for policy 0, policy_version 420 (0.0020) [2024-08-22 18:33:04,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1732608. Throughput: 0: 890.6. Samples: 432628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:33:04,080][01647] Avg episode reward: [(0, '4.714')] [2024-08-22 18:33:09,077][01647] Fps is (10 sec: 3276.0, 60 sec: 3481.5, 300 sec: 3526.7). Total num frames: 1744896. Throughput: 0: 882.0. Samples: 437506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-22 18:33:09,081][01647] Avg episode reward: [(0, '4.789')] [2024-08-22 18:33:13,382][05178] Updated weights for policy 0, policy_version 430 (0.0027) [2024-08-22 18:33:14,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 1761280. Throughput: 0: 854.6. Samples: 439418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-22 18:33:14,077][01647] Avg episode reward: [(0, '4.894')] [2024-08-22 18:33:19,074][01647] Fps is (10 sec: 3687.3, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1781760. Throughput: 0: 865.2. Samples: 445310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:33:19,081][01647] Avg episode reward: [(0, '4.882')] [2024-08-22 18:33:23,613][05178] Updated weights for policy 0, policy_version 440 (0.0013) [2024-08-22 18:33:24,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3540.7). Total num frames: 1802240. Throughput: 0: 900.0. Samples: 451110. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-22 18:33:24,081][01647] Avg episode reward: [(0, '4.668')] [2024-08-22 18:33:29,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3540.6). Total num frames: 1814528. Throughput: 0: 874.6. Samples: 453096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-22 18:33:29,082][01647] Avg episode reward: [(0, '4.661')] [2024-08-22 18:33:34,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1835008. Throughput: 0: 848.2. Samples: 458234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:33:34,081][01647] Avg episode reward: [(0, '4.658')] [2024-08-22 18:33:35,478][05178] Updated weights for policy 0, policy_version 450 (0.0030) [2024-08-22 18:33:39,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1855488. Throughput: 0: 908.9. Samples: 464816. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-22 18:33:39,082][01647] Avg episode reward: [(0, '4.588')] [2024-08-22 18:33:44,075][01647] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1871872. Throughput: 0: 910.9. Samples: 467476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:33:44,079][01647] Avg episode reward: [(0, '4.546')] [2024-08-22 18:33:47,311][05178] Updated weights for policy 0, policy_version 460 (0.0039) [2024-08-22 18:33:49,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1892352. Throughput: 0: 874.2. Samples: 471966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:33:49,076][01647] Avg episode reward: [(0, '4.700')] [2024-08-22 18:33:54,074][01647] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 1912832. Throughput: 0: 917.2. Samples: 478778. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-22 18:33:54,078][01647] Avg episode reward: [(0, '4.715')] [2024-08-22 18:33:56,284][05178] Updated weights for policy 0, policy_version 470 (0.0012) [2024-08-22 18:33:59,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 1933312. Throughput: 0: 950.9. Samples: 482208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:33:59,081][01647] Avg episode reward: [(0, '4.901')] [2024-08-22 18:34:04,078][01647] Fps is (10 sec: 3275.4, 60 sec: 3549.6, 300 sec: 3568.4). Total num frames: 1945600. Throughput: 0: 915.4. Samples: 486506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:34:04,084][01647] Avg episode reward: [(0, '4.877')] [2024-08-22 18:34:08,196][05178] Updated weights for policy 0, policy_version 480 (0.0020) [2024-08-22 18:34:09,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3568.4). Total num frames: 1966080. Throughput: 0: 919.2. Samples: 492472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:34:09,077][01647] Avg episode reward: [(0, '4.827')] [2024-08-22 18:34:14,074][01647] Fps is (10 sec: 4507.5, 60 sec: 3822.9, 300 sec: 3582.3). Total num frames: 1990656. Throughput: 0: 951.6. Samples: 495920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-22 18:34:14,079][01647] Avg episode reward: [(0, '4.744')] [2024-08-22 18:34:14,089][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000486_1990656.pth... [2024-08-22 18:34:14,248][05165] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000276_1130496.pth [2024-08-22 18:34:18,950][05178] Updated weights for policy 0, policy_version 490 (0.0024) [2024-08-22 18:34:19,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 2007040. Throughput: 0: 956.8. Samples: 501292. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:34:19,077][01647] Avg episode reward: [(0, '4.678')] [2024-08-22 18:34:24,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2023424. Throughput: 0: 922.8. Samples: 506340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-22 18:34:24,082][01647] Avg episode reward: [(0, '4.880')] [2024-08-22 18:34:29,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3582.3). Total num frames: 2043904. Throughput: 0: 939.8. Samples: 509768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-22 18:34:29,080][01647] Avg episode reward: [(0, '4.731')] [2024-08-22 18:34:29,217][05178] Updated weights for policy 0, policy_version 500 (0.0028) [2024-08-22 18:34:34,077][01647] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3596.1). Total num frames: 2064384. Throughput: 0: 976.8. Samples: 515926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-22 18:34:34,081][01647] Avg episode reward: [(0, '4.518')] [2024-08-22 18:34:39,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2076672. Throughput: 0: 919.4. Samples: 520152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-22 18:34:39,077][01647] Avg episode reward: [(0, '4.592')] [2024-08-22 18:34:41,221][05178] Updated weights for policy 0, policy_version 510 (0.0012) [2024-08-22 18:34:44,074][01647] Fps is (10 sec: 3687.3, 60 sec: 3823.0, 300 sec: 3596.1). Total num frames: 2101248. Throughput: 0: 914.9. Samples: 523378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:34:44,081][01647] Avg episode reward: [(0, '4.881')] [2024-08-22 18:34:49,074][01647] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3596.2). Total num frames: 2121728. Throughput: 0: 970.5. Samples: 530174. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:34:49,082][01647] Avg episode reward: [(0, '4.706')] [2024-08-22 18:34:50,701][05178] Updated weights for policy 0, policy_version 520 (0.0012) [2024-08-22 18:34:54,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3610.0). Total num frames: 2138112. Throughput: 0: 947.0. Samples: 535088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:34:54,079][01647] Avg episode reward: [(0, '4.557')] [2024-08-22 18:34:59,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 2154496. Throughput: 0: 918.0. Samples: 537230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:34:59,079][01647] Avg episode reward: [(0, '4.539')] [2024-08-22 18:35:02,036][05178] Updated weights for policy 0, policy_version 530 (0.0021) [2024-08-22 18:35:04,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3891.5, 300 sec: 3610.0). Total num frames: 2179072. Throughput: 0: 952.0. Samples: 544134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:35:04,081][01647] Avg episode reward: [(0, '4.472')] [2024-08-22 18:35:09,075][01647] Fps is (10 sec: 4505.4, 60 sec: 3891.2, 300 sec: 3623.9). Total num frames: 2199552. Throughput: 0: 972.6. Samples: 550106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:35:09,077][01647] Avg episode reward: [(0, '4.893')] [2024-08-22 18:35:13,286][05178] Updated weights for policy 0, policy_version 540 (0.0025) [2024-08-22 18:35:14,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 2211840. Throughput: 0: 943.8. Samples: 552238. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-08-22 18:35:14,082][01647] Avg episode reward: [(0, '4.996')] [2024-08-22 18:35:14,092][05165] Saving new best policy, reward=4.996! [2024-08-22 18:35:19,074][01647] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 2236416. Throughput: 0: 938.3. Samples: 558148. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-22 18:35:19,077][01647] Avg episode reward: [(0, '5.013')] [2024-08-22 18:35:19,084][05165] Saving new best policy, reward=5.013! [2024-08-22 18:35:22,674][05178] Updated weights for policy 0, policy_version 550 (0.0013) [2024-08-22 18:35:24,077][01647] Fps is (10 sec: 4504.5, 60 sec: 3891.0, 300 sec: 3637.8). Total num frames: 2256896. Throughput: 0: 996.4. Samples: 564992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:35:24,080][01647] Avg episode reward: [(0, '4.917')] [2024-08-22 18:35:29,075][01647] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3637.8). Total num frames: 2269184. Throughput: 0: 974.5. Samples: 567230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-22 18:35:29,081][01647] Avg episode reward: [(0, '4.898')] [2024-08-22 18:35:34,074][01647] Fps is (10 sec: 3277.6, 60 sec: 3754.8, 300 sec: 3651.7). Total num frames: 2289664. Throughput: 0: 929.3. Samples: 571992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-22 18:35:34,078][01647] Avg episode reward: [(0, '5.036')] [2024-08-22 18:35:34,087][05165] Saving new best policy, reward=5.036! [2024-08-22 18:35:34,552][05178] Updated weights for policy 0, policy_version 560 (0.0025) [2024-08-22 18:35:39,074][01647] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3665.6). Total num frames: 2314240. Throughput: 0: 973.5. Samples: 578896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-22 18:35:39,076][01647] Avg episode reward: [(0, '5.323')] [2024-08-22 18:35:39,082][05165] Saving new best policy, reward=5.323! [2024-08-22 18:35:44,075][01647] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 2330624. Throughput: 0: 998.7. Samples: 582172. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-22 18:35:44,077][01647] Avg episode reward: [(0, '5.206')] [2024-08-22 18:35:44,476][05178] Updated weights for policy 0, policy_version 570 (0.0023) [2024-08-22 18:35:49,074][01647] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3665.6). Total num frames: 2347008. Throughput: 0: 941.8. Samples: 586516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-22 18:35:49,078][01647] Avg episode reward: [(0, '5.026')] [2024-08-22 18:35:54,075][01647] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3679.5). Total num frames: 2371584. Throughput: 0: 954.1. Samples: 593040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:35:54,079][01647] Avg episode reward: [(0, '5.362')] [2024-08-22 18:35:54,088][05165] Saving new best policy, reward=5.362! [2024-08-22 18:35:54,804][05178] Updated weights for policy 0, policy_version 580 (0.0017) [2024-08-22 18:35:59,074][01647] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3679.5). Total num frames: 2392064. Throughput: 0: 981.0. Samples: 596384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:35:59,079][01647] Avg episode reward: [(0, '5.511')] [2024-08-22 18:35:59,084][05165] Saving new best policy, reward=5.511! [2024-08-22 18:36:04,074][01647] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2404352. Throughput: 0: 956.6. Samples: 601196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:36:04,078][01647] Avg episode reward: [(0, '5.302')] [2024-08-22 18:36:07,137][05178] Updated weights for policy 0, policy_version 590 (0.0019) [2024-08-22 18:36:09,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2424832. Throughput: 0: 918.6. Samples: 606326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:36:09,080][01647] Avg episode reward: [(0, '5.435')] [2024-08-22 18:36:14,075][01647] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3679.5). 
Total num frames: 2445312. Throughput: 0: 944.8. Samples: 609746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-22 18:36:14,080][01647] Avg episode reward: [(0, '5.382')] [2024-08-22 18:36:14,091][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000597_2445312.pth... [2024-08-22 18:36:14,198][05165] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000380_1556480.pth [2024-08-22 18:36:16,421][05178] Updated weights for policy 0, policy_version 600 (0.0020) [2024-08-22 18:36:19,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2461696. Throughput: 0: 975.4. Samples: 615884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-22 18:36:19,079][01647] Avg episode reward: [(0, '5.364')] [2024-08-22 18:36:24,074][01647] Fps is (10 sec: 3276.9, 60 sec: 3686.5, 300 sec: 3679.5). Total num frames: 2478080. Throughput: 0: 912.3. Samples: 619948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:36:24,081][01647] Avg episode reward: [(0, '5.453')] [2024-08-22 18:36:28,464][05178] Updated weights for policy 0, policy_version 610 (0.0016) [2024-08-22 18:36:29,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 2498560. Throughput: 0: 911.2. Samples: 623174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-22 18:36:29,077][01647] Avg episode reward: [(0, '5.311')] [2024-08-22 18:36:34,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3679.5). Total num frames: 2519040. Throughput: 0: 958.0. Samples: 629628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:36:34,080][01647] Avg episode reward: [(0, '5.189')] [2024-08-22 18:36:39,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3679.5). Total num frames: 2531328. Throughput: 0: 907.5. Samples: 633876. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:36:39,086][01647] Avg episode reward: [(0, '5.156')] [2024-08-22 18:36:40,824][05178] Updated weights for policy 0, policy_version 620 (0.0013) [2024-08-22 18:36:44,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 2551808. Throughput: 0: 884.3. Samples: 636178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-22 18:36:44,080][01647] Avg episode reward: [(0, '4.971')] [2024-08-22 18:36:49,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 2572288. Throughput: 0: 923.0. Samples: 642730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:36:49,077][01647] Avg episode reward: [(0, '4.922')] [2024-08-22 18:36:50,314][05178] Updated weights for policy 0, policy_version 630 (0.0018) [2024-08-22 18:36:54,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2588672. Throughput: 0: 929.4. Samples: 648148. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-22 18:36:54,078][01647] Avg episode reward: [(0, '4.992')] [2024-08-22 18:36:59,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 2605056. Throughput: 0: 896.5. Samples: 650086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:36:59,077][01647] Avg episode reward: [(0, '5.072')] [2024-08-22 18:37:02,695][05178] Updated weights for policy 0, policy_version 640 (0.0012) [2024-08-22 18:37:04,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2625536. Throughput: 0: 888.5. Samples: 655868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:37:04,080][01647] Avg episode reward: [(0, '4.957')] [2024-08-22 18:37:09,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 2646016. Throughput: 0: 938.5. Samples: 662182. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:37:09,079][01647] Avg episode reward: [(0, '5.268')] [2024-08-22 18:37:14,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 2658304. Throughput: 0: 911.8. Samples: 664204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-22 18:37:14,081][01647] Avg episode reward: [(0, '5.340')] [2024-08-22 18:37:14,404][05178] Updated weights for policy 0, policy_version 650 (0.0020) [2024-08-22 18:37:19,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2678784. Throughput: 0: 876.0. Samples: 669048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-22 18:37:19,077][01647] Avg episode reward: [(0, '5.552')] [2024-08-22 18:37:19,080][05165] Saving new best policy, reward=5.552! [2024-08-22 18:37:24,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3693.4). Total num frames: 2699264. Throughput: 0: 922.0. Samples: 675366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-22 18:37:24,077][01647] Avg episode reward: [(0, '5.117')] [2024-08-22 18:37:24,440][05178] Updated weights for policy 0, policy_version 660 (0.0021) [2024-08-22 18:37:29,075][01647] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 2715648. Throughput: 0: 931.8. Samples: 678110. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-22 18:37:29,077][01647] Avg episode reward: [(0, '5.302')] [2024-08-22 18:37:34,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3693.3). Total num frames: 2727936. Throughput: 0: 873.0. Samples: 682014. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-22 18:37:34,077][01647] Avg episode reward: [(0, '5.030')] [2024-08-22 18:37:37,198][05178] Updated weights for policy 0, policy_version 670 (0.0016) [2024-08-22 18:37:39,074][01647] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3693.4). Total num frames: 2748416. Throughput: 0: 890.3. Samples: 688212. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:37:39,080][01647] Avg episode reward: [(0, '5.043')]
[2024-08-22 18:37:44,080][01647] Fps is (10 sec: 4093.8, 60 sec: 3617.8, 300 sec: 3693.3). Total num frames: 2768896. Throughput: 0: 919.1. Samples: 691450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:37:44,083][01647] Avg episode reward: [(0, '5.372')]
[2024-08-22 18:37:49,076][01647] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3693.3). Total num frames: 2781184. Throughput: 0: 890.6. Samples: 695948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:37:49,078][01647] Avg episode reward: [(0, '5.024')]
[2024-08-22 18:37:49,182][05178] Updated weights for policy 0, policy_version 680 (0.0032)
[2024-08-22 18:37:54,074][01647] Fps is (10 sec: 3278.6, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 2801664. Throughput: 0: 872.3. Samples: 701434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:37:54,077][01647] Avg episode reward: [(0, '5.348')]
[2024-08-22 18:37:58,576][05178] Updated weights for policy 0, policy_version 690 (0.0021)
[2024-08-22 18:37:59,074][01647] Fps is (10 sec: 4506.3, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 2826240. Throughput: 0: 902.7. Samples: 704824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:37:59,077][01647] Avg episode reward: [(0, '5.062')]
[2024-08-22 18:38:04,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 2842624. Throughput: 0: 924.0. Samples: 710628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:38:04,077][01647] Avg episode reward: [(0, '4.602')]
[2024-08-22 18:38:09,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2859008. Throughput: 0: 880.8. Samples: 715004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:38:09,077][01647] Avg episode reward: [(0, '4.753')]
[2024-08-22 18:38:10,686][05178] Updated weights for policy 0, policy_version 700 (0.0022)
[2024-08-22 18:38:14,075][01647] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2879488. Throughput: 0: 893.8. Samples: 718330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:38:14,077][01647] Avg episode reward: [(0, '4.862')]
[2024-08-22 18:38:14,085][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000703_2879488.pth...
[2024-08-22 18:38:14,221][05165] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000486_1990656.pth
[2024-08-22 18:38:19,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2899968. Throughput: 0: 955.0. Samples: 724990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:38:19,082][01647] Avg episode reward: [(0, '4.638')]
[2024-08-22 18:38:21,474][05178] Updated weights for policy 0, policy_version 710 (0.0013)
[2024-08-22 18:38:24,075][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.8, 300 sec: 3721.1). Total num frames: 2912256. Throughput: 0: 910.1. Samples: 729166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:38:24,077][01647] Avg episode reward: [(0, '4.499')]
[2024-08-22 18:38:29,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3721.1). Total num frames: 2932736. Throughput: 0: 898.0. Samples: 731856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:38:29,079][01647] Avg episode reward: [(0, '4.544')]
[2024-08-22 18:38:32,081][05178] Updated weights for policy 0, policy_version 720 (0.0031)
[2024-08-22 18:38:34,075][01647] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 2957312. Throughput: 0: 948.3. Samples: 738622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:38:34,077][01647] Avg episode reward: [(0, '4.712')]
[2024-08-22 18:38:39,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 2969600. Throughput: 0: 937.4. Samples: 743618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:38:39,083][01647] Avg episode reward: [(0, '4.892')]
[2024-08-22 18:38:43,927][05178] Updated weights for policy 0, policy_version 730 (0.0011)
[2024-08-22 18:38:44,074][01647] Fps is (10 sec: 3276.9, 60 sec: 3686.7, 300 sec: 3721.1). Total num frames: 2990080. Throughput: 0: 911.2. Samples: 745830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:38:44,077][01647] Avg episode reward: [(0, '5.441')]
[2024-08-22 18:38:49,075][01647] Fps is (10 sec: 4095.9, 60 sec: 3823.0, 300 sec: 3721.1). Total num frames: 3010560. Throughput: 0: 924.7. Samples: 752238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:38:49,082][01647] Avg episode reward: [(0, '5.231')]
[2024-08-22 18:38:53,639][05178] Updated weights for policy 0, policy_version 740 (0.0015)
[2024-08-22 18:38:54,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3031040. Throughput: 0: 961.4. Samples: 758268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:38:54,077][01647] Avg episode reward: [(0, '5.060')]
[2024-08-22 18:38:59,074][01647] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3721.2). Total num frames: 3043328. Throughput: 0: 932.0. Samples: 760270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:38:59,077][01647] Avg episode reward: [(0, '5.208')]
[2024-08-22 18:39:04,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3063808. Throughput: 0: 902.6. Samples: 765608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:39:04,082][01647] Avg episode reward: [(0, '4.913')]
[2024-08-22 18:39:05,631][05178] Updated weights for policy 0, policy_version 750 (0.0012)
[2024-08-22 18:39:09,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3084288. Throughput: 0: 955.1. Samples: 772144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:39:09,077][01647] Avg episode reward: [(0, '4.914')]
[2024-08-22 18:39:14,075][01647] Fps is (10 sec: 3686.1, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3100672. Throughput: 0: 949.5. Samples: 774584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:39:14,078][01647] Avg episode reward: [(0, '5.162')]
[2024-08-22 18:39:17,646][05178] Updated weights for policy 0, policy_version 760 (0.0012)
[2024-08-22 18:39:19,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 3117056. Throughput: 0: 894.0. Samples: 778852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:39:19,082][01647] Avg episode reward: [(0, '5.135')]
[2024-08-22 18:39:24,074][01647] Fps is (10 sec: 3686.7, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3137536. Throughput: 0: 929.6. Samples: 785448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:39:24,084][01647] Avg episode reward: [(0, '5.318')]
[2024-08-22 18:39:26,889][05178] Updated weights for policy 0, policy_version 770 (0.0020)
[2024-08-22 18:39:29,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 3158016. Throughput: 0: 953.3. Samples: 788728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:39:29,082][01647] Avg episode reward: [(0, '5.333')]
[2024-08-22 18:39:34,077][01647] Fps is (10 sec: 3276.0, 60 sec: 3549.7, 300 sec: 3707.2). Total num frames: 3170304. Throughput: 0: 903.6. Samples: 792902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:39:34,082][01647] Avg episode reward: [(0, '5.294')]
[2024-08-22 18:39:39,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 3190784. Throughput: 0: 898.6. Samples: 798706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:39:39,077][01647] Avg episode reward: [(0, '5.207')]
[2024-08-22 18:39:39,253][05178] Updated weights for policy 0, policy_version 780 (0.0020)
[2024-08-22 18:39:44,074][01647] Fps is (10 sec: 4506.7, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3215360. Throughput: 0: 927.7. Samples: 802016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:39:44,077][01647] Avg episode reward: [(0, '5.279')]
[2024-08-22 18:39:49,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3693.3). Total num frames: 3227648. Throughput: 0: 924.6. Samples: 807216. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:39:49,081][01647] Avg episode reward: [(0, '5.195')]
[2024-08-22 18:39:50,930][05178] Updated weights for policy 0, policy_version 790 (0.0018)
[2024-08-22 18:39:54,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 3248128. Throughput: 0: 889.8. Samples: 812184. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:39:54,080][01647] Avg episode reward: [(0, '5.004')]
[2024-08-22 18:39:59,075][01647] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3693.3). Total num frames: 3268608. Throughput: 0: 908.3. Samples: 815458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:39:59,080][01647] Avg episode reward: [(0, '4.840')]
[2024-08-22 18:40:00,500][05178] Updated weights for policy 0, policy_version 800 (0.0015)
[2024-08-22 18:40:04,076][01647] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3679.4). Total num frames: 3284992. Throughput: 0: 950.9. Samples: 821644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:40:04,079][01647] Avg episode reward: [(0, '4.725')]
[2024-08-22 18:40:09,074][01647] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 3301376. Throughput: 0: 894.6. Samples: 825706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:40:09,080][01647] Avg episode reward: [(0, '4.691')]
[2024-08-22 18:40:12,786][05178] Updated weights for policy 0, policy_version 810 (0.0012)
[2024-08-22 18:40:14,074][01647] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3321856. Throughput: 0: 890.3. Samples: 828790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:40:14,082][01647] Avg episode reward: [(0, '4.846')]
[2024-08-22 18:40:14,100][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000811_3321856.pth...
[2024-08-22 18:40:14,220][05165] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000597_2445312.pth
[2024-08-22 18:40:19,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3679.5). Total num frames: 3342336. Throughput: 0: 936.4. Samples: 835038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:40:19,078][01647] Avg episode reward: [(0, '5.016')]
[2024-08-22 18:40:24,076][01647] Fps is (10 sec: 3276.3, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 3354624. Throughput: 0: 904.7. Samples: 839418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:40:24,083][01647] Avg episode reward: [(0, '4.810')]
[2024-08-22 18:40:24,805][05178] Updated weights for policy 0, policy_version 820 (0.0029)
[2024-08-22 18:40:29,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3665.6). Total num frames: 3371008. Throughput: 0: 877.1. Samples: 841486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:40:29,082][01647] Avg episode reward: [(0, '4.948')]
[2024-08-22 18:40:34,074][01647] Fps is (10 sec: 3687.0, 60 sec: 3686.5, 300 sec: 3651.7). Total num frames: 3391488. Throughput: 0: 900.1. Samples: 847720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:40:34,082][01647] Avg episode reward: [(0, '5.196')]
[2024-08-22 18:40:35,205][05178] Updated weights for policy 0, policy_version 830 (0.0031)
[2024-08-22 18:40:39,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3407872. Throughput: 0: 907.4. Samples: 853018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:40:39,082][01647] Avg episode reward: [(0, '5.304')]
[2024-08-22 18:40:44,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 3424256. Throughput: 0: 876.5. Samples: 854902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:40:44,078][01647] Avg episode reward: [(0, '5.489')]
[2024-08-22 18:40:47,895][05178] Updated weights for policy 0, policy_version 840 (0.0017)
[2024-08-22 18:40:49,075][01647] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3444736. Throughput: 0: 861.7. Samples: 860420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:40:49,077][01647] Avg episode reward: [(0, '5.606')]
[2024-08-22 18:40:49,081][05165] Saving new best policy, reward=5.606!
[2024-08-22 18:40:54,077][01647] Fps is (10 sec: 4095.1, 60 sec: 3618.0, 300 sec: 3637.8). Total num frames: 3465216. Throughput: 0: 912.8. Samples: 866784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:40:54,079][01647] Avg episode reward: [(0, '5.530')]
[2024-08-22 18:40:59,074][01647] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3637.8). Total num frames: 3477504. Throughput: 0: 888.0. Samples: 868752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:40:59,081][01647] Avg episode reward: [(0, '5.442')]
[2024-08-22 18:41:00,059][05178] Updated weights for policy 0, policy_version 850 (0.0023)
[2024-08-22 18:41:04,074][01647] Fps is (10 sec: 2867.8, 60 sec: 3481.7, 300 sec: 3623.9). Total num frames: 3493888. Throughput: 0: 851.3. Samples: 873346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:41:04,077][01647] Avg episode reward: [(0, '5.286')]
[2024-08-22 18:41:09,075][01647] Fps is (10 sec: 4095.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3518464. Throughput: 0: 894.1. Samples: 879650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:41:09,082][01647] Avg episode reward: [(0, '5.118')]
[2024-08-22 18:41:09,892][05178] Updated weights for policy 0, policy_version 860 (0.0025)
[2024-08-22 18:41:14,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3623.9). Total num frames: 3530752. Throughput: 0: 916.1. Samples: 882712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:41:14,081][01647] Avg episode reward: [(0, '5.432')]
[2024-08-22 18:41:19,074][01647] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 3623.9). Total num frames: 3547136. Throughput: 0: 864.9. Samples: 886642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:41:19,077][01647] Avg episode reward: [(0, '5.502')]
[2024-08-22 18:41:22,556][05178] Updated weights for policy 0, policy_version 870 (0.0031)
[2024-08-22 18:41:24,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3623.9). Total num frames: 3567616. Throughput: 0: 879.3. Samples: 892588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:41:24,082][01647] Avg episode reward: [(0, '5.510')]
[2024-08-22 18:41:29,076][01647] Fps is (10 sec: 4095.2, 60 sec: 3618.0, 300 sec: 3623.9). Total num frames: 3588096. Throughput: 0: 908.4. Samples: 895782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-22 18:41:29,083][01647] Avg episode reward: [(0, '5.914')]
[2024-08-22 18:41:29,085][05165] Saving new best policy, reward=5.914!
[2024-08-22 18:41:34,078][01647] Fps is (10 sec: 3275.5, 60 sec: 3481.4, 300 sec: 3623.9). Total num frames: 3600384. Throughput: 0: 886.8. Samples: 900328. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:41:34,086][01647] Avg episode reward: [(0, '6.127')]
[2024-08-22 18:41:34,100][05165] Saving new best policy, reward=6.127!
[2024-08-22 18:41:34,910][05178] Updated weights for policy 0, policy_version 880 (0.0018)
[2024-08-22 18:41:39,074][01647] Fps is (10 sec: 2867.8, 60 sec: 3481.6, 300 sec: 3610.0). Total num frames: 3616768. Throughput: 0: 852.4. Samples: 905140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:41:39,078][01647] Avg episode reward: [(0, '6.429')]
[2024-08-22 18:41:39,102][05165] Saving new best policy, reward=6.429!
[2024-08-22 18:41:44,074][01647] Fps is (10 sec: 4097.7, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3641344. Throughput: 0: 877.5. Samples: 908240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:41:44,077][01647] Avg episode reward: [(0, '6.241')]
[2024-08-22 18:41:45,104][05178] Updated weights for policy 0, policy_version 890 (0.0016)
[2024-08-22 18:41:49,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3657728. Throughput: 0: 906.3. Samples: 914128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:41:49,077][01647] Avg episode reward: [(0, '5.984')]
[2024-08-22 18:41:54,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3413.5, 300 sec: 3610.0). Total num frames: 3670016. Throughput: 0: 852.5. Samples: 918010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:41:54,078][01647] Avg episode reward: [(0, '6.238')]
[2024-08-22 18:41:57,588][05178] Updated weights for policy 0, policy_version 900 (0.0019)
[2024-08-22 18:41:59,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3690496. Throughput: 0: 854.4. Samples: 921158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:41:59,077][01647] Avg episode reward: [(0, '6.492')]
[2024-08-22 18:41:59,083][05165] Saving new best policy, reward=6.492!
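Entries like `Saving new best policy, reward=6.429!` above suggest the learner tracks the best average episode reward seen so far and writes a `best_*.pth` checkpoint whenever it improves. A minimal sketch of that pattern (the tracker class is illustrative, not Sample Factory's actual implementation):

```python
class BestPolicyTracker:
    """Remember the best average episode reward observed so far."""

    def __init__(self):
        self.best_reward = None  # no reward seen yet

    def update(self, avg_reward):
        """Return True when avg_reward sets a new best (i.e. a save is due)."""
        if self.best_reward is None or avg_reward > self.best_reward:
            self.best_reward = avg_reward
            return True
        return False


# Feeding in the averages from the log above reproduces which ones
# trigger a "Saving new best policy" message (5.606, 5.914, 6.127, ...).
tracker = BestPolicyTracker()
for reward in [5.606, 5.530, 5.914, 6.127, 6.429, 6.241, 6.492]:
    if tracker.update(reward):
        print(f"Saving new best policy, reward={reward:.3f}!")
```

Note that intermediate dips (5.530, 6.241) do not trigger a save; only strict improvements over the running best do.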
[2024-08-22 18:42:04,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3710976. Throughput: 0: 907.5. Samples: 927480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-22 18:42:04,077][01647] Avg episode reward: [(0, '6.839')]
[2024-08-22 18:42:04,087][05165] Saving new best policy, reward=6.839!
[2024-08-22 18:42:09,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3610.0). Total num frames: 3723264. Throughput: 0: 871.2. Samples: 931792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:42:09,080][01647] Avg episode reward: [(0, '6.869')]
[2024-08-22 18:42:09,082][05165] Saving new best policy, reward=6.869!
[2024-08-22 18:42:09,916][05178] Updated weights for policy 0, policy_version 910 (0.0012)
[2024-08-22 18:42:14,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3743744. Throughput: 0: 844.4. Samples: 933780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:42:14,077][01647] Avg episode reward: [(0, '7.024')]
[2024-08-22 18:42:14,088][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000914_3743744.pth...
[2024-08-22 18:42:14,225][05165] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000703_2879488.pth
[2024-08-22 18:42:14,244][05165] Saving new best policy, reward=7.024!
[2024-08-22 18:42:19,074][01647] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3764224. Throughput: 0: 883.0. Samples: 940060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-22 18:42:19,076][01647] Avg episode reward: [(0, '6.823')]
[2024-08-22 18:42:19,980][05178] Updated weights for policy 0, policy_version 920 (0.0026)
[2024-08-22 18:42:24,084][01647] Fps is (10 sec: 3682.9, 60 sec: 3549.3, 300 sec: 3609.9). Total num frames: 3780608. Throughput: 0: 898.7. Samples: 945592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:42:24,088][01647] Avg episode reward: [(0, '6.805')]
[2024-08-22 18:42:29,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3610.0). Total num frames: 3792896. Throughput: 0: 874.4. Samples: 947586. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:42:29,082][01647] Avg episode reward: [(0, '6.887')]
[2024-08-22 18:42:32,476][05178] Updated weights for policy 0, policy_version 930 (0.0020)
[2024-08-22 18:42:34,074][01647] Fps is (10 sec: 3280.0, 60 sec: 3550.1, 300 sec: 3610.0). Total num frames: 3813376. Throughput: 0: 865.6. Samples: 953078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:42:34,086][01647] Avg episode reward: [(0, '6.753')]
[2024-08-22 18:42:39,075][01647] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 3833856. Throughput: 0: 920.4. Samples: 959428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:42:39,081][01647] Avg episode reward: [(0, '7.434')]
[2024-08-22 18:42:39,092][05165] Saving new best policy, reward=7.434!
[2024-08-22 18:42:44,076][01647] Fps is (10 sec: 3276.1, 60 sec: 3413.2, 300 sec: 3610.0). Total num frames: 3846144. Throughput: 0: 895.1. Samples: 961438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:42:44,079][01647] Avg episode reward: [(0, '7.929')]
[2024-08-22 18:42:44,138][05165] Saving new best policy, reward=7.929!
[2024-08-22 18:42:44,166][05178] Updated weights for policy 0, policy_version 940 (0.0025)
[2024-08-22 18:42:49,075][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3610.0). Total num frames: 3866624. Throughput: 0: 853.5. Samples: 965888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-22 18:42:49,076][01647] Avg episode reward: [(0, '8.738')]
[2024-08-22 18:42:49,079][05165] Saving new best policy, reward=8.738!
[2024-08-22 18:42:54,076][01647] Fps is (10 sec: 4096.0, 60 sec: 3618.0, 300 sec: 3596.1). Total num frames: 3887104. Throughput: 0: 898.4. Samples: 972224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-22 18:42:54,080][01647] Avg episode reward: [(0, '8.339')]
[2024-08-22 18:42:54,732][05178] Updated weights for policy 0, policy_version 950 (0.0014)
[2024-08-22 18:42:59,074][01647] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3903488. Throughput: 0: 923.2. Samples: 975324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:42:59,077][01647] Avg episode reward: [(0, '7.125')]
[2024-08-22 18:43:04,076][01647] Fps is (10 sec: 2867.4, 60 sec: 3413.2, 300 sec: 3582.2). Total num frames: 3915776. Throughput: 0: 871.7. Samples: 979290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:43:04,081][01647] Avg episode reward: [(0, '6.748')]
[2024-08-22 18:43:07,015][05178] Updated weights for policy 0, policy_version 960 (0.0029)
[2024-08-22 18:43:09,074][01647] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3940352. Throughput: 0: 881.6. Samples: 985254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-22 18:43:09,076][01647] Avg episode reward: [(0, '6.971')]
[2024-08-22 18:43:14,074][01647] Fps is (10 sec: 4506.3, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3960832. Throughput: 0: 910.1. Samples: 988540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:43:14,079][01647] Avg episode reward: [(0, '7.847')]
[2024-08-22 18:43:18,560][05178] Updated weights for policy 0, policy_version 970 (0.0031)
[2024-08-22 18:43:19,074][01647] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3596.2). Total num frames: 3973120. Throughput: 0: 893.5. Samples: 993286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-22 18:43:19,081][01647] Avg episode reward: [(0, '8.501')]
[2024-08-22 18:43:24,074][01647] Fps is (10 sec: 2867.2, 60 sec: 3482.2, 300 sec: 3582.3). Total num frames: 3989504. Throughput: 0: 868.3. Samples: 998500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-22 18:43:24,081][01647] Avg episode reward: [(0, '8.883')]
[2024-08-22 18:43:24,137][05165] Saving new best policy, reward=8.883!
[2024-08-22 18:43:27,146][05165] Stopping Batcher_0...
[2024-08-22 18:43:27,148][05165] Loop batcher_evt_loop terminating...
[2024-08-22 18:43:27,149][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-22 18:43:27,151][01647] Component Batcher_0 stopped!
[2024-08-22 18:43:27,196][05178] Weights refcount: 2 0
[2024-08-22 18:43:27,200][01647] Component InferenceWorker_p0-w0 stopped!
[2024-08-22 18:43:27,202][05178] Stopping InferenceWorker_p0-w0...
[2024-08-22 18:43:27,202][05178] Loop inference_proc0-0_evt_loop terminating...
[2024-08-22 18:43:27,275][05165] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000811_3321856.pth
[2024-08-22 18:43:27,294][05165] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-22 18:43:27,477][05165] Stopping LearnerWorker_p0...
[2024-08-22 18:43:27,477][01647] Component LearnerWorker_p0 stopped!
[2024-08-22 18:43:27,478][05165] Loop learner_proc0_evt_loop terminating...
[2024-08-22 18:43:27,546][01647] Component RolloutWorker_w0 stopped!
[2024-08-22 18:43:27,550][05182] Stopping RolloutWorker_w0...
[2024-08-22 18:43:27,550][05182] Loop rollout_proc0_evt_loop terminating...
[2024-08-22 18:43:27,564][01647] Component RolloutWorker_w2 stopped!
[2024-08-22 18:43:27,569][05181] Stopping RolloutWorker_w2...
[2024-08-22 18:43:27,572][01647] Component RolloutWorker_w6 stopped!
[2024-08-22 18:43:27,577][05185] Stopping RolloutWorker_w6...
[2024-08-22 18:43:27,570][05181] Loop rollout_proc2_evt_loop terminating...
[2024-08-22 18:43:27,581][05185] Loop rollout_proc6_evt_loop terminating...
[2024-08-22 18:43:27,586][01647] Component RolloutWorker_w4 stopped!
[2024-08-22 18:43:27,591][05183] Stopping RolloutWorker_w4...
[2024-08-22 18:43:27,592][05183] Loop rollout_proc4_evt_loop terminating...
[2024-08-22 18:43:27,722][01647] Component RolloutWorker_w7 stopped!
[2024-08-22 18:43:27,727][05186] Stopping RolloutWorker_w7...
[2024-08-22 18:43:27,734][05186] Loop rollout_proc7_evt_loop terminating...
[2024-08-22 18:43:27,787][01647] Component RolloutWorker_w3 stopped!
[2024-08-22 18:43:27,791][05180] Stopping RolloutWorker_w3...
[2024-08-22 18:43:27,792][05180] Loop rollout_proc3_evt_loop terminating...
[2024-08-22 18:43:27,827][01647] Component RolloutWorker_w1 stopped!
[2024-08-22 18:43:27,831][05179] Stopping RolloutWorker_w1...
[2024-08-22 18:43:27,832][05179] Loop rollout_proc1_evt_loop terminating...
[2024-08-22 18:43:27,906][01647] Component RolloutWorker_w5 stopped!
[2024-08-22 18:43:27,910][01647] Waiting for process learner_proc0 to stop...
[2024-08-22 18:43:27,913][05184] Stopping RolloutWorker_w5...
[2024-08-22 18:43:27,918][05184] Loop rollout_proc5_evt_loop terminating...
[2024-08-22 18:43:29,146][01647] Waiting for process inference_proc0-0 to join...
[2024-08-22 18:43:29,240][01647] Waiting for process rollout_proc0 to join...
[2024-08-22 18:43:30,299][01647] Waiting for process rollout_proc1 to join...
[2024-08-22 18:43:30,470][01647] Waiting for process rollout_proc2 to join...
[2024-08-22 18:43:30,472][01647] Waiting for process rollout_proc3 to join...
[2024-08-22 18:43:30,475][01647] Waiting for process rollout_proc4 to join...
[2024-08-22 18:43:30,477][01647] Waiting for process rollout_proc5 to join...
[2024-08-22 18:43:30,478][01647] Waiting for process rollout_proc6 to join...
[2024-08-22 18:43:30,481][01647] Waiting for process rollout_proc7 to join...
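Throughout the run, each `Saving .../checkpoint_...pth` entry is paired with a `Removing ...` of the oldest checkpoint, i.e. only the newest few regular checkpoints stay on disk. A sketch of that rotation, under the assumption that two checkpoints are retained (the helper function and `keep=2` policy are inferred from the log, not taken from Sample Factory's source):

```python
def rotate_checkpoints(existing, new_ckpt, keep=2):
    """Append new_ckpt to an oldest-first list of checkpoint names and
    return (kept, removed), retaining only the `keep` most recent."""
    all_ckpts = existing + [new_ckpt]
    return all_ckpts[-keep:], all_ckpts[:-keep]


# Mirrors the 18:42:14 entries above: saving checkpoint 914 evicts 703,
# leaving the two newest checkpoints (811 and 914) on disk.
kept, removed = rotate_checkpoints(
    ["checkpoint_000000703_2879488.pth", "checkpoint_000000811_3321856.pth"],
    "checkpoint_000000914_3743744.pth",
)
print(removed)  # ['checkpoint_000000703_2879488.pth']
```

Best-policy checkpoints (`Saving new best policy, ...`) are exempt from this rotation; only the regular numbered checkpoints are evicted.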
[2024-08-22 18:43:30,482][01647] Batcher 0 profile tree view:
batching: 26.9482, releasing_batches: 0.0271
[2024-08-22 18:43:30,483][01647] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 493.0477
update_model: 7.9107
  weight_update: 0.0033
one_step: 0.0027
  handle_policy_step: 583.2341
    deserialize: 15.7673, stack: 3.1331, obs_to_device_normalize: 120.1707, forward: 296.5118, send_messages: 28.9602
    prepare_outputs: 89.4345
      to_cpu: 55.4786
[2024-08-22 18:43:30,485][01647] Learner 0 profile tree view:
misc: 0.0048, prepare_batch: 16.7949
train: 75.6992
  epoch_init: 0.0127, minibatch_init: 0.0090, losses_postprocess: 0.5681, kl_divergence: 0.5589, after_optimizer: 33.8783
  calculate_losses: 25.7407
    losses_init: 0.0035, forward_head: 1.8710, bptt_initial: 16.4781, tail: 1.2862, advantages_returns: 0.3216, losses: 3.0594
    bptt: 2.3683
      bptt_forward_core: 2.2809
  update: 14.3124
    clip: 1.4411
[2024-08-22 18:43:30,486][01647] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3366, enqueue_policy_requests: 131.9332, env_step: 868.0941, overhead: 15.8653, complete_rollouts: 7.3828
save_policy_outputs: 27.4208
  split_output_tensors: 9.4219
[2024-08-22 18:43:30,488][01647] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3813, enqueue_policy_requests: 131.4835, env_step: 860.9753, overhead: 15.7720, complete_rollouts: 7.5139
save_policy_outputs: 27.7238
  split_output_tensors: 9.8253
[2024-08-22 18:43:30,489][01647] Loop Runner_EvtLoop terminating...
[2024-08-22 18:43:30,491][01647] Runner profile tree view:
main_loop: 1152.3128
[2024-08-22 18:43:30,492][01647] Collected {0: 4005888}, FPS: 3476.4
[2024-08-22 18:43:30,741][01647] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-08-22 18:43:30,743][01647] Overriding arg 'num_workers' with value 1 passed from command line
[2024-08-22 18:43:30,745][01647] Adding new argument 'no_render'=True that is not in the saved config file!
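The summary line above reports `Collected {0: 4005888}, FPS: 3476.4` next to `main_loop: 1152.3128`. As a quick sanity check (a sketch of the arithmetic, not Sample Factory's internal accounting), the aggregate FPS is just total collected env frames divided by the runner's `main_loop` wall time:

```python
def aggregate_fps(total_frames, wall_time_s):
    """Overall throughput in environment frames per second."""
    return total_frames / wall_time_s


# Values taken from the log above: 4005888 frames over 1152.3128 s.
print(round(aggregate_fps(4005888, 1152.3128), 1))  # 3476.4
```

The 10/60/300-second FPS figures in the periodic `Fps is (...)` lines are the same ratio computed over sliding windows, which is why they fluctuate around this aggregate value.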
[2024-08-22 18:43:30,746][01647] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-22 18:43:30,747][01647] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-22 18:43:30,749][01647] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-22 18:43:30,750][01647] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-08-22 18:43:30,752][01647] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-22 18:43:30,753][01647] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-08-22 18:43:30,755][01647] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-08-22 18:43:30,756][01647] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-22 18:43:30,758][01647] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-22 18:43:30,759][01647] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-22 18:43:30,761][01647] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-22 18:43:30,762][01647] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-22 18:43:30,778][01647] Doom resolution: 160x120, resize resolution: (128, 72) [2024-08-22 18:43:30,781][01647] RunningMeanStd input shape: (3, 72, 128) [2024-08-22 18:43:30,783][01647] RunningMeanStd input shape: (1,) [2024-08-22 18:43:30,798][01647] ConvEncoder: input_channels=3 [2024-08-22 18:43:30,947][01647] Conv encoder output size: 512 [2024-08-22 18:43:30,951][01647] Policy head output size: 512 [2024-08-22 18:43:33,121][01647] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-22 18:43:34,057][01647] Num frames 100... 
[2024-08-22 18:43:34,178][01647] Num frames 200... [2024-08-22 18:43:34,310][01647] Num frames 300... [2024-08-22 18:43:34,435][01647] Num frames 400... [2024-08-22 18:43:34,561][01647] Num frames 500... [2024-08-22 18:43:34,688][01647] Num frames 600... [2024-08-22 18:43:34,791][01647] Avg episode rewards: #0: 10.400, true rewards: #0: 6.400 [2024-08-22 18:43:34,793][01647] Avg episode reward: 10.400, avg true_objective: 6.400 [2024-08-22 18:43:34,864][01647] Num frames 700... [2024-08-22 18:43:34,980][01647] Num frames 800... [2024-08-22 18:43:35,110][01647] Num frames 900... [2024-08-22 18:43:35,226][01647] Num frames 1000... [2024-08-22 18:43:35,314][01647] Avg episode rewards: #0: 7.120, true rewards: #0: 5.120 [2024-08-22 18:43:35,315][01647] Avg episode reward: 7.120, avg true_objective: 5.120 [2024-08-22 18:43:35,409][01647] Num frames 1100... [2024-08-22 18:43:35,526][01647] Num frames 1200... [2024-08-22 18:43:35,644][01647] Num frames 1300... [2024-08-22 18:43:35,765][01647] Num frames 1400... [2024-08-22 18:43:35,884][01647] Num frames 1500... [2024-08-22 18:43:36,004][01647] Num frames 1600... [2024-08-22 18:43:36,181][01647] Avg episode rewards: #0: 9.320, true rewards: #0: 5.653 [2024-08-22 18:43:36,182][01647] Avg episode reward: 9.320, avg true_objective: 5.653 [2024-08-22 18:43:36,192][01647] Num frames 1700... [2024-08-22 18:43:36,321][01647] Num frames 1800... [2024-08-22 18:43:36,446][01647] Num frames 1900... [2024-08-22 18:43:36,571][01647] Num frames 2000... [2024-08-22 18:43:36,692][01647] Num frames 2100... [2024-08-22 18:43:36,802][01647] Avg episode rewards: #0: 8.360, true rewards: #0: 5.360 [2024-08-22 18:43:36,804][01647] Avg episode reward: 8.360, avg true_objective: 5.360 [2024-08-22 18:43:36,872][01647] Num frames 2200... [2024-08-22 18:43:36,991][01647] Num frames 2300... [2024-08-22 18:43:37,127][01647] Num frames 2400... [2024-08-22 18:43:37,250][01647] Num frames 2500... 
[2024-08-22 18:43:37,424][01647] Avg episode rewards: #0: 7.784, true rewards: #0: 5.184 [2024-08-22 18:43:37,426][01647] Avg episode reward: 7.784, avg true_objective: 5.184 [2024-08-22 18:43:37,441][01647] Num frames 2600... [2024-08-22 18:43:37,555][01647] Num frames 2700... [2024-08-22 18:43:37,675][01647] Num frames 2800... [2024-08-22 18:43:37,796][01647] Num frames 2900... [2024-08-22 18:43:37,915][01647] Num frames 3000... [2024-08-22 18:43:38,042][01647] Num frames 3100... [2024-08-22 18:43:38,167][01647] Num frames 3200... [2024-08-22 18:43:38,335][01647] Avg episode rewards: #0: 8.327, true rewards: #0: 5.493 [2024-08-22 18:43:38,337][01647] Avg episode reward: 8.327, avg true_objective: 5.493 [2024-08-22 18:43:38,350][01647] Num frames 3300... [2024-08-22 18:43:38,466][01647] Num frames 3400... [2024-08-22 18:43:38,583][01647] Num frames 3500... [2024-08-22 18:43:38,732][01647] Num frames 3600... [2024-08-22 18:43:38,854][01647] Num frames 3700... [2024-08-22 18:43:38,961][01647] Avg episode rewards: #0: 7.920, true rewards: #0: 5.349 [2024-08-22 18:43:38,963][01647] Avg episode reward: 7.920, avg true_objective: 5.349 [2024-08-22 18:43:39,039][01647] Num frames 3800... [2024-08-22 18:43:39,164][01647] Num frames 3900... [2024-08-22 18:43:39,280][01647] Num frames 4000... [2024-08-22 18:43:39,457][01647] Avg episode rewards: #0: 7.495, true rewards: #0: 5.120 [2024-08-22 18:43:39,459][01647] Avg episode reward: 7.495, avg true_objective: 5.120 [2024-08-22 18:43:39,469][01647] Num frames 4100... [2024-08-22 18:43:39,587][01647] Num frames 4200... [2024-08-22 18:43:39,709][01647] Num frames 4300... [2024-08-22 18:43:39,841][01647] Num frames 4400... [2024-08-22 18:43:39,994][01647] Avg episode rewards: #0: 7.311, true rewards: #0: 4.978 [2024-08-22 18:43:39,995][01647] Avg episode reward: 7.311, avg true_objective: 4.978 [2024-08-22 18:43:40,032][01647] Num frames 4500... [2024-08-22 18:43:40,157][01647] Num frames 4600... 
[2024-08-22 18:43:40,281][01647] Num frames 4700... [2024-08-22 18:43:40,410][01647] Num frames 4800... [2024-08-22 18:43:40,532][01647] Num frames 4900... [2024-08-22 18:43:40,655][01647] Num frames 5000... [2024-08-22 18:43:40,777][01647] Num frames 5100... [2024-08-22 18:43:40,858][01647] Avg episode rewards: #0: 7.620, true rewards: #0: 5.120 [2024-08-22 18:43:40,859][01647] Avg episode reward: 7.620, avg true_objective: 5.120 [2024-08-22 18:44:12,158][01647] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-08-22 19:00:31,943][01647] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-22 19:00:31,944][01647] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-22 19:00:31,947][01647] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-22 19:00:31,949][01647] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-22 19:00:31,950][01647] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-22 19:00:31,952][01647] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-22 19:00:31,953][01647] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-08-22 19:00:31,955][01647] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-22 19:00:31,956][01647] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-08-22 19:00:31,957][01647] Adding new argument 'hf_repository'='electricwapiti/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-08-22 19:00:31,958][01647] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-22 19:00:31,959][01647] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
[2024-08-22 19:00:31,960][01647] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-08-22 19:00:31,961][01647] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-08-22 19:00:31,962][01647] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-08-22 19:00:31,971][01647] RunningMeanStd input shape: (3, 72, 128)
[2024-08-22 19:00:31,979][01647] RunningMeanStd input shape: (1,)
[2024-08-22 19:00:31,992][01647] ConvEncoder: input_channels=3
[2024-08-22 19:00:32,034][01647] Conv encoder output size: 512
[2024-08-22 19:00:32,037][01647] Policy head output size: 512
[2024-08-22 19:00:32,058][01647] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-08-22 19:00:32,528][01647] Num frames 100...
[2024-08-22 19:00:32,646][01647] Num frames 200...
[2024-08-22 19:00:32,766][01647] Num frames 300...
[2024-08-22 19:00:32,884][01647] Num frames 400...
[2024-08-22 19:00:33,048][01647] Avg episode rewards: #0: 6.890, true rewards: #0: 4.890
[2024-08-22 19:00:33,050][01647] Avg episode reward: 6.890, avg true_objective: 4.890
[2024-08-22 19:00:33,066][01647] Num frames 500...
[2024-08-22 19:00:33,184][01647] Num frames 600...
[2024-08-22 19:00:33,311][01647] Num frames 700...
[2024-08-22 19:00:33,429][01647] Num frames 800...
[2024-08-22 19:00:33,546][01647] Num frames 900...
[2024-08-22 19:00:33,682][01647] Avg episode rewards: #0: 6.845, true rewards: #0: 4.845
[2024-08-22 19:00:33,683][01647] Avg episode reward: 6.845, avg true_objective: 4.845
[2024-08-22 19:00:33,722][01647] Num frames 1000...
[2024-08-22 19:00:33,835][01647] Num frames 1100...
[2024-08-22 19:00:33,958][01647] Num frames 1200...
[2024-08-22 19:00:34,084][01647] Num frames 1300...
[2024-08-22 19:00:34,203][01647] Avg episode rewards: #0: 5.843, true rewards: #0: 4.510
[2024-08-22 19:00:34,204][01647] Avg episode reward: 5.843, avg true_objective: 4.510
[2024-08-22 19:00:34,260][01647] Num frames 1400...
[2024-08-22 19:00:34,390][01647] Num frames 1500...
[2024-08-22 19:00:34,507][01647] Num frames 1600...
[2024-08-22 19:00:34,633][01647] Num frames 1700...
[2024-08-22 19:00:34,756][01647] Num frames 1800...
[2024-08-22 19:00:34,882][01647] Num frames 1900...
[2024-08-22 19:00:35,006][01647] Num frames 2000...
[2024-08-22 19:00:35,130][01647] Num frames 2100...
[2024-08-22 19:00:35,250][01647] Num frames 2200...
[2024-08-22 19:00:35,363][01647] Avg episode rewards: #0: 8.623, true rewards: #0: 5.622
[2024-08-22 19:00:35,364][01647] Avg episode reward: 8.623, avg true_objective: 5.622
[2024-08-22 19:00:35,428][01647] Num frames 2300...
[2024-08-22 19:00:35,546][01647] Num frames 2400...
[2024-08-22 19:00:35,666][01647] Num frames 2500...
[2024-08-22 19:00:35,783][01647] Num frames 2600...
[2024-08-22 19:00:35,913][01647] Num frames 2700...
[2024-08-22 19:00:36,085][01647] Avg episode rewards: #0: 8.386, true rewards: #0: 5.586
[2024-08-22 19:00:36,088][01647] Avg episode reward: 8.386, avg true_objective: 5.586
[2024-08-22 19:00:36,099][01647] Num frames 2800...
[2024-08-22 19:00:36,217][01647] Num frames 2900...
[2024-08-22 19:00:36,333][01647] Num frames 3000...
[2024-08-22 19:00:36,460][01647] Num frames 3100...
[2024-08-22 19:00:36,584][01647] Num frames 3200...
[2024-08-22 19:00:36,691][01647] Avg episode rewards: #0: 7.902, true rewards: #0: 5.402
[2024-08-22 19:00:36,692][01647] Avg episode reward: 7.902, avg true_objective: 5.402
[2024-08-22 19:00:36,768][01647] Num frames 3300...
[2024-08-22 19:00:36,882][01647] Num frames 3400...
[2024-08-22 19:00:37,001][01647] Num frames 3500...
[2024-08-22 19:00:37,126][01647] Num frames 3600...
[2024-08-22 19:00:37,239][01647] Num frames 3700...
[2024-08-22 19:00:37,357][01647] Num frames 3800...
[2024-08-22 19:00:37,479][01647] Avg episode rewards: #0: 8.070, true rewards: #0: 5.499
[2024-08-22 19:00:37,483][01647] Avg episode reward: 8.070, avg true_objective: 5.499
[2024-08-22 19:00:37,542][01647] Num frames 3900...
[2024-08-22 19:00:37,655][01647] Num frames 4000...
[2024-08-22 19:00:37,770][01647] Num frames 4100...
[2024-08-22 19:00:37,890][01647] Num frames 4200...
[2024-08-22 19:00:38,004][01647] Avg episode rewards: #0: 7.938, true rewards: #0: 5.312
[2024-08-22 19:00:38,006][01647] Avg episode reward: 7.938, avg true_objective: 5.312
[2024-08-22 19:00:38,073][01647] Num frames 4300...
[2024-08-22 19:00:38,191][01647] Num frames 4400...
[2024-08-22 19:00:38,309][01647] Num frames 4500...
[2024-08-22 19:00:38,438][01647] Num frames 4600...
[2024-08-22 19:00:38,555][01647] Num frames 4700...
[2024-08-22 19:00:38,671][01647] Num frames 4800...
[2024-08-22 19:00:38,788][01647] Num frames 4900...
[2024-08-22 19:00:38,908][01647] Num frames 5000...
[2024-08-22 19:00:39,031][01647] Num frames 5100...
[2024-08-22 19:00:39,176][01647] Avg episode rewards: #0: 8.865, true rewards: #0: 5.753
[2024-08-22 19:00:39,178][01647] Avg episode reward: 8.865, avg true_objective: 5.753
[2024-08-22 19:00:39,206][01647] Num frames 5200...
[2024-08-22 19:00:39,323][01647] Num frames 5300...
[2024-08-22 19:00:39,445][01647] Num frames 5400...
[2024-08-22 19:00:39,565][01647] Num frames 5500...
[2024-08-22 19:00:39,700][01647] Avg episode rewards: #0: 8.362, true rewards: #0: 5.562
[2024-08-22 19:00:39,701][01647] Avg episode reward: 8.362, avg true_objective: 5.562
[2024-08-22 19:01:12,481][01647] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
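Note that the "Avg episode rewards" values printed after each episode are cumulative running means over all episodes evaluated so far, which is why they drift (6.890, 6.845, ...) rather than showing per-episode scores. A minimal sketch of that bookkeeping (the helper name `running_mean` is illustrative, not part of Sample Factory):

```python
def running_mean(rewards):
    """Return the cumulative mean after each episode, mirroring how the
    'Avg episode rewards' line in the log evolves as episodes complete."""
    total = 0.0
    averages = []
    for n, reward in enumerate(rewards, start=1):
        total += reward          # running sum of episode rewards
        averages.append(total / n)  # mean over episodes seen so far
    return averages
```

For example, per-episode rewards of 2.0, 4.0, 9.0 would print averages 2.0, 3.0, 5.0; the individual episode scores can be recovered from consecutive averages if needed.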