[2024-08-27 14:48:26,099][00237] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-08-27 14:48:26,102][00237] Rollout worker 0 uses device cpu
[2024-08-27 14:48:26,108][00237] Rollout worker 1 uses device cpu
[2024-08-27 14:48:26,110][00237] Rollout worker 2 uses device cpu
[2024-08-27 14:48:26,115][00237] Rollout worker 3 uses device cpu
[2024-08-27 14:48:26,117][00237] Rollout worker 4 uses device cpu
[2024-08-27 14:48:26,122][00237] Rollout worker 5 uses device cpu
[2024-08-27 14:48:26,125][00237] Rollout worker 6 uses device cpu
[2024-08-27 14:48:26,129][00237] Rollout worker 7 uses device cpu
[2024-08-27 14:48:26,440][00237] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-27 14:48:26,445][00237] InferenceWorker_p0-w0: min num requests: 2
[2024-08-27 14:48:26,508][00237] Starting all processes...
[2024-08-27 14:48:26,515][00237] Starting process learner_proc0
[2024-08-27 14:48:28,184][00237] Starting all processes...
[2024-08-27 14:48:28,209][00237] Starting process inference_proc0-0
[2024-08-27 14:48:28,212][00237] Starting process rollout_proc0
[2024-08-27 14:48:28,212][00237] Starting process rollout_proc1
[2024-08-27 14:48:28,212][00237] Starting process rollout_proc2
[2024-08-27 14:48:28,212][00237] Starting process rollout_proc3
[2024-08-27 14:48:28,212][00237] Starting process rollout_proc4
[2024-08-27 14:48:28,212][00237] Starting process rollout_proc5
[2024-08-27 14:48:28,212][00237] Starting process rollout_proc6
[2024-08-27 14:48:28,212][00237] Starting process rollout_proc7
[2024-08-27 14:48:48,857][03019] Worker 4 uses CPU cores [0]
[2024-08-27 14:48:48,893][03020] Worker 3 uses CPU cores [1]
[2024-08-27 14:48:48,989][03017] Worker 2 uses CPU cores [0]
[2024-08-27 14:48:48,982][03015] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-27 14:48:48,996][03015] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-08-27 14:48:49,033][03018] Worker 1 uses CPU cores [1]
[2024-08-27 14:48:49,064][00237] Heartbeat connected on RolloutWorker_w4
[2024-08-27 14:48:49,083][00237] Heartbeat connected on RolloutWorker_w3
[2024-08-27 14:48:49,095][03015] Num visible devices: 1
[2024-08-27 14:48:49,121][00237] Heartbeat connected on InferenceWorker_p0-w0
[2024-08-27 14:48:49,125][03021] Worker 6 uses CPU cores [0]
[2024-08-27 14:48:49,141][02998] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-27 14:48:49,146][02998] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-08-27 14:48:49,183][00237] Heartbeat connected on RolloutWorker_w1
[2024-08-27 14:48:49,188][00237] Heartbeat connected on RolloutWorker_w2
[2024-08-27 14:48:49,202][03022] Worker 5 uses CPU cores [1]
[2024-08-27 14:48:49,201][02998] Num visible devices: 1
[2024-08-27 14:48:49,221][02998] Starting seed is not provided
[2024-08-27 14:48:49,222][02998] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-27 14:48:49,222][02998] Initializing actor-critic model on device cuda:0
[2024-08-27 14:48:49,223][02998] RunningMeanStd input shape: (3, 72, 128)
[2024-08-27 14:48:49,225][00237] Heartbeat connected on Batcher_0
[2024-08-27 14:48:49,230][02998] RunningMeanStd input shape: (1,)
[2024-08-27 14:48:49,250][03016] Worker 0 uses CPU cores [0]
[2024-08-27 14:48:49,266][02998] ConvEncoder: input_channels=3
[2024-08-27 14:48:49,268][00237] Heartbeat connected on RolloutWorker_w6
[2024-08-27 14:48:49,276][00237] Heartbeat connected on RolloutWorker_w0
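The startup above shows the standard Sample Factory topology: one learner process, one GPU inference worker, and eight CPU rollout workers. A run with this shape could be relaunched roughly as follows; the environment name and step budget are assumptions, since the log does not show them.

```python
# Hypothetical relaunch of a run with the topology logged above
# (Sample Factory 2.x CLI; the env name is an assumption, not shown in the log).
import subprocess

subprocess.run(
    [
        "python", "-m", "sf_examples.vizdoom.train_vizdoom",
        "--env=doom_health_gathering_supreme",  # assumed environment
        "--num_workers=8",                      # the eight rollout workers above
        "--train_dir=/content/train_dir",       # matches the config.json path in the log
        "--experiment=default_experiment",      # matches the experiment directory
        "--device=gpu",                         # learner and inference on cuda:0
    ],
    check=True,
)
```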
[2024-08-27 14:48:49,294][00237] Heartbeat connected on RolloutWorker_w5
[2024-08-27 14:48:49,400][03023] Worker 7 uses CPU cores [1]
[2024-08-27 14:48:49,501][00237] Heartbeat connected on RolloutWorker_w7
[2024-08-27 14:48:49,676][02998] Conv encoder output size: 512
[2024-08-27 14:48:49,676][02998] Policy head output size: 512
[2024-08-27 14:48:49,744][02998] Created Actor Critic model with architecture:
[2024-08-27 14:48:49,744][02998] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-08-27 14:48:50,278][02998] Using optimizer
[2024-08-27 14:48:50,969][02998] No checkpoints found
[2024-08-27 14:48:50,969][02998] Did not load from checkpoint, starting from scratch!
[2024-08-27 14:48:50,970][02998] Initialized policy 0 weights for model version 0
[2024-08-27 14:48:50,976][02998] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-27 14:48:50,984][02998] LearnerWorker_p0 finished initialization!
[2024-08-27 14:48:50,985][00237] Heartbeat connected on LearnerWorker_p0
[2024-08-27 14:48:51,068][03015] RunningMeanStd input shape: (3, 72, 128)
[2024-08-27 14:48:51,070][03015] RunningMeanStd input shape: (1,)
[2024-08-27 14:48:51,082][03015] ConvEncoder: input_channels=3
[2024-08-27 14:48:51,186][03015] Conv encoder output size: 512
[2024-08-27 14:48:51,186][03015] Policy head output size: 512
[2024-08-27 14:48:51,239][00237] Inference worker 0-0 is ready!
[2024-08-27 14:48:51,242][00237] All inference workers are ready! Signal rollout workers to start!
[2024-08-27 14:48:51,440][03021] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-27 14:48:51,437][03022] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-27 14:48:51,444][03017] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-27 14:48:51,446][03018] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-27 14:48:51,442][03019] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-27 14:48:51,444][03020] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-27 14:48:51,449][03023] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-27 14:48:51,445][03016] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-27 14:48:52,448][00237] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
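The printed module tree translates into a compact PyTorch model: a three-conv ELU encoder feeding a 512-unit MLP, a GRU(512, 512) core, a scalar value head, and a 5-way action head. A minimal sketch is below; the conv kernel sizes, strides, and channel counts are assumptions (the TorchScript repr hides them), chosen only to be consistent with the logged 3x72x128 input and 512-dim encoder output.

```python
# A minimal PyTorch sketch approximating the printed ActorCriticSharedWeights
# architecture. Layer widths come from the log; conv kernels/strides/channels
# are assumptions, since the RecursiveScriptModule repr does not show them.
import torch
import torch.nn as nn


class SketchActorCritic(nn.Module):
    def __init__(self, num_actions: int = 5):
        super().__init__()
        # conv_head: 3x72x128 observations -> feature map (ELU after each conv).
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # mlp_layers: flatten -> 512 (matches 'Conv encoder output size: 512').
        self.mlp = nn.Sequential(nn.LazyLinear(512), nn.ELU())
        self.core = nn.GRU(512, 512)                      # ModelCoreRNN in the log
        self.critic_linear = nn.Linear(512, 1)            # value head
        self.action_logits = nn.Linear(512, num_actions)  # distribution_linear

    def forward(self, obs: torch.Tensor, rnn_state: torch.Tensor):
        x = self.mlp(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # seq length 1
        x = x.squeeze(0)
        return self.action_logits(x), self.critic_linear(x), rnn_state
```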
[2024-08-27 14:48:52,460][03016] Decorrelating experience for 0 frames...
[2024-08-27 14:48:52,459][03017] Decorrelating experience for 0 frames...
[2024-08-27 14:48:53,171][03018] Decorrelating experience for 0 frames...
[2024-08-27 14:48:53,169][03022] Decorrelating experience for 0 frames...
[2024-08-27 14:48:53,174][03023] Decorrelating experience for 0 frames...
[2024-08-27 14:48:53,168][03020] Decorrelating experience for 0 frames...
[2024-08-27 14:48:53,200][03019] Decorrelating experience for 0 frames...
[2024-08-27 14:48:53,269][03017] Decorrelating experience for 32 frames...
[2024-08-27 14:48:54,209][03023] Decorrelating experience for 32 frames...
[2024-08-27 14:48:54,211][03022] Decorrelating experience for 32 frames...
[2024-08-27 14:48:54,862][03016] Decorrelating experience for 32 frames...
[2024-08-27 14:48:54,867][03019] Decorrelating experience for 32 frames...
[2024-08-27 14:48:54,934][03021] Decorrelating experience for 0 frames...
[2024-08-27 14:48:55,795][03018] Decorrelating experience for 32 frames...
[2024-08-27 14:48:56,164][03022] Decorrelating experience for 64 frames...
[2024-08-27 14:48:56,281][03020] Decorrelating experience for 32 frames...
[2024-08-27 14:48:57,046][03017] Decorrelating experience for 64 frames...
[2024-08-27 14:48:57,104][03021] Decorrelating experience for 32 frames...
[2024-08-27 14:48:57,448][00237] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-27 14:48:58,324][03019] Decorrelating experience for 64 frames...
[2024-08-27 14:48:58,433][03018] Decorrelating experience for 64 frames...
[2024-08-27 14:48:58,490][03016] Decorrelating experience for 64 frames...
[2024-08-27 14:48:58,612][03022] Decorrelating experience for 96 frames...
[2024-08-27 14:48:59,414][03023] Decorrelating experience for 64 frames...
[2024-08-27 14:48:59,429][03020] Decorrelating experience for 64 frames...
[2024-08-27 14:48:59,556][03017] Decorrelating experience for 96 frames...
[2024-08-27 14:49:00,039][03021] Decorrelating experience for 64 frames...
[2024-08-27 14:49:00,580][03018] Decorrelating experience for 96 frames...
[2024-08-27 14:49:00,749][03016] Decorrelating experience for 96 frames...
[2024-08-27 14:49:01,112][03020] Decorrelating experience for 96 frames...
[2024-08-27 14:49:01,116][03023] Decorrelating experience for 96 frames...
[2024-08-27 14:49:01,462][03019] Decorrelating experience for 96 frames...
[2024-08-27 14:49:01,666][03021] Decorrelating experience for 96 frames...
[2024-08-27 14:49:02,448][00237] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 24. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-27 14:49:02,450][00237] Avg episode reward: [(0, '0.800')]
[2024-08-27 14:49:04,108][02998] Signal inference workers to stop experience collection...
[2024-08-27 14:49:04,134][03015] InferenceWorker_p0-w0: stopping experience collection
[2024-08-27 14:49:07,451][00237] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 171.7. Samples: 2576. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-27 14:49:07,461][00237] Avg episode reward: [(0, '1.938')]
[2024-08-27 14:49:07,472][02998] Signal inference workers to resume experience collection...
[2024-08-27 14:49:07,473][03015] InferenceWorker_p0-w0: resuming experience collection
[2024-08-27 14:49:12,450][00237] Fps is (10 sec: 2457.1, 60 sec: 1228.7, 300 sec: 1228.7). Total num frames: 24576. Throughput: 0: 336.8. Samples: 6736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:49:12,455][00237] Avg episode reward: [(0, '3.497')]
[2024-08-27 14:49:17,448][00237] Fps is (10 sec: 3687.4, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 36864. Throughput: 0: 351.0. Samples: 8774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:49:17,455][00237] Avg episode reward: [(0, '3.844')]
[2024-08-27 14:49:17,881][03015] Updated weights for policy 0, policy_version 10 (0.0159)
[2024-08-27 14:49:22,448][00237] Fps is (10 sec: 3277.5, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 57344. Throughput: 0: 467.4. Samples: 14022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:49:22,452][00237] Avg episode reward: [(0, '4.581')]
[2024-08-27 14:49:27,111][03015] Updated weights for policy 0, policy_version 20 (0.0030)
[2024-08-27 14:49:27,450][00237] Fps is (10 sec: 4504.5, 60 sec: 2340.4, 300 sec: 2340.4). Total num frames: 81920. Throughput: 0: 606.2. Samples: 21220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:49:27,453][00237] Avg episode reward: [(0, '4.478')]
[2024-08-27 14:49:32,448][00237] Fps is (10 sec: 3686.3, 60 sec: 2355.2, 300 sec: 2355.2). Total num frames: 94208. Throughput: 0: 583.8. Samples: 23352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:49:32,452][00237] Avg episode reward: [(0, '4.438')]
[2024-08-27 14:49:37,448][00237] Fps is (10 sec: 3277.6, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 114688. Throughput: 0: 635.5. Samples: 28598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:49:37,451][00237] Avg episode reward: [(0, '4.371')]
[2024-08-27 14:49:37,458][02998] Saving new best policy, reward=4.371!
[2024-08-27 14:49:38,577][03015] Updated weights for policy 0, policy_version 30 (0.0052)
[2024-08-27 14:49:42,448][00237] Fps is (10 sec: 4505.7, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 789.6. Samples: 35530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-27 14:49:42,450][00237] Avg episode reward: [(0, '4.416')]
[2024-08-27 14:49:42,455][02998] Saving new best policy, reward=4.416!
[2024-08-27 14:49:47,448][00237] Fps is (10 sec: 4095.8, 60 sec: 2829.9, 300 sec: 2829.9). Total num frames: 155648. Throughput: 0: 852.2. Samples: 38374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:49:47,454][00237] Avg episode reward: [(0, '4.542')]
[2024-08-27 14:49:47,466][02998] Saving new best policy, reward=4.542!
[2024-08-27 14:49:49,670][03015] Updated weights for policy 0, policy_version 40 (0.0039)
[2024-08-27 14:49:52,448][00237] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 172032. Throughput: 0: 885.4. Samples: 42416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:49:52,450][00237] Avg episode reward: [(0, '4.534')]
[2024-08-27 14:49:57,448][00237] Fps is (10 sec: 3686.6, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 943.6. Samples: 49198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:49:57,453][00237] Avg episode reward: [(0, '4.406')]
[2024-08-27 14:49:59,415][03015] Updated weights for policy 0, policy_version 50 (0.0028)
[2024-08-27 14:50:02,449][00237] Fps is (10 sec: 4095.4, 60 sec: 3549.8, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 977.5. Samples: 52764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
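Each status line reports frame throughput averaged over trailing 10-, 60-, and 300-second windows, printing nan until a window has enough data (hence the nan values at the very start of the log). An illustrative reconstruction of that readout, not the verbatim Sample Factory code:

```python
# Sliding-window FPS readout in the style of the
# 'Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)' lines above.
import time
from collections import deque


class FpsTracker:
    def __init__(self, max_window_sec: float = 300.0):
        self.samples = deque()  # (timestamp, total_frames) pairs
        self.max_window_sec = max_window_sec

    def record(self, total_frames: int) -> None:
        now = time.time()
        self.samples.append((now, total_frames))
        # Drop samples older than the largest reporting window.
        while self.samples and now - self.samples[0][0] > self.max_window_sec:
            self.samples.popleft()

    def fps(self, window_sec: float) -> float:
        now = time.time()
        inside = [(t, f) for t, f in self.samples if now - t <= window_sec]
        if len(inside) < 2:
            return float("nan")  # matches the nan readouts early in the log
        (t0, f0), (t1, f1) = inside[0], inside[-1]
        return (f1 - f0) / (t1 - t0) if t1 > t0 else float("nan")
```

Called every five seconds with the current total frame count, this produces readouts of the same shape as the log's periodic Fps lines.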
[2024-08-27 14:50:02,454][00237] Avg episode reward: [(0, '4.340')]
[2024-08-27 14:50:07,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3058.3). Total num frames: 229376. Throughput: 0: 964.7. Samples: 57434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:50:07,454][00237] Avg episode reward: [(0, '4.297')]
[2024-08-27 14:50:10,952][03015] Updated weights for policy 0, policy_version 60 (0.0028)
[2024-08-27 14:50:12,448][00237] Fps is (10 sec: 3686.9, 60 sec: 3754.8, 300 sec: 3123.2). Total num frames: 249856. Throughput: 0: 931.2. Samples: 63122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:50:12,449][00237] Avg episode reward: [(0, '4.482')]
[2024-08-27 14:50:17,448][00237] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3180.4). Total num frames: 270336. Throughput: 0: 942.5. Samples: 65764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:50:17,450][00237] Avg episode reward: [(0, '4.486')]
[2024-08-27 14:50:17,463][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000066_270336.pth...
[2024-08-27 14:50:22,450][00237] Fps is (10 sec: 3276.2, 60 sec: 3754.5, 300 sec: 3140.2). Total num frames: 282624. Throughput: 0: 948.7. Samples: 71292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:50:22,454][00237] Avg episode reward: [(0, '4.380')]
[2024-08-27 14:50:22,820][03015] Updated weights for policy 0, policy_version 70 (0.0024)
[2024-08-27 14:50:27,448][00237] Fps is (10 sec: 3276.9, 60 sec: 3686.6, 300 sec: 3190.6). Total num frames: 303104. Throughput: 0: 909.9. Samples: 76474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:50:27,450][00237] Avg episode reward: [(0, '4.562')]
[2024-08-27 14:50:27,462][02998] Saving new best policy, reward=4.562!
[2024-08-27 14:50:32,448][00237] Fps is (10 sec: 4096.8, 60 sec: 3822.9, 300 sec: 3235.8). Total num frames: 323584. Throughput: 0: 922.9. Samples: 79906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:50:32,451][00237] Avg episode reward: [(0, '4.492')]
[2024-08-27 14:50:32,466][03015] Updated weights for policy 0, policy_version 80 (0.0019)
[2024-08-27 14:50:37,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 344064. Throughput: 0: 982.9. Samples: 86648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:50:37,457][00237] Avg episode reward: [(0, '4.441')]
[2024-08-27 14:50:42,451][00237] Fps is (10 sec: 3685.3, 60 sec: 3686.2, 300 sec: 3276.7). Total num frames: 360448. Throughput: 0: 926.0. Samples: 90870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:50:42,456][00237] Avg episode reward: [(0, '4.457')]
[2024-08-27 14:50:44,129][03015] Updated weights for policy 0, policy_version 90 (0.0021)
[2024-08-27 14:50:47,448][00237] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3312.4). Total num frames: 380928. Throughput: 0: 920.1. Samples: 94168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:50:47,454][00237] Avg episode reward: [(0, '4.622')]
[2024-08-27 14:50:47,534][02998] Saving new best policy, reward=4.622!
[2024-08-27 14:50:52,448][00237] Fps is (10 sec: 4506.9, 60 sec: 3891.2, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 969.7. Samples: 101072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:50:52,454][00237] Avg episode reward: [(0, '4.745')]
[2024-08-27 14:50:52,460][02998] Saving new best policy, reward=4.745!
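The "Policy #0 lag" triple in the status lines measures how many versions behind the learner's current weights the sampled trajectories are; the -1.0 values early in the log appear before any trajectories have arrived. A sketch of the statistic under that reading (an interpretation of the log, not the verbatim Sample Factory code):

```python
# Illustrative computation of the 'Policy #0 lag: (min/avg/max)' statistic:
# the gap between the learner's current policy_version and the version that
# produced each trajectory in the current batch.
def policy_lag(current_version: int, trajectory_versions: list) -> dict:
    lags = [current_version - v for v in trajectory_versions]
    return {
        "min": float(min(lags)),
        "avg": sum(lags) / len(lags),
        "max": float(max(lags)),
    }

# e.g. policy_lag(60, [60, 59, 58]) -> {'min': 0.0, 'avg': 1.0, 'max': 2.0}
```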
[2024-08-27 14:50:53,104][03015] Updated weights for policy 0, policy_version 100 (0.0014)
[2024-08-27 14:50:57,448][00237] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3375.1). Total num frames: 421888. Throughput: 0: 949.7. Samples: 105858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:50:57,450][00237] Avg episode reward: [(0, '4.738')]
[2024-08-27 14:51:02,450][00237] Fps is (10 sec: 3276.1, 60 sec: 3754.6, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 938.9. Samples: 108018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:51:02,452][00237] Avg episode reward: [(0, '4.850')]
[2024-08-27 14:51:02,455][02998] Saving new best policy, reward=4.850!
[2024-08-27 14:51:04,779][03015] Updated weights for policy 0, policy_version 110 (0.0034)
[2024-08-27 14:51:07,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3428.5). Total num frames: 462848. Throughput: 0: 972.5. Samples: 115054. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:51:07,454][00237] Avg episode reward: [(0, '4.733')]
[2024-08-27 14:51:12,449][00237] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3423.1). Total num frames: 479232. Throughput: 0: 982.9. Samples: 120704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:51:12,451][00237] Avg episode reward: [(0, '4.887')]
[2024-08-27 14:51:12,457][02998] Saving new best policy, reward=4.887!
[2024-08-27 14:51:16,383][03015] Updated weights for policy 0, policy_version 120 (0.0026)
[2024-08-27 14:51:17,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3418.0). Total num frames: 495616. Throughput: 0: 951.5. Samples: 122722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-08-27 14:51:17,454][00237] Avg episode reward: [(0, '4.791')]
[2024-08-27 14:51:22,448][00237] Fps is (10 sec: 3277.2, 60 sec: 3823.1, 300 sec: 3413.3). Total num frames: 512000. Throughput: 0: 921.4. Samples: 128110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:51:22,450][00237] Avg episode reward: [(0, '4.669')]
[2024-08-27 14:51:27,448][00237] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3408.9). Total num frames: 528384. Throughput: 0: 937.2. Samples: 133042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:51:27,453][00237] Avg episode reward: [(0, '4.666')]
[2024-08-27 14:51:28,359][03015] Updated weights for policy 0, policy_version 130 (0.0032)
[2024-08-27 14:51:32,448][00237] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3379.2). Total num frames: 540672. Throughput: 0: 910.1. Samples: 135120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:51:32,450][00237] Avg episode reward: [(0, '4.698')]
[2024-08-27 14:51:37,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3400.9). Total num frames: 561152. Throughput: 0: 879.8. Samples: 140664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-08-27 14:51:37,451][00237] Avg episode reward: [(0, '4.461')]
[2024-08-27 14:51:39,309][03015] Updated weights for policy 0, policy_version 140 (0.0041)
[2024-08-27 14:51:42,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3754.8, 300 sec: 3445.5). Total num frames: 585728. Throughput: 0: 927.1. Samples: 147578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:51:42,450][00237] Avg episode reward: [(0, '4.328')]
[2024-08-27 14:51:47,448][00237] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3440.6). Total num frames: 602112. Throughput: 0: 940.4. Samples: 150332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:51:47,450][00237] Avg episode reward: [(0, '4.478')]
[2024-08-27 14:51:50,889][03015] Updated weights for policy 0, policy_version 150 (0.0026)
[2024-08-27 14:51:52,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3436.1). Total num frames: 618496. Throughput: 0: 879.0. Samples: 154608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-27 14:51:52,453][00237] Avg episode reward: [(0, '4.537')]
[2024-08-27 14:51:57,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3476.1). Total num frames: 643072. Throughput: 0: 907.0. Samples: 161520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:51:57,451][00237] Avg episode reward: [(0, '4.484')]
[2024-08-27 14:51:59,677][03015] Updated weights for policy 0, policy_version 160 (0.0021)
[2024-08-27 14:52:02,448][00237] Fps is (10 sec: 4505.4, 60 sec: 3754.8, 300 sec: 3492.4). Total num frames: 663552. Throughput: 0: 940.2. Samples: 165030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:52:02,451][00237] Avg episode reward: [(0, '4.556')]
[2024-08-27 14:52:07,451][00237] Fps is (10 sec: 3275.9, 60 sec: 3549.7, 300 sec: 3465.8). Total num frames: 675840. Throughput: 0: 919.5. Samples: 169490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:52:07,455][00237] Avg episode reward: [(0, '4.635')]
[2024-08-27 14:52:11,168][03015] Updated weights for policy 0, policy_version 170 (0.0028)
[2024-08-27 14:52:12,448][00237] Fps is (10 sec: 3686.5, 60 sec: 3686.5, 300 sec: 3502.1). Total num frames: 700416. Throughput: 0: 949.7. Samples: 175776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:52:12,450][00237] Avg episode reward: [(0, '4.826')]
[2024-08-27 14:52:17,448][00237] Fps is (10 sec: 4916.6, 60 sec: 3822.9, 300 sec: 3536.5). Total num frames: 724992. Throughput: 0: 982.8. Samples: 179348. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-27 14:52:17,451][00237] Avg episode reward: [(0, '5.037')]
[2024-08-27 14:52:17,461][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth...
[2024-08-27 14:52:17,628][02998] Saving new best policy, reward=5.037!
[2024-08-27 14:52:21,503][03015] Updated weights for policy 0, policy_version 180 (0.0034)
[2024-08-27 14:52:22,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3510.9). Total num frames: 737280. Throughput: 0: 977.0. Samples: 184630. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:52:22,453][00237] Avg episode reward: [(0, '5.181')]
[2024-08-27 14:52:22,459][02998] Saving new best policy, reward=5.181!
[2024-08-27 14:52:27,453][00237] Fps is (10 sec: 3275.2, 60 sec: 3822.6, 300 sec: 3524.4). Total num frames: 757760. Throughput: 0: 939.2. Samples: 189848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:52:27,459][00237] Avg episode reward: [(0, '5.014')]
[2024-08-27 14:52:31,649][03015] Updated weights for policy 0, policy_version 190 (0.0039)
[2024-08-27 14:52:32,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3537.5). Total num frames: 778240. Throughput: 0: 956.9. Samples: 193392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:52:32,452][00237] Avg episode reward: [(0, '5.238')]
[2024-08-27 14:52:32,530][02998] Saving new best policy, reward=5.238!
[2024-08-27 14:52:37,449][00237] Fps is (10 sec: 4097.4, 60 sec: 3959.4, 300 sec: 3549.8). Total num frames: 798720. Throughput: 0: 1004.6. Samples: 199816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-27 14:52:37,453][00237] Avg episode reward: [(0, '5.405')]
[2024-08-27 14:52:37,466][02998] Saving new best policy, reward=5.405!
[2024-08-27 14:52:42,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3526.1). Total num frames: 811008. Throughput: 0: 938.7. Samples: 203760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:52:42,454][00237] Avg episode reward: [(0, '5.418')]
[2024-08-27 14:52:42,459][02998] Saving new best policy, reward=5.418!
[2024-08-27 14:52:43,927][03015] Updated weights for policy 0, policy_version 200 (0.0032)
[2024-08-27 14:52:47,448][00237] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3555.7). Total num frames: 835584. Throughput: 0: 930.4. Samples: 206896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:52:47,455][00237] Avg episode reward: [(0, '5.463')]
[2024-08-27 14:52:47,465][02998] Saving new best policy, reward=5.463!
[2024-08-27 14:52:52,448][00237] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3566.9). Total num frames: 856064. Throughput: 0: 978.6. Samples: 213526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:52:52,451][00237] Avg episode reward: [(0, '5.432')]
[2024-08-27 14:52:53,952][03015] Updated weights for policy 0, policy_version 210 (0.0029)
[2024-08-27 14:52:57,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3544.3). Total num frames: 868352. Throughput: 0: 934.3. Samples: 217820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:52:57,450][00237] Avg episode reward: [(0, '4.969')]
[2024-08-27 14:53:02,448][00237] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3538.9). Total num frames: 884736. Throughput: 0: 902.9. Samples: 219980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:53:02,450][00237] Avg episode reward: [(0, '5.275')]
[2024-08-27 14:53:05,491][03015] Updated weights for policy 0, policy_version 220 (0.0036)
[2024-08-27 14:53:07,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3565.9). Total num frames: 909312. Throughput: 0: 935.5. Samples: 226728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:53:07,452][00237] Avg episode reward: [(0, '5.481')]
[2024-08-27 14:53:07,463][02998] Saving new best policy, reward=5.481!
[2024-08-27 14:53:12,448][00237] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3560.4). Total num frames: 925696. Throughput: 0: 942.1. Samples: 232240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:53:12,453][00237] Avg episode reward: [(0, '5.723')]
[2024-08-27 14:53:12,460][02998] Saving new best policy, reward=5.723!
[2024-08-27 14:53:17,448][00237] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3539.6). Total num frames: 937984. Throughput: 0: 905.6. Samples: 234142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:53:17,453][00237] Avg episode reward: [(0, '5.456')]
[2024-08-27 14:53:17,829][03015] Updated weights for policy 0, policy_version 230 (0.0035)
[2024-08-27 14:53:22,448][00237] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3549.9). Total num frames: 958464. Throughput: 0: 889.8. Samples: 239854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:53:22,451][00237] Avg episode reward: [(0, '5.347')]
[2024-08-27 14:53:27,359][03015] Updated weights for policy 0, policy_version 240 (0.0019)
[2024-08-27 14:53:27,448][00237] Fps is (10 sec: 4505.5, 60 sec: 3755.0, 300 sec: 3574.7). Total num frames: 983040. Throughput: 0: 943.1. Samples: 246200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:53:27,451][00237] Avg episode reward: [(0, '5.424')]
[2024-08-27 14:53:32,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.7). Total num frames: 995328. Throughput: 0: 917.9. Samples: 248202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:53:32,454][00237] Avg episode reward: [(0, '5.689')]
[2024-08-27 14:53:37,448][00237] Fps is (10 sec: 2867.3, 60 sec: 3550.0, 300 sec: 3549.9). Total num frames: 1011712. Throughput: 0: 872.2. Samples: 252774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:53:37,450][00237] Avg episode reward: [(0, '6.038')]
[2024-08-27 14:53:37,463][02998] Saving new best policy, reward=6.038!
[2024-08-27 14:53:39,896][03015] Updated weights for policy 0, policy_version 250 (0.0022)
[2024-08-27 14:53:42,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3559.3). Total num frames: 1032192. Throughput: 0: 921.8. Samples: 259302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:53:42,453][00237] Avg episode reward: [(0, '6.215')]
[2024-08-27 14:53:42,457][02998] Saving new best policy, reward=6.215!
[2024-08-27 14:53:47,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 938.6. Samples: 262216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-27 14:53:47,450][00237] Avg episode reward: [(0, '5.839')]
[2024-08-27 14:53:51,988][03015] Updated weights for policy 0, policy_version 260 (0.0029)
[2024-08-27 14:53:52,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3610.0). Total num frames: 1064960. Throughput: 0: 879.2. Samples: 266290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:53:52,450][00237] Avg episode reward: [(0, '5.652')]
[2024-08-27 14:53:57,452][00237] Fps is (10 sec: 4094.4, 60 sec: 3686.2, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 901.5. Samples: 272810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:53:57,454][00237] Avg episode reward: [(0, '5.527')]
[2024-08-27 14:54:00,992][03015] Updated weights for policy 0, policy_version 270 (0.0038)
[2024-08-27 14:54:02,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1110016. Throughput: 0: 933.9. Samples: 276166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:54:02,452][00237] Avg episode reward: [(0, '5.391')]
[2024-08-27 14:54:07,448][00237] Fps is (10 sec: 3278.1, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 1122304. Throughput: 0: 914.2. Samples: 280994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:54:07,452][00237] Avg episode reward: [(0, '5.651')]
[2024-08-27 14:54:12,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 1142784. Throughput: 0: 897.8. Samples: 286602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-27 14:54:12,456][00237] Avg episode reward: [(0, '5.788')]
[2024-08-27 14:54:12,821][03015] Updated weights for policy 0, policy_version 280 (0.0037)
[2024-08-27 14:54:17,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1167360. Throughput: 0: 930.3. Samples: 290064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:54:17,455][00237] Avg episode reward: [(0, '5.967')]
[2024-08-27 14:54:17,465][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000285_1167360.pth...
[2024-08-27 14:54:17,587][02998] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000066_270336.pth
[2024-08-27 14:54:22,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1183744. Throughput: 0: 960.5. Samples: 295996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:54:22,450][00237] Avg episode reward: [(0, '6.420')]
[2024-08-27 14:54:22,452][02998] Saving new best policy, reward=6.420!
[2024-08-27 14:54:23,406][03015] Updated weights for policy 0, policy_version 290 (0.0036)
[2024-08-27 14:54:27,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3748.9). Total num frames: 1200128. Throughput: 0: 916.2. Samples: 300532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:54:27,450][00237] Avg episode reward: [(0, '6.338')]
[2024-08-27 14:54:32,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1220608. Throughput: 0: 929.7. Samples: 304052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:54:32,450][00237] Avg episode reward: [(0, '6.468')]
[2024-08-27 14:54:32,538][02998] Saving new best policy, reward=6.468!
[2024-08-27 14:54:33,456][03015] Updated weights for policy 0, policy_version 300 (0.0035)
[2024-08-27 14:54:37,450][00237] Fps is (10 sec: 4504.5, 60 sec: 3891.0, 300 sec: 3748.9). Total num frames: 1245184. Throughput: 0: 995.8. Samples: 311104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-27 14:54:37,453][00237] Avg episode reward: [(0, '6.561')]
[2024-08-27 14:54:37,474][02998] Saving new best policy, reward=6.561!
[2024-08-27 14:54:42,448][00237] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 1257472. Throughput: 0: 945.7. Samples: 315362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-27 14:54:42,452][00237] Avg episode reward: [(0, '6.837')]
[2024-08-27 14:54:42,461][02998] Saving new best policy, reward=6.837!
[2024-08-27 14:54:45,292][03015] Updated weights for policy 0, policy_version 310 (0.0024)
[2024-08-27 14:54:47,448][00237] Fps is (10 sec: 3277.5, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1277952. Throughput: 0: 929.4. Samples: 317988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-27 14:54:47,450][00237] Avg episode reward: [(0, '6.022')]
[2024-08-27 14:54:52,448][00237] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 1302528. Throughput: 0: 977.2. Samples: 324968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:54:52,450][00237] Avg episode reward: [(0, '5.719')]
[2024-08-27 14:54:54,355][03015] Updated weights for policy 0, policy_version 320 (0.0036)
[2024-08-27 14:54:57,448][00237] Fps is (10 sec: 3686.5, 60 sec: 3754.9, 300 sec: 3735.0). Total num frames: 1314816. Throughput: 0: 966.9. Samples: 330114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:54:57,452][00237] Avg episode reward: [(0, '6.388')]
[2024-08-27 14:55:02,448][00237] Fps is (10 sec: 2867.3, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1331200. Throughput: 0: 937.8. Samples: 332266. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2024-08-27 14:55:02,450][00237] Avg episode reward: [(0, '7.296')]
[2024-08-27 14:55:02,518][02998] Saving new best policy, reward=7.296!
[2024-08-27 14:55:06,072][03015] Updated weights for policy 0, policy_version 330 (0.0045)
[2024-08-27 14:55:07,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1355776. Throughput: 0: 954.1. Samples: 338930. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
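The Saving/Removing pairs above show the learner's checkpoint rotation: write the newest checkpoint_<train_step>_<env_frames>.pth, then delete the oldest so only a couple remain (the best policy is tracked separately). A rough sketch of that bookkeeping; the retention count of two is an assumption inferred from the Removing lines:

```python
# Sketch of the checkpoint rotation seen in the log. The file naming follows
# the logged pattern checkpoint_<train_step>_<env_frames>.pth; keeping the
# newest two files is an assumption based on the Saving/Removing pairs.
import glob
import os

import torch


def save_and_rotate(model, train_step, env_frames, ckpt_dir, keep=2):
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"checkpoint_{train_step:09d}_{env_frames}.pth")
    torch.save({"model": model.state_dict()}, path)  # 'Saving .../checkpoint_...pth...'
    # Zero-padded step numbers make lexicographic order match training order.
    for old in sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))[:-keep]:
        os.remove(old)  # 'Removing .../checkpoint_000000066_270336.pth'
    return path
```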
[2024-08-27 14:55:07,452][00237] Avg episode reward: [(0, '7.385')]
[2024-08-27 14:55:07,461][02998] Saving new best policy, reward=7.385!
[2024-08-27 14:55:12,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1376256. Throughput: 0: 991.9. Samples: 345168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:55:12,450][00237] Avg episode reward: [(0, '6.770')]
[2024-08-27 14:55:17,389][03015] Updated weights for policy 0, policy_version 340 (0.0039)
[2024-08-27 14:55:17,449][00237] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 1392640. Throughput: 0: 959.2. Samples: 347218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:55:17,452][00237] Avg episode reward: [(0, '6.671')]
[2024-08-27 14:55:22,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1413120. Throughput: 0: 928.9. Samples: 352902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:55:22,457][00237] Avg episode reward: [(0, '7.142')]
[2024-08-27 14:55:26,575][03015] Updated weights for policy 0, policy_version 350 (0.0032)
[2024-08-27 14:55:27,448][00237] Fps is (10 sec: 4506.2, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1437696. Throughput: 0: 990.1. Samples: 359914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:55:27,450][00237] Avg episode reward: [(0, '7.045')]
[2024-08-27 14:55:32,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1449984. Throughput: 0: 986.2. Samples: 362366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:55:32,453][00237] Avg episode reward: [(0, '6.683')]
[2024-08-27 14:55:37,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3762.8). Total num frames: 1470464. Throughput: 0: 939.5. Samples: 367244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-27 14:55:37,449][00237] Avg episode reward: [(0, '6.664')]
[2024-08-27 14:55:37,912][03015] Updated weights for policy 0, policy_version 360 (0.0043)
[2024-08-27 14:55:42,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1495040. Throughput: 0: 982.7. Samples: 374334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:55:42,453][00237] Avg episode reward: [(0, '6.737')]
[2024-08-27 14:55:47,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3748.9). Total num frames: 1511424. Throughput: 0: 1010.3. Samples: 377730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:55:47,461][00237] Avg episode reward: [(0, '7.802')]
[2024-08-27 14:55:47,554][02998] Saving new best policy, reward=7.802!
[2024-08-27 14:55:47,568][03015] Updated weights for policy 0, policy_version 370 (0.0038)
[2024-08-27 14:55:52,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1527808. Throughput: 0: 958.2. Samples: 382050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:55:52,451][00237] Avg episode reward: [(0, '7.695')]
[2024-08-27 14:55:57,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1552384. Throughput: 0: 966.9. Samples: 388680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:55:57,450][00237] Avg episode reward: [(0, '7.948')]
[2024-08-27 14:55:57,460][02998] Saving new best policy, reward=7.948!
[2024-08-27 14:55:58,190][03015] Updated weights for policy 0, policy_version 380 (0.0034)
[2024-08-27 14:56:02,448][00237] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3762.8). Total num frames: 1572864. Throughput: 0: 996.8. Samples: 392072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:56:02,454][00237] Avg episode reward: [(0, '8.722')]
[2024-08-27 14:56:02,457][02998] Saving new best policy, reward=8.722!
[2024-08-27 14:56:07,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1585152. Throughput: 0: 980.9. Samples: 397042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:56:07,454][00237] Avg episode reward: [(0, '8.548')]
[2024-08-27 14:56:09,962][03015] Updated weights for policy 0, policy_version 390 (0.0032)
[2024-08-27 14:56:12,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1605632. Throughput: 0: 948.8. Samples: 402612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:56:12,455][00237] Avg episode reward: [(0, '8.362')]
[2024-08-27 14:56:17,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3790.5). Total num frames: 1630208. Throughput: 0: 971.8. Samples: 406096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-27 14:56:17,451][00237] Avg episode reward: [(0, '7.871')]
[2024-08-27 14:56:17,461][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000398_1630208.pth...
[2024-08-27 14:56:17,591][02998] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth
[2024-08-27 14:56:18,640][03015] Updated weights for policy 0, policy_version 400 (0.0015)
[2024-08-27 14:56:22,450][00237] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3790.5). Total num frames: 1646592. Throughput: 0: 999.9. Samples: 412240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:56:22,453][00237] Avg episode reward: [(0, '8.028')]
[2024-08-27 14:56:27,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1662976. Throughput: 0: 945.2. Samples: 416868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:56:27,454][00237] Avg episode reward: [(0, '7.760')]
[2024-08-27 14:56:30,184][03015] Updated weights for policy 0, policy_version 410 (0.0046)
[2024-08-27 14:56:32,448][00237] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1687552. Throughput: 0: 949.8. Samples: 420472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:56:32,456][00237] Avg episode reward: [(0, '7.882')]
[2024-08-27 14:56:37,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 1708032. Throughput: 0: 1014.1. Samples: 427684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:56:37,450][00237] Avg episode reward: [(0, '8.678')]
[2024-08-27 14:56:40,414][03015] Updated weights for policy 0, policy_version 420 (0.0038)
[2024-08-27 14:56:42,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1724416. Throughput: 0: 961.9. Samples: 431966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:56:42,451][00237] Avg episode reward: [(0, '9.923')]
[2024-08-27 14:56:42,456][02998] Saving new best policy, reward=9.923!
[2024-08-27 14:56:47,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1744896. Throughput: 0: 952.4. Samples: 434930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:56:47,456][00237] Avg episode reward: [(0, '10.144')]
[2024-08-27 14:56:47,465][02998] Saving new best policy, reward=10.144!
[2024-08-27 14:56:50,294][03015] Updated weights for policy 0, policy_version 430 (0.0021)
[2024-08-27 14:56:52,448][00237] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 1769472. Throughput: 0: 999.2. Samples: 442006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:56:52,450][00237] Avg episode reward: [(0, '10.304')]
[2024-08-27 14:56:52,453][02998] Saving new best policy, reward=10.304!
[2024-08-27 14:56:57,451][00237] Fps is (10 sec: 4094.6, 60 sec: 3891.0, 300 sec: 3804.4). Total num frames: 1785856. Throughput: 0: 992.5. Samples: 447280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:56:57,454][00237] Avg episode reward: [(0, '9.934')]
[2024-08-27 14:57:01,701][03015] Updated weights for policy 0, policy_version 440 (0.0019)
[2024-08-27 14:57:02,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1802240. Throughput: 0: 962.8. Samples: 449422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:57:02,450][00237] Avg episode reward: [(0, '10.162')]
[2024-08-27 14:57:07,448][00237] Fps is (10 sec: 4097.3, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 1826816. Throughput: 0: 983.5. Samples: 456494. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:57:07,450][00237] Avg episode reward: [(0, '10.620')]
[2024-08-27 14:57:07,460][02998] Saving new best policy, reward=10.620!
[2024-08-27 14:57:10,318][03015] Updated weights for policy 0, policy_version 450 (0.0026)
[2024-08-27 14:57:12,450][00237] Fps is (10 sec: 4504.4, 60 sec: 4027.6, 300 sec: 3804.4). Total num frames: 1847296. Throughput: 0: 1022.1. Samples: 462864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:57:12,453][00237] Avg episode reward: [(0, '12.010')]
[2024-08-27 14:57:12,459][02998] Saving new best policy, reward=12.010!
[2024-08-27 14:57:17,448][00237] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1863680. Throughput: 0: 988.1. Samples: 464936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:57:17,452][00237] Avg episode reward: [(0, '11.592')]
[2024-08-27 14:57:21,904][03015] Updated weights for policy 0, policy_version 460 (0.0017)
[2024-08-27 14:57:22,448][00237] Fps is (10 sec: 3687.4, 60 sec: 3959.6, 300 sec: 3818.4). Total num frames: 1884160. Throughput: 0: 960.6. Samples: 470910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:57:22,450][00237] Avg episode reward: [(0, '12.160')]
[2024-08-27 14:57:22,453][02998] Saving new best policy, reward=12.160!
[2024-08-27 14:57:27,448][00237] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3832.2). Total num frames: 1908736. Throughput: 0: 1023.6. Samples: 478026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:57:27,451][00237] Avg episode reward: [(0, '11.700')]
[2024-08-27 14:57:32,265][03015] Updated weights for policy 0, policy_version 470 (0.0047)
[2024-08-27 14:57:32,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1925120. Throughput: 0: 1010.4. Samples: 480396. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:57:32,455][00237] Avg episode reward: [(0, '12.690')]
[2024-08-27 14:57:32,458][02998] Saving new best policy, reward=12.690!
[2024-08-27 14:57:37,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1941504. Throughput: 0: 962.4. Samples: 485316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:57:37,452][00237] Avg episode reward: [(0, '12.937')]
[2024-08-27 14:57:37,465][02998] Saving new best policy, reward=12.937!
[2024-08-27 14:57:41,981][03015] Updated weights for policy 0, policy_version 480 (0.0031)
[2024-08-27 14:57:42,448][00237] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 1966080. Throughput: 0: 1002.4. Samples: 492386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:57:42,450][00237] Avg episode reward: [(0, '13.294')]
[2024-08-27 14:57:42,459][02998] Saving new best policy, reward=13.294!
[2024-08-27 14:57:47,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 1982464. Throughput: 0: 1026.1. Samples: 495598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:57:47,453][00237] Avg episode reward: [(0, '13.264')]
[2024-08-27 14:57:52,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1998848. Throughput: 0: 964.5. Samples: 499894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:57:52,450][00237] Avg episode reward: [(0, '13.216')]
[2024-08-27 14:57:53,591][03015] Updated weights for policy 0, policy_version 490 (0.0020)
[2024-08-27 14:57:57,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3860.0). Total num frames: 2023424. Throughput: 0: 974.9. Samples: 506734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:57:57,455][00237] Avg episode reward: [(0, '13.253')]
[2024-08-27 14:58:02,449][00237] Fps is (10 sec: 4504.9, 60 sec: 4027.6, 300 sec: 3846.1). Total num frames: 2043904. Throughput: 0: 1004.2. Samples: 510126. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:58:02,456][00237] Avg episode reward: [(0, '13.041')]
[2024-08-27 14:58:02,491][03015] Updated weights for policy 0, policy_version 500 (0.0031)
[2024-08-27 14:58:07,449][00237] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 2060288. Throughput: 0: 984.9. Samples: 515230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:58:07,456][00237] Avg episode reward: [(0, '13.695')]
[2024-08-27 14:58:07,474][02998] Saving new best policy, reward=13.695!
[2024-08-27 14:58:12,448][00237] Fps is (10 sec: 3687.0, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 2080768. Throughput: 0: 950.8. Samples: 520814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:58:12,454][00237] Avg episode reward: [(0, '14.155')]
[2024-08-27 14:58:12,457][02998] Saving new best policy, reward=14.155!
[2024-08-27 14:58:13,918][03015] Updated weights for policy 0, policy_version 510 (0.0031)
[2024-08-27 14:58:17,448][00237] Fps is (10 sec: 4506.3, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 2105344. Throughput: 0: 973.4. Samples: 524198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:58:17,453][00237] Avg episode reward: [(0, '14.272')]
[2024-08-27 14:58:17,463][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000514_2105344.pth...
[2024-08-27 14:58:17,590][02998] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000285_1167360.pth
[2024-08-27 14:58:17,601][02998] Saving new best policy, reward=14.272!
[2024-08-27 14:58:22,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2117632. Throughput: 0: 992.2. Samples: 529966. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:58:22,454][00237] Avg episode reward: [(0, '14.847')]
[2024-08-27 14:58:22,457][02998] Saving new best policy, reward=14.847!
[2024-08-27 14:58:25,627][03015] Updated weights for policy 0, policy_version 520 (0.0022)
[2024-08-27 14:58:27,448][00237] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2134016. Throughput: 0: 944.3. Samples: 534878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:58:27,454][00237] Avg episode reward: [(0, '14.873')]
[2024-08-27 14:58:27,487][02998] Saving new best policy, reward=14.873!
[2024-08-27 14:58:32,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2158592. Throughput: 0: 946.2. Samples: 538178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 14:58:32,450][00237] Avg episode reward: [(0, '14.877')]
[2024-08-27 14:58:32,453][02998] Saving new best policy, reward=14.877!
[2024-08-27 14:58:34,635][03015] Updated weights for policy 0, policy_version 530 (0.0022)
[2024-08-27 14:58:37,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2179072. Throughput: 0: 1001.8. Samples: 544976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 14:58:37,457][00237] Avg episode reward: [(0, '15.445')]
[2024-08-27 14:58:37,487][02998] Saving new best policy, reward=15.445!
[2024-08-27 14:58:42,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 2191360. Throughput: 0: 937.7. Samples: 548930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:58:42,450][00237] Avg episode reward: [(0, '15.357')]
[2024-08-27 14:58:46,822][03015] Updated weights for policy 0, policy_version 540 (0.0029)
[2024-08-27 14:58:47,448][00237] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2211840. Throughput: 0: 925.4. Samples: 551768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:58:47,451][00237] Avg episode reward: [(0, '16.661')]
[2024-08-27 14:58:47,461][02998] Saving new best policy, reward=16.661!
[2024-08-27 14:58:52,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 2236416. Throughput: 0: 962.6. Samples: 558544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:58:52,451][00237] Avg episode reward: [(0, '17.776')]
[2024-08-27 14:58:52,456][02998] Saving new best policy, reward=17.776!
[2024-08-27 14:58:57,448][00237] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3860.0). Total num frames: 2248704. Throughput: 0: 944.1. Samples: 563298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:58:57,453][00237] Avg episode reward: [(0, '18.244')]
[2024-08-27 14:58:57,465][02998] Saving new best policy, reward=18.244!
[2024-08-27 14:58:58,148][03015] Updated weights for policy 0, policy_version 550 (0.0017)
[2024-08-27 14:59:02,448][00237] Fps is (10 sec: 2867.2, 60 sec: 3686.5, 300 sec: 3873.8). Total num frames: 2265088. Throughput: 0: 912.5. Samples: 565262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-27 14:59:02,452][00237] Avg episode reward: [(0, '18.069')]
[2024-08-27 14:59:07,448][00237] Fps is (10 sec: 4096.2, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 2289664. Throughput: 0: 935.8. Samples: 572078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-27 14:59:07,450][00237] Avg episode reward: [(0, '19.089')]
[2024-08-27 14:59:07,490][02998] Saving new best policy, reward=19.089!
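The recurring "Saving new best policy, reward=..." lines implement a simple high-water mark on the average episode reward. A self-contained sketch of that rule; the best_policy.pth file name is an assumption for illustration:

```python
# Sketch of the 'Saving new best policy' rule visible throughout the log:
# snapshot the model whenever the average episode reward exceeds the best
# value seen so far. File name assumed for illustration.
import torch


class BestPolicyTracker:
    def __init__(self, path: str = "best_policy.pth"):
        self.path = path
        self.best_reward = float("-inf")

    def update(self, model: torch.nn.Module, avg_episode_reward: float) -> bool:
        if avg_episode_reward <= self.best_reward:
            return False
        self.best_reward = avg_episode_reward
        torch.save(model.state_dict(), self.path)
        print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
        return True
```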
[2024-08-27 14:59:08,071][03015] Updated weights for policy 0, policy_version 560 (0.0047)
[2024-08-27 14:59:12,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2310144. Throughput: 0: 964.0. Samples: 578260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:59:12,454][00237] Avg episode reward: [(0, '18.367')]
[2024-08-27 14:59:17,448][00237] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 2322432. Throughput: 0: 934.7. Samples: 580242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-08-27 14:59:17,451][00237] Avg episode reward: [(0, '18.317')]
[2024-08-27 14:59:19,734][03015] Updated weights for policy 0, policy_version 570 (0.0030)
[2024-08-27 14:59:22,451][00237] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3887.7). Total num frames: 2347008. Throughput: 0: 916.3. Samples: 586214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-08-27 14:59:22,459][00237] Avg episode reward: [(0, '18.748')]
[2024-08-27 14:59:27,448][00237] Fps is (10 sec: 4505.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2367488. Throughput: 0: 983.4. Samples: 593184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:59:27,453][00237] Avg episode reward: [(0, '18.650')]
[2024-08-27 14:59:28,784][03015] Updated weights for policy 0, policy_version 580 (0.0016)
[2024-08-27 14:59:32,450][00237] Fps is (10 sec: 3686.7, 60 sec: 3754.5, 300 sec: 3860.0). Total num frames: 2383872. Throughput: 0: 971.9. Samples: 595504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:59:32,453][00237] Avg episode reward: [(0, '18.372')]
[2024-08-27 14:59:37,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 2404352. Throughput: 0: 932.1. Samples: 600490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 14:59:37,452][00237] Avg episode reward: [(0, '18.370')]
[2024-08-27 14:59:39,988][03015] Updated weights for policy 0, policy_version 590 (0.0014)
[2024-08-27 14:59:42,448][00237] Fps is (10 sec: 4097.1, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2424832. Throughput: 0: 981.1. Samples: 607448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:59:42,453][00237] Avg episode reward: [(0, '19.417')]
[2024-08-27 14:59:42,551][02998] Saving new best policy, reward=19.417!
[2024-08-27 14:59:47,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2445312. Throughput: 0: 1010.2. Samples: 610720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:59:47,452][00237] Avg episode reward: [(0, '18.760')]
[2024-08-27 14:59:51,053][03015] Updated weights for policy 0, policy_version 600 (0.0025)
[2024-08-27 14:59:52,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 2461696. Throughput: 0: 953.2. Samples: 614972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-27 14:59:52,453][00237] Avg episode reward: [(0, '17.310')]
[2024-08-27 14:59:57,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2482176. Throughput: 0: 964.7. Samples: 621672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 14:59:57,453][00237] Avg episode reward: [(0, '17.771')]
[2024-08-27 15:00:00,310][03015] Updated weights for policy 0, policy_version 610 (0.0033)
[2024-08-27 15:00:02,448][00237] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2506752. Throughput: 0: 997.5. Samples: 625130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-08-27 15:00:02,450][00237] Avg episode reward: [(0, '16.094')]
[2024-08-27 15:00:07,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2519040. Throughput: 0: 970.9. Samples: 629902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 15:00:07,452][00237] Avg episode reward: [(0, '15.772')]
[2024-08-27 15:00:12,139][03015] Updated weights for policy 0, policy_version 620 (0.0051)
[2024-08-27 15:00:12,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2539520. Throughput: 0: 940.7. Samples: 635516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 15:00:12,450][00237] Avg episode reward: [(0, '17.606')]
[2024-08-27 15:00:17,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 2560000. Throughput: 0: 966.7. Samples: 639002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 15:00:17,456][00237] Avg episode reward: [(0, '18.146')]
[2024-08-27 15:00:17,493][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000626_2564096.pth...
[2024-08-27 15:00:17,626][02998] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000398_1630208.pth
[2024-08-27 15:00:22,180][03015] Updated weights for policy 0, policy_version 630 (0.0040)
[2024-08-27 15:00:22,450][00237] Fps is (10 sec: 4095.1, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 2580480. Throughput: 0: 990.8. Samples: 645080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 15:00:22,453][00237] Avg episode reward: [(0, '18.604')]
[2024-08-27 15:00:27,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2596864. Throughput: 0: 938.1. Samples: 649664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-08-27 15:00:27,451][00237] Avg episode reward: [(0, '19.231')]
[2024-08-27 15:00:32,448][00237] Fps is (10 sec: 3687.2, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 2617344. Throughput: 0: 942.2. Samples: 653118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 15:00:32,452][00237] Avg episode reward: [(0, '20.069')]
[2024-08-27 15:00:32,458][02998] Saving new best policy, reward=20.069!
[2024-08-27 15:00:32,718][03015] Updated weights for policy 0, policy_version 640 (0.0033)
[2024-08-27 15:00:37,451][00237] Fps is (10 sec: 4094.6, 60 sec: 3891.0, 300 sec: 3873.8). Total num frames: 2637824. Throughput: 0: 1000.5. Samples: 660000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-08-27 15:00:37,460][00237] Avg episode reward: [(0, '20.698')]
[2024-08-27 15:00:37,480][02998] Saving new best policy, reward=20.698!
[2024-08-27 15:00:42,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2654208. Throughput: 0: 944.7. Samples: 664184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-08-27 15:00:42,456][00237] Avg episode reward: [(0, '21.287')]
[2024-08-27 15:00:42,458][02998] Saving new best policy, reward=21.287!
[2024-08-27 15:00:44,425][03015] Updated weights for policy 0, policy_version 650 (0.0042)
[2024-08-27 15:00:47,448][00237] Fps is (10 sec: 3687.6, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 2674688. Throughput: 0: 930.2. Samples: 666990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-08-27 15:00:47,450][00237] Avg episode reward: [(0, '21.641')]
[2024-08-27 15:00:47,459][02998] Saving new best policy, reward=21.641!
[2024-08-27 15:00:52,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2695168. Throughput: 0: 976.4. Samples: 673840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:00:52,452][00237] Avg episode reward: [(0, '20.747')] [2024-08-27 15:00:53,573][03015] Updated weights for policy 0, policy_version 660 (0.0020) [2024-08-27 15:00:57,448][00237] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2711552. Throughput: 0: 969.4. Samples: 679138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:00:57,454][00237] Avg episode reward: [(0, '20.935')] [2024-08-27 15:01:02,448][00237] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3873.8). Total num frames: 2727936. Throughput: 0: 938.7. Samples: 681244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:01:02,450][00237] Avg episode reward: [(0, '20.083')] [2024-08-27 15:01:05,168][03015] Updated weights for policy 0, policy_version 670 (0.0029) [2024-08-27 15:01:07,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2752512. Throughput: 0: 950.3. Samples: 687840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:01:07,451][00237] Avg episode reward: [(0, '20.939')] [2024-08-27 15:01:12,448][00237] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2772992. Throughput: 0: 990.4. Samples: 694232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-27 15:01:12,450][00237] Avg episode reward: [(0, '22.351')] [2024-08-27 15:01:12,454][02998] Saving new best policy, reward=22.351! [2024-08-27 15:01:16,453][03015] Updated weights for policy 0, policy_version 680 (0.0019) [2024-08-27 15:01:17,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2785280. Throughput: 0: 955.8. Samples: 696130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-27 15:01:17,453][00237] Avg episode reward: [(0, '23.501')] [2024-08-27 15:01:17,471][02998] Saving new best policy, reward=23.501! [2024-08-27 15:01:22,448][00237] Fps is (10 sec: 3276.9, 60 sec: 3754.8, 300 sec: 3873.8). Total num frames: 2805760. Throughput: 0: 929.1. Samples: 701808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:01:22,453][00237] Avg episode reward: [(0, '23.773')] [2024-08-27 15:01:22,457][02998] Saving new best policy, reward=23.773! [2024-08-27 15:01:26,096][03015] Updated weights for policy 0, policy_version 690 (0.0037) [2024-08-27 15:01:27,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2830336. Throughput: 0: 987.3. Samples: 708614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-27 15:01:27,450][00237] Avg episode reward: [(0, '24.264')] [2024-08-27 15:01:27,461][02998] Saving new best policy, reward=24.264! [2024-08-27 15:01:32,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2846720. Throughput: 0: 977.0. Samples: 710954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:01:32,454][00237] Avg episode reward: [(0, '24.296')] [2024-08-27 15:01:32,459][02998] Saving new best policy, reward=24.296! [2024-08-27 15:01:37,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3860.0). Total num frames: 2863104. Throughput: 0: 923.2. Samples: 715386. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-27 15:01:37,452][00237] Avg episode reward: [(0, '24.036')] [2024-08-27 15:01:37,970][03015] Updated weights for policy 0, policy_version 700 (0.0013) [2024-08-27 15:01:42,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2887680. Throughput: 0: 963.0. Samples: 722472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:01:42,455][00237] Avg episode reward: [(0, '22.917')] [2024-08-27 15:01:47,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2904064. Throughput: 0: 992.8. Samples: 725922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:01:47,454][00237] Avg episode reward: [(0, '21.423')] [2024-08-27 15:01:47,540][03015] Updated weights for policy 0, policy_version 710 (0.0031) [2024-08-27 15:01:52,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 2920448. Throughput: 0: 940.0. Samples: 730142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-27 15:01:52,451][00237] Avg episode reward: [(0, '22.906')] [2024-08-27 15:01:57,448][00237] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2945024. Throughput: 0: 945.6. Samples: 736784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:01:57,454][00237] Avg episode reward: [(0, '22.571')] [2024-08-27 15:01:58,058][03015] Updated weights for policy 0, policy_version 720 (0.0014) [2024-08-27 15:02:02,449][00237] Fps is (10 sec: 4505.2, 60 sec: 3959.4, 300 sec: 3860.0). Total num frames: 2965504. Throughput: 0: 980.6. Samples: 740260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:02:02,451][00237] Avg episode reward: [(0, '23.354')] [2024-08-27 15:02:07,448][00237] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2981888. Throughput: 0: 967.1. Samples: 745326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:02:07,450][00237] Avg episode reward: [(0, '24.101')] [2024-08-27 15:02:09,939][03015] Updated weights for policy 0, policy_version 730 (0.0020) [2024-08-27 15:02:12,452][00237] Fps is (10 sec: 3275.6, 60 sec: 3754.4, 300 sec: 3846.0). Total num frames: 2998272. Throughput: 0: 939.1. Samples: 750876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:02:12,458][00237] Avg episode reward: [(0, '24.383')] [2024-08-27 15:02:12,463][02998] Saving new best policy, reward=24.383! [2024-08-27 15:02:17,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3022848. Throughput: 0: 961.2. Samples: 754206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:02:17,451][00237] Avg episode reward: [(0, '26.844')] [2024-08-27 15:02:17,464][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000738_3022848.pth... [2024-08-27 15:02:17,593][02998] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000514_2105344.pth [2024-08-27 15:02:17,603][02998] Saving new best policy, reward=26.844! [2024-08-27 15:02:19,116][03015] Updated weights for policy 0, policy_version 740 (0.0021) [2024-08-27 15:02:22,449][00237] Fps is (10 sec: 4097.5, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 3039232. Throughput: 0: 995.2. Samples: 760170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:02:22,455][00237] Avg episode reward: [(0, '26.193')] [2024-08-27 15:02:27,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3055616. 
Throughput: 0: 938.3. Samples: 764696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:02:27,453][00237] Avg episode reward: [(0, '26.378')] [2024-08-27 15:02:30,529][03015] Updated weights for policy 0, policy_version 750 (0.0030) [2024-08-27 15:02:32,448][00237] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3080192. Throughput: 0: 940.0. Samples: 768224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:02:32,451][00237] Avg episode reward: [(0, '27.317')] [2024-08-27 15:02:32,458][02998] Saving new best policy, reward=27.317! [2024-08-27 15:02:37,448][00237] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3100672. Throughput: 0: 994.6. Samples: 774900. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-08-27 15:02:37,453][00237] Avg episode reward: [(0, '26.920')] [2024-08-27 15:02:41,404][03015] Updated weights for policy 0, policy_version 760 (0.0037) [2024-08-27 15:02:42,449][00237] Fps is (10 sec: 3276.3, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 3112960. Throughput: 0: 946.7. Samples: 779388. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-27 15:02:42,457][00237] Avg episode reward: [(0, '26.475')] [2024-08-27 15:02:47,448][00237] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3133440. Throughput: 0: 931.6. Samples: 782182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:02:47,450][00237] Avg episode reward: [(0, '25.214')] [2024-08-27 15:02:50,956][03015] Updated weights for policy 0, policy_version 770 (0.0030) [2024-08-27 15:02:52,452][00237] Fps is (10 sec: 4504.2, 60 sec: 3959.2, 300 sec: 3846.0). Total num frames: 3158016. Throughput: 0: 979.4. Samples: 789402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:02:52,455][00237] Avg episode reward: [(0, '25.192')] [2024-08-27 15:02:57,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3832.2). Total num frames: 3174400. Throughput: 0: 974.4. Samples: 794718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:02:57,455][00237] Avg episode reward: [(0, '25.360')] [2024-08-27 15:03:02,297][03015] Updated weights for policy 0, policy_version 780 (0.0019) [2024-08-27 15:03:02,448][00237] Fps is (10 sec: 3688.0, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 3194880. Throughput: 0: 947.6. Samples: 796846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:03:02,455][00237] Avg episode reward: [(0, '25.648')] [2024-08-27 15:03:07,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3215360. Throughput: 0: 966.2. Samples: 803650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-27 15:03:07,452][00237] Avg episode reward: [(0, '25.967')] [2024-08-27 15:03:11,864][03015] Updated weights for policy 0, policy_version 790 (0.0036) [2024-08-27 15:03:12,448][00237] Fps is (10 sec: 4095.9, 60 sec: 3959.7, 300 sec: 3832.2). Total num frames: 3235840. Throughput: 0: 1002.3. Samples: 809798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-27 15:03:12,456][00237] Avg episode reward: [(0, '25.282')] [2024-08-27 15:03:17,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3248128. Throughput: 0: 968.4. Samples: 811800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:03:17,453][00237] Avg episode reward: [(0, '24.814')] [2024-08-27 15:03:22,448][00237] Fps is (10 sec: 3686.6, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 3272704. Throughput: 0: 949.9. 
Samples: 817646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:03:22,454][00237] Avg episode reward: [(0, '23.751')] [2024-08-27 15:03:23,143][03015] Updated weights for policy 0, policy_version 800 (0.0027) [2024-08-27 15:03:27,448][00237] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 3297280. Throughput: 0: 1010.2. Samples: 824844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:03:27,451][00237] Avg episode reward: [(0, '21.891')] [2024-08-27 15:03:32,452][00237] Fps is (10 sec: 3685.6, 60 sec: 3822.8, 300 sec: 3832.2). Total num frames: 3309568. Throughput: 0: 997.1. Samples: 827052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:03:32,454][00237] Avg episode reward: [(0, '21.607')] [2024-08-27 15:03:34,528][03015] Updated weights for policy 0, policy_version 810 (0.0021) [2024-08-27 15:03:37,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3330048. Throughput: 0: 942.9. Samples: 831828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-27 15:03:37,451][00237] Avg episode reward: [(0, '21.770')] [2024-08-27 15:03:42,448][00237] Fps is (10 sec: 4096.9, 60 sec: 3959.6, 300 sec: 3860.0). Total num frames: 3350528. Throughput: 0: 981.6. Samples: 838890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-27 15:03:42,451][00237] Avg episode reward: [(0, '22.383')] [2024-08-27 15:03:43,422][03015] Updated weights for policy 0, policy_version 820 (0.0018) [2024-08-27 15:03:47,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3371008. Throughput: 0: 1008.9. Samples: 842248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-27 15:03:47,450][00237] Avg episode reward: [(0, '22.983')] [2024-08-27 15:03:52,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3860.0). Total num frames: 3387392. Throughput: 0: 951.6. Samples: 846474. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:03:52,450][00237] Avg episode reward: [(0, '24.003')] [2024-08-27 15:03:54,717][03015] Updated weights for policy 0, policy_version 830 (0.0020) [2024-08-27 15:03:57,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3411968. Throughput: 0: 969.4. Samples: 853420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:03:57,455][00237] Avg episode reward: [(0, '24.438')] [2024-08-27 15:04:02,452][00237] Fps is (10 sec: 4503.6, 60 sec: 3959.2, 300 sec: 3873.8). Total num frames: 3432448. Throughput: 0: 1003.0. Samples: 856938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:04:02,454][00237] Avg episode reward: [(0, '25.168')] [2024-08-27 15:04:04,492][03015] Updated weights for policy 0, policy_version 840 (0.0028) [2024-08-27 15:04:07,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3448832. Throughput: 0: 985.3. Samples: 861984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-27 15:04:07,450][00237] Avg episode reward: [(0, '26.252')] [2024-08-27 15:04:12,448][00237] Fps is (10 sec: 3688.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3469312. Throughput: 0: 956.9. Samples: 867904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-27 15:04:12,452][00237] Avg episode reward: [(0, '27.213')] [2024-08-27 15:04:14,847][03015] Updated weights for policy 0, policy_version 850 (0.0019) [2024-08-27 15:04:17,448][00237] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3887.8). Total num frames: 3493888. Throughput: 0: 986.9. 
Samples: 871460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:04:17,453][00237] Avg episode reward: [(0, '27.419')] [2024-08-27 15:04:17,463][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000853_3493888.pth... [2024-08-27 15:04:17,588][02998] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000626_2564096.pth [2024-08-27 15:04:17,604][02998] Saving new best policy, reward=27.419! [2024-08-27 15:04:22,450][00237] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3873.8). Total num frames: 3510272. Throughput: 0: 1012.7. Samples: 877404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:04:22,454][00237] Avg episode reward: [(0, '26.813')] [2024-08-27 15:04:26,330][03015] Updated weights for policy 0, policy_version 860 (0.0038) [2024-08-27 15:04:27,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 3526656. Throughput: 0: 965.6. Samples: 882340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:04:27,451][00237] Avg episode reward: [(0, '26.305')] [2024-08-27 15:04:32,448][00237] Fps is (10 sec: 4097.0, 60 sec: 4027.9, 300 sec: 3887.7). Total num frames: 3551232. Throughput: 0: 970.3. Samples: 885912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:04:32,450][00237] Avg episode reward: [(0, '27.300')] [2024-08-27 15:04:34,932][03015] Updated weights for policy 0, policy_version 870 (0.0037) [2024-08-27 15:04:37,455][00237] Fps is (10 sec: 4502.2, 60 sec: 4027.2, 300 sec: 3887.6). Total num frames: 3571712. Throughput: 0: 1031.3. Samples: 892890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-27 15:04:37,458][00237] Avg episode reward: [(0, '26.432')] [2024-08-27 15:04:42,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3584000. Throughput: 0: 972.0. Samples: 897158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-27 15:04:42,453][00237] Avg episode reward: [(0, '25.472')] [2024-08-27 15:04:46,627][03015] Updated weights for policy 0, policy_version 880 (0.0041) [2024-08-27 15:04:47,448][00237] Fps is (10 sec: 3279.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3604480. Throughput: 0: 961.0. Samples: 900180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-27 15:04:47,450][00237] Avg episode reward: [(0, '24.473')] [2024-08-27 15:04:52,448][00237] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 3629056. Throughput: 0: 1006.1. Samples: 907258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-27 15:04:52,451][00237] Avg episode reward: [(0, '25.328')] [2024-08-27 15:04:56,481][03015] Updated weights for policy 0, policy_version 890 (0.0037) [2024-08-27 15:04:57,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3645440. Throughput: 0: 988.1. Samples: 912366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-27 15:04:57,451][00237] Avg episode reward: [(0, '24.435')] [2024-08-27 15:05:02,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3891.5, 300 sec: 3887.7). Total num frames: 3665920. Throughput: 0: 958.7. Samples: 914602. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-27 15:05:02,454][00237] Avg episode reward: [(0, '23.718')] [2024-08-27 15:05:06,940][03015] Updated weights for policy 0, policy_version 900 (0.0020) [2024-08-27 15:05:07,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3686400. Throughput: 0: 980.7. Samples: 921532. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-27 15:05:07,455][00237] Avg episode reward: [(0, '24.501')] [2024-08-27 15:05:12,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3706880. Throughput: 0: 1009.9. Samples: 927786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:05:12,453][00237] Avg episode reward: [(0, '25.656')] [2024-08-27 15:05:17,448][00237] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 3723264. Throughput: 0: 978.3. Samples: 929936. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-27 15:05:17,450][00237] Avg episode reward: [(0, '26.012')] [2024-08-27 15:05:18,159][03015] Updated weights for policy 0, policy_version 910 (0.0038) [2024-08-27 15:05:22,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 3743744. Throughput: 0: 960.6. Samples: 936110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-27 15:05:22,451][00237] Avg episode reward: [(0, '24.811')] [2024-08-27 15:05:26,820][03015] Updated weights for policy 0, policy_version 920 (0.0049) [2024-08-27 15:05:27,448][00237] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 3768320. Throughput: 0: 1024.7. Samples: 943268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-08-27 15:05:27,450][00237] Avg episode reward: [(0, '26.071')] [2024-08-27 15:05:32,453][00237] Fps is (10 sec: 4093.9, 60 sec: 3890.9, 300 sec: 3887.7). Total num frames: 3784704. Throughput: 0: 1007.8. Samples: 945536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-27 15:05:32,455][00237] Avg episode reward: [(0, '26.546')] [2024-08-27 15:05:37,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3823.4, 300 sec: 3887.7). Total num frames: 3801088. Throughput: 0: 963.6. Samples: 950620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-27 15:05:37,453][00237] Avg episode reward: [(0, '27.251')] [2024-08-27 15:05:38,461][03015] Updated weights for policy 0, policy_version 930 (0.0029) [2024-08-27 15:05:42,448][00237] Fps is (10 sec: 4098.0, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 3825664. Throughput: 0: 1002.0. Samples: 957454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-27 15:05:42,455][00237] Avg episode reward: [(0, '26.388')] [2024-08-27 15:05:47,449][00237] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 3842048. Throughput: 0: 1020.6. Samples: 960532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-27 15:05:47,451][00237] Avg episode reward: [(0, '27.149')] [2024-08-27 15:05:49,040][03015] Updated weights for policy 0, policy_version 940 (0.0042) [2024-08-27 15:05:52,448][00237] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3858432. Throughput: 0: 961.6. Samples: 964806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-27 15:05:52,454][00237] Avg episode reward: [(0, '26.501')] [2024-08-27 15:05:57,448][00237] Fps is (10 sec: 4096.5, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3883008. Throughput: 0: 975.0. Samples: 971662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-27 15:05:57,450][00237] Avg episode reward: [(0, '26.417')] [2024-08-27 15:05:58,719][03015] Updated weights for policy 0, policy_version 950 (0.0018) [2024-08-27 15:06:02,448][00237] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3903488. Throughput: 0: 1007.4. Samples: 975268. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:06:02,455][00237] Avg episode reward: [(0, '25.158')] [2024-08-27 15:06:07,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3919872. Throughput: 0: 977.6. Samples: 980100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:06:07,452][00237] Avg episode reward: [(0, '25.779')] [2024-08-27 15:06:10,325][03015] Updated weights for policy 0, policy_version 960 (0.0046) [2024-08-27 15:06:12,448][00237] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3940352. Throughput: 0: 948.2. Samples: 985938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-08-27 15:06:12,454][00237] Avg episode reward: [(0, '28.416')] [2024-08-27 15:06:12,457][02998] Saving new best policy, reward=28.416! [2024-08-27 15:06:17,448][00237] Fps is (10 sec: 4505.5, 60 sec: 4027.8, 300 sec: 3929.4). Total num frames: 3964928. Throughput: 0: 972.7. Samples: 989302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:06:17,452][00237] Avg episode reward: [(0, '27.637')] [2024-08-27 15:06:17,471][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000968_3964928.pth... [2024-08-27 15:06:17,624][02998] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000738_3022848.pth [2024-08-27 15:06:19,451][03015] Updated weights for policy 0, policy_version 970 (0.0042) [2024-08-27 15:06:22,448][00237] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3981312. Throughput: 0: 991.7. Samples: 995246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-27 15:06:22,451][00237] Avg episode reward: [(0, '26.400')] [2024-08-27 15:06:27,448][00237] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3997696. Throughput: 0: 945.8. Samples: 1000016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-27 15:06:27,456][00237] Avg episode reward: [(0, '26.459')] [2024-08-27 15:06:28,930][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-27 15:06:28,931][00237] Component Batcher_0 stopped! [2024-08-27 15:06:28,931][02998] Stopping Batcher_0... [2024-08-27 15:06:28,945][02998] Loop batcher_evt_loop terminating... [2024-08-27 15:06:28,994][03015] Weights refcount: 2 0 [2024-08-27 15:06:29,000][03015] Stopping InferenceWorker_p0-w0... [2024-08-27 15:06:29,001][03015] Loop inference_proc0-0_evt_loop terminating... [2024-08-27 15:06:29,001][00237] Component InferenceWorker_p0-w0 stopped! [2024-08-27 15:06:29,057][02998] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000853_3493888.pth [2024-08-27 15:06:29,083][02998] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-27 15:06:29,261][00237] Component LearnerWorker_p0 stopped! [2024-08-27 15:06:29,264][02998] Stopping LearnerWorker_p0... [2024-08-27 15:06:29,268][02998] Loop learner_proc0_evt_loop terminating... [2024-08-27 15:06:29,309][00237] Component RolloutWorker_w2 stopped! [2024-08-27 15:06:29,315][03017] Stopping RolloutWorker_w2... [2024-08-27 15:06:29,316][03017] Loop rollout_proc2_evt_loop terminating... [2024-08-27 15:06:29,322][03022] Stopping RolloutWorker_w5... [2024-08-27 15:06:29,322][00237] Component RolloutWorker_w5 stopped! [2024-08-27 15:06:29,328][00237] Component RolloutWorker_w4 stopped! [2024-08-27 15:06:29,331][03019] Stopping RolloutWorker_w4... 
[2024-08-27 15:06:29,332][03018] Stopping RolloutWorker_w1... [2024-08-27 15:06:29,332][00237] Component RolloutWorker_w1 stopped! [2024-08-27 15:06:29,338][03021] Stopping RolloutWorker_w6... [2024-08-27 15:06:29,339][03021] Loop rollout_proc6_evt_loop terminating... [2024-08-27 15:06:29,337][00237] Component RolloutWorker_w6 stopped! [2024-08-27 15:06:29,342][03019] Loop rollout_proc4_evt_loop terminating... [2024-08-27 15:06:29,337][03022] Loop rollout_proc5_evt_loop terminating... [2024-08-27 15:06:29,333][03018] Loop rollout_proc1_evt_loop terminating... [2024-08-27 15:06:29,357][00237] Component RolloutWorker_w0 stopped! [2024-08-27 15:06:29,360][03016] Stopping RolloutWorker_w0... [2024-08-27 15:06:29,364][03023] Stopping RolloutWorker_w7... [2024-08-27 15:06:29,364][00237] Component RolloutWorker_w7 stopped! [2024-08-27 15:06:29,371][03023] Loop rollout_proc7_evt_loop terminating... [2024-08-27 15:06:29,362][03016] Loop rollout_proc0_evt_loop terminating... [2024-08-27 15:06:29,392][03020] Stopping RolloutWorker_w3... [2024-08-27 15:06:29,393][03020] Loop rollout_proc3_evt_loop terminating... [2024-08-27 15:06:29,392][00237] Component RolloutWorker_w3 stopped! [2024-08-27 15:06:29,397][00237] Waiting for process learner_proc0 to stop... [2024-08-27 15:06:30,680][00237] Waiting for process inference_proc0-0 to join... [2024-08-27 15:06:30,689][00237] Waiting for process rollout_proc0 to join... [2024-08-27 15:06:32,693][00237] Waiting for process rollout_proc1 to join... [2024-08-27 15:06:32,698][00237] Waiting for process rollout_proc2 to join... [2024-08-27 15:06:32,703][00237] Waiting for process rollout_proc3 to join... [2024-08-27 15:06:32,705][00237] Waiting for process rollout_proc4 to join... [2024-08-27 15:06:32,709][00237] Waiting for process rollout_proc5 to join... [2024-08-27 15:06:32,712][00237] Waiting for process rollout_proc6 to join... [2024-08-27 15:06:32,716][00237] Waiting for process rollout_proc7 to join...
[2024-08-27 15:06:32,719][00237] Batcher 0 profile tree view:
batching: 27.7720, releasing_batches: 0.0291
[2024-08-27 15:06:32,721][00237] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0061
  wait_policy_total: 393.7118
update_model: 9.4411
  weight_update: 0.0047
one_step: 0.0056
  handle_policy_step: 608.2165
    deserialize: 14.7277, stack: 3.1022, obs_to_device_normalize: 123.7796, forward: 322.8888, send_messages: 30.2211
    prepare_outputs: 83.5640
      to_cpu: 48.3968
[2024-08-27 15:06:32,724][00237] Learner 0 profile tree view:
misc: 0.0053, prepare_batch: 13.2388
train: 74.7645
  epoch_init: 0.0156, minibatch_init: 0.0162, losses_postprocess: 0.6087, kl_divergence: 0.7023, after_optimizer: 34.7189
  calculate_losses: 26.2689
    losses_init: 0.0036, forward_head: 1.2570, bptt_initial: 17.4600, tail: 1.1661, advantages_returns: 0.2504, losses: 3.7055
    bptt: 2.1027
      bptt_forward_core: 1.9849
  update: 11.6729
    clip: 0.9138
[2024-08-27 15:06:32,726][00237] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3524, enqueue_policy_requests: 96.2146, env_step: 820.5335, overhead: 13.1066, complete_rollouts: 7.1622
save_policy_outputs: 21.4132
  split_output_tensors: 8.5225
[2024-08-27 15:06:32,727][00237] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4021, enqueue_policy_requests: 94.5743, env_step: 821.6598, overhead: 13.6814, complete_rollouts: 6.4195
save_policy_outputs: 20.9036
  split_output_tensors: 8.5356
[2024-08-27 15:06:32,729][00237] Loop Runner_EvtLoop terminating...
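Reading the profile trees above: the rollout workers spend almost all of their wall-clock budget in env_step, and the inference worker in the forward pass. A quick sanity check against the runner summary that follows, using only numbers copied from this log (the shares come from different processes running in parallel, so they need not sum to 100%):

total_frames = 4_005_888   # Collected {0: 4005888} in the runner summary below
main_loop_s  = 1086.2234   # Runner main_loop wall-clock time below
env_step_s   = 820.5335    # RolloutWorker_w0 env_step
forward_s    = 322.8888    # InferenceWorker_p0-w0 forward

print(f"{total_frames / main_loop_s:.1f} FPS")            # ~3687.9, matching the log
print(f"env_step share: {env_step_s / main_loop_s:.0%}")  # ~76% of the run
print(f"forward share:  {forward_s / main_loop_s:.0%}")   # ~30% of the run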
[2024-08-27 15:06:32,731][00237] Runner profile tree view: main_loop: 1086.2234 [2024-08-27 15:06:32,733][00237] Collected {0: 4005888}, FPS: 3687.9 [2024-08-27 15:08:41,898][00237] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-27 15:08:41,900][00237] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-27 15:08:41,902][00237] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-27 15:08:41,904][00237] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-27 15:08:41,905][00237] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-27 15:08:41,907][00237] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-27 15:08:41,910][00237] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-08-27 15:08:41,912][00237] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-27 15:08:41,915][00237] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-08-27 15:08:41,916][00237] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-08-27 15:08:41,919][00237] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-27 15:08:41,920][00237] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-27 15:08:41,921][00237] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-27 15:08:41,922][00237] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-27 15:08:41,923][00237] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-27 15:08:41,975][00237] Doom resolution: 160x120, resize resolution: (128, 72) [2024-08-27 15:08:41,981][00237] RunningMeanStd input shape: (3, 72, 128) [2024-08-27 15:08:41,984][00237] RunningMeanStd input shape: (1,) [2024-08-27 15:08:42,008][00237] ConvEncoder: input_channels=3 [2024-08-27 15:08:42,172][00237] Conv encoder output size: 512 [2024-08-27 15:08:42,174][00237] Policy head output size: 512 [2024-08-27 15:08:42,409][00237] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-27 15:08:43,578][00237] Num frames 100... [2024-08-27 15:08:43,745][00237] Num frames 200... [2024-08-27 15:08:43,915][00237] Num frames 300... [2024-08-27 15:08:44,094][00237] Num frames 400... [2024-08-27 15:08:44,290][00237] Num frames 500... [2024-08-27 15:08:44,420][00237] Num frames 600... [2024-08-27 15:08:44,485][00237] Avg episode rewards: #0: 13.070, true rewards: #0: 6.070 [2024-08-27 15:08:44,487][00237] Avg episode reward: 13.070, avg true_objective: 6.070 [2024-08-27 15:08:44,599][00237] Num frames 700... [2024-08-27 15:08:44,719][00237] Num frames 800... [2024-08-27 15:08:44,843][00237] Num frames 900... [2024-08-27 15:08:44,967][00237] Num frames 1000... [2024-08-27 15:08:45,088][00237] Num frames 1100... [2024-08-27 15:08:45,207][00237] Num frames 1200... [2024-08-27 15:08:45,365][00237] Num frames 1300... [2024-08-27 15:08:45,484][00237] Num frames 1400... [2024-08-27 15:08:45,630][00237] Avg episode rewards: #0: 15.365, true rewards: #0: 7.365 [2024-08-27 15:08:45,632][00237] Avg episode reward: 15.365, avg true_objective: 7.365 [2024-08-27 15:08:45,666][00237] Num frames 1500... 
[2024-08-27 15:08:45,785][00237] Num frames 1600... [2024-08-27 15:08:45,905][00237] Num frames 1700... [2024-08-27 15:08:46,037][00237] Num frames 1800... [2024-08-27 15:08:46,173][00237] Num frames 1900... [2024-08-27 15:08:46,299][00237] Num frames 2000... [2024-08-27 15:08:46,433][00237] Num frames 2100... [2024-08-27 15:08:46,553][00237] Num frames 2200... [2024-08-27 15:08:46,673][00237] Num frames 2300... [2024-08-27 15:08:46,798][00237] Num frames 2400... [2024-08-27 15:08:46,917][00237] Num frames 2500... [2024-08-27 15:08:47,040][00237] Num frames 2600... [2024-08-27 15:08:47,159][00237] Num frames 2700... [2024-08-27 15:08:47,327][00237] Avg episode rewards: #0: 22.313, true rewards: #0: 9.313 [2024-08-27 15:08:47,339][00237] Avg episode reward: 22.313, avg true_objective: 9.313 [2024-08-27 15:08:47,356][00237] Num frames 2800... [2024-08-27 15:08:47,628][00237] Num frames 2900... [2024-08-27 15:08:47,820][00237] Num frames 3000... [2024-08-27 15:08:48,002][00237] Num frames 3100... [2024-08-27 15:08:48,149][00237] Num frames 3200... [2024-08-27 15:08:48,272][00237] Num frames 3300... [2024-08-27 15:08:48,396][00237] Num frames 3400... [2024-08-27 15:08:48,541][00237] Avg episode rewards: #0: 21.165, true rewards: #0: 8.665 [2024-08-27 15:08:48,543][00237] Avg episode reward: 21.165, avg true_objective: 8.665 [2024-08-27 15:08:48,588][00237] Num frames 3500... [2024-08-27 15:08:48,709][00237] Num frames 3600... [2024-08-27 15:08:48,831][00237] Num frames 3700... [2024-08-27 15:08:48,963][00237] Num frames 3800... [2024-08-27 15:08:49,086][00237] Num frames 3900... [2024-08-27 15:08:49,206][00237] Num frames 4000... [2024-08-27 15:08:49,328][00237] Num frames 4100... [2024-08-27 15:08:49,458][00237] Num frames 4200... [2024-08-27 15:08:49,578][00237] Num frames 4300... [2024-08-27 15:08:49,700][00237] Num frames 4400... [2024-08-27 15:08:49,822][00237] Num frames 4500... [2024-08-27 15:08:49,947][00237] Num frames 4600... [2024-08-27 15:08:50,067][00237] Num frames 4700... [2024-08-27 15:08:50,234][00237] Avg episode rewards: #0: 23.788, true rewards: #0: 9.588 [2024-08-27 15:08:50,235][00237] Avg episode reward: 23.788, avg true_objective: 9.588 [2024-08-27 15:08:50,245][00237] Num frames 4800... [2024-08-27 15:08:50,363][00237] Num frames 4900... [2024-08-27 15:08:50,487][00237] Num frames 5000... [2024-08-27 15:08:50,606][00237] Num frames 5100... [2024-08-27 15:08:50,727][00237] Num frames 5200... [2024-08-27 15:08:50,850][00237] Num frames 5300... [2024-08-27 15:08:50,977][00237] Num frames 5400... [2024-08-27 15:08:51,096][00237] Num frames 5500... [2024-08-27 15:08:51,215][00237] Num frames 5600... [2024-08-27 15:08:51,334][00237] Num frames 5700... [2024-08-27 15:08:51,460][00237] Num frames 5800... [2024-08-27 15:08:51,588][00237] Num frames 5900... [2024-08-27 15:08:51,710][00237] Num frames 6000... [2024-08-27 15:08:51,831][00237] Num frames 6100... [2024-08-27 15:08:51,956][00237] Num frames 6200... [2024-08-27 15:08:52,078][00237] Num frames 6300... [2024-08-27 15:08:52,200][00237] Num frames 6400... [2024-08-27 15:08:52,325][00237] Num frames 6500... [2024-08-27 15:08:52,450][00237] Num frames 6600... [2024-08-27 15:08:52,581][00237] Num frames 6700... [2024-08-27 15:08:52,703][00237] Num frames 6800... [2024-08-27 15:08:52,876][00237] Avg episode rewards: #0: 29.490, true rewards: #0: 11.490 [2024-08-27 15:08:52,877][00237] Avg episode reward: 29.490, avg true_objective: 11.490 [2024-08-27 15:08:52,889][00237] Num frames 6900... 
[2024-08-27 15:08:53,021][00237] Num frames 7000... [2024-08-27 15:08:53,142][00237] Num frames 7100... [2024-08-27 15:08:53,263][00237] Num frames 7200... [2024-08-27 15:08:53,386][00237] Num frames 7300... [2024-08-27 15:08:53,505][00237] Num frames 7400... [2024-08-27 15:08:53,642][00237] Num frames 7500... [2024-08-27 15:08:53,764][00237] Num frames 7600... [2024-08-27 15:08:53,886][00237] Num frames 7700... [2024-08-27 15:08:54,011][00237] Num frames 7800... [2024-08-27 15:08:54,129][00237] Num frames 7900... [2024-08-27 15:08:54,252][00237] Num frames 8000... [2024-08-27 15:08:54,395][00237] Num frames 8100... [2024-08-27 15:08:54,578][00237] Num frames 8200... [2024-08-27 15:08:54,750][00237] Num frames 8300... [2024-08-27 15:08:54,915][00237] Avg episode rewards: #0: 30.377, true rewards: #0: 11.949 [2024-08-27 15:08:54,919][00237] Avg episode reward: 30.377, avg true_objective: 11.949 [2024-08-27 15:08:54,991][00237] Num frames 8400... [2024-08-27 15:08:55,154][00237] Num frames 8500... [2024-08-27 15:08:55,317][00237] Num frames 8600... [2024-08-27 15:08:55,480][00237] Num frames 8700... [2024-08-27 15:08:55,650][00237] Num frames 8800... [2024-08-27 15:08:55,819][00237] Num frames 8900... [2024-08-27 15:08:55,986][00237] Num frames 9000... [2024-08-27 15:08:56,156][00237] Avg episode rewards: #0: 28.082, true rewards: #0: 11.332 [2024-08-27 15:08:56,159][00237] Avg episode reward: 28.082, avg true_objective: 11.332 [2024-08-27 15:08:56,223][00237] Num frames 9100... [2024-08-27 15:08:56,401][00237] Num frames 9200... [2024-08-27 15:08:56,577][00237] Num frames 9300... [2024-08-27 15:08:56,775][00237] Num frames 9400... [2024-08-27 15:08:56,899][00237] Num frames 9500... [2024-08-27 15:08:57,028][00237] Num frames 9600... [2024-08-27 15:08:57,149][00237] Num frames 9700... [2024-08-27 15:08:57,270][00237] Num frames 9800... [2024-08-27 15:08:57,395][00237] Num frames 9900... [2024-08-27 15:08:57,520][00237] Num frames 10000... [2024-08-27 15:08:57,645][00237] Num frames 10100... [2024-08-27 15:08:57,776][00237] Num frames 10200... [2024-08-27 15:08:57,899][00237] Num frames 10300... [2024-08-27 15:08:58,030][00237] Num frames 10400... [2024-08-27 15:08:58,156][00237] Num frames 10500... [2024-08-27 15:08:58,280][00237] Num frames 10600... [2024-08-27 15:08:58,378][00237] Avg episode rewards: #0: 29.593, true rewards: #0: 11.816 [2024-08-27 15:08:58,380][00237] Avg episode reward: 29.593, avg true_objective: 11.816 [2024-08-27 15:08:58,462][00237] Num frames 10700... [2024-08-27 15:08:58,586][00237] Num frames 10800... [2024-08-27 15:08:58,718][00237] Num frames 10900... [2024-08-27 15:08:58,848][00237] Num frames 11000... [2024-08-27 15:08:58,981][00237] Num frames 11100... [2024-08-27 15:08:59,059][00237] Avg episode rewards: #0: 27.817, true rewards: #0: 11.117 [2024-08-27 15:08:59,060][00237] Avg episode reward: 27.817, avg true_objective: 11.117 [2024-08-27 15:10:08,577][00237] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-08-27 15:18:10,503][00237] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-27 15:18:10,505][00237] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-27 15:18:10,507][00237] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-27 15:18:10,508][00237] Adding new argument 'save_video'=True that is not in the saved config file! 
[2024-08-27 15:18:10,510][00237] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-27 15:18:10,512][00237] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-27 15:18:10,514][00237] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-08-27 15:18:10,516][00237] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-27 15:18:10,517][00237] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-08-27 15:18:10,518][00237] Adding new argument 'hf_repository'='GeorgeImmanuel/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-08-27 15:18:10,519][00237] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-27 15:18:10,520][00237] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-27 15:18:10,522][00237] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-27 15:18:10,523][00237] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-27 15:18:10,524][00237] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-27 15:18:10,569][00237] RunningMeanStd input shape: (3, 72, 128) [2024-08-27 15:18:10,572][00237] RunningMeanStd input shape: (1,) [2024-08-27 15:18:10,590][00237] ConvEncoder: input_channels=3 [2024-08-27 15:18:10,659][00237] Conv encoder output size: 512 [2024-08-27 15:18:10,668][00237] Policy head output size: 512 [2024-08-27 15:18:10,704][00237] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-27 15:18:11,389][00237] Num frames 100... [2024-08-27 15:18:11,660][00237] Num frames 200... [2024-08-27 15:18:11,930][00237] Num frames 300... [2024-08-27 15:18:12,123][00237] Num frames 400... [2024-08-27 15:18:12,348][00237] Num frames 500... [2024-08-27 15:18:12,609][00237] Avg episode rewards: #0: 10.760, true rewards: #0: 5.760 [2024-08-27 15:18:12,612][00237] Avg episode reward: 10.760, avg true_objective: 5.760 [2024-08-27 15:18:12,671][00237] Num frames 600... [2024-08-27 15:18:12,950][00237] Num frames 700... [2024-08-27 15:18:13,153][00237] Num frames 800... [2024-08-27 15:18:13,376][00237] Num frames 900... [2024-08-27 15:18:13,570][00237] Num frames 1000... [2024-08-27 15:18:13,809][00237] Num frames 1100... [2024-08-27 15:18:14,077][00237] Num frames 1200... [2024-08-27 15:18:14,266][00237] Num frames 1300... [2024-08-27 15:18:14,477][00237] Num frames 1400... [2024-08-27 15:18:14,608][00237] Num frames 1500... [2024-08-27 15:18:14,736][00237] Num frames 1600... [2024-08-27 15:18:14,867][00237] Num frames 1700... [2024-08-27 15:18:15,017][00237] Num frames 1800... [2024-08-27 15:18:15,139][00237] Num frames 1900... [2024-08-27 15:18:15,265][00237] Num frames 2000... [2024-08-27 15:18:15,389][00237] Num frames 2100... [2024-08-27 15:18:15,511][00237] Num frames 2200... [2024-08-27 15:18:15,630][00237] Num frames 2300... [2024-08-27 15:18:15,755][00237] Num frames 2400... [2024-08-27 15:18:15,874][00237] Num frames 2500... [2024-08-27 15:18:16,004][00237] Num frames 2600... [2024-08-27 15:18:16,148][00237] Avg episode rewards: #0: 33.879, true rewards: #0: 13.380 [2024-08-27 15:18:16,149][00237] Avg episode reward: 33.879, avg true_objective: 13.380 [2024-08-27 15:18:16,195][00237] Num frames 2700... 
[2024-08-27 15:18:16,364][00237] Num frames 2800... [2024-08-27 15:18:16,531][00237] Num frames 2900... [2024-08-27 15:18:16,697][00237] Num frames 3000... [2024-08-27 15:18:16,857][00237] Num frames 3100... [2024-08-27 15:18:17,025][00237] Num frames 3200... [2024-08-27 15:18:17,184][00237] Num frames 3300... [2024-08-27 15:18:17,342][00237] Num frames 3400... [2024-08-27 15:18:17,515][00237] Num frames 3500... [2024-08-27 15:18:17,687][00237] Num frames 3600... [2024-08-27 15:18:17,868][00237] Num frames 3700... [2024-08-27 15:18:18,038][00237] Num frames 3800... [2024-08-27 15:18:18,210][00237] Num frames 3900... [2024-08-27 15:18:18,382][00237] Num frames 4000... [2024-08-27 15:18:18,568][00237] Num frames 4100... [2024-08-27 15:18:18,736][00237] Num frames 4200... [2024-08-27 15:18:18,866][00237] Num frames 4300... [2024-08-27 15:18:19,000][00237] Num frames 4400... [2024-08-27 15:18:19,123][00237] Num frames 4500... [2024-08-27 15:18:19,249][00237] Num frames 4600... [2024-08-27 15:18:19,373][00237] Num frames 4700... [2024-08-27 15:18:19,521][00237] Avg episode rewards: #0: 41.919, true rewards: #0: 15.920 [2024-08-27 15:18:19,522][00237] Avg episode reward: 41.919, avg true_objective: 15.920 [2024-08-27 15:18:19,555][00237] Num frames 4800... [2024-08-27 15:18:19,687][00237] Num frames 4900... [2024-08-27 15:18:19,819][00237] Num frames 5000... [2024-08-27 15:18:19,942][00237] Num frames 5100... [2024-08-27 15:18:20,064][00237] Num frames 5200... [2024-08-27 15:18:20,184][00237] Num frames 5300... [2024-08-27 15:18:20,306][00237] Num frames 5400... [2024-08-27 15:18:20,430][00237] Num frames 5500... [2024-08-27 15:18:20,556][00237] Num frames 5600... [2024-08-27 15:18:20,677][00237] Num frames 5700... [2024-08-27 15:18:20,795][00237] Num frames 5800... [2024-08-27 15:18:20,924][00237] Num frames 5900... [2024-08-27 15:18:21,049][00237] Num frames 6000... [2024-08-27 15:18:21,172][00237] Num frames 6100... [2024-08-27 15:18:21,294][00237] Num frames 6200... [2024-08-27 15:18:21,408][00237] Avg episode rewards: #0: 39.619, true rewards: #0: 15.620 [2024-08-27 15:18:21,410][00237] Avg episode reward: 39.619, avg true_objective: 15.620 [2024-08-27 15:18:21,484][00237] Num frames 6300... [2024-08-27 15:18:21,605][00237] Num frames 6400... [2024-08-27 15:18:21,727][00237] Num frames 6500... [2024-08-27 15:18:21,852][00237] Num frames 6600... [2024-08-27 15:18:21,981][00237] Num frames 6700... [2024-08-27 15:18:22,102][00237] Num frames 6800... [2024-08-27 15:18:22,221][00237] Num frames 6900... [2024-08-27 15:18:22,343][00237] Num frames 7000... [2024-08-27 15:18:22,466][00237] Num frames 7100... [2024-08-27 15:18:22,578][00237] Avg episode rewards: #0: 35.088, true rewards: #0: 14.288 [2024-08-27 15:18:22,579][00237] Avg episode reward: 35.088, avg true_objective: 14.288 [2024-08-27 15:18:22,651][00237] Num frames 7200... [2024-08-27 15:18:22,772][00237] Num frames 7300... [2024-08-27 15:18:22,900][00237] Num frames 7400... [2024-08-27 15:18:23,027][00237] Num frames 7500... [2024-08-27 15:18:23,152][00237] Num frames 7600... [2024-08-27 15:18:23,299][00237] Avg episode rewards: #0: 31.126, true rewards: #0: 12.793 [2024-08-27 15:18:23,301][00237] Avg episode reward: 31.126, avg true_objective: 12.793 [2024-08-27 15:18:23,333][00237] Num frames 7700... [2024-08-27 15:18:23,457][00237] Num frames 7800... [2024-08-27 15:18:23,582][00237] Num frames 7900... [2024-08-27 15:18:23,702][00237] Num frames 8000... [2024-08-27 15:18:23,823][00237] Num frames 8100... 
[2024-08-27 15:18:23,957][00237] Num frames 8200... [2024-08-27 15:18:24,076][00237] Num frames 8300... [2024-08-27 15:18:24,197][00237] Num frames 8400... [2024-08-27 15:18:24,267][00237] Avg episode rewards: #0: 29.303, true rewards: #0: 12.017 [2024-08-27 15:18:24,269][00237] Avg episode reward: 29.303, avg true_objective: 12.017 [2024-08-27 15:18:24,373][00237] Num frames 8500... [2024-08-27 15:18:24,498][00237] Num frames 8600... [2024-08-27 15:18:24,619][00237] Num frames 8700... [2024-08-27 15:18:24,738][00237] Num frames 8800... [2024-08-27 15:18:24,856][00237] Num frames 8900... [2024-08-27 15:18:24,992][00237] Num frames 9000... [2024-08-27 15:18:25,125][00237] Num frames 9100... [2024-08-27 15:18:25,245][00237] Num frames 9200... [2024-08-27 15:18:25,367][00237] Num frames 9300... [2024-08-27 15:18:25,489][00237] Num frames 9400... [2024-08-27 15:18:25,615][00237] Num frames 9500... [2024-08-27 15:18:25,747][00237] Num frames 9600... [2024-08-27 15:18:25,883][00237] Num frames 9700... [2024-08-27 15:18:26,024][00237] Num frames 9800... [2024-08-27 15:18:26,149][00237] Num frames 9900... [2024-08-27 15:18:26,271][00237] Num frames 10000... [2024-08-27 15:18:26,400][00237] Num frames 10100... [2024-08-27 15:18:26,522][00237] Num frames 10200... [2024-08-27 15:18:26,649][00237] Num frames 10300... [2024-08-27 15:18:26,770][00237] Num frames 10400... [2024-08-27 15:18:26,898][00237] Num frames 10500... [2024-08-27 15:18:26,972][00237] Avg episode rewards: #0: 32.890, true rewards: #0: 13.140 [2024-08-27 15:18:26,973][00237] Avg episode reward: 32.890, avg true_objective: 13.140 [2024-08-27 15:18:27,086][00237] Num frames 10600... [2024-08-27 15:18:27,208][00237] Num frames 10700... [2024-08-27 15:18:27,330][00237] Num frames 10800... [2024-08-27 15:18:27,453][00237] Num frames 10900... [2024-08-27 15:18:27,573][00237] Num frames 11000... [2024-08-27 15:18:27,695][00237] Num frames 11100... [2024-08-27 15:18:27,820][00237] Num frames 11200... [2024-08-27 15:18:27,949][00237] Num frames 11300... [2024-08-27 15:18:28,080][00237] Num frames 11400... [2024-08-27 15:18:28,202][00237] Num frames 11500... [2024-08-27 15:18:28,323][00237] Num frames 11600... [2024-08-27 15:18:28,443][00237] Num frames 11700... [2024-08-27 15:18:28,564][00237] Num frames 11800... [2024-08-27 15:18:28,691][00237] Num frames 11900... [2024-08-27 15:18:28,859][00237] Num frames 12000... [2024-08-27 15:18:29,048][00237] Num frames 12100... [2024-08-27 15:18:29,214][00237] Num frames 12200... [2024-08-27 15:18:29,380][00237] Num frames 12300... [2024-08-27 15:18:29,549][00237] Num frames 12400... [2024-08-27 15:18:29,730][00237] Avg episode rewards: #0: 35.415, true rewards: #0: 13.860 [2024-08-27 15:18:29,733][00237] Avg episode reward: 35.415, avg true_objective: 13.860 [2024-08-27 15:18:29,780][00237] Num frames 12500... [2024-08-27 15:18:29,948][00237] Num frames 12600... [2024-08-27 15:18:30,128][00237] Num frames 12700... [2024-08-27 15:18:30,304][00237] Num frames 12800... [2024-08-27 15:18:30,477][00237] Num frames 12900... [2024-08-27 15:18:30,649][00237] Num frames 13000... [2024-08-27 15:18:30,817][00237] Num frames 13100... [2024-08-27 15:18:30,996][00237] Num frames 13200... [2024-08-27 15:18:31,185][00237] Num frames 13300... [2024-08-27 15:18:31,327][00237] Num frames 13400... [2024-08-27 15:18:31,447][00237] Num frames 13500... [2024-08-27 15:18:31,574][00237] Num frames 13600... [2024-08-27 15:18:31,702][00237] Num frames 13700... [2024-08-27 15:18:31,827][00237] Num frames 13800... 
[2024-08-27 15:18:31,958][00237] Num frames 13900... [2024-08-27 15:18:32,083][00237] Num frames 14000... [2024-08-27 15:18:32,214][00237] Num frames 14100... [2024-08-27 15:18:32,339][00237] Num frames 14200... [2024-08-27 15:18:32,464][00237] Num frames 14300... [2024-08-27 15:18:32,594][00237] Num frames 14400... [2024-08-27 15:18:32,722][00237] Avg episode rewards: #0: 36.958, true rewards: #0: 14.458 [2024-08-27 15:18:32,723][00237] Avg episode reward: 36.958, avg true_objective: 14.458 [2024-08-27 15:20:05,961][00237] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
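For completeness: the two evaluation passes above (the first with save_video only, the second with push_to_hub=True and the hf_repository shown in its config dump) are what Sample Factory's enjoy entry point logs when driven the way the Hugging Face Deep RL course notebook drives it. A hedged sketch of the second invocation, not a verbatim reproduction: parse_vizdoom_cfg is the notebook's own helper rather than part of sample_factory, and the module paths assume sample-factory 2.x with the VizDoom examples installed.

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

def parse_vizdoom_cfg(argv=None, evaluation=False):
    # Thin wrapper over the sample-factory CLI, as defined in the course notebook.
    parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
    add_doom_env_args(parser)
    doom_override_defaults(parser)
    return parse_full_cfg(parser, argv)

register_vizdoom_components()
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",  # env name inferred from the repo id
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=GeorgeImmanuel/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)  # replays the latest checkpoint, writes replay.mp4, uploads the experiment dir to the Hub

Each flag mirrors an "Overriding arg"/"Adding new argument" line in the config dump at the start of that run; everything not overridden is read back from the saved config.json.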