[2024-08-24 01:31:18,283][01629] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-08-24 01:31:18,287][01629] Rollout worker 0 uses device cpu
[2024-08-24 01:31:18,288][01629] Rollout worker 1 uses device cpu
[2024-08-24 01:31:18,291][01629] Rollout worker 2 uses device cpu
[2024-08-24 01:31:18,294][01629] Rollout worker 3 uses device cpu
[2024-08-24 01:31:18,297][01629] Rollout worker 4 uses device cpu
[2024-08-24 01:31:18,303][01629] Rollout worker 5 uses device cpu
[2024-08-24 01:31:18,309][01629] Rollout worker 6 uses device cpu
[2024-08-24 01:31:18,311][01629] Rollout worker 7 uses device cpu
[2024-08-24 01:31:18,645][01629] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-24 01:31:18,648][01629] InferenceWorker_p0-w0: min num requests: 2
[2024-08-24 01:31:18,693][01629] Starting all processes...
[2024-08-24 01:31:18,696][01629] Starting process learner_proc0
[2024-08-24 01:31:18,799][01629] Starting all processes...
[2024-08-24 01:31:18,898][01629] Starting process inference_proc0-0
[2024-08-24 01:31:18,899][01629] Starting process rollout_proc0
[2024-08-24 01:31:18,901][01629] Starting process rollout_proc1
[2024-08-24 01:31:18,901][01629] Starting process rollout_proc2
[2024-08-24 01:31:18,901][01629] Starting process rollout_proc3
[2024-08-24 01:31:18,901][01629] Starting process rollout_proc4
[2024-08-24 01:31:18,901][01629] Starting process rollout_proc5
[2024-08-24 01:31:18,901][01629] Starting process rollout_proc6
[2024-08-24 01:31:18,901][01629] Starting process rollout_proc7
[2024-08-24 01:31:30,650][04321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-24 01:31:30,652][04321] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-08-24 01:31:30,710][04321] Num visible devices: 1
[2024-08-24 01:31:30,717][04344] Worker 4 uses CPU cores [0]
[2024-08-24 01:31:30,718][04336] Worker 1 uses CPU cores [1]
[2024-08-24 01:31:30,739][04321] Starting seed is not provided
[2024-08-24 01:31:30,740][04321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-24 01:31:30,740][04321] Initializing actor-critic model on device cuda:0
[2024-08-24 01:31:30,741][04321] RunningMeanStd input shape: (3, 72, 128)
[2024-08-24 01:31:30,743][04321] RunningMeanStd input shape: (1,)
[2024-08-24 01:31:30,813][04321] ConvEncoder: input_channels=3
[2024-08-24 01:31:30,855][04338] Worker 3 uses CPU cores [1]
[2024-08-24 01:31:30,856][04335] Worker 0 uses CPU cores [0]
[2024-08-24 01:31:30,944][04343] Worker 5 uses CPU cores [1]
[2024-08-24 01:31:30,944][04337] Worker 2 uses CPU cores [0]
[2024-08-24 01:31:31,043][04345] Worker 6 uses CPU cores [0]
[2024-08-24 01:31:31,073][04334] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-24 01:31:31,074][04334] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-08-24 01:31:31,080][04346] Worker 7 uses CPU cores [1]
[2024-08-24 01:31:31,092][04334] Num visible devices: 1
[2024-08-24 01:31:31,170][04321] Conv encoder output size: 512
[2024-08-24 01:31:31,170][04321] Policy head output size: 512
[2024-08-24 01:31:31,185][04321] Created Actor Critic model with architecture:
[2024-08-24 01:31:31,185][04321] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-08-24 01:31:35,648][04321] Using optimizer
[2024-08-24 01:31:35,650][04321] No checkpoints found
[2024-08-24 01:31:35,650][04321] Did not load from checkpoint, starting from scratch!
[2024-08-24 01:31:35,650][04321] Initialized policy 0 weights for model version 0
[2024-08-24 01:31:35,655][04321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-24 01:31:35,662][04321] LearnerWorker_p0 finished initialization!
[2024-08-24 01:31:35,899][04334] RunningMeanStd input shape: (3, 72, 128)
[2024-08-24 01:31:35,901][04334] RunningMeanStd input shape: (1,)
[2024-08-24 01:31:35,913][04334] ConvEncoder: input_channels=3
[2024-08-24 01:31:36,010][04334] Conv encoder output size: 512
[2024-08-24 01:31:36,010][04334] Policy head output size: 512
[2024-08-24 01:31:37,370][01629] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-24 01:31:37,914][01629] Inference worker 0-0 is ready!
[2024-08-24 01:31:37,916][01629] All inference workers are ready! Signal rollout workers to start!
[2024-08-24 01:31:38,081][04338] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-24 01:31:38,095][04346] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-24 01:31:38,095][04343] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-24 01:31:38,115][04336] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-24 01:31:38,135][04335] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-24 01:31:38,140][04344] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-24 01:31:38,144][04345] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-24 01:31:38,139][04337] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-24 01:31:38,630][01629] Heartbeat connected on Batcher_0
[2024-08-24 01:31:38,635][01629] Heartbeat connected on LearnerWorker_p0
[2024-08-24 01:31:38,704][01629] Heartbeat connected on InferenceWorker_p0-w0
[2024-08-24 01:31:40,217][04344] Decorrelating experience for 0 frames...
[2024-08-24 01:31:40,218][04337] Decorrelating experience for 0 frames...
[2024-08-24 01:31:40,219][04338] Decorrelating experience for 0 frames...
[2024-08-24 01:31:40,219][04345] Decorrelating experience for 0 frames...
[2024-08-24 01:31:40,223][04346] Decorrelating experience for 0 frames...
[2024-08-24 01:31:40,225][04343] Decorrelating experience for 0 frames...
[2024-08-24 01:31:41,225][04338] Decorrelating experience for 32 frames...
[2024-08-24 01:31:41,227][04336] Decorrelating experience for 0 frames...
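For reference, below is a minimal PyTorch sketch of the actor-critic module tree printed above, assuming only the shapes the log reports (3x72x128 observations, a 512-dimensional encoder/core output, and 5 discrete actions). The convolution channel, kernel, and stride values are not printed in the log, so the ones used here are assumptions (Atari-style defaults), and the observation/returns normalizers are omitted; this is an illustration of the printed structure, not Sample Factory's actual implementation.

```python
import torch
import torch.nn as nn


class ActorCriticSketch(nn.Module):
    """Sketch of ActorCriticSharedWeights: conv encoder -> MLP -> GRU core -> heads."""

    def __init__(self, obs_shape=(3, 72, 128), hidden_size=512, num_actions=5):
        super().__init__()
        c, h, w = obs_shape
        # conv_head: three Conv2d + ELU blocks, mirroring the printed Sequential.
        # Channel/kernel/stride values are assumed; the log does not report them.
        self.conv_head = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # Infer the flattened conv output size with a dummy forward pass.
        with torch.no_grad():
            conv_out = self.conv_head(torch.zeros(1, c, h, w)).flatten(1).shape[1]
        # mlp_layers: Linear + ELU, giving the 512-dim "Conv encoder output size".
        self.mlp_layers = nn.Sequential(nn.Linear(conv_out, hidden_size), nn.ELU())
        # core: GRU(512, 512), as printed in ModelCoreRNN.
        self.core = nn.GRU(hidden_size, hidden_size)
        # Shared-weights heads: value (512 -> 1) and action logits (512 -> 5).
        self.critic_linear = nn.Linear(hidden_size, 1)
        self.distribution_linear = nn.Linear(hidden_size, num_actions)

    def forward(self, obs, rnn_state=None):
        # obs: (batch, 3, 72, 128); a single-step forward pass for illustration.
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # sequence length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state


if __name__ == "__main__":
    model = ActorCriticSketch()
    logits, value, state = model(torch.zeros(4, 3, 72, 128))
    print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])
```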
[2024-08-24 01:31:41,590][04345] Decorrelating experience for 32 frames... [2024-08-24 01:31:41,594][04344] Decorrelating experience for 32 frames... [2024-08-24 01:31:41,609][04337] Decorrelating experience for 32 frames... [2024-08-24 01:31:41,970][04336] Decorrelating experience for 32 frames... [2024-08-24 01:31:42,062][04338] Decorrelating experience for 64 frames... [2024-08-24 01:31:42,370][01629] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-08-24 01:31:42,523][04338] Decorrelating experience for 96 frames... [2024-08-24 01:31:42,610][01629] Heartbeat connected on RolloutWorker_w3 [2024-08-24 01:31:43,012][04345] Decorrelating experience for 64 frames... [2024-08-24 01:31:43,033][04337] Decorrelating experience for 64 frames... [2024-08-24 01:31:43,035][04344] Decorrelating experience for 64 frames... [2024-08-24 01:31:43,362][04335] Decorrelating experience for 0 frames... [2024-08-24 01:31:43,766][04336] Decorrelating experience for 64 frames... [2024-08-24 01:31:44,291][04343] Decorrelating experience for 32 frames... [2024-08-24 01:31:44,439][04345] Decorrelating experience for 96 frames... [2024-08-24 01:31:44,457][04344] Decorrelating experience for 96 frames... [2024-08-24 01:31:44,721][01629] Heartbeat connected on RolloutWorker_w6 [2024-08-24 01:31:44,731][01629] Heartbeat connected on RolloutWorker_w4 [2024-08-24 01:31:44,890][04336] Decorrelating experience for 96 frames... [2024-08-24 01:31:44,961][04337] Decorrelating experience for 96 frames... [2024-08-24 01:31:45,037][01629] Heartbeat connected on RolloutWorker_w1 [2024-08-24 01:31:45,130][01629] Heartbeat connected on RolloutWorker_w2 [2024-08-24 01:31:45,162][04346] Decorrelating experience for 32 frames... [2024-08-24 01:31:45,674][04335] Decorrelating experience for 32 frames... [2024-08-24 01:31:45,909][04346] Decorrelating experience for 64 frames... [2024-08-24 01:31:47,370][01629] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 98.0. Samples: 980. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-08-24 01:31:47,376][01629] Avg episode reward: [(0, '1.720')] [2024-08-24 01:31:48,636][04343] Decorrelating experience for 64 frames... [2024-08-24 01:31:48,745][04335] Decorrelating experience for 64 frames... [2024-08-24 01:31:49,218][04346] Decorrelating experience for 96 frames... [2024-08-24 01:31:49,533][04321] Signal inference workers to stop experience collection... [2024-08-24 01:31:49,551][04334] InferenceWorker_p0-w0: stopping experience collection [2024-08-24 01:31:49,609][01629] Heartbeat connected on RolloutWorker_w7 [2024-08-24 01:31:49,930][04343] Decorrelating experience for 96 frames... [2024-08-24 01:31:50,025][01629] Heartbeat connected on RolloutWorker_w5 [2024-08-24 01:31:50,120][04335] Decorrelating experience for 96 frames... [2024-08-24 01:31:50,178][01629] Heartbeat connected on RolloutWorker_w0 [2024-08-24 01:31:51,406][04321] Signal inference workers to resume experience collection... [2024-08-24 01:31:51,408][04334] InferenceWorker_p0-w0: resuming experience collection [2024-08-24 01:31:52,370][01629] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 167.9. Samples: 2518. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2024-08-24 01:31:52,377][01629] Avg episode reward: [(0, '2.839')] [2024-08-24 01:31:57,370][01629] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). 
Total num frames: 20480. Throughput: 0: 237.4. Samples: 4748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:31:57,376][01629] Avg episode reward: [(0, '3.522')] [2024-08-24 01:32:01,689][04334] Updated weights for policy 0, policy_version 10 (0.0579) [2024-08-24 01:32:02,370][01629] Fps is (10 sec: 3686.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 40960. Throughput: 0: 445.7. Samples: 11142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:32:02,378][01629] Avg episode reward: [(0, '4.168')] [2024-08-24 01:32:07,370][01629] Fps is (10 sec: 4505.6, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 463.2. Samples: 13896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:32:07,372][01629] Avg episode reward: [(0, '4.346')] [2024-08-24 01:32:12,370][01629] Fps is (10 sec: 3686.4, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 550.9. Samples: 19282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-24 01:32:12,372][01629] Avg episode reward: [(0, '4.444')] [2024-08-24 01:32:13,333][04334] Updated weights for policy 0, policy_version 20 (0.0023) [2024-08-24 01:32:17,370][01629] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 98304. Throughput: 0: 614.7. Samples: 24588. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-24 01:32:17,376][01629] Avg episode reward: [(0, '4.370')] [2024-08-24 01:32:22,370][01629] Fps is (10 sec: 4096.0, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 118784. Throughput: 0: 624.1. Samples: 28086. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-24 01:32:22,372][01629] Avg episode reward: [(0, '4.366')] [2024-08-24 01:32:22,380][04321] Saving new best policy, reward=4.366! [2024-08-24 01:32:22,644][04334] Updated weights for policy 0, policy_version 30 (0.0016) [2024-08-24 01:32:27,370][01629] Fps is (10 sec: 4096.0, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 764.9. Samples: 34422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:32:27,374][01629] Avg episode reward: [(0, '4.308')] [2024-08-24 01:32:32,370][01629] Fps is (10 sec: 3276.8, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 151552. Throughput: 0: 843.0. Samples: 38914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:32:32,375][01629] Avg episode reward: [(0, '4.335')] [2024-08-24 01:32:34,287][04334] Updated weights for policy 0, policy_version 40 (0.0031) [2024-08-24 01:32:37,372][01629] Fps is (10 sec: 3685.5, 60 sec: 2935.4, 300 sec: 2935.4). Total num frames: 176128. Throughput: 0: 879.9. Samples: 42114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:32:37,375][01629] Avg episode reward: [(0, '4.448')] [2024-08-24 01:32:37,380][04321] Saving new best policy, reward=4.448! [2024-08-24 01:32:42,374][01629] Fps is (10 sec: 4503.5, 60 sec: 3276.5, 300 sec: 3024.5). Total num frames: 196608. Throughput: 0: 983.3. Samples: 49002. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:32:42,380][01629] Avg episode reward: [(0, '4.404')] [2024-08-24 01:32:44,223][04334] Updated weights for policy 0, policy_version 50 (0.0019) [2024-08-24 01:32:47,371][01629] Fps is (10 sec: 3686.7, 60 sec: 3549.8, 300 sec: 3042.7). Total num frames: 212992. Throughput: 0: 944.9. Samples: 53664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:32:47,374][01629] Avg episode reward: [(0, '4.264')] [2024-08-24 01:32:52,370][01629] Fps is (10 sec: 3688.1, 60 sec: 3822.9, 300 sec: 3113.0). 
Total num frames: 233472. Throughput: 0: 940.9. Samples: 56238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:32:52,376][01629] Avg episode reward: [(0, '4.349')] [2024-08-24 01:32:55,163][04334] Updated weights for policy 0, policy_version 60 (0.0016) [2024-08-24 01:32:57,370][01629] Fps is (10 sec: 4096.6, 60 sec: 3891.2, 300 sec: 3174.4). Total num frames: 253952. Throughput: 0: 964.8. Samples: 62700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:32:57,374][01629] Avg episode reward: [(0, '4.369')] [2024-08-24 01:33:02,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3180.4). Total num frames: 270336. Throughput: 0: 966.5. Samples: 68080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:33:02,373][01629] Avg episode reward: [(0, '4.366')] [2024-08-24 01:33:07,138][04334] Updated weights for policy 0, policy_version 70 (0.0021) [2024-08-24 01:33:07,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3185.8). Total num frames: 286720. Throughput: 0: 936.0. Samples: 70204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:33:07,376][01629] Avg episode reward: [(0, '4.392')] [2024-08-24 01:33:12,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3233.7). Total num frames: 307200. Throughput: 0: 938.6. Samples: 76660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:33:12,373][01629] Avg episode reward: [(0, '4.424')] [2024-08-24 01:33:12,390][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth... [2024-08-24 01:33:16,146][04334] Updated weights for policy 0, policy_version 80 (0.0013) [2024-08-24 01:33:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 327680. Throughput: 0: 982.2. Samples: 83114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:33:17,379][01629] Avg episode reward: [(0, '4.321')] [2024-08-24 01:33:22,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 344064. Throughput: 0: 958.3. Samples: 85234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:33:22,373][01629] Avg episode reward: [(0, '4.301')] [2024-08-24 01:33:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3314.0). Total num frames: 364544. Throughput: 0: 927.0. Samples: 90714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:33:27,376][01629] Avg episode reward: [(0, '4.531')] [2024-08-24 01:33:27,383][04321] Saving new best policy, reward=4.531! [2024-08-24 01:33:27,894][04334] Updated weights for policy 0, policy_version 90 (0.0020) [2024-08-24 01:33:32,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 971.7. Samples: 97390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:33:32,373][01629] Avg episode reward: [(0, '4.386')] [2024-08-24 01:33:37,370][01629] Fps is (10 sec: 3686.1, 60 sec: 3754.8, 300 sec: 3345.0). Total num frames: 401408. Throughput: 0: 973.4. Samples: 100040. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:33:37,375][01629] Avg episode reward: [(0, '4.451')] [2024-08-24 01:33:39,346][04334] Updated weights for policy 0, policy_version 100 (0.0029) [2024-08-24 01:33:42,370][01629] Fps is (10 sec: 2867.2, 60 sec: 3686.7, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 923.9. Samples: 104274. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:33:42,373][01629] Avg episode reward: [(0, '4.654')] [2024-08-24 01:33:42,384][04321] Saving new best policy, reward=4.654! [2024-08-24 01:33:47,370][01629] Fps is (10 sec: 4096.3, 60 sec: 3823.0, 300 sec: 3402.8). Total num frames: 442368. Throughput: 0: 959.7. Samples: 111268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:33:47,371][01629] Avg episode reward: [(0, '4.638')] [2024-08-24 01:33:48,624][04334] Updated weights for policy 0, policy_version 110 (0.0022) [2024-08-24 01:33:52,376][01629] Fps is (10 sec: 4502.6, 60 sec: 3822.5, 300 sec: 3428.3). Total num frames: 462848. Throughput: 0: 987.4. Samples: 114644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:33:52,382][01629] Avg episode reward: [(0, '4.539')] [2024-08-24 01:33:57,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 945.0. Samples: 119186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:33:57,379][01629] Avg episode reward: [(0, '4.730')] [2024-08-24 01:33:57,383][04321] Saving new best policy, reward=4.730! [2024-08-24 01:34:00,447][04334] Updated weights for policy 0, policy_version 120 (0.0029) [2024-08-24 01:34:02,370][01629] Fps is (10 sec: 3688.8, 60 sec: 3822.9, 300 sec: 3446.3). Total num frames: 499712. Throughput: 0: 937.5. Samples: 125302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-08-24 01:34:02,377][01629] Avg episode reward: [(0, '4.742')] [2024-08-24 01:34:02,393][04321] Saving new best policy, reward=4.742! [2024-08-24 01:34:07,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3467.9). Total num frames: 520192. Throughput: 0: 965.0. Samples: 128658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:34:07,377][01629] Avg episode reward: [(0, '4.540')] [2024-08-24 01:34:10,002][04334] Updated weights for policy 0, policy_version 130 (0.0020) [2024-08-24 01:34:12,373][01629] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3461.7). Total num frames: 536576. Throughput: 0: 965.4. Samples: 134162. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:34:12,375][01629] Avg episode reward: [(0, '4.497')] [2024-08-24 01:34:17,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 557056. Throughput: 0: 930.2. Samples: 139250. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:34:17,376][01629] Avg episode reward: [(0, '4.497')] [2024-08-24 01:34:21,032][04334] Updated weights for policy 0, policy_version 140 (0.0023) [2024-08-24 01:34:22,370][01629] Fps is (10 sec: 4097.3, 60 sec: 3891.2, 300 sec: 3500.2). Total num frames: 577536. Throughput: 0: 948.1. Samples: 142706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:34:22,375][01629] Avg episode reward: [(0, '4.240')] [2024-08-24 01:34:27,371][01629] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3517.7). Total num frames: 598016. Throughput: 0: 1000.9. Samples: 149316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:34:27,381][01629] Avg episode reward: [(0, '4.273')] [2024-08-24 01:34:32,277][04334] Updated weights for policy 0, policy_version 150 (0.0030) [2024-08-24 01:34:32,370][01629] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3510.9). Total num frames: 614400. Throughput: 0: 943.1. Samples: 153708. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:34:32,372][01629] Avg episode reward: [(0, '4.349')] [2024-08-24 01:34:37,370][01629] Fps is (10 sec: 3686.9, 60 sec: 3891.2, 300 sec: 3527.1). Total num frames: 634880. Throughput: 0: 944.9. Samples: 157158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:34:37,376][01629] Avg episode reward: [(0, '4.586')] [2024-08-24 01:34:41,505][04334] Updated weights for policy 0, policy_version 160 (0.0026) [2024-08-24 01:34:42,371][01629] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 3542.5). Total num frames: 655360. Throughput: 0: 992.9. Samples: 163870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:34:42,377][01629] Avg episode reward: [(0, '4.538')] [2024-08-24 01:34:47,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3535.5). Total num frames: 671744. Throughput: 0: 964.8. Samples: 168716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:34:47,372][01629] Avg episode reward: [(0, '4.402')] [2024-08-24 01:34:52,370][01629] Fps is (10 sec: 3687.0, 60 sec: 3823.4, 300 sec: 3549.9). Total num frames: 692224. Throughput: 0: 946.6. Samples: 171254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:34:52,372][01629] Avg episode reward: [(0, '4.420')] [2024-08-24 01:34:52,858][04334] Updated weights for policy 0, policy_version 170 (0.0017) [2024-08-24 01:34:57,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 979.3. Samples: 178228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:34:57,372][01629] Avg episode reward: [(0, '4.436')] [2024-08-24 01:35:02,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 991.6. Samples: 183874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:35:02,372][01629] Avg episode reward: [(0, '4.512')] [2024-08-24 01:35:03,415][04334] Updated weights for policy 0, policy_version 180 (0.0017) [2024-08-24 01:35:07,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3569.4). Total num frames: 749568. Throughput: 0: 961.7. Samples: 185982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:35:07,372][01629] Avg episode reward: [(0, '4.680')] [2024-08-24 01:35:12,370][01629] Fps is (10 sec: 3686.3, 60 sec: 3891.4, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 956.6. Samples: 192360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:35:12,373][01629] Avg episode reward: [(0, '4.732')] [2024-08-24 01:35:12,381][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth... [2024-08-24 01:35:13,475][04334] Updated weights for policy 0, policy_version 190 (0.0015) [2024-08-24 01:35:17,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3611.9). Total num frames: 794624. Throughput: 0: 1009.9. Samples: 199154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:35:17,375][01629] Avg episode reward: [(0, '4.733')] [2024-08-24 01:35:22,370][01629] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3586.3). Total num frames: 806912. Throughput: 0: 977.6. Samples: 201150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:35:22,374][01629] Avg episode reward: [(0, '4.710')] [2024-08-24 01:35:25,032][04334] Updated weights for policy 0, policy_version 200 (0.0017) [2024-08-24 01:35:27,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3597.4). Total num frames: 827392. Throughput: 0: 947.2. Samples: 206494. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:35:27,372][01629] Avg episode reward: [(0, '4.535')] [2024-08-24 01:35:32,372][01629] Fps is (10 sec: 4504.4, 60 sec: 3959.3, 300 sec: 3625.4). Total num frames: 851968. Throughput: 0: 994.9. Samples: 213490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:35:32,375][01629] Avg episode reward: [(0, '4.627')] [2024-08-24 01:35:33,970][04334] Updated weights for policy 0, policy_version 210 (0.0034) [2024-08-24 01:35:37,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3618.1). Total num frames: 868352. Throughput: 0: 1005.4. Samples: 216498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:35:37,372][01629] Avg episode reward: [(0, '4.654')] [2024-08-24 01:35:42,370][01629] Fps is (10 sec: 3277.6, 60 sec: 3823.0, 300 sec: 3611.2). Total num frames: 884736. Throughput: 0: 942.5. Samples: 220642. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:35:42,377][01629] Avg episode reward: [(0, '4.802')] [2024-08-24 01:35:42,391][04321] Saving new best policy, reward=4.802! [2024-08-24 01:35:45,607][04334] Updated weights for policy 0, policy_version 220 (0.0019) [2024-08-24 01:35:47,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3637.2). Total num frames: 909312. Throughput: 0: 970.6. Samples: 227552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:35:47,377][01629] Avg episode reward: [(0, '4.672')] [2024-08-24 01:35:52,370][01629] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3646.2). Total num frames: 929792. Throughput: 0: 1000.9. Samples: 231024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:35:52,376][01629] Avg episode reward: [(0, '4.549')] [2024-08-24 01:35:56,337][04334] Updated weights for policy 0, policy_version 230 (0.0019) [2024-08-24 01:35:57,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3623.4). Total num frames: 942080. Throughput: 0: 967.2. Samples: 235882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:35:57,374][01629] Avg episode reward: [(0, '4.552')] [2024-08-24 01:36:02,370][01629] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3632.3). Total num frames: 962560. Throughput: 0: 942.9. Samples: 241584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:36:02,376][01629] Avg episode reward: [(0, '4.498')] [2024-08-24 01:36:06,176][04334] Updated weights for policy 0, policy_version 240 (0.0026) [2024-08-24 01:36:07,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3656.1). Total num frames: 987136. Throughput: 0: 974.2. Samples: 244988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:36:07,373][01629] Avg episode reward: [(0, '4.456')] [2024-08-24 01:36:12,373][01629] Fps is (10 sec: 4094.5, 60 sec: 3891.0, 300 sec: 3649.1). Total num frames: 1003520. Throughput: 0: 982.6. Samples: 250716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:36:12,380][01629] Avg episode reward: [(0, '4.610')] [2024-08-24 01:36:17,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3642.5). Total num frames: 1019904. Throughput: 0: 936.5. Samples: 255632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:36:17,372][01629] Avg episode reward: [(0, '4.684')] [2024-08-24 01:36:17,920][04334] Updated weights for policy 0, policy_version 250 (0.0018) [2024-08-24 01:36:22,370][01629] Fps is (10 sec: 4097.5, 60 sec: 3959.5, 300 sec: 3664.8). Total num frames: 1044480. Throughput: 0: 946.0. Samples: 259070. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:36:22,373][01629] Avg episode reward: [(0, '4.533')] [2024-08-24 01:36:27,370][01629] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3658.1). Total num frames: 1060864. Throughput: 0: 1002.3. Samples: 265748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:36:27,374][01629] Avg episode reward: [(0, '4.635')] [2024-08-24 01:36:27,414][04334] Updated weights for policy 0, policy_version 260 (0.0026) [2024-08-24 01:36:32,370][01629] Fps is (10 sec: 3276.7, 60 sec: 3754.8, 300 sec: 3651.7). Total num frames: 1077248. Throughput: 0: 944.1. Samples: 270038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:36:32,376][01629] Avg episode reward: [(0, '4.881')] [2024-08-24 01:36:32,397][04321] Saving new best policy, reward=4.881! [2024-08-24 01:36:37,370][01629] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 935.9. Samples: 273138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:36:37,372][01629] Avg episode reward: [(0, '4.890')] [2024-08-24 01:36:37,379][04321] Saving new best policy, reward=4.890! [2024-08-24 01:36:38,529][04334] Updated weights for policy 0, policy_version 270 (0.0024) [2024-08-24 01:36:42,370][01629] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 976.0. Samples: 279804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:36:42,372][01629] Avg episode reward: [(0, '4.768')] [2024-08-24 01:36:47,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1134592. Throughput: 0: 959.1. Samples: 284744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:36:47,371][01629] Avg episode reward: [(0, '4.821')] [2024-08-24 01:36:50,218][04334] Updated weights for policy 0, policy_version 280 (0.0015) [2024-08-24 01:36:52,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1155072. Throughput: 0: 932.0. Samples: 286928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:36:52,373][01629] Avg episode reward: [(0, '4.875')] [2024-08-24 01:36:57,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1179648. Throughput: 0: 960.3. Samples: 293924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:36:57,372][01629] Avg episode reward: [(0, '5.002')] [2024-08-24 01:36:57,377][04321] Saving new best policy, reward=5.002! [2024-08-24 01:36:59,154][04334] Updated weights for policy 0, policy_version 290 (0.0018) [2024-08-24 01:37:02,370][01629] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1196032. Throughput: 0: 986.4. Samples: 300022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:37:02,376][01629] Avg episode reward: [(0, '5.061')] [2024-08-24 01:37:02,399][04321] Saving new best policy, reward=5.061! [2024-08-24 01:37:07,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1212416. Throughput: 0: 953.6. Samples: 301982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:37:07,375][01629] Avg episode reward: [(0, '5.352')] [2024-08-24 01:37:07,379][04321] Saving new best policy, reward=5.352! [2024-08-24 01:37:10,935][04334] Updated weights for policy 0, policy_version 300 (0.0026) [2024-08-24 01:37:12,370][01629] Fps is (10 sec: 3686.6, 60 sec: 3823.2, 300 sec: 3846.1). Total num frames: 1232896. Throughput: 0: 932.7. Samples: 307718. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:37:12,372][01629] Avg episode reward: [(0, '5.553')] [2024-08-24 01:37:12,382][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000301_1232896.pth... [2024-08-24 01:37:12,523][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth [2024-08-24 01:37:12,539][04321] Saving new best policy, reward=5.553! [2024-08-24 01:37:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1253376. Throughput: 0: 989.7. Samples: 314572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:37:17,373][01629] Avg episode reward: [(0, '5.477')] [2024-08-24 01:37:21,704][04334] Updated weights for policy 0, policy_version 310 (0.0017) [2024-08-24 01:37:22,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1269760. Throughput: 0: 969.4. Samples: 316762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:37:22,375][01629] Avg episode reward: [(0, '5.263')] [2024-08-24 01:37:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 1290240. Throughput: 0: 929.9. Samples: 321648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:37:27,372][01629] Avg episode reward: [(0, '5.089')] [2024-08-24 01:37:31,746][04334] Updated weights for policy 0, policy_version 320 (0.0012) [2024-08-24 01:37:32,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1310720. Throughput: 0: 975.3. Samples: 328632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:37:32,372][01629] Avg episode reward: [(0, '4.898')] [2024-08-24 01:37:37,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1331200. Throughput: 0: 1002.4. Samples: 332034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:37:37,374][01629] Avg episode reward: [(0, '5.093')] [2024-08-24 01:37:42,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1343488. Throughput: 0: 943.5. Samples: 336382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:37:42,376][01629] Avg episode reward: [(0, '5.396')] [2024-08-24 01:37:43,244][04334] Updated weights for policy 0, policy_version 330 (0.0017) [2024-08-24 01:37:47,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1368064. Throughput: 0: 950.8. Samples: 342808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:37:47,375][01629] Avg episode reward: [(0, '5.518')] [2024-08-24 01:37:52,234][04334] Updated weights for policy 0, policy_version 340 (0.0012) [2024-08-24 01:37:52,370][01629] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1392640. Throughput: 0: 984.8. Samples: 346296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-08-24 01:37:52,373][01629] Avg episode reward: [(0, '5.673')] [2024-08-24 01:37:52,389][04321] Saving new best policy, reward=5.673! [2024-08-24 01:37:57,370][01629] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 1404928. Throughput: 0: 970.6. Samples: 351394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:37:57,372][01629] Avg episode reward: [(0, '5.618')] [2024-08-24 01:38:02,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 1425408. Throughput: 0: 945.2. Samples: 357106. 
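The learner's checkpoint handling visible above (saving checkpoint_000000301_1232896.pth, removing the older checkpoint_000000076_311296.pth, and writing a new best policy whenever the average episode reward improves) follows a simple rotation-plus-best pattern. A small sketch of that pattern is shown below; the function and file names are illustrative assumptions, not Sample Factory's internals.

```python
import os
import torch


def save_checkpoint(model, train_dir, policy_version, env_frames,
                    avg_reward, best_reward, keep_last=2):
    """Save a rolling checkpoint, prune old ones, and track the best policy by reward."""
    ckpt_dir = os.path.join(train_dir, "checkpoint_p0")
    os.makedirs(ckpt_dir, exist_ok=True)

    # Name by policy version and total env frames, e.g. checkpoint_000000301_1232896.pth
    name = f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    torch.save(model.state_dict(), os.path.join(ckpt_dir, name))

    # Keep only the newest `keep_last` rolling checkpoints (the log above keeps two).
    rolling = sorted(f for f in os.listdir(ckpt_dir) if f.startswith("checkpoint_"))
    for old in rolling[:-keep_last]:
        os.remove(os.path.join(ckpt_dir, old))

    # Separately snapshot the best policy whenever the average episode reward improves,
    # mirroring the "Saving new best policy, reward=..." messages.
    if avg_reward > best_reward:
        torch.save(model.state_dict(), os.path.join(ckpt_dir, "best_policy.pth"))
        best_reward = avg_reward
    return best_reward
```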
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:38:02,373][01629] Avg episode reward: [(0, '5.354')] [2024-08-24 01:38:03,544][04334] Updated weights for policy 0, policy_version 350 (0.0016) [2024-08-24 01:38:07,370][01629] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1449984. Throughput: 0: 973.4. Samples: 360566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:38:07,372][01629] Avg episode reward: [(0, '5.206')] [2024-08-24 01:38:12,370][01629] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1466368. Throughput: 0: 998.2. Samples: 366566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:38:12,377][01629] Avg episode reward: [(0, '5.318')] [2024-08-24 01:38:14,900][04334] Updated weights for policy 0, policy_version 360 (0.0021) [2024-08-24 01:38:17,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1482752. Throughput: 0: 942.3. Samples: 371036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:38:17,372][01629] Avg episode reward: [(0, '5.284')] [2024-08-24 01:38:22,370][01629] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1503232. Throughput: 0: 942.1. Samples: 374430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:38:22,376][01629] Avg episode reward: [(0, '5.417')] [2024-08-24 01:38:24,390][04334] Updated weights for policy 0, policy_version 370 (0.0022) [2024-08-24 01:38:27,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1527808. Throughput: 0: 1001.5. Samples: 381448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:38:27,376][01629] Avg episode reward: [(0, '5.406')] [2024-08-24 01:38:32,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1540096. Throughput: 0: 956.0. Samples: 385826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:38:32,374][01629] Avg episode reward: [(0, '5.330')] [2024-08-24 01:38:35,914][04334] Updated weights for policy 0, policy_version 380 (0.0032) [2024-08-24 01:38:37,371][01629] Fps is (10 sec: 3276.5, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1560576. Throughput: 0: 941.9. Samples: 388682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:38:37,377][01629] Avg episode reward: [(0, '5.687')] [2024-08-24 01:38:37,379][04321] Saving new best policy, reward=5.687! [2024-08-24 01:38:42,370][01629] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1585152. Throughput: 0: 976.3. Samples: 395326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:38:42,372][01629] Avg episode reward: [(0, '5.991')] [2024-08-24 01:38:42,382][04321] Saving new best policy, reward=5.991! [2024-08-24 01:38:46,009][04334] Updated weights for policy 0, policy_version 390 (0.0049) [2024-08-24 01:38:47,370][01629] Fps is (10 sec: 3686.7, 60 sec: 3822.9, 300 sec: 3846.2). Total num frames: 1597440. Throughput: 0: 962.4. Samples: 400412. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:38:47,386][01629] Avg episode reward: [(0, '6.713')] [2024-08-24 01:38:47,397][04321] Saving new best policy, reward=6.713! [2024-08-24 01:38:52,370][01629] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 1613824. Throughput: 0: 929.4. Samples: 402390. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:38:52,374][01629] Avg episode reward: [(0, '6.648')] [2024-08-24 01:38:57,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1634304. Throughput: 0: 934.0. Samples: 408596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:38:57,372][01629] Avg episode reward: [(0, '7.200')] [2024-08-24 01:38:57,380][04321] Saving new best policy, reward=7.200! [2024-08-24 01:38:57,383][04334] Updated weights for policy 0, policy_version 400 (0.0024) [2024-08-24 01:39:02,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1654784. Throughput: 0: 969.3. Samples: 414656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:39:02,375][01629] Avg episode reward: [(0, '7.105')] [2024-08-24 01:39:07,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 1667072. Throughput: 0: 937.9. Samples: 416636. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:39:07,373][01629] Avg episode reward: [(0, '7.217')] [2024-08-24 01:39:07,385][04321] Saving new best policy, reward=7.217! [2024-08-24 01:39:09,546][04334] Updated weights for policy 0, policy_version 410 (0.0025) [2024-08-24 01:39:12,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1687552. Throughput: 0: 895.8. Samples: 421758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:39:12,377][01629] Avg episode reward: [(0, '7.441')] [2024-08-24 01:39:12,389][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth... [2024-08-24 01:39:12,524][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth [2024-08-24 01:39:12,543][04321] Saving new best policy, reward=7.441! [2024-08-24 01:39:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1708032. Throughput: 0: 940.3. Samples: 428138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:39:17,375][01629] Avg episode reward: [(0, '7.945')] [2024-08-24 01:39:17,466][04321] Saving new best policy, reward=7.945! [2024-08-24 01:39:20,268][04334] Updated weights for policy 0, policy_version 420 (0.0018) [2024-08-24 01:39:22,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 1724416. Throughput: 0: 930.8. Samples: 430568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:39:22,373][01629] Avg episode reward: [(0, '7.952')] [2024-08-24 01:39:22,389][04321] Saving new best policy, reward=7.952! [2024-08-24 01:39:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 1744896. Throughput: 0: 885.7. Samples: 435182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-08-24 01:39:27,371][01629] Avg episode reward: [(0, '8.030')] [2024-08-24 01:39:27,377][04321] Saving new best policy, reward=8.030! [2024-08-24 01:39:30,799][04334] Updated weights for policy 0, policy_version 430 (0.0024) [2024-08-24 01:39:32,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1765376. Throughput: 0: 928.3. Samples: 442184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:39:32,373][01629] Avg episode reward: [(0, '7.958')] [2024-08-24 01:39:37,372][01629] Fps is (10 sec: 4094.9, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 1785856. Throughput: 0: 960.1. Samples: 445596. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:39:37,379][01629] Avg episode reward: [(0, '7.767')] [2024-08-24 01:39:42,174][04334] Updated weights for policy 0, policy_version 440 (0.0023) [2024-08-24 01:39:42,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 1802240. Throughput: 0: 921.9. Samples: 450082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:39:42,381][01629] Avg episode reward: [(0, '8.204')] [2024-08-24 01:39:42,392][04321] Saving new best policy, reward=8.204! [2024-08-24 01:39:47,370][01629] Fps is (10 sec: 3687.2, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 1822720. Throughput: 0: 925.1. Samples: 456286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:39:47,378][01629] Avg episode reward: [(0, '9.129')] [2024-08-24 01:39:47,380][04321] Saving new best policy, reward=9.129! [2024-08-24 01:39:51,492][04334] Updated weights for policy 0, policy_version 450 (0.0019) [2024-08-24 01:39:52,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1847296. Throughput: 0: 954.4. Samples: 459582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:39:52,374][01629] Avg episode reward: [(0, '9.151')] [2024-08-24 01:39:52,384][04321] Saving new best policy, reward=9.151! [2024-08-24 01:39:57,370][01629] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1859584. Throughput: 0: 956.3. Samples: 464790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:39:57,375][01629] Avg episode reward: [(0, '9.328')] [2024-08-24 01:39:57,380][04321] Saving new best policy, reward=9.328! [2024-08-24 01:40:02,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1880064. Throughput: 0: 934.6. Samples: 470196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:40:02,377][01629] Avg episode reward: [(0, '8.641')] [2024-08-24 01:40:03,137][04334] Updated weights for policy 0, policy_version 460 (0.0016) [2024-08-24 01:40:07,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1900544. Throughput: 0: 959.0. Samples: 473722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:40:07,373][01629] Avg episode reward: [(0, '10.396')] [2024-08-24 01:40:07,384][04321] Saving new best policy, reward=10.396! [2024-08-24 01:40:12,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1921024. Throughput: 0: 992.5. Samples: 479846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:40:12,374][01629] Avg episode reward: [(0, '11.248')] [2024-08-24 01:40:12,382][04321] Saving new best policy, reward=11.248! [2024-08-24 01:40:13,975][04334] Updated weights for policy 0, policy_version 470 (0.0032) [2024-08-24 01:40:17,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1933312. Throughput: 0: 933.0. Samples: 484170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:40:17,372][01629] Avg episode reward: [(0, '10.857')] [2024-08-24 01:40:22,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1957888. Throughput: 0: 932.3. Samples: 487546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:40:22,376][01629] Avg episode reward: [(0, '11.267')] [2024-08-24 01:40:22,384][04321] Saving new best policy, reward=11.267! 
[2024-08-24 01:40:23,828][04334] Updated weights for policy 0, policy_version 480 (0.0027) [2024-08-24 01:40:27,374][01629] Fps is (10 sec: 4503.5, 60 sec: 3890.9, 300 sec: 3818.3). Total num frames: 1978368. Throughput: 0: 987.7. Samples: 494534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:40:27,377][01629] Avg episode reward: [(0, '11.571')] [2024-08-24 01:40:27,379][04321] Saving new best policy, reward=11.571! [2024-08-24 01:40:32,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1994752. Throughput: 0: 951.6. Samples: 499108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:40:32,372][01629] Avg episode reward: [(0, '12.096')] [2024-08-24 01:40:32,390][04321] Saving new best policy, reward=12.096! [2024-08-24 01:40:35,563][04334] Updated weights for policy 0, policy_version 490 (0.0018) [2024-08-24 01:40:37,370][01629] Fps is (10 sec: 3688.1, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 2015232. Throughput: 0: 936.3. Samples: 501714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:40:37,378][01629] Avg episode reward: [(0, '12.620')] [2024-08-24 01:40:37,382][04321] Saving new best policy, reward=12.620! [2024-08-24 01:40:42,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2035712. Throughput: 0: 968.4. Samples: 508366. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:40:42,372][01629] Avg episode reward: [(0, '14.155')] [2024-08-24 01:40:42,384][04321] Saving new best policy, reward=14.155! [2024-08-24 01:40:44,934][04334] Updated weights for policy 0, policy_version 500 (0.0015) [2024-08-24 01:40:47,370][01629] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2052096. Throughput: 0: 970.7. Samples: 513880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:40:47,376][01629] Avg episode reward: [(0, '14.133')] [2024-08-24 01:40:52,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 2068480. Throughput: 0: 940.9. Samples: 516064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:40:52,372][01629] Avg episode reward: [(0, '13.897')] [2024-08-24 01:40:56,108][04334] Updated weights for policy 0, policy_version 510 (0.0014) [2024-08-24 01:40:57,370][01629] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2093056. Throughput: 0: 951.0. Samples: 522640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:40:57,375][01629] Avg episode reward: [(0, '13.416')] [2024-08-24 01:41:02,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2113536. Throughput: 0: 998.0. Samples: 529082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:41:02,373][01629] Avg episode reward: [(0, '13.071')] [2024-08-24 01:41:07,103][04334] Updated weights for policy 0, policy_version 520 (0.0036) [2024-08-24 01:41:07,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.4). Total num frames: 2129920. Throughput: 0: 971.2. Samples: 531248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:41:07,375][01629] Avg episode reward: [(0, '14.081')] [2024-08-24 01:41:12,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2150400. Throughput: 0: 940.8. Samples: 536866. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:41:12,372][01629] Avg episode reward: [(0, '14.390')] [2024-08-24 01:41:12,381][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth... [2024-08-24 01:41:12,504][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000301_1232896.pth [2024-08-24 01:41:12,517][04321] Saving new best policy, reward=14.390! [2024-08-24 01:41:16,590][04334] Updated weights for policy 0, policy_version 530 (0.0016) [2024-08-24 01:41:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2170880. Throughput: 0: 989.0. Samples: 543614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:41:17,377][01629] Avg episode reward: [(0, '15.104')] [2024-08-24 01:41:17,379][04321] Saving new best policy, reward=15.104! [2024-08-24 01:41:22,375][01629] Fps is (10 sec: 3684.6, 60 sec: 3822.6, 300 sec: 3818.3). Total num frames: 2187264. Throughput: 0: 987.8. Samples: 546168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:41:22,377][01629] Avg episode reward: [(0, '15.604')] [2024-08-24 01:41:22,387][04321] Saving new best policy, reward=15.604! [2024-08-24 01:41:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3832.2). Total num frames: 2207744. Throughput: 0: 941.6. Samples: 550738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:41:27,372][01629] Avg episode reward: [(0, '15.122')] [2024-08-24 01:41:28,253][04334] Updated weights for policy 0, policy_version 540 (0.0015) [2024-08-24 01:41:32,370][01629] Fps is (10 sec: 4098.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2228224. Throughput: 0: 977.1. Samples: 557848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:41:32,372][01629] Avg episode reward: [(0, '15.402')] [2024-08-24 01:41:37,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2248704. Throughput: 0: 1005.1. Samples: 561294. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:41:37,376][01629] Avg episode reward: [(0, '14.712')] [2024-08-24 01:41:38,368][04334] Updated weights for policy 0, policy_version 550 (0.0023) [2024-08-24 01:41:42,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2260992. Throughput: 0: 955.0. Samples: 565614. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:41:42,375][01629] Avg episode reward: [(0, '14.644')] [2024-08-24 01:41:47,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2285568. Throughput: 0: 952.0. Samples: 571920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:41:47,375][01629] Avg episode reward: [(0, '13.140')] [2024-08-24 01:41:48,756][04334] Updated weights for policy 0, policy_version 560 (0.0016) [2024-08-24 01:41:52,370][01629] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 2310144. Throughput: 0: 981.4. Samples: 575412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:41:52,376][01629] Avg episode reward: [(0, '13.813')] [2024-08-24 01:41:57,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2322432. Throughput: 0: 980.0. Samples: 580964. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:41:57,374][01629] Avg episode reward: [(0, '13.839')] [2024-08-24 01:42:00,314][04334] Updated weights for policy 0, policy_version 570 (0.0027) [2024-08-24 01:42:02,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2342912. Throughput: 0: 947.0. Samples: 586228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:42:02,372][01629] Avg episode reward: [(0, '14.644')] [2024-08-24 01:42:07,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2367488. Throughput: 0: 969.7. Samples: 589798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:42:07,377][01629] Avg episode reward: [(0, '16.281')] [2024-08-24 01:42:07,379][04321] Saving new best policy, reward=16.281! [2024-08-24 01:42:09,033][04334] Updated weights for policy 0, policy_version 580 (0.0021) [2024-08-24 01:42:12,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2383872. Throughput: 0: 1011.7. Samples: 596264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:42:12,376][01629] Avg episode reward: [(0, '15.363')] [2024-08-24 01:42:17,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2400256. Throughput: 0: 948.8. Samples: 600542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:42:17,375][01629] Avg episode reward: [(0, '15.659')] [2024-08-24 01:42:20,677][04334] Updated weights for policy 0, policy_version 590 (0.0051) [2024-08-24 01:42:22,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3846.1). Total num frames: 2424832. Throughput: 0: 945.6. Samples: 603848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:42:22,377][01629] Avg episode reward: [(0, '16.049')] [2024-08-24 01:42:27,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2445312. Throughput: 0: 1007.1. Samples: 610934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:42:27,375][01629] Avg episode reward: [(0, '17.026')] [2024-08-24 01:42:27,377][04321] Saving new best policy, reward=17.026! [2024-08-24 01:42:30,493][04334] Updated weights for policy 0, policy_version 600 (0.0024) [2024-08-24 01:42:32,370][01629] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2461696. Throughput: 0: 975.9. Samples: 615836. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:42:32,379][01629] Avg episode reward: [(0, '17.241')] [2024-08-24 01:42:32,398][04321] Saving new best policy, reward=17.241! [2024-08-24 01:42:37,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 2478080. Throughput: 0: 948.2. Samples: 618080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:42:37,375][01629] Avg episode reward: [(0, '18.771')] [2024-08-24 01:42:37,413][04321] Saving new best policy, reward=18.771! [2024-08-24 01:42:41,312][04334] Updated weights for policy 0, policy_version 610 (0.0040) [2024-08-24 01:42:42,370][01629] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2502656. Throughput: 0: 976.3. Samples: 624898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:42:42,377][01629] Avg episode reward: [(0, '18.666')] [2024-08-24 01:42:47,372][01629] Fps is (10 sec: 4095.2, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 2519040. Throughput: 0: 990.6. Samples: 630808. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:42:47,378][01629] Avg episode reward: [(0, '18.173')] [2024-08-24 01:42:52,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2535424. Throughput: 0: 958.9. Samples: 632948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:42:52,377][01629] Avg episode reward: [(0, '18.087')] [2024-08-24 01:42:52,666][04334] Updated weights for policy 0, policy_version 620 (0.0037) [2024-08-24 01:42:57,370][01629] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2560000. Throughput: 0: 953.7. Samples: 639182. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:42:57,375][01629] Avg episode reward: [(0, '18.490')] [2024-08-24 01:43:01,564][04334] Updated weights for policy 0, policy_version 630 (0.0029) [2024-08-24 01:43:02,374][01629] Fps is (10 sec: 4503.9, 60 sec: 3959.2, 300 sec: 3832.1). Total num frames: 2580480. Throughput: 0: 1010.5. Samples: 646020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:43:02,376][01629] Avg episode reward: [(0, '18.278')] [2024-08-24 01:43:07,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2596864. Throughput: 0: 985.3. Samples: 648188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:43:07,374][01629] Avg episode reward: [(0, '17.847')] [2024-08-24 01:43:12,370][01629] Fps is (10 sec: 3687.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2617344. Throughput: 0: 949.1. Samples: 653644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:43:12,375][01629] Avg episode reward: [(0, '17.444')] [2024-08-24 01:43:12,392][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000639_2617344.pth... [2024-08-24 01:43:12,560][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_1687552.pth [2024-08-24 01:43:13,221][04334] Updated weights for policy 0, policy_version 640 (0.0023) [2024-08-24 01:43:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2637824. Throughput: 0: 984.6. Samples: 660144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:43:17,374][01629] Avg episode reward: [(0, '16.446')] [2024-08-24 01:43:22,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2654208. Throughput: 0: 1001.6. Samples: 663152. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:43:22,372][01629] Avg episode reward: [(0, '16.107')] [2024-08-24 01:43:24,108][04334] Updated weights for policy 0, policy_version 650 (0.0031) [2024-08-24 01:43:27,372][01629] Fps is (10 sec: 3276.1, 60 sec: 3754.5, 300 sec: 3832.2). Total num frames: 2670592. Throughput: 0: 946.7. Samples: 667502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:43:27,376][01629] Avg episode reward: [(0, '18.814')] [2024-08-24 01:43:27,463][04321] Saving new best policy, reward=18.814! [2024-08-24 01:43:32,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2695168. Throughput: 0: 965.6. Samples: 674260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:43:32,375][01629] Avg episode reward: [(0, '18.572')] [2024-08-24 01:43:33,989][04334] Updated weights for policy 0, policy_version 660 (0.0023) [2024-08-24 01:43:37,370][01629] Fps is (10 sec: 4506.5, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2715648. Throughput: 0: 989.8. Samples: 677490. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:43:37,377][01629] Avg episode reward: [(0, '19.633')] [2024-08-24 01:43:37,382][04321] Saving new best policy, reward=19.633! [2024-08-24 01:43:42,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 2727936. Throughput: 0: 954.0. Samples: 682112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:43:42,374][01629] Avg episode reward: [(0, '19.635')] [2024-08-24 01:43:42,385][04321] Saving new best policy, reward=19.635! [2024-08-24 01:43:45,724][04334] Updated weights for policy 0, policy_version 670 (0.0014) [2024-08-24 01:43:47,371][01629] Fps is (10 sec: 3276.5, 60 sec: 3823.0, 300 sec: 3846.1). Total num frames: 2748416. Throughput: 0: 933.8. Samples: 688038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:43:47,377][01629] Avg episode reward: [(0, '20.706')] [2024-08-24 01:43:47,379][04321] Saving new best policy, reward=20.706! [2024-08-24 01:43:52,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2772992. Throughput: 0: 961.1. Samples: 691436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:43:52,372][01629] Avg episode reward: [(0, '20.105')] [2024-08-24 01:43:55,437][04334] Updated weights for policy 0, policy_version 680 (0.0023) [2024-08-24 01:43:57,373][01629] Fps is (10 sec: 4094.9, 60 sec: 3822.7, 300 sec: 3846.0). Total num frames: 2789376. Throughput: 0: 964.3. Samples: 697040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:43:57,376][01629] Avg episode reward: [(0, '20.618')] [2024-08-24 01:44:02,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3860.0). Total num frames: 2805760. Throughput: 0: 935.9. Samples: 702260. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-24 01:44:02,372][01629] Avg episode reward: [(0, '20.536')] [2024-08-24 01:44:05,997][04334] Updated weights for policy 0, policy_version 690 (0.0021) [2024-08-24 01:44:07,370][01629] Fps is (10 sec: 4097.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2830336. Throughput: 0: 948.5. Samples: 705834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:44:07,376][01629] Avg episode reward: [(0, '21.284')] [2024-08-24 01:44:07,378][04321] Saving new best policy, reward=21.284! [2024-08-24 01:44:12,370][01629] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2850816. Throughput: 0: 996.1. Samples: 712326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:44:12,380][01629] Avg episode reward: [(0, '19.942')] [2024-08-24 01:44:17,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2863104. Throughput: 0: 940.8. Samples: 716594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:44:17,373][01629] Avg episode reward: [(0, '20.186')] [2024-08-24 01:44:17,808][04334] Updated weights for policy 0, policy_version 700 (0.0012) [2024-08-24 01:44:22,370][01629] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2887680. Throughput: 0: 941.4. Samples: 719854. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-24 01:44:22,377][01629] Avg episode reward: [(0, '21.108')] [2024-08-24 01:44:26,451][04334] Updated weights for policy 0, policy_version 710 (0.0019) [2024-08-24 01:44:27,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 3873.8). Total num frames: 2908160. Throughput: 0: 995.7. Samples: 726920. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:44:27,375][01629] Avg episode reward: [(0, '21.197')] [2024-08-24 01:44:32,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2924544. Throughput: 0: 973.3. Samples: 731836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:44:32,376][01629] Avg episode reward: [(0, '22.493')] [2024-08-24 01:44:32,387][04321] Saving new best policy, reward=22.493! [2024-08-24 01:44:37,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2945024. Throughput: 0: 945.6. Samples: 733986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:44:37,371][01629] Avg episode reward: [(0, '22.395')] [2024-08-24 01:44:38,347][04334] Updated weights for policy 0, policy_version 720 (0.0014) [2024-08-24 01:44:42,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2965504. Throughput: 0: 974.7. Samples: 740900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:44:42,372][01629] Avg episode reward: [(0, '22.326')] [2024-08-24 01:44:47,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2985984. Throughput: 0: 991.3. Samples: 746870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:44:47,373][01629] Avg episode reward: [(0, '21.614')] [2024-08-24 01:44:48,765][04334] Updated weights for policy 0, policy_version 730 (0.0028) [2024-08-24 01:44:52,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2998272. Throughput: 0: 958.8. Samples: 748978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:44:52,378][01629] Avg episode reward: [(0, '18.432')] [2024-08-24 01:44:57,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 3022848. Throughput: 0: 951.9. Samples: 755162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:44:57,375][01629] Avg episode reward: [(0, '16.501')] [2024-08-24 01:44:58,792][04334] Updated weights for policy 0, policy_version 740 (0.0031) [2024-08-24 01:45:02,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3043328. Throughput: 0: 1008.6. Samples: 761980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:45:02,372][01629] Avg episode reward: [(0, '16.255')] [2024-08-24 01:45:07,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3059712. Throughput: 0: 984.5. Samples: 764158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:45:07,372][01629] Avg episode reward: [(0, '16.637')] [2024-08-24 01:45:10,422][04334] Updated weights for policy 0, policy_version 750 (0.0037) [2024-08-24 01:45:12,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3080192. Throughput: 0: 940.0. Samples: 769222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:45:12,372][01629] Avg episode reward: [(0, '17.461')] [2024-08-24 01:45:12,381][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000752_3080192.pth... [2024-08-24 01:45:12,564][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth [2024-08-24 01:45:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3100672. Throughput: 0: 977.4. Samples: 775818. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:45:17,376][01629] Avg episode reward: [(0, '19.206')] [2024-08-24 01:45:19,601][04334] Updated weights for policy 0, policy_version 760 (0.0013) [2024-08-24 01:45:22,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3117056. Throughput: 0: 998.8. Samples: 778934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:45:22,372][01629] Avg episode reward: [(0, '21.320')] [2024-08-24 01:45:27,371][01629] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3859.9). Total num frames: 3133440. Throughput: 0: 940.0. Samples: 783200. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:45:27,377][01629] Avg episode reward: [(0, '22.300')] [2024-08-24 01:45:31,210][04334] Updated weights for policy 0, policy_version 770 (0.0033) [2024-08-24 01:45:32,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3158016. Throughput: 0: 959.6. Samples: 790050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-08-24 01:45:32,375][01629] Avg episode reward: [(0, '21.586')] [2024-08-24 01:45:37,370][01629] Fps is (10 sec: 4506.1, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3178496. Throughput: 0: 987.0. Samples: 793392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:45:37,373][01629] Avg episode reward: [(0, '21.654')] [2024-08-24 01:45:42,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3190784. Throughput: 0: 957.6. Samples: 798252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:45:42,373][01629] Avg episode reward: [(0, '21.446')] [2024-08-24 01:45:42,570][04334] Updated weights for policy 0, policy_version 780 (0.0021) [2024-08-24 01:45:47,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 3211264. Throughput: 0: 928.6. Samples: 803766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:45:47,372][01629] Avg episode reward: [(0, '20.654')] [2024-08-24 01:45:52,151][04334] Updated weights for policy 0, policy_version 790 (0.0019) [2024-08-24 01:45:52,370][01629] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 3235840. Throughput: 0: 952.8. Samples: 807036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:45:52,378][01629] Avg episode reward: [(0, '20.795')] [2024-08-24 01:45:57,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3248128. Throughput: 0: 965.4. Samples: 812666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:45:57,374][01629] Avg episode reward: [(0, '21.362')] [2024-08-24 01:46:02,370][01629] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3268608. Throughput: 0: 923.8. Samples: 817388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:46:02,378][01629] Avg episode reward: [(0, '21.403')] [2024-08-24 01:46:03,850][04334] Updated weights for policy 0, policy_version 800 (0.0018) [2024-08-24 01:46:07,370][01629] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3293184. Throughput: 0: 933.6. Samples: 820948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:46:07,373][01629] Avg episode reward: [(0, '20.705')] [2024-08-24 01:46:12,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3309568. Throughput: 0: 995.0. Samples: 827974. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:46:12,376][01629] Avg episode reward: [(0, '20.149')] [2024-08-24 01:46:13,960][04334] Updated weights for policy 0, policy_version 810 (0.0017) [2024-08-24 01:46:17,370][01629] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 3325952. Throughput: 0: 934.9. Samples: 832122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:46:17,376][01629] Avg episode reward: [(0, '20.465')] [2024-08-24 01:46:22,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3346432. Throughput: 0: 926.2. Samples: 835072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:46:22,372][01629] Avg episode reward: [(0, '20.955')] [2024-08-24 01:46:24,440][04334] Updated weights for policy 0, policy_version 820 (0.0013) [2024-08-24 01:46:27,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 3371008. Throughput: 0: 974.4. Samples: 842100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:46:27,371][01629] Avg episode reward: [(0, '20.656')] [2024-08-24 01:46:32,371][01629] Fps is (10 sec: 4095.3, 60 sec: 3822.8, 300 sec: 3859.9). Total num frames: 3387392. Throughput: 0: 965.4. Samples: 847210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:46:32,374][01629] Avg episode reward: [(0, '21.314')] [2024-08-24 01:46:36,185][04334] Updated weights for policy 0, policy_version 830 (0.0017) [2024-08-24 01:46:37,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 3403776. Throughput: 0: 937.9. Samples: 849240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:46:37,372][01629] Avg episode reward: [(0, '22.517')] [2024-08-24 01:46:37,376][04321] Saving new best policy, reward=22.517! [2024-08-24 01:46:42,370][01629] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3424256. Throughput: 0: 957.1. Samples: 855736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:46:42,373][01629] Avg episode reward: [(0, '23.119')] [2024-08-24 01:46:42,382][04321] Saving new best policy, reward=23.119! [2024-08-24 01:46:45,901][04334] Updated weights for policy 0, policy_version 840 (0.0019) [2024-08-24 01:46:47,370][01629] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3444736. Throughput: 0: 981.7. Samples: 861564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:46:47,380][01629] Avg episode reward: [(0, '23.158')] [2024-08-24 01:46:47,385][04321] Saving new best policy, reward=23.158! [2024-08-24 01:46:52,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 3457024. Throughput: 0: 944.6. Samples: 863454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:46:52,376][01629] Avg episode reward: [(0, '23.491')] [2024-08-24 01:46:52,391][04321] Saving new best policy, reward=23.491! [2024-08-24 01:46:57,370][01629] Fps is (10 sec: 3277.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3477504. Throughput: 0: 906.0. Samples: 868744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:46:57,373][01629] Avg episode reward: [(0, '23.735')] [2024-08-24 01:46:57,379][04321] Saving new best policy, reward=23.735! [2024-08-24 01:46:57,976][04334] Updated weights for policy 0, policy_version 850 (0.0014) [2024-08-24 01:47:02,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3497984. Throughput: 0: 964.4. Samples: 875518. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:47:02,378][01629] Avg episode reward: [(0, '23.130')] [2024-08-24 01:47:07,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3514368. Throughput: 0: 953.0. Samples: 877958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:47:07,376][01629] Avg episode reward: [(0, '21.844')] [2024-08-24 01:47:09,345][04334] Updated weights for policy 0, policy_version 860 (0.0015) [2024-08-24 01:47:12,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 3534848. Throughput: 0: 904.1. Samples: 882784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:47:12,375][01629] Avg episode reward: [(0, '20.081')] [2024-08-24 01:47:12,385][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000863_3534848.pth... [2024-08-24 01:47:12,571][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000639_2617344.pth [2024-08-24 01:47:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3555328. Throughput: 0: 933.3. Samples: 889206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:47:17,377][01629] Avg episode reward: [(0, '20.117')] [2024-08-24 01:47:19,033][04334] Updated weights for policy 0, policy_version 870 (0.0019) [2024-08-24 01:47:22,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3571712. Throughput: 0: 962.8. Samples: 892564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:47:22,379][01629] Avg episode reward: [(0, '20.877')] [2024-08-24 01:47:27,370][01629] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 3588096. Throughput: 0: 911.7. Samples: 896764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:47:27,373][01629] Avg episode reward: [(0, '21.791')] [2024-08-24 01:47:30,572][04334] Updated weights for policy 0, policy_version 880 (0.0021) [2024-08-24 01:47:32,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3846.1). Total num frames: 3612672. Throughput: 0: 925.2. Samples: 903198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:47:32,374][01629] Avg episode reward: [(0, '21.497')] [2024-08-24 01:47:37,373][01629] Fps is (10 sec: 4504.1, 60 sec: 3822.7, 300 sec: 3832.1). Total num frames: 3633152. Throughput: 0: 955.8. Samples: 906470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:47:37,376][01629] Avg episode reward: [(0, '21.588')] [2024-08-24 01:47:41,231][04334] Updated weights for policy 0, policy_version 890 (0.0027) [2024-08-24 01:47:42,370][01629] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3645440. Throughput: 0: 953.0. Samples: 911628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:47:42,377][01629] Avg episode reward: [(0, '21.512')] [2024-08-24 01:47:47,370][01629] Fps is (10 sec: 3278.0, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3665920. Throughput: 0: 921.1. Samples: 916968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:47:47,372][01629] Avg episode reward: [(0, '19.893')] [2024-08-24 01:47:51,733][04334] Updated weights for policy 0, policy_version 900 (0.0026) [2024-08-24 01:47:52,370][01629] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3686400. Throughput: 0: 940.7. Samples: 920290. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:47:52,372][01629] Avg episode reward: [(0, '19.905')] [2024-08-24 01:47:57,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.5). Total num frames: 3702784. Throughput: 0: 968.5. Samples: 926366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:47:57,373][01629] Avg episode reward: [(0, '19.608')] [2024-08-24 01:48:02,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3719168. Throughput: 0: 922.9. Samples: 930738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:48:02,374][01629] Avg episode reward: [(0, '19.842')] [2024-08-24 01:48:03,470][04334] Updated weights for policy 0, policy_version 910 (0.0035) [2024-08-24 01:48:07,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3743744. Throughput: 0: 923.0. Samples: 934100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:48:07,375][01629] Avg episode reward: [(0, '20.714')] [2024-08-24 01:48:12,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3764224. Throughput: 0: 977.9. Samples: 940768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-08-24 01:48:12,375][01629] Avg episode reward: [(0, '21.300')] [2024-08-24 01:48:13,391][04334] Updated weights for policy 0, policy_version 920 (0.0021) [2024-08-24 01:48:17,371][01629] Fps is (10 sec: 3276.3, 60 sec: 3686.3, 300 sec: 3804.4). Total num frames: 3776512. Throughput: 0: 930.2. Samples: 945058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:48:17,377][01629] Avg episode reward: [(0, '22.232')] [2024-08-24 01:48:22,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3796992. Throughput: 0: 915.9. Samples: 947682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:48:22,380][01629] Avg episode reward: [(0, '21.855')] [2024-08-24 01:48:24,592][04334] Updated weights for policy 0, policy_version 930 (0.0012) [2024-08-24 01:48:27,370][01629] Fps is (10 sec: 4506.3, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3821568. Throughput: 0: 951.1. Samples: 954426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:48:27,376][01629] Avg episode reward: [(0, '23.355')] [2024-08-24 01:48:32,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3833856. Throughput: 0: 951.7. Samples: 959796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:48:32,377][01629] Avg episode reward: [(0, '22.782')] [2024-08-24 01:48:36,600][04334] Updated weights for policy 0, policy_version 940 (0.0016) [2024-08-24 01:48:37,370][01629] Fps is (10 sec: 2867.2, 60 sec: 3618.4, 300 sec: 3804.4). Total num frames: 3850240. Throughput: 0: 922.9. Samples: 961820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:48:37,372][01629] Avg episode reward: [(0, '22.573')] [2024-08-24 01:48:42,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3874816. Throughput: 0: 930.5. Samples: 968238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:48:42,379][01629] Avg episode reward: [(0, '22.190')] [2024-08-24 01:48:45,449][04334] Updated weights for policy 0, policy_version 950 (0.0016) [2024-08-24 01:48:47,370][01629] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3895296. Throughput: 0: 976.0. Samples: 974660. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:48:47,378][01629] Avg episode reward: [(0, '22.414')] [2024-08-24 01:48:52,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.5). Total num frames: 3911680. Throughput: 0: 947.8. Samples: 976750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:48:52,373][01629] Avg episode reward: [(0, '22.651')] [2024-08-24 01:48:57,285][04334] Updated weights for policy 0, policy_version 960 (0.0012) [2024-08-24 01:48:57,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3932160. Throughput: 0: 919.3. Samples: 982136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:48:57,377][01629] Avg episode reward: [(0, '22.613')] [2024-08-24 01:49:02,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3952640. Throughput: 0: 976.2. Samples: 988986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:49:02,374][01629] Avg episode reward: [(0, '23.228')] [2024-08-24 01:49:07,372][01629] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 3969024. Throughput: 0: 975.2. Samples: 991570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:49:07,374][01629] Avg episode reward: [(0, '22.362')] [2024-08-24 01:49:07,961][04334] Updated weights for policy 0, policy_version 970 (0.0022) [2024-08-24 01:49:12,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3985408. Throughput: 0: 926.9. Samples: 996136. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:49:12,376][01629] Avg episode reward: [(0, '21.746')] [2024-08-24 01:49:12,395][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000974_3989504.pth... [2024-08-24 01:49:12,525][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000752_3080192.pth [2024-08-24 01:49:17,370][01629] Fps is (10 sec: 4096.9, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 4009984. Throughput: 0: 955.3. Samples: 1002786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:49:17,372][01629] Avg episode reward: [(0, '22.629')] [2024-08-24 01:49:18,058][04334] Updated weights for policy 0, policy_version 980 (0.0027) [2024-08-24 01:49:22,370][01629] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 4026368. Throughput: 0: 982.3. Samples: 1006024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:49:22,374][01629] Avg episode reward: [(0, '21.078')] [2024-08-24 01:49:27,372][01629] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3790.5). Total num frames: 4042752. Throughput: 0: 935.6. Samples: 1010342. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:49:27,376][01629] Avg episode reward: [(0, '21.923')] [2024-08-24 01:49:30,079][04334] Updated weights for policy 0, policy_version 990 (0.0016) [2024-08-24 01:49:32,370][01629] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 4063232. Throughput: 0: 930.0. Samples: 1016508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:49:32,377][01629] Avg episode reward: [(0, '23.053')] [2024-08-24 01:49:37,370][01629] Fps is (10 sec: 4506.5, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 4087808. Throughput: 0: 961.6. Samples: 1020020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:49:37,377][01629] Avg episode reward: [(0, '24.069')] [2024-08-24 01:49:37,381][04321] Saving new best policy, reward=24.069! 
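The "Saving new best policy, reward=..." lines above fire whenever the reported average episode reward exceeds the best value recorded so far in this run. A minimal sketch of that bookkeeping, assuming a BestPolicyTracker class and a save_fn callback that are illustrative names rather than Sample Factory's actual internals:

# Sketch (assumed names, not Sample Factory internals): keep the best average
# episode reward seen so far and persist a snapshot whenever it improves.
class BestPolicyTracker:
    def __init__(self, save_fn):
        self.best_reward = float("-inf")  # no best policy saved yet
        self.save_fn = save_fn            # callable that persists the model weights

    def update(self, avg_episode_reward):
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            self.save_fn(avg_episode_reward)
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")

# Usage with a stub save function; rewards mimic values from the log above.
tracker = BestPolicyTracker(save_fn=lambda r: None)
for r in (22.414, 24.069, 23.5):
    tracker.update(r)   # only 22.414 and 24.069 trigger a save

In the real trainer the snapshot would be a full torch state dict written alongside the regular checkpoints; the stub save function just keeps the sketch self-contained.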
[2024-08-24 01:49:40,203][04334] Updated weights for policy 0, policy_version 1000 (0.0021) [2024-08-24 01:49:42,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 4100096. Throughput: 0: 956.5. Samples: 1025180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:49:42,372][01629] Avg episode reward: [(0, '23.822')] [2024-08-24 01:49:47,370][01629] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 4116480. Throughput: 0: 915.5. Samples: 1030182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:49:47,372][01629] Avg episode reward: [(0, '24.317')] [2024-08-24 01:49:47,385][04321] Saving new best policy, reward=24.317! [2024-08-24 01:49:51,019][04334] Updated weights for policy 0, policy_version 1010 (0.0020) [2024-08-24 01:49:52,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 4141056. Throughput: 0: 931.8. Samples: 1033498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:49:52,372][01629] Avg episode reward: [(0, '25.410')] [2024-08-24 01:49:52,382][04321] Saving new best policy, reward=25.410! [2024-08-24 01:49:57,373][01629] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3776.6). Total num frames: 4157440. Throughput: 0: 974.4. Samples: 1039986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:49:57,376][01629] Avg episode reward: [(0, '23.926')] [2024-08-24 01:50:02,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.6). Total num frames: 4173824. Throughput: 0: 918.9. Samples: 1044136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:50:02,372][01629] Avg episode reward: [(0, '23.313')] [2024-08-24 01:50:02,886][04334] Updated weights for policy 0, policy_version 1020 (0.0013) [2024-08-24 01:50:07,370][01629] Fps is (10 sec: 4096.5, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 4198400. Throughput: 0: 922.5. Samples: 1047538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:50:07,377][01629] Avg episode reward: [(0, '23.736')] [2024-08-24 01:50:11,452][04334] Updated weights for policy 0, policy_version 1030 (0.0021) [2024-08-24 01:50:12,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 4218880. Throughput: 0: 983.2. Samples: 1054582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:50:12,377][01629] Avg episode reward: [(0, '23.077')] [2024-08-24 01:50:17,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 4235264. Throughput: 0: 951.3. Samples: 1059318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:50:17,372][01629] Avg episode reward: [(0, '23.113')] [2024-08-24 01:50:22,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 4251648. Throughput: 0: 922.9. Samples: 1061550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:50:22,375][01629] Avg episode reward: [(0, '23.167')] [2024-08-24 01:50:23,415][04334] Updated weights for policy 0, policy_version 1040 (0.0034) [2024-08-24 01:50:27,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3790.5). Total num frames: 4276224. Throughput: 0: 960.4. Samples: 1068396. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:50:27,374][01629] Avg episode reward: [(0, '24.302')] [2024-08-24 01:50:32,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 4296704. Throughput: 0: 986.1. Samples: 1074556. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:50:32,377][01629] Avg episode reward: [(0, '23.331')] [2024-08-24 01:50:33,690][04334] Updated weights for policy 0, policy_version 1050 (0.0047) [2024-08-24 01:50:37,370][01629] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 4308992. Throughput: 0: 959.7. Samples: 1076686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:50:37,379][01629] Avg episode reward: [(0, '22.712')] [2024-08-24 01:50:42,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 4333568. Throughput: 0: 948.5. Samples: 1082668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:50:42,377][01629] Avg episode reward: [(0, '22.875')] [2024-08-24 01:50:43,926][04334] Updated weights for policy 0, policy_version 1060 (0.0013) [2024-08-24 01:50:47,370][01629] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 4354048. Throughput: 0: 1006.9. Samples: 1089446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:50:47,376][01629] Avg episode reward: [(0, '22.041')] [2024-08-24 01:50:52,374][01629] Fps is (10 sec: 3684.9, 60 sec: 3822.7, 300 sec: 3804.4). Total num frames: 4370432. Throughput: 0: 978.9. Samples: 1091592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:50:52,376][01629] Avg episode reward: [(0, '21.559')] [2024-08-24 01:50:55,561][04334] Updated weights for policy 0, policy_version 1070 (0.0030) [2024-08-24 01:50:57,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 4390912. Throughput: 0: 935.9. Samples: 1096698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:50:57,377][01629] Avg episode reward: [(0, '22.135')] [2024-08-24 01:51:02,370][01629] Fps is (10 sec: 4097.7, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 4411392. Throughput: 0: 982.0. Samples: 1103506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:51:02,378][01629] Avg episode reward: [(0, '23.690')] [2024-08-24 01:51:04,851][04334] Updated weights for policy 0, policy_version 1080 (0.0026) [2024-08-24 01:51:07,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 4427776. Throughput: 0: 1001.0. Samples: 1106596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:51:07,375][01629] Avg episode reward: [(0, '24.819')] [2024-08-24 01:51:12,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 4444160. Throughput: 0: 946.6. Samples: 1110992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:51:12,372][01629] Avg episode reward: [(0, '25.571')] [2024-08-24 01:51:12,380][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001085_4444160.pth... [2024-08-24 01:51:12,533][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000863_3534848.pth [2024-08-24 01:51:12,545][04321] Saving new best policy, reward=25.571! [2024-08-24 01:51:16,156][04334] Updated weights for policy 0, policy_version 1090 (0.0016) [2024-08-24 01:51:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 4468736. Throughput: 0: 955.8. Samples: 1117566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:51:17,376][01629] Avg episode reward: [(0, '25.133')] [2024-08-24 01:51:22,370][01629] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3790.5). Total num frames: 4489216. Throughput: 0: 982.0. Samples: 1120876. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:51:22,379][01629] Avg episode reward: [(0, '25.541')] [2024-08-24 01:51:27,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 4501504. Throughput: 0: 955.0. Samples: 1125642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:51:27,375][01629] Avg episode reward: [(0, '24.897')] [2024-08-24 01:51:27,840][04334] Updated weights for policy 0, policy_version 1100 (0.0016) [2024-08-24 01:51:32,370][01629] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 4526080. Throughput: 0: 936.9. Samples: 1131606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:51:32,376][01629] Avg episode reward: [(0, '22.984')] [2024-08-24 01:51:36,881][04334] Updated weights for policy 0, policy_version 1110 (0.0015) [2024-08-24 01:51:37,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 4546560. Throughput: 0: 966.8. Samples: 1135092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:51:37,377][01629] Avg episode reward: [(0, '23.113')] [2024-08-24 01:51:42,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 4562944. Throughput: 0: 980.8. Samples: 1140834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:51:42,375][01629] Avg episode reward: [(0, '22.751')] [2024-08-24 01:51:47,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 4579328. Throughput: 0: 933.2. Samples: 1145502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-08-24 01:51:47,377][01629] Avg episode reward: [(0, '22.130')] [2024-08-24 01:51:48,707][04334] Updated weights for policy 0, policy_version 1120 (0.0029) [2024-08-24 01:51:52,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.5, 300 sec: 3818.3). Total num frames: 4603904. Throughput: 0: 943.0. Samples: 1149032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:51:52,375][01629] Avg episode reward: [(0, '21.380')] [2024-08-24 01:51:57,375][01629] Fps is (10 sec: 4503.0, 60 sec: 3890.8, 300 sec: 3818.2). Total num frames: 4624384. Throughput: 0: 998.6. Samples: 1155936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:51:57,378][01629] Avg episode reward: [(0, '20.987')] [2024-08-24 01:51:58,594][04334] Updated weights for policy 0, policy_version 1130 (0.0014) [2024-08-24 01:52:02,372][01629] Fps is (10 sec: 3276.0, 60 sec: 3754.5, 300 sec: 3804.4). Total num frames: 4636672. Throughput: 0: 946.7. Samples: 1160168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:52:02,382][01629] Avg episode reward: [(0, '20.624')] [2024-08-24 01:52:07,370][01629] Fps is (10 sec: 3688.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 4661248. Throughput: 0: 940.1. Samples: 1163178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:52:07,372][01629] Avg episode reward: [(0, '20.532')] [2024-08-24 01:52:09,007][04334] Updated weights for policy 0, policy_version 1140 (0.0025) [2024-08-24 01:52:12,370][01629] Fps is (10 sec: 4506.8, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 4681728. Throughput: 0: 991.0. Samples: 1170238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:52:12,377][01629] Avg episode reward: [(0, '21.550')] [2024-08-24 01:52:17,370][01629] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 4698112. Throughput: 0: 972.7. Samples: 1175378. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:52:17,375][01629] Avg episode reward: [(0, '22.049')] [2024-08-24 01:52:20,540][04334] Updated weights for policy 0, policy_version 1150 (0.0019) [2024-08-24 01:52:22,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 4714496. Throughput: 0: 941.5. Samples: 1177458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:52:22,377][01629] Avg episode reward: [(0, '22.031')] [2024-08-24 01:52:27,370][01629] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 4739072. Throughput: 0: 963.5. Samples: 1184190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:52:27,377][01629] Avg episode reward: [(0, '22.062')] [2024-08-24 01:52:29,629][04334] Updated weights for policy 0, policy_version 1160 (0.0016) [2024-08-24 01:52:32,370][01629] Fps is (10 sec: 4505.3, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 4759552. Throughput: 0: 997.5. Samples: 1190392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:52:32,373][01629] Avg episode reward: [(0, '24.189')] [2024-08-24 01:52:37,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 4771840. Throughput: 0: 965.6. Samples: 1192484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:52:37,379][01629] Avg episode reward: [(0, '24.174')] [2024-08-24 01:52:41,196][04334] Updated weights for policy 0, policy_version 1170 (0.0039) [2024-08-24 01:52:42,370][01629] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 4796416. Throughput: 0: 945.5. Samples: 1198478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:52:42,376][01629] Avg episode reward: [(0, '23.213')] [2024-08-24 01:52:47,371][01629] Fps is (10 sec: 4504.9, 60 sec: 3959.4, 300 sec: 3832.2). Total num frames: 4816896. Throughput: 0: 1002.4. Samples: 1205274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:52:47,374][01629] Avg episode reward: [(0, '23.451')] [2024-08-24 01:52:51,810][04334] Updated weights for policy 0, policy_version 1180 (0.0012) [2024-08-24 01:52:52,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 4833280. Throughput: 0: 984.0. Samples: 1207456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:52:52,372][01629] Avg episode reward: [(0, '23.764')] [2024-08-24 01:52:57,372][01629] Fps is (10 sec: 3276.5, 60 sec: 3754.9, 300 sec: 3832.2). Total num frames: 4849664. Throughput: 0: 936.4. Samples: 1212380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:52:57,375][01629] Avg episode reward: [(0, '21.792')] [2024-08-24 01:53:01,711][04334] Updated weights for policy 0, policy_version 1190 (0.0016) [2024-08-24 01:53:02,370][01629] Fps is (10 sec: 4096.1, 60 sec: 3959.6, 300 sec: 3832.2). Total num frames: 4874240. Throughput: 0: 974.7. Samples: 1219238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:53:02,372][01629] Avg episode reward: [(0, '20.748')] [2024-08-24 01:53:07,370][01629] Fps is (10 sec: 4097.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 4890624. Throughput: 0: 999.7. Samples: 1222444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:53:07,372][01629] Avg episode reward: [(0, '21.573')] [2024-08-24 01:53:12,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4907008. Throughput: 0: 943.5. Samples: 1226646. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:53:12,372][01629] Avg episode reward: [(0, '20.817')] [2024-08-24 01:53:12,381][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001198_4907008.pth... [2024-08-24 01:53:12,516][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000974_3989504.pth [2024-08-24 01:53:13,560][04334] Updated weights for policy 0, policy_version 1200 (0.0020) [2024-08-24 01:53:17,370][01629] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 4931584. Throughput: 0: 955.0. Samples: 1233366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:53:17,372][01629] Avg episode reward: [(0, '20.965')] [2024-08-24 01:53:22,374][01629] Fps is (10 sec: 4503.5, 60 sec: 3959.2, 300 sec: 3832.1). Total num frames: 4952064. Throughput: 0: 980.4. Samples: 1236608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:53:22,382][01629] Avg episode reward: [(0, '21.796')] [2024-08-24 01:53:23,171][04334] Updated weights for policy 0, policy_version 1210 (0.0024) [2024-08-24 01:53:27,370][01629] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 4964352. Throughput: 0: 956.6. Samples: 1241526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:53:27,374][01629] Avg episode reward: [(0, '21.193')] [2024-08-24 01:53:32,370][01629] Fps is (10 sec: 3278.3, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 4984832. Throughput: 0: 925.9. Samples: 1246940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:53:32,375][01629] Avg episode reward: [(0, '21.369')] [2024-08-24 01:53:34,627][04334] Updated weights for policy 0, policy_version 1220 (0.0040) [2024-08-24 01:53:37,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 5005312. Throughput: 0: 949.9. Samples: 1250200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:53:37,372][01629] Avg episode reward: [(0, '23.214')] [2024-08-24 01:53:42,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 5021696. Throughput: 0: 971.5. Samples: 1256094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:53:42,372][01629] Avg episode reward: [(0, '23.244')] [2024-08-24 01:53:46,427][04334] Updated weights for policy 0, policy_version 1230 (0.0017) [2024-08-24 01:53:47,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3832.2). Total num frames: 5042176. Throughput: 0: 921.3. Samples: 1260696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:53:47,372][01629] Avg episode reward: [(0, '23.212')] [2024-08-24 01:53:52,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 5062656. Throughput: 0: 922.7. Samples: 1263966. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:53:52,376][01629] Avg episode reward: [(0, '23.265')] [2024-08-24 01:53:55,392][04334] Updated weights for policy 0, policy_version 1240 (0.0020) [2024-08-24 01:53:57,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.4, 300 sec: 3832.2). Total num frames: 5083136. Throughput: 0: 986.4. Samples: 1271032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:53:57,377][01629] Avg episode reward: [(0, '24.096')] [2024-08-24 01:54:02,370][01629] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3832.2). Total num frames: 5099520. Throughput: 0: 934.3. Samples: 1275408. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:54:02,373][01629] Avg episode reward: [(0, '23.039')] [2024-08-24 01:54:07,012][04334] Updated weights for policy 0, policy_version 1250 (0.0031) [2024-08-24 01:54:07,370][01629] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5120000. Throughput: 0: 927.5. Samples: 1278340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:54:07,372][01629] Avg episode reward: [(0, '23.732')] [2024-08-24 01:54:12,370][01629] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 5144576. Throughput: 0: 969.0. Samples: 1285130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:54:12,372][01629] Avg episode reward: [(0, '23.664')] [2024-08-24 01:54:17,350][04334] Updated weights for policy 0, policy_version 1260 (0.0012) [2024-08-24 01:54:17,373][01629] Fps is (10 sec: 4094.6, 60 sec: 3822.7, 300 sec: 3846.0). Total num frames: 5160960. Throughput: 0: 966.4. Samples: 1290430. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:54:17,377][01629] Avg episode reward: [(0, '23.123')] [2024-08-24 01:54:22,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3846.1). Total num frames: 5177344. Throughput: 0: 941.7. Samples: 1292578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:54:22,373][01629] Avg episode reward: [(0, '22.515')] [2024-08-24 01:54:27,370][01629] Fps is (10 sec: 3687.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5197824. Throughput: 0: 961.6. Samples: 1299368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:54:27,377][01629] Avg episode reward: [(0, '22.316')] [2024-08-24 01:54:27,482][04334] Updated weights for policy 0, policy_version 1270 (0.0012) [2024-08-24 01:54:32,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 5218304. Throughput: 0: 1000.0. Samples: 1305694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:54:32,373][01629] Avg episode reward: [(0, '21.687')] [2024-08-24 01:54:37,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 5234688. Throughput: 0: 975.9. Samples: 1307880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:54:37,372][01629] Avg episode reward: [(0, '22.939')] [2024-08-24 01:54:38,673][04334] Updated weights for policy 0, policy_version 1280 (0.0020) [2024-08-24 01:54:42,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5259264. Throughput: 0: 951.8. Samples: 1313862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:54:42,372][01629] Avg episode reward: [(0, '23.375')] [2024-08-24 01:54:47,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5279744. Throughput: 0: 1005.6. Samples: 1320660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:54:47,375][01629] Avg episode reward: [(0, '24.955')] [2024-08-24 01:54:47,873][04334] Updated weights for policy 0, policy_version 1290 (0.0016) [2024-08-24 01:54:52,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 5296128. Throughput: 0: 992.6. Samples: 1323006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:54:52,375][01629] Avg episode reward: [(0, '24.303')] [2024-08-24 01:54:57,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5312512. Throughput: 0: 952.8. Samples: 1328004. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:54:57,377][01629] Avg episode reward: [(0, '25.130')] [2024-08-24 01:54:59,107][04334] Updated weights for policy 0, policy_version 1300 (0.0013) [2024-08-24 01:55:02,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5337088. Throughput: 0: 989.4. Samples: 1334950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:55:02,372][01629] Avg episode reward: [(0, '24.858')] [2024-08-24 01:55:07,370][01629] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5353472. Throughput: 0: 1013.8. Samples: 1338200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:55:07,373][01629] Avg episode reward: [(0, '25.064')] [2024-08-24 01:55:10,288][04334] Updated weights for policy 0, policy_version 1310 (0.0015) [2024-08-24 01:55:12,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 5369856. Throughput: 0: 962.7. Samples: 1342690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:55:12,375][01629] Avg episode reward: [(0, '23.318')] [2024-08-24 01:55:12,385][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001311_5369856.pth... [2024-08-24 01:55:12,524][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001085_4444160.pth [2024-08-24 01:55:17,370][01629] Fps is (10 sec: 4096.3, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 5394432. Throughput: 0: 968.8. Samples: 1349290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:55:17,374][01629] Avg episode reward: [(0, '22.993')] [2024-08-24 01:55:19,567][04334] Updated weights for policy 0, policy_version 1320 (0.0018) [2024-08-24 01:55:22,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 5414912. Throughput: 0: 992.6. Samples: 1352546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:55:22,377][01629] Avg episode reward: [(0, '23.332')] [2024-08-24 01:55:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 5431296. Throughput: 0: 973.2. Samples: 1357654. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:55:27,372][01629] Avg episode reward: [(0, '23.416')] [2024-08-24 01:55:31,060][04334] Updated weights for policy 0, policy_version 1330 (0.0016) [2024-08-24 01:55:32,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 5451776. Throughput: 0: 949.7. Samples: 1363396. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:55:32,372][01629] Avg episode reward: [(0, '23.357')] [2024-08-24 01:55:37,370][01629] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 5476352. Throughput: 0: 974.7. Samples: 1366868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:55:37,377][01629] Avg episode reward: [(0, '23.346')] [2024-08-24 01:55:40,461][04334] Updated weights for policy 0, policy_version 1340 (0.0022) [2024-08-24 01:55:42,373][01629] Fps is (10 sec: 4094.8, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 5492736. Throughput: 0: 999.7. Samples: 1372992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:55:42,376][01629] Avg episode reward: [(0, '23.906')] [2024-08-24 01:55:47,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5509120. Throughput: 0: 947.2. Samples: 1377572. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:55:47,372][01629] Avg episode reward: [(0, '24.143')] [2024-08-24 01:55:51,632][04334] Updated weights for policy 0, policy_version 1350 (0.0022) [2024-08-24 01:55:52,370][01629] Fps is (10 sec: 3687.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 5529600. Throughput: 0: 948.3. Samples: 1380872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:55:52,372][01629] Avg episode reward: [(0, '23.347')] [2024-08-24 01:55:57,373][01629] Fps is (10 sec: 4504.2, 60 sec: 4027.5, 300 sec: 3873.8). Total num frames: 5554176. Throughput: 0: 1006.2. Samples: 1387972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:55:57,375][01629] Avg episode reward: [(0, '23.392')] [2024-08-24 01:56:02,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5566464. Throughput: 0: 955.6. Samples: 1392294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:56:02,372][01629] Avg episode reward: [(0, '23.173')] [2024-08-24 01:56:03,095][04334] Updated weights for policy 0, policy_version 1360 (0.0035) [2024-08-24 01:56:07,370][01629] Fps is (10 sec: 3277.7, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 5586944. Throughput: 0: 946.4. Samples: 1395134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:56:07,372][01629] Avg episode reward: [(0, '23.404')] [2024-08-24 01:56:11,930][04334] Updated weights for policy 0, policy_version 1370 (0.0012) [2024-08-24 01:56:12,370][01629] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 5611520. Throughput: 0: 991.6. Samples: 1402278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:56:12,373][01629] Avg episode reward: [(0, '22.044')] [2024-08-24 01:56:17,373][01629] Fps is (10 sec: 4094.7, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 5627904. Throughput: 0: 981.9. Samples: 1407586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:56:17,377][01629] Avg episode reward: [(0, '22.185')] [2024-08-24 01:56:22,371][01629] Fps is (10 sec: 3276.5, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 5644288. Throughput: 0: 952.0. Samples: 1409710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:56:22,378][01629] Avg episode reward: [(0, '21.823')] [2024-08-24 01:56:23,660][04334] Updated weights for policy 0, policy_version 1380 (0.0012) [2024-08-24 01:56:27,370][01629] Fps is (10 sec: 4097.5, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5668864. Throughput: 0: 962.9. Samples: 1416320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:56:27,377][01629] Avg episode reward: [(0, '21.605')] [2024-08-24 01:56:32,370][01629] Fps is (10 sec: 4505.8, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 5689344. Throughput: 0: 1005.2. Samples: 1422806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:56:32,372][01629] Avg episode reward: [(0, '22.245')] [2024-08-24 01:56:33,535][04334] Updated weights for policy 0, policy_version 1390 (0.0012) [2024-08-24 01:56:37,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 5701632. Throughput: 0: 978.8. Samples: 1424918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:56:37,372][01629] Avg episode reward: [(0, '21.808')] [2024-08-24 01:56:42,370][01629] Fps is (10 sec: 3686.5, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 5726208. Throughput: 0: 947.7. Samples: 1430614. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:56:42,376][01629] Avg episode reward: [(0, '21.933')] [2024-08-24 01:56:44,163][04334] Updated weights for policy 0, policy_version 1400 (0.0012) [2024-08-24 01:56:47,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5746688. Throughput: 0: 1004.2. Samples: 1437484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 01:56:47,372][01629] Avg episode reward: [(0, '22.429')] [2024-08-24 01:56:52,370][01629] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 5763072. Throughput: 0: 998.5. Samples: 1440066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:56:52,373][01629] Avg episode reward: [(0, '22.812')] [2024-08-24 01:56:55,651][04334] Updated weights for policy 0, policy_version 1410 (0.0013) [2024-08-24 01:56:57,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3873.9). Total num frames: 5779456. Throughput: 0: 942.6. Samples: 1444696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:56:57,374][01629] Avg episode reward: [(0, '22.524')] [2024-08-24 01:57:02,370][01629] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5804032. Throughput: 0: 977.8. Samples: 1451582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:57:02,372][01629] Avg episode reward: [(0, '22.713')] [2024-08-24 01:57:04,616][04334] Updated weights for policy 0, policy_version 1420 (0.0020) [2024-08-24 01:57:07,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5824512. Throughput: 0: 1006.4. Samples: 1454998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:57:07,372][01629] Avg episode reward: [(0, '23.460')] [2024-08-24 01:57:12,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 5836800. Throughput: 0: 956.7. Samples: 1459370. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:57:12,372][01629] Avg episode reward: [(0, '22.951')] [2024-08-24 01:57:12,456][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001426_5840896.pth... [2024-08-24 01:57:12,629][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001198_4907008.pth [2024-08-24 01:57:16,010][04334] Updated weights for policy 0, policy_version 1430 (0.0028) [2024-08-24 01:57:17,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3887.7). Total num frames: 5861376. Throughput: 0: 954.9. Samples: 1465776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:57:17,377][01629] Avg episode reward: [(0, '23.788')] [2024-08-24 01:57:22,374][01629] Fps is (10 sec: 4503.6, 60 sec: 3959.2, 300 sec: 3873.8). Total num frames: 5881856. Throughput: 0: 985.9. Samples: 1469288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:57:22,376][01629] Avg episode reward: [(0, '23.720')] [2024-08-24 01:57:27,021][04334] Updated weights for policy 0, policy_version 1440 (0.0034) [2024-08-24 01:57:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 5898240. Throughput: 0: 971.3. Samples: 1474322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:57:27,374][01629] Avg episode reward: [(0, '23.740')] [2024-08-24 01:57:32,370][01629] Fps is (10 sec: 3688.0, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 5918720. Throughput: 0: 942.8. Samples: 1479908. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:57:32,373][01629] Avg episode reward: [(0, '22.978')] [2024-08-24 01:57:36,660][04334] Updated weights for policy 0, policy_version 1450 (0.0022) [2024-08-24 01:57:37,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 5939200. Throughput: 0: 963.9. Samples: 1483442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:57:37,372][01629] Avg episode reward: [(0, '22.983')] [2024-08-24 01:57:42,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 5959680. Throughput: 0: 1000.1. Samples: 1489702. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:57:42,372][01629] Avg episode reward: [(0, '22.335')] [2024-08-24 01:57:47,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 5976064. Throughput: 0: 946.4. Samples: 1494172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:57:47,372][01629] Avg episode reward: [(0, '21.307')] [2024-08-24 01:57:48,245][04334] Updated weights for policy 0, policy_version 1460 (0.0013) [2024-08-24 01:57:52,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 5996544. Throughput: 0: 947.4. Samples: 1497632. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 01:57:52,375][01629] Avg episode reward: [(0, '21.606')] [2024-08-24 01:57:57,308][04334] Updated weights for policy 0, policy_version 1470 (0.0013) [2024-08-24 01:57:57,370][01629] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 6021120. Throughput: 0: 1006.1. Samples: 1504644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 01:57:57,372][01629] Avg episode reward: [(0, '21.684')] [2024-08-24 01:58:02,373][01629] Fps is (10 sec: 3685.2, 60 sec: 3822.7, 300 sec: 3873.8). Total num frames: 6033408. Throughput: 0: 963.1. Samples: 1509120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:58:02,382][01629] Avg episode reward: [(0, '21.261')] [2024-08-24 01:58:07,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6053888. Throughput: 0: 944.8. Samples: 1511802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:58:07,374][01629] Avg episode reward: [(0, '20.741')] [2024-08-24 01:58:08,594][04334] Updated weights for policy 0, policy_version 1480 (0.0016) [2024-08-24 01:58:12,370][01629] Fps is (10 sec: 4507.0, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 6078464. Throughput: 0: 990.3. Samples: 1518884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:58:12,372][01629] Avg episode reward: [(0, '21.556')] [2024-08-24 01:58:17,373][01629] Fps is (10 sec: 4094.5, 60 sec: 3891.0, 300 sec: 3873.9). Total num frames: 6094848. Throughput: 0: 989.3. Samples: 1524428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:58:17,377][01629] Avg episode reward: [(0, '22.023')] [2024-08-24 01:58:19,644][04334] Updated weights for policy 0, policy_version 1490 (0.0016) [2024-08-24 01:58:22,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3887.7). Total num frames: 6111232. Throughput: 0: 958.0. Samples: 1526554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:58:22,378][01629] Avg episode reward: [(0, '21.506')] [2024-08-24 01:58:27,371][01629] Fps is (10 sec: 4097.1, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 6135808. Throughput: 0: 959.6. Samples: 1532886. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 01:58:27,373][01629] Avg episode reward: [(0, '22.378')] [2024-08-24 01:58:29,120][04334] Updated weights for policy 0, policy_version 1500 (0.0021) [2024-08-24 01:58:32,392][01629] Fps is (10 sec: 4495.8, 60 sec: 3958.0, 300 sec: 3901.3). Total num frames: 6156288. Throughput: 0: 1003.7. Samples: 1539360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:58:32,397][01629] Avg episode reward: [(0, '22.790')] [2024-08-24 01:58:37,370][01629] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6168576. Throughput: 0: 974.7. Samples: 1541492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:58:37,372][01629] Avg episode reward: [(0, '21.900')] [2024-08-24 01:58:40,568][04334] Updated weights for policy 0, policy_version 1510 (0.0012) [2024-08-24 01:58:42,370][01629] Fps is (10 sec: 3283.9, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6189056. Throughput: 0: 945.3. Samples: 1547184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:58:42,373][01629] Avg episode reward: [(0, '21.147')] [2024-08-24 01:58:47,370][01629] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 6213632. Throughput: 0: 998.5. Samples: 1554048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:58:47,373][01629] Avg episode reward: [(0, '22.530')] [2024-08-24 01:58:50,733][04334] Updated weights for policy 0, policy_version 1520 (0.0022) [2024-08-24 01:58:52,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 6230016. Throughput: 0: 994.0. Samples: 1556534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:58:52,381][01629] Avg episode reward: [(0, '21.670')] [2024-08-24 01:58:57,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 6246400. Throughput: 0: 938.7. Samples: 1561126. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:58:57,376][01629] Avg episode reward: [(0, '22.023')] [2024-08-24 01:59:01,144][04334] Updated weights for policy 0, policy_version 1530 (0.0013) [2024-08-24 01:59:02,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3901.6). Total num frames: 6270976. Throughput: 0: 969.9. Samples: 1568068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:59:02,372][01629] Avg episode reward: [(0, '22.426')] [2024-08-24 01:59:07,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 6291456. Throughput: 0: 998.0. Samples: 1571466. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:59:07,375][01629] Avg episode reward: [(0, '22.986')] [2024-08-24 01:59:12,370][01629] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3873.9). Total num frames: 6303744. Throughput: 0: 956.6. Samples: 1575930. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:59:12,372][01629] Avg episode reward: [(0, '23.395')] [2024-08-24 01:59:12,387][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001539_6303744.pth... [2024-08-24 01:59:12,526][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001311_5369856.pth [2024-08-24 01:59:12,741][04334] Updated weights for policy 0, policy_version 1540 (0.0014) [2024-08-24 01:59:17,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3901.6). Total num frames: 6328320. Throughput: 0: 955.8. Samples: 1582352. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:59:17,377][01629] Avg episode reward: [(0, '24.198')] [2024-08-24 01:59:21,558][04334] Updated weights for policy 0, policy_version 1550 (0.0016) [2024-08-24 01:59:22,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 6348800. Throughput: 0: 984.9. Samples: 1585814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:59:22,378][01629] Avg episode reward: [(0, '24.965')] [2024-08-24 01:59:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 6365184. Throughput: 0: 970.4. Samples: 1590852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:59:27,372][01629] Avg episode reward: [(0, '25.097')] [2024-08-24 01:59:32,370][01629] Fps is (10 sec: 3276.9, 60 sec: 3756.0, 300 sec: 3887.7). Total num frames: 6381568. Throughput: 0: 942.9. Samples: 1596478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:59:32,378][01629] Avg episode reward: [(0, '26.057')] [2024-08-24 01:59:32,401][04321] Saving new best policy, reward=26.057! [2024-08-24 01:59:33,370][04334] Updated weights for policy 0, policy_version 1560 (0.0018) [2024-08-24 01:59:37,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 6406144. Throughput: 0: 962.0. Samples: 1599826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 01:59:37,377][01629] Avg episode reward: [(0, '24.247')] [2024-08-24 01:59:42,372][01629] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 6422528. Throughput: 0: 998.9. Samples: 1606080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:59:42,374][01629] Avg episode reward: [(0, '23.604')] [2024-08-24 01:59:44,189][04334] Updated weights for policy 0, policy_version 1570 (0.0032) [2024-08-24 01:59:47,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 6438912. Throughput: 0: 943.8. Samples: 1610538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:59:47,379][01629] Avg episode reward: [(0, '23.247')] [2024-08-24 01:59:52,370][01629] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 6463488. Throughput: 0: 945.9. Samples: 1614030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 01:59:52,372][01629] Avg episode reward: [(0, '23.848')] [2024-08-24 01:59:53,734][04334] Updated weights for policy 0, policy_version 1580 (0.0027) [2024-08-24 01:59:57,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 6483968. Throughput: 0: 1001.4. Samples: 1620992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 01:59:57,372][01629] Avg episode reward: [(0, '22.775')] [2024-08-24 02:00:02,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6500352. Throughput: 0: 961.5. Samples: 1625618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:00:02,372][01629] Avg episode reward: [(0, '23.039')] [2024-08-24 02:00:05,461][04334] Updated weights for policy 0, policy_version 1590 (0.0017) [2024-08-24 02:00:07,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 6520832. Throughput: 0: 942.7. Samples: 1628234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:00:07,372][01629] Avg episode reward: [(0, '25.104')] [2024-08-24 02:00:12,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 6541312. Throughput: 0: 984.8. Samples: 1635166. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:00:12,372][01629] Avg episode reward: [(0, '24.742')] [2024-08-24 02:00:14,401][04334] Updated weights for policy 0, policy_version 1600 (0.0012) [2024-08-24 02:00:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 6561792. Throughput: 0: 984.4. Samples: 1640774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:00:17,378][01629] Avg episode reward: [(0, '23.667')] [2024-08-24 02:00:22,370][01629] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6578176. Throughput: 0: 956.7. Samples: 1642878. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:00:22,376][01629] Avg episode reward: [(0, '24.349')] [2024-08-24 02:00:25,910][04334] Updated weights for policy 0, policy_version 1610 (0.0027) [2024-08-24 02:00:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 6598656. Throughput: 0: 959.9. Samples: 1649272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 02:00:27,372][01629] Avg episode reward: [(0, '24.236')] [2024-08-24 02:00:32,370][01629] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 6619136. Throughput: 0: 1006.8. Samples: 1655844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:00:32,374][01629] Avg episode reward: [(0, '22.776')] [2024-08-24 02:00:37,101][04334] Updated weights for policy 0, policy_version 1620 (0.0022) [2024-08-24 02:00:37,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 6635520. Throughput: 0: 976.7. Samples: 1657982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 02:00:37,372][01629] Avg episode reward: [(0, '23.220')] [2024-08-24 02:00:42,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 6656000. Throughput: 0: 947.0. Samples: 1663608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:00:42,373][01629] Avg episode reward: [(0, '23.570')] [2024-08-24 02:00:46,203][04334] Updated weights for policy 0, policy_version 1630 (0.0017) [2024-08-24 02:00:47,370][01629] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 6680576. Throughput: 0: 996.9. Samples: 1670478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:00:47,372][01629] Avg episode reward: [(0, '24.701')] [2024-08-24 02:00:52,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 6696960. Throughput: 0: 995.0. Samples: 1673010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:00:52,377][01629] Avg episode reward: [(0, '23.746')] [2024-08-24 02:00:57,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6713344. Throughput: 0: 944.6. Samples: 1677674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:00:57,372][01629] Avg episode reward: [(0, '23.733')] [2024-08-24 02:00:57,797][04334] Updated weights for policy 0, policy_version 1640 (0.0030) [2024-08-24 02:01:02,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 6737920. Throughput: 0: 977.0. Samples: 1684738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:01:02,372][01629] Avg episode reward: [(0, '25.146')] [2024-08-24 02:01:07,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 6754304. Throughput: 0: 1005.0. Samples: 1688102. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:01:07,377][01629] Avg episode reward: [(0, '23.909')] [2024-08-24 02:01:07,623][04334] Updated weights for policy 0, policy_version 1650 (0.0021) [2024-08-24 02:01:12,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 6770688. Throughput: 0: 960.1. Samples: 1692478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:01:12,377][01629] Avg episode reward: [(0, '24.939')] [2024-08-24 02:01:12,391][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001653_6770688.pth... [2024-08-24 02:01:12,514][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001426_5840896.pth [2024-08-24 02:01:17,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6791168. Throughput: 0: 956.5. Samples: 1698886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:01:17,372][01629] Avg episode reward: [(0, '24.481')] [2024-08-24 02:01:18,359][04334] Updated weights for policy 0, policy_version 1660 (0.0026) [2024-08-24 02:01:22,370][01629] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 6815744. Throughput: 0: 986.9. Samples: 1702392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:01:22,376][01629] Avg episode reward: [(0, '25.329')] [2024-08-24 02:01:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 6828032. Throughput: 0: 976.9. Samples: 1707568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 02:01:27,374][01629] Avg episode reward: [(0, '26.253')] [2024-08-24 02:01:27,390][04321] Saving new best policy, reward=26.253! [2024-08-24 02:01:30,039][04334] Updated weights for policy 0, policy_version 1670 (0.0032) [2024-08-24 02:01:32,370][01629] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6848512. Throughput: 0: 940.5. Samples: 1712800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:01:32,372][01629] Avg episode reward: [(0, '25.902')] [2024-08-24 02:01:37,370][01629] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 6873088. Throughput: 0: 963.6. Samples: 1716370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:01:37,372][01629] Avg episode reward: [(0, '28.083')] [2024-08-24 02:01:37,378][04321] Saving new best policy, reward=28.083! [2024-08-24 02:01:38,977][04334] Updated weights for policy 0, policy_version 1680 (0.0015) [2024-08-24 02:01:42,371][01629] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 6889472. Throughput: 0: 1002.0. Samples: 1722764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:01:42,380][01629] Avg episode reward: [(0, '26.016')] [2024-08-24 02:01:47,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.9). Total num frames: 6905856. Throughput: 0: 940.2. Samples: 1727046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:01:47,378][01629] Avg episode reward: [(0, '26.021')] [2024-08-24 02:01:50,537][04334] Updated weights for policy 0, policy_version 1690 (0.0012) [2024-08-24 02:01:52,370][01629] Fps is (10 sec: 4096.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 6930432. Throughput: 0: 939.4. Samples: 1730374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:01:52,372][01629] Avg episode reward: [(0, '27.526')] [2024-08-24 02:01:57,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 6950912. 
Throughput: 0: 995.8. Samples: 1737288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:01:57,373][01629] Avg episode reward: [(0, '25.447')] [2024-08-24 02:02:01,027][04334] Updated weights for policy 0, policy_version 1700 (0.0017) [2024-08-24 02:02:02,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 6967296. Throughput: 0: 959.6. Samples: 1742070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 02:02:02,373][01629] Avg episode reward: [(0, '25.699')] [2024-08-24 02:02:07,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 6983680. Throughput: 0: 939.3. Samples: 1744660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:02:07,372][01629] Avg episode reward: [(0, '25.237')] [2024-08-24 02:02:11,102][04334] Updated weights for policy 0, policy_version 1710 (0.0021) [2024-08-24 02:02:12,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7008256. Throughput: 0: 977.6. Samples: 1751560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:02:12,377][01629] Avg episode reward: [(0, '25.530')] [2024-08-24 02:02:17,370][01629] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 7024640. Throughput: 0: 988.9. Samples: 1757300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:02:17,376][01629] Avg episode reward: [(0, '25.022')] [2024-08-24 02:02:22,363][04334] Updated weights for policy 0, policy_version 1720 (0.0025) [2024-08-24 02:02:22,375][01629] Fps is (10 sec: 3684.5, 60 sec: 3822.6, 300 sec: 3887.7). Total num frames: 7045120. Throughput: 0: 956.9. Samples: 1759434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:02:22,382][01629] Avg episode reward: [(0, '24.337')] [2024-08-24 02:02:27,370][01629] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7065600. Throughput: 0: 955.5. Samples: 1765762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:02:27,372][01629] Avg episode reward: [(0, '26.096')] [2024-08-24 02:02:31,566][04334] Updated weights for policy 0, policy_version 1730 (0.0022) [2024-08-24 02:02:32,370][01629] Fps is (10 sec: 4098.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7086080. Throughput: 0: 1004.7. Samples: 1772256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:02:32,374][01629] Avg episode reward: [(0, '24.907')] [2024-08-24 02:02:37,370][01629] Fps is (10 sec: 3276.6, 60 sec: 3754.6, 300 sec: 3860.0). Total num frames: 7098368. Throughput: 0: 977.9. Samples: 1774382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:02:37,374][01629] Avg episode reward: [(0, '24.336')] [2024-08-24 02:02:42,370][01629] Fps is (10 sec: 3686.5, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 7122944. Throughput: 0: 951.6. Samples: 1780112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:02:42,377][01629] Avg episode reward: [(0, '24.804')] [2024-08-24 02:02:42,756][04334] Updated weights for policy 0, policy_version 1740 (0.0029) [2024-08-24 02:02:47,370][01629] Fps is (10 sec: 4915.5, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 7147520. Throughput: 0: 1000.6. Samples: 1787096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:02:47,377][01629] Avg episode reward: [(0, '23.715')] [2024-08-24 02:02:52,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 7159808. Throughput: 0: 999.3. Samples: 1789628. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:02:52,377][01629] Avg episode reward: [(0, '23.389')] [2024-08-24 02:02:54,332][04334] Updated weights for policy 0, policy_version 1750 (0.0037) [2024-08-24 02:02:57,370][01629] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3887.8). Total num frames: 7180288. Throughput: 0: 944.3. Samples: 1794052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:02:57,374][01629] Avg episode reward: [(0, '23.337')] [2024-08-24 02:03:02,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 7200768. Throughput: 0: 973.3. Samples: 1801100. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-08-24 02:03:02,373][01629] Avg episode reward: [(0, '24.155')] [2024-08-24 02:03:03,368][04334] Updated weights for policy 0, policy_version 1760 (0.0032) [2024-08-24 02:03:07,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 7221248. Throughput: 0: 1004.6. Samples: 1804638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:03:07,376][01629] Avg episode reward: [(0, '23.514')] [2024-08-24 02:03:12,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 7237632. Throughput: 0: 962.0. Samples: 1809054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:03:12,379][01629] Avg episode reward: [(0, '22.695')] [2024-08-24 02:03:12,396][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001767_7237632.pth... [2024-08-24 02:03:12,532][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001539_6303744.pth [2024-08-24 02:03:15,064][04334] Updated weights for policy 0, policy_version 1770 (0.0017) [2024-08-24 02:03:17,370][01629] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 7258112. Throughput: 0: 957.5. Samples: 1815344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 02:03:17,376][01629] Avg episode reward: [(0, '24.381')] [2024-08-24 02:03:22,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.8, 300 sec: 3887.7). Total num frames: 7282688. Throughput: 0: 988.5. Samples: 1818862. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:03:22,378][01629] Avg episode reward: [(0, '23.764')] [2024-08-24 02:03:24,484][04334] Updated weights for policy 0, policy_version 1780 (0.0016) [2024-08-24 02:03:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.2). Total num frames: 7294976. Throughput: 0: 979.6. Samples: 1824196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:03:27,372][01629] Avg episode reward: [(0, '23.701')] [2024-08-24 02:03:32,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 7315456. Throughput: 0: 934.8. Samples: 1829162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:03:32,372][01629] Avg episode reward: [(0, '23.865')] [2024-08-24 02:03:35,798][04334] Updated weights for policy 0, policy_version 1790 (0.0012) [2024-08-24 02:03:37,370][01629] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7335936. Throughput: 0: 953.3. Samples: 1832526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:03:37,376][01629] Avg episode reward: [(0, '24.498')] [2024-08-24 02:03:42,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 7356416. Throughput: 0: 998.2. Samples: 1838970. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:03:42,373][01629] Avg episode reward: [(0, '24.159')] [2024-08-24 02:03:47,231][04334] Updated weights for policy 0, policy_version 1800 (0.0016) [2024-08-24 02:03:47,370][01629] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 7372800. Throughput: 0: 937.3. Samples: 1843278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:03:47,372][01629] Avg episode reward: [(0, '24.011')] [2024-08-24 02:03:52,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 7393280. Throughput: 0: 934.9. Samples: 1846708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:03:52,372][01629] Avg episode reward: [(0, '23.727')] [2024-08-24 02:03:56,483][04334] Updated weights for policy 0, policy_version 1810 (0.0012) [2024-08-24 02:03:57,375][01629] Fps is (10 sec: 4093.8, 60 sec: 3890.9, 300 sec: 3873.8). Total num frames: 7413760. Throughput: 0: 985.4. Samples: 1853402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:03:57,379][01629] Avg episode reward: [(0, '23.440')] [2024-08-24 02:04:02,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 7430144. Throughput: 0: 951.1. Samples: 1858142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:04:02,375][01629] Avg episode reward: [(0, '23.209')] [2024-08-24 02:04:07,370][01629] Fps is (10 sec: 3688.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 7450624. Throughput: 0: 929.4. Samples: 1860686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:04:07,372][01629] Avg episode reward: [(0, '23.790')] [2024-08-24 02:04:07,980][04334] Updated weights for policy 0, policy_version 1820 (0.0020) [2024-08-24 02:04:12,372][01629] Fps is (10 sec: 4094.9, 60 sec: 3891.0, 300 sec: 3873.8). Total num frames: 7471104. Throughput: 0: 964.9. Samples: 1867618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:04:12,375][01629] Avg episode reward: [(0, '24.635')] [2024-08-24 02:04:17,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 7491584. Throughput: 0: 981.1. Samples: 1873312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:04:17,372][01629] Avg episode reward: [(0, '23.295')] [2024-08-24 02:04:18,607][04334] Updated weights for policy 0, policy_version 1830 (0.0017) [2024-08-24 02:04:22,370][01629] Fps is (10 sec: 3687.4, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 7507968. Throughput: 0: 954.9. Samples: 1875496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:04:22,375][01629] Avg episode reward: [(0, '24.594')] [2024-08-24 02:04:27,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 7528448. Throughput: 0: 951.5. Samples: 1881786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:04:27,374][01629] Avg episode reward: [(0, '23.667')] [2024-08-24 02:04:28,465][04334] Updated weights for policy 0, policy_version 1840 (0.0012) [2024-08-24 02:04:32,373][01629] Fps is (10 sec: 4094.5, 60 sec: 3891.0, 300 sec: 3873.8). Total num frames: 7548928. Throughput: 0: 1002.1. Samples: 1888376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:04:32,377][01629] Avg episode reward: [(0, '21.917')] [2024-08-24 02:04:37,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 7565312. Throughput: 0: 971.6. Samples: 1890432. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:04:37,372][01629] Avg episode reward: [(0, '21.284')] [2024-08-24 02:04:40,047][04334] Updated weights for policy 0, policy_version 1850 (0.0021) [2024-08-24 02:04:42,370][01629] Fps is (10 sec: 3687.5, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 7585792. Throughput: 0: 947.3. Samples: 1896028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:04:42,372][01629] Avg episode reward: [(0, '21.351')] [2024-08-24 02:04:47,370][01629] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7610368. Throughput: 0: 996.2. Samples: 1902972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 02:04:47,377][01629] Avg episode reward: [(0, '20.973')] [2024-08-24 02:04:49,185][04334] Updated weights for policy 0, policy_version 1860 (0.0012) [2024-08-24 02:04:52,371][01629] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 7626752. Throughput: 0: 1002.0. Samples: 1905776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:04:52,374][01629] Avg episode reward: [(0, '21.103')] [2024-08-24 02:04:57,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3823.3, 300 sec: 3873.8). Total num frames: 7643136. Throughput: 0: 943.8. Samples: 1910088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 02:04:57,372][01629] Avg episode reward: [(0, '20.938')] [2024-08-24 02:05:00,549][04334] Updated weights for policy 0, policy_version 1870 (0.0023) [2024-08-24 02:05:02,370][01629] Fps is (10 sec: 4096.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7667712. Throughput: 0: 972.5. Samples: 1917074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 02:05:02,376][01629] Avg episode reward: [(0, '21.088')] [2024-08-24 02:05:07,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7688192. Throughput: 0: 1000.8. Samples: 1920532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:05:07,372][01629] Avg episode reward: [(0, '24.501')] [2024-08-24 02:05:11,470][04334] Updated weights for policy 0, policy_version 1880 (0.0020) [2024-08-24 02:05:12,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3860.0). Total num frames: 7700480. Throughput: 0: 963.0. Samples: 1925122. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-08-24 02:05:12,378][01629] Avg episode reward: [(0, '24.886')] [2024-08-24 02:05:12,388][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001880_7700480.pth... [2024-08-24 02:05:12,573][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001653_6770688.pth [2024-08-24 02:05:17,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 7725056. Throughput: 0: 952.0. Samples: 1931212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:05:17,371][01629] Avg episode reward: [(0, '24.177')] [2024-08-24 02:05:21,011][04334] Updated weights for policy 0, policy_version 1890 (0.0020) [2024-08-24 02:05:22,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7745536. Throughput: 0: 983.9. Samples: 1934708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:05:22,376][01629] Avg episode reward: [(0, '25.513')] [2024-08-24 02:05:27,371][01629] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 7761920. Throughput: 0: 985.3. Samples: 1940368. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:05:27,380][01629] Avg episode reward: [(0, '24.828')] [2024-08-24 02:05:32,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3873.8). Total num frames: 7778304. Throughput: 0: 942.7. Samples: 1945392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:05:32,379][01629] Avg episode reward: [(0, '23.392')] [2024-08-24 02:05:32,628][04334] Updated weights for policy 0, policy_version 1900 (0.0027) [2024-08-24 02:05:37,370][01629] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7802880. Throughput: 0: 955.1. Samples: 1948754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:05:37,372][01629] Avg episode reward: [(0, '21.337')] [2024-08-24 02:05:42,171][04334] Updated weights for policy 0, policy_version 1910 (0.0020) [2024-08-24 02:05:42,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 7823360. Throughput: 0: 1008.3. Samples: 1955460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:05:42,378][01629] Avg episode reward: [(0, '21.470')] [2024-08-24 02:05:47,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 7835648. Throughput: 0: 950.6. Samples: 1959850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-08-24 02:05:47,373][01629] Avg episode reward: [(0, '21.595')] [2024-08-24 02:05:52,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 7860224. Throughput: 0: 945.2. Samples: 1963066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:05:52,372][01629] Avg episode reward: [(0, '22.543')] [2024-08-24 02:05:53,050][04334] Updated weights for policy 0, policy_version 1920 (0.0018) [2024-08-24 02:05:57,370][01629] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 7880704. Throughput: 0: 993.8. Samples: 1969842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:05:57,379][01629] Avg episode reward: [(0, '22.046')] [2024-08-24 02:06:02,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 7897088. Throughput: 0: 968.4. Samples: 1974790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-08-24 02:06:02,376][01629] Avg episode reward: [(0, '22.306')] [2024-08-24 02:06:04,738][04334] Updated weights for policy 0, policy_version 1930 (0.0039) [2024-08-24 02:06:07,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 7917568. Throughput: 0: 942.7. Samples: 1977128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:06:07,377][01629] Avg episode reward: [(0, '23.173')] [2024-08-24 02:06:12,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 7938048. Throughput: 0: 971.9. Samples: 1984102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:06:12,377][01629] Avg episode reward: [(0, '22.662')] [2024-08-24 02:06:13,633][04334] Updated weights for policy 0, policy_version 1940 (0.0019) [2024-08-24 02:06:17,370][01629] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 7954432. Throughput: 0: 989.2. Samples: 1989906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-08-24 02:06:17,378][01629] Avg episode reward: [(0, '23.667')] [2024-08-24 02:06:22,370][01629] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 7970816. Throughput: 0: 961.5. Samples: 1992020. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-08-24 02:06:22,372][01629] Avg episode reward: [(0, '23.432')] [2024-08-24 02:06:25,145][04334] Updated weights for policy 0, policy_version 1950 (0.0016) [2024-08-24 02:06:27,370][01629] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 7995392. Throughput: 0: 952.2. Samples: 1998310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-08-24 02:06:27,374][01629] Avg episode reward: [(0, '22.792')] [2024-08-24 02:06:29,722][04321] Stopping Batcher_0... [2024-08-24 02:06:29,723][04321] Loop batcher_evt_loop terminating... [2024-08-24 02:06:29,724][01629] Component Batcher_0 stopped! [2024-08-24 02:06:29,741][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2024-08-24 02:06:29,765][04334] Weights refcount: 2 0 [2024-08-24 02:06:29,770][04334] Stopping InferenceWorker_p0-w0... [2024-08-24 02:06:29,771][04334] Loop inference_proc0-0_evt_loop terminating... [2024-08-24 02:06:29,771][01629] Component InferenceWorker_p0-w0 stopped! [2024-08-24 02:06:29,878][04321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001767_7237632.pth [2024-08-24 02:06:29,897][04321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2024-08-24 02:06:30,081][01629] Component LearnerWorker_p0 stopped! [2024-08-24 02:06:30,088][04321] Stopping LearnerWorker_p0... [2024-08-24 02:06:30,089][04321] Loop learner_proc0_evt_loop terminating... [2024-08-24 02:06:30,154][04343] Stopping RolloutWorker_w5... [2024-08-24 02:06:30,154][04343] Loop rollout_proc5_evt_loop terminating... [2024-08-24 02:06:30,154][01629] Component RolloutWorker_w5 stopped! [2024-08-24 02:06:30,179][04336] Stopping RolloutWorker_w1... [2024-08-24 02:06:30,183][04336] Loop rollout_proc1_evt_loop terminating... [2024-08-24 02:06:30,179][01629] Component RolloutWorker_w1 stopped! [2024-08-24 02:06:30,217][04338] Stopping RolloutWorker_w3... [2024-08-24 02:06:30,218][01629] Component RolloutWorker_w3 stopped! [2024-08-24 02:06:30,217][04338] Loop rollout_proc3_evt_loop terminating... [2024-08-24 02:06:30,230][01629] Component RolloutWorker_w2 stopped! [2024-08-24 02:06:30,237][04337] Stopping RolloutWorker_w2... [2024-08-24 02:06:30,238][04337] Loop rollout_proc2_evt_loop terminating... [2024-08-24 02:06:30,258][04346] Stopping RolloutWorker_w7... [2024-08-24 02:06:30,259][04346] Loop rollout_proc7_evt_loop terminating... [2024-08-24 02:06:30,253][01629] Component RolloutWorker_w0 stopped! [2024-08-24 02:06:30,260][04335] Stopping RolloutWorker_w0... [2024-08-24 02:06:30,273][04344] Stopping RolloutWorker_w4... [2024-08-24 02:06:30,278][01629] Component RolloutWorker_w7 stopped! [2024-08-24 02:06:30,280][01629] Component RolloutWorker_w4 stopped! [2024-08-24 02:06:30,292][04345] Stopping RolloutWorker_w6... [2024-08-24 02:06:30,279][04344] Loop rollout_proc4_evt_loop terminating... [2024-08-24 02:06:30,280][04335] Loop rollout_proc0_evt_loop terminating... [2024-08-24 02:06:30,292][01629] Component RolloutWorker_w6 stopped! [2024-08-24 02:06:30,293][01629] Waiting for process learner_proc0 to stop... [2024-08-24 02:06:30,304][04345] Loop rollout_proc6_evt_loop terminating... [2024-08-24 02:06:31,673][01629] Waiting for process inference_proc0-0 to join... [2024-08-24 02:06:31,952][01629] Waiting for process rollout_proc0 to join... [2024-08-24 02:06:33,926][01629] Waiting for process rollout_proc1 to join... 
[2024-08-24 02:06:33,935][01629] Waiting for process rollout_proc2 to join... [2024-08-24 02:06:33,940][01629] Waiting for process rollout_proc3 to join... [2024-08-24 02:06:33,943][01629] Waiting for process rollout_proc4 to join... [2024-08-24 02:06:33,949][01629] Waiting for process rollout_proc5 to join... [2024-08-24 02:06:33,954][01629] Waiting for process rollout_proc6 to join... [2024-08-24 02:06:33,958][01629] Waiting for process rollout_proc7 to join... [2024-08-24 02:06:33,962][01629] Batcher 0 profile tree view: batching: 51.5623, releasing_batches: 0.0472 [2024-08-24 02:06:33,967][01629] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0054 wait_policy_total: 935.7146 update_model: 15.5899 weight_update: 0.0012 one_step: 0.0023 handle_policy_step: 1055.0999 deserialize: 29.7189, stack: 5.8373, obs_to_device_normalize: 230.5561, forward: 518.1148, send_messages: 53.4938 prepare_outputs: 162.9249 to_cpu: 101.1999 [2024-08-24 02:06:33,974][01629] Learner 0 profile tree view: misc: 0.0105, prepare_batch: 27.4964 train: 145.9590 epoch_init: 0.0141, minibatch_init: 0.0143, losses_postprocess: 1.2676, kl_divergence: 1.2168, after_optimizer: 66.7268 calculate_losses: 48.5607 losses_init: 0.0125, forward_head: 3.1232, bptt_initial: 31.2669, tail: 2.1440, advantages_returns: 0.5368, losses: 6.4627 bptt: 4.3783 bptt_forward_core: 4.2303 update: 27.0924 clip: 2.9025 [2024-08-24 02:06:33,976][01629] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.6929, enqueue_policy_requests: 229.5247, env_step: 1630.6699, overhead: 26.1541, complete_rollouts: 14.1748 save_policy_outputs: 48.3752 split_output_tensors: 16.5344 [2024-08-24 02:06:33,978][01629] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.5810, enqueue_policy_requests: 233.2167, env_step: 1625.4271, overhead: 27.3267, complete_rollouts: 13.0617 save_policy_outputs: 48.3434 split_output_tensors: 16.5505 [2024-08-24 02:06:33,980][01629] Loop Runner_EvtLoop terminating... [2024-08-24 02:06:33,984][01629] Runner profile tree view: main_loop: 2115.2917 [2024-08-24 02:06:33,985][01629] Collected {0: 8007680}, FPS: 3785.6 [2024-08-24 02:06:34,291][01629] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-24 02:06:34,294][01629] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-24 02:06:34,296][01629] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-24 02:06:34,298][01629] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-24 02:06:34,300][01629] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-24 02:06:34,301][01629] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-24 02:06:34,303][01629] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-08-24 02:06:34,304][01629] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-24 02:06:34,306][01629] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-08-24 02:06:34,308][01629] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-08-24 02:06:34,309][01629] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-24 02:06:34,311][01629] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
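A few entries above, the runner summary reports "main_loop: 2115.2917" seconds and "Collected {0: 8007680}, FPS: 3785.6". Those numbers are consistent with the overall throughput simply being total collected environment frames divided by the wall-clock duration of the main loop. A minimal sanity check of that arithmetic (illustrative only, not Sample Factory code):

```python
# Consistency check on the runner summary logged above.
total_frames = 8_007_680          # "Collected {0: 8007680}"
main_loop_seconds = 2115.2917     # "Runner profile tree view: main_loop: 2115.2917"

fps = total_frames / main_loop_seconds
print(f"FPS: {fps:.1f}")          # prints "FPS: 3785.6", matching the log
```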
[2024-08-24 02:06:34,312][01629] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-24 02:06:34,314][01629] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-24 02:06:34,315][01629] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-24 02:06:34,342][01629] Doom resolution: 160x120, resize resolution: (128, 72) [2024-08-24 02:06:34,345][01629] RunningMeanStd input shape: (3, 72, 128) [2024-08-24 02:06:34,348][01629] RunningMeanStd input shape: (1,) [2024-08-24 02:06:34,371][01629] ConvEncoder: input_channels=3 [2024-08-24 02:06:34,558][01629] Conv encoder output size: 512 [2024-08-24 02:06:34,561][01629] Policy head output size: 512 [2024-08-24 02:06:36,343][01629] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2024-08-24 02:06:37,204][01629] Num frames 100... [2024-08-24 02:06:37,332][01629] Num frames 200... [2024-08-24 02:06:37,454][01629] Num frames 300... [2024-08-24 02:06:37,571][01629] Num frames 400... [2024-08-24 02:06:37,695][01629] Num frames 500... [2024-08-24 02:06:37,822][01629] Num frames 600... [2024-08-24 02:06:37,953][01629] Num frames 700... [2024-08-24 02:06:38,070][01629] Num frames 800... [2024-08-24 02:06:38,189][01629] Num frames 900... [2024-08-24 02:06:38,354][01629] Avg episode rewards: #0: 20.920, true rewards: #0: 9.920 [2024-08-24 02:06:38,356][01629] Avg episode reward: 20.920, avg true_objective: 9.920 [2024-08-24 02:06:38,369][01629] Num frames 1000... [2024-08-24 02:06:38,487][01629] Num frames 1100... [2024-08-24 02:06:38,611][01629] Num frames 1200... [2024-08-24 02:06:38,731][01629] Num frames 1300... [2024-08-24 02:06:38,855][01629] Num frames 1400... [2024-08-24 02:06:38,985][01629] Num frames 1500... [2024-08-24 02:06:39,104][01629] Num frames 1600... [2024-08-24 02:06:39,225][01629] Num frames 1700... [2024-08-24 02:06:39,348][01629] Num frames 1800... [2024-08-24 02:06:39,470][01629] Num frames 1900... [2024-08-24 02:06:39,590][01629] Num frames 2000... [2024-08-24 02:06:39,710][01629] Num frames 2100... [2024-08-24 02:06:39,830][01629] Num frames 2200... [2024-08-24 02:06:39,972][01629] Num frames 2300... [2024-08-24 02:06:40,100][01629] Num frames 2400... [2024-08-24 02:06:40,222][01629] Num frames 2500... [2024-08-24 02:06:40,341][01629] Num frames 2600... [2024-08-24 02:06:40,462][01629] Num frames 2700... [2024-08-24 02:06:40,582][01629] Num frames 2800... [2024-08-24 02:06:40,700][01629] Num frames 2900... [2024-08-24 02:06:40,822][01629] Num frames 3000... [2024-08-24 02:06:40,999][01629] Avg episode rewards: #0: 40.959, true rewards: #0: 15.460 [2024-08-24 02:06:41,001][01629] Avg episode reward: 40.959, avg true_objective: 15.460 [2024-08-24 02:06:41,014][01629] Num frames 3100... [2024-08-24 02:06:41,131][01629] Num frames 3200... [2024-08-24 02:06:41,250][01629] Num frames 3300... [2024-08-24 02:06:41,371][01629] Num frames 3400... [2024-08-24 02:06:41,487][01629] Num frames 3500... [2024-08-24 02:06:41,609][01629] Num frames 3600... [2024-08-24 02:06:41,727][01629] Num frames 3700... [2024-08-24 02:06:41,848][01629] Num frames 3800... [2024-08-24 02:06:41,983][01629] Num frames 3900... [2024-08-24 02:06:42,099][01629] Num frames 4000... [2024-08-24 02:06:42,217][01629] Avg episode rewards: #0: 33.506, true rewards: #0: 13.507 [2024-08-24 02:06:42,219][01629] Avg episode reward: 33.506, avg true_objective: 13.507 [2024-08-24 02:06:42,276][01629] Num frames 4100... 
[2024-08-24 02:06:42,396][01629] Num frames 4200... [2024-08-24 02:06:42,517][01629] Num frames 4300... [2024-08-24 02:06:42,636][01629] Num frames 4400... [2024-08-24 02:06:42,752][01629] Num frames 4500... [2024-08-24 02:06:42,883][01629] Num frames 4600... [2024-08-24 02:06:43,014][01629] Num frames 4700... [2024-08-24 02:06:43,136][01629] Num frames 4800... [2024-08-24 02:06:43,250][01629] Num frames 4900... [2024-08-24 02:06:43,365][01629] Num frames 5000... [2024-08-24 02:06:43,486][01629] Num frames 5100... [2024-08-24 02:06:43,607][01629] Num frames 5200... [2024-08-24 02:06:43,728][01629] Num frames 5300... [2024-08-24 02:06:43,857][01629] Avg episode rewards: #0: 32.410, true rewards: #0: 13.410 [2024-08-24 02:06:43,859][01629] Avg episode reward: 32.410, avg true_objective: 13.410 [2024-08-24 02:06:43,909][01629] Num frames 5400... [2024-08-24 02:06:44,033][01629] Num frames 5500... [2024-08-24 02:06:44,150][01629] Num frames 5600... [2024-08-24 02:06:44,271][01629] Num frames 5700... [2024-08-24 02:06:44,392][01629] Num frames 5800... [2024-08-24 02:06:44,498][01629] Avg episode rewards: #0: 28.088, true rewards: #0: 11.688 [2024-08-24 02:06:44,499][01629] Avg episode reward: 28.088, avg true_objective: 11.688 [2024-08-24 02:06:44,567][01629] Num frames 5900... [2024-08-24 02:06:44,687][01629] Num frames 6000... [2024-08-24 02:06:44,812][01629] Num frames 6100... [2024-08-24 02:06:44,936][01629] Num frames 6200... [2024-08-24 02:06:45,064][01629] Num frames 6300... [2024-08-24 02:06:45,217][01629] Num frames 6400... [2024-08-24 02:06:45,378][01629] Num frames 6500... [2024-08-24 02:06:45,510][01629] Avg episode rewards: #0: 25.247, true rewards: #0: 10.913 [2024-08-24 02:06:45,512][01629] Avg episode reward: 25.247, avg true_objective: 10.913 [2024-08-24 02:06:45,598][01629] Num frames 6600... [2024-08-24 02:06:45,762][01629] Num frames 6700... [2024-08-24 02:06:45,931][01629] Num frames 6800... [2024-08-24 02:06:46,091][01629] Num frames 6900... [2024-08-24 02:06:46,246][01629] Num frames 7000... [2024-08-24 02:06:46,409][01629] Num frames 7100... [2024-08-24 02:06:46,588][01629] Num frames 7200... [2024-08-24 02:06:46,751][01629] Num frames 7300... [2024-08-24 02:06:46,925][01629] Num frames 7400... [2024-08-24 02:06:47,099][01629] Num frames 7500... [2024-08-24 02:06:47,216][01629] Avg episode rewards: #0: 24.757, true rewards: #0: 10.757 [2024-08-24 02:06:47,218][01629] Avg episode reward: 24.757, avg true_objective: 10.757 [2024-08-24 02:06:47,335][01629] Num frames 7600... [2024-08-24 02:06:47,507][01629] Num frames 7700... [2024-08-24 02:06:47,642][01629] Num frames 7800... [2024-08-24 02:06:47,758][01629] Num frames 7900... [2024-08-24 02:06:47,883][01629] Num frames 8000... [2024-08-24 02:06:48,009][01629] Num frames 8100... [2024-08-24 02:06:48,115][01629] Avg episode rewards: #0: 23.172, true rewards: #0: 10.172 [2024-08-24 02:06:48,117][01629] Avg episode reward: 23.172, avg true_objective: 10.172 [2024-08-24 02:06:48,203][01629] Num frames 8200... [2024-08-24 02:06:48,324][01629] Num frames 8300... [2024-08-24 02:06:48,444][01629] Num frames 8400... [2024-08-24 02:06:48,567][01629] Num frames 8500... [2024-08-24 02:06:48,687][01629] Num frames 8600... [2024-08-24 02:06:48,811][01629] Num frames 8700... [2024-08-24 02:06:48,938][01629] Num frames 8800... [2024-08-24 02:06:49,058][01629] Num frames 8900... [2024-08-24 02:06:49,182][01629] Num frames 9000... [2024-08-24 02:06:49,304][01629] Num frames 9100... [2024-08-24 02:06:49,427][01629] Num frames 9200... 
[2024-08-24 02:06:49,556][01629] Num frames 9300... [2024-08-24 02:06:49,672][01629] Num frames 9400... [2024-08-24 02:06:49,789][01629] Num frames 9500... [2024-08-24 02:06:49,914][01629] Num frames 9600... [2024-08-24 02:06:50,018][01629] Avg episode rewards: #0: 23.824, true rewards: #0: 10.713 [2024-08-24 02:06:50,020][01629] Avg episode reward: 23.824, avg true_objective: 10.713 [2024-08-24 02:06:50,089][01629] Num frames 9700... [2024-08-24 02:06:50,211][01629] Num frames 9800... [2024-08-24 02:06:50,330][01629] Num frames 9900... [2024-08-24 02:06:50,465][01629] Avg episode rewards: #0: 21.864, true rewards: #0: 9.964 [2024-08-24 02:06:50,466][01629] Avg episode reward: 21.864, avg true_objective: 9.964 [2024-08-24 02:07:45,300][01629] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-08-24 02:09:03,139][01629] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-24 02:09:03,141][01629] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-24 02:09:03,143][01629] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-24 02:09:03,145][01629] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-24 02:09:03,147][01629] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-24 02:09:03,149][01629] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-24 02:09:03,150][01629] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-08-24 02:09:03,152][01629] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-24 02:09:03,153][01629] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-08-24 02:09:03,154][01629] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-08-24 02:09:03,155][01629] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-24 02:09:03,156][01629] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-24 02:09:03,157][01629] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-24 02:09:03,158][01629] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-24 02:09:03,160][01629] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-24 02:09:03,169][01629] RunningMeanStd input shape: (3, 72, 128) [2024-08-24 02:09:03,177][01629] RunningMeanStd input shape: (1,) [2024-08-24 02:09:03,189][01629] ConvEncoder: input_channels=3 [2024-08-24 02:09:03,225][01629] Conv encoder output size: 512 [2024-08-24 02:09:03,226][01629] Policy head output size: 512 [2024-08-24 02:09:03,246][01629] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2024-08-24 02:09:15,003][01629] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-24 02:09:15,004][01629] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-24 02:09:15,007][01629] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-24 02:09:15,008][01629] Adding new argument 'save_video'=True that is not in the saved config file! 
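Both evaluation passes in this log reload the saved config.json, override any arguments also passed on the command line ("Overriding arg ..."), and append evaluation-only arguments that were never part of the training config ("Adding new argument ... that is not in the saved config file!"). The sketch below is a hypothetical illustration of that merge behaviour, inferred from the log messages; `load_eval_config` and `cli_overrides` are placeholder names, not Sample Factory APIs.

```python
import json

def load_eval_config(config_path: str, cli_overrides: dict) -> dict:
    """Merge a saved training config with evaluation-time CLI arguments (sketch)."""
    with open(config_path) as f:
        cfg = json.load(f)  # configuration saved at training time
    for key, value in cli_overrides.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value  # command-line value always wins
    return cfg

# Example mirroring the second evaluation pass in the log (path/values taken from the log):
# cfg = load_eval_config("/content/train_dir/default_experiment/config.json",
#                        {"num_workers": 1, "no_render": True, "save_video": True,
#                         "max_num_episodes": 10, "push_to_hub": True})
```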
[2024-08-24 02:09:15,010][01629] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-24 02:09:15,012][01629] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-24 02:09:15,013][01629] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-08-24 02:09:15,015][01629] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-24 02:09:15,017][01629] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-08-24 02:09:15,019][01629] Adding new argument 'hf_repository'='electricwapiti/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-08-24 02:09:15,020][01629] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-24 02:09:15,021][01629] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-24 02:09:15,024][01629] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-24 02:09:15,025][01629] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-24 02:09:15,027][01629] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-24 02:09:15,041][01629] RunningMeanStd input shape: (3, 72, 128) [2024-08-24 02:09:15,043][01629] RunningMeanStd input shape: (1,) [2024-08-24 02:09:15,055][01629] ConvEncoder: input_channels=3 [2024-08-24 02:09:15,092][01629] Conv encoder output size: 512 [2024-08-24 02:09:15,093][01629] Policy head output size: 512 [2024-08-24 02:09:15,114][01629] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... [2024-08-24 02:09:15,590][01629] Num frames 100... [2024-08-24 02:09:15,712][01629] Num frames 200... [2024-08-24 02:09:15,843][01629] Num frames 300... [2024-08-24 02:09:15,991][01629] Num frames 400... [2024-08-24 02:09:16,108][01629] Num frames 500... [2024-08-24 02:09:16,226][01629] Num frames 600... [2024-08-24 02:09:16,375][01629] Num frames 700... [2024-08-24 02:09:16,439][01629] Avg episode rewards: #0: 14.040, true rewards: #0: 7.040 [2024-08-24 02:09:16,441][01629] Avg episode reward: 14.040, avg true_objective: 7.040 [2024-08-24 02:09:16,560][01629] Num frames 800... [2024-08-24 02:09:16,679][01629] Num frames 900... [2024-08-24 02:09:16,801][01629] Num frames 1000... [2024-08-24 02:09:16,938][01629] Num frames 1100... [2024-08-24 02:09:17,062][01629] Num frames 1200... [2024-08-24 02:09:17,185][01629] Num frames 1300... [2024-08-24 02:09:17,308][01629] Num frames 1400... [2024-08-24 02:09:17,427][01629] Avg episode rewards: #0: 15.750, true rewards: #0: 7.250 [2024-08-24 02:09:17,429][01629] Avg episode reward: 15.750, avg true_objective: 7.250 [2024-08-24 02:09:17,496][01629] Num frames 1500... [2024-08-24 02:09:17,616][01629] Num frames 1600... [2024-08-24 02:09:17,733][01629] Num frames 1700... [2024-08-24 02:09:17,862][01629] Num frames 1800... [2024-08-24 02:09:17,991][01629] Num frames 1900... [2024-08-24 02:09:18,162][01629] Avg episode rewards: #0: 15.320, true rewards: #0: 6.653 [2024-08-24 02:09:18,164][01629] Avg episode reward: 15.320, avg true_objective: 6.653 [2024-08-24 02:09:18,172][01629] Num frames 2000... [2024-08-24 02:09:18,289][01629] Num frames 2100... [2024-08-24 02:09:18,418][01629] Num frames 2200... 
[2024-08-24 02:09:18,580][01629] Avg episode rewards: #0: 12.478, true rewards: #0: 5.727 [2024-08-24 02:09:18,581][01629] Avg episode reward: 12.478, avg true_objective: 5.727 [2024-08-24 02:09:18,595][01629] Num frames 2300... [2024-08-24 02:09:18,710][01629] Num frames 2400... [2024-08-24 02:09:18,826][01629] Num frames 2500... [2024-08-24 02:09:18,957][01629] Num frames 2600... [2024-08-24 02:09:19,079][01629] Num frames 2700... [2024-08-24 02:09:19,195][01629] Num frames 2800... [2024-08-24 02:09:19,309][01629] Num frames 2900... [2024-08-24 02:09:19,426][01629] Num frames 3000... [2024-08-24 02:09:19,547][01629] Num frames 3100... [2024-08-24 02:09:19,607][01629] Avg episode rewards: #0: 13.006, true rewards: #0: 6.206 [2024-08-24 02:09:19,608][01629] Avg episode reward: 13.006, avg true_objective: 6.206 [2024-08-24 02:09:19,723][01629] Num frames 3200... [2024-08-24 02:09:19,841][01629] Num frames 3300... [2024-08-24 02:09:19,974][01629] Num frames 3400... [2024-08-24 02:09:20,091][01629] Num frames 3500... [2024-08-24 02:09:20,208][01629] Num frames 3600... [2024-08-24 02:09:20,329][01629] Num frames 3700... [2024-08-24 02:09:20,491][01629] Num frames 3800... [2024-08-24 02:09:20,612][01629] Avg episode rewards: #0: 13.065, true rewards: #0: 6.398 [2024-08-24 02:09:20,616][01629] Avg episode reward: 13.065, avg true_objective: 6.398 [2024-08-24 02:09:20,757][01629] Num frames 3900... [2024-08-24 02:09:21,030][01629] Num frames 4000... [2024-08-24 02:09:21,239][01629] Num frames 4100... [2024-08-24 02:09:21,427][01629] Num frames 4200... [2024-08-24 02:09:21,614][01629] Num frames 4300... [2024-08-24 02:09:21,837][01629] Num frames 4400... [2024-08-24 02:09:22,133][01629] Num frames 4500... [2024-08-24 02:09:22,528][01629] Num frames 4600... [2024-08-24 02:09:22,911][01629] Num frames 4700... [2024-08-24 02:09:23,355][01629] Num frames 4800... [2024-08-24 02:09:23,791][01629] Avg episode rewards: #0: 14.564, true rewards: #0: 6.993 [2024-08-24 02:09:23,799][01629] Avg episode reward: 14.564, avg true_objective: 6.993 [2024-08-24 02:09:23,830][01629] Num frames 4900... [2024-08-24 02:09:24,231][01629] Num frames 5000... [2024-08-24 02:09:24,543][01629] Num frames 5100... [2024-08-24 02:09:24,826][01629] Num frames 5200... [2024-08-24 02:09:24,992][01629] Num frames 5300... [2024-08-24 02:09:25,128][01629] Num frames 5400... [2024-08-24 02:09:25,246][01629] Num frames 5500... [2024-08-24 02:09:25,373][01629] Num frames 5600... [2024-08-24 02:09:25,495][01629] Num frames 5700... [2024-08-24 02:09:25,613][01629] Num frames 5800... [2024-08-24 02:09:25,733][01629] Num frames 5900... [2024-08-24 02:09:25,848][01629] Num frames 6000... [2024-08-24 02:09:25,974][01629] Num frames 6100... [2024-08-24 02:09:26,094][01629] Num frames 6200... [2024-08-24 02:09:26,223][01629] Num frames 6300... [2024-08-24 02:09:26,346][01629] Num frames 6400... [2024-08-24 02:09:26,465][01629] Num frames 6500... [2024-08-24 02:09:26,586][01629] Num frames 6600... [2024-08-24 02:09:26,660][01629] Avg episode rewards: #0: 18.268, true rewards: #0: 8.267 [2024-08-24 02:09:26,661][01629] Avg episode reward: 18.268, avg true_objective: 8.267 [2024-08-24 02:09:26,765][01629] Num frames 6700... [2024-08-24 02:09:26,893][01629] Num frames 6800... [2024-08-24 02:09:27,012][01629] Num frames 6900... [2024-08-24 02:09:27,128][01629] Num frames 7000... [2024-08-24 02:09:27,281][01629] Num frames 7100... [2024-08-24 02:09:27,398][01629] Num frames 7200... [2024-08-24 02:09:27,514][01629] Num frames 7300... 
[2024-08-24 02:09:27,637][01629] Num frames 7400... [2024-08-24 02:09:27,706][01629] Avg episode rewards: #0: 18.122, true rewards: #0: 8.233 [2024-08-24 02:09:27,708][01629] Avg episode reward: 18.122, avg true_objective: 8.233 [2024-08-24 02:09:27,814][01629] Num frames 7500... [2024-08-24 02:09:27,942][01629] Num frames 7600... [2024-08-24 02:09:28,061][01629] Num frames 7700... [2024-08-24 02:09:28,181][01629] Num frames 7800... [2024-08-24 02:09:28,303][01629] Num frames 7900... [2024-08-24 02:09:28,423][01629] Num frames 8000... [2024-08-24 02:09:28,544][01629] Num frames 8100... [2024-08-24 02:09:28,661][01629] Num frames 8200... [2024-08-24 02:09:28,780][01629] Num frames 8300... [2024-08-24 02:09:28,905][01629] Num frames 8400... [2024-08-24 02:09:29,026][01629] Num frames 8500... [2024-08-24 02:09:29,121][01629] Avg episode rewards: #0: 18.630, true rewards: #0: 8.530 [2024-08-24 02:09:29,122][01629] Avg episode reward: 18.630, avg true_objective: 8.530 [2024-08-24 02:10:17,270][01629] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
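The "Avg episode rewards" / "true rewards" entries printed during evaluation are consistent with a simple running mean over the episodes finished so far: in the final run, the first episode scores 14.040, and the average of 15.750 after two episodes implies a second-episode return of about 17.460. A minimal sketch of that running average (the episode returns below are inferred from the logged averages, not taken directly from the game):

```python
# Running mean of per-episode returns, as suggested by the evaluation log above.
def running_averages(episode_rewards):
    total = 0.0
    for i, reward in enumerate(episode_rewards, start=1):
        total += reward
        yield total / i  # mean over all episodes completed so far

# 14.040 is logged for episode 1; 17.460 is inferred from the 15.750 average after episode 2.
for avg in running_averages([14.040, 17.460]):
    print(f"Avg episode rewards: #0: {avg:.3f}")
# prints 14.040 and 15.750, matching the first two averages of the final evaluation run
```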