[2024-08-31 17:36:31,213][00204] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-08-31 17:36:31,215][00204] Rollout worker 0 uses device cpu
[2024-08-31 17:36:31,217][00204] Rollout worker 1 uses device cpu
[2024-08-31 17:36:31,218][00204] Rollout worker 2 uses device cpu
[2024-08-31 17:36:31,219][00204] Rollout worker 3 uses device cpu
[2024-08-31 17:36:31,221][00204] Rollout worker 4 uses device cpu
[2024-08-31 17:36:31,222][00204] Rollout worker 5 uses device cpu
[2024-08-31 17:36:31,223][00204] Rollout worker 6 uses device cpu
[2024-08-31 17:36:31,224][00204] Rollout worker 7 uses device cpu
[2024-08-31 17:36:31,375][00204] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-31 17:36:31,376][00204] InferenceWorker_p0-w0: min num requests: 2
[2024-08-31 17:36:31,412][00204] Starting all processes...
[2024-08-31 17:36:31,414][00204] Starting process learner_proc0
[2024-08-31 17:36:31,463][00204] Starting all processes...
[2024-08-31 17:36:31,478][00204] Starting process inference_proc0-0
[2024-08-31 17:36:31,478][00204] Starting process rollout_proc0
[2024-08-31 17:36:31,480][00204] Starting process rollout_proc1
[2024-08-31 17:36:31,480][00204] Starting process rollout_proc2
[2024-08-31 17:36:31,480][00204] Starting process rollout_proc3
[2024-08-31 17:36:31,480][00204] Starting process rollout_proc4
[2024-08-31 17:36:31,481][00204] Starting process rollout_proc5
[2024-08-31 17:36:31,481][00204] Starting process rollout_proc6
[2024-08-31 17:36:31,481][00204] Starting process rollout_proc7
[2024-08-31 17:36:42,848][04514] Worker 3 uses CPU cores [1]
[2024-08-31 17:36:43,005][04495] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-31 17:36:43,012][04495] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-08-31 17:36:43,084][04495] Num visible devices: 1
[2024-08-31 17:36:43,112][04512] Worker 5 uses CPU cores [1]
[2024-08-31 17:36:43,127][04495] Starting seed is not provided
[2024-08-31 17:36:43,127][04495] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-31 17:36:43,128][04495] Initializing actor-critic model on device cuda:0
[2024-08-31 17:36:43,129][04495] RunningMeanStd input shape: (3, 72, 128)
[2024-08-31 17:36:43,131][04495] RunningMeanStd input shape: (1,)
[2024-08-31 17:36:43,224][04495] ConvEncoder: input_channels=3
[2024-08-31 17:36:43,253][04509] Worker 0 uses CPU cores [0]
[2024-08-31 17:36:43,273][04513] Worker 4 uses CPU cores [0]
[2024-08-31 17:36:43,271][04508] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-31 17:36:43,277][04510] Worker 1 uses CPU cores [1]
[2024-08-31 17:36:43,274][04508] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-08-31 17:36:43,296][04516] Worker 7 uses CPU cores [1]
[2024-08-31 17:36:43,330][04515] Worker 6 uses CPU cores [0]
[2024-08-31 17:36:43,333][04511] Worker 2 uses CPU cores [0]
[2024-08-31 17:36:43,334][04508] Num visible devices: 1
[2024-08-31 17:36:43,468][04495] Conv encoder output size: 512
[2024-08-31 17:36:43,468][04495] Policy head output size: 512
[2024-08-31 17:36:43,483][04495] Created Actor Critic model with architecture:
[2024-08-31 17:36:43,483][04495] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-08-31 17:36:47,674][04495] Using optimizer
[2024-08-31 17:36:47,675][04495] No checkpoints found
[2024-08-31 17:36:47,675][04495] Did not load from checkpoint, starting from scratch!
[2024-08-31 17:36:47,676][04495] Initialized policy 0 weights for model version 0
[2024-08-31 17:36:47,678][04495] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-08-31 17:36:47,686][04495] LearnerWorker_p0 finished initialization!
[2024-08-31 17:36:47,859][00204] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-31 17:36:47,928][04508] RunningMeanStd input shape: (3, 72, 128)
[2024-08-31 17:36:47,930][04508] RunningMeanStd input shape: (1,)
[2024-08-31 17:36:47,943][04508] ConvEncoder: input_channels=3
[2024-08-31 17:36:48,044][04508] Conv encoder output size: 512
[2024-08-31 17:36:48,044][04508] Policy head output size: 512
[2024-08-31 17:36:49,566][00204] Inference worker 0-0 is ready!
[2024-08-31 17:36:49,568][00204] All inference workers are ready! Signal rollout workers to start!
[2024-08-31 17:36:49,656][04511] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-31 17:36:49,654][04513] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-31 17:36:49,668][04509] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-31 17:36:49,683][04512] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-31 17:36:49,684][04510] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-31 17:36:49,675][04516] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-31 17:36:49,684][04515] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-31 17:36:49,703][04514] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-08-31 17:36:50,894][04516] Decorrelating experience for 0 frames...
[2024-08-31 17:36:50,895][04510] Decorrelating experience for 0 frames...
[2024-08-31 17:36:50,894][04509] Decorrelating experience for 0 frames...
[2024-08-31 17:36:51,262][04509] Decorrelating experience for 32 frames...
[2024-08-31 17:36:51,367][00204] Heartbeat connected on Batcher_0
[2024-08-31 17:36:51,372][00204] Heartbeat connected on LearnerWorker_p0
[2024-08-31 17:36:51,424][00204] Heartbeat connected on InferenceWorker_p0-w0
[2024-08-31 17:36:51,556][04510] Decorrelating experience for 32 frames...
[2024-08-31 17:36:51,569][04516] Decorrelating experience for 32 frames...
[2024-08-31 17:36:52,092][04509] Decorrelating experience for 64 frames...
[2024-08-31 17:36:52,715][04509] Decorrelating experience for 96 frames...
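The architecture dump above maps onto a small PyTorch model: a three-layer ELU conv head over the (3, 72, 128) observation, a linear projection to a 512-dim feature ("Conv encoder output size: 512"), a GRU(512, 512) core, a scalar critic head, and a 5-way action-logits head. The sketch below is a hedged approximation, not Sample Factory's actual classes; the conv filter sizes (8/4, 4/2, 3/2) are assumptions taken from Sample Factory's default VizDoom encoder and are not shown in the log itself.

    import torch
    from torch import nn

    class ActorCriticSketch(nn.Module):
        """Hedged approximation of the printed ActorCriticSharedWeights model."""
        def __init__(self, num_actions: int = 5):
            super().__init__()
            # ConvEncoder: input_channels=3, observations resized to 3 x 72 x 128
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            with torch.no_grad():
                n = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
            # "Conv encoder output size: 512"
            self.mlp_layers = nn.Sequential(nn.Flatten(), nn.Linear(n, 512), nn.ELU())
            # (core): ModelCoreRNN with GRU(512, 512)
            self.core = nn.GRU(512, 512)
            # (critic_linear): Linear(512 -> 1); action head: Linear(512 -> 5)
            self.critic_linear = nn.Linear(512, 1)
            self.distribution_linear = nn.Linear(512, num_actions)

        def forward(self, obs, rnn_state=None):
            x = self.mlp_layers(self.conv_head(obs)).unsqueeze(0)  # seq length 1
            x, rnn_state = self.core(x, rnn_state)
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state

    logits, value, h = ActorCriticSketch()(torch.zeros(4, 3, 72, 128))
    print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([4, 1])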
[2024-08-31 17:36:52,786][04510] Decorrelating experience for 64 frames...
[2024-08-31 17:36:52,784][04516] Decorrelating experience for 64 frames...
[2024-08-31 17:36:52,865][00204] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-31 17:36:52,942][00204] Heartbeat connected on RolloutWorker_w0
[2024-08-31 17:36:53,784][04510] Decorrelating experience for 96 frames...
[2024-08-31 17:36:53,787][04516] Decorrelating experience for 96 frames...
[2024-08-31 17:36:53,968][00204] Heartbeat connected on RolloutWorker_w1
[2024-08-31 17:36:53,979][00204] Heartbeat connected on RolloutWorker_w7
[2024-08-31 17:36:57,859][00204] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1.6. Samples: 16. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-08-31 17:36:57,863][00204] Avg episode reward: [(0, '2.640')]
[2024-08-31 17:36:58,813][04495] Signal inference workers to stop experience collection...
[2024-08-31 17:36:58,833][04508] InferenceWorker_p0-w0: stopping experience collection
[2024-08-31 17:37:00,363][04495] Signal inference workers to resume experience collection...
[2024-08-31 17:37:00,364][04508] InferenceWorker_p0-w0: resuming experience collection
[2024-08-31 17:37:02,859][00204] Fps is (10 sec: 1229.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 194.3. Samples: 2914. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-08-31 17:37:02,865][00204] Avg episode reward: [(0, '3.661')]
[2024-08-31 17:37:07,859][00204] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 395.9. Samples: 7918. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:37:07,865][00204] Avg episode reward: [(0, '4.074')]
[2024-08-31 17:37:11,021][04508] Updated weights for policy 0, policy_version 10 (0.0364)
[2024-08-31 17:37:12,859][00204] Fps is (10 sec: 3276.8, 60 sec: 1802.3, 300 sec: 1802.3). Total num frames: 45056. Throughput: 0: 398.3. Samples: 9958. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:37:12,865][00204] Avg episode reward: [(0, '4.389')]
[2024-08-31 17:37:17,859][00204] Fps is (10 sec: 3686.4, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 540.7. Samples: 16222. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:37:17,863][00204] Avg episode reward: [(0, '4.415')]
[2024-08-31 17:37:21,321][04508] Updated weights for policy 0, policy_version 20 (0.0012)
[2024-08-31 17:37:22,859][00204] Fps is (10 sec: 3686.4, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 613.0. Samples: 21454. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:37:22,865][00204] Avg episode reward: [(0, '4.488')]
[2024-08-31 17:37:27,859][00204] Fps is (10 sec: 3686.4, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 592.4. Samples: 23696. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:37:27,861][00204] Avg episode reward: [(0, '4.394')]
[2024-08-31 17:37:27,869][04495] Saving new best policy, reward=4.394!
[2024-08-31 17:37:32,749][04508] Updated weights for policy 0, policy_version 30 (0.0012)
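Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" line reports environment frames per wall-clock second averaged over three trailing windows, clipped to the history available so far: the 17:37:02 report shows 12288 frames roughly 15 seconds after the first report, i.e. about 1229 FPS over the last 10 seconds and 819.2 over the whole run. A minimal sketch of such windowed counters (illustrative, not Sample Factory's reporting code):

    import time
    from collections import deque

    class FpsWindows:
        """Trailing-window FPS, illustrative of the '10 sec / 60 sec / 300 sec' readout."""
        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.samples = deque()  # (timestamp, total_env_frames) pairs

        def record(self, total_frames: float) -> None:
            now = time.time()
            self.samples.append((now, total_frames))
            # keep only the history the largest window needs
            while now - self.samples[0][0] > max(self.windows):
                self.samples.popleft()

        def fps(self) -> dict:
            """Frames per second for each window; call record() at least once first."""
            now, latest = self.samples[-1]
            out = {}
            for w in self.windows:
                inside = [(t, f) for t, f in self.samples if now - t <= w]
                t0, f0 = inside[0]  # oldest sample still inside this window
                out[w] = (latest - f0) / (now - t0) if now > t0 else float("nan")
            return out

The "nan" values in the very first report correspond to the single-sample case, before any elapsed time can be measured.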
[2024-08-31 17:37:32,859][00204] Fps is (10 sec: 4096.0, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 662.9. Samples: 29830. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:37:32,862][00204] Avg episode reward: [(0, '4.324')]
[2024-08-31 17:37:37,862][00204] Fps is (10 sec: 3685.3, 60 sec: 2785.1, 300 sec: 2785.1). Total num frames: 139264. Throughput: 0: 776.9. Samples: 34958. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:37:37,866][00204] Avg episode reward: [(0, '4.456')]
[2024-08-31 17:37:37,874][04495] Saving new best policy, reward=4.456!
[2024-08-31 17:37:42,859][00204] Fps is (10 sec: 3276.8, 60 sec: 2830.0, 300 sec: 2830.0). Total num frames: 155648. Throughput: 0: 825.8. Samples: 37176. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:37:42,864][00204] Avg episode reward: [(0, '4.594')]
[2024-08-31 17:37:42,866][04495] Saving new best policy, reward=4.594!
[2024-08-31 17:37:44,433][04508] Updated weights for policy 0, policy_version 40 (0.0012)
[2024-08-31 17:37:47,859][00204] Fps is (10 sec: 3687.5, 60 sec: 2935.5, 300 sec: 2935.5). Total num frames: 176128. Throughput: 0: 899.2. Samples: 43376. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:37:47,861][00204] Avg episode reward: [(0, '4.454')]
[2024-08-31 17:37:52,864][00204] Fps is (10 sec: 3684.7, 60 sec: 3208.6, 300 sec: 2961.5). Total num frames: 192512. Throughput: 0: 897.0. Samples: 48288. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:37:52,865][00204] Avg episode reward: [(0, '4.329')]
[2024-08-31 17:37:56,150][04508] Updated weights for policy 0, policy_version 50 (0.0012)
[2024-08-31 17:37:57,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 907.9. Samples: 50814. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:37:57,866][00204] Avg episode reward: [(0, '4.407')]
[2024-08-31 17:38:02,859][00204] Fps is (10 sec: 3688.1, 60 sec: 3618.1, 300 sec: 3058.4). Total num frames: 229376. Throughput: 0: 906.9. Samples: 57034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:38:02,861][00204] Avg episode reward: [(0, '4.467')]
[2024-08-31 17:38:07,505][04508] Updated weights for policy 0, policy_version 60 (0.0015)
[2024-08-31 17:38:07,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3072.0). Total num frames: 245760. Throughput: 0: 893.6. Samples: 61668. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:38:07,866][00204] Avg episode reward: [(0, '4.421')]
[2024-08-31 17:38:12,859][00204] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 904.1. Samples: 64380. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:38:12,862][00204] Avg episode reward: [(0, '4.372')]
[2024-08-31 17:38:17,801][04508] Updated weights for policy 0, policy_version 70 (0.0013)
[2024-08-31 17:38:17,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3185.8). Total num frames: 286720. Throughput: 0: 906.5. Samples: 70624. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:38:17,861][00204] Avg episode reward: [(0, '4.515')]
[2024-08-31 17:38:22,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 891.8. Samples: 75088. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:38:22,866][00204] Avg episode reward: [(0, '4.689')]
[2024-08-31 17:38:22,869][04495] Saving new best policy, reward=4.689!
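"Policy #0 lag" summarizes how stale the acting weights were for the experience aggregated in each report: for every rollout, the learner's current policy_version minus the version the rollout was collected with. The (min: -1.0, avg: -1.0, max: -1.0) readouts at startup correspond to reports with no data yet. A sketch under those assumptions (the exact accounting inside Sample Factory may differ):

    import numpy as np

    def policy_lag_stats(acting_versions, learner_version):
        """Min/avg/max of (learner_version - version each rollout was collected with).

        Illustrative of the 'Policy #0 lag' readout, not Sample Factory's own code.
        """
        if len(acting_versions) == 0:
            return {"min": -1.0, "avg": -1.0, "max": -1.0}  # no data yet
        lags = learner_version - np.asarray(acting_versions, dtype=np.float64)
        return {"min": float(lags.min()),
                "avg": round(float(lags.mean()), 1),
                "max": float(lags.max())}

    # e.g. most rollouts collected with the newest weights, one a version behind:
    print(policy_lag_stats([30, 30, 29, 30], learner_version=30))
    # {'min': 0.0, 'avg': 0.2, 'max': 1.0}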
[2024-08-31 17:38:27,859][00204] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 908.4. Samples: 78056. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:38:27,864][00204] Avg episode reward: [(0, '4.793')]
[2024-08-31 17:38:27,878][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth...
[2024-08-31 17:38:27,993][04495] Saving new best policy, reward=4.793!
[2024-08-31 17:38:29,778][04508] Updated weights for policy 0, policy_version 80 (0.0012)
[2024-08-31 17:38:32,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3237.8). Total num frames: 339968. Throughput: 0: 904.8. Samples: 84090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:38:32,863][00204] Avg episode reward: [(0, '4.605')]
[2024-08-31 17:38:37,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3202.3). Total num frames: 352256. Throughput: 0: 889.6. Samples: 88316. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:38:37,862][00204] Avg episode reward: [(0, '4.386')]
[2024-08-31 17:38:41,551][04508] Updated weights for policy 0, policy_version 90 (0.0012)
[2024-08-31 17:38:42,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3241.2). Total num frames: 372736. Throughput: 0: 899.2. Samples: 91276. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:38:42,862][00204] Avg episode reward: [(0, '4.477')]
[2024-08-31 17:38:47,859][00204] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3208.5). Total num frames: 385024. Throughput: 0: 872.1. Samples: 96280. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:38:47,866][00204] Avg episode reward: [(0, '4.557')]
[2024-08-31 17:38:52,859][00204] Fps is (10 sec: 2457.6, 60 sec: 3413.6, 300 sec: 3178.5). Total num frames: 397312. Throughput: 0: 851.4. Samples: 99982. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:38:52,862][00204] Avg episode reward: [(0, '4.660')]
[2024-08-31 17:38:55,283][04508] Updated weights for policy 0, policy_version 100 (0.0012)
[2024-08-31 17:38:57,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3213.8). Total num frames: 417792. Throughput: 0: 856.8. Samples: 102934. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:38:57,862][00204] Avg episode reward: [(0, '4.696')]
[2024-08-31 17:39:02,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3246.5). Total num frames: 438272. Throughput: 0: 855.8. Samples: 109134. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:39:02,863][00204] Avg episode reward: [(0, '4.586')]
[2024-08-31 17:39:06,508][04508] Updated weights for policy 0, policy_version 110 (0.0012)
[2024-08-31 17:39:07,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3218.3). Total num frames: 450560. Throughput: 0: 854.7. Samples: 113550. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:39:07,861][00204] Avg episode reward: [(0, '4.521')]
[2024-08-31 17:39:12,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3248.6). Total num frames: 471040. Throughput: 0: 855.7. Samples: 116560. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:39:12,864][00204] Avg episode reward: [(0, '4.613')]
[2024-08-31 17:39:16,851][04508] Updated weights for policy 0, policy_version 120 (0.0012)
[2024-08-31 17:39:17,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3276.8). Total num frames: 491520. Throughput: 0: 860.0. Samples: 122788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:39:17,866][00204] Avg episode reward: [(0, '4.548')]
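Checkpoint filenames encode the policy version and the total environment frame count at save time: checkpoint_000000078_319488.pth is policy_version 78 at 319,488 frames, and in this run each version corresponds to exactly 4096 frames (78 * 4096 = 319488). A sketch of that naming scheme, inferred from the filenames in the log rather than taken from Sample Factory's source:

    def checkpoint_name(policy_version: int, env_frames: int) -> str:
        # e.g. checkpoint_000000078_319488.pth, as saved in the log above
        return f"checkpoint_{policy_version:09d}_{env_frames}.pth"

    assert checkpoint_name(78, 319488) == "checkpoint_000000078_319488.pth"
    # The same 4096-frames-per-version ratio holds for the later saves too,
    # e.g. 182 * 4096 == 745472 and 283 * 4096 == 1159168.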
[2024-08-31 17:39:22,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 507904. Throughput: 0: 862.2. Samples: 127116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:39:22,866][00204] Avg episode reward: [(0, '4.564')]
[2024-08-31 17:39:27,861][00204] Fps is (10 sec: 3685.6, 60 sec: 3481.5, 300 sec: 3302.4). Total num frames: 528384. Throughput: 0: 866.1. Samples: 130252. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:39:27,863][00204] Avg episode reward: [(0, '4.452')]
[2024-08-31 17:39:28,492][04508] Updated weights for policy 0, policy_version 130 (0.0012)
[2024-08-31 17:39:32,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3326.5). Total num frames: 548864. Throughput: 0: 893.4. Samples: 136484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:39:32,865][00204] Avg episode reward: [(0, '4.469')]
[2024-08-31 17:39:37,859][00204] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3300.9). Total num frames: 561152. Throughput: 0: 907.1. Samples: 140802. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:39:37,867][00204] Avg episode reward: [(0, '4.558')]
[2024-08-31 17:39:40,341][04508] Updated weights for policy 0, policy_version 140 (0.0012)
[2024-08-31 17:39:42,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3323.6). Total num frames: 581632. Throughput: 0: 908.4. Samples: 143810. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:39:42,862][00204] Avg episode reward: [(0, '4.387')]
[2024-08-31 17:39:47,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3345.1). Total num frames: 602112. Throughput: 0: 909.0. Samples: 150040. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:39:47,869][00204] Avg episode reward: [(0, '4.367')]
[2024-08-31 17:39:51,885][04508] Updated weights for policy 0, policy_version 150 (0.0012)
[2024-08-31 17:39:52,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3343.2). Total num frames: 618496. Throughput: 0: 911.0. Samples: 154544. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:39:52,863][00204] Avg episode reward: [(0, '4.483')]
[2024-08-31 17:39:57,859][00204] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3363.0). Total num frames: 638976. Throughput: 0: 913.6. Samples: 157674. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:39:57,868][00204] Avg episode reward: [(0, '4.523')]
[2024-08-31 17:40:01,935][04508] Updated weights for policy 0, policy_version 160 (0.0014)
[2024-08-31 17:40:02,863][00204] Fps is (10 sec: 3684.7, 60 sec: 3617.9, 300 sec: 3360.7). Total num frames: 655360. Throughput: 0: 912.4. Samples: 163848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:40:02,869][00204] Avg episode reward: [(0, '4.629')]
[2024-08-31 17:40:07,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3358.7). Total num frames: 671744. Throughput: 0: 914.2. Samples: 168254. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:40:07,867][00204] Avg episode reward: [(0, '4.589')]
[2024-08-31 17:40:12,860][00204] Fps is (10 sec: 3687.5, 60 sec: 3686.3, 300 sec: 3376.7). Total num frames: 692224. Throughput: 0: 912.5. Samples: 171312. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:40:12,862][00204] Avg episode reward: [(0, '4.592')]
[2024-08-31 17:40:13,533][04508] Updated weights for policy 0, policy_version 170 (0.0018)
[2024-08-31 17:40:17,860][00204] Fps is (10 sec: 3686.1, 60 sec: 3618.1, 300 sec: 3374.3). Total num frames: 708608. Throughput: 0: 907.8. Samples: 177338. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:40:17,862][00204] Avg episode reward: [(0, '4.666')]
[2024-08-31 17:40:22,859][00204] Fps is (10 sec: 3277.2, 60 sec: 3618.1, 300 sec: 3372.1). Total num frames: 724992. Throughput: 0: 910.9. Samples: 181794. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:40:22,861][00204] Avg episode reward: [(0, '4.716')]
[2024-08-31 17:40:25,574][04508] Updated weights for policy 0, policy_version 180 (0.0015)
[2024-08-31 17:40:27,859][00204] Fps is (10 sec: 3686.7, 60 sec: 3618.3, 300 sec: 3388.5). Total num frames: 745472. Throughput: 0: 905.6. Samples: 184562. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:40:27,864][00204] Avg episode reward: [(0, '4.585')]
[2024-08-31 17:40:27,876][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000182_745472.pth...
[2024-08-31 17:40:32,859][00204] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3386.0). Total num frames: 761856. Throughput: 0: 894.3. Samples: 190284. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:40:32,866][00204] Avg episode reward: [(0, '4.718')]
[2024-08-31 17:40:37,376][04508] Updated weights for policy 0, policy_version 190 (0.0011)
[2024-08-31 17:40:37,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3383.7). Total num frames: 778240. Throughput: 0: 898.3. Samples: 194966. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:40:37,867][00204] Avg episode reward: [(0, '4.564')]
[2024-08-31 17:40:42,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3398.8). Total num frames: 798720. Throughput: 0: 895.5. Samples: 197972. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:40:42,864][00204] Avg episode reward: [(0, '4.508')]
[2024-08-31 17:40:47,859][00204] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3396.3). Total num frames: 815104. Throughput: 0: 886.9. Samples: 203754. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:40:47,866][00204] Avg episode reward: [(0, '4.631')]
[2024-08-31 17:40:48,604][04508] Updated weights for policy 0, policy_version 200 (0.0015)
[2024-08-31 17:40:52,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3393.8). Total num frames: 831488. Throughput: 0: 894.0. Samples: 208482. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:40:52,866][00204] Avg episode reward: [(0, '4.770')]
[2024-08-31 17:40:57,859][00204] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3407.9). Total num frames: 851968. Throughput: 0: 892.5. Samples: 211474. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:40:57,862][00204] Avg episode reward: [(0, '4.608')]
[2024-08-31 17:40:59,299][04508] Updated weights for policy 0, policy_version 210 (0.0020)
[2024-08-31 17:41:02,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3405.3). Total num frames: 868352. Throughput: 0: 878.9. Samples: 216888. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:41:02,863][00204] Avg episode reward: [(0, '4.528')]
[2024-08-31 17:41:07,859][00204] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3402.8). Total num frames: 884736. Throughput: 0: 884.7. Samples: 221604. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:41:07,868][00204] Avg episode reward: [(0, '4.252')]
[2024-08-31 17:41:11,480][04508] Updated weights for policy 0, policy_version 220 (0.0012)
[2024-08-31 17:41:12,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3415.9). Total num frames: 905216. Throughput: 0: 889.6. Samples: 224594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:41:12,862][00204] Avg episode reward: [(0, '4.515')]
[2024-08-31 17:41:17,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3413.3). Total num frames: 921600. Throughput: 0: 882.3. Samples: 229988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:41:17,863][00204] Avg episode reward: [(0, '4.749')]
[2024-08-31 17:41:22,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3410.9). Total num frames: 937984. Throughput: 0: 891.1. Samples: 235066. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:41:22,864][00204] Avg episode reward: [(0, '4.822')]
[2024-08-31 17:41:22,870][04495] Saving new best policy, reward=4.822!
[2024-08-31 17:41:23,234][04508] Updated weights for policy 0, policy_version 230 (0.0016)
[2024-08-31 17:41:27,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3423.1). Total num frames: 958464. Throughput: 0: 890.3. Samples: 238036. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:41:27,861][00204] Avg episode reward: [(0, '4.444')]
[2024-08-31 17:41:32,863][00204] Fps is (10 sec: 3684.7, 60 sec: 3549.6, 300 sec: 3420.5). Total num frames: 974848. Throughput: 0: 876.6. Samples: 243206. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:41:32,866][00204] Avg episode reward: [(0, '4.424')]
[2024-08-31 17:41:35,164][04508] Updated weights for policy 0, policy_version 240 (0.0012)
[2024-08-31 17:41:37,861][00204] Fps is (10 sec: 3276.2, 60 sec: 3549.8, 300 sec: 3418.0). Total num frames: 991232. Throughput: 0: 886.1. Samples: 248360. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:41:37,863][00204] Avg episode reward: [(0, '4.564')]
[2024-08-31 17:41:42,859][00204] Fps is (10 sec: 3687.9, 60 sec: 3549.8, 300 sec: 3429.5). Total num frames: 1011712. Throughput: 0: 883.9. Samples: 251250. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:41:42,862][00204] Avg episode reward: [(0, '4.718')]
[2024-08-31 17:41:46,345][04508] Updated weights for policy 0, policy_version 250 (0.0013)
[2024-08-31 17:41:47,859][00204] Fps is (10 sec: 3277.4, 60 sec: 3481.6, 300 sec: 3471.3). Total num frames: 1024000. Throughput: 0: 875.2. Samples: 256270. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:41:47,864][00204] Avg episode reward: [(0, '4.822')]
[2024-08-31 17:41:52,859][00204] Fps is (10 sec: 2457.7, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 843.4. Samples: 259558. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:41:52,865][00204] Avg episode reward: [(0, '4.634')]
[2024-08-31 17:41:57,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3540.6). Total num frames: 1056768. Throughput: 0: 838.8. Samples: 262338. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:41:57,861][00204] Avg episode reward: [(0, '4.666')]
[2024-08-31 17:41:59,729][04508] Updated weights for policy 0, policy_version 260 (0.0020)
[2024-08-31 17:42:02,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 1069056. Throughput: 0: 832.1. Samples: 267434. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:42:02,867][00204] Avg episode reward: [(0, '4.703')]
[2024-08-31 17:42:07,859][00204] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 1085440. Throughput: 0: 827.7. Samples: 272314. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:42:07,866][00204] Avg episode reward: [(0, '4.543')]
[2024-08-31 17:42:11,911][04508] Updated weights for policy 0, policy_version 270 (0.0014)
[2024-08-31 17:42:12,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 1105920. Throughput: 0: 827.1. Samples: 275256. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:42:12,867][00204] Avg episode reward: [(0, '4.545')]
[2024-08-31 17:42:17,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 1122304. Throughput: 0: 827.0. Samples: 280418. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:42:17,867][00204] Avg episode reward: [(0, '4.722')]
[2024-08-31 17:42:22,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1138688. Throughput: 0: 821.9. Samples: 285346. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:42:22,861][00204] Avg episode reward: [(0, '4.662')]
[2024-08-31 17:42:24,057][04508] Updated weights for policy 0, policy_version 280 (0.0017)
[2024-08-31 17:42:27,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1159168. Throughput: 0: 825.2. Samples: 288382. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:42:27,861][00204] Avg episode reward: [(0, '4.634')]
[2024-08-31 17:42:27,875][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000283_1159168.pth...
[2024-08-31 17:42:27,971][04495] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth
[2024-08-31 17:42:32,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3345.3, 300 sec: 3512.9). Total num frames: 1175552. Throughput: 0: 827.4. Samples: 293504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:42:32,861][00204] Avg episode reward: [(0, '4.559')]
[2024-08-31 17:42:36,124][04508] Updated weights for policy 0, policy_version 290 (0.0012)
[2024-08-31 17:42:37,861][00204] Fps is (10 sec: 3276.1, 60 sec: 3345.0, 300 sec: 3512.8). Total num frames: 1191936. Throughput: 0: 868.9. Samples: 298662. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:42:37,864][00204] Avg episode reward: [(0, '4.641')]
[2024-08-31 17:42:42,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 1212416. Throughput: 0: 874.4. Samples: 301688. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:42:42,865][00204] Avg episode reward: [(0, '4.777')]
[2024-08-31 17:42:47,363][04508] Updated weights for policy 0, policy_version 300 (0.0016)
[2024-08-31 17:42:47,859][00204] Fps is (10 sec: 3687.3, 60 sec: 3413.3, 300 sec: 3512.9). Total num frames: 1228800. Throughput: 0: 875.0. Samples: 306810. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:42:47,861][00204] Avg episode reward: [(0, '4.705')]
[2024-08-31 17:42:52,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1245184. Throughput: 0: 885.8. Samples: 312176. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:42:52,861][00204] Avg episode reward: [(0, '4.615')]
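The "Saving ..." / "Removing ..." pairs above show checkpoint rotation: after a new checkpoint is written, the oldest regular checkpoint is deleted, so only the two most recent survive (best-policy snapshots are handled separately). A sketch of that keep-latest rule, as inferred from the log rather than from Sample Factory's source:

    from pathlib import Path

    def save_with_rotation(ckpt_dir: Path, policy_version: int, env_frames: int,
                           state_bytes: bytes, keep: int = 2) -> None:
        """Write a new checkpoint, then prune old ones (keep-latest rule
        inferred from the 'Saving ... / Removing ...' pairs in the log)."""
        ckpt_dir.mkdir(parents=True, exist_ok=True)
        path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
        path.write_bytes(state_bytes)
        # zero-padded version makes lexicographic order match chronological order
        checkpoints = sorted(ckpt_dir.glob("checkpoint_*.pth"))
        for old in checkpoints[:-keep]:
            old.unlink()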
[2024-08-31 17:42:57,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1265664. Throughput: 0: 887.1. Samples: 315176. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:42:57,862][00204] Avg episode reward: [(0, '4.751')]
[2024-08-31 17:42:57,983][04508] Updated weights for policy 0, policy_version 310 (0.0016)
[2024-08-31 17:43:02,863][00204] Fps is (10 sec: 3685.0, 60 sec: 3549.6, 300 sec: 3512.8). Total num frames: 1282048. Throughput: 0: 883.4. Samples: 320176. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:43:02,865][00204] Avg episode reward: [(0, '4.796')]
[2024-08-31 17:43:07,861][00204] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3512.8). Total num frames: 1302528. Throughput: 0: 894.8. Samples: 325612. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:43:07,866][00204] Avg episode reward: [(0, '4.829')]
[2024-08-31 17:43:07,881][04495] Saving new best policy, reward=4.829!
[2024-08-31 17:43:09,882][04508] Updated weights for policy 0, policy_version 320 (0.0018)
[2024-08-31 17:43:12,862][00204] Fps is (10 sec: 3686.8, 60 sec: 3549.7, 300 sec: 3498.9). Total num frames: 1318912. Throughput: 0: 893.4. Samples: 328588. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:43:12,873][00204] Avg episode reward: [(0, '4.804')]
[2024-08-31 17:43:17,859][00204] Fps is (10 sec: 3277.2, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 1335296. Throughput: 0: 889.1. Samples: 333516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:43:17,864][00204] Avg episode reward: [(0, '4.861')]
[2024-08-31 17:43:17,882][04495] Saving new best policy, reward=4.861!
[2024-08-31 17:43:21,850][04508] Updated weights for policy 0, policy_version 330 (0.0016)
[2024-08-31 17:43:22,859][00204] Fps is (10 sec: 3687.4, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 1355776. Throughput: 0: 894.2. Samples: 338900. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:43:22,862][00204] Avg episode reward: [(0, '4.681')]
[2024-08-31 17:43:27,859][00204] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 1376256. Throughput: 0: 895.6. Samples: 341990. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:43:27,862][00204] Avg episode reward: [(0, '4.850')]
[2024-08-31 17:43:32,860][00204] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3512.8). Total num frames: 1388544. Throughput: 0: 890.7. Samples: 346892. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:43:32,863][00204] Avg episode reward: [(0, '4.849')]
[2024-08-31 17:43:33,654][04508] Updated weights for policy 0, policy_version 340 (0.0012)
[2024-08-31 17:43:37,859][00204] Fps is (10 sec: 3276.7, 60 sec: 3618.3, 300 sec: 3512.8). Total num frames: 1409024. Throughput: 0: 895.1. Samples: 352456. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:43:37,862][00204] Avg episode reward: [(0, '4.518')]
[2024-08-31 17:43:42,859][00204] Fps is (10 sec: 4096.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1429504. Throughput: 0: 895.8. Samples: 355486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:43:42,866][00204] Avg episode reward: [(0, '4.646')]
[2024-08-31 17:43:43,938][04508] Updated weights for policy 0, policy_version 350 (0.0012)
[2024-08-31 17:43:47,859][00204] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1441792. Throughput: 0: 891.3. Samples: 360282. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:43:47,864][00204] Avg episode reward: [(0, '4.803')]
[2024-08-31 17:43:52,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1462272. Throughput: 0: 896.3. Samples: 365944. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:43:52,863][00204] Avg episode reward: [(0, '4.807')]
[2024-08-31 17:43:55,340][04508] Updated weights for policy 0, policy_version 360 (0.0014)
[2024-08-31 17:43:57,859][00204] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1482752. Throughput: 0: 899.2. Samples: 369050. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:43:57,865][00204] Avg episode reward: [(0, '4.622')]
[2024-08-31 17:44:02,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3540.6). Total num frames: 1495040. Throughput: 0: 893.0. Samples: 373702. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:44:02,865][00204] Avg episode reward: [(0, '4.639')]
[2024-08-31 17:44:07,245][04508] Updated weights for policy 0, policy_version 370 (0.0016)
[2024-08-31 17:44:07,859][00204] Fps is (10 sec: 3276.9, 60 sec: 3550.0, 300 sec: 3540.6). Total num frames: 1515520. Throughput: 0: 900.8. Samples: 379434. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:44:07,862][00204] Avg episode reward: [(0, '4.804')]
[2024-08-31 17:44:12,862][00204] Fps is (10 sec: 4094.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1536000. Throughput: 0: 898.6. Samples: 382430. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:44:12,864][00204] Avg episode reward: [(0, '5.085')]
[2024-08-31 17:44:12,867][04495] Saving new best policy, reward=5.085!
[2024-08-31 17:44:17,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1548288. Throughput: 0: 887.0. Samples: 386806. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:44:17,861][00204] Avg episode reward: [(0, '4.947')]
[2024-08-31 17:44:19,333][04508] Updated weights for policy 0, policy_version 380 (0.0017)
[2024-08-31 17:44:22,859][00204] Fps is (10 sec: 3277.8, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 1568768. Throughput: 0: 894.4. Samples: 392702. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:44:22,863][00204] Avg episode reward: [(0, '4.932')]
[2024-08-31 17:44:27,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1589248. Throughput: 0: 894.6. Samples: 395744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:44:27,861][00204] Avg episode reward: [(0, '4.740')]
[2024-08-31 17:44:27,872][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_1589248.pth...
[2024-08-31 17:44:28,018][04495] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000182_745472.pth
[2024-08-31 17:44:30,551][04508] Updated weights for policy 0, policy_version 390 (0.0015)
[2024-08-31 17:44:32,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1601536. Throughput: 0: 884.9. Samples: 400102. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:44:32,865][00204] Avg episode reward: [(0, '4.841')]
[2024-08-31 17:44:37,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1622016. Throughput: 0: 890.4. Samples: 406012. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:44:37,865][00204] Avg episode reward: [(0, '4.688')]
[2024-08-31 17:44:41,370][04508] Updated weights for policy 0, policy_version 400 (0.0012)
[2024-08-31 17:44:42,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1642496. Throughput: 0: 889.7. Samples: 409086. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:44:42,866][00204] Avg episode reward: [(0, '4.771')]
[2024-08-31 17:44:47,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 1654784. Throughput: 0: 881.3. Samples: 413360. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:44:47,862][00204] Avg episode reward: [(0, '5.089')]
[2024-08-31 17:44:47,873][04495] Saving new best policy, reward=5.089!
[2024-08-31 17:44:52,865][00204] Fps is (10 sec: 2865.9, 60 sec: 3481.3, 300 sec: 3498.9). Total num frames: 1671168. Throughput: 0: 853.6. Samples: 417852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:44:52,871][00204] Avg episode reward: [(0, '5.369')]
[2024-08-31 17:44:52,879][04495] Saving new best policy, reward=5.369!
[2024-08-31 17:44:55,096][04508] Updated weights for policy 0, policy_version 410 (0.0015)
[2024-08-31 17:44:57,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3499.0). Total num frames: 1687552. Throughput: 0: 847.5. Samples: 420566. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:44:57,864][00204] Avg episode reward: [(0, '5.215')]
[2024-08-31 17:45:02,859][00204] Fps is (10 sec: 2868.5, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 1699840. Throughput: 0: 845.6. Samples: 424860. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:45:02,865][00204] Avg episode reward: [(0, '5.096')]
[2024-08-31 17:45:06,957][04508] Updated weights for policy 0, policy_version 420 (0.0019)
[2024-08-31 17:45:07,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 1720320. Throughput: 0: 849.8. Samples: 430944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:45:07,866][00204] Avg episode reward: [(0, '4.879')]
[2024-08-31 17:45:12,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3413.5, 300 sec: 3499.0). Total num frames: 1740800. Throughput: 0: 851.0. Samples: 434040. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:45:12,861][00204] Avg episode reward: [(0, '5.200')]
[2024-08-31 17:45:17,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1757184. Throughput: 0: 849.3. Samples: 438320. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:45:17,862][00204] Avg episode reward: [(0, '5.309')]
[2024-08-31 17:45:18,788][04508] Updated weights for policy 0, policy_version 430 (0.0012)
[2024-08-31 17:45:22,861][00204] Fps is (10 sec: 3685.6, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 1777664. Throughput: 0: 855.8. Samples: 444524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:45:22,867][00204] Avg episode reward: [(0, '5.451')]
[2024-08-31 17:45:22,870][04495] Saving new best policy, reward=5.451!
[2024-08-31 17:45:27,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 1794048. Throughput: 0: 854.8. Samples: 447550. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:45:27,862][00204] Avg episode reward: [(0, '5.210')]
[2024-08-31 17:45:30,518][04508] Updated weights for policy 0, policy_version 440 (0.0016)
[2024-08-31 17:45:32,859][00204] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1810432. Throughput: 0: 853.4. Samples: 451764. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:45:32,861][00204] Avg episode reward: [(0, '5.014')]
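"Saving new best policy, reward=..." fires only when the reported average episode reward exceeds the best value seen so far, which is why the best-reward sequence is strictly increasing (4.394, 4.456, ..., 5.369) while the per-report averages fluctuate. A minimal sketch of that bookkeeping (names here are illustrative, not Sample Factory's own):

    class BestPolicyTracker:
        """Save a 'best' snapshot only when the avg episode reward improves."""
        def __init__(self):
            self.best_reward = float("-inf")

        def maybe_save(self, avg_episode_reward: float) -> bool:
            if avg_episode_reward > self.best_reward:
                self.best_reward = avg_episode_reward
                print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
                return True
            return False

    tracker = BestPolicyTracker()
    for r in [4.394, 4.324, 4.456, 4.594, 4.454]:  # rewards from the log above
        tracker.maybe_save(r)
    # prints only for 4.394, 4.456 and 4.594, matching the log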
[2024-08-31 17:45:37,859][00204] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1830912. Throughput: 0: 890.0. Samples: 457896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:45:37,861][00204] Avg episode reward: [(0, '5.241')]
[2024-08-31 17:45:40,649][04508] Updated weights for policy 0, policy_version 450 (0.0015)
[2024-08-31 17:45:42,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 1847296. Throughput: 0: 898.0. Samples: 460976. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:45:42,861][00204] Avg episode reward: [(0, '5.169')]
[2024-08-31 17:45:47,859][00204] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1863680. Throughput: 0: 898.2. Samples: 465278. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:45:47,865][00204] Avg episode reward: [(0, '5.471')]
[2024-08-31 17:45:47,885][04495] Saving new best policy, reward=5.471!
[2024-08-31 17:45:52,440][04508] Updated weights for policy 0, policy_version 460 (0.0012)
[2024-08-31 17:45:52,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3499.0). Total num frames: 1884160. Throughput: 0: 899.3. Samples: 471414. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:45:52,866][00204] Avg episode reward: [(0, '5.227')]
[2024-08-31 17:45:57,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1900544. Throughput: 0: 897.1. Samples: 474410. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:45:57,861][00204] Avg episode reward: [(0, '4.968')]
[2024-08-31 17:46:02,861][00204] Fps is (10 sec: 3276.1, 60 sec: 3618.0, 300 sec: 3498.9). Total num frames: 1916928. Throughput: 0: 902.1. Samples: 478916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:46:02,863][00204] Avg episode reward: [(0, '5.034')]
[2024-08-31 17:46:04,026][04508] Updated weights for policy 0, policy_version 470 (0.0012)
[2024-08-31 17:46:07,860][00204] Fps is (10 sec: 3686.1, 60 sec: 3618.1, 300 sec: 3498.9). Total num frames: 1937408. Throughput: 0: 902.8. Samples: 485150. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:46:07,865][00204] Avg episode reward: [(0, '5.349')]
[2024-08-31 17:46:12,862][00204] Fps is (10 sec: 3686.1, 60 sec: 3549.7, 300 sec: 3498.9). Total num frames: 1953792. Throughput: 0: 898.6. Samples: 487988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:46:12,868][00204] Avg episode reward: [(0, '5.574')]
[2024-08-31 17:46:12,872][04495] Saving new best policy, reward=5.574!
[2024-08-31 17:46:15,932][04508] Updated weights for policy 0, policy_version 480 (0.0012)
[2024-08-31 17:46:17,859][00204] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 1974272. Throughput: 0: 904.6. Samples: 492470. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:46:17,861][00204] Avg episode reward: [(0, '5.534')]
[2024-08-31 17:46:22,859][00204] Fps is (10 sec: 4097.3, 60 sec: 3618.3, 300 sec: 3512.8). Total num frames: 1994752. Throughput: 0: 907.2. Samples: 498718. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:46:22,861][00204] Avg episode reward: [(0, '5.222')]
[2024-08-31 17:46:26,250][04508] Updated weights for policy 0, policy_version 490 (0.0012)
[2024-08-31 17:46:27,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3512.9). Total num frames: 2011136. Throughput: 0: 900.6. Samples: 501504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:46:27,861][00204] Avg episode reward: [(0, '5.463')]
[2024-08-31 17:46:27,880][00204] Components not started: RolloutWorker_w2, RolloutWorker_w3, RolloutWorker_w4, RolloutWorker_w5, RolloutWorker_w6, wait_time=600.0 seconds
[2024-08-31 17:46:27,883][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000491_2011136.pth...
[2024-08-31 17:46:27,991][04495] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000283_1159168.pth
[2024-08-31 17:46:32,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3512.9). Total num frames: 2027520. Throughput: 0: 907.0. Samples: 506092. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:46:32,865][00204] Avg episode reward: [(0, '5.549')]
[2024-08-31 17:46:37,414][04508] Updated weights for policy 0, policy_version 500 (0.0012)
[2024-08-31 17:46:37,860][00204] Fps is (10 sec: 3685.9, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 2048000. Throughput: 0: 908.9. Samples: 512316. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:46:37,863][00204] Avg episode reward: [(0, '5.594')]
[2024-08-31 17:46:37,876][04495] Saving new best policy, reward=5.594!
[2024-08-31 17:46:42,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2064384. Throughput: 0: 902.7. Samples: 515030. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:46:42,867][00204] Avg episode reward: [(0, '5.810')]
[2024-08-31 17:46:42,884][04495] Saving new best policy, reward=5.810!
[2024-08-31 17:46:47,859][00204] Fps is (10 sec: 3277.2, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2080768. Throughput: 0: 903.0. Samples: 519548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:46:47,866][00204] Avg episode reward: [(0, '5.820')]
[2024-08-31 17:46:47,877][04495] Saving new best policy, reward=5.820!
[2024-08-31 17:46:49,468][04508] Updated weights for policy 0, policy_version 510 (0.0016)
[2024-08-31 17:46:52,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 2101248. Throughput: 0: 901.5. Samples: 525716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:46:52,861][00204] Avg episode reward: [(0, '6.020')]
[2024-08-31 17:46:52,870][04495] Saving new best policy, reward=6.020!
[2024-08-31 17:46:57,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2117632. Throughput: 0: 896.5. Samples: 528328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:46:57,865][00204] Avg episode reward: [(0, '5.972')]
[2024-08-31 17:47:01,226][04508] Updated weights for policy 0, policy_version 520 (0.0014)
[2024-08-31 17:47:02,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3554.5). Total num frames: 2134016. Throughput: 0: 902.8. Samples: 533096. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:47:02,861][00204] Avg episode reward: [(0, '6.004')]
[2024-08-31 17:47:07,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 2154496. Throughput: 0: 903.3. Samples: 539366. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:47:07,866][00204] Avg episode reward: [(0, '5.951')]
[2024-08-31 17:47:12,379][04508] Updated weights for policy 0, policy_version 530 (0.0013)
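The "Components not started" warning above is consistent with the heartbeats earlier in the log: only RolloutWorker_w0, _w1 and _w7 ever connected, and after a 600-second grace period the runner reports the five workers that never did. A sketch of such a heartbeat watchdog (illustrative; not Sample Factory's implementation):

    import time

    class HeartbeatWatchdog:
        def __init__(self, expected, wait_time=600.0):
            self.started_at = time.time()
            self.wait_time = wait_time
            self.last_beat = {name: None for name in expected}  # None = never connected

        def heartbeat(self, name: str) -> None:
            self.last_beat[name] = time.time()

        def components_not_started(self):
            if time.time() - self.started_at < self.wait_time:
                return []  # still inside the grace period
            return sorted(n for n, t in self.last_beat.items() if t is None)

    dog = HeartbeatWatchdog([f"RolloutWorker_w{i}" for i in range(8)])
    for i in (0, 1, 7):
        dog.heartbeat(f"RolloutWorker_w{i}")
    dog.started_at -= 600.0  # pretend the grace period has elapsed
    print(", ".join(dog.components_not_started()))
    # RolloutWorker_w2, RolloutWorker_w3, RolloutWorker_w4, RolloutWorker_w5, RolloutWorker_w6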
[2024-08-31 17:47:12,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3554.5). Total num frames: 2170880. Throughput: 0: 896.4. Samples: 541842. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:47:12,865][00204] Avg episode reward: [(0, '5.996')]
[2024-08-31 17:47:17,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2191360. Throughput: 0: 902.8. Samples: 546718. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:47:17,866][00204] Avg episode reward: [(0, '6.082')]
[2024-08-31 17:47:17,874][04495] Saving new best policy, reward=6.082!
[2024-08-31 17:47:22,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2207744. Throughput: 0: 899.4. Samples: 552788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:47:22,862][00204] Avg episode reward: [(0, '6.067')]
[2024-08-31 17:47:23,006][04508] Updated weights for policy 0, policy_version 540 (0.0012)
[2024-08-31 17:47:27,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2224128. Throughput: 0: 891.3. Samples: 555140. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:47:27,864][00204] Avg episode reward: [(0, '6.239')]
[2024-08-31 17:47:27,878][04495] Saving new best policy, reward=6.239!
[2024-08-31 17:47:32,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2244608. Throughput: 0: 902.6. Samples: 560166. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:47:32,864][00204] Avg episode reward: [(0, '6.275')]
[2024-08-31 17:47:32,866][04495] Saving new best policy, reward=6.275!
[2024-08-31 17:47:34,720][04508] Updated weights for policy 0, policy_version 550 (0.0017)
[2024-08-31 17:47:37,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3618.2, 300 sec: 3568.4). Total num frames: 2265088. Throughput: 0: 903.9. Samples: 566392. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:47:37,867][00204] Avg episode reward: [(0, '6.630')]
[2024-08-31 17:47:37,875][04495] Saving new best policy, reward=6.630!
[2024-08-31 17:47:42,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2277376. Throughput: 0: 896.4. Samples: 568668. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:47:42,862][00204] Avg episode reward: [(0, '6.410')]
[2024-08-31 17:47:46,380][04508] Updated weights for policy 0, policy_version 560 (0.0012)
[2024-08-31 17:47:47,859][00204] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2297856. Throughput: 0: 903.6. Samples: 573756. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:47:47,862][00204] Avg episode reward: [(0, '6.494')]
[2024-08-31 17:47:52,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2318336. Throughput: 0: 904.5. Samples: 580068. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:47:52,861][00204] Avg episode reward: [(0, '6.884')]
[2024-08-31 17:47:52,869][04495] Saving new best policy, reward=6.884!
[2024-08-31 17:47:57,861][00204] Fps is (10 sec: 3276.2, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 2330624. Throughput: 0: 899.7. Samples: 582328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:47:57,865][00204] Avg episode reward: [(0, '7.072')]
[2024-08-31 17:47:57,885][04495] Saving new best policy, reward=7.072!
[2024-08-31 17:47:57,895][04508] Updated weights for policy 0, policy_version 570 (0.0012)
[2024-08-31 17:48:02,859][00204] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2351104. Throughput: 0: 907.6. Samples: 587560. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:48:02,867][00204] Avg episode reward: [(0, '7.316')]
[2024-08-31 17:48:02,873][04495] Saving new best policy, reward=7.316!
[2024-08-31 17:48:07,860][00204] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2371584. Throughput: 0: 909.4. Samples: 593714. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:48:07,862][00204] Avg episode reward: [(0, '7.314')]
[2024-08-31 17:48:08,039][04508] Updated weights for policy 0, policy_version 580 (0.0012)
[2024-08-31 17:48:12,859][00204] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2387968. Throughput: 0: 906.8. Samples: 595946. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:48:12,861][00204] Avg episode reward: [(0, '7.112')]
[2024-08-31 17:48:17,859][00204] Fps is (10 sec: 3686.9, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2408448. Throughput: 0: 915.0. Samples: 601340. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:48:17,863][00204] Avg episode reward: [(0, '7.262')]
[2024-08-31 17:48:19,365][04508] Updated weights for policy 0, policy_version 590 (0.0014)
[2024-08-31 17:48:22,860][00204] Fps is (10 sec: 4095.5, 60 sec: 3686.3, 300 sec: 3568.4). Total num frames: 2428928. Throughput: 0: 916.9. Samples: 607654. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:48:22,862][00204] Avg episode reward: [(0, '6.863')]
[2024-08-31 17:48:27,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2441216. Throughput: 0: 912.9. Samples: 609750. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:48:27,861][00204] Avg episode reward: [(0, '6.977')]
[2024-08-31 17:48:27,878][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000596_2441216.pth...
[2024-08-31 17:48:27,985][04495] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_1589248.pth
[2024-08-31 17:48:31,030][04508] Updated weights for policy 0, policy_version 600 (0.0012)
[2024-08-31 17:48:32,859][00204] Fps is (10 sec: 3277.2, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2461696. Throughput: 0: 919.4. Samples: 615128. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:48:32,867][00204] Avg episode reward: [(0, '7.041')]
[2024-08-31 17:48:37,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2482176. Throughput: 0: 919.3. Samples: 621436. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:48:37,862][00204] Avg episode reward: [(0, '7.462')]
[2024-08-31 17:48:37,878][04495] Saving new best policy, reward=7.462!
[2024-08-31 17:48:42,738][04508] Updated weights for policy 0, policy_version 610 (0.0012)
[2024-08-31 17:48:42,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2498560. Throughput: 0: 912.4. Samples: 623384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:48:42,861][00204] Avg episode reward: [(0, '7.519')]
[2024-08-31 17:48:42,865][04495] Saving new best policy, reward=7.519!
[2024-08-31 17:48:47,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2519040. Throughput: 0: 918.0. Samples: 628868. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:48:47,862][00204] Avg episode reward: [(0, '6.956')]
[2024-08-31 17:48:52,728][04508] Updated weights for policy 0, policy_version 620 (0.0011)
[2024-08-31 17:48:52,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2539520. Throughput: 0: 918.8. Samples: 635060. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:48:52,861][00204] Avg episode reward: [(0, '6.371')]
[2024-08-31 17:48:57,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3582.3). Total num frames: 2551808. Throughput: 0: 913.0. Samples: 637030. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:48:57,866][00204] Avg episode reward: [(0, '6.608')]
[2024-08-31 17:49:02,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2572288. Throughput: 0: 921.4. Samples: 642802. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:49:02,865][00204] Avg episode reward: [(0, '7.168')]
[2024-08-31 17:49:04,001][04508] Updated weights for policy 0, policy_version 630 (0.0012)
[2024-08-31 17:49:07,862][00204] Fps is (10 sec: 4094.8, 60 sec: 3686.3, 300 sec: 3582.3). Total num frames: 2592768. Throughput: 0: 914.3. Samples: 648798. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:49:07,864][00204] Avg episode reward: [(0, '7.753')]
[2024-08-31 17:49:07,880][04495] Saving new best policy, reward=7.753!
[2024-08-31 17:49:12,860][00204] Fps is (10 sec: 3686.1, 60 sec: 3686.3, 300 sec: 3596.1). Total num frames: 2609152. Throughput: 0: 911.8. Samples: 650784. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:49:12,863][00204] Avg episode reward: [(0, '7.667')]
[2024-08-31 17:49:15,482][04508] Updated weights for policy 0, policy_version 640 (0.0012)
[2024-08-31 17:49:17,861][00204] Fps is (10 sec: 3686.7, 60 sec: 3686.3, 300 sec: 3596.1). Total num frames: 2629632. Throughput: 0: 921.9. Samples: 656614. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:49:17,863][00204] Avg episode reward: [(0, '7.888')]
[2024-08-31 17:49:17,881][04495] Saving new best policy, reward=7.888!
[2024-08-31 17:49:22,859][00204] Fps is (10 sec: 4096.3, 60 sec: 3686.5, 300 sec: 3596.1). Total num frames: 2650112. Throughput: 0: 914.6. Samples: 662594. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:49:22,862][00204] Avg episode reward: [(0, '8.541')]
[2024-08-31 17:49:22,868][04495] Saving new best policy, reward=8.541!
[2024-08-31 17:49:27,229][04508] Updated weights for policy 0, policy_version 650 (0.0012)
[2024-08-31 17:49:27,859][00204] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 2662400. Throughput: 0: 911.3. Samples: 664394. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:49:27,862][00204] Avg episode reward: [(0, '9.313')]
[2024-08-31 17:49:27,870][04495] Saving new best policy, reward=9.313!
[2024-08-31 17:49:32,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 2682880. Throughput: 0: 918.7. Samples: 670208. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-08-31 17:49:32,867][00204] Avg episode reward: [(0, '9.941')]
[2024-08-31 17:49:32,870][04495] Saving new best policy, reward=9.941!
[2024-08-31 17:49:37,708][04508] Updated weights for policy 0, policy_version 660 (0.0012)
[2024-08-31 17:49:37,863][00204] Fps is (10 sec: 4094.4, 60 sec: 3686.2, 300 sec: 3596.1). Total num frames: 2703360. Throughput: 0: 908.9. Samples: 675962. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:49:37,865][00204] Avg episode reward: [(0, '10.372')]
[2024-08-31 17:49:37,877][04495] Saving new best policy, reward=10.372!
[2024-08-31 17:49:42,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 2715648. Throughput: 0: 906.8. Samples: 677834. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:49:42,867][00204] Avg episode reward: [(0, '10.754')]
[2024-08-31 17:49:42,870][04495] Saving new best policy, reward=10.754!
[2024-08-31 17:49:47,859][00204] Fps is (10 sec: 3278.1, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 2736128. Throughput: 0: 910.6. Samples: 683778. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:49:47,861][00204] Avg episode reward: [(0, '11.062')]
[2024-08-31 17:49:47,871][04495] Saving new best policy, reward=11.062!
[2024-08-31 17:49:48,983][04508] Updated weights for policy 0, policy_version 670 (0.0017)
[2024-08-31 17:49:52,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2756608. Throughput: 0: 903.4. Samples: 689450. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:49:52,862][00204] Avg episode reward: [(0, '11.935')]
[2024-08-31 17:49:52,866][04495] Saving new best policy, reward=11.935!
[2024-08-31 17:49:57,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2772992. Throughput: 0: 901.9. Samples: 691368. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:49:57,865][00204] Avg episode reward: [(0, '11.501')]
[2024-08-31 17:50:00,607][04508] Updated weights for policy 0, policy_version 680 (0.0014)
[2024-08-31 17:50:02,862][00204] Fps is (10 sec: 3685.3, 60 sec: 3686.2, 300 sec: 3637.8). Total num frames: 2793472. Throughput: 0: 908.7. Samples: 697508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:50:02,864][00204] Avg episode reward: [(0, '11.131')]
[2024-08-31 17:50:07,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3623.9). Total num frames: 2809856. Throughput: 0: 899.6. Samples: 703074. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:50:07,864][00204] Avg episode reward: [(0, '11.294')]
[2024-08-31 17:50:12,322][04508] Updated weights for policy 0, policy_version 690 (0.0017)
[2024-08-31 17:50:12,859][00204] Fps is (10 sec: 3277.7, 60 sec: 3618.2, 300 sec: 3623.9). Total num frames: 2826240. Throughput: 0: 903.5. Samples: 705050. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:50:12,867][00204] Avg episode reward: [(0, '11.116')]
[2024-08-31 17:50:17,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3623.9). Total num frames: 2846720. Throughput: 0: 911.0. Samples: 711204. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:50:17,861][00204] Avg episode reward: [(0, '11.321')]
[2024-08-31 17:50:22,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2863104. Throughput: 0: 905.8. Samples: 716718. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:50:22,861][00204] Avg episode reward: [(0, '10.787')]
[2024-08-31 17:50:22,868][04508] Updated weights for policy 0, policy_version 700 (0.0012)
[2024-08-31 17:50:27,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2883584. Throughput: 0: 912.1. Samples: 718878. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:50:27,861][00204] Avg episode reward: [(0, '9.875')]
[2024-08-31 17:50:27,872][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000704_2883584.pth...
[2024-08-31 17:50:27,966][04495] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000491_2011136.pth
[2024-08-31 17:50:32,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2904064. Throughput: 0: 918.9. Samples: 725128. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:50:32,865][00204] Avg episode reward: [(0, '9.616')]
[2024-08-31 17:50:33,727][04508] Updated weights for policy 0, policy_version 710 (0.0012)
[2024-08-31 17:50:37,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3637.8). Total num frames: 2920448. Throughput: 0: 911.5. Samples: 730468. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:50:37,861][00204] Avg episode reward: [(0, '9.444')]
[2024-08-31 17:50:42,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2936832. Throughput: 0: 916.4. Samples: 732608. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:50:42,862][00204] Avg episode reward: [(0, '9.720')]
[2024-08-31 17:50:45,211][04508] Updated weights for policy 0, policy_version 720 (0.0011)
[2024-08-31 17:50:47,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2957312. Throughput: 0: 918.7. Samples: 738846. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-08-31 17:50:47,861][00204] Avg episode reward: [(0, '10.043')]
[2024-08-31 17:50:52,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2973696. Throughput: 0: 913.3. Samples: 744172. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:50:52,863][00204] Avg episode reward: [(0, '10.669')]
[2024-08-31 17:50:56,821][04508] Updated weights for policy 0, policy_version 730 (0.0012)
[2024-08-31 17:50:57,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 2994176. Throughput: 0: 920.5. Samples: 746472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:50:57,865][00204] Avg episode reward: [(0, '11.472')]
[2024-08-31 17:51:02,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3651.7). Total num frames: 3014656. Throughput: 0: 924.4. Samples: 752800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:51:02,866][00204] Avg episode reward: [(0, '13.178')]
[2024-08-31 17:51:02,868][04495] Saving new best policy, reward=13.178!
[2024-08-31 17:51:07,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3026944. Throughput: 0: 912.7. Samples: 757790. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:51:07,865][00204] Avg episode reward: [(0, '14.392')]
[2024-08-31 17:51:07,889][04495] Saving new best policy, reward=14.392!
[2024-08-31 17:51:07,908][04508] Updated weights for policy 0, policy_version 740 (0.0015)
[2024-08-31 17:51:12,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3047424. Throughput: 0: 918.6. Samples: 760214. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-08-31 17:51:12,863][00204] Avg episode reward: [(0, '15.069')]
[2024-08-31 17:51:12,866][04495] Saving new best policy, reward=15.069!
[2024-08-31 17:51:17,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3067904. Throughput: 0: 917.4. Samples: 766410.
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:51:17,861][00204] Avg episode reward: [(0, '14.864')] [2024-08-31 17:51:18,310][04508] Updated weights for policy 0, policy_version 750 (0.0012) [2024-08-31 17:51:22,863][00204] Fps is (10 sec: 3684.9, 60 sec: 3686.2, 300 sec: 3637.8). Total num frames: 3084288. Throughput: 0: 905.2. Samples: 771204. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:51:22,868][00204] Avg episode reward: [(0, '14.338')] [2024-08-31 17:51:27,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3100672. Throughput: 0: 920.3. Samples: 774022. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:51:27,861][00204] Avg episode reward: [(0, '13.347')] [2024-08-31 17:51:29,792][04508] Updated weights for policy 0, policy_version 760 (0.0014) [2024-08-31 17:51:32,859][00204] Fps is (10 sec: 4097.6, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3125248. Throughput: 0: 919.9. Samples: 780242. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:51:32,868][00204] Avg episode reward: [(0, '12.174')] [2024-08-31 17:51:37,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3137536. Throughput: 0: 905.0. Samples: 784896. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:51:37,862][00204] Avg episode reward: [(0, '12.919')] [2024-08-31 17:51:41,464][04508] Updated weights for policy 0, policy_version 770 (0.0012) [2024-08-31 17:51:42,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3158016. Throughput: 0: 917.2. Samples: 787748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:51:42,866][00204] Avg episode reward: [(0, '13.970')] [2024-08-31 17:51:47,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3178496. Throughput: 0: 915.3. Samples: 793988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:51:47,863][00204] Avg episode reward: [(0, '14.952')] [2024-08-31 17:51:52,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3190784. Throughput: 0: 907.9. Samples: 798644. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:51:52,869][00204] Avg episode reward: [(0, '16.628')] [2024-08-31 17:51:52,872][04495] Saving new best policy, reward=16.628! [2024-08-31 17:51:53,079][04508] Updated weights for policy 0, policy_version 780 (0.0012) [2024-08-31 17:51:57,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3215360. Throughput: 0: 921.2. Samples: 801666. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:51:57,868][00204] Avg episode reward: [(0, '17.312')] [2024-08-31 17:51:57,877][04495] Saving new best policy, reward=17.312! [2024-08-31 17:52:02,844][04508] Updated weights for policy 0, policy_version 790 (0.0012) [2024-08-31 17:52:02,874][00204] Fps is (10 sec: 4498.8, 60 sec: 3685.5, 300 sec: 3665.4). Total num frames: 3235840. Throughput: 0: 921.0. Samples: 807870. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:52:02,877][00204] Avg episode reward: [(0, '18.520')] [2024-08-31 17:52:02,884][04495] Saving new best policy, reward=18.520! [2024-08-31 17:52:07,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3248128. Throughput: 0: 913.1. Samples: 812292. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:52:07,861][00204] Avg episode reward: [(0, '18.673')] [2024-08-31 17:52:07,876][04495] Saving new best policy, reward=18.673! [2024-08-31 17:52:12,859][00204] Fps is (10 sec: 3281.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3268608. Throughput: 0: 918.9. Samples: 815374. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:52:12,866][00204] Avg episode reward: [(0, '18.044')] [2024-08-31 17:52:14,431][04508] Updated weights for policy 0, policy_version 800 (0.0013) [2024-08-31 17:52:17,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3289088. Throughput: 0: 918.7. Samples: 821584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:52:17,861][00204] Avg episode reward: [(0, '16.828')] [2024-08-31 17:52:22,862][00204] Fps is (10 sec: 3275.9, 60 sec: 3618.2, 300 sec: 3651.7). Total num frames: 3301376. Throughput: 0: 915.7. Samples: 826106. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:52:22,863][00204] Avg episode reward: [(0, '15.547')] [2024-08-31 17:52:25,967][04508] Updated weights for policy 0, policy_version 810 (0.0012) [2024-08-31 17:52:27,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3651.7). Total num frames: 3321856. Throughput: 0: 922.1. Samples: 829244. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:52:27,865][00204] Avg episode reward: [(0, '15.734')] [2024-08-31 17:52:27,875][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000812_3325952.pth... [2024-08-31 17:52:27,977][04495] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000596_2441216.pth [2024-08-31 17:52:32,859][00204] Fps is (10 sec: 4097.1, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3342336. Throughput: 0: 923.4. Samples: 835542. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:52:32,866][00204] Avg episode reward: [(0, '17.144')] [2024-08-31 17:52:37,462][04508] Updated weights for policy 0, policy_version 820 (0.0012) [2024-08-31 17:52:37,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3358720. Throughput: 0: 917.7. Samples: 839940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-31 17:52:37,864][00204] Avg episode reward: [(0, '17.994')] [2024-08-31 17:52:42,861][00204] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3665.6). Total num frames: 3379200. Throughput: 0: 920.3. Samples: 843082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:52:42,863][00204] Avg episode reward: [(0, '18.674')] [2024-08-31 17:52:47,652][04508] Updated weights for policy 0, policy_version 830 (0.0012) [2024-08-31 17:52:47,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3399680. Throughput: 0: 921.4. Samples: 849318. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:52:47,862][00204] Avg episode reward: [(0, '20.654')] [2024-08-31 17:52:47,873][04495] Saving new best policy, reward=20.654! [2024-08-31 17:52:52,859][00204] Fps is (10 sec: 3277.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3411968. Throughput: 0: 921.0. Samples: 853736. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:52:52,861][00204] Avg episode reward: [(0, '20.424')] [2024-08-31 17:52:57,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3432448. Throughput: 0: 923.0. Samples: 856908. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:52:57,861][00204] Avg episode reward: [(0, '18.529')] [2024-08-31 17:52:58,876][04508] Updated weights for policy 0, policy_version 840 (0.0012) [2024-08-31 17:53:02,859][00204] Fps is (10 sec: 4095.9, 60 sec: 3619.0, 300 sec: 3665.6). Total num frames: 3452928. Throughput: 0: 919.6. Samples: 862968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:53:02,866][00204] Avg episode reward: [(0, '17.662')] [2024-08-31 17:53:07,861][00204] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3665.5). Total num frames: 3469312. Throughput: 0: 921.8. Samples: 867586. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:53:07,864][00204] Avg episode reward: [(0, '17.505')] [2024-08-31 17:53:10,502][04508] Updated weights for policy 0, policy_version 850 (0.0012) [2024-08-31 17:53:12,859][00204] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3489792. Throughput: 0: 922.8. Samples: 870770. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-31 17:53:12,863][00204] Avg episode reward: [(0, '16.379')] [2024-08-31 17:53:17,864][00204] Fps is (10 sec: 4094.6, 60 sec: 3686.1, 300 sec: 3665.5). Total num frames: 3510272. Throughput: 0: 915.8. Samples: 876758. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:53:17,869][00204] Avg episode reward: [(0, '16.488')] [2024-08-31 17:53:22,095][04508] Updated weights for policy 0, policy_version 860 (0.0011) [2024-08-31 17:53:22,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3665.6). Total num frames: 3522560. Throughput: 0: 921.6. Samples: 881414. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:53:22,861][00204] Avg episode reward: [(0, '18.571')] [2024-08-31 17:53:27,859][00204] Fps is (10 sec: 3278.5, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3543040. Throughput: 0: 921.4. Samples: 884542. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:53:27,861][00204] Avg episode reward: [(0, '19.828')] [2024-08-31 17:53:32,609][04508] Updated weights for policy 0, policy_version 870 (0.0013) [2024-08-31 17:53:32,863][00204] Fps is (10 sec: 4094.5, 60 sec: 3686.2, 300 sec: 3665.5). Total num frames: 3563520. Throughput: 0: 912.4. Samples: 890378. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:53:32,865][00204] Avg episode reward: [(0, '19.502')] [2024-08-31 17:53:37,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3579904. Throughput: 0: 919.5. Samples: 895112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:53:37,862][00204] Avg episode reward: [(0, '20.004')] [2024-08-31 17:53:42,859][00204] Fps is (10 sec: 3687.8, 60 sec: 3686.5, 300 sec: 3665.6). Total num frames: 3600384. Throughput: 0: 918.1. Samples: 898222. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:53:42,861][00204] Avg episode reward: [(0, '19.799')] [2024-08-31 17:53:43,637][04508] Updated weights for policy 0, policy_version 880 (0.0012) [2024-08-31 17:53:47,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3616768. Throughput: 0: 910.0. Samples: 903918. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:53:47,861][00204] Avg episode reward: [(0, '19.288')] [2024-08-31 17:53:52,861][00204] Fps is (10 sec: 3276.2, 60 sec: 3686.3, 300 sec: 3665.6). Total num frames: 3633152. Throughput: 0: 917.6. Samples: 908876. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:53:52,865][00204] Avg episode reward: [(0, '19.496')] [2024-08-31 17:53:55,152][04508] Updated weights for policy 0, policy_version 890 (0.0013) [2024-08-31 17:53:57,859][00204] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3653632. Throughput: 0: 917.9. Samples: 912074. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:53:57,861][00204] Avg episode reward: [(0, '19.375')] [2024-08-31 17:54:02,859][00204] Fps is (10 sec: 3686.9, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3670016. Throughput: 0: 910.9. Samples: 917742. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:54:02,862][00204] Avg episode reward: [(0, '19.949')] [2024-08-31 17:54:06,673][04508] Updated weights for policy 0, policy_version 900 (0.0012) [2024-08-31 17:54:07,859][00204] Fps is (10 sec: 3686.5, 60 sec: 3686.5, 300 sec: 3665.6). Total num frames: 3690496. Throughput: 0: 918.9. Samples: 922766. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:54:07,861][00204] Avg episode reward: [(0, '20.686')] [2024-08-31 17:54:07,875][04495] Saving new best policy, reward=20.686! [2024-08-31 17:54:12,859][00204] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3710976. Throughput: 0: 919.1. Samples: 925900. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:54:12,861][00204] Avg episode reward: [(0, '19.803')] [2024-08-31 17:54:17,345][04508] Updated weights for policy 0, policy_version 910 (0.0013) [2024-08-31 17:54:17,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3618.5, 300 sec: 3651.7). Total num frames: 3727360. Throughput: 0: 914.7. Samples: 931538. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:54:17,861][00204] Avg episode reward: [(0, '19.229')] [2024-08-31 17:54:22,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3743744. Throughput: 0: 923.5. Samples: 936668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:54:22,862][00204] Avg episode reward: [(0, '19.197')] [2024-08-31 17:54:27,859][00204] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3764224. Throughput: 0: 923.2. Samples: 939768. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-31 17:54:27,861][00204] Avg episode reward: [(0, '18.791')] [2024-08-31 17:54:27,945][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000920_3768320.pth... [2024-08-31 17:54:27,951][04508] Updated weights for policy 0, policy_version 920 (0.0019) [2024-08-31 17:54:28,042][04495] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000704_2883584.pth [2024-08-31 17:54:32,863][00204] Fps is (10 sec: 3685.0, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3780608. Throughput: 0: 916.7. Samples: 945172. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:54:32,865][00204] Avg episode reward: [(0, '18.304')] [2024-08-31 17:54:37,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3801088. Throughput: 0: 924.2. Samples: 950462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-31 17:54:37,862][00204] Avg episode reward: [(0, '18.973')] [2024-08-31 17:54:39,491][04508] Updated weights for policy 0, policy_version 930 (0.0016) [2024-08-31 17:54:42,859][00204] Fps is (10 sec: 4097.5, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3821568. Throughput: 0: 922.0. Samples: 953562. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:54:42,866][00204] Avg episode reward: [(0, '19.242')] [2024-08-31 17:54:47,860][00204] Fps is (10 sec: 3276.5, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 3833856. Throughput: 0: 914.0. Samples: 958872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:54:47,865][00204] Avg episode reward: [(0, '19.396')] [2024-08-31 17:54:51,305][04508] Updated weights for policy 0, policy_version 940 (0.0011) [2024-08-31 17:54:52,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3665.6). Total num frames: 3854336. Throughput: 0: 919.4. Samples: 964138. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-08-31 17:54:52,865][00204] Avg episode reward: [(0, '19.471')] [2024-08-31 17:54:57,859][00204] Fps is (10 sec: 4096.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3874816. Throughput: 0: 919.6. Samples: 967280. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:54:57,861][00204] Avg episode reward: [(0, '20.001')] [2024-08-31 17:55:02,306][04508] Updated weights for policy 0, policy_version 950 (0.0012) [2024-08-31 17:55:02,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3891200. Throughput: 0: 909.6. Samples: 972470. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:55:02,867][00204] Avg episode reward: [(0, '20.919')] [2024-08-31 17:55:02,872][04495] Saving new best policy, reward=20.919! [2024-08-31 17:55:07,859][00204] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3911680. Throughput: 0: 917.3. Samples: 977948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:55:07,863][00204] Avg episode reward: [(0, '19.841')] [2024-08-31 17:55:12,590][04508] Updated weights for policy 0, policy_version 960 (0.0012) [2024-08-31 17:55:12,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3679.5). Total num frames: 3932160. Throughput: 0: 918.8. Samples: 981114. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:55:12,861][00204] Avg episode reward: [(0, '20.117')] [2024-08-31 17:55:17,859][00204] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 3944448. Throughput: 0: 912.6. Samples: 986236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-08-31 17:55:17,863][00204] Avg episode reward: [(0, '19.866')] [2024-08-31 17:55:22,860][00204] Fps is (10 sec: 3276.4, 60 sec: 3686.3, 300 sec: 3665.6). Total num frames: 3964928. Throughput: 0: 919.0. Samples: 991820. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:55:22,862][00204] Avg episode reward: [(0, '19.386')] [2024-08-31 17:55:24,255][04508] Updated weights for policy 0, policy_version 970 (0.0013) [2024-08-31 17:55:27,859][00204] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 3985408. Throughput: 0: 919.5. Samples: 994938. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:55:27,868][00204] Avg episode reward: [(0, '19.646')] [2024-08-31 17:55:32,859][00204] Fps is (10 sec: 3686.7, 60 sec: 3686.6, 300 sec: 3665.6). Total num frames: 4001792. Throughput: 0: 912.5. Samples: 999934. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-08-31 17:55:32,865][00204] Avg episode reward: [(0, '19.173')] [2024-08-31 17:55:33,831][04495] Stopping Batcher_0... [2024-08-31 17:55:33,832][04495] Loop batcher_evt_loop terminating... [2024-08-31 17:55:33,833][00204] Component Batcher_0 stopped! [2024-08-31 17:55:33,841][00204] Component RolloutWorker_w2 process died already! 
Don't wait for it. [2024-08-31 17:55:33,843][00204] Component RolloutWorker_w3 process died already! Don't wait for it. [2024-08-31 17:55:33,848][00204] Component RolloutWorker_w4 process died already! Don't wait for it. [2024-08-31 17:55:33,851][00204] Component RolloutWorker_w5 process died already! Don't wait for it. [2024-08-31 17:55:33,854][00204] Component RolloutWorker_w6 process died already! Don't wait for it. [2024-08-31 17:55:33,860][04508] Weights refcount: 2 0 [2024-08-31 17:55:33,862][04508] Stopping InferenceWorker_p0-w0... [2024-08-31 17:55:33,862][04508] Loop inference_proc0-0_evt_loop terminating... [2024-08-31 17:55:33,865][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-31 17:55:33,863][00204] Component InferenceWorker_p0-w0 stopped! [2024-08-31 17:55:33,988][04495] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000812_3325952.pth [2024-08-31 17:55:34,000][04495] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-31 17:55:34,153][00204] Component RolloutWorker_w0 stopped! [2024-08-31 17:55:34,160][04509] Stopping RolloutWorker_w0... [2024-08-31 17:55:34,162][04509] Loop rollout_proc0_evt_loop terminating... [2024-08-31 17:55:34,164][04495] Stopping LearnerWorker_p0... [2024-08-31 17:55:34,166][04495] Loop learner_proc0_evt_loop terminating... [2024-08-31 17:55:34,164][00204] Component LearnerWorker_p0 stopped! [2024-08-31 17:55:34,310][04510] Stopping RolloutWorker_w1... [2024-08-31 17:55:34,312][04510] Loop rollout_proc1_evt_loop terminating... [2024-08-31 17:55:34,310][00204] Component RolloutWorker_w1 stopped! [2024-08-31 17:55:34,331][04516] Stopping RolloutWorker_w7... [2024-08-31 17:55:34,332][04516] Loop rollout_proc7_evt_loop terminating... [2024-08-31 17:55:34,331][00204] Component RolloutWorker_w7 stopped! [2024-08-31 17:55:34,337][00204] Waiting for process learner_proc0 to stop... [2024-08-31 17:55:35,406][00204] Waiting for process inference_proc0-0 to join... [2024-08-31 17:55:35,412][00204] Waiting for process rollout_proc0 to join... [2024-08-31 17:55:35,589][00204] Waiting for process rollout_proc1 to join... [2024-08-31 17:55:35,934][00204] Waiting for process rollout_proc2 to join... [2024-08-31 17:55:35,936][00204] Waiting for process rollout_proc3 to join... [2024-08-31 17:55:35,940][00204] Waiting for process rollout_proc4 to join... [2024-08-31 17:55:35,941][00204] Waiting for process rollout_proc5 to join... [2024-08-31 17:55:35,942][00204] Waiting for process rollout_proc6 to join... [2024-08-31 17:55:35,944][00204] Waiting for process rollout_proc7 to join... 
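Several rollout workers had already exited by the time the runner shut down, hence the "process died already! Don't wait for it." records above. As a rough illustration of that pattern (a minimal sketch, not Sample Factory's actual implementation; the function name and structure are hypothetical), the runner checks each component's process for an exit code before blocking on join:

import multiprocessing as mp

def join_components(procs: "dict[str, mp.Process]") -> None:
    # Sketch of the shutdown pattern suggested by the log records above.
    for name, proc in procs.items():
        if proc.exitcode is not None:  # the child has already terminated
            print(f"Component {name} process died already! Don't wait for it.")
            continue
        print(f"Waiting for process {name} to join...")
        proc.join()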
[2024-08-31 17:55:35,947][00204] Batcher 0 profile tree view:
batching: 20.2127, releasing_batches: 0.0252
[2024-08-31 17:55:35,949][00204] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0026
  wait_policy_total: 482.4535
update_model: 9.1112
  weight_update: 0.0014
one_step: 0.0023
  handle_policy_step: 585.9801
    deserialize: 16.0149, stack: 3.7532, obs_to_device_normalize: 135.3176, forward: 295.0496, send_messages: 21.5612
    prepare_outputs: 83.3401
      to_cpu: 51.3861
[2024-08-31 17:55:35,950][00204] Learner 0 profile tree view:
misc: 0.0067, prepare_batch: 14.2248
train: 67.1672
  epoch_init: 0.0055, minibatch_init: 0.0083, losses_postprocess: 0.5133, kl_divergence: 0.4832, after_optimizer: 32.3544
  calculate_losses: 21.1797
    losses_init: 0.0044, forward_head: 1.4029, bptt_initial: 14.3846, tail: 0.8013, advantages_returns: 0.2362, losses: 2.3960
    bptt: 1.6877
      bptt_forward_core: 1.6133
  update: 12.1174
    clip: 1.3428
[2024-08-31 17:55:35,952][00204] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.6342, enqueue_policy_requests: 322.6874, env_step: 627.3718, overhead: 29.3904, complete_rollouts: 3.6486
save_policy_outputs: 43.9834
  split_output_tensors: 14.8036
[2024-08-31 17:55:35,954][00204] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.5306, enqueue_policy_requests: 144.1409, env_step: 802.9788, overhead: 23.8648, complete_rollouts: 7.9195
save_policy_outputs: 40.4400
  split_output_tensors: 13.8662
[2024-08-31 17:55:35,955][00204] Loop Runner_EvtLoop terminating...
[2024-08-31 17:55:35,957][00204] Runner profile tree view:
main_loop: 1144.5452
[2024-08-31 17:55:35,958][00204] Collected {0: 4005888}, FPS: 3500.0
[2024-08-31 17:59:53,474][00204] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-08-31 17:59:53,476][00204] Overriding arg 'num_workers' with value 1 passed from command line
[2024-08-31 17:59:53,479][00204] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-08-31 17:59:53,481][00204] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-08-31 17:59:53,483][00204] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-08-31 17:59:53,486][00204] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-08-31 17:59:53,487][00204] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-08-31 17:59:53,489][00204] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-08-31 17:59:53,491][00204] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-08-31 17:59:53,492][00204] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-08-31 17:59:53,493][00204] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-08-31 17:59:53,494][00204] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-08-31 17:59:53,496][00204] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-08-31 17:59:53,497][00204] Adding new argument 'enjoy_script'=None that is not in the saved config file!
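An evaluation run like the one below is typically launched through Sample Factory's Python API. A hedged sketch follows: the module path for the VizDoom helpers and the env name (inferred from the hf_repository that appears later in this log) are assumptions, while the argv entries mirror the 'Overriding arg' and 'Adding new argument' records above.

from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.train_vizdoom import parse_vizdoom_cfg, register_vizdoom_components

register_vizdoom_components()  # registers the doom_* envs and models (assumed helper)
cfg = parse_vizdoom_cfg(  # assumed helper wrapping Sample Factory's arg parsing
    argv=[
        "--env=doom_health_gathering_supreme",  # assumed env name
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
    ],
    evaluation=True,
)
status = enjoy(cfg)  # plays episodes, emitting the 'Num frames' / 'Avg episode reward' records and replay.mp4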
[2024-08-31 17:59:53,499][00204] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-31 17:59:53,517][00204] Doom resolution: 160x120, resize resolution: (128, 72) [2024-08-31 17:59:53,519][00204] RunningMeanStd input shape: (3, 72, 128) [2024-08-31 17:59:53,521][00204] RunningMeanStd input shape: (1,) [2024-08-31 17:59:53,539][00204] ConvEncoder: input_channels=3 [2024-08-31 17:59:53,661][00204] Conv encoder output size: 512 [2024-08-31 17:59:53,662][00204] Policy head output size: 512 [2024-08-31 17:59:55,382][00204] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-31 17:59:56,229][00204] Num frames 100... [2024-08-31 17:59:56,349][00204] Num frames 200... [2024-08-31 17:59:56,522][00204] Num frames 300... [2024-08-31 17:59:56,684][00204] Num frames 400... [2024-08-31 17:59:56,847][00204] Num frames 500... [2024-08-31 17:59:57,017][00204] Num frames 600... [2024-08-31 17:59:57,183][00204] Num frames 700... [2024-08-31 17:59:57,273][00204] Avg episode rewards: #0: 14.180, true rewards: #0: 7.180 [2024-08-31 17:59:57,274][00204] Avg episode reward: 14.180, avg true_objective: 7.180 [2024-08-31 17:59:57,404][00204] Num frames 800... [2024-08-31 17:59:57,561][00204] Num frames 900... [2024-08-31 17:59:57,734][00204] Num frames 1000... [2024-08-31 17:59:57,900][00204] Num frames 1100... [2024-08-31 17:59:57,961][00204] Avg episode rewards: #0: 9.510, true rewards: #0: 5.510 [2024-08-31 17:59:57,963][00204] Avg episode reward: 9.510, avg true_objective: 5.510 [2024-08-31 17:59:58,138][00204] Num frames 1200... [2024-08-31 17:59:58,311][00204] Num frames 1300... [2024-08-31 17:59:58,482][00204] Num frames 1400... [2024-08-31 17:59:58,648][00204] Num frames 1500... [2024-08-31 17:59:58,821][00204] Num frames 1600... [2024-08-31 17:59:58,968][00204] Num frames 1700... [2024-08-31 17:59:59,096][00204] Num frames 1800... [2024-08-31 17:59:59,214][00204] Num frames 1900... [2024-08-31 17:59:59,336][00204] Num frames 2000... [2024-08-31 17:59:59,500][00204] Avg episode rewards: #0: 12.647, true rewards: #0: 6.980 [2024-08-31 17:59:59,502][00204] Avg episode reward: 12.647, avg true_objective: 6.980 [2024-08-31 17:59:59,513][00204] Num frames 2100... [2024-08-31 17:59:59,634][00204] Num frames 2200... [2024-08-31 17:59:59,757][00204] Num frames 2300... [2024-08-31 17:59:59,892][00204] Num frames 2400... [2024-08-31 18:00:00,013][00204] Num frames 2500... [2024-08-31 18:00:00,139][00204] Num frames 2600... [2024-08-31 18:00:00,258][00204] Num frames 2700... [2024-08-31 18:00:00,376][00204] Num frames 2800... [2024-08-31 18:00:00,495][00204] Num frames 2900... [2024-08-31 18:00:00,583][00204] Avg episode rewards: #0: 13.065, true rewards: #0: 7.315 [2024-08-31 18:00:00,584][00204] Avg episode reward: 13.065, avg true_objective: 7.315 [2024-08-31 18:00:00,672][00204] Num frames 3000... [2024-08-31 18:00:00,792][00204] Num frames 3100... [2024-08-31 18:00:00,916][00204] Num frames 3200... [2024-08-31 18:00:01,039][00204] Num frames 3300... [2024-08-31 18:00:01,163][00204] Num frames 3400... [2024-08-31 18:00:01,279][00204] Num frames 3500... [2024-08-31 18:00:01,400][00204] Num frames 3600... [2024-08-31 18:00:01,527][00204] Avg episode rewards: #0: 13.124, true rewards: #0: 7.324 [2024-08-31 18:00:01,528][00204] Avg episode reward: 13.124, avg true_objective: 7.324 [2024-08-31 18:00:01,578][00204] Num frames 3700... [2024-08-31 18:00:01,699][00204] Num frames 3800... 
[2024-08-31 18:00:01,823][00204] Num frames 3900... [2024-08-31 18:00:01,948][00204] Num frames 4000... [2024-08-31 18:00:02,067][00204] Num frames 4100... [2024-08-31 18:00:02,192][00204] Num frames 4200... [2024-08-31 18:00:02,309][00204] Num frames 4300... [2024-08-31 18:00:02,427][00204] Num frames 4400... [2024-08-31 18:00:02,518][00204] Avg episode rewards: #0: 13.717, true rewards: #0: 7.383 [2024-08-31 18:00:02,520][00204] Avg episode reward: 13.717, avg true_objective: 7.383 [2024-08-31 18:00:02,605][00204] Num frames 4500... [2024-08-31 18:00:02,726][00204] Num frames 4600... [2024-08-31 18:00:02,849][00204] Num frames 4700... [2024-08-31 18:00:02,971][00204] Num frames 4800... [2024-08-31 18:00:03,092][00204] Num frames 4900... [2024-08-31 18:00:03,223][00204] Num frames 5000... [2024-08-31 18:00:03,339][00204] Num frames 5100... [2024-08-31 18:00:03,456][00204] Num frames 5200... [2024-08-31 18:00:03,577][00204] Num frames 5300... [2024-08-31 18:00:03,699][00204] Num frames 5400... [2024-08-31 18:00:03,821][00204] Num frames 5500... [2024-08-31 18:00:03,945][00204] Num frames 5600... [2024-08-31 18:00:04,065][00204] Num frames 5700... [2024-08-31 18:00:04,185][00204] Num frames 5800... [2024-08-31 18:00:04,311][00204] Num frames 5900... [2024-08-31 18:00:04,430][00204] Num frames 6000... [2024-08-31 18:00:04,549][00204] Avg episode rewards: #0: 17.929, true rewards: #0: 8.643 [2024-08-31 18:00:04,551][00204] Avg episode reward: 17.929, avg true_objective: 8.643 [2024-08-31 18:00:04,611][00204] Num frames 6100... [2024-08-31 18:00:04,729][00204] Num frames 6200... [2024-08-31 18:00:04,859][00204] Num frames 6300... [2024-08-31 18:00:04,975][00204] Num frames 6400... [2024-08-31 18:00:05,035][00204] Avg episode rewards: #0: 16.503, true rewards: #0: 8.002 [2024-08-31 18:00:05,036][00204] Avg episode reward: 16.503, avg true_objective: 8.002 [2024-08-31 18:00:05,154][00204] Num frames 6500... [2024-08-31 18:00:05,278][00204] Num frames 6600... [2024-08-31 18:00:05,394][00204] Num frames 6700... [2024-08-31 18:00:05,515][00204] Num frames 6800... [2024-08-31 18:00:05,631][00204] Num frames 6900... [2024-08-31 18:00:05,751][00204] Num frames 7000... [2024-08-31 18:00:05,876][00204] Num frames 7100... [2024-08-31 18:00:05,996][00204] Num frames 7200... [2024-08-31 18:00:06,118][00204] Num frames 7300... [2024-08-31 18:00:06,235][00204] Num frames 7400... [2024-08-31 18:00:06,363][00204] Num frames 7500... [2024-08-31 18:00:06,481][00204] Num frames 7600... [2024-08-31 18:00:06,599][00204] Num frames 7700... [2024-08-31 18:00:06,725][00204] Num frames 7800... [2024-08-31 18:00:06,849][00204] Num frames 7900... [2024-08-31 18:00:06,967][00204] Num frames 8000... [2024-08-31 18:00:07,089][00204] Num frames 8100... [2024-08-31 18:00:07,206][00204] Num frames 8200... [2024-08-31 18:00:07,336][00204] Num frames 8300... [2024-08-31 18:00:07,469][00204] Num frames 8400... [2024-08-31 18:00:07,588][00204] Num frames 8500... [2024-08-31 18:00:07,647][00204] Avg episode rewards: #0: 21.335, true rewards: #0: 9.447 [2024-08-31 18:00:07,648][00204] Avg episode reward: 21.335, avg true_objective: 9.447 [2024-08-31 18:00:07,766][00204] Num frames 8600... [2024-08-31 18:00:07,893][00204] Num frames 8700... [2024-08-31 18:00:08,013][00204] Num frames 8800... [2024-08-31 18:00:08,129][00204] Num frames 8900... [2024-08-31 18:00:08,245][00204] Num frames 9000... [2024-08-31 18:00:08,374][00204] Num frames 9100... [2024-08-31 18:00:08,490][00204] Num frames 9200... 
[2024-08-31 18:00:08,626][00204] Avg episode rewards: #0: 20.670, true rewards: #0: 9.270 [2024-08-31 18:00:08,628][00204] Avg episode reward: 20.670, avg true_objective: 9.270 [2024-08-31 18:01:02,087][00204] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-08-31 18:03:56,713][00204] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-31 18:03:56,714][00204] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-31 18:03:56,717][00204] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-31 18:03:56,719][00204] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-31 18:03:56,721][00204] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-31 18:03:56,722][00204] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-31 18:03:56,725][00204] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-08-31 18:03:56,726][00204] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-31 18:03:56,727][00204] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-08-31 18:03:56,728][00204] Adding new argument 'hf_repository'='Cryxim/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-08-31 18:03:56,729][00204] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-31 18:03:56,730][00204] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-31 18:03:56,731][00204] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-31 18:03:56,733][00204] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-08-31 18:03:56,735][00204] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-31 18:03:56,749][00204] RunningMeanStd input shape: (3, 72, 128) [2024-08-31 18:03:56,751][00204] RunningMeanStd input shape: (1,) [2024-08-31 18:03:56,763][00204] ConvEncoder: input_channels=3 [2024-08-31 18:03:56,800][00204] Conv encoder output size: 512 [2024-08-31 18:03:56,801][00204] Policy head output size: 512 [2024-08-31 18:03:56,821][00204] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-31 18:03:57,295][00204] Num frames 100... [2024-08-31 18:03:57,411][00204] Num frames 200... [2024-08-31 18:03:57,528][00204] Num frames 300... [2024-08-31 18:03:57,644][00204] Num frames 400... [2024-08-31 18:03:57,760][00204] Num frames 500... [2024-08-31 18:03:57,884][00204] Num frames 600... [2024-08-31 18:03:58,002][00204] Num frames 700... [2024-08-31 18:03:58,121][00204] Num frames 800... [2024-08-31 18:03:58,240][00204] Num frames 900... [2024-08-31 18:03:58,413][00204] Avg episode rewards: #0: 21.920, true rewards: #0: 9.920 [2024-08-31 18:03:58,414][00204] Avg episode reward: 21.920, avg true_objective: 9.920 [2024-08-31 18:03:58,427][00204] Num frames 1000... [2024-08-31 18:03:58,542][00204] Num frames 1100... [2024-08-31 18:03:58,655][00204] Num frames 1200... [2024-08-31 18:03:58,769][00204] Num frames 1300... [2024-08-31 18:03:58,891][00204] Num frames 1400... [2024-08-31 18:03:59,004][00204] Num frames 1500... [2024-08-31 18:03:59,116][00204] Num frames 1600... 
[2024-08-31 18:03:59,206][00204] Avg episode rewards: #0: 16.660, true rewards: #0: 8.160 [2024-08-31 18:03:59,207][00204] Avg episode reward: 16.660, avg true_objective: 8.160 [2024-08-31 18:03:59,292][00204] Num frames 1700... [2024-08-31 18:03:59,414][00204] Num frames 1800... [2024-08-31 18:03:59,533][00204] Num frames 1900... [2024-08-31 18:03:59,648][00204] Num frames 2000... [2024-08-31 18:03:59,764][00204] Num frames 2100... [2024-08-31 18:03:59,885][00204] Num frames 2200... [2024-08-31 18:03:59,989][00204] Avg episode rewards: #0: 16.140, true rewards: #0: 7.473 [2024-08-31 18:03:59,992][00204] Avg episode reward: 16.140, avg true_objective: 7.473 [2024-08-31 18:04:00,058][00204] Num frames 2300... [2024-08-31 18:04:00,171][00204] Num frames 2400... [2024-08-31 18:04:00,290][00204] Num frames 2500... [2024-08-31 18:04:00,414][00204] Num frames 2600... [2024-08-31 18:04:00,530][00204] Num frames 2700... [2024-08-31 18:04:00,649][00204] Avg episode rewards: #0: 14.635, true rewards: #0: 6.885 [2024-08-31 18:04:00,651][00204] Avg episode reward: 14.635, avg true_objective: 6.885 [2024-08-31 18:04:00,705][00204] Num frames 2800... [2024-08-31 18:04:00,825][00204] Num frames 2900... [2024-08-31 18:04:00,949][00204] Num frames 3000... [2024-08-31 18:04:01,072][00204] Num frames 3100... [2024-08-31 18:04:01,298][00204] Num frames 3200... [2024-08-31 18:04:01,479][00204] Num frames 3300... [2024-08-31 18:04:01,595][00204] Num frames 3400... [2024-08-31 18:04:01,718][00204] Avg episode rewards: #0: 14.712, true rewards: #0: 6.912 [2024-08-31 18:04:01,720][00204] Avg episode reward: 14.712, avg true_objective: 6.912 [2024-08-31 18:04:01,774][00204] Num frames 3500... [2024-08-31 18:04:01,895][00204] Num frames 3600... [2024-08-31 18:04:02,174][00204] Num frames 3700... [2024-08-31 18:04:02,307][00204] Num frames 3800... [2024-08-31 18:04:02,432][00204] Num frames 3900... [2024-08-31 18:04:02,549][00204] Num frames 4000... [2024-08-31 18:04:02,665][00204] Num frames 4100... [2024-08-31 18:04:02,812][00204] Num frames 4200... [2024-08-31 18:04:02,975][00204] Avg episode rewards: #0: 14.873, true rewards: #0: 7.040 [2024-08-31 18:04:02,978][00204] Avg episode reward: 14.873, avg true_objective: 7.040 [2024-08-31 18:04:03,078][00204] Num frames 4300... [2024-08-31 18:04:03,200][00204] Num frames 4400... [2024-08-31 18:04:03,321][00204] Num frames 4500... [2024-08-31 18:04:03,445][00204] Num frames 4600... [2024-08-31 18:04:03,560][00204] Num frames 4700... [2024-08-31 18:04:03,677][00204] Num frames 4800... [2024-08-31 18:04:03,794][00204] Num frames 4900... [2024-08-31 18:04:03,963][00204] Avg episode rewards: #0: 14.560, true rewards: #0: 7.131 [2024-08-31 18:04:03,965][00204] Avg episode reward: 14.560, avg true_objective: 7.131 [2024-08-31 18:04:03,978][00204] Num frames 5000... [2024-08-31 18:04:04,093][00204] Num frames 5100... [2024-08-31 18:04:04,209][00204] Num frames 5200... [2024-08-31 18:04:04,326][00204] Num frames 5300... [2024-08-31 18:04:04,452][00204] Num frames 5400... [2024-08-31 18:04:04,572][00204] Num frames 5500... [2024-08-31 18:04:04,667][00204] Avg episode rewards: #0: 13.670, true rewards: #0: 6.920 [2024-08-31 18:04:04,669][00204] Avg episode reward: 13.670, avg true_objective: 6.920 [2024-08-31 18:04:04,743][00204] Num frames 5600... [2024-08-31 18:04:04,876][00204] Num frames 5700... [2024-08-31 18:04:04,993][00204] Num frames 5800... [2024-08-31 18:04:05,108][00204] Num frames 5900... [2024-08-31 18:04:05,224][00204] Num frames 6000... 
[2024-08-31 18:04:05,340][00204] Num frames 6100... [2024-08-31 18:04:05,458][00204] Num frames 6200... [2024-08-31 18:04:05,580][00204] Num frames 6300... [2024-08-31 18:04:05,698][00204] Num frames 6400... [2024-08-31 18:04:05,815][00204] Num frames 6500... [2024-08-31 18:04:05,937][00204] Num frames 6600... [2024-08-31 18:04:06,055][00204] Num frames 6700... [2024-08-31 18:04:06,171][00204] Num frames 6800... [2024-08-31 18:04:06,298][00204] Num frames 6900... [2024-08-31 18:04:06,463][00204] Num frames 7000... [2024-08-31 18:04:06,637][00204] Num frames 7100... [2024-08-31 18:04:06,798][00204] Num frames 7200... [2024-08-31 18:04:06,975][00204] Avg episode rewards: #0: 16.639, true rewards: #0: 8.083 [2024-08-31 18:04:06,978][00204] Avg episode reward: 16.639, avg true_objective: 8.083 [2024-08-31 18:04:07,023][00204] Num frames 7300... [2024-08-31 18:04:07,179][00204] Num frames 7400... [2024-08-31 18:04:07,337][00204] Num frames 7500... [2024-08-31 18:04:07,497][00204] Num frames 7600... [2024-08-31 18:04:07,672][00204] Num frames 7700... [2024-08-31 18:04:07,849][00204] Num frames 7800... [2024-08-31 18:04:08,016][00204] Num frames 7900... [2024-08-31 18:04:08,181][00204] Num frames 8000... [2024-08-31 18:04:08,347][00204] Num frames 8100... [2024-08-31 18:04:08,507][00204] Num frames 8200... [2024-08-31 18:04:08,685][00204] Num frames 8300... [2024-08-31 18:04:08,906][00204] Avg episode rewards: #0: 17.495, true rewards: #0: 8.395 [2024-08-31 18:04:08,908][00204] Avg episode reward: 17.495, avg true_objective: 8.395 [2024-08-31 18:04:55,697][00204] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-08-31 18:05:02,002][00204] The model has been pushed to https://huggingface.co/Cryxim/rl_course_vizdoom_health_gathering_supreme [2024-08-31 18:07:14,466][00204] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-08-31 18:07:14,468][00204] Overriding arg 'num_workers' with value 1 passed from command line [2024-08-31 18:07:14,470][00204] Adding new argument 'no_render'=True that is not in the saved config file! [2024-08-31 18:07:14,472][00204] Adding new argument 'save_video'=True that is not in the saved config file! [2024-08-31 18:07:14,474][00204] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-08-31 18:07:14,475][00204] Adding new argument 'video_name'=None that is not in the saved config file! [2024-08-31 18:07:14,477][00204] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-08-31 18:07:14,478][00204] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-08-31 18:07:14,479][00204] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-08-31 18:07:14,480][00204] Adding new argument 'hf_repository'='Cryxim/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-08-31 18:07:14,481][00204] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-08-31 18:07:14,482][00204] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-08-31 18:07:14,483][00204] Adding new argument 'train_script'=None that is not in the saved config file! [2024-08-31 18:07:14,484][00204] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
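The 18:03 and 18:07 runs repeat the evaluation with two overrides the first run did not use: max_num_frames drops to 100000, and push_to_hub/hf_repository are set, which is what produced the "model has been pushed" record above. Under the same assumptions as the earlier sketch, only the argv changes (a sketch, not the literal command from this session):

# Same assumed parse_vizdoom_cfg helper as in the earlier sketch; only the
# overrides differ, mirroring the 'Adding new argument' records above.
argv = [
    "--env=doom_health_gathering_supreme",  # assumed env name
    "--num_workers=1",
    "--no_render",
    "--save_video",
    "--max_num_frames=100000",
    "--max_num_episodes=10",
    "--push_to_hub",
    "--hf_repository=Cryxim/rl_course_vizdoom_health_gathering_supreme",
]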
[2024-08-31 18:07:14,485][00204] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-08-31 18:07:14,494][00204] RunningMeanStd input shape: (3, 72, 128) [2024-08-31 18:07:14,500][00204] RunningMeanStd input shape: (1,) [2024-08-31 18:07:14,513][00204] ConvEncoder: input_channels=3 [2024-08-31 18:07:14,548][00204] Conv encoder output size: 512 [2024-08-31 18:07:14,549][00204] Policy head output size: 512 [2024-08-31 18:07:14,568][00204] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-08-31 18:07:15,042][00204] Num frames 100... [2024-08-31 18:07:15,157][00204] Num frames 200... [2024-08-31 18:07:15,270][00204] Num frames 300... [2024-08-31 18:07:15,385][00204] Num frames 400... [2024-08-31 18:07:15,496][00204] Num frames 500... [2024-08-31 18:07:15,638][00204] Avg episode rewards: #0: 8.760, true rewards: #0: 5.760 [2024-08-31 18:07:15,640][00204] Avg episode reward: 8.760, avg true_objective: 5.760 [2024-08-31 18:07:15,676][00204] Num frames 600... [2024-08-31 18:07:15,793][00204] Num frames 700... [2024-08-31 18:07:15,920][00204] Num frames 800... [2024-08-31 18:07:16,037][00204] Num frames 900... [2024-08-31 18:07:16,150][00204] Num frames 1000... [2024-08-31 18:07:16,265][00204] Num frames 1100... [2024-08-31 18:07:16,377][00204] Num frames 1200... [2024-08-31 18:07:16,497][00204] Num frames 1300... [2024-08-31 18:07:16,612][00204] Num frames 1400... [2024-08-31 18:07:16,755][00204] Num frames 1500... [2024-08-31 18:07:16,894][00204] Num frames 1600... [2024-08-31 18:07:16,986][00204] Avg episode rewards: #0: 14.160, true rewards: #0: 8.160 [2024-08-31 18:07:16,988][00204] Avg episode reward: 14.160, avg true_objective: 8.160 [2024-08-31 18:07:17,109][00204] Num frames 1700... [2024-08-31 18:07:17,268][00204] Num frames 1800... [2024-08-31 18:07:17,425][00204] Num frames 1900... [2024-08-31 18:07:17,576][00204] Num frames 2000... [2024-08-31 18:07:17,734][00204] Num frames 2100... [2024-08-31 18:07:17,908][00204] Avg episode rewards: #0: 11.920, true rewards: #0: 7.253 [2024-08-31 18:07:17,910][00204] Avg episode reward: 11.920, avg true_objective: 7.253 [2024-08-31 18:07:17,951][00204] Num frames 2200... [2024-08-31 18:07:18,102][00204] Num frames 2300... [2024-08-31 18:07:18,266][00204] Num frames 2400... [2024-08-31 18:07:18,431][00204] Num frames 2500... [2024-08-31 18:07:18,595][00204] Num frames 2600... [2024-08-31 18:07:18,794][00204] Num frames 2700... [2024-08-31 18:07:18,991][00204] Avg episode rewards: #0: 11.210, true rewards: #0: 6.960 [2024-08-31 18:07:18,994][00204] Avg episode reward: 11.210, avg true_objective: 6.960 [2024-08-31 18:07:19,024][00204] Num frames 2800... [2024-08-31 18:07:19,187][00204] Num frames 2900... [2024-08-31 18:07:19,350][00204] Num frames 3000... [2024-08-31 18:07:19,521][00204] Num frames 3100... [2024-08-31 18:07:19,689][00204] Num frames 3200... [2024-08-31 18:07:19,819][00204] Num frames 3300... [2024-08-31 18:07:19,940][00204] Num frames 3400... [2024-08-31 18:07:20,051][00204] Avg episode rewards: #0: 11.694, true rewards: #0: 6.894 [2024-08-31 18:07:20,053][00204] Avg episode reward: 11.694, avg true_objective: 6.894 [2024-08-31 18:07:20,117][00204] Num frames 3500... [2024-08-31 18:07:20,232][00204] Num frames 3600... [2024-08-31 18:07:20,349][00204] Num frames 3700... [2024-08-31 18:07:20,464][00204] Num frames 3800... [2024-08-31 18:07:20,589][00204] Num frames 3900... [2024-08-31 18:07:20,713][00204] Num frames 4000... 
[2024-08-31 18:07:20,839][00204] Num frames 4100... [2024-08-31 18:07:20,988][00204] Avg episode rewards: #0: 12.290, true rewards: #0: 6.957 [2024-08-31 18:07:20,990][00204] Avg episode reward: 12.290, avg true_objective: 6.957 [2024-08-31 18:07:21,024][00204] Num frames 4200... [2024-08-31 18:07:21,147][00204] Num frames 4300... [2024-08-31 18:07:21,265][00204] Num frames 4400... [2024-08-31 18:07:21,381][00204] Num frames 4500... [2024-08-31 18:07:21,497][00204] Num frames 4600... [2024-08-31 18:07:21,618][00204] Num frames 4700... [2024-08-31 18:07:21,734][00204] Num frames 4800... [2024-08-31 18:07:21,866][00204] Num frames 4900... [2024-08-31 18:07:21,986][00204] Num frames 5000... [2024-08-31 18:07:22,108][00204] Num frames 5100... [2024-08-31 18:07:22,207][00204] Avg episode rewards: #0: 13.049, true rewards: #0: 7.334 [2024-08-31 18:07:22,209][00204] Avg episode reward: 13.049, avg true_objective: 7.334 [2024-08-31 18:07:22,292][00204] Num frames 5200... [2024-08-31 18:07:22,410][00204] Num frames 5300... [2024-08-31 18:07:22,525][00204] Num frames 5400... [2024-08-31 18:07:22,648][00204] Num frames 5500... [2024-08-31 18:07:22,777][00204] Num frames 5600... [2024-08-31 18:07:22,913][00204] Num frames 5700... [2024-08-31 18:07:23,071][00204] Avg episode rewards: #0: 13.231, true rewards: #0: 7.231 [2024-08-31 18:07:23,072][00204] Avg episode reward: 13.231, avg true_objective: 7.231 [2024-08-31 18:07:23,093][00204] Num frames 5800... [2024-08-31 18:07:23,208][00204] Num frames 5900... [2024-08-31 18:07:23,329][00204] Num frames 6000... [2024-08-31 18:07:23,449][00204] Num frames 6100... [2024-08-31 18:07:23,565][00204] Num frames 6200... [2024-08-31 18:07:23,687][00204] Num frames 6300... [2024-08-31 18:07:23,803][00204] Num frames 6400... [2024-08-31 18:07:23,938][00204] Num frames 6500... [2024-08-31 18:07:24,057][00204] Num frames 6600... [2024-08-31 18:07:24,172][00204] Num frames 6700... [2024-08-31 18:07:24,283][00204] Avg episode rewards: #0: 14.161, true rewards: #0: 7.494 [2024-08-31 18:07:24,284][00204] Avg episode reward: 14.161, avg true_objective: 7.494 [2024-08-31 18:07:24,351][00204] Num frames 6800... [2024-08-31 18:07:24,466][00204] Num frames 6900... [2024-08-31 18:07:24,595][00204] Num frames 7000... [2024-08-31 18:07:24,711][00204] Num frames 7100... [2024-08-31 18:07:24,836][00204] Num frames 7200... [2024-08-31 18:07:24,963][00204] Num frames 7300... [2024-08-31 18:07:25,082][00204] Num frames 7400... [2024-08-31 18:07:25,202][00204] Num frames 7500... [2024-08-31 18:07:25,321][00204] Num frames 7600... [2024-08-31 18:07:25,440][00204] Num frames 7700... [2024-08-31 18:07:25,556][00204] Num frames 7800... [2024-08-31 18:07:25,672][00204] Num frames 7900... [2024-08-31 18:07:25,797][00204] Num frames 8000... [2024-08-31 18:07:25,920][00204] Num frames 8100... [2024-08-31 18:07:26,050][00204] Num frames 8200... [2024-08-31 18:07:26,168][00204] Num frames 8300... [2024-08-31 18:07:26,227][00204] Avg episode rewards: #0: 17.001, true rewards: #0: 8.301 [2024-08-31 18:07:26,228][00204] Avg episode reward: 17.001, avg true_objective: 8.301 [2024-08-31 18:08:13,048][00204] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
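All three evaluation runs loaded the same final checkpoint, checkpoint_000000978_4005888.pth. Since it is a plain PyTorch .pth file, it can be inspected directly; a minimal sketch, assuming the usual Sample Factory checkpoint layout (the key names are assumptions, not confirmed by this log):

import torch

ckpt = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth",
    map_location="cpu",
)
print(sorted(ckpt.keys()))   # e.g. 'model', 'optimizer', 'env_steps', ... (assumed keys)
print(ckpt.get("env_steps")) # if present, should match the 4005888 frames collected above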