diff --git "a/sf_log.txt" "b/sf_log.txt"
--- "a/sf_log.txt"
+++ "b/sf_log.txt"
@@ -1,50 +1,49 @@
-[2024-09-15 15:33:57,678][00283] Saving configuration to /content/train_dir/default_experiment/config.json...
-[2024-09-15 15:33:57,680][00283] Rollout worker 0 uses device cpu
-[2024-09-15 15:33:57,681][00283] Rollout worker 1 uses device cpu
-[2024-09-15 15:33:57,683][00283] Rollout worker 2 uses device cpu
-[2024-09-15 15:33:57,684][00283] Rollout worker 3 uses device cpu
-[2024-09-15 15:33:57,686][00283] Rollout worker 4 uses device cpu
-[2024-09-15 15:33:57,687][00283] Rollout worker 5 uses device cpu
-[2024-09-15 15:33:57,689][00283] Rollout worker 6 uses device cpu
-[2024-09-15 15:33:57,690][00283] Rollout worker 7 uses device cpu
-[2024-09-15 15:33:57,816][00283] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-15 15:33:57,817][00283] InferenceWorker_p0-w0: min num requests: 2
-[2024-09-15 15:33:57,849][00283] Starting all processes...
-[2024-09-15 15:33:57,850][00283] Starting process learner_proc0
-[2024-09-15 15:33:58,582][00283] Starting all processes...
-[2024-09-15 15:33:58,587][00283] Starting process inference_proc0-0
-[2024-09-15 15:33:58,588][00283] Starting process rollout_proc0
-[2024-09-15 15:33:58,588][00283] Starting process rollout_proc1
-[2024-09-15 15:33:58,589][00283] Starting process rollout_proc2
-[2024-09-15 15:33:58,590][00283] Starting process rollout_proc3
-[2024-09-15 15:33:58,602][00283] Starting process rollout_proc4
-[2024-09-15 15:33:58,605][00283] Starting process rollout_proc5
-[2024-09-15 15:33:58,606][00283] Starting process rollout_proc6
-[2024-09-15 15:33:58,607][00283] Starting process rollout_proc7
-[2024-09-15 15:34:00,971][00927] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-09-15 15:34:01,283][00924] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-09-15 15:34:01,380][00922] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-09-15 15:34:01,482][00920] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-15 15:34:01,482][00920] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2024-09-15 15:34:01,497][00920] Num visible devices: 1
-[2024-09-15 15:34:01,497][00905] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-15 15:34:01,497][00905] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2024-09-15 15:34:01,515][00905] Num visible devices: 1
-[2024-09-15 15:34:01,516][00925] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-09-15 15:34:01,539][00905] Starting seed is not provided
-[2024-09-15 15:34:01,540][00905] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-15 15:34:01,540][00905] Initializing actor-critic model on device cuda:0
-[2024-09-15 15:34:01,540][00905] RunningMeanStd input shape: (3, 72, 128)
-[2024-09-15 15:34:01,544][00905] RunningMeanStd input shape: (1,)
-[2024-09-15 15:34:01,565][00905] ConvEncoder: input_channels=3
-[2024-09-15 15:34:01,604][00919] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-09-15 15:34:01,649][00923] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-09-15 15:34:01,649][00921] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-09-15 15:34:01,682][00926] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
-[2024-09-15 15:34:01,850][00905] Conv encoder output size: 512
-[2024-09-15 15:34:01,851][00905] Policy head output size: 512
-[2024-09-15 15:34:01,915][00905] Created Actor Critic model with architecture:
-[2024-09-15 15:34:01,915][00905] ActorCriticSharedWeights(
+[2024-09-22 05:59:43,579][04746] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-09-22 05:59:43,581][04746] Rollout worker 0 uses device cpu
+[2024-09-22 05:59:43,582][04746] Rollout worker 1 uses device cpu
+[2024-09-22 05:59:43,584][04746] Rollout worker 2 uses device cpu
+[2024-09-22 05:59:43,586][04746] Rollout worker 3 uses device cpu
+[2024-09-22 05:59:43,587][04746] Rollout worker 4 uses device cpu
+[2024-09-22 05:59:43,589][04746] Rollout worker 5 uses device cpu
+[2024-09-22 05:59:43,590][04746] Rollout worker 6 uses device cpu
+[2024-09-22 05:59:43,592][04746] Rollout worker 7 uses device cpu
+[2024-09-22 05:59:43,714][04746] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 05:59:43,716][04746] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-22 05:59:43,750][04746] Starting all processes...
+[2024-09-22 05:59:43,753][04746] Starting process learner_proc0
+[2024-09-22 05:59:44,501][04746] Starting all processes...
+[2024-09-22 05:59:44,507][04746] Starting process inference_proc0-0
+[2024-09-22 05:59:44,508][04746] Starting process rollout_proc0
+[2024-09-22 05:59:44,508][04746] Starting process rollout_proc1
+[2024-09-22 05:59:44,509][04746] Starting process rollout_proc2
+[2024-09-22 05:59:44,510][04746] Starting process rollout_proc3
+[2024-09-22 05:59:44,510][04746] Starting process rollout_proc4
+[2024-09-22 05:59:44,511][04746] Starting process rollout_proc5
+[2024-09-22 05:59:44,520][04746] Starting process rollout_proc6
+[2024-09-22 05:59:44,525][04746] Starting process rollout_proc7
+[2024-09-22 05:59:48,487][06918] Worker 7 uses CPU cores [7]
+[2024-09-22 05:59:48,681][06893] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 05:59:48,681][06893] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-22 05:59:48,700][06893] Num visible devices: 1
+[2024-09-22 05:59:48,736][06910] Worker 3 uses CPU cores [3]
+[2024-09-22 05:59:48,761][06893] Starting seed is not provided
+[2024-09-22 05:59:48,761][06893] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 05:59:48,762][06893] Initializing actor-critic model on device cuda:0
+[2024-09-22 05:59:48,762][06893] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-22 05:59:48,765][06893] RunningMeanStd input shape: (1,)
+[2024-09-22 05:59:48,801][06893] ConvEncoder: input_channels=3
+[2024-09-22 05:59:48,897][06906] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 05:59:48,898][06906] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-22 05:59:48,921][06906] Num visible devices: 1
+[2024-09-22 05:59:48,976][06909] Worker 2 uses CPU cores [2]
+[2024-09-22 05:59:49,076][06908] Worker 1 uses CPU cores [1]
+[2024-09-22 05:59:49,172][06893] Conv encoder output size: 512
+[2024-09-22 05:59:49,172][06893] Policy head output size: 512
+[2024-09-22 05:59:49,177][06913] Worker 4 uses CPU cores [4]
+[2024-09-22 05:59:49,196][06907] Worker 0 uses CPU cores [0]
+[2024-09-22 05:59:49,230][06912] Worker 6 uses CPU cores [6]
+[2024-09-22 05:59:49,240][06893] Created Actor Critic model with architecture:
+[2024-09-22 05:59:49,240][06893] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -85,535 +84,2144 @@
     (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
   )
 )
-[2024-09-15 15:34:02,226][00905] Using optimizer
-[2024-09-15 15:34:02,893][00905] No checkpoints found
-[2024-09-15 15:34:02,893][00905] Did not load from checkpoint, starting from scratch!
-[2024-09-15 15:34:02,893][00905] Initialized policy 0 weights for model version 0
-[2024-09-15 15:34:02,898][00905] LearnerWorker_p0 finished initialization!
-[2024-09-15 15:34:02,898][00905] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2024-09-15 15:34:02,973][00920] RunningMeanStd input shape: (3, 72, 128)
-[2024-09-15 15:34:02,974][00920] RunningMeanStd input shape: (1,)
-[2024-09-15 15:34:02,986][00920] ConvEncoder: input_channels=3
-[2024-09-15 15:34:03,094][00920] Conv encoder output size: 512
-[2024-09-15 15:34:03,094][00920] Policy head output size: 512
-[2024-09-15 15:34:03,147][00283] Inference worker 0-0 is ready!
-[2024-09-15 15:34:03,149][00283] All inference workers are ready! Signal rollout workers to start!
-[2024-09-15 15:34:03,181][00919] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-15 15:34:03,181][00926] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-15 15:34:03,182][00921] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-15 15:34:03,182][00922] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-15 15:34:03,201][00925] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-15 15:34:03,201][00927] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-15 15:34:03,202][00923] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-15 15:34:03,202][00924] Doom resolution: 160x120, resize resolution: (128, 72)
-[2024-09-15 15:34:03,610][00925] Decorrelating experience for 0 frames...
-[2024-09-15 15:34:03,610][00921] Decorrelating experience for 0 frames...
-[2024-09-15 15:34:03,610][00924] Decorrelating experience for 0 frames...
-[2024-09-15 15:34:03,610][00922] Decorrelating experience for 0 frames...
-[2024-09-15 15:34:03,610][00926] Decorrelating experience for 0 frames...
-[2024-09-15 15:34:03,879][00922] Decorrelating experience for 32 frames...
-[2024-09-15 15:34:03,879][00925] Decorrelating experience for 32 frames...
-[2024-09-15 15:34:03,881][00923] Decorrelating experience for 0 frames...
-[2024-09-15 15:34:03,881][00924] Decorrelating experience for 32 frames...
-[2024-09-15 15:34:03,886][00926] Decorrelating experience for 32 frames...
-[2024-09-15 15:34:03,989][00927] Decorrelating experience for 0 frames...
-[2024-09-15 15:34:03,995][00921] Decorrelating experience for 32 frames...
-[2024-09-15 15:34:04,123][00923] Decorrelating experience for 32 frames...
-[2024-09-15 15:34:04,203][00924] Decorrelating experience for 64 frames...
-[2024-09-15 15:34:04,235][00925] Decorrelating experience for 64 frames...
-[2024-09-15 15:34:04,241][00927] Decorrelating experience for 32 frames...
-[2024-09-15 15:34:04,242][00922] Decorrelating experience for 64 frames...
-[2024-09-15 15:34:04,347][00921] Decorrelating experience for 64 frames...
-[2024-09-15 15:34:04,453][00926] Decorrelating experience for 64 frames...
-[2024-09-15 15:34:04,497][00923] Decorrelating experience for 64 frames...
-[2024-09-15 15:34:04,509][00924] Decorrelating experience for 96 frames...
-[2024-09-15 15:34:04,548][00922] Decorrelating experience for 96 frames...
-[2024-09-15 15:34:04,551][00925] Decorrelating experience for 96 frames...
-[2024-09-15 15:34:04,653][00921] Decorrelating experience for 96 frames...
-[2024-09-15 15:34:04,749][00927] Decorrelating experience for 64 frames...
-[2024-09-15 15:34:04,762][00926] Decorrelating experience for 96 frames...
-[2024-09-15 15:34:04,941][00923] Decorrelating experience for 96 frames...
-[2024-09-15 15:34:05,025][00927] Decorrelating experience for 96 frames...
-[2024-09-15 15:34:07,213][00905] Signal inference workers to stop experience collection...
-[2024-09-15 15:34:07,218][00920] InferenceWorker_p0-w0: stopping experience collection
-[2024-09-15 15:34:07,575][00283] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 32. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2024-09-15 15:34:07,576][00283] Avg episode reward: [(0, '2.837')]
-[2024-09-15 15:34:10,276][00905] Signal inference workers to resume experience collection...
-[2024-09-15 15:34:10,276][00920] InferenceWorker_p0-w0: resuming experience collection
-[2024-09-15 15:34:12,406][00920] Updated weights for policy 0, policy_version 10 (0.0149)
-[2024-09-15 15:34:12,575][00283] Fps is (10 sec: 8191.8, 60 sec: 8191.8, 300 sec: 8191.8). Total num frames: 40960. Throughput: 0: 1940.4. Samples: 9734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:34:12,578][00283] Avg episode reward: [(0, '4.224')]
-[2024-09-15 15:34:14,647][00920] Updated weights for policy 0, policy_version 20 (0.0013)
-[2024-09-15 15:34:16,934][00920] Updated weights for policy 0, policy_version 30 (0.0013)
-[2024-09-15 15:34:17,575][00283] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13107.2). Total num frames: 131072. Throughput: 0: 2320.8. Samples: 23240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-15 15:34:17,578][00283] Avg episode reward: [(0, '4.536')]
-[2024-09-15 15:34:17,598][00905] Saving new best policy, reward=4.536!
-[2024-09-15 15:34:17,807][00283] Heartbeat connected on Batcher_0
-[2024-09-15 15:34:17,819][00283] Heartbeat connected on LearnerWorker_p0
-[2024-09-15 15:34:17,823][00283] Heartbeat connected on InferenceWorker_p0-w0
-[2024-09-15 15:34:17,829][00283] Heartbeat connected on RolloutWorker_w1
-[2024-09-15 15:34:17,832][00283] Heartbeat connected on RolloutWorker_w2
-[2024-09-15 15:34:17,835][00283] Heartbeat connected on RolloutWorker_w3
-[2024-09-15 15:34:17,838][00283] Heartbeat connected on RolloutWorker_w4
-[2024-09-15 15:34:17,841][00283] Heartbeat connected on RolloutWorker_w5
-[2024-09-15 15:34:17,847][00283] Heartbeat connected on RolloutWorker_w6
-[2024-09-15 15:34:17,849][00283] Heartbeat connected on RolloutWorker_w7
-[2024-09-15 15:34:19,189][00920] Updated weights for policy 0, policy_version 40 (0.0013)
-[2024-09-15 15:34:21,477][00920] Updated weights for policy 0, policy_version 50 (0.0013)
-[2024-09-15 15:34:22,575][00283] Fps is (10 sec: 18022.3, 60 sec: 14745.4, 300 sec: 14745.4). Total num frames: 221184. Throughput: 0: 3365.6. Samples: 50516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-15 15:34:22,578][00283] Avg episode reward: [(0, '4.429')]
-[2024-09-15 15:34:23,765][00920] Updated weights for policy 0, policy_version 60 (0.0013)
-[2024-09-15 15:34:26,039][00920] Updated weights for policy 0, policy_version 70 (0.0012)
-[2024-09-15 15:34:27,575][00283] Fps is (10 sec: 18022.4, 60 sec: 15564.8, 300 sec: 15564.8). Total num frames: 311296. Throughput: 0: 3863.1. Samples: 77294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-15 15:34:27,577][00283] Avg episode reward: [(0, '4.266')]
-[2024-09-15 15:34:28,301][00920] Updated weights for policy 0, policy_version 80 (0.0012)
-[2024-09-15 15:34:30,530][00920] Updated weights for policy 0, policy_version 90 (0.0013)
-[2024-09-15 15:34:32,575][00283] Fps is (10 sec: 18022.4, 60 sec: 16056.2, 300 sec: 16056.2). Total num frames: 401408. Throughput: 0: 3637.3. Samples: 90966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-15 15:34:32,578][00283] Avg episode reward: [(0, '4.535')]
-[2024-09-15 15:34:32,822][00920] Updated weights for policy 0, policy_version 100 (0.0012)
-[2024-09-15 15:34:35,134][00920] Updated weights for policy 0, policy_version 110 (0.0012)
-[2024-09-15 15:34:37,510][00920] Updated weights for policy 0, policy_version 120 (0.0012)
-[2024-09-15 15:34:37,575][00283] Fps is (10 sec: 18022.4, 60 sec: 16384.0, 300 sec: 16384.0). Total num frames: 491520. Throughput: 0: 3920.3. Samples: 117642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-15 15:34:37,577][00283] Avg episode reward: [(0, '4.584')]
-[2024-09-15 15:34:37,579][00905] Saving new best policy, reward=4.584!
-[2024-09-15 15:34:39,800][00920] Updated weights for policy 0, policy_version 130 (0.0013)
-[2024-09-15 15:34:42,005][00920] Updated weights for policy 0, policy_version 140 (0.0012)
-[2024-09-15 15:34:42,575][00283] Fps is (10 sec: 18022.6, 60 sec: 16618.0, 300 sec: 16618.0). Total num frames: 581632. Throughput: 0: 4126.6. Samples: 144462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:34:42,578][00283] Avg episode reward: [(0, '4.717')]
-[2024-09-15 15:34:42,585][00905] Saving new best policy, reward=4.717!
-[2024-09-15 15:34:44,311][00920] Updated weights for policy 0, policy_version 150 (0.0012)
-[2024-09-15 15:34:46,504][00920] Updated weights for policy 0, policy_version 160 (0.0012)
-[2024-09-15 15:34:47,575][00283] Fps is (10 sec: 18022.3, 60 sec: 16793.6, 300 sec: 16793.6). Total num frames: 671744. Throughput: 0: 3951.4. Samples: 158088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2024-09-15 15:34:47,577][00283] Avg episode reward: [(0, '4.565')]
-[2024-09-15 15:34:48,821][00920] Updated weights for policy 0, policy_version 170 (0.0012)
-[2024-09-15 15:34:51,101][00920] Updated weights for policy 0, policy_version 180 (0.0013)
-[2024-09-15 15:34:52,575][00283] Fps is (10 sec: 18022.4, 60 sec: 16930.1, 300 sec: 16930.1). Total num frames: 761856. Throughput: 0: 4111.9. Samples: 185068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:34:52,577][00283] Avg episode reward: [(0, '4.478')]
-[2024-09-15 15:34:53,382][00920] Updated weights for policy 0, policy_version 190 (0.0013)
-[2024-09-15 15:34:55,645][00920] Updated weights for policy 0, policy_version 200 (0.0012)
-[2024-09-15 15:34:57,575][00283] Fps is (10 sec: 18022.5, 60 sec: 17039.4, 300 sec: 17039.4). Total num frames: 851968. Throughput: 0: 4500.1. Samples: 212236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:34:57,578][00283] Avg episode reward: [(0, '4.743')]
-[2024-09-15 15:34:57,581][00905] Saving new best policy, reward=4.743!
-[2024-09-15 15:34:57,907][00920] Updated weights for policy 0, policy_version 210 (0.0012)
-[2024-09-15 15:35:00,155][00920] Updated weights for policy 0, policy_version 220 (0.0012)
-[2024-09-15 15:35:02,394][00920] Updated weights for policy 0, policy_version 230 (0.0012)
-[2024-09-15 15:35:02,575][00283] Fps is (10 sec: 18022.3, 60 sec: 17128.7, 300 sec: 17128.7). Total num frames: 942080. Throughput: 0: 4501.2. Samples: 225796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:35:02,578][00283] Avg episode reward: [(0, '4.534')]
-[2024-09-15 15:35:04,706][00920] Updated weights for policy 0, policy_version 240 (0.0013)
-[2024-09-15 15:35:07,015][00920] Updated weights for policy 0, policy_version 250 (0.0013)
-[2024-09-15 15:35:07,575][00283] Fps is (10 sec: 18022.3, 60 sec: 17203.2, 300 sec: 17203.2). Total num frames: 1032192. Throughput: 0: 4493.0. Samples: 252700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:35:07,578][00283] Avg episode reward: [(0, '4.654')]
-[2024-09-15 15:35:09,271][00920] Updated weights for policy 0, policy_version 260 (0.0012)
-[2024-09-15 15:35:11,563][00920] Updated weights for policy 0, policy_version 270 (0.0013)
-[2024-09-15 15:35:12,575][00283] Fps is (10 sec: 18022.6, 60 sec: 18022.4, 300 sec: 17266.2). Total num frames: 1122304. Throughput: 0: 4500.6. Samples: 279820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:35:12,577][00283] Avg episode reward: [(0, '4.668')]
-[2024-09-15 15:35:13,812][00920] Updated weights for policy 0, policy_version 280 (0.0012)
-[2024-09-15 15:35:16,111][00920] Updated weights for policy 0, policy_version 290 (0.0013)
-[2024-09-15 15:35:17,575][00283] Fps is (10 sec: 18022.4, 60 sec: 18022.4, 300 sec: 17320.2). Total num frames: 1212416. Throughput: 0: 4496.8. Samples: 293320. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-09-15 15:35:17,578][00283] Avg episode reward: [(0, '4.511')]
-[2024-09-15 15:35:18,423][00920] Updated weights for policy 0, policy_version 300 (0.0012)
-[2024-09-15 15:35:20,740][00920] Updated weights for policy 0, policy_version 310 (0.0012)
-[2024-09-15 15:35:22,575][00283] Fps is (10 sec: 18022.3, 60 sec: 18022.4, 300 sec: 17367.0). Total num frames: 1302528. Throughput: 0: 4497.1. Samples: 320010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:35:22,578][00283] Avg episode reward: [(0, '4.327')]
-[2024-09-15 15:35:22,966][00920] Updated weights for policy 0, policy_version 320 (0.0012)
-[2024-09-15 15:35:25,248][00920] Updated weights for policy 0, policy_version 330 (0.0012)
-[2024-09-15 15:35:27,497][00920] Updated weights for policy 0, policy_version 340 (0.0012)
-[2024-09-15 15:35:27,575][00283] Fps is (10 sec: 18022.4, 60 sec: 18022.4, 300 sec: 17408.0). Total num frames: 1392640. Throughput: 0: 4505.7. Samples: 347218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-15 15:35:27,577][00283] Avg episode reward: [(0, '4.523')]
-[2024-09-15 15:35:29,759][00920] Updated weights for policy 0, policy_version 350 (0.0012)
-[2024-09-15 15:35:32,112][00920] Updated weights for policy 0, policy_version 360 (0.0013)
-[2024-09-15 15:35:32,575][00283] Fps is (10 sec: 18022.4, 60 sec: 18022.4, 300 sec: 17444.1). Total num frames: 1482752. Throughput: 0: 4504.8. Samples: 360804. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-09-15 15:35:32,577][00283] Avg episode reward: [(0, '4.304')]
-[2024-09-15 15:35:34,357][00920] Updated weights for policy 0, policy_version 370 (0.0013)
-[2024-09-15 15:35:36,702][00920] Updated weights for policy 0, policy_version 380 (0.0013)
-[2024-09-15 15:35:37,575][00283] Fps is (10 sec: 17613.0, 60 sec: 17954.1, 300 sec: 17430.8). Total num frames: 1568768. Throughput: 0: 4493.7. Samples: 387284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:35:37,577][00283] Avg episode reward: [(0, '4.624')]
-[2024-09-15 15:35:38,950][00920] Updated weights for policy 0, policy_version 390 (0.0013)
-[2024-09-15 15:35:41,209][00920] Updated weights for policy 0, policy_version 400 (0.0012)
-[2024-09-15 15:35:42,575][00283] Fps is (10 sec: 18022.6, 60 sec: 18022.4, 300 sec: 17505.0). Total num frames: 1662976. Throughput: 0: 4493.3. Samples: 414436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-15 15:35:42,578][00283] Avg episode reward: [(0, '4.489')]
-[2024-09-15 15:35:43,487][00920] Updated weights for policy 0, policy_version 410 (0.0012)
-[2024-09-15 15:35:45,822][00920] Updated weights for policy 0, policy_version 420 (0.0013)
-[2024-09-15 15:35:47,575][00283] Fps is (10 sec: 18022.2, 60 sec: 17954.1, 300 sec: 17489.9). Total num frames: 1748992. Throughput: 0: 4487.1. Samples: 427714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-15 15:35:47,577][00283] Avg episode reward: [(0, '5.006')]
-[2024-09-15 15:35:47,580][00905] Saving new best policy, reward=5.006!
-[2024-09-15 15:35:48,125][00920] Updated weights for policy 0, policy_version 430 (0.0012)
-[2024-09-15 15:35:50,325][00920] Updated weights for policy 0, policy_version 440 (0.0012)
-[2024-09-15 15:35:52,575][00283] Fps is (10 sec: 17612.7, 60 sec: 17954.1, 300 sec: 17515.3). Total num frames: 1839104. Throughput: 0: 4491.3. Samples: 454810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:35:52,577][00283] Avg episode reward: [(0, '4.720')]
-[2024-09-15 15:35:52,585][00905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000449_1839104.pth...
-[2024-09-15 15:35:52,689][00920] Updated weights for policy 0, policy_version 450 (0.0012)
-[2024-09-15 15:35:54,893][00920] Updated weights for policy 0, policy_version 460 (0.0013)
-[2024-09-15 15:35:57,162][00920] Updated weights for policy 0, policy_version 470 (0.0012)
-[2024-09-15 15:35:57,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.1, 300 sec: 17538.3). Total num frames: 1929216. Throughput: 0: 4487.6. Samples: 481760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-15 15:35:57,577][00283] Avg episode reward: [(0, '4.500')]
-[2024-09-15 15:35:59,523][00920] Updated weights for policy 0, policy_version 480 (0.0012)
-[2024-09-15 15:36:01,794][00920] Updated weights for policy 0, policy_version 490 (0.0012)
-[2024-09-15 15:36:02,575][00283] Fps is (10 sec: 18022.3, 60 sec: 17954.2, 300 sec: 17559.4). Total num frames: 2019328. Throughput: 0: 4479.7. Samples: 494908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-15 15:36:02,578][00283] Avg episode reward: [(0, '4.667')]
-[2024-09-15 15:36:04,050][00920] Updated weights for policy 0, policy_version 500 (0.0013)
-[2024-09-15 15:36:06,281][00920] Updated weights for policy 0, policy_version 510 (0.0012)
-[2024-09-15 15:36:07,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.1, 300 sec: 17578.7). Total num frames: 2109440. Throughput: 0: 4493.8. Samples: 522230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:36:07,578][00283] Avg episode reward: [(0, '4.579')]
-[2024-09-15 15:36:08,561][00920] Updated weights for policy 0, policy_version 520 (0.0012)
-[2024-09-15 15:36:10,832][00920] Updated weights for policy 0, policy_version 530 (0.0012)
-[2024-09-15 15:36:12,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.1, 300 sec: 17596.4). Total num frames: 2199552. Throughput: 0: 4486.7. Samples: 549120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-15 15:36:12,577][00283] Avg episode reward: [(0, '4.761')]
-[2024-09-15 15:36:13,176][00920] Updated weights for policy 0, policy_version 540 (0.0013)
-[2024-09-15 15:36:15,427][00920] Updated weights for policy 0, policy_version 550 (0.0013)
-[2024-09-15 15:36:17,575][00283] Fps is (10 sec: 18022.6, 60 sec: 17954.2, 300 sec: 17612.8). Total num frames: 2289664. Throughput: 0: 4483.1. Samples: 562544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:36:17,578][00283] Avg episode reward: [(0, '4.830')]
-[2024-09-15 15:36:17,672][00920] Updated weights for policy 0, policy_version 560 (0.0012)
-[2024-09-15 15:36:19,912][00920] Updated weights for policy 0, policy_version 570 (0.0012)
-[2024-09-15 15:36:22,200][00920] Updated weights for policy 0, policy_version 580 (0.0012)
-[2024-09-15 15:36:22,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.1, 300 sec: 17628.0). Total num frames: 2379776. Throughput: 0: 4501.6. Samples: 589858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-15 15:36:22,577][00283] Avg episode reward: [(0, '5.021')]
-[2024-09-15 15:36:22,585][00905] Saving new best policy, reward=5.021!
-[2024-09-15 15:36:24,500][00920] Updated weights for policy 0, policy_version 590 (0.0012)
-[2024-09-15 15:36:26,819][00920] Updated weights for policy 0, policy_version 600 (0.0012)
-[2024-09-15 15:36:27,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.2, 300 sec: 17642.1). Total num frames: 2469888. Throughput: 0: 4489.4. Samples: 616458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:36:27,577][00283] Avg episode reward: [(0, '4.679')]
-[2024-09-15 15:36:29,114][00920] Updated weights for policy 0, policy_version 610 (0.0012)
-[2024-09-15 15:36:31,340][00920] Updated weights for policy 0, policy_version 620 (0.0012)
-[2024-09-15 15:36:32,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.1, 300 sec: 17655.2). Total num frames: 2560000. Throughput: 0: 4495.9. Samples: 630028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:36:32,577][00283] Avg episode reward: [(0, '4.972')]
-[2024-09-15 15:36:33,590][00920] Updated weights for policy 0, policy_version 630 (0.0013)
-[2024-09-15 15:36:35,849][00920] Updated weights for policy 0, policy_version 640 (0.0012)
-[2024-09-15 15:36:37,575][00283] Fps is (10 sec: 18022.3, 60 sec: 18022.4, 300 sec: 17667.4). Total num frames: 2650112. Throughput: 0: 4501.5. Samples: 657376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:36:37,578][00283] Avg episode reward: [(0, '4.711')]
-[2024-09-15 15:36:38,199][00920] Updated weights for policy 0, policy_version 650 (0.0013)
-[2024-09-15 15:36:40,549][00920] Updated weights for policy 0, policy_version 660 (0.0013)
-[2024-09-15 15:36:42,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.1, 300 sec: 17678.9). Total num frames: 2740224. Throughput: 0: 4489.2. Samples: 683774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:36:42,578][00283] Avg episode reward: [(0, '5.622')]
-[2024-09-15 15:36:42,586][00905] Saving new best policy, reward=5.622!
-[2024-09-15 15:36:42,762][00920] Updated weights for policy 0, policy_version 670 (0.0013)
-[2024-09-15 15:36:45,053][00920] Updated weights for policy 0, policy_version 680 (0.0012)
-[2024-09-15 15:36:47,268][00920] Updated weights for policy 0, policy_version 690 (0.0012)
-[2024-09-15 15:36:47,575][00283] Fps is (10 sec: 18022.4, 60 sec: 18022.4, 300 sec: 17689.6). Total num frames: 2830336. Throughput: 0: 4501.4. Samples: 697472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:36:47,577][00283] Avg episode reward: [(0, '5.687')]
-[2024-09-15 15:36:47,579][00905] Saving new best policy, reward=5.687!
-[2024-09-15 15:36:49,519][00920] Updated weights for policy 0, policy_version 700 (0.0013)
-[2024-09-15 15:36:51,813][00920] Updated weights for policy 0, policy_version 710 (0.0012)
-[2024-09-15 15:36:52,575][00283] Fps is (10 sec: 18022.4, 60 sec: 18022.4, 300 sec: 17699.7). Total num frames: 2920448. Throughput: 0: 4498.7. Samples: 724672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
-[2024-09-15 15:36:52,578][00283] Avg episode reward: [(0, '5.521')]
-[2024-09-15 15:36:54,144][00920] Updated weights for policy 0, policy_version 720 (0.0013)
-[2024-09-15 15:36:56,428][00920] Updated weights for policy 0, policy_version 730 (0.0013)
-[2024-09-15 15:36:57,575][00283] Fps is (10 sec: 18022.3, 60 sec: 18022.4, 300 sec: 17709.2). Total num frames: 3010560. Throughput: 0: 4497.1. Samples: 751490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-15 15:36:57,577][00283] Avg episode reward: [(0, '5.582')]
-[2024-09-15 15:36:58,639][00920] Updated weights for policy 0, policy_version 740 (0.0012)
-[2024-09-15 15:37:00,911][00920] Updated weights for policy 0, policy_version 750 (0.0012)
-[2024-09-15 15:37:02,575][00283] Fps is (10 sec: 18022.5, 60 sec: 18022.4, 300 sec: 17718.1). Total num frames: 3100672. Throughput: 0: 4504.9. Samples: 765266. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2024-09-15 15:37:02,577][00283] Avg episode reward: [(0, '4.894')]
-[2024-09-15 15:37:03,194][00920] Updated weights for policy 0, policy_version 760 (0.0012)
-[2024-09-15 15:37:05,466][00920] Updated weights for policy 0, policy_version 770 (0.0013)
-[2024-09-15 15:37:07,575][00283] Fps is (10 sec: 17612.9, 60 sec: 17954.1, 300 sec: 17703.8). Total num frames: 3186688. Throughput: 0: 4488.7. Samples: 791850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-15 15:37:07,578][00283] Avg episode reward: [(0, '5.259')]
-[2024-09-15 15:37:07,889][00920] Updated weights for policy 0, policy_version 780 (0.0013)
-[2024-09-15 15:37:10,133][00920] Updated weights for policy 0, policy_version 790 (0.0012)
-[2024-09-15 15:37:12,451][00920] Updated weights for policy 0, policy_version 800 (0.0012)
-[2024-09-15 15:37:12,575][00283] Fps is (10 sec: 17612.7, 60 sec: 17954.1, 300 sec: 17712.4). Total num frames: 3276800. Throughput: 0: 4490.4. Samples: 818526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:37:12,578][00283] Avg episode reward: [(0, '5.193')]
-[2024-09-15 15:37:14,683][00920] Updated weights for policy 0, policy_version 810 (0.0012)
-[2024-09-15 15:37:16,951][00920] Updated weights for policy 0, policy_version 820 (0.0012)
-[2024-09-15 15:37:17,575][00283] Fps is (10 sec: 18022.6, 60 sec: 17954.1, 300 sec: 17720.6). Total num frames: 3366912. Throughput: 0: 4489.6. Samples: 832060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
-[2024-09-15 15:37:17,577][00283] Avg episode reward: [(0, '5.314')]
-[2024-09-15 15:37:19,229][00920] Updated weights for policy 0, policy_version 830 (0.0012)
-[2024-09-15 15:37:21,559][00920] Updated weights for policy 0, policy_version 840 (0.0012)
-[2024-09-15 15:37:22,575][00283] Fps is (10 sec: 18022.5, 60 sec: 17954.2, 300 sec: 17728.3). Total num frames: 3457024. Throughput: 0: 4479.1. Samples: 858934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:37:22,578][00283] Avg episode reward: [(0, '5.790')]
-[2024-09-15 15:37:22,586][00905] Saving new best policy, reward=5.790!
-[2024-09-15 15:37:23,840][00920] Updated weights for policy 0, policy_version 850 (0.0012)
-[2024-09-15 15:37:26,039][00920] Updated weights for policy 0, policy_version 860 (0.0012)
-[2024-09-15 15:37:27,575][00283] Fps is (10 sec: 18022.3, 60 sec: 17954.1, 300 sec: 17735.7). Total num frames: 3547136. Throughput: 0: 4498.9. Samples: 886226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
-[2024-09-15 15:37:27,577][00283] Avg episode reward: [(0, '6.104')]
-[2024-09-15 15:37:27,579][00905] Saving new best policy, reward=6.104!
-[2024-09-15 15:37:28,319][00920] Updated weights for policy 0, policy_version 870 (0.0012)
-[2024-09-15 15:37:30,542][00920] Updated weights for policy 0, policy_version 880 (0.0012)
-[2024-09-15 15:37:32,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.2, 300 sec: 17742.7). Total num frames: 3637248. Throughput: 0: 4497.3. Samples: 899848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-15 15:37:32,577][00283] Avg episode reward: [(0, '6.669')]
-[2024-09-15 15:37:32,586][00905] Saving new best policy, reward=6.669!
-[2024-09-15 15:37:32,864][00920] Updated weights for policy 0, policy_version 890 (0.0012)
-[2024-09-15 15:37:35,199][00920] Updated weights for policy 0, policy_version 900 (0.0013)
-[2024-09-15 15:37:37,455][00920] Updated weights for policy 0, policy_version 910 (0.0012)
-[2024-09-15 15:37:37,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.2, 300 sec: 17749.3). Total num frames: 3727360. Throughput: 0: 4485.0. Samples: 926496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-15 15:37:37,577][00283] Avg episode reward: [(0, '6.612')]
-[2024-09-15 15:37:39,757][00920] Updated weights for policy 0, policy_version 920 (0.0012)
-[2024-09-15 15:37:42,007][00920] Updated weights for policy 0, policy_version 930 (0.0012)
-[2024-09-15 15:37:42,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.2, 300 sec: 17755.7). Total num frames: 3817472. Throughput: 0: 4491.7. Samples: 953616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
-[2024-09-15 15:37:42,577][00283] Avg episode reward: [(0, '7.084')]
-[2024-09-15 15:37:42,585][00905] Saving new best policy, reward=7.084!
-[2024-09-15 15:37:44,276][00920] Updated weights for policy 0, policy_version 940 (0.0012)
-[2024-09-15 15:37:46,586][00920] Updated weights for policy 0, policy_version 950 (0.0013)
-[2024-09-15 15:37:47,575][00283] Fps is (10 sec: 18022.4, 60 sec: 17954.2, 300 sec: 17761.7). Total num frames: 3907584. Throughput: 0: 4487.0. Samples: 967180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
-[2024-09-15 15:37:47,577][00283] Avg episode reward: [(0, '6.702')]
-[2024-09-15 15:37:48,893][00920] Updated weights for policy 0, policy_version 960 (0.0013)
-[2024-09-15 15:37:51,120][00920] Updated weights for policy 0, policy_version 970 (0.0012)
-[2024-09-15 15:37:52,575][00283] Fps is (10 sec: 18022.1, 60 sec: 17954.1, 300 sec: 17767.5). Total num frames: 3997696. Throughput: 0: 4490.3. Samples: 993912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
-[2024-09-15 15:37:52,578][00283] Avg episode reward: [(0, '6.957')]
-[2024-09-15 15:37:52,586][00905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000976_3997696.pth...
-[2024-09-15 15:37:52,959][00905] Stopping Batcher_0...
-[2024-09-15 15:37:52,960][00905] Loop batcher_evt_loop terminating...
-[2024-09-15 15:37:52,960][00905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-09-15 15:37:52,959][00283] Component Batcher_0 stopped!
-[2024-09-15 15:37:52,961][00283] Component RolloutWorker_w0 process died already! Don't wait for it.
-[2024-09-15 15:37:52,978][00920] Weights refcount: 2 0
-[2024-09-15 15:37:52,979][00920] Stopping InferenceWorker_p0-w0...
-[2024-09-15 15:37:52,980][00920] Loop inference_proc0-0_evt_loop terminating...
-[2024-09-15 15:37:52,980][00283] Component InferenceWorker_p0-w0 stopped!
-[2024-09-15 15:37:53,010][00927] Stopping RolloutWorker_w6...
-[2024-09-15 15:37:53,010][00927] Loop rollout_proc6_evt_loop terminating...
-[2024-09-15 15:37:53,011][00923] Stopping RolloutWorker_w4...
-[2024-09-15 15:37:53,012][00923] Loop rollout_proc4_evt_loop terminating...
-[2024-09-15 15:37:53,010][00283] Component RolloutWorker_w6 stopped!
-[2024-09-15 15:37:53,013][00926] Stopping RolloutWorker_w7...
-[2024-09-15 15:37:53,013][00922] Stopping RolloutWorker_w2...
-[2024-09-15 15:37:53,013][00922] Loop rollout_proc2_evt_loop terminating...
-[2024-09-15 15:37:53,013][00926] Loop rollout_proc7_evt_loop terminating...
-[2024-09-15 15:37:53,012][00283] Component RolloutWorker_w4 stopped!
-[2024-09-15 15:37:53,015][00921] Stopping RolloutWorker_w1...
-[2024-09-15 15:37:53,016][00921] Loop rollout_proc1_evt_loop terminating...
-[2024-09-15 15:37:53,016][00925] Stopping RolloutWorker_w5...
-[2024-09-15 15:37:53,015][00283] Component RolloutWorker_w7 stopped!
-[2024-09-15 15:37:53,017][00925] Loop rollout_proc5_evt_loop terminating...
-[2024-09-15 15:37:53,017][00924] Stopping RolloutWorker_w3...
-[2024-09-15 15:37:53,017][00283] Component RolloutWorker_w2 stopped!
-[2024-09-15 15:37:53,018][00924] Loop rollout_proc3_evt_loop terminating...
-[2024-09-15 15:37:53,018][00283] Component RolloutWorker_w1 stopped!
-[2024-09-15 15:37:53,019][00283] Component RolloutWorker_w5 stopped!
-[2024-09-15 15:37:53,020][00283] Component RolloutWorker_w3 stopped!
-[2024-09-15 15:37:53,029][00905] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000449_1839104.pth
-[2024-09-15 15:37:53,035][00905] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2024-09-15 15:37:53,123][00905] Stopping LearnerWorker_p0...
-[2024-09-15 15:37:53,123][00905] Loop learner_proc0_evt_loop terminating...
-[2024-09-15 15:37:53,123][00283] Component LearnerWorker_p0 stopped!
-[2024-09-15 15:37:53,126][00283] Waiting for process learner_proc0 to stop...
-[2024-09-15 15:37:53,958][00283] Waiting for process inference_proc0-0 to join...
-[2024-09-15 15:37:53,961][00283] Waiting for process rollout_proc0 to join...
-[2024-09-15 15:37:53,962][00283] Waiting for process rollout_proc1 to join...
-[2024-09-15 15:37:53,964][00283] Waiting for process rollout_proc2 to join...
-[2024-09-15 15:37:53,966][00283] Waiting for process rollout_proc3 to join...
-[2024-09-15 15:37:53,968][00283] Waiting for process rollout_proc4 to join...
-[2024-09-15 15:37:53,970][00283] Waiting for process rollout_proc5 to join...
-[2024-09-15 15:37:53,972][00283] Waiting for process rollout_proc6 to join...
-[2024-09-15 15:37:53,973][00283] Waiting for process rollout_proc7 to join...
-[2024-09-15 15:37:53,975][00283] Batcher 0 profile tree view:
-batching: 13.5559, releasing_batches: 0.0228
-[2024-09-15 15:37:53,976][00283] InferenceWorker_p0-w0 profile tree view:
+[2024-09-22 05:59:49,362][06911] Worker 5 uses CPU cores [5]
+[2024-09-22 05:59:49,667][06893] Using optimizer
+[2024-09-22 05:59:50,419][06893] No checkpoints found
+[2024-09-22 05:59:50,419][06893] Did not load from checkpoint, starting from scratch!
+[2024-09-22 05:59:50,419][06893] Initialized policy 0 weights for model version 0
+[2024-09-22 05:59:50,423][06893] LearnerWorker_p0 finished initialization!
+[2024-09-22 05:59:50,424][06893] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 05:59:50,602][06906] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-22 05:59:50,603][06906] RunningMeanStd input shape: (1,)
+[2024-09-22 05:59:50,616][06906] ConvEncoder: input_channels=3
+[2024-09-22 05:59:50,731][06906] Conv encoder output size: 512
+[2024-09-22 05:59:50,731][06906] Policy head output size: 512
+[2024-09-22 05:59:50,788][04746] Inference worker 0-0 is ready!
+[2024-09-22 05:59:50,789][04746] All inference workers are ready! Signal rollout workers to start!
+[2024-09-22 05:59:50,846][06911] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 05:59:50,847][06909] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 05:59:50,846][06908] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 05:59:50,848][06910] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 05:59:50,848][06918] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 05:59:50,847][06913] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 05:59:50,848][06907] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 05:59:50,848][06912] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 05:59:51,197][06910] Decorrelating experience for 0 frames...
+[2024-09-22 05:59:51,197][06911] Decorrelating experience for 0 frames...
+[2024-09-22 05:59:51,197][06909] Decorrelating experience for 0 frames...
+[2024-09-22 05:59:51,324][06913] Decorrelating experience for 0 frames...
+[2024-09-22 05:59:51,326][06907] Decorrelating experience for 0 frames...
+[2024-09-22 05:59:51,457][06910] Decorrelating experience for 32 frames...
+[2024-09-22 05:59:51,462][06908] Decorrelating experience for 0 frames...
+[2024-09-22 05:59:51,639][06907] Decorrelating experience for 32 frames...
+[2024-09-22 05:59:51,755][06909] Decorrelating experience for 32 frames...
+[2024-09-22 05:59:51,781][06912] Decorrelating experience for 0 frames...
+[2024-09-22 05:59:51,819][06910] Decorrelating experience for 64 frames...
+[2024-09-22 05:59:51,842][06913] Decorrelating experience for 32 frames...
+[2024-09-22 05:59:51,852][06908] Decorrelating experience for 32 frames...
+[2024-09-22 05:59:51,915][06911] Decorrelating experience for 32 frames...
+[2024-09-22 05:59:52,106][06918] Decorrelating experience for 0 frames...
+[2024-09-22 05:59:52,182][06907] Decorrelating experience for 64 frames...
+[2024-09-22 05:59:52,251][06912] Decorrelating experience for 32 frames...
+[2024-09-22 05:59:52,259][06908] Decorrelating experience for 64 frames...
+[2024-09-22 05:59:52,265][06909] Decorrelating experience for 64 frames...
+[2024-09-22 05:59:52,361][06918] Decorrelating experience for 32 frames...
+[2024-09-22 05:59:52,477][06913] Decorrelating experience for 64 frames...
+[2024-09-22 05:59:52,547][06908] Decorrelating experience for 96 frames...
+[2024-09-22 05:59:52,636][06907] Decorrelating experience for 96 frames...
+[2024-09-22 05:59:52,736][06909] Decorrelating experience for 96 frames...
+[2024-09-22 05:59:52,786][06912] Decorrelating experience for 64 frames...
+[2024-09-22 05:59:52,797][06910] Decorrelating experience for 96 frames...
+[2024-09-22 05:59:52,998][06918] Decorrelating experience for 64 frames...
+[2024-09-22 05:59:53,031][04746] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-22 05:59:53,072][06911] Decorrelating experience for 64 frames...
+[2024-09-22 05:59:53,097][06912] Decorrelating experience for 96 frames...
+[2024-09-22 05:59:53,296][06913] Decorrelating experience for 96 frames...
+[2024-09-22 05:59:53,393][06918] Decorrelating experience for 96 frames...
+[2024-09-22 05:59:53,421][06911] Decorrelating experience for 96 frames...
+[2024-09-22 05:59:55,184][06893] Signal inference workers to stop experience collection...
+[2024-09-22 05:59:55,192][06906] InferenceWorker_p0-w0: stopping experience collection
+[2024-09-22 05:59:58,031][04746] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 460.4. Samples: 2302. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-22 05:59:58,034][04746] Avg episode reward: [(0, '1.889')]
+[2024-09-22 05:59:58,912][06893] Signal inference workers to resume experience collection...
+[2024-09-22 05:59:58,913][06906] InferenceWorker_p0-w0: resuming experience collection
+[2024-09-22 06:00:01,784][06906] Updated weights for policy 0, policy_version 10 (0.0163)
+[2024-09-22 06:00:03,033][04746] Fps is (10 sec: 5324.0, 60 sec: 5324.0, 300 sec: 5324.0). Total num frames: 53248. Throughput: 0: 1365.4. Samples: 13656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-22 06:00:03,035][04746] Avg episode reward: [(0, '4.332')]
+[2024-09-22 06:00:03,705][04746] Heartbeat connected on Batcher_0
+[2024-09-22 06:00:03,709][04746] Heartbeat connected on LearnerWorker_p0
+[2024-09-22 06:00:03,721][04746] Heartbeat connected on InferenceWorker_p0-w0
+[2024-09-22 06:00:03,726][04746] Heartbeat connected on RolloutWorker_w0
+[2024-09-22 06:00:03,729][04746] Heartbeat connected on RolloutWorker_w1
+[2024-09-22 06:00:03,733][04746] Heartbeat connected on RolloutWorker_w2
+[2024-09-22 06:00:03,737][04746] Heartbeat connected on RolloutWorker_w3
+[2024-09-22 06:00:03,738][04746] Heartbeat connected on RolloutWorker_w4
+[2024-09-22 06:00:03,748][04746] Heartbeat connected on RolloutWorker_w6
+[2024-09-22 06:00:03,756][04746] Heartbeat connected on RolloutWorker_w5
+[2024-09-22 06:00:03,758][04746] Heartbeat connected on RolloutWorker_w7
+[2024-09-22 06:00:05,230][06906] Updated weights for policy 0, policy_version 20 (0.0017)
+[2024-09-22 06:00:08,031][04746] Fps is (10 sec: 11468.9, 60 sec: 7645.9, 300 sec: 7645.9). Total num frames: 114688. Throughput: 0: 1535.2. Samples: 23028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:00:08,034][04746] Avg episode reward: [(0, '4.432')]
+[2024-09-22 06:00:08,084][06893] Saving new best policy, reward=4.432!
+[2024-09-22 06:00:08,391][06906] Updated weights for policy 0, policy_version 30 (0.0014)
+[2024-09-22 06:00:11,361][06906] Updated weights for policy 0, policy_version 40 (0.0015)
+[2024-09-22 06:00:13,031][04746] Fps is (10 sec: 13109.1, 60 sec: 9216.0, 300 sec: 9216.0). Total num frames: 184320. Throughput: 0: 2153.9. Samples: 43078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:00:13,033][04746] Avg episode reward: [(0, '4.490')]
+[2024-09-22 06:00:13,036][06893] Saving new best policy, reward=4.490!
+[2024-09-22 06:00:14,385][06906] Updated weights for policy 0, policy_version 50 (0.0016)
+[2024-09-22 06:00:17,617][06906] Updated weights for policy 0, policy_version 60 (0.0014)
+[2024-09-22 06:00:18,034][04746] Fps is (10 sec: 13513.4, 60 sec: 9993.3, 300 sec: 9993.3). Total num frames: 249856. Throughput: 0: 2506.8. Samples: 62676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-22 06:00:18,036][04746] Avg episode reward: [(0, '4.243')]
+[2024-09-22 06:00:20,589][06906] Updated weights for policy 0, policy_version 70 (0.0014)
+[2024-09-22 06:00:23,031][04746] Fps is (10 sec: 13516.8, 60 sec: 10649.6, 300 sec: 10649.6). Total num frames: 319488. Throughput: 0: 2431.7. Samples: 72952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:00:23,034][04746] Avg episode reward: [(0, '4.546')]
+[2024-09-22 06:00:23,037][06893] Saving new best policy, reward=4.546!
+[2024-09-22 06:00:23,601][06906] Updated weights for policy 0, policy_version 80 (0.0014)
+[2024-09-22 06:00:26,497][06906] Updated weights for policy 0, policy_version 90 (0.0015)
+[2024-09-22 06:00:28,031][04746] Fps is (10 sec: 13929.9, 60 sec: 11117.7, 300 sec: 11117.7). Total num frames: 389120. Throughput: 0: 2685.5. Samples: 93994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:00:28,034][04746] Avg episode reward: [(0, '4.462')]
+[2024-09-22 06:00:29,428][06906] Updated weights for policy 0, policy_version 100 (0.0016)
+[2024-09-22 06:00:32,699][06906] Updated weights for policy 0, policy_version 110 (0.0015)
+[2024-09-22 06:00:33,031][04746] Fps is (10 sec: 13516.7, 60 sec: 11366.4, 300 sec: 11366.4). Total num frames: 454656. Throughput: 0: 2842.0. Samples: 113682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:00:33,033][04746] Avg episode reward: [(0, '4.527')]
+[2024-09-22 06:00:35,664][06906] Updated weights for policy 0, policy_version 120 (0.0017)
+[2024-09-22 06:00:38,031][04746] Fps is (10 sec: 13107.0, 60 sec: 11559.8, 300 sec: 11559.8). Total num frames: 520192. Throughput: 0: 2755.9. Samples: 124018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:00:38,033][04746] Avg episode reward: [(0, '4.583')]
+[2024-09-22 06:00:38,060][06893] Saving new best policy, reward=4.583!
+[2024-09-22 06:00:38,640][06906] Updated weights for policy 0, policy_version 130 (0.0014)
+[2024-09-22 06:00:41,589][06906] Updated weights for policy 0, policy_version 140 (0.0016)
+[2024-09-22 06:00:43,031][04746] Fps is (10 sec: 13517.0, 60 sec: 11796.5, 300 sec: 11796.5). Total num frames: 589824. Throughput: 0: 3167.9. Samples: 144858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:00:43,034][04746] Avg episode reward: [(0, '4.505')]
+[2024-09-22 06:00:44,648][06906] Updated weights for policy 0, policy_version 150 (0.0015)
+[2024-09-22 06:00:47,953][06906] Updated weights for policy 0, policy_version 160 (0.0016)
+[2024-09-22 06:00:48,031][04746] Fps is (10 sec: 13516.7, 60 sec: 11915.6, 300 sec: 11915.6). Total num frames: 655360. Throughput: 0: 3341.2. Samples: 164004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:00:48,034][04746] Avg episode reward: [(0, '4.547')]
+[2024-09-22 06:00:50,901][06906] Updated weights for policy 0, policy_version 170 (0.0015)
+[2024-09-22 06:00:53,031][04746] Fps is (10 sec: 13516.8, 60 sec: 12083.2, 300 sec: 12083.2). Total num frames: 724992. Throughput: 0: 3367.3. Samples: 174558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:00:53,034][04746] Avg episode reward: [(0, '4.534')]
+[2024-09-22 06:00:53,858][06906] Updated weights for policy 0, policy_version 180 (0.0018)
+[2024-09-22 06:00:56,747][06906] Updated weights for policy 0, policy_version 190 (0.0019)
+[2024-09-22 06:00:58,031][04746] Fps is (10 sec: 13926.4, 60 sec: 13243.7, 300 sec: 12225.0). Total num frames: 794624. Throughput: 0: 3385.9. Samples: 195444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:00:58,033][04746] Avg episode reward: [(0, '4.582')]
+[2024-09-22 06:00:59,870][06906] Updated weights for policy 0, policy_version 200 (0.0017)
+[2024-09-22 06:01:03,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13380.6, 300 sec: 12229.5). Total num frames: 856064. Throughput: 0: 3377.6. Samples: 214660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-22 06:01:03,033][04746] Avg episode reward: [(0, '4.789')]
+[2024-09-22 06:01:03,056][06893] Saving new best policy, reward=4.789!
+[2024-09-22 06:01:03,063][06906] Updated weights for policy 0, policy_version 210 (0.0017)
+[2024-09-22 06:01:06,026][06906] Updated weights for policy 0, policy_version 220 (0.0015)
+[2024-09-22 06:01:08,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13516.8, 300 sec: 12342.6). Total num frames: 925696. Throughput: 0: 3381.0. Samples: 225098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:01:08,034][04746] Avg episode reward: [(0, '4.693')]
+[2024-09-22 06:01:09,098][06906] Updated weights for policy 0, policy_version 230 (0.0015)
+[2024-09-22 06:01:12,029][06906] Updated weights for policy 0, policy_version 240 (0.0017)
+[2024-09-22 06:01:13,031][04746] Fps is (10 sec: 13926.3, 60 sec: 13516.8, 300 sec: 12441.6). Total num frames: 995328. Throughput: 0: 3367.8. Samples: 245544. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-09-22 06:01:13,035][04746] Avg episode reward: [(0, '4.842')]
+[2024-09-22 06:01:13,039][06893] Saving new best policy, reward=4.842!
+[2024-09-22 06:01:15,348][06906] Updated weights for policy 0, policy_version 250 (0.0014)
+[2024-09-22 06:01:18,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13449.1, 300 sec: 12432.6). Total num frames: 1056768. Throughput: 0: 3350.1. Samples: 264436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:01:18,033][04746] Avg episode reward: [(0, '5.062')]
+[2024-09-22 06:01:18,043][06893] Saving new best policy, reward=5.062!
+[2024-09-22 06:01:18,583][06906] Updated weights for policy 0, policy_version 260 (0.0015)
+[2024-09-22 06:01:21,557][06906] Updated weights for policy 0, policy_version 270 (0.0016)
+[2024-09-22 06:01:23,031][04746] Fps is (10 sec: 13107.4, 60 sec: 13448.6, 300 sec: 12515.6). Total num frames: 1126400. Throughput: 0: 3346.1. Samples: 274590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:01:23,034][04746] Avg episode reward: [(0, '5.321')]
+[2024-09-22 06:01:23,036][06893] Saving new best policy, reward=5.321!
+[2024-09-22 06:01:24,489][06906] Updated weights for policy 0, policy_version 280 (0.0018)
+[2024-09-22 06:01:27,668][06906] Updated weights for policy 0, policy_version 290 (0.0014)
+[2024-09-22 06:01:28,031][04746] Fps is (10 sec: 13516.9, 60 sec: 13380.2, 300 sec: 12546.7). Total num frames: 1191936. Throughput: 0: 3335.5. Samples: 294956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:01:28,033][04746] Avg episode reward: [(0, '5.955')]
+[2024-09-22 06:01:28,042][06893] Saving new best policy, reward=5.955!
+[2024-09-22 06:01:30,955][06906] Updated weights for policy 0, policy_version 300 (0.0017)
+[2024-09-22 06:01:33,031][04746] Fps is (10 sec: 12697.5, 60 sec: 13312.0, 300 sec: 12533.8). Total num frames: 1253376. Throughput: 0: 3336.7. Samples: 314156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:01:33,033][04746] Avg episode reward: [(0, '6.551')]
+[2024-09-22 06:01:33,055][06893] Saving new best policy, reward=6.551!
+[2024-09-22 06:01:33,984][06906] Updated weights for policy 0, policy_version 310 (0.0016)
+[2024-09-22 06:01:36,885][06906] Updated weights for policy 0, policy_version 320 (0.0014)
+[2024-09-22 06:01:38,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13380.3, 300 sec: 12600.1). Total num frames: 1323008. Throughput: 0: 3331.8. Samples: 324488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-22 06:01:38,035][04746] Avg episode reward: [(0, '7.361')]
+[2024-09-22 06:01:38,093][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000324_1327104.pth...
+[2024-09-22 06:01:38,180][06893] Saving new best policy, reward=7.361!
+[2024-09-22 06:01:39,935][06906] Updated weights for policy 0, policy_version 330 (0.0016)
+[2024-09-22 06:01:43,031][04746] Fps is (10 sec: 13516.6, 60 sec: 13312.0, 300 sec: 12623.1). Total num frames: 1388544. Throughput: 0: 3315.4. Samples: 344638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:01:43,033][04746] Avg episode reward: [(0, '6.732')]
+[2024-09-22 06:01:43,143][06906] Updated weights for policy 0, policy_version 340 (0.0019)
+[2024-09-22 06:01:46,376][06906] Updated weights for policy 0, policy_version 350 (0.0014)
+[2024-09-22 06:01:48,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13312.0, 300 sec: 12644.2). Total num frames: 1454080. Throughput: 0: 3315.5. Samples: 363858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-22 06:01:48,035][04746] Avg episode reward: [(0, '7.363')]
+[2024-09-22 06:01:48,043][06893] Saving new best policy, reward=7.363!
+[2024-09-22 06:01:49,345][06906] Updated weights for policy 0, policy_version 360 (0.0016)
+[2024-09-22 06:01:52,347][06906] Updated weights for policy 0, policy_version 370 (0.0018)
+[2024-09-22 06:01:53,031][04746] Fps is (10 sec: 13516.9, 60 sec: 13312.0, 300 sec: 12697.6). Total num frames: 1523712. Throughput: 0: 3316.4. Samples: 374334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:01:53,034][04746] Avg episode reward: [(0, '8.228')]
+[2024-09-22 06:01:53,038][06893] Saving new best policy, reward=8.228!
+[2024-09-22 06:01:55,338][06906] Updated weights for policy 0, policy_version 380 (0.0015)
+[2024-09-22 06:01:58,031][04746] Fps is (10 sec: 13516.7, 60 sec: 13243.7, 300 sec: 12714.0). Total num frames: 1589248. Throughput: 0: 3304.1. Samples: 394228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-22 06:01:58,035][04746] Avg episode reward: [(0, '10.239')]
+[2024-09-22 06:01:58,041][06893] Saving new best policy, reward=10.239!
+[2024-09-22 06:01:58,675][06906] Updated weights for policy 0, policy_version 390 (0.0017)
+[2024-09-22 06:02:01,808][06906] Updated weights for policy 0, policy_version 400 (0.0015)
+[2024-09-22 06:02:03,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13312.0, 300 sec: 12729.1). Total num frames: 1654784. Throughput: 0: 3315.7. Samples: 413642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:02:03,034][04746] Avg episode reward: [(0, '10.943')]
+[2024-09-22 06:02:03,037][06893] Saving new best policy, reward=10.943!
+[2024-09-22 06:02:04,782][06906] Updated weights for policy 0, policy_version 410 (0.0014)
+[2024-09-22 06:02:07,832][06906] Updated weights for policy 0, policy_version 420 (0.0015)
+[2024-09-22 06:02:08,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13243.7, 300 sec: 12743.1). Total num frames: 1720320. Throughput: 0: 3319.8. Samples: 423980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:02:08,033][04746] Avg episode reward: [(0, '11.171')]
+[2024-09-22 06:02:08,041][06893] Saving new best policy, reward=11.171!
+[2024-09-22 06:02:10,963][06906] Updated weights for policy 0, policy_version 430 (0.0018)
+[2024-09-22 06:02:13,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13175.5, 300 sec: 12756.1). Total num frames: 1785856. Throughput: 0: 3302.7. Samples: 443576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:02:13,033][04746] Avg episode reward: [(0, '9.843')]
+[2024-09-22 06:02:14,230][06906] Updated weights for policy 0, policy_version 440 (0.0015)
+[2024-09-22 06:02:17,315][06906] Updated weights for policy 0, policy_version 450 (0.0015)
+[2024-09-22 06:02:18,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13243.7, 300 sec: 12768.2). Total num frames: 1851392. Throughput: 0: 3307.7. Samples: 463004. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-22 06:02:18,033][04746] Avg episode reward: [(0, '11.811')]
+[2024-09-22 06:02:18,040][06893] Saving new best policy, reward=11.811!
+[2024-09-22 06:02:20,387][06906] Updated weights for policy 0, policy_version 460 (0.0019)
+[2024-09-22 06:02:23,031][04746] Fps is (10 sec: 13516.7, 60 sec: 13243.7, 300 sec: 12806.8). Total num frames: 1921024. Throughput: 0: 3302.8. Samples: 473116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:02:23,033][04746] Avg episode reward: [(0, '15.738')]
+[2024-09-22 06:02:23,037][06893] Saving new best policy, reward=15.738!
+[2024-09-22 06:02:23,336][06906] Updated weights for policy 0, policy_version 470 (0.0016)
+[2024-09-22 06:02:26,473][06906] Updated weights for policy 0, policy_version 480 (0.0017)
+[2024-09-22 06:02:28,031][04746] Fps is (10 sec: 13107.1, 60 sec: 13175.5, 300 sec: 12790.1). Total num frames: 1982464. Throughput: 0: 3297.9. Samples: 493042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:02:28,033][04746] Avg episode reward: [(0, '17.847')]
+[2024-09-22 06:02:28,042][06893] Saving new best policy, reward=17.847!
+[2024-09-22 06:02:29,836][06906] Updated weights for policy 0, policy_version 490 (0.0016)
+[2024-09-22 06:02:32,843][06906] Updated weights for policy 0, policy_version 500 (0.0017)
+[2024-09-22 06:02:33,031][04746] Fps is (10 sec: 12697.7, 60 sec: 13243.7, 300 sec: 12800.0). Total num frames: 2048000. Throughput: 0: 3304.2. Samples: 512546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:02:33,033][04746] Avg episode reward: [(0, '18.660')]
+[2024-09-22 06:02:33,036][06893] Saving new best policy, reward=18.660!
+[2024-09-22 06:02:35,862][06906] Updated weights for policy 0, policy_version 510 (0.0014)
+[2024-09-22 06:02:38,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13243.7, 300 sec: 12834.1). Total num frames: 2117632. Throughput: 0: 3298.8. Samples: 522778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:02:38,034][04746] Avg episode reward: [(0, '17.260')]
+[2024-09-22 06:02:38,822][06906] Updated weights for policy 0, policy_version 520 (0.0015)
+[2024-09-22 06:02:42,162][06906] Updated weights for policy 0, policy_version 530 (0.0020)
+[2024-09-22 06:02:43,031][04746] Fps is (10 sec: 13107.1, 60 sec: 13175.5, 300 sec: 12818.1). Total num frames: 2179072. Throughput: 0: 3289.2. Samples: 542240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:02:43,033][04746] Avg episode reward: [(0, '18.519')]
+[2024-09-22 06:02:45,374][06906] Updated weights for policy 0, policy_version 540 (0.0014)
+[2024-09-22 06:02:48,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13243.7, 300 sec: 12849.7). Total num frames: 2248704. Throughput: 0: 3299.2. Samples: 562106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:02:48,034][04746] Avg episode reward: [(0, '17.896')]
+[2024-09-22 06:02:48,298][06906] Updated weights for policy 0, policy_version 550 (0.0015)
+[2024-09-22 06:02:51,351][06906] Updated weights for policy 0, policy_version 560 (0.0017)
+[2024-09-22 06:02:53,031][04746] Fps is (10 sec: 13516.9, 60 sec: 13175.5, 300 sec: 12856.9). Total num frames: 2314240. Throughput: 0: 3293.2. Samples: 572176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:02:53,033][04746] Avg episode reward: [(0, '19.252')]
+[2024-09-22 06:02:53,037][06893] Saving new best policy, reward=19.252!
+[2024-09-22 06:02:54,461][06906] Updated weights for policy 0, policy_version 570 (0.0020)
+[2024-09-22 06:02:57,854][06906] Updated weights for policy 0, policy_version 580 (0.0016)
+[2024-09-22 06:02:58,031][04746] Fps is (10 sec: 12697.6, 60 sec: 13107.2, 300 sec: 12841.5). Total num frames: 2375680. Throughput: 0: 3285.8. Samples: 591436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:02:58,033][04746] Avg episode reward: [(0, '19.334')]
+[2024-09-22 06:02:58,043][06893] Saving new best policy, reward=19.334!
+[2024-09-22 06:03:00,813][06906] Updated weights for policy 0, policy_version 590 (0.0015)
+[2024-09-22 06:03:03,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13175.5, 300 sec: 12870.1). Total num frames: 2445312. Throughput: 0: 3301.7. Samples: 611580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:03:03,033][04746] Avg episode reward: [(0, '19.455')]
+[2024-09-22 06:03:03,035][06893] Saving new best policy, reward=19.455!
+[2024-09-22 06:03:03,857][06906] Updated weights for policy 0, policy_version 600 (0.0014)
+[2024-09-22 06:03:06,865][06906] Updated weights for policy 0, policy_version 610 (0.0014)
+[2024-09-22 06:03:08,031][04746] Fps is (10 sec: 13516.7, 60 sec: 13175.5, 300 sec: 12876.1). Total num frames: 2510848. Throughput: 0: 3301.4. Samples: 621680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:03:08,034][04746] Avg episode reward: [(0, '21.272')]
+[2024-09-22 06:03:08,042][06893] Saving new best policy, reward=21.272!
+[2024-09-22 06:03:10,156][06906] Updated weights for policy 0, policy_version 620 (0.0016)
+[2024-09-22 06:03:13,031][04746] Fps is (10 sec: 12697.5, 60 sec: 13107.2, 300 sec: 12861.4). Total num frames: 2572288. Throughput: 0: 3277.0. Samples: 640508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:03:13,033][04746] Avg episode reward: [(0, '21.166')]
+[2024-09-22 06:03:13,485][06906] Updated weights for policy 0, policy_version 630 (0.0015)
+[2024-09-22 06:03:16,486][06906] Updated weights for policy 0, policy_version 640 (0.0015)
+[2024-09-22 06:03:18,031][04746] Fps is (10 sec: 13107.1, 60 sec: 13175.4, 300 sec: 12887.4). Total num frames: 2641920. Throughput: 0: 3288.1. Samples: 660510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:03:18,034][04746] Avg episode reward: [(0, '20.795')]
+[2024-09-22 06:03:19,482][06906] Updated weights for policy 0, policy_version 650 (0.0017)
+[2024-09-22 06:03:22,441][06906] Updated weights for policy 0, policy_version 660 (0.0017)
+[2024-09-22 06:03:23,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13107.2, 300 sec: 12892.6). Total num frames: 2707456. Throughput: 0: 3287.4. Samples: 670712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:03:23,034][04746] Avg episode reward: [(0, '19.731')]
+[2024-09-22 06:03:25,766][06906] Updated weights for policy 0, policy_version 670 (0.0017)
+[2024-09-22 06:03:28,031][04746] Fps is (10 sec: 13107.3, 60 sec: 13175.5, 300 sec: 12897.6). Total num frames: 2772992. Throughput: 0: 3283.5. Samples: 689998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:03:28,033][04746] Avg episode reward: [(0, '21.732')]
+[2024-09-22 06:03:28,043][06893] Saving new best policy, reward=21.732!
+[2024-09-22 06:03:28,876][06906] Updated weights for policy 0, policy_version 680 (0.0020)
+[2024-09-22 06:03:31,786][06906] Updated weights for policy 0, policy_version 690 (0.0016)
+[2024-09-22 06:03:33,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13243.7, 300 sec: 12921.0). Total num frames: 2842624. Throughput: 0: 3303.7. Samples: 710774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-22 06:03:33,033][04746] Avg episode reward: [(0, '21.135')]
+[2024-09-22 06:03:34,764][06906] Updated weights for policy 0, policy_version 700 (0.0015)
+[2024-09-22 06:03:37,840][06906] Updated weights for policy 0, policy_version 710 (0.0014)
+[2024-09-22 06:03:38,031][04746] Fps is (10 sec: 13516.6, 60 sec: 13175.4, 300 sec: 12925.1). Total num frames: 2908160. Throughput: 0: 3307.5. Samples: 721014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:03:38,034][04746] Avg episode reward: [(0, '19.791')]
+[2024-09-22 06:03:38,044][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000710_2908160.pth...
+[2024-09-22 06:03:41,191][06906] Updated weights for policy 0, policy_version 720 (0.0017)
+[2024-09-22 06:03:43,031][04746] Fps is (10 sec: 13106.9, 60 sec: 13243.7, 300 sec: 12929.1). Total num frames: 2973696. Throughput: 0: 3304.0. Samples: 740116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:03:43,034][04746] Avg episode reward: [(0, '21.972')]
+[2024-09-22 06:03:43,037][06893] Saving new best policy, reward=21.972!
+[2024-09-22 06:03:44,083][06906] Updated weights for policy 0, policy_version 730 (0.0015)
+[2024-09-22 06:03:46,998][06906] Updated weights for policy 0, policy_version 740 (0.0014)
+[2024-09-22 06:03:48,031][04746] Fps is (10 sec: 13517.1, 60 sec: 13243.7, 300 sec: 12950.3). Total num frames: 3043328. Throughput: 0: 3329.4. Samples: 761404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:03:48,033][04746] Avg episode reward: [(0, '26.497')]
+[2024-09-22 06:03:48,044][06893] Saving new best policy, reward=26.497!
+[2024-09-22 06:03:49,881][06906] Updated weights for policy 0, policy_version 750 (0.0015)
+[2024-09-22 06:03:52,913][06906] Updated weights for policy 0, policy_version 760 (0.0019)
+[2024-09-22 06:03:53,031][04746] Fps is (10 sec: 13926.7, 60 sec: 13312.0, 300 sec: 12970.7). Total num frames: 3112960. Throughput: 0: 3336.3. Samples: 771812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-22 06:03:53,033][04746] Avg episode reward: [(0, '27.469')]
+[2024-09-22 06:03:53,035][06893] Saving new best policy, reward=27.469!
+[2024-09-22 06:03:56,124][06906] Updated weights for policy 0, policy_version 770 (0.0021)
+[2024-09-22 06:03:58,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13380.3, 300 sec: 12973.5). Total num frames: 3178496. Throughput: 0: 3348.5. Samples: 791188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:03:58,034][04746] Avg episode reward: [(0, '24.919')]
+[2024-09-22 06:03:59,087][06906] Updated weights for policy 0, policy_version 780 (0.0015)
+[2024-09-22 06:04:02,101][06906] Updated weights for policy 0, policy_version 790 (0.0017)
+[2024-09-22 06:04:03,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13380.3, 300 sec: 12992.5). Total num frames: 3248128. Throughput: 0: 3368.8. Samples: 812106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:04:03,034][04746] Avg episode reward: [(0, '24.168')]
+[2024-09-22 06:04:05,093][06906] Updated weights for policy 0, policy_version 800 (0.0017)
+[2024-09-22 06:04:08,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13312.0, 300 sec: 12978.7). Total num frames: 3309568. Throughput: 0: 3366.3. Samples: 822196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:04:08,033][04746] Avg episode reward: [(0, '23.725')]
+[2024-09-22 06:04:08,410][06906] Updated weights for policy 0, policy_version 810 (0.0014)
+[2024-09-22 06:04:11,477][06906] Updated weights for policy 0, policy_version 820 (0.0015)
+[2024-09-22 06:04:13,031][04746] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 12996.9). Total num frames: 3379200. Throughput: 0: 3366.3. Samples: 841482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:04:13,034][04746] Avg episode reward: [(0, '23.472')]
+[2024-09-22 06:04:14,400][06906] Updated weights for policy 0, policy_version 830 (0.0014)
+[2024-09-22 06:04:17,234][06906] Updated weights for policy 0, policy_version 840 (0.0015)
+[2024-09-22 06:04:18,031][04746] Fps is (10 sec: 13926.3, 60 sec: 13448.5, 300 sec: 13014.5). Total num frames: 3448832. Throughput: 0: 3378.0. Samples: 862784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:04:18,033][04746] Avg episode reward: [(0, '21.957')]
+[2024-09-22 06:04:20,179][06906] Updated weights for policy 0, policy_version 850 (0.0016)
+[2024-09-22 06:04:23,031][04746] Fps is (10 sec: 13516.9, 60 sec: 13448.5, 300 sec: 13016.2). Total num frames: 3514368. Throughput: 0: 3377.9. Samples: 873020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:04:23,034][04746] Avg episode reward: [(0, '22.854')]
+[2024-09-22 06:04:23,411][06906] Updated weights for policy 0, policy_version 860 (0.0016)
+[2024-09-22 06:04:26,393][06906] Updated weights for policy 0, policy_version 870 (0.0016)
+[2024-09-22 06:04:28,031][04746] Fps is (10 sec: 13516.5, 60 sec: 13516.7, 300 sec: 13032.7). Total num frames: 3584000. Throughput: 0: 3400.0. Samples: 893116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:04:28,034][04746] Avg episode reward: [(0, '24.346')]
+[2024-09-22 06:04:29,222][06906] Updated weights for policy 0, policy_version 880 (0.0015)
+[2024-09-22 06:04:32,062][06906] Updated weights for policy 0, policy_version 890 (0.0016)
+[2024-09-22 06:04:33,031][04746] Fps is (10 sec: 14336.0, 60 sec: 13585.1, 300 sec: 13063.3). Total num frames: 3657728. Throughput: 0: 3406.0. Samples: 914676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-22 06:04:33,034][04746] Avg episode reward: [(0, '25.572')]
+[2024-09-22 06:04:34,926][06906] Updated weights for policy 0, policy_version 900 (0.0015)
+[2024-09-22 06:04:38,031][04746] Fps is (10 sec: 13926.8, 60 sec: 13585.1, 300 sec: 13064.1). Total num frames: 3723264. Throughput: 0: 3406.2. Samples: 925090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:04:38,033][04746] Avg episode reward: [(0, '25.821')]
+[2024-09-22 06:04:38,157][06906] Updated weights for policy 0, policy_version 910 (0.0017)
+[2024-09-22 06:04:41,098][06906] Updated weights for policy 0, policy_version 920 (0.0017)
+[2024-09-22 06:04:43,031][04746] Fps is (10 sec: 13516.8, 60 sec: 13653.4, 300 sec: 13079.0). Total num frames: 3792896. Throughput: 0: 3424.8. Samples: 945304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:04:43,033][04746] Avg episode reward: [(0, '25.851')]
+[2024-09-22 06:04:43,917][06906] Updated weights for policy 0, policy_version 930 (0.0015)
+[2024-09-22 06:04:46,694][06906] Updated weights for policy 0, policy_version 940 (0.0014)
+[2024-09-22 06:04:48,031][04746] Fps is (10 sec: 14335.9, 60 sec: 13721.6, 300 sec: 13107.2). Total num frames: 3866624. Throughput: 0: 3445.4. Samples: 967150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:04:48,035][04746] Avg episode reward: [(0, '27.156')]
+[2024-09-22 06:04:49,569][06906] Updated weights for policy 0, policy_version 950 (0.0014)
+[2024-09-22 06:04:52,744][06906] Updated weights for policy 0, policy_version 960 (0.0016)
+[2024-09-22 06:04:53,031][04746] Fps is (10 sec: 13926.3, 60 sec: 13653.3, 300 sec: 13329.4). Total num frames: 3932160. Throughput: 0: 3449.2. Samples: 977410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:04:53,034][04746] Avg episode reward: [(0, '28.118')]
+[2024-09-22 06:04:53,073][06893] Saving new best policy, reward=28.118!
+[2024-09-22 06:04:55,653][06906] Updated weights for policy 0, policy_version 970 (0.0015)
+[2024-09-22 06:04:58,031][04746] Fps is (10 sec: 13926.5, 60 sec: 13789.8, 300 sec: 13398.8). Total num frames: 4005888. Throughput: 0: 3480.9. Samples: 998122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:04:58,033][04746] Avg episode reward: [(0, '26.885')]
+[2024-09-22 06:04:58,467][06906] Updated weights for policy 0, policy_version 980 (0.0014)
+[2024-09-22 06:05:01,257][06906] Updated weights for policy 0, policy_version 990 (0.0014)
+[2024-09-22 06:05:03,031][04746] Fps is (10 sec: 14745.6, 60 sec: 13858.1, 300 sec: 13440.4). Total num frames: 4079616. Throughput: 0: 3496.9. Samples: 1020144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:05:03,034][04746] Avg episode reward: [(0, '26.385')]
+[2024-09-22 06:05:04,146][06906] Updated weights for policy 0, policy_version 1000 (0.0014)
+[2024-09-22 06:05:07,210][06906] Updated weights for policy 0, policy_version 1010 (0.0014)
+[2024-09-22 06:05:08,031][04746] Fps is (10 sec: 13926.4, 60 sec: 13926.4, 300 sec: 13426.5). Total num frames: 4145152. Throughput: 0: 3498.6. Samples: 1030456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:05:08,033][04746] Avg episode reward: [(0, '23.512')]
+[2024-09-22 06:05:10,028][06906] Updated weights for policy 0, policy_version 1020 (0.0014)
+[2024-09-22 06:05:12,812][06906] Updated weights for policy 0, policy_version 1030 (0.0016)
+[2024-09-22 06:05:13,031][04746] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 13454.4). Total num frames: 4218880. Throughput: 0: 3524.5. Samples: 1051718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-09-22 06:05:13,034][04746] Avg episode reward: [(0, '25.543')]
+[2024-09-22 06:05:15,610][06906] Updated weights for policy 0, policy_version 1040 (0.0018)
+[2024-09-22 06:05:18,031][04746] Fps is (10 sec: 14745.3, 60 sec: 14062.9, 300 sec: 13468.2). Total num frames: 4292608. Throughput: 0: 3532.4. Samples: 1073634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:05:18,033][04746] Avg episode reward: [(0, '26.777')]
+[2024-09-22 06:05:18,500][06906] Updated weights for policy 0, policy_version 1050 (0.0016)
+[2024-09-22 06:05:21,655][06906] Updated weights for policy 0, policy_version 1060 (0.0021)
+[2024-09-22 06:05:23,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14062.9, 300 sec: 13454.3). Total num frames: 4358144. Throughput: 0: 3519.3. Samples: 1083460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:05:23,034][04746] Avg episode reward: [(0, '25.521')]
+[2024-09-22 06:05:24,450][06906] Updated weights for policy 0, policy_version 1070 (0.0014)
+[2024-09-22 06:05:27,263][06906] Updated weights for policy 0, policy_version 1080 (0.0014)
+[2024-09-22 06:05:28,031][04746] Fps is (10 sec: 13926.6, 60 sec: 14131.2, 300 sec: 13482.1). Total num frames: 4431872. Throughput: 0: 3545.8. Samples: 1104864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:05:28,033][04746] Avg episode reward: [(0, '25.581')]
+[2024-09-22 06:05:30,185][06906] Updated weights for policy 0, policy_version 1090 (0.0014)
+[2024-09-22 06:05:33,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14062.9, 300 sec: 13496.0). Total num frames: 4501504. Throughput: 0: 3531.2. Samples: 1126052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-09-22 06:05:33,034][04746] Avg episode reward: [(0, '24.798')]
+[2024-09-22 06:05:33,088][06906] Updated weights for policy 0, policy_version 1100 (0.0019)
+[2024-09-22 06:05:36,247][06906] Updated weights for policy 0, policy_version 1110 (0.0015)
+[2024-09-22 06:05:38,031][04746] Fps is (10 sec: 13926.5, 60 sec: 14131.2, 300 sec: 13496.0). Total num frames: 4571136. Throughput: 0: 3521.8. Samples: 1135890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:05:38,034][04746] Avg episode reward: [(0, '22.580')]
+[2024-09-22 06:05:38,043][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001116_4571136.pth...
+[2024-09-22 06:05:38,124][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000324_1327104.pth
+[2024-09-22 06:05:39,129][06906] Updated weights for policy 0, policy_version 1120 (0.0015)
+[2024-09-22 06:05:41,966][06906] Updated weights for policy 0, policy_version 1130 (0.0016)
+[2024-09-22 06:05:43,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 13509.9). Total num frames: 4640768. Throughput: 0: 3534.8. Samples: 1157188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:05:43,035][04746] Avg episode reward: [(0, '25.251')]
+[2024-09-22 06:05:44,780][06906] Updated weights for policy 0, policy_version 1140 (0.0016)
+[2024-09-22 06:05:47,685][06906] Updated weights for policy 0, policy_version 1150 (0.0016)
+[2024-09-22 06:05:48,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 13523.7). Total num frames: 4714496. Throughput: 0: 3521.1. Samples: 1178592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:05:48,034][04746] Avg episode reward: [(0, '25.028')]
+[2024-09-22 06:05:50,830][06906] Updated weights for policy 0, policy_version 1160 (0.0018)
+[2024-09-22 06:05:53,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 13509.9). Total num frames: 4780032. Throughput: 0: 3511.0. Samples: 1188452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:05:53,033][04746] Avg episode reward: [(0, '25.848')]
+[2024-09-22 06:05:53,683][06906] Updated weights for policy 0, policy_version 1170 (0.0016)
+[2024-09-22 06:05:56,521][06906] Updated weights for policy 0, policy_version 1180 (0.0015)
+[2024-09-22 06:05:58,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 13551.5). Total num frames: 4853760. Throughput: 0: 3520.9. Samples: 1210160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:05:58,033][04746] Avg episode reward: [(0, '27.013')]
+[2024-09-22 06:05:59,297][06906] Updated weights for policy 0, policy_version 1190 (0.0014)
+[2024-09-22 06:06:02,256][06906] Updated weights for policy 0, policy_version 1200 (0.0014)
+[2024-09-22 06:06:03,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14062.9, 300 sec: 13551.5). Total num frames: 4923392. Throughput: 0: 3504.4. Samples: 1231332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:06:03,034][04746] Avg episode reward: [(0, '26.650')]
+[2024-09-22 06:06:05,409][06906] Updated weights for policy 0, policy_version 1210 (0.0015)
+[2024-09-22 06:06:08,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 13551.5). Total num frames: 4993024. Throughput: 0: 3504.7. Samples: 1241172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:06:08,034][04746] Avg episode reward: [(0, '28.227')]
+[2024-09-22 06:06:08,044][06893] Saving new best policy, reward=28.227!
+[2024-09-22 06:06:08,226][06906] Updated weights for policy 0, policy_version 1220 (0.0015)
+[2024-09-22 06:06:11,012][06906] Updated weights for policy 0, policy_version 1230 (0.0017)
+[2024-09-22 06:06:13,031][04746] Fps is (10 sec: 14336.1, 60 sec: 14131.2, 300 sec: 13593.2). Total num frames: 5066752. Throughput: 0: 3515.6. Samples: 1263066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:06:13,033][04746] Avg episode reward: [(0, '27.215')]
+[2024-09-22 06:06:13,844][06906] Updated weights for policy 0, policy_version 1240 (0.0016)
+[2024-09-22 06:06:16,820][06906] Updated weights for policy 0, policy_version 1250 (0.0014)
+[2024-09-22 06:06:18,031][04746] Fps is (10 sec: 13926.6, 60 sec: 13994.7, 300 sec: 13579.3). Total num frames: 5132288. Throughput: 0: 3506.6. Samples: 1283850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:06:18,034][04746] Avg episode reward: [(0, '26.215')]
+[2024-09-22 06:06:19,919][06906] Updated weights for policy 0, policy_version 1260 (0.0016)
+[2024-09-22 06:06:22,675][06906] Updated weights for policy 0, policy_version 1270 (0.0015)
+[2024-09-22 06:06:23,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 13607.1). Total num frames: 5206016. Throughput: 0: 3515.7. Samples: 1294098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-22 06:06:23,033][04746] Avg episode reward: [(0, '27.755')]
+[2024-09-22 06:06:25,477][06906] Updated weights for policy 0, policy_version 1280 (0.0017)
+[2024-09-22 06:06:28,031][04746] Fps is (10 sec: 14745.7, 60 sec: 14131.2, 300 sec: 13648.7). Total num frames: 5279744. Throughput: 0: 3536.1. Samples: 1316314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:06:28,033][04746] Avg episode reward: [(0, '28.448')]
+[2024-09-22 06:06:28,042][06893] Saving new best policy, reward=28.448!
+[2024-09-22 06:06:28,227][06906] Updated weights for policy 0, policy_version 1290 (0.0016)
+[2024-09-22 06:06:31,196][06906] Updated weights for policy 0, policy_version 1300 (0.0014)
+[2024-09-22 06:06:33,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14062.9, 300 sec: 13634.8). Total num frames: 5345280. Throughput: 0: 3520.6. Samples: 1337020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:06:33,033][04746] Avg episode reward: [(0, '29.035')]
+[2024-09-22 06:06:33,088][06893] Saving new best policy, reward=29.035!
+[2024-09-22 06:06:34,328][06906] Updated weights for policy 0, policy_version 1310 (0.0016)
+[2024-09-22 06:06:37,134][06906] Updated weights for policy 0, policy_version 1320 (0.0014)
+[2024-09-22 06:06:38,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 13662.6). Total num frames: 5419008. Throughput: 0: 3534.7. Samples: 1347514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:06:38,033][04746] Avg episode reward: [(0, '25.313')]
+[2024-09-22 06:06:39,945][06906] Updated weights for policy 0, policy_version 1330 (0.0014)
+[2024-09-22 06:06:42,784][06906] Updated weights for policy 0, policy_version 1340 (0.0013)
+[2024-09-22 06:06:43,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 13676.5). Total num frames: 5488640. Throughput: 0: 3534.7. Samples: 1369222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:06:43,034][04746] Avg episode reward: [(0, '29.276')]
+[2024-09-22 06:06:43,069][06893] Saving new best policy, reward=29.276!
+[2024-09-22 06:06:45,725][06906] Updated weights for policy 0, policy_version 1350 (0.0014)
+[2024-09-22 06:06:48,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14062.9, 300 sec: 13676.5). Total num frames: 5558272. Throughput: 0: 3523.2. Samples: 1389876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:06:48,035][04746] Avg episode reward: [(0, '28.923')]
+[2024-09-22 06:06:48,756][06906] Updated weights for policy 0, policy_version 1360 (0.0018)
+[2024-09-22 06:06:51,574][06906] Updated weights for policy 0, policy_version 1370 (0.0015)
+[2024-09-22 06:06:53,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14199.5, 300 sec: 13704.2). Total num frames: 5632000. Throughput: 0: 3545.9. Samples: 1400736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:06:53,034][04746] Avg episode reward: [(0, '28.842')]
+[2024-09-22 06:06:54,417][06906] Updated weights for policy 0, policy_version 1380 (0.0015)
+[2024-09-22 06:06:57,212][06906] Updated weights for policy 0, policy_version 1390 (0.0013)
+[2024-09-22 06:06:58,031][04746] Fps is (10 sec: 14745.6, 60 sec: 14199.5, 300 sec: 13732.0). Total num frames: 5705728. Throughput: 0: 3544.0. Samples: 1422546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:06:58,034][04746] Avg episode reward: [(0, '29.792')]
+[2024-09-22 06:06:58,044][06893] Saving new best policy, reward=29.792!
+[2024-09-22 06:07:00,170][06906] Updated weights for policy 0, policy_version 1400 (0.0016)
+[2024-09-22 06:07:03,031][04746] Fps is (10 sec: 13926.0, 60 sec: 14131.1, 300 sec: 13732.0). Total num frames: 5771264. Throughput: 0: 3535.4. Samples: 1442942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:07:03,034][04746] Avg episode reward: [(0, '26.261')]
+[2024-09-22 06:07:03,245][06906] Updated weights for policy 0, policy_version 1410 (0.0015)
+[2024-09-22 06:07:06,026][06906] Updated weights for policy 0, policy_version 1420 (0.0014)
+[2024-09-22 06:07:08,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 13759.8). Total num frames: 5844992. Throughput: 0: 3553.7. Samples: 1454014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:07:08,033][04746] Avg episode reward: [(0, '23.809')]
+[2024-09-22 06:07:08,746][06906] Updated weights for policy 0, policy_version 1430 (0.0013)
+[2024-09-22 06:07:11,538][06906] Updated weights for policy 0, policy_version 1440 (0.0016)
+[2024-09-22 06:07:13,031][04746] Fps is (10 sec: 14746.0, 60 sec: 14199.5, 300 sec: 13787.6). Total num frames: 5918720. Throughput: 0: 3553.0. Samples: 1476198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:07:13,034][04746] Avg episode reward: [(0, '24.877')]
+[2024-09-22 06:07:14,501][06906] Updated weights for policy 0, policy_version 1450 (0.0014)
+[2024-09-22 06:07:17,502][06906] Updated weights for policy 0, policy_version 1460 (0.0018)
+[2024-09-22 06:07:18,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 13773.7). Total num frames: 5984256. Throughput: 0: 3550.3. Samples: 1496784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:07:18,033][04746] Avg episode reward: [(0, '23.963')]
+[2024-09-22 06:07:20,270][06906] Updated weights for policy 0, policy_version 1470 (0.0014)
+[2024-09-22 06:07:23,031][04746] Fps is (10 sec: 13926.5, 60 sec: 14199.5, 300 sec: 13815.3). Total num frames: 6057984. Throughput: 0: 3567.2. Samples: 1508040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-09-22 06:07:23,035][04746] Avg episode reward: [(0, '24.729')]
+[2024-09-22 06:07:23,054][06906] Updated weights for policy 0, policy_version 1480 (0.0016)
+[2024-09-22 06:07:25,760][06906] Updated weights for policy 0, policy_version 1490 (0.0014)
+[2024-09-22 06:07:28,031][04746] Fps is (10 sec: 14745.5, 60 sec: 14199.4, 300 sec: 13843.1). Total num frames: 6131712. Throughput: 0: 3574.1. Samples: 1530058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:07:28,033][04746] Avg episode reward: [(0, '27.785')]
+[2024-09-22 06:07:28,824][06906] Updated weights for policy 0, policy_version 1500 (0.0018)
+[2024-09-22 06:07:31,944][06906] Updated weights for policy 0, policy_version 1510 (0.0018)
+[2024-09-22 06:07:33,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 13829.2). Total num frames: 6197248. Throughput: 0: 3557.4. Samples: 1549960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:07:33,034][04746] Avg episode reward: [(0, '29.769')]
+[2024-09-22 06:07:34,749][06906] Updated weights for policy 0, policy_version 1520 (0.0016)
+[2024-09-22 06:07:37,616][06906] Updated weights for policy 0, policy_version 1530 (0.0014)
+[2024-09-22 06:07:38,031][04746] Fps is (10 sec: 13926.6, 60 sec: 14199.5, 300 sec: 13870.9). Total num frames: 6270976. Throughput: 0: 3557.6. Samples: 1560828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-22 06:07:38,033][04746] Avg episode reward: [(0, '26.364')]
+[2024-09-22 06:07:38,042][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001531_6270976.pth...
+[2024-09-22 06:07:38,123][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000710_2908160.pth
+[2024-09-22 06:07:40,498][06906] Updated weights for policy 0, policy_version 1540 (0.0015)
+[2024-09-22 06:07:43,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14199.5, 300 sec: 13870.9). Total num frames: 6340608. Throughput: 0: 3542.5. Samples: 1581960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:07:43,034][04746] Avg episode reward: [(0, '24.922')]
+[2024-09-22 06:07:43,629][06906] Updated weights for policy 0, policy_version 1550 (0.0015)
+[2024-09-22 06:07:46,667][06906] Updated weights for policy 0, policy_version 1560 (0.0015)
+[2024-09-22 06:07:48,031][04746] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 13870.9). Total num frames: 6406144. Throughput: 0: 3544.8. Samples: 1602458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:07:48,034][04746] Avg episode reward: [(0, '27.267')]
+[2024-09-22 06:07:49,407][06906] Updated weights for policy 0, policy_version 1570 (0.0014)
+[2024-09-22 06:07:52,150][06906] Updated weights for policy 0, policy_version 1580 (0.0016)
+[2024-09-22 06:07:53,031][04746] Fps is (10 sec: 14336.1, 60 sec: 14199.5, 300 sec: 13926.4). Total num frames: 6483968. Throughput: 0: 3545.2. Samples: 1613550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:07:53,034][04746] Avg episode reward: [(0, '26.000')]
+[2024-09-22 06:07:54,976][06906] Updated weights for policy 0, policy_version 1590 (0.0015)
+[2024-09-22 06:07:57,903][06906] Updated weights for policy 0, policy_version 1600 (0.0019)
+[2024-09-22 06:07:58,031][04746] Fps is (10 sec: 14745.6, 60 sec: 14131.2, 300 sec: 13926.4). Total num frames: 6553600. Throughput: 0: 3533.6. Samples: 1635212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:07:58,034][04746] Avg episode reward: [(0, '28.912')]
+[2024-09-22 06:08:00,923][06906] Updated weights for policy 0, policy_version 1610 (0.0015)
+[2024-09-22 06:08:03,031][04746] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 13940.3). Total num frames: 6623232. Throughput: 0: 3544.4. Samples: 1656284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:08:03,033][04746] Avg episode reward: [(0, '31.208')]
+[2024-09-22 06:08:03,037][06893] Saving new best policy, reward=31.208!
+[2024-09-22 06:08:03,740][06906] Updated weights for policy 0, policy_version 1620 (0.0013)
+[2024-09-22 06:08:06,459][06906] Updated weights for policy 0, policy_version 1630 (0.0014)
+[2024-09-22 06:08:08,031][04746] Fps is (10 sec: 14335.9, 60 sec: 14199.5, 300 sec: 13981.9). Total num frames: 6696960. Throughput: 0: 3540.1. Samples: 1667344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:08:08,034][04746] Avg episode reward: [(0, '29.081')]
+[2024-09-22 06:08:09,240][06906] Updated weights for policy 0, policy_version 1640 (0.0014)
+[2024-09-22 06:08:12,326][06906] Updated weights for policy 0, policy_version 1650 (0.0016)
+[2024-09-22 06:08:13,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 13981.9). Total num frames: 6766592. Throughput: 0: 3523.4. Samples: 1688610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2024-09-22 06:08:13,034][04746] Avg episode reward: [(0, '28.647')]
+[2024-09-22 06:08:15,293][06906] Updated weights for policy 0, policy_version 1660 (0.0015)
+[2024-09-22 06:08:18,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 13995.8). Total num frames: 6836224. Throughput: 0: 3552.0. Samples: 1709802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:08:18,034][04746] Avg episode reward: [(0, '27.663')]
+[2024-09-22 06:08:18,055][06906] Updated weights for policy 0, policy_version 1670 (0.0014)
+[2024-09-22 06:08:20,785][06906] Updated weights for policy 0, policy_version 1680 (0.0014)
+[2024-09-22 06:08:23,031][04746] Fps is (10 sec: 14745.7, 60 sec: 14267.7, 300 sec: 14037.5). Total num frames: 6914048. Throughput: 0: 3558.1. Samples: 1720942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:08:23,034][04746] Avg episode reward: [(0, '27.779')]
+[2024-09-22 06:08:23,555][06906] Updated weights for policy 0, policy_version 1690 (0.0014)
+[2024-09-22 06:08:26,565][06906] Updated weights for policy 0, policy_version 1700 (0.0014)
+[2024-09-22 06:08:28,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 14023.6). Total num frames: 6979584. Throughput: 0: 3560.0. Samples: 1742160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:08:28,034][04746] Avg episode reward: [(0, '26.484')]
+[2024-09-22 06:08:29,575][06906] Updated weights for policy 0, policy_version 1710 (0.0014)
+[2024-09-22 06:08:32,341][06906] Updated weights for policy 0, policy_version 1720 (0.0014)
+[2024-09-22 06:08:33,031][04746] Fps is (10 sec: 13926.5, 60 sec: 14267.8, 300 sec: 14051.4). Total num frames: 7053312. Throughput: 0: 3584.4. Samples: 1763758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:08:33,034][04746] Avg episode reward: [(0, '28.002')]
+[2024-09-22 06:08:35,066][06906] Updated weights for policy 0, policy_version 1730 (0.0014)
+[2024-09-22 06:08:37,861][06906] Updated weights for policy 0, policy_version 1740 (0.0017)
+[2024-09-22 06:08:38,031][04746] Fps is (10 sec: 14745.6, 60 sec: 14267.7, 300 sec: 14079.1). Total num frames: 7127040. Throughput: 0: 3585.8. Samples: 1774912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:08:38,035][04746] Avg episode reward: [(0, '25.099')]
+[2024-09-22 06:08:40,840][06906] Updated weights for policy 0, policy_version 1750 (0.0015)
+[2024-09-22 06:08:43,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14267.8, 300 sec: 14079.1). Total num frames: 7196672. Throughput: 0: 3568.1. Samples: 1795776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:08:43,034][04746] Avg episode reward: [(0, '24.603')]
+[2024-09-22 06:08:43,786][06906] Updated weights for policy 0, policy_version 1760 (0.0015)
+[2024-09-22 06:08:46,470][06906] Updated weights for policy 0, policy_version 1770 (0.0014)
+[2024-09-22 06:08:48,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14404.3, 300 sec: 14093.0). Total num frames: 7270400. Throughput: 0: 3595.8. Samples: 1818094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:08:48,033][04746] Avg episode reward: [(0, '28.280')]
+[2024-09-22 06:08:49,259][06906] Updated weights for policy 0, policy_version 1780 (0.0018)
+[2024-09-22 06:08:51,996][06906] Updated weights for policy 0, policy_version 1790 (0.0015)
+[2024-09-22 06:08:53,031][04746] Fps is (10 sec: 14745.4, 60 sec: 14336.0, 300 sec: 14120.8). Total num frames: 7344128. Throughput: 0: 3596.8. Samples: 1829200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:08:53,034][04746] Avg episode reward: [(0, '30.855')]
+[2024-09-22 06:08:54,933][06906] Updated weights for policy 0, policy_version 1800 (0.0017)
+[2024-09-22 06:08:57,835][06906] Updated weights for policy 0, policy_version 1810 (0.0015)
+[2024-09-22 06:08:58,031][04746] Fps is (10 sec: 14336.0, 60 sec: 14336.0, 300 sec: 14120.8). Total num frames: 7413760. Throughput: 0: 3586.4. Samples: 1849996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:08:58,034][04746] Avg episode reward: [(0, '27.938')]
+[2024-09-22 06:09:00,571][06906] Updated weights for policy 0, policy_version 1820 (0.0014)
+[2024-09-22 06:09:03,031][04746] Fps is (10 sec: 14745.8, 60 sec: 14472.6, 300 sec: 14176.3). Total num frames: 7491584. Throughput: 0: 3621.7. Samples: 1872776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:09:03,033][04746] Avg episode reward: [(0, '27.574')]
+[2024-09-22 06:09:03,308][06906] Updated weights for policy 0, policy_version 1830 (0.0015)
+[2024-09-22 06:09:05,943][06906] Updated weights for policy 0, policy_version 1840 (0.0016)
+[2024-09-22 06:09:08,031][04746] Fps is (10 sec: 15155.2, 60 sec: 14472.5, 300 sec: 14190.2). Total num frames: 7565312. Throughput: 0: 3629.4. Samples: 1884264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:09:08,034][04746] Avg episode reward: [(0, '30.259')]
+[2024-09-22 06:09:08,883][06906] Updated weights for policy 0, policy_version 1850 (0.0015)
+[2024-09-22 06:09:11,811][06906] Updated weights for policy 0, policy_version 1860 (0.0016)
+[2024-09-22 06:09:13,031][04746] Fps is (10 sec: 14335.9, 60 sec: 14472.6, 300 sec: 14190.2). Total num frames: 7634944. Throughput: 0: 3622.0. Samples: 1905148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:09:13,034][04746] Avg episode reward: [(0, '26.642')]
+[2024-09-22 06:09:14,570][06906] Updated weights for policy 0, policy_version 1870 (0.0015)
+[2024-09-22 06:09:17,253][06906] Updated weights for policy 0, policy_version 1880 (0.0014)
+[2024-09-22 06:09:18,031][04746] Fps is (10 sec: 14335.9, 60 sec: 14540.8, 300 sec: 14218.0). Total num frames: 7708672. Throughput: 0: 3644.3. Samples: 1927752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:09:18,033][04746] Avg episode reward: [(0, '25.555')]
+[2024-09-22 06:09:20,033][06906] Updated weights for policy 0, policy_version 1890 (0.0016)
+[2024-09-22 06:09:23,025][06906] Updated weights for policy 0, policy_version 1900 (0.0014)
+[2024-09-22 06:09:23,038][04746] Fps is (10 sec: 14734.9, 60 sec: 14470.8, 300 sec: 14231.5). Total num frames: 7782400. Throughput: 0: 3644.5. Samples: 1938942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:09:23,043][04746] Avg episode reward: [(0, '28.122')]
+[2024-09-22 06:09:26,050][06906] Updated weights for policy 0, policy_version 1910 (0.0014)
+[2024-09-22 06:09:28,031][04746] Fps is (10 sec: 14336.3, 60 sec: 14540.8, 300 sec: 14218.0). Total num frames: 7852032. Throughput: 0: 3632.6. Samples: 1959244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:09:28,033][04746] Avg episode reward: [(0, '28.462')]
+[2024-09-22 06:09:28,842][06906] Updated weights for policy 0, policy_version 1920 (0.0013)
+[2024-09-22 06:09:31,567][06906] Updated weights for policy 0, policy_version 1930 (0.0017)
+[2024-09-22 06:09:33,031][04746] Fps is (10 sec: 14346.3, 60 sec: 14540.8, 300 sec: 14245.7). Total num frames: 7925760. Throughput: 0: 3632.2. Samples: 1981544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:09:33,033][04746] Avg episode reward: [(0, '33.180')]
+[2024-09-22 06:09:33,035][06893] Saving new best policy, reward=33.180!
+[2024-09-22 06:09:34,368][06906] Updated weights for policy 0, policy_version 1940 (0.0014)
+[2024-09-22 06:09:37,321][06906] Updated weights for policy 0, policy_version 1950 (0.0014)
+[2024-09-22 06:09:38,031][04746] Fps is (10 sec: 14335.8, 60 sec: 14472.5, 300 sec: 14245.7). Total num frames: 7995392. Throughput: 0: 3627.2. Samples: 1992422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:09:38,033][04746] Avg episode reward: [(0, '31.870')]
+[2024-09-22 06:09:38,044][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001952_7995392.pth...
+[2024-09-22 06:09:38,118][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001116_4571136.pth
+[2024-09-22 06:09:40,305][06906] Updated weights for policy 0, policy_version 1960 (0.0015)
+[2024-09-22 06:09:43,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14231.9). Total num frames: 8065024. Throughput: 0: 3618.8. Samples: 2012844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:09:43,034][04746] Avg episode reward: [(0, '28.848')]
+[2024-09-22 06:09:43,249][06906] Updated weights for policy 0, policy_version 1970 (0.0015)
+[2024-09-22 06:09:46,191][06906] Updated weights for policy 0, policy_version 1980 (0.0014)
+[2024-09-22 06:09:48,031][04746] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14245.7). Total num frames: 8134656. Throughput: 0: 3579.4. Samples: 2033850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:09:48,033][04746] Avg episode reward: [(0, '30.720')]
+[2024-09-22 06:09:49,107][06906] Updated weights for policy 0, policy_version 1990 (0.0016)
+[2024-09-22 06:09:52,217][06906] Updated weights for policy 0, policy_version 2000 (0.0017)
+[2024-09-22 06:09:53,031][04746] Fps is (10 sec: 13516.9, 60 sec: 14267.8, 300 sec: 14218.0). Total num frames: 8200192. Throughput: 0: 3547.8. Samples: 2043916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-09-22 06:09:53,033][04746] Avg episode reward: [(0, '31.790')]
+[2024-09-22 06:09:55,358][06906] Updated weights for policy 0, policy_version 2010 (0.0014)
+[2024-09-22 06:09:58,031][04746] Fps is (10 sec: 13107.2, 60 sec: 14199.5, 300 sec: 14190.2). Total num frames: 8265728. Throughput: 0: 3515.7. Samples: 2063356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:09:58,033][04746] Avg episode reward: [(0, '30.821')]
+[2024-09-22 06:09:58,682][06906] Updated weights for policy 0, policy_version 2020 (0.0017)
+[2024-09-22 06:10:02,065][06906] Updated weights for policy 0, policy_version 2030 (0.0017)
+[2024-09-22 06:10:03,031][04746] Fps is (10 sec: 12287.9, 60 sec: 13858.1, 300 sec: 14162.4). Total num frames: 8323072. Throughput: 0: 3419.3. Samples: 2081622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:10:03,034][04746] Avg episode reward: [(0, '27.214')]
+[2024-09-22 06:10:05,664][06906] Updated weights for policy 0, policy_version 2040 (0.0017)
+[2024-09-22 06:10:08,031][04746] Fps is (10 sec: 11468.7, 60 sec: 13585.1, 300 sec: 14106.9). Total num frames: 8380416. Throughput: 0: 3358.8. Samples: 2090066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:10:08,033][04746] Avg episode reward: [(0, '28.579')]
+[2024-09-22 06:10:09,296][06906] Updated weights for policy 0, policy_version 2050 (0.0017)
+[2024-09-22 06:10:12,735][06906] Updated weights for policy 0, policy_version 2060 (0.0017)
+[2024-09-22 06:10:13,031][04746] Fps is (10 sec: 11468.7, 60 sec: 13380.2, 300 sec: 14051.4). Total num frames: 8437760. Throughput: 0: 3288.1. Samples: 2107210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:10:13,036][04746] Avg episode reward: [(0, '30.279')]
+[2024-09-22 06:10:16,120][06906] Updated weights for policy 0, policy_version 2070 (0.0017)
+[2024-09-22 06:10:18,031][04746] Fps is (10 sec: 11878.6, 60 sec: 13175.5, 300 sec: 14037.5). Total num frames: 8499200. Throughput: 0: 3198.1. Samples: 2125458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:10:18,034][04746] Avg episode reward: [(0, '29.975')]
+[2024-09-22 06:10:19,656][06906] Updated weights for policy 0, policy_version 2080 (0.0018)
+[2024-09-22 06:10:23,031][04746] Fps is (10 sec: 11878.4, 60 sec: 12903.9, 300 sec: 13981.9). Total num frames: 8556544. Throughput: 0: 3141.6. Samples: 2133794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:10:23,033][04746] Avg episode reward: [(0, '27.525')]
+[2024-09-22 06:10:23,348][06906] Updated weights for policy 0, policy_version 2090 (0.0018)
+[2024-09-22 06:10:26,726][06906] Updated weights for policy 0, policy_version 2100 (0.0016)
+[2024-09-22 06:10:28,031][04746] Fps is (10 sec: 11468.8, 60 sec: 12697.6, 300 sec: 13940.3). Total num frames: 8613888. Throughput: 0: 3077.3. Samples: 2151320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:10:28,033][04746] Avg episode reward: [(0, '24.633')]
+[2024-09-22 06:10:30,089][06906] Updated weights for policy 0, policy_version 2110 (0.0015)
+[2024-09-22 06:10:33,031][04746] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 13912.5). Total num frames: 8675328. Throughput: 0: 3015.4. Samples: 2169544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-09-22 06:10:33,034][04746] Avg episode reward: [(0, '27.902')]
+[2024-09-22 06:10:33,512][06906] Updated weights for policy 0, policy_version 2120 (0.0014)
+[2024-09-22 06:10:37,238][06906] Updated weights for policy 0, policy_version 2130 (0.0017)
+[2024-09-22 06:10:38,031][04746] Fps is (10 sec: 11878.2, 60 sec: 12288.0, 300 sec: 13870.9). Total num frames: 8732672. Throughput: 0: 2976.3. Samples: 2177850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:10:38,034][04746] Avg episode reward: [(0, '28.677')]
+[2024-09-22 06:10:40,661][06906] Updated weights for policy 0, policy_version 2140 (0.0015)
+[2024-09-22 06:10:43,031][04746] Fps is (10 sec: 11468.8, 60 sec: 12083.2, 300 sec: 13815.3). Total num frames: 8790016. Throughput: 0: 2932.5. Samples: 2195320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:10:43,034][04746] Avg episode reward: [(0, '27.919')]
+[2024-09-22 06:10:44,061][06906] Updated weights for policy 0, policy_version 2150 (0.0016)
+[2024-09-22 06:10:47,398][06906] Updated weights for policy 0, policy_version 2160 (0.0020)
+[2024-09-22 06:10:48,031][04746] Fps is (10 sec: 11878.5, 60 sec: 11946.7, 300 sec: 13801.4). Total num frames: 8851456. Throughput: 0: 2936.0. Samples: 2213744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:10:48,033][04746] Avg episode reward: [(0, '29.776')]
+[2024-09-22 06:10:50,642][06906] Updated weights for policy 0, policy_version 2170 (0.0017)
+[2024-09-22 06:10:53,031][04746] Fps is (10 sec: 12697.3, 60 sec: 11946.6, 300 sec: 13773.7). Total num frames: 8916992. Throughput: 0: 2957.7. Samples: 2223164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:10:53,035][04746] Avg episode reward: [(0, '30.713')]
+[2024-09-22 06:10:53,743][06906] Updated weights for policy 0, policy_version 2180 (0.0017)
+[2024-09-22 06:10:56,608][06906] Updated weights for policy 0, policy_version 2190 (0.0014)
+[2024-09-22 06:10:58,031][04746] Fps is (10 sec: 13516.7, 60 sec: 12014.9, 300 sec: 13773.7). Total num frames: 8986624. Throughput: 0: 3037.3. Samples: 2243888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:10:58,034][04746] Avg episode reward: [(0, '28.134')]
+[2024-09-22 06:10:59,455][06906] Updated weights for policy 0, policy_version 2200 (0.0015)
+[2024-09-22 06:11:02,390][06906] Updated weights for policy 0, policy_version 2210 (0.0019)
+[2024-09-22 06:11:03,033][04746] Fps is (10 sec: 13923.8, 60 sec: 12219.3, 300 sec: 13773.6). Total num frames: 9056256. Throughput: 0: 3093.6. Samples: 2264678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:11:03,035][04746] Avg episode reward: [(0, '27.638')]
+[2024-09-22 06:11:06,169][06906] Updated weights for policy 0, policy_version 2220 (0.0016)
+[2024-09-22 06:11:08,031][04746] Fps is (10 sec: 12697.7, 60 sec: 12219.8, 300 sec: 13718.1). Total num frames: 9113600. Throughput: 0: 3091.7. Samples: 2272920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:11:08,033][04746] Avg episode reward: [(0, '28.320')]
+[2024-09-22 06:11:09,625][06906] Updated weights for policy 0, policy_version 2230 (0.0016)
+[2024-09-22 06:11:13,014][06906] Updated weights for policy 0, policy_version 2240 (0.0016)
+[2024-09-22 06:11:13,031][04746] Fps is (10 sec: 11880.9, 60 sec: 12288.0, 300 sec: 13704.2). Total num frames: 9175040. Throughput: 0: 3097.7. Samples: 2290716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:11:13,034][04746] Avg episode reward: [(0, '29.334')]
+[2024-09-22 06:11:16,343][06906] Updated weights for policy 0, policy_version 2250 (0.0016)
+[2024-09-22 06:11:18,031][04746] Fps is (10 sec: 11878.4, 60 sec: 12219.7, 300 sec: 13648.7). Total num frames: 9232384. Throughput: 0: 3094.1. Samples: 2308778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:11:18,033][04746] Avg episode reward: [(0, '31.025')]
+[2024-09-22 06:11:20,034][06906] Updated weights for policy 0, policy_version 2260 (0.0018)
+[2024-09-22 06:11:23,031][04746] Fps is (10 sec: 11468.9, 60 sec: 12219.7, 300 sec: 13593.2). Total num frames: 9289728. Throughput: 0: 3092.1. Samples: 2316994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:11:23,035][04746] Avg episode reward: [(0, '29.830')]
+[2024-09-22 06:11:23,554][06906] Updated weights for policy 0, policy_version 2270 (0.0015)
+[2024-09-22 06:11:26,999][06906] Updated weights for policy 0, policy_version 2280 (0.0016)
+[2024-09-22 06:11:28,031][04746] Fps is (10 sec: 11878.4, 60 sec: 12288.0, 300 sec: 13579.3). Total num frames: 9351168. Throughput: 0: 3100.1. Samples: 2334826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:11:28,034][04746] Avg episode reward: [(0, '28.612')]
+[2024-09-22 06:11:30,359][06906] Updated weights for policy 0, policy_version 2290 (0.0014)
+[2024-09-22 06:11:33,031][04746] Fps is (10 sec: 11878.3, 60 sec: 12219.7, 300 sec: 13523.7). Total num frames: 9408512. Throughput: 0: 3083.4. Samples: 2352498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-09-22 06:11:33,034][04746] Avg episode reward: [(0, '30.104')]
+[2024-09-22 06:11:33,991][06906] Updated weights for policy 0, policy_version 2300 (0.0015)
+[2024-09-22 06:11:37,626][06906] Updated weights for policy 0, policy_version 2310 (0.0017)
+[2024-09-22 06:11:38,031][04746] Fps is (10 sec: 11468.6, 60 sec: 12219.7, 300 sec: 13482.1). Total num frames: 9465856. Throughput: 0: 3056.4. Samples: 2360700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-22 06:11:38,034][04746] Avg episode reward: [(0, '30.210')]
+[2024-09-22 06:11:38,043][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002311_9465856.pth...
+[2024-09-22 06:11:38,126][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001531_6270976.pth
+[2024-09-22 06:11:40,997][06906] Updated weights for policy 0, policy_version 2320 (0.0017)
+[2024-09-22 06:11:43,031][04746] Fps is (10 sec: 11468.9, 60 sec: 12219.7, 300 sec: 13440.4). Total num frames: 9523200. Throughput: 0: 2996.4. Samples: 2378726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:11:43,034][04746] Avg episode reward: [(0, '28.804')]
+[2024-09-22 06:11:44,444][06906] Updated weights for policy 0, policy_version 2330 (0.0015)
+[2024-09-22 06:11:48,018][06906] Updated weights for policy 0, policy_version 2340 (0.0017)
+[2024-09-22 06:11:48,031][04746] Fps is (10 sec: 11878.3, 60 sec: 12219.7, 300 sec: 13398.8). Total num frames: 9584640. Throughput: 0: 2922.2. Samples: 2396170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-09-22 06:11:48,034][04746] Avg episode reward: [(0, '31.504')]
+[2024-09-22 06:11:51,642][06906] Updated weights for policy 0, policy_version 2350 (0.0017)
+[2024-09-22 06:11:53,031][04746] Fps is (10 sec: 11878.1, 60 sec: 12083.2, 300 sec: 13343.2). Total num frames: 9641984. Throughput: 0: 2923.3. Samples: 2404470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:11:53,034][04746] Avg episode reward: [(0, '31.838')]
+[2024-09-22 06:11:55,083][06906] Updated weights for policy 0, policy_version 2360 (0.0018)
+[2024-09-22 06:11:58,031][04746] Fps is (10 sec: 11469.1, 60 sec: 11878.4, 300 sec: 13315.5). Total num frames: 9699328. Throughput: 0: 2926.8. Samples: 2422422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:11:58,033][04746] Avg episode reward: [(0, '30.794')]
+[2024-09-22 06:11:58,469][06906] Updated weights for policy 0, policy_version 2370 (0.0015)
+[2024-09-22 06:12:01,564][06906] Updated weights for policy 0, policy_version 2380 (0.0018)
+[2024-09-22 06:12:03,031][04746] Fps is (10 sec: 12288.3, 60 sec: 11810.6, 300 sec: 13287.7). Total num frames: 9764864. Throughput: 0: 2956.6. Samples: 2441824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-09-22 06:12:03,034][04746] Avg episode reward: [(0, '28.643')]
+[2024-09-22 06:12:04,847][06906] Updated weights for policy 0, policy_version 2390 (0.0018)
+[2024-09-22 06:12:07,678][06906] Updated weights for policy 0, policy_version 2400 (0.0017)
+[2024-09-22 06:12:08,032][04746] Fps is (10 sec: 13516.3, 60 sec: 12014.8, 300 sec: 13273.8). Total num frames: 9834496. Throughput: 0: 2990.0. Samples: 2451544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-09-22 06:12:08,033][04746] Avg episode reward: [(0, '27.010')]
+[2024-09-22 06:12:10,518][06906] Updated weights for policy 0, policy_version 2410 (0.0015)
+[2024-09-22 06:12:13,031][04746] Fps is (10 sec: 13926.3, 60 sec: 12151.5, 300 sec: 13287.7). Total num frames: 9904128. Throughput: 0: 3068.1. Samples: 2472890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-09-22 06:12:13,033][04746] Avg episode reward: [(0, '29.198')]
+[2024-09-22 06:12:13,580][06906] Updated weights for policy 0, policy_version 2420 (0.0014)
+[2024-09-22 06:12:17,131][06906] Updated weights for policy 0, policy_version 2430 (0.0017)
+[2024-09-22 06:12:18,031][04746] Fps is (10 sec: 12698.1, 60 sec: 12151.5, 300 sec: 13232.2). Total num frames: 9961472. Throughput: 0: 3073.9. Samples: 2490824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-09-22 06:12:18,034][04746] Avg episode reward: [(0, '29.191')]
+[2024-09-22 06:12:20,788][06906] Updated weights for policy 0, policy_version 2440 (0.0018)
+[2024-09-22 06:12:21,773][06893] Stopping Batcher_0...
+[2024-09-22 06:12:21,774][06893] Loop batcher_evt_loop terminating...
+[2024-09-22 06:12:21,777][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2024-09-22 06:12:21,774][04746] Component Batcher_0 stopped!
+[2024-09-22 06:12:21,808][06906] Weights refcount: 2 0
+[2024-09-22 06:12:21,810][06906] Stopping InferenceWorker_p0-w0...
+[2024-09-22 06:12:21,810][06906] Loop inference_proc0-0_evt_loop terminating...
+[2024-09-22 06:12:21,811][04746] Component InferenceWorker_p0-w0 stopped!
+[2024-09-22 06:12:21,838][06913] Stopping RolloutWorker_w4...
+[2024-09-22 06:12:21,838][06913] Loop rollout_proc4_evt_loop terminating...
+[2024-09-22 06:12:21,838][04746] Component RolloutWorker_w4 stopped!
+[2024-09-22 06:12:21,851][06911] Stopping RolloutWorker_w5...
+[2024-09-22 06:12:21,852][06911] Loop rollout_proc5_evt_loop terminating...
+[2024-09-22 06:12:21,851][04746] Component RolloutWorker_w5 stopped!
+[2024-09-22 06:12:21,864][06910] Stopping RolloutWorker_w3...
+[2024-09-22 06:12:21,865][06910] Loop rollout_proc3_evt_loop terminating...
+[2024-09-22 06:12:21,867][06918] Stopping RolloutWorker_w7...
+[2024-09-22 06:12:21,864][04746] Component RolloutWorker_w3 stopped!
+[2024-09-22 06:12:21,868][06918] Loop rollout_proc7_evt_loop terminating...
+[2024-09-22 06:12:21,868][04746] Component RolloutWorker_w7 stopped!
+[2024-09-22 06:12:21,872][06893] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001952_7995392.pth
+[2024-09-22 06:12:21,878][06912] Stopping RolloutWorker_w6...
+[2024-09-22 06:12:21,879][04746] Component RolloutWorker_w6 stopped!
+[2024-09-22 06:12:21,882][06912] Loop rollout_proc6_evt_loop terminating...
+[2024-09-22 06:12:21,885][06893] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2024-09-22 06:12:21,898][06908] Stopping RolloutWorker_w1...
+[2024-09-22 06:12:21,899][06908] Loop rollout_proc1_evt_loop terminating...
+[2024-09-22 06:12:21,898][04746] Component RolloutWorker_w1 stopped!
+[2024-09-22 06:12:21,921][06909] Stopping RolloutWorker_w2...
+[2024-09-22 06:12:21,922][06909] Loop rollout_proc2_evt_loop terminating...
+[2024-09-22 06:12:21,924][04746] Component RolloutWorker_w2 stopped!
+[2024-09-22 06:12:21,966][06907] Stopping RolloutWorker_w0...
+[2024-09-22 06:12:21,967][06907] Loop rollout_proc0_evt_loop terminating...
+[2024-09-22 06:12:21,967][04746] Component RolloutWorker_w0 stopped!
+[2024-09-22 06:12:22,064][06893] Stopping LearnerWorker_p0...
+[2024-09-22 06:12:22,064][06893] Loop learner_proc0_evt_loop terminating...
+[2024-09-22 06:12:22,063][04746] Component LearnerWorker_p0 stopped!
+[2024-09-22 06:12:22,068][04746] Waiting for process learner_proc0 to stop...
+[2024-09-22 06:12:23,148][04746] Waiting for process inference_proc0-0 to join...
+[2024-09-22 06:12:23,151][04746] Waiting for process rollout_proc0 to join...
+[2024-09-22 06:12:23,154][04746] Waiting for process rollout_proc1 to join...
+[2024-09-22 06:12:23,156][04746] Waiting for process rollout_proc2 to join...
+[2024-09-22 06:12:23,158][04746] Waiting for process rollout_proc3 to join...
+[2024-09-22 06:12:23,160][04746] Waiting for process rollout_proc4 to join...
+[2024-09-22 06:12:23,162][04746] Waiting for process rollout_proc5 to join...
+[2024-09-22 06:12:23,165][04746] Waiting for process rollout_proc6 to join...
+[2024-09-22 06:12:23,168][04746] Waiting for process rollout_proc7 to join...
+[2024-09-22 06:12:23,170][04746] Batcher 0 profile tree view:
+batching: 41.2143, releasing_batches: 0.0861
+[2024-09-22 06:12:23,171][04746] InferenceWorker_p0-w0 profile tree view:
 wait_policy: 0.0001
-  wait_policy_total: 3.9341
-update_model: 3.6129
-  weight_update: 0.0012
-one_step: 0.0026
-  handle_policy_step: 209.1558
-    deserialize: 7.8546, stack: 1.3917, obs_to_device_normalize: 49.2185, forward: 105.4844, send_messages: 13.5968
-    prepare_outputs: 22.3562
-      to_cpu: 13.1604
-[2024-09-15 15:37:53,977][00283] Learner 0 profile tree view:
-misc: 0.0052, prepare_batch: 10.4311
-train: 24.0978
-  epoch_init: 0.0055, minibatch_init: 0.0061, losses_postprocess: 0.3000, kl_divergence: 0.4024, after_optimizer: 5.3092
-  calculate_losses: 10.1043
-    losses_init: 0.0033, forward_head: 0.6827, bptt_initial: 6.5923, tail: 0.5541, advantages_returns: 0.1398, losses: 1.0380
-    bptt: 0.9309
-      bptt_forward_core: 0.8807
-  update: 7.6408
-    clip: 0.7823
-[2024-09-15 15:37:53,980][00283] RolloutWorker_w7 profile tree view:
-wait_for_trajectories: 0.1627, enqueue_policy_requests: 8.2873, env_step: 137.4009, overhead: 6.8411, complete_rollouts: 0.2546
-save_policy_outputs: 9.7021
-  split_output_tensors: 3.8621
-[2024-09-15 15:37:53,981][00283] Loop Runner_EvtLoop terminating...
-[2024-09-15 15:37:53,984][00283] Runner profile tree view:
-main_loop: 236.1357
-[2024-09-15 15:37:53,985][00283] Collected {0: 4005888}, FPS: 16964.3
-[2024-09-15 15:37:54,248][00283] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
-[2024-09-15 15:37:54,250][00283] Overriding arg 'num_workers' with value 1 passed from command line
-[2024-09-15 15:37:54,251][00283] Adding new argument 'no_render'=True that is not in the saved config file!
-[2024-09-15 15:37:54,253][00283] Adding new argument 'save_video'=True that is not in the saved config file!
-[2024-09-15 15:37:54,254][00283] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2024-09-15 15:37:54,255][00283] Adding new argument 'video_name'=None that is not in the saved config file!
-[2024-09-15 15:37:54,256][00283] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2024-09-15 15:37:54,257][00283] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2024-09-15 15:37:54,259][00283] Adding new argument 'push_to_hub'=False that is not in the saved config file! -[2024-09-15 15:37:54,259][00283] Adding new argument 'hf_repository'=None that is not in the saved config file! -[2024-09-15 15:37:54,261][00283] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2024-09-15 15:37:54,262][00283] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2024-09-15 15:37:54,264][00283] Adding new argument 'train_script'=None that is not in the saved config file! -[2024-09-15 15:37:54,265][00283] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2024-09-15 15:37:54,266][00283] Using frameskip 1 and render_action_repeat=4 for evaluation -[2024-09-15 15:37:54,295][00283] Doom resolution: 160x120, resize resolution: (128, 72) -[2024-09-15 15:37:54,298][00283] RunningMeanStd input shape: (3, 72, 128) -[2024-09-15 15:37:54,300][00283] RunningMeanStd input shape: (1,) -[2024-09-15 15:37:54,314][00283] ConvEncoder: input_channels=3 -[2024-09-15 15:37:54,426][00283] Conv encoder output size: 512 -[2024-09-15 15:37:54,427][00283] Policy head output size: 512 -[2024-09-15 15:37:54,570][00283] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2024-09-15 15:37:55,418][00283] Num frames 100... -[2024-09-15 15:37:55,538][00283] Num frames 200... -[2024-09-15 15:37:55,659][00283] Num frames 300... -[2024-09-15 15:37:55,780][00283] Num frames 400... -[2024-09-15 15:37:55,892][00283] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 -[2024-09-15 15:37:55,894][00283] Avg episode reward: 5.480, avg true_objective: 4.480 -[2024-09-15 15:37:55,956][00283] Num frames 500... -[2024-09-15 15:37:56,071][00283] Num frames 600... -[2024-09-15 15:37:56,188][00283] Num frames 700... -[2024-09-15 15:37:56,313][00283] Num frames 800... -[2024-09-15 15:37:56,406][00283] Avg episode rewards: #0: 6.160, true rewards: #0: 4.160 -[2024-09-15 15:37:56,407][00283] Avg episode reward: 6.160, avg true_objective: 4.160 -[2024-09-15 15:37:56,491][00283] Num frames 900... -[2024-09-15 15:37:56,611][00283] Num frames 1000... -[2024-09-15 15:37:56,729][00283] Num frames 1100... -[2024-09-15 15:37:56,846][00283] Num frames 1200... -[2024-09-15 15:37:56,964][00283] Num frames 1300... -[2024-09-15 15:37:57,084][00283] Num frames 1400... -[2024-09-15 15:37:57,186][00283] Avg episode rewards: #0: 7.133, true rewards: #0: 4.800 -[2024-09-15 15:37:57,187][00283] Avg episode reward: 7.133, avg true_objective: 4.800 -[2024-09-15 15:37:57,260][00283] Num frames 1500... -[2024-09-15 15:37:57,377][00283] Num frames 1600... -[2024-09-15 15:37:57,498][00283] Num frames 1700... -[2024-09-15 15:37:57,615][00283] Num frames 1800... -[2024-09-15 15:37:57,736][00283] Num frames 1900... -[2024-09-15 15:37:57,901][00283] Avg episode rewards: #0: 7.483, true rewards: #0: 4.982 -[2024-09-15 15:37:57,903][00283] Avg episode reward: 7.483, avg true_objective: 4.982 -[2024-09-15 15:37:57,911][00283] Num frames 2000... -[2024-09-15 15:37:58,027][00283] Num frames 2100... -[2024-09-15 15:37:58,144][00283] Avg episode rewards: #0: 6.306, true rewards: #0: 4.306 -[2024-09-15 15:37:58,145][00283] Avg episode reward: 6.306, avg true_objective: 4.306 -[2024-09-15 15:37:58,201][00283] Num frames 2200... -[2024-09-15 15:37:58,314][00283] Num frames 2300... -[2024-09-15 15:37:58,431][00283] Num frames 2400... -[2024-09-15 15:37:58,546][00283] Num frames 2500... 
-[2024-09-15 15:37:58,668][00283] Num frames 2600... -[2024-09-15 15:37:58,843][00283] Avg episode rewards: #0: 6.662, true rewards: #0: 4.495 -[2024-09-15 15:37:58,845][00283] Avg episode reward: 6.662, avg true_objective: 4.495 -[2024-09-15 15:37:58,849][00283] Num frames 2700... -[2024-09-15 15:37:58,966][00283] Num frames 2800... -[2024-09-15 15:37:59,085][00283] Num frames 2900... -[2024-09-15 15:37:59,205][00283] Num frames 3000... -[2024-09-15 15:37:59,328][00283] Num frames 3100... -[2024-09-15 15:37:59,448][00283] Num frames 3200... -[2024-09-15 15:37:59,515][00283] Avg episode rewards: #0: 6.727, true rewards: #0: 4.584 -[2024-09-15 15:37:59,516][00283] Avg episode reward: 6.727, avg true_objective: 4.584 -[2024-09-15 15:37:59,628][00283] Num frames 3300... -[2024-09-15 15:37:59,754][00283] Num frames 3400... -[2024-09-15 15:37:59,883][00283] Num frames 3500... -[2024-09-15 15:38:00,012][00283] Num frames 3600... -[2024-09-15 15:38:00,154][00283] Avg episode rewards: #0: 6.571, true rewards: #0: 4.571 -[2024-09-15 15:38:00,155][00283] Avg episode reward: 6.571, avg true_objective: 4.571 -[2024-09-15 15:38:00,208][00283] Num frames 3700... -[2024-09-15 15:38:00,329][00283] Num frames 3800... -[2024-09-15 15:38:00,450][00283] Num frames 3900... -[2024-09-15 15:38:00,570][00283] Num frames 4000... -[2024-09-15 15:38:00,690][00283] Num frames 4100... -[2024-09-15 15:38:00,820][00283] Num frames 4200... -[2024-09-15 15:38:00,953][00283] Avg episode rewards: #0: 6.850, true rewards: #0: 4.739 -[2024-09-15 15:38:00,954][00283] Avg episode reward: 6.850, avg true_objective: 4.739 -[2024-09-15 15:38:00,996][00283] Num frames 4300... -[2024-09-15 15:38:01,115][00283] Num frames 4400... -[2024-09-15 15:38:01,233][00283] Num frames 4500... -[2024-09-15 15:38:01,352][00283] Num frames 4600... -[2024-09-15 15:38:01,504][00283] Avg episode rewards: #0: 6.781, true rewards: #0: 4.681 -[2024-09-15 15:38:01,505][00283] Avg episode reward: 6.781, avg true_objective: 4.681 -[2024-09-15 15:38:12,554][00283] Replay video saved to /content/train_dir/default_experiment/replay.mp4! -[2024-09-15 15:39:48,505][00283] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2024-09-15 15:39:48,506][00283] Overriding arg 'num_workers' with value 1 passed from command line -[2024-09-15 15:39:48,507][00283] Adding new argument 'no_render'=True that is not in the saved config file! -[2024-09-15 15:39:48,508][00283] Adding new argument 'save_video'=True that is not in the saved config file! -[2024-09-15 15:39:48,510][00283] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2024-09-15 15:39:48,512][00283] Adding new argument 'video_name'=None that is not in the saved config file! -[2024-09-15 15:39:48,513][00283] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2024-09-15 15:39:48,515][00283] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2024-09-15 15:39:48,516][00283] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2024-09-15 15:39:48,518][00283] Adding new argument 'hf_repository'='Vivek-huggingface/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2024-09-15 15:39:48,519][00283] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2024-09-15 15:39:48,520][00283] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
-[2024-09-15 15:39:48,522][00283] Adding new argument 'train_script'=None that is not in the saved config file! -[2024-09-15 15:39:48,524][00283] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2024-09-15 15:39:48,525][00283] Using frameskip 1 and render_action_repeat=4 for evaluation -[2024-09-15 15:39:48,548][00283] RunningMeanStd input shape: (3, 72, 128) -[2024-09-15 15:39:48,550][00283] RunningMeanStd input shape: (1,) -[2024-09-15 15:39:48,562][00283] ConvEncoder: input_channels=3 -[2024-09-15 15:39:48,600][00283] Conv encoder output size: 512 -[2024-09-15 15:39:48,601][00283] Policy head output size: 512 -[2024-09-15 15:39:48,620][00283] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2024-09-15 15:39:49,031][00283] Num frames 100... -[2024-09-15 15:39:49,151][00283] Num frames 200... -[2024-09-15 15:39:49,270][00283] Num frames 300... -[2024-09-15 15:39:49,390][00283] Num frames 400... -[2024-09-15 15:39:49,508][00283] Num frames 500... -[2024-09-15 15:39:49,628][00283] Num frames 600... -[2024-09-15 15:39:49,746][00283] Num frames 700... -[2024-09-15 15:39:49,880][00283] Avg episode rewards: #0: 12.680, true rewards: #0: 7.680 -[2024-09-15 15:39:49,882][00283] Avg episode reward: 12.680, avg true_objective: 7.680 -[2024-09-15 15:39:49,921][00283] Num frames 800... -[2024-09-15 15:39:50,037][00283] Num frames 900... -[2024-09-15 15:39:50,154][00283] Num frames 1000... -[2024-09-15 15:39:50,272][00283] Num frames 1100... -[2024-09-15 15:39:50,390][00283] Num frames 1200... -[2024-09-15 15:39:50,464][00283] Avg episode rewards: #0: 10.080, true rewards: #0: 6.080 -[2024-09-15 15:39:50,466][00283] Avg episode reward: 10.080, avg true_objective: 6.080 -[2024-09-15 15:39:50,564][00283] Num frames 1300... -[2024-09-15 15:39:50,680][00283] Num frames 1400... -[2024-09-15 15:39:50,796][00283] Num frames 1500... -[2024-09-15 15:39:50,913][00283] Num frames 1600... -[2024-09-15 15:39:51,030][00283] Num frames 1700... -[2024-09-15 15:39:51,147][00283] Num frames 1800... -[2024-09-15 15:39:51,265][00283] Num frames 1900... -[2024-09-15 15:39:51,384][00283] Num frames 2000... -[2024-09-15 15:39:51,503][00283] Num frames 2100... -[2024-09-15 15:39:51,623][00283] Num frames 2200... -[2024-09-15 15:39:51,767][00283] Avg episode rewards: #0: 13.243, true rewards: #0: 7.577 -[2024-09-15 15:39:51,768][00283] Avg episode reward: 13.243, avg true_objective: 7.577 -[2024-09-15 15:39:51,800][00283] Num frames 2300... -[2024-09-15 15:39:51,917][00283] Num frames 2400... -[2024-09-15 15:39:52,034][00283] Num frames 2500... -[2024-09-15 15:39:52,152][00283] Num frames 2600... -[2024-09-15 15:39:52,268][00283] Num frames 2700... -[2024-09-15 15:39:52,383][00283] Num frames 2800... -[2024-09-15 15:39:52,502][00283] Num frames 2900... -[2024-09-15 15:39:52,620][00283] Num frames 3000... -[2024-09-15 15:39:52,738][00283] Num frames 3100... -[2024-09-15 15:39:52,862][00283] Num frames 3200... -[2024-09-15 15:39:53,033][00283] Avg episode rewards: #0: 14.493, true rewards: #0: 8.242 -[2024-09-15 15:39:53,035][00283] Avg episode reward: 14.493, avg true_objective: 8.242 -[2024-09-15 15:39:53,039][00283] Num frames 3300... -[2024-09-15 15:39:53,156][00283] Num frames 3400... -[2024-09-15 15:39:53,271][00283] Num frames 3500... -[2024-09-15 15:39:53,388][00283] Num frames 3600... 
-[2024-09-15 15:39:53,544][00283] Avg episode rewards: #0: 12.362, true rewards: #0: 7.362 -[2024-09-15 15:39:53,546][00283] Avg episode reward: 12.362, avg true_objective: 7.362 -[2024-09-15 15:39:53,569][00283] Num frames 3700... -[2024-09-15 15:39:53,687][00283] Num frames 3800... -[2024-09-15 15:39:53,812][00283] Num frames 3900... -[2024-09-15 15:39:53,937][00283] Num frames 4000... -[2024-09-15 15:39:54,075][00283] Avg episode rewards: #0: 10.942, true rewards: #0: 6.775 -[2024-09-15 15:39:54,076][00283] Avg episode reward: 10.942, avg true_objective: 6.775 -[2024-09-15 15:39:54,118][00283] Num frames 4100... -[2024-09-15 15:39:54,236][00283] Num frames 4200... -[2024-09-15 15:39:54,353][00283] Num frames 4300... -[2024-09-15 15:39:54,470][00283] Num frames 4400... -[2024-09-15 15:39:54,594][00283] Num frames 4500... -[2024-09-15 15:39:54,722][00283] Num frames 4600... -[2024-09-15 15:39:54,850][00283] Num frames 4700... -[2024-09-15 15:39:54,977][00283] Num frames 4800... -[2024-09-15 15:39:55,074][00283] Avg episode rewards: #0: 11.333, true rewards: #0: 6.904 -[2024-09-15 15:39:55,076][00283] Avg episode reward: 11.333, avg true_objective: 6.904 -[2024-09-15 15:39:55,160][00283] Num frames 4900... -[2024-09-15 15:39:55,286][00283] Num frames 5000... -[2024-09-15 15:39:55,411][00283] Num frames 5100... -[2024-09-15 15:39:55,537][00283] Num frames 5200... -[2024-09-15 15:39:55,614][00283] Avg episode rewards: #0: 10.396, true rewards: #0: 6.521 -[2024-09-15 15:39:55,616][00283] Avg episode reward: 10.396, avg true_objective: 6.521 -[2024-09-15 15:39:55,718][00283] Num frames 5300... -[2024-09-15 15:39:55,845][00283] Num frames 5400... -[2024-09-15 15:39:55,965][00283] Num frames 5500... -[2024-09-15 15:39:56,084][00283] Num frames 5600... -[2024-09-15 15:39:56,216][00283] Avg episode rewards: #0: 9.850, true rewards: #0: 6.294 -[2024-09-15 15:39:56,217][00283] Avg episode reward: 9.850, avg true_objective: 6.294 -[2024-09-15 15:39:56,260][00283] Num frames 5700... -[2024-09-15 15:39:56,376][00283] Num frames 5800... -[2024-09-15 15:39:56,494][00283] Num frames 5900... -[2024-09-15 15:39:56,610][00283] Num frames 6000... -[2024-09-15 15:39:56,722][00283] Avg episode rewards: #0: 9.249, true rewards: #0: 6.049 -[2024-09-15 15:39:56,723][00283] Avg episode reward: 9.249, avg true_objective: 6.049 -[2024-09-15 15:40:09,911][00283] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
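
The "Avg episode rewards" lines in these evaluation runs are running means over the episodes completed so far: in the removed run above, episode 1 scores 5.480, and an episode 2 of 6.840 brings the average to (5.480 + 6.840) / 2 = 6.160, exactly as logged; "true rewards" is the same running mean applied to the environment's true objective rather than the shaped reward. A sketch of that bookkeeping, with episode values read off the log and a helper name of our own:

    def running_means(episode_rewards: list[float]) -> list[float]:
        """Average-so-far after each completed episode, as the enjoy script prints it."""
        means, total = [], 0.0
        for k, r in enumerate(episode_rewards, start=1):
            total += r
            means.append(total / k)
        return means

    print(running_means([5.480, 6.840]))  # ~[5.48, 6.16]
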
+ wait_policy_total: 12.9840 +update_model: 12.5052 + weight_update: 0.0017 +one_step: 0.0084 + handle_policy_step: 680.9309 + deserialize: 27.9295, stack: 4.8215, obs_to_device_normalize: 159.6845, forward: 336.5609, send_messages: 40.8327 + prepare_outputs: 78.3299 + to_cpu: 49.7910 +[2024-09-22 06:12:23,173][04746] Learner 0 profile tree view: +misc: 0.0139, prepare_batch: 26.0761 +train: 112.9910 + epoch_init: 0.0147, minibatch_init: 0.0160, losses_postprocess: 0.8946, kl_divergence: 0.9136, after_optimizer: 52.6494 + calculate_losses: 38.7085 + losses_init: 0.0096, forward_head: 1.8256, bptt_initial: 27.6678, tail: 1.5628, advantages_returns: 0.4044, losses: 3.8285 + bptt: 2.9188 + bptt_forward_core: 2.7751 + update: 18.7262 + clip: 1.8584 +[2024-09-22 06:12:23,176][04746] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.4337, enqueue_policy_requests: 22.8105, env_step: 304.0444, overhead: 17.7295, complete_rollouts: 1.0132 +save_policy_outputs: 26.1419 + split_output_tensors: 10.4431 +[2024-09-22 06:12:23,178][04746] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.4597, enqueue_policy_requests: 23.9968, env_step: 320.4870, overhead: 18.6543, complete_rollouts: 1.3253 +save_policy_outputs: 27.0024 + split_output_tensors: 10.7421 +[2024-09-22 06:12:23,180][04746] Loop Runner_EvtLoop terminating... +[2024-09-22 06:12:23,181][04746] Runner profile tree view: +main_loop: 759.4308 +[2024-09-22 06:12:23,182][04746] Collected {0: 10006528}, FPS: 13176.4 +[2024-09-22 06:12:23,547][04746] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-09-22 06:12:23,550][04746] Overriding arg 'num_workers' with value 1 passed from command line +[2024-09-22 06:12:23,552][04746] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-09-22 06:12:23,553][04746] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-09-22 06:12:23,554][04746] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-09-22 06:12:23,556][04746] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-09-22 06:12:23,559][04746] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-09-22 06:12:23,561][04746] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-09-22 06:12:23,563][04746] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-09-22 06:12:23,564][04746] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-09-22 06:12:23,565][04746] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-09-22 06:12:23,566][04746] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-09-22 06:12:23,568][04746] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-09-22 06:12:23,571][04746] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-09-22 06:12:23,572][04746] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-09-22 06:12:23,605][04746] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-22 06:12:23,609][04746] RunningMeanStd input shape: (3, 72, 128) +[2024-09-22 06:12:23,612][04746] RunningMeanStd input shape: (1,) +[2024-09-22 06:12:23,630][04746] ConvEncoder: input_channels=3 +[2024-09-22 06:12:23,759][04746] Conv encoder output size: 512 +[2024-09-22 06:12:23,761][04746] Policy head output size: 512 +[2024-09-22 06:12:24,038][04746] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... +[2024-09-22 06:12:24,942][04746] Num frames 100... +[2024-09-22 06:12:25,115][04746] Num frames 200... +[2024-09-22 06:12:25,269][04746] Num frames 300... +[2024-09-22 06:12:25,426][04746] Num frames 400... +[2024-09-22 06:12:25,577][04746] Num frames 500... +[2024-09-22 06:12:25,746][04746] Num frames 600... +[2024-09-22 06:12:25,902][04746] Num frames 700... +[2024-09-22 06:12:26,054][04746] Num frames 800... +[2024-09-22 06:12:26,213][04746] Num frames 900... +[2024-09-22 06:12:26,381][04746] Num frames 1000... +[2024-09-22 06:12:26,539][04746] Num frames 1100... +[2024-09-22 06:12:26,687][04746] Num frames 1200... +[2024-09-22 06:12:26,864][04746] Num frames 1300... +[2024-09-22 06:12:27,021][04746] Num frames 1400... +[2024-09-22 06:12:27,174][04746] Num frames 1500... +[2024-09-22 06:12:27,329][04746] Num frames 1600... +[2024-09-22 06:12:27,547][04746] Avg episode rewards: #0: 39.959, true rewards: #0: 16.960 +[2024-09-22 06:12:27,549][04746] Avg episode reward: 39.959, avg true_objective: 16.960 +[2024-09-22 06:12:27,558][04746] Num frames 1700... +[2024-09-22 06:12:27,714][04746] Num frames 1800... +[2024-09-22 06:12:27,857][04746] Num frames 1900... +[2024-09-22 06:12:28,003][04746] Avg episode rewards: #0: 21.260, true rewards: #0: 9.760 +[2024-09-22 06:12:28,005][04746] Avg episode reward: 21.260, avg true_objective: 9.760 +[2024-09-22 06:12:28,080][04746] Num frames 2000... +[2024-09-22 06:12:28,220][04746] Num frames 2100... +[2024-09-22 06:12:28,369][04746] Num frames 2200... +[2024-09-22 06:12:28,537][04746] Num frames 2300... +[2024-09-22 06:12:28,700][04746] Num frames 2400... +[2024-09-22 06:12:28,753][04746] Avg episode rewards: #0: 17.333, true rewards: #0: 8.000 +[2024-09-22 06:12:28,755][04746] Avg episode reward: 17.333, avg true_objective: 8.000 +[2024-09-22 06:12:28,896][04746] Num frames 2500... +[2024-09-22 06:12:29,065][04746] Num frames 2600... +[2024-09-22 06:12:29,220][04746] Num frames 2700... +[2024-09-22 06:12:29,368][04746] Num frames 2800... +[2024-09-22 06:12:29,544][04746] Num frames 2900... +[2024-09-22 06:12:29,704][04746] Num frames 3000... +[2024-09-22 06:12:29,869][04746] Num frames 3100... +[2024-09-22 06:12:29,931][04746] Avg episode rewards: #0: 17.010, true rewards: #0: 7.760 +[2024-09-22 06:12:29,933][04746] Avg episode reward: 17.010, avg true_objective: 7.760 +[2024-09-22 06:12:30,089][04746] Num frames 3200... +[2024-09-22 06:12:30,263][04746] Num frames 3300... +[2024-09-22 06:12:30,419][04746] Num frames 3400... +[2024-09-22 06:12:30,579][04746] Num frames 3500... +[2024-09-22 06:12:30,757][04746] Num frames 3600... +[2024-09-22 06:12:30,909][04746] Num frames 3700... +[2024-09-22 06:12:31,071][04746] Num frames 3800... +[2024-09-22 06:12:31,236][04746] Num frames 3900... +[2024-09-22 06:12:31,394][04746] Num frames 4000... +[2024-09-22 06:12:31,552][04746] Num frames 4100... 
+[2024-09-22 06:12:31,697][04746] Num frames 4200... +[2024-09-22 06:12:31,863][04746] Num frames 4300... +[2024-09-22 06:12:32,023][04746] Num frames 4400... +[2024-09-22 06:12:32,187][04746] Num frames 4500... +[2024-09-22 06:12:32,342][04746] Num frames 4600... +[2024-09-22 06:12:32,504][04746] Num frames 4700... +[2024-09-22 06:12:32,651][04746] Num frames 4800... +[2024-09-22 06:12:32,856][04746] Avg episode rewards: #0: 23.392, true rewards: #0: 9.792 +[2024-09-22 06:12:32,857][04746] Avg episode reward: 23.392, avg true_objective: 9.792 +[2024-09-22 06:12:32,867][04746] Num frames 4900... +[2024-09-22 06:12:33,037][04746] Num frames 5000... +[2024-09-22 06:12:33,185][04746] Num frames 5100... +[2024-09-22 06:12:33,346][04746] Num frames 5200... +[2024-09-22 06:12:33,503][04746] Num frames 5300... +[2024-09-22 06:12:33,661][04746] Num frames 5400... +[2024-09-22 06:12:33,817][04746] Num frames 5500... +[2024-09-22 06:12:33,968][04746] Num frames 5600... +[2024-09-22 06:12:34,125][04746] Num frames 5700... +[2024-09-22 06:12:34,214][04746] Avg episode rewards: #0: 22.366, true rewards: #0: 9.533 +[2024-09-22 06:12:34,215][04746] Avg episode reward: 22.366, avg true_objective: 9.533 +[2024-09-22 06:12:34,350][04746] Num frames 5800... +[2024-09-22 06:12:34,489][04746] Num frames 5900... +[2024-09-22 06:12:34,652][04746] Num frames 6000... +[2024-09-22 06:12:34,807][04746] Num frames 6100... +[2024-09-22 06:12:34,979][04746] Num frames 6200... +[2024-09-22 06:12:35,123][04746] Num frames 6300... +[2024-09-22 06:12:35,277][04746] Num frames 6400... +[2024-09-22 06:12:35,448][04746] Num frames 6500... +[2024-09-22 06:12:35,606][04746] Num frames 6600... +[2024-09-22 06:12:35,753][04746] Num frames 6700... +[2024-09-22 06:12:35,913][04746] Num frames 6800... +[2024-09-22 06:12:36,072][04746] Num frames 6900... +[2024-09-22 06:12:36,217][04746] Num frames 7000... +[2024-09-22 06:12:36,375][04746] Num frames 7100... +[2024-09-22 06:12:36,543][04746] Num frames 7200... +[2024-09-22 06:12:36,687][04746] Num frames 7300... +[2024-09-22 06:12:36,843][04746] Num frames 7400... +[2024-09-22 06:12:37,003][04746] Num frames 7500... +[2024-09-22 06:12:37,157][04746] Num frames 7600... +[2024-09-22 06:12:37,313][04746] Num frames 7700... +[2024-09-22 06:12:37,479][04746] Num frames 7800... +[2024-09-22 06:12:37,573][04746] Avg episode rewards: #0: 28.028, true rewards: #0: 11.171 +[2024-09-22 06:12:37,574][04746] Avg episode reward: 28.028, avg true_objective: 11.171 +[2024-09-22 06:12:37,706][04746] Num frames 7900... +[2024-09-22 06:12:37,854][04746] Num frames 8000... +[2024-09-22 06:12:38,021][04746] Num frames 8100... +[2024-09-22 06:12:38,197][04746] Num frames 8200... +[2024-09-22 06:12:38,358][04746] Num frames 8300... +[2024-09-22 06:12:38,510][04746] Num frames 8400... +[2024-09-22 06:12:38,687][04746] Num frames 8500... +[2024-09-22 06:12:38,841][04746] Num frames 8600... +[2024-09-22 06:12:38,995][04746] Num frames 8700... +[2024-09-22 06:12:39,166][04746] Num frames 8800... +[2024-09-22 06:12:39,329][04746] Num frames 8900... +[2024-09-22 06:12:39,484][04746] Num frames 9000... +[2024-09-22 06:12:39,649][04746] Num frames 9100... +[2024-09-22 06:12:39,835][04746] Num frames 9200... +[2024-09-22 06:12:40,038][04746] Avg episode rewards: #0: 29.365, true rewards: #0: 11.615 +[2024-09-22 06:12:40,039][04746] Avg episode reward: 29.365, avg true_objective: 11.615 +[2024-09-22 06:12:40,056][04746] Num frames 9300... +[2024-09-22 06:12:40,215][04746] Num frames 9400... 
+[2024-09-22 06:12:40,393][04746] Num frames 9500... +[2024-09-22 06:12:40,553][04746] Num frames 9600... +[2024-09-22 06:12:40,718][04746] Num frames 9700... +[2024-09-22 06:12:40,899][04746] Num frames 9800... +[2024-09-22 06:12:41,058][04746] Num frames 9900... +[2024-09-22 06:12:41,221][04746] Num frames 10000... +[2024-09-22 06:12:41,410][04746] Num frames 10100... +[2024-09-22 06:12:41,585][04746] Num frames 10200... +[2024-09-22 06:12:41,750][04746] Num frames 10300... +[2024-09-22 06:12:41,924][04746] Num frames 10400... +[2024-09-22 06:12:42,091][04746] Num frames 10500... +[2024-09-22 06:12:42,271][04746] Num frames 10600... +[2024-09-22 06:12:42,437][04746] Num frames 10700... +[2024-09-22 06:12:42,602][04746] Num frames 10800... +[2024-09-22 06:12:42,789][04746] Num frames 10900... +[2024-09-22 06:12:42,954][04746] Num frames 11000... +[2024-09-22 06:12:43,120][04746] Num frames 11100... +[2024-09-22 06:12:43,279][04746] Num frames 11200... +[2024-09-22 06:12:43,456][04746] Num frames 11300... +[2024-09-22 06:12:43,668][04746] Avg episode rewards: #0: 32.546, true rewards: #0: 12.658 +[2024-09-22 06:12:43,670][04746] Avg episode reward: 32.546, avg true_objective: 12.658 +[2024-09-22 06:12:43,687][04746] Num frames 11400... +[2024-09-22 06:12:43,854][04746] Num frames 11500... +[2024-09-22 06:12:44,023][04746] Num frames 11600... +[2024-09-22 06:12:44,199][04746] Num frames 11700... +[2024-09-22 06:12:44,351][04746] Num frames 11800... +[2024-09-22 06:12:44,507][04746] Num frames 11900... +[2024-09-22 06:12:44,682][04746] Num frames 12000... +[2024-09-22 06:12:44,829][04746] Num frames 12100... +[2024-09-22 06:12:44,985][04746] Num frames 12200... +[2024-09-22 06:12:45,160][04746] Num frames 12300... +[2024-09-22 06:12:45,324][04746] Num frames 12400... +[2024-09-22 06:12:45,483][04746] Num frames 12500... +[2024-09-22 06:12:45,642][04746] Num frames 12600... +[2024-09-22 06:12:45,814][04746] Num frames 12700... +[2024-09-22 06:12:45,972][04746] Num frames 12800... +[2024-09-22 06:12:46,141][04746] Num frames 12900... +[2024-09-22 06:12:46,299][04746] Num frames 13000... +[2024-09-22 06:12:46,460][04746] Avg episode rewards: #0: 33.657, true rewards: #0: 13.057 +[2024-09-22 06:12:46,462][04746] Avg episode reward: 33.657, avg true_objective: 13.057 +[2024-09-22 06:13:23,381][04746] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2024-09-22 06:14:25,599][04746] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2024-09-22 06:14:25,601][04746] Overriding arg 'num_workers' with value 1 passed from command line +[2024-09-22 06:14:25,603][04746] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-09-22 06:14:25,604][04746] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-09-22 06:14:25,607][04746] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-09-22 06:14:25,608][04746] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-09-22 06:14:25,609][04746] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-09-22 06:14:25,611][04746] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-09-22 06:14:25,612][04746] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
+[2024-09-22 06:14:25,613][04746] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-09-22 06:14:25,614][04746] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-09-22 06:14:25,616][04746] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-09-22 06:14:25,617][04746] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-09-22 06:14:25,618][04746] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-09-22 06:14:25,623][04746] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-09-22 06:14:25,650][04746] RunningMeanStd input shape: (3, 72, 128) +[2024-09-22 06:14:25,653][04746] RunningMeanStd input shape: (1,) +[2024-09-22 06:14:25,667][04746] ConvEncoder: input_channels=3 +[2024-09-22 06:14:25,717][04746] Conv encoder output size: 512 +[2024-09-22 06:14:25,719][04746] Policy head output size: 512 +[2024-09-22 06:14:25,742][04746] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... +[2024-09-22 06:14:26,239][04746] Num frames 100... +[2024-09-22 06:14:26,399][04746] Num frames 200... +[2024-09-22 06:14:26,561][04746] Num frames 300... +[2024-09-22 06:14:26,716][04746] Num frames 400... +[2024-09-22 06:14:26,878][04746] Num frames 500... +[2024-09-22 06:14:27,036][04746] Num frames 600... +[2024-09-22 06:14:27,212][04746] Num frames 700... +[2024-09-22 06:14:27,367][04746] Num frames 800... +[2024-09-22 06:14:27,531][04746] Num frames 900... +[2024-09-22 06:14:27,697][04746] Num frames 1000... +[2024-09-22 06:14:27,867][04746] Num frames 1100... +[2024-09-22 06:14:28,071][04746] Avg episode rewards: #0: 28.950, true rewards: #0: 11.950 +[2024-09-22 06:14:28,073][04746] Avg episode reward: 28.950, avg true_objective: 11.950 +[2024-09-22 06:14:28,084][04746] Num frames 1200... +[2024-09-22 06:14:28,257][04746] Num frames 1300... +[2024-09-22 06:14:28,436][04746] Num frames 1400... +[2024-09-22 06:14:28,602][04746] Num frames 1500... +[2024-09-22 06:14:28,767][04746] Num frames 1600... +[2024-09-22 06:14:28,901][04746] Num frames 1700... +[2024-09-22 06:14:29,041][04746] Num frames 1800... +[2024-09-22 06:14:29,148][04746] Avg episode rewards: #0: 19.645, true rewards: #0: 9.145 +[2024-09-22 06:14:29,150][04746] Avg episode reward: 19.645, avg true_objective: 9.145 +[2024-09-22 06:14:29,250][04746] Num frames 1900... +[2024-09-22 06:14:29,382][04746] Num frames 2000... +[2024-09-22 06:14:29,514][04746] Num frames 2100... +[2024-09-22 06:14:29,648][04746] Num frames 2200... +[2024-09-22 06:14:29,785][04746] Num frames 2300... +[2024-09-22 06:14:29,923][04746] Num frames 2400... +[2024-09-22 06:14:30,060][04746] Num frames 2500... +[2024-09-22 06:14:30,190][04746] Num frames 2600... +[2024-09-22 06:14:30,352][04746] Num frames 2700... +[2024-09-22 06:14:30,483][04746] Num frames 2800... +[2024-09-22 06:14:30,614][04746] Num frames 2900... +[2024-09-22 06:14:30,745][04746] Num frames 3000... +[2024-09-22 06:14:30,875][04746] Num frames 3100... +[2024-09-22 06:14:30,945][04746] Avg episode rewards: #0: 22.364, true rewards: #0: 10.363 +[2024-09-22 06:14:30,947][04746] Avg episode reward: 22.364, avg true_objective: 10.363 +[2024-09-22 06:14:31,071][04746] Num frames 3200... +[2024-09-22 06:14:31,216][04746] Num frames 3300... +[2024-09-22 06:14:31,359][04746] Num frames 3400... 
+[2024-09-22 06:14:31,489][04746] Num frames 3500... +[2024-09-22 06:14:31,623][04746] Num frames 3600... +[2024-09-22 06:14:31,781][04746] Num frames 3700... +[2024-09-22 06:14:31,942][04746] Num frames 3800... +[2024-09-22 06:14:32,079][04746] Num frames 3900... +[2024-09-22 06:14:32,210][04746] Num frames 4000... +[2024-09-22 06:14:32,339][04746] Num frames 4100... +[2024-09-22 06:14:32,471][04746] Num frames 4200... +[2024-09-22 06:14:32,602][04746] Num frames 4300... +[2024-09-22 06:14:32,730][04746] Num frames 4400... +[2024-09-22 06:14:32,861][04746] Num frames 4500... +[2024-09-22 06:14:32,996][04746] Num frames 4600... +[2024-09-22 06:14:33,126][04746] Num frames 4700... +[2024-09-22 06:14:33,262][04746] Num frames 4800... +[2024-09-22 06:14:33,444][04746] Num frames 4900... +[2024-09-22 06:14:33,599][04746] Num frames 5000... +[2024-09-22 06:14:33,729][04746] Num frames 5100... +[2024-09-22 06:14:33,879][04746] Avg episode rewards: #0: 29.935, true rewards: #0: 12.935 +[2024-09-22 06:14:33,881][04746] Avg episode reward: 29.935, avg true_objective: 12.935 +[2024-09-22 06:14:33,922][04746] Num frames 5200... +[2024-09-22 06:14:34,053][04746] Num frames 5300... +[2024-09-22 06:14:34,184][04746] Num frames 5400... +[2024-09-22 06:14:34,314][04746] Num frames 5500... +[2024-09-22 06:14:34,446][04746] Num frames 5600... +[2024-09-22 06:14:34,576][04746] Num frames 5700... +[2024-09-22 06:14:34,703][04746] Num frames 5800... +[2024-09-22 06:14:34,830][04746] Num frames 5900... +[2024-09-22 06:14:34,964][04746] Num frames 6000... +[2024-09-22 06:14:35,094][04746] Num frames 6100... +[2024-09-22 06:14:35,219][04746] Num frames 6200... +[2024-09-22 06:14:35,347][04746] Num frames 6300... +[2024-09-22 06:14:35,482][04746] Num frames 6400... +[2024-09-22 06:14:35,619][04746] Num frames 6500... +[2024-09-22 06:14:35,747][04746] Num frames 6600... +[2024-09-22 06:14:35,879][04746] Num frames 6700... +[2024-09-22 06:14:36,016][04746] Num frames 6800... +[2024-09-22 06:14:36,150][04746] Num frames 6900... +[2024-09-22 06:14:36,279][04746] Num frames 7000... +[2024-09-22 06:14:36,407][04746] Num frames 7100... +[2024-09-22 06:14:36,538][04746] Num frames 7200... +[2024-09-22 06:14:36,690][04746] Avg episode rewards: #0: 36.148, true rewards: #0: 14.548 +[2024-09-22 06:14:36,693][04746] Avg episode reward: 36.148, avg true_objective: 14.548 +[2024-09-22 06:14:36,728][04746] Num frames 7300... +[2024-09-22 06:14:36,853][04746] Num frames 7400... +[2024-09-22 06:14:36,991][04746] Num frames 7500... +[2024-09-22 06:14:37,126][04746] Num frames 7600... +[2024-09-22 06:14:37,254][04746] Num frames 7700... +[2024-09-22 06:14:37,381][04746] Num frames 7800... +[2024-09-22 06:14:37,508][04746] Num frames 7900... +[2024-09-22 06:14:37,635][04746] Avg episode rewards: #0: 32.596, true rewards: #0: 13.263 +[2024-09-22 06:14:37,637][04746] Avg episode reward: 32.596, avg true_objective: 13.263 +[2024-09-22 06:14:37,692][04746] Num frames 8000... +[2024-09-22 06:14:37,818][04746] Num frames 8100... +[2024-09-22 06:14:37,945][04746] Num frames 8200... +[2024-09-22 06:14:38,070][04746] Num frames 8300... +[2024-09-22 06:14:38,200][04746] Num frames 8400... +[2024-09-22 06:14:38,326][04746] Num frames 8500... +[2024-09-22 06:14:38,458][04746] Num frames 8600... +[2024-09-22 06:14:38,590][04746] Num frames 8700... +[2024-09-22 06:14:38,725][04746] Num frames 8800... +[2024-09-22 06:14:38,854][04746] Num frames 8900... +[2024-09-22 06:14:38,992][04746] Num frames 9000... 
+[2024-09-22 06:14:39,124][04746] Num frames 9100...
+[2024-09-22 06:14:39,263][04746] Num frames 9200...
+[2024-09-22 06:14:39,399][04746] Num frames 9300...
+[2024-09-22 06:17:24,401][11734] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-09-22 06:17:24,404][11734] Rollout worker 0 uses device cpu
+[2024-09-22 06:17:24,406][11734] Rollout worker 1 uses device cpu
+[2024-09-22 06:17:24,407][11734] Rollout worker 2 uses device cpu
+[2024-09-22 06:17:24,409][11734] Rollout worker 3 uses device cpu
+[2024-09-22 06:17:24,410][11734] Rollout worker 4 uses device cpu
+[2024-09-22 06:17:24,411][11734] Rollout worker 5 uses device cpu
+[2024-09-22 06:17:24,413][11734] Rollout worker 6 uses device cpu
+[2024-09-22 06:17:24,414][11734] Rollout worker 7 uses device cpu
+[2024-09-22 06:17:24,484][11734] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:17:24,486][11734] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-22 06:17:24,526][11734] Starting all processes...
+[2024-09-22 06:17:24,528][11734] Starting process learner_proc0
+[2024-09-22 06:17:24,938][11734] Starting all processes...
+[2024-09-22 06:17:24,947][11734] Starting process inference_proc0-0
+[2024-09-22 06:17:24,947][11734] Starting process rollout_proc0
+[2024-09-22 06:17:24,947][11734] Starting process rollout_proc1
+[2024-09-22 06:17:24,949][11734] Starting process rollout_proc2
+[2024-09-22 06:17:24,950][11734] Starting process rollout_proc3
+[2024-09-22 06:17:24,955][11734] Starting process rollout_proc4
+[2024-09-22 06:17:24,970][11734] Starting process rollout_proc5
+[2024-09-22 06:17:24,973][11734] Starting process rollout_proc6
+[2024-09-22 06:17:24,982][11734] Starting process rollout_proc7
+[2024-09-22 06:17:29,220][12533] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:17:29,221][12533] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-22 06:17:29,251][12533] Num visible devices: 1
+[2024-09-22 06:17:29,320][12536] Worker 2 uses CPU cores [2]
+[2024-09-22 06:17:29,320][12520] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:17:29,321][12520] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-22 06:17:29,341][12520] Num visible devices: 1
+[2024-09-22 06:17:29,352][12537] Worker 3 uses CPU cores [3]
+[2024-09-22 06:17:29,391][12520] Starting seed is not provided
+[2024-09-22 06:17:29,392][12520] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:17:29,393][12520] Initializing actor-critic model on device cuda:0
+[2024-09-22 06:17:29,395][12520] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-22 06:17:29,397][12520] RunningMeanStd input shape: (1,)
+[2024-09-22 06:17:29,421][12520] ConvEncoder: input_channels=3
+[2024-09-22 06:17:29,426][12540] Worker 7 uses CPU cores [7]
+[2024-09-22 06:17:29,502][12538] Worker 4 uses CPU cores [4]
+[2024-09-22 06:17:29,503][12535] Worker 1 uses CPU cores [1]
+[2024-09-22 06:17:29,515][12534] Worker 0 uses CPU cores [0]
+[2024-09-22 06:17:29,589][12539] Worker 6 uses CPU cores [6]
+[2024-09-22 06:17:29,592][12520] Conv encoder output size: 512
+[2024-09-22 06:17:29,593][12520] Policy head output size: 512
+[2024-09-22 06:17:29,609][12520] Created Actor Critic model with architecture:
+[2024-09-22 06:17:29,609][12520] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-09-22 06:17:29,872][12520] Using optimizer
+[2024-09-22 06:17:29,892][12541] Worker 5 uses CPU cores [5]
+[2024-09-22 06:17:30,625][12520] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2024-09-22 06:17:30,665][12520] Loading model from checkpoint
+[2024-09-22 06:17:30,667][12520] Loaded experiment state at self.train_step=2443, self.env_steps=10006528
+[2024-09-22 06:17:30,667][12520] Initialized policy 0 weights for model version 2443
+[2024-09-22 06:17:30,672][12520] LearnerWorker_p0 finished initialization!
+[2024-09-22 06:17:30,672][12520] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:17:30,860][12533] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-22 06:17:30,861][12533] RunningMeanStd input shape: (1,)
+[2024-09-22 06:17:30,875][12533] ConvEncoder: input_channels=3
+[2024-09-22 06:17:31,004][12533] Conv encoder output size: 512
+[2024-09-22 06:17:31,005][12533] Policy head output size: 512
+[2024-09-22 06:17:31,064][11734] Inference worker 0-0 is ready!
+[2024-09-22 06:17:31,066][11734] All inference workers are ready! Signal rollout workers to start!
+[2024-09-22 06:17:31,122][12540] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:17:31,124][12534] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:17:31,124][12539] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:17:31,126][12538] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:17:31,126][12541] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:17:31,126][12535] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:17:31,134][12536] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:17:31,140][12537] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:17:31,458][12540] Decorrelating experience for 0 frames...
+[2024-09-22 06:17:31,459][12539] Decorrelating experience for 0 frames...
+[2024-09-22 06:17:31,459][12541] Decorrelating experience for 0 frames...
+[2024-09-22 06:17:31,619][12538] Decorrelating experience for 0 frames...
+[2024-09-22 06:17:31,621][12534] Decorrelating experience for 0 frames...
+[2024-09-22 06:17:31,724][12539] Decorrelating experience for 32 frames...
+[2024-09-22 06:17:31,870][12541] Decorrelating experience for 32 frames...
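
The model printout above shows the convnet_simple encoder as three Conv2d+ELU blocks followed by one Linear+ELU, ending at the reported "Conv encoder output size: 512". Assuming the usual convnet_simple filter spec of 32/64/128 channels with kernels 8/4/3 and strides 4/2/2 (the printout confirms the layer types only, so the exact sizes here are an assumption), the 3x72x128 observation shrinks to 128x3x6 = 2304 features, which the single MLP layer (encoder_conv_mlp_layers=[512] in the config below) would map to 512. A PyTorch sketch to check the shape arithmetic:

    import torch
    from torch import nn

    # Assumed convnet_simple stack: (out_channels, kernel, stride) = (32,8,4), (64,4,2), (128,3,2).
    conv = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
        nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
    )
    with torch.no_grad():
        feat = conv(torch.zeros(1, 3, 72, 128))  # observation shape from the log
    print(feat.shape)                # torch.Size([1, 128, 3, 6])
    print(feat.flatten(1).shape[1])  # 2304, feeding a Linear(2304, 512)
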
+[2024-09-22 06:17:31,914][12535] Decorrelating experience for 0 frames... +[2024-09-22 06:17:32,052][12534] Decorrelating experience for 32 frames... +[2024-09-22 06:17:32,052][12538] Decorrelating experience for 32 frames... +[2024-09-22 06:17:32,149][12539] Decorrelating experience for 64 frames... +[2024-09-22 06:17:32,266][12535] Decorrelating experience for 32 frames... +[2024-09-22 06:17:32,310][12536] Decorrelating experience for 0 frames... +[2024-09-22 06:17:32,349][12540] Decorrelating experience for 32 frames... +[2024-09-22 06:17:32,511][12541] Decorrelating experience for 64 frames... +[2024-09-22 06:17:32,621][12538] Decorrelating experience for 64 frames... +[2024-09-22 06:17:32,622][12534] Decorrelating experience for 64 frames... +[2024-09-22 06:17:32,703][12536] Decorrelating experience for 32 frames... +[2024-09-22 06:17:32,767][12537] Decorrelating experience for 0 frames... +[2024-09-22 06:17:32,785][12539] Decorrelating experience for 96 frames... +[2024-09-22 06:17:32,810][12535] Decorrelating experience for 64 frames... +[2024-09-22 06:17:32,862][12540] Decorrelating experience for 64 frames... +[2024-09-22 06:17:32,943][12541] Decorrelating experience for 96 frames... +[2024-09-22 06:17:33,088][12534] Decorrelating experience for 96 frames... +[2024-09-22 06:17:33,157][12538] Decorrelating experience for 96 frames... +[2024-09-22 06:17:33,253][12537] Decorrelating experience for 32 frames... +[2024-09-22 06:17:33,257][12535] Decorrelating experience for 96 frames... +[2024-09-22 06:17:33,298][12536] Decorrelating experience for 64 frames... +[2024-09-22 06:17:33,368][12540] Decorrelating experience for 96 frames... +[2024-09-22 06:17:33,615][12536] Decorrelating experience for 96 frames... +[2024-09-22 06:17:33,697][12537] Decorrelating experience for 64 frames... +[2024-09-22 06:17:34,075][12537] Decorrelating experience for 96 frames... +[2024-09-22 06:17:34,420][11734] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10006528. Throughput: 0: nan. Samples: 772. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-09-22 06:17:34,422][11734] Avg episode reward: [(0, '1.866')] +[2024-09-22 06:17:34,924][12520] Signal inference workers to stop experience collection... +[2024-09-22 06:17:34,935][12533] InferenceWorker_p0-w0: stopping experience collection +[2024-09-22 06:17:37,581][12520] Signal inference workers to resume experience collection... +[2024-09-22 06:17:37,582][12520] Stopping Batcher_0... +[2024-09-22 06:17:37,582][12520] Loop batcher_evt_loop terminating... +[2024-09-22 06:17:37,593][11734] Component Batcher_0 stopped! +[2024-09-22 06:17:37,604][12533] Weights refcount: 2 0 +[2024-09-22 06:17:37,611][12533] Stopping InferenceWorker_p0-w0... +[2024-09-22 06:17:37,610][11734] Component InferenceWorker_p0-w0 stopped! +[2024-09-22 06:17:37,613][12533] Loop inference_proc0-0_evt_loop terminating... +[2024-09-22 06:17:37,629][12534] Stopping RolloutWorker_w0... +[2024-09-22 06:17:37,630][12534] Loop rollout_proc0_evt_loop terminating... +[2024-09-22 06:17:37,630][11734] Component RolloutWorker_w0 stopped! +[2024-09-22 06:17:37,632][12540] Stopping RolloutWorker_w7... +[2024-09-22 06:17:37,634][12540] Loop rollout_proc7_evt_loop terminating... +[2024-09-22 06:17:37,634][12536] Stopping RolloutWorker_w2... +[2024-09-22 06:17:37,633][11734] Component RolloutWorker_w7 stopped! +[2024-09-22 06:17:37,635][12536] Loop rollout_proc2_evt_loop terminating... +[2024-09-22 06:17:37,635][11734] Component RolloutWorker_w2 stopped! 
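
The "Decorrelating experience for 0/32/64/96 frames..." lines above show the rollout workers being staggered at startup: each environment split steps ahead by a different multiple of the rollout length (rollout=32 in the config) before regular collection starts, so the workers do not all hit trajectory boundaries in lockstep. A toy illustration of the offsets only; Sample Factory's actual scheduling is more involved:

    ROLLOUT = 32  # rollout length from the config dump further down

    # Hypothetical staggering: one extra rollout of warmup per decorrelation stage.
    offsets = [stage * ROLLOUT for stage in range(4)]
    print(offsets)  # [0, 32, 64, 96], the frame counts seen in the log
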
+[2024-09-22 06:17:37,639][12539] Stopping RolloutWorker_w6... +[2024-09-22 06:17:37,639][12539] Loop rollout_proc6_evt_loop terminating... +[2024-09-22 06:17:37,641][11734] Component RolloutWorker_w6 stopped! +[2024-09-22 06:17:37,642][12537] Stopping RolloutWorker_w3... +[2024-09-22 06:17:37,644][12537] Loop rollout_proc3_evt_loop terminating... +[2024-09-22 06:17:37,645][11734] Component RolloutWorker_w3 stopped! +[2024-09-22 06:17:37,670][12541] Stopping RolloutWorker_w5... +[2024-09-22 06:17:37,670][11734] Component RolloutWorker_w5 stopped! +[2024-09-22 06:17:37,672][12541] Loop rollout_proc5_evt_loop terminating... +[2024-09-22 06:17:37,725][12535] Stopping RolloutWorker_w1... +[2024-09-22 06:17:37,726][12535] Loop rollout_proc1_evt_loop terminating... +[2024-09-22 06:17:37,725][11734] Component RolloutWorker_w1 stopped! +[2024-09-22 06:17:37,848][11734] Component RolloutWorker_w4 stopped! +[2024-09-22 06:17:37,847][12538] Stopping RolloutWorker_w4... +[2024-09-22 06:17:37,853][12538] Loop rollout_proc4_evt_loop terminating... +[2024-09-22 06:17:38,311][12520] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth... +[2024-09-22 06:17:38,415][12520] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002311_9465856.pth +[2024-09-22 06:17:38,430][12520] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth... +[2024-09-22 06:17:38,536][12520] Stopping LearnerWorker_p0... +[2024-09-22 06:17:38,537][12520] Loop learner_proc0_evt_loop terminating... +[2024-09-22 06:17:38,536][11734] Component LearnerWorker_p0 stopped! +[2024-09-22 06:17:38,540][11734] Waiting for process learner_proc0 to stop... +[2024-09-22 06:17:39,375][11734] Waiting for process inference_proc0-0 to join... +[2024-09-22 06:17:39,377][11734] Waiting for process rollout_proc0 to join... +[2024-09-22 06:17:39,380][11734] Waiting for process rollout_proc1 to join... +[2024-09-22 06:17:39,382][11734] Waiting for process rollout_proc2 to join... +[2024-09-22 06:17:39,384][11734] Waiting for process rollout_proc3 to join... +[2024-09-22 06:17:39,387][11734] Waiting for process rollout_proc4 to join... +[2024-09-22 06:17:39,389][11734] Waiting for process rollout_proc5 to join... +[2024-09-22 06:17:39,391][11734] Waiting for process rollout_proc6 to join... +[2024-09-22 06:17:39,394][11734] Waiting for process rollout_proc7 to join... 
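
The save/remove pair above is checkpoint rotation: with keep_checkpoints=2 (see the config dump below), writing checkpoint_000002445_10014720.pth (version 2445 * 4096 = 10,014,720 frames, the same naming arithmetic as before) evicts the oldest surviving file, checkpoint_000002311_9465856.pth. A minimal keep-newest-N sketch; this is our helper, not Sample Factory's implementation, and it relies on the zero-padded names making lexicographic order chronological:

    from pathlib import Path

    def rotate_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
        """Delete all but the `keep` newest checkpoint_*.pth files."""
        ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
        for old in ckpts[:-keep]:
            old.unlink()

    # rotate_checkpoints("/content/train_dir/default_experiment/checkpoint_p0", keep=2)
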
+[2024-09-22 06:17:39,396][11734] Batcher 0 profile tree view:
+batching: 0.0272, releasing_batches: 0.0006
+[2024-09-22 06:17:39,399][11734] InferenceWorker_p0-w0 profile tree view:
+update_model: 0.0150
+wait_policy: 0.0001
+  wait_policy_total: 1.7781
+one_step: 0.0072
+  handle_policy_step: 1.9771
+    deserialize: 0.0586, stack: 0.0103, obs_to_device_normalize: 0.3707, forward: 1.2566, send_messages: 0.1221
+    prepare_outputs: 0.1041
+      to_cpu: 0.0496
+[2024-09-22 06:17:39,402][11734] Learner 0 profile tree view:
+misc: 0.0000, prepare_batch: 1.5120
+train: 2.3528
+  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0005, kl_divergence: 0.0125, after_optimizer: 0.0419
+  calculate_losses: 0.9267
+    losses_init: 0.0000, forward_head: 0.3230, bptt_initial: 0.5264, tail: 0.0353, advantages_returns: 0.0010, losses: 0.0367
+    bptt: 0.0038
+      bptt_forward_core: 0.0036
+  update: 1.3702
+    clip: 0.0448
+[2024-09-22 06:17:39,405][11734] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.0398, env_step: 0.4080, overhead: 0.0266, complete_rollouts: 0.0008
+save_policy_outputs: 0.0351
+  split_output_tensors: 0.0139
+[2024-09-22 06:17:39,406][11734] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.0010, enqueue_policy_requests: 0.0449, env_step: 0.4532, overhead: 0.0323, complete_rollouts: 0.0014
+save_policy_outputs: 0.0410
+  split_output_tensors: 0.0159
+[2024-09-22 06:17:39,409][11734] Loop Runner_EvtLoop terminating...
+[2024-09-22 06:17:39,410][11734] Runner profile tree view:
+main_loop: 14.8845
+[2024-09-22 06:17:39,412][11734] Collected {0: 10014720}, FPS: 550.4
+[2024-09-22 06:17:39,433][11734] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-22 06:17:39,434][11734] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-22 06:17:39,435][11734] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-22 06:17:39,438][11734] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-22 06:17:39,439][11734] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-22 06:17:39,440][11734] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-22 06:17:39,442][11734] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-22 06:17:39,443][11734] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-22 06:17:39,444][11734] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-09-22 06:17:39,446][11734] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-09-22 06:17:39,447][11734] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-22 06:17:39,449][11734] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-22 06:17:39,450][11734] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-22 06:17:39,452][11734] Adding new argument 'enjoy_script'=None that is not in the saved config file!
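
This second training session shuts down almost immediately because the resumed learner is already past its target: it loads at env_steps=10006528, above train_for_env_steps=10000000, so it signals the workers to stop right after the first collection round. Accordingly, only 10,014,720 - 10,006,528 = 8,192 new frames (two updates of 4,096) were gathered, and 8,192 frames over the 14.8845 s main_loop gives the reported session FPS of 550.4. The check in two lines:

    new_frames = 10_014_720 - 10_006_528   # env_steps after vs. before this session
    print(new_frames, round(new_frames / 14.8845, 1))  # 8192 550.4
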
+[2024-09-22 06:17:39,454][11734] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-09-22 06:17:39,486][11734] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-09-22 06:17:39,490][11734] RunningMeanStd input shape: (3, 72, 128) +[2024-09-22 06:17:39,493][11734] RunningMeanStd input shape: (1,) +[2024-09-22 06:17:39,510][11734] ConvEncoder: input_channels=3 +[2024-09-22 06:17:39,647][11734] Conv encoder output size: 512 +[2024-09-22 06:17:39,649][11734] Policy head output size: 512 +[2024-09-22 06:17:39,942][11734] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth... +[2024-09-22 06:17:40,875][11734] Num frames 100... +[2024-09-22 06:17:41,034][11734] Num frames 200... +[2024-09-22 06:17:41,208][11734] Num frames 300... +[2024-09-22 06:17:41,367][11734] Num frames 400... +[2024-09-22 06:17:41,530][11734] Num frames 500... +[2024-09-22 06:17:41,691][11734] Num frames 600... +[2024-09-22 06:17:41,865][11734] Num frames 700... +[2024-09-22 06:17:42,034][11734] Num frames 800... +[2024-09-22 06:17:42,189][11734] Num frames 900... +[2024-09-22 06:17:42,365][11734] Num frames 1000... +[2024-09-22 06:17:42,537][11734] Num frames 1100... +[2024-09-22 06:17:42,708][11734] Num frames 1200... +[2024-09-22 06:17:42,874][11734] Num frames 1300... +[2024-09-22 06:17:43,048][11734] Num frames 1400... +[2024-09-22 06:17:43,217][11734] Num frames 1500... +[2024-09-22 06:17:43,376][11734] Num frames 1600... +[2024-09-22 06:17:43,525][11734] Num frames 1700... +[2024-09-22 06:17:43,702][11734] Num frames 1800... +[2024-09-22 06:17:43,842][11734] Num frames 1900... +[2024-09-22 06:17:43,987][11734] Num frames 2000... +[2024-09-22 06:17:44,100][11734] Avg episode rewards: #0: 54.329, true rewards: #0: 20.330 +[2024-09-22 06:17:44,102][11734] Avg episode reward: 54.329, avg true_objective: 20.330 +[2024-09-22 06:17:44,216][11734] Num frames 2100... +[2024-09-22 06:17:44,378][11734] Num frames 2200... +[2024-09-22 06:17:44,535][11734] Num frames 2300... +[2024-09-22 06:17:44,683][11734] Num frames 2400... +[2024-09-22 06:17:44,870][11734] Avg episode rewards: #0: 29.904, true rewards: #0: 12.405 +[2024-09-22 06:17:44,872][11734] Avg episode reward: 29.904, avg true_objective: 12.405 +[2024-09-22 06:17:44,909][11734] Num frames 2500... +[2024-09-22 06:17:45,059][11734] Num frames 2600... +[2024-09-22 06:17:45,215][11734] Num frames 2700... +[2024-09-22 06:17:45,387][11734] Num frames 2800... +[2024-09-22 06:17:45,547][11734] Num frames 2900... +[2024-09-22 06:17:45,693][11734] Num frames 3000... +[2024-09-22 06:17:45,849][11734] Num frames 3100... +[2024-09-22 06:17:46,021][11734] Num frames 3200... +[2024-09-22 06:17:46,169][11734] Num frames 3300... +[2024-09-22 06:17:46,317][11734] Num frames 3400... +[2024-09-22 06:17:46,489][11734] Num frames 3500... +[2024-09-22 06:17:46,640][11734] Num frames 3600... +[2024-09-22 06:17:46,776][11734] Num frames 3700... +[2024-09-22 06:17:46,959][11734] Num frames 3800... +[2024-09-22 06:17:47,123][11734] Num frames 3900... +[2024-09-22 06:17:47,298][11734] Num frames 4000... +[2024-09-22 06:17:47,471][11734] Num frames 4100... +[2024-09-22 06:17:47,656][11734] Num frames 4200... +[2024-09-22 06:17:47,811][11734] Num frames 4300... +[2024-09-22 06:17:47,973][11734] Num frames 4400... +[2024-09-22 06:17:48,146][11734] Num frames 4500... 
+[2024-09-22 06:17:48,361][11734] Avg episode rewards: #0: 40.936, true rewards: #0: 15.270 +[2024-09-22 06:17:48,364][11734] Avg episode reward: 40.936, avg true_objective: 15.270 +[2024-09-22 06:17:48,400][11734] Num frames 4600... +[2024-09-22 06:17:48,575][11734] Num frames 4700... +[2024-09-22 06:17:48,748][11734] Num frames 4800... +[2024-09-22 06:17:48,937][11734] Num frames 4900... +[2024-09-22 06:17:49,090][11734] Num frames 5000... +[2024-09-22 06:17:49,238][11734] Num frames 5100... +[2024-09-22 06:17:49,398][11734] Num frames 5200... +[2024-09-22 06:17:49,568][11734] Num frames 5300... +[2024-09-22 06:17:49,722][11734] Num frames 5400... +[2024-09-22 06:17:49,869][11734] Num frames 5500... +[2024-09-22 06:17:50,043][11734] Num frames 5600... +[2024-09-22 06:17:50,213][11734] Avg episode rewards: #0: 36.672, true rewards: #0: 14.173 +[2024-09-22 06:17:50,215][11734] Avg episode reward: 36.672, avg true_objective: 14.173 +[2024-09-22 06:17:50,265][11734] Num frames 5700... +[2024-09-22 06:17:50,420][11734] Num frames 5800... +[2024-09-22 06:17:50,573][11734] Num frames 5900... +[2024-09-22 06:17:50,739][11734] Num frames 6000... +[2024-09-22 06:17:50,899][11734] Num frames 6100... +[2024-09-22 06:17:51,053][11734] Num frames 6200... +[2024-09-22 06:17:51,205][11734] Num frames 6300... +[2024-09-22 06:17:51,381][11734] Num frames 6400... +[2024-09-22 06:17:51,535][11734] Num frames 6500... +[2024-09-22 06:17:51,684][11734] Num frames 6600... +[2024-09-22 06:17:51,860][11734] Num frames 6700... +[2024-09-22 06:17:52,030][11734] Num frames 6800... +[2024-09-22 06:17:52,189][11734] Num frames 6900... +[2024-09-22 06:17:52,342][11734] Num frames 7000... +[2024-09-22 06:17:52,514][11734] Num frames 7100... +[2024-09-22 06:17:52,657][11734] Num frames 7200... +[2024-09-22 06:17:52,722][11734] Avg episode rewards: #0: 37.209, true rewards: #0: 14.410 +[2024-09-22 06:17:52,724][11734] Avg episode reward: 37.209, avg true_objective: 14.410 +[2024-09-22 06:17:52,867][11734] Num frames 7300... +[2024-09-22 06:17:53,040][11734] Num frames 7400... +[2024-09-22 06:17:53,184][11734] Num frames 7500... +[2024-09-22 06:17:53,357][11734] Num frames 7600... +[2024-09-22 06:17:53,515][11734] Num frames 7700... +[2024-09-22 06:17:53,682][11734] Num frames 7800... +[2024-09-22 06:17:53,839][11734] Num frames 7900... +[2024-09-22 06:17:53,941][11734] Avg episode rewards: #0: 34.208, true rewards: #0: 13.208 +[2024-09-22 06:17:53,943][11734] Avg episode reward: 34.208, avg true_objective: 13.208 +[2024-09-22 06:17:54,063][11734] Num frames 8000... +[2024-09-22 06:17:54,241][11734] Num frames 8100... +[2024-09-22 06:17:54,386][11734] Num frames 8200... +[2024-09-22 06:17:54,554][11734] Num frames 8300... +[2024-09-22 06:17:54,725][11734] Num frames 8400... +[2024-09-22 06:17:54,892][11734] Num frames 8500... +[2024-09-22 06:17:55,045][11734] Num frames 8600... +[2024-09-22 06:17:55,208][11734] Num frames 8700... +[2024-09-22 06:17:55,387][11734] Num frames 8800... +[2024-09-22 06:17:55,541][11734] Num frames 8900... +[2024-09-22 06:17:55,707][11734] Num frames 9000... +[2024-09-22 06:17:55,875][11734] Num frames 9100... +[2024-09-22 06:17:56,028][11734] Num frames 9200... +[2024-09-22 06:17:56,185][11734] Num frames 9300... +[2024-09-22 06:17:56,367][11734] Num frames 9400... +[2024-09-22 06:17:56,535][11734] Num frames 9500... +[2024-09-22 06:17:56,692][11734] Num frames 9600... +[2024-09-22 06:17:56,844][11734] Num frames 9700... +[2024-09-22 06:17:57,016][11734] Num frames 9800... 
+[2024-09-22 06:17:57,179][11734] Num frames 9900... +[2024-09-22 06:17:57,344][11734] Num frames 10000... +[2024-09-22 06:17:57,446][11734] Avg episode rewards: #0: 37.607, true rewards: #0: 14.321 +[2024-09-22 06:17:57,448][11734] Avg episode reward: 37.607, avg true_objective: 14.321 +[2024-09-22 06:17:57,563][11734] Num frames 10100... +[2024-09-22 06:17:57,749][11734] Num frames 10200... +[2024-09-22 06:17:57,907][11734] Num frames 10300... +[2024-09-22 06:17:58,065][11734] Num frames 10400... +[2024-09-22 06:17:58,228][11734] Num frames 10500... +[2024-09-22 06:17:58,400][11734] Num frames 10600... +[2024-09-22 06:17:58,559][11734] Num frames 10700... +[2024-09-22 06:17:58,731][11734] Num frames 10800... +[2024-09-22 06:17:58,890][11734] Num frames 10900... +[2024-09-22 06:17:59,071][11734] Num frames 11000... +[2024-09-22 06:17:59,239][11734] Num frames 11100... +[2024-09-22 06:17:59,394][11734] Num frames 11200... +[2024-09-22 06:17:59,587][11734] Num frames 11300... +[2024-09-22 06:17:59,757][11734] Num frames 11400... +[2024-09-22 06:17:59,924][11734] Num frames 11500... +[2024-09-22 06:18:00,108][11734] Num frames 11600... +[2024-09-22 06:18:00,295][11734] Num frames 11700... +[2024-09-22 06:18:00,474][11734] Num frames 11800... +[2024-09-22 06:18:00,643][11734] Num frames 11900... +[2024-09-22 06:18:00,829][11734] Num frames 12000... +[2024-09-22 06:18:00,980][11734] Num frames 12100... +[2024-09-22 06:18:01,084][11734] Avg episode rewards: #0: 40.406, true rewards: #0: 15.156 +[2024-09-22 06:18:01,086][11734] Avg episode reward: 40.406, avg true_objective: 15.156 +[2024-09-22 06:18:01,199][11734] Num frames 12200... +[2024-09-22 06:18:01,366][11734] Num frames 12300... +[2024-09-22 06:18:01,519][11734] Num frames 12400... +[2024-09-22 06:18:01,673][11734] Num frames 12500... +[2024-09-22 06:18:01,836][11734] Num frames 12600... +[2024-09-22 06:18:02,021][11734] Num frames 12700... +[2024-09-22 06:18:02,176][11734] Num frames 12800... +[2024-09-22 06:18:02,321][11734] Num frames 12900... +[2024-09-22 06:18:02,447][11734] Avg episode rewards: #0: 37.938, true rewards: #0: 14.383 +[2024-09-22 06:18:02,449][11734] Avg episode reward: 37.938, avg true_objective: 14.383 +[2024-09-22 06:18:02,543][11734] Num frames 13000... +[2024-09-22 06:18:02,691][11734] Num frames 13100... +[2024-09-22 06:18:02,860][11734] Num frames 13200... +[2024-09-22 06:18:03,019][11734] Num frames 13300... +[2024-09-22 06:18:03,185][11734] Avg episode rewards: #0: 34.760, true rewards: #0: 13.361 +[2024-09-22 06:18:03,187][11734] Avg episode reward: 34.760, avg true_objective: 13.361 +[2024-09-22 06:18:40,287][11734] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2024-09-22 06:18:40,316][11734] Environment doom_basic already registered, overwriting... +[2024-09-22 06:18:40,319][11734] Environment doom_two_colors_easy already registered, overwriting... +[2024-09-22 06:18:40,320][11734] Environment doom_two_colors_hard already registered, overwriting... +[2024-09-22 06:18:40,321][11734] Environment doom_dm already registered, overwriting... +[2024-09-22 06:18:40,322][11734] Environment doom_dwango5 already registered, overwriting... +[2024-09-22 06:18:40,325][11734] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-09-22 06:18:40,326][11734] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-09-22 06:18:40,327][11734] Environment doom_my_way_home already registered, overwriting... 
+[2024-09-22 06:18:40,328][11734] Environment doom_deadly_corridor already registered, overwriting...
+[2024-09-22 06:18:40,331][11734] Environment doom_defend_the_center already registered, overwriting...
+[2024-09-22 06:18:40,332][11734] Environment doom_defend_the_line already registered, overwriting...
+[2024-09-22 06:18:40,334][11734] Environment doom_health_gathering already registered, overwriting...
+[2024-09-22 06:18:40,336][11734] Environment doom_health_gathering_supreme already registered, overwriting...
+[2024-09-22 06:18:40,337][11734] Environment doom_battle already registered, overwriting...
+[2024-09-22 06:18:40,338][11734] Environment doom_battle2 already registered, overwriting...
+[2024-09-22 06:18:40,339][11734] Environment doom_duel_bots already registered, overwriting...
+[2024-09-22 06:18:40,342][11734] Environment doom_deathmatch_bots already registered, overwriting...
+[2024-09-22 06:18:40,343][11734] Environment doom_duel already registered, overwriting...
+[2024-09-22 06:18:40,345][11734] Environment doom_deathmatch_full already registered, overwriting...
+[2024-09-22 06:18:40,346][11734] Environment doom_benchmark already registered, overwriting...
+[2024-09-22 06:18:40,348][11734] register_encoder_factory:
+[2024-09-22 06:18:40,360][11734] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-22 06:18:40,366][11734] Experiment dir /content/train_dir/default_experiment already exists!
+[2024-09-22 06:18:40,367][11734] Resuming existing experiment from /content/train_dir/default_experiment...
+[2024-09-22 06:18:40,369][11734] Weights and Biases integration disabled
+[2024-09-22 06:18:40,372][11734] Environment var CUDA_VISIBLE_DEVICES is 0
+
+[2024-09-22 06:18:42,971][11734] Starting experiment with the following configuration:
+help=False
+algo=APPO
+env=doom_health_gathering_supreme
+experiment=default_experiment
+train_dir=/content/train_dir
+restart_behavior=resume
+device=gpu
+seed=None
+num_policies=1
+async_rl=True
+serial_mode=False
+batched_sampling=False
+num_batches_to_accumulate=2
+worker_num_splits=2
+policy_workers_per_policy=1
+max_policy_lag=1000
+num_workers=8
+num_envs_per_worker=4
+batch_size=1024
+num_batches_per_epoch=1
+num_epochs=1
+rollout=32
+recurrence=32
+shuffle_minibatches=False
+gamma=0.99
+reward_scale=1.0
+reward_clip=1000.0
+value_bootstrap=False
+normalize_returns=True
+exploration_loss_coeff=0.001
+value_loss_coeff=0.5
+kl_loss_coeff=0.0
+exploration_loss=symmetric_kl
+gae_lambda=0.95
+ppo_clip_ratio=0.1
+ppo_clip_value=0.2
+with_vtrace=False
+vtrace_rho=1.0
+vtrace_c=1.0
+optimizer=adam
+adam_eps=1e-06
+adam_beta1=0.9
+adam_beta2=0.999
+max_grad_norm=4.0
+learning_rate=0.0001
+lr_schedule=constant
+lr_schedule_kl_threshold=0.008
+lr_adaptive_min=1e-06
+lr_adaptive_max=0.01
+obs_subtract_mean=0.0
+obs_scale=255.0
+normalize_input=True
+normalize_input_keys=None
+decorrelate_experience_max_seconds=0
+decorrelate_envs_on_one_worker=True
+actor_worker_gpus=[]
+set_workers_cpu_affinity=True
+force_envs_single_thread=False
+default_niceness=0
+log_to_file=True
+experiment_summaries_interval=10
+flush_summaries_interval=30
+stats_avg=100
+summaries_use_frameskip=True
+heartbeat_interval=20
+heartbeat_reporting_interval=600
+train_for_env_steps=10000000
+train_for_seconds=10000000000
+save_every_sec=120
+keep_checkpoints=2
+load_checkpoint_kind=latest
+save_milestones_sec=-1
+save_best_every_sec=5
+save_best_metric=reward
+save_best_after=100000
+benchmark=False
+encoder_mlp_layers=[512, 512]
+encoder_conv_architecture=convnet_simple
+encoder_conv_mlp_layers=[512]
+use_rnn=True
+rnn_size=512
+rnn_type=gru
+rnn_num_layers=1
+decoder_mlp_layers=[]
+nonlinearity=elu
+policy_initialization=orthogonal
+policy_init_gain=1.0
+actor_critic_share_weights=True
+adaptive_stddev=True
+continuous_tanh_scale=0.0
+initial_stddev=1.0
+use_env_info_cache=False
+env_gpu_actions=False
+env_gpu_observations=True
+env_frameskip=4
+env_framestack=1
+pixel_format=CHW
+use_record_episode_statistics=False
+with_wandb=False
+wandb_user=None
+wandb_project=sample_factory
+wandb_group=None
+wandb_job_type=SF
+wandb_tags=[]
+with_pbt=False
+pbt_mix_policies_in_one_env=True
+pbt_period_env_steps=5000000
+pbt_start_mutation=20000000
+pbt_replace_fraction=0.3
+pbt_mutation_rate=0.15
+pbt_replace_reward_gap=0.1
+pbt_replace_reward_gap_absolute=1e-06
+pbt_optimize_gamma=False
+pbt_target_objective=true_objective
+pbt_perturb_min=1.1
+pbt_perturb_max=1.5
+num_agents=-1
+num_humans=0
+num_bots=-1
+start_bot_difficulty=None
+timelimit=None
+res_w=128
+res_h=72
+wide_aspect_ratio=False
+eval_env_frameskip=1
+fps=35
+command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=10000000
+cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 10000000}
+git_hash=unknown
+git_repo_name=not a git repository
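The command_line recorded in the configuration dump above is sufficient to reproduce this run. Below is a minimal launch sketch, assuming sample-factory 2.x with its bundled VizDoom examples; the sf_examples.vizdoom.train_vizdoom entry point is an assumption, and only the four flags are taken from this log.

# Hedged sketch: relaunch the run described by the config dump above.
# Assumes sample-factory 2.x; sf_examples.vizdoom.train_vizdoom.main is an
# assumed entry point -- only the four flags below come from the log.
import sys

from sf_examples.vizdoom.train_vizdoom import main

if __name__ == "__main__":
    sys.argv[1:] = [
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=10000000",
    ]
    sys.exit(main())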
+[2024-09-22 06:18:42,973][11734] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2024-09-22 06:18:42,976][11734] Rollout worker 0 uses device cpu
+[2024-09-22 06:18:42,977][11734] Rollout worker 1 uses device cpu
+[2024-09-22 06:18:42,978][11734] Rollout worker 2 uses device cpu
+[2024-09-22 06:18:42,981][11734] Rollout worker 3 uses device cpu
+[2024-09-22 06:18:42,982][11734] Rollout worker 4 uses device cpu
+[2024-09-22 06:18:42,983][11734] Rollout worker 5 uses device cpu
+[2024-09-22 06:18:42,986][11734] Rollout worker 6 uses device cpu
+[2024-09-22 06:18:42,987][11734] Rollout worker 7 uses device cpu
+[2024-09-22 06:18:43,028][11734] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:18:43,030][11734] InferenceWorker_p0-w0: min num requests: 2
+[2024-09-22 06:18:43,068][11734] Starting all processes...
+[2024-09-22 06:18:43,070][11734] Starting process learner_proc0
+[2024-09-22 06:18:43,118][11734] Starting all processes...
+[2024-09-22 06:18:43,123][11734] Starting process inference_proc0-0
+[2024-09-22 06:18:43,125][11734] Starting process rollout_proc0
+[2024-09-22 06:18:43,126][11734] Starting process rollout_proc1
+[2024-09-22 06:18:43,129][11734] Starting process rollout_proc2
+[2024-09-22 06:18:43,131][11734] Starting process rollout_proc3
+[2024-09-22 06:18:43,133][11734] Starting process rollout_proc4
+[2024-09-22 06:18:43,140][11734] Starting process rollout_proc5
+[2024-09-22 06:18:43,141][11734] Starting process rollout_proc6
+[2024-09-22 06:18:43,155][11734] Starting process rollout_proc7
+[2024-09-22 06:18:47,451][13282] Worker 4 uses CPU cores [4]
+[2024-09-22 06:18:47,487][13280] Worker 2 uses CPU cores [2]
+[2024-09-22 06:18:47,594][13281] Worker 3 uses CPU cores [3]
+[2024-09-22 06:18:47,605][13284] Worker 6 uses CPU cores [6]
+[2024-09-22 06:18:47,701][13285] Worker 7 uses CPU cores [7]
+[2024-09-22 06:18:47,722][13283] Worker 5 uses CPU cores [5]
+[2024-09-22 06:18:47,769][13260] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:18:47,769][13260] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2024-09-22 06:18:47,788][13260] Num visible devices: 1
+[2024-09-22 06:18:47,794][13279] Worker 1 uses CPU cores [1]
+[2024-09-22 06:18:47,809][13260] Starting seed is not provided
+[2024-09-22 06:18:47,809][13260] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:18:47,809][13260] Initializing actor-critic model on device cuda:0
+[2024-09-22 06:18:47,810][13260] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-22 06:18:47,811][13260] RunningMeanStd input shape: (1,)
+[2024-09-22 06:18:47,832][13260] ConvEncoder: input_channels=3
+[2024-09-22 06:18:47,846][13274] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:18:47,846][13274] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2024-09-22 06:18:47,862][13274] Num visible devices: 1
+[2024-09-22 06:18:47,918][13278] Worker 0 uses CPU cores [0]
+[2024-09-22 06:18:47,975][13260] Conv encoder output size: 512
+[2024-09-22 06:18:47,975][13260] Policy head output size: 512
+[2024-09-22 06:18:47,992][13260] Created Actor Critic model with architecture:
+[2024-09-22 06:18:47,992][13260] ActorCriticSharedWeights(
+  (obs_normalizer): ObservationNormalizer(
+    (running_mean_std): RunningMeanStdDictInPlace(
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
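The encoder sizes printed above can be checked by hand. The sketch below assumes the three Conv2d layers follow sample-factory's convnet_simple spec (32 filters 8x8 stride 4, 64 filters 4x4 stride 2, 128 filters 3x3 stride 2); that spec is an assumption, while the input shape (3, 72, 128) and the 512-dim encoder output are taken from this log.

# Hedged sketch: verify "Conv encoder output size: 512" for a (3, 72, 128) input.
# Conv shapes without padding: 72x128 -> 17x31 -> 7x14 -> 3x6, so the flattened
# feature is 128 * 3 * 6 = 2304, projected to 512 by the final Linear+ELU.
import torch
from torch import nn

encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
    nn.Flatten(),
    nn.Linear(128 * 3 * 6, 512), nn.ELU(),
)
print(encoder(torch.zeros(1, 3, 72, 128)).shape)  # torch.Size([1, 512])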
+[2024-09-22 06:18:48,219][13260] Using optimizer
+[2024-09-22 06:18:48,952][13260] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002445_10014720.pth...
+[2024-09-22 06:18:48,997][13260] Loading model from checkpoint
+[2024-09-22 06:18:48,999][13260] Loaded experiment state at self.train_step=2445, self.env_steps=10014720
+[2024-09-22 06:18:49,000][13260] Initialized policy 0 weights for model version 2445
+[2024-09-22 06:18:49,004][13260] LearnerWorker_p0 finished initialization!
+[2024-09-22 06:18:49,004][13260] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-09-22 06:18:49,191][13274] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-22 06:18:49,193][13274] RunningMeanStd input shape: (1,)
+[2024-09-22 06:18:49,207][13274] ConvEncoder: input_channels=3
+[2024-09-22 06:18:49,327][13274] Conv encoder output size: 512
+[2024-09-22 06:18:49,327][13274] Policy head output size: 512
+[2024-09-22 06:18:49,388][11734] Inference worker 0-0 is ready!
+[2024-09-22 06:18:49,390][11734] All inference workers are ready! Signal rollout workers to start!
+[2024-09-22 06:18:49,443][13278] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:18:49,446][13284] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:18:49,446][13279] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:18:49,449][13281] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:18:49,450][13283] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:18:49,451][13285] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:18:49,461][13280] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:18:49,508][13282] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-09-22 06:18:49,858][13281] Decorrelating experience for 0 frames...
+[2024-09-22 06:18:49,938][13284] Decorrelating experience for 0 frames...
+[2024-09-22 06:18:49,938][13278] Decorrelating experience for 0 frames...
+[2024-09-22 06:18:49,957][13279] Decorrelating experience for 0 frames...
+[2024-09-22 06:18:49,957][13283] Decorrelating experience for 0 frames...
+[2024-09-22 06:18:49,958][13280] Decorrelating experience for 0 frames...
+[2024-09-22 06:18:49,978][13282] Decorrelating experience for 0 frames...
+[2024-09-22 06:18:50,224][13278] Decorrelating experience for 32 frames...
+[2024-09-22 06:18:50,232][13283] Decorrelating experience for 32 frames...
+[2024-09-22 06:18:50,240][13280] Decorrelating experience for 32 frames...
+[2024-09-22 06:18:50,373][11734] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10014720. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-22 06:18:50,377][13281] Decorrelating experience for 32 frames...
+[2024-09-22 06:18:50,407][13285] Decorrelating experience for 0 frames...
+[2024-09-22 06:18:50,544][13282] Decorrelating experience for 32 frames...
+[2024-09-22 06:18:50,583][13283] Decorrelating experience for 64 frames...
+[2024-09-22 06:18:50,706][13285] Decorrelating experience for 32 frames...
+[2024-09-22 06:18:50,833][13280] Decorrelating experience for 64 frames...
+[2024-09-22 06:18:50,853][13279] Decorrelating experience for 32 frames...
+[2024-09-22 06:18:50,868][13284] Decorrelating experience for 32 frames...
+[2024-09-22 06:18:51,025][13278] Decorrelating experience for 64 frames...
+[2024-09-22 06:18:51,063][13282] Decorrelating experience for 64 frames...
+[2024-09-22 06:18:51,222][13285] Decorrelating experience for 64 frames...
+[2024-09-22 06:18:51,262][13280] Decorrelating experience for 96 frames...
+[2024-09-22 06:18:51,307][13279] Decorrelating experience for 64 frames...
+[2024-09-22 06:18:51,314][13281] Decorrelating experience for 64 frames...
+[2024-09-22 06:18:51,366][13282] Decorrelating experience for 96 frames...
+[2024-09-22 06:18:51,460][13284] Decorrelating experience for 64 frames...
+[2024-09-22 06:18:51,469][13283] Decorrelating experience for 96 frames...
+[2024-09-22 06:18:51,703][13279] Decorrelating experience for 96 frames...
+[2024-09-22 06:18:51,756][13284] Decorrelating experience for 96 frames...
+[2024-09-22 06:18:51,759][13281] Decorrelating experience for 96 frames...
+[2024-09-22 06:18:51,876][13278] Decorrelating experience for 96 frames...
+[2024-09-22 06:18:51,905][13285] Decorrelating experience for 96 frames...
+[2024-09-22 06:18:53,021][13260] Signal inference workers to stop experience collection...
+[2024-09-22 06:18:53,028][13274] InferenceWorker_p0-w0: stopping experience collection
+[2024-09-22 06:18:55,372][11734] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 10014720. Throughput: 0: 84.8. Samples: 424. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-09-22 06:18:55,375][11734] Avg episode reward: [(0, '2.298')]
+[2024-09-22 06:18:55,699][13260] Signal inference workers to resume experience collection...
+[2024-09-22 06:18:55,700][13260] Stopping Batcher_0...
+[2024-09-22 06:18:55,701][13260] Loop batcher_evt_loop terminating...
+[2024-09-22 06:18:55,713][11734] Component Batcher_0 stopped!
+[2024-09-22 06:18:55,723][13274] Weights refcount: 2 0
+[2024-09-22 06:18:55,725][13274] Stopping InferenceWorker_p0-w0...
+[2024-09-22 06:18:55,726][13274] Loop inference_proc0-0_evt_loop terminating...
+[2024-09-22 06:18:55,726][11734] Component InferenceWorker_p0-w0 stopped!
+[2024-09-22 06:18:55,748][13281] Stopping RolloutWorker_w3...
+[2024-09-22 06:18:55,749][13281] Loop rollout_proc3_evt_loop terminating...
+[2024-09-22 06:18:55,749][13278] Stopping RolloutWorker_w0...
+[2024-09-22 06:18:55,750][13278] Loop rollout_proc0_evt_loop terminating...
+[2024-09-22 06:18:55,749][11734] Component RolloutWorker_w3 stopped!
+[2024-09-22 06:18:55,751][11734] Component RolloutWorker_w0 stopped!
+[2024-09-22 06:18:55,753][13279] Stopping RolloutWorker_w1...
+[2024-09-22 06:18:55,753][13279] Loop rollout_proc1_evt_loop terminating...
+[2024-09-22 06:18:55,755][13280] Stopping RolloutWorker_w2...
+[2024-09-22 06:18:55,755][13280] Loop rollout_proc2_evt_loop terminating...
+[2024-09-22 06:18:55,752][13282] Stopping RolloutWorker_w4...
+[2024-09-22 06:18:55,756][13282] Loop rollout_proc4_evt_loop terminating...
+[2024-09-22 06:18:55,759][13284] Stopping RolloutWorker_w6...
+[2024-09-22 06:18:55,754][11734] Component RolloutWorker_w4 stopped!
+[2024-09-22 06:18:55,760][13284] Loop rollout_proc6_evt_loop terminating...
+[2024-09-22 06:18:55,760][11734] Component RolloutWorker_w1 stopped!
+[2024-09-22 06:18:55,761][11734] Component RolloutWorker_w2 stopped!
+[2024-09-22 06:18:55,764][11734] Component RolloutWorker_w6 stopped!
+[2024-09-22 06:18:55,838][13283] Stopping RolloutWorker_w5...
+[2024-09-22 06:18:55,839][13283] Loop rollout_proc5_evt_loop terminating...
+[2024-09-22 06:18:55,838][11734] Component RolloutWorker_w5 stopped!
+[2024-09-22 06:18:55,956][13285] Stopping RolloutWorker_w7...
+[2024-09-22 06:18:55,957][13285] Loop rollout_proc7_evt_loop terminating...
+[2024-09-22 06:18:55,957][11734] Component RolloutWorker_w7 stopped!
+[2024-09-22 06:18:56,413][13260] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
+[2024-09-22 06:18:56,518][13260] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth
+[2024-09-22 06:18:56,532][13260] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
+[2024-09-22 06:18:56,656][13260] Stopping LearnerWorker_p0...
+[2024-09-22 06:18:56,656][13260] Loop learner_proc0_evt_loop terminating...
+[2024-09-22 06:18:56,656][11734] Component LearnerWorker_p0 stopped!
+[2024-09-22 06:18:56,658][11734] Waiting for process learner_proc0 to stop...
+[2024-09-22 06:18:57,551][11734] Waiting for process inference_proc0-0 to join...
+[2024-09-22 06:18:57,555][11734] Waiting for process rollout_proc0 to join...
+[2024-09-22 06:18:57,557][11734] Waiting for process rollout_proc1 to join...
+[2024-09-22 06:18:57,560][11734] Waiting for process rollout_proc2 to join...
+[2024-09-22 06:18:57,562][11734] Waiting for process rollout_proc3 to join...
+[2024-09-22 06:18:57,564][11734] Waiting for process rollout_proc4 to join...
+[2024-09-22 06:18:57,567][11734] Waiting for process rollout_proc5 to join...
+[2024-09-22 06:18:57,569][11734] Waiting for process rollout_proc6 to join...
+[2024-09-22 06:18:57,572][11734] Waiting for process rollout_proc7 to join...
+[2024-09-22 06:18:57,574][11734] Batcher 0 profile tree view:
+batching: 0.0279, releasing_batches: 0.0006
+[2024-09-22 06:18:57,575][11734] InferenceWorker_p0-w0 profile tree view:
+update_model: 0.0078
+wait_policy: 0.0001
+  wait_policy_total: 1.9361
+one_step: 0.0729
+  handle_policy_step: 1.6765
+    deserialize: 0.0400, stack: 0.0044, obs_to_device_normalize: 0.3127, forward: 1.1167, send_messages: 0.1127
+    prepare_outputs: 0.0566
+      to_cpu: 0.0275
+[2024-09-22 06:18:57,577][11734] Learner 0 profile tree view:
+misc: 0.0000, prepare_batch: 1.4660
+train: 2.3107
+  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0121, after_optimizer: 0.0508
+  calculate_losses: 0.8574
+    losses_init: 0.0000, forward_head: 0.2691, bptt_initial: 0.5100, tail: 0.0361, advantages_returns: 0.0010, losses: 0.0366
+    bptt: 0.0041
+      bptt_forward_core: 0.0039
+  update: 1.3889
+    clip: 0.0478
+[2024-09-22 06:18:57,579][11734] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.0006, enqueue_policy_requests: 0.0289, env_step: 0.2787, overhead: 0.0196, complete_rollouts: 0.0007
+save_policy_outputs: 0.0275
+  split_output_tensors: 0.0102
+[2024-09-22 06:18:57,581][11734] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.0006, enqueue_policy_requests: 0.0295, env_step: 0.2849, overhead: 0.0200, complete_rollouts: 0.0007
+save_policy_outputs: 0.0273
+  split_output_tensors: 0.0109
+[2024-09-22 06:18:57,584][11734] Loop Runner_EvtLoop terminating...
+[2024-09-22 06:18:57,586][11734] Runner profile tree view:
+main_loop: 14.5178
+[2024-09-22 06:18:57,587][11734] Collected {0: 10022912}, FPS: 564.3
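The final throughput figure follows from numbers printed earlier in this session: the learner resumed at env_steps=10014720 and finished at 10022912, i.e. 8192 frames over the 14.5178 s main loop.

# Reproducing "FPS: 564.3" from this log's own numbers.
env_steps_start = 10014720   # "Loaded experiment state at ... env_steps=10014720"
env_steps_end = 10022912     # "Collected {0: 10022912}"
main_loop_seconds = 14.5178  # "Runner profile tree view: main_loop: 14.5178"
print(f"{(env_steps_end - env_steps_start) / main_loop_seconds:.1f}")  # 564.3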
+[2024-09-22 06:19:03,049][11734] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-22 06:19:03,050][11734] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-22 06:19:03,052][11734] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-22 06:19:03,054][11734] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-22 06:19:03,055][11734] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-22 06:19:03,058][11734] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-22 06:19:03,059][11734] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-22 06:19:03,060][11734] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-22 06:19:03,061][11734] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-09-22 06:19:03,066][11734] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-09-22 06:19:03,067][11734] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-22 06:19:03,070][11734] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-22 06:19:03,071][11734] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-22 06:19:03,072][11734] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-22 06:19:03,076][11734] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-22 06:19:03,102][11734] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-22 06:19:03,105][11734] RunningMeanStd input shape: (1,)
+[2024-09-22 06:19:03,119][11734] ConvEncoder: input_channels=3
+[2024-09-22 06:19:03,167][11734] Conv encoder output size: 512
+[2024-09-22 06:19:03,170][11734] Policy head output size: 512
+[2024-09-22 06:19:03,192][11734] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
+[2024-09-22 06:19:03,692][11734] Num frames 100...
+[2024-09-22 06:19:03,869][11734] Num frames 200...
+[2024-09-22 06:19:04,041][11734] Num frames 300...
+[2024-09-22 06:19:04,201][11734] Num frames 400...
+[2024-09-22 06:19:04,362][11734] Num frames 500...
+[2024-09-22 06:19:04,543][11734] Num frames 600...
+[2024-09-22 06:19:04,707][11734] Num frames 700...
+[2024-09-22 06:19:04,862][11734] Num frames 800...
+[2024-09-22 06:19:05,010][11734] Num frames 900...
+[2024-09-22 06:19:05,165][11734] Avg episode rewards: #0: 20.600, true rewards: #0: 9.600
+[2024-09-22 06:19:05,167][11734] Avg episode reward: 20.600, avg true_objective: 9.600
+[2024-09-22 06:19:05,227][11734] Num frames 1000...
+[2024-09-22 06:19:05,361][11734] Num frames 1100...
+[2024-09-22 06:19:05,507][11734] Num frames 1200...
+[2024-09-22 06:19:05,640][11734] Avg episode rewards: #0: 13.240, true rewards: #0: 6.240
+[2024-09-22 06:19:05,643][11734] Avg episode reward: 13.240, avg true_objective: 6.240
+[2024-09-22 06:19:05,721][11734] Num frames 1300...
+[2024-09-22 06:19:05,869][11734] Num frames 1400...
+[2024-09-22 06:19:06,016][11734] Num frames 1500...
+[2024-09-22 06:19:06,183][11734] Num frames 1600...
+[2024-09-22 06:19:06,341][11734] Num frames 1700...
+[2024-09-22 06:19:06,500][11734] Num frames 1800...
+[2024-09-22 06:19:06,663][11734] Num frames 1900...
+[2024-09-22 06:19:06,821][11734] Num frames 2000...
+[2024-09-22 06:19:06,965][11734] Num frames 2100...
+[2024-09-22 06:19:07,115][11734] Num frames 2200...
+[2024-09-22 06:19:07,302][11734] Num frames 2300...
+[2024-09-22 06:19:07,457][11734] Num frames 2400...
+[2024-09-22 06:19:07,631][11734] Num frames 2500...
+[2024-09-22 06:19:07,787][11734] Num frames 2600...
+[2024-09-22 06:19:07,949][11734] Num frames 2700...
+[2024-09-22 06:19:08,090][11734] Num frames 2800...
+[2024-09-22 06:19:08,258][11734] Num frames 2900...
+[2024-09-22 06:19:08,442][11734] Num frames 3000...
+[2024-09-22 06:19:08,603][11734] Num frames 3100...
+[2024-09-22 06:19:08,755][11734] Num frames 3200...
+[2024-09-22 06:19:08,907][11734] Avg episode rewards: #0: 28.213, true rewards: #0: 10.880
+[2024-09-22 06:19:08,908][11734] Avg episode reward: 28.213, avg true_objective: 10.880
+[2024-09-22 06:19:08,968][11734] Num frames 3300...
+[2024-09-22 06:19:09,139][11734] Num frames 3400...
+[2024-09-22 06:19:09,305][11734] Num frames 3500...
+[2024-09-22 06:19:09,472][11734] Num frames 3600...
+[2024-09-22 06:19:09,637][11734] Num frames 3700...
+[2024-09-22 06:19:09,818][11734] Num frames 3800...
+[2024-09-22 06:19:09,973][11734] Num frames 3900...
+[2024-09-22 06:19:10,128][11734] Num frames 4000...
+[2024-09-22 06:19:10,281][11734] Num frames 4100...
+[2024-09-22 06:19:10,462][11734] Num frames 4200...
+[2024-09-22 06:19:10,633][11734] Num frames 4300...
+[2024-09-22 06:19:10,796][11734] Num frames 4400...
+[2024-09-22 06:19:10,956][11734] Num frames 4500...
+[2024-09-22 06:19:11,119][11734] Num frames 4600...
+[2024-09-22 06:19:11,264][11734] Num frames 4700...
+[2024-09-22 06:19:11,418][11734] Num frames 4800...
+[2024-09-22 06:19:11,608][11734] Num frames 4900...
+[2024-09-22 06:19:11,775][11734] Num frames 5000...
+[2024-09-22 06:19:11,925][11734] Num frames 5100...
+[2024-09-22 06:19:12,103][11734] Num frames 5200...
+[2024-09-22 06:19:12,245][11734] Avg episode rewards: #0: 34.384, true rewards: #0: 13.135
+[2024-09-22 06:19:12,247][11734] Avg episode reward: 34.384, avg true_objective: 13.135
+[2024-09-22 06:19:12,324][11734] Num frames 5300...
+[2024-09-22 06:19:12,481][11734] Num frames 5400...
+[2024-09-22 06:19:12,659][11734] Num frames 5500...
+[2024-09-22 06:19:12,820][11734] Num frames 5600...
+[2024-09-22 06:19:13,009][11734] Num frames 5700...
+[2024-09-22 06:19:13,209][11734] Num frames 5800...
+[2024-09-22 06:19:13,384][11734] Num frames 5900...
+[2024-09-22 06:19:13,549][11734] Num frames 6000...
+[2024-09-22 06:19:13,703][11734] Num frames 6100...
+[2024-09-22 06:19:13,876][11734] Num frames 6200...
+[2024-09-22 06:19:14,040][11734] Num frames 6300...
+[2024-09-22 06:19:14,199][11734] Num frames 6400...
+[2024-09-22 06:19:14,363][11734] Num frames 6500...
+[2024-09-22 06:19:14,533][11734] Num frames 6600...
+[2024-09-22 06:19:14,702][11734] Num frames 6700...
+[2024-09-22 06:19:14,859][11734] Num frames 6800...
+[2024-09-22 06:19:14,923][11734] Avg episode rewards: #0: 36.407, true rewards: #0: 13.608
+[2024-09-22 06:19:14,925][11734] Avg episode reward: 36.407, avg true_objective: 13.608
+[2024-09-22 06:19:15,075][11734] Num frames 6900...
+[2024-09-22 06:19:15,243][11734] Num frames 7000...
+[2024-09-22 06:19:15,402][11734] Num frames 7100...
+[2024-09-22 06:19:15,560][11734] Num frames 7200...
+[2024-09-22 06:19:15,724][11734] Num frames 7300...
+[2024-09-22 06:19:15,885][11734] Num frames 7400...
+[2024-09-22 06:19:16,037][11734] Num frames 7500...
+[2024-09-22 06:19:16,107][11734] Avg episode rewards: #0: 33.013, true rewards: #0: 12.513
+[2024-09-22 06:19:16,109][11734] Avg episode reward: 33.013, avg true_objective: 12.513
+[2024-09-22 06:19:16,246][11734] Num frames 7600...
+[2024-09-22 06:19:16,411][11734] Num frames 7700...
+[2024-09-22 06:19:16,545][11734] Num frames 7800...
+[2024-09-22 06:19:16,699][11734] Num frames 7900...
+[2024-09-22 06:19:16,858][11734] Num frames 8000...
+[2024-09-22 06:19:17,023][11734] Num frames 8100...
+[2024-09-22 06:19:17,104][11734] Avg episode rewards: #0: 29.880, true rewards: #0: 11.594
+[2024-09-22 06:19:17,106][11734] Avg episode reward: 29.880, avg true_objective: 11.594
+[2024-09-22 06:19:17,228][11734] Num frames 8200...
+[2024-09-22 06:19:17,390][11734] Num frames 8300...
+[2024-09-22 06:19:17,565][11734] Num frames 8400...
+[2024-09-22 06:19:17,734][11734] Num frames 8500...
+[2024-09-22 06:19:17,881][11734] Num frames 8600...
+[2024-09-22 06:19:18,053][11734] Num frames 8700...
+[2024-09-22 06:19:18,219][11734] Num frames 8800...
+[2024-09-22 06:19:18,361][11734] Num frames 8900...
+[2024-09-22 06:19:18,528][11734] Num frames 9000...
+[2024-09-22 06:19:18,701][11734] Num frames 9100...
+[2024-09-22 06:19:18,851][11734] Num frames 9200...
+[2024-09-22 06:19:19,007][11734] Num frames 9300...
+[2024-09-22 06:19:19,183][11734] Num frames 9400...
+[2024-09-22 06:19:19,348][11734] Num frames 9500...
+[2024-09-22 06:19:19,502][11734] Num frames 9600...
+[2024-09-22 06:19:19,654][11734] Num frames 9700...
+[2024-09-22 06:19:19,836][11734] Num frames 9800...
+[2024-09-22 06:19:19,986][11734] Num frames 9900...
+[2024-09-22 06:19:20,145][11734] Num frames 10000...
+[2024-09-22 06:19:20,313][11734] Num frames 10100...
+[2024-09-22 06:19:20,488][11734] Num frames 10200...
+[2024-09-22 06:19:20,563][11734] Avg episode rewards: #0: 33.016, true rewards: #0: 12.766
+[2024-09-22 06:19:20,565][11734] Avg episode reward: 33.016, avg true_objective: 12.766
+[2024-09-22 06:19:20,694][11734] Num frames 10300...
+[2024-09-22 06:19:20,851][11734] Num frames 10400...
+[2024-09-22 06:19:21,017][11734] Num frames 10500...
+[2024-09-22 06:19:21,188][11734] Num frames 10600...
+[2024-09-22 06:19:21,347][11734] Num frames 10700...
+[2024-09-22 06:19:21,507][11734] Num frames 10800...
+[2024-09-22 06:19:21,687][11734] Num frames 10900...
+[2024-09-22 06:19:21,845][11734] Num frames 11000...
+[2024-09-22 06:19:21,991][11734] Num frames 11100...
+[2024-09-22 06:19:22,162][11734] Num frames 11200...
+[2024-09-22 06:19:22,313][11734] Num frames 11300...
+[2024-09-22 06:19:22,467][11734] Num frames 11400...
+[2024-09-22 06:19:22,623][11734] Num frames 11500...
+[2024-09-22 06:19:22,726][11734] Avg episode rewards: #0: 33.694, true rewards: #0: 12.806
+[2024-09-22 06:19:22,727][11734] Avg episode reward: 33.694, avg true_objective: 12.806
+[2024-09-22 06:19:22,847][11734] Num frames 11600...
+[2024-09-22 06:19:23,010][11734] Num frames 11700...
+[2024-09-22 06:19:23,161][11734] Num frames 11800...
+[2024-09-22 06:19:23,332][11734] Avg episode rewards: #0: 30.777, true rewards: #0: 11.877
+[2024-09-22 06:19:23,334][11734] Avg episode reward: 30.777, avg true_objective: 11.877
+[2024-09-22 06:19:56,200][11734] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
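The "Avg episode rewards" entries in the evaluation pass above are running means over completed episodes, so individual episode returns can be recovered from consecutive averages.

# Recovering a per-episode return from consecutive running averages
# (values taken from the first two episodes of the evaluation pass above).
avg_after_1, avg_after_2 = 20.600, 13.240
episode_2 = 2 * avg_after_2 - 1 * avg_after_1  # x_n = n*avg_n - (n-1)*avg_(n-1)
print(round(episode_2, 2))  # 5.88: episode 2 scored far below episode 1's 20.6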
+[2024-09-22 06:20:31,391][11734] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-09-22 06:20:31,393][11734] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-09-22 06:20:31,394][11734] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-09-22 06:20:31,395][11734] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-09-22 06:20:31,398][11734] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-09-22 06:20:31,400][11734] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-09-22 06:20:31,401][11734] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-09-22 06:20:31,402][11734] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-09-22 06:20:31,405][11734] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-09-22 06:20:31,405][11734] Adding new argument 'hf_repository'='Vivek-huggingface/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-09-22 06:20:31,407][11734] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-09-22 06:20:31,409][11734] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-09-22 06:20:31,411][11734] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-09-22 06:20:31,412][11734] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-09-22 06:20:31,414][11734] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-09-22 06:20:31,440][11734] RunningMeanStd input shape: (3, 72, 128)
+[2024-09-22 06:20:31,442][11734] RunningMeanStd input shape: (1,)
+[2024-09-22 06:20:31,457][11734] ConvEncoder: input_channels=3
+[2024-09-22 06:20:31,502][11734] Conv encoder output size: 512
+[2024-09-22 06:20:31,504][11734] Policy head output size: 512
+[2024-09-22 06:20:31,532][11734] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002447_10022912.pth...
+[2024-09-22 06:20:32,034][11734] Num frames 100...
+[2024-09-22 06:20:32,187][11734] Num frames 200...
+[2024-09-22 06:20:32,360][11734] Num frames 300...
+[2024-09-22 06:20:32,527][11734] Num frames 400...
+[2024-09-22 06:20:32,684][11734] Num frames 500...
+[2024-09-22 06:20:32,844][11734] Num frames 600...
+[2024-09-22 06:20:33,013][11734] Num frames 700...
+[2024-09-22 06:20:33,168][11734] Num frames 800...
+[2024-09-22 06:20:33,327][11734] Num frames 900...
+[2024-09-22 06:20:33,498][11734] Num frames 1000...
+[2024-09-22 06:20:33,669][11734] Num frames 1100...
+[2024-09-22 06:20:33,839][11734] Num frames 1200...
+[2024-09-22 06:20:34,001][11734] Num frames 1300...
+[2024-09-22 06:20:34,177][11734] Num frames 1400...
+[2024-09-22 06:20:34,354][11734] Avg episode rewards: #0: 33.720, true rewards: #0: 14.720
+[2024-09-22 06:20:34,356][11734] Avg episode reward: 33.720, avg true_objective: 14.720
+[2024-09-22 06:20:34,408][11734] Num frames 1500...
+[2024-09-22 06:20:34,590][11734] Num frames 1600...
+[2024-09-22 06:20:34,754][11734] Num frames 1700...
+[2024-09-22 06:20:34,937][11734] Num frames 1800...
+[2024-09-22 06:20:35,107][11734] Num frames 1900...
+[2024-09-22 06:20:35,252][11734] Num frames 2000...
+[2024-09-22 06:20:35,398][11734] Avg episode rewards: #0: 21.740, true rewards: #0: 10.240
+[2024-09-22 06:20:35,400][11734] Avg episode reward: 21.740, avg true_objective: 10.240
+[2024-09-22 06:20:35,496][11734] Num frames 2100...
+[2024-09-22 06:20:35,668][11734] Num frames 2200...
+[2024-09-22 06:20:35,827][11734] Num frames 2300...
+[2024-09-22 06:20:36,009][11734] Num frames 2400...
+[2024-09-22 06:20:36,163][11734] Num frames 2500...
+[2024-09-22 06:20:36,329][11734] Num frames 2600...
+[2024-09-22 06:20:36,462][11734] Num frames 2700...
+[2024-09-22 06:20:36,591][11734] Num frames 2800...
+[2024-09-22 06:20:36,723][11734] Num frames 2900...
+[2024-09-22 06:20:36,860][11734] Num frames 3000...
+[2024-09-22 06:20:36,996][11734] Num frames 3100...
+[2024-09-22 06:20:37,128][11734] Num frames 3200...
+[2024-09-22 06:20:37,262][11734] Avg episode rewards: #0: 23.523, true rewards: #0: 10.857
+[2024-09-22 06:20:37,263][11734] Avg episode reward: 23.523, avg true_objective: 10.857
+[2024-09-22 06:20:37,322][11734] Num frames 3300...
+[2024-09-22 06:20:37,458][11734] Num frames 3400...
+[2024-09-22 06:20:37,594][11734] Num frames 3500...
+[2024-09-22 06:20:37,725][11734] Num frames 3600...
+[2024-09-22 06:20:37,861][11734] Num frames 3700...
+[2024-09-22 06:20:37,993][11734] Num frames 3800...
+[2024-09-22 06:20:38,131][11734] Avg episode rewards: #0: 20.913, true rewards: #0: 9.662
+[2024-09-22 06:20:38,132][11734] Avg episode reward: 20.913, avg true_objective: 9.662
+[2024-09-22 06:20:38,181][11734] Num frames 3900...
+[2024-09-22 06:20:38,311][11734] Num frames 4000...
+[2024-09-22 06:20:38,444][11734] Num frames 4100...
+[2024-09-22 06:20:38,577][11734] Num frames 4200...
+[2024-09-22 06:20:38,716][11734] Num frames 4300...
+[2024-09-22 06:20:38,854][11734] Num frames 4400...
+[2024-09-22 06:20:38,984][11734] Num frames 4500...
+[2024-09-22 06:20:39,116][11734] Num frames 4600...
+[2024-09-22 06:20:39,255][11734] Num frames 4700...
+[2024-09-22 06:20:39,393][11734] Num frames 4800...
+[2024-09-22 06:20:39,527][11734] Num frames 4900...
+[2024-09-22 06:20:39,667][11734] Num frames 5000...
+[2024-09-22 06:20:39,798][11734] Num frames 5100...
+[2024-09-22 06:20:39,934][11734] Num frames 5200...
+[2024-09-22 06:20:40,073][11734] Num frames 5300...
+[2024-09-22 06:20:40,211][11734] Num frames 5400...
+[2024-09-22 06:20:40,398][11734] Avg episode rewards: #0: 25.394, true rewards: #0: 10.994
+[2024-09-22 06:20:40,400][11734] Avg episode reward: 25.394, avg true_objective: 10.994
+[2024-09-22 06:20:40,406][11734] Num frames 5500...
+[2024-09-22 06:20:40,544][11734] Num frames 5600...
+[2024-09-22 06:20:40,679][11734] Num frames 5700...
+[2024-09-22 06:20:40,809][11734] Num frames 5800...
+[2024-09-22 06:20:40,942][11734] Num frames 5900...
+[2024-09-22 06:20:41,075][11734] Num frames 6000...
+[2024-09-22 06:20:41,205][11734] Num frames 6100...
+[2024-09-22 06:20:41,340][11734] Num frames 6200...
+[2024-09-22 06:20:41,475][11734] Num frames 6300...
+[2024-09-22 06:20:41,608][11734] Num frames 6400...
+[2024-09-22 06:20:41,743][11734] Num frames 6500...
+[2024-09-22 06:20:41,877][11734] Num frames 6600...
+[2024-09-22 06:20:42,043][11734] Avg episode rewards: #0: 25.468, true rewards: #0: 11.135
+[2024-09-22 06:20:42,046][11734] Avg episode reward: 25.468, avg true_objective: 11.135
+[2024-09-22 06:20:42,074][11734] Num frames 6700...
+[2024-09-22 06:20:42,211][11734] Num frames 6800...
+[2024-09-22 06:20:42,342][11734] Num frames 6900...
+[2024-09-22 06:20:42,474][11734] Num frames 7000...
+[2024-09-22 06:20:42,605][11734] Num frames 7100...
+[2024-09-22 06:20:42,745][11734] Num frames 7200...
+[2024-09-22 06:20:42,878][11734] Num frames 7300...
+[2024-09-22 06:20:43,012][11734] Num frames 7400...
+[2024-09-22 06:20:43,146][11734] Num frames 7500...
+[2024-09-22 06:20:43,282][11734] Num frames 7600...
+[2024-09-22 06:20:43,370][11734] Avg episode rewards: #0: 24.889, true rewards: #0: 10.889
+[2024-09-22 06:20:43,373][11734] Avg episode reward: 24.889, avg true_objective: 10.889
+[2024-09-22 06:20:43,478][11734] Num frames 7700...
+[2024-09-22 06:20:43,615][11734] Num frames 7800...
+[2024-09-22 06:20:43,751][11734] Num frames 7900...
+[2024-09-22 06:20:43,885][11734] Num frames 8000...
+[2024-09-22 06:20:44,019][11734] Num frames 8100...
+[2024-09-22 06:20:44,152][11734] Num frames 8200...
+[2024-09-22 06:20:44,286][11734] Num frames 8300...
+[2024-09-22 06:20:44,421][11734] Num frames 8400...
+[2024-09-22 06:20:44,558][11734] Num frames 8500...
+[2024-09-22 06:20:44,691][11734] Num frames 8600...
+[2024-09-22 06:20:44,825][11734] Num frames 8700...
+[2024-09-22 06:20:44,956][11734] Num frames 8800...
+[2024-09-22 06:20:45,086][11734] Num frames 8900...
+[2024-09-22 06:20:45,213][11734] Num frames 9000...
+[2024-09-22 06:20:45,343][11734] Num frames 9100...
+[2024-09-22 06:20:45,515][11734] Avg episode rewards: #0: 27.112, true rewards: #0: 11.487
+[2024-09-22 06:20:45,517][11734] Avg episode reward: 27.112, avg true_objective: 11.487
+[2024-09-22 06:20:45,532][11734] Num frames 9200...
+[2024-09-22 06:20:45,660][11734] Num frames 9300...
+[2024-09-22 06:20:45,798][11734] Num frames 9400...
+[2024-09-22 06:20:45,927][11734] Num frames 9500...
+[2024-09-22 06:20:46,054][11734] Num frames 9600...
+[2024-09-22 06:20:46,184][11734] Num frames 9700...
+[2024-09-22 06:20:46,312][11734] Num frames 9800...
+[2024-09-22 06:20:46,453][11734] Num frames 9900...
+[2024-09-22 06:20:46,589][11734] Num frames 10000...
+[2024-09-22 06:20:46,714][11734] Avg episode rewards: #0: 26.282, true rewards: #0: 11.171
+[2024-09-22 06:20:46,716][11734] Avg episode reward: 26.282, avg true_objective: 11.171
+[2024-09-22 06:20:46,780][11734] Num frames 10100...
+[2024-09-22 06:20:46,917][11734] Num frames 10200...
+[2024-09-22 06:20:47,052][11734] Num frames 10300...
+[2024-09-22 06:20:47,191][11734] Num frames 10400...
+[2024-09-22 06:20:47,326][11734] Num frames 10500...
+[2024-09-22 06:20:47,465][11734] Num frames 10600...
+[2024-09-22 06:20:47,603][11734] Num frames 10700...
+[2024-09-22 06:20:47,740][11734] Num frames 10800...
+[2024-09-22 06:20:47,880][11734] Num frames 10900...
+[2024-09-22 06:20:48,017][11734] Num frames 11000...
+[2024-09-22 06:20:48,150][11734] Num frames 11100...
+[2024-09-22 06:20:48,284][11734] Num frames 11200...
+[2024-09-22 06:20:48,413][11734] Num frames 11300...
+[2024-09-22 06:20:48,599][11734] Avg episode rewards: #0: 26.798, true rewards: #0: 11.398
+[2024-09-22 06:20:48,601][11734] Avg episode reward: 26.798, avg true_objective: 11.398
+[2024-09-22 06:20:48,606][11734] Num frames 11400...
+[2024-09-22 06:21:20,460][11734] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
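Both evaluation passes above were produced by Sample Factory's enjoy entry point; the "Adding new argument" entries list the flags involved. Below is a minimal sketch of the final evaluate-and-upload invocation, assuming the sf_examples.vizdoom.enjoy_vizdoom entry point (an assumption; the flag names and values mirror the log lines above).

# Hedged sketch: evaluation + Hub upload, mirroring the "Adding new argument"
# entries above. Assumes sample-factory 2.x; enjoy_vizdoom.main is an assumed
# entry point -- only the flag values come from the log.
import sys

from sf_examples.vizdoom.enjoy_vizdoom import main

if __name__ == "__main__":
    sys.argv[1:] = [
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--push_to_hub",
        "--hf_repository=Vivek-huggingface/rl_course_vizdoom_health_gathering_supreme",
    ]
    sys.exit(main())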