diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,1214 @@
+[2023-02-25 17:05:20,778][08744] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2023-02-25 17:05:20,781][08744] Rollout worker 0 uses device cpu
+[2023-02-25 17:05:20,783][08744] Rollout worker 1 uses device cpu
+[2023-02-25 17:05:20,784][08744] Rollout worker 2 uses device cpu
+[2023-02-25 17:05:20,785][08744] Rollout worker 3 uses device cpu
+[2023-02-25 17:05:20,787][08744] Rollout worker 4 uses device cpu
+[2023-02-25 17:05:20,788][08744] Rollout worker 5 uses device cpu
+[2023-02-25 17:05:20,789][08744] Rollout worker 6 uses device cpu
+[2023-02-25 17:05:20,790][08744] Rollout worker 7 uses device cpu
+[2023-02-25 17:05:21,005][08744] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2023-02-25 17:05:21,007][08744] InferenceWorker_p0-w0: min num requests: 2
+[2023-02-25 17:05:21,048][08744] Starting all processes...
+[2023-02-25 17:05:21,055][08744] Starting process learner_proc0
+[2023-02-25 17:05:21,143][08744] Starting all processes...
+[2023-02-25 17:05:21,161][08744] Starting process inference_proc0-0 +[2023-02-25 17:05:21,162][08744] Starting process rollout_proc0 +[2023-02-25 17:05:21,168][08744] Starting process rollout_proc1 +[2023-02-25 17:05:21,168][08744] Starting process rollout_proc2 +[2023-02-25 17:05:21,168][08744] Starting process rollout_proc3 +[2023-02-25 17:05:21,169][08744] Starting process rollout_proc5 +[2023-02-25 17:05:21,169][08744] Starting process rollout_proc4 +[2023-02-25 17:05:21,187][08744] Starting process rollout_proc6 +[2023-02-25 17:05:21,198][08744] Starting process rollout_proc7 +[2023-02-25 17:05:32,007][14418] Worker 3 uses CPU cores [1] +[2023-02-25 17:05:32,081][14400] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-25 17:05:32,083][14400] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2023-02-25 17:05:32,221][14416] Worker 0 uses CPU cores [0] +[2023-02-25 17:05:32,257][14414] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-25 17:05:32,258][14414] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2023-02-25 17:05:32,309][14417] Worker 2 uses CPU cores [0] +[2023-02-25 17:05:32,422][14422] Worker 7 uses CPU cores [1] +[2023-02-25 17:05:32,425][14420] Worker 4 uses CPU cores [0] +[2023-02-25 17:05:32,441][14415] Worker 1 uses CPU cores [1] +[2023-02-25 17:05:32,447][14421] Worker 6 uses CPU cores [0] +[2023-02-25 17:05:32,475][14419] Worker 5 uses CPU cores [1] +[2023-02-25 17:05:32,932][14400] Num visible devices: 1 +[2023-02-25 17:05:32,932][14414] Num visible devices: 1 +[2023-02-25 17:05:32,935][14400] Starting seed is not provided +[2023-02-25 17:05:32,935][14400] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-25 17:05:32,936][14400] Initializing actor-critic model on device cuda:0 +[2023-02-25 17:05:32,936][14400] RunningMeanStd input shape: (3, 72, 128) +[2023-02-25 17:05:32,939][14400] RunningMeanStd input 
shape: (1,) +[2023-02-25 17:05:32,953][14400] ConvEncoder: input_channels=3 +[2023-02-25 17:05:33,215][14400] Conv encoder output size: 512 +[2023-02-25 17:05:33,216][14400] Policy head output size: 512 +[2023-02-25 17:05:33,259][14400] Created Actor Critic model with architecture: +[2023-02-25 17:05:33,260][14400] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2023-02-25 17:05:40,182][14400] Using optimizer +[2023-02-25 17:05:40,183][14400] No checkpoints found +[2023-02-25 17:05:40,183][14400] Did not load from checkpoint, starting from scratch! 
+[2023-02-25 17:05:40,184][14400] Initialized policy 0 weights for model version 0 +[2023-02-25 17:05:40,187][14400] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-25 17:05:40,195][14400] LearnerWorker_p0 finished initialization! +[2023-02-25 17:05:40,305][14414] RunningMeanStd input shape: (3, 72, 128) +[2023-02-25 17:05:40,306][14414] RunningMeanStd input shape: (1,) +[2023-02-25 17:05:40,318][14414] ConvEncoder: input_channels=3 +[2023-02-25 17:05:40,421][14414] Conv encoder output size: 512 +[2023-02-25 17:05:40,421][14414] Policy head output size: 512 +[2023-02-25 17:05:40,723][08744] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-25 17:05:40,994][08744] Heartbeat connected on Batcher_0 +[2023-02-25 17:05:41,001][08744] Heartbeat connected on LearnerWorker_p0 +[2023-02-25 17:05:41,016][08744] Heartbeat connected on RolloutWorker_w0 +[2023-02-25 17:05:41,022][08744] Heartbeat connected on RolloutWorker_w1 +[2023-02-25 17:05:41,027][08744] Heartbeat connected on RolloutWorker_w2 +[2023-02-25 17:05:41,031][08744] Heartbeat connected on RolloutWorker_w3 +[2023-02-25 17:05:41,036][08744] Heartbeat connected on RolloutWorker_w4 +[2023-02-25 17:05:41,041][08744] Heartbeat connected on RolloutWorker_w5 +[2023-02-25 17:05:41,046][08744] Heartbeat connected on RolloutWorker_w6 +[2023-02-25 17:05:41,052][08744] Heartbeat connected on RolloutWorker_w7 +[2023-02-25 17:05:42,793][08744] Inference worker 0-0 is ready! +[2023-02-25 17:05:42,795][08744] All inference workers are ready! Signal rollout workers to start! 
+[2023-02-25 17:05:42,799][08744] Heartbeat connected on InferenceWorker_p0-w0 +[2023-02-25 17:05:42,902][14416] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-25 17:05:42,914][14420] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-25 17:05:42,948][14417] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-25 17:05:42,953][14415] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-25 17:05:42,956][14422] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-25 17:05:42,958][14421] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-25 17:05:42,961][14418] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-25 17:05:42,983][14419] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-25 17:05:43,474][14415] Decorrelating experience for 0 frames... +[2023-02-25 17:05:43,815][14415] Decorrelating experience for 32 frames... +[2023-02-25 17:05:44,193][14416] Decorrelating experience for 0 frames... +[2023-02-25 17:05:44,201][14420] Decorrelating experience for 0 frames... +[2023-02-25 17:05:44,213][14421] Decorrelating experience for 0 frames... +[2023-02-25 17:05:44,216][14417] Decorrelating experience for 0 frames... +[2023-02-25 17:05:44,901][14422] Decorrelating experience for 0 frames... +[2023-02-25 17:05:44,928][14418] Decorrelating experience for 0 frames... +[2023-02-25 17:05:45,501][14420] Decorrelating experience for 32 frames... +[2023-02-25 17:05:45,506][14421] Decorrelating experience for 32 frames... +[2023-02-25 17:05:45,546][14417] Decorrelating experience for 32 frames... +[2023-02-25 17:05:45,604][14416] Decorrelating experience for 32 frames... +[2023-02-25 17:05:45,676][14415] Decorrelating experience for 64 frames... +[2023-02-25 17:05:45,710][14418] Decorrelating experience for 32 frames... +[2023-02-25 17:05:45,723][08744] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-25 17:05:46,688][14422] Decorrelating experience for 32 frames... +[2023-02-25 17:05:46,909][14419] Decorrelating experience for 0 frames... +[2023-02-25 17:05:46,979][14420] Decorrelating experience for 64 frames... +[2023-02-25 17:05:47,077][14421] Decorrelating experience for 64 frames... +[2023-02-25 17:05:47,150][14417] Decorrelating experience for 64 frames... +[2023-02-25 17:05:47,184][14418] Decorrelating experience for 64 frames... +[2023-02-25 17:05:48,569][14422] Decorrelating experience for 64 frames... +[2023-02-25 17:05:48,669][14419] Decorrelating experience for 32 frames... +[2023-02-25 17:05:48,940][14416] Decorrelating experience for 64 frames... +[2023-02-25 17:05:48,960][14418] Decorrelating experience for 96 frames... +[2023-02-25 17:05:49,305][14421] Decorrelating experience for 96 frames... +[2023-02-25 17:05:49,303][14420] Decorrelating experience for 96 frames... +[2023-02-25 17:05:49,521][14417] Decorrelating experience for 96 frames... +[2023-02-25 17:05:50,329][14415] Decorrelating experience for 96 frames... +[2023-02-25 17:05:50,450][14419] Decorrelating experience for 64 frames... +[2023-02-25 17:05:50,723][08744] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-25 17:05:50,953][14422] Decorrelating experience for 96 frames... +[2023-02-25 17:05:51,379][14419] Decorrelating experience for 96 frames... +[2023-02-25 17:05:52,079][14416] Decorrelating experience for 96 frames... +[2023-02-25 17:05:55,723][08744] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 25.6. Samples: 384. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-25 17:05:55,732][08744] Avg episode reward: [(0, '0.640')] +[2023-02-25 17:05:57,897][14400] Signal inference workers to stop experience collection... 
+[2023-02-25 17:05:57,907][14414] InferenceWorker_p0-w0: stopping experience collection +[2023-02-25 17:06:00,198][14400] Signal inference workers to resume experience collection... +[2023-02-25 17:06:00,199][14414] InferenceWorker_p0-w0: resuming experience collection +[2023-02-25 17:06:00,723][08744] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 4096. Throughput: 0: 111.2. Samples: 2224. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2023-02-25 17:06:00,726][08744] Avg episode reward: [(0, '2.229')] +[2023-02-25 17:06:05,730][08744] Fps is (10 sec: 2455.8, 60 sec: 982.7, 300 sec: 982.7). Total num frames: 24576. Throughput: 0: 195.3. Samples: 4884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-25 17:06:05,733][08744] Avg episode reward: [(0, '3.567')] +[2023-02-25 17:06:10,723][08744] Fps is (10 sec: 2867.1, 60 sec: 1092.2, 300 sec: 1092.2). Total num frames: 32768. Throughput: 0: 298.1. Samples: 8942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-25 17:06:10,726][08744] Avg episode reward: [(0, '3.855')] +[2023-02-25 17:06:12,789][14414] Updated weights for policy 0, policy_version 10 (0.0025) +[2023-02-25 17:06:15,723][08744] Fps is (10 sec: 2049.5, 60 sec: 1287.3, 300 sec: 1287.3). Total num frames: 45056. Throughput: 0: 307.7. Samples: 10768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:06:15,725][08744] Avg episode reward: [(0, '4.478')] +[2023-02-25 17:06:20,723][08744] Fps is (10 sec: 3277.0, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 65536. Throughput: 0: 384.9. Samples: 15394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:06:20,725][08744] Avg episode reward: [(0, '4.642')] +[2023-02-25 17:06:24,549][14414] Updated weights for policy 0, policy_version 20 (0.0017) +[2023-02-25 17:06:25,723][08744] Fps is (10 sec: 4095.9, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 86016. Throughput: 0: 470.2. Samples: 21158. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:06:25,731][08744] Avg episode reward: [(0, '4.277')] +[2023-02-25 17:06:30,725][08744] Fps is (10 sec: 3276.0, 60 sec: 1966.0, 300 sec: 1966.0). Total num frames: 98304. Throughput: 0: 523.0. Samples: 23536. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-25 17:06:30,731][08744] Avg episode reward: [(0, '4.172')] +[2023-02-25 17:06:35,725][08744] Fps is (10 sec: 2457.0, 60 sec: 2010.7, 300 sec: 2010.7). Total num frames: 110592. Throughput: 0: 603.8. Samples: 27174. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-25 17:06:35,733][08744] Avg episode reward: [(0, '4.203')] +[2023-02-25 17:06:35,749][14400] Saving new best policy, reward=4.203! +[2023-02-25 17:06:39,002][14414] Updated weights for policy 0, policy_version 30 (0.0015) +[2023-02-25 17:06:40,723][08744] Fps is (10 sec: 2867.9, 60 sec: 2116.3, 300 sec: 2116.3). Total num frames: 126976. Throughput: 0: 697.3. Samples: 31764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2023-02-25 17:06:40,724][08744] Avg episode reward: [(0, '4.350')] +[2023-02-25 17:06:40,735][14400] Saving new best policy, reward=4.350! +[2023-02-25 17:06:45,723][08744] Fps is (10 sec: 3277.7, 60 sec: 2389.3, 300 sec: 2205.5). Total num frames: 143360. Throughput: 0: 717.1. Samples: 34492. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-25 17:06:45,725][08744] Avg episode reward: [(0, '4.386')] +[2023-02-25 17:06:45,748][14400] Saving new best policy, reward=4.386! +[2023-02-25 17:06:50,723][08744] Fps is (10 sec: 3276.8, 60 sec: 2662.4, 300 sec: 2282.1). Total num frames: 159744. Throughput: 0: 772.0. Samples: 39616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:06:50,728][08744] Avg episode reward: [(0, '4.530')] +[2023-02-25 17:06:50,732][14400] Saving new best policy, reward=4.530! 
+[2023-02-25 17:06:51,542][14414] Updated weights for policy 0, policy_version 40 (0.0020) +[2023-02-25 17:06:55,723][08744] Fps is (10 sec: 2867.1, 60 sec: 2867.2, 300 sec: 2293.8). Total num frames: 172032. Throughput: 0: 770.5. Samples: 43616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:06:55,730][08744] Avg episode reward: [(0, '4.600')] +[2023-02-25 17:06:55,741][14400] Saving new best policy, reward=4.600! +[2023-02-25 17:07:00,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 2355.2). Total num frames: 188416. Throughput: 0: 773.2. Samples: 45564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:07:00,730][08744] Avg episode reward: [(0, '4.569')] +[2023-02-25 17:07:04,090][14414] Updated weights for policy 0, policy_version 50 (0.0021) +[2023-02-25 17:07:05,723][08744] Fps is (10 sec: 3686.5, 60 sec: 3072.4, 300 sec: 2457.6). Total num frames: 208896. Throughput: 0: 798.8. Samples: 51338. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:07:05,729][08744] Avg episode reward: [(0, '4.423')] +[2023-02-25 17:07:10,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.6, 300 sec: 2503.1). Total num frames: 225280. Throughput: 0: 784.7. Samples: 56470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:07:10,730][08744] Avg episode reward: [(0, '4.351')] +[2023-02-25 17:07:15,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2500.7). Total num frames: 237568. Throughput: 0: 771.6. Samples: 58254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:07:15,732][08744] Avg episode reward: [(0, '4.314')] +[2023-02-25 17:07:15,751][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_237568.pth... +[2023-02-25 17:07:18,663][14414] Updated weights for policy 0, policy_version 60 (0.0017) +[2023-02-25 17:07:20,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 2498.6). Total num frames: 249856. Throughput: 0: 775.5. Samples: 62070. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:07:20,725][08744] Avg episode reward: [(0, '4.342')] +[2023-02-25 17:07:25,724][08744] Fps is (10 sec: 3276.2, 60 sec: 3071.9, 300 sec: 2574.6). Total num frames: 270336. Throughput: 0: 804.1. Samples: 67952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:07:25,732][08744] Avg episode reward: [(0, '4.377')] +[2023-02-25 17:07:29,843][14414] Updated weights for policy 0, policy_version 70 (0.0022) +[2023-02-25 17:07:30,730][08744] Fps is (10 sec: 3683.7, 60 sec: 3140.0, 300 sec: 2606.4). Total num frames: 286720. Throughput: 0: 810.1. Samples: 70952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:07:30,735][08744] Avg episode reward: [(0, '4.406')] +[2023-02-25 17:07:35,723][08744] Fps is (10 sec: 2867.7, 60 sec: 3140.4, 300 sec: 2600.1). Total num frames: 299008. Throughput: 0: 782.0. Samples: 74808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:07:35,725][08744] Avg episode reward: [(0, '4.294')] +[2023-02-25 17:07:40,723][08744] Fps is (10 sec: 2869.3, 60 sec: 3140.3, 300 sec: 2628.3). Total num frames: 315392. Throughput: 0: 784.4. Samples: 78912. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:07:40,729][08744] Avg episode reward: [(0, '4.296')] +[2023-02-25 17:07:43,373][14414] Updated weights for policy 0, policy_version 80 (0.0028) +[2023-02-25 17:07:45,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2687.0). Total num frames: 335872. Throughput: 0: 805.6. Samples: 81818. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-25 17:07:45,729][08744] Avg episode reward: [(0, '4.178')] +[2023-02-25 17:07:50,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2709.7). Total num frames: 352256. Throughput: 0: 811.5. Samples: 87854. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-25 17:07:50,728][08744] Avg episode reward: [(0, '4.235')] +[2023-02-25 17:07:55,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2700.3). Total num frames: 364544. Throughput: 0: 780.3. Samples: 91582. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-25 17:07:55,731][08744] Avg episode reward: [(0, '4.349')] +[2023-02-25 17:07:56,403][14414] Updated weights for policy 0, policy_version 90 (0.0032) +[2023-02-25 17:08:00,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 2691.7). Total num frames: 376832. Throughput: 0: 782.7. Samples: 93476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:08:00,729][08744] Avg episode reward: [(0, '4.510')] +[2023-02-25 17:08:05,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2740.1). Total num frames: 397312. Throughput: 0: 814.1. Samples: 98704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:08:05,725][08744] Avg episode reward: [(0, '4.583')] +[2023-02-25 17:08:08,313][14414] Updated weights for policy 0, policy_version 100 (0.0021) +[2023-02-25 17:08:10,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 2758.0). Total num frames: 413696. Throughput: 0: 809.4. Samples: 104374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:08:10,736][08744] Avg episode reward: [(0, '4.776')] +[2023-02-25 17:08:10,787][14400] Saving new best policy, reward=4.776! +[2023-02-25 17:08:15,724][08744] Fps is (10 sec: 2866.8, 60 sec: 3140.2, 300 sec: 2748.3). Total num frames: 425984. Throughput: 0: 780.8. Samples: 106084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:08:15,732][08744] Avg episode reward: [(0, '4.512')] +[2023-02-25 17:08:20,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2764.8). Total num frames: 442368. Throughput: 0: 778.3. Samples: 109832. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:08:20,728][08744] Avg episode reward: [(0, '4.527')] +[2023-02-25 17:08:22,334][14414] Updated weights for policy 0, policy_version 110 (0.0021) +[2023-02-25 17:08:25,723][08744] Fps is (10 sec: 3686.9, 60 sec: 3208.6, 300 sec: 2805.1). Total num frames: 462848. Throughput: 0: 825.2. Samples: 116044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:08:25,731][08744] Avg episode reward: [(0, '4.627')] +[2023-02-25 17:08:30,725][08744] Fps is (10 sec: 3685.4, 60 sec: 3208.8, 300 sec: 2819.0). Total num frames: 479232. Throughput: 0: 830.4. Samples: 119188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:08:30,728][08744] Avg episode reward: [(0, '4.615')] +[2023-02-25 17:08:35,726][08744] Fps is (10 sec: 2456.7, 60 sec: 3140.1, 300 sec: 2785.2). Total num frames: 487424. Throughput: 0: 766.7. Samples: 122358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:08:35,729][08744] Avg episode reward: [(0, '4.539')] +[2023-02-25 17:08:36,154][14414] Updated weights for policy 0, policy_version 120 (0.0015) +[2023-02-25 17:08:40,723][08744] Fps is (10 sec: 2048.5, 60 sec: 3072.0, 300 sec: 2776.2). Total num frames: 499712. Throughput: 0: 750.8. Samples: 125368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:08:40,733][08744] Avg episode reward: [(0, '4.489')] +[2023-02-25 17:08:45,723][08744] Fps is (10 sec: 2458.5, 60 sec: 2935.5, 300 sec: 2767.6). Total num frames: 512000. Throughput: 0: 740.1. Samples: 126780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:08:45,728][08744] Avg episode reward: [(0, '4.580')] +[2023-02-25 17:08:50,279][14414] Updated weights for policy 0, policy_version 130 (0.0016) +[2023-02-25 17:08:50,728][08744] Fps is (10 sec: 3275.2, 60 sec: 3003.5, 300 sec: 2802.5). Total num frames: 532480. Throughput: 0: 744.3. Samples: 132202. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:08:50,734][08744] Avg episode reward: [(0, '4.607')] +[2023-02-25 17:08:55,723][08744] Fps is (10 sec: 3686.3, 60 sec: 3072.0, 300 sec: 2814.7). Total num frames: 548864. Throughput: 0: 744.2. Samples: 137862. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:08:55,728][08744] Avg episode reward: [(0, '4.758')] +[2023-02-25 17:09:00,727][08744] Fps is (10 sec: 2867.3, 60 sec: 3071.8, 300 sec: 2805.7). Total num frames: 561152. Throughput: 0: 748.7. Samples: 139776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:09:00,730][08744] Avg episode reward: [(0, '4.748')] +[2023-02-25 17:09:04,307][14414] Updated weights for policy 0, policy_version 140 (0.0033) +[2023-02-25 17:09:05,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2817.2). Total num frames: 577536. Throughput: 0: 748.9. Samples: 143534. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:09:05,729][08744] Avg episode reward: [(0, '4.662')] +[2023-02-25 17:09:10,723][08744] Fps is (10 sec: 3278.3, 60 sec: 3003.7, 300 sec: 2828.2). Total num frames: 593920. Throughput: 0: 732.0. Samples: 148982. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:09:10,726][08744] Avg episode reward: [(0, '4.529')] +[2023-02-25 17:09:15,017][14414] Updated weights for policy 0, policy_version 150 (0.0021) +[2023-02-25 17:09:15,723][08744] Fps is (10 sec: 3686.5, 60 sec: 3140.3, 300 sec: 2857.7). Total num frames: 614400. Throughput: 0: 729.0. Samples: 151992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:09:15,728][08744] Avg episode reward: [(0, '4.614')] +[2023-02-25 17:09:15,740][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000150_614400.pth... +[2023-02-25 17:09:20,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 2848.6). Total num frames: 626688. Throughput: 0: 755.7. Samples: 156362. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:09:20,734][08744] Avg episode reward: [(0, '4.498')] +[2023-02-25 17:09:25,723][08744] Fps is (10 sec: 2457.6, 60 sec: 2935.5, 300 sec: 2839.9). Total num frames: 638976. Throughput: 0: 771.4. Samples: 160080. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2023-02-25 17:09:25,725][08744] Avg episode reward: [(0, '4.610')] +[2023-02-25 17:09:29,640][14414] Updated weights for policy 0, policy_version 160 (0.0019) +[2023-02-25 17:09:30,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3003.9, 300 sec: 2867.2). Total num frames: 659456. Throughput: 0: 799.3. Samples: 162750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-25 17:09:30,730][08744] Avg episode reward: [(0, '4.637')] +[2023-02-25 17:09:35,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3140.4, 300 sec: 2875.9). Total num frames: 675840. Throughput: 0: 808.6. Samples: 168584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:09:35,736][08744] Avg episode reward: [(0, '4.827')] +[2023-02-25 17:09:35,749][14400] Saving new best policy, reward=4.827! +[2023-02-25 17:09:40,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2867.2). Total num frames: 688128. Throughput: 0: 781.2. Samples: 173016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:09:40,734][08744] Avg episode reward: [(0, '4.839')] +[2023-02-25 17:09:40,738][14400] Saving new best policy, reward=4.839! +[2023-02-25 17:09:42,225][14414] Updated weights for policy 0, policy_version 170 (0.0025) +[2023-02-25 17:09:45,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2875.6). Total num frames: 704512. Throughput: 0: 784.0. Samples: 175054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:09:45,730][08744] Avg episode reward: [(0, '4.863')] +[2023-02-25 17:09:45,743][14400] Saving new best policy, reward=4.863! +[2023-02-25 17:09:50,726][08744] Fps is (10 sec: 3685.1, 60 sec: 3208.6, 300 sec: 2899.9). Total num frames: 724992. 
Throughput: 0: 815.1. Samples: 180218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:09:50,730][08744] Avg episode reward: [(0, '5.033')] +[2023-02-25 17:09:50,738][14400] Saving new best policy, reward=5.033! +[2023-02-25 17:09:53,707][14414] Updated weights for policy 0, policy_version 180 (0.0025) +[2023-02-25 17:09:55,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2907.4). Total num frames: 741376. Throughput: 0: 824.0. Samples: 186062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:09:55,730][08744] Avg episode reward: [(0, '5.025')] +[2023-02-25 17:10:00,723][08744] Fps is (10 sec: 3278.0, 60 sec: 3277.1, 300 sec: 2914.5). Total num frames: 757760. Throughput: 0: 810.1. Samples: 188446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:10:00,727][08744] Avg episode reward: [(0, '4.923')] +[2023-02-25 17:10:05,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2905.8). Total num frames: 770048. Throughput: 0: 792.7. Samples: 192034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:10:05,729][08744] Avg episode reward: [(0, '5.099')] +[2023-02-25 17:10:05,744][14400] Saving new best policy, reward=5.099! +[2023-02-25 17:10:07,753][14414] Updated weights for policy 0, policy_version 190 (0.0023) +[2023-02-25 17:10:10,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2912.7). Total num frames: 786432. Throughput: 0: 822.0. Samples: 197072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:10:10,730][08744] Avg episode reward: [(0, '5.299')] +[2023-02-25 17:10:10,732][14400] Saving new best policy, reward=5.299! +[2023-02-25 17:10:15,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 2934.2). Total num frames: 806912. Throughput: 0: 827.3. Samples: 199978. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:10:15,728][08744] Avg episode reward: [(0, '5.744')] +[2023-02-25 17:10:15,739][14400] Saving new best policy, reward=5.744! 
+[2023-02-25 17:10:19,047][14414] Updated weights for policy 0, policy_version 200 (0.0013) +[2023-02-25 17:10:20,726][08744] Fps is (10 sec: 3275.6, 60 sec: 3208.3, 300 sec: 2925.7). Total num frames: 819200. Throughput: 0: 811.6. Samples: 205110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:10:20,730][08744] Avg episode reward: [(0, '5.686')] +[2023-02-25 17:10:25,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 2917.5). Total num frames: 831488. Throughput: 0: 795.5. Samples: 208814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:10:25,728][08744] Avg episode reward: [(0, '5.266')] +[2023-02-25 17:10:30,723][08744] Fps is (10 sec: 3278.0, 60 sec: 3208.5, 300 sec: 2937.8). Total num frames: 851968. Throughput: 0: 802.8. Samples: 211180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:10:30,725][08744] Avg episode reward: [(0, '5.335')] +[2023-02-25 17:10:32,317][14414] Updated weights for policy 0, policy_version 210 (0.0022) +[2023-02-25 17:10:35,723][08744] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 2957.5). Total num frames: 872448. Throughput: 0: 818.6. Samples: 217054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:10:35,726][08744] Avg episode reward: [(0, '5.309')] +[2023-02-25 17:10:40,727][08744] Fps is (10 sec: 3275.3, 60 sec: 3276.5, 300 sec: 2999.1). Total num frames: 884736. Throughput: 0: 795.5. Samples: 221864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:10:40,731][08744] Avg episode reward: [(0, '5.515')] +[2023-02-25 17:10:45,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3208.5, 300 sec: 3040.8). Total num frames: 897024. Throughput: 0: 784.6. Samples: 223754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:10:45,727][08744] Avg episode reward: [(0, '5.396')] +[2023-02-25 17:10:46,126][14414] Updated weights for policy 0, policy_version 220 (0.0021) +[2023-02-25 17:10:50,723][08744] Fps is (10 sec: 3278.2, 60 sec: 3208.7, 300 sec: 3110.2). 
Total num frames: 917504. Throughput: 0: 800.3. Samples: 228046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:10:50,726][08744] Avg episode reward: [(0, '5.215')] +[2023-02-25 17:10:55,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3151.8). Total num frames: 933888. Throughput: 0: 822.3. Samples: 234076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:10:55,729][08744] Avg episode reward: [(0, '5.265')] +[2023-02-25 17:10:56,917][14414] Updated weights for policy 0, policy_version 230 (0.0016) +[2023-02-25 17:11:00,723][08744] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3138.0). Total num frames: 950272. Throughput: 0: 820.6. Samples: 236904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:11:00,731][08744] Avg episode reward: [(0, '5.549')] +[2023-02-25 17:11:05,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3151.8). Total num frames: 962560. Throughput: 0: 789.2. Samples: 240620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:11:05,728][08744] Avg episode reward: [(0, '5.692')] +[2023-02-25 17:11:10,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 978944. Throughput: 0: 804.1. Samples: 245000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:11:10,728][08744] Avg episode reward: [(0, '5.749')] +[2023-02-25 17:11:10,731][14400] Saving new best policy, reward=5.749! +[2023-02-25 17:11:11,258][14414] Updated weights for policy 0, policy_version 240 (0.0013) +[2023-02-25 17:11:15,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 999424. Throughput: 0: 818.8. Samples: 248026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:11:15,724][08744] Avg episode reward: [(0, '6.110')] +[2023-02-25 17:11:15,741][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000244_999424.pth... 
+[2023-02-25 17:11:15,906][14400] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000058_237568.pth +[2023-02-25 17:11:15,916][14400] Saving new best policy, reward=6.110! +[2023-02-25 17:11:20,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3277.0, 300 sec: 3151.8). Total num frames: 1015808. Throughput: 0: 818.4. Samples: 253882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:11:20,725][08744] Avg episode reward: [(0, '6.685')] +[2023-02-25 17:11:20,730][14400] Saving new best policy, reward=6.685! +[2023-02-25 17:11:23,217][14414] Updated weights for policy 0, policy_version 250 (0.0016) +[2023-02-25 17:11:25,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3151.9). Total num frames: 1028096. Throughput: 0: 790.3. Samples: 257424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:11:25,746][08744] Avg episode reward: [(0, '7.039')] +[2023-02-25 17:11:25,760][14400] Saving new best policy, reward=7.039! +[2023-02-25 17:11:30,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3165.8). Total num frames: 1044480. Throughput: 0: 790.6. Samples: 259332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:11:30,730][08744] Avg episode reward: [(0, '7.019')] +[2023-02-25 17:11:35,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1060864. Throughput: 0: 812.3. Samples: 264600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:11:35,730][08744] Avg episode reward: [(0, '6.927')] +[2023-02-25 17:11:36,031][14414] Updated weights for policy 0, policy_version 260 (0.0042) +[2023-02-25 17:11:40,729][08744] Fps is (10 sec: 3274.6, 60 sec: 3208.4, 300 sec: 3165.7). Total num frames: 1077248. Throughput: 0: 801.4. Samples: 270146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:11:40,732][08744] Avg episode reward: [(0, '7.292')] +[2023-02-25 17:11:40,734][14400] Saving new best policy, reward=7.292! 
+[2023-02-25 17:11:45,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3151.8). Total num frames: 1089536. Throughput: 0: 777.9. Samples: 271908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:11:45,727][08744] Avg episode reward: [(0, '7.455')] +[2023-02-25 17:11:45,750][14400] Saving new best policy, reward=7.455! +[2023-02-25 17:11:50,679][14414] Updated weights for policy 0, policy_version 270 (0.0028) +[2023-02-25 17:11:50,723][08744] Fps is (10 sec: 2869.1, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1105920. Throughput: 0: 774.8. Samples: 275484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:11:50,725][08744] Avg episode reward: [(0, '7.914')] +[2023-02-25 17:11:50,732][14400] Saving new best policy, reward=7.914! +[2023-02-25 17:11:55,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1122304. Throughput: 0: 799.6. Samples: 280982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:11:55,725][08744] Avg episode reward: [(0, '8.110')] +[2023-02-25 17:11:55,736][14400] Saving new best policy, reward=8.110! +[2023-02-25 17:12:00,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3151.8). Total num frames: 1138688. Throughput: 0: 799.9. Samples: 284022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:12:00,729][08744] Avg episode reward: [(0, '8.484')] +[2023-02-25 17:12:00,757][14400] Saving new best policy, reward=8.484! +[2023-02-25 17:12:02,413][14414] Updated weights for policy 0, policy_version 280 (0.0026) +[2023-02-25 17:12:05,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3138.0). Total num frames: 1150976. Throughput: 0: 759.8. Samples: 288072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:12:05,728][08744] Avg episode reward: [(0, '8.342')] +[2023-02-25 17:12:10,723][08744] Fps is (10 sec: 2457.5, 60 sec: 3072.0, 300 sec: 3137.9). Total num frames: 1163264. Throughput: 0: 760.3. Samples: 291636. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:12:10,725][08744] Avg episode reward: [(0, '8.155')] +[2023-02-25 17:12:15,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1183744. Throughput: 0: 781.1. Samples: 294480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-25 17:12:15,736][08744] Avg episode reward: [(0, '7.725')] +[2023-02-25 17:12:16,115][14414] Updated weights for policy 0, policy_version 290 (0.0027) +[2023-02-25 17:12:20,723][08744] Fps is (10 sec: 4096.2, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1204224. Throughput: 0: 792.1. Samples: 300246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:12:20,730][08744] Avg episode reward: [(0, '7.452')] +[2023-02-25 17:12:25,725][08744] Fps is (10 sec: 3276.0, 60 sec: 3140.1, 300 sec: 3151.9). Total num frames: 1216512. Throughput: 0: 758.9. Samples: 304292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:12:25,728][08744] Avg episode reward: [(0, '7.537')] +[2023-02-25 17:12:30,646][14414] Updated weights for policy 0, policy_version 300 (0.0028) +[2023-02-25 17:12:30,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3151.8). Total num frames: 1228800. Throughput: 0: 758.7. Samples: 306048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:12:30,732][08744] Avg episode reward: [(0, '8.051')] +[2023-02-25 17:12:35,723][08744] Fps is (10 sec: 3277.7, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1249280. Throughput: 0: 791.9. Samples: 311118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:12:35,726][08744] Avg episode reward: [(0, '8.974')] +[2023-02-25 17:12:35,738][14400] Saving new best policy, reward=8.974! +[2023-02-25 17:12:40,723][08744] Fps is (10 sec: 3686.1, 60 sec: 3140.6, 300 sec: 3151.8). Total num frames: 1265664. Throughput: 0: 806.1. Samples: 317256. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:12:40,726][08744] Avg episode reward: [(0, '9.192')] +[2023-02-25 17:12:40,735][14400] Saving new best policy, reward=9.192! +[2023-02-25 17:12:40,749][14414] Updated weights for policy 0, policy_version 310 (0.0022) +[2023-02-25 17:12:45,724][08744] Fps is (10 sec: 2866.9, 60 sec: 3140.2, 300 sec: 3137.9). Total num frames: 1277952. Throughput: 0: 782.2. Samples: 319220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:12:45,730][08744] Avg episode reward: [(0, '9.452')] +[2023-02-25 17:12:45,850][14400] Saving new best policy, reward=9.452! +[2023-02-25 17:12:50,723][08744] Fps is (10 sec: 2457.8, 60 sec: 3072.0, 300 sec: 3138.0). Total num frames: 1290240. Throughput: 0: 774.4. Samples: 322922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:12:50,729][08744] Avg episode reward: [(0, '9.020')] +[2023-02-25 17:12:55,429][14414] Updated weights for policy 0, policy_version 320 (0.0024) +[2023-02-25 17:12:55,723][08744] Fps is (10 sec: 3277.1, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1310720. Throughput: 0: 800.9. Samples: 327674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:12:55,728][08744] Avg episode reward: [(0, '9.124')] +[2023-02-25 17:13:00,723][08744] Fps is (10 sec: 4096.1, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 1331200. Throughput: 0: 803.5. Samples: 330638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:13:00,725][08744] Avg episode reward: [(0, '9.416')] +[2023-02-25 17:13:05,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3151.8). Total num frames: 1343488. Throughput: 0: 780.9. Samples: 335388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:13:05,729][08744] Avg episode reward: [(0, '9.770')] +[2023-02-25 17:13:05,741][14400] Saving new best policy, reward=9.770! 
+[2023-02-25 17:13:08,879][14414] Updated weights for policy 0, policy_version 330 (0.0019) +[2023-02-25 17:13:10,725][08744] Fps is (10 sec: 2457.0, 60 sec: 3208.4, 300 sec: 3151.8). Total num frames: 1355776. Throughput: 0: 770.9. Samples: 338980. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:13:10,727][08744] Avg episode reward: [(0, '10.805')] +[2023-02-25 17:13:10,736][14400] Saving new best policy, reward=10.805! +[2023-02-25 17:13:15,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3151.8). Total num frames: 1372160. Throughput: 0: 779.9. Samples: 341142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:13:15,726][08744] Avg episode reward: [(0, '10.669')] +[2023-02-25 17:13:15,735][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000335_1372160.pth... +[2023-02-25 17:13:15,878][14400] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000150_614400.pth +[2023-02-25 17:13:20,723][08744] Fps is (10 sec: 3277.6, 60 sec: 3072.0, 300 sec: 3138.0). Total num frames: 1388544. Throughput: 0: 792.1. Samples: 346762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:13:20,730][08744] Avg episode reward: [(0, '11.135')] +[2023-02-25 17:13:20,734][14400] Saving new best policy, reward=11.135! +[2023-02-25 17:13:21,057][14414] Updated weights for policy 0, policy_version 340 (0.0031) +[2023-02-25 17:13:25,724][08744] Fps is (10 sec: 3276.5, 60 sec: 3140.4, 300 sec: 3138.0). Total num frames: 1404928. Throughput: 0: 760.7. Samples: 351488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-25 17:13:25,731][08744] Avg episode reward: [(0, '11.089')] +[2023-02-25 17:13:30,723][08744] Fps is (10 sec: 2867.1, 60 sec: 3140.2, 300 sec: 3151.9). Total num frames: 1417216. Throughput: 0: 757.2. Samples: 353292. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:13:30,729][08744] Avg episode reward: [(0, '10.085')] +[2023-02-25 17:13:35,257][14414] Updated weights for policy 0, policy_version 350 (0.0025) +[2023-02-25 17:13:35,723][08744] Fps is (10 sec: 2867.5, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1433600. Throughput: 0: 767.1. Samples: 357440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:13:35,725][08744] Avg episode reward: [(0, '10.197')] +[2023-02-25 17:13:40,723][08744] Fps is (10 sec: 3686.6, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 1454080. Throughput: 0: 789.4. Samples: 363196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:13:40,725][08744] Avg episode reward: [(0, '9.878')] +[2023-02-25 17:13:45,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3165.8). Total num frames: 1466368. Throughput: 0: 787.6. Samples: 366080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:13:45,726][08744] Avg episode reward: [(0, '10.156')] +[2023-02-25 17:13:47,744][14414] Updated weights for policy 0, policy_version 360 (0.0016) +[2023-02-25 17:13:50,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3151.8). Total num frames: 1478656. Throughput: 0: 763.6. Samples: 369752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:13:50,725][08744] Avg episode reward: [(0, '10.745')] +[2023-02-25 17:13:55,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3165.8). Total num frames: 1495040. Throughput: 0: 776.5. Samples: 373920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:13:55,725][08744] Avg episode reward: [(0, '11.914')] +[2023-02-25 17:13:55,741][14400] Saving new best policy, reward=11.914! +[2023-02-25 17:14:00,348][14414] Updated weights for policy 0, policy_version 370 (0.0020) +[2023-02-25 17:14:00,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 1515520. Throughput: 0: 796.7. Samples: 376992. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:14:00,730][08744] Avg episode reward: [(0, '11.641')] +[2023-02-25 17:14:05,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3179.6). Total num frames: 1531904. Throughput: 0: 794.6. Samples: 382520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:14:05,725][08744] Avg episode reward: [(0, '11.505')] +[2023-02-25 17:14:10,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.4, 300 sec: 3151.8). Total num frames: 1544192. Throughput: 0: 771.5. Samples: 386204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:14:10,727][08744] Avg episode reward: [(0, '10.725')] +[2023-02-25 17:14:15,082][14414] Updated weights for policy 0, policy_version 380 (0.0018) +[2023-02-25 17:14:15,724][08744] Fps is (10 sec: 2457.4, 60 sec: 3072.0, 300 sec: 3151.8). Total num frames: 1556480. Throughput: 0: 770.5. Samples: 387964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:14:15,730][08744] Avg episode reward: [(0, '10.763')] +[2023-02-25 17:14:20,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3179.6). Total num frames: 1576960. Throughput: 0: 801.9. Samples: 393526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:14:20,730][08744] Avg episode reward: [(0, '11.336')] +[2023-02-25 17:14:25,723][08744] Fps is (10 sec: 3686.7, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 1593344. Throughput: 0: 797.4. Samples: 399080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:14:25,733][08744] Avg episode reward: [(0, '11.515')] +[2023-02-25 17:14:25,941][14414] Updated weights for policy 0, policy_version 390 (0.0021) +[2023-02-25 17:14:30,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3151.8). Total num frames: 1605632. Throughput: 0: 776.9. Samples: 401040. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:14:30,728][08744] Avg episode reward: [(0, '12.102')] +[2023-02-25 17:14:30,734][14400] Saving new best policy, reward=12.102! +[2023-02-25 17:14:35,723][08744] Fps is (10 sec: 2867.0, 60 sec: 3140.2, 300 sec: 3165.7). Total num frames: 1622016. Throughput: 0: 773.5. Samples: 404558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:14:35,729][08744] Avg episode reward: [(0, '11.995')] +[2023-02-25 17:14:39,854][14414] Updated weights for policy 0, policy_version 400 (0.0017) +[2023-02-25 17:14:40,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3165.7). Total num frames: 1638400. Throughput: 0: 808.0. Samples: 410282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-25 17:14:40,729][08744] Avg episode reward: [(0, '11.900')] +[2023-02-25 17:14:45,727][08744] Fps is (10 sec: 2866.0, 60 sec: 3071.8, 300 sec: 3137.9). Total num frames: 1650688. Throughput: 0: 789.8. Samples: 412538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-25 17:14:45,730][08744] Avg episode reward: [(0, '12.644')] +[2023-02-25 17:14:45,751][14400] Saving new best policy, reward=12.644! +[2023-02-25 17:14:50,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3124.1). Total num frames: 1662976. Throughput: 0: 737.1. Samples: 415688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:14:50,729][08744] Avg episode reward: [(0, '13.302')] +[2023-02-25 17:14:50,732][14400] Saving new best policy, reward=13.302! +[2023-02-25 17:14:55,723][08744] Fps is (10 sec: 2048.9, 60 sec: 2935.5, 300 sec: 3096.3). Total num frames: 1671168. Throughput: 0: 718.6. Samples: 418542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:14:55,728][08744] Avg episode reward: [(0, '13.665')] +[2023-02-25 17:14:55,741][14400] Saving new best policy, reward=13.665! 
+[2023-02-25 17:14:57,799][14414] Updated weights for policy 0, policy_version 410 (0.0023) +[2023-02-25 17:15:00,723][08744] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 3110.2). Total num frames: 1687552. Throughput: 0: 722.1. Samples: 420458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:15:00,729][08744] Avg episode reward: [(0, '13.653')] +[2023-02-25 17:15:05,723][08744] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 3110.2). Total num frames: 1703936. Throughput: 0: 717.1. Samples: 425794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:15:05,730][08744] Avg episode reward: [(0, '13.800')] +[2023-02-25 17:15:05,743][14400] Saving new best policy, reward=13.800! +[2023-02-25 17:15:08,644][14414] Updated weights for policy 0, policy_version 420 (0.0013) +[2023-02-25 17:15:10,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3110.2). Total num frames: 1724416. Throughput: 0: 721.3. Samples: 431540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:15:10,726][08744] Avg episode reward: [(0, '13.722')] +[2023-02-25 17:15:15,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3003.8, 300 sec: 3110.2). Total num frames: 1736704. Throughput: 0: 724.8. Samples: 433656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:15:15,729][08744] Avg episode reward: [(0, '13.638')] +[2023-02-25 17:15:15,740][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth... +[2023-02-25 17:15:15,906][14400] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000244_999424.pth +[2023-02-25 17:15:20,723][08744] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 3124.1). Total num frames: 1753088. Throughput: 0: 735.3. Samples: 437644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:15:20,735][08744] Avg episode reward: [(0, '14.002')] +[2023-02-25 17:15:20,742][14400] Saving new best policy, reward=14.002! 
+[2023-02-25 17:15:22,535][14414] Updated weights for policy 0, policy_version 430 (0.0026) +[2023-02-25 17:15:25,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3124.1). Total num frames: 1773568. Throughput: 0: 729.5. Samples: 443108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:15:25,731][08744] Avg episode reward: [(0, '14.465')] +[2023-02-25 17:15:25,744][14400] Saving new best policy, reward=14.465! +[2023-02-25 17:15:30,724][08744] Fps is (10 sec: 3686.0, 60 sec: 3071.9, 300 sec: 3110.2). Total num frames: 1789952. Throughput: 0: 742.5. Samples: 445950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:15:30,727][08744] Avg episode reward: [(0, '15.492')] +[2023-02-25 17:15:30,730][14400] Saving new best policy, reward=15.492! +[2023-02-25 17:15:35,730][08744] Fps is (10 sec: 2455.7, 60 sec: 2935.1, 300 sec: 3096.3). Total num frames: 1798144. Throughput: 0: 760.7. Samples: 449924. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:15:35,737][08744] Avg episode reward: [(0, '15.255')] +[2023-02-25 17:15:35,818][14414] Updated weights for policy 0, policy_version 440 (0.0018) +[2023-02-25 17:15:40,723][08744] Fps is (10 sec: 2457.9, 60 sec: 2935.5, 300 sec: 3110.2). Total num frames: 1814528. Throughput: 0: 779.0. Samples: 453596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:15:40,731][08744] Avg episode reward: [(0, '15.440')] +[2023-02-25 17:15:45,723][08744] Fps is (10 sec: 3279.2, 60 sec: 3003.9, 300 sec: 3096.3). Total num frames: 1830912. Throughput: 0: 799.5. Samples: 456436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:15:45,725][08744] Avg episode reward: [(0, '14.834')] +[2023-02-25 17:15:47,996][14414] Updated weights for policy 0, policy_version 450 (0.0024) +[2023-02-25 17:15:50,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 1851392. Throughput: 0: 809.0. Samples: 462198. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:15:50,728][08744] Avg episode reward: [(0, '14.948')] +[2023-02-25 17:15:55,723][08744] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3096.3). Total num frames: 1863680. Throughput: 0: 769.0. Samples: 466146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:15:55,726][08744] Avg episode reward: [(0, '14.785')] +[2023-02-25 17:16:00,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 1875968. Throughput: 0: 760.6. Samples: 467882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:16:00,724][08744] Avg episode reward: [(0, '15.583')] +[2023-02-25 17:16:00,732][14400] Saving new best policy, reward=15.583! +[2023-02-25 17:16:02,654][14414] Updated weights for policy 0, policy_version 460 (0.0022) +[2023-02-25 17:16:05,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 1896448. Throughput: 0: 786.2. Samples: 473022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-25 17:16:05,726][08744] Avg episode reward: [(0, '14.764')] +[2023-02-25 17:16:10,723][08744] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 1916928. Throughput: 0: 809.0. Samples: 479512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:16:10,725][08744] Avg episode reward: [(0, '16.320')] +[2023-02-25 17:16:10,729][14400] Saving new best policy, reward=16.320! +[2023-02-25 17:16:13,238][14414] Updated weights for policy 0, policy_version 470 (0.0013) +[2023-02-25 17:16:15,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3096.3). Total num frames: 1929216. Throughput: 0: 789.8. Samples: 481490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:16:15,725][08744] Avg episode reward: [(0, '16.161')] +[2023-02-25 17:16:20,723][08744] Fps is (10 sec: 2457.5, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 1941504. Throughput: 0: 783.9. Samples: 485194. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:16:20,731][08744] Avg episode reward: [(0, '15.425')] +[2023-02-25 17:16:25,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 1957888. Throughput: 0: 807.4. Samples: 489930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:16:25,731][08744] Avg episode reward: [(0, '15.040')] +[2023-02-25 17:16:27,077][14414] Updated weights for policy 0, policy_version 480 (0.0013) +[2023-02-25 17:16:30,723][08744] Fps is (10 sec: 3686.5, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 1978368. Throughput: 0: 811.4. Samples: 492948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:16:30,731][08744] Avg episode reward: [(0, '15.778')] +[2023-02-25 17:16:35,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.9, 300 sec: 3096.4). Total num frames: 1990656. Throughput: 0: 792.6. Samples: 497866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:16:35,726][08744] Avg episode reward: [(0, '15.815')] +[2023-02-25 17:16:40,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 2002944. Throughput: 0: 786.0. Samples: 501518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:16:40,731][08744] Avg episode reward: [(0, '16.218')] +[2023-02-25 17:16:41,138][14414] Updated weights for policy 0, policy_version 490 (0.0023) +[2023-02-25 17:16:45,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 2023424. Throughput: 0: 795.2. Samples: 503668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:16:45,729][08744] Avg episode reward: [(0, '16.968')] +[2023-02-25 17:16:45,740][14400] Saving new best policy, reward=16.968! +[2023-02-25 17:16:50,723][08744] Fps is (10 sec: 3686.5, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 2039808. Throughput: 0: 810.2. Samples: 509482. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:16:50,731][08744] Avg episode reward: [(0, '16.833')] +[2023-02-25 17:16:52,073][14414] Updated weights for policy 0, policy_version 500 (0.0014) +[2023-02-25 17:16:55,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 2056192. Throughput: 0: 770.4. Samples: 514178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:16:55,725][08744] Avg episode reward: [(0, '16.872')] +[2023-02-25 17:17:00,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 2068480. Throughput: 0: 765.7. Samples: 515948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:17:00,729][08744] Avg episode reward: [(0, '16.678')] +[2023-02-25 17:17:05,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 2084864. Throughput: 0: 774.6. Samples: 520050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:17:05,726][08744] Avg episode reward: [(0, '16.460')] +[2023-02-25 17:17:06,715][14414] Updated weights for policy 0, policy_version 510 (0.0029) +[2023-02-25 17:17:10,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3110.2). Total num frames: 2101248. Throughput: 0: 797.4. Samples: 525812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:17:10,725][08744] Avg episode reward: [(0, '15.644')] +[2023-02-25 17:17:15,725][08744] Fps is (10 sec: 3276.0, 60 sec: 3140.1, 300 sec: 3096.3). Total num frames: 2117632. Throughput: 0: 793.6. Samples: 528664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:17:15,731][08744] Avg episode reward: [(0, '17.008')] +[2023-02-25 17:17:15,746][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000517_2117632.pth... +[2023-02-25 17:17:15,912][14400] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000335_1372160.pth +[2023-02-25 17:17:15,930][14400] Saving new best policy, reward=17.008! 
+[2023-02-25 17:17:20,240][14414] Updated weights for policy 0, policy_version 520 (0.0017) +[2023-02-25 17:17:20,724][08744] Fps is (10 sec: 2866.8, 60 sec: 3140.2, 300 sec: 3096.3). Total num frames: 2129920. Throughput: 0: 762.3. Samples: 532172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:17:20,730][08744] Avg episode reward: [(0, '17.734')] +[2023-02-25 17:17:20,736][14400] Saving new best policy, reward=17.734! +[2023-02-25 17:17:25,723][08744] Fps is (10 sec: 2867.9, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 2146304. Throughput: 0: 770.6. Samples: 536196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:17:25,726][08744] Avg episode reward: [(0, '17.502')] +[2023-02-25 17:17:30,723][08744] Fps is (10 sec: 3277.3, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 2162688. Throughput: 0: 786.8. Samples: 539072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:17:30,730][08744] Avg episode reward: [(0, '19.060')] +[2023-02-25 17:17:30,738][14400] Saving new best policy, reward=19.060! +[2023-02-25 17:17:32,151][14414] Updated weights for policy 0, policy_version 530 (0.0025) +[2023-02-25 17:17:35,723][08744] Fps is (10 sec: 3276.7, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 2179072. Throughput: 0: 780.3. Samples: 544598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:17:35,725][08744] Avg episode reward: [(0, '19.363')] +[2023-02-25 17:17:35,733][14400] Saving new best policy, reward=19.363! +[2023-02-25 17:17:40,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 2191360. Throughput: 0: 756.4. Samples: 548214. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2023-02-25 17:17:40,732][08744] Avg episode reward: [(0, '19.549')] +[2023-02-25 17:17:40,736][14400] Saving new best policy, reward=19.549! +[2023-02-25 17:17:45,723][08744] Fps is (10 sec: 2457.7, 60 sec: 3003.7, 300 sec: 3096.3). Total num frames: 2203648. Throughput: 0: 755.5. 
Samples: 549946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:17:45,732][08744] Avg episode reward: [(0, '20.077')] +[2023-02-25 17:17:45,744][14400] Saving new best policy, reward=20.077! +[2023-02-25 17:17:47,021][14414] Updated weights for policy 0, policy_version 540 (0.0034) +[2023-02-25 17:17:50,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 2224128. Throughput: 0: 788.4. Samples: 555528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:17:50,729][08744] Avg episode reward: [(0, '19.937')] +[2023-02-25 17:17:55,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3082.4). Total num frames: 2240512. Throughput: 0: 779.2. Samples: 560878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:17:55,728][08744] Avg episode reward: [(0, '19.433')] +[2023-02-25 17:17:59,244][14414] Updated weights for policy 0, policy_version 550 (0.0040) +[2023-02-25 17:18:00,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3082.4). Total num frames: 2252800. Throughput: 0: 758.2. Samples: 562780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:18:00,726][08744] Avg episode reward: [(0, '18.060')] +[2023-02-25 17:18:05,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 2269184. Throughput: 0: 761.4. Samples: 566432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:18:05,725][08744] Avg episode reward: [(0, '18.077')] +[2023-02-25 17:18:10,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 2285568. Throughput: 0: 795.8. Samples: 572006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:18:10,726][08744] Avg episode reward: [(0, '17.103')] +[2023-02-25 17:18:12,052][14414] Updated weights for policy 0, policy_version 560 (0.0032) +[2023-02-25 17:18:15,724][08744] Fps is (10 sec: 3686.0, 60 sec: 3140.4, 300 sec: 3110.2). Total num frames: 2306048. Throughput: 0: 794.5. 
Samples: 574826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:18:15,726][08744] Avg episode reward: [(0, '18.251')] +[2023-02-25 17:18:20,726][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 2318336. Throughput: 0: 771.0. Samples: 579292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:18:20,729][08744] Avg episode reward: [(0, '17.980')] +[2023-02-25 17:18:25,723][08744] Fps is (10 sec: 2457.8, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 2330624. Throughput: 0: 769.2. Samples: 582826. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:18:25,730][08744] Avg episode reward: [(0, '17.932')] +[2023-02-25 17:18:26,486][14414] Updated weights for policy 0, policy_version 570 (0.0060) +[2023-02-25 17:18:30,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 2347008. Throughput: 0: 790.2. Samples: 585504. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:18:30,725][08744] Avg episode reward: [(0, '19.607')] +[2023-02-25 17:18:35,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 2367488. Throughput: 0: 796.0. Samples: 591348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:18:35,725][08744] Avg episode reward: [(0, '19.250')] +[2023-02-25 17:18:37,848][14414] Updated weights for policy 0, policy_version 580 (0.0032) +[2023-02-25 17:18:40,729][08744] Fps is (10 sec: 3274.6, 60 sec: 3139.9, 300 sec: 3096.2). Total num frames: 2379776. Throughput: 0: 775.9. Samples: 595798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:18:40,731][08744] Avg episode reward: [(0, '19.932')] +[2023-02-25 17:18:45,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 2392064. Throughput: 0: 774.0. Samples: 597612. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:18:45,728][08744] Avg episode reward: [(0, '20.006')] +[2023-02-25 17:18:50,723][08744] Fps is (10 sec: 3279.0, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 2412544. Throughput: 0: 799.3. Samples: 602400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:18:50,725][08744] Avg episode reward: [(0, '20.122')] +[2023-02-25 17:18:50,732][14400] Saving new best policy, reward=20.122! +[2023-02-25 17:18:51,398][14414] Updated weights for policy 0, policy_version 590 (0.0030) +[2023-02-25 17:18:55,723][08744] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 2433024. Throughput: 0: 798.7. Samples: 607948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:18:55,726][08744] Avg episode reward: [(0, '19.264')] +[2023-02-25 17:19:00,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3096.3). Total num frames: 2445312. Throughput: 0: 791.4. Samples: 610440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:19:00,725][08744] Avg episode reward: [(0, '18.698')] +[2023-02-25 17:19:05,446][14414] Updated weights for policy 0, policy_version 600 (0.0048) +[2023-02-25 17:19:05,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 2457600. Throughput: 0: 771.1. Samples: 613992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:19:05,729][08744] Avg episode reward: [(0, '19.630')] +[2023-02-25 17:19:10,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 2473984. Throughput: 0: 803.1. Samples: 618964. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:19:10,732][08744] Avg episode reward: [(0, '19.213')] +[2023-02-25 17:19:15,387][14414] Updated weights for policy 0, policy_version 610 (0.0036) +[2023-02-25 17:19:15,723][08744] Fps is (10 sec: 4096.1, 60 sec: 3208.6, 300 sec: 3124.1). Total num frames: 2498560. Throughput: 0: 817.0. Samples: 622268. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:19:15,730][08744] Avg episode reward: [(0, '18.684')] +[2023-02-25 17:19:15,741][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000610_2498560.pth... +[2023-02-25 17:19:15,873][14400] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth +[2023-02-25 17:19:20,724][08744] Fps is (10 sec: 4095.3, 60 sec: 3276.7, 300 sec: 3124.0). Total num frames: 2514944. Throughput: 0: 821.3. Samples: 628310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:19:20,731][08744] Avg episode reward: [(0, '20.037')] +[2023-02-25 17:19:25,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3124.1). Total num frames: 2527232. Throughput: 0: 801.2. Samples: 631846. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:19:25,732][08744] Avg episode reward: [(0, '19.818')] +[2023-02-25 17:19:29,679][14414] Updated weights for policy 0, policy_version 620 (0.0015) +[2023-02-25 17:19:30,726][08744] Fps is (10 sec: 2457.1, 60 sec: 3208.3, 300 sec: 3110.2). Total num frames: 2539520. Throughput: 0: 800.4. Samples: 633634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:19:30,733][08744] Avg episode reward: [(0, '20.364')] +[2023-02-25 17:19:30,739][14400] Saving new best policy, reward=20.364! +[2023-02-25 17:19:35,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 2560000. Throughput: 0: 819.2. Samples: 639262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:19:35,725][08744] Avg episode reward: [(0, '20.080')] +[2023-02-25 17:19:40,723][08744] Fps is (10 sec: 3278.0, 60 sec: 3208.9, 300 sec: 3124.1). Total num frames: 2572288. Throughput: 0: 805.0. Samples: 644172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:19:40,729][08744] Avg episode reward: [(0, '20.810')] +[2023-02-25 17:19:40,761][14400] Saving new best policy, reward=20.810! 
+[2023-02-25 17:19:42,472][14414] Updated weights for policy 0, policy_version 630 (0.0017) +[2023-02-25 17:19:45,725][08744] Fps is (10 sec: 2866.5, 60 sec: 3276.7, 300 sec: 3137.9). Total num frames: 2588672. Throughput: 0: 788.5. Samples: 645926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:19:45,731][08744] Avg episode reward: [(0, '21.370')] +[2023-02-25 17:19:45,751][14400] Saving new best policy, reward=21.370! +[2023-02-25 17:19:50,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 2605056. Throughput: 0: 802.8. Samples: 650118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:19:50,730][08744] Avg episode reward: [(0, '21.636')] +[2023-02-25 17:19:50,733][14400] Saving new best policy, reward=21.636! +[2023-02-25 17:19:55,037][14414] Updated weights for policy 0, policy_version 640 (0.0018) +[2023-02-25 17:19:55,723][08744] Fps is (10 sec: 3277.6, 60 sec: 3140.3, 300 sec: 3165.7). Total num frames: 2621440. Throughput: 0: 818.4. Samples: 655790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:19:55,725][08744] Avg episode reward: [(0, '21.917')] +[2023-02-25 17:19:55,738][14400] Saving new best policy, reward=21.917! +[2023-02-25 17:20:00,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 2637824. Throughput: 0: 809.9. Samples: 658714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:20:00,728][08744] Avg episode reward: [(0, '22.207')] +[2023-02-25 17:20:00,730][14400] Saving new best policy, reward=22.207! +[2023-02-25 17:20:05,727][08744] Fps is (10 sec: 2865.9, 60 sec: 3208.3, 300 sec: 3137.9). Total num frames: 2650112. Throughput: 0: 760.3. Samples: 662528. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:20:05,731][08744] Avg episode reward: [(0, '21.734')] +[2023-02-25 17:20:09,505][14414] Updated weights for policy 0, policy_version 650 (0.0021) +[2023-02-25 17:20:10,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3151.8). Total num frames: 2666496. Throughput: 0: 771.8. Samples: 666576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:20:10,728][08744] Avg episode reward: [(0, '22.612')] +[2023-02-25 17:20:10,731][14400] Saving new best policy, reward=22.612! +[2023-02-25 17:20:15,723][08744] Fps is (10 sec: 3278.3, 60 sec: 3072.0, 300 sec: 3151.8). Total num frames: 2682880. Throughput: 0: 795.2. Samples: 669416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:20:15,725][08744] Avg episode reward: [(0, '23.123')] +[2023-02-25 17:20:15,740][14400] Saving new best policy, reward=23.123! +[2023-02-25 17:20:20,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3072.1, 300 sec: 3138.0). Total num frames: 2699264. Throughput: 0: 796.6. Samples: 675108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-25 17:20:20,731][08744] Avg episode reward: [(0, '22.541')] +[2023-02-25 17:20:20,982][14414] Updated weights for policy 0, policy_version 660 (0.0016) +[2023-02-25 17:20:25,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3124.1). Total num frames: 2711552. Throughput: 0: 768.2. Samples: 678740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:20:25,726][08744] Avg episode reward: [(0, '22.754')] +[2023-02-25 17:20:30,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.5, 300 sec: 3151.9). Total num frames: 2727936. Throughput: 0: 768.4. Samples: 680500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:20:30,728][08744] Avg episode reward: [(0, '23.468')] +[2023-02-25 17:20:30,733][14400] Saving new best policy, reward=23.468! 
+[2023-02-25 17:20:34,883][14414] Updated weights for policy 0, policy_version 670 (0.0015) +[2023-02-25 17:20:35,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3072.0, 300 sec: 3151.8). Total num frames: 2744320. Throughput: 0: 788.2. Samples: 685588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:20:35,725][08744] Avg episode reward: [(0, '24.594')] +[2023-02-25 17:20:35,741][14400] Saving new best policy, reward=24.594! +[2023-02-25 17:20:40,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3151.8). Total num frames: 2760704. Throughput: 0: 787.3. Samples: 691218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:20:40,731][08744] Avg episode reward: [(0, '25.368')] +[2023-02-25 17:20:40,739][14400] Saving new best policy, reward=25.368! +[2023-02-25 17:20:45,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3072.1, 300 sec: 3124.1). Total num frames: 2772992. Throughput: 0: 761.0. Samples: 692960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-25 17:20:45,729][08744] Avg episode reward: [(0, '25.285')] +[2023-02-25 17:20:49,552][14414] Updated weights for policy 0, policy_version 680 (0.0038) +[2023-02-25 17:20:50,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3124.1). Total num frames: 2785280. Throughput: 0: 757.9. Samples: 696630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-25 17:20:50,731][08744] Avg episode reward: [(0, '24.226')] +[2023-02-25 17:20:55,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3138.0). Total num frames: 2801664. Throughput: 0: 772.2. Samples: 701324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-25 17:20:55,725][08744] Avg episode reward: [(0, '22.987')] +[2023-02-25 17:21:00,723][08744] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 3110.2). Total num frames: 2813952. Throughput: 0: 750.3. Samples: 703178. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:21:00,730][08744] Avg episode reward: [(0, '23.022')] +[2023-02-25 17:21:05,715][14414] Updated weights for policy 0, policy_version 690 (0.0025) +[2023-02-25 17:21:05,725][08744] Fps is (10 sec: 2457.3, 60 sec: 2935.6, 300 sec: 3082.4). Total num frames: 2826240. Throughput: 0: 691.8. Samples: 706242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:21:05,730][08744] Avg episode reward: [(0, '21.652')] +[2023-02-25 17:21:10,723][08744] Fps is (10 sec: 2457.6, 60 sec: 2867.2, 300 sec: 3082.4). Total num frames: 2838528. Throughput: 0: 689.7. Samples: 709776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:21:10,730][08744] Avg episode reward: [(0, '21.176')] +[2023-02-25 17:21:15,723][08744] Fps is (10 sec: 2867.6, 60 sec: 2867.2, 300 sec: 3096.3). Total num frames: 2854912. Throughput: 0: 698.5. Samples: 711932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:21:15,725][08744] Avg episode reward: [(0, '19.658')] +[2023-02-25 17:21:15,736][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000697_2854912.pth... +[2023-02-25 17:21:15,848][14400] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000517_2117632.pth +[2023-02-25 17:21:17,765][14414] Updated weights for policy 0, policy_version 700 (0.0013) +[2023-02-25 17:21:20,723][08744] Fps is (10 sec: 3686.4, 60 sec: 2935.5, 300 sec: 3110.2). Total num frames: 2875392. Throughput: 0: 731.2. Samples: 718492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:21:20,725][08744] Avg episode reward: [(0, '18.954')] +[2023-02-25 17:21:25,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3003.7, 300 sec: 3096.3). Total num frames: 2891776. Throughput: 0: 718.2. Samples: 723536. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:21:25,725][08744] Avg episode reward: [(0, '20.180')] +[2023-02-25 17:21:30,723][08744] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 3096.3). Total num frames: 2904064. Throughput: 0: 720.1. Samples: 725364. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:21:30,726][08744] Avg episode reward: [(0, '19.764')] +[2023-02-25 17:21:31,427][14414] Updated weights for policy 0, policy_version 710 (0.0042) +[2023-02-25 17:21:35,723][08744] Fps is (10 sec: 2867.2, 60 sec: 2935.5, 300 sec: 3110.2). Total num frames: 2920448. Throughput: 0: 722.2. Samples: 729130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:21:35,729][08744] Avg episode reward: [(0, '20.469')] +[2023-02-25 17:21:40,723][08744] Fps is (10 sec: 3276.8, 60 sec: 2935.5, 300 sec: 3096.3). Total num frames: 2936832. Throughput: 0: 745.6. Samples: 734876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:21:40,729][08744] Avg episode reward: [(0, '20.133')] +[2023-02-25 17:21:43,009][14414] Updated weights for policy 0, policy_version 720 (0.0016) +[2023-02-25 17:21:45,723][08744] Fps is (10 sec: 3276.6, 60 sec: 3003.7, 300 sec: 3096.3). Total num frames: 2953216. Throughput: 0: 768.3. Samples: 737750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:21:45,726][08744] Avg episode reward: [(0, '20.151')] +[2023-02-25 17:21:50,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 3082.4). Total num frames: 2965504. Throughput: 0: 788.7. Samples: 741734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:21:50,731][08744] Avg episode reward: [(0, '20.098')] +[2023-02-25 17:21:55,723][08744] Fps is (10 sec: 2867.3, 60 sec: 3003.7, 300 sec: 3096.3). Total num frames: 2981888. Throughput: 0: 797.4. Samples: 745660. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:21:55,725][08744] Avg episode reward: [(0, '20.816')] +[2023-02-25 17:21:57,326][14414] Updated weights for policy 0, policy_version 730 (0.0022) +[2023-02-25 17:22:00,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 3002368. Throughput: 0: 813.6. Samples: 748544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:22:00,730][08744] Avg episode reward: [(0, '19.674')] +[2023-02-25 17:22:05,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.6, 300 sec: 3110.2). Total num frames: 3018752. Throughput: 0: 797.8. Samples: 754392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:22:05,726][08744] Avg episode reward: [(0, '19.218')] +[2023-02-25 17:22:10,111][14414] Updated weights for policy 0, policy_version 740 (0.0014) +[2023-02-25 17:22:10,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3096.3). Total num frames: 3031040. Throughput: 0: 767.8. Samples: 758088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:22:10,727][08744] Avg episode reward: [(0, '19.907')] +[2023-02-25 17:22:15,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 3043328. Throughput: 0: 766.3. Samples: 759846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:22:15,726][08744] Avg episode reward: [(0, '21.542')] +[2023-02-25 17:22:20,724][08744] Fps is (10 sec: 3276.2, 60 sec: 3140.2, 300 sec: 3110.2). Total num frames: 3063808. Throughput: 0: 798.5. Samples: 765066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:22:20,727][08744] Avg episode reward: [(0, '22.363')] +[2023-02-25 17:22:22,415][14414] Updated weights for policy 0, policy_version 750 (0.0035) +[2023-02-25 17:22:25,723][08744] Fps is (10 sec: 3686.3, 60 sec: 3140.2, 300 sec: 3110.2). Total num frames: 3080192. Throughput: 0: 795.6. Samples: 770678. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:22:25,728][08744] Avg episode reward: [(0, '23.141')] +[2023-02-25 17:22:30,723][08744] Fps is (10 sec: 2867.7, 60 sec: 3140.3, 300 sec: 3096.3). Total num frames: 3092480. Throughput: 0: 772.2. Samples: 772500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:22:30,733][08744] Avg episode reward: [(0, '23.033')] +[2023-02-25 17:22:35,723][08744] Fps is (10 sec: 2457.7, 60 sec: 3072.0, 300 sec: 3096.3). Total num frames: 3104768. Throughput: 0: 763.8. Samples: 776104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-25 17:22:35,725][08744] Avg episode reward: [(0, '23.687')] +[2023-02-25 17:22:37,069][14414] Updated weights for policy 0, policy_version 760 (0.0023) +[2023-02-25 17:22:40,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3125248. Throughput: 0: 803.8. Samples: 781830. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:22:40,725][08744] Avg episode reward: [(0, '23.476')] +[2023-02-25 17:22:45,723][08744] Fps is (10 sec: 4095.9, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 3145728. Throughput: 0: 802.7. Samples: 784664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:22:45,729][08744] Avg episode reward: [(0, '22.540')] +[2023-02-25 17:22:48,394][14414] Updated weights for policy 0, policy_version 770 (0.0036) +[2023-02-25 17:22:50,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 3158016. Throughput: 0: 774.5. Samples: 789246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:22:50,725][08744] Avg episode reward: [(0, '21.947')] +[2023-02-25 17:22:55,723][08744] Fps is (10 sec: 2457.7, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 3170304. Throughput: 0: 774.5. Samples: 792940. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:22:55,730][08744] Avg episode reward: [(0, '22.382')] +[2023-02-25 17:23:00,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3190784. Throughput: 0: 795.3. Samples: 795634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:23:00,726][08744] Avg episode reward: [(0, '20.137')] +[2023-02-25 17:23:01,283][14414] Updated weights for policy 0, policy_version 780 (0.0032) +[2023-02-25 17:23:05,723][08744] Fps is (10 sec: 4096.0, 60 sec: 3208.5, 300 sec: 3138.0). Total num frames: 3211264. Throughput: 0: 811.4. Samples: 801578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:23:05,725][08744] Avg episode reward: [(0, '19.357')] +[2023-02-25 17:23:10,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 3223552. Throughput: 0: 785.3. Samples: 806016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:23:10,727][08744] Avg episode reward: [(0, '19.773')] +[2023-02-25 17:23:15,721][14414] Updated weights for policy 0, policy_version 790 (0.0037) +[2023-02-25 17:23:15,724][08744] Fps is (10 sec: 2457.4, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 3235840. Throughput: 0: 782.3. Samples: 807706. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-25 17:23:15,727][08744] Avg episode reward: [(0, '20.982')] +[2023-02-25 17:23:15,737][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_3235840.pth... +[2023-02-25 17:23:15,910][14400] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000610_2498560.pth +[2023-02-25 17:23:20,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.4, 300 sec: 3124.1). Total num frames: 3252224. Throughput: 0: 810.4. Samples: 812574. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:23:20,731][08744] Avg episode reward: [(0, '19.469')] +[2023-02-25 17:23:25,723][08744] Fps is (10 sec: 3686.7, 60 sec: 3208.5, 300 sec: 3138.0). Total num frames: 3272704. Throughput: 0: 813.6. Samples: 818444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:23:25,731][08744] Avg episode reward: [(0, '20.064')] +[2023-02-25 17:23:26,493][14414] Updated weights for policy 0, policy_version 800 (0.0033) +[2023-02-25 17:23:30,723][08744] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 3284992. Throughput: 0: 799.1. Samples: 820622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:23:30,726][08744] Avg episode reward: [(0, '21.453')] +[2023-02-25 17:23:35,724][08744] Fps is (10 sec: 2457.2, 60 sec: 3208.4, 300 sec: 3110.2). Total num frames: 3297280. Throughput: 0: 779.9. Samples: 824344. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:23:35,728][08744] Avg episode reward: [(0, '22.085')] +[2023-02-25 17:23:40,492][14414] Updated weights for policy 0, policy_version 810 (0.0034) +[2023-02-25 17:23:40,723][08744] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3138.0). Total num frames: 3317760. Throughput: 0: 811.9. Samples: 829474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:23:40,730][08744] Avg episode reward: [(0, '23.566')] +[2023-02-25 17:23:45,723][08744] Fps is (10 sec: 3687.0, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3334144. Throughput: 0: 814.0. Samples: 832262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:23:45,725][08744] Avg episode reward: [(0, '23.509')] +[2023-02-25 17:23:50,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 3350528. Throughput: 0: 793.6. Samples: 837290. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:23:50,729][08744] Avg episode reward: [(0, '24.037')] +[2023-02-25 17:23:53,389][14414] Updated weights for policy 0, policy_version 820 (0.0023) +[2023-02-25 17:23:55,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 3362816. Throughput: 0: 778.7. Samples: 841058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:23:55,725][08744] Avg episode reward: [(0, '24.403')] +[2023-02-25 17:24:00,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3379200. Throughput: 0: 790.0. Samples: 843256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:24:00,731][08744] Avg episode reward: [(0, '23.389')] +[2023-02-25 17:24:05,131][14414] Updated weights for policy 0, policy_version 830 (0.0026) +[2023-02-25 17:24:05,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3138.0). Total num frames: 3399680. Throughput: 0: 813.5. Samples: 849182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:24:05,731][08744] Avg episode reward: [(0, '23.387')] +[2023-02-25 17:24:10,723][08744] Fps is (10 sec: 3686.2, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 3416064. Throughput: 0: 794.1. Samples: 854180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:24:10,726][08744] Avg episode reward: [(0, '23.680')] +[2023-02-25 17:24:15,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3096.3). Total num frames: 3428352. Throughput: 0: 785.2. Samples: 855956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:24:15,725][08744] Avg episode reward: [(0, '22.665')] +[2023-02-25 17:24:19,194][14414] Updated weights for policy 0, policy_version 840 (0.0034) +[2023-02-25 17:24:20,723][08744] Fps is (10 sec: 2867.4, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 3444736. Throughput: 0: 798.3. Samples: 860264. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:24:20,726][08744] Avg episode reward: [(0, '22.173')] +[2023-02-25 17:24:25,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3138.0). Total num frames: 3465216. Throughput: 0: 816.4. Samples: 866214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:24:25,728][08744] Avg episode reward: [(0, '21.007')] +[2023-02-25 17:24:30,560][14414] Updated weights for policy 0, policy_version 850 (0.0013) +[2023-02-25 17:24:30,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3124.1). Total num frames: 3481600. Throughput: 0: 815.7. Samples: 868968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:24:30,726][08744] Avg episode reward: [(0, '21.696')] +[2023-02-25 17:24:35,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3208.6, 300 sec: 3110.2). Total num frames: 3489792. Throughput: 0: 786.9. Samples: 872702. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-25 17:24:35,729][08744] Avg episode reward: [(0, '22.429')] +[2023-02-25 17:24:40,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3110.2). Total num frames: 3506176. Throughput: 0: 802.6. Samples: 877174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:24:40,728][08744] Avg episode reward: [(0, '23.166')] +[2023-02-25 17:24:44,046][14414] Updated weights for policy 0, policy_version 860 (0.0025) +[2023-02-25 17:24:45,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 3526656. Throughput: 0: 816.8. Samples: 880014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:24:45,730][08744] Avg episode reward: [(0, '23.388')] +[2023-02-25 17:24:50,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 3543040. Throughput: 0: 806.9. Samples: 885494. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:24:50,727][08744] Avg episode reward: [(0, '24.066')] +[2023-02-25 17:24:55,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3110.2). Total num frames: 3555328. Throughput: 0: 779.1. Samples: 889238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:24:55,730][08744] Avg episode reward: [(0, '24.721')] +[2023-02-25 17:24:58,288][14414] Updated weights for policy 0, policy_version 870 (0.0025) +[2023-02-25 17:25:00,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 3571712. Throughput: 0: 779.7. Samples: 891044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:25:00,726][08744] Avg episode reward: [(0, '24.003')] +[2023-02-25 17:25:05,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3588096. Throughput: 0: 809.7. Samples: 896700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:25:05,725][08744] Avg episode reward: [(0, '24.185')] +[2023-02-25 17:25:09,168][14414] Updated weights for policy 0, policy_version 880 (0.0033) +[2023-02-25 17:25:10,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3604480. Throughput: 0: 799.2. Samples: 902176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:25:10,727][08744] Avg episode reward: [(0, '23.144')] +[2023-02-25 17:25:15,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3124.1). Total num frames: 3620864. Throughput: 0: 777.8. Samples: 903968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:25:15,729][08744] Avg episode reward: [(0, '22.641')] +[2023-02-25 17:25:15,743][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000884_3620864.pth... 
+[2023-02-25 17:25:15,911][14400] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000697_2854912.pth +[2023-02-25 17:25:20,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3633152. Throughput: 0: 776.5. Samples: 907644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:25:20,726][08744] Avg episode reward: [(0, '22.378')] +[2023-02-25 17:25:23,288][14414] Updated weights for policy 0, policy_version 890 (0.0019) +[2023-02-25 17:25:25,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3138.0). Total num frames: 3653632. Throughput: 0: 804.3. Samples: 913368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:25:25,730][08744] Avg episode reward: [(0, '22.687')] +[2023-02-25 17:25:30,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3140.3, 300 sec: 3138.0). Total num frames: 3670016. Throughput: 0: 802.2. Samples: 916114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:25:30,725][08744] Avg episode reward: [(0, '24.259')] +[2023-02-25 17:25:35,725][08744] Fps is (10 sec: 2866.5, 60 sec: 3208.4, 300 sec: 3124.0). Total num frames: 3682304. Throughput: 0: 770.8. Samples: 920180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:25:35,733][08744] Avg episode reward: [(0, '24.641')] +[2023-02-25 17:25:37,023][14414] Updated weights for policy 0, policy_version 900 (0.0027) +[2023-02-25 17:25:40,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3124.1). Total num frames: 3694592. Throughput: 0: 768.8. Samples: 923834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:25:40,730][08744] Avg episode reward: [(0, '24.149')] +[2023-02-25 17:25:45,723][08744] Fps is (10 sec: 3277.6, 60 sec: 3140.3, 300 sec: 3151.8). Total num frames: 3715072. Throughput: 0: 796.7. Samples: 926896. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:25:45,729][08744] Avg episode reward: [(0, '23.459')] +[2023-02-25 17:25:48,032][14414] Updated weights for policy 0, policy_version 910 (0.0038) +[2023-02-25 17:25:50,726][08744] Fps is (10 sec: 4094.5, 60 sec: 3208.3, 300 sec: 3165.7). Total num frames: 3735552. Throughput: 0: 816.5. Samples: 933444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-25 17:25:50,729][08744] Avg episode reward: [(0, '24.422')] +[2023-02-25 17:25:55,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3165.7). Total num frames: 3747840. Throughput: 0: 790.0. Samples: 937726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-25 17:25:55,742][08744] Avg episode reward: [(0, '24.302')] +[2023-02-25 17:26:00,723][08744] Fps is (10 sec: 2868.2, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 3764224. Throughput: 0: 795.4. Samples: 939760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-25 17:26:00,725][08744] Avg episode reward: [(0, '23.426')] +[2023-02-25 17:26:01,254][14414] Updated weights for policy 0, policy_version 920 (0.0026) +[2023-02-25 17:26:05,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3784704. Throughput: 0: 838.7. Samples: 945384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:26:05,725][08744] Avg episode reward: [(0, '23.203')] +[2023-02-25 17:26:10,723][08744] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 3805184. Throughput: 0: 856.0. Samples: 951888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-25 17:26:10,724][08744] Avg episode reward: [(0, '24.365')] +[2023-02-25 17:26:10,878][14414] Updated weights for policy 0, policy_version 930 (0.0019) +[2023-02-25 17:26:15,723][08744] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 3817472. Throughput: 0: 842.5. Samples: 954026. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-25 17:26:15,728][08744] Avg episode reward: [(0, '24.662')] +[2023-02-25 17:26:20,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3193.5). Total num frames: 3833856. Throughput: 0: 840.9. Samples: 958018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:26:20,730][08744] Avg episode reward: [(0, '23.894')] +[2023-02-25 17:26:24,118][14414] Updated weights for policy 0, policy_version 940 (0.0014) +[2023-02-25 17:26:25,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 3854336. Throughput: 0: 889.1. Samples: 963844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:26:25,726][08744] Avg episode reward: [(0, '24.688')] +[2023-02-25 17:26:30,725][08744] Fps is (10 sec: 4095.0, 60 sec: 3413.2, 300 sec: 3235.1). Total num frames: 3874816. Throughput: 0: 895.9. Samples: 967212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:26:30,728][08744] Avg episode reward: [(0, '26.237')] +[2023-02-25 17:26:30,732][14400] Saving new best policy, reward=26.237! +[2023-02-25 17:26:35,190][14414] Updated weights for policy 0, policy_version 950 (0.0021) +[2023-02-25 17:26:35,729][08744] Fps is (10 sec: 3683.9, 60 sec: 3481.4, 300 sec: 3235.1). Total num frames: 3891200. Throughput: 0: 863.6. Samples: 972308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:26:35,736][08744] Avg episode reward: [(0, '25.852')] +[2023-02-25 17:26:40,723][08744] Fps is (10 sec: 2867.9, 60 sec: 3481.6, 300 sec: 3221.3). Total num frames: 3903488. Throughput: 0: 859.0. Samples: 976380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:26:40,725][08744] Avg episode reward: [(0, '26.139')] +[2023-02-25 17:26:45,723][08744] Fps is (10 sec: 3279.0, 60 sec: 3481.6, 300 sec: 3249.0). Total num frames: 3923968. Throughput: 0: 874.4. Samples: 979108. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:26:45,728][08744] Avg episode reward: [(0, '24.856')] +[2023-02-25 17:26:46,934][14414] Updated weights for policy 0, policy_version 960 (0.0017) +[2023-02-25 17:26:50,723][08744] Fps is (10 sec: 4505.6, 60 sec: 3550.1, 300 sec: 3276.8). Total num frames: 3948544. Throughput: 0: 895.7. Samples: 985690. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:26:50,728][08744] Avg episode reward: [(0, '23.068')] +[2023-02-25 17:26:55,723][08744] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3249.0). Total num frames: 3960832. Throughput: 0: 864.3. Samples: 990782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-25 17:26:55,728][08744] Avg episode reward: [(0, '21.279')] +[2023-02-25 17:26:59,384][14414] Updated weights for policy 0, policy_version 970 (0.0023) +[2023-02-25 17:27:00,723][08744] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 3235.1). Total num frames: 3973120. Throughput: 0: 863.1. Samples: 992866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:27:00,727][08744] Avg episode reward: [(0, '21.030')] +[2023-02-25 17:27:05,724][08744] Fps is (10 sec: 2866.7, 60 sec: 3413.2, 300 sec: 3249.0). Total num frames: 3989504. Throughput: 0: 868.6. Samples: 997108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:27:05,730][08744] Avg episode reward: [(0, '20.007')] +[2023-02-25 17:27:10,723][08744] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 4001792. Throughput: 0: 832.0. Samples: 1001286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-25 17:27:10,728][08744] Avg episode reward: [(0, '19.957')] +[2023-02-25 17:27:10,794][14400] Stopping Batcher_0... +[2023-02-25 17:27:10,795][14400] Loop batcher_evt_loop terminating... +[2023-02-25 17:27:10,795][08744] Component Batcher_0 stopped! +[2023-02-25 17:27:10,800][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2023-02-25 17:27:10,842][14414] Weights refcount: 2 0 +[2023-02-25 17:27:10,848][14414] Stopping InferenceWorker_p0-w0... +[2023-02-25 17:27:10,848][14414] Loop inference_proc0-0_evt_loop terminating... +[2023-02-25 17:27:10,855][08744] Component InferenceWorker_p0-w0 stopped! +[2023-02-25 17:27:10,877][14418] Stopping RolloutWorker_w3... +[2023-02-25 17:27:10,878][08744] Component RolloutWorker_w3 stopped! +[2023-02-25 17:27:10,879][14418] Loop rollout_proc3_evt_loop terminating... +[2023-02-25 17:27:10,888][08744] Component RolloutWorker_w7 stopped! +[2023-02-25 17:27:10,887][14422] Stopping RolloutWorker_w7... +[2023-02-25 17:27:10,899][08744] Component RolloutWorker_w1 stopped! +[2023-02-25 17:27:10,904][14415] Stopping RolloutWorker_w1... +[2023-02-25 17:27:10,905][14415] Loop rollout_proc1_evt_loop terminating... +[2023-02-25 17:27:10,912][08744] Component RolloutWorker_w5 stopped! +[2023-02-25 17:27:10,916][14419] Stopping RolloutWorker_w5... +[2023-02-25 17:27:10,918][14419] Loop rollout_proc5_evt_loop terminating... +[2023-02-25 17:27:10,896][14422] Loop rollout_proc7_evt_loop terminating... +[2023-02-25 17:27:11,023][14416] Stopping RolloutWorker_w0... +[2023-02-25 17:27:11,023][08744] Component RolloutWorker_w0 stopped! +[2023-02-25 17:27:11,033][08744] Component RolloutWorker_w4 stopped! +[2023-02-25 17:27:11,033][14420] Stopping RolloutWorker_w4... +[2023-02-25 17:27:11,039][14420] Loop rollout_proc4_evt_loop terminating... +[2023-02-25 17:27:11,044][14400] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_3235840.pth +[2023-02-25 17:27:11,053][14417] Stopping RolloutWorker_w2... +[2023-02-25 17:27:11,054][14417] Loop rollout_proc2_evt_loop terminating... +[2023-02-25 17:27:11,053][08744] Component RolloutWorker_w2 stopped! +[2023-02-25 17:27:11,024][14416] Loop rollout_proc0_evt_loop terminating... 
+[2023-02-25 17:27:11,059][14400] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-25 17:27:11,067][14421] Stopping RolloutWorker_w6... +[2023-02-25 17:27:11,068][14421] Loop rollout_proc6_evt_loop terminating... +[2023-02-25 17:27:11,067][08744] Component RolloutWorker_w6 stopped! +[2023-02-25 17:27:11,395][14400] Stopping LearnerWorker_p0... +[2023-02-25 17:27:11,396][14400] Loop learner_proc0_evt_loop terminating... +[2023-02-25 17:27:11,395][08744] Component LearnerWorker_p0 stopped! +[2023-02-25 17:27:11,397][08744] Waiting for process learner_proc0 to stop... +[2023-02-25 17:27:14,084][08744] Waiting for process inference_proc0-0 to join... +[2023-02-25 17:27:14,943][08744] Waiting for process rollout_proc0 to join... +[2023-02-25 17:27:15,577][08744] Waiting for process rollout_proc1 to join... +[2023-02-25 17:27:15,580][08744] Waiting for process rollout_proc2 to join... +[2023-02-25 17:27:15,581][08744] Waiting for process rollout_proc3 to join... +[2023-02-25 17:27:15,582][08744] Waiting for process rollout_proc4 to join... +[2023-02-25 17:27:15,583][08744] Waiting for process rollout_proc5 to join... +[2023-02-25 17:27:15,584][08744] Waiting for process rollout_proc6 to join... +[2023-02-25 17:27:15,585][08744] Waiting for process rollout_proc7 to join... 
+[2023-02-25 17:27:15,586][08744] Batcher 0 profile tree view: +batching: 27.0065, releasing_batches: 0.0307 +[2023-02-25 17:27:15,589][08744] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0113 + wait_policy_total: 625.5268 +update_model: 9.3480 + weight_update: 0.0030 +one_step: 0.0025 + handle_policy_step: 603.7016 + deserialize: 17.6749, stack: 3.5419, obs_to_device_normalize: 130.3602, forward: 298.9829, send_messages: 30.3434 + prepare_outputs: 93.4677 + to_cpu: 56.6202 +[2023-02-25 17:27:15,590][08744] Learner 0 profile tree view: +misc: 0.0068, prepare_batch: 16.5096 +train: 77.2435 + epoch_init: 0.0100, minibatch_init: 0.0202, losses_postprocess: 0.5078, kl_divergence: 0.6118, after_optimizer: 32.9877 + calculate_losses: 27.5819 + losses_init: 0.0060, forward_head: 2.1287, bptt_initial: 17.6677, tail: 1.2567, advantages_returns: 0.3092, losses: 3.4066 + bptt: 2.4943 + bptt_forward_core: 2.4010 + update: 14.7871 + clip: 1.4763 +[2023-02-25 17:27:15,591][08744] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.4449, enqueue_policy_requests: 186.1918, env_step: 947.4893, overhead: 29.3215, complete_rollouts: 8.0498 +save_policy_outputs: 24.4024 + split_output_tensors: 12.1745 +[2023-02-25 17:27:15,593][08744] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.3633, enqueue_policy_requests: 188.2064, env_step: 945.1326, overhead: 27.9021, complete_rollouts: 8.6555 +save_policy_outputs: 24.3528 + split_output_tensors: 11.4950 +[2023-02-25 17:27:15,594][08744] Loop Runner_EvtLoop terminating... 
+[2023-02-25 17:27:15,596][08744] Runner profile tree view: +main_loop: 1314.5485 +[2023-02-25 17:27:15,601][08744] Collected {0: 4005888}, FPS: 3047.3 +[2023-02-25 17:27:37,701][08744] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2023-02-25 17:27:37,703][08744] Overriding arg 'num_workers' with value 1 passed from command line +[2023-02-25 17:27:37,706][08744] Adding new argument 'no_render'=True that is not in the saved config file! +[2023-02-25 17:27:37,709][08744] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-02-25 17:27:37,711][08744] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-02-25 17:27:37,713][08744] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-02-25 17:27:37,716][08744] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2023-02-25 17:27:37,718][08744] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2023-02-25 17:27:37,719][08744] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2023-02-25 17:27:37,720][08744] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2023-02-25 17:27:37,721][08744] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-02-25 17:27:37,722][08744] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-02-25 17:27:37,725][08744] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-02-25 17:27:37,726][08744] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2023-02-25 17:27:37,729][08744] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-02-25 17:27:37,760][08744] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-25 17:27:37,763][08744] RunningMeanStd input shape: (3, 72, 128) +[2023-02-25 17:27:37,766][08744] RunningMeanStd input shape: (1,) +[2023-02-25 17:27:37,783][08744] ConvEncoder: input_channels=3 +[2023-02-25 17:27:38,469][08744] Conv encoder output size: 512 +[2023-02-25 17:27:38,471][08744] Policy head output size: 512 +[2023-02-25 17:27:41,326][08744] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-25 17:27:42,994][08744] Num frames 100... +[2023-02-25 17:27:43,111][08744] Num frames 200... +[2023-02-25 17:27:43,222][08744] Num frames 300... +[2023-02-25 17:27:43,335][08744] Num frames 400... +[2023-02-25 17:27:43,443][08744] Num frames 500... +[2023-02-25 17:27:43,551][08744] Num frames 600... +[2023-02-25 17:27:43,661][08744] Num frames 700... +[2023-02-25 17:27:43,777][08744] Num frames 800... +[2023-02-25 17:27:43,890][08744] Num frames 900... +[2023-02-25 17:27:44,009][08744] Num frames 1000... +[2023-02-25 17:27:44,133][08744] Num frames 1100... +[2023-02-25 17:27:44,249][08744] Num frames 1200... +[2023-02-25 17:27:44,365][08744] Num frames 1300... +[2023-02-25 17:27:44,483][08744] Num frames 1400... +[2023-02-25 17:27:44,599][08744] Num frames 1500... +[2023-02-25 17:27:44,728][08744] Avg episode rewards: #0: 33.680, true rewards: #0: 15.680 +[2023-02-25 17:27:44,729][08744] Avg episode reward: 33.680, avg true_objective: 15.680 +[2023-02-25 17:27:44,779][08744] Num frames 1600... +[2023-02-25 17:27:44,905][08744] Num frames 1700... +[2023-02-25 17:27:45,031][08744] Num frames 1800... +[2023-02-25 17:27:45,145][08744] Num frames 1900... +[2023-02-25 17:27:45,253][08744] Num frames 2000... +[2023-02-25 17:27:45,363][08744] Num frames 2100... 
+[2023-02-25 17:27:45,473][08744] Num frames 2200... +[2023-02-25 17:27:45,588][08744] Num frames 2300... +[2023-02-25 17:27:45,697][08744] Num frames 2400... +[2023-02-25 17:27:45,808][08744] Num frames 2500... +[2023-02-25 17:27:45,918][08744] Num frames 2600... +[2023-02-25 17:27:46,040][08744] Num frames 2700... +[2023-02-25 17:27:46,159][08744] Num frames 2800... +[2023-02-25 17:27:46,274][08744] Num frames 2900... +[2023-02-25 17:27:46,391][08744] Num frames 3000... +[2023-02-25 17:27:46,514][08744] Num frames 3100... +[2023-02-25 17:27:46,635][08744] Num frames 3200... +[2023-02-25 17:27:46,795][08744] Avg episode rewards: #0: 38.980, true rewards: #0: 16.480 +[2023-02-25 17:27:46,797][08744] Avg episode reward: 38.980, avg true_objective: 16.480 +[2023-02-25 17:27:46,808][08744] Num frames 3300... +[2023-02-25 17:27:46,918][08744] Num frames 3400... +[2023-02-25 17:27:47,043][08744] Num frames 3500... +[2023-02-25 17:27:47,158][08744] Num frames 3600... +[2023-02-25 17:27:47,268][08744] Num frames 3700... +[2023-02-25 17:27:47,377][08744] Num frames 3800... +[2023-02-25 17:27:47,486][08744] Num frames 3900... +[2023-02-25 17:27:47,596][08744] Num frames 4000... +[2023-02-25 17:27:47,719][08744] Num frames 4100... +[2023-02-25 17:27:47,829][08744] Num frames 4200... +[2023-02-25 17:27:47,940][08744] Num frames 4300... +[2023-02-25 17:27:48,048][08744] Num frames 4400... +[2023-02-25 17:27:48,164][08744] Num frames 4500... +[2023-02-25 17:27:48,277][08744] Num frames 4600... +[2023-02-25 17:27:48,389][08744] Num frames 4700... +[2023-02-25 17:27:48,506][08744] Num frames 4800... +[2023-02-25 17:27:48,657][08744] Avg episode rewards: #0: 38.293, true rewards: #0: 16.293 +[2023-02-25 17:27:48,658][08744] Avg episode reward: 38.293, avg true_objective: 16.293 +[2023-02-25 17:27:48,684][08744] Num frames 4900... +[2023-02-25 17:27:48,793][08744] Num frames 5000... +[2023-02-25 17:27:48,904][08744] Num frames 5100... 
+[2023-02-25 17:27:49,025][08744] Num frames 5200... +[2023-02-25 17:27:49,142][08744] Num frames 5300... +[2023-02-25 17:27:49,259][08744] Num frames 5400... +[2023-02-25 17:27:49,370][08744] Num frames 5500... +[2023-02-25 17:27:49,477][08744] Num frames 5600... +[2023-02-25 17:27:49,603][08744] Avg episode rewards: #0: 32.665, true rewards: #0: 14.165 +[2023-02-25 17:27:49,605][08744] Avg episode reward: 32.665, avg true_objective: 14.165 +[2023-02-25 17:27:49,651][08744] Num frames 5700... +[2023-02-25 17:27:49,764][08744] Num frames 5800... +[2023-02-25 17:27:49,876][08744] Num frames 5900... +[2023-02-25 17:27:49,996][08744] Num frames 6000... +[2023-02-25 17:27:50,118][08744] Num frames 6100... +[2023-02-25 17:27:50,228][08744] Num frames 6200... +[2023-02-25 17:27:50,339][08744] Num frames 6300... +[2023-02-25 17:27:50,425][08744] Avg episode rewards: #0: 28.854, true rewards: #0: 12.654 +[2023-02-25 17:27:50,427][08744] Avg episode reward: 28.854, avg true_objective: 12.654 +[2023-02-25 17:27:50,512][08744] Num frames 6400... +[2023-02-25 17:27:50,629][08744] Num frames 6500... +[2023-02-25 17:27:50,751][08744] Num frames 6600... +[2023-02-25 17:27:50,868][08744] Num frames 6700... +[2023-02-25 17:27:50,986][08744] Num frames 6800... +[2023-02-25 17:27:51,095][08744] Num frames 6900... +[2023-02-25 17:27:51,155][08744] Avg episode rewards: #0: 25.672, true rewards: #0: 11.505 +[2023-02-25 17:27:51,157][08744] Avg episode reward: 25.672, avg true_objective: 11.505 +[2023-02-25 17:27:51,267][08744] Num frames 7000... +[2023-02-25 17:27:51,376][08744] Num frames 7100... +[2023-02-25 17:27:51,489][08744] Num frames 7200... +[2023-02-25 17:27:51,599][08744] Num frames 7300... +[2023-02-25 17:27:51,705][08744] Num frames 7400... +[2023-02-25 17:27:51,814][08744] Num frames 7500... +[2023-02-25 17:27:51,922][08744] Num frames 7600... 
+[2023-02-25 17:27:51,997][08744] Avg episode rewards: #0: 24.310, true rewards: #0: 10.881 +[2023-02-25 17:27:51,999][08744] Avg episode reward: 24.310, avg true_objective: 10.881 +[2023-02-25 17:27:52,104][08744] Num frames 7700... +[2023-02-25 17:27:52,218][08744] Num frames 7800... +[2023-02-25 17:27:52,331][08744] Num frames 7900... +[2023-02-25 17:27:52,441][08744] Num frames 8000... +[2023-02-25 17:27:52,551][08744] Num frames 8100... +[2023-02-25 17:27:52,661][08744] Num frames 8200... +[2023-02-25 17:27:52,770][08744] Num frames 8300... +[2023-02-25 17:27:52,927][08744] Num frames 8400... +[2023-02-25 17:27:53,094][08744] Num frames 8500... +[2023-02-25 17:27:53,250][08744] Num frames 8600... +[2023-02-25 17:27:53,409][08744] Num frames 8700... +[2023-02-25 17:27:53,567][08744] Num frames 8800... +[2023-02-25 17:27:53,727][08744] Num frames 8900... +[2023-02-25 17:27:53,903][08744] Num frames 9000... +[2023-02-25 17:27:54,016][08744] Avg episode rewards: #0: 25.281, true rewards: #0: 11.281 +[2023-02-25 17:27:54,021][08744] Avg episode reward: 25.281, avg true_objective: 11.281 +[2023-02-25 17:27:54,139][08744] Num frames 9100... +[2023-02-25 17:27:54,302][08744] Num frames 9200... +[2023-02-25 17:27:54,454][08744] Num frames 9300... +[2023-02-25 17:27:54,610][08744] Num frames 9400... +[2023-02-25 17:27:54,770][08744] Num frames 9500... +[2023-02-25 17:27:54,938][08744] Num frames 9600... +[2023-02-25 17:27:55,106][08744] Num frames 9700... +[2023-02-25 17:27:55,268][08744] Num frames 9800... +[2023-02-25 17:27:55,430][08744] Num frames 9900... +[2023-02-25 17:27:55,590][08744] Num frames 10000... +[2023-02-25 17:27:55,751][08744] Num frames 10100... +[2023-02-25 17:27:55,916][08744] Num frames 10200... +[2023-02-25 17:27:56,076][08744] Num frames 10300... +[2023-02-25 17:27:56,238][08744] Num frames 10400... +[2023-02-25 17:27:56,379][08744] Num frames 10500... +[2023-02-25 17:27:56,490][08744] Num frames 10600... 
+[2023-02-25 17:27:56,603][08744] Num frames 10700... +[2023-02-25 17:27:56,707][08744] Avg episode rewards: #0: 27.602, true rewards: #0: 11.936 +[2023-02-25 17:27:56,711][08744] Avg episode reward: 27.602, avg true_objective: 11.936 +[2023-02-25 17:27:56,778][08744] Num frames 10800... +[2023-02-25 17:27:56,902][08744] Num frames 10900... +[2023-02-25 17:27:57,012][08744] Num frames 11000... +[2023-02-25 17:27:57,120][08744] Num frames 11100... +[2023-02-25 17:27:57,228][08744] Num frames 11200... +[2023-02-25 17:27:57,345][08744] Num frames 11300... +[2023-02-25 17:27:57,453][08744] Num frames 11400... +[2023-02-25 17:27:57,560][08744] Num frames 11500... +[2023-02-25 17:27:57,675][08744] Num frames 11600... +[2023-02-25 17:27:57,785][08744] Num frames 11700... +[2023-02-25 17:27:57,898][08744] Num frames 11800... +[2023-02-25 17:27:58,006][08744] Num frames 11900... +[2023-02-25 17:27:58,118][08744] Num frames 12000... +[2023-02-25 17:27:58,231][08744] Num frames 12100... +[2023-02-25 17:27:58,352][08744] Num frames 12200... +[2023-02-25 17:27:58,464][08744] Num frames 12300... +[2023-02-25 17:27:58,576][08744] Num frames 12400... +[2023-02-25 17:27:58,692][08744] Num frames 12500... +[2023-02-25 17:27:58,804][08744] Num frames 12600... +[2023-02-25 17:27:58,925][08744] Num frames 12700... +[2023-02-25 17:27:59,043][08744] Num frames 12800... +[2023-02-25 17:27:59,146][08744] Avg episode rewards: #0: 30.342, true rewards: #0: 12.842 +[2023-02-25 17:27:59,148][08744] Avg episode reward: 30.342, avg true_objective: 12.842 +[2023-02-25 17:29:20,408][08744] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2023-02-25 17:32:34,122][08744] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2023-02-25 17:32:34,128][08744] Overriding arg 'num_workers' with value 1 passed from command line +[2023-02-25 17:32:34,130][08744] Adding new argument 'no_render'=True that is not in the saved config file! 
+[2023-02-25 17:32:34,133][08744] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-02-25 17:32:34,138][08744] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-02-25 17:32:34,139][08744] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-02-25 17:32:34,141][08744] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2023-02-25 17:32:34,142][08744] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2023-02-25 17:32:34,143][08744] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2023-02-25 17:32:34,147][08744] Adding new argument 'hf_repository'='akgeni/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2023-02-25 17:32:34,149][08744] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-02-25 17:32:34,151][08744] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-02-25 17:32:34,152][08744] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-02-25 17:32:34,154][08744] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2023-02-25 17:32:34,157][08744] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-02-25 17:32:34,192][08744] RunningMeanStd input shape: (3, 72, 128) +[2023-02-25 17:32:34,194][08744] RunningMeanStd input shape: (1,) +[2023-02-25 17:32:34,219][08744] ConvEncoder: input_channels=3 +[2023-02-25 17:32:34,282][08744] Conv encoder output size: 512 +[2023-02-25 17:32:34,284][08744] Policy head output size: 512 +[2023-02-25 17:32:34,313][08744] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-25 17:32:34,933][08744] Num frames 100... +[2023-02-25 17:32:35,084][08744] Num frames 200... 
+[2023-02-25 17:32:35,248][08744] Num frames 300... +[2023-02-25 17:32:35,408][08744] Num frames 400... +[2023-02-25 17:32:35,559][08744] Num frames 500... +[2023-02-25 17:32:35,710][08744] Num frames 600... +[2023-02-25 17:32:35,782][08744] Avg episode rewards: #0: 9.080, true rewards: #0: 6.080 +[2023-02-25 17:32:35,784][08744] Avg episode reward: 9.080, avg true_objective: 6.080 +[2023-02-25 17:32:35,939][08744] Num frames 700... +[2023-02-25 17:32:36,097][08744] Num frames 800... +[2023-02-25 17:32:36,257][08744] Num frames 900... +[2023-02-25 17:32:36,428][08744] Num frames 1000... +[2023-02-25 17:32:36,520][08744] Avg episode rewards: #0: 7.610, true rewards: #0: 5.110 +[2023-02-25 17:32:36,522][08744] Avg episode reward: 7.610, avg true_objective: 5.110 +[2023-02-25 17:32:36,663][08744] Num frames 1100... +[2023-02-25 17:32:36,820][08744] Num frames 1200... +[2023-02-25 17:32:36,968][08744] Num frames 1300... +[2023-02-25 17:32:37,078][08744] Num frames 1400... +[2023-02-25 17:32:37,197][08744] Num frames 1500... +[2023-02-25 17:32:37,309][08744] Num frames 1600... +[2023-02-25 17:32:37,436][08744] Num frames 1700... +[2023-02-25 17:32:37,546][08744] Num frames 1800... +[2023-02-25 17:32:37,658][08744] Num frames 1900... +[2023-02-25 17:32:37,766][08744] Num frames 2000... +[2023-02-25 17:32:37,879][08744] Num frames 2100... +[2023-02-25 17:32:37,994][08744] Num frames 2200... +[2023-02-25 17:32:38,110][08744] Num frames 2300... +[2023-02-25 17:32:38,221][08744] Num frames 2400... +[2023-02-25 17:32:38,359][08744] Num frames 2500... +[2023-02-25 17:32:38,482][08744] Num frames 2600... +[2023-02-25 17:32:38,596][08744] Num frames 2700... +[2023-02-25 17:32:38,709][08744] Num frames 2800... +[2023-02-25 17:32:38,820][08744] Num frames 2900... +[2023-02-25 17:32:38,931][08744] Num frames 3000... +[2023-02-25 17:32:39,058][08744] Num frames 3100... 
+[2023-02-25 17:32:39,146][08744] Avg episode rewards: #0: 22.073, true rewards: #0: 10.407 +[2023-02-25 17:32:39,151][08744] Avg episode reward: 22.073, avg true_objective: 10.407 +[2023-02-25 17:32:39,239][08744] Num frames 3200... +[2023-02-25 17:32:39,355][08744] Num frames 3300... +[2023-02-25 17:32:39,473][08744] Num frames 3400... +[2023-02-25 17:32:39,541][08744] Avg episode rewards: #0: 17.775, true rewards: #0: 8.525 +[2023-02-25 17:32:39,542][08744] Avg episode reward: 17.775, avg true_objective: 8.525 +[2023-02-25 17:32:39,645][08744] Num frames 3500... +[2023-02-25 17:32:39,763][08744] Num frames 3600... +[2023-02-25 17:32:39,882][08744] Num frames 3700... +[2023-02-25 17:32:40,005][08744] Num frames 3800... +[2023-02-25 17:32:40,122][08744] Num frames 3900... +[2023-02-25 17:32:40,231][08744] Num frames 4000... +[2023-02-25 17:32:40,346][08744] Num frames 4100... +[2023-02-25 17:32:40,460][08744] Num frames 4200... +[2023-02-25 17:32:40,586][08744] Num frames 4300... +[2023-02-25 17:32:40,694][08744] Num frames 4400... +[2023-02-25 17:32:40,804][08744] Num frames 4500... +[2023-02-25 17:32:40,914][08744] Num frames 4600... +[2023-02-25 17:32:41,031][08744] Num frames 4700... +[2023-02-25 17:32:41,147][08744] Num frames 4800... +[2023-02-25 17:32:41,257][08744] Num frames 4900... +[2023-02-25 17:32:41,366][08744] Num frames 5000... +[2023-02-25 17:32:41,480][08744] Num frames 5100... +[2023-02-25 17:32:41,584][08744] Avg episode rewards: #0: 22.876, true rewards: #0: 10.276 +[2023-02-25 17:32:41,586][08744] Avg episode reward: 22.876, avg true_objective: 10.276 +[2023-02-25 17:32:41,656][08744] Num frames 5200... +[2023-02-25 17:32:41,773][08744] Num frames 5300... +[2023-02-25 17:32:41,903][08744] Num frames 5400... +[2023-02-25 17:32:42,010][08744] Avg episode rewards: #0: 19.898, true rewards: #0: 9.065 +[2023-02-25 17:32:42,011][08744] Avg episode reward: 19.898, avg true_objective: 9.065 +[2023-02-25 17:32:42,086][08744] Num frames 5500... 
+[2023-02-25 17:32:42,197][08744] Num frames 5600... +[2023-02-25 17:32:42,307][08744] Num frames 5700... +[2023-02-25 17:32:42,418][08744] Num frames 5800... +[2023-02-25 17:32:42,529][08744] Num frames 5900... +[2023-02-25 17:32:42,635][08744] Num frames 6000... +[2023-02-25 17:32:42,741][08744] Num frames 6100... +[2023-02-25 17:32:42,857][08744] Num frames 6200... +[2023-02-25 17:32:42,922][08744] Avg episode rewards: #0: 18.867, true rewards: #0: 8.867 +[2023-02-25 17:32:42,923][08744] Avg episode reward: 18.867, avg true_objective: 8.867 +[2023-02-25 17:32:43,030][08744] Num frames 6300... +[2023-02-25 17:32:43,139][08744] Num frames 6400... +[2023-02-25 17:32:43,249][08744] Num frames 6500... +[2023-02-25 17:32:43,365][08744] Num frames 6600... +[2023-02-25 17:32:43,493][08744] Num frames 6700... +[2023-02-25 17:32:43,616][08744] Num frames 6800... +[2023-02-25 17:32:43,734][08744] Num frames 6900... +[2023-02-25 17:32:43,855][08744] Avg episode rewards: #0: 18.696, true rewards: #0: 8.696 +[2023-02-25 17:32:43,856][08744] Avg episode reward: 18.696, avg true_objective: 8.696 +[2023-02-25 17:32:43,908][08744] Num frames 7000... +[2023-02-25 17:32:44,017][08744] Num frames 7100... +[2023-02-25 17:32:44,129][08744] Num frames 7200... +[2023-02-25 17:32:44,271][08744] Avg episode rewards: #0: 17.310, true rewards: #0: 8.088 +[2023-02-25 17:32:44,274][08744] Avg episode reward: 17.310, avg true_objective: 8.088 +[2023-02-25 17:32:44,302][08744] Num frames 7300... +[2023-02-25 17:32:44,410][08744] Num frames 7400... +[2023-02-25 17:32:44,534][08744] Num frames 7500... +[2023-02-25 17:32:44,643][08744] Num frames 7600... +[2023-02-25 17:32:44,765][08744] Avg episode rewards: #0: 15.963, true rewards: #0: 7.663 +[2023-02-25 17:32:44,767][08744] Avg episode reward: 15.963, avg true_objective: 7.663 +[2023-02-25 17:33:35,350][08744] Replay video saved to /content/train_dir/default_experiment/replay.mp4!