[2024-12-01 11:06:37,002][02154] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-01 11:06:37,007][02154] Rollout worker 0 uses device cpu
[2024-12-01 11:06:37,012][02154] Rollout worker 1 uses device cpu
[2024-12-01 11:06:37,016][02154] Rollout worker 2 uses device cpu
[2024-12-01 11:06:37,018][02154] Rollout worker 3 uses device cpu
[2024-12-01 11:06:37,035][02154] Rollout worker 4 uses device cpu
[2024-12-01 11:06:37,036][02154] Rollout worker 5 uses device cpu
[2024-12-01 11:06:37,039][02154] Rollout worker 6 uses device cpu
[2024-12-01 11:06:37,042][02154] Rollout worker 7 uses device cpu
[2024-12-01 11:06:37,453][02154] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-01 11:06:37,458][02154] InferenceWorker_p0-w0: min num requests: 2
[2024-12-01 11:06:37,569][02154] Starting all processes...
[2024-12-01 11:06:37,576][02154] Starting process learner_proc0
[2024-12-01 11:06:37,702][02154] Starting all processes...
[2024-12-01 11:06:37,795][02154] Starting process inference_proc0-0
[2024-12-01 11:06:37,796][02154] Starting process rollout_proc0
[2024-12-01 11:06:37,801][02154] Starting process rollout_proc1
[2024-12-01 11:06:37,801][02154] Starting process rollout_proc2
[2024-12-01 11:06:37,801][02154] Starting process rollout_proc3
[2024-12-01 11:06:37,801][02154] Starting process rollout_proc4
[2024-12-01 11:06:37,801][02154] Starting process rollout_proc5
[2024-12-01 11:06:37,801][02154] Starting process rollout_proc6
[2024-12-01 11:06:37,801][02154] Starting process rollout_proc7
[2024-12-01 11:06:54,464][04297] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-01 11:06:54,468][04297] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-01 11:06:54,558][04297] Num visible devices: 1
[2024-12-01 11:06:54,614][04297] Starting seed is not provided
[2024-12-01 11:06:54,616][04297] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-01 11:06:54,617][04297] Initializing actor-critic model on device cuda:0
[2024-12-01 11:06:54,624][04297] RunningMeanStd input shape: (3, 72, 128)
[2024-12-01 11:06:54,636][04297] RunningMeanStd input shape: (1,)
[2024-12-01 11:06:54,714][04297] ConvEncoder: input_channels=3
[2024-12-01 11:06:54,762][04318] Worker 7 uses CPU cores [1]
[2024-12-01 11:06:54,951][04310] Worker 0 uses CPU cores [0]
[2024-12-01 11:06:54,960][04311] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-01 11:06:54,960][04311] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-01 11:06:55,034][04311] Num visible devices: 1
[2024-12-01 11:06:55,226][04314] Worker 3 uses CPU cores [1]
[2024-12-01 11:06:55,240][04312] Worker 1 uses CPU cores [1]
[2024-12-01 11:06:55,259][04313] Worker 2 uses CPU cores [0]
[2024-12-01 11:06:55,286][04317] Worker 6 uses CPU cores [0]
[2024-12-01 11:06:55,314][04315] Worker 4 uses CPU cores [0]
[2024-12-01 11:06:55,337][04316] Worker 5 uses CPU cores [1]
[2024-12-01 11:06:55,372][04297] Conv encoder output size: 512
[2024-12-01 11:06:55,373][04297] Policy head output size: 512
[2024-12-01 11:06:55,465][04297] Created Actor Critic model with architecture:
[2024-12-01 11:06:55,465][04297] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-01 11:06:56,018][04297] Using optimizer
[2024-12-01 11:06:57,417][02154] Heartbeat connected on Batcher_0
[2024-12-01 11:06:57,454][02154] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-01 11:06:57,481][02154] Heartbeat connected on RolloutWorker_w0
[2024-12-01 11:06:57,495][02154] Heartbeat connected on RolloutWorker_w1
[2024-12-01 11:06:57,510][02154] Heartbeat connected on RolloutWorker_w2
[2024-12-01 11:06:57,524][02154] Heartbeat connected on RolloutWorker_w3
[2024-12-01 11:06:57,530][02154] Heartbeat connected on RolloutWorker_w4
[2024-12-01 11:06:57,540][02154] Heartbeat connected on RolloutWorker_w5
[2024-12-01 11:06:57,547][02154] Heartbeat connected on RolloutWorker_w6
[2024-12-01 11:06:57,554][02154] Heartbeat connected on RolloutWorker_w7
[2024-12-01 11:06:59,419][04297] No checkpoints found
[2024-12-01 11:06:59,420][04297] Did not load from checkpoint, starting from scratch!
[2024-12-01 11:06:59,420][04297] Initialized policy 0 weights for model version 0
[2024-12-01 11:06:59,424][04297] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-01 11:06:59,431][04297] LearnerWorker_p0 finished initialization!
[2024-12-01 11:06:59,432][02154] Heartbeat connected on LearnerWorker_p0
[2024-12-01 11:06:59,545][02154] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-01 11:06:59,623][04311] RunningMeanStd input shape: (3, 72, 128)
[2024-12-01 11:06:59,624][04311] RunningMeanStd input shape: (1,)
[2024-12-01 11:06:59,636][04311] ConvEncoder: input_channels=3
[2024-12-01 11:06:59,743][04311] Conv encoder output size: 512
[2024-12-01 11:06:59,743][04311] Policy head output size: 512
[2024-12-01 11:06:59,806][02154] Inference worker 0-0 is ready!
[2024-12-01 11:06:59,808][02154] All inference workers are ready! Signal rollout workers to start!
[2024-12-01 11:06:59,998][04318] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-01 11:07:00,001][04314] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-01 11:07:00,002][04316] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-01 11:07:00,003][04312] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-01 11:07:00,012][04315] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-01 11:07:00,013][04310] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-01 11:07:00,008][04313] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-01 11:07:00,018][04317] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-01 11:07:01,031][04317] Decorrelating experience for 0 frames...
[2024-12-01 11:07:01,030][04315] Decorrelating experience for 0 frames...
[2024-12-01 11:07:01,708][04318] Decorrelating experience for 0 frames...
[2024-12-01 11:07:01,718][04312] Decorrelating experience for 0 frames...
[2024-12-01 11:07:01,715][04316] Decorrelating experience for 0 frames...
[2024-12-01 11:07:01,725][04314] Decorrelating experience for 0 frames...
[2024-12-01 11:07:02,165][04315] Decorrelating experience for 32 frames...
[2024-12-01 11:07:02,191][04313] Decorrelating experience for 0 frames...
[2024-12-01 11:07:03,802][04316] Decorrelating experience for 32 frames...
[2024-12-01 11:07:03,808][04318] Decorrelating experience for 32 frames...
[2024-12-01 11:07:03,816][04312] Decorrelating experience for 32 frames...
[2024-12-01 11:07:04,130][04317] Decorrelating experience for 32 frames...
[2024-12-01 11:07:04,525][04314] Decorrelating experience for 32 frames...
[2024-12-01 11:07:04,545][02154] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-01 11:07:04,624][04315] Decorrelating experience for 64 frames...
[2024-12-01 11:07:05,380][04310] Decorrelating experience for 0 frames...
[2024-12-01 11:07:05,384][04313] Decorrelating experience for 32 frames...
[2024-12-01 11:07:06,428][04316] Decorrelating experience for 64 frames...
[2024-12-01 11:07:06,437][04312] Decorrelating experience for 64 frames...
[2024-12-01 11:07:06,444][04318] Decorrelating experience for 64 frames...
[2024-12-01 11:07:06,902][04313] Decorrelating experience for 64 frames...
[2024-12-01 11:07:06,974][04314] Decorrelating experience for 64 frames...
[2024-12-01 11:07:07,872][04316] Decorrelating experience for 96 frames...
[2024-12-01 11:07:07,878][04310] Decorrelating experience for 32 frames...
[2024-12-01 11:07:07,883][04318] Decorrelating experience for 96 frames...
[2024-12-01 11:07:07,912][04315] Decorrelating experience for 96 frames...
[2024-12-01 11:07:08,694][04312] Decorrelating experience for 96 frames...
[2024-12-01 11:07:09,131][04317] Decorrelating experience for 64 frames...
[2024-12-01 11:07:09,134][04313] Decorrelating experience for 96 frames...
[2024-12-01 11:07:09,447][04310] Decorrelating experience for 64 frames...
[2024-12-01 11:07:09,545][02154] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-01 11:07:09,858][04314] Decorrelating experience for 96 frames...
[2024-12-01 11:07:10,065][04317] Decorrelating experience for 96 frames...
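The "Created Actor Critic model" printout above can be reproduced as a small PyTorch module: three Conv2d+ELU layers, a Linear+ELU projection to 512, a GRU(512, 512) core, and the two heads (critic to 1 value, action logits to 5 actions). A minimal sketch follows; the kernel sizes and strides are assumptions (Atari-style), since the log prints layer types but not their hyperparameters, and normalizers are omitted. The class name `ActorCriticSketch` is hypothetical.

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Hedged reconstruction of the logged ActorCriticSharedWeights layout."""

    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        # Three Conv2d+ELU pairs as in the printed conv_head; the kernel/stride
        # values below are ASSUMED (the log does not show them).
        self.conv_head = nn.Sequential(
            nn.Conv2d(obs_shape[0], 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # Size the Linear input from a dummy forward pass, so the sketch works
        # regardless of the exact conv hyperparameters chosen above.
        with torch.no_grad():
            n_flat = self.conv_head(torch.zeros(1, *obs_shape)).numel()
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)           # GRU(512, 512) in the log
        self.critic_linear = nn.Linear(hidden, 1)    # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # 5 actions

    def forward(self, obs, rnn_state):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, new_state = self.core(x.unsqueeze(0), rnn_state)  # seq len 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), new_state

model = ActorCriticSketch()
obs = torch.zeros(4, 3, 72, 128)      # batch of 4 resized Doom frames
h0 = torch.zeros(1, 4, 512)           # initial GRU state
logits, value, h1 = model(obs, h0)
```

The dummy-forward trick for sizing the flattened conv output keeps the sketch correct for any input resolution, matching how the log reports "Conv encoder output size: 512" for (3, 72, 128) observations.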
[2024-12-01 11:07:10,203][04310] Decorrelating experience for 96 frames...
[2024-12-01 11:07:12,812][04297] Signal inference workers to stop experience collection...
[2024-12-01 11:07:12,825][04311] InferenceWorker_p0-w0: stopping experience collection
[2024-12-01 11:07:14,545][02154] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 95.6. Samples: 1434. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-01 11:07:14,551][02154] Avg episode reward: [(0, '2.083')]
[2024-12-01 11:07:16,225][04297] Signal inference workers to resume experience collection...
[2024-12-01 11:07:16,227][04311] InferenceWorker_p0-w0: resuming experience collection
[2024-12-01 11:07:19,546][02154] Fps is (10 sec: 1638.2, 60 sec: 819.1, 300 sec: 819.1). Total num frames: 16384. Throughput: 0: 181.1. Samples: 3622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-12-01 11:07:19,553][02154] Avg episode reward: [(0, '3.384')]
[2024-12-01 11:07:24,545][02154] Fps is (10 sec: 2867.3, 60 sec: 1146.9, 300 sec: 1146.9). Total num frames: 28672. Throughput: 0: 312.3. Samples: 7808. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-12-01 11:07:24,548][02154] Avg episode reward: [(0, '3.813')]
[2024-12-01 11:07:26,577][04311] Updated weights for policy 0, policy_version 10 (0.0151)
[2024-12-01 11:07:29,545][02154] Fps is (10 sec: 3687.1, 60 sec: 1775.0, 300 sec: 1775.0). Total num frames: 53248. Throughput: 0: 359.9. Samples: 10796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-01 11:07:29,551][02154] Avg episode reward: [(0, '4.356')]
[2024-12-01 11:07:34,545][02154] Fps is (10 sec: 4505.6, 60 sec: 2106.6, 300 sec: 2106.6). Total num frames: 73728. Throughput: 0: 509.1. Samples: 17818. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-01 11:07:34,550][02154] Avg episode reward: [(0, '4.455')]
[2024-12-01 11:07:36,731][04311] Updated weights for policy 0, policy_version 20 (0.0016)
[2024-12-01 11:07:39,545][02154] Fps is (10 sec: 3276.8, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 560.0. Samples: 22400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:07:39,549][02154] Avg episode reward: [(0, '4.468')]
[2024-12-01 11:07:44,545][02154] Fps is (10 sec: 3276.7, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 106496. Throughput: 0: 544.9. Samples: 24522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:07:44,547][02154] Avg episode reward: [(0, '4.385')]
[2024-12-01 11:07:44,551][04297] Saving new best policy, reward=4.385!
[2024-12-01 11:07:47,758][04311] Updated weights for policy 0, policy_version 30 (0.0019)
[2024-12-01 11:07:49,544][02154] Fps is (10 sec: 4505.7, 60 sec: 2621.5, 300 sec: 2621.5). Total num frames: 131072. Throughput: 0: 697.3. Samples: 31378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:07:49,551][02154] Avg episode reward: [(0, '4.446')]
[2024-12-01 11:07:49,557][04297] Saving new best policy, reward=4.446!
[2024-12-01 11:07:54,545][02154] Fps is (10 sec: 4096.1, 60 sec: 2681.0, 300 sec: 2681.0). Total num frames: 147456. Throughput: 0: 831.2. Samples: 37404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:07:54,550][02154] Avg episode reward: [(0, '4.371')]
[2024-12-01 11:07:59,364][04311] Updated weights for policy 0, policy_version 40 (0.0041)
[2024-12-01 11:07:59,545][02154] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 163840. Throughput: 0: 845.7. Samples: 39488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:07:59,546][02154] Avg episode reward: [(0, '4.379')]
[2024-12-01 11:08:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 184320. Throughput: 0: 924.0. Samples: 45202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:08:04,549][02154] Avg episode reward: [(0, '4.576')]
[2024-12-01 11:08:04,552][04297] Saving new best policy, reward=4.576!
[2024-12-01 11:08:08,677][04311] Updated weights for policy 0, policy_version 50 (0.0017)
[2024-12-01 11:08:09,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 2925.7). Total num frames: 204800. Throughput: 0: 980.3. Samples: 51922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:08:09,548][02154] Avg episode reward: [(0, '4.483')]
[2024-12-01 11:08:14,544][02154] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 2949.1). Total num frames: 221184. Throughput: 0: 960.1. Samples: 54000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-01 11:08:14,551][02154] Avg episode reward: [(0, '4.418')]
[2024-12-01 11:08:19,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3020.8). Total num frames: 241664. Throughput: 0: 920.1. Samples: 59224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:08:19,546][02154] Avg episode reward: [(0, '4.284')]
[2024-12-01 11:08:20,178][04311] Updated weights for policy 0, policy_version 60 (0.0030)
[2024-12-01 11:08:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3084.1). Total num frames: 262144. Throughput: 0: 973.3. Samples: 66198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:08:24,550][02154] Avg episode reward: [(0, '4.324')]
[2024-12-01 11:08:29,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3094.8). Total num frames: 278528. Throughput: 0: 989.1. Samples: 69032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:08:29,547][02154] Avg episode reward: [(0, '4.523')]
[2024-12-01 11:08:29,557][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth...
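The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines report frame throughput over three sliding windows, computed from periodic (timestamp, total frames) samples. A minimal sketch of that windowed calculation, assuming this general approach rather than Sample Factory's exact implementation (the class name `FpsTracker` is hypothetical):

```python
import time
from collections import deque

class FpsTracker:
    """Sliding-window FPS over several lookback horizons, like the log's
    '(10 sec: ..., 60 sec: ..., 300 sec: ...)' report."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, cumulative frame count)

    def record(self, total_frames, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, total_frames))
        # Keep only samples young enough for the largest window.
        while self.samples and now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def fps(self, now=None):
        now = time.monotonic() if now is None else now
        out = {}
        for w in self.windows:
            past = [(t, f) for t, f in self.samples if now - t <= w]
            if len(past) < 2 or past[-1][0] <= past[0][0]:
                out[w] = float("nan")  # not enough data, like the 'nan' lines
            else:
                (t0, f0), (t1, f1) = past[0], past[-1]
                out[w] = (f1 - f0) / (t1 - t0)
        return out

# Example with synthetic timestamps: 16384 frames in 5 seconds -> 3276.8 FPS,
# matching the magnitude of the figures in the log above.
tracker = FpsTracker()
tracker.record(0, now=0.0)
tracker.record(16384, now=5.0)
rates = tracker.fps(now=5.0)
```

Before enough samples accumulate, each window reports `nan`, which is consistent with the first "Fps is (10 sec: nan, ...)" line in the log.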
[2024-12-01 11:08:31,493][04311] Updated weights for policy 0, policy_version 70 (0.0023)
[2024-12-01 11:08:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 935.6. Samples: 73478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:08:34,549][02154] Avg episode reward: [(0, '4.364')]
[2024-12-01 11:08:39,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3194.9). Total num frames: 319488. Throughput: 0: 955.3. Samples: 80394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:08:39,551][02154] Avg episode reward: [(0, '4.338')]
[2024-12-01 11:08:40,743][04311] Updated weights for policy 0, policy_version 80 (0.0022)
[2024-12-01 11:08:44,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3237.8). Total num frames: 339968. Throughput: 0: 985.3. Samples: 83828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:08:44,551][02154] Avg episode reward: [(0, '4.484')]
[2024-12-01 11:08:49,549][02154] Fps is (10 sec: 3684.6, 60 sec: 3754.4, 300 sec: 3239.4). Total num frames: 356352. Throughput: 0: 954.4. Samples: 88156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:08:49,552][02154] Avg episode reward: [(0, '4.416')]
[2024-12-01 11:08:52,127][04311] Updated weights for policy 0, policy_version 90 (0.0015)
[2024-12-01 11:08:54,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 376832. Throughput: 0: 950.7. Samples: 94702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:08:54,551][02154] Avg episode reward: [(0, '4.424')]
[2024-12-01 11:08:59,545][02154] Fps is (10 sec: 4507.7, 60 sec: 3959.5, 300 sec: 3345.1). Total num frames: 401408. Throughput: 0: 981.5. Samples: 98166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:08:59,546][02154] Avg episode reward: [(0, '4.514')]
[2024-12-01 11:09:02,173][04311] Updated weights for policy 0, policy_version 100 (0.0030)
[2024-12-01 11:09:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3309.6). Total num frames: 413696. Throughput: 0: 979.0. Samples: 103278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-01 11:09:04,547][02154] Avg episode reward: [(0, '4.520')]
[2024-12-01 11:09:09,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3339.8). Total num frames: 434176. Throughput: 0: 949.6. Samples: 108928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:09:09,550][02154] Avg episode reward: [(0, '4.629')]
[2024-12-01 11:09:09,558][04297] Saving new best policy, reward=4.629!
[2024-12-01 11:09:12,567][04311] Updated weights for policy 0, policy_version 110 (0.0027)
[2024-12-01 11:09:14,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3398.2). Total num frames: 458752. Throughput: 0: 961.9. Samples: 112318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:09:14,553][02154] Avg episode reward: [(0, '4.475')]
[2024-12-01 11:09:19,548][02154] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3364.5). Total num frames: 471040. Throughput: 0: 992.4. Samples: 118142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:09:19,554][02154] Avg episode reward: [(0, '4.368')]
[2024-12-01 11:09:24,332][04311] Updated weights for policy 0, policy_version 120 (0.0018)
[2024-12-01 11:09:24,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 945.3. Samples: 122934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:09:24,547][02154] Avg episode reward: [(0, '4.358')]
[2024-12-01 11:09:29,545][02154] Fps is (10 sec: 4097.6, 60 sec: 3891.2, 300 sec: 3413.3). Total num frames: 512000. Throughput: 0: 946.1. Samples: 126404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:09:29,551][02154] Avg episode reward: [(0, '4.650')]
[2024-12-01 11:09:29,616][04297] Saving new best policy, reward=4.650!
[2024-12-01 11:09:33,599][04311] Updated weights for policy 0, policy_version 130 (0.0014)
[2024-12-01 11:09:34,546][02154] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3435.3). Total num frames: 532480. Throughput: 0: 998.9. Samples: 133104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:09:34,551][02154] Avg episode reward: [(0, '4.935')]
[2024-12-01 11:09:34,556][04297] Saving new best policy, reward=4.935!
[2024-12-01 11:09:39,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 945.2. Samples: 137238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-01 11:09:39,551][02154] Avg episode reward: [(0, '4.899')]
[2024-12-01 11:09:44,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3822.9, 300 sec: 3450.6). Total num frames: 569344. Throughput: 0: 940.5. Samples: 140488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:09:44,555][02154] Avg episode reward: [(0, '4.829')]
[2024-12-01 11:09:44,762][04311] Updated weights for policy 0, policy_version 140 (0.0035)
[2024-12-01 11:09:49,546][02154] Fps is (10 sec: 4505.0, 60 sec: 3959.7, 300 sec: 3493.6). Total num frames: 593920. Throughput: 0: 982.6. Samples: 147496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:09:49,550][02154] Avg episode reward: [(0, '4.708')]
[2024-12-01 11:09:54,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3822.9, 300 sec: 3464.1). Total num frames: 606208. Throughput: 0: 964.9. Samples: 152350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:09:54,552][02154] Avg episode reward: [(0, '4.756')]
[2024-12-01 11:09:56,202][04311] Updated weights for policy 0, policy_version 150 (0.0026)
[2024-12-01 11:09:59,545][02154] Fps is (10 sec: 3277.3, 60 sec: 3754.7, 300 sec: 3481.6). Total num frames: 626688. Throughput: 0: 942.8. Samples: 154744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:09:59,548][02154] Avg episode reward: [(0, '4.559')]
[2024-12-01 11:10:04,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3520.4). Total num frames: 651264. Throughput: 0: 967.1. Samples: 161658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:10:04,547][02154] Avg episode reward: [(0, '4.980')]
[2024-12-01 11:10:04,554][04297] Saving new best policy, reward=4.980!
[2024-12-01 11:10:05,329][04311] Updated weights for policy 0, policy_version 160 (0.0017)
[2024-12-01 11:10:09,546][02154] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3513.9). Total num frames: 667648. Throughput: 0: 985.8. Samples: 167296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:10:09,548][02154] Avg episode reward: [(0, '4.886')]
[2024-12-01 11:10:14,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3507.9). Total num frames: 684032. Throughput: 0: 953.6. Samples: 169316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:10:14,552][02154] Avg episode reward: [(0, '4.873')]
[2024-12-01 11:10:16,937][04311] Updated weights for policy 0, policy_version 170 (0.0030)
[2024-12-01 11:10:19,545][02154] Fps is (10 sec: 4096.5, 60 sec: 3959.7, 300 sec: 3543.1). Total num frames: 708608. Throughput: 0: 947.8. Samples: 175752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-01 11:10:19,551][02154] Avg episode reward: [(0, '5.009')]
[2024-12-01 11:10:19,561][04297] Saving new best policy, reward=5.009!
[2024-12-01 11:10:24,572][02154] Fps is (10 sec: 4085.0, 60 sec: 3889.4, 300 sec: 3536.1). Total num frames: 724992. Throughput: 0: 1000.6. Samples: 182290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:10:24,576][02154] Avg episode reward: [(0, '5.080')]
[2024-12-01 11:10:24,651][04297] Saving new best policy, reward=5.080!
[2024-12-01 11:10:27,725][04311] Updated weights for policy 0, policy_version 180 (0.0017)
[2024-12-01 11:10:29,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3530.4). Total num frames: 741376. Throughput: 0: 973.6. Samples: 184300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:10:29,550][02154] Avg episode reward: [(0, '5.132')]
[2024-12-01 11:10:29,560][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth...
[2024-12-01 11:10:29,723][04297] Saving new best policy, reward=5.132!
[2024-12-01 11:10:34,545][02154] Fps is (10 sec: 3696.4, 60 sec: 3823.0, 300 sec: 3543.5). Total num frames: 761856. Throughput: 0: 938.6. Samples: 189732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:10:34,551][02154] Avg episode reward: [(0, '5.341')]
[2024-12-01 11:10:34,553][04297] Saving new best policy, reward=5.341!
[2024-12-01 11:10:37,487][04311] Updated weights for policy 0, policy_version 190 (0.0018)
[2024-12-01 11:10:39,545][02154] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3574.7). Total num frames: 786432. Throughput: 0: 981.3. Samples: 196510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:10:39,547][02154] Avg episode reward: [(0, '5.566')]
[2024-12-01 11:10:39,555][04297] Saving new best policy, reward=5.566!
[2024-12-01 11:10:44,548][02154] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3549.8). Total num frames: 798720. Throughput: 0: 985.3. Samples: 199086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:10:44,551][02154] Avg episode reward: [(0, '5.299')]
[2024-12-01 11:10:49,132][04311] Updated weights for policy 0, policy_version 200 (0.0062)
[2024-12-01 11:10:49,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3561.7). Total num frames: 819200. Throughput: 0: 936.8. Samples: 203812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:10:49,547][02154] Avg episode reward: [(0, '5.720')]
[2024-12-01 11:10:49,558][04297] Saving new best policy, reward=5.720!
[2024-12-01 11:10:54,544][02154] Fps is (10 sec: 4507.4, 60 sec: 3959.5, 300 sec: 3590.5). Total num frames: 843776. Throughput: 0: 965.0. Samples: 210722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:10:54,551][02154] Avg episode reward: [(0, '5.950')]
[2024-12-01 11:10:54,555][04297] Saving new best policy, reward=5.950!
[2024-12-01 11:10:59,072][04311] Updated weights for policy 0, policy_version 210 (0.0021)
[2024-12-01 11:10:59,545][02154] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3584.0). Total num frames: 860160. Throughput: 0: 994.7. Samples: 214078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:10:59,547][02154] Avg episode reward: [(0, '5.960')]
[2024-12-01 11:10:59,553][04297] Saving new best policy, reward=5.960!
[2024-12-01 11:11:04,544][02154] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3577.7). Total num frames: 876544. Throughput: 0: 945.2. Samples: 218284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:11:04,547][02154] Avg episode reward: [(0, '5.842')]
[2024-12-01 11:11:09,553][02154] Fps is (10 sec: 3686.5, 60 sec: 3823.0, 300 sec: 3588.1). Total num frames: 897024. Throughput: 0: 945.7. Samples: 224820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:11:09,560][02154] Avg episode reward: [(0, '6.163')]
[2024-12-01 11:11:09,585][04311] Updated weights for policy 0, policy_version 220 (0.0025)
[2024-12-01 11:11:09,589][04297] Saving new best policy, reward=6.163!
[2024-12-01 11:11:14,550][02154] Fps is (10 sec: 4502.9, 60 sec: 3959.1, 300 sec: 3614.0). Total num frames: 921600. Throughput: 0: 976.0. Samples: 228226. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-01 11:11:14,553][02154] Avg episode reward: [(0, '6.746')]
[2024-12-01 11:11:14,555][04297] Saving new best policy, reward=6.746!
[2024-12-01 11:11:19,547][02154] Fps is (10 sec: 3685.4, 60 sec: 3754.5, 300 sec: 3591.8). Total num frames: 933888. Throughput: 0: 964.5. Samples: 233136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:11:19,552][02154] Avg episode reward: [(0, '6.237')]
[2024-12-01 11:11:21,326][04311] Updated weights for policy 0, policy_version 230 (0.0025)
[2024-12-01 11:11:24,545][02154] Fps is (10 sec: 3278.7, 60 sec: 3824.7, 300 sec: 3601.4). Total num frames: 954368. Throughput: 0: 948.0. Samples: 239172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-01 11:11:24,547][02154] Avg episode reward: [(0, '6.446')]
[2024-12-01 11:11:29,545][02154] Fps is (10 sec: 4506.9, 60 sec: 3959.5, 300 sec: 3625.7). Total num frames: 978944. Throughput: 0: 969.6. Samples: 242714. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-01 11:11:29,547][02154] Avg episode reward: [(0, '6.223')]
[2024-12-01 11:11:29,923][04311] Updated weights for policy 0, policy_version 240 (0.0032)
[2024-12-01 11:11:34,545][02154] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3619.4). Total num frames: 995328. Throughput: 0: 995.8. Samples: 248622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:11:34,547][02154] Avg episode reward: [(0, '6.553')]
[2024-12-01 11:11:39,545][02154] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3627.9). Total num frames: 1015808. Throughput: 0: 957.9. Samples: 253830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:11:39,550][02154] Avg episode reward: [(0, '6.609')]
[2024-12-01 11:11:41,322][04311] Updated weights for policy 0, policy_version 250 (0.0025)
[2024-12-01 11:11:44,545][02154] Fps is (10 sec: 4096.1, 60 sec: 3959.7, 300 sec: 3636.1). Total num frames: 1036288. Throughput: 0: 959.3. Samples: 257246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:11:44,547][02154] Avg episode reward: [(0, '6.984')]
[2024-12-01 11:11:44,553][04297] Saving new best policy, reward=6.984!
[2024-12-01 11:11:49,545][02154] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3644.0). Total num frames: 1056768. Throughput: 0: 1013.1. Samples: 263874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-01 11:11:49,550][02154] Avg episode reward: [(0, '6.679')]
[2024-12-01 11:11:52,294][04311] Updated weights for policy 0, policy_version 260 (0.0019)
[2024-12-01 11:11:54,544][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3637.8). Total num frames: 1073152. Throughput: 0: 963.6. Samples: 268184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:11:54,546][02154] Avg episode reward: [(0, '7.022')]
[2024-12-01 11:11:54,551][04297] Saving new best policy, reward=7.022!
[2024-12-01 11:11:59,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 1093632. Throughput: 0: 965.1. Samples: 271652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:11:59,549][02154] Avg episode reward: [(0, '6.522')]
[2024-12-01 11:12:01,509][04311] Updated weights for policy 0, policy_version 270 (0.0022)
[2024-12-01 11:12:04,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 1011.0. Samples: 278628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:12:04,550][02154] Avg episode reward: [(0, '6.455')]
[2024-12-01 11:12:09,546][02154] Fps is (10 sec: 3686.0, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 979.8. Samples: 283264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:12:09,548][02154] Avg episode reward: [(0, '6.511')]
[2024-12-01 11:12:12,954][04311] Updated weights for policy 0, policy_version 280 (0.0022)
[2024-12-01 11:12:14,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3846.1). Total num frames: 1150976. Throughput: 0: 962.5. Samples: 286028. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:12:14,551][02154] Avg episode reward: [(0, '6.813')]
[2024-12-01 11:12:19,545][02154] Fps is (10 sec: 4506.0, 60 sec: 4027.9, 300 sec: 3887.7). Total num frames: 1175552. Throughput: 0: 988.9. Samples: 293124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:12:19,548][02154] Avg episode reward: [(0, '7.376')]
[2024-12-01 11:12:19,555][04297] Saving new best policy, reward=7.376!
[2024-12-01 11:12:22,420][04311] Updated weights for policy 0, policy_version 290 (0.0014)
[2024-12-01 11:12:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1191936. Throughput: 0: 993.0. Samples: 298514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:12:24,548][02154] Avg episode reward: [(0, '7.498')]
[2024-12-01 11:12:24,556][04297] Saving new best policy, reward=7.498!
[2024-12-01 11:12:29,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1212416. Throughput: 0: 964.0. Samples: 300626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:12:29,547][02154] Avg episode reward: [(0, '8.103')]
[2024-12-01 11:12:29,556][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth...
[2024-12-01 11:12:29,686][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000068_278528.pth
[2024-12-01 11:12:29,703][04297] Saving new best policy, reward=8.103!
[2024-12-01 11:12:33,069][04311] Updated weights for policy 0, policy_version 300 (0.0021)
[2024-12-01 11:12:34,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1232896. Throughput: 0: 969.2. Samples: 307488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:12:34,550][02154] Avg episode reward: [(0, '8.688')]
[2024-12-01 11:12:34,553][04297] Saving new best policy, reward=8.688!
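The paired "Saving .../checkpoint_..." and "Removing .../checkpoint_..." lines above show a rolling-retention scheme: when a new `checkpoint_<version>_<frames>.pth` is written, the oldest one is deleted so only the most recent few remain. The zero-padded version numbers make lexicographic order match training order, so the pruning step can be sketched as below (the function name `prune_checkpoints` and the keep count are illustrative, not Sample Factory's actual API):

```python
import glob
import os

def prune_checkpoints(ckpt_dir, keep=2):
    """Delete all but the `keep` newest checkpoint_*.pth files.

    Zero-padded names like checkpoint_000000068_278528.pth sort
    lexicographically in version order, so a plain sort suffices.
    """
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in ckpts[:-keep]:
        os.remove(old)
```

With `keep=2`, writing checkpoint 296 and then pruning removes checkpoint 68 while leaving 181 and 296 on disk, which is exactly the sequence of save/remove lines in the log.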
[2024-12-01 11:12:39,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1253376. Throughput: 0: 1010.7. Samples: 313666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-01 11:12:39,548][02154] Avg episode reward: [(0, '8.875')] [2024-12-01 11:12:39,557][04297] Saving new best policy, reward=8.875! [2024-12-01 11:12:44,545][02154] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1265664. Throughput: 0: 978.8. Samples: 315698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-01 11:12:44,551][02154] Avg episode reward: [(0, '8.909')] [2024-12-01 11:12:44,625][04311] Updated weights for policy 0, policy_version 310 (0.0017) [2024-12-01 11:12:44,621][04297] Saving new best policy, reward=8.909! [2024-12-01 11:12:49,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 1290240. Throughput: 0: 958.2. Samples: 321748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-01 11:12:49,547][02154] Avg episode reward: [(0, '9.240')] [2024-12-01 11:12:49,554][04297] Saving new best policy, reward=9.240! [2024-12-01 11:12:53,504][04311] Updated weights for policy 0, policy_version 320 (0.0019) [2024-12-01 11:12:54,545][02154] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1310720. Throughput: 0: 1011.2. Samples: 328768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:12:54,550][02154] Avg episode reward: [(0, '9.491')] [2024-12-01 11:12:54,616][04297] Saving new best policy, reward=9.491! [2024-12-01 11:12:59,545][02154] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 1327104. Throughput: 0: 995.8. Samples: 330838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:12:59,548][02154] Avg episode reward: [(0, '9.634')] [2024-12-01 11:12:59,561][04297] Saving new best policy, reward=9.634! [2024-12-01 11:13:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). 
Total num frames: 1347584. Throughput: 0: 956.2. Samples: 336154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-01 11:13:04,547][02154] Avg episode reward: [(0, '9.440')] [2024-12-01 11:13:04,777][04311] Updated weights for policy 0, policy_version 330 (0.0020) [2024-12-01 11:13:09,552][02154] Fps is (10 sec: 4502.6, 60 sec: 4027.3, 300 sec: 3901.5). Total num frames: 1372160. Throughput: 0: 994.1. Samples: 343258. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:13:09,554][02154] Avg episode reward: [(0, '9.874')] [2024-12-01 11:13:09,560][04297] Saving new best policy, reward=9.874! [2024-12-01 11:13:14,546][02154] Fps is (10 sec: 4095.3, 60 sec: 3959.4, 300 sec: 3887.7). Total num frames: 1388544. Throughput: 0: 1010.3. Samples: 346092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:13:14,551][02154] Avg episode reward: [(0, '9.646')] [2024-12-01 11:13:15,579][04311] Updated weights for policy 0, policy_version 340 (0.0017) [2024-12-01 11:13:19,545][02154] Fps is (10 sec: 3279.2, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1404928. Throughput: 0: 956.4. Samples: 350526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:13:19,551][02154] Avg episode reward: [(0, '10.598')] [2024-12-01 11:13:19,557][04297] Saving new best policy, reward=10.598! [2024-12-01 11:13:24,545][02154] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1429504. Throughput: 0: 974.2. Samples: 357504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:13:24,551][02154] Avg episode reward: [(0, '10.997')] [2024-12-01 11:13:24,553][04297] Saving new best policy, reward=10.997! [2024-12-01 11:13:25,121][04311] Updated weights for policy 0, policy_version 350 (0.0022) [2024-12-01 11:13:29,545][02154] Fps is (10 sec: 4505.2, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 1449984. Throughput: 0: 1006.6. Samples: 360994. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:13:29,552][02154] Avg episode reward: [(0, '11.469')] [2024-12-01 11:13:29,564][04297] Saving new best policy, reward=11.469! [2024-12-01 11:13:34,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 1462272. Throughput: 0: 971.2. Samples: 365450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-01 11:13:34,551][02154] Avg episode reward: [(0, '11.156')] [2024-12-01 11:13:36,611][04311] Updated weights for policy 0, policy_version 360 (0.0024) [2024-12-01 11:13:39,544][02154] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1486848. Throughput: 0: 960.8. Samples: 372004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:13:39,551][02154] Avg episode reward: [(0, '11.143')] [2024-12-01 11:13:44,545][02154] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3915.6). Total num frames: 1511424. Throughput: 0: 990.6. Samples: 375416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:13:44,550][02154] Avg episode reward: [(0, '11.309')] [2024-12-01 11:13:45,885][04311] Updated weights for policy 0, policy_version 370 (0.0018) [2024-12-01 11:13:49,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1523712. Throughput: 0: 989.6. Samples: 380686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-01 11:13:49,554][02154] Avg episode reward: [(0, '12.157')] [2024-12-01 11:13:49,566][04297] Saving new best policy, reward=12.157! [2024-12-01 11:13:54,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1544192. Throughput: 0: 960.2. Samples: 386462. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-01 11:13:54,552][02154] Avg episode reward: [(0, '11.562')] [2024-12-01 11:13:56,549][04311] Updated weights for policy 0, policy_version 380 (0.0015) [2024-12-01 11:13:59,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3915.5). 
Total num frames: 1568768. Throughput: 0: 974.6. Samples: 389948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:13:59,547][02154] Avg episode reward: [(0, '12.331')] [2024-12-01 11:13:59,554][04297] Saving new best policy, reward=12.331! [2024-12-01 11:14:04,547][02154] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3901.6). Total num frames: 1585152. Throughput: 0: 1006.7. Samples: 395828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-01 11:14:04,549][02154] Avg episode reward: [(0, '12.222')] [2024-12-01 11:14:07,963][04311] Updated weights for policy 0, policy_version 390 (0.0019) [2024-12-01 11:14:09,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3823.4, 300 sec: 3873.8). Total num frames: 1601536. Throughput: 0: 963.6. Samples: 400868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-01 11:14:09,549][02154] Avg episode reward: [(0, '12.702')] [2024-12-01 11:14:09,559][04297] Saving new best policy, reward=12.702! [2024-12-01 11:14:14,545][02154] Fps is (10 sec: 4097.0, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 1626112. Throughput: 0: 962.2. Samples: 404294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:14:14,552][02154] Avg episode reward: [(0, '14.319')] [2024-12-01 11:14:14,557][04297] Saving new best policy, reward=14.319! [2024-12-01 11:14:16,959][04311] Updated weights for policy 0, policy_version 400 (0.0018) [2024-12-01 11:14:19,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1642496. Throughput: 0: 1013.3. Samples: 411048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:14:19,547][02154] Avg episode reward: [(0, '15.188')] [2024-12-01 11:14:19,623][04297] Saving new best policy, reward=15.188! [2024-12-01 11:14:24,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1658880. Throughput: 0: 958.9. Samples: 415154. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-01 11:14:24,552][02154] Avg episode reward: [(0, '15.897')] [2024-12-01 11:14:24,559][04297] Saving new best policy, reward=15.897! [2024-12-01 11:14:28,708][04311] Updated weights for policy 0, policy_version 410 (0.0018) [2024-12-01 11:14:29,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 1679360. Throughput: 0: 956.0. Samples: 418436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-01 11:14:29,547][02154] Avg episode reward: [(0, '15.752')] [2024-12-01 11:14:29,579][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth... [2024-12-01 11:14:29,705][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth [2024-12-01 11:14:34,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1703936. Throughput: 0: 992.4. Samples: 425344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-01 11:14:34,550][02154] Avg episode reward: [(0, '15.207')] [2024-12-01 11:14:39,440][04311] Updated weights for policy 0, policy_version 420 (0.0030) [2024-12-01 11:14:39,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1720320. Throughput: 0: 971.0. Samples: 430158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:14:39,550][02154] Avg episode reward: [(0, '15.406')] [2024-12-01 11:14:44,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1740800. Throughput: 0: 952.2. Samples: 432796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:14:44,547][02154] Avg episode reward: [(0, '14.643')] [2024-12-01 11:14:48,797][04311] Updated weights for policy 0, policy_version 430 (0.0019) [2024-12-01 11:14:49,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1761280. Throughput: 0: 979.0. Samples: 439882. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-01 11:14:49,547][02154] Avg episode reward: [(0, '16.035')] [2024-12-01 11:14:49,553][04297] Saving new best policy, reward=16.035! [2024-12-01 11:14:54,545][02154] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 1777664. Throughput: 0: 988.5. Samples: 445352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2024-12-01 11:14:54,549][02154] Avg episode reward: [(0, '15.634')] [2024-12-01 11:14:59,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1798144. Throughput: 0: 959.5. Samples: 447470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:14:59,549][02154] Avg episode reward: [(0, '15.724')] [2024-12-01 11:15:00,013][04311] Updated weights for policy 0, policy_version 440 (0.0019) [2024-12-01 11:15:04,545][02154] Fps is (10 sec: 4506.0, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 1822720. Throughput: 0: 963.1. Samples: 454386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:15:04,552][02154] Avg episode reward: [(0, '15.637')] [2024-12-01 11:15:09,320][04311] Updated weights for policy 0, policy_version 450 (0.0027) [2024-12-01 11:15:09,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1843200. Throughput: 0: 1017.0. Samples: 460920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:15:09,551][02154] Avg episode reward: [(0, '15.215')] [2024-12-01 11:15:14,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1855488. Throughput: 0: 989.7. Samples: 462974. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-01 11:15:14,548][02154] Avg episode reward: [(0, '16.071')] [2024-12-01 11:15:14,551][04297] Saving new best policy, reward=16.071! [2024-12-01 11:15:19,545][02154] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3915.9). Total num frames: 1880064. Throughput: 0: 969.2. Samples: 468956. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-01 11:15:19,549][02154] Avg episode reward: [(0, '16.809')] [2024-12-01 11:15:19,558][04297] Saving new best policy, reward=16.809! [2024-12-01 11:15:20,142][04311] Updated weights for policy 0, policy_version 460 (0.0018) [2024-12-01 11:15:24,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1900544. Throughput: 0: 1020.2. Samples: 476068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:15:24,547][02154] Avg episode reward: [(0, '16.062')] [2024-12-01 11:15:29,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1916928. Throughput: 0: 1011.2. Samples: 478302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:15:29,554][02154] Avg episode reward: [(0, '15.872')] [2024-12-01 11:15:31,341][04311] Updated weights for policy 0, policy_version 470 (0.0016) [2024-12-01 11:15:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1937408. Throughput: 0: 974.3. Samples: 483726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:15:34,551][02154] Avg episode reward: [(0, '15.878')] [2024-12-01 11:15:39,545][02154] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1961984. Throughput: 0: 1013.0. Samples: 490938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-01 11:15:39,551][02154] Avg episode reward: [(0, '16.309')] [2024-12-01 11:15:39,618][04311] Updated weights for policy 0, policy_version 480 (0.0034) [2024-12-01 11:15:44,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1978368. Throughput: 0: 1034.0. Samples: 493998. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-01 11:15:44,547][02154] Avg episode reward: [(0, '16.560')] [2024-12-01 11:15:49,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1998848. Throughput: 0: 978.5. Samples: 498420. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:15:49,551][02154] Avg episode reward: [(0, '17.542')] [2024-12-01 11:15:49,560][04297] Saving new best policy, reward=17.542! [2024-12-01 11:15:51,079][04311] Updated weights for policy 0, policy_version 490 (0.0034) [2024-12-01 11:15:54,545][02154] Fps is (10 sec: 4505.3, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 2023424. Throughput: 0: 990.4. Samples: 505490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:15:54,548][02154] Avg episode reward: [(0, '19.366')] [2024-12-01 11:15:54,557][04297] Saving new best policy, reward=19.366! [2024-12-01 11:15:59,545][02154] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2039808. Throughput: 0: 1022.3. Samples: 508978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:15:59,550][02154] Avg episode reward: [(0, '19.661')] [2024-12-01 11:15:59,560][04297] Saving new best policy, reward=19.661! [2024-12-01 11:16:01,342][04311] Updated weights for policy 0, policy_version 500 (0.0031) [2024-12-01 11:16:04,545][02154] Fps is (10 sec: 3277.0, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2056192. Throughput: 0: 990.1. Samples: 513510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-01 11:16:04,549][02154] Avg episode reward: [(0, '19.704')] [2024-12-01 11:16:04,553][04297] Saving new best policy, reward=19.704! [2024-12-01 11:16:09,545][02154] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3929.5). Total num frames: 2080768. Throughput: 0: 975.5. Samples: 519964. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-01 11:16:09,549][02154] Avg episode reward: [(0, '20.702')] [2024-12-01 11:16:09,557][04297] Saving new best policy, reward=20.702! [2024-12-01 11:16:11,209][04311] Updated weights for policy 0, policy_version 510 (0.0030) [2024-12-01 11:16:14,545][02154] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 3957.2). Total num frames: 2101248. Throughput: 0: 1003.8. Samples: 523472. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:16:14,550][02154] Avg episode reward: [(0, '21.517')] [2024-12-01 11:16:14,555][04297] Saving new best policy, reward=21.517! [2024-12-01 11:16:19,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2117632. Throughput: 0: 999.4. Samples: 528700. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-01 11:16:19,548][02154] Avg episode reward: [(0, '20.857')] [2024-12-01 11:16:22,622][04311] Updated weights for policy 0, policy_version 520 (0.0016) [2024-12-01 11:16:24,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2138112. Throughput: 0: 964.3. Samples: 534332. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-01 11:16:24,549][02154] Avg episode reward: [(0, '19.505')] [2024-12-01 11:16:29,545][02154] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2158592. Throughput: 0: 975.7. Samples: 537906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-01 11:16:29,550][02154] Avg episode reward: [(0, '19.547')] [2024-12-01 11:16:29,561][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000527_2158592.pth... [2024-12-01 11:16:29,715][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth [2024-12-01 11:16:31,705][04311] Updated weights for policy 0, policy_version 530 (0.0018) [2024-12-01 11:16:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2174976. Throughput: 0: 1013.6. Samples: 544034. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-01 11:16:34,550][02154] Avg episode reward: [(0, '19.638')] [2024-12-01 11:16:39,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2195456. Throughput: 0: 962.5. Samples: 548800. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:16:39,549][02154] Avg episode reward: [(0, '19.190')] [2024-12-01 11:16:42,699][04311] Updated weights for policy 0, policy_version 540 (0.0017) [2024-12-01 11:16:44,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2215936. Throughput: 0: 965.5. Samples: 552424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:16:44,551][02154] Avg episode reward: [(0, '19.651')] [2024-12-01 11:16:49,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2236416. Throughput: 0: 1019.4. Samples: 559384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:16:49,548][02154] Avg episode reward: [(0, '21.226')] [2024-12-01 11:16:54,060][04311] Updated weights for policy 0, policy_version 550 (0.0015) [2024-12-01 11:16:54,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3929.4). Total num frames: 2252800. Throughput: 0: 972.0. Samples: 563704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-01 11:16:54,551][02154] Avg episode reward: [(0, '21.004')] [2024-12-01 11:16:59,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2277376. Throughput: 0: 969.0. Samples: 567074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-01 11:16:59,546][02154] Avg episode reward: [(0, '20.715')] [2024-12-01 11:17:02,662][04311] Updated weights for policy 0, policy_version 560 (0.0016) [2024-12-01 11:17:04,547][02154] Fps is (10 sec: 4914.2, 60 sec: 4095.9, 300 sec: 3971.0). Total num frames: 2301952. Throughput: 0: 1010.5. Samples: 574174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:17:04,549][02154] Avg episode reward: [(0, '21.025')] [2024-12-01 11:17:09,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2314240. Throughput: 0: 996.4. Samples: 579172. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:17:09,547][02154] Avg episode reward: [(0, '21.297')] [2024-12-01 11:17:13,933][04311] Updated weights for policy 0, policy_version 570 (0.0029) [2024-12-01 11:17:14,545][02154] Fps is (10 sec: 3277.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2334720. Throughput: 0: 973.3. Samples: 581706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-01 11:17:14,553][02154] Avg episode reward: [(0, '20.957')] [2024-12-01 11:17:19,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2359296. Throughput: 0: 992.8. Samples: 588710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:17:19,550][02154] Avg episode reward: [(0, '21.059')] [2024-12-01 11:17:23,750][04311] Updated weights for policy 0, policy_version 580 (0.0036) [2024-12-01 11:17:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2375680. Throughput: 0: 1015.6. Samples: 594500. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:17:24,547][02154] Avg episode reward: [(0, '20.867')] [2024-12-01 11:17:29,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2392064. Throughput: 0: 982.4. Samples: 596630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-01 11:17:29,550][02154] Avg episode reward: [(0, '19.787')] [2024-12-01 11:17:34,007][04311] Updated weights for policy 0, policy_version 590 (0.0020) [2024-12-01 11:17:34,545][02154] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2416640. Throughput: 0: 976.0. Samples: 603302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:17:34,551][02154] Avg episode reward: [(0, '19.073')] [2024-12-01 11:17:39,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2437120. Throughput: 0: 1028.8. Samples: 610000. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-01 11:17:39,548][02154] Avg episode reward: [(0, '19.120')] [2024-12-01 11:17:44,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2453504. Throughput: 0: 999.7. Samples: 612062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:17:44,550][02154] Avg episode reward: [(0, '20.245')] [2024-12-01 11:17:45,453][04311] Updated weights for policy 0, policy_version 600 (0.0019) [2024-12-01 11:17:49,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2473984. Throughput: 0: 970.4. Samples: 617838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:17:49,546][02154] Avg episode reward: [(0, '19.747')] [2024-12-01 11:17:54,070][04311] Updated weights for policy 0, policy_version 610 (0.0020) [2024-12-01 11:17:54,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 2498560. Throughput: 0: 1018.0. Samples: 624984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:17:54,547][02154] Avg episode reward: [(0, '20.735')] [2024-12-01 11:17:59,549][02154] Fps is (10 sec: 3684.9, 60 sec: 3890.9, 300 sec: 3943.2). Total num frames: 2510848. Throughput: 0: 1015.1. Samples: 627388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:17:59,551][02154] Avg episode reward: [(0, '21.183')] [2024-12-01 11:18:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3943.4). Total num frames: 2535424. Throughput: 0: 971.2. Samples: 632416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-01 11:18:04,548][02154] Avg episode reward: [(0, '21.601')] [2024-12-01 11:18:04,550][04297] Saving new best policy, reward=21.601! [2024-12-01 11:18:05,495][04311] Updated weights for policy 0, policy_version 620 (0.0022) [2024-12-01 11:18:09,545][02154] Fps is (10 sec: 4507.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2555904. Throughput: 0: 997.4. Samples: 639384. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-01 11:18:09,547][02154] Avg episode reward: [(0, '21.949')] [2024-12-01 11:18:09,558][04297] Saving new best policy, reward=21.949! [2024-12-01 11:18:14,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2572288. Throughput: 0: 1019.5. Samples: 642506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:18:14,548][02154] Avg episode reward: [(0, '22.947')] [2024-12-01 11:18:14,554][04297] Saving new best policy, reward=22.947! [2024-12-01 11:18:16,588][04311] Updated weights for policy 0, policy_version 630 (0.0026) [2024-12-01 11:18:19,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2588672. Throughput: 0: 961.7. Samples: 646580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:18:19,547][02154] Avg episode reward: [(0, '24.463')] [2024-12-01 11:18:19,557][04297] Saving new best policy, reward=24.463! [2024-12-01 11:18:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 2613248. Throughput: 0: 965.3. Samples: 653440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:18:24,546][02154] Avg episode reward: [(0, '23.117')] [2024-12-01 11:18:26,014][04311] Updated weights for policy 0, policy_version 640 (0.0019) [2024-12-01 11:18:29,551][02154] Fps is (10 sec: 4503.0, 60 sec: 4027.3, 300 sec: 3971.0). Total num frames: 2633728. Throughput: 0: 996.9. Samples: 656928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:18:29,553][02154] Avg episode reward: [(0, '24.151')] [2024-12-01 11:18:29,564][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000643_2633728.pth... [2024-12-01 11:18:29,709][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth [2024-12-01 11:18:34,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2646016. Throughput: 0: 972.8. 
Samples: 661616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:18:34,550][02154] Avg episode reward: [(0, '23.539')] [2024-12-01 11:18:37,485][04311] Updated weights for policy 0, policy_version 650 (0.0018) [2024-12-01 11:18:39,545][02154] Fps is (10 sec: 3688.6, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2670592. Throughput: 0: 952.4. Samples: 667840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:18:39,547][02154] Avg episode reward: [(0, '23.248')] [2024-12-01 11:18:44,545][02154] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2695168. Throughput: 0: 978.0. Samples: 671394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-01 11:18:44,547][02154] Avg episode reward: [(0, '21.934')] [2024-12-01 11:18:47,100][04311] Updated weights for policy 0, policy_version 660 (0.0018) [2024-12-01 11:18:49,549][02154] Fps is (10 sec: 3684.9, 60 sec: 3890.9, 300 sec: 3943.2). Total num frames: 2707456. Throughput: 0: 989.2. Samples: 676932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-01 11:18:49,551][02154] Avg episode reward: [(0, '23.061')] [2024-12-01 11:18:54,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2727936. Throughput: 0: 950.4. Samples: 682150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-01 11:18:54,547][02154] Avg episode reward: [(0, '24.697')] [2024-12-01 11:18:54,553][04297] Saving new best policy, reward=24.697! [2024-12-01 11:18:57,677][04311] Updated weights for policy 0, policy_version 670 (0.0028) [2024-12-01 11:18:59,545][02154] Fps is (10 sec: 4507.4, 60 sec: 4028.0, 300 sec: 3957.2). Total num frames: 2752512. Throughput: 0: 956.8. Samples: 685564. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:18:59,547][02154] Avg episode reward: [(0, '24.392')] [2024-12-01 11:19:04,545][02154] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3957.1). Total num frames: 2768896. Throughput: 0: 1012.2. Samples: 692130. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-01 11:19:04,547][02154] Avg episode reward: [(0, '25.435')] [2024-12-01 11:19:04,555][04297] Saving new best policy, reward=25.435! [2024-12-01 11:19:09,307][04311] Updated weights for policy 0, policy_version 680 (0.0028) [2024-12-01 11:19:09,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2785280. Throughput: 0: 955.9. Samples: 696454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-01 11:19:09,551][02154] Avg episode reward: [(0, '25.549')] [2024-12-01 11:19:09,560][04297] Saving new best policy, reward=25.549! [2024-12-01 11:19:14,545][02154] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2805760. Throughput: 0: 953.2. Samples: 699818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-01 11:19:14,551][02154] Avg episode reward: [(0, '26.194')] [2024-12-01 11:19:14,597][04297] Saving new best policy, reward=26.194! [2024-12-01 11:19:18,346][04311] Updated weights for policy 0, policy_version 690 (0.0038) [2024-12-01 11:19:19,545][02154] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3957.1). Total num frames: 2826240. Throughput: 0: 998.5. Samples: 706548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-01 11:19:19,551][02154] Avg episode reward: [(0, '24.193')] [2024-12-01 11:19:24,549][02154] Fps is (10 sec: 3684.6, 60 sec: 3822.6, 300 sec: 3943.2). Total num frames: 2842624. Throughput: 0: 959.9. Samples: 711038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-01 11:19:24,552][02154] Avg episode reward: [(0, '23.403')] [2024-12-01 11:19:29,545][02154] Fps is (10 sec: 3686.5, 60 sec: 3823.3, 300 sec: 3929.4). Total num frames: 2863104. Throughput: 0: 946.1. Samples: 713970. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:19:29,547][02154] Avg episode reward: [(0, '20.869')]
[2024-12-01 11:19:29,685][04311] Updated weights for policy 0, policy_version 700 (0.0018)
[2024-12-01 11:19:34,545][02154] Fps is (10 sec: 4507.8, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 2887680. Throughput: 0: 979.2. Samples: 720990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:19:34,547][02154] Avg episode reward: [(0, '21.022')]
[2024-12-01 11:19:39,545][02154] Fps is (10 sec: 4095.7, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2904064. Throughput: 0: 979.6. Samples: 726234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:19:39,549][02154] Avg episode reward: [(0, '21.384')]
[2024-12-01 11:19:40,740][04311] Updated weights for policy 0, policy_version 710 (0.0015)
[2024-12-01 11:19:44,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3929.4). Total num frames: 2920448. Throughput: 0: 951.6. Samples: 728386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:19:44,547][02154] Avg episode reward: [(0, '21.556')]
[2024-12-01 11:19:49,550][02154] Fps is (10 sec: 4094.2, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 2945024. Throughput: 0: 958.6. Samples: 735270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:19:49,552][02154] Avg episode reward: [(0, '21.931')]
[2024-12-01 11:19:50,096][04311] Updated weights for policy 0, policy_version 720 (0.0023)
[2024-12-01 11:19:54,545][02154] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3943.3). Total num frames: 2961408. Throughput: 0: 997.4. Samples: 741340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:19:54,549][02154] Avg episode reward: [(0, '23.076')]
[2024-12-01 11:19:59,545][02154] Fps is (10 sec: 3278.4, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 2977792. Throughput: 0: 970.4. Samples: 743484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:19:59,549][02154] Avg episode reward: [(0, '24.049')]
[2024-12-01 11:20:01,310][04311] Updated weights for policy 0, policy_version 730 (0.0030)
[2024-12-01 11:20:04,545][02154] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3002368. Throughput: 0: 961.5. Samples: 749814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:20:04,551][02154] Avg episode reward: [(0, '21.628')]
[2024-12-01 11:20:09,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3022848. Throughput: 0: 1014.2. Samples: 756674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:20:09,547][02154] Avg episode reward: [(0, '22.201')]
[2024-12-01 11:20:11,089][04311] Updated weights for policy 0, policy_version 740 (0.0017)
[2024-12-01 11:20:14,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3039232. Throughput: 0: 994.9. Samples: 758742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:20:14,555][02154] Avg episode reward: [(0, '21.835')]
[2024-12-01 11:20:19,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3059712. Throughput: 0: 961.8. Samples: 764272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:20:19,550][02154] Avg episode reward: [(0, '24.322')]
[2024-12-01 11:20:21,794][04311] Updated weights for policy 0, policy_version 750 (0.0019)
[2024-12-01 11:20:24,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3957.2). Total num frames: 3084288. Throughput: 0: 997.3. Samples: 771112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:20:24,549][02154] Avg episode reward: [(0, '24.311')]
[2024-12-01 11:20:29,552][02154] Fps is (10 sec: 3683.8, 60 sec: 3890.7, 300 sec: 3929.3). Total num frames: 3096576. Throughput: 0: 1010.2. Samples: 773852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:20:29,556][02154] Avg episode reward: [(0, '24.685')]
[2024-12-01 11:20:29,646][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000757_3100672.pth...
[2024-12-01 11:20:29,806][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000527_2158592.pth
[2024-12-01 11:20:33,133][04311] Updated weights for policy 0, policy_version 760 (0.0013)
[2024-12-01 11:20:34,545][02154] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3117056. Throughput: 0: 962.2. Samples: 778564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:20:34,547][02154] Avg episode reward: [(0, '24.915')]
[2024-12-01 11:20:39,545][02154] Fps is (10 sec: 4508.7, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3141632. Throughput: 0: 983.0. Samples: 785572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:20:39,547][02154] Avg episode reward: [(0, '25.423')]
[2024-12-01 11:20:41,698][04311] Updated weights for policy 0, policy_version 770 (0.0014)
[2024-12-01 11:20:44,546][02154] Fps is (10 sec: 4504.9, 60 sec: 4027.6, 300 sec: 3943.2). Total num frames: 3162112. Throughput: 0: 1014.3. Samples: 789130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:20:44,549][02154] Avg episode reward: [(0, '24.226')]
[2024-12-01 11:20:49,546][02154] Fps is (10 sec: 3276.5, 60 sec: 3823.2, 300 sec: 3901.6). Total num frames: 3174400. Throughput: 0: 968.5. Samples: 793398. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:20:49,550][02154] Avg episode reward: [(0, '22.657')]
[2024-12-01 11:20:53,245][04311] Updated weights for policy 0, policy_version 780 (0.0031)
[2024-12-01 11:20:54,545][02154] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3198976. Throughput: 0: 962.7. Samples: 799996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:20:54,551][02154] Avg episode reward: [(0, '24.286')]
[2024-12-01 11:20:59,545][02154] Fps is (10 sec: 4506.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3219456. Throughput: 0: 993.6. Samples: 803454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:20:59,550][02154] Avg episode reward: [(0, '26.326')]
[2024-12-01 11:20:59,565][04297] Saving new best policy, reward=26.326!
[2024-12-01 11:21:04,096][04311] Updated weights for policy 0, policy_version 790 (0.0021)
[2024-12-01 11:21:04,545][02154] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3235840. Throughput: 0: 984.8. Samples: 808586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:21:04,550][02154] Avg episode reward: [(0, '27.308')]
[2024-12-01 11:21:04,557][04297] Saving new best policy, reward=27.308!
[2024-12-01 11:21:09,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3256320. Throughput: 0: 961.7. Samples: 814390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:21:09,553][02154] Avg episode reward: [(0, '26.186')]
[2024-12-01 11:21:13,490][04311] Updated weights for policy 0, policy_version 800 (0.0016)
[2024-12-01 11:21:14,545][02154] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 3280896. Throughput: 0: 978.9. Samples: 817896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:21:14,548][02154] Avg episode reward: [(0, '26.529')]
[2024-12-01 11:21:19,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3293184. Throughput: 0: 1005.4. Samples: 823808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:21:19,551][02154] Avg episode reward: [(0, '24.745')]
[2024-12-01 11:21:24,545][02154] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3313664. Throughput: 0: 957.6. Samples: 828664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:21:24,547][02154] Avg episode reward: [(0, '24.575')]
[2024-12-01 11:21:25,039][04311] Updated weights for policy 0, policy_version 810 (0.0018)
[2024-12-01 11:21:29,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4028.2, 300 sec: 3943.3). Total num frames: 3338240. Throughput: 0: 954.4. Samples: 832076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-01 11:21:29,547][02154] Avg episode reward: [(0, '23.801')]
[2024-12-01 11:21:34,545][02154] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3354624. Throughput: 0: 1010.1. Samples: 838850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-01 11:21:34,552][02154] Avg episode reward: [(0, '23.884')]
[2024-12-01 11:21:34,583][04311] Updated weights for policy 0, policy_version 820 (0.0014)
[2024-12-01 11:21:39,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 3371008. Throughput: 0: 956.9. Samples: 843058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:21:39,550][02154] Avg episode reward: [(0, '24.720')]
[2024-12-01 11:21:44,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 3395584. Throughput: 0: 954.0. Samples: 846386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:21:44,552][02154] Avg episode reward: [(0, '25.642')]
[2024-12-01 11:21:45,310][04311] Updated weights for policy 0, policy_version 830 (0.0023)
[2024-12-01 11:21:49,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 3416064. Throughput: 0: 997.6. Samples: 853476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:21:49,547][02154] Avg episode reward: [(0, '26.876')]
[2024-12-01 11:21:54,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3428352. Throughput: 0: 971.4. Samples: 858102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:21:54,550][02154] Avg episode reward: [(0, '26.875')]
[2024-12-01 11:21:56,845][04311] Updated weights for policy 0, policy_version 840 (0.0034)
[2024-12-01 11:21:59,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3452928. Throughput: 0: 950.2. Samples: 860656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:21:59,552][02154] Avg episode reward: [(0, '26.071')]
[2024-12-01 11:22:04,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3473408. Throughput: 0: 973.4. Samples: 867612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:22:04,546][02154] Avg episode reward: [(0, '24.160')]
[2024-12-01 11:22:05,733][04311] Updated weights for policy 0, policy_version 850 (0.0041)
[2024-12-01 11:22:09,545][02154] Fps is (10 sec: 3686.1, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3489792. Throughput: 0: 991.1. Samples: 873264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:22:09,553][02154] Avg episode reward: [(0, '23.148')]
[2024-12-01 11:22:14,544][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3510272. Throughput: 0: 961.1. Samples: 875324. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-01 11:22:14,547][02154] Avg episode reward: [(0, '22.031')]
[2024-12-01 11:22:17,172][04311] Updated weights for policy 0, policy_version 860 (0.0023)
[2024-12-01 11:22:19,545][02154] Fps is (10 sec: 4096.2, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3530752. Throughput: 0: 959.2. Samples: 882014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:22:19,547][02154] Avg episode reward: [(0, '22.965')]
[2024-12-01 11:22:24,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3551232. Throughput: 0: 1007.4. Samples: 888392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:22:24,551][02154] Avg episode reward: [(0, '23.408')]
[2024-12-01 11:22:28,284][04311] Updated weights for policy 0, policy_version 870 (0.0023)
[2024-12-01 11:22:29,551][02154] Fps is (10 sec: 3275.0, 60 sec: 3754.3, 300 sec: 3887.7). Total num frames: 3563520. Throughput: 0: 978.8. Samples: 890440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:22:29,552][02154] Avg episode reward: [(0, '23.721')]
[2024-12-01 11:22:29,573][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000870_3563520.pth...
[2024-12-01 11:22:29,711][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000643_2633728.pth
[2024-12-01 11:22:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3588096. Throughput: 0: 946.3. Samples: 896058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:22:34,551][02154] Avg episode reward: [(0, '25.978')]
[2024-12-01 11:22:37,852][04311] Updated weights for policy 0, policy_version 880 (0.0026)
[2024-12-01 11:22:39,545][02154] Fps is (10 sec: 4918.1, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3612672. Throughput: 0: 997.2. Samples: 902974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:22:39,546][02154] Avg episode reward: [(0, '25.812')]
[2024-12-01 11:22:44,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3624960. Throughput: 0: 996.3. Samples: 905490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:22:44,552][02154] Avg episode reward: [(0, '25.971')]
[2024-12-01 11:22:49,242][04311] Updated weights for policy 0, policy_version 890 (0.0017)
[2024-12-01 11:22:49,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3645440. Throughput: 0: 949.4. Samples: 910336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:22:49,546][02154] Avg episode reward: [(0, '25.548')]
[2024-12-01 11:22:54,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.6). Total num frames: 3665920. Throughput: 0: 978.2. Samples: 917280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-01 11:22:54,552][02154] Avg episode reward: [(0, '24.778')]
[2024-12-01 11:22:59,178][04311] Updated weights for policy 0, policy_version 900 (0.0013)
[2024-12-01 11:22:59,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3686400. Throughput: 0: 1006.4. Samples: 920610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:22:59,551][02154] Avg episode reward: [(0, '25.657')]
[2024-12-01 11:23:04,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3702784. Throughput: 0: 952.7. Samples: 924884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2024-12-01 11:23:04,552][02154] Avg episode reward: [(0, '25.224')]
[2024-12-01 11:23:09,482][04311] Updated weights for policy 0, policy_version 910 (0.0029)
[2024-12-01 11:23:09,545][02154] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3727360. Throughput: 0: 963.2. Samples: 931736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:23:09,547][02154] Avg episode reward: [(0, '26.063')]
[2024-12-01 11:23:14,545][02154] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 3747840. Throughput: 0: 996.7. Samples: 935284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:23:14,553][02154] Avg episode reward: [(0, '25.871')]
[2024-12-01 11:23:19,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 3760128. Throughput: 0: 978.8. Samples: 940106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:23:19,552][02154] Avg episode reward: [(0, '26.466')]
[2024-12-01 11:23:20,921][04311] Updated weights for policy 0, policy_version 920 (0.0028)
[2024-12-01 11:23:24,544][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.7). Total num frames: 3784704. Throughput: 0: 957.1. Samples: 946044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:23:24,549][02154] Avg episode reward: [(0, '24.950')]
[2024-12-01 11:23:29,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4028.1, 300 sec: 3929.4). Total num frames: 3805184. Throughput: 0: 980.7. Samples: 949622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:23:29,547][02154] Avg episode reward: [(0, '23.885')]
[2024-12-01 11:23:29,575][04311] Updated weights for policy 0, policy_version 930 (0.0022)
[2024-12-01 11:23:34,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3821568. Throughput: 0: 999.5. Samples: 955314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-01 11:23:34,551][02154] Avg episode reward: [(0, '23.925')]
[2024-12-01 11:23:39,545][02154] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3842048. Throughput: 0: 962.5. Samples: 960594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:23:39,547][02154] Avg episode reward: [(0, '23.939')]
[2024-12-01 11:23:41,009][04311] Updated weights for policy 0, policy_version 940 (0.0029)
[2024-12-01 11:23:44,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3866624. Throughput: 0: 967.6. Samples: 964152. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:23:44,547][02154] Avg episode reward: [(0, '23.039')]
[2024-12-01 11:23:49,548][02154] Fps is (10 sec: 4094.4, 60 sec: 3959.2, 300 sec: 3915.4). Total num frames: 3883008. Throughput: 0: 1015.5. Samples: 970584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:23:49,551][02154] Avg episode reward: [(0, '22.681')]
[2024-12-01 11:23:51,998][04311] Updated weights for policy 0, policy_version 950 (0.0023)
[2024-12-01 11:23:54,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3899392. Throughput: 0: 960.0. Samples: 974936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:23:54,555][02154] Avg episode reward: [(0, '22.218')]
[2024-12-01 11:23:59,545][02154] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3923968. Throughput: 0: 960.3. Samples: 978498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-01 11:23:59,547][02154] Avg episode reward: [(0, '23.019')]
[2024-12-01 11:24:01,200][04311] Updated weights for policy 0, policy_version 960 (0.0025)
[2024-12-01 11:24:04,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3944448. Throughput: 0: 1009.2. Samples: 985518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:24:04,551][02154] Avg episode reward: [(0, '22.438')]
[2024-12-01 11:24:09,545][02154] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3956736. Throughput: 0: 976.3. Samples: 989978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-01 11:24:09,550][02154] Avg episode reward: [(0, '21.696')]
[2024-12-01 11:24:12,606][04311] Updated weights for policy 0, policy_version 970 (0.0051)
[2024-12-01 11:24:14,545][02154] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3981312. Throughput: 0: 960.3. Samples: 992836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-01 11:24:14,552][02154] Avg episode reward: [(0, '22.582')]
[2024-12-01 11:24:19,545][02154] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 4001792. Throughput: 0: 991.5. Samples: 999932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-01 11:24:19,552][02154] Avg episode reward: [(0, '23.238')]
[2024-12-01 11:24:19,673][04297] Stopping Batcher_0...
[2024-12-01 11:24:19,674][04297] Loop batcher_evt_loop terminating...
[2024-12-01 11:24:19,673][02154] Component Batcher_0 stopped!
[2024-12-01 11:24:19,679][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-01 11:24:19,732][04311] Weights refcount: 2 0
[2024-12-01 11:24:19,737][04311] Stopping InferenceWorker_p0-w0...
[2024-12-01 11:24:19,738][04311] Loop inference_proc0-0_evt_loop terminating...
[2024-12-01 11:24:19,737][02154] Component InferenceWorker_p0-w0 stopped!
[2024-12-01 11:24:19,799][04297] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000757_3100672.pth
[2024-12-01 11:24:19,825][04297] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-01 11:24:19,996][04297] Stopping LearnerWorker_p0...
[2024-12-01 11:24:19,997][04297] Loop learner_proc0_evt_loop terminating...
[2024-12-01 11:24:19,997][02154] Component LearnerWorker_p0 stopped!
[2024-12-01 11:24:20,094][02154] Component RolloutWorker_w1 stopped!
[2024-12-01 11:24:20,100][04312] Stopping RolloutWorker_w1...
[2024-12-01 11:24:20,103][04312] Loop rollout_proc1_evt_loop terminating...
[2024-12-01 11:24:20,121][04310] Stopping RolloutWorker_w0...
[2024-12-01 11:24:20,124][04317] Stopping RolloutWorker_w6...
[2024-12-01 11:24:20,125][04317] Loop rollout_proc6_evt_loop terminating...
[2024-12-01 11:24:20,121][02154] Component RolloutWorker_w0 stopped!
[2024-12-01 11:24:20,126][02154] Component RolloutWorker_w6 stopped!
[2024-12-01 11:24:20,133][04310] Loop rollout_proc0_evt_loop terminating...
[2024-12-01 11:24:20,143][04313] Stopping RolloutWorker_w2...
[2024-12-01 11:24:20,143][04313] Loop rollout_proc2_evt_loop terminating...
[2024-12-01 11:24:20,144][04314] Stopping RolloutWorker_w3...
[2024-12-01 11:24:20,145][04314] Loop rollout_proc3_evt_loop terminating...
[2024-12-01 11:24:20,144][02154] Component RolloutWorker_w2 stopped!
[2024-12-01 11:24:20,153][02154] Component RolloutWorker_w3 stopped!
[2024-12-01 11:24:20,162][02154] Component RolloutWorker_w7 stopped!
[2024-12-01 11:24:20,168][04318] Stopping RolloutWorker_w7...
[2024-12-01 11:24:20,168][04318] Loop rollout_proc7_evt_loop terminating...
[2024-12-01 11:24:20,200][04315] Stopping RolloutWorker_w4...
[2024-12-01 11:24:20,199][02154] Component RolloutWorker_w4 stopped!
[2024-12-01 11:24:20,200][04315] Loop rollout_proc4_evt_loop terminating...
[2024-12-01 11:24:20,222][02154] Component RolloutWorker_w5 stopped!
[2024-12-01 11:24:20,226][02154] Waiting for process learner_proc0 to stop...
[2024-12-01 11:24:20,234][04316] Stopping RolloutWorker_w5...
[2024-12-01 11:24:20,234][04316] Loop rollout_proc5_evt_loop terminating...
[2024-12-01 11:24:22,232][02154] Waiting for process inference_proc0-0 to join...
[2024-12-01 11:24:22,238][02154] Waiting for process rollout_proc0 to join...
[2024-12-01 11:24:24,949][02154] Waiting for process rollout_proc1 to join...
[2024-12-01 11:24:25,062][02154] Waiting for process rollout_proc2 to join...
[2024-12-01 11:24:25,067][02154] Waiting for process rollout_proc3 to join...
[2024-12-01 11:24:25,070][02154] Waiting for process rollout_proc4 to join...
[2024-12-01 11:24:25,074][02154] Waiting for process rollout_proc5 to join...
[2024-12-01 11:24:25,077][02154] Waiting for process rollout_proc6 to join...
[2024-12-01 11:24:25,081][02154] Waiting for process rollout_proc7 to join...
[2024-12-01 11:24:25,084][02154] Batcher 0 profile tree view:
batching: 25.9505, releasing_batches: 0.0308
[2024-12-01 11:24:25,086][02154] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 406.1312
update_model: 8.6124
  weight_update: 0.0022
one_step: 0.0024
  handle_policy_step: 579.6014
    deserialize: 14.8573, stack: 3.1071, obs_to_device_normalize: 121.6247, forward: 292.0473, send_messages: 28.7061
    prepare_outputs: 89.5644
      to_cpu: 54.2086
[2024-12-01 11:24:25,087][02154] Learner 0 profile tree view:
misc: 0.0059, prepare_batch: 13.5534
train: 72.7957
  epoch_init: 0.0134, minibatch_init: 0.0063, losses_postprocess: 0.5706, kl_divergence: 0.6039, after_optimizer: 33.7038
  calculate_losses: 25.5581
    losses_init: 0.0034, forward_head: 1.1618, bptt_initial: 17.0828, tail: 1.0208, advantages_returns: 0.2430, losses: 3.7061
    bptt: 2.0363
      bptt_forward_core: 1.9198
  update: 11.7813
    clip: 0.8637
[2024-12-01 11:24:25,089][02154] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2925, enqueue_policy_requests: 97.5265, env_step: 811.8499, overhead: 12.6237, complete_rollouts: 6.9544
save_policy_outputs: 20.7697
  split_output_tensors: 8.5124
[2024-12-01 11:24:25,091][02154] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2677, enqueue_policy_requests: 98.5244, env_step: 808.1773, overhead: 13.5904, complete_rollouts: 6.7081
save_policy_outputs: 20.8069
  split_output_tensors: 8.5927
[2024-12-01 11:24:25,092][02154] Loop Runner_EvtLoop terminating...
[2024-12-01 11:24:25,094][02154] Runner profile tree view:
main_loop: 1067.5266
[2024-12-01 11:24:25,095][02154] Collected {0: 4005888}, FPS: 3752.5
[2024-12-01 11:24:30,310][02154] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-01 11:24:30,312][02154] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-01 11:24:30,315][02154] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-01 11:24:30,317][02154] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-01 11:24:30,320][02154] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-01 11:24:30,321][02154] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-01 11:24:30,322][02154] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-01 11:24:30,324][02154] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-01 11:24:30,325][02154] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-01 11:24:30,326][02154] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-01 11:24:30,327][02154] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-01 11:24:30,329][02154] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-01 11:24:30,330][02154] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-01 11:24:30,331][02154] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-01 11:24:30,332][02154] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-01 11:24:30,367][02154] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-01 11:24:30,371][02154] RunningMeanStd input shape: (3, 72, 128)
[2024-12-01 11:24:30,373][02154] RunningMeanStd input shape: (1,)
[2024-12-01 11:24:30,395][02154] ConvEncoder: input_channels=3
[2024-12-01 11:24:30,504][02154] Conv encoder output size: 512
[2024-12-01 11:24:30,506][02154] Policy head output size: 512
[2024-12-01 11:24:30,782][02154] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-01 11:24:31,542][02154] Num frames 100...
[2024-12-01 11:24:31,659][02154] Num frames 200...
[2024-12-01 11:24:31,779][02154] Num frames 300...
[2024-12-01 11:24:31,905][02154] Num frames 400...
[2024-12-01 11:24:31,981][02154] Avg episode rewards: #0: 8.160, true rewards: #0: 4.160
[2024-12-01 11:24:31,983][02154] Avg episode reward: 8.160, avg true_objective: 4.160
[2024-12-01 11:24:32,085][02154] Num frames 500...
[2024-12-01 11:24:32,206][02154] Num frames 600...
[2024-12-01 11:24:32,329][02154] Num frames 700...
[2024-12-01 11:24:32,462][02154] Num frames 800...
[2024-12-01 11:24:32,592][02154] Num frames 900...
[2024-12-01 11:24:32,713][02154] Num frames 1000...
[2024-12-01 11:24:32,830][02154] Num frames 1100...
[2024-12-01 11:24:32,947][02154] Num frames 1200...
[2024-12-01 11:24:33,067][02154] Num frames 1300...
[2024-12-01 11:24:33,186][02154] Num frames 1400...
[2024-12-01 11:24:33,307][02154] Num frames 1500...
[2024-12-01 11:24:33,423][02154] Num frames 1600...
[2024-12-01 11:24:33,561][02154] Num frames 1700...
[2024-12-01 11:24:33,688][02154] Num frames 1800...
[2024-12-01 11:24:33,808][02154] Num frames 1900...
[2024-12-01 11:24:33,929][02154] Num frames 2000...
[2024-12-01 11:24:34,053][02154] Num frames 2100...
[2024-12-01 11:24:34,174][02154] Num frames 2200...
[2024-12-01 11:24:34,297][02154] Num frames 2300...
[2024-12-01 11:24:34,469][02154] Avg episode rewards: #0: 30.880, true rewards: #0: 11.880
[2024-12-01 11:24:34,471][02154] Avg episode reward: 30.880, avg true_objective: 11.880
[2024-12-01 11:24:34,527][02154] Num frames 2400...
[2024-12-01 11:24:34,700][02154] Num frames 2500...
[2024-12-01 11:24:34,866][02154] Num frames 2600...
[2024-12-01 11:24:35,034][02154] Num frames 2700...
[2024-12-01 11:24:35,205][02154] Num frames 2800...
[2024-12-01 11:24:35,367][02154] Num frames 2900...
[2024-12-01 11:24:35,544][02154] Num frames 3000...
[2024-12-01 11:24:35,716][02154] Num frames 3100...
[2024-12-01 11:24:35,796][02154] Avg episode rewards: #0: 26.040, true rewards: #0: 10.373
[2024-12-01 11:24:35,798][02154] Avg episode reward: 26.040, avg true_objective: 10.373
[2024-12-01 11:24:35,953][02154] Num frames 3200...
[2024-12-01 11:24:36,124][02154] Num frames 3300...
[2024-12-01 11:24:36,294][02154] Num frames 3400...
[2024-12-01 11:24:36,468][02154] Num frames 3500...
[2024-12-01 11:24:36,659][02154] Num frames 3600...
[2024-12-01 11:24:36,826][02154] Num frames 3700...
[2024-12-01 11:24:36,942][02154] Avg episode rewards: #0: 22.880, true rewards: #0: 9.380
[2024-12-01 11:24:36,944][02154] Avg episode reward: 22.880, avg true_objective: 9.380
[2024-12-01 11:24:37,004][02154] Num frames 3800...
[2024-12-01 11:24:37,122][02154] Num frames 3900...
[2024-12-01 11:24:37,239][02154] Num frames 4000...
[2024-12-01 11:24:37,359][02154] Num frames 4100...
[2024-12-01 11:24:37,480][02154] Num frames 4200...
[2024-12-01 11:24:37,612][02154] Num frames 4300...
[2024-12-01 11:24:37,738][02154] Num frames 4400...
[2024-12-01 11:24:37,859][02154] Num frames 4500...
[2024-12-01 11:24:37,977][02154] Num frames 4600...
[2024-12-01 11:24:38,103][02154] Num frames 4700...
[2024-12-01 11:24:38,226][02154] Num frames 4800...
[2024-12-01 11:24:38,346][02154] Num frames 4900...
[2024-12-01 11:24:38,481][02154] Avg episode rewards: #0: 23.936, true rewards: #0: 9.936
[2024-12-01 11:24:38,482][02154] Avg episode reward: 23.936, avg true_objective: 9.936
[2024-12-01 11:24:38,526][02154] Num frames 5000...
[2024-12-01 11:24:38,648][02154] Num frames 5100...
[2024-12-01 11:24:38,780][02154] Num frames 5200...
[2024-12-01 11:24:38,921][02154] Avg episode rewards: #0: 21.117, true rewards: #0: 8.783
[2024-12-01 11:24:38,924][02154] Avg episode reward: 21.117, avg true_objective: 8.783
[2024-12-01 11:24:38,962][02154] Num frames 5300...
[2024-12-01 11:24:39,094][02154] Num frames 5400...
[2024-12-01 11:24:39,213][02154] Num frames 5500...
[2024-12-01 11:24:39,335][02154] Num frames 5600...
[2024-12-01 11:24:39,457][02154] Num frames 5700...
[2024-12-01 11:24:39,592][02154] Num frames 5800...
[2024-12-01 11:24:39,726][02154] Num frames 5900...
[2024-12-01 11:24:39,801][02154] Avg episode rewards: #0: 19.593, true rewards: #0: 8.450
[2024-12-01 11:24:39,802][02154] Avg episode reward: 19.593, avg true_objective: 8.450
[2024-12-01 11:24:39,905][02154] Num frames 6000...
[2024-12-01 11:24:40,029][02154] Num frames 6100...
[2024-12-01 11:24:40,149][02154] Num frames 6200...
[2024-12-01 11:24:40,323][02154] Avg episode rewards: #0: 17.624, true rewards: #0: 7.874
[2024-12-01 11:24:40,325][02154] Avg episode reward: 17.624, avg true_objective: 7.874
[2024-12-01 11:24:40,329][02154] Num frames 6300...
[2024-12-01 11:24:40,450][02154] Num frames 6400...
[2024-12-01 11:24:40,584][02154] Num frames 6500...
[2024-12-01 11:24:40,705][02154] Num frames 6600...
[2024-12-01 11:24:40,838][02154] Num frames 6700...
[2024-12-01 11:24:40,965][02154] Num frames 6800...
[2024-12-01 11:24:41,088][02154] Num frames 6900...
[2024-12-01 11:24:41,229][02154] Avg episode rewards: #0: 16.857, true rewards: #0: 7.746
[2024-12-01 11:24:41,232][02154] Avg episode reward: 16.857, avg true_objective: 7.746
[2024-12-01 11:24:41,269][02154] Num frames 7000...
[2024-12-01 11:24:41,389][02154] Num frames 7100...
[2024-12-01 11:24:41,518][02154] Num frames 7200...
[2024-12-01 11:24:41,651][02154] Num frames 7300...
[2024-12-01 11:24:41,781][02154] Num frames 7400...
[2024-12-01 11:24:41,903][02154] Num frames 7500...
[2024-12-01 11:24:42,025][02154] Num frames 7600...
[2024-12-01 11:24:42,148][02154] Num frames 7700...
[2024-12-01 11:24:42,276][02154] Avg episode rewards: #0: 16.660, true rewards: #0: 7.760
[2024-12-01 11:24:42,277][02154] Avg episode reward: 16.660, avg true_objective: 7.760
[2024-12-01 11:25:28,135][02154] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-01 11:31:45,835][02154] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-01 11:31:45,836][02154] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-01 11:31:45,838][02154] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-01 11:31:45,840][02154] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-01 11:31:45,842][02154] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-01 11:31:45,844][02154] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-01 11:31:45,845][02154] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-01 11:31:45,846][02154] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-01 11:31:45,847][02154] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-01 11:31:45,849][02154] Adding new argument 'hf_repository'='Farseer-W/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-12-01 11:31:45,849][02154] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-01 11:31:45,850][02154] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-01 11:31:45,851][02154] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-01 11:31:45,852][02154] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-12-01 11:31:45,853][02154] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-01 11:31:45,893][02154] RunningMeanStd input shape: (3, 72, 128) [2024-12-01 11:31:45,895][02154] RunningMeanStd input shape: (1,) [2024-12-01 11:31:45,912][02154] ConvEncoder: input_channels=3 [2024-12-01 11:31:45,973][02154] Conv encoder output size: 512 [2024-12-01 11:31:45,975][02154] Policy head output size: 512 [2024-12-01 11:31:46,002][02154] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-01 11:31:46,647][02154] Num frames 100... [2024-12-01 11:31:46,832][02154] Num frames 200... [2024-12-01 11:31:47,011][02154] Num frames 300... [2024-12-01 11:31:47,167][02154] Num frames 400... [2024-12-01 11:31:47,289][02154] Num frames 500... [2024-12-01 11:31:47,414][02154] Num frames 600... [2024-12-01 11:31:47,541][02154] Num frames 700... [2024-12-01 11:31:47,663][02154] Num frames 800... [2024-12-01 11:31:47,780][02154] Num frames 900... [2024-12-01 11:31:47,901][02154] Num frames 1000... [2024-12-01 11:31:48,021][02154] Num frames 1100... [2024-12-01 11:31:48,142][02154] Num frames 1200... [2024-12-01 11:31:48,259][02154] Num frames 1300... [2024-12-01 11:31:48,390][02154] Num frames 1400... [2024-12-01 11:31:48,512][02154] Num frames 1500... [2024-12-01 11:31:48,641][02154] Num frames 1600... [2024-12-01 11:31:48,764][02154] Num frames 1700... [2024-12-01 11:31:48,889][02154] Num frames 1800... [2024-12-01 11:31:49,012][02154] Num frames 1900... [2024-12-01 11:31:49,093][02154] Avg episode rewards: #0: 43.199, true rewards: #0: 19.200 [2024-12-01 11:31:49,095][02154] Avg episode reward: 43.199, avg true_objective: 19.200 [2024-12-01 11:31:49,198][02154] Num frames 2000... [2024-12-01 11:31:49,317][02154] Num frames 2100... [2024-12-01 11:31:49,447][02154] Num frames 2200... [2024-12-01 11:31:49,577][02154] Num frames 2300... [2024-12-01 11:31:49,698][02154] Num frames 2400... 
[2024-12-01 11:31:49,816][02154] Avg episode rewards: #0: 25.750, true rewards: #0: 12.250
[2024-12-01 11:31:49,817][02154] Avg episode reward: 25.750, avg true_objective: 12.250
[2024-12-01 11:31:49,878][02154] Num frames 2500...
[2024-12-01 11:31:50,000][02154] Num frames 2600...
[2024-12-01 11:31:50,125][02154] Num frames 2700...
[2024-12-01 11:31:50,247][02154] Num frames 2800...
[2024-12-01 11:31:50,382][02154] Num frames 2900...
[2024-12-01 11:31:50,550][02154] Avg episode rewards: #0: 19.646, true rewards: #0: 9.980
[2024-12-01 11:31:50,554][02154] Avg episode reward: 19.646, avg true_objective: 9.980
[2024-12-01 11:31:50,563][02154] Num frames 3000...
[2024-12-01 11:31:50,686][02154] Num frames 3100...
[2024-12-01 11:31:50,811][02154] Num frames 3200...
[2024-12-01 11:31:50,931][02154] Num frames 3300...
[2024-12-01 11:31:51,054][02154] Num frames 3400...
[2024-12-01 11:31:51,177][02154] Num frames 3500...
[2024-12-01 11:31:51,308][02154] Num frames 3600...
[2024-12-01 11:31:51,455][02154] Avg episode rewards: #0: 17.915, true rewards: #0: 9.165
[2024-12-01 11:31:51,456][02154] Avg episode reward: 17.915, avg true_objective: 9.165
[2024-12-01 11:31:51,501][02154] Num frames 3700...
[2024-12-01 11:31:51,635][02154] Num frames 3800...
[2024-12-01 11:31:51,756][02154] Num frames 3900...
[2024-12-01 11:31:51,880][02154] Num frames 4000...
[2024-12-01 11:31:52,000][02154] Num frames 4100...
[2024-12-01 11:31:52,127][02154] Num frames 4200...
[2024-12-01 11:31:52,249][02154] Num frames 4300...
[2024-12-01 11:31:52,372][02154] Num frames 4400...
[2024-12-01 11:31:52,505][02154] Num frames 4500...
[2024-12-01 11:31:52,636][02154] Num frames 4600...
[2024-12-01 11:31:52,759][02154] Num frames 4700...
[2024-12-01 11:31:52,879][02154] Num frames 4800...
[2024-12-01 11:31:52,998][02154] Num frames 4900...
[2024-12-01 11:31:53,124][02154] Num frames 5000...
[2024-12-01 11:31:53,244][02154] Num frames 5100...
[2024-12-01 11:31:53,361][02154] Avg episode rewards: #0: 21.288, true rewards: #0: 10.288
[2024-12-01 11:31:53,362][02154] Avg episode reward: 21.288, avg true_objective: 10.288
[2024-12-01 11:31:53,432][02154] Num frames 5200...
[2024-12-01 11:31:53,569][02154] Num frames 5300...
[2024-12-01 11:31:53,688][02154] Num frames 5400...
[2024-12-01 11:31:53,808][02154] Num frames 5500...
[2024-12-01 11:31:53,927][02154] Num frames 5600...
[2024-12-01 11:31:54,051][02154] Num frames 5700...
[2024-12-01 11:31:54,175][02154] Num frames 5800...
[2024-12-01 11:31:54,292][02154] Num frames 5900...
[2024-12-01 11:31:54,413][02154] Num frames 6000...
[2024-12-01 11:31:54,548][02154] Num frames 6100...
[2024-12-01 11:31:54,668][02154] Num frames 6200...
[2024-12-01 11:31:54,792][02154] Num frames 6300...
[2024-12-01 11:31:54,916][02154] Num frames 6400...
[2024-12-01 11:31:55,039][02154] Avg episode rewards: #0: 22.760, true rewards: #0: 10.760
[2024-12-01 11:31:55,041][02154] Avg episode reward: 22.760, avg true_objective: 10.760
[2024-12-01 11:31:55,102][02154] Num frames 6500...
[2024-12-01 11:31:55,226][02154] Num frames 6600...
[2024-12-01 11:31:55,349][02154] Num frames 6700...
[2024-12-01 11:31:55,471][02154] Num frames 6800...
[2024-12-01 11:31:55,608][02154] Num frames 6900...
[2024-12-01 11:31:55,734][02154] Num frames 7000...
[2024-12-01 11:31:55,858][02154] Num frames 7100...
[2024-12-01 11:31:55,980][02154] Num frames 7200...
[2024-12-01 11:31:56,105][02154] Num frames 7300...
[2024-12-01 11:31:56,230][02154] Num frames 7400...
[2024-12-01 11:31:56,348][02154] Num frames 7500...
[2024-12-01 11:31:56,471][02154] Num frames 7600...
[2024-12-01 11:31:56,611][02154] Num frames 7700...
[2024-12-01 11:31:56,739][02154] Num frames 7800...
[2024-12-01 11:31:56,860][02154] Num frames 7900...
[2024-12-01 11:31:56,983][02154] Num frames 8000...
[2024-12-01 11:31:57,113][02154] Num frames 8100...
[2024-12-01 11:31:57,306][02154] Num frames 8200...
[2024-12-01 11:31:57,369][02154] Avg episode rewards: #0: 26.290, true rewards: #0: 11.719
[2024-12-01 11:31:57,372][02154] Avg episode reward: 26.290, avg true_objective: 11.719
[2024-12-01 11:31:57,583][02154] Num frames 8300...
[2024-12-01 11:31:57,772][02154] Num frames 8400...
[2024-12-01 11:31:57,969][02154] Num frames 8500...
[2024-12-01 11:31:58,184][02154] Num frames 8600...
[2024-12-01 11:31:58,376][02154] Num frames 8700...
[2024-12-01 11:31:58,584][02154] Num frames 8800...
[2024-12-01 11:31:58,759][02154] Num frames 8900...
[2024-12-01 11:31:58,928][02154] Num frames 9000...
[2024-12-01 11:31:59,097][02154] Num frames 9100...
[2024-12-01 11:31:59,266][02154] Num frames 9200...
[2024-12-01 11:31:59,440][02154] Num frames 9300...
[2024-12-01 11:31:59,627][02154] Num frames 9400...
[2024-12-01 11:31:59,811][02154] Num frames 9500...
[2024-12-01 11:31:59,983][02154] Num frames 9600...
[2024-12-01 11:32:00,169][02154] Num frames 9700...
[2024-12-01 11:32:00,240][02154] Avg episode rewards: #0: 27.509, true rewards: #0: 12.134
[2024-12-01 11:32:00,242][02154] Avg episode reward: 27.509, avg true_objective: 12.134
[2024-12-01 11:32:00,351][02154] Num frames 9800...
[2024-12-01 11:32:00,471][02154] Num frames 9900...
[2024-12-01 11:32:00,599][02154] Num frames 10000...
[2024-12-01 11:32:00,729][02154] Num frames 10100...
[2024-12-01 11:32:00,853][02154] Num frames 10200...
[2024-12-01 11:32:00,977][02154] Num frames 10300...
[2024-12-01 11:32:01,106][02154] Num frames 10400...
[2024-12-01 11:32:01,246][02154] Avg episode rewards: #0: 26.299, true rewards: #0: 11.632
[2024-12-01 11:32:01,247][02154] Avg episode reward: 26.299, avg true_objective: 11.632
[2024-12-01 11:32:01,287][02154] Num frames 10500...
[2024-12-01 11:32:01,405][02154] Num frames 10600...
[2024-12-01 11:32:01,530][02154] Num frames 10700...
[2024-12-01 11:32:01,657][02154] Num frames 10800...
[2024-12-01 11:32:01,787][02154] Num frames 10900...
[2024-12-01 11:32:01,911][02154] Num frames 11000...
[2024-12-01 11:32:02,031][02154] Num frames 11100...
[2024-12-01 11:32:02,156][02154] Num frames 11200...
[2024-12-01 11:32:02,278][02154] Num frames 11300...
[2024-12-01 11:32:02,409][02154] Avg episode rewards: #0: 25.462, true rewards: #0: 11.362
[2024-12-01 11:32:02,411][02154] Avg episode reward: 25.462, avg true_objective: 11.362
[2024-12-01 11:33:08,588][02154] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
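The "Avg episode rewards" lines printed after each of the 10 evaluation episodes (`max_num_episodes=10`) are consistent with a simple running mean over completed episodes: 43.199 after episode 1, then 25.750 after two episodes, and so on down to the final 25.462. A small sketch of that computation, with illustrative per-episode rewards inferred from the first two log lines rather than taken from any actual Sample Factory output:

```python
def running_averages(episode_rewards):
    """Return the mean reward after each completed episode, matching the
    pattern of the 'Avg episode rewards' log lines."""
    averages = []
    total = 0.0
    for n, reward in enumerate(episode_rewards, start=1):
        total += reward
        averages.append(total / n)
    return averages


# Episode 1 scored ~43.199; episode 2's ~8.301 is implied by the drop to 25.750.
avgs = running_averages([43.199, 8.301])
```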