[2024-11-10 12:37:59,936][02935] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-11-10 12:37:59,940][02935] Rollout worker 0 uses device cpu
[2024-11-10 12:37:59,941][02935] Rollout worker 1 uses device cpu
[2024-11-10 12:37:59,943][02935] Rollout worker 2 uses device cpu
[2024-11-10 12:37:59,944][02935] Rollout worker 3 uses device cpu
[2024-11-10 12:37:59,945][02935] Rollout worker 4 uses device cpu
[2024-11-10 12:37:59,947][02935] Rollout worker 5 uses device cpu
[2024-11-10 12:37:59,948][02935] Rollout worker 6 uses device cpu
[2024-11-10 12:37:59,949][02935] Rollout worker 7 uses device cpu
[2024-11-10 12:38:00,104][02935] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-10 12:38:00,106][02935] InferenceWorker_p0-w0: min num requests: 2
[2024-11-10 12:38:00,139][02935] Starting all processes...
[2024-11-10 12:38:00,140][02935] Starting process learner_proc0
[2024-11-10 12:38:00,185][02935] Starting all processes...
[2024-11-10 12:38:00,199][02935] Starting process inference_proc0-0
[2024-11-10 12:38:00,199][02935] Starting process rollout_proc0
[2024-11-10 12:38:00,201][02935] Starting process rollout_proc1
[2024-11-10 12:38:00,202][02935] Starting process rollout_proc2
[2024-11-10 12:38:00,202][02935] Starting process rollout_proc3
[2024-11-10 12:38:00,202][02935] Starting process rollout_proc4
[2024-11-10 12:38:00,202][02935] Starting process rollout_proc5
[2024-11-10 12:38:00,202][02935] Starting process rollout_proc6
[2024-11-10 12:38:00,202][02935] Starting process rollout_proc7
[2024-11-10 12:38:16,474][08251] Worker 0 uses CPU cores [0]
[2024-11-10 12:38:16,481][08237] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-10 12:38:16,483][08237] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-11-10 12:38:16,522][08250] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-10 12:38:16,524][08250] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-11-10 12:38:16,538][08254] Worker 3 uses CPU cores [1]
[2024-11-10 12:38:16,539][08237] Num visible devices: 1
[2024-11-10 12:38:16,589][08237] Starting seed is not provided
[2024-11-10 12:38:16,590][08237] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-10 12:38:16,590][08237] Initializing actor-critic model on device cuda:0
[2024-11-10 12:38:16,591][08237] RunningMeanStd input shape: (3, 72, 128)
[2024-11-10 12:38:16,595][08237] RunningMeanStd input shape: (1,)
[2024-11-10 12:38:16,601][08250] Num visible devices: 1
[2024-11-10 12:38:16,644][08237] ConvEncoder: input_channels=3
[2024-11-10 12:38:16,670][08253] Worker 2 uses CPU cores [0]
[2024-11-10 12:38:16,719][08258] Worker 6 uses CPU cores [0]
[2024-11-10 12:38:16,751][08252] Worker 1 uses CPU cores [1]
[2024-11-10 12:38:16,795][08255] Worker 4 uses CPU cores [0]
[2024-11-10 12:38:16,849][08256] Worker 5 uses CPU cores [1]
[2024-11-10 12:38:16,897][08257] Worker 7 uses CPU cores [1]
[2024-11-10 12:38:16,977][08237] Conv encoder output size: 512
[2024-11-10 12:38:16,977][08237] Policy head output size: 512
[2024-11-10 12:38:17,036][08237] Created Actor Critic model with architecture:
[2024-11-10 12:38:17,036][08237] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-11-10 12:38:17,330][08237] Using optimizer
[2024-11-10 12:38:20,097][02935] Heartbeat connected on Batcher_0
[2024-11-10 12:38:20,104][02935] Heartbeat connected on InferenceWorker_p0-w0
[2024-11-10 12:38:20,113][02935] Heartbeat connected on RolloutWorker_w0
[2024-11-10 12:38:20,116][02935] Heartbeat connected on RolloutWorker_w1
[2024-11-10 12:38:20,121][02935] Heartbeat connected on RolloutWorker_w2
[2024-11-10 12:38:20,124][02935] Heartbeat connected on RolloutWorker_w3
[2024-11-10 12:38:20,128][02935] Heartbeat connected on RolloutWorker_w4
[2024-11-10 12:38:20,131][02935] Heartbeat connected on RolloutWorker_w5
[2024-11-10 12:38:20,135][02935] Heartbeat connected on RolloutWorker_w6
[2024-11-10 12:38:20,138][02935] Heartbeat connected on RolloutWorker_w7
[2024-11-10 12:38:20,675][08237] No checkpoints found
[2024-11-10 12:38:20,675][08237] Did not load from checkpoint, starting from scratch!
[2024-11-10 12:38:20,676][08237] Initialized policy 0 weights for model version 0
[2024-11-10 12:38:20,685][08237] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-11-10 12:38:20,694][08237] LearnerWorker_p0 finished initialization!
[2024-11-10 12:38:20,695][02935] Heartbeat connected on LearnerWorker_p0 [2024-11-10 12:38:20,832][08250] RunningMeanStd input shape: (3, 72, 128) [2024-11-10 12:38:20,833][08250] RunningMeanStd input shape: (1,) [2024-11-10 12:38:20,854][08250] ConvEncoder: input_channels=3 [2024-11-10 12:38:21,016][08250] Conv encoder output size: 512 [2024-11-10 12:38:21,016][08250] Policy head output size: 512 [2024-11-10 12:38:21,090][02935] Inference worker 0-0 is ready! [2024-11-10 12:38:21,093][02935] All inference workers are ready! Signal rollout workers to start! [2024-11-10 12:38:21,292][08256] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-10 12:38:21,291][08257] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-10 12:38:21,300][08254] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-10 12:38:21,300][08252] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-10 12:38:21,434][08255] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-10 12:38:21,436][08253] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-10 12:38:21,437][08251] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-10 12:38:21,438][08258] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-10 12:38:22,853][08255] Decorrelating experience for 0 frames... [2024-11-10 12:38:23,193][08254] Decorrelating experience for 0 frames... [2024-11-10 12:38:23,195][08252] Decorrelating experience for 0 frames... [2024-11-10 12:38:23,188][08256] Decorrelating experience for 0 frames... [2024-11-10 12:38:23,190][08257] Decorrelating experience for 0 frames... [2024-11-10 12:38:24,275][02935] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-10 12:38:24,422][08258] Decorrelating experience for 0 frames... [2024-11-10 12:38:24,424][08255] Decorrelating experience for 32 frames... 
[2024-11-10 12:38:24,694][08257] Decorrelating experience for 32 frames... [2024-11-10 12:38:24,692][08252] Decorrelating experience for 32 frames... [2024-11-10 12:38:24,698][08256] Decorrelating experience for 32 frames... [2024-11-10 12:38:25,728][08254] Decorrelating experience for 32 frames... [2024-11-10 12:38:25,749][08258] Decorrelating experience for 32 frames... [2024-11-10 12:38:25,756][08253] Decorrelating experience for 0 frames... [2024-11-10 12:38:26,023][08257] Decorrelating experience for 64 frames... [2024-11-10 12:38:26,060][08255] Decorrelating experience for 64 frames... [2024-11-10 12:38:26,370][08253] Decorrelating experience for 32 frames... [2024-11-10 12:38:26,746][08256] Decorrelating experience for 64 frames... [2024-11-10 12:38:27,004][08254] Decorrelating experience for 64 frames... [2024-11-10 12:38:27,063][08257] Decorrelating experience for 96 frames... [2024-11-10 12:38:27,707][08252] Decorrelating experience for 64 frames... [2024-11-10 12:38:27,810][08256] Decorrelating experience for 96 frames... [2024-11-10 12:38:27,976][08258] Decorrelating experience for 64 frames... [2024-11-10 12:38:28,403][08254] Decorrelating experience for 96 frames... [2024-11-10 12:38:28,437][08253] Decorrelating experience for 64 frames... [2024-11-10 12:38:29,275][02935] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-10 12:38:29,327][08251] Decorrelating experience for 0 frames... [2024-11-10 12:38:29,705][08255] Decorrelating experience for 96 frames... [2024-11-10 12:38:29,792][08258] Decorrelating experience for 96 frames... [2024-11-10 12:38:29,858][08252] Decorrelating experience for 96 frames... [2024-11-10 12:38:30,196][08253] Decorrelating experience for 96 frames... [2024-11-10 12:38:31,287][08251] Decorrelating experience for 32 frames... [2024-11-10 12:38:32,619][08237] Signal inference workers to stop experience collection... 
[2024-11-10 12:38:32,629][08250] InferenceWorker_p0-w0: stopping experience collection [2024-11-10 12:38:32,896][08251] Decorrelating experience for 64 frames... [2024-11-10 12:38:33,252][08251] Decorrelating experience for 96 frames... [2024-11-10 12:38:34,275][02935] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 239.6. Samples: 2396. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-11-10 12:38:34,282][02935] Avg episode reward: [(0, '2.313')] [2024-11-10 12:38:36,102][08237] Signal inference workers to resume experience collection... [2024-11-10 12:38:36,103][08250] InferenceWorker_p0-w0: resuming experience collection [2024-11-10 12:38:39,280][02935] Fps is (10 sec: 1228.2, 60 sec: 818.9, 300 sec: 818.9). Total num frames: 12288. Throughput: 0: 281.4. Samples: 4222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2024-11-10 12:38:39,282][02935] Avg episode reward: [(0, '3.116')] [2024-11-10 12:38:44,275][02935] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 32768. Throughput: 0: 332.9. Samples: 6658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:38:44,281][02935] Avg episode reward: [(0, '3.703')] [2024-11-10 12:38:45,566][08250] Updated weights for policy 0, policy_version 10 (0.0027) [2024-11-10 12:38:49,275][02935] Fps is (10 sec: 4098.0, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 520.6. Samples: 13016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:38:49,279][02935] Avg episode reward: [(0, '4.309')] [2024-11-10 12:38:54,275][02935] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 73728. Throughput: 0: 631.0. Samples: 18930. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:38:54,277][02935] Avg episode reward: [(0, '4.357')] [2024-11-10 12:38:56,908][08250] Updated weights for policy 0, policy_version 20 (0.0035) [2024-11-10 12:38:59,275][02935] Fps is (10 sec: 3686.4, 60 sec: 2574.6, 300 sec: 2574.6). Total num frames: 90112. Throughput: 0: 603.9. Samples: 21136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:38:59,277][02935] Avg episode reward: [(0, '4.382')] [2024-11-10 12:39:04,275][02935] Fps is (10 sec: 3686.4, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 698.2. Samples: 27926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:39:04,278][02935] Avg episode reward: [(0, '4.581')] [2024-11-10 12:39:04,299][08237] Saving new best policy, reward=4.581! [2024-11-10 12:39:06,071][08250] Updated weights for policy 0, policy_version 30 (0.0022) [2024-11-10 12:39:09,275][02935] Fps is (10 sec: 4505.6, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 135168. Throughput: 0: 770.4. Samples: 34666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-10 12:39:09,277][02935] Avg episode reward: [(0, '4.473')] [2024-11-10 12:39:14,275][02935] Fps is (10 sec: 3686.4, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 147456. Throughput: 0: 820.1. Samples: 36904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-10 12:39:14,279][02935] Avg episode reward: [(0, '4.421')] [2024-11-10 12:39:16,987][08250] Updated weights for policy 0, policy_version 40 (0.0021) [2024-11-10 12:39:19,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3127.9, 300 sec: 3127.9). Total num frames: 172032. Throughput: 0: 897.1. Samples: 42766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:39:19,281][02935] Avg episode reward: [(0, '4.423')] [2024-11-10 12:39:24,275][02935] Fps is (10 sec: 4915.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 196608. Throughput: 0: 1018.9. Samples: 50066. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:39:24,283][02935] Avg episode reward: [(0, '4.392')] [2024-11-10 12:39:25,887][08250] Updated weights for policy 0, policy_version 50 (0.0023) [2024-11-10 12:39:29,281][02935] Fps is (10 sec: 4093.7, 60 sec: 3549.5, 300 sec: 3276.5). Total num frames: 212992. Throughput: 0: 1021.3. Samples: 52622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:39:29,287][02935] Avg episode reward: [(0, '4.438')] [2024-11-10 12:39:34,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3335.3). Total num frames: 233472. Throughput: 0: 993.8. Samples: 57736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:39:34,277][02935] Avg episode reward: [(0, '4.448')] [2024-11-10 12:39:36,604][08250] Updated weights for policy 0, policy_version 60 (0.0041) [2024-11-10 12:39:39,275][02935] Fps is (10 sec: 4508.2, 60 sec: 4096.3, 300 sec: 3440.6). Total num frames: 258048. Throughput: 0: 1022.4. Samples: 64936. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:39:39,279][02935] Avg episode reward: [(0, '4.493')] [2024-11-10 12:39:44,276][02935] Fps is (10 sec: 4095.5, 60 sec: 4027.6, 300 sec: 3430.3). Total num frames: 274432. Throughput: 0: 1046.5. Samples: 68230. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-11-10 12:39:44,282][02935] Avg episode reward: [(0, '4.742')] [2024-11-10 12:39:44,296][08237] Saving new best policy, reward=4.742! [2024-11-10 12:39:47,652][08250] Updated weights for policy 0, policy_version 70 (0.0020) [2024-11-10 12:39:49,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3421.4). Total num frames: 290816. Throughput: 0: 993.2. Samples: 72622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:39:49,281][02935] Avg episode reward: [(0, '4.722')] [2024-11-10 12:39:54,275][02935] Fps is (10 sec: 4096.5, 60 sec: 4027.7, 300 sec: 3504.4). Total num frames: 315392. Throughput: 0: 997.0. Samples: 79530. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:39:54,277][02935] Avg episode reward: [(0, '4.531')] [2024-11-10 12:39:54,286][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth... [2024-11-10 12:39:56,679][08250] Updated weights for policy 0, policy_version 80 (0.0027) [2024-11-10 12:39:59,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3535.5). Total num frames: 335872. Throughput: 0: 1027.6. Samples: 83148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:39:59,277][02935] Avg episode reward: [(0, '4.436')] [2024-11-10 12:40:04,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3522.6). Total num frames: 352256. Throughput: 0: 1009.0. Samples: 88170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:40:04,287][02935] Avg episode reward: [(0, '4.271')] [2024-11-10 12:40:07,711][08250] Updated weights for policy 0, policy_version 90 (0.0027) [2024-11-10 12:40:09,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 372736. Throughput: 0: 983.1. Samples: 94306. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:40:09,277][02935] Avg episode reward: [(0, '4.262')] [2024-11-10 12:40:14,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3611.9). Total num frames: 397312. Throughput: 0: 1006.2. Samples: 97896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:40:14,277][02935] Avg episode reward: [(0, '4.420')] [2024-11-10 12:40:16,725][08250] Updated weights for policy 0, policy_version 100 (0.0017) [2024-11-10 12:40:19,278][02935] Fps is (10 sec: 4094.6, 60 sec: 4027.5, 300 sec: 3597.2). Total num frames: 413696. Throughput: 0: 1024.0. Samples: 103820. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:40:19,285][02935] Avg episode reward: [(0, '4.507')] [2024-11-10 12:40:24,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3584.0). Total num frames: 430080. 
Throughput: 0: 955.7. Samples: 107942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) [2024-11-10 12:40:24,278][02935] Avg episode reward: [(0, '4.654')] [2024-11-10 12:40:28,458][08250] Updated weights for policy 0, policy_version 110 (0.0014) [2024-11-10 12:40:29,275][02935] Fps is (10 sec: 3687.7, 60 sec: 3959.8, 300 sec: 3604.5). Total num frames: 450560. Throughput: 0: 960.7. Samples: 111460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-10 12:40:29,280][02935] Avg episode reward: [(0, '4.818')] [2024-11-10 12:40:29,283][08237] Saving new best policy, reward=4.818! [2024-11-10 12:40:34,278][02935] Fps is (10 sec: 4094.6, 60 sec: 3959.2, 300 sec: 3623.3). Total num frames: 471040. Throughput: 0: 1016.4. Samples: 118362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:40:34,283][02935] Avg episode reward: [(0, '4.578')] [2024-11-10 12:40:39,275][02935] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3610.5). Total num frames: 487424. Throughput: 0: 958.4. Samples: 122658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-10 12:40:39,278][02935] Avg episode reward: [(0, '4.611')] [2024-11-10 12:40:40,048][08250] Updated weights for policy 0, policy_version 120 (0.0042) [2024-11-10 12:40:44,275][02935] Fps is (10 sec: 3687.7, 60 sec: 3891.3, 300 sec: 3627.9). Total num frames: 507904. Throughput: 0: 954.5. Samples: 126102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:40:44,278][02935] Avg episode reward: [(0, '4.659')] [2024-11-10 12:40:48,479][08250] Updated weights for policy 0, policy_version 130 (0.0017) [2024-11-10 12:40:49,275][02935] Fps is (10 sec: 4505.9, 60 sec: 4027.7, 300 sec: 3672.3). Total num frames: 532480. Throughput: 0: 1001.0. Samples: 133216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:40:49,281][02935] Avg episode reward: [(0, '4.606')] [2024-11-10 12:40:54,275][02935] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3659.1). Total num frames: 548864. Throughput: 0: 973.1. 
Samples: 138096. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:40:54,277][02935] Avg episode reward: [(0, '4.788')] [2024-11-10 12:40:59,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3673.2). Total num frames: 569344. Throughput: 0: 949.4. Samples: 140620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:40:59,277][02935] Avg episode reward: [(0, '4.814')] [2024-11-10 12:40:59,954][08250] Updated weights for policy 0, policy_version 140 (0.0023) [2024-11-10 12:41:04,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3712.0). Total num frames: 593920. Throughput: 0: 976.1. Samples: 147740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:41:04,280][02935] Avg episode reward: [(0, '4.356')] [2024-11-10 12:41:09,275][02935] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3698.8). Total num frames: 610304. Throughput: 0: 1019.8. Samples: 153832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:41:09,280][02935] Avg episode reward: [(0, '4.201')] [2024-11-10 12:41:09,437][08250] Updated weights for policy 0, policy_version 150 (0.0013) [2024-11-10 12:41:14,275][02935] Fps is (10 sec: 3276.6, 60 sec: 3822.9, 300 sec: 3686.4). Total num frames: 626688. Throughput: 0: 991.0. Samples: 156054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:41:14,279][02935] Avg episode reward: [(0, '4.423')] [2024-11-10 12:41:19,240][08250] Updated weights for policy 0, policy_version 160 (0.0027) [2024-11-10 12:41:19,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4028.0, 300 sec: 3744.9). Total num frames: 655360. Throughput: 0: 992.3. Samples: 163012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:41:19,277][02935] Avg episode reward: [(0, '4.596')] [2024-11-10 12:41:24,275][02935] Fps is (10 sec: 4915.4, 60 sec: 4096.0, 300 sec: 3754.7). Total num frames: 675840. Throughput: 0: 1046.9. Samples: 169770. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:41:24,280][02935] Avg episode reward: [(0, '4.552')] [2024-11-10 12:41:29,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3741.8). Total num frames: 692224. Throughput: 0: 1021.2. Samples: 172054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:41:29,282][02935] Avg episode reward: [(0, '4.552')] [2024-11-10 12:41:30,402][08250] Updated weights for policy 0, policy_version 170 (0.0024) [2024-11-10 12:41:34,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4028.0, 300 sec: 3751.1). Total num frames: 712704. Throughput: 0: 995.5. Samples: 178014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:41:34,280][02935] Avg episode reward: [(0, '4.639')] [2024-11-10 12:41:38,718][08250] Updated weights for policy 0, policy_version 180 (0.0013) [2024-11-10 12:41:39,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3780.9). Total num frames: 737280. Throughput: 0: 1049.9. Samples: 185340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:41:39,282][02935] Avg episode reward: [(0, '4.831')] [2024-11-10 12:41:39,285][08237] Saving new best policy, reward=4.831! [2024-11-10 12:41:44,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3768.3). Total num frames: 753664. Throughput: 0: 1052.5. Samples: 187982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:41:44,278][02935] Avg episode reward: [(0, '4.703')] [2024-11-10 12:41:49,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3776.3). Total num frames: 774144. Throughput: 0: 1006.6. Samples: 193036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:41:49,281][02935] Avg episode reward: [(0, '4.839')] [2024-11-10 12:41:49,284][08237] Saving new best policy, reward=4.839! [2024-11-10 12:41:50,000][08250] Updated weights for policy 0, policy_version 190 (0.0015) [2024-11-10 12:41:54,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3803.4). Total num frames: 798720. 
Throughput: 0: 1031.4. Samples: 200244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:41:54,279][02935] Avg episode reward: [(0, '4.840')] [2024-11-10 12:41:54,297][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth... [2024-11-10 12:41:54,424][08237] Saving new best policy, reward=4.840! [2024-11-10 12:41:59,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3791.2). Total num frames: 815104. Throughput: 0: 1053.2. Samples: 203446. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-10 12:41:59,277][02935] Avg episode reward: [(0, '4.802')] [2024-11-10 12:41:59,608][08250] Updated weights for policy 0, policy_version 200 (0.0029) [2024-11-10 12:42:04,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3779.5). Total num frames: 831488. Throughput: 0: 999.6. Samples: 207996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:42:04,279][02935] Avg episode reward: [(0, '4.851')] [2024-11-10 12:42:04,290][08237] Saving new best policy, reward=4.851! [2024-11-10 12:42:09,275][02935] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 3804.7). Total num frames: 856064. Throughput: 0: 1006.8. Samples: 215078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:42:09,278][02935] Avg episode reward: [(0, '5.192')] [2024-11-10 12:42:09,282][08237] Saving new best policy, reward=5.192! [2024-11-10 12:42:09,594][08250] Updated weights for policy 0, policy_version 210 (0.0026) [2024-11-10 12:42:14,275][02935] Fps is (10 sec: 4915.2, 60 sec: 4232.6, 300 sec: 3828.9). Total num frames: 880640. Throughput: 0: 1036.6. Samples: 218700. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:42:14,277][02935] Avg episode reward: [(0, '5.002')] [2024-11-10 12:42:19,278][02935] Fps is (10 sec: 4094.9, 60 sec: 4027.5, 300 sec: 3817.1). Total num frames: 897024. Throughput: 0: 1021.3. Samples: 223974. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:42:19,283][02935] Avg episode reward: [(0, '4.676')] [2024-11-10 12:42:20,466][08250] Updated weights for policy 0, policy_version 220 (0.0015) [2024-11-10 12:42:24,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3822.9). Total num frames: 917504. Throughput: 0: 993.5. Samples: 230046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:42:24,280][02935] Avg episode reward: [(0, '4.771')] [2024-11-10 12:42:28,952][08250] Updated weights for policy 0, policy_version 230 (0.0020) [2024-11-10 12:42:29,275][02935] Fps is (10 sec: 4506.9, 60 sec: 4164.3, 300 sec: 3845.2). Total num frames: 942080. Throughput: 0: 1017.3. Samples: 233760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:42:29,279][02935] Avg episode reward: [(0, '4.916')] [2024-11-10 12:42:34,278][02935] Fps is (10 sec: 4094.6, 60 sec: 4095.8, 300 sec: 3833.8). Total num frames: 958464. Throughput: 0: 1043.8. Samples: 240012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:42:34,280][02935] Avg episode reward: [(0, '5.050')] [2024-11-10 12:42:39,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3839.0). Total num frames: 978944. Throughput: 0: 1001.3. Samples: 245302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:42:39,277][02935] Avg episode reward: [(0, '5.140')] [2024-11-10 12:42:39,916][08250] Updated weights for policy 0, policy_version 240 (0.0030) [2024-11-10 12:42:44,275][02935] Fps is (10 sec: 4507.1, 60 sec: 4164.3, 300 sec: 3859.7). Total num frames: 1003520. Throughput: 0: 1011.7. Samples: 248974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:42:44,277][02935] Avg episode reward: [(0, '5.144')] [2024-11-10 12:42:48,573][08250] Updated weights for policy 0, policy_version 250 (0.0015) [2024-11-10 12:42:49,283][02935] Fps is (10 sec: 4502.1, 60 sec: 4163.7, 300 sec: 3864.0). Total num frames: 1024000. Throughput: 0: 1072.2. Samples: 256254. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:42:49,287][02935] Avg episode reward: [(0, '5.086')] [2024-11-10 12:42:54,276][02935] Fps is (10 sec: 3685.9, 60 sec: 4027.6, 300 sec: 3853.3). Total num frames: 1040384. Throughput: 0: 1013.9. Samples: 260706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:42:54,278][02935] Avg episode reward: [(0, '5.188')] [2024-11-10 12:42:59,027][08250] Updated weights for policy 0, policy_version 260 (0.0054) [2024-11-10 12:42:59,275][02935] Fps is (10 sec: 4099.1, 60 sec: 4164.3, 300 sec: 3872.6). Total num frames: 1064960. Throughput: 0: 1014.5. Samples: 264354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:42:59,280][02935] Avg episode reward: [(0, '5.463')] [2024-11-10 12:42:59,284][08237] Saving new best policy, reward=5.463! [2024-11-10 12:43:04,275][02935] Fps is (10 sec: 4915.7, 60 sec: 4300.8, 300 sec: 3891.2). Total num frames: 1089536. Throughput: 0: 1061.4. Samples: 271734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:43:04,282][02935] Avg episode reward: [(0, '5.540')] [2024-11-10 12:43:04,292][08237] Saving new best policy, reward=5.540! [2024-11-10 12:43:09,229][08250] Updated weights for policy 0, policy_version 270 (0.0017) [2024-11-10 12:43:09,275][02935] Fps is (10 sec: 4095.9, 60 sec: 4164.2, 300 sec: 3880.4). Total num frames: 1105920. Throughput: 0: 1040.2. Samples: 276854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:43:09,279][02935] Avg episode reward: [(0, '5.501')] [2024-11-10 12:43:14,275][02935] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 3884.1). Total num frames: 1126400. Throughput: 0: 1016.0. Samples: 279482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:43:14,282][02935] Avg episode reward: [(0, '5.588')] [2024-11-10 12:43:14,289][08237] Saving new best policy, reward=5.588! 
[2024-11-10 12:43:18,554][08250] Updated weights for policy 0, policy_version 280 (0.0034) [2024-11-10 12:43:19,275][02935] Fps is (10 sec: 4096.1, 60 sec: 4164.5, 300 sec: 3887.7). Total num frames: 1146880. Throughput: 0: 1039.3. Samples: 286776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:43:19,278][02935] Avg episode reward: [(0, '5.787')] [2024-11-10 12:43:19,280][08237] Saving new best policy, reward=5.787! [2024-11-10 12:43:24,275][02935] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 3957.2). Total num frames: 1167360. Throughput: 0: 1050.2. Samples: 292560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:43:24,281][02935] Avg episode reward: [(0, '5.821')] [2024-11-10 12:43:24,296][08237] Saving new best policy, reward=5.821! [2024-11-10 12:43:29,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1183744. Throughput: 0: 1015.2. Samples: 294660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-10 12:43:29,282][02935] Avg episode reward: [(0, '5.706')] [2024-11-10 12:43:29,909][08250] Updated weights for policy 0, policy_version 290 (0.0021) [2024-11-10 12:43:34,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4164.5, 300 sec: 4054.4). Total num frames: 1208320. Throughput: 0: 1007.7. Samples: 301594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:43:34,279][02935] Avg episode reward: [(0, '5.689')] [2024-11-10 12:43:38,252][08250] Updated weights for policy 0, policy_version 300 (0.0026) [2024-11-10 12:43:39,279][02935] Fps is (10 sec: 4503.5, 60 sec: 4164.0, 300 sec: 4054.3). Total num frames: 1228800. Throughput: 0: 1060.7. Samples: 308440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:43:39,283][02935] Avg episode reward: [(0, '5.260')] [2024-11-10 12:43:44,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1245184. Throughput: 0: 1029.7. Samples: 310692. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:43:44,280][02935] Avg episode reward: [(0, '5.467')] [2024-11-10 12:43:49,179][08250] Updated weights for policy 0, policy_version 310 (0.0013) [2024-11-10 12:43:49,275][02935] Fps is (10 sec: 4097.8, 60 sec: 4096.5, 300 sec: 4054.3). Total num frames: 1269760. Throughput: 0: 999.6. Samples: 316714. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-10 12:43:49,282][02935] Avg episode reward: [(0, '5.640')] [2024-11-10 12:43:54,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 4068.2). Total num frames: 1290240. Throughput: 0: 1045.0. Samples: 323878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:43:54,282][02935] Avg episode reward: [(0, '5.671')] [2024-11-10 12:43:54,296][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000315_1290240.pth... [2024-11-10 12:43:54,457][08237] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000077_315392.pth [2024-11-10 12:43:59,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1306624. Throughput: 0: 1045.5. Samples: 326530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:43:59,281][02935] Avg episode reward: [(0, '5.559')] [2024-11-10 12:43:59,627][08250] Updated weights for policy 0, policy_version 320 (0.0047) [2024-11-10 12:44:04,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1327104. Throughput: 0: 993.6. Samples: 331486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:44:04,280][02935] Avg episode reward: [(0, '5.682')] [2024-11-10 12:44:08,827][08250] Updated weights for policy 0, policy_version 330 (0.0016) [2024-11-10 12:44:09,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1351680. Throughput: 0: 1031.5. Samples: 338976. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-10 12:44:09,282][02935] Avg episode reward: [(0, '5.897')] [2024-11-10 12:44:09,285][08237] Saving new best policy, reward=5.897! [2024-11-10 12:44:14,275][02935] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1372160. Throughput: 0: 1064.8. Samples: 342578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:44:14,282][02935] Avg episode reward: [(0, '6.339')] [2024-11-10 12:44:14,299][08237] Saving new best policy, reward=6.339! [2024-11-10 12:44:19,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1388544. Throughput: 0: 1007.6. Samples: 346938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:44:19,277][02935] Avg episode reward: [(0, '6.308')] [2024-11-10 12:44:19,883][08250] Updated weights for policy 0, policy_version 340 (0.0025) [2024-11-10 12:44:24,275][02935] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4068.3). Total num frames: 1413120. Throughput: 0: 1009.3. Samples: 353852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:44:24,282][02935] Avg episode reward: [(0, '6.862')] [2024-11-10 12:44:24,293][08237] Saving new best policy, reward=6.862! [2024-11-10 12:44:28,189][08250] Updated weights for policy 0, policy_version 350 (0.0023) [2024-11-10 12:44:29,275][02935] Fps is (10 sec: 4915.1, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 1437696. Throughput: 0: 1039.5. Samples: 357468. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:44:29,282][02935] Avg episode reward: [(0, '7.008')] [2024-11-10 12:44:29,284][08237] Saving new best policy, reward=7.008! [2024-11-10 12:44:34,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1449984. Throughput: 0: 1022.1. Samples: 362710. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:44:34,280][02935] Avg episode reward: [(0, '6.848')] [2024-11-10 12:44:39,275][02935] Fps is (10 sec: 3276.8, 60 sec: 4028.0, 300 sec: 4054.4). Total num frames: 1470464. Throughput: 0: 997.0. Samples: 368742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:44:39,280][02935] Avg episode reward: [(0, '7.048')] [2024-11-10 12:44:39,292][08237] Saving new best policy, reward=7.048! [2024-11-10 12:44:39,536][08250] Updated weights for policy 0, policy_version 360 (0.0019) [2024-11-10 12:44:44,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1495040. Throughput: 0: 1016.5. Samples: 372274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:44:44,282][02935] Avg episode reward: [(0, '6.930')] [2024-11-10 12:44:49,147][08250] Updated weights for policy 0, policy_version 370 (0.0030) [2024-11-10 12:44:49,278][02935] Fps is (10 sec: 4504.0, 60 sec: 4095.8, 300 sec: 4068.2). Total num frames: 1515520. Throughput: 0: 1046.7. Samples: 378592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:44:49,281][02935] Avg episode reward: [(0, '6.879')] [2024-11-10 12:44:54,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1531904. Throughput: 0: 992.7. Samples: 383646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:44:54,277][02935] Avg episode reward: [(0, '7.567')] [2024-11-10 12:44:54,287][08237] Saving new best policy, reward=7.567! [2024-11-10 12:44:59,087][08250] Updated weights for policy 0, policy_version 380 (0.0019) [2024-11-10 12:44:59,275][02935] Fps is (10 sec: 4097.3, 60 sec: 4164.2, 300 sec: 4082.1). Total num frames: 1556480. Throughput: 0: 995.4. Samples: 387372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:44:59,279][02935] Avg episode reward: [(0, '7.834')] [2024-11-10 12:44:59,282][08237] Saving new best policy, reward=7.834! 
[2024-11-10 12:45:04,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1576960. Throughput: 0: 1059.5. Samples: 394614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:45:04,277][02935] Avg episode reward: [(0, '7.700')] [2024-11-10 12:45:09,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1593344. Throughput: 0: 1006.7. Samples: 399154. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:45:09,281][02935] Avg episode reward: [(0, '7.552')] [2024-11-10 12:45:09,968][08250] Updated weights for policy 0, policy_version 390 (0.0021) [2024-11-10 12:45:14,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.2). Total num frames: 1617920. Throughput: 0: 1000.4. Samples: 402488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:45:14,277][02935] Avg episode reward: [(0, '8.245')] [2024-11-10 12:45:14,288][08237] Saving new best policy, reward=8.245! [2024-11-10 12:45:18,457][08250] Updated weights for policy 0, policy_version 400 (0.0022) [2024-11-10 12:45:19,275][02935] Fps is (10 sec: 4915.4, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 1642496. Throughput: 0: 1045.7. Samples: 409768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:45:19,282][02935] Avg episode reward: [(0, '8.917')] [2024-11-10 12:45:19,285][08237] Saving new best policy, reward=8.917! [2024-11-10 12:45:24,277][02935] Fps is (10 sec: 3685.5, 60 sec: 4027.6, 300 sec: 4082.1). Total num frames: 1654784. Throughput: 0: 1024.7. Samples: 414858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:45:24,280][02935] Avg episode reward: [(0, '9.472')] [2024-11-10 12:45:24,298][08237] Saving new best policy, reward=9.472! [2024-11-10 12:45:29,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4082.2). Total num frames: 1675264. Throughput: 0: 996.3. Samples: 417106. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:45:29,280][02935] Avg episode reward: [(0, '9.303')] [2024-11-10 12:45:29,995][08250] Updated weights for policy 0, policy_version 410 (0.0036) [2024-11-10 12:45:34,275][02935] Fps is (10 sec: 4506.7, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 1699840. Throughput: 0: 1018.7. Samples: 424430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-10 12:45:34,280][02935] Avg episode reward: [(0, '10.192')] [2024-11-10 12:45:34,290][08237] Saving new best policy, reward=10.192! [2024-11-10 12:45:38,835][08250] Updated weights for policy 0, policy_version 420 (0.0024) [2024-11-10 12:45:39,275][02935] Fps is (10 sec: 4505.4, 60 sec: 4164.2, 300 sec: 4109.9). Total num frames: 1720320. Throughput: 0: 1044.4. Samples: 430646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-10 12:45:39,281][02935] Avg episode reward: [(0, '9.668')] [2024-11-10 12:45:44,279][02935] Fps is (10 sec: 3275.6, 60 sec: 3959.2, 300 sec: 4068.2). Total num frames: 1732608. Throughput: 0: 1010.7. Samples: 432858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:45:44,284][02935] Avg episode reward: [(0, '9.850')] [2024-11-10 12:45:49,234][08250] Updated weights for policy 0, policy_version 430 (0.0030) [2024-11-10 12:45:49,275][02935] Fps is (10 sec: 4096.2, 60 sec: 4096.2, 300 sec: 4109.9). Total num frames: 1761280. Throughput: 0: 995.7. Samples: 439420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:45:49,277][02935] Avg episode reward: [(0, '8.550')] [2024-11-10 12:45:54,275][02935] Fps is (10 sec: 4917.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 1781760. Throughput: 0: 1054.1. Samples: 446586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:45:54,279][02935] Avg episode reward: [(0, '8.225')] [2024-11-10 12:45:54,295][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000435_1781760.pth... 
[2024-11-10 12:45:54,453][08237] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth [2024-11-10 12:45:59,280][02935] Fps is (10 sec: 3684.4, 60 sec: 4027.4, 300 sec: 4082.0). Total num frames: 1798144. Throughput: 0: 1027.7. Samples: 448742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:45:59,285][02935] Avg episode reward: [(0, '8.338')] [2024-11-10 12:46:00,200][08250] Updated weights for policy 0, policy_version 440 (0.0034) [2024-11-10 12:46:04,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 1818624. Throughput: 0: 991.7. Samples: 454394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:46:04,281][02935] Avg episode reward: [(0, '9.254')] [2024-11-10 12:46:08,849][08250] Updated weights for policy 0, policy_version 450 (0.0027) [2024-11-10 12:46:09,275][02935] Fps is (10 sec: 4508.1, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 1843200. Throughput: 0: 1041.7. Samples: 461732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:46:09,277][02935] Avg episode reward: [(0, '9.831')] [2024-11-10 12:46:14,275][02935] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 1863680. Throughput: 0: 1060.7. Samples: 464836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:46:14,278][02935] Avg episode reward: [(0, '10.479')] [2024-11-10 12:46:14,290][08237] Saving new best policy, reward=10.479! [2024-11-10 12:46:19,275][02935] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 1880064. Throughput: 0: 1000.6. Samples: 469458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:46:19,278][02935] Avg episode reward: [(0, '10.867')] [2024-11-10 12:46:19,282][08237] Saving new best policy, reward=10.867! [2024-11-10 12:46:19,938][08250] Updated weights for policy 0, policy_version 460 (0.0013) [2024-11-10 12:46:24,275][02935] Fps is (10 sec: 4096.2, 60 sec: 4164.4, 300 sec: 4109.9). 
Total num frames: 1904640. Throughput: 0: 1021.4. Samples: 476608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:46:24,280][02935] Avg episode reward: [(0, '10.750')] [2024-11-10 12:46:28,770][08250] Updated weights for policy 0, policy_version 470 (0.0027) [2024-11-10 12:46:29,275][02935] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 1925120. Throughput: 0: 1054.4. Samples: 480300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:46:29,280][02935] Avg episode reward: [(0, '10.941')] [2024-11-10 12:46:29,283][08237] Saving new best policy, reward=10.941! [2024-11-10 12:46:34,276][02935] Fps is (10 sec: 3276.5, 60 sec: 3959.4, 300 sec: 4068.2). Total num frames: 1937408. Throughput: 0: 1009.2. Samples: 484836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:46:34,278][02935] Avg episode reward: [(0, '11.242')] [2024-11-10 12:46:34,300][08237] Saving new best policy, reward=11.242! [2024-11-10 12:46:39,275][02935] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 1961984. Throughput: 0: 997.1. Samples: 491456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:46:39,282][02935] Avg episode reward: [(0, '11.344')] [2024-11-10 12:46:39,286][08237] Saving new best policy, reward=11.344! [2024-11-10 12:46:39,681][08250] Updated weights for policy 0, policy_version 480 (0.0014) [2024-11-10 12:46:44,275][02935] Fps is (10 sec: 4915.6, 60 sec: 4232.8, 300 sec: 4109.9). Total num frames: 1986560. Throughput: 0: 1030.4. Samples: 495104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:46:44,277][02935] Avg episode reward: [(0, '11.436')] [2024-11-10 12:46:44,289][08237] Saving new best policy, reward=11.436! [2024-11-10 12:46:49,278][02935] Fps is (10 sec: 4095.0, 60 sec: 4027.6, 300 sec: 4082.1). Total num frames: 2002944. Throughput: 0: 1026.7. Samples: 500598. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:46:49,282][02935] Avg episode reward: [(0, '11.489')] [2024-11-10 12:46:49,286][08237] Saving new best policy, reward=11.489! [2024-11-10 12:46:50,548][08250] Updated weights for policy 0, policy_version 490 (0.0025) [2024-11-10 12:46:54,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 2019328. Throughput: 0: 987.3. Samples: 506160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-10 12:46:54,288][02935] Avg episode reward: [(0, '11.793')] [2024-11-10 12:46:54,314][08237] Saving new best policy, reward=11.793! [2024-11-10 12:46:59,276][02935] Fps is (10 sec: 4096.4, 60 sec: 4096.3, 300 sec: 4109.9). Total num frames: 2043904. Throughput: 0: 996.3. Samples: 509670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:46:59,281][02935] Avg episode reward: [(0, '11.475')] [2024-11-10 12:46:59,313][08250] Updated weights for policy 0, policy_version 500 (0.0017) [2024-11-10 12:47:04,276][02935] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 4096.0). Total num frames: 2064384. Throughput: 0: 1043.0. Samples: 516394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:47:04,283][02935] Avg episode reward: [(0, '11.268')] [2024-11-10 12:47:09,275][02935] Fps is (10 sec: 3686.9, 60 sec: 3959.4, 300 sec: 4068.2). Total num frames: 2080768. Throughput: 0: 988.6. Samples: 521096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:47:09,283][02935] Avg episode reward: [(0, '11.104')] [2024-11-10 12:47:10,380][08250] Updated weights for policy 0, policy_version 510 (0.0034) [2024-11-10 12:47:14,275][02935] Fps is (10 sec: 4096.3, 60 sec: 4027.8, 300 sec: 4096.0). Total num frames: 2105344. Throughput: 0: 989.3. Samples: 524818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:47:14,277][02935] Avg episode reward: [(0, '11.885')] [2024-11-10 12:47:14,289][08237] Saving new best policy, reward=11.885! 
[2024-11-10 12:47:19,114][08250] Updated weights for policy 0, policy_version 520 (0.0022) [2024-11-10 12:47:19,275][02935] Fps is (10 sec: 4915.3, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 2129920. Throughput: 0: 1048.1. Samples: 532000. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:47:19,280][02935] Avg episode reward: [(0, '14.262')] [2024-11-10 12:47:19,284][08237] Saving new best policy, reward=14.262! [2024-11-10 12:47:24,275][02935] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 2142208. Throughput: 0: 999.6. Samples: 536438. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:47:24,277][02935] Avg episode reward: [(0, '14.417')] [2024-11-10 12:47:24,292][08237] Saving new best policy, reward=14.417! [2024-11-10 12:47:29,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4082.2). Total num frames: 2162688. Throughput: 0: 983.7. Samples: 539370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:47:29,283][02935] Avg episode reward: [(0, '15.973')] [2024-11-10 12:47:29,285][08237] Saving new best policy, reward=15.973! [2024-11-10 12:47:30,328][08250] Updated weights for policy 0, policy_version 530 (0.0017) [2024-11-10 12:47:34,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2187264. Throughput: 0: 1021.1. Samples: 546544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:47:34,280][02935] Avg episode reward: [(0, '15.800')] [2024-11-10 12:47:39,281][02935] Fps is (10 sec: 4093.4, 60 sec: 4027.3, 300 sec: 4068.1). Total num frames: 2203648. Throughput: 0: 1019.2. Samples: 552030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:47:39,288][02935] Avg episode reward: [(0, '14.036')] [2024-11-10 12:47:41,171][08250] Updated weights for policy 0, policy_version 540 (0.0024) [2024-11-10 12:47:44,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.3). Total num frames: 2224128. Throughput: 0: 991.5. 
Samples: 554284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:47:44,281][02935] Avg episode reward: [(0, '13.660')] [2024-11-10 12:47:49,275][02935] Fps is (10 sec: 4508.4, 60 sec: 4096.2, 300 sec: 4096.0). Total num frames: 2248704. Throughput: 0: 1001.7. Samples: 561470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:47:49,277][02935] Avg episode reward: [(0, '12.372')] [2024-11-10 12:47:49,910][08250] Updated weights for policy 0, policy_version 550 (0.0026) [2024-11-10 12:47:54,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 2269184. Throughput: 0: 1034.1. Samples: 567632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:47:54,277][02935] Avg episode reward: [(0, '12.919')] [2024-11-10 12:47:54,294][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000554_2269184.pth... [2024-11-10 12:47:54,422][08237] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000315_1290240.pth [2024-11-10 12:47:59,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3959.6, 300 sec: 4040.5). Total num frames: 2281472. Throughput: 0: 998.4. Samples: 569748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:47:59,280][02935] Avg episode reward: [(0, '13.687')] [2024-11-10 12:48:01,436][08250] Updated weights for policy 0, policy_version 560 (0.0027) [2024-11-10 12:48:04,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 2306048. Throughput: 0: 976.1. Samples: 575926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:48:04,280][02935] Avg episode reward: [(0, '14.516')] [2024-11-10 12:48:09,275][02935] Fps is (10 sec: 4915.3, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 2330624. Throughput: 0: 1037.8. Samples: 583138. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:48:09,281][02935] Avg episode reward: [(0, '16.045')] [2024-11-10 12:48:09,284][08237] Saving new best policy, reward=16.045! [2024-11-10 12:48:10,295][08250] Updated weights for policy 0, policy_version 570 (0.0025) [2024-11-10 12:48:14,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2342912. Throughput: 0: 1022.2. Samples: 585368. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:48:14,282][02935] Avg episode reward: [(0, '17.016')] [2024-11-10 12:48:14,294][08237] Saving new best policy, reward=17.016! [2024-11-10 12:48:19,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4054.3). Total num frames: 2363392. Throughput: 0: 974.9. Samples: 590414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:48:19,280][02935] Avg episode reward: [(0, '17.322')] [2024-11-10 12:48:19,282][08237] Saving new best policy, reward=17.322! [2024-11-10 12:48:21,553][08250] Updated weights for policy 0, policy_version 580 (0.0019) [2024-11-10 12:48:24,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2387968. Throughput: 0: 1010.1. Samples: 597480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-10 12:48:24,278][02935] Avg episode reward: [(0, '18.338')] [2024-11-10 12:48:24,287][08237] Saving new best policy, reward=18.338! [2024-11-10 12:48:29,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2404352. Throughput: 0: 1031.9. Samples: 600718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:48:29,278][02935] Avg episode reward: [(0, '17.484')] [2024-11-10 12:48:32,399][08250] Updated weights for policy 0, policy_version 590 (0.0019) [2024-11-10 12:48:34,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4040.5). Total num frames: 2420736. Throughput: 0: 971.2. Samples: 605172. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:48:34,277][02935] Avg episode reward: [(0, '16.598')] [2024-11-10 12:48:39,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4028.2, 300 sec: 4068.2). Total num frames: 2445312. Throughput: 0: 994.7. Samples: 612392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:48:39,283][02935] Avg episode reward: [(0, '20.085')] [2024-11-10 12:48:39,288][08237] Saving new best policy, reward=20.085! [2024-11-10 12:48:41,224][08250] Updated weights for policy 0, policy_version 600 (0.0023) [2024-11-10 12:48:44,275][02935] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2469888. Throughput: 0: 1027.2. Samples: 615970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:48:44,284][02935] Avg episode reward: [(0, '19.325')] [2024-11-10 12:48:49,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4040.5). Total num frames: 2482176. Throughput: 0: 1000.4. Samples: 620942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:48:49,277][02935] Avg episode reward: [(0, '20.260')] [2024-11-10 12:48:49,286][08237] Saving new best policy, reward=20.260! [2024-11-10 12:48:52,607][08250] Updated weights for policy 0, policy_version 610 (0.0018) [2024-11-10 12:48:54,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4054.3). Total num frames: 2502656. Throughput: 0: 977.7. Samples: 627136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:48:54,277][02935] Avg episode reward: [(0, '21.216')] [2024-11-10 12:48:54,330][08237] Saving new best policy, reward=21.216! [2024-11-10 12:48:59,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2527232. Throughput: 0: 1001.8. Samples: 630450. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:48:59,279][02935] Avg episode reward: [(0, '18.495')] [2024-11-10 12:49:02,111][08250] Updated weights for policy 0, policy_version 620 (0.0027) [2024-11-10 12:49:04,277][02935] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 4040.4). Total num frames: 2543616. Throughput: 0: 1016.6. Samples: 636164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:49:04,280][02935] Avg episode reward: [(0, '17.918')] [2024-11-10 12:49:09,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4040.5). Total num frames: 2564096. Throughput: 0: 979.2. Samples: 641546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:49:09,277][02935] Avg episode reward: [(0, '16.234')] [2024-11-10 12:49:12,491][08250] Updated weights for policy 0, policy_version 630 (0.0029) [2024-11-10 12:49:14,275][02935] Fps is (10 sec: 4506.3, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2588672. Throughput: 0: 988.3. Samples: 645190. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:49:14,279][02935] Avg episode reward: [(0, '15.736')] [2024-11-10 12:49:19,275][02935] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2609152. Throughput: 0: 1043.2. Samples: 652116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:49:19,277][02935] Avg episode reward: [(0, '17.091')] [2024-11-10 12:49:23,242][08250] Updated weights for policy 0, policy_version 640 (0.0049) [2024-11-10 12:49:24,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2621440. Throughput: 0: 981.5. Samples: 656558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:49:24,282][02935] Avg episode reward: [(0, '18.692')] [2024-11-10 12:49:29,275][02935] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2646016. Throughput: 0: 979.3. Samples: 660038. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:49:29,277][02935] Avg episode reward: [(0, '19.217')] [2024-11-10 12:49:32,262][08250] Updated weights for policy 0, policy_version 650 (0.0026) [2024-11-10 12:49:34,275][02935] Fps is (10 sec: 4915.1, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 2670592. Throughput: 0: 1030.6. Samples: 667320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-10 12:49:34,278][02935] Avg episode reward: [(0, '18.658')] [2024-11-10 12:49:39,279][02935] Fps is (10 sec: 4094.2, 60 sec: 4027.4, 300 sec: 4040.4). Total num frames: 2686976. Throughput: 0: 1002.3. Samples: 672244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-11-10 12:49:39,283][02935] Avg episode reward: [(0, '19.374')] [2024-11-10 12:49:43,293][08250] Updated weights for policy 0, policy_version 660 (0.0017) [2024-11-10 12:49:44,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2707456. Throughput: 0: 987.0. Samples: 674866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:49:44,282][02935] Avg episode reward: [(0, '18.483')] [2024-11-10 12:49:49,276][02935] Fps is (10 sec: 4097.1, 60 sec: 4095.9, 300 sec: 4054.3). Total num frames: 2727936. Throughput: 0: 1021.8. Samples: 682144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:49:49,284][02935] Avg episode reward: [(0, '18.053')] [2024-11-10 12:49:52,128][08250] Updated weights for policy 0, policy_version 670 (0.0013) [2024-11-10 12:49:54,275][02935] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2748416. Throughput: 0: 1033.1. Samples: 688034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:49:54,278][02935] Avg episode reward: [(0, '17.915')] [2024-11-10 12:49:54,290][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000671_2748416.pth... 
[2024-11-10 12:49:54,452][08237] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000435_1781760.pth [2024-11-10 12:49:59,275][02935] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2764800. Throughput: 0: 996.1. Samples: 690014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:49:59,282][02935] Avg episode reward: [(0, '18.674')] [2024-11-10 12:50:03,263][08250] Updated weights for policy 0, policy_version 680 (0.0022) [2024-11-10 12:50:04,275][02935] Fps is (10 sec: 4096.1, 60 sec: 4096.1, 300 sec: 4054.4). Total num frames: 2789376. Throughput: 0: 991.4. Samples: 696730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:50:04,282][02935] Avg episode reward: [(0, '19.042')] [2024-11-10 12:50:09,280][02935] Fps is (10 sec: 4503.1, 60 sec: 4095.6, 300 sec: 4040.4). Total num frames: 2809856. Throughput: 0: 1046.6. Samples: 703662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:50:09,287][02935] Avg episode reward: [(0, '17.435')] [2024-11-10 12:50:13,828][08250] Updated weights for policy 0, policy_version 690 (0.0048) [2024-11-10 12:50:14,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2826240. Throughput: 0: 1018.3. Samples: 705860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:50:14,281][02935] Avg episode reward: [(0, '17.995')] [2024-11-10 12:50:19,275][02935] Fps is (10 sec: 4098.3, 60 sec: 4027.8, 300 sec: 4054.4). Total num frames: 2850816. Throughput: 0: 990.1. Samples: 711876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:50:19,282][02935] Avg episode reward: [(0, '17.357')] [2024-11-10 12:50:22,619][08250] Updated weights for policy 0, policy_version 700 (0.0032) [2024-11-10 12:50:24,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 2871296. Throughput: 0: 1044.0. Samples: 719220. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:50:24,277][02935] Avg episode reward: [(0, '17.871')] [2024-11-10 12:50:29,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2891776. Throughput: 0: 1046.2. Samples: 721946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:50:29,277][02935] Avg episode reward: [(0, '18.085')] [2024-11-10 12:50:33,740][08250] Updated weights for policy 0, policy_version 710 (0.0024) [2024-11-10 12:50:34,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2908160. Throughput: 0: 993.8. Samples: 726862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:50:34,277][02935] Avg episode reward: [(0, '19.461')] [2024-11-10 12:50:39,275][02935] Fps is (10 sec: 4095.9, 60 sec: 4096.3, 300 sec: 4068.3). Total num frames: 2932736. Throughput: 0: 1026.2. Samples: 734212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:50:39,286][02935] Avg episode reward: [(0, '20.244')] [2024-11-10 12:50:42,012][08250] Updated weights for policy 0, policy_version 720 (0.0018) [2024-11-10 12:50:44,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2953216. Throughput: 0: 1063.4. Samples: 737866. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:50:44,277][02935] Avg episode reward: [(0, '20.917')] [2024-11-10 12:50:49,275][02935] Fps is (10 sec: 3686.5, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 2969600. Throughput: 0: 1011.4. Samples: 742242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-10 12:50:49,278][02935] Avg episode reward: [(0, '20.551')] [2024-11-10 12:50:53,218][08250] Updated weights for policy 0, policy_version 730 (0.0029) [2024-11-10 12:50:54,275][02935] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4054.4). Total num frames: 2994176. Throughput: 0: 1009.4. Samples: 749080. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:50:54,281][02935] Avg episode reward: [(0, '20.414')] [2024-11-10 12:50:59,276][02935] Fps is (10 sec: 4505.2, 60 sec: 4164.2, 300 sec: 4054.3). Total num frames: 3014656. Throughput: 0: 1039.4. Samples: 752632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:50:59,282][02935] Avg episode reward: [(0, '19.207')] [2024-11-10 12:51:03,743][08250] Updated weights for policy 0, policy_version 740 (0.0036) [2024-11-10 12:51:04,275][02935] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3031040. Throughput: 0: 1020.9. Samples: 757816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:51:04,277][02935] Avg episode reward: [(0, '19.731')] [2024-11-10 12:51:09,275][02935] Fps is (10 sec: 3686.7, 60 sec: 4028.1, 300 sec: 4026.6). Total num frames: 3051520. Throughput: 0: 994.0. Samples: 763952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:51:09,281][02935] Avg episode reward: [(0, '20.273')] [2024-11-10 12:51:12,853][08250] Updated weights for policy 0, policy_version 750 (0.0015) [2024-11-10 12:51:14,275][02935] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3076096. Throughput: 0: 1014.4. Samples: 767594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:51:14,282][02935] Avg episode reward: [(0, '21.630')] [2024-11-10 12:51:14,291][08237] Saving new best policy, reward=21.630! [2024-11-10 12:51:19,278][02935] Fps is (10 sec: 4094.6, 60 sec: 4027.5, 300 sec: 4026.5). Total num frames: 3092480. Throughput: 0: 1041.8. Samples: 773748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:51:19,281][02935] Avg episode reward: [(0, '20.616')] [2024-11-10 12:51:24,036][08250] Updated weights for policy 0, policy_version 760 (0.0037) [2024-11-10 12:51:24,275][02935] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3112960. Throughput: 0: 993.5. Samples: 778920. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:51:24,277][02935] Avg episode reward: [(0, '21.169')] [2024-11-10 12:51:29,275][02935] Fps is (10 sec: 4507.2, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3137536. Throughput: 0: 992.1. Samples: 782510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:51:29,277][02935] Avg episode reward: [(0, '20.821')] [2024-11-10 12:51:32,607][08250] Updated weights for policy 0, policy_version 770 (0.0022) [2024-11-10 12:51:34,275][02935] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3158016. Throughput: 0: 1050.5. Samples: 789514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:51:34,278][02935] Avg episode reward: [(0, '21.416')] [2024-11-10 12:51:39,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3174400. Throughput: 0: 999.0. Samples: 794034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:51:39,278][02935] Avg episode reward: [(0, '20.665')] [2024-11-10 12:51:43,749][08250] Updated weights for policy 0, policy_version 780 (0.0024) [2024-11-10 12:51:44,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3194880. Throughput: 0: 995.0. Samples: 797408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-10 12:51:44,282][02935] Avg episode reward: [(0, '20.840')] [2024-11-10 12:51:49,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3219456. Throughput: 0: 1043.4. Samples: 804768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:51:49,280][02935] Avg episode reward: [(0, '21.440')] [2024-11-10 12:51:53,397][08250] Updated weights for policy 0, policy_version 790 (0.0031) [2024-11-10 12:51:54,275][02935] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3235840. Throughput: 0: 1021.3. Samples: 809912. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:51:54,281][02935] Avg episode reward: [(0, '21.206')] [2024-11-10 12:51:54,292][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_3235840.pth... [2024-11-10 12:51:54,456][08237] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000554_2269184.pth [2024-11-10 12:51:59,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 3256320. Throughput: 0: 993.3. Samples: 812294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-11-10 12:51:59,277][02935] Avg episode reward: [(0, '20.778')] [2024-11-10 12:52:03,373][08250] Updated weights for policy 0, policy_version 800 (0.0023) [2024-11-10 12:52:04,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3280896. Throughput: 0: 1019.1. Samples: 819602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:52:04,280][02935] Avg episode reward: [(0, '22.236')] [2024-11-10 12:52:04,288][08237] Saving new best policy, reward=22.236! [2024-11-10 12:52:09,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3297280. Throughput: 0: 1041.7. Samples: 825796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:52:09,277][02935] Avg episode reward: [(0, '21.651')] [2024-11-10 12:52:14,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3313664. Throughput: 0: 1010.4. Samples: 827978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:52:14,283][02935] Avg episode reward: [(0, '21.059')] [2024-11-10 12:52:14,375][08250] Updated weights for policy 0, policy_version 810 (0.0019) [2024-11-10 12:52:19,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4054.3). Total num frames: 3338240. Throughput: 0: 1004.8. Samples: 834730. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:52:19,280][02935] Avg episode reward: [(0, '21.481')] [2024-11-10 12:52:22,419][08250] Updated weights for policy 0, policy_version 820 (0.0019) [2024-11-10 12:52:24,275][02935] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3362816. Throughput: 0: 1064.4. Samples: 841934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:52:24,279][02935] Avg episode reward: [(0, '21.121')] [2024-11-10 12:52:29,277][02935] Fps is (10 sec: 4095.2, 60 sec: 4027.6, 300 sec: 4040.4). Total num frames: 3379200. Throughput: 0: 1037.3. Samples: 844086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:52:29,280][02935] Avg episode reward: [(0, '21.389')] [2024-11-10 12:52:33,623][08250] Updated weights for policy 0, policy_version 830 (0.0018) [2024-11-10 12:52:34,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 3399680. Throughput: 0: 999.4. Samples: 849742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:52:34,283][02935] Avg episode reward: [(0, '21.869')] [2024-11-10 12:52:39,275][02935] Fps is (10 sec: 4506.5, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3424256. Throughput: 0: 1046.8. Samples: 857020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:52:39,279][02935] Avg episode reward: [(0, '21.365')] [2024-11-10 12:52:42,748][08250] Updated weights for policy 0, policy_version 840 (0.0017) [2024-11-10 12:52:44,275][02935] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3440640. Throughput: 0: 1060.8. Samples: 860030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:52:44,278][02935] Avg episode reward: [(0, '21.585')] [2024-11-10 12:52:49,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3461120. Throughput: 0: 1003.6. Samples: 864766. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:52:49,282][02935] Avg episode reward: [(0, '21.983')] [2024-11-10 12:52:53,197][08250] Updated weights for policy 0, policy_version 850 (0.0022) [2024-11-10 12:52:54,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3485696. Throughput: 0: 1028.9. Samples: 872098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:52:54,280][02935] Avg episode reward: [(0, '19.924')] [2024-11-10 12:52:59,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3506176. Throughput: 0: 1059.2. Samples: 875642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:52:59,277][02935] Avg episode reward: [(0, '19.485')] [2024-11-10 12:53:04,193][08250] Updated weights for policy 0, policy_version 860 (0.0015) [2024-11-10 12:53:04,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3522560. Throughput: 0: 1013.2. Samples: 880322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:53:04,277][02935] Avg episode reward: [(0, '19.646')] [2024-11-10 12:53:09,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3543040. Throughput: 0: 1001.3. Samples: 886992. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-11-10 12:53:09,278][02935] Avg episode reward: [(0, '21.344')] [2024-11-10 12:53:12,602][08250] Updated weights for policy 0, policy_version 870 (0.0030) [2024-11-10 12:53:14,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 3567616. Throughput: 0: 1036.0. Samples: 890704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:53:14,281][02935] Avg episode reward: [(0, '22.289')] [2024-11-10 12:53:14,363][08237] Saving new best policy, reward=22.289! [2024-11-10 12:53:19,275][02935] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3584000. Throughput: 0: 1034.4. Samples: 896292. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:53:19,278][02935] Avg episode reward: [(0, '22.653')] [2024-11-10 12:53:19,284][08237] Saving new best policy, reward=22.653! [2024-11-10 12:53:23,990][08250] Updated weights for policy 0, policy_version 880 (0.0027) [2024-11-10 12:53:24,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3604480. Throughput: 0: 998.6. Samples: 901958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:53:24,282][02935] Avg episode reward: [(0, '23.653')] [2024-11-10 12:53:24,292][08237] Saving new best policy, reward=23.653! [2024-11-10 12:53:29,275][02935] Fps is (10 sec: 4505.7, 60 sec: 4164.4, 300 sec: 4096.0). Total num frames: 3629056. Throughput: 0: 1011.7. Samples: 905556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:53:29,282][02935] Avg episode reward: [(0, '24.445')] [2024-11-10 12:53:29,285][08237] Saving new best policy, reward=24.445! [2024-11-10 12:53:32,882][08250] Updated weights for policy 0, policy_version 890 (0.0021) [2024-11-10 12:53:34,279][02935] Fps is (10 sec: 4503.5, 60 sec: 4164.0, 300 sec: 4082.1). Total num frames: 3649536. Throughput: 0: 1050.1. Samples: 912024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:53:34,282][02935] Avg episode reward: [(0, '21.802')] [2024-11-10 12:53:39,275][02935] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3661824. Throughput: 0: 994.4. Samples: 916844. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:53:39,281][02935] Avg episode reward: [(0, '21.826')] [2024-11-10 12:53:43,348][08250] Updated weights for policy 0, policy_version 900 (0.0036) [2024-11-10 12:53:44,275][02935] Fps is (10 sec: 4097.8, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3690496. Throughput: 0: 997.7. Samples: 920540. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:53:44,277][02935] Avg episode reward: [(0, '21.650')] [2024-11-10 12:53:49,275][02935] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3710976. Throughput: 0: 1056.7. Samples: 927874. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:53:49,282][02935] Avg episode reward: [(0, '21.025')] [2024-11-10 12:53:53,934][08250] Updated weights for policy 0, policy_version 910 (0.0025) [2024-11-10 12:53:54,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3727360. Throughput: 0: 1010.4. Samples: 932460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:53:54,279][02935] Avg episode reward: [(0, '21.567')] [2024-11-10 12:53:54,295][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000910_3727360.pth... [2024-11-10 12:53:54,474][08237] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000671_2748416.pth [2024-11-10 12:53:59,275][02935] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3747840. Throughput: 0: 995.2. Samples: 935490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:53:59,281][02935] Avg episode reward: [(0, '21.819')] [2024-11-10 12:54:03,367][08250] Updated weights for policy 0, policy_version 920 (0.0024) [2024-11-10 12:54:04,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3772416. Throughput: 0: 1026.3. Samples: 942476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:54:04,285][02935] Avg episode reward: [(0, '20.734')] [2024-11-10 12:54:09,278][02935] Fps is (10 sec: 4095.6, 60 sec: 4095.9, 300 sec: 4068.2). Total num frames: 3788800. Throughput: 0: 1021.6. Samples: 947932. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:54:09,281][02935] Avg episode reward: [(0, '20.057')] [2024-11-10 12:54:14,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4054.4). Total num frames: 3805184. Throughput: 0: 991.4. Samples: 950168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:54:14,280][02935] Avg episode reward: [(0, '19.910')] [2024-11-10 12:54:14,311][08250] Updated weights for policy 0, policy_version 930 (0.0024) [2024-11-10 12:54:19,275][02935] Fps is (10 sec: 4096.5, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3829760. Throughput: 0: 1010.8. Samples: 957506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:54:19,280][02935] Avg episode reward: [(0, '19.391')] [2024-11-10 12:54:23,022][08250] Updated weights for policy 0, policy_version 940 (0.0018) [2024-11-10 12:54:24,276][02935] Fps is (10 sec: 4505.3, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3850240. Throughput: 0: 1045.5. Samples: 963894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:54:24,283][02935] Avg episode reward: [(0, '19.656')] [2024-11-10 12:54:29,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3866624. Throughput: 0: 1011.3. Samples: 966050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-11-10 12:54:29,278][02935] Avg episode reward: [(0, '19.968')] [2024-11-10 12:54:33,983][08250] Updated weights for policy 0, policy_version 950 (0.0029) [2024-11-10 12:54:34,275][02935] Fps is (10 sec: 4096.2, 60 sec: 4028.0, 300 sec: 4082.2). Total num frames: 3891200. Throughput: 0: 986.4. Samples: 972260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:54:34,285][02935] Avg episode reward: [(0, '21.996')] [2024-11-10 12:54:39,275][02935] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4096.0). Total num frames: 3915776. Throughput: 0: 1044.7. Samples: 979470. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:54:39,280][02935] Avg episode reward: [(0, '22.556')] [2024-11-10 12:54:44,275][02935] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.3). Total num frames: 3928064. Throughput: 0: 1025.1. Samples: 981618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-11-10 12:54:44,280][02935] Avg episode reward: [(0, '22.648')] [2024-11-10 12:54:44,792][08250] Updated weights for policy 0, policy_version 960 (0.0023) [2024-11-10 12:54:49,275][02935] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3948544. Throughput: 0: 990.5. Samples: 987050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:54:49,277][02935] Avg episode reward: [(0, '23.349')] [2024-11-10 12:54:53,646][08250] Updated weights for policy 0, policy_version 970 (0.0022) [2024-11-10 12:54:54,275][02935] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3973120. Throughput: 0: 1033.0. Samples: 994414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-11-10 12:54:54,282][02935] Avg episode reward: [(0, '23.353')] [2024-11-10 12:54:59,277][02935] Fps is (10 sec: 4504.5, 60 sec: 4095.8, 300 sec: 4082.1). Total num frames: 3993600. Throughput: 0: 1050.5. Samples: 997444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-11-10 12:54:59,280][02935] Avg episode reward: [(0, '21.871')] [2024-11-10 12:55:03,201][08237] Stopping Batcher_0... [2024-11-10 12:55:03,203][08237] Loop batcher_evt_loop terminating... [2024-11-10 12:55:03,203][02935] Component Batcher_0 stopped! [2024-11-10 12:55:03,203][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-10 12:55:03,283][08250] Weights refcount: 2 0 [2024-11-10 12:55:03,289][08250] Stopping InferenceWorker_p0-w0... [2024-11-10 12:55:03,290][02935] Component InferenceWorker_p0-w0 stopped! [2024-11-10 12:55:03,296][08250] Loop inference_proc0-0_evt_loop terminating... 
[2024-11-10 12:55:03,341][08237] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_3235840.pth [2024-11-10 12:55:03,360][08237] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-10 12:55:03,554][08237] Stopping LearnerWorker_p0... [2024-11-10 12:55:03,554][08237] Loop learner_proc0_evt_loop terminating... [2024-11-10 12:55:03,554][02935] Component LearnerWorker_p0 stopped! [2024-11-10 12:55:03,578][08254] Stopping RolloutWorker_w3... [2024-11-10 12:55:03,577][02935] Component RolloutWorker_w2 stopped! [2024-11-10 12:55:03,583][02935] Component RolloutWorker_w3 stopped! [2024-11-10 12:55:03,579][08254] Loop rollout_proc3_evt_loop terminating... [2024-11-10 12:55:03,587][08253] Stopping RolloutWorker_w2... [2024-11-10 12:55:03,594][02935] Component RolloutWorker_w6 stopped! [2024-11-10 12:55:03,600][08256] Stopping RolloutWorker_w5... [2024-11-10 12:55:03,600][02935] Component RolloutWorker_w5 stopped! [2024-11-10 12:55:03,605][08252] Stopping RolloutWorker_w1... [2024-11-10 12:55:03,605][08258] Stopping RolloutWorker_w6... [2024-11-10 12:55:03,606][08252] Loop rollout_proc1_evt_loop terminating... [2024-11-10 12:55:03,606][08258] Loop rollout_proc6_evt_loop terminating... [2024-11-10 12:55:03,608][08257] Stopping RolloutWorker_w7... [2024-11-10 12:55:03,608][08257] Loop rollout_proc7_evt_loop terminating... [2024-11-10 12:55:03,600][08256] Loop rollout_proc5_evt_loop terminating... [2024-11-10 12:55:03,605][02935] Component RolloutWorker_w1 stopped! [2024-11-10 12:55:03,595][08253] Loop rollout_proc2_evt_loop terminating... [2024-11-10 12:55:03,613][02935] Component RolloutWorker_w7 stopped! [2024-11-10 12:55:03,629][02935] Component RolloutWorker_w0 stopped! [2024-11-10 12:55:03,634][08251] Stopping RolloutWorker_w0... [2024-11-10 12:55:03,637][08251] Loop rollout_proc0_evt_loop terminating... [2024-11-10 12:55:03,646][02935] Component RolloutWorker_w4 stopped! 
[2024-11-10 12:55:03,652][02935] Waiting for process learner_proc0 to stop... [2024-11-10 12:55:03,656][08255] Stopping RolloutWorker_w4... [2024-11-10 12:55:03,657][08255] Loop rollout_proc4_evt_loop terminating... [2024-11-10 12:55:05,230][02935] Waiting for process inference_proc0-0 to join... [2024-11-10 12:55:05,237][02935] Waiting for process rollout_proc0 to join... [2024-11-10 12:55:06,962][02935] Waiting for process rollout_proc1 to join... [2024-11-10 12:55:07,107][02935] Waiting for process rollout_proc2 to join... [2024-11-10 12:55:07,110][02935] Waiting for process rollout_proc3 to join... [2024-11-10 12:55:07,114][02935] Waiting for process rollout_proc4 to join... [2024-11-10 12:55:07,117][02935] Waiting for process rollout_proc5 to join... [2024-11-10 12:55:07,121][02935] Waiting for process rollout_proc6 to join... [2024-11-10 12:55:07,125][02935] Waiting for process rollout_proc7 to join... [2024-11-10 12:55:07,128][02935] Batcher 0 profile tree view: batching: 26.0343, releasing_batches: 0.0291 [2024-11-10 12:55:07,129][02935] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 406.4562 update_model: 8.0174 weight_update: 0.0029 one_step: 0.0035 handle_policy_step: 543.4140 deserialize: 13.9237, stack: 3.0818, obs_to_device_normalize: 116.6087, forward: 270.7420, send_messages: 27.3907 prepare_outputs: 83.6437 to_cpu: 51.0823 [2024-11-10 12:55:07,134][02935] Learner 0 profile tree view: misc: 0.0047, prepare_batch: 13.4588 train: 72.8454 epoch_init: 0.0108, minibatch_init: 0.0065, losses_postprocess: 0.6596, kl_divergence: 0.6678, after_optimizer: 33.8859 calculate_losses: 25.2943 losses_init: 0.0060, forward_head: 1.2465, bptt_initial: 16.9610, tail: 1.0294, advantages_returns: 0.2378, losses: 3.6671 bptt: 1.8206 bptt_forward_core: 1.7015 update: 11.7427 clip: 0.9018 [2024-11-10 12:55:07,136][02935] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.4011, enqueue_policy_requests: 92.1029, env_step: 784.1919, 
overhead: 11.8830, complete_rollouts: 6.8302 save_policy_outputs: 19.0356 split_output_tensors: 7.8320 [2024-11-10 12:55:07,137][02935] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3045, enqueue_policy_requests: 93.2773, env_step: 787.4292, overhead: 11.9172, complete_rollouts: 6.5937 save_policy_outputs: 19.2567 split_output_tensors: 7.8200 [2024-11-10 12:55:07,138][02935] Loop Runner_EvtLoop terminating... [2024-11-10 12:55:07,140][02935] Runner profile tree view: main_loop: 1027.0013 [2024-11-10 12:55:07,141][02935] Collected {0: 4005888}, FPS: 3900.6 [2024-11-10 12:55:07,544][02935] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-10 12:55:07,546][02935] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-10 12:55:07,548][02935] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-10 12:55:07,551][02935] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-10 12:55:07,553][02935] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-10 12:55:07,554][02935] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-10 12:55:07,556][02935] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-11-10 12:55:07,557][02935] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-10 12:55:07,558][02935] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-11-10 12:55:07,559][02935] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-11-10 12:55:07,560][02935] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-10 12:55:07,561][02935] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
[2024-11-10 12:55:07,562][02935] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-10 12:55:07,563][02935] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-10 12:55:07,565][02935] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-10 12:55:07,602][02935] Doom resolution: 160x120, resize resolution: (128, 72) [2024-11-10 12:55:07,605][02935] RunningMeanStd input shape: (3, 72, 128) [2024-11-10 12:55:07,607][02935] RunningMeanStd input shape: (1,) [2024-11-10 12:55:07,622][02935] ConvEncoder: input_channels=3 [2024-11-10 12:55:07,732][02935] Conv encoder output size: 512 [2024-11-10 12:55:07,734][02935] Policy head output size: 512 [2024-11-10 12:55:07,914][02935] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-10 12:55:08,708][02935] Num frames 100... [2024-11-10 12:55:08,829][02935] Num frames 200... [2024-11-10 12:55:08,949][02935] Num frames 300... [2024-11-10 12:55:09,083][02935] Num frames 400... [2024-11-10 12:55:09,199][02935] Num frames 500... [2024-11-10 12:55:09,330][02935] Num frames 600... [2024-11-10 12:55:09,432][02935] Avg episode rewards: #0: 13.400, true rewards: #0: 6.400 [2024-11-10 12:55:09,433][02935] Avg episode reward: 13.400, avg true_objective: 6.400 [2024-11-10 12:55:09,507][02935] Num frames 700... [2024-11-10 12:55:09,631][02935] Num frames 800... [2024-11-10 12:55:09,749][02935] Num frames 900... [2024-11-10 12:55:09,877][02935] Num frames 1000... [2024-11-10 12:55:09,998][02935] Num frames 1100... [2024-11-10 12:55:10,149][02935] Avg episode rewards: #0: 11.420, true rewards: #0: 5.920 [2024-11-10 12:55:10,152][02935] Avg episode reward: 11.420, avg true_objective: 5.920 [2024-11-10 12:55:10,173][02935] Num frames 1200... [2024-11-10 12:55:10,296][02935] Num frames 1300... [2024-11-10 12:55:10,420][02935] Num frames 1400... 
[2024-11-10 12:55:10,543][02935] Num frames 1500... [2024-11-10 12:55:10,660][02935] Num frames 1600... [2024-11-10 12:55:10,781][02935] Num frames 1700... [2024-11-10 12:55:10,906][02935] Num frames 1800... [2024-11-10 12:55:11,025][02935] Num frames 1900... [2024-11-10 12:55:11,144][02935] Num frames 2000... [2024-11-10 12:55:11,306][02935] Avg episode rewards: #0: 13.297, true rewards: #0: 6.963 [2024-11-10 12:55:11,308][02935] Avg episode reward: 13.297, avg true_objective: 6.963 [2024-11-10 12:55:11,323][02935] Num frames 2100... [2024-11-10 12:55:11,447][02935] Num frames 2200... [2024-11-10 12:55:11,570][02935] Num frames 2300... [2024-11-10 12:55:11,689][02935] Num frames 2400... [2024-11-10 12:55:11,811][02935] Num frames 2500... [2024-11-10 12:55:11,936][02935] Num frames 2600... [2024-11-10 12:55:12,111][02935] Avg episode rewards: #0: 12.243, true rewards: #0: 6.742 [2024-11-10 12:55:12,114][02935] Avg episode reward: 12.243, avg true_objective: 6.742 [2024-11-10 12:55:12,119][02935] Num frames 2700... [2024-11-10 12:55:12,238][02935] Num frames 2800... [2024-11-10 12:55:12,362][02935] Num frames 2900... [2024-11-10 12:55:12,488][02935] Num frames 3000... [2024-11-10 12:55:12,608][02935] Num frames 3100... [2024-11-10 12:55:12,730][02935] Num frames 3200... [2024-11-10 12:55:12,853][02935] Num frames 3300... [2024-11-10 12:55:12,985][02935] Num frames 3400... [2024-11-10 12:55:13,156][02935] Num frames 3500... [2024-11-10 12:55:13,327][02935] Num frames 3600... [2024-11-10 12:55:13,500][02935] Num frames 3700... [2024-11-10 12:55:13,666][02935] Num frames 3800... [2024-11-10 12:55:13,831][02935] Num frames 3900... [2024-11-10 12:55:14,009][02935] Num frames 4000... [2024-11-10 12:55:14,175][02935] Num frames 4100... [2024-11-10 12:55:14,344][02935] Num frames 4200... [2024-11-10 12:55:14,521][02935] Num frames 4300... [2024-11-10 12:55:14,686][02935] Num frames 4400... [2024-11-10 12:55:14,859][02935] Num frames 4500... 
[2024-11-10 12:55:15,025][02935] Avg episode rewards: #0: 19.728, true rewards: #0: 9.128 [2024-11-10 12:55:15,028][02935] Avg episode reward: 19.728, avg true_objective: 9.128 [2024-11-10 12:55:15,093][02935] Num frames 4600... [2024-11-10 12:55:15,259][02935] Num frames 4700... [2024-11-10 12:55:15,401][02935] Num frames 4800... [2024-11-10 12:55:15,530][02935] Num frames 4900... [2024-11-10 12:55:15,649][02935] Num frames 5000... [2024-11-10 12:55:15,769][02935] Num frames 5100... [2024-11-10 12:55:15,896][02935] Num frames 5200... [2024-11-10 12:55:16,013][02935] Num frames 5300... [2024-11-10 12:55:16,135][02935] Num frames 5400... [2024-11-10 12:55:16,261][02935] Num frames 5500... [2024-11-10 12:55:16,379][02935] Num frames 5600... [2024-11-10 12:55:16,510][02935] Num frames 5700... [2024-11-10 12:55:16,635][02935] Num frames 5800... [2024-11-10 12:55:16,756][02935] Num frames 5900... [2024-11-10 12:55:16,883][02935] Num frames 6000... [2024-11-10 12:55:17,005][02935] Num frames 6100... [2024-11-10 12:55:17,127][02935] Num frames 6200... [2024-11-10 12:55:17,249][02935] Num frames 6300... [2024-11-10 12:55:17,371][02935] Num frames 6400... [2024-11-10 12:55:17,491][02935] Num frames 6500... [2024-11-10 12:55:17,567][02935] Avg episode rewards: #0: 24.693, true rewards: #0: 10.860 [2024-11-10 12:55:17,569][02935] Avg episode reward: 24.693, avg true_objective: 10.860 [2024-11-10 12:55:17,668][02935] Num frames 6600... [2024-11-10 12:55:17,784][02935] Num frames 6700... [2024-11-10 12:55:17,911][02935] Num frames 6800... [2024-11-10 12:55:18,030][02935] Num frames 6900... [2024-11-10 12:55:18,150][02935] Num frames 7000... [2024-11-10 12:55:18,265][02935] Num frames 7100... [2024-11-10 12:55:18,380][02935] Num frames 7200... [2024-11-10 12:55:18,503][02935] Num frames 7300... [2024-11-10 12:55:18,629][02935] Num frames 7400... [2024-11-10 12:55:18,749][02935] Num frames 7500... [2024-11-10 12:55:18,873][02935] Num frames 7600... 
[2024-11-10 12:55:18,943][02935] Avg episode rewards: #0: 24.446, true rewards: #0: 10.874 [2024-11-10 12:55:18,944][02935] Avg episode reward: 24.446, avg true_objective: 10.874 [2024-11-10 12:55:19,051][02935] Num frames 7700... [2024-11-10 12:55:19,168][02935] Num frames 7800... [2024-11-10 12:55:19,290][02935] Num frames 7900... [2024-11-10 12:55:19,420][02935] Avg episode rewards: #0: 22.080, true rewards: #0: 9.955 [2024-11-10 12:55:19,422][02935] Avg episode reward: 22.080, avg true_objective: 9.955 [2024-11-10 12:55:19,468][02935] Num frames 8000... [2024-11-10 12:55:19,600][02935] Num frames 8100... [2024-11-10 12:55:19,718][02935] Num frames 8200... [2024-11-10 12:55:19,843][02935] Num frames 8300... [2024-11-10 12:55:19,967][02935] Num frames 8400... [2024-11-10 12:55:20,087][02935] Num frames 8500... [2024-11-10 12:55:20,208][02935] Num frames 8600... [2024-11-10 12:55:20,328][02935] Num frames 8700... [2024-11-10 12:55:20,446][02935] Num frames 8800... [2024-11-10 12:55:20,569][02935] Num frames 8900... [2024-11-10 12:55:20,696][02935] Num frames 9000... [2024-11-10 12:55:20,817][02935] Num frames 9100... [2024-11-10 12:55:20,941][02935] Num frames 9200... [2024-11-10 12:55:21,048][02935] Avg episode rewards: #0: 23.049, true rewards: #0: 10.271 [2024-11-10 12:55:21,050][02935] Avg episode reward: 23.049, avg true_objective: 10.271 [2024-11-10 12:55:21,119][02935] Num frames 9300... [2024-11-10 12:55:21,240][02935] Num frames 9400... [2024-11-10 12:55:21,359][02935] Num frames 9500... [2024-11-10 12:55:21,479][02935] Num frames 9600... [2024-11-10 12:55:21,602][02935] Num frames 9700... [2024-11-10 12:55:21,729][02935] Num frames 9800... [2024-11-10 12:55:21,853][02935] Num frames 9900... [2024-11-10 12:55:21,976][02935] Num frames 10000... [2024-11-10 12:55:22,095][02935] Num frames 10100... [2024-11-10 12:55:22,219][02935] Num frames 10200... [2024-11-10 12:55:22,337][02935] Num frames 10300... [2024-11-10 12:55:22,455][02935] Num frames 10400... 
[2024-11-10 12:55:22,578][02935] Num frames 10500... [2024-11-10 12:55:22,661][02935] Avg episode rewards: #0: 23.724, true rewards: #0: 10.524 [2024-11-10 12:55:22,663][02935] Avg episode reward: 23.724, avg true_objective: 10.524 [2024-11-10 12:56:24,920][02935] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-11-10 12:56:25,517][02935] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-10 12:56:25,518][02935] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-10 12:56:25,520][02935] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-10 12:56:25,521][02935] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-10 12:56:25,523][02935] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-10 12:56:25,524][02935] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-10 12:56:25,526][02935] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-10 12:56:25,527][02935] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-10 12:56:25,529][02935] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-10 12:56:25,529][02935] Adding new argument 'hf_repository'='ToshI4/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-10 12:56:25,530][02935] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-10 12:56:25,531][02935] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-11-10 12:56:25,532][02935] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-10 12:56:25,533][02935] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2024-11-10 12:56:25,534][02935] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-10 12:56:25,576][02935] RunningMeanStd input shape: (3, 72, 128) [2024-11-10 12:56:25,578][02935] RunningMeanStd input shape: (1,) [2024-11-10 12:56:25,595][02935] ConvEncoder: input_channels=3 [2024-11-10 12:56:25,651][02935] Conv encoder output size: 512 [2024-11-10 12:56:25,653][02935] Policy head output size: 512 [2024-11-10 12:56:25,679][02935] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-10 12:56:26,307][02935] Num frames 100... [2024-11-10 12:56:26,461][02935] Num frames 200... [2024-11-10 12:56:26,617][02935] Num frames 300... [2024-11-10 12:56:26,774][02935] Num frames 400... [2024-11-10 12:56:26,944][02935] Num frames 500... [2024-11-10 12:56:27,109][02935] Num frames 600... [2024-11-10 12:56:27,260][02935] Num frames 700... [2024-11-10 12:56:27,414][02935] Num frames 800... [2024-11-10 12:56:27,575][02935] Num frames 900... [2024-11-10 12:56:27,772][02935] Avg episode rewards: #0: 24.920, true rewards: #0: 9.920 [2024-11-10 12:56:27,773][02935] Avg episode reward: 24.920, avg true_objective: 9.920 [2024-11-10 12:56:27,788][02935] Num frames 1000... [2024-11-10 12:56:27,945][02935] Num frames 1100... [2024-11-10 12:56:28,110][02935] Num frames 1200... [2024-11-10 12:56:28,268][02935] Num frames 1300... [2024-11-10 12:56:28,436][02935] Num frames 1400... [2024-11-10 12:56:28,595][02935] Num frames 1500... [2024-11-10 12:56:28,792][02935] Num frames 1600... [2024-11-10 12:56:28,998][02935] Num frames 1700... [2024-11-10 12:56:29,172][02935] Num frames 1800... [2024-11-10 12:56:29,345][02935] Num frames 1900... [2024-11-10 12:56:29,542][02935] Num frames 2000... [2024-11-10 12:56:29,742][02935] Num frames 2100... [2024-11-10 12:56:29,942][02935] Num frames 2200... [2024-11-10 12:56:30,141][02935] Num frames 2300... [2024-11-10 12:56:30,399][02935] Num frames 2400... 
[2024-11-10 12:56:30,597][02935] Num frames 2500... [2024-11-10 12:56:30,816][02935] Num frames 2600... [2024-11-10 12:56:31,036][02935] Num frames 2700... [2024-11-10 12:56:31,266][02935] Avg episode rewards: #0: 36.920, true rewards: #0: 13.920 [2024-11-10 12:56:31,268][02935] Avg episode reward: 36.920, avg true_objective: 13.920 [2024-11-10 12:56:31,309][02935] Num frames 2800... [2024-11-10 12:56:31,513][02935] Num frames 2900... [2024-11-10 12:56:31,716][02935] Num frames 3000... [2024-11-10 12:56:31,895][02935] Num frames 3100... [2024-11-10 12:56:32,090][02935] Num frames 3200... [2024-11-10 12:56:32,271][02935] Num frames 3300... [2024-11-10 12:56:32,439][02935] Num frames 3400... [2024-11-10 12:56:32,626][02935] Num frames 3500... [2024-11-10 12:56:32,810][02935] Num frames 3600... [2024-11-10 12:56:32,989][02935] Num frames 3700... [2024-11-10 12:56:33,072][02935] Avg episode rewards: #0: 32.373, true rewards: #0: 12.373 [2024-11-10 12:56:33,074][02935] Avg episode reward: 32.373, avg true_objective: 12.373 [2024-11-10 12:56:33,242][02935] Num frames 3800... [2024-11-10 12:56:33,413][02935] Num frames 3900... [2024-11-10 12:56:33,581][02935] Num frames 4000... [2024-11-10 12:56:33,749][02935] Num frames 4100... [2024-11-10 12:56:33,920][02935] Num frames 4200... [2024-11-10 12:56:34,099][02935] Num frames 4300... [2024-11-10 12:56:34,197][02935] Avg episode rewards: #0: 27.052, true rewards: #0: 10.802 [2024-11-10 12:56:34,198][02935] Avg episode reward: 27.052, avg true_objective: 10.802 [2024-11-10 12:56:34,298][02935] Num frames 4400... [2024-11-10 12:56:34,422][02935] Num frames 4500... [2024-11-10 12:56:34,539][02935] Num frames 4600... [2024-11-10 12:56:34,653][02935] Num frames 4700... [2024-11-10 12:56:34,787][02935] Avg episode rewards: #0: 22.738, true rewards: #0: 9.538 [2024-11-10 12:56:34,788][02935] Avg episode reward: 22.738, avg true_objective: 9.538 [2024-11-10 12:56:34,825][02935] Num frames 4800... 
[2024-11-10 12:56:34,951][02935] Num frames 4900... [2024-11-10 12:56:35,072][02935] Num frames 5000... [2024-11-10 12:56:35,190][02935] Num frames 5100... [2024-11-10 12:56:35,310][02935] Num frames 5200... [2024-11-10 12:56:35,436][02935] Num frames 5300... [2024-11-10 12:56:35,556][02935] Num frames 5400... [2024-11-10 12:56:35,673][02935] Num frames 5500... [2024-11-10 12:56:35,791][02935] Num frames 5600... [2024-11-10 12:56:35,917][02935] Num frames 5700... [2024-11-10 12:56:36,037][02935] Num frames 5800... [2024-11-10 12:56:36,157][02935] Num frames 5900... [2024-11-10 12:56:36,280][02935] Num frames 6000... [2024-11-10 12:56:36,405][02935] Num frames 6100... [2024-11-10 12:56:36,525][02935] Num frames 6200... [2024-11-10 12:56:36,645][02935] Num frames 6300... [2024-11-10 12:56:36,767][02935] Num frames 6400... [2024-11-10 12:56:36,891][02935] Num frames 6500... [2024-11-10 12:56:37,012][02935] Num frames 6600... [2024-11-10 12:56:37,131][02935] Num frames 6700... [2024-11-10 12:56:37,249][02935] Num frames 6800... [2024-11-10 12:56:37,346][02935] Avg episode rewards: #0: 28.555, true rewards: #0: 11.388 [2024-11-10 12:56:37,347][02935] Avg episode reward: 28.555, avg true_objective: 11.388 [2024-11-10 12:56:37,433][02935] Num frames 6900... [2024-11-10 12:56:37,557][02935] Num frames 7000... [2024-11-10 12:56:37,678][02935] Num frames 7100... [2024-11-10 12:56:37,796][02935] Num frames 7200... [2024-11-10 12:56:37,921][02935] Num frames 7300... [2024-11-10 12:56:38,046][02935] Num frames 7400... [2024-11-10 12:56:38,164][02935] Num frames 7500... [2024-11-10 12:56:38,281][02935] Num frames 7600... [2024-11-10 12:56:38,399][02935] Num frames 7700... [2024-11-10 12:56:38,527][02935] Num frames 7800... [2024-11-10 12:56:38,648][02935] Avg episode rewards: #0: 28.224, true rewards: #0: 11.224 [2024-11-10 12:56:38,650][02935] Avg episode reward: 28.224, avg true_objective: 11.224 [2024-11-10 12:56:38,703][02935] Num frames 7900... 
[2024-11-10 12:56:38,820][02935] Num frames 8000... [2024-11-10 12:56:38,943][02935] Num frames 8100... [2024-11-10 12:56:39,061][02935] Num frames 8200... [2024-11-10 12:56:39,181][02935] Num frames 8300... [2024-11-10 12:56:39,320][02935] Num frames 8400... [2024-11-10 12:56:39,440][02935] Num frames 8500... [2024-11-10 12:56:39,567][02935] Num frames 8600... [2024-11-10 12:56:39,687][02935] Num frames 8700... [2024-11-10 12:56:39,806][02935] Num frames 8800... [2024-11-10 12:56:39,932][02935] Avg episode rewards: #0: 27.442, true rewards: #0: 11.067 [2024-11-10 12:56:39,933][02935] Avg episode reward: 27.442, avg true_objective: 11.067 [2024-11-10 12:56:39,992][02935] Num frames 8900... [2024-11-10 12:56:40,143][02935] Num frames 9000... [2024-11-10 12:56:40,268][02935] Num frames 9100... [2024-11-10 12:56:40,386][02935] Num frames 9200... [2024-11-10 12:56:40,514][02935] Num frames 9300... [2024-11-10 12:56:40,632][02935] Num frames 9400... [2024-11-10 12:56:40,760][02935] Num frames 9500... [2024-11-10 12:56:40,889][02935] Num frames 9600... [2024-11-10 12:56:41,009][02935] Num frames 9700... [2024-11-10 12:56:41,132][02935] Num frames 9800... [2024-11-10 12:56:41,249][02935] Num frames 9900... [2024-11-10 12:56:41,331][02935] Avg episode rewards: #0: 26.801, true rewards: #0: 11.023 [2024-11-10 12:56:41,333][02935] Avg episode reward: 26.801, avg true_objective: 11.023 [2024-11-10 12:56:41,427][02935] Num frames 10000... [2024-11-10 12:56:41,554][02935] Num frames 10100... [2024-11-10 12:56:41,672][02935] Num frames 10200... [2024-11-10 12:56:41,790][02935] Num frames 10300... [2024-11-10 12:56:41,918][02935] Num frames 10400... [2024-11-10 12:56:42,042][02935] Num frames 10500... [2024-11-10 12:56:42,165][02935] Num frames 10600... [2024-11-10 12:56:42,287][02935] Num frames 10700... [2024-11-10 12:56:42,405][02935] Num frames 10800... [2024-11-10 12:56:42,527][02935] Num frames 10900... [2024-11-10 12:56:42,653][02935] Num frames 11000... 
[2024-11-10 12:56:42,774][02935] Num frames 11100... [2024-11-10 12:56:42,902][02935] Num frames 11200... [2024-11-10 12:56:43,023][02935] Num frames 11300... [2024-11-10 12:56:43,144][02935] Num frames 11400... [2024-11-10 12:56:43,271][02935] Num frames 11500... [2024-11-10 12:56:43,393][02935] Num frames 11600... [2024-11-10 12:56:43,540][02935] Avg episode rewards: #0: 28.576, true rewards: #0: 11.676 [2024-11-10 12:56:43,542][02935] Avg episode reward: 28.576, avg true_objective: 11.676 [2024-11-10 12:57:52,800][02935] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2024-11-10 12:58:29,348][02935] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-11-10 12:58:29,352][02935] Overriding arg 'num_workers' with value 1 passed from command line [2024-11-10 12:58:29,355][02935] Adding new argument 'no_render'=True that is not in the saved config file! [2024-11-10 12:58:29,357][02935] Adding new argument 'save_video'=True that is not in the saved config file! [2024-11-10 12:58:29,360][02935] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-11-10 12:58:29,362][02935] Adding new argument 'video_name'=None that is not in the saved config file! [2024-11-10 12:58:29,366][02935] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-11-10 12:58:29,368][02935] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-11-10 12:58:29,369][02935] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-11-10 12:58:29,370][02935] Adding new argument 'hf_repository'='ToshI4/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-11-10 12:58:29,373][02935] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-11-10 12:58:29,374][02935] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
[2024-11-10 12:58:29,375][02935] Adding new argument 'train_script'=None that is not in the saved config file! [2024-11-10 12:58:29,377][02935] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-11-10 12:58:29,378][02935] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-11-10 12:58:29,413][02935] RunningMeanStd input shape: (3, 72, 128) [2024-11-10 12:58:29,416][02935] RunningMeanStd input shape: (1,) [2024-11-10 12:58:29,429][02935] ConvEncoder: input_channels=3 [2024-11-10 12:58:29,467][02935] Conv encoder output size: 512 [2024-11-10 12:58:29,468][02935] Policy head output size: 512 [2024-11-10 12:58:29,487][02935] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-11-10 12:58:29,926][02935] Num frames 100... [2024-11-10 12:58:30,042][02935] Num frames 200... [2024-11-10 12:58:30,162][02935] Num frames 300... [2024-11-10 12:58:30,283][02935] Num frames 400... [2024-11-10 12:58:30,402][02935] Num frames 500... [2024-11-10 12:58:30,523][02935] Num frames 600... [2024-11-10 12:58:30,640][02935] Num frames 700... [2024-11-10 12:58:30,765][02935] Num frames 800... [2024-11-10 12:58:30,818][02935] Avg episode rewards: #0: 16.000, true rewards: #0: 8.000 [2024-11-10 12:58:30,820][02935] Avg episode reward: 16.000, avg true_objective: 8.000 [2024-11-10 12:58:30,942][02935] Num frames 900... [2024-11-10 12:58:31,064][02935] Num frames 1000... [2024-11-10 12:58:31,182][02935] Num frames 1100... [2024-11-10 12:58:31,299][02935] Num frames 1200... [2024-11-10 12:58:31,423][02935] Num frames 1300... [2024-11-10 12:58:31,544][02935] Num frames 1400... [2024-11-10 12:58:31,668][02935] Num frames 1500... [2024-11-10 12:58:31,799][02935] Num frames 1600... [2024-11-10 12:58:31,925][02935] Num frames 1700... [2024-11-10 12:58:32,042][02935] Num frames 1800... [2024-11-10 12:58:32,160][02935] Num frames 1900... 
[2024-11-10 12:58:32,287][02935] Num frames 2000... [2024-11-10 12:58:32,412][02935] Num frames 2100... [2024-11-10 12:58:32,520][02935] Avg episode rewards: #0: 21.220, true rewards: #0: 10.720 [2024-11-10 12:58:32,522][02935] Avg episode reward: 21.220, avg true_objective: 10.720 [2024-11-10 12:58:32,596][02935] Num frames 2200... [2024-11-10 12:58:32,713][02935] Num frames 2300... [2024-11-10 12:58:32,849][02935] Num frames 2400... [2024-11-10 12:58:32,971][02935] Num frames 2500... [2024-11-10 12:58:33,088][02935] Num frames 2600... [2024-11-10 12:58:33,212][02935] Num frames 2700... [2024-11-10 12:58:33,332][02935] Num frames 2800... [2024-11-10 12:58:33,460][02935] Num frames 2900... [2024-11-10 12:58:33,579][02935] Num frames 3000... [2024-11-10 12:58:33,703][02935] Num frames 3100... [2024-11-10 12:58:33,822][02935] Num frames 3200... [2024-11-10 12:58:33,957][02935] Num frames 3300... [2024-11-10 12:58:34,076][02935] Num frames 3400... [2024-11-10 12:58:34,200][02935] Num frames 3500... [2024-11-10 12:58:34,283][02935] Avg episode rewards: #0: 25.067, true rewards: #0: 11.733 [2024-11-10 12:58:34,285][02935] Avg episode reward: 25.067, avg true_objective: 11.733 [2024-11-10 12:58:34,392][02935] Num frames 3600... [2024-11-10 12:58:34,517][02935] Num frames 3700... [2024-11-10 12:58:34,636][02935] Num frames 3800... [2024-11-10 12:58:34,753][02935] Num frames 3900... [2024-11-10 12:58:34,881][02935] Num frames 4000... [2024-11-10 12:58:35,007][02935] Num frames 4100... [2024-11-10 12:58:35,136][02935] Num frames 4200... [2024-11-10 12:58:35,253][02935] Num frames 4300... [2024-11-10 12:58:35,376][02935] Num frames 4400... [2024-11-10 12:58:35,500][02935] Num frames 4500... [2024-11-10 12:58:35,621][02935] Num frames 4600... [2024-11-10 12:58:35,743][02935] Num frames 4700... [2024-11-10 12:58:35,869][02935] Num frames 4800... [2024-11-10 12:58:35,999][02935] Num frames 4900... [2024-11-10 12:58:36,119][02935] Num frames 5000... 
[2024-11-10 12:58:36,245][02935] Num frames 5100... [2024-11-10 12:58:36,368][02935] Num frames 5200... [2024-11-10 12:58:36,515][02935] Num frames 5300... [2024-11-10 12:58:36,687][02935] Num frames 5400... [2024-11-10 12:58:36,863][02935] Num frames 5500... [2024-11-10 12:58:37,036][02935] Num frames 5600... [2024-11-10 12:58:37,128][02935] Avg episode rewards: #0: 33.800, true rewards: #0: 14.050 [2024-11-10 12:58:37,129][02935] Avg episode reward: 33.800, avg true_objective: 14.050 [2024-11-10 12:58:37,263][02935] Num frames 5700... [2024-11-10 12:58:37,420][02935] Num frames 5800... [2024-11-10 12:58:37,583][02935] Num frames 5900... [2024-11-10 12:58:37,750][02935] Num frames 6000... [2024-11-10 12:58:37,927][02935] Num frames 6100... [2024-11-10 12:58:38,096][02935] Num frames 6200... [2024-11-10 12:58:38,272][02935] Num frames 6300... [2024-11-10 12:58:38,437][02935] Num frames 6400... [2024-11-10 12:58:38,615][02935] Num frames 6500... [2024-11-10 12:58:38,787][02935] Num frames 6600... [2024-11-10 12:58:38,966][02935] Num frames 6700... [2024-11-10 12:58:39,097][02935] Num frames 6800... [2024-11-10 12:58:39,216][02935] Num frames 6900... [2024-11-10 12:58:39,343][02935] Num frames 7000... [2024-11-10 12:58:39,464][02935] Num frames 7100... [2024-11-10 12:58:39,584][02935] Num frames 7200... [2024-11-10 12:58:39,703][02935] Num frames 7300... [2024-11-10 12:58:39,823][02935] Num frames 7400... [2024-11-10 12:58:39,950][02935] Num frames 7500... [2024-11-10 12:58:40,090][02935] Avg episode rewards: #0: 36.944, true rewards: #0: 15.144 [2024-11-10 12:58:40,092][02935] Avg episode reward: 36.944, avg true_objective: 15.144 [2024-11-10 12:58:40,128][02935] Num frames 7600... [2024-11-10 12:58:40,251][02935] Num frames 7700... [2024-11-10 12:58:40,376][02935] Num frames 7800... [2024-11-10 12:58:40,495][02935] Num frames 7900... [2024-11-10 12:58:40,611][02935] Num frames 8000... [2024-11-10 12:58:40,727][02935] Num frames 8100... 
[2024-11-10 12:58:40,846][02935] Num frames 8200... [2024-11-10 12:58:40,966][02935] Num frames 8300... [2024-11-10 12:58:41,093][02935] Num frames 8400... [2024-11-10 12:58:41,211][02935] Num frames 8500... [2024-11-10 12:58:41,326][02935] Num frames 8600... [2024-11-10 12:58:41,449][02935] Num frames 8700... [2024-11-10 12:58:41,567][02935] Num frames 8800... [2024-11-10 12:58:41,691][02935] Num frames 8900... [2024-11-10 12:58:41,809][02935] Num frames 9000... [2024-11-10 12:58:41,923][02935] Avg episode rewards: #0: 35.906, true rewards: #0: 15.073 [2024-11-10 12:58:41,924][02935] Avg episode reward: 35.906, avg true_objective: 15.073 [2024-11-10 12:58:41,993][02935] Num frames 9100... [2024-11-10 12:58:42,118][02935] Num frames 9200... [2024-11-10 12:58:42,238][02935] Num frames 9300... [2024-11-10 12:58:42,359][02935] Num frames 9400... [2024-11-10 12:58:42,479][02935] Num frames 9500... [2024-11-10 12:58:42,600][02935] Num frames 9600... [2024-11-10 12:58:42,721][02935] Num frames 9700... [2024-11-10 12:58:42,848][02935] Num frames 9800... [2024-11-10 12:58:42,978][02935] Num frames 9900... [2024-11-10 12:58:43,101][02935] Num frames 10000... [2024-11-10 12:58:43,227][02935] Num frames 10100... [2024-11-10 12:58:43,350][02935] Num frames 10200... [2024-11-10 12:58:43,493][02935] Avg episode rewards: #0: 34.678, true rewards: #0: 14.679 [2024-11-10 12:58:43,495][02935] Avg episode reward: 34.678, avg true_objective: 14.679 [2024-11-10 12:58:43,530][02935] Num frames 10300... [2024-11-10 12:58:43,650][02935] Num frames 10400... [2024-11-10 12:58:43,779][02935] Num frames 10500... [2024-11-10 12:58:43,910][02935] Num frames 10600... [2024-11-10 12:58:44,031][02935] Num frames 10700... [2024-11-10 12:58:44,157][02935] Num frames 10800... [2024-11-10 12:58:44,277][02935] Num frames 10900... [2024-11-10 12:58:44,398][02935] Num frames 11000... [2024-11-10 12:58:44,521][02935] Num frames 11100... [2024-11-10 12:58:44,643][02935] Num frames 11200... 
[2024-11-10 12:58:44,765][02935] Num frames 11300... [2024-11-10 12:58:44,891][02935] Num frames 11400... [2024-11-10 12:58:45,057][02935] Avg episode rewards: #0: 33.989, true rewards: #0: 14.364 [2024-11-10 12:58:45,059][02935] Avg episode reward: 33.989, avg true_objective: 14.364 [2024-11-10 12:58:45,073][02935] Num frames 11500... [2024-11-10 12:58:45,197][02935] Num frames 11600... [2024-11-10 12:58:45,325][02935] Num frames 11700... [2024-11-10 12:58:45,447][02935] Num frames 11800... [2024-11-10 12:58:45,568][02935] Num frames 11900... [2024-11-10 12:58:45,691][02935] Num frames 12000... [2024-11-10 12:58:45,809][02935] Num frames 12100... [2024-11-10 12:58:45,938][02935] Num frames 12200... [2024-11-10 12:58:46,056][02935] Num frames 12300... [2024-11-10 12:58:46,201][02935] Avg episode rewards: #0: 32.527, true rewards: #0: 13.749 [2024-11-10 12:58:46,202][02935] Avg episode reward: 32.527, avg true_objective: 13.749 [2024-11-10 12:58:46,240][02935] Num frames 12400... [2024-11-10 12:58:46,357][02935] Num frames 12500... [2024-11-10 12:58:46,482][02935] Num frames 12600... [2024-11-10 12:58:46,601][02935] Num frames 12700... [2024-11-10 12:58:46,723][02935] Num frames 12800... [2024-11-10 12:58:46,845][02935] Num frames 12900... [2024-11-10 12:58:46,968][02935] Num frames 13000... [2024-11-10 12:58:47,088][02935] Num frames 13100... [2024-11-10 12:58:47,206][02935] Num frames 13200... [2024-11-10 12:58:47,332][02935] Num frames 13300... [2024-11-10 12:58:47,450][02935] Num frames 13400... [2024-11-10 12:58:47,573][02935] Num frames 13500... [2024-11-10 12:58:47,696][02935] Num frames 13600... [2024-11-10 12:58:47,818][02935] Num frames 13700... [2024-11-10 12:58:47,948][02935] Num frames 13800... [2024-11-10 12:58:48,070][02935] Num frames 13900... [2024-11-10 12:58:48,189][02935] Num frames 14000... [2024-11-10 12:58:48,324][02935] Num frames 14100... [2024-11-10 12:58:48,444][02935] Num frames 14200... 
[2024-11-10 12:58:48,570][02935] Num frames 14300... [2024-11-10 12:58:48,694][02935] Num frames 14400... [2024-11-10 12:58:48,836][02935] Avg episode rewards: #0: 34.774, true rewards: #0: 14.474 [2024-11-10 12:58:48,838][02935] Avg episode reward: 34.774, avg true_objective: 14.474 [2024-11-10 13:00:14,329][02935] Replay video saved to /content/train_dir/default_experiment/replay.mp4!