diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1119 @@ +[2024-08-03 17:31:03,328][08459] Saving configuration to /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/config.json... +[2024-08-03 17:31:03,329][08459] Rollout worker 0 uses device cpu +[2024-08-03 17:31:03,329][08459] Rollout worker 1 uses device cpu +[2024-08-03 17:31:03,329][08459] Rollout worker 2 uses device cpu +[2024-08-03 17:31:03,329][08459] Rollout worker 3 uses device cpu +[2024-08-03 17:31:03,329][08459] Rollout worker 4 uses device cpu +[2024-08-03 17:31:03,329][08459] Rollout worker 5 uses device cpu +[2024-08-03 17:31:03,330][08459] Rollout worker 6 uses device cpu +[2024-08-03 17:31:03,330][08459] Rollout worker 7 uses device cpu +[2024-08-03 17:31:03,330][08459] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1 +[2024-08-03 17:31:03,341][08459] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-03 17:31:03,341][08459] InferenceWorker_p0-w0: min num requests: 2 +[2024-08-03 17:31:03,359][08459] Starting all processes... +[2024-08-03 17:31:03,359][08459] Starting process learner_proc0 +[2024-08-03 17:31:03,571][08459] Starting all processes... +[2024-08-03 17:31:03,583][08459] Starting process inference_proc0-0 +[2024-08-03 17:31:03,585][08459] Starting process rollout_proc0 +[2024-08-03 17:31:03,588][08459] Starting process rollout_proc1 +[2024-08-03 17:31:03,588][08459] Starting process rollout_proc2 +[2024-08-03 17:31:03,589][08459] Starting process rollout_proc3 +[2024-08-03 17:31:03,589][08459] Starting process rollout_proc4 +[2024-08-03 17:31:03,589][08459] Starting process rollout_proc5 +[2024-08-03 17:31:03,591][08459] Starting process rollout_proc6 +[2024-08-03 17:31:03,598][08459] Starting process rollout_proc7 +[2024-08-03 17:31:05,157][08520] Worker 1 uses CPU cores [1] +[2024-08-03 17:31:05,285][08518] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-03 17:31:05,286][08518] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-08-03 17:31:05,287][08533] Worker 7 uses CPU cores [7] +[2024-08-03 17:31:05,339][08529] Worker 3 uses CPU cores [3] +[2024-08-03 17:31:05,349][08518] Num visible devices: 1 +[2024-08-03 17:31:05,359][08531] Worker 5 uses CPU cores [5] +[2024-08-03 17:31:05,378][08505] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-03 17:31:05,379][08505] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-08-03 17:31:05,402][08532] Worker 6 uses CPU cores [6] +[2024-08-03 17:31:05,408][08505] Num visible devices: 1 +[2024-08-03 17:31:05,444][08505] Starting seed is not provided +[2024-08-03 17:31:05,444][08505] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-03 17:31:05,444][08505] Initializing actor-critic model on device cuda:0 +[2024-08-03 17:31:05,445][08505] RunningMeanStd input shape: (27,) +[2024-08-03 17:31:05,445][08505] RunningMeanStd input shape: (1,) +[2024-08-03 17:31:05,504][08505] Created Actor Critic model with architecture: +[2024-08-03 17:31:05,504][08505] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): MultiInputEncoder( + (encoders): ModuleDict( + (obs): MlpEncoder( + (mlp_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=Tanh) + (2): RecursiveScriptModule(original_name=Linear) + (3): RecursiveScriptModule(original_name=Tanh) + ) + ) + ) + ) + (core): ModelCoreIdentity() + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=64, out_features=1, bias=True) + (action_parameterization): ActionParameterizationContinuousNonAdaptiveStddev( + (distribution_linear): Linear(in_features=64, out_features=8, bias=True) + ) +) +[2024-08-03 17:31:05,507][08521] Worker 2 uses CPU cores [2] +[2024-08-03 17:31:05,719][08530] Worker 4 uses CPU cores [4] +[2024-08-03 17:31:05,760][08519] Worker 0 uses CPU cores [0] +[2024-08-03 17:31:05,912][08505] Using optimizer +[2024-08-03 17:31:06,352][08505] No checkpoints found +[2024-08-03 17:31:06,352][08505] Did not load from checkpoint, starting from scratch! +[2024-08-03 17:31:06,352][08505] Initialized policy 0 weights for model version 0 +[2024-08-03 17:31:06,354][08505] LearnerWorker_p0 finished initialization! +[2024-08-03 17:31:06,354][08505] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-08-03 17:31:06,560][08518] RunningMeanStd input shape: (27,) +[2024-08-03 17:31:06,560][08518] RunningMeanStd input shape: (1,) +[2024-08-03 17:31:06,617][08459] Inference worker 0-0 is ready! +[2024-08-03 17:31:06,617][08459] All inference workers are ready! Signal rollout workers to start! +[2024-08-03 17:31:06,716][08531] Decorrelating experience for 0 frames... +[2024-08-03 17:31:06,716][08521] Decorrelating experience for 0 frames... +[2024-08-03 17:31:06,716][08533] Decorrelating experience for 0 frames... +[2024-08-03 17:31:06,716][08532] Decorrelating experience for 0 frames... +[2024-08-03 17:31:06,717][08531] Decorrelating experience for 64 frames... +[2024-08-03 17:31:06,717][08521] Decorrelating experience for 64 frames... +[2024-08-03 17:31:06,717][08532] Decorrelating experience for 64 frames... +[2024-08-03 17:31:06,717][08533] Decorrelating experience for 64 frames... +[2024-08-03 17:31:06,717][08530] Decorrelating experience for 0 frames... +[2024-08-03 17:31:06,717][08519] Decorrelating experience for 0 frames... +[2024-08-03 17:31:06,717][08530] Decorrelating experience for 64 frames... +[2024-08-03 17:31:06,718][08519] Decorrelating experience for 64 frames... +[2024-08-03 17:31:06,719][08520] Decorrelating experience for 0 frames... +[2024-08-03 17:31:06,720][08520] Decorrelating experience for 64 frames... +[2024-08-03 17:31:06,734][08529] Decorrelating experience for 0 frames... +[2024-08-03 17:31:06,735][08529] Decorrelating experience for 64 frames... +[2024-08-03 17:31:06,739][08532] Decorrelating experience for 128 frames... +[2024-08-03 17:31:06,739][08531] Decorrelating experience for 128 frames... +[2024-08-03 17:31:06,741][08519] Decorrelating experience for 128 frames... +[2024-08-03 17:31:06,742][08533] Decorrelating experience for 128 frames... +[2024-08-03 17:31:06,742][08521] Decorrelating experience for 128 frames... +[2024-08-03 17:31:06,743][08530] Decorrelating experience for 128 frames... +[2024-08-03 17:31:06,744][08520] Decorrelating experience for 128 frames... +[2024-08-03 17:31:06,758][08529] Decorrelating experience for 128 frames... +[2024-08-03 17:31:06,783][08532] Decorrelating experience for 192 frames... +[2024-08-03 17:31:06,784][08531] Decorrelating experience for 192 frames... +[2024-08-03 17:31:06,786][08519] Decorrelating experience for 192 frames... +[2024-08-03 17:31:06,786][08530] Decorrelating experience for 192 frames... +[2024-08-03 17:31:06,787][08533] Decorrelating experience for 192 frames... +[2024-08-03 17:31:06,789][08521] Decorrelating experience for 192 frames... +[2024-08-03 17:31:06,790][08520] Decorrelating experience for 192 frames... +[2024-08-03 17:31:06,800][08529] Decorrelating experience for 192 frames... +[2024-08-03 17:31:06,857][08531] Decorrelating experience for 256 frames... +[2024-08-03 17:31:06,858][08532] Decorrelating experience for 256 frames... +[2024-08-03 17:31:06,863][08530] Decorrelating experience for 256 frames... +[2024-08-03 17:31:06,864][08519] Decorrelating experience for 256 frames... +[2024-08-03 17:31:06,865][08533] Decorrelating experience for 256 frames... +[2024-08-03 17:31:06,867][08520] Decorrelating experience for 256 frames... +[2024-08-03 17:31:06,868][08521] Decorrelating experience for 256 frames... +[2024-08-03 17:31:06,873][08529] Decorrelating experience for 256 frames... +[2024-08-03 17:31:06,944][08531] Decorrelating experience for 320 frames... +[2024-08-03 17:31:06,944][08532] Decorrelating experience for 320 frames... +[2024-08-03 17:31:06,949][08530] Decorrelating experience for 320 frames... +[2024-08-03 17:31:06,951][08519] Decorrelating experience for 320 frames... +[2024-08-03 17:31:06,954][08533] Decorrelating experience for 320 frames... +[2024-08-03 17:31:06,956][08520] Decorrelating experience for 320 frames... +[2024-08-03 17:31:06,957][08521] Decorrelating experience for 320 frames... +[2024-08-03 17:31:06,959][08529] Decorrelating experience for 320 frames... +[2024-08-03 17:31:07,052][08532] Decorrelating experience for 384 frames... +[2024-08-03 17:31:07,053][08531] Decorrelating experience for 384 frames... +[2024-08-03 17:31:07,060][08530] Decorrelating experience for 384 frames... +[2024-08-03 17:31:07,061][08519] Decorrelating experience for 384 frames... +[2024-08-03 17:31:07,065][08529] Decorrelating experience for 384 frames... +[2024-08-03 17:31:07,068][08520] Decorrelating experience for 384 frames... +[2024-08-03 17:31:07,069][08533] Decorrelating experience for 384 frames... +[2024-08-03 17:31:07,070][08521] Decorrelating experience for 384 frames... +[2024-08-03 17:31:07,184][08532] Decorrelating experience for 448 frames... +[2024-08-03 17:31:07,184][08531] Decorrelating experience for 448 frames... +[2024-08-03 17:31:07,192][08519] Decorrelating experience for 448 frames... +[2024-08-03 17:31:07,194][08530] Decorrelating experience for 448 frames... +[2024-08-03 17:31:07,194][08529] Decorrelating experience for 448 frames... +[2024-08-03 17:31:07,201][08533] Decorrelating experience for 448 frames... +[2024-08-03 17:31:07,204][08521] Decorrelating experience for 448 frames... +[2024-08-03 17:31:07,205][08520] Decorrelating experience for 448 frames... +[2024-08-03 17:31:10,755][08459] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 12288. Throughput: 0: nan. Samples: 7664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:31:10,755][08459] Avg episode reward: [(0, '-132.225')] +[2024-08-03 17:31:13,578][08518] Updated weights for policy 0, policy_version 80 (0.0006) +[2024-08-03 17:31:15,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9011.2). Total num frames: 57344. Throughput: 0: 10152.0. Samples: 58424. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:31:15,755][08459] Avg episode reward: [(0, '-231.521')] +[2024-08-03 17:31:15,760][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000000112_57344.pth... +[2024-08-03 17:31:18,260][08518] Updated weights for policy 0, policy_version 160 (0.0006) +[2024-08-03 17:31:20,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9420.9, 300 sec: 9420.9). Total num frames: 106496. Throughput: 0: 7841.7. Samples: 86080. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:31:20,755][08459] Avg episode reward: [(0, '-109.434')] +[2024-08-03 17:31:20,756][08505] Saving new best policy, reward=-109.434! +[2024-08-03 17:31:22,456][08518] Updated weights for policy 0, policy_version 240 (0.0006) +[2024-08-03 17:31:23,337][08459] Heartbeat connected on Batcher_0 +[2024-08-03 17:31:23,339][08459] Heartbeat connected on LearnerWorker_p0 +[2024-08-03 17:31:23,347][08459] Heartbeat connected on InferenceWorker_p0-w0 +[2024-08-03 17:31:23,348][08459] Heartbeat connected on RolloutWorker_w1 +[2024-08-03 17:31:23,349][08459] Heartbeat connected on RolloutWorker_w0 +[2024-08-03 17:31:23,351][08459] Heartbeat connected on RolloutWorker_w3 +[2024-08-03 17:31:23,352][08459] Heartbeat connected on RolloutWorker_w2 +[2024-08-03 17:31:23,357][08459] Heartbeat connected on RolloutWorker_w5 +[2024-08-03 17:31:23,357][08459] Heartbeat connected on RolloutWorker_w6 +[2024-08-03 17:31:23,360][08459] Heartbeat connected on RolloutWorker_w7 +[2024-08-03 17:31:23,364][08459] Heartbeat connected on RolloutWorker_w4 +[2024-08-03 17:31:25,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9284.3, 300 sec: 9284.3). Total num frames: 151552. Throughput: 0: 9047.8. Samples: 143380. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:31:25,755][08459] Avg episode reward: [(0, '-45.738')] +[2024-08-03 17:31:25,756][08505] Saving new best policy, reward=-45.738! +[2024-08-03 17:31:26,720][08518] Updated weights for policy 0, policy_version 320 (0.0007) +[2024-08-03 17:31:30,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9420.8, 300 sec: 9420.8). Total num frames: 200704. Throughput: 0: 9638.6. Samples: 200436. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:31:30,756][08459] Avg episode reward: [(0, '-36.309')] +[2024-08-03 17:31:30,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000000392_200704.pth... +[2024-08-03 17:31:30,763][08505] Saving new best policy, reward=-36.309! +[2024-08-03 17:31:31,206][08518] Updated weights for policy 0, policy_version 400 (0.0006) +[2024-08-03 17:31:35,249][08518] Updated weights for policy 0, policy_version 480 (0.0005) +[2024-08-03 17:31:35,755][08459] Fps is (10 sec: 9830.3, 60 sec: 9502.7, 300 sec: 9502.7). Total num frames: 249856. Throughput: 0: 8944.3. Samples: 231272. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:31:35,755][08459] Avg episode reward: [(0, '-35.099')] +[2024-08-03 17:31:35,756][08505] Saving new best policy, reward=-35.099! +[2024-08-03 17:31:39,537][08518] Updated weights for policy 0, policy_version 560 (0.0005) +[2024-08-03 17:31:40,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9557.3, 300 sec: 9557.3). Total num frames: 299008. Throughput: 0: 9332.5. Samples: 287640. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:31:40,756][08459] Avg episode reward: [(0, '-55.993')] +[2024-08-03 17:31:44,232][08518] Updated weights for policy 0, policy_version 640 (0.0007) +[2024-08-03 17:31:45,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9362.3, 300 sec: 9362.3). Total num frames: 339968. Throughput: 0: 9555.5. Samples: 342108. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:31:45,755][08459] Avg episode reward: [(0, '-16.988')] +[2024-08-03 17:31:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000000664_339968.pth... +[2024-08-03 17:31:45,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000000112_57344.pth +[2024-08-03 17:31:45,761][08505] Saving new best policy, reward=-16.988! +[2024-08-03 17:31:48,687][08518] Updated weights for policy 0, policy_version 720 (0.0006) +[2024-08-03 17:31:50,755][08459] Fps is (10 sec: 8601.7, 60 sec: 9318.4, 300 sec: 9318.4). Total num frames: 385024. Throughput: 0: 9023.8. Samples: 368616. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:31:50,755][08459] Avg episode reward: [(0, '28.523')] +[2024-08-03 17:31:50,756][08505] Saving new best policy, reward=28.523! +[2024-08-03 17:31:53,420][08518] Updated weights for policy 0, policy_version 800 (0.0007) +[2024-08-03 17:31:55,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9284.3, 300 sec: 9284.3). Total num frames: 430080. Throughput: 0: 9188.4. Samples: 421140. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:31:55,755][08459] Avg episode reward: [(0, '125.799')] +[2024-08-03 17:31:55,756][08505] Saving new best policy, reward=125.799! +[2024-08-03 17:31:57,857][08518] Updated weights for policy 0, policy_version 880 (0.0006) +[2024-08-03 17:32:00,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9257.0, 300 sec: 9257.0). Total num frames: 475136. Throughput: 0: 9281.0. Samples: 476068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:00,763][08459] Avg episode reward: [(0, '244.810')] +[2024-08-03 17:32:00,765][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000000928_475136.pth... +[2024-08-03 17:32:00,770][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000000392_200704.pth +[2024-08-03 17:32:00,770][08505] Saving new best policy, reward=244.810! +[2024-08-03 17:32:02,185][08518] Updated weights for policy 0, policy_version 960 (0.0006) +[2024-08-03 17:32:05,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9309.1, 300 sec: 9309.1). Total num frames: 524288. Throughput: 0: 9313.9. Samples: 505204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:05,755][08459] Avg episode reward: [(0, '379.247')] +[2024-08-03 17:32:05,756][08505] Saving new best policy, reward=379.247! +[2024-08-03 17:32:06,647][08518] Updated weights for policy 0, policy_version 1040 (0.0006) +[2024-08-03 17:32:10,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9284.3, 300 sec: 9284.3). Total num frames: 569344. Throughput: 0: 9285.0. Samples: 561208. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:10,755][08459] Avg episode reward: [(0, '519.662')] +[2024-08-03 17:32:10,756][08505] Saving new best policy, reward=519.662! +[2024-08-03 17:32:11,073][08518] Updated weights for policy 0, policy_version 1120 (0.0006) +[2024-08-03 17:32:15,645][08518] Updated weights for policy 0, policy_version 1200 (0.0008) +[2024-08-03 17:32:15,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9284.3, 300 sec: 9263.3). Total num frames: 614400. Throughput: 0: 9200.6. Samples: 614464. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:15,755][08459] Avg episode reward: [(0, '634.036')] +[2024-08-03 17:32:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000001200_614400.pth... +[2024-08-03 17:32:15,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000000664_339968.pth +[2024-08-03 17:32:15,761][08505] Saving new best policy, reward=634.036! +[2024-08-03 17:32:20,115][08518] Updated weights for policy 0, policy_version 1280 (0.0007) +[2024-08-03 17:32:20,758][08459] Fps is (10 sec: 9008.8, 60 sec: 9215.6, 300 sec: 9244.9). Total num frames: 659456. Throughput: 0: 9141.2. Samples: 642652. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:32:20,758][08459] Avg episode reward: [(0, '754.930')] +[2024-08-03 17:32:20,760][08505] Saving new best policy, reward=754.930! +[2024-08-03 17:32:24,604][08518] Updated weights for policy 0, policy_version 1360 (0.0006) +[2024-08-03 17:32:25,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9216.0, 300 sec: 9229.7). Total num frames: 704512. Throughput: 0: 9085.6. Samples: 696492. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:25,755][08459] Avg episode reward: [(0, '833.701')] +[2024-08-03 17:32:25,756][08505] Saving new best policy, reward=833.701! +[2024-08-03 17:32:29,120][08518] Updated weights for policy 0, policy_version 1440 (0.0006) +[2024-08-03 17:32:30,755][08459] Fps is (10 sec: 9013.6, 60 sec: 9147.7, 300 sec: 9216.0). Total num frames: 749568. Throughput: 0: 9083.9. Samples: 750884. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:30,755][08459] Avg episode reward: [(0, '930.675')] +[2024-08-03 17:32:30,757][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000001464_749568.pth... +[2024-08-03 17:32:30,760][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000000928_475136.pth +[2024-08-03 17:32:30,761][08505] Saving new best policy, reward=930.675! +[2024-08-03 17:32:33,446][08518] Updated weights for policy 0, policy_version 1520 (0.0006) +[2024-08-03 17:32:35,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9147.7, 300 sec: 9252.1). Total num frames: 798720. Throughput: 0: 9153.8. Samples: 780536. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:35,755][08459] Avg episode reward: [(0, '1030.855')] +[2024-08-03 17:32:35,756][08505] Saving new best policy, reward=1030.855! +[2024-08-03 17:32:37,785][08518] Updated weights for policy 0, policy_version 1600 (0.0008) +[2024-08-03 17:32:40,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9193.2). Total num frames: 839680. Throughput: 0: 9195.2. Samples: 834924. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:40,763][08459] Avg episode reward: [(0, '1107.690')] +[2024-08-03 17:32:40,763][08505] Saving new best policy, reward=1107.690! +[2024-08-03 17:32:42,667][08518] Updated weights for policy 0, policy_version 1680 (0.0006) +[2024-08-03 17:32:45,755][08459] Fps is (10 sec: 8601.6, 60 sec: 9079.5, 300 sec: 9183.7). Total num frames: 884736. Throughput: 0: 9145.8. Samples: 887628. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:45,755][08459] Avg episode reward: [(0, '1201.064')] +[2024-08-03 17:32:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000001728_884736.pth... +[2024-08-03 17:32:45,760][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000001200_614400.pth +[2024-08-03 17:32:45,761][08505] Saving new best policy, reward=1201.064! +[2024-08-03 17:32:47,277][08518] Updated weights for policy 0, policy_version 1760 (0.0007) +[2024-08-03 17:32:50,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9079.5, 300 sec: 9175.0). Total num frames: 929792. Throughput: 0: 9072.3. Samples: 913456. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) +[2024-08-03 17:32:50,763][08459] Avg episode reward: [(0, '1288.832')] +[2024-08-03 17:32:50,763][08505] Saving new best policy, reward=1288.832! +[2024-08-03 17:32:51,812][08518] Updated weights for policy 0, policy_version 1840 (0.0007) +[2024-08-03 17:32:55,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9147.7, 300 sec: 9206.3). Total num frames: 978944. Throughput: 0: 9099.1. Samples: 970668. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:32:55,755][08459] Avg episode reward: [(0, '1376.698')] +[2024-08-03 17:32:55,756][08505] Saving new best policy, reward=1376.698! +[2024-08-03 17:32:56,090][08518] Updated weights for policy 0, policy_version 1920 (0.0006) +[2024-08-03 17:33:00,688][08518] Updated weights for policy 0, policy_version 2000 (0.0007) +[2024-08-03 17:33:00,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9147.7, 300 sec: 9197.4). Total num frames: 1024000. Throughput: 0: 9101.8. Samples: 1024044. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:00,755][08459] Avg episode reward: [(0, '1289.790')] +[2024-08-03 17:33:00,760][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000002000_1024000.pth... +[2024-08-03 17:33:00,765][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000001464_749568.pth +[2024-08-03 17:33:05,460][08518] Updated weights for policy 0, policy_version 2080 (0.0007) +[2024-08-03 17:33:05,755][08459] Fps is (10 sec: 8601.4, 60 sec: 9011.2, 300 sec: 9153.7). Total num frames: 1064960. Throughput: 0: 9026.2. Samples: 1048808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:05,756][08459] Avg episode reward: [(0, '1321.336')] +[2024-08-03 17:33:09,780][08518] Updated weights for policy 0, policy_version 2160 (0.0007) +[2024-08-03 17:33:10,756][08459] Fps is (10 sec: 8601.1, 60 sec: 9011.1, 300 sec: 9147.7). Total num frames: 1110016. Throughput: 0: 9078.7. Samples: 1105040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:10,756][08459] Avg episode reward: [(0, '1476.268')] +[2024-08-03 17:33:10,758][08505] Saving new best policy, reward=1476.268! +[2024-08-03 17:33:14,497][08518] Updated weights for policy 0, policy_version 2240 (0.0006) +[2024-08-03 17:33:15,755][08459] Fps is (10 sec: 9011.4, 60 sec: 9011.2, 300 sec: 9142.3). Total num frames: 1155072. Throughput: 0: 9019.0. Samples: 1156740. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:15,755][08459] Avg episode reward: [(0, '1626.853')] +[2024-08-03 17:33:15,757][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000002256_1155072.pth... +[2024-08-03 17:33:15,760][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000001728_884736.pth +[2024-08-03 17:33:15,761][08505] Saving new best policy, reward=1626.853! +[2024-08-03 17:33:18,717][08518] Updated weights for policy 0, policy_version 2320 (0.0006) +[2024-08-03 17:33:20,755][08459] Fps is (10 sec: 9421.3, 60 sec: 9079.9, 300 sec: 9168.7). Total num frames: 1204224. Throughput: 0: 9050.5. Samples: 1187808. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:20,755][08459] Avg episode reward: [(0, '1705.457')] +[2024-08-03 17:33:20,756][08505] Saving new best policy, reward=1705.457! +[2024-08-03 17:33:23,465][08518] Updated weights for policy 0, policy_version 2400 (0.0006) +[2024-08-03 17:33:25,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9011.2, 300 sec: 9132.6). Total num frames: 1245184. Throughput: 0: 8970.5. Samples: 1238596. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:33:25,755][08459] Avg episode reward: [(0, '1832.401')] +[2024-08-03 17:33:25,756][08505] Saving new best policy, reward=1832.401! +[2024-08-03 17:33:28,986][08518] Updated weights for policy 0, policy_version 2480 (0.0007) +[2024-08-03 17:33:30,755][08459] Fps is (10 sec: 7782.4, 60 sec: 8874.7, 300 sec: 9069.7). Total num frames: 1282048. Throughput: 0: 8796.1. Samples: 1283452. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:30,755][08459] Avg episode reward: [(0, '1914.915')] +[2024-08-03 17:33:30,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000002504_1282048.pth... +[2024-08-03 17:33:30,762][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000002000_1024000.pth +[2024-08-03 17:33:30,762][08505] Saving new best policy, reward=1914.915! +[2024-08-03 17:33:33,539][08518] Updated weights for policy 0, policy_version 2560 (0.0006) +[2024-08-03 17:33:35,755][08459] Fps is (10 sec: 8192.0, 60 sec: 8806.4, 300 sec: 9067.7). Total num frames: 1327104. Throughput: 0: 8840.8. Samples: 1311292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:35,755][08459] Avg episode reward: [(0, '2120.663')] +[2024-08-03 17:33:35,756][08505] Saving new best policy, reward=2120.663! +[2024-08-03 17:33:38,499][08518] Updated weights for policy 0, policy_version 2640 (0.0007) +[2024-08-03 17:33:40,755][08459] Fps is (10 sec: 8601.6, 60 sec: 8806.4, 300 sec: 9038.5). Total num frames: 1368064. Throughput: 0: 8716.1. Samples: 1362892. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:33:40,756][08459] Avg episode reward: [(0, '2277.772')] +[2024-08-03 17:33:40,756][08505] Saving new best policy, reward=2277.772! +[2024-08-03 17:33:43,387][08518] Updated weights for policy 0, policy_version 2720 (0.0008) +[2024-08-03 17:33:45,755][08459] Fps is (10 sec: 8191.9, 60 sec: 8738.1, 300 sec: 9011.2). Total num frames: 1409024. Throughput: 0: 8583.5. Samples: 1410300. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:45,755][08459] Avg episode reward: [(0, '2265.191')] +[2024-08-03 17:33:45,759][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000002752_1409024.pth... +[2024-08-03 17:33:45,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000002256_1155072.pth +[2024-08-03 17:33:48,278][08518] Updated weights for policy 0, policy_version 2800 (0.0006) +[2024-08-03 17:33:50,755][08459] Fps is (10 sec: 8601.6, 60 sec: 8738.1, 300 sec: 9011.2). Total num frames: 1454080. Throughput: 0: 8634.2. Samples: 1437344. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:50,756][08459] Avg episode reward: [(0, '2503.994')] +[2024-08-03 17:33:50,756][08505] Saving new best policy, reward=2503.994! +[2024-08-03 17:33:53,042][08518] Updated weights for policy 0, policy_version 2880 (0.0007) +[2024-08-03 17:33:55,755][08459] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8986.4). Total num frames: 1495040. Throughput: 0: 8505.0. Samples: 1487760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:33:55,755][08459] Avg episode reward: [(0, '2905.540')] +[2024-08-03 17:33:55,756][08505] Saving new best policy, reward=2905.540! +[2024-08-03 17:33:57,949][08518] Updated weights for policy 0, policy_version 2960 (0.0008) +[2024-08-03 17:34:00,755][08459] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 8963.0). Total num frames: 1536000. Throughput: 0: 8506.6. Samples: 1539536. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:34:00,763][08459] Avg episode reward: [(0, '2986.662')] +[2024-08-03 17:34:00,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000003000_1536000.pth... +[2024-08-03 17:34:00,770][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000002504_1282048.pth +[2024-08-03 17:34:00,770][08505] Saving new best policy, reward=2986.662! +[2024-08-03 17:34:02,755][08518] Updated weights for policy 0, policy_version 3040 (0.0008) +[2024-08-03 17:34:05,755][08459] Fps is (10 sec: 8601.5, 60 sec: 8601.6, 300 sec: 8964.4). Total num frames: 1581056. Throughput: 0: 8377.0. Samples: 1564772. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:34:05,756][08459] Avg episode reward: [(0, '2907.643')] +[2024-08-03 17:34:07,499][08518] Updated weights for policy 0, policy_version 3120 (0.0007) +[2024-08-03 17:34:10,755][08459] Fps is (10 sec: 8601.6, 60 sec: 8533.4, 300 sec: 8942.9). Total num frames: 1622016. Throughput: 0: 8346.7. Samples: 1614200. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:34:10,756][08459] Avg episode reward: [(0, '2717.742')] +[2024-08-03 17:34:12,867][08518] Updated weights for policy 0, policy_version 3200 (0.0010) +[2024-08-03 17:34:15,755][08459] Fps is (10 sec: 7782.5, 60 sec: 8396.8, 300 sec: 8900.5). Total num frames: 1658880. Throughput: 0: 8411.7. Samples: 1661976. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:34:15,763][08459] Avg episode reward: [(0, '2546.582')] +[2024-08-03 17:34:15,765][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000003240_1658880.pth... +[2024-08-03 17:34:15,768][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000002752_1409024.pth +[2024-08-03 17:34:17,709][08518] Updated weights for policy 0, policy_version 3280 (0.0006) +[2024-08-03 17:34:20,755][08459] Fps is (10 sec: 8192.1, 60 sec: 8328.5, 300 sec: 8903.4). Total num frames: 1703936. Throughput: 0: 8375.9. Samples: 1688208. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) +[2024-08-03 17:34:20,755][08459] Avg episode reward: [(0, '2538.835')] +[2024-08-03 17:34:22,241][08518] Updated weights for policy 0, policy_version 3360 (0.0007) +[2024-08-03 17:34:25,755][08459] Fps is (10 sec: 9420.8, 60 sec: 8465.1, 300 sec: 8927.2). Total num frames: 1753088. Throughput: 0: 8491.5. Samples: 1745008. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:34:25,755][08459] Avg episode reward: [(0, '2611.500')] +[2024-08-03 17:34:26,324][08518] Updated weights for policy 0, policy_version 3440 (0.0005) +[2024-08-03 17:34:30,755][08459] Fps is (10 sec: 9420.8, 60 sec: 8601.6, 300 sec: 8929.3). Total num frames: 1798144. Throughput: 0: 8675.5. Samples: 1800696. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:34:30,755][08459] Avg episode reward: [(0, '3148.569')] +[2024-08-03 17:34:30,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000003512_1798144.pth... +[2024-08-03 17:34:30,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000003000_1536000.pth +[2024-08-03 17:34:30,761][08505] Saving new best policy, reward=3148.569! +[2024-08-03 17:34:30,818][08518] Updated weights for policy 0, policy_version 3520 (0.0007) +[2024-08-03 17:34:35,151][08518] Updated weights for policy 0, policy_version 3600 (0.0007) +[2024-08-03 17:34:35,755][08459] Fps is (10 sec: 9420.8, 60 sec: 8669.9, 300 sec: 8951.3). Total num frames: 1847296. Throughput: 0: 8747.0. Samples: 1830960. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:34:35,755][08459] Avg episode reward: [(0, '3458.177')] +[2024-08-03 17:34:35,756][08505] Saving new best policy, reward=3458.177! +[2024-08-03 17:34:39,631][08518] Updated weights for policy 0, policy_version 3680 (0.0007) +[2024-08-03 17:34:40,755][08459] Fps is (10 sec: 9420.8, 60 sec: 8738.1, 300 sec: 8952.7). Total num frames: 1892352. Throughput: 0: 8811.2. Samples: 1884264. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) +[2024-08-03 17:34:40,755][08459] Avg episode reward: [(0, '3351.798')] +[2024-08-03 17:34:44,168][08518] Updated weights for policy 0, policy_version 3760 (0.0008) +[2024-08-03 17:34:45,755][08459] Fps is (10 sec: 9011.2, 60 sec: 8806.4, 300 sec: 8954.0). Total num frames: 1937408. Throughput: 0: 8866.1. Samples: 1938508. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:34:45,763][08459] Avg episode reward: [(0, '3369.593')] +[2024-08-03 17:34:45,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000003784_1937408.pth... +[2024-08-03 17:34:45,769][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000003240_1658880.pth +[2024-08-03 17:34:48,501][08518] Updated weights for policy 0, policy_version 3840 (0.0006) +[2024-08-03 17:34:50,755][08459] Fps is (10 sec: 9420.8, 60 sec: 8874.7, 300 sec: 8974.0). Total num frames: 1986560. Throughput: 0: 8953.2. Samples: 1967664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:34:50,755][08459] Avg episode reward: [(0, '3682.897')] +[2024-08-03 17:34:50,756][08505] Saving new best policy, reward=3682.897! +[2024-08-03 17:34:52,911][08518] Updated weights for policy 0, policy_version 3920 (0.0007) +[2024-08-03 17:34:55,755][08459] Fps is (10 sec: 9420.8, 60 sec: 8942.9, 300 sec: 8974.8). Total num frames: 2031616. Throughput: 0: 9087.8. Samples: 2023148. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) +[2024-08-03 17:34:55,755][08459] Avg episode reward: [(0, '3910.246')] +[2024-08-03 17:34:55,756][08505] Saving new best policy, reward=3910.246! +[2024-08-03 17:34:57,326][08518] Updated weights for policy 0, policy_version 4000 (0.0007) +[2024-08-03 17:35:00,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9079.5, 300 sec: 8993.4). Total num frames: 2080768. Throughput: 0: 9305.9. Samples: 2080744. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:35:00,763][08459] Avg episode reward: [(0, '4042.405')] +[2024-08-03 17:35:00,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000004064_2080768.pth... +[2024-08-03 17:35:00,769][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000003512_1798144.pth +[2024-08-03 17:35:00,769][08505] Saving new best policy, reward=4042.405! +[2024-08-03 17:35:01,462][08518] Updated weights for policy 0, policy_version 4080 (0.0006) +[2024-08-03 17:35:05,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9079.5, 300 sec: 8993.8). Total num frames: 2125824. Throughput: 0: 9360.7. Samples: 2109440. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:35:05,755][08459] Avg episode reward: [(0, '3921.291')] +[2024-08-03 17:35:05,856][08518] Updated weights for policy 0, policy_version 4160 (0.0007) +[2024-08-03 17:35:10,174][08518] Updated weights for policy 0, policy_version 4240 (0.0006) +[2024-08-03 17:35:10,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9216.0, 300 sec: 9011.2). Total num frames: 2174976. Throughput: 0: 9349.9. Samples: 2165752. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:35:10,755][08459] Avg episode reward: [(0, '3756.514')] +[2024-08-03 17:35:14,452][08518] Updated weights for policy 0, policy_version 4320 (0.0006) +[2024-08-03 17:35:15,755][08459] Fps is (10 sec: 9830.3, 60 sec: 9420.8, 300 sec: 9027.9). Total num frames: 2224128. Throughput: 0: 9400.3. Samples: 2223708. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:35:15,755][08459] Avg episode reward: [(0, '3827.902')] +[2024-08-03 17:35:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000004344_2224128.pth... +[2024-08-03 17:35:15,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000003784_1937408.pth +[2024-08-03 17:35:18,778][08518] Updated weights for policy 0, policy_version 4400 (0.0008) +[2024-08-03 17:35:20,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9027.6). Total num frames: 2269184. Throughput: 0: 9364.6. Samples: 2252368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:35:20,755][08459] Avg episode reward: [(0, '4079.339')] +[2024-08-03 17:35:20,756][08505] Saving new best policy, reward=4079.339! +[2024-08-03 17:35:23,063][08518] Updated weights for policy 0, policy_version 4480 (0.0007) +[2024-08-03 17:35:25,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9043.3). Total num frames: 2318336. Throughput: 0: 9477.7. Samples: 2310760. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:35:25,755][08459] Avg episode reward: [(0, '4113.011')] +[2024-08-03 17:35:25,756][08505] Saving new best policy, reward=4113.011! +[2024-08-03 17:35:27,156][08518] Updated weights for policy 0, policy_version 4560 (0.0006) +[2024-08-03 17:35:30,755][08459] Fps is (10 sec: 9830.2, 60 sec: 9489.0, 300 sec: 9058.5). Total num frames: 2367488. Throughput: 0: 9534.3. Samples: 2367552. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) +[2024-08-03 17:35:30,756][08459] Avg episode reward: [(0, '4255.795')] +[2024-08-03 17:35:30,761][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000004624_2367488.pth... +[2024-08-03 17:35:30,766][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000004064_2080768.pth +[2024-08-03 17:35:30,767][08505] Saving new best policy, reward=4255.795! +[2024-08-03 17:35:31,633][08518] Updated weights for policy 0, policy_version 4640 (0.0009) +[2024-08-03 17:35:35,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9057.6). Total num frames: 2412544. Throughput: 0: 9519.8. Samples: 2396056. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) +[2024-08-03 17:35:35,755][08459] Avg episode reward: [(0, '4365.433')] +[2024-08-03 17:35:35,770][08505] Saving new best policy, reward=4365.433! +[2024-08-03 17:35:35,773][08518] Updated weights for policy 0, policy_version 4720 (0.0005) +[2024-08-03 17:35:40,451][08518] Updated weights for policy 0, policy_version 4800 (0.0008) +[2024-08-03 17:35:40,755][08459] Fps is (10 sec: 9011.4, 60 sec: 9420.8, 300 sec: 9056.7). Total num frames: 2457600. Throughput: 0: 9511.3. Samples: 2451156. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:35:40,755][08459] Avg episode reward: [(0, '4524.168')] +[2024-08-03 17:35:40,756][08505] Saving new best policy, reward=4524.168! +[2024-08-03 17:35:44,855][08518] Updated weights for policy 0, policy_version 4880 (0.0006) +[2024-08-03 17:35:45,756][08459] Fps is (10 sec: 9010.7, 60 sec: 9420.7, 300 sec: 9055.9). Total num frames: 2502656. Throughput: 0: 9447.5. Samples: 2505888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:35:45,756][08459] Avg episode reward: [(0, '4738.356')] +[2024-08-03 17:35:45,767][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000004888_2502656.pth... +[2024-08-03 17:35:45,781][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000004344_2224128.pth +[2024-08-03 17:35:45,782][08505] Saving new best policy, reward=4738.356! +[2024-08-03 17:35:49,000][08518] Updated weights for policy 0, policy_version 4960 (0.0006) +[2024-08-03 17:35:50,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9069.7). Total num frames: 2551808. Throughput: 0: 9482.5. Samples: 2536152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:35:50,755][08459] Avg episode reward: [(0, '4679.548')] +[2024-08-03 17:35:53,442][08518] Updated weights for policy 0, policy_version 5040 (0.0007) +[2024-08-03 17:35:55,755][08459] Fps is (10 sec: 9831.0, 60 sec: 9489.1, 300 sec: 9083.1). Total num frames: 2600960. Throughput: 0: 9483.8. Samples: 2592524. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:35:55,755][08459] Avg episode reward: [(0, '4611.077')] +[2024-08-03 17:35:57,617][08518] Updated weights for policy 0, policy_version 5120 (0.0006) +[2024-08-03 17:36:00,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9081.8). Total num frames: 2646016. Throughput: 0: 9460.9. Samples: 2649448. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:36:00,755][08459] Avg episode reward: [(0, '4642.144')] +[2024-08-03 17:36:00,773][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000005176_2650112.pth... +[2024-08-03 17:36:00,777][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000004624_2367488.pth +[2024-08-03 17:36:02,021][08518] Updated weights for policy 0, policy_version 5200 (0.0007) +[2024-08-03 17:36:05,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9489.0, 300 sec: 9094.5). Total num frames: 2695168. Throughput: 0: 9414.5. Samples: 2676020. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:36:05,756][08459] Avg episode reward: [(0, '4550.089')] +[2024-08-03 17:36:06,490][08518] Updated weights for policy 0, policy_version 5280 (0.0008) +[2024-08-03 17:36:10,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9094.5). Total num frames: 2740224. Throughput: 0: 9405.9. Samples: 2734024. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:36:10,755][08459] Avg episode reward: [(0, '4475.385')] +[2024-08-03 17:36:10,783][08518] Updated weights for policy 0, policy_version 5360 (0.0008) +[2024-08-03 17:36:14,923][08518] Updated weights for policy 0, policy_version 5440 (0.0006) +[2024-08-03 17:36:15,755][08459] Fps is (10 sec: 9830.5, 60 sec: 9489.1, 300 sec: 9108.4). Total num frames: 2793472. Throughput: 0: 9463.9. Samples: 2793428. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:36:15,755][08459] Avg episode reward: [(0, '4611.691')] +[2024-08-03 17:36:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000005456_2793472.pth... +[2024-08-03 17:36:15,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000004888_2502656.pth +[2024-08-03 17:36:19,310][08518] Updated weights for policy 0, policy_version 5520 (0.0006) +[2024-08-03 17:36:20,755][08459] Fps is (10 sec: 9830.1, 60 sec: 9489.0, 300 sec: 9108.4). Total num frames: 2838528. Throughput: 0: 9454.2. Samples: 2821500. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) +[2024-08-03 17:36:20,756][08459] Avg episode reward: [(0, '4574.738')] +[2024-08-03 17:36:23,220][08518] Updated weights for policy 0, policy_version 5600 (0.0006) +[2024-08-03 17:36:25,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9108.4). Total num frames: 2887680. Throughput: 0: 9536.3. Samples: 2880288. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) +[2024-08-03 17:36:25,755][08459] Avg episode reward: [(0, '4540.041')] +[2024-08-03 17:36:27,654][08518] Updated weights for policy 0, policy_version 5680 (0.0008) +[2024-08-03 17:36:30,755][08459] Fps is (10 sec: 9421.1, 60 sec: 9420.8, 300 sec: 9094.5). Total num frames: 2932736. Throughput: 0: 9519.8. Samples: 2934272. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:36:30,763][08459] Avg episode reward: [(0, '4734.400')] +[2024-08-03 17:36:30,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000005728_2932736.pth... +[2024-08-03 17:36:30,769][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000005176_2650112.pth +[2024-08-03 17:36:32,374][08518] Updated weights for policy 0, policy_version 5760 (0.0008) +[2024-08-03 17:36:35,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9094.5). Total num frames: 2981888. Throughput: 0: 9477.0. Samples: 2962616. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:36:35,755][08459] Avg episode reward: [(0, '4706.507')] +[2024-08-03 17:36:36,558][08518] Updated weights for policy 0, policy_version 5840 (0.0006) +[2024-08-03 17:36:40,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9108.4). Total num frames: 3026944. Throughput: 0: 9446.4. Samples: 3017612. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:36:40,755][08459] Avg episode reward: [(0, '4738.291')] +[2024-08-03 17:36:41,139][08518] Updated weights for policy 0, policy_version 5920 (0.0009) +[2024-08-03 17:36:45,383][08518] Updated weights for policy 0, policy_version 6000 (0.0006) +[2024-08-03 17:36:45,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9557.4, 300 sec: 9122.3). Total num frames: 3076096. Throughput: 0: 9460.3. Samples: 3075160. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:36:45,755][08459] Avg episode reward: [(0, '4607.609')] +[2024-08-03 17:36:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000006008_3076096.pth... +[2024-08-03 17:36:45,760][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000005456_2793472.pth +[2024-08-03 17:36:49,811][08518] Updated weights for policy 0, policy_version 6080 (0.0007) +[2024-08-03 17:36:50,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9489.1, 300 sec: 9122.3). Total num frames: 3121152. Throughput: 0: 9492.9. Samples: 3103200. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:36:50,755][08459] Avg episode reward: [(0, '4526.904')] +[2024-08-03 17:36:54,238][08518] Updated weights for policy 0, policy_version 6160 (0.0007) +[2024-08-03 17:36:55,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9420.8, 300 sec: 9122.3). Total num frames: 3166208. Throughput: 0: 9423.5. Samples: 3158080. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) +[2024-08-03 17:36:55,755][08459] Avg episode reward: [(0, '4771.429')] +[2024-08-03 17:36:55,756][08505] Saving new best policy, reward=4771.429! +[2024-08-03 17:36:58,343][08518] Updated weights for policy 0, policy_version 6240 (0.0006) +[2024-08-03 17:37:00,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9122.3). Total num frames: 3215360. Throughput: 0: 9375.9. Samples: 3215344. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:37:00,755][08459] Avg episode reward: [(0, '4851.501')] +[2024-08-03 17:37:00,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000006280_3215360.pth... +[2024-08-03 17:37:00,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000005728_2932736.pth +[2024-08-03 17:37:00,761][08505] Saving new best policy, reward=4851.501! +[2024-08-03 17:37:02,793][08518] Updated weights for policy 0, policy_version 6320 (0.0006) +[2024-08-03 17:37:05,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9122.3). Total num frames: 3260416. Throughput: 0: 9389.7. Samples: 3244032. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:37:05,755][08459] Avg episode reward: [(0, '4916.884')] +[2024-08-03 17:37:05,756][08505] Saving new best policy, reward=4916.884! +[2024-08-03 17:37:07,275][08518] Updated weights for policy 0, policy_version 6400 (0.0008) +[2024-08-03 17:37:10,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9136.2). Total num frames: 3309568. Throughput: 0: 9329.3. Samples: 3300108. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:37:10,755][08459] Avg episode reward: [(0, '4993.257')] +[2024-08-03 17:37:10,756][08505] Saving new best policy, reward=4993.257! +[2024-08-03 17:37:11,600][08518] Updated weights for policy 0, policy_version 6480 (0.0006) +[2024-08-03 17:37:15,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9136.2). Total num frames: 3354624. Throughput: 0: 9362.6. Samples: 3355588. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:37:15,755][08459] Avg episode reward: [(0, '5036.599')] +[2024-08-03 17:37:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000006552_3354624.pth... +[2024-08-03 17:37:15,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000006008_3076096.pth +[2024-08-03 17:37:15,761][08505] Saving new best policy, reward=5036.599! +[2024-08-03 17:37:15,949][08518] Updated weights for policy 0, policy_version 6560 (0.0007) +[2024-08-03 17:37:20,442][08518] Updated weights for policy 0, policy_version 6640 (0.0006) +[2024-08-03 17:37:20,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9352.6, 300 sec: 9136.2). Total num frames: 3399680. Throughput: 0: 9368.6. Samples: 3384204. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:37:20,755][08459] Avg episode reward: [(0, '5030.831')] +[2024-08-03 17:37:24,730][08518] Updated weights for policy 0, policy_version 6720 (0.0006) +[2024-08-03 17:37:25,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9150.0). Total num frames: 3448832. Throughput: 0: 9394.7. Samples: 3440372. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:37:25,755][08459] Avg episode reward: [(0, '4939.961')] +[2024-08-03 17:37:28,903][08518] Updated weights for policy 0, policy_version 6800 (0.0006) +[2024-08-03 17:37:30,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9150.0). Total num frames: 3497984. Throughput: 0: 9397.5. Samples: 3498048. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:37:30,755][08459] Avg episode reward: [(0, '4928.001')] +[2024-08-03 17:37:30,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000006832_3497984.pth... +[2024-08-03 17:37:30,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000006280_3215360.pth +[2024-08-03 17:37:33,151][08518] Updated weights for policy 0, policy_version 6880 (0.0007) +[2024-08-03 17:37:35,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9163.9). Total num frames: 3543040. Throughput: 0: 9410.1. Samples: 3526656. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:37:35,755][08459] Avg episode reward: [(0, '4864.127')] +[2024-08-03 17:37:37,342][08518] Updated weights for policy 0, policy_version 6960 (0.0006) +[2024-08-03 17:37:40,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9489.1, 300 sec: 9191.7). Total num frames: 3596288. Throughput: 0: 9521.7. Samples: 3586556. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) +[2024-08-03 17:37:40,763][08459] Avg episode reward: [(0, '4900.269')] +[2024-08-03 17:37:41,494][08518] Updated weights for policy 0, policy_version 7040 (0.0006) +[2024-08-03 17:37:45,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9191.7). Total num frames: 3641344. Throughput: 0: 9476.5. Samples: 3641788. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:37:45,755][08459] Avg episode reward: [(0, '4941.618')] +[2024-08-03 17:37:45,761][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000007112_3641344.pth... +[2024-08-03 17:37:45,764][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000006552_3354624.pth +[2024-08-03 17:37:46,017][08518] Updated weights for policy 0, policy_version 7120 (0.0006) +[2024-08-03 17:37:50,326][08518] Updated weights for policy 0, policy_version 7200 (0.0006) +[2024-08-03 17:37:50,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9191.7). Total num frames: 3690496. Throughput: 0: 9472.4. Samples: 3670292. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:37:50,755][08459] Avg episode reward: [(0, '5020.625')] +[2024-08-03 17:37:54,617][08518] Updated weights for policy 0, policy_version 7280 (0.0007) +[2024-08-03 17:37:55,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9191.7). Total num frames: 3735552. Throughput: 0: 9496.7. Samples: 3727460. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:37:55,755][08459] Avg episode reward: [(0, '5048.869')] +[2024-08-03 17:37:55,756][08505] Saving new best policy, reward=5048.869! +[2024-08-03 17:37:58,898][08518] Updated weights for policy 0, policy_version 7360 (0.0005) +[2024-08-03 17:38:00,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9219.5). Total num frames: 3784704. Throughput: 0: 9562.8. Samples: 3785916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:00,755][08459] Avg episode reward: [(0, '5135.884')] +[2024-08-03 17:38:00,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000007392_3784704.pth... +[2024-08-03 17:38:00,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000006832_3497984.pth +[2024-08-03 17:38:00,761][08505] Saving new best policy, reward=5135.884! +[2024-08-03 17:38:03,232][08518] Updated weights for policy 0, policy_version 7440 (0.0006) +[2024-08-03 17:38:05,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9219.5). Total num frames: 3829760. Throughput: 0: 9539.4. Samples: 3813476. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:38:05,755][08459] Avg episode reward: [(0, '5077.737')] +[2024-08-03 17:38:07,789][08518] Updated weights for policy 0, policy_version 7520 (0.0008) +[2024-08-03 17:38:10,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9489.1, 300 sec: 9233.4). Total num frames: 3878912. Throughput: 0: 9558.5. Samples: 3870504. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:10,763][08459] Avg episode reward: [(0, '5323.566')] +[2024-08-03 17:38:10,763][08505] Saving new best policy, reward=5323.566! +[2024-08-03 17:38:12,000][08518] Updated weights for policy 0, policy_version 7600 (0.0006) +[2024-08-03 17:38:15,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9219.5). Total num frames: 3923968. Throughput: 0: 9470.3. Samples: 3924212. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:15,763][08459] Avg episode reward: [(0, '5299.462')] +[2024-08-03 17:38:15,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000007664_3923968.pth... +[2024-08-03 17:38:15,769][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000007112_3641344.pth +[2024-08-03 17:38:16,610][08518] Updated weights for policy 0, policy_version 7680 (0.0008) +[2024-08-03 17:38:20,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9489.1, 300 sec: 9233.4). Total num frames: 3969024. Throughput: 0: 9440.4. Samples: 3951472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:20,755][08459] Avg episode reward: [(0, '5208.200')] +[2024-08-03 17:38:21,108][08518] Updated weights for policy 0, policy_version 7760 (0.0007) +[2024-08-03 17:38:25,531][08518] Updated weights for policy 0, policy_version 7840 (0.0007) +[2024-08-03 17:38:25,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9420.8, 300 sec: 9261.1). Total num frames: 4014080. Throughput: 0: 9318.5. Samples: 4005888. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:25,755][08459] Avg episode reward: [(0, '5175.625')] +[2024-08-03 17:38:30,146][08518] Updated weights for policy 0, policy_version 7920 (0.0008) +[2024-08-03 17:38:30,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9352.5, 300 sec: 9261.1). Total num frames: 4059136. Throughput: 0: 9283.2. Samples: 4059532. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:30,755][08459] Avg episode reward: [(0, '5119.972')] +[2024-08-03 17:38:30,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000007928_4059136.pth... +[2024-08-03 17:38:30,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000007392_3784704.pth +[2024-08-03 17:38:34,498][08518] Updated weights for policy 0, policy_version 8000 (0.0007) +[2024-08-03 17:38:35,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9288.9). Total num frames: 4108288. Throughput: 0: 9296.4. Samples: 4088632. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:35,755][08459] Avg episode reward: [(0, '5176.945')] +[2024-08-03 17:38:38,658][08518] Updated weights for policy 0, policy_version 8080 (0.0006) +[2024-08-03 17:38:40,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9284.3, 300 sec: 9302.8). Total num frames: 4153344. Throughput: 0: 9283.6. Samples: 4145220. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:40,755][08459] Avg episode reward: [(0, '5075.560')] +[2024-08-03 17:38:43,436][08518] Updated weights for policy 0, policy_version 8160 (0.0008) +[2024-08-03 17:38:45,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9284.3, 300 sec: 9302.8). Total num frames: 4198400. Throughput: 0: 9209.4. Samples: 4200340. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:45,755][08459] Avg episode reward: [(0, '4976.779')] +[2024-08-03 17:38:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000008200_4198400.pth... +[2024-08-03 17:38:45,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000007664_3923968.pth +[2024-08-03 17:38:47,761][08518] Updated weights for policy 0, policy_version 8240 (0.0006) +[2024-08-03 17:38:50,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9216.0, 300 sec: 9316.7). Total num frames: 4243456. Throughput: 0: 9189.8. Samples: 4227016. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:38:50,763][08459] Avg episode reward: [(0, '5162.080')] +[2024-08-03 17:38:52,226][08518] Updated weights for policy 0, policy_version 8320 (0.0008) +[2024-08-03 17:38:55,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9216.0, 300 sec: 9330.6). Total num frames: 4288512. Throughput: 0: 9149.4. Samples: 4282228. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) +[2024-08-03 17:38:55,755][08459] Avg episode reward: [(0, '5441.010')] +[2024-08-03 17:38:55,756][08505] Saving new best policy, reward=5441.010! +[2024-08-03 17:38:56,692][08518] Updated weights for policy 0, policy_version 8400 (0.0007) +[2024-08-03 17:39:00,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9216.0, 300 sec: 9344.4). Total num frames: 4337664. Throughput: 0: 9187.1. Samples: 4337632. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) +[2024-08-03 17:39:00,763][08459] Avg episode reward: [(0, '5456.078')] +[2024-08-03 17:39:00,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000008472_4337664.pth... +[2024-08-03 17:39:00,768][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000007928_4059136.pth +[2024-08-03 17:39:00,769][08505] Saving new best policy, reward=5456.078! +[2024-08-03 17:39:01,100][08518] Updated weights for policy 0, policy_version 8480 (0.0007) +[2024-08-03 17:39:05,374][08518] Updated weights for policy 0, policy_version 8560 (0.0006) +[2024-08-03 17:39:05,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9216.0, 300 sec: 9358.3). Total num frames: 4382720. Throughput: 0: 9223.1. Samples: 4366512. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:39:05,755][08459] Avg episode reward: [(0, '5151.847')] +[2024-08-03 17:39:09,585][08518] Updated weights for policy 0, policy_version 8640 (0.0006) +[2024-08-03 17:39:10,755][08459] Fps is (10 sec: 9420.4, 60 sec: 9215.9, 300 sec: 9400.0). Total num frames: 4431872. Throughput: 0: 9296.3. Samples: 4424224. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:39:10,756][08459] Avg episode reward: [(0, '4997.781')] +[2024-08-03 17:39:13,895][08518] Updated weights for policy 0, policy_version 8720 (0.0007) +[2024-08-03 17:39:15,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9284.3, 300 sec: 9413.9). Total num frames: 4481024. Throughput: 0: 9366.5. Samples: 4481024. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:39:15,755][08459] Avg episode reward: [(0, '4885.763')] +[2024-08-03 17:39:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000008752_4481024.pth... +[2024-08-03 17:39:15,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000008200_4198400.pth +[2024-08-03 17:39:18,199][08518] Updated weights for policy 0, policy_version 8800 (0.0006) +[2024-08-03 17:39:20,755][08459] Fps is (10 sec: 9421.2, 60 sec: 9284.3, 300 sec: 9400.0). Total num frames: 4526080. Throughput: 0: 9357.0. Samples: 4509696. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:39:20,755][08459] Avg episode reward: [(0, '5298.800')] +[2024-08-03 17:39:22,493][08518] Updated weights for policy 0, policy_version 8880 (0.0006) +[2024-08-03 17:39:25,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9413.9). Total num frames: 4575232. Throughput: 0: 9374.9. Samples: 4567088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:39:25,755][08459] Avg episode reward: [(0, '5373.519')] +[2024-08-03 17:39:26,955][08518] Updated weights for policy 0, policy_version 8960 (0.0006) +[2024-08-03 17:39:30,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9400.0). Total num frames: 4620288. Throughput: 0: 9387.8. Samples: 4622792. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:39:30,763][08459] Avg episode reward: [(0, '5067.040')] +[2024-08-03 17:39:30,798][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000009032_4624384.pth... +[2024-08-03 17:39:30,802][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000008472_4337664.pth +[2024-08-03 17:39:31,230][08518] Updated weights for policy 0, policy_version 9040 (0.0006) +[2024-08-03 17:39:35,361][08518] Updated weights for policy 0, policy_version 9120 (0.0006) +[2024-08-03 17:39:35,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9427.7). Total num frames: 4673536. Throughput: 0: 9467.6. Samples: 4653056. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:39:35,755][08459] Avg episode reward: [(0, '5293.336')] +[2024-08-03 17:39:39,522][08518] Updated weights for policy 0, policy_version 9200 (0.0006) +[2024-08-03 17:39:40,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9427.7). Total num frames: 4718592. Throughput: 0: 9540.8. Samples: 4711564. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:39:40,755][08459] Avg episode reward: [(0, '5262.606')] +[2024-08-03 17:39:43,746][08518] Updated weights for policy 0, policy_version 9280 (0.0007) +[2024-08-03 17:39:45,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 4767744. Throughput: 0: 9579.0. Samples: 4768688. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:39:45,755][08459] Avg episode reward: [(0, '5129.260')] +[2024-08-03 17:39:45,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000009312_4767744.pth... +[2024-08-03 17:39:45,770][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000008752_4481024.pth +[2024-08-03 17:39:48,247][08518] Updated weights for policy 0, policy_version 9360 (0.0007) +[2024-08-03 17:39:50,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 4812800. Throughput: 0: 9555.7. Samples: 4796516. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:39:50,755][08459] Avg episode reward: [(0, '5239.751')] +[2024-08-03 17:39:52,496][08518] Updated weights for policy 0, policy_version 9440 (0.0007) +[2024-08-03 17:39:55,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9557.3, 300 sec: 9427.7). Total num frames: 4861952. Throughput: 0: 9537.2. Samples: 4853396. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:39:55,755][08459] Avg episode reward: [(0, '5216.844')] +[2024-08-03 17:39:57,125][08518] Updated weights for policy 0, policy_version 9520 (0.0008) +[2024-08-03 17:40:00,756][08459] Fps is (10 sec: 9010.7, 60 sec: 9420.7, 300 sec: 9413.8). Total num frames: 4902912. Throughput: 0: 9439.0. Samples: 4905784. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:40:00,764][08459] Avg episode reward: [(0, '4948.641')] +[2024-08-03 17:40:00,904][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000009584_4907008.pth... +[2024-08-03 17:40:00,914][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000009032_4624384.pth +[2024-08-03 17:40:01,784][08518] Updated weights for policy 0, policy_version 9600 (0.0009) +[2024-08-03 17:40:05,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9489.1, 300 sec: 9413.9). Total num frames: 4952064. Throughput: 0: 9438.9. Samples: 4934448. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:40:05,755][08459] Avg episode reward: [(0, '5078.951')] +[2024-08-03 17:40:06,099][08518] Updated weights for policy 0, policy_version 9680 (0.0006) +[2024-08-03 17:40:10,442][08518] Updated weights for policy 0, policy_version 9760 (0.0007) +[2024-08-03 17:40:10,755][08459] Fps is (10 sec: 9421.3, 60 sec: 9420.9, 300 sec: 9400.0). Total num frames: 4997120. Throughput: 0: 9382.0. Samples: 4989276. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:40:10,755][08459] Avg episode reward: [(0, '5293.290')] +[2024-08-03 17:40:15,015][08518] Updated weights for policy 0, policy_version 9840 (0.0008) +[2024-08-03 17:40:15,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9352.5, 300 sec: 9400.0). Total num frames: 5042176. Throughput: 0: 9364.3. Samples: 5044184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:40:15,755][08459] Avg episode reward: [(0, '5024.359')] +[2024-08-03 17:40:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000009848_5042176.pth... +[2024-08-03 17:40:15,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000009312_4767744.pth +[2024-08-03 17:40:19,352][08518] Updated weights for policy 0, policy_version 9920 (0.0006) +[2024-08-03 17:40:20,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9400.0). Total num frames: 5091328. Throughput: 0: 9314.8. Samples: 5072220. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:40:20,755][08459] Avg episode reward: [(0, '5153.167')] +[2024-08-03 17:40:23,474][08518] Updated weights for policy 0, policy_version 10000 (0.0006) +[2024-08-03 17:40:25,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9400.0). Total num frames: 5140480. Throughput: 0: 9313.3. Samples: 5130664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:40:25,755][08459] Avg episode reward: [(0, '5227.409')] +[2024-08-03 17:40:28,014][08518] Updated weights for policy 0, policy_version 10080 (0.0007) +[2024-08-03 17:40:30,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9400.0). Total num frames: 5185536. Throughput: 0: 9269.7. Samples: 5185824. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) +[2024-08-03 17:40:30,763][08459] Avg episode reward: [(0, '5235.035')] +[2024-08-03 17:40:30,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000010128_5185536.pth... +[2024-08-03 17:40:30,769][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000009584_4907008.pth +[2024-08-03 17:40:32,200][08518] Updated weights for policy 0, policy_version 10160 (0.0005) +[2024-08-03 17:40:35,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9284.3, 300 sec: 9400.0). Total num frames: 5230592. Throughput: 0: 9306.0. Samples: 5215288. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) +[2024-08-03 17:40:35,755][08459] Avg episode reward: [(0, '5497.485')] +[2024-08-03 17:40:35,759][08505] Saving new best policy, reward=5497.485! +[2024-08-03 17:40:36,653][08518] Updated weights for policy 0, policy_version 10240 (0.0006) +[2024-08-03 17:40:40,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9413.9). Total num frames: 5279744. Throughput: 0: 9322.2. Samples: 5272896. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:40:40,763][08459] Avg episode reward: [(0, '5450.189')] +[2024-08-03 17:40:40,994][08518] Updated weights for policy 0, policy_version 10320 (0.0006) +[2024-08-03 17:40:45,403][08518] Updated weights for policy 0, policy_version 10400 (0.0007) +[2024-08-03 17:40:45,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9284.3, 300 sec: 9400.0). Total num frames: 5324800. Throughput: 0: 9369.5. Samples: 5327408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:40:45,755][08459] Avg episode reward: [(0, '5399.144')] +[2024-08-03 17:40:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000010400_5324800.pth... +[2024-08-03 17:40:45,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000009848_5042176.pth +[2024-08-03 17:40:49,772][08518] Updated weights for policy 0, policy_version 10480 (0.0007) +[2024-08-03 17:40:50,756][08459] Fps is (10 sec: 9010.8, 60 sec: 9284.2, 300 sec: 9386.1). Total num frames: 5369856. Throughput: 0: 9357.5. Samples: 5355540. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:40:50,756][08459] Avg episode reward: [(0, '5316.514')] +[2024-08-03 17:40:54,144][08518] Updated weights for policy 0, policy_version 10560 (0.0008) +[2024-08-03 17:40:55,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9284.3, 300 sec: 9400.0). Total num frames: 5419008. Throughput: 0: 9367.6. Samples: 5410816. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) +[2024-08-03 17:40:55,755][08459] Avg episode reward: [(0, '5254.750')] +[2024-08-03 17:40:58,443][08518] Updated weights for policy 0, policy_version 10640 (0.0007) +[2024-08-03 17:41:00,755][08459] Fps is (10 sec: 9830.8, 60 sec: 9420.9, 300 sec: 9400.0). Total num frames: 5468160. Throughput: 0: 9450.8. Samples: 5469468. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) +[2024-08-03 17:41:00,755][08459] Avg episode reward: [(0, '5287.509')] +[2024-08-03 17:41:00,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000010680_5468160.pth... +[2024-08-03 17:41:00,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000010128_5185536.pth +[2024-08-03 17:41:02,557][08518] Updated weights for policy 0, policy_version 10720 (0.0005) +[2024-08-03 17:41:05,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9413.9). Total num frames: 5517312. Throughput: 0: 9500.0. Samples: 5499720. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:05,755][08459] Avg episode reward: [(0, '5429.694')] +[2024-08-03 17:41:06,679][08518] Updated weights for policy 0, policy_version 10800 (0.0005) +[2024-08-03 17:41:10,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9489.1, 300 sec: 9400.0). Total num frames: 5566464. Throughput: 0: 9501.6. Samples: 5558236. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:10,763][08459] Avg episode reward: [(0, '5689.542')] +[2024-08-03 17:41:10,763][08505] Saving new best policy, reward=5689.542! +[2024-08-03 17:41:11,160][08518] Updated weights for policy 0, policy_version 10880 (0.0007) +[2024-08-03 17:41:15,584][08518] Updated weights for policy 0, policy_version 10960 (0.0007) +[2024-08-03 17:41:15,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9400.0). Total num frames: 5611520. Throughput: 0: 9471.5. Samples: 5612040. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:15,755][08459] Avg episode reward: [(0, '5521.669')] +[2024-08-03 17:41:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000010960_5611520.pth... +[2024-08-03 17:41:15,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000010400_5324800.pth +[2024-08-03 17:41:19,624][08518] Updated weights for policy 0, policy_version 11040 (0.0005) +[2024-08-03 17:41:20,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9400.0). Total num frames: 5660672. Throughput: 0: 9522.4. Samples: 5643796. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:20,755][08459] Avg episode reward: [(0, '5295.240')] +[2024-08-03 17:41:24,076][08518] Updated weights for policy 0, policy_version 11120 (0.0008) +[2024-08-03 17:41:25,756][08459] Fps is (10 sec: 9420.4, 60 sec: 9420.7, 300 sec: 9400.0). Total num frames: 5705728. Throughput: 0: 9490.7. Samples: 5699980. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:25,756][08459] Avg episode reward: [(0, '5385.609')] +[2024-08-03 17:41:28,223][08518] Updated weights for policy 0, policy_version 11200 (0.0007) +[2024-08-03 17:41:30,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9400.0). Total num frames: 5754880. Throughput: 0: 9570.0. Samples: 5758056. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:30,755][08459] Avg episode reward: [(0, '5361.107')] +[2024-08-03 17:41:30,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000011240_5754880.pth... +[2024-08-03 17:41:30,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000010680_5468160.pth +[2024-08-03 17:41:32,394][08518] Updated weights for policy 0, policy_version 11280 (0.0006) +[2024-08-03 17:41:35,755][08459] Fps is (10 sec: 9830.8, 60 sec: 9557.3, 300 sec: 9413.9). Total num frames: 5804032. Throughput: 0: 9604.9. Samples: 5787756. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:35,755][08459] Avg episode reward: [(0, '5345.420')] +[2024-08-03 17:41:36,711][08518] Updated weights for policy 0, policy_version 11360 (0.0006) +[2024-08-03 17:41:40,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9557.3, 300 sec: 9413.9). Total num frames: 5853184. Throughput: 0: 9652.6. Samples: 5845184. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:40,755][08459] Avg episode reward: [(0, '5473.606')] +[2024-08-03 17:41:40,966][08518] Updated weights for policy 0, policy_version 11440 (0.0006) +[2024-08-03 17:41:45,273][08518] Updated weights for policy 0, policy_version 11520 (0.0007) +[2024-08-03 17:41:45,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9625.6, 300 sec: 9427.7). Total num frames: 5902336. Throughput: 0: 9620.7. Samples: 5902400. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:45,755][08459] Avg episode reward: [(0, '5587.386')] +[2024-08-03 17:41:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000011528_5902336.pth... +[2024-08-03 17:41:45,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000010960_5611520.pth +[2024-08-03 17:41:49,548][08518] Updated weights for policy 0, policy_version 11600 (0.0006) +[2024-08-03 17:41:50,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9625.7, 300 sec: 9427.7). Total num frames: 5947392. Throughput: 0: 9583.7. Samples: 5930988. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:50,755][08459] Avg episode reward: [(0, '5763.244')] +[2024-08-03 17:41:50,766][08505] Saving new best policy, reward=5763.244! +[2024-08-03 17:41:53,582][08518] Updated weights for policy 0, policy_version 11680 (0.0006) +[2024-08-03 17:41:55,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9625.6, 300 sec: 9427.7). Total num frames: 5996544. Throughput: 0: 9559.0. Samples: 5988392. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:41:55,755][08459] Avg episode reward: [(0, '5645.963')] +[2024-08-03 17:41:58,361][08518] Updated weights for policy 0, policy_version 11760 (0.0007) +[2024-08-03 17:42:00,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9557.3, 300 sec: 9427.7). Total num frames: 6041600. Throughput: 0: 9547.2. Samples: 6041664. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:42:00,755][08459] Avg episode reward: [(0, '5492.731')] +[2024-08-03 17:42:00,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000011800_6041600.pth... +[2024-08-03 17:42:00,762][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000011240_5754880.pth +[2024-08-03 17:42:02,786][08518] Updated weights for policy 0, policy_version 11840 (0.0008) +[2024-08-03 17:42:05,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9489.1, 300 sec: 9413.9). Total num frames: 6086656. Throughput: 0: 9489.7. Samples: 6070832. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:42:05,763][08459] Avg episode reward: [(0, '5470.868')] +[2024-08-03 17:42:07,032][08518] Updated weights for policy 0, policy_version 11920 (0.0006) +[2024-08-03 17:42:10,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 6135808. Throughput: 0: 9477.7. Samples: 6126472. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:42:10,755][08459] Avg episode reward: [(0, '5325.780')] +[2024-08-03 17:42:11,589][08518] Updated weights for policy 0, policy_version 12000 (0.0008) +[2024-08-03 17:42:15,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 6180864. Throughput: 0: 9414.0. Samples: 6181684. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) +[2024-08-03 17:42:15,755][08459] Avg episode reward: [(0, '5443.887')] +[2024-08-03 17:42:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000012072_6180864.pth... +[2024-08-03 17:42:15,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000011528_5902336.pth +[2024-08-03 17:42:15,968][08518] Updated weights for policy 0, policy_version 12080 (0.0006) +[2024-08-03 17:42:20,608][08518] Updated weights for policy 0, policy_version 12160 (0.0010) +[2024-08-03 17:42:20,755][08459] Fps is (10 sec: 9011.1, 60 sec: 9420.8, 300 sec: 9413.9). Total num frames: 6225920. Throughput: 0: 9357.5. Samples: 6208844. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) +[2024-08-03 17:42:20,756][08459] Avg episode reward: [(0, '5508.001')] +[2024-08-03 17:42:24,860][08518] Updated weights for policy 0, policy_version 12240 (0.0006) +[2024-08-03 17:42:25,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9413.9). Total num frames: 6275072. Throughput: 0: 9321.3. Samples: 6264644. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:42:25,755][08459] Avg episode reward: [(0, '5539.828')] +[2024-08-03 17:42:28,921][08518] Updated weights for policy 0, policy_version 12320 (0.0005) +[2024-08-03 17:42:30,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9420.8, 300 sec: 9413.9). Total num frames: 6320128. Throughput: 0: 9359.8. Samples: 6323592. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:42:30,755][08459] Avg episode reward: [(0, '5511.651')] +[2024-08-03 17:42:30,762][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000012352_6324224.pth... +[2024-08-03 17:42:30,765][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000011800_6041600.pth +[2024-08-03 17:42:33,703][08518] Updated weights for policy 0, policy_version 12400 (0.0009) +[2024-08-03 17:42:35,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9352.5, 300 sec: 9386.1). Total num frames: 6365184. Throughput: 0: 9284.7. Samples: 6348800. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) +[2024-08-03 17:42:35,755][08459] Avg episode reward: [(0, '5576.374')] +[2024-08-03 17:42:38,011][08518] Updated weights for policy 0, policy_version 12480 (0.0007) +[2024-08-03 17:42:40,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9400.0). Total num frames: 6414336. Throughput: 0: 9309.4. Samples: 6407316. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) +[2024-08-03 17:42:40,755][08459] Avg episode reward: [(0, '5711.458')] +[2024-08-03 17:42:42,033][08518] Updated weights for policy 0, policy_version 12560 (0.0005) +[2024-08-03 17:42:45,755][08459] Fps is (10 sec: 10239.9, 60 sec: 9420.8, 300 sec: 9413.9). Total num frames: 6467584. Throughput: 0: 9464.1. Samples: 6467548. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:42:45,755][08459] Avg episode reward: [(0, '5650.053')] +[2024-08-03 17:42:45,759][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000012632_6467584.pth... +[2024-08-03 17:42:45,762][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000012072_6180864.pth +[2024-08-03 17:42:46,078][08518] Updated weights for policy 0, policy_version 12640 (0.0005) +[2024-08-03 17:42:50,377][08518] Updated weights for policy 0, policy_version 12720 (0.0006) +[2024-08-03 17:42:50,755][08459] Fps is (10 sec: 10240.0, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 6516736. Throughput: 0: 9456.4. Samples: 6496368. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:42:50,755][08459] Avg episode reward: [(0, '5558.895')] +[2024-08-03 17:42:54,363][08518] Updated weights for policy 0, policy_version 12800 (0.0005) +[2024-08-03 17:42:55,755][08459] Fps is (10 sec: 9830.5, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 6565888. Throughput: 0: 9566.6. Samples: 6556968. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:42:55,755][08459] Avg episode reward: [(0, '5567.975')] +[2024-08-03 17:42:58,675][08518] Updated weights for policy 0, policy_version 12880 (0.0007) +[2024-08-03 17:43:00,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 6610944. Throughput: 0: 9594.3. Samples: 6613428. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:43:00,755][08459] Avg episode reward: [(0, '5364.413')] +[2024-08-03 17:43:00,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000012912_6610944.pth... +[2024-08-03 17:43:00,762][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000012352_6324224.pth +[2024-08-03 17:43:03,040][08518] Updated weights for policy 0, policy_version 12960 (0.0006) +[2024-08-03 17:43:05,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9557.3, 300 sec: 9427.7). Total num frames: 6660096. Throughput: 0: 9591.0. Samples: 6640440. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:43:05,763][08459] Avg episode reward: [(0, '5450.249')] +[2024-08-03 17:43:07,339][08518] Updated weights for policy 0, policy_version 13040 (0.0007) +[2024-08-03 17:43:10,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 6705152. Throughput: 0: 9645.4. Samples: 6698688. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:43:10,755][08459] Avg episode reward: [(0, '5652.180')] +[2024-08-03 17:43:11,752][08518] Updated weights for policy 0, policy_version 13120 (0.0007) +[2024-08-03 17:43:15,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9557.3, 300 sec: 9441.6). Total num frames: 6754304. Throughput: 0: 9595.0. Samples: 6755368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:43:15,755][08459] Avg episode reward: [(0, '5765.327')] +[2024-08-03 17:43:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000013192_6754304.pth... +[2024-08-03 17:43:15,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000012632_6467584.pth +[2024-08-03 17:43:15,761][08505] Saving new best policy, reward=5765.327! +[2024-08-03 17:43:15,969][08518] Updated weights for policy 0, policy_version 13200 (0.0006) +[2024-08-03 17:43:20,242][08518] Updated weights for policy 0, policy_version 13280 (0.0006) +[2024-08-03 17:43:20,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9625.6, 300 sec: 9455.5). Total num frames: 6803456. Throughput: 0: 9711.1. Samples: 6785800. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:43:20,755][08459] Avg episode reward: [(0, '5605.863')] +[2024-08-03 17:43:24,761][08518] Updated weights for policy 0, policy_version 13360 (0.0007) +[2024-08-03 17:43:25,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9557.3, 300 sec: 9455.5). Total num frames: 6848512. Throughput: 0: 9608.2. Samples: 6839684. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:43:25,755][08459] Avg episode reward: [(0, '5521.266')] +[2024-08-03 17:43:29,176][08518] Updated weights for policy 0, policy_version 13440 (0.0006) +[2024-08-03 17:43:30,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9625.6, 300 sec: 9455.5). Total num frames: 6897664. Throughput: 0: 9555.6. Samples: 6897552. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:43:30,755][08459] Avg episode reward: [(0, '5439.126')] +[2024-08-03 17:43:30,760][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000013472_6897664.pth... +[2024-08-03 17:43:30,764][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000012912_6610944.pth +[2024-08-03 17:43:33,274][08518] Updated weights for policy 0, policy_version 13520 (0.0006) +[2024-08-03 17:43:35,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9625.6, 300 sec: 9455.5). Total num frames: 6942720. Throughput: 0: 9554.8. Samples: 6926336. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:43:35,755][08459] Avg episode reward: [(0, '5464.055')] +[2024-08-03 17:43:37,698][08518] Updated weights for policy 0, policy_version 13600 (0.0007) +[2024-08-03 17:43:40,755][08459] Fps is (10 sec: 9011.3, 60 sec: 9557.3, 300 sec: 9455.5). Total num frames: 6987776. Throughput: 0: 9473.7. Samples: 6983284. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:43:40,763][08459] Avg episode reward: [(0, '5577.137')] +[2024-08-03 17:43:42,076][08518] Updated weights for policy 0, policy_version 13680 (0.0006) +[2024-08-03 17:43:45,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9469.4). Total num frames: 7036928. Throughput: 0: 9412.0. Samples: 7036968. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:43:45,755][08459] Avg episode reward: [(0, '5729.617')] +[2024-08-03 17:43:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000013744_7036928.pth... +[2024-08-03 17:43:45,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000013192_6754304.pth +[2024-08-03 17:43:46,534][08518] Updated weights for policy 0, policy_version 13760 (0.0006) +[2024-08-03 17:43:50,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9469.4). Total num frames: 7081984. Throughput: 0: 9443.7. Samples: 7065408. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:43:50,755][08459] Avg episode reward: [(0, '5682.824')] +[2024-08-03 17:43:50,959][08518] Updated weights for policy 0, policy_version 13840 (0.0007) +[2024-08-03 17:43:55,394][08518] Updated weights for policy 0, policy_version 13920 (0.0007) +[2024-08-03 17:43:55,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9352.5, 300 sec: 9455.5). Total num frames: 7127040. Throughput: 0: 9338.3. Samples: 7118912. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:43:55,755][08459] Avg episode reward: [(0, '5573.099')] +[2024-08-03 17:43:59,791][08518] Updated weights for policy 0, policy_version 14000 (0.0007) +[2024-08-03 17:44:00,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9420.8, 300 sec: 9469.4). Total num frames: 7176192. Throughput: 0: 9351.6. Samples: 7176192. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:44:00,755][08459] Avg episode reward: [(0, '5726.101')] +[2024-08-03 17:44:00,759][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000014016_7176192.pth... +[2024-08-03 17:44:00,762][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000013472_6897664.pth +[2024-08-03 17:44:04,230][08518] Updated weights for policy 0, policy_version 14080 (0.0007) +[2024-08-03 17:44:05,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9455.5). Total num frames: 7221248. Throughput: 0: 9310.0. Samples: 7204748. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) +[2024-08-03 17:44:05,759][08459] Avg episode reward: [(0, '5717.196')] +[2024-08-03 17:44:08,516][08518] Updated weights for policy 0, policy_version 14160 (0.0009) +[2024-08-03 17:44:10,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9420.8, 300 sec: 9455.5). Total num frames: 7270400. Throughput: 0: 9366.5. Samples: 7261176. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) +[2024-08-03 17:44:10,755][08459] Avg episode reward: [(0, '5508.257')] +[2024-08-03 17:44:13,001][08518] Updated weights for policy 0, policy_version 14240 (0.0007) +[2024-08-03 17:44:15,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9352.5, 300 sec: 9455.5). Total num frames: 7315456. Throughput: 0: 9288.1. Samples: 7315516. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:44:15,755][08459] Avg episode reward: [(0, '5521.021')] +[2024-08-03 17:44:15,759][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000014288_7315456.pth... +[2024-08-03 17:44:15,762][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000013744_7036928.pth +[2024-08-03 17:44:17,451][08518] Updated weights for policy 0, policy_version 14320 (0.0006) +[2024-08-03 17:44:20,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9284.3, 300 sec: 9441.6). Total num frames: 7360512. Throughput: 0: 9284.3. Samples: 7344128. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:44:20,763][08459] Avg episode reward: [(0, '5673.150')] +[2024-08-03 17:44:21,675][08518] Updated weights for policy 0, policy_version 14400 (0.0006) +[2024-08-03 17:44:25,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9455.5). Total num frames: 7409664. Throughput: 0: 9292.7. Samples: 7401456. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:44:25,763][08459] Avg episode reward: [(0, '5584.909')] +[2024-08-03 17:44:25,845][08518] Updated weights for policy 0, policy_version 14480 (0.0005) +[2024-08-03 17:44:30,159][08518] Updated weights for policy 0, policy_version 14560 (0.0006) +[2024-08-03 17:44:30,755][08459] Fps is (10 sec: 9830.3, 60 sec: 9352.5, 300 sec: 9441.6). Total num frames: 7458816. Throughput: 0: 9379.9. Samples: 7459064. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:44:30,755][08459] Avg episode reward: [(0, '5582.677')] +[2024-08-03 17:44:30,759][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000014568_7458816.pth... +[2024-08-03 17:44:30,762][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000014016_7176192.pth +[2024-08-03 17:44:34,713][08518] Updated weights for policy 0, policy_version 14640 (0.0007) +[2024-08-03 17:44:35,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9441.6). Total num frames: 7503872. Throughput: 0: 9326.2. Samples: 7485088. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:44:35,755][08459] Avg episode reward: [(0, '5592.247')] +[2024-08-03 17:44:39,065][08518] Updated weights for policy 0, policy_version 14720 (0.0008) +[2024-08-03 17:44:40,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9420.8, 300 sec: 9441.6). Total num frames: 7553024. Throughput: 0: 9421.3. Samples: 7542872. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:44:40,755][08459] Avg episode reward: [(0, '5594.239')] +[2024-08-03 17:44:43,353][08518] Updated weights for policy 0, policy_version 14800 (0.0008) +[2024-08-03 17:44:45,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9441.6). Total num frames: 7598080. Throughput: 0: 9378.1. Samples: 7598204. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:44:45,755][08459] Avg episode reward: [(0, '5616.485')] +[2024-08-03 17:44:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000014840_7598080.pth... +[2024-08-03 17:44:45,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000014288_7315456.pth +[2024-08-03 17:44:47,886][08518] Updated weights for policy 0, policy_version 14880 (0.0007) +[2024-08-03 17:44:50,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9352.5, 300 sec: 9427.7). Total num frames: 7643136. Throughput: 0: 9379.3. Samples: 7626816. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:44:50,763][08459] Avg episode reward: [(0, '5579.893')] +[2024-08-03 17:44:52,122][08518] Updated weights for policy 0, policy_version 14960 (0.0006) +[2024-08-03 17:44:55,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9455.5). Total num frames: 7692288. Throughput: 0: 9367.5. Samples: 7682712. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:44:55,763][08459] Avg episode reward: [(0, '5528.090')] +[2024-08-03 17:44:56,513][08518] Updated weights for policy 0, policy_version 15040 (0.0006) +[2024-08-03 17:45:00,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9441.6). Total num frames: 7737344. Throughput: 0: 9436.4. Samples: 7740152. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:00,755][08459] Avg episode reward: [(0, '5616.296')] +[2024-08-03 17:45:00,792][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000015120_7741440.pth... +[2024-08-03 17:45:00,793][08518] Updated weights for policy 0, policy_version 15120 (0.0006) +[2024-08-03 17:45:00,795][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000014568_7458816.pth +[2024-08-03 17:45:05,033][08518] Updated weights for policy 0, policy_version 15200 (0.0007) +[2024-08-03 17:45:05,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9455.5). Total num frames: 7786496. Throughput: 0: 9471.6. Samples: 7770352. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:05,755][08459] Avg episode reward: [(0, '5680.698')] +[2024-08-03 17:45:09,342][08518] Updated weights for policy 0, policy_version 15280 (0.0006) +[2024-08-03 17:45:10,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9469.4). Total num frames: 7835648. Throughput: 0: 9456.4. Samples: 7826992. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:10,755][08459] Avg episode reward: [(0, '5526.052')] +[2024-08-03 17:45:13,839][08518] Updated weights for policy 0, policy_version 15360 (0.0007) +[2024-08-03 17:45:15,756][08459] Fps is (10 sec: 9420.4, 60 sec: 9420.7, 300 sec: 9455.5). Total num frames: 7880704. Throughput: 0: 9369.7. Samples: 7880704. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:15,756][08459] Avg episode reward: [(0, '5675.220')] +[2024-08-03 17:45:15,768][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000015392_7880704.pth... +[2024-08-03 17:45:15,778][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000014840_7598080.pth +[2024-08-03 17:45:18,372][08518] Updated weights for policy 0, policy_version 15440 (0.0009) +[2024-08-03 17:45:20,755][08459] Fps is (10 sec: 8601.6, 60 sec: 9352.5, 300 sec: 9427.7). Total num frames: 7921664. Throughput: 0: 9402.0. Samples: 7908180. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:20,755][08459] Avg episode reward: [(0, '5783.552')] +[2024-08-03 17:45:20,756][08505] Saving new best policy, reward=5783.552! +[2024-08-03 17:45:23,247][08518] Updated weights for policy 0, policy_version 15520 (0.0009) +[2024-08-03 17:45:25,755][08459] Fps is (10 sec: 8602.0, 60 sec: 9284.3, 300 sec: 9427.7). Total num frames: 7966720. Throughput: 0: 9249.4. Samples: 7959096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:25,755][08459] Avg episode reward: [(0, '5688.326')] +[2024-08-03 17:45:27,549][08518] Updated weights for policy 0, policy_version 15600 (0.0006) +[2024-08-03 17:45:30,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9284.3, 300 sec: 9441.6). Total num frames: 8015872. Throughput: 0: 9349.2. Samples: 8018916. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:30,763][08459] Avg episode reward: [(0, '5738.667')] +[2024-08-03 17:45:30,784][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000015664_8019968.pth... +[2024-08-03 17:45:30,787][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000015120_7741440.pth +[2024-08-03 17:45:31,666][08518] Updated weights for policy 0, policy_version 15680 (0.0005) +[2024-08-03 17:45:35,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9352.5, 300 sec: 9441.6). Total num frames: 8065024. Throughput: 0: 9355.9. Samples: 8047832. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:35,755][08459] Avg episode reward: [(0, '5735.980')] +[2024-08-03 17:45:35,918][08518] Updated weights for policy 0, policy_version 15760 (0.0005) +[2024-08-03 17:45:40,232][08518] Updated weights for policy 0, policy_version 15840 (0.0006) +[2024-08-03 17:45:40,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9352.5, 300 sec: 9455.5). Total num frames: 8114176. Throughput: 0: 9354.8. Samples: 8103680. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:40,755][08459] Avg episode reward: [(0, '5639.529')] +[2024-08-03 17:45:44,295][08518] Updated weights for policy 0, policy_version 15920 (0.0005) +[2024-08-03 17:45:45,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9469.4). Total num frames: 8163328. Throughput: 0: 9426.1. Samples: 8164328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:45,755][08459] Avg episode reward: [(0, '5682.277')] +[2024-08-03 17:45:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000015944_8163328.pth... +[2024-08-03 17:45:45,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000015392_7880704.pth +[2024-08-03 17:45:48,301][08518] Updated weights for policy 0, policy_version 16000 (0.0006) +[2024-08-03 17:45:50,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9489.1, 300 sec: 9469.4). Total num frames: 8212480. Throughput: 0: 9461.0. Samples: 8196096. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:45:50,755][08459] Avg episode reward: [(0, '5800.605')] +[2024-08-03 17:45:50,756][08505] Saving new best policy, reward=5800.605! +[2024-08-03 17:45:52,845][08518] Updated weights for policy 0, policy_version 16080 (0.0007) +[2024-08-03 17:45:55,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9489.1, 300 sec: 9469.4). Total num frames: 8261632. Throughput: 0: 9444.3. Samples: 8251984. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:45:55,763][08459] Avg episode reward: [(0, '5671.840')] +[2024-08-03 17:45:56,973][08518] Updated weights for policy 0, policy_version 16160 (0.0006) +[2024-08-03 17:46:00,755][08459] Fps is (10 sec: 9420.6, 60 sec: 9489.0, 300 sec: 9455.5). Total num frames: 8306688. Throughput: 0: 9467.8. Samples: 8306752. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) +[2024-08-03 17:46:00,756][08459] Avg episode reward: [(0, '5789.338')] +[2024-08-03 17:46:00,759][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000016224_8306688.pth... +[2024-08-03 17:46:00,763][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000015664_8019968.pth +[2024-08-03 17:46:01,551][08518] Updated weights for policy 0, policy_version 16240 (0.0008) +[2024-08-03 17:46:05,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9420.8, 300 sec: 9441.6). Total num frames: 8351744. Throughput: 0: 9441.0. Samples: 8333024. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:46:05,755][08459] Avg episode reward: [(0, '5982.420')] +[2024-08-03 17:46:05,756][08505] Saving new best policy, reward=5982.420! +[2024-08-03 17:46:06,220][08518] Updated weights for policy 0, policy_version 16320 (0.0007) +[2024-08-03 17:46:10,444][08518] Updated weights for policy 0, policy_version 16400 (0.0007) +[2024-08-03 17:46:10,755][08459] Fps is (10 sec: 9011.3, 60 sec: 9352.5, 300 sec: 9441.6). Total num frames: 8396800. Throughput: 0: 9545.8. Samples: 8388656. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:46:10,755][08459] Avg episode reward: [(0, '5975.572')] +[2024-08-03 17:46:14,836][08518] Updated weights for policy 0, policy_version 16480 (0.0007) +[2024-08-03 17:46:15,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.9, 300 sec: 9441.6). Total num frames: 8445952. Throughput: 0: 9482.6. Samples: 8445632. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:46:15,755][08459] Avg episode reward: [(0, '5922.192')] +[2024-08-03 17:46:15,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000016496_8445952.pth... +[2024-08-03 17:46:15,762][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000015944_8163328.pth +[2024-08-03 17:46:19,258][08518] Updated weights for policy 0, policy_version 16560 (0.0006) +[2024-08-03 17:46:20,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9489.1, 300 sec: 9441.6). Total num frames: 8491008. Throughput: 0: 9476.8. Samples: 8474288. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:46:20,756][08459] Avg episode reward: [(0, '5841.604')] +[2024-08-03 17:46:23,779][08518] Updated weights for policy 0, policy_version 16640 (0.0007) +[2024-08-03 17:46:25,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 8536064. Throughput: 0: 9427.4. Samples: 8527912. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:46:25,755][08459] Avg episode reward: [(0, '5896.092')] +[2024-08-03 17:46:28,000][08518] Updated weights for policy 0, policy_version 16720 (0.0007) +[2024-08-03 17:46:30,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 8585216. Throughput: 0: 9354.8. Samples: 8585296. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:46:30,755][08459] Avg episode reward: [(0, '5716.351')] +[2024-08-03 17:46:30,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000016768_8585216.pth... +[2024-08-03 17:46:30,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000016224_8306688.pth +[2024-08-03 17:46:32,631][08518] Updated weights for policy 0, policy_version 16800 (0.0008) +[2024-08-03 17:46:35,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9413.9). Total num frames: 8630272. Throughput: 0: 9246.6. Samples: 8612192. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:46:35,755][08459] Avg episode reward: [(0, '5499.385')] +[2024-08-03 17:46:36,860][08518] Updated weights for policy 0, policy_version 16880 (0.0006) +[2024-08-03 17:46:40,756][08459] Fps is (10 sec: 9010.9, 60 sec: 9352.5, 300 sec: 9400.0). Total num frames: 8675328. Throughput: 0: 9267.3. Samples: 8669016. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:46:40,764][08459] Avg episode reward: [(0, '5572.570')] +[2024-08-03 17:46:41,255][08518] Updated weights for policy 0, policy_version 16960 (0.0007) +[2024-08-03 17:46:45,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9284.3, 300 sec: 9400.0). Total num frames: 8720384. Throughput: 0: 9242.7. Samples: 8722672. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:46:45,763][08459] Avg episode reward: [(0, '5698.504')] +[2024-08-03 17:46:45,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000017032_8720384.pth... +[2024-08-03 17:46:45,769][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000016496_8445952.pth +[2024-08-03 17:46:45,891][08518] Updated weights for policy 0, policy_version 17040 (0.0007) +[2024-08-03 17:46:50,257][08518] Updated weights for policy 0, policy_version 17120 (0.0006) +[2024-08-03 17:46:50,755][08459] Fps is (10 sec: 9421.2, 60 sec: 9284.3, 300 sec: 9400.0). Total num frames: 8769536. Throughput: 0: 9310.4. Samples: 8751992. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-08-03 17:46:50,755][08459] Avg episode reward: [(0, '5729.591')] +[2024-08-03 17:46:54,473][08518] Updated weights for policy 0, policy_version 17200 (0.0007) +[2024-08-03 17:46:55,755][08459] Fps is (10 sec: 9830.5, 60 sec: 9284.3, 300 sec: 9413.9). Total num frames: 8818688. Throughput: 0: 9326.8. Samples: 8808360. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:46:55,755][08459] Avg episode reward: [(0, '5853.953')] +[2024-08-03 17:46:58,577][08518] Updated weights for policy 0, policy_version 17280 (0.0005) +[2024-08-03 17:47:00,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9352.6, 300 sec: 9427.7). Total num frames: 8867840. Throughput: 0: 9393.4. Samples: 8868336. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:47:00,755][08459] Avg episode reward: [(0, '6045.859')] +[2024-08-03 17:47:00,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000017320_8867840.pth... +[2024-08-03 17:47:00,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000016768_8585216.pth +[2024-08-03 17:47:00,761][08505] Saving new best policy, reward=6045.859! +[2024-08-03 17:47:02,968][08518] Updated weights for policy 0, policy_version 17360 (0.0007) +[2024-08-03 17:47:05,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9352.5, 300 sec: 9413.9). Total num frames: 8912896. Throughput: 0: 9350.7. Samples: 8895068. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:47:05,755][08459] Avg episode reward: [(0, '6057.160')] +[2024-08-03 17:47:05,756][08505] Saving new best policy, reward=6057.160! +[2024-08-03 17:47:07,464][08518] Updated weights for policy 0, policy_version 17440 (0.0008) +[2024-08-03 17:47:10,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9352.5, 300 sec: 9413.9). Total num frames: 8957952. Throughput: 0: 9419.3. Samples: 8951780. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:47:10,755][08459] Avg episode reward: [(0, '5979.014')] +[2024-08-03 17:47:11,828][08518] Updated weights for policy 0, policy_version 17520 (0.0007) +[2024-08-03 17:47:15,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9284.3, 300 sec: 9413.9). Total num frames: 9003008. Throughput: 0: 9302.0. Samples: 9003884. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:47:15,763][08459] Avg episode reward: [(0, '5847.510')] +[2024-08-03 17:47:15,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000017584_9003008.pth... +[2024-08-03 17:47:15,769][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000017032_8720384.pth +[2024-08-03 17:47:16,402][08518] Updated weights for policy 0, policy_version 17600 (0.0008) +[2024-08-03 17:47:20,671][08518] Updated weights for policy 0, policy_version 17680 (0.0006) +[2024-08-03 17:47:20,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9352.5, 300 sec: 9413.9). Total num frames: 9052160. Throughput: 0: 9386.0. Samples: 9034564. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:47:20,755][08459] Avg episode reward: [(0, '5775.518')] +[2024-08-03 17:47:25,028][08518] Updated weights for policy 0, policy_version 17760 (0.0006) +[2024-08-03 17:47:25,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9352.5, 300 sec: 9413.9). Total num frames: 9097216. Throughput: 0: 9335.8. Samples: 9089124. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:47:25,755][08459] Avg episode reward: [(0, '5796.669')] +[2024-08-03 17:47:29,256][08518] Updated weights for policy 0, policy_version 17840 (0.0006) +[2024-08-03 17:47:30,755][08459] Fps is (10 sec: 9420.9, 60 sec: 9352.5, 300 sec: 9427.7). Total num frames: 9146368. Throughput: 0: 9448.7. Samples: 9147864. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:47:30,755][08459] Avg episode reward: [(0, '5860.601')] +[2024-08-03 17:47:30,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000017864_9146368.pth... +[2024-08-03 17:47:30,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000017320_8867840.pth +[2024-08-03 17:47:33,323][08518] Updated weights for policy 0, policy_version 17920 (0.0006) +[2024-08-03 17:47:35,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9427.7). Total num frames: 9195520. Throughput: 0: 9486.6. Samples: 9178888. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:47:35,755][08459] Avg episode reward: [(0, '5976.929')] +[2024-08-03 17:47:37,641][08518] Updated weights for policy 0, policy_version 18000 (0.0007) +[2024-08-03 17:47:40,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9489.1, 300 sec: 9413.9). Total num frames: 9244672. Throughput: 0: 9554.0. Samples: 9238292. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) +[2024-08-03 17:47:40,755][08459] Avg episode reward: [(0, '5739.782')] +[2024-08-03 17:47:41,741][08518] Updated weights for policy 0, policy_version 18080 (0.0007) +[2024-08-03 17:47:45,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9400.0). Total num frames: 9289728. Throughput: 0: 9396.9. Samples: 9291196. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:47:45,763][08459] Avg episode reward: [(0, '5672.616')] +[2024-08-03 17:47:45,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000018144_9289728.pth... +[2024-08-03 17:47:45,769][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000017584_9003008.pth +[2024-08-03 17:47:46,442][08518] Updated weights for policy 0, policy_version 18160 (0.0007) +[2024-08-03 17:47:50,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9420.8, 300 sec: 9386.1). Total num frames: 9334784. Throughput: 0: 9424.5. Samples: 9319168. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:47:50,763][08459] Avg episode reward: [(0, '5738.802')] +[2024-08-03 17:47:50,917][08518] Updated weights for policy 0, policy_version 18240 (0.0006) +[2024-08-03 17:47:55,107][08518] Updated weights for policy 0, policy_version 18320 (0.0006) +[2024-08-03 17:47:55,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9400.0). Total num frames: 9383936. Throughput: 0: 9418.5. Samples: 9375612. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:47:55,755][08459] Avg episode reward: [(0, '5641.297')] +[2024-08-03 17:47:59,323][08518] Updated weights for policy 0, policy_version 18400 (0.0006) +[2024-08-03 17:48:00,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9420.8, 300 sec: 9400.0). Total num frames: 9433088. Throughput: 0: 9541.4. Samples: 9433248. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:00,755][08459] Avg episode reward: [(0, '5647.558')] +[2024-08-03 17:48:00,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000018424_9433088.pth... +[2024-08-03 17:48:00,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000017864_9146368.pth +[2024-08-03 17:48:03,444][08518] Updated weights for policy 0, policy_version 18480 (0.0007) +[2024-08-03 17:48:05,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9400.0). Total num frames: 9478144. Throughput: 0: 9529.0. Samples: 9463368. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:05,755][08459] Avg episode reward: [(0, '5886.180')] +[2024-08-03 17:48:07,981][08518] Updated weights for policy 0, policy_version 18560 (0.0008) +[2024-08-03 17:48:10,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9400.0). Total num frames: 9527296. Throughput: 0: 9556.5. Samples: 9519168. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:10,755][08459] Avg episode reward: [(0, '6072.532')] +[2024-08-03 17:48:10,756][08505] Saving new best policy, reward=6072.532! +[2024-08-03 17:48:12,251][08518] Updated weights for policy 0, policy_version 18640 (0.0006) +[2024-08-03 17:48:15,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9557.3, 300 sec: 9400.0). Total num frames: 9576448. Throughput: 0: 9521.4. Samples: 9576328. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:15,763][08459] Avg episode reward: [(0, '6104.801')] +[2024-08-03 17:48:15,766][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000018704_9576448.pth... +[2024-08-03 17:48:15,769][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000018144_9289728.pth +[2024-08-03 17:48:15,769][08505] Saving new best policy, reward=6104.801! +[2024-08-03 17:48:16,507][08518] Updated weights for policy 0, policy_version 18720 (0.0006) +[2024-08-03 17:48:20,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9400.0). Total num frames: 9621504. Throughput: 0: 9490.8. Samples: 9605972. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:20,763][08459] Avg episode reward: [(0, '6111.833')] +[2024-08-03 17:48:20,763][08505] Saving new best policy, reward=6111.833! +[2024-08-03 17:48:21,019][08518] Updated weights for policy 0, policy_version 18800 (0.0007) +[2024-08-03 17:48:25,412][08518] Updated weights for policy 0, policy_version 18880 (0.0007) +[2024-08-03 17:48:25,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9489.1, 300 sec: 9386.1). Total num frames: 9666560. Throughput: 0: 9354.3. Samples: 9659236. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:25,755][08459] Avg episode reward: [(0, '6159.222')] +[2024-08-03 17:48:25,756][08505] Saving new best policy, reward=6159.222! +[2024-08-03 17:48:29,892][08518] Updated weights for policy 0, policy_version 18960 (0.0007) +[2024-08-03 17:48:30,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9400.0). Total num frames: 9715712. Throughput: 0: 9433.2. Samples: 9715692. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:30,755][08459] Avg episode reward: [(0, '6099.992')] +[2024-08-03 17:48:30,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000018976_9715712.pth... +[2024-08-03 17:48:30,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000018424_9433088.pth +[2024-08-03 17:48:33,998][08518] Updated weights for policy 0, policy_version 19040 (0.0005) +[2024-08-03 17:48:35,755][08459] Fps is (10 sec: 9830.4, 60 sec: 9489.1, 300 sec: 9413.9). Total num frames: 9764864. Throughput: 0: 9477.2. Samples: 9745644. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:35,755][08459] Avg episode reward: [(0, '5928.801')] +[2024-08-03 17:48:38,203][08518] Updated weights for policy 0, policy_version 19120 (0.0006) +[2024-08-03 17:48:40,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9420.8, 300 sec: 9400.0). Total num frames: 9809920. Throughput: 0: 9529.2. Samples: 9804428. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:40,755][08459] Avg episode reward: [(0, '5888.697')] +[2024-08-03 17:48:42,537][08518] Updated weights for policy 0, policy_version 19200 (0.0007) +[2024-08-03 17:48:45,755][08459] Fps is (10 sec: 9420.7, 60 sec: 9489.1, 300 sec: 9413.9). Total num frames: 9859072. Throughput: 0: 9515.2. Samples: 9861432. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) +[2024-08-03 17:48:45,755][08459] Avg episode reward: [(0, '5903.253')] +[2024-08-03 17:48:45,758][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000019256_9859072.pth... +[2024-08-03 17:48:45,761][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000018704_9576448.pth +[2024-08-03 17:48:46,920][08518] Updated weights for policy 0, policy_version 19280 (0.0007) +[2024-08-03 17:48:50,755][08459] Fps is (10 sec: 9420.8, 60 sec: 9489.1, 300 sec: 9413.9). Total num frames: 9904128. Throughput: 0: 9408.8. Samples: 9886764. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:48:50,763][08459] Avg episode reward: [(0, '5917.079')] +[2024-08-03 17:48:51,541][08518] Updated weights for policy 0, policy_version 19360 (0.0010) +[2024-08-03 17:48:55,755][08459] Fps is (10 sec: 9011.2, 60 sec: 9420.8, 300 sec: 9400.0). Total num frames: 9949184. Throughput: 0: 9399.7. Samples: 9942156. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:48:55,763][08459] Avg episode reward: [(0, '6134.657')] +[2024-08-03 17:48:55,817][08518] Updated weights for policy 0, policy_version 19440 (0.0006) +[2024-08-03 17:48:59,920][08518] Updated weights for policy 0, policy_version 19520 (0.0005) +[2024-08-03 17:49:00,755][08459] Fps is (10 sec: 9830.3, 60 sec: 9489.1, 300 sec: 9427.7). Total num frames: 10002432. Throughput: 0: 9469.0. Samples: 10002432. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) +[2024-08-03 17:49:00,755][08459] Avg episode reward: [(0, '6071.706')] +[2024-08-03 17:49:00,759][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000019536_10002432.pth... +[2024-08-03 17:49:00,762][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000018976_9715712.pth +[2024-08-03 17:49:01,200][08505] Early stopping after 2 epochs (8 sgd steps), loss delta 0.0000000 +[2024-08-03 17:49:01,202][08530] Stopping RolloutWorker_w4... +[2024-08-03 17:49:01,202][08520] Stopping RolloutWorker_w1... +[2024-08-03 17:49:01,202][08532] Stopping RolloutWorker_w6... +[2024-08-03 17:49:01,202][08519] Stopping RolloutWorker_w0... +[2024-08-03 17:49:01,202][08521] Stopping RolloutWorker_w2... +[2024-08-03 17:49:01,202][08531] Stopping RolloutWorker_w5... +[2024-08-03 17:49:01,202][08505] Stopping Batcher_0... +[2024-08-03 17:49:01,202][08530] Loop rollout_proc4_evt_loop terminating... +[2024-08-03 17:49:01,202][08520] Loop rollout_proc1_evt_loop terminating... +[2024-08-03 17:49:01,202][08532] Loop rollout_proc6_evt_loop terminating... +[2024-08-03 17:49:01,202][08519] Loop rollout_proc0_evt_loop terminating... +[2024-08-03 17:49:01,202][08521] Loop rollout_proc2_evt_loop terminating... +[2024-08-03 17:49:01,202][08505] Loop batcher_evt_loop terminating... +[2024-08-03 17:49:01,203][08529] Stopping RolloutWorker_w3... +[2024-08-03 17:49:01,203][08533] Stopping RolloutWorker_w7... +[2024-08-03 17:49:01,203][08529] Loop rollout_proc3_evt_loop terminating... +[2024-08-03 17:49:01,202][08531] Loop rollout_proc5_evt_loop terminating... +[2024-08-03 17:49:01,204][08533] Loop rollout_proc7_evt_loop terminating... +[2024-08-03 17:49:01,208][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000019544_10006528.pth... +[2024-08-03 17:49:01,208][08459] Component RolloutWorker_w4 stopped! +[2024-08-03 17:49:01,209][08459] Component RolloutWorker_w1 stopped! +[2024-08-03 17:49:01,209][08459] Component RolloutWorker_w2 stopped! +[2024-08-03 17:49:01,209][08459] Component RolloutWorker_w5 stopped! +[2024-08-03 17:49:01,210][08459] Component RolloutWorker_w6 stopped! +[2024-08-03 17:49:01,210][08459] Component RolloutWorker_w0 stopped! +[2024-08-03 17:49:01,210][08459] Component Batcher_0 stopped! +[2024-08-03 17:49:01,210][08459] Component RolloutWorker_w3 stopped! +[2024-08-03 17:49:01,211][08459] Component RolloutWorker_w7 stopped! +[2024-08-03 17:49:01,212][08505] Removing /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000019256_9859072.pth +[2024-08-03 17:49:01,213][08505] Saving /home/evgenii/Documents/Jupyter_notebooks/SAMPLE_FACTORY/train_dir/mujoco_first_run/checkpoint_p0/checkpoint_000019544_10006528.pth... +[2024-08-03 17:49:01,217][08505] Stopping LearnerWorker_p0... +[2024-08-03 17:49:01,218][08505] Loop learner_proc0_evt_loop terminating... +[2024-08-03 17:49:01,218][08459] Component LearnerWorker_p0 stopped! +[2024-08-03 17:49:01,268][08518] Weights refcount: 2 0 +[2024-08-03 17:49:01,269][08518] Stopping InferenceWorker_p0-w0... +[2024-08-03 17:49:01,269][08459] Component InferenceWorker_p0-w0 stopped! +[2024-08-03 17:49:01,269][08518] Loop inference_proc0-0_evt_loop terminating... +[2024-08-03 17:49:01,269][08459] Waiting for process learner_proc0 to stop... +[2024-08-03 17:49:02,022][08459] Waiting for process inference_proc0-0 to join... +[2024-08-03 17:49:02,022][08459] Waiting for process rollout_proc0 to join... +[2024-08-03 17:49:02,022][08459] Waiting for process rollout_proc1 to join... +[2024-08-03 17:49:02,023][08459] Waiting for process rollout_proc2 to join... +[2024-08-03 17:49:02,023][08459] Waiting for process rollout_proc3 to join... +[2024-08-03 17:49:02,023][08459] Waiting for process rollout_proc4 to join... +[2024-08-03 17:49:02,023][08459] Waiting for process rollout_proc5 to join... +[2024-08-03 17:49:02,023][08459] Waiting for process rollout_proc6 to join... +[2024-08-03 17:49:02,023][08459] Waiting for process rollout_proc7 to join... +[2024-08-03 17:49:02,024][08459] Batcher 0 profile tree view: +batching: 7.2641, releasing_batches: 1.3576 +[2024-08-03 17:49:02,024][08459] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0051 + wait_policy_total: 355.7692 +update_model: 10.9505 + weight_update: 0.0005 +one_step: 0.0016 + handle_policy_step: 657.2699 + deserialize: 18.5976, stack: 4.3380, obs_to_device_normalize: 140.2299, forward: 323.1131, send_messages: 46.9133 + prepare_outputs: 87.5350 + to_cpu: 49.3123 +[2024-08-03 17:49:02,024][08459] Learner 0 profile tree view: +misc: 0.0075, prepare_batch: 9.9317 +train: 103.6763 + epoch_init: 0.0446, minibatch_init: 1.6079, losses_postprocess: 3.2881, kl_divergence: 1.6551, after_optimizer: 1.5556 + calculate_losses: 33.4953 + losses_init: 0.0477, forward_head: 3.1640, bptt_initial: 0.2234, bptt: 0.1978, tail: 12.5279, advantages_returns: 1.8080, losses: 13.6363 + update: 59.7942 + clip: 7.5762 +[2024-08-03 17:49:02,024][08459] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.3959, enqueue_policy_requests: 20.9536, env_step: 425.4342, overhead: 36.4816, complete_rollouts: 0.5829 +save_policy_outputs: 54.8924 + split_output_tensors: 18.5788 +[2024-08-03 17:49:02,024][08459] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.4337, enqueue_policy_requests: 20.5288, env_step: 427.3592, overhead: 36.4778, complete_rollouts: 0.5643 +save_policy_outputs: 54.9152 + split_output_tensors: 18.7499 +[2024-08-03 17:49:02,024][08459] Loop Runner_EvtLoop terminating... +[2024-08-03 17:49:02,024][08459] Runner profile tree view: +main_loop: 1078.6660 +[2024-08-03 17:49:02,025][08459] Collected {0: 10006528}, FPS: 9276.8