[2023-08-17 11:34:50,384][121125] Saving configuration to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
[2023-08-17 11:34:50,404][121125] Rollout worker 0 uses device cpu
[2023-08-17 11:34:50,404][121125] Rollout worker 1 uses device cpu
[2023-08-17 11:34:50,405][121125] Rollout worker 2 uses device cpu
[2023-08-17 11:34:50,405][121125] Rollout worker 3 uses device cpu
[2023-08-17 11:34:50,406][121125] Rollout worker 4 uses device cpu
[2023-08-17 11:34:50,406][121125] Rollout worker 5 uses device cpu
[2023-08-17 11:34:50,406][121125] Rollout worker 6 uses device cpu
[2023-08-17 11:34:50,406][121125] Rollout worker 7 uses device cpu
[2023-08-17 11:34:50,440][121125] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 11:34:50,441][121125] InferenceWorker_p0-w0: min num requests: 2
[2023-08-17 11:34:50,458][121125] Starting all processes...
[2023-08-17 11:34:50,458][121125] Starting process learner_proc0
[2023-08-17 11:34:50,508][121125] Starting all processes...
[2023-08-17 11:34:50,512][121125] Starting process inference_proc0-0
[2023-08-17 11:34:50,512][121125] Starting process rollout_proc0
[2023-08-17 11:34:50,513][121125] Starting process rollout_proc1
[2023-08-17 11:34:50,514][121125] Starting process rollout_proc2
[2023-08-17 11:34:50,514][121125] Starting process rollout_proc3
[2023-08-17 11:34:50,514][121125] Starting process rollout_proc4
[2023-08-17 11:34:50,514][121125] Starting process rollout_proc5
[2023-08-17 11:34:50,514][121125] Starting process rollout_proc6
[2023-08-17 11:34:50,515][121125] Starting process rollout_proc7
[2023-08-17 11:34:51,414][121211] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 11:34:51,414][121211] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-08-17 11:34:51,424][121211] Num visible devices: 1
[2023-08-17 11:34:51,443][121211] Starting seed is not provided
[2023-08-17 11:34:51,444][121211] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 11:34:51,444][121211] Initializing actor-critic model on device cuda:0
[2023-08-17 11:34:51,444][121211] RunningMeanStd input shape: (3, 72, 128)
[2023-08-17 11:34:51,445][121211] RunningMeanStd input shape: (1,)
[2023-08-17 11:34:51,456][121211] ConvEncoder: input_channels=3
[2023-08-17 11:34:51,528][121211] Conv encoder output size: 512
[2023-08-17 11:34:51,528][121211] Policy head output size: 512
[2023-08-17 11:34:51,541][121211] Created Actor Critic model with architecture:
[2023-08-17 11:34:51,541][121211] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-08-17 11:34:51,559][121232] Worker 6 uses CPU cores [18, 19, 20]
[2023-08-17 11:34:51,561][121228] Worker 3 uses CPU cores [9, 10, 11]
[2023-08-17 11:34:51,567][121230] Worker 5 uses CPU cores [15, 16, 17]
[2023-08-17 11:34:51,567][121226] Worker 0 uses CPU cores [0, 1, 2]
[2023-08-17 11:34:51,573][121224] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 11:34:51,573][121224] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-08-17 11:34:51,581][121224] Num visible devices: 1
[2023-08-17 11:34:51,590][121225] Worker 1 uses CPU cores [3, 4, 5]
[2023-08-17 11:34:51,593][121229] Worker 4 uses CPU cores [12, 13, 14]
[2023-08-17 11:34:51,613][121231] Worker 7 uses CPU cores [21, 22, 23]
[2023-08-17 11:34:51,628][121227] Worker 2 uses CPU cores [6, 7, 8]
[2023-08-17 11:34:53,156][121211] Using optimizer
[2023-08-17 11:34:53,157][121211] No checkpoints found
[2023-08-17 11:34:53,157][121211] Did not load from checkpoint, starting from scratch!
[2023-08-17 11:34:53,157][121211] Initialized policy 0 weights for model version 0
[2023-08-17 11:34:53,158][121211] LearnerWorker_p0 finished initialization!
[2023-08-17 11:34:53,158][121211] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 11:34:53,455][121125] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-08-17 11:34:53,703][121224] RunningMeanStd input shape: (3, 72, 128)
[2023-08-17 11:34:53,703][121224] RunningMeanStd input shape: (1,)
[2023-08-17 11:34:53,710][121224] ConvEncoder: input_channels=3
[2023-08-17 11:34:53,760][121224] Conv encoder output size: 512
[2023-08-17 11:34:53,760][121224] Policy head output size: 512
[2023-08-17 11:34:54,313][121125] Inference worker 0-0 is ready!
[2023-08-17 11:34:54,314][121125] All inference workers are ready! Signal rollout workers to start!
[2023-08-17 11:34:54,329][121229] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 11:34:54,329][121226] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 11:34:54,329][121231] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 11:34:54,329][121232] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 11:34:54,330][121227] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 11:34:54,330][121228] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 11:34:54,333][121225] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 11:34:54,333][121230] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 11:34:54,460][121227] VizDoom game.init() threw an exception ViZDoomUnexpectedExitException('Controlled ViZDoom instance exited unexpectedly.'). Terminate process...
[2023-08-17 11:34:54,461][121227] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Sampler', signal_name='_inference_workers_initialized'), args=()
Traceback (most recent call last):
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 228, in _game_init
    self.game.init()
vizdoom.vizdoom.ViZDoomUnexpectedExitException: Controlled ViZDoom instance exited unexpectedly.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 150, in init
    env_runner.init(self.timing)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 418, in init
    self._reset()
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 430, in _reset
    observations, info = e.reset(seed=seed)  # new way of doing seeding since Gym 0.26.0
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 125, in reset
    obs, info = self.env.reset(**kwargs)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 110, in reset
    obs, info = self.env.reset(**kwargs)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 30, in reset
    return self.env.reset(**kwargs)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 462, in reset
    obs, info = self.env.reset(seed=seed, options=options)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/envs/env_wrappers.py", line 82, in reset
    obs, info = self.env.reset(**kwargs)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 414, in reset
    return self.env.reset(seed=seed, options=options)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 51, in reset
    return self.env.reset(**kwargs)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 323, in reset
    self._ensure_initialized()
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 274, in _ensure_initialized
    self.initialize()
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 269, in initialize
    self._game_init()
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 244, in _game_init
    raise EnvCriticalError()
sample_factory.envs.env_utils.EnvCriticalError
[2023-08-17 11:34:54,462][121227] Unhandled exception in evt loop rollout_proc2_evt_loop
[2023-08-17 11:34:54,535][121231] Decorrelating experience for 0 frames...
[2023-08-17 11:34:54,544][121226] Decorrelating experience for 0 frames...
[2023-08-17 11:34:54,548][121232] Decorrelating experience for 0 frames...
[2023-08-17 11:34:54,550][121228] Decorrelating experience for 0 frames...
[2023-08-17 11:34:54,550][121230] Decorrelating experience for 0 frames...
[2023-08-17 11:34:54,728][121231] Decorrelating experience for 32 frames...
[2023-08-17 11:34:54,746][121229] Decorrelating experience for 0 frames...
[2023-08-17 11:34:54,747][121228] Decorrelating experience for 32 frames...
[2023-08-17 11:34:54,747][121230] Decorrelating experience for 32 frames...
[2023-08-17 11:34:54,748][121232] Decorrelating experience for 32 frames...
[2023-08-17 11:34:54,934][121229] Decorrelating experience for 32 frames...
[2023-08-17 11:34:54,934][121226] Decorrelating experience for 32 frames...
[2023-08-17 11:34:54,954][121230] Decorrelating experience for 64 frames...
[2023-08-17 11:34:54,954][121228] Decorrelating experience for 64 frames...
[2023-08-17 11:34:54,955][121232] Decorrelating experience for 64 frames...
[2023-08-17 11:34:55,137][121226] Decorrelating experience for 64 frames...
[2023-08-17 11:34:55,138][121229] Decorrelating experience for 64 frames...
[2023-08-17 11:34:55,138][121231] Decorrelating experience for 64 frames...
[2023-08-17 11:34:55,138][121225] Decorrelating experience for 0 frames...
[2023-08-17 11:34:55,144][121232] Decorrelating experience for 96 frames...
[2023-08-17 11:34:55,336][121225] Decorrelating experience for 32 frames...
[2023-08-17 11:34:55,338][121231] Decorrelating experience for 96 frames...
[2023-08-17 11:34:55,367][121228] Decorrelating experience for 96 frames...
[2023-08-17 11:34:55,519][121225] Decorrelating experience for 64 frames...
[2023-08-17 11:34:55,524][121226] Decorrelating experience for 96 frames...
[2023-08-17 11:34:55,734][121230] Decorrelating experience for 96 frames...
[2023-08-17 11:34:55,737][121229] Decorrelating experience for 96 frames...
[2023-08-17 11:34:55,742][121225] Decorrelating experience for 96 frames...
[2023-08-17 11:34:56,232][121211] Signal inference workers to stop experience collection...
[2023-08-17 11:34:56,234][121224] InferenceWorker_p0-w0: stopping experience collection
[2023-08-17 11:34:57,320][121211] Signal inference workers to resume experience collection...
[2023-08-17 11:34:57,320][121224] InferenceWorker_p0-w0: resuming experience collection
[2023-08-17 11:34:58,455][121125] Fps is (10 sec: 6553.7, 60 sec: 6553.7, 300 sec: 6553.7). Total num frames: 32768. Throughput: 0: 566.8. Samples: 2834. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-08-17 11:34:58,456][121125] Avg episode reward: [(0, '3.897')]
[2023-08-17 11:34:58,583][121224] Updated weights for policy 0, policy_version 10 (0.0188)
[2023-08-17 11:34:59,640][121224] Updated weights for policy 0, policy_version 20 (0.0006)
[2023-08-17 11:35:00,607][121224] Updated weights for policy 0, policy_version 30 (0.0005)
[2023-08-17 11:35:01,567][121224] Updated weights for policy 0, policy_version 40 (0.0005)
[2023-08-17 11:35:02,584][121224] Updated weights for policy 0, policy_version 50 (0.0006)
[2023-08-17 11:35:03,455][121125] Fps is (10 sec: 23347.3, 60 sec: 23347.3, 300 sec: 23347.3). Total num frames: 233472. Throughput: 0: 5847.0. Samples: 58470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-17 11:35:03,456][121125] Avg episode reward: [(0, '4.655')]
[2023-08-17 11:35:03,457][121211] Saving new best policy, reward=4.655!
[2023-08-17 11:35:03,725][121224] Updated weights for policy 0, policy_version 60 (0.0006)
[2023-08-17 11:35:04,740][121224] Updated weights for policy 0, policy_version 70 (0.0006)
[2023-08-17 11:35:05,714][121224] Updated weights for policy 0, policy_version 80 (0.0005)
[2023-08-17 11:35:06,667][121224] Updated weights for policy 0, policy_version 90 (0.0005)
[2023-08-17 11:35:07,660][121224] Updated weights for policy 0, policy_version 100 (0.0005)
[2023-08-17 11:35:08,455][121125] Fps is (10 sec: 40959.9, 60 sec: 29491.3, 300 sec: 29491.3). Total num frames: 442368. Throughput: 0: 5926.0. Samples: 88890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-17 11:35:08,456][121125] Avg episode reward: [(0, '4.598')]
[2023-08-17 11:35:08,628][121224] Updated weights for policy 0, policy_version 110 (0.0005)
[2023-08-17 11:35:09,699][121224] Updated weights for policy 0, policy_version 120 (0.0006)
[2023-08-17 11:35:10,436][121125] Heartbeat connected on Batcher_0
[2023-08-17 11:35:10,438][121125] Heartbeat connected on LearnerWorker_p0
[2023-08-17 11:35:10,442][121125] Heartbeat connected on InferenceWorker_p0-w0
[2023-08-17 11:35:10,444][121125] Heartbeat connected on RolloutWorker_w0
[2023-08-17 11:35:10,446][121125] Heartbeat connected on RolloutWorker_w1
[2023-08-17 11:35:10,450][121125] Heartbeat connected on RolloutWorker_w3
[2023-08-17 11:35:10,453][121125] Heartbeat connected on RolloutWorker_w4
[2023-08-17 11:35:10,455][121125] Heartbeat connected on RolloutWorker_w5
[2023-08-17 11:35:10,456][121125] Heartbeat connected on RolloutWorker_w6
[2023-08-17 11:35:10,459][121125] Heartbeat connected on RolloutWorker_w7
[2023-08-17 11:35:10,742][121224] Updated weights for policy 0, policy_version 130 (0.0005)
[2023-08-17 11:35:11,789][121224] Updated weights for policy 0, policy_version 140 (0.0005)
[2023-08-17 11:35:12,785][121224] Updated weights for policy 0, policy_version 150 (0.0005)
[2023-08-17 11:35:13,455][121125] Fps is (10 sec: 40550.2, 60 sec: 31948.8, 300 sec: 31948.8). Total num frames: 638976. Throughput: 0: 7468.3. Samples: 149366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2023-08-17 11:35:13,456][121125] Avg episode reward: [(0, '4.646')]
[2023-08-17 11:35:13,856][121224] Updated weights for policy 0, policy_version 160 (0.0007)
[2023-08-17 11:35:14,953][121224] Updated weights for policy 0, policy_version 170 (0.0007)
[2023-08-17 11:35:15,042][121230] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(0, 0)
Traceback (most recent call last):
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 469, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
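Note: every rollout worker in this run dies the same way. Ctrl-C reaches the workers while they are blocked in game.make_action(), and ViZDoom surfaces the signal as vizdoom.SignalException rather than KeyboardInterrupt, so each signal_slot event loop records it as an unhandled slot exception (see the near-identical tracebacks above and below). A minimal sketch of absorbing that exception at the step boundary: DoomGame.make_action, is_episode_finished, and SignalException are real ViZDoom API, but the wrapper function itself is hypothetical, not Sample Factory's handling.

    import vizdoom as vzd

    def safe_step(game: vzd.DoomGame, action, skip_frames: int = 4):
        """Step the game, turning a SIGINT-induced teardown into a terminal flag."""
        try:
            reward = game.make_action(action, skip_frames)
            return reward, game.is_episode_finished()
        except vzd.SignalException:
            # SIGINT already closed the ViZDoom instance; report termination
            # instead of letting the exception kill the worker's event loop.
            return 0.0, True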
[2023-08-17 11:35:15,042][121229] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(1, 0)
Traceback (most recent call last):
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 469, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2023-08-17 11:35:15,043][121230] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop
[2023-08-17 11:35:15,043][121229] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop
[2023-08-17 11:35:15,042][121228] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(0, 0)
Traceback (most recent call last):
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 469, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2023-08-17 11:35:15,044][121228] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop
[2023-08-17 11:35:15,043][121231] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(0, 0)
Traceback (most recent call last):
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 469, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2023-08-17 11:35:15,044][121231] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop
[2023-08-17 11:35:15,046][121232] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0)
Traceback (most recent call last):
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 469, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2023-08-17 11:35:15,047][121232] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop
[2023-08-17 11:35:15,046][121225] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(1, 0)
Traceback (most recent call last):
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 469, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2023-08-17 11:35:15,047][121225] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop
[2023-08-17 11:35:15,042][121226] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(0, 0)
Traceback (most recent call last):
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts
    complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts
    new_obs, rewards, terminated, truncated, infos = e.step(actions)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step
    obs, rew, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 469, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step
    return self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step
    obs, reward, terminated, truncated, info = self.env.step(action)
  File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step
    reward = self.game.make_action(actions_flattened, self.skip_frames)
vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed.
[2023-08-17 11:35:15,048][121226] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop
[2023-08-17 11:35:15,058][121125] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 121125], exiting...
[2023-08-17 11:35:15,059][121125] Runner profile tree view:
main_loop: 24.6015
[2023-08-17 11:35:15,060][121211] Stopping Batcher_0...
[2023-08-17 11:35:15,060][121211] Loop batcher_evt_loop terminating...
[2023-08-17 11:35:15,060][121125] Collected {0: 696320}, FPS: 28303.9
[2023-08-17 11:35:15,061][121211] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000170_696320.pth...
[2023-08-17 11:35:15,107][121211] Stopping LearnerWorker_p0...
[2023-08-17 11:35:15,108][121211] Loop learner_proc0_evt_loop terminating...
[2023-08-17 11:35:15,121][121224] Weights refcount: 2 0
[2023-08-17 11:35:15,123][121224] Stopping InferenceWorker_p0-w0...
[2023-08-17 11:35:15,123][121224] Loop inference_proc0-0_evt_loop terminating...
[2023-08-17 12:08:48,041][131794] Saving configuration to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
[2023-08-17 12:08:48,053][131794] Rollout worker 0 uses device cpu
[2023-08-17 12:08:48,054][131794] Rollout worker 1 uses device cpu
[2023-08-17 12:08:48,054][131794] Rollout worker 2 uses device cpu
[2023-08-17 12:08:48,055][131794] Rollout worker 3 uses device cpu
[2023-08-17 12:08:48,055][131794] Rollout worker 4 uses device cpu
[2023-08-17 12:08:48,055][131794] Rollout worker 5 uses device cpu
[2023-08-17 12:08:48,056][131794] Rollout worker 6 uses device cpu
[2023-08-17 12:08:48,056][131794] Rollout worker 7 uses device cpu
[2023-08-17 12:08:48,086][131794] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:08:48,087][131794] InferenceWorker_p0-w0: min num requests: 2
[2023-08-17 12:08:48,104][131794] Starting all processes...
[2023-08-17 12:08:48,105][131794] Starting process learner_proc0
[2023-08-17 12:08:48,154][131794] Starting all processes...
[2023-08-17 12:08:48,158][131794] Starting process inference_proc0-0
[2023-08-17 12:08:48,158][131794] Starting process rollout_proc0
[2023-08-17 12:08:48,158][131794] Starting process rollout_proc1
[2023-08-17 12:08:48,159][131794] Starting process rollout_proc2
[2023-08-17 12:08:48,159][131794] Starting process rollout_proc3
[2023-08-17 12:08:48,160][131794] Starting process rollout_proc4
[2023-08-17 12:08:48,161][131794] Starting process rollout_proc5
[2023-08-17 12:08:48,162][131794] Starting process rollout_proc6
[2023-08-17 12:08:48,162][131794] Starting process rollout_proc7
[2023-08-17 12:08:49,071][131864] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:08:49,071][131864] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-08-17 12:08:49,082][131864] Num visible devices: 1
[2023-08-17 12:08:49,101][131864] Starting seed is not provided
[2023-08-17 12:08:49,102][131864] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:08:49,102][131864] Initializing actor-critic model on device cuda:0
[2023-08-17 12:08:49,102][131864] RunningMeanStd input shape: (3, 72, 128)
[2023-08-17 12:08:49,102][131864] RunningMeanStd input shape: (1,)
[2023-08-17 12:08:49,111][131864] ConvEncoder: input_channels=3
[2023-08-17 12:08:49,128][131877] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:08:49,128][131877] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-08-17 12:08:49,131][131877] Num visible devices: 1
[2023-08-17 12:08:49,176][131864] Conv encoder output size: 512
[2023-08-17 12:08:49,176][131864] Policy head output size: 512
[2023-08-17 12:08:49,183][131864] Created Actor Critic model with architecture:
[2023-08-17 12:08:49,183][131864] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-08-17 12:08:49,230][131885] Worker 7 uses CPU cores [21, 22, 23]
[2023-08-17 12:08:49,237][131880] Worker 2 uses CPU cores [6, 7, 8]
[2023-08-17 12:08:49,239][131884] Worker 6 uses CPU cores [18, 19, 20]
[2023-08-17 12:08:49,240][131879] Worker 0 uses CPU cores [0, 1, 2]
[2023-08-17 12:08:49,241][131883] Worker 5 uses CPU cores [15, 16, 17]
[2023-08-17 12:08:49,242][131881] Worker 3 uses CPU cores [9, 10, 11]
[2023-08-17 12:08:49,247][131882] Worker 4 uses CPU cores [12, 13, 14]
[2023-08-17 12:08:49,280][131878] Worker 1 uses CPU cores [3, 4, 5]
[2023-08-17 12:08:50,303][131864] Using optimizer
[2023-08-17 12:08:50,303][131864] Loading state from checkpoint /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000170_696320.pth...
[2023-08-17 12:08:50,319][131864] Loading model from checkpoint
[2023-08-17 12:08:50,321][131864] Loaded experiment state at self.train_step=170, self.env_steps=696320
[2023-08-17 12:08:50,322][131864] Initialized policy 0 weights for model version 170
[2023-08-17 12:08:50,322][131864] LearnerWorker_p0 finished initialization!
[2023-08-17 12:08:50,323][131864] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:08:50,883][131877] RunningMeanStd input shape: (3, 72, 128)
[2023-08-17 12:08:50,883][131877] RunningMeanStd input shape: (1,)
[2023-08-17 12:08:50,890][131877] ConvEncoder: input_channels=3
[2023-08-17 12:08:50,941][131877] Conv encoder output size: 512
[2023-08-17 12:08:50,941][131877] Policy head output size: 512
[2023-08-17 12:08:51,065][131794] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 696320. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2023-08-17 12:08:51,490][131794] Inference worker 0-0 is ready!
[2023-08-17 12:08:51,491][131794] All inference workers are ready! Signal rollout workers to start!
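Note: a shape-equivalent PyTorch sketch of the ActorCriticSharedWeights model summarized above may help make the printout concrete: a 3x72x128 observation passes through a three-layer ELU conv encoder into a 512-unit MLP, a GRU(512, 512) core, a 1-unit critic head, and a 5-logit action head. The input shape and the 512/1/5 sizes come from the log; the conv kernel sizes, strides, and channel counts below are assumptions, since the log prints only the module types.

    import torch
    import torch.nn as nn

    class ActorCriticSketch(nn.Module):
        def __init__(self, num_actions: int = 5):
            super().__init__()
            self.conv_head = nn.Sequential(  # (3, 72, 128) -> flat features
                nn.Conv2d(3, 32, 8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
                nn.Flatten(),
            )
            with torch.no_grad():  # infer the flattened size from a dummy obs
                n = self.conv_head(torch.zeros(1, 3, 72, 128)).shape[1]
            self.mlp = nn.Sequential(nn.Linear(n, 512), nn.ELU())
            self.core = nn.GRU(512, 512)  # recurrent core, as in the log
            self.critic_linear = nn.Linear(512, 1)
            self.distribution_linear = nn.Linear(512, num_actions)

        def forward(self, obs, rnn_state=None):
            x = self.mlp(self.conv_head(obs)).unsqueeze(0)  # seq_len = 1
            x, rnn_state = self.core(x, rnn_state)
            x = x.squeeze(0)
            return self.critic_linear(x), self.distribution_linear(x), rnn_state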
[2023-08-17 12:08:51,506][131880] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:08:51,506][131884] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:08:51,506][131881] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:08:51,506][131885] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:08:51,508][131879] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:08:51,508][131878] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:08:51,508][131882] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:08:51,508][131883] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:08:51,714][131885] Decorrelating experience for 0 frames...
[2023-08-17 12:08:51,715][131883] Decorrelating experience for 0 frames...
[2023-08-17 12:08:51,715][131882] Decorrelating experience for 0 frames...
[2023-08-17 12:08:51,720][131881] Decorrelating experience for 0 frames...
[2023-08-17 12:08:51,735][131884] Decorrelating experience for 0 frames...
[2023-08-17 12:08:51,743][131880] Decorrelating experience for 0 frames...
[2023-08-17 12:08:51,900][131882] Decorrelating experience for 32 frames...
[2023-08-17 12:08:51,906][131885] Decorrelating experience for 32 frames...
[2023-08-17 12:08:51,906][131881] Decorrelating experience for 32 frames...
[2023-08-17 12:08:51,910][131878] Decorrelating experience for 0 frames...
[2023-08-17 12:08:51,920][131884] Decorrelating experience for 32 frames...
[2023-08-17 12:08:51,931][131880] Decorrelating experience for 32 frames...
[2023-08-17 12:08:51,956][131883] Decorrelating experience for 32 frames...
[2023-08-17 12:08:52,104][131878] Decorrelating experience for 32 frames...
[2023-08-17 12:08:52,108][131882] Decorrelating experience for 64 frames...
[2023-08-17 12:08:52,116][131879] Decorrelating experience for 0 frames...
[2023-08-17 12:08:52,124][131881] Decorrelating experience for 64 frames...
[2023-08-17 12:08:52,131][131884] Decorrelating experience for 64 frames...
[2023-08-17 12:08:52,182][131880] Decorrelating experience for 64 frames...
[2023-08-17 12:08:52,299][131878] Decorrelating experience for 64 frames...
[2023-08-17 12:08:52,301][131879] Decorrelating experience for 32 frames...
[2023-08-17 12:08:52,315][131885] Decorrelating experience for 64 frames...
[2023-08-17 12:08:52,441][131882] Decorrelating experience for 96 frames...
[2023-08-17 12:08:52,485][131884] Decorrelating experience for 96 frames...
[2023-08-17 12:08:52,502][131878] Decorrelating experience for 96 frames...
[2023-08-17 12:08:52,530][131879] Decorrelating experience for 64 frames...
[2023-08-17 12:08:52,680][131881] Decorrelating experience for 96 frames...
[2023-08-17 12:08:52,728][131883] Decorrelating experience for 64 frames...
[2023-08-17 12:08:52,732][131885] Decorrelating experience for 96 frames...
[2023-08-17 12:08:52,869][131879] Decorrelating experience for 96 frames...
[2023-08-17 12:08:52,919][131883] Decorrelating experience for 96 frames...
[2023-08-17 12:08:52,923][131880] Decorrelating experience for 96 frames...
[2023-08-17 12:08:53,218][131864] Signal inference workers to stop experience collection...
[2023-08-17 12:08:53,246][131877] InferenceWorker_p0-w0: stopping experience collection
[2023-08-17 12:08:54,097][131864] Signal inference workers to resume experience collection...
[2023-08-17 12:08:54,097][131877] InferenceWorker_p0-w0: resuming experience collection
[2023-08-17 12:08:55,323][131877] Updated weights for policy 0, policy_version 180 (0.0180)
[2023-08-17 12:08:56,065][131794] Fps is (10 sec: 13926.5, 60 sec: 13926.5, 300 sec: 13926.5). Total num frames: 765952. Throughput: 0: 662.8. Samples: 3314. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2023-08-17 12:08:56,066][131794] Avg episode reward: [(0, '4.760')]
[2023-08-17 12:08:56,070][131864] Saving new best policy, reward=4.760!
[2023-08-17 12:08:56,305][131877] Updated weights for policy 0, policy_version 190 (0.0006)
[2023-08-17 12:08:57,286][131877] Updated weights for policy 0, policy_version 200 (0.0006)
[2023-08-17 12:08:58,252][131877] Updated weights for policy 0, policy_version 210 (0.0007)
[2023-08-17 12:08:59,223][131877] Updated weights for policy 0, policy_version 220 (0.0006)
[2023-08-17 12:09:00,213][131877] Updated weights for policy 0, policy_version 230 (0.0006)
[2023-08-17 12:09:01,065][131794] Fps is (10 sec: 27853.0, 60 sec: 27853.0, 300 sec: 27853.0). Total num frames: 974848. Throughput: 0: 6506.0. Samples: 65060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-17 12:09:01,067][131794] Avg episode reward: [(0, '4.894')]
[2023-08-17 12:09:01,068][131864] Saving new best policy, reward=4.894!
[2023-08-17 12:09:01,199][131877] Updated weights for policy 0, policy_version 240 (0.0006)
[2023-08-17 12:09:02,167][131877] Updated weights for policy 0, policy_version 250 (0.0006)
[2023-08-17 12:09:03,131][131877] Updated weights for policy 0, policy_version 260 (0.0006)
[2023-08-17 12:09:04,168][131877] Updated weights for policy 0, policy_version 270 (0.0007)
[2023-08-17 12:09:05,171][131877] Updated weights for policy 0, policy_version 280 (0.0007)
[2023-08-17 12:09:06,065][131794] Fps is (10 sec: 41779.5, 60 sec: 32495.2, 300 sec: 32495.2). Total num frames: 1183744. Throughput: 0: 6420.3. Samples: 96304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2023-08-17 12:09:06,066][131794] Avg episode reward: [(0, '5.072')]
[2023-08-17 12:09:06,069][131864] Saving new best policy, reward=5.072!
[2023-08-17 12:09:06,158][131877] Updated weights for policy 0, policy_version 290 (0.0007)
[2023-08-17 12:09:07,207][131877] Updated weights for policy 0, policy_version 300 (0.0006)
[2023-08-17 12:09:08,081][131794] Heartbeat connected on Batcher_0
[2023-08-17 12:09:08,084][131794] Heartbeat connected on LearnerWorker_p0
[2023-08-17 12:09:08,089][131794] Heartbeat connected on InferenceWorker_p0-w0
[2023-08-17 12:09:08,090][131794] Heartbeat connected on RolloutWorker_w0
[2023-08-17 12:09:08,093][131794] Heartbeat connected on RolloutWorker_w1
[2023-08-17 12:09:08,094][131794] Heartbeat connected on RolloutWorker_w2
[2023-08-17 12:09:08,097][131794] Heartbeat connected on RolloutWorker_w3
[2023-08-17 12:09:08,098][131794] Heartbeat connected on RolloutWorker_w4
[2023-08-17 12:09:08,102][131794] Heartbeat connected on RolloutWorker_w5
[2023-08-17 12:09:08,102][131794] Heartbeat connected on RolloutWorker_w6
[2023-08-17 12:09:08,105][131794] Heartbeat connected on RolloutWorker_w7
[2023-08-17 12:09:08,269][131877] Updated weights for policy 0, policy_version 310 (0.0007)
[2023-08-17 12:09:09,317][131877] Updated weights for policy 0, policy_version 320 (0.0007)
[2023-08-17 12:09:10,353][131877] Updated weights for policy 0, policy_version 330 (0.0006)
[2023-08-17 12:09:11,065][131794] Fps is (10 sec: 40140.7, 60 sec: 33996.9, 300 sec: 33996.9). Total num frames: 1376256. Throughput: 0: 7819.2. Samples: 156384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-08-17 12:09:11,066][131794] Avg episode reward: [(0, '5.895')]
[2023-08-17 12:09:11,067][131864] Saving new best policy, reward=5.895!
[2023-08-17 12:09:11,382][131877] Updated weights for policy 0, policy_version 340 (0.0006)
[2023-08-17 12:09:12,356][131877] Updated weights for policy 0, policy_version 350 (0.0007)
[2023-08-17 12:09:13,358][131877] Updated weights for policy 0, policy_version 360 (0.0006)
[2023-08-17 12:09:14,367][131877] Updated weights for policy 0, policy_version 370 (0.0006)
[2023-08-17 12:09:15,414][131877] Updated weights for policy 0, policy_version 380 (0.0006)
[2023-08-17 12:09:16,065][131794] Fps is (10 sec: 39731.1, 60 sec: 35389.6, 300 sec: 35389.6). Total num frames: 1581056. Throughput: 0: 8668.8. Samples: 216718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2023-08-17 12:09:16,066][131794] Avg episode reward: [(0, '5.889')]
[2023-08-17 12:09:16,442][131877] Updated weights for policy 0, policy_version 390 (0.0006)
[2023-08-17 12:09:17,456][131877] Updated weights for policy 0, policy_version 400 (0.0006)
[2023-08-17 12:09:18,507][131877] Updated weights for policy 0, policy_version 410 (0.0006)
[2023-08-17 12:09:19,530][131877] Updated weights for policy 0, policy_version 420 (0.0007)
[2023-08-17 12:09:20,560][131877] Updated weights for policy 0, policy_version 430 (0.0007)
[2023-08-17 12:09:21,065][131794] Fps is (10 sec: 40550.8, 60 sec: 36181.5, 300 sec: 36181.5). Total num frames: 1781760. Throughput: 0: 8212.2. Samples: 246364. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-17 12:09:21,066][131794] Avg episode reward: [(0, '6.038')]
[2023-08-17 12:09:21,066][131864] Saving new best policy, reward=6.038!
[2023-08-17 12:09:21,554][131877] Updated weights for policy 0, policy_version 440 (0.0006)
[2023-08-17 12:09:22,574][131877] Updated weights for policy 0, policy_version 450 (0.0007)
[2023-08-17 12:09:23,579][131877] Updated weights for policy 0, policy_version 460 (0.0006)
[2023-08-17 12:09:24,534][131877] Updated weights for policy 0, policy_version 470 (0.0006)
[2023-08-17 12:09:25,469][131877] Updated weights for policy 0, policy_version 480 (0.0006)
[2023-08-17 12:09:26,065][131794] Fps is (10 sec: 40960.0, 60 sec: 36981.1, 300 sec: 36981.1). Total num frames: 1990656. Throughput: 0: 8786.5. Samples: 307528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-17 12:09:26,066][131794] Avg episode reward: [(0, '8.814')]
[2023-08-17 12:09:26,068][131864] Saving new best policy, reward=8.814!
[2023-08-17 12:09:26,467][131877] Updated weights for policy 0, policy_version 490 (0.0007)
[2023-08-17 12:09:27,472][131877] Updated weights for policy 0, policy_version 500 (0.0007)
[2023-08-17 12:09:28,481][131877] Updated weights for policy 0, policy_version 510 (0.0006)
[2023-08-17 12:09:29,497][131877] Updated weights for policy 0, policy_version 520 (0.0006)
[2023-08-17 12:09:30,455][131877] Updated weights for policy 0, policy_version 530 (0.0007)
[2023-08-17 12:09:31,065][131794] Fps is (10 sec: 40959.8, 60 sec: 37376.1, 300 sec: 37376.1). Total num frames: 2191360. Throughput: 0: 9247.0. Samples: 369878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-08-17 12:09:31,066][131794] Avg episode reward: [(0, '10.505')]
[2023-08-17 12:09:31,067][131864] Saving new best policy, reward=10.505!
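Note: the runner's periodic "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" lines are trailing-window averages of the total environment frame counter, which is why the first report in each run shows nan: there is only one sample in the window so far. A toy reimplementation of such a windowed meter, illustrative only and not Sample Factory's actual code:

    from collections import deque
    import time

    class FpsMeter:
        def __init__(self, window_sec: float):
            self.window_sec = window_sec
            self.samples = deque()  # (timestamp, total_frames) pairs

        def update(self, total_frames: int, now: float | None = None) -> None:
            now = time.monotonic() if now is None else now
            self.samples.append((now, total_frames))
            # drop samples that fell out of the trailing window
            while self.samples and now - self.samples[0][0] > self.window_sec:
                self.samples.popleft()

        def fps(self) -> float:
            if len(self.samples) < 2:
                return float("nan")  # mirrors the "10 sec: nan" first report
            (t0, f0), (t1, f1) = self.samples[0], self.samples[-1]
            return (f1 - f0) / (t1 - t0) if t1 > t0 else float("nan")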
[2023-08-17 12:09:31,504][131877] Updated weights for policy 0, policy_version 540 (0.0006) [2023-08-17 12:09:32,471][131877] Updated weights for policy 0, policy_version 550 (0.0006) [2023-08-17 12:09:33,439][131877] Updated weights for policy 0, policy_version 560 (0.0006) [2023-08-17 12:09:34,495][131877] Updated weights for policy 0, policy_version 570 (0.0007) [2023-08-17 12:09:35,483][131877] Updated weights for policy 0, policy_version 580 (0.0006) [2023-08-17 12:09:36,065][131794] Fps is (10 sec: 40550.4, 60 sec: 37774.3, 300 sec: 37774.3). Total num frames: 2396160. Throughput: 0: 8903.2. Samples: 400642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-08-17 12:09:36,066][131794] Avg episode reward: [(0, '13.981')] [2023-08-17 12:09:36,069][131864] Saving new best policy, reward=13.981! [2023-08-17 12:09:36,506][131877] Updated weights for policy 0, policy_version 590 (0.0007) [2023-08-17 12:09:37,541][131877] Updated weights for policy 0, policy_version 600 (0.0006) [2023-08-17 12:09:38,535][131877] Updated weights for policy 0, policy_version 610 (0.0007) [2023-08-17 12:09:39,574][131877] Updated weights for policy 0, policy_version 620 (0.0006) [2023-08-17 12:09:40,629][131877] Updated weights for policy 0, policy_version 630 (0.0007) [2023-08-17 12:09:41,065][131794] Fps is (10 sec: 40550.4, 60 sec: 38010.9, 300 sec: 38010.9). Total num frames: 2596864. Throughput: 0: 10165.2. Samples: 460746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 12:09:41,066][131794] Avg episode reward: [(0, '16.720')] [2023-08-17 12:09:41,067][131864] Saving new best policy, reward=16.720! [2023-08-17 12:09:41,662][131877] Updated weights for policy 0, policy_version 640 (0.0006) [2023-08-17 12:09:42,705][131877] Updated weights for policy 0, policy_version 650 (0.0007) [2023-08-17 12:09:43,682][131877] Updated weights for policy 0, policy_version 660 (0.0006) [2023-08-17 12:09:44,707][131877] Updated weights for policy 0, policy_version 670 (0.0007) [2023-08-17 12:09:45,654][131877] Updated weights for policy 0, policy_version 680 (0.0006) [2023-08-17 12:09:46,065][131794] Fps is (10 sec: 40550.4, 60 sec: 38279.1, 300 sec: 38279.1). Total num frames: 2801664. Throughput: 0: 10135.5. Samples: 521158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-08-17 12:09:46,066][131794] Avg episode reward: [(0, '15.224')] [2023-08-17 12:09:46,644][131877] Updated weights for policy 0, policy_version 690 (0.0006) [2023-08-17 12:09:47,596][131877] Updated weights for policy 0, policy_version 700 (0.0006) [2023-08-17 12:09:48,566][131877] Updated weights for policy 0, policy_version 710 (0.0006) [2023-08-17 12:09:49,544][131877] Updated weights for policy 0, policy_version 720 (0.0006) [2023-08-17 12:09:50,566][131877] Updated weights for policy 0, policy_version 730 (0.0007) [2023-08-17 12:09:51,065][131794] Fps is (10 sec: 41369.5, 60 sec: 38570.7, 300 sec: 38570.7). Total num frames: 3010560. Throughput: 0: 10141.9. Samples: 552690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 12:09:51,066][131794] Avg episode reward: [(0, '16.889')] [2023-08-17 12:09:51,067][131864] Saving new best policy, reward=16.889! 
[2023-08-17 12:09:51,497][131877] Updated weights for policy 0, policy_version 740 (0.0006)
[2023-08-17 12:09:52,458][131877] Updated weights for policy 0, policy_version 750 (0.0006)
[2023-08-17 12:09:53,420][131877] Updated weights for policy 0, policy_version 760 (0.0006)
[2023-08-17 12:09:54,449][131877] Updated weights for policy 0, policy_version 770 (0.0007)
[2023-08-17 12:09:55,425][131877] Updated weights for policy 0, policy_version 780 (0.0007)
[2023-08-17 12:09:56,065][131794] Fps is (10 sec: 41779.2, 60 sec: 40891.8, 300 sec: 38817.5). Total num frames: 3219456. Throughput: 0: 10201.7. Samples: 615458. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-17 12:09:56,066][131794] Avg episode reward: [(0, '18.774')]
[2023-08-17 12:09:56,069][131864] Saving new best policy, reward=18.774!
[2023-08-17 12:09:56,434][131877] Updated weights for policy 0, policy_version 790 (0.0006)
[2023-08-17 12:09:57,381][131877] Updated weights for policy 0, policy_version 800 (0.0007)
[2023-08-17 12:09:58,323][131877] Updated weights for policy 0, policy_version 810 (0.0006)
[2023-08-17 12:09:59,287][131877] Updated weights for policy 0, policy_version 820 (0.0006)
[2023-08-17 12:10:00,214][131877] Updated weights for policy 0, policy_version 830 (0.0006)
[2023-08-17 12:10:01,065][131794] Fps is (10 sec: 42188.9, 60 sec: 40960.0, 300 sec: 39087.6). Total num frames: 3432448. Throughput: 0: 10286.3. Samples: 679604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-17 12:10:01,066][131794] Avg episode reward: [(0, '21.885')]
[2023-08-17 12:10:01,067][131864] Saving new best policy, reward=21.885!
[2023-08-17 12:10:01,201][131877] Updated weights for policy 0, policy_version 840 (0.0006)
[2023-08-17 12:10:02,187][131877] Updated weights for policy 0, policy_version 850 (0.0006)
[2023-08-17 12:10:03,190][131877] Updated weights for policy 0, policy_version 860 (0.0007)
[2023-08-17 12:10:04,150][131877] Updated weights for policy 0, policy_version 870 (0.0006)
[2023-08-17 12:10:05,158][131877] Updated weights for policy 0, policy_version 880 (0.0006)
[2023-08-17 12:10:06,065][131794] Fps is (10 sec: 41779.0, 60 sec: 40891.7, 300 sec: 39212.4). Total num frames: 3637248. Throughput: 0: 10318.1. Samples: 710680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2023-08-17 12:10:06,066][131794] Avg episode reward: [(0, '19.364')]
[2023-08-17 12:10:06,176][131877] Updated weights for policy 0, policy_version 890 (0.0007)
[2023-08-17 12:10:07,147][131877] Updated weights for policy 0, policy_version 900 (0.0006)
[2023-08-17 12:10:08,101][131877] Updated weights for policy 0, policy_version 910 (0.0007)
[2023-08-17 12:10:09,146][131877] Updated weights for policy 0, policy_version 920 (0.0007)
[2023-08-17 12:10:10,146][131877] Updated weights for policy 0, policy_version 930 (0.0006)
[2023-08-17 12:10:11,065][131794] Fps is (10 sec: 41369.6, 60 sec: 41164.8, 300 sec: 39372.8). Total num frames: 3846144. Throughput: 0: 10327.7. Samples: 772274. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-08-17 12:10:11,066][131794] Avg episode reward: [(0, '19.007')]
[2023-08-17 12:10:11,109][131877] Updated weights for policy 0, policy_version 940 (0.0006)
[2023-08-17 12:10:12,081][131877] Updated weights for policy 0, policy_version 950 (0.0007)
[2023-08-17 12:10:13,110][131877] Updated weights for policy 0, policy_version 960 (0.0007)
[2023-08-17 12:10:14,089][131877] Updated weights for policy 0, policy_version 970 (0.0006)
[2023-08-17 12:10:14,877][131864] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-08-17 12:10:14,877][131794] Component Batcher_0 stopped!
[2023-08-17 12:10:14,877][131864] Stopping Batcher_0...
[2023-08-17 12:10:14,889][131864] Loop batcher_evt_loop terminating...
[2023-08-17 12:10:14,890][131877] Weights refcount: 2 0
[2023-08-17 12:10:14,891][131877] Stopping InferenceWorker_p0-w0...
[2023-08-17 12:10:14,891][131877] Loop inference_proc0-0_evt_loop terminating...
[2023-08-17 12:10:14,891][131794] Component InferenceWorker_p0-w0 stopped!
[2023-08-17 12:10:14,910][131864] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-08-17 12:10:14,930][131881] Stopping RolloutWorker_w3...
[2023-08-17 12:10:14,930][131881] Loop rollout_proc3_evt_loop terminating...
[2023-08-17 12:10:14,930][131794] Component RolloutWorker_w3 stopped!
[2023-08-17 12:10:14,935][131882] Stopping RolloutWorker_w4...
[2023-08-17 12:10:14,935][131878] Stopping RolloutWorker_w1...
[2023-08-17 12:10:14,935][131878] Loop rollout_proc1_evt_loop terminating...
[2023-08-17 12:10:14,935][131882] Loop rollout_proc4_evt_loop terminating...
[2023-08-17 12:10:14,935][131794] Component RolloutWorker_w4 stopped!
[2023-08-17 12:10:14,936][131884] Stopping RolloutWorker_w6...
[2023-08-17 12:10:14,936][131794] Component RolloutWorker_w1 stopped!
[2023-08-17 12:10:14,936][131884] Loop rollout_proc6_evt_loop terminating...
[2023-08-17 12:10:14,936][131794] Component RolloutWorker_w6 stopped!
[2023-08-17 12:10:14,938][131879] Stopping RolloutWorker_w0...
[2023-08-17 12:10:14,938][131879] Loop rollout_proc0_evt_loop terminating...
[2023-08-17 12:10:14,938][131794] Component RolloutWorker_w0 stopped!
[2023-08-17 12:10:14,940][131883] Stopping RolloutWorker_w5...
[2023-08-17 12:10:14,940][131794] Component RolloutWorker_w5 stopped!
[2023-08-17 12:10:14,940][131883] Loop rollout_proc5_evt_loop terminating...
[2023-08-17 12:10:14,940][131885] Stopping RolloutWorker_w7...
[2023-08-17 12:10:14,940][131885] Loop rollout_proc7_evt_loop terminating...
[2023-08-17 12:10:14,940][131794] Component RolloutWorker_w7 stopped!
[2023-08-17 12:10:14,963][131880] Stopping RolloutWorker_w2...
[2023-08-17 12:10:14,963][131880] Loop rollout_proc2_evt_loop terminating...
[2023-08-17 12:10:14,963][131794] Component RolloutWorker_w2 stopped!
[2023-08-17 12:10:14,973][131864] Stopping LearnerWorker_p0...
[2023-08-17 12:10:14,974][131864] Loop learner_proc0_evt_loop terminating...
[2023-08-17 12:10:14,974][131794] Component LearnerWorker_p0 stopped!
[2023-08-17 12:10:14,975][131794] Waiting for process learner_proc0 to stop...
[2023-08-17 12:10:15,620][131794] Waiting for process inference_proc0-0 to join...
[2023-08-17 12:10:15,620][131794] Waiting for process rollout_proc0 to join...
[2023-08-17 12:10:15,629][131794] Waiting for process rollout_proc1 to join...
[2023-08-17 12:10:15,630][131794] Waiting for process rollout_proc2 to join...
[2023-08-17 12:10:15,630][131794] Waiting for process rollout_proc3 to join...
[2023-08-17 12:10:15,631][131794] Waiting for process rollout_proc4 to join...
[2023-08-17 12:10:15,632][131794] Waiting for process rollout_proc5 to join...
[2023-08-17 12:10:15,632][131794] Waiting for process rollout_proc6 to join...
[2023-08-17 12:10:15,633][131794] Waiting for process rollout_proc7 to join...
[2023-08-17 12:10:15,633][131794] Batcher 0 profile tree view:
batching: 6.7047, releasing_batches: 0.0087
[2023-08-17 12:10:15,634][131794] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 2.0680
update_model: 1.3116
  weight_update: 0.0007
one_step: 0.0012
  handle_policy_step: 74.7597
    deserialize: 3.1557, stack: 0.3403, obs_to_device_normalize: 17.0704, forward: 37.2802, send_messages: 4.8157
    prepare_outputs: 8.7975
      to_cpu: 5.6577
[2023-08-17 12:10:15,634][131794] Learner 0 profile tree view:
misc: 0.0028, prepare_batch: 4.0784
train: 9.7538
  epoch_init: 0.0026, minibatch_init: 0.0026, losses_postprocess: 0.2418, kl_divergence: 0.1909, after_optimizer: 0.2499
  calculate_losses: 3.4695
    losses_init: 0.0014, forward_head: 0.2537, bptt_initial: 2.0322, tail: 0.2342, advantages_returns: 0.0568, losses: 0.4473
    bptt: 0.3786
      bptt_forward_core: 0.3600
  update: 5.4474
    clip: 2.7340
[2023-08-17 12:10:15,635][131794] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0615, enqueue_policy_requests: 2.6644, env_step: 37.2031, overhead: 3.5782, complete_rollouts: 0.0881
save_policy_outputs: 3.6697
  split_output_tensors: 1.7077
[2023-08-17 12:10:15,635][131794] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0601, enqueue_policy_requests: 2.6081, env_step: 35.6438, overhead: 3.4090, complete_rollouts: 0.0854
save_policy_outputs: 3.4689
  split_output_tensors: 1.6294
[2023-08-17 12:10:15,636][131794] Loop Runner_EvtLoop terminating...
[2023-08-17 12:10:15,637][131794] Runner profile tree view:
main_loop: 87.5323
[2023-08-17 12:10:15,637][131794] Collected {0: 4005888}, FPS: 37809.7
[2023-08-17 12:11:06,264][131794] Loading existing experiment configuration from /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-08-17 12:11:06,264][131794] Overriding arg 'num_workers' with value 1 passed from command line
[2023-08-17 12:11:06,265][131794] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-08-17 12:11:06,265][131794] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-08-17 12:11:06,265][131794] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-08-17 12:11:06,266][131794] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-08-17 12:11:06,266][131794] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-08-17 12:11:06,266][131794] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-08-17 12:11:06,266][131794] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-08-17 12:11:06,267][131794] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-08-17 12:11:06,267][131794] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-08-17 12:11:06,267][131794] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-08-17 12:11:06,268][131794] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-08-17 12:11:06,268][131794] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-08-17 12:11:06,268][131794] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-08-17 12:11:06,274][131794] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:11:06,275][131794] RunningMeanStd input shape: (3, 72, 128)
[2023-08-17 12:11:06,275][131794] RunningMeanStd input shape: (1,)
[2023-08-17 12:11:06,283][131794] ConvEncoder: input_channels=3
[2023-08-17 12:11:06,348][131794] Conv encoder output size: 512
[2023-08-17 12:11:06,349][131794] Policy head output size: 512
[2023-08-17 12:11:07,473][131794] Loading state from checkpoint /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-08-17 12:11:08,166][131794] Num frames 100...
[2023-08-17 12:11:08,225][131794] Num frames 200...
[2023-08-17 12:11:08,283][131794] Num frames 300...
[2023-08-17 12:11:08,341][131794] Num frames 400...
[2023-08-17 12:11:08,400][131794] Num frames 500...
[2023-08-17 12:11:08,457][131794] Num frames 600...
[2023-08-17 12:11:08,516][131794] Num frames 700...
[2023-08-17 12:11:08,574][131794] Num frames 800...
[2023-08-17 12:11:08,632][131794] Num frames 900...
[2023-08-17 12:11:08,694][131794] Num frames 1000...
[2023-08-17 12:11:08,753][131794] Num frames 1100...
[2023-08-17 12:11:08,811][131794] Num frames 1200...
[2023-08-17 12:11:08,869][131794] Num frames 1300...
[2023-08-17 12:11:08,928][131794] Num frames 1400...
[2023-08-17 12:11:09,003][131794] Avg episode rewards: #0: 31.400, true rewards: #0: 14.400
[2023-08-17 12:11:09,004][131794] Avg episode reward: 31.400, avg true_objective: 14.400
[2023-08-17 12:11:09,040][131794] Num frames 1500...
[2023-08-17 12:11:09,097][131794] Num frames 1600...
[2023-08-17 12:11:09,154][131794] Num frames 1700...
[2023-08-17 12:11:09,212][131794] Num frames 1800...
[2023-08-17 12:11:09,297][131794] Avg episode rewards: #0: 18.780, true rewards: #0: 9.280
[2023-08-17 12:11:09,298][131794] Avg episode reward: 18.780, avg true_objective: 9.280
[2023-08-17 12:11:09,323][131794] Num frames 1900...
[2023-08-17 12:11:09,382][131794] Num frames 2000...
[2023-08-17 12:11:09,439][131794] Num frames 2100...
[2023-08-17 12:11:09,498][131794] Num frames 2200...
[2023-08-17 12:11:09,574][131794] Avg episode rewards: #0: 14.467, true rewards: #0: 7.467
[2023-08-17 12:11:09,574][131794] Avg episode reward: 14.467, avg true_objective: 7.467
[2023-08-17 12:11:09,610][131794] Num frames 2300...
[2023-08-17 12:11:09,670][131794] Num frames 2400...
[2023-08-17 12:11:09,729][131794] Num frames 2500...
[2023-08-17 12:11:09,789][131794] Num frames 2600...
[2023-08-17 12:11:09,847][131794] Num frames 2700...
[2023-08-17 12:11:09,906][131794] Num frames 2800...
[2023-08-17 12:11:09,963][131794] Num frames 2900...
[2023-08-17 12:11:10,021][131794] Num frames 3000...
[2023-08-17 12:11:10,079][131794] Num frames 3100...
[2023-08-17 12:11:10,138][131794] Num frames 3200...
[2023-08-17 12:11:10,200][131794] Num frames 3300...
[2023-08-17 12:11:10,259][131794] Num frames 3400...
[2023-08-17 12:11:10,317][131794] Num frames 3500...
[2023-08-17 12:11:10,375][131794] Num frames 3600...
[2023-08-17 12:11:10,433][131794] Num frames 3700...
[2023-08-17 12:11:10,492][131794] Num frames 3800...
[2023-08-17 12:11:10,550][131794] Num frames 3900...
[2023-08-17 12:11:10,610][131794] Num frames 4000...
[2023-08-17 12:11:10,668][131794] Num frames 4100...
[2023-08-17 12:11:10,727][131794] Num frames 4200...
[2023-08-17 12:11:10,794][131794] Avg episode rewards: #0: 22.560, true rewards: #0: 10.560
[2023-08-17 12:11:10,794][131794] Avg episode reward: 22.560, avg true_objective: 10.560
[2023-08-17 12:11:10,840][131794] Num frames 4300...
[2023-08-17 12:11:10,898][131794] Num frames 4400...
[2023-08-17 12:11:10,959][131794] Num frames 4500...
[2023-08-17 12:11:11,017][131794] Num frames 4600...
[2023-08-17 12:11:11,076][131794] Num frames 4700...
[2023-08-17 12:11:11,134][131794] Num frames 4800...
[2023-08-17 12:11:11,242][131794] Avg episode rewards: #0: 20.392, true rewards: #0: 9.792
[2023-08-17 12:11:11,243][131794] Avg episode reward: 20.392, avg true_objective: 9.792
[2023-08-17 12:11:11,246][131794] Num frames 4900...
[2023-08-17 12:11:11,303][131794] Num frames 5000...
[2023-08-17 12:11:11,363][131794] Num frames 5100...
[2023-08-17 12:11:11,421][131794] Num frames 5200...
[2023-08-17 12:11:11,479][131794] Num frames 5300...
[2023-08-17 12:11:11,537][131794] Num frames 5400...
[2023-08-17 12:11:11,594][131794] Num frames 5500...
[2023-08-17 12:11:11,651][131794] Num frames 5600...
[2023-08-17 12:11:11,709][131794] Num frames 5700...
[2023-08-17 12:11:11,799][131794] Avg episode rewards: #0: 19.933, true rewards: #0: 9.600
[2023-08-17 12:11:11,799][131794] Avg episode reward: 19.933, avg true_objective: 9.600
[2023-08-17 12:11:11,823][131794] Num frames 5800...
[2023-08-17 12:11:11,881][131794] Num frames 5900...
[2023-08-17 12:11:11,940][131794] Num frames 6000...
[2023-08-17 12:11:11,998][131794] Num frames 6100...
[2023-08-17 12:11:12,056][131794] Num frames 6200...
[2023-08-17 12:11:12,113][131794] Num frames 6300...
[2023-08-17 12:11:12,170][131794] Num frames 6400...
[2023-08-17 12:11:12,260][131794] Avg episode rewards: #0: 19.234, true rewards: #0: 9.234
[2023-08-17 12:11:12,260][131794] Avg episode reward: 19.234, avg true_objective: 9.234
[2023-08-17 12:11:12,282][131794] Num frames 6500...
[2023-08-17 12:11:12,340][131794] Num frames 6600...
[2023-08-17 12:11:12,398][131794] Num frames 6700...
[2023-08-17 12:11:12,455][131794] Num frames 6800...
[2023-08-17 12:11:12,513][131794] Num frames 6900...
[2023-08-17 12:11:12,609][131794] Avg episode rewards: #0: 17.720, true rewards: #0: 8.720
[2023-08-17 12:11:12,609][131794] Avg episode reward: 17.720, avg true_objective: 8.720
[2023-08-17 12:11:12,624][131794] Num frames 7000...
[2023-08-17 12:11:12,683][131794] Num frames 7100...
[2023-08-17 12:11:12,757][131794] Avg episode rewards: #0: 16.040, true rewards: #0: 7.929
[2023-08-17 12:11:12,757][131794] Avg episode reward: 16.040, avg true_objective: 7.929
[2023-08-17 12:11:12,794][131794] Num frames 7200...
[2023-08-17 12:11:12,852][131794] Num frames 7300...
[2023-08-17 12:11:12,910][131794] Num frames 7400...
[2023-08-17 12:11:12,968][131794] Num frames 7500...
[2023-08-17 12:11:13,027][131794] Num frames 7600...
[2023-08-17 12:11:13,084][131794] Num frames 7700...
[2023-08-17 12:11:13,142][131794] Num frames 7800...
[2023-08-17 12:11:13,200][131794] Num frames 7900...
[2023-08-17 12:11:13,298][131794] Avg episode rewards: #0: 16.278, true rewards: #0: 7.978
[2023-08-17 12:11:13,299][131794] Avg episode reward: 16.278, avg true_objective: 7.978
[2023-08-17 12:11:20,895][131794] Replay video saved to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
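The block above is the evaluation ("enjoy") pass: the saved experiment config is reloaded with a few overrides (a single worker, no rendering, ten episodes, save_video=True), the latest checkpoint is restored, and the episode frames are written to replay.mp4. Note that the per-episode summaries report both the shaped "Avg episode rewards" used during training and the environment's "true rewards" (the true objective). The sketch below is a hypothetical reconstruction of that invocation, with flag values taken from the "Adding new argument"/"Overriding arg" lines above; the sf_examples.vizdoom.enjoy_vizdoom module name is an assumption based on sample-factory 2.x.

    # Hypothetical reconstruction of the evaluation/video pass logged above.
    import sys
    from sf_examples.vizdoom.enjoy_vizdoom import main  # assumed SF 2.x entry point

    sys.argv = [
        "enjoy_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
    ]
    main()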
[2023-08-17 12:52:38,482][131794] Loading existing experiment configuration from /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-08-17 12:52:38,483][131794] Overriding arg 'num_workers' with value 1 passed from command line
[2023-08-17 12:52:38,483][131794] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-08-17 12:52:38,483][131794] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-08-17 12:52:38,484][131794] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-08-17 12:52:38,484][131794] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-08-17 12:52:38,485][131794] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-08-17 12:52:38,485][131794] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-08-17 12:52:38,486][131794] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-08-17 12:52:38,486][131794] Adding new argument 'hf_repository'='patonw/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-08-17 12:52:38,486][131794] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-08-17 12:52:38,487][131794] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-08-17 12:52:38,487][131794] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-08-17 12:52:38,487][131794] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-08-17 12:52:38,488][131794] Using frameskip 1 and render_action_repeat=4 for evaluation
[2023-08-17 12:52:38,491][131794] RunningMeanStd input shape: (3, 72, 128)
[2023-08-17 12:52:38,492][131794] RunningMeanStd input shape: (1,)
[2023-08-17 12:52:38,497][131794] ConvEncoder: input_channels=3
[2023-08-17 12:52:38,518][131794] Conv encoder output size: 512
[2023-08-17 12:52:38,519][131794] Policy head output size: 512
[2023-08-17 12:52:38,534][131794] Loading state from checkpoint /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-08-17 12:52:38,832][131794] Num frames 100...
[2023-08-17 12:52:38,889][131794] Num frames 200...
[2023-08-17 12:52:38,952][131794] Num frames 300...
[2023-08-17 12:52:39,009][131794] Num frames 400...
[2023-08-17 12:52:39,065][131794] Num frames 500...
[2023-08-17 12:52:39,122][131794] Num frames 600...
[2023-08-17 12:52:39,178][131794] Num frames 700...
[2023-08-17 12:52:39,237][131794] Num frames 800...
[2023-08-17 12:52:39,293][131794] Num frames 900...
[2023-08-17 12:52:39,349][131794] Num frames 1000...
[2023-08-17 12:52:39,406][131794] Num frames 1100...
[2023-08-17 12:52:39,462][131794] Num frames 1200...
[2023-08-17 12:52:39,520][131794] Num frames 1300...
[2023-08-17 12:52:39,576][131794] Num frames 1400...
[2023-08-17 12:52:39,633][131794] Num frames 1500...
[2023-08-17 12:52:39,691][131794] Num frames 1600...
[2023-08-17 12:52:39,748][131794] Num frames 1700...
[2023-08-17 12:52:39,804][131794] Num frames 1800...
[2023-08-17 12:52:39,860][131794] Num frames 1900...
[2023-08-17 12:52:39,917][131794] Num frames 2000...
[2023-08-17 12:52:39,997][131794] Avg episode rewards: #0: 47.479, true rewards: #0: 20.480
[2023-08-17 12:52:39,998][131794] Avg episode reward: 47.479, avg true_objective: 20.480
[2023-08-17 12:52:40,027][131794] Num frames 2100...
[2023-08-17 12:52:40,084][131794] Num frames 2200...
[2023-08-17 12:52:40,142][131794] Num frames 2300...
[2023-08-17 12:52:40,199][131794] Num frames 2400...
[2023-08-17 12:52:40,256][131794] Num frames 2500...
[2023-08-17 12:52:40,312][131794] Num frames 2600...
[2023-08-17 12:52:40,368][131794] Num frames 2700...
[2023-08-17 12:52:40,424][131794] Num frames 2800...
[2023-08-17 12:52:40,522][131794] Avg episode rewards: #0: 30.900, true rewards: #0: 14.400
[2023-08-17 12:52:40,523][131794] Avg episode reward: 30.900, avg true_objective: 14.400
[2023-08-17 12:52:40,535][131794] Num frames 2900...
[2023-08-17 12:52:40,591][131794] Num frames 3000...
[2023-08-17 12:52:40,648][131794] Num frames 3100...
[2023-08-17 12:52:40,704][131794] Num frames 3200...
[2023-08-17 12:52:40,760][131794] Num frames 3300...
[2023-08-17 12:52:40,815][131794] Num frames 3400...
[2023-08-17 12:52:40,899][131794] Avg episode rewards: #0: 23.853, true rewards: #0: 11.520
[2023-08-17 12:52:40,899][131794] Avg episode reward: 23.853, avg true_objective: 11.520
[2023-08-17 12:52:40,925][131794] Num frames 3500...
[2023-08-17 12:52:40,981][131794] Num frames 3600...
[2023-08-17 12:52:41,039][131794] Num frames 3700...
[2023-08-17 12:52:41,096][131794] Num frames 3800...
[2023-08-17 12:52:41,153][131794] Num frames 3900...
[2023-08-17 12:52:41,211][131794] Num frames 4000...
[2023-08-17 12:52:41,270][131794] Num frames 4100...
[2023-08-17 12:52:41,328][131794] Num frames 4200...
[2023-08-17 12:52:41,386][131794] Num frames 4300...
[2023-08-17 12:52:41,446][131794] Num frames 4400...
[2023-08-17 12:52:41,525][131794] Avg episode rewards: #0: 22.870, true rewards: #0: 11.120
[2023-08-17 12:52:41,526][131794] Avg episode reward: 22.870, avg true_objective: 11.120
[2023-08-17 12:52:41,555][131794] Num frames 4500...
[2023-08-17 12:52:41,612][131794] Num frames 4600...
[2023-08-17 12:52:41,668][131794] Num frames 4700...
[2023-08-17 12:52:41,725][131794] Num frames 4800...
[2023-08-17 12:52:41,781][131794] Num frames 4900...
[2023-08-17 12:52:41,868][131794] Avg episode rewards: #0: 20.322, true rewards: #0: 9.922
[2023-08-17 12:52:41,869][131794] Avg episode reward: 20.322, avg true_objective: 9.922
[2023-08-17 12:52:41,891][131794] Num frames 5000...
[2023-08-17 12:52:41,948][131794] Num frames 5100...
[2023-08-17 12:52:42,004][131794] Num frames 5200...
[2023-08-17 12:52:42,063][131794] Num frames 5300...
[2023-08-17 12:52:42,119][131794] Num frames 5400...
[2023-08-17 12:52:42,175][131794] Num frames 5500...
[2023-08-17 12:52:42,231][131794] Num frames 5600...
[2023-08-17 12:52:42,287][131794] Num frames 5700...
[2023-08-17 12:52:42,344][131794] Num frames 5800...
[2023-08-17 12:52:42,400][131794] Num frames 5900...
[2023-08-17 12:52:42,456][131794] Num frames 6000...
[2023-08-17 12:52:42,513][131794] Num frames 6100...
[2023-08-17 12:52:42,567][131794] Num frames 6200...
[2023-08-17 12:52:42,621][131794] Num frames 6300...
[2023-08-17 12:52:42,675][131794] Num frames 6400...
[2023-08-17 12:52:42,762][131794] Avg episode rewards: #0: 22.108, true rewards: #0: 10.775
[2023-08-17 12:52:42,763][131794] Avg episode reward: 22.108, avg true_objective: 10.775
[2023-08-17 12:52:42,781][131794] Num frames 6500...
[2023-08-17 12:52:42,836][131794] Num frames 6600...
[2023-08-17 12:52:42,889][131794] Num frames 6700...
[2023-08-17 12:52:42,944][131794] Num frames 6800...
[2023-08-17 12:52:42,997][131794] Num frames 6900...
[2023-08-17 12:52:43,051][131794] Num frames 7000...
[2023-08-17 12:52:43,107][131794] Num frames 7100...
[2023-08-17 12:52:43,162][131794] Num frames 7200...
[2023-08-17 12:52:43,217][131794] Num frames 7300...
[2023-08-17 12:52:43,285][131794] Avg episode rewards: #0: 21.751, true rewards: #0: 10.466
[2023-08-17 12:52:43,285][131794] Avg episode reward: 21.751, avg true_objective: 10.466
[2023-08-17 12:52:43,325][131794] Num frames 7400...
[2023-08-17 12:52:43,381][131794] Num frames 7500...
[2023-08-17 12:52:43,435][131794] Num frames 7600...
[2023-08-17 12:52:43,490][131794] Num frames 7700...
[2023-08-17 12:52:43,544][131794] Num frames 7800...
[2023-08-17 12:52:43,602][131794] Num frames 7900...
[2023-08-17 12:52:43,657][131794] Num frames 8000...
[2023-08-17 12:52:43,711][131794] Num frames 8100...
[2023-08-17 12:52:43,766][131794] Num frames 8200...
[2023-08-17 12:52:43,824][131794] Num frames 8300...
[2023-08-17 12:52:43,881][131794] Num frames 8400...
[2023-08-17 12:52:43,959][131794] Avg episode rewards: #0: 22.183, true rewards: #0: 10.557
[2023-08-17 12:52:43,960][131794] Avg episode reward: 22.183, avg true_objective: 10.557
[2023-08-17 12:52:43,991][131794] Num frames 8500...
[2023-08-17 12:52:44,048][131794] Num frames 8600...
[2023-08-17 12:52:44,105][131794] Num frames 8700...
[2023-08-17 12:52:44,162][131794] Num frames 8800...
[2023-08-17 12:52:44,218][131794] Num frames 8900...
[2023-08-17 12:52:44,272][131794] Num frames 9000...
[2023-08-17 12:52:44,328][131794] Num frames 9100...
[2023-08-17 12:52:44,383][131794] Num frames 9200...
[2023-08-17 12:52:44,437][131794] Num frames 9300...
[2023-08-17 12:52:44,492][131794] Num frames 9400...
[2023-08-17 12:52:44,548][131794] Avg episode rewards: #0: 22.229, true rewards: #0: 10.451
[2023-08-17 12:52:44,549][131794] Avg episode reward: 22.229, avg true_objective: 10.451
[2023-08-17 12:52:44,599][131794] Num frames 9500...
[2023-08-17 12:52:44,655][131794] Num frames 9600...
[2023-08-17 12:52:44,709][131794] Num frames 9700...
[2023-08-17 12:52:44,763][131794] Num frames 9800...
[2023-08-17 12:52:44,835][131794] Avg episode rewards: #0: 20.835, true rewards: #0: 9.835
[2023-08-17 12:52:44,835][131794] Avg episode reward: 20.835, avg true_objective: 9.835
[2023-08-17 12:52:54,306][131794] Replay video saved to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
[2023-08-17 12:54:20,339][131794] The model has been pushed to https://huggingface.co/patonw/rl_course_vizdoom_health_gathering_supreme
[2023-08-17 12:59:09,422][131794] Environment doom_basic already registered, overwriting...
[2023-08-17 12:59:09,423][131794] Environment doom_two_colors_easy already registered, overwriting...
[2023-08-17 12:59:09,423][131794] Environment doom_two_colors_hard already registered, overwriting...
[2023-08-17 12:59:09,424][131794] Environment doom_dm already registered, overwriting...
[2023-08-17 12:59:09,424][131794] Environment doom_dwango5 already registered, overwriting...
[2023-08-17 12:59:09,424][131794] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-08-17 12:59:09,425][131794] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-08-17 12:59:09,425][131794] Environment doom_my_way_home already registered, overwriting...
[2023-08-17 12:59:09,425][131794] Environment doom_deadly_corridor already registered, overwriting...
[2023-08-17 12:59:09,425][131794] Environment doom_defend_the_center already registered, overwriting...
[2023-08-17 12:59:09,426][131794] Environment doom_defend_the_line already registered, overwriting...
[2023-08-17 12:59:09,426][131794] Environment doom_health_gathering already registered, overwriting...
[2023-08-17 12:59:09,426][131794] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-08-17 12:59:09,427][131794] Environment doom_battle already registered, overwriting...
[2023-08-17 12:59:09,427][131794] Environment doom_battle2 already registered, overwriting...
[2023-08-17 12:59:09,427][131794] Environment doom_duel_bots already registered, overwriting...
[2023-08-17 12:59:09,427][131794] Environment doom_deathmatch_bots already registered, overwriting...
[2023-08-17 12:59:09,428][131794] Environment doom_duel already registered, overwriting...
[2023-08-17 12:59:09,428][131794] Environment doom_deathmatch_full already registered, overwriting...
[2023-08-17 12:59:09,428][131794] Environment doom_benchmark already registered, overwriting...
[2023-08-17 12:59:09,429][131794] register_encoder_factory:
[2023-08-17 12:59:29,604][131794] Environment doom_basic already registered, overwriting...
[2023-08-17 12:59:29,605][131794] Environment doom_two_colors_easy already registered, overwriting...
[2023-08-17 12:59:29,606][131794] Environment doom_two_colors_hard already registered, overwriting...
[2023-08-17 12:59:29,606][131794] Environment doom_dm already registered, overwriting...
[2023-08-17 12:59:29,606][131794] Environment doom_dwango5 already registered, overwriting...
[2023-08-17 12:59:29,607][131794] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2023-08-17 12:59:29,607][131794] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2023-08-17 12:59:29,607][131794] Environment doom_my_way_home already registered, overwriting...
[2023-08-17 12:59:29,608][131794] Environment doom_deadly_corridor already registered, overwriting...
[2023-08-17 12:59:29,608][131794] Environment doom_defend_the_center already registered, overwriting...
[2023-08-17 12:59:29,608][131794] Environment doom_defend_the_line already registered, overwriting...
[2023-08-17 12:59:29,609][131794] Environment doom_health_gathering already registered, overwriting...
[2023-08-17 12:59:29,609][131794] Environment doom_health_gathering_supreme already registered, overwriting...
[2023-08-17 12:59:29,609][131794] Environment doom_battle already registered, overwriting...
[2023-08-17 12:59:29,610][131794] Environment doom_battle2 already registered, overwriting...
[2023-08-17 12:59:29,610][131794] Environment doom_duel_bots already registered, overwriting...
[2023-08-17 12:59:29,611][131794] Environment doom_deathmatch_bots already registered, overwriting...
[2023-08-17 12:59:29,611][131794] Environment doom_duel already registered, overwriting...
[2023-08-17 12:59:29,611][131794] Environment doom_deathmatch_full already registered, overwriting...
[2023-08-17 12:59:29,612][131794] Environment doom_benchmark already registered, overwriting...
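Between the video render and the resumed training below, the enjoy script was run once more with push-to-hub enabled: it re-evaluated the checkpoint, regenerated replay.mp4, and uploaded the model to the patonw/rl_course_vizdoom_health_gathering_supreme repository. The "Environment ... already registered, overwriting" and register_encoder_factory lines around here are ordinary re-initialization noise from importing the VizDoom components again in the same process. The sketch below is a hypothetical reconstruction of the push invocation, with flag values taken from the "Adding new argument" lines above (same assumed sample-factory 2.x entry point as before).

    # Hypothetical reconstruction of the push-to-hub pass logged above.
    import sys
    from sf_examples.vizdoom.enjoy_vizdoom import main  # assumed SF 2.x entry point

    sys.argv = [
        "enjoy_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
        "--max_num_frames=100000",
        "--push_to_hub",
        "--hf_repository=patonw/rl_course_vizdoom_health_gathering_supreme",
    ]
    main()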
[2023-08-17 12:59:29,612][131794] register_encoder_factory:
[2023-08-17 12:59:29,630][131794] Loading existing experiment configuration from /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-08-17 12:59:29,631][131794] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line
[2023-08-17 12:59:29,631][131794] Overriding arg 'with_wandb' with value True passed from command line
[2023-08-17 12:59:29,634][131794] Experiment dir /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists!
[2023-08-17 12:59:29,635][131794] Resuming existing experiment from /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment...
[2023-08-17 12:59:29,635][131794] Weights and Biases integration enabled. Project: sample_factory, user: None, group: None, unique_id: default_experiment_20230817_125929_635646
[2023-08-17 12:59:29,819][131794] Initializing WandB...
[2023-08-17 12:59:34,788][131794] Environment var CUDA_VISIBLE_DEVICES is 0
[2023-08-17 12:59:35,803][131794] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=10000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=True
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=336df5a551fea3a2cf40925bf3083db6b4518c91
git_repo_name=https://github.com/huggingface/deep-rl-class
wandb_unique_id=default_experiment_20230817_125929_635646
[2023-08-17 12:59:35,804][131794] Saving configuration to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
[2023-08-17 12:59:35,879][131794] Rollout worker 0 uses device cpu
[2023-08-17 12:59:35,879][131794] Rollout worker 1 uses device cpu
[2023-08-17 12:59:35,880][131794] Rollout worker 2 uses device cpu
[2023-08-17 12:59:35,881][131794] Rollout worker 3 uses device cpu
[2023-08-17 12:59:35,881][131794] Rollout worker 4 uses device cpu
[2023-08-17 12:59:35,882][131794] Rollout worker 5 uses device cpu
[2023-08-17 12:59:35,882][131794] Rollout worker 6 uses device cpu
[2023-08-17 12:59:35,883][131794] Rollout worker 7 uses device cpu
[2023-08-17 12:59:35,911][131794] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:59:35,912][131794] InferenceWorker_p0-w0: min num requests: 2
[2023-08-17 12:59:35,935][131794] Starting all processes...
[2023-08-17 12:59:35,936][131794] Starting process learner_proc0
[2023-08-17 12:59:35,984][131794] Starting all processes...
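This second run resumes the existing experiment rather than starting fresh: the config above shows restart_behavior=resume and load_checkpoint_kind=latest, with train_for_env_steps raised to 10000000 and Weights & Biases logging enabled from the command line. The sketch below is a hypothetical reconstruction of the resume invocation (same assumed sample-factory 2.x entry point as before; per the "Overriding arg" lines, only the step budget and the wandb flag differ from the saved config).

    # Hypothetical reconstruction of the resumed run logged below.
    import sys
    from sf_examples.vizdoom.train_vizdoom import main  # assumed SF 2.x entry point

    sys.argv = [
        "train_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=10000000",  # overridden, was 4000000
        "--with_wandb=True",               # overridden, was False
    ]
    main()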
[2023-08-17 12:59:35,986][131794] Starting process inference_proc0-0
[2023-08-17 12:59:35,986][131794] Starting process rollout_proc0
[2023-08-17 12:59:35,986][131794] Starting process rollout_proc1
[2023-08-17 12:59:35,986][131794] Starting process rollout_proc2
[2023-08-17 12:59:35,987][131794] Starting process rollout_proc3
[2023-08-17 12:59:35,987][131794] Starting process rollout_proc4
[2023-08-17 12:59:35,988][131794] Starting process rollout_proc5
[2023-08-17 12:59:35,988][131794] Starting process rollout_proc6
[2023-08-17 12:59:35,989][131794] Starting process rollout_proc7
[2023-08-17 12:59:36,953][138062] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:59:36,953][138062] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2023-08-17 12:59:36,957][138062] Num visible devices: 1
[2023-08-17 12:59:36,973][138062] Starting seed is not provided
[2023-08-17 12:59:36,974][138062] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:59:36,974][138062] Initializing actor-critic model on device cuda:0
[2023-08-17 12:59:36,974][138062] RunningMeanStd input shape: (3, 72, 128)
[2023-08-17 12:59:36,975][138062] RunningMeanStd input shape: (1,)
[2023-08-17 12:59:36,983][138062] ConvEncoder: input_channels=3
[2023-08-17 12:59:37,000][138076] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:59:37,000][138076] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2023-08-17 12:59:37,004][138076] Num visible devices: 1
[2023-08-17 12:59:37,043][138062] Conv encoder output size: 512
[2023-08-17 12:59:37,043][138062] Policy head output size: 512
[2023-08-17 12:59:37,048][138077] Worker 1 uses CPU cores [3, 4, 5]
[2023-08-17 12:59:37,050][138062] Created Actor Critic model with architecture:
[2023-08-17 12:59:37,051][138062] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2023-08-17 12:59:37,051][138075] Worker 0 uses CPU cores [0, 1, 2]
[2023-08-17 12:59:37,061][138081] Worker 5 uses CPU cores [15, 16, 17]
[2023-08-17 12:59:37,061][138082] Worker 6 uses CPU cores [18, 19, 20]
[2023-08-17 12:59:37,061][138083] Worker 7 uses CPU cores [21, 22, 23]
[2023-08-17 12:59:37,065][138078] Worker 2 uses CPU cores [6, 7, 8]
[2023-08-17 12:59:37,075][138079] Worker 3 uses CPU cores [9, 10, 11]
[2023-08-17 12:59:37,079][138080] Worker 4 uses CPU cores [12, 13, 14]
[2023-08-17 12:59:37,161][138062] Using optimizer
[2023-08-17 12:59:37,161][138062] Loading state from checkpoint /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2023-08-17 12:59:37,184][138062] Loading model from checkpoint
[2023-08-17 12:59:37,187][138062] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2023-08-17 12:59:37,187][138062] Initialized policy 0 weights for model version 978
[2023-08-17 12:59:37,188][138062] LearnerWorker_p0 finished initialization!
[2023-08-17 12:59:37,188][138062] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2023-08-17 12:59:37,231][138076] RunningMeanStd input shape: (3, 72, 128)
[2023-08-17 12:59:37,232][138076] RunningMeanStd input shape: (1,)
[2023-08-17 12:59:37,238][138076] ConvEncoder: input_channels=3
[2023-08-17 12:59:37,290][138076] Conv encoder output size: 512
[2023-08-17 12:59:37,290][138076] Policy head output size: 512
[2023-08-17 12:59:37,315][131794] Inference worker 0-0 is ready!
[2023-08-17 12:59:37,316][131794] All inference workers are ready! Signal rollout workers to start!
[2023-08-17 12:59:37,333][138083] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:59:37,334][138079] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:59:37,334][138080] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:59:37,334][138075] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:59:37,335][138077] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:59:37,335][138081] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:59:37,335][138078] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:59:37,336][138082] Doom resolution: 160x120, resize resolution: (128, 72)
[2023-08-17 12:59:37,546][138075] Decorrelating experience for 0 frames...
[2023-08-17 12:59:37,547][138083] Decorrelating experience for 0 frames...
[2023-08-17 12:59:37,557][138077] Decorrelating experience for 0 frames...
[2023-08-17 12:59:37,734][138083] Decorrelating experience for 32 frames...
[2023-08-17 12:59:37,734][138079] Decorrelating experience for 0 frames...
[2023-08-17 12:59:37,734][138078] Decorrelating experience for 0 frames...
[2023-08-17 12:59:37,740][138077] Decorrelating experience for 32 frames...
[2023-08-17 12:59:37,785][138075] Decorrelating experience for 32 frames...
[2023-08-17 12:59:37,920][138078] Decorrelating experience for 32 frames...
[2023-08-17 12:59:37,941][138083] Decorrelating experience for 64 frames...
[2023-08-17 12:59:37,941][138080] Decorrelating experience for 0 frames...
[2023-08-17 12:59:37,951][138077] Decorrelating experience for 64 frames...
[2023-08-17 12:59:37,959][138079] Decorrelating experience for 32 frames...
[2023-08-17 12:59:38,127][138078] Decorrelating experience for 64 frames...
[2023-08-17 12:59:38,138][138080] Decorrelating experience for 32 frames...
[2023-08-17 12:59:38,145][138075] Decorrelating experience for 64 frames...
[2023-08-17 12:59:38,164][138083] Decorrelating experience for 96 frames...
[2023-08-17 12:59:38,164][138077] Decorrelating experience for 96 frames...
[2023-08-17 12:59:38,172][138079] Decorrelating experience for 64 frames...
[2023-08-17 12:59:38,335][138078] Decorrelating experience for 96 frames...
[2023-08-17 12:59:38,375][138081] Decorrelating experience for 0 frames...
[2023-08-17 12:59:38,386][138075] Decorrelating experience for 96 frames...
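The learner picks up exactly where the first run stopped: it restores checkpoint_000000978_4005888.pth with train_step=978 and env_steps=4005888, so model versions in this run continue from 978 rather than 0, and the rollout workers decorrelate experience (the surrounding lines) before collection begins. A quick way to inspect such a checkpoint is sketched below; the dictionary key names are an assumption inferred from the train_step/env_steps values the log prints, not confirmed API.

    # Minimal sketch: peek inside the checkpoint the learner just restored.
    # Key names are assumptions inferred from the log line above; the path
    # is relative to the notebook's working directory.
    import torch

    ckpt = torch.load(
        "train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth",
        map_location="cpu",
    )
    print(ckpt.get("train_step"), ckpt.get("env_steps"))  # expect: 978 4005888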
[2023-08-17 12:59:38,408][138079] Decorrelating experience for 96 frames... [2023-08-17 12:59:38,420][138080] Decorrelating experience for 64 frames... [2023-08-17 12:59:38,628][138082] Decorrelating experience for 0 frames... [2023-08-17 12:59:38,630][138081] Decorrelating experience for 32 frames... [2023-08-17 12:59:38,750][138062] Signal inference workers to stop experience collection... [2023-08-17 12:59:38,756][138076] InferenceWorker_p0-w0: stopping experience collection [2023-08-17 12:59:38,821][138082] Decorrelating experience for 32 frames... [2023-08-17 12:59:38,835][138081] Decorrelating experience for 64 frames... [2023-08-17 12:59:38,836][138080] Decorrelating experience for 96 frames... [2023-08-17 12:59:39,025][138082] Decorrelating experience for 64 frames... [2023-08-17 12:59:39,042][138081] Decorrelating experience for 96 frames... [2023-08-17 12:59:39,218][138082] Decorrelating experience for 96 frames... [2023-08-17 12:59:39,282][138062] Signal inference workers to resume experience collection... [2023-08-17 12:59:39,283][138076] InferenceWorker_p0-w0: resuming experience collection [2023-08-17 12:59:39,788][131794] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4014080. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-08-17 12:59:39,789][131794] Avg episode reward: [(0, '6.493')] [2023-08-17 12:59:40,549][138076] Updated weights for policy 0, policy_version 988 (0.0184) [2023-08-17 12:59:41,527][138076] Updated weights for policy 0, policy_version 998 (0.0006) [2023-08-17 12:59:42,585][138076] Updated weights for policy 0, policy_version 1008 (0.0006) [2023-08-17 12:59:43,648][138076] Updated weights for policy 0, policy_version 1018 (0.0007) [2023-08-17 12:59:44,652][138076] Updated weights for policy 0, policy_version 1028 (0.0006) [2023-08-17 12:59:44,788][131794] Fps is (10 sec: 40141.5, 60 sec: 40141.5, 300 sec: 40141.5). Total num frames: 4214784. Throughput: 0: 7708.9. Samples: 38544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 12:59:44,789][131794] Avg episode reward: [(0, '20.021')] [2023-08-17 12:59:45,583][138076] Updated weights for policy 0, policy_version 1038 (0.0006) [2023-08-17 12:59:46,560][138076] Updated weights for policy 0, policy_version 1048 (0.0006) [2023-08-17 12:59:47,593][138076] Updated weights for policy 0, policy_version 1058 (0.0006) [2023-08-17 12:59:48,603][138076] Updated weights for policy 0, policy_version 1068 (0.0006) [2023-08-17 12:59:49,605][138076] Updated weights for policy 0, policy_version 1078 (0.0006) [2023-08-17 12:59:49,788][131794] Fps is (10 sec: 40550.4, 60 sec: 40550.4, 300 sec: 40550.4). Total num frames: 4419584. Throughput: 0: 10010.8. Samples: 100108. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-08-17 12:59:49,789][131794] Avg episode reward: [(0, '21.241')] [2023-08-17 12:59:50,586][138076] Updated weights for policy 0, policy_version 1088 (0.0006) [2023-08-17 12:59:51,564][138076] Updated weights for policy 0, policy_version 1098 (0.0006) [2023-08-17 12:59:52,582][138076] Updated weights for policy 0, policy_version 1108 (0.0006) [2023-08-17 12:59:53,540][138076] Updated weights for policy 0, policy_version 1118 (0.0006) [2023-08-17 12:59:54,551][138076] Updated weights for policy 0, policy_version 1128 (0.0006) [2023-08-17 12:59:54,788][131794] Fps is (10 sec: 41369.6, 60 sec: 40960.3, 300 sec: 40960.3). Total num frames: 4628480. Throughput: 0: 8754.6. Samples: 131318. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2023-08-17 12:59:54,789][131794] Avg episode reward: [(0, '19.294')] [2023-08-17 12:59:55,528][138076] Updated weights for policy 0, policy_version 1138 (0.0006) [2023-08-17 12:59:55,905][131794] Heartbeat connected on Batcher_0 [2023-08-17 12:59:55,915][131794] Heartbeat connected on LearnerWorker_p0 [2023-08-17 12:59:55,916][131794] Heartbeat connected on InferenceWorker_p0-w0 [2023-08-17 12:59:55,917][131794] Heartbeat connected on RolloutWorker_w0 [2023-08-17 12:59:55,919][131794] Heartbeat connected on RolloutWorker_w1 [2023-08-17 12:59:55,922][131794] Heartbeat connected on RolloutWorker_w2 [2023-08-17 12:59:55,925][131794] Heartbeat connected on RolloutWorker_w3 [2023-08-17 12:59:55,928][131794] Heartbeat connected on RolloutWorker_w4 [2023-08-17 12:59:55,929][131794] Heartbeat connected on RolloutWorker_w5 [2023-08-17 12:59:55,932][131794] Heartbeat connected on RolloutWorker_w6 [2023-08-17 12:59:55,934][131794] Heartbeat connected on RolloutWorker_w7 [2023-08-17 12:59:56,515][138076] Updated weights for policy 0, policy_version 1148 (0.0006) [2023-08-17 12:59:57,482][138076] Updated weights for policy 0, policy_version 1158 (0.0006) [2023-08-17 12:59:58,514][138076] Updated weights for policy 0, policy_version 1168 (0.0006) [2023-08-17 12:59:59,555][138076] Updated weights for policy 0, policy_version 1178 (0.0006) [2023-08-17 12:59:59,788][131794] Fps is (10 sec: 41369.8, 60 sec: 40960.1, 300 sec: 40960.1). Total num frames: 4833280. Throughput: 0: 9661.5. Samples: 193230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 12:59:59,789][131794] Avg episode reward: [(0, '25.107')] [2023-08-17 12:59:59,790][138062] Saving new best policy, reward=25.107! [2023-08-17 13:00:00,580][138076] Updated weights for policy 0, policy_version 1188 (0.0006) [2023-08-17 13:00:01,571][138076] Updated weights for policy 0, policy_version 1198 (0.0006) [2023-08-17 13:00:02,515][138076] Updated weights for policy 0, policy_version 1208 (0.0006) [2023-08-17 13:00:03,491][138076] Updated weights for policy 0, policy_version 1218 (0.0006) [2023-08-17 13:00:04,455][138076] Updated weights for policy 0, policy_version 1228 (0.0006) [2023-08-17 13:00:04,788][131794] Fps is (10 sec: 41369.4, 60 sec: 41123.9, 300 sec: 41123.9). Total num frames: 5042176. Throughput: 0: 10216.7. Samples: 255418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:00:04,789][131794] Avg episode reward: [(0, '23.711')] [2023-08-17 13:00:05,441][138076] Updated weights for policy 0, policy_version 1238 (0.0006) [2023-08-17 13:00:06,429][138076] Updated weights for policy 0, policy_version 1248 (0.0006) [2023-08-17 13:00:07,404][138076] Updated weights for policy 0, policy_version 1258 (0.0006) [2023-08-17 13:00:08,373][138076] Updated weights for policy 0, policy_version 1268 (0.0006) [2023-08-17 13:00:09,421][138076] Updated weights for policy 0, policy_version 1278 (0.0006) [2023-08-17 13:00:09,788][131794] Fps is (10 sec: 41369.8, 60 sec: 41096.7, 300 sec: 41096.7). Total num frames: 5246976. Throughput: 0: 9549.6. Samples: 286486. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:00:09,789][131794] Avg episode reward: [(0, '20.401')] [2023-08-17 13:00:10,433][138076] Updated weights for policy 0, policy_version 1288 (0.0007) [2023-08-17 13:00:11,471][138076] Updated weights for policy 0, policy_version 1298 (0.0007) [2023-08-17 13:00:12,500][138076] Updated weights for policy 0, policy_version 1308 (0.0006) [2023-08-17 13:00:13,520][138076] Updated weights for policy 0, policy_version 1318 (0.0006) [2023-08-17 13:00:14,499][138076] Updated weights for policy 0, policy_version 1328 (0.0006) [2023-08-17 13:00:14,788][131794] Fps is (10 sec: 40550.6, 60 sec: 40960.1, 300 sec: 40960.1). Total num frames: 5447680. Throughput: 0: 9908.0. Samples: 346780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:00:14,789][131794] Avg episode reward: [(0, '21.480')] [2023-08-17 13:00:15,502][138076] Updated weights for policy 0, policy_version 1338 (0.0006) [2023-08-17 13:00:16,525][138076] Updated weights for policy 0, policy_version 1348 (0.0007) [2023-08-17 13:00:17,539][138076] Updated weights for policy 0, policy_version 1358 (0.0006) [2023-08-17 13:00:18,483][138076] Updated weights for policy 0, policy_version 1368 (0.0006) [2023-08-17 13:00:19,457][138076] Updated weights for policy 0, policy_version 1378 (0.0006) [2023-08-17 13:00:19,788][131794] Fps is (10 sec: 40959.7, 60 sec: 41062.4, 300 sec: 41062.4). Total num frames: 5656576. Throughput: 0: 10223.4. Samples: 408936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:00:19,789][131794] Avg episode reward: [(0, '24.208')] [2023-08-17 13:00:20,437][138076] Updated weights for policy 0, policy_version 1388 (0.0006) [2023-08-17 13:00:21,479][138076] Updated weights for policy 0, policy_version 1398 (0.0007) [2023-08-17 13:00:22,461][138076] Updated weights for policy 0, policy_version 1408 (0.0007) [2023-08-17 13:00:23,400][138076] Updated weights for policy 0, policy_version 1418 (0.0006) [2023-08-17 13:00:24,390][138076] Updated weights for policy 0, policy_version 1428 (0.0006) [2023-08-17 13:00:24,788][131794] Fps is (10 sec: 41779.0, 60 sec: 41142.1, 300 sec: 41142.1). Total num frames: 5865472. Throughput: 0: 9769.6. Samples: 439632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:00:24,789][131794] Avg episode reward: [(0, '23.461')] [2023-08-17 13:00:25,374][138076] Updated weights for policy 0, policy_version 1438 (0.0006) [2023-08-17 13:00:26,415][138076] Updated weights for policy 0, policy_version 1448 (0.0007) [2023-08-17 13:00:27,435][138076] Updated weights for policy 0, policy_version 1458 (0.0006) [2023-08-17 13:00:28,377][138076] Updated weights for policy 0, policy_version 1468 (0.0006) [2023-08-17 13:00:29,386][138076] Updated weights for policy 0, policy_version 1478 (0.0007) [2023-08-17 13:00:29,788][131794] Fps is (10 sec: 41369.8, 60 sec: 41123.9, 300 sec: 41123.9). Total num frames: 6070272. Throughput: 0: 10296.5. Samples: 501888. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:00:29,789][131794] Avg episode reward: [(0, '23.247')] [2023-08-17 13:00:30,352][138076] Updated weights for policy 0, policy_version 1488 (0.0006) [2023-08-17 13:00:31,330][138076] Updated weights for policy 0, policy_version 1498 (0.0006) [2023-08-17 13:00:32,335][138076] Updated weights for policy 0, policy_version 1508 (0.0006) [2023-08-17 13:00:33,303][138076] Updated weights for policy 0, policy_version 1518 (0.0006) [2023-08-17 13:00:34,304][138076] Updated weights for policy 0, policy_version 1528 (0.0006) [2023-08-17 13:00:34,788][131794] Fps is (10 sec: 41369.8, 60 sec: 41183.5, 300 sec: 41183.5). Total num frames: 6279168. Throughput: 0: 10313.4. Samples: 564208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:00:34,789][131794] Avg episode reward: [(0, '21.354')] [2023-08-17 13:00:35,279][138076] Updated weights for policy 0, policy_version 1538 (0.0006) [2023-08-17 13:00:36,307][138076] Updated weights for policy 0, policy_version 1548 (0.0007) [2023-08-17 13:00:37,355][138076] Updated weights for policy 0, policy_version 1558 (0.0007) [2023-08-17 13:00:38,294][138076] Updated weights for policy 0, policy_version 1568 (0.0007) [2023-08-17 13:00:39,268][138076] Updated weights for policy 0, policy_version 1578 (0.0006) [2023-08-17 13:00:39,788][131794] Fps is (10 sec: 41369.6, 60 sec: 41164.9, 300 sec: 41164.9). Total num frames: 6483968. Throughput: 0: 10290.4. Samples: 594388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-08-17 13:00:39,789][131794] Avg episode reward: [(0, '20.492')] [2023-08-17 13:00:40,270][138076] Updated weights for policy 0, policy_version 1588 (0.0007) [2023-08-17 13:00:41,245][138076] Updated weights for policy 0, policy_version 1598 (0.0006) [2023-08-17 13:00:42,195][138076] Updated weights for policy 0, policy_version 1608 (0.0006) [2023-08-17 13:00:43,197][138076] Updated weights for policy 0, policy_version 1618 (0.0006) [2023-08-17 13:00:44,192][138076] Updated weights for policy 0, policy_version 1628 (0.0007) [2023-08-17 13:00:44,788][131794] Fps is (10 sec: 41369.3, 60 sec: 41301.3, 300 sec: 41212.1). Total num frames: 6692864. Throughput: 0: 10311.1. Samples: 657232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:00:44,789][131794] Avg episode reward: [(0, '22.781')] [2023-08-17 13:00:45,189][138076] Updated weights for policy 0, policy_version 1638 (0.0006) [2023-08-17 13:00:46,149][138076] Updated weights for policy 0, policy_version 1648 (0.0006) [2023-08-17 13:00:47,125][138076] Updated weights for policy 0, policy_version 1658 (0.0006) [2023-08-17 13:00:48,117][138076] Updated weights for policy 0, policy_version 1668 (0.0006) [2023-08-17 13:00:49,070][138076] Updated weights for policy 0, policy_version 1678 (0.0006) [2023-08-17 13:00:49,788][131794] Fps is (10 sec: 41778.5, 60 sec: 41369.5, 300 sec: 41252.5). Total num frames: 6901760. Throughput: 0: 10331.5. Samples: 720336. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-08-17 13:00:49,789][131794] Avg episode reward: [(0, '23.809')] [2023-08-17 13:00:49,994][138076] Updated weights for policy 0, policy_version 1688 (0.0006) [2023-08-17 13:00:50,994][138076] Updated weights for policy 0, policy_version 1698 (0.0007) [2023-08-17 13:00:51,961][138076] Updated weights for policy 0, policy_version 1708 (0.0006) [2023-08-17 13:00:52,902][138076] Updated weights for policy 0, policy_version 1718 (0.0006) [2023-08-17 13:00:53,878][138076] Updated weights for policy 0, policy_version 1728 (0.0007) [2023-08-17 13:00:54,788][131794] Fps is (10 sec: 41779.3, 60 sec: 41369.6, 300 sec: 41287.7). Total num frames: 7110656. Throughput: 0: 10348.9. Samples: 752188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-08-17 13:00:54,789][131794] Avg episode reward: [(0, '20.533')] [2023-08-17 13:00:54,908][138076] Updated weights for policy 0, policy_version 1738 (0.0007) [2023-08-17 13:00:55,862][138076] Updated weights for policy 0, policy_version 1748 (0.0006) [2023-08-17 13:00:56,857][138076] Updated weights for policy 0, policy_version 1758 (0.0006) [2023-08-17 13:00:57,828][138076] Updated weights for policy 0, policy_version 1768 (0.0006) [2023-08-17 13:00:58,775][138076] Updated weights for policy 0, policy_version 1778 (0.0006) [2023-08-17 13:00:59,711][138076] Updated weights for policy 0, policy_version 1788 (0.0006) [2023-08-17 13:00:59,788][131794] Fps is (10 sec: 42189.4, 60 sec: 41506.1, 300 sec: 41369.6). Total num frames: 7323648. Throughput: 0: 10401.9. Samples: 814868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:00:59,789][131794] Avg episode reward: [(0, '24.820')] [2023-08-17 13:01:00,668][138076] Updated weights for policy 0, policy_version 1798 (0.0007) [2023-08-17 13:01:01,641][138076] Updated weights for policy 0, policy_version 1808 (0.0006) [2023-08-17 13:01:02,611][138076] Updated weights for policy 0, policy_version 1818 (0.0006) [2023-08-17 13:01:03,492][138076] Updated weights for policy 0, policy_version 1828 (0.0005) [2023-08-17 13:01:04,409][138076] Updated weights for policy 0, policy_version 1838 (0.0006) [2023-08-17 13:01:04,788][131794] Fps is (10 sec: 43417.8, 60 sec: 41711.0, 300 sec: 41538.3). Total num frames: 7544832. Throughput: 0: 10472.4. Samples: 880194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:01:04,789][131794] Avg episode reward: [(0, '21.285')] [2023-08-17 13:01:05,381][138076] Updated weights for policy 0, policy_version 1848 (0.0006) [2023-08-17 13:01:06,393][138076] Updated weights for policy 0, policy_version 1858 (0.0006) [2023-08-17 13:01:07,365][138076] Updated weights for policy 0, policy_version 1868 (0.0006) [2023-08-17 13:01:08,372][138076] Updated weights for policy 0, policy_version 1878 (0.0006) [2023-08-17 13:01:09,387][138076] Updated weights for policy 0, policy_version 1888 (0.0006) [2023-08-17 13:01:09,788][131794] Fps is (10 sec: 42188.8, 60 sec: 41642.6, 300 sec: 41460.6). Total num frames: 7745536. Throughput: 0: 10490.1. Samples: 911686. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-08-17 13:01:09,789][131794] Avg episode reward: [(0, '23.546')] [2023-08-17 13:01:10,409][138076] Updated weights for policy 0, policy_version 1898 (0.0006) [2023-08-17 13:01:11,405][138076] Updated weights for policy 0, policy_version 1908 (0.0006) [2023-08-17 13:01:12,388][138076] Updated weights for policy 0, policy_version 1918 (0.0006) [2023-08-17 13:01:13,419][138076] Updated weights for policy 0, policy_version 1928 (0.0006) [2023-08-17 13:01:14,389][138076] Updated weights for policy 0, policy_version 1938 (0.0006) [2023-08-17 13:01:14,788][131794] Fps is (10 sec: 40959.9, 60 sec: 41779.2, 300 sec: 41477.4). Total num frames: 7954432. Throughput: 0: 10458.9. Samples: 972538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2023-08-17 13:01:14,789][131794] Avg episode reward: [(0, '22.888')] [2023-08-17 13:01:15,393][138076] Updated weights for policy 0, policy_version 1948 (0.0006) [2023-08-17 13:01:16,338][138076] Updated weights for policy 0, policy_version 1958 (0.0006) [2023-08-17 13:01:17,292][138076] Updated weights for policy 0, policy_version 1968 (0.0005) [2023-08-17 13:01:18,289][138076] Updated weights for policy 0, policy_version 1978 (0.0006) [2023-08-17 13:01:19,274][138076] Updated weights for policy 0, policy_version 1988 (0.0006) [2023-08-17 13:01:19,788][131794] Fps is (10 sec: 41779.1, 60 sec: 41779.2, 300 sec: 41492.5). Total num frames: 8163328. Throughput: 0: 10474.6. Samples: 1035566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:01:19,789][131794] Avg episode reward: [(0, '23.391')] [2023-08-17 13:01:20,269][138076] Updated weights for policy 0, policy_version 1998 (0.0006) [2023-08-17 13:01:21,317][138076] Updated weights for policy 0, policy_version 2008 (0.0007) [2023-08-17 13:01:22,285][138076] Updated weights for policy 0, policy_version 2018 (0.0006) [2023-08-17 13:01:23,339][138076] Updated weights for policy 0, policy_version 2028 (0.0007) [2023-08-17 13:01:24,378][138076] Updated weights for policy 0, policy_version 2038 (0.0007) [2023-08-17 13:01:24,788][131794] Fps is (10 sec: 40549.8, 60 sec: 41574.3, 300 sec: 41389.1). Total num frames: 8359936. Throughput: 0: 10485.4. Samples: 1066232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-08-17 13:01:24,789][131794] Avg episode reward: [(0, '24.663')] [2023-08-17 13:01:25,431][138076] Updated weights for policy 0, policy_version 2048 (0.0007) [2023-08-17 13:01:26,428][138076] Updated weights for policy 0, policy_version 2058 (0.0006) [2023-08-17 13:01:27,401][138076] Updated weights for policy 0, policy_version 2068 (0.0006) [2023-08-17 13:01:28,336][138076] Updated weights for policy 0, policy_version 2078 (0.0007) [2023-08-17 13:01:29,286][138076] Updated weights for policy 0, policy_version 2088 (0.0006) [2023-08-17 13:01:29,788][131794] Fps is (10 sec: 40550.2, 60 sec: 41642.6, 300 sec: 41406.8). Total num frames: 8568832. Throughput: 0: 10439.8. Samples: 1127022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2023-08-17 13:01:29,790][131794] Avg episode reward: [(0, '25.178')] [2023-08-17 13:01:29,791][138062] Saving new best policy, reward=25.178! 
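The recurring "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entries report throughput over 10-, 60- and 300-second sliding windows. A minimal sketch of how such windowed rates can be derived from (timestamp, total-frame) samples — an illustration only, not Sample Factory's actual implementation:

import time
from collections import deque

class WindowedFps:
    """Frames-per-second over several sliding windows (in seconds)."""

    def __init__(self, windows=(10, 60, 300)):
        self.windows = windows
        self.samples = deque()  # (timestamp, total_frames) pairs

    def record(self, total_frames):
        now = time.time()
        self.samples.append((now, total_frames))
        # keep just enough history for the largest window
        while now - self.samples[0][0] > max(self.windows):
            self.samples.popleft()

    def rates(self):
        now, latest = self.samples[-1]
        out = {}
        for w in self.windows:
            # oldest retained sample that still falls inside this window
            old_t, old_f = next((t, f) for t, f in self.samples if now - t <= w)
            dt = now - old_t
            out[w] = (latest - old_f) / dt if dt > 0 else float("nan")
        return out

With only a single sample in a window the rate is undefined, which is why the very first report of each run shows nan throughput (as in the report at 13:09:13 after the resume, further down in this log).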
[2023-08-17 13:01:30,320][138076] Updated weights for policy 0, policy_version 2098 (0.0007) [2023-08-17 13:01:31,287][138076] Updated weights for policy 0, policy_version 2108 (0.0006) [2023-08-17 13:01:32,279][138076] Updated weights for policy 0, policy_version 2118 (0.0006) [2023-08-17 13:01:33,314][138076] Updated weights for policy 0, policy_version 2128 (0.0007) [2023-08-17 13:01:34,294][138076] Updated weights for policy 0, policy_version 2138 (0.0006) [2023-08-17 13:01:34,788][131794] Fps is (10 sec: 41370.4, 60 sec: 41574.4, 300 sec: 41387.4). Total num frames: 8773632. Throughput: 0: 10409.6. Samples: 1188768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:01:34,789][131794] Avg episode reward: [(0, '26.292')] [2023-08-17 13:01:34,791][138062] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002142_8773632.pth... [2023-08-17 13:01:34,827][138062] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000170_696320.pth [2023-08-17 13:01:34,834][138062] Saving new best policy, reward=26.292! [2023-08-17 13:01:35,358][138076] Updated weights for policy 0, policy_version 2148 (0.0007) [2023-08-17 13:01:36,310][138076] Updated weights for policy 0, policy_version 2158 (0.0006) [2023-08-17 13:01:37,294][138076] Updated weights for policy 0, policy_version 2168 (0.0006) [2023-08-17 13:01:38,264][138076] Updated weights for policy 0, policy_version 2178 (0.0006) [2023-08-17 13:01:39,278][138076] Updated weights for policy 0, policy_version 2188 (0.0007) [2023-08-17 13:01:39,788][131794] Fps is (10 sec: 41370.0, 60 sec: 41642.7, 300 sec: 41403.8). Total num frames: 8982528. Throughput: 0: 10381.0. Samples: 1219334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2023-08-17 13:01:39,789][131794] Avg episode reward: [(0, '21.438')] [2023-08-17 13:01:40,280][138076] Updated weights for policy 0, policy_version 2198 (0.0006) [2023-08-17 13:01:41,229][138076] Updated weights for policy 0, policy_version 2208 (0.0006) [2023-08-17 13:01:42,216][138076] Updated weights for policy 0, policy_version 2218 (0.0006) [2023-08-17 13:01:43,186][138076] Updated weights for policy 0, policy_version 2228 (0.0006) [2023-08-17 13:01:44,149][138076] Updated weights for policy 0, policy_version 2238 (0.0006) [2023-08-17 13:01:44,788][131794] Fps is (10 sec: 41779.0, 60 sec: 41642.7, 300 sec: 41418.8). Total num frames: 9191424. Throughput: 0: 10379.3. Samples: 1281936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:01:44,789][131794] Avg episode reward: [(0, '22.403')] [2023-08-17 13:01:45,098][138076] Updated weights for policy 0, policy_version 2248 (0.0006) [2023-08-17 13:01:46,042][138076] Updated weights for policy 0, policy_version 2258 (0.0006) [2023-08-17 13:01:47,015][138076] Updated weights for policy 0, policy_version 2268 (0.0006) [2023-08-17 13:01:47,992][138076] Updated weights for policy 0, policy_version 2278 (0.0007) [2023-08-17 13:01:49,037][138076] Updated weights for policy 0, policy_version 2288 (0.0007) [2023-08-17 13:01:49,788][131794] Fps is (10 sec: 41779.1, 60 sec: 41642.8, 300 sec: 41432.6). Total num frames: 9400320. Throughput: 0: 10329.8. Samples: 1345036. 
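The pattern just above — save the newest checkpoint, delete the oldest, and separately snapshot a "best" policy whenever the average reward improves — follows from keep_checkpoints=2, save_best_metric=reward and save_best_after=100000 in the configuration dumped later in this log. A rough sketch of that rotation logic (illustrative only, not the library's code; the zero-padded checkpoint_<version>_<env_steps>.pth naming mirrors the files in the log, while "best.pth" is a simplified stand-in):

import glob
import os
import torch

def save_with_rotation(state, ckpt_dir, policy_version, env_steps, keep_checkpoints=2):
    """Save a new checkpoint, then prune the oldest ones beyond the keep limit."""
    name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
    torch.save(state, os.path.join(ckpt_dir, name))
    # zero-padded versions sort lexicographically, oldest first
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in ckpts[:-keep_checkpoints]:
        os.remove(old)

def maybe_save_best(state, ckpt_dir, reward, best_so_far, env_steps, save_best_after=100000):
    """Keep a separate 'best' snapshot once enough env steps have elapsed."""
    if env_steps >= save_best_after and (best_so_far is None or reward > best_so_far):
        torch.save(state, os.path.join(ckpt_dir, "best.pth"))
        return reward
    return best_so_far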
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-08-17 13:01:49,789][131794] Avg episode reward: [(0, '24.155')] [2023-08-17 13:01:50,078][138076] Updated weights for policy 0, policy_version 2298 (0.0006) [2023-08-17 13:01:51,011][138076] Updated weights for policy 0, policy_version 2308 (0.0006) [2023-08-17 13:01:52,009][138076] Updated weights for policy 0, policy_version 2318 (0.0006) [2023-08-17 13:01:52,933][138076] Updated weights for policy 0, policy_version 2328 (0.0006) [2023-08-17 13:01:53,988][138076] Updated weights for policy 0, policy_version 2338 (0.0007) [2023-08-17 13:01:54,788][131794] Fps is (10 sec: 41779.5, 60 sec: 41642.7, 300 sec: 41445.5). Total num frames: 9609216. Throughput: 0: 10323.1. Samples: 1376224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2023-08-17 13:01:54,789][131794] Avg episode reward: [(0, '23.394')] [2023-08-17 13:01:54,969][138076] Updated weights for policy 0, policy_version 2348 (0.0006) [2023-08-17 13:01:55,990][138076] Updated weights for policy 0, policy_version 2358 (0.0006) [2023-08-17 13:01:56,962][138076] Updated weights for policy 0, policy_version 2368 (0.0006) [2023-08-17 13:01:57,986][138076] Updated weights for policy 0, policy_version 2378 (0.0006) [2023-08-17 13:01:58,980][138076] Updated weights for policy 0, policy_version 2388 (0.0006) [2023-08-17 13:01:59,788][131794] Fps is (10 sec: 41369.7, 60 sec: 41506.1, 300 sec: 41428.1). Total num frames: 9814016. Throughput: 0: 10345.8. Samples: 1438100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-08-17 13:01:59,789][131794] Avg episode reward: [(0, '22.972')] [2023-08-17 13:01:59,927][138076] Updated weights for policy 0, policy_version 2398 (0.0006) [2023-08-17 13:02:00,931][138076] Updated weights for policy 0, policy_version 2408 (0.0007) [2023-08-17 13:02:01,923][138076] Updated weights for policy 0, policy_version 2418 (0.0006) [2023-08-17 13:02:02,877][138076] Updated weights for policy 0, policy_version 2428 (0.0005) [2023-08-17 13:02:03,905][138076] Updated weights for policy 0, policy_version 2438 (0.0007) [2023-08-17 13:02:04,439][138062] Stopping Batcher_0... [2023-08-17 13:02:04,440][138062] Loop batcher_evt_loop terminating... [2023-08-17 13:02:04,440][138062] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-08-17 13:02:04,439][131794] Component Batcher_0 stopped! [2023-08-17 13:02:04,455][138076] Weights refcount: 2 0 [2023-08-17 13:02:04,456][138076] Stopping InferenceWorker_p0-w0... [2023-08-17 13:02:04,456][138076] Loop inference_proc0-0_evt_loop terminating... [2023-08-17 13:02:04,456][131794] Component InferenceWorker_p0-w0 stopped! [2023-08-17 13:02:04,473][138062] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2023-08-17 13:02:04,477][138062] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-08-17 13:02:04,501][138080] Stopping RolloutWorker_w4... [2023-08-17 13:02:04,501][138080] Loop rollout_proc4_evt_loop terminating... [2023-08-17 13:02:04,501][131794] Component RolloutWorker_w4 stopped! [2023-08-17 13:02:04,507][138081] Stopping RolloutWorker_w5... [2023-08-17 13:02:04,507][138079] Stopping RolloutWorker_w3... [2023-08-17 13:02:04,507][138081] Loop rollout_proc5_evt_loop terminating... [2023-08-17 13:02:04,507][138079] Loop rollout_proc3_evt_loop terminating... 
[2023-08-17 13:02:04,507][131794] Component RolloutWorker_w3 stopped! [2023-08-17 13:02:04,508][131794] Component RolloutWorker_w5 stopped! [2023-08-17 13:02:04,511][138083] Stopping RolloutWorker_w7... [2023-08-17 13:02:04,511][138083] Loop rollout_proc7_evt_loop terminating... [2023-08-17 13:02:04,511][131794] Component RolloutWorker_w7 stopped! [2023-08-17 13:02:04,512][138078] Stopping RolloutWorker_w2... [2023-08-17 13:02:04,512][138078] Loop rollout_proc2_evt_loop terminating... [2023-08-17 13:02:04,513][138082] Stopping RolloutWorker_w6... [2023-08-17 13:02:04,512][131794] Component RolloutWorker_w2 stopped! [2023-08-17 13:02:04,513][138082] Loop rollout_proc6_evt_loop terminating... [2023-08-17 13:02:04,513][131794] Component RolloutWorker_w6 stopped! [2023-08-17 13:02:04,519][138077] Stopping RolloutWorker_w1... [2023-08-17 13:02:04,519][138077] Loop rollout_proc1_evt_loop terminating... [2023-08-17 13:02:04,519][131794] Component RolloutWorker_w1 stopped! [2023-08-17 13:02:04,535][138062] Stopping LearnerWorker_p0... [2023-08-17 13:02:04,536][138062] Loop learner_proc0_evt_loop terminating... [2023-08-17 13:02:04,535][131794] Component LearnerWorker_p0 stopped! [2023-08-17 13:02:04,539][138075] Stopping RolloutWorker_w0... [2023-08-17 13:02:04,540][138075] Loop rollout_proc0_evt_loop terminating... [2023-08-17 13:02:04,539][131794] Component RolloutWorker_w0 stopped! [2023-08-17 13:02:04,540][131794] Waiting for process learner_proc0 to stop... [2023-08-17 13:02:05,188][131794] Waiting for process inference_proc0-0 to join... [2023-08-17 13:02:05,189][131794] Waiting for process rollout_proc0 to join... [2023-08-17 13:02:05,190][131794] Waiting for process rollout_proc1 to join... [2023-08-17 13:02:05,190][131794] Waiting for process rollout_proc2 to join... [2023-08-17 13:02:05,191][131794] Waiting for process rollout_proc3 to join... [2023-08-17 13:02:05,191][131794] Waiting for process rollout_proc4 to join... [2023-08-17 13:02:05,192][131794] Waiting for process rollout_proc5 to join... [2023-08-17 13:02:05,193][131794] Waiting for process rollout_proc6 to join... [2023-08-17 13:02:05,193][131794] Waiting for process rollout_proc7 to join... 
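The shutdown sequence above is two-phase: every component's event loop is first signalled to terminate ("Stopping ... / Loop ... terminating..."), then the runner waits for each child process to stop and join. The same stop-then-join pattern in generic multiprocessing terms (a sketch; Sample Factory uses its own signal/event-loop machinery rather than this):

import multiprocessing as mp

def rollout_worker(stop_event, name):
    # stand-in for the worker's event loop: run until told to stop
    while not stop_event.wait(timeout=0.1):
        pass  # ... step environments, emit rollouts ...
    print(f"Loop {name}_evt_loop terminating...")

if __name__ == "__main__":
    stop = mp.Event()
    procs = [mp.Process(target=rollout_worker, args=(stop, f"rollout_proc{i}"))
             for i in range(8)]
    for p in procs:
        p.start()
    # ... training finishes ...
    stop.set()        # phase 1: signal every loop to stop
    for p in procs:   # phase 2: join each child process
        p.join()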
[2023-08-17 13:02:05,194][131794] Batcher 0 profile tree view:
batching: 12.6388, releasing_batches: 0.0155
[2023-08-17 13:02:05,194][131794] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 2.6989
update_model: 2.1930
  weight_update: 0.0007
one_step: 0.0019
  handle_policy_step: 134.6435
    deserialize: 6.0242, stack: 0.6403, obs_to_device_normalize: 30.9018, forward: 66.9499, send_messages: 8.4976
    prepare_outputs: 15.8515
      to_cpu: 10.1908
[2023-08-17 13:02:05,195][131794] Learner 0 profile tree view:
misc: 0.0062, prepare_batch: 6.1025
train: 17.4443
  epoch_init: 0.0048, minibatch_init: 0.0045, losses_postprocess: 0.4413, kl_divergence: 0.3416, after_optimizer: 0.4372
  calculate_losses: 6.4031
    losses_init: 0.0025, forward_head: 0.4213, bptt_initial: 3.8098, tail: 0.4256, advantages_returns: 0.1042, losses: 0.8304
    bptt: 0.6899
      bptt_forward_core: 0.6553
  update: 9.5344
    clip: 4.9986
[2023-08-17 13:02:05,195][131794] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1145, enqueue_policy_requests: 4.7133, env_step: 65.9256, overhead: 6.2646, complete_rollouts: 0.1526
save_policy_outputs: 6.3662
  split_output_tensors: 2.9670
[2023-08-17 13:02:05,195][131794] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1125, enqueue_policy_requests: 4.8606, env_step: 68.5610, overhead: 6.5985, complete_rollouts: 0.1617
save_policy_outputs: 6.9442
  split_output_tensors: 3.1602
[2023-08-17 13:02:05,196][131794] Loop Runner_EvtLoop terminating...
[2023-08-17 13:02:05,197][131794] Runner profile tree view:
main_loop: 149.2631
[2023-08-17 13:02:05,197][131794] Collected {0: 10006528}, FPS: 40201.8
[2023-08-17 13:02:20,239][131794] Loading existing experiment configuration from /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-08-17 13:02:20,239][131794] Overriding arg 'num_workers' with value 1 passed from command line
[2023-08-17 13:02:20,240][131794] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-08-17 13:02:20,240][131794] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-08-17 13:02:20,241][131794] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-08-17 13:02:20,241][131794] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-08-17 13:02:20,242][131794] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2023-08-17 13:02:20,243][131794] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-08-17 13:02:20,243][131794] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2023-08-17 13:02:20,244][131794] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2023-08-17 13:02:20,244][131794] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-08-17 13:02:20,245][131794] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-08-17 13:02:20,246][131794] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-08-17 13:02:20,246][131794] Adding new argument 'enjoy_script'=None that is not in the saved config file!
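The indentation in the profile tree views above is semantic: each child timer runs inside its parent, so to_cpu is part of prepare_outputs, which is part of handle_policy_step. A compact context-manager timer that produces this kind of report — a sketch, not the library's own timing utilities:

import time
from contextlib import contextmanager

class TimingTree:
    def __init__(self, name="root"):
        self.name, self.total, self.children = name, 0.0, {}

    @contextmanager
    def timeit(self, name):
        child = self.children.setdefault(name, TimingTree(name))
        start = time.perf_counter()
        try:
            yield child
        finally:
            child.total += time.perf_counter() - start

    def report(self, indent=0):
        for child in self.children.values():
            print(f"{'  ' * indent}{child.name}: {child.total:.4f}")
            child.report(indent + 1)

# usage resembling the inference worker's breakdown:
t = TimingTree()
with t.timeit("handle_policy_step") as step:
    with step.timeit("deserialize"):
        pass  # ... unpack observations ...
    with step.timeit("forward"):
        pass  # ... run the policy network ...
t.report()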
[2023-08-17 13:02:20,247][131794] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-08-17 13:02:20,250][131794] RunningMeanStd input shape: (3, 72, 128) [2023-08-17 13:02:20,251][131794] RunningMeanStd input shape: (1,) [2023-08-17 13:02:20,257][131794] ConvEncoder: input_channels=3 [2023-08-17 13:02:20,280][131794] Conv encoder output size: 512 [2023-08-17 13:02:20,281][131794] Policy head output size: 512 [2023-08-17 13:02:20,723][131794] Loading state from checkpoint /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-08-17 13:02:21,021][131794] Num frames 100... [2023-08-17 13:02:21,076][131794] Num frames 200... [2023-08-17 13:02:21,129][131794] Num frames 300... [2023-08-17 13:02:21,183][131794] Num frames 400... [2023-08-17 13:02:21,237][131794] Num frames 500... [2023-08-17 13:02:21,314][131794] Avg episode rewards: #0: 8.440, true rewards: #0: 5.440 [2023-08-17 13:02:21,315][131794] Avg episode reward: 8.440, avg true_objective: 5.440 [2023-08-17 13:02:21,348][131794] Num frames 600... [2023-08-17 13:02:21,402][131794] Num frames 700... [2023-08-17 13:02:21,457][131794] Num frames 800... [2023-08-17 13:02:21,512][131794] Num frames 900... [2023-08-17 13:02:21,567][131794] Num frames 1000... [2023-08-17 13:02:21,623][131794] Num frames 1100... [2023-08-17 13:02:21,679][131794] Num frames 1200... [2023-08-17 13:02:21,736][131794] Num frames 1300... [2023-08-17 13:02:21,793][131794] Num frames 1400... [2023-08-17 13:02:21,852][131794] Num frames 1500... [2023-08-17 13:02:21,912][131794] Num frames 1600... [2023-08-17 13:02:21,969][131794] Num frames 1700... [2023-08-17 13:02:22,026][131794] Num frames 1800... [2023-08-17 13:02:22,081][131794] Num frames 1900... [2023-08-17 13:02:22,137][131794] Num frames 2000... [2023-08-17 13:02:22,193][131794] Num frames 2100... [2023-08-17 13:02:22,251][131794] Num frames 2200... [2023-08-17 13:02:22,308][131794] Num frames 2300... [2023-08-17 13:02:22,367][131794] Num frames 2400... [2023-08-17 13:02:22,424][131794] Num frames 2500... [2023-08-17 13:02:22,482][131794] Num frames 2600... [2023-08-17 13:02:22,560][131794] Avg episode rewards: #0: 31.219, true rewards: #0: 13.220 [2023-08-17 13:02:22,560][131794] Avg episode reward: 31.219, avg true_objective: 13.220 [2023-08-17 13:02:22,593][131794] Num frames 2700... [2023-08-17 13:02:22,649][131794] Num frames 2800... [2023-08-17 13:02:22,705][131794] Num frames 2900... [2023-08-17 13:02:22,760][131794] Num frames 3000... [2023-08-17 13:02:22,817][131794] Num frames 3100... [2023-08-17 13:02:22,874][131794] Num frames 3200... [2023-08-17 13:02:22,931][131794] Num frames 3300... [2023-08-17 13:02:22,986][131794] Num frames 3400... [2023-08-17 13:02:23,040][131794] Num frames 3500... [2023-08-17 13:02:23,095][131794] Num frames 3600... [2023-08-17 13:02:23,150][131794] Num frames 3700... [2023-08-17 13:02:23,204][131794] Num frames 3800... [2023-08-17 13:02:23,310][131794] Avg episode rewards: #0: 30.323, true rewards: #0: 12.990 [2023-08-17 13:02:23,311][131794] Avg episode reward: 30.323, avg true_objective: 12.990 [2023-08-17 13:02:23,313][131794] Num frames 3900... [2023-08-17 13:02:23,369][131794] Num frames 4000... [2023-08-17 13:02:23,426][131794] Num frames 4100... [2023-08-17 13:02:23,484][131794] Num frames 4200... [2023-08-17 13:02:23,539][131794] Num frames 4300... [2023-08-17 13:02:23,595][131794] Num frames 4400... [2023-08-17 13:02:23,650][131794] Num frames 4500... 
[2023-08-17 13:02:23,706][131794] Num frames 4600... [2023-08-17 13:02:23,761][131794] Num frames 4700... [2023-08-17 13:02:23,816][131794] Num frames 4800... [2023-08-17 13:02:23,872][131794] Num frames 4900... [2023-08-17 13:02:23,929][131794] Num frames 5000... [2023-08-17 13:02:23,984][131794] Num frames 5100... [2023-08-17 13:02:24,039][131794] Num frames 5200... [2023-08-17 13:02:24,096][131794] Num frames 5300... [2023-08-17 13:02:24,152][131794] Num frames 5400... [2023-08-17 13:02:24,242][131794] Avg episode rewards: #0: 31.662, true rewards: #0: 13.662 [2023-08-17 13:02:24,243][131794] Avg episode reward: 31.662, avg true_objective: 13.662 [2023-08-17 13:02:24,263][131794] Num frames 5500... [2023-08-17 13:02:24,318][131794] Num frames 5600... [2023-08-17 13:02:24,374][131794] Num frames 5700... [2023-08-17 13:02:24,428][131794] Num frames 5800... [2023-08-17 13:02:24,483][131794] Num frames 5900... [2023-08-17 13:02:24,538][131794] Num frames 6000... [2023-08-17 13:02:24,631][131794] Avg episode rewards: #0: 27.146, true rewards: #0: 12.146 [2023-08-17 13:02:24,632][131794] Avg episode reward: 27.146, avg true_objective: 12.146 [2023-08-17 13:02:24,648][131794] Num frames 6100... [2023-08-17 13:02:24,706][131794] Num frames 6200... [2023-08-17 13:02:24,763][131794] Num frames 6300... [2023-08-17 13:02:24,818][131794] Num frames 6400... [2023-08-17 13:02:24,872][131794] Num frames 6500... [2023-08-17 13:02:24,937][131794] Avg episode rewards: #0: 23.535, true rewards: #0: 10.868 [2023-08-17 13:02:24,938][131794] Avg episode reward: 23.535, avg true_objective: 10.868 [2023-08-17 13:02:24,985][131794] Num frames 6600... [2023-08-17 13:02:25,040][131794] Num frames 6700... [2023-08-17 13:02:25,094][131794] Num frames 6800... [2023-08-17 13:02:25,150][131794] Num frames 6900... [2023-08-17 13:02:25,205][131794] Num frames 7000... [2023-08-17 13:02:25,260][131794] Num frames 7100... [2023-08-17 13:02:25,315][131794] Num frames 7200... [2023-08-17 13:02:25,416][131794] Avg episode rewards: #0: 21.984, true rewards: #0: 10.413 [2023-08-17 13:02:25,417][131794] Avg episode reward: 21.984, avg true_objective: 10.413 [2023-08-17 13:02:25,424][131794] Num frames 7300... [2023-08-17 13:02:25,479][131794] Num frames 7400... [2023-08-17 13:02:25,535][131794] Num frames 7500... [2023-08-17 13:02:25,591][131794] Num frames 7600... [2023-08-17 13:02:25,646][131794] Num frames 7700... [2023-08-17 13:02:25,746][131794] Avg episode rewards: #0: 20.484, true rewards: #0: 9.734 [2023-08-17 13:02:25,747][131794] Avg episode reward: 20.484, avg true_objective: 9.734 [2023-08-17 13:02:25,755][131794] Num frames 7800... [2023-08-17 13:02:25,809][131794] Num frames 7900... [2023-08-17 13:02:25,864][131794] Num frames 8000... [2023-08-17 13:02:25,918][131794] Num frames 8100... [2023-08-17 13:02:25,972][131794] Num frames 8200... [2023-08-17 13:02:26,027][131794] Num frames 8300... [2023-08-17 13:02:26,082][131794] Num frames 8400... [2023-08-17 13:02:26,167][131794] Avg episode rewards: #0: 19.510, true rewards: #0: 9.399 [2023-08-17 13:02:26,168][131794] Avg episode reward: 19.510, avg true_objective: 9.399 [2023-08-17 13:02:26,190][131794] Num frames 8500... [2023-08-17 13:02:26,245][131794] Num frames 8600... [2023-08-17 13:02:26,300][131794] Num frames 8700... [2023-08-17 13:02:26,355][131794] Num frames 8800... [2023-08-17 13:02:26,410][131794] Num frames 8900... 
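Each "Avg episode rewards" line in this evaluation is a running mean over the episodes finished so far, tracked separately for the shaped training reward and the environment's true objective. Back-solving from the first two reports: episode 1 scored 8.440 shaped / 5.440 true, so the second report of 31.219 / 13.220 implies episode 2 scored 53.998 / 21.000. A sketch of that bookkeeping (the episode-2 inputs below are derived from the log's averages, not logged directly):

# running means over finished episodes, as in the evaluation log above
shaped, true_obj = [], []

def on_episode_end(reward, true_reward):
    shaped.append(reward)
    true_obj.append(true_reward)
    print(f"Avg episode rewards: #0: {sum(shaped) / len(shaped):.3f}, "
          f"true rewards: #0: {sum(true_obj) / len(true_obj):.3f}")

on_episode_end(8.440, 5.440)    # -> 8.440 / 5.440 (first report)
on_episode_end(53.998, 21.000)  # -> 31.219 / 13.220 (second report)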
[2023-08-17 13:02:26,466][131794] Avg episode rewards: #0: 18.107, true rewards: #0: 8.907 [2023-08-17 13:02:26,467][131794] Avg episode reward: 18.107, avg true_objective: 8.907 [2023-08-17 13:02:34,995][131794] Replay video saved to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! [2023-08-17 13:03:03,228][131794] Loading existing experiment configuration from /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json [2023-08-17 13:03:03,228][131794] Overriding arg 'num_workers' with value 1 passed from command line [2023-08-17 13:03:03,229][131794] Adding new argument 'no_render'=True that is not in the saved config file! [2023-08-17 13:03:03,229][131794] Adding new argument 'save_video'=True that is not in the saved config file! [2023-08-17 13:03:03,230][131794] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2023-08-17 13:03:03,230][131794] Adding new argument 'video_name'=None that is not in the saved config file! [2023-08-17 13:03:03,230][131794] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2023-08-17 13:03:03,231][131794] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2023-08-17 13:03:03,231][131794] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2023-08-17 13:03:03,231][131794] Adding new argument 'hf_repository'='patonw/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2023-08-17 13:03:03,232][131794] Adding new argument 'policy_index'=0 that is not in the saved config file! [2023-08-17 13:03:03,232][131794] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2023-08-17 13:03:03,233][131794] Adding new argument 'train_script'=None that is not in the saved config file! [2023-08-17 13:03:03,233][131794] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2023-08-17 13:03:03,233][131794] Using frameskip 1 and render_action_repeat=4 for evaluation [2023-08-17 13:03:03,238][131794] RunningMeanStd input shape: (3, 72, 128) [2023-08-17 13:03:03,239][131794] RunningMeanStd input shape: (1,) [2023-08-17 13:03:03,245][131794] ConvEncoder: input_channels=3 [2023-08-17 13:03:03,266][131794] Conv encoder output size: 512 [2023-08-17 13:03:03,267][131794] Policy head output size: 512 [2023-08-17 13:03:03,283][131794] Loading state from checkpoint /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-08-17 13:03:03,601][131794] Num frames 100... [2023-08-17 13:03:03,655][131794] Num frames 200... [2023-08-17 13:03:03,719][131794] Num frames 300... [2023-08-17 13:03:03,778][131794] Num frames 400... [2023-08-17 13:03:03,835][131794] Num frames 500... [2023-08-17 13:03:03,888][131794] Num frames 600... [2023-08-17 13:03:03,942][131794] Num frames 700... [2023-08-17 13:03:03,995][131794] Num frames 800... [2023-08-17 13:03:04,049][131794] Num frames 900... [2023-08-17 13:03:04,102][131794] Num frames 1000... [2023-08-17 13:03:04,156][131794] Num frames 1100... [2023-08-17 13:03:04,210][131794] Num frames 1200... [2023-08-17 13:03:04,265][131794] Num frames 1300... [2023-08-17 13:03:04,319][131794] Num frames 1400... [2023-08-17 13:03:04,376][131794] Num frames 1500... [2023-08-17 13:03:04,430][131794] Num frames 1600... [2023-08-17 13:03:04,485][131794] Num frames 1700... 
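This second evaluation pass was configured with push_to_hub=True, hf_repository='patonw/rl_course_vizdoom_health_gathering_supreme' and max_num_frames=100000 (versus the earlier effectively unbounded run). In the unit8 notebook such a run is launched roughly as sketched below; parse_vizdoom_cfg stands in for the notebook's helper that registers the Doom environments and parses these flags, so treat it as an assumption:

from sample_factory.enjoy import enjoy

# parse_vizdoom_cfg: assumed notebook helper (registers VizDoom envs,
# parses Sample Factory CLI-style args with evaluation defaults)
cfg = parse_vizdoom_cfg(
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_frames=100000",
        "--max_num_episodes=10",
        "--push_to_hub",
        "--hf_repository=patonw/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
status = enjoy(cfg)  # rolls out episodes, saves replay.mp4, then uploads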
[2023-08-17 13:03:04,541][131794] Num frames 1800... [2023-08-17 13:03:04,597][131794] Num frames 1900... [2023-08-17 13:03:04,655][131794] Num frames 2000... [2023-08-17 13:03:04,715][131794] Num frames 2100... [2023-08-17 13:03:04,766][131794] Avg episode rewards: #0: 53.999, true rewards: #0: 21.000 [2023-08-17 13:03:04,767][131794] Avg episode reward: 53.999, avg true_objective: 21.000 [2023-08-17 13:03:04,825][131794] Num frames 2200... [2023-08-17 13:03:04,883][131794] Num frames 2300... [2023-08-17 13:03:04,942][131794] Num frames 2400... [2023-08-17 13:03:04,999][131794] Num frames 2500... [2023-08-17 13:03:05,056][131794] Num frames 2600... [2023-08-17 13:03:05,114][131794] Num frames 2700... [2023-08-17 13:03:05,173][131794] Num frames 2800... [2023-08-17 13:03:05,231][131794] Num frames 2900... [2023-08-17 13:03:05,291][131794] Num frames 3000... [2023-08-17 13:03:05,348][131794] Num frames 3100... [2023-08-17 13:03:05,408][131794] Num frames 3200... [2023-08-17 13:03:05,466][131794] Num frames 3300... [2023-08-17 13:03:05,526][131794] Num frames 3400... [2023-08-17 13:03:05,586][131794] Num frames 3500... [2023-08-17 13:03:05,672][131794] Avg episode rewards: #0: 44.770, true rewards: #0: 17.770 [2023-08-17 13:03:05,673][131794] Avg episode reward: 44.770, avg true_objective: 17.770 [2023-08-17 13:03:05,699][131794] Num frames 3600... [2023-08-17 13:03:05,759][131794] Num frames 3700... [2023-08-17 13:03:05,819][131794] Num frames 3800... [2023-08-17 13:03:05,878][131794] Num frames 3900... [2023-08-17 13:03:05,936][131794] Num frames 4000... [2023-08-17 13:03:05,993][131794] Num frames 4100... [2023-08-17 13:03:06,050][131794] Num frames 4200... [2023-08-17 13:03:06,108][131794] Num frames 4300... [2023-08-17 13:03:06,166][131794] Num frames 4400... [2023-08-17 13:03:06,225][131794] Num frames 4500... [2023-08-17 13:03:06,284][131794] Num frames 4600... [2023-08-17 13:03:06,343][131794] Num frames 4700... [2023-08-17 13:03:06,401][131794] Num frames 4800... [2023-08-17 13:03:06,463][131794] Num frames 4900... [2023-08-17 13:03:06,521][131794] Num frames 5000... [2023-08-17 13:03:06,579][131794] Num frames 5100... [2023-08-17 13:03:06,638][131794] Num frames 5200... [2023-08-17 13:03:06,696][131794] Num frames 5300... [2023-08-17 13:03:06,753][131794] Num frames 5400... [2023-08-17 13:03:06,812][131794] Num frames 5500... [2023-08-17 13:03:06,870][131794] Num frames 5600... [2023-08-17 13:03:06,954][131794] Avg episode rewards: #0: 48.179, true rewards: #0: 18.847 [2023-08-17 13:03:06,955][131794] Avg episode reward: 48.179, avg true_objective: 18.847 [2023-08-17 13:03:06,981][131794] Num frames 5700... [2023-08-17 13:03:07,037][131794] Num frames 5800... [2023-08-17 13:03:07,094][131794] Num frames 5900... [2023-08-17 13:03:07,152][131794] Num frames 6000... [2023-08-17 13:03:07,210][131794] Num frames 6100... [2023-08-17 13:03:07,269][131794] Num frames 6200... [2023-08-17 13:03:07,327][131794] Num frames 6300... [2023-08-17 13:03:07,385][131794] Num frames 6400... [2023-08-17 13:03:07,443][131794] Num frames 6500... [2023-08-17 13:03:07,500][131794] Num frames 6600... [2023-08-17 13:03:07,559][131794] Num frames 6700... [2023-08-17 13:03:07,617][131794] Num frames 6800... [2023-08-17 13:03:07,679][131794] Avg episode rewards: #0: 42.537, true rewards: #0: 17.038 [2023-08-17 13:03:07,679][131794] Avg episode reward: 42.537, avg true_objective: 17.038 [2023-08-17 13:03:07,728][131794] Num frames 6900... [2023-08-17 13:03:07,787][131794] Num frames 7000... 
[2023-08-17 13:03:07,845][131794] Num frames 7100... [2023-08-17 13:03:07,903][131794] Num frames 7200... [2023-08-17 13:03:07,961][131794] Num frames 7300... [2023-08-17 13:03:08,020][131794] Num frames 7400... [2023-08-17 13:03:08,078][131794] Num frames 7500... [2023-08-17 13:03:08,137][131794] Num frames 7600... [2023-08-17 13:03:08,195][131794] Num frames 7700... [2023-08-17 13:03:08,255][131794] Num frames 7800... [2023-08-17 13:03:08,313][131794] Num frames 7900... [2023-08-17 13:03:08,371][131794] Num frames 8000... [2023-08-17 13:03:08,430][131794] Num frames 8100... [2023-08-17 13:03:08,492][131794] Num frames 8200... [2023-08-17 13:03:08,551][131794] Num frames 8300... [2023-08-17 13:03:08,610][131794] Num frames 8400... [2023-08-17 13:03:08,699][131794] Avg episode rewards: #0: 41.920, true rewards: #0: 16.920 [2023-08-17 13:03:08,699][131794] Avg episode reward: 41.920, avg true_objective: 16.920 [2023-08-17 13:03:08,724][131794] Num frames 8500... [2023-08-17 13:03:08,783][131794] Num frames 8600... [2023-08-17 13:03:08,842][131794] Num frames 8700... [2023-08-17 13:03:08,901][131794] Num frames 8800... [2023-08-17 13:03:08,960][131794] Num frames 8900... [2023-08-17 13:03:09,019][131794] Num frames 9000... [2023-08-17 13:03:09,074][131794] Avg episode rewards: #0: 36.173, true rewards: #0: 15.007 [2023-08-17 13:03:09,074][131794] Avg episode reward: 36.173, avg true_objective: 15.007 [2023-08-17 13:03:09,127][131794] Num frames 9100... [2023-08-17 13:03:09,181][131794] Num frames 9200... [2023-08-17 13:03:09,235][131794] Num frames 9300... [2023-08-17 13:03:09,290][131794] Num frames 9400... [2023-08-17 13:03:09,344][131794] Num frames 9500... [2023-08-17 13:03:09,398][131794] Num frames 9600... [2023-08-17 13:03:09,452][131794] Num frames 9700... [2023-08-17 13:03:09,506][131794] Num frames 9800... [2023-08-17 13:03:09,559][131794] Num frames 9900... [2023-08-17 13:03:09,663][131794] Avg episode rewards: #0: 34.137, true rewards: #0: 14.280 [2023-08-17 13:03:09,664][131794] Avg episode reward: 34.137, avg true_objective: 14.280 [2023-08-17 13:03:09,667][131794] Num frames 10000... [2023-08-17 13:03:09,720][131794] Num frames 10100... [2023-08-17 13:03:09,775][131794] Num frames 10200... [2023-08-17 13:03:09,831][131794] Num frames 10300... [2023-08-17 13:03:09,887][131794] Num frames 10400... [2023-08-17 13:03:09,942][131794] Num frames 10500... [2023-08-17 13:03:09,997][131794] Num frames 10600... [2023-08-17 13:03:10,101][131794] Avg episode rewards: #0: 32.115, true rewards: #0: 13.365 [2023-08-17 13:03:10,102][131794] Avg episode reward: 32.115, avg true_objective: 13.365 [2023-08-17 13:03:10,108][131794] Num frames 10700... [2023-08-17 13:03:10,164][131794] Num frames 10800... [2023-08-17 13:03:10,220][131794] Num frames 10900... [2023-08-17 13:03:10,275][131794] Num frames 11000... [2023-08-17 13:03:10,334][131794] Avg episode rewards: #0: 29.235, true rewards: #0: 12.236 [2023-08-17 13:03:10,335][131794] Avg episode reward: 29.235, avg true_objective: 12.236 [2023-08-17 13:03:10,387][131794] Num frames 11100... [2023-08-17 13:03:10,442][131794] Num frames 11200... [2023-08-17 13:03:10,497][131794] Num frames 11300... [2023-08-17 13:03:10,553][131794] Num frames 11400... [2023-08-17 13:03:10,610][131794] Num frames 11500... [2023-08-17 13:03:10,666][131794] Num frames 11600... [2023-08-17 13:03:10,720][131794] Num frames 11700... [2023-08-17 13:03:10,779][131794] Num frames 11800... 
[2023-08-17 13:03:10,862][131794] Avg episode rewards: #0: 28.046, true rewards: #0: 11.846 [2023-08-17 13:03:10,863][131794] Avg episode reward: 28.046, avg true_objective: 11.846 [2023-08-17 13:03:22,380][131794] Replay video saved to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! [2023-08-17 13:05:04,775][131794] The model has been pushed to https://huggingface.co/patonw/rl_course_vizdoom_health_gathering_supreme [2023-08-17 13:08:54,473][131794] Environment doom_basic already registered, overwriting... [2023-08-17 13:08:54,474][131794] Environment doom_two_colors_easy already registered, overwriting... [2023-08-17 13:08:54,474][131794] Environment doom_two_colors_hard already registered, overwriting... [2023-08-17 13:08:54,475][131794] Environment doom_dm already registered, overwriting... [2023-08-17 13:08:54,475][131794] Environment doom_dwango5 already registered, overwriting... [2023-08-17 13:08:54,475][131794] Environment doom_my_way_home_flat_actions already registered, overwriting... [2023-08-17 13:08:54,476][131794] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2023-08-17 13:08:54,476][131794] Environment doom_my_way_home already registered, overwriting... [2023-08-17 13:08:54,476][131794] Environment doom_deadly_corridor already registered, overwriting... [2023-08-17 13:08:54,477][131794] Environment doom_defend_the_center already registered, overwriting... [2023-08-17 13:08:54,477][131794] Environment doom_defend_the_line already registered, overwriting... [2023-08-17 13:08:54,477][131794] Environment doom_health_gathering already registered, overwriting... [2023-08-17 13:08:54,478][131794] Environment doom_health_gathering_supreme already registered, overwriting... [2023-08-17 13:08:54,478][131794] Environment doom_battle already registered, overwriting... [2023-08-17 13:08:54,478][131794] Environment doom_battle2 already registered, overwriting... [2023-08-17 13:08:54,479][131794] Environment doom_duel_bots already registered, overwriting... [2023-08-17 13:08:54,479][131794] Environment doom_deathmatch_bots already registered, overwriting... [2023-08-17 13:08:54,479][131794] Environment doom_duel already registered, overwriting... [2023-08-17 13:08:54,480][131794] Environment doom_deathmatch_full already registered, overwriting... [2023-08-17 13:08:54,480][131794] Environment doom_benchmark already registered, overwriting... [2023-08-17 13:08:54,480][131794] register_encoder_factory: [2023-08-17 13:08:54,496][131794] Loading existing experiment configuration from /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json [2023-08-17 13:08:54,497][131794] Overriding arg 'num_envs_per_worker' with value 8 passed from command line [2023-08-17 13:08:54,497][131794] Overriding arg 'train_for_env_steps' with value 100000000 passed from command line [2023-08-17 13:08:54,500][131794] Experiment dir /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists! [2023-08-17 13:08:54,501][131794] Resuming existing experiment from /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment... [2023-08-17 13:08:54,501][131794] Weights and Biases integration enabled. Project: sample_factory, user: None, group: None, unique_id: default_experiment_20230817_125929_635646 [2023-08-17 13:08:54,501][131794] Initializing WandB... 
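After the push, training resumes from the same experiment directory; per the "Overriding arg" lines above, only num_envs_per_worker (4 -> 8) and train_for_env_steps (4000000 -> 100000000) change, and the saved config already has W&B enabled. Roughly, again using the assumed notebook helper:

from sample_factory.train import run_rl

# restart_behavior=resume in the saved config makes run_rl pick up the
# existing default_experiment directory and its latest checkpoint
cfg = parse_vizdoom_cfg(  # assumed notebook helper, as above
    argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=8",          # overridden, was 4
        "--train_for_env_steps=100000000",  # overridden, was 4000000
    ]
)
status = run_rl(cfg)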
[2023-08-17 13:09:08,147][131794] Environment var CUDA_VISIBLE_DEVICES is 0 [2023-08-17 13:09:09,155][131794] Starting experiment with the following configuration: help=False algo=APPO env=doom_health_gathering_supreme experiment=default_experiment train_dir=/home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir restart_behavior=resume device=gpu seed=None num_policies=1 async_rl=True serial_mode=False batched_sampling=False num_batches_to_accumulate=2 worker_num_splits=2 policy_workers_per_policy=1 max_policy_lag=1000 num_workers=8 num_envs_per_worker=8 batch_size=1024 num_batches_per_epoch=1 num_epochs=1 rollout=32 recurrence=32 shuffle_minibatches=False gamma=0.99 reward_scale=1.0 reward_clip=1000.0 value_bootstrap=False normalize_returns=True exploration_loss_coeff=0.001 value_loss_coeff=0.5 kl_loss_coeff=0.0 exploration_loss=symmetric_kl gae_lambda=0.95 ppo_clip_ratio=0.1 ppo_clip_value=0.2 with_vtrace=False vtrace_rho=1.0 vtrace_c=1.0 optimizer=adam adam_eps=1e-06 adam_beta1=0.9 adam_beta2=0.999 max_grad_norm=4.0 learning_rate=0.0001 lr_schedule=constant lr_schedule_kl_threshold=0.008 lr_adaptive_min=1e-06 lr_adaptive_max=0.01 obs_subtract_mean=0.0 obs_scale=255.0 normalize_input=True normalize_input_keys=None decorrelate_experience_max_seconds=0 decorrelate_envs_on_one_worker=True actor_worker_gpus=[] set_workers_cpu_affinity=True force_envs_single_thread=False default_niceness=0 log_to_file=True experiment_summaries_interval=10 flush_summaries_interval=30 stats_avg=100 summaries_use_frameskip=True heartbeat_interval=20 heartbeat_reporting_interval=600 train_for_env_steps=100000000 train_for_seconds=10000000000 save_every_sec=120 keep_checkpoints=2 load_checkpoint_kind=latest save_milestones_sec=-1 save_best_every_sec=5 save_best_metric=reward save_best_after=100000 benchmark=False encoder_mlp_layers=[512, 512] encoder_conv_architecture=convnet_simple encoder_conv_mlp_layers=[512] use_rnn=True rnn_size=512 rnn_type=gru rnn_num_layers=1 decoder_mlp_layers=[] nonlinearity=elu policy_initialization=orthogonal policy_init_gain=1.0 actor_critic_share_weights=True adaptive_stddev=True continuous_tanh_scale=0.0 initial_stddev=1.0 use_env_info_cache=False env_gpu_actions=False env_gpu_observations=True env_frameskip=4 env_framestack=1 pixel_format=CHW use_record_episode_statistics=False with_wandb=True wandb_user=None wandb_project=sample_factory wandb_group=None wandb_job_type=SF wandb_tags=[] with_pbt=False pbt_mix_policies_in_one_env=True pbt_period_env_steps=5000000 pbt_start_mutation=20000000 pbt_replace_fraction=0.3 pbt_mutation_rate=0.15 pbt_replace_reward_gap=0.1 pbt_replace_reward_gap_absolute=1e-06 pbt_optimize_gamma=False pbt_target_objective=true_objective pbt_perturb_min=1.1 pbt_perturb_max=1.5 num_agents=-1 num_humans=0 num_bots=-1 start_bot_difficulty=None timelimit=None res_w=128 res_h=72 wide_aspect_ratio=False eval_env_frameskip=1 fps=35 command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} git_hash=336df5a551fea3a2cf40925bf3083db6b4518c91 git_repo_name=https://github.com/huggingface/deep-rl-class wandb_unique_id=default_experiment_20230817_125929_635646 [2023-08-17 13:09:09,156][131794] Saving configuration to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... 
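A few values in this dump fix the shape of every training iteration: trajectory fragments are rollout=32 steps, a batch is batch_size=1024 transitions (i.e. 32 fragments), 8 workers x 8 envs give 64 parallel environments, and with num_epochs=1 and num_batches_per_epoch=1 each transition is consumed once. Making the arithmetic explicit:

# batch-shape arithmetic implied by the configuration dump above
rollout = 32              # steps per trajectory fragment
batch_size = 1024         # transitions per training batch
num_workers = 8
num_envs_per_worker = 8   # after the command-line override

fragments_per_batch = batch_size // rollout        # 32 fragments
parallel_envs = num_workers * num_envs_per_worker  # 64 environments
steps_per_round = parallel_envs * rollout          # 2048 transitions
print(fragments_per_batch, parallel_envs, steps_per_round)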
[2023-08-17 13:09:09,278][131794] Rollout worker 0 uses device cpu [2023-08-17 13:09:09,279][131794] Rollout worker 1 uses device cpu [2023-08-17 13:09:09,280][131794] Rollout worker 2 uses device cpu [2023-08-17 13:09:09,280][131794] Rollout worker 3 uses device cpu [2023-08-17 13:09:09,281][131794] Rollout worker 4 uses device cpu [2023-08-17 13:09:09,281][131794] Rollout worker 5 uses device cpu [2023-08-17 13:09:09,282][131794] Rollout worker 6 uses device cpu [2023-08-17 13:09:09,282][131794] Rollout worker 7 uses device cpu [2023-08-17 13:09:09,327][131794] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:09:09,328][131794] InferenceWorker_p0-w0: min num requests: 2 [2023-08-17 13:09:09,346][131794] Starting all processes... [2023-08-17 13:09:09,346][131794] Starting process learner_proc0 [2023-08-17 13:09:09,396][131794] Starting all processes... [2023-08-17 13:09:09,399][131794] Starting process inference_proc0-0 [2023-08-17 13:09:09,399][131794] Starting process rollout_proc0 [2023-08-17 13:09:09,399][131794] Starting process rollout_proc1 [2023-08-17 13:09:09,400][131794] Starting process rollout_proc2 [2023-08-17 13:09:09,400][131794] Starting process rollout_proc3 [2023-08-17 13:09:09,401][131794] Starting process rollout_proc4 [2023-08-17 13:09:09,401][131794] Starting process rollout_proc5 [2023-08-17 13:09:09,402][131794] Starting process rollout_proc6 [2023-08-17 13:09:09,403][131794] Starting process rollout_proc7 [2023-08-17 13:09:10,363][139686] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:09:10,364][139686] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-08-17 13:09:10,368][139686] Num visible devices: 1 [2023-08-17 13:09:10,385][139686] Starting seed is not provided [2023-08-17 13:09:10,385][139686] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:09:10,386][139686] Initializing actor-critic model on device cuda:0 [2023-08-17 13:09:10,386][139686] RunningMeanStd input shape: (3, 72, 128) [2023-08-17 13:09:10,387][139686] RunningMeanStd input shape: (1,) [2023-08-17 13:09:10,398][139686] ConvEncoder: input_channels=3 [2023-08-17 13:09:10,457][139700] Worker 0 uses CPU cores [0, 1, 2] [2023-08-17 13:09:10,486][139699] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:09:10,486][139699] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-08-17 13:09:10,486][139707] Worker 6 uses CPU cores [18, 19, 20] [2023-08-17 13:09:10,486][139706] Worker 7 uses CPU cores [21, 22, 23] [2023-08-17 13:09:10,486][139701] Worker 1 uses CPU cores [3, 4, 5] [2023-08-17 13:09:10,486][139686] Conv encoder output size: 512 [2023-08-17 13:09:10,487][139686] Policy head output size: 512 [2023-08-17 13:09:10,490][139699] Num visible devices: 1 [2023-08-17 13:09:10,498][139686] Created Actor Critic model with architecture: [2023-08-17 13:09:10,499][139686] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): 
RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2023-08-17 13:09:10,502][139702] Worker 3 uses CPU cores [9, 10, 11] [2023-08-17 13:09:10,503][139705] Worker 5 uses CPU cores [15, 16, 17] [2023-08-17 13:09:10,504][139704] Worker 2 uses CPU cores [6, 7, 8] [2023-08-17 13:09:10,531][139703] Worker 4 uses CPU cores [12, 13, 14] [2023-08-17 13:09:10,582][139686] Using optimizer [2023-08-17 13:09:10,582][139686] Loading state from checkpoint /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... [2023-08-17 13:09:10,602][139686] Loading model from checkpoint [2023-08-17 13:09:10,604][139686] Loaded experiment state at self.train_step=2443, self.env_steps=10006528 [2023-08-17 13:09:10,604][139686] Initialized policy 0 weights for model version 2443 [2023-08-17 13:09:10,606][139686] LearnerWorker_p0 finished initialization! [2023-08-17 13:09:10,606][139686] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:09:10,650][139699] RunningMeanStd input shape: (3, 72, 128) [2023-08-17 13:09:10,650][139699] RunningMeanStd input shape: (1,) [2023-08-17 13:09:10,657][139699] ConvEncoder: input_channels=3 [2023-08-17 13:09:10,707][139699] Conv encoder output size: 512 [2023-08-17 13:09:10,707][139699] Policy head output size: 512 [2023-08-17 13:09:10,732][131794] Inference worker 0-0 is ready! [2023-08-17 13:09:10,733][131794] All inference workers are ready! Signal rollout workers to start! [2023-08-17 13:09:10,749][139705] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:09:10,749][139701] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:09:10,750][139703] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:09:10,750][139700] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:09:10,751][139702] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:09:10,751][139704] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:09:10,751][139706] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:09:10,752][139707] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:09:10,993][139705] Decorrelating experience for 0 frames... [2023-08-17 13:09:10,993][139703] Decorrelating experience for 0 frames... [2023-08-17 13:09:10,995][139704] Decorrelating experience for 0 frames... [2023-08-17 13:09:10,997][139702] Decorrelating experience for 0 frames... [2023-08-17 13:09:10,997][139700] Decorrelating experience for 0 frames... [2023-08-17 13:09:10,997][139707] Decorrelating experience for 0 frames... [2023-08-17 13:09:11,018][139701] Decorrelating experience for 0 frames... [2023-08-17 13:09:11,176][139703] Decorrelating experience for 32 frames... [2023-08-17 13:09:11,181][139704] Decorrelating experience for 32 frames... 
[2023-08-17 13:09:11,181][139702] Decorrelating experience for 32 frames... [2023-08-17 13:09:11,198][139705] Decorrelating experience for 32 frames... [2023-08-17 13:09:11,208][139701] Decorrelating experience for 32 frames... [2023-08-17 13:09:11,367][139703] Decorrelating experience for 64 frames... [2023-08-17 13:09:11,374][139702] Decorrelating experience for 64 frames... [2023-08-17 13:09:11,377][139700] Decorrelating experience for 32 frames... [2023-08-17 13:09:11,379][139707] Decorrelating experience for 32 frames... [2023-08-17 13:09:11,391][139701] Decorrelating experience for 64 frames... [2023-08-17 13:09:11,427][139704] Decorrelating experience for 64 frames... [2023-08-17 13:09:11,574][139703] Decorrelating experience for 96 frames... [2023-08-17 13:09:11,580][139702] Decorrelating experience for 96 frames... [2023-08-17 13:09:11,605][139707] Decorrelating experience for 64 frames... [2023-08-17 13:09:11,647][139704] Decorrelating experience for 96 frames... [2023-08-17 13:09:11,782][139700] Decorrelating experience for 64 frames... [2023-08-17 13:09:11,798][139701] Decorrelating experience for 96 frames... [2023-08-17 13:09:11,804][139702] Decorrelating experience for 128 frames... [2023-08-17 13:09:11,806][139705] Decorrelating experience for 64 frames... [2023-08-17 13:09:11,871][139704] Decorrelating experience for 128 frames... [2023-08-17 13:09:11,983][139700] Decorrelating experience for 96 frames... [2023-08-17 13:09:12,010][139703] Decorrelating experience for 128 frames... [2023-08-17 13:09:12,019][139706] Decorrelating experience for 0 frames... [2023-08-17 13:09:12,027][139705] Decorrelating experience for 96 frames... [2023-08-17 13:09:12,032][139702] Decorrelating experience for 160 frames... [2023-08-17 13:09:12,180][139701] Decorrelating experience for 128 frames... [2023-08-17 13:09:12,212][139700] Decorrelating experience for 128 frames... [2023-08-17 13:09:12,214][139706] Decorrelating experience for 32 frames... [2023-08-17 13:09:12,232][139707] Decorrelating experience for 96 frames... [2023-08-17 13:09:12,254][139705] Decorrelating experience for 128 frames... [2023-08-17 13:09:12,267][139702] Decorrelating experience for 192 frames... [2023-08-17 13:09:12,398][139704] Decorrelating experience for 160 frames... [2023-08-17 13:09:12,415][139701] Decorrelating experience for 160 frames... [2023-08-17 13:09:12,424][139703] Decorrelating experience for 160 frames... [2023-08-17 13:09:12,428][139706] Decorrelating experience for 64 frames... [2023-08-17 13:09:12,479][139705] Decorrelating experience for 160 frames... [2023-08-17 13:09:12,598][139707] Decorrelating experience for 128 frames... [2023-08-17 13:09:12,625][139702] Decorrelating experience for 224 frames... [2023-08-17 13:09:12,646][139704] Decorrelating experience for 192 frames... [2023-08-17 13:09:12,647][139706] Decorrelating experience for 96 frames... [2023-08-17 13:09:12,712][139705] Decorrelating experience for 192 frames... [2023-08-17 13:09:12,827][139701] Decorrelating experience for 192 frames... [2023-08-17 13:09:12,832][139700] Decorrelating experience for 160 frames... [2023-08-17 13:09:12,883][139704] Decorrelating experience for 224 frames... [2023-08-17 13:09:12,886][139706] Decorrelating experience for 128 frames... [2023-08-17 13:09:13,051][139705] Decorrelating experience for 224 frames... [2023-08-17 13:09:13,071][139707] Decorrelating experience for 160 frames... [2023-08-17 13:09:13,074][139701] Decorrelating experience for 224 frames... 
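The "Decorrelating experience for N frames..." messages show each worker warming up its environments for different numbers of frames before real collection starts, so the 64 envs don't march through episodes in lockstep. A toy sketch of the idea, with env i warmed up for i*32 random-action frames to mirror the 0/32/64/.../224 progression above (Sample Factory's actual staggering scheme differs; envs here are assumed Gymnasium-style):

def decorrelate(envs, rollout=32):
    """Warm up env i for i*rollout random-action frames before collection."""
    for i, env in enumerate(envs):
        env.reset()
        for _ in range(i * rollout):
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()
        print(f"Decorrelating experience for {i * rollout} frames...")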
[2023-08-17 13:09:13,134][139706] Decorrelating experience for 160 frames... [2023-08-17 13:09:13,147][131794] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 10006528. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-08-17 13:09:13,148][131794] Avg episode reward: [(0, '1.985')] [2023-08-17 13:09:13,264][139703] Decorrelating experience for 192 frames... [2023-08-17 13:09:13,348][139707] Decorrelating experience for 192 frames... [2023-08-17 13:09:13,387][139706] Decorrelating experience for 192 frames... [2023-08-17 13:09:13,407][139686] Signal inference workers to stop experience collection... [2023-08-17 13:09:13,411][139699] InferenceWorker_p0-w0: stopping experience collection [2023-08-17 13:09:13,473][139700] Decorrelating experience for 192 frames... [2023-08-17 13:09:13,550][139703] Decorrelating experience for 224 frames... [2023-08-17 13:09:13,605][139707] Decorrelating experience for 224 frames... [2023-08-17 13:09:13,623][139706] Decorrelating experience for 224 frames... [2023-08-17 13:09:13,718][139700] Decorrelating experience for 224 frames... [2023-08-17 13:09:13,938][139686] Signal inference workers to resume experience collection... [2023-08-17 13:09:13,938][139699] InferenceWorker_p0-w0: resuming experience collection [2023-08-17 13:09:15,083][139699] Updated weights for policy 0, policy_version 2453 (0.0186) [2023-08-17 13:09:15,917][139699] Updated weights for policy 0, policy_version 2463 (0.0007) [2023-08-17 13:09:16,726][139699] Updated weights for policy 0, policy_version 2473 (0.0007) [2023-08-17 13:09:17,525][139699] Updated weights for policy 0, policy_version 2483 (0.0006) [2023-08-17 13:09:18,147][131794] Fps is (10 sec: 38501.9, 60 sec: 38501.9, 300 sec: 38501.9). Total num frames: 10199040. Throughput: 0: 6331.1. Samples: 31656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:09:18,148][131794] Avg episode reward: [(0, '24.256')] [2023-08-17 13:09:18,364][139699] Updated weights for policy 0, policy_version 2493 (0.0007) [2023-08-17 13:09:19,144][139699] Updated weights for policy 0, policy_version 2503 (0.0006) [2023-08-17 13:09:19,935][139699] Updated weights for policy 0, policy_version 2513 (0.0007) [2023-08-17 13:09:20,718][139699] Updated weights for policy 0, policy_version 2523 (0.0006) [2023-08-17 13:09:21,464][139699] Updated weights for policy 0, policy_version 2533 (0.0007) [2023-08-17 13:09:22,262][139699] Updated weights for policy 0, policy_version 2543 (0.0007) [2023-08-17 13:09:23,059][139699] Updated weights for policy 0, policy_version 2553 (0.0007) [2023-08-17 13:09:23,147][131794] Fps is (10 sec: 45465.6, 60 sec: 45465.6, 300 sec: 45465.6). Total num frames: 10461184. Throughput: 0: 10946.4. Samples: 109464. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:09:23,148][131794] Avg episode reward: [(0, '21.903')] [2023-08-17 13:09:23,847][139699] Updated weights for policy 0, policy_version 2563 (0.0007) [2023-08-17 13:09:24,635][139699] Updated weights for policy 0, policy_version 2573 (0.0006) [2023-08-17 13:09:25,421][139699] Updated weights for policy 0, policy_version 2583 (0.0006) [2023-08-17 13:09:26,205][139699] Updated weights for policy 0, policy_version 2593 (0.0006) [2023-08-17 13:09:27,012][139699] Updated weights for policy 0, policy_version 2603 (0.0006) [2023-08-17 13:09:27,806][139699] Updated weights for policy 0, policy_version 2613 (0.0006) [2023-08-17 13:09:28,147][131794] Fps is (10 sec: 52019.0, 60 sec: 47513.2, 300 sec: 47513.2). 
Total num frames: 10719232. Throughput: 0: 9892.2. Samples: 148384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:09:28,149][131794] Avg episode reward: [(0, '26.382')] [2023-08-17 13:09:28,152][139686] Saving new best policy, reward=26.382! [2023-08-17 13:09:28,614][139699] Updated weights for policy 0, policy_version 2623 (0.0007) [2023-08-17 13:09:29,322][131794] Heartbeat connected on Batcher_0 [2023-08-17 13:09:29,325][131794] Heartbeat connected on LearnerWorker_p0 [2023-08-17 13:09:29,330][131794] Heartbeat connected on InferenceWorker_p0-w0 [2023-08-17 13:09:29,332][131794] Heartbeat connected on RolloutWorker_w0 [2023-08-17 13:09:29,337][131794] Heartbeat connected on RolloutWorker_w1 [2023-08-17 13:09:29,338][131794] Heartbeat connected on RolloutWorker_w2 [2023-08-17 13:09:29,340][131794] Heartbeat connected on RolloutWorker_w3 [2023-08-17 13:09:29,341][131794] Heartbeat connected on RolloutWorker_w4 [2023-08-17 13:09:29,344][131794] Heartbeat connected on RolloutWorker_w5 [2023-08-17 13:09:29,344][131794] Heartbeat connected on RolloutWorker_w6 [2023-08-17 13:09:29,349][131794] Heartbeat connected on RolloutWorker_w7 [2023-08-17 13:09:29,451][139699] Updated weights for policy 0, policy_version 2633 (0.0006) [2023-08-17 13:09:30,246][139699] Updated weights for policy 0, policy_version 2643 (0.0007) [2023-08-17 13:09:31,043][139699] Updated weights for policy 0, policy_version 2653 (0.0006) [2023-08-17 13:09:31,883][139699] Updated weights for policy 0, policy_version 2663 (0.0007) [2023-08-17 13:09:32,739][139699] Updated weights for policy 0, policy_version 2673 (0.0007) [2023-08-17 13:09:33,147][131794] Fps is (10 sec: 50790.1, 60 sec: 48127.8, 300 sec: 48127.8). Total num frames: 10969088. Throughput: 0: 11239.2. Samples: 224784. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:09:33,148][131794] Avg episode reward: [(0, '27.613')] [2023-08-17 13:09:33,150][139686] Saving new best policy, reward=27.613! [2023-08-17 13:09:33,585][139699] Updated weights for policy 0, policy_version 2683 (0.0007) [2023-08-17 13:09:34,503][139699] Updated weights for policy 0, policy_version 2693 (0.0008) [2023-08-17 13:09:35,346][139699] Updated weights for policy 0, policy_version 2703 (0.0007) [2023-08-17 13:09:36,222][139699] Updated weights for policy 0, policy_version 2713 (0.0008) [2023-08-17 13:09:37,030][139699] Updated weights for policy 0, policy_version 2723 (0.0006) [2023-08-17 13:09:37,865][139699] Updated weights for policy 0, policy_version 2733 (0.0006) [2023-08-17 13:09:38,147][131794] Fps is (10 sec: 48742.4, 60 sec: 48004.9, 300 sec: 48004.9). Total num frames: 11206656. Throughput: 0: 11878.0. Samples: 296952. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:09:38,148][131794] Avg episode reward: [(0, '26.963')] [2023-08-17 13:09:38,694][139699] Updated weights for policy 0, policy_version 2743 (0.0008) [2023-08-17 13:09:39,517][139699] Updated weights for policy 0, policy_version 2753 (0.0006) [2023-08-17 13:09:40,327][139699] Updated weights for policy 0, policy_version 2763 (0.0007) [2023-08-17 13:09:41,159][139699] Updated weights for policy 0, policy_version 2773 (0.0006) [2023-08-17 13:09:41,945][139699] Updated weights for policy 0, policy_version 2783 (0.0006) [2023-08-17 13:09:42,761][139699] Updated weights for policy 0, policy_version 2793 (0.0006) [2023-08-17 13:09:43,147][131794] Fps is (10 sec: 49152.1, 60 sec: 48469.2, 300 sec: 48469.2). Total num frames: 11460608. Throughput: 0: 11133.0. Samples: 333992. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:09:43,148][131794] Avg episode reward: [(0, '25.408')] [2023-08-17 13:09:43,526][139699] Updated weights for policy 0, policy_version 2803 (0.0006) [2023-08-17 13:09:44,317][139699] Updated weights for policy 0, policy_version 2813 (0.0007) [2023-08-17 13:09:45,099][139699] Updated weights for policy 0, policy_version 2823 (0.0006) [2023-08-17 13:09:45,930][139699] Updated weights for policy 0, policy_version 2833 (0.0007) [2023-08-17 13:09:46,722][139699] Updated weights for policy 0, policy_version 2843 (0.0006) [2023-08-17 13:09:47,537][139699] Updated weights for policy 0, policy_version 2853 (0.0007) [2023-08-17 13:09:48,147][131794] Fps is (10 sec: 50790.8, 60 sec: 48800.9, 300 sec: 48800.9). Total num frames: 11714560. Throughput: 0: 11747.8. Samples: 411172. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:09:48,148][131794] Avg episode reward: [(0, '25.435')] [2023-08-17 13:09:48,321][139699] Updated weights for policy 0, policy_version 2863 (0.0007) [2023-08-17 13:09:49,166][139699] Updated weights for policy 0, policy_version 2873 (0.0007) [2023-08-17 13:09:49,951][139699] Updated weights for policy 0, policy_version 2883 (0.0007) [2023-08-17 13:09:50,766][139699] Updated weights for policy 0, policy_version 2893 (0.0007) [2023-08-17 13:09:51,596][139699] Updated weights for policy 0, policy_version 2903 (0.0007) [2023-08-17 13:09:52,377][139699] Updated weights for policy 0, policy_version 2913 (0.0006) [2023-08-17 13:09:53,147][131794] Fps is (10 sec: 50790.7, 60 sec: 49049.6, 300 sec: 49049.6). Total num frames: 11968512. Throughput: 0: 12184.4. Samples: 487376. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-08-17 13:09:53,148][131794] Avg episode reward: [(0, '28.361')] [2023-08-17 13:09:53,157][139686] Saving new best policy, reward=28.361! [2023-08-17 13:09:53,160][139699] Updated weights for policy 0, policy_version 2923 (0.0007) [2023-08-17 13:09:53,952][139699] Updated weights for policy 0, policy_version 2933 (0.0006) [2023-08-17 13:09:54,736][139699] Updated weights for policy 0, policy_version 2943 (0.0007) [2023-08-17 13:09:55,533][139699] Updated weights for policy 0, policy_version 2953 (0.0007) [2023-08-17 13:09:56,360][139699] Updated weights for policy 0, policy_version 2963 (0.0007) [2023-08-17 13:09:57,144][139699] Updated weights for policy 0, policy_version 2973 (0.0006) [2023-08-17 13:09:57,954][139699] Updated weights for policy 0, policy_version 2983 (0.0007) [2023-08-17 13:09:58,147][131794] Fps is (10 sec: 51199.9, 60 sec: 49334.0, 300 sec: 49334.0). Total num frames: 12226560. Throughput: 0: 11684.3. Samples: 525796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:09:58,148][131794] Avg episode reward: [(0, '26.901')] [2023-08-17 13:09:58,789][139699] Updated weights for policy 0, policy_version 2993 (0.0007) [2023-08-17 13:09:59,601][139699] Updated weights for policy 0, policy_version 3003 (0.0007) [2023-08-17 13:10:00,411][139699] Updated weights for policy 0, policy_version 3013 (0.0007) [2023-08-17 13:10:01,253][139699] Updated weights for policy 0, policy_version 3023 (0.0006) [2023-08-17 13:10:02,136][139699] Updated weights for policy 0, policy_version 3033 (0.0006) [2023-08-17 13:10:03,005][139699] Updated weights for policy 0, policy_version 3043 (0.0007) [2023-08-17 13:10:03,147][131794] Fps is (10 sec: 49971.1, 60 sec: 49233.9, 300 sec: 49233.9). Total num frames: 12468224. Throughput: 0: 12644.1. Samples: 600640. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:10:03,148][131794] Avg episode reward: [(0, '27.558')] [2023-08-17 13:10:03,842][139699] Updated weights for policy 0, policy_version 3053 (0.0007) [2023-08-17 13:10:04,648][139699] Updated weights for policy 0, policy_version 3063 (0.0006) [2023-08-17 13:10:05,445][139699] Updated weights for policy 0, policy_version 3073 (0.0007) [2023-08-17 13:10:06,269][139699] Updated weights for policy 0, policy_version 3083 (0.0007) [2023-08-17 13:10:07,057][139699] Updated weights for policy 0, policy_version 3093 (0.0006) [2023-08-17 13:10:07,901][139699] Updated weights for policy 0, policy_version 3103 (0.0007) [2023-08-17 13:10:08,147][131794] Fps is (10 sec: 49151.4, 60 sec: 49300.8, 300 sec: 49300.8). Total num frames: 12718080. Throughput: 0: 12565.2. Samples: 674900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:10:08,149][131794] Avg episode reward: [(0, '24.161')] [2023-08-17 13:10:08,753][139699] Updated weights for policy 0, policy_version 3113 (0.0007) [2023-08-17 13:10:09,588][139699] Updated weights for policy 0, policy_version 3123 (0.0007) [2023-08-17 13:10:10,388][139699] Updated weights for policy 0, policy_version 3133 (0.0007) [2023-08-17 13:10:11,210][139699] Updated weights for policy 0, policy_version 3143 (0.0007) [2023-08-17 13:10:12,090][139699] Updated weights for policy 0, policy_version 3153 (0.0007) [2023-08-17 13:10:12,928][139699] Updated weights for policy 0, policy_version 3163 (0.0007) [2023-08-17 13:10:13,147][131794] Fps is (10 sec: 49560.8, 60 sec: 49288.4, 300 sec: 49288.4). Total num frames: 12963840. Throughput: 0: 12524.3. Samples: 711980. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:10:13,149][131794] Avg episode reward: [(0, '25.471')] [2023-08-17 13:10:13,741][139699] Updated weights for policy 0, policy_version 3173 (0.0007) [2023-08-17 13:10:14,522][139699] Updated weights for policy 0, policy_version 3183 (0.0007) [2023-08-17 13:10:15,350][139699] Updated weights for policy 0, policy_version 3193 (0.0006) [2023-08-17 13:10:16,169][139699] Updated weights for policy 0, policy_version 3203 (0.0006) [2023-08-17 13:10:16,954][139699] Updated weights for policy 0, policy_version 3213 (0.0007) [2023-08-17 13:10:17,745][139699] Updated weights for policy 0, policy_version 3223 (0.0006) [2023-08-17 13:10:18,147][131794] Fps is (10 sec: 50381.9, 60 sec: 50380.9, 300 sec: 49467.1). Total num frames: 13221888. Throughput: 0: 12490.0. Samples: 786832. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:10:18,148][131794] Avg episode reward: [(0, '23.229')] [2023-08-17 13:10:18,533][139699] Updated weights for policy 0, policy_version 3233 (0.0006) [2023-08-17 13:10:19,401][139699] Updated weights for policy 0, policy_version 3243 (0.0007) [2023-08-17 13:10:20,293][139699] Updated weights for policy 0, policy_version 3253 (0.0008) [2023-08-17 13:10:21,191][139699] Updated weights for policy 0, policy_version 3263 (0.0007) [2023-08-17 13:10:22,088][139699] Updated weights for policy 0, policy_version 3273 (0.0007) [2023-08-17 13:10:22,892][139699] Updated weights for policy 0, policy_version 3283 (0.0007) [2023-08-17 13:10:23,147][131794] Fps is (10 sec: 49562.6, 60 sec: 49971.2, 300 sec: 49327.6). Total num frames: 13459456. Throughput: 0: 12496.4. Samples: 859288. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:10:23,148][131794] Avg episode reward: [(0, '25.173')] [2023-08-17 13:10:23,700][139699] Updated weights for policy 0, policy_version 3293 (0.0007) [2023-08-17 13:10:24,508][139699] Updated weights for policy 0, policy_version 3303 (0.0007) [2023-08-17 13:10:25,295][139699] Updated weights for policy 0, policy_version 3313 (0.0006) [2023-08-17 13:10:26,079][139699] Updated weights for policy 0, policy_version 3323 (0.0007) [2023-08-17 13:10:26,854][139699] Updated weights for policy 0, policy_version 3333 (0.0007) [2023-08-17 13:10:27,672][139699] Updated weights for policy 0, policy_version 3343 (0.0007) [2023-08-17 13:10:28,147][131794] Fps is (10 sec: 49151.4, 60 sec: 49903.0, 300 sec: 49425.0). Total num frames: 13713408. Throughput: 0: 12522.1. Samples: 897488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:10:28,148][131794] Avg episode reward: [(0, '25.422')] [2023-08-17 13:10:28,501][139699] Updated weights for policy 0, policy_version 3353 (0.0007) [2023-08-17 13:10:29,318][139699] Updated weights for policy 0, policy_version 3363 (0.0006) [2023-08-17 13:10:30,097][139699] Updated weights for policy 0, policy_version 3373 (0.0006) [2023-08-17 13:10:30,881][139699] Updated weights for policy 0, policy_version 3383 (0.0006) [2023-08-17 13:10:31,658][139699] Updated weights for policy 0, policy_version 3393 (0.0007) [2023-08-17 13:10:32,441][139699] Updated weights for policy 0, policy_version 3403 (0.0006) [2023-08-17 13:10:33,147][131794] Fps is (10 sec: 51200.1, 60 sec: 50039.6, 300 sec: 49561.6). Total num frames: 13971456. Throughput: 0: 12532.0. Samples: 975112. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:10:33,148][131794] Avg episode reward: [(0, '27.404')] [2023-08-17 13:10:33,217][139699] Updated weights for policy 0, policy_version 3413 (0.0006) [2023-08-17 13:10:34,003][139699] Updated weights for policy 0, policy_version 3423 (0.0006) [2023-08-17 13:10:34,894][139699] Updated weights for policy 0, policy_version 3433 (0.0007) [2023-08-17 13:10:35,705][139699] Updated weights for policy 0, policy_version 3443 (0.0007) [2023-08-17 13:10:36,625][139699] Updated weights for policy 0, policy_version 3453 (0.0007) [2023-08-17 13:10:37,463][139699] Updated weights for policy 0, policy_version 3463 (0.0007) [2023-08-17 13:10:38,147][131794] Fps is (10 sec: 50381.1, 60 sec: 50176.1, 300 sec: 49537.5). Total num frames: 14217216. Throughput: 0: 12475.8. Samples: 1048788. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:10:38,148][131794] Avg episode reward: [(0, '24.527')] [2023-08-17 13:10:38,295][139699] Updated weights for policy 0, policy_version 3473 (0.0007) [2023-08-17 13:10:39,142][139699] Updated weights for policy 0, policy_version 3483 (0.0006) [2023-08-17 13:10:39,973][139699] Updated weights for policy 0, policy_version 3493 (0.0007) [2023-08-17 13:10:40,795][139699] Updated weights for policy 0, policy_version 3503 (0.0006) [2023-08-17 13:10:41,590][139699] Updated weights for policy 0, policy_version 3513 (0.0007) [2023-08-17 13:10:42,419][139699] Updated weights for policy 0, policy_version 3523 (0.0007) [2023-08-17 13:10:42,729][139702] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(1, 0) Traceback (most recent call last): File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step return self.env.step(action) File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 469, in step observation, reward, terminated, truncated, info = self.env.step(action) File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/nix/store/b84h28azn9cg3h9940zb3b3x2569sykl-python3-3.10.12-env/lib/python3.10/site-packages/gymnasium/core.py", line 408, in step return self.env.step(action) File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) File "/home/patonw/code/learn/deep-rl-class/.mypy/lib/python3.10/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. 
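Note: the SignalException above is the expected Ctrl-C shutdown path rather than a crash. Every stack frame below advance_rollouts is a gymnasium-style wrapper delegating step() inward, so the SIGINT raised inside ViZDoom's make_action propagates back out through the whole wrapper chain unchanged, and the same exception surfaces in all eight rollout workers (per-worker lines follow). A minimal sketch of that delegation pattern, using a generic gymnasium environment as a stand-in rather than sample_factory's actual wrappers:

    # Sketch only: a pass-through wrapper like gathering_reward_shaping.py's.
    # Any exception raised by the innermost env (e.g. vizdoom's SignalException
    # on SIGINT) propagates out through each delegating step() call unchanged.
    import gymnasium as gym

    class PassThroughWrapper(gym.Wrapper):
        def step(self, action):
            observation, reward, terminated, truncated, info = self.env.step(action)
            # reward shaping / stats collection would happen here
            return observation, reward, terminated, truncated, info

    env = PassThroughWrapper(gym.make("CartPole-v1"))  # stand-in for the Doom env
    obs, info = env.reset(seed=0)
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    env.close()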
[2023-08-17 13:10:42,730][139702] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop
[2023-08-17 13:10:42,733][139706] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(0, 0)
[2023-08-17 13:10:42,733][139707] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0)
[2023-08-17 13:10:42,733][139703] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(0, 0)
[2023-08-17 13:10:42,733][139705] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(0, 0)
[2023-08-17 13:10:42,734][139700] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(1, 0)
[2023-08-17 13:10:42,734][139701] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(1, 0)
[2023-08-17 13:10:42,734][139706] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop
[2023-08-17 13:10:42,734][139707] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop
[2023-08-17 13:10:42,734][139703] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop
[2023-08-17 13:10:42,734][139705] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop
[2023-08-17 13:10:42,735][139704] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(0, 0)
[2023-08-17 13:10:42,735][139700] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop
[2023-08-17 13:10:42,735][139701] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop
[2023-08-17 13:10:42,736][139704] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc2_evt_loop
[2023-08-17 13:10:42,746][131794] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 131794], exiting...
[2023-08-17 13:10:42,749][139686] Stopping Batcher_0...
[2023-08-17 13:10:42,749][139686] Loop batcher_evt_loop terminating...
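A quick sanity check on the run summary that follows: this session resumed at 10,006,528 frames (the first Fps line above) and stopped at 14,442,496 frames after a 93.4025 s main loop, which matches the reported average throughput:

\[ \frac{14{,}442{,}496 - 10{,}006{,}528}{93.4025\ \text{s}} = \frac{4{,}435{,}968}{93.4025} \approx 47{,}493\ \text{frames/s} \]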
[2023-08-17 13:10:42,748][131794] Runner profile tree view: main_loop: 93.4025 [2023-08-17 13:10:42,750][131794] Collected {0: 14442496}, FPS: 47493.0 [2023-08-17 13:10:42,750][139686] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003526_14442496.pth... [2023-08-17 13:10:42,817][139686] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002142_8773632.pth [2023-08-17 13:10:42,823][139686] Stopping LearnerWorker_p0... [2023-08-17 13:10:42,824][139686] Loop learner_proc0_evt_loop terminating... [2023-08-17 13:10:42,824][139699] Weights refcount: 2 0 [2023-08-17 13:10:42,825][139699] Stopping InferenceWorker_p0-w0... [2023-08-17 13:10:42,825][139699] Loop inference_proc0-0_evt_loop terminating... [2023-08-17 13:11:05,098][140404] Saving configuration to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... [2023-08-17 13:11:05,231][140404] Rollout worker 0 uses device cpu [2023-08-17 13:11:05,232][140404] Rollout worker 1 uses device cpu [2023-08-17 13:11:05,233][140404] Rollout worker 2 uses device cpu [2023-08-17 13:11:05,233][140404] Rollout worker 3 uses device cpu [2023-08-17 13:11:05,234][140404] Rollout worker 4 uses device cpu [2023-08-17 13:11:05,234][140404] Rollout worker 5 uses device cpu [2023-08-17 13:11:05,235][140404] Rollout worker 6 uses device cpu [2023-08-17 13:11:05,235][140404] Rollout worker 7 uses device cpu [2023-08-17 13:11:05,297][140404] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:11:05,298][140404] InferenceWorker_p0-w0: min num requests: 2 [2023-08-17 13:11:05,317][140404] Starting all processes... [2023-08-17 13:11:05,318][140404] Starting process learner_proc0 [2023-08-17 13:11:05,367][140404] Starting all processes... 
[2023-08-17 13:11:05,370][140404] Starting process inference_proc0-0 [2023-08-17 13:11:05,370][140404] Starting process rollout_proc0 [2023-08-17 13:11:05,370][140404] Starting process rollout_proc1 [2023-08-17 13:11:05,370][140404] Starting process rollout_proc2 [2023-08-17 13:11:05,371][140404] Starting process rollout_proc3 [2023-08-17 13:11:05,371][140404] Starting process rollout_proc4 [2023-08-17 13:11:05,371][140404] Starting process rollout_proc5 [2023-08-17 13:11:05,371][140404] Starting process rollout_proc6 [2023-08-17 13:11:05,371][140404] Starting process rollout_proc7 [2023-08-17 13:11:06,429][140503] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:11:06,429][140503] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-08-17 13:11:06,441][140503] Num visible devices: 1 [2023-08-17 13:11:06,501][140507] Worker 4 uses CPU cores [12, 13, 14] [2023-08-17 13:11:06,501][140506] Worker 3 uses CPU cores [9, 10, 11] [2023-08-17 13:11:06,504][140489] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:11:06,504][140489] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-08-17 13:11:06,507][140502] Worker 0 uses CPU cores [0, 1, 2] [2023-08-17 13:11:06,508][140489] Num visible devices: 1 [2023-08-17 13:11:06,508][140509] Worker 7 uses CPU cores [21, 22, 23] [2023-08-17 13:11:06,516][140508] Worker 5 uses CPU cores [15, 16, 17] [2023-08-17 13:11:06,521][140505] Worker 2 uses CPU cores [6, 7, 8] [2023-08-17 13:11:06,531][140504] Worker 1 uses CPU cores [3, 4, 5] [2023-08-17 13:11:06,531][140489] Starting seed is not provided [2023-08-17 13:11:06,532][140489] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:11:06,532][140489] Initializing actor-critic model on device cuda:0 [2023-08-17 13:11:06,532][140489] RunningMeanStd input shape: (3, 72, 128) [2023-08-17 13:11:06,532][140489] RunningMeanStd input shape: (1,) [2023-08-17 13:11:06,541][140489] ConvEncoder: input_channels=3 [2023-08-17 13:11:06,567][140510] Worker 6 uses CPU cores [18, 19, 20] [2023-08-17 13:11:06,607][140489] Conv encoder output size: 512 [2023-08-17 13:11:06,608][140489] Policy head output size: 512 [2023-08-17 13:11:06,616][140489] Created Actor Critic model with architecture: [2023-08-17 13:11:06,616][140489] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, 
bias=True) ) ) [2023-08-17 13:11:07,757][140489] Using optimizer [2023-08-17 13:11:07,757][140489] Loading state from checkpoint /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003526_14442496.pth... [2023-08-17 13:11:07,780][140489] Loading model from checkpoint [2023-08-17 13:11:07,783][140489] Loaded experiment state at self.train_step=3526, self.env_steps=14442496 [2023-08-17 13:11:07,784][140489] Initialized policy 0 weights for model version 3526 [2023-08-17 13:11:07,785][140489] LearnerWorker_p0 finished initialization! [2023-08-17 13:11:07,785][140489] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-08-17 13:11:08,084][140404] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 14442496. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-08-17 13:11:08,349][140503] RunningMeanStd input shape: (3, 72, 128) [2023-08-17 13:11:08,349][140503] RunningMeanStd input shape: (1,) [2023-08-17 13:11:08,356][140503] ConvEncoder: input_channels=3 [2023-08-17 13:11:08,406][140503] Conv encoder output size: 512 [2023-08-17 13:11:08,406][140503] Policy head output size: 512 [2023-08-17 13:11:08,992][140404] Inference worker 0-0 is ready! [2023-08-17 13:11:08,993][140404] All inference workers are ready! Signal rollout workers to start! [2023-08-17 13:11:09,013][140508] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:11:09,013][140504] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:11:09,013][140502] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:11:09,013][140507] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:11:09,013][140509] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:11:09,014][140506] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:11:09,014][140505] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:11:09,014][140510] Doom resolution: 160x120, resize resolution: (128, 72) [2023-08-17 13:11:09,272][140506] Decorrelating experience for 0 frames... [2023-08-17 13:11:09,273][140502] Decorrelating experience for 0 frames... [2023-08-17 13:11:09,273][140508] Decorrelating experience for 0 frames... [2023-08-17 13:11:09,304][140505] Decorrelating experience for 0 frames... [2023-08-17 13:11:09,320][140510] Decorrelating experience for 0 frames... [2023-08-17 13:11:09,495][140506] Decorrelating experience for 32 frames... [2023-08-17 13:11:09,497][140507] Decorrelating experience for 0 frames... [2023-08-17 13:11:09,498][140502] Decorrelating experience for 32 frames... [2023-08-17 13:11:09,501][140509] Decorrelating experience for 0 frames... [2023-08-17 13:11:09,705][140506] Decorrelating experience for 64 frames... [2023-08-17 13:11:09,710][140509] Decorrelating experience for 32 frames... [2023-08-17 13:11:09,715][140507] Decorrelating experience for 32 frames... [2023-08-17 13:11:09,726][140510] Decorrelating experience for 32 frames... [2023-08-17 13:11:09,731][140502] Decorrelating experience for 64 frames... [2023-08-17 13:11:09,926][140507] Decorrelating experience for 64 frames... [2023-08-17 13:11:09,929][140508] Decorrelating experience for 32 frames... [2023-08-17 13:11:09,941][140506] Decorrelating experience for 96 frames... [2023-08-17 13:11:09,946][140502] Decorrelating experience for 96 frames... [2023-08-17 13:11:10,000][140509] Decorrelating experience for 64 frames... 
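Unlike the first run, this restart resumes from checkpoint_000003526_14442496.pth (train_step=3526, env_steps=14442496), which is why the frame counter above continues from 14,442,496 instead of zero. A hedged sketch of inspecting such a checkpoint file offline; the key names are assumptions inferred from the log, not a documented layout:

    # Sketch only: peek inside the checkpoint the learner loaded above.
    # Key names below are guesses implied by the log (train_step, env_steps);
    # print the real keys first to see what the file actually stores.
    import torch

    ckpt = torch.load(
        "train_dir/default_experiment/checkpoint_p0/checkpoint_000003526_14442496.pth",
        map_location="cpu",
    )
    if isinstance(ckpt, dict):
        print(sorted(ckpt.keys()))
        print(ckpt.get("train_step"), ckpt.get("env_steps"))  # expect 3526, 14442496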
[2023-08-17 13:11:10,130][140504] Decorrelating experience for 0 frames... [2023-08-17 13:11:10,154][140508] Decorrelating experience for 64 frames... [2023-08-17 13:11:10,182][140506] Decorrelating experience for 128 frames... [2023-08-17 13:11:10,215][140509] Decorrelating experience for 96 frames... [2023-08-17 13:11:10,264][140507] Decorrelating experience for 96 frames... [2023-08-17 13:11:10,376][140502] Decorrelating experience for 128 frames... [2023-08-17 13:11:10,381][140508] Decorrelating experience for 96 frames... [2023-08-17 13:11:10,388][140504] Decorrelating experience for 32 frames... [2023-08-17 13:11:10,511][140507] Decorrelating experience for 128 frames... [2023-08-17 13:11:10,585][140510] Decorrelating experience for 64 frames... [2023-08-17 13:11:10,593][140509] Decorrelating experience for 128 frames... [2023-08-17 13:11:10,612][140508] Decorrelating experience for 128 frames... [2023-08-17 13:11:10,617][140502] Decorrelating experience for 160 frames... [2023-08-17 13:11:10,747][140507] Decorrelating experience for 160 frames... [2023-08-17 13:11:10,784][140504] Decorrelating experience for 64 frames... [2023-08-17 13:11:10,818][140509] Decorrelating experience for 160 frames... [2023-08-17 13:11:10,854][140508] Decorrelating experience for 160 frames... [2023-08-17 13:11:10,969][140507] Decorrelating experience for 192 frames... [2023-08-17 13:11:10,999][140510] Decorrelating experience for 96 frames... [2023-08-17 13:11:11,076][140504] Decorrelating experience for 96 frames... [2023-08-17 13:11:11,114][140508] Decorrelating experience for 192 frames... [2023-08-17 13:11:11,204][140509] Decorrelating experience for 192 frames... [2023-08-17 13:11:11,221][140507] Decorrelating experience for 224 frames... [2023-08-17 13:11:11,226][140505] Decorrelating experience for 32 frames... [2023-08-17 13:11:11,243][140510] Decorrelating experience for 128 frames... [2023-08-17 13:11:11,319][140504] Decorrelating experience for 128 frames... [2023-08-17 13:11:11,328][140502] Decorrelating experience for 192 frames... [2023-08-17 13:11:11,439][140506] Decorrelating experience for 160 frames... [2023-08-17 13:11:11,457][140509] Decorrelating experience for 224 frames... [2023-08-17 13:11:11,471][140505] Decorrelating experience for 64 frames... [2023-08-17 13:11:11,536][140510] Decorrelating experience for 160 frames... [2023-08-17 13:11:11,582][140502] Decorrelating experience for 224 frames... [2023-08-17 13:11:11,660][140508] Decorrelating experience for 224 frames... [2023-08-17 13:11:11,713][140504] Decorrelating experience for 160 frames... [2023-08-17 13:11:11,737][140505] Decorrelating experience for 96 frames... [2023-08-17 13:11:11,815][140510] Decorrelating experience for 192 frames... [2023-08-17 13:11:11,902][140506] Decorrelating experience for 192 frames... [2023-08-17 13:11:11,959][140504] Decorrelating experience for 192 frames... [2023-08-17 13:11:11,999][140505] Decorrelating experience for 128 frames... [2023-08-17 13:11:12,094][140510] Decorrelating experience for 224 frames... [2023-08-17 13:11:12,137][140489] Signal inference workers to stop experience collection... [2023-08-17 13:11:12,140][140503] InferenceWorker_p0-w0: stopping experience collection [2023-08-17 13:11:12,193][140506] Decorrelating experience for 224 frames... [2023-08-17 13:11:12,231][140504] Decorrelating experience for 224 frames... [2023-08-17 13:11:12,276][140505] Decorrelating experience for 160 frames... [2023-08-17 13:11:12,511][140505] Decorrelating experience for 192 frames... 
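The "Decorrelating experience" lines above and below show each worker stepping its environments in 32-frame increments up to 224 frames before regular collection begins, so the parallel episodes start out of phase rather than synchronized. A rough sketch of that warm-up idea (the exact schedule here is an assumption, not sample_factory's implementation):

    # Sketch only: desynchronize parallel envs by stepping each a different
    # number of warm-up frames (multiples of 32, mirroring the log above).
    import gymnasium as gym

    envs = [gym.make("CartPole-v1") for _ in range(4)]  # stand-ins for Doom envs
    for i, env in enumerate(envs):
        env.reset(seed=i)
        for _ in range(i * 32):  # 0, 32, 64, 96 warm-up frames
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()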
[2023-08-17 13:11:12,756][140505] Decorrelating experience for 224 frames... [2023-08-17 13:11:13,047][140489] Signal inference workers to resume experience collection... [2023-08-17 13:11:13,048][140503] InferenceWorker_p0-w0: resuming experience collection [2023-08-17 13:11:13,084][140404] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 14446592. Throughput: 0: 256.8. Samples: 1284. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-08-17 13:11:13,084][140404] Avg episode reward: [(0, '1.058')] [2023-08-17 13:11:14,253][140503] Updated weights for policy 0, policy_version 3536 (0.0186) [2023-08-17 13:11:15,098][140503] Updated weights for policy 0, policy_version 3546 (0.0007) [2023-08-17 13:11:15,904][140503] Updated weights for policy 0, policy_version 3556 (0.0007) [2023-08-17 13:11:16,702][140503] Updated weights for policy 0, policy_version 3566 (0.0007) [2023-08-17 13:11:17,530][140503] Updated weights for policy 0, policy_version 3576 (0.0007) [2023-08-17 13:11:18,084][140404] Fps is (10 sec: 22937.5, 60 sec: 22937.5, 300 sec: 22937.5). Total num frames: 14671872. Throughput: 0: 5819.6. Samples: 58196. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:11:18,085][140404] Avg episode reward: [(0, '26.998')] [2023-08-17 13:11:18,337][140503] Updated weights for policy 0, policy_version 3586 (0.0007) [2023-08-17 13:11:19,147][140503] Updated weights for policy 0, policy_version 3596 (0.0007) [2023-08-17 13:11:19,945][140503] Updated weights for policy 0, policy_version 3606 (0.0006) [2023-08-17 13:11:20,716][140503] Updated weights for policy 0, policy_version 3616 (0.0006) [2023-08-17 13:11:21,534][140503] Updated weights for policy 0, policy_version 3626 (0.0006) [2023-08-17 13:11:22,333][140503] Updated weights for policy 0, policy_version 3636 (0.0006) [2023-08-17 13:11:23,084][140404] Fps is (10 sec: 48332.1, 60 sec: 32494.7, 300 sec: 32494.7). Total num frames: 14929920. Throughput: 0: 6449.3. Samples: 96740. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:11:23,085][140404] Avg episode reward: [(0, '30.229')] [2023-08-17 13:11:23,088][140489] Saving new best policy, reward=30.229! 
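"Saving new best policy" lines like the one above fire whenever the average episode reward exceeds the best value seen so far in the experiment. The bookkeeping amounts to a running maximum, sketched here with reward values taken from the log (a real learner would also write a checkpoint at each improvement):

    # Sketch only: best-policy tracking as implied by the log messages.
    import math

    best_reward = -math.inf
    for avg_reward in (26.382, 27.613, 28.361, 30.229):  # values logged above
        if avg_reward > best_reward:
            best_reward = avg_reward
            print(f"Saving new best policy, reward={best_reward:.3f}!")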
[2023-08-17 13:11:23,175][140503] Updated weights for policy 0, policy_version 3646 (0.0006) [2023-08-17 13:11:23,997][140503] Updated weights for policy 0, policy_version 3656 (0.0007) [2023-08-17 13:11:24,839][140503] Updated weights for policy 0, policy_version 3666 (0.0007) [2023-08-17 13:11:25,292][140404] Heartbeat connected on Batcher_0 [2023-08-17 13:11:25,300][140404] Heartbeat connected on InferenceWorker_p0-w0 [2023-08-17 13:11:25,303][140404] Heartbeat connected on RolloutWorker_w1 [2023-08-17 13:11:25,304][140404] Heartbeat connected on RolloutWorker_w0 [2023-08-17 13:11:25,307][140404] Heartbeat connected on RolloutWorker_w2 [2023-08-17 13:11:25,308][140404] Heartbeat connected on LearnerWorker_p0 [2023-08-17 13:11:25,310][140404] Heartbeat connected on RolloutWorker_w3 [2023-08-17 13:11:25,315][140404] Heartbeat connected on RolloutWorker_w4 [2023-08-17 13:11:25,316][140404] Heartbeat connected on RolloutWorker_w5 [2023-08-17 13:11:25,317][140404] Heartbeat connected on RolloutWorker_w6 [2023-08-17 13:11:25,321][140404] Heartbeat connected on RolloutWorker_w7 [2023-08-17 13:11:25,633][140503] Updated weights for policy 0, policy_version 3676 (0.0007) [2023-08-17 13:11:26,418][140503] Updated weights for policy 0, policy_version 3686 (0.0006) [2023-08-17 13:11:27,245][140503] Updated weights for policy 0, policy_version 3696 (0.0007) [2023-08-17 13:11:28,075][140503] Updated weights for policy 0, policy_version 3706 (0.0007) [2023-08-17 13:11:28,084][140404] Fps is (10 sec: 50790.4, 60 sec: 36864.0, 300 sec: 36864.0). Total num frames: 15179776. Throughput: 0: 8612.2. Samples: 172244. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:11:28,085][140404] Avg episode reward: [(0, '28.091')] [2023-08-17 13:11:28,894][140503] Updated weights for policy 0, policy_version 3716 (0.0007) [2023-08-17 13:11:29,730][140503] Updated weights for policy 0, policy_version 3726 (0.0007) [2023-08-17 13:11:30,591][140503] Updated weights for policy 0, policy_version 3736 (0.0007) [2023-08-17 13:11:31,412][140503] Updated weights for policy 0, policy_version 3746 (0.0007) [2023-08-17 13:11:32,255][140503] Updated weights for policy 0, policy_version 3756 (0.0007) [2023-08-17 13:11:33,039][140503] Updated weights for policy 0, policy_version 3766 (0.0006) [2023-08-17 13:11:33,084][140404] Fps is (10 sec: 49562.3, 60 sec: 39321.6, 300 sec: 39321.6). Total num frames: 15425536. Throughput: 0: 9853.8. Samples: 246344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:11:33,085][140404] Avg episode reward: [(0, '26.729')] [2023-08-17 13:11:33,853][140503] Updated weights for policy 0, policy_version 3776 (0.0007) [2023-08-17 13:11:34,648][140503] Updated weights for policy 0, policy_version 3786 (0.0007) [2023-08-17 13:11:35,463][140503] Updated weights for policy 0, policy_version 3796 (0.0007) [2023-08-17 13:11:36,358][140503] Updated weights for policy 0, policy_version 3806 (0.0007) [2023-08-17 13:11:37,165][140503] Updated weights for policy 0, policy_version 3816 (0.0007) [2023-08-17 13:11:37,949][140503] Updated weights for policy 0, policy_version 3826 (0.0006) [2023-08-17 13:11:38,084][140404] Fps is (10 sec: 49561.9, 60 sec: 41096.6, 300 sec: 41096.6). Total num frames: 15675392. Throughput: 0: 9462.4. Samples: 283872. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-08-17 13:11:38,085][140404] Avg episode reward: [(0, '29.162')] [2023-08-17 13:11:38,723][140503] Updated weights for policy 0, policy_version 3836 (0.0006) [2023-08-17 13:11:39,523][140503] Updated weights for policy 0, policy_version 3846 (0.0007) [2023-08-17 13:11:40,373][140503] Updated weights for policy 0, policy_version 3856 (0.0007) [2023-08-17 13:11:41,270][140503] Updated weights for policy 0, policy_version 3866 (0.0007) [2023-08-17 13:11:42,080][140503] Updated weights for policy 0, policy_version 3876 (0.0007) [2023-08-17 13:11:42,923][140503] Updated weights for policy 0, policy_version 3886 (0.0007) [2023-08-17 13:11:43,084][140404] Fps is (10 sec: 49971.1, 60 sec: 42364.4, 300 sec: 42364.4). Total num frames: 15925248. Throughput: 0: 10249.9. Samples: 358748. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:11:43,085][140404] Avg episode reward: [(0, '28.969')] [2023-08-17 13:11:43,688][140503] Updated weights for policy 0, policy_version 3896 (0.0006) [2023-08-17 13:11:44,467][140503] Updated weights for policy 0, policy_version 3906 (0.0006) [2023-08-17 13:11:45,251][140503] Updated weights for policy 0, policy_version 3916 (0.0006) [2023-08-17 13:11:46,053][140503] Updated weights for policy 0, policy_version 3926 (0.0007) [2023-08-17 13:11:46,844][140503] Updated weights for policy 0, policy_version 3936 (0.0006) [2023-08-17 13:11:47,667][140503] Updated weights for policy 0, policy_version 3946 (0.0007) [2023-08-17 13:11:48,084][140404] Fps is (10 sec: 50790.3, 60 sec: 43520.0, 300 sec: 43520.0). Total num frames: 16183296. Throughput: 0: 10885.5. Samples: 435420. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:11:48,085][140404] Avg episode reward: [(0, '28.787')] [2023-08-17 13:11:48,479][140503] Updated weights for policy 0, policy_version 3956 (0.0007) [2023-08-17 13:11:49,332][140503] Updated weights for policy 0, policy_version 3966 (0.0007) [2023-08-17 13:11:50,148][140503] Updated weights for policy 0, policy_version 3976 (0.0007) [2023-08-17 13:11:50,973][140503] Updated weights for policy 0, policy_version 3986 (0.0007) [2023-08-17 13:11:51,835][140503] Updated weights for policy 0, policy_version 3996 (0.0007) [2023-08-17 13:11:52,633][140503] Updated weights for policy 0, policy_version 4006 (0.0007) [2023-08-17 13:11:53,084][140404] Fps is (10 sec: 50380.2, 60 sec: 44145.7, 300 sec: 44145.7). Total num frames: 16429056. Throughput: 0: 10499.8. Samples: 472492. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:11:53,085][140404] Avg episode reward: [(0, '30.705')] [2023-08-17 13:11:53,089][140489] Saving new best policy, reward=30.705! [2023-08-17 13:11:53,446][140503] Updated weights for policy 0, policy_version 4016 (0.0007) [2023-08-17 13:11:54,259][140503] Updated weights for policy 0, policy_version 4026 (0.0007) [2023-08-17 13:11:55,061][140503] Updated weights for policy 0, policy_version 4036 (0.0006) [2023-08-17 13:11:55,916][140503] Updated weights for policy 0, policy_version 4046 (0.0007) [2023-08-17 13:11:56,749][140503] Updated weights for policy 0, policy_version 4056 (0.0006) [2023-08-17 13:11:57,597][140503] Updated weights for policy 0, policy_version 4066 (0.0007) [2023-08-17 13:11:58,084][140404] Fps is (10 sec: 49561.3, 60 sec: 44728.3, 300 sec: 44728.3). Total num frames: 16678912. Throughput: 0: 12126.7. Samples: 546988. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:11:58,085][140404] Avg episode reward: [(0, '32.163')] [2023-08-17 13:11:58,086][140489] Saving new best policy, reward=32.163! [2023-08-17 13:11:58,398][140503] Updated weights for policy 0, policy_version 4076 (0.0006) [2023-08-17 13:11:59,215][140503] Updated weights for policy 0, policy_version 4086 (0.0007) [2023-08-17 13:12:00,017][140503] Updated weights for policy 0, policy_version 4096 (0.0006) [2023-08-17 13:12:00,827][140503] Updated weights for policy 0, policy_version 4106 (0.0007) [2023-08-17 13:12:01,632][140503] Updated weights for policy 0, policy_version 4116 (0.0007) [2023-08-17 13:12:02,461][140503] Updated weights for policy 0, policy_version 4126 (0.0007) [2023-08-17 13:12:03,084][140404] Fps is (10 sec: 49971.5, 60 sec: 45204.9, 300 sec: 45204.9). Total num frames: 16928768. Throughput: 0: 12533.2. Samples: 622192. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:12:03,085][140404] Avg episode reward: [(0, '29.896')] [2023-08-17 13:12:03,297][140503] Updated weights for policy 0, policy_version 4136 (0.0007) [2023-08-17 13:12:04,106][140503] Updated weights for policy 0, policy_version 4146 (0.0006) [2023-08-17 13:12:04,930][140503] Updated weights for policy 0, policy_version 4156 (0.0007) [2023-08-17 13:12:05,765][140503] Updated weights for policy 0, policy_version 4166 (0.0007) [2023-08-17 13:12:06,551][140503] Updated weights for policy 0, policy_version 4176 (0.0006) [2023-08-17 13:12:07,375][140503] Updated weights for policy 0, policy_version 4186 (0.0007) [2023-08-17 13:12:08,084][140404] Fps is (10 sec: 49971.1, 60 sec: 45602.1, 300 sec: 45602.1). Total num frames: 17178624. Throughput: 0: 12505.9. Samples: 659504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:12:08,085][140404] Avg episode reward: [(0, '30.775')] [2023-08-17 13:12:08,235][140503] Updated weights for policy 0, policy_version 4196 (0.0007) [2023-08-17 13:12:09,058][140503] Updated weights for policy 0, policy_version 4206 (0.0007) [2023-08-17 13:12:09,873][140503] Updated weights for policy 0, policy_version 4216 (0.0007) [2023-08-17 13:12:10,702][140503] Updated weights for policy 0, policy_version 4226 (0.0006) [2023-08-17 13:12:11,536][140503] Updated weights for policy 0, policy_version 4236 (0.0007) [2023-08-17 13:12:12,351][140503] Updated weights for policy 0, policy_version 4246 (0.0007) [2023-08-17 13:12:13,084][140404] Fps is (10 sec: 49971.7, 60 sec: 49698.1, 300 sec: 45938.2). Total num frames: 17428480. Throughput: 0: 12489.3. Samples: 734264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2023-08-17 13:12:13,085][140404] Avg episode reward: [(0, '30.316')] [2023-08-17 13:12:13,154][140503] Updated weights for policy 0, policy_version 4256 (0.0006) [2023-08-17 13:12:13,954][140503] Updated weights for policy 0, policy_version 4266 (0.0006) [2023-08-17 13:12:14,761][140503] Updated weights for policy 0, policy_version 4276 (0.0007) [2023-08-17 13:12:15,587][140503] Updated weights for policy 0, policy_version 4286 (0.0008) [2023-08-17 13:12:16,377][140503] Updated weights for policy 0, policy_version 4296 (0.0006) [2023-08-17 13:12:17,173][140503] Updated weights for policy 0, policy_version 4306 (0.0006) [2023-08-17 13:12:17,977][140503] Updated weights for policy 0, policy_version 4316 (0.0006) [2023-08-17 13:12:18,084][140404] Fps is (10 sec: 50381.0, 60 sec: 50176.0, 300 sec: 46284.8). Total num frames: 17682432. Throughput: 0: 12543.5. Samples: 810800. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:12:18,085][140404] Avg episode reward: [(0, '29.583')] [2023-08-17 13:12:18,701][140503] Updated weights for policy 0, policy_version 4326 (0.0006) [2023-08-17 13:12:19,456][140503] Updated weights for policy 0, policy_version 4336 (0.0007) [2023-08-17 13:12:20,205][140503] Updated weights for policy 0, policy_version 4346 (0.0007) [2023-08-17 13:12:20,975][140503] Updated weights for policy 0, policy_version 4356 (0.0006) [2023-08-17 13:12:21,725][140503] Updated weights for policy 0, policy_version 4366 (0.0006) [2023-08-17 13:12:22,482][140503] Updated weights for policy 0, policy_version 4376 (0.0006) [2023-08-17 13:12:23,084][140404] Fps is (10 sec: 52428.5, 60 sec: 50380.9, 300 sec: 46803.6). Total num frames: 17952768. Throughput: 0: 12608.3. Samples: 851248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:12:23,085][140404] Avg episode reward: [(0, '31.579')] [2023-08-17 13:12:23,298][140503] Updated weights for policy 0, policy_version 4386 (0.0007) [2023-08-17 13:12:24,081][140503] Updated weights for policy 0, policy_version 4396 (0.0007) [2023-08-17 13:12:24,847][140503] Updated weights for policy 0, policy_version 4406 (0.0007) [2023-08-17 13:12:25,587][140503] Updated weights for policy 0, policy_version 4416 (0.0006) [2023-08-17 13:12:26,341][140503] Updated weights for policy 0, policy_version 4426 (0.0006) [2023-08-17 13:12:27,081][140503] Updated weights for policy 0, policy_version 4436 (0.0006) [2023-08-17 13:12:27,863][140503] Updated weights for policy 0, policy_version 4446 (0.0006) [2023-08-17 13:12:28,084][140404] Fps is (10 sec: 53657.3, 60 sec: 50653.8, 300 sec: 47206.4). Total num frames: 18219008. Throughput: 0: 12728.2. Samples: 931516. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-08-17 13:12:28,085][140404] Avg episode reward: [(0, '28.702')] [2023-08-17 13:12:28,678][140503] Updated weights for policy 0, policy_version 4456 (0.0007) [2023-08-17 13:12:29,427][140503] Updated weights for policy 0, policy_version 4466 (0.0006) [2023-08-17 13:12:30,187][140503] Updated weights for policy 0, policy_version 4476 (0.0006) [2023-08-17 13:12:30,952][140503] Updated weights for policy 0, policy_version 4486 (0.0007) [2023-08-17 13:12:31,713][140503] Updated weights for policy 0, policy_version 4496 (0.0007) [2023-08-17 13:12:32,487][140503] Updated weights for policy 0, policy_version 4506 (0.0006) [2023-08-17 13:12:33,084][140404] Fps is (10 sec: 53248.0, 60 sec: 50995.2, 300 sec: 47561.8). Total num frames: 18485248. Throughput: 0: 12801.8. Samples: 1011500. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:12:33,085][140404] Avg episode reward: [(0, '32.508')] [2023-08-17 13:12:33,088][140489] Saving new best policy, reward=32.508! [2023-08-17 13:12:33,270][140503] Updated weights for policy 0, policy_version 4516 (0.0006) [2023-08-17 13:12:34,059][140503] Updated weights for policy 0, policy_version 4526 (0.0006) [2023-08-17 13:12:34,818][140503] Updated weights for policy 0, policy_version 4536 (0.0006) [2023-08-17 13:12:35,593][140503] Updated weights for policy 0, policy_version 4546 (0.0007) [2023-08-17 13:12:36,376][140503] Updated weights for policy 0, policy_version 4556 (0.0007) [2023-08-17 13:12:37,179][140503] Updated weights for policy 0, policy_version 4566 (0.0007) [2023-08-17 13:12:37,952][140503] Updated weights for policy 0, policy_version 4576 (0.0007) [2023-08-17 13:12:38,084][140404] Fps is (10 sec: 52838.8, 60 sec: 51200.0, 300 sec: 47832.2). 
Total num frames: 18747392. Throughput: 0: 12856.7. Samples: 1051040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:12:38,085][140404] Avg episode reward: [(0, '32.126')] [2023-08-17 13:12:38,744][140503] Updated weights for policy 0, policy_version 4586 (0.0007) [2023-08-17 13:12:39,532][140503] Updated weights for policy 0, policy_version 4596 (0.0007) [2023-08-17 13:12:40,330][140503] Updated weights for policy 0, policy_version 4606 (0.0007) [2023-08-17 13:12:41,102][140503] Updated weights for policy 0, policy_version 4616 (0.0006) [2023-08-17 13:12:41,881][140503] Updated weights for policy 0, policy_version 4626 (0.0006) [2023-08-17 13:12:42,657][140503] Updated weights for policy 0, policy_version 4636 (0.0006) [2023-08-17 13:12:43,084][140404] Fps is (10 sec: 52429.0, 60 sec: 51404.8, 300 sec: 48074.1). Total num frames: 19009536. Throughput: 0: 12939.7. Samples: 1129276. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-08-17 13:12:43,085][140404] Avg episode reward: [(0, '28.503')] [2023-08-17 13:12:43,448][140503] Updated weights for policy 0, policy_version 4646 (0.0006) [2023-08-17 13:12:44,216][140503] Updated weights for policy 0, policy_version 4656 (0.0007) [2023-08-17 13:12:44,962][140503] Updated weights for policy 0, policy_version 4666 (0.0007) [2023-08-17 13:12:45,766][140503] Updated weights for policy 0, policy_version 4676 (0.0007) [2023-08-17 13:12:46,500][140503] Updated weights for policy 0, policy_version 4686 (0.0007) [2023-08-17 13:12:47,231][140503] Updated weights for policy 0, policy_version 4696 (0.0006) [2023-08-17 13:12:47,998][140503] Updated weights for policy 0, policy_version 4706 (0.0006) [2023-08-17 13:12:48,084][140404] Fps is (10 sec: 53247.8, 60 sec: 51609.6, 300 sec: 48373.7). Total num frames: 19279872. Throughput: 0: 13048.9. Samples: 1209392. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2023-08-17 13:12:48,085][140404] Avg episode reward: [(0, '29.581')] [2023-08-17 13:12:48,776][140503] Updated weights for policy 0, policy_version 4716 (0.0007) [2023-08-17 13:12:49,528][140503] Updated weights for policy 0, policy_version 4726 (0.0006) [2023-08-17 13:12:50,272][140503] Updated weights for policy 0, policy_version 4736 (0.0006) [2023-08-17 13:12:51,033][140503] Updated weights for policy 0, policy_version 4746 (0.0007) [2023-08-17 13:12:51,787][140503] Updated weights for policy 0, policy_version 4756 (0.0007) [2023-08-17 13:12:52,543][140503] Updated weights for policy 0, policy_version 4766 (0.0006) [2023-08-17 13:12:53,084][140404] Fps is (10 sec: 53656.9, 60 sec: 51950.9, 300 sec: 48605.8). Total num frames: 19546112. Throughput: 0: 13118.3. Samples: 1249828. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:12:53,085][140404] Avg episode reward: [(0, '26.128')] [2023-08-17 13:12:53,343][140503] Updated weights for policy 0, policy_version 4776 (0.0007) [2023-08-17 13:12:54,121][140503] Updated weights for policy 0, policy_version 4786 (0.0006) [2023-08-17 13:12:54,878][140503] Updated weights for policy 0, policy_version 4796 (0.0006) [2023-08-17 13:12:55,649][140503] Updated weights for policy 0, policy_version 4806 (0.0006) [2023-08-17 13:12:56,393][140503] Updated weights for policy 0, policy_version 4816 (0.0006) [2023-08-17 13:12:57,128][140503] Updated weights for policy 0, policy_version 4826 (0.0006) [2023-08-17 13:12:57,881][140503] Updated weights for policy 0, policy_version 4836 (0.0006) [2023-08-17 13:12:58,084][140404] Fps is (10 sec: 53657.4, 60 sec: 52292.2, 300 sec: 48854.1). 
Total num frames: 19816448. Throughput: 0: 13246.7. Samples: 1330368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:12:58,085][140404] Avg episode reward: [(0, '31.256')] [2023-08-17 13:12:58,638][140503] Updated weights for policy 0, policy_version 4846 (0.0006) [2023-08-17 13:12:59,417][140503] Updated weights for policy 0, policy_version 4856 (0.0006) [2023-08-17 13:13:00,155][140503] Updated weights for policy 0, policy_version 4866 (0.0006) [2023-08-17 13:13:00,912][140503] Updated weights for policy 0, policy_version 4876 (0.0006) [2023-08-17 13:13:01,641][140503] Updated weights for policy 0, policy_version 4886 (0.0006) [2023-08-17 13:13:02,372][140503] Updated weights for policy 0, policy_version 4896 (0.0006) [2023-08-17 13:13:03,084][140404] Fps is (10 sec: 54067.8, 60 sec: 52633.6, 300 sec: 49080.8). Total num frames: 20086784. Throughput: 0: 13361.0. Samples: 1412044. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:13:03,085][140404] Avg episode reward: [(0, '31.117')] [2023-08-17 13:13:03,089][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004904_20086784.pth... [2023-08-17 13:13:03,135][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth [2023-08-17 13:13:03,179][140503] Updated weights for policy 0, policy_version 4906 (0.0007) [2023-08-17 13:13:03,933][140503] Updated weights for policy 0, policy_version 4916 (0.0006) [2023-08-17 13:13:04,706][140503] Updated weights for policy 0, policy_version 4926 (0.0007) [2023-08-17 13:13:05,479][140503] Updated weights for policy 0, policy_version 4936 (0.0006) [2023-08-17 13:13:06,231][140503] Updated weights for policy 0, policy_version 4946 (0.0006) [2023-08-17 13:13:06,992][140503] Updated weights for policy 0, policy_version 4956 (0.0006) [2023-08-17 13:13:07,740][140503] Updated weights for policy 0, policy_version 4966 (0.0006) [2023-08-17 13:13:08,084][140404] Fps is (10 sec: 54067.4, 60 sec: 52975.0, 300 sec: 49288.5). Total num frames: 20357120. Throughput: 0: 13347.8. Samples: 1451900. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:13:08,085][140404] Avg episode reward: [(0, '32.514')] [2023-08-17 13:13:08,086][140489] Saving new best policy, reward=32.514! [2023-08-17 13:13:08,502][140503] Updated weights for policy 0, policy_version 4976 (0.0006) [2023-08-17 13:13:09,245][140503] Updated weights for policy 0, policy_version 4986 (0.0006) [2023-08-17 13:13:10,012][140503] Updated weights for policy 0, policy_version 4996 (0.0006) [2023-08-17 13:13:10,786][140503] Updated weights for policy 0, policy_version 5006 (0.0006) [2023-08-17 13:13:11,540][140503] Updated weights for policy 0, policy_version 5016 (0.0007) [2023-08-17 13:13:12,309][140503] Updated weights for policy 0, policy_version 5026 (0.0007) [2023-08-17 13:13:13,067][140503] Updated weights for policy 0, policy_version 5036 (0.0006) [2023-08-17 13:13:13,084][140404] Fps is (10 sec: 54066.9, 60 sec: 53316.2, 300 sec: 49479.7). Total num frames: 20627456. Throughput: 0: 13361.4. Samples: 1532780. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:13:13,085][140404] Avg episode reward: [(0, '31.995')] [2023-08-17 13:13:13,828][140503] Updated weights for policy 0, policy_version 5046 (0.0006) [2023-08-17 13:13:14,592][140503] Updated weights for policy 0, policy_version 5056 (0.0007) [2023-08-17 13:13:15,353][140503] Updated weights for policy 0, policy_version 5066 (0.0007) [2023-08-17 13:13:16,115][140503] Updated weights for policy 0, policy_version 5076 (0.0007) [2023-08-17 13:13:16,877][140503] Updated weights for policy 0, policy_version 5086 (0.0006) [2023-08-17 13:13:17,599][140503] Updated weights for policy 0, policy_version 5096 (0.0007) [2023-08-17 13:13:18,084][140404] Fps is (10 sec: 54067.0, 60 sec: 53589.3, 300 sec: 49656.1). Total num frames: 20897792. Throughput: 0: 13386.2. Samples: 1613880. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:13:18,085][140404] Avg episode reward: [(0, '31.293')] [2023-08-17 13:13:18,389][140503] Updated weights for policy 0, policy_version 5106 (0.0007) [2023-08-17 13:13:19,148][140503] Updated weights for policy 0, policy_version 5116 (0.0006) [2023-08-17 13:13:19,919][140503] Updated weights for policy 0, policy_version 5126 (0.0006) [2023-08-17 13:13:20,672][140503] Updated weights for policy 0, policy_version 5136 (0.0006) [2023-08-17 13:13:21,454][140503] Updated weights for policy 0, policy_version 5146 (0.0006) [2023-08-17 13:13:22,214][140503] Updated weights for policy 0, policy_version 5156 (0.0006) [2023-08-17 13:13:22,984][140503] Updated weights for policy 0, policy_version 5166 (0.0006) [2023-08-17 13:13:23,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53521.1, 300 sec: 49789.1). Total num frames: 21164032. Throughput: 0: 13398.6. Samples: 1653980. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:13:23,085][140404] Avg episode reward: [(0, '31.505')] [2023-08-17 13:13:23,751][140503] Updated weights for policy 0, policy_version 5176 (0.0006) [2023-08-17 13:13:24,511][140503] Updated weights for policy 0, policy_version 5186 (0.0006) [2023-08-17 13:13:25,255][140503] Updated weights for policy 0, policy_version 5196 (0.0007) [2023-08-17 13:13:26,045][140503] Updated weights for policy 0, policy_version 5206 (0.0007) [2023-08-17 13:13:26,798][140503] Updated weights for policy 0, policy_version 5216 (0.0006) [2023-08-17 13:13:27,551][140503] Updated weights for policy 0, policy_version 5226 (0.0007) [2023-08-17 13:13:28,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53521.1, 300 sec: 49912.7). Total num frames: 21430272. Throughput: 0: 13447.2. Samples: 1734400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:13:28,085][140404] Avg episode reward: [(0, '32.278')] [2023-08-17 13:13:28,316][140503] Updated weights for policy 0, policy_version 5236 (0.0006) [2023-08-17 13:13:29,120][140503] Updated weights for policy 0, policy_version 5246 (0.0007) [2023-08-17 13:13:29,899][140503] Updated weights for policy 0, policy_version 5256 (0.0007) [2023-08-17 13:13:30,657][140503] Updated weights for policy 0, policy_version 5266 (0.0007) [2023-08-17 13:13:31,386][140503] Updated weights for policy 0, policy_version 5276 (0.0006) [2023-08-17 13:13:32,158][140503] Updated weights for policy 0, policy_version 5286 (0.0007) [2023-08-17 13:13:32,916][140503] Updated weights for policy 0, policy_version 5296 (0.0007) [2023-08-17 13:13:33,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53521.0, 300 sec: 50027.7). Total num frames: 21696512. Throughput: 0: 13448.2. Samples: 1814560. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:13:33,085][140404] Avg episode reward: [(0, '31.197')] [2023-08-17 13:13:33,718][140503] Updated weights for policy 0, policy_version 5306 (0.0007) [2023-08-17 13:13:34,497][140503] Updated weights for policy 0, policy_version 5316 (0.0006) [2023-08-17 13:13:35,267][140503] Updated weights for policy 0, policy_version 5326 (0.0006) [2023-08-17 13:13:36,039][140503] Updated weights for policy 0, policy_version 5336 (0.0006) [2023-08-17 13:13:36,773][140503] Updated weights for policy 0, policy_version 5346 (0.0006) [2023-08-17 13:13:37,541][140503] Updated weights for policy 0, policy_version 5356 (0.0006) [2023-08-17 13:13:38,084][140404] Fps is (10 sec: 53657.8, 60 sec: 53657.6, 300 sec: 50162.4). Total num frames: 21966848. Throughput: 0: 13432.8. Samples: 1854300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:13:38,085][140404] Avg episode reward: [(0, '28.943')] [2023-08-17 13:13:38,289][140503] Updated weights for policy 0, policy_version 5366 (0.0006) [2023-08-17 13:13:39,055][140503] Updated weights for policy 0, policy_version 5376 (0.0006) [2023-08-17 13:13:39,801][140503] Updated weights for policy 0, policy_version 5386 (0.0007) [2023-08-17 13:13:40,560][140503] Updated weights for policy 0, policy_version 5396 (0.0006) [2023-08-17 13:13:41,319][140503] Updated weights for policy 0, policy_version 5406 (0.0006) [2023-08-17 13:13:42,070][140503] Updated weights for policy 0, policy_version 5416 (0.0006) [2023-08-17 13:13:42,812][140503] Updated weights for policy 0, policy_version 5426 (0.0006) [2023-08-17 13:13:43,084][140404] Fps is (10 sec: 54067.6, 60 sec: 53794.1, 300 sec: 50288.3). Total num frames: 22237184. Throughput: 0: 13445.9. Samples: 1935432. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:13:43,085][140404] Avg episode reward: [(0, '31.314')] [2023-08-17 13:13:43,576][140503] Updated weights for policy 0, policy_version 5436 (0.0007) [2023-08-17 13:13:44,339][140503] Updated weights for policy 0, policy_version 5446 (0.0006) [2023-08-17 13:13:45,103][140503] Updated weights for policy 0, policy_version 5456 (0.0006) [2023-08-17 13:13:45,872][140503] Updated weights for policy 0, policy_version 5466 (0.0007) [2023-08-17 13:13:46,688][140503] Updated weights for policy 0, policy_version 5476 (0.0007) [2023-08-17 13:13:47,448][140503] Updated weights for policy 0, policy_version 5486 (0.0006) [2023-08-17 13:13:48,084][140404] Fps is (10 sec: 53247.8, 60 sec: 53657.6, 300 sec: 50355.2). Total num frames: 22499328. Throughput: 0: 13404.3. Samples: 2015236. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:13:48,085][140404] Avg episode reward: [(0, '33.426')] [2023-08-17 13:13:48,093][140489] Saving new best policy, reward=33.426! [2023-08-17 13:13:48,246][140503] Updated weights for policy 0, policy_version 5496 (0.0007) [2023-08-17 13:13:49,016][140503] Updated weights for policy 0, policy_version 5506 (0.0006) [2023-08-17 13:13:49,798][140503] Updated weights for policy 0, policy_version 5516 (0.0006) [2023-08-17 13:13:50,563][140503] Updated weights for policy 0, policy_version 5526 (0.0006) [2023-08-17 13:13:51,338][140503] Updated weights for policy 0, policy_version 5536 (0.0006) [2023-08-17 13:13:52,077][140503] Updated weights for policy 0, policy_version 5546 (0.0007) [2023-08-17 13:13:52,857][140503] Updated weights for policy 0, policy_version 5556 (0.0006) [2023-08-17 13:13:53,084][140404] Fps is (10 sec: 53248.1, 60 sec: 53726.0, 300 sec: 50467.7). 
Total num frames: 22769664. Throughput: 0: 13401.2. Samples: 2054952. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:13:53,085][140404] Avg episode reward: [(0, '31.517')] [2023-08-17 13:13:53,656][140503] Updated weights for policy 0, policy_version 5566 (0.0006) [2023-08-17 13:13:54,400][140503] Updated weights for policy 0, policy_version 5576 (0.0007) [2023-08-17 13:13:55,175][140503] Updated weights for policy 0, policy_version 5586 (0.0006) [2023-08-17 13:13:55,951][140503] Updated weights for policy 0, policy_version 5596 (0.0007) [2023-08-17 13:13:56,710][140503] Updated weights for policy 0, policy_version 5606 (0.0006) [2023-08-17 13:13:57,472][140503] Updated weights for policy 0, policy_version 5616 (0.0006) [2023-08-17 13:13:58,084][140404] Fps is (10 sec: 53657.8, 60 sec: 53657.7, 300 sec: 50549.5). Total num frames: 23035904. Throughput: 0: 13384.7. Samples: 2135092. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:13:58,085][140404] Avg episode reward: [(0, '31.760')] [2023-08-17 13:13:58,239][140503] Updated weights for policy 0, policy_version 5626 (0.0007) [2023-08-17 13:13:59,020][140503] Updated weights for policy 0, policy_version 5636 (0.0006) [2023-08-17 13:13:59,779][140503] Updated weights for policy 0, policy_version 5646 (0.0006) [2023-08-17 13:14:00,573][140503] Updated weights for policy 0, policy_version 5656 (0.0006) [2023-08-17 13:14:01,355][140503] Updated weights for policy 0, policy_version 5666 (0.0007) [2023-08-17 13:14:02,108][140503] Updated weights for policy 0, policy_version 5676 (0.0007) [2023-08-17 13:14:02,904][140503] Updated weights for policy 0, policy_version 5686 (0.0007) [2023-08-17 13:14:03,084][140404] Fps is (10 sec: 52838.2, 60 sec: 53521.1, 300 sec: 50603.2). Total num frames: 23298048. Throughput: 0: 13348.8. Samples: 2214576. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:14:03,085][140404] Avg episode reward: [(0, '31.958')] [2023-08-17 13:14:03,656][140503] Updated weights for policy 0, policy_version 5696 (0.0007) [2023-08-17 13:14:04,427][140503] Updated weights for policy 0, policy_version 5706 (0.0007) [2023-08-17 13:14:05,210][140503] Updated weights for policy 0, policy_version 5716 (0.0007) [2023-08-17 13:14:06,009][140503] Updated weights for policy 0, policy_version 5726 (0.0007) [2023-08-17 13:14:06,772][140503] Updated weights for policy 0, policy_version 5736 (0.0007) [2023-08-17 13:14:07,535][140503] Updated weights for policy 0, policy_version 5746 (0.0007) [2023-08-17 13:14:08,084][140404] Fps is (10 sec: 52838.3, 60 sec: 53452.8, 300 sec: 50676.6). Total num frames: 23564288. Throughput: 0: 13332.5. Samples: 2253944. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:14:08,085][140404] Avg episode reward: [(0, '32.971')] [2023-08-17 13:14:08,294][140503] Updated weights for policy 0, policy_version 5756 (0.0006) [2023-08-17 13:14:09,071][140503] Updated weights for policy 0, policy_version 5766 (0.0007) [2023-08-17 13:14:09,839][140503] Updated weights for policy 0, policy_version 5776 (0.0007) [2023-08-17 13:14:10,592][140503] Updated weights for policy 0, policy_version 5786 (0.0006) [2023-08-17 13:14:11,407][140503] Updated weights for policy 0, policy_version 5796 (0.0007) [2023-08-17 13:14:12,220][140503] Updated weights for policy 0, policy_version 5806 (0.0007) [2023-08-17 13:14:13,013][140503] Updated weights for policy 0, policy_version 5816 (0.0006) [2023-08-17 13:14:13,084][140404] Fps is (10 sec: 52428.2, 60 sec: 53247.9, 300 sec: 50701.8). 
Total num frames: 23822336. Throughput: 0: 13301.8. Samples: 2332984. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:14:13,085][140404] Avg episode reward: [(0, '31.965')] [2023-08-17 13:14:13,826][140503] Updated weights for policy 0, policy_version 5826 (0.0006) [2023-08-17 13:14:14,636][140503] Updated weights for policy 0, policy_version 5836 (0.0007) [2023-08-17 13:14:15,417][140503] Updated weights for policy 0, policy_version 5846 (0.0006) [2023-08-17 13:14:16,182][140503] Updated weights for policy 0, policy_version 5856 (0.0006) [2023-08-17 13:14:16,985][140503] Updated weights for policy 0, policy_version 5866 (0.0007) [2023-08-17 13:14:17,767][140503] Updated weights for policy 0, policy_version 5876 (0.0006) [2023-08-17 13:14:18,084][140404] Fps is (10 sec: 51609.7, 60 sec: 53043.3, 300 sec: 50725.7). Total num frames: 24080384. Throughput: 0: 13237.0. Samples: 2410224. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:14:18,085][140404] Avg episode reward: [(0, '34.376')] [2023-08-17 13:14:18,086][140489] Saving new best policy, reward=34.376! [2023-08-17 13:14:18,589][140503] Updated weights for policy 0, policy_version 5886 (0.0007) [2023-08-17 13:14:19,411][140503] Updated weights for policy 0, policy_version 5896 (0.0007) [2023-08-17 13:14:20,241][140503] Updated weights for policy 0, policy_version 5906 (0.0007) [2023-08-17 13:14:21,027][140503] Updated weights for policy 0, policy_version 5916 (0.0007) [2023-08-17 13:14:21,848][140503] Updated weights for policy 0, policy_version 5926 (0.0007) [2023-08-17 13:14:22,671][140503] Updated weights for policy 0, policy_version 5936 (0.0007) [2023-08-17 13:14:23,084][140404] Fps is (10 sec: 51200.4, 60 sec: 52838.4, 300 sec: 50727.4). Total num frames: 24334336. Throughput: 0: 13194.0. Samples: 2448032. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:14:23,085][140404] Avg episode reward: [(0, '36.557')] [2023-08-17 13:14:23,089][140489] Saving new best policy, reward=36.557! [2023-08-17 13:14:23,488][140503] Updated weights for policy 0, policy_version 5946 (0.0007) [2023-08-17 13:14:24,271][140503] Updated weights for policy 0, policy_version 5956 (0.0007) [2023-08-17 13:14:25,063][140503] Updated weights for policy 0, policy_version 5966 (0.0007) [2023-08-17 13:14:25,908][140503] Updated weights for policy 0, policy_version 5976 (0.0007) [2023-08-17 13:14:26,701][140503] Updated weights for policy 0, policy_version 5986 (0.0006) [2023-08-17 13:14:27,454][140503] Updated weights for policy 0, policy_version 5996 (0.0006) [2023-08-17 13:14:28,084][140404] Fps is (10 sec: 51200.0, 60 sec: 52701.9, 300 sec: 50749.4). Total num frames: 24592384. Throughput: 0: 13084.4. Samples: 2524232. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:14:28,085][140404] Avg episode reward: [(0, '30.281')] [2023-08-17 13:14:28,236][140503] Updated weights for policy 0, policy_version 6006 (0.0006) [2023-08-17 13:14:29,036][140503] Updated weights for policy 0, policy_version 6016 (0.0006) [2023-08-17 13:14:29,829][140503] Updated weights for policy 0, policy_version 6026 (0.0007) [2023-08-17 13:14:30,615][140503] Updated weights for policy 0, policy_version 6036 (0.0007) [2023-08-17 13:14:31,431][140503] Updated weights for policy 0, policy_version 6046 (0.0007) [2023-08-17 13:14:32,243][140503] Updated weights for policy 0, policy_version 6056 (0.0007) [2023-08-17 13:14:33,039][140503] Updated weights for policy 0, policy_version 6066 (0.0006) [2023-08-17 13:14:33,084][140404] Fps is (10 sec: 51200.1, 60 sec: 52497.1, 300 sec: 50750.4). Total num frames: 24846336. Throughput: 0: 13033.6. Samples: 2601748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:14:33,085][140404] Avg episode reward: [(0, '29.723')] [2023-08-17 13:14:33,793][140503] Updated weights for policy 0, policy_version 6076 (0.0006) [2023-08-17 13:14:34,577][140503] Updated weights for policy 0, policy_version 6086 (0.0006) [2023-08-17 13:14:35,342][140503] Updated weights for policy 0, policy_version 6096 (0.0006) [2023-08-17 13:14:36,098][140503] Updated weights for policy 0, policy_version 6106 (0.0007) [2023-08-17 13:14:36,884][140503] Updated weights for policy 0, policy_version 6116 (0.0006) [2023-08-17 13:14:37,621][140503] Updated weights for policy 0, policy_version 6126 (0.0006) [2023-08-17 13:14:38,084][140404] Fps is (10 sec: 52018.7, 60 sec: 52428.7, 300 sec: 50809.9). Total num frames: 25112576. Throughput: 0: 13032.1. Samples: 2641400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:14:38,085][140404] Avg episode reward: [(0, '32.827')] [2023-08-17 13:14:38,452][140503] Updated weights for policy 0, policy_version 6136 (0.0007) [2023-08-17 13:14:39,200][140503] Updated weights for policy 0, policy_version 6146 (0.0006) [2023-08-17 13:14:39,950][140503] Updated weights for policy 0, policy_version 6156 (0.0006) [2023-08-17 13:14:40,719][140503] Updated weights for policy 0, policy_version 6166 (0.0006) [2023-08-17 13:14:41,515][140503] Updated weights for policy 0, policy_version 6176 (0.0007) [2023-08-17 13:14:42,330][140503] Updated weights for policy 0, policy_version 6186 (0.0007) [2023-08-17 13:14:43,072][140503] Updated weights for policy 0, policy_version 6196 (0.0007) [2023-08-17 13:14:43,084][140404] Fps is (10 sec: 53248.1, 60 sec: 52360.5, 300 sec: 50866.6). Total num frames: 25378816. Throughput: 0: 13019.4. Samples: 2720964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:14:43,085][140404] Avg episode reward: [(0, '35.421')] [2023-08-17 13:14:43,874][140503] Updated weights for policy 0, policy_version 6206 (0.0006) [2023-08-17 13:14:44,660][140503] Updated weights for policy 0, policy_version 6216 (0.0006) [2023-08-17 13:14:45,410][140503] Updated weights for policy 0, policy_version 6226 (0.0006) [2023-08-17 13:14:46,206][140503] Updated weights for policy 0, policy_version 6236 (0.0007) [2023-08-17 13:14:46,989][140503] Updated weights for policy 0, policy_version 6246 (0.0007) [2023-08-17 13:14:47,748][140503] Updated weights for policy 0, policy_version 6256 (0.0006) [2023-08-17 13:14:48,084][140404] Fps is (10 sec: 52838.9, 60 sec: 52360.6, 300 sec: 50902.1). Total num frames: 25640960. Throughput: 0: 13004.3. Samples: 2799768. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:14:48,085][140404] Avg episode reward: [(0, '34.623')] [2023-08-17 13:14:48,530][140503] Updated weights for policy 0, policy_version 6266 (0.0006) [2023-08-17 13:14:49,346][140503] Updated weights for policy 0, policy_version 6276 (0.0007) [2023-08-17 13:14:50,119][140503] Updated weights for policy 0, policy_version 6286 (0.0006) [2023-08-17 13:14:50,905][140503] Updated weights for policy 0, policy_version 6296 (0.0006) [2023-08-17 13:14:51,653][140503] Updated weights for policy 0, policy_version 6306 (0.0006) [2023-08-17 13:14:52,396][140503] Updated weights for policy 0, policy_version 6316 (0.0006) [2023-08-17 13:14:53,084][140404] Fps is (10 sec: 52428.7, 60 sec: 52223.9, 300 sec: 50936.0). Total num frames: 25903104. Throughput: 0: 13001.0. Samples: 2838988. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:14:53,085][140404] Avg episode reward: [(0, '32.644')] [2023-08-17 13:14:53,198][140503] Updated weights for policy 0, policy_version 6326 (0.0007) [2023-08-17 13:14:54,000][140503] Updated weights for policy 0, policy_version 6336 (0.0006) [2023-08-17 13:14:54,769][140503] Updated weights for policy 0, policy_version 6346 (0.0006) [2023-08-17 13:14:55,592][140503] Updated weights for policy 0, policy_version 6356 (0.0007) [2023-08-17 13:14:56,373][140503] Updated weights for policy 0, policy_version 6366 (0.0007) [2023-08-17 13:14:57,110][140503] Updated weights for policy 0, policy_version 6376 (0.0006) [2023-08-17 13:14:57,845][140503] Updated weights for policy 0, policy_version 6386 (0.0007) [2023-08-17 13:14:58,084][140404] Fps is (10 sec: 52428.4, 60 sec: 52155.7, 300 sec: 50968.5). Total num frames: 26165248. Throughput: 0: 13002.8. Samples: 2918108. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:14:58,085][140404] Avg episode reward: [(0, '34.363')] [2023-08-17 13:14:58,645][140503] Updated weights for policy 0, policy_version 6396 (0.0007) [2023-08-17 13:14:59,437][140503] Updated weights for policy 0, policy_version 6406 (0.0007) [2023-08-17 13:15:00,204][140503] Updated weights for policy 0, policy_version 6416 (0.0006) [2023-08-17 13:15:00,968][140503] Updated weights for policy 0, policy_version 6426 (0.0006) [2023-08-17 13:15:01,773][140503] Updated weights for policy 0, policy_version 6436 (0.0007) [2023-08-17 13:15:02,538][140503] Updated weights for policy 0, policy_version 6446 (0.0006) [2023-08-17 13:15:03,084][140404] Fps is (10 sec: 52838.5, 60 sec: 52224.0, 300 sec: 51017.0). Total num frames: 26431488. Throughput: 0: 13046.9. Samples: 2997336. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:15:03,085][140404] Avg episode reward: [(0, '33.538')] [2023-08-17 13:15:03,089][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006453_26431488.pth... 
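The save at 13:15:03 above, together with the removal that follows it, shows the learner's rolling checkpoint scheme: a new checkpoint_000006453_26431488.pth is written (the filename encodes the zero-padded policy version and the cumulative frame count), and a checkpoint from roughly 12M frames earlier is deleted, so only a bounded number of rolling checkpoints remains on disk while best-reward policies are saved separately. Below is a minimal sketch of that save-then-prune pattern, assuming a keep-the-newest-N policy; the function name, signature, and keep_n parameter are illustrative, not Sample Factory's actual API.

```python
import re
from pathlib import Path

import torch


def save_and_prune_checkpoint(model, checkpoint_dir, policy_version, num_frames, keep_n=2):
    """Save a rolling checkpoint and keep only the newest keep_n on disk.

    Hypothetical helper modeled on the save/remove pairs in this log;
    Sample Factory's real checkpointing code differs in detail.
    """
    ckpt_dir = Path(checkpoint_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)

    # Filename encodes the zero-padded policy version and total env frames,
    # e.g. checkpoint_000006453_26431488.pth
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{num_frames}.pth"
    torch.save(
        {"model": model.state_dict(),
         "policy_version": policy_version,
         "num_frames": num_frames},
        path,
    )

    # Zero-padding makes lexicographic order match version order, so a plain
    # sort suffices; drop everything but the newest keep_n rolling checkpoints.
    rolling = sorted(p for p in ckpt_dir.glob("checkpoint_*.pth")
                     if re.fullmatch(r"checkpoint_\d+_\d+\.pth", p.name))
    for old in rolling[:-keep_n]:
        old.unlink()
    return path
```

With keep_n=2 this matches the cadence visible in the log, where each "Saving ... .pth" entry is paired with one "Removing ..." of the oldest surviving rolling checkpoint.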
[2023-08-17 13:15:03,130][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003526_14442496.pth [2023-08-17 13:15:03,337][140503] Updated weights for policy 0, policy_version 6456 (0.0006) [2023-08-17 13:15:04,088][140503] Updated weights for policy 0, policy_version 6466 (0.0006) [2023-08-17 13:15:04,898][140503] Updated weights for policy 0, policy_version 6476 (0.0006) [2023-08-17 13:15:05,698][140503] Updated weights for policy 0, policy_version 6486 (0.0007) [2023-08-17 13:15:06,517][140503] Updated weights for policy 0, policy_version 6496 (0.0007) [2023-08-17 13:15:07,279][140503] Updated weights for policy 0, policy_version 6506 (0.0006) [2023-08-17 13:15:08,074][140503] Updated weights for policy 0, policy_version 6516 (0.0006) [2023-08-17 13:15:08,084][140404] Fps is (10 sec: 52429.0, 60 sec: 52087.4, 300 sec: 51029.3). Total num frames: 26689536. Throughput: 0: 13062.0. Samples: 3035820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:15:08,085][140404] Avg episode reward: [(0, '35.697')] [2023-08-17 13:15:08,892][140503] Updated weights for policy 0, policy_version 6526 (0.0007) [2023-08-17 13:15:09,693][140503] Updated weights for policy 0, policy_version 6536 (0.0007) [2023-08-17 13:15:10,519][140503] Updated weights for policy 0, policy_version 6546 (0.0007) [2023-08-17 13:15:11,304][140503] Updated weights for policy 0, policy_version 6556 (0.0007) [2023-08-17 13:15:12,097][140503] Updated weights for policy 0, policy_version 6566 (0.0007) [2023-08-17 13:15:12,919][140503] Updated weights for policy 0, policy_version 6576 (0.0007) [2023-08-17 13:15:13,084][140404] Fps is (10 sec: 51200.0, 60 sec: 52019.3, 300 sec: 51024.5). Total num frames: 26943488. Throughput: 0: 13079.6. Samples: 3112816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:15:13,085][140404] Avg episode reward: [(0, '29.390')] [2023-08-17 13:15:13,677][140503] Updated weights for policy 0, policy_version 6586 (0.0006) [2023-08-17 13:15:14,482][140503] Updated weights for policy 0, policy_version 6596 (0.0007) [2023-08-17 13:15:15,260][140503] Updated weights for policy 0, policy_version 6606 (0.0006) [2023-08-17 13:15:16,039][140503] Updated weights for policy 0, policy_version 6616 (0.0006) [2023-08-17 13:15:16,825][140503] Updated weights for policy 0, policy_version 6626 (0.0007) [2023-08-17 13:15:17,622][140503] Updated weights for policy 0, policy_version 6636 (0.0007) [2023-08-17 13:15:18,084][140404] Fps is (10 sec: 51200.0, 60 sec: 52019.2, 300 sec: 51036.2). Total num frames: 27201536. Throughput: 0: 13087.8. Samples: 3190700. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:15:18,085][140404] Avg episode reward: [(0, '32.401')] [2023-08-17 13:15:18,439][140503] Updated weights for policy 0, policy_version 6646 (0.0007) [2023-08-17 13:15:19,225][140503] Updated weights for policy 0, policy_version 6656 (0.0007) [2023-08-17 13:15:19,986][140503] Updated weights for policy 0, policy_version 6666 (0.0006) [2023-08-17 13:15:20,783][140503] Updated weights for policy 0, policy_version 6676 (0.0006) [2023-08-17 13:15:21,559][140503] Updated weights for policy 0, policy_version 6686 (0.0007) [2023-08-17 13:15:22,365][140503] Updated weights for policy 0, policy_version 6696 (0.0007) [2023-08-17 13:15:23,084][140404] Fps is (10 sec: 52019.2, 60 sec: 52155.8, 300 sec: 51063.5). Total num frames: 27463680. Throughput: 0: 13073.2. Samples: 3229692. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2023-08-17 13:15:23,085][140404] Avg episode reward: [(0, '32.669')] [2023-08-17 13:15:23,149][140503] Updated weights for policy 0, policy_version 6706 (0.0007) [2023-08-17 13:15:23,946][140503] Updated weights for policy 0, policy_version 6716 (0.0006) [2023-08-17 13:15:24,734][140503] Updated weights for policy 0, policy_version 6726 (0.0006) [2023-08-17 13:15:25,506][140503] Updated weights for policy 0, policy_version 6736 (0.0006) [2023-08-17 13:15:26,281][140503] Updated weights for policy 0, policy_version 6746 (0.0006) [2023-08-17 13:15:27,017][140503] Updated weights for policy 0, policy_version 6756 (0.0006) [2023-08-17 13:15:27,776][140503] Updated weights for policy 0, policy_version 6766 (0.0006) [2023-08-17 13:15:28,084][140404] Fps is (10 sec: 52428.9, 60 sec: 52224.0, 300 sec: 51089.7). Total num frames: 27725824. Throughput: 0: 13055.6. Samples: 3308468. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:15:28,085][140404] Avg episode reward: [(0, '33.851')] [2023-08-17 13:15:28,531][140503] Updated weights for policy 0, policy_version 6776 (0.0006) [2023-08-17 13:15:29,329][140503] Updated weights for policy 0, policy_version 6786 (0.0007) [2023-08-17 13:15:30,136][140503] Updated weights for policy 0, policy_version 6796 (0.0007) [2023-08-17 13:15:30,888][140503] Updated weights for policy 0, policy_version 6806 (0.0006) [2023-08-17 13:15:31,686][140503] Updated weights for policy 0, policy_version 6816 (0.0006) [2023-08-17 13:15:32,475][140503] Updated weights for policy 0, policy_version 6826 (0.0007) [2023-08-17 13:15:33,084][140404] Fps is (10 sec: 52019.2, 60 sec: 52292.3, 300 sec: 51099.5). Total num frames: 27983872. Throughput: 0: 13037.5. Samples: 3386456. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:15:33,085][140404] Avg episode reward: [(0, '31.815')] [2023-08-17 13:15:33,316][140503] Updated weights for policy 0, policy_version 6836 (0.0007) [2023-08-17 13:15:34,066][140503] Updated weights for policy 0, policy_version 6846 (0.0006) [2023-08-17 13:15:34,843][140503] Updated weights for policy 0, policy_version 6856 (0.0007) [2023-08-17 13:15:35,609][140503] Updated weights for policy 0, policy_version 6866 (0.0006) [2023-08-17 13:15:36,328][140503] Updated weights for policy 0, policy_version 6876 (0.0006) [2023-08-17 13:15:37,102][140503] Updated weights for policy 0, policy_version 6886 (0.0006) [2023-08-17 13:15:37,882][140503] Updated weights for policy 0, policy_version 6896 (0.0006) [2023-08-17 13:15:38,084][140404] Fps is (10 sec: 53247.9, 60 sec: 52428.8, 300 sec: 51169.7). Total num frames: 28258304. Throughput: 0: 13063.6. Samples: 3426848. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:15:38,085][140404] Avg episode reward: [(0, '33.998')] [2023-08-17 13:15:38,653][140503] Updated weights for policy 0, policy_version 6906 (0.0006) [2023-08-17 13:15:39,397][140503] Updated weights for policy 0, policy_version 6916 (0.0006) [2023-08-17 13:15:40,154][140503] Updated weights for policy 0, policy_version 6926 (0.0006) [2023-08-17 13:15:40,925][140503] Updated weights for policy 0, policy_version 6936 (0.0006) [2023-08-17 13:15:41,663][140503] Updated weights for policy 0, policy_version 6946 (0.0007) [2023-08-17 13:15:42,433][140503] Updated weights for policy 0, policy_version 6956 (0.0006) [2023-08-17 13:15:43,084][140404] Fps is (10 sec: 54067.4, 60 sec: 52428.8, 300 sec: 51207.4). Total num frames: 28524544. Throughput: 0: 13098.4. Samples: 3507536. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:15:43,085][140404] Avg episode reward: [(0, '37.333')] [2023-08-17 13:15:43,089][140489] Saving new best policy, reward=37.333! [2023-08-17 13:15:43,213][140503] Updated weights for policy 0, policy_version 6966 (0.0007) [2023-08-17 13:15:43,982][140503] Updated weights for policy 0, policy_version 6976 (0.0006) [2023-08-17 13:15:44,788][140503] Updated weights for policy 0, policy_version 6986 (0.0006) [2023-08-17 13:15:45,551][140503] Updated weights for policy 0, policy_version 6996 (0.0007) [2023-08-17 13:15:46,317][140503] Updated weights for policy 0, policy_version 7006 (0.0007) [2023-08-17 13:15:47,104][140503] Updated weights for policy 0, policy_version 7016 (0.0007) [2023-08-17 13:15:47,860][140503] Updated weights for policy 0, policy_version 7026 (0.0006) [2023-08-17 13:15:48,084][140404] Fps is (10 sec: 53247.8, 60 sec: 52497.0, 300 sec: 51243.9). Total num frames: 28790784. Throughput: 0: 13114.7. Samples: 3587496. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:15:48,085][140404] Avg episode reward: [(0, '33.937')] [2023-08-17 13:15:48,631][140503] Updated weights for policy 0, policy_version 7036 (0.0006) [2023-08-17 13:15:49,388][140503] Updated weights for policy 0, policy_version 7046 (0.0007) [2023-08-17 13:15:50,142][140503] Updated weights for policy 0, policy_version 7056 (0.0006) [2023-08-17 13:15:50,892][140503] Updated weights for policy 0, policy_version 7066 (0.0006) [2023-08-17 13:15:51,656][140503] Updated weights for policy 0, policy_version 7076 (0.0007) [2023-08-17 13:15:52,430][140503] Updated weights for policy 0, policy_version 7086 (0.0007) [2023-08-17 13:15:53,084][140404] Fps is (10 sec: 53248.0, 60 sec: 52565.4, 300 sec: 51279.0). Total num frames: 29057024. Throughput: 0: 13147.3. Samples: 3627448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:15:53,085][140404] Avg episode reward: [(0, '38.580')] [2023-08-17 13:15:53,088][140489] Saving new best policy, reward=38.580! [2023-08-17 13:15:53,189][140503] Updated weights for policy 0, policy_version 7096 (0.0006) [2023-08-17 13:15:53,984][140503] Updated weights for policy 0, policy_version 7106 (0.0006) [2023-08-17 13:15:54,729][140503] Updated weights for policy 0, policy_version 7116 (0.0006) [2023-08-17 13:15:55,492][140503] Updated weights for policy 0, policy_version 7126 (0.0006) [2023-08-17 13:15:56,241][140503] Updated weights for policy 0, policy_version 7136 (0.0007) [2023-08-17 13:15:57,004][140503] Updated weights for policy 0, policy_version 7146 (0.0006) [2023-08-17 13:15:57,758][140503] Updated weights for policy 0, policy_version 7156 (0.0007) [2023-08-17 13:15:58,084][140404] Fps is (10 sec: 53248.0, 60 sec: 52633.6, 300 sec: 51313.0). Total num frames: 29323264. Throughput: 0: 13224.4. Samples: 3707916. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:15:58,085][140404] Avg episode reward: [(0, '32.351')] [2023-08-17 13:15:58,548][140503] Updated weights for policy 0, policy_version 7166 (0.0007) [2023-08-17 13:15:59,342][140503] Updated weights for policy 0, policy_version 7176 (0.0007) [2023-08-17 13:16:00,138][140503] Updated weights for policy 0, policy_version 7186 (0.0006) [2023-08-17 13:16:00,899][140503] Updated weights for policy 0, policy_version 7196 (0.0007) [2023-08-17 13:16:01,667][140503] Updated weights for policy 0, policy_version 7206 (0.0006) [2023-08-17 13:16:02,423][140503] Updated weights for policy 0, policy_version 7216 (0.0006) [2023-08-17 13:16:03,084][140404] Fps is (10 sec: 53247.8, 60 sec: 52633.6, 300 sec: 51345.8). Total num frames: 29589504. Throughput: 0: 13262.7. Samples: 3787520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:16:03,085][140404] Avg episode reward: [(0, '32.131')] [2023-08-17 13:16:03,167][140503] Updated weights for policy 0, policy_version 7226 (0.0006) [2023-08-17 13:16:03,975][140503] Updated weights for policy 0, policy_version 7236 (0.0007) [2023-08-17 13:16:04,716][140503] Updated weights for policy 0, policy_version 7246 (0.0006) [2023-08-17 13:16:05,491][140503] Updated weights for policy 0, policy_version 7256 (0.0006) [2023-08-17 13:16:06,243][140503] Updated weights for policy 0, policy_version 7266 (0.0006) [2023-08-17 13:16:07,045][140503] Updated weights for policy 0, policy_version 7276 (0.0007) [2023-08-17 13:16:07,769][140503] Updated weights for policy 0, policy_version 7286 (0.0006) [2023-08-17 13:16:08,084][140404] Fps is (10 sec: 53657.9, 60 sec: 52838.4, 300 sec: 52248.3). Total num frames: 29859840. Throughput: 0: 13296.2. Samples: 3828020. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:16:08,085][140404] Avg episode reward: [(0, '32.595')] [2023-08-17 13:16:08,527][140503] Updated weights for policy 0, policy_version 7296 (0.0007) [2023-08-17 13:16:09,310][140503] Updated weights for policy 0, policy_version 7306 (0.0007) [2023-08-17 13:16:10,092][140503] Updated weights for policy 0, policy_version 7316 (0.0006) [2023-08-17 13:16:10,851][140503] Updated weights for policy 0, policy_version 7326 (0.0006) [2023-08-17 13:16:11,597][140503] Updated weights for policy 0, policy_version 7336 (0.0006) [2023-08-17 13:16:12,300][140503] Updated weights for policy 0, policy_version 7346 (0.0005) [2023-08-17 13:16:13,076][140503] Updated weights for policy 0, policy_version 7356 (0.0006) [2023-08-17 13:16:13,084][140404] Fps is (10 sec: 54067.3, 60 sec: 53111.5, 300 sec: 52401.0). Total num frames: 30130176. Throughput: 0: 13339.0. Samples: 3908724. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:16:13,085][140404] Avg episode reward: [(0, '35.262')] [2023-08-17 13:16:13,841][140503] Updated weights for policy 0, policy_version 7366 (0.0007) [2023-08-17 13:16:14,587][140503] Updated weights for policy 0, policy_version 7376 (0.0007) [2023-08-17 13:16:15,333][140503] Updated weights for policy 0, policy_version 7386 (0.0007) [2023-08-17 13:16:16,077][140503] Updated weights for policy 0, policy_version 7396 (0.0006) [2023-08-17 13:16:16,860][140503] Updated weights for policy 0, policy_version 7406 (0.0006) [2023-08-17 13:16:17,629][140503] Updated weights for policy 0, policy_version 7416 (0.0006) [2023-08-17 13:16:18,084][140404] Fps is (10 sec: 54066.8, 60 sec: 53316.2, 300 sec: 52442.7). Total num frames: 30400512. Throughput: 0: 13404.8. Samples: 3989672. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:16:18,085][140404] Avg episode reward: [(0, '34.075')] [2023-08-17 13:16:18,374][140503] Updated weights for policy 0, policy_version 7426 (0.0006) [2023-08-17 13:16:19,113][140503] Updated weights for policy 0, policy_version 7436 (0.0006) [2023-08-17 13:16:19,886][140503] Updated weights for policy 0, policy_version 7446 (0.0006) [2023-08-17 13:16:20,630][140503] Updated weights for policy 0, policy_version 7456 (0.0006) [2023-08-17 13:16:21,404][140503] Updated weights for policy 0, policy_version 7466 (0.0007) [2023-08-17 13:16:22,159][140503] Updated weights for policy 0, policy_version 7476 (0.0007) [2023-08-17 13:16:22,921][140503] Updated weights for policy 0, policy_version 7486 (0.0006) [2023-08-17 13:16:23,084][140404] Fps is (10 sec: 54067.2, 60 sec: 53452.8, 300 sec: 52512.1). Total num frames: 30670848. Throughput: 0: 13410.8. Samples: 4030336. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:16:23,085][140404] Avg episode reward: [(0, '33.022')] [2023-08-17 13:16:23,671][140503] Updated weights for policy 0, policy_version 7496 (0.0006) [2023-08-17 13:16:24,451][140503] Updated weights for policy 0, policy_version 7506 (0.0006) [2023-08-17 13:16:25,206][140503] Updated weights for policy 0, policy_version 7516 (0.0006) [2023-08-17 13:16:25,952][140503] Updated weights for policy 0, policy_version 7526 (0.0006) [2023-08-17 13:16:26,723][140503] Updated weights for policy 0, policy_version 7536 (0.0006) [2023-08-17 13:16:27,487][140503] Updated weights for policy 0, policy_version 7546 (0.0007) [2023-08-17 13:16:28,084][140404] Fps is (10 sec: 53658.1, 60 sec: 53521.1, 300 sec: 52581.5). Total num frames: 30937088. Throughput: 0: 13413.3. Samples: 4111136. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:16:28,085][140404] Avg episode reward: [(0, '33.851')] [2023-08-17 13:16:28,279][140503] Updated weights for policy 0, policy_version 7556 (0.0007) [2023-08-17 13:16:29,039][140503] Updated weights for policy 0, policy_version 7566 (0.0006) [2023-08-17 13:16:29,798][140503] Updated weights for policy 0, policy_version 7576 (0.0006) [2023-08-17 13:16:30,534][140503] Updated weights for policy 0, policy_version 7586 (0.0007) [2023-08-17 13:16:31,303][140503] Updated weights for policy 0, policy_version 7596 (0.0006) [2023-08-17 13:16:32,034][140503] Updated weights for policy 0, policy_version 7606 (0.0006) [2023-08-17 13:16:32,783][140503] Updated weights for policy 0, policy_version 7616 (0.0006) [2023-08-17 13:16:33,084][140404] Fps is (10 sec: 53657.4, 60 sec: 53725.8, 300 sec: 52650.9). Total num frames: 31207424. Throughput: 0: 13441.8. Samples: 4192376. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:16:33,085][140404] Avg episode reward: [(0, '34.512')] [2023-08-17 13:16:33,527][140503] Updated weights for policy 0, policy_version 7626 (0.0006) [2023-08-17 13:16:34,309][140503] Updated weights for policy 0, policy_version 7636 (0.0007) [2023-08-17 13:16:35,083][140503] Updated weights for policy 0, policy_version 7646 (0.0007) [2023-08-17 13:16:35,840][140503] Updated weights for policy 0, policy_version 7656 (0.0007) [2023-08-17 13:16:36,655][140503] Updated weights for policy 0, policy_version 7666 (0.0006) [2023-08-17 13:16:37,442][140503] Updated weights for policy 0, policy_version 7676 (0.0007) [2023-08-17 13:16:38,084][140404] Fps is (10 sec: 53657.0, 60 sec: 53589.3, 300 sec: 52706.5). Total num frames: 31473664. Throughput: 0: 13433.8. Samples: 4231972. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:16:38,085][140404] Avg episode reward: [(0, '38.272')] [2023-08-17 13:16:38,222][140503] Updated weights for policy 0, policy_version 7686 (0.0007) [2023-08-17 13:16:38,995][140503] Updated weights for policy 0, policy_version 7696 (0.0007) [2023-08-17 13:16:39,767][140503] Updated weights for policy 0, policy_version 7706 (0.0007) [2023-08-17 13:16:40,509][140503] Updated weights for policy 0, policy_version 7716 (0.0006) [2023-08-17 13:16:41,308][140503] Updated weights for policy 0, policy_version 7726 (0.0007) [2023-08-17 13:16:42,045][140503] Updated weights for policy 0, policy_version 7736 (0.0007) [2023-08-17 13:16:42,814][140503] Updated weights for policy 0, policy_version 7746 (0.0006) [2023-08-17 13:16:43,084][140404] Fps is (10 sec: 53247.1, 60 sec: 53589.1, 300 sec: 52734.2). Total num frames: 31739904. Throughput: 0: 13413.3. Samples: 4311516. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:16:43,085][140404] Avg episode reward: [(0, '38.865')] [2023-08-17 13:16:43,100][140489] Saving new best policy, reward=38.865! [2023-08-17 13:16:43,546][140503] Updated weights for policy 0, policy_version 7756 (0.0006) [2023-08-17 13:16:44,320][140503] Updated weights for policy 0, policy_version 7766 (0.0007) [2023-08-17 13:16:45,113][140503] Updated weights for policy 0, policy_version 7776 (0.0007) [2023-08-17 13:16:45,912][140503] Updated weights for policy 0, policy_version 7786 (0.0007) [2023-08-17 13:16:46,669][140503] Updated weights for policy 0, policy_version 7796 (0.0007) [2023-08-17 13:16:47,468][140503] Updated weights for policy 0, policy_version 7806 (0.0007) [2023-08-17 13:16:48,084][140404] Fps is (10 sec: 53248.7, 60 sec: 53589.4, 300 sec: 52803.7). Total num frames: 32006144. Throughput: 0: 13409.4. Samples: 4390944. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:16:48,085][140404] Avg episode reward: [(0, '35.039')] [2023-08-17 13:16:48,226][140503] Updated weights for policy 0, policy_version 7816 (0.0006) [2023-08-17 13:16:48,953][140503] Updated weights for policy 0, policy_version 7826 (0.0006) [2023-08-17 13:16:49,722][140503] Updated weights for policy 0, policy_version 7836 (0.0006) [2023-08-17 13:16:50,524][140503] Updated weights for policy 0, policy_version 7846 (0.0006) [2023-08-17 13:16:51,240][140503] Updated weights for policy 0, policy_version 7856 (0.0006) [2023-08-17 13:16:51,987][140503] Updated weights for policy 0, policy_version 7866 (0.0005) [2023-08-17 13:16:52,755][140503] Updated weights for policy 0, policy_version 7876 (0.0006) [2023-08-17 13:16:53,084][140404] Fps is (10 sec: 53658.7, 60 sec: 53657.6, 300 sec: 52873.1). Total num frames: 32276480. Throughput: 0: 13413.2. Samples: 4431616. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:16:53,085][140404] Avg episode reward: [(0, '35.657')] [2023-08-17 13:16:53,509][140503] Updated weights for policy 0, policy_version 7886 (0.0007) [2023-08-17 13:16:54,290][140503] Updated weights for policy 0, policy_version 7896 (0.0006) [2023-08-17 13:16:55,065][140503] Updated weights for policy 0, policy_version 7906 (0.0006) [2023-08-17 13:16:55,842][140503] Updated weights for policy 0, policy_version 7916 (0.0006) [2023-08-17 13:16:56,623][140503] Updated weights for policy 0, policy_version 7926 (0.0007) [2023-08-17 13:16:57,391][140503] Updated weights for policy 0, policy_version 7936 (0.0006) [2023-08-17 13:16:58,084][140404] Fps is (10 sec: 53657.3, 60 sec: 53657.6, 300 sec: 52928.7). 
Total num frames: 32542720. Throughput: 0: 13408.7. Samples: 4512116. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:16:58,085][140404] Avg episode reward: [(0, '34.604')] [2023-08-17 13:16:58,157][140503] Updated weights for policy 0, policy_version 7946 (0.0007) [2023-08-17 13:16:58,895][140503] Updated weights for policy 0, policy_version 7956 (0.0006) [2023-08-17 13:16:59,654][140503] Updated weights for policy 0, policy_version 7966 (0.0006) [2023-08-17 13:17:00,400][140503] Updated weights for policy 0, policy_version 7976 (0.0007) [2023-08-17 13:17:01,161][140503] Updated weights for policy 0, policy_version 7986 (0.0007) [2023-08-17 13:17:01,933][140503] Updated weights for policy 0, policy_version 7996 (0.0007) [2023-08-17 13:17:02,708][140503] Updated weights for policy 0, policy_version 8006 (0.0007) [2023-08-17 13:17:03,084][140404] Fps is (10 sec: 53247.5, 60 sec: 53657.5, 300 sec: 52984.2). Total num frames: 32808960. Throughput: 0: 13396.8. Samples: 4592528. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:17:03,085][140404] Avg episode reward: [(0, '36.751')] [2023-08-17 13:17:03,088][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008010_32808960.pth... [2023-08-17 13:17:03,129][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004904_20086784.pth [2023-08-17 13:17:03,466][140503] Updated weights for policy 0, policy_version 8016 (0.0007) [2023-08-17 13:17:04,250][140503] Updated weights for policy 0, policy_version 8026 (0.0006) [2023-08-17 13:17:05,028][140503] Updated weights for policy 0, policy_version 8036 (0.0007) [2023-08-17 13:17:05,788][140503] Updated weights for policy 0, policy_version 8046 (0.0006) [2023-08-17 13:17:06,555][140503] Updated weights for policy 0, policy_version 8056 (0.0006) [2023-08-17 13:17:07,337][140503] Updated weights for policy 0, policy_version 8066 (0.0007) [2023-08-17 13:17:08,084][140404] Fps is (10 sec: 53248.2, 60 sec: 53589.3, 300 sec: 53039.7). Total num frames: 33075200. Throughput: 0: 13376.2. Samples: 4632264. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:17:08,085][140404] Avg episode reward: [(0, '35.276')] [2023-08-17 13:17:08,121][140503] Updated weights for policy 0, policy_version 8076 (0.0007) [2023-08-17 13:17:08,885][140503] Updated weights for policy 0, policy_version 8086 (0.0007) [2023-08-17 13:17:09,659][140503] Updated weights for policy 0, policy_version 8096 (0.0006) [2023-08-17 13:17:10,413][140503] Updated weights for policy 0, policy_version 8106 (0.0006) [2023-08-17 13:17:11,147][140503] Updated weights for policy 0, policy_version 8116 (0.0007) [2023-08-17 13:17:11,900][140503] Updated weights for policy 0, policy_version 8126 (0.0006) [2023-08-17 13:17:12,669][140503] Updated weights for policy 0, policy_version 8136 (0.0006) [2023-08-17 13:17:13,084][140404] Fps is (10 sec: 53658.2, 60 sec: 53589.4, 300 sec: 53095.3). Total num frames: 33345536. Throughput: 0: 13369.8. Samples: 4712776. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:17:13,085][140404] Avg episode reward: [(0, '38.402')] [2023-08-17 13:17:13,444][140503] Updated weights for policy 0, policy_version 8146 (0.0006) [2023-08-17 13:17:14,181][140503] Updated weights for policy 0, policy_version 8156 (0.0006) [2023-08-17 13:17:14,949][140503] Updated weights for policy 0, policy_version 8166 (0.0006) [2023-08-17 13:17:15,686][140503] Updated weights for policy 0, policy_version 8176 (0.0006) [2023-08-17 13:17:16,438][140503] Updated weights for policy 0, policy_version 8186 (0.0006) [2023-08-17 13:17:17,180][140503] Updated weights for policy 0, policy_version 8196 (0.0006) [2023-08-17 13:17:17,949][140503] Updated weights for policy 0, policy_version 8206 (0.0007) [2023-08-17 13:17:18,084][140404] Fps is (10 sec: 54066.8, 60 sec: 53589.3, 300 sec: 53095.3). Total num frames: 33615872. Throughput: 0: 13377.0. Samples: 4794340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:17:18,085][140404] Avg episode reward: [(0, '37.372')] [2023-08-17 13:17:18,697][140503] Updated weights for policy 0, policy_version 8216 (0.0006) [2023-08-17 13:17:19,443][140503] Updated weights for policy 0, policy_version 8226 (0.0006) [2023-08-17 13:17:20,198][140503] Updated weights for policy 0, policy_version 8236 (0.0007) [2023-08-17 13:17:20,965][140503] Updated weights for policy 0, policy_version 8246 (0.0006) [2023-08-17 13:17:21,727][140503] Updated weights for policy 0, policy_version 8256 (0.0007) [2023-08-17 13:17:22,481][140503] Updated weights for policy 0, policy_version 8266 (0.0006) [2023-08-17 13:17:23,084][140404] Fps is (10 sec: 54066.9, 60 sec: 53589.3, 300 sec: 53109.2). Total num frames: 33886208. Throughput: 0: 13398.4. Samples: 4834900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:17:23,085][140404] Avg episode reward: [(0, '32.837')] [2023-08-17 13:17:23,283][140503] Updated weights for policy 0, policy_version 8276 (0.0006) [2023-08-17 13:17:24,032][140503] Updated weights for policy 0, policy_version 8286 (0.0007) [2023-08-17 13:17:24,797][140503] Updated weights for policy 0, policy_version 8296 (0.0006) [2023-08-17 13:17:25,576][140503] Updated weights for policy 0, policy_version 8306 (0.0006) [2023-08-17 13:17:26,309][140503] Updated weights for policy 0, policy_version 8316 (0.0006) [2023-08-17 13:17:27,062][140503] Updated weights for policy 0, policy_version 8326 (0.0006) [2023-08-17 13:17:27,806][140503] Updated weights for policy 0, policy_version 8336 (0.0006) [2023-08-17 13:17:28,084][140404] Fps is (10 sec: 54067.3, 60 sec: 53657.5, 300 sec: 53123.0). Total num frames: 34156544. Throughput: 0: 13425.2. Samples: 4915648. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:17:28,085][140404] Avg episode reward: [(0, '37.146')] [2023-08-17 13:17:28,595][140503] Updated weights for policy 0, policy_version 8346 (0.0006) [2023-08-17 13:17:29,346][140503] Updated weights for policy 0, policy_version 8356 (0.0007) [2023-08-17 13:17:30,097][140503] Updated weights for policy 0, policy_version 8366 (0.0006) [2023-08-17 13:17:30,864][140503] Updated weights for policy 0, policy_version 8376 (0.0007) [2023-08-17 13:17:31,643][140503] Updated weights for policy 0, policy_version 8386 (0.0007) [2023-08-17 13:17:32,388][140503] Updated weights for policy 0, policy_version 8396 (0.0006) [2023-08-17 13:17:33,084][140404] Fps is (10 sec: 53658.0, 60 sec: 53589.4, 300 sec: 53136.9). Total num frames: 34422784. Throughput: 0: 13449.6. Samples: 4996176. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:17:33,085][140404] Avg episode reward: [(0, '36.424')] [2023-08-17 13:17:33,146][140503] Updated weights for policy 0, policy_version 8406 (0.0006) [2023-08-17 13:17:33,891][140503] Updated weights for policy 0, policy_version 8416 (0.0006) [2023-08-17 13:17:34,656][140503] Updated weights for policy 0, policy_version 8426 (0.0006) [2023-08-17 13:17:35,404][140503] Updated weights for policy 0, policy_version 8436 (0.0006) [2023-08-17 13:17:36,165][140503] Updated weights for policy 0, policy_version 8446 (0.0006) [2023-08-17 13:17:36,938][140503] Updated weights for policy 0, policy_version 8456 (0.0007) [2023-08-17 13:17:37,676][140503] Updated weights for policy 0, policy_version 8466 (0.0006) [2023-08-17 13:17:38,084][140404] Fps is (10 sec: 54067.2, 60 sec: 53725.9, 300 sec: 53178.6). Total num frames: 34697216. Throughput: 0: 13455.5. Samples: 5037112. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:17:38,085][140404] Avg episode reward: [(0, '35.741')] [2023-08-17 13:17:38,428][140503] Updated weights for policy 0, policy_version 8476 (0.0006) [2023-08-17 13:17:39,188][140503] Updated weights for policy 0, policy_version 8486 (0.0006) [2023-08-17 13:17:39,939][140503] Updated weights for policy 0, policy_version 8496 (0.0006) [2023-08-17 13:17:40,693][140503] Updated weights for policy 0, policy_version 8506 (0.0006) [2023-08-17 13:17:41,456][140503] Updated weights for policy 0, policy_version 8516 (0.0006) [2023-08-17 13:17:42,225][140503] Updated weights for policy 0, policy_version 8526 (0.0006) [2023-08-17 13:17:43,003][140503] Updated weights for policy 0, policy_version 8536 (0.0007) [2023-08-17 13:17:43,084][140404] Fps is (10 sec: 54476.5, 60 sec: 53794.3, 300 sec: 53178.6). Total num frames: 34967552. Throughput: 0: 13470.1. Samples: 5118272. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:17:43,085][140404] Avg episode reward: [(0, '38.042')] [2023-08-17 13:17:43,786][140503] Updated weights for policy 0, policy_version 8546 (0.0007) [2023-08-17 13:17:44,567][140503] Updated weights for policy 0, policy_version 8556 (0.0007) [2023-08-17 13:17:45,336][140503] Updated weights for policy 0, policy_version 8566 (0.0007) [2023-08-17 13:17:46,087][140503] Updated weights for policy 0, policy_version 8576 (0.0007) [2023-08-17 13:17:46,850][140503] Updated weights for policy 0, policy_version 8586 (0.0007) [2023-08-17 13:17:47,660][140503] Updated weights for policy 0, policy_version 8596 (0.0007) [2023-08-17 13:17:48,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53725.8, 300 sec: 53164.7). Total num frames: 35229696. Throughput: 0: 13442.4. Samples: 5197436. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:17:48,085][140404] Avg episode reward: [(0, '35.640')] [2023-08-17 13:17:48,422][140503] Updated weights for policy 0, policy_version 8606 (0.0006) [2023-08-17 13:17:49,167][140503] Updated weights for policy 0, policy_version 8616 (0.0006) [2023-08-17 13:17:49,986][140503] Updated weights for policy 0, policy_version 8626 (0.0007) [2023-08-17 13:17:50,736][140503] Updated weights for policy 0, policy_version 8636 (0.0007) [2023-08-17 13:17:51,560][140503] Updated weights for policy 0, policy_version 8646 (0.0006) [2023-08-17 13:17:52,294][140503] Updated weights for policy 0, policy_version 8656 (0.0006) [2023-08-17 13:17:53,057][140503] Updated weights for policy 0, policy_version 8666 (0.0006) [2023-08-17 13:17:53,084][140404] Fps is (10 sec: 52838.0, 60 sec: 53657.5, 300 sec: 53150.8). Total num frames: 35495936. Throughput: 0: 13440.1. Samples: 5237072. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:17:53,085][140404] Avg episode reward: [(0, '38.603')] [2023-08-17 13:17:53,802][140503] Updated weights for policy 0, policy_version 8676 (0.0007) [2023-08-17 13:17:54,562][140503] Updated weights for policy 0, policy_version 8686 (0.0006) [2023-08-17 13:17:55,306][140503] Updated weights for policy 0, policy_version 8696 (0.0007) [2023-08-17 13:17:56,099][140503] Updated weights for policy 0, policy_version 8706 (0.0007) [2023-08-17 13:17:56,900][140503] Updated weights for policy 0, policy_version 8716 (0.0007) [2023-08-17 13:17:57,687][140503] Updated weights for policy 0, policy_version 8726 (0.0007) [2023-08-17 13:17:58,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53657.6, 300 sec: 53136.9). Total num frames: 35762176. Throughput: 0: 13426.4. Samples: 5316964. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:17:58,085][140404] Avg episode reward: [(0, '36.497')] [2023-08-17 13:17:58,478][140503] Updated weights for policy 0, policy_version 8736 (0.0007) [2023-08-17 13:17:59,249][140503] Updated weights for policy 0, policy_version 8746 (0.0006) [2023-08-17 13:17:59,996][140503] Updated weights for policy 0, policy_version 8756 (0.0007) [2023-08-17 13:18:00,772][140503] Updated weights for policy 0, policy_version 8766 (0.0007) [2023-08-17 13:18:01,562][140503] Updated weights for policy 0, policy_version 8776 (0.0007) [2023-08-17 13:18:02,359][140503] Updated weights for policy 0, policy_version 8786 (0.0006) [2023-08-17 13:18:03,084][140404] Fps is (10 sec: 52838.6, 60 sec: 53589.3, 300 sec: 53109.1). Total num frames: 36024320. Throughput: 0: 13377.1. Samples: 5396312. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:18:03,085][140404] Avg episode reward: [(0, '35.791')] [2023-08-17 13:18:03,112][140503] Updated weights for policy 0, policy_version 8796 (0.0006) [2023-08-17 13:18:03,861][140503] Updated weights for policy 0, policy_version 8806 (0.0006) [2023-08-17 13:18:04,624][140503] Updated weights for policy 0, policy_version 8816 (0.0006) [2023-08-17 13:18:05,382][140503] Updated weights for policy 0, policy_version 8826 (0.0006) [2023-08-17 13:18:06,110][140503] Updated weights for policy 0, policy_version 8836 (0.0006) [2023-08-17 13:18:06,837][140503] Updated weights for policy 0, policy_version 8846 (0.0006) [2023-08-17 13:18:07,607][140503] Updated weights for policy 0, policy_version 8856 (0.0006) [2023-08-17 13:18:08,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53725.9, 300 sec: 53123.1). Total num frames: 36298752. Throughput: 0: 13384.6. Samples: 5437204. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:18:08,085][140404] Avg episode reward: [(0, '34.323')] [2023-08-17 13:18:08,383][140503] Updated weights for policy 0, policy_version 8866 (0.0007) [2023-08-17 13:18:09,156][140503] Updated weights for policy 0, policy_version 8876 (0.0006) [2023-08-17 13:18:09,926][140503] Updated weights for policy 0, policy_version 8886 (0.0006) [2023-08-17 13:18:10,684][140503] Updated weights for policy 0, policy_version 8896 (0.0006) [2023-08-17 13:18:11,471][140503] Updated weights for policy 0, policy_version 8906 (0.0007) [2023-08-17 13:18:12,247][140503] Updated weights for policy 0, policy_version 8916 (0.0007) [2023-08-17 13:18:13,012][140503] Updated weights for policy 0, policy_version 8926 (0.0006) [2023-08-17 13:18:13,084][140404] Fps is (10 sec: 53658.0, 60 sec: 53589.3, 300 sec: 53095.3). Total num frames: 36560896. Throughput: 0: 13368.5. Samples: 5517232. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:18:13,085][140404] Avg episode reward: [(0, '39.071')] [2023-08-17 13:18:13,088][140489] Saving new best policy, reward=39.071! [2023-08-17 13:18:13,775][140503] Updated weights for policy 0, policy_version 8936 (0.0007) [2023-08-17 13:18:14,574][140503] Updated weights for policy 0, policy_version 8946 (0.0006) [2023-08-17 13:18:15,328][140503] Updated weights for policy 0, policy_version 8956 (0.0006) [2023-08-17 13:18:16,084][140503] Updated weights for policy 0, policy_version 8966 (0.0007) [2023-08-17 13:18:16,859][140503] Updated weights for policy 0, policy_version 8976 (0.0006) [2023-08-17 13:18:17,615][140503] Updated weights for policy 0, policy_version 8986 (0.0006) [2023-08-17 13:18:18,084][140404] Fps is (10 sec: 53248.1, 60 sec: 53589.4, 300 sec: 53109.2). Total num frames: 36831232. Throughput: 0: 13360.9. Samples: 5597416. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:18:18,085][140404] Avg episode reward: [(0, '40.865')] [2023-08-17 13:18:18,086][140489] Saving new best policy, reward=40.865! [2023-08-17 13:18:18,386][140503] Updated weights for policy 0, policy_version 8996 (0.0006) [2023-08-17 13:18:19,125][140503] Updated weights for policy 0, policy_version 9006 (0.0006) [2023-08-17 13:18:19,902][140503] Updated weights for policy 0, policy_version 9016 (0.0006) [2023-08-17 13:18:20,668][140503] Updated weights for policy 0, policy_version 9026 (0.0006) [2023-08-17 13:18:21,423][140503] Updated weights for policy 0, policy_version 9036 (0.0006) [2023-08-17 13:18:22,180][140503] Updated weights for policy 0, policy_version 9046 (0.0006) [2023-08-17 13:18:22,981][140503] Updated weights for policy 0, policy_version 9056 (0.0006) [2023-08-17 13:18:23,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53521.1, 300 sec: 53109.2). Total num frames: 37097472. Throughput: 0: 13337.9. Samples: 5637316. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:18:23,085][140404] Avg episode reward: [(0, '36.983')] [2023-08-17 13:18:23,747][140503] Updated weights for policy 0, policy_version 9066 (0.0007) [2023-08-17 13:18:24,528][140503] Updated weights for policy 0, policy_version 9076 (0.0007) [2023-08-17 13:18:25,272][140503] Updated weights for policy 0, policy_version 9086 (0.0007) [2023-08-17 13:18:26,021][140503] Updated weights for policy 0, policy_version 9096 (0.0007) [2023-08-17 13:18:26,789][140503] Updated weights for policy 0, policy_version 9106 (0.0006) [2023-08-17 13:18:27,553][140503] Updated weights for policy 0, policy_version 9116 (0.0006) [2023-08-17 13:18:28,084][140404] Fps is (10 sec: 53247.8, 60 sec: 53452.8, 300 sec: 53109.2). Total num frames: 37363712. Throughput: 0: 13322.0. Samples: 5717760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:18:28,085][140404] Avg episode reward: [(0, '37.715')] [2023-08-17 13:18:28,335][140503] Updated weights for policy 0, policy_version 9126 (0.0006) [2023-08-17 13:18:29,081][140503] Updated weights for policy 0, policy_version 9136 (0.0006) [2023-08-17 13:18:29,839][140503] Updated weights for policy 0, policy_version 9146 (0.0006) [2023-08-17 13:18:30,616][140503] Updated weights for policy 0, policy_version 9156 (0.0007) [2023-08-17 13:18:31,398][140503] Updated weights for policy 0, policy_version 9166 (0.0007) [2023-08-17 13:18:32,153][140503] Updated weights for policy 0, policy_version 9176 (0.0006) [2023-08-17 13:18:32,919][140503] Updated weights for policy 0, policy_version 9186 (0.0007) [2023-08-17 13:18:33,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53521.1, 300 sec: 53109.2). Total num frames: 37634048. Throughput: 0: 13348.6. Samples: 5798124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:18:33,085][140404] Avg episode reward: [(0, '36.871')] [2023-08-17 13:18:33,691][140503] Updated weights for policy 0, policy_version 9196 (0.0007) [2023-08-17 13:18:34,457][140503] Updated weights for policy 0, policy_version 9206 (0.0006) [2023-08-17 13:18:35,248][140503] Updated weights for policy 0, policy_version 9216 (0.0006) [2023-08-17 13:18:36,008][140503] Updated weights for policy 0, policy_version 9226 (0.0006) [2023-08-17 13:18:36,750][140503] Updated weights for policy 0, policy_version 9236 (0.0006) [2023-08-17 13:18:37,544][140503] Updated weights for policy 0, policy_version 9246 (0.0006) [2023-08-17 13:18:38,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53384.6, 300 sec: 53095.3). Total num frames: 37900288. Throughput: 0: 13346.3. Samples: 5837656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:18:38,085][140404] Avg episode reward: [(0, '40.892')] [2023-08-17 13:18:38,086][140489] Saving new best policy, reward=40.892! [2023-08-17 13:18:38,282][140503] Updated weights for policy 0, policy_version 9256 (0.0006) [2023-08-17 13:18:39,039][140503] Updated weights for policy 0, policy_version 9266 (0.0007) [2023-08-17 13:18:39,775][140503] Updated weights for policy 0, policy_version 9276 (0.0006) [2023-08-17 13:18:40,556][140503] Updated weights for policy 0, policy_version 9286 (0.0006) [2023-08-17 13:18:41,303][140503] Updated weights for policy 0, policy_version 9296 (0.0006) [2023-08-17 13:18:42,122][140503] Updated weights for policy 0, policy_version 9306 (0.0007) [2023-08-17 13:18:42,888][140503] Updated weights for policy 0, policy_version 9316 (0.0007) [2023-08-17 13:18:43,084][140404] Fps is (10 sec: 53247.5, 60 sec: 53316.2, 300 sec: 53109.1). 
Total num frames: 38166528. Throughput: 0: 13366.2. Samples: 5918444. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:18:43,085][140404] Avg episode reward: [(0, '41.705')] [2023-08-17 13:18:43,089][140489] Saving new best policy, reward=41.705! [2023-08-17 13:18:43,649][140503] Updated weights for policy 0, policy_version 9326 (0.0007) [2023-08-17 13:18:44,445][140503] Updated weights for policy 0, policy_version 9336 (0.0006) [2023-08-17 13:18:45,222][140503] Updated weights for policy 0, policy_version 9346 (0.0007) [2023-08-17 13:18:46,002][140503] Updated weights for policy 0, policy_version 9356 (0.0007) [2023-08-17 13:18:46,764][140503] Updated weights for policy 0, policy_version 9366 (0.0006) [2023-08-17 13:18:47,510][140503] Updated weights for policy 0, policy_version 9376 (0.0006) [2023-08-17 13:18:48,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53384.6, 300 sec: 53095.3). Total num frames: 38432768. Throughput: 0: 13366.7. Samples: 5997812. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:18:48,085][140404] Avg episode reward: [(0, '36.396')] [2023-08-17 13:18:48,283][140503] Updated weights for policy 0, policy_version 9386 (0.0007) [2023-08-17 13:18:49,054][140503] Updated weights for policy 0, policy_version 9396 (0.0007) [2023-08-17 13:18:49,855][140503] Updated weights for policy 0, policy_version 9406 (0.0007) [2023-08-17 13:18:50,649][140503] Updated weights for policy 0, policy_version 9416 (0.0007) [2023-08-17 13:18:51,419][140503] Updated weights for policy 0, policy_version 9426 (0.0006) [2023-08-17 13:18:52,201][140503] Updated weights for policy 0, policy_version 9436 (0.0007) [2023-08-17 13:18:52,970][140503] Updated weights for policy 0, policy_version 9446 (0.0006) [2023-08-17 13:18:53,084][140404] Fps is (10 sec: 52838.9, 60 sec: 53316.4, 300 sec: 53081.4). Total num frames: 38694912. Throughput: 0: 13335.5. Samples: 6037300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:18:53,085][140404] Avg episode reward: [(0, '36.358')] [2023-08-17 13:18:53,758][140503] Updated weights for policy 0, policy_version 9456 (0.0007) [2023-08-17 13:18:54,514][140503] Updated weights for policy 0, policy_version 9466 (0.0006) [2023-08-17 13:18:55,328][140503] Updated weights for policy 0, policy_version 9476 (0.0007) [2023-08-17 13:18:56,085][140503] Updated weights for policy 0, policy_version 9486 (0.0007) [2023-08-17 13:18:56,844][140503] Updated weights for policy 0, policy_version 9496 (0.0007) [2023-08-17 13:18:57,612][140503] Updated weights for policy 0, policy_version 9506 (0.0006) [2023-08-17 13:18:58,084][140404] Fps is (10 sec: 52838.0, 60 sec: 53316.2, 300 sec: 53095.3). Total num frames: 38961152. Throughput: 0: 13322.6. Samples: 6116748. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:18:58,085][140404] Avg episode reward: [(0, '35.439')] [2023-08-17 13:18:58,360][140503] Updated weights for policy 0, policy_version 9516 (0.0006) [2023-08-17 13:18:59,123][140503] Updated weights for policy 0, policy_version 9526 (0.0006) [2023-08-17 13:18:59,843][140503] Updated weights for policy 0, policy_version 9536 (0.0006) [2023-08-17 13:19:00,663][140503] Updated weights for policy 0, policy_version 9546 (0.0006) [2023-08-17 13:19:01,399][140503] Updated weights for policy 0, policy_version 9556 (0.0006) [2023-08-17 13:19:02,145][140503] Updated weights for policy 0, policy_version 9566 (0.0007) [2023-08-17 13:19:02,877][140503] Updated weights for policy 0, policy_version 9576 (0.0006) [2023-08-17 13:19:03,084][140404] Fps is (10 sec: 53657.4, 60 sec: 53452.9, 300 sec: 53109.1). Total num frames: 39231488. Throughput: 0: 13348.3. Samples: 6198092. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:19:03,085][140404] Avg episode reward: [(0, '37.108')] [2023-08-17 13:19:03,088][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009578_39231488.pth... [2023-08-17 13:19:03,124][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000006453_26431488.pth [2023-08-17 13:19:03,616][140503] Updated weights for policy 0, policy_version 9586 (0.0006) [2023-08-17 13:19:04,385][140503] Updated weights for policy 0, policy_version 9596 (0.0006) [2023-08-17 13:19:05,157][140503] Updated weights for policy 0, policy_version 9606 (0.0007) [2023-08-17 13:19:05,896][140503] Updated weights for policy 0, policy_version 9616 (0.0007) [2023-08-17 13:19:06,670][140503] Updated weights for policy 0, policy_version 9626 (0.0007) [2023-08-17 13:19:07,421][140503] Updated weights for policy 0, policy_version 9636 (0.0007) [2023-08-17 13:19:08,084][140404] Fps is (10 sec: 54066.7, 60 sec: 53384.4, 300 sec: 53150.8). Total num frames: 39501824. Throughput: 0: 13367.3. Samples: 6238848. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:19:08,085][140404] Avg episode reward: [(0, '36.910')] [2023-08-17 13:19:08,195][140503] Updated weights for policy 0, policy_version 9646 (0.0007) [2023-08-17 13:19:08,960][140503] Updated weights for policy 0, policy_version 9656 (0.0006) [2023-08-17 13:19:09,731][140503] Updated weights for policy 0, policy_version 9666 (0.0007) [2023-08-17 13:19:10,510][140503] Updated weights for policy 0, policy_version 9676 (0.0007) [2023-08-17 13:19:11,251][140503] Updated weights for policy 0, policy_version 9686 (0.0007) [2023-08-17 13:19:12,057][140503] Updated weights for policy 0, policy_version 9696 (0.0007) [2023-08-17 13:19:12,791][140503] Updated weights for policy 0, policy_version 9706 (0.0006) [2023-08-17 13:19:13,084][140404] Fps is (10 sec: 53657.2, 60 sec: 53452.7, 300 sec: 53178.6). Total num frames: 39768064. Throughput: 0: 13351.2. Samples: 6318564. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:19:13,085][140404] Avg episode reward: [(0, '38.633')] [2023-08-17 13:19:13,571][140503] Updated weights for policy 0, policy_version 9716 (0.0006) [2023-08-17 13:19:14,305][140503] Updated weights for policy 0, policy_version 9726 (0.0006) [2023-08-17 13:19:15,059][140503] Updated weights for policy 0, policy_version 9736 (0.0006) [2023-08-17 13:19:15,867][140503] Updated weights for policy 0, policy_version 9746 (0.0006) [2023-08-17 13:19:16,629][140503] Updated weights for policy 0, policy_version 9756 (0.0007) [2023-08-17 13:19:17,401][140503] Updated weights for policy 0, policy_version 9766 (0.0007) [2023-08-17 13:19:18,084][140404] Fps is (10 sec: 53247.8, 60 sec: 53384.4, 300 sec: 53220.2). Total num frames: 40034304. Throughput: 0: 13349.9. Samples: 6398872. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:19:18,085][140404] Avg episode reward: [(0, '39.664')] [2023-08-17 13:19:18,188][140503] Updated weights for policy 0, policy_version 9776 (0.0007) [2023-08-17 13:19:18,929][140503] Updated weights for policy 0, policy_version 9786 (0.0006) [2023-08-17 13:19:19,713][140503] Updated weights for policy 0, policy_version 9796 (0.0006) [2023-08-17 13:19:20,477][140503] Updated weights for policy 0, policy_version 9806 (0.0007) [2023-08-17 13:19:21,253][140503] Updated weights for policy 0, policy_version 9816 (0.0006) [2023-08-17 13:19:22,024][140503] Updated weights for policy 0, policy_version 9826 (0.0006) [2023-08-17 13:19:22,828][140503] Updated weights for policy 0, policy_version 9836 (0.0007) [2023-08-17 13:19:23,084][140404] Fps is (10 sec: 53248.5, 60 sec: 53384.5, 300 sec: 53248.0). Total num frames: 40300544. Throughput: 0: 13357.1. Samples: 6438724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:19:23,085][140404] Avg episode reward: [(0, '40.181')] [2023-08-17 13:19:23,575][140503] Updated weights for policy 0, policy_version 9846 (0.0007) [2023-08-17 13:19:24,356][140503] Updated weights for policy 0, policy_version 9856 (0.0006) [2023-08-17 13:19:25,100][140503] Updated weights for policy 0, policy_version 9866 (0.0006) [2023-08-17 13:19:25,856][140503] Updated weights for policy 0, policy_version 9876 (0.0006) [2023-08-17 13:19:26,651][140503] Updated weights for policy 0, policy_version 9886 (0.0007) [2023-08-17 13:19:27,433][140503] Updated weights for policy 0, policy_version 9896 (0.0007) [2023-08-17 13:19:28,084][140404] Fps is (10 sec: 53248.4, 60 sec: 53384.5, 300 sec: 53289.6). Total num frames: 40566784. Throughput: 0: 13329.5. Samples: 6518272. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:19:28,085][140404] Avg episode reward: [(0, '38.152')] [2023-08-17 13:19:28,221][140503] Updated weights for policy 0, policy_version 9906 (0.0007) [2023-08-17 13:19:28,986][140503] Updated weights for policy 0, policy_version 9916 (0.0007) [2023-08-17 13:19:29,759][140503] Updated weights for policy 0, policy_version 9926 (0.0007) [2023-08-17 13:19:30,504][140503] Updated weights for policy 0, policy_version 9936 (0.0007) [2023-08-17 13:19:31,275][140503] Updated weights for policy 0, policy_version 9946 (0.0006) [2023-08-17 13:19:32,076][140503] Updated weights for policy 0, policy_version 9956 (0.0006) [2023-08-17 13:19:32,838][140503] Updated weights for policy 0, policy_version 9966 (0.0006) [2023-08-17 13:19:33,084][140404] Fps is (10 sec: 53247.8, 60 sec: 53316.2, 300 sec: 53289.7). Total num frames: 40833024. Throughput: 0: 13352.0. Samples: 6598652. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2023-08-17 13:19:33,085][140404] Avg episode reward: [(0, '39.420')] [2023-08-17 13:19:33,123][140489] Signal inference workers to stop experience collection... (50 times) [2023-08-17 13:19:33,123][140489] Signal inference workers to resume experience collection... (50 times) [2023-08-17 13:19:33,129][140503] InferenceWorker_p0-w0: stopping experience collection (50 times) [2023-08-17 13:19:33,129][140503] InferenceWorker_p0-w0: resuming experience collection (50 times) [2023-08-17 13:19:33,572][140503] Updated weights for policy 0, policy_version 9976 (0.0007) [2023-08-17 13:19:34,327][140503] Updated weights for policy 0, policy_version 9986 (0.0007) [2023-08-17 13:19:35,078][140503] Updated weights for policy 0, policy_version 9996 (0.0006) [2023-08-17 13:19:35,817][140503] Updated weights for policy 0, policy_version 10006 (0.0006) [2023-08-17 13:19:36,558][140503] Updated weights for policy 0, policy_version 10016 (0.0006) [2023-08-17 13:19:37,354][140503] Updated weights for policy 0, policy_version 10026 (0.0007) [2023-08-17 13:19:38,084][140404] Fps is (10 sec: 53657.8, 60 sec: 53384.5, 300 sec: 53303.5). Total num frames: 41103360. Throughput: 0: 13370.3. Samples: 6638964. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:19:38,085][140404] Avg episode reward: [(0, '37.055')] [2023-08-17 13:19:38,145][140503] Updated weights for policy 0, policy_version 10036 (0.0007) [2023-08-17 13:19:38,915][140503] Updated weights for policy 0, policy_version 10046 (0.0007) [2023-08-17 13:19:39,664][140503] Updated weights for policy 0, policy_version 10056 (0.0007) [2023-08-17 13:19:40,413][140503] Updated weights for policy 0, policy_version 10066 (0.0006) [2023-08-17 13:19:41,165][140503] Updated weights for policy 0, policy_version 10076 (0.0006) [2023-08-17 13:19:41,899][140503] Updated weights for policy 0, policy_version 10086 (0.0007) [2023-08-17 13:19:42,698][140503] Updated weights for policy 0, policy_version 10096 (0.0007) [2023-08-17 13:19:43,084][140404] Fps is (10 sec: 53657.9, 60 sec: 53384.6, 300 sec: 53317.4). Total num frames: 41369600. Throughput: 0: 13400.5. Samples: 6719768. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:19:43,085][140404] Avg episode reward: [(0, '37.695')] [2023-08-17 13:19:43,489][140503] Updated weights for policy 0, policy_version 10106 (0.0007) [2023-08-17 13:19:44,274][140503] Updated weights for policy 0, policy_version 10116 (0.0006) [2023-08-17 13:19:45,025][140503] Updated weights for policy 0, policy_version 10126 (0.0007) [2023-08-17 13:19:45,784][140503] Updated weights for policy 0, policy_version 10136 (0.0006) [2023-08-17 13:19:46,531][140503] Updated weights for policy 0, policy_version 10146 (0.0006) [2023-08-17 13:19:47,303][140503] Updated weights for policy 0, policy_version 10156 (0.0007) [2023-08-17 13:19:48,048][140503] Updated weights for policy 0, policy_version 10166 (0.0006) [2023-08-17 13:19:48,084][140404] Fps is (10 sec: 53657.5, 60 sec: 53452.7, 300 sec: 53345.2). Total num frames: 41639936. Throughput: 0: 13372.0. Samples: 6799832. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:19:48,085][140404] Avg episode reward: [(0, '39.590')] [2023-08-17 13:19:48,788][140503] Updated weights for policy 0, policy_version 10176 (0.0006) [2023-08-17 13:19:49,543][140503] Updated weights for policy 0, policy_version 10186 (0.0007) [2023-08-17 13:19:50,302][140503] Updated weights for policy 0, policy_version 10196 (0.0006) [2023-08-17 13:19:51,106][140503] Updated weights for policy 0, policy_version 10206 (0.0006) [2023-08-17 13:19:51,879][140503] Updated weights for policy 0, policy_version 10216 (0.0007) [2023-08-17 13:19:52,633][140503] Updated weights for policy 0, policy_version 10226 (0.0006) [2023-08-17 13:19:53,084][140404] Fps is (10 sec: 54067.3, 60 sec: 53589.3, 300 sec: 53373.0). Total num frames: 41910272. Throughput: 0: 13368.1. Samples: 6840412. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:19:53,085][140404] Avg episode reward: [(0, '38.375')] [2023-08-17 13:19:53,374][140503] Updated weights for policy 0, policy_version 10236 (0.0006) [2023-08-17 13:19:54,103][140503] Updated weights for policy 0, policy_version 10246 (0.0005) [2023-08-17 13:19:54,869][140503] Updated weights for policy 0, policy_version 10256 (0.0006) [2023-08-17 13:19:55,605][140503] Updated weights for policy 0, policy_version 10266 (0.0006) [2023-08-17 13:19:56,375][140503] Updated weights for policy 0, policy_version 10276 (0.0007) [2023-08-17 13:19:57,124][140503] Updated weights for policy 0, policy_version 10286 (0.0007) [2023-08-17 13:19:57,887][140503] Updated weights for policy 0, policy_version 10296 (0.0006) [2023-08-17 13:19:58,084][140404] Fps is (10 sec: 54067.7, 60 sec: 53657.7, 300 sec: 53386.9). Total num frames: 42180608. Throughput: 0: 13395.7. Samples: 6921368. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:19:58,085][140404] Avg episode reward: [(0, '38.812')] [2023-08-17 13:19:58,675][140503] Updated weights for policy 0, policy_version 10306 (0.0007) [2023-08-17 13:19:59,427][140503] Updated weights for policy 0, policy_version 10316 (0.0007) [2023-08-17 13:20:00,252][140503] Updated weights for policy 0, policy_version 10326 (0.0007) [2023-08-17 13:20:00,998][140503] Updated weights for policy 0, policy_version 10336 (0.0007) [2023-08-17 13:20:01,765][140503] Updated weights for policy 0, policy_version 10346 (0.0006) [2023-08-17 13:20:02,508][140503] Updated weights for policy 0, policy_version 10356 (0.0007) [2023-08-17 13:20:03,084][140404] Fps is (10 sec: 53657.1, 60 sec: 53589.3, 300 sec: 53414.6). Total num frames: 42446848. Throughput: 0: 13389.6. Samples: 7001404. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:20:03,085][140404] Avg episode reward: [(0, '39.239')] [2023-08-17 13:20:03,290][140503] Updated weights for policy 0, policy_version 10366 (0.0007) [2023-08-17 13:20:04,070][140503] Updated weights for policy 0, policy_version 10376 (0.0006) [2023-08-17 13:20:04,826][140503] Updated weights for policy 0, policy_version 10386 (0.0006) [2023-08-17 13:20:05,596][140503] Updated weights for policy 0, policy_version 10396 (0.0006) [2023-08-17 13:20:06,375][140503] Updated weights for policy 0, policy_version 10406 (0.0006) [2023-08-17 13:20:07,137][140503] Updated weights for policy 0, policy_version 10416 (0.0007) [2023-08-17 13:20:07,894][140503] Updated weights for policy 0, policy_version 10426 (0.0006) [2023-08-17 13:20:08,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53521.2, 300 sec: 53456.3). Total num frames: 42713088. Throughput: 0: 13388.5. 
Samples: 7041204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:20:08,085][140404] Avg episode reward: [(0, '40.826')] [2023-08-17 13:20:08,640][140503] Updated weights for policy 0, policy_version 10436 (0.0006) [2023-08-17 13:20:09,383][140503] Updated weights for policy 0, policy_version 10446 (0.0007) [2023-08-17 13:20:10,139][140503] Updated weights for policy 0, policy_version 10456 (0.0006) [2023-08-17 13:20:10,887][140503] Updated weights for policy 0, policy_version 10466 (0.0006) [2023-08-17 13:20:11,665][140503] Updated weights for policy 0, policy_version 10476 (0.0007) [2023-08-17 13:20:12,404][140503] Updated weights for policy 0, policy_version 10486 (0.0006) [2023-08-17 13:20:13,084][140404] Fps is (10 sec: 54067.7, 60 sec: 53657.7, 300 sec: 53511.8). Total num frames: 42987520. Throughput: 0: 13431.8. Samples: 7122704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:20:13,085][140404] Avg episode reward: [(0, '43.282')] [2023-08-17 13:20:13,088][140489] Saving new best policy, reward=43.282! [2023-08-17 13:20:13,162][140503] Updated weights for policy 0, policy_version 10496 (0.0006) [2023-08-17 13:20:13,940][140503] Updated weights for policy 0, policy_version 10506 (0.0007) [2023-08-17 13:20:14,707][140503] Updated weights for policy 0, policy_version 10516 (0.0007) [2023-08-17 13:20:15,497][140503] Updated weights for policy 0, policy_version 10526 (0.0007) [2023-08-17 13:20:16,266][140503] Updated weights for policy 0, policy_version 10536 (0.0006) [2023-08-17 13:20:17,047][140503] Updated weights for policy 0, policy_version 10546 (0.0006) [2023-08-17 13:20:17,846][140503] Updated weights for policy 0, policy_version 10556 (0.0007) [2023-08-17 13:20:18,084][140404] Fps is (10 sec: 53657.5, 60 sec: 53589.5, 300 sec: 53511.8). Total num frames: 43249664. Throughput: 0: 13406.3. Samples: 7201936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:20:18,085][140404] Avg episode reward: [(0, '40.016')] [2023-08-17 13:20:18,627][140503] Updated weights for policy 0, policy_version 10566 (0.0007) [2023-08-17 13:20:19,394][140503] Updated weights for policy 0, policy_version 10576 (0.0006) [2023-08-17 13:20:20,148][140503] Updated weights for policy 0, policy_version 10586 (0.0006) [2023-08-17 13:20:20,893][140503] Updated weights for policy 0, policy_version 10596 (0.0006) [2023-08-17 13:20:21,675][140503] Updated weights for policy 0, policy_version 10606 (0.0007) [2023-08-17 13:20:22,461][140503] Updated weights for policy 0, policy_version 10616 (0.0006) [2023-08-17 13:20:23,084][140404] Fps is (10 sec: 52838.6, 60 sec: 53589.4, 300 sec: 53525.7). Total num frames: 43515904. Throughput: 0: 13403.8. Samples: 7242132. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:20:23,085][140404] Avg episode reward: [(0, '41.067')] [2023-08-17 13:20:23,230][140503] Updated weights for policy 0, policy_version 10626 (0.0007) [2023-08-17 13:20:23,964][140503] Updated weights for policy 0, policy_version 10636 (0.0007) [2023-08-17 13:20:24,703][140503] Updated weights for policy 0, policy_version 10646 (0.0006) [2023-08-17 13:20:25,499][140503] Updated weights for policy 0, policy_version 10656 (0.0007) [2023-08-17 13:20:26,265][140503] Updated weights for policy 0, policy_version 10666 (0.0006) [2023-08-17 13:20:27,019][140503] Updated weights for policy 0, policy_version 10676 (0.0006) [2023-08-17 13:20:27,779][140503] Updated weights for policy 0, policy_version 10686 (0.0006) [2023-08-17 13:20:28,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53657.7, 300 sec: 53567.4). Total num frames: 43786240. Throughput: 0: 13392.4. Samples: 7322424. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:20:28,085][140404] Avg episode reward: [(0, '41.706')] [2023-08-17 13:20:28,534][140503] Updated weights for policy 0, policy_version 10696 (0.0006) [2023-08-17 13:20:29,288][140503] Updated weights for policy 0, policy_version 10706 (0.0007) [2023-08-17 13:20:30,076][140503] Updated weights for policy 0, policy_version 10716 (0.0006) [2023-08-17 13:20:30,830][140503] Updated weights for policy 0, policy_version 10726 (0.0007) [2023-08-17 13:20:31,626][140503] Updated weights for policy 0, policy_version 10736 (0.0007) [2023-08-17 13:20:32,402][140503] Updated weights for policy 0, policy_version 10746 (0.0007) [2023-08-17 13:20:33,084][140404] Fps is (10 sec: 53657.3, 60 sec: 53657.6, 300 sec: 53539.6). Total num frames: 44052480. Throughput: 0: 13395.8. Samples: 7402644. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:20:33,085][140404] Avg episode reward: [(0, '40.306')] [2023-08-17 13:20:33,162][140503] Updated weights for policy 0, policy_version 10756 (0.0006) [2023-08-17 13:20:33,938][140503] Updated weights for policy 0, policy_version 10766 (0.0007) [2023-08-17 13:20:34,728][140503] Updated weights for policy 0, policy_version 10776 (0.0007) [2023-08-17 13:20:35,503][140503] Updated weights for policy 0, policy_version 10786 (0.0007) [2023-08-17 13:20:36,284][140503] Updated weights for policy 0, policy_version 10796 (0.0006) [2023-08-17 13:20:37,046][140503] Updated weights for policy 0, policy_version 10806 (0.0007) [2023-08-17 13:20:37,789][140503] Updated weights for policy 0, policy_version 10816 (0.0006) [2023-08-17 13:20:38,084][140404] Fps is (10 sec: 52838.1, 60 sec: 53521.1, 300 sec: 53525.7). Total num frames: 44314624. Throughput: 0: 13365.1. Samples: 7441844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:20:38,085][140404] Avg episode reward: [(0, '39.154')] [2023-08-17 13:20:38,557][140503] Updated weights for policy 0, policy_version 10826 (0.0006) [2023-08-17 13:20:39,304][140503] Updated weights for policy 0, policy_version 10836 (0.0007) [2023-08-17 13:20:40,083][140503] Updated weights for policy 0, policy_version 10846 (0.0007) [2023-08-17 13:20:40,879][140503] Updated weights for policy 0, policy_version 10856 (0.0006) [2023-08-17 13:20:41,621][140503] Updated weights for policy 0, policy_version 10866 (0.0006) [2023-08-17 13:20:42,372][140503] Updated weights for policy 0, policy_version 10876 (0.0006) [2023-08-17 13:20:43,084][140404] Fps is (10 sec: 52838.1, 60 sec: 53521.0, 300 sec: 53525.7). Total num frames: 44580864. Throughput: 0: 13355.4. 
Samples: 7522364. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:20:43,085][140404] Avg episode reward: [(0, '40.058')] [2023-08-17 13:20:43,128][140503] Updated weights for policy 0, policy_version 10886 (0.0006) [2023-08-17 13:20:43,921][140503] Updated weights for policy 0, policy_version 10896 (0.0007) [2023-08-17 13:20:44,681][140503] Updated weights for policy 0, policy_version 10906 (0.0006) [2023-08-17 13:20:45,458][140503] Updated weights for policy 0, policy_version 10916 (0.0006) [2023-08-17 13:20:46,208][140503] Updated weights for policy 0, policy_version 10926 (0.0006) [2023-08-17 13:20:46,987][140503] Updated weights for policy 0, policy_version 10936 (0.0007) [2023-08-17 13:20:47,762][140503] Updated weights for policy 0, policy_version 10946 (0.0006) [2023-08-17 13:20:48,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53521.1, 300 sec: 53539.6). Total num frames: 44851200. Throughput: 0: 13353.7. Samples: 7602320. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:20:48,085][140404] Avg episode reward: [(0, '35.851')] [2023-08-17 13:20:48,536][140503] Updated weights for policy 0, policy_version 10956 (0.0006) [2023-08-17 13:20:49,302][140503] Updated weights for policy 0, policy_version 10966 (0.0006) [2023-08-17 13:20:50,053][140503] Updated weights for policy 0, policy_version 10976 (0.0006) [2023-08-17 13:20:50,793][140503] Updated weights for policy 0, policy_version 10986 (0.0006) [2023-08-17 13:20:51,553][140503] Updated weights for policy 0, policy_version 10996 (0.0006) [2023-08-17 13:20:52,300][140503] Updated weights for policy 0, policy_version 11006 (0.0006) [2023-08-17 13:20:53,081][140503] Updated weights for policy 0, policy_version 11016 (0.0007) [2023-08-17 13:20:53,084][140404] Fps is (10 sec: 54067.6, 60 sec: 53521.1, 300 sec: 53553.5). Total num frames: 45121536. Throughput: 0: 13370.9. Samples: 7642896. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:20:53,085][140404] Avg episode reward: [(0, '41.889')] [2023-08-17 13:20:53,869][140503] Updated weights for policy 0, policy_version 11026 (0.0007) [2023-08-17 13:20:54,621][140503] Updated weights for policy 0, policy_version 11036 (0.0007) [2023-08-17 13:20:55,397][140503] Updated weights for policy 0, policy_version 11046 (0.0007) [2023-08-17 13:20:56,159][140503] Updated weights for policy 0, policy_version 11056 (0.0007) [2023-08-17 13:20:56,913][140503] Updated weights for policy 0, policy_version 11066 (0.0006) [2023-08-17 13:20:57,724][140503] Updated weights for policy 0, policy_version 11076 (0.0007) [2023-08-17 13:20:58,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53384.5, 300 sec: 53539.6). Total num frames: 45383680. Throughput: 0: 13342.5. Samples: 7723116. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:20:58,085][140404] Avg episode reward: [(0, '40.150')] [2023-08-17 13:20:58,493][140503] Updated weights for policy 0, policy_version 11086 (0.0007) [2023-08-17 13:20:59,248][140503] Updated weights for policy 0, policy_version 11096 (0.0006) [2023-08-17 13:20:59,995][140503] Updated weights for policy 0, policy_version 11106 (0.0006) [2023-08-17 13:21:00,754][140503] Updated weights for policy 0, policy_version 11116 (0.0006) [2023-08-17 13:21:01,525][140503] Updated weights for policy 0, policy_version 11126 (0.0006) [2023-08-17 13:21:02,286][140503] Updated weights for policy 0, policy_version 11136 (0.0007) [2023-08-17 13:21:03,084][140404] Fps is (10 sec: 52837.8, 60 sec: 53384.5, 300 sec: 53525.7). Total num frames: 45649920. 
Throughput: 0: 13352.7. Samples: 7802808. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:21:03,085][140404] Avg episode reward: [(0, '39.033')] [2023-08-17 13:21:03,089][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000011145_45649920.pth... [2023-08-17 13:21:03,132][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000008010_32808960.pth [2023-08-17 13:21:03,160][140503] Updated weights for policy 0, policy_version 11146 (0.0007) [2023-08-17 13:21:03,904][140503] Updated weights for policy 0, policy_version 11156 (0.0007) [2023-08-17 13:21:04,650][140503] Updated weights for policy 0, policy_version 11166 (0.0006) [2023-08-17 13:21:05,426][140503] Updated weights for policy 0, policy_version 11176 (0.0007) [2023-08-17 13:21:06,232][140503] Updated weights for policy 0, policy_version 11186 (0.0007) [2023-08-17 13:21:06,975][140503] Updated weights for policy 0, policy_version 11196 (0.0007) [2023-08-17 13:21:07,729][140503] Updated weights for policy 0, policy_version 11206 (0.0006) [2023-08-17 13:21:08,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53384.5, 300 sec: 53511.8). Total num frames: 45916160. Throughput: 0: 13332.9. Samples: 7842112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:21:08,085][140404] Avg episode reward: [(0, '43.509')] [2023-08-17 13:21:08,086][140489] Saving new best policy, reward=43.509! [2023-08-17 13:21:08,487][140503] Updated weights for policy 0, policy_version 11216 (0.0007) [2023-08-17 13:21:09,274][140503] Updated weights for policy 0, policy_version 11226 (0.0007) [2023-08-17 13:21:10,049][140503] Updated weights for policy 0, policy_version 11236 (0.0006) [2023-08-17 13:21:10,794][140503] Updated weights for policy 0, policy_version 11246 (0.0007) [2023-08-17 13:21:11,557][140503] Updated weights for policy 0, policy_version 11256 (0.0007) [2023-08-17 13:21:12,323][140503] Updated weights for policy 0, policy_version 11266 (0.0006) [2023-08-17 13:21:13,084][140404] Fps is (10 sec: 53248.6, 60 sec: 53248.0, 300 sec: 53497.9). Total num frames: 46182400. Throughput: 0: 13334.9. Samples: 7922496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:21:13,085][140404] Avg episode reward: [(0, '41.720')] [2023-08-17 13:21:13,115][140503] Updated weights for policy 0, policy_version 11276 (0.0007) [2023-08-17 13:21:13,935][140503] Updated weights for policy 0, policy_version 11286 (0.0007) [2023-08-17 13:21:14,702][140503] Updated weights for policy 0, policy_version 11296 (0.0007) [2023-08-17 13:21:15,486][140503] Updated weights for policy 0, policy_version 11306 (0.0006) [2023-08-17 13:21:16,222][140503] Updated weights for policy 0, policy_version 11316 (0.0007) [2023-08-17 13:21:17,024][140503] Updated weights for policy 0, policy_version 11326 (0.0007) [2023-08-17 13:21:17,777][140503] Updated weights for policy 0, policy_version 11336 (0.0006) [2023-08-17 13:21:18,084][140404] Fps is (10 sec: 53247.7, 60 sec: 53316.2, 300 sec: 53484.0). Total num frames: 46448640. Throughput: 0: 13310.8. Samples: 8001632. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:21:18,085][140404] Avg episode reward: [(0, '41.337')] [2023-08-17 13:21:18,525][140503] Updated weights for policy 0, policy_version 11346 (0.0006) [2023-08-17 13:21:19,276][140503] Updated weights for policy 0, policy_version 11356 (0.0006) [2023-08-17 13:21:20,067][140503] Updated weights for policy 0, policy_version 11366 (0.0007) [2023-08-17 13:21:20,832][140503] Updated weights for policy 0, policy_version 11376 (0.0007) [2023-08-17 13:21:21,632][140503] Updated weights for policy 0, policy_version 11386 (0.0007) [2023-08-17 13:21:22,395][140503] Updated weights for policy 0, policy_version 11396 (0.0006) [2023-08-17 13:21:23,084][140404] Fps is (10 sec: 53248.2, 60 sec: 53316.3, 300 sec: 53484.0). Total num frames: 46714880. Throughput: 0: 13327.8. Samples: 8041592. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:21:23,085][140404] Avg episode reward: [(0, '39.350')] [2023-08-17 13:21:23,171][140503] Updated weights for policy 0, policy_version 11406 (0.0007) [2023-08-17 13:21:23,940][140503] Updated weights for policy 0, policy_version 11416 (0.0007) [2023-08-17 13:21:24,681][140503] Updated weights for policy 0, policy_version 11426 (0.0006) [2023-08-17 13:21:25,482][140503] Updated weights for policy 0, policy_version 11436 (0.0007) [2023-08-17 13:21:26,260][140503] Updated weights for policy 0, policy_version 11446 (0.0007) [2023-08-17 13:21:27,023][140503] Updated weights for policy 0, policy_version 11456 (0.0007) [2023-08-17 13:21:27,834][140503] Updated weights for policy 0, policy_version 11466 (0.0006) [2023-08-17 13:21:28,084][140404] Fps is (10 sec: 52838.3, 60 sec: 53179.7, 300 sec: 53456.3). Total num frames: 46977024. Throughput: 0: 13303.0. Samples: 8121000. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:21:28,085][140404] Avg episode reward: [(0, '39.147')] [2023-08-17 13:21:28,577][140503] Updated weights for policy 0, policy_version 11476 (0.0007) [2023-08-17 13:21:29,329][140503] Updated weights for policy 0, policy_version 11486 (0.0006) [2023-08-17 13:21:30,095][140503] Updated weights for policy 0, policy_version 11496 (0.0007) [2023-08-17 13:21:30,847][140503] Updated weights for policy 0, policy_version 11506 (0.0006) [2023-08-17 13:21:31,588][140503] Updated weights for policy 0, policy_version 11516 (0.0006) [2023-08-17 13:21:32,381][140503] Updated weights for policy 0, policy_version 11526 (0.0007) [2023-08-17 13:21:33,084][140404] Fps is (10 sec: 52838.2, 60 sec: 53179.7, 300 sec: 53456.3). Total num frames: 47243264. Throughput: 0: 13312.6. Samples: 8201388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:21:33,085][140404] Avg episode reward: [(0, '39.955')] [2023-08-17 13:21:33,145][140503] Updated weights for policy 0, policy_version 11536 (0.0006) [2023-08-17 13:21:33,932][140503] Updated weights for policy 0, policy_version 11546 (0.0007) [2023-08-17 13:21:34,723][140503] Updated weights for policy 0, policy_version 11556 (0.0007) [2023-08-17 13:21:35,494][140503] Updated weights for policy 0, policy_version 11566 (0.0006) [2023-08-17 13:21:36,244][140503] Updated weights for policy 0, policy_version 11576 (0.0007) [2023-08-17 13:21:37,006][140503] Updated weights for policy 0, policy_version 11586 (0.0006) [2023-08-17 13:21:37,757][140503] Updated weights for policy 0, policy_version 11596 (0.0006) [2023-08-17 13:21:38,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53316.3, 300 sec: 53470.2). Total num frames: 47513600. Throughput: 0: 13291.0. 
Samples: 8240992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2023-08-17 13:21:38,085][140404] Avg episode reward: [(0, '39.510')] [2023-08-17 13:21:38,516][140503] Updated weights for policy 0, policy_version 11606 (0.0007) [2023-08-17 13:21:39,264][140503] Updated weights for policy 0, policy_version 11616 (0.0006) [2023-08-17 13:21:40,017][140503] Updated weights for policy 0, policy_version 11626 (0.0006) [2023-08-17 13:21:40,802][140503] Updated weights for policy 0, policy_version 11636 (0.0007) [2023-08-17 13:21:41,542][140503] Updated weights for policy 0, policy_version 11646 (0.0006) [2023-08-17 13:21:42,289][140503] Updated weights for policy 0, policy_version 11656 (0.0006) [2023-08-17 13:21:43,036][140503] Updated weights for policy 0, policy_version 11666 (0.0006) [2023-08-17 13:21:43,084][140404] Fps is (10 sec: 54066.8, 60 sec: 53384.5, 300 sec: 53484.0). Total num frames: 47783936. Throughput: 0: 13306.9. Samples: 8321928. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:21:43,085][140404] Avg episode reward: [(0, '41.846')] [2023-08-17 13:21:43,780][140503] Updated weights for policy 0, policy_version 11676 (0.0006) [2023-08-17 13:21:44,561][140503] Updated weights for policy 0, policy_version 11686 (0.0006) [2023-08-17 13:21:45,330][140503] Updated weights for policy 0, policy_version 11696 (0.0006) [2023-08-17 13:21:46,117][140503] Updated weights for policy 0, policy_version 11706 (0.0007) [2023-08-17 13:21:46,862][140503] Updated weights for policy 0, policy_version 11716 (0.0007) [2023-08-17 13:21:47,669][140503] Updated weights for policy 0, policy_version 11726 (0.0007) [2023-08-17 13:21:48,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53316.3, 300 sec: 53470.2). Total num frames: 48050176. Throughput: 0: 13314.3. Samples: 8401952. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2023-08-17 13:21:48,085][140404] Avg episode reward: [(0, '40.862')] [2023-08-17 13:21:48,476][140503] Updated weights for policy 0, policy_version 11736 (0.0007) [2023-08-17 13:21:49,235][140503] Updated weights for policy 0, policy_version 11746 (0.0006) [2023-08-17 13:21:50,010][140503] Updated weights for policy 0, policy_version 11756 (0.0006) [2023-08-17 13:21:50,771][140503] Updated weights for policy 0, policy_version 11766 (0.0006) [2023-08-17 13:21:51,520][140503] Updated weights for policy 0, policy_version 11776 (0.0006) [2023-08-17 13:21:52,268][140503] Updated weights for policy 0, policy_version 11786 (0.0007) [2023-08-17 13:21:53,042][140503] Updated weights for policy 0, policy_version 11796 (0.0006) [2023-08-17 13:21:53,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53248.0, 300 sec: 53470.2). Total num frames: 48316416. Throughput: 0: 13333.7. Samples: 8442128. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:21:53,085][140404] Avg episode reward: [(0, '43.115')] [2023-08-17 13:21:53,817][140503] Updated weights for policy 0, policy_version 11806 (0.0006) [2023-08-17 13:21:54,595][140503] Updated weights for policy 0, policy_version 11816 (0.0006) [2023-08-17 13:21:55,380][140503] Updated weights for policy 0, policy_version 11826 (0.0007) [2023-08-17 13:21:56,150][140503] Updated weights for policy 0, policy_version 11836 (0.0007) [2023-08-17 13:21:56,907][140503] Updated weights for policy 0, policy_version 11846 (0.0006) [2023-08-17 13:21:57,668][140503] Updated weights for policy 0, policy_version 11856 (0.0006) [2023-08-17 13:21:58,084][140404] Fps is (10 sec: 52838.1, 60 sec: 53248.0, 300 sec: 53456.3). Total num frames: 48578560. 
Throughput: 0: 13323.3. Samples: 8522044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:21:58,085][140404] Avg episode reward: [(0, '40.423')] [2023-08-17 13:21:58,467][140503] Updated weights for policy 0, policy_version 11866 (0.0006) [2023-08-17 13:21:59,221][140503] Updated weights for policy 0, policy_version 11876 (0.0006) [2023-08-17 13:21:59,995][140503] Updated weights for policy 0, policy_version 11886 (0.0006) [2023-08-17 13:22:00,776][140503] Updated weights for policy 0, policy_version 11896 (0.0007) [2023-08-17 13:22:01,511][140503] Updated weights for policy 0, policy_version 11906 (0.0006) [2023-08-17 13:22:02,251][140503] Updated weights for policy 0, policy_version 11916 (0.0006) [2023-08-17 13:22:03,050][140503] Updated weights for policy 0, policy_version 11926 (0.0006) [2023-08-17 13:22:03,084][140404] Fps is (10 sec: 53657.8, 60 sec: 53384.7, 300 sec: 53484.0). Total num frames: 48852992. Throughput: 0: 13353.5. Samples: 8602540. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:22:03,085][140404] Avg episode reward: [(0, '41.916')] [2023-08-17 13:22:03,804][140503] Updated weights for policy 0, policy_version 11936 (0.0007) [2023-08-17 13:22:04,602][140503] Updated weights for policy 0, policy_version 11946 (0.0006) [2023-08-17 13:22:05,377][140503] Updated weights for policy 0, policy_version 11956 (0.0007) [2023-08-17 13:22:06,135][140503] Updated weights for policy 0, policy_version 11966 (0.0007) [2023-08-17 13:22:06,899][140503] Updated weights for policy 0, policy_version 11976 (0.0007) [2023-08-17 13:22:07,689][140503] Updated weights for policy 0, policy_version 11986 (0.0006) [2023-08-17 13:22:08,084][140404] Fps is (10 sec: 53657.9, 60 sec: 53316.2, 300 sec: 53456.3). Total num frames: 49115136. Throughput: 0: 13340.8. Samples: 8641928. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:22:08,085][140404] Avg episode reward: [(0, '42.304')] [2023-08-17 13:22:08,419][140503] Updated weights for policy 0, policy_version 11996 (0.0006) [2023-08-17 13:22:09,155][140503] Updated weights for policy 0, policy_version 12006 (0.0006) [2023-08-17 13:22:09,924][140503] Updated weights for policy 0, policy_version 12016 (0.0006) [2023-08-17 13:22:10,692][140503] Updated weights for policy 0, policy_version 12026 (0.0007) [2023-08-17 13:22:11,423][140503] Updated weights for policy 0, policy_version 12036 (0.0006) [2023-08-17 13:22:12,175][140503] Updated weights for policy 0, policy_version 12046 (0.0006) [2023-08-17 13:22:12,892][140503] Updated weights for policy 0, policy_version 12056 (0.0005) [2023-08-17 13:22:13,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53452.8, 300 sec: 53470.2). Total num frames: 49389568. Throughput: 0: 13386.2. Samples: 8723380. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:22:13,085][140404] Avg episode reward: [(0, '39.658')] [2023-08-17 13:22:13,671][140503] Updated weights for policy 0, policy_version 12066 (0.0006) [2023-08-17 13:22:14,425][140503] Updated weights for policy 0, policy_version 12076 (0.0006) [2023-08-17 13:22:15,175][140503] Updated weights for policy 0, policy_version 12086 (0.0006) [2023-08-17 13:22:15,926][140503] Updated weights for policy 0, policy_version 12096 (0.0006) [2023-08-17 13:22:16,698][140503] Updated weights for policy 0, policy_version 12106 (0.0006) [2023-08-17 13:22:17,448][140503] Updated weights for policy 0, policy_version 12116 (0.0006) [2023-08-17 13:22:18,084][140404] Fps is (10 sec: 54476.8, 60 sec: 53521.1, 300 sec: 53470.2). 
Total num frames: 49659904. Throughput: 0: 13406.9. Samples: 8804700. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:22:18,085][140404] Avg episode reward: [(0, '40.606')] [2023-08-17 13:22:18,225][140503] Updated weights for policy 0, policy_version 12126 (0.0006) [2023-08-17 13:22:18,982][140503] Updated weights for policy 0, policy_version 12136 (0.0006) [2023-08-17 13:22:19,735][140503] Updated weights for policy 0, policy_version 12146 (0.0006) [2023-08-17 13:22:20,532][140503] Updated weights for policy 0, policy_version 12156 (0.0007) [2023-08-17 13:22:21,261][140503] Updated weights for policy 0, policy_version 12166 (0.0006) [2023-08-17 13:22:22,011][140503] Updated weights for policy 0, policy_version 12176 (0.0006) [2023-08-17 13:22:22,719][140503] Updated weights for policy 0, policy_version 12186 (0.0006) [2023-08-17 13:22:23,084][140404] Fps is (10 sec: 54066.9, 60 sec: 53589.3, 300 sec: 53470.2). Total num frames: 49930240. Throughput: 0: 13419.4. Samples: 8844864. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) [2023-08-17 13:22:23,085][140404] Avg episode reward: [(0, '38.861')] [2023-08-17 13:22:23,524][140503] Updated weights for policy 0, policy_version 12196 (0.0006) [2023-08-17 13:22:24,289][140503] Updated weights for policy 0, policy_version 12206 (0.0006) [2023-08-17 13:22:25,040][140503] Updated weights for policy 0, policy_version 12216 (0.0007) [2023-08-17 13:22:25,784][140503] Updated weights for policy 0, policy_version 12226 (0.0007) [2023-08-17 13:22:26,550][140503] Updated weights for policy 0, policy_version 12236 (0.0006) [2023-08-17 13:22:27,287][140503] Updated weights for policy 0, policy_version 12246 (0.0006) [2023-08-17 13:22:28,040][140503] Updated weights for policy 0, policy_version 12256 (0.0006) [2023-08-17 13:22:28,084][140404] Fps is (10 sec: 54476.7, 60 sec: 53794.1, 300 sec: 53497.9). Total num frames: 50204672. Throughput: 0: 13434.4. Samples: 8926476. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:22:28,085][140404] Avg episode reward: [(0, '41.514')] [2023-08-17 13:22:28,795][140503] Updated weights for policy 0, policy_version 12266 (0.0006) [2023-08-17 13:22:29,590][140503] Updated weights for policy 0, policy_version 12276 (0.0007) [2023-08-17 13:22:30,325][140503] Updated weights for policy 0, policy_version 12286 (0.0006) [2023-08-17 13:22:31,079][140503] Updated weights for policy 0, policy_version 12296 (0.0006) [2023-08-17 13:22:31,857][140503] Updated weights for policy 0, policy_version 12306 (0.0007) [2023-08-17 13:22:32,577][140503] Updated weights for policy 0, policy_version 12316 (0.0006) [2023-08-17 13:22:33,084][140404] Fps is (10 sec: 54067.3, 60 sec: 53794.1, 300 sec: 53470.2). Total num frames: 50470912. Throughput: 0: 13457.5. Samples: 9007540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:22:33,085][140404] Avg episode reward: [(0, '45.341')] [2023-08-17 13:22:33,088][140489] Saving new best policy, reward=45.341! 
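The "Saving new best policy, reward=45.341!" lines above are emitted whenever the reported average episode reward exceeds the best value seen so far in the run; at that moment the learner snapshots the current weights, separately from the regular rotating checkpoints. A minimal sketch of this bookkeeping (illustrative only, assuming a PyTorch model; the class name, file path, and print format here are placeholders, not Sample Factory internals):

import torch

class BestPolicySaver:
    """Save a snapshot whenever the running avg episode reward improves."""

    def __init__(self, path="best_policy.pth"):
        self.path = path
        self.best_reward = float("-inf")

    def maybe_save(self, model, avg_episode_reward):
        # Strict improvement only, mirroring the log: rewards that fall at or
        # below the best so far (e.g. 40.016 reported after the 43.282 best)
        # produce no "Saving new best policy" line.
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            torch.save(model.state_dict(), self.path)
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")

saver = BestPolicySaver()
saver.maybe_save(torch.nn.Linear(4, 2), 43.282)  # saves: first report is a new best
saver.maybe_save(torch.nn.Linear(4, 2), 40.016)  # no save: below the 43.282 best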
[2023-08-17 13:22:33,381][140503] Updated weights for policy 0, policy_version 12326 (0.0007)
[2023-08-17 13:22:34,131][140503] Updated weights for policy 0, policy_version 12336 (0.0006)
[2023-08-17 13:22:34,907][140503] Updated weights for policy 0, policy_version 12346 (0.0006)
[2023-08-17 13:22:35,657][140503] Updated weights for policy 0, policy_version 12356 (0.0006)
[2023-08-17 13:22:36,453][140503] Updated weights for policy 0, policy_version 12366 (0.0007)
[2023-08-17 13:22:37,199][140503] Updated weights for policy 0, policy_version 12376 (0.0006)
[2023-08-17 13:22:37,977][140503] Updated weights for policy 0, policy_version 12386 (0.0007)
[2023-08-17 13:22:38,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53725.9, 300 sec: 53456.3). Total num frames: 50737152. Throughput: 0: 13449.6. Samples: 9047360. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2023-08-17 13:22:38,085][140404] Avg episode reward: [(0, '38.457')]
[2023-08-17 13:22:38,735][140503] Updated weights for policy 0, policy_version 12396 (0.0006)
[2023-08-17 13:22:39,549][140503] Updated weights for policy 0, policy_version 12406 (0.0007)
[2023-08-17 13:22:40,343][140503] Updated weights for policy 0, policy_version 12416 (0.0007)
[2023-08-17 13:22:41,123][140503] Updated weights for policy 0, policy_version 12426 (0.0007)
[2023-08-17 13:22:41,893][140503] Updated weights for policy 0, policy_version 12436 (0.0007)
[2023-08-17 13:22:42,677][140503] Updated weights for policy 0, policy_version 12446 (0.0007)
[2023-08-17 13:22:43,084][140404] Fps is (10 sec: 52838.6, 60 sec: 53589.4, 300 sec: 53456.3). Total num frames: 50999296. Throughput: 0: 13431.7. Samples: 9126468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
[2023-08-17 13:22:43,085][140404] Avg episode reward: [(0, '41.702')]
[2023-08-17 13:22:43,461][140503] Updated weights for policy 0, policy_version 12456 (0.0007)
[2023-08-17 13:22:44,229][140503] Updated weights for policy 0, policy_version 12466 (0.0007)
[2023-08-17 13:22:44,982][140503] Updated weights for policy 0, policy_version 12476 (0.0006)
[2023-08-17 13:22:45,790][140503] Updated weights for policy 0, policy_version 12486 (0.0007)
[2023-08-17 13:22:46,543][140503] Updated weights for policy 0, policy_version 12496 (0.0007)
[2023-08-17 13:22:47,302][140503] Updated weights for policy 0, policy_version 12506 (0.0006)
[2023-08-17 13:22:48,059][140503] Updated weights for policy 0, policy_version 12516 (0.0006)
[2023-08-17 13:22:48,084][140404] Fps is (10 sec: 52838.4, 60 sec: 53589.4, 300 sec: 53456.3). Total num frames: 51265536. Throughput: 0: 13410.1. Samples: 9205996. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:22:48,085][140404] Avg episode reward: [(0, '40.960')]
[2023-08-17 13:22:48,819][140503] Updated weights for policy 0, policy_version 12526 (0.0006)
[2023-08-17 13:22:49,604][140503] Updated weights for policy 0, policy_version 12536 (0.0007)
[2023-08-17 13:22:50,394][140503] Updated weights for policy 0, policy_version 12546 (0.0006)
[2023-08-17 13:22:51,143][140503] Updated weights for policy 0, policy_version 12556 (0.0007)
[2023-08-17 13:22:51,905][140503] Updated weights for policy 0, policy_version 12566 (0.0006)
[2023-08-17 13:22:52,654][140503] Updated weights for policy 0, policy_version 12576 (0.0006)
[2023-08-17 13:22:53,084][140404] Fps is (10 sec: 53247.8, 60 sec: 53589.3, 300 sec: 53456.3). Total num frames: 51531776. Throughput: 0: 13417.8. Samples: 9245728. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:22:53,085][140404] Avg episode reward: [(0, '40.301')]
[2023-08-17 13:22:53,400][140503] Updated weights for policy 0, policy_version 12586 (0.0006)
[2023-08-17 13:22:54,151][140503] Updated weights for policy 0, policy_version 12596 (0.0006)
[2023-08-17 13:22:54,926][140503] Updated weights for policy 0, policy_version 12606 (0.0006)
[2023-08-17 13:22:55,713][140503] Updated weights for policy 0, policy_version 12616 (0.0006)
[2023-08-17 13:22:56,480][140503] Updated weights for policy 0, policy_version 12626 (0.0007)
[2023-08-17 13:22:57,220][140503] Updated weights for policy 0, policy_version 12636 (0.0006)
[2023-08-17 13:22:57,980][140503] Updated weights for policy 0, policy_version 12646 (0.0006)
[2023-08-17 13:22:58,084][140404] Fps is (10 sec: 53657.2, 60 sec: 53725.9, 300 sec: 53484.0). Total num frames: 51802112. Throughput: 0: 13411.7. Samples: 9326908. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:22:58,085][140404] Avg episode reward: [(0, '40.348')]
[2023-08-17 13:22:58,733][140503] Updated weights for policy 0, policy_version 12656 (0.0006)
[2023-08-17 13:22:59,488][140503] Updated weights for policy 0, policy_version 12666 (0.0006)
[2023-08-17 13:23:00,230][140503] Updated weights for policy 0, policy_version 12676 (0.0006)
[2023-08-17 13:23:00,975][140503] Updated weights for policy 0, policy_version 12686 (0.0006)
[2023-08-17 13:23:01,701][140503] Updated weights for policy 0, policy_version 12696 (0.0005)
[2023-08-17 13:23:02,463][140503] Updated weights for policy 0, policy_version 12706 (0.0006)
[2023-08-17 13:23:03,084][140404] Fps is (10 sec: 54476.5, 60 sec: 53725.8, 300 sec: 53484.0). Total num frames: 52076544. Throughput: 0: 13427.6. Samples: 9408944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
[2023-08-17 13:23:03,085][140404] Avg episode reward: [(0, '42.237')]
[2023-08-17 13:23:03,089][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000012714_52076544.pth...
[2023-08-17 13:23:03,127][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000009578_39231488.pth
[2023-08-17 13:23:03,206][140503] Updated weights for policy 0, policy_version 12716 (0.0006)
[2023-08-17 13:23:04,022][140503] Updated weights for policy 0, policy_version 12726 (0.0007)
[2023-08-17 13:23:04,765][140503] Updated weights for policy 0, policy_version 12736 (0.0006)
[2023-08-17 13:23:05,530][140503] Updated weights for policy 0, policy_version 12746 (0.0006)
[2023-08-17 13:23:06,267][140503] Updated weights for policy 0, policy_version 12756 (0.0006)
[2023-08-17 13:23:07,019][140503] Updated weights for policy 0, policy_version 12766 (0.0006)
[2023-08-17 13:23:07,784][140503] Updated weights for policy 0, policy_version 12776 (0.0007)
[2023-08-17 13:23:08,084][140404] Fps is (10 sec: 54067.5, 60 sec: 53794.1, 300 sec: 53497.9). Total num frames: 52342784. Throughput: 0: 13420.5. Samples: 9448788. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
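The paired "Saving .../checkpoint_000012714_52076544.pth" and "Removing .../checkpoint_000009578_39231488.pth" entries above show periodic checkpoint rotation: the learner writes checkpoint_<policy_version>_<env_frames>.pth and prunes the oldest file so only the most recent few (plus the separately tracked best policy) remain on disk. A sketch of that rotation under those assumptions; the function and parameter names are hypothetical, not Sample Factory's API:

    from pathlib import Path
    import torch

    def save_and_rotate(model: torch.nn.Module, ckpt_dir: str,
                        policy_version: int, env_frames: int, keep: int = 2) -> None:
        # Write checkpoint_<version>_<frames>.pth, then delete the oldest
        # checkpoints so at most `keep` remain (matching the Saving/Removing pairs).
        out = Path(ckpt_dir)
        out.mkdir(parents=True, exist_ok=True)
        torch.save(model.state_dict(),
                   out / f"checkpoint_{policy_version:09d}_{env_frames}.pth")
        # Zero-padded versions make lexicographic order match version order.
        for stale in sorted(out.glob("checkpoint_*.pth"))[:-keep]:
            stale.unlink()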
[2023-08-17 13:23:08,085][140404] Avg episode reward: [(0, '44.021')]
[2023-08-17 13:23:08,562][140503] Updated weights for policy 0, policy_version 12786 (0.0007)
[2023-08-17 13:23:09,341][140503] Updated weights for policy 0, policy_version 12796 (0.0007)
[2023-08-17 13:23:10,121][140503] Updated weights for policy 0, policy_version 12806 (0.0007)
[2023-08-17 13:23:10,900][140503] Updated weights for policy 0, policy_version 12816 (0.0007)
[2023-08-17 13:23:11,687][140503] Updated weights for policy 0, policy_version 12826 (0.0007)
[2023-08-17 13:23:12,426][140503] Updated weights for policy 0, policy_version 12836 (0.0006)
[2023-08-17 13:23:13,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53657.6, 300 sec: 53484.0). Total num frames: 52609024. Throughput: 0: 13376.3. Samples: 9528408. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:23:13,085][140404] Avg episode reward: [(0, '40.441')]
[2023-08-17 13:23:13,229][140503] Updated weights for policy 0, policy_version 12846 (0.0007)
[2023-08-17 13:23:13,998][140503] Updated weights for policy 0, policy_version 12856 (0.0006)
[2023-08-17 13:23:14,763][140503] Updated weights for policy 0, policy_version 12866 (0.0006)
[2023-08-17 13:23:15,548][140503] Updated weights for policy 0, policy_version 12876 (0.0007)
[2023-08-17 13:23:16,292][140503] Updated weights for policy 0, policy_version 12886 (0.0006)
[2023-08-17 13:23:17,066][140503] Updated weights for policy 0, policy_version 12896 (0.0007)
[2023-08-17 13:23:17,875][140503] Updated weights for policy 0, policy_version 12906 (0.0007)
[2023-08-17 13:23:18,084][140404] Fps is (10 sec: 52838.3, 60 sec: 53521.1, 300 sec: 53470.1). Total num frames: 52871168. Throughput: 0: 13341.1. Samples: 9607888. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:23:18,085][140404] Avg episode reward: [(0, '41.486')]
[2023-08-17 13:23:18,626][140503] Updated weights for policy 0, policy_version 12916 (0.0006)
[2023-08-17 13:23:19,468][140503] Updated weights for policy 0, policy_version 12926 (0.0007)
[2023-08-17 13:23:20,237][140503] Updated weights for policy 0, policy_version 12936 (0.0007)
[2023-08-17 13:23:21,012][140503] Updated weights for policy 0, policy_version 12946 (0.0006)
[2023-08-17 13:23:21,800][140503] Updated weights for policy 0, policy_version 12956 (0.0006)
[2023-08-17 13:23:22,541][140503] Updated weights for policy 0, policy_version 12966 (0.0006)
[2023-08-17 13:23:23,084][140404] Fps is (10 sec: 52838.6, 60 sec: 53452.8, 300 sec: 53470.2). Total num frames: 53137408. Throughput: 0: 13324.4. Samples: 9646960. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-08-17 13:23:23,085][140404] Avg episode reward: [(0, '44.213')]
[2023-08-17 13:23:23,270][140503] Updated weights for policy 0, policy_version 12976 (0.0006)
[2023-08-17 13:23:24,045][140503] Updated weights for policy 0, policy_version 12986 (0.0006)
[2023-08-17 13:23:24,810][140503] Updated weights for policy 0, policy_version 12996 (0.0006)
[2023-08-17 13:23:25,589][140503] Updated weights for policy 0, policy_version 13006 (0.0006)
[2023-08-17 13:23:26,333][140503] Updated weights for policy 0, policy_version 13016 (0.0006)
[2023-08-17 13:23:27,083][140503] Updated weights for policy 0, policy_version 13026 (0.0006)
[2023-08-17 13:23:27,838][140503] Updated weights for policy 0, policy_version 13036 (0.0006)
[2023-08-17 13:23:28,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53384.5, 300 sec: 53470.1). Total num frames: 53407744. Throughput: 0: 13366.4. Samples: 9727956. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-08-17 13:23:28,085][140404] Avg episode reward: [(0, '41.781')]
[2023-08-17 13:23:28,572][140503] Updated weights for policy 0, policy_version 13046 (0.0006)
[2023-08-17 13:23:29,335][140503] Updated weights for policy 0, policy_version 13056 (0.0006)
[2023-08-17 13:23:30,094][140503] Updated weights for policy 0, policy_version 13066 (0.0006)
[2023-08-17 13:23:30,835][140503] Updated weights for policy 0, policy_version 13076 (0.0006)
[2023-08-17 13:23:31,571][140503] Updated weights for policy 0, policy_version 13086 (0.0006)
[2023-08-17 13:23:32,342][140503] Updated weights for policy 0, policy_version 13096 (0.0006)
[2023-08-17 13:23:33,076][140503] Updated weights for policy 0, policy_version 13106 (0.0006)
[2023-08-17 13:23:33,084][140404] Fps is (10 sec: 54476.3, 60 sec: 53521.0, 300 sec: 53497.9). Total num frames: 53682176. Throughput: 0: 13423.8. Samples: 9810068. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:23:33,085][140404] Avg episode reward: [(0, '39.926')]
[2023-08-17 13:23:33,850][140503] Updated weights for policy 0, policy_version 13116 (0.0007)
[2023-08-17 13:23:34,616][140503] Updated weights for policy 0, policy_version 13126 (0.0007)
[2023-08-17 13:23:35,394][140503] Updated weights for policy 0, policy_version 13136 (0.0007)
[2023-08-17 13:23:36,170][140503] Updated weights for policy 0, policy_version 13146 (0.0007)
[2023-08-17 13:23:36,959][140503] Updated weights for policy 0, policy_version 13156 (0.0007)
[2023-08-17 13:23:37,734][140503] Updated weights for policy 0, policy_version 13166 (0.0006)
[2023-08-17 13:23:38,084][140404] Fps is (10 sec: 53657.3, 60 sec: 53452.7, 300 sec: 53484.0). Total num frames: 53944320. Throughput: 0: 13420.5. Samples: 9849652. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:23:38,085][140404] Avg episode reward: [(0, '42.962')]
[2023-08-17 13:23:38,457][140503] Updated weights for policy 0, policy_version 13176 (0.0006)
[2023-08-17 13:23:39,231][140503] Updated weights for policy 0, policy_version 13186 (0.0006)
[2023-08-17 13:23:39,993][140503] Updated weights for policy 0, policy_version 13196 (0.0006)
[2023-08-17 13:23:40,750][140503] Updated weights for policy 0, policy_version 13206 (0.0006)
[2023-08-17 13:23:41,564][140503] Updated weights for policy 0, policy_version 13216 (0.0007)
[2023-08-17 13:23:42,327][140503] Updated weights for policy 0, policy_version 13226 (0.0007)
[2023-08-17 13:23:43,070][140503] Updated weights for policy 0, policy_version 13236 (0.0006)
[2023-08-17 13:23:43,084][140404] Fps is (10 sec: 53248.2, 60 sec: 53589.3, 300 sec: 53497.9). Total num frames: 54214656. Throughput: 0: 13393.3. Samples: 9929604. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:23:43,085][140404] Avg episode reward: [(0, '41.434')]
[2023-08-17 13:23:43,821][140503] Updated weights for policy 0, policy_version 13246 (0.0007)
[2023-08-17 13:23:44,599][140503] Updated weights for policy 0, policy_version 13256 (0.0007)
[2023-08-17 13:23:45,349][140503] Updated weights for policy 0, policy_version 13266 (0.0006)
[2023-08-17 13:23:46,110][140503] Updated weights for policy 0, policy_version 13276 (0.0006)
[2023-08-17 13:23:46,876][140503] Updated weights for policy 0, policy_version 13286 (0.0007)
[2023-08-17 13:23:47,710][140503] Updated weights for policy 0, policy_version 13296 (0.0007)
[2023-08-17 13:23:48,084][140404] Fps is (10 sec: 53248.4, 60 sec: 53521.0, 300 sec: 53497.9). Total num frames: 54476800. Throughput: 0: 13344.1. Samples: 10009428. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
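Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry above reports training throughput averaged over three trailing time windows, derived from the cumulative "Total num frames" counter. One way to produce those numbers is to keep timestamped samples of the frame counter and difference them per window; this is a sketch under that assumption, not the actual Sample Factory reporting code:

    import time
    from collections import deque

    class FpsTracker:
        def __init__(self, max_window: float = 300.0):
            self.samples = deque()  # (monotonic seconds, cumulative env frames)
            self.max_window = max_window

        def record(self, total_frames: int) -> None:
            now = time.monotonic()
            self.samples.append((now, total_frames))
            # Drop samples older than the largest reporting window (300 s).
            while now - self.samples[0][0] > self.max_window:
                self.samples.popleft()

        def fps(self, window: float) -> float:
            if len(self.samples) < 2:
                return float("nan")
            now, frames_now = self.samples[-1]
            # Oldest retained sample that still falls inside the window.
            t0, f0 = next(s for s in self.samples if now - s[0] <= window)
            return (frames_now - f0) / (now - t0) if now > t0 else float("nan")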
[2023-08-17 13:23:48,085][140404] Avg episode reward: [(0, '43.412')]
[2023-08-17 13:23:48,480][140503] Updated weights for policy 0, policy_version 13306 (0.0006)
[2023-08-17 13:23:49,234][140503] Updated weights for policy 0, policy_version 13316 (0.0006)
[2023-08-17 13:23:49,999][140503] Updated weights for policy 0, policy_version 13326 (0.0006)
[2023-08-17 13:23:50,749][140503] Updated weights for policy 0, policy_version 13336 (0.0006)
[2023-08-17 13:23:51,527][140503] Updated weights for policy 0, policy_version 13346 (0.0006)
[2023-08-17 13:23:52,326][140503] Updated weights for policy 0, policy_version 13356 (0.0007)
[2023-08-17 13:23:53,067][140503] Updated weights for policy 0, policy_version 13366 (0.0007)
[2023-08-17 13:23:53,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53589.3, 300 sec: 53511.8). Total num frames: 54747136. Throughput: 0: 13355.2. Samples: 10049772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:23:53,085][140404] Avg episode reward: [(0, '41.801')]
[2023-08-17 13:23:53,849][140503] Updated weights for policy 0, policy_version 13376 (0.0006)
[2023-08-17 13:23:54,596][140503] Updated weights for policy 0, policy_version 13386 (0.0007)
[2023-08-17 13:23:55,352][140503] Updated weights for policy 0, policy_version 13396 (0.0006)
[2023-08-17 13:23:56,107][140503] Updated weights for policy 0, policy_version 13406 (0.0007)
[2023-08-17 13:23:56,893][140503] Updated weights for policy 0, policy_version 13416 (0.0007)
[2023-08-17 13:23:57,682][140503] Updated weights for policy 0, policy_version 13426 (0.0007)
[2023-08-17 13:23:58,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53521.1, 300 sec: 53497.9). Total num frames: 55013376. Throughput: 0: 13364.7. Samples: 10129820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:23:58,085][140404] Avg episode reward: [(0, '41.750')]
[2023-08-17 13:23:58,484][140503] Updated weights for policy 0, policy_version 13436 (0.0007)
[2023-08-17 13:23:59,228][140503] Updated weights for policy 0, policy_version 13446 (0.0006)
[2023-08-17 13:23:59,951][140503] Updated weights for policy 0, policy_version 13456 (0.0006)
[2023-08-17 13:24:00,731][140503] Updated weights for policy 0, policy_version 13466 (0.0007)
[2023-08-17 13:24:01,529][140503] Updated weights for policy 0, policy_version 13476 (0.0007)
[2023-08-17 13:24:02,312][140503] Updated weights for policy 0, policy_version 13486 (0.0007)
[2023-08-17 13:24:03,070][140503] Updated weights for policy 0, policy_version 13496 (0.0006)
[2023-08-17 13:24:03,084][140404] Fps is (10 sec: 53248.2, 60 sec: 53384.6, 300 sec: 53484.1). Total num frames: 55279616. Throughput: 0: 13368.9. Samples: 10209488. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:24:03,085][140404] Avg episode reward: [(0, '42.958')]
[2023-08-17 13:24:03,848][140503] Updated weights for policy 0, policy_version 13506 (0.0006)
[2023-08-17 13:24:04,596][140503] Updated weights for policy 0, policy_version 13516 (0.0007)
[2023-08-17 13:24:05,428][140503] Updated weights for policy 0, policy_version 13526 (0.0007)
[2023-08-17 13:24:06,231][140503] Updated weights for policy 0, policy_version 13536 (0.0007)
[2023-08-17 13:24:06,970][140503] Updated weights for policy 0, policy_version 13546 (0.0007)
[2023-08-17 13:24:07,718][140503] Updated weights for policy 0, policy_version 13556 (0.0007)
[2023-08-17 13:24:08,084][140404] Fps is (10 sec: 52838.2, 60 sec: 53316.2, 300 sec: 53470.2). Total num frames: 55541760. Throughput: 0: 13368.3. Samples: 10248532. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:24:08,085][140404] Avg episode reward: [(0, '42.966')]
[2023-08-17 13:24:08,491][140503] Updated weights for policy 0, policy_version 13566 (0.0006)
[2023-08-17 13:24:09,229][140503] Updated weights for policy 0, policy_version 13576 (0.0006)
[2023-08-17 13:24:09,989][140503] Updated weights for policy 0, policy_version 13586 (0.0006)
[2023-08-17 13:24:10,758][140503] Updated weights for policy 0, policy_version 13596 (0.0006)
[2023-08-17 13:24:11,502][140503] Updated weights for policy 0, policy_version 13606 (0.0007)
[2023-08-17 13:24:12,247][140503] Updated weights for policy 0, policy_version 13616 (0.0006)
[2023-08-17 13:24:13,038][140503] Updated weights for policy 0, policy_version 13626 (0.0007)
[2023-08-17 13:24:13,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53384.5, 300 sec: 53484.1). Total num frames: 55812096. Throughput: 0: 13371.4. Samples: 10329668. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:24:13,085][140404] Avg episode reward: [(0, '46.636')]
[2023-08-17 13:24:13,088][140489] Saving new best policy, reward=46.636!
[2023-08-17 13:24:13,808][140503] Updated weights for policy 0, policy_version 13636 (0.0007)
[2023-08-17 13:24:14,583][140503] Updated weights for policy 0, policy_version 13646 (0.0006)
[2023-08-17 13:24:15,363][140503] Updated weights for policy 0, policy_version 13656 (0.0007)
[2023-08-17 13:24:16,145][140503] Updated weights for policy 0, policy_version 13666 (0.0007)
[2023-08-17 13:24:16,856][140503] Updated weights for policy 0, policy_version 13676 (0.0006)
[2023-08-17 13:24:17,625][140503] Updated weights for policy 0, policy_version 13686 (0.0006)
[2023-08-17 13:24:18,084][140404] Fps is (10 sec: 54067.5, 60 sec: 53521.1, 300 sec: 53497.9). Total num frames: 56082432. Throughput: 0: 13337.9. Samples: 10410272. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:24:18,085][140404] Avg episode reward: [(0, '41.911')]
[2023-08-17 13:24:18,387][140503] Updated weights for policy 0, policy_version 13696 (0.0006)
[2023-08-17 13:24:19,125][140503] Updated weights for policy 0, policy_version 13706 (0.0007)
[2023-08-17 13:24:19,892][140503] Updated weights for policy 0, policy_version 13716 (0.0006)
[2023-08-17 13:24:20,656][140503] Updated weights for policy 0, policy_version 13726 (0.0007)
[2023-08-17 13:24:21,431][140503] Updated weights for policy 0, policy_version 13736 (0.0007)
[2023-08-17 13:24:22,203][140503] Updated weights for policy 0, policy_version 13746 (0.0006)
[2023-08-17 13:24:22,891][140503] Updated weights for policy 0, policy_version 13756 (0.0005)
[2023-08-17 13:24:23,084][140404] Fps is (10 sec: 54067.3, 60 sec: 53589.3, 300 sec: 53511.8). Total num frames: 56352768. Throughput: 0: 13351.1. Samples: 10450452. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:24:23,085][140404] Avg episode reward: [(0, '39.813')]
[2023-08-17 13:24:23,670][140503] Updated weights for policy 0, policy_version 13766 (0.0006)
[2023-08-17 13:24:24,411][140503] Updated weights for policy 0, policy_version 13776 (0.0006)
[2023-08-17 13:24:25,195][140503] Updated weights for policy 0, policy_version 13786 (0.0006)
[2023-08-17 13:24:25,960][140503] Updated weights for policy 0, policy_version 13796 (0.0006)
[2023-08-17 13:24:26,703][140503] Updated weights for policy 0, policy_version 13806 (0.0006)
[2023-08-17 13:24:27,486][140503] Updated weights for policy 0, policy_version 13816 (0.0006)
[2023-08-17 13:24:28,084][140404] Fps is (10 sec: 53657.2, 60 sec: 53521.0, 300 sec: 53511.8). Total num frames: 56619008. Throughput: 0: 13379.4. Samples: 10531676. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:24:28,085][140404] Avg episode reward: [(0, '42.330')]
[2023-08-17 13:24:28,228][140503] Updated weights for policy 0, policy_version 13826 (0.0006)
[2023-08-17 13:24:28,993][140503] Updated weights for policy 0, policy_version 13836 (0.0007)
[2023-08-17 13:24:29,780][140503] Updated weights for policy 0, policy_version 13846 (0.0007)
[2023-08-17 13:24:30,525][140503] Updated weights for policy 0, policy_version 13856 (0.0006)
[2023-08-17 13:24:31,298][140503] Updated weights for policy 0, policy_version 13866 (0.0007)
[2023-08-17 13:24:32,058][140503] Updated weights for policy 0, policy_version 13876 (0.0006)
[2023-08-17 13:24:32,802][140503] Updated weights for policy 0, policy_version 13886 (0.0006)
[2023-08-17 13:24:33,084][140404] Fps is (10 sec: 53657.3, 60 sec: 53452.8, 300 sec: 53511.8). Total num frames: 56889344. Throughput: 0: 13401.2. Samples: 10612484. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-08-17 13:24:33,085][140404] Avg episode reward: [(0, '42.066')]
[2023-08-17 13:24:33,563][140503] Updated weights for policy 0, policy_version 13896 (0.0007)
[2023-08-17 13:24:34,307][140503] Updated weights for policy 0, policy_version 13906 (0.0006)
[2023-08-17 13:24:35,068][140503] Updated weights for policy 0, policy_version 13916 (0.0007)
[2023-08-17 13:24:35,839][140503] Updated weights for policy 0, policy_version 13926 (0.0007)
[2023-08-17 13:24:36,583][140503] Updated weights for policy 0, policy_version 13936 (0.0006)
[2023-08-17 13:24:37,338][140503] Updated weights for policy 0, policy_version 13946 (0.0006)
[2023-08-17 13:24:38,057][140503] Updated weights for policy 0, policy_version 13956 (0.0006)
[2023-08-17 13:24:38,084][140404] Fps is (10 sec: 54476.9, 60 sec: 53657.6, 300 sec: 53539.6). Total num frames: 57163776. Throughput: 0: 13402.7. Samples: 10652892. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-08-17 13:24:38,085][140404] Avg episode reward: [(0, '44.793')]
[2023-08-17 13:24:38,836][140503] Updated weights for policy 0, policy_version 13966 (0.0007)
[2023-08-17 13:24:39,587][140503] Updated weights for policy 0, policy_version 13976 (0.0006)
[2023-08-17 13:24:40,381][140503] Updated weights for policy 0, policy_version 13986 (0.0006)
[2023-08-17 13:24:41,141][140503] Updated weights for policy 0, policy_version 13996 (0.0006)
[2023-08-17 13:24:41,875][140503] Updated weights for policy 0, policy_version 14006 (0.0006)
[2023-08-17 13:24:42,636][140503] Updated weights for policy 0, policy_version 14016 (0.0006)
[2023-08-17 13:24:43,084][140404] Fps is (10 sec: 54067.6, 60 sec: 53589.4, 300 sec: 53525.7). Total num frames: 57430016. Throughput: 0: 13433.2. Samples: 10734312. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:24:43,085][140404] Avg episode reward: [(0, '44.290')]
[2023-08-17 13:24:43,443][140503] Updated weights for policy 0, policy_version 14026 (0.0007)
[2023-08-17 13:24:44,236][140503] Updated weights for policy 0, policy_version 14036 (0.0007)
[2023-08-17 13:24:45,001][140503] Updated weights for policy 0, policy_version 14046 (0.0007)
[2023-08-17 13:24:45,757][140503] Updated weights for policy 0, policy_version 14056 (0.0006)
[2023-08-17 13:24:46,527][140503] Updated weights for policy 0, policy_version 14066 (0.0006)
[2023-08-17 13:24:47,324][140503] Updated weights for policy 0, policy_version 14076 (0.0006)
[2023-08-17 13:24:48,037][140503] Updated weights for policy 0, policy_version 14086 (0.0006)
[2023-08-17 13:24:48,084][140404] Fps is (10 sec: 53248.1, 60 sec: 53657.6, 300 sec: 53511.8). Total num frames: 57696256. Throughput: 0: 13432.2. Samples: 10813936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:24:48,085][140404] Avg episode reward: [(0, '44.024')]
[2023-08-17 13:24:48,832][140503] Updated weights for policy 0, policy_version 14096 (0.0006)
[2023-08-17 13:24:49,602][140503] Updated weights for policy 0, policy_version 14106 (0.0007)
[2023-08-17 13:24:50,392][140503] Updated weights for policy 0, policy_version 14116 (0.0007)
[2023-08-17 13:24:51,152][140503] Updated weights for policy 0, policy_version 14126 (0.0006)
[2023-08-17 13:24:51,885][140503] Updated weights for policy 0, policy_version 14136 (0.0006)
[2023-08-17 13:24:52,657][140503] Updated weights for policy 0, policy_version 14146 (0.0006)
[2023-08-17 13:24:53,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53589.4, 300 sec: 53497.9). Total num frames: 57962496. Throughput: 0: 13438.9. Samples: 10853280. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:24:53,085][140404] Avg episode reward: [(0, '40.979')]
[2023-08-17 13:24:53,435][140503] Updated weights for policy 0, policy_version 14156 (0.0007)
[2023-08-17 13:24:54,189][140503] Updated weights for policy 0, policy_version 14166 (0.0006)
[2023-08-17 13:24:54,964][140503] Updated weights for policy 0, policy_version 14176 (0.0007)
[2023-08-17 13:24:55,742][140503] Updated weights for policy 0, policy_version 14186 (0.0006)
[2023-08-17 13:24:56,487][140503] Updated weights for policy 0, policy_version 14196 (0.0006)
[2023-08-17 13:24:57,227][140503] Updated weights for policy 0, policy_version 14206 (0.0006)
[2023-08-17 13:24:57,990][140503] Updated weights for policy 0, policy_version 14216 (0.0006)
[2023-08-17 13:24:58,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53657.6, 300 sec: 53511.8). Total num frames: 58232832. Throughput: 0: 13433.9. Samples: 10934192. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:24:58,085][140404] Avg episode reward: [(0, '40.433')]
[2023-08-17 13:24:58,772][140503] Updated weights for policy 0, policy_version 14226 (0.0007)
[2023-08-17 13:24:59,532][140503] Updated weights for policy 0, policy_version 14236 (0.0006)
[2023-08-17 13:25:00,242][140503] Updated weights for policy 0, policy_version 14246 (0.0006)
[2023-08-17 13:25:01,027][140503] Updated weights for policy 0, policy_version 14256 (0.0006)
[2023-08-17 13:25:01,769][140503] Updated weights for policy 0, policy_version 14266 (0.0006)
[2023-08-17 13:25:02,533][140503] Updated weights for policy 0, policy_version 14276 (0.0006)
[2023-08-17 13:25:03,084][140404] Fps is (10 sec: 54067.2, 60 sec: 53725.9, 300 sec: 53525.7). Total num frames: 58503168. Throughput: 0: 13443.7. Samples: 11015240. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
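The "Policy #0 lag" triple summarizes, per training batch, how many policy versions old the rollout data was when the learner consumed it (0 means collected with the current weights). Under that reading, the reported statistics reduce to the sketch below; the function and argument names are hypothetical:

    def policy_lag_stats(sample_versions: list[int], learner_version: int) -> tuple[float, float, float]:
        # Min/avg/max version lag of a batch relative to the learner's version.
        lags = [learner_version - v for v in sample_versions]
        return min(lags), sum(lags) / len(lags), max(lags)

    # e.g. a batch gathered with versions 14214-14216 while the learner is at
    # 14216 yields (0, 1.0, 2), matching lines like "(min: 0.0, avg: 1.0, max: 2.0)".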
[2023-08-17 13:25:03,085][140404] Avg episode reward: [(0, '41.848')]
[2023-08-17 13:25:03,089][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000014283_58503168.pth...
[2023-08-17 13:25:03,138][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000011145_45649920.pth
[2023-08-17 13:25:03,301][140503] Updated weights for policy 0, policy_version 14286 (0.0006)
[2023-08-17 13:25:04,102][140503] Updated weights for policy 0, policy_version 14296 (0.0007)
[2023-08-17 13:25:04,843][140503] Updated weights for policy 0, policy_version 14306 (0.0006)
[2023-08-17 13:25:05,623][140503] Updated weights for policy 0, policy_version 14316 (0.0006)
[2023-08-17 13:25:06,381][140503] Updated weights for policy 0, policy_version 14326 (0.0006)
[2023-08-17 13:25:07,163][140503] Updated weights for policy 0, policy_version 14336 (0.0007)
[2023-08-17 13:25:07,917][140503] Updated weights for policy 0, policy_version 14346 (0.0007)
[2023-08-17 13:25:08,084][140404] Fps is (10 sec: 53657.2, 60 sec: 53794.1, 300 sec: 53497.9). Total num frames: 58769408. Throughput: 0: 13438.8. Samples: 11055200. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:25:08,085][140404] Avg episode reward: [(0, '42.324')]
[2023-08-17 13:25:08,681][140503] Updated weights for policy 0, policy_version 14356 (0.0006)
[2023-08-17 13:25:09,431][140503] Updated weights for policy 0, policy_version 14366 (0.0006)
[2023-08-17 13:25:10,186][140503] Updated weights for policy 0, policy_version 14376 (0.0006)
[2023-08-17 13:25:10,959][140503] Updated weights for policy 0, policy_version 14386 (0.0007)
[2023-08-17 13:25:11,715][140503] Updated weights for policy 0, policy_version 14396 (0.0006)
[2023-08-17 13:25:12,540][140503] Updated weights for policy 0, policy_version 14406 (0.0007)
[2023-08-17 13:25:13,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53725.9, 300 sec: 53511.8). Total num frames: 59035648. Throughput: 0: 13418.1. Samples: 11135492. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-08-17 13:25:13,085][140404] Avg episode reward: [(0, '45.286')]
[2023-08-17 13:25:13,327][140503] Updated weights for policy 0, policy_version 14416 (0.0007)
[2023-08-17 13:25:14,087][140503] Updated weights for policy 0, policy_version 14426 (0.0006)
[2023-08-17 13:25:14,848][140503] Updated weights for policy 0, policy_version 14436 (0.0007)
[2023-08-17 13:25:15,625][140503] Updated weights for policy 0, policy_version 14446 (0.0006)
[2023-08-17 13:25:16,374][140503] Updated weights for policy 0, policy_version 14456 (0.0007)
[2023-08-17 13:25:17,131][140503] Updated weights for policy 0, policy_version 14466 (0.0006)
[2023-08-17 13:25:17,880][140503] Updated weights for policy 0, policy_version 14476 (0.0006)
[2023-08-17 13:25:18,084][140404] Fps is (10 sec: 53248.1, 60 sec: 53657.5, 300 sec: 53511.8). Total num frames: 59301888. Throughput: 0: 13395.6. Samples: 11215288. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0)
[2023-08-17 13:25:18,085][140404] Avg episode reward: [(0, '43.284')]
[2023-08-17 13:25:18,678][140503] Updated weights for policy 0, policy_version 14486 (0.0007)
[2023-08-17 13:25:19,539][140503] Updated weights for policy 0, policy_version 14496 (0.0007)
[2023-08-17 13:25:20,315][140503] Updated weights for policy 0, policy_version 14506 (0.0007)
[2023-08-17 13:25:21,100][140503] Updated weights for policy 0, policy_version 14516 (0.0007)
[2023-08-17 13:25:21,844][140503] Updated weights for policy 0, policy_version 14526 (0.0007)
[2023-08-17 13:25:22,610][140503] Updated weights for policy 0, policy_version 14536 (0.0006)
[2023-08-17 13:25:23,084][140404] Fps is (10 sec: 52428.6, 60 sec: 53452.8, 300 sec: 53470.1). Total num frames: 59559936. Throughput: 0: 13353.7. Samples: 11253808. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:25:23,085][140404] Avg episode reward: [(0, '43.227')]
[2023-08-17 13:25:23,399][140503] Updated weights for policy 0, policy_version 14546 (0.0006)
[2023-08-17 13:25:24,176][140503] Updated weights for policy 0, policy_version 14556 (0.0007)
[2023-08-17 13:25:24,956][140503] Updated weights for policy 0, policy_version 14566 (0.0007)
[2023-08-17 13:25:25,723][140503] Updated weights for policy 0, policy_version 14576 (0.0006)
[2023-08-17 13:25:26,463][140503] Updated weights for policy 0, policy_version 14586 (0.0006)
[2023-08-17 13:25:27,225][140503] Updated weights for policy 0, policy_version 14596 (0.0006)
[2023-08-17 13:25:27,992][140503] Updated weights for policy 0, policy_version 14606 (0.0007)
[2023-08-17 13:25:28,084][140404] Fps is (10 sec: 52838.6, 60 sec: 53521.1, 300 sec: 53484.0). Total num frames: 59830272. Throughput: 0: 13323.4. Samples: 11333864. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:25:28,085][140404] Avg episode reward: [(0, '41.029')]
[2023-08-17 13:25:28,778][140503] Updated weights for policy 0, policy_version 14616 (0.0007)
[2023-08-17 13:25:29,540][140503] Updated weights for policy 0, policy_version 14626 (0.0006)
[2023-08-17 13:25:30,306][140503] Updated weights for policy 0, policy_version 14636 (0.0007)
[2023-08-17 13:25:31,053][140503] Updated weights for policy 0, policy_version 14646 (0.0007)
[2023-08-17 13:25:31,825][140503] Updated weights for policy 0, policy_version 14656 (0.0006)
[2023-08-17 13:25:32,608][140503] Updated weights for policy 0, policy_version 14666 (0.0006)
[2023-08-17 13:25:33,084][140404] Fps is (10 sec: 53657.1, 60 sec: 53452.7, 300 sec: 53497.9). Total num frames: 60096512. Throughput: 0: 13326.1. Samples: 11413612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:25:33,085][140404] Avg episode reward: [(0, '44.783')]
[2023-08-17 13:25:33,366][140503] Updated weights for policy 0, policy_version 14676 (0.0007)
[2023-08-17 13:25:34,120][140503] Updated weights for policy 0, policy_version 14686 (0.0006)
[2023-08-17 13:25:34,881][140503] Updated weights for policy 0, policy_version 14696 (0.0007)
[2023-08-17 13:25:35,671][140503] Updated weights for policy 0, policy_version 14706 (0.0007)
[2023-08-17 13:25:36,480][140503] Updated weights for policy 0, policy_version 14716 (0.0007)
[2023-08-17 13:25:37,228][140503] Updated weights for policy 0, policy_version 14726 (0.0007)
[2023-08-17 13:25:37,980][140503] Updated weights for policy 0, policy_version 14736 (0.0006)
[2023-08-17 13:25:38,084][140404] Fps is (10 sec: 53248.1, 60 sec: 53316.3, 300 sec: 53497.9). Total num frames: 60362752. Throughput: 0: 13339.7. Samples: 11453568. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:25:38,085][140404] Avg episode reward: [(0, '42.759')]
[2023-08-17 13:25:38,739][140503] Updated weights for policy 0, policy_version 14746 (0.0006)
[2023-08-17 13:25:39,508][140503] Updated weights for policy 0, policy_version 14756 (0.0006)
[2023-08-17 13:25:40,243][140503] Updated weights for policy 0, policy_version 14766 (0.0007)
[2023-08-17 13:25:40,982][140503] Updated weights for policy 0, policy_version 14776 (0.0006)
[2023-08-17 13:25:41,776][140503] Updated weights for policy 0, policy_version 14786 (0.0006)
[2023-08-17 13:25:42,522][140503] Updated weights for policy 0, policy_version 14796 (0.0007)
[2023-08-17 13:25:43,084][140404] Fps is (10 sec: 53658.0, 60 sec: 53384.5, 300 sec: 53497.9). Total num frames: 60633088. Throughput: 0: 13341.9. Samples: 11534580. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:25:43,085][140404] Avg episode reward: [(0, '45.121')]
[2023-08-17 13:25:43,334][140503] Updated weights for policy 0, policy_version 14806 (0.0007)
[2023-08-17 13:25:44,086][140503] Updated weights for policy 0, policy_version 14816 (0.0006)
[2023-08-17 13:25:44,812][140503] Updated weights for policy 0, policy_version 14826 (0.0006)
[2023-08-17 13:25:45,586][140503] Updated weights for policy 0, policy_version 14836 (0.0007)
[2023-08-17 13:25:46,373][140503] Updated weights for policy 0, policy_version 14846 (0.0007)
[2023-08-17 13:25:47,126][140503] Updated weights for policy 0, policy_version 14856 (0.0007)
[2023-08-17 13:25:47,925][140503] Updated weights for policy 0, policy_version 14866 (0.0007)
[2023-08-17 13:25:48,084][140404] Fps is (10 sec: 53657.2, 60 sec: 53384.5, 300 sec: 53484.0). Total num frames: 60899328. Throughput: 0: 13313.6. Samples: 11614352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-08-17 13:25:48,085][140404] Avg episode reward: [(0, '45.498')]
[2023-08-17 13:25:48,685][140503] Updated weights for policy 0, policy_version 14876 (0.0007)
[2023-08-17 13:25:49,455][140503] Updated weights for policy 0, policy_version 14886 (0.0006)
[2023-08-17 13:25:50,208][140503] Updated weights for policy 0, policy_version 14896 (0.0006)
[2023-08-17 13:25:50,971][140503] Updated weights for policy 0, policy_version 14906 (0.0006)
[2023-08-17 13:25:51,730][140503] Updated weights for policy 0, policy_version 14916 (0.0006)
[2023-08-17 13:25:52,489][140503] Updated weights for policy 0, policy_version 14926 (0.0006)
[2023-08-17 13:25:53,084][140404] Fps is (10 sec: 53658.2, 60 sec: 53452.9, 300 sec: 53511.8). Total num frames: 61169664. Throughput: 0: 13323.8. Samples: 11654768. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2023-08-17 13:25:53,085][140404] Avg episode reward: [(0, '42.994')]
[2023-08-17 13:25:53,233][140503] Updated weights for policy 0, policy_version 14936 (0.0006)
[2023-08-17 13:25:54,021][140503] Updated weights for policy 0, policy_version 14946 (0.0007)
[2023-08-17 13:25:54,788][140503] Updated weights for policy 0, policy_version 14956 (0.0007)
[2023-08-17 13:25:55,531][140503] Updated weights for policy 0, policy_version 14966 (0.0006)
[2023-08-17 13:25:56,283][140503] Updated weights for policy 0, policy_version 14976 (0.0006)
[2023-08-17 13:25:57,060][140503] Updated weights for policy 0, policy_version 14986 (0.0006)
[2023-08-17 13:25:57,801][140503] Updated weights for policy 0, policy_version 14996 (0.0006)
[2023-08-17 13:25:58,084][140404] Fps is (10 sec: 53658.0, 60 sec: 53384.5, 300 sec: 53511.8). Total num frames: 61435904. Throughput: 0: 13334.6. Samples: 11735548. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:25:58,086][140404] Avg episode reward: [(0, '45.691')]
[2023-08-17 13:25:58,597][140503] Updated weights for policy 0, policy_version 15006 (0.0007)
[2023-08-17 13:25:59,348][140503] Updated weights for policy 0, policy_version 15016 (0.0006)
[2023-08-17 13:26:00,082][140503] Updated weights for policy 0, policy_version 15026 (0.0006)
[2023-08-17 13:26:00,815][140503] Updated weights for policy 0, policy_version 15036 (0.0006)
[2023-08-17 13:26:01,631][140503] Updated weights for policy 0, policy_version 15046 (0.0007)
[2023-08-17 13:26:02,385][140503] Updated weights for policy 0, policy_version 15056 (0.0006)
[2023-08-17 13:26:03,084][140404] Fps is (10 sec: 53657.4, 60 sec: 53384.6, 300 sec: 53525.7). Total num frames: 61706240. Throughput: 0: 13352.2. Samples: 11816136. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:26:03,085][140404] Avg episode reward: [(0, '45.715')]
[2023-08-17 13:26:03,151][140503] Updated weights for policy 0, policy_version 15066 (0.0006)
[2023-08-17 13:26:03,927][140503] Updated weights for policy 0, policy_version 15076 (0.0006)
[2023-08-17 13:26:04,670][140503] Updated weights for policy 0, policy_version 15086 (0.0006)
[2023-08-17 13:26:05,430][140503] Updated weights for policy 0, policy_version 15096 (0.0007)
[2023-08-17 13:26:06,257][140503] Updated weights for policy 0, policy_version 15106 (0.0007)
[2023-08-17 13:26:07,039][140503] Updated weights for policy 0, policy_version 15116 (0.0007)
[2023-08-17 13:26:07,805][140503] Updated weights for policy 0, policy_version 15126 (0.0006)
[2023-08-17 13:26:08,084][140404] Fps is (10 sec: 53248.0, 60 sec: 53316.3, 300 sec: 53511.8). Total num frames: 61968384. Throughput: 0: 13381.1. Samples: 11855956. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:26:08,085][140404] Avg episode reward: [(0, '44.336')]
[2023-08-17 13:26:08,570][140503] Updated weights for policy 0, policy_version 15136 (0.0007)
[2023-08-17 13:26:09,337][140503] Updated weights for policy 0, policy_version 15146 (0.0006)
[2023-08-17 13:26:10,089][140503] Updated weights for policy 0, policy_version 15156 (0.0007)
[2023-08-17 13:26:10,882][140503] Updated weights for policy 0, policy_version 15166 (0.0008)
[2023-08-17 13:26:11,687][140503] Updated weights for policy 0, policy_version 15176 (0.0007)
[2023-08-17 13:26:12,494][140503] Updated weights for policy 0, policy_version 15186 (0.0007)
[2023-08-17 13:26:13,084][140404] Fps is (10 sec: 52428.4, 60 sec: 53248.0, 300 sec: 53497.9). Total num frames: 62230528. Throughput: 0: 13351.0. Samples: 11934660. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:26:13,085][140404] Avg episode reward: [(0, '46.688')]
[2023-08-17 13:26:13,088][140489] Saving new best policy, reward=46.688!
[2023-08-17 13:26:13,282][140503] Updated weights for policy 0, policy_version 15196 (0.0007) [2023-08-17 13:26:14,072][140503] Updated weights for policy 0, policy_version 15206 (0.0007) [2023-08-17 13:26:14,845][140503] Updated weights for policy 0, policy_version 15216 (0.0007) [2023-08-17 13:26:15,643][140503] Updated weights for policy 0, policy_version 15226 (0.0007) [2023-08-17 13:26:16,411][140503] Updated weights for policy 0, policy_version 15236 (0.0007) [2023-08-17 13:26:17,191][140503] Updated weights for policy 0, policy_version 15246 (0.0007) [2023-08-17 13:26:17,946][140503] Updated weights for policy 0, policy_version 15256 (0.0006) [2023-08-17 13:26:18,084][140404] Fps is (10 sec: 52428.9, 60 sec: 53179.8, 300 sec: 53484.0). Total num frames: 62492672. Throughput: 0: 13325.0. Samples: 12013236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:26:18,085][140404] Avg episode reward: [(0, '41.636')] [2023-08-17 13:26:18,719][140503] Updated weights for policy 0, policy_version 15266 (0.0006) [2023-08-17 13:26:19,504][140503] Updated weights for policy 0, policy_version 15276 (0.0006) [2023-08-17 13:26:20,244][140503] Updated weights for policy 0, policy_version 15286 (0.0006) [2023-08-17 13:26:20,986][140503] Updated weights for policy 0, policy_version 15296 (0.0006) [2023-08-17 13:26:21,731][140503] Updated weights for policy 0, policy_version 15306 (0.0006) [2023-08-17 13:26:22,466][140503] Updated weights for policy 0, policy_version 15316 (0.0006) [2023-08-17 13:26:23,084][140404] Fps is (10 sec: 53658.2, 60 sec: 53452.9, 300 sec: 53525.7). Total num frames: 62767104. Throughput: 0: 13337.5. Samples: 12053756. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:26:23,085][140404] Avg episode reward: [(0, '43.977')] [2023-08-17 13:26:23,212][140503] Updated weights for policy 0, policy_version 15326 (0.0006) [2023-08-17 13:26:23,965][140503] Updated weights for policy 0, policy_version 15336 (0.0006) [2023-08-17 13:26:24,705][140503] Updated weights for policy 0, policy_version 15346 (0.0006) [2023-08-17 13:26:25,459][140503] Updated weights for policy 0, policy_version 15356 (0.0006) [2023-08-17 13:26:26,232][140503] Updated weights for policy 0, policy_version 15366 (0.0007) [2023-08-17 13:26:27,019][140503] Updated weights for policy 0, policy_version 15376 (0.0007) [2023-08-17 13:26:27,783][140503] Updated weights for policy 0, policy_version 15386 (0.0006) [2023-08-17 13:26:28,084][140404] Fps is (10 sec: 54067.5, 60 sec: 53384.6, 300 sec: 53525.7). Total num frames: 63033344. Throughput: 0: 13350.6. Samples: 12135356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:26:28,085][140404] Avg episode reward: [(0, '42.789')] [2023-08-17 13:26:28,553][140503] Updated weights for policy 0, policy_version 15396 (0.0006) [2023-08-17 13:26:29,320][140503] Updated weights for policy 0, policy_version 15406 (0.0006) [2023-08-17 13:26:30,068][140503] Updated weights for policy 0, policy_version 15416 (0.0006) [2023-08-17 13:26:30,831][140503] Updated weights for policy 0, policy_version 15426 (0.0006) [2023-08-17 13:26:31,636][140503] Updated weights for policy 0, policy_version 15436 (0.0007) [2023-08-17 13:26:32,434][140503] Updated weights for policy 0, policy_version 15446 (0.0006) [2023-08-17 13:26:33,084][140404] Fps is (10 sec: 53246.0, 60 sec: 53384.4, 300 sec: 53511.8). Total num frames: 63299584. Throughput: 0: 13347.6. Samples: 12214996. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:26:33,086][140404] Avg episode reward: [(0, '43.614')] [2023-08-17 13:26:33,198][140503] Updated weights for policy 0, policy_version 15456 (0.0007) [2023-08-17 13:26:33,993][140503] Updated weights for policy 0, policy_version 15466 (0.0007) [2023-08-17 13:26:34,713][140503] Updated weights for policy 0, policy_version 15476 (0.0006) [2023-08-17 13:26:35,449][140503] Updated weights for policy 0, policy_version 15486 (0.0006) [2023-08-17 13:26:36,178][140503] Updated weights for policy 0, policy_version 15496 (0.0006) [2023-08-17 13:26:36,959][140503] Updated weights for policy 0, policy_version 15506 (0.0006) [2023-08-17 13:26:37,731][140503] Updated weights for policy 0, policy_version 15516 (0.0007) [2023-08-17 13:26:38,084][140404] Fps is (10 sec: 53657.3, 60 sec: 53452.8, 300 sec: 53511.8). Total num frames: 63569920. Throughput: 0: 13350.4. Samples: 12255536. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2023-08-17 13:26:38,085][140404] Avg episode reward: [(0, '45.466')] [2023-08-17 13:26:38,489][140503] Updated weights for policy 0, policy_version 15526 (0.0006) [2023-08-17 13:26:39,265][140503] Updated weights for policy 0, policy_version 15536 (0.0007) [2023-08-17 13:26:40,044][140503] Updated weights for policy 0, policy_version 15546 (0.0006) [2023-08-17 13:26:40,799][140503] Updated weights for policy 0, policy_version 15556 (0.0006) [2023-08-17 13:26:41,606][140503] Updated weights for policy 0, policy_version 15566 (0.0007) [2023-08-17 13:26:42,382][140503] Updated weights for policy 0, policy_version 15576 (0.0007) [2023-08-17 13:26:43,084][140404] Fps is (10 sec: 53249.4, 60 sec: 53316.3, 300 sec: 53497.9). Total num frames: 63832064. Throughput: 0: 13329.5. Samples: 12335376. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:26:43,085][140404] Avg episode reward: [(0, '42.484')] [2023-08-17 13:26:43,164][140503] Updated weights for policy 0, policy_version 15586 (0.0007) [2023-08-17 13:26:43,960][140503] Updated weights for policy 0, policy_version 15596 (0.0006) [2023-08-17 13:26:44,717][140503] Updated weights for policy 0, policy_version 15606 (0.0007) [2023-08-17 13:26:45,480][140503] Updated weights for policy 0, policy_version 15616 (0.0007) [2023-08-17 13:26:46,230][140503] Updated weights for policy 0, policy_version 15626 (0.0006) [2023-08-17 13:26:47,018][140503] Updated weights for policy 0, policy_version 15636 (0.0006) [2023-08-17 13:26:47,813][140503] Updated weights for policy 0, policy_version 15646 (0.0007) [2023-08-17 13:26:48,084][140404] Fps is (10 sec: 52838.4, 60 sec: 53316.3, 300 sec: 53497.9). Total num frames: 64098304. Throughput: 0: 13297.0. Samples: 12414500. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:26:48,085][140404] Avg episode reward: [(0, '45.376')] [2023-08-17 13:26:48,599][140503] Updated weights for policy 0, policy_version 15656 (0.0007) [2023-08-17 13:26:49,348][140503] Updated weights for policy 0, policy_version 15666 (0.0006) [2023-08-17 13:26:50,111][140503] Updated weights for policy 0, policy_version 15676 (0.0006) [2023-08-17 13:26:50,897][140503] Updated weights for policy 0, policy_version 15686 (0.0007) [2023-08-17 13:26:51,652][140503] Updated weights for policy 0, policy_version 15696 (0.0006) [2023-08-17 13:26:52,442][140503] Updated weights for policy 0, policy_version 15706 (0.0007) [2023-08-17 13:26:53,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53247.9, 300 sec: 53511.8). Total num frames: 64364544. Throughput: 0: 13296.5. 
Samples: 12454300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:26:53,085][140404] Avg episode reward: [(0, '45.440')] [2023-08-17 13:26:53,184][140503] Updated weights for policy 0, policy_version 15716 (0.0006) [2023-08-17 13:26:53,957][140503] Updated weights for policy 0, policy_version 15726 (0.0007) [2023-08-17 13:26:54,750][140503] Updated weights for policy 0, policy_version 15736 (0.0006) [2023-08-17 13:26:55,512][140503] Updated weights for policy 0, policy_version 15746 (0.0006) [2023-08-17 13:26:56,287][140503] Updated weights for policy 0, policy_version 15756 (0.0007) [2023-08-17 13:26:57,063][140503] Updated weights for policy 0, policy_version 15766 (0.0007) [2023-08-17 13:26:57,819][140503] Updated weights for policy 0, policy_version 15776 (0.0007) [2023-08-17 13:26:58,084][140404] Fps is (10 sec: 53248.1, 60 sec: 53248.0, 300 sec: 53484.0). Total num frames: 64630784. Throughput: 0: 13322.4. Samples: 12534168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:26:58,085][140404] Avg episode reward: [(0, '43.784')] [2023-08-17 13:26:58,568][140503] Updated weights for policy 0, policy_version 15786 (0.0006) [2023-08-17 13:26:59,311][140503] Updated weights for policy 0, policy_version 15796 (0.0006) [2023-08-17 13:27:00,087][140503] Updated weights for policy 0, policy_version 15806 (0.0006) [2023-08-17 13:27:00,861][140503] Updated weights for policy 0, policy_version 15816 (0.0007) [2023-08-17 13:27:01,605][140503] Updated weights for policy 0, policy_version 15826 (0.0006) [2023-08-17 13:27:02,418][140503] Updated weights for policy 0, policy_version 15836 (0.0007) [2023-08-17 13:27:03,084][140404] Fps is (10 sec: 53247.8, 60 sec: 53179.6, 300 sec: 53497.9). Total num frames: 64897024. Throughput: 0: 13361.8. Samples: 12614520. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2023-08-17 13:27:03,085][140404] Avg episode reward: [(0, '45.411')] [2023-08-17 13:27:03,092][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000015845_64901120.pth... [2023-08-17 13:27:03,130][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000012714_52076544.pth [2023-08-17 13:27:03,202][140503] Updated weights for policy 0, policy_version 15846 (0.0007) [2023-08-17 13:27:03,955][140503] Updated weights for policy 0, policy_version 15856 (0.0006) [2023-08-17 13:27:04,730][140503] Updated weights for policy 0, policy_version 15866 (0.0006) [2023-08-17 13:27:05,492][140503] Updated weights for policy 0, policy_version 15876 (0.0007) [2023-08-17 13:27:06,266][140503] Updated weights for policy 0, policy_version 15886 (0.0006) [2023-08-17 13:27:07,008][140503] Updated weights for policy 0, policy_version 15896 (0.0006) [2023-08-17 13:27:07,763][140503] Updated weights for policy 0, policy_version 15906 (0.0006) [2023-08-17 13:27:08,084][140404] Fps is (10 sec: 53657.5, 60 sec: 53316.3, 300 sec: 53484.0). Total num frames: 65167360. Throughput: 0: 13346.0. Samples: 12654328. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2023-08-17 13:27:08,085][140404] Avg episode reward: [(0, '42.112')] [2023-08-17 13:27:08,502][140503] Updated weights for policy 0, policy_version 15916 (0.0006) [2023-08-17 13:27:09,270][140503] Updated weights for policy 0, policy_version 15926 (0.0006) [2023-08-17 13:27:10,034][140503] Updated weights for policy 0, policy_version 15936 (0.0007) [2023-08-17 13:27:10,803][140503] Updated weights for policy 0, policy_version 15946 (0.0006) [2023-08-17 13:27:11,578][140503] Updated weights for policy 0, policy_version 15956 (0.0007) [2023-08-17 13:27:12,328][140503] Updated weights for policy 0, policy_version 15966 (0.0006) [2023-08-17 13:27:13,075][140503] Updated weights for policy 0, policy_version 15976 (0.0006) [2023-08-17 13:27:13,084][140404] Fps is (10 sec: 54068.0, 60 sec: 53452.9, 300 sec: 53484.0). Total num frames: 65437696. Throughput: 0: 13328.5. Samples: 12735140. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:27:13,085][140404] Avg episode reward: [(0, '44.058')] [2023-08-17 13:27:13,899][140503] Updated weights for policy 0, policy_version 15986 (0.0006) [2023-08-17 13:27:14,661][140503] Updated weights for policy 0, policy_version 15996 (0.0006) [2023-08-17 13:27:15,443][140503] Updated weights for policy 0, policy_version 16006 (0.0006) [2023-08-17 13:27:16,185][140503] Updated weights for policy 0, policy_version 16016 (0.0006) [2023-08-17 13:27:16,938][140503] Updated weights for policy 0, policy_version 16026 (0.0006) [2023-08-17 13:27:17,693][140503] Updated weights for policy 0, policy_version 16036 (0.0007) [2023-08-17 13:27:18,084][140404] Fps is (10 sec: 53248.2, 60 sec: 53452.8, 300 sec: 53456.3). Total num frames: 65699840. Throughput: 0: 13341.2. Samples: 12815344. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:27:18,085][140404] Avg episode reward: [(0, '42.469')] [2023-08-17 13:27:18,449][140503] Updated weights for policy 0, policy_version 16046 (0.0006) [2023-08-17 13:27:19,238][140503] Updated weights for policy 0, policy_version 16056 (0.0006) [2023-08-17 13:27:20,008][140503] Updated weights for policy 0, policy_version 16066 (0.0006) [2023-08-17 13:27:20,772][140503] Updated weights for policy 0, policy_version 16076 (0.0006) [2023-08-17 13:27:21,521][140503] Updated weights for policy 0, policy_version 16086 (0.0006) [2023-08-17 13:27:22,259][140503] Updated weights for policy 0, policy_version 16096 (0.0006) [2023-08-17 13:27:23,006][140503] Updated weights for policy 0, policy_version 16106 (0.0007) [2023-08-17 13:27:23,084][140404] Fps is (10 sec: 53246.8, 60 sec: 53384.3, 300 sec: 53442.4). Total num frames: 65970176. Throughput: 0: 13335.2. Samples: 12855620. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:27:23,086][140404] Avg episode reward: [(0, '42.309')] [2023-08-17 13:27:23,794][140503] Updated weights for policy 0, policy_version 16116 (0.0006) [2023-08-17 13:27:24,546][140503] Updated weights for policy 0, policy_version 16126 (0.0006) [2023-08-17 13:27:25,331][140503] Updated weights for policy 0, policy_version 16136 (0.0007) [2023-08-17 13:27:26,086][140503] Updated weights for policy 0, policy_version 16146 (0.0006) [2023-08-17 13:27:26,867][140503] Updated weights for policy 0, policy_version 16156 (0.0007) [2023-08-17 13:27:27,648][140503] Updated weights for policy 0, policy_version 16166 (0.0007) [2023-08-17 13:27:28,084][140404] Fps is (10 sec: 53655.9, 60 sec: 53384.2, 300 sec: 53442.3). Total num frames: 66236416. Throughput: 0: 13349.1. 
Samples: 12936088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-08-17 13:27:28,086][140404] Avg episode reward: [(0, '42.601')] [2023-08-17 13:27:28,407][140503] Updated weights for policy 0, policy_version 16176 (0.0007) [2023-08-17 13:27:29,207][140503] Updated weights for policy 0, policy_version 16186 (0.0006) [2023-08-17 13:27:29,959][140503] Updated weights for policy 0, policy_version 16196 (0.0006) [2023-08-17 13:27:30,732][140503] Updated weights for policy 0, policy_version 16206 (0.0007) [2023-08-17 13:27:31,486][140503] Updated weights for policy 0, policy_version 16216 (0.0006) [2023-08-17 13:27:32,236][140503] Updated weights for policy 0, policy_version 16226 (0.0006) [2023-08-17 13:27:33,008][140503] Updated weights for policy 0, policy_version 16236 (0.0007) [2023-08-17 13:27:33,084][140404] Fps is (10 sec: 53248.8, 60 sec: 53384.8, 300 sec: 53442.4). Total num frames: 66502656. Throughput: 0: 13369.5. Samples: 13016128. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2023-08-17 13:27:33,085][140404] Avg episode reward: [(0, '44.372')] [2023-08-17 13:27:33,747][140503] Updated weights for policy 0, policy_version 16246 (0.0006) [2023-08-17 13:27:34,510][140503] Updated weights for policy 0, policy_version 16256 (0.0007) [2023-08-17 13:27:35,299][140503] Updated weights for policy 0, policy_version 16266 (0.0007) [2023-08-17 13:27:36,091][140503] Updated weights for policy 0, policy_version 16276 (0.0007) [2023-08-17 13:27:36,870][140503] Updated weights for policy 0, policy_version 16286 (0.0006) [2023-08-17 13:27:37,633][140503] Updated weights for policy 0, policy_version 16296 (0.0006) [2023-08-17 13:27:38,084][140404] Fps is (10 sec: 53249.3, 60 sec: 53316.2, 300 sec: 53456.3). Total num frames: 66768896. Throughput: 0: 13367.0. Samples: 13055816. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:27:38,085][140404] Avg episode reward: [(0, '44.442')] [2023-08-17 13:27:38,422][140503] Updated weights for policy 0, policy_version 16306 (0.0006) [2023-08-17 13:27:39,177][140503] Updated weights for policy 0, policy_version 16316 (0.0006) [2023-08-17 13:27:39,938][140503] Updated weights for policy 0, policy_version 16326 (0.0007) [2023-08-17 13:27:40,725][140503] Updated weights for policy 0, policy_version 16336 (0.0007) [2023-08-17 13:27:41,524][140503] Updated weights for policy 0, policy_version 16346 (0.0007) [2023-08-17 13:27:42,283][140503] Updated weights for policy 0, policy_version 16356 (0.0006) [2023-08-17 13:27:43,024][140503] Updated weights for policy 0, policy_version 16366 (0.0006) [2023-08-17 13:27:43,084][140404] Fps is (10 sec: 53248.2, 60 sec: 53384.6, 300 sec: 53456.3). Total num frames: 67035136. Throughput: 0: 13359.7. Samples: 13135356. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2023-08-17 13:27:43,085][140404] Avg episode reward: [(0, '45.600')] [2023-08-17 13:27:43,809][140503] Updated weights for policy 0, policy_version 16376 (0.0007) [2023-08-17 13:27:44,597][140503] Updated weights for policy 0, policy_version 16386 (0.0007) [2023-08-17 13:27:45,358][140503] Updated weights for policy 0, policy_version 16396 (0.0006) [2023-08-17 13:27:46,135][140503] Updated weights for policy 0, policy_version 16406 (0.0006) [2023-08-17 13:27:46,899][140503] Updated weights for policy 0, policy_version 16416 (0.0007) [2023-08-17 13:27:47,633][140503] Updated weights for policy 0, policy_version 16426 (0.0006) [2023-08-17 13:27:48,084][140404] Fps is (10 sec: 53658.1, 60 sec: 53452.9, 300 sec: 53470.2). Total num frames: 67305472. 
Throughput: 0: 13361.6. Samples: 13215788. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:27:48,085][140404] Avg episode reward: [(0, '42.370')] [2023-08-17 13:27:48,398][140503] Updated weights for policy 0, policy_version 16436 (0.0006) [2023-08-17 13:27:49,179][140503] Updated weights for policy 0, policy_version 16446 (0.0007) [2023-08-17 13:27:49,966][140503] Updated weights for policy 0, policy_version 16456 (0.0007) [2023-08-17 13:27:50,705][140503] Updated weights for policy 0, policy_version 16466 (0.0006) [2023-08-17 13:27:51,470][140503] Updated weights for policy 0, policy_version 16476 (0.0007) [2023-08-17 13:27:52,250][140503] Updated weights for policy 0, policy_version 16486 (0.0006) [2023-08-17 13:27:53,044][140503] Updated weights for policy 0, policy_version 16496 (0.0007) [2023-08-17 13:27:53,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53452.9, 300 sec: 53456.3). Total num frames: 67571712. Throughput: 0: 13355.9. Samples: 13255344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2023-08-17 13:27:53,085][140404] Avg episode reward: [(0, '45.125')] [2023-08-17 13:27:53,785][140503] Updated weights for policy 0, policy_version 16506 (0.0007) [2023-08-17 13:27:54,542][140503] Updated weights for policy 0, policy_version 16516 (0.0006) [2023-08-17 13:27:55,288][140503] Updated weights for policy 0, policy_version 16526 (0.0006) [2023-08-17 13:27:56,006][140503] Updated weights for policy 0, policy_version 16536 (0.0005) [2023-08-17 13:27:56,751][140503] Updated weights for policy 0, policy_version 16546 (0.0007) [2023-08-17 13:27:57,525][140503] Updated weights for policy 0, policy_version 16556 (0.0006) [2023-08-17 13:27:58,084][140404] Fps is (10 sec: 53657.0, 60 sec: 53521.0, 300 sec: 53442.4). Total num frames: 67842048. Throughput: 0: 13371.5. Samples: 13336860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:27:58,085][140404] Avg episode reward: [(0, '42.969')] [2023-08-17 13:27:58,316][140503] Updated weights for policy 0, policy_version 16566 (0.0007) [2023-08-17 13:27:59,094][140503] Updated weights for policy 0, policy_version 16576 (0.0007) [2023-08-17 13:27:59,875][140503] Updated weights for policy 0, policy_version 16586 (0.0007) [2023-08-17 13:28:00,630][140503] Updated weights for policy 0, policy_version 16596 (0.0007) [2023-08-17 13:28:01,432][140503] Updated weights for policy 0, policy_version 16606 (0.0006) [2023-08-17 13:28:02,214][140503] Updated weights for policy 0, policy_version 16616 (0.0007) [2023-08-17 13:28:02,987][140503] Updated weights for policy 0, policy_version 16626 (0.0007) [2023-08-17 13:28:03,084][140404] Fps is (10 sec: 53248.1, 60 sec: 53452.9, 300 sec: 53428.5). Total num frames: 68104192. Throughput: 0: 13338.1. Samples: 13415556. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2023-08-17 13:28:03,085][140404] Avg episode reward: [(0, '44.263')] [2023-08-17 13:28:03,771][140503] Updated weights for policy 0, policy_version 16636 (0.0006) [2023-08-17 13:28:04,518][140503] Updated weights for policy 0, policy_version 16646 (0.0006) [2023-08-17 13:28:05,329][140503] Updated weights for policy 0, policy_version 16656 (0.0007) [2023-08-17 13:28:06,084][140503] Updated weights for policy 0, policy_version 16666 (0.0007) [2023-08-17 13:28:06,859][140503] Updated weights for policy 0, policy_version 16676 (0.0007) [2023-08-17 13:28:07,625][140503] Updated weights for policy 0, policy_version 16686 (0.0006) [2023-08-17 13:28:08,084][140404] Fps is (10 sec: 52428.9, 60 sec: 53316.3, 300 sec: 53414.6). 
[2023-08-17 13:28:08,085][140404] Avg episode reward: [(0, '45.608')]
[2023-08-17 13:28:08,430][140503] Updated weights for policy 0, policy_version 16696 (0.0007)
[2023-08-17 13:28:09,195][140503] Updated weights for policy 0, policy_version 16706 (0.0006)
[2023-08-17 13:28:09,937][140503] Updated weights for policy 0, policy_version 16716 (0.0006)
[2023-08-17 13:28:10,687][140503] Updated weights for policy 0, policy_version 16726 (0.0006)
[2023-08-17 13:28:11,415][140503] Updated weights for policy 0, policy_version 16736 (0.0006)
[2023-08-17 13:28:12,184][140503] Updated weights for policy 0, policy_version 16746 (0.0007)
[2023-08-17 13:28:12,917][140503] Updated weights for policy 0, policy_version 16756 (0.0006)
[2023-08-17 13:28:13,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53384.5, 300 sec: 53456.3). Total num frames: 68640768. Throughput: 0: 13331.7. Samples: 13536012. Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0)
[2023-08-17 13:28:13,085][140404] Avg episode reward: [(0, '41.792')]
[2023-08-17 13:28:13,659][140503] Updated weights for policy 0, policy_version 16766 (0.0006)
[2023-08-17 13:28:14,416][140503] Updated weights for policy 0, policy_version 16776 (0.0006)
[2023-08-17 13:28:15,196][140503] Updated weights for policy 0, policy_version 16786 (0.0007)
[2023-08-17 13:28:15,977][140503] Updated weights for policy 0, policy_version 16796 (0.0007)
[2023-08-17 13:28:16,759][140503] Updated weights for policy 0, policy_version 16806 (0.0007)
[2023-08-17 13:28:17,499][140503] Updated weights for policy 0, policy_version 16816 (0.0007)
[2023-08-17 13:28:18,084][140404] Fps is (10 sec: 53658.1, 60 sec: 53384.6, 300 sec: 53442.4). Total num frames: 68902912. Throughput: 0: 13334.2. Samples: 13616164. Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0)
[2023-08-17 13:28:18,085][140404] Avg episode reward: [(0, '45.146')]
[2023-08-17 13:28:18,324][140503] Updated weights for policy 0, policy_version 16826 (0.0007)
[2023-08-17 13:28:19,086][140503] Updated weights for policy 0, policy_version 16836 (0.0007)
[2023-08-17 13:28:19,862][140503] Updated weights for policy 0, policy_version 16846 (0.0006)
[2023-08-17 13:28:20,620][140503] Updated weights for policy 0, policy_version 16856 (0.0006)
[2023-08-17 13:28:21,409][140503] Updated weights for policy 0, policy_version 16866 (0.0007)
[2023-08-17 13:28:22,192][140503] Updated weights for policy 0, policy_version 16876 (0.0007)
[2023-08-17 13:28:22,983][140503] Updated weights for policy 0, policy_version 16886 (0.0007)
[2023-08-17 13:28:23,084][140404] Fps is (10 sec: 52838.2, 60 sec: 53316.4, 300 sec: 53428.5). Total num frames: 69169152. Throughput: 0: 13342.9. Samples: 13656248. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-08-17 13:28:23,085][140404] Avg episode reward: [(0, '45.044')]
[2023-08-17 13:28:23,757][140503] Updated weights for policy 0, policy_version 16896 (0.0007)
[2023-08-17 13:28:24,524][140503] Updated weights for policy 0, policy_version 16906 (0.0006)
[2023-08-17 13:28:25,304][140503] Updated weights for policy 0, policy_version 16916 (0.0006)
[2023-08-17 13:28:26,060][140503] Updated weights for policy 0, policy_version 16926 (0.0006)
[2023-08-17 13:28:26,840][140503] Updated weights for policy 0, policy_version 16936 (0.0006)
[2023-08-17 13:28:27,620][140503] Updated weights for policy 0, policy_version 16946 (0.0006)
[2023-08-17 13:28:28,084][140404] Fps is (10 sec: 53248.0, 60 sec: 53316.6, 300 sec: 53400.8). Total num frames: 69435392. Throughput: 0: 13327.2. Samples: 13735080. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-08-17 13:28:28,085][140404] Avg episode reward: [(0, '46.548')]
[2023-08-17 13:28:28,398][140503] Updated weights for policy 0, policy_version 16956 (0.0006)
[2023-08-17 13:28:29,147][140503] Updated weights for policy 0, policy_version 16966 (0.0007)
[2023-08-17 13:28:29,904][140503] Updated weights for policy 0, policy_version 16976 (0.0007)
[2023-08-17 13:28:30,654][140503] Updated weights for policy 0, policy_version 16986 (0.0006)
[2023-08-17 13:28:31,414][140503] Updated weights for policy 0, policy_version 16996 (0.0006)
[2023-08-17 13:28:32,174][140503] Updated weights for policy 0, policy_version 17006 (0.0006)
[2023-08-17 13:28:32,947][140503] Updated weights for policy 0, policy_version 17016 (0.0007)
[2023-08-17 13:28:33,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53316.3, 300 sec: 53414.6). Total num frames: 69701632. Throughput: 0: 13325.7. Samples: 13815444. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-08-17 13:28:33,085][140404] Avg episode reward: [(0, '44.512')]
[2023-08-17 13:28:33,720][140503] Updated weights for policy 0, policy_version 17026 (0.0006)
[2023-08-17 13:28:34,468][140503] Updated weights for policy 0, policy_version 17036 (0.0007)
[2023-08-17 13:28:35,233][140503] Updated weights for policy 0, policy_version 17046 (0.0006)
[2023-08-17 13:28:35,988][140503] Updated weights for policy 0, policy_version 17056 (0.0006)
[2023-08-17 13:28:36,725][140503] Updated weights for policy 0, policy_version 17066 (0.0006)
[2023-08-17 13:28:37,498][140503] Updated weights for policy 0, policy_version 17076 (0.0006)
[2023-08-17 13:28:38,084][140404] Fps is (10 sec: 53657.5, 60 sec: 53384.6, 300 sec: 53414.6). Total num frames: 69971968. Throughput: 0: 13347.1. Samples: 13855964. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:28:38,085][140404] Avg episode reward: [(0, '43.361')]
[2023-08-17 13:28:38,287][140503] Updated weights for policy 0, policy_version 17086 (0.0006)
[2023-08-17 13:28:39,058][140503] Updated weights for policy 0, policy_version 17096 (0.0007)
[2023-08-17 13:28:39,800][140503] Updated weights for policy 0, policy_version 17106 (0.0006)
[2023-08-17 13:28:40,536][140503] Updated weights for policy 0, policy_version 17116 (0.0006)
[2023-08-17 13:28:41,292][140503] Updated weights for policy 0, policy_version 17126 (0.0007)
[2023-08-17 13:28:42,073][140503] Updated weights for policy 0, policy_version 17136 (0.0007)
[2023-08-17 13:28:42,824][140503] Updated weights for policy 0, policy_version 17146 (0.0006)
[2023-08-17 13:28:43,084][140404] Fps is (10 sec: 54067.4, 60 sec: 53452.8, 300 sec: 53442.4). Total num frames: 70242304. Throughput: 0: 13331.6. Samples: 13936780. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
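The three FPS figures reported every five seconds are averages over roughly 10-, 60-, and 300-second windows. Such windowed rates can be computed from (timestamp, total_frames) samples as in the sketch below; this illustrates the statistic, it is not Sample Factory's actual implementation.

```python
import time
from collections import deque

class WindowedFps:
    """Sliding-window frame-rate estimates from (timestamp, total_frames)
    samples, illustrating the 10/60/300-second figures in the log above."""

    def __init__(self, max_window=300.0):
        self.max_window = max_window
        self.samples = deque()  # (timestamp, total_frames), oldest first

    def record(self, total_frames, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, total_frames))
        # Drop samples older than the largest window we report.
        while self.samples and now - self.samples[0][0] > self.max_window:
            self.samples.popleft()

    def fps(self, window):
        if len(self.samples) < 2:
            return float("nan")
        t_now, f_now = self.samples[-1]
        # Oldest sample still inside the requested window.
        t_old, f_old = next((t, f) for t, f in self.samples if t_now - t <= window)
        return (f_now - f_old) / max(t_now - t_old, 1e-9)
```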
[2023-08-17 13:28:43,085][140404] Avg episode reward: [(0, '42.033')]
[2023-08-17 13:28:43,612][140503] Updated weights for policy 0, policy_version 17156 (0.0007)
[2023-08-17 13:28:44,405][140503] Updated weights for policy 0, policy_version 17166 (0.0007)
[2023-08-17 13:28:45,159][140503] Updated weights for policy 0, policy_version 17176 (0.0006)
[2023-08-17 13:28:45,890][140503] Updated weights for policy 0, policy_version 17186 (0.0007)
[2023-08-17 13:28:46,669][140503] Updated weights for policy 0, policy_version 17196 (0.0006)
[2023-08-17 13:28:47,420][140503] Updated weights for policy 0, policy_version 17206 (0.0006)
[2023-08-17 13:28:48,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53384.5, 300 sec: 53428.5). Total num frames: 70508544. Throughput: 0: 13363.5. Samples: 14016912. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:28:48,085][140404] Avg episode reward: [(0, '46.144')]
[2023-08-17 13:28:48,203][140503] Updated weights for policy 0, policy_version 17216 (0.0007)
[2023-08-17 13:28:48,988][140503] Updated weights for policy 0, policy_version 17226 (0.0006)
[2023-08-17 13:28:49,757][140503] Updated weights for policy 0, policy_version 17236 (0.0006)
[2023-08-17 13:28:50,539][140503] Updated weights for policy 0, policy_version 17246 (0.0006)
[2023-08-17 13:28:51,313][140503] Updated weights for policy 0, policy_version 17256 (0.0007)
[2023-08-17 13:28:52,103][140503] Updated weights for policy 0, policy_version 17266 (0.0007)
[2023-08-17 13:28:52,871][140503] Updated weights for policy 0, policy_version 17276 (0.0007)
[2023-08-17 13:28:53,084][140404] Fps is (10 sec: 52838.4, 60 sec: 53316.3, 300 sec: 53414.6). Total num frames: 70770688. Throughput: 0: 13366.9. Samples: 14056900. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:28:53,085][140404] Avg episode reward: [(0, '46.073')]
[2023-08-17 13:28:53,643][140503] Updated weights for policy 0, policy_version 17286 (0.0007)
[2023-08-17 13:28:54,416][140503] Updated weights for policy 0, policy_version 17296 (0.0006)
[2023-08-17 13:28:55,201][140503] Updated weights for policy 0, policy_version 17306 (0.0006)
[2023-08-17 13:28:55,971][140503] Updated weights for policy 0, policy_version 17316 (0.0006)
[2023-08-17 13:28:56,749][140503] Updated weights for policy 0, policy_version 17326 (0.0006)
[2023-08-17 13:28:57,501][140503] Updated weights for policy 0, policy_version 17336 (0.0006)
[2023-08-17 13:28:58,084][140404] Fps is (10 sec: 52838.1, 60 sec: 53248.0, 300 sec: 53414.6). Total num frames: 71036928. Throughput: 0: 13335.1. Samples: 14136092. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-08-17 13:28:58,085][140404] Avg episode reward: [(0, '44.895')]
[2023-08-17 13:28:58,241][140503] Updated weights for policy 0, policy_version 17346 (0.0006)
[2023-08-17 13:28:58,992][140503] Updated weights for policy 0, policy_version 17356 (0.0006)
[2023-08-17 13:28:59,724][140503] Updated weights for policy 0, policy_version 17366 (0.0006)
[2023-08-17 13:29:00,505][140503] Updated weights for policy 0, policy_version 17376 (0.0006)
[2023-08-17 13:29:01,279][140503] Updated weights for policy 0, policy_version 17386 (0.0006)
[2023-08-17 13:29:02,054][140503] Updated weights for policy 0, policy_version 17396 (0.0006)
[2023-08-17 13:29:02,807][140503] Updated weights for policy 0, policy_version 17406 (0.0006)
[2023-08-17 13:29:03,084][140404] Fps is (10 sec: 53656.7, 60 sec: 53384.4, 300 sec: 53442.4). Total num frames: 71307264. Throughput: 0: 13343.0. Samples: 14216600. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-08-17 13:29:03,085][140404] Avg episode reward: [(0, '44.663')]
[2023-08-17 13:29:03,089][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000017409_71307264.pth...
[2023-08-17 13:29:03,137][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000014283_58503168.pth
[2023-08-17 13:29:03,590][140503] Updated weights for policy 0, policy_version 17416 (0.0007)
[2023-08-17 13:29:04,389][140503] Updated weights for policy 0, policy_version 17426 (0.0007)
[2023-08-17 13:29:05,146][140503] Updated weights for policy 0, policy_version 17436 (0.0007)
[2023-08-17 13:29:05,943][140503] Updated weights for policy 0, policy_version 17446 (0.0006)
[2023-08-17 13:29:06,744][140503] Updated weights for policy 0, policy_version 17456 (0.0007)
[2023-08-17 13:29:07,510][140503] Updated weights for policy 0, policy_version 17466 (0.0007)
[2023-08-17 13:29:08,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53384.6, 300 sec: 53414.6). Total num frames: 71569408. Throughput: 0: 13328.0. Samples: 14256008. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
[2023-08-17 13:29:08,085][140404] Avg episode reward: [(0, '46.008')]
[2023-08-17 13:29:08,268][140503] Updated weights for policy 0, policy_version 17476 (0.0006)
[2023-08-17 13:29:08,992][140503] Updated weights for policy 0, policy_version 17486 (0.0006)
[2023-08-17 13:29:09,785][140503] Updated weights for policy 0, policy_version 17496 (0.0006)
[2023-08-17 13:29:10,559][140503] Updated weights for policy 0, policy_version 17506 (0.0006)
[2023-08-17 13:29:11,320][140503] Updated weights for policy 0, policy_version 17516 (0.0007)
[2023-08-17 13:29:12,099][140503] Updated weights for policy 0, policy_version 17526 (0.0006)
[2023-08-17 13:29:12,880][140503] Updated weights for policy 0, policy_version 17536 (0.0007)
[2023-08-17 13:29:13,084][140404] Fps is (10 sec: 53248.6, 60 sec: 53316.2, 300 sec: 53414.6). Total num frames: 71839744. Throughput: 0: 13355.1. Samples: 14336060. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:29:13,085][140404] Avg episode reward: [(0, '44.910')]
[2023-08-17 13:29:13,638][140503] Updated weights for policy 0, policy_version 17546 (0.0006)
[2023-08-17 13:29:14,371][140503] Updated weights for policy 0, policy_version 17556 (0.0006)
[2023-08-17 13:29:15,142][140503] Updated weights for policy 0, policy_version 17566 (0.0006)
[2023-08-17 13:29:15,914][140503] Updated weights for policy 0, policy_version 17576 (0.0006)
[2023-08-17 13:29:16,674][140503] Updated weights for policy 0, policy_version 17586 (0.0006)
[2023-08-17 13:29:17,453][140503] Updated weights for policy 0, policy_version 17596 (0.0006)
[2023-08-17 13:29:18,084][140404] Fps is (10 sec: 53657.4, 60 sec: 53384.5, 300 sec: 53400.7). Total num frames: 72105984. Throughput: 0: 13353.5. Samples: 14416352. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:29:18,085][140404] Avg episode reward: [(0, '47.723')]
[2023-08-17 13:29:18,087][140489] Saving new best policy, reward=47.723!
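The "Saving new best policy, reward=47.723!" record above shows the learner keeping a separate snapshot whenever the running average episode reward exceeds the best seen so far. A minimal sketch of that pattern follows; the class name, file path, and saved keys are illustrative assumptions, not Sample Factory's code.

```python
import torch

class BestPolicySaver:
    """Keep a separate 'best' snapshot, mirroring the
    'Saving new best policy, reward=...' records above (illustrative sketch)."""

    def __init__(self, path="best_policy.pth"):  # path is an assumption
        self.best_reward = float("-inf")
        self.path = path

    def maybe_save(self, model, avg_episode_reward):
        # Only overwrite the snapshot on a strict improvement.
        if avg_episode_reward <= self.best_reward:
            return False
        self.best_reward = avg_episode_reward
        torch.save({"model": model.state_dict(),
                    "avg_episode_reward": avg_episode_reward}, self.path)
        return True
```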
[2023-08-17 13:29:18,221][140503] Updated weights for policy 0, policy_version 17606 (0.0006)
[2023-08-17 13:29:18,981][140503] Updated weights for policy 0, policy_version 17616 (0.0006)
[2023-08-17 13:29:19,747][140503] Updated weights for policy 0, policy_version 17626 (0.0006)
[2023-08-17 13:29:20,519][140503] Updated weights for policy 0, policy_version 17636 (0.0006)
[2023-08-17 13:29:21,239][140503] Updated weights for policy 0, policy_version 17646 (0.0006)
[2023-08-17 13:29:22,014][140503] Updated weights for policy 0, policy_version 17656 (0.0006)
[2023-08-17 13:29:22,777][140503] Updated weights for policy 0, policy_version 17666 (0.0006)
[2023-08-17 13:29:23,084][140404] Fps is (10 sec: 53248.0, 60 sec: 53384.5, 300 sec: 53400.7). Total num frames: 72372224. Throughput: 0: 13346.2. Samples: 14456544. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:29:23,085][140404] Avg episode reward: [(0, '46.623')]
[2023-08-17 13:29:23,559][140503] Updated weights for policy 0, policy_version 17676 (0.0007)
[2023-08-17 13:29:24,306][140503] Updated weights for policy 0, policy_version 17686 (0.0006)
[2023-08-17 13:29:25,117][140503] Updated weights for policy 0, policy_version 17696 (0.0007)
[2023-08-17 13:29:25,888][140503] Updated weights for policy 0, policy_version 17706 (0.0007)
[2023-08-17 13:29:26,681][140503] Updated weights for policy 0, policy_version 17716 (0.0007)
[2023-08-17 13:29:27,457][140503] Updated weights for policy 0, policy_version 17726 (0.0007)
[2023-08-17 13:29:28,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53384.5, 300 sec: 53386.9). Total num frames: 72638464. Throughput: 0: 13316.4. Samples: 14536020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:29:28,085][140404] Avg episode reward: [(0, '47.322')]
[2023-08-17 13:29:28,219][140503] Updated weights for policy 0, policy_version 17736 (0.0006)
[2023-08-17 13:29:28,992][140503] Updated weights for policy 0, policy_version 17746 (0.0007)
[2023-08-17 13:29:29,774][140503] Updated weights for policy 0, policy_version 17756 (0.0006)
[2023-08-17 13:29:30,512][140503] Updated weights for policy 0, policy_version 17766 (0.0007)
[2023-08-17 13:29:31,265][140503] Updated weights for policy 0, policy_version 17776 (0.0007)
[2023-08-17 13:29:32,014][140503] Updated weights for policy 0, policy_version 17786 (0.0006)
[2023-08-17 13:29:32,791][140503] Updated weights for policy 0, policy_version 17796 (0.0007)
[2023-08-17 13:29:33,084][140404] Fps is (10 sec: 53247.2, 60 sec: 53384.4, 300 sec: 53359.1). Total num frames: 72904704. Throughput: 0: 13319.2. Samples: 14616280. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:29:33,085][140404] Avg episode reward: [(0, '47.453')]
[2023-08-17 13:29:33,567][140503] Updated weights for policy 0, policy_version 17806 (0.0006)
[2023-08-17 13:29:34,352][140503] Updated weights for policy 0, policy_version 17816 (0.0007)
[2023-08-17 13:29:35,110][140503] Updated weights for policy 0, policy_version 17826 (0.0006)
[2023-08-17 13:29:35,908][140503] Updated weights for policy 0, policy_version 17836 (0.0006)
[2023-08-17 13:29:36,671][140503] Updated weights for policy 0, policy_version 17846 (0.0006)
[2023-08-17 13:29:37,466][140503] Updated weights for policy 0, policy_version 17856 (0.0007)
[2023-08-17 13:29:38,084][140404] Fps is (10 sec: 53247.8, 60 sec: 53316.2, 300 sec: 53359.1). Total num frames: 73170944. Throughput: 0: 13315.1. Samples: 14656080. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:29:38,085][140404] Avg episode reward: [(0, '45.155')]
[2023-08-17 13:29:38,215][140503] Updated weights for policy 0, policy_version 17866 (0.0006)
[2023-08-17 13:29:38,980][140503] Updated weights for policy 0, policy_version 17876 (0.0007)
[2023-08-17 13:29:39,741][140503] Updated weights for policy 0, policy_version 17886 (0.0006)
[2023-08-17 13:29:40,508][140503] Updated weights for policy 0, policy_version 17896 (0.0006)
[2023-08-17 13:29:41,263][140503] Updated weights for policy 0, policy_version 17906 (0.0006)
[2023-08-17 13:29:42,031][140503] Updated weights for policy 0, policy_version 17916 (0.0007)
[2023-08-17 13:29:42,841][140503] Updated weights for policy 0, policy_version 17926 (0.0006)
[2023-08-17 13:29:43,084][140404] Fps is (10 sec: 53248.7, 60 sec: 53248.0, 300 sec: 53359.1). Total num frames: 73437184. Throughput: 0: 13331.1. Samples: 14735992. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:29:43,085][140404] Avg episode reward: [(0, '44.345')]
[2023-08-17 13:29:43,613][140503] Updated weights for policy 0, policy_version 17936 (0.0007)
[2023-08-17 13:29:44,365][140503] Updated weights for policy 0, policy_version 17946 (0.0006)
[2023-08-17 13:29:45,081][140503] Updated weights for policy 0, policy_version 17956 (0.0006)
[2023-08-17 13:29:45,856][140503] Updated weights for policy 0, policy_version 17966 (0.0007)
[2023-08-17 13:29:46,661][140503] Updated weights for policy 0, policy_version 17976 (0.0006)
[2023-08-17 13:29:47,438][140503] Updated weights for policy 0, policy_version 17986 (0.0007)
[2023-08-17 13:29:48,084][140404] Fps is (10 sec: 53248.0, 60 sec: 53248.0, 300 sec: 53359.1). Total num frames: 73703424. Throughput: 0: 13318.4. Samples: 14815928. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2023-08-17 13:29:48,085][140404] Avg episode reward: [(0, '46.281')]
[2023-08-17 13:29:48,217][140503] Updated weights for policy 0, policy_version 17996 (0.0007)
[2023-08-17 13:29:48,970][140503] Updated weights for policy 0, policy_version 18006 (0.0006)
[2023-08-17 13:29:49,725][140503] Updated weights for policy 0, policy_version 18016 (0.0007)
[2023-08-17 13:29:50,517][140503] Updated weights for policy 0, policy_version 18026 (0.0007)
[2023-08-17 13:29:51,311][140503] Updated weights for policy 0, policy_version 18036 (0.0007)
[2023-08-17 13:29:52,085][140503] Updated weights for policy 0, policy_version 18046 (0.0007)
[2023-08-17 13:29:52,846][140503] Updated weights for policy 0, policy_version 18056 (0.0006)
[2023-08-17 13:29:53,084][140404] Fps is (10 sec: 53248.2, 60 sec: 53316.3, 300 sec: 53345.2). Total num frames: 73969664. Throughput: 0: 13320.4. Samples: 14855424. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2023-08-17 13:29:53,085][140404] Avg episode reward: [(0, '44.734')]
[2023-08-17 13:29:53,609][140503] Updated weights for policy 0, policy_version 18066 (0.0006)
[2023-08-17 13:29:54,376][140503] Updated weights for policy 0, policy_version 18076 (0.0006)
[2023-08-17 13:29:55,133][140503] Updated weights for policy 0, policy_version 18086 (0.0007)
[2023-08-17 13:29:55,889][140503] Updated weights for policy 0, policy_version 18096 (0.0006)
[2023-08-17 13:29:56,639][140503] Updated weights for policy 0, policy_version 18106 (0.0006)
[2023-08-17 13:29:57,418][140503] Updated weights for policy 0, policy_version 18116 (0.0007)
[2023-08-17 13:29:58,084][140404] Fps is (10 sec: 53248.0, 60 sec: 53316.3, 300 sec: 53331.3). Total num frames: 74235904. Throughput: 0: 13328.4. Samples: 14935836. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
[2023-08-17 13:29:58,085][140404] Avg episode reward: [(0, '45.536')]
[2023-08-17 13:29:58,195][140503] Updated weights for policy 0, policy_version 18126 (0.0006)
[2023-08-17 13:29:58,961][140503] Updated weights for policy 0, policy_version 18136 (0.0006)
[2023-08-17 13:29:59,715][140503] Updated weights for policy 0, policy_version 18146 (0.0006)
[2023-08-17 13:30:00,472][140503] Updated weights for policy 0, policy_version 18156 (0.0006)
[2023-08-17 13:30:01,234][140503] Updated weights for policy 0, policy_version 18166 (0.0006)
[2023-08-17 13:30:01,984][140503] Updated weights for policy 0, policy_version 18176 (0.0006)
[2023-08-17 13:30:02,727][140503] Updated weights for policy 0, policy_version 18186 (0.0006)
[2023-08-17 13:30:03,084][140404] Fps is (10 sec: 53657.3, 60 sec: 53316.4, 300 sec: 53345.2). Total num frames: 74506240. Throughput: 0: 13338.1. Samples: 15016568. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:30:03,085][140404] Avg episode reward: [(0, '45.374')]
[2023-08-17 13:30:03,524][140503] Updated weights for policy 0, policy_version 18196 (0.0007)
[2023-08-17 13:30:04,303][140503] Updated weights for policy 0, policy_version 18206 (0.0006)
[2023-08-17 13:30:05,098][140503] Updated weights for policy 0, policy_version 18216 (0.0007)
[2023-08-17 13:30:05,876][140503] Updated weights for policy 0, policy_version 18226 (0.0007)
[2023-08-17 13:30:06,621][140503] Updated weights for policy 0, policy_version 18236 (0.0007)
[2023-08-17 13:30:07,389][140503] Updated weights for policy 0, policy_version 18246 (0.0007)
[2023-08-17 13:30:08,084][140404] Fps is (10 sec: 53247.1, 60 sec: 53316.1, 300 sec: 53331.3). Total num frames: 74768384. Throughput: 0: 13319.7. Samples: 15055932. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:30:08,085][140404] Avg episode reward: [(0, '42.557')]
[2023-08-17 13:30:08,170][140503] Updated weights for policy 0, policy_version 18256 (0.0006)
[2023-08-17 13:30:08,959][140503] Updated weights for policy 0, policy_version 18266 (0.0007)
[2023-08-17 13:30:09,719][140503] Updated weights for policy 0, policy_version 18276 (0.0007)
[2023-08-17 13:30:10,500][140503] Updated weights for policy 0, policy_version 18286 (0.0007)
[2023-08-17 13:30:11,278][140503] Updated weights for policy 0, policy_version 18296 (0.0006)
[2023-08-17 13:30:12,051][140503] Updated weights for policy 0, policy_version 18306 (0.0006)
[2023-08-17 13:30:12,864][140503] Updated weights for policy 0, policy_version 18316 (0.0007)
[2023-08-17 13:30:13,084][140404] Fps is (10 sec: 52838.5, 60 sec: 53248.0, 300 sec: 53331.3). Total num frames: 75034624. Throughput: 0: 13313.8. Samples: 15135140. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:30:13,085][140404] Avg episode reward: [(0, '46.455')]
[2023-08-17 13:30:13,609][140503] Updated weights for policy 0, policy_version 18326 (0.0007)
[2023-08-17 13:30:14,379][140503] Updated weights for policy 0, policy_version 18336 (0.0006)
[2023-08-17 13:30:15,138][140503] Updated weights for policy 0, policy_version 18346 (0.0007)
[2023-08-17 13:30:15,941][140503] Updated weights for policy 0, policy_version 18356 (0.0006)
[2023-08-17 13:30:16,706][140503] Updated weights for policy 0, policy_version 18366 (0.0006)
[2023-08-17 13:30:17,484][140503] Updated weights for policy 0, policy_version 18376 (0.0007)
[2023-08-17 13:30:18,084][140404] Fps is (10 sec: 52837.8, 60 sec: 53179.5, 300 sec: 53345.2). Total num frames: 75296768. Throughput: 0: 13294.2. Samples: 15214520. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:30:18,085][140404] Avg episode reward: [(0, '43.602')]
[2023-08-17 13:30:18,261][140503] Updated weights for policy 0, policy_version 18386 (0.0007)
[2023-08-17 13:30:19,030][140503] Updated weights for policy 0, policy_version 18396 (0.0006)
[2023-08-17 13:30:19,775][140503] Updated weights for policy 0, policy_version 18406 (0.0006)
[2023-08-17 13:30:20,541][140503] Updated weights for policy 0, policy_version 18416 (0.0006)
[2023-08-17 13:30:21,323][140503] Updated weights for policy 0, policy_version 18426 (0.0006)
[2023-08-17 13:30:22,078][140503] Updated weights for policy 0, policy_version 18436 (0.0006)
[2023-08-17 13:30:22,904][140503] Updated weights for policy 0, policy_version 18446 (0.0007)
[2023-08-17 13:30:23,084][140404] Fps is (10 sec: 52838.1, 60 sec: 53179.7, 300 sec: 53331.3). Total num frames: 75563008. Throughput: 0: 13298.7. Samples: 15254524. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:30:23,085][140404] Avg episode reward: [(0, '44.183')]
[2023-08-17 13:30:23,647][140503] Updated weights for policy 0, policy_version 18456 (0.0007)
[2023-08-17 13:30:24,422][140503] Updated weights for policy 0, policy_version 18466 (0.0007)
[2023-08-17 13:30:25,208][140503] Updated weights for policy 0, policy_version 18476 (0.0007)
[2023-08-17 13:30:25,998][140503] Updated weights for policy 0, policy_version 18486 (0.0007)
[2023-08-17 13:30:26,766][140503] Updated weights for policy 0, policy_version 18496 (0.0007)
[2023-08-17 13:30:27,558][140503] Updated weights for policy 0, policy_version 18506 (0.0007)
[2023-08-17 13:30:28,084][140404] Fps is (10 sec: 52839.7, 60 sec: 53111.4, 300 sec: 53317.4). Total num frames: 75825152. Throughput: 0: 13274.2. Samples: 15333332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:30:28,085][140404] Avg episode reward: [(0, '47.887')]
[2023-08-17 13:30:28,098][140489] Saving new best policy, reward=47.887!
[2023-08-17 13:30:28,300][140503] Updated weights for policy 0, policy_version 18516 (0.0006)
[2023-08-17 13:30:29,078][140503] Updated weights for policy 0, policy_version 18526 (0.0006)
[2023-08-17 13:30:29,837][140503] Updated weights for policy 0, policy_version 18536 (0.0006)
[2023-08-17 13:30:30,613][140503] Updated weights for policy 0, policy_version 18546 (0.0006)
[2023-08-17 13:30:31,365][140503] Updated weights for policy 0, policy_version 18556 (0.0007)
[2023-08-17 13:30:32,141][140503] Updated weights for policy 0, policy_version 18566 (0.0007)
[2023-08-17 13:30:32,924][140503] Updated weights for policy 0, policy_version 18576 (0.0007)
[2023-08-17 13:30:33,084][140404] Fps is (10 sec: 52838.4, 60 sec: 53111.5, 300 sec: 53317.4). Total num frames: 76091392. Throughput: 0: 13273.8. Samples: 15413252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:30:33,085][140404] Avg episode reward: [(0, '45.970')]
[2023-08-17 13:30:33,688][140503] Updated weights for policy 0, policy_version 18586 (0.0006)
[2023-08-17 13:30:34,452][140503] Updated weights for policy 0, policy_version 18596 (0.0007)
[2023-08-17 13:30:35,234][140503] Updated weights for policy 0, policy_version 18606 (0.0007)
[2023-08-17 13:30:36,014][140503] Updated weights for policy 0, policy_version 18616 (0.0007)
[2023-08-17 13:30:36,764][140503] Updated weights for policy 0, policy_version 18626 (0.0006)
[2023-08-17 13:30:37,537][140503] Updated weights for policy 0, policy_version 18636 (0.0007)
[2023-08-17 13:30:38,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53179.7, 300 sec: 53317.4). Total num frames: 76361728. Throughput: 0: 13286.0. Samples: 15453292. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:30:38,085][140404] Avg episode reward: [(0, '43.615')]
[2023-08-17 13:30:38,295][140503] Updated weights for policy 0, policy_version 18646 (0.0006)
[2023-08-17 13:30:39,070][140503] Updated weights for policy 0, policy_version 18656 (0.0006)
[2023-08-17 13:30:39,849][140503] Updated weights for policy 0, policy_version 18666 (0.0006)
[2023-08-17 13:30:40,638][140503] Updated weights for policy 0, policy_version 18676 (0.0007)
[2023-08-17 13:30:41,392][140503] Updated weights for policy 0, policy_version 18686 (0.0006)
[2023-08-17 13:30:42,137][140503] Updated weights for policy 0, policy_version 18696 (0.0007)
[2023-08-17 13:30:42,919][140503] Updated weights for policy 0, policy_version 18706 (0.0006)
[2023-08-17 13:30:43,084][140404] Fps is (10 sec: 53657.9, 60 sec: 53179.7, 300 sec: 53317.4). Total num frames: 76627968. Throughput: 0: 13274.1. Samples: 15533172. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:30:43,085][140404] Avg episode reward: [(0, '45.831')]
[2023-08-17 13:30:43,640][140503] Updated weights for policy 0, policy_version 18716 (0.0006)
[2023-08-17 13:30:44,476][140503] Updated weights for policy 0, policy_version 18726 (0.0006)
[2023-08-17 13:30:45,216][140503] Updated weights for policy 0, policy_version 18736 (0.0007)
[2023-08-17 13:30:46,011][140503] Updated weights for policy 0, policy_version 18746 (0.0006)
[2023-08-17 13:30:46,794][140503] Updated weights for policy 0, policy_version 18756 (0.0006)
[2023-08-17 13:30:47,549][140503] Updated weights for policy 0, policy_version 18766 (0.0006)
[2023-08-17 13:30:48,084][140404] Fps is (10 sec: 53248.4, 60 sec: 53179.8, 300 sec: 53303.5). Total num frames: 76894208. Throughput: 0: 13253.9. Samples: 15612992. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:30:48,085][140404] Avg episode reward: [(0, '43.087')]
[2023-08-17 13:30:48,311][140503] Updated weights for policy 0, policy_version 18776 (0.0006)
[2023-08-17 13:30:49,098][140503] Updated weights for policy 0, policy_version 18786 (0.0006)
[2023-08-17 13:30:49,861][140503] Updated weights for policy 0, policy_version 18796 (0.0007)
[2023-08-17 13:30:50,615][140503] Updated weights for policy 0, policy_version 18806 (0.0006)
[2023-08-17 13:30:51,381][140503] Updated weights for policy 0, policy_version 18816 (0.0006)
[2023-08-17 13:30:52,138][140503] Updated weights for policy 0, policy_version 18826 (0.0007)
[2023-08-17 13:30:52,897][140503] Updated weights for policy 0, policy_version 18836 (0.0006)
[2023-08-17 13:30:53,084][140404] Fps is (10 sec: 53248.1, 60 sec: 53179.7, 300 sec: 53303.5). Total num frames: 77160448. Throughput: 0: 13267.3. Samples: 15652960. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
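The `Policy #0 lag` figures report how many policy versions behind the learner the data in a sampled batch was collected; in asynchronous PPO this staleness is expected to stay small (here min 0, avg around 1, max 2-4). A sketch of the statistic itself, not Sample Factory's code:

```python
import numpy as np

def policy_lag_stats(learner_version, behavior_versions):
    """Summarize how stale a training batch is, in policy versions.

    behavior_versions: array of the policy_version each transition was
    collected with. Returns (min, avg, max), matching the 'Policy #0 lag'
    records above (illustrative sketch of the statistic).
    """
    lag = learner_version - np.asarray(behavior_versions)
    return lag.min(), lag.mean(), lag.max()
```

For example, with the learner at version 18616 and a batch collected at versions 18613 through 18616, this yields figures of the shape logged above: (min: 0.0, avg: ~1.3, max: 3.0).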
[2023-08-17 13:30:53,085][140404] Avg episode reward: [(0, '45.473')]
[2023-08-17 13:30:53,658][140503] Updated weights for policy 0, policy_version 18846 (0.0006)
[2023-08-17 13:30:54,424][140503] Updated weights for policy 0, policy_version 18856 (0.0006)
[2023-08-17 13:30:55,200][140503] Updated weights for policy 0, policy_version 18866 (0.0006)
[2023-08-17 13:30:55,951][140503] Updated weights for policy 0, policy_version 18876 (0.0006)
[2023-08-17 13:30:56,701][140503] Updated weights for policy 0, policy_version 18886 (0.0006)
[2023-08-17 13:30:57,488][140503] Updated weights for policy 0, policy_version 18896 (0.0006)
[2023-08-17 13:30:58,084][140404] Fps is (10 sec: 53247.6, 60 sec: 53179.7, 300 sec: 53289.6). Total num frames: 77426688. Throughput: 0: 13303.5. Samples: 15733796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:30:58,085][140404] Avg episode reward: [(0, '44.859')]
[2023-08-17 13:30:58,241][140503] Updated weights for policy 0, policy_version 18906 (0.0006)
[2023-08-17 13:30:59,035][140503] Updated weights for policy 0, policy_version 18916 (0.0007)
[2023-08-17 13:30:59,823][140503] Updated weights for policy 0, policy_version 18926 (0.0007)
[2023-08-17 13:31:00,614][140503] Updated weights for policy 0, policy_version 18936 (0.0007)
[2023-08-17 13:31:01,372][140503] Updated weights for policy 0, policy_version 18946 (0.0006)
[2023-08-17 13:31:02,125][140503] Updated weights for policy 0, policy_version 18956 (0.0006)
[2023-08-17 13:31:02,882][140503] Updated weights for policy 0, policy_version 18966 (0.0006)
[2023-08-17 13:31:03,084][140404] Fps is (10 sec: 53246.9, 60 sec: 53111.3, 300 sec: 53303.5). Total num frames: 77692928. Throughput: 0: 13303.9. Samples: 15813196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:31:03,085][140404] Avg episode reward: [(0, '49.168')]
[2023-08-17 13:31:03,102][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000018969_77697024.pth...
[2023-08-17 13:31:03,143][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000015845_64901120.pth
[2023-08-17 13:31:03,148][140489] Saving new best policy, reward=49.168!
[2023-08-17 13:31:03,675][140503] Updated weights for policy 0, policy_version 18976 (0.0006)
[2023-08-17 13:31:04,425][140503] Updated weights for policy 0, policy_version 18986 (0.0006)
[2023-08-17 13:31:05,175][140503] Updated weights for policy 0, policy_version 18996 (0.0006)
[2023-08-17 13:31:05,942][140503] Updated weights for policy 0, policy_version 19006 (0.0007)
[2023-08-17 13:31:06,715][140503] Updated weights for policy 0, policy_version 19016 (0.0007)
[2023-08-17 13:31:07,470][140503] Updated weights for policy 0, policy_version 19026 (0.0006)
[2023-08-17 13:31:08,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53179.9, 300 sec: 53317.4). Total num frames: 77959168. Throughput: 0: 13311.2. Samples: 15853528. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
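A checkpoint written during the run, such as the `checkpoint_000018969_77697024.pth` saved just above, can be inspected offline. The exact key layout is whatever the learner stored, so the sketch below only loads the file and lists its contents rather than assuming a schema:

```python
import torch

# Load a checkpoint written during this run and list what it contains.
# Assumes the checkpoint is a dict (typical for torch.save); we only inspect.
ckpt = torch.load(
    "train_dir/default_experiment/checkpoint_p0/checkpoint_000018969_77697024.pth",
    map_location="cpu",
)
for key, value in ckpt.items():
    shape = getattr(value, "shape", None)
    print(key, shape if shape is not None else type(value).__name__)
```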
[2023-08-17 13:31:08,085][140404] Avg episode reward: [(0, '45.519')]
[2023-08-17 13:31:08,252][140503] Updated weights for policy 0, policy_version 19036 (0.0007)
[2023-08-17 13:31:09,011][140503] Updated weights for policy 0, policy_version 19046 (0.0006)
[2023-08-17 13:31:09,773][140503] Updated weights for policy 0, policy_version 19056 (0.0006)
[2023-08-17 13:31:10,543][140503] Updated weights for policy 0, policy_version 19066 (0.0007)
[2023-08-17 13:31:11,353][140503] Updated weights for policy 0, policy_version 19076 (0.0007)
[2023-08-17 13:31:12,127][140503] Updated weights for policy 0, policy_version 19086 (0.0007)
[2023-08-17 13:31:12,856][140503] Updated weights for policy 0, policy_version 19096 (0.0006)
[2023-08-17 13:31:13,084][140404] Fps is (10 sec: 53247.7, 60 sec: 53179.5, 300 sec: 53331.3). Total num frames: 78225408. Throughput: 0: 13322.7. Samples: 15932856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:31:13,086][140404] Avg episode reward: [(0, '47.258')]
[2023-08-17 13:31:13,643][140503] Updated weights for policy 0, policy_version 19106 (0.0006)
[2023-08-17 13:31:14,393][140503] Updated weights for policy 0, policy_version 19116 (0.0006)
[2023-08-17 13:31:15,166][140503] Updated weights for policy 0, policy_version 19126 (0.0007)
[2023-08-17 13:31:15,907][140503] Updated weights for policy 0, policy_version 19136 (0.0006)
[2023-08-17 13:31:16,709][140503] Updated weights for policy 0, policy_version 19146 (0.0007)
[2023-08-17 13:31:17,479][140503] Updated weights for policy 0, policy_version 19156 (0.0007)
[2023-08-17 13:31:18,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53248.2, 300 sec: 53303.5). Total num frames: 78491648. Throughput: 0: 13320.3. Samples: 16012664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:31:18,085][140404] Avg episode reward: [(0, '46.262')]
[2023-08-17 13:31:18,298][140503] Updated weights for policy 0, policy_version 19166 (0.0007)
[2023-08-17 13:31:19,040][140503] Updated weights for policy 0, policy_version 19176 (0.0006)
[2023-08-17 13:31:19,781][140503] Updated weights for policy 0, policy_version 19186 (0.0006)
[2023-08-17 13:31:20,551][140503] Updated weights for policy 0, policy_version 19196 (0.0007)
[2023-08-17 13:31:21,338][140503] Updated weights for policy 0, policy_version 19206 (0.0007)
[2023-08-17 13:31:22,108][140503] Updated weights for policy 0, policy_version 19216 (0.0006)
[2023-08-17 13:31:22,929][140503] Updated weights for policy 0, policy_version 19226 (0.0007)
[2023-08-17 13:31:23,084][140404] Fps is (10 sec: 53249.4, 60 sec: 53248.1, 300 sec: 53303.5). Total num frames: 78757888. Throughput: 0: 13326.5. Samples: 16052984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:31:23,085][140404] Avg episode reward: [(0, '48.081')]
[2023-08-17 13:31:23,689][140503] Updated weights for policy 0, policy_version 19236 (0.0006)
[2023-08-17 13:31:24,479][140503] Updated weights for policy 0, policy_version 19246 (0.0006)
[2023-08-17 13:31:25,251][140503] Updated weights for policy 0, policy_version 19256 (0.0007)
[2023-08-17 13:31:26,049][140503] Updated weights for policy 0, policy_version 19266 (0.0007)
[2023-08-17 13:31:26,825][140503] Updated weights for policy 0, policy_version 19276 (0.0007)
[2023-08-17 13:31:27,595][140503] Updated weights for policy 0, policy_version 19286 (0.0007)
[2023-08-17 13:31:28,084][140404] Fps is (10 sec: 52838.9, 60 sec: 53248.1, 300 sec: 53289.7). Total num frames: 79020032. Throughput: 0: 13297.7. Samples: 16131568. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:31:28,085][140404] Avg episode reward: [(0, '48.492')]
[2023-08-17 13:31:28,388][140503] Updated weights for policy 0, policy_version 19296 (0.0007)
[2023-08-17 13:31:29,160][140503] Updated weights for policy 0, policy_version 19306 (0.0007)
[2023-08-17 13:31:29,917][140503] Updated weights for policy 0, policy_version 19316 (0.0007)
[2023-08-17 13:31:30,699][140503] Updated weights for policy 0, policy_version 19326 (0.0006)
[2023-08-17 13:31:31,450][140503] Updated weights for policy 0, policy_version 19336 (0.0006)
[2023-08-17 13:31:32,231][140503] Updated weights for policy 0, policy_version 19346 (0.0006)
[2023-08-17 13:31:32,989][140503] Updated weights for policy 0, policy_version 19356 (0.0006)
[2023-08-17 13:31:33,084][140404] Fps is (10 sec: 52838.2, 60 sec: 53248.0, 300 sec: 53275.8). Total num frames: 79286272. Throughput: 0: 13289.7. Samples: 16211028. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:31:33,085][140404] Avg episode reward: [(0, '45.444')]
[2023-08-17 13:31:33,782][140503] Updated weights for policy 0, policy_version 19366 (0.0007)
[2023-08-17 13:31:34,574][140503] Updated weights for policy 0, policy_version 19376 (0.0007)
[2023-08-17 13:31:35,339][140503] Updated weights for policy 0, policy_version 19386 (0.0006)
[2023-08-17 13:31:36,075][140503] Updated weights for policy 0, policy_version 19396 (0.0006)
[2023-08-17 13:31:36,836][140503] Updated weights for policy 0, policy_version 19406 (0.0007)
[2023-08-17 13:31:37,589][140503] Updated weights for policy 0, policy_version 19416 (0.0006)
[2023-08-17 13:31:38,084][140404] Fps is (10 sec: 53248.0, 60 sec: 53179.8, 300 sec: 53289.7). Total num frames: 79552512. Throughput: 0: 13292.1. Samples: 16251104. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:31:38,085][140404] Avg episode reward: [(0, '44.868')]
[2023-08-17 13:31:38,371][140503] Updated weights for policy 0, policy_version 19426 (0.0007)
[2023-08-17 13:31:39,133][140503] Updated weights for policy 0, policy_version 19436 (0.0007)
[2023-08-17 13:31:39,914][140503] Updated weights for policy 0, policy_version 19446 (0.0007)
[2023-08-17 13:31:40,667][140503] Updated weights for policy 0, policy_version 19456 (0.0006)
[2023-08-17 13:31:41,442][140503] Updated weights for policy 0, policy_version 19466 (0.0006)
[2023-08-17 13:31:42,242][140503] Updated weights for policy 0, policy_version 19476 (0.0006)
[2023-08-17 13:31:42,995][140503] Updated weights for policy 0, policy_version 19486 (0.0007)
[2023-08-17 13:31:43,084][140404] Fps is (10 sec: 53248.1, 60 sec: 53179.7, 300 sec: 53289.7). Total num frames: 79818752. Throughput: 0: 13275.6. Samples: 16331196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:31:43,085][140404] Avg episode reward: [(0, '42.181')]
[2023-08-17 13:31:43,774][140503] Updated weights for policy 0, policy_version 19496 (0.0006)
[2023-08-17 13:31:44,540][140503] Updated weights for policy 0, policy_version 19506 (0.0006)
[2023-08-17 13:31:45,289][140503] Updated weights for policy 0, policy_version 19516 (0.0006)
[2023-08-17 13:31:46,030][140503] Updated weights for policy 0, policy_version 19526 (0.0006)
[2023-08-17 13:31:46,786][140503] Updated weights for policy 0, policy_version 19536 (0.0006)
[2023-08-17 13:31:47,534][140503] Updated weights for policy 0, policy_version 19546 (0.0006)
[2023-08-17 13:31:48,084][140404] Fps is (10 sec: 53247.7, 60 sec: 53179.7, 300 sec: 53289.7). Total num frames: 80084992. Throughput: 0: 13300.1. Samples: 16411696. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:31:48,085][140404] Avg episode reward: [(0, '46.126')]
[2023-08-17 13:31:48,322][140503] Updated weights for policy 0, policy_version 19556 (0.0006)
[2023-08-17 13:31:49,085][140503] Updated weights for policy 0, policy_version 19566 (0.0006)
[2023-08-17 13:31:49,845][140503] Updated weights for policy 0, policy_version 19576 (0.0006)
[2023-08-17 13:31:50,601][140503] Updated weights for policy 0, policy_version 19586 (0.0006)
[2023-08-17 13:31:51,375][140503] Updated weights for policy 0, policy_version 19596 (0.0006)
[2023-08-17 13:31:52,143][140503] Updated weights for policy 0, policy_version 19606 (0.0006)
[2023-08-17 13:31:52,914][140503] Updated weights for policy 0, policy_version 19616 (0.0007)
[2023-08-17 13:31:53,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53248.0, 300 sec: 53303.5). Total num frames: 80355328. Throughput: 0: 13296.1. Samples: 16451852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:31:53,085][140404] Avg episode reward: [(0, '43.659')]
[2023-08-17 13:31:53,686][140503] Updated weights for policy 0, policy_version 19626 (0.0007)
[2023-08-17 13:31:54,448][140503] Updated weights for policy 0, policy_version 19636 (0.0006)
[2023-08-17 13:31:55,225][140503] Updated weights for policy 0, policy_version 19646 (0.0006)
[2023-08-17 13:31:55,992][140503] Updated weights for policy 0, policy_version 19656 (0.0006)
[2023-08-17 13:31:56,734][140503] Updated weights for policy 0, policy_version 19666 (0.0006)
[2023-08-17 13:31:57,495][140503] Updated weights for policy 0, policy_version 19676 (0.0007)
[2023-08-17 13:31:58,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53248.0, 300 sec: 53303.6). Total num frames: 80621568. Throughput: 0: 13318.3. Samples: 16532176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:31:58,085][140404] Avg episode reward: [(0, '47.054')]
[2023-08-17 13:31:58,269][140503] Updated weights for policy 0, policy_version 19686 (0.0006)
[2023-08-17 13:31:59,025][140503] Updated weights for policy 0, policy_version 19696 (0.0006)
[2023-08-17 13:31:59,762][140503] Updated weights for policy 0, policy_version 19706 (0.0006)
[2023-08-17 13:32:00,520][140503] Updated weights for policy 0, policy_version 19716 (0.0007)
[2023-08-17 13:32:01,295][140503] Updated weights for policy 0, policy_version 19726 (0.0007)
[2023-08-17 13:32:02,056][140503] Updated weights for policy 0, policy_version 19736 (0.0007)
[2023-08-17 13:32:02,841][140503] Updated weights for policy 0, policy_version 19746 (0.0006)
[2023-08-17 13:32:03,084][140404] Fps is (10 sec: 53657.5, 60 sec: 53316.4, 300 sec: 53303.5). Total num frames: 80891904. Throughput: 0: 13332.9. Samples: 16612644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:32:03,085][140404] Avg episode reward: [(0, '41.820')]
[2023-08-17 13:32:03,587][140503] Updated weights for policy 0, policy_version 19756 (0.0006)
[2023-08-17 13:32:04,372][140503] Updated weights for policy 0, policy_version 19766 (0.0007)
[2023-08-17 13:32:05,106][140503] Updated weights for policy 0, policy_version 19776 (0.0006)
[2023-08-17 13:32:05,893][140503] Updated weights for policy 0, policy_version 19786 (0.0007)
[2023-08-17 13:32:06,658][140503] Updated weights for policy 0, policy_version 19796 (0.0007)
[2023-08-17 13:32:07,415][140503] Updated weights for policy 0, policy_version 19806 (0.0006)
[2023-08-17 13:32:08,084][140404] Fps is (10 sec: 53657.5, 60 sec: 53316.3, 300 sec: 53289.6). Total num frames: 81158144. Throughput: 0: 13329.6. Samples: 16652816. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
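The "Updated weights" cadence is internally consistent with the reported throughput. Each broadcast advances `policy_version` by 10 (19556, 19566, ...) and the records arrive roughly every 0.76 s, while the checkpoint names pair a version with a frame count (`checkpoint_000017409_71307264.pth`, and 71307264 / 17409 = 4096 frames per version). A quick check, using only numbers taken or inferred from this log:

```python
# Consistency check between the weight-update cadence and the reported FPS.
versions_per_update = 10                 # policy_version advances in steps of 10
seconds_per_update = 0.76                # typical gap between 'Updated weights' records
frames_per_version = 71307264 // 17409   # = 4096, inferred from checkpoint names

fps_estimate = versions_per_update / seconds_per_update * frames_per_version
print(f"{fps_estimate:.0f}")             # ~53,900, close to the ~53,300 FPS logged
```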
[2023-08-17 13:32:08,085][140404] Avg episode reward: [(0, '48.303')]
[2023-08-17 13:32:08,178][140503] Updated weights for policy 0, policy_version 19816 (0.0006)
[2023-08-17 13:32:08,957][140503] Updated weights for policy 0, policy_version 19826 (0.0006)
[2023-08-17 13:32:09,694][140503] Updated weights for policy 0, policy_version 19836 (0.0006)
[2023-08-17 13:32:10,485][140503] Updated weights for policy 0, policy_version 19846 (0.0007)
[2023-08-17 13:32:11,241][140503] Updated weights for policy 0, policy_version 19856 (0.0006)
[2023-08-17 13:32:12,058][140503] Updated weights for policy 0, policy_version 19866 (0.0007)
[2023-08-17 13:32:12,834][140503] Updated weights for policy 0, policy_version 19876 (0.0007)
[2023-08-17 13:32:13,084][140404] Fps is (10 sec: 53248.0, 60 sec: 53316.5, 300 sec: 53303.5). Total num frames: 81424384. Throughput: 0: 13358.7. Samples: 16732712. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:32:13,085][140404] Avg episode reward: [(0, '45.407')]
[2023-08-17 13:32:13,642][140503] Updated weights for policy 0, policy_version 19886 (0.0006)
[2023-08-17 13:32:14,395][140503] Updated weights for policy 0, policy_version 19896 (0.0007)
[2023-08-17 13:32:15,152][140503] Updated weights for policy 0, policy_version 19906 (0.0007)
[2023-08-17 13:32:15,915][140503] Updated weights for policy 0, policy_version 19916 (0.0006)
[2023-08-17 13:32:16,682][140503] Updated weights for policy 0, policy_version 19926 (0.0007)
[2023-08-17 13:32:17,455][140503] Updated weights for policy 0, policy_version 19936 (0.0006)
[2023-08-17 13:32:18,084][140404] Fps is (10 sec: 52838.2, 60 sec: 53248.0, 300 sec: 53275.8). Total num frames: 81686528. Throughput: 0: 13359.1. Samples: 16812188. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:32:18,085][140404] Avg episode reward: [(0, '49.373')]
[2023-08-17 13:32:18,086][140489] Saving new best policy, reward=49.373!
[2023-08-17 13:32:18,240][140503] Updated weights for policy 0, policy_version 19946 (0.0007)
[2023-08-17 13:32:19,005][140503] Updated weights for policy 0, policy_version 19956 (0.0006)
[2023-08-17 13:32:19,763][140503] Updated weights for policy 0, policy_version 19966 (0.0006)
[2023-08-17 13:32:20,524][140503] Updated weights for policy 0, policy_version 19976 (0.0006)
[2023-08-17 13:32:21,310][140503] Updated weights for policy 0, policy_version 19986 (0.0006)
[2023-08-17 13:32:22,061][140503] Updated weights for policy 0, policy_version 19996 (0.0006)
[2023-08-17 13:32:22,816][140503] Updated weights for policy 0, policy_version 20006 (0.0006)
[2023-08-17 13:32:23,084][140404] Fps is (10 sec: 53247.8, 60 sec: 53316.2, 300 sec: 53289.7). Total num frames: 81956864. Throughput: 0: 13359.0. Samples: 16852260. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:32:23,085][140404] Avg episode reward: [(0, '47.041')]
[2023-08-17 13:32:23,577][140503] Updated weights for policy 0, policy_version 20016 (0.0006)
[2023-08-17 13:32:24,348][140503] Updated weights for policy 0, policy_version 20026 (0.0006)
[2023-08-17 13:32:25,071][140503] Updated weights for policy 0, policy_version 20036 (0.0006)
[2023-08-17 13:32:25,812][140503] Updated weights for policy 0, policy_version 20046 (0.0005)
[2023-08-17 13:32:26,581][140503] Updated weights for policy 0, policy_version 20056 (0.0006)
[2023-08-17 13:32:27,332][140503] Updated weights for policy 0, policy_version 20066 (0.0006)
[2023-08-17 13:32:28,082][140503] Updated weights for policy 0, policy_version 20076 (0.0006)
[2023-08-17 13:32:28,084][140404] Fps is (10 sec: 54477.4, 60 sec: 53521.1, 300 sec: 53317.4). Total num frames: 82231296. Throughput: 0: 13386.9. Samples: 16933604. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:32:28,085][140404] Avg episode reward: [(0, '46.828')]
[2023-08-17 13:32:28,829][140503] Updated weights for policy 0, policy_version 20086 (0.0006)
[2023-08-17 13:32:29,586][140503] Updated weights for policy 0, policy_version 20096 (0.0006)
[2023-08-17 13:32:30,335][140503] Updated weights for policy 0, policy_version 20106 (0.0006)
[2023-08-17 13:32:31,078][140503] Updated weights for policy 0, policy_version 20116 (0.0006)
[2023-08-17 13:32:31,834][140503] Updated weights for policy 0, policy_version 20126 (0.0006)
[2023-08-17 13:32:32,612][140503] Updated weights for policy 0, policy_version 20136 (0.0006)
[2023-08-17 13:32:33,084][140404] Fps is (10 sec: 54477.0, 60 sec: 53589.3, 300 sec: 53331.3). Total num frames: 82501632. Throughput: 0: 13414.4. Samples: 17015344. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:32:33,085][140404] Avg episode reward: [(0, '43.363')]
[2023-08-17 13:32:33,353][140503] Updated weights for policy 0, policy_version 20146 (0.0006)
[2023-08-17 13:32:34,091][140503] Updated weights for policy 0, policy_version 20156 (0.0005)
[2023-08-17 13:32:34,843][140503] Updated weights for policy 0, policy_version 20166 (0.0006)
[2023-08-17 13:32:35,582][140503] Updated weights for policy 0, policy_version 20176 (0.0005)
[2023-08-17 13:32:36,298][140503] Updated weights for policy 0, policy_version 20186 (0.0006)
[2023-08-17 13:32:37,073][140503] Updated weights for policy 0, policy_version 20196 (0.0006)
[2023-08-17 13:32:37,815][140503] Updated weights for policy 0, policy_version 20206 (0.0007)
[2023-08-17 13:32:38,084][140404] Fps is (10 sec: 54476.3, 60 sec: 53725.8, 300 sec: 53359.1). Total num frames: 82776064. Throughput: 0: 13441.9. Samples: 17056740. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:32:38,085][140404] Avg episode reward: [(0, '44.119')]
[2023-08-17 13:32:38,603][140503] Updated weights for policy 0, policy_version 20216 (0.0007)
[2023-08-17 13:32:39,358][140503] Updated weights for policy 0, policy_version 20226 (0.0006)
[2023-08-17 13:32:40,119][140503] Updated weights for policy 0, policy_version 20236 (0.0006)
[2023-08-17 13:32:40,882][140503] Updated weights for policy 0, policy_version 20246 (0.0006)
[2023-08-17 13:32:41,650][140503] Updated weights for policy 0, policy_version 20256 (0.0007)
[2023-08-17 13:32:42,438][140503] Updated weights for policy 0, policy_version 20266 (0.0007)
[2023-08-17 13:32:43,084][140404] Fps is (10 sec: 54067.2, 60 sec: 53725.9, 300 sec: 53345.2). Total num frames: 83042304. Throughput: 0: 13446.7. Samples: 17137276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:32:43,085][140404] Avg episode reward: [(0, '47.244')]
[2023-08-17 13:32:43,190][140503] Updated weights for policy 0, policy_version 20276 (0.0006)
[2023-08-17 13:32:43,980][140503] Updated weights for policy 0, policy_version 20286 (0.0007)
[2023-08-17 13:32:44,760][140503] Updated weights for policy 0, policy_version 20296 (0.0007)
[2023-08-17 13:32:45,549][140503] Updated weights for policy 0, policy_version 20306 (0.0007)
[2023-08-17 13:32:46,357][140503] Updated weights for policy 0, policy_version 20316 (0.0007)
[2023-08-17 13:32:47,116][140503] Updated weights for policy 0, policy_version 20326 (0.0006)
[2023-08-17 13:32:47,852][140503] Updated weights for policy 0, policy_version 20336 (0.0006)
[2023-08-17 13:32:48,084][140404] Fps is (10 sec: 52838.6, 60 sec: 53657.6, 300 sec: 53331.3). Total num frames: 83304448. Throughput: 0: 13421.9. Samples: 17216628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:32:48,085][140404] Avg episode reward: [(0, '43.824')]
[2023-08-17 13:32:48,619][140503] Updated weights for policy 0, policy_version 20346 (0.0006)
[2023-08-17 13:32:49,392][140503] Updated weights for policy 0, policy_version 20356 (0.0007)
[2023-08-17 13:32:50,160][140503] Updated weights for policy 0, policy_version 20366 (0.0007)
[2023-08-17 13:32:50,945][140503] Updated weights for policy 0, policy_version 20376 (0.0007)
[2023-08-17 13:32:51,745][140503] Updated weights for policy 0, policy_version 20386 (0.0007)
[2023-08-17 13:32:52,505][140503] Updated weights for policy 0, policy_version 20396 (0.0006)
[2023-08-17 13:32:53,084][140404] Fps is (10 sec: 52838.5, 60 sec: 53589.3, 300 sec: 53317.4). Total num frames: 83570688. Throughput: 0: 13411.0. Samples: 17256312. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:32:53,085][140404] Avg episode reward: [(0, '46.029')]
[2023-08-17 13:32:53,304][140503] Updated weights for policy 0, policy_version 20406 (0.0006)
[2023-08-17 13:32:54,076][140503] Updated weights for policy 0, policy_version 20416 (0.0007)
[2023-08-17 13:32:54,826][140503] Updated weights for policy 0, policy_version 20426 (0.0006)
[2023-08-17 13:32:55,577][140503] Updated weights for policy 0, policy_version 20436 (0.0006)
[2023-08-17 13:32:56,327][140503] Updated weights for policy 0, policy_version 20446 (0.0006)
[2023-08-17 13:32:57,108][140503] Updated weights for policy 0, policy_version 20456 (0.0006)
[2023-08-17 13:32:57,893][140503] Updated weights for policy 0, policy_version 20466 (0.0007)
[2023-08-17 13:32:58,084][140404] Fps is (10 sec: 53247.7, 60 sec: 53589.3, 300 sec: 53331.3). Total num frames: 83836928. Throughput: 0: 13404.6. Samples: 17335920. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:32:58,085][140404] Avg episode reward: [(0, '43.276')]
[2023-08-17 13:32:58,664][140503] Updated weights for policy 0, policy_version 20476 (0.0006)
[2023-08-17 13:32:59,442][140503] Updated weights for policy 0, policy_version 20486 (0.0006)
[2023-08-17 13:33:00,200][140503] Updated weights for policy 0, policy_version 20496 (0.0006)
[2023-08-17 13:33:00,956][140503] Updated weights for policy 0, policy_version 20506 (0.0007)
[2023-08-17 13:33:01,719][140503] Updated weights for policy 0, policy_version 20516 (0.0007)
[2023-08-17 13:33:02,555][140503] Updated weights for policy 0, policy_version 20526 (0.0007)
[2023-08-17 13:33:03,084][140404] Fps is (10 sec: 52838.2, 60 sec: 53452.8, 300 sec: 53331.3). Total num frames: 84099072. Throughput: 0: 13399.9. Samples: 17415184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:33:03,085][140404] Avg episode reward: [(0, '42.967')]
[2023-08-17 13:33:03,088][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000020533_84103168.pth...
[2023-08-17 13:33:03,125][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000017409_71307264.pth
[2023-08-17 13:33:03,326][140503] Updated weights for policy 0, policy_version 20536 (0.0007)
[2023-08-17 13:33:04,098][140503] Updated weights for policy 0, policy_version 20546 (0.0006)
[2023-08-17 13:33:04,861][140503] Updated weights for policy 0, policy_version 20556 (0.0007)
[2023-08-17 13:33:05,639][140503] Updated weights for policy 0, policy_version 20566 (0.0007)
[2023-08-17 13:33:06,438][140503] Updated weights for policy 0, policy_version 20576 (0.0007)
[2023-08-17 13:33:07,198][140503] Updated weights for policy 0, policy_version 20586 (0.0006)
[2023-08-17 13:33:07,949][140503] Updated weights for policy 0, policy_version 20596 (0.0006)
[2023-08-17 13:33:08,084][140404] Fps is (10 sec: 52838.8, 60 sec: 53452.8, 300 sec: 53303.5). Total num frames: 84365312. Throughput: 0: 13387.7. Samples: 17454708. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:33:08,085][140404] Avg episode reward: [(0, '45.282')]
[2023-08-17 13:33:08,700][140503] Updated weights for policy 0, policy_version 20606 (0.0006)
[2023-08-17 13:33:09,472][140503] Updated weights for policy 0, policy_version 20616 (0.0006)
[2023-08-17 13:33:10,236][140503] Updated weights for policy 0, policy_version 20626 (0.0006)
[2023-08-17 13:33:11,039][140503] Updated weights for policy 0, policy_version 20636 (0.0007)
[2023-08-17 13:33:11,793][140503] Updated weights for policy 0, policy_version 20646 (0.0006)
[2023-08-17 13:33:12,558][140503] Updated weights for policy 0, policy_version 20656 (0.0006)
[2023-08-17 13:33:13,084][140404] Fps is (10 sec: 53248.2, 60 sec: 53452.8, 300 sec: 53317.4). Total num frames: 84631552. Throughput: 0: 13359.0. Samples: 17534760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:33:13,085][140404] Avg episode reward: [(0, '46.943')]
[2023-08-17 13:33:13,341][140503] Updated weights for policy 0, policy_version 20666 (0.0006)
[2023-08-17 13:33:14,096][140503] Updated weights for policy 0, policy_version 20676 (0.0006)
[2023-08-17 13:33:14,854][140503] Updated weights for policy 0, policy_version 20686 (0.0006)
[2023-08-17 13:33:15,617][140503] Updated weights for policy 0, policy_version 20696 (0.0006)
[2023-08-17 13:33:16,363][140503] Updated weights for policy 0, policy_version 20706 (0.0007)
[2023-08-17 13:33:17,132][140503] Updated weights for policy 0, policy_version 20716 (0.0006)
[2023-08-17 13:33:17,894][140503] Updated weights for policy 0, policy_version 20726 (0.0006)
[2023-08-17 13:33:18,084][140404] Fps is (10 sec: 53657.5, 60 sec: 53589.4, 300 sec: 53331.3). Total num frames: 84901888. Throughput: 0: 13335.8. Samples: 17615456. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
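The Saving/Removing pair above shows the periodic checkpoint rotation: write `checkpoint_{version:09d}_{frames}.pth`, then delete the oldest file so only the most recent few remain (the pairs in this log are consistent with keeping two). A sketch of that retention policy; the function name and keep-count are assumptions, not Sample Factory's implementation:

```python
from pathlib import Path

import torch

def save_with_rotation(state, ckpt_dir: Path, version: int, frames: int, keep: int = 2):
    """Write checkpoint_{version:09d}_{frames}.pth, then prune older files,
    mirroring the Saving/Removing pairs in the log (keep=2 is an assumption)."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{version:09d}_{frames}.pth"
    torch.save(state, path)
    # Zero-padded versions make the names sort lexicographically, oldest first.
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep]:
        old.unlink()
    return path
```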
[2023-08-17 13:33:18,085][140404] Avg episode reward: [(0, '46.052')]
[2023-08-17 13:33:18,676][140503] Updated weights for policy 0, policy_version 20736 (0.0007)
[2023-08-17 13:33:19,443][140503] Updated weights for policy 0, policy_version 20746 (0.0006)
[2023-08-17 13:33:20,238][140503] Updated weights for policy 0, policy_version 20756 (0.0006)
[2023-08-17 13:33:20,987][140503] Updated weights for policy 0, policy_version 20766 (0.0007)
[2023-08-17 13:33:21,772][140503] Updated weights for policy 0, policy_version 20776 (0.0006)
[2023-08-17 13:33:22,543][140503] Updated weights for policy 0, policy_version 20786 (0.0007)
[2023-08-17 13:33:23,084][140404] Fps is (10 sec: 53657.5, 60 sec: 53521.1, 300 sec: 53331.3). Total num frames: 85168128. Throughput: 0: 13293.9. Samples: 17654964. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:33:23,085][140404] Avg episode reward: [(0, '46.326')]
[2023-08-17 13:33:23,303][140503] Updated weights for policy 0, policy_version 20796 (0.0007)
[2023-08-17 13:33:24,066][140503] Updated weights for policy 0, policy_version 20806 (0.0007)
[2023-08-17 13:33:24,820][140503] Updated weights for policy 0, policy_version 20816 (0.0006)
[2023-08-17 13:33:25,576][140503] Updated weights for policy 0, policy_version 20826 (0.0006)
[2023-08-17 13:33:26,345][140503] Updated weights for policy 0, policy_version 20836 (0.0006)
[2023-08-17 13:33:27,110][140503] Updated weights for policy 0, policy_version 20846 (0.0006)
[2023-08-17 13:33:27,891][140503] Updated weights for policy 0, policy_version 20856 (0.0007)
[2023-08-17 13:33:28,084][140404] Fps is (10 sec: 53248.3, 60 sec: 53384.5, 300 sec: 53331.3). Total num frames: 85434368. Throughput: 0: 13291.1. Samples: 17735376. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:33:28,085][140404] Avg episode reward: [(0, '45.289')]
[2023-08-17 13:33:28,646][140503] Updated weights for policy 0, policy_version 20866 (0.0006)
[2023-08-17 13:33:29,433][140503] Updated weights for policy 0, policy_version 20876 (0.0006)
[2023-08-17 13:33:30,201][140503] Updated weights for policy 0, policy_version 20886 (0.0006)
[2023-08-17 13:33:30,963][140503] Updated weights for policy 0, policy_version 20896 (0.0006)
[2023-08-17 13:33:31,719][140503] Updated weights for policy 0, policy_version 20906 (0.0006)
[2023-08-17 13:33:32,502][140503] Updated weights for policy 0, policy_version 20916 (0.0006)
[2023-08-17 13:33:33,084][140404] Fps is (10 sec: 53248.0, 60 sec: 53316.3, 300 sec: 53317.4). Total num frames: 85700608. Throughput: 0: 13297.6. Samples: 17815020. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:33:33,085][140404] Avg episode reward: [(0, '48.258')]
[2023-08-17 13:33:33,280][140503] Updated weights for policy 0, policy_version 20926 (0.0007)
[2023-08-17 13:33:34,087][140503] Updated weights for policy 0, policy_version 20936 (0.0007)
[2023-08-17 13:33:34,863][140503] Updated weights for policy 0, policy_version 20946 (0.0006)
[2023-08-17 13:33:35,630][140503] Updated weights for policy 0, policy_version 20956 (0.0007)
[2023-08-17 13:33:36,407][140503] Updated weights for policy 0, policy_version 20966 (0.0006)
[2023-08-17 13:33:37,152][140503] Updated weights for policy 0, policy_version 20976 (0.0006)
[2023-08-17 13:33:37,906][140503] Updated weights for policy 0, policy_version 20986 (0.0006)
[2023-08-17 13:33:38,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53179.8, 300 sec: 53303.5). Total num frames: 85966848. Throughput: 0: 13290.1. Samples: 17854368. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:33:38,085][140404] Avg episode reward: [(0, '43.326')]
[2023-08-17 13:33:38,684][140503] Updated weights for policy 0, policy_version 20996 (0.0007)
[2023-08-17 13:33:39,455][140503] Updated weights for policy 0, policy_version 21006 (0.0006)
[2023-08-17 13:33:40,205][140503] Updated weights for policy 0, policy_version 21016 (0.0006)
[2023-08-17 13:33:40,978][140503] Updated weights for policy 0, policy_version 21026 (0.0007)
[2023-08-17 13:33:41,743][140503] Updated weights for policy 0, policy_version 21036 (0.0006)
[2023-08-17 13:33:42,542][140503] Updated weights for policy 0, policy_version 21046 (0.0007)
[2023-08-17 13:33:43,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53179.7, 300 sec: 53303.5). Total num frames: 86233088. Throughput: 0: 13308.4. Samples: 17934796. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:33:43,085][140404] Avg episode reward: [(0, '47.495')]
[2023-08-17 13:33:43,305][140503] Updated weights for policy 0, policy_version 21056 (0.0006)
[2023-08-17 13:33:44,084][140503] Updated weights for policy 0, policy_version 21066 (0.0007)
[2023-08-17 13:33:44,872][140503] Updated weights for policy 0, policy_version 21076 (0.0007)
[2023-08-17 13:33:45,644][140503] Updated weights for policy 0, policy_version 21086 (0.0006)
[2023-08-17 13:33:46,383][140503] Updated weights for policy 0, policy_version 21096 (0.0006)
[2023-08-17 13:33:47,141][140503] Updated weights for policy 0, policy_version 21106 (0.0006)
[2023-08-17 13:33:47,904][140503] Updated weights for policy 0, policy_version 21116 (0.0006)
[2023-08-17 13:33:48,084][140404] Fps is (10 sec: 53247.3, 60 sec: 53247.9, 300 sec: 53317.4). Total num frames: 86499328. Throughput: 0: 13320.7. Samples: 18014616. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:33:48,085][140404] Avg episode reward: [(0, '47.089')]
[2023-08-17 13:33:48,658][140503] Updated weights for policy 0, policy_version 21126 (0.0006)
[2023-08-17 13:33:49,411][140503] Updated weights for policy 0, policy_version 21136 (0.0006)
[2023-08-17 13:33:50,191][140503] Updated weights for policy 0, policy_version 21146 (0.0007)
[2023-08-17 13:33:50,954][140503] Updated weights for policy 0, policy_version 21156 (0.0007)
[2023-08-17 13:33:51,707][140503] Updated weights for policy 0, policy_version 21166 (0.0006)
[2023-08-17 13:33:52,499][140503] Updated weights for policy 0, policy_version 21176 (0.0007)
[2023-08-17 13:33:53,084][140404] Fps is (10 sec: 53246.9, 60 sec: 53247.8, 300 sec: 53317.4). Total num frames: 86765568. Throughput: 0: 13334.7. Samples: 18054772. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:33:53,085][140404] Avg episode reward: [(0, '47.201')]
[2023-08-17 13:33:53,269][140503] Updated weights for policy 0, policy_version 21186 (0.0007)
[2023-08-17 13:33:54,010][140503] Updated weights for policy 0, policy_version 21196 (0.0006)
[2023-08-17 13:33:54,816][140503] Updated weights for policy 0, policy_version 21206 (0.0007)
[2023-08-17 13:33:55,574][140503] Updated weights for policy 0, policy_version 21216 (0.0006)
[2023-08-17 13:33:56,373][140503] Updated weights for policy 0, policy_version 21226 (0.0006)
[2023-08-17 13:33:57,121][140503] Updated weights for policy 0, policy_version 21236 (0.0007)
[2023-08-17 13:33:57,842][140503] Updated weights for policy 0, policy_version 21246 (0.0006)
[2023-08-17 13:33:58,084][140404] Fps is (10 sec: 53247.6, 60 sec: 53247.9, 300 sec: 53303.5). Total num frames: 87031808. Throughput: 0: 13332.2. Samples: 18134712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:33:58,086][140404] Avg episode reward: [(0, '45.606')]
[2023-08-17 13:33:58,631][140503] Updated weights for policy 0, policy_version 21256 (0.0006)
[2023-08-17 13:33:59,367][140503] Updated weights for policy 0, policy_version 21266 (0.0006)
[2023-08-17 13:34:00,117][140503] Updated weights for policy 0, policy_version 21276 (0.0006)
[2023-08-17 13:34:00,866][140503] Updated weights for policy 0, policy_version 21286 (0.0006)
[2023-08-17 13:34:01,736][140503] Updated weights for policy 0, policy_version 21296 (0.0007)
[2023-08-17 13:34:02,509][140503] Updated weights for policy 0, policy_version 21306 (0.0007)
[2023-08-17 13:34:03,084][140404] Fps is (10 sec: 53249.4, 60 sec: 53316.3, 300 sec: 53317.4). Total num frames: 87298048. Throughput: 0: 13309.7. Samples: 18214392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:34:03,085][140404] Avg episode reward: [(0, '48.343')]
[2023-08-17 13:34:03,296][140503] Updated weights for policy 0, policy_version 21316 (0.0007)
[2023-08-17 13:34:04,061][140503] Updated weights for policy 0, policy_version 21326 (0.0006)
[2023-08-17 13:34:04,853][140503] Updated weights for policy 0, policy_version 21336 (0.0006)
[2023-08-17 13:34:05,603][140503] Updated weights for policy 0, policy_version 21346 (0.0006)
[2023-08-17 13:34:06,388][140503] Updated weights for policy 0, policy_version 21356 (0.0006)
[2023-08-17 13:34:07,158][140503] Updated weights for policy 0, policy_version 21366 (0.0006)
[2023-08-17 13:34:07,904][140503] Updated weights for policy 0, policy_version 21376 (0.0006)
[2023-08-17 13:34:08,084][140404] Fps is (10 sec: 53248.8, 60 sec: 53316.2, 300 sec: 53303.5). Total num frames: 87564288. Throughput: 0: 13312.9. Samples: 18254044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:34:08,085][140404] Avg episode reward: [(0, '45.694')]
[2023-08-17 13:34:08,665][140503] Updated weights for policy 0, policy_version 21386 (0.0006)
[2023-08-17 13:34:09,418][140503] Updated weights for policy 0, policy_version 21396 (0.0006)
[2023-08-17 13:34:10,189][140503] Updated weights for policy 0, policy_version 21406 (0.0007)
[2023-08-17 13:34:10,954][140503] Updated weights for policy 0, policy_version 21416 (0.0007)
[2023-08-17 13:34:11,687][140503] Updated weights for policy 0, policy_version 21426 (0.0006)
[2023-08-17 13:34:12,446][140503] Updated weights for policy 0, policy_version 21436 (0.0006)
[2023-08-17 13:34:13,084][140404] Fps is (10 sec: 53657.0, 60 sec: 53384.5, 300 sec: 53317.4). Total num frames: 87834624. Throughput: 0: 13321.3. Samples: 18334836. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:34:13,085][140404] Avg episode reward: [(0, '45.452')]
[2023-08-17 13:34:13,202][140503] Updated weights for policy 0, policy_version 21446 (0.0007)
[2023-08-17 13:34:13,975][140503] Updated weights for policy 0, policy_version 21456 (0.0006)
[2023-08-17 13:34:14,759][140503] Updated weights for policy 0, policy_version 21466 (0.0007)
[2023-08-17 13:34:15,549][140503] Updated weights for policy 0, policy_version 21476 (0.0007)
[2023-08-17 13:34:16,325][140503] Updated weights for policy 0, policy_version 21486 (0.0007)
[2023-08-17 13:34:17,089][140503] Updated weights for policy 0, policy_version 21496 (0.0006)
[2023-08-17 13:34:17,847][140503] Updated weights for policy 0, policy_version 21506 (0.0007)
[2023-08-17 13:34:18,084][140404] Fps is (10 sec: 53658.0, 60 sec: 53316.3, 300 sec: 53317.4). Total num frames: 88100864. Throughput: 0: 13328.7. Samples: 18414812. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:34:18,085][140404] Avg episode reward: [(0, '45.771')]
[2023-08-17 13:34:18,598][140503] Updated weights for policy 0, policy_version 21516 (0.0006)
[2023-08-17 13:34:19,353][140503] Updated weights for policy 0, policy_version 21526 (0.0007)
[2023-08-17 13:34:20,118][140503] Updated weights for policy 0, policy_version 21536 (0.0006)
[2023-08-17 13:34:20,869][140503] Updated weights for policy 0, policy_version 21546 (0.0006)
[2023-08-17 13:34:21,650][140503] Updated weights for policy 0, policy_version 21556 (0.0006)
[2023-08-17 13:34:22,407][140503] Updated weights for policy 0, policy_version 21566 (0.0006)
[2023-08-17 13:34:23,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53316.2, 300 sec: 53317.4). Total num frames: 88367104. Throughput: 0: 13350.0. Samples: 18455120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:34:23,085][140404] Avg episode reward: [(0, '46.463')]
[2023-08-17 13:34:23,199][140503] Updated weights for policy 0, policy_version 21576 (0.0006)
[2023-08-17 13:34:24,004][140503] Updated weights for policy 0, policy_version 21586 (0.0007)
[2023-08-17 13:34:24,657][140489] Signal inference workers to stop experience collection... (100 times)
[2023-08-17 13:34:24,657][140489] Signal inference workers to resume experience collection... (100 times)
[2023-08-17 13:34:24,661][140503] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2023-08-17 13:34:24,661][140503] InferenceWorker_p0-w0: resuming experience collection (100 times)
[2023-08-17 13:34:24,795][140503] Updated weights for policy 0, policy_version 21596 (0.0007)
[2023-08-17 13:34:25,526][140503] Updated weights for policy 0, policy_version 21606 (0.0006)
[2023-08-17 13:34:26,255][140503] Updated weights for policy 0, policy_version 21616 (0.0006)
[2023-08-17 13:34:27,008][140503] Updated weights for policy 0, policy_version 21626 (0.0006)
[2023-08-17 13:34:27,782][140503] Updated weights for policy 0, policy_version 21636 (0.0006)
[2023-08-17 13:34:28,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53384.6, 300 sec: 53331.3). Total num frames: 88637440. Throughput: 0: 13347.6. Samples: 18535436. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:34:28,085][140404] Avg episode reward: [(0, '42.745')]
[2023-08-17 13:34:28,531][140503] Updated weights for policy 0, policy_version 21646 (0.0007)
[2023-08-17 13:34:29,302][140503] Updated weights for policy 0, policy_version 21656 (0.0006)
[2023-08-17 13:34:30,108][140503] Updated weights for policy 0, policy_version 21666 (0.0006)
[2023-08-17 13:34:30,844][140503] Updated weights for policy 0, policy_version 21676 (0.0006)
[2023-08-17 13:34:31,601][140503] Updated weights for policy 0, policy_version 21686 (0.0006)
[2023-08-17 13:34:32,370][140503] Updated weights for policy 0, policy_version 21696 (0.0006)
[2023-08-17 13:34:33,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53384.5, 300 sec: 53331.3). Total num frames: 88903680. Throughput: 0: 13360.3. Samples: 18615828. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
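The "Signal inference workers to stop/resume experience collection" pair above (logged here for the 100th time this run) is the learner throttling the rollout side: when collected experience queues up faster than it can be trained on, collection pauses briefly, which is what keeps the policy lag reported on every Fps line (how many versions stale the acting policy was) down around 1-3. A hedged sketch of such a throttle using a threading.Event (an illustrative mechanism, not Sample Factory's actual signaling code):

    import threading

    class CollectionThrottle:
        def __init__(self, max_outstanding: int = 2):
            self.max_outstanding = max_outstanding
            self._allow = threading.Event()
            self._allow.set()            # collection enabled initially

        def on_queue_size(self, outstanding: int) -> None:
            # Called on the learner side as batches are queued/consumed.
            if outstanding >= self.max_outstanding:
                self._allow.clear()      # "stopping experience collection"
            else:
                self._allow.set()        # "resuming experience collection"

        def wait_until_allowed(self) -> None:
            self._allow.wait()           # workers block here while paused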
[2023-08-17 13:34:33,085][140404] Avg episode reward: [(0, '45.957')]
[2023-08-17 13:34:33,124][140503] Updated weights for policy 0, policy_version 21706 (0.0006)
[2023-08-17 13:34:33,878][140503] Updated weights for policy 0, policy_version 21716 (0.0006)
[2023-08-17 13:34:34,649][140503] Updated weights for policy 0, policy_version 21726 (0.0007)
[2023-08-17 13:34:35,404][140503] Updated weights for policy 0, policy_version 21736 (0.0006)
[2023-08-17 13:34:36,189][140503] Updated weights for policy 0, policy_version 21746 (0.0006)
[2023-08-17 13:34:36,975][140503] Updated weights for policy 0, policy_version 21756 (0.0006)
[2023-08-17 13:34:37,704][140503] Updated weights for policy 0, policy_version 21766 (0.0006)
[2023-08-17 13:34:38,084][140404] Fps is (10 sec: 53656.9, 60 sec: 53452.7, 300 sec: 53345.2). Total num frames: 89174016. Throughput: 0: 13363.7. Samples: 18656136. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:34:38,085][140404] Avg episode reward: [(0, '46.853')]
[2023-08-17 13:34:38,488][140503] Updated weights for policy 0, policy_version 21776 (0.0006)
[2023-08-17 13:34:39,247][140503] Updated weights for policy 0, policy_version 21786 (0.0006)
[2023-08-17 13:34:39,983][140503] Updated weights for policy 0, policy_version 21796 (0.0006)
[2023-08-17 13:34:40,787][140503] Updated weights for policy 0, policy_version 21806 (0.0006)
[2023-08-17 13:34:41,527][140503] Updated weights for policy 0, policy_version 21816 (0.0006)
[2023-08-17 13:34:42,303][140503] Updated weights for policy 0, policy_version 21826 (0.0007)
[2023-08-17 13:34:43,019][140503] Updated weights for policy 0, policy_version 21836 (0.0006)
[2023-08-17 13:34:43,084][140404] Fps is (10 sec: 53658.0, 60 sec: 53452.8, 300 sec: 53345.2). Total num frames: 89440256. Throughput: 0: 13371.4. Samples: 18736424. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:34:43,085][140404] Avg episode reward: [(0, '46.122')]
[2023-08-17 13:34:43,836][140503] Updated weights for policy 0, policy_version 21846 (0.0007)
[2023-08-17 13:34:44,590][140503] Updated weights for policy 0, policy_version 21856 (0.0006)
[2023-08-17 13:34:45,352][140503] Updated weights for policy 0, policy_version 21866 (0.0007)
[2023-08-17 13:34:46,142][140503] Updated weights for policy 0, policy_version 21876 (0.0007)
[2023-08-17 13:34:46,896][140503] Updated weights for policy 0, policy_version 21886 (0.0007)
[2023-08-17 13:34:47,687][140503] Updated weights for policy 0, policy_version 21896 (0.0007)
[2023-08-17 13:34:48,084][140404] Fps is (10 sec: 52838.4, 60 sec: 53384.6, 300 sec: 53331.3). Total num frames: 89702400. Throughput: 0: 13371.9. Samples: 18816128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:34:48,085][140404] Avg episode reward: [(0, '47.630')]
[2023-08-17 13:34:48,477][140503] Updated weights for policy 0, policy_version 21906 (0.0007)
[2023-08-17 13:34:49,254][140503] Updated weights for policy 0, policy_version 21916 (0.0007)
[2023-08-17 13:34:50,015][140503] Updated weights for policy 0, policy_version 21926 (0.0006)
[2023-08-17 13:34:50,776][140503] Updated weights for policy 0, policy_version 21936 (0.0007)
[2023-08-17 13:34:51,538][140503] Updated weights for policy 0, policy_version 21946 (0.0007)
[2023-08-17 13:34:52,313][140503] Updated weights for policy 0, policy_version 21956 (0.0007)
[2023-08-17 13:34:53,084][140404] Fps is (10 sec: 52838.4, 60 sec: 53384.7, 300 sec: 53331.3). Total num frames: 89968640. Throughput: 0: 13379.1. Samples: 18856104. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:34:53,085][140404] Avg episode reward: [(0, '44.321')]
[2023-08-17 13:34:53,108][140503] Updated weights for policy 0, policy_version 21966 (0.0007)
[2023-08-17 13:34:53,875][140503] Updated weights for policy 0, policy_version 21976 (0.0007)
[2023-08-17 13:34:54,642][140503] Updated weights for policy 0, policy_version 21986 (0.0007)
[2023-08-17 13:34:55,437][140503] Updated weights for policy 0, policy_version 21996 (0.0007)
[2023-08-17 13:34:56,190][140503] Updated weights for policy 0, policy_version 22006 (0.0006)
[2023-08-17 13:34:56,946][140503] Updated weights for policy 0, policy_version 22016 (0.0006)
[2023-08-17 13:34:57,737][140503] Updated weights for policy 0, policy_version 22026 (0.0007)
[2023-08-17 13:34:58,084][140404] Fps is (10 sec: 53248.4, 60 sec: 53384.7, 300 sec: 53317.4). Total num frames: 90234880. Throughput: 0: 13348.6. Samples: 18935520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:34:58,085][140404] Avg episode reward: [(0, '45.958')]
[2023-08-17 13:34:58,525][140503] Updated weights for policy 0, policy_version 22036 (0.0007)
[2023-08-17 13:34:59,258][140503] Updated weights for policy 0, policy_version 22046 (0.0006)
[2023-08-17 13:35:00,062][140503] Updated weights for policy 0, policy_version 22056 (0.0007)
[2023-08-17 13:35:00,804][140503] Updated weights for policy 0, policy_version 22066 (0.0007)
[2023-08-17 13:35:01,592][140503] Updated weights for policy 0, policy_version 22076 (0.0007)
[2023-08-17 13:35:02,382][140503] Updated weights for policy 0, policy_version 22086 (0.0007)
[2023-08-17 13:35:03,084][140404] Fps is (10 sec: 53248.0, 60 sec: 53384.5, 300 sec: 53331.3). Total num frames: 90501120. Throughput: 0: 13334.9. Samples: 19014884. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:35:03,085][140404] Avg episode reward: [(0, '44.479')]
[2023-08-17 13:35:03,089][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000022095_90501120.pth...
[2023-08-17 13:35:03,127][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000018969_77697024.pth
[2023-08-17 13:35:03,162][140503] Updated weights for policy 0, policy_version 22096 (0.0007)
[2023-08-17 13:35:03,908][140503] Updated weights for policy 0, policy_version 22106 (0.0006)
[2023-08-17 13:35:04,645][140503] Updated weights for policy 0, policy_version 22116 (0.0006)
[2023-08-17 13:35:05,401][140503] Updated weights for policy 0, policy_version 22126 (0.0006)
[2023-08-17 13:35:06,155][140503] Updated weights for policy 0, policy_version 22136 (0.0006)
[2023-08-17 13:35:06,918][140503] Updated weights for policy 0, policy_version 22146 (0.0006)
[2023-08-17 13:35:07,671][140503] Updated weights for policy 0, policy_version 22156 (0.0006)
[2023-08-17 13:35:08,084][140404] Fps is (10 sec: 53657.8, 60 sec: 53452.9, 300 sec: 53345.2). Total num frames: 90771456. Throughput: 0: 13344.9. Samples: 19055640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:35:08,085][140404] Avg episode reward: [(0, '49.597')]
[2023-08-17 13:35:08,086][140489] Saving new best policy, reward=49.597!
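"Saving new best policy, reward=49.597!" is simple high-water-mark tracking layered on top of the rolling checkpoints: whenever the average episode reward exceeds the best value seen so far, an extra best-policy checkpoint is written. A minimal sketch of the pattern (save_fn is an illustrative callback, not a library API):

    class BestPolicyTracker:
        def __init__(self) -> None:
            self.best_reward = float("-inf")

        def maybe_save_best(self, avg_reward: float, save_fn) -> bool:
            # Save and announce whenever avg_reward sets a new record.
            if avg_reward > self.best_reward:
                self.best_reward = avg_reward
                save_fn()
                print(f"Saving new best policy, reward={avg_reward:.3f}!")
                return True
            return False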
[2023-08-17 13:35:08,440][140503] Updated weights for policy 0, policy_version 22166 (0.0007)
[2023-08-17 13:35:09,206][140503] Updated weights for policy 0, policy_version 22176 (0.0007)
[2023-08-17 13:35:09,939][140503] Updated weights for policy 0, policy_version 22186 (0.0006)
[2023-08-17 13:35:10,675][140503] Updated weights for policy 0, policy_version 22196 (0.0006)
[2023-08-17 13:35:11,505][140503] Updated weights for policy 0, policy_version 22206 (0.0007)
[2023-08-17 13:35:12,266][140503] Updated weights for policy 0, policy_version 22216 (0.0006)
[2023-08-17 13:35:13,059][140503] Updated weights for policy 0, policy_version 22226 (0.0006)
[2023-08-17 13:35:13,084][140404] Fps is (10 sec: 53657.5, 60 sec: 53384.6, 300 sec: 53359.1). Total num frames: 91037696. Throughput: 0: 13344.7. Samples: 19135948. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:35:13,085][140404] Avg episode reward: [(0, '47.157')]
[2023-08-17 13:35:13,795][140503] Updated weights for policy 0, policy_version 22236 (0.0006)
[2023-08-17 13:35:14,516][140503] Updated weights for policy 0, policy_version 22246 (0.0006)
[2023-08-17 13:35:15,266][140503] Updated weights for policy 0, policy_version 22256 (0.0007)
[2023-08-17 13:35:16,043][140503] Updated weights for policy 0, policy_version 22266 (0.0006)
[2023-08-17 13:35:16,786][140503] Updated weights for policy 0, policy_version 22276 (0.0006)
[2023-08-17 13:35:17,534][140503] Updated weights for policy 0, policy_version 22286 (0.0006)
[2023-08-17 13:35:18,084][140404] Fps is (10 sec: 53657.3, 60 sec: 53452.8, 300 sec: 53373.0). Total num frames: 91308032. Throughput: 0: 13364.6. Samples: 19217232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:35:18,085][140404] Avg episode reward: [(0, '47.162')]
[2023-08-17 13:35:18,327][140503] Updated weights for policy 0, policy_version 22296 (0.0006)
[2023-08-17 13:35:19,092][140503] Updated weights for policy 0, policy_version 22306 (0.0006)
[2023-08-17 13:35:19,828][140503] Updated weights for policy 0, policy_version 22316 (0.0005)
[2023-08-17 13:35:20,610][140503] Updated weights for policy 0, policy_version 22326 (0.0006)
[2023-08-17 13:35:21,370][140503] Updated weights for policy 0, policy_version 22336 (0.0006)
[2023-08-17 13:35:22,103][140503] Updated weights for policy 0, policy_version 22346 (0.0006)
[2023-08-17 13:35:22,843][140503] Updated weights for policy 0, policy_version 22356 (0.0006)
[2023-08-17 13:35:23,084][140404] Fps is (10 sec: 54476.9, 60 sec: 53589.4, 300 sec: 53414.6). Total num frames: 91582464. Throughput: 0: 13381.4. Samples: 19258300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:35:23,085][140404] Avg episode reward: [(0, '46.431')]
[2023-08-17 13:35:23,624][140503] Updated weights for policy 0, policy_version 22366 (0.0007)
[2023-08-17 13:35:24,386][140503] Updated weights for policy 0, policy_version 22376 (0.0006)
[2023-08-17 13:35:25,175][140503] Updated weights for policy 0, policy_version 22386 (0.0007)
[2023-08-17 13:35:25,969][140503] Updated weights for policy 0, policy_version 22396 (0.0006)
[2023-08-17 13:35:26,721][140503] Updated weights for policy 0, policy_version 22406 (0.0006)
[2023-08-17 13:35:27,525][140503] Updated weights for policy 0, policy_version 22416 (0.0006)
[2023-08-17 13:35:28,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53452.8, 300 sec: 53400.7). Total num frames: 91844608. Throughput: 0: 13358.9. Samples: 19337576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:35:28,085][140404] Avg episode reward: [(0, '46.445')]
[2023-08-17 13:35:28,323][140503] Updated weights for policy 0, policy_version 22426 (0.0007)
[2023-08-17 13:35:29,079][140503] Updated weights for policy 0, policy_version 22436 (0.0006)
[2023-08-17 13:35:29,836][140503] Updated weights for policy 0, policy_version 22446 (0.0007)
[2023-08-17 13:35:30,623][140503] Updated weights for policy 0, policy_version 22456 (0.0007)
[2023-08-17 13:35:31,391][140503] Updated weights for policy 0, policy_version 22466 (0.0007)
[2023-08-17 13:35:32,219][140503] Updated weights for policy 0, policy_version 22476 (0.0007)
[2023-08-17 13:35:32,980][140503] Updated weights for policy 0, policy_version 22486 (0.0007)
[2023-08-17 13:35:33,084][140404] Fps is (10 sec: 52428.9, 60 sec: 53384.6, 300 sec: 53373.0). Total num frames: 92106752. Throughput: 0: 13348.1. Samples: 19416792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:35:33,085][140404] Avg episode reward: [(0, '45.287')]
[2023-08-17 13:35:33,770][140503] Updated weights for policy 0, policy_version 22496 (0.0007)
[2023-08-17 13:35:34,533][140503] Updated weights for policy 0, policy_version 22506 (0.0006)
[2023-08-17 13:35:35,292][140503] Updated weights for policy 0, policy_version 22516 (0.0006)
[2023-08-17 13:35:36,058][140503] Updated weights for policy 0, policy_version 22526 (0.0006)
[2023-08-17 13:35:36,821][140503] Updated weights for policy 0, policy_version 22536 (0.0006)
[2023-08-17 13:35:37,548][140503] Updated weights for policy 0, policy_version 22546 (0.0006)
[2023-08-17 13:35:38,084][140404] Fps is (10 sec: 52838.2, 60 sec: 53316.3, 300 sec: 53373.0). Total num frames: 92372992. Throughput: 0: 13340.0. Samples: 19456404. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:35:38,085][140404] Avg episode reward: [(0, '44.417')]
[2023-08-17 13:35:38,340][140503] Updated weights for policy 0, policy_version 22556 (0.0007)
[2023-08-17 13:35:39,123][140503] Updated weights for policy 0, policy_version 22566 (0.0006)
[2023-08-17 13:35:39,901][140503] Updated weights for policy 0, policy_version 22576 (0.0007)
[2023-08-17 13:35:40,667][140503] Updated weights for policy 0, policy_version 22586 (0.0006)
[2023-08-17 13:35:41,440][140503] Updated weights for policy 0, policy_version 22596 (0.0006)
[2023-08-17 13:35:42,195][140503] Updated weights for policy 0, policy_version 22606 (0.0007)
[2023-08-17 13:35:42,961][140503] Updated weights for policy 0, policy_version 22616 (0.0006)
[2023-08-17 13:35:43,084][140404] Fps is (10 sec: 53247.9, 60 sec: 53316.3, 300 sec: 53372.9). Total num frames: 92639232. Throughput: 0: 13353.2. Samples: 19536416. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:35:43,085][140404] Avg episode reward: [(0, '46.946')]
[2023-08-17 13:35:43,752][140503] Updated weights for policy 0, policy_version 22626 (0.0007)
[2023-08-17 13:35:44,530][140503] Updated weights for policy 0, policy_version 22636 (0.0006)
[2023-08-17 13:35:45,304][140503] Updated weights for policy 0, policy_version 22646 (0.0006)
[2023-08-17 13:35:46,080][140503] Updated weights for policy 0, policy_version 22656 (0.0006)
[2023-08-17 13:35:46,855][140503] Updated weights for policy 0, policy_version 22666 (0.0007)
[2023-08-17 13:35:47,647][140503] Updated weights for policy 0, policy_version 22676 (0.0007)
[2023-08-17 13:35:48,084][140404] Fps is (10 sec: 52838.6, 60 sec: 53316.3, 300 sec: 53359.1). Total num frames: 92901376. Throughput: 0: 13343.0. Samples: 19615320. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:35:48,085][140404] Avg episode reward: [(0, '47.321')]
[2023-08-17 13:35:48,449][140503] Updated weights for policy 0, policy_version 22686 (0.0007)
[2023-08-17 13:35:49,262][140503] Updated weights for policy 0, policy_version 22696 (0.0007)
[2023-08-17 13:35:50,029][140503] Updated weights for policy 0, policy_version 22706 (0.0006)
[2023-08-17 13:35:50,822][140503] Updated weights for policy 0, policy_version 22716 (0.0007)
[2023-08-17 13:35:51,578][140503] Updated weights for policy 0, policy_version 22726 (0.0007)
[2023-08-17 13:35:52,304][140503] Updated weights for policy 0, policy_version 22736 (0.0006)
[2023-08-17 13:35:53,084][140404] Fps is (10 sec: 52428.9, 60 sec: 53248.0, 300 sec: 53345.2). Total num frames: 93163520. Throughput: 0: 13304.4. Samples: 19654340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:35:53,085][140404] Avg episode reward: [(0, '43.398')]
[2023-08-17 13:35:53,091][140503] Updated weights for policy 0, policy_version 22746 (0.0006)
[2023-08-17 13:35:53,853][140503] Updated weights for policy 0, policy_version 22756 (0.0007)
[2023-08-17 13:35:54,622][140503] Updated weights for policy 0, policy_version 22766 (0.0006)
[2023-08-17 13:35:55,372][140503] Updated weights for policy 0, policy_version 22776 (0.0007)
[2023-08-17 13:35:56,140][140503] Updated weights for policy 0, policy_version 22786 (0.0006)
[2023-08-17 13:35:56,909][140503] Updated weights for policy 0, policy_version 22796 (0.0006)
[2023-08-17 13:35:57,680][140503] Updated weights for policy 0, policy_version 22806 (0.0007)
[2023-08-17 13:35:58,084][140404] Fps is (10 sec: 53248.2, 60 sec: 53316.3, 300 sec: 53359.1). Total num frames: 93433856. Throughput: 0: 13304.4. Samples: 19734644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2023-08-17 13:35:58,085][140404] Avg episode reward: [(0, '47.494')]
[2023-08-17 13:35:58,440][140503] Updated weights for policy 0, policy_version 22816 (0.0006)
[2023-08-17 13:35:59,232][140503] Updated weights for policy 0, policy_version 22826 (0.0007)
[2023-08-17 13:36:00,004][140503] Updated weights for policy 0, policy_version 22836 (0.0006)
[2023-08-17 13:36:00,776][140503] Updated weights for policy 0, policy_version 22846 (0.0006)
[2023-08-17 13:36:01,539][140503] Updated weights for policy 0, policy_version 22856 (0.0006)
[2023-08-17 13:36:02,285][140503] Updated weights for policy 0, policy_version 22866 (0.0006)
[2023-08-17 13:36:03,040][140503] Updated weights for policy 0, policy_version 22876 (0.0006)
[2023-08-17 13:36:03,084][140404] Fps is (10 sec: 53657.1, 60 sec: 53316.2, 300 sec: 53359.1). Total num frames: 93700096. Throughput: 0: 13284.2. Samples: 19815024. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:36:03,085][140404] Avg episode reward: [(0, '46.799')]
[2023-08-17 13:36:03,786][140503] Updated weights for policy 0, policy_version 22886 (0.0006)
[2023-08-17 13:36:04,585][140503] Updated weights for policy 0, policy_version 22896 (0.0007)
[2023-08-17 13:36:05,359][140503] Updated weights for policy 0, policy_version 22906 (0.0006)
[2023-08-17 13:36:06,122][140503] Updated weights for policy 0, policy_version 22916 (0.0006)
[2023-08-17 13:36:06,896][140503] Updated weights for policy 0, policy_version 22926 (0.0007)
[2023-08-17 13:36:07,661][140503] Updated weights for policy 0, policy_version 22936 (0.0006)
[2023-08-17 13:36:08,084][140404] Fps is (10 sec: 53247.5, 60 sec: 53247.9, 300 sec: 53359.1). Total num frames: 93966336. Throughput: 0: 13261.5. Samples: 19855068. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:36:08,085][140404] Avg episode reward: [(0, '47.639')]
[2023-08-17 13:36:08,414][140503] Updated weights for policy 0, policy_version 22946 (0.0006)
[2023-08-17 13:36:09,217][140503] Updated weights for policy 0, policy_version 22956 (0.0007)
[2023-08-17 13:36:09,984][140503] Updated weights for policy 0, policy_version 22966 (0.0007)
[2023-08-17 13:36:10,758][140503] Updated weights for policy 0, policy_version 22976 (0.0006)
[2023-08-17 13:36:11,531][140503] Updated weights for policy 0, policy_version 22986 (0.0006)
[2023-08-17 13:36:12,309][140503] Updated weights for policy 0, policy_version 22996 (0.0006)
[2023-08-17 13:36:13,084][140404] Fps is (10 sec: 52838.8, 60 sec: 53179.7, 300 sec: 53345.2). Total num frames: 94228480. Throughput: 0: 13258.1. Samples: 19934192. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:36:13,085][140404] Avg episode reward: [(0, '48.239')]
[2023-08-17 13:36:13,090][140503] Updated weights for policy 0, policy_version 23006 (0.0007)
[2023-08-17 13:36:13,911][140503] Updated weights for policy 0, policy_version 23016 (0.0006)
[2023-08-17 13:36:14,662][140503] Updated weights for policy 0, policy_version 23026 (0.0006)
[2023-08-17 13:36:15,430][140503] Updated weights for policy 0, policy_version 23036 (0.0007)
[2023-08-17 13:36:16,207][140503] Updated weights for policy 0, policy_version 23046 (0.0006)
[2023-08-17 13:36:16,962][140503] Updated weights for policy 0, policy_version 23056 (0.0006)
[2023-08-17 13:36:17,755][140503] Updated weights for policy 0, policy_version 23066 (0.0007)
[2023-08-17 13:36:18,084][140404] Fps is (10 sec: 52838.5, 60 sec: 53111.4, 300 sec: 53345.2). Total num frames: 94494720. Throughput: 0: 13256.6. Samples: 20013340. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:36:18,085][140404] Avg episode reward: [(0, '47.799')]
[2023-08-17 13:36:18,557][140503] Updated weights for policy 0, policy_version 23076 (0.0007)
[2023-08-17 13:36:19,343][140503] Updated weights for policy 0, policy_version 23086 (0.0007)
[2023-08-17 13:36:20,135][140503] Updated weights for policy 0, policy_version 23096 (0.0007)
[2023-08-17 13:36:20,912][140503] Updated weights for policy 0, policy_version 23106 (0.0007)
[2023-08-17 13:36:21,675][140503] Updated weights for policy 0, policy_version 23116 (0.0007)
[2023-08-17 13:36:22,447][140503] Updated weights for policy 0, policy_version 23126 (0.0007)
[2023-08-17 13:36:23,084][140404] Fps is (10 sec: 52428.7, 60 sec: 52838.4, 300 sec: 53331.3). Total num frames: 94752768. Throughput: 0: 13237.7. Samples: 20052100. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:36:23,085][140404] Avg episode reward: [(0, '46.302')]
[2023-08-17 13:36:23,249][140503] Updated weights for policy 0, policy_version 23136 (0.0006)
[2023-08-17 13:36:24,014][140503] Updated weights for policy 0, policy_version 23146 (0.0007)
[2023-08-17 13:36:24,782][140503] Updated weights for policy 0, policy_version 23156 (0.0007)
[2023-08-17 13:36:25,559][140503] Updated weights for policy 0, policy_version 23166 (0.0007)
[2023-08-17 13:36:26,367][140503] Updated weights for policy 0, policy_version 23176 (0.0007)
[2023-08-17 13:36:27,151][140503] Updated weights for policy 0, policy_version 23186 (0.0007)
[2023-08-17 13:36:27,910][140503] Updated weights for policy 0, policy_version 23196 (0.0006)
[2023-08-17 13:36:28,084][140404] Fps is (10 sec: 52428.8, 60 sec: 52906.6, 300 sec: 53331.3). Total num frames: 95019008. Throughput: 0: 13208.7. Samples: 20130808. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:36:28,085][140404] Avg episode reward: [(0, '45.586')]
[2023-08-17 13:36:28,697][140503] Updated weights for policy 0, policy_version 23206 (0.0006)
[2023-08-17 13:36:29,466][140503] Updated weights for policy 0, policy_version 23216 (0.0007)
[2023-08-17 13:36:30,232][140503] Updated weights for policy 0, policy_version 23226 (0.0007)
[2023-08-17 13:36:31,078][140503] Updated weights for policy 0, policy_version 23236 (0.0007)
[2023-08-17 13:36:31,814][140503] Updated weights for policy 0, policy_version 23246 (0.0006)
[2023-08-17 13:36:32,580][140503] Updated weights for policy 0, policy_version 23256 (0.0006)
[2023-08-17 13:36:33,084][140404] Fps is (10 sec: 52838.5, 60 sec: 52906.6, 300 sec: 53317.4). Total num frames: 95281152. Throughput: 0: 13228.2. Samples: 20210588. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
[2023-08-17 13:36:33,085][140404] Avg episode reward: [(0, '47.925')]
[2023-08-17 13:36:33,340][140503] Updated weights for policy 0, policy_version 23266 (0.0006)
[2023-08-17 13:36:34,097][140503] Updated weights for policy 0, policy_version 23276 (0.0007)
[2023-08-17 13:36:34,881][140503] Updated weights for policy 0, policy_version 23286 (0.0007)
[2023-08-17 13:36:35,640][140503] Updated weights for policy 0, policy_version 23296 (0.0006)
[2023-08-17 13:36:36,374][140503] Updated weights for policy 0, policy_version 23306 (0.0006)
[2023-08-17 13:36:37,113][140503] Updated weights for policy 0, policy_version 23316 (0.0006)
[2023-08-17 13:36:37,863][140503] Updated weights for policy 0, policy_version 23326 (0.0006)
[2023-08-17 13:36:38,084][140404] Fps is (10 sec: 53247.0, 60 sec: 52974.8, 300 sec: 53331.3). Total num frames: 95551488. Throughput: 0: 13254.0. Samples: 20250772. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:36:38,085][140404] Avg episode reward: [(0, '48.607')]
[2023-08-17 13:36:38,652][140503] Updated weights for policy 0, policy_version 23336 (0.0007)
[2023-08-17 13:36:39,415][140503] Updated weights for policy 0, policy_version 23346 (0.0006)
[2023-08-17 13:36:40,213][140503] Updated weights for policy 0, policy_version 23356 (0.0007)
[2023-08-17 13:36:40,959][140503] Updated weights for policy 0, policy_version 23366 (0.0006)
[2023-08-17 13:36:41,730][140503] Updated weights for policy 0, policy_version 23376 (0.0006)
[2023-08-17 13:36:42,502][140503] Updated weights for policy 0, policy_version 23386 (0.0006)
[2023-08-17 13:36:43,084][140404] Fps is (10 sec: 53657.6, 60 sec: 52974.9, 300 sec: 53331.3). Total num frames: 95817728. Throughput: 0: 13261.9. Samples: 20331432. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:36:43,085][140404] Avg episode reward: [(0, '45.688')]
[2023-08-17 13:36:43,270][140503] Updated weights for policy 0, policy_version 23396 (0.0007)
[2023-08-17 13:36:44,030][140503] Updated weights for policy 0, policy_version 23406 (0.0007)
[2023-08-17 13:36:44,778][140503] Updated weights for policy 0, policy_version 23416 (0.0007)
[2023-08-17 13:36:45,558][140503] Updated weights for policy 0, policy_version 23426 (0.0007)
[2023-08-17 13:36:46,354][140503] Updated weights for policy 0, policy_version 23436 (0.0007)
[2023-08-17 13:36:47,145][140503] Updated weights for policy 0, policy_version 23446 (0.0006)
[2023-08-17 13:36:47,927][140503] Updated weights for policy 0, policy_version 23456 (0.0006)
[2023-08-17 13:36:48,084][140404] Fps is (10 sec: 52839.7, 60 sec: 52974.9, 300 sec: 53303.5). Total num frames: 96079872. Throughput: 0: 13228.7. Samples: 20410312. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:36:48,085][140404] Avg episode reward: [(0, '45.255')]
[2023-08-17 13:36:48,732][140503] Updated weights for policy 0, policy_version 23466 (0.0007)
[2023-08-17 13:36:49,500][140503] Updated weights for policy 0, policy_version 23476 (0.0007)
[2023-08-17 13:36:50,253][140503] Updated weights for policy 0, policy_version 23486 (0.0007)
[2023-08-17 13:36:51,021][140503] Updated weights for policy 0, policy_version 23496 (0.0006)
[2023-08-17 13:36:51,780][140503] Updated weights for policy 0, policy_version 23506 (0.0006)
[2023-08-17 13:36:52,545][140503] Updated weights for policy 0, policy_version 23516 (0.0006)
[2023-08-17 13:36:53,084][140404] Fps is (10 sec: 52838.3, 60 sec: 53043.2, 300 sec: 53303.5). Total num frames: 96346112. Throughput: 0: 13222.7. Samples: 20450088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:36:53,085][140404] Avg episode reward: [(0, '44.072')]
[2023-08-17 13:36:53,306][140503] Updated weights for policy 0, policy_version 23526 (0.0006)
[2023-08-17 13:36:54,087][140503] Updated weights for policy 0, policy_version 23536 (0.0007)
[2023-08-17 13:36:54,836][140503] Updated weights for policy 0, policy_version 23546 (0.0006)
[2023-08-17 13:36:55,628][140503] Updated weights for policy 0, policy_version 23556 (0.0006)
[2023-08-17 13:36:56,395][140503] Updated weights for policy 0, policy_version 23566 (0.0007)
[2023-08-17 13:36:57,197][140503] Updated weights for policy 0, policy_version 23576 (0.0007)
[2023-08-17 13:36:58,012][140503] Updated weights for policy 0, policy_version 23586 (0.0007)
[2023-08-17 13:36:58,084][140404] Fps is (10 sec: 53248.1, 60 sec: 52974.9, 300 sec: 53289.7). Total num frames: 96612352. Throughput: 0: 13232.5. Samples: 20529652. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:36:58,085][140404] Avg episode reward: [(0, '47.112')]
[2023-08-17 13:36:58,801][140503] Updated weights for policy 0, policy_version 23596 (0.0007)
[2023-08-17 13:36:59,566][140503] Updated weights for policy 0, policy_version 23606 (0.0007)
[2023-08-17 13:37:00,332][140503] Updated weights for policy 0, policy_version 23616 (0.0006)
[2023-08-17 13:37:01,099][140503] Updated weights for policy 0, policy_version 23626 (0.0007)
[2023-08-17 13:37:01,878][140503] Updated weights for policy 0, policy_version 23636 (0.0006)
[2023-08-17 13:37:02,647][140503] Updated weights for policy 0, policy_version 23646 (0.0007)
[2023-08-17 13:37:03,084][140404] Fps is (10 sec: 52838.3, 60 sec: 52906.7, 300 sec: 53275.8). Total num frames: 96874496. Throughput: 0: 13219.2. Samples: 20608204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
[2023-08-17 13:37:03,085][140404] Avg episode reward: [(0, '46.792')]
[2023-08-17 13:37:03,090][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000023651_96874496.pth...
[2023-08-17 13:37:03,131][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000020533_84103168.pth
[2023-08-17 13:37:03,436][140503] Updated weights for policy 0, policy_version 23656 (0.0007)
[2023-08-17 13:37:04,265][140503] Updated weights for policy 0, policy_version 23666 (0.0007)
[2023-08-17 13:37:05,064][140503] Updated weights for policy 0, policy_version 23676 (0.0007)
[2023-08-17 13:37:05,806][140503] Updated weights for policy 0, policy_version 23686 (0.0006)
[2023-08-17 13:37:06,572][140503] Updated weights for policy 0, policy_version 23696 (0.0007)
[2023-08-17 13:37:07,397][140503] Updated weights for policy 0, policy_version 23706 (0.0007)
[2023-08-17 13:37:08,084][140404] Fps is (10 sec: 52428.8, 60 sec: 52838.5, 300 sec: 53261.9). Total num frames: 97136640. Throughput: 0: 13233.1. Samples: 20647588. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:37:08,085][140404] Avg episode reward: [(0, '47.444')]
[2023-08-17 13:37:08,161][140503] Updated weights for policy 0, policy_version 23716 (0.0007)
[2023-08-17 13:37:08,934][140503] Updated weights for policy 0, policy_version 23726 (0.0006)
[2023-08-17 13:37:09,704][140503] Updated weights for policy 0, policy_version 23736 (0.0006)
[2023-08-17 13:37:10,486][140503] Updated weights for policy 0, policy_version 23746 (0.0007)
[2023-08-17 13:37:11,238][140503] Updated weights for policy 0, policy_version 23756 (0.0006)
[2023-08-17 13:37:12,004][140503] Updated weights for policy 0, policy_version 23766 (0.0005)
[2023-08-17 13:37:12,743][140503] Updated weights for policy 0, policy_version 23776 (0.0006)
[2023-08-17 13:37:13,084][140404] Fps is (10 sec: 52838.3, 60 sec: 52906.6, 300 sec: 53275.8). Total num frames: 97402880. Throughput: 0: 13253.8. Samples: 20727228. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:37:13,085][140404] Avg episode reward: [(0, '45.861')]
[2023-08-17 13:37:13,497][140503] Updated weights for policy 0, policy_version 23786 (0.0006)
[2023-08-17 13:37:14,244][140503] Updated weights for policy 0, policy_version 23796 (0.0007)
[2023-08-17 13:37:14,996][140503] Updated weights for policy 0, policy_version 23806 (0.0006)
[2023-08-17 13:37:15,717][140503] Updated weights for policy 0, policy_version 23816 (0.0007)
[2023-08-17 13:37:16,505][140503] Updated weights for policy 0, policy_version 23826 (0.0007)
[2023-08-17 13:37:17,284][140503] Updated weights for policy 0, policy_version 23836 (0.0006)
[2023-08-17 13:37:18,057][140503] Updated weights for policy 0, policy_version 23846 (0.0006)
[2023-08-17 13:37:18,084][140404] Fps is (10 sec: 53657.1, 60 sec: 52974.9, 300 sec: 53275.8). Total num frames: 97673216. Throughput: 0: 13282.1. Samples: 20808284. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:37:18,085][140404] Avg episode reward: [(0, '46.520')]
[2023-08-17 13:37:18,804][140503] Updated weights for policy 0, policy_version 23856 (0.0006)
[2023-08-17 13:37:19,539][140503] Updated weights for policy 0, policy_version 23866 (0.0006)
[2023-08-17 13:37:20,266][140503] Updated weights for policy 0, policy_version 23876 (0.0006)
[2023-08-17 13:37:21,039][140503] Updated weights for policy 0, policy_version 23886 (0.0007)
[2023-08-17 13:37:21,842][140503] Updated weights for policy 0, policy_version 23896 (0.0006)
[2023-08-17 13:37:22,591][140503] Updated weights for policy 0, policy_version 23906 (0.0006)
[2023-08-17 13:37:23,084][140404] Fps is (10 sec: 54477.3, 60 sec: 53248.0, 300 sec: 53275.8). Total num frames: 97947648. Throughput: 0: 13296.8. Samples: 20849124. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:37:23,085][140404] Avg episode reward: [(0, '50.917')]
[2023-08-17 13:37:23,088][140489] Saving new best policy, reward=50.917!
[2023-08-17 13:37:23,352][140503] Updated weights for policy 0, policy_version 23916 (0.0006)
[2023-08-17 13:37:24,091][140503] Updated weights for policy 0, policy_version 23926 (0.0006)
[2023-08-17 13:37:24,818][140503] Updated weights for policy 0, policy_version 23936 (0.0005)
[2023-08-17 13:37:25,564][140503] Updated weights for policy 0, policy_version 23946 (0.0006)
[2023-08-17 13:37:26,323][140503] Updated weights for policy 0, policy_version 23956 (0.0006)
[2023-08-17 13:37:27,058][140503] Updated weights for policy 0, policy_version 23966 (0.0006)
[2023-08-17 13:37:27,848][140503] Updated weights for policy 0, policy_version 23976 (0.0007)
[2023-08-17 13:37:28,084][140404] Fps is (10 sec: 54477.0, 60 sec: 53316.3, 300 sec: 53275.8). Total num frames: 98217984. Throughput: 0: 13321.7. Samples: 20930908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:37:28,085][140404] Avg episode reward: [(0, '48.072')]
[2023-08-17 13:37:28,633][140503] Updated weights for policy 0, policy_version 23986 (0.0008)
[2023-08-17 13:37:29,452][140503] Updated weights for policy 0, policy_version 23996 (0.0006)
[2023-08-17 13:37:30,239][140503] Updated weights for policy 0, policy_version 24006 (0.0006)
[2023-08-17 13:37:30,995][140503] Updated weights for policy 0, policy_version 24016 (0.0007)
[2023-08-17 13:37:31,716][140503] Updated weights for policy 0, policy_version 24026 (0.0006)
[2023-08-17 13:37:32,501][140503] Updated weights for policy 0, policy_version 24036 (0.0007)
[2023-08-17 13:37:33,084][140404] Fps is (10 sec: 53247.7, 60 sec: 53316.2, 300 sec: 53234.1). Total num frames: 98480128. Throughput: 0: 13334.6. Samples: 21010368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2023-08-17 13:37:33,085][140404] Avg episode reward: [(0, '46.819')]
[2023-08-17 13:37:33,245][140503] Updated weights for policy 0, policy_version 24046 (0.0007)
[2023-08-17 13:37:34,036][140503] Updated weights for policy 0, policy_version 24056 (0.0006)
[2023-08-17 13:37:34,808][140503] Updated weights for policy 0, policy_version 24066 (0.0006)
[2023-08-17 13:37:35,579][140503] Updated weights for policy 0, policy_version 24076 (0.0006)
[2023-08-17 13:37:36,338][140503] Updated weights for policy 0, policy_version 24086 (0.0006)
[2023-08-17 13:37:37,107][140503] Updated weights for policy 0, policy_version 24096 (0.0007)
[2023-08-17 13:37:37,851][140503] Updated weights for policy 0, policy_version 24106 (0.0006)
[2023-08-17 13:37:38,084][140404] Fps is (10 sec: 52838.7, 60 sec: 53248.2, 300 sec: 53234.1). Total num frames: 98746368. Throughput: 0: 13338.5. Samples: 21050320. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:37:38,085][140404] Avg episode reward: [(0, '46.786')]
[2023-08-17 13:37:38,628][140503] Updated weights for policy 0, policy_version 24116 (0.0006)
[2023-08-17 13:37:39,391][140503] Updated weights for policy 0, policy_version 24126 (0.0006)
[2023-08-17 13:37:40,164][140503] Updated weights for policy 0, policy_version 24136 (0.0006)
[2023-08-17 13:37:40,942][140503] Updated weights for policy 0, policy_version 24146 (0.0006)
[2023-08-17 13:37:41,720][140503] Updated weights for policy 0, policy_version 24156 (0.0007)
[2023-08-17 13:37:42,453][140503] Updated weights for policy 0, policy_version 24166 (0.0006)
[2023-08-17 13:37:43,084][140404] Fps is (10 sec: 53657.2, 60 sec: 53316.2, 300 sec: 53261.9). Total num frames: 99016704. Throughput: 0: 13350.6. Samples: 21130432. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:37:43,085][140404] Avg episode reward: [(0, '45.076')]
[2023-08-17 13:37:43,195][140503] Updated weights for policy 0, policy_version 24176 (0.0007)
[2023-08-17 13:37:43,981][140503] Updated weights for policy 0, policy_version 24186 (0.0006)
[2023-08-17 13:37:44,727][140503] Updated weights for policy 0, policy_version 24196 (0.0006)
[2023-08-17 13:37:45,470][140503] Updated weights for policy 0, policy_version 24206 (0.0006)
[2023-08-17 13:37:46,236][140503] Updated weights for policy 0, policy_version 24216 (0.0007)
[2023-08-17 13:37:47,041][140503] Updated weights for policy 0, policy_version 24226 (0.0007)
[2023-08-17 13:37:47,808][140503] Updated weights for policy 0, policy_version 24236 (0.0006)
[2023-08-17 13:37:48,084][140404] Fps is (10 sec: 53657.6, 60 sec: 53384.5, 300 sec: 53261.9). Total num frames: 99282944. Throughput: 0: 13394.8. Samples: 21210968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
[2023-08-17 13:37:48,085][140404] Avg episode reward: [(0, '47.088')]
[2023-08-17 13:37:48,600][140503] Updated weights for policy 0, policy_version 24246 (0.0007)
[2023-08-17 13:37:49,357][140503] Updated weights for policy 0, policy_version 24256 (0.0006)
[2023-08-17 13:37:50,128][140503] Updated weights for policy 0, policy_version 24266 (0.0007)
[2023-08-17 13:37:50,874][140503] Updated weights for policy 0, policy_version 24276 (0.0006)
[2023-08-17 13:37:51,670][140503] Updated weights for policy 0, policy_version 24286 (0.0007)
[2023-08-17 13:37:52,428][140503] Updated weights for policy 0, policy_version 24296 (0.0006)
[2023-08-17 13:37:53,084][140404] Fps is (10 sec: 53248.6, 60 sec: 53384.6, 300 sec: 53261.9). Total num frames: 99549184. Throughput: 0: 13399.3. Samples: 21250556. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:37:53,085][140404] Avg episode reward: [(0, '45.998')]
[2023-08-17 13:37:53,233][140503] Updated weights for policy 0, policy_version 24306 (0.0006)
[2023-08-17 13:37:53,979][140503] Updated weights for policy 0, policy_version 24316 (0.0007)
[2023-08-17 13:37:54,742][140503] Updated weights for policy 0, policy_version 24326 (0.0006)
[2023-08-17 13:37:55,513][140503] Updated weights for policy 0, policy_version 24336 (0.0006)
[2023-08-17 13:37:56,288][140503] Updated weights for policy 0, policy_version 24346 (0.0006)
[2023-08-17 13:37:57,012][140503] Updated weights for policy 0, policy_version 24356 (0.0006)
[2023-08-17 13:37:57,767][140503] Updated weights for policy 0, policy_version 24366 (0.0006)
[2023-08-17 13:37:58,084][140404] Fps is (10 sec: 53657.7, 60 sec: 53452.8, 300 sec: 53289.7). Total num frames: 99819520. Throughput: 0: 13419.1. Samples: 21331088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
[2023-08-17 13:37:58,085][140404] Avg episode reward: [(0, '47.140')]
[2023-08-17 13:37:58,542][140503] Updated weights for policy 0, policy_version 24376 (0.0007)
[2023-08-17 13:37:59,332][140503] Updated weights for policy 0, policy_version 24386 (0.0006)
[2023-08-17 13:38:00,134][140503] Updated weights for policy 0, policy_version 24396 (0.0006)
[2023-08-17 13:38:00,903][140503] Updated weights for policy 0, policy_version 24406 (0.0006)
[2023-08-17 13:38:01,659][140489] Stopping Batcher_0...
[2023-08-17 13:38:01,660][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000024416_100007936.pth...
[2023-08-17 13:38:01,659][140404] Component Batcher_0 stopped!
[2023-08-17 13:38:01,661][140503] Updated weights for policy 0, policy_version 24416 (0.0006)
[2023-08-17 13:38:01,660][140489] Loop batcher_evt_loop terminating...
[2023-08-17 13:38:01,673][140503] Weights refcount: 2 0
[2023-08-17 13:38:01,674][140503] Stopping InferenceWorker_p0-w0...
[2023-08-17 13:38:01,674][140503] Loop inference_proc0-0_evt_loop terminating...
[2023-08-17 13:38:01,674][140404] Component InferenceWorker_p0-w0 stopped!
[2023-08-17 13:38:01,692][140489] Removing /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000022095_90501120.pth
[2023-08-17 13:38:01,696][140489] Saving /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000024416_100007936.pth...
[2023-08-17 13:38:01,735][140489] Stopping LearnerWorker_p0...
[2023-08-17 13:38:01,735][140489] Loop learner_proc0_evt_loop terminating...
[2023-08-17 13:38:01,735][140404] Component LearnerWorker_p0 stopped!
[2023-08-17 13:38:01,778][140502] Stopping RolloutWorker_w0...
[2023-08-17 13:38:01,778][140502] Loop rollout_proc0_evt_loop terminating...
[2023-08-17 13:38:01,778][140404] Component RolloutWorker_w0 stopped!
[2023-08-17 13:38:01,786][140509] Stopping RolloutWorker_w7...
[2023-08-17 13:38:01,786][140509] Loop rollout_proc7_evt_loop terminating...
[2023-08-17 13:38:01,786][140404] Component RolloutWorker_w7 stopped!
[2023-08-17 13:38:01,789][140505] Stopping RolloutWorker_w2...
[2023-08-17 13:38:01,789][140506] Stopping RolloutWorker_w3...
[2023-08-17 13:38:01,789][140506] Loop rollout_proc3_evt_loop terminating...
[2023-08-17 13:38:01,789][140505] Loop rollout_proc2_evt_loop terminating...
[2023-08-17 13:38:01,789][140404] Component RolloutWorker_w2 stopped!
[2023-08-17 13:38:01,790][140508] Stopping RolloutWorker_w5...
[2023-08-17 13:38:01,790][140508] Loop rollout_proc5_evt_loop terminating...
[2023-08-17 13:38:01,790][140404] Component RolloutWorker_w3 stopped!
[2023-08-17 13:38:01,790][140404] Component RolloutWorker_w5 stopped!
[2023-08-17 13:38:01,794][140510] Stopping RolloutWorker_w6...
[2023-08-17 13:38:01,794][140510] Loop rollout_proc6_evt_loop terminating...
[2023-08-17 13:38:01,794][140404] Component RolloutWorker_w6 stopped!
[2023-08-17 13:38:01,795][140507] Stopping RolloutWorker_w4...
[2023-08-17 13:38:01,795][140507] Loop rollout_proc4_evt_loop terminating...
[2023-08-17 13:38:01,795][140404] Component RolloutWorker_w4 stopped!
[2023-08-17 13:38:01,810][140504] Stopping RolloutWorker_w1...
[2023-08-17 13:38:01,810][140504] Loop rollout_proc1_evt_loop terminating...
[2023-08-17 13:38:01,810][140404] Component RolloutWorker_w1 stopped!
[2023-08-17 13:38:01,812][140404] Waiting for process learner_proc0 to stop...
[2023-08-17 13:38:02,374][140404] Waiting for process inference_proc0-0 to join...
[2023-08-17 13:38:02,375][140404] Waiting for process rollout_proc0 to join...
[2023-08-17 13:38:02,376][140404] Waiting for process rollout_proc1 to join...
[2023-08-17 13:38:02,376][140404] Waiting for process rollout_proc2 to join...
[2023-08-17 13:38:02,377][140404] Waiting for process rollout_proc3 to join...
[2023-08-17 13:38:02,378][140404] Waiting for process rollout_proc4 to join...
[2023-08-17 13:38:02,379][140404] Waiting for process rollout_proc5 to join...
[2023-08-17 13:38:02,379][140404] Waiting for process rollout_proc6 to join...
[2023-08-17 13:38:02,380][140404] Waiting for process rollout_proc7 to join...
[2023-08-17 13:38:02,381][140404] Batcher 0 profile tree view:
batching: 196.9234, releasing_batches: 0.2545
[2023-08-17 13:38:02,381][140404] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0002
  wait_policy_total: 25.1254
update_model: 25.4676
  weight_update: 0.0006
one_step: 0.0011
  handle_policy_step: 1486.4814
    deserialize: 86.4364, stack: 6.3255, obs_to_device_normalize: 355.7892, forward: 711.8119, send_messages: 78.6610
    prepare_outputs: 184.2334
      to_cpu: 114.3697
[2023-08-17 13:38:02,382][140404] Learner 0 profile tree view:
misc: 0.0784, prepare_batch: 70.8499
train: 241.4686
  epoch_init: 0.0762, minibatch_init: 0.0640, losses_postprocess: 6.2109, kl_divergence: 4.4716, after_optimizer: 2.6542
  calculate_losses: 90.7278
    losses_init: 0.0406, forward_head: 5.7252, bptt_initial: 53.1618, tail: 6.2433, advantages_returns: 1.5079, losses: 11.9414
    bptt: 10.3237
      bptt_forward_core: 9.8118
  update: 133.0732
    clip: 64.8700
[2023-08-17 13:38:02,382][140404] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 1.3702, enqueue_policy_requests: 55.4323, env_step: 1009.7051, overhead: 92.9206, complete_rollouts: 1.2538
save_policy_outputs: 89.0069
  split_output_tensors: 41.5316
[2023-08-17 13:38:02,382][140404] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 1.3206, enqueue_policy_requests: 53.6399, env_step: 1001.1849, overhead: 91.9858, complete_rollouts: 1.1714
save_policy_outputs: 87.5670
  split_output_tensors: 40.8464
[2023-08-17 13:38:02,383][140404] Loop Runner_EvtLoop terminating...
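The profile tree views above are cumulative wall-clock seconds per stage, nested by caller, and they show where the run spent its time: environment stepping dominates the rollout workers and the policy forward pass dominates inference. A quick sanity check against the main_loop total in the runner summary just below (the components run in parallel processes, so their shares need not sum to 100%):

    # Seconds copied from the profile trees above and the runner summary below.
    main_loop = 1617.0664
    env_step_w0 = 1009.7051   # RolloutWorker_w0: env_step
    forward = 711.8119        # InferenceWorker_p0-w0: handle_policy_step / forward

    print(f"env_step share of wall clock: {env_step_w0 / main_loop:.1%}")  # ~62.4%
    print(f"forward share of wall clock:  {forward / main_loop:.1%}")      # ~44.0%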
[2023-08-17 13:38:02,383][140404] Runner profile tree view:
main_loop: 1617.0664
[2023-08-17 13:38:02,384][140404] Collected {0: 100007936}, FPS: 52914.0
[2023-08-17 14:01:41,022][140404] Loading existing experiment configuration from /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
[2023-08-17 14:01:41,023][140404] Overriding arg 'num_workers' with value 1 passed from command line
[2023-08-17 14:01:41,024][140404] Adding new argument 'no_render'=True that is not in the saved config file!
[2023-08-17 14:01:41,024][140404] Adding new argument 'save_video'=True that is not in the saved config file!
[2023-08-17 14:01:41,024][140404] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2023-08-17 14:01:41,025][140404] Adding new argument 'video_name'=None that is not in the saved config file!
[2023-08-17 14:01:41,025][140404] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2023-08-17 14:01:41,026][140404] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2023-08-17 14:01:41,026][140404] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2023-08-17 14:01:41,026][140404] Adding new argument 'hf_repository'='patonw/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2023-08-17 14:01:41,027][140404] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2023-08-17 14:01:41,027][140404] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2023-08-17 14:01:41,028][140404] Adding new argument 'train_script'=None that is not in the saved config file!
[2023-08-17 14:01:41,028][140404] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2023-08-17 14:01:41,028][140404] Using frameskip 1 and render_action_repeat=4 for evaluation
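For the evaluation run, the saved config.json is reloaded and reconciled with the current command line: keys already present in the saved config are overridden, while keys the training run never saw are added with a warning, exactly as the messages above show. A minimal sketch of that merge over plain dicts (load_and_override is an illustrative name, not the library's API):

    import json
    from pathlib import Path

    def load_and_override(config_path: Path, cli_args: dict) -> dict:
        cfg = json.loads(config_path.read_text())
        for key, value in cli_args.items():
            if key in cfg:
                print(f"Overriding arg {key!r} with value {value!r} passed from command line")
            else:
                print(f"Adding new argument {key!r}={value!r} that is not in the saved config file!")
            cfg[key] = value
        return cfg

    # e.g. load_and_override(Path("train_dir/default_experiment/config.json"),
    #                        {"num_workers": 1, "no_render": True, "max_num_episodes": 10})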
[2023-08-17 14:01:43,293][140404] Num frames 100...
[2023-08-17 14:01:43,355][140404] Num frames 200...
[2023-08-17 14:01:43,413][140404] Num frames 300...
[2023-08-17 14:01:43,470][140404] Num frames 400...
[2023-08-17 14:01:43,529][140404] Num frames 500...
[2023-08-17 14:01:43,587][140404] Num frames 600...
[2023-08-17 14:01:43,645][140404] Num frames 700...
[2023-08-17 14:01:43,704][140404] Num frames 800...
[2023-08-17 14:01:43,762][140404] Num frames 900...
[2023-08-17 14:01:43,819][140404] Num frames 1000...
[2023-08-17 14:01:43,881][140404] Num frames 1100...
[2023-08-17 14:01:43,941][140404] Num frames 1200...
[2023-08-17 14:01:43,999][140404] Num frames 1300...
[2023-08-17 14:01:44,057][140404] Num frames 1400...
[2023-08-17 14:01:44,126][140404] Num frames 1500...
[2023-08-17 14:01:44,189][140404] Num frames 1600...
[2023-08-17 14:01:44,252][140404] Num frames 1700...
[2023-08-17 14:01:44,312][140404] Num frames 1800...
[2023-08-17 14:01:44,372][140404] Num frames 1900...
[2023-08-17 14:01:44,432][140404] Num frames 2000...
[2023-08-17 14:01:44,492][140404] Num frames 2100...
[2023-08-17 14:01:44,544][140404] Avg episode rewards: #0: 60.998, true rewards: #0: 21.000
[2023-08-17 14:01:44,544][140404] Avg episode reward: 60.998, avg true_objective: 21.000
[2023-08-17 14:01:44,602][140404] Num frames 2200...
[2023-08-17 14:01:44,661][140404] Num frames 2300...
[2023-08-17 14:01:44,720][140404] Num frames 2400...
[2023-08-17 14:01:44,780][140404] Num frames 2500...
[2023-08-17 14:01:44,840][140404] Num frames 2600...
[2023-08-17 14:01:44,901][140404] Num frames 2700...
[2023-08-17 14:01:44,969][140404] Num frames 2800...
[2023-08-17 14:01:45,032][140404] Num frames 2900...
[2023-08-17 14:01:45,095][140404] Num frames 3000...
[2023-08-17 14:01:45,157][140404] Num frames 3100...
[2023-08-17 14:01:45,218][140404] Num frames 3200...
[2023-08-17 14:01:45,283][140404] Num frames 3300...
[2023-08-17 14:01:45,348][140404] Avg episode rewards: #0: 46.594, true rewards: #0: 16.595
[2023-08-17 14:01:45,349][140404] Avg episode reward: 46.594, avg true_objective: 16.595
[2023-08-17 14:01:45,398][140404] Num frames 3400...
[2023-08-17 14:01:45,459][140404] Num frames 3500...
[2023-08-17 14:01:45,518][140404] Num frames 3600...
[2023-08-17 14:01:45,578][140404] Num frames 3700...
[2023-08-17 14:01:45,637][140404] Num frames 3800...
[2023-08-17 14:01:45,695][140404] Num frames 3900...
[2023-08-17 14:01:45,754][140404] Num frames 4000...
[2023-08-17 14:01:45,813][140404] Num frames 4100...
[2023-08-17 14:01:45,871][140404] Num frames 4200...
[2023-08-17 14:01:45,932][140404] Num frames 4300...
[2023-08-17 14:01:45,993][140404] Num frames 4400...
[2023-08-17 14:01:46,055][140404] Num frames 4500...
[2023-08-17 14:01:46,114][140404] Num frames 4600...
[2023-08-17 14:01:46,174][140404] Num frames 4700...
[2023-08-17 14:01:46,233][140404] Num frames 4800...
[2023-08-17 14:01:46,292][140404] Num frames 4900...
[2023-08-17 14:01:46,352][140404] Num frames 5000...
[2023-08-17 14:01:46,413][140404] Num frames 5100...
[2023-08-17 14:01:46,475][140404] Num frames 5200...
[2023-08-17 14:01:46,533][140404] Num frames 5300...
[2023-08-17 14:01:46,594][140404] Num frames 5400...
[2023-08-17 14:01:46,658][140404] Avg episode rewards: #0: 50.396, true rewards: #0: 18.063
[2023-08-17 14:01:46,658][140404] Avg episode reward: 50.396, avg true_objective: 18.063
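"Avg episode rewards" is a running mean over the episodes completed so far, so individual episode returns can be recovered by differencing consecutive averages. A small check against the three averages printed above:

    # Recover per-episode returns from the running means logged above.
    avgs = [60.998, 46.594, 50.396]  # running averages after episodes 1, 2, 3
    prev = [0.0] + avgs[:-1]
    episodes = [n * a - (n - 1) * p for n, (a, p) in enumerate(zip(avgs, prev), 1)]
    print(episodes)  # ~ [60.998, 32.190, 58.000]

The "true rewards" column tracks the environment's unshaped objective separately from the shaped training reward; here it roughly equals each episode's length in hundreds of frames (episode 1 ends at frame 2100 with a true reward of 21.000), consistent with the survive-as-long-as-possible objective of the health-gathering scenario.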
[2023-08-17 14:01:46,708][140404] Num frames 5500...
[2023-08-17 14:01:46,767][140404] Num frames 5600...
[2023-08-17 14:01:46,827][140404] Num frames 5700...
[2023-08-17 14:01:46,886][140404] Num frames 5800...
[2023-08-17 14:01:46,946][140404] Num frames 5900...
[2023-08-17 14:01:47,005][140404] Num frames 6000...
[2023-08-17 14:01:47,065][140404] Num frames 6100...
[2023-08-17 14:01:47,125][140404] Num frames 6200...
[2023-08-17 14:01:47,184][140404] Num frames 6300...
[2023-08-17 14:01:47,244][140404] Num frames 6400...
[2023-08-17 14:01:47,305][140404] Num frames 6500...
[2023-08-17 14:01:47,365][140404] Num frames 6600...
[2023-08-17 14:01:47,425][140404] Num frames 6700...
[2023-08-17 14:01:47,483][140404] Num frames 6800...
[2023-08-17 14:01:47,542][140404] Num frames 6900...
[2023-08-17 14:01:47,599][140404] Num frames 7000...
[2023-08-17 14:01:47,659][140404] Num frames 7100...
[2023-08-17 14:01:47,718][140404] Num frames 7200...
[2023-08-17 14:01:47,780][140404] Num frames 7300...
[2023-08-17 14:01:47,842][140404] Num frames 7400...
[2023-08-17 14:01:47,920][140404] Num frames 7500...
[2023-08-17 14:01:47,984][140404] Avg episode rewards: #0: 52.796, true rewards: #0: 18.798
[2023-08-17 14:01:47,985][140404] Avg episode reward: 52.796, avg true_objective: 18.798
[2023-08-17 14:01:48,035][140404] Num frames 7600...
[2023-08-17 14:01:48,099][140404] Num frames 7700...
[2023-08-17 14:01:48,174][140404] Num frames 7800...
[2023-08-17 14:01:48,233][140404] Num frames 7900...
[2023-08-17 14:01:48,302][140404] Num frames 8000...
[2023-08-17 14:01:48,374][140404] Num frames 8100...
[2023-08-17 14:01:48,434][140404] Num frames 8200...
[2023-08-17 14:01:48,494][140404] Num frames 8300...
[2023-08-17 14:01:48,553][140404] Num frames 8400...
[2023-08-17 14:01:48,634][140404] Avg episode rewards: #0: 46.293, true rewards: #0: 16.894
[2023-08-17 14:01:48,635][140404] Avg episode reward: 46.293, avg true_objective: 16.894
[2023-08-17 14:01:48,665][140404] Num frames 8500...
[2023-08-17 14:01:48,723][140404] Num frames 8600...
[2023-08-17 14:01:48,782][140404] Num frames 8700...
[2023-08-17 14:01:48,839][140404] Num frames 8800...
[2023-08-17 14:01:48,899][140404] Num frames 8900...
[2023-08-17 14:01:48,959][140404] Num frames 9000...
[2023-08-17 14:01:49,018][140404] Num frames 9100...
[2023-08-17 14:01:49,078][140404] Num frames 9200...
[2023-08-17 14:01:49,135][140404] Num frames 9300...
[2023-08-17 14:01:49,190][140404] Num frames 9400...
[2023-08-17 14:01:49,245][140404] Num frames 9500...
[2023-08-17 14:01:49,299][140404] Num frames 9600...
[2023-08-17 14:01:49,357][140404] Num frames 9700...
[2023-08-17 14:01:49,417][140404] Num frames 9800...
[2023-08-17 14:01:49,474][140404] Num frames 9900...
[2023-08-17 14:01:49,532][140404] Num frames 10000...
[2023-08-17 14:01:49,590][140404] Num frames 10100...
[2023-08-17 14:01:49,648][140404] Num frames 10200...
[2023-08-17 14:01:49,704][140404] Num frames 10300...
[2023-08-17 14:01:49,760][140404] Num frames 10400...
[2023-08-17 14:01:49,814][140404] Num frames 10500...
[2023-08-17 14:01:49,891][140404] Avg episode rewards: #0: 48.244, true rewards: #0: 17.578
[2023-08-17 14:01:49,892][140404] Avg episode reward: 48.244, avg true_objective: 17.578
[2023-08-17 14:01:49,920][140404] Num frames 10600...
[2023-08-17 14:01:49,975][140404] Num frames 10700...
[2023-08-17 14:01:50,027][140404] Num frames 10800...
[2023-08-17 14:01:50,080][140404] Num frames 10900...
[2023-08-17 14:01:50,132][140404] Num frames 11000...
[2023-08-17 14:01:50,185][140404] Num frames 11100...
[2023-08-17 14:01:50,237][140404] Num frames 11200...
[2023-08-17 14:01:50,289][140404] Num frames 11300...
[2023-08-17 14:01:50,343][140404] Num frames 11400...
[2023-08-17 14:01:50,396][140404] Num frames 11500...
[2023-08-17 14:01:50,450][140404] Num frames 11600...
[2023-08-17 14:01:50,504][140404] Num frames 11700...
[2023-08-17 14:01:50,557][140404] Num frames 11800...
[2023-08-17 14:01:50,610][140404] Num frames 11900...
[2023-08-17 14:01:50,664][140404] Num frames 12000...
[2023-08-17 14:01:50,717][140404] Num frames 12100...
[2023-08-17 14:01:50,769][140404] Num frames 12200...
[2023-08-17 14:01:50,823][140404] Num frames 12300...
[2023-08-17 14:01:50,880][140404] Num frames 12400...
[2023-08-17 14:01:50,934][140404] Num frames 12500...
[2023-08-17 14:01:50,989][140404] Num frames 12600...
[2023-08-17 14:01:51,057][140404] Avg episode rewards: #0: 49.609, true rewards: #0: 18.039
[2023-08-17 14:01:51,058][140404] Avg episode reward: 49.609, avg true_objective: 18.039
[2023-08-17 14:01:51,098][140404] Num frames 12700...
[2023-08-17 14:01:51,155][140404] Num frames 12800...
[2023-08-17 14:01:51,210][140404] Num frames 12900...
[2023-08-17 14:01:51,264][140404] Num frames 13000...
[2023-08-17 14:01:51,318][140404] Num frames 13100...
[2023-08-17 14:01:51,372][140404] Num frames 13200...
[2023-08-17 14:01:51,426][140404] Num frames 13300...
[2023-08-17 14:01:51,480][140404] Num frames 13400...
[2023-08-17 14:01:51,532][140404] Num frames 13500...
[2023-08-17 14:01:51,587][140404] Num frames 13600...
[2023-08-17 14:01:51,639][140404] Num frames 13700...
[2023-08-17 14:01:51,692][140404] Num frames 13800...
[2023-08-17 14:01:51,745][140404] Num frames 13900...
[2023-08-17 14:01:51,799][140404] Num frames 14000...
[2023-08-17 14:01:51,853][140404] Num frames 14100...
[2023-08-17 14:01:51,906][140404] Num frames 14200...
[2023-08-17 14:01:51,960][140404] Num frames 14300...
[2023-08-17 14:01:52,015][140404] Num frames 14400...
[2023-08-17 14:01:52,073][140404] Num frames 14500...
[2023-08-17 14:01:52,129][140404] Num frames 14600...
[2023-08-17 14:01:52,186][140404] Num frames 14700...
[2023-08-17 14:01:52,254][140404] Avg episode rewards: #0: 51.033, true rewards: #0: 18.409
[2023-08-17 14:01:52,255][140404] Avg episode reward: 51.033, avg true_objective: 18.409
[2023-08-17 14:01:52,296][140404] Num frames 14800...
[2023-08-17 14:01:52,354][140404] Num frames 14900...
[2023-08-17 14:01:52,412][140404] Num frames 15000...
[2023-08-17 14:01:52,465][140404] Num frames 15100...
[2023-08-17 14:01:52,518][140404] Num frames 15200...
[2023-08-17 14:01:52,572][140404] Num frames 15300...
[2023-08-17 14:01:52,626][140404] Num frames 15400...
[2023-08-17 14:01:52,683][140404] Num frames 15500...
[2023-08-17 14:01:52,738][140404] Num frames 15600...
[2023-08-17 14:01:52,792][140404] Num frames 15700...
[2023-08-17 14:01:52,846][140404] Num frames 15800...
[2023-08-17 14:01:52,899][140404] Num frames 15900...
[2023-08-17 14:01:52,952][140404] Num frames 16000...
[2023-08-17 14:01:53,006][140404] Num frames 16100...
[2023-08-17 14:01:53,058][140404] Num frames 16200...
[2023-08-17 14:01:53,113][140404] Num frames 16300...
[2023-08-17 14:01:53,165][140404] Num frames 16400...
[2023-08-17 14:01:53,219][140404] Num frames 16500...
[2023-08-17 14:01:53,299][140404] Avg episode rewards: #0: 50.500, true rewards: #0: 18.390
[2023-08-17 14:01:53,300][140404] Avg episode reward: 50.500, avg true_objective: 18.390
[2023-08-17 14:01:53,326][140404] Num frames 16600...
[2023-08-17 14:01:53,379][140404] Num frames 16700...
[2023-08-17 14:01:53,433][140404] Num frames 16800...
[2023-08-17 14:01:53,485][140404] Num frames 16900...
[2023-08-17 14:01:53,538][140404] Num frames 17000...
[2023-08-17 14:01:53,591][140404] Num frames 17100...
[2023-08-17 14:01:53,644][140404] Num frames 17200...
[2023-08-17 14:01:53,698][140404] Num frames 17300...
[2023-08-17 14:01:53,753][140404] Num frames 17400...
[2023-08-17 14:01:53,808][140404] Num frames 17500...
[2023-08-17 14:01:53,861][140404] Num frames 17600...
[2023-08-17 14:01:53,914][140404] Num frames 17700...
[2023-08-17 14:01:53,968][140404] Num frames 17800...
[2023-08-17 14:01:54,022][140404] Num frames 17900...
[2023-08-17 14:01:54,076][140404] Num frames 18000...
[2023-08-17 14:01:54,129][140404] Num frames 18100...
[2023-08-17 14:01:54,192][140404] Avg episode rewards: #0: 49.718, true rewards: #0: 18.119
[2023-08-17 14:01:54,193][140404] Avg episode reward: 49.718, avg true_objective: 18.119
[2023-08-17 14:02:11,577][140404] Replay video saved to /home/patonw/code/learn/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
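Since the run was configured with push_to_hub=True and an hf_repository, the enjoy script presumably uploads the experiment directory (checkpoint, config.json, and the replay.mp4 saved here) to the Hugging Face Hub after this step. A roughly equivalent manual upload, assuming huggingface_hub is installed, you are authenticated, and the target repo already exists:

    from huggingface_hub import HfApi

    # Upload the experiment directory (checkpoint_p0/, config.json, replay.mp4).
    HfApi().upload_folder(
        repo_id="patonw/rl_course_vizdoom_health_gathering_supreme",
        folder_path="train_dir/default_experiment",
        repo_type="model",
    )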