diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -982,3 +982,1587 @@ main_loop: 1154.1242 [2023-02-27 11:51:38,412][00107] Avg episode rewards: #0: 4.528, true rewards: #0: 4.128 [2023-02-27 11:51:38,414][00107] Avg episode reward: 4.528, avg true_objective: 4.128 [2023-02-27 11:51:59,229][00107] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2023-02-27 11:52:02,875][00107] The model has been pushed to https://huggingface.co/KoRiF/rl_course_vizdoom_health_gathering_supreme +[2023-02-27 11:53:19,430][00107] Environment doom_basic already registered, overwriting... +[2023-02-27 11:53:19,433][00107] Environment doom_two_colors_easy already registered, overwriting... +[2023-02-27 11:53:19,435][00107] Environment doom_two_colors_hard already registered, overwriting... +[2023-02-27 11:53:19,436][00107] Environment doom_dm already registered, overwriting... +[2023-02-27 11:53:19,438][00107] Environment doom_dwango5 already registered, overwriting... +[2023-02-27 11:53:19,440][00107] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2023-02-27 11:53:19,441][00107] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2023-02-27 11:53:19,443][00107] Environment doom_my_way_home already registered, overwriting... +[2023-02-27 11:53:19,445][00107] Environment doom_deadly_corridor already registered, overwriting... +[2023-02-27 11:53:19,451][00107] Environment doom_defend_the_center already registered, overwriting... +[2023-02-27 11:53:19,452][00107] Environment doom_defend_the_line already registered, overwriting... +[2023-02-27 11:53:19,454][00107] Environment doom_health_gathering already registered, overwriting... +[2023-02-27 11:53:19,456][00107] Environment doom_health_gathering_supreme already registered, overwriting... +[2023-02-27 11:53:19,458][00107] Environment doom_battle already registered, overwriting... +[2023-02-27 11:53:19,459][00107] Environment doom_battle2 already registered, overwriting... +[2023-02-27 11:53:19,460][00107] Environment doom_duel_bots already registered, overwriting... +[2023-02-27 11:53:19,462][00107] Environment doom_deathmatch_bots already registered, overwriting... +[2023-02-27 11:53:19,463][00107] Environment doom_duel already registered, overwriting... +[2023-02-27 11:53:19,464][00107] Environment doom_deathmatch_full already registered, overwriting... +[2023-02-27 11:53:19,466][00107] Environment doom_benchmark already registered, overwriting... +[2023-02-27 11:53:19,467][00107] register_encoder_factory: +[2023-02-27 11:53:19,500][00107] Loading legacy config file train_dir/doom_deathmatch_bots_2222/cfg.json instead of train_dir/doom_deathmatch_bots_2222/config.json +[2023-02-27 11:53:19,502][00107] Loading existing experiment configuration from train_dir/doom_deathmatch_bots_2222/config.json +[2023-02-27 11:53:19,503][00107] Overriding arg 'experiment' with value 'doom_deathmatch_bots_2222' passed from command line +[2023-02-27 11:53:19,505][00107] Overriding arg 'train_dir' with value 'train_dir' passed from command line +[2023-02-27 11:53:19,506][00107] Overriding arg 'num_workers' with value 1 passed from command line +[2023-02-27 11:53:19,508][00107] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! +[2023-02-27 11:53:19,509][00107] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! 
+[2023-02-27 11:53:19,510][00107] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! +[2023-02-27 11:53:19,512][00107] Adding new argument 'no_render'=True that is not in the saved config file! +[2023-02-27 11:53:19,513][00107] Adding new argument 'save_video'=True that is not in the saved config file! +[2023-02-27 11:53:19,514][00107] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2023-02-27 11:53:19,516][00107] Adding new argument 'video_name'=None that is not in the saved config file! +[2023-02-27 11:53:19,517][00107] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2023-02-27 11:53:19,518][00107] Adding new argument 'max_num_episodes'=1 that is not in the saved config file! +[2023-02-27 11:53:19,520][00107] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2023-02-27 11:53:19,521][00107] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2023-02-27 11:53:19,522][00107] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2023-02-27 11:53:19,524][00107] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2023-02-27 11:53:19,525][00107] Adding new argument 'train_script'=None that is not in the saved config file! +[2023-02-27 11:53:19,526][00107] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2023-02-27 11:53:19,528][00107] Using frameskip 1 and render_action_repeat=4 for evaluation +[2023-02-27 11:53:19,571][00107] Port 40300 is available +[2023-02-27 11:53:19,574][00107] Using port 40300 +[2023-02-27 11:53:19,576][00107] RunningMeanStd input shape: (23,) +[2023-02-27 11:53:19,580][00107] RunningMeanStd input shape: (3, 72, 128) +[2023-02-27 11:53:19,581][00107] RunningMeanStd input shape: (1,) +[2023-02-27 11:53:19,598][00107] ConvEncoder: input_channels=3 +[2023-02-27 11:53:19,643][00107] Conv encoder output size: 512 +[2023-02-27 11:53:19,646][00107] Policy head output size: 512 +[2023-02-27 11:53:19,693][00107] Loading state from checkpoint train_dir/doom_deathmatch_bots_2222/checkpoint_p0/checkpoint_000282220_2311946240.pth... +[2023-02-27 11:53:19,728][00107] Using port 40300 on host... +[2023-02-27 11:53:20,084][00107] Initialized w:0 v:0 player:0 +[2023-02-27 11:53:20,374][00107] Num frames 100... +[2023-02-27 11:53:20,638][00107] Num frames 200... +[2023-02-27 11:53:20,888][00107] Num frames 300... +[2023-02-27 11:53:21,139][00107] Num frames 400... +[2023-02-27 11:53:21,416][00107] Num frames 500... +[2023-02-27 11:53:21,671][00107] Num frames 600... +[2023-02-27 11:53:21,928][00107] Num frames 700... +[2023-02-27 11:53:22,186][00107] Num frames 800... +[2023-02-27 11:53:22,442][00107] Num frames 900... +[2023-02-27 11:53:22,706][00107] Num frames 1000... +[2023-02-27 11:53:22,966][00107] Num frames 1100... +[2023-02-27 11:53:23,172][00107] Num frames 1200... +[2023-02-27 11:53:23,367][00107] Num frames 1300... +[2023-02-27 11:53:23,549][00107] Num frames 1400... +[2023-02-27 11:53:23,740][00107] Num frames 1500... +[2023-02-27 11:53:23,930][00107] Num frames 1600... +[2023-02-27 11:53:24,118][00107] Num frames 1700... +[2023-02-27 11:53:24,294][00107] Num frames 1800... +[2023-02-27 11:53:24,484][00107] Num frames 1900... +[2023-02-27 11:53:24,679][00107] Num frames 2000... +[2023-02-27 11:53:24,868][00107] Num frames 2100... +[2023-02-27 11:53:25,049][00107] Num frames 2200... 
+[2023-02-27 11:53:25,236][00107] Num frames 2300... +[2023-02-27 11:53:25,411][00107] Num frames 2400... +[2023-02-27 11:53:25,592][00107] Num frames 2500... +[2023-02-27 11:53:25,767][00107] Num frames 2600... +[2023-02-27 11:53:25,946][00107] Num frames 2700... +[2023-02-27 11:53:26,126][00107] Num frames 2800... +[2023-02-27 11:53:26,307][00107] Num frames 2900... +[2023-02-27 11:53:26,495][00107] Num frames 3000... +[2023-02-27 11:53:26,678][00107] Num frames 3100... +[2023-02-27 11:53:26,862][00107] Num frames 3200... +[2023-02-27 11:53:27,051][00107] Num frames 3300... +[2023-02-27 11:53:27,230][00107] Num frames 3400... +[2023-02-27 11:53:27,406][00107] Num frames 3500... +[2023-02-27 11:53:27,600][00107] Num frames 3600... +[2023-02-27 11:53:27,780][00107] Num frames 3700... +[2023-02-27 11:53:27,954][00107] Num frames 3800... +[2023-02-27 11:53:28,131][00107] Num frames 3900... +[2023-02-27 11:53:28,312][00107] Num frames 4000... +[2023-02-27 11:53:28,494][00107] Num frames 4100... +[2023-02-27 11:53:28,691][00107] Num frames 4200... +[2023-02-27 11:53:28,869][00107] Num frames 4300... +[2023-02-27 11:53:29,046][00107] Num frames 4400... +[2023-02-27 11:53:29,228][00107] Num frames 4500... +[2023-02-27 11:53:29,411][00107] Num frames 4600... +[2023-02-27 11:53:29,583][00107] Num frames 4700... +[2023-02-27 11:53:29,776][00107] Num frames 4800... +[2023-02-27 11:53:29,972][00107] Num frames 4900... +[2023-02-27 11:53:30,156][00107] Num frames 5000... +[2023-02-27 11:53:30,377][00107] Num frames 5100... +[2023-02-27 11:53:30,567][00107] Num frames 5200... +[2023-02-27 11:53:30,761][00107] Num frames 5300... +[2023-02-27 11:53:30,951][00107] Num frames 5400... +[2023-02-27 11:53:31,126][00107] Num frames 5500... +[2023-02-27 11:53:31,315][00107] Num frames 5600... +[2023-02-27 11:53:31,488][00107] Num frames 5700... +[2023-02-27 11:53:31,708][00107] Num frames 5800... +[2023-02-27 11:53:31,885][00107] Num frames 5900... +[2023-02-27 11:53:32,067][00107] Num frames 6000... +[2023-02-27 11:53:32,241][00107] Num frames 6100... +[2023-02-27 11:53:32,423][00107] Num frames 6200... +[2023-02-27 11:53:32,608][00107] Num frames 6300... +[2023-02-27 11:53:32,791][00107] Num frames 6400... +[2023-02-27 11:53:32,983][00107] Num frames 6500... +[2023-02-27 11:53:33,228][00107] Num frames 6600... +[2023-02-27 11:53:33,485][00107] Num frames 6700... +[2023-02-27 11:53:33,749][00107] Num frames 6800... +[2023-02-27 11:53:34,011][00107] Num frames 6900... +[2023-02-27 11:53:34,262][00107] Num frames 7000... +[2023-02-27 11:53:34,511][00107] Num frames 7100... +[2023-02-27 11:53:34,771][00107] Num frames 7200... +[2023-02-27 11:53:35,021][00107] Num frames 7300... +[2023-02-27 11:53:35,274][00107] Num frames 7400... +[2023-02-27 11:53:35,526][00107] Num frames 7500... +[2023-02-27 11:53:35,798][00107] Num frames 7600... +[2023-02-27 11:53:36,071][00107] Num frames 7700... +[2023-02-27 11:53:36,260][00107] Num frames 7800... +[2023-02-27 11:53:36,445][00107] Num frames 7900... +[2023-02-27 11:53:36,629][00107] Num frames 8000... +[2023-02-27 11:53:36,810][00107] Num frames 8100... +[2023-02-27 11:53:36,990][00107] Num frames 8200... +[2023-02-27 11:53:37,180][00107] Num frames 8300... 
+[2023-02-27 11:53:37,365][00107] DAMAGECOUNT value on done: 7437.0 +[2023-02-27 11:53:37,368][00107] Sum rewards: 97.978, reward structure: {'DEATHCOUNT': '-15.000', 'HEALTH': '-6.285', 'AMMO5': '0.008', 'AMMO2': '0.027', 'AMMO4': '0.135', 'AMMO3': '0.252', 'WEAPON4': '0.300', 'WEAPON5': '0.300', 'weapon4': '0.398', 'weapon5': '0.542', 'weapon2': '1.376', 'WEAPON3': '1.800', 'HITCOUNT': '3.990', 'weapon3': '13.824', 'DAMAGECOUNT': '22.311', 'FRAGCOUNT': '74.000'} +[2023-02-27 11:53:37,434][00107] Avg episode rewards: #0: 97.973, true rewards: #0: 74.000 +[2023-02-27 11:53:37,436][00107] Avg episode reward: 97.973, avg true_objective: 74.000 +[2023-02-27 11:53:37,445][00107] Num frames 8400... +[2023-02-27 11:54:27,397][00107] Replay video saved to train_dir/doom_deathmatch_bots_2222/replay.mp4! +[2023-02-27 12:19:45,716][00107] Environment doom_basic already registered, overwriting... +[2023-02-27 12:19:45,719][00107] Environment doom_two_colors_easy already registered, overwriting... +[2023-02-27 12:19:45,724][00107] Environment doom_two_colors_hard already registered, overwriting... +[2023-02-27 12:19:45,726][00107] Environment doom_dm already registered, overwriting... +[2023-02-27 12:19:45,733][00107] Environment doom_dwango5 already registered, overwriting... +[2023-02-27 12:19:45,735][00107] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2023-02-27 12:19:45,737][00107] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2023-02-27 12:19:45,739][00107] Environment doom_my_way_home already registered, overwriting... +[2023-02-27 12:19:45,741][00107] Environment doom_deadly_corridor already registered, overwriting... +[2023-02-27 12:19:45,743][00107] Environment doom_defend_the_center already registered, overwriting... +[2023-02-27 12:19:45,745][00107] Environment doom_defend_the_line already registered, overwriting... +[2023-02-27 12:19:45,747][00107] Environment doom_health_gathering already registered, overwriting... +[2023-02-27 12:19:45,748][00107] Environment doom_health_gathering_supreme already registered, overwriting... +[2023-02-27 12:19:45,750][00107] Environment doom_battle already registered, overwriting... +[2023-02-27 12:19:45,753][00107] Environment doom_battle2 already registered, overwriting... +[2023-02-27 12:19:45,755][00107] Environment doom_duel_bots already registered, overwriting... +[2023-02-27 12:19:45,757][00107] Environment doom_deathmatch_bots already registered, overwriting... +[2023-02-27 12:19:45,760][00107] Environment doom_duel already registered, overwriting... +[2023-02-27 12:19:45,761][00107] Environment doom_deathmatch_full already registered, overwriting... +[2023-02-27 12:19:45,763][00107] Environment doom_benchmark already registered, overwriting... +[2023-02-27 12:19:45,764][00107] register_encoder_factory: +[2023-02-27 12:19:45,809][00107] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2023-02-27 12:19:45,812][00107] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line +[2023-02-27 12:19:45,819][00107] Experiment dir /content/train_dir/default_experiment already exists! +[2023-02-27 12:19:45,821][00107] Resuming existing experiment from /content/train_dir/default_experiment... 
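The resume sequence above (existing config loaded from config.json, 'train_for_env_steps' overridden to 10000000 from the command line, experiment dir reused) is what re-running the training entry point against the same train_dir produces. A minimal sketch of that call, assuming the Sample Factory 2.x module layout (parse_sf_args/parse_full_cfg, run_rl, and the sf_examples VizDoom helpers); this is a reconstruction, not code quoted from this run:

    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.train import run_rl
    from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
    from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

    def parse_vizdoom_cfg(argv=None, evaluation=False):
        # first pass parses the generic Sample Factory args; the Doom helpers
        # then add env-specific args and override algo defaults before the
        # final parsing pass produces the config object
        parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
        add_doom_env_args(parser)
        doom_override_defaults(parser)
        return parse_full_cfg(parser, argv)

    register_vizdoom_components()
    # restart_behavior=resume (see the config dump below), so pointing at the
    # same experiment dir continues from the latest checkpoint rather than
    # starting a fresh run
    cfg = parse_vizdoom_cfg(argv=[
        "--env=doom_health_gathering_supreme",
        "--num_workers=8",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=10000000",
    ])
    status = run_rl(cfg)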
+[2023-02-27 12:19:45,824][00107] Weights and Biases integration disabled +[2023-02-27 12:19:45,829][00107] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2023-02-27 12:19:47,689][00107] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/content/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=10000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2023-02-27 12:19:47,692][00107] Saving configuration to /content/train_dir/default_experiment/config.json... 
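The config that was just saved also explains the batch accounting in the FPS lines that follow. With num_workers=8 and num_envs_per_worker=4 there are 32 parallel envs; one rollout of 32 steps from each env yields exactly batch_size=1024 samples, and with env_frameskip=4 each training batch corresponds to 4096 game frames, which is why the checkpoint names advance in that ratio (e.g. train_step 978 x 4096 = 4005888 env steps in checkpoint_000000978_4005888.pth). A quick arithmetic check of that reading (my interpretation of the config values, not output from this run):

    # values copied from the config dump above
    num_workers = 8
    num_envs_per_worker = 4
    rollout = 32          # env steps collected per env per rollout
    batch_size = 1024
    env_frameskip = 4

    total_envs = num_workers * num_envs_per_worker    # 32 parallel envs
    samples_per_rollout = total_envs * rollout        # 32 * 32 = 1024
    assert samples_per_rollout == batch_size          # one rollout fills one batch

    frames_per_batch = batch_size * env_frameskip     # 4096 game frames per batch
    assert 978 * frames_per_batch == 4005888          # matches the checkpoint name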
+[2023-02-27 12:19:47,697][00107] Rollout worker 0 uses device cpu +[2023-02-27 12:19:47,699][00107] Rollout worker 1 uses device cpu +[2023-02-27 12:19:47,700][00107] Rollout worker 2 uses device cpu +[2023-02-27 12:19:47,702][00107] Rollout worker 3 uses device cpu +[2023-02-27 12:19:47,704][00107] Rollout worker 4 uses device cpu +[2023-02-27 12:19:47,705][00107] Rollout worker 5 uses device cpu +[2023-02-27 12:19:47,707][00107] Rollout worker 6 uses device cpu +[2023-02-27 12:19:47,708][00107] Rollout worker 7 uses device cpu +[2023-02-27 12:19:47,832][00107] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 12:19:47,834][00107] InferenceWorker_p0-w0: min num requests: 2 +[2023-02-27 12:19:47,867][00107] Starting all processes... +[2023-02-27 12:19:47,869][00107] Starting process learner_proc0 +[2023-02-27 12:19:48,003][00107] Starting all processes... +[2023-02-27 12:19:48,013][00107] Starting process inference_proc0-0 +[2023-02-27 12:19:48,013][00107] Starting process rollout_proc0 +[2023-02-27 12:19:48,014][00107] Starting process rollout_proc1 +[2023-02-27 12:19:48,014][00107] Starting process rollout_proc2 +[2023-02-27 12:19:48,018][00107] Starting process rollout_proc4 +[2023-02-27 12:19:48,018][00107] Starting process rollout_proc5 +[2023-02-27 12:19:48,018][00107] Starting process rollout_proc6 +[2023-02-27 12:19:48,018][00107] Starting process rollout_proc7 +[2023-02-27 12:19:48,018][00107] Starting process rollout_proc3 +[2023-02-27 12:19:56,416][36588] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 12:19:56,416][36588] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2023-02-27 12:19:56,463][36588] Num visible devices: 1 +[2023-02-27 12:19:56,501][36588] Starting seed is not provided +[2023-02-27 12:19:56,501][36588] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 12:19:56,502][36588] Initializing actor-critic model on device cuda:0 +[2023-02-27 12:19:56,503][36588] RunningMeanStd input shape: (3, 72, 128) +[2023-02-27 12:19:56,504][36588] RunningMeanStd input shape: (1,) +[2023-02-27 12:19:56,587][36588] ConvEncoder: input_channels=3 +[2023-02-27 12:19:57,850][36602] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 12:19:57,850][36602] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2023-02-27 12:19:57,858][36588] Conv encoder output size: 512 +[2023-02-27 12:19:57,864][36588] Policy head output size: 512 +[2023-02-27 12:19:57,951][36602] Num visible devices: 1 +[2023-02-27 12:19:58,081][36588] Created Actor Critic model with architecture: +[2023-02-27 12:19:58,087][36588] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): 
RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2023-02-27 12:19:58,217][36605] Worker 2 uses CPU cores [0] +[2023-02-27 12:19:58,453][36603] Worker 1 uses CPU cores [1] +[2023-02-27 12:19:59,488][36611] Worker 0 uses CPU cores [0] +[2023-02-27 12:19:59,706][36613] Worker 5 uses CPU cores [1] +[2023-02-27 12:19:59,717][36615] Worker 4 uses CPU cores [0] +[2023-02-27 12:19:59,902][36619] Worker 3 uses CPU cores [1] +[2023-02-27 12:19:59,910][36617] Worker 6 uses CPU cores [0] +[2023-02-27 12:19:59,993][36625] Worker 7 uses CPU cores [1] +[2023-02-27 12:20:02,601][36588] Using optimizer +[2023-02-27 12:20:02,602][36588] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2023-02-27 12:20:02,638][36588] Loading model from checkpoint +[2023-02-27 12:20:02,643][36588] Loaded experiment state at self.train_step=978, self.env_steps=4005888 +[2023-02-27 12:20:02,644][36588] Initialized policy 0 weights for model version 978 +[2023-02-27 12:20:02,647][36588] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2023-02-27 12:20:02,655][36588] LearnerWorker_p0 finished initialization! +[2023-02-27 12:20:02,894][36602] RunningMeanStd input shape: (3, 72, 128) +[2023-02-27 12:20:02,895][36602] RunningMeanStd input shape: (1,) +[2023-02-27 12:20:02,909][36602] ConvEncoder: input_channels=3 +[2023-02-27 12:20:03,016][36602] Conv encoder output size: 512 +[2023-02-27 12:20:03,016][36602] Policy head output size: 512 +[2023-02-27 12:20:05,511][00107] Inference worker 0-0 is ready! +[2023-02-27 12:20:05,513][00107] All inference workers are ready! Signal rollout workers to start! +[2023-02-27 12:20:05,652][36603] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 12:20:05,655][36625] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 12:20:05,672][36613] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 12:20:05,681][36619] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 12:20:05,690][36617] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 12:20:05,689][36615] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 12:20:05,701][36605] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 12:20:05,703][36611] Doom resolution: 160x120, resize resolution: (128, 72) +[2023-02-27 12:20:05,830][00107] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-27 12:20:06,948][36611] Decorrelating experience for 0 frames... +[2023-02-27 12:20:06,945][36617] Decorrelating experience for 0 frames... +[2023-02-27 12:20:06,952][36615] Decorrelating experience for 0 frames... +[2023-02-27 12:20:07,284][36625] Decorrelating experience for 0 frames... +[2023-02-27 12:20:07,293][36603] Decorrelating experience for 0 frames... +[2023-02-27 12:20:07,306][36619] Decorrelating experience for 0 frames... +[2023-02-27 12:20:07,326][36613] Decorrelating experience for 0 frames... +[2023-02-27 12:20:07,691][36625] Decorrelating experience for 32 frames... 
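The staggered frame counts in the surrounding "Decorrelating experience" entries (0 and 32 here, 64 and 96 just below) look like each of the four envs on a worker being warmed up for env_index * rollout frames, so the envs on one worker do not all hit episode boundaries in lockstep. That is an inference from the logged numbers, not a quote of the Sample Factory source; a sketch of the apparent pattern:

    # hypothetical reconstruction of the per-env offsets seen in the log
    rollout = 32
    num_envs_per_worker = 4
    decorrelation_frames = [env_idx * rollout for env_idx in range(num_envs_per_worker)]
    print(decorrelation_frames)  # [0, 32, 64, 96] -- the values each worker logs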
+[2023-02-27 12:20:07,804][36611] Decorrelating experience for 32 frames... +[2023-02-27 12:20:07,808][36617] Decorrelating experience for 32 frames... +[2023-02-27 12:20:07,824][00107] Heartbeat connected on Batcher_0 +[2023-02-27 12:20:07,830][00107] Heartbeat connected on LearnerWorker_p0 +[2023-02-27 12:20:07,861][00107] Heartbeat connected on InferenceWorker_p0-w0 +[2023-02-27 12:20:08,282][36615] Decorrelating experience for 32 frames... +[2023-02-27 12:20:08,807][36611] Decorrelating experience for 64 frames... +[2023-02-27 12:20:08,866][36613] Decorrelating experience for 32 frames... +[2023-02-27 12:20:09,097][36603] Decorrelating experience for 32 frames... +[2023-02-27 12:20:09,102][36619] Decorrelating experience for 32 frames... +[2023-02-27 12:20:09,239][36615] Decorrelating experience for 64 frames... +[2023-02-27 12:20:09,784][36611] Decorrelating experience for 96 frames... +[2023-02-27 12:20:09,783][36625] Decorrelating experience for 64 frames... +[2023-02-27 12:20:10,022][00107] Heartbeat connected on RolloutWorker_w0 +[2023-02-27 12:20:10,234][36617] Decorrelating experience for 64 frames... +[2023-02-27 12:20:10,777][36613] Decorrelating experience for 64 frames... +[2023-02-27 12:20:10,830][00107] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-27 12:20:11,325][36615] Decorrelating experience for 96 frames... +[2023-02-27 12:20:11,346][36603] Decorrelating experience for 64 frames... +[2023-02-27 12:20:11,379][36605] Decorrelating experience for 0 frames... +[2023-02-27 12:20:11,419][36619] Decorrelating experience for 64 frames... +[2023-02-27 12:20:11,742][00107] Heartbeat connected on RolloutWorker_w4 +[2023-02-27 12:20:12,196][36617] Decorrelating experience for 96 frames... +[2023-02-27 12:20:12,858][00107] Heartbeat connected on RolloutWorker_w6 +[2023-02-27 12:20:13,551][36625] Decorrelating experience for 96 frames... +[2023-02-27 12:20:13,738][36603] Decorrelating experience for 96 frames... +[2023-02-27 12:20:13,823][36619] Decorrelating experience for 96 frames... +[2023-02-27 12:20:14,181][00107] Heartbeat connected on RolloutWorker_w7 +[2023-02-27 12:20:14,427][36605] Decorrelating experience for 32 frames... +[2023-02-27 12:20:14,541][00107] Heartbeat connected on RolloutWorker_w1 +[2023-02-27 12:20:14,586][00107] Heartbeat connected on RolloutWorker_w3 +[2023-02-27 12:20:15,830][00107] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 60.8. Samples: 608. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-27 12:20:15,832][00107] Avg episode reward: [(0, '2.176')] +[2023-02-27 12:20:17,828][36613] Decorrelating experience for 96 frames... +[2023-02-27 12:20:18,416][00107] Heartbeat connected on RolloutWorker_w5 +[2023-02-27 12:20:19,243][36588] Signal inference workers to stop experience collection... +[2023-02-27 12:20:19,261][36602] InferenceWorker_p0-w0: stopping experience collection +[2023-02-27 12:20:19,488][36605] Decorrelating experience for 64 frames... +[2023-02-27 12:20:20,052][36605] Decorrelating experience for 96 frames... +[2023-02-27 12:20:20,119][00107] Heartbeat connected on RolloutWorker_w2 +[2023-02-27 12:20:20,830][00107] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 166.7. Samples: 2500. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2023-02-27 12:20:20,832][00107] Avg episode reward: [(0, '2.882')] +[2023-02-27 12:20:22,231][36588] Signal inference workers to resume experience collection... +[2023-02-27 12:20:22,233][36602] InferenceWorker_p0-w0: resuming experience collection +[2023-02-27 12:20:25,830][00107] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 4022272. Throughput: 0: 170.4. Samples: 3408. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2023-02-27 12:20:25,838][00107] Avg episode reward: [(0, '3.743')] +[2023-02-27 12:20:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 4042752. Throughput: 0: 364.3. Samples: 9108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:20:30,833][00107] Avg episode reward: [(0, '4.363')] +[2023-02-27 12:20:31,173][36602] Updated weights for policy 0, policy_version 988 (0.0021) +[2023-02-27 12:20:35,831][00107] Fps is (10 sec: 3685.9, 60 sec: 1774.9, 300 sec: 1774.9). Total num frames: 4059136. Throughput: 0: 465.8. Samples: 13976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-27 12:20:35,842][00107] Avg episode reward: [(0, '4.626')] +[2023-02-27 12:20:40,830][00107] Fps is (10 sec: 2867.2, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 4071424. Throughput: 0: 455.8. Samples: 15952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:20:40,832][00107] Avg episode reward: [(0, '4.666')] +[2023-02-27 12:20:44,462][36602] Updated weights for policy 0, policy_version 998 (0.0017) +[2023-02-27 12:20:45,830][00107] Fps is (10 sec: 3277.3, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 4091904. Throughput: 0: 526.3. Samples: 21052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:20:45,832][00107] Avg episode reward: [(0, '4.639')] +[2023-02-27 12:20:50,830][00107] Fps is (10 sec: 4096.0, 60 sec: 2366.6, 300 sec: 2366.6). Total num frames: 4112384. Throughput: 0: 608.6. Samples: 27388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:20:50,832][00107] Avg episode reward: [(0, '4.710')] +[2023-02-27 12:20:55,830][00107] Fps is (10 sec: 3276.8, 60 sec: 2375.7, 300 sec: 2375.7). Total num frames: 4124672. Throughput: 0: 658.4. Samples: 29628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:20:55,837][00107] Avg episode reward: [(0, '4.563')] +[2023-02-27 12:20:56,202][36602] Updated weights for policy 0, policy_version 1008 (0.0023) +[2023-02-27 12:21:00,830][00107] Fps is (10 sec: 2867.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 4141056. Throughput: 0: 732.8. Samples: 33584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:21:00,832][00107] Avg episode reward: [(0, '4.694')] +[2023-02-27 12:21:05,829][00107] Fps is (10 sec: 3686.4, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 4161536. Throughput: 0: 825.5. Samples: 39648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2023-02-27 12:21:05,836][00107] Avg episode reward: [(0, '4.608')] +[2023-02-27 12:21:07,502][36602] Updated weights for policy 0, policy_version 1018 (0.0014) +[2023-02-27 12:21:10,831][00107] Fps is (10 sec: 4095.4, 60 sec: 2935.4, 300 sec: 2709.6). Total num frames: 4182016. Throughput: 0: 875.4. Samples: 42804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:21:10,833][00107] Avg episode reward: [(0, '4.884')] +[2023-02-27 12:21:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 2691.7). Total num frames: 4194304. Throughput: 0: 854.9. Samples: 47580. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:21:15,834][00107] Avg episode reward: [(0, '4.874')] +[2023-02-27 12:21:20,434][36602] Updated weights for policy 0, policy_version 1028 (0.0026) +[2023-02-27 12:21:20,830][00107] Fps is (10 sec: 2867.6, 60 sec: 3413.3, 300 sec: 2730.7). Total num frames: 4210688. Throughput: 0: 845.2. Samples: 52008. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2023-02-27 12:21:20,832][00107] Avg episode reward: [(0, '4.864')] +[2023-02-27 12:21:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2816.0). Total num frames: 4231168. Throughput: 0: 871.8. Samples: 55182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:21:25,837][00107] Avg episode reward: [(0, '4.847')] +[2023-02-27 12:21:30,175][36602] Updated weights for policy 0, policy_version 1038 (0.0014) +[2023-02-27 12:21:30,836][00107] Fps is (10 sec: 4093.3, 60 sec: 3481.2, 300 sec: 2891.1). Total num frames: 4251648. Throughput: 0: 901.1. Samples: 61606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:21:30,846][00107] Avg episode reward: [(0, '4.728')] +[2023-02-27 12:21:35,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 2867.2). Total num frames: 4263936. Throughput: 0: 850.3. Samples: 65650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:21:35,836][00107] Avg episode reward: [(0, '4.721')] +[2023-02-27 12:21:40,829][00107] Fps is (10 sec: 2869.1, 60 sec: 3481.6, 300 sec: 2888.8). Total num frames: 4280320. Throughput: 0: 845.2. Samples: 67664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:21:40,831][00107] Avg episode reward: [(0, '4.730')] +[2023-02-27 12:21:42,990][36602] Updated weights for policy 0, policy_version 1048 (0.0019) +[2023-02-27 12:21:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 2949.1). Total num frames: 4300800. Throughput: 0: 894.3. Samples: 73828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:21:45,832][00107] Avg episode reward: [(0, '4.769')] +[2023-02-27 12:21:45,845][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001050_4300800.pth... +[2023-02-27 12:21:46,004][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000916_3751936.pth +[2023-02-27 12:21:50,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3003.7). Total num frames: 4321280. Throughput: 0: 885.9. Samples: 79514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:21:50,832][00107] Avg episode reward: [(0, '4.748')] +[2023-02-27 12:21:54,951][36602] Updated weights for policy 0, policy_version 1058 (0.0025) +[2023-02-27 12:21:55,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 2978.9). Total num frames: 4333568. Throughput: 0: 860.1. Samples: 81506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 12:21:55,836][00107] Avg episode reward: [(0, '4.735')] +[2023-02-27 12:22:00,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 2991.9). Total num frames: 4349952. Throughput: 0: 855.6. Samples: 86082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:22:00,832][00107] Avg episode reward: [(0, '5.041')] +[2023-02-27 12:22:05,772][36602] Updated weights for policy 0, policy_version 1068 (0.0013) +[2023-02-27 12:22:05,830][00107] Fps is (10 sec: 4095.9, 60 sec: 3549.9, 300 sec: 3072.0). Total num frames: 4374528. Throughput: 0: 899.0. Samples: 92462. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:22:05,837][00107] Avg episode reward: [(0, '5.104')] +[2023-02-27 12:22:10,832][00107] Fps is (10 sec: 4094.8, 60 sec: 3481.5, 300 sec: 3080.1). Total num frames: 4390912. Throughput: 0: 895.5. Samples: 95480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:22:10,839][00107] Avg episode reward: [(0, '4.768')] +[2023-02-27 12:22:15,830][00107] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3056.2). Total num frames: 4403200. Throughput: 0: 842.6. Samples: 99516. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:22:15,834][00107] Avg episode reward: [(0, '4.697')] +[2023-02-27 12:22:18,981][36602] Updated weights for policy 0, policy_version 1078 (0.0022) +[2023-02-27 12:22:20,830][00107] Fps is (10 sec: 2868.0, 60 sec: 3481.6, 300 sec: 3064.4). Total num frames: 4419584. Throughput: 0: 868.0. Samples: 104708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:22:20,831][00107] Avg episode reward: [(0, '4.698')] +[2023-02-27 12:22:25,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3130.5). Total num frames: 4444160. Throughput: 0: 893.1. Samples: 107852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:22:25,838][00107] Avg episode reward: [(0, '4.874')] +[2023-02-27 12:22:29,293][36602] Updated weights for policy 0, policy_version 1088 (0.0013) +[2023-02-27 12:22:30,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3482.0, 300 sec: 3135.6). Total num frames: 4460544. Throughput: 0: 880.1. Samples: 113434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:22:30,832][00107] Avg episode reward: [(0, '4.865')] +[2023-02-27 12:22:35,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3113.0). Total num frames: 4472832. Throughput: 0: 844.0. Samples: 117492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:22:35,835][00107] Avg episode reward: [(0, '4.813')] +[2023-02-27 12:22:40,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3144.7). Total num frames: 4493312. Throughput: 0: 859.4. Samples: 120180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:22:40,835][00107] Avg episode reward: [(0, '4.616')] +[2023-02-27 12:22:41,692][36602] Updated weights for policy 0, policy_version 1098 (0.0014) +[2023-02-27 12:22:45,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3174.4). Total num frames: 4513792. Throughput: 0: 899.4. Samples: 126556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:22:45,835][00107] Avg episode reward: [(0, '4.878')] +[2023-02-27 12:22:50,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3177.5). Total num frames: 4530176. Throughput: 0: 867.6. Samples: 131502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 12:22:50,836][00107] Avg episode reward: [(0, '4.917')] +[2023-02-27 12:22:53,679][36602] Updated weights for policy 0, policy_version 1108 (0.0013) +[2023-02-27 12:22:55,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3156.3). Total num frames: 4542464. Throughput: 0: 845.3. Samples: 133514. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 12:22:55,837][00107] Avg episode reward: [(0, '4.937')] +[2023-02-27 12:23:00,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3183.2). Total num frames: 4562944. Throughput: 0: 876.8. Samples: 138974. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:23:00,832][00107] Avg episode reward: [(0, '4.930')] +[2023-02-27 12:23:04,233][36602] Updated weights for policy 0, policy_version 1118 (0.0030) +[2023-02-27 12:23:05,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3208.5). Total num frames: 4583424. Throughput: 0: 904.2. Samples: 145396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:23:05,837][00107] Avg episode reward: [(0, '4.867')] +[2023-02-27 12:23:10,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3210.4). Total num frames: 4599808. Throughput: 0: 883.0. Samples: 147586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:23:10,833][00107] Avg episode reward: [(0, '5.048')] +[2023-02-27 12:23:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3190.6). Total num frames: 4612096. Throughput: 0: 847.7. Samples: 151580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:23:15,833][00107] Avg episode reward: [(0, '4.911')] +[2023-02-27 12:23:17,256][36602] Updated weights for policy 0, policy_version 1128 (0.0026) +[2023-02-27 12:23:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3213.8). Total num frames: 4632576. Throughput: 0: 892.4. Samples: 157652. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:23:20,836][00107] Avg episode reward: [(0, '4.753')] +[2023-02-27 12:23:25,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3235.8). Total num frames: 4653056. Throughput: 0: 903.2. Samples: 160824. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 12:23:25,834][00107] Avg episode reward: [(0, '4.977')] +[2023-02-27 12:23:27,537][36602] Updated weights for policy 0, policy_version 1138 (0.0014) +[2023-02-27 12:23:30,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3236.8). Total num frames: 4669440. Throughput: 0: 869.2. Samples: 165672. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 12:23:30,838][00107] Avg episode reward: [(0, '5.020')] +[2023-02-27 12:23:35,830][00107] Fps is (10 sec: 2867.0, 60 sec: 3481.6, 300 sec: 3218.3). Total num frames: 4681728. Throughput: 0: 857.9. Samples: 170110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 12:23:35,835][00107] Avg episode reward: [(0, '4.918')] +[2023-02-27 12:23:39,896][36602] Updated weights for policy 0, policy_version 1148 (0.0015) +[2023-02-27 12:23:40,829][00107] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3257.8). Total num frames: 4706304. Throughput: 0: 883.8. Samples: 173284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:23:40,835][00107] Avg episode reward: [(0, '5.014')] +[2023-02-27 12:23:45,830][00107] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3258.2). Total num frames: 4722688. Throughput: 0: 901.7. Samples: 179552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:23:45,834][00107] Avg episode reward: [(0, '4.948')] +[2023-02-27 12:23:45,848][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001153_4722688.pth... +[2023-02-27 12:23:46,082][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth +[2023-02-27 12:23:50,832][00107] Fps is (10 sec: 2866.4, 60 sec: 3413.2, 300 sec: 3240.4). Total num frames: 4734976. Throughput: 0: 849.8. Samples: 183638. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:23:50,835][00107] Avg episode reward: [(0, '4.771')] +[2023-02-27 12:23:52,456][36602] Updated weights for policy 0, policy_version 1158 (0.0013) +[2023-02-27 12:23:55,834][00107] Fps is (10 sec: 3275.4, 60 sec: 3549.6, 300 sec: 3258.9). Total num frames: 4755456. Throughput: 0: 846.5. Samples: 185684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:23:55,842][00107] Avg episode reward: [(0, '4.555')] +[2023-02-27 12:24:00,830][00107] Fps is (10 sec: 4097.0, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 4775936. Throughput: 0: 898.8. Samples: 192024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:24:00,832][00107] Avg episode reward: [(0, '4.586')] +[2023-02-27 12:24:02,396][36602] Updated weights for policy 0, policy_version 1168 (0.0024) +[2023-02-27 12:24:05,830][00107] Fps is (10 sec: 3688.0, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 4792320. Throughput: 0: 893.7. Samples: 197868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 12:24:05,836][00107] Avg episode reward: [(0, '4.651')] +[2023-02-27 12:24:10,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 4808704. Throughput: 0: 867.4. Samples: 199858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2023-02-27 12:24:10,837][00107] Avg episode reward: [(0, '4.661')] +[2023-02-27 12:24:15,363][36602] Updated weights for policy 0, policy_version 1178 (0.0036) +[2023-02-27 12:24:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 4825088. Throughput: 0: 860.0. Samples: 204372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:24:15,833][00107] Avg episode reward: [(0, '4.863')] +[2023-02-27 12:24:20,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3292.9). Total num frames: 4845568. Throughput: 0: 904.1. Samples: 210792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 12:24:20,835][00107] Avg episode reward: [(0, '4.967')] +[2023-02-27 12:24:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3292.6). Total num frames: 4861952. Throughput: 0: 900.2. Samples: 213792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 12:24:25,831][00107] Avg episode reward: [(0, '4.663')] +[2023-02-27 12:24:25,921][36602] Updated weights for policy 0, policy_version 1188 (0.0017) +[2023-02-27 12:24:30,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3292.3). Total num frames: 4878336. Throughput: 0: 850.3. Samples: 217814. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 12:24:30,834][00107] Avg episode reward: [(0, '4.690')] +[2023-02-27 12:24:35,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3292.0). Total num frames: 4894720. Throughput: 0: 879.3. Samples: 223206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:24:35,837][00107] Avg episode reward: [(0, '4.712')] +[2023-02-27 12:24:37,995][36602] Updated weights for policy 0, policy_version 1198 (0.0019) +[2023-02-27 12:24:40,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3306.6). Total num frames: 4915200. Throughput: 0: 904.3. Samples: 226374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:24:40,832][00107] Avg episode reward: [(0, '4.512')] +[2023-02-27 12:24:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3306.1). Total num frames: 4931584. Throughput: 0: 882.2. Samples: 231724. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 12:24:45,834][00107] Avg episode reward: [(0, '4.745')] +[2023-02-27 12:24:50,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3291.2). Total num frames: 4943872. Throughput: 0: 839.6. Samples: 235650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:24:50,832][00107] Avg episode reward: [(0, '4.700')] +[2023-02-27 12:24:51,261][36602] Updated weights for policy 0, policy_version 1208 (0.0015) +[2023-02-27 12:24:55,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.9, 300 sec: 3305.0). Total num frames: 4964352. Throughput: 0: 860.2. Samples: 238566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:24:55,832][00107] Avg episode reward: [(0, '4.823')] +[2023-02-27 12:25:00,810][36602] Updated weights for policy 0, policy_version 1218 (0.0014) +[2023-02-27 12:25:00,829][00107] Fps is (10 sec: 4505.7, 60 sec: 3549.9, 300 sec: 3332.3). Total num frames: 4988928. Throughput: 0: 903.1. Samples: 245010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:25:00,835][00107] Avg episode reward: [(0, '4.704')] +[2023-02-27 12:25:05,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3374.0). Total num frames: 5001216. Throughput: 0: 863.3. Samples: 249642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:25:05,832][00107] Avg episode reward: [(0, '4.627')] +[2023-02-27 12:25:10,829][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 5013504. Throughput: 0: 839.7. Samples: 251580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:25:10,832][00107] Avg episode reward: [(0, '4.634')] +[2023-02-27 12:25:14,092][36602] Updated weights for policy 0, policy_version 1228 (0.0030) +[2023-02-27 12:25:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 5033984. Throughput: 0: 872.8. Samples: 257090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:25:15,833][00107] Avg episode reward: [(0, '4.637')] +[2023-02-27 12:25:20,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5054464. Throughput: 0: 894.4. Samples: 263452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:25:20,832][00107] Avg episode reward: [(0, '4.703')] +[2023-02-27 12:25:25,550][36602] Updated weights for policy 0, policy_version 1238 (0.0014) +[2023-02-27 12:25:25,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 5070848. Throughput: 0: 868.7. Samples: 265466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:25:25,835][00107] Avg episode reward: [(0, '4.851')] +[2023-02-27 12:25:30,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 5083136. Throughput: 0: 839.7. Samples: 269510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:25:30,835][00107] Avg episode reward: [(0, '4.982')] +[2023-02-27 12:25:35,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5103616. Throughput: 0: 891.3. Samples: 275760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:25:35,838][00107] Avg episode reward: [(0, '4.936')] +[2023-02-27 12:25:36,810][36602] Updated weights for policy 0, policy_version 1248 (0.0015) +[2023-02-27 12:25:40,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5124096. Throughput: 0: 896.1. Samples: 278892. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:25:40,835][00107] Avg episode reward: [(0, '4.644')] +[2023-02-27 12:25:45,834][00107] Fps is (10 sec: 3684.7, 60 sec: 3481.3, 300 sec: 3485.0). Total num frames: 5140480. Throughput: 0: 851.8. Samples: 283346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2023-02-27 12:25:45,837][00107] Avg episode reward: [(0, '4.483')] +[2023-02-27 12:25:45,856][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001255_5140480.pth... +[2023-02-27 12:25:46,048][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001050_4300800.pth +[2023-02-27 12:25:49,864][36602] Updated weights for policy 0, policy_version 1258 (0.0013) +[2023-02-27 12:25:50,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 5152768. Throughput: 0: 849.9. Samples: 287888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 12:25:50,832][00107] Avg episode reward: [(0, '4.541')] +[2023-02-27 12:25:55,829][00107] Fps is (10 sec: 3688.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5177344. Throughput: 0: 877.7. Samples: 291076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:25:55,836][00107] Avg episode reward: [(0, '4.807')] +[2023-02-27 12:26:00,020][36602] Updated weights for policy 0, policy_version 1268 (0.0019) +[2023-02-27 12:26:00,831][00107] Fps is (10 sec: 4095.3, 60 sec: 3413.2, 300 sec: 3498.9). Total num frames: 5193728. Throughput: 0: 894.5. Samples: 297346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:26:00,839][00107] Avg episode reward: [(0, '4.960')] +[2023-02-27 12:26:05,831][00107] Fps is (10 sec: 2866.7, 60 sec: 3413.2, 300 sec: 3471.2). Total num frames: 5206016. Throughput: 0: 841.3. Samples: 301314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:26:05,837][00107] Avg episode reward: [(0, '4.858')] +[2023-02-27 12:26:10,830][00107] Fps is (10 sec: 3277.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5226496. Throughput: 0: 841.4. Samples: 303328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:26:10,832][00107] Avg episode reward: [(0, '4.673')] +[2023-02-27 12:26:12,661][36602] Updated weights for policy 0, policy_version 1278 (0.0022) +[2023-02-27 12:26:15,830][00107] Fps is (10 sec: 4096.7, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5246976. Throughput: 0: 891.3. Samples: 309620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 12:26:15,831][00107] Avg episode reward: [(0, '4.717')] +[2023-02-27 12:26:20,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5263360. Throughput: 0: 875.6. Samples: 315162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:26:20,837][00107] Avg episode reward: [(0, '4.912')] +[2023-02-27 12:26:25,015][36602] Updated weights for policy 0, policy_version 1288 (0.0028) +[2023-02-27 12:26:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3471.3). Total num frames: 5275648. Throughput: 0: 850.1. Samples: 317146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:26:25,837][00107] Avg episode reward: [(0, '5.133')] +[2023-02-27 12:26:30,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5296128. Throughput: 0: 860.4. Samples: 322062. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:26:30,831][00107] Avg episode reward: [(0, '4.866')]
+[2023-02-27 12:26:35,081][36602] Updated weights for policy 0, policy_version 1298 (0.0012)
+[2023-02-27 12:26:35,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5316608. Throughput: 0: 903.7. Samples: 328554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:26:35,837][00107] Avg episode reward: [(0, '4.702')]
+[2023-02-27 12:26:40,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 5332992. Throughput: 0: 895.5. Samples: 331374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:26:40,832][00107] Avg episode reward: [(0, '4.650')]
+[2023-02-27 12:26:45,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.6, 300 sec: 3471.2). Total num frames: 5345280. Throughput: 0: 844.2. Samples: 335332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:26:45,834][00107] Avg episode reward: [(0, '4.826')]
+[2023-02-27 12:26:48,138][36602] Updated weights for policy 0, policy_version 1308 (0.0017)
+[2023-02-27 12:26:50,830][00107] Fps is (10 sec: 3276.5, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 5365760. Throughput: 0: 882.1. Samples: 341008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:26:50,838][00107] Avg episode reward: [(0, '4.826')]
+[2023-02-27 12:26:55,829][00107] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 5390336. Throughput: 0: 908.9. Samples: 344230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:26:55,831][00107] Avg episode reward: [(0, '4.777')]
+[2023-02-27 12:26:58,250][36602] Updated weights for policy 0, policy_version 1318 (0.0012)
+[2023-02-27 12:27:00,832][00107] Fps is (10 sec: 3685.9, 60 sec: 3481.6, 300 sec: 3485.0). Total num frames: 5402624. Throughput: 0: 887.4. Samples: 349554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:27:00,834][00107] Avg episode reward: [(0, '4.742')]
+[2023-02-27 12:27:05,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3550.0, 300 sec: 3485.1). Total num frames: 5419008. Throughput: 0: 856.8. Samples: 353716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:27:05,836][00107] Avg episode reward: [(0, '4.604')]
+[2023-02-27 12:27:10,542][36602] Updated weights for policy 0, policy_version 1328 (0.0018)
+[2023-02-27 12:27:10,830][00107] Fps is (10 sec: 3687.2, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5439488. Throughput: 0: 883.1. Samples: 356886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:27:10,836][00107] Avg episode reward: [(0, '4.620')]
+[2023-02-27 12:27:15,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 5459968. Throughput: 0: 912.2. Samples: 363112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:27:15,834][00107] Avg episode reward: [(0, '4.684')]
+[2023-02-27 12:27:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 5472256. Throughput: 0: 868.1. Samples: 367618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:27:20,832][00107] Avg episode reward: [(0, '4.719')]
+[2023-02-27 12:27:22,891][36602] Updated weights for policy 0, policy_version 1338 (0.0012)
+[2023-02-27 12:27:25,832][00107] Fps is (10 sec: 2866.4, 60 sec: 3549.7, 300 sec: 3485.0). Total num frames: 5488640. Throughput: 0: 850.7. Samples: 369656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:27:25,838][00107] Avg episode reward: [(0, '4.979')]
+[2023-02-27 12:27:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5509120. Throughput: 0: 895.6. Samples: 375632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:27:30,839][00107] Avg episode reward: [(0, '4.626')]
+[2023-02-27 12:27:33,049][36602] Updated weights for policy 0, policy_version 1348 (0.0013)
+[2023-02-27 12:27:35,830][00107] Fps is (10 sec: 4097.1, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5529600. Throughput: 0: 908.3. Samples: 381882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:27:35,835][00107] Avg episode reward: [(0, '4.457')]
+[2023-02-27 12:27:40,830][00107] Fps is (10 sec: 3276.5, 60 sec: 3481.5, 300 sec: 3485.1). Total num frames: 5541888. Throughput: 0: 881.0. Samples: 383874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:27:40,838][00107] Avg episode reward: [(0, '4.575')]
+[2023-02-27 12:27:45,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 5558272. Throughput: 0: 857.2. Samples: 388124. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:27:45,832][00107] Avg episode reward: [(0, '4.583')]
+[2023-02-27 12:27:45,900][36602] Updated weights for policy 0, policy_version 1358 (0.0015)
+[2023-02-27 12:27:45,900][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001358_5562368.pth...
+[2023-02-27 12:27:46,046][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001153_4722688.pth
+[2023-02-27 12:27:50,830][00107] Fps is (10 sec: 4096.4, 60 sec: 3618.2, 300 sec: 3526.7). Total num frames: 5582848. Throughput: 0: 906.8. Samples: 394520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:27:50,837][00107] Avg episode reward: [(0, '4.509')]
+[2023-02-27 12:27:55,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 5599232. Throughput: 0: 906.9. Samples: 397698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:27:55,833][00107] Avg episode reward: [(0, '4.634')]
+[2023-02-27 12:27:56,506][36602] Updated weights for policy 0, policy_version 1368 (0.0023)
+[2023-02-27 12:28:00,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3485.1). Total num frames: 5611520. Throughput: 0: 861.0. Samples: 401858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:28:00,833][00107] Avg episode reward: [(0, '4.786')]
+[2023-02-27 12:28:05,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5632000. Throughput: 0: 881.6. Samples: 407290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:28:05,839][00107] Avg episode reward: [(0, '4.794')]
+[2023-02-27 12:28:08,173][36602] Updated weights for policy 0, policy_version 1378 (0.0012)
+[2023-02-27 12:28:10,830][00107] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 5652480. Throughput: 0: 908.9. Samples: 410552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:28:10,832][00107] Avg episode reward: [(0, '4.530')]
+[2023-02-27 12:28:15,830][00107] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 5668864. Throughput: 0: 904.9. Samples: 416352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:28:15,833][00107] Avg episode reward: [(0, '4.632')]
+[2023-02-27 12:28:20,445][36602] Updated weights for policy 0, policy_version 1388 (0.0014)
+[2023-02-27 12:28:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5685248. Throughput: 0: 857.4. Samples: 420466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:28:20,839][00107] Avg episode reward: [(0, '4.766')]
+[2023-02-27 12:28:25,829][00107] Fps is (10 sec: 3686.6, 60 sec: 3618.3, 300 sec: 3512.8). Total num frames: 5705728. Throughput: 0: 872.0. Samples: 423112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:28:25,834][00107] Avg episode reward: [(0, '4.926')]
+[2023-02-27 12:28:30,479][36602] Updated weights for policy 0, policy_version 1398 (0.0016)
+[2023-02-27 12:28:30,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 5726208. Throughput: 0: 923.2. Samples: 429668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:28:30,831][00107] Avg episode reward: [(0, '5.133')]
+[2023-02-27 12:28:35,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5742592. Throughput: 0: 892.7. Samples: 434692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:28:35,834][00107] Avg episode reward: [(0, '5.053')]
+[2023-02-27 12:28:40,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5754880. Throughput: 0: 867.2. Samples: 436720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:28:40,839][00107] Avg episode reward: [(0, '5.075')]
+[2023-02-27 12:28:43,246][36602] Updated weights for policy 0, policy_version 1408 (0.0020)
+[2023-02-27 12:28:45,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.8). Total num frames: 5775360. Throughput: 0: 893.6. Samples: 442070. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:28:45,836][00107] Avg episode reward: [(0, '4.934')]
+[2023-02-27 12:28:50,830][00107] Fps is (10 sec: 4096.1, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 5795840. Throughput: 0: 917.3. Samples: 448568. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:28:50,838][00107] Avg episode reward: [(0, '5.085')]
+[2023-02-27 12:28:53,768][36602] Updated weights for policy 0, policy_version 1418 (0.0015)
+[2023-02-27 12:28:55,833][00107] Fps is (10 sec: 3685.0, 60 sec: 3549.6, 300 sec: 3512.8). Total num frames: 5812224. Throughput: 0: 893.5. Samples: 450764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:28:55,840][00107] Avg episode reward: [(0, '5.213')]
+[2023-02-27 12:29:00,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 5824512. Throughput: 0: 856.5. Samples: 454896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:29:00,838][00107] Avg episode reward: [(0, '5.082')]
+[2023-02-27 12:29:05,475][36602] Updated weights for policy 0, policy_version 1428 (0.0019)
+[2023-02-27 12:29:05,830][00107] Fps is (10 sec: 3687.8, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 5849088. Throughput: 0: 905.2. Samples: 461202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:29:05,838][00107] Avg episode reward: [(0, '5.286')]
+[2023-02-27 12:29:10,834][00107] Fps is (10 sec: 4503.5, 60 sec: 3617.8, 300 sec: 3540.6). Total num frames: 5869568. Throughput: 0: 919.6. Samples: 464500. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:29:10,838][00107] Avg episode reward: [(0, '5.030')]
+[2023-02-27 12:29:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5881856. Throughput: 0: 880.6. Samples: 469296. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:29:15,832][00107] Avg episode reward: [(0, '5.182')]
+[2023-02-27 12:29:17,765][36602] Updated weights for policy 0, policy_version 1438 (0.0016)
+[2023-02-27 12:29:20,829][00107] Fps is (10 sec: 2868.6, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 5898240. Throughput: 0: 871.2. Samples: 473894. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2023-02-27 12:29:20,837][00107] Avg episode reward: [(0, '5.186')]
+[2023-02-27 12:29:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 5918720. Throughput: 0: 897.8. Samples: 477122. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:29:25,831][00107] Avg episode reward: [(0, '5.409')]
+[2023-02-27 12:29:25,851][36588] Saving new best policy, reward=5.409!
+[2023-02-27 12:29:28,099][36602] Updated weights for policy 0, policy_version 1448 (0.0030)
+[2023-02-27 12:29:30,832][00107] Fps is (10 sec: 4094.8, 60 sec: 3549.7, 300 sec: 3540.6). Total num frames: 5939200. Throughput: 0: 919.1. Samples: 483434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:29:30,840][00107] Avg episode reward: [(0, '5.245')]
+[2023-02-27 12:29:35,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 5951488. Throughput: 0: 865.6. Samples: 487518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:29:35,833][00107] Avg episode reward: [(0, '5.295')]
+[2023-02-27 12:29:40,689][36602] Updated weights for policy 0, policy_version 1458 (0.0034)
+[2023-02-27 12:29:40,830][00107] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 5971968. Throughput: 0: 864.2. Samples: 489650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:29:40,837][00107] Avg episode reward: [(0, '5.441')]
+[2023-02-27 12:29:40,842][36588] Saving new best policy, reward=5.441!
+[2023-02-27 12:29:45,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 5992448. Throughput: 0: 911.2. Samples: 495898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:29:45,837][00107] Avg episode reward: [(0, '5.075')]
+[2023-02-27 12:29:45,850][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001463_5992448.pth...
+[2023-02-27 12:29:45,982][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001255_5140480.pth
+[2023-02-27 12:29:50,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 6008832. Throughput: 0: 892.4. Samples: 501360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:29:50,835][00107] Avg episode reward: [(0, '5.114')]
+[2023-02-27 12:29:51,988][36602] Updated weights for policy 0, policy_version 1468 (0.0019)
+[2023-02-27 12:29:55,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3499.0). Total num frames: 6021120. Throughput: 0: 864.0. Samples: 503376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:29:55,835][00107] Avg episode reward: [(0, '5.221')]
+[2023-02-27 12:30:00,832][00107] Fps is (10 sec: 3275.9, 60 sec: 3618.0, 300 sec: 3526.7). Total num frames: 6041600. Throughput: 0: 870.4. Samples: 508466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:30:00,837][00107] Avg episode reward: [(0, '5.041')]
+[2023-02-27 12:30:03,384][36602] Updated weights for policy 0, policy_version 1478 (0.0022)
+[2023-02-27 12:30:05,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 6062080. Throughput: 0: 913.9. Samples: 515018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:30:05,837][00107] Avg episode reward: [(0, '4.857')]
+[2023-02-27 12:30:10,830][00107] Fps is (10 sec: 3687.4, 60 sec: 3481.9, 300 sec: 3540.6). Total num frames: 6078464. Throughput: 0: 903.5. Samples: 517778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:30:10,836][00107] Avg episode reward: [(0, '5.128')]
+[2023-02-27 12:30:15,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 6090752. Throughput: 0: 851.4. Samples: 521746. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-27 12:30:15,832][00107] Avg episode reward: [(0, '5.219')]
+[2023-02-27 12:30:15,947][36602] Updated weights for policy 0, policy_version 1488 (0.0033)
+[2023-02-27 12:30:20,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 6111232. Throughput: 0: 887.5. Samples: 527456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:30:20,836][00107] Avg episode reward: [(0, '5.349')]
+[2023-02-27 12:30:25,740][36602] Updated weights for policy 0, policy_version 1498 (0.0013)
+[2023-02-27 12:30:25,830][00107] Fps is (10 sec: 4505.6, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 6135808. Throughput: 0: 911.3. Samples: 530660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:30:25,837][00107] Avg episode reward: [(0, '5.295')]
+[2023-02-27 12:30:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.8, 300 sec: 3540.6). Total num frames: 6148096. Throughput: 0: 887.2. Samples: 535822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:30:30,832][00107] Avg episode reward: [(0, '5.060')]
+[2023-02-27 12:30:35,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 6164480. Throughput: 0: 854.7. Samples: 539822. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:30:35,836][00107] Avg episode reward: [(0, '5.154')]
+[2023-02-27 12:30:38,574][36602] Updated weights for policy 0, policy_version 1508 (0.0021)
+[2023-02-27 12:30:40,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.7). Total num frames: 6184960. Throughput: 0: 880.2. Samples: 542986. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-27 12:30:40,832][00107] Avg episode reward: [(0, '4.977')]
+[2023-02-27 12:30:45,830][00107] Fps is (10 sec: 4095.8, 60 sec: 3549.8, 300 sec: 3568.4). Total num frames: 6205440. Throughput: 0: 908.9. Samples: 549364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:30:45,837][00107] Avg episode reward: [(0, '4.997')]
+[2023-02-27 12:30:50,037][36602] Updated weights for policy 0, policy_version 1518 (0.0023)
+[2023-02-27 12:30:50,831][00107] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3526.7). Total num frames: 6217728. Throughput: 0: 861.8. Samples: 553800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:30:50,835][00107] Avg episode reward: [(0, '4.919')]
+[2023-02-27 12:30:55,830][00107] Fps is (10 sec: 2867.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 6234112. Throughput: 0: 845.3. Samples: 555818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:30:55,832][00107] Avg episode reward: [(0, '4.986')]
+[2023-02-27 12:31:00,829][00107] Fps is (10 sec: 3686.9, 60 sec: 3550.0, 300 sec: 3554.5). Total num frames: 6254592. Throughput: 0: 891.6. Samples: 561866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:31:00,832][00107] Avg episode reward: [(0, '4.860')]
+[2023-02-27 12:31:01,274][36602] Updated weights for policy 0, policy_version 1528 (0.0017)
+[2023-02-27 12:31:05,831][00107] Fps is (10 sec: 4095.3, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 6275072. Throughput: 0: 904.6. Samples: 568166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:31:05,835][00107] Avg episode reward: [(0, '4.840')]
+[2023-02-27 12:31:10,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 6287360. Throughput: 0: 879.7. Samples: 570246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:31:10,834][00107] Avg episode reward: [(0, '4.726')]
+[2023-02-27 12:31:13,840][36602] Updated weights for policy 0, policy_version 1538 (0.0063)
+[2023-02-27 12:31:15,830][00107] Fps is (10 sec: 2867.7, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 6303744. Throughput: 0: 856.1. Samples: 574348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-27 12:31:15,836][00107] Avg episode reward: [(0, '4.693')]
+[2023-02-27 12:31:20,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 6324224. Throughput: 0: 908.1. Samples: 580688. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:31:20,832][00107] Avg episode reward: [(0, '5.035')]
+[2023-02-27 12:31:24,084][36602] Updated weights for policy 0, policy_version 1548 (0.0014)
+[2023-02-27 12:31:25,830][00107] Fps is (10 sec: 4095.7, 60 sec: 3481.6, 300 sec: 3554.5). Total num frames: 6344704. Throughput: 0: 907.5. Samples: 583822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:31:25,836][00107] Avg episode reward: [(0, '4.985')]
+[2023-02-27 12:31:30,830][00107] Fps is (10 sec: 3276.5, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 6356992. Throughput: 0: 857.9. Samples: 587970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:31:30,838][00107] Avg episode reward: [(0, '4.790')]
+[2023-02-27 12:31:35,830][00107] Fps is (10 sec: 2457.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 6369280. Throughput: 0: 849.0. Samples: 592002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:31:35,832][00107] Avg episode reward: [(0, '4.848')]
+[2023-02-27 12:31:38,204][36602] Updated weights for policy 0, policy_version 1558 (0.0045)
+[2023-02-27 12:31:40,830][00107] Fps is (10 sec: 3277.1, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 6389760. Throughput: 0: 871.4. Samples: 595030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:31:40,832][00107] Avg episode reward: [(0, '4.859')]
+[2023-02-27 12:31:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3526.7). Total num frames: 6406144. Throughput: 0: 862.5. Samples: 600678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:31:45,833][00107] Avg episode reward: [(0, '4.858')]
+[2023-02-27 12:31:45,846][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001564_6406144.pth...
+[2023-02-27 12:31:46,007][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001358_5562368.pth
+[2023-02-27 12:31:50,833][00107] Fps is (10 sec: 2866.2, 60 sec: 3344.9, 300 sec: 3485.0). Total num frames: 6418432. Throughput: 0: 803.0. Samples: 604304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:31:50,836][00107] Avg episode reward: [(0, '4.735')]
+[2023-02-27 12:31:51,406][36602] Updated weights for policy 0, policy_version 1568 (0.0026)
+[2023-02-27 12:31:55,832][00107] Fps is (10 sec: 2866.5, 60 sec: 3344.9, 300 sec: 3499.0). Total num frames: 6434816. Throughput: 0: 797.6. Samples: 606142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:31:55,834][00107] Avg episode reward: [(0, '4.595')]
+[2023-02-27 12:32:00,830][00107] Fps is (10 sec: 3687.7, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 6455296. Throughput: 0: 834.4. Samples: 611896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:32:00,832][00107] Avg episode reward: [(0, '4.505')]
+[2023-02-27 12:32:02,816][36602] Updated weights for policy 0, policy_version 1578 (0.0016)
+[2023-02-27 12:32:05,830][00107] Fps is (10 sec: 3277.6, 60 sec: 3208.6, 300 sec: 3485.1). Total num frames: 6467584. Throughput: 0: 797.5. Samples: 616576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:32:05,835][00107] Avg episode reward: [(0, '4.626')]
+[2023-02-27 12:32:10,834][00107] Fps is (10 sec: 2456.4, 60 sec: 3208.3, 300 sec: 3457.2). Total num frames: 6479872. Throughput: 0: 765.3. Samples: 618264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:32:10,839][00107] Avg episode reward: [(0, '4.645')]
+[2023-02-27 12:32:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3471.2). Total num frames: 6496256. Throughput: 0: 769.5. Samples: 622598. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 12:32:15,832][00107] Avg episode reward: [(0, '4.880')]
+[2023-02-27 12:32:17,210][36602] Updated weights for policy 0, policy_version 1588 (0.0024)
+[2023-02-27 12:32:20,830][00107] Fps is (10 sec: 3688.2, 60 sec: 3208.5, 300 sec: 3485.1). Total num frames: 6516736. Throughput: 0: 813.9. Samples: 628628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:32:20,837][00107] Avg episode reward: [(0, '4.527')]
+[2023-02-27 12:32:25,830][00107] Fps is (10 sec: 3686.2, 60 sec: 3140.3, 300 sec: 3471.2). Total num frames: 6533120. Throughput: 0: 815.6. Samples: 631732. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:32:25,835][00107] Avg episode reward: [(0, '4.481')]
+[2023-02-27 12:32:29,190][36602] Updated weights for policy 0, policy_version 1598 (0.0025)
+[2023-02-27 12:32:30,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3208.6, 300 sec: 3457.3). Total num frames: 6549504. Throughput: 0: 779.5. Samples: 635756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:32:30,834][00107] Avg episode reward: [(0, '4.603')]
+[2023-02-27 12:32:35,830][00107] Fps is (10 sec: 3277.0, 60 sec: 3276.8, 300 sec: 3471.2). Total num frames: 6565888. Throughput: 0: 812.4. Samples: 640858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:32:35,832][00107] Avg episode reward: [(0, '4.776')]
+[2023-02-27 12:32:39,978][36602] Updated weights for policy 0, policy_version 1608 (0.0023)
+[2023-02-27 12:32:40,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3485.1). Total num frames: 6586368. Throughput: 0: 843.1. Samples: 644078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:32:40,838][00107] Avg episode reward: [(0, '4.943')]
+[2023-02-27 12:32:45,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3457.3). Total num frames: 6602752. Throughput: 0: 840.4. Samples: 649714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:32:45,838][00107] Avg episode reward: [(0, '5.014')]
+[2023-02-27 12:32:50,831][00107] Fps is (10 sec: 2866.7, 60 sec: 3276.9, 300 sec: 3443.4). Total num frames: 6615040. Throughput: 0: 823.4. Samples: 653630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:32:50,834][00107] Avg episode reward: [(0, '4.758')]
+[2023-02-27 12:32:53,092][36602] Updated weights for policy 0, policy_version 1618 (0.0012)
+[2023-02-27 12:32:55,829][00107] Fps is (10 sec: 3276.9, 60 sec: 3345.2, 300 sec: 3471.2). Total num frames: 6635520. Throughput: 0: 844.6. Samples: 656268. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:32:55,832][00107] Avg episode reward: [(0, '4.775')]
+[2023-02-27 12:33:00,829][00107] Fps is (10 sec: 4096.7, 60 sec: 3345.1, 300 sec: 3471.2). Total num frames: 6656000. Throughput: 0: 887.8. Samples: 662548. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:33:00,832][00107] Avg episode reward: [(0, '4.896')]
+[2023-02-27 12:33:04,109][36602] Updated weights for policy 0, policy_version 1628 (0.0014)
+[2023-02-27 12:33:05,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 6672384. Throughput: 0: 851.9. Samples: 666962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:33:05,833][00107] Avg episode reward: [(0, '4.745')]
+[2023-02-27 12:33:10,834][00107] Fps is (10 sec: 2456.7, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 6680576. Throughput: 0: 822.3. Samples: 668738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:33:10,836][00107] Avg episode reward: [(0, '4.802')]
+[2023-02-27 12:33:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6701056. Throughput: 0: 848.8. Samples: 673954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:33:15,833][00107] Avg episode reward: [(0, '4.613')]
+[2023-02-27 12:33:16,823][36602] Updated weights for policy 0, policy_version 1638 (0.0015)
+[2023-02-27 12:33:20,829][00107] Fps is (10 sec: 4507.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6725632. Throughput: 0: 878.4. Samples: 680384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:33:20,836][00107] Avg episode reward: [(0, '4.590')]
+[2023-02-27 12:33:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 6737920. Throughput: 0: 860.1. Samples: 682784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:33:25,837][00107] Avg episode reward: [(0, '4.760')]
+[2023-02-27 12:33:29,383][36602] Updated weights for policy 0, policy_version 1648 (0.0013)
+[2023-02-27 12:33:30,830][00107] Fps is (10 sec: 2457.6, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 6750208. Throughput: 0: 824.3. Samples: 686806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:33:30,837][00107] Avg episode reward: [(0, '4.871')]
+[2023-02-27 12:33:35,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 6770688. Throughput: 0: 865.6. Samples: 692582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:33:35,831][00107] Avg episode reward: [(0, '4.956')]
+[2023-02-27 12:33:39,662][36602] Updated weights for policy 0, policy_version 1658 (0.0012)
+[2023-02-27 12:33:40,830][00107] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6795264. Throughput: 0: 878.2. Samples: 695788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:33:40,835][00107] Avg episode reward: [(0, '4.974')]
+[2023-02-27 12:33:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6807552. Throughput: 0: 852.0. Samples: 700886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:33:45,831][00107] Avg episode reward: [(0, '5.053')]
+[2023-02-27 12:33:45,851][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001662_6807552.pth...
+[2023-02-27 12:33:46,061][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001463_5992448.pth
+[2023-02-27 12:33:50,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3429.6). Total num frames: 6823936. Throughput: 0: 842.7. Samples: 704884. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:33:50,833][00107] Avg episode reward: [(0, '4.838')]
+[2023-02-27 12:33:52,631][36602] Updated weights for policy 0, policy_version 1668 (0.0023)
+[2023-02-27 12:33:55,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 6844416. Throughput: 0: 873.2. Samples: 708028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 12:33:55,833][00107] Avg episode reward: [(0, '4.748')]
+[2023-02-27 12:34:00,830][00107] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 6864896. Throughput: 0: 899.3. Samples: 714424. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-27 12:34:00,834][00107] Avg episode reward: [(0, '4.679')]
+[2023-02-27 12:34:03,547][36602] Updated weights for policy 0, policy_version 1678 (0.0016)
+[2023-02-27 12:34:05,830][00107] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 6877184. Throughput: 0: 842.6. Samples: 718300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:34:05,836][00107] Avg episode reward: [(0, '4.834')]
+[2023-02-27 12:34:10,830][00107] Fps is (10 sec: 2457.7, 60 sec: 3481.8, 300 sec: 3415.6). Total num frames: 6889472. Throughput: 0: 829.0. Samples: 720090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:34:10,832][00107] Avg episode reward: [(0, '4.858')]
+[2023-02-27 12:34:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 6909952. Throughput: 0: 866.1. Samples: 725780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:34:15,832][00107] Avg episode reward: [(0, '5.005')]
+[2023-02-27 12:34:16,259][36602] Updated weights for policy 0, policy_version 1688 (0.0022)
+[2023-02-27 12:34:20,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 6930432. Throughput: 0: 875.7. Samples: 731990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:34:20,832][00107] Avg episode reward: [(0, '4.911')]
+[2023-02-27 12:34:25,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 6942720. Throughput: 0: 848.2. Samples: 733956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:34:25,836][00107] Avg episode reward: [(0, '4.900')]
+[2023-02-27 12:34:29,226][36602] Updated weights for policy 0, policy_version 1698 (0.0014)
+[2023-02-27 12:34:30,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 6959104. Throughput: 0: 825.8. Samples: 738048. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:34:30,831][00107] Avg episode reward: [(0, '4.972')]
+[2023-02-27 12:34:35,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 6979584. Throughput: 0: 879.4. Samples: 744456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:34:35,832][00107] Avg episode reward: [(0, '4.925')]
+[2023-02-27 12:34:38,945][36602] Updated weights for policy 0, policy_version 1708 (0.0025)
+[2023-02-27 12:34:40,833][00107] Fps is (10 sec: 4094.5, 60 sec: 3413.1, 300 sec: 3415.6). Total num frames: 7000064. Throughput: 0: 880.9. Samples: 747672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:34:40,837][00107] Avg episode reward: [(0, '4.630')]
+[2023-02-27 12:34:45,830][00107] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 7012352. Throughput: 0: 834.7. Samples: 751984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:34:45,832][00107] Avg episode reward: [(0, '4.436')]
+[2023-02-27 12:34:50,830][00107] Fps is (10 sec: 2868.2, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 7028736. Throughput: 0: 854.6. Samples: 756756. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-27 12:34:50,838][00107] Avg episode reward: [(0, '4.436')]
+[2023-02-27 12:34:52,039][36602] Updated weights for policy 0, policy_version 1718 (0.0017)
+[2023-02-27 12:34:55,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 7049216. Throughput: 0: 885.1. Samples: 759918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:34:55,837][00107] Avg episode reward: [(0, '4.437')]
+[2023-02-27 12:35:00,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3415.6). Total num frames: 7069696. Throughput: 0: 895.1. Samples: 766060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:35:00,839][00107] Avg episode reward: [(0, '4.667')]
+[2023-02-27 12:35:03,760][36602] Updated weights for policy 0, policy_version 1728 (0.0019)
+[2023-02-27 12:35:05,833][00107] Fps is (10 sec: 2866.2, 60 sec: 3344.9, 300 sec: 3387.8). Total num frames: 7077888. Throughput: 0: 833.8. Samples: 769514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:35:05,843][00107] Avg episode reward: [(0, '4.679')]
+[2023-02-27 12:35:10,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 7098368. Throughput: 0: 835.0. Samples: 771532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:35:10,832][00107] Avg episode reward: [(0, '4.649')]
+[2023-02-27 12:35:15,447][36602] Updated weights for policy 0, policy_version 1738 (0.0023)
+[2023-02-27 12:35:15,830][00107] Fps is (10 sec: 4097.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 7118848. Throughput: 0: 883.5. Samples: 777804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:35:15,838][00107] Avg episode reward: [(0, '4.744')]
+[2023-02-27 12:35:20,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7135232. Throughput: 0: 859.1. Samples: 783114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:35:20,833][00107] Avg episode reward: [(0, '4.802')]
+[2023-02-27 12:35:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7147520. Throughput: 0: 831.7. Samples: 785094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:35:25,832][00107] Avg episode reward: [(0, '4.865')]
+[2023-02-27 12:35:28,745][36602] Updated weights for policy 0, policy_version 1748 (0.0036)
+[2023-02-27 12:35:30,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 7168000. Throughput: 0: 843.8. Samples: 789954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:35:30,832][00107] Avg episode reward: [(0, '5.099')]
+[2023-02-27 12:35:35,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 7188480. Throughput: 0: 878.2. Samples: 796274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:35:35,839][00107] Avg episode reward: [(0, '4.959')]
+[2023-02-27 12:35:39,082][36602] Updated weights for policy 0, policy_version 1758 (0.0013)
+[2023-02-27 12:35:40,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3387.9). Total num frames: 7204864. Throughput: 0: 868.4. Samples: 798998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:35:40,831][00107] Avg episode reward: [(0, '4.627')]
+[2023-02-27 12:35:45,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7217152. Throughput: 0: 819.9. Samples: 802954. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-27 12:35:45,836][00107] Avg episode reward: [(0, '4.540')]
+[2023-02-27 12:35:45,852][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001762_7217152.pth...
+[2023-02-27 12:35:46,077][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001564_6406144.pth
+[2023-02-27 12:35:50,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 7237632. Throughput: 0: 863.1. Samples: 808352. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-27 12:35:50,836][00107] Avg episode reward: [(0, '4.589')]
+[2023-02-27 12:35:51,445][36602] Updated weights for policy 0, policy_version 1768 (0.0030)
+[2023-02-27 12:35:55,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 7258112. Throughput: 0: 889.1. Samples: 811540. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2023-02-27 12:35:55,837][00107] Avg episode reward: [(0, '4.546')]
+[2023-02-27 12:36:00,832][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7274496. Throughput: 0: 869.1. Samples: 816914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:36:00,837][00107] Avg episode reward: [(0, '4.667')]
+[2023-02-27 12:36:04,197][36602] Updated weights for policy 0, policy_version 1778 (0.0029)
+[2023-02-27 12:36:05,830][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.5, 300 sec: 3374.0). Total num frames: 7282688. Throughput: 0: 829.2. Samples: 820426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 12:36:05,837][00107] Avg episode reward: [(0, '4.710')]
+[2023-02-27 12:36:10,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7303168. Throughput: 0: 842.7. Samples: 823016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:36:10,832][00107] Avg episode reward: [(0, '4.679')]
+[2023-02-27 12:36:15,055][36602] Updated weights for policy 0, policy_version 1788 (0.0023)
+[2023-02-27 12:36:15,830][00107] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3387.9). Total num frames: 7323648. Throughput: 0: 877.4. Samples: 829436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:36:15,836][00107] Avg episode reward: [(0, '4.521')]
+[2023-02-27 12:36:20,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 7340032. Throughput: 0: 841.9. Samples: 834158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 12:36:20,839][00107] Avg episode reward: [(0, '4.569')]
+[2023-02-27 12:36:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3374.0). Total num frames: 7352320. Throughput: 0: 826.2. Samples: 836176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:36:25,832][00107] Avg episode reward: [(0, '4.603')]
+[2023-02-27 12:36:28,126][36602] Updated weights for policy 0, policy_version 1798 (0.0013)
+[2023-02-27 12:36:30,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 7372800. Throughput: 0: 865.5. Samples: 841902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:36:30,832][00107] Avg episode reward: [(0, '4.528')]
+[2023-02-27 12:36:35,830][00107] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 7397376. Throughput: 0: 888.1. Samples: 848316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:36:35,832][00107] Avg episode reward: [(0, '4.473')]
+[2023-02-27 12:36:38,909][36602] Updated weights for policy 0, policy_version 1808 (0.0024)
+[2023-02-27 12:36:40,837][00107] Fps is (10 sec: 3683.7, 60 sec: 3412.9, 300 sec: 3401.7). Total num frames: 7409664. Throughput: 0: 863.0. Samples: 850382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:36:40,840][00107] Avg episode reward: [(0, '4.607')]
+[2023-02-27 12:36:45,830][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 7421952. Throughput: 0: 832.1. Samples: 854360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:36:45,837][00107] Avg episode reward: [(0, '4.878')]
+[2023-02-27 12:36:50,787][36602] Updated weights for policy 0, policy_version 1818 (0.0013)
+[2023-02-27 12:36:50,829][00107] Fps is (10 sec: 3689.1, 60 sec: 3481.6, 300 sec: 3429.6). Total num frames: 7446528. Throughput: 0: 892.9. Samples: 860608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2023-02-27 12:36:50,831][00107] Avg episode reward: [(0, '4.737')]
+[2023-02-27 12:36:55,833][00107] Fps is (10 sec: 4094.5, 60 sec: 3413.1, 300 sec: 3415.6). Total num frames: 7462912. Throughput: 0: 905.6. Samples: 863772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:36:55,835][00107] Avg episode reward: [(0, '4.504')]
+[2023-02-27 12:37:00,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7479296. Throughput: 0: 864.2. Samples: 868324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:37:00,834][00107] Avg episode reward: [(0, '4.648')]
+[2023-02-27 12:37:04,097][36602] Updated weights for policy 0, policy_version 1828 (0.0025)
+[2023-02-27 12:37:05,830][00107] Fps is (10 sec: 2868.2, 60 sec: 3481.6, 300 sec: 3429.6). Total num frames: 7491584. Throughput: 0: 844.7. Samples: 872170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:37:05,833][00107] Avg episode reward: [(0, '4.811')]
+[2023-02-27 12:37:10,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7512064. Throughput: 0: 866.8. Samples: 875184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:37:10,835][00107] Avg episode reward: [(0, '4.703')]
+[2023-02-27 12:37:14,192][36602] Updated weights for policy 0, policy_version 1838 (0.0020)
+[2023-02-27 12:37:15,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7532544. Throughput: 0: 879.0. Samples: 881456. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:37:15,832][00107] Avg episode reward: [(0, '4.456')]
+[2023-02-27 12:37:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7544832. Throughput: 0: 825.3. Samples: 885454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:37:20,838][00107] Avg episode reward: [(0, '4.511')]
+[2023-02-27 12:37:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 7561216. Throughput: 0: 823.7. Samples: 887442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:37:25,831][00107] Avg episode reward: [(0, '4.712')]
+[2023-02-27 12:37:27,250][36602] Updated weights for policy 0, policy_version 1848 (0.0020)
+[2023-02-27 12:37:30,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7581696. Throughput: 0: 878.3. Samples: 893882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:37:30,832][00107] Avg episode reward: [(0, '5.034')]
+[2023-02-27 12:37:35,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 7602176. Throughput: 0: 865.3. Samples: 899546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:37:35,834][00107] Avg episode reward: [(0, '4.924')]
+[2023-02-27 12:37:38,777][36602] Updated weights for policy 0, policy_version 1858 (0.0032)
+[2023-02-27 12:37:40,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.7, 300 sec: 3429.5). Total num frames: 7614464. Throughput: 0: 840.6. Samples: 901596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:37:40,837][00107] Avg episode reward: [(0, '4.816')]
+[2023-02-27 12:37:45,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7630848. Throughput: 0: 842.8. Samples: 906250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:37:45,832][00107] Avg episode reward: [(0, '4.745')]
+[2023-02-27 12:37:45,846][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001863_7630848.pth...
+[2023-02-27 12:37:46,042][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001662_6807552.pth
+[2023-02-27 12:37:49,924][36602] Updated weights for policy 0, policy_version 1868 (0.0021)
+[2023-02-27 12:37:50,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 7651328. Throughput: 0: 898.5. Samples: 912604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:37:50,835][00107] Avg episode reward: [(0, '4.633')]
+[2023-02-27 12:37:55,836][00107] Fps is (10 sec: 4093.3, 60 sec: 3481.4, 300 sec: 3443.3). Total num frames: 7671808. Throughput: 0: 898.4. Samples: 915616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:37:55,839][00107] Avg episode reward: [(0, '4.514')]
+[2023-02-27 12:38:00,830][00107] Fps is (10 sec: 3276.6, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7684096. Throughput: 0: 849.1. Samples: 919664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:38:00,833][00107] Avg episode reward: [(0, '4.449')]
+[2023-02-27 12:38:03,265][36602] Updated weights for policy 0, policy_version 1878 (0.0014)
+[2023-02-27 12:38:05,830][00107] Fps is (10 sec: 2869.1, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7700480. Throughput: 0: 860.8. Samples: 924190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:38:05,833][00107] Avg episode reward: [(0, '4.505')]
+[2023-02-27 12:38:10,829][00107] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7720960. Throughput: 0: 881.7. Samples: 927120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:38:10,837][00107] Avg episode reward: [(0, '4.641')]
+[2023-02-27 12:38:14,228][36602] Updated weights for policy 0, policy_version 1888 (0.0020)
+[2023-02-27 12:38:15,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7737344. Throughput: 0: 859.3. Samples: 932552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:38:15,834][00107] Avg episode reward: [(0, '4.464')]
+[2023-02-27 12:38:20,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7749632. Throughput: 0: 822.4. Samples: 936556. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2023-02-27 12:38:20,831][00107] Avg episode reward: [(0, '4.636')]
+[2023-02-27 12:38:25,830][00107] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7770112. Throughput: 0: 839.6. Samples: 939380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:38:25,832][00107] Avg episode reward: [(0, '4.604')]
+[2023-02-27 12:38:26,617][36602] Updated weights for policy 0, policy_version 1898 (0.0023)
+[2023-02-27 12:38:30,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7790592. Throughput: 0: 879.4. Samples: 945824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:38:30,831][00107] Avg episode reward: [(0, '4.630')]
+[2023-02-27 12:38:35,832][00107] Fps is (10 sec: 3685.6, 60 sec: 3413.2, 300 sec: 3429.5). Total num frames: 7806976. Throughput: 0: 846.3. Samples: 950688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:38:35,837][00107] Avg episode reward: [(0, '4.551')]
+[2023-02-27 12:38:38,845][36602] Updated weights for policy 0, policy_version 1908 (0.0012)
+[2023-02-27 12:38:40,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7819264. Throughput: 0: 823.6. Samples: 952674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:38:40,832][00107] Avg episode reward: [(0, '4.519')]
+[2023-02-27 12:38:45,830][00107] Fps is (10 sec: 3277.5, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7839744. Throughput: 0: 856.6. Samples: 958210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:38:45,838][00107] Avg episode reward: [(0, '4.594')]
+[2023-02-27 12:38:49,211][36602] Updated weights for policy 0, policy_version 1918 (0.0028)
+[2023-02-27 12:38:50,830][00107] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 7860224. Throughput: 0: 896.9. Samples: 964552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:38:50,832][00107] Avg episode reward: [(0, '4.826')]
+[2023-02-27 12:38:55,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3413.7, 300 sec: 3429.5). Total num frames: 7876608. Throughput: 0: 879.3. Samples: 966688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:38:55,832][00107] Avg episode reward: [(0, '4.944')]
+[2023-02-27 12:39:00,830][00107] Fps is (10 sec: 2867.3, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 7888896. Throughput: 0: 849.1. Samples: 970760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:39:00,832][00107] Avg episode reward: [(0, '4.666')]
+[2023-02-27 12:39:02,208][36602] Updated weights for policy 0, policy_version 1928 (0.0027)
+[2023-02-27 12:39:05,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 7909376. Throughput: 0: 878.9. Samples: 976108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:39:05,838][00107] Avg episode reward: [(0, '4.647')]
+[2023-02-27 12:39:10,830][00107] Fps is (10 sec: 3686.1, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 7925760. Throughput: 0: 879.3. Samples: 978950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 12:39:10,836][00107] Avg episode reward: [(0, '4.816')]
+[2023-02-27 12:39:14,501][36602] Updated weights for policy 0, policy_version 1938 (0.0014)
+[2023-02-27 12:39:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 7938048. Throughput: 0: 839.4. Samples: 983596. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:39:15,833][00107] Avg episode reward: [(0, '4.815')]
+[2023-02-27 12:39:20,829][00107] Fps is (10 sec: 2867.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 7954432. Throughput: 0: 827.4. Samples: 987920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:39:20,832][00107] Avg episode reward: [(0, '4.768')]
+[2023-02-27 12:39:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 7974912. Throughput: 0: 855.9. Samples: 991190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
+[2023-02-27 12:39:25,837][00107] Avg episode reward: [(0, '4.724')]
+[2023-02-27 12:39:26,065][36602] Updated weights for policy 0, policy_version 1948 (0.0022)
+[2023-02-27 12:39:30,835][00107] Fps is (10 sec: 4093.7, 60 sec: 3413.0, 300 sec: 3443.4). Total num frames: 7995392. Throughput: 0: 876.0. Samples: 997636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:39:30,838][00107] Avg episode reward: [(0, '4.798')]
+[2023-02-27 12:39:35,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.5, 300 sec: 3429.6). Total num frames: 8011776. Throughput: 0: 826.8. Samples: 1001760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:39:35,832][00107] Avg episode reward: [(0, '4.800')]
+[2023-02-27 12:39:39,039][36602] Updated weights for policy 0, policy_version 1958 (0.0013)
+[2023-02-27 12:39:40,829][00107] Fps is (10 sec: 2868.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 8024064. Throughput: 0: 822.0. Samples: 1003676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:39:40,832][00107] Avg episode reward: [(0, '5.053')]
+[2023-02-27 12:39:45,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8048640. Throughput: 0: 870.4. Samples: 1009926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:39:45,838][00107] Avg episode reward: [(0, '5.117')]
+[2023-02-27 12:39:45,852][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001965_8048640.pth...
+[2023-02-27 12:39:46,003][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001762_7217152.pth
+[2023-02-27 12:39:48,706][36602] Updated weights for policy 0, policy_version 1968 (0.0012)
+[2023-02-27 12:39:50,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3443.4). Total num frames: 8065024. Throughput: 0: 874.4. Samples: 1015454. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:39:50,837][00107] Avg episode reward: [(0, '4.791')]
+[2023-02-27 12:39:55,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 8077312. Throughput: 0: 856.9. Samples: 1017510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:39:55,834][00107] Avg episode reward: [(0, '4.670')]
+[2023-02-27 12:40:00,833][00107] Fps is (10 sec: 2866.2, 60 sec: 3413.1, 300 sec: 3443.4). Total num frames: 8093696. Throughput: 0: 856.6. Samples: 1022148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:40:00,838][00107] Avg episode reward: [(0, '4.775')]
+[2023-02-27 12:40:01,891][36602] Updated weights for policy 0, policy_version 1978 (0.0032)
+[2023-02-27 12:40:05,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8114176. Throughput: 0: 883.3. Samples: 1027668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:40:05,837][00107] Avg episode reward: [(0, '4.987')]
+[2023-02-27 12:40:10,829][00107] Fps is (10 sec: 3687.7, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 8130560. Throughput: 0: 874.3. Samples: 1030534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:40:10,832][00107] Avg episode reward: [(0, '4.918')]
+[2023-02-27 12:40:15,121][36602] Updated weights for policy 0, policy_version 1988 (0.0017)
+[2023-02-27 12:40:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 8142848. Throughput: 0: 820.2. Samples: 1034540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:40:15,833][00107] Avg episode reward: [(0, '4.800')]
+[2023-02-27 12:40:20,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8163328. Throughput: 0: 845.9. Samples: 1039824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2023-02-27 12:40:20,832][00107] Avg episode reward: [(0, '4.875')]
+[2023-02-27 12:40:25,391][36602] Updated weights for policy 0, policy_version 1998 (0.0012)
+[2023-02-27 12:40:25,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8183808. Throughput: 0: 873.9. Samples: 1043002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:40:25,833][00107] Avg episode reward: [(0, '5.114')]
+[2023-02-27 12:40:30,833][00107] Fps is (10 sec: 3685.1, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 8200192. Throughput: 0: 859.0. Samples: 1048582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:40:30,838][00107] Avg episode reward: [(0, '5.271')]
+[2023-02-27 12:40:35,831][00107] Fps is (10 sec: 2866.9, 60 sec: 3345.0, 300 sec: 3415.6). Total num frames: 8212480. Throughput: 0: 825.7. Samples: 1052610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:40:35,840][00107] Avg episode reward: [(0, '5.108')]
+[2023-02-27 12:40:38,555][36602] Updated weights for policy 0, policy_version 2008 (0.0012)
+[2023-02-27 12:40:40,830][00107] Fps is (10 sec: 3278.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8232960. Throughput: 0: 841.4. Samples: 1055372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:40:40,831][00107] Avg episode reward: [(0, '5.257')]
+[2023-02-27 12:40:45,830][00107] Fps is (10 sec: 4096.5, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8253440. Throughput: 0: 879.3. Samples: 1061712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:40:45,832][00107] Avg episode reward: [(0, '5.593')]
+[2023-02-27 12:40:45,852][36588] Saving new best policy, reward=5.593!
+[2023-02-27 12:40:48,962][36602] Updated weights for policy 0, policy_version 2018 (0.0021)
+[2023-02-27 12:40:50,831][00107] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3429.5). Total num frames: 8269824. Throughput: 0: 865.6. Samples: 1066620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:40:50,836][00107] Avg episode reward: [(0, '5.561')]
+[2023-02-27 12:40:55,833][00107] Fps is (10 sec: 2866.3, 60 sec: 3413.1, 300 sec: 3415.6). Total num frames: 8282112. Throughput: 0: 846.6. Samples: 1068632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:40:55,836][00107] Avg episode reward: [(0, '5.377')]
+[2023-02-27 12:41:00,830][00107] Fps is (10 sec: 3277.3, 60 sec: 3481.8, 300 sec: 3457.3). Total num frames: 8302592. Throughput: 0: 879.8. Samples: 1074130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:41:00,832][00107] Avg episode reward: [(0, '5.250')]
+[2023-02-27 12:41:01,045][36602] Updated weights for policy 0, policy_version 2028 (0.0021)
+[2023-02-27 12:41:05,830][00107] Fps is (10 sec: 4097.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8323072. Throughput: 0: 886.9. Samples: 1079734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:41:05,832][00107] Avg episode reward: [(0, '5.119')]
+[2023-02-27 12:41:10,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 8335360. Throughput: 0: 861.5. Samples: 1081768. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:41:10,832][00107] Avg episode reward: [(0, '5.063')]
+[2023-02-27 12:41:14,961][36602] Updated weights for policy 0, policy_version 2038 (0.0029)
+[2023-02-27 12:41:15,829][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 8347648. Throughput: 0: 827.2. Samples: 1085804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:41:15,831][00107] Avg episode reward: [(0, '5.243')]
+[2023-02-27 12:41:20,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8372224. Throughput: 0: 876.4. Samples: 1092048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:41:20,832][00107] Avg episode reward: [(0, '4.937')]
+[2023-02-27 12:41:24,457][36602] Updated weights for policy 0, policy_version 2048 (0.0012)
+[2023-02-27 12:41:25,830][00107] Fps is (10 sec: 4505.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8392704. Throughput: 0: 885.8. Samples: 1095234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:41:25,836][00107] Avg episode reward: [(0, '4.705')]
+[2023-02-27 12:41:30,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.5, 300 sec: 3415.6). Total num frames: 8404992. Throughput: 0: 849.7. Samples: 1099948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:41:30,836][00107] Avg episode reward: [(0, '4.846')]
+[2023-02-27 12:41:35,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.7, 300 sec: 3429.6). Total num frames: 8421376. Throughput: 0: 840.2. Samples: 1104426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:41:35,841][00107] Avg episode reward: [(0, '4.806')]
+[2023-02-27 12:41:37,458][36602] Updated weights for policy 0, policy_version 2058 (0.0014)
+[2023-02-27 12:41:40,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8441856. Throughput: 0: 867.2. Samples: 1107654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:41:40,831][00107] Avg episode reward: [(0, '4.669')]
+[2023-02-27 12:41:45,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8462336. Throughput: 0: 887.3. Samples: 1114060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:41:45,832][00107] Avg episode reward: [(0, '4.507')]
+[2023-02-27 12:41:45,850][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002066_8462336.pth...
+[2023-02-27 12:41:46,089][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001863_7630848.pth
+[2023-02-27 12:41:48,569][36602] Updated weights for policy 0, policy_version 2068 (0.0035)
+[2023-02-27 12:41:50,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3429.6). Total num frames: 8474624. Throughput: 0: 851.5. Samples: 1118050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:41:50,835][00107] Avg episode reward: [(0, '4.508')]
+[2023-02-27 12:41:55,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3429.5). Total num frames: 8491008. Throughput: 0: 851.2. Samples: 1120074. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2023-02-27 12:41:55,832][00107] Avg episode reward: [(0, '4.786')]
+[2023-02-27 12:41:59,866][36602] Updated weights for policy 0, policy_version 2078 (0.0013)
+[2023-02-27 12:42:00,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8511488. Throughput: 0: 904.6. Samples: 1126512. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 12:42:00,832][00107] Avg episode reward: [(0, '4.725')]
+[2023-02-27 12:42:05,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8527872. Throughput: 0: 875.3. Samples: 1131436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 12:42:05,832][00107] Avg episode reward: [(0, '4.734')]
+[2023-02-27 12:42:10,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 8540160. Throughput: 0: 843.9. Samples: 1133212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2023-02-27 12:42:10,833][00107] Avg episode reward: [(0, '4.631')]
+[2023-02-27 12:42:13,990][36602] Updated weights for policy 0, policy_version 2088 (0.0025)
+[2023-02-27 12:42:15,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 8556544. Throughput: 0: 841.1. Samples: 1137798. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:42:15,837][00107] Avg episode reward: [(0, '4.592')]
+[2023-02-27 12:42:20,829][00107] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8581120. Throughput: 0: 883.7. Samples: 1144192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:42:20,832][00107] Avg episode reward: [(0, '4.657')]
+[2023-02-27 12:42:23,950][36602] Updated weights for policy 0, policy_version 2098 (0.0020)
+[2023-02-27 12:42:25,835][00107] Fps is (10 sec: 4093.7, 60 sec: 3413.0, 300 sec: 3443.4). Total num frames: 8597504. Throughput: 0: 878.5. Samples: 1147192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:42:25,838][00107] Avg episode reward: [(0, '4.742')]
+[2023-02-27 12:42:30,831][00107] Fps is (10 sec: 2866.8, 60 sec: 3413.2, 300 sec: 3415.6). Total num frames: 8609792. Throughput: 0: 824.6. Samples: 1151166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2023-02-27 12:42:30,842][00107] Avg episode reward: [(0, '4.656')]
+[2023-02-27 12:42:35,830][00107] Fps is (10 sec: 3278.6, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8630272. Throughput: 0: 854.0. Samples: 1156482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:42:35,831][00107] Avg episode reward: [(0, '4.618')]
+[2023-02-27 12:42:36,656][36602] Updated weights for policy 0, policy_version 2108 (0.0012)
+[2023-02-27 12:42:40,829][00107] Fps is (10 sec: 4096.7, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8650752. Throughput: 0: 878.9. Samples: 1159624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:42:40,832][00107] Avg episode reward: [(0, '4.713')]
+[2023-02-27 12:42:45,834][00107] Fps is (10 sec: 3684.7, 60 sec: 3413.1, 300 sec: 3443.4). Total num frames: 8667136. Throughput: 0: 857.1. Samples: 1165084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:42:45,837][00107] Avg episode reward: [(0, '4.819')]
+[2023-02-27 12:42:48,837][36602] Updated weights for policy 0, policy_version 2118 (0.0019)
+[2023-02-27 12:42:50,830][00107] Fps is (10 sec: 2867.0, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 8679424. Throughput: 0: 837.5. Samples: 1169124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:42:50,834][00107] Avg episode reward: [(0, '4.678')]
+[2023-02-27 12:42:55,830][00107] Fps is (10 sec: 3278.3, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8699904. Throughput: 0: 859.8. Samples: 1171902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:42:55,835][00107] Avg episode reward: [(0, '4.629')]
+[2023-02-27 12:42:59,393][36602] Updated weights for policy 0, policy_version 2128 (0.0015)
+[2023-02-27 12:43:00,829][00107] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8720384. Throughput: 0: 899.6. Samples: 1178280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:43:00,837][00107] Avg episode reward: [(0, '4.572')]
+[2023-02-27 12:43:05,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 8732672. Throughput: 0: 852.5. Samples: 1182556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:43:05,832][00107] Avg episode reward: [(0, '4.514')]
+[2023-02-27 12:43:10,830][00107] Fps is (10 sec: 2457.5, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 8744960. Throughput: 0: 824.0. Samples: 1184266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:43:10,835][00107] Avg episode reward: [(0, '4.554')]
+[2023-02-27 12:43:13,432][36602] Updated weights for policy 0, policy_version 2138 (0.0023)
+[2023-02-27 12:43:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8765440. Throughput: 0: 855.6. Samples: 1189668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:43:15,838][00107] Avg episode reward: [(0, '4.656')]
+[2023-02-27 12:43:20,829][00107] Fps is (10 sec: 4096.2, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8785920. Throughput: 0: 878.5. Samples: 1196016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:43:20,831][00107] Avg episode reward: [(0, '4.816')]
+[2023-02-27 12:43:24,434][36602] Updated weights for policy 0, policy_version 2148 (0.0019)
+[2023-02-27 12:43:25,832][00107] Fps is (10 sec: 3685.6, 60 sec: 3413.5, 300 sec: 3429.5). Total num frames: 8802304. Throughput: 0: 856.4. Samples: 1198164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:43:25,835][00107] Avg episode reward: [(0, '4.860')]
+[2023-02-27 12:43:30,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.4, 300 sec: 3415.7). Total num frames: 8814592. Throughput: 0: 825.6. Samples: 1202230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:43:30,832][00107] Avg episode reward: [(0, '4.908')]
+[2023-02-27 12:43:35,830][00107] Fps is (10 sec: 3277.5, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8835072. Throughput: 0: 871.3. Samples: 1208332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:43:35,840][00107] Avg episode reward: [(0, '4.758')]
+[2023-02-27 12:43:36,171][36602] Updated weights for policy 0, policy_version 2158 (0.0015)
+[2023-02-27 12:43:40,834][00107] Fps is (10 sec: 4094.4, 60 sec: 3413.1, 300 sec: 3443.4). Total num frames: 8855552. Throughput: 0: 880.6. Samples: 1211532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:43:40,836][00107] Avg episode reward: [(0, '4.679')]
+[2023-02-27 12:43:45,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3345.3, 300 sec: 3415.6). Total num frames: 8867840. Throughput: 0: 841.4. Samples: 1216144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:43:45,837][00107] Avg episode reward: [(0, '4.646')]
+[2023-02-27 12:43:45,948][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002166_8871936.pth...
+[2023-02-27 12:43:46,124][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001965_8048640.pth
+[2023-02-27 12:43:49,192][36602] Updated weights for policy 0, policy_version 2168 (0.0020)
+[2023-02-27 12:43:50,830][00107] Fps is (10 sec: 2868.3, 60 sec: 3413.4, 300 sec: 3415.6). Total num frames: 8884224. Throughput: 0: 842.7. Samples: 1220478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:43:50,833][00107] Avg episode reward: [(0, '4.776')]
+[2023-02-27 12:43:55,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8904704. Throughput: 0: 875.1. Samples: 1223646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:43:55,833][00107] Avg episode reward: [(0, '4.777')]
+[2023-02-27 12:43:58,785][36602] Updated weights for policy 0, policy_version 2178 (0.0015)
+[2023-02-27 12:44:00,834][00107] Fps is (10 sec: 4094.1, 60 sec: 3413.1, 300 sec: 3443.4). Total num frames: 8925184. Throughput: 0: 898.7. Samples: 1230112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:44:00,837][00107] Avg episode reward: [(0, '4.554')]
+[2023-02-27 12:44:05,830][00107] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 8937472. Throughput: 0: 833.9. Samples: 1233540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:44:05,834][00107] Avg episode reward: [(0, '4.533')]
+[2023-02-27 12:44:10,829][00107] Fps is (10 sec: 2868.6, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 8953856. Throughput: 0: 826.5. Samples: 1235354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:44:10,832][00107] Avg episode reward: [(0, '4.734')]
+[2023-02-27 12:44:12,676][36602] Updated weights for policy 0, policy_version 2188 (0.0020)
+[2023-02-27 12:44:15,830][00107] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 8974336. Throughput: 0: 874.8. Samples: 1241598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:44:15,832][00107] Avg episode reward: [(0, '4.898')]
+[2023-02-27 12:44:20,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 8990720. Throughput: 0: 867.2. Samples: 1247354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2023-02-27 12:44:20,832][00107] Avg episode reward: [(0, '4.708')]
+[2023-02-27 12:44:24,212][36602] Updated weights for policy 0, policy_version 2198 (0.0012)
+[2023-02-27 12:44:25,833][00107] Fps is (10 sec: 3275.6, 60 sec: 3413.3, 300 sec: 3429.6). Total num frames: 9007104. Throughput: 0: 841.0. Samples: 1249378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:44:25,840][00107] Avg episode reward: [(0, '4.593')]
+[2023-02-27 12:44:30,830][00107] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 9023488. Throughput: 0: 840.2. Samples: 1253954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2023-02-27 12:44:30,839][00107] Avg episode reward: [(0, '4.668')]
+[2023-02-27 12:44:35,287][36602] Updated weights for policy 0, policy_version 2208 (0.0033)
+[2023-02-27 12:44:35,829][00107] Fps is (10 sec: 3687.7, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9043968. Throughput: 0: 886.5. Samples: 1260370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2023-02-27 12:44:35,832][00107] Avg episode reward: [(0, '4.770')]
+[2023-02-27 12:44:40,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3429.5). Total num frames: 9060352. Throughput: 0: 884.9. Samples: 1263466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:44:40,834][00107] Avg episode reward: [(0, '4.587')]
+[2023-02-27 12:44:45,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 9076736. Throughput: 0: 831.2. Samples: 1267510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:44:45,843][00107] Avg episode reward: [(0, '4.547')]
+[2023-02-27 12:44:48,199][36602] Updated weights for policy 0, policy_version 2218 (0.0012)
+[2023-02-27 12:44:50,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 9093120. Throughput: 0: 875.3. Samples: 1272928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:44:50,832][00107] Avg episode reward: [(0, '4.650')]
+[2023-02-27 12:44:55,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 9117696. Throughput: 0: 905.2. Samples: 1276090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2023-02-27 12:44:55,831][00107] Avg episode reward: [(0, '4.614')]
+[2023-02-27 12:44:57,893][36602] Updated weights for policy 0, policy_version 2228 (0.0014)
+[2023-02-27 12:45:00,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.6, 300 sec: 3443.4). Total num frames: 9129984. Throughput: 0: 890.4. Samples: 1281664.
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:45:00,831][00107] Avg episode reward: [(0, '4.648')] +[2023-02-27 12:45:05,830][00107] Fps is (10 sec: 2457.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9142272. Throughput: 0: 840.6. Samples: 1285180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:45:05,841][00107] Avg episode reward: [(0, '4.748')] +[2023-02-27 12:45:10,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9162752. Throughput: 0: 845.7. Samples: 1287432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:45:10,832][00107] Avg episode reward: [(0, '4.993')] +[2023-02-27 12:45:11,701][36602] Updated weights for policy 0, policy_version 2238 (0.0016) +[2023-02-27 12:45:15,830][00107] Fps is (10 sec: 4096.2, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9183232. Throughput: 0: 885.8. Samples: 1293814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:45:15,832][00107] Avg episode reward: [(0, '4.918')] +[2023-02-27 12:45:20,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9195520. Throughput: 0: 853.4. Samples: 1298774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 12:45:20,835][00107] Avg episode reward: [(0, '4.560')] +[2023-02-27 12:45:24,105][36602] Updated weights for policy 0, policy_version 2248 (0.0020) +[2023-02-27 12:45:25,830][00107] Fps is (10 sec: 2867.0, 60 sec: 3413.5, 300 sec: 3429.6). Total num frames: 9211904. Throughput: 0: 827.4. Samples: 1300700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:45:25,839][00107] Avg episode reward: [(0, '4.520')] +[2023-02-27 12:45:30,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9232384. Throughput: 0: 854.4. Samples: 1305956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:45:30,837][00107] Avg episode reward: [(0, '4.632')] +[2023-02-27 12:45:34,709][36602] Updated weights for policy 0, policy_version 2258 (0.0014) +[2023-02-27 12:45:35,829][00107] Fps is (10 sec: 4096.3, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9252864. Throughput: 0: 874.8. Samples: 1312296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:45:35,838][00107] Avg episode reward: [(0, '4.591')] +[2023-02-27 12:45:40,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9265152. Throughput: 0: 857.5. Samples: 1314676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:45:40,832][00107] Avg episode reward: [(0, '4.636')] +[2023-02-27 12:45:45,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9281536. Throughput: 0: 821.9. Samples: 1318650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:45:45,839][00107] Avg episode reward: [(0, '4.605')] +[2023-02-27 12:45:45,857][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002266_9281536.pth... +[2023-02-27 12:45:46,001][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002066_8462336.pth +[2023-02-27 12:45:47,850][36602] Updated weights for policy 0, policy_version 2268 (0.0028) +[2023-02-27 12:45:50,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.5). Total num frames: 9297920. Throughput: 0: 871.3. Samples: 1324386. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:45:50,837][00107] Avg episode reward: [(0, '4.790')] +[2023-02-27 12:45:55,830][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 9322496. Throughput: 0: 890.2. Samples: 1327492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:45:55,832][00107] Avg episode reward: [(0, '4.705')] +[2023-02-27 12:45:58,747][36602] Updated weights for policy 0, policy_version 2278 (0.0017) +[2023-02-27 12:46:00,830][00107] Fps is (10 sec: 3686.3, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9334784. Throughput: 0: 859.2. Samples: 1332480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:46:00,836][00107] Avg episode reward: [(0, '4.654')] +[2023-02-27 12:46:05,830][00107] Fps is (10 sec: 2457.6, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 9347072. Throughput: 0: 830.4. Samples: 1336140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-27 12:46:05,832][00107] Avg episode reward: [(0, '4.775')] +[2023-02-27 12:46:10,829][00107] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 9367552. Throughput: 0: 850.0. Samples: 1338950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2023-02-27 12:46:10,831][00107] Avg episode reward: [(0, '4.834')] +[2023-02-27 12:46:11,697][36602] Updated weights for policy 0, policy_version 2288 (0.0015) +[2023-02-27 12:46:15,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 9388032. Throughput: 0: 872.9. Samples: 1345236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:46:15,836][00107] Avg episode reward: [(0, '4.801')] +[2023-02-27 12:46:20,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 9400320. Throughput: 0: 825.0. Samples: 1349422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:46:20,838][00107] Avg episode reward: [(0, '4.701')] +[2023-02-27 12:46:24,730][36602] Updated weights for policy 0, policy_version 2298 (0.0023) +[2023-02-27 12:46:25,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3413.4, 300 sec: 3429.5). Total num frames: 9416704. Throughput: 0: 817.7. Samples: 1351472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:46:25,833][00107] Avg episode reward: [(0, '4.823')] +[2023-02-27 12:46:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 9437184. Throughput: 0: 865.8. Samples: 1357610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-27 12:46:30,832][00107] Avg episode reward: [(0, '4.681')] +[2023-02-27 12:46:34,185][36602] Updated weights for policy 0, policy_version 2308 (0.0012) +[2023-02-27 12:46:35,831][00107] Fps is (10 sec: 4095.4, 60 sec: 3413.2, 300 sec: 3443.4). Total num frames: 9457664. Throughput: 0: 872.2. Samples: 1363636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:46:35,834][00107] Avg episode reward: [(0, '4.869')] +[2023-02-27 12:46:40,829][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 9469952. Throughput: 0: 847.6. Samples: 1365636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:46:40,836][00107] Avg episode reward: [(0, '5.108')] +[2023-02-27 12:46:45,830][00107] Fps is (10 sec: 2867.7, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9486336. Throughput: 0: 837.2. Samples: 1370154. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:46:45,831][00107] Avg episode reward: [(0, '4.948')] +[2023-02-27 12:46:47,091][36602] Updated weights for policy 0, policy_version 2318 (0.0026) +[2023-02-27 12:46:50,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 9510912. Throughput: 0: 897.9. Samples: 1376546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:46:50,837][00107] Avg episode reward: [(0, '4.925')] +[2023-02-27 12:46:55,836][00107] Fps is (10 sec: 4093.4, 60 sec: 3413.0, 300 sec: 3443.3). Total num frames: 9527296. Throughput: 0: 907.8. Samples: 1379808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:46:55,838][00107] Avg episode reward: [(0, '4.732')] +[2023-02-27 12:46:58,129][36602] Updated weights for policy 0, policy_version 2328 (0.0012) +[2023-02-27 12:47:00,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9539584. Throughput: 0: 861.4. Samples: 1384000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:47:00,832][00107] Avg episode reward: [(0, '4.757')] +[2023-02-27 12:47:05,830][00107] Fps is (10 sec: 2869.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 9555968. Throughput: 0: 865.2. Samples: 1388358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:47:05,833][00107] Avg episode reward: [(0, '4.973')] +[2023-02-27 12:47:10,230][36602] Updated weights for policy 0, policy_version 2338 (0.0013) +[2023-02-27 12:47:10,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 9576448. Throughput: 0: 883.7. Samples: 1391240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:47:10,832][00107] Avg episode reward: [(0, '5.040')] +[2023-02-27 12:47:15,829][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9592832. Throughput: 0: 870.0. Samples: 1396762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:47:15,838][00107] Avg episode reward: [(0, '4.982')] +[2023-02-27 12:47:20,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 9605120. Throughput: 0: 817.9. Samples: 1400440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:47:20,837][00107] Avg episode reward: [(0, '4.808')] +[2023-02-27 12:47:24,611][36602] Updated weights for policy 0, policy_version 2348 (0.0034) +[2023-02-27 12:47:25,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9621504. Throughput: 0: 821.4. Samples: 1402598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:47:25,831][00107] Avg episode reward: [(0, '4.679')] +[2023-02-27 12:47:30,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9641984. Throughput: 0: 857.0. Samples: 1408718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:47:30,832][00107] Avg episode reward: [(0, '4.632')] +[2023-02-27 12:47:35,292][36602] Updated weights for policy 0, policy_version 2358 (0.0012) +[2023-02-27 12:47:35,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3415.6). Total num frames: 9658368. Throughput: 0: 830.1. Samples: 1413902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:47:35,836][00107] Avg episode reward: [(0, '4.591')] +[2023-02-27 12:47:40,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9670656. Throughput: 0: 801.6. Samples: 1415874. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 12:47:40,836][00107] Avg episode reward: [(0, '4.721')] +[2023-02-27 12:47:45,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9691136. Throughput: 0: 820.4. Samples: 1420920. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2023-02-27 12:47:45,832][00107] Avg episode reward: [(0, '5.031')] +[2023-02-27 12:47:45,850][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002366_9691136.pth... +[2023-02-27 12:47:45,993][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002166_8871936.pth +[2023-02-27 12:47:47,481][36602] Updated weights for policy 0, policy_version 2368 (0.0021) +[2023-02-27 12:47:50,829][00107] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 9711616. Throughput: 0: 861.9. Samples: 1427144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:47:50,831][00107] Avg episode reward: [(0, '4.928')] +[2023-02-27 12:47:55,830][00107] Fps is (10 sec: 3686.2, 60 sec: 3345.4, 300 sec: 3415.6). Total num frames: 9728000. Throughput: 0: 853.9. Samples: 1429668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:47:55,833][00107] Avg episode reward: [(0, '4.615')] +[2023-02-27 12:48:00,258][36602] Updated weights for policy 0, policy_version 2378 (0.0018) +[2023-02-27 12:48:00,830][00107] Fps is (10 sec: 2867.1, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 9740288. Throughput: 0: 820.8. Samples: 1433700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:48:00,835][00107] Avg episode reward: [(0, '4.628')] +[2023-02-27 12:48:05,830][00107] Fps is (10 sec: 2867.3, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 9756672. Throughput: 0: 850.2. Samples: 1438700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:48:05,838][00107] Avg episode reward: [(0, '4.842')] +[2023-02-27 12:48:10,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 9777152. Throughput: 0: 864.8. Samples: 1441516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2023-02-27 12:48:10,832][00107] Avg episode reward: [(0, '4.837')] +[2023-02-27 12:48:11,369][36602] Updated weights for policy 0, policy_version 2388 (0.0021) +[2023-02-27 12:48:15,830][00107] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3401.8). Total num frames: 9789440. Throughput: 0: 840.1. Samples: 1446524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:48:15,836][00107] Avg episode reward: [(0, '4.750')] +[2023-02-27 12:48:20,829][00107] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9805824. Throughput: 0: 818.5. Samples: 1450734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:48:20,832][00107] Avg episode reward: [(0, '4.594')] +[2023-02-27 12:48:24,474][36602] Updated weights for policy 0, policy_version 2398 (0.0012) +[2023-02-27 12:48:25,830][00107] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9826304. Throughput: 0: 845.2. Samples: 1453910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2023-02-27 12:48:25,832][00107] Avg episode reward: [(0, '4.654')] +[2023-02-27 12:48:30,834][00107] Fps is (10 sec: 4094.1, 60 sec: 3413.1, 300 sec: 3429.5). Total num frames: 9846784. Throughput: 0: 874.8. Samples: 1460292. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:48:30,840][00107] Avg episode reward: [(0, '4.669')] +[2023-02-27 12:48:35,830][00107] Fps is (10 sec: 3276.7, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 9859072. Throughput: 0: 830.7. Samples: 1464524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:48:35,840][00107] Avg episode reward: [(0, '4.733')] +[2023-02-27 12:48:36,138][36602] Updated weights for policy 0, policy_version 2408 (0.0018) +[2023-02-27 12:48:40,830][00107] Fps is (10 sec: 2868.5, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 9875456. Throughput: 0: 819.1. Samples: 1466528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2023-02-27 12:48:40,832][00107] Avg episode reward: [(0, '4.784')] +[2023-02-27 12:48:45,830][00107] Fps is (10 sec: 3686.5, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9895936. Throughput: 0: 863.8. Samples: 1472570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2023-02-27 12:48:45,838][00107] Avg episode reward: [(0, '4.677')] +[2023-02-27 12:48:47,098][36602] Updated weights for policy 0, policy_version 2418 (0.0021) +[2023-02-27 12:48:50,829][00107] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 9916416. Throughput: 0: 883.0. Samples: 1478436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2023-02-27 12:48:50,831][00107] Avg episode reward: [(0, '4.656')] +[2023-02-27 12:48:55,832][00107] Fps is (10 sec: 3276.0, 60 sec: 3345.0, 300 sec: 3401.8). Total num frames: 9928704. Throughput: 0: 865.4. Samples: 1480462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:48:55,840][00107] Avg episode reward: [(0, '4.731')] +[2023-02-27 12:49:00,049][36602] Updated weights for policy 0, policy_version 2428 (0.0013) +[2023-02-27 12:49:00,830][00107] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3415.7). Total num frames: 9945088. Throughput: 0: 852.5. Samples: 1484886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:49:00,832][00107] Avg episode reward: [(0, '4.930')] +[2023-02-27 12:49:05,830][00107] Fps is (10 sec: 3687.3, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 9965568. Throughput: 0: 880.4. Samples: 1490352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2023-02-27 12:49:05,837][00107] Avg episode reward: [(0, '4.883')] +[2023-02-27 12:49:10,831][00107] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3415.6). Total num frames: 9981952. Throughput: 0: 874.6. Samples: 1493268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2023-02-27 12:49:10,834][00107] Avg episode reward: [(0, '4.749')] +[2023-02-27 12:49:12,188][36602] Updated weights for policy 0, policy_version 2438 (0.0013) +[2023-02-27 12:49:15,830][00107] Fps is (10 sec: 2867.0, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 9994240. Throughput: 0: 823.0. Samples: 1497322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2023-02-27 12:49:15,833][00107] Avg episode reward: [(0, '4.705')] +[2023-02-27 12:49:19,273][36588] Stopping Batcher_0... +[2023-02-27 12:49:19,275][36588] Loop batcher_evt_loop terminating... +[2023-02-27 12:49:19,276][00107] Component Batcher_0 stopped! +[2023-02-27 12:49:19,284][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth... +[2023-02-27 12:49:19,320][36602] Weights refcount: 2 0 +[2023-02-27 12:49:19,328][36602] Stopping InferenceWorker_p0-w0... +[2023-02-27 12:49:19,329][36602] Loop inference_proc0-0_evt_loop terminating... +[2023-02-27 12:49:19,329][00107] Component InferenceWorker_p0-w0 stopped! 
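Editor's note: the paired "Saving ..." / "Removing ..." lines above show Sample Factory's rolling checkpoint behavior: each periodic save of a new checkpoint_<policy_version>_<env_steps>.pth is followed by deletion of the oldest one, so only the most recent few checkpoints stay on disk. A minimal sketch of that keep-last-N pattern follows; the helper name and the plain file write are illustrative stand-ins, not Sample Factory's actual internals.

import glob
import os

def save_and_rotate_checkpoint(ckpt_dir, policy_version, env_steps, state_bytes, keep_last=2):
    # Hypothetical helper mirroring the naming scheme seen in the log:
    # checkpoint_<9-digit policy_version>_<env_steps>.pth
    path = os.path.join(ckpt_dir, f"checkpoint_{policy_version:09d}_{env_steps}.pth")
    with open(path, "wb") as f:
        f.write(state_bytes)  # stand-in for torch.save(checkpoint_dict, path)
    # Zero-padded versions sort lexicographically, so the oldest files come first.
    checkpoints = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for old in checkpoints[:-keep_last]:
        os.remove(old)  # e.g. "Removing .../checkpoint_000002266_9281536.pth"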
+[2023-02-27 12:49:19,428][36588] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002266_9281536.pth
+[2023-02-27 12:49:19,443][36588] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2023-02-27 12:49:19,501][36619] Stopping RolloutWorker_w3...
+[2023-02-27 12:49:19,501][00107] Component RolloutWorker_w3 stopped!
+[2023-02-27 12:49:19,511][36605] Stopping RolloutWorker_w2...
+[2023-02-27 12:49:19,511][36605] Loop rollout_proc2_evt_loop terminating...
+[2023-02-27 12:49:19,513][36615] Stopping RolloutWorker_w4...
+[2023-02-27 12:49:19,512][00107] Component RolloutWorker_w2 stopped!
+[2023-02-27 12:49:19,516][36617] Stopping RolloutWorker_w6...
+[2023-02-27 12:49:19,519][36617] Loop rollout_proc6_evt_loop terminating...
+[2023-02-27 12:49:19,520][36615] Loop rollout_proc4_evt_loop terminating...
+[2023-02-27 12:49:19,516][00107] Component RolloutWorker_w4 stopped!
+[2023-02-27 12:49:19,521][00107] Component RolloutWorker_w6 stopped!
+[2023-02-27 12:49:19,503][36619] Loop rollout_proc3_evt_loop terminating...
+[2023-02-27 12:49:19,524][36611] Stopping RolloutWorker_w0...
+[2023-02-27 12:49:19,524][00107] Component RolloutWorker_w0 stopped!
+[2023-02-27 12:49:19,527][36611] Loop rollout_proc0_evt_loop terminating...
+[2023-02-27 12:49:19,541][00107] Component RolloutWorker_w7 stopped!
+[2023-02-27 12:49:19,543][36625] Stopping RolloutWorker_w7...
+[2023-02-27 12:49:19,545][36625] Loop rollout_proc7_evt_loop terminating...
+[2023-02-27 12:49:19,553][36613] Stopping RolloutWorker_w5...
+[2023-02-27 12:49:19,552][00107] Component RolloutWorker_w5 stopped!
+[2023-02-27 12:49:19,563][36613] Loop rollout_proc5_evt_loop terminating...
+[2023-02-27 12:49:19,611][36603] Stopping RolloutWorker_w1...
+[2023-02-27 12:49:19,611][00107] Component RolloutWorker_w1 stopped!
+[2023-02-27 12:49:19,619][36603] Loop rollout_proc1_evt_loop terminating...
+[2023-02-27 12:49:19,695][00107] Component LearnerWorker_p0 stopped!
+[2023-02-27 12:49:19,702][00107] Waiting for process learner_proc0 to stop...
+[2023-02-27 12:49:19,695][36588] Stopping LearnerWorker_p0...
+[2023-02-27 12:49:19,709][36588] Loop learner_proc0_evt_loop terminating...
+[2023-02-27 12:49:22,321][00107] Waiting for process inference_proc0-0 to join...
+[2023-02-27 12:49:22,405][00107] Waiting for process rollout_proc0 to join...
+[2023-02-27 12:49:22,407][00107] Waiting for process rollout_proc1 to join...
+[2023-02-27 12:49:22,500][00107] Waiting for process rollout_proc2 to join...
+[2023-02-27 12:49:22,502][00107] Waiting for process rollout_proc3 to join...
+[2023-02-27 12:49:22,509][00107] Waiting for process rollout_proc4 to join...
+[2023-02-27 12:49:22,513][00107] Waiting for process rollout_proc5 to join...
+[2023-02-27 12:49:22,515][00107] Waiting for process rollout_proc6 to join...
+[2023-02-27 12:49:22,518][00107] Waiting for process rollout_proc7 to join...
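Editor's note: with the batcher, the inference worker, all eight rollout workers (w0 through w7), and the learner stopped and their processes joined, the training run is complete. For context, a run like this is launched through Sample Factory's Python API; a minimal sketch follows, assuming the course-notebook-style VizDoom helpers (the sf_examples import path, helper names, and the exact argument list are assumptions, not taken from this log).

from sample_factory.train import run_rl
# Assumed import path; the Deep RL course notebook defines equivalent helpers inline.
from sf_examples.vizdoom.train_vizdoom import (
    parse_vizdoom_cfg,
    register_vizdoom_components,
)

register_vizdoom_components()  # register the doom_* environments and VizDoom models
cfg = parse_vizdoom_cfg(argv=[
    "--env=doom_health_gathering_supreme",
    "--num_workers=8",                  # matches rollout_proc0..7 in the shutdown above
    "--train_for_env_steps=10000000",   # consistent with the ~10,006,528 frames collected
])
status = run_rl(cfg)  # runs until the step budget is reached, then stops components as logged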
+[2023-02-27 12:49:22,519][00107] Batcher 0 profile tree view:
+batching: 40.1217, releasing_batches: 0.0466
+[2023-02-27 12:49:22,521][00107] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+ wait_policy_total: 806.4133
+update_model: 12.7522
+ weight_update: 0.0022
+one_step: 0.0070
+ handle_policy_step: 862.4692
+ deserialize: 25.7586, stack: 5.0748, obs_to_device_normalize: 190.8770, forward: 415.7173, send_messages: 44.3556
+ prepare_outputs: 136.8741
+ to_cpu: 82.9091
+[2023-02-27 12:49:22,523][00107] Learner 0 profile tree view:
+misc: 0.0093, prepare_batch: 24.1627
+train: 121.8860
+ epoch_init: 0.0093, minibatch_init: 0.0207, losses_postprocess: 0.9822, kl_divergence: 0.9118, after_optimizer: 4.9356
+ calculate_losses: 41.8561
+ losses_init: 0.0051, forward_head: 3.0213, bptt_initial: 27.2348, tail: 1.8025, advantages_returns: 0.4406, losses: 5.3536
+ bptt: 3.4238
+ bptt_forward_core: 3.2784
+ update: 71.9802
+ clip: 2.2662
+[2023-02-27 12:49:22,526][00107] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.4486, enqueue_policy_requests: 224.9518, env_step: 1330.9066, overhead: 37.4617, complete_rollouts: 10.6690
+save_policy_outputs: 32.9040
+ split_output_tensors: 15.8306
+[2023-02-27 12:49:22,528][00107] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.6556, enqueue_policy_requests: 222.0219, env_step: 1335.7480, overhead: 37.5753, complete_rollouts: 11.2510
+save_policy_outputs: 31.9910
+ split_output_tensors: 15.5536
+[2023-02-27 12:49:22,531][00107] Loop Runner_EvtLoop terminating...
+[2023-02-27 12:49:22,534][00107] Runner profile tree view:
+main_loop: 1774.6673
+[2023-02-27 12:49:22,537][00107] Collected {0: 10006528}, FPS: 3381.3
+[2023-02-27 12:51:38,512][00107] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-02-27 12:51:38,515][00107] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-27 12:51:38,517][00107] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-27 12:51:38,519][00107] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-27 12:51:38,523][00107] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 12:51:38,524][00107] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-27 12:51:38,525][00107] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 12:51:38,530][00107] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-02-27 12:51:38,531][00107] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2023-02-27 12:51:38,532][00107] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2023-02-27 12:51:38,533][00107] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-27 12:51:38,534][00107] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-27 12:51:38,535][00107] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-27 12:51:38,538][00107] Adding new argument 'enjoy_script'=None that is not in the saved config file!
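Editor's note: the block above shows enjoy-time configuration: the saved config.json is reloaded and evaluation-only arguments (no_render, save_video, max_num_episodes=10, and so on) are layered on top of the training config. A minimal sketch of the corresponding evaluation call, under the same assumed helpers as in the previous note:

from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.train_vizdoom import (  # assumed import path, as above
    parse_vizdoom_cfg,
    register_vizdoom_components,
)

register_vizdoom_components()
cfg = parse_vizdoom_cfg(argv=[
    "--env=doom_health_gathering_supreme",
    "--num_workers=1",        # matches the overridden arg in the log
    "--save_video",
    "--no_render",
    "--max_num_episodes=10",
], evaluation=True)
status = enjoy(cfg)  # loads the newest checkpoint, rolls out 10 episodes, writes replay.mp4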
+[2023-02-27 12:51:38,540][00107] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-27 12:51:38,572][00107] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-27 12:51:38,575][00107] RunningMeanStd input shape: (1,)
+[2023-02-27 12:51:38,598][00107] ConvEncoder: input_channels=3
+[2023-02-27 12:51:38,673][00107] Conv encoder output size: 512
+[2023-02-27 12:51:38,676][00107] Policy head output size: 512
+[2023-02-27 12:51:38,724][00107] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2023-02-27 12:51:39,495][00107] Num frames 100...
+[2023-02-27 12:51:39,673][00107] Num frames 200...
+[2023-02-27 12:51:39,864][00107] Num frames 300...
+[2023-02-27 12:51:40,047][00107] Num frames 400...
+[2023-02-27 12:51:40,190][00107] Avg episode rewards: #0: 6.480, true rewards: #0: 4.480
+[2023-02-27 12:51:40,192][00107] Avg episode reward: 6.480, avg true_objective: 4.480
+[2023-02-27 12:51:40,295][00107] Num frames 500...
+[2023-02-27 12:51:40,480][00107] Num frames 600...
+[2023-02-27 12:51:40,662][00107] Num frames 700...
+[2023-02-27 12:51:40,851][00107] Num frames 800...
+[2023-02-27 12:51:40,965][00107] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160
+[2023-02-27 12:51:40,968][00107] Avg episode reward: 5.160, avg true_objective: 4.160
+[2023-02-27 12:51:41,087][00107] Num frames 900...
+[2023-02-27 12:51:41,283][00107] Num frames 1000...
+[2023-02-27 12:51:41,465][00107] Num frames 1100...
+[2023-02-27 12:51:41,645][00107] Num frames 1200...
+[2023-02-27 12:51:41,768][00107] Num frames 1300...
+[2023-02-27 12:51:41,843][00107] Avg episode rewards: #0: 5.707, true rewards: #0: 4.373
+[2023-02-27 12:51:41,845][00107] Avg episode reward: 5.707, avg true_objective: 4.373
+[2023-02-27 12:51:41,956][00107] Num frames 1400...
+[2023-02-27 12:51:42,084][00107] Num frames 1500...
+[2023-02-27 12:51:42,210][00107] Num frames 1600...
+[2023-02-27 12:51:42,383][00107] Avg episode rewards: #0: 5.240, true rewards: #0: 4.240
+[2023-02-27 12:51:42,385][00107] Avg episode reward: 5.240, avg true_objective: 4.240
+[2023-02-27 12:51:42,395][00107] Num frames 1700...
+[2023-02-27 12:51:42,523][00107] Num frames 1800...
+[2023-02-27 12:51:42,643][00107] Num frames 1900...
+[2023-02-27 12:51:42,770][00107] Num frames 2000...
+[2023-02-27 12:51:42,923][00107] Avg episode rewards: #0: 4.960, true rewards: #0: 4.160
+[2023-02-27 12:51:42,924][00107] Avg episode reward: 4.960, avg true_objective: 4.160
+[2023-02-27 12:51:42,953][00107] Num frames 2100...
+[2023-02-27 12:51:43,071][00107] Num frames 2200...
+[2023-02-27 12:51:43,197][00107] Num frames 2300...
+[2023-02-27 12:51:43,303][00107] Avg episode rewards: #0: 4.560, true rewards: #0: 3.893
+[2023-02-27 12:51:43,306][00107] Avg episode reward: 4.560, avg true_objective: 3.893
+[2023-02-27 12:51:43,387][00107] Num frames 2400...
+[2023-02-27 12:51:43,508][00107] Num frames 2500...
+[2023-02-27 12:51:43,635][00107] Num frames 2600...
+[2023-02-27 12:51:43,756][00107] Num frames 2700...
+[2023-02-27 12:51:43,881][00107] Num frames 2800...
+[2023-02-27 12:51:44,035][00107] Avg episode rewards: #0: 4.829, true rewards: #0: 4.114
+[2023-02-27 12:51:44,036][00107] Avg episode reward: 4.829, avg true_objective: 4.114
+[2023-02-27 12:51:44,066][00107] Num frames 2900...
+[2023-02-27 12:51:44,195][00107] Num frames 3000...
+[2023-02-27 12:51:44,315][00107] Num frames 3100...
+[2023-02-27 12:51:44,446][00107] Num frames 3200...
+[2023-02-27 12:51:44,566][00107] Num frames 3300...
+[2023-02-27 12:51:44,732][00107] Avg episode rewards: #0: 5.115, true rewards: #0: 4.240
+[2023-02-27 12:51:44,734][00107] Avg episode reward: 5.115, avg true_objective: 4.240
+[2023-02-27 12:51:44,747][00107] Num frames 3400...
+[2023-02-27 12:51:44,873][00107] Num frames 3500...
+[2023-02-27 12:51:45,005][00107] Num frames 3600...
+[2023-02-27 12:51:45,126][00107] Num frames 3700...
+[2023-02-27 12:51:45,257][00107] Num frames 3800...
+[2023-02-27 12:51:45,391][00107] Num frames 3900...
+[2023-02-27 12:51:45,527][00107] Avg episode rewards: #0: 5.373, true rewards: #0: 4.373
+[2023-02-27 12:51:45,529][00107] Avg episode reward: 5.373, avg true_objective: 4.373
+[2023-02-27 12:51:45,692][00107] Num frames 4000...
+[2023-02-27 12:51:45,888][00107] Num frames 4100...
+[2023-02-27 12:51:46,156][00107] Num frames 4200...
+[2023-02-27 12:51:46,368][00107] Num frames 4300...
+[2023-02-27 12:51:46,558][00107] Avg episode rewards: #0: 5.252, true rewards: #0: 4.352
+[2023-02-27 12:51:46,560][00107] Avg episode reward: 5.252, avg true_objective: 4.352
+[2023-02-27 12:52:12,107][00107] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2023-02-27 12:54:07,980][00107] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2023-02-27 12:54:07,983][00107] Overriding arg 'num_workers' with value 1 passed from command line
+[2023-02-27 12:54:07,990][00107] Adding new argument 'no_render'=True that is not in the saved config file!
+[2023-02-27 12:54:07,992][00107] Adding new argument 'save_video'=True that is not in the saved config file!
+[2023-02-27 12:54:07,994][00107] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2023-02-27 12:54:07,997][00107] Adding new argument 'video_name'=None that is not in the saved config file!
+[2023-02-27 12:54:08,005][00107] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2023-02-27 12:54:08,008][00107] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2023-02-27 12:54:08,009][00107] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2023-02-27 12:54:08,012][00107] Adding new argument 'hf_repository'='KoRiF/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2023-02-27 12:54:08,014][00107] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2023-02-27 12:54:08,016][00107] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2023-02-27 12:54:08,019][00107] Adding new argument 'train_script'=None that is not in the saved config file!
+[2023-02-27 12:54:08,021][00107] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2023-02-27 12:54:08,023][00107] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2023-02-27 12:54:08,049][00107] RunningMeanStd input shape: (3, 72, 128)
+[2023-02-27 12:54:08,054][00107] RunningMeanStd input shape: (1,)
+[2023-02-27 12:54:08,076][00107] ConvEncoder: input_channels=3
+[2023-02-27 12:54:08,136][00107] Conv encoder output size: 512
+[2023-02-27 12:54:08,142][00107] Policy head output size: 512
+[2023-02-27 12:54:08,173][00107] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+[2023-02-27 12:54:08,879][00107] Num frames 100...
+[2023-02-27 12:54:09,077][00107] Num frames 200...
+[2023-02-27 12:54:09,244][00107] Num frames 300...
+[2023-02-27 12:54:09,385][00107] Num frames 400...
+[2023-02-27 12:54:09,504][00107] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
+[2023-02-27 12:54:09,506][00107] Avg episode reward: 5.480, avg true_objective: 4.480
+[2023-02-27 12:54:09,576][00107] Num frames 500...
+[2023-02-27 12:54:09,698][00107] Num frames 600...
+[2023-02-27 12:54:09,832][00107] Num frames 700...
+[2023-02-27 12:54:09,967][00107] Num frames 800...
+[2023-02-27 12:54:10,088][00107] Num frames 900...
+[2023-02-27 12:54:10,185][00107] Avg episode rewards: #0: 6.140, true rewards: #0: 4.640
+[2023-02-27 12:54:10,187][00107] Avg episode reward: 6.140, avg true_objective: 4.640
+[2023-02-27 12:54:10,295][00107] Num frames 1000...
+[2023-02-27 12:54:10,414][00107] Num frames 1100...
+[2023-02-27 12:54:10,536][00107] Num frames 1200...
+[2023-02-27 12:54:10,658][00107] Num frames 1300...
+[2023-02-27 12:54:10,731][00107] Avg episode rewards: #0: 5.373, true rewards: #0: 4.373
+[2023-02-27 12:54:10,733][00107] Avg episode reward: 5.373, avg true_objective: 4.373
+[2023-02-27 12:54:10,841][00107] Num frames 1400...
+[2023-02-27 12:54:10,979][00107] Num frames 1500...
+[2023-02-27 12:54:11,107][00107] Num frames 1600...
+[2023-02-27 12:54:11,280][00107] Avg episode rewards: #0: 4.990, true rewards: #0: 4.240
+[2023-02-27 12:54:11,282][00107] Avg episode reward: 4.990, avg true_objective: 4.240
+[2023-02-27 12:54:11,293][00107] Num frames 1700...
+[2023-02-27 12:54:11,428][00107] Num frames 1800...
+[2023-02-27 12:54:11,557][00107] Num frames 1900...
+[2023-02-27 12:54:11,688][00107] Num frames 2000...
+[2023-02-27 12:54:11,819][00107] Num frames 2100...
+[2023-02-27 12:54:11,930][00107] Avg episode rewards: #0: 5.088, true rewards: #0: 4.288
+[2023-02-27 12:54:11,932][00107] Avg episode reward: 5.088, avg true_objective: 4.288
+[2023-02-27 12:54:12,019][00107] Num frames 2200...
+[2023-02-27 12:54:12,152][00107] Num frames 2300...
+[2023-02-27 12:54:12,286][00107] Num frames 2400...
+[2023-02-27 12:54:12,429][00107] Num frames 2500...
+[2023-02-27 12:54:12,521][00107] Avg episode rewards: #0: 4.880, true rewards: #0: 4.213
+[2023-02-27 12:54:12,522][00107] Avg episode reward: 4.880, avg true_objective: 4.213
+[2023-02-27 12:54:12,614][00107] Num frames 2600...
+[2023-02-27 12:54:12,742][00107] Num frames 2700...
+[2023-02-27 12:54:12,872][00107] Num frames 2800...
+[2023-02-27 12:54:13,006][00107] Num frames 2900...
+[2023-02-27 12:54:13,141][00107] Avg episode rewards: #0: 4.920, true rewards: #0: 4.206
+[2023-02-27 12:54:13,142][00107] Avg episode reward: 4.920, avg true_objective: 4.206
+[2023-02-27 12:54:13,218][00107] Num frames 3000...
+[2023-02-27 12:54:13,345][00107] Num frames 3100...
+[2023-02-27 12:54:13,469][00107] Num frames 3200...
+[2023-02-27 12:54:13,598][00107] Num frames 3300...
+[2023-02-27 12:54:13,725][00107] Avg episode rewards: #0: 5.075, true rewards: #0: 4.200
+[2023-02-27 12:54:13,726][00107] Avg episode reward: 5.075, avg true_objective: 4.200
+[2023-02-27 12:54:13,779][00107] Num frames 3400...
+[2023-02-27 12:54:13,898][00107] Num frames 3500...
+[2023-02-27 12:54:14,030][00107] Num frames 3600...
+[2023-02-27 12:54:14,169][00107] Num frames 3700...
+[2023-02-27 12:54:14,280][00107] Avg episode rewards: #0: 4.938, true rewards: #0: 4.160
+[2023-02-27 12:54:14,283][00107] Avg episode reward: 4.938, avg true_objective: 4.160
+[2023-02-27 12:54:14,357][00107] Num frames 3800...
+[2023-02-27 12:54:14,485][00107] Num frames 3900...
+[2023-02-27 12:54:14,612][00107] Num frames 4000...
+[2023-02-27 12:54:14,731][00107] Num frames 4100...
+[2023-02-27 12:54:14,818][00107] Avg episode rewards: #0: 4.828, true rewards: #0: 4.128
+[2023-02-27 12:54:14,820][00107] Avg episode reward: 4.828, avg true_objective: 4.128
+[2023-02-27 12:54:36,484][00107] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
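Editor's note: this second evaluation pass was configured with push_to_hub=True and hf_repository='KoRiF/rl_course_vizdoom_health_gathering_supreme', so after replay.mp4 is saved the checkpoint, config, and video are uploaded to that Hugging Face repo. A minimal sketch of such a push-enabled run, under the same assumed helpers as above (an authenticated Hugging Face token is also required):

from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.train_vizdoom import (  # assumed import path, as above
    parse_vizdoom_cfg,
    register_vizdoom_components,
)

register_vizdoom_components()
cfg = parse_vizdoom_cfg(argv=[
    "--env=doom_health_gathering_supreme",
    "--num_workers=1",
    "--save_video",
    "--no_render",
    "--max_num_episodes=10",
    "--max_num_frames=100000",
    "--push_to_hub",
    "--hf_repository=KoRiF/rl_course_vizdoom_health_gathering_supreme",
], evaluation=True)
status = enjoy(cfg)  # evaluates, saves replay.mp4, then uploads the model files to the Hub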