Re-Re committed on
Commit 6eb81c6
1 Parent(s): 512adb5

Upload folder using huggingface_hub

.summary/0/events.out.tfevents.1725614538.4ed841473a2d ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:121fa67a0979acb434a0ddd33af384a01f7f9fc66661865a4b56ff1269baf428
+ size 439940
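
The entry above (and the checkpoint entries below) are Git LFS pointer files: the diff records only a three-line text stub (version, oid, size) while the actual binary lives in LFS storage. A minimal sketch of parsing such a pointer, assuming only the three-field format visible in this diff (the helper name is hypothetical):

    def parse_lfs_pointer(text: str) -> dict:
        # A pointer file is three "key value" lines: version, oid, size.
        fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
        return {
            "version": fields["version"],                     # LFS spec URL
            "sha256": fields["oid"].removeprefix("sha256:"),  # hash of the real file
            "size_bytes": int(fields["size"]),                # true file size in bytes
        }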
README.md CHANGED
@@ -15,7 +15,7 @@ model-index:
   type: doom_health_gathering_supreme
   metrics:
   - type: mean_reward
- value: 10.63 +/- 6.83
+ value: 10.94 +/- 5.09
   name: mean_reward
   verified: false
 ---
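
The only change to README.md is the model-index metric: mean_reward improves from 10.63 +/- 6.83 to 10.94 +/- 5.09 after the additional training. The evaluation code is not part of this commit, but a "mean +/- std over evaluation episodes" summary of this shape is conventionally computed as in this illustrative sketch:

    import statistics

    def summarize_mean_reward(episode_rewards: list[float]) -> str:
        # Produces a "10.94 +/- 5.09" style string from per-episode returns.
        mean = statistics.mean(episode_rewards)
        std = statistics.pstdev(episode_rewards)
        return f"{mean:.2f} +/- {std:.2f}"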
checkpoint_p0/best_000001877_7688192_reward_32.833.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5e71c48acc753ddbc8ea9cf3b803375143d14871a05497b1a618c47f7493accd
+ size 34929243
checkpoint_p0/checkpoint_000002369_9703424.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bbead7acfb60c8f94f705d9c84be57632bd36ccf62e720fa3fe5889a436b817c
+ size 34929669
checkpoint_p0/checkpoint_000002443_10006528.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c450c1532323f29ce4650608fa3c8c278b49cabc3f34136904a214ace6b6020f
+ size 34929669
config.json CHANGED
@@ -65,7 +65,7 @@
   "summaries_use_frameskip": true,
   "heartbeat_interval": 20,
   "heartbeat_reporting_interval": 600,
- "train_for_env_steps": 7500000,
+ "train_for_env_steps": 10000000,
   "train_for_seconds": 10000000000,
   "save_every_sec": 120,
   "keep_checkpoints": 2,
replay.mp4 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:69c2e0d51b9dbf8963d4319ec5837c494a3bc0eb9889b1dbc46d480404b78135
- size 20775831
+ oid sha256:2ecb1b56568acf55e8bfc29a0bd773fc02497b82450c0461b20408cf111136ca
+ size 20676826
sf_log.txt CHANGED
@@ -2946,3 +2946,1058 @@ main_loop: 690.9829
  [2024-09-06 09:19:52,177][01070] Avg episode rewards: #0: 26.327, true rewards: #0: 10.627
  [2024-09-06 09:19:52,179][01070] Avg episode reward: 26.327, avg true_objective: 10.627
  [2024-09-06 09:20:57,694][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
2949
+ [2024-09-06 09:21:03,134][01070] The model has been pushed to https://huggingface.co/Re-Re/rl_course_vizdoom_health_gathering_supreme
2950
+ [2024-09-06 09:22:18,448][01070] Environment doom_basic already registered, overwriting...
2951
+ [2024-09-06 09:22:18,451][01070] Environment doom_two_colors_easy already registered, overwriting...
2952
+ [2024-09-06 09:22:18,453][01070] Environment doom_two_colors_hard already registered, overwriting...
2953
+ [2024-09-06 09:22:18,454][01070] Environment doom_dm already registered, overwriting...
2954
+ [2024-09-06 09:22:18,459][01070] Environment doom_dwango5 already registered, overwriting...
2955
+ [2024-09-06 09:22:18,460][01070] Environment doom_my_way_home_flat_actions already registered, overwriting...
2956
+ [2024-09-06 09:22:18,461][01070] Environment doom_defend_the_center_flat_actions already registered, overwriting...
2957
+ [2024-09-06 09:22:18,463][01070] Environment doom_my_way_home already registered, overwriting...
2958
+ [2024-09-06 09:22:18,465][01070] Environment doom_deadly_corridor already registered, overwriting...
2959
+ [2024-09-06 09:22:18,468][01070] Environment doom_defend_the_center already registered, overwriting...
2960
+ [2024-09-06 09:22:18,470][01070] Environment doom_defend_the_line already registered, overwriting...
2961
+ [2024-09-06 09:22:18,471][01070] Environment doom_health_gathering already registered, overwriting...
2962
+ [2024-09-06 09:22:18,472][01070] Environment doom_health_gathering_supreme already registered, overwriting...
2963
+ [2024-09-06 09:22:18,474][01070] Environment doom_battle already registered, overwriting...
2964
+ [2024-09-06 09:22:18,476][01070] Environment doom_battle2 already registered, overwriting...
2965
+ [2024-09-06 09:22:18,477][01070] Environment doom_duel_bots already registered, overwriting...
2966
+ [2024-09-06 09:22:18,480][01070] Environment doom_deathmatch_bots already registered, overwriting...
2967
+ [2024-09-06 09:22:18,481][01070] Environment doom_duel already registered, overwriting...
2968
+ [2024-09-06 09:22:18,482][01070] Environment doom_deathmatch_full already registered, overwriting...
2969
+ [2024-09-06 09:22:18,484][01070] Environment doom_benchmark already registered, overwriting...
2970
+ [2024-09-06 09:22:18,487][01070] register_encoder_factory: <function make_vizdoom_encoder at 0x78dc5537e170>
2971
+ [2024-09-06 09:22:18,527][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
2972
+ [2024-09-06 09:22:18,530][01070] Overriding arg 'train_for_env_steps' with value 10000000 passed from command line
2973
+ [2024-09-06 09:22:18,537][01070] Experiment dir /content/train_dir/default_experiment already exists!
2974
+ [2024-09-06 09:22:18,542][01070] Resuming existing experiment from /content/train_dir/default_experiment...
2975
+ [2024-09-06 09:22:18,544][01070] Weights and Biases integration disabled
2976
+ [2024-09-06 09:22:18,549][01070] Environment var CUDA_VISIBLE_DEVICES is 0
2977
+
2978
+ [2024-09-06 09:22:21,018][01070] Starting experiment with the following configuration:
2979
+ help=False
2980
+ algo=APPO
2981
+ env=doom_health_gathering_supreme
2982
+ experiment=default_experiment
2983
+ train_dir=/content/train_dir
2984
+ restart_behavior=resume
2985
+ device=gpu
2986
+ seed=None
2987
+ num_policies=1
2988
+ async_rl=True
2989
+ serial_mode=False
2990
+ batched_sampling=False
2991
+ num_batches_to_accumulate=2
2992
+ worker_num_splits=2
2993
+ policy_workers_per_policy=1
2994
+ max_policy_lag=1000
2995
+ num_workers=8
2996
+ num_envs_per_worker=4
2997
+ batch_size=1024
2998
+ num_batches_per_epoch=1
2999
+ num_epochs=1
3000
+ rollout=32
3001
+ recurrence=32
3002
+ shuffle_minibatches=False
3003
+ gamma=0.99
3004
+ reward_scale=1.0
3005
+ reward_clip=1000.0
3006
+ value_bootstrap=False
3007
+ normalize_returns=True
3008
+ exploration_loss_coeff=0.001
3009
+ value_loss_coeff=0.5
3010
+ kl_loss_coeff=0.0
3011
+ exploration_loss=symmetric_kl
3012
+ gae_lambda=0.95
3013
+ ppo_clip_ratio=0.1
3014
+ ppo_clip_value=0.2
3015
+ with_vtrace=False
3016
+ vtrace_rho=1.0
3017
+ vtrace_c=1.0
3018
+ optimizer=adam
3019
+ adam_eps=1e-06
3020
+ adam_beta1=0.9
3021
+ adam_beta2=0.999
3022
+ max_grad_norm=4.0
3023
+ learning_rate=0.0001
3024
+ lr_schedule=constant
3025
+ lr_schedule_kl_threshold=0.008
3026
+ lr_adaptive_min=1e-06
3027
+ lr_adaptive_max=0.01
3028
+ obs_subtract_mean=0.0
3029
+ obs_scale=255.0
3030
+ normalize_input=True
3031
+ normalize_input_keys=None
3032
+ decorrelate_experience_max_seconds=0
3033
+ decorrelate_envs_on_one_worker=True
3034
+ actor_worker_gpus=[]
3035
+ set_workers_cpu_affinity=True
3036
+ force_envs_single_thread=False
3037
+ default_niceness=0
3038
+ log_to_file=True
3039
+ experiment_summaries_interval=10
3040
+ flush_summaries_interval=30
3041
+ stats_avg=100
3042
+ summaries_use_frameskip=True
3043
+ heartbeat_interval=20
3044
+ heartbeat_reporting_interval=600
3045
+ train_for_env_steps=10000000
3046
+ train_for_seconds=10000000000
3047
+ save_every_sec=120
3048
+ keep_checkpoints=2
3049
+ load_checkpoint_kind=latest
3050
+ save_milestones_sec=-1
3051
+ save_best_every_sec=5
3052
+ save_best_metric=reward
3053
+ save_best_after=100000
3054
+ benchmark=False
3055
+ encoder_mlp_layers=[512, 512]
3056
+ encoder_conv_architecture=convnet_simple
3057
+ encoder_conv_mlp_layers=[512]
3058
+ use_rnn=True
3059
+ rnn_size=512
3060
+ rnn_type=gru
3061
+ rnn_num_layers=1
3062
+ decoder_mlp_layers=[]
3063
+ nonlinearity=elu
3064
+ policy_initialization=orthogonal
3065
+ policy_init_gain=1.0
3066
+ actor_critic_share_weights=True
3067
+ adaptive_stddev=True
3068
+ continuous_tanh_scale=0.0
3069
+ initial_stddev=1.0
3070
+ use_env_info_cache=False
3071
+ env_gpu_actions=False
3072
+ env_gpu_observations=True
3073
+ env_frameskip=4
3074
+ env_framestack=1
3075
+ pixel_format=CHW
3076
+ use_record_episode_statistics=False
3077
+ with_wandb=False
3078
+ wandb_user=None
3079
+ wandb_project=sample_factory
3080
+ wandb_group=None
3081
+ wandb_job_type=SF
3082
+ wandb_tags=[]
3083
+ with_pbt=False
3084
+ pbt_mix_policies_in_one_env=True
3085
+ pbt_period_env_steps=5000000
3086
+ pbt_start_mutation=20000000
3087
+ pbt_replace_fraction=0.3
3088
+ pbt_mutation_rate=0.15
3089
+ pbt_replace_reward_gap=0.1
3090
+ pbt_replace_reward_gap_absolute=1e-06
3091
+ pbt_optimize_gamma=False
3092
+ pbt_target_objective=true_objective
3093
+ pbt_perturb_min=1.1
3094
+ pbt_perturb_max=1.5
3095
+ num_agents=-1
3096
+ num_humans=0
3097
+ num_bots=-1
3098
+ start_bot_difficulty=None
3099
+ timelimit=None
3100
+ res_w=128
3101
+ res_h=72
3102
+ wide_aspect_ratio=False
3103
+ eval_env_frameskip=1
3104
+ fps=35
3105
+ command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
3106
+ cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
3107
+ git_hash=unknown
3108
+ git_repo_name=not a git repository
3109
+ [2024-09-06 09:22:21,020][01070] Saving configuration to /content/train_dir/default_experiment/config.json...
3110
+ [2024-09-06 09:22:21,025][01070] Rollout worker 0 uses device cpu
3111
+ [2024-09-06 09:22:21,026][01070] Rollout worker 1 uses device cpu
3112
+ [2024-09-06 09:22:21,028][01070] Rollout worker 2 uses device cpu
3113
+ [2024-09-06 09:22:21,030][01070] Rollout worker 3 uses device cpu
3114
+ [2024-09-06 09:22:21,032][01070] Rollout worker 4 uses device cpu
3115
+ [2024-09-06 09:22:21,033][01070] Rollout worker 5 uses device cpu
3116
+ [2024-09-06 09:22:21,034][01070] Rollout worker 6 uses device cpu
3117
+ [2024-09-06 09:22:21,035][01070] Rollout worker 7 uses device cpu
3118
+ [2024-09-06 09:22:21,108][01070] Using GPUs [0] for process 0 (actually maps to GPUs [0])
3119
+ [2024-09-06 09:22:21,110][01070] InferenceWorker_p0-w0: min num requests: 2
3120
+ [2024-09-06 09:22:21,142][01070] Starting all processes...
3121
+ [2024-09-06 09:22:21,144][01070] Starting process learner_proc0
3122
+ [2024-09-06 09:22:21,192][01070] Starting all processes...
3123
+ [2024-09-06 09:22:21,199][01070] Starting process inference_proc0-0
3124
+ [2024-09-06 09:22:21,199][01070] Starting process rollout_proc0
3125
+ [2024-09-06 09:22:21,202][01070] Starting process rollout_proc1
3126
+ [2024-09-06 09:22:21,202][01070] Starting process rollout_proc2
3127
+ [2024-09-06 09:22:21,202][01070] Starting process rollout_proc3
3128
+ [2024-09-06 09:22:21,202][01070] Starting process rollout_proc4
3129
+ [2024-09-06 09:22:21,202][01070] Starting process rollout_proc5
3130
+ [2024-09-06 09:22:21,202][01070] Starting process rollout_proc6
3131
+ [2024-09-06 09:22:21,202][01070] Starting process rollout_proc7
3132
+ [2024-09-06 09:22:36,030][31321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
3133
+ [2024-09-06 09:22:36,032][31321] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
3134
+ [2024-09-06 09:22:36,093][31321] Num visible devices: 1
3135
+ [2024-09-06 09:22:36,128][31321] Starting seed is not provided
3136
+ [2024-09-06 09:22:36,129][31321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
3137
+ [2024-09-06 09:22:36,129][31321] Initializing actor-critic model on device cuda:0
3138
+ [2024-09-06 09:22:36,130][31321] RunningMeanStd input shape: (3, 72, 128)
3139
+ [2024-09-06 09:22:36,131][31321] RunningMeanStd input shape: (1,)
3140
+ [2024-09-06 09:22:36,216][31321] ConvEncoder: input_channels=3
3141
+ [2024-09-06 09:22:36,685][31338] Worker 3 uses CPU cores [1]
3142
+ [2024-09-06 09:22:36,789][31339] Worker 4 uses CPU cores [0]
3143
+ [2024-09-06 09:22:36,944][31335] Worker 0 uses CPU cores [0]
3144
+ [2024-09-06 09:22:37,005][31321] Conv encoder output size: 512
3145
+ [2024-09-06 09:22:37,007][31321] Policy head output size: 512
3146
+ [2024-09-06 09:22:37,042][31321] Created Actor Critic model with architecture:
3147
+ [2024-09-06 09:22:37,044][31321] ActorCriticSharedWeights(
3148
+ (obs_normalizer): ObservationNormalizer(
3149
+ (running_mean_std): RunningMeanStdDictInPlace(
3150
+ (running_mean_std): ModuleDict(
3151
+ (obs): RunningMeanStdInPlace()
3152
+ )
3153
+ )
3154
+ )
3155
+ (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
3156
+ (encoder): VizdoomEncoder(
3157
+ (basic_encoder): ConvEncoder(
3158
+ (enc): RecursiveScriptModule(
3159
+ original_name=ConvEncoderImpl
3160
+ (conv_head): RecursiveScriptModule(
3161
+ original_name=Sequential
3162
+ (0): RecursiveScriptModule(original_name=Conv2d)
3163
+ (1): RecursiveScriptModule(original_name=ELU)
3164
+ (2): RecursiveScriptModule(original_name=Conv2d)
3165
+ (3): RecursiveScriptModule(original_name=ELU)
3166
+ (4): RecursiveScriptModule(original_name=Conv2d)
3167
+ (5): RecursiveScriptModule(original_name=ELU)
3168
+ )
3169
+ (mlp_layers): RecursiveScriptModule(
3170
+ original_name=Sequential
3171
+ (0): RecursiveScriptModule(original_name=Linear)
3172
+ (1): RecursiveScriptModule(original_name=ELU)
3173
+ )
3174
+ )
3175
+ )
3176
+ )
3177
+ (core): ModelCoreRNN(
3178
+ (core): GRU(512, 512)
3179
+ )
3180
+ (decoder): MlpDecoder(
3181
+ (mlp): Identity()
3182
+ )
3183
+ (critic_linear): Linear(in_features=512, out_features=1, bias=True)
3184
+ (action_parameterization): ActionParameterizationDefault(
3185
+ (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
3186
+ )
3187
+ )
3188
+ [2024-09-06 09:22:37,147][31341] Worker 7 uses CPU cores [1]
3189
+ [2024-09-06 09:22:37,225][31342] Worker 6 uses CPU cores [0]
3190
+ [2024-09-06 09:22:37,277][31337] Worker 2 uses CPU cores [0]
3191
+ [2024-09-06 09:22:37,291][31334] Using GPUs [0] for process 0 (actually maps to GPUs [0])
3192
+ [2024-09-06 09:22:37,292][31334] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
3193
+ [2024-09-06 09:22:37,329][31321] Using optimizer <class 'torch.optim.adam.Adam'>
3194
+ [2024-09-06 09:22:37,352][31334] Num visible devices: 1
3195
+ [2024-09-06 09:22:37,367][31340] Worker 5 uses CPU cores [1]
3196
+ [2024-09-06 09:22:37,377][31336] Worker 1 uses CPU cores [1]
3197
+ [2024-09-06 09:22:37,959][31321] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001833_7507968.pth...
3198
+ [2024-09-06 09:22:38,004][31321] Loading model from checkpoint
3199
+ [2024-09-06 09:22:38,005][31321] Loaded experiment state at self.train_step=1833, self.env_steps=7507968
3200
+ [2024-09-06 09:22:38,006][31321] Initialized policy 0 weights for model version 1833
3201
+ [2024-09-06 09:22:38,010][31321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
3202
+ [2024-09-06 09:22:38,016][31321] LearnerWorker_p0 finished initialization!
3203
+ [2024-09-06 09:22:38,102][31334] RunningMeanStd input shape: (3, 72, 128)
3204
+ [2024-09-06 09:22:38,103][31334] RunningMeanStd input shape: (1,)
3205
+ [2024-09-06 09:22:38,115][31334] ConvEncoder: input_channels=3
3206
+ [2024-09-06 09:22:38,216][31334] Conv encoder output size: 512
3207
+ [2024-09-06 09:22:38,216][31334] Policy head output size: 512
3208
+ [2024-09-06 09:22:38,268][01070] Inference worker 0-0 is ready!
3209
+ [2024-09-06 09:22:38,269][01070] All inference workers are ready! Signal rollout workers to start!
3210
+ [2024-09-06 09:22:38,464][31338] Doom resolution: 160x120, resize resolution: (128, 72)
3211
+ [2024-09-06 09:22:38,472][31341] Doom resolution: 160x120, resize resolution: (128, 72)
3212
+ [2024-09-06 09:22:38,478][31340] Doom resolution: 160x120, resize resolution: (128, 72)
3213
+ [2024-09-06 09:22:38,479][31337] Doom resolution: 160x120, resize resolution: (128, 72)
3214
+ [2024-09-06 09:22:38,480][31336] Doom resolution: 160x120, resize resolution: (128, 72)
3215
+ [2024-09-06 09:22:38,484][31339] Doom resolution: 160x120, resize resolution: (128, 72)
3216
+ [2024-09-06 09:22:38,477][31342] Doom resolution: 160x120, resize resolution: (128, 72)
3217
+ [2024-09-06 09:22:38,494][31335] Doom resolution: 160x120, resize resolution: (128, 72)
3218
+ [2024-09-06 09:22:38,550][01070] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 7507968. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
3219
+ [2024-09-06 09:22:40,039][31338] Decorrelating experience for 0 frames...
3220
+ [2024-09-06 09:22:40,046][31341] Decorrelating experience for 0 frames...
3221
+ [2024-09-06 09:22:40,050][31340] Decorrelating experience for 0 frames...
3222
+ [2024-09-06 09:22:40,067][31335] Decorrelating experience for 0 frames...
3223
+ [2024-09-06 09:22:40,071][31339] Decorrelating experience for 0 frames...
3224
+ [2024-09-06 09:22:40,073][31342] Decorrelating experience for 0 frames...
3225
+ [2024-09-06 09:22:40,457][31342] Decorrelating experience for 32 frames...
3226
+ [2024-09-06 09:22:41,103][01070] Heartbeat connected on Batcher_0
3227
+ [2024-09-06 09:22:41,108][01070] Heartbeat connected on LearnerWorker_p0
3228
+ [2024-09-06 09:22:41,149][31341] Decorrelating experience for 32 frames...
3229
+ [2024-09-06 09:22:41,152][31338] Decorrelating experience for 32 frames...
3230
+ [2024-09-06 09:22:41,156][01070] Heartbeat connected on InferenceWorker_p0-w0
3231
+ [2024-09-06 09:22:41,155][31336] Decorrelating experience for 0 frames...
3232
+ [2024-09-06 09:22:41,251][31340] Decorrelating experience for 32 frames...
3233
+ [2024-09-06 09:22:41,935][31339] Decorrelating experience for 32 frames...
3234
+ [2024-09-06 09:22:41,990][31342] Decorrelating experience for 64 frames...
3235
+ [2024-09-06 09:22:42,475][31336] Decorrelating experience for 32 frames...
3236
+ [2024-09-06 09:22:42,759][31338] Decorrelating experience for 64 frames...
3237
+ [2024-09-06 09:22:42,770][31341] Decorrelating experience for 64 frames...
3238
+ [2024-09-06 09:22:42,938][31337] Decorrelating experience for 0 frames...
3239
+ [2024-09-06 09:22:43,009][31342] Decorrelating experience for 96 frames...
3240
+ [2024-09-06 09:22:43,190][01070] Heartbeat connected on RolloutWorker_w6
3241
+ [2024-09-06 09:22:43,550][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 7507968. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
3242
+ [2024-09-06 09:22:43,730][31341] Decorrelating experience for 96 frames...
3243
+ [2024-09-06 09:22:44,157][01070] Heartbeat connected on RolloutWorker_w7
3244
+ [2024-09-06 09:22:44,451][31339] Decorrelating experience for 64 frames...
3245
+ [2024-09-06 09:22:44,856][31336] Decorrelating experience for 64 frames...
3246
+ [2024-09-06 09:22:44,864][31335] Decorrelating experience for 32 frames...
3247
+ [2024-09-06 09:22:46,558][31337] Decorrelating experience for 32 frames...
3248
+ [2024-09-06 09:22:47,472][31340] Decorrelating experience for 64 frames...
3249
+ [2024-09-06 09:22:47,615][31339] Decorrelating experience for 96 frames...
3250
+ [2024-09-06 09:22:47,884][31336] Decorrelating experience for 96 frames...
3251
+ [2024-09-06 09:22:48,559][01070] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 7507968. Throughput: 0: 107.7. Samples: 1078. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
3252
+ [2024-09-06 09:22:48,565][01070] Avg episode reward: [(0, '4.944')]
3253
+ [2024-09-06 09:22:48,624][01070] Heartbeat connected on RolloutWorker_w4
3254
+ [2024-09-06 09:22:48,733][01070] Heartbeat connected on RolloutWorker_w1
3255
+ [2024-09-06 09:22:49,622][31335] Decorrelating experience for 64 frames...
3256
+ [2024-09-06 09:22:51,819][31321] Signal inference workers to stop experience collection...
3257
+ [2024-09-06 09:22:51,828][31338] Decorrelating experience for 96 frames...
3258
+ [2024-09-06 09:22:51,835][31334] InferenceWorker_p0-w0: stopping experience collection
3259
+ [2024-09-06 09:22:52,119][31340] Decorrelating experience for 96 frames...
3260
+ [2024-09-06 09:22:52,209][01070] Heartbeat connected on RolloutWorker_w3
3261
+ [2024-09-06 09:22:52,262][01070] Heartbeat connected on RolloutWorker_w5
3262
+ [2024-09-06 09:22:52,408][31337] Decorrelating experience for 64 frames...
3263
+ [2024-09-06 09:22:52,497][31335] Decorrelating experience for 96 frames...
3264
+ [2024-09-06 09:22:52,617][01070] Heartbeat connected on RolloutWorker_w0
3265
+ [2024-09-06 09:22:52,947][31337] Decorrelating experience for 96 frames...
3266
+ [2024-09-06 09:22:53,040][01070] Heartbeat connected on RolloutWorker_w2
3267
+ [2024-09-06 09:22:53,160][31321] Signal inference workers to resume experience collection...
3268
+ [2024-09-06 09:22:53,160][31334] InferenceWorker_p0-w0: resuming experience collection
3269
+ [2024-09-06 09:22:53,550][01070] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 7512064. Throughput: 0: 155.3. Samples: 2330. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
3270
+ [2024-09-06 09:22:53,554][01070] Avg episode reward: [(0, '6.489')]
3271
+ [2024-09-06 09:22:58,550][01070] Fps is (10 sec: 2870.0, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 7536640. Throughput: 0: 280.7. Samples: 5614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
3272
+ [2024-09-06 09:22:58,553][01070] Avg episode reward: [(0, '9.260')]
3273
+ [2024-09-06 09:23:00,818][31334] Updated weights for policy 0, policy_version 1843 (0.0158)
3274
+ [2024-09-06 09:23:03,550][01070] Fps is (10 sec: 4096.0, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 7553024. Throughput: 0: 476.4. Samples: 11910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
3275
+ [2024-09-06 09:23:03,552][01070] Avg episode reward: [(0, '15.401')]
3276
+ [2024-09-06 09:23:08,550][01070] Fps is (10 sec: 3276.7, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 7569408. Throughput: 0: 542.8. Samples: 16284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
3277
+ [2024-09-06 09:23:08,552][01070] Avg episode reward: [(0, '17.480')]
3278
+ [2024-09-06 09:23:13,081][31334] Updated weights for policy 0, policy_version 1853 (0.0030)
3279
+ [2024-09-06 09:23:13,550][01070] Fps is (10 sec: 3686.4, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 7589888. Throughput: 0: 541.2. Samples: 18942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3280
+ [2024-09-06 09:23:13,555][01070] Avg episode reward: [(0, '23.472')]
3281
+ [2024-09-06 09:23:18,550][01070] Fps is (10 sec: 4505.7, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 7614464. Throughput: 0: 651.7. Samples: 26066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3282
+ [2024-09-06 09:23:18,556][01070] Avg episode reward: [(0, '24.891')]
3283
+ [2024-09-06 09:23:23,550][01070] Fps is (10 sec: 3686.4, 60 sec: 2639.6, 300 sec: 2639.6). Total num frames: 7626752. Throughput: 0: 685.1. Samples: 30830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
3284
+ [2024-09-06 09:23:23,554][01070] Avg episode reward: [(0, '25.069')]
3285
+ [2024-09-06 09:23:24,047][31334] Updated weights for policy 0, policy_version 1863 (0.0031)
3286
+ [2024-09-06 09:23:28,550][01070] Fps is (10 sec: 3276.8, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 7647232. Throughput: 0: 741.2. Samples: 33354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
3287
+ [2024-09-06 09:23:28,551][01070] Avg episode reward: [(0, '27.639')]
3288
+ [2024-09-06 09:23:33,156][31334] Updated weights for policy 0, policy_version 1873 (0.0025)
3289
+ [2024-09-06 09:23:33,550][01070] Fps is (10 sec: 4505.6, 60 sec: 2978.9, 300 sec: 2978.9). Total num frames: 7671808. Throughput: 0: 875.4. Samples: 40464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3290
+ [2024-09-06 09:23:33,552][01070] Avg episode reward: [(0, '30.649')]
3291
+ [2024-09-06 09:23:33,557][31321] Saving new best policy, reward=30.649!
3292
+ [2024-09-06 09:23:38,550][01070] Fps is (10 sec: 4095.7, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 7688192. Throughput: 0: 974.7. Samples: 46194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
3293
+ [2024-09-06 09:23:38,553][01070] Avg episode reward: [(0, '32.833')]
3294
+ [2024-09-06 09:23:38,574][31321] Saving new best policy, reward=32.833!
3295
+ [2024-09-06 09:23:43,550][01070] Fps is (10 sec: 3276.7, 60 sec: 3276.8, 300 sec: 3024.7). Total num frames: 7704576. Throughput: 0: 947.1. Samples: 48236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
3296
+ [2024-09-06 09:23:43,556][01070] Avg episode reward: [(0, '31.634')]
3297
+ [2024-09-06 09:23:45,051][31334] Updated weights for policy 0, policy_version 1883 (0.0024)
3298
+ [2024-09-06 09:23:48,550][01070] Fps is (10 sec: 4096.3, 60 sec: 3687.0, 300 sec: 3159.8). Total num frames: 7729152. Throughput: 0: 945.8. Samples: 54472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3299
+ [2024-09-06 09:23:48,555][01070] Avg episode reward: [(0, '30.545')]
3300
+ [2024-09-06 09:23:53,550][01070] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3222.2). Total num frames: 7749632. Throughput: 0: 1005.1. Samples: 61512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3301
+ [2024-09-06 09:23:53,555][01070] Avg episode reward: [(0, '29.693')]
3302
+ [2024-09-06 09:23:53,894][31334] Updated weights for policy 0, policy_version 1893 (0.0023)
3303
+ [2024-09-06 09:23:58,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3225.6). Total num frames: 7766016. Throughput: 0: 993.6. Samples: 63654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3304
+ [2024-09-06 09:23:58,552][01070] Avg episode reward: [(0, '29.410')]
3305
+ [2024-09-06 09:24:03,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3276.8). Total num frames: 7786496. Throughput: 0: 953.6. Samples: 68980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3306
+ [2024-09-06 09:24:03,553][01070] Avg episode reward: [(0, '29.158')]
3307
+ [2024-09-06 09:24:04,972][31334] Updated weights for policy 0, policy_version 1903 (0.0015)
3308
+ [2024-09-06 09:24:08,550][01070] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3367.8). Total num frames: 7811072. Throughput: 0: 1005.6. Samples: 76080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3309
+ [2024-09-06 09:24:08,552][01070] Avg episode reward: [(0, '26.234')]
3310
+ [2024-09-06 09:24:13,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3363.0). Total num frames: 7827456. Throughput: 0: 1019.1. Samples: 79212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3311
+ [2024-09-06 09:24:13,554][01070] Avg episode reward: [(0, '26.548')]
3312
+ [2024-09-06 09:24:15,998][31334] Updated weights for policy 0, policy_version 1913 (0.0045)
3313
+ [2024-09-06 09:24:18,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3358.7). Total num frames: 7843840. Throughput: 0: 955.4. Samples: 83456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
3314
+ [2024-09-06 09:24:18,555][01070] Avg episode reward: [(0, '26.397')]
3315
+ [2024-09-06 09:24:18,565][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001915_7843840.pth...
3316
+ [2024-09-06 09:24:18,702][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001755_7188480.pth
3317
+ [2024-09-06 09:24:23,550][01070] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3393.8). Total num frames: 7864320. Throughput: 0: 976.5. Samples: 90134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
3318
+ [2024-09-06 09:24:23,555][01070] Avg episode reward: [(0, '27.525')]
3319
+ [2024-09-06 09:24:25,519][31334] Updated weights for policy 0, policy_version 1923 (0.0026)
3320
+ [2024-09-06 09:24:28,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3463.0). Total num frames: 7888896. Throughput: 0: 1010.0. Samples: 93684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
3321
+ [2024-09-06 09:24:28,557][01070] Avg episode reward: [(0, '27.802')]
3322
+ [2024-09-06 09:24:33,550][01070] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3419.3). Total num frames: 7901184. Throughput: 0: 983.9. Samples: 98746. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3323
+ [2024-09-06 09:24:33,552][01070] Avg episode reward: [(0, '28.191')]
3324
+ [2024-09-06 09:24:36,946][31334] Updated weights for policy 0, policy_version 1933 (0.0021)
3325
+ [2024-09-06 09:24:38,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3447.5). Total num frames: 7921664. Throughput: 0: 955.6. Samples: 104512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3326
+ [2024-09-06 09:24:38,555][01070] Avg episode reward: [(0, '28.196')]
3327
+ [2024-09-06 09:24:43,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3506.2). Total num frames: 7946240. Throughput: 0: 986.7. Samples: 108056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3328
+ [2024-09-06 09:24:43,552][01070] Avg episode reward: [(0, '28.823')]
3329
+ [2024-09-06 09:24:45,998][31334] Updated weights for policy 0, policy_version 1943 (0.0024)
3330
+ [2024-09-06 09:24:48,552][01070] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3497.3). Total num frames: 7962624. Throughput: 0: 1001.7. Samples: 114060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3331
+ [2024-09-06 09:24:48,558][01070] Avg episode reward: [(0, '30.975')]
3332
+ [2024-09-06 09:24:53,550][01070] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3458.8). Total num frames: 7974912. Throughput: 0: 927.9. Samples: 117836. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3333
+ [2024-09-06 09:24:53,552][01070] Avg episode reward: [(0, '30.576')]
3334
+ [2024-09-06 09:24:58,550][01070] Fps is (10 sec: 2867.7, 60 sec: 3754.7, 300 sec: 3452.3). Total num frames: 7991296. Throughput: 0: 904.0. Samples: 119892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3335
+ [2024-09-06 09:24:58,553][01070] Avg episode reward: [(0, '29.362')]
3336
+ [2024-09-06 09:25:00,366][31334] Updated weights for policy 0, policy_version 1953 (0.0019)
3337
+ [2024-09-06 09:25:03,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3474.5). Total num frames: 8011776. Throughput: 0: 940.5. Samples: 125780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3338
+ [2024-09-06 09:25:03,552][01070] Avg episode reward: [(0, '27.933')]
3339
+ [2024-09-06 09:25:08,551][01070] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3467.9). Total num frames: 8028160. Throughput: 0: 905.1. Samples: 130864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3340
+ [2024-09-06 09:25:08,557][01070] Avg episode reward: [(0, '27.199')]
3341
+ [2024-09-06 09:25:11,902][31334] Updated weights for policy 0, policy_version 1963 (0.0032)
3342
+ [2024-09-06 09:25:13,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3461.8). Total num frames: 8044544. Throughput: 0: 874.2. Samples: 133024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3343
+ [2024-09-06 09:25:13,552][01070] Avg episode reward: [(0, '26.753')]
3344
+ [2024-09-06 09:25:18,550][01070] Fps is (10 sec: 4096.7, 60 sec: 3754.7, 300 sec: 3507.2). Total num frames: 8069120. Throughput: 0: 914.2. Samples: 139886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3345
+ [2024-09-06 09:25:18,556][01070] Avg episode reward: [(0, '22.844')]
3346
+ [2024-09-06 09:25:20,756][31334] Updated weights for policy 0, policy_version 1973 (0.0025)
3347
+ [2024-09-06 09:25:23,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3525.0). Total num frames: 8089600. Throughput: 0: 923.9. Samples: 146086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
3348
+ [2024-09-06 09:25:23,552][01070] Avg episode reward: [(0, '23.151')]
3349
+ [2024-09-06 09:25:28,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3493.6). Total num frames: 8101888. Throughput: 0: 891.0. Samples: 148150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3350
+ [2024-09-06 09:25:28,555][01070] Avg episode reward: [(0, '22.191')]
3351
+ [2024-09-06 09:25:32,254][31334] Updated weights for policy 0, policy_version 1983 (0.0018)
3352
+ [2024-09-06 09:25:33,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3534.3). Total num frames: 8126464. Throughput: 0: 892.6. Samples: 154226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3353
+ [2024-09-06 09:25:33,553][01070] Avg episode reward: [(0, '21.268')]
3354
+ [2024-09-06 09:25:38,550][01070] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 3572.6). Total num frames: 8151040. Throughput: 0: 966.9. Samples: 161348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
3355
+ [2024-09-06 09:25:38,553][01070] Avg episode reward: [(0, '23.448')]
3356
+ [2024-09-06 09:25:42,015][31334] Updated weights for policy 0, policy_version 1993 (0.0023)
3357
+ [2024-09-06 09:25:43,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3542.5). Total num frames: 8163328. Throughput: 0: 973.9. Samples: 163716. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
3358
+ [2024-09-06 09:25:43,566][01070] Avg episode reward: [(0, '23.931')]
3359
+ [2024-09-06 09:25:48,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3557.1). Total num frames: 8183808. Throughput: 0: 950.0. Samples: 168532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3360
+ [2024-09-06 09:25:48,554][01070] Avg episode reward: [(0, '24.324')]
3361
+ [2024-09-06 09:25:52,316][31334] Updated weights for policy 0, policy_version 2003 (0.0041)
3362
+ [2024-09-06 09:25:53,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3591.9). Total num frames: 8208384. Throughput: 0: 994.5. Samples: 175614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3363
+ [2024-09-06 09:25:53,557][01070] Avg episode reward: [(0, '24.519')]
3364
+ [2024-09-06 09:25:58,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3604.5). Total num frames: 8228864. Throughput: 0: 1026.0. Samples: 179196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3365
+ [2024-09-06 09:25:58,554][01070] Avg episode reward: [(0, '26.480')]
3366
+ [2024-09-06 09:26:03,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3576.5). Total num frames: 8241152. Throughput: 0: 967.9. Samples: 183442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3367
+ [2024-09-06 09:26:03,557][01070] Avg episode reward: [(0, '27.819')]
3368
+ [2024-09-06 09:26:03,999][31334] Updated weights for policy 0, policy_version 2013 (0.0022)
3369
+ [2024-09-06 09:26:08,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3608.4). Total num frames: 8265728. Throughput: 0: 975.7. Samples: 189992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3370
+ [2024-09-06 09:26:08,553][01070] Avg episode reward: [(0, '26.706')]
3371
+ [2024-09-06 09:26:12,456][31334] Updated weights for policy 0, policy_version 2023 (0.0028)
3372
+ [2024-09-06 09:26:13,550][01070] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3638.8). Total num frames: 8290304. Throughput: 0: 1009.5. Samples: 193576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3373
+ [2024-09-06 09:26:13,552][01070] Avg episode reward: [(0, '27.588')]
3374
+ [2024-09-06 09:26:18,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3611.9). Total num frames: 8302592. Throughput: 0: 990.3. Samples: 198790. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
3375
+ [2024-09-06 09:26:18,552][01070] Avg episode reward: [(0, '27.201')]
3376
+ [2024-09-06 09:26:18,571][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002027_8302592.pth...
3377
+ [2024-09-06 09:26:18,738][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001833_7507968.pth
3378
+ [2024-09-06 09:26:23,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3622.7). Total num frames: 8323072. Throughput: 0: 964.0. Samples: 204726. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
3379
+ [2024-09-06 09:26:23,552][01070] Avg episode reward: [(0, '28.094')]
3380
+ [2024-09-06 09:26:23,905][31334] Updated weights for policy 0, policy_version 2033 (0.0024)
3381
+ [2024-09-06 09:26:28,550][01070] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3650.8). Total num frames: 8347648. Throughput: 0: 990.6. Samples: 208294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3382
+ [2024-09-06 09:26:28,553][01070] Avg episode reward: [(0, '27.199')]
3383
+ [2024-09-06 09:26:33,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3642.8). Total num frames: 8364032. Throughput: 0: 1014.7. Samples: 214192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3384
+ [2024-09-06 09:26:33,555][01070] Avg episode reward: [(0, '26.943')]
3385
+ [2024-09-06 09:26:34,200][31334] Updated weights for policy 0, policy_version 2043 (0.0015)
3386
+ [2024-09-06 09:26:38,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3635.2). Total num frames: 8380416. Throughput: 0: 965.6. Samples: 219068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3387
+ [2024-09-06 09:26:38,555][01070] Avg episode reward: [(0, '25.237')]
3388
+ [2024-09-06 09:26:43,550][01070] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3661.3). Total num frames: 8404992. Throughput: 0: 965.8. Samples: 222658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3389
+ [2024-09-06 09:26:43,557][01070] Avg episode reward: [(0, '26.134')]
3390
+ [2024-09-06 09:26:43,951][31334] Updated weights for policy 0, policy_version 2053 (0.0019)
3391
+ [2024-09-06 09:26:48,554][01070] Fps is (10 sec: 4503.9, 60 sec: 4027.5, 300 sec: 3670.0). Total num frames: 8425472. Throughput: 0: 1024.3. Samples: 229538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3392
+ [2024-09-06 09:26:48,558][01070] Avg episode reward: [(0, '26.531')]
3393
+ [2024-09-06 09:26:53,551][01070] Fps is (10 sec: 3685.9, 60 sec: 3891.1, 300 sec: 3662.3). Total num frames: 8441856. Throughput: 0: 973.3. Samples: 233794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3394
+ [2024-09-06 09:26:53,557][01070] Avg episode reward: [(0, '26.496')]
3395
+ [2024-09-06 09:26:55,624][31334] Updated weights for policy 0, policy_version 2063 (0.0027)
3396
+ [2024-09-06 09:26:58,550][01070] Fps is (10 sec: 3687.8, 60 sec: 3891.2, 300 sec: 3670.6). Total num frames: 8462336. Throughput: 0: 961.7. Samples: 236854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3397
+ [2024-09-06 09:26:58,558][01070] Avg episode reward: [(0, '26.453')]
3398
+ [2024-09-06 09:27:03,550][01070] Fps is (10 sec: 4506.3, 60 sec: 4096.0, 300 sec: 3694.1). Total num frames: 8486912. Throughput: 0: 1003.4. Samples: 243944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3399
+ [2024-09-06 09:27:03,558][01070] Avg episode reward: [(0, '27.902')]
3400
+ [2024-09-06 09:27:04,386][31334] Updated weights for policy 0, policy_version 2073 (0.0021)
3401
+ [2024-09-06 09:27:08,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3671.2). Total num frames: 8499200. Throughput: 0: 988.0. Samples: 249184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3402
+ [2024-09-06 09:27:08,552][01070] Avg episode reward: [(0, '29.399')]
3403
+ [2024-09-06 09:27:13,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3679.0). Total num frames: 8519680. Throughput: 0: 957.6. Samples: 251386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3404
+ [2024-09-06 09:27:13,552][01070] Avg episode reward: [(0, '30.254')]
3405
+ [2024-09-06 09:27:15,982][31334] Updated weights for policy 0, policy_version 2083 (0.0032)
3406
+ [2024-09-06 09:27:18,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3701.0). Total num frames: 8544256. Throughput: 0: 977.0. Samples: 258158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3407
+ [2024-09-06 09:27:18,552][01070] Avg episode reward: [(0, '30.973')]
3408
+ [2024-09-06 09:27:23,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3708.0). Total num frames: 8564736. Throughput: 0: 1010.3. Samples: 264532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3409
+ [2024-09-06 09:27:23,555][01070] Avg episode reward: [(0, '31.711')]
3410
+ [2024-09-06 09:27:26,470][31334] Updated weights for policy 0, policy_version 2093 (0.0036)
3411
+ [2024-09-06 09:27:28,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3686.4). Total num frames: 8577024. Throughput: 0: 976.8. Samples: 266614. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3412
+ [2024-09-06 09:27:28,552][01070] Avg episode reward: [(0, '32.002')]
3413
+ [2024-09-06 09:27:33,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3693.3). Total num frames: 8597504. Throughput: 0: 951.4. Samples: 272346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3414
+ [2024-09-06 09:27:33,557][01070] Avg episode reward: [(0, '32.432')]
3415
+ [2024-09-06 09:27:36,269][31334] Updated weights for policy 0, policy_version 2103 (0.0040)
3416
+ [2024-09-06 09:27:38,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3776.7). Total num frames: 8622080. Throughput: 0: 1013.1. Samples: 279382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3417
+ [2024-09-06 09:27:38,556][01070] Avg episode reward: [(0, '30.337')]
3418
+ [2024-09-06 09:27:43,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.3). Total num frames: 8638464. Throughput: 0: 1001.7. Samples: 281930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3419
+ [2024-09-06 09:27:43,552][01070] Avg episode reward: [(0, '30.111')]
3420
+ [2024-09-06 09:27:48,104][31334] Updated weights for policy 0, policy_version 2113 (0.0024)
3421
+ [2024-09-06 09:27:48,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3873.8). Total num frames: 8654848. Throughput: 0: 943.9. Samples: 286418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3422
+ [2024-09-06 09:27:48,552][01070] Avg episode reward: [(0, '28.878')]
3423
+ [2024-09-06 09:27:53,550][01070] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 8679424. Throughput: 0: 983.7. Samples: 293450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3424
+ [2024-09-06 09:27:53,556][01070] Avg episode reward: [(0, '28.257')]
3425
+ [2024-09-06 09:27:56,774][31334] Updated weights for policy 0, policy_version 2123 (0.0031)
3426
+ [2024-09-06 09:27:58,552][01070] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3887.7). Total num frames: 8699904. Throughput: 0: 1012.7. Samples: 296960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3427
+ [2024-09-06 09:27:58,554][01070] Avg episode reward: [(0, '28.838')]
3428
+ [2024-09-06 09:28:03,550][01070] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 8712192. Throughput: 0: 961.9. Samples: 301442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3429
+ [2024-09-06 09:28:03,553][01070] Avg episode reward: [(0, '28.745')]
3430
+ [2024-09-06 09:28:08,412][31334] Updated weights for policy 0, policy_version 2133 (0.0013)
3431
+ [2024-09-06 09:28:08,550][01070] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 8736768. Throughput: 0: 957.9. Samples: 307636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
3432
+ [2024-09-06 09:28:08,552][01070] Avg episode reward: [(0, '29.904')]
3433
+ [2024-09-06 09:28:13,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 8757248. Throughput: 0: 989.8. Samples: 311156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3434
+ [2024-09-06 09:28:13,552][01070] Avg episode reward: [(0, '29.493')]
3435
+ [2024-09-06 09:28:18,552][01070] Fps is (10 sec: 3685.4, 60 sec: 3822.8, 300 sec: 3887.7). Total num frames: 8773632. Throughput: 0: 984.0. Samples: 316628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3436
+ [2024-09-06 09:28:18,556][01070] Avg episode reward: [(0, '27.888')]
3437
+ [2024-09-06 09:28:18,569][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002142_8773632.pth...
3438
+ [2024-09-06 09:28:18,718][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001915_7843840.pth
3439
+ [2024-09-06 09:28:19,217][31334] Updated weights for policy 0, policy_version 2143 (0.0036)
3440
+ [2024-09-06 09:28:23,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 8790016. Throughput: 0: 940.9. Samples: 321724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3441
+ [2024-09-06 09:28:23,552][01070] Avg episode reward: [(0, '28.035')]
3442
+ [2024-09-06 09:28:28,550][01070] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 8814592. Throughput: 0: 962.2. Samples: 325230. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3443
+ [2024-09-06 09:28:28,554][01070] Avg episode reward: [(0, '25.924')]
3444
+ [2024-09-06 09:28:28,713][31334] Updated weights for policy 0, policy_version 2153 (0.0035)
3445
+ [2024-09-06 09:28:33,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 8835072. Throughput: 0: 1014.2. Samples: 332058. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3446
+ [2024-09-06 09:28:33,559][01070] Avg episode reward: [(0, '26.391')]
3447
+ [2024-09-06 09:28:38,550][01070] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 8851456. Throughput: 0: 955.3. Samples: 336440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3448
+ [2024-09-06 09:28:38,554][01070] Avg episode reward: [(0, '26.390')]
3449
+ [2024-09-06 09:28:40,330][31334] Updated weights for policy 0, policy_version 2163 (0.0015)
3450
+ [2024-09-06 09:28:43,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 8871936. Throughput: 0: 948.3. Samples: 339630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3451
+ [2024-09-06 09:28:43,552][01070] Avg episode reward: [(0, '25.753')]
3452
+ [2024-09-06 09:28:48,550][01070] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3887.7). Total num frames: 8896512. Throughput: 0: 1000.0. Samples: 346442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3453
+ [2024-09-06 09:28:48,557][01070] Avg episode reward: [(0, '25.859')]
3454
+ [2024-09-06 09:28:49,169][31334] Updated weights for policy 0, policy_version 2173 (0.0034)
3455
+ [2024-09-06 09:28:53,552][01070] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3873.8). Total num frames: 8908800. Throughput: 0: 974.6. Samples: 351494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3456
+ [2024-09-06 09:28:53,558][01070] Avg episode reward: [(0, '25.517')]
3457
+ [2024-09-06 09:28:58,550][01070] Fps is (10 sec: 3276.7, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 8929280. Throughput: 0: 943.7. Samples: 353622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
3458
+ [2024-09-06 09:28:58,553][01070] Avg episode reward: [(0, '26.407')]
3459
+ [2024-09-06 09:29:00,954][31334] Updated weights for policy 0, policy_version 2183 (0.0026)
3460
+ [2024-09-06 09:29:03,555][01070] Fps is (10 sec: 4094.8, 60 sec: 3959.1, 300 sec: 3859.9). Total num frames: 8949760. Throughput: 0: 970.5. Samples: 360304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
3461
+ [2024-09-06 09:29:03,557][01070] Avg episode reward: [(0, '25.470')]
3462
+ [2024-09-06 09:29:08,550][01070] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 8962048. Throughput: 0: 944.0. Samples: 364206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3463
+ [2024-09-06 09:29:08,555][01070] Avg episode reward: [(0, '25.993')]
3464
+ [2024-09-06 09:29:13,555][01070] Fps is (10 sec: 2457.4, 60 sec: 3617.8, 300 sec: 3832.1). Total num frames: 8974336. Throughput: 0: 907.2. Samples: 366058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3465
+ [2024-09-06 09:29:13,562][01070] Avg episode reward: [(0, '25.573')]
3466
+ [2024-09-06 09:29:15,599][31334] Updated weights for policy 0, policy_version 2193 (0.0034)
3467
+ [2024-09-06 09:29:18,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3832.2). Total num frames: 8994816. Throughput: 0: 869.2. Samples: 371170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3468
+ [2024-09-06 09:29:18,557][01070] Avg episode reward: [(0, '25.499')]
3469
+ [2024-09-06 09:29:23,550][01070] Fps is (10 sec: 4508.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 9019392. Throughput: 0: 931.0. Samples: 378336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3470
+ [2024-09-06 09:29:23,558][01070] Avg episode reward: [(0, '25.004')]
3471
+ [2024-09-06 09:29:24,221][31334] Updated weights for policy 0, policy_version 2203 (0.0023)
3472
+ [2024-09-06 09:29:28,550][01070] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 9035776. Throughput: 0: 928.8. Samples: 381428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3473
+ [2024-09-06 09:29:28,558][01070] Avg episode reward: [(0, '25.638')]
3474
+ [2024-09-06 09:29:33,550][01070] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 9052160. Throughput: 0: 872.4. Samples: 385698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3475
+ [2024-09-06 09:29:33,556][01070] Avg episode reward: [(0, '25.467')]
3476
+ [2024-09-06 09:29:35,802][31334] Updated weights for policy 0, policy_version 2213 (0.0024)
3477
+ [2024-09-06 09:29:38,550][01070] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 9076736. Throughput: 0: 917.8. Samples: 392794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3478
+ [2024-09-06 09:29:38,552][01070] Avg episode reward: [(0, '26.654')]
3479
+ [2024-09-06 09:29:43,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 9097216. Throughput: 0: 949.8. Samples: 396364. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
3480
+ [2024-09-06 09:29:43,554][01070] Avg episode reward: [(0, '27.006')]
3481
+ [2024-09-06 09:29:45,512][31334] Updated weights for policy 0, policy_version 2223 (0.0020)
3482
+ [2024-09-06 09:29:48,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3860.0). Total num frames: 9113600. Throughput: 0: 906.2. Samples: 401080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3483
+ [2024-09-06 09:29:48,555][01070] Avg episode reward: [(0, '27.430')]
3484
+ [2024-09-06 09:29:53,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3873.8). Total num frames: 9134080. Throughput: 0: 950.4. Samples: 406972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3485
+ [2024-09-06 09:29:53,552][01070] Avg episode reward: [(0, '28.731')]
3486
+ [2024-09-06 09:29:55,896][31334] Updated weights for policy 0, policy_version 2233 (0.0016)
3487
+ [2024-09-06 09:29:58,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 9158656. Throughput: 0: 988.4. Samples: 410532. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
3488
+ [2024-09-06 09:29:58,552][01070] Avg episode reward: [(0, '27.817')]
3489
+ [2024-09-06 09:30:03,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3873.9). Total num frames: 9170944. Throughput: 0: 1005.6. Samples: 416422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
3490
+ [2024-09-06 09:30:03,554][01070] Avg episode reward: [(0, '28.267')]
3491
+ [2024-09-06 09:30:07,671][31334] Updated weights for policy 0, policy_version 2243 (0.0043)
3492
+ [2024-09-06 09:30:08,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 9191424. Throughput: 0: 953.2. Samples: 421228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3493
+ [2024-09-06 09:30:08,556][01070] Avg episode reward: [(0, '26.904')]
3494
+ [2024-09-06 09:30:13,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3873.8). Total num frames: 9211904. Throughput: 0: 962.7. Samples: 424748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3495
+ [2024-09-06 09:30:13,556][01070] Avg episode reward: [(0, '28.356')]
3496
+ [2024-09-06 09:30:16,331][31334] Updated weights for policy 0, policy_version 2253 (0.0013)
3497
+ [2024-09-06 09:30:18,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9232384. Throughput: 0: 1024.3. Samples: 431792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3498
+ [2024-09-06 09:30:18,552][01070] Avg episode reward: [(0, '27.318')]
3499
+ [2024-09-06 09:30:18,626][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002255_9236480.pth...
3500
+ [2024-09-06 09:30:18,780][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002027_8302592.pth
3501
+ [2024-09-06 09:30:23,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3887.7). Total num frames: 9248768. Throughput: 0: 960.3. Samples: 436008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3502
+ [2024-09-06 09:30:23,554][01070] Avg episode reward: [(0, '27.558')]
3503
+ [2024-09-06 09:30:28,105][31334] Updated weights for policy 0, policy_version 2263 (0.0049)
3504
+ [2024-09-06 09:30:28,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9269248. Throughput: 0: 942.6. Samples: 438782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
3505
+ [2024-09-06 09:30:28,558][01070] Avg episode reward: [(0, '27.140')]
3506
+ [2024-09-06 09:30:33,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 9293824. Throughput: 0: 998.1. Samples: 445996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3507
+ [2024-09-06 09:30:33,552][01070] Avg episode reward: [(0, '27.243')]
3508
+ [2024-09-06 09:30:37,903][31334] Updated weights for policy 0, policy_version 2273 (0.0040)
3509
+ [2024-09-06 09:30:38,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 9310208. Throughput: 0: 988.6. Samples: 451458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
3510
+ [2024-09-06 09:30:38,555][01070] Avg episode reward: [(0, '25.930')]
3511
+ [2024-09-06 09:30:43,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 9326592. Throughput: 0: 957.8. Samples: 453632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
3512
+ [2024-09-06 09:30:43,557][01070] Avg episode reward: [(0, '25.857')]
3513
+ [2024-09-06 09:30:48,412][31334] Updated weights for policy 0, policy_version 2283 (0.0029)
3514
+ [2024-09-06 09:30:48,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9351168. Throughput: 0: 974.4. Samples: 460270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
3515
+ [2024-09-06 09:30:48,553][01070] Avg episode reward: [(0, '26.119')]
3516
+ [2024-09-06 09:30:53,551][01070] Fps is (10 sec: 4504.8, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 9371648. Throughput: 0: 1009.6. Samples: 466664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2024-09-06 09:30:53,554][01070] Avg episode reward: [(0, '26.189')]
+ [2024-09-06 09:30:58,553][01070] Fps is (10 sec: 3275.6, 60 sec: 3754.4, 300 sec: 3873.8). Total num frames: 9383936. Throughput: 0: 979.9. Samples: 468846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-09-06 09:30:58,556][01070] Avg episode reward: [(0, '26.161')]
+ [2024-09-06 09:31:00,090][31334] Updated weights for policy 0, policy_version 2293 (0.0017)
+ [2024-09-06 09:31:03,550][01070] Fps is (10 sec: 3687.0, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9408512. Throughput: 0: 946.0. Samples: 474360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:31:03,554][01070] Avg episode reward: [(0, '26.578')]
+ [2024-09-06 09:31:08,551][01070] Fps is (10 sec: 4506.5, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 9428992. Throughput: 0: 1008.8. Samples: 481404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-09-06 09:31:08,557][01070] Avg episode reward: [(0, '27.973')]
+ [2024-09-06 09:31:08,806][31334] Updated weights for policy 0, policy_version 2303 (0.0017)
+ [2024-09-06 09:31:13,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9445376. Throughput: 0: 1010.6. Samples: 484258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-09-06 09:31:13,558][01070] Avg episode reward: [(0, '27.929')]
+ [2024-09-06 09:31:18,550][01070] Fps is (10 sec: 3277.3, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9461760. Throughput: 0: 946.3. Samples: 488582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-09-06 09:31:18,553][01070] Avg episode reward: [(0, '29.506')]
+ [2024-09-06 09:31:20,377][31334] Updated weights for policy 0, policy_version 2313 (0.0028)
+ [2024-09-06 09:31:23,550][01070] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 9486336. Throughput: 0: 979.2. Samples: 495520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-09-06 09:31:23,557][01070] Avg episode reward: [(0, '28.909')]
+ [2024-09-06 09:31:28,550][01070] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9506816. Throughput: 0: 1009.9. Samples: 499078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-09-06 09:31:28,556][01070] Avg episode reward: [(0, '29.850')]
+ [2024-09-06 09:31:30,293][31334] Updated weights for policy 0, policy_version 2323 (0.0023)
+ [2024-09-06 09:31:33,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 9523200. Throughput: 0: 967.2. Samples: 503792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2024-09-06 09:31:33,556][01070] Avg episode reward: [(0, '29.280')]
+ [2024-09-06 09:31:38,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 9543680. Throughput: 0: 958.8. Samples: 509810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-09-06 09:31:38,553][01070] Avg episode reward: [(0, '27.557')]
+ [2024-09-06 09:31:40,673][31334] Updated weights for policy 0, policy_version 2333 (0.0042)
+ [2024-09-06 09:31:43,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3873.9). Total num frames: 9568256. Throughput: 0: 989.4. Samples: 513364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:31:43,556][01070] Avg episode reward: [(0, '27.125')]
+ [2024-09-06 09:31:48,551][01070] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 9584640. Throughput: 0: 995.3. Samples: 519150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2024-09-06 09:31:48,553][01070] Avg episode reward: [(0, '27.322')]
+ [2024-09-06 09:31:52,363][31334] Updated weights for policy 0, policy_version 2343 (0.0016)
+ [2024-09-06 09:31:53,550][01070] Fps is (10 sec: 3276.7, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 9601024. Throughput: 0: 944.7. Samples: 523914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-09-06 09:31:53,552][01070] Avg episode reward: [(0, '27.500')]
+ [2024-09-06 09:31:58,551][01070] Fps is (10 sec: 4095.9, 60 sec: 4027.9, 300 sec: 3859.9). Total num frames: 9625600. Throughput: 0: 961.7. Samples: 527534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-09-06 09:31:58,553][01070] Avg episode reward: [(0, '29.126')]
+ [2024-09-06 09:32:00,889][31334] Updated weights for policy 0, policy_version 2353 (0.0022)
+ [2024-09-06 09:32:03,550][01070] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 9646080. Throughput: 0: 1019.8. Samples: 534472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-09-06 09:32:03,558][01070] Avg episode reward: [(0, '29.982')]
+ [2024-09-06 09:32:08,550][01070] Fps is (10 sec: 3277.3, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 9658368. Throughput: 0: 960.1. Samples: 538726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+ [2024-09-06 09:32:08,556][01070] Avg episode reward: [(0, '29.952')]
+ [2024-09-06 09:32:12,587][31334] Updated weights for policy 0, policy_version 2363 (0.0031)
+ [2024-09-06 09:32:13,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 9682944. Throughput: 0: 949.4. Samples: 541802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-09-06 09:32:13,552][01070] Avg episode reward: [(0, '28.555')]
+ [2024-09-06 09:32:18,550][01070] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 9703424. Throughput: 0: 1001.6. Samples: 548864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:32:18,556][01070] Avg episode reward: [(0, '25.628')]
+ [2024-09-06 09:32:18,570][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002369_9703424.pth...
+ [2024-09-06 09:32:18,702][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002142_8773632.pth
+ [2024-09-06 09:32:22,792][31334] Updated weights for policy 0, policy_version 2373 (0.0016)
+ [2024-09-06 09:32:23,550][01070] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 9719808. Throughput: 0: 977.9. Samples: 553814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2024-09-06 09:32:23,555][01070] Avg episode reward: [(0, '22.375')]
+ [2024-09-06 09:32:28,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 9736192. Throughput: 0: 948.0. Samples: 556022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-09-06 09:32:28,553][01070] Avg episode reward: [(0, '20.360')]
+ [2024-09-06 09:32:32,965][31334] Updated weights for policy 0, policy_version 2383 (0.0025)
+ [2024-09-06 09:32:33,550][01070] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 9760768. Throughput: 0: 973.8. Samples: 562968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2024-09-06 09:32:33,555][01070] Avg episode reward: [(0, '20.884')]
+ [2024-09-06 09:32:38,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 9781248. Throughput: 0: 1009.0. Samples: 569320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:32:38,552][01070] Avg episode reward: [(0, '22.478')]
+ [2024-09-06 09:32:43,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 9797632. Throughput: 0: 975.1. Samples: 571410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:32:43,556][01070] Avg episode reward: [(0, '24.007')]
+ [2024-09-06 09:32:44,350][31334] Updated weights for policy 0, policy_version 2393 (0.0025)
+ [2024-09-06 09:32:48,550][01070] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 9818112. Throughput: 0: 954.8. Samples: 577440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:32:48,555][01070] Avg episode reward: [(0, '24.930')]
+ [2024-09-06 09:32:53,552][01070] Fps is (10 sec: 3275.9, 60 sec: 3822.8, 300 sec: 3832.2). Total num frames: 9830400. Throughput: 0: 960.2. Samples: 581938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:32:53,557][01070] Avg episode reward: [(0, '25.976')]
+ [2024-09-06 09:32:57,156][31334] Updated weights for policy 0, policy_version 2403 (0.0037)
+ [2024-09-06 09:32:58,550][01070] Fps is (10 sec: 2457.6, 60 sec: 3618.2, 300 sec: 3832.2). Total num frames: 9842688. Throughput: 0: 929.7. Samples: 583638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:32:58,555][01070] Avg episode reward: [(0, '26.309')]
+ [2024-09-06 09:33:03,550][01070] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 9863168. Throughput: 0: 873.8. Samples: 588184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-09-06 09:33:03,552][01070] Avg episode reward: [(0, '28.969')]
+ [2024-09-06 09:33:07,715][31334] Updated weights for policy 0, policy_version 2413 (0.0028)
+ [2024-09-06 09:33:08,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 9887744. Throughput: 0: 922.7. Samples: 595336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:33:08,557][01070] Avg episode reward: [(0, '29.922')]
+ [2024-09-06 09:33:13,550][01070] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 9908224. Throughput: 0: 951.1. Samples: 598822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2024-09-06 09:33:13,556][01070] Avg episode reward: [(0, '30.424')]
+ [2024-09-06 09:33:18,550][01070] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3832.2). Total num frames: 9920512. Throughput: 0: 895.7. Samples: 603274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2024-09-06 09:33:18,552][01070] Avg episode reward: [(0, '29.949')]
+ [2024-09-06 09:33:18,995][31334] Updated weights for policy 0, policy_version 2423 (0.0013)
+ [2024-09-06 09:33:23,550][01070] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 9945088. Throughput: 0: 893.8. Samples: 609540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+ [2024-09-06 09:33:23,552][01070] Avg episode reward: [(0, '32.278')]
+ [2024-09-06 09:33:27,758][31334] Updated weights for policy 0, policy_version 2433 (0.0015)
+ [2024-09-06 09:33:28,550][01070] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 9965568. Throughput: 0: 926.0. Samples: 613080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+ [2024-09-06 09:33:28,557][01070] Avg episode reward: [(0, '32.614')]
+ [2024-09-06 09:33:33,550][01070] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 9981952. Throughput: 0: 917.8. Samples: 618742. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+ [2024-09-06 09:33:33,553][01070] Avg episode reward: [(0, '30.653')]
+ [2024-09-06 09:33:38,550][01070] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3818.3). Total num frames: 9998336. Throughput: 0: 930.7. Samples: 623816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+ [2024-09-06 09:33:38,554][01070] Avg episode reward: [(0, '29.532')]
+ [2024-09-06 09:33:39,515][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+ [2024-09-06 09:33:39,516][31321] Stopping Batcher_0...
+ [2024-09-06 09:33:39,524][31321] Loop batcher_evt_loop terminating...
+ [2024-09-06 09:33:39,530][01070] Component Batcher_0 stopped!
+ [2024-09-06 09:33:39,548][31334] Updated weights for policy 0, policy_version 2443 (0.0018)
+ [2024-09-06 09:33:39,594][31334] Weights refcount: 2 0
+ [2024-09-06 09:33:39,598][31334] Stopping InferenceWorker_p0-w0...
+ [2024-09-06 09:33:39,599][31334] Loop inference_proc0-0_evt_loop terminating...
+ [2024-09-06 09:33:39,599][01070] Component InferenceWorker_p0-w0 stopped!
+ [2024-09-06 09:33:39,677][31321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002255_9236480.pth
+ [2024-09-06 09:33:39,692][31321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+ [2024-09-06 09:33:39,880][31321] Stopping LearnerWorker_p0...
+ [2024-09-06 09:33:39,884][31321] Loop learner_proc0_evt_loop terminating...
+ [2024-09-06 09:33:39,881][01070] Component LearnerWorker_p0 stopped!
+ [2024-09-06 09:33:39,915][01070] Component RolloutWorker_w1 stopped!
+ [2024-09-06 09:33:39,920][31336] Stopping RolloutWorker_w1...
+ [2024-09-06 09:33:39,929][01070] Component RolloutWorker_w3 stopped!
+ [2024-09-06 09:33:39,933][31338] Stopping RolloutWorker_w3...
+ [2024-09-06 09:33:39,934][31336] Loop rollout_proc1_evt_loop terminating...
+ [2024-09-06 09:33:39,937][01070] Component RolloutWorker_w5 stopped!
+ [2024-09-06 09:33:39,941][31340] Stopping RolloutWorker_w5...
+ [2024-09-06 09:33:39,942][31340] Loop rollout_proc5_evt_loop terminating...
+ [2024-09-06 09:33:39,934][31338] Loop rollout_proc3_evt_loop terminating...
+ [2024-09-06 09:33:39,963][01070] Component RolloutWorker_w7 stopped!
+ [2024-09-06 09:33:39,967][31341] Stopping RolloutWorker_w7...
+ [2024-09-06 09:33:39,972][31341] Loop rollout_proc7_evt_loop terminating...
+ [2024-09-06 09:33:39,988][31337] Stopping RolloutWorker_w2...
+ [2024-09-06 09:33:39,988][01070] Component RolloutWorker_w2 stopped!
+ [2024-09-06 09:33:39,989][31337] Loop rollout_proc2_evt_loop terminating...
+ [2024-09-06 09:33:39,998][31335] Stopping RolloutWorker_w0...
+ [2024-09-06 09:33:39,998][01070] Component RolloutWorker_w0 stopped!
+ [2024-09-06 09:33:40,002][31335] Loop rollout_proc0_evt_loop terminating...
+ [2024-09-06 09:33:40,068][31339] Stopping RolloutWorker_w4...
+ [2024-09-06 09:33:40,068][01070] Component RolloutWorker_w4 stopped!
+ [2024-09-06 09:33:40,069][31339] Loop rollout_proc4_evt_loop terminating...
+ [2024-09-06 09:33:40,097][31342] Stopping RolloutWorker_w6...
+ [2024-09-06 09:33:40,097][01070] Component RolloutWorker_w6 stopped!
+ [2024-09-06 09:33:40,102][01070] Waiting for process learner_proc0 to stop...
+ [2024-09-06 09:33:40,098][31342] Loop rollout_proc6_evt_loop terminating...
+ [2024-09-06 09:33:41,279][01070] Waiting for process inference_proc0-0 to join...
+ [2024-09-06 09:33:41,286][01070] Waiting for process rollout_proc0 to join...
+ [2024-09-06 09:33:43,440][01070] Waiting for process rollout_proc1 to join...
+ [2024-09-06 09:33:43,450][01070] Waiting for process rollout_proc2 to join...
+ [2024-09-06 09:33:43,456][01070] Waiting for process rollout_proc3 to join...
+ [2024-09-06 09:33:43,459][01070] Waiting for process rollout_proc4 to join...
+ [2024-09-06 09:33:43,464][01070] Waiting for process rollout_proc5 to join...
+ [2024-09-06 09:33:43,468][01070] Waiting for process rollout_proc6 to join...
+ [2024-09-06 09:33:43,473][01070] Waiting for process rollout_proc7 to join...
+ [2024-09-06 09:33:43,477][01070] Batcher 0 profile tree view:
+ batching: 17.3741, releasing_batches: 0.0206
+ [2024-09-06 09:33:43,479][01070] InferenceWorker_p0-w0 profile tree view:
+ wait_policy: 0.0001
+ wait_policy_total: 250.6480
+ update_model: 5.6099
+ weight_update: 0.0018
+ one_step: 0.0090
+ handle_policy_step: 377.0589
+ deserialize: 9.3343, stack: 2.0445, obs_to_device_normalize: 77.2195, forward: 198.9774, send_messages: 18.2353
+ prepare_outputs: 52.7164
+ to_cpu: 30.3890
+ [2024-09-06 09:33:43,480][01070] Learner 0 profile tree view:
+ misc: 0.0038, prepare_batch: 9.0030
+ train: 48.0575
+ epoch_init: 0.0094, minibatch_init: 0.0154, losses_postprocess: 0.4238, kl_divergence: 0.4480, after_optimizer: 1.8224
+ calculate_losses: 17.3374
+ losses_init: 0.0049, forward_head: 1.0561, bptt_initial: 11.6751, tail: 0.7162, advantages_returns: 0.1663, losses: 2.2672
+ bptt: 1.2517
+ bptt_forward_core: 1.1681
+ update: 27.5809
+ clip: 0.5703
+ [2024-09-06 09:33:43,483][01070] RolloutWorker_w0 profile tree view:
+ wait_for_trajectories: 0.2106, enqueue_policy_requests: 61.5026, env_step: 506.3878, overhead: 8.0540, complete_rollouts: 5.0957
+ save_policy_outputs: 12.4197
+ split_output_tensors: 4.9996
+ [2024-09-06 09:33:43,485][01070] RolloutWorker_w7 profile tree view:
+ wait_for_trajectories: 0.1713, enqueue_policy_requests: 59.7714, env_step: 514.1502, overhead: 8.4068, complete_rollouts: 4.1925
+ save_policy_outputs: 12.9198
+ split_output_tensors: 5.3484
+ [2024-09-06 09:33:43,487][01070] Loop Runner_EvtLoop terminating...
+ [2024-09-06 09:33:43,488][01070] Runner profile tree view:
+ main_loop: 682.3462
+ [2024-09-06 09:33:43,490][01070] Collected {0: 10006528}, FPS: 3661.7
+ [2024-09-06 09:33:49,612][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+ [2024-09-06 09:33:49,613][01070] Overriding arg 'num_workers' with value 1 passed from command line
+ [2024-09-06 09:33:49,615][01070] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2024-09-06 09:33:49,616][01070] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2024-09-06 09:33:49,617][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2024-09-06 09:33:49,618][01070] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2024-09-06 09:33:49,619][01070] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+ [2024-09-06 09:33:49,621][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+ [2024-09-06 09:33:49,622][01070] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+ [2024-09-06 09:33:49,623][01070] Adding new argument 'hf_repository'=None that is not in the saved config file!
+ [2024-09-06 09:33:49,624][01070] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2024-09-06 09:33:49,625][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2024-09-06 09:33:49,626][01070] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2024-09-06 09:33:49,627][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2024-09-06 09:33:49,628][01070] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2024-09-06 09:33:49,660][01070] RunningMeanStd input shape: (3, 72, 128)
+ [2024-09-06 09:33:49,661][01070] RunningMeanStd input shape: (1,)
+ [2024-09-06 09:33:49,675][01070] ConvEncoder: input_channels=3
+ [2024-09-06 09:33:49,713][01070] Conv encoder output size: 512
+ [2024-09-06 09:33:49,714][01070] Policy head output size: 512
+ [2024-09-06 09:33:49,733][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+ [2024-09-06 09:33:50,158][01070] Num frames 100...
+ [2024-09-06 09:33:50,294][01070] Num frames 200...
+ [2024-09-06 09:33:50,435][01070] Num frames 300...
+ [2024-09-06 09:33:50,580][01070] Num frames 400...
+ [2024-09-06 09:33:50,701][01070] Num frames 500...
+ [2024-09-06 09:33:50,828][01070] Num frames 600...
+ [2024-09-06 09:33:50,948][01070] Num frames 700...
+ [2024-09-06 09:33:51,068][01070] Num frames 800...
+ [2024-09-06 09:33:51,192][01070] Num frames 900...
+ [2024-09-06 09:33:51,316][01070] Num frames 1000...
+ [2024-09-06 09:33:51,448][01070] Num frames 1100...
+ [2024-09-06 09:33:51,584][01070] Num frames 1200...
+ [2024-09-06 09:33:51,708][01070] Num frames 1300...
+ [2024-09-06 09:33:51,831][01070] Num frames 1400...
+ [2024-09-06 09:33:51,964][01070] Num frames 1500...
+ [2024-09-06 09:33:52,096][01070] Num frames 1600...
+ [2024-09-06 09:33:52,228][01070] Num frames 1700...
+ [2024-09-06 09:33:52,381][01070] Avg episode rewards: #0: 42.729, true rewards: #0: 17.730
+ [2024-09-06 09:33:52,382][01070] Avg episode reward: 42.729, avg true_objective: 17.730
+ [2024-09-06 09:33:52,418][01070] Num frames 1800...
+ [2024-09-06 09:33:52,551][01070] Num frames 1900...
+ [2024-09-06 09:33:52,676][01070] Num frames 2000...
+ [2024-09-06 09:33:52,798][01070] Num frames 2100...
+ [2024-09-06 09:33:52,927][01070] Num frames 2200...
+ [2024-09-06 09:33:53,052][01070] Num frames 2300...
+ [2024-09-06 09:33:53,174][01070] Num frames 2400...
+ [2024-09-06 09:33:53,325][01070] Avg episode rewards: #0: 27.885, true rewards: #0: 12.385
+ [2024-09-06 09:33:53,326][01070] Avg episode reward: 27.885, avg true_objective: 12.385
+ [2024-09-06 09:33:53,356][01070] Num frames 2500...
+ [2024-09-06 09:33:53,476][01070] Num frames 2600...
+ [2024-09-06 09:33:53,599][01070] Num frames 2700...
+ [2024-09-06 09:33:53,724][01070] Num frames 2800...
+ [2024-09-06 09:33:53,844][01070] Num frames 2900...
+ [2024-09-06 09:33:53,933][01070] Avg episode rewards: #0: 20.416, true rewards: #0: 9.750
+ [2024-09-06 09:33:53,934][01070] Avg episode reward: 20.416, avg true_objective: 9.750
+ [2024-09-06 09:33:54,025][01070] Num frames 3000...
+ [2024-09-06 09:33:54,146][01070] Num frames 3100...
+ [2024-09-06 09:33:54,275][01070] Num frames 3200...
+ [2024-09-06 09:33:54,415][01070] Num frames 3300...
+ [2024-09-06 09:33:54,544][01070] Num frames 3400...
+ [2024-09-06 09:33:54,668][01070] Num frames 3500...
+ [2024-09-06 09:33:54,789][01070] Num frames 3600...
+ [2024-09-06 09:33:54,917][01070] Num frames 3700...
+ [2024-09-06 09:33:55,043][01070] Avg episode rewards: #0: 19.895, true rewards: #0: 9.395
+ [2024-09-06 09:33:55,044][01070] Avg episode reward: 19.895, avg true_objective: 9.395
+ [2024-09-06 09:33:55,099][01070] Num frames 3800...
+ [2024-09-06 09:33:55,230][01070] Num frames 3900...
+ [2024-09-06 09:33:55,361][01070] Num frames 4000...
+ [2024-09-06 09:33:55,490][01070] Num frames 4100...
+ [2024-09-06 09:33:55,610][01070] Num frames 4200...
+ [2024-09-06 09:33:55,712][01070] Avg episode rewards: #0: 17.476, true rewards: #0: 8.476
+ [2024-09-06 09:33:55,714][01070] Avg episode reward: 17.476, avg true_objective: 8.476
+ [2024-09-06 09:33:55,790][01070] Num frames 4300...
+ [2024-09-06 09:33:55,909][01070] Num frames 4400...
+ [2024-09-06 09:33:56,038][01070] Num frames 4500...
+ [2024-09-06 09:33:56,162][01070] Num frames 4600...
+ [2024-09-06 09:33:56,279][01070] Num frames 4700...
+ [2024-09-06 09:33:56,400][01070] Num frames 4800...
+ [2024-09-06 09:33:56,534][01070] Num frames 4900...
+ [2024-09-06 09:33:56,659][01070] Num frames 5000...
+ [2024-09-06 09:33:56,780][01070] Num frames 5100...
+ [2024-09-06 09:33:56,902][01070] Num frames 5200...
+ [2024-09-06 09:33:57,033][01070] Num frames 5300...
+ [2024-09-06 09:33:57,157][01070] Num frames 5400...
+ [2024-09-06 09:33:57,279][01070] Num frames 5500...
+ [2024-09-06 09:33:57,405][01070] Num frames 5600...
+ [2024-09-06 09:33:57,545][01070] Num frames 5700...
+ [2024-09-06 09:33:57,669][01070] Num frames 5800...
+ [2024-09-06 09:33:57,789][01070] Num frames 5900...
+ [2024-09-06 09:33:57,910][01070] Num frames 6000...
+ [2024-09-06 09:33:58,039][01070] Num frames 6100...
+ [2024-09-06 09:33:58,162][01070] Num frames 6200...
+ [2024-09-06 09:33:58,322][01070] Num frames 6300...
+ [2024-09-06 09:33:58,444][01070] Avg episode rewards: #0: 24.730, true rewards: #0: 10.563
+ [2024-09-06 09:33:58,446][01070] Avg episode reward: 24.730, avg true_objective: 10.563
+ [2024-09-06 09:33:58,553][01070] Num frames 6400...
+ [2024-09-06 09:33:58,719][01070] Num frames 6500...
+ [2024-09-06 09:33:58,886][01070] Num frames 6600...
+ [2024-09-06 09:33:59,057][01070] Num frames 6700...
+ [2024-09-06 09:33:59,222][01070] Num frames 6800...
+ [2024-09-06 09:33:59,382][01070] Num frames 6900...
+ [2024-09-06 09:33:59,564][01070] Num frames 7000...
+ [2024-09-06 09:33:59,740][01070] Num frames 7100...
+ [2024-09-06 09:33:59,914][01070] Num frames 7200...
+ [2024-09-06 09:34:00,087][01070] Num frames 7300...
+ [2024-09-06 09:34:00,258][01070] Num frames 7400...
+ [2024-09-06 09:34:00,429][01070] Num frames 7500...
+ [2024-09-06 09:34:00,610][01070] Num frames 7600...
+ [2024-09-06 09:34:00,786][01070] Num frames 7700...
+ [2024-09-06 09:34:00,911][01070] Num frames 7800...
+ [2024-09-06 09:34:01,031][01070] Num frames 7900...
+ [2024-09-06 09:34:01,159][01070] Num frames 8000...
+ [2024-09-06 09:34:01,279][01070] Num frames 8100...
+ [2024-09-06 09:34:01,402][01070] Num frames 8200...
+ [2024-09-06 09:34:01,537][01070] Num frames 8300...
+ [2024-09-06 09:34:01,664][01070] Num frames 8400...
+ [2024-09-06 09:34:01,766][01070] Avg episode rewards: #0: 29.768, true rewards: #0: 12.054
+ [2024-09-06 09:34:01,767][01070] Avg episode reward: 29.768, avg true_objective: 12.054
+ [2024-09-06 09:34:01,845][01070] Num frames 8500...
+ [2024-09-06 09:34:01,964][01070] Num frames 8600...
+ [2024-09-06 09:34:02,084][01070] Num frames 8700...
+ [2024-09-06 09:34:02,213][01070] Num frames 8800...
+ [2024-09-06 09:34:02,332][01070] Num frames 8900...
+ [2024-09-06 09:34:02,455][01070] Num frames 9000...
+ [2024-09-06 09:34:02,585][01070] Num frames 9100...
+ [2024-09-06 09:34:02,710][01070] Num frames 9200...
+ [2024-09-06 09:34:02,832][01070] Num frames 9300...
+ [2024-09-06 09:34:02,955][01070] Num frames 9400...
+ [2024-09-06 09:34:03,078][01070] Num frames 9500...
+ [2024-09-06 09:34:03,208][01070] Num frames 9600...
+ [2024-09-06 09:34:03,326][01070] Num frames 9700...
+ [2024-09-06 09:34:03,450][01070] Num frames 9800...
+ [2024-09-06 09:34:03,583][01070] Num frames 9900...
+ [2024-09-06 09:34:03,708][01070] Num frames 10000...
+ [2024-09-06 09:34:03,829][01070] Num frames 10100...
+ [2024-09-06 09:34:03,949][01070] Num frames 10200...
+ [2024-09-06 09:34:04,072][01070] Num frames 10300...
+ [2024-09-06 09:34:04,200][01070] Num frames 10400...
+ [2024-09-06 09:34:04,323][01070] Num frames 10500...
+ [2024-09-06 09:34:04,414][01070] Avg episode rewards: #0: 33.533, true rewards: #0: 13.159
+ [2024-09-06 09:34:04,416][01070] Avg episode reward: 33.533, avg true_objective: 13.159
+ [2024-09-06 09:34:04,519][01070] Num frames 10600...
+ [2024-09-06 09:34:04,640][01070] Num frames 10700...
+ [2024-09-06 09:34:04,761][01070] Num frames 10800...
+ [2024-09-06 09:34:04,884][01070] Num frames 10900...
+ [2024-09-06 09:34:05,003][01070] Num frames 11000...
+ [2024-09-06 09:34:05,126][01070] Num frames 11100...
+ [2024-09-06 09:34:05,263][01070] Num frames 11200...
+ [2024-09-06 09:34:05,384][01070] Num frames 11300...
+ [2024-09-06 09:34:05,512][01070] Num frames 11400...
+ [2024-09-06 09:34:05,637][01070] Num frames 11500...
+ [2024-09-06 09:34:05,761][01070] Num frames 11600...
+ [2024-09-06 09:34:05,884][01070] Num frames 11700...
+ [2024-09-06 09:34:06,007][01070] Num frames 11800...
+ [2024-09-06 09:34:06,131][01070] Num frames 11900...
+ [2024-09-06 09:34:06,262][01070] Num frames 12000...
+ [2024-09-06 09:34:06,384][01070] Num frames 12100...
+ [2024-09-06 09:34:06,477][01070] Avg episode rewards: #0: 34.477, true rewards: #0: 13.478
+ [2024-09-06 09:34:06,478][01070] Avg episode reward: 34.477, avg true_objective: 13.478
+ [2024-09-06 09:34:06,568][01070] Num frames 12200...
+ [2024-09-06 09:34:06,688][01070] Num frames 12300...
+ [2024-09-06 09:34:06,808][01070] Num frames 12400...
+ [2024-09-06 09:34:06,928][01070] Num frames 12500...
+ [2024-09-06 09:34:07,058][01070] Num frames 12600...
+ [2024-09-06 09:34:07,184][01070] Num frames 12700...
+ [2024-09-06 09:34:07,317][01070] Num frames 12800...
+ [2024-09-06 09:34:07,443][01070] Avg episode rewards: #0: 32.758, true rewards: #0: 12.858
+ [2024-09-06 09:34:07,444][01070] Avg episode reward: 32.758, avg true_objective: 12.858
+ [2024-09-06 09:35:25,862][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+ [2024-09-06 09:35:27,895][01070] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+ [2024-09-06 09:35:27,897][01070] Overriding arg 'num_workers' with value 1 passed from command line
+ [2024-09-06 09:35:27,899][01070] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2024-09-06 09:35:27,901][01070] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2024-09-06 09:35:27,903][01070] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2024-09-06 09:35:27,904][01070] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2024-09-06 09:35:27,906][01070] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+ [2024-09-06 09:35:27,908][01070] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+ [2024-09-06 09:35:27,909][01070] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+ [2024-09-06 09:35:27,910][01070] Adding new argument 'hf_repository'='Re-Re/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+ [2024-09-06 09:35:27,911][01070] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2024-09-06 09:35:27,912][01070] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2024-09-06 09:35:27,913][01070] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2024-09-06 09:35:27,914][01070] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2024-09-06 09:35:27,915][01070] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2024-09-06 09:35:27,944][01070] RunningMeanStd input shape: (3, 72, 128)
+ [2024-09-06 09:35:27,946][01070] RunningMeanStd input shape: (1,)
+ [2024-09-06 09:35:27,960][01070] ConvEncoder: input_channels=3
+ [2024-09-06 09:35:27,997][01070] Conv encoder output size: 512
+ [2024-09-06 09:35:27,998][01070] Policy head output size: 512
+ [2024-09-06 09:35:28,018][01070] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002443_10006528.pth...
+ [2024-09-06 09:35:28,432][01070] Num frames 100...
+ [2024-09-06 09:35:28,573][01070] Num frames 200...
+ [2024-09-06 09:35:28,697][01070] Num frames 300...
+ [2024-09-06 09:35:28,814][01070] Num frames 400...
+ [2024-09-06 09:35:28,932][01070] Num frames 500...
+ [2024-09-06 09:35:29,050][01070] Num frames 600...
+ [2024-09-06 09:35:29,170][01070] Num frames 700...
+ [2024-09-06 09:35:29,290][01070] Num frames 800...
+ [2024-09-06 09:35:29,422][01070] Avg episode rewards: #0: 17.640, true rewards: #0: 8.640
+ [2024-09-06 09:35:29,424][01070] Avg episode reward: 17.640, avg true_objective: 8.640
+ [2024-09-06 09:35:29,487][01070] Num frames 900...
+ [2024-09-06 09:35:29,613][01070] Num frames 1000...
+ [2024-09-06 09:35:29,733][01070] Num frames 1100...
+ [2024-09-06 09:35:29,853][01070] Num frames 1200...
+ [2024-09-06 09:35:29,974][01070] Num frames 1300...
+ [2024-09-06 09:35:30,108][01070] Num frames 1400...
+ [2024-09-06 09:35:30,275][01070] Num frames 1500...
+ [2024-09-06 09:35:30,436][01070] Num frames 1600...
+ [2024-09-06 09:35:30,626][01070] Num frames 1700...
+ [2024-09-06 09:35:30,791][01070] Num frames 1800...
+ [2024-09-06 09:35:30,958][01070] Num frames 1900...
+ [2024-09-06 09:35:31,122][01070] Num frames 2000...
+ [2024-09-06 09:35:31,289][01070] Num frames 2100...
+ [2024-09-06 09:35:31,465][01070] Num frames 2200...
+ [2024-09-06 09:35:31,646][01070] Num frames 2300...
+ [2024-09-06 09:35:31,818][01070] Num frames 2400...
+ [2024-09-06 09:35:32,050][01070] Avg episode rewards: #0: 28.480, true rewards: #0: 12.480
+ [2024-09-06 09:35:32,052][01070] Avg episode reward: 28.480, avg true_objective: 12.480
+ [2024-09-06 09:35:32,066][01070] Num frames 2500...
+ [2024-09-06 09:35:32,252][01070] Num frames 2600...
+ [2024-09-06 09:35:32,423][01070] Num frames 2700...
+ [2024-09-06 09:35:32,607][01070] Num frames 2800...
+ [2024-09-06 09:35:32,773][01070] Num frames 2900...
+ [2024-09-06 09:35:32,896][01070] Num frames 3000...
+ [2024-09-06 09:35:33,016][01070] Num frames 3100...
+ [2024-09-06 09:35:33,138][01070] Num frames 3200...
+ [2024-09-06 09:35:33,262][01070] Num frames 3300...
+ [2024-09-06 09:35:33,382][01070] Num frames 3400...
+ [2024-09-06 09:35:33,467][01070] Avg episode rewards: #0: 26.080, true rewards: #0: 11.413
+ [2024-09-06 09:35:33,469][01070] Avg episode reward: 26.080, avg true_objective: 11.413
+ [2024-09-06 09:35:33,567][01070] Num frames 3500...
+ [2024-09-06 09:35:33,697][01070] Num frames 3600...
+ [2024-09-06 09:35:33,817][01070] Num frames 3700...
+ [2024-09-06 09:35:33,963][01070] Avg episode rewards: #0: 21.190, true rewards: #0: 9.440
+ [2024-09-06 09:35:33,964][01070] Avg episode reward: 21.190, avg true_objective: 9.440
+ [2024-09-06 09:35:33,997][01070] Num frames 3800...
+ [2024-09-06 09:35:34,117][01070] Num frames 3900...
+ [2024-09-06 09:35:34,241][01070] Num frames 4000...
+ [2024-09-06 09:35:34,361][01070] Num frames 4100...
+ [2024-09-06 09:35:34,490][01070] Num frames 4200...
+ [2024-09-06 09:35:34,616][01070] Num frames 4300...
+ [2024-09-06 09:35:34,748][01070] Num frames 4400...
+ [2024-09-06 09:35:34,870][01070] Num frames 4500...
+ [2024-09-06 09:35:34,990][01070] Num frames 4600...
+ [2024-09-06 09:35:35,113][01070] Num frames 4700...
+ [2024-09-06 09:35:35,236][01070] Num frames 4800...
+ [2024-09-06 09:35:35,355][01070] Num frames 4900...
+ [2024-09-06 09:35:35,477][01070] Num frames 5000...
+ [2024-09-06 09:35:35,600][01070] Num frames 5100...
+ [2024-09-06 09:35:35,717][01070] Avg episode rewards: #0: 22.704, true rewards: #0: 10.304
+ [2024-09-06 09:35:35,719][01070] Avg episode reward: 22.704, avg true_objective: 10.304
+ [2024-09-06 09:35:35,781][01070] Num frames 5200...
+ [2024-09-06 09:35:35,899][01070] Num frames 5300...
+ [2024-09-06 09:35:36,021][01070] Num frames 5400...
+ [2024-09-06 09:35:36,145][01070] Num frames 5500...
+ [2024-09-06 09:35:36,267][01070] Num frames 5600...
+ [2024-09-06 09:35:36,386][01070] Num frames 5700...
+ [2024-09-06 09:35:36,515][01070] Num frames 5800...
+ [2024-09-06 09:35:36,637][01070] Num frames 5900...
+ [2024-09-06 09:35:36,764][01070] Num frames 6000...
+ [2024-09-06 09:35:36,889][01070] Num frames 6100...
+ [2024-09-06 09:35:37,009][01070] Num frames 6200...
+ [2024-09-06 09:35:37,131][01070] Num frames 6300...
+ [2024-09-06 09:35:37,191][01070] Avg episode rewards: #0: 23.507, true rewards: #0: 10.507
+ [2024-09-06 09:35:37,194][01070] Avg episode reward: 23.507, avg true_objective: 10.507
+ [2024-09-06 09:35:37,306][01070] Num frames 6400...
+ [2024-09-06 09:35:37,424][01070] Num frames 6500...
+ [2024-09-06 09:35:37,552][01070] Num frames 6600...
+ [2024-09-06 09:35:37,668][01070] Num frames 6700...
+ [2024-09-06 09:35:37,791][01070] Num frames 6800...
+ [2024-09-06 09:35:37,943][01070] Avg episode rewards: #0: 21.400, true rewards: #0: 9.829
+ [2024-09-06 09:35:37,945][01070] Avg episode reward: 21.400, avg true_objective: 9.829
+ [2024-09-06 09:35:37,971][01070] Num frames 6900...
+ [2024-09-06 09:35:38,088][01070] Num frames 7000...
+ [2024-09-06 09:35:38,212][01070] Num frames 7100...
+ [2024-09-06 09:35:38,333][01070] Num frames 7200...
+ [2024-09-06 09:35:38,453][01070] Num frames 7300...
+ [2024-09-06 09:35:38,583][01070] Num frames 7400...
+ [2024-09-06 09:35:38,700][01070] Num frames 7500...
+ [2024-09-06 09:35:38,825][01070] Num frames 7600...
+ [2024-09-06 09:35:38,943][01070] Num frames 7700...
+ [2024-09-06 09:35:39,063][01070] Num frames 7800...
+ [2024-09-06 09:35:39,184][01070] Num frames 7900...
+ [2024-09-06 09:35:39,306][01070] Num frames 8000...
+ [2024-09-06 09:35:39,424][01070] Num frames 8100...
+ [2024-09-06 09:35:39,552][01070] Num frames 8200...
+ [2024-09-06 09:35:39,648][01070] Avg episode rewards: #0: 23.416, true rewards: #0: 10.291
+ [2024-09-06 09:35:39,650][01070] Avg episode reward: 23.416, avg true_objective: 10.291
+ [2024-09-06 09:35:39,731][01070] Num frames 8300...
+ [2024-09-06 09:35:39,860][01070] Num frames 8400...
+ [2024-09-06 09:35:39,978][01070] Num frames 8500...
+ [2024-09-06 09:35:40,096][01070] Num frames 8600...
+ [2024-09-06 09:35:40,219][01070] Num frames 8700...
+ [2024-09-06 09:35:40,336][01070] Num frames 8800...
+ [2024-09-06 09:35:40,464][01070] Num frames 8900...
+ [2024-09-06 09:35:40,596][01070] Num frames 9000...
+ [2024-09-06 09:35:40,719][01070] Num frames 9100...
+ [2024-09-06 09:35:40,847][01070] Num frames 9200...
+ [2024-09-06 09:35:40,971][01070] Num frames 9300...
+ [2024-09-06 09:35:41,093][01070] Num frames 9400...
+ [2024-09-06 09:35:41,220][01070] Num frames 9500...
+ [2024-09-06 09:35:41,339][01070] Num frames 9600...
+ [2024-09-06 09:35:41,465][01070] Num frames 9700...
+ [2024-09-06 09:35:41,597][01070] Num frames 9800...
+ [2024-09-06 09:35:41,723][01070] Num frames 9900...
+ [2024-09-06 09:35:41,853][01070] Num frames 10000...
+ [2024-09-06 09:35:41,979][01070] Num frames 10100...
+ [2024-09-06 09:35:42,103][01070] Num frames 10200...
+ [2024-09-06 09:35:42,228][01070] Num frames 10300...
+ [2024-09-06 09:35:42,326][01070] Avg episode rewards: #0: 27.259, true rewards: #0: 11.481
+ [2024-09-06 09:35:42,329][01070] Avg episode reward: 27.259, avg true_objective: 11.481
+ [2024-09-06 09:35:42,414][01070] Num frames 10400...
+ [2024-09-06 09:35:42,552][01070] Num frames 10500...
+ [2024-09-06 09:35:42,675][01070] Num frames 10600...
+ [2024-09-06 09:35:42,830][01070] Num frames 10700...
+ [2024-09-06 09:35:43,000][01070] Num frames 10800...
+ [2024-09-06 09:35:43,164][01070] Num frames 10900...
+ [2024-09-06 09:35:43,294][01070] Avg episode rewards: #0: 25.741, true rewards: #0: 10.941
+ [2024-09-06 09:35:43,299][01070] Avg episode reward: 25.741, avg true_objective: 10.941
+ [2024-09-06 09:36:50,704][01070] Replay video saved to /content/train_dir/default_experiment/replay.mp4!