[2022-12-18 23:20:39,793] [WARNING] [runner.py:179:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. [2022-12-18 23:20:39,805] [INFO] [runner.py:508:main] cmd = /home/milan/hf_env/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 run_speech_recognition_seq2seq_streaming.py --deepspeed=ds_config.json --model_name_or_path=openai/whisper-large-v2 --dataset_name=mozilla-foundation/common_voice_11_0 --dataset_config_name=hu --language=hungarian --train_split_name=train+validation --eval_split_name=test --model_index_name=Whisper Large-v2 Hungarian CV11 --max_steps=5000 --output_dir=./ --per_device_train_batch_size=32 --per_device_eval_batch_size=8 --gradient_accumulation_steps=2 --logging_steps=25 --learning_rate=1e-5 --warmup_steps=500 --evaluation_strategy=steps --eval_steps=1000 --save_strategy=steps --save_steps=1000 --generation_max_length=225 --length_column_name=input_length --max_duration_in_seconds=30 --text_column_name=sentence --freeze_feature_encoder=False --report_to=tensorboard --metric_for_best_model=wer --greater_is_better=False --load_best_model_at_end --gradient_checkpointing --fp16 --overwrite_output_dir --do_train --do_eval --predict_with_generate --do_normalize_eval --streaming=False --use_auth_token --push_to_hub [2022-12-18 23:20:41,392] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]} [2022-12-18 23:20:41,392] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0 [2022-12-18 23:20:41,392] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]}) [2022-12-18 23:20:41,392] [INFO] [launch.py:162:main] dist_world_size=1 [2022-12-18 23:20:41,392] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0 [2022-12-18 23:20:42,403] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1700873 [2022-12-18 23:20:42,403] [ERROR] [launch.py:324:sigkill_handler] 
['/home/milan/hf_env/bin/python3', '-u', 'run_speech_recognition_seq2seq_streaming.py', '--local_rank=0', '--deepspeed=ds_config.json', '--model_name_or_path=openai/whisper-large-v2', '--dataset_name=mozilla-foundation/common_voice_11_0', '--dataset_config_name=hu', '--language=hungarian', '--train_split_name=train+validation', '--eval_split_name=test', '--model_index_name=Whisper Large-v2 Hungarian CV11', '--max_steps=5000', '--output_dir=./', '--per_device_train_batch_size=32', '--per_device_eval_batch_size=8', '--gradient_accumulation_steps=2', '--logging_steps=25', '--learning_rate=1e-5', '--warmup_steps=500', '--evaluation_strategy=steps', '--eval_steps=1000', '--save_strategy=steps', '--save_steps=1000', '--generation_max_length=225', '--length_column_name=input_length', '--max_duration_in_seconds=30', '--text_column_name=sentence', '--freeze_feature_encoder=False', '--report_to=tensorboard', '--metric_for_best_model=wer', '--greater_is_better=False', '--load_best_model_at_end', '--gradient_checkpointing', '--fp16', '--overwrite_output_dir', '--do_train', '--do_eval', '--predict_with_generate', '--do_normalize_eval', '--streaming=False', '--use_auth_token', '--push_to_hub'] exits with return code = 2 [2022-12-18 23:21:24,489] [WARNING] [runner.py:179:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. 
[2022-12-18 23:21:24,500] [INFO] [runner.py:508:main] cmd = /home/milan/hf_env/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 run_speech_recognition_seq2seq_streaming.py --deepspeed=ds_config.json --model_name_or_path=openai/whisper-large-v2 --dataset_name=mozilla-foundation/common_voice_11_0 --dataset_config_name=hu --language=hungarian --train_split_name=train+validation --eval_split_name=test --model_index_name=Whisper Large-v2 Hungarian CV11 --max_steps=5000 --output_dir=./ --per_device_train_batch_size=32 --per_device_eval_batch_size=8 --gradient_accumulation_steps=2 --logging_steps=25 --learning_rate=1e-5 --warmup_steps=500 --evaluation_strategy=steps --eval_steps=1000 --save_strategy=steps --save_steps=1000 --generation_max_length=225 --length_column_name=input_length --max_duration_in_seconds=30 --text_column_name=sentence --freeze_feature_encoder=False --report_to=tensorboard --metric_for_best_model=wer --greater_is_better=False --load_best_model_at_end --gradient_checkpointing --fp16 --overwrite_output_dir --do_train --do_eval --predict_with_generate --do_normalize_eval --streaming=False --use_auth_token --push_to_hub [2022-12-18 23:21:26,072] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]} [2022-12-18 23:21:26,073] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0 [2022-12-18 23:21:26,073] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]}) [2022-12-18 23:21:26,073] [INFO] [launch.py:162:main] dist_world_size=1 [2022-12-18 23:21:26,073] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0 [2022-12-18 23:21:30,550] [INFO] [comm.py:654:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl 12/18/2022 23:21:30 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True 12/18/2022 23:21:30 - INFO - __main__ - Training/evaluation 
parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=ds_config.json, disable_tqdm=False, do_eval=True, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=1000, evaluation_strategy=steps, fp16=True, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_max_length=225, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=True, greater_is_better=False, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=<HUB_TOKEN>, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=1e-05, length_column_name=input_length, load_best_model_at_end=True, local_rank=0, log_level=passive, log_level_replica=passive, log_on_each_node=True, logging_dir=./runs/Dec18_23-21-30_129-146-123-136, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=25, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=5000, metric_for_best_model=wer, mp_parameters=, no_cuda=False, num_train_epochs=3.0, optim=adamw_hf, optim_args=None, output_dir=./, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=32, predict_with_generate=True, prediction_loss_only=False, push_to_hub=True, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=<PUSH_TO_HUB_TOKEN>, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=./, 
save_on_each_node=False, save_steps=1000, save_strategy=steps, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=500, weight_decay=0.0, xpu_backend=None, )
12/18/2022 23:21:32 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/18/2022 23:21:32 - INFO - datasets.builder - Generating dataset common_voice_11_0 (/home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f) Downloading and preparing dataset common_voice_11_0/hu to /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f... 12/18/2022 23:21:32 - INFO - datasets.builder - Dataset not on Hf google storage. 
Downloading and preparing it from source 12/18/2022 23:21:32 - INFO - datasets.download.download_manager - Downloading took 0.0 min 12/18/2022 23:21:32 - INFO - datasets.download.download_manager - Checksum Computation took 0.0 min 12/18/2022 23:21:33 - INFO - datasets.utils.file_utils - https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/train/hu_train_0.tar not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmpn9e2qg3x 12/18/2022 23:21:36 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/train/hu_train_0.tar in cache at /home/milan/.cache/huggingface/datasets/downloads/7834691dc3252612415601745623f11a98f4b1bffa5bfbef1775a6a125bf96d5 12/18/2022 23:21:36 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/7834691dc3252612415601745623f11a98f4b1bffa5bfbef1775a6a125bf96d5 12/18/2022 23:21:36 - INFO - datasets.utils.file_utils - https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/dev/hu_dev_0.tar not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmpbckid5hj 12/18/2022 23:21:39 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/dev/hu_dev_0.tar in cache at /home/milan/.cache/huggingface/datasets/downloads/f4f7f59bff00cd3b0b6b94ffedd3f53e06e4c3d38e577ece03f351ffe08a8825 12/18/2022 23:21:39 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/f4f7f59bff00cd3b0b6b94ffedd3f53e06e4c3d38e577ece03f351ffe08a8825 12/18/2022 23:21:39 - INFO - datasets.utils.file_utils - 
https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/test/hu_test_0.tar not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmphjnydhgg 12/18/2022 23:21:41 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/test/hu_test_0.tar in cache at /home/milan/.cache/huggingface/datasets/downloads/31d615ab9a0ef7361f97cf9d3083d5561690aead7af67e08a4ef18c5b7926f04 12/18/2022 23:21:41 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/31d615ab9a0ef7361f97cf9d3083d5561690aead7af67e08a4ef18c5b7926f04 12/18/2022 23:21:41 - INFO - datasets.utils.file_utils - https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/other/hu_other_0.tar not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmp3lrwbg7l 12/18/2022 23:21:43 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/other/hu_other_0.tar in cache at /home/milan/.cache/huggingface/datasets/downloads/99e33b35c4b81364a2c4ef5f17b9fb8fc28d87152465e864d0a680627170ab2d 12/18/2022 23:21:43 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/99e33b35c4b81364a2c4ef5f17b9fb8fc28d87152465e864d0a680627170ab2d 12/18/2022 23:21:43 - INFO - datasets.utils.file_utils - https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/invalidated/hu_invalidated_0.tar not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmpg6qcc5ou 12/18/2022 23:21:44 - INFO - datasets.utils.file_utils - storing 
https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/audio/hu/invalidated/hu_invalidated_0.tar in cache at /home/milan/.cache/huggingface/datasets/downloads/a540babc5e249d63db76dea5c2bd5c7553411f4dc9144a980f49ebcdd4eeb388 12/18/2022 23:21:44 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/a540babc5e249d63db76dea5c2bd5c7553411f4dc9144a980f49ebcdd4eeb388 12/18/2022 23:21:44 - INFO - datasets.download.download_manager - Downloading took 0.0 min 12/18/2022 23:21:44 - INFO - datasets.download.download_manager - Checksum Computation took 0.0 min 12/18/2022 23:21:47 - INFO - datasets.utils.file_utils - https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/train.tsv not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmpxe8j68i2 12/18/2022 23:21:48 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/train.tsv in cache at /home/milan/.cache/huggingface/datasets/downloads/1915e72959d2cb0a4b6532a7fdb8857bb29f1e69e5b1476dbce7c18a3372609b 12/18/2022 23:21:48 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/1915e72959d2cb0a4b6532a7fdb8857bb29f1e69e5b1476dbce7c18a3372609b 12/18/2022 23:21:48 - INFO - datasets.utils.file_utils - https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/dev.tsv not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmpqmu0mwf4 12/18/2022 23:21:49 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/dev.tsv in cache at 
/home/milan/.cache/huggingface/datasets/downloads/d789b00e1e46cb8ce815f3f23eb950e38ba07c6a4a354c231b781e239afa5dff 12/18/2022 23:21:49 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/d789b00e1e46cb8ce815f3f23eb950e38ba07c6a4a354c231b781e239afa5dff 12/18/2022 23:21:49 - INFO - datasets.utils.file_utils - https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/test.tsv not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmpug_du8ny 12/18/2022 23:21:49 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/test.tsv in cache at /home/milan/.cache/huggingface/datasets/downloads/209e8710991230d77e369b85cd3e5b7baccf6623929d9b164f1020dd0d5d74b3 12/18/2022 23:21:49 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/209e8710991230d77e369b85cd3e5b7baccf6623929d9b164f1020dd0d5d74b3 12/18/2022 23:21:49 - INFO - datasets.utils.file_utils - https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/other.tsv not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmpy9ri4851 12/18/2022 23:21:50 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/other.tsv in cache at /home/milan/.cache/huggingface/datasets/downloads/767ea79b4879066f497bc041f562b7cf7d3f9d63dded84079eab8f7378e39bb8 12/18/2022 23:21:50 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/767ea79b4879066f497bc041f562b7cf7d3f9d63dded84079eab8f7378e39bb8 12/18/2022 23:21:50 - INFO - datasets.utils.file_utils - 
https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/invalidated.tsv not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmphpulzdsv 12/18/2022 23:21:51 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0/resolve/streaming/transcript/hu/invalidated.tsv in cache at /home/milan/.cache/huggingface/datasets/downloads/e809ce8b81700f5ad6edef088b29cb6a39b5ec201977c3bbd5679994e0259f0a 12/18/2022 23:21:51 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/e809ce8b81700f5ad6edef088b29cb6a39b5ec201977c3bbd5679994e0259f0a 12/18/2022 23:21:51 - INFO - datasets.download.download_manager - Downloading took 0.0 min 12/18/2022 23:21:51 - INFO - datasets.download.download_manager - Checksum Computation took 0.0 min 12/18/2022 23:21:51 - INFO - datasets.utils.info_utils - Unable to verify checksums. 12/18/2022 23:21:51 - INFO - datasets.builder - Generating train split 12/18/2022 23:21:53 - INFO - datasets.builder - Generating validation split 12/18/2022 23:21:54 - INFO - datasets.builder - Generating test split 12/18/2022 23:21:56 - INFO - datasets.builder - Generating other split 12/18/2022 23:21:56 - INFO - datasets.builder - Generating invalidated split 12/18/2022 23:21:56 - INFO - datasets.utils.info_utils - Unable to verify splits sizes. Dataset common_voice_11_0 downloaded and prepared to /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f. Subsequent calls will reuse this data. 
12/18/2022 23:21:58 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/18/2022 23:21:58 - INFO - datasets.builder - Overwrite dataset info from restored data version. 12/18/2022 23:21:58 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/18/2022 23:21:58 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f) 12/18/2022 23:21:58 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/18/2022 23:22:00 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/18/2022 23:22:00 - INFO - datasets.builder - Overwrite dataset info from restored data version. 
12/18/2022 23:22:00 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/18/2022 23:22:00 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f) 12/18/2022 23:22:00 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/18/2022 23:22:19 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-49c0230e0aca00fb.arrow 12/18/2022 23:43:21 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-74d1a56461b6f9a2.arrow 12/18/2022 23:55:29 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/hu/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-dddd3ddd0dbc3a22.arrow 12/18/2022 23:55:31 - WARNING - huggingface_hub.repository - /home/milan/whisper-large2-hu-cv11/./ is already a clone of https://huggingface.co/mikr/whisper-large2-hu-cv11. Make sure you pull the latest changes with `repo.git_pull()`. 
[2022-12-18 23:55:35,322] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown [2022-12-18 23:55:36,509] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2022-12-18 23:55:37,681] [WARNING] [cpu_adam.py:83:__init__] FP16 params for CPUAdam may not work on AMD CPUs Installed CUDA version 11.6 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination [1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/TH -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o [2/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" 
-I/home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/TH -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -c /home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o [3/3] c++ cpu_adam.o custom_cuda_kernel.cuda.o -shared -lcurand -L/home/milan/hf_env/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/lib64 -lcudart -o cpu_adam.so Time to load cpu_adam op: 28.829350471496582 seconds Adam Optimizer #0 is created with AVX2 arithmetic capability. 
Config: alpha=0.000010, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1 [2022-12-18 23:56:08,279] [INFO] [logging.py:68:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer [2022-12-18 23:56:08,583] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam [2022-12-18 23:56:08,583] [INFO] [utils.py:52:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'> [2022-12-18 23:56:08,583] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer [2022-12-18 23:56:08,583] [INFO] [stage_1_and_2.py:140:__init__] Reduce bucket size 200000000 [2022-12-18 23:56:08,584] [INFO] [stage_1_and_2.py:141:__init__] Allgather bucket size 200000000 [2022-12-18 23:56:08,584] [INFO] [stage_1_and_2.py:142:__init__] CPU Offload: True [2022-12-18 23:56:08,584] [INFO] [stage_1_and_2.py:143:__init__] Round robin gradient partitioning: False [1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/TH -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o [2/2] c++ flatten_unflatten.o -shared -L/home/milan/hf_env/lib/python3.8/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o utils.so Time to load utils op: 15.238850355148315 seconds Rank: 0 partition count [1] and sizes[(1543304960, False)] [2022-12-18 23:56:27,203] [INFO] [utils.py:827:see_memory_usage] Before initializing 
optimizer states [2022-12-18 23:56:27,203] [INFO] [utils.py:828:see_memory_usage] MA 3.0 GB Max_MA 3.0 GB CA 5.99 GB Max_CA 6 GB [2022-12-18 23:56:27,204] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 15.46 GB, percent = 7.9% [2022-12-18 23:56:31,112] [INFO] [utils.py:827:see_memory_usage] After initializing optimizer states [2022-12-18 23:56:31,112] [INFO] [utils.py:828:see_memory_usage] MA 3.0 GB Max_MA 3.0 GB CA 5.99 GB Max_CA 6 GB [2022-12-18 23:56:31,113] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.12 GB, percent = 17.9% [2022-12-18 23:56:31,113] [INFO] [stage_1_and_2.py:525:__init__] optimizer state initialized [2022-12-18 23:56:31,183] [INFO] [utils.py:827:see_memory_usage] After initializing ZeRO optimizer [2022-12-18 23:56:31,183] [INFO] [utils.py:828:see_memory_usage] MA 3.0 GB Max_MA 3.0 GB CA 5.99 GB Max_CA 6 GB [2022-12-18 23:56:31,184] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.12 GB, percent = 17.9% [2022-12-18 23:56:31,208] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw [2022-12-18 23:56:31,208] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupDecayLR [2022-12-18 23:56:31,208] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-12-18 23:56:31,208] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-18 23:56:31,210] [INFO] [config.py:1020:print] DeepSpeedEngine configuration: [2022-12-18 23:56:31,210] [INFO] [config.py:1024:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-12-18 23:56:31,210] [INFO] [config.py:1024:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-12-18 23:56:31,210] [INFO] [config.py:1024:print] amp_enabled .................. False [2022-12-18 23:56:31,210] [INFO] [config.py:1024:print] amp_params ................... False [2022-12-18 23:56:31,210] [INFO] [config.py:1024:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] bfloat16_enabled ............. False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] checkpoint_parallel_write_pipeline False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] checkpoint_tag_validation_enabled True [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] checkpoint_tag_validation_fail False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] comms_config ................. [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] communication_data_type ...... None [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] curriculum_enabled ........... False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] curriculum_params ............ False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] dataloader_drop_last ......... False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] disable_allgather ............ False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] dump_state ................... False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1} [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] eigenvalue_enabled ........... False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] eigenvalue_gas_boundary_resolution 1 [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] eigenvalue_layer_name ........ 
bert.encoder.layer [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] eigenvalue_layer_num ......... 0 [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] eigenvalue_max_iter .......... 100 [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] eigenvalue_stability ......... 1e-06 [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] eigenvalue_tol ............... 0.01 [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] eigenvalue_verbose ........... False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] elasticity_enabled ........... False [2022-12-18 23:56:31,211] [INFO] [config.py:1024:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] fp16_auto_cast ............... False [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] fp16_enabled ................. True [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] fp16_master_weights_and_gradients False [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] global_rank .................. 0 [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] grad_accum_dtype ............. None [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] gradient_accumulation_steps .. 2 [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] gradient_clipping ............ 1.0 [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] gradient_predivide_factor .... 1.0 [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] initial_dynamic_scale ........ 65536 [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] load_universal_checkpoint .... False [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] loss_scale ................... 0 [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] memory_breakdown ............. False [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] monitor_config ............... 
[2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] optimizer_legacy_fusion ...... False [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] optimizer_name ............... adamw [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] optimizer_params ............. {'lr': 1e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.0} [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] pld_enabled .................. False [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] pld_params ................... False [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] prescale_gradients ........... False [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] scheduler_name ............... WarmupDecayLR [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] scheduler_params ............. {'last_batch_iteration': -1, 'total_num_steps': 5000, 'warmup_min_lr': 0, 'warmup_max_lr': 1e-05, 'warmup_num_steps': 500} [2022-12-18 23:56:31,212] [INFO] [config.py:1024:print] sparse_attention ............. None [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] sparse_gradients_enabled ..... False [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] steps_per_print .............. 10 [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] train_batch_size ............. 64 [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] train_micro_batch_size_per_gpu 32 [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] use_node_local_storage ....... 
False [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] wall_clock_breakdown ......... False [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] world_size ................... 1 [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] zero_allow_untested_optimizer False [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] zero_enabled ................. True [2022-12-18 23:56:31,213] [INFO] [config.py:1024:print] zero_optimization_stage ...... 
2 [2022-12-18 23:56:31,213] [INFO] [config.py:1009:print_user_config] json = { "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 16, "hysteresis": 2, "min_loss_scale": 1 }, "optimizer": { "type": "AdamW", "params": { "lr": 1e-05, "betas": [0.9, 0.999], "eps": 1e-08, "weight_decay": 0.0 } }, "scheduler": { "type": "WarmupDecayLR", "params": { "last_batch_iteration": -1, "total_num_steps": 5.000000e+03, "warmup_min_lr": 0, "warmup_max_lr": 1e-05, "warmup_num_steps": 500 } }, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 2.000000e+08, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2.000000e+08, "contiguous_gradients": true }, "gradient_accumulation_steps": 2, "gradient_clipping": 1.0, "train_batch_size": 64, "train_micro_batch_size_per_gpu": 32 } Time to load utils op: 0.0003571510314941406 seconds [2022-12-18 23:56:55,627] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 65536 [2022-12-18 23:56:55,629] [INFO] [timer.py:197:stop] 0/4, RunningAvgSamplesPerSec=6.763630714125663, CurrSamplesPerSec=6.400820417656072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:57:06,985] [INFO] [timer.py:197:stop] 0/6, RunningAvgSamplesPerSec=6.536430141688239, CurrSamplesPerSec=5.682596666322877, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:57:18,338] [INFO] [timer.py:197:stop] 0/8, RunningAvgSamplesPerSec=6.460811070153207, CurrSamplesPerSec=5.669483377217449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:57:29,033] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768.0 [2022-12-18 23:57:29,035] [INFO] [timer.py:197:stop] 0/10, RunningAvgSamplesPerSec=6.52917405857395, CurrSamplesPerSec=6.383157841872113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:57:39,764] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0 [2022-12-18 23:57:39,766] [INFO] [timer.py:197:stop] 0/12, RunningAvgSamplesPerSec=6.568214533908828, CurrSamplesPerSec=6.393844122568189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:57:51,114] [INFO] [timer.py:197:stop] 0/14, RunningAvgSamplesPerSec=6.525088580840918, CurrSamplesPerSec=5.684102939821898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:58:02,429] [INFO] [timer.py:197:stop] 0/16, RunningAvgSamplesPerSec=6.497422112749595, CurrSamplesPerSec=5.714115760953486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:58:13,702] [INFO] [timer.py:197:stop] 0/18, RunningAvgSamplesPerSec=6.476673258035022, CurrSamplesPerSec=5.712435261056795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:58:25,074] [INFO] [logging.py:68:log_dist] [Rank 0] step=10, skipped=3, lr=[3.131187225706726e-06], mom=[[0.9, 0.999]] [2022-12-18 23:58:25,076] [INFO] [timer.py:197:stop] 0/20, RunningAvgSamplesPerSec=6.457550487934793, CurrSamplesPerSec=5.672483307478059, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:58:36,426] [INFO] [timer.py:197:stop] 0/22, RunningAvgSamplesPerSec=6.44415601612862, CurrSamplesPerSec=5.681215514560827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:58:47,757] [INFO] [timer.py:197:stop] 0/24, RunningAvgSamplesPerSec=6.43271076727295, CurrSamplesPerSec=5.699260773577131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:58:59,101] [INFO] [timer.py:197:stop] 0/26, RunningAvgSamplesPerSec=6.422076700162281, CurrSamplesPerSec=5.681345374840111, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 
23:59:10,506] [INFO] [timer.py:197:stop] 0/28, RunningAvgSamplesPerSec=6.411644554384639, CurrSamplesPerSec=5.650441250855927, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:59:21,916] [INFO] [timer.py:197:stop] 0/30, RunningAvgSamplesPerSec=6.404704339537071, CurrSamplesPerSec=5.6816339744942725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:59:33,244] [INFO] [timer.py:197:stop] 0/32, RunningAvgSamplesPerSec=6.398182268942303, CurrSamplesPerSec=5.66514939224817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:59:44,592] [INFO] [timer.py:197:stop] 0/34, RunningAvgSamplesPerSec=6.393101858742206, CurrSamplesPerSec=5.696323843911464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-18 23:59:56,041] [INFO] [timer.py:197:stop] 0/36, RunningAvgSamplesPerSec=6.38533580992348, CurrSamplesPerSec=5.614370152964268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:00:07,832] [INFO] [timer.py:197:stop] 0/38, RunningAvgSamplesPerSec=6.379980867946317, CurrSamplesPerSec=5.682436676189912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:00:19,630] [INFO] [logging.py:68:log_dist] [Rank 0] step=20, skipped=3, lr=[4.558957377820063e-06], mom=[[0.9, 0.999]] [2022-12-19 00:00:19,631] [INFO] [timer.py:197:stop] 0/40, RunningAvgSamplesPerSec=6.375262110305903, CurrSamplesPerSec=5.6558136022639145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:00:31,589] [INFO] [timer.py:197:stop] 0/42, RunningAvgSamplesPerSec=6.371413619674249, CurrSamplesPerSec=5.687795349133654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:00:43,045] [INFO] [timer.py:197:stop] 0/44, RunningAvgSamplesPerSec=6.369337824657779, CurrSamplesPerSec=5.695208352808133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:00:54,450] [INFO] [timer.py:197:stop] 0/46, RunningAvgSamplesPerSec=6.365879927224, CurrSamplesPerSec=5.663550381260192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:01:05,809] [INFO] 
[timer.py:197:stop] 0/48, RunningAvgSamplesPerSec=6.3628448782739975, CurrSamplesPerSec=5.668740594174487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:01:17,133] [INFO] [timer.py:197:stop] 0/50, RunningAvgSamplesPerSec=6.3610764096707095, CurrSamplesPerSec=5.702925414016152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.469, 'learning_rate': 4.973833272194737e-06, 'epoch': 0.19} [2022-12-19 00:01:28,495] [INFO] [timer.py:197:stop] 0/52, RunningAvgSamplesPerSec=6.360192297523213, CurrSamplesPerSec=5.717501892447633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:01:39,874] [INFO] [timer.py:197:stop] 0/54, RunningAvgSamplesPerSec=6.357957980419815, CurrSamplesPerSec=5.6773392751122245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:01:51,182] [INFO] [timer.py:197:stop] 0/56, RunningAvgSamplesPerSec=6.357177728437714, CurrSamplesPerSec=5.704344538382278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:02:02,727] [INFO] [timer.py:197:stop] 0/58, RunningAvgSamplesPerSec=6.355616580687708, CurrSamplesPerSec=5.6876068669078315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:02:14,121] [INFO] [logging.py:68:log_dist] [Rank 0] step=30, skipped=3, lr=[5.303370403744525e-06], mom=[[0.9, 0.999]] [2022-12-19 00:02:14,122] [INFO] [timer.py:197:stop] 0/60, RunningAvgSamplesPerSec=6.354439303923271, CurrSamplesPerSec=5.698230253920275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:02:25,514] [INFO] [timer.py:197:stop] 0/62, RunningAvgSamplesPerSec=6.353114078967978, CurrSamplesPerSec=5.673747966682336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:02:36,883] [INFO] [timer.py:197:stop] 0/64, RunningAvgSamplesPerSec=6.3523286363618405, CurrSamplesPerSec=5.693366030373531, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:02:48,304] [INFO] [timer.py:197:stop] 0/66, RunningAvgSamplesPerSec=6.351188060029837, CurrSamplesPerSec=5.692320259040157, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 00:02:59,845] [INFO] [timer.py:197:stop] 0/68, RunningAvgSamplesPerSec=6.349831784979684, CurrSamplesPerSec=5.662888713661619, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:03:11,147] [INFO] [timer.py:197:stop] 0/70, RunningAvgSamplesPerSec=6.349548670035175, CurrSamplesPerSec=5.7110026315224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:03:22,556] [INFO] [timer.py:197:stop] 0/72, RunningAvgSamplesPerSec=6.3490761334381425, CurrSamplesPerSec=5.730710295607525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:03:33,954] [INFO] [timer.py:197:stop] 0/74, RunningAvgSamplesPerSec=6.3478368588085825, CurrSamplesPerSec=5.664371886193913, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:03:45,417] [INFO] [timer.py:197:stop] 0/76, RunningAvgSamplesPerSec=6.346795782912525, CurrSamplesPerSec=5.668257482610842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:03:56,804] [INFO] [timer.py:197:stop] 0/78, RunningAvgSamplesPerSec=6.34552787901348, CurrSamplesPerSec=5.67239604425034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:04:08,235] [INFO] [logging.py:68:log_dist] [Rank 0] step=40, skipped=3, lr=[5.810371073215365e-06], mom=[[0.9, 0.999]] [2022-12-19 00:04:08,237] [INFO] [timer.py:197:stop] 0/80, RunningAvgSamplesPerSec=6.344626667174682, CurrSamplesPerSec=5.6718181130139245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:04:19,591] [INFO] [timer.py:197:stop] 0/82, RunningAvgSamplesPerSec=6.343983756791708, CurrSamplesPerSec=5.684003283073533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:04:31,082] [INFO] [timer.py:197:stop] 0/84, RunningAvgSamplesPerSec=6.343322138412069, CurrSamplesPerSec=5.682808396438611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:04:42,408] [INFO] [timer.py:197:stop] 0/86, RunningAvgSamplesPerSec=6.342770646234098, CurrSamplesPerSec=5.688240814577332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
00:04:53,901] [INFO] [timer.py:197:stop] 0/88, RunningAvgSamplesPerSec=6.342187782305656, CurrSamplesPerSec=5.697844177243397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:05:05,287] [INFO] [timer.py:197:stop] 0/90, RunningAvgSamplesPerSec=6.341244291943437, CurrSamplesPerSec=5.6740868875432495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:05:16,777] [INFO] [timer.py:197:stop] 0/92, RunningAvgSamplesPerSec=6.340564634082798, CurrSamplesPerSec=5.670572042474326, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:05:28,156] [INFO] [timer.py:197:stop] 0/94, RunningAvgSamplesPerSec=6.340309004792443, CurrSamplesPerSec=5.712390769212697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:05:39,483] [INFO] [timer.py:197:stop] 0/96, RunningAvgSamplesPerSec=6.340256485181379, CurrSamplesPerSec=5.717315576498292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:05:50,971] [INFO] [timer.py:197:stop] 0/98, RunningAvgSamplesPerSec=6.33952480296046, CurrSamplesPerSec=5.674263679743229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:06:02,415] [INFO] [logging.py:68:log_dist] [Rank 0] step=50, skipped=3, lr=[6.195318418690893e-06], mom=[[0.9, 0.999]] [2022-12-19 00:06:02,417] [INFO] [timer.py:197:stop] 0/100, RunningAvgSamplesPerSec=6.33897692276685, CurrSamplesPerSec=5.695520114560736, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.2768, 'learning_rate': 6.195318418690893e-06, 'epoch': 0.37} [2022-12-19 00:06:13,756] [INFO] [timer.py:197:stop] 0/102, RunningAvgSamplesPerSec=6.3385488318911065, CurrSamplesPerSec=5.68085770806229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:06:25,338] [INFO] [timer.py:197:stop] 0/104, RunningAvgSamplesPerSec=6.337702098624332, CurrSamplesPerSec=5.671691803620073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:06:36,705] [INFO] [timer.py:197:stop] 0/106, RunningAvgSamplesPerSec=6.337090033703842, CurrSamplesPerSec=5.67942428907718, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:06:48,243] [INFO] [timer.py:197:stop] 0/108, RunningAvgSamplesPerSec=6.336622589288899, CurrSamplesPerSec=5.6761032599704375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:06:59,796] [INFO] [timer.py:197:stop] 0/110, RunningAvgSamplesPerSec=6.33521692888461, CurrSamplesPerSec=5.662787171287403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:07:11,136] [INFO] [timer.py:197:stop] 0/112, RunningAvgSamplesPerSec=6.33498229466653, CurrSamplesPerSec=5.682813689893659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:07:22,432] [INFO] [timer.py:197:stop] 0/114, RunningAvgSamplesPerSec=6.335179424223348, CurrSamplesPerSec=5.717447092378461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:07:33,770] [INFO] [timer.py:197:stop] 0/116, RunningAvgSamplesPerSec=6.334813993604455, CurrSamplesPerSec=5.681002219911839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:07:45,324] [INFO] [timer.py:197:stop] 0/118, RunningAvgSamplesPerSec=6.334159186301398, CurrSamplesPerSec=5.66402432491068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:07:56,812] [INFO] [logging.py:68:log_dist] [Rank 0] step=60, skipped=3, lr=[6.505722008216461e-06], mom=[[0.9, 0.999]] [2022-12-19 00:07:56,813] [INFO] [timer.py:197:stop] 0/120, RunningAvgSamplesPerSec=6.334148983596415, CurrSamplesPerSec=5.70464639030498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:08:08,316] [INFO] [timer.py:197:stop] 0/122, RunningAvgSamplesPerSec=6.333515012729501, CurrSamplesPerSec=5.648025915880862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:08:19,714] [INFO] [timer.py:197:stop] 0/124, RunningAvgSamplesPerSec=6.33315550170096, CurrSamplesPerSec=5.683986914666378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:08:31,032] [INFO] [timer.py:197:stop] 0/126, RunningAvgSamplesPerSec=6.332929766520779, CurrSamplesPerSec=5.703466076508138, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 00:08:42,352] [INFO] [timer.py:197:stop] 0/128, RunningAvgSamplesPerSec=6.332797306552546, CurrSamplesPerSec=5.687579391002574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:08:53,644] [INFO] [timer.py:197:stop] 0/130, RunningAvgSamplesPerSec=6.332691472812845, CurrSamplesPerSec=5.692608526146744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:09:04,933] [INFO] [timer.py:197:stop] 0/132, RunningAvgSamplesPerSec=6.332611746493174, CurrSamplesPerSec=5.691202474082477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:09:16,262] [INFO] [timer.py:197:stop] 0/134, RunningAvgSamplesPerSec=6.3325367852403724, CurrSamplesPerSec=5.688610642467863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:09:27,607] [INFO] [timer.py:197:stop] 0/136, RunningAvgSamplesPerSec=6.332240924170941, CurrSamplesPerSec=5.695987821193032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:09:38,967] [INFO] [timer.py:197:stop] 0/138, RunningAvgSamplesPerSec=6.3317661562480065, CurrSamplesPerSec=5.666736625998635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:09:50,308] [INFO] [logging.py:68:log_dist] [Rank 0] step=70, skipped=3, lr=[6.765821034569313e-06], mom=[[0.9, 0.999]] [2022-12-19 00:09:50,309] [INFO] [timer.py:197:stop] 0/140, RunningAvgSamplesPerSec=6.33173963301883, CurrSamplesPerSec=5.688135709298204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:10:01,649] [INFO] [timer.py:197:stop] 0/142, RunningAvgSamplesPerSec=6.331692178123189, CurrSamplesPerSec=5.699286184381356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:10:12,985] [INFO] [timer.py:197:stop] 0/144, RunningAvgSamplesPerSec=6.3315197039139335, CurrSamplesPerSec=5.6873140452636, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:10:24,349] [INFO] [timer.py:197:stop] 0/146, RunningAvgSamplesPerSec=6.331157679958395, CurrSamplesPerSec=5.658794546738779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 00:10:35,686] [INFO] [timer.py:197:stop] 0/148, RunningAvgSamplesPerSec=6.330903280401342, CurrSamplesPerSec=5.6861688358950335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:10:47,000] [INFO] [timer.py:197:stop] 0/150, RunningAvgSamplesPerSec=6.330923666571888, CurrSamplesPerSec=5.68941194044744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.2426, 'learning_rate': 6.881634451095711e-06, 'epoch': 0.56} [2022-12-19 00:10:58,301] [INFO] [timer.py:197:stop] 0/152, RunningAvgSamplesPerSec=6.331214712690136, CurrSamplesPerSec=5.724838499410061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:11:09,625] [INFO] [timer.py:197:stop] 0/154, RunningAvgSamplesPerSec=6.331224620981721, CurrSamplesPerSec=5.69813155264911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:11:20,972] [INFO] [timer.py:197:stop] 0/156, RunningAvgSamplesPerSec=6.331098653418828, CurrSamplesPerSec=5.69718777348036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:11:32,289] [INFO] [timer.py:197:stop] 0/158, RunningAvgSamplesPerSec=6.331127713740547, CurrSamplesPerSec=5.687857295508174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:11:43,679] [INFO] [logging.py:68:log_dist] [Rank 0] step=80, skipped=3, lr=[6.9896691039239e-06], mom=[[0.9, 0.999]] [2022-12-19 00:11:43,681] [INFO] [timer.py:197:stop] 0/160, RunningAvgSamplesPerSec=6.330809560610306, CurrSamplesPerSec=5.673481751513187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:11:55,122] [INFO] [timer.py:197:stop] 0/162, RunningAvgSamplesPerSec=6.3299476836731685, CurrSamplesPerSec=5.5857973350859265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:12:06,570] [INFO] [timer.py:197:stop] 0/164, RunningAvgSamplesPerSec=6.3290526826189835, CurrSamplesPerSec=5.600629157171996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:12:17,860] [INFO] [timer.py:197:stop] 0/166, RunningAvgSamplesPerSec=6.329082340849397, 
CurrSamplesPerSec=5.701795467246513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:12:29,171] [INFO] [timer.py:197:stop] 0/168, RunningAvgSamplesPerSec=6.328855084166909, CurrSamplesPerSec=5.6554294390015665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:12:40,492] [INFO] [timer.py:197:stop] 0/170, RunningAvgSamplesPerSec=6.328759333849463, CurrSamplesPerSec=5.682248067814558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:12:51,842] [INFO] [timer.py:197:stop] 0/172, RunningAvgSamplesPerSec=6.32877886699518, CurrSamplesPerSec=5.699245769208656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:13:03,165] [INFO] [timer.py:197:stop] 0/174, RunningAvgSamplesPerSec=6.328900436940702, CurrSamplesPerSec=5.718768188351172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:13:14,464] [INFO] [timer.py:197:stop] 0/176, RunningAvgSamplesPerSec=6.328756221210079, CurrSamplesPerSec=5.694242591398553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:13:25,810] [INFO] [timer.py:197:stop] 0/178, RunningAvgSamplesPerSec=6.328672475411587, CurrSamplesPerSec=5.678090078488593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:13:37,168] [INFO] [logging.py:68:log_dist] [Rank 0] step=90, skipped=3, lr=[7.186146009413563e-06], mom=[[0.9, 0.999]] [2022-12-19 00:13:37,170] [INFO] [timer.py:197:stop] 0/180, RunningAvgSamplesPerSec=6.328216223080796, CurrSamplesPerSec=5.67523899346736, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:13:48,530] [INFO] [timer.py:197:stop] 0/182, RunningAvgSamplesPerSec=6.327964880274314, CurrSamplesPerSec=5.66142064582372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:13:59,901] [INFO] [timer.py:197:stop] 0/184, RunningAvgSamplesPerSec=6.3276568043461445, CurrSamplesPerSec=5.6757672183922665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:14:11,195] [INFO] [timer.py:197:stop] 0/186, RunningAvgSamplesPerSec=6.32755591494186, 
CurrSamplesPerSec=5.684328504353401, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:14:22,576] [INFO] [timer.py:197:stop] 0/188, RunningAvgSamplesPerSec=6.327390350075128, CurrSamplesPerSec=5.682272605440604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:14:33,954] [INFO] [timer.py:197:stop] 0/190, RunningAvgSamplesPerSec=6.327054551930756, CurrSamplesPerSec=5.669776521185562, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:14:45,644] [INFO] [timer.py:197:stop] 0/192, RunningAvgSamplesPerSec=6.326579129921109, CurrSamplesPerSec=5.653444632792826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:14:57,344] [INFO] [timer.py:197:stop] 0/194, RunningAvgSamplesPerSec=6.325978009508057, CurrSamplesPerSec=5.6384486702391445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:15:09,240] [INFO] [timer.py:197:stop] 0/196, RunningAvgSamplesPerSec=6.325550947850442, CurrSamplesPerSec=5.655988065840153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:15:20,778] [INFO] [timer.py:197:stop] 0/198, RunningAvgSamplesPerSec=6.325389163849206, CurrSamplesPerSec=5.687417433165994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:15:32,255] [INFO] [logging.py:68:log_dist] [Rank 0] step=100, skipped=3, lr=[7.361221988663844e-06], mom=[[0.9, 0.999]] [2022-12-19 00:15:32,255] [INFO] [timer.py:197:stop] 0/200, RunningAvgSamplesPerSec=6.324926355796128, CurrSamplesPerSec=5.656474332607332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.2244, 'learning_rate': 7.361221988663844e-06, 'epoch': 0.75} [2022-12-19 00:15:43,680] [INFO] [timer.py:197:stop] 0/202, RunningAvgSamplesPerSec=6.32453033793769, CurrSamplesPerSec=5.638279076686136, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:15:55,010] [INFO] [timer.py:197:stop] 0/204, RunningAvgSamplesPerSec=6.324453419893308, CurrSamplesPerSec=5.681493038000189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:16:06,556] [INFO] [timer.py:197:stop] 
0/206, RunningAvgSamplesPerSec=6.324486996475307, CurrSamplesPerSec=5.691360786182658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:16:17,887] [INFO] [timer.py:197:stop] 0/208, RunningAvgSamplesPerSec=6.324509560193469, CurrSamplesPerSec=5.712404627253756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:16:29,421] [INFO] [timer.py:197:stop] 0/210, RunningAvgSamplesPerSec=6.324302705124975, CurrSamplesPerSec=5.645062238684859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:16:40,802] [INFO] [timer.py:197:stop] 0/212, RunningAvgSamplesPerSec=6.324122228524852, CurrSamplesPerSec=5.669440509847704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:16:52,142] [INFO] [timer.py:197:stop] 0/214, RunningAvgSamplesPerSec=6.324259002477085, CurrSamplesPerSec=5.714291893712712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:17:03,615] [INFO] [timer.py:197:stop] 0/216, RunningAvgSamplesPerSec=6.32430448459279, CurrSamplesPerSec=5.696339558174375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:17:14,994] [INFO] [timer.py:197:stop] 0/218, RunningAvgSamplesPerSec=6.324204261308274, CurrSamplesPerSec=5.680810821437903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:17:26,480] [INFO] [logging.py:68:log_dist] [Rank 0] step=110, skipped=3, lr=[7.5191046007362515e-06], mom=[[0.9, 0.999]] [2022-12-19 00:17:26,482] [INFO] [timer.py:197:stop] 0/220, RunningAvgSamplesPerSec=6.324113089030932, CurrSamplesPerSec=5.690471118949904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:17:37,346] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 16384.0, reducing to 8192.0 [2022-12-19 00:17:37,348] [INFO] [timer.py:197:stop] 0/222, RunningAvgSamplesPerSec=6.327646311552264, CurrSamplesPerSec=6.392320018044726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:17:48,819] [INFO] [timer.py:197:stop] 0/224, RunningAvgSamplesPerSec=6.3274537514777744, CurrSamplesPerSec=5.669941068705244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:18:00,164] [INFO] [timer.py:197:stop] 0/226, RunningAvgSamplesPerSec=6.327342475748278, CurrSamplesPerSec=5.693449834266986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:18:11,544] [INFO] [timer.py:197:stop] 0/228, RunningAvgSamplesPerSec=6.327154317339907, CurrSamplesPerSec=5.678557569498987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:18:22,915] [INFO] [timer.py:197:stop] 0/230, RunningAvgSamplesPerSec=6.327103220844448, CurrSamplesPerSec=5.686582002582689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:18:34,385] [INFO] [timer.py:197:stop] 0/232, RunningAvgSamplesPerSec=6.326879425525643, CurrSamplesPerSec=5.673301650733854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:18:45,742] [INFO] [timer.py:197:stop] 0/234, RunningAvgSamplesPerSec=6.326778545660543, CurrSamplesPerSec=5.673341938597102, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:18:57,269] [INFO] [timer.py:197:stop] 0/236, RunningAvgSamplesPerSec=6.326404490464032, CurrSamplesPerSec=5.697001811848725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:19:08,626] [INFO] [timer.py:197:stop] 0/238, RunningAvgSamplesPerSec=6.326351983297122, CurrSamplesPerSec=5.681701559150151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:19:20,130] [INFO] [logging.py:68:log_dist] [Rank 0] step=120, skipped=4, lr=[7.649058662787184e-06], mom=[[0.9, 0.999]] [2022-12-19 00:19:20,132] [INFO] [timer.py:197:stop] 0/240, RunningAvgSamplesPerSec=6.326111817729406, CurrSamplesPerSec=5.663624945048574, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:19:31,444] [INFO] [timer.py:197:stop] 0/242, RunningAvgSamplesPerSec=6.3261978165459585, CurrSamplesPerSec=5.6900899525945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:19:42,809] [INFO] [timer.py:197:stop] 0/244, RunningAvgSamplesPerSec=6.3261069382885795, CurrSamplesPerSec=5.673907467917805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:19:54,332] [INFO] [timer.py:197:stop] 0/246, RunningAvgSamplesPerSec=6.325872059251878, CurrSamplesPerSec=5.662446971142885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:20:05,683] [INFO] [timer.py:197:stop] 0/248, RunningAvgSamplesPerSec=6.3258389816017795, CurrSamplesPerSec=5.705126509772607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:20:17,028] [INFO] [timer.py:197:stop] 0/250, RunningAvgSamplesPerSec=6.325799453129848, CurrSamplesPerSec=5.664985840039145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.2247, 'learning_rate': 7.716963756434345e-06, 'epoch': 0.94} [2022-12-19 00:20:28,473] [INFO] [timer.py:197:stop] 0/252, RunningAvgSamplesPerSec=6.325563266341907, CurrSamplesPerSec=5.671827460611994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:20:39,976] [INFO] [timer.py:197:stop] 0/254, RunningAvgSamplesPerSec=6.325461516386534, CurrSamplesPerSec=5.6731141278529895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:20:51,316] [INFO] [timer.py:197:stop] 0/256, RunningAvgSamplesPerSec=6.325444122591116, CurrSamplesPerSec=5.6907811558264605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:21:02,778] [INFO] [timer.py:197:stop] 0/258, RunningAvgSamplesPerSec=6.325442583698493, CurrSamplesPerSec=5.695032186259812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:21:14,145] [INFO] [logging.py:68:log_dist] [Rank 0] step=130, skipped=4, lr=[7.782118888847307e-06], mom=[[0.9, 0.999]] [2022-12-19 00:21:14,147] [INFO] [timer.py:197:stop] 0/260, 
RunningAvgSamplesPerSec=6.325243906111729, CurrSamplesPerSec=5.662732937205337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:21:25,617] [INFO] [timer.py:197:stop] 0/262, RunningAvgSamplesPerSec=6.325108230507205, CurrSamplesPerSec=5.668189738662974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:21:36,989] [INFO] [timer.py:197:stop] 0/264, RunningAvgSamplesPerSec=6.3250036158640786, CurrSamplesPerSec=5.690783809984471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:21:48,328] [INFO] [timer.py:197:stop] 0/266, RunningAvgSamplesPerSec=6.324874077498679, CurrSamplesPerSec=5.657228926707887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:21:58,736] [INFO] [timer.py:197:stop] 0/268, RunningAvgSamplesPerSec=6.328685879831648, CurrSamplesPerSec=5.699145338776445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:22:10,078] [INFO] [timer.py:197:stop] 0/270, RunningAvgSamplesPerSec=6.328592651378525, CurrSamplesPerSec=5.701187070786945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:22:21,600] [INFO] [timer.py:197:stop] 0/272, RunningAvgSamplesPerSec=6.3283978797948786, CurrSamplesPerSec=5.677870293029516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:22:32,908] [INFO] [timer.py:197:stop] 0/274, RunningAvgSamplesPerSec=6.328381746136948, CurrSamplesPerSec=5.686675003479556, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:22:44,249] [INFO] [timer.py:197:stop] 0/276, RunningAvgSamplesPerSec=6.328281744558154, CurrSamplesPerSec=5.695803388164962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:22:55,672] [INFO] [timer.py:197:stop] 0/278, RunningAvgSamplesPerSec=6.328182902409602, CurrSamplesPerSec=5.677010771022341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:23:07,023] [INFO] [logging.py:68:log_dist] [Rank 0] step=140, skipped=4, lr=[7.905011559752758e-06], mom=[[0.9, 0.999]] [2022-12-19 00:23:07,024] [INFO] [timer.py:197:stop] 0/280, 
RunningAvgSamplesPerSec=6.328057150363153, CurrSamplesPerSec=5.680155451498959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:23:18,360] [INFO] [timer.py:197:stop] 0/282, RunningAvgSamplesPerSec=6.3279484577468885, CurrSamplesPerSec=5.688211162924068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:23:29,711] [INFO] [timer.py:197:stop] 0/284, RunningAvgSamplesPerSec=6.327916346392182, CurrSamplesPerSec=5.692176377840019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:23:41,046] [INFO] [timer.py:197:stop] 0/286, RunningAvgSamplesPerSec=6.327953084244207, CurrSamplesPerSec=5.706254866097372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:23:52,428] [INFO] [timer.py:197:stop] 0/288, RunningAvgSamplesPerSec=6.327738478754996, CurrSamplesPerSec=5.672490499622081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:24:03,757] [INFO] [timer.py:197:stop] 0/290, RunningAvgSamplesPerSec=6.32769283480951, CurrSamplesPerSec=5.696189430066696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:24:15,155] [INFO] [timer.py:197:stop] 0/292, RunningAvgSamplesPerSec=6.327483742968983, CurrSamplesPerSec=5.67373885259688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:24:26,544] [INFO] [timer.py:197:stop] 0/294, RunningAvgSamplesPerSec=6.327302278086137, CurrSamplesPerSec=5.670200963063309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:24:37,904] [INFO] [timer.py:197:stop] 0/296, RunningAvgSamplesPerSec=6.327141303932303, CurrSamplesPerSec=5.677278758344328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:24:49,254] [INFO] [timer.py:197:stop] 0/298, RunningAvgSamplesPerSec=6.327068725258702, CurrSamplesPerSec=5.679531716333437, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:25:00,570] [INFO] [logging.py:68:log_dist] [Rank 0] step=150, skipped=4, lr=[8.019180844200955e-06], mom=[[0.9, 0.999]] [2022-12-19 00:25:00,572] [INFO] [timer.py:197:stop] 0/300, 
RunningAvgSamplesPerSec=6.327055388465258, CurrSamplesPerSec=5.691934258615557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.156, 'learning_rate': 8.019180844200955e-06, 'epoch': 1.13} [2022-12-19 00:25:11,937] [INFO] [timer.py:197:stop] 0/302, RunningAvgSamplesPerSec=6.326950257178029, CurrSamplesPerSec=5.681354513379643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:25:23,300] [INFO] [timer.py:197:stop] 0/304, RunningAvgSamplesPerSec=6.326869748973422, CurrSamplesPerSec=5.693791837691181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:25:34,640] [INFO] [timer.py:197:stop] 0/306, RunningAvgSamplesPerSec=6.32679394074768, CurrSamplesPerSec=5.678414142951313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:25:46,013] [INFO] [timer.py:197:stop] 0/308, RunningAvgSamplesPerSec=6.326659795175516, CurrSamplesPerSec=5.668974039823971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:25:57,321] [INFO] [timer.py:197:stop] 0/310, RunningAvgSamplesPerSec=6.326684211766849, CurrSamplesPerSec=5.698074946109996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:26:08,668] [INFO] [timer.py:197:stop] 0/312, RunningAvgSamplesPerSec=6.326634615929587, CurrSamplesPerSec=5.702059501331119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:26:20,030] [INFO] [timer.py:197:stop] 0/314, RunningAvgSamplesPerSec=6.326579766010395, CurrSamplesPerSec=5.67752083315708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:26:31,372] [INFO] [timer.py:197:stop] 0/316, RunningAvgSamplesPerSec=6.326458371297783, CurrSamplesPerSec=5.6651921947508415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:26:42,734] [INFO] [timer.py:197:stop] 0/318, RunningAvgSamplesPerSec=6.326282034641276, CurrSamplesPerSec=5.650637983067839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:26:54,070] [INFO] [logging.py:68:log_dist] [Rank 0] step=160, skipped=4, lr=[8.125783520495252e-06], mom=[[0.9, 0.999]] [2022-12-19 
00:26:54,072] [INFO] [timer.py:197:stop] 0/320, RunningAvgSamplesPerSec=6.3262625622797435, CurrSamplesPerSec=5.677650764759339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:27:05,444] [INFO] [timer.py:197:stop] 0/322, RunningAvgSamplesPerSec=6.326144083145681, CurrSamplesPerSec=5.688546991999535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:27:16,841] [INFO] [timer.py:197:stop] 0/324, RunningAvgSamplesPerSec=6.32597403277753, CurrSamplesPerSec=5.6585590759436, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:27:28,245] [INFO] [timer.py:197:stop] 0/326, RunningAvgSamplesPerSec=6.3257032208764565, CurrSamplesPerSec=5.644172745055525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:27:39,596] [INFO] [timer.py:197:stop] 0/328, RunningAvgSamplesPerSec=6.325608758756463, CurrSamplesPerSec=5.678952570507874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:27:51,006] [INFO] [timer.py:197:stop] 0/330, RunningAvgSamplesPerSec=6.325326156655979, CurrSamplesPerSec=5.634799687918671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:28:02,361] [INFO] [timer.py:197:stop] 0/332, RunningAvgSamplesPerSec=6.3252073965144975, CurrSamplesPerSec=5.669204631170644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:28:13,791] [INFO] [timer.py:197:stop] 0/334, RunningAvgSamplesPerSec=6.324794313794967, CurrSamplesPerSec=5.586366238664078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:28:25,155] [INFO] [timer.py:197:stop] 0/336, RunningAvgSamplesPerSec=6.324742033512023, CurrSamplesPerSec=5.667610749573001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:28:36,456] [INFO] [timer.py:197:stop] 0/338, RunningAvgSamplesPerSec=6.324778784386403, CurrSamplesPerSec=5.707499680324429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:28:47,764] [INFO] [logging.py:68:log_dist] [Rank 0] step=170, skipped=4, lr=[8.225760510392298e-06], mom=[[0.9, 0.999]] [2022-12-19 00:28:47,766] [INFO] 
[timer.py:197:stop] 0/340, RunningAvgSamplesPerSec=6.324802333232001, CurrSamplesPerSec=5.682101568162961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:28:59,116] [INFO] [timer.py:197:stop] 0/342, RunningAvgSamplesPerSec=6.324733283338533, CurrSamplesPerSec=5.692450868910799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:29:10,453] [INFO] [timer.py:197:stop] 0/344, RunningAvgSamplesPerSec=6.3246292325159965, CurrSamplesPerSec=5.679481967635714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:29:21,899] [INFO] [timer.py:197:stop] 0/346, RunningAvgSamplesPerSec=6.324397729817437, CurrSamplesPerSec=5.648484903577274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:29:33,812] [INFO] [timer.py:197:stop] 0/348, RunningAvgSamplesPerSec=6.32420868006303, CurrSamplesPerSec=5.651256815401826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:29:45,616] [INFO] [timer.py:197:stop] 0/350, RunningAvgSamplesPerSec=6.3242070956410465, CurrSamplesPerSec=5.699356367779722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1278, 'learning_rate': 8.27351214279797e-06, 'epoch': 1.31} [2022-12-19 00:29:57,281] [INFO] [timer.py:197:stop] 0/352, RunningAvgSamplesPerSec=6.324083302896628, CurrSamplesPerSec=5.681366537818554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:30:08,710] [INFO] [timer.py:197:stop] 0/354, RunningAvgSamplesPerSec=6.323959804275063, CurrSamplesPerSec=5.659636628479426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:30:20,253] [INFO] [timer.py:197:stop] 0/356, RunningAvgSamplesPerSec=6.323825506537739, CurrSamplesPerSec=5.675835143588331, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:30:31,710] [INFO] [timer.py:197:stop] 0/358, RunningAvgSamplesPerSec=6.32375855750815, CurrSamplesPerSec=5.67932864138317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:30:43,099] [INFO] [logging.py:68:log_dist] [Rank 0] step=180, skipped=4, lr=[8.31988745412743e-06], 
mom=[[0.9, 0.999]] [2022-12-19 00:30:43,101] [INFO] [timer.py:197:stop] 0/360, RunningAvgSamplesPerSec=6.323504983802095, CurrSamplesPerSec=5.63266976909398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:30:54,557] [INFO] [timer.py:197:stop] 0/362, RunningAvgSamplesPerSec=6.32343982359596, CurrSamplesPerSec=5.671728952857804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:31:05,889] [INFO] [timer.py:197:stop] 0/364, RunningAvgSamplesPerSec=6.323428401977776, CurrSamplesPerSec=5.677055914068001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:31:17,276] [INFO] [timer.py:197:stop] 0/366, RunningAvgSamplesPerSec=6.323384431083906, CurrSamplesPerSec=5.67629362117504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:31:28,785] [INFO] [timer.py:197:stop] 0/368, RunningAvgSamplesPerSec=6.323434262245826, CurrSamplesPerSec=5.70523515407605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:31:40,153] [INFO] [timer.py:197:stop] 0/370, RunningAvgSamplesPerSec=6.32337652578926, CurrSamplesPerSec=5.676626363854563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:31:51,613] [INFO] [timer.py:197:stop] 0/372, RunningAvgSamplesPerSec=6.323447379362817, CurrSamplesPerSec=5.696824567133537, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:32:03,153] [INFO] [timer.py:197:stop] 0/374, RunningAvgSamplesPerSec=6.323490219781556, CurrSamplesPerSec=5.694769285809873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:32:14,534] [INFO] [timer.py:197:stop] 0/376, RunningAvgSamplesPerSec=6.323369584243994, CurrSamplesPerSec=5.650268556480972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:32:25,876] [INFO] [timer.py:197:stop] 0/378, RunningAvgSamplesPerSec=6.323304923151756, CurrSamplesPerSec=5.6597320913591185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:32:37,173] [INFO] [logging.py:68:log_dist] [Rank 0] step=190, skipped=4, lr=[8.408811289387583e-06], mom=[[0.9, 0.999]] 
[2022-12-19 00:32:37,175] [INFO] [timer.py:197:stop] 0/380, RunningAvgSamplesPerSec=6.323406777084629, CurrSamplesPerSec=5.7226377858585264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:32:48,540] [INFO] [timer.py:197:stop] 0/382, RunningAvgSamplesPerSec=6.323297263414538, CurrSamplesPerSec=5.668962067784482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:33:00,073] [INFO] [timer.py:197:stop] 0/384, RunningAvgSamplesPerSec=6.323267466569768, CurrSamplesPerSec=5.682340445862556, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:33:11,430] [INFO] [timer.py:197:stop] 0/386, RunningAvgSamplesPerSec=6.32315065138806, CurrSamplesPerSec=5.6789946206884006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:33:22,740] [INFO] [timer.py:197:stop] 0/388, RunningAvgSamplesPerSec=6.323156288131074, CurrSamplesPerSec=5.700271330203468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:33:34,129] [INFO] [timer.py:197:stop] 0/390, RunningAvgSamplesPerSec=6.322983044550158, CurrSamplesPerSec=5.6641335606902645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:33:45,528] [INFO] [timer.py:197:stop] 0/392, RunningAvgSamplesPerSec=6.322861784436344, CurrSamplesPerSec=5.654990050723027, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:33:56,985] [INFO] [timer.py:197:stop] 0/394, RunningAvgSamplesPerSec=6.3225094264140544, CurrSamplesPerSec=5.604591817666873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:34:08,559] [INFO] [timer.py:197:stop] 0/396, RunningAvgSamplesPerSec=6.322321804464229, CurrSamplesPerSec=5.650546395025515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:34:19,969] [INFO] [timer.py:197:stop] 0/398, RunningAvgSamplesPerSec=6.322231207848553, CurrSamplesPerSec=5.662744166203131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:34:31,349] [INFO] [logging.py:68:log_dist] [Rank 0] step=200, skipped=4, lr=[8.49307723936858e-06], mom=[[0.9, 0.999]] [2022-12-19 
00:34:31,351] [INFO] [timer.py:197:stop] 0/400, RunningAvgSamplesPerSec=6.322110726538844, CurrSamplesPerSec=5.658764724125701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1276, 'learning_rate': 8.49307723936858e-06, 'epoch': 1.5} [2022-12-19 00:34:42,926] [INFO] [timer.py:197:stop] 0/402, RunningAvgSamplesPerSec=6.321981469962649, CurrSamplesPerSec=5.6540464545364975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:34:54,329] [INFO] [timer.py:197:stop] 0/404, RunningAvgSamplesPerSec=6.321788711822746, CurrSamplesPerSec=5.6213456603519765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:35:05,696] [INFO] [timer.py:197:stop] 0/406, RunningAvgSamplesPerSec=6.321723107311507, CurrSamplesPerSec=5.674815716520611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:35:17,050] [INFO] [timer.py:197:stop] 0/408, RunningAvgSamplesPerSec=6.321691644638525, CurrSamplesPerSec=5.695411356584488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:35:28,467] [INFO] [timer.py:197:stop] 0/410, RunningAvgSamplesPerSec=6.321686939549397, CurrSamplesPerSec=5.687924787257544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:35:39,902] [INFO] [timer.py:197:stop] 0/412, RunningAvgSamplesPerSec=6.321612949819477, CurrSamplesPerSec=5.685645657888461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:35:51,290] [INFO] [timer.py:197:stop] 0/414, RunningAvgSamplesPerSec=6.321465406676154, CurrSamplesPerSec=5.630122462060419, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:36:02,870] [INFO] [timer.py:197:stop] 0/416, RunningAvgSamplesPerSec=6.321325864365297, CurrSamplesPerSec=5.662598431608348, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:36:14,252] [INFO] [timer.py:197:stop] 0/418, RunningAvgSamplesPerSec=6.321178347260513, CurrSamplesPerSec=5.649072355016578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:36:25,590] [INFO] [logging.py:68:log_dist] [Rank 0] step=210, skipped=4, 
lr=[8.573149077803088e-06], mom=[[0.9, 0.999]] [2022-12-19 00:36:25,592] [INFO] [timer.py:197:stop] 0/420, RunningAvgSamplesPerSec=6.321159177303354, CurrSamplesPerSec=5.677153166153527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:36:36,968] [INFO] [timer.py:197:stop] 0/422, RunningAvgSamplesPerSec=6.321009637171506, CurrSamplesPerSec=5.652632959889085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:36:48,457] [INFO] [timer.py:197:stop] 0/424, RunningAvgSamplesPerSec=6.320906470732945, CurrSamplesPerSec=5.689232756123447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:36:59,774] [INFO] [timer.py:197:stop] 0/426, RunningAvgSamplesPerSec=6.32096035849853, CurrSamplesPerSec=5.697304338195877, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:37:11,122] [INFO] [timer.py:197:stop] 0/428, RunningAvgSamplesPerSec=6.3208982453379585, CurrSamplesPerSec=5.677523234796336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:37:22,458] [INFO] [timer.py:197:stop] 0/430, RunningAvgSamplesPerSec=6.320848728617728, CurrSamplesPerSec=5.681654418081767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:37:33,837] [INFO] [timer.py:197:stop] 0/432, RunningAvgSamplesPerSec=6.320680960808628, CurrSamplesPerSec=5.648207267718699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:37:45,200] [INFO] [timer.py:197:stop] 0/434, RunningAvgSamplesPerSec=6.320654285102583, CurrSamplesPerSec=5.663154652477177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:37:56,541] [INFO] [timer.py:197:stop] 0/436, RunningAvgSamplesPerSec=6.320597654109512, CurrSamplesPerSec=5.679820612469546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:38:07,937] [INFO] [timer.py:197:stop] 0/438, RunningAvgSamplesPerSec=6.320499502263493, CurrSamplesPerSec=5.669001336263085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:38:19,332] [INFO] [logging.py:68:log_dist] [Rank 0] step=220, skipped=4, 
lr=[8.64942458567722e-06], mom=[[0.9, 0.999]] [2022-12-19 00:38:19,334] [INFO] [timer.py:197:stop] 0/440, RunningAvgSamplesPerSec=6.3203861817805675, CurrSamplesPerSec=5.6623002963329085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:38:30,695] [INFO] [timer.py:197:stop] 0/442, RunningAvgSamplesPerSec=6.320332690166886, CurrSamplesPerSec=5.669673773648277, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:38:42,055] [INFO] [timer.py:197:stop] 0/444, RunningAvgSamplesPerSec=6.32031966390362, CurrSamplesPerSec=5.680336468504409, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:38:53,544] [INFO] [timer.py:197:stop] 0/446, RunningAvgSamplesPerSec=6.320290405265886, CurrSamplesPerSec=5.687557458672238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:39:05,003] [INFO] [timer.py:197:stop] 0/448, RunningAvgSamplesPerSec=6.3203053824383435, CurrSamplesPerSec=5.679599491324056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:39:16,341] [INFO] [timer.py:197:stop] 0/450, RunningAvgSamplesPerSec=6.3203527615007316, CurrSamplesPerSec=5.69232919150211, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1281, 'learning_rate': 8.686247975778677e-06, 'epoch': 1.69} [2022-12-19 00:39:27,692] [INFO] [timer.py:197:stop] 0/452, RunningAvgSamplesPerSec=6.320273727186458, CurrSamplesPerSec=5.680921907926451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:39:39,064] [INFO] [timer.py:197:stop] 0/454, RunningAvgSamplesPerSec=6.320116458786455, CurrSamplesPerSec=5.694291391097104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:39:50,443] [INFO] [timer.py:197:stop] 0/456, RunningAvgSamplesPerSec=6.320100873827638, CurrSamplesPerSec=5.683626352197276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:40:01,851] [INFO] [timer.py:197:stop] 0/458, RunningAvgSamplesPerSec=6.319997103671828, CurrSamplesPerSec=5.649200274442475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:40:13,216] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=230, skipped=4, lr=[8.722247506883805e-06], mom=[[0.9, 0.999]] [2022-12-19 00:40:13,217] [INFO] [timer.py:197:stop] 0/460, RunningAvgSamplesPerSec=6.319995105649749, CurrSamplesPerSec=5.693608996003589, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:40:24,778] [INFO] [timer.py:197:stop] 0/462, RunningAvgSamplesPerSec=6.319954386340108, CurrSamplesPerSec=5.664920804358087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:40:36,134] [INFO] [timer.py:197:stop] 0/464, RunningAvgSamplesPerSec=6.319953448551577, CurrSamplesPerSec=5.682989582576036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:40:47,495] [INFO] [timer.py:197:stop] 0/466, RunningAvgSamplesPerSec=6.319850342084705, CurrSamplesPerSec=5.673877485798156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:40:58,887] [INFO] [timer.py:197:stop] 0/468, RunningAvgSamplesPerSec=6.31987292974135, CurrSamplesPerSec=5.691756122229372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:41:10,220] [INFO] [timer.py:197:stop] 0/470, RunningAvgSamplesPerSec=6.319830973205626, CurrSamplesPerSec=5.66767991567336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:41:21,552] [INFO] [timer.py:197:stop] 0/472, RunningAvgSamplesPerSec=6.319829689531446, CurrSamplesPerSec=5.683424909642516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:41:33,093] [INFO] [timer.py:197:stop] 0/474, RunningAvgSamplesPerSec=6.319750643940945, CurrSamplesPerSec=5.671027992840407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:41:44,649] [INFO] [timer.py:197:stop] 0/476, RunningAvgSamplesPerSec=6.319549105580965, CurrSamplesPerSec=5.640950412424254, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:41:55,993] [INFO] [timer.py:197:stop] 0/478, RunningAvgSamplesPerSec=6.319514624812288, CurrSamplesPerSec=5.671870843457645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:42:07,309] [INFO] [logging.py:68:log_dist] 
[Rank 0] step=240, skipped=4, lr=[8.79191691333329e-06], mom=[[0.9, 0.999]] [2022-12-19 00:42:07,311] [INFO] [timer.py:197:stop] 0/480, RunningAvgSamplesPerSec=6.319601782661979, CurrSamplesPerSec=5.711505695751001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:42:18,609] [INFO] [timer.py:197:stop] 0/482, RunningAvgSamplesPerSec=6.3197343890317885, CurrSamplesPerSec=5.704929360030577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:42:29,977] [INFO] [timer.py:197:stop] 0/484, RunningAvgSamplesPerSec=6.319707372523856, CurrSamplesPerSec=5.692903342052109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:42:41,333] [INFO] [timer.py:197:stop] 0/486, RunningAvgSamplesPerSec=6.319773976031884, CurrSamplesPerSec=5.699332166412615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:42:52,687] [INFO] [timer.py:197:stop] 0/488, RunningAvgSamplesPerSec=6.319784253710412, CurrSamplesPerSec=5.710645436820501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:43:04,227] [INFO] [timer.py:197:stop] 0/490, RunningAvgSamplesPerSec=6.3197963137288236, CurrSamplesPerSec=5.686959325603847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:43:15,598] [INFO] [timer.py:197:stop] 0/492, RunningAvgSamplesPerSec=6.319719178697002, CurrSamplesPerSec=5.671861495716578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:43:26,930] [INFO] [timer.py:197:stop] 0/494, RunningAvgSamplesPerSec=6.31966065634711, CurrSamplesPerSec=5.682334912717824, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:43:38,308] [INFO] [timer.py:197:stop] 0/496, RunningAvgSamplesPerSec=6.319628259143031, CurrSamplesPerSec=5.69025399261467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:43:49,622] [INFO] [timer.py:197:stop] 0/498, RunningAvgSamplesPerSec=6.3196567565582455, CurrSamplesPerSec=5.701364344885284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:44:01,045] [INFO] [logging.py:68:log_dist] [Rank 0] step=250, 
skipped=4, lr=[8.858694625217149e-06], mom=[[0.9, 0.999]] [2022-12-19 00:44:01,046] [INFO] [timer.py:197:stop] 0/500, RunningAvgSamplesPerSec=6.319445590061159, CurrSamplesPerSec=5.670494420814168, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1256, 'learning_rate': 8.858694625217149e-06, 'epoch': 1.88} [2022-12-19 00:44:12,381] [INFO] [timer.py:197:stop] 0/502, RunningAvgSamplesPerSec=6.319467580761168, CurrSamplesPerSec=5.6888558547442925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:44:23,796] [INFO] [timer.py:197:stop] 0/504, RunningAvgSamplesPerSec=6.319435192509141, CurrSamplesPerSec=5.6658453120372965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:44:35,191] [INFO] [timer.py:197:stop] 0/506, RunningAvgSamplesPerSec=6.319352589227274, CurrSamplesPerSec=5.649429735511968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:44:46,632] [INFO] [timer.py:197:stop] 0/508, RunningAvgSamplesPerSec=6.3192937352392375, CurrSamplesPerSec=5.6509520217547236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:44:58,011] [INFO] [timer.py:197:stop] 0/510, RunningAvgSamplesPerSec=6.3192807023787605, CurrSamplesPerSec=5.696140839572408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:45:09,378] [INFO] [timer.py:197:stop] 0/512, RunningAvgSamplesPerSec=6.319277059254272, CurrSamplesPerSec=5.676434059756074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:45:20,765] [INFO] [timer.py:197:stop] 0/514, RunningAvgSamplesPerSec=6.3192279699869065, CurrSamplesPerSec=5.675098853971721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:45:32,232] [INFO] [timer.py:197:stop] 0/516, RunningAvgSamplesPerSec=6.319209292569757, CurrSamplesPerSec=5.68089569881787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:45:43,723] [INFO] [timer.py:197:stop] 0/518, RunningAvgSamplesPerSec=6.319173952639591, CurrSamplesPerSec=5.686136796905354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
00:45:55,092] [INFO] [logging.py:68:log_dist] [Rank 0] step=260, skipped=4, lr=[8.922811151820517e-06], mom=[[0.9, 0.999]] [2022-12-19 00:45:55,094] [INFO] [timer.py:197:stop] 0/520, RunningAvgSamplesPerSec=6.319140726211304, CurrSamplesPerSec=5.679437987628754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:46:06,511] [INFO] [timer.py:197:stop] 0/522, RunningAvgSamplesPerSec=6.319064093557645, CurrSamplesPerSec=5.674828433105231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:46:18,067] [INFO] [timer.py:197:stop] 0/524, RunningAvgSamplesPerSec=6.318941947109328, CurrSamplesPerSec=5.663134341803495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:46:29,444] [INFO] [timer.py:197:stop] 0/526, RunningAvgSamplesPerSec=6.318919080123138, CurrSamplesPerSec=5.672882499055666, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:46:40,816] [INFO] [timer.py:197:stop] 0/528, RunningAvgSamplesPerSec=6.318897867858227, CurrSamplesPerSec=5.669731493834369, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:46:52,235] [INFO] [timer.py:197:stop] 0/530, RunningAvgSamplesPerSec=6.3188783359901946, CurrSamplesPerSec=5.682390485226022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:47:03,591] [INFO] [timer.py:197:stop] 0/532, RunningAvgSamplesPerSec=6.318838528980661, CurrSamplesPerSec=5.677959165974818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:47:14,013] [INFO] [timer.py:197:stop] 0/534, RunningAvgSamplesPerSec=6.320788890088867, CurrSamplesPerSec=6.659617335939272, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:47:25,363] [INFO] [timer.py:197:stop] 0/536, RunningAvgSamplesPerSec=6.320844322352108, CurrSamplesPerSec=5.717355274172742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:47:37,096] [INFO] [timer.py:197:stop] 0/538, RunningAvgSamplesPerSec=6.320376506280682, CurrSamplesPerSec=5.4944077632686446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:47:48,467] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=270, skipped=4, lr=[8.984470493319244e-06], mom=[[0.9, 0.999]] [2022-12-19 00:47:48,468] [INFO] [timer.py:197:stop] 0/540, RunningAvgSamplesPerSec=6.32036588035628, CurrSamplesPerSec=5.688256243202118, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:47:59,824] [INFO] [timer.py:197:stop] 0/542, RunningAvgSamplesPerSec=6.320436921654575, CurrSamplesPerSec=5.705787412416891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:48:11,212] [INFO] [timer.py:197:stop] 0/544, RunningAvgSamplesPerSec=6.320454577403126, CurrSamplesPerSec=5.69246197464763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:48:22,607] [INFO] [timer.py:197:stop] 0/546, RunningAvgSamplesPerSec=6.32036708144675, CurrSamplesPerSec=5.642555899959179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:48:34,030] [INFO] [timer.py:197:stop] 0/548, RunningAvgSamplesPerSec=6.320323704117513, CurrSamplesPerSec=5.674647527055657, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:48:45,493] [INFO] [timer.py:197:stop] 0/550, RunningAvgSamplesPerSec=6.320330352988736, CurrSamplesPerSec=5.676665258402395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:48:56,826] [INFO] [timer.py:197:stop] 0/552, RunningAvgSamplesPerSec=6.3203738826027624, CurrSamplesPerSec=5.696218197963952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1083, 'learning_rate': 9.020362953730323e-06, 'epoch': 2.07} [2022-12-19 00:49:08,314] [INFO] [timer.py:197:stop] 0/554, RunningAvgSamplesPerSec=6.320345104895774, CurrSamplesPerSec=5.672650409668064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:49:19,652] [INFO] [timer.py:197:stop] 0/556, RunningAvgSamplesPerSec=6.320329448586411, CurrSamplesPerSec=5.678675054933501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:49:31,070] [INFO] [timer.py:197:stop] 0/558, RunningAvgSamplesPerSec=6.3202936505881135, CurrSamplesPerSec=5.667847453017027, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 00:49:42,444] [INFO] [logging.py:68:log_dist] [Rank 0] step=280, skipped=4, lr=[9.043854055968706e-06], mom=[[0.9, 0.999]] [2022-12-19 00:49:42,446] [INFO] [timer.py:197:stop] 0/560, RunningAvgSamplesPerSec=6.320316517487314, CurrSamplesPerSec=5.696863980761914, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:49:53,798] [INFO] [timer.py:197:stop] 0/562, RunningAvgSamplesPerSec=6.320314246239234, CurrSamplesPerSec=5.668374063442858, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:50:05,171] [INFO] [timer.py:197:stop] 0/564, RunningAvgSamplesPerSec=6.320295849098979, CurrSamplesPerSec=5.689046107739862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:50:16,610] [INFO] [timer.py:197:stop] 0/566, RunningAvgSamplesPerSec=6.320233403134386, CurrSamplesPerSec=5.679131588451052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:50:28,160] [INFO] [timer.py:197:stop] 0/568, RunningAvgSamplesPerSec=6.320158012531864, CurrSamplesPerSec=5.676459267416466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:50:39,541] [INFO] [timer.py:197:stop] 0/570, RunningAvgSamplesPerSec=6.320134681569571, CurrSamplesPerSec=5.676231926422744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:50:50,985] [INFO] [timer.py:197:stop] 0/572, RunningAvgSamplesPerSec=6.320096534587018, CurrSamplesPerSec=5.69315616920031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:51:02,426] [INFO] [timer.py:197:stop] 0/574, RunningAvgSamplesPerSec=6.320148404746433, CurrSamplesPerSec=5.712264104970907, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:51:13,799] [INFO] [timer.py:197:stop] 0/576, RunningAvgSamplesPerSec=6.320137746983063, CurrSamplesPerSec=5.682742710294148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:51:25,139] [INFO] [timer.py:197:stop] 0/578, RunningAvgSamplesPerSec=6.320093385678466, CurrSamplesPerSec=5.659379610683581, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 00:51:36,491] [INFO] [logging.py:68:log_dist] [Rank 0] step=290, skipped=4, lr=[9.10112387015335e-06], mom=[[0.9, 0.999]] [2022-12-19 00:51:36,493] [INFO] [timer.py:197:stop] 0/580, RunningAvgSamplesPerSec=6.320056277682747, CurrSamplesPerSec=5.678806000461184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:51:47,842] [INFO] [timer.py:197:stop] 0/582, RunningAvgSamplesPerSec=6.3199956995218045, CurrSamplesPerSec=5.685229255815799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:51:59,186] [INFO] [timer.py:197:stop] 0/584, RunningAvgSamplesPerSec=6.3200186748889315, CurrSamplesPerSec=5.70428368696425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:52:10,538] [INFO] [timer.py:197:stop] 0/586, RunningAvgSamplesPerSec=6.319900223803636, CurrSamplesPerSec=5.69582804300266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:52:21,887] [INFO] [timer.py:197:stop] 0/588, RunningAvgSamplesPerSec=6.319919982329858, CurrSamplesPerSec=5.702778088537659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:52:33,225] [INFO] [timer.py:197:stop] 0/590, RunningAvgSamplesPerSec=6.319943779386075, CurrSamplesPerSec=5.687501303036996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:52:44,603] [INFO] [timer.py:197:stop] 0/592, RunningAvgSamplesPerSec=6.3198457896600235, CurrSamplesPerSec=5.643944897479634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:52:55,943] [INFO] [timer.py:197:stop] 0/594, RunningAvgSamplesPerSec=6.319856297433864, CurrSamplesPerSec=5.680393925185212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:53:07,248] [INFO] [timer.py:197:stop] 0/596, RunningAvgSamplesPerSec=6.319840349532547, CurrSamplesPerSec=5.692354057692489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:53:18,597] [INFO] [timer.py:197:stop] 0/598, RunningAvgSamplesPerSec=6.319815416165195, CurrSamplesPerSec=5.690553872569714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
00:53:29,925] [INFO] [logging.py:68:log_dist] [Rank 0] step=300, skipped=4, lr=[9.156425255148058e-06], mom=[[0.9, 0.999]] [2022-12-19 00:53:29,926] [INFO] [timer.py:197:stop] 0/600, RunningAvgSamplesPerSec=6.319814986271745, CurrSamplesPerSec=5.681727054135516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:53:41,253] [INFO] [timer.py:197:stop] 0/602, RunningAvgSamplesPerSec=6.319860543595978, CurrSamplesPerSec=5.6961142481105895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0598, 'learning_rate': 9.161852281961698e-06, 'epoch': 2.25} [2022-12-19 00:53:52,640] [INFO] [timer.py:197:stop] 0/604, RunningAvgSamplesPerSec=6.319816884425725, CurrSamplesPerSec=5.683669675042479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:54:03,959] [INFO] [timer.py:197:stop] 0/606, RunningAvgSamplesPerSec=6.319839531991534, CurrSamplesPerSec=5.694245248786347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:54:15,332] [INFO] [timer.py:197:stop] 0/608, RunningAvgSamplesPerSec=6.319810550595329, CurrSamplesPerSec=5.662503827591686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:54:26,672] [INFO] [timer.py:197:stop] 0/610, RunningAvgSamplesPerSec=6.3198110898553725, CurrSamplesPerSec=5.6773870650588725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:54:38,028] [INFO] [timer.py:197:stop] 0/612, RunningAvgSamplesPerSec=6.3197758122270375, CurrSamplesPerSec=5.679294516603755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:54:49,400] [INFO] [timer.py:197:stop] 0/614, RunningAvgSamplesPerSec=6.319767573686687, CurrSamplesPerSec=5.683025676812096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:55:00,771] [INFO] [timer.py:197:stop] 0/616, RunningAvgSamplesPerSec=6.319750107435414, CurrSamplesPerSec=5.679651405176822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:55:12,146] [INFO] [timer.py:197:stop] 0/618, RunningAvgSamplesPerSec=6.319709566145599, CurrSamplesPerSec=5.675407457980364, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:55:23,511] [INFO] [logging.py:68:log_dist] [Rank 0] step=310, skipped=4, lr=[9.209889040960644e-06], mom=[[0.9, 0.999]] [2022-12-19 00:55:23,512] [INFO] [timer.py:197:stop] 0/620, RunningAvgSamplesPerSec=6.319705553102689, CurrSamplesPerSec=5.6907237301048665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:55:34,920] [INFO] [timer.py:197:stop] 0/622, RunningAvgSamplesPerSec=6.319562927733921, CurrSamplesPerSec=5.69239244379155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:55:46,210] [INFO] [timer.py:197:stop] 0/624, RunningAvgSamplesPerSec=6.3196183156998735, CurrSamplesPerSec=5.691242775260119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:55:57,573] [INFO] [timer.py:197:stop] 0/626, RunningAvgSamplesPerSec=6.319590430642667, CurrSamplesPerSec=5.681516126093696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:56:08,941] [INFO] [timer.py:197:stop] 0/628, RunningAvgSamplesPerSec=6.319582780848161, CurrSamplesPerSec=5.679966273412695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:56:20,248] [INFO] [timer.py:197:stop] 0/630, RunningAvgSamplesPerSec=6.319579604732076, CurrSamplesPerSec=5.698700099493294, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:56:31,629] [INFO] [timer.py:197:stop] 0/632, RunningAvgSamplesPerSec=6.319548177013225, CurrSamplesPerSec=5.675483294273378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:56:42,997] [INFO] [timer.py:197:stop] 0/634, RunningAvgSamplesPerSec=6.319542135253894, CurrSamplesPerSec=5.691982535979329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:56:54,303] [INFO] [timer.py:197:stop] 0/636, RunningAvgSamplesPerSec=6.319527501736275, CurrSamplesPerSec=5.694565361369183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:57:05,638] [INFO] [timer.py:197:stop] 0/638, RunningAvgSamplesPerSec=6.319541868477244, CurrSamplesPerSec=5.690798769830471, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 00:57:16,990] [INFO] [logging.py:68:log_dist] [Rank 0] step=320, skipped=4, lr=[9.261633432763397e-06], mom=[[0.9, 0.999]] [2022-12-19 00:57:16,992] [INFO] [timer.py:197:stop] 0/640, RunningAvgSamplesPerSec=6.319518687891041, CurrSamplesPerSec=5.67143488801959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:57:28,592] [INFO] [timer.py:197:stop] 0/642, RunningAvgSamplesPerSec=6.319465908527522, CurrSamplesPerSec=5.64993699384335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:57:40,476] [INFO] [timer.py:197:stop] 0/644, RunningAvgSamplesPerSec=6.319390943798744, CurrSamplesPerSec=5.668378372477213, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:57:52,243] [INFO] [timer.py:197:stop] 0/646, RunningAvgSamplesPerSec=6.3193639859566435, CurrSamplesPerSec=5.680033337645695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:58:03,735] [INFO] [timer.py:197:stop] 0/648, RunningAvgSamplesPerSec=6.319346198603127, CurrSamplesPerSec=5.697315704720048, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:58:15,088] [INFO] [timer.py:197:stop] 0/650, RunningAvgSamplesPerSec=6.319315472059445, CurrSamplesPerSec=5.665524832890148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:58:26,580] [INFO] [timer.py:197:stop] 0/652, RunningAvgSamplesPerSec=6.319297685088794, CurrSamplesPerSec=5.670232583173057, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.063, 'learning_rate': 9.29189975311636e-06, 'epoch': 2.44} [2022-12-19 00:58:38,053] [INFO] [timer.py:197:stop] 0/654, RunningAvgSamplesPerSec=6.319277747601288, CurrSamplesPerSec=5.687855126157084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:58:49,431] [INFO] [timer.py:197:stop] 0/656, RunningAvgSamplesPerSec=6.319245498514143, CurrSamplesPerSec=5.668320679838351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:59:00,802] [INFO] [timer.py:197:stop] 0/658, RunningAvgSamplesPerSec=6.319197370830487, 
CurrSamplesPerSec=5.671775450008826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:59:12,252] [INFO] [logging.py:68:log_dist] [Rank 0] step=330, skipped=4, lr=[9.311765584761373e-06], mom=[[0.9, 0.999]] [2022-12-19 00:59:12,254] [INFO] [timer.py:197:stop] 0/660, RunningAvgSamplesPerSec=6.319190796656681, CurrSamplesPerSec=5.67663380659717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:59:23,576] [INFO] [timer.py:197:stop] 0/662, RunningAvgSamplesPerSec=6.31916084556495, CurrSamplesPerSec=5.678397806718739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:59:35,169] [INFO] [timer.py:197:stop] 0/664, RunningAvgSamplesPerSec=6.319083573895808, CurrSamplesPerSec=5.662233172447872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:59:46,486] [INFO] [timer.py:197:stop] 0/666, RunningAvgSamplesPerSec=6.319091632760076, CurrSamplesPerSec=5.680942346389958, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 00:59:57,979] [INFO] [timer.py:197:stop] 0/668, RunningAvgSamplesPerSec=6.319017800660259, CurrSamplesPerSec=5.660799346909331, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:00:09,328] [INFO] [timer.py:197:stop] 0/670, RunningAvgSamplesPerSec=6.3190160982731856, CurrSamplesPerSec=5.678764914150066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:00:20,717] [INFO] [timer.py:197:stop] 0/672, RunningAvgSamplesPerSec=6.318926060512557, CurrSamplesPerSec=5.646853233351514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:00:32,070] [INFO] [timer.py:197:stop] 0/674, RunningAvgSamplesPerSec=6.318884045785445, CurrSamplesPerSec=5.676124623958836, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:00:43,441] [INFO] [timer.py:197:stop] 0/676, RunningAvgSamplesPerSec=6.318818520642245, CurrSamplesPerSec=5.657839425000005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:00:54,800] [INFO] [timer.py:197:stop] 0/678, RunningAvgSamplesPerSec=6.3188269070500445, 
CurrSamplesPerSec=5.682595944542649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:01:06,321] [INFO] [logging.py:68:log_dist] [Rank 0] step=340, skipped=4, lr=[9.360382936198493e-06], mom=[[0.9, 0.999]] [2022-12-19 01:01:06,323] [INFO] [timer.py:197:stop] 0/680, RunningAvgSamplesPerSec=6.318803423863587, CurrSamplesPerSec=5.678195773729671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:01:17,676] [INFO] [timer.py:197:stop] 0/682, RunningAvgSamplesPerSec=6.318788339294621, CurrSamplesPerSec=5.6738146443034605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:01:29,144] [INFO] [timer.py:197:stop] 0/684, RunningAvgSamplesPerSec=6.318843082261069, CurrSamplesPerSec=5.70212829954872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:01:40,484] [INFO] [timer.py:197:stop] 0/686, RunningAvgSamplesPerSec=6.318877418981399, CurrSamplesPerSec=5.707351390334674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:01:52,010] [INFO] [timer.py:197:stop] 0/688, RunningAvgSamplesPerSec=6.318834208561251, CurrSamplesPerSec=5.678378107269179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:02:03,349] [INFO] [timer.py:197:stop] 0/690, RunningAvgSamplesPerSec=6.318792311453393, CurrSamplesPerSec=5.665985234079397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:02:14,745] [INFO] [timer.py:197:stop] 0/692, RunningAvgSamplesPerSec=6.318835723815898, CurrSamplesPerSec=5.695138271531494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:02:26,244] [INFO] [timer.py:197:stop] 0/694, RunningAvgSamplesPerSec=6.318826920348641, CurrSamplesPerSec=5.658641858275141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:02:37,669] [INFO] [timer.py:197:stop] 0/696, RunningAvgSamplesPerSec=6.318804790379597, CurrSamplesPerSec=5.675080137234234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:02:49,030] [INFO] [timer.py:197:stop] 0/698, RunningAvgSamplesPerSec=6.318774542107063, 
CurrSamplesPerSec=5.670887102818869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:03:00,447] [INFO] [logging.py:68:log_dist] [Rank 0] step=350, skipped=4, lr=[9.407574351377137e-06], mom=[[0.9, 0.999]] [2022-12-19 01:03:00,448] [INFO] [timer.py:197:stop] 0/700, RunningAvgSamplesPerSec=6.318746771411151, CurrSamplesPerSec=5.685630965973367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:03:11,800] [INFO] [timer.py:197:stop] 0/702, RunningAvgSamplesPerSec=6.318741795333457, CurrSamplesPerSec=5.681823023160319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0622, 'learning_rate': 9.412218256259678e-06, 'epoch': 2.63} [2022-12-19 01:03:23,174] [INFO] [timer.py:197:stop] 0/704, RunningAvgSamplesPerSec=6.318707761718272, CurrSamplesPerSec=5.669857476413636, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:03:34,662] [INFO] [timer.py:197:stop] 0/706, RunningAvgSamplesPerSec=6.318680061373025, CurrSamplesPerSec=5.683818421837514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:03:46,225] [INFO] [timer.py:197:stop] 0/708, RunningAvgSamplesPerSec=6.318588820848263, CurrSamplesPerSec=5.6424873457028735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:03:57,614] [INFO] [timer.py:197:stop] 0/710, RunningAvgSamplesPerSec=6.31855631018623, CurrSamplesPerSec=5.667990585621616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:04:08,965] [INFO] [timer.py:197:stop] 0/712, RunningAvgSamplesPerSec=6.318527528178411, CurrSamplesPerSec=5.661748781060366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:04:20,320] [INFO] [timer.py:197:stop] 0/714, RunningAvgSamplesPerSec=6.31855001651633, CurrSamplesPerSec=5.698780431002723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:04:31,005] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 8192.0, reducing to 4096.0 [2022-12-19 01:04:31,007] [INFO] [timer.py:197:stop] 0/716, RunningAvgSamplesPerSec=6.319610920055632, CurrSamplesPerSec=6.373794873163514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:04:42,331] [INFO] [timer.py:197:stop] 0/718, RunningAvgSamplesPerSec=6.31961222646377, CurrSamplesPerSec=5.68433909693796, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:04:53,753] [INFO] [logging.py:68:log_dist] [Rank 0] step=360, skipped=5, lr=[9.44889475969735e-06], mom=[[0.9, 0.999]] [2022-12-19 01:04:53,755] [INFO] [timer.py:197:stop] 0/720, RunningAvgSamplesPerSec=6.319594889338643, CurrSamplesPerSec=5.675628972838387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:05:05,101] [INFO] [timer.py:197:stop] 0/722, RunningAvgSamplesPerSec=6.319598071660983, CurrSamplesPerSec=5.689632137987318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:05:16,579] [INFO] [timer.py:197:stop] 0/724, RunningAvgSamplesPerSec=6.319450696538225, CurrSamplesPerSec=5.677897915403727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:05:28,088] [INFO] [timer.py:197:stop] 0/726, RunningAvgSamplesPerSec=6.319403493486101, CurrSamplesPerSec=5.682042633652292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:05:39,565] [INFO] [timer.py:197:stop] 0/728, RunningAvgSamplesPerSec=6.319410687188341, CurrSamplesPerSec=5.72953678288705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:05:50,922] [INFO] [timer.py:197:stop] 0/730, RunningAvgSamplesPerSec=6.319406220205962, CurrSamplesPerSec=5.698383634739568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:06:02,290] [INFO] [timer.py:197:stop] 0/732, RunningAvgSamplesPerSec=6.319386981148773, CurrSamplesPerSec=5.684696138275903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:06:13,615] [INFO] [timer.py:197:stop] 0/734, RunningAvgSamplesPerSec=6.319375963701701, CurrSamplesPerSec=5.689230585723028, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 01:06:24,899] [INFO] [timer.py:197:stop] 0/736, RunningAvgSamplesPerSec=6.319415544190312, CurrSamplesPerSec=5.696515322069078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:06:36,246] [INFO] [timer.py:197:stop] 0/738, RunningAvgSamplesPerSec=6.3194329538262926, CurrSamplesPerSec=5.699937988449255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:06:47,589] [INFO] [logging.py:68:log_dist] [Rank 0] step=370, skipped=5, lr=[9.493595187571683e-06], mom=[[0.9, 0.999]] [2022-12-19 01:06:47,590] [INFO] [timer.py:197:stop] 0/740, RunningAvgSamplesPerSec=6.31943826834621, CurrSamplesPerSec=5.6954609013696835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:06:58,917] [INFO] [timer.py:197:stop] 0/742, RunningAvgSamplesPerSec=6.319459772220872, CurrSamplesPerSec=5.711683126014281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:07:10,261] [INFO] [timer.py:197:stop] 0/744, RunningAvgSamplesPerSec=6.319469109357799, CurrSamplesPerSec=5.6932674975392175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:07:21,616] [INFO] [timer.py:197:stop] 0/746, RunningAvgSamplesPerSec=6.319434855078194, CurrSamplesPerSec=5.665744141942675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:07:32,969] [INFO] [timer.py:197:stop] 0/748, RunningAvgSamplesPerSec=6.319418016141363, CurrSamplesPerSec=5.678815611379994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:07:44,316] [INFO] [timer.py:197:stop] 0/750, RunningAvgSamplesPerSec=6.319421415241671, CurrSamplesPerSec=5.689901318263332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:07:55,634] [INFO] [timer.py:197:stop] 0/752, RunningAvgSamplesPerSec=6.3194597168832, CurrSamplesPerSec=5.712570685852921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0652, 'learning_rate': 9.519831289296397e-06, 'epoch': 2.82} [2022-12-19 01:08:06,913] [INFO] [timer.py:197:stop] 0/754, RunningAvgSamplesPerSec=6.319525749954582, 
CurrSamplesPerSec=5.7049080211463465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:08:18,248] [INFO] [timer.py:197:stop] 0/756, RunningAvgSamplesPerSec=6.319546004304064, CurrSamplesPerSec=5.692388822439334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:08:29,604] [INFO] [timer.py:197:stop] 0/758, RunningAvgSamplesPerSec=6.319534862240435, CurrSamplesPerSec=5.682529300958316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:08:40,905] [INFO] [logging.py:68:log_dist] [Rank 0] step=380, skipped=5, lr=[9.53708734662638e-06], mom=[[0.9, 0.999]] [2022-12-19 01:08:40,906] [INFO] [timer.py:197:stop] 0/760, RunningAvgSamplesPerSec=6.319604730318235, CurrSamplesPerSec=5.717760562955475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:08:52,215] [INFO] [timer.py:197:stop] 0/762, RunningAvgSamplesPerSec=6.319657120004174, CurrSamplesPerSec=5.709749244940733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:09:03,553] [INFO] [timer.py:197:stop] 0/764, RunningAvgSamplesPerSec=6.319688978421141, CurrSamplesPerSec=5.688741323272288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:09:14,877] [INFO] [timer.py:197:stop] 0/766, RunningAvgSamplesPerSec=6.319741276214747, CurrSamplesPerSec=5.717071557504972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:09:26,193] [INFO] [timer.py:197:stop] 0/768, RunningAvgSamplesPerSec=6.319783373051262, CurrSamplesPerSec=5.699022891038163, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:09:37,497] [INFO] [timer.py:197:stop] 0/770, RunningAvgSamplesPerSec=6.3198730313567, CurrSamplesPerSec=5.726977370333311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:09:48,757] [INFO] [timer.py:197:stop] 0/772, RunningAvgSamplesPerSec=6.31995469276013, CurrSamplesPerSec=5.714096299436316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:10:00,099] [INFO] [timer.py:197:stop] 0/774, RunningAvgSamplesPerSec=6.319947080734109, 
CurrSamplesPerSec=5.685517527677231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:10:11,371] [INFO] [timer.py:197:stop] 0/776, RunningAvgSamplesPerSec=6.320053388805825, CurrSamplesPerSec=5.748864513648623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:10:22,700] [INFO] [timer.py:197:stop] 0/778, RunningAvgSamplesPerSec=6.320085880488491, CurrSamplesPerSec=5.699564265989264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:10:34,026] [INFO] [logging.py:68:log_dist] [Rank 0] step=390, skipped=5, lr=[9.57943484127219e-06], mom=[[0.9, 0.999]] [2022-12-19 01:10:34,028] [INFO] [timer.py:197:stop] 0/780, RunningAvgSamplesPerSec=6.3200970319439085, CurrSamplesPerSec=5.69235864468338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:10:45,324] [INFO] [timer.py:197:stop] 0/782, RunningAvgSamplesPerSec=6.320140295527232, CurrSamplesPerSec=5.68812245088063, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:10:56,613] [INFO] [timer.py:197:stop] 0/784, RunningAvgSamplesPerSec=6.320181813428936, CurrSamplesPerSec=5.711024259030779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:11:07,921] [INFO] [timer.py:197:stop] 0/786, RunningAvgSamplesPerSec=6.320240390498994, CurrSamplesPerSec=5.7178124460165165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:11:19,247] [INFO] [timer.py:197:stop] 0/788, RunningAvgSamplesPerSec=6.320257443701773, CurrSamplesPerSec=5.674380027974973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:11:30,579] [INFO] [timer.py:197:stop] 0/790, RunningAvgSamplesPerSec=6.320281969155325, CurrSamplesPerSec=5.6852624886749235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:11:41,832] [INFO] [timer.py:197:stop] 0/792, RunningAvgSamplesPerSec=6.320364834849939, CurrSamplesPerSec=5.7186095661235115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:11:53,155] [INFO] [timer.py:197:stop] 0/794, RunningAvgSamplesPerSec=6.32039959699047, 
CurrSamplesPerSec=5.690001664381216, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:12:04,476] [INFO] [timer.py:197:stop] 0/796, RunningAvgSamplesPerSec=6.320444255259934, CurrSamplesPerSec=5.700093639913698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:12:16,468] [INFO] [timer.py:197:stop] 0/798, RunningAvgSamplesPerSec=6.320384566034095, CurrSamplesPerSec=5.644303528380221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:12:28,042] [INFO] [logging.py:68:log_dist] [Rank 0] step=400, skipped=5, lr=[9.620696382156558e-06], mom=[[0.9, 0.999]] [2022-12-19 01:12:28,043] [INFO] [timer.py:197:stop] 0/800, RunningAvgSamplesPerSec=6.320384545227413, CurrSamplesPerSec=5.670629301789308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:12:38,647] [INFO] [timer.py:197:stop] 0/802, RunningAvgSamplesPerSec=6.321694783422982, CurrSamplesPerSec=5.697203976174061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0661, 'learning_rate': 9.624764935335318e-06, 'epoch': 3.01} [2022-12-19 01:12:49,942] [INFO] [timer.py:197:stop] 0/804, RunningAvgSamplesPerSec=6.321769207023047, CurrSamplesPerSec=5.717762024437264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:13:01,306] [INFO] [timer.py:197:stop] 0/806, RunningAvgSamplesPerSec=6.321793827491138, CurrSamplesPerSec=5.708415317168403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:13:12,608] [INFO] [timer.py:197:stop] 0/808, RunningAvgSamplesPerSec=6.321874553130169, CurrSamplesPerSec=5.710777374965387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:13:24,100] [INFO] [timer.py:197:stop] 0/810, RunningAvgSamplesPerSec=6.32190001086698, CurrSamplesPerSec=5.698906255990863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:13:35,455] [INFO] [timer.py:197:stop] 0/812, RunningAvgSamplesPerSec=6.321901583642028, CurrSamplesPerSec=5.692360817471119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:13:46,764] [INFO] [timer.py:197:stop] 
0/814, RunningAvgSamplesPerSec=6.321962138410091, CurrSamplesPerSec=5.709352619831969, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:13:58,112] [INFO] [timer.py:197:stop] 0/816, RunningAvgSamplesPerSec=6.322025629227291, CurrSamplesPerSec=5.717471691390685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:14:09,608] [INFO] [timer.py:197:stop] 0/818, RunningAvgSamplesPerSec=6.322055882105773, CurrSamplesPerSec=5.6919250860090225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:14:20,948] [INFO] [logging.py:68:log_dist] [Rank 0] step=410, skipped=5, lr=[9.660926275674324e-06], mom=[[0.9, 0.999]] [2022-12-19 01:14:20,950] [INFO] [timer.py:197:stop] 0/820, RunningAvgSamplesPerSec=6.322071366508863, CurrSamplesPerSec=5.689165956780512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:14:32,299] [INFO] [timer.py:197:stop] 0/822, RunningAvgSamplesPerSec=6.322091012799124, CurrSamplesPerSec=5.680982742844537, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:14:43,785] [INFO] [timer.py:197:stop] 0/824, RunningAvgSamplesPerSec=6.322089750056571, CurrSamplesPerSec=5.686235805776385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:14:55,136] [INFO] [timer.py:197:stop] 0/826, RunningAvgSamplesPerSec=6.322084835751599, CurrSamplesPerSec=5.694642193744664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:15:06,446] [INFO] [timer.py:197:stop] 0/828, RunningAvgSamplesPerSec=6.322095494181754, CurrSamplesPerSec=5.693213402531381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:15:17,770] [INFO] [timer.py:197:stop] 0/830, RunningAvgSamplesPerSec=6.322150099578422, CurrSamplesPerSec=5.700337906393788, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:15:29,044] [INFO] [timer.py:197:stop] 0/832, RunningAvgSamplesPerSec=6.322206975503775, CurrSamplesPerSec=5.705411709894755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:15:40,424] [INFO] [timer.py:197:stop] 0/834, 
RunningAvgSamplesPerSec=6.3222355995601465, CurrSamplesPerSec=5.702684317896509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:15:51,780] [INFO] [timer.py:197:stop] 0/836, RunningAvgSamplesPerSec=6.322215416889453, CurrSamplesPerSec=5.672825433995187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:16:03,249] [INFO] [timer.py:197:stop] 0/838, RunningAvgSamplesPerSec=6.322242856868077, CurrSamplesPerSec=5.705001622437774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:16:14,724] [INFO] [logging.py:68:log_dist] [Rank 0] step=420, skipped=5, lr=[9.700174853763023e-06], mom=[[0.9, 0.999]] [2022-12-19 01:16:14,726] [INFO] [timer.py:197:stop] 0/840, RunningAvgSamplesPerSec=6.322253108652342, CurrSamplesPerSec=5.690793461489017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:16:26,062] [INFO] [timer.py:197:stop] 0/842, RunningAvgSamplesPerSec=6.322262189911131, CurrSamplesPerSec=5.688311208358119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:16:37,452] [INFO] [timer.py:197:stop] 0/844, RunningAvgSamplesPerSec=6.322193084837088, CurrSamplesPerSec=5.617880313851533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:16:48,950] [INFO] [timer.py:197:stop] 0/846, RunningAvgSamplesPerSec=6.322181426493689, CurrSamplesPerSec=5.690715043962175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:17:00,262] [INFO] [timer.py:197:stop] 0/848, RunningAvgSamplesPerSec=6.322283976168414, CurrSamplesPerSec=5.720835223687506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:17:11,066] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 4096.0, reducing to 2048.0 [2022-12-19 01:17:11,068] [INFO] [timer.py:197:stop] 0/850, RunningAvgSamplesPerSec=6.323227806105633, CurrSamplesPerSec=6.389358543624319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:17:22,524] [INFO] [timer.py:197:stop] 0/852, RunningAvgSamplesPerSec=6.323247375252913, CurrSamplesPerSec=5.700473968753976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0342, 'learning_rate': 9.719445885591654e-06, 'epoch': 3.19} [2022-12-19 01:17:33,816] [INFO] [timer.py:197:stop] 0/854, RunningAvgSamplesPerSec=6.323299274109773, CurrSamplesPerSec=5.7191917113865145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:17:45,115] [INFO] [timer.py:197:stop] 0/856, RunningAvgSamplesPerSec=6.32335214957532, CurrSamplesPerSec=5.700958955586349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:17:56,451] [INFO] [timer.py:197:stop] 0/858, RunningAvgSamplesPerSec=6.323402909213706, CurrSamplesPerSec=5.723904893377566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:18:07,893] [INFO] [logging.py:68:log_dist] [Rank 0] step=430, skipped=6, lr=[9.734698245522364e-06], mom=[[0.9, 0.999]] [2022-12-19 01:18:07,895] [INFO] [timer.py:197:stop] 0/860, RunningAvgSamplesPerSec=6.323411764163691, CurrSamplesPerSec=5.699367258461975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:18:19,276] [INFO] [timer.py:197:stop] 0/862, RunningAvgSamplesPerSec=6.32344425337955, CurrSamplesPerSec=5.711898730772658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:18:30,599] [INFO] [timer.py:197:stop] 0/864, RunningAvgSamplesPerSec=6.323450454321044, CurrSamplesPerSec=5.712499933497748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:18:42,093] [INFO] [timer.py:197:stop] 0/866, RunningAvgSamplesPerSec=6.32334593858172, CurrSamplesPerSec=5.618560669677461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:18:53,481] [INFO] [timer.py:197:stop] 0/868, 
RunningAvgSamplesPerSec=6.32340933639266, CurrSamplesPerSec=5.698487425498509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:19:04,854] [INFO] [timer.py:197:stop] 0/870, RunningAvgSamplesPerSec=6.323423345732062, CurrSamplesPerSec=5.697551026143254, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:19:16,313] [INFO] [timer.py:197:stop] 0/872, RunningAvgSamplesPerSec=6.32338831580146, CurrSamplesPerSec=5.650067330898288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:19:27,612] [INFO] [timer.py:197:stop] 0/874, RunningAvgSamplesPerSec=6.323402354094561, CurrSamplesPerSec=5.688818239689118, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:19:39,042] [INFO] [timer.py:197:stop] 0/876, RunningAvgSamplesPerSec=6.323316905278167, CurrSamplesPerSec=5.6311098282192935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:19:50,355] [INFO] [timer.py:197:stop] 0/878, RunningAvgSamplesPerSec=6.323354232497454, CurrSamplesPerSec=5.697809345830576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:20:01,724] [INFO] [logging.py:68:log_dist] [Rank 0] step=440, skipped=6, lr=[9.7722083805128e-06], mom=[[0.9, 0.999]] [2022-12-19 01:20:01,725] [INFO] [timer.py:197:stop] 0/880, RunningAvgSamplesPerSec=6.32338878115084, CurrSamplesPerSec=5.700413684095496, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:20:13,158] [INFO] [timer.py:197:stop] 0/882, RunningAvgSamplesPerSec=6.323436968059735, CurrSamplesPerSec=5.703073958759219, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:20:24,649] [INFO] [timer.py:197:stop] 0/884, RunningAvgSamplesPerSec=6.323396519975937, CurrSamplesPerSec=5.6685287139997245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:20:36,012] [INFO] [timer.py:197:stop] 0/886, RunningAvgSamplesPerSec=6.323400921048054, CurrSamplesPerSec=5.666818929995109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:20:47,383] [INFO] [timer.py:197:stop] 0/888, 
RunningAvgSamplesPerSec=6.323362386669525, CurrSamplesPerSec=5.668898616819385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:20:58,706] [INFO] [timer.py:197:stop] 0/890, RunningAvgSamplesPerSec=6.323378027172579, CurrSamplesPerSec=5.702674868313015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:21:10,066] [INFO] [timer.py:197:stop] 0/892, RunningAvgSamplesPerSec=6.32336703470946, CurrSamplesPerSec=5.667214692479574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:21:21,400] [INFO] [timer.py:197:stop] 0/894, RunningAvgSamplesPerSec=6.3233560277876695, CurrSamplesPerSec=5.6769286509397885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:21:32,744] [INFO] [timer.py:197:stop] 0/896, RunningAvgSamplesPerSec=6.323348702379982, CurrSamplesPerSec=5.688389077480841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:21:44,108] [INFO] [timer.py:197:stop] 0/898, RunningAvgSamplesPerSec=6.323314143836581, CurrSamplesPerSec=5.672014658742165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:21:55,445] [INFO] [logging.py:68:log_dist] [Rank 0] step=450, skipped=6, lr=[9.808863995752003e-06], mom=[[0.9, 0.999]] [2022-12-19 01:21:55,447] [INFO] [timer.py:197:stop] 0/900, RunningAvgSamplesPerSec=6.323313724616391, CurrSamplesPerSec=5.653171985653622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:22:06,755] [INFO] [timer.py:197:stop] 0/902, RunningAvgSamplesPerSec=6.323335546365318, CurrSamplesPerSec=5.674087607163524, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.035, 'learning_rate': 9.812484046603779e-06, 'epoch': 3.38} [2022-12-19 01:22:18,059] [INFO] [timer.py:197:stop] 0/904, RunningAvgSamplesPerSec=6.323352218581553, CurrSamplesPerSec=5.690782362261613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:22:29,388] [INFO] [timer.py:197:stop] 0/906, RunningAvgSamplesPerSec=6.323354588910296, CurrSamplesPerSec=5.698235092305796, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
01:22:40,707] [INFO] [timer.py:197:stop] 0/908, RunningAvgSamplesPerSec=6.323370207784674, CurrSamplesPerSec=5.699608074182279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:22:52,073] [INFO] [timer.py:197:stop] 0/910, RunningAvgSamplesPerSec=6.323344722975836, CurrSamplesPerSec=5.67844849739414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:23:03,420] [INFO] [timer.py:197:stop] 0/912, RunningAvgSamplesPerSec=6.323322930840565, CurrSamplesPerSec=5.684461155249277, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:23:14,743] [INFO] [timer.py:197:stop] 0/914, RunningAvgSamplesPerSec=6.323304140316191, CurrSamplesPerSec=5.674361555875228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:23:26,064] [INFO] [timer.py:197:stop] 0/916, RunningAvgSamplesPerSec=6.323299847573087, CurrSamplesPerSec=5.698749459550104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:23:37,377] [INFO] [timer.py:197:stop] 0/918, RunningAvgSamplesPerSec=6.323292844634116, CurrSamplesPerSec=5.689933882120396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:23:48,716] [INFO] [logging.py:68:log_dist] [Rank 0] step=460, skipped=6, lr=[9.844703159310488e-06], mom=[[0.9, 0.999]] [2022-12-19 01:23:48,718] [INFO] [timer.py:197:stop] 0/920, RunningAvgSamplesPerSec=6.323303208444164, CurrSamplesPerSec=5.695446400367719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:24:00,065] [INFO] [timer.py:197:stop] 0/922, RunningAvgSamplesPerSec=6.323269289437433, CurrSamplesPerSec=5.669692454741702, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:24:11,407] [INFO] [timer.py:197:stop] 0/924, RunningAvgSamplesPerSec=6.323259471754332, CurrSamplesPerSec=5.679810517433488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:24:22,740] [INFO] [timer.py:197:stop] 0/926, RunningAvgSamplesPerSec=6.323281103949813, CurrSamplesPerSec=5.700637397434466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:24:34,081] [INFO] 
[timer.py:197:stop] 0/928, RunningAvgSamplesPerSec=6.3232830899059795, CurrSamplesPerSec=5.674602182386966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:24:45,472] [INFO] [timer.py:197:stop] 0/930, RunningAvgSamplesPerSec=6.323205054575337, CurrSamplesPerSec=5.6307578326207715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:24:56,791] [INFO] [timer.py:197:stop] 0/932, RunningAvgSamplesPerSec=6.323213342725284, CurrSamplesPerSec=5.684693249022368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:25:08,121] [INFO] [timer.py:197:stop] 0/934, RunningAvgSamplesPerSec=6.3232047024702505, CurrSamplesPerSec=5.675793380201075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:25:19,441] [INFO] [timer.py:197:stop] 0/936, RunningAvgSamplesPerSec=6.323231242771201, CurrSamplesPerSec=5.717024801752569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:25:30,788] [INFO] [timer.py:197:stop] 0/938, RunningAvgSamplesPerSec=6.323211393401402, CurrSamplesPerSec=5.680549232323541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:25:42,101] [INFO] [logging.py:68:log_dist] [Rank 0] step=470, skipped=6, lr=[9.879761450742313e-06], mom=[[0.9, 0.999]] [2022-12-19 01:25:42,102] [INFO] [timer.py:197:stop] 0/940, RunningAvgSamplesPerSec=6.3232060179743215, CurrSamplesPerSec=5.676508242939549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:25:53,427] [INFO] [timer.py:197:stop] 0/942, RunningAvgSamplesPerSec=6.323238049010313, CurrSamplesPerSec=5.6956842261249525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:26:04,757] [INFO] [timer.py:197:stop] 0/944, RunningAvgSamplesPerSec=6.323240741546512, CurrSamplesPerSec=5.70115195632424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:26:16,089] [INFO] [timer.py:197:stop] 0/946, RunningAvgSamplesPerSec=6.323283876059346, CurrSamplesPerSec=5.732785974417554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:26:27,424] [INFO] [timer.py:197:stop] 
0/948, RunningAvgSamplesPerSec=6.323312334629156, CurrSamplesPerSec=5.704214351871154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:26:38,764] [INFO] [timer.py:197:stop] 0/950, RunningAvgSamplesPerSec=6.323316030948574, CurrSamplesPerSec=5.692720074519824, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:26:50,113] [INFO] [timer.py:197:stop] 0/952, RunningAvgSamplesPerSec=6.3232855482566075, CurrSamplesPerSec=5.666921574084832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0359, 'learning_rate': 9.900435550016748e-06, 'epoch': 3.57} [2022-12-19 01:27:01,958] [INFO] [timer.py:197:stop] 0/954, RunningAvgSamplesPerSec=6.323210304497367, CurrSamplesPerSec=5.653149365456727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:27:13,917] [INFO] [timer.py:197:stop] 0/956, RunningAvgSamplesPerSec=6.323053201249547, CurrSamplesPerSec=5.644260803578544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:27:25,649] [INFO] [timer.py:197:stop] 0/958, RunningAvgSamplesPerSec=6.323061985415543, CurrSamplesPerSec=5.697902472435686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:27:37,082] [INFO] [logging.py:68:log_dist] [Rank 0] step=480, skipped=6, lr=[9.91407217336734e-06], mom=[[0.9, 0.999]] [2022-12-19 01:27:37,083] [INFO] [timer.py:197:stop] 0/960, RunningAvgSamplesPerSec=6.323074868472379, CurrSamplesPerSec=5.677153886551778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:27:48,423] [INFO] [timer.py:197:stop] 0/962, RunningAvgSamplesPerSec=6.323072695087377, CurrSamplesPerSec=5.680810340553457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:27:59,870] [INFO] [timer.py:197:stop] 0/964, RunningAvgSamplesPerSec=6.323061451196268, CurrSamplesPerSec=5.6770407862652705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:28:11,187] [INFO] [timer.py:197:stop] 0/966, RunningAvgSamplesPerSec=6.323070853307474, CurrSamplesPerSec=5.683402046694541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 01:28:22,565] [INFO] [timer.py:197:stop] 0/968, RunningAvgSamplesPerSec=6.323063793453601, CurrSamplesPerSec=5.69253199007323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:28:34,005] [INFO] [timer.py:197:stop] 0/970, RunningAvgSamplesPerSec=6.323000504824408, CurrSamplesPerSec=5.649307512477133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:28:45,469] [INFO] [timer.py:197:stop] 0/972, RunningAvgSamplesPerSec=6.322978679455104, CurrSamplesPerSec=5.665946485745061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:28:56,797] [INFO] [timer.py:197:stop] 0/974, RunningAvgSamplesPerSec=6.323024918882465, CurrSamplesPerSec=5.723629557285836, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:29:08,196] [INFO] [timer.py:197:stop] 0/976, RunningAvgSamplesPerSec=6.322984791446119, CurrSamplesPerSec=5.6574342397492785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:29:19,660] [INFO] [timer.py:197:stop] 0/978, RunningAvgSamplesPerSec=6.322970862618648, CurrSamplesPerSec=5.695188536548119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:29:30,980] [INFO] [logging.py:68:log_dist] [Rank 0] step=490, skipped=6, lr=[9.947666544389474e-06], mom=[[0.9, 0.999]] [2022-12-19 01:29:30,981] [INFO] [timer.py:197:stop] 0/980, RunningAvgSamplesPerSec=6.322983041075797, CurrSamplesPerSec=5.683181368537643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:29:42,438] [INFO] [timer.py:197:stop] 0/982, RunningAvgSamplesPerSec=6.322988408088883, CurrSamplesPerSec=5.6952731191076085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:29:53,980] [INFO] [timer.py:197:stop] 0/984, RunningAvgSamplesPerSec=6.322996048898094, CurrSamplesPerSec=5.676496239056689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:30:05,278] [INFO] [timer.py:197:stop] 0/986, RunningAvgSamplesPerSec=6.3230382437678, CurrSamplesPerSec=5.7131319040698605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
01:30:16,608] [INFO] [timer.py:197:stop] 0/988, RunningAvgSamplesPerSec=6.323051433503477, CurrSamplesPerSec=5.711673403522599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:30:27,992] [INFO] [timer.py:197:stop] 0/990, RunningAvgSamplesPerSec=6.323072542187161, CurrSamplesPerSec=5.701948071120495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:30:39,463] [INFO] [timer.py:197:stop] 0/992, RunningAvgSamplesPerSec=6.323062899615077, CurrSamplesPerSec=5.682492250688604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:30:51,010] [INFO] [timer.py:197:stop] 0/994, RunningAvgSamplesPerSec=6.323011157687023, CurrSamplesPerSec=5.651232544913681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:31:02,366] [INFO] [timer.py:197:stop] 0/996, RunningAvgSamplesPerSec=6.322978668959763, CurrSamplesPerSec=5.673821599972912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:31:13,759] [INFO] [timer.py:197:stop] 0/998, RunningAvgSamplesPerSec=6.322917247758531, CurrSamplesPerSec=5.663485617511615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:31:25,101] [INFO] [logging.py:68:log_dist] [Rank 0] step=500, skipped=6, lr=[9.98057386557113e-06], mom=[[0.9, 0.999]] [2022-12-19 01:31:25,103] [INFO] [timer.py:197:stop] 0/1000, RunningAvgSamplesPerSec=6.322900554029162, CurrSamplesPerSec=5.689454869188859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:31:36,430] [INFO] [timer.py:197:stop] 0/1002, RunningAvgSamplesPerSec=6.322925039750919, CurrSamplesPerSec=5.6937348342671745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0369, 'learning_rate': 9.98382788472848e-06, 'epoch': 3.76} [2022-12-19 01:31:47,789] [INFO] [timer.py:197:stop] 0/1004, RunningAvgSamplesPerSec=6.32291374621284, CurrSamplesPerSec=5.698999418456137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:31:59,234] [INFO] [timer.py:197:stop] 0/1006, RunningAvgSamplesPerSec=6.322916256308939, CurrSamplesPerSec=5.7036088323887855, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:32:10,597] [INFO] [timer.py:197:stop] 0/1008, RunningAvgSamplesPerSec=6.3228884449951215, CurrSamplesPerSec=5.678974917097706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:32:21,979] [INFO] [timer.py:197:stop] 0/1010, RunningAvgSamplesPerSec=6.322889333406262, CurrSamplesPerSec=5.6726719874260825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:32:33,271] [INFO] [timer.py:197:stop] 0/1012, RunningAvgSamplesPerSec=6.322960707822512, CurrSamplesPerSec=5.72160489343192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:32:44,686] [INFO] [timer.py:197:stop] 0/1014, RunningAvgSamplesPerSec=6.3229558689351535, CurrSamplesPerSec=5.689121103228353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:32:56,034] [INFO] [timer.py:197:stop] 0/1016, RunningAvgSamplesPerSec=6.322939953575906, CurrSamplesPerSec=5.68339747412702, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:33:07,392] [INFO] [timer.py:197:stop] 0/1018, RunningAvgSamplesPerSec=6.322930569069281, CurrSamplesPerSec=5.681929097959649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:33:18,735] [INFO] [logging.py:68:log_dist] [Rank 0] step=510, skipped=6, lr=[9.993333333333333e-06], mom=[[0.9, 0.999]] [2022-12-19 01:33:18,736] [INFO] [timer.py:197:stop] 0/1020, RunningAvgSamplesPerSec=6.32292375774324, CurrSamplesPerSec=5.6816339744942725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:33:30,010] [INFO] [timer.py:197:stop] 0/1022, RunningAvgSamplesPerSec=6.322945414488586, CurrSamplesPerSec=5.696318283500739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:33:41,334] [INFO] [timer.py:197:stop] 0/1024, RunningAvgSamplesPerSec=6.322984246719093, CurrSamplesPerSec=5.705652794140159, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:33:52,658] [INFO] [timer.py:197:stop] 0/1026, RunningAvgSamplesPerSec=6.323030763936265, CurrSamplesPerSec=5.712995966314802, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:34:04,035] [INFO] [timer.py:197:stop] 0/1028, RunningAvgSamplesPerSec=6.322984603066996, CurrSamplesPerSec=5.640822866337973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:34:15,328] [INFO] [timer.py:197:stop] 0/1030, RunningAvgSamplesPerSec=6.32303087310678, CurrSamplesPerSec=5.70957922130605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:34:26,770] [INFO] [timer.py:197:stop] 0/1032, RunningAvgSamplesPerSec=6.32305841440387, CurrSamplesPerSec=5.690779708104953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:34:38,112] [INFO] [timer.py:197:stop] 0/1034, RunningAvgSamplesPerSec=6.32306724241664, CurrSamplesPerSec=5.691750812101744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:34:49,406] [INFO] [timer.py:197:stop] 0/1036, RunningAvgSamplesPerSec=6.32310307845081, CurrSamplesPerSec=5.708598382672515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:35:00,711] [INFO] [timer.py:197:stop] 0/1038, RunningAvgSamplesPerSec=6.3231051157395095, CurrSamplesPerSec=5.68638010955131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:35:12,152] [INFO] [logging.py:68:log_dist] [Rank 0] step=520, skipped=6, lr=[9.97111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 01:35:12,154] [INFO] [timer.py:197:stop] 0/1040, RunningAvgSamplesPerSec=6.32315279765852, CurrSamplesPerSec=5.699839470064783, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:35:23,483] [INFO] [timer.py:197:stop] 0/1042, RunningAvgSamplesPerSec=6.323164489487023, CurrSamplesPerSec=5.692513641036533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:35:34,762] [INFO] [timer.py:197:stop] 0/1044, RunningAvgSamplesPerSec=6.32322771359841, CurrSamplesPerSec=5.718281384936492, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:35:46,247] [INFO] [timer.py:197:stop] 0/1046, RunningAvgSamplesPerSec=6.323294008773126, CurrSamplesPerSec=5.7246219166991565, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 01:35:57,534] [INFO] [timer.py:197:stop] 0/1048, RunningAvgSamplesPerSec=6.323331923594489, CurrSamplesPerSec=5.71667781030241, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:36:08,805] [INFO] [timer.py:197:stop] 0/1050, RunningAvgSamplesPerSec=6.323393744630139, CurrSamplesPerSec=5.723526069008761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:36:20,121] [INFO] [timer.py:197:stop] 0/1052, RunningAvgSamplesPerSec=6.323404218137753, CurrSamplesPerSec=5.694187753131443, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0384, 'learning_rate': 9.957777777777779e-06, 'epoch': 3.94} [2022-12-19 01:36:31,566] [INFO] [timer.py:197:stop] 0/1054, RunningAvgSamplesPerSec=6.323466996820582, CurrSamplesPerSec=5.719525846604936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:36:43,097] [INFO] [timer.py:197:stop] 0/1056, RunningAvgSamplesPerSec=6.3234886046669905, CurrSamplesPerSec=5.690465811219692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:36:54,569] [INFO] [timer.py:197:stop] 0/1058, RunningAvgSamplesPerSec=6.323547134916665, CurrSamplesPerSec=5.709135993011103, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:37:06,006] [INFO] [logging.py:68:log_dist] [Rank 0] step=530, skipped=6, lr=[9.94888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 01:37:06,008] [INFO] [timer.py:197:stop] 0/1060, RunningAvgSamplesPerSec=6.3235982996251545, CurrSamplesPerSec=5.694189202587467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:37:17,489] [INFO] [timer.py:197:stop] 0/1062, RunningAvgSamplesPerSec=6.323633751245574, CurrSamplesPerSec=5.715643417021246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:37:28,779] [INFO] [timer.py:197:stop] 0/1064, RunningAvgSamplesPerSec=6.323661512678611, CurrSamplesPerSec=5.698340329223768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:37:40,093] [INFO] [timer.py:197:stop] 0/1066, RunningAvgSamplesPerSec=6.323673481889273, 
CurrSamplesPerSec=5.695503196380515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:37:50,533] [INFO] [timer.py:197:stop] 0/1068, RunningAvgSamplesPerSec=6.324634287150376, CurrSamplesPerSec=6.682176428972918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:38:01,884] [INFO] [timer.py:197:stop] 0/1070, RunningAvgSamplesPerSec=6.32467691094481, CurrSamplesPerSec=5.700260678157333, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:38:13,456] [INFO] [timer.py:197:stop] 0/1072, RunningAvgSamplesPerSec=6.3245939777038425, CurrSamplesPerSec=5.699422196538105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:38:24,747] [INFO] [timer.py:197:stop] 0/1074, RunningAvgSamplesPerSec=6.3246355004200465, CurrSamplesPerSec=5.718520390670772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:38:36,077] [INFO] [timer.py:197:stop] 0/1076, RunningAvgSamplesPerSec=6.324659660489291, CurrSamplesPerSec=5.70697038592648, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:38:47,579] [INFO] [timer.py:197:stop] 0/1078, RunningAvgSamplesPerSec=6.324683406855854, CurrSamplesPerSec=5.711681424575847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:38:58,908] [INFO] [logging.py:68:log_dist] [Rank 0] step=540, skipped=6, lr=[9.926666666666668e-06], mom=[[0.9, 0.999]] [2022-12-19 01:38:58,910] [INFO] [timer.py:197:stop] 0/1080, RunningAvgSamplesPerSec=6.3246894487748575, CurrSamplesPerSec=5.69167478181544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:39:10,232] [INFO] [timer.py:197:stop] 0/1082, RunningAvgSamplesPerSec=6.324724402554547, CurrSamplesPerSec=5.725545498402413, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:39:21,519] [INFO] [timer.py:197:stop] 0/1084, RunningAvgSamplesPerSec=6.32476391106619, CurrSamplesPerSec=5.698932147591169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:39:32,830] [INFO] [timer.py:197:stop] 0/1086, RunningAvgSamplesPerSec=6.3247929630953355, 
CurrSamplesPerSec=5.703348290109618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:39:44,213] [INFO] [timer.py:197:stop] 0/1088, RunningAvgSamplesPerSec=6.324850666063967, CurrSamplesPerSec=5.721373434210019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:39:55,587] [INFO] [timer.py:197:stop] 0/1090, RunningAvgSamplesPerSec=6.324884669639123, CurrSamplesPerSec=5.724564294296037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:40:06,880] [INFO] [timer.py:197:stop] 0/1092, RunningAvgSamplesPerSec=6.324903443332988, CurrSamplesPerSec=5.708613921929661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:40:18,455] [INFO] [timer.py:197:stop] 0/1094, RunningAvgSamplesPerSec=6.324914008563583, CurrSamplesPerSec=5.695235177477344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:40:29,906] [INFO] [timer.py:197:stop] 0/1096, RunningAvgSamplesPerSec=6.324924544190334, CurrSamplesPerSec=5.699242865146464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:40:41,231] [INFO] [timer.py:197:stop] 0/1098, RunningAvgSamplesPerSec=6.324970888911037, CurrSamplesPerSec=5.710541202707478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:40:52,558] [INFO] [logging.py:68:log_dist] [Rank 0] step=550, skipped=6, lr=[9.904444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 01:40:52,560] [INFO] [timer.py:197:stop] 0/1100, RunningAvgSamplesPerSec=6.324971563388574, CurrSamplesPerSec=5.69697932316346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:41:03,909] [INFO] [timer.py:197:stop] 0/1102, RunningAvgSamplesPerSec=6.324976909037321, CurrSamplesPerSec=5.691620234353525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:41:15,246] [INFO] [timer.py:197:stop] 0/1104, RunningAvgSamplesPerSec=6.324959432707775, CurrSamplesPerSec=5.658989713678185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0274, 'learning_rate': 9.9e-06, 'epoch': 4.13} [2022-12-19 01:41:26,705] [INFO] [timer.py:197:stop] 0/1106, 
RunningAvgSamplesPerSec=6.32494071497359, CurrSamplesPerSec=5.666268687087448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:41:38,069] [INFO] [timer.py:197:stop] 0/1108, RunningAvgSamplesPerSec=6.324912812331334, CurrSamplesPerSec=5.6722661133204575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:41:49,397] [INFO] [timer.py:197:stop] 0/1110, RunningAvgSamplesPerSec=6.324924197247109, CurrSamplesPerSec=5.705353260948113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:42:00,730] [INFO] [timer.py:197:stop] 0/1112, RunningAvgSamplesPerSec=6.324923475715562, CurrSamplesPerSec=5.696250108912601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:42:12,075] [INFO] [timer.py:197:stop] 0/1114, RunningAvgSamplesPerSec=6.324928165247696, CurrSamplesPerSec=5.68334476980095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:42:23,365] [INFO] [timer.py:197:stop] 0/1116, RunningAvgSamplesPerSec=6.324958701477878, CurrSamplesPerSec=5.695486519986947, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:42:34,741] [INFO] [timer.py:197:stop] 0/1118, RunningAvgSamplesPerSec=6.324939926456509, CurrSamplesPerSec=5.680064586753917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:42:46,083] [INFO] [logging.py:68:log_dist] [Rank 0] step=560, skipped=6, lr=[9.882222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 01:42:46,085] [INFO] [timer.py:197:stop] 0/1120, RunningAvgSamplesPerSec=6.324918290930486, CurrSamplesPerSec=5.67784651398351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:42:57,608] [INFO] [timer.py:197:stop] 0/1122, RunningAvgSamplesPerSec=6.324764020850468, CurrSamplesPerSec=5.547015492313604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:43:08,935] [INFO] [timer.py:197:stop] 0/1124, RunningAvgSamplesPerSec=6.324796596619278, CurrSamplesPerSec=5.713381423831785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:43:20,277] [INFO] [timer.py:197:stop] 0/1126, 
RunningAvgSamplesPerSec=6.3248052723591925, CurrSamplesPerSec=5.690979500636292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:43:31,681] [INFO] [timer.py:197:stop] 0/1128, RunningAvgSamplesPerSec=6.3247958738284344, CurrSamplesPerSec=5.6889424194851825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:43:43,057] [INFO] [timer.py:197:stop] 0/1130, RunningAvgSamplesPerSec=6.3248224663505415, CurrSamplesPerSec=5.689105911056516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:43:54,370] [INFO] [timer.py:197:stop] 0/1132, RunningAvgSamplesPerSec=6.324841766909368, CurrSamplesPerSec=5.702669537792552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:44:05,742] [INFO] [timer.py:197:stop] 0/1134, RunningAvgSamplesPerSec=6.324824531402452, CurrSamplesPerSec=5.663527199969653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:44:17,080] [INFO] [timer.py:197:stop] 0/1136, RunningAvgSamplesPerSec=6.324830973021693, CurrSamplesPerSec=5.6934302717541225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:44:28,419] [INFO] [timer.py:197:stop] 0/1138, RunningAvgSamplesPerSec=6.3248227899266665, CurrSamplesPerSec=5.690067518444597, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:44:39,735] [INFO] [logging.py:68:log_dist] [Rank 0] step=570, skipped=6, lr=[9.86e-06], mom=[[0.9, 0.999]] [2022-12-19 01:44:39,737] [INFO] [timer.py:197:stop] 0/1140, RunningAvgSamplesPerSec=6.324844393903044, CurrSamplesPerSec=5.700690665103304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:44:51,047] [INFO] [timer.py:197:stop] 0/1142, RunningAvgSamplesPerSec=6.324839321410804, CurrSamplesPerSec=5.696933620898516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:45:02,419] [INFO] [timer.py:197:stop] 0/1144, RunningAvgSamplesPerSec=6.324853029775292, CurrSamplesPerSec=5.709043955873813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:45:13,720] [INFO] [timer.py:197:stop] 0/1146, 
RunningAvgSamplesPerSec=6.324889788383863, CurrSamplesPerSec=5.7113647316193985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:45:25,076] [INFO] [timer.py:197:stop] 0/1148, RunningAvgSamplesPerSec=6.324897604613154, CurrSamplesPerSec=5.705412437482551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:45:36,460] [INFO] [timer.py:197:stop] 0/1150, RunningAvgSamplesPerSec=6.324880461634712, CurrSamplesPerSec=5.681721041146652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:45:47,778] [INFO] [timer.py:197:stop] 0/1152, RunningAvgSamplesPerSec=6.3248681661360395, CurrSamplesPerSec=5.6859392712596915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:45:59,199] [INFO] [timer.py:197:stop] 0/1154, RunningAvgSamplesPerSec=6.32486256081392, CurrSamplesPerSec=5.6809637468211225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0236, 'learning_rate': 9.844444444444446e-06, 'epoch': 4.32} [2022-12-19 01:46:10,526] [INFO] [timer.py:197:stop] 0/1156, RunningAvgSamplesPerSec=6.324858652698689, CurrSamplesPerSec=5.693544026025647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:46:21,972] [INFO] [timer.py:197:stop] 0/1158, RunningAvgSamplesPerSec=6.32485053124946, CurrSamplesPerSec=5.68506165253418, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:46:33,471] [INFO] [logging.py:68:log_dist] [Rank 0] step=580, skipped=6, lr=[9.837777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 01:46:33,472] [INFO] [timer.py:197:stop] 0/1160, RunningAvgSamplesPerSec=6.324843769577704, CurrSamplesPerSec=5.6795591145391215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:46:44,777] [INFO] [timer.py:197:stop] 0/1162, RunningAvgSamplesPerSec=6.324900878253531, CurrSamplesPerSec=5.7268080295969455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:46:56,140] [INFO] [timer.py:197:stop] 0/1164, RunningAvgSamplesPerSec=6.32492556450326, CurrSamplesPerSec=5.691554102992158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 01:47:07,485] [INFO] [timer.py:197:stop] 0/1166, RunningAvgSamplesPerSec=6.324901005215093, CurrSamplesPerSec=5.673412683551026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:47:18,816] [INFO] [timer.py:197:stop] 0/1168, RunningAvgSamplesPerSec=6.324909920715825, CurrSamplesPerSec=5.690371238777734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:47:30,174] [INFO] [timer.py:197:stop] 0/1170, RunningAvgSamplesPerSec=6.3249256623144525, CurrSamplesPerSec=5.711775734406699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:47:41,661] [INFO] [timer.py:197:stop] 0/1172, RunningAvgSamplesPerSec=6.32490763089565, CurrSamplesPerSec=5.678518649015999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:47:53,117] [INFO] [timer.py:197:stop] 0/1174, RunningAvgSamplesPerSec=6.324886759410804, CurrSamplesPerSec=5.682959745020392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:48:04,480] [INFO] [timer.py:197:stop] 0/1176, RunningAvgSamplesPerSec=6.324882111428021, CurrSamplesPerSec=5.692186034068705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:48:15,949] [INFO] [timer.py:197:stop] 0/1178, RunningAvgSamplesPerSec=6.324891442697314, CurrSamplesPerSec=5.705409769661542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:48:27,280] [INFO] [logging.py:68:log_dist] [Rank 0] step=590, skipped=6, lr=[9.815555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 01:48:27,281] [INFO] [timer.py:197:stop] 0/1180, RunningAvgSamplesPerSec=6.3249009669893, CurrSamplesPerSec=5.69699189746022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:48:38,768] [INFO] [timer.py:197:stop] 0/1182, RunningAvgSamplesPerSec=6.324901413016358, CurrSamplesPerSec=5.682886596209592, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:48:50,057] [INFO] [timer.py:197:stop] 0/1184, RunningAvgSamplesPerSec=6.324928789090779, CurrSamplesPerSec=5.695716131091108, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
01:49:01,381] [INFO] [timer.py:197:stop] 0/1186, RunningAvgSamplesPerSec=6.324937469834437, CurrSamplesPerSec=5.69275580955522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:49:12,701] [INFO] [timer.py:197:stop] 0/1188, RunningAvgSamplesPerSec=6.324954616801356, CurrSamplesPerSec=5.70224991172664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:49:24,185] [INFO] [timer.py:197:stop] 0/1190, RunningAvgSamplesPerSec=6.324946494106825, CurrSamplesPerSec=5.684085607962462, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:49:35,633] [INFO] [timer.py:197:stop] 0/1192, RunningAvgSamplesPerSec=6.324946190116836, CurrSamplesPerSec=5.685711170302852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:49:47,012] [INFO] [timer.py:197:stop] 0/1194, RunningAvgSamplesPerSec=6.3249330850943775, CurrSamplesPerSec=5.676195918432251, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:49:58,318] [INFO] [timer.py:197:stop] 0/1196, RunningAvgSamplesPerSec=6.324939638226609, CurrSamplesPerSec=5.694087017745471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:50:09,617] [INFO] [timer.py:197:stop] 0/1198, RunningAvgSamplesPerSec=6.324974933378967, CurrSamplesPerSec=5.707543125168769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:50:20,959] [INFO] [logging.py:68:log_dist] [Rank 0] step=600, skipped=6, lr=[9.793333333333333e-06], mom=[[0.9, 0.999]] [2022-12-19 01:50:20,961] [INFO] [timer.py:197:stop] 0/1200, RunningAvgSamplesPerSec=6.32499206618706, CurrSamplesPerSec=5.693619864732031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:50:32,282] [INFO] [timer.py:197:stop] 0/1202, RunningAvgSamplesPerSec=6.325001612094914, CurrSamplesPerSec=5.697246055241894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:50:43,566] [INFO] [timer.py:197:stop] 0/1204, RunningAvgSamplesPerSec=6.325050703353141, CurrSamplesPerSec=5.7203190561748505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0246, 
'learning_rate': 9.78888888888889e-06, 'epoch': 4.51} [2022-12-19 01:50:54,880] [INFO] [timer.py:197:stop] 0/1206, RunningAvgSamplesPerSec=6.325062201120487, CurrSamplesPerSec=5.6983712962294675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:51:06,240] [INFO] [timer.py:197:stop] 0/1208, RunningAvgSamplesPerSec=6.325048528672647, CurrSamplesPerSec=5.674052825725717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:51:17,590] [INFO] [timer.py:197:stop] 0/1210, RunningAvgSamplesPerSec=6.325051108575158, CurrSamplesPerSec=5.693770582043694, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:51:28,899] [INFO] [timer.py:197:stop] 0/1212, RunningAvgSamplesPerSec=6.325073728314405, CurrSamplesPerSec=5.714869023518775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:51:40,218] [INFO] [timer.py:197:stop] 0/1214, RunningAvgSamplesPerSec=6.325086425815481, CurrSamplesPerSec=5.703932422683251, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:51:51,559] [INFO] [timer.py:197:stop] 0/1216, RunningAvgSamplesPerSec=6.325093477784379, CurrSamplesPerSec=5.688901427496772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:52:02,896] [INFO] [timer.py:197:stop] 0/1218, RunningAvgSamplesPerSec=6.325131467261908, CurrSamplesPerSec=5.706784755713206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:52:14,192] [INFO] [logging.py:68:log_dist] [Rank 0] step=610, skipped=6, lr=[9.771111111111113e-06], mom=[[0.9, 0.999]] [2022-12-19 01:52:14,194] [INFO] [timer.py:197:stop] 0/1220, RunningAvgSamplesPerSec=6.3251471976647045, CurrSamplesPerSec=5.702073067045441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:52:25,485] [INFO] [timer.py:197:stop] 0/1222, RunningAvgSamplesPerSec=6.325166622369134, CurrSamplesPerSec=5.693566970644037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:52:36,876] [INFO] [timer.py:197:stop] 0/1224, RunningAvgSamplesPerSec=6.325120746866488, CurrSamplesPerSec=5.645968875302027, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:52:48,231] [INFO] [timer.py:197:stop] 0/1226, RunningAvgSamplesPerSec=6.3251296745900145, CurrSamplesPerSec=5.706563956413903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:52:59,590] [INFO] [timer.py:197:stop] 0/1228, RunningAvgSamplesPerSec=6.32510891943198, CurrSamplesPerSec=5.7023630496254185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:53:10,916] [INFO] [timer.py:197:stop] 0/1230, RunningAvgSamplesPerSec=6.325104260058115, CurrSamplesPerSec=5.682448464629102, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:53:22,259] [INFO] [timer.py:197:stop] 0/1232, RunningAvgSamplesPerSec=6.325117850544401, CurrSamplesPerSec=5.701988040151647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:53:33,603] [INFO] [timer.py:197:stop] 0/1234, RunningAvgSamplesPerSec=6.325133320435429, CurrSamplesPerSec=5.712330961596245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:53:44,961] [INFO] [timer.py:197:stop] 0/1236, RunningAvgSamplesPerSec=6.325110017499254, CurrSamplesPerSec=5.692100818981726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:53:56,339] [INFO] [timer.py:197:stop] 0/1238, RunningAvgSamplesPerSec=6.325094500754509, CurrSamplesPerSec=5.690047496718268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:54:07,677] [INFO] [logging.py:68:log_dist] [Rank 0] step=620, skipped=6, lr=[9.74888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 01:54:07,678] [INFO] [timer.py:197:stop] 0/1240, RunningAvgSamplesPerSec=6.325107528450415, CurrSamplesPerSec=5.706040414941849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:54:19,039] [INFO] [timer.py:197:stop] 0/1242, RunningAvgSamplesPerSec=6.325086624742035, CurrSamplesPerSec=5.682877212123439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:54:30,385] [INFO] [timer.py:197:stop] 0/1244, RunningAvgSamplesPerSec=6.325093076058267, CurrSamplesPerSec=5.716175051816756, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:54:41,713] [INFO] [timer.py:197:stop] 0/1246, RunningAvgSamplesPerSec=6.3251015763933935, CurrSamplesPerSec=5.706320126491812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:54:53,045] [INFO] [timer.py:197:stop] 0/1248, RunningAvgSamplesPerSec=6.32512076106113, CurrSamplesPerSec=5.704397390468149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:55:04,341] [INFO] [timer.py:197:stop] 0/1250, RunningAvgSamplesPerSec=6.325156152355773, CurrSamplesPerSec=5.70754361058976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:55:15,602] [INFO] [timer.py:197:stop] 0/1252, RunningAvgSamplesPerSec=6.325193247669528, CurrSamplesPerSec=5.702770819385449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:55:26,935] [INFO] [timer.py:197:stop] 0/1254, RunningAvgSamplesPerSec=6.3252061087681595, CurrSamplesPerSec=5.711705001741585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0238, 'learning_rate': 9.733333333333334e-06, 'epoch': 4.7} [2022-12-19 01:55:38,225] [INFO] [timer.py:197:stop] 0/1256, RunningAvgSamplesPerSec=6.325240632893343, CurrSamplesPerSec=5.71181389674396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:55:49,529] [INFO] [timer.py:197:stop] 0/1258, RunningAvgSamplesPerSec=6.32526863635693, CurrSamplesPerSec=5.699918139262459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:56:00,841] [INFO] [logging.py:68:log_dist] [Rank 0] step=630, skipped=6, lr=[9.726666666666668e-06], mom=[[0.9, 0.999]] [2022-12-19 01:56:00,843] [INFO] [timer.py:197:stop] 0/1260, RunningAvgSamplesPerSec=6.325295558883351, CurrSamplesPerSec=5.718118161105232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:56:12,501] [INFO] [timer.py:197:stop] 0/1262, RunningAvgSamplesPerSec=6.325251124463035, CurrSamplesPerSec=5.651728465684707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:56:24,406] [INFO] [timer.py:197:stop] 0/1264, 
RunningAvgSamplesPerSec=6.325246251904512, CurrSamplesPerSec=5.6759393147153565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:56:36,290] [INFO] [timer.py:197:stop] 0/1266, RunningAvgSamplesPerSec=6.325183560356296, CurrSamplesPerSec=5.651439564595148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:56:47,872] [INFO] [timer.py:197:stop] 0/1268, RunningAvgSamplesPerSec=6.325224706749831, CurrSamplesPerSec=5.731369062859605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:56:59,366] [INFO] [timer.py:197:stop] 0/1270, RunningAvgSamplesPerSec=6.3252277813152284, CurrSamplesPerSec=5.700204997655028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:57:10,635] [INFO] [timer.py:197:stop] 0/1272, RunningAvgSamplesPerSec=6.325287703851555, CurrSamplesPerSec=5.719149794917583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:57:22,023] [INFO] [timer.py:197:stop] 0/1274, RunningAvgSamplesPerSec=6.325255172337922, CurrSamplesPerSec=5.639411472787362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:57:33,373] [INFO] [timer.py:197:stop] 0/1276, RunningAvgSamplesPerSec=6.325258944356415, CurrSamplesPerSec=5.692668887277734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:57:44,711] [INFO] [timer.py:197:stop] 0/1278, RunningAvgSamplesPerSec=6.325280754930628, CurrSamplesPerSec=5.707937556946912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:57:56,186] [INFO] [logging.py:68:log_dist] [Rank 0] step=640, skipped=6, lr=[9.704444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 01:57:56,187] [INFO] [timer.py:197:stop] 0/1280, RunningAvgSamplesPerSec=6.32531295051093, CurrSamplesPerSec=5.706398246845059, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:58:07,621] [INFO] [timer.py:197:stop] 0/1282, RunningAvgSamplesPerSec=6.325295195887252, CurrSamplesPerSec=5.665619298721056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:58:19,018] [INFO] [timer.py:197:stop] 0/1284, 
RunningAvgSamplesPerSec=6.325301561560688, CurrSamplesPerSec=5.67833510528705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:58:30,307] [INFO] [timer.py:197:stop] 0/1286, RunningAvgSamplesPerSec=6.325317038768302, CurrSamplesPerSec=5.701475025697852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:58:41,597] [INFO] [timer.py:197:stop] 0/1288, RunningAvgSamplesPerSec=6.325362471717625, CurrSamplesPerSec=5.718513568619511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:58:52,942] [INFO] [timer.py:197:stop] 0/1290, RunningAvgSamplesPerSec=6.325373169120482, CurrSamplesPerSec=5.688314342368322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:59:04,385] [INFO] [timer.py:197:stop] 0/1292, RunningAvgSamplesPerSec=6.325377462346586, CurrSamplesPerSec=5.707404055455203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:59:15,888] [INFO] [timer.py:197:stop] 0/1294, RunningAvgSamplesPerSec=6.325391194090459, CurrSamplesPerSec=5.6950611841192424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:59:27,209] [INFO] [timer.py:197:stop] 0/1296, RunningAvgSamplesPerSec=6.325429844742659, CurrSamplesPerSec=5.7111965559310365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:59:38,744] [INFO] [timer.py:197:stop] 0/1298, RunningAvgSamplesPerSec=6.32535981605339, CurrSamplesPerSec=5.616333253130525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 01:59:50,192] [INFO] [logging.py:68:log_dist] [Rank 0] step=650, skipped=6, lr=[9.682222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 01:59:50,194] [INFO] [timer.py:197:stop] 0/1300, RunningAvgSamplesPerSec=6.325359474569833, CurrSamplesPerSec=5.683351267541748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:00:01,501] [INFO] [timer.py:197:stop] 0/1302, RunningAvgSamplesPerSec=6.3253607890454315, CurrSamplesPerSec=5.7196599016118315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:00:12,944] [INFO] [timer.py:197:stop] 0/1304, 
RunningAvgSamplesPerSec=6.325376961895668, CurrSamplesPerSec=5.705195624531413, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0239, 'learning_rate': 9.677777777777778e-06, 'epoch': 4.88} [2022-12-19 02:00:24,284] [INFO] [timer.py:197:stop] 0/1306, RunningAvgSamplesPerSec=6.32538559015186, CurrSamplesPerSec=5.71115840184258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:00:35,691] [INFO] [timer.py:197:stop] 0/1308, RunningAvgSamplesPerSec=6.325362792814117, CurrSamplesPerSec=5.6685567243685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:00:46,991] [INFO] [timer.py:197:stop] 0/1310, RunningAvgSamplesPerSec=6.325388366129746, CurrSamplesPerSec=5.696379206853834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:00:58,317] [INFO] [timer.py:197:stop] 0/1312, RunningAvgSamplesPerSec=6.325374814942083, CurrSamplesPerSec=5.681701078114891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:01:09,785] [INFO] [timer.py:197:stop] 0/1314, RunningAvgSamplesPerSec=6.32534065932617, CurrSamplesPerSec=5.653513691700786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:01:21,161] [INFO] [timer.py:197:stop] 0/1316, RunningAvgSamplesPerSec=6.325353465877035, CurrSamplesPerSec=5.696653378070349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:01:32,470] [INFO] [timer.py:197:stop] 0/1318, RunningAvgSamplesPerSec=6.325380742160595, CurrSamplesPerSec=5.691755156750885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:01:43,973] [INFO] [logging.py:68:log_dist] [Rank 0] step=660, skipped=6, lr=[9.66e-06], mom=[[0.9, 0.999]] [2022-12-19 02:01:43,975] [INFO] [timer.py:197:stop] 0/1320, RunningAvgSamplesPerSec=6.325358185107902, CurrSamplesPerSec=5.673640278438335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:01:55,322] [INFO] [timer.py:197:stop] 0/1322, RunningAvgSamplesPerSec=6.325343054374821, CurrSamplesPerSec=5.671462447813697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
02:02:06,772] [INFO] [timer.py:197:stop] 0/1324, RunningAvgSamplesPerSec=6.325350374130517, CurrSamplesPerSec=5.688341102134533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:02:18,127] [INFO] [timer.py:197:stop] 0/1326, RunningAvgSamplesPerSec=6.325337390775681, CurrSamplesPerSec=5.670405781393834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:02:29,572] [INFO] [timer.py:197:stop] 0/1328, RunningAvgSamplesPerSec=6.325329397145289, CurrSamplesPerSec=5.69145852899555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:02:40,951] [INFO] [timer.py:197:stop] 0/1330, RunningAvgSamplesPerSec=6.325278941667035, CurrSamplesPerSec=5.7062065889222895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:02:52,284] [INFO] [timer.py:197:stop] 0/1332, RunningAvgSamplesPerSec=6.325288222566702, CurrSamplesPerSec=5.693684594907549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:03:03,759] [INFO] [timer.py:197:stop] 0/1334, RunningAvgSamplesPerSec=6.325271247113316, CurrSamplesPerSec=5.685391329881376, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:03:14,161] [INFO] [timer.py:197:stop] 0/1336, RunningAvgSamplesPerSec=6.3260235870858965, CurrSamplesPerSec=5.679462741319367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:03:25,503] [INFO] [timer.py:197:stop] 0/1338, RunningAvgSamplesPerSec=6.326021446884541, CurrSamplesPerSec=5.683061530873832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:03:36,882] [INFO] [logging.py:68:log_dist] [Rank 0] step=670, skipped=6, lr=[9.637777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 02:03:36,884] [INFO] [timer.py:197:stop] 0/1340, RunningAvgSamplesPerSec=6.326016064731757, CurrSamplesPerSec=5.696931686427287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:03:48,341] [INFO] [timer.py:197:stop] 0/1342, RunningAvgSamplesPerSec=6.326019465694644, CurrSamplesPerSec=5.689551581817069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:03:59,747] 
[INFO] [timer.py:197:stop] 0/1344, RunningAvgSamplesPerSec=6.325974472962758, CurrSamplesPerSec=5.6777204163106045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:04:11,186] [INFO] [timer.py:197:stop] 0/1346, RunningAvgSamplesPerSec=6.325981009495647, CurrSamplesPerSec=5.690097430683778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:04:22,604] [INFO] [timer.py:197:stop] 0/1348, RunningAvgSamplesPerSec=6.325975026369539, CurrSamplesPerSec=5.696325536212535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:04:34,031] [INFO] [timer.py:197:stop] 0/1350, RunningAvgSamplesPerSec=6.325996807819703, CurrSamplesPerSec=5.719779825233946, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:04:45,388] [INFO] [timer.py:197:stop] 0/1352, RunningAvgSamplesPerSec=6.3259849220790665, CurrSamplesPerSec=5.685303187415706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:04:56,736] [INFO] [timer.py:197:stop] 0/1354, RunningAvgSamplesPerSec=6.3259316249294715, CurrSamplesPerSec=5.639803653185618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0204, 'learning_rate': 9.622222222222222e-06, 'epoch': 5.07} [2022-12-19 02:05:08,029] [INFO] [timer.py:197:stop] 0/1356, RunningAvgSamplesPerSec=6.325972569272529, CurrSamplesPerSec=5.730181091217941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:05:19,358] [INFO] [timer.py:197:stop] 0/1358, RunningAvgSamplesPerSec=6.325991540198074, CurrSamplesPerSec=5.728088718850003, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:05:30,704] [INFO] [logging.py:68:log_dist] [Rank 0] step=680, skipped=6, lr=[9.615555555555558e-06], mom=[[0.9, 0.999]] [2022-12-19 02:05:30,706] [INFO] [timer.py:197:stop] 0/1360, RunningAvgSamplesPerSec=6.325995144047257, CurrSamplesPerSec=5.692229728913043, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:05:42,056] [INFO] [timer.py:197:stop] 0/1362, RunningAvgSamplesPerSec=6.325994995404326, CurrSamplesPerSec=5.705406374256594, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:05:53,345] [INFO] [timer.py:197:stop] 0/1364, RunningAvgSamplesPerSec=6.3260101476904165, CurrSamplesPerSec=5.704354235939349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:06:04,671] [INFO] [timer.py:197:stop] 0/1366, RunningAvgSamplesPerSec=6.32601471723045, CurrSamplesPerSec=5.703956420729281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:06:15,986] [INFO] [timer.py:197:stop] 0/1368, RunningAvgSamplesPerSec=6.326009693836045, CurrSamplesPerSec=5.686034659980005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:06:27,308] [INFO] [timer.py:197:stop] 0/1370, RunningAvgSamplesPerSec=6.326006313683808, CurrSamplesPerSec=5.693818407473715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:06:38,658] [INFO] [timer.py:197:stop] 0/1372, RunningAvgSamplesPerSec=6.325990283395943, CurrSamplesPerSec=5.690380406376861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:06:49,979] [INFO] [timer.py:197:stop] 0/1374, RunningAvgSamplesPerSec=6.325993190789163, CurrSamplesPerSec=5.689745981734343, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:07:01,326] [INFO] [timer.py:197:stop] 0/1376, RunningAvgSamplesPerSec=6.3259889124295565, CurrSamplesPerSec=5.703390459989499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:07:12,691] [INFO] [timer.py:197:stop] 0/1378, RunningAvgSamplesPerSec=6.325969040929874, CurrSamplesPerSec=5.681856456739923, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:07:24,025] [INFO] [logging.py:68:log_dist] [Rank 0] step=690, skipped=6, lr=[9.593333333333335e-06], mom=[[0.9, 0.999]] [2022-12-19 02:07:24,026] [INFO] [timer.py:197:stop] 0/1380, RunningAvgSamplesPerSec=6.325971148816176, CurrSamplesPerSec=5.686399864561237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:07:35,332] [INFO] [timer.py:197:stop] 0/1382, RunningAvgSamplesPerSec=6.325999826421312, CurrSamplesPerSec=5.709484983995986, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:07:46,616] [INFO] [timer.py:197:stop] 0/1384, RunningAvgSamplesPerSec=6.326018551744674, CurrSamplesPerSec=5.71132851954344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:07:57,958] [INFO] [timer.py:197:stop] 0/1386, RunningAvgSamplesPerSec=6.326009808995005, CurrSamplesPerSec=5.695271669099709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:08:09,275] [INFO] [timer.py:197:stop] 0/1388, RunningAvgSamplesPerSec=6.326024737831474, CurrSamplesPerSec=5.69747169670837, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:08:20,624] [INFO] [timer.py:197:stop] 0/1390, RunningAvgSamplesPerSec=6.326001563249644, CurrSamplesPerSec=5.673787781188459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:08:31,978] [INFO] [timer.py:197:stop] 0/1392, RunningAvgSamplesPerSec=6.326002101638358, CurrSamplesPerSec=5.704283444530918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:08:43,303] [INFO] [timer.py:197:stop] 0/1394, RunningAvgSamplesPerSec=6.326007086245346, CurrSamplesPerSec=5.709605209977455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:08:54,663] [INFO] [timer.py:197:stop] 0/1396, RunningAvgSamplesPerSec=6.325976996367318, CurrSamplesPerSec=5.67580754128076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:09:06,061] [INFO] [timer.py:197:stop] 0/1398, RunningAvgSamplesPerSec=6.3259534631195535, CurrSamplesPerSec=5.664270768711359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:09:17,415] [INFO] [logging.py:68:log_dist] [Rank 0] step=700, skipped=6, lr=[9.571111111111113e-06], mom=[[0.9, 0.999]] [2022-12-19 02:09:17,417] [INFO] [timer.py:197:stop] 0/1400, RunningAvgSamplesPerSec=6.325942329797055, CurrSamplesPerSec=5.692385201091724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:09:28,768] [INFO] [timer.py:197:stop] 0/1402, RunningAvgSamplesPerSec=6.325922639188353, CurrSamplesPerSec=5.6779937551164466, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:09:40,089] [INFO] [timer.py:197:stop] 0/1404, RunningAvgSamplesPerSec=6.3259300216188254, CurrSamplesPerSec=5.695464768316012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0146, 'learning_rate': 9.566666666666668e-06, 'epoch': 5.26} [2022-12-19 02:09:51,414] [INFO] [timer.py:197:stop] 0/1406, RunningAvgSamplesPerSec=6.325946166681824, CurrSamplesPerSec=5.699702953903435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:10:02,782] [INFO] [timer.py:197:stop] 0/1408, RunningAvgSamplesPerSec=6.325905383744529, CurrSamplesPerSec=5.637545394202437, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:10:14,134] [INFO] [timer.py:197:stop] 0/1410, RunningAvgSamplesPerSec=6.3258835375422535, CurrSamplesPerSec=5.680435275482285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:10:25,472] [INFO] [timer.py:197:stop] 0/1412, RunningAvgSamplesPerSec=6.325894852324781, CurrSamplesPerSec=5.713314299413726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:10:36,835] [INFO] [timer.py:197:stop] 0/1414, RunningAvgSamplesPerSec=6.325883353898903, CurrSamplesPerSec=5.686369268445658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:10:48,197] [INFO] [timer.py:197:stop] 0/1416, RunningAvgSamplesPerSec=6.325873910552596, CurrSamplesPerSec=5.69080697365038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:10:59,981] [INFO] [timer.py:197:stop] 0/1418, RunningAvgSamplesPerSec=6.325839118282026, CurrSamplesPerSec=5.662679420891574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:11:11,860] [INFO] [logging.py:68:log_dist] [Rank 0] step=710, skipped=6, lr=[9.54888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 02:11:11,861] [INFO] [timer.py:197:stop] 0/1420, RunningAvgSamplesPerSec=6.325789873694106, CurrSamplesPerSec=5.651609712814878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:11:23,685] [INFO] [timer.py:197:stop] 0/1422, 
RunningAvgSamplesPerSec=6.3257407152584335, CurrSamplesPerSec=5.660914904869334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:11:35,206] [INFO] [timer.py:197:stop] 0/1424, RunningAvgSamplesPerSec=6.32572380394987, CurrSamplesPerSec=5.683606135095532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:11:46,531] [INFO] [timer.py:197:stop] 0/1426, RunningAvgSamplesPerSec=6.325727512231239, CurrSamplesPerSec=5.686158236490062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:11:57,997] [INFO] [timer.py:197:stop] 0/1428, RunningAvgSamplesPerSec=6.325718381177641, CurrSamplesPerSec=5.67422889614679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:12:09,344] [INFO] [timer.py:197:stop] 0/1430, RunningAvgSamplesPerSec=6.325703349002757, CurrSamplesPerSec=5.684987968011892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:12:20,752] [INFO] [timer.py:197:stop] 0/1432, RunningAvgSamplesPerSec=6.325704946902877, CurrSamplesPerSec=5.7040150834697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:12:32,089] [INFO] [timer.py:197:stop] 0/1434, RunningAvgSamplesPerSec=6.325683975887975, CurrSamplesPerSec=5.679657413785873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:12:43,432] [INFO] [timer.py:197:stop] 0/1436, RunningAvgSamplesPerSec=6.325679520445578, CurrSamplesPerSec=5.659775050705646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:12:54,930] [INFO] [timer.py:197:stop] 0/1438, RunningAvgSamplesPerSec=6.325646700713397, CurrSamplesPerSec=5.664626727825863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:13:06,289] [INFO] [logging.py:68:log_dist] [Rank 0] step=720, skipped=6, lr=[9.526666666666668e-06], mom=[[0.9, 0.999]] [2022-12-19 02:13:06,291] [INFO] [timer.py:197:stop] 0/1440, RunningAvgSamplesPerSec=6.325638282584947, CurrSamplesPerSec=5.6699123260919695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:13:17,859] [INFO] [timer.py:197:stop] 0/1442, 
RunningAvgSamplesPerSec=6.3255749779462835, CurrSamplesPerSec=5.617487649751459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:13:29,233] [INFO] [timer.py:197:stop] 0/1444, RunningAvgSamplesPerSec=6.325593953770169, CurrSamplesPerSec=5.722883500338872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:13:40,683] [INFO] [timer.py:197:stop] 0/1446, RunningAvgSamplesPerSec=6.325594187741323, CurrSamplesPerSec=5.69692636663818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:13:52,154] [INFO] [timer.py:197:stop] 0/1448, RunningAvgSamplesPerSec=6.325575935958007, CurrSamplesPerSec=5.674743976856191, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:14:03,517] [INFO] [timer.py:197:stop] 0/1450, RunningAvgSamplesPerSec=6.3255565955518085, CurrSamplesPerSec=5.663227533152615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:14:14,843] [INFO] [timer.py:197:stop] 0/1452, RunningAvgSamplesPerSec=6.325584850421658, CurrSamplesPerSec=5.7131411451538705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:14:26,207] [INFO] [timer.py:197:stop] 0/1454, RunningAvgSamplesPerSec=6.325590370650787, CurrSamplesPerSec=5.703681788492748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0159, 'learning_rate': 9.511111111111112e-06, 'epoch': 5.45} [2022-12-19 02:14:37,594] [INFO] [timer.py:197:stop] 0/1456, RunningAvgSamplesPerSec=6.325554718829132, CurrSamplesPerSec=5.652222093218099, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:14:49,097] [INFO] [timer.py:197:stop] 0/1458, RunningAvgSamplesPerSec=6.325545425297475, CurrSamplesPerSec=5.662100362617642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:15:00,498] [INFO] [logging.py:68:log_dist] [Rank 0] step=730, skipped=6, lr=[9.504444444444446e-06], mom=[[0.9, 0.999]] [2022-12-19 02:15:00,500] [INFO] [timer.py:197:stop] 0/1460, RunningAvgSamplesPerSec=6.325526949863666, CurrSamplesPerSec=5.670537543696346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 02:15:11,884] [INFO] [timer.py:197:stop] 0/1462, RunningAvgSamplesPerSec=6.325526579423683, CurrSamplesPerSec=5.691162414794898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:15:23,188] [INFO] [timer.py:197:stop] 0/1464, RunningAvgSamplesPerSec=6.325554018392089, CurrSamplesPerSec=5.703167499876349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:15:34,499] [INFO] [timer.py:197:stop] 0/1466, RunningAvgSamplesPerSec=6.325567539019158, CurrSamplesPerSec=5.6901553262336755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:15:45,974] [INFO] [timer.py:197:stop] 0/1468, RunningAvgSamplesPerSec=6.325562001152822, CurrSamplesPerSec=5.687878024946484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:15:57,285] [INFO] [timer.py:197:stop] 0/1470, RunningAvgSamplesPerSec=6.325575268581615, CurrSamplesPerSec=5.701323658051909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:16:08,677] [INFO] [timer.py:197:stop] 0/1472, RunningAvgSamplesPerSec=6.325601845452154, CurrSamplesPerSec=5.720548479549482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:16:20,033] [INFO] [timer.py:197:stop] 0/1474, RunningAvgSamplesPerSec=6.325589536878379, CurrSamplesPerSec=5.676665498493852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:16:31,533] [INFO] [timer.py:197:stop] 0/1476, RunningAvgSamplesPerSec=6.3255580968111, CurrSamplesPerSec=5.661283575845675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:16:42,888] [INFO] [timer.py:197:stop] 0/1478, RunningAvgSamplesPerSec=6.325540940915712, CurrSamplesPerSec=5.686236528481549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:16:54,233] [INFO] [logging.py:68:log_dist] [Rank 0] step=740, skipped=6, lr=[9.482222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 02:16:54,234] [INFO] [timer.py:197:stop] 0/1480, RunningAvgSamplesPerSec=6.325526796553384, CurrSamplesPerSec=5.688321092555874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
02:17:05,614] [INFO] [timer.py:197:stop] 0/1482, RunningAvgSamplesPerSec=6.325530489114055, CurrSamplesPerSec=5.709013601268204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:17:17,036] [INFO] [timer.py:197:stop] 0/1484, RunningAvgSamplesPerSec=6.325499610822527, CurrSamplesPerSec=5.665652541876707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:17:28,392] [INFO] [timer.py:197:stop] 0/1486, RunningAvgSamplesPerSec=6.325476841720645, CurrSamplesPerSec=5.678753621525439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:17:39,881] [INFO] [timer.py:197:stop] 0/1488, RunningAvgSamplesPerSec=6.325445538504084, CurrSamplesPerSec=5.680460999517352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:17:51,238] [INFO] [timer.py:197:stop] 0/1490, RunningAvgSamplesPerSec=6.32541570068041, CurrSamplesPerSec=5.679860031525407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:18:02,709] [INFO] [timer.py:197:stop] 0/1492, RunningAvgSamplesPerSec=6.325397274117322, CurrSamplesPerSec=5.680096076587714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:18:14,212] [INFO] [timer.py:197:stop] 0/1494, RunningAvgSamplesPerSec=6.325386249492268, CurrSamplesPerSec=5.668855518860688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:18:25,564] [INFO] [timer.py:197:stop] 0/1496, RunningAvgSamplesPerSec=6.3253606197968955, CurrSamplesPerSec=5.662164616955378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:18:36,886] [INFO] [timer.py:197:stop] 0/1498, RunningAvgSamplesPerSec=6.325354537006481, CurrSamplesPerSec=5.6885477152923105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:18:48,265] [INFO] [logging.py:68:log_dist] [Rank 0] step=750, skipped=6, lr=[9.460000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 02:18:48,266] [INFO] [timer.py:197:stop] 0/1500, RunningAvgSamplesPerSec=6.325339923746025, CurrSamplesPerSec=5.6820378227258015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
02:18:59,749] [INFO] [timer.py:197:stop] 0/1502, RunningAvgSamplesPerSec=6.3253589293965415, CurrSamplesPerSec=5.704349387156692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:19:11,076] [INFO] [timer.py:197:stop] 0/1504, RunningAvgSamplesPerSec=6.32537638650134, CurrSamplesPerSec=5.70284302711964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0167, 'learning_rate': 9.455555555555557e-06, 'epoch': 5.64} [2022-12-19 02:19:22,392] [INFO] [timer.py:197:stop] 0/1506, RunningAvgSamplesPerSec=6.325383304052213, CurrSamplesPerSec=5.7053251282720945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:19:33,733] [INFO] [timer.py:197:stop] 0/1508, RunningAvgSamplesPerSec=6.325384086421241, CurrSamplesPerSec=5.697731460462139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:19:45,058] [INFO] [timer.py:197:stop] 0/1510, RunningAvgSamplesPerSec=6.325395211325426, CurrSamplesPerSec=5.700483895265787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:19:56,375] [INFO] [timer.py:197:stop] 0/1512, RunningAvgSamplesPerSec=6.325385845716689, CurrSamplesPerSec=5.689374559205892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:20:07,694] [INFO] [timer.py:197:stop] 0/1514, RunningAvgSamplesPerSec=6.325411697148107, CurrSamplesPerSec=5.711216240723798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:20:18,967] [INFO] [timer.py:197:stop] 0/1516, RunningAvgSamplesPerSec=6.325465119639401, CurrSamplesPerSec=5.7294550927893315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:20:30,284] [INFO] [timer.py:197:stop] 0/1518, RunningAvgSamplesPerSec=6.325485745584025, CurrSamplesPerSec=5.713452684450866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:20:41,635] [INFO] [logging.py:68:log_dist] [Rank 0] step=760, skipped=6, lr=[9.437777777777779e-06], mom=[[0.9, 0.999]] [2022-12-19 02:20:41,636] [INFO] [timer.py:197:stop] 0/1520, RunningAvgSamplesPerSec=6.325484895874861, 
CurrSamplesPerSec=5.702245308780833, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:20:52,950] [INFO] [timer.py:197:stop] 0/1522, RunningAvgSamplesPerSec=6.325485427164552, CurrSamplesPerSec=5.687078122844323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:21:04,380] [INFO] [timer.py:197:stop] 0/1524, RunningAvgSamplesPerSec=6.3254233996255325, CurrSamplesPerSec=5.605694566585853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:21:15,732] [INFO] [timer.py:197:stop] 0/1526, RunningAvgSamplesPerSec=6.325413754162813, CurrSamplesPerSec=5.679617757201075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:21:27,149] [INFO] [timer.py:197:stop] 0/1528, RunningAvgSamplesPerSec=6.3253569762311175, CurrSamplesPerSec=5.6295462648154535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:21:38,521] [INFO] [timer.py:197:stop] 0/1530, RunningAvgSamplesPerSec=6.325317329565446, CurrSamplesPerSec=5.651307498563991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:21:49,868] [INFO] [timer.py:197:stop] 0/1532, RunningAvgSamplesPerSec=6.325315798593318, CurrSamplesPerSec=5.692157065480932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:22:01,206] [INFO] [timer.py:197:stop] 0/1534, RunningAvgSamplesPerSec=6.3253135865072885, CurrSamplesPerSec=5.700478810950541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:22:12,512] [INFO] [timer.py:197:stop] 0/1536, RunningAvgSamplesPerSec=6.3253266475622505, CurrSamplesPerSec=5.704682275228334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:22:23,812] [INFO] [timer.py:197:stop] 0/1538, RunningAvgSamplesPerSec=6.3253494266803765, CurrSamplesPerSec=5.706959708828725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:22:35,107] [INFO] [logging.py:68:log_dist] [Rank 0] step=770, skipped=6, lr=[9.415555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 02:22:35,109] [INFO] [timer.py:197:stop] 0/1540, RunningAvgSamplesPerSec=6.325347090700444, 
CurrSamplesPerSec=5.67853570669287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:22:46,408] [INFO] [timer.py:197:stop] 0/1542, RunningAvgSamplesPerSec=6.325361356934764, CurrSamplesPerSec=5.700227269725416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:22:57,762] [INFO] [timer.py:197:stop] 0/1544, RunningAvgSamplesPerSec=6.3253531798553215, CurrSamplesPerSec=5.672072906007402, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:23:09,175] [INFO] [timer.py:197:stop] 0/1546, RunningAvgSamplesPerSec=6.325296778479612, CurrSamplesPerSec=5.653853771898734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:23:20,512] [INFO] [timer.py:197:stop] 0/1548, RunningAvgSamplesPerSec=6.325296549805419, CurrSamplesPerSec=5.685531978164002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:23:31,873] [INFO] [timer.py:197:stop] 0/1550, RunningAvgSamplesPerSec=6.3252743759482435, CurrSamplesPerSec=5.680035260657811, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:23:43,186] [INFO] [timer.py:197:stop] 0/1552, RunningAvgSamplesPerSec=6.325286683540324, CurrSamplesPerSec=5.696729299633806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:23:54,504] [INFO] [timer.py:197:stop] 0/1554, RunningAvgSamplesPerSec=6.325283556792846, CurrSamplesPerSec=5.6884356071073165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0154, 'learning_rate': 9.4e-06, 'epoch': 5.82} [2022-12-19 02:24:05,796] [INFO] [timer.py:197:stop] 0/1556, RunningAvgSamplesPerSec=6.325296142420349, CurrSamplesPerSec=5.687520583816793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:24:17,110] [INFO] [timer.py:197:stop] 0/1558, RunningAvgSamplesPerSec=6.325307812381287, CurrSamplesPerSec=5.722122756745596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:24:28,435] [INFO] [logging.py:68:log_dist] [Rank 0] step=780, skipped=6, lr=[9.393333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 02:24:28,437] [INFO] [timer.py:197:stop] 
0/1560, RunningAvgSamplesPerSec=6.32531039592822, CurrSamplesPerSec=5.6997230436373805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:24:39,792] [INFO] [timer.py:197:stop] 0/1562, RunningAvgSamplesPerSec=6.325303100967628, CurrSamplesPerSec=5.681589480135922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:24:51,121] [INFO] [timer.py:197:stop] 0/1564, RunningAvgSamplesPerSec=6.325310072132719, CurrSamplesPerSec=5.683654030605526, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:25:02,518] [INFO] [timer.py:197:stop] 0/1566, RunningAvgSamplesPerSec=6.325255663678181, CurrSamplesPerSec=5.686427569989322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:25:13,851] [INFO] [timer.py:197:stop] 0/1568, RunningAvgSamplesPerSec=6.325272184808881, CurrSamplesPerSec=5.71290112985152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:25:25,186] [INFO] [timer.py:197:stop] 0/1570, RunningAvgSamplesPerSec=6.325260929817044, CurrSamplesPerSec=5.673587275258184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:25:36,601] [INFO] [timer.py:197:stop] 0/1572, RunningAvgSamplesPerSec=6.325239951309191, CurrSamplesPerSec=5.655233326143737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:25:48,497] [INFO] [timer.py:197:stop] 0/1574, RunningAvgSamplesPerSec=6.325200312012156, CurrSamplesPerSec=5.666222758598824, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:26:00,277] [INFO] [timer.py:197:stop] 0/1576, RunningAvgSamplesPerSec=6.325160012079392, CurrSamplesPerSec=5.681873775007791, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:26:11,749] [INFO] [timer.py:197:stop] 0/1578, RunningAvgSamplesPerSec=6.325158573437108, CurrSamplesPerSec=5.71841757433845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:26:23,254] [INFO] [logging.py:68:log_dist] [Rank 0] step=790, skipped=6, lr=[9.371111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 02:26:23,255] [INFO] [timer.py:197:stop] 0/1580, 
RunningAvgSamplesPerSec=6.325146316912055, CurrSamplesPerSec=5.688942660616274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:26:34,595] [INFO] [timer.py:197:stop] 0/1582, RunningAvgSamplesPerSec=6.325152542720042, CurrSamplesPerSec=5.693014419026782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:26:46,045] [INFO] [timer.py:197:stop] 0/1584, RunningAvgSamplesPerSec=6.325102719512724, CurrSamplesPerSec=5.619288477684616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:26:57,460] [INFO] [timer.py:197:stop] 0/1586, RunningAvgSamplesPerSec=6.325082437519121, CurrSamplesPerSec=5.684310208163963, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:27:08,788] [INFO] [timer.py:197:stop] 0/1588, RunningAvgSamplesPerSec=6.325094838420507, CurrSamplesPerSec=5.706207316712834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:27:20,158] [INFO] [timer.py:197:stop] 0/1590, RunningAvgSamplesPerSec=6.3251074104796, CurrSamplesPerSec=5.709533073780991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:27:31,680] [INFO] [timer.py:197:stop] 0/1592, RunningAvgSamplesPerSec=6.325076136652226, CurrSamplesPerSec=5.664126150690167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:27:43,015] [INFO] [timer.py:197:stop] 0/1594, RunningAvgSamplesPerSec=6.325066510248264, CurrSamplesPerSec=5.6844806561751735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:27:54,382] [INFO] [timer.py:197:stop] 0/1596, RunningAvgSamplesPerSec=6.325067674676402, CurrSamplesPerSec=5.705342104853738, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:28:05,781] [INFO] [timer.py:197:stop] 0/1598, RunningAvgSamplesPerSec=6.325027181698902, CurrSamplesPerSec=5.651239921216458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:28:17,146] [INFO] [logging.py:68:log_dist] [Rank 0] step=800, skipped=6, lr=[9.348888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 02:28:17,148] [INFO] [timer.py:197:stop] 0/1600, 
RunningAvgSamplesPerSec=6.32501249041159, CurrSamplesPerSec=5.68100558633207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:28:27,665] [INFO] [timer.py:197:stop] 0/1602, RunningAvgSamplesPerSec=6.325645315988089, CurrSamplesPerSec=6.654803377278157, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:28:39,158] [INFO] [timer.py:197:stop] 0/1604, RunningAvgSamplesPerSec=6.325621578162325, CurrSamplesPerSec=5.670118321260013, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:28:50,557] [INFO] [timer.py:197:stop] 0/1606, RunningAvgSamplesPerSec=6.325600692649266, CurrSamplesPerSec=5.672696442416951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0149, 'learning_rate': 9.342222222222223e-06, 'epoch': 6.01} [2022-12-19 02:29:01,903] [INFO] [timer.py:197:stop] 0/1608, RunningAvgSamplesPerSec=6.325625393612252, CurrSamplesPerSec=5.689381794246565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:29:13,354] [INFO] [timer.py:197:stop] 0/1610, RunningAvgSamplesPerSec=6.325636374601636, CurrSamplesPerSec=5.696881390636447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:29:24,697] [INFO] [timer.py:197:stop] 0/1612, RunningAvgSamplesPerSec=6.325636938945637, CurrSamplesPerSec=5.701277643880583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:29:36,257] [INFO] [timer.py:197:stop] 0/1614, RunningAvgSamplesPerSec=6.325583033474992, CurrSamplesPerSec=5.637129615058002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:29:47,627] [INFO] [timer.py:197:stop] 0/1616, RunningAvgSamplesPerSec=6.3255813139357375, CurrSamplesPerSec=5.677452147134369, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:29:58,959] [INFO] [timer.py:197:stop] 0/1618, RunningAvgSamplesPerSec=6.3255863262793675, CurrSamplesPerSec=5.705448817108916, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:30:10,421] [INFO] [logging.py:68:log_dist] [Rank 0] step=810, skipped=6, lr=[9.326666666666667e-06], mom=[[0.9, 0.999]] 
[2022-12-19 02:30:10,423] [INFO] [timer.py:197:stop] 0/1620, RunningAvgSamplesPerSec=6.3255948828713215, CurrSamplesPerSec=5.700696718310479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:30:21,722] [INFO] [timer.py:197:stop] 0/1622, RunningAvgSamplesPerSec=6.325620317873395, CurrSamplesPerSec=5.722774182850378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:30:33,099] [INFO] [timer.py:197:stop] 0/1624, RunningAvgSamplesPerSec=6.325656950420237, CurrSamplesPerSec=5.715720332420203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:30:44,564] [INFO] [timer.py:197:stop] 0/1626, RunningAvgSamplesPerSec=6.325680769220093, CurrSamplesPerSec=5.704646147840817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:30:55,879] [INFO] [timer.py:197:stop] 0/1628, RunningAvgSamplesPerSec=6.325698925432605, CurrSamplesPerSec=5.698962153104518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:31:07,227] [INFO] [timer.py:197:stop] 0/1630, RunningAvgSamplesPerSec=6.325683709263267, CurrSamplesPerSec=5.670111374667464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:31:18,597] [INFO] [timer.py:197:stop] 0/1632, RunningAvgSamplesPerSec=6.325685383388617, CurrSamplesPerSec=5.673938889519243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:31:29,945] [INFO] [timer.py:197:stop] 0/1634, RunningAvgSamplesPerSec=6.3256888907837325, CurrSamplesPerSec=5.695137304905255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:31:41,475] [INFO] [timer.py:197:stop] 0/1636, RunningAvgSamplesPerSec=6.325681431225448, CurrSamplesPerSec=5.6871410175103945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:31:52,726] [INFO] [timer.py:197:stop] 0/1638, RunningAvgSamplesPerSec=6.3257232715578295, CurrSamplesPerSec=5.726327185825193, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:32:04,025] [INFO] [logging.py:68:log_dist] [Rank 0] step=820, skipped=6, lr=[9.304444444444444e-06], mom=[[0.9, 0.999]] 
[2022-12-19 02:32:04,027] [INFO] [timer.py:197:stop] 0/1640, RunningAvgSamplesPerSec=6.325740082961842, CurrSamplesPerSec=5.7176721446971275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:32:15,501] [INFO] [timer.py:197:stop] 0/1642, RunningAvgSamplesPerSec=6.325819022242892, CurrSamplesPerSec=5.753343860876287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:32:26,796] [INFO] [timer.py:197:stop] 0/1644, RunningAvgSamplesPerSec=6.325840567805974, CurrSamplesPerSec=5.710349266393622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:32:38,126] [INFO] [timer.py:197:stop] 0/1646, RunningAvgSamplesPerSec=6.325834868772187, CurrSamplesPerSec=5.6911592776455855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:32:49,590] [INFO] [timer.py:197:stop] 0/1648, RunningAvgSamplesPerSec=6.325838228256779, CurrSamplesPerSec=5.704953608956427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:33:00,960] [INFO] [timer.py:197:stop] 0/1650, RunningAvgSamplesPerSec=6.325823450903929, CurrSamplesPerSec=5.680258819690628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:33:12,573] [INFO] [timer.py:197:stop] 0/1652, RunningAvgSamplesPerSec=6.325804218173235, CurrSamplesPerSec=5.679047484656009, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:33:23,893] [INFO] [timer.py:197:stop] 0/1654, RunningAvgSamplesPerSec=6.32579980520833, CurrSamplesPerSec=5.700284645317121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:33:35,313] [INFO] [timer.py:197:stop] 0/1656, RunningAvgSamplesPerSec=6.325790308407279, CurrSamplesPerSec=5.678851172062457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0107, 'learning_rate': 9.286666666666667e-06, 'epoch': 6.2} [2022-12-19 02:33:46,832] [INFO] [timer.py:197:stop] 0/1658, RunningAvgSamplesPerSec=6.325774841378826, CurrSamplesPerSec=5.688787376425617, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:33:58,384] [INFO] [logging.py:68:log_dist] [Rank 0] step=830, 
skipped=6, lr=[9.282222222222222e-06], mom=[[0.9, 0.999]] [2022-12-19 02:33:58,386] [INFO] [timer.py:197:stop] 0/1660, RunningAvgSamplesPerSec=6.325765133165544, CurrSamplesPerSec=5.691613234976137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:34:09,852] [INFO] [timer.py:197:stop] 0/1662, RunningAvgSamplesPerSec=6.325790063809963, CurrSamplesPerSec=5.723681058742841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:34:21,445] [INFO] [timer.py:197:stop] 0/1664, RunningAvgSamplesPerSec=6.325726072017138, CurrSamplesPerSec=5.675937394475658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:34:32,936] [INFO] [timer.py:197:stop] 0/1666, RunningAvgSamplesPerSec=6.325747487353222, CurrSamplesPerSec=5.727400154986362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:34:44,481] [INFO] [timer.py:197:stop] 0/1668, RunningAvgSamplesPerSec=6.325736716605278, CurrSamplesPerSec=5.686584411893967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:34:55,846] [INFO] [timer.py:197:stop] 0/1670, RunningAvgSamplesPerSec=6.325709496133252, CurrSamplesPerSec=5.670677218422788, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:35:07,233] [INFO] [timer.py:197:stop] 0/1672, RunningAvgSamplesPerSec=6.325692081907887, CurrSamplesPerSec=5.669443623098537, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:35:18,607] [INFO] [timer.py:197:stop] 0/1674, RunningAvgSamplesPerSec=6.325669740927831, CurrSamplesPerSec=5.670831755082847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:35:30,032] [INFO] [timer.py:197:stop] 0/1676, RunningAvgSamplesPerSec=6.32565147997511, CurrSamplesPerSec=5.685375916794034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:35:41,367] [INFO] [timer.py:197:stop] 0/1678, RunningAvgSamplesPerSec=6.3256527981282185, CurrSamplesPerSec=5.694455431858697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:35:52,673] [INFO] [logging.py:68:log_dist] [Rank 0] step=840, skipped=6, 
lr=[9.260000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 02:35:52,675] [INFO] [timer.py:197:stop] 0/1680, RunningAvgSamplesPerSec=6.3256940710872165, CurrSamplesPerSec=5.719125912553076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:36:03,996] [INFO] [timer.py:197:stop] 0/1682, RunningAvgSamplesPerSec=6.325705088917455, CurrSamplesPerSec=5.707339741038364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:36:15,447] [INFO] [timer.py:197:stop] 0/1684, RunningAvgSamplesPerSec=6.325731287157256, CurrSamplesPerSec=5.697480645345056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:36:26,815] [INFO] [timer.py:197:stop] 0/1686, RunningAvgSamplesPerSec=6.325751333721884, CurrSamplesPerSec=5.714498693268473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:36:38,126] [INFO] [timer.py:197:stop] 0/1688, RunningAvgSamplesPerSec=6.325778904785094, CurrSamplesPerSec=5.72610511603857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:36:49,425] [INFO] [timer.py:197:stop] 0/1690, RunningAvgSamplesPerSec=6.325807550331122, CurrSamplesPerSec=5.7276685212131735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:37:00,693] [INFO] [timer.py:197:stop] 0/1692, RunningAvgSamplesPerSec=6.325853586564008, CurrSamplesPerSec=5.729620432047673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:37:12,023] [INFO] [timer.py:197:stop] 0/1694, RunningAvgSamplesPerSec=6.325859705877085, CurrSamplesPerSec=5.695849314014534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:37:23,290] [INFO] [timer.py:197:stop] 0/1696, RunningAvgSamplesPerSec=6.325888818823823, CurrSamplesPerSec=5.716556555853943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:37:34,576] [INFO] [timer.py:197:stop] 0/1698, RunningAvgSamplesPerSec=6.3258855149655515, CurrSamplesPerSec=5.69654409325819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:37:45,931] [INFO] [logging.py:68:log_dist] [Rank 0] step=850, skipped=6, 
lr=[9.237777777777779e-06], mom=[[0.9, 0.999]] [2022-12-19 02:37:45,933] [INFO] [timer.py:197:stop] 0/1700, RunningAvgSamplesPerSec=6.32589653935007, CurrSamplesPerSec=5.69615606933963, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:37:57,348] [INFO] [timer.py:197:stop] 0/1702, RunningAvgSamplesPerSec=6.325929302907298, CurrSamplesPerSec=5.716632278410412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:38:08,900] [INFO] [timer.py:197:stop] 0/1704, RunningAvgSamplesPerSec=6.325942865495223, CurrSamplesPerSec=5.695347312163867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:38:20,206] [INFO] [timer.py:197:stop] 0/1706, RunningAvgSamplesPerSec=6.325962562735163, CurrSamplesPerSec=5.703012165144726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0107, 'learning_rate': 9.231111111111111e-06, 'epoch': 6.39} [2022-12-19 02:38:31,605] [INFO] [timer.py:197:stop] 0/1708, RunningAvgSamplesPerSec=6.325945546495006, CurrSamplesPerSec=5.682345497873678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:38:43,024] [INFO] [timer.py:197:stop] 0/1710, RunningAvgSamplesPerSec=6.3259615202519575, CurrSamplesPerSec=5.694182438465663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:38:54,319] [INFO] [timer.py:197:stop] 0/1712, RunningAvgSamplesPerSec=6.326003940697372, CurrSamplesPerSec=5.730686805961176, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:39:05,779] [INFO] [timer.py:197:stop] 0/1714, RunningAvgSamplesPerSec=6.326035083565533, CurrSamplesPerSec=5.728669372586125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:39:17,081] [INFO] [timer.py:197:stop] 0/1716, RunningAvgSamplesPerSec=6.326062416380829, CurrSamplesPerSec=5.712293035463973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:39:28,435] [INFO] [timer.py:197:stop] 0/1718, RunningAvgSamplesPerSec=6.326086066699801, CurrSamplesPerSec=5.708741395591852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:39:39,741] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=860, skipped=6, lr=[9.215555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 02:39:39,742] [INFO] [timer.py:197:stop] 0/1720, RunningAvgSamplesPerSec=6.326086014131042, CurrSamplesPerSec=5.68582293025891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:39:51,073] [INFO] [timer.py:197:stop] 0/1722, RunningAvgSamplesPerSec=6.32609881723683, CurrSamplesPerSec=5.704984162896505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:40:02,377] [INFO] [timer.py:197:stop] 0/1724, RunningAvgSamplesPerSec=6.326118499130401, CurrSamplesPerSec=5.690084645575323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:40:13,672] [INFO] [timer.py:197:stop] 0/1726, RunningAvgSamplesPerSec=6.326140877664618, CurrSamplesPerSec=5.721628796504794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:40:24,889] [INFO] [timer.py:197:stop] 0/1728, RunningAvgSamplesPerSec=6.32619310797244, CurrSamplesPerSec=5.738877753763001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:40:36,201] [INFO] [timer.py:197:stop] 0/1730, RunningAvgSamplesPerSec=6.326237576628503, CurrSamplesPerSec=5.728241266578177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:40:47,473] [INFO] [timer.py:197:stop] 0/1732, RunningAvgSamplesPerSec=6.326262172904851, CurrSamplesPerSec=5.730709806237929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:40:58,837] [INFO] [timer.py:197:stop] 0/1734, RunningAvgSamplesPerSec=6.326255019400872, CurrSamplesPerSec=5.733883656497388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:41:10,136] [INFO] [timer.py:197:stop] 0/1736, RunningAvgSamplesPerSec=6.326269407190393, CurrSamplesPerSec=5.707664482986645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:41:21,517] [INFO] [timer.py:197:stop] 0/1738, RunningAvgSamplesPerSec=6.326293877908873, CurrSamplesPerSec=5.706402613884454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:41:32,827] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=870, skipped=6, lr=[9.193333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 02:41:32,828] [INFO] [timer.py:197:stop] 0/1740, RunningAvgSamplesPerSec=6.3262987345584145, CurrSamplesPerSec=5.704881347765542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:41:44,314] [INFO] [timer.py:197:stop] 0/1742, RunningAvgSamplesPerSec=6.32632362263264, CurrSamplesPerSec=5.731097413139867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:41:55,637] [INFO] [timer.py:197:stop] 0/1744, RunningAvgSamplesPerSec=6.32633615446047, CurrSamplesPerSec=5.703901637606746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:42:07,101] [INFO] [timer.py:197:stop] 0/1746, RunningAvgSamplesPerSec=6.326362263159016, CurrSamplesPerSec=5.713981479183356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:42:18,482] [INFO] [timer.py:197:stop] 0/1748, RunningAvgSamplesPerSec=6.326385100761066, CurrSamplesPerSec=5.709265918559529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:42:29,775] [INFO] [timer.py:197:stop] 0/1750, RunningAvgSamplesPerSec=6.3263899011333615, CurrSamplesPerSec=5.690672578757388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:42:41,118] [INFO] [timer.py:197:stop] 0/1752, RunningAvgSamplesPerSec=6.326382952351564, CurrSamplesPerSec=5.673505973662093, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:42:52,555] [INFO] [timer.py:197:stop] 0/1754, RunningAvgSamplesPerSec=6.326335881692181, CurrSamplesPerSec=5.618049387935346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:43:03,893] [INFO] [timer.py:197:stop] 0/1756, RunningAvgSamplesPerSec=6.326334393712957, CurrSamplesPerSec=5.6876541068041035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0108, 'learning_rate': 9.175555555555557e-06, 'epoch': 6.58} [2022-12-19 02:43:15,226] [INFO] [timer.py:197:stop] 0/1758, RunningAvgSamplesPerSec=6.3263368233047546, CurrSamplesPerSec=5.693883625263997, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:43:26,560] [INFO] [logging.py:68:log_dist] [Rank 0] step=880, skipped=6, lr=[9.171111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 02:43:26,562] [INFO] [timer.py:197:stop] 0/1760, RunningAvgSamplesPerSec=6.3263320887501004, CurrSamplesPerSec=5.6971232054522165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:43:37,884] [INFO] [timer.py:197:stop] 0/1762, RunningAvgSamplesPerSec=6.326337614422194, CurrSamplesPerSec=5.689881056495939, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:43:49,200] [INFO] [timer.py:197:stop] 0/1764, RunningAvgSamplesPerSec=6.326341517798706, CurrSamplesPerSec=5.6905075495471955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:44:00,621] [INFO] [timer.py:197:stop] 0/1766, RunningAvgSamplesPerSec=6.326333758352931, CurrSamplesPerSec=5.678917729401463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:44:12,101] [INFO] [timer.py:197:stop] 0/1768, RunningAvgSamplesPerSec=6.32628541309422, CurrSamplesPerSec=5.621093756380219, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:44:23,452] [INFO] [timer.py:197:stop] 0/1770, RunningAvgSamplesPerSec=6.3262677174406745, CurrSamplesPerSec=5.67610229979496, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:44:34,782] [INFO] [timer.py:197:stop] 0/1772, RunningAvgSamplesPerSec=6.326270866523503, CurrSamplesPerSec=5.692841768640046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:44:46,153] [INFO] [timer.py:197:stop] 0/1774, RunningAvgSamplesPerSec=6.326288454194321, CurrSamplesPerSec=5.708204830629907, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:44:57,488] [INFO] [timer.py:197:stop] 0/1776, RunningAvgSamplesPerSec=6.326286738967611, CurrSamplesPerSec=5.691912051303222, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:45:08,815] [INFO] [timer.py:197:stop] 0/1778, RunningAvgSamplesPerSec=6.326285530290613, CurrSamplesPerSec=5.703175254718095, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:45:20,177] [INFO] [logging.py:68:log_dist] [Rank 0] step=890, skipped=6, lr=[9.14888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 02:45:20,178] [INFO] [timer.py:197:stop] 0/1780, RunningAvgSamplesPerSec=6.3263004923385475, CurrSamplesPerSec=5.714217936093594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:45:31,559] [INFO] [timer.py:197:stop] 0/1782, RunningAvgSamplesPerSec=6.326297856203596, CurrSamplesPerSec=5.6967970020979015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:45:43,019] [INFO] [timer.py:197:stop] 0/1784, RunningAvgSamplesPerSec=6.326290658807318, CurrSamplesPerSec=5.692538508838428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:45:54,358] [INFO] [timer.py:197:stop] 0/1786, RunningAvgSamplesPerSec=6.3262856842098865, CurrSamplesPerSec=5.697424777289617, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:46:05,658] [INFO] [timer.py:197:stop] 0/1788, RunningAvgSamplesPerSec=6.326306571710764, CurrSamplesPerSec=5.718891486053244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:46:17,122] [INFO] [timer.py:197:stop] 0/1790, RunningAvgSamplesPerSec=6.326285683438104, CurrSamplesPerSec=5.663022994110601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:46:28,493] [INFO] [timer.py:197:stop] 0/1792, RunningAvgSamplesPerSec=6.326273363068191, CurrSamplesPerSec=5.679099869012911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:46:39,834] [INFO] [timer.py:197:stop] 0/1794, RunningAvgSamplesPerSec=6.326264734650878, CurrSamplesPerSec=5.688326637364776, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:46:51,160] [INFO] [timer.py:197:stop] 0/1796, RunningAvgSamplesPerSec=6.326274303607092, CurrSamplesPerSec=5.713274171438841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:47:02,510] [INFO] [timer.py:197:stop] 0/1798, RunningAvgSamplesPerSec=6.326277232252197, CurrSamplesPerSec=5.703496856883192, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:47:13,952] [INFO] [logging.py:68:log_dist] [Rank 0] step=900, skipped=6, lr=[9.126666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 02:47:13,954] [INFO] [timer.py:197:stop] 0/1800, RunningAvgSamplesPerSec=6.326290437167872, CurrSamplesPerSec=5.705799297939293, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:47:25,315] [INFO] [timer.py:197:stop] 0/1802, RunningAvgSamplesPerSec=6.326289168268624, CurrSamplesPerSec=5.698488635203304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:47:36,792] [INFO] [timer.py:197:stop] 0/1804, RunningAvgSamplesPerSec=6.32627364734703, CurrSamplesPerSec=5.67324385783393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:47:48,200] [INFO] [timer.py:197:stop] 0/1806, RunningAvgSamplesPerSec=6.326275620745158, CurrSamplesPerSec=5.6986081565692555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0123, 'learning_rate': 9.12e-06, 'epoch': 6.76} [2022-12-19 02:47:59,543] [INFO] [timer.py:197:stop] 0/1808, RunningAvgSamplesPerSec=6.326266962597003, CurrSamplesPerSec=5.709048812640666, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:48:11,019] [INFO] [timer.py:197:stop] 0/1810, RunningAvgSamplesPerSec=6.326251483715628, CurrSamplesPerSec=5.674714945601174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:48:22,345] [INFO] [timer.py:197:stop] 0/1812, RunningAvgSamplesPerSec=6.326247948283754, CurrSamplesPerSec=5.681360044615221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:48:33,717] [INFO] [timer.py:197:stop] 0/1814, RunningAvgSamplesPerSec=6.3262437806871485, CurrSamplesPerSec=5.697264918474079, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:48:45,242] [INFO] [timer.py:197:stop] 0/1816, RunningAvgSamplesPerSec=6.326224041904885, CurrSamplesPerSec=5.669574622213104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:48:56,523] [INFO] [timer.py:197:stop] 0/1818, 
RunningAvgSamplesPerSec=6.32624519155112, CurrSamplesPerSec=5.709860251548659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:49:07,836] [INFO] [logging.py:68:log_dist] [Rank 0] step=910, skipped=6, lr=[9.104444444444444e-06], mom=[[0.9, 0.999]] [2022-12-19 02:49:07,838] [INFO] [timer.py:197:stop] 0/1820, RunningAvgSamplesPerSec=6.326258283689595, CurrSamplesPerSec=5.690251580192909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:49:19,161] [INFO] [timer.py:197:stop] 0/1822, RunningAvgSamplesPerSec=6.326272794780122, CurrSamplesPerSec=5.680835827541261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:49:30,458] [INFO] [timer.py:197:stop] 0/1824, RunningAvgSamplesPerSec=6.326289842706597, CurrSamplesPerSec=5.686278445695439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:49:41,771] [INFO] [timer.py:197:stop] 0/1826, RunningAvgSamplesPerSec=6.3263006088827645, CurrSamplesPerSec=5.69110522269296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:49:53,109] [INFO] [timer.py:197:stop] 0/1828, RunningAvgSamplesPerSec=6.326305863765418, CurrSamplesPerSec=5.686967759321349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:50:04,455] [INFO] [timer.py:197:stop] 0/1830, RunningAvgSamplesPerSec=6.326295565006565, CurrSamplesPerSec=5.687589513673625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:50:15,833] [INFO] [timer.py:197:stop] 0/1832, RunningAvgSamplesPerSec=6.326279534779694, CurrSamplesPerSec=5.6756726538654725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:50:27,102] [INFO] [timer.py:197:stop] 0/1834, RunningAvgSamplesPerSec=6.32628702156709, CurrSamplesPerSec=5.707074732393959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:50:38,443] [INFO] [timer.py:197:stop] 0/1836, RunningAvgSamplesPerSec=6.326280456792154, CurrSamplesPerSec=5.6858901328649205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:50:49,764] [INFO] [timer.py:197:stop] 0/1838, 
RunningAvgSamplesPerSec=6.326293705340914, CurrSamplesPerSec=5.700904230026162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:51:01,110] [INFO] [logging.py:68:log_dist] [Rank 0] step=920, skipped=6, lr=[9.082222222222224e-06], mom=[[0.9, 0.999]] [2022-12-19 02:51:01,112] [INFO] [timer.py:197:stop] 0/1840, RunningAvgSamplesPerSec=6.326294305995962, CurrSamplesPerSec=5.69232943292039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:51:12,423] [INFO] [timer.py:197:stop] 0/1842, RunningAvgSamplesPerSec=6.326313116856321, CurrSamplesPerSec=5.705744479411721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:51:23,741] [INFO] [timer.py:197:stop] 0/1844, RunningAvgSamplesPerSec=6.32631566493016, CurrSamplesPerSec=5.700933772012223, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:51:35,094] [INFO] [timer.py:197:stop] 0/1846, RunningAvgSamplesPerSec=6.326311378399368, CurrSamplesPerSec=5.69037003251689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:51:46,431] [INFO] [timer.py:197:stop] 0/1848, RunningAvgSamplesPerSec=6.326334065030902, CurrSamplesPerSec=5.71306527187534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:51:57,727] [INFO] [timer.py:197:stop] 0/1850, RunningAvgSamplesPerSec=6.326353006935345, CurrSamplesPerSec=5.700662094138969, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:52:09,037] [INFO] [timer.py:197:stop] 0/1852, RunningAvgSamplesPerSec=6.326371952645956, CurrSamplesPerSec=5.706567595824296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:52:20,373] [INFO] [timer.py:197:stop] 0/1854, RunningAvgSamplesPerSec=6.326381012581383, CurrSamplesPerSec=5.707263051024697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:52:31,697] [INFO] [timer.py:197:stop] 0/1856, RunningAvgSamplesPerSec=6.326385405224525, CurrSamplesPerSec=5.7047732018864235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0113, 'learning_rate': 9.064444444444447e-06, 'epoch': 6.95} 
[2022-12-19 02:52:43,074] [INFO] [timer.py:197:stop] 0/1858, RunningAvgSamplesPerSec=6.326360052318767, CurrSamplesPerSec=5.62437144846066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:52:54,427] [INFO] [logging.py:68:log_dist] [Rank 0] step=930, skipped=6, lr=[9.060000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 02:52:54,429] [INFO] [timer.py:197:stop] 0/1860, RunningAvgSamplesPerSec=6.326355866282421, CurrSamplesPerSec=5.6719077553516275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:53:05,747] [INFO] [timer.py:197:stop] 0/1862, RunningAvgSamplesPerSec=6.326375388595798, CurrSamplesPerSec=5.70627912629274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:53:17,021] [INFO] [timer.py:197:stop] 0/1864, RunningAvgSamplesPerSec=6.326393589824482, CurrSamplesPerSec=5.701821627330498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:53:28,389] [INFO] [timer.py:197:stop] 0/1866, RunningAvgSamplesPerSec=6.326390856696775, CurrSamplesPerSec=5.694816161501644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:53:39,718] [INFO] [timer.py:197:stop] 0/1868, RunningAvgSamplesPerSec=6.326391776659984, CurrSamplesPerSec=5.700212018180171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:53:50,140] [INFO] [timer.py:197:stop] 0/1870, RunningAvgSamplesPerSec=6.326924106132798, CurrSamplesPerSec=5.671705944240296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:54:01,529] [INFO] [timer.py:197:stop] 0/1872, RunningAvgSamplesPerSec=6.326892313786539, CurrSamplesPerSec=5.644772119484631, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:54:12,878] [INFO] [timer.py:197:stop] 0/1874, RunningAvgSamplesPerSec=6.326871510060623, CurrSamplesPerSec=5.669993764253113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:54:24,195] [INFO] [timer.py:197:stop] 0/1876, RunningAvgSamplesPerSec=6.32687887416701, CurrSamplesPerSec=5.675905710708204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
02:54:35,525] [INFO] [timer.py:197:stop] 0/1878, RunningAvgSamplesPerSec=6.326893190924998, CurrSamplesPerSec=5.704296293525905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:54:46,865] [INFO] [logging.py:68:log_dist] [Rank 0] step=940, skipped=6, lr=[9.037777777777779e-06], mom=[[0.9, 0.999]] [2022-12-19 02:54:46,867] [INFO] [timer.py:197:stop] 0/1880, RunningAvgSamplesPerSec=6.326893723872482, CurrSamplesPerSec=5.686018520754404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:54:58,219] [INFO] [timer.py:197:stop] 0/1882, RunningAvgSamplesPerSec=6.326882888179903, CurrSamplesPerSec=5.674984155904965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:55:09,747] [INFO] [timer.py:197:stop] 0/1884, RunningAvgSamplesPerSec=6.326854854129708, CurrSamplesPerSec=5.647342920570599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:55:21,604] [INFO] [timer.py:197:stop] 0/1886, RunningAvgSamplesPerSec=6.3268220619883175, CurrSamplesPerSec=5.651489537135508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:55:33,286] [INFO] [timer.py:197:stop] 0/1888, RunningAvgSamplesPerSec=6.326784679478798, CurrSamplesPerSec=5.65706368557152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:55:44,857] [INFO] [timer.py:197:stop] 0/1890, RunningAvgSamplesPerSec=6.326785351859433, CurrSamplesPerSec=5.689782161845206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:55:56,230] [INFO] [timer.py:197:stop] 0/1892, RunningAvgSamplesPerSec=6.326768671987495, CurrSamplesPerSec=5.67753668401372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:56:07,707] [INFO] [timer.py:197:stop] 0/1894, RunningAvgSamplesPerSec=6.32673878130999, CurrSamplesPerSec=5.653865203876641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:56:19,060] [INFO] [timer.py:197:stop] 0/1896, RunningAvgSamplesPerSec=6.326736501509452, CurrSamplesPerSec=5.696846329192205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:56:30,555] 
[INFO] [timer.py:197:stop] 0/1898, RunningAvgSamplesPerSec=6.32674735316769, CurrSamplesPerSec=5.70047929517065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:56:41,901] [INFO] [logging.py:68:log_dist] [Rank 0] step=950, skipped=6, lr=[9.015555555555557e-06], mom=[[0.9, 0.999]] [2022-12-19 02:56:41,902] [INFO] [timer.py:197:stop] 0/1900, RunningAvgSamplesPerSec=6.3267513569449605, CurrSamplesPerSec=5.689371906362256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:56:53,290] [INFO] [timer.py:197:stop] 0/1902, RunningAvgSamplesPerSec=6.3267480362392785, CurrSamplesPerSec=5.687192346372233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:57:04,582] [INFO] [timer.py:197:stop] 0/1904, RunningAvgSamplesPerSec=6.326770316587786, CurrSamplesPerSec=5.708168658612521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:57:15,866] [INFO] [timer.py:197:stop] 0/1906, RunningAvgSamplesPerSec=6.326787952251769, CurrSamplesPerSec=5.702025587327711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0086, 'learning_rate': 9.008888888888889e-06, 'epoch': 7.14} [2022-12-19 02:57:27,382] [INFO] [timer.py:197:stop] 0/1908, RunningAvgSamplesPerSec=6.326766090302302, CurrSamplesPerSec=5.660147153665728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:57:38,708] [INFO] [timer.py:197:stop] 0/1910, RunningAvgSamplesPerSec=6.326780368445381, CurrSamplesPerSec=5.694250080406871, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:57:50,025] [INFO] [timer.py:197:stop] 0/1912, RunningAvgSamplesPerSec=6.326786052597571, CurrSamplesPerSec=5.697235656333974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:58:01,406] [INFO] [timer.py:197:stop] 0/1914, RunningAvgSamplesPerSec=6.3267708296819265, CurrSamplesPerSec=5.665626712628445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:58:12,849] [INFO] [timer.py:197:stop] 0/1916, RunningAvgSamplesPerSec=6.326775654447717, CurrSamplesPerSec=5.69344597005626, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:58:24,285] [INFO] [timer.py:197:stop] 0/1918, RunningAvgSamplesPerSec=6.3267440628330505, CurrSamplesPerSec=5.641554793761365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:58:35,776] [INFO] [logging.py:68:log_dist] [Rank 0] step=960, skipped=6, lr=[8.993333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 02:58:35,777] [INFO] [timer.py:197:stop] 0/1920, RunningAvgSamplesPerSec=6.3267433567481985, CurrSamplesPerSec=5.6891596868866126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:58:47,244] [INFO] [timer.py:197:stop] 0/1922, RunningAvgSamplesPerSec=6.326729414038707, CurrSamplesPerSec=5.666347628416906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:58:58,611] [INFO] [timer.py:197:stop] 0/1924, RunningAvgSamplesPerSec=6.326715766360123, CurrSamplesPerSec=5.680182374944682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:59:10,086] [INFO] [timer.py:197:stop] 0/1926, RunningAvgSamplesPerSec=6.326698267238547, CurrSamplesPerSec=5.690238311909788, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:59:21,478] [INFO] [timer.py:197:stop] 0/1928, RunningAvgSamplesPerSec=6.326682552091262, CurrSamplesPerSec=5.671927409932752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:59:33,005] [INFO] [timer.py:197:stop] 0/1930, RunningAvgSamplesPerSec=6.3266662201263, CurrSamplesPerSec=5.673839588852302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:59:44,394] [INFO] [timer.py:197:stop] 0/1932, RunningAvgSamplesPerSec=6.326647452117153, CurrSamplesPerSec=5.67752227414039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 02:59:55,928] [INFO] [timer.py:197:stop] 0/1934, RunningAvgSamplesPerSec=6.326632547130804, CurrSamplesPerSec=5.696782010699283, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:00:07,308] [INFO] [timer.py:197:stop] 0/1936, RunningAvgSamplesPerSec=6.326612853608152, CurrSamplesPerSec=5.679080645283451, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:00:18,847] [INFO] [timer.py:197:stop] 0/1938, RunningAvgSamplesPerSec=6.326582143690067, CurrSamplesPerSec=5.660932812000199, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:00:30,250] [INFO] [logging.py:68:log_dist] [Rank 0] step=970, skipped=6, lr=[8.971111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 03:00:30,252] [INFO] [timer.py:197:stop] 0/1940, RunningAvgSamplesPerSec=6.3265648412872775, CurrSamplesPerSec=5.666948850761994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:00:41,691] [INFO] [timer.py:197:stop] 0/1942, RunningAvgSamplesPerSec=6.326569243293468, CurrSamplesPerSec=5.711902863153607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:00:53,099] [INFO] [timer.py:197:stop] 0/1944, RunningAvgSamplesPerSec=6.326572842640973, CurrSamplesPerSec=5.695367371204844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:01:04,422] [INFO] [timer.py:197:stop] 0/1946, RunningAvgSamplesPerSec=6.32657294453668, CurrSamplesPerSec=5.70359210849502, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:01:15,862] [INFO] [timer.py:197:stop] 0/1948, RunningAvgSamplesPerSec=6.326541206508122, CurrSamplesPerSec=5.651977410741608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:01:27,324] [INFO] [timer.py:197:stop] 0/1950, RunningAvgSamplesPerSec=6.3265403664385085, CurrSamplesPerSec=5.702550815146744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:01:38,645] [INFO] [timer.py:197:stop] 0/1952, RunningAvgSamplesPerSec=6.326553403004653, CurrSamplesPerSec=5.705305484067998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:01:49,928] [INFO] [timer.py:197:stop] 0/1954, RunningAvgSamplesPerSec=6.326570279442877, CurrSamplesPerSec=5.703029370291108, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:02:01,286] [INFO] [timer.py:197:stop] 0/1956, RunningAvgSamplesPerSec=6.32656451884155, CurrSamplesPerSec=5.677128432591143, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0077, 'learning_rate': 8.953333333333335e-06, 'epoch': 7.33} [2022-12-19 03:02:12,657] [INFO] [timer.py:197:stop] 0/1958, RunningAvgSamplesPerSec=6.326569271199113, CurrSamplesPerSec=5.707258197295713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:02:24,099] [INFO] [logging.py:68:log_dist] [Rank 0] step=980, skipped=6, lr=[8.94888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 03:02:24,101] [INFO] [timer.py:197:stop] 0/1960, RunningAvgSamplesPerSec=6.326584482872275, CurrSamplesPerSec=5.7166067126927596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:02:35,402] [INFO] [timer.py:197:stop] 0/1962, RunningAvgSamplesPerSec=6.326595460253956, CurrSamplesPerSec=5.7105045151488145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:02:46,761] [INFO] [timer.py:197:stop] 0/1964, RunningAvgSamplesPerSec=6.326610964543361, CurrSamplesPerSec=5.704105988860188, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:02:58,277] [INFO] [timer.py:197:stop] 0/1966, RunningAvgSamplesPerSec=6.326613509952827, CurrSamplesPerSec=5.690958265981202, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:03:09,639] [INFO] [timer.py:197:stop] 0/1968, RunningAvgSamplesPerSec=6.326608995818242, CurrSamplesPerSec=5.677601049007638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:03:20,990] [INFO] [timer.py:197:stop] 0/1970, RunningAvgSamplesPerSec=6.326593487791063, CurrSamplesPerSec=5.678826663976843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:03:32,292] [INFO] [timer.py:197:stop] 0/1972, RunningAvgSamplesPerSec=6.326593152000837, CurrSamplesPerSec=5.699780167028214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:03:43,656] [INFO] [timer.py:197:stop] 0/1974, RunningAvgSamplesPerSec=6.326590465498463, CurrSamplesPerSec=5.681751106218245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:03:54,964] [INFO] [timer.py:197:stop] 0/1976, 
RunningAvgSamplesPerSec=6.32659258352864, CurrSamplesPerSec=5.6757112953345175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:04:06,280] [INFO] [timer.py:197:stop] 0/1978, RunningAvgSamplesPerSec=6.3266049487473435, CurrSamplesPerSec=5.706643781880916, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:04:17,582] [INFO] [logging.py:68:log_dist] [Rank 0] step=990, skipped=6, lr=[8.926666666666669e-06], mom=[[0.9, 0.999]] [2022-12-19 03:04:17,584] [INFO] [timer.py:197:stop] 0/1980, RunningAvgSamplesPerSec=6.3266198016964, CurrSamplesPerSec=5.708158219793414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:04:28,954] [INFO] [timer.py:197:stop] 0/1982, RunningAvgSamplesPerSec=6.326592497307985, CurrSamplesPerSec=5.714028914957275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:04:40,380] [INFO] [timer.py:197:stop] 0/1984, RunningAvgSamplesPerSec=6.326541972860564, CurrSamplesPerSec=5.600063420060266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:04:51,717] [INFO] [timer.py:197:stop] 0/1986, RunningAvgSamplesPerSec=6.326534796318091, CurrSamplesPerSec=5.69550633832067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:05:03,067] [INFO] [timer.py:197:stop] 0/1988, RunningAvgSamplesPerSec=6.326535161708194, CurrSamplesPerSec=5.678836274965597, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:05:14,353] [INFO] [timer.py:197:stop] 0/1990, RunningAvgSamplesPerSec=6.32654844691735, CurrSamplesPerSec=5.72301112403001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:05:25,711] [INFO] [timer.py:197:stop] 0/1992, RunningAvgSamplesPerSec=6.326516071676392, CurrSamplesPerSec=5.706936898799184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:05:36,991] [INFO] [timer.py:197:stop] 0/1994, RunningAvgSamplesPerSec=6.326542117678622, CurrSamplesPerSec=5.71591116943807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:05:48,312] [INFO] [timer.py:197:stop] 0/1996, 
RunningAvgSamplesPerSec=6.326543655886002, CurrSamplesPerSec=5.698899722633566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:05:59,647] [INFO] [timer.py:197:stop] 0/1998, RunningAvgSamplesPerSec=6.326541233446142, CurrSamplesPerSec=5.6959965234396215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:06:11,045] [INFO] [logging.py:68:log_dist] [Rank 0] step=1000, skipped=6, lr=[8.904444444444446e-06], mom=[[0.9, 0.999]] [2022-12-19 03:06:11,047] [INFO] [timer.py:197:stop] 0/2000, RunningAvgSamplesPerSec=6.3265184925558104, CurrSamplesPerSec=5.647131924107018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:06:22,401] [INFO] [timer.py:197:stop] 0/2002, RunningAvgSamplesPerSec=6.3265128000256885, CurrSamplesPerSec=5.694423782610695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:06:33,753] [INFO] [timer.py:197:stop] 0/2004, RunningAvgSamplesPerSec=6.326514866019954, CurrSamplesPerSec=5.705248977420315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:06:45,054] [INFO] [timer.py:197:stop] 0/2006, RunningAvgSamplesPerSec=6.326516704682161, CurrSamplesPerSec=5.684222339947914, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0076, 'learning_rate': 8.897777777777779e-06, 'epoch': 7.52} {'eval_loss': 0.2607421875, 'eval_wer': 16.033204862140526, 'eval_runtime': 1390.3082, 'eval_samples_per_second': 3.331, 'eval_steps_per_second': 0.416, 'epoch': 7.52} [2022-12-19 03:30:04,054] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step1003 is begin to save! [2022-12-19 03:30:04,065] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-1000/global_step1003/mp_rank_00_model_states.pt [2022-12-19 03:30:04,065] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-1000/global_step1003/mp_rank_00_model_states.pt... [2022-12-19 03:30:07,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-1000/global_step1003/mp_rank_00_model_states.pt. 
[2022-12-19 03:30:07,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-1000/global_step1003/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-19 03:30:24,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-1000/global_step1003/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2022-12-19 03:30:24,042] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-1000/global_step1003/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-19 03:30:24,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1003 is ready now! [2022-12-19 03:31:08,771] [INFO] [timer.py:197:stop] 0/2008, RunningAvgSamplesPerSec=6.3263331628828965, CurrSamplesPerSec=5.402123414165982, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:31:20,026] [INFO] [timer.py:197:stop] 0/2010, RunningAvgSamplesPerSec=6.326363826599856, CurrSamplesPerSec=5.716381987699988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:31:31,436] [INFO] [timer.py:197:stop] 0/2012, RunningAvgSamplesPerSec=6.3263469103211305, CurrSamplesPerSec=5.6668701319433685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:31:42,707] [INFO] [timer.py:197:stop] 0/2014, RunningAvgSamplesPerSec=6.326361497095827, CurrSamplesPerSec=5.692060988442147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:31:54,091] [INFO] [timer.py:197:stop] 0/2016, RunningAvgSamplesPerSec=6.326373793928216, CurrSamplesPerSec=5.704518129641215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:32:05,516] [INFO] [timer.py:197:stop] 0/2018, RunningAvgSamplesPerSec=6.326346056805544, CurrSamplesPerSec=5.655095125981363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:32:16,823] [INFO] [logging.py:68:log_dist] [Rank 0] step=1010, skipped=6, lr=[8.882222222222224e-06], mom=[[0.9, 0.999]] [2022-12-19 03:32:16,825] [INFO] [timer.py:197:stop] 0/2020, RunningAvgSamplesPerSec=6.326342233988904, 
CurrSamplesPerSec=5.685529810586304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:32:28,091] [INFO] [timer.py:197:stop] 0/2022, RunningAvgSamplesPerSec=6.32635378833098, CurrSamplesPerSec=5.695872518935956, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:32:39,417] [INFO] [timer.py:197:stop] 0/2024, RunningAvgSamplesPerSec=6.32635341306856, CurrSamplesPerSec=5.685215770128713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:32:50,989] [INFO] [timer.py:197:stop] 0/2026, RunningAvgSamplesPerSec=6.326344806584471, CurrSamplesPerSec=5.682632514971568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:33:02,418] [INFO] [timer.py:197:stop] 0/2028, RunningAvgSamplesPerSec=6.326348331850942, CurrSamplesPerSec=5.703409121572165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:33:13,710] [INFO] [timer.py:197:stop] 0/2030, RunningAvgSamplesPerSec=6.3263548331244, CurrSamplesPerSec=5.709211761957996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:33:25,073] [INFO] [timer.py:197:stop] 0/2032, RunningAvgSamplesPerSec=6.326319499731557, CurrSamplesPerSec=5.632696244264232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:33:36,360] [INFO] [timer.py:197:stop] 0/2034, RunningAvgSamplesPerSec=6.326310038447824, CurrSamplesPerSec=5.668177769935911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:33:47,842] [INFO] [timer.py:197:stop] 0/2036, RunningAvgSamplesPerSec=6.326321507858014, CurrSamplesPerSec=5.690846545349208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:33:59,107] [INFO] [timer.py:197:stop] 0/2038, RunningAvgSamplesPerSec=6.326346672753627, CurrSamplesPerSec=5.713541458698382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:34:10,402] [INFO] [logging.py:68:log_dist] [Rank 0] step=1020, skipped=6, lr=[8.860000000000002e-06], mom=[[0.9, 0.999]] [2022-12-19 03:34:10,404] [INFO] [timer.py:197:stop] 0/2040, RunningAvgSamplesPerSec=6.326339826359868, 
CurrSamplesPerSec=5.675615292654977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:34:21,665] [INFO] [timer.py:197:stop] 0/2042, RunningAvgSamplesPerSec=6.326356920528593, CurrSamplesPerSec=5.693398875409551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:34:33,002] [INFO] [timer.py:197:stop] 0/2044, RunningAvgSamplesPerSec=6.3263661191683624, CurrSamplesPerSec=5.698837293530415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:34:44,257] [INFO] [timer.py:197:stop] 0/2046, RunningAvgSamplesPerSec=6.3263852962308444, CurrSamplesPerSec=5.703746020494652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:34:55,581] [INFO] [timer.py:197:stop] 0/2048, RunningAvgSamplesPerSec=6.326378742356056, CurrSamplesPerSec=5.690288007979257, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:35:06,928] [INFO] [timer.py:197:stop] 0/2050, RunningAvgSamplesPerSec=6.326385177441003, CurrSamplesPerSec=5.689260006847404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:35:18,225] [INFO] [timer.py:197:stop] 0/2052, RunningAvgSamplesPerSec=6.326400399695237, CurrSamplesPerSec=5.695451959076411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:35:29,609] [INFO] [timer.py:197:stop] 0/2054, RunningAvgSamplesPerSec=6.326415999248159, CurrSamplesPerSec=5.712629039629598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:35:40,866] [INFO] [timer.py:197:stop] 0/2056, RunningAvgSamplesPerSec=6.3264398086458575, CurrSamplesPerSec=5.725564305250178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0069, 'learning_rate': 8.842222222222223e-06, 'epoch': 7.7} [2022-12-19 03:35:52,198] [INFO] [timer.py:197:stop] 0/2058, RunningAvgSamplesPerSec=6.326451187503932, CurrSamplesPerSec=5.706468605515632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:36:03,511] [INFO] [logging.py:68:log_dist] [Rank 0] step=1030, skipped=6, lr=[8.83777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 03:36:03,513] [INFO] 
[timer.py:197:stop] 0/2060, RunningAvgSamplesPerSec=6.326444139361362, CurrSamplesPerSec=5.671281277030892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:36:14,865] [INFO] [timer.py:197:stop] 0/2062, RunningAvgSamplesPerSec=6.326436499399303, CurrSamplesPerSec=5.6899507672300595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:36:26,149] [INFO] [timer.py:197:stop] 0/2064, RunningAvgSamplesPerSec=6.326445807935181, CurrSamplesPerSec=5.716273405356531, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:36:37,481] [INFO] [timer.py:197:stop] 0/2066, RunningAvgSamplesPerSec=6.32643361077367, CurrSamplesPerSec=5.684019892288922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:36:48,888] [INFO] [timer.py:197:stop] 0/2068, RunningAvgSamplesPerSec=6.326418138610428, CurrSamplesPerSec=5.683664139308789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:37:00,228] [INFO] [timer.py:197:stop] 0/2070, RunningAvgSamplesPerSec=6.326396612681386, CurrSamplesPerSec=5.67550873348384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:37:11,535] [INFO] [timer.py:197:stop] 0/2072, RunningAvgSamplesPerSec=6.3263959738168865, CurrSamplesPerSec=5.684900319629634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:37:22,823] [INFO] [timer.py:197:stop] 0/2074, RunningAvgSamplesPerSec=6.326413192775993, CurrSamplesPerSec=5.703061115270283, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:37:34,165] [INFO] [timer.py:197:stop] 0/2076, RunningAvgSamplesPerSec=6.326405676716698, CurrSamplesPerSec=5.678445374245796, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:37:45,643] [INFO] [timer.py:197:stop] 0/2078, RunningAvgSamplesPerSec=6.326407589920214, CurrSamplesPerSec=5.692984717585088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:37:57,135] [INFO] [logging.py:68:log_dist] [Rank 0] step=1040, skipped=6, lr=[8.815555555555557e-06], mom=[[0.9, 0.999]] [2022-12-19 03:37:57,137] [INFO] 
[timer.py:197:stop] 0/2080, RunningAvgSamplesPerSec=6.326412221109849, CurrSamplesPerSec=5.694543133521435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:38:08,474] [INFO] [timer.py:197:stop] 0/2082, RunningAvgSamplesPerSec=6.326411171465485, CurrSamplesPerSec=5.704083686469491, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:38:19,815] [INFO] [timer.py:197:stop] 0/2084, RunningAvgSamplesPerSec=6.326386153789793, CurrSamplesPerSec=5.6485963934646035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:38:31,151] [INFO] [timer.py:197:stop] 0/2086, RunningAvgSamplesPerSec=6.3263778019327255, CurrSamplesPerSec=5.678881447048505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:38:42,444] [INFO] [timer.py:197:stop] 0/2088, RunningAvgSamplesPerSec=6.326376750436937, CurrSamplesPerSec=5.7019330525990775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:38:53,715] [INFO] [timer.py:197:stop] 0/2090, RunningAvgSamplesPerSec=6.326400081331565, CurrSamplesPerSec=5.704683245097394, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:39:05,040] [INFO] [timer.py:197:stop] 0/2092, RunningAvgSamplesPerSec=6.326384151641467, CurrSamplesPerSec=5.649897989032479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:39:16,352] [INFO] [timer.py:197:stop] 0/2094, RunningAvgSamplesPerSec=6.326390883251042, CurrSamplesPerSec=5.70578692729458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:39:27,687] [INFO] [timer.py:197:stop] 0/2096, RunningAvgSamplesPerSec=6.326374514770011, CurrSamplesPerSec=5.6748555460136805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:39:39,017] [INFO] [timer.py:197:stop] 0/2098, RunningAvgSamplesPerSec=6.326348757225234, CurrSamplesPerSec=5.678633490001171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:39:50,379] [INFO] [logging.py:68:log_dist] [Rank 0] step=1050, skipped=6, lr=[8.793333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 03:39:50,380] [INFO] 
[timer.py:197:stop] 0/2100, RunningAvgSamplesPerSec=6.326329049328467, CurrSamplesPerSec=5.668760466234932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:40:01,854] [INFO] [timer.py:197:stop] 0/2102, RunningAvgSamplesPerSec=6.326321141801384, CurrSamplesPerSec=5.66522399814651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:40:13,109] [INFO] [timer.py:197:stop] 0/2104, RunningAvgSamplesPerSec=6.32634466388475, CurrSamplesPerSec=5.70014326619576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:40:24,436] [INFO] [timer.py:197:stop] 0/2106, RunningAvgSamplesPerSec=6.326333360760331, CurrSamplesPerSec=5.705591187173723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.008, 'learning_rate': 8.786666666666668e-06, 'epoch': 7.89} [2022-12-19 03:40:35,746] [INFO] [timer.py:197:stop] 0/2108, RunningAvgSamplesPerSec=6.326343555109656, CurrSamplesPerSec=5.699310385357951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:40:47,269] [INFO] [timer.py:197:stop] 0/2110, RunningAvgSamplesPerSec=6.326329129201121, CurrSamplesPerSec=5.674179240303234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:40:58,688] [INFO] [timer.py:197:stop] 0/2112, RunningAvgSamplesPerSec=6.326332279110047, CurrSamplesPerSec=5.707757203837147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:41:10,018] [INFO] [timer.py:197:stop] 0/2114, RunningAvgSamplesPerSec=6.32632978891732, CurrSamplesPerSec=5.683744769538712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:41:21,350] [INFO] [timer.py:197:stop] 0/2116, RunningAvgSamplesPerSec=6.326317582789852, CurrSamplesPerSec=5.682015452024639, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:41:32,855] [INFO] [timer.py:197:stop] 0/2118, RunningAvgSamplesPerSec=6.326319561216987, CurrSamplesPerSec=5.695665856761127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:41:44,163] [INFO] [logging.py:68:log_dist] [Rank 0] step=1060, skipped=6, 
lr=[8.771111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 03:41:44,165] [INFO] [timer.py:197:stop] 0/2120, RunningAvgSamplesPerSec=6.326314274477149, CurrSamplesPerSec=5.689704254577837, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:41:55,482] [INFO] [timer.py:197:stop] 0/2122, RunningAvgSamplesPerSec=6.326313718245797, CurrSamplesPerSec=5.682598831664661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:42:06,833] [INFO] [timer.py:197:stop] 0/2124, RunningAvgSamplesPerSec=6.326301536948877, CurrSamplesPerSec=5.672175261095847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:42:18,132] [INFO] [timer.py:197:stop] 0/2126, RunningAvgSamplesPerSec=6.326316694986368, CurrSamplesPerSec=5.709347033956345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:42:29,442] [INFO] [timer.py:197:stop] 0/2128, RunningAvgSamplesPerSec=6.3263284632713725, CurrSamplesPerSec=5.6999176551376785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:42:40,959] [INFO] [timer.py:197:stop] 0/2130, RunningAvgSamplesPerSec=6.326324899738855, CurrSamplesPerSec=5.677522754468322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:42:52,300] [INFO] [timer.py:197:stop] 0/2132, RunningAvgSamplesPerSec=6.326321114570328, CurrSamplesPerSec=5.679126061553765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:43:03,641] [INFO] [timer.py:197:stop] 0/2134, RunningAvgSamplesPerSec=6.326301485240582, CurrSamplesPerSec=5.641426271863261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:43:14,022] [INFO] [timer.py:197:stop] 0/2136, RunningAvgSamplesPerSec=6.326789274156916, CurrSamplesPerSec=6.6894220514634695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:43:25,495] [INFO] [timer.py:197:stop] 0/2138, RunningAvgSamplesPerSec=6.326796389019196, CurrSamplesPerSec=5.715065644103102, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:43:37,041] [INFO] [logging.py:68:log_dist] [Rank 0] step=1070, skipped=6, 
lr=[8.74888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 03:43:37,042] [INFO] [timer.py:197:stop] 0/2140, RunningAvgSamplesPerSec=6.326794602549944, CurrSamplesPerSec=5.690337946166168, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:43:48,361] [INFO] [timer.py:197:stop] 0/2142, RunningAvgSamplesPerSec=6.326802757601128, CurrSamplesPerSec=5.700901566419516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:43:59,735] [INFO] [timer.py:197:stop] 0/2144, RunningAvgSamplesPerSec=6.326761187128666, CurrSamplesPerSec=5.638328106176111, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:44:11,070] [INFO] [timer.py:197:stop] 0/2146, RunningAvgSamplesPerSec=6.326747320262678, CurrSamplesPerSec=5.676246329746859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:44:22,450] [INFO] [timer.py:197:stop] 0/2148, RunningAvgSamplesPerSec=6.326723667378246, CurrSamplesPerSec=5.66565302019834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:44:33,801] [INFO] [timer.py:197:stop] 0/2150, RunningAvgSamplesPerSec=6.326707573783008, CurrSamplesPerSec=5.675119490517693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:44:45,219] [INFO] [timer.py:197:stop] 0/2152, RunningAvgSamplesPerSec=6.326694022233909, CurrSamplesPerSec=5.670736396582747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:44:56,551] [INFO] [timer.py:197:stop] 0/2154, RunningAvgSamplesPerSec=6.326703778794602, CurrSamplesPerSec=5.6991230751599335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:45:07,944] [INFO] [timer.py:197:stop] 0/2156, RunningAvgSamplesPerSec=6.326691625928259, CurrSamplesPerSec=5.660217569929879, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:45:19,280] [INFO] [timer.py:197:stop] 0/2158, RunningAvgSamplesPerSec=6.326684574449171, CurrSamplesPerSec=5.674285749556581, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0078, 'learning_rate': 8.72888888888889e-06, 'epoch': 8.08} [2022-12-19 03:45:30,665] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=1080, skipped=6, lr=[8.726666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 03:45:30,666] [INFO] [timer.py:197:stop] 0/2160, RunningAvgSamplesPerSec=6.3266822621659555, CurrSamplesPerSec=5.678146528866409, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:45:42,022] [INFO] [timer.py:197:stop] 0/2162, RunningAvgSamplesPerSec=6.326687649998629, CurrSamplesPerSec=5.699724253866839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:45:53,405] [INFO] [timer.py:197:stop] 0/2164, RunningAvgSamplesPerSec=6.326684576205909, CurrSamplesPerSec=5.680701422324087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:46:04,779] [INFO] [timer.py:197:stop] 0/2166, RunningAvgSamplesPerSec=6.326675915307126, CurrSamplesPerSec=5.688947965505454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:46:16,107] [INFO] [timer.py:197:stop] 0/2168, RunningAvgSamplesPerSec=6.3266971697620065, CurrSamplesPerSec=5.711763337833625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:46:27,460] [INFO] [timer.py:197:stop] 0/2170, RunningAvgSamplesPerSec=6.326700059223969, CurrSamplesPerSec=5.687659168268104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:46:38,840] [INFO] [timer.py:197:stop] 0/2172, RunningAvgSamplesPerSec=6.32670647190984, CurrSamplesPerSec=5.691728847588171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:46:50,236] [INFO] [timer.py:197:stop] 0/2174, RunningAvgSamplesPerSec=6.326721238395473, CurrSamplesPerSec=5.715663619283175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:47:01,611] [INFO] [timer.py:197:stop] 0/2176, RunningAvgSamplesPerSec=6.326729142532696, CurrSamplesPerSec=5.686001418092932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:47:13,001] [INFO] [timer.py:197:stop] 0/2178, RunningAvgSamplesPerSec=6.326722082239641, CurrSamplesPerSec=5.678830027819207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:47:24,461] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=1090, skipped=6, lr=[8.704444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 03:47:24,462] [INFO] [timer.py:197:stop] 0/2180, RunningAvgSamplesPerSec=6.326723890551851, CurrSamplesPerSec=5.699955659183087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:47:36,073] [INFO] [timer.py:197:stop] 0/2182, RunningAvgSamplesPerSec=6.32670038513686, CurrSamplesPerSec=5.634336535357509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:47:47,427] [INFO] [timer.py:197:stop] 0/2184, RunningAvgSamplesPerSec=6.326671508503432, CurrSamplesPerSec=5.642204843836243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:47:58,748] [INFO] [timer.py:197:stop] 0/2186, RunningAvgSamplesPerSec=6.32664242626278, CurrSamplesPerSec=5.639421661678552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:48:10,083] [INFO] [timer.py:197:stop] 0/2188, RunningAvgSamplesPerSec=6.326625907509208, CurrSamplesPerSec=5.682638770486816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:48:21,366] [INFO] [timer.py:197:stop] 0/2190, RunningAvgSamplesPerSec=6.326632428434273, CurrSamplesPerSec=5.686442025102453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:48:32,672] [INFO] [timer.py:197:stop] 0/2192, RunningAvgSamplesPerSec=6.32664495061061, CurrSamplesPerSec=5.713778852155534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:48:44,043] [INFO] [timer.py:197:stop] 0/2194, RunningAvgSamplesPerSec=6.326626988288933, CurrSamplesPerSec=5.646367906682831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:48:55,321] [INFO] [timer.py:197:stop] 0/2196, RunningAvgSamplesPerSec=6.326646720398593, CurrSamplesPerSec=5.7233854871556815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:49:06,559] [INFO] [timer.py:197:stop] 0/2198, RunningAvgSamplesPerSec=6.326680121088751, CurrSamplesPerSec=5.737371744252518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:49:17,827] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=1100, skipped=6, lr=[8.682222222222222e-06], mom=[[0.9, 0.999]] [2022-12-19 03:49:17,828] [INFO] [timer.py:197:stop] 0/2200, RunningAvgSamplesPerSec=6.326699479323874, CurrSamplesPerSec=5.726950001455016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:49:29,108] [INFO] [timer.py:197:stop] 0/2202, RunningAvgSamplesPerSec=6.32671136389057, CurrSamplesPerSec=5.691314450022166, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:49:40,375] [INFO] [timer.py:197:stop] 0/2204, RunningAvgSamplesPerSec=6.326731943783809, CurrSamplesPerSec=5.708830266312805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:49:51,686] [INFO] [timer.py:197:stop] 0/2206, RunningAvgSamplesPerSec=6.326744243596059, CurrSamplesPerSec=5.712352112927222, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:50:03,035] [INFO] [timer.py:197:stop] 0/2208, RunningAvgSamplesPerSec=6.326756214811398, CurrSamplesPerSec=5.711282829794082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0066, 'learning_rate': 8.673333333333334e-06, 'epoch': 8.27} [2022-12-19 03:50:14,328] [INFO] [timer.py:197:stop] 0/2210, RunningAvgSamplesPerSec=6.326778938297404, CurrSamplesPerSec=5.717787356906989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:50:25,592] [INFO] [timer.py:197:stop] 0/2212, RunningAvgSamplesPerSec=6.326803615618879, CurrSamplesPerSec=5.715070024418691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:50:36,851] [INFO] [timer.py:197:stop] 0/2214, RunningAvgSamplesPerSec=6.326825981256755, CurrSamplesPerSec=5.703315814880034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:50:48,119] [INFO] [timer.py:197:stop] 0/2216, RunningAvgSamplesPerSec=6.326839919448313, CurrSamplesPerSec=5.696322393368489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:50:59,425] [INFO] [timer.py:197:stop] 0/2218, RunningAvgSamplesPerSec=6.326857225964502, CurrSamplesPerSec=5.711950021336125, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 03:51:10,706] [INFO] [logging.py:68:log_dist] [Rank 0] step=1110, skipped=6, lr=[8.66e-06], mom=[[0.9, 0.999]] [2022-12-19 03:51:10,708] [INFO] [timer.py:197:stop] 0/2220, RunningAvgSamplesPerSec=6.326876989223657, CurrSamplesPerSec=5.7002865820661075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:51:22,014] [INFO] [timer.py:197:stop] 0/2222, RunningAvgSamplesPerSec=6.32686311623508, CurrSamplesPerSec=5.673763556633231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:51:33,334] [INFO] [timer.py:197:stop] 0/2224, RunningAvgSamplesPerSec=6.3268577945585, CurrSamplesPerSec=5.697523695936071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:51:44,621] [INFO] [timer.py:197:stop] 0/2226, RunningAvgSamplesPerSec=6.326872561678371, CurrSamplesPerSec=5.719099593443624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:51:55,918] [INFO] [timer.py:197:stop] 0/2228, RunningAvgSamplesPerSec=6.326863225393763, CurrSamplesPerSec=5.68888792438288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:52:07,308] [INFO] [timer.py:197:stop] 0/2230, RunningAvgSamplesPerSec=6.326829255557307, CurrSamplesPerSec=5.6530941253156595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:52:18,614] [INFO] [timer.py:197:stop] 0/2232, RunningAvgSamplesPerSec=6.326809160648009, CurrSamplesPerSec=5.693291647428187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:52:29,929] [INFO] [timer.py:197:stop] 0/2234, RunningAvgSamplesPerSec=6.326793166919668, CurrSamplesPerSec=5.676587229754855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:52:41,247] [INFO] [timer.py:197:stop] 0/2236, RunningAvgSamplesPerSec=6.326789831833537, CurrSamplesPerSec=5.686570437916975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:52:52,555] [INFO] [timer.py:197:stop] 0/2238, RunningAvgSamplesPerSec=6.326792145151205, CurrSamplesPerSec=5.690938961886812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 03:53:03,820] [INFO] [logging.py:68:log_dist] [Rank 0] step=1120, skipped=6, lr=[8.637777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 03:53:03,822] [INFO] [timer.py:197:stop] 0/2240, RunningAvgSamplesPerSec=6.326812596440566, CurrSamplesPerSec=5.718746745815257, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:53:15,120] [INFO] [timer.py:197:stop] 0/2242, RunningAvgSamplesPerSec=6.3268062402079535, CurrSamplesPerSec=5.695992172313003, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:53:26,377] [INFO] [timer.py:197:stop] 0/2244, RunningAvgSamplesPerSec=6.326825319481436, CurrSamplesPerSec=5.688221769984032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:53:37,650] [INFO] [timer.py:197:stop] 0/2246, RunningAvgSamplesPerSec=6.32684544604326, CurrSamplesPerSec=5.721677822819974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:53:48,942] [INFO] [timer.py:197:stop] 0/2248, RunningAvgSamplesPerSec=6.3268617475305815, CurrSamplesPerSec=5.716064773283627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:54:00,218] [INFO] [timer.py:197:stop] 0/2250, RunningAvgSamplesPerSec=6.326873144476691, CurrSamplesPerSec=5.694921755448826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:54:11,924] [INFO] [timer.py:197:stop] 0/2252, RunningAvgSamplesPerSec=6.326855026852673, CurrSamplesPerSec=5.6789946206884006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:54:23,710] [INFO] [timer.py:197:stop] 0/2254, RunningAvgSamplesPerSec=6.326847263726218, CurrSamplesPerSec=5.663852233324447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:54:35,373] [INFO] [timer.py:197:stop] 0/2256, RunningAvgSamplesPerSec=6.326852061873704, CurrSamplesPerSec=5.7059625469616915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:54:46,707] [INFO] [timer.py:197:stop] 0/2258, RunningAvgSamplesPerSec=6.326854861799754, CurrSamplesPerSec=5.6923448837328845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 
0.0069, 'learning_rate': 8.617777777777778e-06, 'epoch': 8.46} [2022-12-19 03:54:57,959] [INFO] [logging.py:68:log_dist] [Rank 0] step=1130, skipped=6, lr=[8.615555555555555e-06], mom=[[0.9, 0.999]] [2022-12-19 03:54:57,961] [INFO] [timer.py:197:stop] 0/2260, RunningAvgSamplesPerSec=6.326865158219303, CurrSamplesPerSec=5.69410779261727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:55:09,335] [INFO] [timer.py:197:stop] 0/2262, RunningAvgSamplesPerSec=6.326880602108355, CurrSamplesPerSec=5.6992639196644115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:55:20,677] [INFO] [timer.py:197:stop] 0/2264, RunningAvgSamplesPerSec=6.326882653853096, CurrSamplesPerSec=5.687434785350154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:55:31,950] [INFO] [timer.py:197:stop] 0/2266, RunningAvgSamplesPerSec=6.326890793957932, CurrSamplesPerSec=5.709684391874074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:55:43,236] [INFO] [timer.py:197:stop] 0/2268, RunningAvgSamplesPerSec=6.326883951474452, CurrSamplesPerSec=5.693821064465607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:55:54,505] [INFO] [timer.py:197:stop] 0/2270, RunningAvgSamplesPerSec=6.326907571762626, CurrSamplesPerSec=5.720987629167165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:56:05,926] [INFO] [timer.py:197:stop] 0/2272, RunningAvgSamplesPerSec=6.326908845474048, CurrSamplesPerSec=5.699991969254007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:56:17,223] [INFO] [timer.py:197:stop] 0/2274, RunningAvgSamplesPerSec=6.326923134873984, CurrSamplesPerSec=5.7107960849196395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:56:28,630] [INFO] [timer.py:197:stop] 0/2276, RunningAvgSamplesPerSec=6.326934430366989, CurrSamplesPerSec=5.701065261778981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:56:40,093] [INFO] [timer.py:197:stop] 0/2278, RunningAvgSamplesPerSec=6.3269157813518415, 
CurrSamplesPerSec=5.666033311453176, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:56:51,446] [INFO] [logging.py:68:log_dist] [Rank 0] step=1140, skipped=6, lr=[8.593333333333333e-06], mom=[[0.9, 0.999]] [2022-12-19 03:56:51,447] [INFO] [timer.py:197:stop] 0/2280, RunningAvgSamplesPerSec=6.326914313350795, CurrSamplesPerSec=5.683731772272363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:57:02,754] [INFO] [timer.py:197:stop] 0/2282, RunningAvgSamplesPerSec=6.326913478165649, CurrSamplesPerSec=5.699271179879084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:57:14,062] [INFO] [timer.py:197:stop] 0/2284, RunningAvgSamplesPerSec=6.326912246740567, CurrSamplesPerSec=5.6824686734957925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:57:25,403] [INFO] [timer.py:197:stop] 0/2286, RunningAvgSamplesPerSec=6.326912774169063, CurrSamplesPerSec=5.682104935886212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:57:36,732] [INFO] [timer.py:197:stop] 0/2288, RunningAvgSamplesPerSec=6.326899969416523, CurrSamplesPerSec=5.665963228787464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:57:48,135] [INFO] [timer.py:197:stop] 0/2290, RunningAvgSamplesPerSec=6.326907377097946, CurrSamplesPerSec=5.69469293324249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:57:59,432] [INFO] [timer.py:197:stop] 0/2292, RunningAvgSamplesPerSec=6.3269137028619875, CurrSamplesPerSec=5.695411356584488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:58:10,758] [INFO] [timer.py:197:stop] 0/2294, RunningAvgSamplesPerSec=6.326902934336184, CurrSamplesPerSec=5.679826861795565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:58:22,024] [INFO] [timer.py:197:stop] 0/2296, RunningAvgSamplesPerSec=6.32691960975345, CurrSamplesPerSec=5.733180719305521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:58:33,366] [INFO] [timer.py:197:stop] 0/2298, RunningAvgSamplesPerSec=6.326923779451651, 
CurrSamplesPerSec=5.687469972048633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:58:44,804] [INFO] [logging.py:68:log_dist] [Rank 0] step=1150, skipped=6, lr=[8.571111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 03:58:44,805] [INFO] [timer.py:197:stop] 0/2300, RunningAvgSamplesPerSec=6.326922397339787, CurrSamplesPerSec=5.67730853627524, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:58:56,156] [INFO] [timer.py:197:stop] 0/2302, RunningAvgSamplesPerSec=6.326899464099917, CurrSamplesPerSec=5.660306129820572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:59:07,493] [INFO] [timer.py:197:stop] 0/2304, RunningAvgSamplesPerSec=6.326892600064268, CurrSamplesPerSec=5.686855472447477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:59:18,817] [INFO] [timer.py:197:stop] 0/2306, RunningAvgSamplesPerSec=6.326903146286715, CurrSamplesPerSec=5.708624119613139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:59:30,215] [INFO] [timer.py:197:stop] 0/2308, RunningAvgSamplesPerSec=6.326912002601788, CurrSamplesPerSec=5.708092674592468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0076, 'learning_rate': 8.562222222222224e-06, 'epoch': 8.64} [2022-12-19 03:59:41,564] [INFO] [timer.py:197:stop] 0/2310, RunningAvgSamplesPerSec=6.326921232682535, CurrSamplesPerSec=5.702347544380242, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 03:59:52,877] [INFO] [timer.py:197:stop] 0/2312, RunningAvgSamplesPerSec=6.326921183730588, CurrSamplesPerSec=5.6817434095296155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:00:04,280] [INFO] [timer.py:197:stop] 0/2314, RunningAvgSamplesPerSec=6.32691956687804, CurrSamplesPerSec=5.694086776178087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:00:15,666] [INFO] [timer.py:197:stop] 0/2316, RunningAvgSamplesPerSec=6.326915762634836, CurrSamplesPerSec=5.6789790019772255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:00:27,020] [INFO] 
[timer.py:197:stop] 0/2318, RunningAvgSamplesPerSec=6.326924978072096, CurrSamplesPerSec=5.711120248263902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:00:38,312] [INFO] [logging.py:68:log_dist] [Rank 0] step=1160, skipped=6, lr=[8.54888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 04:00:38,314] [INFO] [timer.py:197:stop] 0/2320, RunningAvgSamplesPerSec=6.326931529562839, CurrSamplesPerSec=5.7127732271981815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:00:49,711] [INFO] [timer.py:197:stop] 0/2322, RunningAvgSamplesPerSec=6.326921525545097, CurrSamplesPerSec=5.686511410668486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:01:01,036] [INFO] [timer.py:197:stop] 0/2324, RunningAvgSamplesPerSec=6.326911018853958, CurrSamplesPerSec=5.6763622791686625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:01:12,465] [INFO] [timer.py:197:stop] 0/2326, RunningAvgSamplesPerSec=6.326910579743809, CurrSamplesPerSec=5.685113666288175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:01:23,844] [INFO] [timer.py:197:stop] 0/2328, RunningAvgSamplesPerSec=6.3269064120439396, CurrSamplesPerSec=5.670474297026939, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:01:35,152] [INFO] [timer.py:197:stop] 0/2330, RunningAvgSamplesPerSec=6.326898629917993, CurrSamplesPerSec=5.6657209426902755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:01:46,555] [INFO] [timer.py:197:stop] 0/2332, RunningAvgSamplesPerSec=6.32688614214516, CurrSamplesPerSec=5.68886549971036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:01:57,867] [INFO] [timer.py:197:stop] 0/2334, RunningAvgSamplesPerSec=6.326881788743624, CurrSamplesPerSec=5.693715752867512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:02:09,245] [INFO] [timer.py:197:stop] 0/2336, RunningAvgSamplesPerSec=6.3268949475223675, CurrSamplesPerSec=5.702312657886888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:02:20,668] [INFO] 
[timer.py:197:stop] 0/2338, RunningAvgSamplesPerSec=6.326899966488643, CurrSamplesPerSec=5.700914158036501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:02:31,904] [INFO] [logging.py:68:log_dist] [Rank 0] step=1170, skipped=6, lr=[8.526666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 04:02:31,906] [INFO] [timer.py:197:stop] 0/2340, RunningAvgSamplesPerSec=6.326914796387739, CurrSamplesPerSec=5.700272782758298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:02:43,132] [INFO] [timer.py:197:stop] 0/2342, RunningAvgSamplesPerSec=6.326935356414018, CurrSamplesPerSec=5.716648348407121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:02:54,454] [INFO] [timer.py:197:stop] 0/2344, RunningAvgSamplesPerSec=6.326925778267953, CurrSamplesPerSec=5.6924689761126865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:03:05,781] [INFO] [timer.py:197:stop] 0/2346, RunningAvgSamplesPerSec=6.326927235194531, CurrSamplesPerSec=5.701729099221579, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:03:17,114] [INFO] [timer.py:197:stop] 0/2348, RunningAvgSamplesPerSec=6.3269164061083165, CurrSamplesPerSec=5.662269003505207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:03:28,336] [INFO] [timer.py:197:stop] 0/2350, RunningAvgSamplesPerSec=6.326937672478398, CurrSamplesPerSec=5.720442176951931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:03:39,629] [INFO] [timer.py:197:stop] 0/2352, RunningAvgSamplesPerSec=6.3269344365056215, CurrSamplesPerSec=5.694238726111645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:03:50,930] [INFO] [timer.py:197:stop] 0/2354, RunningAvgSamplesPerSec=6.326938531744269, CurrSamplesPerSec=5.704920145492803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:04:02,241] [INFO] [timer.py:197:stop] 0/2356, RunningAvgSamplesPerSec=6.326934480041807, CurrSamplesPerSec=5.672662397291141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:04:13,586] [INFO] 
[timer.py:197:stop] 0/2358, RunningAvgSamplesPerSec=6.3269138612854015, CurrSamplesPerSec=5.656316525110265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0073, 'learning_rate': 8.506666666666668e-06, 'epoch': 8.83} [2022-12-19 04:04:24,861] [INFO] [logging.py:68:log_dist] [Rank 0] step=1180, skipped=6, lr=[8.504444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 04:04:24,862] [INFO] [timer.py:197:stop] 0/2360, RunningAvgSamplesPerSec=6.32692340962011, CurrSamplesPerSec=5.687257894434876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:04:36,214] [INFO] [timer.py:197:stop] 0/2362, RunningAvgSamplesPerSec=6.326908359111157, CurrSamplesPerSec=5.6728424574855065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:04:47,499] [INFO] [timer.py:197:stop] 0/2364, RunningAvgSamplesPerSec=6.326923880161435, CurrSamplesPerSec=5.7307819891564185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:04:58,798] [INFO] [timer.py:197:stop] 0/2366, RunningAvgSamplesPerSec=6.326930073715006, CurrSamplesPerSec=5.699990516842288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:05:10,110] [INFO] [timer.py:197:stop] 0/2368, RunningAvgSamplesPerSec=6.326928872982972, CurrSamplesPerSec=5.693558517342062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:05:21,415] [INFO] [timer.py:197:stop] 0/2370, RunningAvgSamplesPerSec=6.32692678772592, CurrSamplesPerSec=5.684541326578065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:05:32,688] [INFO] [timer.py:197:stop] 0/2372, RunningAvgSamplesPerSec=6.326939165947378, CurrSamplesPerSec=5.71502549152308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:05:43,994] [INFO] [timer.py:197:stop] 0/2374, RunningAvgSamplesPerSec=6.32694138603378, CurrSamplesPerSec=5.687064869356519, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:05:55,281] [INFO] [timer.py:197:stop] 0/2376, RunningAvgSamplesPerSec=6.326959890250097, CurrSamplesPerSec=5.718078452836109, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 04:06:06,557] [INFO] [timer.py:197:stop] 0/2378, RunningAvgSamplesPerSec=6.3269624304743, CurrSamplesPerSec=5.6831763150401144, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:06:17,909] [INFO] [logging.py:68:log_dist] [Rank 0] step=1190, skipped=6, lr=[8.482222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 04:06:17,910] [INFO] [timer.py:197:stop] 0/2380, RunningAvgSamplesPerSec=6.326946025956038, CurrSamplesPerSec=5.672231354210051, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:06:29,189] [INFO] [timer.py:197:stop] 0/2382, RunningAvgSamplesPerSec=6.326933134199437, CurrSamplesPerSec=5.680017472845422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:06:40,498] [INFO] [timer.py:197:stop] 0/2384, RunningAvgSamplesPerSec=6.32693591183812, CurrSamplesPerSec=5.683294713629209, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:06:51,794] [INFO] [timer.py:197:stop] 0/2386, RunningAvgSamplesPerSec=6.32693357995509, CurrSamplesPerSec=5.680674013096329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:07:03,101] [INFO] [timer.py:197:stop] 0/2388, RunningAvgSamplesPerSec=6.326933836209684, CurrSamplesPerSec=5.689397711400819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:07:14,416] [INFO] [timer.py:197:stop] 0/2390, RunningAvgSamplesPerSec=6.326917349381798, CurrSamplesPerSec=5.689625625876832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:07:25,757] [INFO] [timer.py:197:stop] 0/2392, RunningAvgSamplesPerSec=6.326913301962612, CurrSamplesPerSec=5.692572310082572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:07:37,087] [INFO] [timer.py:197:stop] 0/2394, RunningAvgSamplesPerSec=6.326897240515746, CurrSamplesPerSec=5.680604529484813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:07:48,368] [INFO] [timer.py:197:stop] 0/2396, RunningAvgSamplesPerSec=6.326907172372438, CurrSamplesPerSec=5.696952482061851, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 04:07:59,675] [INFO] [timer.py:197:stop] 0/2398, RunningAvgSamplesPerSec=6.326915921707753, CurrSamplesPerSec=5.703930725859358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:08:10,983] [INFO] [logging.py:68:log_dist] [Rank 0] step=1200, skipped=6, lr=[8.46e-06], mom=[[0.9, 0.999]] [2022-12-19 04:08:10,984] [INFO] [timer.py:197:stop] 0/2400, RunningAvgSamplesPerSec=6.326910202990017, CurrSamplesPerSec=5.684619574049522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:08:22,324] [INFO] [timer.py:197:stop] 0/2402, RunningAvgSamplesPerSec=6.326893759261124, CurrSamplesPerSec=5.678590484150977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:08:32,707] [INFO] [timer.py:197:stop] 0/2404, RunningAvgSamplesPerSec=6.327327630099559, CurrSamplesPerSec=5.724181476598547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:08:44,011] [INFO] [timer.py:197:stop] 0/2406, RunningAvgSamplesPerSec=6.327325381523828, CurrSamplesPerSec=5.682155452214008, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:08:55,849] [INFO] [timer.py:197:stop] 0/2408, RunningAvgSamplesPerSec=6.3272446725703935, CurrSamplesPerSec=5.556637489341007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0077, 'learning_rate': 8.451111111111112e-06, 'epoch': 9.02} [2022-12-19 04:09:07,419] [INFO] [timer.py:197:stop] 0/2410, RunningAvgSamplesPerSec=6.327223586658248, CurrSamplesPerSec=5.68101833067337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:09:19,123] [INFO] [timer.py:197:stop] 0/2412, RunningAvgSamplesPerSec=6.327209079202645, CurrSamplesPerSec=5.681874737136878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:09:30,607] [INFO] [timer.py:197:stop] 0/2414, RunningAvgSamplesPerSec=6.3272028439943835, CurrSamplesPerSec=5.68554329776328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:09:41,912] [INFO] [timer.py:197:stop] 0/2416, RunningAvgSamplesPerSec=6.327212789379762, 
CurrSamplesPerSec=5.701547442822031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:09:53,185] [INFO] [timer.py:197:stop] 0/2418, RunningAvgSamplesPerSec=6.327221780296296, CurrSamplesPerSec=5.688512997446568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:10:04,587] [INFO] [logging.py:68:log_dist] [Rank 0] step=1210, skipped=6, lr=[8.437777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 04:10:04,588] [INFO] [timer.py:197:stop] 0/2420, RunningAvgSamplesPerSec=6.327216021618297, CurrSamplesPerSec=5.679899451128424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:10:15,880] [INFO] [timer.py:197:stop] 0/2422, RunningAvgSamplesPerSec=6.3272181216320735, CurrSamplesPerSec=5.68795298964158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:10:27,325] [INFO] [timer.py:197:stop] 0/2424, RunningAvgSamplesPerSec=6.327212556279381, CurrSamplesPerSec=5.680850254239494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:10:38,597] [INFO] [timer.py:197:stop] 0/2426, RunningAvgSamplesPerSec=6.327238584749656, CurrSamplesPerSec=5.725299799508739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:10:49,879] [INFO] [timer.py:197:stop] 0/2428, RunningAvgSamplesPerSec=6.327246183895817, CurrSamplesPerSec=5.696429977310237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:11:01,246] [INFO] [timer.py:197:stop] 0/2430, RunningAvgSamplesPerSec=6.327253695480208, CurrSamplesPerSec=5.69116555194767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:11:12,527] [INFO] [timer.py:197:stop] 0/2432, RunningAvgSamplesPerSec=6.32727968630711, CurrSamplesPerSec=5.723474082206291, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:11:23,870] [INFO] [timer.py:197:stop] 0/2434, RunningAvgSamplesPerSec=6.327291914370569, CurrSamplesPerSec=5.693422301880015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:11:35,189] [INFO] [timer.py:197:stop] 0/2436, RunningAvgSamplesPerSec=6.3273044848833715, 
CurrSamplesPerSec=5.699630341588255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:11:46,603] [INFO] [timer.py:197:stop] 0/2438, RunningAvgSamplesPerSec=6.327293905301759, CurrSamplesPerSec=5.675468174850679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:11:57,994] [INFO] [logging.py:68:log_dist] [Rank 0] step=1220, skipped=6, lr=[8.415555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 04:11:57,995] [INFO] [timer.py:197:stop] 0/2440, RunningAvgSamplesPerSec=6.327289269697851, CurrSamplesPerSec=5.675818582171572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:12:09,330] [INFO] [timer.py:197:stop] 0/2442, RunningAvgSamplesPerSec=6.327276850231833, CurrSamplesPerSec=5.6804506617930155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:12:20,700] [INFO] [timer.py:197:stop] 0/2444, RunningAvgSamplesPerSec=6.327291999052488, CurrSamplesPerSec=5.741507007263887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:12:32,106] [INFO] [timer.py:197:stop] 0/2446, RunningAvgSamplesPerSec=6.327285541428539, CurrSamplesPerSec=5.681104416589546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:12:43,467] [INFO] [timer.py:197:stop] 0/2448, RunningAvgSamplesPerSec=6.327293560746078, CurrSamplesPerSec=5.709729813155692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:12:54,816] [INFO] [timer.py:197:stop] 0/2450, RunningAvgSamplesPerSec=6.32727584707772, CurrSamplesPerSec=5.686059230320265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:13:06,139] [INFO] [timer.py:197:stop] 0/2452, RunningAvgSamplesPerSec=6.327275064853307, CurrSamplesPerSec=5.687267533983291, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:13:17,579] [INFO] [timer.py:197:stop] 0/2454, RunningAvgSamplesPerSec=6.327263698162629, CurrSamplesPerSec=5.684930177566222, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:13:28,942] [INFO] [timer.py:197:stop] 0/2456, RunningAvgSamplesPerSec=6.327255305468056, 
CurrSamplesPerSec=5.6696747316500735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:13:40,338] [INFO] [timer.py:197:stop] 0/2458, RunningAvgSamplesPerSec=6.327242493970284, CurrSamplesPerSec=5.6686043665717305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0055, 'learning_rate': 8.395555555555557e-06, 'epoch': 9.21} [2022-12-19 04:13:51,650] [INFO] [logging.py:68:log_dist] [Rank 0] step=1230, skipped=6, lr=[8.393333333333335e-06], mom=[[0.9, 0.999]] [2022-12-19 04:13:51,651] [INFO] [timer.py:197:stop] 0/2460, RunningAvgSamplesPerSec=6.327236775224114, CurrSamplesPerSec=5.685485736864895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:14:02,994] [INFO] [timer.py:197:stop] 0/2462, RunningAvgSamplesPerSec=6.327241806025147, CurrSamplesPerSec=5.6892547013762576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:14:14,284] [INFO] [timer.py:197:stop] 0/2464, RunningAvgSamplesPerSec=6.327245814373844, CurrSamplesPerSec=5.7083048522858615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:14:25,701] [INFO] [timer.py:197:stop] 0/2466, RunningAvgSamplesPerSec=6.327249647955402, CurrSamplesPerSec=5.709471868740669, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:14:36,992] [INFO] [timer.py:197:stop] 0/2468, RunningAvgSamplesPerSec=6.327255534237445, CurrSamplesPerSec=5.7064700612305845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:14:48,412] [INFO] [timer.py:197:stop] 0/2470, RunningAvgSamplesPerSec=6.327266975916028, CurrSamplesPerSec=5.695022278725524, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:14:59,739] [INFO] [timer.py:197:stop] 0/2472, RunningAvgSamplesPerSec=6.327280985810622, CurrSamplesPerSec=5.711286475225972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:15:11,144] [INFO] [timer.py:197:stop] 0/2474, RunningAvgSamplesPerSec=6.3273003831942916, CurrSamplesPerSec=5.7087105585839435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:15:22,397] [INFO] 
[timer.py:197:stop] 0/2476, RunningAvgSamplesPerSec=6.327320569193225, CurrSamplesPerSec=5.715042039184681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:15:33,641] [INFO] [timer.py:197:stop] 0/2478, RunningAvgSamplesPerSec=6.327340612358702, CurrSamplesPerSec=5.723859978589962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:15:44,996] [INFO] [logging.py:68:log_dist] [Rank 0] step=1240, skipped=6, lr=[8.371111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 04:15:44,998] [INFO] [timer.py:197:stop] 0/2480, RunningAvgSamplesPerSec=6.327358536969363, CurrSamplesPerSec=5.722120561178059, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:15:56,341] [INFO] [timer.py:197:stop] 0/2482, RunningAvgSamplesPerSec=6.327373581584701, CurrSamplesPerSec=5.699598634790952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:16:07,737] [INFO] [timer.py:197:stop] 0/2484, RunningAvgSamplesPerSec=6.327395492154337, CurrSamplesPerSec=5.715882932545435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:16:18,990] [INFO] [timer.py:197:stop] 0/2486, RunningAvgSamplesPerSec=6.327415349672259, CurrSamplesPerSec=5.712709278019139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:16:30,294] [INFO] [timer.py:197:stop] 0/2488, RunningAvgSamplesPerSec=6.327417885237725, CurrSamplesPerSec=5.693521081592185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:16:41,662] [INFO] [timer.py:197:stop] 0/2490, RunningAvgSamplesPerSec=6.3274268613333025, CurrSamplesPerSec=5.687544443962776, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:16:53,050] [INFO] [timer.py:197:stop] 0/2492, RunningAvgSamplesPerSec=6.327435089756953, CurrSamplesPerSec=5.70162906566551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:17:04,346] [INFO] [timer.py:197:stop] 0/2494, RunningAvgSamplesPerSec=6.327444654506138, CurrSamplesPerSec=5.70499677255448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:17:15,620] [INFO] 
[timer.py:197:stop] 0/2496, RunningAvgSamplesPerSec=6.327455126581859, CurrSamplesPerSec=5.705722164206594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:17:26,866] [INFO] [timer.py:197:stop] 0/2498, RunningAvgSamplesPerSec=6.327472123190234, CurrSamplesPerSec=5.706745204692451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:17:38,157] [INFO] [logging.py:68:log_dist] [Rank 0] step=1250, skipped=6, lr=[8.34888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 04:17:38,159] [INFO] [timer.py:197:stop] 0/2500, RunningAvgSamplesPerSec=6.327485151369118, CurrSamplesPerSec=5.710359227327922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:17:49,413] [INFO] [timer.py:197:stop] 0/2502, RunningAvgSamplesPerSec=6.327498538552826, CurrSamplesPerSec=5.718407585270987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:18:00,722] [INFO] [timer.py:197:stop] 0/2504, RunningAvgSamplesPerSec=6.327495420092677, CurrSamplesPerSec=5.684689878230289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:18:11,994] [INFO] [timer.py:197:stop] 0/2506, RunningAvgSamplesPerSec=6.327514848367186, CurrSamplesPerSec=5.713716096639127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:18:23,254] [INFO] [timer.py:197:stop] 0/2508, RunningAvgSamplesPerSec=6.327525513762481, CurrSamplesPerSec=5.6991613106084875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0044, 'learning_rate': 8.34e-06, 'epoch': 9.4} [2022-12-19 04:18:34,549] [INFO] [timer.py:197:stop] 0/2510, RunningAvgSamplesPerSec=6.327527886750086, CurrSamplesPerSec=5.696776691189706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:18:45,849] [INFO] [timer.py:197:stop] 0/2512, RunningAvgSamplesPerSec=6.327534164094675, CurrSamplesPerSec=5.707756232922582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:18:57,133] [INFO] [timer.py:197:stop] 0/2514, RunningAvgSamplesPerSec=6.32755409614066, CurrSamplesPerSec=5.713313083406205, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 04:19:08,406] [INFO] [timer.py:197:stop] 0/2516, RunningAvgSamplesPerSec=6.327573708139139, CurrSamplesPerSec=5.710221720916039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:19:19,712] [INFO] [timer.py:197:stop] 0/2518, RunningAvgSamplesPerSec=6.32756496852723, CurrSamplesPerSec=5.67868634724566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:19:31,012] [INFO] [logging.py:68:log_dist] [Rank 0] step=1260, skipped=6, lr=[8.326666666666668e-06], mom=[[0.9, 0.999]] [2022-12-19 04:19:31,014] [INFO] [timer.py:197:stop] 0/2520, RunningAvgSamplesPerSec=6.327559257872089, CurrSamplesPerSec=5.660666843071698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:19:42,334] [INFO] [timer.py:197:stop] 0/2522, RunningAvgSamplesPerSec=6.32755267522424, CurrSamplesPerSec=5.703511883643668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:19:53,632] [INFO] [timer.py:197:stop] 0/2524, RunningAvgSamplesPerSec=6.327560655553061, CurrSamplesPerSec=5.71144104581992, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:20:04,928] [INFO] [timer.py:197:stop] 0/2526, RunningAvgSamplesPerSec=6.327567439599224, CurrSamplesPerSec=5.694448667104371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:20:16,252] [INFO] [timer.py:197:stop] 0/2528, RunningAvgSamplesPerSec=6.327570160964482, CurrSamplesPerSec=5.682361616254928, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:20:27,595] [INFO] [timer.py:197:stop] 0/2530, RunningAvgSamplesPerSec=6.327587505402186, CurrSamplesPerSec=5.7207501240007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:20:38,871] [INFO] [timer.py:197:stop] 0/2532, RunningAvgSamplesPerSec=6.327608286337964, CurrSamplesPerSec=5.715521719669567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:20:50,135] [INFO] [timer.py:197:stop] 0/2534, RunningAvgSamplesPerSec=6.327627596622899, CurrSamplesPerSec=5.721742460788221, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 04:21:01,420] [INFO] [timer.py:197:stop] 0/2536, RunningAvgSamplesPerSec=6.327644625057438, CurrSamplesPerSec=5.724910046116121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:21:12,701] [INFO] [timer.py:197:stop] 0/2538, RunningAvgSamplesPerSec=6.327655959548348, CurrSamplesPerSec=5.709129679022975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:21:23,989] [INFO] [logging.py:68:log_dist] [Rank 0] step=1270, skipped=6, lr=[8.304444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 04:21:23,990] [INFO] [timer.py:197:stop] 0/2540, RunningAvgSamplesPerSec=6.327658872350754, CurrSamplesPerSec=5.683974157002646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:21:35,314] [INFO] [timer.py:197:stop] 0/2542, RunningAvgSamplesPerSec=6.327653221258386, CurrSamplesPerSec=5.670923043485756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:21:46,624] [INFO] [timer.py:197:stop] 0/2544, RunningAvgSamplesPerSec=6.327658100254132, CurrSamplesPerSec=5.6891015704508945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:21:57,899] [INFO] [timer.py:197:stop] 0/2546, RunningAvgSamplesPerSec=6.327656418721621, CurrSamplesPerSec=5.687504195145632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:22:09,207] [INFO] [timer.py:197:stop] 0/2548, RunningAvgSamplesPerSec=6.327654910783209, CurrSamplesPerSec=5.669727422247446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:22:20,526] [INFO] [timer.py:197:stop] 0/2550, RunningAvgSamplesPerSec=6.327647340099194, CurrSamplesPerSec=5.68235584249591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:22:31,822] [INFO] [timer.py:197:stop] 0/2552, RunningAvgSamplesPerSec=6.327642232397861, CurrSamplesPerSec=5.691388298634938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:22:43,131] [INFO] [timer.py:197:stop] 0/2554, RunningAvgSamplesPerSec=6.3276290330120775, CurrSamplesPerSec=5.681775639552415, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 04:22:54,406] [INFO] [timer.py:197:stop] 0/2556, RunningAvgSamplesPerSec=6.327643793110778, CurrSamplesPerSec=5.708642329852826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:23:05,692] [INFO] [timer.py:197:stop] 0/2558, RunningAvgSamplesPerSec=6.327646346758819, CurrSamplesPerSec=5.697527565683986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0055, 'learning_rate': 8.284444444444446e-06, 'epoch': 9.58} [2022-12-19 04:23:16,966] [INFO] [logging.py:68:log_dist] [Rank 0] step=1280, skipped=6, lr=[8.282222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 04:23:16,967] [INFO] [timer.py:197:stop] 0/2560, RunningAvgSamplesPerSec=6.327661992557761, CurrSamplesPerSec=5.690284389304119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:23:28,323] [INFO] [timer.py:197:stop] 0/2562, RunningAvgSamplesPerSec=6.327643371745875, CurrSamplesPerSec=5.660436468285014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:23:40,228] [INFO] [timer.py:197:stop] 0/2564, RunningAvgSamplesPerSec=6.327627338245924, CurrSamplesPerSec=5.658322909116909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:23:52,121] [INFO] [timer.py:197:stop] 0/2566, RunningAvgSamplesPerSec=6.327622747830682, CurrSamplesPerSec=5.687982397553558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:24:03,603] [INFO] [timer.py:197:stop] 0/2568, RunningAvgSamplesPerSec=6.327627050938146, CurrSamplesPerSec=5.722738801968567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:24:14,872] [INFO] [timer.py:197:stop] 0/2570, RunningAvgSamplesPerSec=6.327635695893805, CurrSamplesPerSec=5.721289294034149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:24:26,340] [INFO] [timer.py:197:stop] 0/2572, RunningAvgSamplesPerSec=6.327631835961745, CurrSamplesPerSec=5.681131829970861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:24:37,714] [INFO] [timer.py:197:stop] 0/2574, 
RunningAvgSamplesPerSec=6.3276377548146385, CurrSamplesPerSec=5.689481157333476, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:24:49,055] [INFO] [timer.py:197:stop] 0/2576, RunningAvgSamplesPerSec=6.327645189894202, CurrSamplesPerSec=5.725779982220005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:25:00,357] [INFO] [timer.py:197:stop] 0/2578, RunningAvgSamplesPerSec=6.327643015261872, CurrSamplesPerSec=5.6922893577664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:25:11,707] [INFO] [logging.py:68:log_dist] [Rank 0] step=1290, skipped=6, lr=[8.26e-06], mom=[[0.9, 0.999]] [2022-12-19 04:25:11,708] [INFO] [timer.py:197:stop] 0/2580, RunningAvgSamplesPerSec=6.327625603871908, CurrSamplesPerSec=5.667617450713021, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:25:23,090] [INFO] [timer.py:197:stop] 0/2582, RunningAvgSamplesPerSec=6.32761954442621, CurrSamplesPerSec=5.676601874968253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:25:34,480] [INFO] [timer.py:197:stop] 0/2584, RunningAvgSamplesPerSec=6.327615826455199, CurrSamplesPerSec=5.681510113551274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:25:45,888] [INFO] [timer.py:197:stop] 0/2586, RunningAvgSamplesPerSec=6.327628482275865, CurrSamplesPerSec=5.6893516483651565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:25:57,239] [INFO] [timer.py:197:stop] 0/2588, RunningAvgSamplesPerSec=6.327606292032402, CurrSamplesPerSec=5.7110801511032205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:26:08,546] [INFO] [timer.py:197:stop] 0/2590, RunningAvgSamplesPerSec=6.3276022651192605, CurrSamplesPerSec=5.701835918588878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:26:19,933] [INFO] [timer.py:197:stop] 0/2592, RunningAvgSamplesPerSec=6.327596528399326, CurrSamplesPerSec=5.69057655182449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:26:31,286] [INFO] [timer.py:197:stop] 0/2594, 
RunningAvgSamplesPerSec=6.327576356164596, CurrSamplesPerSec=5.663091331446026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:26:42,616] [INFO] [timer.py:197:stop] 0/2596, RunningAvgSamplesPerSec=6.32757838614737, CurrSamplesPerSec=5.679437266650708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:26:53,929] [INFO] [timer.py:197:stop] 0/2598, RunningAvgSamplesPerSec=6.327583219355911, CurrSamplesPerSec=5.679859791163652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:27:05,275] [INFO] [logging.py:68:log_dist] [Rank 0] step=1300, skipped=6, lr=[8.237777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 04:27:05,276] [INFO] [timer.py:197:stop] 0/2600, RunningAvgSamplesPerSec=6.327574020614694, CurrSamplesPerSec=5.684490286311387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:27:16,523] [INFO] [timer.py:197:stop] 0/2602, RunningAvgSamplesPerSec=6.327594487554447, CurrSamplesPerSec=5.728894086611572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:27:27,994] [INFO] [timer.py:197:stop] 0/2604, RunningAvgSamplesPerSec=6.327590416990944, CurrSamplesPerSec=5.665715681030595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:27:39,338] [INFO] [timer.py:197:stop] 0/2606, RunningAvgSamplesPerSec=6.327579909722822, CurrSamplesPerSec=5.672136188342789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:27:50,724] [INFO] [timer.py:197:stop] 0/2608, RunningAvgSamplesPerSec=6.3275881653323784, CurrSamplesPerSec=5.725285390387295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0047, 'learning_rate': 8.22888888888889e-06, 'epoch': 9.77} [2022-12-19 04:28:02,043] [INFO] [timer.py:197:stop] 0/2610, RunningAvgSamplesPerSec=6.327591126230465, CurrSamplesPerSec=5.714259780300831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:28:13,511] [INFO] [timer.py:197:stop] 0/2612, RunningAvgSamplesPerSec=6.3275847497433775, CurrSamplesPerSec=5.6696064748323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 04:28:24,810] [INFO] [timer.py:197:stop] 0/2614, RunningAvgSamplesPerSec=6.327580509794008, CurrSamplesPerSec=5.696919112396318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:28:36,088] [INFO] [timer.py:197:stop] 0/2616, RunningAvgSamplesPerSec=6.327574836380664, CurrSamplesPerSec=5.685706353162136, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:28:47,532] [INFO] [timer.py:197:stop] 0/2618, RunningAvgSamplesPerSec=6.327561076655581, CurrSamplesPerSec=5.655147069388042, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:28:58,801] [INFO] [logging.py:68:log_dist] [Rank 0] step=1310, skipped=6, lr=[8.215555555555557e-06], mom=[[0.9, 0.999]] [2022-12-19 04:28:58,802] [INFO] [timer.py:197:stop] 0/2620, RunningAvgSamplesPerSec=6.3275786222779375, CurrSamplesPerSec=5.71975228135555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:29:10,249] [INFO] [timer.py:197:stop] 0/2622, RunningAvgSamplesPerSec=6.327574442429613, CurrSamplesPerSec=5.6764362204039065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:29:21,524] [INFO] [timer.py:197:stop] 0/2624, RunningAvgSamplesPerSec=6.3275726514417325, CurrSamplesPerSec=5.669737002461279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:29:32,933] [INFO] [timer.py:197:stop] 0/2626, RunningAvgSamplesPerSec=6.3275638066974835, CurrSamplesPerSec=5.688533008375364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:29:44,254] [INFO] [timer.py:197:stop] 0/2628, RunningAvgSamplesPerSec=6.327559148783537, CurrSamplesPerSec=5.677316701246349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:29:55,688] [INFO] [timer.py:197:stop] 0/2630, RunningAvgSamplesPerSec=6.32752445490903, CurrSamplesPerSec=5.6172008280384675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:30:07,016] [INFO] [timer.py:197:stop] 0/2632, RunningAvgSamplesPerSec=6.327509828915664, CurrSamplesPerSec=5.6871624646686145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 04:30:18,381] [INFO] [timer.py:197:stop] 0/2634, RunningAvgSamplesPerSec=6.327507480489899, CurrSamplesPerSec=5.6947656614328555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:30:29,670] [INFO] [timer.py:197:stop] 0/2636, RunningAvgSamplesPerSec=6.327517990246805, CurrSamplesPerSec=5.6911238039671055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:30:41,142] [INFO] [timer.py:197:stop] 0/2638, RunningAvgSamplesPerSec=6.327479975516669, CurrSamplesPerSec=5.614337039184779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:30:52,430] [INFO] [logging.py:68:log_dist] [Rank 0] step=1320, skipped=6, lr=[8.193333333333335e-06], mom=[[0.9, 0.999]] [2022-12-19 04:30:52,431] [INFO] [timer.py:197:stop] 0/2640, RunningAvgSamplesPerSec=6.327479030111488, CurrSamplesPerSec=5.690682953720507, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:31:03,727] [INFO] [timer.py:197:stop] 0/2642, RunningAvgSamplesPerSec=6.327475334560339, CurrSamplesPerSec=5.69599579825139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:31:15,004] [INFO] [timer.py:197:stop] 0/2644, RunningAvgSamplesPerSec=6.327477097981835, CurrSamplesPerSec=5.705385759384736, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:31:26,324] [INFO] [timer.py:197:stop] 0/2646, RunningAvgSamplesPerSec=6.327458119590912, CurrSamplesPerSec=5.64429830642532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:31:37,649] [INFO] [timer.py:197:stop] 0/2648, RunningAvgSamplesPerSec=6.327441210721228, CurrSamplesPerSec=5.670881591956896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:31:49,014] [INFO] [timer.py:197:stop] 0/2650, RunningAvgSamplesPerSec=6.327433495859435, CurrSamplesPerSec=5.696968683417249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:32:00,353] [INFO] [timer.py:197:stop] 0/2652, RunningAvgSamplesPerSec=6.327422713482944, CurrSamplesPerSec=5.669090889584624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
04:32:11,641] [INFO] [timer.py:197:stop] 0/2654, RunningAvgSamplesPerSec=6.327435885020091, CurrSamplesPerSec=5.692680476761366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:32:23,014] [INFO] [timer.py:197:stop] 0/2656, RunningAvgSamplesPerSec=6.327408842360331, CurrSamplesPerSec=5.682880340148714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:32:34,327] [INFO] [timer.py:197:stop] 0/2658, RunningAvgSamplesPerSec=6.327407898434005, CurrSamplesPerSec=5.679799941719916, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0051, 'learning_rate': 8.173333333333334e-06, 'epoch': 9.96} [2022-12-19 04:32:45,624] [INFO] [logging.py:68:log_dist] [Rank 0] step=1330, skipped=6, lr=[8.171111111111113e-06], mom=[[0.9, 0.999]] [2022-12-19 04:32:45,625] [INFO] [timer.py:197:stop] 0/2660, RunningAvgSamplesPerSec=6.32740900031812, CurrSamplesPerSec=5.690155084999773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:32:56,916] [INFO] [timer.py:197:stop] 0/2662, RunningAvgSamplesPerSec=6.327420811164743, CurrSamplesPerSec=5.707339498345196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:33:08,184] [INFO] [timer.py:197:stop] 0/2664, RunningAvgSamplesPerSec=6.32744019838735, CurrSamplesPerSec=5.731855404645649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:33:19,476] [INFO] [timer.py:197:stop] 0/2666, RunningAvgSamplesPerSec=6.327447784434791, CurrSamplesPerSec=5.708730469054878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:33:30,737] [INFO] [timer.py:197:stop] 0/2668, RunningAvgSamplesPerSec=6.327466118849369, CurrSamplesPerSec=5.722365499324705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:33:41,061] [INFO] [timer.py:197:stop] 0/2670, RunningAvgSamplesPerSec=6.327867303388882, CurrSamplesPerSec=6.6788652484528175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:33:52,304] [INFO] [timer.py:197:stop] 0/2672, RunningAvgSamplesPerSec=6.327883349359341, 
CurrSamplesPerSec=5.709854421772586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:34:03,578] [INFO] [timer.py:197:stop] 0/2674, RunningAvgSamplesPerSec=6.32789668444696, CurrSamplesPerSec=5.720442664569505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:34:14,956] [INFO] [timer.py:197:stop] 0/2676, RunningAvgSamplesPerSec=6.3278928534264045, CurrSamplesPerSec=5.692007399141155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:34:26,274] [INFO] [timer.py:197:stop] 0/2678, RunningAvgSamplesPerSec=6.327889324802171, CurrSamplesPerSec=5.6917983622338415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:34:37,566] [INFO] [logging.py:68:log_dist] [Rank 0] step=1340, skipped=6, lr=[8.14888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 04:34:37,567] [INFO] [timer.py:197:stop] 0/2680, RunningAvgSamplesPerSec=6.327885633679515, CurrSamplesPerSec=5.692718384360342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:34:49,046] [INFO] [timer.py:197:stop] 0/2682, RunningAvgSamplesPerSec=6.327870015065339, CurrSamplesPerSec=5.674565235414943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:35:00,376] [INFO] [timer.py:197:stop] 0/2684, RunningAvgSamplesPerSec=6.327865660212362, CurrSamplesPerSec=5.694771702063781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:35:11,763] [INFO] [timer.py:197:stop] 0/2686, RunningAvgSamplesPerSec=6.327850524911834, CurrSamplesPerSec=5.682534353305255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:35:23,032] [INFO] [timer.py:197:stop] 0/2688, RunningAvgSamplesPerSec=6.32784263041002, CurrSamplesPerSec=5.669807418026449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:35:34,347] [INFO] [timer.py:197:stop] 0/2690, RunningAvgSamplesPerSec=6.327828364168504, CurrSamplesPerSec=5.676121023275366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:35:45,632] [INFO] [timer.py:197:stop] 0/2692, RunningAvgSamplesPerSec=6.327828676875804, 
CurrSamplesPerSec=5.689163545281223, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:35:56,954] [INFO] [timer.py:197:stop] 0/2694, RunningAvgSamplesPerSec=6.327819495043791, CurrSamplesPerSec=5.6859248187024765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:36:08,282] [INFO] [timer.py:197:stop] 0/2696, RunningAvgSamplesPerSec=6.32780188095253, CurrSamplesPerSec=5.680625927370897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:36:19,595] [INFO] [timer.py:197:stop] 0/2698, RunningAvgSamplesPerSec=6.3277991258523985, CurrSamplesPerSec=5.685590985080684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:36:30,869] [INFO] [logging.py:68:log_dist] [Rank 0] step=1350, skipped=6, lr=[8.126666666666668e-06], mom=[[0.9, 0.999]] [2022-12-19 04:36:30,870] [INFO] [timer.py:197:stop] 0/2700, RunningAvgSamplesPerSec=6.327798144134965, CurrSamplesPerSec=5.688781107366139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:36:42,112] [INFO] [timer.py:197:stop] 0/2702, RunningAvgSamplesPerSec=6.32781346736348, CurrSamplesPerSec=5.699905309983569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:36:53,443] [INFO] [timer.py:197:stop] 0/2704, RunningAvgSamplesPerSec=6.327808081797187, CurrSamplesPerSec=5.679814363157279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:37:04,740] [INFO] [timer.py:197:stop] 0/2706, RunningAvgSamplesPerSec=6.327807342547552, CurrSamplesPerSec=5.695327011591557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:37:16,010] [INFO] [timer.py:197:stop] 0/2708, RunningAvgSamplesPerSec=6.327813883930887, CurrSamplesPerSec=5.695094048716919, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:37:27,382] [INFO] [timer.py:197:stop] 0/2710, RunningAvgSamplesPerSec=6.327821371580702, CurrSamplesPerSec=5.715748324297511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0058, 'learning_rate': 8.115555555555557e-06, 'epoch': 10.15} [2022-12-19 04:37:38,946] [INFO] 
[timer.py:197:stop] 0/2712, RunningAvgSamplesPerSec=6.327825050467662, CurrSamplesPerSec=5.707824683208667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:37:50,199] [INFO] [timer.py:197:stop] 0/2714, RunningAvgSamplesPerSec=6.327843305373135, CurrSamplesPerSec=5.734748236146249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:38:01,496] [INFO] [timer.py:197:stop] 0/2716, RunningAvgSamplesPerSec=6.327847759758398, CurrSamplesPerSec=5.697650191043467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:38:12,912] [INFO] [timer.py:197:stop] 0/2718, RunningAvgSamplesPerSec=6.327862058119661, CurrSamplesPerSec=5.716268292836665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:38:24,425] [INFO] [logging.py:68:log_dist] [Rank 0] step=1360, skipped=6, lr=[8.104444444444446e-06], mom=[[0.9, 0.999]] [2022-12-19 04:38:24,427] [INFO] [timer.py:197:stop] 0/2720, RunningAvgSamplesPerSec=6.327866515885675, CurrSamplesPerSec=5.697151499127097, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:38:35,671] [INFO] [timer.py:197:stop] 0/2722, RunningAvgSamplesPerSec=6.327874946048741, CurrSamplesPerSec=5.70919039102576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:38:46,965] [INFO] [timer.py:197:stop] 0/2724, RunningAvgSamplesPerSec=6.327893889831281, CurrSamplesPerSec=5.732130308705768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:38:58,224] [INFO] [timer.py:197:stop] 0/2726, RunningAvgSamplesPerSec=6.3279129251795805, CurrSamplesPerSec=5.72706338851098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:39:09,682] [INFO] [timer.py:197:stop] 0/2728, RunningAvgSamplesPerSec=6.327925118689292, CurrSamplesPerSec=5.7072708170082445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:39:20,975] [INFO] [timer.py:197:stop] 0/2730, RunningAvgSamplesPerSec=6.327925588400606, CurrSamplesPerSec=5.682998726405809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:39:32,490] [INFO] 
[timer.py:197:stop] 0/2732, RunningAvgSamplesPerSec=6.327933669826998, CurrSamplesPerSec=5.706201251797301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:39:43,730] [INFO] [timer.py:197:stop] 0/2734, RunningAvgSamplesPerSec=6.327969822305904, CurrSamplesPerSec=5.760413307852951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:39:55,138] [INFO] [timer.py:197:stop] 0/2736, RunningAvgSamplesPerSec=6.327986783097564, CurrSamplesPerSec=5.70023186941381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:40:06,493] [INFO] [timer.py:197:stop] 0/2738, RunningAvgSamplesPerSec=6.328012914561887, CurrSamplesPerSec=5.744524654580106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:40:17,972] [INFO] [logging.py:68:log_dist] [Rank 0] step=1370, skipped=6, lr=[8.082222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 04:40:17,974] [INFO] [timer.py:197:stop] 0/2740, RunningAvgSamplesPerSec=6.328000882705645, CurrSamplesPerSec=5.656375403917007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:40:29,226] [INFO] [timer.py:197:stop] 0/2742, RunningAvgSamplesPerSec=6.328023326407805, CurrSamplesPerSec=5.711226933754518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:40:40,546] [INFO] [timer.py:197:stop] 0/2744, RunningAvgSamplesPerSec=6.328043555487994, CurrSamplesPerSec=5.694305403046297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:40:51,786] [INFO] [timer.py:197:stop] 0/2746, RunningAvgSamplesPerSec=6.328064587316351, CurrSamplesPerSec=5.707391435153441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:41:03,042] [INFO] [timer.py:197:stop] 0/2748, RunningAvgSamplesPerSec=6.328076113522426, CurrSamplesPerSec=5.7074508966785915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:41:14,478] [INFO] [timer.py:197:stop] 0/2750, RunningAvgSamplesPerSec=6.3280921929056175, CurrSamplesPerSec=5.697020915279977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:41:25,931] [INFO] 
[timer.py:197:stop] 0/2752, RunningAvgSamplesPerSec=6.328106706959077, CurrSamplesPerSec=5.689574494268087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:41:37,185] [INFO] [timer.py:197:stop] 0/2754, RunningAvgSamplesPerSec=6.328114671752057, CurrSamplesPerSec=5.706690610625807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:41:48,444] [INFO] [timer.py:197:stop] 0/2756, RunningAvgSamplesPerSec=6.328139509113434, CurrSamplesPerSec=5.722638273850486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:41:59,779] [INFO] [timer.py:197:stop] 0/2758, RunningAvgSamplesPerSec=6.328155716552921, CurrSamplesPerSec=5.708027859151795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:42:11,027] [INFO] [logging.py:68:log_dist] [Rank 0] step=1380, skipped=6, lr=[8.06e-06], mom=[[0.9, 0.999]] [2022-12-19 04:42:11,029] [INFO] [timer.py:197:stop] 0/2760, RunningAvgSamplesPerSec=6.328169823812502, CurrSamplesPerSec=5.70514494021122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0058, 'learning_rate': 8.06e-06, 'epoch': 10.34} [2022-12-19 04:42:22,341] [INFO] [timer.py:197:stop] 0/2762, RunningAvgSamplesPerSec=6.328179154793602, CurrSamplesPerSec=5.71343371387353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:42:33,638] [INFO] [timer.py:197:stop] 0/2764, RunningAvgSamplesPerSec=6.328184232624906, CurrSamplesPerSec=5.683627555600725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:42:44,929] [INFO] [timer.py:197:stop] 0/2766, RunningAvgSamplesPerSec=6.328199342644374, CurrSamplesPerSec=5.710920497084007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:42:56,205] [INFO] [timer.py:197:stop] 0/2768, RunningAvgSamplesPerSec=6.328209805534292, CurrSamplesPerSec=5.692888371139172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:43:07,530] [INFO] [timer.py:197:stop] 0/2770, RunningAvgSamplesPerSec=6.328207636937677, CurrSamplesPerSec=5.681313390183267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 04:43:18,829] [INFO] [timer.py:197:stop] 0/2772, RunningAvgSamplesPerSec=6.328219406012855, CurrSamplesPerSec=5.719345735316539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:43:30,148] [INFO] [timer.py:197:stop] 0/2774, RunningAvgSamplesPerSec=6.328214153594668, CurrSamplesPerSec=5.664877766860799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:43:41,458] [INFO] [timer.py:197:stop] 0/2776, RunningAvgSamplesPerSec=6.328209011262343, CurrSamplesPerSec=5.684557216659499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:43:52,764] [INFO] [timer.py:197:stop] 0/2778, RunningAvgSamplesPerSec=6.328214432824169, CurrSamplesPerSec=5.699428973114266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:44:04,001] [INFO] [logging.py:68:log_dist] [Rank 0] step=1390, skipped=6, lr=[8.037777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 04:44:04,003] [INFO] [timer.py:197:stop] 0/2780, RunningAvgSamplesPerSec=6.328239435337414, CurrSamplesPerSec=5.736458073376533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:44:15,361] [INFO] [timer.py:197:stop] 0/2782, RunningAvgSamplesPerSec=6.328215285018692, CurrSamplesPerSec=5.705490290448889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:44:26,685] [INFO] [timer.py:197:stop] 0/2784, RunningAvgSamplesPerSec=6.328197790936813, CurrSamplesPerSec=5.676392527622977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:44:38,121] [INFO] [timer.py:197:stop] 0/2786, RunningAvgSamplesPerSec=6.3281798798977915, CurrSamplesPerSec=5.664082408181399, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:44:49,397] [INFO] [timer.py:197:stop] 0/2788, RunningAvgSamplesPerSec=6.328175506082157, CurrSamplesPerSec=5.691385885251289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:45:00,729] [INFO] [timer.py:197:stop] 0/2790, RunningAvgSamplesPerSec=6.3281710018867345, CurrSamplesPerSec=5.694388268224568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
04:45:12,028] [INFO] [timer.py:197:stop] 0/2792, RunningAvgSamplesPerSec=6.3281669555507, CurrSamplesPerSec=5.703269283852379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:45:23,326] [INFO] [timer.py:197:stop] 0/2794, RunningAvgSamplesPerSec=6.328164421474112, CurrSamplesPerSec=5.684226673111327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:45:34,654] [INFO] [timer.py:197:stop] 0/2796, RunningAvgSamplesPerSec=6.328158218086543, CurrSamplesPerSec=5.68168881174327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:45:45,955] [INFO] [timer.py:197:stop] 0/2798, RunningAvgSamplesPerSec=6.328155072410145, CurrSamplesPerSec=5.669355495473393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:45:57,334] [INFO] [logging.py:68:log_dist] [Rank 0] step=1400, skipped=6, lr=[8.015555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 04:45:57,336] [INFO] [timer.py:197:stop] 0/2800, RunningAvgSamplesPerSec=6.328156318484673, CurrSamplesPerSec=5.703200700440707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:46:08,719] [INFO] [timer.py:197:stop] 0/2802, RunningAvgSamplesPerSec=6.328151188584298, CurrSamplesPerSec=5.663888084875867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:46:20,063] [INFO] [timer.py:197:stop] 0/2804, RunningAvgSamplesPerSec=6.32814429565117, CurrSamplesPerSec=5.6945110000917705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:46:31,391] [INFO] [timer.py:197:stop] 0/2806, RunningAvgSamplesPerSec=6.328135042278908, CurrSamplesPerSec=5.6865099651185895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:46:42,684] [INFO] [timer.py:197:stop] 0/2808, RunningAvgSamplesPerSec=6.328128232598905, CurrSamplesPerSec=5.691482904886723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:46:54,023] [INFO] [timer.py:197:stop] 0/2810, RunningAvgSamplesPerSec=6.328110852049497, CurrSamplesPerSec=5.66652226402126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0069, 
'learning_rate': 8.004444444444445e-06, 'epoch': 10.52} [2022-12-19 04:47:05,497] [INFO] [timer.py:197:stop] 0/2812, RunningAvgSamplesPerSec=6.328099181046538, CurrSamplesPerSec=5.660882433561006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:47:16,900] [INFO] [timer.py:197:stop] 0/2814, RunningAvgSamplesPerSec=6.328088625364998, CurrSamplesPerSec=5.702270503943606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:47:28,293] [INFO] [timer.py:197:stop] 0/2816, RunningAvgSamplesPerSec=6.32808113358933, CurrSamplesPerSec=5.686370713924023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:47:39,670] [INFO] [timer.py:197:stop] 0/2818, RunningAvgSamplesPerSec=6.3280773159046175, CurrSamplesPerSec=5.69438367796209, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:47:51,004] [INFO] [logging.py:68:log_dist] [Rank 0] step=1410, skipped=6, lr=[7.993333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 04:47:51,005] [INFO] [timer.py:197:stop] 0/2820, RunningAvgSamplesPerSec=6.32807301196147, CurrSamplesPerSec=5.67045537121452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:48:02,331] [INFO] [timer.py:197:stop] 0/2822, RunningAvgSamplesPerSec=6.328059147017276, CurrSamplesPerSec=5.650533549147907, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:48:13,703] [INFO] [timer.py:197:stop] 0/2824, RunningAvgSamplesPerSec=6.328045706602351, CurrSamplesPerSec=5.672104786704415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:48:25,068] [INFO] [timer.py:197:stop] 0/2826, RunningAvgSamplesPerSec=6.3280339548434315, CurrSamplesPerSec=5.66394258010329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:48:36,472] [INFO] [timer.py:197:stop] 0/2828, RunningAvgSamplesPerSec=6.328030900746557, CurrSamplesPerSec=5.688564833275014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:48:47,792] [INFO] [timer.py:197:stop] 0/2830, RunningAvgSamplesPerSec=6.328026748088272, CurrSamplesPerSec=5.679451686246409, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:48:59,189] [INFO] [timer.py:197:stop] 0/2832, RunningAvgSamplesPerSec=6.328026804564848, CurrSamplesPerSec=5.684458025483507, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:49:10,529] [INFO] [timer.py:197:stop] 0/2834, RunningAvgSamplesPerSec=6.328024982892278, CurrSamplesPerSec=5.67863517180743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:49:21,856] [INFO] [timer.py:197:stop] 0/2836, RunningAvgSamplesPerSec=6.328014395426275, CurrSamplesPerSec=5.659487235536143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:49:33,177] [INFO] [timer.py:197:stop] 0/2838, RunningAvgSamplesPerSec=6.3280235350185565, CurrSamplesPerSec=5.705126267267629, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:49:44,491] [INFO] [logging.py:68:log_dist] [Rank 0] step=1420, skipped=6, lr=[7.971111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 04:49:44,493] [INFO] [timer.py:197:stop] 0/2840, RunningAvgSamplesPerSec=6.328027766405442, CurrSamplesPerSec=5.702016624408494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:49:55,770] [INFO] [timer.py:197:stop] 0/2842, RunningAvgSamplesPerSec=6.328039114135194, CurrSamplesPerSec=5.722776134911764, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:50:07,115] [INFO] [timer.py:197:stop] 0/2844, RunningAvgSamplesPerSec=6.328035000182872, CurrSamplesPerSec=5.678800714469704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:50:18,480] [INFO] [timer.py:197:stop] 0/2846, RunningAvgSamplesPerSec=6.328029164231121, CurrSamplesPerSec=5.68586411876447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:50:29,778] [INFO] [timer.py:197:stop] 0/2848, RunningAvgSamplesPerSec=6.328033474333568, CurrSamplesPerSec=5.704594018524391, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:50:41,212] [INFO] [timer.py:197:stop] 0/2850, RunningAvgSamplesPerSec=6.328011381198926, CurrSamplesPerSec=5.632575689698998, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:50:52,628] [INFO] [timer.py:197:stop] 0/2852, RunningAvgSamplesPerSec=6.328001232294331, CurrSamplesPerSec=5.689375041541364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:51:03,920] [INFO] [timer.py:197:stop] 0/2854, RunningAvgSamplesPerSec=6.328009054709353, CurrSamplesPerSec=5.709449281497758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:51:15,264] [INFO] [timer.py:197:stop] 0/2856, RunningAvgSamplesPerSec=6.328000491390668, CurrSamplesPerSec=5.681008712297285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:51:26,561] [INFO] [timer.py:197:stop] 0/2858, RunningAvgSamplesPerSec=6.328008513130327, CurrSamplesPerSec=5.712748911752284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:51:37,853] [INFO] [logging.py:68:log_dist] [Rank 0] step=1430, skipped=6, lr=[7.948888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 04:51:37,855] [INFO] [timer.py:197:stop] 0/2860, RunningAvgSamplesPerSec=6.328018197364729, CurrSamplesPerSec=5.7013168769694715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0076, 'learning_rate': 7.948888888888889e-06, 'epoch': 10.71} [2022-12-19 04:51:49,176] [INFO] [timer.py:197:stop] 0/2862, RunningAvgSamplesPerSec=6.328020750240365, CurrSamplesPerSec=5.686290490986154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:52:00,450] [INFO] [timer.py:197:stop] 0/2864, RunningAvgSamplesPerSec=6.328016521942696, CurrSamplesPerSec=5.71119266759296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:52:11,749] [INFO] [timer.py:197:stop] 0/2866, RunningAvgSamplesPerSec=6.328023959881682, CurrSamplesPerSec=5.702093900232399, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:52:23,109] [INFO] [timer.py:197:stop] 0/2868, RunningAvgSamplesPerSec=6.328001495044163, CurrSamplesPerSec=5.631199369691056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:52:34,381] [INFO] [timer.py:197:stop] 0/2870, 
RunningAvgSamplesPerSec=6.328013361060001, CurrSamplesPerSec=5.717824381689599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:52:45,635] [INFO] [timer.py:197:stop] 0/2872, RunningAvgSamplesPerSec=6.328018377743403, CurrSamplesPerSec=5.69557014462461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:52:56,930] [INFO] [timer.py:197:stop] 0/2874, RunningAvgSamplesPerSec=6.328016674991873, CurrSamplesPerSec=5.685981424971226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:53:08,231] [INFO] [timer.py:197:stop] 0/2876, RunningAvgSamplesPerSec=6.328018556051217, CurrSamplesPerSec=5.694628663364615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:53:19,518] [INFO] [timer.py:197:stop] 0/2878, RunningAvgSamplesPerSec=6.32801962212495, CurrSamplesPerSec=5.703187371700545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:53:30,802] [INFO] [logging.py:68:log_dist] [Rank 0] step=1440, skipped=6, lr=[7.926666666666666e-06], mom=[[0.9, 0.999]] [2022-12-19 04:53:30,804] [INFO] [timer.py:197:stop] 0/2880, RunningAvgSamplesPerSec=6.32801484536757, CurrSamplesPerSec=5.702641431576725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:53:42,119] [INFO] [timer.py:197:stop] 0/2882, RunningAvgSamplesPerSec=6.328017438366161, CurrSamplesPerSec=5.689974165333387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:53:53,365] [INFO] [timer.py:197:stop] 0/2884, RunningAvgSamplesPerSec=6.328039908054118, CurrSamplesPerSec=5.712858819214412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:54:04,667] [INFO] [timer.py:197:stop] 0/2886, RunningAvgSamplesPerSec=6.328045292302608, CurrSamplesPerSec=5.712285498920746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:54:15,992] [INFO] [timer.py:197:stop] 0/2888, RunningAvgSamplesPerSec=6.328048401473016, CurrSamplesPerSec=5.707663269382921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:54:27,236] [INFO] [timer.py:197:stop] 0/2890, 
RunningAvgSamplesPerSec=6.3280648767684315, CurrSamplesPerSec=5.721754168953787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:54:38,497] [INFO] [timer.py:197:stop] 0/2892, RunningAvgSamplesPerSec=6.3280634780263245, CurrSamplesPerSec=5.695996281710191, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:54:49,765] [INFO] [timer.py:197:stop] 0/2894, RunningAvgSamplesPerSec=6.3280686075609145, CurrSamplesPerSec=5.710282213203645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:55:01,071] [INFO] [timer.py:197:stop] 0/2896, RunningAvgSamplesPerSec=6.328062761211853, CurrSamplesPerSec=5.706813873379465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:55:12,353] [INFO] [timer.py:197:stop] 0/2898, RunningAvgSamplesPerSec=6.328069547975778, CurrSamplesPerSec=5.7058347122383895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:55:23,647] [INFO] [logging.py:68:log_dist] [Rank 0] step=1450, skipped=6, lr=[7.904444444444444e-06], mom=[[0.9, 0.999]] [2022-12-19 04:55:23,649] [INFO] [timer.py:197:stop] 0/2900, RunningAvgSamplesPerSec=6.32806329466499, CurrSamplesPerSec=5.683485316842732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:55:34,939] [INFO] [timer.py:197:stop] 0/2902, RunningAvgSamplesPerSec=6.32806677457631, CurrSamplesPerSec=5.698579364526741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:55:46,243] [INFO] [timer.py:197:stop] 0/2904, RunningAvgSamplesPerSec=6.328061945023685, CurrSamplesPerSec=5.691809224050612, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:55:57,533] [INFO] [timer.py:197:stop] 0/2906, RunningAvgSamplesPerSec=6.328071771991496, CurrSamplesPerSec=5.720959829802187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:56:08,783] [INFO] [timer.py:197:stop] 0/2908, RunningAvgSamplesPerSec=6.328086885746512, CurrSamplesPerSec=5.715824268858664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:56:20,023] [INFO] [timer.py:197:stop] 0/2910, 
RunningAvgSamplesPerSec=6.328106686640438, CurrSamplesPerSec=5.725415319324214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0055, 'learning_rate': 7.893333333333335e-06, 'epoch': 10.9} [2022-12-19 04:56:31,325] [INFO] [timer.py:197:stop] 0/2912, RunningAvgSamplesPerSec=6.328114262056226, CurrSamplesPerSec=5.700369137215287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:56:42,703] [INFO] [timer.py:197:stop] 0/2914, RunningAvgSamplesPerSec=6.328115940023668, CurrSamplesPerSec=5.6972426695467755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:56:54,031] [INFO] [timer.py:197:stop] 0/2916, RunningAvgSamplesPerSec=6.328107630403451, CurrSamplesPerSec=5.674422970074205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:57:05,425] [INFO] [timer.py:197:stop] 0/2918, RunningAvgSamplesPerSec=6.3280804283160075, CurrSamplesPerSec=5.637650295782133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:57:16,699] [INFO] [logging.py:68:log_dist] [Rank 0] step=1460, skipped=6, lr=[7.882222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 04:57:16,700] [INFO] [timer.py:197:stop] 0/2920, RunningAvgSamplesPerSec=6.328087397539253, CurrSamplesPerSec=5.714915257340535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:57:27,981] [INFO] [timer.py:197:stop] 0/2922, RunningAvgSamplesPerSec=6.328092848179146, CurrSamplesPerSec=5.683642237163832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:57:39,304] [INFO] [timer.py:197:stop] 0/2924, RunningAvgSamplesPerSec=6.328093521739278, CurrSamplesPerSec=5.700690422975284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:57:50,564] [INFO] [timer.py:197:stop] 0/2926, RunningAvgSamplesPerSec=6.328110985157465, CurrSamplesPerSec=5.715094602980705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:58:01,840] [INFO] [timer.py:197:stop] 0/2928, RunningAvgSamplesPerSec=6.328118110081718, CurrSamplesPerSec=5.708677293838701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 04:58:13,114] [INFO] [timer.py:197:stop] 0/2930, RunningAvgSamplesPerSec=6.328127849693794, CurrSamplesPerSec=5.717602727319664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:58:24,418] [INFO] [timer.py:197:stop] 0/2932, RunningAvgSamplesPerSec=6.328136722095922, CurrSamplesPerSec=5.704949001644653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:58:35,606] [INFO] [timer.py:197:stop] 0/2934, RunningAvgSamplesPerSec=6.328167290645664, CurrSamplesPerSec=5.745591664906901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:58:46,895] [INFO] [timer.py:197:stop] 0/2936, RunningAvgSamplesPerSec=6.328179729976317, CurrSamplesPerSec=5.713199510585645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:58:57,245] [INFO] [timer.py:197:stop] 0/2938, RunningAvgSamplesPerSec=6.328537537096797, CurrSamplesPerSec=5.693233446543866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:59:08,514] [INFO] [logging.py:68:log_dist] [Rank 0] step=1470, skipped=6, lr=[7.860000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 04:59:08,516] [INFO] [timer.py:197:stop] 0/2940, RunningAvgSamplesPerSec=6.328553112253914, CurrSamplesPerSec=5.718624916277096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:59:19,765] [INFO] [timer.py:197:stop] 0/2942, RunningAvgSamplesPerSec=6.328564247622196, CurrSamplesPerSec=5.71425296841138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:59:31,055] [INFO] [timer.py:197:stop] 0/2944, RunningAvgSamplesPerSec=6.328569138978007, CurrSamplesPerSec=5.700540307562565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:59:42,368] [INFO] [timer.py:197:stop] 0/2946, RunningAvgSamplesPerSec=6.32856017286216, CurrSamplesPerSec=5.651219933859987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 04:59:53,703] [INFO] [timer.py:197:stop] 0/2948, RunningAvgSamplesPerSec=6.328535937381257, CurrSamplesPerSec=5.643053857852759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
05:00:04,996] [INFO] [timer.py:197:stop] 0/2950, RunningAvgSamplesPerSec=6.328538435165773, CurrSamplesPerSec=5.706945149214353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:00:16,337] [INFO] [timer.py:197:stop] 0/2952, RunningAvgSamplesPerSec=6.3285274491430075, CurrSamplesPerSec=5.677940430365727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:00:27,621] [INFO] [timer.py:197:stop] 0/2954, RunningAvgSamplesPerSec=6.328523302257989, CurrSamplesPerSec=5.678563095289823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:00:38,937] [INFO] [timer.py:197:stop] 0/2956, RunningAvgSamplesPerSec=6.328519958301418, CurrSamplesPerSec=5.685603990851538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:00:50,293] [INFO] [timer.py:197:stop] 0/2958, RunningAvgSamplesPerSec=6.328504024560361, CurrSamplesPerSec=5.6660165679965795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:01:01,691] [INFO] [logging.py:68:log_dist] [Rank 0] step=1480, skipped=6, lr=[7.837777777777779e-06], mom=[[0.9, 0.999]] [2022-12-19 05:01:01,693] [INFO] [timer.py:197:stop] 0/2960, RunningAvgSamplesPerSec=6.328466097846122, CurrSamplesPerSec=5.6228752256079675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0043, 'learning_rate': 7.837777777777779e-06, 'epoch': 11.09} [2022-12-19 05:01:13,033] [INFO] [timer.py:197:stop] 0/2962, RunningAvgSamplesPerSec=6.328453487256887, CurrSamplesPerSec=5.66519745543826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:01:24,326] [INFO] [timer.py:197:stop] 0/2964, RunningAvgSamplesPerSec=6.328455442738555, CurrSamplesPerSec=5.694266024812775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:01:35,706] [INFO] [timer.py:197:stop] 0/2966, RunningAvgSamplesPerSec=6.328419452826874, CurrSamplesPerSec=5.623002432486171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:01:47,025] [INFO] [timer.py:197:stop] 0/2968, RunningAvgSamplesPerSec=6.328408417273541, 
CurrSamplesPerSec=5.667863010558653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:01:58,357] [INFO] [timer.py:197:stop] 0/2970, RunningAvgSamplesPerSec=6.328401144547497, CurrSamplesPerSec=5.685270676540449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:02:09,675] [INFO] [timer.py:197:stop] 0/2972, RunningAvgSamplesPerSec=6.328397746174453, CurrSamplesPerSec=5.6911788245553545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:02:20,963] [INFO] [timer.py:197:stop] 0/2974, RunningAvgSamplesPerSec=6.3283936642458265, CurrSamplesPerSec=5.680122278318599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:02:32,252] [INFO] [timer.py:197:stop] 0/2976, RunningAvgSamplesPerSec=6.32839567691284, CurrSamplesPerSec=5.680390078676554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:02:43,576] [INFO] [timer.py:197:stop] 0/2978, RunningAvgSamplesPerSec=6.328380802074389, CurrSamplesPerSec=5.6628275488777335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:02:54,964] [INFO] [logging.py:68:log_dist] [Rank 0] step=1490, skipped=6, lr=[7.815555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 05:02:54,966] [INFO] [timer.py:197:stop] 0/2980, RunningAvgSamplesPerSec=6.328336851121675, CurrSamplesPerSec=5.605354870867302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:03:06,669] [INFO] [timer.py:197:stop] 0/2982, RunningAvgSamplesPerSec=6.328324648518924, CurrSamplesPerSec=5.666310549639738, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:03:18,545] [INFO] [timer.py:197:stop] 0/2984, RunningAvgSamplesPerSec=6.328315291214793, CurrSamplesPerSec=5.67399597698336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:03:30,371] [INFO] [timer.py:197:stop] 0/2986, RunningAvgSamplesPerSec=6.328281639564932, CurrSamplesPerSec=5.616297061030959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:03:41,939] [INFO] [timer.py:197:stop] 0/2988, RunningAvgSamplesPerSec=6.328288006869865, 
CurrSamplesPerSec=5.7066595531340365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:03:53,277] [INFO] [timer.py:197:stop] 0/2990, RunningAvgSamplesPerSec=6.328279222134242, CurrSamplesPerSec=5.680966151374026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:04:04,579] [INFO] [timer.py:197:stop] 0/2992, RunningAvgSamplesPerSec=6.328276428425086, CurrSamplesPerSec=5.6803513734731315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:04:15,881] [INFO] [timer.py:197:stop] 0/2994, RunningAvgSamplesPerSec=6.328286421979469, CurrSamplesPerSec=5.709254747158511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:04:27,424] [INFO] [timer.py:197:stop] 0/2996, RunningAvgSamplesPerSec=6.328291092642149, CurrSamplesPerSec=5.712866843597428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:04:38,684] [INFO] [timer.py:197:stop] 0/2998, RunningAvgSamplesPerSec=6.32829399211752, CurrSamplesPerSec=5.706301203249798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:04:50,157] [INFO] [logging.py:68:log_dist] [Rank 0] step=1500, skipped=6, lr=[7.793333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 05:04:50,159] [INFO] [timer.py:197:stop] 0/3000, RunningAvgSamplesPerSec=6.3282912851712165, CurrSamplesPerSec=5.672812246854605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:05:01,403] [INFO] [timer.py:197:stop] 0/3002, RunningAvgSamplesPerSec=6.328295661889174, CurrSamplesPerSec=5.69814026144726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:05:12,696] [INFO] [timer.py:197:stop] 0/3004, RunningAvgSamplesPerSec=6.328291927135825, CurrSamplesPerSec=5.691676712718208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:05:24,116] [INFO] [timer.py:197:stop] 0/3006, RunningAvgSamplesPerSec=6.3282894192678745, CurrSamplesPerSec=5.695494495641252, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:05:35,589] [INFO] [timer.py:197:stop] 0/3008, RunningAvgSamplesPerSec=6.328262098889924, 
CurrSamplesPerSec=5.6409034710005495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:05:46,948] [INFO] [timer.py:197:stop] 0/3010, RunningAvgSamplesPerSec=6.328253928378329, CurrSamplesPerSec=5.661742571452447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0032, 'learning_rate': 7.782222222222223e-06, 'epoch': 11.28} [2022-12-19 05:05:58,282] [INFO] [timer.py:197:stop] 0/3012, RunningAvgSamplesPerSec=6.328248570858139, CurrSamplesPerSec=5.686013703092876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:06:09,857] [INFO] [timer.py:197:stop] 0/3014, RunningAvgSamplesPerSec=6.3282559988251155, CurrSamplesPerSec=5.71582621618164, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:06:21,185] [INFO] [timer.py:197:stop] 0/3016, RunningAvgSamplesPerSec=6.328254383614397, CurrSamplesPerSec=5.705850236453297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:06:32,473] [INFO] [timer.py:197:stop] 0/3018, RunningAvgSamplesPerSec=6.328255510898824, CurrSamplesPerSec=5.713960072527907, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:06:43,936] [INFO] [logging.py:68:log_dist] [Rank 0] step=1510, skipped=6, lr=[7.771111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 05:06:43,938] [INFO] [timer.py:197:stop] 0/3020, RunningAvgSamplesPerSec=6.3282515180987176, CurrSamplesPerSec=5.6939804885171785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:06:55,384] [INFO] [timer.py:197:stop] 0/3022, RunningAvgSamplesPerSec=6.328193596686554, CurrSamplesPerSec=5.5555001458643565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:07:06,704] [INFO] [timer.py:197:stop] 0/3024, RunningAvgSamplesPerSec=6.328192664787297, CurrSamplesPerSec=5.694170601291176, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:07:18,120] [INFO] [timer.py:197:stop] 0/3026, RunningAvgSamplesPerSec=6.328151454667355, CurrSamplesPerSec=5.583128732314229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:07:29,582] [INFO] 
[timer.py:197:stop] 0/3028, RunningAvgSamplesPerSec=6.328163263932643, CurrSamplesPerSec=5.697568682080219, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:07:40,898] [INFO] [timer.py:197:stop] 0/3030, RunningAvgSamplesPerSec=6.3281596977231445, CurrSamplesPerSec=5.679549501103515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:07:52,454] [INFO] [timer.py:197:stop] 0/3032, RunningAvgSamplesPerSec=6.328132773628256, CurrSamplesPerSec=5.63279552836932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:08:03,765] [INFO] [timer.py:197:stop] 0/3034, RunningAvgSamplesPerSec=6.328121585521561, CurrSamplesPerSec=5.661188060621376, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:08:15,091] [INFO] [timer.py:197:stop] 0/3036, RunningAvgSamplesPerSec=6.328108935852303, CurrSamplesPerSec=5.65200692384674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:08:26,506] [INFO] [timer.py:197:stop] 0/3038, RunningAvgSamplesPerSec=6.328112382478073, CurrSamplesPerSec=5.701557857476143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:08:38,007] [INFO] [logging.py:68:log_dist] [Rank 0] step=1520, skipped=6, lr=[7.748888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 05:08:38,009] [INFO] [timer.py:197:stop] 0/3040, RunningAvgSamplesPerSec=6.328099682099432, CurrSamplesPerSec=5.654802783413342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:08:49,681] [INFO] [timer.py:197:stop] 0/3042, RunningAvgSamplesPerSec=6.327951097032844, CurrSamplesPerSec=5.668309428713679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:09:01,032] [INFO] [timer.py:197:stop] 0/3044, RunningAvgSamplesPerSec=6.327949142878965, CurrSamplesPerSec=5.686962217161316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:09:12,396] [INFO] [timer.py:197:stop] 0/3046, RunningAvgSamplesPerSec=6.327931406304929, CurrSamplesPerSec=5.636227234927698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:09:23,671] [INFO] 
[timer.py:197:stop] 0/3048, RunningAvgSamplesPerSec=6.327939313034381, CurrSamplesPerSec=5.707925419771607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:09:35,054] [INFO] [timer.py:197:stop] 0/3050, RunningAvgSamplesPerSec=6.327907792947351, CurrSamplesPerSec=5.604192350895354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:09:46,646] [INFO] [timer.py:197:stop] 0/3052, RunningAvgSamplesPerSec=6.3279198882446455, CurrSamplesPerSec=5.695960989433436, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:09:57,957] [INFO] [timer.py:197:stop] 0/3054, RunningAvgSamplesPerSec=6.327923346830666, CurrSamplesPerSec=5.691283077008536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:10:09,564] [INFO] [timer.py:197:stop] 0/3056, RunningAvgSamplesPerSec=6.3278934945213665, CurrSamplesPerSec=5.681953151753008, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:10:20,874] [INFO] [timer.py:197:stop] 0/3058, RunningAvgSamplesPerSec=6.327899010653054, CurrSamplesPerSec=5.6879083962568675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:10:32,228] [INFO] [logging.py:68:log_dist] [Rank 0] step=1530, skipped=6, lr=[7.726666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 05:10:32,230] [INFO] [timer.py:197:stop] 0/3060, RunningAvgSamplesPerSec=6.327882549011342, CurrSamplesPerSec=5.664355391606366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0044, 'learning_rate': 7.726666666666667e-06, 'epoch': 11.46} [2022-12-19 05:10:43,590] [INFO] [timer.py:197:stop] 0/3062, RunningAvgSamplesPerSec=6.3278518281033, CurrSamplesPerSec=5.634619432570532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:10:55,163] [INFO] [timer.py:197:stop] 0/3064, RunningAvgSamplesPerSec=6.327851314477935, CurrSamplesPerSec=5.69036303621408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:11:06,649] [INFO] [timer.py:197:stop] 0/3066, RunningAvgSamplesPerSec=6.327767819542793, CurrSamplesPerSec=5.679224826426758, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:11:18,043] [INFO] [timer.py:197:stop] 0/3068, RunningAvgSamplesPerSec=6.327750057566012, CurrSamplesPerSec=5.6643847950685675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:11:29,417] [INFO] [timer.py:197:stop] 0/3070, RunningAvgSamplesPerSec=6.327719741185425, CurrSamplesPerSec=5.604516927936142, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:11:40,720] [INFO] [timer.py:197:stop] 0/3072, RunningAvgSamplesPerSec=6.32772351178092, CurrSamplesPerSec=5.700912947301681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:11:52,228] [INFO] [timer.py:197:stop] 0/3074, RunningAvgSamplesPerSec=6.327646122337193, CurrSamplesPerSec=5.508028984820901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:12:03,644] [INFO] [timer.py:197:stop] 0/3076, RunningAvgSamplesPerSec=6.3276467696620315, CurrSamplesPerSec=5.674475748963572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:12:14,931] [INFO] [timer.py:197:stop] 0/3078, RunningAvgSamplesPerSec=6.327656265029042, CurrSamplesPerSec=5.709478669235905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:12:26,469] [INFO] [logging.py:68:log_dist] [Rank 0] step=1540, skipped=6, lr=[7.704444444444446e-06], mom=[[0.9, 0.999]] [2022-12-19 05:12:26,471] [INFO] [timer.py:197:stop] 0/3080, RunningAvgSamplesPerSec=6.32763300526359, CurrSamplesPerSec=5.6766467714212245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:12:37,769] [INFO] [timer.py:197:stop] 0/3082, RunningAvgSamplesPerSec=6.327630636417984, CurrSamplesPerSec=5.680954369084249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:12:49,172] [INFO] [timer.py:197:stop] 0/3084, RunningAvgSamplesPerSec=6.327599220409787, CurrSamplesPerSec=5.595684637684543, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:13:00,673] [INFO] [timer.py:197:stop] 0/3086, RunningAvgSamplesPerSec=6.3276168914519095, CurrSamplesPerSec=5.706890066012173, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:13:11,943] [INFO] [timer.py:197:stop] 0/3088, RunningAvgSamplesPerSec=6.327629744248869, CurrSamplesPerSec=5.722205457683144, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:13:23,248] [INFO] [timer.py:197:stop] 0/3090, RunningAvgSamplesPerSec=6.327630430349635, CurrSamplesPerSec=5.72100884465378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:13:34,581] [INFO] [timer.py:197:stop] 0/3092, RunningAvgSamplesPerSec=6.32761432887818, CurrSamplesPerSec=5.64807891790032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:13:46,009] [INFO] [timer.py:197:stop] 0/3094, RunningAvgSamplesPerSec=6.327569404316789, CurrSamplesPerSec=5.580794954496749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:13:57,334] [INFO] [timer.py:197:stop] 0/3096, RunningAvgSamplesPerSec=6.327560290986922, CurrSamplesPerSec=5.676106140498819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:14:09,018] [INFO] [timer.py:197:stop] 0/3098, RunningAvgSamplesPerSec=6.3275212487131665, CurrSamplesPerSec=5.601465704109731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:14:20,352] [INFO] [logging.py:68:log_dist] [Rank 0] step=1550, skipped=6, lr=[7.682222222222224e-06], mom=[[0.9, 0.999]] [2022-12-19 05:14:20,354] [INFO] [timer.py:197:stop] 0/3100, RunningAvgSamplesPerSec=6.327504026790158, CurrSamplesPerSec=5.685565696252131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:14:31,771] [INFO] [timer.py:197:stop] 0/3102, RunningAvgSamplesPerSec=6.327500181985927, CurrSamplesPerSec=5.67683740919, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:14:43,122] [INFO] [timer.py:197:stop] 0/3104, RunningAvgSamplesPerSec=6.327482044237356, CurrSamplesPerSec=5.6627269643523155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:14:54,439] [INFO] [timer.py:197:stop] 0/3106, RunningAvgSamplesPerSec=6.327473741601666, CurrSamplesPerSec=5.68795612325707, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 05:15:05,970] [INFO] [timer.py:197:stop] 0/3108, RunningAvgSamplesPerSec=6.327386040239431, CurrSamplesPerSec=5.473776731707643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:15:17,435] [INFO] [timer.py:197:stop] 0/3110, RunningAvgSamplesPerSec=6.327378365915814, CurrSamplesPerSec=5.674505497497502, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0047, 'learning_rate': 7.67111111111111e-06, 'epoch': 11.65} [2022-12-19 05:15:28,782] [INFO] [timer.py:197:stop] 0/3112, RunningAvgSamplesPerSec=6.327355247962858, CurrSamplesPerSec=5.662424754437423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:15:40,378] [INFO] [timer.py:197:stop] 0/3114, RunningAvgSamplesPerSec=6.327344715012686, CurrSamplesPerSec=5.682350549893783, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:15:51,683] [INFO] [timer.py:197:stop] 0/3116, RunningAvgSamplesPerSec=6.327342463999624, CurrSamplesPerSec=5.682087856759515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:16:03,120] [INFO] [timer.py:197:stop] 0/3118, RunningAvgSamplesPerSec=6.327283237783883, CurrSamplesPerSec=5.540135768232988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:16:14,403] [INFO] [logging.py:68:log_dist] [Rank 0] step=1560, skipped=6, lr=[7.660000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 05:16:14,404] [INFO] [timer.py:197:stop] 0/3120, RunningAvgSamplesPerSec=6.327295099897236, CurrSamplesPerSec=5.703068869822721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:16:25,813] [INFO] [timer.py:197:stop] 0/3122, RunningAvgSamplesPerSec=6.327293567562267, CurrSamplesPerSec=5.69359281375146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:16:37,183] [INFO] [timer.py:197:stop] 0/3124, RunningAvgSamplesPerSec=6.327258503951038, CurrSamplesPerSec=5.6657754731933405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:16:48,473] [INFO] [timer.py:197:stop] 0/3126, 
RunningAvgSamplesPerSec=6.327265701232279, CurrSamplesPerSec=5.694275688132538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:16:59,810] [INFO] [timer.py:197:stop] 0/3128, RunningAvgSamplesPerSec=6.327251002406163, CurrSamplesPerSec=5.640321272624598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:17:11,101] [INFO] [timer.py:197:stop] 0/3130, RunningAvgSamplesPerSec=6.3272614294069225, CurrSamplesPerSec=5.695106131384954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:17:22,536] [INFO] [timer.py:197:stop] 0/3132, RunningAvgSamplesPerSec=6.327217620122587, CurrSamplesPerSec=5.5795041249204855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:17:34,068] [INFO] [timer.py:197:stop] 0/3134, RunningAvgSamplesPerSec=6.327234372320249, CurrSamplesPerSec=5.734551974027534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:17:45,340] [INFO] [timer.py:197:stop] 0/3136, RunningAvgSamplesPerSec=6.327251951136241, CurrSamplesPerSec=5.714942267972056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:17:56,632] [INFO] [timer.py:197:stop] 0/3138, RunningAvgSamplesPerSec=6.327243235304735, CurrSamplesPerSec=5.7043384774258525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:18:08,129] [INFO] [logging.py:68:log_dist] [Rank 0] step=1570, skipped=6, lr=[7.637777777777779e-06], mom=[[0.9, 0.999]] [2022-12-19 05:18:08,131] [INFO] [timer.py:197:stop] 0/3140, RunningAvgSamplesPerSec=6.3272565567054695, CurrSamplesPerSec=5.717150703099615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:18:19,511] [INFO] [timer.py:197:stop] 0/3142, RunningAvgSamplesPerSec=6.32722852672274, CurrSamplesPerSec=5.604298588988932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:18:30,811] [INFO] [timer.py:197:stop] 0/3144, RunningAvgSamplesPerSec=6.327239968785513, CurrSamplesPerSec=5.691776155981903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:18:42,148] [INFO] [timer.py:197:stop] 0/3146, 
RunningAvgSamplesPerSec=6.327242247085782, CurrSamplesPerSec=5.675536092889234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:18:53,388] [INFO] [timer.py:197:stop] 0/3148, RunningAvgSamplesPerSec=6.327250329126662, CurrSamplesPerSec=5.729694544309534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:19:04,927] [INFO] [timer.py:197:stop] 0/3150, RunningAvgSamplesPerSec=6.3272649526193865, CurrSamplesPerSec=5.696082338684128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:19:16,210] [INFO] [timer.py:197:stop] 0/3152, RunningAvgSamplesPerSec=6.327273335779039, CurrSamplesPerSec=5.681502898517167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:19:27,460] [INFO] [timer.py:197:stop] 0/3154, RunningAvgSamplesPerSec=6.327283745467141, CurrSamplesPerSec=5.693651021983541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:19:38,731] [INFO] [timer.py:197:stop] 0/3156, RunningAvgSamplesPerSec=6.327296091314748, CurrSamplesPerSec=5.719135904130286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:19:50,365] [INFO] [timer.py:197:stop] 0/3158, RunningAvgSamplesPerSec=6.32726778656571, CurrSamplesPerSec=5.676677503092577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:20:01,664] [INFO] [logging.py:68:log_dist] [Rank 0] step=1580, skipped=6, lr=[7.6155555555555564e-06], mom=[[0.9, 0.999]] [2022-12-19 05:20:01,665] [INFO] [timer.py:197:stop] 0/3160, RunningAvgSamplesPerSec=6.3272730699261, CurrSamplesPerSec=5.698339119581938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0038, 'learning_rate': 7.6155555555555564e-06, 'epoch': 11.84} [2022-12-19 05:20:13,244] [INFO] [timer.py:197:stop] 0/3162, RunningAvgSamplesPerSec=6.327276288665601, CurrSamplesPerSec=5.721637089454292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:20:24,469] [INFO] [timer.py:197:stop] 0/3164, RunningAvgSamplesPerSec=6.327304389628525, CurrSamplesPerSec=5.737241762577872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 05:20:35,848] [INFO] [timer.py:197:stop] 0/3166, RunningAvgSamplesPerSec=6.327292393082484, CurrSamplesPerSec=5.652820321588672, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:20:47,387] [INFO] [timer.py:197:stop] 0/3168, RunningAvgSamplesPerSec=6.327301074156639, CurrSamplesPerSec=5.718256048089038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:20:58,713] [INFO] [timer.py:197:stop] 0/3170, RunningAvgSamplesPerSec=6.327305581706233, CurrSamplesPerSec=5.703412272241028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:21:10,110] [INFO] [timer.py:197:stop] 0/3172, RunningAvgSamplesPerSec=6.327312285317053, CurrSamplesPerSec=5.708586485485943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:21:21,485] [INFO] [timer.py:197:stop] 0/3174, RunningAvgSamplesPerSec=6.327326923478493, CurrSamplesPerSec=5.711396812499822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:21:32,826] [INFO] [timer.py:197:stop] 0/3176, RunningAvgSamplesPerSec=6.327325726442489, CurrSamplesPerSec=5.680602365662611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:21:44,160] [INFO] [timer.py:197:stop] 0/3178, RunningAvgSamplesPerSec=6.32733102583557, CurrSamplesPerSec=5.693606580736237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:21:55,664] [INFO] [logging.py:68:log_dist] [Rank 0] step=1590, skipped=6, lr=[7.593333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 05:21:55,665] [INFO] [timer.py:197:stop] 0/3180, RunningAvgSamplesPerSec=6.327349225959304, CurrSamplesPerSec=5.714199690264869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:22:07,033] [INFO] [timer.py:197:stop] 0/3182, RunningAvgSamplesPerSec=6.3273413270946435, CurrSamplesPerSec=5.6992772300721075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:22:18,393] [INFO] [timer.py:197:stop] 0/3184, RunningAvgSamplesPerSec=6.327329403083579, CurrSamplesPerSec=5.639379721595136, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
05:22:30,036] [INFO] [timer.py:197:stop] 0/3186, RunningAvgSamplesPerSec=6.327317586280409, CurrSamplesPerSec=5.657172414583765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:22:41,363] [INFO] [timer.py:197:stop] 0/3188, RunningAvgSamplesPerSec=6.327328226964402, CurrSamplesPerSec=5.697944319927457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:22:52,706] [INFO] [timer.py:197:stop] 0/3190, RunningAvgSamplesPerSec=6.32733617519061, CurrSamplesPerSec=5.703879336813997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:23:04,112] [INFO] [timer.py:197:stop] 0/3192, RunningAvgSamplesPerSec=6.32735327600993, CurrSamplesPerSec=5.709565619852711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:23:15,540] [INFO] [timer.py:197:stop] 0/3194, RunningAvgSamplesPerSec=6.327367973389358, CurrSamplesPerSec=5.7078729877674155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:23:26,882] [INFO] [timer.py:197:stop] 0/3196, RunningAvgSamplesPerSec=6.327366818085398, CurrSamplesPerSec=5.681731383495379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:23:38,465] [INFO] [timer.py:197:stop] 0/3198, RunningAvgSamplesPerSec=6.327346119321882, CurrSamplesPerSec=5.634643797120674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:23:49,951] [INFO] [logging.py:68:log_dist] [Rank 0] step=1600, skipped=6, lr=[7.571111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 05:23:49,953] [INFO] [timer.py:197:stop] 0/3200, RunningAvgSamplesPerSec=6.327278186682201, CurrSamplesPerSec=5.517059994377617, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:24:01,229] [INFO] [timer.py:197:stop] 0/3202, RunningAvgSamplesPerSec=6.327293799464067, CurrSamplesPerSec=5.706954612955244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:24:11,631] [INFO] [timer.py:197:stop] 0/3204, RunningAvgSamplesPerSec=6.327604126568133, CurrSamplesPerSec=6.58555247032808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:24:23,210] 
[INFO] [timer.py:197:stop] 0/3206, RunningAvgSamplesPerSec=6.32760952278763, CurrSamplesPerSec=5.701025789996429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:24:34,531] [INFO] [timer.py:197:stop] 0/3208, RunningAvgSamplesPerSec=6.327615722638122, CurrSamplesPerSec=5.689247707815773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:24:46,182] [INFO] [timer.py:197:stop] 0/3210, RunningAvgSamplesPerSec=6.327599460991006, CurrSamplesPerSec=5.698869959750996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:24:57,539] [INFO] [timer.py:197:stop] 0/3212, RunningAvgSamplesPerSec=6.3276058441133936, CurrSamplesPerSec=5.696755896838502, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0046, 'learning_rate': 7.557777777777779e-06, 'epoch': 12.03} [2022-12-19 05:25:08,916] [INFO] [timer.py:197:stop] 0/3214, RunningAvgSamplesPerSec=6.327596566349187, CurrSamplesPerSec=5.648633716222518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:25:20,350] [INFO] [timer.py:197:stop] 0/3216, RunningAvgSamplesPerSec=6.327601960494762, CurrSamplesPerSec=5.716863597678646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:25:31,644] [INFO] [timer.py:197:stop] 0/3218, RunningAvgSamplesPerSec=6.32761008394252, CurrSamplesPerSec=5.715650719027149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:25:43,025] [INFO] [logging.py:68:log_dist] [Rank 0] step=1610, skipped=6, lr=[7.54888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 05:25:43,028] [INFO] [timer.py:197:stop] 0/3220, RunningAvgSamplesPerSec=6.327618110916218, CurrSamplesPerSec=5.731799349975107, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:25:54,314] [INFO] [timer.py:197:stop] 0/3222, RunningAvgSamplesPerSec=6.32762548397259, CurrSamplesPerSec=5.70506806666924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:26:05,685] [INFO] [timer.py:197:stop] 0/3224, RunningAvgSamplesPerSec=6.327600770761767, CurrSamplesPerSec=5.606608742146772, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:26:16,949] [INFO] [timer.py:197:stop] 0/3226, RunningAvgSamplesPerSec=6.327612348021798, CurrSamplesPerSec=5.708296840713409, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:26:28,506] [INFO] [timer.py:197:stop] 0/3228, RunningAvgSamplesPerSec=6.3276311426972525, CurrSamplesPerSec=5.714326440361829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:26:39,847] [INFO] [timer.py:197:stop] 0/3230, RunningAvgSamplesPerSec=6.32761817671236, CurrSamplesPerSec=5.7233876836940265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:26:51,132] [INFO] [timer.py:197:stop] 0/3232, RunningAvgSamplesPerSec=6.32762935223219, CurrSamplesPerSec=5.707601618992729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:27:02,748] [INFO] [timer.py:197:stop] 0/3234, RunningAvgSamplesPerSec=6.32764760344975, CurrSamplesPerSec=5.717935702509625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:27:14,038] [INFO] [timer.py:197:stop] 0/3236, RunningAvgSamplesPerSec=6.327659647626639, CurrSamplesPerSec=5.698907465873488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:27:25,456] [INFO] [timer.py:197:stop] 0/3238, RunningAvgSamplesPerSec=6.327619817429751, CurrSamplesPerSec=5.580952289077431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:27:36,982] [INFO] [logging.py:68:log_dist] [Rank 0] step=1620, skipped=6, lr=[7.526666666666668e-06], mom=[[0.9, 0.999]] [2022-12-19 05:27:36,984] [INFO] [timer.py:197:stop] 0/3240, RunningAvgSamplesPerSec=6.327627961024659, CurrSamplesPerSec=5.702992294541466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:27:48,273] [INFO] [timer.py:197:stop] 0/3242, RunningAvgSamplesPerSec=6.3276443613839435, CurrSamplesPerSec=5.725364763567056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:27:59,618] [INFO] [timer.py:197:stop] 0/3244, RunningAvgSamplesPerSec=6.327635652418653, CurrSamplesPerSec=5.700897692087018, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:28:11,162] [INFO] [timer.py:197:stop] 0/3246, RunningAvgSamplesPerSec=6.327655089079466, CurrSamplesPerSec=5.719064014291852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:28:22,792] [INFO] [timer.py:197:stop] 0/3248, RunningAvgSamplesPerSec=6.3275342379150965, CurrSamplesPerSec=5.3689646827188975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:28:34,115] [INFO] [timer.py:197:stop] 0/3250, RunningAvgSamplesPerSec=6.32753796391829, CurrSamplesPerSec=5.694850714696345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:28:45,582] [INFO] [timer.py:197:stop] 0/3252, RunningAvgSamplesPerSec=6.327483597221704, CurrSamplesPerSec=5.535634632361154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:28:56,999] [INFO] [timer.py:197:stop] 0/3254, RunningAvgSamplesPerSec=6.327490856308182, CurrSamplesPerSec=5.693454423024037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:29:08,274] [INFO] [timer.py:197:stop] 0/3256, RunningAvgSamplesPerSec=6.327510110590001, CurrSamplesPerSec=5.723780647127385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:29:19,824] [INFO] [timer.py:197:stop] 0/3258, RunningAvgSamplesPerSec=6.327516599602008, CurrSamplesPerSec=5.719700606830363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:29:31,085] [INFO] [logging.py:68:log_dist] [Rank 0] step=1630, skipped=6, lr=[7.504444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 05:29:31,086] [INFO] [timer.py:197:stop] 0/3260, RunningAvgSamplesPerSec=6.327525091138928, CurrSamplesPerSec=5.709420136932266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:29:42,349] [INFO] [timer.py:197:stop] 0/3262, RunningAvgSamplesPerSec=6.327527075664958, CurrSamplesPerSec=5.690434930076675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0033, 'learning_rate': 7.502222222222223e-06, 'epoch': 12.22} [2022-12-19 05:29:53,584] [INFO] [timer.py:197:stop] 0/3264, 
RunningAvgSamplesPerSec=6.327544151844336, CurrSamplesPerSec=5.709951829176723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:30:05,062] [INFO] [timer.py:197:stop] 0/3266, RunningAvgSamplesPerSec=6.327557789643045, CurrSamplesPerSec=5.719409833162306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:30:16,698] [INFO] [timer.py:197:stop] 0/3268, RunningAvgSamplesPerSec=6.327424981811556, CurrSamplesPerSec=5.643409053506663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:30:28,176] [INFO] [timer.py:197:stop] 0/3270, RunningAvgSamplesPerSec=6.327434505779468, CurrSamplesPerSec=5.7109987434483775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:30:39,535] [INFO] [timer.py:197:stop] 0/3272, RunningAvgSamplesPerSec=6.32741149979322, CurrSamplesPerSec=5.605497673837791, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:30:50,851] [INFO] [timer.py:197:stop] 0/3274, RunningAvgSamplesPerSec=6.327411286233473, CurrSamplesPerSec=5.682445337079244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:31:02,348] [INFO] [timer.py:197:stop] 0/3276, RunningAvgSamplesPerSec=6.327341354910669, CurrSamplesPerSec=5.505255306742497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:31:13,810] [INFO] [timer.py:197:stop] 0/3278, RunningAvgSamplesPerSec=6.327349458695228, CurrSamplesPerSec=5.71120870702165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:31:25,098] [INFO] [logging.py:68:log_dist] [Rank 0] step=1640, skipped=6, lr=[7.482222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 05:31:25,101] [INFO] [timer.py:197:stop] 0/3280, RunningAvgSamplesPerSec=6.327355047199058, CurrSamplesPerSec=5.698099620616975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:31:36,418] [INFO] [timer.py:197:stop] 0/3282, RunningAvgSamplesPerSec=6.327348053369216, CurrSamplesPerSec=5.687372607180313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:31:47,707] [INFO] [timer.py:197:stop] 0/3284, 
RunningAvgSamplesPerSec=6.327352285172244, CurrSamplesPerSec=5.708629946877198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:31:59,359] [INFO] [timer.py:197:stop] 0/3286, RunningAvgSamplesPerSec=6.327221677951555, CurrSamplesPerSec=5.361217140718874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:32:10,646] [INFO] [timer.py:197:stop] 0/3288, RunningAvgSamplesPerSec=6.32724343148276, CurrSamplesPerSec=5.7360609159296425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:32:21,899] [INFO] [timer.py:197:stop] 0/3290, RunningAvgSamplesPerSec=6.327256840019347, CurrSamplesPerSec=5.717670683261285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:32:33,368] [INFO] [timer.py:197:stop] 0/3292, RunningAvgSamplesPerSec=6.327261954032543, CurrSamplesPerSec=5.699463098260543, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:32:44,687] [INFO] [timer.py:197:stop] 0/3294, RunningAvgSamplesPerSec=6.327264404060781, CurrSamplesPerSec=5.722627294051534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:32:55,976] [INFO] [timer.py:197:stop] 0/3296, RunningAvgSamplesPerSec=6.327271926749637, CurrSamplesPerSec=5.69979323779556, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:33:07,454] [INFO] [timer.py:197:stop] 0/3298, RunningAvgSamplesPerSec=6.327271619355063, CurrSamplesPerSec=5.705782803758263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:33:18,753] [INFO] [logging.py:68:log_dist] [Rank 0] step=1650, skipped=6, lr=[7.4600000000000006e-06], mom=[[0.9, 0.999]] [2022-12-19 05:33:18,755] [INFO] [timer.py:197:stop] 0/3300, RunningAvgSamplesPerSec=6.327284155641359, CurrSamplesPerSec=5.713918232709922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:33:30,037] [INFO] [timer.py:197:stop] 0/3302, RunningAvgSamplesPerSec=6.327284370338174, CurrSamplesPerSec=5.683102920017716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:33:41,626] [INFO] [timer.py:197:stop] 0/3304, 
RunningAvgSamplesPerSec=6.327274368910331, CurrSamplesPerSec=5.659979116507932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:33:53,016] [INFO] [timer.py:197:stop] 0/3306, RunningAvgSamplesPerSec=6.3272381215557, CurrSamplesPerSec=5.586416927154795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:34:04,330] [INFO] [timer.py:197:stop] 0/3308, RunningAvgSamplesPerSec=6.327235938884898, CurrSamplesPerSec=5.677878940004718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:34:15,674] [INFO] [timer.py:197:stop] 0/3310, RunningAvgSamplesPerSec=6.327210054771652, CurrSamplesPerSec=5.6313236456871625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:34:27,161] [INFO] [timer.py:197:stop] 0/3312, RunningAvgSamplesPerSec=6.3272076054283, CurrSamplesPerSec=5.701639238409189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.003, 'learning_rate': 7.446666666666668e-06, 'epoch': 12.4} [2022-12-19 05:34:38,473] [INFO] [timer.py:197:stop] 0/3314, RunningAvgSamplesPerSec=6.327206463449304, CurrSamplesPerSec=5.695051518133289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:34:49,885] [INFO] [timer.py:197:stop] 0/3316, RunningAvgSamplesPerSec=6.327191212694582, CurrSamplesPerSec=5.689716314386778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:35:01,331] [INFO] [timer.py:197:stop] 0/3318, RunningAvgSamplesPerSec=6.327174328680244, CurrSamplesPerSec=5.670730167244587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:35:12,677] [INFO] [logging.py:68:log_dist] [Rank 0] step=1660, skipped=6, lr=[7.437777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 05:35:12,679] [INFO] [timer.py:197:stop] 0/3320, RunningAvgSamplesPerSec=6.327157520485046, CurrSamplesPerSec=5.6574161162795304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:35:23,953] [INFO] [timer.py:197:stop] 0/3322, RunningAvgSamplesPerSec=6.32716703377341, CurrSamplesPerSec=5.705027326956902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 05:35:35,468] [INFO] [timer.py:197:stop] 0/3324, RunningAvgSamplesPerSec=6.327172944901139, CurrSamplesPerSec=5.716967819213315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:35:46,844] [INFO] [timer.py:197:stop] 0/3326, RunningAvgSamplesPerSec=6.327153050067128, CurrSamplesPerSec=5.705053759203976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:35:58,242] [INFO] [timer.py:197:stop] 0/3328, RunningAvgSamplesPerSec=6.327118030876253, CurrSamplesPerSec=5.6032223540879, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:36:09,889] [INFO] [timer.py:197:stop] 0/3330, RunningAvgSamplesPerSec=6.327091776262096, CurrSamplesPerSec=5.669499422712521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:36:21,220] [INFO] [timer.py:197:stop] 0/3332, RunningAvgSamplesPerSec=6.327078902932199, CurrSamplesPerSec=5.66766292315318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:36:32,676] [INFO] [timer.py:197:stop] 0/3334, RunningAvgSamplesPerSec=6.327016952108796, CurrSamplesPerSec=5.5291192646105705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:36:44,191] [INFO] [timer.py:197:stop] 0/3336, RunningAvgSamplesPerSec=6.327007348463326, CurrSamplesPerSec=5.694495054313937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:36:55,533] [INFO] [timer.py:197:stop] 0/3338, RunningAvgSamplesPerSec=6.327017349510959, CurrSamplesPerSec=5.7087253700186045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:37:06,945] [INFO] [logging.py:68:log_dist] [Rank 0] step=1670, skipped=6, lr=[7.415555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 05:37:06,946] [INFO] [timer.py:197:stop] 0/3340, RunningAvgSamplesPerSec=6.32696920973117, CurrSamplesPerSec=5.681510835055693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:37:18,380] [INFO] [timer.py:197:stop] 0/3342, RunningAvgSamplesPerSec=6.326947923907522, CurrSamplesPerSec=5.6529372206309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
05:37:29,663] [INFO] [timer.py:197:stop] 0/3344, RunningAvgSamplesPerSec=6.326934113634141, CurrSamplesPerSec=5.6519400437845375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:37:41,002] [INFO] [timer.py:197:stop] 0/3346, RunningAvgSamplesPerSec=6.326924464797597, CurrSamplesPerSec=5.675516413290304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:37:52,547] [INFO] [timer.py:197:stop] 0/3348, RunningAvgSamplesPerSec=6.326925402694814, CurrSamplesPerSec=5.691216953481804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:38:03,878] [INFO] [timer.py:197:stop] 0/3350, RunningAvgSamplesPerSec=6.326923085929561, CurrSamplesPerSec=5.6846903597717695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:38:15,367] [INFO] [timer.py:197:stop] 0/3352, RunningAvgSamplesPerSec=6.326927717158954, CurrSamplesPerSec=5.70921953324575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:38:26,649] [INFO] [timer.py:197:stop] 0/3354, RunningAvgSamplesPerSec=6.32692991987726, CurrSamplesPerSec=5.70514494021122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:38:37,959] [INFO] [timer.py:197:stop] 0/3356, RunningAvgSamplesPerSec=6.326934590336984, CurrSamplesPerSec=5.696143015248453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:38:49,490] [INFO] [timer.py:197:stop] 0/3358, RunningAvgSamplesPerSec=6.326858953976911, CurrSamplesPerSec=5.495537103671897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:39:00,913] [INFO] [logging.py:68:log_dist] [Rank 0] step=1680, skipped=6, lr=[7.393333333333333e-06], mom=[[0.9, 0.999]] [2022-12-19 05:39:00,914] [INFO] [timer.py:197:stop] 0/3360, RunningAvgSamplesPerSec=6.326854292042068, CurrSamplesPerSec=5.690879361321751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:39:12,248] [INFO] [timer.py:197:stop] 0/3362, RunningAvgSamplesPerSec=6.3268493480759425, CurrSamplesPerSec=5.684688915147572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0029, 
'learning_rate': 7.3911111111111125e-06, 'epoch': 12.59} [2022-12-19 05:39:23,817] [INFO] [timer.py:197:stop] 0/3364, RunningAvgSamplesPerSec=6.326834050070454, CurrSamplesPerSec=5.6998418906231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:39:35,119] [INFO] [timer.py:197:stop] 0/3366, RunningAvgSamplesPerSec=6.326828098037519, CurrSamplesPerSec=5.682196587601569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:39:46,559] [INFO] [timer.py:197:stop] 0/3368, RunningAvgSamplesPerSec=6.326780815829117, CurrSamplesPerSec=5.5551556997185125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:39:57,875] [INFO] [timer.py:197:stop] 0/3370, RunningAvgSamplesPerSec=6.32677821758068, CurrSamplesPerSec=5.676400689959532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:40:09,255] [INFO] [timer.py:197:stop] 0/3372, RunningAvgSamplesPerSec=6.3267882571478955, CurrSamplesPerSec=5.710604131472058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:40:20,689] [INFO] [timer.py:197:stop] 0/3374, RunningAvgSamplesPerSec=6.326730527836922, CurrSamplesPerSec=5.707032750682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:40:32,148] [INFO] [timer.py:197:stop] 0/3376, RunningAvgSamplesPerSec=6.3267295741725444, CurrSamplesPerSec=5.694333185563267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:40:43,510] [INFO] [timer.py:197:stop] 0/3378, RunningAvgSamplesPerSec=6.326705435815362, CurrSamplesPerSec=5.611262213805419, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:40:54,804] [INFO] [logging.py:68:log_dist] [Rank 0] step=1690, skipped=6, lr=[7.371111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 05:40:54,806] [INFO] [timer.py:197:stop] 0/3380, RunningAvgSamplesPerSec=6.326714308887534, CurrSamplesPerSec=5.707634871203096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:41:06,075] [INFO] [timer.py:197:stop] 0/3382, RunningAvgSamplesPerSec=6.326713177964571, CurrSamplesPerSec=5.700444189425968, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:41:17,729] [INFO] [timer.py:197:stop] 0/3384, RunningAvgSamplesPerSec=6.326720135513513, CurrSamplesPerSec=5.7290645289179665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:41:28,994] [INFO] [timer.py:197:stop] 0/3386, RunningAvgSamplesPerSec=6.326732466797276, CurrSamplesPerSec=5.714702101314757, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:41:40,441] [INFO] [timer.py:197:stop] 0/3388, RunningAvgSamplesPerSec=6.326686513680558, CurrSamplesPerSec=5.57524727753475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:41:51,769] [INFO] [timer.py:197:stop] 0/3390, RunningAvgSamplesPerSec=6.326677841682777, CurrSamplesPerSec=5.658647345374064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:42:03,178] [INFO] [timer.py:197:stop] 0/3392, RunningAvgSamplesPerSec=6.326646116079718, CurrSamplesPerSec=5.611298341043529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:42:14,585] [INFO] [timer.py:197:stop] 0/3394, RunningAvgSamplesPerSec=6.3266519814796816, CurrSamplesPerSec=5.702809588411382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:42:25,883] [INFO] [timer.py:197:stop] 0/3396, RunningAvgSamplesPerSec=6.326648762953722, CurrSamplesPerSec=5.6889624334353215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:42:37,202] [INFO] [timer.py:197:stop] 0/3398, RunningAvgSamplesPerSec=6.326648497846412, CurrSamplesPerSec=5.687705444928099, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:42:48,474] [INFO] [logging.py:68:log_dist] [Rank 0] step=1700, skipped=6, lr=[7.3488888888888895e-06], mom=[[0.9, 0.999]] [2022-12-19 05:42:48,476] [INFO] [timer.py:197:stop] 0/3400, RunningAvgSamplesPerSec=6.326657401019103, CurrSamplesPerSec=5.698191547131893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:42:59,790] [INFO] [timer.py:197:stop] 0/3402, RunningAvgSamplesPerSec=6.326651698781593, CurrSamplesPerSec=5.673167122193093, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:43:11,321] [INFO] [timer.py:197:stop] 0/3404, RunningAvgSamplesPerSec=6.326648946103703, CurrSamplesPerSec=5.682953248174814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:43:22,614] [INFO] [timer.py:197:stop] 0/3406, RunningAvgSamplesPerSec=6.326632365280785, CurrSamplesPerSec=5.6279915931762865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:43:33,924] [INFO] [timer.py:197:stop] 0/3408, RunningAvgSamplesPerSec=6.326635676428318, CurrSamplesPerSec=5.707156756249256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:43:45,365] [INFO] [timer.py:197:stop] 0/3410, RunningAvgSamplesPerSec=6.326645839426628, CurrSamplesPerSec=5.714854180187268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:43:56,690] [INFO] [timer.py:197:stop] 0/3412, RunningAvgSamplesPerSec=6.326625956215127, CurrSamplesPerSec=5.598389311708886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.004, 'learning_rate': 7.335555555555556e-06, 'epoch': 12.78} [2022-12-19 05:44:07,988] [INFO] [timer.py:197:stop] 0/3414, RunningAvgSamplesPerSec=6.326631776114331, CurrSamplesPerSec=5.68953469907651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:44:19,490] [INFO] [timer.py:197:stop] 0/3416, RunningAvgSamplesPerSec=6.3266360505014525, CurrSamplesPerSec=5.698103491147264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:44:30,799] [INFO] [timer.py:197:stop] 0/3418, RunningAvgSamplesPerSec=6.32663644224908, CurrSamplesPerSec=5.710917338114325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:44:42,108] [INFO] [logging.py:68:log_dist] [Rank 0] step=1710, skipped=6, lr=[7.326666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 05:44:42,110] [INFO] [timer.py:197:stop] 0/3420, RunningAvgSamplesPerSec=6.326639560877393, CurrSamplesPerSec=5.717202575093439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:44:53,582] [INFO] [timer.py:197:stop] 0/3422, 
RunningAvgSamplesPerSec=6.326611401896304, CurrSamplesPerSec=5.629033690097871, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:45:04,923] [INFO] [timer.py:197:stop] 0/3424, RunningAvgSamplesPerSec=6.326606987698249, CurrSamplesPerSec=5.6887572368430535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:45:16,242] [INFO] [timer.py:197:stop] 0/3426, RunningAvgSamplesPerSec=6.326598769005772, CurrSamplesPerSec=5.679005433692618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:45:27,561] [INFO] [timer.py:197:stop] 0/3428, RunningAvgSamplesPerSec=6.3266131227743445, CurrSamplesPerSec=5.72014450193903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:45:38,935] [INFO] [timer.py:197:stop] 0/3430, RunningAvgSamplesPerSec=6.326605726866086, CurrSamplesPerSec=5.686675726296367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:45:50,536] [INFO] [timer.py:197:stop] 0/3432, RunningAvgSamplesPerSec=6.326595861106073, CurrSamplesPerSec=5.694853131019353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:46:01,869] [INFO] [timer.py:197:stop] 0/3434, RunningAvgSamplesPerSec=6.326585522624159, CurrSamplesPerSec=5.66276471299524, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:46:13,226] [INFO] [timer.py:197:stop] 0/3436, RunningAvgSamplesPerSec=6.326567020712361, CurrSamplesPerSec=5.663969111054076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:46:24,519] [INFO] [timer.py:197:stop] 0/3438, RunningAvgSamplesPerSec=6.32656453557322, CurrSamplesPerSec=5.692237695448678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:46:36,053] [INFO] [logging.py:68:log_dist] [Rank 0] step=1720, skipped=6, lr=[7.304444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 05:46:36,054] [INFO] [timer.py:197:stop] 0/3440, RunningAvgSamplesPerSec=6.326560956325877, CurrSamplesPerSec=5.692520884063204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:46:47,426] [INFO] [timer.py:197:stop] 0/3442, 
RunningAvgSamplesPerSec=6.326536655322881, CurrSamplesPerSec=5.702122727800303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:46:58,723] [INFO] [timer.py:197:stop] 0/3444, RunningAvgSamplesPerSec=6.326541983680875, CurrSamplesPerSec=5.706404554792997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:47:10,125] [INFO] [timer.py:197:stop] 0/3446, RunningAvgSamplesPerSec=6.326509397357117, CurrSamplesPerSec=5.614093981908943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:47:21,400] [INFO] [timer.py:197:stop] 0/3448, RunningAvgSamplesPerSec=6.326512157875123, CurrSamplesPerSec=5.6845109912147365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:47:33,090] [INFO] [timer.py:197:stop] 0/3450, RunningAvgSamplesPerSec=6.326375426060224, CurrSamplesPerSec=5.335345709382433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:47:44,424] [INFO] [timer.py:197:stop] 0/3452, RunningAvgSamplesPerSec=6.326388701570788, CurrSamplesPerSec=5.710946011967234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:47:55,767] [INFO] [timer.py:197:stop] 0/3454, RunningAvgSamplesPerSec=6.326377641060245, CurrSamplesPerSec=5.715889018059057, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:48:07,140] [INFO] [timer.py:197:stop] 0/3456, RunningAvgSamplesPerSec=6.326388647998702, CurrSamplesPerSec=5.697459362147123, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:48:18,418] [INFO] [timer.py:197:stop] 0/3458, RunningAvgSamplesPerSec=6.326398113286027, CurrSamplesPerSec=5.709158092079534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:48:29,716] [INFO] [logging.py:68:log_dist] [Rank 0] step=1730, skipped=6, lr=[7.282222222222222e-06], mom=[[0.9, 0.999]] [2022-12-19 05:48:29,717] [INFO] [timer.py:197:stop] 0/3460, RunningAvgSamplesPerSec=6.326394017704326, CurrSamplesPerSec=5.672259161464296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:48:41,227] [INFO] [timer.py:197:stop] 0/3462, 
RunningAvgSamplesPerSec=6.3264093440592815, CurrSamplesPerSec=5.726206254597532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0043, 'learning_rate': 7.280000000000001e-06, 'epoch': 12.97} [2022-12-19 05:48:52,498] [INFO] [timer.py:197:stop] 0/3464, RunningAvgSamplesPerSec=6.326427099876996, CurrSamplesPerSec=5.717957382560915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:49:03,766] [INFO] [timer.py:197:stop] 0/3466, RunningAvgSamplesPerSec=6.326441977242598, CurrSamplesPerSec=5.725354017530834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:49:15,152] [INFO] [timer.py:197:stop] 0/3468, RunningAvgSamplesPerSec=6.326463967532153, CurrSamplesPerSec=5.725744564162017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:49:26,707] [INFO] [timer.py:197:stop] 0/3470, RunningAvgSamplesPerSec=6.326375323116544, CurrSamplesPerSec=5.426805899088813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:49:37,189] [INFO] [timer.py:197:stop] 0/3472, RunningAvgSamplesPerSec=6.3266802523624825, CurrSamplesPerSec=5.70831869050943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:49:48,504] [INFO] [timer.py:197:stop] 0/3474, RunningAvgSamplesPerSec=6.326685735909001, CurrSamplesPerSec=5.701034023329271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:49:59,797] [INFO] [timer.py:197:stop] 0/3476, RunningAvgSamplesPerSec=6.326692818190302, CurrSamplesPerSec=5.704854189670443, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:50:11,277] [INFO] [timer.py:197:stop] 0/3478, RunningAvgSamplesPerSec=6.326708425575501, CurrSamplesPerSec=5.7136397216300985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:50:22,542] [INFO] [logging.py:68:log_dist] [Rank 0] step=1740, skipped=6, lr=[7.260000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 05:50:22,543] [INFO] [timer.py:197:stop] 0/3480, RunningAvgSamplesPerSec=6.326728545355813, CurrSamplesPerSec=5.726455207478292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 05:50:34,034] [INFO] [timer.py:197:stop] 0/3482, RunningAvgSamplesPerSec=6.3267380612475845, CurrSamplesPerSec=5.717143884315967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:50:45,326] [INFO] [timer.py:197:stop] 0/3484, RunningAvgSamplesPerSec=6.326748736202143, CurrSamplesPerSec=5.705172343583519, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:50:56,592] [INFO] [timer.py:197:stop] 0/3486, RunningAvgSamplesPerSec=6.326761533035156, CurrSamplesPerSec=5.711320013417701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:51:08,012] [INFO] [timer.py:197:stop] 0/3488, RunningAvgSamplesPerSec=6.326762904916457, CurrSamplesPerSec=5.696182902937174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:51:19,284] [INFO] [timer.py:197:stop] 0/3490, RunningAvgSamplesPerSec=6.326780957504711, CurrSamplesPerSec=5.714920854117087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:51:30,541] [INFO] [timer.py:197:stop] 0/3492, RunningAvgSamplesPerSec=6.326787623877617, CurrSamplesPerSec=5.700508832752853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:51:41,798] [INFO] [timer.py:197:stop] 0/3494, RunningAvgSamplesPerSec=6.326803529014475, CurrSamplesPerSec=5.704377025328253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:51:53,078] [INFO] [timer.py:197:stop] 0/3496, RunningAvgSamplesPerSec=6.3268166624136795, CurrSamplesPerSec=5.703417119430692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:52:04,351] [INFO] [timer.py:197:stop] 0/3498, RunningAvgSamplesPerSec=6.326831237990966, CurrSamplesPerSec=5.71983588881477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:52:15,758] [INFO] [logging.py:68:log_dist] [Rank 0] step=1750, skipped=6, lr=[7.237777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 05:52:15,760] [INFO] [timer.py:197:stop] 0/3500, RunningAvgSamplesPerSec=6.326856513302757, CurrSamplesPerSec=5.735738817502428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
05:52:27,246] [INFO] [timer.py:197:stop] 0/3502, RunningAvgSamplesPerSec=6.326870947464406, CurrSamplesPerSec=5.723518014653597, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:52:38,720] [INFO] [timer.py:197:stop] 0/3504, RunningAvgSamplesPerSec=6.326877926856427, CurrSamplesPerSec=5.707395561015184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:52:50,020] [INFO] [timer.py:197:stop] 0/3506, RunningAvgSamplesPerSec=6.326887651085633, CurrSamplesPerSec=5.711364245548224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:53:01,346] [INFO] [timer.py:197:stop] 0/3508, RunningAvgSamplesPerSec=6.3268795423083, CurrSamplesPerSec=5.653505118779112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:53:12,678] [INFO] [timer.py:197:stop] 0/3510, RunningAvgSamplesPerSec=6.326876581092225, CurrSamplesPerSec=5.652179248452941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:53:23,957] [INFO] [timer.py:197:stop] 0/3512, RunningAvgSamplesPerSec=6.3268875997764, CurrSamplesPerSec=5.71528515484695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0027, 'learning_rate': 7.224444444444445e-06, 'epoch': 13.16} [2022-12-19 05:53:35,270] [INFO] [timer.py:197:stop] 0/3514, RunningAvgSamplesPerSec=6.326899494397384, CurrSamplesPerSec=5.7058170050338894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:53:46,531] [INFO] [timer.py:197:stop] 0/3516, RunningAvgSamplesPerSec=6.326911036284156, CurrSamplesPerSec=5.715666540103945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:53:57,787] [INFO] [timer.py:197:stop] 0/3518, RunningAvgSamplesPerSec=6.326930984700249, CurrSamplesPerSec=5.725664447688843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:54:09,126] [INFO] [logging.py:68:log_dist] [Rank 0] step=1760, skipped=6, lr=[7.215555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 05:54:09,128] [INFO] [timer.py:197:stop] 0/3520, RunningAvgSamplesPerSec=6.326945607785155, CurrSamplesPerSec=5.709355291341566, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:54:20,455] [INFO] [timer.py:197:stop] 0/3522, RunningAvgSamplesPerSec=6.326939268179807, CurrSamplesPerSec=5.692396789420292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:54:31,763] [INFO] [timer.py:197:stop] 0/3524, RunningAvgSamplesPerSec=6.326936393832304, CurrSamplesPerSec=5.674683995251484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:54:43,062] [INFO] [timer.py:197:stop] 0/3526, RunningAvgSamplesPerSec=6.326937995617768, CurrSamplesPerSec=5.699700049375305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:54:54,503] [INFO] [timer.py:197:stop] 0/3528, RunningAvgSamplesPerSec=6.3269374676738215, CurrSamplesPerSec=5.712080074997962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:55:05,755] [INFO] [timer.py:197:stop] 0/3530, RunningAvgSamplesPerSec=6.326954431236029, CurrSamplesPerSec=5.735451557796293, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:55:17,197] [INFO] [timer.py:197:stop] 0/3532, RunningAvgSamplesPerSec=6.326959540533124, CurrSamplesPerSec=5.7057088236782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:55:28,533] [INFO] [timer.py:197:stop] 0/3534, RunningAvgSamplesPerSec=6.326960778031071, CurrSamplesPerSec=5.681827833723078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:55:39,835] [INFO] [timer.py:197:stop] 0/3536, RunningAvgSamplesPerSec=6.326963432884696, CurrSamplesPerSec=5.69510588973109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:55:51,070] [INFO] [timer.py:197:stop] 0/3538, RunningAvgSamplesPerSec=6.326967350277723, CurrSamplesPerSec=5.682850744355117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:56:02,426] [INFO] [logging.py:68:log_dist] [Rank 0] step=1770, skipped=6, lr=[7.1933333333333345e-06], mom=[[0.9, 0.999]] [2022-12-19 05:56:02,427] [INFO] [timer.py:197:stop] 0/3540, RunningAvgSamplesPerSec=6.326972003226076, CurrSamplesPerSec=5.706556677607041, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:56:13,748] [INFO] [timer.py:197:stop] 0/3542, RunningAvgSamplesPerSec=6.326972248516126, CurrSamplesPerSec=5.670341818992967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:56:25,116] [INFO] [timer.py:197:stop] 0/3544, RunningAvgSamplesPerSec=6.32697695037334, CurrSamplesPerSec=5.695844237963177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:56:36,529] [INFO] [timer.py:197:stop] 0/3546, RunningAvgSamplesPerSec=6.326992951406054, CurrSamplesPerSec=5.725496650024394, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:56:47,802] [INFO] [timer.py:197:stop] 0/3548, RunningAvgSamplesPerSec=6.327003352085787, CurrSamplesPerSec=5.73460832756815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:56:59,074] [INFO] [timer.py:197:stop] 0/3550, RunningAvgSamplesPerSec=6.327017732800478, CurrSamplesPerSec=5.714079270717539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:57:10,349] [INFO] [timer.py:197:stop] 0/3552, RunningAvgSamplesPerSec=6.3270244751212426, CurrSamplesPerSec=5.702506961705374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:57:21,649] [INFO] [timer.py:197:stop] 0/3554, RunningAvgSamplesPerSec=6.32702222187331, CurrSamplesPerSec=5.687865490849446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:57:33,030] [INFO] [timer.py:197:stop] 0/3556, RunningAvgSamplesPerSec=6.327023162767492, CurrSamplesPerSec=5.699042975978352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:57:44,507] [INFO] [timer.py:197:stop] 0/3558, RunningAvgSamplesPerSec=6.326992685321458, CurrSamplesPerSec=5.601917622870201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:57:55,808] [INFO] [logging.py:68:log_dist] [Rank 0] step=1780, skipped=6, lr=[7.171111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 05:57:55,810] [INFO] [timer.py:197:stop] 0/3560, RunningAvgSamplesPerSec=6.326987922976724, CurrSamplesPerSec=5.672432483513448, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:58:07,144] [INFO] [timer.py:197:stop] 0/3562, RunningAvgSamplesPerSec=6.326985467139746, CurrSamplesPerSec=5.693383660382516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0034, 'learning_rate': 7.1688888888888895e-06, 'epoch': 13.34} [2022-12-19 05:58:18,403] [INFO] [timer.py:197:stop] 0/3564, RunningAvgSamplesPerSec=6.32700287434611, CurrSamplesPerSec=5.712393443566262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:58:29,698] [INFO] [timer.py:197:stop] 0/3566, RunningAvgSamplesPerSec=6.326998814729596, CurrSamplesPerSec=5.689977301176292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:58:40,948] [INFO] [timer.py:197:stop] 0/3568, RunningAvgSamplesPerSec=6.3270074315260425, CurrSamplesPerSec=5.717765678145007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:58:52,416] [INFO] [timer.py:197:stop] 0/3570, RunningAvgSamplesPerSec=6.32700994820065, CurrSamplesPerSec=5.703384158703451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:59:03,730] [INFO] [timer.py:197:stop] 0/3572, RunningAvgSamplesPerSec=6.327020099729775, CurrSamplesPerSec=5.712582842791417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:59:15,243] [INFO] [timer.py:197:stop] 0/3574, RunningAvgSamplesPerSec=6.327015451962599, CurrSamplesPerSec=5.684363171140599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:59:26,616] [INFO] [timer.py:197:stop] 0/3576, RunningAvgSamplesPerSec=6.327017523729868, CurrSamplesPerSec=5.704505279646833, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:59:37,912] [INFO] [timer.py:197:stop] 0/3578, RunningAvgSamplesPerSec=6.327025049026369, CurrSamplesPerSec=5.70005757062169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 05:59:49,190] [INFO] [logging.py:68:log_dist] [Rank 0] step=1790, skipped=6, lr=[7.14888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 05:59:49,192] [INFO] [timer.py:197:stop] 0/3580, 
RunningAvgSamplesPerSec=6.327026677321814, CurrSamplesPerSec=5.697381728193446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:00:00,601] [INFO] [timer.py:197:stop] 0/3582, RunningAvgSamplesPerSec=6.327034385734422, CurrSamplesPerSec=5.704124897545434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:00:11,890] [INFO] [timer.py:197:stop] 0/3584, RunningAvgSamplesPerSec=6.327027866394558, CurrSamplesPerSec=5.680452585087718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:00:23,379] [INFO] [timer.py:197:stop] 0/3586, RunningAvgSamplesPerSec=6.327026510992603, CurrSamplesPerSec=5.675444175932445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:00:34,748] [INFO] [timer.py:197:stop] 0/3588, RunningAvgSamplesPerSec=6.327030221637688, CurrSamplesPerSec=5.703420270108392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:00:46,110] [INFO] [timer.py:197:stop] 0/3590, RunningAvgSamplesPerSec=6.3270273888671476, CurrSamplesPerSec=5.690128308163777, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:00:57,404] [INFO] [timer.py:197:stop] 0/3592, RunningAvgSamplesPerSec=6.327030438559169, CurrSamplesPerSec=5.687068724910236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:01:08,714] [INFO] [timer.py:197:stop] 0/3594, RunningAvgSamplesPerSec=6.32702779340562, CurrSamplesPerSec=5.689826302203567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:01:20,051] [INFO] [timer.py:197:stop] 0/3596, RunningAvgSamplesPerSec=6.327025031613675, CurrSamplesPerSec=5.671888100906719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:01:31,473] [INFO] [timer.py:197:stop] 0/3598, RunningAvgSamplesPerSec=6.327027378912919, CurrSamplesPerSec=5.687081014522598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:01:42,826] [INFO] [logging.py:68:log_dist] [Rank 0] step=1800, skipped=6, lr=[7.126666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 06:01:42,828] [INFO] [timer.py:197:stop] 0/3600, 
RunningAvgSamplesPerSec=6.327034097482888, CurrSamplesPerSec=5.715511497327968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:01:54,087] [INFO] [timer.py:197:stop] 0/3602, RunningAvgSamplesPerSec=6.327037619899448, CurrSamplesPerSec=5.693747394245612, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:02:05,359] [INFO] [timer.py:197:stop] 0/3604, RunningAvgSamplesPerSec=6.327043376417096, CurrSamplesPerSec=5.697803782519358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:02:16,673] [INFO] [timer.py:197:stop] 0/3606, RunningAvgSamplesPerSec=6.327045349833805, CurrSamplesPerSec=5.6940039197741905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:02:27,952] [INFO] [timer.py:197:stop] 0/3608, RunningAvgSamplesPerSec=6.327056064768937, CurrSamplesPerSec=5.721884425527488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:02:39,272] [INFO] [timer.py:197:stop] 0/3610, RunningAvgSamplesPerSec=6.327052000142458, CurrSamplesPerSec=5.677036464050728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:02:50,539] [INFO] [timer.py:197:stop] 0/3612, RunningAvgSamplesPerSec=6.327072923418692, CurrSamplesPerSec=5.733637731341162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0025, 'learning_rate': 7.113333333333334e-06, 'epoch': 13.53} [2022-12-19 06:03:01,777] [INFO] [timer.py:197:stop] 0/3614, RunningAvgSamplesPerSec=6.32709929445537, CurrSamplesPerSec=5.733259087040564, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:03:13,005] [INFO] [timer.py:197:stop] 0/3616, RunningAvgSamplesPerSec=6.327114926466392, CurrSamplesPerSec=5.722547508777962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:03:24,259] [INFO] [timer.py:197:stop] 0/3618, RunningAvgSamplesPerSec=6.327127515299568, CurrSamplesPerSec=5.703545088217769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:03:35,577] [INFO] [logging.py:68:log_dist] [Rank 0] step=1810, skipped=6, lr=[7.104444444444445e-06], mom=[[0.9, 0.999]] 
[2022-12-19 06:03:35,579] [INFO] [timer.py:197:stop] 0/3620, RunningAvgSamplesPerSec=6.327126952452533, CurrSamplesPerSec=5.6900701719369735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:03:46,831] [INFO] [timer.py:197:stop] 0/3622, RunningAvgSamplesPerSec=6.327132931149001, CurrSamplesPerSec=5.710332017053003, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:03:58,252] [INFO] [timer.py:197:stop] 0/3624, RunningAvgSamplesPerSec=6.327143015006508, CurrSamplesPerSec=5.703352167773687, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:04:09,658] [INFO] [timer.py:197:stop] 0/3626, RunningAvgSamplesPerSec=6.32714887193365, CurrSamplesPerSec=5.695042818773983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:04:20,911] [INFO] [timer.py:197:stop] 0/3628, RunningAvgSamplesPerSec=6.3271626754728425, CurrSamplesPerSec=5.722240588008331, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:04:32,201] [INFO] [timer.py:197:stop] 0/3630, RunningAvgSamplesPerSec=6.327175775480349, CurrSamplesPerSec=5.7129743238702595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:04:43,506] [INFO] [timer.py:197:stop] 0/3632, RunningAvgSamplesPerSec=6.327186380031405, CurrSamplesPerSec=5.7184972444914965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:04:54,780] [INFO] [timer.py:197:stop] 0/3634, RunningAvgSamplesPerSec=6.327190743884232, CurrSamplesPerSec=5.695600114745667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:05:06,035] [INFO] [timer.py:197:stop] 0/3636, RunningAvgSamplesPerSec=6.327204142877737, CurrSamplesPerSec=5.707859394442087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:05:17,434] [INFO] [timer.py:197:stop] 0/3638, RunningAvgSamplesPerSec=6.327223213597765, CurrSamplesPerSec=5.725236546447461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:05:28,660] [INFO] [logging.py:68:log_dist] [Rank 0] step=1820, skipped=6, lr=[7.0822222222222226e-06], mom=[[0.9, 0.999]] 
[2022-12-19 06:05:28,661] [INFO] [timer.py:197:stop] 0/3640, RunningAvgSamplesPerSec=6.327235086880635, CurrSamplesPerSec=5.711711807557588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:05:39,986] [INFO] [timer.py:197:stop] 0/3642, RunningAvgSamplesPerSec=6.3272545467580725, CurrSamplesPerSec=5.725891368761624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:05:51,307] [INFO] [timer.py:197:stop] 0/3644, RunningAvgSamplesPerSec=6.327252201226174, CurrSamplesPerSec=5.703571264358926, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:06:02,540] [INFO] [timer.py:197:stop] 0/3646, RunningAvgSamplesPerSec=6.3272701237515765, CurrSamplesPerSec=5.718859808270509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:06:13,875] [INFO] [timer.py:197:stop] 0/3648, RunningAvgSamplesPerSec=6.327261599031, CurrSamplesPerSec=5.71337534365649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:06:25,178] [INFO] [timer.py:197:stop] 0/3650, RunningAvgSamplesPerSec=6.3272671499337525, CurrSamplesPerSec=5.698161549732585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:06:36,551] [INFO] [timer.py:197:stop] 0/3652, RunningAvgSamplesPerSec=6.32728056599775, CurrSamplesPerSec=5.717603945153865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:06:47,829] [INFO] [timer.py:197:stop] 0/3654, RunningAvgSamplesPerSec=6.327294567008897, CurrSamplesPerSec=5.723971046332069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:06:59,083] [INFO] [timer.py:197:stop] 0/3656, RunningAvgSamplesPerSec=6.327308241329662, CurrSamplesPerSec=5.706866043274611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:07:10,448] [INFO] [timer.py:197:stop] 0/3658, RunningAvgSamplesPerSec=6.327312232151331, CurrSamplesPerSec=5.694931179355266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:07:21,683] [INFO] [logging.py:68:log_dist] [Rank 0] step=1830, skipped=6, lr=[7.06e-06], mom=[[0.9, 0.999]] [2022-12-19 06:07:21,684] 
[INFO] [timer.py:197:stop] 0/3660, RunningAvgSamplesPerSec=6.327332034038214, CurrSamplesPerSec=5.725602652062819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:07:32,906] [INFO] [timer.py:197:stop] 0/3662, RunningAvgSamplesPerSec=6.327354706161049, CurrSamplesPerSec=5.714217692815112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0023, 'learning_rate': 7.057777777777778e-06, 'epoch': 13.72} [2022-12-19 06:07:44,193] [INFO] [timer.py:197:stop] 0/3664, RunningAvgSamplesPerSec=6.32736397728622, CurrSamplesPerSec=5.696914276245342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:07:55,548] [INFO] [timer.py:197:stop] 0/3666, RunningAvgSamplesPerSec=6.327358051347948, CurrSamplesPerSec=5.64943211344447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:08:06,826] [INFO] [timer.py:197:stop] 0/3668, RunningAvgSamplesPerSec=6.327370085373034, CurrSamplesPerSec=5.702346817573943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:08:18,353] [INFO] [timer.py:197:stop] 0/3670, RunningAvgSamplesPerSec=6.327366912612304, CurrSamplesPerSec=5.686476717673811, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:08:29,659] [INFO] [timer.py:197:stop] 0/3672, RunningAvgSamplesPerSec=6.327360565044021, CurrSamplesPerSec=5.681007990920392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:08:41,174] [INFO] [timer.py:197:stop] 0/3674, RunningAvgSamplesPerSec=6.3273548787337655, CurrSamplesPerSec=5.654257968027836, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:08:52,448] [INFO] [timer.py:197:stop] 0/3676, RunningAvgSamplesPerSec=6.327379066069905, CurrSamplesPerSec=5.742913209856755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:09:03,724] [INFO] [timer.py:197:stop] 0/3678, RunningAvgSamplesPerSec=6.32739219455583, CurrSamplesPerSec=5.721471235031718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:09:15,226] [INFO] [logging.py:68:log_dist] [Rank 0] step=1840, skipped=6, 
lr=[7.037777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 06:09:15,227] [INFO] [timer.py:197:stop] 0/3680, RunningAvgSamplesPerSec=6.327400346049888, CurrSamplesPerSec=5.69236347310283, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:09:26,618] [INFO] [timer.py:197:stop] 0/3682, RunningAvgSamplesPerSec=6.327411508833263, CurrSamplesPerSec=5.710090780153189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:09:37,918] [INFO] [timer.py:197:stop] 0/3684, RunningAvgSamplesPerSec=6.327421722035337, CurrSamplesPerSec=5.7158030918069676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:09:49,370] [INFO] [timer.py:197:stop] 0/3686, RunningAvgSamplesPerSec=6.327430089881653, CurrSamplesPerSec=5.708108939372282, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:10:00,682] [INFO] [timer.py:197:stop] 0/3688, RunningAvgSamplesPerSec=6.327433937992524, CurrSamplesPerSec=5.685275252122632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:10:12,165] [INFO] [timer.py:197:stop] 0/3690, RunningAvgSamplesPerSec=6.327438991129145, CurrSamplesPerSec=5.697612217734825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:10:23,440] [INFO] [timer.py:197:stop] 0/3692, RunningAvgSamplesPerSec=6.32744592905265, CurrSamplesPerSec=5.711902133909476, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:10:34,854] [INFO] [timer.py:197:stop] 0/3694, RunningAvgSamplesPerSec=6.327454804299633, CurrSamplesPerSec=5.713361967316394, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:10:46,105] [INFO] [timer.py:197:stop] 0/3696, RunningAvgSamplesPerSec=6.327470636512603, CurrSamplesPerSec=5.721469039964127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:10:57,393] [INFO] [timer.py:197:stop] 0/3698, RunningAvgSamplesPerSec=6.327478484677782, CurrSamplesPerSec=5.704937604642237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:11:08,689] [INFO] [logging.py:68:log_dist] [Rank 0] step=1850, skipped=6, 
lr=[7.015555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 06:11:08,690] [INFO] [timer.py:197:stop] 0/3700, RunningAvgSamplesPerSec=6.327482925390094, CurrSamplesPerSec=5.705417530602313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:11:20,111] [INFO] [timer.py:197:stop] 0/3702, RunningAvgSamplesPerSec=6.327495298183676, CurrSamplesPerSec=5.7164728007647305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:11:31,402] [INFO] [timer.py:197:stop] 0/3704, RunningAvgSamplesPerSec=6.327504767388684, CurrSamplesPerSec=5.69536567947304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:11:42,722] [INFO] [timer.py:197:stop] 0/3706, RunningAvgSamplesPerSec=6.327499975413602, CurrSamplesPerSec=5.671738060487139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:11:53,949] [INFO] [timer.py:197:stop] 0/3708, RunningAvgSamplesPerSec=6.327512644227029, CurrSamplesPerSec=5.719230216591281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:12:05,246] [INFO] [timer.py:197:stop] 0/3710, RunningAvgSamplesPerSec=6.327521325349658, CurrSamplesPerSec=5.710453736457971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:12:16,615] [INFO] [timer.py:197:stop] 0/3712, RunningAvgSamplesPerSec=6.327536046112667, CurrSamplesPerSec=5.711085497358781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0029, 'learning_rate': 7.0022222222222225e-06, 'epoch': 13.91} [2022-12-19 06:12:27,872] [INFO] [timer.py:197:stop] 0/3714, RunningAvgSamplesPerSec=6.3275540357392215, CurrSamplesPerSec=5.720300283769543, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:12:39,121] [INFO] [timer.py:197:stop] 0/3716, RunningAvgSamplesPerSec=6.327572650096882, CurrSamplesPerSec=5.72144050423872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:12:50,371] [INFO] [timer.py:197:stop] 0/3718, RunningAvgSamplesPerSec=6.32759062896277, CurrSamplesPerSec=5.7236586030173005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:13:01,799] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=1860, skipped=6, lr=[6.993333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 06:13:01,800] [INFO] [timer.py:197:stop] 0/3720, RunningAvgSamplesPerSec=6.327589406475974, CurrSamplesPerSec=5.667672496391453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:13:13,076] [INFO] [timer.py:197:stop] 0/3722, RunningAvgSamplesPerSec=6.327604010883086, CurrSamplesPerSec=5.715054936693262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:13:24,311] [INFO] [timer.py:197:stop] 0/3724, RunningAvgSamplesPerSec=6.327618611123488, CurrSamplesPerSec=5.709718154148153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:13:35,576] [INFO] [timer.py:197:stop] 0/3726, RunningAvgSamplesPerSec=6.327636812600171, CurrSamplesPerSec=5.72545146601681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:13:46,865] [INFO] [timer.py:197:stop] 0/3728, RunningAvgSamplesPerSec=6.327639401013959, CurrSamplesPerSec=5.697421391382077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:13:58,133] [INFO] [timer.py:197:stop] 0/3730, RunningAvgSamplesPerSec=6.32765221625328, CurrSamplesPerSec=5.704256049697142, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:14:09,415] [INFO] [timer.py:197:stop] 0/3732, RunningAvgSamplesPerSec=6.327662514264426, CurrSamplesPerSec=5.7005826780475815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:14:20,771] [INFO] [timer.py:197:stop] 0/3734, RunningAvgSamplesPerSec=6.327663684947549, CurrSamplesPerSec=5.691885982070722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:14:32,083] [INFO] [timer.py:197:stop] 0/3736, RunningAvgSamplesPerSec=6.327667163733082, CurrSamplesPerSec=5.699140982837788, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:14:42,413] [INFO] [timer.py:197:stop] 0/3738, RunningAvgSamplesPerSec=6.32795106702934, CurrSamplesPerSec=6.693755406475977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:14:53,721] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=1870, skipped=6, lr=[6.9711111111111115e-06], mom=[[0.9, 0.999]] [2022-12-19 06:14:53,722] [INFO] [timer.py:197:stop] 0/3740, RunningAvgSamplesPerSec=6.327960652983164, CurrSamplesPerSec=5.706357973352362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:15:05,163] [INFO] [timer.py:197:stop] 0/3742, RunningAvgSamplesPerSec=6.327974282540678, CurrSamplesPerSec=5.720229583608652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:15:16,470] [INFO] [timer.py:197:stop] 0/3744, RunningAvgSamplesPerSec=6.327991373943682, CurrSamplesPerSec=5.725111509759045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:15:27,736] [INFO] [timer.py:197:stop] 0/3746, RunningAvgSamplesPerSec=6.328003393024402, CurrSamplesPerSec=5.735590772217511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:15:39,168] [INFO] [timer.py:197:stop] 0/3748, RunningAvgSamplesPerSec=6.328003184775546, CurrSamplesPerSec=5.684988449603876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:15:50,431] [INFO] [timer.py:197:stop] 0/3750, RunningAvgSamplesPerSec=6.3280215352892, CurrSamplesPerSec=5.728525848479165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:16:01,708] [INFO] [timer.py:197:stop] 0/3752, RunningAvgSamplesPerSec=6.328029773948391, CurrSamplesPerSec=5.695193128108286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:16:13,000] [INFO] [timer.py:197:stop] 0/3754, RunningAvgSamplesPerSec=6.328025452511106, CurrSamplesPerSec=5.682377734727621, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:16:24,412] [INFO] [timer.py:197:stop] 0/3756, RunningAvgSamplesPerSec=6.328038406251197, CurrSamplesPerSec=5.70034444304904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:16:35,709] [INFO] [timer.py:197:stop] 0/3758, RunningAvgSamplesPerSec=6.3280397124973415, CurrSamplesPerSec=5.696216505726658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:16:47,024] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=1880, skipped=6, lr=[6.948888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 06:16:47,025] [INFO] [timer.py:197:stop] 0/3760, RunningAvgSamplesPerSec=6.328035202039075, CurrSamplesPerSec=5.6684257722875175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:16:58,299] [INFO] [timer.py:197:stop] 0/3762, RunningAvgSamplesPerSec=6.328042853729767, CurrSamplesPerSec=5.704215806436217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:17:09,597] [INFO] [timer.py:197:stop] 0/3764, RunningAvgSamplesPerSec=6.3280395784216905, CurrSamplesPerSec=5.685623017919801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0021, 'learning_rate': 6.944444444444445e-06, 'epoch': 14.1} [2022-12-19 06:17:20,926] [INFO] [timer.py:197:stop] 0/3766, RunningAvgSamplesPerSec=6.328039677874865, CurrSamplesPerSec=5.6995596673782245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:17:32,197] [INFO] [timer.py:197:stop] 0/3768, RunningAvgSamplesPerSec=6.328045418894903, CurrSamplesPerSec=5.689142565323669, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:17:43,503] [INFO] [timer.py:197:stop] 0/3770, RunningAvgSamplesPerSec=6.328044100503587, CurrSamplesPerSec=5.688127031054258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:17:54,936] [INFO] [timer.py:197:stop] 0/3772, RunningAvgSamplesPerSec=6.328048141901991, CurrSamplesPerSec=5.687343687564541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:18:06,242] [INFO] [timer.py:197:stop] 0/3774, RunningAvgSamplesPerSec=6.328048834890933, CurrSamplesPerSec=5.707304065364238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:18:17,553] [INFO] [timer.py:197:stop] 0/3776, RunningAvgSamplesPerSec=6.328050892658502, CurrSamplesPerSec=5.693787973016198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:18:28,852] [INFO] [timer.py:197:stop] 0/3778, RunningAvgSamplesPerSec=6.328054117165537, CurrSamplesPerSec=5.7040116897245605, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:18:40,133] [INFO] [logging.py:68:log_dist] [Rank 0] step=1890, skipped=6, lr=[6.926666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 06:18:40,134] [INFO] [timer.py:197:stop] 0/3780, RunningAvgSamplesPerSec=6.328059122222756, CurrSamplesPerSec=5.670903395864742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:18:51,567] [INFO] [timer.py:197:stop] 0/3782, RunningAvgSamplesPerSec=6.3280540302231545, CurrSamplesPerSec=5.672919663893051, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:19:02,898] [INFO] [timer.py:197:stop] 0/3784, RunningAvgSamplesPerSec=6.328048511234829, CurrSamplesPerSec=5.69206605775259, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:19:14,384] [INFO] [timer.py:197:stop] 0/3786, RunningAvgSamplesPerSec=6.328049400443636, CurrSamplesPerSec=5.70094345797594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:19:26,027] [INFO] [timer.py:197:stop] 0/3788, RunningAvgSamplesPerSec=6.328010986782262, CurrSamplesPerSec=5.591690528894056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:19:37,321] [INFO] [timer.py:197:stop] 0/3790, RunningAvgSamplesPerSec=6.3280127835213715, CurrSamplesPerSec=5.686066697818695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:19:48,692] [INFO] [timer.py:197:stop] 0/3792, RunningAvgSamplesPerSec=6.328018650166986, CurrSamplesPerSec=5.712006660801071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:19:59,958] [INFO] [timer.py:197:stop] 0/3794, RunningAvgSamplesPerSec=6.328028634669163, CurrSamplesPerSec=5.704542132615976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:20:11,285] [INFO] [timer.py:197:stop] 0/3796, RunningAvgSamplesPerSec=6.328025623488509, CurrSamplesPerSec=5.699830998126863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:20:22,551] [INFO] [timer.py:197:stop] 0/3798, RunningAvgSamplesPerSec=6.328037792626543, CurrSamplesPerSec=5.718709952747624, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:20:33,892] [INFO] [logging.py:68:log_dist] [Rank 0] step=1900, skipped=6, lr=[6.904444444444444e-06], mom=[[0.9, 0.999]] [2022-12-19 06:20:33,893] [INFO] [timer.py:197:stop] 0/3800, RunningAvgSamplesPerSec=6.328017825343303, CurrSamplesPerSec=5.691623372011047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:20:45,179] [INFO] [timer.py:197:stop] 0/3802, RunningAvgSamplesPerSec=6.328020657901236, CurrSamplesPerSec=5.688602686081425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:20:56,461] [INFO] [timer.py:197:stop] 0/3804, RunningAvgSamplesPerSec=6.32802445150907, CurrSamplesPerSec=5.6924252775952535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:21:07,746] [INFO] [timer.py:197:stop] 0/3806, RunningAvgSamplesPerSec=6.328029424966494, CurrSamplesPerSec=5.710552136177062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:21:19,075] [INFO] [timer.py:197:stop] 0/3808, RunningAvgSamplesPerSec=6.328028731043501, CurrSamplesPerSec=5.706189849791001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:21:30,356] [INFO] [timer.py:197:stop] 0/3810, RunningAvgSamplesPerSec=6.328034756229449, CurrSamplesPerSec=5.695326286573795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:21:41,641] [INFO] [timer.py:197:stop] 0/3812, RunningAvgSamplesPerSec=6.328039296654855, CurrSamplesPerSec=5.69041442325283, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:21:52,940] [INFO] [timer.py:197:stop] 0/3814, RunningAvgSamplesPerSec=6.328044696385231, CurrSamplesPerSec=5.698667435219974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0018, 'learning_rate': 6.88888888888889e-06, 'epoch': 14.28} [2022-12-19 06:22:04,460] [INFO] [timer.py:197:stop] 0/3816, RunningAvgSamplesPerSec=6.328045229444641, CurrSamplesPerSec=5.701956791588592, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:22:15,776] [INFO] [timer.py:197:stop] 0/3818, 
RunningAvgSamplesPerSec=6.328049435214314, CurrSamplesPerSec=5.70020548182862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:22:27,089] [INFO] [logging.py:68:log_dist] [Rank 0] step=1910, skipped=6, lr=[6.882222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 06:22:27,091] [INFO] [timer.py:197:stop] 0/3820, RunningAvgSamplesPerSec=6.32804754113878, CurrSamplesPerSec=5.676681104482095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:22:38,453] [INFO] [timer.py:197:stop] 0/3822, RunningAvgSamplesPerSec=6.328046683674132, CurrSamplesPerSec=5.6892711001372325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:22:49,759] [INFO] [timer.py:197:stop] 0/3824, RunningAvgSamplesPerSec=6.3280476315846315, CurrSamplesPerSec=5.703661428462032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:23:01,133] [INFO] [timer.py:197:stop] 0/3826, RunningAvgSamplesPerSec=6.328036089151419, CurrSamplesPerSec=5.663388116211383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:23:12,389] [INFO] [timer.py:197:stop] 0/3828, RunningAvgSamplesPerSec=6.328046889792171, CurrSamplesPerSec=5.724752303876463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:23:23,690] [INFO] [timer.py:197:stop] 0/3830, RunningAvgSamplesPerSec=6.328047719823536, CurrSamplesPerSec=5.704279080734468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:23:34,979] [INFO] [timer.py:197:stop] 0/3832, RunningAvgSamplesPerSec=6.32804980409448, CurrSamplesPerSec=5.700216133668467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:23:46,289] [INFO] [timer.py:197:stop] 0/3834, RunningAvgSamplesPerSec=6.328050852934382, CurrSamplesPerSec=5.697443399853023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:23:57,576] [INFO] [timer.py:197:stop] 0/3836, RunningAvgSamplesPerSec=6.328058943885285, CurrSamplesPerSec=5.699454143395377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:24:08,979] [INFO] [timer.py:197:stop] 0/3838, 
RunningAvgSamplesPerSec=6.328061294694311, CurrSamplesPerSec=5.703052876081424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:24:20,266] [INFO] [logging.py:68:log_dist] [Rank 0] step=1920, skipped=6, lr=[6.860000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 06:24:20,268] [INFO] [timer.py:197:stop] 0/3840, RunningAvgSamplesPerSec=6.328055244767827, CurrSamplesPerSec=5.684246653894753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:24:31,665] [INFO] [timer.py:197:stop] 0/3842, RunningAvgSamplesPerSec=6.32803263012053, CurrSamplesPerSec=5.6086090723273765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:24:42,942] [INFO] [timer.py:197:stop] 0/3844, RunningAvgSamplesPerSec=6.328035805695097, CurrSamplesPerSec=5.7002616465235185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:24:54,248] [INFO] [timer.py:197:stop] 0/3846, RunningAvgSamplesPerSec=6.328037903933775, CurrSamplesPerSec=5.695735467608192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:25:05,534] [INFO] [timer.py:197:stop] 0/3848, RunningAvgSamplesPerSec=6.3280435269204975, CurrSamplesPerSec=5.694111657726423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:25:17,059] [INFO] [timer.py:197:stop] 0/3850, RunningAvgSamplesPerSec=6.328045186638439, CurrSamplesPerSec=5.675465774949722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:25:28,304] [INFO] [timer.py:197:stop] 0/3852, RunningAvgSamplesPerSec=6.328059332525189, CurrSamplesPerSec=5.714413295402361, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:25:39,580] [INFO] [timer.py:197:stop] 0/3854, RunningAvgSamplesPerSec=6.328066059218054, CurrSamplesPerSec=5.685661313291384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:25:51,033] [INFO] [timer.py:197:stop] 0/3856, RunningAvgSamplesPerSec=6.328061910532867, CurrSamplesPerSec=5.679223624886981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:26:02,317] [INFO] [timer.py:197:stop] 0/3858, 
RunningAvgSamplesPerSec=6.328068860342016, CurrSamplesPerSec=5.693829277001678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:26:13,719] [INFO] [logging.py:68:log_dist] [Rank 0] step=1930, skipped=6, lr=[6.837777777777779e-06], mom=[[0.9, 0.999]] [2022-12-19 06:26:13,721] [INFO] [timer.py:197:stop] 0/3860, RunningAvgSamplesPerSec=6.328054229594348, CurrSamplesPerSec=5.699443978447397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:26:24,982] [INFO] [timer.py:197:stop] 0/3862, RunningAvgSamplesPerSec=6.328064384396151, CurrSamplesPerSec=5.703954481485739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:26:36,509] [INFO] [timer.py:197:stop] 0/3864, RunningAvgSamplesPerSec=6.3280680577692605, CurrSamplesPerSec=5.699506662870895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0019, 'learning_rate': 6.833333333333334e-06, 'epoch': 14.47} [2022-12-19 06:26:47,745] [INFO] [timer.py:197:stop] 0/3866, RunningAvgSamplesPerSec=6.3280922323477045, CurrSamplesPerSec=5.741382978072653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:26:59,026] [INFO] [timer.py:197:stop] 0/3868, RunningAvgSamplesPerSec=6.32809843299649, CurrSamplesPerSec=5.693683387241547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:27:10,238] [INFO] [timer.py:197:stop] 0/3870, RunningAvgSamplesPerSec=6.32811427606524, CurrSamplesPerSec=5.718555719410919, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:27:21,532] [INFO] [timer.py:197:stop] 0/3872, RunningAvgSamplesPerSec=6.328121853998158, CurrSamplesPerSec=5.700892122743279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:27:32,928] [INFO] [timer.py:197:stop] 0/3874, RunningAvgSamplesPerSec=6.328124963747474, CurrSamplesPerSec=5.707206748276031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:27:44,379] [INFO] [timer.py:197:stop] 0/3876, RunningAvgSamplesPerSec=6.328129416837169, CurrSamplesPerSec=5.710579834487389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 06:27:55,699] [INFO] [timer.py:197:stop] 0/3878, RunningAvgSamplesPerSec=6.328135828775522, CurrSamplesPerSec=5.702588369734863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:28:06,969] [INFO] [logging.py:68:log_dist] [Rank 0] step=1940, skipped=6, lr=[6.8155555555555565e-06], mom=[[0.9, 0.999]] [2022-12-19 06:28:06,971] [INFO] [timer.py:197:stop] 0/3880, RunningAvgSamplesPerSec=6.328149004799895, CurrSamplesPerSec=5.728929054579239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:28:18,283] [INFO] [timer.py:197:stop] 0/3882, RunningAvgSamplesPerSec=6.328152235121767, CurrSamplesPerSec=5.691333273996371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:28:29,551] [INFO] [timer.py:197:stop] 0/3884, RunningAvgSamplesPerSec=6.328167915745979, CurrSamplesPerSec=5.7168226893297245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:28:40,881] [INFO] [timer.py:197:stop] 0/3886, RunningAvgSamplesPerSec=6.328157018143391, CurrSamplesPerSec=5.670074486150835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:28:52,154] [INFO] [timer.py:197:stop] 0/3888, RunningAvgSamplesPerSec=6.328158984179298, CurrSamplesPerSec=5.69601126897371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:29:03,534] [INFO] [timer.py:197:stop] 0/3890, RunningAvgSamplesPerSec=6.328135087629385, CurrSamplesPerSec=5.685565696252131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:29:14,826] [INFO] [timer.py:197:stop] 0/3892, RunningAvgSamplesPerSec=6.328138604641437, CurrSamplesPerSec=5.69113104345741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:29:26,132] [INFO] [timer.py:197:stop] 0/3894, RunningAvgSamplesPerSec=6.328133998898769, CurrSamplesPerSec=5.691229019704189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:29:37,490] [INFO] [timer.py:197:stop] 0/3896, RunningAvgSamplesPerSec=6.328110208199487, CurrSamplesPerSec=5.624326196662492, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
06:29:48,902] [INFO] [timer.py:197:stop] 0/3898, RunningAvgSamplesPerSec=6.328100254659082, CurrSamplesPerSec=5.6711906959799485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:30:00,217] [INFO] [logging.py:68:log_dist] [Rank 0] step=1950, skipped=6, lr=[6.793333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 06:30:00,219] [INFO] [timer.py:197:stop] 0/3900, RunningAvgSamplesPerSec=6.328093779128293, CurrSamplesPerSec=5.668160535057735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:30:11,517] [INFO] [timer.py:197:stop] 0/3902, RunningAvgSamplesPerSec=6.328096134875486, CurrSamplesPerSec=5.682489363674891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:30:22,836] [INFO] [timer.py:197:stop] 0/3904, RunningAvgSamplesPerSec=6.328108791261725, CurrSamplesPerSec=5.717539400656093, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:30:34,205] [INFO] [timer.py:197:stop] 0/3906, RunningAvgSamplesPerSec=6.328112343800276, CurrSamplesPerSec=5.688748556702508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:30:45,597] [INFO] [timer.py:197:stop] 0/3908, RunningAvgSamplesPerSec=6.328106833637811, CurrSamplesPerSec=5.674268957291673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:30:56,828] [INFO] [timer.py:197:stop] 0/3910, RunningAvgSamplesPerSec=6.328122234679754, CurrSamplesPerSec=5.720263470709066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:31:08,138] [INFO] [timer.py:197:stop] 0/3912, RunningAvgSamplesPerSec=6.3281231858148645, CurrSamplesPerSec=5.689516610537094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:31:19,410] [INFO] [timer.py:197:stop] 0/3914, RunningAvgSamplesPerSec=6.328139419701545, CurrSamplesPerSec=5.713426903953421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0014, 'learning_rate': 6.777777777777779e-06, 'epoch': 14.66} [2022-12-19 06:31:30,746] [INFO] [timer.py:197:stop] 0/3916, RunningAvgSamplesPerSec=6.328137007004338, 
CurrSamplesPerSec=5.681312668728996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:31:42,363] [INFO] [timer.py:197:stop] 0/3918, RunningAvgSamplesPerSec=6.328138471959391, CurrSamplesPerSec=5.719615540895709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:31:53,621] [INFO] [logging.py:68:log_dist] [Rank 0] step=1960, skipped=6, lr=[6.771111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 06:31:53,623] [INFO] [timer.py:197:stop] 0/3920, RunningAvgSamplesPerSec=6.328146860710623, CurrSamplesPerSec=5.7090964093931245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:32:04,948] [INFO] [timer.py:197:stop] 0/3922, RunningAvgSamplesPerSec=6.328138143956739, CurrSamplesPerSec=5.654366112795939, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:32:16,437] [INFO] [timer.py:197:stop] 0/3924, RunningAvgSamplesPerSec=6.328149361495485, CurrSamplesPerSec=5.711961203287425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:32:27,705] [INFO] [timer.py:197:stop] 0/3926, RunningAvgSamplesPerSec=6.328165296591213, CurrSamplesPerSec=5.748615578275245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:32:39,029] [INFO] [timer.py:197:stop] 0/3928, RunningAvgSamplesPerSec=6.328156002570422, CurrSamplesPerSec=5.71855084645529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:32:50,462] [INFO] [timer.py:197:stop] 0/3930, RunningAvgSamplesPerSec=6.328164432182464, CurrSamplesPerSec=5.703972904352631, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:33:02,105] [INFO] [timer.py:197:stop] 0/3932, RunningAvgSamplesPerSec=6.3280585869555335, CurrSamplesPerSec=5.362219118955945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:33:13,409] [INFO] [timer.py:197:stop] 0/3934, RunningAvgSamplesPerSec=6.328060952559257, CurrSamplesPerSec=5.711802472296523, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:33:24,767] [INFO] [timer.py:197:stop] 0/3936, RunningAvgSamplesPerSec=6.328054752239965, 
CurrSamplesPerSec=5.642867142998527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:33:36,327] [INFO] [timer.py:197:stop] 0/3938, RunningAvgSamplesPerSec=6.32807389396822, CurrSamplesPerSec=5.724368240315964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:33:47,629] [INFO] [logging.py:68:log_dist] [Rank 0] step=1970, skipped=6, lr=[6.748888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 06:33:47,631] [INFO] [timer.py:197:stop] 0/3940, RunningAvgSamplesPerSec=6.3280784093486115, CurrSamplesPerSec=5.70558003014886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:33:58,990] [INFO] [timer.py:197:stop] 0/3942, RunningAvgSamplesPerSec=6.32808509398429, CurrSamplesPerSec=5.728011714669344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:34:10,301] [INFO] [timer.py:197:stop] 0/3944, RunningAvgSamplesPerSec=6.32809247354169, CurrSamplesPerSec=5.689206229120785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:34:21,918] [INFO] [timer.py:197:stop] 0/3946, RunningAvgSamplesPerSec=6.327996416038017, CurrSamplesPerSec=5.392654013764237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:34:33,243] [INFO] [timer.py:197:stop] 0/3948, RunningAvgSamplesPerSec=6.32799392038622, CurrSamplesPerSec=5.682347181879382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:34:44,525] [INFO] [timer.py:197:stop] 0/3950, RunningAvgSamplesPerSec=6.327998169252486, CurrSamplesPerSec=5.703867701687004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:34:55,994] [INFO] [timer.py:197:stop] 0/3952, RunningAvgSamplesPerSec=6.328000976841346, CurrSamplesPerSec=5.696323843911464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:35:07,232] [INFO] [timer.py:197:stop] 0/3954, RunningAvgSamplesPerSec=6.328021219689916, CurrSamplesPerSec=5.726922632838309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:35:18,700] [INFO] [timer.py:197:stop] 0/3956, RunningAvgSamplesPerSec=6.32801503755657, 
CurrSamplesPerSec=5.710078876745253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:35:29,964] [INFO] [timer.py:197:stop] 0/3958, RunningAvgSamplesPerSec=6.328023773881425, CurrSamplesPerSec=5.694403488621522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:35:41,263] [INFO] [logging.py:68:log_dist] [Rank 0] step=1980, skipped=6, lr=[6.726666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 06:35:41,264] [INFO] [timer.py:197:stop] 0/3960, RunningAvgSamplesPerSec=6.328020032117366, CurrSamplesPerSec=5.66846096356852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:35:52,547] [INFO] [timer.py:197:stop] 0/3962, RunningAvgSamplesPerSec=6.328029223922242, CurrSamplesPerSec=5.698449199091527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:36:03,916] [INFO] [timer.py:197:stop] 0/3964, RunningAvgSamplesPerSec=6.328040616324467, CurrSamplesPerSec=5.701059934267221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 6.7222222222222235e-06, 'epoch': 14.85} [2022-12-19 06:36:15,234] [INFO] [timer.py:197:stop] 0/3966, RunningAvgSamplesPerSec=6.3280388554730775, CurrSamplesPerSec=5.694168668695933, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:36:26,541] [INFO] [timer.py:197:stop] 0/3968, RunningAvgSamplesPerSec=6.3280470145310534, CurrSamplesPerSec=5.698941826754501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:36:37,939] [INFO] [timer.py:197:stop] 0/3970, RunningAvgSamplesPerSec=6.32801968561978, CurrSamplesPerSec=5.61446151144515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:36:49,228] [INFO] [timer.py:197:stop] 0/3972, RunningAvgSamplesPerSec=6.3280297146269415, CurrSamplesPerSec=5.697912389993767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:37:00,560] [INFO] [timer.py:197:stop] 0/3974, RunningAvgSamplesPerSec=6.32802357469323, CurrSamplesPerSec=5.662964693631673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:37:11,802] [INFO] 
[timer.py:197:stop] 0/3976, RunningAvgSamplesPerSec=6.328043222921009, CurrSamplesPerSec=5.725830789578641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:37:23,082] [INFO] [timer.py:197:stop] 0/3978, RunningAvgSamplesPerSec=6.328053809503274, CurrSamplesPerSec=5.704827759272596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:37:34,616] [INFO] [logging.py:68:log_dist] [Rank 0] step=1990, skipped=6, lr=[6.7044444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 06:37:34,618] [INFO] [timer.py:197:stop] 0/3980, RunningAvgSamplesPerSec=6.328043078960347, CurrSamplesPerSec=5.718745283830027, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:37:45,903] [INFO] [timer.py:197:stop] 0/3982, RunningAvgSamplesPerSec=6.328058876006719, CurrSamplesPerSec=5.698116070407015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:37:57,227] [INFO] [timer.py:197:stop] 0/3984, RunningAvgSamplesPerSec=6.328048112247278, CurrSamplesPerSec=5.648955615763842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:38:08,587] [INFO] [timer.py:197:stop] 0/3986, RunningAvgSamplesPerSec=6.328067267934803, CurrSamplesPerSec=5.708456105282691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:38:19,937] [INFO] [timer.py:197:stop] 0/3988, RunningAvgSamplesPerSec=6.3280845232854634, CurrSamplesPerSec=5.725340096589343, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:38:31,495] [INFO] [timer.py:197:stop] 0/3990, RunningAvgSamplesPerSec=6.328010071073541, CurrSamplesPerSec=5.713861555329056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:38:42,765] [INFO] [timer.py:197:stop] 0/3992, RunningAvgSamplesPerSec=6.328019691569245, CurrSamplesPerSec=5.714810867300033, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:38:54,017] [INFO] [timer.py:197:stop] 0/3994, RunningAvgSamplesPerSec=6.328035662592456, CurrSamplesPerSec=5.73450493192008, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:39:05,293] [INFO] 
[timer.py:197:stop] 0/3996, RunningAvgSamplesPerSec=6.328045888122101, CurrSamplesPerSec=5.720399023125814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:39:16,962] [INFO] [timer.py:197:stop] 0/3998, RunningAvgSamplesPerSec=6.328054101585978, CurrSamplesPerSec=5.709639942870304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:39:28,198] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=6, lr=[6.682222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 06:39:28,200] [INFO] [timer.py:197:stop] 0/4000, RunningAvgSamplesPerSec=6.328069759588666, CurrSamplesPerSec=5.732902531147931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:39:39,693] [INFO] [timer.py:197:stop] 0/4002, RunningAvgSamplesPerSec=6.32801316659541, CurrSamplesPerSec=5.4899484835161525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:39:50,936] [INFO] [timer.py:197:stop] 0/4004, RunningAvgSamplesPerSec=6.328036684438189, CurrSamplesPerSec=5.74323660609793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:40:01,376] [INFO] [timer.py:197:stop] 0/4006, RunningAvgSamplesPerSec=6.328278786204708, CurrSamplesPerSec=5.607960647845322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:40:12,614] [INFO] [timer.py:197:stop] 0/4008, RunningAvgSamplesPerSec=6.3282967457058215, CurrSamplesPerSec=5.74400149957696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:40:23,854] [INFO] [timer.py:197:stop] 0/4010, RunningAvgSamplesPerSec=6.328312653832033, CurrSamplesPerSec=5.741758520529722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:40:35,171] [INFO] [timer.py:197:stop] 0/4012, RunningAvgSamplesPerSec=6.328309339925879, CurrSamplesPerSec=5.700726258145929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 06:40:46,428] [INFO] [timer.py:197:stop] 0/4014, RunningAvgSamplesPerSec=6.328319366917588, CurrSamplesPerSec=5.701242770477087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0013, 'learning_rate': 
6.666666666666667e-06, 'epoch': 15.04}
{'eval_loss': 0.28955078125, 'eval_wer': 15.78416839608657, 'eval_runtime': 1408.4786, 'eval_samples_per_second': 3.288, 'eval_steps_per_second': 0.411, 'epoch': 15.04}
[2022-12-19 07:04:23,533] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step2007 is begin to save!
[2022-12-19 07:04:23,543] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-2000/global_step2007/mp_rank_00_model_states.pt
[2022-12-19 07:04:23,543] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-2000/global_step2007/mp_rank_00_model_states.pt...
[2022-12-19 07:04:27,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-2000/global_step2007/mp_rank_00_model_states.pt.
[2022-12-19 07:04:27,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-2000/global_step2007/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2022-12-19 07:04:44,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-2000/global_step2007/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2022-12-19 07:04:44,872] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-2000/global_step2007/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-12-19 07:04:44,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2007 is ready now!
[2022-12-19 07:06:39,097] [INFO] [timer.py:197:stop] 0/4016, RunningAvgSamplesPerSec=6.328229092206162, CurrSamplesPerSec=5.429432545116488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:06:50,626] [INFO] [timer.py:197:stop] 0/4018, RunningAvgSamplesPerSec=6.3281533768850435, CurrSamplesPerSec=5.697684778637676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:07:02,160] [INFO] [logging.py:68:log_dist] [Rank 0] step=2010, skipped=6, lr=[6.660000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 07:07:02,162] [INFO] [timer.py:197:stop] 0/4020, RunningAvgSamplesPerSec=6.328170750555657, CurrSamplesPerSec=5.720416333339442, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:07:13,572] [INFO] [timer.py:197:stop] 0/4022, RunningAvgSamplesPerSec=6.328142496378482, CurrSamplesPerSec=5.576822529348307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:07:24,876] [INFO] [timer.py:197:stop] 0/4024, RunningAvgSamplesPerSec=6.328149335760625, CurrSamplesPerSec=5.693257596143959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:07:36,259] [INFO] [timer.py:197:stop] 0/4026, RunningAvgSamplesPerSec=6.328128393911199, CurrSamplesPerSec=5.607739463239751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:07:47,870] [INFO] [timer.py:197:stop] 0/4028, RunningAvgSamplesPerSec=6.328129252110263, CurrSamplesPerSec=5.683695428380117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:07:59,172] [INFO] [timer.py:197:stop] 0/4030, RunningAvgSamplesPerSec=6.3281327700345855, CurrSamplesPerSec=5.696055747768509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:08:10,633] [INFO] [timer.py:197:stop] 0/4032, RunningAvgSamplesPerSec=6.328118058824584, CurrSamplesPerSec=5.675433376485455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:08:21,928] [INFO] [timer.py:197:stop] 0/4034, RunningAvgSamplesPerSec=6.3281140298409255, CurrSamplesPerSec=5.673286782738392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 07:08:33,488] [INFO] [timer.py:197:stop] 0/4036, RunningAvgSamplesPerSec=6.328049340035576, CurrSamplesPerSec=5.50942784062567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:08:45,000] [INFO] [timer.py:197:stop] 0/4038, RunningAvgSamplesPerSec=6.328039365217708, CurrSamplesPerSec=5.6542503456451785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:08:56,304] [INFO] [logging.py:68:log_dist] [Rank 0] step=2020, skipped=6, lr=[6.637777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 07:08:56,306] [INFO] [timer.py:197:stop] 0/4040, RunningAvgSamplesPerSec=6.328037771654413, CurrSamplesPerSec=5.711119762234341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:09:07,634] [INFO] [timer.py:197:stop] 0/4042, RunningAvgSamplesPerSec=6.328034407064329, CurrSamplesPerSec=5.697656963406474, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:09:19,151] [INFO] [timer.py:197:stop] 0/4044, RunningAvgSamplesPerSec=6.328022220110683, CurrSamplesPerSec=5.6651362407710995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:09:30,684] [INFO] [timer.py:197:stop] 0/4046, RunningAvgSamplesPerSec=6.32795485502997, CurrSamplesPerSec=5.487770022317799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:09:41,960] [INFO] [timer.py:197:stop] 0/4048, RunningAvgSamplesPerSec=6.327957292928799, CurrSamplesPerSec=5.710636689755667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:09:53,240] [INFO] [timer.py:197:stop] 0/4050, RunningAvgSamplesPerSec=6.327965628682803, CurrSamplesPerSec=5.710540959741962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:10:04,935] [INFO] [timer.py:197:stop] 0/4052, RunningAvgSamplesPerSec=6.327968033594549, CurrSamplesPerSec=5.70881351178491, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:10:16,241] [INFO] [timer.py:197:stop] 0/4054, RunningAvgSamplesPerSec=6.327976467763057, CurrSamplesPerSec=5.697635920759834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
07:10:27,917] [INFO] [timer.py:197:stop] 0/4056, RunningAvgSamplesPerSec=6.327953282673122, CurrSamplesPerSec=5.706588219237548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:10:39,203] [INFO] [timer.py:197:stop] 0/4058, RunningAvgSamplesPerSec=6.3279651799675, CurrSamplesPerSec=5.709838875761256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:10:50,682] [INFO] [logging.py:68:log_dist] [Rank 0] step=2030, skipped=6, lr=[6.615555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 07:10:50,683] [INFO] [timer.py:197:stop] 0/4060, RunningAvgSamplesPerSec=6.327907253582283, CurrSamplesPerSec=5.510205465746691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:11:01,989] [INFO] [timer.py:197:stop] 0/4062, RunningAvgSamplesPerSec=6.327910333514097, CurrSamplesPerSec=5.679985983883158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:11:13,467] [INFO] [timer.py:197:stop] 0/4064, RunningAvgSamplesPerSec=6.32791655370848, CurrSamplesPerSec=5.708318933285881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.001, 'learning_rate': 6.6111111111111115e-06, 'epoch': 15.22} [2022-12-19 07:11:25,000] [INFO] [timer.py:197:stop] 0/4066, RunningAvgSamplesPerSec=6.327845647743605, CurrSamplesPerSec=5.698768090774225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:11:36,249] [INFO] [timer.py:197:stop] 0/4068, RunningAvgSamplesPerSec=6.327856668691016, CurrSamplesPerSec=5.6999566274456415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:11:47,759] [INFO] [timer.py:197:stop] 0/4070, RunningAvgSamplesPerSec=6.327866082657655, CurrSamplesPerSec=5.697646562998478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:11:59,037] [INFO] [timer.py:197:stop] 0/4072, RunningAvgSamplesPerSec=6.327868050480991, CurrSamplesPerSec=5.694439727989379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:12:10,320] [INFO] [timer.py:197:stop] 0/4074, RunningAvgSamplesPerSec=6.327869079531179, 
CurrSamplesPerSec=5.674766770198455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:12:21,831] [INFO] [timer.py:197:stop] 0/4076, RunningAvgSamplesPerSec=6.327872401842557, CurrSamplesPerSec=5.701154862332323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:12:33,133] [INFO] [timer.py:197:stop] 0/4078, RunningAvgSamplesPerSec=6.32788128667322, CurrSamplesPerSec=5.713051896987103, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:12:44,398] [INFO] [logging.py:68:log_dist] [Rank 0] step=2040, skipped=6, lr=[6.5933333333333335e-06], mom=[[0.9, 0.999]] [2022-12-19 07:12:44,400] [INFO] [timer.py:197:stop] 0/4080, RunningAvgSamplesPerSec=6.32788881605187, CurrSamplesPerSec=5.715549222817786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:12:55,885] [INFO] [timer.py:197:stop] 0/4082, RunningAvgSamplesPerSec=6.327897653972286, CurrSamplesPerSec=5.701269652021266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:13:07,471] [INFO] [timer.py:197:stop] 0/4084, RunningAvgSamplesPerSec=6.327808955563769, CurrSamplesPerSec=5.398714093187115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:13:18,748] [INFO] [timer.py:197:stop] 0/4086, RunningAvgSamplesPerSec=6.327819538677858, CurrSamplesPerSec=5.725805141520477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:13:30,166] [INFO] [timer.py:197:stop] 0/4088, RunningAvgSamplesPerSec=6.327816069526667, CurrSamplesPerSec=5.694456398253484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:13:41,520] [INFO] [timer.py:197:stop] 0/4090, RunningAvgSamplesPerSec=6.327798796030577, CurrSamplesPerSec=5.7176141749816365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:13:53,010] [INFO] [timer.py:197:stop] 0/4092, RunningAvgSamplesPerSec=6.327809816413052, CurrSamplesPerSec=5.717624891983325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:14:04,683] [INFO] [timer.py:197:stop] 0/4094, RunningAvgSamplesPerSec=6.327698944060678, 
CurrSamplesPerSec=5.361100860103376, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:14:16,000] [INFO] [timer.py:197:stop] 0/4096, RunningAvgSamplesPerSec=6.327701972236272, CurrSamplesPerSec=5.694639294372098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:14:27,490] [INFO] [timer.py:197:stop] 0/4098, RunningAvgSamplesPerSec=6.327713612337126, CurrSamplesPerSec=5.701094078947005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:14:39,080] [INFO] [logging.py:68:log_dist] [Rank 0] step=2050, skipped=6, lr=[6.571111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 07:14:39,082] [INFO] [timer.py:197:stop] 0/4100, RunningAvgSamplesPerSec=6.327630945096014, CurrSamplesPerSec=5.696520157542729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:14:50,620] [INFO] [timer.py:197:stop] 0/4102, RunningAvgSamplesPerSec=6.327607555041236, CurrSamplesPerSec=5.612557689430374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:15:01,934] [INFO] [timer.py:197:stop] 0/4104, RunningAvgSamplesPerSec=6.327599684061117, CurrSamplesPerSec=5.652306118449108, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:15:13,226] [INFO] [timer.py:197:stop] 0/4106, RunningAvgSamplesPerSec=6.327601331435723, CurrSamplesPerSec=5.710617251929748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:15:24,697] [INFO] [timer.py:197:stop] 0/4108, RunningAvgSamplesPerSec=6.327602880433422, CurrSamplesPerSec=5.720325151138135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:15:36,175] [INFO] [timer.py:197:stop] 0/4110, RunningAvgSamplesPerSec=6.327601545756043, CurrSamplesPerSec=5.679400256690218, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:15:47,500] [INFO] [timer.py:197:stop] 0/4112, RunningAvgSamplesPerSec=6.32760505502425, CurrSamplesPerSec=5.69963082566423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:15:59,107] [INFO] [timer.py:197:stop] 0/4114, RunningAvgSamplesPerSec=6.327516669575936, 
CurrSamplesPerSec=5.4126140803743645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0013, 'learning_rate': 6.555555555555556e-06, 'epoch': 15.41} [2022-12-19 07:16:10,370] [INFO] [timer.py:197:stop] 0/4116, RunningAvgSamplesPerSec=6.327527950944387, CurrSamplesPerSec=5.718632956866624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:16:21,937] [INFO] [timer.py:197:stop] 0/4118, RunningAvgSamplesPerSec=6.3275397198683265, CurrSamplesPerSec=5.708851634549047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:16:33,714] [INFO] [logging.py:68:log_dist] [Rank 0] step=2060, skipped=6, lr=[6.548888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 07:16:33,716] [INFO] [timer.py:197:stop] 0/4120, RunningAvgSamplesPerSec=6.327458733013646, CurrSamplesPerSec=5.703523517319062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:16:45,013] [INFO] [timer.py:197:stop] 0/4122, RunningAvgSamplesPerSec=6.32746783468485, CurrSamplesPerSec=5.70978519409174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:16:56,298] [INFO] [timer.py:197:stop] 0/4124, RunningAvgSamplesPerSec=6.327473454276054, CurrSamplesPerSec=5.6946738454205486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:17:07,740] [INFO] [timer.py:197:stop] 0/4126, RunningAvgSamplesPerSec=6.327474876260285, CurrSamplesPerSec=5.705007442308613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:17:19,025] [INFO] [timer.py:197:stop] 0/4128, RunningAvgSamplesPerSec=6.327486270725622, CurrSamplesPerSec=5.707100698270615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:17:30,567] [INFO] [timer.py:197:stop] 0/4130, RunningAvgSamplesPerSec=6.3274918936995945, CurrSamplesPerSec=5.693483887850807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:17:41,889] [INFO] [timer.py:197:stop] 0/4132, RunningAvgSamplesPerSec=6.327476616471172, CurrSamplesPerSec=5.6378316923753395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:17:53,318] [INFO] 
[timer.py:197:stop] 0/4134, RunningAvgSamplesPerSec=6.32748756538067, CurrSamplesPerSec=5.705100076851405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:18:04,645] [INFO] [timer.py:197:stop] 0/4136, RunningAvgSamplesPerSec=6.327485912570304, CurrSamplesPerSec=5.69243445181399, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:18:15,942] [INFO] [timer.py:197:stop] 0/4138, RunningAvgSamplesPerSec=6.327486414798369, CurrSamplesPerSec=5.691163862711132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:18:27,278] [INFO] [logging.py:68:log_dist] [Rank 0] step=2070, skipped=6, lr=[6.526666666666666e-06], mom=[[0.9, 0.999]] [2022-12-19 07:18:27,279] [INFO] [timer.py:197:stop] 0/4140, RunningAvgSamplesPerSec=6.327479735949386, CurrSamplesPerSec=5.679881183439485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:18:38,723] [INFO] [timer.py:197:stop] 0/4142, RunningAvgSamplesPerSec=6.327489267041019, CurrSamplesPerSec=5.705122872200109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:18:50,023] [INFO] [timer.py:197:stop] 0/4144, RunningAvgSamplesPerSec=6.327491397561359, CurrSamplesPerSec=5.689739710561916, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:19:01,307] [INFO] [timer.py:197:stop] 0/4146, RunningAvgSamplesPerSec=6.327491171226784, CurrSamplesPerSec=5.683539708488168, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:19:12,597] [INFO] [timer.py:197:stop] 0/4148, RunningAvgSamplesPerSec=6.327499281835347, CurrSamplesPerSec=5.70815360730411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:19:23,872] [INFO] [timer.py:197:stop] 0/4150, RunningAvgSamplesPerSec=6.327505718240229, CurrSamplesPerSec=5.703661428462032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:19:35,371] [INFO] [timer.py:197:stop] 0/4152, RunningAvgSamplesPerSec=6.327508524770663, CurrSamplesPerSec=5.685593634399551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:19:46,642] [INFO] 
[timer.py:197:stop] 0/4154, RunningAvgSamplesPerSec=6.327521419725678, CurrSamplesPerSec=5.71092074008182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:19:57,943] [INFO] [timer.py:197:stop] 0/4156, RunningAvgSamplesPerSec=6.327526046543743, CurrSamplesPerSec=5.711219886070681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:20:09,239] [INFO] [timer.py:197:stop] 0/4158, RunningAvgSamplesPerSec=6.327528802364569, CurrSamplesPerSec=5.728299696465824, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:20:20,604] [INFO] [logging.py:68:log_dist] [Rank 0] step=2080, skipped=6, lr=[6.504444444444446e-06], mom=[[0.9, 0.999]] [2022-12-19 07:20:20,605] [INFO] [timer.py:197:stop] 0/4160, RunningAvgSamplesPerSec=6.327512831972087, CurrSamplesPerSec=5.648555505404242, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:20:32,089] [INFO] [timer.py:197:stop] 0/4162, RunningAvgSamplesPerSec=6.327510044716558, CurrSamplesPerSec=5.707619337275578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:20:43,364] [INFO] [timer.py:197:stop] 0/4164, RunningAvgSamplesPerSec=6.3275217492277624, CurrSamplesPerSec=5.7097997682008375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0015, 'learning_rate': 6.5000000000000004e-06, 'epoch': 15.6} [2022-12-19 07:20:54,665] [INFO] [timer.py:197:stop] 0/4166, RunningAvgSamplesPerSec=6.327521799607622, CurrSamplesPerSec=5.697375198331187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:21:05,947] [INFO] [timer.py:197:stop] 0/4168, RunningAvgSamplesPerSec=6.327524867697206, CurrSamplesPerSec=5.692418517663528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:21:17,286] [INFO] [timer.py:197:stop] 0/4170, RunningAvgSamplesPerSec=6.327528279887913, CurrSamplesPerSec=5.691493524152132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:21:28,715] [INFO] [timer.py:197:stop] 0/4172, RunningAvgSamplesPerSec=6.327533107862845, CurrSamplesPerSec=5.6947825752317405, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:21:40,244] [INFO] [timer.py:197:stop] 0/4174, RunningAvgSamplesPerSec=6.327533012629813, CurrSamplesPerSec=5.68367761763574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:21:51,520] [INFO] [timer.py:197:stop] 0/4176, RunningAvgSamplesPerSec=6.327533923960162, CurrSamplesPerSec=5.709229490239102, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:22:02,816] [INFO] [timer.py:197:stop] 0/4178, RunningAvgSamplesPerSec=6.327540275114805, CurrSamplesPerSec=5.705321247924761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:22:14,195] [INFO] [logging.py:68:log_dist] [Rank 0] step=2090, skipped=6, lr=[6.482222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 07:22:14,197] [INFO] [timer.py:197:stop] 0/4180, RunningAvgSamplesPerSec=6.327535498035287, CurrSamplesPerSec=5.679389682504387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:22:25,509] [INFO] [timer.py:197:stop] 0/4182, RunningAvgSamplesPerSec=6.327535403196752, CurrSamplesPerSec=5.680460037866989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:22:36,905] [INFO] [timer.py:197:stop] 0/4184, RunningAvgSamplesPerSec=6.327534155890988, CurrSamplesPerSec=5.6874000810877146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:22:48,212] [INFO] [timer.py:197:stop] 0/4186, RunningAvgSamplesPerSec=6.327533904219744, CurrSamplesPerSec=5.682747281808152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:22:59,488] [INFO] [timer.py:197:stop] 0/4188, RunningAvgSamplesPerSec=6.327542148121434, CurrSamplesPerSec=5.706159525527907, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:23:10,965] [INFO] [timer.py:197:stop] 0/4190, RunningAvgSamplesPerSec=6.327540264591215, CurrSamplesPerSec=5.6772136802439475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:23:22,273] [INFO] [timer.py:197:stop] 0/4192, RunningAvgSamplesPerSec=6.327547240164601, CurrSamplesPerSec=5.724482013468584, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:23:33,570] [INFO] [timer.py:197:stop] 0/4194, RunningAvgSamplesPerSec=6.327550582248767, CurrSamplesPerSec=5.712139634259407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:23:44,895] [INFO] [timer.py:197:stop] 0/4196, RunningAvgSamplesPerSec=6.327550648961243, CurrSamplesPerSec=5.701479627400173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:23:56,219] [INFO] [timer.py:197:stop] 0/4198, RunningAvgSamplesPerSec=6.3275473693426765, CurrSamplesPerSec=5.6896625280333595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:24:07,708] [INFO] [logging.py:68:log_dist] [Rank 0] step=2100, skipped=6, lr=[6.460000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 07:24:07,710] [INFO] [timer.py:197:stop] 0/4200, RunningAvgSamplesPerSec=6.327543275960251, CurrSamplesPerSec=5.688015421554892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:24:19,001] [INFO] [timer.py:197:stop] 0/4202, RunningAvgSamplesPerSec=6.327545123030779, CurrSamplesPerSec=5.710624541098961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:24:30,314] [INFO] [timer.py:197:stop] 0/4204, RunningAvgSamplesPerSec=6.327541343002448, CurrSamplesPerSec=5.683989081067795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:24:41,660] [INFO] [timer.py:197:stop] 0/4206, RunningAvgSamplesPerSec=6.327531464327186, CurrSamplesPerSec=5.6644491012652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:24:53,034] [INFO] [timer.py:197:stop] 0/4208, RunningAvgSamplesPerSec=6.327516814008641, CurrSamplesPerSec=5.652806274982304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:25:04,331] [INFO] [timer.py:197:stop] 0/4210, RunningAvgSamplesPerSec=6.327522797028719, CurrSamplesPerSec=5.7024296746853675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:25:15,735] [INFO] [timer.py:197:stop] 0/4212, RunningAvgSamplesPerSec=6.327495925164295, CurrSamplesPerSec=5.597954305538699, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:25:27,000] [INFO] [timer.py:197:stop] 0/4214, RunningAvgSamplesPerSec=6.327499033268627, CurrSamplesPerSec=5.690942098793239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0017, 'learning_rate': 6.444444444444445e-06, 'epoch': 15.79} [2022-12-19 07:25:38,331] [INFO] [timer.py:197:stop] 0/4216, RunningAvgSamplesPerSec=6.32749222060532, CurrSamplesPerSec=5.660470128094449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:25:49,613] [INFO] [timer.py:197:stop] 0/4218, RunningAvgSamplesPerSec=6.327495286618515, CurrSamplesPerSec=5.706094026219567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:26:00,877] [INFO] [logging.py:68:log_dist] [Rank 0] step=2110, skipped=6, lr=[6.4377777777777784e-06], mom=[[0.9, 0.999]] [2022-12-19 07:26:00,878] [INFO] [timer.py:197:stop] 0/4220, RunningAvgSamplesPerSec=6.327503061226016, CurrSamplesPerSec=5.699007403952033, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:26:12,162] [INFO] [timer.py:197:stop] 0/4222, RunningAvgSamplesPerSec=6.327513182453156, CurrSamplesPerSec=5.731932267417317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:26:23,443] [INFO] [timer.py:197:stop] 0/4224, RunningAvgSamplesPerSec=6.3275214922948475, CurrSamplesPerSec=5.7258811092937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:26:34,754] [INFO] [timer.py:197:stop] 0/4226, RunningAvgSamplesPerSec=6.327516430894585, CurrSamplesPerSec=5.670994207350218, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:26:46,014] [INFO] [timer.py:197:stop] 0/4228, RunningAvgSamplesPerSec=6.327521429821155, CurrSamplesPerSec=5.697597705776032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:26:57,461] [INFO] [timer.py:197:stop] 0/4230, RunningAvgSamplesPerSec=6.32752909838558, CurrSamplesPerSec=5.7116636810640165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:27:08,736] [INFO] [timer.py:197:stop] 0/4232, 
RunningAvgSamplesPerSec=6.327534088167109, CurrSamplesPerSec=5.707540212644554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:27:20,170] [INFO] [timer.py:197:stop] 0/4234, RunningAvgSamplesPerSec=6.3275490380497255, CurrSamplesPerSec=5.72683344228667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:27:31,454] [INFO] [timer.py:197:stop] 0/4236, RunningAvgSamplesPerSec=6.327550790421362, CurrSamplesPerSec=5.69316486279673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:27:42,753] [INFO] [timer.py:197:stop] 0/4238, RunningAvgSamplesPerSec=6.327559987594923, CurrSamplesPerSec=5.707127392408814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:27:53,991] [INFO] [logging.py:68:log_dist] [Rank 0] step=2120, skipped=6, lr=[6.415555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 07:27:53,992] [INFO] [timer.py:197:stop] 0/4240, RunningAvgSamplesPerSec=6.327584317816777, CurrSamplesPerSec=5.733592908610527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:28:05,458] [INFO] [timer.py:197:stop] 0/4242, RunningAvgSamplesPerSec=6.327585083130646, CurrSamplesPerSec=5.70209947192448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:28:16,745] [INFO] [timer.py:197:stop] 0/4244, RunningAvgSamplesPerSec=6.327582707815992, CurrSamplesPerSec=5.681356437286448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:28:28,045] [INFO] [timer.py:197:stop] 0/4246, RunningAvgSamplesPerSec=6.32758028075958, CurrSamplesPerSec=5.67489561600762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:28:39,470] [INFO] [timer.py:197:stop] 0/4248, RunningAvgSamplesPerSec=6.3275713666367315, CurrSamplesPerSec=5.665189325289095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:28:50,808] [INFO] [timer.py:197:stop] 0/4250, RunningAvgSamplesPerSec=6.3275604840896635, CurrSamplesPerSec=5.677673581458609, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:29:02,329] [INFO] [timer.py:197:stop] 0/4252, 
RunningAvgSamplesPerSec=6.3275511463061465, CurrSamplesPerSec=5.673776028655633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:29:13,671] [INFO] [timer.py:197:stop] 0/4254, RunningAvgSamplesPerSec=6.327541995742091, CurrSamplesPerSec=5.660691433405216, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:29:25,103] [INFO] [timer.py:197:stop] 0/4256, RunningAvgSamplesPerSec=6.327543637786587, CurrSamplesPerSec=5.698265090479427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:29:36,441] [INFO] [timer.py:197:stop] 0/4258, RunningAvgSamplesPerSec=6.327532307721869, CurrSamplesPerSec=5.666902432713789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:29:47,788] [INFO] [logging.py:68:log_dist] [Rank 0] step=2130, skipped=6, lr=[6.393333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 07:29:47,790] [INFO] [timer.py:197:stop] 0/4260, RunningAvgSamplesPerSec=6.327519157191412, CurrSamplesPerSec=5.662283574931564, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:29:59,248] [INFO] [timer.py:197:stop] 0/4262, RunningAvgSamplesPerSec=6.327514119426578, CurrSamplesPerSec=5.686551645435512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:30:10,794] [INFO] [timer.py:197:stop] 0/4264, RunningAvgSamplesPerSec=6.327510273725452, CurrSamplesPerSec=5.676855417197779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 6.3888888888888885e-06, 'epoch': 15.97} [2022-12-19 07:30:22,265] [INFO] [timer.py:197:stop] 0/4266, RunningAvgSamplesPerSec=6.327507418885697, CurrSamplesPerSec=5.696932895471651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:30:33,589] [INFO] [timer.py:197:stop] 0/4268, RunningAvgSamplesPerSec=6.327506789650707, CurrSamplesPerSec=5.701971810235064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:30:45,012] [INFO] [timer.py:197:stop] 0/4270, RunningAvgSamplesPerSec=6.3275002833274945, CurrSamplesPerSec=5.687721593731484, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 07:30:55,403] [INFO] [timer.py:197:stop] 0/4272, RunningAvgSamplesPerSec=6.327732855668634, CurrSamplesPerSec=6.639315902434932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:31:06,665] [INFO] [timer.py:197:stop] 0/4274, RunningAvgSamplesPerSec=6.327737191261836, CurrSamplesPerSec=5.693903190892417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:31:17,966] [INFO] [timer.py:197:stop] 0/4276, RunningAvgSamplesPerSec=6.32774095393647, CurrSamplesPerSec=5.7114940294898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:31:29,265] [INFO] [timer.py:197:stop] 0/4278, RunningAvgSamplesPerSec=6.32775101432188, CurrSamplesPerSec=5.711316854006023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:31:40,569] [INFO] [logging.py:68:log_dist] [Rank 0] step=2140, skipped=6, lr=[6.371111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 07:31:40,570] [INFO] [timer.py:197:stop] 0/4280, RunningAvgSamplesPerSec=6.32775375336967, CurrSamplesPerSec=5.7084301270710744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:31:52,047] [INFO] [timer.py:197:stop] 0/4282, RunningAvgSamplesPerSec=6.327769403036804, CurrSamplesPerSec=5.7136793682748355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:32:03,317] [INFO] [timer.py:197:stop] 0/4284, RunningAvgSamplesPerSec=6.327774127622931, CurrSamplesPerSec=5.702438638903169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:32:14,555] [INFO] [timer.py:197:stop] 0/4286, RunningAvgSamplesPerSec=6.327788342127575, CurrSamplesPerSec=5.740203863228582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:32:25,818] [INFO] [timer.py:197:stop] 0/4288, RunningAvgSamplesPerSec=6.327800644104289, CurrSamplesPerSec=5.734053660919016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:32:37,062] [INFO] [timer.py:197:stop] 0/4290, RunningAvgSamplesPerSec=6.327814099658171, CurrSamplesPerSec=5.731087135007432, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 07:32:48,319] [INFO] [timer.py:197:stop] 0/4292, RunningAvgSamplesPerSec=6.3278282709586104, CurrSamplesPerSec=5.718408559812618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:32:59,695] [INFO] [timer.py:197:stop] 0/4294, RunningAvgSamplesPerSec=6.32784402289, CurrSamplesPerSec=5.732862127542922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:33:11,019] [INFO] [timer.py:197:stop] 0/4296, RunningAvgSamplesPerSec=6.327828904339943, CurrSamplesPerSec=5.63403356407852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:33:22,282] [INFO] [timer.py:197:stop] 0/4298, RunningAvgSamplesPerSec=6.327836194797394, CurrSamplesPerSec=5.723204400571392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:33:33,548] [INFO] [logging.py:68:log_dist] [Rank 0] step=2150, skipped=6, lr=[6.348888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 07:33:33,550] [INFO] [timer.py:197:stop] 0/4300, RunningAvgSamplesPerSec=6.327841130970408, CurrSamplesPerSec=5.705162400736591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:33:44,928] [INFO] [timer.py:197:stop] 0/4302, RunningAvgSamplesPerSec=6.327849388076931, CurrSamplesPerSec=5.719203165412237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:33:56,242] [INFO] [timer.py:197:stop] 0/4304, RunningAvgSamplesPerSec=6.3278409784105465, CurrSamplesPerSec=5.6517810612356545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:34:07,515] [INFO] [timer.py:197:stop] 0/4306, RunningAvgSamplesPerSec=6.327851028521838, CurrSamplesPerSec=5.70047300031565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:34:18,944] [INFO] [timer.py:197:stop] 0/4308, RunningAvgSamplesPerSec=6.327850961156384, CurrSamplesPerSec=5.682694348938582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:34:30,418] [INFO] [timer.py:197:stop] 0/4310, RunningAvgSamplesPerSec=6.327866023877664, CurrSamplesPerSec=5.7204502226525396, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 07:34:41,659] [INFO] [timer.py:197:stop] 0/4312, RunningAvgSamplesPerSec=6.327869478557813, CurrSamplesPerSec=5.679821093186444, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:34:52,930] [INFO] [timer.py:197:stop] 0/4314, RunningAvgSamplesPerSec=6.327883559865045, CurrSamplesPerSec=5.735430725233605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:35:04,199] [INFO] [timer.py:197:stop] 0/4316, RunningAvgSamplesPerSec=6.327895214225625, CurrSamplesPerSec=5.718054823023591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0015, 'learning_rate': 6.331111111111111e-06, 'epoch': 16.16} [2022-12-19 07:35:15,515] [INFO] [timer.py:197:stop] 0/4318, RunningAvgSamplesPerSec=6.327895952430418, CurrSamplesPerSec=5.684045408084271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:35:26,984] [INFO] [logging.py:68:log_dist] [Rank 0] step=2160, skipped=6, lr=[6.326666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 07:35:26,986] [INFO] [timer.py:197:stop] 0/4320, RunningAvgSamplesPerSec=6.327904325152686, CurrSamplesPerSec=5.710530512244364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:35:38,359] [INFO] [timer.py:197:stop] 0/4322, RunningAvgSamplesPerSec=6.327916821737978, CurrSamplesPerSec=5.71486367017721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:35:49,840] [INFO] [timer.py:197:stop] 0/4324, RunningAvgSamplesPerSec=6.327924438435982, CurrSamplesPerSec=5.709603024005327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:36:01,155] [INFO] [timer.py:197:stop] 0/4326, RunningAvgSamplesPerSec=6.3279215763631065, CurrSamplesPerSec=5.677422607852446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:36:12,663] [INFO] [timer.py:197:stop] 0/4328, RunningAvgSamplesPerSec=6.327933172131465, CurrSamplesPerSec=5.710836178091582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:36:24,027] [INFO] [timer.py:197:stop] 0/4330, 
RunningAvgSamplesPerSec=6.3279395489842445, CurrSamplesPerSec=5.704673788888134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:36:35,530] [INFO] [timer.py:197:stop] 0/4332, RunningAvgSamplesPerSec=6.327944007878182, CurrSamplesPerSec=5.703990842521588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:36:46,792] [INFO] [timer.py:197:stop] 0/4334, RunningAvgSamplesPerSec=6.327955939360021, CurrSamplesPerSec=5.712641683105012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:36:58,159] [INFO] [timer.py:197:stop] 0/4336, RunningAvgSamplesPerSec=6.32796332171729, CurrSamplesPerSec=5.699067174889507, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:37:09,655] [INFO] [timer.py:197:stop] 0/4338, RunningAvgSamplesPerSec=6.327964049268704, CurrSamplesPerSec=5.696669094151465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:37:20,941] [INFO] [logging.py:68:log_dist] [Rank 0] step=2170, skipped=6, lr=[6.304444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 07:37:20,943] [INFO] [timer.py:197:stop] 0/4340, RunningAvgSamplesPerSec=6.327976732515257, CurrSamplesPerSec=5.6970390515674385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:37:32,433] [INFO] [timer.py:197:stop] 0/4342, RunningAvgSamplesPerSec=6.327979399548105, CurrSamplesPerSec=5.704845702818752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:37:43,692] [INFO] [timer.py:197:stop] 0/4344, RunningAvgSamplesPerSec=6.327986160652517, CurrSamplesPerSec=5.715953768726433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:37:54,956] [INFO] [timer.py:197:stop] 0/4346, RunningAvgSamplesPerSec=6.327991098539809, CurrSamplesPerSec=5.70017473696871, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:38:06,444] [INFO] [timer.py:197:stop] 0/4348, RunningAvgSamplesPerSec=6.32799036219279, CurrSamplesPerSec=5.691577272929277, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:38:17,908] [INFO] [timer.py:197:stop] 0/4350, 
RunningAvgSamplesPerSec=6.327990648986314, CurrSamplesPerSec=5.700692360000017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:38:29,294] [INFO] [timer.py:197:stop] 0/4352, RunningAvgSamplesPerSec=6.327998817856221, CurrSamplesPerSec=5.695446883733262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:38:40,608] [INFO] [timer.py:197:stop] 0/4354, RunningAvgSamplesPerSec=6.328001957361373, CurrSamplesPerSec=5.712365241418337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:38:51,957] [INFO] [timer.py:197:stop] 0/4356, RunningAvgSamplesPerSec=6.328007579583737, CurrSamplesPerSec=5.71044013083775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:39:03,246] [INFO] [timer.py:197:stop] 0/4358, RunningAvgSamplesPerSec=6.328012081200142, CurrSamplesPerSec=5.711334595363051, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:39:14,523] [INFO] [logging.py:68:log_dist] [Rank 0] step=2180, skipped=6, lr=[6.282222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 07:39:14,525] [INFO] [timer.py:197:stop] 0/4360, RunningAvgSamplesPerSec=6.328018769281079, CurrSamplesPerSec=5.705833984542888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:39:25,782] [INFO] [timer.py:197:stop] 0/4362, RunningAvgSamplesPerSec=6.32802740735169, CurrSamplesPerSec=5.698088734778731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:39:37,017] [INFO] [timer.py:197:stop] 0/4364, RunningAvgSamplesPerSec=6.328045194297809, CurrSamplesPerSec=5.7285561664307325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:39:48,479] [INFO] [timer.py:197:stop] 0/4366, RunningAvgSamplesPerSec=6.328048684969584, CurrSamplesPerSec=5.707846529390251, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0014, 'learning_rate': 6.275555555555556e-06, 'epoch': 16.35} [2022-12-19 07:39:59,756] [INFO] [timer.py:197:stop] 0/4368, RunningAvgSamplesPerSec=6.3280523337873165, CurrSamplesPerSec=5.71606112174935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 07:40:11,119] [INFO] [timer.py:197:stop] 0/4370, RunningAvgSamplesPerSec=6.3280558735327705, CurrSamplesPerSec=5.710641792206897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:40:22,564] [INFO] [timer.py:197:stop] 0/4372, RunningAvgSamplesPerSec=6.328054130485861, CurrSamplesPerSec=5.687892969518997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:40:33,900] [INFO] [timer.py:197:stop] 0/4374, RunningAvgSamplesPerSec=6.3280448727617795, CurrSamplesPerSec=5.715315332808633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:40:45,207] [INFO] [timer.py:197:stop] 0/4376, RunningAvgSamplesPerSec=6.328046519784188, CurrSamplesPerSec=5.710491638170454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:40:56,490] [INFO] [timer.py:197:stop] 0/4378, RunningAvgSamplesPerSec=6.328049926986632, CurrSamplesPerSec=5.7003575164045195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:41:07,733] [INFO] [logging.py:68:log_dist] [Rank 0] step=2190, skipped=6, lr=[6.26e-06], mom=[[0.9, 0.999]] [2022-12-19 07:41:07,735] [INFO] [timer.py:197:stop] 0/4380, RunningAvgSamplesPerSec=6.328061997639105, CurrSamplesPerSec=5.712763500994983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:41:19,023] [INFO] [timer.py:197:stop] 0/4382, RunningAvgSamplesPerSec=6.328066404178767, CurrSamplesPerSec=5.726638943212002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:41:30,479] [INFO] [timer.py:197:stop] 0/4384, RunningAvgSamplesPerSec=6.328059820367006, CurrSamplesPerSec=5.67685637762807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:41:41,723] [INFO] [timer.py:197:stop] 0/4386, RunningAvgSamplesPerSec=6.328071914463811, CurrSamplesPerSec=5.73545891050161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:41:53,013] [INFO] [timer.py:197:stop] 0/4388, RunningAvgSamplesPerSec=6.328081132450464, CurrSamplesPerSec=5.701751867639971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:42:04,333] 
[INFO] [timer.py:197:stop] 0/4390, RunningAvgSamplesPerSec=6.328080522218407, CurrSamplesPerSec=5.696755413250745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:42:15,623] [INFO] [timer.py:197:stop] 0/4392, RunningAvgSamplesPerSec=6.328080580546219, CurrSamplesPerSec=5.706487044626573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:42:27,121] [INFO] [timer.py:197:stop] 0/4394, RunningAvgSamplesPerSec=6.328082486764558, CurrSamplesPerSec=5.689431234183658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:42:38,405] [INFO] [timer.py:197:stop] 0/4396, RunningAvgSamplesPerSec=6.328081462962241, CurrSamplesPerSec=5.696589547493753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:42:49,716] [INFO] [timer.py:197:stop] 0/4398, RunningAvgSamplesPerSec=6.328080728538393, CurrSamplesPerSec=5.68681788384086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:43:00,994] [INFO] [logging.py:68:log_dist] [Rank 0] step=2200, skipped=6, lr=[6.237777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 07:43:00,996] [INFO] [timer.py:197:stop] 0/4400, RunningAvgSamplesPerSec=6.328089317218389, CurrSamplesPerSec=5.705737445252136, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:43:12,389] [INFO] [timer.py:197:stop] 0/4402, RunningAvgSamplesPerSec=6.328089225831719, CurrSamplesPerSec=5.683660769737039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:43:23,707] [INFO] [timer.py:197:stop] 0/4404, RunningAvgSamplesPerSec=6.328086645063166, CurrSamplesPerSec=5.669426859480568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:43:35,155] [INFO] [timer.py:197:stop] 0/4406, RunningAvgSamplesPerSec=6.3280888301086256, CurrSamplesPerSec=5.698695018359517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:43:46,456] [INFO] [timer.py:197:stop] 0/4408, RunningAvgSamplesPerSec=6.328095378619876, CurrSamplesPerSec=5.694730867648217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:43:57,864] [INFO] 
[timer.py:197:stop] 0/4410, RunningAvgSamplesPerSec=6.328103440200002, CurrSamplesPerSec=5.720394147028211, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:44:09,136] [INFO] [timer.py:197:stop] 0/4412, RunningAvgSamplesPerSec=6.3281104201276435, CurrSamplesPerSec=5.68973295700707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:44:20,478] [INFO] [timer.py:197:stop] 0/4414, RunningAvgSamplesPerSec=6.328105291290537, CurrSamplesPerSec=5.651051712545644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:44:31,721] [INFO] [timer.py:197:stop] 0/4416, RunningAvgSamplesPerSec=6.3281145700853285, CurrSamplesPerSec=5.716176025597591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0018, 'learning_rate': 6.220000000000001e-06, 'epoch': 16.54} [2022-12-19 07:44:42,969] [INFO] [timer.py:197:stop] 0/4418, RunningAvgSamplesPerSec=6.328120433961753, CurrSamplesPerSec=5.71026933722783, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:44:54,268] [INFO] [logging.py:68:log_dist] [Rank 0] step=2210, skipped=6, lr=[6.2155555555555554e-06], mom=[[0.9, 0.999]] [2022-12-19 07:44:54,270] [INFO] [timer.py:197:stop] 0/4420, RunningAvgSamplesPerSec=6.3281226390857475, CurrSamplesPerSec=5.700312002018545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:45:05,560] [INFO] [timer.py:197:stop] 0/4422, RunningAvgSamplesPerSec=6.328127149036697, CurrSamplesPerSec=5.690490178608186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:45:16,849] [INFO] [timer.py:197:stop] 0/4424, RunningAvgSamplesPerSec=6.328130276060804, CurrSamplesPerSec=5.716251251169824, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:45:28,109] [INFO] [timer.py:197:stop] 0/4426, RunningAvgSamplesPerSec=6.328136066016005, CurrSamplesPerSec=5.710004056460312, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:45:39,549] [INFO] [timer.py:197:stop] 0/4428, RunningAvgSamplesPerSec=6.328142027366078, CurrSamplesPerSec=5.713190512503829, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:45:50,839] [INFO] [timer.py:197:stop] 0/4430, RunningAvgSamplesPerSec=6.328145055785313, CurrSamplesPerSec=5.7037935288682835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:46:02,300] [INFO] [timer.py:197:stop] 0/4432, RunningAvgSamplesPerSec=6.328149850631655, CurrSamplesPerSec=5.705292630526187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:46:13,660] [INFO] [timer.py:197:stop] 0/4434, RunningAvgSamplesPerSec=6.328137516191118, CurrSamplesPerSec=5.684472711337361, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:46:25,012] [INFO] [timer.py:197:stop] 0/4436, RunningAvgSamplesPerSec=6.328122358307474, CurrSamplesPerSec=5.642218600618662, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:46:36,479] [INFO] [timer.py:197:stop] 0/4438, RunningAvgSamplesPerSec=6.3281221716506675, CurrSamplesPerSec=5.698713165307471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:46:47,782] [INFO] [logging.py:68:log_dist] [Rank 0] step=2220, skipped=6, lr=[6.193333333333333e-06], mom=[[0.9, 0.999]] [2022-12-19 07:46:47,784] [INFO] [timer.py:197:stop] 0/4440, RunningAvgSamplesPerSec=6.32812643756493, CurrSamplesPerSec=5.702716785934862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:46:59,138] [INFO] [timer.py:197:stop] 0/4442, RunningAvgSamplesPerSec=6.32810844836018, CurrSamplesPerSec=5.635163072338547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:47:10,448] [INFO] [timer.py:197:stop] 0/4444, RunningAvgSamplesPerSec=6.328105168506496, CurrSamplesPerSec=5.69908338827495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:47:22,039] [INFO] [timer.py:197:stop] 0/4446, RunningAvgSamplesPerSec=6.32809291746349, CurrSamplesPerSec=5.672091363209361, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:47:33,346] [INFO] [timer.py:197:stop] 0/4448, RunningAvgSamplesPerSec=6.328090131201207, CurrSamplesPerSec=5.693735558879809, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:47:44,641] [INFO] [timer.py:197:stop] 0/4450, RunningAvgSamplesPerSec=6.328089570764818, CurrSamplesPerSec=5.690132409104298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:47:55,986] [INFO] [timer.py:197:stop] 0/4452, RunningAvgSamplesPerSec=6.328092034490897, CurrSamplesPerSec=5.698772204177787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:48:07,328] [INFO] [timer.py:197:stop] 0/4454, RunningAvgSamplesPerSec=6.32810164672663, CurrSamplesPerSec=5.720720620115813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:48:18,592] [INFO] [timer.py:197:stop] 0/4456, RunningAvgSamplesPerSec=6.328114900730474, CurrSamplesPerSec=5.735111638303263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:48:29,812] [INFO] [timer.py:197:stop] 0/4458, RunningAvgSamplesPerSec=6.328129594049743, CurrSamplesPerSec=5.746177102813401, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:48:41,295] [INFO] [logging.py:68:log_dist] [Rank 0] step=2230, skipped=6, lr=[6.171111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 07:48:41,297] [INFO] [timer.py:197:stop] 0/4460, RunningAvgSamplesPerSec=6.328134940396023, CurrSamplesPerSec=5.702963700501722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:48:52,792] [INFO] [timer.py:197:stop] 0/4462, RunningAvgSamplesPerSec=6.328138773961759, CurrSamplesPerSec=5.719198778758668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:49:04,052] [INFO] [timer.py:197:stop] 0/4464, RunningAvgSamplesPerSec=6.328146800014386, CurrSamplesPerSec=5.697278703222744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:49:15,551] [INFO] [timer.py:197:stop] 0/4466, RunningAvgSamplesPerSec=6.328157428566357, CurrSamplesPerSec=5.72314973517512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0011, 'learning_rate': 6.1644444444444455e-06, 'epoch': 16.73} [2022-12-19 07:49:27,065] [INFO] [timer.py:197:stop] 0/4468, 
RunningAvgSamplesPerSec=6.328171545093418, CurrSamplesPerSec=5.723607346043038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:49:38,394] [INFO] [timer.py:197:stop] 0/4470, RunningAvgSamplesPerSec=6.3281704628278, CurrSamplesPerSec=5.687353568400186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:49:49,666] [INFO] [timer.py:197:stop] 0/4472, RunningAvgSamplesPerSec=6.328177131993938, CurrSamplesPerSec=5.706216778006809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:50:00,969] [INFO] [timer.py:197:stop] 0/4474, RunningAvgSamplesPerSec=6.328182231993997, CurrSamplesPerSec=5.704543102437384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:50:12,192] [INFO] [timer.py:197:stop] 0/4476, RunningAvgSamplesPerSec=6.328185041552007, CurrSamplesPerSec=5.695297044344549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:50:23,431] [INFO] [timer.py:197:stop] 0/4478, RunningAvgSamplesPerSec=6.328196625803805, CurrSamplesPerSec=5.722129587410931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:50:34,661] [INFO] [logging.py:68:log_dist] [Rank 0] step=2240, skipped=6, lr=[6.14888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 07:50:34,663] [INFO] [timer.py:197:stop] 0/4480, RunningAvgSamplesPerSec=6.328214356069589, CurrSamplesPerSec=5.724754257287573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:50:45,916] [INFO] [timer.py:197:stop] 0/4482, RunningAvgSamplesPerSec=6.328228186670316, CurrSamplesPerSec=5.7142408043634685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:50:57,184] [INFO] [timer.py:197:stop] 0/4484, RunningAvgSamplesPerSec=6.328234939227396, CurrSamplesPerSec=5.708210899805254, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:51:08,474] [INFO] [timer.py:197:stop] 0/4486, RunningAvgSamplesPerSec=6.32824108816872, CurrSamplesPerSec=5.692516055376709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:51:19,835] [INFO] [timer.py:197:stop] 0/4488, 
RunningAvgSamplesPerSec=6.328253867513497, CurrSamplesPerSec=5.7120764285529155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:51:31,275] [INFO] [timer.py:197:stop] 0/4490, RunningAvgSamplesPerSec=6.328245319201125, CurrSamplesPerSec=5.656398288318438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:51:42,550] [INFO] [timer.py:197:stop] 0/4492, RunningAvgSamplesPerSec=6.328256662144481, CurrSamplesPerSec=5.710011586984663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:51:54,033] [INFO] [timer.py:197:stop] 0/4494, RunningAvgSamplesPerSec=6.328264520496206, CurrSamplesPerSec=5.694183646343379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:52:05,291] [INFO] [timer.py:197:stop] 0/4496, RunningAvgSamplesPerSec=6.32827782094765, CurrSamplesPerSec=5.724110436344785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:52:16,535] [INFO] [timer.py:197:stop] 0/4498, RunningAvgSamplesPerSec=6.328290452346506, CurrSamplesPerSec=5.730786148905441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:52:27,789] [INFO] [logging.py:68:log_dist] [Rank 0] step=2250, skipped=6, lr=[6.126666666666668e-06], mom=[[0.9, 0.999]] [2022-12-19 07:52:27,790] [INFO] [timer.py:197:stop] 0/4500, RunningAvgSamplesPerSec=6.328302585727518, CurrSamplesPerSec=5.724235429309571, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:52:39,067] [INFO] [timer.py:197:stop] 0/4502, RunningAvgSamplesPerSec=6.328311192289777, CurrSamplesPerSec=5.694714437397707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:52:50,541] [INFO] [timer.py:197:stop] 0/4504, RunningAvgSamplesPerSec=6.328323924914582, CurrSamplesPerSec=5.710070374341397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:53:01,842] [INFO] [timer.py:197:stop] 0/4506, RunningAvgSamplesPerSec=6.328328240386704, CurrSamplesPerSec=5.702842057876201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:53:13,121] [INFO] [timer.py:197:stop] 0/4508, 
RunningAvgSamplesPerSec=6.328337578000881, CurrSamplesPerSec=5.709292147238084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:53:24,478] [INFO] [timer.py:197:stop] 0/4510, RunningAvgSamplesPerSec=6.328350115083331, CurrSamplesPerSec=5.707034449351997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:53:35,731] [INFO] [timer.py:197:stop] 0/4512, RunningAvgSamplesPerSec=6.328359915829491, CurrSamplesPerSec=5.708223523731314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:53:47,014] [INFO] [timer.py:197:stop] 0/4514, RunningAvgSamplesPerSec=6.3283698350210384, CurrSamplesPerSec=5.7165655645417015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:53:58,379] [INFO] [timer.py:197:stop] 0/4516, RunningAvgSamplesPerSec=6.3283757331401675, CurrSamplesPerSec=5.7039988420116865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0013, 'learning_rate': 6.1088888888888895e-06, 'epoch': 16.91} [2022-12-19 07:54:09,770] [INFO] [timer.py:197:stop] 0/4518, RunningAvgSamplesPerSec=6.328375101712284, CurrSamplesPerSec=5.696730508592268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:54:21,092] [INFO] [logging.py:68:log_dist] [Rank 0] step=2260, skipped=6, lr=[6.104444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 07:54:21,093] [INFO] [timer.py:197:stop] 0/4520, RunningAvgSamplesPerSec=6.32837675778896, CurrSamplesPerSec=5.677374817307428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:54:32,426] [INFO] [timer.py:197:stop] 0/4522, RunningAvgSamplesPerSec=6.328378009186982, CurrSamplesPerSec=5.70477926376669, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:54:43,700] [INFO] [timer.py:197:stop] 0/4524, RunningAvgSamplesPerSec=6.328384862459244, CurrSamplesPerSec=5.720463144582699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:54:55,013] [INFO] [timer.py:197:stop] 0/4526, RunningAvgSamplesPerSec=6.328393405283829, CurrSamplesPerSec=5.7077557474654235, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 07:55:06,483] [INFO] [timer.py:197:stop] 0/4528, RunningAvgSamplesPerSec=6.328392374118707, CurrSamplesPerSec=5.684903209093661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:55:17,790] [INFO] [timer.py:197:stop] 0/4530, RunningAvgSamplesPerSec=6.328388932207091, CurrSamplesPerSec=5.685052020461959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:55:29,133] [INFO] [timer.py:197:stop] 0/4532, RunningAvgSamplesPerSec=6.328383551206331, CurrSamplesPerSec=5.676765858501234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:55:40,458] [INFO] [timer.py:197:stop] 0/4534, RunningAvgSamplesPerSec=6.328389458980619, CurrSamplesPerSec=5.698872379485937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:55:51,801] [INFO] [timer.py:197:stop] 0/4536, RunningAvgSamplesPerSec=6.328386304910444, CurrSamplesPerSec=5.679551423788033, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:56:03,088] [INFO] [timer.py:197:stop] 0/4538, RunningAvgSamplesPerSec=6.328395485147144, CurrSamplesPerSec=5.6907418263206395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:56:13,401] [INFO] [logging.py:68:log_dist] [Rank 0] step=2270, skipped=6, lr=[6.082222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 07:56:13,403] [INFO] [timer.py:197:stop] 0/4540, RunningAvgSamplesPerSec=6.3286295087082785, CurrSamplesPerSec=5.705493928489933, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:56:24,798] [INFO] [timer.py:197:stop] 0/4542, RunningAvgSamplesPerSec=6.328610798984757, CurrSamplesPerSec=5.633188916943388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:56:36,162] [INFO] [timer.py:197:stop] 0/4544, RunningAvgSamplesPerSec=6.328603494618587, CurrSamplesPerSec=5.675135807892851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:56:47,512] [INFO] [timer.py:197:stop] 0/4546, RunningAvgSamplesPerSec=6.3286078462381745, CurrSamplesPerSec=5.690955611660457, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 07:56:58,877] [INFO] [timer.py:197:stop] 0/4548, RunningAvgSamplesPerSec=6.328607306907084, CurrSamplesPerSec=5.682919320444803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:57:10,226] [INFO] [timer.py:197:stop] 0/4550, RunningAvgSamplesPerSec=6.328605483622467, CurrSamplesPerSec=5.677060236312161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:57:21,511] [INFO] [timer.py:197:stop] 0/4552, RunningAvgSamplesPerSec=6.328608943515726, CurrSamplesPerSec=5.673588954076758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:57:32,863] [INFO] [timer.py:197:stop] 0/4554, RunningAvgSamplesPerSec=6.328610032597524, CurrSamplesPerSec=5.704874073250429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:57:44,284] [INFO] [timer.py:197:stop] 0/4556, RunningAvgSamplesPerSec=6.328607367066228, CurrSamplesPerSec=5.68419297090336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:57:55,595] [INFO] [timer.py:197:stop] 0/4558, RunningAvgSamplesPerSec=6.328603407034294, CurrSamplesPerSec=5.679418761610169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:58:06,955] [INFO] [logging.py:68:log_dist] [Rank 0] step=2280, skipped=6, lr=[6.0600000000000004e-06], mom=[[0.9, 0.999]] [2022-12-19 07:58:06,957] [INFO] [timer.py:197:stop] 0/4560, RunningAvgSamplesPerSec=6.3286000664737685, CurrSamplesPerSec=5.680030933882382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:58:18,316] [INFO] [timer.py:197:stop] 0/4562, RunningAvgSamplesPerSec=6.3285928603266575, CurrSamplesPerSec=5.661558200069828, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:58:29,614] [INFO] [timer.py:197:stop] 0/4564, RunningAvgSamplesPerSec=6.328606268781055, CurrSamplesPerSec=5.71060340255951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:58:41,011] [INFO] [timer.py:197:stop] 0/4566, RunningAvgSamplesPerSec=6.328602085635176, CurrSamplesPerSec=5.679786722133334, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB {'loss': 0.0023, 'learning_rate': 6.0533333333333335e-06, 'epoch': 17.1} [2022-12-19 07:58:52,298] [INFO] [timer.py:197:stop] 0/4568, RunningAvgSamplesPerSec=6.328608662261842, CurrSamplesPerSec=5.680753596760485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:59:03,613] [INFO] [timer.py:197:stop] 0/4570, RunningAvgSamplesPerSec=6.3286080961664455, CurrSamplesPerSec=5.693281987448014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:59:14,914] [INFO] [timer.py:197:stop] 0/4572, RunningAvgSamplesPerSec=6.328607075319114, CurrSamplesPerSec=5.6792642372132836, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:59:26,274] [INFO] [timer.py:197:stop] 0/4574, RunningAvgSamplesPerSec=6.328605132933337, CurrSamplesPerSec=5.694415326764294, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:59:37,612] [INFO] [timer.py:197:stop] 0/4576, RunningAvgSamplesPerSec=6.328612157191911, CurrSamplesPerSec=5.72902124490384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 07:59:48,976] [INFO] [timer.py:197:stop] 0/4578, RunningAvgSamplesPerSec=6.32861135364644, CurrSamplesPerSec=5.685992505479141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:00:00,299] [INFO] [logging.py:68:log_dist] [Rank 0] step=2290, skipped=6, lr=[6.037777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 08:00:00,301] [INFO] [timer.py:197:stop] 0/4580, RunningAvgSamplesPerSec=6.32860666814434, CurrSamplesPerSec=5.677320063300103, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:00:11,618] [INFO] [timer.py:197:stop] 0/4582, RunningAvgSamplesPerSec=6.328615080350387, CurrSamplesPerSec=5.699467454691609, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:00:23,124] [INFO] [timer.py:197:stop] 0/4584, RunningAvgSamplesPerSec=6.328627250359331, CurrSamplesPerSec=5.715725200553083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:00:34,397] [INFO] [timer.py:197:stop] 0/4586, 
RunningAvgSamplesPerSec=6.328629788878268, CurrSamplesPerSec=5.686111021438891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:00:45,746] [INFO] [timer.py:197:stop] 0/4588, RunningAvgSamplesPerSec=6.328618809419887, CurrSamplesPerSec=5.625012829579804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:00:57,150] [INFO] [timer.py:197:stop] 0/4590, RunningAvgSamplesPerSec=6.328613179651839, CurrSamplesPerSec=5.6728976047050255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:01:08,509] [INFO] [timer.py:197:stop] 0/4592, RunningAvgSamplesPerSec=6.328616946062387, CurrSamplesPerSec=5.689474163216236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:01:19,844] [INFO] [timer.py:197:stop] 0/4594, RunningAvgSamplesPerSec=6.32860221045314, CurrSamplesPerSec=5.695729908346083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:01:31,303] [INFO] [timer.py:197:stop] 0/4596, RunningAvgSamplesPerSec=6.328598214822203, CurrSamplesPerSec=5.687782333335452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:01:42,717] [INFO] [timer.py:197:stop] 0/4598, RunningAvgSamplesPerSec=6.328568716312939, CurrSamplesPerSec=5.57640847621789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:01:54,005] [INFO] [logging.py:68:log_dist] [Rank 0] step=2300, skipped=6, lr=[6.015555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 08:01:54,007] [INFO] [timer.py:197:stop] 0/4600, RunningAvgSamplesPerSec=6.32857292021647, CurrSamplesPerSec=5.705137422517932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:02:05,528] [INFO] [timer.py:197:stop] 0/4602, RunningAvgSamplesPerSec=6.328513106154114, CurrSamplesPerSec=5.4698688749737645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:02:16,912] [INFO] [timer.py:197:stop] 0/4604, RunningAvgSamplesPerSec=6.3285142977065885, CurrSamplesPerSec=5.701476963255819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:02:28,256] [INFO] [timer.py:197:stop] 0/4606, 
RunningAvgSamplesPerSec=6.328511168152795, CurrSamplesPerSec=5.68484229018227, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:02:39,711] [INFO] [timer.py:197:stop] 0/4608, RunningAvgSamplesPerSec=6.328509213395203, CurrSamplesPerSec=5.6802383861450965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:02:51,104] [INFO] [timer.py:197:stop] 0/4610, RunningAvgSamplesPerSec=6.32850175432953, CurrSamplesPerSec=5.665947442487676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:03:02,396] [INFO] [timer.py:197:stop] 0/4612, RunningAvgSamplesPerSec=6.328506003113973, CurrSamplesPerSec=5.666994073200039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:03:13,685] [INFO] [timer.py:197:stop] 0/4614, RunningAvgSamplesPerSec=6.328517260922642, CurrSamplesPerSec=5.725944132320268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:03:25,177] [INFO] [timer.py:197:stop] 0/4616, RunningAvgSamplesPerSec=6.328519076189157, CurrSamplesPerSec=5.682928223426949, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0018, 'learning_rate': 5.9977777777777776e-06, 'epoch': 17.29} [2022-12-19 08:03:36,516] [INFO] [timer.py:197:stop] 0/4618, RunningAvgSamplesPerSec=6.328510346322669, CurrSamplesPerSec=5.693830001638351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:03:48,011] [INFO] [logging.py:68:log_dist] [Rank 0] step=2310, skipped=6, lr=[5.993333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 08:03:48,012] [INFO] [timer.py:197:stop] 0/4620, RunningAvgSamplesPerSec=6.328514419201842, CurrSamplesPerSec=5.69182370653746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:03:59,346] [INFO] [timer.py:197:stop] 0/4622, RunningAvgSamplesPerSec=6.328502912928981, CurrSamplesPerSec=5.6444701612873045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:04:10,633] [INFO] [timer.py:197:stop] 0/4624, RunningAvgSamplesPerSec=6.328500262554551, CurrSamplesPerSec=5.669273357246172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 08:04:22,111] [INFO] [timer.py:197:stop] 0/4626, RunningAvgSamplesPerSec=6.328505791886148, CurrSamplesPerSec=5.697225499297767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:04:33,472] [INFO] [timer.py:197:stop] 0/4628, RunningAvgSamplesPerSec=6.328499543107492, CurrSamplesPerSec=5.673938889519243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:04:44,944] [INFO] [timer.py:197:stop] 0/4630, RunningAvgSamplesPerSec=6.328505427295265, CurrSamplesPerSec=5.6999583219059025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:04:56,330] [INFO] [timer.py:197:stop] 0/4632, RunningAvgSamplesPerSec=6.328483231129387, CurrSamplesPerSec=5.6176675160138885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:05:07,712] [INFO] [timer.py:197:stop] 0/4634, RunningAvgSamplesPerSec=6.328489241837222, CurrSamplesPerSec=5.710521765531588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:05:19,336] [INFO] [timer.py:197:stop] 0/4636, RunningAvgSamplesPerSec=6.328400834815939, CurrSamplesPerSec=5.355833054636687, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:05:30,617] [INFO] [timer.py:197:stop] 0/4638, RunningAvgSamplesPerSec=6.32840974045595, CurrSamplesPerSec=5.716566051498605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:05:41,964] [INFO] [logging.py:68:log_dist] [Rank 0] step=2320, skipped=6, lr=[5.971111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 08:05:41,966] [INFO] [timer.py:197:stop] 0/4640, RunningAvgSamplesPerSec=6.32840827666769, CurrSamplesPerSec=5.678328378778922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:05:53,589] [INFO] [timer.py:197:stop] 0/4642, RunningAvgSamplesPerSec=6.328398936174408, CurrSamplesPerSec=5.670693031089545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:06:04,950] [INFO] [timer.py:197:stop] 0/4644, RunningAvgSamplesPerSec=6.328386243749554, CurrSamplesPerSec=5.637445705799821, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
08:06:16,313] [INFO] [timer.py:197:stop] 0/4646, RunningAvgSamplesPerSec=6.328376224498588, CurrSamplesPerSec=5.663672743381489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:06:27,662] [INFO] [timer.py:197:stop] 0/4648, RunningAvgSamplesPerSec=6.3283666754799235, CurrSamplesPerSec=5.661832850938865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:06:39,241] [INFO] [timer.py:197:stop] 0/4650, RunningAvgSamplesPerSec=6.328360769470566, CurrSamplesPerSec=5.669332027165608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:06:50,800] [INFO] [timer.py:197:stop] 0/4652, RunningAvgSamplesPerSec=6.328296369909431, CurrSamplesPerSec=5.661080609846457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:07:02,106] [INFO] [timer.py:197:stop] 0/4654, RunningAvgSamplesPerSec=6.328297644059955, CurrSamplesPerSec=5.6835584810676325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:07:13,779] [INFO] [timer.py:197:stop] 0/4656, RunningAvgSamplesPerSec=6.328284319478529, CurrSamplesPerSec=5.683466785413696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:07:25,093] [INFO] [timer.py:197:stop] 0/4658, RunningAvgSamplesPerSec=6.3282823189803254, CurrSamplesPerSec=5.6893668418494565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:07:36,467] [INFO] [logging.py:68:log_dist] [Rank 0] step=2330, skipped=6, lr=[5.948888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 08:07:36,468] [INFO] [timer.py:197:stop] 0/4660, RunningAvgSamplesPerSec=6.328267026491021, CurrSamplesPerSec=5.623022220740078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:07:48,003] [INFO] [timer.py:197:stop] 0/4662, RunningAvgSamplesPerSec=6.328258249150498, CurrSamplesPerSec=5.666179940522256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:07:59,392] [INFO] [timer.py:197:stop] 0/4664, RunningAvgSamplesPerSec=6.328255688403663, CurrSamplesPerSec=5.689629967282166, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
08:08:10,793] [INFO] [timer.py:197:stop] 0/4666, RunningAvgSamplesPerSec=6.328239916054198, CurrSamplesPerSec=5.710708610862146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0024, 'learning_rate': 5.9422222222222225e-06, 'epoch': 17.48} [2022-12-19 08:08:22,249] [INFO] [timer.py:197:stop] 0/4668, RunningAvgSamplesPerSec=6.32824533448246, CurrSamplesPerSec=5.692099129189937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:08:33,586] [INFO] [timer.py:197:stop] 0/4670, RunningAvgSamplesPerSec=6.328236679525057, CurrSamplesPerSec=5.662672492514194, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:08:44,977] [INFO] [timer.py:197:stop] 0/4672, RunningAvgSamplesPerSec=6.328209104001428, CurrSamplesPerSec=5.606889330195561, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:08:56,257] [INFO] [timer.py:197:stop] 0/4674, RunningAvgSamplesPerSec=6.328216568650433, CurrSamplesPerSec=5.716594051660111, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:09:07,840] [INFO] [timer.py:197:stop] 0/4676, RunningAvgSamplesPerSec=6.328219284242913, CurrSamplesPerSec=5.703723720918729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:09:19,127] [INFO] [timer.py:197:stop] 0/4678, RunningAvgSamplesPerSec=6.3282226005057804, CurrSamplesPerSec=5.688688760898525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:09:30,671] [INFO] [logging.py:68:log_dist] [Rank 0] step=2340, skipped=6, lr=[5.926666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 08:09:30,672] [INFO] [timer.py:197:stop] 0/4680, RunningAvgSamplesPerSec=6.32822821064071, CurrSamplesPerSec=5.699202450588135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:09:42,065] [INFO] [timer.py:197:stop] 0/4682, RunningAvgSamplesPerSec=6.328203417397917, CurrSamplesPerSec=5.594181485955124, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:09:53,356] [INFO] [timer.py:197:stop] 0/4684, RunningAvgSamplesPerSec=6.328209385538534, 
CurrSamplesPerSec=5.712483886815656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:10:04,633] [INFO] [timer.py:197:stop] 0/4686, RunningAvgSamplesPerSec=6.32821310337981, CurrSamplesPerSec=5.685237443585601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:10:15,920] [INFO] [timer.py:197:stop] 0/4688, RunningAvgSamplesPerSec=6.328219155844885, CurrSamplesPerSec=5.703353621899073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:10:27,244] [INFO] [timer.py:197:stop] 0/4690, RunningAvgSamplesPerSec=6.328210693779651, CurrSamplesPerSec=5.689009454715617, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:10:38,612] [INFO] [timer.py:197:stop] 0/4692, RunningAvgSamplesPerSec=6.328205795503737, CurrSamplesPerSec=5.708913797600395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:10:49,990] [INFO] [timer.py:197:stop] 0/4694, RunningAvgSamplesPerSec=6.3281926406907045, CurrSamplesPerSec=5.661159406682567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:11:01,341] [INFO] [timer.py:197:stop] 0/4696, RunningAvgSamplesPerSec=6.32818746795654, CurrSamplesPerSec=5.673308605145578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:11:12,756] [INFO] [timer.py:197:stop] 0/4698, RunningAvgSamplesPerSec=6.32816150741628, CurrSamplesPerSec=5.60894424024975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:11:24,256] [INFO] [logging.py:68:log_dist] [Rank 0] step=2350, skipped=6, lr=[5.9044444444444446e-06], mom=[[0.9, 0.999]] [2022-12-19 08:11:24,258] [INFO] [timer.py:197:stop] 0/4700, RunningAvgSamplesPerSec=6.328163671557607, CurrSamplesPerSec=5.702780026981379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:11:35,709] [INFO] [timer.py:197:stop] 0/4702, RunningAvgSamplesPerSec=6.32813513709199, CurrSamplesPerSec=5.593438721137064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:11:47,065] [INFO] [timer.py:197:stop] 0/4704, RunningAvgSamplesPerSec=6.328127332776014, 
CurrSamplesPerSec=5.680918541605391, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:11:58,697] [INFO] [timer.py:197:stop] 0/4706, RunningAvgSamplesPerSec=6.3281127238558925, CurrSamplesPerSec=5.661789382678605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:12:10,114] [INFO] [timer.py:197:stop] 0/4708, RunningAvgSamplesPerSec=6.3280761709804665, CurrSamplesPerSec=5.572815725056484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:12:21,432] [INFO] [timer.py:197:stop] 0/4710, RunningAvgSamplesPerSec=6.328072034214663, CurrSamplesPerSec=5.689955350348546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:12:33,013] [INFO] [timer.py:197:stop] 0/4712, RunningAvgSamplesPerSec=6.328000762828731, CurrSamplesPerSec=5.451807090174424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:12:44,420] [INFO] [timer.py:197:stop] 0/4714, RunningAvgSamplesPerSec=6.328007137042, CurrSamplesPerSec=5.708645000697749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:12:55,766] [INFO] [timer.py:197:stop] 0/4716, RunningAvgSamplesPerSec=6.327999808026788, CurrSamplesPerSec=5.68078870083767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0028, 'learning_rate': 5.886666666666667e-06, 'epoch': 17.67} [2022-12-19 08:13:07,293] [INFO] [timer.py:197:stop] 0/4718, RunningAvgSamplesPerSec=6.327979134550257, CurrSamplesPerSec=5.6979694770969544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:13:18,662] [INFO] [logging.py:68:log_dist] [Rank 0] step=2360, skipped=6, lr=[5.882222222222222e-06], mom=[[0.9, 0.999]] [2022-12-19 08:13:18,663] [INFO] [timer.py:197:stop] 0/4720, RunningAvgSamplesPerSec=6.327954131060739, CurrSamplesPerSec=5.596924149968914, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:13:30,227] [INFO] [timer.py:197:stop] 0/4722, RunningAvgSamplesPerSec=6.327888573857407, CurrSamplesPerSec=5.4341240181051464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:13:41,477] [INFO] 
[timer.py:197:stop] 0/4724, RunningAvgSamplesPerSec=6.327898151753703, CurrSamplesPerSec=5.716132205788502, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:13:53,021] [INFO] [timer.py:197:stop] 0/4726, RunningAvgSamplesPerSec=6.327903544695936, CurrSamplesPerSec=5.698011325572948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:14:04,510] [INFO] [timer.py:197:stop] 0/4728, RunningAvgSamplesPerSec=6.3278571906909145, CurrSamplesPerSec=5.7051742836552535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:14:15,809] [INFO] [timer.py:197:stop] 0/4730, RunningAvgSamplesPerSec=6.327864090400866, CurrSamplesPerSec=5.689840292186851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:14:27,321] [INFO] [timer.py:197:stop] 0/4732, RunningAvgSamplesPerSec=6.3278732979394645, CurrSamplesPerSec=5.729628503587119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:14:38,637] [INFO] [timer.py:197:stop] 0/4734, RunningAvgSamplesPerSec=6.327873111787756, CurrSamplesPerSec=5.703094556928296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:14:50,048] [INFO] [timer.py:197:stop] 0/4736, RunningAvgSamplesPerSec=6.32784652985042, CurrSamplesPerSec=5.580218368924701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:15:01,600] [INFO] [timer.py:197:stop] 0/4738, RunningAvgSamplesPerSec=6.327845388160647, CurrSamplesPerSec=5.693643293021131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:15:12,898] [INFO] [logging.py:68:log_dist] [Rank 0] step=2370, skipped=6, lr=[5.86e-06], mom=[[0.9, 0.999]] [2022-12-19 08:15:12,900] [INFO] [timer.py:197:stop] 0/4740, RunningAvgSamplesPerSec=6.327851130327259, CurrSamplesPerSec=5.705800268188205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:15:24,247] [INFO] [timer.py:197:stop] 0/4742, RunningAvgSamplesPerSec=6.327842243010171, CurrSamplesPerSec=5.687149692745923, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:15:35,782] [INFO] [timer.py:197:stop] 
0/4744, RunningAvgSamplesPerSec=6.3278455538408, CurrSamplesPerSec=5.695287619227212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:15:47,373] [INFO] [timer.py:197:stop] 0/4746, RunningAvgSamplesPerSec=6.327770216148726, CurrSamplesPerSec=5.433185602858822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:15:58,690] [INFO] [timer.py:197:stop] 0/4748, RunningAvgSamplesPerSec=6.327765141868248, CurrSamplesPerSec=5.663782921611993, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:16:10,005] [INFO] [timer.py:197:stop] 0/4750, RunningAvgSamplesPerSec=6.327764349461753, CurrSamplesPerSec=5.700976148346481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:16:21,690] [INFO] [timer.py:197:stop] 0/4752, RunningAvgSamplesPerSec=6.327759495660428, CurrSamplesPerSec=5.660378937197155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:16:32,948] [INFO] [timer.py:197:stop] 0/4754, RunningAvgSamplesPerSec=6.327772836284775, CurrSamplesPerSec=5.718027539359287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:16:44,658] [INFO] [timer.py:197:stop] 0/4756, RunningAvgSamplesPerSec=6.327735524031336, CurrSamplesPerSec=5.6772643497802235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:16:55,989] [INFO] [timer.py:197:stop] 0/4758, RunningAvgSamplesPerSec=6.327724488358876, CurrSamplesPerSec=5.66199407146872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:17:07,308] [INFO] [logging.py:68:log_dist] [Rank 0] step=2380, skipped=6, lr=[5.837777777777777e-06], mom=[[0.9, 0.999]] [2022-12-19 08:17:07,310] [INFO] [timer.py:197:stop] 0/4760, RunningAvgSamplesPerSec=6.327721947858673, CurrSamplesPerSec=5.668504055528122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:17:18,600] [INFO] [timer.py:197:stop] 0/4762, RunningAvgSamplesPerSec=6.327726548662647, CurrSamplesPerSec=5.70521405541658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:17:30,113] [INFO] [timer.py:197:stop] 0/4764, 
RunningAvgSamplesPerSec=6.32772293830232, CurrSamplesPerSec=5.686446602570265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:17:41,581] [INFO] [timer.py:197:stop] 0/4766, RunningAvgSamplesPerSec=6.327681146183064, CurrSamplesPerSec=5.689001738349641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0016, 'learning_rate': 5.831111111111112e-06, 'epoch': 17.85} [2022-12-19 08:17:53,198] [INFO] [timer.py:197:stop] 0/4768, RunningAvgSamplesPerSec=6.327687931926088, CurrSamplesPerSec=5.72119076781453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:18:04,724] [INFO] [timer.py:197:stop] 0/4770, RunningAvgSamplesPerSec=6.327630626701128, CurrSamplesPerSec=5.457755012755512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:18:15,995] [INFO] [timer.py:197:stop] 0/4772, RunningAvgSamplesPerSec=6.327636353460444, CurrSamplesPerSec=5.698935051336716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:18:27,449] [INFO] [timer.py:197:stop] 0/4774, RunningAvgSamplesPerSec=6.327597743449596, CurrSamplesPerSec=5.557528369360366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:18:38,923] [INFO] [timer.py:197:stop] 0/4776, RunningAvgSamplesPerSec=6.327602158600098, CurrSamplesPerSec=5.7125215723481695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:18:50,240] [INFO] [timer.py:197:stop] 0/4778, RunningAvgSamplesPerSec=6.327595204223532, CurrSamplesPerSec=5.677705525145677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:19:01,717] [INFO] [logging.py:68:log_dist] [Rank 0] step=2390, skipped=6, lr=[5.815555555555557e-06], mom=[[0.9, 0.999]] [2022-12-19 08:19:01,718] [INFO] [timer.py:197:stop] 0/4780, RunningAvgSamplesPerSec=6.327593218495598, CurrSamplesPerSec=5.698819145791908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:19:13,029] [INFO] [timer.py:197:stop] 0/4782, RunningAvgSamplesPerSec=6.327595612529831, CurrSamplesPerSec=5.710081548934551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 08:19:24,346] [INFO] [timer.py:197:stop] 0/4784, RunningAvgSamplesPerSec=6.327593697535643, CurrSamplesPerSec=5.671828179659276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:19:35,669] [INFO] [timer.py:197:stop] 0/4786, RunningAvgSamplesPerSec=6.327583279217852, CurrSamplesPerSec=5.649668965550916, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:19:47,104] [INFO] [timer.py:197:stop] 0/4788, RunningAvgSamplesPerSec=6.32757646379618, CurrSamplesPerSec=5.68199644909428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:19:58,563] [INFO] [timer.py:197:stop] 0/4790, RunningAvgSamplesPerSec=6.3275439000064395, CurrSamplesPerSec=5.624860308575455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:20:09,906] [INFO] [timer.py:197:stop] 0/4792, RunningAvgSamplesPerSec=6.3275381420249905, CurrSamplesPerSec=5.6813776003473055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:20:21,219] [INFO] [timer.py:197:stop] 0/4794, RunningAvgSamplesPerSec=6.327540731668309, CurrSamplesPerSec=5.699337490695744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:20:32,540] [INFO] [timer.py:197:stop] 0/4796, RunningAvgSamplesPerSec=6.3275377178832075, CurrSamplesPerSec=5.677772295817875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:20:44,184] [INFO] [timer.py:197:stop] 0/4798, RunningAvgSamplesPerSec=6.3274434629579055, CurrSamplesPerSec=5.355468473963711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:20:55,587] [INFO] [logging.py:68:log_dist] [Rank 0] step=2400, skipped=6, lr=[5.793333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 08:20:55,589] [INFO] [timer.py:197:stop] 0/4800, RunningAvgSamplesPerSec=6.327442990406128, CurrSamplesPerSec=5.701620830613881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:21:06,864] [INFO] [timer.py:197:stop] 0/4802, RunningAvgSamplesPerSec=6.327443374853245, CurrSamplesPerSec=5.699074192612182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 08:21:18,318] [INFO] [timer.py:197:stop] 0/4804, RunningAvgSamplesPerSec=6.327439942763008, CurrSamplesPerSec=5.687839458670569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:21:28,660] [INFO] [timer.py:197:stop] 0/4806, RunningAvgSamplesPerSec=6.327655921389743, CurrSamplesPerSec=6.674628815082151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:21:39,998] [INFO] [timer.py:197:stop] 0/4808, RunningAvgSamplesPerSec=6.327647904043418, CurrSamplesPerSec=5.652405380935068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:21:51,280] [INFO] [timer.py:197:stop] 0/4810, RunningAvgSamplesPerSec=6.327650706605682, CurrSamplesPerSec=5.708811569237313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:22:02,860] [INFO] [timer.py:197:stop] 0/4812, RunningAvgSamplesPerSec=6.327653456169629, CurrSamplesPerSec=5.69854089494081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:22:14,353] [INFO] [timer.py:197:stop] 0/4814, RunningAvgSamplesPerSec=6.327605976738068, CurrSamplesPerSec=5.742368973945538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:22:25,645] [INFO] [timer.py:197:stop] 0/4816, RunningAvgSamplesPerSec=6.3276096813678935, CurrSamplesPerSec=5.7010788227629385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:22:37,054] [INFO] [timer.py:197:stop] 0/4818, RunningAvgSamplesPerSec=6.327588728282237, CurrSamplesPerSec=5.597294803564864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.002, 'learning_rate': 5.7733333333333345e-06, 'epoch': 18.04} [2022-12-19 08:22:48,421] [INFO] [logging.py:68:log_dist] [Rank 0] step=2410, skipped=6, lr=[5.771111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 08:22:48,423] [INFO] [timer.py:197:stop] 0/4820, RunningAvgSamplesPerSec=6.3275753860140505, CurrSamplesPerSec=5.6549166671441675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:22:59,739] [INFO] [timer.py:197:stop] 0/4822, RunningAvgSamplesPerSec=6.3275644824231865, 
CurrSamplesPerSec=5.6557208930762854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:23:11,066] [INFO] [timer.py:197:stop] 0/4824, RunningAvgSamplesPerSec=6.3275555839415825, CurrSamplesPerSec=5.662930526285944, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:23:22,320] [INFO] [timer.py:197:stop] 0/4826, RunningAvgSamplesPerSec=6.327553018003809, CurrSamplesPerSec=5.685651920039284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:23:33,871] [INFO] [timer.py:197:stop] 0/4828, RunningAvgSamplesPerSec=6.327553624542974, CurrSamplesPerSec=5.683792185995868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:23:45,241] [INFO] [timer.py:197:stop] 0/4830, RunningAvgSamplesPerSec=6.327538148545731, CurrSamplesPerSec=5.691821051409354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:23:56,616] [INFO] [timer.py:197:stop] 0/4832, RunningAvgSamplesPerSec=6.327525151906703, CurrSamplesPerSec=5.675315305212774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:24:07,908] [INFO] [timer.py:197:stop] 0/4834, RunningAvgSamplesPerSec=6.327525072851254, CurrSamplesPerSec=5.694546757615206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:24:19,217] [INFO] [timer.py:197:stop] 0/4836, RunningAvgSamplesPerSec=6.327518849957523, CurrSamplesPerSec=5.66456528642286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:24:30,523] [INFO] [timer.py:197:stop] 0/4838, RunningAvgSamplesPerSec=6.32751509913721, CurrSamplesPerSec=5.6903256424735265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:24:42,183] [INFO] [logging.py:68:log_dist] [Rank 0] step=2420, skipped=6, lr=[5.7488888888888896e-06], mom=[[0.9, 0.999]] [2022-12-19 08:24:42,185] [INFO] [timer.py:197:stop] 0/4840, RunningAvgSamplesPerSec=6.327512628731874, CurrSamplesPerSec=5.68767820909432, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:24:53,476] [INFO] [timer.py:197:stop] 0/4842, RunningAvgSamplesPerSec=6.327516899716728, 
CurrSamplesPerSec=5.701963331957169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:25:04,884] [INFO] [timer.py:197:stop] 0/4844, RunningAvgSamplesPerSec=6.327493157705095, CurrSamplesPerSec=5.587779815766673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:25:16,224] [INFO] [timer.py:197:stop] 0/4846, RunningAvgSamplesPerSec=6.327489885023119, CurrSamplesPerSec=5.668106676738967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:25:27,583] [INFO] [timer.py:197:stop] 0/4848, RunningAvgSamplesPerSec=6.327475464411944, CurrSamplesPerSec=5.642081510163023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:25:38,823] [INFO] [timer.py:197:stop] 0/4850, RunningAvgSamplesPerSec=6.327476208224616, CurrSamplesPerSec=5.684388449272835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:25:50,562] [INFO] [timer.py:197:stop] 0/4852, RunningAvgSamplesPerSec=6.327360929714466, CurrSamplesPerSec=5.279606884091392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:26:01,864] [INFO] [timer.py:197:stop] 0/4854, RunningAvgSamplesPerSec=6.327362767916472, CurrSamplesPerSec=5.689869960827325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:26:13,163] [INFO] [timer.py:197:stop] 0/4856, RunningAvgSamplesPerSec=6.327368674327938, CurrSamplesPerSec=5.707713270283764, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:26:24,567] [INFO] [timer.py:197:stop] 0/4858, RunningAvgSamplesPerSec=6.327361697149583, CurrSamplesPerSec=5.681102492853423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:26:35,906] [INFO] [logging.py:68:log_dist] [Rank 0] step=2430, skipped=6, lr=[5.726666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 08:26:35,907] [INFO] [timer.py:197:stop] 0/4860, RunningAvgSamplesPerSec=6.327369602298685, CurrSamplesPerSec=5.711781325035691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:26:47,216] [INFO] [timer.py:197:stop] 0/4862, RunningAvgSamplesPerSec=6.327372681997642, 
CurrSamplesPerSec=5.7022877046153075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:26:58,510] [INFO] [timer.py:197:stop] 0/4864, RunningAvgSamplesPerSec=6.327379810673013, CurrSamplesPerSec=5.710245528972669, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:27:09,781] [INFO] [timer.py:197:stop] 0/4866, RunningAvgSamplesPerSec=6.327381207918659, CurrSamplesPerSec=5.692135339233599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:27:21,130] [INFO] [timer.py:197:stop] 0/4868, RunningAvgSamplesPerSec=6.327369667610505, CurrSamplesPerSec=5.6648029309921455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.002, 'learning_rate': 5.7177777777777786e-06, 'epoch': 18.23} [2022-12-19 08:27:32,499] [INFO] [timer.py:197:stop] 0/4870, RunningAvgSamplesPerSec=6.327360436361636, CurrSamplesPerSec=5.649565043054985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:27:43,842] [INFO] [timer.py:197:stop] 0/4872, RunningAvgSamplesPerSec=6.327353858287738, CurrSamplesPerSec=5.679330083284287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:27:55,391] [INFO] [timer.py:197:stop] 0/4874, RunningAvgSamplesPerSec=6.327285328805562, CurrSamplesPerSec=5.444565418806209, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:28:06,655] [INFO] [timer.py:197:stop] 0/4876, RunningAvgSamplesPerSec=6.327287456819221, CurrSamplesPerSec=5.6974025271137405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:28:18,052] [INFO] [timer.py:197:stop] 0/4878, RunningAvgSamplesPerSec=6.3272557001796015, CurrSamplesPerSec=5.561726617572196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:28:29,360] [INFO] [logging.py:68:log_dist] [Rank 0] step=2440, skipped=6, lr=[5.704444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 08:28:29,362] [INFO] [timer.py:197:stop] 0/4880, RunningAvgSamplesPerSec=6.327253531036713, CurrSamplesPerSec=5.671999797491558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:28:40,630] [INFO] 
[timer.py:197:stop] 0/4882, RunningAvgSamplesPerSec=6.327263877533577, CurrSamplesPerSec=5.712785141842233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:28:51,892] [INFO] [timer.py:197:stop] 0/4884, RunningAvgSamplesPerSec=6.3272692834699775, CurrSamplesPerSec=5.722192527880386, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:29:03,172] [INFO] [timer.py:197:stop] 0/4886, RunningAvgSamplesPerSec=6.327272242809911, CurrSamplesPerSec=5.703626525890407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:29:14,516] [INFO] [timer.py:197:stop] 0/4888, RunningAvgSamplesPerSec=6.32726643529082, CurrSamplesPerSec=5.682711191108299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:29:25,872] [INFO] [timer.py:197:stop] 0/4890, RunningAvgSamplesPerSec=6.327249840714877, CurrSamplesPerSec=5.60410530418214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:29:37,799] [INFO] [timer.py:197:stop] 0/4892, RunningAvgSamplesPerSec=6.32723745859702, CurrSamplesPerSec=5.663360634833726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:29:49,325] [INFO] [timer.py:197:stop] 0/4894, RunningAvgSamplesPerSec=6.3272221173793035, CurrSamplesPerSec=5.647042588013333, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:30:01,325] [INFO] [timer.py:197:stop] 0/4896, RunningAvgSamplesPerSec=6.327215821210697, CurrSamplesPerSec=5.667553551201293, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:30:12,638] [INFO] [timer.py:197:stop] 0/4898, RunningAvgSamplesPerSec=6.327215474227977, CurrSamplesPerSec=5.695259827396166, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:30:23,933] [INFO] [logging.py:68:log_dist] [Rank 0] step=2450, skipped=6, lr=[5.682222222222222e-06], mom=[[0.9, 0.999]] [2022-12-19 08:30:23,935] [INFO] [timer.py:197:stop] 0/4900, RunningAvgSamplesPerSec=6.327217384497405, CurrSamplesPerSec=5.696018762635046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:30:35,309] [INFO] 
[timer.py:197:stop] 0/4902, RunningAvgSamplesPerSec=6.3271991393018565, CurrSamplesPerSec=5.602078488944181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:30:46,698] [INFO] [timer.py:197:stop] 0/4904, RunningAvgSamplesPerSec=6.3272041820285985, CurrSamplesPerSec=5.697172054537232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:30:58,168] [INFO] [timer.py:197:stop] 0/4906, RunningAvgSamplesPerSec=6.327157702796157, CurrSamplesPerSec=5.675063100311292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:31:09,694] [INFO] [timer.py:197:stop] 0/4908, RunningAvgSamplesPerSec=6.327154796544273, CurrSamplesPerSec=5.6918492924443385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:31:21,065] [INFO] [timer.py:197:stop] 0/4910, RunningAvgSamplesPerSec=6.327137295311493, CurrSamplesPerSec=5.626407118150094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:31:32,355] [INFO] [timer.py:197:stop] 0/4912, RunningAvgSamplesPerSec=6.327142496333079, CurrSamplesPerSec=5.702596365291683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:31:43,799] [INFO] [timer.py:197:stop] 0/4914, RunningAvgSamplesPerSec=6.327135815882201, CurrSamplesPerSec=5.669087776721171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:31:55,150] [INFO] [timer.py:197:stop] 0/4916, RunningAvgSamplesPerSec=6.327125425005427, CurrSamplesPerSec=5.6780350704768106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:32:06,737] [INFO] [timer.py:197:stop] 0/4918, RunningAvgSamplesPerSec=6.3271167445943455, CurrSamplesPerSec=5.680799280233679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0023, 'learning_rate': 5.662222222222223e-06, 'epoch': 18.42} [2022-12-19 08:32:18,103] [INFO] [logging.py:68:log_dist] [Rank 0] step=2460, skipped=6, lr=[5.66e-06], mom=[[0.9, 0.999]] [2022-12-19 08:32:18,105] [INFO] [timer.py:197:stop] 0/4920, RunningAvgSamplesPerSec=6.3271063804791, CurrSamplesPerSec=5.649765757218022, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 08:32:29,397] [INFO] [timer.py:197:stop] 0/4922, RunningAvgSamplesPerSec=6.32710319908092, CurrSamplesPerSec=5.691708090071809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:32:41,006] [INFO] [timer.py:197:stop] 0/4924, RunningAvgSamplesPerSec=6.327098196576711, CurrSamplesPerSec=5.661831417908656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:32:52,366] [INFO] [timer.py:197:stop] 0/4926, RunningAvgSamplesPerSec=6.327087701857692, CurrSamplesPerSec=5.683435498859694, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:33:04,007] [INFO] [timer.py:197:stop] 0/4928, RunningAvgSamplesPerSec=6.327078537949471, CurrSamplesPerSec=5.651837703692992, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:33:15,379] [INFO] [timer.py:197:stop] 0/4930, RunningAvgSamplesPerSec=6.32706253744641, CurrSamplesPerSec=5.6220136683175275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:33:26,670] [INFO] [timer.py:197:stop] 0/4932, RunningAvgSamplesPerSec=6.327063335989808, CurrSamplesPerSec=5.680556204514997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:33:37,985] [INFO] [timer.py:197:stop] 0/4934, RunningAvgSamplesPerSec=6.327068187003174, CurrSamplesPerSec=5.710738011565823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:33:49,490] [INFO] [timer.py:197:stop] 0/4936, RunningAvgSamplesPerSec=6.327069303055887, CurrSamplesPerSec=5.698432747378027, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:34:00,818] [INFO] [timer.py:197:stop] 0/4938, RunningAvgSamplesPerSec=6.327068379025651, CurrSamplesPerSec=5.688201761244899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:34:12,323] [INFO] [logging.py:68:log_dist] [Rank 0] step=2470, skipped=6, lr=[5.6377777777777785e-06], mom=[[0.9, 0.999]] [2022-12-19 08:34:12,325] [INFO] [timer.py:197:stop] 0/4940, RunningAvgSamplesPerSec=6.327066513247491, CurrSamplesPerSec=5.704904626338573, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 08:34:23,585] [INFO] [timer.py:197:stop] 0/4942, RunningAvgSamplesPerSec=6.327070687841398, CurrSamplesPerSec=5.70398599435669, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:34:34,922] [INFO] [timer.py:197:stop] 0/4944, RunningAvgSamplesPerSec=6.3270595570927926, CurrSamplesPerSec=5.656088172751707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:34:46,225] [INFO] [timer.py:197:stop] 0/4946, RunningAvgSamplesPerSec=6.327061334359551, CurrSamplesPerSec=5.7018407630995025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:34:57,855] [INFO] [timer.py:197:stop] 0/4948, RunningAvgSamplesPerSec=6.32705799183965, CurrSamplesPerSec=5.694232928191122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:35:09,155] [INFO] [timer.py:197:stop] 0/4950, RunningAvgSamplesPerSec=6.32706018464687, CurrSamplesPerSec=5.70229666838676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:35:20,508] [INFO] [timer.py:197:stop] 0/4952, RunningAvgSamplesPerSec=6.327048008584175, CurrSamplesPerSec=5.657891180327236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:35:32,029] [INFO] [timer.py:197:stop] 0/4954, RunningAvgSamplesPerSec=6.327038242434074, CurrSamplesPerSec=5.654613146208055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:35:43,324] [INFO] [timer.py:197:stop] 0/4956, RunningAvgSamplesPerSec=6.327038260508684, CurrSamplesPerSec=5.698700099493294, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:35:54,776] [INFO] [timer.py:197:stop] 0/4958, RunningAvgSamplesPerSec=6.327004696819687, CurrSamplesPerSec=5.56344828046455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:36:06,311] [INFO] [logging.py:68:log_dist] [Rank 0] step=2480, skipped=6, lr=[5.615555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 08:36:06,313] [INFO] [timer.py:197:stop] 0/4960, RunningAvgSamplesPerSec=6.327005299714216, CurrSamplesPerSec=5.689490080887334, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 08:36:17,634] [INFO] [timer.py:197:stop] 0/4962, RunningAvgSamplesPerSec=6.327005879573194, CurrSamplesPerSec=5.707189517915466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:36:28,992] [INFO] [timer.py:197:stop] 0/4964, RunningAvgSamplesPerSec=6.32699360983385, CurrSamplesPerSec=5.705969339099654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:36:40,441] [INFO] [timer.py:197:stop] 0/4966, RunningAvgSamplesPerSec=6.32699931933672, CurrSamplesPerSec=5.721637333364936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:36:51,814] [INFO] [timer.py:197:stop] 0/4968, RunningAvgSamplesPerSec=6.326979442005274, CurrSamplesPerSec=5.604891630663544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0026, 'learning_rate': 5.606666666666667e-06, 'epoch': 18.61} [2022-12-19 08:37:03,116] [INFO] [timer.py:197:stop] 0/4970, RunningAvgSamplesPerSec=6.326977588709529, CurrSamplesPerSec=5.698221061010419, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:37:14,470] [INFO] [timer.py:197:stop] 0/4972, RunningAvgSamplesPerSec=6.326970612447887, CurrSamplesPerSec=5.6679194971200895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:37:26,101] [INFO] [timer.py:197:stop] 0/4974, RunningAvgSamplesPerSec=6.326966316709488, CurrSamplesPerSec=5.686292659143901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:37:37,425] [INFO] [timer.py:197:stop] 0/4976, RunningAvgSamplesPerSec=6.326960902546052, CurrSamplesPerSec=5.687363208272927, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:37:49,107] [INFO] [timer.py:197:stop] 0/4978, RunningAvgSamplesPerSec=6.326940951497693, CurrSamplesPerSec=5.6875376956183565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:38:00,412] [INFO] [logging.py:68:log_dist] [Rank 0] step=2490, skipped=6, lr=[5.593333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 08:38:00,414] [INFO] [timer.py:197:stop] 0/4980, 
RunningAvgSamplesPerSec=6.3269376033453835, CurrSamplesPerSec=5.6738208804201085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:38:11,908] [INFO] [timer.py:197:stop] 0/4982, RunningAvgSamplesPerSec=6.326894632065587, CurrSamplesPerSec=5.519913459847132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:38:23,253] [INFO] [timer.py:197:stop] 0/4984, RunningAvgSamplesPerSec=6.32688616046049, CurrSamplesPerSec=5.678968669646279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:38:34,847] [INFO] [timer.py:197:stop] 0/4986, RunningAvgSamplesPerSec=6.326880693299174, CurrSamplesPerSec=5.690366896241089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:38:46,249] [INFO] [timer.py:197:stop] 0/4988, RunningAvgSamplesPerSec=6.326861827501337, CurrSamplesPerSec=5.687784984696921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:38:57,553] [INFO] [timer.py:197:stop] 0/4990, RunningAvgSamplesPerSec=6.3268683875597, CurrSamplesPerSec=5.7128826492662705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:39:09,239] [INFO] [timer.py:197:stop] 0/4992, RunningAvgSamplesPerSec=6.326865977617615, CurrSamplesPerSec=5.698946908328412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:39:20,545] [INFO] [timer.py:197:stop] 0/4994, RunningAvgSamplesPerSec=6.326862344861768, CurrSamplesPerSec=5.661643219578161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:39:31,981] [INFO] [timer.py:197:stop] 0/4996, RunningAvgSamplesPerSec=6.326821625401717, CurrSamplesPerSec=5.523061228462032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:39:43,518] [INFO] [timer.py:197:stop] 0/4998, RunningAvgSamplesPerSec=6.326819498931776, CurrSamplesPerSec=5.678779330331876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:39:54,784] [INFO] [logging.py:68:log_dist] [Rank 0] step=2500, skipped=6, lr=[5.571111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 08:39:54,786] [INFO] [timer.py:197:stop] 0/5000, 
RunningAvgSamplesPerSec=6.326826054062942, CurrSamplesPerSec=5.709937254291353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:40:06,124] [INFO] [timer.py:197:stop] 0/5002, RunningAvgSamplesPerSec=6.326823062990258, CurrSamplesPerSec=5.700630617984388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:40:17,812] [INFO] [timer.py:197:stop] 0/5004, RunningAvgSamplesPerSec=6.326805838639723, CurrSamplesPerSec=5.692651503140766, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:40:29,184] [INFO] [timer.py:197:stop] 0/5006, RunningAvgSamplesPerSec=6.326792737867314, CurrSamplesPerSec=5.656071011315199, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:40:40,524] [INFO] [timer.py:197:stop] 0/5008, RunningAvgSamplesPerSec=6.326789666732949, CurrSamplesPerSec=5.6893340433181745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:40:52,230] [INFO] [timer.py:197:stop] 0/5010, RunningAvgSamplesPerSec=6.326695426849062, CurrSamplesPerSec=5.335744675962119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:41:03,614] [INFO] [timer.py:197:stop] 0/5012, RunningAvgSamplesPerSec=6.3266959447326085, CurrSamplesPerSec=5.701973748130694, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:41:14,957] [INFO] [timer.py:197:stop] 0/5014, RunningAvgSamplesPerSec=6.32669588586463, CurrSamplesPerSec=5.704777566438916, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:41:26,253] [INFO] [timer.py:197:stop] 0/5016, RunningAvgSamplesPerSec=6.326690081786702, CurrSamplesPerSec=5.680741815352365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:41:37,575] [INFO] [timer.py:197:stop] 0/5018, RunningAvgSamplesPerSec=6.326686279901627, CurrSamplesPerSec=5.663082729452929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0026, 'learning_rate': 5.5511111111111115e-06, 'epoch': 18.79} [2022-12-19 08:41:48,958] [INFO] [logging.py:68:log_dist] [Rank 0] step=2510, skipped=6, lr=[5.548888888888889e-06], mom=[[0.9, 0.999]] 
[2022-12-19 08:41:48,960] [INFO] [timer.py:197:stop] 0/5020, RunningAvgSamplesPerSec=6.326671668766272, CurrSamplesPerSec=5.630122698231012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:42:00,443] [INFO] [timer.py:197:stop] 0/5022, RunningAvgSamplesPerSec=6.326668651458603, CurrSamplesPerSec=5.688079783301767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:42:11,780] [INFO] [timer.py:197:stop] 0/5024, RunningAvgSamplesPerSec=6.326677724196917, CurrSamplesPerSec=5.703023796781617, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:42:23,142] [INFO] [timer.py:197:stop] 0/5026, RunningAvgSamplesPerSec=6.326671864438544, CurrSamplesPerSec=5.696151476226653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:42:34,728] [INFO] [timer.py:197:stop] 0/5028, RunningAvgSamplesPerSec=6.326666412482619, CurrSamplesPerSec=5.671375215991909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:42:46,064] [INFO] [timer.py:197:stop] 0/5030, RunningAvgSamplesPerSec=6.326662111259606, CurrSamplesPerSec=5.663867290920763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:42:57,395] [INFO] [timer.py:197:stop] 0/5032, RunningAvgSamplesPerSec=6.326658941487026, CurrSamplesPerSec=5.6725336528692125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:43:09,027] [INFO] [timer.py:197:stop] 0/5034, RunningAvgSamplesPerSec=6.326572039428689, CurrSamplesPerSec=5.354580951068986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:43:20,398] [INFO] [timer.py:197:stop] 0/5036, RunningAvgSamplesPerSec=6.326576139634997, CurrSamplesPerSec=5.717590792571948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:43:31,741] [INFO] [timer.py:197:stop] 0/5038, RunningAvgSamplesPerSec=6.326571287337495, CurrSamplesPerSec=5.691891051069451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:43:43,295] [INFO] [logging.py:68:log_dist] [Rank 0] step=2520, skipped=6, lr=[5.5266666666666666e-06], mom=[[0.9, 0.999]] [2022-12-19 
08:43:43,297] [INFO] [timer.py:197:stop] 0/5040, RunningAvgSamplesPerSec=6.326555008634002, CurrSamplesPerSec=5.670822889961443, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:43:54,594] [INFO] [timer.py:197:stop] 0/5042, RunningAvgSamplesPerSec=6.32656369381621, CurrSamplesPerSec=5.718615901097544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:44:06,149] [INFO] [timer.py:197:stop] 0/5044, RunningAvgSamplesPerSec=6.326497349297817, CurrSamplesPerSec=5.447201772322673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:44:17,458] [INFO] [timer.py:197:stop] 0/5046, RunningAvgSamplesPerSec=6.326498217758689, CurrSamplesPerSec=5.700159727787496, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:44:29,017] [INFO] [timer.py:197:stop] 0/5048, RunningAvgSamplesPerSec=6.326500435556997, CurrSamplesPerSec=5.692291530501246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:44:40,355] [INFO] [timer.py:197:stop] 0/5050, RunningAvgSamplesPerSec=6.326493755326488, CurrSamplesPerSec=5.707137827457998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:44:51,962] [INFO] [timer.py:197:stop] 0/5052, RunningAvgSamplesPerSec=6.326482302860679, CurrSamplesPerSec=5.657523904939513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:45:03,571] [INFO] [timer.py:197:stop] 0/5054, RunningAvgSamplesPerSec=6.3264266882892315, CurrSamplesPerSec=5.479909862081701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:45:14,907] [INFO] [timer.py:197:stop] 0/5056, RunningAvgSamplesPerSec=6.3264231893407095, CurrSamplesPerSec=5.68474549681078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:45:26,607] [INFO] [timer.py:197:stop] 0/5058, RunningAvgSamplesPerSec=6.3263255063388995, CurrSamplesPerSec=5.304912664687482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:45:37,990] [INFO] [logging.py:68:log_dist] [Rank 0] step=2530, skipped=6, lr=[5.504444444444444e-06], mom=[[0.9, 0.999]] [2022-12-19 
08:45:37,992] [INFO] [timer.py:197:stop] 0/5060, RunningAvgSamplesPerSec=6.326323825856821, CurrSamplesPerSec=5.701277159524834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:45:49,331] [INFO] [timer.py:197:stop] 0/5062, RunningAvgSamplesPerSec=6.326321104041919, CurrSamplesPerSec=5.685114388708127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:46:00,653] [INFO] [timer.py:197:stop] 0/5064, RunningAvgSamplesPerSec=6.326319232765039, CurrSamplesPerSec=5.69308324063147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:46:12,016] [INFO] [timer.py:197:stop] 0/5066, RunningAvgSamplesPerSec=6.326302812787618, CurrSamplesPerSec=5.6150421417677805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:46:23,323] [INFO] [timer.py:197:stop] 0/5068, RunningAvgSamplesPerSec=6.326302390480385, CurrSamplesPerSec=5.702404962663316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0028, 'learning_rate': 5.495555555555556e-06, 'epoch': 18.98} [2022-12-19 08:46:34,612] [INFO] [timer.py:197:stop] 0/5070, RunningAvgSamplesPerSec=6.326302785757876, CurrSamplesPerSec=5.688469118553357, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:46:45,954] [INFO] [timer.py:197:stop] 0/5072, RunningAvgSamplesPerSec=6.326295837074019, CurrSamplesPerSec=5.678722867371187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:46:56,346] [INFO] [timer.py:197:stop] 0/5074, RunningAvgSamplesPerSec=6.326495912843024, CurrSamplesPerSec=5.692393168062547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:47:07,785] [INFO] [timer.py:197:stop] 0/5076, RunningAvgSamplesPerSec=6.326499216337815, CurrSamplesPerSec=5.702734958505761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:47:19,115] [INFO] [timer.py:197:stop] 0/5078, RunningAvgSamplesPerSec=6.326501366822313, CurrSamplesPerSec=5.704317870270344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:47:30,587] [INFO] [logging.py:68:log_dist] [Rank 0] step=2540, skipped=6, 
lr=[5.4822222222222235e-06], mom=[[0.9, 0.999]] [2022-12-19 08:47:30,590] [INFO] [timer.py:197:stop] 0/5080, RunningAvgSamplesPerSec=6.326502031509957, CurrSamplesPerSec=5.706343659416585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:47:41,961] [INFO] [timer.py:197:stop] 0/5082, RunningAvgSamplesPerSec=6.326494835704821, CurrSamplesPerSec=5.658216282676151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:47:53,279] [INFO] [timer.py:197:stop] 0/5084, RunningAvgSamplesPerSec=6.326490445026392, CurrSamplesPerSec=5.687142704359674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:48:04,698] [INFO] [timer.py:197:stop] 0/5086, RunningAvgSamplesPerSec=6.326499860563641, CurrSamplesPerSec=5.71223930763866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:48:15,986] [INFO] [timer.py:197:stop] 0/5088, RunningAvgSamplesPerSec=6.326506752783122, CurrSamplesPerSec=5.689984296530611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:48:27,335] [INFO] [timer.py:197:stop] 0/5090, RunningAvgSamplesPerSec=6.326512647038561, CurrSamplesPerSec=5.716430437206577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:48:38,616] [INFO] [timer.py:197:stop] 0/5092, RunningAvgSamplesPerSec=6.326519872868915, CurrSamplesPerSec=5.704650269734389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:48:50,127] [INFO] [timer.py:197:stop] 0/5094, RunningAvgSamplesPerSec=6.326526702064735, CurrSamplesPerSec=5.701698580137473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:49:01,464] [INFO] [timer.py:197:stop] 0/5096, RunningAvgSamplesPerSec=6.326529610844659, CurrSamplesPerSec=5.70489759424961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:49:12,706] [INFO] [timer.py:197:stop] 0/5098, RunningAvgSamplesPerSec=6.32654065913177, CurrSamplesPerSec=5.7341925625041625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:49:23,999] [INFO] [logging.py:68:log_dist] [Rank 0] step=2550, skipped=6, 
lr=[5.460000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 08:49:24,001] [INFO] [timer.py:197:stop] 0/5100, RunningAvgSamplesPerSec=6.326544300428763, CurrSamplesPerSec=5.705857513458115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:49:35,253] [INFO] [timer.py:197:stop] 0/5102, RunningAvgSamplesPerSec=6.326555278616483, CurrSamplesPerSec=5.721692457703755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:49:46,571] [INFO] [timer.py:197:stop] 0/5104, RunningAvgSamplesPerSec=6.326551638707339, CurrSamplesPerSec=5.655830285473828, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:49:57,852] [INFO] [timer.py:197:stop] 0/5106, RunningAvgSamplesPerSec=6.326558059633236, CurrSamplesPerSec=5.708172542834028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:50:09,134] [INFO] [timer.py:197:stop] 0/5108, RunningAvgSamplesPerSec=6.326570885016953, CurrSamplesPerSec=5.71017070432034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:50:20,527] [INFO] [timer.py:197:stop] 0/5110, RunningAvgSamplesPerSec=6.326575911406251, CurrSamplesPerSec=5.70278656923866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:50:31,860] [INFO] [timer.py:197:stop] 0/5112, RunningAvgSamplesPerSec=6.326576301281794, CurrSamplesPerSec=5.70287767778901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:50:43,124] [INFO] [timer.py:197:stop] 0/5114, RunningAvgSamplesPerSec=6.326581044356114, CurrSamplesPerSec=5.683327201909481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:50:54,454] [INFO] [timer.py:197:stop] 0/5116, RunningAvgSamplesPerSec=6.326574432990806, CurrSamplesPerSec=5.678029545713565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:51:05,843] [INFO] [timer.py:197:stop] 0/5118, RunningAvgSamplesPerSec=6.3265728355770365, CurrSamplesPerSec=5.669189545181971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0034, 'learning_rate': 5.4400000000000004e-06, 'epoch': 19.17} [2022-12-19 08:51:17,195] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=2560, skipped=6, lr=[5.437777777777779e-06], mom=[[0.9, 0.999]] [2022-12-19 08:51:17,196] [INFO] [timer.py:197:stop] 0/5120, RunningAvgSamplesPerSec=6.326566924821098, CurrSamplesPerSec=5.6918686027150525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:51:28,595] [INFO] [timer.py:197:stop] 0/5122, RunningAvgSamplesPerSec=6.326568875432341, CurrSamplesPerSec=5.698839229296011, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:51:39,889] [INFO] [timer.py:197:stop] 0/5124, RunningAvgSamplesPerSec=6.326572220627663, CurrSamplesPerSec=5.703573688087854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:51:51,215] [INFO] [timer.py:197:stop] 0/5126, RunningAvgSamplesPerSec=6.326566899491948, CurrSamplesPerSec=5.6914471858302536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:52:02,603] [INFO] [timer.py:197:stop] 0/5128, RunningAvgSamplesPerSec=6.326564824341951, CurrSamplesPerSec=5.689846804788743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:52:13,958] [INFO] [timer.py:197:stop] 0/5130, RunningAvgSamplesPerSec=6.326556253707025, CurrSamplesPerSec=5.666762704622488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:52:25,484] [INFO] [timer.py:197:stop] 0/5132, RunningAvgSamplesPerSec=6.326534250546191, CurrSamplesPerSec=5.5950898107988065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:52:36,846] [INFO] [timer.py:197:stop] 0/5134, RunningAvgSamplesPerSec=6.32652168880854, CurrSamplesPerSec=5.67273336505756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:52:48,140] [INFO] [timer.py:197:stop] 0/5136, RunningAvgSamplesPerSec=6.32652710055588, CurrSamplesPerSec=5.714407942914559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:52:59,454] [INFO] [timer.py:197:stop] 0/5138, RunningAvgSamplesPerSec=6.326527574405054, CurrSamplesPerSec=5.706247345478694, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:53:10,812] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=2570, skipped=6, lr=[5.415555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 08:53:10,814] [INFO] [timer.py:197:stop] 0/5140, RunningAvgSamplesPerSec=6.326515256432678, CurrSamplesPerSec=5.711085497358781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:53:22,089] [INFO] [timer.py:197:stop] 0/5142, RunningAvgSamplesPerSec=6.326525337238487, CurrSamplesPerSec=5.721057860345258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:53:33,381] [INFO] [timer.py:197:stop] 0/5144, RunningAvgSamplesPerSec=6.326532787009453, CurrSamplesPerSec=5.704095322477661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:53:44,703] [INFO] [timer.py:197:stop] 0/5146, RunningAvgSamplesPerSec=6.326533204113313, CurrSamplesPerSec=5.682301954644201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:53:56,164] [INFO] [timer.py:197:stop] 0/5148, RunningAvgSamplesPerSec=6.326541136064399, CurrSamplesPerSec=5.707523708396819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:54:07,625] [INFO] [timer.py:197:stop] 0/5150, RunningAvgSamplesPerSec=6.326534128526497, CurrSamplesPerSec=5.716770824229024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:54:18,931] [INFO] [timer.py:197:stop] 0/5152, RunningAvgSamplesPerSec=6.326534816139285, CurrSamplesPerSec=5.685429862965351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:54:30,231] [INFO] [timer.py:197:stop] 0/5154, RunningAvgSamplesPerSec=6.326538320411516, CurrSamplesPerSec=5.706158312564088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:54:41,511] [INFO] [timer.py:197:stop] 0/5156, RunningAvgSamplesPerSec=6.326550012930235, CurrSamplesPerSec=5.741697113890642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:54:52,778] [INFO] [timer.py:197:stop] 0/5158, RunningAvgSamplesPerSec=6.326556464695945, CurrSamplesPerSec=5.7146604939454315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:55:04,046] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=2580, skipped=6, lr=[5.393333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 08:55:04,048] [INFO] [timer.py:197:stop] 0/5160, RunningAvgSamplesPerSec=6.326561208573604, CurrSamplesPerSec=5.695240252443188, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:55:15,356] [INFO] [timer.py:197:stop] 0/5162, RunningAvgSamplesPerSec=6.326567918257104, CurrSamplesPerSec=5.715115531427877, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:55:26,649] [INFO] [timer.py:197:stop] 0/5164, RunningAvgSamplesPerSec=6.326572499270032, CurrSamplesPerSec=5.709664474748558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:55:37,906] [INFO] [timer.py:197:stop] 0/5166, RunningAvgSamplesPerSec=6.326584114109376, CurrSamplesPerSec=5.736320042670533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:55:49,177] [INFO] [timer.py:197:stop] 0/5168, RunningAvgSamplesPerSec=6.326596997603105, CurrSamplesPerSec=5.731909012539607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.002, 'learning_rate': 5.3844444444444445e-06, 'epoch': 19.36} [2022-12-19 08:56:00,519] [INFO] [timer.py:197:stop] 0/5170, RunningAvgSamplesPerSec=6.32660737459307, CurrSamplesPerSec=5.7360719473383615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:56:11,778] [INFO] [timer.py:197:stop] 0/5172, RunningAvgSamplesPerSec=6.326616282610804, CurrSamplesPerSec=5.707302609223747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:56:23,191] [INFO] [timer.py:197:stop] 0/5174, RunningAvgSamplesPerSec=6.326622682871222, CurrSamplesPerSec=5.717192833802156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:56:34,540] [INFO] [timer.py:197:stop] 0/5176, RunningAvgSamplesPerSec=6.32663255710916, CurrSamplesPerSec=5.710650053337745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:56:45,886] [INFO] [timer.py:197:stop] 0/5178, RunningAvgSamplesPerSec=6.326643195569925, CurrSamplesPerSec=5.729320578243453, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:56:57,123] [INFO] [logging.py:68:log_dist] [Rank 0] step=2590, skipped=6, lr=[5.3711111111111115e-06], mom=[[0.9, 0.999]] [2022-12-19 08:56:57,125] [INFO] [timer.py:197:stop] 0/5180, RunningAvgSamplesPerSec=6.326655287983985, CurrSamplesPerSec=5.72021276210361, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:57:08,402] [INFO] [timer.py:197:stop] 0/5182, RunningAvgSamplesPerSec=6.326664100104584, CurrSamplesPerSec=5.721499039366984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:57:19,727] [INFO] [timer.py:197:stop] 0/5184, RunningAvgSamplesPerSec=6.326663591091177, CurrSamplesPerSec=5.695253302396888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:57:31,019] [INFO] [timer.py:197:stop] 0/5186, RunningAvgSamplesPerSec=6.326671962408299, CurrSamplesPerSec=5.719627240358427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:57:42,417] [INFO] [timer.py:197:stop] 0/5188, RunningAvgSamplesPerSec=6.326674591582271, CurrSamplesPerSec=5.707716668435031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:57:53,827] [INFO] [timer.py:197:stop] 0/5190, RunningAvgSamplesPerSec=6.326664032957695, CurrSamplesPerSec=5.643213536138158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:58:05,057] [INFO] [timer.py:197:stop] 0/5192, RunningAvgSamplesPerSec=6.326676370619144, CurrSamplesPerSec=5.727064610378707, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:58:16,347] [INFO] [timer.py:197:stop] 0/5194, RunningAvgSamplesPerSec=6.326686967549143, CurrSamplesPerSec=5.733222596797397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:58:27,646] [INFO] [timer.py:197:stop] 0/5196, RunningAvgSamplesPerSec=6.326691884224388, CurrSamplesPerSec=5.710569386847547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:58:39,152] [INFO] [timer.py:197:stop] 0/5198, RunningAvgSamplesPerSec=6.326702782463945, CurrSamplesPerSec=5.7146130476847015, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:58:50,571] [INFO] [logging.py:68:log_dist] [Rank 0] step=2600, skipped=6, lr=[5.348888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 08:58:50,572] [INFO] [timer.py:197:stop] 0/5200, RunningAvgSamplesPerSec=6.3267099187053795, CurrSamplesPerSec=5.720873507180304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:59:01,880] [INFO] [timer.py:197:stop] 0/5202, RunningAvgSamplesPerSec=6.326720608413111, CurrSamplesPerSec=5.715921149785817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:59:13,367] [INFO] [timer.py:197:stop] 0/5204, RunningAvgSamplesPerSec=6.326719722632003, CurrSamplesPerSec=5.698069624185527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:59:24,868] [INFO] [timer.py:197:stop] 0/5206, RunningAvgSamplesPerSec=6.326724811077986, CurrSamplesPerSec=5.707164521943531, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:59:36,169] [INFO] [timer.py:197:stop] 0/5208, RunningAvgSamplesPerSec=6.326726875762654, CurrSamplesPerSec=5.697777175528978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:59:47,536] [INFO] [timer.py:197:stop] 0/5210, RunningAvgSamplesPerSec=6.326739605345858, CurrSamplesPerSec=5.715268119041571, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 08:59:58,802] [INFO] [timer.py:197:stop] 0/5212, RunningAvgSamplesPerSec=6.326748259031097, CurrSamplesPerSec=5.72149172241045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:00:10,117] [INFO] [timer.py:197:stop] 0/5214, RunningAvgSamplesPerSec=6.32675297754615, CurrSamplesPerSec=5.7101194457065745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:00:21,524] [INFO] [timer.py:197:stop] 0/5216, RunningAvgSamplesPerSec=6.326762373485947, CurrSamplesPerSec=5.721499771063667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:00:32,797] [INFO] [timer.py:197:stop] 0/5218, RunningAvgSamplesPerSec=6.326769687259797, CurrSamplesPerSec=5.699889333981161, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0022, 'learning_rate': 5.328888888888889e-06, 'epoch': 19.55} [2022-12-19 09:00:44,063] [INFO] [logging.py:68:log_dist] [Rank 0] step=2610, skipped=6, lr=[5.326666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 09:00:44,065] [INFO] [timer.py:197:stop] 0/5220, RunningAvgSamplesPerSec=6.326776497684699, CurrSamplesPerSec=5.700952901822254, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:00:55,356] [INFO] [timer.py:197:stop] 0/5222, RunningAvgSamplesPerSec=6.326785232909346, CurrSamplesPerSec=5.7228673952679845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:01:06,785] [INFO] [timer.py:197:stop] 0/5224, RunningAvgSamplesPerSec=6.326793500180556, CurrSamplesPerSec=5.704409027756232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:01:18,086] [INFO] [timer.py:197:stop] 0/5226, RunningAvgSamplesPerSec=6.326803501970373, CurrSamplesPerSec=5.715140353738345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:01:29,311] [INFO] [timer.py:197:stop] 0/5228, RunningAvgSamplesPerSec=6.3268166745835615, CurrSamplesPerSec=5.727509649280324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:01:40,585] [INFO] [timer.py:197:stop] 0/5230, RunningAvgSamplesPerSec=6.326825404571015, CurrSamplesPerSec=5.708428670355929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:01:52,099] [INFO] [timer.py:197:stop] 0/5232, RunningAvgSamplesPerSec=6.326828604804595, CurrSamplesPerSec=5.6884736992852885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:02:03,365] [INFO] [timer.py:197:stop] 0/5234, RunningAvgSamplesPerSec=6.326834135257586, CurrSamplesPerSec=5.70580342149945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:02:14,688] [INFO] [timer.py:197:stop] 0/5236, RunningAvgSamplesPerSec=6.326839461889243, CurrSamplesPerSec=5.709518258154823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:02:25,982] [INFO] [timer.py:197:stop] 0/5238, 
RunningAvgSamplesPerSec=6.326846890931107, CurrSamplesPerSec=5.712144010090859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:02:37,266] [INFO] [logging.py:68:log_dist] [Rank 0] step=2620, skipped=6, lr=[5.304444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 09:02:37,268] [INFO] [timer.py:197:stop] 0/5240, RunningAvgSamplesPerSec=6.32685240217737, CurrSamplesPerSec=5.691459011684438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:02:48,554] [INFO] [timer.py:197:stop] 0/5242, RunningAvgSamplesPerSec=6.326858405249853, CurrSamplesPerSec=5.709219290392687, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:02:59,755] [INFO] [timer.py:197:stop] 0/5244, RunningAvgSamplesPerSec=6.326869292984128, CurrSamplesPerSec=5.7397755043514325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:03:11,051] [INFO] [timer.py:197:stop] 0/5246, RunningAvgSamplesPerSec=6.326875163188855, CurrSamplesPerSec=5.711337511761057, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:03:22,305] [INFO] [timer.py:197:stop] 0/5248, RunningAvgSamplesPerSec=6.326884367244508, CurrSamplesPerSec=5.732285030333962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:03:33,554] [INFO] [timer.py:197:stop] 0/5250, RunningAvgSamplesPerSec=6.326890821528471, CurrSamplesPerSec=5.707287319793456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:03:44,779] [INFO] [timer.py:197:stop] 0/5252, RunningAvgSamplesPerSec=6.326903404628032, CurrSamplesPerSec=5.7345495238987185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:03:56,060] [INFO] [timer.py:197:stop] 0/5254, RunningAvgSamplesPerSec=6.326914430519431, CurrSamplesPerSec=5.705262558315002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:04:07,316] [INFO] [timer.py:197:stop] 0/5256, RunningAvgSamplesPerSec=6.326925978529813, CurrSamplesPerSec=5.714651491261038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:04:18,626] [INFO] [timer.py:197:stop] 0/5258, 
RunningAvgSamplesPerSec=6.326930492352954, CurrSamplesPerSec=5.6947511639709205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:04:29,919] [INFO] [logging.py:68:log_dist] [Rank 0] step=2630, skipped=6, lr=[5.282222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 09:04:29,920] [INFO] [timer.py:197:stop] 0/5260, RunningAvgSamplesPerSec=6.326934673644035, CurrSamplesPerSec=5.702886643415389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:04:41,183] [INFO] [timer.py:197:stop] 0/5262, RunningAvgSamplesPerSec=6.32694649746278, CurrSamplesPerSec=5.711653958638533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:04:52,599] [INFO] [timer.py:197:stop] 0/5264, RunningAvgSamplesPerSec=6.326949836433888, CurrSamplesPerSec=5.700066527383142, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:05:03,886] [INFO] [timer.py:197:stop] 0/5266, RunningAvgSamplesPerSec=6.326953477409639, CurrSamplesPerSec=5.687694598768349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:05:15,142] [INFO] [timer.py:197:stop] 0/5268, RunningAvgSamplesPerSec=6.326964706952403, CurrSamplesPerSec=5.7148303336849775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0024, 'learning_rate': 5.273333333333333e-06, 'epoch': 19.73} [2022-12-19 09:05:26,414] [INFO] [timer.py:197:stop] 0/5270, RunningAvgSamplesPerSec=6.326972398702189, CurrSamplesPerSec=5.706219203984677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:05:37,720] [INFO] [timer.py:197:stop] 0/5272, RunningAvgSamplesPerSec=6.326977105115562, CurrSamplesPerSec=5.70524436963145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:05:48,958] [INFO] [timer.py:197:stop] 0/5274, RunningAvgSamplesPerSec=6.326988204038847, CurrSamplesPerSec=5.725098566820031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:06:00,268] [INFO] [timer.py:197:stop] 0/5276, RunningAvgSamplesPerSec=6.326989925679104, CurrSamplesPerSec=5.69340370559387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 09:06:11,567] [INFO] [timer.py:197:stop] 0/5278, RunningAvgSamplesPerSec=6.326988845910196, CurrSamplesPerSec=5.685329437208514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:06:22,940] [INFO] [logging.py:68:log_dist] [Rank 0] step=2640, skipped=6, lr=[5.2600000000000005e-06], mom=[[0.9, 0.999]] [2022-12-19 09:06:22,942] [INFO] [timer.py:197:stop] 0/5280, RunningAvgSamplesPerSec=6.326982777478574, CurrSamplesPerSec=5.666889033830997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:06:34,305] [INFO] [timer.py:197:stop] 0/5282, RunningAvgSamplesPerSec=6.3269820880615075, CurrSamplesPerSec=5.686813546725917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:06:45,595] [INFO] [timer.py:197:stop] 0/5284, RunningAvgSamplesPerSec=6.326981227397975, CurrSamplesPerSec=5.685381696691994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:06:57,141] [INFO] [timer.py:197:stop] 0/5286, RunningAvgSamplesPerSec=6.326966146496324, CurrSamplesPerSec=5.595586657497565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:07:08,535] [INFO] [timer.py:197:stop] 0/5288, RunningAvgSamplesPerSec=6.326964703918456, CurrSamplesPerSec=5.6876010824846635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:07:19,857] [INFO] [timer.py:197:stop] 0/5290, RunningAvgSamplesPerSec=6.326959729229911, CurrSamplesPerSec=5.687245845045309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:07:31,123] [INFO] [timer.py:197:stop] 0/5292, RunningAvgSamplesPerSec=6.326966145312834, CurrSamplesPerSec=5.70872367034187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:07:42,607] [INFO] [timer.py:197:stop] 0/5294, RunningAvgSamplesPerSec=6.326968506839615, CurrSamplesPerSec=5.687487565561142, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:07:53,927] [INFO] [timer.py:197:stop] 0/5296, RunningAvgSamplesPerSec=6.326967208646083, CurrSamplesPerSec=5.6633126028054, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
09:08:05,436] [INFO] [timer.py:197:stop] 0/5298, RunningAvgSamplesPerSec=6.326960681017089, CurrSamplesPerSec=5.664069261657801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:08:16,881] [INFO] [logging.py:68:log_dist] [Rank 0] step=2650, skipped=6, lr=[5.237777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 09:08:16,883] [INFO] [timer.py:197:stop] 0/5300, RunningAvgSamplesPerSec=6.326966155542184, CurrSamplesPerSec=5.702786811544774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:08:28,192] [INFO] [timer.py:197:stop] 0/5302, RunningAvgSamplesPerSec=6.326964430226077, CurrSamplesPerSec=5.686643681594253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:08:39,527] [INFO] [timer.py:197:stop] 0/5304, RunningAvgSamplesPerSec=6.3269612646892455, CurrSamplesPerSec=5.6898294378834935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:08:50,825] [INFO] [timer.py:197:stop] 0/5306, RunningAvgSamplesPerSec=6.326962498142549, CurrSamplesPerSec=5.702096080458439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:09:02,231] [INFO] [timer.py:197:stop] 0/5308, RunningAvgSamplesPerSec=6.326957389679294, CurrSamplesPerSec=5.661688357377286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:09:13,518] [INFO] [timer.py:197:stop] 0/5310, RunningAvgSamplesPerSec=6.326962235475069, CurrSamplesPerSec=5.705529581537679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:09:24,896] [INFO] [timer.py:197:stop] 0/5312, RunningAvgSamplesPerSec=6.326946778512884, CurrSamplesPerSec=5.625201193826876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:09:36,441] [INFO] [timer.py:197:stop] 0/5314, RunningAvgSamplesPerSec=6.326939063839756, CurrSamplesPerSec=5.666051968564235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:09:47,748] [INFO] [timer.py:197:stop] 0/5316, RunningAvgSamplesPerSec=6.326943084143451, CurrSamplesPerSec=5.710879187756437, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
09:09:59,054] [INFO] [timer.py:197:stop] 0/5318, RunningAvgSamplesPerSec=6.326941189309138, CurrSamplesPerSec=5.701843912035822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0019, 'learning_rate': 5.217777777777778e-06, 'epoch': 19.92} [2022-12-19 09:10:10,569] [INFO] [logging.py:68:log_dist] [Rank 0] step=2660, skipped=6, lr=[5.215555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 09:10:10,571] [INFO] [timer.py:197:stop] 0/5320, RunningAvgSamplesPerSec=6.326940495006183, CurrSamplesPerSec=5.683466304079487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:10:21,872] [INFO] [timer.py:197:stop] 0/5322, RunningAvgSamplesPerSec=6.326943298913801, CurrSamplesPerSec=5.68685402672265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:10:33,297] [INFO] [timer.py:197:stop] 0/5324, RunningAvgSamplesPerSec=6.326942637403409, CurrSamplesPerSec=5.694474518216475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:10:44,586] [INFO] [timer.py:197:stop] 0/5326, RunningAvgSamplesPerSec=6.326950719824739, CurrSamplesPerSec=5.719741312628275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:10:55,935] [INFO] [timer.py:197:stop] 0/5328, RunningAvgSamplesPerSec=6.32695685466163, CurrSamplesPerSec=5.724115318780022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:11:07,342] [INFO] [timer.py:197:stop] 0/5330, RunningAvgSamplesPerSec=6.326964677172577, CurrSamplesPerSec=5.726946580363623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:11:18,586] [INFO] [timer.py:197:stop] 0/5332, RunningAvgSamplesPerSec=6.326973400504483, CurrSamplesPerSec=5.735480478545961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:11:29,960] [INFO] [timer.py:197:stop] 0/5334, RunningAvgSamplesPerSec=6.3269781951262205, CurrSamplesPerSec=5.725879399385953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:11:41,245] [INFO] [timer.py:197:stop] 0/5336, RunningAvgSamplesPerSec=6.326973029007461, 
CurrSamplesPerSec=5.673645075155088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:11:52,544] [INFO] [timer.py:197:stop] 0/5338, RunningAvgSamplesPerSec=6.326976010934132, CurrSamplesPerSec=5.697388016223101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:12:03,182] [INFO] [logging.py:68:log_dist] [Rank 0] step=2670, skipped=6, lr=[5.193333333333333e-06], mom=[[0.9, 0.999]] [2022-12-19 09:12:03,184] [INFO] [timer.py:197:stop] 0/5340, RunningAvgSamplesPerSec=6.327168970306973, CurrSamplesPerSec=6.6470372862909395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:12:14,479] [INFO] [timer.py:197:stop] 0/5342, RunningAvgSamplesPerSec=6.327178601783213, CurrSamplesPerSec=5.708575316743677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:12:25,901] [INFO] [timer.py:197:stop] 0/5344, RunningAvgSamplesPerSec=6.327179519712794, CurrSamplesPerSec=5.685566900476961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:12:37,162] [INFO] [timer.py:197:stop] 0/5346, RunningAvgSamplesPerSec=6.327183593189999, CurrSamplesPerSec=5.691187029471069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:12:48,589] [INFO] [timer.py:197:stop] 0/5348, RunningAvgSamplesPerSec=6.327187354919872, CurrSamplesPerSec=5.696630892135851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:12:59,882] [INFO] [timer.py:197:stop] 0/5350, RunningAvgSamplesPerSec=6.3271918732399985, CurrSamplesPerSec=5.696502024558867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:13:11,349] [INFO] [timer.py:197:stop] 0/5352, RunningAvgSamplesPerSec=6.327184894081798, CurrSamplesPerSec=5.667741424661722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:13:22,676] [INFO] [timer.py:197:stop] 0/5354, RunningAvgSamplesPerSec=6.327183307140008, CurrSamplesPerSec=5.696361800048544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:13:33,981] [INFO] [timer.py:197:stop] 0/5356, RunningAvgSamplesPerSec=6.327183434910143, 
CurrSamplesPerSec=5.693474468733676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:13:45,304] [INFO] [timer.py:197:stop] 0/5358, RunningAvgSamplesPerSec=6.327171825788782, CurrSamplesPerSec=5.705024416999714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:13:56,558] [INFO] [logging.py:68:log_dist] [Rank 0] step=2680, skipped=6, lr=[5.171111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 09:13:56,560] [INFO] [timer.py:197:stop] 0/5360, RunningAvgSamplesPerSec=6.3271785511453675, CurrSamplesPerSec=5.703337384207692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:14:07,957] [INFO] [timer.py:197:stop] 0/5362, RunningAvgSamplesPerSec=6.32715800907698, CurrSamplesPerSec=5.595283873749549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:14:19,301] [INFO] [timer.py:197:stop] 0/5364, RunningAvgSamplesPerSec=6.327154786538909, CurrSamplesPerSec=5.679258469746943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:14:30,643] [INFO] [timer.py:197:stop] 0/5366, RunningAvgSamplesPerSec=6.3271531326479735, CurrSamplesPerSec=5.682538202718477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:14:42,001] [INFO] [timer.py:197:stop] 0/5368, RunningAvgSamplesPerSec=6.327168046244887, CurrSamplesPerSec=5.722616314294717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:14:53,322] [INFO] [timer.py:197:stop] 0/5370, RunningAvgSamplesPerSec=6.327167859594129, CurrSamplesPerSec=5.701813391722613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0015, 'learning_rate': 5.1600000000000006e-06, 'epoch': 20.11} [2022-12-19 09:15:04,617] [INFO] [timer.py:197:stop] 0/5372, RunningAvgSamplesPerSec=6.327171001593724, CurrSamplesPerSec=5.689900835838622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:15:15,919] [INFO] [timer.py:197:stop] 0/5374, RunningAvgSamplesPerSec=6.3271777615315115, CurrSamplesPerSec=5.708940751531693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:15:27,232] [INFO] 
[timer.py:197:stop] 0/5376, RunningAvgSamplesPerSec=6.327186386890859, CurrSamplesPerSec=5.727134013321466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:15:38,578] [INFO] [timer.py:197:stop] 0/5378, RunningAvgSamplesPerSec=6.327182631640413, CurrSamplesPerSec=5.688924334711592, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:15:49,997] [INFO] [logging.py:68:log_dist] [Rank 0] step=2690, skipped=6, lr=[5.1488888888888885e-06], mom=[[0.9, 0.999]] [2022-12-19 09:15:49,998] [INFO] [timer.py:197:stop] 0/5380, RunningAvgSamplesPerSec=6.327187850831585, CurrSamplesPerSec=5.695013579455546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:16:01,312] [INFO] [timer.py:197:stop] 0/5382, RunningAvgSamplesPerSec=6.327186278255629, CurrSamplesPerSec=5.691755639490088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:16:12,617] [INFO] [timer.py:197:stop] 0/5384, RunningAvgSamplesPerSec=6.32718475323586, CurrSamplesPerSec=5.696297008986016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:16:23,969] [INFO] [timer.py:197:stop] 0/5386, RunningAvgSamplesPerSec=6.3271703022340136, CurrSamplesPerSec=5.637367094153941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:16:35,326] [INFO] [timer.py:197:stop] 0/5388, RunningAvgSamplesPerSec=6.32717219029916, CurrSamplesPerSec=5.702830184670812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:16:46,644] [INFO] [timer.py:197:stop] 0/5390, RunningAvgSamplesPerSec=6.327166078098964, CurrSamplesPerSec=5.675496493835246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:16:57,999] [INFO] [timer.py:197:stop] 0/5392, RunningAvgSamplesPerSec=6.327174112622288, CurrSamplesPerSec=5.703911091256303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:17:09,270] [INFO] [timer.py:197:stop] 0/5394, RunningAvgSamplesPerSec=6.327181817378258, CurrSamplesPerSec=5.70997636373514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:17:20,629] [INFO] 
[timer.py:197:stop] 0/5396, RunningAvgSamplesPerSec=6.327171254687691, CurrSamplesPerSec=5.631382714148365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:17:31,909] [INFO] [timer.py:197:stop] 0/5398, RunningAvgSamplesPerSec=6.327180145434734, CurrSamplesPerSec=5.707966200885282, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:17:43,190] [INFO] [logging.py:68:log_dist] [Rank 0] step=2700, skipped=6, lr=[5.126666666666668e-06], mom=[[0.9, 0.999]] [2022-12-19 09:17:43,192] [INFO] [timer.py:197:stop] 0/5400, RunningAvgSamplesPerSec=6.3271851038532665, CurrSamplesPerSec=5.702080092172868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:17:54,484] [INFO] [timer.py:197:stop] 0/5402, RunningAvgSamplesPerSec=6.327184118644154, CurrSamplesPerSec=5.687512389469407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:18:05,873] [INFO] [timer.py:197:stop] 0/5404, RunningAvgSamplesPerSec=6.327181976805404, CurrSamplesPerSec=5.690761129077653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:18:17,177] [INFO] [timer.py:197:stop] 0/5406, RunningAvgSamplesPerSec=6.32718537277891, CurrSamplesPerSec=5.709541088823918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:18:28,497] [INFO] [timer.py:197:stop] 0/5408, RunningAvgSamplesPerSec=6.327184525947135, CurrSamplesPerSec=5.668778662460667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:18:39,772] [INFO] [timer.py:197:stop] 0/5410, RunningAvgSamplesPerSec=6.327185135853989, CurrSamplesPerSec=5.714341280952035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:18:51,188] [INFO] [timer.py:197:stop] 0/5412, RunningAvgSamplesPerSec=6.32718514242739, CurrSamplesPerSec=5.692471390414974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:19:02,616] [INFO] [timer.py:197:stop] 0/5414, RunningAvgSamplesPerSec=6.32719158856131, CurrSamplesPerSec=5.712962408437075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:19:13,941] [INFO] 
[timer.py:197:stop] 0/5416, RunningAvgSamplesPerSec=6.327192215568873, CurrSamplesPerSec=5.687191864406779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:19:25,254] [INFO] [timer.py:197:stop] 0/5418, RunningAvgSamplesPerSec=6.327190349484859, CurrSamplesPerSec=5.698551298614068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:19:36,581] [INFO] [logging.py:68:log_dist] [Rank 0] step=2710, skipped=6, lr=[5.1044444444444455e-06], mom=[[0.9, 0.999]] [2022-12-19 09:19:36,582] [INFO] [timer.py:197:stop] 0/5420, RunningAvgSamplesPerSec=6.32718458060827, CurrSamplesPerSec=5.68088006964804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0014, 'learning_rate': 5.1044444444444455e-06, 'epoch': 20.3} [2022-12-19 09:19:47,952] [INFO] [timer.py:197:stop] 0/5422, RunningAvgSamplesPerSec=6.327182302960085, CurrSamplesPerSec=5.687313081291521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:19:59,245] [INFO] [timer.py:197:stop] 0/5424, RunningAvgSamplesPerSec=6.327180700220772, CurrSamplesPerSec=5.696810300985299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:20:10,674] [INFO] [timer.py:197:stop] 0/5426, RunningAvgSamplesPerSec=6.32717462746869, CurrSamplesPerSec=5.672319810990473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:20:22,079] [INFO] [timer.py:197:stop] 0/5428, RunningAvgSamplesPerSec=6.327170383179405, CurrSamplesPerSec=5.689570152947408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:20:33,559] [INFO] [timer.py:197:stop] 0/5430, RunningAvgSamplesPerSec=6.327173759604896, CurrSamplesPerSec=5.6884628501953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:20:44,905] [INFO] [timer.py:197:stop] 0/5432, RunningAvgSamplesPerSec=6.327170095692326, CurrSamplesPerSec=5.680122278318599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:20:56,430] [INFO] [timer.py:197:stop] 0/5434, RunningAvgSamplesPerSec=6.327167262794571, CurrSamplesPerSec=5.694608851138402, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 09:21:07,743] [INFO] [timer.py:197:stop] 0/5436, RunningAvgSamplesPerSec=6.327164783825932, CurrSamplesPerSec=5.690069448256989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:21:19,159] [INFO] [timer.py:197:stop] 0/5438, RunningAvgSamplesPerSec=6.327162225202587, CurrSamplesPerSec=5.6876659169007855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:21:30,412] [INFO] [logging.py:68:log_dist] [Rank 0] step=2720, skipped=6, lr=[5.082222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 09:21:30,414] [INFO] [timer.py:197:stop] 0/5440, RunningAvgSamplesPerSec=6.327169282735077, CurrSamplesPerSec=5.728104364396911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:21:41,664] [INFO] [timer.py:197:stop] 0/5442, RunningAvgSamplesPerSec=6.327175297672756, CurrSamplesPerSec=5.712350654209712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:21:53,120] [INFO] [timer.py:197:stop] 0/5444, RunningAvgSamplesPerSec=6.327179147958254, CurrSamplesPerSec=5.710764982725515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:22:04,397] [INFO] [timer.py:197:stop] 0/5446, RunningAvgSamplesPerSec=6.3271886569286275, CurrSamplesPerSec=5.716229584054422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:22:15,744] [INFO] [timer.py:197:stop] 0/5448, RunningAvgSamplesPerSec=6.327194432622465, CurrSamplesPerSec=5.717956651769992, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:22:27,075] [INFO] [timer.py:197:stop] 0/5450, RunningAvgSamplesPerSec=6.3271883539616685, CurrSamplesPerSec=5.717734743567056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:22:38,367] [INFO] [timer.py:197:stop] 0/5452, RunningAvgSamplesPerSec=6.32719257226937, CurrSamplesPerSec=5.717352108080316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:22:49,926] [INFO] [timer.py:197:stop] 0/5454, RunningAvgSamplesPerSec=6.327189274724262, CurrSamplesPerSec=5.689497316221823, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 09:23:01,184] [INFO] [timer.py:197:stop] 0/5456, RunningAvgSamplesPerSec=6.327193864601911, CurrSamplesPerSec=5.693525187424588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:23:12,473] [INFO] [timer.py:197:stop] 0/5458, RunningAvgSamplesPerSec=6.3271959129642745, CurrSamplesPerSec=5.684621018638477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:23:23,894] [INFO] [logging.py:68:log_dist] [Rank 0] step=2730, skipped=6, lr=[5.060000000000001e-06], mom=[[0.9, 0.999]] [2022-12-19 09:23:23,896] [INFO] [timer.py:197:stop] 0/5460, RunningAvgSamplesPerSec=6.327201701895241, CurrSamplesPerSec=5.696515805616074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:23:35,408] [INFO] [timer.py:197:stop] 0/5462, RunningAvgSamplesPerSec=6.32720850493136, CurrSamplesPerSec=5.707695065970879, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:23:46,704] [INFO] [timer.py:197:stop] 0/5464, RunningAvgSamplesPerSec=6.327212100515684, CurrSamplesPerSec=5.7028689545039155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:23:57,977] [INFO] [timer.py:197:stop] 0/5466, RunningAvgSamplesPerSec=6.327219163802245, CurrSamplesPerSec=5.722468457673567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:24:09,200] [INFO] [timer.py:197:stop] 0/5468, RunningAvgSamplesPerSec=6.327226269519971, CurrSamplesPerSec=5.677955803164082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:24:20,571] [INFO] [timer.py:197:stop] 0/5470, RunningAvgSamplesPerSec=6.327232834326806, CurrSamplesPerSec=5.699783071637999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0013, 'learning_rate': 5.0488888888888895e-06, 'epoch': 20.49} [2022-12-19 09:24:31,841] [INFO] [timer.py:197:stop] 0/5472, RunningAvgSamplesPerSec=6.327237362960743, CurrSamplesPerSec=5.707804536322762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:24:43,115] [INFO] [timer.py:197:stop] 0/5474, 
RunningAvgSamplesPerSec=6.327245683904697, CurrSamplesPerSec=5.723284692265474, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:24:54,395] [INFO] [timer.py:197:stop] 0/5476, RunningAvgSamplesPerSec=6.327249843449044, CurrSamplesPerSec=5.707456721547685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:25:05,679] [INFO] [timer.py:197:stop] 0/5478, RunningAvgSamplesPerSec=6.32725263265533, CurrSamplesPerSec=5.679213772279983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:25:16,971] [INFO] [logging.py:68:log_dist] [Rank 0] step=2740, skipped=6, lr=[5.037777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 09:25:16,972] [INFO] [timer.py:197:stop] 0/5480, RunningAvgSamplesPerSec=6.3272573975652024, CurrSamplesPerSec=5.687394779084915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:25:28,291] [INFO] [timer.py:197:stop] 0/5482, RunningAvgSamplesPerSec=6.327254841993636, CurrSamplesPerSec=5.656133222018028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:25:39,696] [INFO] [timer.py:197:stop] 0/5484, RunningAvgSamplesPerSec=6.32725078286408, CurrSamplesPerSec=5.666228738828618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:25:51,022] [INFO] [timer.py:197:stop] 0/5486, RunningAvgSamplesPerSec=6.327249858664082, CurrSamplesPerSec=5.66952361086928, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:26:02,317] [INFO] [timer.py:197:stop] 0/5488, RunningAvgSamplesPerSec=6.3272559599367755, CurrSamplesPerSec=5.711906995540534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:26:13,639] [INFO] [timer.py:197:stop] 0/5490, RunningAvgSamplesPerSec=6.327260734577367, CurrSamplesPerSec=5.693732418893059, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:26:24,949] [INFO] [timer.py:197:stop] 0/5492, RunningAvgSamplesPerSec=6.327268112072993, CurrSamplesPerSec=5.7113788277194395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:26:36,313] [INFO] [timer.py:197:stop] 0/5494, 
RunningAvgSamplesPerSec=6.327275974646863, CurrSamplesPerSec=5.704688094447636, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:26:47,608] [INFO] [timer.py:197:stop] 0/5496, RunningAvgSamplesPerSec=6.32728804870645, CurrSamplesPerSec=5.70186643914325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:26:58,952] [INFO] [timer.py:197:stop] 0/5498, RunningAvgSamplesPerSec=6.327289262028841, CurrSamplesPerSec=5.686855713401687, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:27:10,288] [INFO] [logging.py:68:log_dist] [Rank 0] step=2750, skipped=6, lr=[5.015555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 09:27:10,289] [INFO] [timer.py:197:stop] 0/5500, RunningAvgSamplesPerSec=6.327283499430925, CurrSamplesPerSec=5.669504930888486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:27:21,617] [INFO] [timer.py:197:stop] 0/5502, RunningAvgSamplesPerSec=6.3272912644209285, CurrSamplesPerSec=5.7285561664307325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:27:32,907] [INFO] [timer.py:197:stop] 0/5504, RunningAvgSamplesPerSec=6.3272996280468226, CurrSamplesPerSec=5.71489165366438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:27:44,313] [INFO] [timer.py:197:stop] 0/5506, RunningAvgSamplesPerSec=6.327301827201211, CurrSamplesPerSec=5.700405210450529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:27:55,661] [INFO] [timer.py:197:stop] 0/5508, RunningAvgSamplesPerSec=6.327306888942876, CurrSamplesPerSec=5.698007697068032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:28:06,943] [INFO] [timer.py:197:stop] 0/5510, RunningAvgSamplesPerSec=6.32730964604632, CurrSamplesPerSec=5.682682559479179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:28:18,238] [INFO] [timer.py:197:stop] 0/5512, RunningAvgSamplesPerSec=6.327312340952667, CurrSamplesPerSec=5.710392511677529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:28:29,651] [INFO] [timer.py:197:stop] 0/5514, 
RunningAvgSamplesPerSec=6.327298654554762, CurrSamplesPerSec=5.641737390291214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:28:41,021] [INFO] [timer.py:197:stop] 0/5516, RunningAvgSamplesPerSec=6.32729353124478, CurrSamplesPerSec=5.668375260396188, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:28:52,349] [INFO] [timer.py:197:stop] 0/5518, RunningAvgSamplesPerSec=6.327300269047504, CurrSamplesPerSec=5.712615909925904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:29:03,623] [INFO] [logging.py:68:log_dist] [Rank 0] step=2760, skipped=6, lr=[4.9933333333333335e-06], mom=[[0.9, 0.999]] [2022-12-19 09:29:03,625] [INFO] [timer.py:197:stop] 0/5520, RunningAvgSamplesPerSec=6.327310681431952, CurrSamplesPerSec=5.732619228176159, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0009, 'learning_rate': 4.9933333333333335e-06, 'epoch': 20.67} [2022-12-19 09:29:14,934] [INFO] [timer.py:197:stop] 0/5522, RunningAvgSamplesPerSec=6.3273143413130875, CurrSamplesPerSec=5.702070160101224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:29:26,216] [INFO] [timer.py:197:stop] 0/5524, RunningAvgSamplesPerSec=6.327321656764444, CurrSamplesPerSec=5.709871425319408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:29:37,444] [INFO] [timer.py:197:stop] 0/5526, RunningAvgSamplesPerSec=6.327335053212334, CurrSamplesPerSec=5.7233869515143905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:29:48,866] [INFO] [timer.py:197:stop] 0/5528, RunningAvgSamplesPerSec=6.3273350167233495, CurrSamplesPerSec=5.694494571109942, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:30:00,205] [INFO] [timer.py:197:stop] 0/5530, RunningAvgSamplesPerSec=6.327322904587677, CurrSamplesPerSec=5.621468324571367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:30:11,784] [INFO] [timer.py:197:stop] 0/5532, RunningAvgSamplesPerSec=6.327310195577793, CurrSamplesPerSec=5.686455034767105, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 09:30:23,053] [INFO] [timer.py:197:stop] 0/5534, RunningAvgSamplesPerSec=6.327321346896436, CurrSamplesPerSec=5.7211100471289855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:30:34,414] [INFO] [timer.py:197:stop] 0/5536, RunningAvgSamplesPerSec=6.327311520353038, CurrSamplesPerSec=5.634310754331623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:30:45,950] [INFO] [timer.py:197:stop] 0/5538, RunningAvgSamplesPerSec=6.327312174722138, CurrSamplesPerSec=5.669162725844795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:30:57,193] [INFO] [logging.py:68:log_dist] [Rank 0] step=2770, skipped=6, lr=[4.971111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 09:30:57,194] [INFO] [timer.py:197:stop] 0/5540, RunningAvgSamplesPerSec=6.327325163066052, CurrSamplesPerSec=5.725389186526671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:31:08,502] [INFO] [timer.py:197:stop] 0/5542, RunningAvgSamplesPerSec=6.327329341920278, CurrSamplesPerSec=5.7063409907246365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:31:19,843] [INFO] [timer.py:197:stop] 0/5544, RunningAvgSamplesPerSec=6.32733060741325, CurrSamplesPerSec=5.697363589724128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:31:31,501] [INFO] [timer.py:197:stop] 0/5546, RunningAvgSamplesPerSec=6.327245100993378, CurrSamplesPerSec=5.330481042158059, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:31:42,788] [INFO] [timer.py:197:stop] 0/5548, RunningAvgSamplesPerSec=6.327249421338328, CurrSamplesPerSec=5.694891550830674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:31:54,019] [INFO] [timer.py:197:stop] 0/5550, RunningAvgSamplesPerSec=6.327264033576318, CurrSamplesPerSec=5.741278601222568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:32:05,607] [INFO] [timer.py:197:stop] 0/5552, RunningAvgSamplesPerSec=6.327272437368167, CurrSamplesPerSec=5.694354928591742, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 09:32:16,918] [INFO] [timer.py:197:stop] 0/5554, RunningAvgSamplesPerSec=6.327272738272082, CurrSamplesPerSec=5.694001745731919, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:32:28,412] [INFO] [timer.py:197:stop] 0/5556, RunningAvgSamplesPerSec=6.327272882837922, CurrSamplesPerSec=5.698257107063665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:32:39,725] [INFO] [timer.py:197:stop] 0/5558, RunningAvgSamplesPerSec=6.327272076574496, CurrSamplesPerSec=5.675074618219827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:32:51,102] [INFO] [logging.py:68:log_dist] [Rank 0] step=2780, skipped=6, lr=[4.94888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 09:32:51,104] [INFO] [timer.py:197:stop] 0/5560, RunningAvgSamplesPerSec=6.327253292561312, CurrSamplesPerSec=5.589430126255712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:33:02,425] [INFO] [timer.py:197:stop] 0/5562, RunningAvgSamplesPerSec=6.327254845494401, CurrSamplesPerSec=5.690062935145412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:33:13,919] [INFO] [timer.py:197:stop] 0/5564, RunningAvgSamplesPerSec=6.32726033064504, CurrSamplesPerSec=5.686731865630022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:33:25,441] [INFO] [timer.py:197:stop] 0/5566, RunningAvgSamplesPerSec=6.327206768718379, CurrSamplesPerSec=5.693810678056856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:33:36,705] [INFO] [timer.py:197:stop] 0/5568, RunningAvgSamplesPerSec=6.327211380121902, CurrSamplesPerSec=5.7038003158434085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:33:48,065] [INFO] [timer.py:197:stop] 0/5570, RunningAvgSamplesPerSec=6.327203497215607, CurrSamplesPerSec=5.630594134224443, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0013, 'learning_rate': 4.937777777777778e-06, 'epoch': 20.86} [2022-12-19 09:33:59,317] [INFO] [timer.py:197:stop] 0/5572, 
RunningAvgSamplesPerSec=6.327214966931058, CurrSamplesPerSec=5.711514688526538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:34:10,917] [INFO] [timer.py:197:stop] 0/5574, RunningAvgSamplesPerSec=6.3271475112880085, CurrSamplesPerSec=5.373344381797514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:34:22,233] [INFO] [timer.py:197:stop] 0/5576, RunningAvgSamplesPerSec=6.327162233856269, CurrSamplesPerSec=5.727136701497453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:34:33,529] [INFO] [timer.py:197:stop] 0/5578, RunningAvgSamplesPerSec=6.327168667014627, CurrSamplesPerSec=5.712193117103601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:34:45,006] [INFO] [logging.py:68:log_dist] [Rank 0] step=2790, skipped=6, lr=[4.926666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 09:34:45,008] [INFO] [timer.py:197:stop] 0/5580, RunningAvgSamplesPerSec=6.327162301048915, CurrSamplesPerSec=5.693340672332899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:34:56,288] [INFO] [timer.py:197:stop] 0/5582, RunningAvgSamplesPerSec=6.327168909762913, CurrSamplesPerSec=5.710165359777352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:35:07,583] [INFO] [timer.py:197:stop] 0/5584, RunningAvgSamplesPerSec=6.327170185345427, CurrSamplesPerSec=5.6879409373064895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:35:19,056] [INFO] [timer.py:197:stop] 0/5586, RunningAvgSamplesPerSec=6.327178054808086, CurrSamplesPerSec=5.702443242161101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:35:30,361] [INFO] [timer.py:197:stop] 0/5588, RunningAvgSamplesPerSec=6.32718639713858, CurrSamplesPerSec=5.727915401243383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:35:41,629] [INFO] [timer.py:197:stop] 0/5590, RunningAvgSamplesPerSec=6.327187793097923, CurrSamplesPerSec=5.725961720389244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:35:53,084] [INFO] [timer.py:197:stop] 0/5592, 
RunningAvgSamplesPerSec=6.327175618068802, CurrSamplesPerSec=5.632879213404751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:36:04,334] [INFO] [timer.py:197:stop] 0/5594, RunningAvgSamplesPerSec=6.327181526888844, CurrSamplesPerSec=5.7055628096091775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:36:15,586] [INFO] [timer.py:197:stop] 0/5596, RunningAvgSamplesPerSec=6.3271948410454275, CurrSamplesPerSec=5.743845653210668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:36:26,910] [INFO] [timer.py:197:stop] 0/5598, RunningAvgSamplesPerSec=6.32719456394194, CurrSamplesPerSec=5.687586139445938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:36:38,236] [INFO] [logging.py:68:log_dist] [Rank 0] step=2800, skipped=6, lr=[4.904444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 09:36:38,238] [INFO] [timer.py:197:stop] 0/5600, RunningAvgSamplesPerSec=6.327205010140148, CurrSamplesPerSec=5.733503265251636, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:36:49,441] [INFO] [timer.py:197:stop] 0/5602, RunningAvgSamplesPerSec=6.327223378428297, CurrSamplesPerSec=5.751916030718203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:37:01,051] [INFO] [timer.py:197:stop] 0/5604, RunningAvgSamplesPerSec=6.327219059826464, CurrSamplesPerSec=5.698002375269181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:37:12,342] [INFO] [timer.py:197:stop] 0/5606, RunningAvgSamplesPerSec=6.327218460763419, CurrSamplesPerSec=5.678211388132963, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:37:22,660] [INFO] [timer.py:197:stop] 0/5608, RunningAvgSamplesPerSec=6.327415078281939, CurrSamplesPerSec=5.728377930817846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:37:34,083] [INFO] [timer.py:197:stop] 0/5610, RunningAvgSamplesPerSec=6.327420423804473, CurrSamplesPerSec=5.699373550875137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:37:45,407] [INFO] [timer.py:197:stop] 0/5612, 
RunningAvgSamplesPerSec=6.327422636259304, CurrSamplesPerSec=5.675730976284964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:37:56,638] [INFO] [timer.py:197:stop] 0/5614, RunningAvgSamplesPerSec=6.327429930148577, CurrSamplesPerSec=5.709820900791182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:38:08,199] [INFO] [timer.py:197:stop] 0/5616, RunningAvgSamplesPerSec=6.327432036479824, CurrSamplesPerSec=5.714081703385435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:38:19,615] [INFO] [timer.py:197:stop] 0/5618, RunningAvgSamplesPerSec=6.327410992068545, CurrSamplesPerSec=5.596495205391974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:38:30,949] [INFO] [logging.py:68:log_dist] [Rank 0] step=2810, skipped=6, lr=[4.8822222222222224e-06], mom=[[0.9, 0.999]] [2022-12-19 09:38:30,950] [INFO] [timer.py:197:stop] 0/5620, RunningAvgSamplesPerSec=6.327407622759622, CurrSamplesPerSec=5.68998357287247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0019, 'learning_rate': 4.8822222222222224e-06, 'epoch': 21.05} [2022-12-19 09:38:42,256] [INFO] [timer.py:197:stop] 0/5622, RunningAvgSamplesPerSec=6.327410267605082, CurrSamplesPerSec=5.709107094487449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:38:53,795] [INFO] [timer.py:197:stop] 0/5624, RunningAvgSamplesPerSec=6.3274179594238555, CurrSamplesPerSec=5.728789918832232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:39:05,086] [INFO] [timer.py:197:stop] 0/5626, RunningAvgSamplesPerSec=6.3274188191981, CurrSamplesPerSec=5.701032570386453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:39:16,498] [INFO] [timer.py:197:stop] 0/5628, RunningAvgSamplesPerSec=6.327419068519769, CurrSamplesPerSec=5.716022415772843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:39:27,778] [INFO] [timer.py:197:stop] 0/5630, RunningAvgSamplesPerSec=6.327425834168894, CurrSamplesPerSec=5.701110061703604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 09:39:39,148] [INFO] [timer.py:197:stop] 0/5632, RunningAvgSamplesPerSec=6.327410685347695, CurrSamplesPerSec=5.671059382557289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:39:50,453] [INFO] [timer.py:197:stop] 0/5634, RunningAvgSamplesPerSec=6.32741430489386, CurrSamplesPerSec=5.688350986436177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:40:02,025] [INFO] [timer.py:197:stop] 0/5636, RunningAvgSamplesPerSec=6.327418356713053, CurrSamplesPerSec=5.688830777985573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:40:13,310] [INFO] [timer.py:197:stop] 0/5638, RunningAvgSamplesPerSec=6.327421222279422, CurrSamplesPerSec=5.6982295281631545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:40:24,683] [INFO] [logging.py:68:log_dist] [Rank 0] step=2820, skipped=6, lr=[4.86e-06], mom=[[0.9, 0.999]] [2022-12-19 09:40:24,685] [INFO] [timer.py:197:stop] 0/5640, RunningAvgSamplesPerSec=6.3274295438319195, CurrSamplesPerSec=5.706835469173851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:40:36,000] [INFO] [timer.py:197:stop] 0/5642, RunningAvgSamplesPerSec=6.327427083353472, CurrSamplesPerSec=5.6800330972692725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:40:47,339] [INFO] [timer.py:197:stop] 0/5644, RunningAvgSamplesPerSec=6.327426749052634, CurrSamplesPerSec=5.696525234798897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:40:58,798] [INFO] [timer.py:197:stop] 0/5646, RunningAvgSamplesPerSec=6.327399834865437, CurrSamplesPerSec=5.571845224455094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:41:10,315] [INFO] [timer.py:197:stop] 0/5648, RunningAvgSamplesPerSec=6.327398929512355, CurrSamplesPerSec=5.697679215569709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:41:21,606] [INFO] [timer.py:197:stop] 0/5650, RunningAvgSamplesPerSec=6.327403132566141, CurrSamplesPerSec=5.7144738765314695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
09:41:32,999] [INFO] [timer.py:197:stop] 0/5652, RunningAvgSamplesPerSec=6.3274132371488925, CurrSamplesPerSec=5.748354108547398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:41:44,337] [INFO] [timer.py:197:stop] 0/5654, RunningAvgSamplesPerSec=6.327410067859584, CurrSamplesPerSec=5.660059553504746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:41:55,642] [INFO] [timer.py:197:stop] 0/5656, RunningAvgSamplesPerSec=6.327413901031317, CurrSamplesPerSec=5.696571655846379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:42:07,140] [INFO] [timer.py:197:stop] 0/5658, RunningAvgSamplesPerSec=6.327417873945464, CurrSamplesPerSec=5.718135944741984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:42:18,388] [INFO] [logging.py:68:log_dist] [Rank 0] step=2830, skipped=6, lr=[4.837777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 09:42:18,390] [INFO] [timer.py:197:stop] 0/5660, RunningAvgSamplesPerSec=6.327419975425736, CurrSamplesPerSec=5.706910206442957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:42:29,878] [INFO] [timer.py:197:stop] 0/5662, RunningAvgSamplesPerSec=6.327420450882776, CurrSamplesPerSec=5.689966205132308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:42:41,166] [INFO] [timer.py:197:stop] 0/5664, RunningAvgSamplesPerSec=6.32741802272912, CurrSamplesPerSec=5.674001493900007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:42:52,467] [INFO] [timer.py:197:stop] 0/5666, RunningAvgSamplesPerSec=6.327415628733577, CurrSamplesPerSec=5.669098073128717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:43:03,779] [INFO] [timer.py:197:stop] 0/5668, RunningAvgSamplesPerSec=6.327413317294286, CurrSamplesPerSec=5.674418891746345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:43:15,315] [INFO] [timer.py:197:stop] 0/5670, RunningAvgSamplesPerSec=6.327416984761981, CurrSamplesPerSec=5.719089845686782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.002, 
'learning_rate': 4.826666666666667e-06, 'epoch': 21.24} [2022-12-19 09:43:26,658] [INFO] [timer.py:197:stop] 0/5672, RunningAvgSamplesPerSec=6.3274024234510255, CurrSamplesPerSec=5.707454537220382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:43:37,961] [INFO] [timer.py:197:stop] 0/5674, RunningAvgSamplesPerSec=6.327404247400426, CurrSamplesPerSec=5.697620924945585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:43:49,556] [INFO] [timer.py:197:stop] 0/5676, RunningAvgSamplesPerSec=6.327399979426677, CurrSamplesPerSec=5.695306711163966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:44:00,857] [INFO] [timer.py:197:stop] 0/5678, RunningAvgSamplesPerSec=6.327400425007796, CurrSamplesPerSec=5.702805711485068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:44:12,211] [INFO] [logging.py:68:log_dist] [Rank 0] step=2840, skipped=6, lr=[4.815555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 09:44:12,212] [INFO] [timer.py:197:stop] 0/5680, RunningAvgSamplesPerSec=6.327396045290966, CurrSamplesPerSec=5.684045648800412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:44:23,545] [INFO] [timer.py:197:stop] 0/5682, RunningAvgSamplesPerSec=6.327395027061223, CurrSamplesPerSec=5.685081879992083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:44:35,150] [INFO] [timer.py:197:stop] 0/5684, RunningAvgSamplesPerSec=6.3273916120935665, CurrSamplesPerSec=5.6889385613905015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:44:46,579] [INFO] [timer.py:197:stop] 0/5686, RunningAvgSamplesPerSec=6.32736545235806, CurrSamplesPerSec=5.6722610792160175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:44:58,031] [INFO] [timer.py:197:stop] 0/5688, RunningAvgSamplesPerSec=6.3273611657510695, CurrSamplesPerSec=5.684350652529778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:45:09,367] [INFO] [timer.py:197:stop] 0/5690, RunningAvgSamplesPerSec=6.327355915824812, 
CurrSamplesPerSec=5.6710406924550005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:45:20,689] [INFO] [timer.py:197:stop] 0/5692, RunningAvgSamplesPerSec=6.327352207268951, CurrSamplesPerSec=5.670585937934044, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:45:32,184] [INFO] [timer.py:197:stop] 0/5694, RunningAvgSamplesPerSec=6.327353750164532, CurrSamplesPerSec=5.683844417222247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:45:43,533] [INFO] [timer.py:197:stop] 0/5696, RunningAvgSamplesPerSec=6.327341765044639, CurrSamplesPerSec=5.683435498859694, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:45:55,061] [INFO] [timer.py:197:stop] 0/5698, RunningAvgSamplesPerSec=6.327345367885964, CurrSamplesPerSec=5.684843253316956, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:46:06,639] [INFO] [logging.py:68:log_dist] [Rank 0] step=2850, skipped=6, lr=[4.793333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 09:46:06,641] [INFO] [timer.py:197:stop] 0/5700, RunningAvgSamplesPerSec=6.327286582690143, CurrSamplesPerSec=5.4192383148168854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:46:17,980] [INFO] [timer.py:197:stop] 0/5702, RunningAvgSamplesPerSec=6.327280595627196, CurrSamplesPerSec=5.672219368408686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:46:29,436] [INFO] [timer.py:197:stop] 0/5704, RunningAvgSamplesPerSec=6.3272825137410305, CurrSamplesPerSec=5.694223506595454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:46:40,786] [INFO] [timer.py:197:stop] 0/5706, RunningAvgSamplesPerSec=6.327261208122124, CurrSamplesPerSec=5.6869682412488185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:46:52,289] [INFO] [timer.py:197:stop] 0/5708, RunningAvgSamplesPerSec=6.327261407870596, CurrSamplesPerSec=5.701694220294985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:47:03,884] [INFO] [timer.py:197:stop] 0/5710, RunningAvgSamplesPerSec=6.32720118432913, 
CurrSamplesPerSec=5.416812582457253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:47:15,174] [INFO] [timer.py:197:stop] 0/5712, RunningAvgSamplesPerSec=6.327204076092369, CurrSamplesPerSec=5.711932762323314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:47:26,592] [INFO] [timer.py:197:stop] 0/5714, RunningAvgSamplesPerSec=6.32720852910461, CurrSamplesPerSec=5.729683292819606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:47:38,090] [INFO] [timer.py:197:stop] 0/5716, RunningAvgSamplesPerSec=6.327203837351662, CurrSamplesPerSec=5.698907465873488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:47:49,449] [INFO] [timer.py:197:stop] 0/5718, RunningAvgSamplesPerSec=6.32719679751568, CurrSamplesPerSec=5.677196390372188, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:48:00,816] [INFO] [logging.py:68:log_dist] [Rank 0] step=2860, skipped=6, lr=[4.771111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 09:48:00,816] [INFO] [timer.py:197:stop] 0/5720, RunningAvgSamplesPerSec=6.327186640524851, CurrSamplesPerSec=5.642861211979465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0015, 'learning_rate': 4.771111111111111e-06, 'epoch': 21.43} [2022-12-19 09:48:12,291] [INFO] [timer.py:197:stop] 0/5722, RunningAvgSamplesPerSec=6.327188655688786, CurrSamplesPerSec=5.695544766946134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:48:23,601] [INFO] [timer.py:197:stop] 0/5724, RunningAvgSamplesPerSec=6.327185767827487, CurrSamplesPerSec=5.700987045219974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:48:35,102] [INFO] [timer.py:197:stop] 0/5726, RunningAvgSamplesPerSec=6.327180149714026, CurrSamplesPerSec=5.685788245664439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:48:46,404] [INFO] [timer.py:197:stop] 0/5728, RunningAvgSamplesPerSec=6.327182441682619, CurrSamplesPerSec=5.712367915748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:48:57,850] [INFO] 
[timer.py:197:stop] 0/5730, RunningAvgSamplesPerSec=6.32714800753467, CurrSamplesPerSec=5.522254069048929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:49:09,384] [INFO] [timer.py:197:stop] 0/5732, RunningAvgSamplesPerSec=6.327138376638185, CurrSamplesPerSec=5.643304412064901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:49:20,675] [INFO] [timer.py:197:stop] 0/5734, RunningAvgSamplesPerSec=6.327138979351629, CurrSamplesPerSec=5.689489598532356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:49:32,292] [INFO] [timer.py:197:stop] 0/5736, RunningAvgSamplesPerSec=6.327136622541701, CurrSamplesPerSec=5.697290795162459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:49:43,615] [INFO] [timer.py:197:stop] 0/5738, RunningAvgSamplesPerSec=6.327135704318018, CurrSamplesPerSec=5.6942413834958305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:49:54,993] [INFO] [logging.py:68:log_dist] [Rank 0] step=2870, skipped=6, lr=[4.74888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 09:49:54,993] [INFO] [timer.py:197:stop] 0/5740, RunningAvgSamplesPerSec=6.327122930590715, CurrSamplesPerSec=5.62946315101259, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:50:06,369] [INFO] [timer.py:197:stop] 0/5742, RunningAvgSamplesPerSec=6.327131694110766, CurrSamplesPerSec=5.709245761498113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:50:17,681] [INFO] [timer.py:197:stop] 0/5744, RunningAvgSamplesPerSec=6.32713618408195, CurrSamplesPerSec=5.706669258563901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:50:28,926] [INFO] [timer.py:197:stop] 0/5746, RunningAvgSamplesPerSec=6.327142986967995, CurrSamplesPerSec=5.709714996508473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:50:40,538] [INFO] [timer.py:197:stop] 0/5748, RunningAvgSamplesPerSec=6.3271468646431215, CurrSamplesPerSec=5.707299454255233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:50:51,900] [INFO] 
[timer.py:197:stop] 0/5750, RunningAvgSamplesPerSec=6.327135596027916, CurrSamplesPerSec=5.63494990951894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:51:03,157] [INFO] [timer.py:197:stop] 0/5752, RunningAvgSamplesPerSec=6.327141318724086, CurrSamplesPerSec=5.699698113024863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:51:14,840] [INFO] [timer.py:197:stop] 0/5754, RunningAvgSamplesPerSec=6.3270582448384, CurrSamplesPerSec=5.315250036017745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:51:26,196] [INFO] [timer.py:197:stop] 0/5756, RunningAvgSamplesPerSec=6.327061663784205, CurrSamplesPerSec=5.677727141378433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:51:37,468] [INFO] [timer.py:197:stop] 0/5758, RunningAvgSamplesPerSec=6.327065276123169, CurrSamplesPerSec=5.706542605299759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:51:49,018] [INFO] [logging.py:68:log_dist] [Rank 0] step=2880, skipped=6, lr=[4.7266666666666674e-06], mom=[[0.9, 0.999]] [2022-12-19 09:51:49,020] [INFO] [timer.py:197:stop] 0/5760, RunningAvgSamplesPerSec=6.327072260110817, CurrSamplesPerSec=5.7087773320528985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:52:00,265] [INFO] [timer.py:197:stop] 0/5762, RunningAvgSamplesPerSec=6.327082291108477, CurrSamplesPerSec=5.698030919579351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:52:11,615] [INFO] [timer.py:197:stop] 0/5764, RunningAvgSamplesPerSec=6.327076965000704, CurrSamplesPerSec=5.6473930583099206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:52:22,956] [INFO] [timer.py:197:stop] 0/5766, RunningAvgSamplesPerSec=6.327070247646142, CurrSamplesPerSec=5.659319237558479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:52:34,398] [INFO] [timer.py:197:stop] 0/5768, RunningAvgSamplesPerSec=6.327080207341556, CurrSamplesPerSec=5.711380528977598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:52:45,786] [INFO] 
[timer.py:197:stop] 0/5770, RunningAvgSamplesPerSec=6.327067159723164, CurrSamplesPerSec=5.632695535104642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.001, 'learning_rate': 4.715555555555556e-06, 'epoch': 21.61} [2022-12-19 09:52:57,088] [INFO] [timer.py:197:stop] 0/5772, RunningAvgSamplesPerSec=6.327067991248805, CurrSamplesPerSec=5.690279805655552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:53:08,517] [INFO] [timer.py:197:stop] 0/5774, RunningAvgSamplesPerSec=6.327057749593877, CurrSamplesPerSec=5.657660554432299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:53:19,861] [INFO] [timer.py:197:stop] 0/5776, RunningAvgSamplesPerSec=6.32705066759627, CurrSamplesPerSec=5.666897886657168, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:53:31,171] [INFO] [timer.py:197:stop] 0/5778, RunningAvgSamplesPerSec=6.327049254943086, CurrSamplesPerSec=5.683146234883642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:53:42,693] [INFO] [logging.py:68:log_dist] [Rank 0] step=2890, skipped=6, lr=[4.704444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 09:53:42,695] [INFO] [timer.py:197:stop] 0/5780, RunningAvgSamplesPerSec=6.327043791543741, CurrSamplesPerSec=5.661496108783891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:53:54,041] [INFO] [timer.py:197:stop] 0/5782, RunningAvgSamplesPerSec=6.327042645625433, CurrSamplesPerSec=5.6944846654461205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:54:05,365] [INFO] [timer.py:197:stop] 0/5784, RunningAvgSamplesPerSec=6.327037925383355, CurrSamplesPerSec=5.6893287377089425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:54:16,691] [INFO] [timer.py:197:stop] 0/5786, RunningAvgSamplesPerSec=6.327037155006354, CurrSamplesPerSec=5.691439462849965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:54:28,093] [INFO] [timer.py:197:stop] 0/5788, RunningAvgSamplesPerSec=6.327017300838014, CurrSamplesPerSec=5.583380496683008, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:54:39,379] [INFO] [timer.py:197:stop] 0/5790, RunningAvgSamplesPerSec=6.327020962436225, CurrSamplesPerSec=5.694250321988112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:54:50,777] [INFO] [timer.py:197:stop] 0/5792, RunningAvgSamplesPerSec=6.327020789462871, CurrSamplesPerSec=5.683115673770766, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:55:02,125] [INFO] [timer.py:197:stop] 0/5794, RunningAvgSamplesPerSec=6.3270091364151755, CurrSamplesPerSec=5.707822741333952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:55:13,414] [INFO] [timer.py:197:stop] 0/5796, RunningAvgSamplesPerSec=6.327014101290869, CurrSamplesPerSec=5.694762520309839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:55:24,989] [INFO] [timer.py:197:stop] 0/5798, RunningAvgSamplesPerSec=6.327018074378067, CurrSamplesPerSec=5.704875528151967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:55:36,304] [INFO] [logging.py:68:log_dist] [Rank 0] step=2900, skipped=6, lr=[4.682222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 09:55:36,306] [INFO] [timer.py:197:stop] 0/5800, RunningAvgSamplesPerSec=6.327021477282059, CurrSamplesPerSec=5.701546231818208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:55:47,765] [INFO] [timer.py:197:stop] 0/5802, RunningAvgSamplesPerSec=6.326984708001661, CurrSamplesPerSec=5.4972002386010415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:55:59,322] [INFO] [timer.py:197:stop] 0/5804, RunningAvgSamplesPerSec=6.3269943571697365, CurrSamplesPerSec=5.72997291045628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:56:10,578] [INFO] [timer.py:197:stop] 0/5806, RunningAvgSamplesPerSec=6.327001051094036, CurrSamplesPerSec=5.727020379099153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:56:21,958] [INFO] [timer.py:197:stop] 0/5808, RunningAvgSamplesPerSec=6.32700769326307, CurrSamplesPerSec=5.750022555789915, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:56:33,325] [INFO] [timer.py:197:stop] 0/5810, RunningAvgSamplesPerSec=6.327011629477202, CurrSamplesPerSec=5.709083053081462, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:56:44,931] [INFO] [timer.py:197:stop] 0/5812, RunningAvgSamplesPerSec=6.326946189757655, CurrSamplesPerSec=5.376336205594705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:56:56,173] [INFO] [timer.py:197:stop] 0/5814, RunningAvgSamplesPerSec=6.326959175733956, CurrSamplesPerSec=5.709152749431925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:57:07,451] [INFO] [timer.py:197:stop] 0/5816, RunningAvgSamplesPerSec=6.326962736697872, CurrSamplesPerSec=5.698312023739548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:57:18,971] [INFO] [timer.py:197:stop] 0/5818, RunningAvgSamplesPerSec=6.32697302985932, CurrSamplesPerSec=5.738290122205588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:57:30,209] [INFO] [logging.py:68:log_dist] [Rank 0] step=2910, skipped=6, lr=[4.66e-06], mom=[[0.9, 0.999]] [2022-12-19 09:57:30,211] [INFO] [timer.py:197:stop] 0/5820, RunningAvgSamplesPerSec=6.326981300033884, CurrSamplesPerSec=5.709944298809991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 4.66e-06, 'epoch': 21.8} [2022-12-19 09:57:41,750] [INFO] [timer.py:197:stop] 0/5822, RunningAvgSamplesPerSec=6.326957921120016, CurrSamplesPerSec=5.6115923026815855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:57:53,010] [INFO] [timer.py:197:stop] 0/5824, RunningAvgSamplesPerSec=6.326966153366738, CurrSamplesPerSec=5.720722083111198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:58:04,383] [INFO] [timer.py:197:stop] 0/5826, RunningAvgSamplesPerSec=6.32695049743848, CurrSamplesPerSec=5.609825712111343, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:58:15,740] [INFO] [timer.py:197:stop] 0/5828, RunningAvgSamplesPerSec=6.326957168711861, 
CurrSamplesPerSec=5.713642397153228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:58:27,084] [INFO] [timer.py:197:stop] 0/5830, RunningAvgSamplesPerSec=6.326962727780275, CurrSamplesPerSec=5.713342754137487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:58:38,438] [INFO] [timer.py:197:stop] 0/5832, RunningAvgSamplesPerSec=6.326951504928178, CurrSamplesPerSec=5.7268217133249255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:58:49,903] [INFO] [timer.py:197:stop] 0/5834, RunningAvgSamplesPerSec=6.326958748208818, CurrSamplesPerSec=5.699728610697147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:59:01,216] [INFO] [timer.py:197:stop] 0/5836, RunningAvgSamplesPerSec=6.32695359020357, CurrSamplesPerSec=5.641792883007067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:59:12,490] [INFO] [timer.py:197:stop] 0/5838, RunningAvgSamplesPerSec=6.326963591580429, CurrSamplesPerSec=5.726723240811893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:59:23,967] [INFO] [logging.py:68:log_dist] [Rank 0] step=2920, skipped=6, lr=[4.637777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 09:59:23,969] [INFO] [timer.py:197:stop] 0/5840, RunningAvgSamplesPerSec=6.3269273832902195, CurrSamplesPerSec=5.517450311069923, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:59:35,482] [INFO] [timer.py:197:stop] 0/5842, RunningAvgSamplesPerSec=6.326935179829532, CurrSamplesPerSec=5.718429999812535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:59:46,757] [INFO] [timer.py:197:stop] 0/5844, RunningAvgSamplesPerSec=6.326947470354533, CurrSamplesPerSec=5.722882280254571, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 09:59:58,264] [INFO] [timer.py:197:stop] 0/5846, RunningAvgSamplesPerSec=6.326955208953875, CurrSamplesPerSec=5.720715011973767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:00:09,542] [INFO] [timer.py:197:stop] 0/5848, RunningAvgSamplesPerSec=6.326966322260483, 
CurrSamplesPerSec=5.729824428838214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:00:20,792] [INFO] [timer.py:197:stop] 0/5850, RunningAvgSamplesPerSec=6.3269755213986265, CurrSamplesPerSec=5.699409853530076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:00:32,036] [INFO] [timer.py:197:stop] 0/5852, RunningAvgSamplesPerSec=6.326986057020568, CurrSamplesPerSec=5.737037237301803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:00:43,562] [INFO] [timer.py:197:stop] 0/5854, RunningAvgSamplesPerSec=6.326996133713118, CurrSamplesPerSec=5.723843135726337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:00:54,824] [INFO] [timer.py:197:stop] 0/5856, RunningAvgSamplesPerSec=6.327004189750953, CurrSamplesPerSec=5.717485817651799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:01:06,128] [INFO] [timer.py:197:stop] 0/5858, RunningAvgSamplesPerSec=6.327010210840014, CurrSamplesPerSec=5.709431066109457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:01:17,791] [INFO] [logging.py:68:log_dist] [Rank 0] step=2930, skipped=6, lr=[4.6155555555555555e-06], mom=[[0.9, 0.999]] [2022-12-19 10:01:17,792] [INFO] [timer.py:197:stop] 0/5860, RunningAvgSamplesPerSec=6.327014788944203, CurrSamplesPerSec=5.701773425139868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:01:29,036] [INFO] [timer.py:197:stop] 0/5862, RunningAvgSamplesPerSec=6.327023739432682, CurrSamplesPerSec=5.727167982276287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:01:40,372] [INFO] [timer.py:197:stop] 0/5864, RunningAvgSamplesPerSec=6.3270097142862145, CurrSamplesPerSec=5.614817109777724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:01:51,841] [INFO] [timer.py:197:stop] 0/5866, RunningAvgSamplesPerSec=6.327020617712988, CurrSamplesPerSec=5.733233862192032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:02:03,121] [INFO] [timer.py:197:stop] 0/5868, RunningAvgSamplesPerSec=6.32702696942492, 
CurrSamplesPerSec=5.714019184477904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:02:14,402] [INFO] [timer.py:197:stop] 0/5870, RunningAvgSamplesPerSec=6.327027786112765, CurrSamplesPerSec=5.711264359677365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.001, 'learning_rate': 4.604444444444444e-06, 'epoch': 21.99} [2022-12-19 10:02:25,956] [INFO] [timer.py:197:stop] 0/5872, RunningAvgSamplesPerSec=6.3270325574716155, CurrSamplesPerSec=5.708980575823943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:02:36,565] [INFO] [timer.py:197:stop] 0/5874, RunningAvgSamplesPerSec=6.327158145468831, CurrSamplesPerSec=6.3286307726982525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:02:47,839] [INFO] [timer.py:197:stop] 0/5876, RunningAvgSamplesPerSec=6.327164963162884, CurrSamplesPerSec=5.693196981035972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:02:59,101] [INFO] [timer.py:197:stop] 0/5878, RunningAvgSamplesPerSec=6.3271705723900995, CurrSamplesPerSec=5.7137195019418545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:03:10,665] [INFO] [logging.py:68:log_dist] [Rank 0] step=2940, skipped=6, lr=[4.593333333333333e-06], mom=[[0.9, 0.999]] [2022-12-19 10:03:10,667] [INFO] [timer.py:197:stop] 0/5880, RunningAvgSamplesPerSec=6.327174173544156, CurrSamplesPerSec=5.703298365655687, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:03:21,934] [INFO] [timer.py:197:stop] 0/5882, RunningAvgSamplesPerSec=6.327182050001152, CurrSamplesPerSec=5.696695207216446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:03:33,460] [INFO] [timer.py:197:stop] 0/5884, RunningAvgSamplesPerSec=6.327159301933449, CurrSamplesPerSec=5.647914686177536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:03:44,763] [INFO] [timer.py:197:stop] 0/5886, RunningAvgSamplesPerSec=6.3271609129828, CurrSamplesPerSec=5.691050444837621, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:03:56,164] [INFO] 
[timer.py:197:stop] 0/5888, RunningAvgSamplesPerSec=6.32713891183848, CurrSamplesPerSec=5.606526772641925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:04:07,546] [INFO] [timer.py:197:stop] 0/5890, RunningAvgSamplesPerSec=6.32713992082808, CurrSamplesPerSec=5.699807276834669, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:04:18,828] [INFO] [timer.py:197:stop] 0/5892, RunningAvgSamplesPerSec=6.327144882716841, CurrSamplesPerSec=5.70081681659845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:04:30,221] [INFO] [timer.py:197:stop] 0/5894, RunningAvgSamplesPerSec=6.3271297192686875, CurrSamplesPerSec=5.69585970786221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:04:41,713] [INFO] [timer.py:197:stop] 0/5896, RunningAvgSamplesPerSec=6.327134341979254, CurrSamplesPerSec=5.71248923569967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:04:53,258] [INFO] [timer.py:197:stop] 0/5898, RunningAvgSamplesPerSec=6.327083623947903, CurrSamplesPerSec=5.451876625623628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:05:04,500] [INFO] [logging.py:68:log_dist] [Rank 0] step=2950, skipped=6, lr=[4.571111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 10:05:04,502] [INFO] [timer.py:197:stop] 0/5900, RunningAvgSamplesPerSec=6.327088247910819, CurrSamplesPerSec=5.712549289766839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:05:16,004] [INFO] [timer.py:197:stop] 0/5902, RunningAvgSamplesPerSec=6.327054347984338, CurrSamplesPerSec=5.51459689861742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:05:27,566] [INFO] [timer.py:197:stop] 0/5904, RunningAvgSamplesPerSec=6.327054394134031, CurrSamplesPerSec=5.676833807602154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:05:38,894] [INFO] [timer.py:197:stop] 0/5906, RunningAvgSamplesPerSec=6.327049726008971, CurrSamplesPerSec=5.685233349697753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:05:50,549] [INFO] 
[timer.py:197:stop] 0/5908, RunningAvgSamplesPerSec=6.327035228707285, CurrSamplesPerSec=5.686025024610416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:06:01,776] [INFO] [timer.py:197:stop] 0/5910, RunningAvgSamplesPerSec=6.327041364155486, CurrSamplesPerSec=5.703674517036521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:06:13,119] [INFO] [timer.py:197:stop] 0/5912, RunningAvgSamplesPerSec=6.327033561186232, CurrSamplesPerSec=5.644803931427298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:06:24,454] [INFO] [timer.py:197:stop] 0/5914, RunningAvgSamplesPerSec=6.327029687311023, CurrSamplesPerSec=5.683738270898107, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:06:35,850] [INFO] [timer.py:197:stop] 0/5916, RunningAvgSamplesPerSec=6.3270380769619, CurrSamplesPerSec=5.726126613790753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:06:47,177] [INFO] [timer.py:197:stop] 0/5918, RunningAvgSamplesPerSec=6.327033412754576, CurrSamplesPerSec=5.721812710500365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:06:58,471] [INFO] [logging.py:68:log_dist] [Rank 0] step=2960, skipped=6, lr=[4.548888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 10:06:58,473] [INFO] [timer.py:197:stop] 0/5920, RunningAvgSamplesPerSec=6.327035370354154, CurrSamplesPerSec=5.705562324525058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:07:10,060] [INFO] [timer.py:197:stop] 0/5922, RunningAvgSamplesPerSec=6.3270288536709325, CurrSamplesPerSec=5.667065617049686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0009, 'learning_rate': 4.546666666666667e-06, 'epoch': 22.18} [2022-12-19 10:07:21,435] [INFO] [timer.py:197:stop] 0/5924, RunningAvgSamplesPerSec=6.327017876534208, CurrSamplesPerSec=5.649178161562496, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:07:32,953] [INFO] [timer.py:197:stop] 0/5926, RunningAvgSamplesPerSec=6.326972921242991, CurrSamplesPerSec=5.490632344847329, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:07:44,282] [INFO] [timer.py:197:stop] 0/5928, RunningAvgSamplesPerSec=6.326975546026964, CurrSamplesPerSec=5.695770273669802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:07:55,617] [INFO] [timer.py:197:stop] 0/5930, RunningAvgSamplesPerSec=6.3269660870575954, CurrSamplesPerSec=5.6737510846656605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:08:06,946] [INFO] [timer.py:197:stop] 0/5932, RunningAvgSamplesPerSec=6.326961510353099, CurrSamplesPerSec=5.6774910529991045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:08:18,249] [INFO] [timer.py:197:stop] 0/5934, RunningAvgSamplesPerSec=6.326963092041931, CurrSamplesPerSec=5.686075851553204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:08:29,564] [INFO] [timer.py:197:stop] 0/5936, RunningAvgSamplesPerSec=6.326961327528801, CurrSamplesPerSec=5.687967452511109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:08:40,856] [INFO] [timer.py:197:stop] 0/5938, RunningAvgSamplesPerSec=6.326964798856647, CurrSamplesPerSec=5.70273083937954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:08:52,163] [INFO] [logging.py:68:log_dist] [Rank 0] step=2970, skipped=6, lr=[4.526666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 10:08:52,165] [INFO] [timer.py:197:stop] 0/5940, RunningAvgSamplesPerSec=6.326963721464207, CurrSamplesPerSec=5.709717668357206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:09:03,481] [INFO] [timer.py:197:stop] 0/5942, RunningAvgSamplesPerSec=6.326964574888375, CurrSamplesPerSec=5.6890902366785525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:09:14,809] [INFO] [timer.py:197:stop] 0/5944, RunningAvgSamplesPerSec=6.326963549494113, CurrSamplesPerSec=5.69632698675711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:09:26,105] [INFO] [timer.py:197:stop] 0/5946, RunningAvgSamplesPerSec=6.326967258159801, CurrSamplesPerSec=5.711505209655833, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:09:37,494] [INFO] [timer.py:197:stop] 0/5948, RunningAvgSamplesPerSec=6.326956425764327, CurrSamplesPerSec=5.7083356849108675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:09:48,863] [INFO] [timer.py:197:stop] 0/5950, RunningAvgSamplesPerSec=6.326951969316135, CurrSamplesPerSec=5.683016773524603, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:10:00,211] [INFO] [timer.py:197:stop] 0/5952, RunningAvgSamplesPerSec=6.326939747879126, CurrSamplesPerSec=5.629318888278358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:10:11,513] [INFO] [timer.py:197:stop] 0/5954, RunningAvgSamplesPerSec=6.3269391516184, CurrSamplesPerSec=5.691589340679574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:10:22,842] [INFO] [timer.py:197:stop] 0/5956, RunningAvgSamplesPerSec=6.326934073077021, CurrSamplesPerSec=5.66583837591072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:10:34,136] [INFO] [timer.py:197:stop] 0/5958, RunningAvgSamplesPerSec=6.326935015127865, CurrSamplesPerSec=5.703295215112671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:10:45,799] [INFO] [logging.py:68:log_dist] [Rank 0] step=2980, skipped=6, lr=[4.504444444444444e-06], mom=[[0.9, 0.999]] [2022-12-19 10:10:45,801] [INFO] [timer.py:197:stop] 0/5960, RunningAvgSamplesPerSec=6.326856385601546, CurrSamplesPerSec=5.342274264791985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:10:57,087] [INFO] [timer.py:197:stop] 0/5962, RunningAvgSamplesPerSec=6.326862635775639, CurrSamplesPerSec=5.71736258055326, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:11:08,410] [INFO] [timer.py:197:stop] 0/5964, RunningAvgSamplesPerSec=6.326859315450841, CurrSamplesPerSec=5.6833195009502, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:11:19,769] [INFO] [timer.py:197:stop] 0/5966, RunningAvgSamplesPerSec=6.3268480148935815, CurrSamplesPerSec=5.705248977420315, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:11:31,052] [INFO] [timer.py:197:stop] 0/5968, RunningAvgSamplesPerSec=6.326848027383447, CurrSamplesPerSec=5.686951373835971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:11:42,358] [INFO] [timer.py:197:stop] 0/5970, RunningAvgSamplesPerSec=6.326849287343159, CurrSamplesPerSec=5.692717901457818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:11:53,679] [INFO] [timer.py:197:stop] 0/5972, RunningAvgSamplesPerSec=6.326852992624097, CurrSamplesPerSec=5.70224409748054, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0008, 'learning_rate': 4.4911111111111115e-06, 'epoch': 22.37} [2022-12-19 10:12:04,967] [INFO] [timer.py:197:stop] 0/5974, RunningAvgSamplesPerSec=6.326855429804918, CurrSamplesPerSec=5.691295143511307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:12:16,267] [INFO] [timer.py:197:stop] 0/5976, RunningAvgSamplesPerSec=6.326855196171432, CurrSamplesPerSec=5.690240965561461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:12:27,590] [INFO] [timer.py:197:stop] 0/5978, RunningAvgSamplesPerSec=6.326848469842774, CurrSamplesPerSec=5.669094960257375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:12:38,794] [INFO] [logging.py:68:log_dist] [Rank 0] step=2990, skipped=6, lr=[4.482222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 10:12:38,795] [INFO] [timer.py:197:stop] 0/5980, RunningAvgSamplesPerSec=6.326853938473931, CurrSamplesPerSec=5.701912705051104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:12:50,151] [INFO] [timer.py:197:stop] 0/5982, RunningAvgSamplesPerSec=6.3268435096766895, CurrSamplesPerSec=5.650225503466684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:13:01,481] [INFO] [timer.py:197:stop] 0/5984, RunningAvgSamplesPerSec=6.32683696248987, CurrSamplesPerSec=5.674698150795881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:13:12,859] [INFO] [timer.py:197:stop] 0/5986, 
RunningAvgSamplesPerSec=6.326825400990913, CurrSamplesPerSec=5.634763257430251, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:13:24,199] [INFO] [timer.py:197:stop] 0/5988, RunningAvgSamplesPerSec=6.3268239588553365, CurrSamplesPerSec=5.693355404119485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:13:35,546] [INFO] [timer.py:197:stop] 0/5990, RunningAvgSamplesPerSec=6.326819849938619, CurrSamplesPerSec=5.6655477913846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:13:47,482] [INFO] [timer.py:197:stop] 0/5992, RunningAvgSamplesPerSec=6.3268090296776816, CurrSamplesPerSec=5.650846386664215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:13:59,164] [INFO] [timer.py:197:stop] 0/5994, RunningAvgSamplesPerSec=6.326795131353419, CurrSamplesPerSec=5.642845554149048, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:14:10,793] [INFO] [timer.py:197:stop] 0/5996, RunningAvgSamplesPerSec=6.326783338306078, CurrSamplesPerSec=5.650995085841466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:14:22,101] [INFO] [timer.py:197:stop] 0/5998, RunningAvgSamplesPerSec=6.326784150319048, CurrSamplesPerSec=5.707723464749703, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:14:33,490] [INFO] [logging.py:68:log_dist] [Rank 0] step=3000, skipped=6, lr=[4.4600000000000005e-06], mom=[[0.9, 0.999]] [2022-12-19 10:14:33,491] [INFO] [timer.py:197:stop] 0/6000, RunningAvgSamplesPerSec=6.326779954974161, CurrSamplesPerSec=5.690617085410944, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:14:44,826] [INFO] [timer.py:197:stop] 0/6002, RunningAvgSamplesPerSec=6.3267751031391, CurrSamplesPerSec=5.6754506556203665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:14:56,308] [INFO] [timer.py:197:stop] 0/6004, RunningAvgSamplesPerSec=6.326778540555864, CurrSamplesPerSec=5.68618642135863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:15:07,603] [INFO] [timer.py:197:stop] 0/6006, 
RunningAvgSamplesPerSec=6.326782410766883, CurrSamplesPerSec=5.680037664424787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:15:18,952] [INFO] [timer.py:197:stop] 0/6008, RunningAvgSamplesPerSec=6.326790784882522, CurrSamplesPerSec=5.7196143222044284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:15:30,397] [INFO] [timer.py:197:stop] 0/6010, RunningAvgSamplesPerSec=6.3267987448118195, CurrSamplesPerSec=5.710988294275674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:15:41,828] [INFO] [timer.py:197:stop] 0/6012, RunningAvgSamplesPerSec=6.326807707658075, CurrSamplesPerSec=5.720318812376589, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:15:53,101] [INFO] [timer.py:197:stop] 0/6014, RunningAvgSamplesPerSec=6.326813039696776, CurrSamplesPerSec=5.696548928780686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:16:04,480] [INFO] [timer.py:197:stop] 0/6016, RunningAvgSamplesPerSec=6.326817658559794, CurrSamplesPerSec=5.706221629964608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:16:15,782] [INFO] [timer.py:197:stop] 0/6018, RunningAvgSamplesPerSec=6.326814536121615, CurrSamplesPerSec=5.721109803263296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:16:27,222] [INFO] [logging.py:68:log_dist] [Rank 0] step=3010, skipped=6, lr=[4.437777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 10:16:27,224] [INFO] [timer.py:197:stop] 0/6020, RunningAvgSamplesPerSec=6.326815685473837, CurrSamplesPerSec=5.682755462430516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:16:38,619] [INFO] [timer.py:197:stop] 0/6022, RunningAvgSamplesPerSec=6.326820754437113, CurrSamplesPerSec=5.7102598624902585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0009, 'learning_rate': 4.4355555555555555e-06, 'epoch': 22.55} {'eval_loss': 0.30419921875, 'eval_wer': 16.237770530684852, 'eval_runtime': 1375.4073, 'eval_samples_per_second': 3.367, 'eval_steps_per_second': 0.421, 'epoch': 22.55} [2022-12-19 
10:39:37,785] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step3011 is begin to save!
[2022-12-19 10:39:37,796] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-3000/global_step3011/mp_rank_00_model_states.pt
[2022-12-19 10:39:37,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-3000/global_step3011/mp_rank_00_model_states.pt...
[2022-12-19 10:39:41,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-3000/global_step3011/mp_rank_00_model_states.pt.
[2022-12-19 10:39:41,731] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-3000/global_step3011/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2022-12-19 10:39:59,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-3000/global_step3011/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2022-12-19 10:39:59,121] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-3000/global_step3011/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-12-19 10:39:59,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3011 is ready now!
[2022-12-19 10:42:05,711] [INFO] [timer.py:197:stop] 0/6024, RunningAvgSamplesPerSec=6.326761887527537, CurrSamplesPerSec=5.438508460362663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:42:17,124] [INFO] [timer.py:197:stop] 0/6026, RunningAvgSamplesPerSec=6.326764560887732, CurrSamplesPerSec=5.691099913779776, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:42:28,441] [INFO] [timer.py:197:stop] 0/6028, RunningAvgSamplesPerSec=6.326770034944391, CurrSamplesPerSec=5.7122585133967965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:42:39,637] [INFO] [timer.py:197:stop] 0/6030, RunningAvgSamplesPerSec=6.326788309166337, CurrSamplesPerSec=5.749127261112158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:42:50,890] [INFO] [timer.py:197:stop] 0/6032, RunningAvgSamplesPerSec=6.326799810386584, CurrSamplesPerSec=5.712630255345956, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:43:02,169] [INFO] [timer.py:197:stop] 0/6034, RunningAvgSamplesPerSec=6.326807189882363, CurrSamplesPerSec=5.716685601928798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:43:13,415] [INFO] [timer.py:197:stop] 0/6036, RunningAvgSamplesPerSec=6.32681319841987, CurrSamplesPerSec=5.71400361577984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:43:24,673] [INFO] [timer.py:197:stop] 0/6038, RunningAvgSamplesPerSec=6.326822272957944, CurrSamplesPerSec=5.714242993888271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:43:35,961] [INFO] [logging.py:68:log_dist] [Rank 0] step=3020, skipped=6, lr=[4.415555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 10:43:35,963] [INFO] [timer.py:197:stop] 0/6040, RunningAvgSamplesPerSec=6.326828730523351, CurrSamplesPerSec=5.701537270405908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:43:47,279] [INFO] [timer.py:197:stop] 0/6042, RunningAvgSamplesPerSec=6.326835761637122, CurrSamplesPerSec=5.735824363708874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
10:43:58,545] [INFO] [timer.py:197:stop] 0/6044, RunningAvgSamplesPerSec=6.326840276923866, CurrSamplesPerSec=5.71355580876726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:44:09,807] [INFO] [timer.py:197:stop] 0/6046, RunningAvgSamplesPerSec=6.32685027897324, CurrSamplesPerSec=5.724633880818117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:44:21,096] [INFO] [timer.py:197:stop] 0/6048, RunningAvgSamplesPerSec=6.326860049745313, CurrSamplesPerSec=5.716614991090585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:44:32,390] [INFO] [timer.py:197:stop] 0/6050, RunningAvgSamplesPerSec=6.326858041856899, CurrSamplesPerSec=5.688352915084405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:44:43,846] [INFO] [timer.py:197:stop] 0/6052, RunningAvgSamplesPerSec=6.326860540254029, CurrSamplesPerSec=5.693900533823878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:44:55,108] [INFO] [timer.py:197:stop] 0/6054, RunningAvgSamplesPerSec=6.326864413088513, CurrSamplesPerSec=5.7018143606164235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:45:06,341] [INFO] [timer.py:197:stop] 0/6056, RunningAvgSamplesPerSec=6.32687690114365, CurrSamplesPerSec=5.7256634706725205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:45:17,894] [INFO] [timer.py:197:stop] 0/6058, RunningAvgSamplesPerSec=6.326878395956099, CurrSamplesPerSec=5.647484544156928, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:45:29,137] [INFO] [logging.py:68:log_dist] [Rank 0] step=3030, skipped=6, lr=[4.393333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 10:45:29,139] [INFO] [timer.py:197:stop] 0/6060, RunningAvgSamplesPerSec=6.326884485513943, CurrSamplesPerSec=5.7110828242297496, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:45:40,408] [INFO] [timer.py:197:stop] 0/6062, RunningAvgSamplesPerSec=6.326890388869789, CurrSamplesPerSec=5.6898364328742455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
10:45:51,728] [INFO] [timer.py:197:stop] 0/6064, RunningAvgSamplesPerSec=6.3268890115334475, CurrSamplesPerSec=5.678600815105825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:46:03,032] [INFO] [timer.py:197:stop] 0/6066, RunningAvgSamplesPerSec=6.3268874158503765, CurrSamplesPerSec=5.696402657856941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:46:14,271] [INFO] [timer.py:197:stop] 0/6068, RunningAvgSamplesPerSec=6.326899567169986, CurrSamplesPerSec=5.706185240482239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:46:25,603] [INFO] [timer.py:197:stop] 0/6070, RunningAvgSamplesPerSec=6.326907265602215, CurrSamplesPerSec=5.7104194795738, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:46:36,893] [INFO] [timer.py:197:stop] 0/6072, RunningAvgSamplesPerSec=6.326913383388506, CurrSamplesPerSec=5.69545727611227, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0014, 'learning_rate': 4.38e-06, 'epoch': 22.74} [2022-12-19 10:46:48,191] [INFO] [timer.py:197:stop] 0/6074, RunningAvgSamplesPerSec=6.326915999054165, CurrSamplesPerSec=5.699317161668229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:46:59,438] [INFO] [timer.py:197:stop] 0/6076, RunningAvgSamplesPerSec=6.326928389139036, CurrSamplesPerSec=5.725689605973998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:47:10,845] [INFO] [timer.py:197:stop] 0/6078, RunningAvgSamplesPerSec=6.326936348447261, CurrSamplesPerSec=5.729810486149052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:47:22,086] [INFO] [logging.py:68:log_dist] [Rank 0] step=3040, skipped=6, lr=[4.371111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 10:47:22,088] [INFO] [timer.py:197:stop] 0/6080, RunningAvgSamplesPerSec=6.326949272268606, CurrSamplesPerSec=5.7311138092845875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:47:33,334] [INFO] [timer.py:197:stop] 0/6082, RunningAvgSamplesPerSec=6.326958758305513, CurrSamplesPerSec=5.720723302274591, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:47:44,644] [INFO] [timer.py:197:stop] 0/6084, RunningAvgSamplesPerSec=6.3269667478598475, CurrSamplesPerSec=5.722809808180294, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:47:55,868] [INFO] [timer.py:197:stop] 0/6086, RunningAvgSamplesPerSec=6.326975359715211, CurrSamplesPerSec=5.7243887484471125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:48:07,119] [INFO] [timer.py:197:stop] 0/6088, RunningAvgSamplesPerSec=6.326978719425123, CurrSamplesPerSec=5.70558124286684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:48:18,407] [INFO] [timer.py:197:stop] 0/6090, RunningAvgSamplesPerSec=6.3269838124563496, CurrSamplesPerSec=5.703181797882223, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:48:29,872] [INFO] [timer.py:197:stop] 0/6092, RunningAvgSamplesPerSec=6.326986761392962, CurrSamplesPerSec=5.697480161634246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:48:41,141] [INFO] [timer.py:197:stop] 0/6094, RunningAvgSamplesPerSec=6.326992859732046, CurrSamplesPerSec=5.72160367389274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:48:52,442] [INFO] [timer.py:197:stop] 0/6096, RunningAvgSamplesPerSec=6.326990819706599, CurrSamplesPerSec=5.681417281511372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:49:03,711] [INFO] [timer.py:197:stop] 0/6098, RunningAvgSamplesPerSec=6.326999632228783, CurrSamplesPerSec=5.705998448445504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:49:14,980] [INFO] [logging.py:68:log_dist] [Rank 0] step=3050, skipped=6, lr=[4.348888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 10:49:14,983] [INFO] [timer.py:197:stop] 0/6100, RunningAvgSamplesPerSec=6.327008253674115, CurrSamplesPerSec=5.725880376475969, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:49:26,264] [INFO] [timer.py:197:stop] 0/6102, RunningAvgSamplesPerSec=6.32701415615018, CurrSamplesPerSec=5.695239527447513, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:49:37,771] [INFO] [timer.py:197:stop] 0/6104, RunningAvgSamplesPerSec=6.32701941954536, CurrSamplesPerSec=5.69024458418137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:49:49,029] [INFO] [timer.py:197:stop] 0/6106, RunningAvgSamplesPerSec=6.327026432878404, CurrSamplesPerSec=5.711487953330966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:50:00,304] [INFO] [timer.py:197:stop] 0/6108, RunningAvgSamplesPerSec=6.32703174587746, CurrSamplesPerSec=5.711076262923646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:50:11,564] [INFO] [timer.py:197:stop] 0/6110, RunningAvgSamplesPerSec=6.327040125532337, CurrSamplesPerSec=5.7332789242132876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:50:22,826] [INFO] [timer.py:197:stop] 0/6112, RunningAvgSamplesPerSec=6.327042650157597, CurrSamplesPerSec=5.696819972942406, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:50:34,103] [INFO] [timer.py:197:stop] 0/6114, RunningAvgSamplesPerSec=6.327054404599647, CurrSamplesPerSec=5.726820980266411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:50:45,534] [INFO] [timer.py:197:stop] 0/6116, RunningAvgSamplesPerSec=6.327056723147415, CurrSamplesPerSec=5.702900455381463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:50:56,820] [INFO] [timer.py:197:stop] 0/6118, RunningAvgSamplesPerSec=6.327059432413841, CurrSamplesPerSec=5.698478473698988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:51:08,090] [INFO] [logging.py:68:log_dist] [Rank 0] step=3060, skipped=6, lr=[4.326666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 10:51:08,091] [INFO] [timer.py:197:stop] 0/6120, RunningAvgSamplesPerSec=6.32706328882975, CurrSamplesPerSec=5.69160333933403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:51:19,377] [INFO] [timer.py:197:stop] 0/6122, RunningAvgSamplesPerSec=6.327065481539063, CurrSamplesPerSec=5.703493463754688, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0013, 'learning_rate': 4.324444444444445e-06, 'epoch': 22.93} [2022-12-19 10:51:30,644] [INFO] [timer.py:197:stop] 0/6124, RunningAvgSamplesPerSec=6.327071927142479, CurrSamplesPerSec=5.703324781884078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:51:42,041] [INFO] [timer.py:197:stop] 0/6126, RunningAvgSamplesPerSec=6.3270740223857995, CurrSamplesPerSec=5.696163563381969, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:51:53,317] [INFO] [timer.py:197:stop] 0/6128, RunningAvgSamplesPerSec=6.327071826879103, CurrSamplesPerSec=5.6841262898552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:52:04,787] [INFO] [timer.py:197:stop] 0/6130, RunningAvgSamplesPerSec=6.327078833671455, CurrSamplesPerSec=5.713600075535566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:52:16,050] [INFO] [timer.py:197:stop] 0/6132, RunningAvgSamplesPerSec=6.327081196164336, CurrSamplesPerSec=5.695300669397985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:52:27,421] [INFO] [timer.py:197:stop] 0/6134, RunningAvgSamplesPerSec=6.327090102194974, CurrSamplesPerSec=5.727138167776327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:52:38,652] [INFO] [timer.py:197:stop] 0/6136, RunningAvgSamplesPerSec=6.327104046417598, CurrSamplesPerSec=5.745165698528648, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:52:49,846] [INFO] [timer.py:197:stop] 0/6138, RunningAvgSamplesPerSec=6.32711232513379, CurrSamplesPerSec=5.716946146665365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:53:01,194] [INFO] [logging.py:68:log_dist] [Rank 0] step=3070, skipped=6, lr=[4.304444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 10:53:01,195] [INFO] [timer.py:197:stop] 0/6140, RunningAvgSamplesPerSec=6.327105732978362, CurrSamplesPerSec=5.668607957725034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:53:11,571] [INFO] [timer.py:197:stop] 0/6142, 
RunningAvgSamplesPerSec=6.327273908813965, CurrSamplesPerSec=5.702436700691527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:53:22,867] [INFO] [timer.py:197:stop] 0/6144, RunningAvgSamplesPerSec=6.327274495416698, CurrSamplesPerSec=5.69301514345606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:53:34,125] [INFO] [timer.py:197:stop] 0/6146, RunningAvgSamplesPerSec=6.327280995337194, CurrSamplesPerSec=5.713020770216942, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:53:45,393] [INFO] [timer.py:197:stop] 0/6148, RunningAvgSamplesPerSec=6.32728460602732, CurrSamplesPerSec=5.689499728004075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:53:56,670] [INFO] [timer.py:197:stop] 0/6150, RunningAvgSamplesPerSec=6.3272774363998865, CurrSamplesPerSec=5.7160448116198745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:54:07,986] [INFO] [timer.py:197:stop] 0/6152, RunningAvgSamplesPerSec=6.32727711257631, CurrSamplesPerSec=5.692890785797223, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:54:19,262] [INFO] [timer.py:197:stop] 0/6154, RunningAvgSamplesPerSec=6.327278756824882, CurrSamplesPerSec=5.698396699102621, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:54:30,549] [INFO] [timer.py:197:stop] 0/6156, RunningAvgSamplesPerSec=6.32727809422466, CurrSamplesPerSec=5.6943851275178625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:54:41,901] [INFO] [timer.py:197:stop] 0/6158, RunningAvgSamplesPerSec=6.327261356238215, CurrSamplesPerSec=5.60578751600644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:54:53,215] [INFO] [logging.py:68:log_dist] [Rank 0] step=3080, skipped=6, lr=[4.282222222222222e-06], mom=[[0.9, 0.999]] [2022-12-19 10:54:53,216] [INFO] [timer.py:197:stop] 0/6160, RunningAvgSamplesPerSec=6.327261778233376, CurrSamplesPerSec=5.686400587308105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:55:04,579] [INFO] [timer.py:197:stop] 0/6162, 
RunningAvgSamplesPerSec=6.327261172589079, CurrSamplesPerSec=5.6965329715876045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:55:15,868] [INFO] [timer.py:197:stop] 0/6164, RunningAvgSamplesPerSec=6.327258906574479, CurrSamplesPerSec=5.683610226640229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:55:27,176] [INFO] [timer.py:197:stop] 0/6166, RunningAvgSamplesPerSec=6.3272574135671515, CurrSamplesPerSec=5.670636489232706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:55:38,589] [INFO] [timer.py:197:stop] 0/6168, RunningAvgSamplesPerSec=6.327247669219849, CurrSamplesPerSec=5.617188603501507, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:55:49,946] [INFO] [timer.py:197:stop] 0/6170, RunningAvgSamplesPerSec=6.3272473329776275, CurrSamplesPerSec=5.698697921863423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:56:01,301] [INFO] [timer.py:197:stop] 0/6172, RunningAvgSamplesPerSec=6.327250608765455, CurrSamplesPerSec=5.698032854797171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0008, 'learning_rate': 4.268888888888889e-06, 'epoch': 23.12} [2022-12-19 10:56:12,640] [INFO] [timer.py:197:stop] 0/6174, RunningAvgSamplesPerSec=6.327248103189646, CurrSamplesPerSec=5.671972711864149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:56:23,999] [INFO] [timer.py:197:stop] 0/6176, RunningAvgSamplesPerSec=6.327250994218186, CurrSamplesPerSec=5.710210545774247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:56:35,298] [INFO] [timer.py:197:stop] 0/6178, RunningAvgSamplesPerSec=6.3272514626735505, CurrSamplesPerSec=5.69936048203257, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:56:46,590] [INFO] [logging.py:68:log_dist] [Rank 0] step=3090, skipped=6, lr=[4.26e-06], mom=[[0.9, 0.999]] [2022-12-19 10:56:46,592] [INFO] [timer.py:197:stop] 0/6180, RunningAvgSamplesPerSec=6.327257214560063, CurrSamplesPerSec=5.702066041935323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
10:56:57,898] [INFO] [timer.py:197:stop] 0/6182, RunningAvgSamplesPerSec=6.327262766102019, CurrSamplesPerSec=5.709976849570109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:57:09,148] [INFO] [timer.py:197:stop] 0/6184, RunningAvgSamplesPerSec=6.32726483377244, CurrSamplesPerSec=5.7038301302468835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:57:20,551] [INFO] [timer.py:197:stop] 0/6186, RunningAvgSamplesPerSec=6.327261566930227, CurrSamplesPerSec=5.682463140101337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:57:31,909] [INFO] [timer.py:197:stop] 0/6188, RunningAvgSamplesPerSec=6.327258616972464, CurrSamplesPerSec=5.674133423363426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:57:43,212] [INFO] [timer.py:197:stop] 0/6190, RunningAvgSamplesPerSec=6.327262718419701, CurrSamplesPerSec=5.692825107828098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:57:54,535] [INFO] [timer.py:197:stop] 0/6192, RunningAvgSamplesPerSec=6.327264761122561, CurrSamplesPerSec=5.669983704118327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:58:05,856] [INFO] [timer.py:197:stop] 0/6194, RunningAvgSamplesPerSec=6.32726868375492, CurrSamplesPerSec=5.693701502285367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:58:17,272] [INFO] [timer.py:197:stop] 0/6196, RunningAvgSamplesPerSec=6.327262512748024, CurrSamplesPerSec=5.675874987392722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:58:28,586] [INFO] [timer.py:197:stop] 0/6198, RunningAvgSamplesPerSec=6.327260161972137, CurrSamplesPerSec=5.681143853467182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:58:39,865] [INFO] [logging.py:68:log_dist] [Rank 0] step=3100, skipped=6, lr=[4.2377777777777775e-06], mom=[[0.9, 0.999]] [2022-12-19 10:58:39,865] [INFO] [timer.py:197:stop] 0/6200, RunningAvgSamplesPerSec=6.327266045592419, CurrSamplesPerSec=5.701361923033709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:58:51,324] 
[INFO] [timer.py:197:stop] 0/6202, RunningAvgSamplesPerSec=6.3272670842140775, CurrSamplesPerSec=5.699359513972542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:59:02,594] [INFO] [timer.py:197:stop] 0/6204, RunningAvgSamplesPerSec=6.327270603961485, CurrSamplesPerSec=5.726513845261277, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:59:13,974] [INFO] [timer.py:197:stop] 0/6206, RunningAvgSamplesPerSec=6.327276229209095, CurrSamplesPerSec=5.716497634866609, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:59:25,629] [INFO] [timer.py:197:stop] 0/6208, RunningAvgSamplesPerSec=6.327202555192295, CurrSamplesPerSec=5.345300339318018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:59:36,953] [INFO] [timer.py:197:stop] 0/6210, RunningAvgSamplesPerSec=6.3272092445918515, CurrSamplesPerSec=5.698736393569498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:59:48,355] [INFO] [timer.py:197:stop] 0/6212, RunningAvgSamplesPerSec=6.32718454580575, CurrSamplesPerSec=5.569730727469691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 10:59:59,853] [INFO] [timer.py:197:stop] 0/6214, RunningAvgSamplesPerSec=6.327196232616864, CurrSamplesPerSec=5.727615237028793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:00:11,107] [INFO] [timer.py:197:stop] 0/6216, RunningAvgSamplesPerSec=6.32720380015745, CurrSamplesPerSec=5.711213810495127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:00:22,729] [INFO] [timer.py:197:stop] 0/6218, RunningAvgSamplesPerSec=6.327210685252948, CurrSamplesPerSec=5.718922920507628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:00:34,029] [INFO] [logging.py:68:log_dist] [Rank 0] step=3110, skipped=6, lr=[4.215555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 11:00:34,030] [INFO] [timer.py:197:stop] 0/6220, RunningAvgSamplesPerSec=6.327204681936038, CurrSamplesPerSec=5.633999744950366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:00:45,344] [INFO] 
[timer.py:197:stop] 0/6222, RunningAvgSamplesPerSec=6.327207084049242, CurrSamplesPerSec=5.679151773732535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0007, 'learning_rate': 4.213333333333333e-06, 'epoch': 23.31} [2022-12-19 11:00:56,788] [INFO] [timer.py:197:stop] 0/6224, RunningAvgSamplesPerSec=6.327210586630671, CurrSamplesPerSec=5.698925614174507, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:01:08,058] [INFO] [timer.py:197:stop] 0/6226, RunningAvgSamplesPerSec=6.32722084476299, CurrSamplesPerSec=5.720849610417698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:01:19,453] [INFO] [timer.py:197:stop] 0/6228, RunningAvgSamplesPerSec=6.327192397898032, CurrSamplesPerSec=5.655946832356997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:01:30,720] [INFO] [timer.py:197:stop] 0/6230, RunningAvgSamplesPerSec=6.327201049206538, CurrSamplesPerSec=5.705797114880445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:01:42,187] [INFO] [timer.py:197:stop] 0/6232, RunningAvgSamplesPerSec=6.327168462948351, CurrSamplesPerSec=5.515368051235686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:01:53,572] [INFO] [timer.py:197:stop] 0/6234, RunningAvgSamplesPerSec=6.327168528806226, CurrSamplesPerSec=5.713845744243135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:02:04,856] [INFO] [timer.py:197:stop] 0/6236, RunningAvgSamplesPerSec=6.327174191367501, CurrSamplesPerSec=5.712601807718791, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:02:16,140] [INFO] [timer.py:197:stop] 0/6238, RunningAvgSamplesPerSec=6.327171667677079, CurrSamplesPerSec=5.677078485860062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:02:27,514] [INFO] [logging.py:68:log_dist] [Rank 0] step=3120, skipped=6, lr=[4.1933333333333336e-06], mom=[[0.9, 0.999]] [2022-12-19 11:02:27,515] [INFO] [timer.py:197:stop] 0/6240, RunningAvgSamplesPerSec=6.327157256156985, CurrSamplesPerSec=5.7065739041466435, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:02:38,954] [INFO] [timer.py:197:stop] 0/6242, RunningAvgSamplesPerSec=6.327158221857303, CurrSamplesPerSec=5.708536469284858, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:02:50,281] [INFO] [timer.py:197:stop] 0/6244, RunningAvgSamplesPerSec=6.327155968487, CurrSamplesPerSec=5.699792753691997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:03:01,927] [INFO] [timer.py:197:stop] 0/6246, RunningAvgSamplesPerSec=6.327149307938019, CurrSamplesPerSec=5.670178685470472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:03:13,287] [INFO] [timer.py:197:stop] 0/6248, RunningAvgSamplesPerSec=6.32713724389844, CurrSamplesPerSec=5.646811182659259, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:03:24,625] [INFO] [timer.py:197:stop] 0/6250, RunningAvgSamplesPerSec=6.327129170582267, CurrSamplesPerSec=5.688087979284194, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:03:36,183] [INFO] [timer.py:197:stop] 0/6252, RunningAvgSamplesPerSec=6.327130026939177, CurrSamplesPerSec=5.682758109107501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:03:47,546] [INFO] [timer.py:197:stop] 0/6254, RunningAvgSamplesPerSec=6.3271187863008915, CurrSamplesPerSec=5.676775702616508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:03:59,000] [INFO] [timer.py:197:stop] 0/6256, RunningAvgSamplesPerSec=6.327120615767011, CurrSamplesPerSec=5.697514747164152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:04:10,576] [INFO] [timer.py:197:stop] 0/6258, RunningAvgSamplesPerSec=6.327066060666655, CurrSamplesPerSec=5.420202343108089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:04:21,897] [INFO] [logging.py:68:log_dist] [Rank 0] step=3130, skipped=6, lr=[4.171111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 11:04:21,898] [INFO] [timer.py:197:stop] 0/6260, RunningAvgSamplesPerSec=6.327060754270112, CurrSamplesPerSec=5.670689916466391, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:04:33,364] [INFO] [timer.py:197:stop] 0/6262, RunningAvgSamplesPerSec=6.32705763904573, CurrSamplesPerSec=5.661791532192272, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:04:44,733] [INFO] [timer.py:197:stop] 0/6264, RunningAvgSamplesPerSec=6.327040786934852, CurrSamplesPerSec=5.689076250383443, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:04:56,002] [INFO] [timer.py:197:stop] 0/6266, RunningAvgSamplesPerSec=6.3270442624725405, CurrSamplesPerSec=5.711625763791807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:05:07,408] [INFO] [timer.py:197:stop] 0/6268, RunningAvgSamplesPerSec=6.327018144401937, CurrSamplesPerSec=5.542274536438821, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:05:18,704] [INFO] [timer.py:197:stop] 0/6270, RunningAvgSamplesPerSec=6.327017561023724, CurrSamplesPerSec=5.693643534550888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:05:30,003] [INFO] [timer.py:197:stop] 0/6272, RunningAvgSamplesPerSec=6.327016340978862, CurrSamplesPerSec=5.667957554378917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0007, 'learning_rate': 4.157777777777778e-06, 'epoch': 23.49} [2022-12-19 11:05:41,529] [INFO] [timer.py:197:stop] 0/6274, RunningAvgSamplesPerSec=6.327026269681883, CurrSamplesPerSec=5.709424994339185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:05:52,779] [INFO] [timer.py:197:stop] 0/6276, RunningAvgSamplesPerSec=6.327037324312627, CurrSamplesPerSec=5.729613094304274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:06:04,286] [INFO] [timer.py:197:stop] 0/6278, RunningAvgSamplesPerSec=6.327018969305853, CurrSamplesPerSec=5.6582818801226615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:06:15,677] [INFO] [logging.py:68:log_dist] [Rank 0] step=3140, skipped=6, lr=[4.148888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 11:06:15,678] [INFO] [timer.py:197:stop] 0/6280, 
RunningAvgSamplesPerSec=6.3270224987964525, CurrSamplesPerSec=5.700348074530658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:06:27,024] [INFO] [timer.py:197:stop] 0/6282, RunningAvgSamplesPerSec=6.327011520142965, CurrSamplesPerSec=5.6104314157040145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:06:38,261] [INFO] [timer.py:197:stop] 0/6284, RunningAvgSamplesPerSec=6.327019179744702, CurrSamplesPerSec=5.723443573959128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:06:49,611] [INFO] [timer.py:197:stop] 0/6286, RunningAvgSamplesPerSec=6.3270259205933925, CurrSamplesPerSec=5.704904383852458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:07:00,910] [INFO] [timer.py:197:stop] 0/6288, RunningAvgSamplesPerSec=6.327035421475837, CurrSamplesPerSec=5.7436625322465975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:07:12,293] [INFO] [timer.py:197:stop] 0/6290, RunningAvgSamplesPerSec=6.327034877191592, CurrSamplesPerSec=5.663493025835873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:07:23,675] [INFO] [timer.py:197:stop] 0/6292, RunningAvgSamplesPerSec=6.327020241286576, CurrSamplesPerSec=5.589402892412958, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:07:34,946] [INFO] [timer.py:197:stop] 0/6294, RunningAvgSamplesPerSec=6.327028168245351, CurrSamplesPerSec=5.718962640506195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:07:46,464] [INFO] [timer.py:197:stop] 0/6296, RunningAvgSamplesPerSec=6.327033314226343, CurrSamplesPerSec=5.696637178508264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:07:58,074] [INFO] [timer.py:197:stop] 0/6298, RunningAvgSamplesPerSec=6.326975378337808, CurrSamplesPerSec=5.70488619745259, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:08:09,494] [INFO] [logging.py:68:log_dist] [Rank 0] step=3150, skipped=6, lr=[4.126666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 11:08:09,496] [INFO] [timer.py:197:stop] 0/6300, 
RunningAvgSamplesPerSec=6.326978656109581, CurrSamplesPerSec=5.699656965889001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:08:20,881] [INFO] [timer.py:197:stop] 0/6302, RunningAvgSamplesPerSec=6.326961792569862, CurrSamplesPerSec=5.601718657411167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:08:32,134] [INFO] [timer.py:197:stop] 0/6304, RunningAvgSamplesPerSec=6.326963244296735, CurrSamplesPerSec=5.69401744718563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:08:43,692] [INFO] [timer.py:197:stop] 0/6306, RunningAvgSamplesPerSec=6.326908503043957, CurrSamplesPerSec=5.3873461051160385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:08:55,005] [INFO] [timer.py:197:stop] 0/6308, RunningAvgSamplesPerSec=6.326914544342855, CurrSamplesPerSec=5.70276742712076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:09:06,290] [INFO] [timer.py:197:stop] 0/6310, RunningAvgSamplesPerSec=6.326912639727731, CurrSamplesPerSec=5.687040772264228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:09:17,918] [INFO] [timer.py:197:stop] 0/6312, RunningAvgSamplesPerSec=6.326902910914335, CurrSamplesPerSec=5.707576619410886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:09:29,124] [INFO] [timer.py:197:stop] 0/6314, RunningAvgSamplesPerSec=6.326911901294146, CurrSamplesPerSec=5.7121199431008405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:09:40,367] [INFO] [timer.py:197:stop] 0/6316, RunningAvgSamplesPerSec=6.326920263673688, CurrSamplesPerSec=5.709680505594769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:09:51,665] [INFO] [timer.py:197:stop] 0/6318, RunningAvgSamplesPerSec=6.326920864385284, CurrSamplesPerSec=5.702171178150874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:10:03,164] [INFO] [logging.py:68:log_dist] [Rank 0] step=3160, skipped=6, lr=[4.104444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 11:10:03,166] [INFO] [timer.py:197:stop] 0/6320, 
RunningAvgSamplesPerSec=6.326928441930766, CurrSamplesPerSec=5.717109060067421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:10:14,683] [INFO] [timer.py:197:stop] 0/6322, RunningAvgSamplesPerSec=6.326883869503104, CurrSamplesPerSec=5.728803122988968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0006, 'learning_rate': 4.102222222222222e-06, 'epoch': 23.68} [2022-12-19 11:10:26,155] [INFO] [timer.py:197:stop] 0/6324, RunningAvgSamplesPerSec=6.3268878036679155, CurrSamplesPerSec=5.687947686607847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:10:37,473] [INFO] [timer.py:197:stop] 0/6326, RunningAvgSamplesPerSec=6.326884155098113, CurrSamplesPerSec=5.639702700027111, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:10:48,722] [INFO] [timer.py:197:stop] 0/6328, RunningAvgSamplesPerSec=6.3268928187878055, CurrSamplesPerSec=5.70551769713892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:11:00,310] [INFO] [timer.py:197:stop] 0/6330, RunningAvgSamplesPerSec=6.3268344599426385, CurrSamplesPerSec=5.4034570230087215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:11:11,685] [INFO] [timer.py:197:stop] 0/6332, RunningAvgSamplesPerSec=6.326844280223183, CurrSamplesPerSec=5.713283412983096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:11:22,942] [INFO] [timer.py:197:stop] 0/6334, RunningAvgSamplesPerSec=6.326848182855663, CurrSamplesPerSec=5.708386911504555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:11:34,594] [INFO] [timer.py:197:stop] 0/6336, RunningAvgSamplesPerSec=6.32683128447851, CurrSamplesPerSec=5.689715831993438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:11:45,934] [INFO] [timer.py:197:stop] 0/6338, RunningAvgSamplesPerSec=6.3268331310510195, CurrSamplesPerSec=5.693300824439715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:11:57,253] [INFO] [logging.py:68:log_dist] [Rank 0] step=3170, skipped=6, lr=[4.0822222222222225e-06], mom=[[0.9, 
0.999]] [2022-12-19 11:11:57,254] [INFO] [timer.py:197:stop] 0/6340, RunningAvgSamplesPerSec=6.326827786896254, CurrSamplesPerSec=5.639840148878837, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:12:08,496] [INFO] [timer.py:197:stop] 0/6342, RunningAvgSamplesPerSec=6.326834616234841, CurrSamplesPerSec=5.70303882104949, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:12:19,801] [INFO] [timer.py:197:stop] 0/6344, RunningAvgSamplesPerSec=6.326846280727263, CurrSamplesPerSec=5.727352985743719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:12:31,307] [INFO] [timer.py:197:stop] 0/6346, RunningAvgSamplesPerSec=6.3268030692259645, CurrSamplesPerSec=5.707925419771607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:12:42,568] [INFO] [timer.py:197:stop] 0/6348, RunningAvgSamplesPerSec=6.326813127004732, CurrSamplesPerSec=5.7318062037624555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:12:53,937] [INFO] [timer.py:197:stop] 0/6350, RunningAvgSamplesPerSec=6.326800424129535, CurrSamplesPerSec=5.596799287173705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:13:05,169] [INFO] [timer.py:197:stop] 0/6352, RunningAvgSamplesPerSec=6.326808252498459, CurrSamplesPerSec=5.705792991329403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:13:16,757] [INFO] [timer.py:197:stop] 0/6354, RunningAvgSamplesPerSec=6.326754094868004, CurrSamplesPerSec=5.39939278934328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:13:28,193] [INFO] [timer.py:197:stop] 0/6356, RunningAvgSamplesPerSec=6.326752180715505, CurrSamplesPerSec=5.654919526208986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:13:39,523] [INFO] [timer.py:197:stop] 0/6358, RunningAvgSamplesPerSec=6.326744195019252, CurrSamplesPerSec=5.723062126293856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:13:50,981] [INFO] [logging.py:68:log_dist] [Rank 0] step=3180, skipped=6, lr=[4.060000000000001e-06], mom=[[0.9, 0.999]] 
[2022-12-19 11:13:50,983] [INFO] [timer.py:197:stop] 0/6360, RunningAvgSamplesPerSec=6.326750292035882, CurrSamplesPerSec=5.7285070221707475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:14:02,205] [INFO] [timer.py:197:stop] 0/6362, RunningAvgSamplesPerSec=6.326763335935045, CurrSamplesPerSec=5.712003743719647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:14:13,446] [INFO] [timer.py:197:stop] 0/6364, RunningAvgSamplesPerSec=6.326775637871619, CurrSamplesPerSec=5.718078940050794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:14:24,693] [INFO] [timer.py:197:stop] 0/6366, RunningAvgSamplesPerSec=6.32678936152753, CurrSamplesPerSec=5.734124458240602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:14:36,184] [INFO] [timer.py:197:stop] 0/6368, RunningAvgSamplesPerSec=6.326798888448008, CurrSamplesPerSec=5.7140797572509525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:14:47,674] [INFO] [timer.py:197:stop] 0/6370, RunningAvgSamplesPerSec=6.326753911822436, CurrSamplesPerSec=5.728961822153146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:14:59,133] [INFO] [timer.py:197:stop] 0/6372, RunningAvgSamplesPerSec=6.326757280607201, CurrSamplesPerSec=5.701119990430859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0005, 'learning_rate': 4.046666666666667e-06, 'epoch': 23.87} [2022-12-19 11:15:10,505] [INFO] [timer.py:197:stop] 0/6374, RunningAvgSamplesPerSec=6.326741781411964, CurrSamplesPerSec=5.576822297628186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:15:21,783] [INFO] [timer.py:197:stop] 0/6376, RunningAvgSamplesPerSec=6.326748509885421, CurrSamplesPerSec=5.711012837741898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:15:33,359] [INFO] [timer.py:197:stop] 0/6378, RunningAvgSamplesPerSec=6.3266989593453715, CurrSamplesPerSec=5.43475817083076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:15:44,853] [INFO] [logging.py:68:log_dist] [Rank 0] 
step=3190, skipped=6, lr=[4.0377777777777786e-06], mom=[[0.9, 0.999]] [2022-12-19 11:15:44,855] [INFO] [timer.py:197:stop] 0/6380, RunningAvgSamplesPerSec=6.326700992376556, CurrSamplesPerSec=5.687328263889719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:15:56,118] [INFO] [timer.py:197:stop] 0/6382, RunningAvgSamplesPerSec=6.326703667049184, CurrSamplesPerSec=5.703754988851524, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:16:07,624] [INFO] [timer.py:197:stop] 0/6384, RunningAvgSamplesPerSec=6.32671023479163, CurrSamplesPerSec=5.70599165623824, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:16:18,908] [INFO] [timer.py:197:stop] 0/6386, RunningAvgSamplesPerSec=6.326712213616765, CurrSamplesPerSec=5.691741640086451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:16:30,193] [INFO] [timer.py:197:stop] 0/6388, RunningAvgSamplesPerSec=6.326715755610062, CurrSamplesPerSec=5.686582484444782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:16:41,425] [INFO] [timer.py:197:stop] 0/6390, RunningAvgSamplesPerSec=6.326728167287535, CurrSamplesPerSec=5.7346516960470355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:16:52,755] [INFO] [timer.py:197:stop] 0/6392, RunningAvgSamplesPerSec=6.326737179693918, CurrSamplesPerSec=5.713086671826778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:17:04,092] [INFO] [timer.py:197:stop] 0/6394, RunningAvgSamplesPerSec=6.326728430482575, CurrSamplesPerSec=5.707211359235236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:17:15,554] [INFO] [timer.py:197:stop] 0/6396, RunningAvgSamplesPerSec=6.326730842278121, CurrSamplesPerSec=5.701452017297671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:17:26,884] [INFO] [timer.py:197:stop] 0/6398, RunningAvgSamplesPerSec=6.32672451895067, CurrSamplesPerSec=5.647306802940369, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:17:38,160] [INFO] [logging.py:68:log_dist] [Rank 0] step=3200, 
skipped=6, lr=[4.015555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 11:17:38,162] [INFO] [timer.py:197:stop] 0/6400, RunningAvgSamplesPerSec=6.326729296565198, CurrSamplesPerSec=5.6953838052232335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:17:49,768] [INFO] [timer.py:197:stop] 0/6402, RunningAvgSamplesPerSec=6.326668901422619, CurrSamplesPerSec=5.382787477680625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:18:01,177] [INFO] [timer.py:197:stop] 0/6404, RunningAvgSamplesPerSec=6.326670811640981, CurrSamplesPerSec=5.688068694657253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:18:12,470] [INFO] [timer.py:197:stop] 0/6406, RunningAvgSamplesPerSec=6.326677801914949, CurrSamplesPerSec=5.719692806983137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:18:22,922] [INFO] [timer.py:197:stop] 0/6408, RunningAvgSamplesPerSec=6.326842163124239, CurrSamplesPerSec=6.658537642905244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:18:34,193] [INFO] [timer.py:197:stop] 0/6410, RunningAvgSamplesPerSec=6.326846298587914, CurrSamplesPerSec=5.70270370175553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:18:45,627] [INFO] [timer.py:197:stop] 0/6412, RunningAvgSamplesPerSec=6.3268168620699505, CurrSamplesPerSec=5.531136929564547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:18:57,171] [INFO] [timer.py:197:stop] 0/6414, RunningAvgSamplesPerSec=6.326820112381396, CurrSamplesPerSec=5.700635702570435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:19:08,425] [INFO] [timer.py:197:stop] 0/6416, RunningAvgSamplesPerSec=6.326825233183382, CurrSamplesPerSec=5.696644190247708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:19:19,744] [INFO] [timer.py:197:stop] 0/6418, RunningAvgSamplesPerSec=6.326824767449204, CurrSamplesPerSec=5.694154898992798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:19:31,082] [INFO] [logging.py:68:log_dist] [Rank 0] step=3210, skipped=6, 
lr=[3.993333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 11:19:31,084] [INFO] [timer.py:197:stop] 0/6420, RunningAvgSamplesPerSec=6.326831185722855, CurrSamplesPerSec=5.71410116480318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:19:42,388] [INFO] [timer.py:197:stop] 0/6422, RunningAvgSamplesPerSec=6.326831176185993, CurrSamplesPerSec=5.658815780630872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:19:53,682] [INFO] [timer.py:197:stop] 0/6424, RunningAvgSamplesPerSec=6.326834924211408, CurrSamplesPerSec=5.695173795273409, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0006, 'learning_rate': 3.9888888888888895e-06, 'epoch': 24.06} [2022-12-19 11:20:04,944] [INFO] [timer.py:197:stop] 0/6426, RunningAvgSamplesPerSec=6.326844999681441, CurrSamplesPerSec=5.73325320938612, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:20:16,260] [INFO] [timer.py:197:stop] 0/6428, RunningAvgSamplesPerSec=6.326844150961175, CurrSamplesPerSec=5.710333231791889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:20:27,533] [INFO] [timer.py:197:stop] 0/6430, RunningAvgSamplesPerSec=6.326853608777017, CurrSamplesPerSec=5.736650542909522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:20:39,209] [INFO] [timer.py:197:stop] 0/6432, RunningAvgSamplesPerSec=6.326859442378916, CurrSamplesPerSec=5.691139972187485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:20:50,478] [INFO] [timer.py:197:stop] 0/6434, RunningAvgSamplesPerSec=6.326866584123425, CurrSamplesPerSec=5.698333797163989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:21:02,090] [INFO] [timer.py:197:stop] 0/6436, RunningAvgSamplesPerSec=6.326806159699653, CurrSamplesPerSec=5.373908699100902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:21:13,552] [INFO] [timer.py:197:stop] 0/6438, RunningAvgSamplesPerSec=6.326810754341281, CurrSamplesPerSec=5.7038812760064435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:21:24,794] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=3220, skipped=6, lr=[3.971111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 11:21:24,795] [INFO] [timer.py:197:stop] 0/6440, RunningAvgSamplesPerSec=6.3268243169532905, CurrSamplesPerSec=5.751276683341738, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:21:36,128] [INFO] [timer.py:197:stop] 0/6442, RunningAvgSamplesPerSec=6.326826765914643, CurrSamplesPerSec=5.728342236129661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:21:47,612] [INFO] [timer.py:197:stop] 0/6444, RunningAvgSamplesPerSec=6.326831332918718, CurrSamplesPerSec=5.706153218121675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:21:59,027] [INFO] [timer.py:197:stop] 0/6446, RunningAvgSamplesPerSec=6.326808717918829, CurrSamplesPerSec=5.561282312034322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:22:10,291] [INFO] [timer.py:197:stop] 0/6448, RunningAvgSamplesPerSec=6.326813822229726, CurrSamplesPerSec=5.70269885577842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:22:21,676] [INFO] [timer.py:197:stop] 0/6450, RunningAvgSamplesPerSec=6.326819882059634, CurrSamplesPerSec=5.711502293086561, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:22:33,008] [INFO] [timer.py:197:stop] 0/6452, RunningAvgSamplesPerSec=6.326816129250669, CurrSamplesPerSec=5.727025510870033, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:22:44,253] [INFO] [timer.py:197:stop] 0/6454, RunningAvgSamplesPerSec=6.326828732935054, CurrSamplesPerSec=5.7266526261319575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:22:55,553] [INFO] [timer.py:197:stop] 0/6456, RunningAvgSamplesPerSec=6.326834485826158, CurrSamplesPerSec=5.712570685852921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:23:06,819] [INFO] [timer.py:197:stop] 0/6458, RunningAvgSamplesPerSec=6.326843586930905, CurrSamplesPerSec=5.7133930978046275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:23:18,143] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=3230, skipped=6, lr=[3.948888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 11:23:18,145] [INFO] [timer.py:197:stop] 0/6460, RunningAvgSamplesPerSec=6.326835270266657, CurrSamplesPerSec=5.62331010847114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:23:29,666] [INFO] [timer.py:197:stop] 0/6462, RunningAvgSamplesPerSec=6.326840167761349, CurrSamplesPerSec=5.69086898564246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:23:40,940] [INFO] [timer.py:197:stop] 0/6464, RunningAvgSamplesPerSec=6.326844238128695, CurrSamplesPerSec=5.7064960216052905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:23:52,338] [INFO] [timer.py:197:stop] 0/6466, RunningAvgSamplesPerSec=6.326825955319001, CurrSamplesPerSec=5.7005282018253896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:24:03,750] [INFO] [timer.py:197:stop] 0/6468, RunningAvgSamplesPerSec=6.3268330922552725, CurrSamplesPerSec=5.721066883225771, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:24:15,303] [INFO] [timer.py:197:stop] 0/6470, RunningAvgSamplesPerSec=6.326786425368891, CurrSamplesPerSec=5.4342120249058095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:24:26,556] [INFO] [timer.py:197:stop] 0/6472, RunningAvgSamplesPerSec=6.326800104262847, CurrSamplesPerSec=5.748316687330606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:24:37,787] [INFO] [timer.py:197:stop] 0/6474, RunningAvgSamplesPerSec=6.326809580289309, CurrSamplesPerSec=5.727884112299497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.9333333333333335e-06, 'epoch': 24.25} [2022-12-19 11:24:49,090] [INFO] [timer.py:197:stop] 0/6476, RunningAvgSamplesPerSec=6.3268155441151235, CurrSamplesPerSec=5.717216212957002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:25:00,352] [INFO] [timer.py:197:stop] 0/6478, RunningAvgSamplesPerSec=6.326823523953328, CurrSamplesPerSec=5.728636608357531, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:25:11,746] [INFO] [logging.py:68:log_dist] [Rank 0] step=3240, skipped=6, lr=[3.926666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 11:25:11,747] [INFO] [timer.py:197:stop] 0/6480, RunningAvgSamplesPerSec=6.3268247762952035, CurrSamplesPerSec=5.708309465019596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:25:23,006] [INFO] [timer.py:197:stop] 0/6482, RunningAvgSamplesPerSec=6.326832270280319, CurrSamplesPerSec=5.700892607033607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:25:34,413] [INFO] [timer.py:197:stop] 0/6484, RunningAvgSamplesPerSec=6.3268142494145865, CurrSamplesPerSec=5.5893474944748975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:25:45,762] [INFO] [timer.py:197:stop] 0/6486, RunningAvgSamplesPerSec=6.326829992048655, CurrSamplesPerSec=5.724733258188007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:25:57,056] [INFO] [timer.py:197:stop] 0/6488, RunningAvgSamplesPerSec=6.326841234731283, CurrSamplesPerSec=5.748723423000859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:26:08,461] [INFO] [timer.py:197:stop] 0/6490, RunningAvgSamplesPerSec=6.326838650578448, CurrSamplesPerSec=5.7163944043294395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:26:19,723] [INFO] [timer.py:197:stop] 0/6492, RunningAvgSamplesPerSec=6.3268477125665585, CurrSamplesPerSec=5.712261673850383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:26:31,020] [INFO] [timer.py:197:stop] 0/6494, RunningAvgSamplesPerSec=6.326842469672551, CurrSamplesPerSec=5.632480904341129, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:26:42,290] [INFO] [timer.py:197:stop] 0/6496, RunningAvgSamplesPerSec=6.326849807854566, CurrSamplesPerSec=5.716524173508092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:26:53,530] [INFO] [timer.py:197:stop] 0/6498, RunningAvgSamplesPerSec=6.3268618325316845, CurrSamplesPerSec=5.731624339172711, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:27:05,169] [INFO] [logging.py:68:log_dist] [Rank 0] step=3250, skipped=6, lr=[3.904444444444444e-06], mom=[[0.9, 0.999]] [2022-12-19 11:27:05,170] [INFO] [timer.py:197:stop] 0/6500, RunningAvgSamplesPerSec=6.326868604326522, CurrSamplesPerSec=5.7120630582942455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:27:16,428] [INFO] [timer.py:197:stop] 0/6502, RunningAvgSamplesPerSec=6.326870803548139, CurrSamplesPerSec=5.750672958380139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:27:27,905] [INFO] [timer.py:197:stop] 0/6504, RunningAvgSamplesPerSec=6.326858065517115, CurrSamplesPerSec=5.684149880805315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:27:39,169] [INFO] [timer.py:197:stop] 0/6506, RunningAvgSamplesPerSec=6.3268663980721405, CurrSamplesPerSec=5.723673492119986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:27:50,521] [INFO] [timer.py:197:stop] 0/6508, RunningAvgSamplesPerSec=6.326855626186542, CurrSamplesPerSec=5.6259116220634535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:28:01,961] [INFO] [timer.py:197:stop] 0/6510, RunningAvgSamplesPerSec=6.326867718395923, CurrSamplesPerSec=5.74741035641613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:28:13,224] [INFO] [timer.py:197:stop] 0/6512, RunningAvgSamplesPerSec=6.326877177911314, CurrSamplesPerSec=5.716813192832455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:28:24,563] [INFO] [timer.py:197:stop] 0/6514, RunningAvgSamplesPerSec=6.326871012535383, CurrSamplesPerSec=5.712419700988795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:28:36,089] [INFO] [timer.py:197:stop] 0/6516, RunningAvgSamplesPerSec=6.326874610693685, CurrSamplesPerSec=5.699740955085859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:28:47,574] [INFO] [timer.py:197:stop] 0/6518, RunningAvgSamplesPerSec=6.326840290708968, CurrSamplesPerSec=5.490621114221415, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:28:58,912] [INFO] [logging.py:68:log_dist] [Rank 0] step=3260, skipped=6, lr=[3.882222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 11:28:58,913] [INFO] [timer.py:197:stop] 0/6520, RunningAvgSamplesPerSec=6.326843771639819, CurrSamplesPerSec=5.6795413297088375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:29:10,346] [INFO] [timer.py:197:stop] 0/6522, RunningAvgSamplesPerSec=6.326810164466902, CurrSamplesPerSec=5.4941221272084775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:29:21,816] [INFO] [timer.py:197:stop] 0/6524, RunningAvgSamplesPerSec=6.32681415003125, CurrSamplesPerSec=5.696044628004686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.877777777777778e-06, 'epoch': 24.43} [2022-12-19 11:29:33,111] [INFO] [timer.py:197:stop] 0/6526, RunningAvgSamplesPerSec=6.326817962461837, CurrSamplesPerSec=5.704183806176186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:29:44,666] [INFO] [timer.py:197:stop] 0/6528, RunningAvgSamplesPerSec=6.326825761560741, CurrSamplesPerSec=5.711419415154709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:29:55,924] [INFO] [timer.py:197:stop] 0/6530, RunningAvgSamplesPerSec=6.326833331856038, CurrSamplesPerSec=5.70626044592404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:30:07,211] [INFO] [timer.py:197:stop] 0/6532, RunningAvgSamplesPerSec=6.32683410075801, CurrSamplesPerSec=5.683599877450332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:30:18,498] [INFO] [timer.py:197:stop] 0/6534, RunningAvgSamplesPerSec=6.326841381945299, CurrSamplesPerSec=5.719547538716265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:30:30,029] [INFO] [timer.py:197:stop] 0/6536, RunningAvgSamplesPerSec=6.326845540507925, CurrSamplesPerSec=5.689311132803749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:30:41,359] [INFO] [timer.py:197:stop] 0/6538, 
RunningAvgSamplesPerSec=6.326832998859533, CurrSamplesPerSec=5.682834382585286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:30:52,630] [INFO] [logging.py:68:log_dist] [Rank 0] step=3270, skipped=6, lr=[3.86e-06], mom=[[0.9, 0.999]] [2022-12-19 11:30:52,632] [INFO] [timer.py:197:stop] 0/6540, RunningAvgSamplesPerSec=6.326836600868871, CurrSamplesPerSec=5.699828819632612, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:31:04,002] [INFO] [timer.py:197:stop] 0/6542, RunningAvgSamplesPerSec=6.326824531152706, CurrSamplesPerSec=5.608559152141324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:31:15,297] [INFO] [timer.py:197:stop] 0/6544, RunningAvgSamplesPerSec=6.32682675176322, CurrSamplesPerSec=5.691515004151034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:31:26,590] [INFO] [timer.py:197:stop] 0/6546, RunningAvgSamplesPerSec=6.326827091401944, CurrSamplesPerSec=5.661920505999156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:31:38,055] [INFO] [timer.py:197:stop] 0/6548, RunningAvgSamplesPerSec=6.326825107261293, CurrSamplesPerSec=5.656068151085906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:31:49,321] [INFO] [timer.py:197:stop] 0/6550, RunningAvgSamplesPerSec=6.326829678950247, CurrSamplesPerSec=5.708233962789272, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:32:00,958] [INFO] [timer.py:197:stop] 0/6552, RunningAvgSamplesPerSec=6.326830904812209, CurrSamplesPerSec=5.697524663372557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:32:12,249] [INFO] [timer.py:197:stop] 0/6554, RunningAvgSamplesPerSec=6.326833808015235, CurrSamplesPerSec=5.678041315874295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:32:23,822] [INFO] [timer.py:197:stop] 0/6556, RunningAvgSamplesPerSec=6.326783693194901, CurrSamplesPerSec=5.406673701396466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:32:35,074] [INFO] [timer.py:197:stop] 0/6558, 
RunningAvgSamplesPerSec=6.326790940471604, CurrSamplesPerSec=5.7037782582333065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:32:46,403] [INFO] [logging.py:68:log_dist] [Rank 0] step=3280, skipped=6, lr=[3.837777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 11:32:46,405] [INFO] [timer.py:197:stop] 0/6560, RunningAvgSamplesPerSec=6.326798959036579, CurrSamplesPerSec=5.714557329748483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:32:57,755] [INFO] [timer.py:197:stop] 0/6562, RunningAvgSamplesPerSec=6.326798823777157, CurrSamplesPerSec=5.718009999998296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:33:09,176] [INFO] [timer.py:197:stop] 0/6564, RunningAvgSamplesPerSec=6.326800071560442, CurrSamplesPerSec=5.666667961412302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:33:20,611] [INFO] [timer.py:197:stop] 0/6566, RunningAvgSamplesPerSec=6.326775618597421, CurrSamplesPerSec=5.54721723943281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:33:32,055] [INFO] [timer.py:197:stop] 0/6568, RunningAvgSamplesPerSec=6.326782436680691, CurrSamplesPerSec=5.692589210855173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:33:43,348] [INFO] [timer.py:197:stop] 0/6570, RunningAvgSamplesPerSec=6.326784676636838, CurrSamplesPerSec=5.685333772060004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:33:54,604] [INFO] [timer.py:197:stop] 0/6572, RunningAvgSamplesPerSec=6.326793288856398, CurrSamplesPerSec=5.732783036073834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:34:05,968] [INFO] [timer.py:197:stop] 0/6574, RunningAvgSamplesPerSec=6.326803929229126, CurrSamplesPerSec=5.721157357465983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.8222222222222224e-06, 'epoch': 24.62} [2022-12-19 11:34:17,615] [INFO] [timer.py:197:stop] 0/6576, RunningAvgSamplesPerSec=6.326739695970986, CurrSamplesPerSec=5.340781107433012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 11:34:28,916] [INFO] [timer.py:197:stop] 0/6578, RunningAvgSamplesPerSec=6.326742114559739, CurrSamplesPerSec=5.6917312612626905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:34:40,331] [INFO] [logging.py:68:log_dist] [Rank 0] step=3290, skipped=6, lr=[3.8155555555555555e-06], mom=[[0.9, 0.999]] [2022-12-19 11:34:40,332] [INFO] [timer.py:197:stop] 0/6580, RunningAvgSamplesPerSec=6.326746255907684, CurrSamplesPerSec=5.695793477947205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:34:51,683] [INFO] [timer.py:197:stop] 0/6582, RunningAvgSamplesPerSec=6.326738072424267, CurrSamplesPerSec=5.68771363983179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:35:03,100] [INFO] [timer.py:197:stop] 0/6584, RunningAvgSamplesPerSec=6.326733236023026, CurrSamplesPerSec=5.710983920214732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:35:14,413] [INFO] [timer.py:197:stop] 0/6586, RunningAvgSamplesPerSec=6.326727500681358, CurrSamplesPerSec=5.634306260415478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:35:25,634] [INFO] [timer.py:197:stop] 0/6588, RunningAvgSamplesPerSec=6.326739427107374, CurrSamplesPerSec=5.744116546168851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:35:37,091] [INFO] [timer.py:197:stop] 0/6590, RunningAvgSamplesPerSec=6.3267097262929575, CurrSamplesPerSec=5.504089917240412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:35:48,518] [INFO] [timer.py:197:stop] 0/6592, RunningAvgSamplesPerSec=6.326714638986022, CurrSamplesPerSec=5.713554592656932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:35:59,753] [INFO] [timer.py:197:stop] 0/6594, RunningAvgSamplesPerSec=6.3267269872035214, CurrSamplesPerSec=5.731277775891802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:36:11,056] [INFO] [timer.py:197:stop] 0/6596, RunningAvgSamplesPerSec=6.326727756539312, CurrSamplesPerSec=5.70619033498183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
11:36:22,570] [INFO] [timer.py:197:stop] 0/6598, RunningAvgSamplesPerSec=6.326735082092559, CurrSamplesPerSec=5.724685644521287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:36:33,995] [INFO] [logging.py:68:log_dist] [Rank 0] step=3300, skipped=6, lr=[3.793333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 11:36:33,997] [INFO] [timer.py:197:stop] 0/6600, RunningAvgSamplesPerSec=6.326707878636267, CurrSamplesPerSec=5.523163958239038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:36:45,265] [INFO] [timer.py:197:stop] 0/6602, RunningAvgSamplesPerSec=6.326717933509806, CurrSamplesPerSec=5.712747209678823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:36:56,778] [INFO] [timer.py:197:stop] 0/6604, RunningAvgSamplesPerSec=6.326727966422583, CurrSamplesPerSec=5.707620793577691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:37:08,069] [INFO] [timer.py:197:stop] 0/6606, RunningAvgSamplesPerSec=6.326724491188016, CurrSamplesPerSec=5.712161756587142, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:37:19,431] [INFO] [timer.py:197:stop] 0/6608, RunningAvgSamplesPerSec=6.3267280504746015, CurrSamplesPerSec=5.712698579436967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:37:30,752] [INFO] [timer.py:197:stop] 0/6610, RunningAvgSamplesPerSec=6.326717693988499, CurrSamplesPerSec=5.631077697936864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:37:42,011] [INFO] [timer.py:197:stop] 0/6612, RunningAvgSamplesPerSec=6.326725837393116, CurrSamplesPerSec=5.709005102036476, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:37:53,587] [INFO] [timer.py:197:stop] 0/6614, RunningAvgSamplesPerSec=6.326676845157258, CurrSamplesPerSec=5.429351062232371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:38:05,068] [INFO] [timer.py:197:stop] 0/6616, RunningAvgSamplesPerSec=6.326686044808299, CurrSamplesPerSec=5.716079622904415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
11:38:16,320] [INFO] [timer.py:197:stop] 0/6618, RunningAvgSamplesPerSec=6.326688988557548, CurrSamplesPerSec=5.702507930834911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:38:27,621] [INFO] [logging.py:68:log_dist] [Rank 0] step=3310, skipped=6, lr=[3.7711111111111116e-06], mom=[[0.9, 0.999]] [2022-12-19 11:38:27,623] [INFO] [timer.py:197:stop] 0/6620, RunningAvgSamplesPerSec=6.32668845092955, CurrSamplesPerSec=5.705162158228562, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:38:39,075] [INFO] [timer.py:197:stop] 0/6622, RunningAvgSamplesPerSec=6.326696982269236, CurrSamplesPerSec=5.711713509014124, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:38:50,394] [INFO] [timer.py:197:stop] 0/6624, RunningAvgSamplesPerSec=6.326689979197886, CurrSamplesPerSec=5.640487670480343, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.766666666666667e-06, 'epoch': 24.81} [2022-12-19 11:39:01,684] [INFO] [timer.py:197:stop] 0/6626, RunningAvgSamplesPerSec=6.326693752237434, CurrSamplesPerSec=5.711710835297166, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:39:13,010] [INFO] [timer.py:197:stop] 0/6628, RunningAvgSamplesPerSec=6.326688809224456, CurrSamplesPerSec=5.6516687316534515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:39:24,462] [INFO] [timer.py:197:stop] 0/6630, RunningAvgSamplesPerSec=6.326700993813369, CurrSamplesPerSec=5.7129590040367235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:39:35,681] [INFO] [timer.py:197:stop] 0/6632, RunningAvgSamplesPerSec=6.326711690850725, CurrSamplesPerSec=5.709818957557953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:39:47,146] [INFO] [timer.py:197:stop] 0/6634, RunningAvgSamplesPerSec=6.32671367267514, CurrSamplesPerSec=5.7029879327202595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:39:58,466] [INFO] [timer.py:197:stop] 0/6636, RunningAvgSamplesPerSec=6.326711567861301, 
CurrSamplesPerSec=5.672803135775106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:40:09,918] [INFO] [timer.py:197:stop] 0/6638, RunningAvgSamplesPerSec=6.3266864009857375, CurrSamplesPerSec=5.555853603966736, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:40:21,150] [INFO] [logging.py:68:log_dist] [Rank 0] step=3320, skipped=6, lr=[3.7488888888888892e-06], mom=[[0.9, 0.999]] [2022-12-19 11:40:21,151] [INFO] [timer.py:197:stop] 0/6640, RunningAvgSamplesPerSec=6.326693222034245, CurrSamplesPerSec=5.69603375001687, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:40:32,706] [INFO] [timer.py:197:stop] 0/6642, RunningAvgSamplesPerSec=6.326704709407935, CurrSamplesPerSec=5.7154635501660715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:40:44,054] [INFO] [timer.py:197:stop] 0/6644, RunningAvgSamplesPerSec=6.3267000565532925, CurrSamplesPerSec=5.719547782448788, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:40:55,325] [INFO] [timer.py:197:stop] 0/6646, RunningAvgSamplesPerSec=6.326708505000815, CurrSamplesPerSec=5.736377901980217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:41:06,910] [INFO] [timer.py:197:stop] 0/6648, RunningAvgSamplesPerSec=6.3267153340298945, CurrSamplesPerSec=5.693217507920035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:41:18,137] [INFO] [timer.py:197:stop] 0/6650, RunningAvgSamplesPerSec=6.326729379191029, CurrSamplesPerSec=5.722331099375108, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:41:29,524] [INFO] [timer.py:197:stop] 0/6652, RunningAvgSamplesPerSec=6.326711562303916, CurrSamplesPerSec=5.593860669417493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:41:41,089] [INFO] [timer.py:197:stop] 0/6654, RunningAvgSamplesPerSec=6.32671714230551, CurrSamplesPerSec=5.687416469158867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:41:52,372] [INFO] [timer.py:197:stop] 0/6656, RunningAvgSamplesPerSec=6.326719868278054, 
CurrSamplesPerSec=5.704952396505239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:42:03,646] [INFO] [timer.py:197:stop] 0/6658, RunningAvgSamplesPerSec=6.326727135345242, CurrSamplesPerSec=5.720975680284201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:42:15,120] [INFO] [logging.py:68:log_dist] [Rank 0] step=3330, skipped=6, lr=[3.726666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 11:42:15,122] [INFO] [timer.py:197:stop] 0/6660, RunningAvgSamplesPerSec=6.3267284708532605, CurrSamplesPerSec=5.684386764057025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:42:26,476] [INFO] [timer.py:197:stop] 0/6662, RunningAvgSamplesPerSec=6.326715366019613, CurrSamplesPerSec=5.6126231712928245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:42:37,750] [INFO] [timer.py:197:stop] 0/6664, RunningAvgSamplesPerSec=6.326718624062689, CurrSamplesPerSec=5.682032290170409, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:42:49,184] [INFO] [timer.py:197:stop] 0/6666, RunningAvgSamplesPerSec=6.3267219196695414, CurrSamplesPerSec=5.689054788788464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:43:00,529] [INFO] [timer.py:197:stop] 0/6668, RunningAvgSamplesPerSec=6.326713542265986, CurrSamplesPerSec=5.708502478192105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:43:11,800] [INFO] [timer.py:197:stop] 0/6670, RunningAvgSamplesPerSec=6.326719410987564, CurrSamplesPerSec=5.714034996523711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:43:23,363] [INFO] [timer.py:197:stop] 0/6672, RunningAvgSamplesPerSec=6.326702989365135, CurrSamplesPerSec=5.687142463381145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:43:34,672] [INFO] [timer.py:197:stop] 0/6674, RunningAvgSamplesPerSec=6.326701263546173, CurrSamplesPerSec=5.665016684689339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0006, 'learning_rate': 3.7111111111111113e-06, 'epoch': 25.0} [2022-12-19 11:43:45,089] [INFO] 
[timer.py:197:stop] 0/6676, RunningAvgSamplesPerSec=6.32685044554381, CurrSamplesPerSec=5.654469020909715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:43:56,640] [INFO] [timer.py:197:stop] 0/6678, RunningAvgSamplesPerSec=6.326849338874557, CurrSamplesPerSec=5.694234136090257, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:44:07,967] [INFO] [logging.py:68:log_dist] [Rank 0] step=3340, skipped=6, lr=[3.704444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 11:44:07,969] [INFO] [timer.py:197:stop] 0/6680, RunningAvgSamplesPerSec=6.326848650528736, CurrSamplesPerSec=5.701466791091186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:44:19,291] [INFO] [timer.py:197:stop] 0/6682, RunningAvgSamplesPerSec=6.326849465846833, CurrSamplesPerSec=5.693165587264298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:44:30,596] [INFO] [timer.py:197:stop] 0/6684, RunningAvgSamplesPerSec=6.326847879943356, CurrSamplesPerSec=5.697898602178483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:44:41,877] [INFO] [timer.py:197:stop] 0/6686, RunningAvgSamplesPerSec=6.326851091762638, CurrSamplesPerSec=5.706673140745091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:44:53,195] [INFO] [timer.py:197:stop] 0/6688, RunningAvgSamplesPerSec=6.326848643317731, CurrSamplesPerSec=5.710819411787557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:45:04,519] [INFO] [timer.py:197:stop] 0/6690, RunningAvgSamplesPerSec=6.326849651866211, CurrSamplesPerSec=5.69113031950755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:45:16,123] [INFO] [timer.py:197:stop] 0/6692, RunningAvgSamplesPerSec=6.326827854507393, CurrSamplesPerSec=5.591482272801465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:45:27,436] [INFO] [timer.py:197:stop] 0/6694, RunningAvgSamplesPerSec=6.326827839624245, CurrSamplesPerSec=5.702410050415053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:45:38,846] [INFO] 
[timer.py:197:stop] 0/6696, RunningAvgSamplesPerSec=6.326825233195205, CurrSamplesPerSec=5.687449004579995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:45:50,192] [INFO] [timer.py:197:stop] 0/6698, RunningAvgSamplesPerSec=6.326818517771348, CurrSamplesPerSec=5.66417658688041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:46:01,712] [INFO] [logging.py:68:log_dist] [Rank 0] step=3350, skipped=6, lr=[3.6822222222222225e-06], mom=[[0.9, 0.999]] [2022-12-19 11:46:01,713] [INFO] [timer.py:197:stop] 0/6700, RunningAvgSamplesPerSec=6.326816596317148, CurrSamplesPerSec=5.679845609856134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:46:13,199] [INFO] [timer.py:197:stop] 0/6702, RunningAvgSamplesPerSec=6.326803787293123, CurrSamplesPerSec=5.694780400594826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:46:24,516] [INFO] [timer.py:197:stop] 0/6704, RunningAvgSamplesPerSec=6.32680251712136, CurrSamplesPerSec=5.705383819169173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:46:35,874] [INFO] [timer.py:197:stop] 0/6706, RunningAvgSamplesPerSec=6.326797831265038, CurrSamplesPerSec=5.667299164061334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:46:47,166] [INFO] [timer.py:197:stop] 0/6708, RunningAvgSamplesPerSec=6.326797786186666, CurrSamplesPerSec=5.700003104435108, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:46:58,609] [INFO] [timer.py:197:stop] 0/6710, RunningAvgSamplesPerSec=6.326797973747498, CurrSamplesPerSec=5.704979555535379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:47:09,932] [INFO] [timer.py:197:stop] 0/6712, RunningAvgSamplesPerSec=6.326796254673318, CurrSamplesPerSec=5.674995673493055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:47:21,197] [INFO] [timer.py:197:stop] 0/6714, RunningAvgSamplesPerSec=6.3268025383069375, CurrSamplesPerSec=5.722631685966059, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:47:32,529] [INFO] 
[timer.py:197:stop] 0/6716, RunningAvgSamplesPerSec=6.326804139724594, CurrSamplesPerSec=5.69564917941501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:47:43,855] [INFO] [timer.py:197:stop] 0/6718, RunningAvgSamplesPerSec=6.32680130112175, CurrSamplesPerSec=5.6964493186745395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:47:55,310] [INFO] [logging.py:68:log_dist] [Rank 0] step=3360, skipped=6, lr=[3.66e-06], mom=[[0.9, 0.999]] [2022-12-19 11:47:55,312] [INFO] [timer.py:197:stop] 0/6720, RunningAvgSamplesPerSec=6.3268036430576435, CurrSamplesPerSec=5.718299169588529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:48:06,606] [INFO] [timer.py:197:stop] 0/6722, RunningAvgSamplesPerSec=6.326797597340245, CurrSamplesPerSec=5.672963543005338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:48:17,973] [INFO] [timer.py:197:stop] 0/6724, RunningAvgSamplesPerSec=6.326796086999058, CurrSamplesPerSec=5.690633733301235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0006, 'learning_rate': 3.6555555555555562e-06, 'epoch': 25.19} [2022-12-19 11:48:29,279] [INFO] [timer.py:197:stop] 0/6726, RunningAvgSamplesPerSec=6.3267977264053945, CurrSamplesPerSec=5.698250333273896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:48:40,605] [INFO] [timer.py:197:stop] 0/6728, RunningAvgSamplesPerSec=6.326792812769777, CurrSamplesPerSec=5.680603808210563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:48:51,935] [INFO] [timer.py:197:stop] 0/6730, RunningAvgSamplesPerSec=6.326794362631848, CurrSamplesPerSec=5.695282785845808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:49:03,253] [INFO] [timer.py:197:stop] 0/6732, RunningAvgSamplesPerSec=6.326791441162336, CurrSamplesPerSec=5.676643410164857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:49:14,554] [INFO] [timer.py:197:stop] 0/6734, RunningAvgSamplesPerSec=6.326789113424503, CurrSamplesPerSec=5.701911493892114, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 11:49:25,874] [INFO] [timer.py:197:stop] 0/6736, RunningAvgSamplesPerSec=6.326788313935144, CurrSamplesPerSec=5.6968057068171785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:49:37,217] [INFO] [timer.py:197:stop] 0/6738, RunningAvgSamplesPerSec=6.32678073431901, CurrSamplesPerSec=5.669773168060295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:49:48,542] [INFO] [logging.py:68:log_dist] [Rank 0] step=3370, skipped=6, lr=[3.6377777777777777e-06], mom=[[0.9, 0.999]] [2022-12-19 11:49:48,544] [INFO] [timer.py:197:stop] 0/6740, RunningAvgSamplesPerSec=6.326775080336357, CurrSamplesPerSec=5.672798580246332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:49:59,886] [INFO] [timer.py:197:stop] 0/6742, RunningAvgSamplesPerSec=6.32677141698878, CurrSamplesPerSec=5.6740835293177145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:50:11,232] [INFO] [timer.py:197:stop] 0/6744, RunningAvgSamplesPerSec=6.3267636627210235, CurrSamplesPerSec=5.648126216506501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:50:22,602] [INFO] [timer.py:197:stop] 0/6746, RunningAvgSamplesPerSec=6.326754337543741, CurrSamplesPerSec=5.667857984266635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:50:34,019] [INFO] [timer.py:197:stop] 0/6748, RunningAvgSamplesPerSec=6.326739147544227, CurrSamplesPerSec=5.619696452539014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:50:45,413] [INFO] [timer.py:197:stop] 0/6750, RunningAvgSamplesPerSec=6.326735604857437, CurrSamplesPerSec=5.664975797667377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:50:56,808] [INFO] [timer.py:197:stop] 0/6752, RunningAvgSamplesPerSec=6.326732242415751, CurrSamplesPerSec=5.692332088522818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:51:08,134] [INFO] [timer.py:197:stop] 0/6754, RunningAvgSamplesPerSec=6.326729981597586, CurrSamplesPerSec=5.680756001135493, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 11:51:19,572] [INFO] [timer.py:197:stop] 0/6756, RunningAvgSamplesPerSec=6.326727740709559, CurrSamplesPerSec=5.676340193199298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:51:30,872] [INFO] [timer.py:197:stop] 0/6758, RunningAvgSamplesPerSec=6.326722797236779, CurrSamplesPerSec=5.667297967562424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:51:42,408] [INFO] [logging.py:68:log_dist] [Rank 0] step=3380, skipped=6, lr=[3.615555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 11:51:42,410] [INFO] [timer.py:197:stop] 0/6760, RunningAvgSamplesPerSec=6.326715253544564, CurrSamplesPerSec=5.672227758464323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:51:53,764] [INFO] [timer.py:197:stop] 0/6762, RunningAvgSamplesPerSec=6.3267176184120935, CurrSamplesPerSec=5.702851750325417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:52:05,141] [INFO] [timer.py:197:stop] 0/6764, RunningAvgSamplesPerSec=6.326706433740634, CurrSamplesPerSec=5.654354678792293, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:52:16,493] [INFO] [timer.py:197:stop] 0/6766, RunningAvgSamplesPerSec=6.326702477015988, CurrSamplesPerSec=5.696889370197837, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:52:27,851] [INFO] [timer.py:197:stop] 0/6768, RunningAvgSamplesPerSec=6.326694509599993, CurrSamplesPerSec=5.659565033432985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:52:39,175] [INFO] [timer.py:197:stop] 0/6770, RunningAvgSamplesPerSec=6.326689706600722, CurrSamplesPerSec=5.680643238138272, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:52:50,522] [INFO] [timer.py:197:stop] 0/6772, RunningAvgSamplesPerSec=6.326690603630527, CurrSamplesPerSec=5.697017046220263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:53:02,045] [INFO] [timer.py:197:stop] 0/6774, RunningAvgSamplesPerSec=6.3266923559631705, CurrSamplesPerSec=5.703084378990861, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.6000000000000003e-06, 'epoch': 25.37} [2022-12-19 11:53:13,577] [INFO] [timer.py:197:stop] 0/6776, RunningAvgSamplesPerSec=6.326689688747822, CurrSamplesPerSec=5.689915067401962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:53:25,077] [INFO] [timer.py:197:stop] 0/6778, RunningAvgSamplesPerSec=6.3266908699215065, CurrSamplesPerSec=5.7131554932119695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:53:36,343] [INFO] [logging.py:68:log_dist] [Rank 0] step=3390, skipped=6, lr=[3.593333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 11:53:36,345] [INFO] [timer.py:197:stop] 0/6780, RunningAvgSamplesPerSec=6.326693134047975, CurrSamplesPerSec=5.707500893858588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:53:47,815] [INFO] [timer.py:197:stop] 0/6782, RunningAvgSamplesPerSec=6.326687344513093, CurrSamplesPerSec=5.657660315945758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:53:59,157] [INFO] [timer.py:197:stop] 0/6784, RunningAvgSamplesPerSec=6.32669042315977, CurrSamplesPerSec=5.7036682151227875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:54:10,504] [INFO] [timer.py:197:stop] 0/6786, RunningAvgSamplesPerSec=6.3266812274358255, CurrSamplesPerSec=5.671192852638011, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:54:21,827] [INFO] [timer.py:197:stop] 0/6788, RunningAvgSamplesPerSec=6.326678633970738, CurrSamplesPerSec=5.6745858680802606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:54:33,111] [INFO] [timer.py:197:stop] 0/6790, RunningAvgSamplesPerSec=6.326680035232094, CurrSamplesPerSec=5.697028169781104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:54:44,438] [INFO] [timer.py:197:stop] 0/6792, RunningAvgSamplesPerSec=6.3266748868777105, CurrSamplesPerSec=5.6809346518923185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:54:55,793] [INFO] [timer.py:197:stop] 0/6794, 
RunningAvgSamplesPerSec=6.32667018239916, CurrSamplesPerSec=5.673206928547661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:55:07,075] [INFO] [timer.py:197:stop] 0/6796, RunningAvgSamplesPerSec=6.3266686928418885, CurrSamplesPerSec=5.692960811771759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:55:18,545] [INFO] [timer.py:197:stop] 0/6798, RunningAvgSamplesPerSec=6.32666299280626, CurrSamplesPerSec=5.677175978853579, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:55:29,825] [INFO] [logging.py:68:log_dist] [Rank 0] step=3400, skipped=6, lr=[3.5711111111111114e-06], mom=[[0.9, 0.999]] [2022-12-19 11:55:29,827] [INFO] [timer.py:197:stop] 0/6800, RunningAvgSamplesPerSec=6.32666500124872, CurrSamplesPerSec=5.7008703297636805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:55:41,127] [INFO] [timer.py:197:stop] 0/6802, RunningAvgSamplesPerSec=6.326663546897921, CurrSamplesPerSec=5.697092735655091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:55:52,658] [INFO] [timer.py:197:stop] 0/6804, RunningAvgSamplesPerSec=6.3266608909582125, CurrSamplesPerSec=5.686218220007328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:56:04,135] [INFO] [timer.py:197:stop] 0/6806, RunningAvgSamplesPerSec=6.326660028979415, CurrSamplesPerSec=5.669952326308196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:56:15,369] [INFO] [timer.py:197:stop] 0/6808, RunningAvgSamplesPerSec=6.326666322991937, CurrSamplesPerSec=5.730442622918623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:56:26,703] [INFO] [timer.py:197:stop] 0/6810, RunningAvgSamplesPerSec=6.326665644383758, CurrSamplesPerSec=5.693702709959053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:56:38,213] [INFO] [timer.py:197:stop] 0/6812, RunningAvgSamplesPerSec=6.3266716657994895, CurrSamplesPerSec=5.70269934037576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:56:49,519] [INFO] [timer.py:197:stop] 0/6814, 
RunningAvgSamplesPerSec=6.326670542582394, CurrSamplesPerSec=5.696385492670777, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:57:00,751] [INFO] [timer.py:197:stop] 0/6816, RunningAvgSamplesPerSec=6.326678571407443, CurrSamplesPerSec=5.711110041660492, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:57:12,144] [INFO] [timer.py:197:stop] 0/6818, RunningAvgSamplesPerSec=6.32668389082979, CurrSamplesPerSec=5.705772131103705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:57:23,502] [INFO] [logging.py:68:log_dist] [Rank 0] step=3410, skipped=6, lr=[3.548888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 11:57:23,504] [INFO] [timer.py:197:stop] 0/6820, RunningAvgSamplesPerSec=6.326685353627198, CurrSamplesPerSec=5.67621824333254, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:57:35,027] [INFO] [timer.py:197:stop] 0/6822, RunningAvgSamplesPerSec=6.326683491725522, CurrSamplesPerSec=5.686896675914068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:57:46,295] [INFO] [timer.py:197:stop] 0/6824, RunningAvgSamplesPerSec=6.326692256867763, CurrSamplesPerSec=5.727442681290797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0005, 'learning_rate': 3.5444444444444447e-06, 'epoch': 25.56} [2022-12-19 11:57:57,797] [INFO] [timer.py:197:stop] 0/6826, RunningAvgSamplesPerSec=6.32669331548638, CurrSamplesPerSec=5.7023686218435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:58:09,129] [INFO] [timer.py:197:stop] 0/6828, RunningAvgSamplesPerSec=6.326695982833575, CurrSamplesPerSec=5.685334494535895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:58:20,450] [INFO] [timer.py:197:stop] 0/6830, RunningAvgSamplesPerSec=6.32669326170209, CurrSamplesPerSec=5.686235805776385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:58:31,910] [INFO] [timer.py:197:stop] 0/6832, RunningAvgSamplesPerSec=6.326694476680062, CurrSamplesPerSec=5.691904568443533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 11:58:43,361] [INFO] [timer.py:197:stop] 0/6834, RunningAvgSamplesPerSec=6.326696344100076, CurrSamplesPerSec=5.708074710615021, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:58:54,615] [INFO] [timer.py:197:stop] 0/6836, RunningAvgSamplesPerSec=6.326700089762492, CurrSamplesPerSec=5.707072791029496, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:59:05,893] [INFO] [timer.py:197:stop] 0/6838, RunningAvgSamplesPerSec=6.326706714851692, CurrSamplesPerSec=5.703387066987589, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:59:17,162] [INFO] [logging.py:68:log_dist] [Rank 0] step=3420, skipped=6, lr=[3.526666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 11:59:17,163] [INFO] [timer.py:197:stop] 0/6840, RunningAvgSamplesPerSec=6.3267116031553545, CurrSamplesPerSec=5.701212256598993, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:59:28,503] [INFO] [timer.py:197:stop] 0/6842, RunningAvgSamplesPerSec=6.326718062976188, CurrSamplesPerSec=5.722394288252905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:59:39,928] [INFO] [timer.py:197:stop] 0/6844, RunningAvgSamplesPerSec=6.326725485200838, CurrSamplesPerSec=5.706414987199039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 11:59:51,203] [INFO] [timer.py:197:stop] 0/6846, RunningAvgSamplesPerSec=6.326731299844021, CurrSamplesPerSec=5.721834663889172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:00:02,486] [INFO] [timer.py:197:stop] 0/6848, RunningAvgSamplesPerSec=6.326734728624669, CurrSamplesPerSec=5.696703186256274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:00:13,728] [INFO] [timer.py:197:stop] 0/6850, RunningAvgSamplesPerSec=6.326745551911587, CurrSamplesPerSec=5.726307641038312, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:00:25,017] [INFO] [timer.py:197:stop] 0/6852, RunningAvgSamplesPerSec=6.32675060074714, CurrSamplesPerSec=5.693997397652355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
12:00:36,346] [INFO] [timer.py:197:stop] 0/6854, RunningAvgSamplesPerSec=6.326744888074546, CurrSamplesPerSec=5.663122872310957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:00:47,648] [INFO] [timer.py:197:stop] 0/6856, RunningAvgSamplesPerSec=6.326748145776996, CurrSamplesPerSec=5.699501822325745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:00:58,909] [INFO] [timer.py:197:stop] 0/6858, RunningAvgSamplesPerSec=6.32675244782407, CurrSamplesPerSec=5.714684339030345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:01:10,429] [INFO] [logging.py:68:log_dist] [Rank 0] step=3430, skipped=6, lr=[3.5044444444444447e-06], mom=[[0.9, 0.999]] [2022-12-19 12:01:10,430] [INFO] [timer.py:197:stop] 0/6860, RunningAvgSamplesPerSec=6.326752228944971, CurrSamplesPerSec=5.688178136649795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:01:21,738] [INFO] [timer.py:197:stop] 0/6862, RunningAvgSamplesPerSec=6.32675083327096, CurrSamplesPerSec=5.680889687588527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:01:32,984] [INFO] [timer.py:197:stop] 0/6864, RunningAvgSamplesPerSec=6.3267576751687615, CurrSamplesPerSec=5.72385851398919, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:01:44,501] [INFO] [timer.py:197:stop] 0/6866, RunningAvgSamplesPerSec=6.326758466048704, CurrSamplesPerSec=5.6880626682381505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:01:55,786] [INFO] [timer.py:197:stop] 0/6868, RunningAvgSamplesPerSec=6.326763835658907, CurrSamplesPerSec=5.7223601319294515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:02:07,042] [INFO] [timer.py:197:stop] 0/6870, RunningAvgSamplesPerSec=6.326767568432781, CurrSamplesPerSec=5.688278663071686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:02:18,368] [INFO] [timer.py:197:stop] 0/6872, RunningAvgSamplesPerSec=6.326769015240449, CurrSamplesPerSec=5.694235343989905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
12:02:29,691] [INFO] [timer.py:197:stop] 0/6874, RunningAvgSamplesPerSec=6.326766410817363, CurrSamplesPerSec=5.6585030143569135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.4888888888888896e-06, 'epoch': 25.75} [2022-12-19 12:02:40,965] [INFO] [timer.py:197:stop] 0/6876, RunningAvgSamplesPerSec=6.326770058889809, CurrSamplesPerSec=5.711771845279942, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:02:52,387] [INFO] [timer.py:197:stop] 0/6878, RunningAvgSamplesPerSec=6.326783343019937, CurrSamplesPerSec=5.737047291560892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:03:03,818] [INFO] [logging.py:68:log_dist] [Rank 0] step=3440, skipped=6, lr=[3.4822222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 12:03:03,820] [INFO] [timer.py:197:stop] 0/6880, RunningAvgSamplesPerSec=6.326785156161621, CurrSamplesPerSec=5.696204660093753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:03:15,138] [INFO] [timer.py:197:stop] 0/6882, RunningAvgSamplesPerSec=6.326786973321895, CurrSamplesPerSec=5.694376430194299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:03:26,376] [INFO] [timer.py:197:stop] 0/6884, RunningAvgSamplesPerSec=6.326795453497971, CurrSamplesPerSec=5.712928121447449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:03:37,879] [INFO] [timer.py:197:stop] 0/6886, RunningAvgSamplesPerSec=6.326798006484448, CurrSamplesPerSec=5.697668573209085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:03:49,141] [INFO] [timer.py:197:stop] 0/6888, RunningAvgSamplesPerSec=6.32680645447892, CurrSamplesPerSec=5.71987708405941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:04:00,527] [INFO] [timer.py:197:stop] 0/6890, RunningAvgSamplesPerSec=6.32681366173715, CurrSamplesPerSec=5.72399521329088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:04:11,929] [INFO] [timer.py:197:stop] 0/6892, RunningAvgSamplesPerSec=6.326817115660271, 
CurrSamplesPerSec=5.6981068778655795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:04:23,200] [INFO] [timer.py:197:stop] 0/6894, RunningAvgSamplesPerSec=6.326821790864079, CurrSamplesPerSec=5.698748491697634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:04:34,603] [INFO] [timer.py:197:stop] 0/6896, RunningAvgSamplesPerSec=6.326822501273005, CurrSamplesPerSec=5.6854883860856775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:04:46,006] [INFO] [timer.py:197:stop] 0/6898, RunningAvgSamplesPerSec=6.326827055874184, CurrSamplesPerSec=5.706790579222687, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:04:57,290] [INFO] [logging.py:68:log_dist] [Rank 0] step=3450, skipped=6, lr=[3.46e-06], mom=[[0.9, 0.999]] [2022-12-19 12:04:57,291] [INFO] [timer.py:197:stop] 0/6900, RunningAvgSamplesPerSec=6.326832622489323, CurrSamplesPerSec=5.718058233499936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:05:08,752] [INFO] [timer.py:197:stop] 0/6902, RunningAvgSamplesPerSec=6.326836617801935, CurrSamplesPerSec=5.705922037046099, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:05:19,988] [INFO] [timer.py:197:stop] 0/6904, RunningAvgSamplesPerSec=6.326849783362304, CurrSamplesPerSec=5.730966981479509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:05:31,238] [INFO] [timer.py:197:stop] 0/6906, RunningAvgSamplesPerSec=6.3268605384477015, CurrSamplesPerSec=5.706794946862598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:05:42,485] [INFO] [timer.py:197:stop] 0/6908, RunningAvgSamplesPerSec=6.326871609809026, CurrSamplesPerSec=5.722573127660011, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:05:53,964] [INFO] [timer.py:197:stop] 0/6910, RunningAvgSamplesPerSec=6.32687790266125, CurrSamplesPerSec=5.709382978042793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:06:05,433] [INFO] [timer.py:197:stop] 0/6912, RunningAvgSamplesPerSec=6.326887903777489, 
CurrSamplesPerSec=5.745180207891551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:06:16,689] [INFO] [timer.py:197:stop] 0/6914, RunningAvgSamplesPerSec=6.32689261627017, CurrSamplesPerSec=5.712771281954892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:06:28,152] [INFO] [timer.py:197:stop] 0/6916, RunningAvgSamplesPerSec=6.32690100927689, CurrSamplesPerSec=5.724405838668654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:06:39,737] [INFO] [timer.py:197:stop] 0/6918, RunningAvgSamplesPerSec=6.326889533034824, CurrSamplesPerSec=5.60374380846178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:06:50,985] [INFO] [logging.py:68:log_dist] [Rank 0] step=3460, skipped=6, lr=[3.4377777777777784e-06], mom=[[0.9, 0.999]] [2022-12-19 12:06:50,987] [INFO] [timer.py:197:stop] 0/6920, RunningAvgSamplesPerSec=6.326896405761265, CurrSamplesPerSec=5.726111711922643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:07:02,270] [INFO] [timer.py:197:stop] 0/6922, RunningAvgSamplesPerSec=6.326898766601363, CurrSamplesPerSec=5.703230508575982, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:07:13,628] [INFO] [timer.py:197:stop] 0/6924, RunningAvgSamplesPerSec=6.326904955431773, CurrSamplesPerSec=5.718479945890385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0006, 'learning_rate': 3.4333333333333336e-06, 'epoch': 25.94} [2022-12-19 12:07:24,854] [INFO] [timer.py:197:stop] 0/6926, RunningAvgSamplesPerSec=6.326912860477222, CurrSamplesPerSec=5.708857947922186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:07:36,143] [INFO] [timer.py:197:stop] 0/6928, RunningAvgSamplesPerSec=6.326916720101064, CurrSamplesPerSec=5.700321201676251, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:07:47,465] [INFO] [timer.py:197:stop] 0/6930, RunningAvgSamplesPerSec=6.326922939320764, CurrSamplesPerSec=5.702018320093752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:07:58,725] [INFO] 
[timer.py:197:stop] 0/6932, RunningAvgSamplesPerSec=6.326931361302798, CurrSamplesPerSec=5.721517331840195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:08:09,998] [INFO] [timer.py:197:stop] 0/6934, RunningAvgSamplesPerSec=6.326941261426976, CurrSamplesPerSec=5.744024606855902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:08:21,255] [INFO] [timer.py:197:stop] 0/6936, RunningAvgSamplesPerSec=6.3269464202352745, CurrSamplesPerSec=5.713941585090915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:08:32,498] [INFO] [timer.py:197:stop] 0/6938, RunningAvgSamplesPerSec=6.326957405785712, CurrSamplesPerSec=5.728837845321236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:08:43,775] [INFO] [logging.py:68:log_dist] [Rank 0] step=3470, skipped=6, lr=[3.415555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 12:08:43,777] [INFO] [timer.py:197:stop] 0/6940, RunningAvgSamplesPerSec=6.326960118804726, CurrSamplesPerSec=5.697890136009183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:08:54,126] [INFO] [timer.py:197:stop] 0/6942, RunningAvgSamplesPerSec=6.327111863833196, CurrSamplesPerSec=6.659344736575093, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:09:05,437] [INFO] [timer.py:197:stop] 0/6944, RunningAvgSamplesPerSec=6.3271081505939675, CurrSamplesPerSec=5.644168710094701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:09:16,737] [INFO] [timer.py:197:stop] 0/6946, RunningAvgSamplesPerSec=6.327106987968343, CurrSamplesPerSec=5.6618010856060925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:09:28,020] [INFO] [timer.py:197:stop] 0/6948, RunningAvgSamplesPerSec=6.327116419971881, CurrSamplesPerSec=5.734239109514243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:09:39,263] [INFO] [timer.py:197:stop] 0/6950, RunningAvgSamplesPerSec=6.327123993103591, CurrSamplesPerSec=5.706342446374481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:09:50,490] [INFO] 
[timer.py:197:stop] 0/6952, RunningAvgSamplesPerSec=6.3271374437929095, CurrSamplesPerSec=5.737908410196686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:10:01,926] [INFO] [timer.py:197:stop] 0/6954, RunningAvgSamplesPerSec=6.327150860714624, CurrSamplesPerSec=5.740781819437764, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:10:13,181] [INFO] [timer.py:197:stop] 0/6956, RunningAvgSamplesPerSec=6.327158193852021, CurrSamplesPerSec=5.720032363776691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:10:24,451] [INFO] [timer.py:197:stop] 0/6958, RunningAvgSamplesPerSec=6.327165247803802, CurrSamplesPerSec=5.706016641937484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:10:35,702] [INFO] [logging.py:68:log_dist] [Rank 0] step=3480, skipped=6, lr=[3.3933333333333336e-06], mom=[[0.9, 0.999]] [2022-12-19 12:10:35,703] [INFO] [timer.py:197:stop] 0/6960, RunningAvgSamplesPerSec=6.327171302168676, CurrSamplesPerSec=5.701667576957991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:10:46,958] [INFO] [timer.py:197:stop] 0/6962, RunningAvgSamplesPerSec=6.32718262932739, CurrSamplesPerSec=5.733404807743363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:10:58,227] [INFO] [timer.py:197:stop] 0/6964, RunningAvgSamplesPerSec=6.327186124523997, CurrSamplesPerSec=5.711265088758656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:11:09,487] [INFO] [timer.py:197:stop] 0/6966, RunningAvgSamplesPerSec=6.327192917079825, CurrSamplesPerSec=5.701794982802777, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:11:20,728] [INFO] [timer.py:197:stop] 0/6968, RunningAvgSamplesPerSec=6.327201746343662, CurrSamplesPerSec=5.723417947282855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:11:31,999] [INFO] [timer.py:197:stop] 0/6970, RunningAvgSamplesPerSec=6.327206409633725, CurrSamplesPerSec=5.704122473348012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:11:43,291] [INFO] 
[timer.py:197:stop] 0/6972, RunningAvgSamplesPerSec=6.327208986961938, CurrSamplesPerSec=5.699508841118894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:11:54,560] [INFO] [timer.py:197:stop] 0/6974, RunningAvgSamplesPerSec=6.327214551668854, CurrSamplesPerSec=5.715199490111434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:12:06,067] [INFO] [timer.py:197:stop] 0/6976, RunningAvgSamplesPerSec=6.327215482501922, CurrSamplesPerSec=5.677638515870118, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0006, 'learning_rate': 3.375555555555556e-06, 'epoch': 26.13} [2022-12-19 12:12:17,335] [INFO] [timer.py:197:stop] 0/6978, RunningAvgSamplesPerSec=6.327219373096061, CurrSamplesPerSec=5.707227376309316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:12:28,587] [INFO] [logging.py:68:log_dist] [Rank 0] step=3490, skipped=6, lr=[3.371111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 12:12:28,588] [INFO] [timer.py:197:stop] 0/6980, RunningAvgSamplesPerSec=6.32722763945495, CurrSamplesPerSec=5.713425201475929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:12:39,831] [INFO] [timer.py:197:stop] 0/6982, RunningAvgSamplesPerSec=6.3272334758767785, CurrSamplesPerSec=5.698996272660868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:12:51,120] [INFO] [timer.py:197:stop] 0/6984, RunningAvgSamplesPerSec=6.327237462703871, CurrSamplesPerSec=5.703657065617219, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:13:02,558] [INFO] [timer.py:197:stop] 0/6986, RunningAvgSamplesPerSec=6.327246181454808, CurrSamplesPerSec=5.710762066912186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:13:13,832] [INFO] [timer.py:197:stop] 0/6988, RunningAvgSamplesPerSec=6.327252160613873, CurrSamplesPerSec=5.714901630452162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:13:25,186] [INFO] [timer.py:197:stop] 0/6990, RunningAvgSamplesPerSec=6.327245443480158, CurrSamplesPerSec=5.698858828996737, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:13:36,662] [INFO] [timer.py:197:stop] 0/6992, RunningAvgSamplesPerSec=6.327253258975746, CurrSamplesPerSec=5.729551702587606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:13:47,895] [INFO] [timer.py:197:stop] 0/6994, RunningAvgSamplesPerSec=6.327260857914323, CurrSamplesPerSec=5.693374966117864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:13:59,173] [INFO] [timer.py:197:stop] 0/6996, RunningAvgSamplesPerSec=6.32726507422119, CurrSamplesPerSec=5.709566105617785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:14:10,423] [INFO] [timer.py:197:stop] 0/6998, RunningAvgSamplesPerSec=6.327272391190203, CurrSamplesPerSec=5.709992639251602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:14:21,667] [INFO] [logging.py:68:log_dist] [Rank 0] step=3500, skipped=6, lr=[3.3488888888888892e-06], mom=[[0.9, 0.999]] [2022-12-19 12:14:21,669] [INFO] [timer.py:197:stop] 0/7000, RunningAvgSamplesPerSec=6.3272817847087115, CurrSamplesPerSec=5.719481244241324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:14:32,956] [INFO] [timer.py:197:stop] 0/7002, RunningAvgSamplesPerSec=6.327286904743152, CurrSamplesPerSec=5.715258627708343, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:14:44,327] [INFO] [timer.py:197:stop] 0/7004, RunningAvgSamplesPerSec=6.327285664584316, CurrSamplesPerSec=5.700365021506055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:14:55,611] [INFO] [timer.py:197:stop] 0/7006, RunningAvgSamplesPerSec=6.327289556763581, CurrSamplesPerSec=5.706263114540652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:15:06,916] [INFO] [timer.py:197:stop] 0/7008, RunningAvgSamplesPerSec=6.327292418451169, CurrSamplesPerSec=5.68674728607022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:15:18,194] [INFO] [timer.py:197:stop] 0/7010, RunningAvgSamplesPerSec=6.3272933057135345, CurrSamplesPerSec=5.675162923738506, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:15:29,534] [INFO] [timer.py:197:stop] 0/7012, RunningAvgSamplesPerSec=6.327302922382785, CurrSamplesPerSec=5.729831277903357, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:15:40,775] [INFO] [timer.py:197:stop] 0/7014, RunningAvgSamplesPerSec=6.327310769377955, CurrSamplesPerSec=5.704530494784805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:15:52,308] [INFO] [timer.py:197:stop] 0/7016, RunningAvgSamplesPerSec=6.327314832936597, CurrSamplesPerSec=5.707043670721053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:16:03,566] [INFO] [timer.py:197:stop] 0/7018, RunningAvgSamplesPerSec=6.327324489389406, CurrSamplesPerSec=5.72722687904504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:16:14,952] [INFO] [logging.py:68:log_dist] [Rank 0] step=3510, skipped=6, lr=[3.326666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 12:16:14,953] [INFO] [timer.py:197:stop] 0/7020, RunningAvgSamplesPerSec=6.327329895863733, CurrSamplesPerSec=5.701359985553931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:16:26,242] [INFO] [timer.py:197:stop] 0/7022, RunningAvgSamplesPerSec=6.327333407492737, CurrSamplesPerSec=5.706435366892974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:16:37,675] [INFO] [timer.py:197:stop] 0/7024, RunningAvgSamplesPerSec=6.327342333503933, CurrSamplesPerSec=5.7196033540062725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:16:48,942] [INFO] [timer.py:197:stop] 0/7026, RunningAvgSamplesPerSec=6.327344652193617, CurrSamplesPerSec=5.702480310772168, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.3200000000000004e-06, 'epoch': 26.31} [2022-12-19 12:17:00,203] [INFO] [timer.py:197:stop] 0/7028, RunningAvgSamplesPerSec=6.327352237979204, CurrSamplesPerSec=5.718926332019588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:17:11,490] [INFO] [timer.py:197:stop] 0/7030, 
RunningAvgSamplesPerSec=6.327351940871642, CurrSamplesPerSec=5.686358668293434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:17:22,849] [INFO] [timer.py:197:stop] 0/7032, RunningAvgSamplesPerSec=6.327360107058544, CurrSamplesPerSec=5.721843689220102, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:17:34,381] [INFO] [timer.py:197:stop] 0/7034, RunningAvgSamplesPerSec=6.3273629775277085, CurrSamplesPerSec=5.703131149336778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:17:45,664] [INFO] [timer.py:197:stop] 0/7036, RunningAvgSamplesPerSec=6.327365641331141, CurrSamplesPerSec=5.704305506048507, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:17:56,986] [INFO] [timer.py:197:stop] 0/7038, RunningAvgSamplesPerSec=6.327361523504125, CurrSamplesPerSec=5.678463632700161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:18:08,453] [INFO] [logging.py:68:log_dist] [Rank 0] step=3520, skipped=6, lr=[3.3044444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 12:18:08,455] [INFO] [timer.py:197:stop] 0/7040, RunningAvgSamplesPerSec=6.327365546132977, CurrSamplesPerSec=5.7127075759692945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:18:19,775] [INFO] [timer.py:197:stop] 0/7042, RunningAvgSamplesPerSec=6.327360763084575, CurrSamplesPerSec=5.673433068017718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:18:31,209] [INFO] [timer.py:197:stop] 0/7044, RunningAvgSamplesPerSec=6.327358984059812, CurrSamplesPerSec=5.685283440024921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:18:42,541] [INFO] [timer.py:197:stop] 0/7046, RunningAvgSamplesPerSec=6.3273551650399495, CurrSamplesPerSec=5.692710899380416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:18:54,106] [INFO] [timer.py:197:stop] 0/7048, RunningAvgSamplesPerSec=6.327352592649196, CurrSamplesPerSec=5.686666088753998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:19:05,422] [INFO] [timer.py:197:stop] 0/7050, 
RunningAvgSamplesPerSec=6.327350916157355, CurrSamplesPerSec=5.684598146066236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:19:16,836] [INFO] [timer.py:197:stop] 0/7052, RunningAvgSamplesPerSec=6.327351409479182, CurrSamplesPerSec=5.694016722501245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:19:28,150] [INFO] [timer.py:197:stop] 0/7054, RunningAvgSamplesPerSec=6.327350299885523, CurrSamplesPerSec=5.698830276391154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:19:39,601] [INFO] [timer.py:197:stop] 0/7056, RunningAvgSamplesPerSec=6.327345983855994, CurrSamplesPerSec=5.681009674133429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:19:51,026] [INFO] [timer.py:197:stop] 0/7058, RunningAvgSamplesPerSec=6.327341423060294, CurrSamplesPerSec=5.685078749542755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:20:02,302] [INFO] [logging.py:68:log_dist] [Rank 0] step=3530, skipped=6, lr=[3.282222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 12:20:02,303] [INFO] [timer.py:197:stop] 0/7060, RunningAvgSamplesPerSec=6.3273481449927775, CurrSamplesPerSec=5.706856822479563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:20:13,588] [INFO] [timer.py:197:stop] 0/7062, RunningAvgSamplesPerSec=6.3273449428183595, CurrSamplesPerSec=5.6764986398292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:20:24,949] [INFO] [timer.py:197:stop] 0/7064, RunningAvgSamplesPerSec=6.327334587535976, CurrSamplesPerSec=5.703334475974223, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:20:36,370] [INFO] [timer.py:197:stop] 0/7066, RunningAvgSamplesPerSec=6.32733031261573, CurrSamplesPerSec=5.688125584682842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:20:47,704] [INFO] [timer.py:197:stop] 0/7068, RunningAvgSamplesPerSec=6.327325398341272, CurrSamplesPerSec=5.664596365565187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:20:59,031] [INFO] [timer.py:197:stop] 0/7070, 
RunningAvgSamplesPerSec=6.327319552714002, CurrSamplesPerSec=5.677038865280217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:21:10,496] [INFO] [timer.py:197:stop] 0/7072, RunningAvgSamplesPerSec=6.327314271911456, CurrSamplesPerSec=5.665342367305618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:21:21,977] [INFO] [timer.py:197:stop] 0/7074, RunningAvgSamplesPerSec=6.327309233380383, CurrSamplesPerSec=5.661402496804282, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:21:33,477] [INFO] [timer.py:197:stop] 0/7076, RunningAvgSamplesPerSec=6.327308607346268, CurrSamplesPerSec=5.683670878464273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.2644444444444444e-06, 'epoch': 26.5} [2022-12-19 12:21:44,759] [INFO] [timer.py:197:stop] 0/7078, RunningAvgSamplesPerSec=6.327308592943422, CurrSamplesPerSec=5.696073877911319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:21:56,115] [INFO] [logging.py:68:log_dist] [Rank 0] step=3540, skipped=6, lr=[3.2600000000000006e-06], mom=[[0.9, 0.999]] [2022-12-19 12:21:56,116] [INFO] [timer.py:197:stop] 0/7080, RunningAvgSamplesPerSec=6.327298609059607, CurrSamplesPerSec=5.649855417323098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:22:07,564] [INFO] [timer.py:197:stop] 0/7082, RunningAvgSamplesPerSec=6.327297023631828, CurrSamplesPerSec=5.69110932504175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:22:18,871] [INFO] [timer.py:197:stop] 0/7084, RunningAvgSamplesPerSec=6.327297150702625, CurrSamplesPerSec=5.69116675854581, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:22:30,205] [INFO] [timer.py:197:stop] 0/7086, RunningAvgSamplesPerSec=6.3272916317867445, CurrSamplesPerSec=5.685466710715448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:22:41,550] [INFO] [timer.py:197:stop] 0/7088, RunningAvgSamplesPerSec=6.3272836671574, CurrSamplesPerSec=5.660220911762233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 12:22:53,073] [INFO] [timer.py:197:stop] 0/7090, RunningAvgSamplesPerSec=6.3272784008610605, CurrSamplesPerSec=5.686421547047207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:23:04,391] [INFO] [timer.py:197:stop] 0/7092, RunningAvgSamplesPerSec=6.327275332191891, CurrSamplesPerSec=5.676555778814263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:23:15,715] [INFO] [timer.py:197:stop] 0/7094, RunningAvgSamplesPerSec=6.327269518707291, CurrSamplesPerSec=5.677464875537601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:23:27,093] [INFO] [timer.py:197:stop] 0/7096, RunningAvgSamplesPerSec=6.327276572650359, CurrSamplesPerSec=5.736090823402809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:23:38,394] [INFO] [timer.py:197:stop] 0/7098, RunningAvgSamplesPerSec=6.327272394472118, CurrSamplesPerSec=5.679779271120741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:23:49,735] [INFO] [logging.py:68:log_dist] [Rank 0] step=3550, skipped=6, lr=[3.237777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 12:23:49,736] [INFO] [timer.py:197:stop] 0/7100, RunningAvgSamplesPerSec=6.327270770541093, CurrSamplesPerSec=5.706095239156057, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:24:01,058] [INFO] [timer.py:197:stop] 0/7102, RunningAvgSamplesPerSec=6.327265807180421, CurrSamplesPerSec=5.673152974285963, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:24:12,361] [INFO] [timer.py:197:stop] 0/7104, RunningAvgSamplesPerSec=6.32725832243062, CurrSamplesPerSec=5.669801669751485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:24:23,710] [INFO] [timer.py:197:stop] 0/7106, RunningAvgSamplesPerSec=6.327249767165843, CurrSamplesPerSec=5.658224631357601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:24:35,015] [INFO] [timer.py:197:stop] 0/7108, RunningAvgSamplesPerSec=6.327248077734367, CurrSamplesPerSec=5.695754562547654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
12:24:46,357] [INFO] [timer.py:197:stop] 0/7110, RunningAvgSamplesPerSec=6.327243295997078, CurrSamplesPerSec=5.692390270979668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:24:57,694] [INFO] [timer.py:197:stop] 0/7112, RunningAvgSamplesPerSec=6.327240793463647, CurrSamplesPerSec=5.684347522885688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:25:09,036] [INFO] [timer.py:197:stop] 0/7114, RunningAvgSamplesPerSec=6.32723324303884, CurrSamplesPerSec=5.668473891087608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:25:20,442] [INFO] [timer.py:197:stop] 0/7116, RunningAvgSamplesPerSec=6.327225635411155, CurrSamplesPerSec=5.665001620981002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:25:31,837] [INFO] [timer.py:197:stop] 0/7118, RunningAvgSamplesPerSec=6.327219605353441, CurrSamplesPerSec=5.686043090955184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:25:43,207] [INFO] [logging.py:68:log_dist] [Rank 0] step=3560, skipped=6, lr=[3.2155555555555558e-06], mom=[[0.9, 0.999]] [2022-12-19 12:25:43,209] [INFO] [timer.py:197:stop] 0/7120, RunningAvgSamplesPerSec=6.327212366534478, CurrSamplesPerSec=5.67949470514198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:25:54,545] [INFO] [timer.py:197:stop] 0/7122, RunningAvgSamplesPerSec=6.327211248412119, CurrSamplesPerSec=5.688465743281917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:26:05,919] [INFO] [timer.py:197:stop] 0/7124, RunningAvgSamplesPerSec=6.327198828291783, CurrSamplesPerSec=5.691460942440809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:26:17,329] [INFO] [timer.py:197:stop] 0/7126, RunningAvgSamplesPerSec=6.327195208387872, CurrSamplesPerSec=5.681705647953154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.2088888888888893e-06, 'epoch': 26.69} [2022-12-19 12:26:28,676] [INFO] [timer.py:197:stop] 0/7128, RunningAvgSamplesPerSec=6.3271947873784296, 
CurrSamplesPerSec=5.686125234050375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:26:40,046] [INFO] [timer.py:197:stop] 0/7130, RunningAvgSamplesPerSec=6.327189896378389, CurrSamplesPerSec=5.680363634070553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:26:51,368] [INFO] [timer.py:197:stop] 0/7132, RunningAvgSamplesPerSec=6.327190153833476, CurrSamplesPerSec=5.696807399404571, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:27:02,709] [INFO] [timer.py:197:stop] 0/7134, RunningAvgSamplesPerSec=6.327186888889604, CurrSamplesPerSec=5.6768530161234745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:27:14,117] [INFO] [timer.py:197:stop] 0/7136, RunningAvgSamplesPerSec=6.327179102006512, CurrSamplesPerSec=5.671148761288088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:27:25,544] [INFO] [timer.py:197:stop] 0/7138, RunningAvgSamplesPerSec=6.32717484867328, CurrSamplesPerSec=5.6871624646686145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:27:36,959] [INFO] [logging.py:68:log_dist] [Rank 0] step=3570, skipped=6, lr=[3.193333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 12:27:36,961] [INFO] [timer.py:197:stop] 0/7140, RunningAvgSamplesPerSec=6.327169880051704, CurrSamplesPerSec=5.679995358423211, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:27:48,281] [INFO] [timer.py:197:stop] 0/7142, RunningAvgSamplesPerSec=6.327166715784723, CurrSamplesPerSec=5.691830465056544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:27:59,588] [INFO] [timer.py:197:stop] 0/7144, RunningAvgSamplesPerSec=6.3271671503150495, CurrSamplesPerSec=5.689345378061959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:28:10,968] [INFO] [timer.py:197:stop] 0/7146, RunningAvgSamplesPerSec=6.327152979118412, CurrSamplesPerSec=5.688135709298204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:28:22,317] [INFO] [timer.py:197:stop] 0/7148, RunningAvgSamplesPerSec=6.3271496780466245, 
CurrSamplesPerSec=5.690772469508463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:28:33,691] [INFO] [timer.py:197:stop] 0/7150, RunningAvgSamplesPerSec=6.327144426025504, CurrSamplesPerSec=5.6634146419679885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:28:45,024] [INFO] [timer.py:197:stop] 0/7152, RunningAvgSamplesPerSec=6.3271421996258725, CurrSamplesPerSec=5.68887538583452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:28:56,320] [INFO] [timer.py:197:stop] 0/7154, RunningAvgSamplesPerSec=6.327147504844384, CurrSamplesPerSec=5.6989660248066745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:29:07,748] [INFO] [timer.py:197:stop] 0/7156, RunningAvgSamplesPerSec=6.327141952067556, CurrSamplesPerSec=5.679028982155395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:29:19,232] [INFO] [timer.py:197:stop] 0/7158, RunningAvgSamplesPerSec=6.3271331772011905, CurrSamplesPerSec=5.659929948431017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:29:30,578] [INFO] [logging.py:68:log_dist] [Rank 0] step=3580, skipped=6, lr=[3.1711111111111114e-06], mom=[[0.9, 0.999]] [2022-12-19 12:29:30,580] [INFO] [timer.py:197:stop] 0/7160, RunningAvgSamplesPerSec=6.327127294996347, CurrSamplesPerSec=5.660737272267135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:29:41,866] [INFO] [timer.py:197:stop] 0/7162, RunningAvgSamplesPerSec=6.327121722310218, CurrSamplesPerSec=5.673680810950195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:29:53,480] [INFO] [timer.py:197:stop] 0/7164, RunningAvgSamplesPerSec=6.327118631650173, CurrSamplesPerSec=5.693451041833915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:30:05,036] [INFO] [timer.py:197:stop] 0/7166, RunningAvgSamplesPerSec=6.327081303057305, CurrSamplesPerSec=5.677032622087769, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:30:16,370] [INFO] [timer.py:197:stop] 0/7168, RunningAvgSamplesPerSec=6.327082155739768, 
CurrSamplesPerSec=5.691306968733664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:30:27,857] [INFO] [timer.py:197:stop] 0/7170, RunningAvgSamplesPerSec=6.327073223337168, CurrSamplesPerSec=5.650896586631433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:30:39,192] [INFO] [timer.py:197:stop] 0/7172, RunningAvgSamplesPerSec=6.327066514214058, CurrSamplesPerSec=5.6699262183186665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:30:50,539] [INFO] [timer.py:197:stop] 0/7174, RunningAvgSamplesPerSec=6.327056522215855, CurrSamplesPerSec=5.611315935646122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:31:01,920] [INFO] [timer.py:197:stop] 0/7176, RunningAvgSamplesPerSec=6.32705366088648, CurrSamplesPerSec=5.677198791736968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 3.1533333333333338e-06, 'epoch': 26.88} [2022-12-19 12:31:13,317] [INFO] [timer.py:197:stop] 0/7178, RunningAvgSamplesPerSec=6.3270520934003756, CurrSamplesPerSec=5.690949820423792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:31:24,715] [INFO] [logging.py:68:log_dist] [Rank 0] step=3590, skipped=6, lr=[3.148888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 12:31:24,717] [INFO] [timer.py:197:stop] 0/7180, RunningAvgSamplesPerSec=6.327036113518233, CurrSamplesPerSec=5.674137981036339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:31:36,329] [INFO] [timer.py:197:stop] 0/7182, RunningAvgSamplesPerSec=6.327025496643367, CurrSamplesPerSec=5.657281386230217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:31:47,684] [INFO] [timer.py:197:stop] 0/7184, RunningAvgSamplesPerSec=6.327019641730397, CurrSamplesPerSec=5.664678368323859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:31:58,989] [INFO] [timer.py:197:stop] 0/7186, RunningAvgSamplesPerSec=6.327019694537303, CurrSamplesPerSec=5.6876300047181605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:32:10,635] [INFO] 
[timer.py:197:stop] 0/7188, RunningAvgSamplesPerSec=6.326964475146932, CurrSamplesPerSec=5.405721446054562, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:32:22,057] [INFO] [timer.py:197:stop] 0/7190, RunningAvgSamplesPerSec=6.326959396799485, CurrSamplesPerSec=5.668670205105268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:32:33,400] [INFO] [timer.py:197:stop] 0/7192, RunningAvgSamplesPerSec=6.326957587138292, CurrSamplesPerSec=5.705022477029904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:32:44,735] [INFO] [timer.py:197:stop] 0/7194, RunningAvgSamplesPerSec=6.326954642175814, CurrSamplesPerSec=5.697797493571933, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:32:56,307] [INFO] [timer.py:197:stop] 0/7196, RunningAvgSamplesPerSec=6.3269492010078645, CurrSamplesPerSec=5.673307166300374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:33:07,667] [INFO] [timer.py:197:stop] 0/7198, RunningAvgSamplesPerSec=6.326935775497697, CurrSamplesPerSec=5.647910408199515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:33:19,017] [INFO] [logging.py:68:log_dist] [Rank 0] step=3600, skipped=6, lr=[3.1266666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 12:33:19,019] [INFO] [timer.py:197:stop] 0/7200, RunningAvgSamplesPerSec=6.3269310383999136, CurrSamplesPerSec=5.667104619976033, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:33:30,524] [INFO] [timer.py:197:stop] 0/7202, RunningAvgSamplesPerSec=6.326896707980863, CurrSamplesPerSec=5.5139306133499195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:33:42,029] [INFO] [timer.py:197:stop] 0/7204, RunningAvgSamplesPerSec=6.326899483005291, CurrSamplesPerSec=5.687008482480714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:33:53,376] [INFO] [timer.py:197:stop] 0/7206, RunningAvgSamplesPerSec=6.326891019559699, CurrSamplesPerSec=5.668972603176562, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:34:04,885] [INFO] 
[timer.py:197:stop] 0/7208, RunningAvgSamplesPerSec=6.326885039692694, CurrSamplesPerSec=5.688290957913902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:34:15,268] [INFO] [timer.py:197:stop] 0/7210, RunningAvgSamplesPerSec=6.327017970096982, CurrSamplesPerSec=5.654304893773773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:34:26,688] [INFO] [timer.py:197:stop] 0/7212, RunningAvgSamplesPerSec=6.326988249944692, CurrSamplesPerSec=5.537652029625908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:34:37,981] [INFO] [timer.py:197:stop] 0/7214, RunningAvgSamplesPerSec=6.326985434885538, CurrSamplesPerSec=5.669920948845354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:34:49,399] [INFO] [timer.py:197:stop] 0/7216, RunningAvgSamplesPerSec=6.32698572736324, CurrSamplesPerSec=5.701937897274761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:35:00,789] [INFO] [timer.py:197:stop] 0/7218, RunningAvgSamplesPerSec=6.326970760115527, CurrSamplesPerSec=5.687035470931329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:35:12,208] [INFO] [logging.py:68:log_dist] [Rank 0] step=3610, skipped=6, lr=[3.104444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 12:35:12,210] [INFO] [timer.py:197:stop] 0/7220, RunningAvgSamplesPerSec=6.326965121894671, CurrSamplesPerSec=5.661908086043561, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:35:23,807] [INFO] [timer.py:197:stop] 0/7222, RunningAvgSamplesPerSec=6.326916577579626, CurrSamplesPerSec=5.4418314444658025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:35:35,146] [INFO] [timer.py:197:stop] 0/7224, RunningAvgSamplesPerSec=6.326912723846101, CurrSamplesPerSec=5.655555740022739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:35:46,513] [INFO] [timer.py:197:stop] 0/7226, RunningAvgSamplesPerSec=6.326912665299861, CurrSamplesPerSec=5.715041552487384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0004, 'learning_rate': 
3.097777777777778e-06, 'epoch': 27.07} [2022-12-19 12:35:57,873] [INFO] [timer.py:197:stop] 0/7228, RunningAvgSamplesPerSec=6.326903814419152, CurrSamplesPerSec=5.688290957913902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:36:09,162] [INFO] [timer.py:197:stop] 0/7230, RunningAvgSamplesPerSec=6.326905582086577, CurrSamplesPerSec=5.700545149871831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:36:20,820] [INFO] [timer.py:197:stop] 0/7232, RunningAvgSamplesPerSec=6.326901636298663, CurrSamplesPerSec=5.674160529631101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:36:32,117] [INFO] [timer.py:197:stop] 0/7234, RunningAvgSamplesPerSec=6.326899159628436, CurrSamplesPerSec=5.6913214486647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:36:43,617] [INFO] [timer.py:197:stop] 0/7236, RunningAvgSamplesPerSec=6.32686580934922, CurrSamplesPerSec=5.518545806082945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:36:54,899] [INFO] [timer.py:197:stop] 0/7238, RunningAvgSamplesPerSec=6.326867493550722, CurrSamplesPerSec=5.700498906154192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:37:06,152] [INFO] [logging.py:68:log_dist] [Rank 0] step=3620, skipped=6, lr=[3.0822222222222227e-06], mom=[[0.9, 0.999]] [2022-12-19 12:37:06,153] [INFO] [timer.py:197:stop] 0/7240, RunningAvgSamplesPerSec=6.326870849413581, CurrSamplesPerSec=5.697868124086742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:37:17,469] [INFO] [timer.py:197:stop] 0/7242, RunningAvgSamplesPerSec=6.326868110442388, CurrSamplesPerSec=5.690521784145723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:37:29,144] [INFO] [timer.py:197:stop] 0/7244, RunningAvgSamplesPerSec=6.326800519212486, CurrSamplesPerSec=5.309836414946014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:37:40,457] [INFO] [timer.py:197:stop] 0/7246, RunningAvgSamplesPerSec=6.3267996860306015, CurrSamplesPerSec=5.7038875783910035, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:37:51,754] [INFO] [timer.py:197:stop] 0/7248, RunningAvgSamplesPerSec=6.3267986358272195, CurrSamplesPerSec=5.682793478572898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:38:03,135] [INFO] [timer.py:197:stop] 0/7250, RunningAvgSamplesPerSec=6.326784263647111, CurrSamplesPerSec=5.679242368965407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:38:14,427] [INFO] [timer.py:197:stop] 0/7252, RunningAvgSamplesPerSec=6.326785442609721, CurrSamplesPerSec=5.697731702338869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:38:25,801] [INFO] [timer.py:197:stop] 0/7254, RunningAvgSamplesPerSec=6.326772569185867, CurrSamplesPerSec=5.687660614402331, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:38:37,082] [INFO] [timer.py:197:stop] 0/7256, RunningAvgSamplesPerSec=6.326774378469869, CurrSamplesPerSec=5.694916439412799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:38:48,583] [INFO] [timer.py:197:stop] 0/7258, RunningAvgSamplesPerSec=6.326762998296623, CurrSamplesPerSec=5.687485878507279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:38:59,899] [INFO] [logging.py:68:log_dist] [Rank 0] step=3630, skipped=6, lr=[3.0600000000000003e-06], mom=[[0.9, 0.999]] [2022-12-19 12:38:59,901] [INFO] [timer.py:197:stop] 0/7260, RunningAvgSamplesPerSec=6.326759985289769, CurrSamplesPerSec=5.692670094513411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:39:11,311] [INFO] [timer.py:197:stop] 0/7262, RunningAvgSamplesPerSec=6.326745468216178, CurrSamplesPerSec=5.643049349978831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:39:22,667] [INFO] [timer.py:197:stop] 0/7264, RunningAvgSamplesPerSec=6.326741408465387, CurrSamplesPerSec=5.682456403809588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:39:33,989] [INFO] [timer.py:197:stop] 0/7266, RunningAvgSamplesPerSec=6.326740319779466, CurrSamplesPerSec=5.6635828831829755, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:39:45,285] [INFO] [timer.py:197:stop] 0/7268, RunningAvgSamplesPerSec=6.326739705223273, CurrSamplesPerSec=5.689250601700785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:39:56,719] [INFO] [timer.py:197:stop] 0/7270, RunningAvgSamplesPerSec=6.326712631452167, CurrSamplesPerSec=5.538936593660579, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:40:07,995] [INFO] [timer.py:197:stop] 0/7272, RunningAvgSamplesPerSec=6.326715114681684, CurrSamplesPerSec=5.705919126176111, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:40:19,288] [INFO] [timer.py:197:stop] 0/7274, RunningAvgSamplesPerSec=6.32671644337829, CurrSamplesPerSec=5.713385071943174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:40:30,607] [INFO] [timer.py:197:stop] 0/7276, RunningAvgSamplesPerSec=6.326714338988593, CurrSamplesPerSec=5.697178583933845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 3.0422222222222227e-06, 'epoch': 27.25} [2022-12-19 12:40:41,917] [INFO] [timer.py:197:stop] 0/7278, RunningAvgSamplesPerSec=6.326708783569714, CurrSamplesPerSec=5.676690228022659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:40:53,261] [INFO] [logging.py:68:log_dist] [Rank 0] step=3640, skipped=6, lr=[3.037777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 12:40:53,263] [INFO] [timer.py:197:stop] 0/7280, RunningAvgSamplesPerSec=6.326699367476691, CurrSamplesPerSec=5.685469119081756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:41:04,585] [INFO] [timer.py:197:stop] 0/7282, RunningAvgSamplesPerSec=6.32669846734041, CurrSamplesPerSec=5.695473227279411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:41:15,916] [INFO] [timer.py:197:stop] 0/7284, RunningAvgSamplesPerSec=6.326697886278006, CurrSamplesPerSec=5.6970182553008595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:41:27,265] [INFO] [timer.py:197:stop] 0/7286, 
RunningAvgSamplesPerSec=6.326691022301211, CurrSamplesPerSec=5.672443990746431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:41:38,658] [INFO] [timer.py:197:stop] 0/7288, RunningAvgSamplesPerSec=6.326673781060028, CurrSamplesPerSec=5.606230531416746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:41:49,983] [INFO] [timer.py:197:stop] 0/7290, RunningAvgSamplesPerSec=6.326669547234485, CurrSamplesPerSec=5.681488228004327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:42:01,306] [INFO] [timer.py:197:stop] 0/7292, RunningAvgSamplesPerSec=6.326661632471784, CurrSamplesPerSec=5.654440911377166, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:42:12,630] [INFO] [timer.py:197:stop] 0/7294, RunningAvgSamplesPerSec=6.326658020840207, CurrSamplesPerSec=5.6800657886495625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:42:24,050] [INFO] [timer.py:197:stop] 0/7296, RunningAvgSamplesPerSec=6.326641665150236, CurrSamplesPerSec=5.613052009642134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:42:35,321] [INFO] [timer.py:197:stop] 0/7298, RunningAvgSamplesPerSec=6.326640935487673, CurrSamplesPerSec=5.701249309207745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:42:46,712] [INFO] [logging.py:68:log_dist] [Rank 0] step=3650, skipped=6, lr=[3.015555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 12:42:46,713] [INFO] [timer.py:197:stop] 0/7300, RunningAvgSamplesPerSec=6.326622562110285, CurrSamplesPerSec=5.682208615605204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:42:58,650] [INFO] [timer.py:197:stop] 0/7302, RunningAvgSamplesPerSec=6.326613666064802, CurrSamplesPerSec=5.638910840764831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:43:10,525] [INFO] [timer.py:197:stop] 0/7304, RunningAvgSamplesPerSec=6.326602686137347, CurrSamplesPerSec=5.64941094991558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:43:22,395] [INFO] [timer.py:197:stop] 0/7306, 
RunningAvgSamplesPerSec=6.326600771907224, CurrSamplesPerSec=5.682937367059271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:43:33,674] [INFO] [timer.py:197:stop] 0/7308, RunningAvgSamplesPerSec=6.326603261270357, CurrSamplesPerSec=5.697195753900068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:43:45,110] [INFO] [timer.py:197:stop] 0/7310, RunningAvgSamplesPerSec=6.326584396464529, CurrSamplesPerSec=5.586230453845414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:43:56,557] [INFO] [timer.py:197:stop] 0/7312, RunningAvgSamplesPerSec=6.326583089614416, CurrSamplesPerSec=5.696100468996214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:44:07,882] [INFO] [timer.py:197:stop] 0/7314, RunningAvgSamplesPerSec=6.3265840158457385, CurrSamplesPerSec=5.709279518584993, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:44:19,479] [INFO] [timer.py:197:stop] 0/7316, RunningAvgSamplesPerSec=6.32658137147921, CurrSamplesPerSec=5.697181727722739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:44:30,773] [INFO] [timer.py:197:stop] 0/7318, RunningAvgSamplesPerSec=6.3265816803746935, CurrSamplesPerSec=5.692584623492688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:44:42,067] [INFO] [logging.py:68:log_dist] [Rank 0] step=3660, skipped=6, lr=[2.9933333333333336e-06], mom=[[0.9, 0.999]] [2022-12-19 12:44:42,069] [INFO] [timer.py:197:stop] 0/7320, RunningAvgSamplesPerSec=6.326586614971007, CurrSamplesPerSec=5.718194412272607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:44:53,337] [INFO] [timer.py:197:stop] 0/7322, RunningAvgSamplesPerSec=6.326588955952083, CurrSamplesPerSec=5.699992453391411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:45:04,735] [INFO] [timer.py:197:stop] 0/7324, RunningAvgSamplesPerSec=6.326596695545706, CurrSamplesPerSec=5.699129125038513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:45:16,048] [INFO] [timer.py:197:stop] 0/7326, 
RunningAvgSamplesPerSec=6.32659401829469, CurrSamplesPerSec=5.7174013046817755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.986666666666667e-06, 'epoch': 27.44} [2022-12-19 12:45:27,568] [INFO] [timer.py:197:stop] 0/7328, RunningAvgSamplesPerSec=6.3265934954703065, CurrSamplesPerSec=5.684795819321407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:45:39,062] [INFO] [timer.py:197:stop] 0/7330, RunningAvgSamplesPerSec=6.326565293643663, CurrSamplesPerSec=5.533401976383905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:45:50,336] [INFO] [timer.py:197:stop] 0/7332, RunningAvgSamplesPerSec=6.3265678839399255, CurrSamplesPerSec=5.699976718967875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:46:01,556] [INFO] [timer.py:197:stop] 0/7334, RunningAvgSamplesPerSec=6.32657856038314, CurrSamplesPerSec=5.7139858578371845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:46:13,153] [INFO] [timer.py:197:stop] 0/7336, RunningAvgSamplesPerSec=6.3265810690933115, CurrSamplesPerSec=5.7070824978650165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:46:24,436] [INFO] [timer.py:197:stop] 0/7338, RunningAvgSamplesPerSec=6.326586119838358, CurrSamplesPerSec=5.707291688193751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:46:35,883] [INFO] [logging.py:68:log_dist] [Rank 0] step=3670, skipped=6, lr=[2.9711111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 12:46:35,885] [INFO] [timer.py:197:stop] 0/7340, RunningAvgSamplesPerSec=6.326585668053983, CurrSamplesPerSec=5.723874868740355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:46:47,191] [INFO] [timer.py:197:stop] 0/7342, RunningAvgSamplesPerSec=6.326586710514287, CurrSamplesPerSec=5.67402835903874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:46:58,440] [INFO] [timer.py:197:stop] 0/7344, RunningAvgSamplesPerSec=6.3265900851201975, CurrSamplesPerSec=5.683738511588235, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 12:47:09,794] [INFO] [timer.py:197:stop] 0/7346, RunningAvgSamplesPerSec=6.326593238290855, CurrSamplesPerSec=5.72843245164542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:47:21,289] [INFO] [timer.py:197:stop] 0/7348, RunningAvgSamplesPerSec=6.326599297574536, CurrSamplesPerSec=5.700672989811924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:47:32,845] [INFO] [timer.py:197:stop] 0/7350, RunningAvgSamplesPerSec=6.326552546444211, CurrSamplesPerSec=5.698787448019233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:47:44,392] [INFO] [timer.py:197:stop] 0/7352, RunningAvgSamplesPerSec=6.326557609444567, CurrSamplesPerSec=5.712430884779102, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:47:55,818] [INFO] [timer.py:197:stop] 0/7354, RunningAvgSamplesPerSec=6.326538433054389, CurrSamplesPerSec=5.563145044779475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:48:07,133] [INFO] [timer.py:197:stop] 0/7356, RunningAvgSamplesPerSec=6.326540553899915, CurrSamplesPerSec=5.695326769918949, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:48:18,356] [INFO] [timer.py:197:stop] 0/7358, RunningAvgSamplesPerSec=6.3265516815027745, CurrSamplesPerSec=5.7321195372578275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:48:30,031] [INFO] [logging.py:68:log_dist] [Rank 0] step=3680, skipped=6, lr=[2.948888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 12:48:30,032] [INFO] [timer.py:197:stop] 0/7360, RunningAvgSamplesPerSec=6.326551907498219, CurrSamplesPerSec=5.670553595158882, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:48:41,286] [INFO] [timer.py:197:stop] 0/7362, RunningAvgSamplesPerSec=6.326555096807908, CurrSamplesPerSec=5.7041876849764614, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:48:52,709] [INFO] [timer.py:197:stop] 0/7364, RunningAvgSamplesPerSec=6.326558930027986, CurrSamplesPerSec=5.719636014986876, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 12:49:04,009] [INFO] [timer.py:197:stop] 0/7366, RunningAvgSamplesPerSec=6.326561121002562, CurrSamplesPerSec=5.697483547611639, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:49:15,348] [INFO] [timer.py:197:stop] 0/7368, RunningAvgSamplesPerSec=6.326555367561322, CurrSamplesPerSec=5.658960843392543, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:49:26,722] [INFO] [timer.py:197:stop] 0/7370, RunningAvgSamplesPerSec=6.326547509903645, CurrSamplesPerSec=5.698513071350141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:49:38,106] [INFO] [timer.py:197:stop] 0/7372, RunningAvgSamplesPerSec=6.326541176917299, CurrSamplesPerSec=5.693786765306341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:49:49,471] [INFO] [timer.py:197:stop] 0/7374, RunningAvgSamplesPerSec=6.326528718793954, CurrSamplesPerSec=5.679082327354583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:50:00,911] [INFO] [timer.py:197:stop] 0/7376, RunningAvgSamplesPerSec=6.32653190536233, CurrSamplesPerSec=5.69905459143006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.931111111111111e-06, 'epoch': 27.63} [2022-12-19 12:50:12,393] [INFO] [timer.py:197:stop] 0/7378, RunningAvgSamplesPerSec=6.326503400071382, CurrSamplesPerSec=5.588599962475551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:50:23,704] [INFO] [logging.py:68:log_dist] [Rank 0] step=3690, skipped=6, lr=[2.9266666666666673e-06], mom=[[0.9, 0.999]] [2022-12-19 12:50:23,706] [INFO] [timer.py:197:stop] 0/7380, RunningAvgSamplesPerSec=6.326502801029303, CurrSamplesPerSec=5.698518877990106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:50:35,377] [INFO] [timer.py:197:stop] 0/7382, RunningAvgSamplesPerSec=6.3264362498934075, CurrSamplesPerSec=5.3266534861973565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:50:46,764] [INFO] [timer.py:197:stop] 0/7384, 
RunningAvgSamplesPerSec=6.32643801608496, CurrSamplesPerSec=5.695301394409226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:50:58,030] [INFO] [timer.py:197:stop] 0/7386, RunningAvgSamplesPerSec=6.32643681518484, CurrSamplesPerSec=5.677562141635366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:51:09,394] [INFO] [timer.py:197:stop] 0/7388, RunningAvgSamplesPerSec=6.326430187699988, CurrSamplesPerSec=5.7010899621908795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:51:20,921] [INFO] [timer.py:197:stop] 0/7390, RunningAvgSamplesPerSec=6.326422137159056, CurrSamplesPerSec=5.646931397036695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:51:32,461] [INFO] [timer.py:197:stop] 0/7392, RunningAvgSamplesPerSec=6.326384269841426, CurrSamplesPerSec=5.445823505670881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:51:43,725] [INFO] [timer.py:197:stop] 0/7394, RunningAvgSamplesPerSec=6.326394622835812, CurrSamplesPerSec=5.738654218766829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:51:55,083] [INFO] [timer.py:197:stop] 0/7396, RunningAvgSamplesPerSec=6.326387965586737, CurrSamplesPerSec=5.6175076343229415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:52:06,661] [INFO] [timer.py:197:stop] 0/7398, RunningAvgSamplesPerSec=6.326394853742537, CurrSamplesPerSec=5.722706105542608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:52:17,913] [INFO] [logging.py:68:log_dist] [Rank 0] step=3700, skipped=6, lr=[2.904444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 12:52:17,915] [INFO] [timer.py:197:stop] 0/7400, RunningAvgSamplesPerSec=6.326398277683811, CurrSamplesPerSec=5.714524483439243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:52:29,416] [INFO] [timer.py:197:stop] 0/7402, RunningAvgSamplesPerSec=6.326398009347313, CurrSamplesPerSec=5.714412322222015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:52:40,728] [INFO] [timer.py:197:stop] 0/7404, 
RunningAvgSamplesPerSec=6.326399765728833, CurrSamplesPerSec=5.691291523555103, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:52:52,010] [INFO] [timer.py:197:stop] 0/7406, RunningAvgSamplesPerSec=6.326405003152809, CurrSamplesPerSec=5.7120764285529155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:53:03,268] [INFO] [timer.py:197:stop] 0/7408, RunningAvgSamplesPerSec=6.326413113158804, CurrSamplesPerSec=5.707861336341741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:53:14,796] [INFO] [timer.py:197:stop] 0/7410, RunningAvgSamplesPerSec=6.3264179125338, CurrSamplesPerSec=5.709642857537873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:53:26,154] [INFO] [timer.py:197:stop] 0/7412, RunningAvgSamplesPerSec=6.326416051279769, CurrSamplesPerSec=5.711629895777799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:53:37,499] [INFO] [timer.py:197:stop] 0/7414, RunningAvgSamplesPerSec=6.32641607504892, CurrSamplesPerSec=5.6832872534098815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:53:48,780] [INFO] [timer.py:197:stop] 0/7416, RunningAvgSamplesPerSec=6.326419960387166, CurrSamplesPerSec=5.709413822315627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:54:00,135] [INFO] [timer.py:197:stop] 0/7418, RunningAvgSamplesPerSec=6.326415719843938, CurrSamplesPerSec=5.674573872326353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:54:11,432] [INFO] [logging.py:68:log_dist] [Rank 0] step=3710, skipped=6, lr=[2.8822222222222225e-06], mom=[[0.9, 0.999]] [2022-12-19 12:54:11,433] [INFO] [timer.py:197:stop] 0/7420, RunningAvgSamplesPerSec=6.326419392417126, CurrSamplesPerSec=5.6797908081807975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:54:22,859] [INFO] [timer.py:197:stop] 0/7422, RunningAvgSamplesPerSec=6.3264287259124075, CurrSamplesPerSec=5.722514570552376, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:54:34,145] [INFO] [timer.py:197:stop] 0/7424, 
RunningAvgSamplesPerSec=6.326432876517175, CurrSamplesPerSec=5.712724353362031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:54:45,449] [INFO] [timer.py:197:stop] 0/7426, RunningAvgSamplesPerSec=6.32644122890358, CurrSamplesPerSec=5.714569495144024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.875555555555556e-06, 'epoch': 27.82} [2022-12-19 12:54:56,915] [INFO] [timer.py:197:stop] 0/7428, RunningAvgSamplesPerSec=6.326445859010926, CurrSamplesPerSec=5.70093740424476, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:55:08,299] [INFO] [timer.py:197:stop] 0/7430, RunningAvgSamplesPerSec=6.3264295046598065, CurrSamplesPerSec=5.592551671768745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:55:19,597] [INFO] [timer.py:197:stop] 0/7432, RunningAvgSamplesPerSec=6.326434083278377, CurrSamplesPerSec=5.703636948141368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:55:31,017] [INFO] [timer.py:197:stop] 0/7434, RunningAvgSamplesPerSec=6.326415202338471, CurrSamplesPerSec=5.583078567957287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:55:42,454] [INFO] [timer.py:197:stop] 0/7436, RunningAvgSamplesPerSec=6.326419215285285, CurrSamplesPerSec=5.691255806900758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:55:53,743] [INFO] [timer.py:197:stop] 0/7438, RunningAvgSamplesPerSec=6.326421154725038, CurrSamplesPerSec=5.698271380459116, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:56:05,214] [INFO] [logging.py:68:log_dist] [Rank 0] step=3720, skipped=6, lr=[2.86e-06], mom=[[0.9, 0.999]] [2022-12-19 12:56:05,216] [INFO] [timer.py:197:stop] 0/7440, RunningAvgSamplesPerSec=6.326419953612983, CurrSamplesPerSec=5.6947620370604515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:56:16,560] [INFO] [timer.py:197:stop] 0/7442, RunningAvgSamplesPerSec=6.3264236008463675, CurrSamplesPerSec=5.697373989099078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
12:56:28,145] [INFO] [timer.py:197:stop] 0/7444, RunningAvgSamplesPerSec=6.326377656613037, CurrSamplesPerSec=5.427064828522956, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:56:39,465] [INFO] [timer.py:197:stop] 0/7446, RunningAvgSamplesPerSec=6.326375073057987, CurrSamplesPerSec=5.681899993141981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:56:50,910] [INFO] [timer.py:197:stop] 0/7448, RunningAvgSamplesPerSec=6.3263797892178575, CurrSamplesPerSec=5.706078985849943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:57:02,238] [INFO] [timer.py:197:stop] 0/7450, RunningAvgSamplesPerSec=6.326375759822367, CurrSamplesPerSec=5.687042459054042, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:57:13,490] [INFO] [timer.py:197:stop] 0/7452, RunningAvgSamplesPerSec=6.326378812019386, CurrSamplesPerSec=5.717358196722708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:57:25,014] [INFO] [timer.py:197:stop] 0/7454, RunningAvgSamplesPerSec=6.326373066909723, CurrSamplesPerSec=5.696140114347429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:57:36,353] [INFO] [timer.py:197:stop] 0/7456, RunningAvgSamplesPerSec=6.326369693444543, CurrSamplesPerSec=5.6874087571136185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:57:47,691] [INFO] [timer.py:197:stop] 0/7458, RunningAvgSamplesPerSec=6.326366937337701, CurrSamplesPerSec=5.668147369596408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:57:59,195] [INFO] [logging.py:68:log_dist] [Rank 0] step=3730, skipped=6, lr=[2.837777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 12:57:59,198] [INFO] [timer.py:197:stop] 0/7460, RunningAvgSamplesPerSec=6.32636865482078, CurrSamplesPerSec=5.708717600076083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:58:10,481] [INFO] [timer.py:197:stop] 0/7462, RunningAvgSamplesPerSec=6.326368504911937, CurrSamplesPerSec=5.702390910824729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
12:58:21,836] [INFO] [timer.py:197:stop] 0/7464, RunningAvgSamplesPerSec=6.3263613051702325, CurrSamplesPerSec=5.685135579775044, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:58:33,383] [INFO] [timer.py:197:stop] 0/7466, RunningAvgSamplesPerSec=6.326363251777437, CurrSamplesPerSec=5.69909572986899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:58:44,757] [INFO] [timer.py:197:stop] 0/7468, RunningAvgSamplesPerSec=6.326344320649407, CurrSamplesPerSec=5.58020700082952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:58:56,047] [INFO] [timer.py:197:stop] 0/7470, RunningAvgSamplesPerSec=6.326345274596041, CurrSamplesPerSec=5.688594247514069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:59:07,377] [INFO] [timer.py:197:stop] 0/7472, RunningAvgSamplesPerSec=6.326348882620933, CurrSamplesPerSec=5.705891473059357, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:59:18,707] [INFO] [timer.py:197:stop] 0/7474, RunningAvgSamplesPerSec=6.326339032924863, CurrSamplesPerSec=5.679160664913947, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:59:29,140] [INFO] [timer.py:197:stop] 0/7476, RunningAvgSamplesPerSec=6.3264708255295705, CurrSamplesPerSec=6.645330875446805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 12:59:40,484] [INFO] [timer.py:197:stop] 0/7478, RunningAvgSamplesPerSec=6.326467351849989, CurrSamplesPerSec=5.655693486057903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.8177777777777784e-06, 'epoch': 28.01} [2022-12-19 12:59:51,837] [INFO] [logging.py:68:log_dist] [Rank 0] step=3740, skipped=6, lr=[2.815555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 12:59:51,839] [INFO] [timer.py:197:stop] 0/7480, RunningAvgSamplesPerSec=6.326459469316938, CurrSamplesPerSec=5.658272338581369, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:00:03,346] [INFO] [timer.py:197:stop] 0/7482, RunningAvgSamplesPerSec=6.326426979765127, 
CurrSamplesPerSec=5.513690056194417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:00:14,745] [INFO] [timer.py:197:stop] 0/7484, RunningAvgSamplesPerSec=6.326424775930554, CurrSamplesPerSec=5.686586098413076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:00:26,038] [INFO] [timer.py:197:stop] 0/7486, RunningAvgSamplesPerSec=6.326425119803784, CurrSamplesPerSec=5.696075086591611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:00:37,433] [INFO] [timer.py:197:stop] 0/7488, RunningAvgSamplesPerSec=6.326424688198611, CurrSamplesPerSec=5.689474645568597, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:00:48,859] [INFO] [timer.py:197:stop] 0/7490, RunningAvgSamplesPerSec=6.326421208684619, CurrSamplesPerSec=5.684516047086141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:01:00,349] [INFO] [timer.py:197:stop] 0/7492, RunningAvgSamplesPerSec=6.326391308661792, CurrSamplesPerSec=5.5437746491392526, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:01:11,648] [INFO] [timer.py:197:stop] 0/7494, RunningAvgSamplesPerSec=6.326389224741426, CurrSamplesPerSec=5.678055007755317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:01:23,228] [INFO] [timer.py:197:stop] 0/7496, RunningAvgSamplesPerSec=6.326341417622771, CurrSamplesPerSec=5.434159440499643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:01:34,824] [INFO] [timer.py:197:stop] 0/7498, RunningAvgSamplesPerSec=6.326327806788834, CurrSamplesPerSec=5.677878219422445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:01:46,135] [INFO] [logging.py:68:log_dist] [Rank 0] step=3750, skipped=6, lr=[2.7933333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 13:01:46,137] [INFO] [timer.py:197:stop] 0/7500, RunningAvgSamplesPerSec=6.326327825745383, CurrSamplesPerSec=5.676055251594499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:01:57,616] [INFO] [timer.py:197:stop] 0/7502, RunningAvgSamplesPerSec=6.3263219870073195, 
CurrSamplesPerSec=5.672490020145245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:02:09,033] [INFO] [timer.py:197:stop] 0/7504, RunningAvgSamplesPerSec=6.326320563459536, CurrSamplesPerSec=5.676534171499789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:02:20,400] [INFO] [timer.py:197:stop] 0/7506, RunningAvgSamplesPerSec=6.326309885088488, CurrSamplesPerSec=5.637692210145909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:02:31,742] [INFO] [timer.py:197:stop] 0/7508, RunningAvgSamplesPerSec=6.326304603264045, CurrSamplesPerSec=5.674648726659906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:02:43,056] [INFO] [timer.py:197:stop] 0/7510, RunningAvgSamplesPerSec=6.326304108572571, CurrSamplesPerSec=5.675035505512281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:02:54,485] [INFO] [timer.py:197:stop] 0/7512, RunningAvgSamplesPerSec=6.326283271752458, CurrSamplesPerSec=5.654965986417487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:03:05,822] [INFO] [timer.py:197:stop] 0/7514, RunningAvgSamplesPerSec=6.326276313781599, CurrSamplesPerSec=5.671593779903194, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:03:17,272] [INFO] [timer.py:197:stop] 0/7516, RunningAvgSamplesPerSec=6.326269451914143, CurrSamplesPerSec=5.669724069180253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:03:28,642] [INFO] [timer.py:197:stop] 0/7518, RunningAvgSamplesPerSec=6.326263880634568, CurrSamplesPerSec=5.665940027740865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:03:39,944] [INFO] [logging.py:68:log_dist] [Rank 0] step=3760, skipped=6, lr=[2.771111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 13:03:39,946] [INFO] [timer.py:197:stop] 0/7520, RunningAvgSamplesPerSec=6.32626291974317, CurrSamplesPerSec=5.68166692481959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:03:51,213] [INFO] [timer.py:197:stop] 0/7522, RunningAvgSamplesPerSec=6.32626590760842, 
CurrSamplesPerSec=5.708565847626858, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:04:02,820] [INFO] [timer.py:197:stop] 0/7524, RunningAvgSamplesPerSec=6.3262625499165495, CurrSamplesPerSec=5.6823375590030905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:04:14,166] [INFO] [timer.py:197:stop] 0/7526, RunningAvgSamplesPerSec=6.326255467549943, CurrSamplesPerSec=5.701100617336601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:04:25,503] [INFO] [timer.py:197:stop] 0/7528, RunningAvgSamplesPerSec=6.326252881558468, CurrSamplesPerSec=5.6723212493349005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.7622222222222224e-06, 'epoch': 28.19} [2022-12-19 13:04:37,219] [INFO] [timer.py:197:stop] 0/7530, RunningAvgSamplesPerSec=6.326245262495048, CurrSamplesPerSec=5.66441611128677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:04:48,543] [INFO] [timer.py:197:stop] 0/7532, RunningAvgSamplesPerSec=6.326244874988683, CurrSamplesPerSec=5.685633856180173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:05:00,041] [INFO] [timer.py:197:stop] 0/7534, RunningAvgSamplesPerSec=6.326210476110107, CurrSamplesPerSec=5.481106218833133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:05:11,602] [INFO] [timer.py:197:stop] 0/7536, RunningAvgSamplesPerSec=6.3262115691556104, CurrSamplesPerSec=5.695274569116246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:05:22,915] [INFO] [timer.py:197:stop] 0/7538, RunningAvgSamplesPerSec=6.326208759176597, CurrSamplesPerSec=5.692261112364346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:05:34,294] [INFO] [logging.py:68:log_dist] [Rank 0] step=3770, skipped=6, lr=[2.748888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 13:05:34,295] [INFO] [timer.py:197:stop] 0/7540, RunningAvgSamplesPerSec=6.326205545353157, CurrSamplesPerSec=5.706677265568391, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:05:45,777] [INFO] 
[timer.py:197:stop] 0/7542, RunningAvgSamplesPerSec=6.326205118400007, CurrSamplesPerSec=5.686262064181963, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:05:57,479] [INFO] [timer.py:197:stop] 0/7544, RunningAvgSamplesPerSec=6.3261432187928275, CurrSamplesPerSec=5.340915848471092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:06:08,768] [INFO] [timer.py:197:stop] 0/7546, RunningAvgSamplesPerSec=6.326143252932276, CurrSamplesPerSec=5.6954246490032245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:06:20,493] [INFO] [timer.py:197:stop] 0/7548, RunningAvgSamplesPerSec=6.326075836448096, CurrSamplesPerSec=5.285450380015823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:06:31,885] [INFO] [timer.py:197:stop] 0/7550, RunningAvgSamplesPerSec=6.326075442753217, CurrSamplesPerSec=5.6899100019221045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:06:43,190] [INFO] [timer.py:197:stop] 0/7552, RunningAvgSamplesPerSec=6.326078233765403, CurrSamplesPerSec=5.709418193971812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:06:54,573] [INFO] [timer.py:197:stop] 0/7554, RunningAvgSamplesPerSec=6.326071753052907, CurrSamplesPerSec=5.673140025415861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:07:05,887] [INFO] [timer.py:197:stop] 0/7556, RunningAvgSamplesPerSec=6.326072649016556, CurrSamplesPerSec=5.697257663369995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:07:17,246] [INFO] [timer.py:197:stop] 0/7558, RunningAvgSamplesPerSec=6.326061968286915, CurrSamplesPerSec=5.631799536432363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:07:28,539] [INFO] [logging.py:68:log_dist] [Rank 0] step=3780, skipped=6, lr=[2.726666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 13:07:28,541] [INFO] [timer.py:197:stop] 0/7560, RunningAvgSamplesPerSec=6.3260639996121855, CurrSamplesPerSec=5.698284444307455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:07:39,849] [INFO] 
[timer.py:197:stop] 0/7562, RunningAvgSamplesPerSec=6.326060723624805, CurrSamplesPerSec=5.675312665464043, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:07:51,195] [INFO] [timer.py:197:stop] 0/7564, RunningAvgSamplesPerSec=6.3260537664372025, CurrSamplesPerSec=5.701135246835258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:08:02,793] [INFO] [timer.py:197:stop] 0/7566, RunningAvgSamplesPerSec=6.326050391108755, CurrSamplesPerSec=5.69980558246421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:08:14,417] [INFO] [timer.py:197:stop] 0/7568, RunningAvgSamplesPerSec=6.3260031671798025, CurrSamplesPerSec=5.433674347849867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:08:25,722] [INFO] [timer.py:197:stop] 0/7570, RunningAvgSamplesPerSec=6.3260028745032, CurrSamplesPerSec=5.70237346725938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:08:37,142] [INFO] [timer.py:197:stop] 0/7572, RunningAvgSamplesPerSec=6.32598002973172, CurrSamplesPerSec=5.548191563588449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:08:48,764] [INFO] [timer.py:197:stop] 0/7574, RunningAvgSamplesPerSec=6.325966080557791, CurrSamplesPerSec=5.626351691834698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:09:00,089] [INFO] [timer.py:197:stop] 0/7576, RunningAvgSamplesPerSec=6.325966599750553, CurrSamplesPerSec=5.698147276887351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:09:11,396] [INFO] [timer.py:197:stop] 0/7578, RunningAvgSamplesPerSec=6.325968676866865, CurrSamplesPerSec=5.720779872027458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.706666666666667e-06, 'epoch': 28.38} [2022-12-19 13:09:22,992] [INFO] [logging.py:68:log_dist] [Rank 0] step=3790, skipped=6, lr=[2.7044444444444447e-06], mom=[[0.9, 0.999]] [2022-12-19 13:09:22,993] [INFO] [timer.py:197:stop] 0/7580, RunningAvgSamplesPerSec=6.3259676768165445, CurrSamplesPerSec=5.696872443882077, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:09:34,474] [INFO] [timer.py:197:stop] 0/7582, RunningAvgSamplesPerSec=6.32594000190206, CurrSamplesPerSec=5.537779293933153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:09:45,807] [INFO] [timer.py:197:stop] 0/7584, RunningAvgSamplesPerSec=6.325939561194481, CurrSamplesPerSec=5.696159695486687, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:09:57,274] [INFO] [timer.py:197:stop] 0/7586, RunningAvgSamplesPerSec=6.325913951524841, CurrSamplesPerSec=5.555176622798571, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:10:08,646] [INFO] [timer.py:197:stop] 0/7588, RunningAvgSamplesPerSec=6.325908385133546, CurrSamplesPerSec=5.67178671489725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:10:19,935] [INFO] [timer.py:197:stop] 0/7590, RunningAvgSamplesPerSec=6.325907977394865, CurrSamplesPerSec=5.689160651484775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:10:31,325] [INFO] [timer.py:197:stop] 0/7592, RunningAvgSamplesPerSec=6.325903945437282, CurrSamplesPerSec=5.69180029322046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:10:42,782] [INFO] [timer.py:197:stop] 0/7594, RunningAvgSamplesPerSec=6.325902459977605, CurrSamplesPerSec=5.694957518130973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:10:54,526] [INFO] [timer.py:197:stop] 0/7596, RunningAvgSamplesPerSec=6.325835366597209, CurrSamplesPerSec=5.3164739131673056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:11:05,911] [INFO] [timer.py:197:stop] 0/7598, RunningAvgSamplesPerSec=6.325822192746681, CurrSamplesPerSec=5.5952761762847825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:11:17,353] [INFO] [logging.py:68:log_dist] [Rank 0] step=3800, skipped=6, lr=[2.6822222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 13:11:17,354] [INFO] [timer.py:197:stop] 0/7600, RunningAvgSamplesPerSec=6.3258179463615445, CurrSamplesPerSec=5.674487024580855, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:11:28,966] [INFO] [timer.py:197:stop] 0/7602, RunningAvgSamplesPerSec=6.3257707953482045, CurrSamplesPerSec=5.6937309696695735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:11:40,257] [INFO] [timer.py:197:stop] 0/7604, RunningAvgSamplesPerSec=6.325772798854282, CurrSamplesPerSec=5.691630854131405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:11:51,969] [INFO] [timer.py:197:stop] 0/7606, RunningAvgSamplesPerSec=6.325770743510333, CurrSamplesPerSec=5.671903201260805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:12:03,295] [INFO] [timer.py:197:stop] 0/7608, RunningAvgSamplesPerSec=6.325767407406213, CurrSamplesPerSec=5.667853676023428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:12:14,878] [INFO] [timer.py:197:stop] 0/7610, RunningAvgSamplesPerSec=6.3257257522439545, CurrSamplesPerSec=5.4408094206481685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:12:26,230] [INFO] [timer.py:197:stop] 0/7612, RunningAvgSamplesPerSec=6.325726316938821, CurrSamplesPerSec=5.679497589113596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:12:37,628] [INFO] [timer.py:197:stop] 0/7614, RunningAvgSamplesPerSec=6.325715347635781, CurrSamplesPerSec=5.625447335834916, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:12:49,227] [INFO] [timer.py:197:stop] 0/7616, RunningAvgSamplesPerSec=6.3257120617539275, CurrSamplesPerSec=5.669915918902692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:13:00,592] [INFO] [timer.py:197:stop] 0/7618, RunningAvgSamplesPerSec=6.325705478500622, CurrSamplesPerSec=5.661552707324238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:13:11,963] [INFO] [logging.py:68:log_dist] [Rank 0] step=3810, skipped=6, lr=[2.6600000000000004e-06], mom=[[0.9, 0.999]] [2022-12-19 13:13:11,964] [INFO] [timer.py:197:stop] 0/7620, RunningAvgSamplesPerSec=6.325696850153674, CurrSamplesPerSec=5.651596148175384, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:13:23,285] [INFO] [timer.py:197:stop] 0/7622, RunningAvgSamplesPerSec=6.325696742090677, CurrSamplesPerSec=5.6842028407943115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:13:34,840] [INFO] [timer.py:197:stop] 0/7624, RunningAvgSamplesPerSec=6.325698634401066, CurrSamplesPerSec=5.691122838703123, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:13:46,342] [INFO] [timer.py:197:stop] 0/7626, RunningAvgSamplesPerSec=6.325672468667422, CurrSamplesPerSec=5.643542174418512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:13:57,622] [INFO] [timer.py:197:stop] 0/7628, RunningAvgSamplesPerSec=6.325676085522505, CurrSamplesPerSec=5.7086554412973385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.6511111111111113e-06, 'epoch': 28.57} [2022-12-19 13:14:09,173] [INFO] [timer.py:197:stop] 0/7630, RunningAvgSamplesPerSec=6.325670713121941, CurrSamplesPerSec=5.6756949746498675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:14:20,484] [INFO] [timer.py:197:stop] 0/7632, RunningAvgSamplesPerSec=6.325668846179882, CurrSamplesPerSec=5.677485529294473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:14:31,816] [INFO] [timer.py:197:stop] 0/7634, RunningAvgSamplesPerSec=6.325668832541273, CurrSamplesPerSec=5.694375947010436, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:14:43,094] [INFO] [timer.py:197:stop] 0/7636, RunningAvgSamplesPerSec=6.32567281029167, CurrSamplesPerSec=5.709598409180774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:14:54,387] [INFO] [timer.py:197:stop] 0/7638, RunningAvgSamplesPerSec=6.325671471222432, CurrSamplesPerSec=5.694285593069332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:15:05,640] [INFO] [logging.py:68:log_dist] [Rank 0] step=3820, skipped=6, lr=[2.637777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 13:15:05,642] [INFO] [timer.py:197:stop] 0/7640, 
RunningAvgSamplesPerSec=6.325678058596746, CurrSamplesPerSec=5.716272431542518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:15:16,919] [INFO] [timer.py:197:stop] 0/7642, RunningAvgSamplesPerSec=6.325681366467181, CurrSamplesPerSec=5.6925435789994605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:15:28,183] [INFO] [timer.py:197:stop] 0/7644, RunningAvgSamplesPerSec=6.325692461481123, CurrSamplesPerSec=5.726431264062196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:15:39,473] [INFO] [timer.py:197:stop] 0/7646, RunningAvgSamplesPerSec=6.325697762287474, CurrSamplesPerSec=5.700393347389898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:15:50,977] [INFO] [timer.py:197:stop] 0/7648, RunningAvgSamplesPerSec=6.325702033948269, CurrSamplesPerSec=5.700552413351156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:16:02,372] [INFO] [timer.py:197:stop] 0/7650, RunningAvgSamplesPerSec=6.32570845528579, CurrSamplesPerSec=5.712707089669525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:16:13,632] [INFO] [timer.py:197:stop] 0/7652, RunningAvgSamplesPerSec=6.325718778856092, CurrSamplesPerSec=5.722158861875652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:16:25,065] [INFO] [timer.py:197:stop] 0/7654, RunningAvgSamplesPerSec=6.325723155548144, CurrSamplesPerSec=5.698102281605983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:16:36,445] [INFO] [timer.py:197:stop] 0/7656, RunningAvgSamplesPerSec=6.32571972041576, CurrSamplesPerSec=5.650399860333155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:16:47,671] [INFO] [timer.py:197:stop] 0/7658, RunningAvgSamplesPerSec=6.325725337616911, CurrSamplesPerSec=5.731594967777702, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:16:58,991] [INFO] [logging.py:68:log_dist] [Rank 0] step=3830, skipped=6, lr=[2.6155555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 13:16:58,993] [INFO] [timer.py:197:stop] 0/7660, 
RunningAvgSamplesPerSec=6.325723345008324, CurrSamplesPerSec=5.683957307345862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:17:10,328] [INFO] [timer.py:197:stop] 0/7662, RunningAvgSamplesPerSec=6.325724145657958, CurrSamplesPerSec=5.692300945704742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:17:21,652] [INFO] [timer.py:197:stop] 0/7664, RunningAvgSamplesPerSec=6.325722867244221, CurrSamplesPerSec=5.695975976511234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:17:33,170] [INFO] [timer.py:197:stop] 0/7666, RunningAvgSamplesPerSec=6.325724055477331, CurrSamplesPerSec=5.6804795113502955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:17:44,614] [INFO] [timer.py:197:stop] 0/7668, RunningAvgSamplesPerSec=6.325724318278285, CurrSamplesPerSec=5.686640308488723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:17:56,011] [INFO] [timer.py:197:stop] 0/7670, RunningAvgSamplesPerSec=6.325712550340293, CurrSamplesPerSec=5.629348873540756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:18:07,298] [INFO] [timer.py:197:stop] 0/7672, RunningAvgSamplesPerSec=6.325718167774465, CurrSamplesPerSec=5.69050634322856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:18:18,593] [INFO] [timer.py:197:stop] 0/7674, RunningAvgSamplesPerSec=6.325719674021079, CurrSamplesPerSec=5.689471510279712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:18:29,919] [INFO] [timer.py:197:stop] 0/7676, RunningAvgSamplesPerSec=6.325726646737962, CurrSamplesPerSec=5.725336677421086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:18:41,196] [INFO] [timer.py:197:stop] 0/7678, RunningAvgSamplesPerSec=6.32572937290245, CurrSamplesPerSec=5.6860312875969345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.5955555555555558e-06, 'epoch': 28.76} [2022-12-19 13:18:52,475] [INFO] [logging.py:68:log_dist] [Rank 0] step=3840, skipped=6, lr=[2.5933333333333336e-06], mom=[[0.9, 0.999]] 
[2022-12-19 13:18:52,477] [INFO] [timer.py:197:stop] 0/7680, RunningAvgSamplesPerSec=6.325735056664741, CurrSamplesPerSec=5.715024031439892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:19:03,764] [INFO] [timer.py:197:stop] 0/7682, RunningAvgSamplesPerSec=6.3257420877413, CurrSamplesPerSec=5.708830994772771, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:19:15,022] [INFO] [timer.py:197:stop] 0/7684, RunningAvgSamplesPerSec=6.325749602117144, CurrSamplesPerSec=5.7203568451565845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:19:26,311] [INFO] [timer.py:197:stop] 0/7686, RunningAvgSamplesPerSec=6.325756617011262, CurrSamplesPerSec=5.7107360677083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:19:37,788] [INFO] [timer.py:197:stop] 0/7688, RunningAvgSamplesPerSec=6.3257642954131335, CurrSamplesPerSec=5.718456800038516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:19:49,051] [INFO] [timer.py:197:stop] 0/7690, RunningAvgSamplesPerSec=6.325768625483976, CurrSamplesPerSec=5.711497918238238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:20:00,549] [INFO] [timer.py:197:stop] 0/7692, RunningAvgSamplesPerSec=6.325771549113706, CurrSamplesPerSec=5.695751178624997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:20:11,929] [INFO] [timer.py:197:stop] 0/7694, RunningAvgSamplesPerSec=6.325773127007595, CurrSamplesPerSec=5.696356964843733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:20:23,223] [INFO] [timer.py:197:stop] 0/7696, RunningAvgSamplesPerSec=6.3257719763941385, CurrSamplesPerSec=5.680045837247723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:20:34,522] [INFO] [timer.py:197:stop] 0/7698, RunningAvgSamplesPerSec=6.325772180124875, CurrSamplesPerSec=5.698418473174162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:20:46,042] [INFO] [logging.py:68:log_dist] [Rank 0] step=3850, skipped=6, lr=[2.5711111111111112e-06], mom=[[0.9, 0.999]] [2022-12-19 
13:20:46,044] [INFO] [timer.py:197:stop] 0/7700, RunningAvgSamplesPerSec=6.325773462205251, CurrSamplesPerSec=5.694615374661078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:20:57,368] [INFO] [timer.py:197:stop] 0/7702, RunningAvgSamplesPerSec=6.325770168635335, CurrSamplesPerSec=5.684020373716902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:21:08,705] [INFO] [timer.py:197:stop] 0/7704, RunningAvgSamplesPerSec=6.325770112280341, CurrSamplesPerSec=5.701145659983558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:21:20,264] [INFO] [timer.py:197:stop] 0/7706, RunningAvgSamplesPerSec=6.325766994433138, CurrSamplesPerSec=5.683361856484511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:21:31,573] [INFO] [timer.py:197:stop] 0/7708, RunningAvgSamplesPerSec=6.325764310711333, CurrSamplesPerSec=5.676061732677834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:21:42,891] [INFO] [timer.py:197:stop] 0/7710, RunningAvgSamplesPerSec=6.325760482870555, CurrSamplesPerSec=5.67554809271166, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:21:54,224] [INFO] [timer.py:197:stop] 0/7712, RunningAvgSamplesPerSec=6.325757343275594, CurrSamplesPerSec=5.679973003801863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:22:05,691] [INFO] [timer.py:197:stop] 0/7714, RunningAvgSamplesPerSec=6.325754972610979, CurrSamplesPerSec=5.695566760921079, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:22:17,039] [INFO] [timer.py:197:stop] 0/7716, RunningAvgSamplesPerSec=6.325747574444491, CurrSamplesPerSec=5.661574439553909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:22:28,362] [INFO] [timer.py:197:stop] 0/7718, RunningAvgSamplesPerSec=6.32574452982742, CurrSamplesPerSec=5.677081127251715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:22:39,692] [INFO] [logging.py:68:log_dist] [Rank 0] step=3860, skipped=6, lr=[2.5488888888888893e-06], mom=[[0.9, 0.999]] [2022-12-19 13:22:39,694] 
[INFO] [timer.py:197:stop] 0/7720, RunningAvgSamplesPerSec=6.325742283968359, CurrSamplesPerSec=5.675130288770164, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:22:51,015] [INFO] [timer.py:197:stop] 0/7722, RunningAvgSamplesPerSec=6.325741203429703, CurrSamplesPerSec=5.67225077131576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:23:02,319] [INFO] [timer.py:197:stop] 0/7724, RunningAvgSamplesPerSec=6.325739198738911, CurrSamplesPerSec=5.691689022254152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:23:13,828] [INFO] [timer.py:197:stop] 0/7726, RunningAvgSamplesPerSec=6.325732496228535, CurrSamplesPerSec=5.662396565750241, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:23:25,249] [INFO] [timer.py:197:stop] 0/7728, RunningAvgSamplesPerSec=6.325728806901759, CurrSamplesPerSec=5.691551448115575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.5400000000000002e-06, 'epoch': 28.94} [2022-12-19 13:23:36,520] [INFO] [timer.py:197:stop] 0/7730, RunningAvgSamplesPerSec=6.32573032128197, CurrSamplesPerSec=5.695283027514683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:23:47,908] [INFO] [timer.py:197:stop] 0/7732, RunningAvgSamplesPerSec=6.32572946778015, CurrSamplesPerSec=5.693064888040857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:23:59,292] [INFO] [timer.py:197:stop] 0/7734, RunningAvgSamplesPerSec=6.325725122831717, CurrSamplesPerSec=5.6795132106774115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:24:10,612] [INFO] [timer.py:197:stop] 0/7736, RunningAvgSamplesPerSec=6.325723869249892, CurrSamplesPerSec=5.683685560251199, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:24:21,974] [INFO] [timer.py:197:stop] 0/7738, RunningAvgSamplesPerSec=6.325719178768318, CurrSamplesPerSec=5.661772186628028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:24:33,296] [INFO] [logging.py:68:log_dist] [Rank 0] step=3870, skipped=6, 
lr=[2.526666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 13:24:33,298] [INFO] [timer.py:197:stop] 0/7740, RunningAvgSamplesPerSec=6.325713108821826, CurrSamplesPerSec=5.673321314976575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:24:44,714] [INFO] [timer.py:197:stop] 0/7742, RunningAvgSamplesPerSec=6.325709675984565, CurrSamplesPerSec=5.691050927457292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:24:55,078] [INFO] [timer.py:197:stop] 0/7744, RunningAvgSamplesPerSec=6.325841657917056, CurrSamplesPerSec=5.6900863341712595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:25:06,423] [INFO] [timer.py:197:stop] 0/7746, RunningAvgSamplesPerSec=6.3258371434638905, CurrSamplesPerSec=5.659927322975207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:25:17,868] [INFO] [timer.py:197:stop] 0/7748, RunningAvgSamplesPerSec=6.325832884458717, CurrSamplesPerSec=5.687773656169745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:25:29,317] [INFO] [timer.py:197:stop] 0/7750, RunningAvgSamplesPerSec=6.325832150390885, CurrSamplesPerSec=5.683476652782946, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:25:40,622] [INFO] [timer.py:197:stop] 0/7752, RunningAvgSamplesPerSec=6.325830443561862, CurrSamplesPerSec=5.6855163234733155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:25:51,907] [INFO] [timer.py:197:stop] 0/7754, RunningAvgSamplesPerSec=6.3258263312009415, CurrSamplesPerSec=5.678498227988329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:26:03,230] [INFO] [timer.py:197:stop] 0/7756, RunningAvgSamplesPerSec=6.325824978729657, CurrSamplesPerSec=5.696352129647129, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:26:14,579] [INFO] [timer.py:197:stop] 0/7758, RunningAvgSamplesPerSec=6.325821598310091, CurrSamplesPerSec=5.6768133986906815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:26:26,125] [INFO] [logging.py:68:log_dist] [Rank 0] step=3880, skipped=6, 
lr=[2.504444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 13:26:26,127] [INFO] [timer.py:197:stop] 0/7760, RunningAvgSamplesPerSec=6.325814874319787, CurrSamplesPerSec=5.660453894944887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:26:37,431] [INFO] [timer.py:197:stop] 0/7762, RunningAvgSamplesPerSec=6.325815392043486, CurrSamplesPerSec=5.690979741940102, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:26:48,740] [INFO] [timer.py:197:stop] 0/7764, RunningAvgSamplesPerSec=6.325814451055411, CurrSamplesPerSec=5.704137260984337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:27:00,212] [INFO] [timer.py:197:stop] 0/7766, RunningAvgSamplesPerSec=6.3258091014436015, CurrSamplesPerSec=5.675388019256898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:27:11,591] [INFO] [timer.py:197:stop] 0/7768, RunningAvgSamplesPerSec=6.325803334718985, CurrSamplesPerSec=5.667801977615815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:27:23,051] [INFO] [timer.py:197:stop] 0/7770, RunningAvgSamplesPerSec=6.325797842248269, CurrSamplesPerSec=5.653600136780518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:27:34,352] [INFO] [timer.py:197:stop] 0/7772, RunningAvgSamplesPerSec=6.325798748928832, CurrSamplesPerSec=5.707628803252596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:27:45,686] [INFO] [timer.py:197:stop] 0/7774, RunningAvgSamplesPerSec=6.3257977245956, CurrSamplesPerSec=5.681256636340298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:27:57,004] [INFO] [timer.py:197:stop] 0/7776, RunningAvgSamplesPerSec=6.325793502936383, CurrSamplesPerSec=5.67399381819281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:28:08,485] [INFO] [timer.py:197:stop] 0/7778, RunningAvgSamplesPerSec=6.325790533102521, CurrSamplesPerSec=5.690010589567908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.4844444444444447e-06, 'epoch': 29.13} [2022-12-19 13:28:19,816] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=3890, skipped=6, lr=[2.4822222222222225e-06], mom=[[0.9, 0.999]] [2022-12-19 13:28:19,818] [INFO] [timer.py:197:stop] 0/7780, RunningAvgSamplesPerSec=6.325785400701735, CurrSamplesPerSec=5.676211041732614, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:28:31,117] [INFO] [timer.py:197:stop] 0/7782, RunningAvgSamplesPerSec=6.325780871747347, CurrSamplesPerSec=5.676092698058052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:28:42,446] [INFO] [timer.py:197:stop] 0/7784, RunningAvgSamplesPerSec=6.325779225829728, CurrSamplesPerSec=5.701055091083352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:28:53,803] [INFO] [timer.py:197:stop] 0/7786, RunningAvgSamplesPerSec=6.325772996615691, CurrSamplesPerSec=5.669406264314205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:29:05,122] [INFO] [timer.py:197:stop] 0/7788, RunningAvgSamplesPerSec=6.325772062078862, CurrSamplesPerSec=5.682317591638766, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:29:16,453] [INFO] [timer.py:197:stop] 0/7790, RunningAvgSamplesPerSec=6.325764875096921, CurrSamplesPerSec=5.659399417167287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:29:27,789] [INFO] [timer.py:197:stop] 0/7792, RunningAvgSamplesPerSec=6.325760759261706, CurrSamplesPerSec=5.675257951225106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:29:39,117] [INFO] [timer.py:197:stop] 0/7794, RunningAvgSamplesPerSec=6.325758270335204, CurrSamplesPerSec=5.683757766864503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:29:50,541] [INFO] [timer.py:197:stop] 0/7796, RunningAvgSamplesPerSec=6.32575613070506, CurrSamplesPerSec=5.672699319488562, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:30:02,103] [INFO] [timer.py:197:stop] 0/7798, RunningAvgSamplesPerSec=6.325754552843133, CurrSamplesPerSec=5.679781434315931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:30:13,412] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=3900, skipped=6, lr=[2.46e-06], mom=[[0.9, 0.999]] [2022-12-19 13:30:13,414] [INFO] [timer.py:197:stop] 0/7800, RunningAvgSamplesPerSec=6.325751902723312, CurrSamplesPerSec=5.696343184555052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:30:24,775] [INFO] [timer.py:197:stop] 0/7802, RunningAvgSamplesPerSec=6.325748732953443, CurrSamplesPerSec=5.684716122359932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:30:36,112] [INFO] [timer.py:197:stop] 0/7804, RunningAvgSamplesPerSec=6.325745867188887, CurrSamplesPerSec=5.693628318216173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:30:47,421] [INFO] [timer.py:197:stop] 0/7806, RunningAvgSamplesPerSec=6.3257484557483545, CurrSamplesPerSec=5.709247704341207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:30:58,750] [INFO] [timer.py:197:stop] 0/7808, RunningAvgSamplesPerSec=6.325743095757645, CurrSamplesPerSec=5.682373163808165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:31:10,107] [INFO] [timer.py:197:stop] 0/7810, RunningAvgSamplesPerSec=6.325734229026709, CurrSamplesPerSec=5.654123388355881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:31:21,477] [INFO] [timer.py:197:stop] 0/7812, RunningAvgSamplesPerSec=6.32573882641567, CurrSamplesPerSec=5.72103932695023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:31:32,806] [INFO] [timer.py:197:stop] 0/7814, RunningAvgSamplesPerSec=6.3257386716007, CurrSamplesPerSec=5.682810080718693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:31:44,131] [INFO] [timer.py:197:stop] 0/7816, RunningAvgSamplesPerSec=6.325741603175356, CurrSamplesPerSec=5.692630497450255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:31:55,421] [INFO] [timer.py:197:stop] 0/7818, RunningAvgSamplesPerSec=6.3257397705291885, CurrSamplesPerSec=5.69480770448989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:32:06,818] [INFO] [logging.py:68:log_dist] 
[Rank 0] step=3910, skipped=6, lr=[2.437777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 13:32:06,820] [INFO] [timer.py:197:stop] 0/7820, RunningAvgSamplesPerSec=6.325740738535742, CurrSamplesPerSec=5.7035879881304545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:32:18,142] [INFO] [timer.py:197:stop] 0/7822, RunningAvgSamplesPerSec=6.325743387194039, CurrSamplesPerSec=5.705876676326264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:32:29,564] [INFO] [timer.py:197:stop] 0/7824, RunningAvgSamplesPerSec=6.3257228990907395, CurrSamplesPerSec=5.5855037452680865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:32:40,880] [INFO] [timer.py:197:stop] 0/7826, RunningAvgSamplesPerSec=6.325722698477354, CurrSamplesPerSec=5.696906538420857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:32:52,379] [INFO] [timer.py:197:stop] 0/7828, RunningAvgSamplesPerSec=6.32572179050017, CurrSamplesPerSec=5.694436828822976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.428888888888889e-06, 'epoch': 29.32} [2022-12-19 13:33:03,679] [INFO] [timer.py:197:stop] 0/7830, RunningAvgSamplesPerSec=6.325720944190565, CurrSamplesPerSec=5.6768803884909556, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:33:15,031] [INFO] [timer.py:197:stop] 0/7832, RunningAvgSamplesPerSec=6.325715037024329, CurrSamplesPerSec=5.677366412018387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:33:26,368] [INFO] [timer.py:197:stop] 0/7834, RunningAvgSamplesPerSec=6.325715139611505, CurrSamplesPerSec=5.684971112343868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:33:37,747] [INFO] [timer.py:197:stop] 0/7836, RunningAvgSamplesPerSec=6.325704826818761, CurrSamplesPerSec=5.643775685090993, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:33:49,246] [INFO] [timer.py:197:stop] 0/7838, RunningAvgSamplesPerSec=6.325700042696439, CurrSamplesPerSec=5.66693736581329, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 13:34:00,634] [INFO] [logging.py:68:log_dist] [Rank 0] step=3920, skipped=6, lr=[2.415555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 13:34:00,636] [INFO] [timer.py:197:stop] 0/7840, RunningAvgSamplesPerSec=6.325698139251697, CurrSamplesPerSec=5.67812635073012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:34:12,010] [INFO] [timer.py:197:stop] 0/7842, RunningAvgSamplesPerSec=6.325698215128416, CurrSamplesPerSec=5.694233169770908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:34:23,386] [INFO] [timer.py:197:stop] 0/7844, RunningAvgSamplesPerSec=6.325688770103077, CurrSamplesPerSec=5.702567532934563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:34:34,828] [INFO] [timer.py:197:stop] 0/7846, RunningAvgSamplesPerSec=6.32568319556992, CurrSamplesPerSec=5.677589040502513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:34:46,152] [INFO] [timer.py:197:stop] 0/7848, RunningAvgSamplesPerSec=6.325679262952537, CurrSamplesPerSec=5.690351697415027, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:34:57,666] [INFO] [timer.py:197:stop] 0/7850, RunningAvgSamplesPerSec=6.3256756614333485, CurrSamplesPerSec=5.684639557595243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:35:08,957] [INFO] [timer.py:197:stop] 0/7852, RunningAvgSamplesPerSec=6.325673402534322, CurrSamplesPerSec=5.6917049523208805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:35:20,435] [INFO] [timer.py:197:stop] 0/7854, RunningAvgSamplesPerSec=6.325670313544138, CurrSamplesPerSec=5.700030216362319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:35:31,830] [INFO] [timer.py:197:stop] 0/7856, RunningAvgSamplesPerSec=6.325670685123161, CurrSamplesPerSec=5.6773042136529845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:35:43,151] [INFO] [timer.py:197:stop] 0/7858, RunningAvgSamplesPerSec=6.3256668778988825, CurrSamplesPerSec=5.671878034048663, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 13:35:54,554] [INFO] [logging.py:68:log_dist] [Rank 0] step=3930, skipped=6, lr=[2.3933333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 13:35:54,555] [INFO] [timer.py:197:stop] 0/7860, RunningAvgSamplesPerSec=6.32566464243522, CurrSamplesPerSec=5.682861572048722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:36:05,882] [INFO] [timer.py:197:stop] 0/7862, RunningAvgSamplesPerSec=6.325664565477907, CurrSamplesPerSec=5.6945155905595195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:36:17,154] [INFO] [timer.py:197:stop] 0/7864, RunningAvgSamplesPerSec=6.325666616636037, CurrSamplesPerSec=5.691385885251289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:36:28,482] [INFO] [timer.py:197:stop] 0/7866, RunningAvgSamplesPerSec=6.325666397165196, CurrSamplesPerSec=5.690209604381714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:36:39,884] [INFO] [timer.py:197:stop] 0/7868, RunningAvgSamplesPerSec=6.325674284386857, CurrSamplesPerSec=5.710259619548682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:36:51,145] [INFO] [timer.py:197:stop] 0/7870, RunningAvgSamplesPerSec=6.325674428880491, CurrSamplesPerSec=5.692097922196415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:37:02,458] [INFO] [timer.py:197:stop] 0/7872, RunningAvgSamplesPerSec=6.325674290892881, CurrSamplesPerSec=5.682489604259255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:37:13,769] [INFO] [timer.py:197:stop] 0/7874, RunningAvgSamplesPerSec=6.32567752733782, CurrSamplesPerSec=5.698626544916717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:37:25,089] [INFO] [timer.py:197:stop] 0/7876, RunningAvgSamplesPerSec=6.325679090210268, CurrSamplesPerSec=5.704312294242095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:37:36,612] [INFO] [timer.py:197:stop] 0/7878, RunningAvgSamplesPerSec=6.325675084325163, CurrSamplesPerSec=5.6934445209785896, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.3733333333333336e-06, 'epoch': 29.51} [2022-12-19 13:37:47,912] [INFO] [logging.py:68:log_dist] [Rank 0] step=3940, skipped=6, lr=[2.371111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 13:37:47,914] [INFO] [timer.py:197:stop] 0/7880, RunningAvgSamplesPerSec=6.325675238725691, CurrSamplesPerSec=5.700597931576377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:37:59,257] [INFO] [timer.py:197:stop] 0/7882, RunningAvgSamplesPerSec=6.325669996240306, CurrSamplesPerSec=5.677592883218626, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:38:10,759] [INFO] [timer.py:197:stop] 0/7884, RunningAvgSamplesPerSec=6.325664868117995, CurrSamplesPerSec=5.691140454822341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:38:22,069] [INFO] [timer.py:197:stop] 0/7886, RunningAvgSamplesPerSec=6.325663727546513, CurrSamplesPerSec=5.681426179787885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:38:33,399] [INFO] [timer.py:197:stop] 0/7888, RunningAvgSamplesPerSec=6.325661791879295, CurrSamplesPerSec=5.680121316782918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:38:44,681] [INFO] [timer.py:197:stop] 0/7890, RunningAvgSamplesPerSec=6.325663401276143, CurrSamplesPerSec=5.677583996945508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:38:55,952] [INFO] [timer.py:197:stop] 0/7892, RunningAvgSamplesPerSec=6.325667065806909, CurrSamplesPerSec=5.695039435696986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:39:07,233] [INFO] [timer.py:197:stop] 0/7894, RunningAvgSamplesPerSec=6.325670236867487, CurrSamplesPerSec=5.7248277553492555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:39:18,494] [INFO] [timer.py:197:stop] 0/7896, RunningAvgSamplesPerSec=6.325677669974229, CurrSamplesPerSec=5.726673394974693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:39:29,783] [INFO] [timer.py:197:stop] 0/7898, 
RunningAvgSamplesPerSec=6.325683227300923, CurrSamplesPerSec=5.71162211792678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:39:41,032] [INFO] [logging.py:68:log_dist] [Rank 0] step=3950, skipped=6, lr=[2.348888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 13:39:41,034] [INFO] [timer.py:197:stop] 0/7900, RunningAvgSamplesPerSec=6.325689289564999, CurrSamplesPerSec=5.707183450911864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:39:52,331] [INFO] [timer.py:197:stop] 0/7902, RunningAvgSamplesPerSec=6.325691317858357, CurrSamplesPerSec=5.684706009914309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:40:03,623] [INFO] [timer.py:197:stop] 0/7904, RunningAvgSamplesPerSec=6.3256933772709605, CurrSamplesPerSec=5.704934694776576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:40:14,926] [INFO] [timer.py:197:stop] 0/7906, RunningAvgSamplesPerSec=6.325695782430855, CurrSamplesPerSec=5.68759144180553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:40:26,256] [INFO] [timer.py:197:stop] 0/7908, RunningAvgSamplesPerSec=6.325692089818773, CurrSamplesPerSec=5.67573145630985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:40:37,723] [INFO] [timer.py:197:stop] 0/7910, RunningAvgSamplesPerSec=6.325692742777325, CurrSamplesPerSec=5.688341343214653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:40:48,984] [INFO] [timer.py:197:stop] 0/7912, RunningAvgSamplesPerSec=6.3256966440875635, CurrSamplesPerSec=5.703737052165982, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:41:00,526] [INFO] [timer.py:197:stop] 0/7914, RunningAvgSamplesPerSec=6.325695477879883, CurrSamplesPerSec=5.6884517600571955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:41:11,832] [INFO] [timer.py:197:stop] 0/7916, RunningAvgSamplesPerSec=6.325694027052762, CurrSamplesPerSec=5.682457606717658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:41:23,205] [INFO] [timer.py:197:stop] 0/7918, 
RunningAvgSamplesPerSec=6.325686363906638, CurrSamplesPerSec=5.631528264125245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:41:34,540] [INFO] [logging.py:68:log_dist] [Rank 0] step=3960, skipped=6, lr=[2.3266666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 13:41:34,541] [INFO] [timer.py:197:stop] 0/7920, RunningAvgSamplesPerSec=6.3256836034145705, CurrSamplesPerSec=5.675657533433855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:41:45,867] [INFO] [timer.py:197:stop] 0/7922, RunningAvgSamplesPerSec=6.325680639392849, CurrSamplesPerSec=5.659410871582581, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:41:57,218] [INFO] [timer.py:197:stop] 0/7924, RunningAvgSamplesPerSec=6.325674471251452, CurrSamplesPerSec=5.667217803286366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:42:08,674] [INFO] [timer.py:197:stop] 0/7926, RunningAvgSamplesPerSec=6.325674729353246, CurrSamplesPerSec=5.687237169516435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:42:19,984] [INFO] [timer.py:197:stop] 0/7928, RunningAvgSamplesPerSec=6.32567673746821, CurrSamplesPerSec=5.682100365405626, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.317777777777778e-06, 'epoch': 29.7} [2022-12-19 13:42:31,271] [INFO] [timer.py:197:stop] 0/7930, RunningAvgSamplesPerSec=6.325676851143644, CurrSamplesPerSec=5.6904969339607545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:42:42,554] [INFO] [timer.py:197:stop] 0/7932, RunningAvgSamplesPerSec=6.325675403852229, CurrSamplesPerSec=5.688175002789677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:42:53,862] [INFO] [timer.py:197:stop] 0/7934, RunningAvgSamplesPerSec=6.325676985974739, CurrSamplesPerSec=5.696744532547926, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:43:05,138] [INFO] [timer.py:197:stop] 0/7936, RunningAvgSamplesPerSec=6.325681967942861, CurrSamplesPerSec=5.698733248064597, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 13:43:16,572] [INFO] [timer.py:197:stop] 0/7938, RunningAvgSamplesPerSec=6.325682632003199, CurrSamplesPerSec=5.695722657150945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:43:27,873] [INFO] [logging.py:68:log_dist] [Rank 0] step=3970, skipped=6, lr=[2.3044444444444447e-06], mom=[[0.9, 0.999]] [2022-12-19 13:43:27,875] [INFO] [timer.py:197:stop] 0/7940, RunningAvgSamplesPerSec=6.325677814326682, CurrSamplesPerSec=5.658184319380613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:43:39,192] [INFO] [timer.py:197:stop] 0/7942, RunningAvgSamplesPerSec=6.325673007457672, CurrSamplesPerSec=5.663491830943553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:43:50,549] [INFO] [timer.py:197:stop] 0/7944, RunningAvgSamplesPerSec=6.325669816144969, CurrSamplesPerSec=5.6715652601915805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:44:02,055] [INFO] [timer.py:197:stop] 0/7946, RunningAvgSamplesPerSec=6.325669305429533, CurrSamplesPerSec=5.702150344399227, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:44:13,404] [INFO] [timer.py:197:stop] 0/7948, RunningAvgSamplesPerSec=6.325662767917968, CurrSamplesPerSec=5.651939567775533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:44:24,747] [INFO] [timer.py:197:stop] 0/7950, RunningAvgSamplesPerSec=6.325660381871462, CurrSamplesPerSec=5.684307319302714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:44:36,071] [INFO] [timer.py:197:stop] 0/7952, RunningAvgSamplesPerSec=6.325656257240198, CurrSamplesPerSec=5.669340408681799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:44:47,348] [INFO] [timer.py:197:stop] 0/7954, RunningAvgSamplesPerSec=6.3256550926371995, CurrSamplesPerSec=5.680700941458163, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:44:58,631] [INFO] [timer.py:197:stop] 0/7956, RunningAvgSamplesPerSec=6.325654185967354, CurrSamplesPerSec=5.686292659143901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 13:45:09,926] [INFO] [timer.py:197:stop] 0/7958, RunningAvgSamplesPerSec=6.325657210149753, CurrSamplesPerSec=5.709068482630922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:45:21,351] [INFO] [logging.py:68:log_dist] [Rank 0] step=3980, skipped=6, lr=[2.2822222222222223e-06], mom=[[0.9, 0.999]] [2022-12-19 13:45:21,353] [INFO] [timer.py:197:stop] 0/7960, RunningAvgSamplesPerSec=6.325658832987802, CurrSamplesPerSec=5.707909156037622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:45:32,684] [INFO] [timer.py:197:stop] 0/7962, RunningAvgSamplesPerSec=6.325654356364491, CurrSamplesPerSec=5.67922098150126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:45:43,983] [INFO] [timer.py:197:stop] 0/7964, RunningAvgSamplesPerSec=6.32565760157723, CurrSamplesPerSec=5.7111652063560365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:45:55,305] [INFO] [timer.py:197:stop] 0/7966, RunningAvgSamplesPerSec=6.325654315415912, CurrSamplesPerSec=5.672793305432681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:46:06,796] [INFO] [timer.py:197:stop] 0/7968, RunningAvgSamplesPerSec=6.325652586847356, CurrSamplesPerSec=5.676224724788098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:46:18,122] [INFO] [timer.py:197:stop] 0/7970, RunningAvgSamplesPerSec=6.325647969551927, CurrSamplesPerSec=5.666931144818846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:46:29,484] [INFO] [timer.py:197:stop] 0/7972, RunningAvgSamplesPerSec=6.325650871885098, CurrSamplesPerSec=5.701778027323883, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:46:40,790] [INFO] [timer.py:197:stop] 0/7974, RunningAvgSamplesPerSec=6.325652864398155, CurrSamplesPerSec=5.694726276833383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:46:52,156] [INFO] [timer.py:197:stop] 0/7976, RunningAvgSamplesPerSec=6.325640492130813, CurrSamplesPerSec=5.682649116176873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
13:47:03,468] [INFO] [timer.py:197:stop] 0/7978, RunningAvgSamplesPerSec=6.32563813060356, CurrSamplesPerSec=5.676486155834321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.262222222222222e-06, 'epoch': 29.88} [2022-12-19 13:47:14,804] [INFO] [logging.py:68:log_dist] [Rank 0] step=3990, skipped=6, lr=[2.2600000000000004e-06], mom=[[0.9, 0.999]] [2022-12-19 13:47:14,805] [INFO] [timer.py:197:stop] 0/7980, RunningAvgSamplesPerSec=6.3256318042446775, CurrSamplesPerSec=5.677628188416351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:47:26,165] [INFO] [timer.py:197:stop] 0/7982, RunningAvgSamplesPerSec=6.325626031291596, CurrSamplesPerSec=5.6669605750284795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:47:37,511] [INFO] [timer.py:197:stop] 0/7984, RunningAvgSamplesPerSec=6.32561949501742, CurrSamplesPerSec=5.664622663567377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:47:48,826] [INFO] [timer.py:197:stop] 0/7986, RunningAvgSamplesPerSec=6.3256174973937105, CurrSamplesPerSec=5.680290071279917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:48:00,160] [INFO] [timer.py:197:stop] 0/7988, RunningAvgSamplesPerSec=6.325608477638873, CurrSamplesPerSec=5.6495203361550965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:48:11,526] [INFO] [timer.py:197:stop] 0/7990, RunningAvgSamplesPerSec=6.325599492768515, CurrSamplesPerSec=5.646439405763403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:48:22,860] [INFO] [timer.py:197:stop] 0/7992, RunningAvgSamplesPerSec=6.325593848988396, CurrSamplesPerSec=5.676637888109467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:48:34,141] [INFO] [timer.py:197:stop] 0/7994, RunningAvgSamplesPerSec=6.325595392227366, CurrSamplesPerSec=5.698204610614183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:48:45,571] [INFO] [timer.py:197:stop] 0/7996, RunningAvgSamplesPerSec=6.325591235598977, 
CurrSamplesPerSec=5.681809072574381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:48:56,911] [INFO] [timer.py:197:stop] 0/7998, RunningAvgSamplesPerSec=6.325586575115625, CurrSamplesPerSec=5.676135666083297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:49:08,213] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=6, lr=[2.237777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 13:49:08,215] [INFO] [timer.py:197:stop] 0/8000, RunningAvgSamplesPerSec=6.325585084244748, CurrSamplesPerSec=5.68784307423672, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:49:19,533] [INFO] [timer.py:197:stop] 0/8002, RunningAvgSamplesPerSec=6.325581387696361, CurrSamplesPerSec=5.669517144708152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:49:30,839] [INFO] [timer.py:197:stop] 0/8004, RunningAvgSamplesPerSec=6.325582140096495, CurrSamplesPerSec=5.6876603733799085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:49:42,152] [INFO] [timer.py:197:stop] 0/8006, RunningAvgSamplesPerSec=6.325579723348849, CurrSamplesPerSec=5.674735579355198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:49:53,463] [INFO] [timer.py:197:stop] 0/8008, RunningAvgSamplesPerSec=6.32558171689966, CurrSamplesPerSec=5.698070833712943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:50:04,034] [INFO] [timer.py:197:stop] 0/8010, RunningAvgSamplesPerSec=6.325705330594667, CurrSamplesPerSec=6.642331888924197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:50:15,587] [INFO] [timer.py:197:stop] 0/8012, RunningAvgSamplesPerSec=6.325697311389212, CurrSamplesPerSec=5.652678192198945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:50:26,914] [INFO] [timer.py:197:stop] 0/8014, RunningAvgSamplesPerSec=6.325694146146658, CurrSamplesPerSec=5.673876766231179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:50:38,235] [INFO] [timer.py:197:stop] 0/8016, RunningAvgSamplesPerSec=6.325690082925503, 
CurrSamplesPerSec=5.701098437871735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:50:49,631] [INFO] [timer.py:197:stop] 0/8018, RunningAvgSamplesPerSec=6.325684007204523, CurrSamplesPerSec=5.669918314112371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:51:00,944] [INFO] [logging.py:68:log_dist] [Rank 0] step=4010, skipped=6, lr=[2.2155555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 13:51:00,945] [INFO] [timer.py:197:stop] 0/8020, RunningAvgSamplesPerSec=6.325684027734768, CurrSamplesPerSec=5.704071323262826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:51:12,461] [INFO] [timer.py:197:stop] 0/8022, RunningAvgSamplesPerSec=6.325685066765304, CurrSamplesPerSec=5.689485980872626, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:51:23,806] [INFO] [timer.py:197:stop] 0/8024, RunningAvgSamplesPerSec=6.325681309933354, CurrSamplesPerSec=5.677605372081919, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:51:35,116] [INFO] [timer.py:197:stop] 0/8026, RunningAvgSamplesPerSec=6.325681710247161, CurrSamplesPerSec=5.69005328611863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:51:46,408] [INFO] [timer.py:197:stop] 0/8028, RunningAvgSamplesPerSec=6.325683797025658, CurrSamplesPerSec=5.702521014115912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 13:51:57,724] [INFO] [timer.py:197:stop] 0/8030, RunningAvgSamplesPerSec=6.325682184369224, CurrSamplesPerSec=5.704176291015664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.2044444444444444e-06, 'epoch': 30.07} {'eval_loss': 0.32470703125, 'eval_wer': 15.594426326712126, 'eval_runtime': 1388.6023, 'eval_samples_per_second': 3.335, 'eval_steps_per_second': 0.417, 'epoch': 30.07} [2022-12-19 14:15:10,033] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step4015 is begin to save! 
[2022-12-19 14:15:10,043] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-4000/global_step4015/mp_rank_00_model_states.pt [2022-12-19 14:15:10,043] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-4000/global_step4015/mp_rank_00_model_states.pt... [2022-12-19 14:15:13,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-4000/global_step4015/mp_rank_00_model_states.pt. [2022-12-19 14:15:13,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-4000/global_step4015/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-19 14:15:30,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-4000/global_step4015/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2022-12-19 14:15:30,212] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-4000/global_step4015/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-19 14:15:30,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4015 is ready now! 
[2022-12-19 14:17:43,412] [INFO] [timer.py:197:stop] 0/8032, RunningAvgSamplesPerSec=6.325641803473397, CurrSamplesPerSec=5.437451557571441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:17:54,923] [INFO] [timer.py:197:stop] 0/8034, RunningAvgSamplesPerSec=6.325647357340173, CurrSamplesPerSec=5.698247188305548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:18:06,493] [INFO] [timer.py:197:stop] 0/8036, RunningAvgSamplesPerSec=6.325605164571048, CurrSamplesPerSec=5.713375830070038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:18:17,733] [INFO] [timer.py:197:stop] 0/8038, RunningAvgSamplesPerSec=6.325612020463562, CurrSamplesPerSec=5.696201517383088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:18:29,430] [INFO] [logging.py:68:log_dist] [Rank 0] step=4020, skipped=6, lr=[2.1933333333333332e-06], mom=[[0.9, 0.999]] [2022-12-19 14:18:29,431] [INFO] [timer.py:197:stop] 0/8040, RunningAvgSamplesPerSec=6.325604581201806, CurrSamplesPerSec=5.685161828020139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:18:40,732] [INFO] [timer.py:197:stop] 0/8042, RunningAvgSamplesPerSec=6.3256044039223385, CurrSamplesPerSec=5.6818045025697455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:18:52,061] [INFO] [timer.py:197:stop] 0/8044, RunningAvgSamplesPerSec=6.325600916176382, CurrSamplesPerSec=5.656693179932164, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:19:03,603] [INFO] [timer.py:197:stop] 0/8046, RunningAvgSamplesPerSec=6.325601506474235, CurrSamplesPerSec=5.678836515240733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:19:14,927] [INFO] [timer.py:197:stop] 0/8048, RunningAvgSamplesPerSec=6.3255970231493714, CurrSamplesPerSec=5.671116651714699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:19:26,200] [INFO] [timer.py:197:stop] 0/8050, RunningAvgSamplesPerSec=6.325593628119118, CurrSamplesPerSec=5.693810436512917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 14:19:37,651] [INFO] [timer.py:197:stop] 0/8052, RunningAvgSamplesPerSec=6.325594192418441, CurrSamplesPerSec=5.699942103541868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:19:49,025] [INFO] [timer.py:197:stop] 0/8054, RunningAvgSamplesPerSec=6.3255826824412456, CurrSamplesPerSec=5.630338802626227, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:20:00,364] [INFO] [timer.py:197:stop] 0/8056, RunningAvgSamplesPerSec=6.325577248543317, CurrSamplesPerSec=5.655623420627152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:20:11,720] [INFO] [timer.py:197:stop] 0/8058, RunningAvgSamplesPerSec=6.325570549162972, CurrSamplesPerSec=5.632460340357941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:20:23,262] [INFO] [logging.py:68:log_dist] [Rank 0] step=4030, skipped=6, lr=[2.1711111111111113e-06], mom=[[0.9, 0.999]] [2022-12-19 14:20:23,264] [INFO] [timer.py:197:stop] 0/8060, RunningAvgSamplesPerSec=6.325569138270351, CurrSamplesPerSec=5.690764024502499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:20:34,605] [INFO] [timer.py:197:stop] 0/8062, RunningAvgSamplesPerSec=6.325562954226198, CurrSamplesPerSec=5.657969172862315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:20:45,970] [INFO] [timer.py:197:stop] 0/8064, RunningAvgSamplesPerSec=6.325554312924046, CurrSamplesPerSec=5.684576718244492, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:20:57,299] [INFO] [timer.py:197:stop] 0/8066, RunningAvgSamplesPerSec=6.3255505906815195, CurrSamplesPerSec=5.673434027290347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:21:08,669] [INFO] [timer.py:197:stop] 0/8068, RunningAvgSamplesPerSec=6.3255410163638235, CurrSamplesPerSec=5.631967558780618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:21:20,001] [INFO] [timer.py:197:stop] 0/8070, RunningAvgSamplesPerSec=6.325535887108654, CurrSamplesPerSec=5.673128515358727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 14:21:31,411] [INFO] [timer.py:197:stop] 0/8072, RunningAvgSamplesPerSec=6.325530521579591, CurrSamplesPerSec=5.674160769510732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:21:42,707] [INFO] [timer.py:197:stop] 0/8074, RunningAvgSamplesPerSec=6.32553242272264, CurrSamplesPerSec=5.692814483593138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:21:54,199] [INFO] [timer.py:197:stop] 0/8076, RunningAvgSamplesPerSec=6.325499063007307, CurrSamplesPerSec=5.493385233373368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:22:05,500] [INFO] [timer.py:197:stop] 0/8078, RunningAvgSamplesPerSec=6.325494255449442, CurrSamplesPerSec=5.654502371565065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:22:16,893] [INFO] [logging.py:68:log_dist] [Rank 0] step=4040, skipped=6, lr=[2.148888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 14:22:16,894] [INFO] [timer.py:197:stop] 0/8080, RunningAvgSamplesPerSec=6.325483275750912, CurrSamplesPerSec=5.6336131005590095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.148888888888889e-06, 'epoch': 30.26} [2022-12-19 14:22:28,155] [INFO] [timer.py:197:stop] 0/8082, RunningAvgSamplesPerSec=6.325482632923822, CurrSamplesPerSec=5.683632369219616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:22:39,759] [INFO] [timer.py:197:stop] 0/8084, RunningAvgSamplesPerSec=6.325433027886173, CurrSamplesPerSec=5.3930115400426635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:22:51,133] [INFO] [timer.py:197:stop] 0/8086, RunningAvgSamplesPerSec=6.32542760275741, CurrSamplesPerSec=5.6578057964699955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:23:02,497] [INFO] [timer.py:197:stop] 0/8088, RunningAvgSamplesPerSec=6.3254240533029975, CurrSamplesPerSec=5.67082217116903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:23:13,847] [INFO] [timer.py:197:stop] 0/8090, RunningAvgSamplesPerSec=6.325411300752105, 
CurrSamplesPerSec=5.680036702917753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:23:25,212] [INFO] [timer.py:197:stop] 0/8092, RunningAvgSamplesPerSec=6.325408366205241, CurrSamplesPerSec=5.687234759652001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:23:36,541] [INFO] [timer.py:197:stop] 0/8094, RunningAvgSamplesPerSec=6.325403928854267, CurrSamplesPerSec=5.677078966112908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:23:47,874] [INFO] [timer.py:197:stop] 0/8096, RunningAvgSamplesPerSec=6.325398889541606, CurrSamplesPerSec=5.67825270666057, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:23:59,197] [INFO] [timer.py:197:stop] 0/8098, RunningAvgSamplesPerSec=6.325391579256017, CurrSamplesPerSec=5.6836386269363635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:24:10,528] [INFO] [logging.py:68:log_dist] [Rank 0] step=4050, skipped=6, lr=[2.126666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 14:24:10,531] [INFO] [timer.py:197:stop] 0/8100, RunningAvgSamplesPerSec=6.325388542025015, CurrSamplesPerSec=5.684980984937298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:24:21,897] [INFO] [timer.py:197:stop] 0/8102, RunningAvgSamplesPerSec=6.325377397468815, CurrSamplesPerSec=5.6152777634834425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:24:33,253] [INFO] [timer.py:197:stop] 0/8104, RunningAvgSamplesPerSec=6.325374545649942, CurrSamplesPerSec=5.676826604440173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:24:44,681] [INFO] [timer.py:197:stop] 0/8106, RunningAvgSamplesPerSec=6.325356635884962, CurrSamplesPerSec=5.580916319475154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:24:56,013] [INFO] [timer.py:197:stop] 0/8108, RunningAvgSamplesPerSec=6.325353743910263, CurrSamplesPerSec=5.684180212314628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:25:07,464] [INFO] [timer.py:197:stop] 0/8110, RunningAvgSamplesPerSec=6.325328683841103, 
CurrSamplesPerSec=5.543553919326336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:25:18,771] [INFO] [timer.py:197:stop] 0/8112, RunningAvgSamplesPerSec=6.325329012480698, CurrSamplesPerSec=5.692425036168844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:25:30,067] [INFO] [timer.py:197:stop] 0/8114, RunningAvgSamplesPerSec=6.325329034005955, CurrSamplesPerSec=5.6834788187954155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:25:41,488] [INFO] [timer.py:197:stop] 0/8116, RunningAvgSamplesPerSec=6.325316656045708, CurrSamplesPerSec=5.692333778452927, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:25:52,805] [INFO] [timer.py:197:stop] 0/8118, RunningAvgSamplesPerSec=6.3253111971829155, CurrSamplesPerSec=5.670043586399185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:26:04,159] [INFO] [logging.py:68:log_dist] [Rank 0] step=4060, skipped=6, lr=[2.1044444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 14:26:04,161] [INFO] [timer.py:197:stop] 0/8120, RunningAvgSamplesPerSec=6.325302627888887, CurrSamplesPerSec=5.655603878877854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:26:15,447] [INFO] [timer.py:197:stop] 0/8122, RunningAvgSamplesPerSec=6.32530024063617, CurrSamplesPerSec=5.674199390395839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:26:26,833] [INFO] [timer.py:197:stop] 0/8124, RunningAvgSamplesPerSec=6.325292669277545, CurrSamplesPerSec=5.695711538687593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:26:38,118] [INFO] [timer.py:197:stop] 0/8126, RunningAvgSamplesPerSec=6.325291902162122, CurrSamplesPerSec=5.678206583692035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:26:49,875] [INFO] [timer.py:197:stop] 0/8128, RunningAvgSamplesPerSec=6.325284893571726, CurrSamplesPerSec=5.659085154735508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:27:01,785] [INFO] [timer.py:197:stop] 0/8130, RunningAvgSamplesPerSec=6.325279116129157, 
CurrSamplesPerSec=5.67654689578729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.0933333333333338e-06, 'epoch': 30.45} [2022-12-19 14:27:13,765] [INFO] [timer.py:197:stop] 0/8132, RunningAvgSamplesPerSec=6.325269701936246, CurrSamplesPerSec=5.659644026737484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:27:25,019] [INFO] [timer.py:197:stop] 0/8134, RunningAvgSamplesPerSec=6.32527235241006, CurrSamplesPerSec=5.696636211373144, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:27:36,385] [INFO] [timer.py:197:stop] 0/8136, RunningAvgSamplesPerSec=6.3252628656280425, CurrSamplesPerSec=5.62482023487689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:27:47,884] [INFO] [timer.py:197:stop] 0/8138, RunningAvgSamplesPerSec=6.325258110722405, CurrSamplesPerSec=5.661108307878231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:27:59,187] [INFO] [logging.py:68:log_dist] [Rank 0] step=4070, skipped=6, lr=[2.0822222222222226e-06], mom=[[0.9, 0.999]] [2022-12-19 14:27:59,189] [INFO] [timer.py:197:stop] 0/8140, RunningAvgSamplesPerSec=6.3252550066984785, CurrSamplesPerSec=5.672102389647067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:28:10,621] [INFO] [timer.py:197:stop] 0/8142, RunningAvgSamplesPerSec=6.325248587874361, CurrSamplesPerSec=5.6936418438430145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:28:22,091] [INFO] [timer.py:197:stop] 0/8144, RunningAvgSamplesPerSec=6.325243979855986, CurrSamplesPerSec=5.687139330662115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:28:33,736] [INFO] [timer.py:197:stop] 0/8146, RunningAvgSamplesPerSec=6.325190652956894, CurrSamplesPerSec=5.360793158390594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:28:45,087] [INFO] [timer.py:197:stop] 0/8148, RunningAvgSamplesPerSec=6.325185980859677, CurrSamplesPerSec=5.679821573903423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:28:56,408] [INFO] 
[timer.py:197:stop] 0/8150, RunningAvgSamplesPerSec=6.325183501280296, CurrSamplesPerSec=5.690922553509549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:29:07,811] [INFO] [timer.py:197:stop] 0/8152, RunningAvgSamplesPerSec=6.325179791237123, CurrSamplesPerSec=5.676055971714138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:29:19,161] [INFO] [timer.py:197:stop] 0/8154, RunningAvgSamplesPerSec=6.325174362430907, CurrSamplesPerSec=5.67159330057794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:29:30,599] [INFO] [timer.py:197:stop] 0/8156, RunningAvgSamplesPerSec=6.325160303692914, CurrSamplesPerSec=5.6771603701442634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:29:41,941] [INFO] [timer.py:197:stop] 0/8158, RunningAvgSamplesPerSec=6.325156859607074, CurrSamplesPerSec=5.68541204134909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:29:53,319] [INFO] [logging.py:68:log_dist] [Rank 0] step=4080, skipped=6, lr=[2.06e-06], mom=[[0.9, 0.999]] [2022-12-19 14:29:53,320] [INFO] [timer.py:197:stop] 0/8160, RunningAvgSamplesPerSec=6.325150324831231, CurrSamplesPerSec=5.657251579564087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:30:04,893] [INFO] [timer.py:197:stop] 0/8162, RunningAvgSamplesPerSec=6.325146764197562, CurrSamplesPerSec=5.678454022972701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:30:16,240] [INFO] [timer.py:197:stop] 0/8164, RunningAvgSamplesPerSec=6.325143476585626, CurrSamplesPerSec=5.669091368486997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:30:27,652] [INFO] [timer.py:197:stop] 0/8166, RunningAvgSamplesPerSec=6.325129598856739, CurrSamplesPerSec=5.690020962117351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:30:39,230] [INFO] [timer.py:197:stop] 0/8168, RunningAvgSamplesPerSec=6.32512527515823, CurrSamplesPerSec=5.674230095574051, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:30:50,544] [INFO] [timer.py:197:stop] 0/8170, 
RunningAvgSamplesPerSec=6.325121392523363, CurrSamplesPerSec=5.657758574102674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:31:01,820] [INFO] [timer.py:197:stop] 0/8172, RunningAvgSamplesPerSec=6.325124649578188, CurrSamplesPerSec=5.699945250381404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:31:13,546] [INFO] [timer.py:197:stop] 0/8174, RunningAvgSamplesPerSec=6.3250603571028075, CurrSamplesPerSec=5.2868382009871535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:31:25,050] [INFO] [timer.py:197:stop] 0/8176, RunningAvgSamplesPerSec=6.325059209793325, CurrSamplesPerSec=5.696287580558719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:31:36,374] [INFO] [timer.py:197:stop] 0/8178, RunningAvgSamplesPerSec=6.325057497892663, CurrSamplesPerSec=5.694060203890854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:31:47,812] [INFO] [logging.py:68:log_dist] [Rank 0] step=4090, skipped=6, lr=[2.037777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 14:31:47,814] [INFO] [timer.py:197:stop] 0/8180, RunningAvgSamplesPerSec=6.325044112318793, CurrSamplesPerSec=5.691101844292516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 2.037777777777778e-06, 'epoch': 30.64} [2022-12-19 14:31:59,257] [INFO] [timer.py:197:stop] 0/8182, RunningAvgSamplesPerSec=6.325044005783121, CurrSamplesPerSec=5.669044436438997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:32:10,844] [INFO] [timer.py:197:stop] 0/8184, RunningAvgSamplesPerSec=6.32500582385698, CurrSamplesPerSec=5.435942817708837, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:32:22,165] [INFO] [timer.py:197:stop] 0/8186, RunningAvgSamplesPerSec=6.3249997508190665, CurrSamplesPerSec=5.649869211438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:32:33,738] [INFO] [timer.py:197:stop] 0/8188, RunningAvgSamplesPerSec=6.324961772431633, CurrSamplesPerSec=5.444744541948751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 14:32:45,186] [INFO] [timer.py:197:stop] 0/8190, RunningAvgSamplesPerSec=6.324954964776462, CurrSamplesPerSec=5.653176747823403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:32:56,466] [INFO] [timer.py:197:stop] 0/8192, RunningAvgSamplesPerSec=6.3249568193957915, CurrSamplesPerSec=5.685648548110204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:33:07,945] [INFO] [timer.py:197:stop] 0/8194, RunningAvgSamplesPerSec=6.324946445783223, CurrSamplesPerSec=5.676103500014357, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:33:19,311] [INFO] [timer.py:197:stop] 0/8196, RunningAvgSamplesPerSec=6.324946137889031, CurrSamplesPerSec=5.6797388917796345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:33:30,728] [INFO] [timer.py:197:stop] 0/8198, RunningAvgSamplesPerSec=6.3249305093294925, CurrSamplesPerSec=5.5809379011808815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:33:42,033] [INFO] [logging.py:68:log_dist] [Rank 0] step=4100, skipped=6, lr=[2.0155555555555554e-06], mom=[[0.9, 0.999]] [2022-12-19 14:33:42,034] [INFO] [timer.py:197:stop] 0/8200, RunningAvgSamplesPerSec=6.324927315129972, CurrSamplesPerSec=5.680190548184062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:33:53,510] [INFO] [timer.py:197:stop] 0/8202, RunningAvgSamplesPerSec=6.324925168748207, CurrSamplesPerSec=5.67417132423458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:34:04,910] [INFO] [timer.py:197:stop] 0/8204, RunningAvgSamplesPerSec=6.324909646451287, CurrSamplesPerSec=5.6941338822058825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:34:16,183] [INFO] [timer.py:197:stop] 0/8206, RunningAvgSamplesPerSec=6.3249133369776995, CurrSamplesPerSec=5.73014048132714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:34:27,700] [INFO] [timer.py:197:stop] 0/8208, RunningAvgSamplesPerSec=6.324908635585216, CurrSamplesPerSec=5.695470568745349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 14:34:38,973] [INFO] [timer.py:197:stop] 0/8210, RunningAvgSamplesPerSec=6.324911555195426, CurrSamplesPerSec=5.705862364804972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:34:50,353] [INFO] [timer.py:197:stop] 0/8212, RunningAvgSamplesPerSec=6.3249020592060905, CurrSamplesPerSec=5.6438725122566655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:35:01,649] [INFO] [timer.py:197:stop] 0/8214, RunningAvgSamplesPerSec=6.3249053485112094, CurrSamplesPerSec=5.7012505200854395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:35:13,094] [INFO] [timer.py:197:stop] 0/8216, RunningAvgSamplesPerSec=6.324902634234698, CurrSamplesPerSec=5.672772206276174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:35:24,439] [INFO] [timer.py:197:stop] 0/8218, RunningAvgSamplesPerSec=6.324896993147265, CurrSamplesPerSec=5.680937777779471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:35:35,844] [INFO] [logging.py:68:log_dist] [Rank 0] step=4110, skipped=6, lr=[1.9933333333333334e-06], mom=[[0.9, 0.999]] [2022-12-19 14:35:35,846] [INFO] [timer.py:197:stop] 0/8220, RunningAvgSamplesPerSec=6.3248973192920825, CurrSamplesPerSec=5.689913861334511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:35:47,234] [INFO] [timer.py:197:stop] 0/8222, RunningAvgSamplesPerSec=6.324883758932601, CurrSamplesPerSec=5.592804286705785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:35:58,564] [INFO] [timer.py:197:stop] 0/8224, RunningAvgSamplesPerSec=6.324882866049327, CurrSamplesPerSec=5.687026073138207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:36:10,036] [INFO] [timer.py:197:stop] 0/8226, RunningAvgSamplesPerSec=6.3248770579602525, CurrSamplesPerSec=5.661218386355887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:36:21,635] [INFO] [timer.py:197:stop] 0/8228, RunningAvgSamplesPerSec=6.324833189001884, CurrSamplesPerSec=5.675286748061461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 14:36:32,942] [INFO] [timer.py:197:stop] 0/8230, RunningAvgSamplesPerSec=6.3248308103171595, CurrSamplesPerSec=5.690223596249968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.9822222222222223e-06, 'epoch': 30.82} [2022-12-19 14:36:44,653] [INFO] [timer.py:197:stop] 0/8232, RunningAvgSamplesPerSec=6.324824794204879, CurrSamplesPerSec=5.661336349389411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:36:55,980] [INFO] [timer.py:197:stop] 0/8234, RunningAvgSamplesPerSec=6.324820846835549, CurrSamplesPerSec=5.675437936246949, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:37:07,396] [INFO] [timer.py:197:stop] 0/8236, RunningAvgSamplesPerSec=6.324807140209151, CurrSamplesPerSec=5.618571018566799, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:37:19,065] [INFO] [timer.py:197:stop] 0/8238, RunningAvgSamplesPerSec=6.324789105286443, CurrSamplesPerSec=5.5737430481722905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:37:30,341] [INFO] [logging.py:68:log_dist] [Rank 0] step=4120, skipped=6, lr=[1.971111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 14:37:30,342] [INFO] [timer.py:197:stop] 0/8240, RunningAvgSamplesPerSec=6.3247891851139135, CurrSamplesPerSec=5.696119566383091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:37:41,786] [INFO] [timer.py:197:stop] 0/8242, RunningAvgSamplesPerSec=6.32477085259466, CurrSamplesPerSec=5.670332955403314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:37:53,235] [INFO] [timer.py:197:stop] 0/8244, RunningAvgSamplesPerSec=6.324766050883402, CurrSamplesPerSec=5.671452142815853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:38:04,884] [INFO] [timer.py:197:stop] 0/8246, RunningAvgSamplesPerSec=6.324710946586333, CurrSamplesPerSec=5.345798737267667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:38:16,200] [INFO] [timer.py:197:stop] 0/8248, RunningAvgSamplesPerSec=6.324708980459276, 
CurrSamplesPerSec=5.675586492484414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:38:27,896] [INFO] [timer.py:197:stop] 0/8250, RunningAvgSamplesPerSec=6.3246499686088455, CurrSamplesPerSec=5.3077762730079705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:38:39,231] [INFO] [timer.py:197:stop] 0/8252, RunningAvgSamplesPerSec=6.324648602470289, CurrSamplesPerSec=5.692812310459055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:38:50,522] [INFO] [timer.py:197:stop] 0/8254, RunningAvgSamplesPerSec=6.3246506015206165, CurrSamplesPerSec=5.713380694210066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:39:02,076] [INFO] [timer.py:197:stop] 0/8256, RunningAvgSamplesPerSec=6.324636826962675, CurrSamplesPerSec=5.695391297233678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:39:13,398] [INFO] [timer.py:197:stop] 0/8258, RunningAvgSamplesPerSec=6.324631746626775, CurrSamplesPerSec=5.677424769252895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:39:25,055] [INFO] [logging.py:68:log_dist] [Rank 0] step=4130, skipped=6, lr=[1.948888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 14:39:25,057] [INFO] [timer.py:197:stop] 0/8260, RunningAvgSamplesPerSec=6.324579073756963, CurrSamplesPerSec=5.353957468067383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:39:36,348] [INFO] [timer.py:197:stop] 0/8262, RunningAvgSamplesPerSec=6.32457847106292, CurrSamplesPerSec=5.691426913051699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:39:48,005] [INFO] [timer.py:197:stop] 0/8264, RunningAvgSamplesPerSec=6.324571696346121, CurrSamplesPerSec=5.675519293223086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:39:59,355] [INFO] [timer.py:197:stop] 0/8266, RunningAvgSamplesPerSec=6.324561531510408, CurrSamplesPerSec=5.687904539564554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:40:10,765] [INFO] [timer.py:197:stop] 0/8268, RunningAvgSamplesPerSec=6.324561580061063, 
CurrSamplesPerSec=5.698787448019233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:40:22,189] [INFO] [timer.py:197:stop] 0/8270, RunningAvgSamplesPerSec=6.324541718115034, CurrSamplesPerSec=5.573497012489421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:40:33,496] [INFO] [timer.py:197:stop] 0/8272, RunningAvgSamplesPerSec=6.324540622301462, CurrSamplesPerSec=5.697203734342135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:40:45,151] [INFO] [timer.py:197:stop] 0/8274, RunningAvgSamplesPerSec=6.324487912719239, CurrSamplesPerSec=5.355868532311154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:40:56,668] [INFO] [timer.py:197:stop] 0/8276, RunningAvgSamplesPerSec=6.324485656042035, CurrSamplesPerSec=5.688139084178001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:41:07,045] [INFO] [timer.py:197:stop] 0/8278, RunningAvgSamplesPerSec=6.324609786082329, CurrSamplesPerSec=5.691166034586864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:41:18,719] [INFO] [logging.py:68:log_dist] [Rank 0] step=4140, skipped=6, lr=[1.926666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 14:41:18,720] [INFO] [timer.py:197:stop] 0/8280, RunningAvgSamplesPerSec=6.3245912109479985, CurrSamplesPerSec=5.625761646167135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0003, 'learning_rate': 1.926666666666667e-06, 'epoch': 31.01} [2022-12-19 14:41:30,108] [INFO] [timer.py:197:stop] 0/8282, RunningAvgSamplesPerSec=6.32457992495355, CurrSamplesPerSec=5.682636123920992, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:41:41,455] [INFO] [timer.py:197:stop] 0/8284, RunningAvgSamplesPerSec=6.324566593976239, CurrSamplesPerSec=5.60120669612184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:41:52,791] [INFO] [timer.py:197:stop] 0/8286, RunningAvgSamplesPerSec=6.324562498685906, CurrSamplesPerSec=5.660885776178486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:42:04,200] [INFO] 
[timer.py:197:stop] 0/8288, RunningAvgSamplesPerSec=6.3245622974536815, CurrSamplesPerSec=5.685604954244338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:42:15,630] [INFO] [timer.py:197:stop] 0/8290, RunningAvgSamplesPerSec=6.3245399771856965, CurrSamplesPerSec=5.660935199626208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:42:26,922] [INFO] [timer.py:197:stop] 0/8292, RunningAvgSamplesPerSec=6.324538604461437, CurrSamplesPerSec=5.676794190437464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:42:38,587] [INFO] [timer.py:197:stop] 0/8294, RunningAvgSamplesPerSec=6.324537232867196, CurrSamplesPerSec=5.691165069308556, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:42:49,908] [INFO] [timer.py:197:stop] 0/8296, RunningAvgSamplesPerSec=6.324531609401907, CurrSamplesPerSec=5.68249321302716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:43:01,254] [INFO] [timer.py:197:stop] 0/8298, RunningAvgSamplesPerSec=6.324525996668939, CurrSamplesPerSec=5.655448979545401, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:43:12,705] [INFO] [logging.py:68:log_dist] [Rank 0] step=4150, skipped=6, lr=[1.9044444444444445e-06], mom=[[0.9, 0.999]] [2022-12-19 14:43:12,707] [INFO] [timer.py:197:stop] 0/8300, RunningAvgSamplesPerSec=6.324524286879332, CurrSamplesPerSec=5.701715777358971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:43:24,148] [INFO] [timer.py:197:stop] 0/8302, RunningAvgSamplesPerSec=6.324524173579496, CurrSamplesPerSec=5.696224483425561, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:43:35,476] [INFO] [timer.py:197:stop] 0/8304, RunningAvgSamplesPerSec=6.324518586056128, CurrSamplesPerSec=5.671620382581822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:43:46,952] [INFO] [timer.py:197:stop] 0/8306, RunningAvgSamplesPerSec=6.324514303632991, CurrSamplesPerSec=5.656429277907199, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:43:58,316] [INFO] 
[timer.py:197:stop] 0/8308, RunningAvgSamplesPerSec=6.3245084768161615, CurrSamplesPerSec=5.673123959307348, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:44:09,640] [INFO] [timer.py:197:stop] 0/8310, RunningAvgSamplesPerSec=6.324503659039768, CurrSamplesPerSec=5.676433579612334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:44:20,954] [INFO] [timer.py:197:stop] 0/8312, RunningAvgSamplesPerSec=6.32450290760975, CurrSamplesPerSec=5.678657275639422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:44:32,277] [INFO] [timer.py:197:stop] 0/8314, RunningAvgSamplesPerSec=6.324498363874806, CurrSamplesPerSec=5.67130092727044, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:44:43,592] [INFO] [timer.py:197:stop] 0/8316, RunningAvgSamplesPerSec=6.324495111829822, CurrSamplesPerSec=5.683533932334821, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:44:54,940] [INFO] [timer.py:197:stop] 0/8318, RunningAvgSamplesPerSec=6.324489082797967, CurrSamplesPerSec=5.6660213518312235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:45:06,384] [INFO] [logging.py:68:log_dist] [Rank 0] step=4160, skipped=6, lr=[1.8822222222222226e-06], mom=[[0.9, 0.999]] [2022-12-19 14:45:06,386] [INFO] [timer.py:197:stop] 0/8320, RunningAvgSamplesPerSec=6.3244889142919964, CurrSamplesPerSec=5.674500459417397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:45:17,680] [INFO] [timer.py:197:stop] 0/8322, RunningAvgSamplesPerSec=6.324491323105782, CurrSamplesPerSec=5.685833046678675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:45:29,211] [INFO] [timer.py:197:stop] 0/8324, RunningAvgSamplesPerSec=6.324489921536028, CurrSamplesPerSec=5.674060021850278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:45:40,544] [INFO] [timer.py:197:stop] 0/8326, RunningAvgSamplesPerSec=6.324487624835438, CurrSamplesPerSec=5.692437590369295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:45:52,032] [INFO] 
[timer.py:197:stop] 0/8328, RunningAvgSamplesPerSec=6.324484554847763, CurrSamplesPerSec=5.683809997458292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:46:03,317] [INFO] [timer.py:197:stop] 0/8330, RunningAvgSamplesPerSec=6.3244848760239485, CurrSamplesPerSec=5.68895085908554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.8711111111111114e-06, 'epoch': 31.2} [2022-12-19 14:46:14,563] [INFO] [timer.py:197:stop] 0/8332, RunningAvgSamplesPerSec=6.3244897025022615, CurrSamplesPerSec=5.7033182383918595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:46:25,865] [INFO] [timer.py:197:stop] 0/8334, RunningAvgSamplesPerSec=6.3244900987606405, CurrSamplesPerSec=5.698411215131826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:46:37,144] [INFO] [timer.py:197:stop] 0/8336, RunningAvgSamplesPerSec=6.324487392040982, CurrSamplesPerSec=5.674106557229778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:46:48,484] [INFO] [timer.py:197:stop] 0/8338, RunningAvgSamplesPerSec=6.324485483359261, CurrSamplesPerSec=5.67487762043136, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:46:59,756] [INFO] [logging.py:68:log_dist] [Rank 0] step=4170, skipped=6, lr=[1.8600000000000002e-06], mom=[[0.9, 0.999]] [2022-12-19 14:46:59,758] [INFO] [timer.py:197:stop] 0/8340, RunningAvgSamplesPerSec=6.3244856049520015, CurrSamplesPerSec=5.689543140434265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:47:11,103] [INFO] [timer.py:197:stop] 0/8342, RunningAvgSamplesPerSec=6.324481253916942, CurrSamplesPerSec=5.678333904123716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:47:22,546] [INFO] [timer.py:197:stop] 0/8344, RunningAvgSamplesPerSec=6.324477286657404, CurrSamplesPerSec=5.675807061243003, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:47:33,835] [INFO] [timer.py:197:stop] 0/8346, RunningAvgSamplesPerSec=6.3244787358765855, CurrSamplesPerSec=5.703105946568021, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:47:45,130] [INFO] [timer.py:197:stop] 0/8348, RunningAvgSamplesPerSec=6.324478335031197, CurrSamplesPerSec=5.684886353928244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:47:56,411] [INFO] [timer.py:197:stop] 0/8350, RunningAvgSamplesPerSec=6.324479928993325, CurrSamplesPerSec=5.712419214738036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:48:07,702] [INFO] [timer.py:197:stop] 0/8352, RunningAvgSamplesPerSec=6.324480183725168, CurrSamplesPerSec=5.683293751019163, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:48:19,168] [INFO] [timer.py:197:stop] 0/8354, RunningAvgSamplesPerSec=6.3244816612291945, CurrSamplesPerSec=5.694221815543122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:48:30,566] [INFO] [timer.py:197:stop] 0/8356, RunningAvgSamplesPerSec=6.324476574323268, CurrSamplesPerSec=5.6725430028259645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:48:41,863] [INFO] [timer.py:197:stop] 0/8358, RunningAvgSamplesPerSec=6.32447488864945, CurrSamplesPerSec=5.697015111692377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:48:53,177] [INFO] [logging.py:68:log_dist] [Rank 0] step=4180, skipped=6, lr=[1.837777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 14:48:53,178] [INFO] [timer.py:197:stop] 0/8360, RunningAvgSamplesPerSec=6.324475140813201, CurrSamplesPerSec=5.686711867371195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:49:04,601] [INFO] [timer.py:197:stop] 0/8362, RunningAvgSamplesPerSec=6.324472745401524, CurrSamplesPerSec=5.686591639840049, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:49:15,946] [INFO] [timer.py:197:stop] 0/8364, RunningAvgSamplesPerSec=6.324468974822147, CurrSamplesPerSec=5.689803146520338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:49:27,477] [INFO] [timer.py:197:stop] 0/8366, RunningAvgSamplesPerSec=6.324467552293848, CurrSamplesPerSec=5.694431513692238, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:49:38,791] [INFO] [timer.py:197:stop] 0/8368, RunningAvgSamplesPerSec=6.324462170406821, CurrSamplesPerSec=5.670787190158504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:49:50,236] [INFO] [timer.py:197:stop] 0/8370, RunningAvgSamplesPerSec=6.32446015504963, CurrSamplesPerSec=5.669835680547889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:50:01,541] [INFO] [timer.py:197:stop] 0/8372, RunningAvgSamplesPerSec=6.324457605335341, CurrSamplesPerSec=5.689094094978961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:50:12,832] [INFO] [timer.py:197:stop] 0/8374, RunningAvgSamplesPerSec=6.324458206526692, CurrSamplesPerSec=5.68686800209344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:50:24,339] [INFO] [timer.py:197:stop] 0/8376, RunningAvgSamplesPerSec=6.324457949368211, CurrSamplesPerSec=5.686729456193795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:50:35,615] [INFO] [timer.py:197:stop] 0/8378, RunningAvgSamplesPerSec=6.32446218609974, CurrSamplesPerSec=5.699898774335396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:50:46,963] [INFO] [logging.py:68:log_dist] [Rank 0] step=4190, skipped=6, lr=[1.8155555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 14:50:46,964] [INFO] [timer.py:197:stop] 0/8380, RunningAvgSamplesPerSec=6.324464421362113, CurrSamplesPerSec=5.701188281638202, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.8155555555555556e-06, 'epoch': 31.39} [2022-12-19 14:50:58,293] [INFO] [timer.py:197:stop] 0/8382, RunningAvgSamplesPerSec=6.324460691639991, CurrSamplesPerSec=5.678577270193751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:51:09,796] [INFO] [timer.py:197:stop] 0/8384, RunningAvgSamplesPerSec=6.324456157621069, CurrSamplesPerSec=5.67680667578727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:51:21,089] [INFO] [timer.py:197:stop] 0/8386, 
RunningAvgSamplesPerSec=6.32445385916731, CurrSamplesPerSec=5.686079705766025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:51:32,378] [INFO] [timer.py:197:stop] 0/8388, RunningAvgSamplesPerSec=6.324454444789108, CurrSamplesPerSec=5.6957523871683415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:51:43,680] [INFO] [timer.py:197:stop] 0/8390, RunningAvgSamplesPerSec=6.324453610054516, CurrSamplesPerSec=5.701119263937448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:51:55,020] [INFO] [timer.py:197:stop] 0/8392, RunningAvgSamplesPerSec=6.324450719034498, CurrSamplesPerSec=5.668705638839983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:52:06,294] [INFO] [timer.py:197:stop] 0/8394, RunningAvgSamplesPerSec=6.324453747900914, CurrSamplesPerSec=5.706743748837117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:52:17,615] [INFO] [timer.py:197:stop] 0/8396, RunningAvgSamplesPerSec=6.324451456134067, CurrSamplesPerSec=5.6859385486300855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:52:28,891] [INFO] [timer.py:197:stop] 0/8398, RunningAvgSamplesPerSec=6.32445210605322, CurrSamplesPerSec=5.691865706166093, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:52:40,376] [INFO] [logging.py:68:log_dist] [Rank 0] step=4200, skipped=6, lr=[1.7933333333333337e-06], mom=[[0.9, 0.999]] [2022-12-19 14:52:40,377] [INFO] [timer.py:197:stop] 0/8400, RunningAvgSamplesPerSec=6.3244461121464415, CurrSamplesPerSec=5.666336624342077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:52:51,702] [INFO] [timer.py:197:stop] 0/8402, RunningAvgSamplesPerSec=6.324444244605733, CurrSamplesPerSec=5.681695305698119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:53:03,028] [INFO] [timer.py:197:stop] 0/8404, RunningAvgSamplesPerSec=6.324442238738445, CurrSamplesPerSec=5.694266024812775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:53:14,342] [INFO] [timer.py:197:stop] 0/8406, 
RunningAvgSamplesPerSec=6.324440729923471, CurrSamplesPerSec=5.695698970042161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:53:25,855] [INFO] [timer.py:197:stop] 0/8408, RunningAvgSamplesPerSec=6.324435527873863, CurrSamplesPerSec=5.66847820027374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:53:37,270] [INFO] [timer.py:197:stop] 0/8410, RunningAvgSamplesPerSec=6.324436808149205, CurrSamplesPerSec=5.702829700051233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:53:48,674] [INFO] [timer.py:197:stop] 0/8412, RunningAvgSamplesPerSec=6.324435593090223, CurrSamplesPerSec=5.700377852854457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:54:00,066] [INFO] [timer.py:197:stop] 0/8414, RunningAvgSamplesPerSec=6.3244245224824125, CurrSamplesPerSec=5.686692351133733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:54:11,557] [INFO] [timer.py:197:stop] 0/8416, RunningAvgSamplesPerSec=6.324424167060802, CurrSamplesPerSec=5.687830299256896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:54:22,960] [INFO] [timer.py:197:stop] 0/8418, RunningAvgSamplesPerSec=6.324417756131905, CurrSamplesPerSec=5.670015321805009, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:54:34,257] [INFO] [logging.py:68:log_dist] [Rank 0] step=4210, skipped=6, lr=[1.7711111111111113e-06], mom=[[0.9, 0.999]] [2022-12-19 14:54:34,259] [INFO] [timer.py:197:stop] 0/8420, RunningAvgSamplesPerSec=6.324417352819159, CurrSamplesPerSec=5.692540681763479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:54:45,567] [INFO] [timer.py:197:stop] 0/8422, RunningAvgSamplesPerSec=6.324415876896977, CurrSamplesPerSec=5.695115072592306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:54:56,865] [INFO] [timer.py:197:stop] 0/8424, RunningAvgSamplesPerSec=6.324414419510356, CurrSamplesPerSec=5.684884427629989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:55:08,348] [INFO] [timer.py:197:stop] 0/8426, 
RunningAvgSamplesPerSec=6.324416309980584, CurrSamplesPerSec=5.711269463250304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:55:19,638] [INFO] [timer.py:197:stop] 0/8428, RunningAvgSamplesPerSec=6.324413294165854, CurrSamplesPerSec=5.6834104698645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:55:30,947] [INFO] [timer.py:197:stop] 0/8430, RunningAvgSamplesPerSec=6.32441113435295, CurrSamplesPerSec=5.68562518556857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.76e-06, 'epoch': 31.58} [2022-12-19 14:55:42,194] [INFO] [timer.py:197:stop] 0/8432, RunningAvgSamplesPerSec=6.324413763552823, CurrSamplesPerSec=5.708960177946308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:55:53,566] [INFO] [timer.py:197:stop] 0/8434, RunningAvgSamplesPerSec=6.324408601796768, CurrSamplesPerSec=5.682489604259255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:56:04,909] [INFO] [timer.py:197:stop] 0/8436, RunningAvgSamplesPerSec=6.324404794208527, CurrSamplesPerSec=5.6626168272009005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:56:16,172] [INFO] [timer.py:197:stop] 0/8438, RunningAvgSamplesPerSec=6.324404724491973, CurrSamplesPerSec=5.688435366019207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:56:27,494] [INFO] [logging.py:68:log_dist] [Rank 0] step=4220, skipped=6, lr=[1.7488888888888891e-06], mom=[[0.9, 0.999]] [2022-12-19 14:56:27,496] [INFO] [timer.py:197:stop] 0/8440, RunningAvgSamplesPerSec=6.3244032526335845, CurrSamplesPerSec=5.695156395834242, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:56:38,812] [INFO] [timer.py:197:stop] 0/8442, RunningAvgSamplesPerSec=6.324402882375879, CurrSamplesPerSec=5.6923091538562804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:56:50,154] [INFO] [timer.py:197:stop] 0/8444, RunningAvgSamplesPerSec=6.324399870211086, CurrSamplesPerSec=5.686228096932723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
14:57:01,725] [INFO] [timer.py:197:stop] 0/8446, RunningAvgSamplesPerSec=6.324395372293625, CurrSamplesPerSec=5.6516313687777995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:57:13,047] [INFO] [timer.py:197:stop] 0/8448, RunningAvgSamplesPerSec=6.324392871182201, CurrSamplesPerSec=5.68668994173099, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:57:24,367] [INFO] [timer.py:197:stop] 0/8450, RunningAvgSamplesPerSec=6.324389062535092, CurrSamplesPerSec=5.677670699339108, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:57:35,746] [INFO] [timer.py:197:stop] 0/8452, RunningAvgSamplesPerSec=6.324392590207735, CurrSamplesPerSec=5.696567787396858, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:57:47,048] [INFO] [timer.py:197:stop] 0/8454, RunningAvgSamplesPerSec=6.324389279909388, CurrSamplesPerSec=5.68497929937014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:57:58,476] [INFO] [timer.py:197:stop] 0/8456, RunningAvgSamplesPerSec=6.324389114535033, CurrSamplesPerSec=5.690693328721456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:58:09,813] [INFO] [timer.py:197:stop] 0/8458, RunningAvgSamplesPerSec=6.324383958109871, CurrSamplesPerSec=5.690806008493869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:58:21,136] [INFO] [logging.py:68:log_dist] [Rank 0] step=4230, skipped=6, lr=[1.7266666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 14:58:21,138] [INFO] [timer.py:197:stop] 0/8460, RunningAvgSamplesPerSec=6.324381795861367, CurrSamplesPerSec=5.663569977963099, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:58:32,596] [INFO] [timer.py:197:stop] 0/8462, RunningAvgSamplesPerSec=6.324375824697713, CurrSamplesPerSec=5.663230161671038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:58:43,922] [INFO] [timer.py:197:stop] 0/8464, RunningAvgSamplesPerSec=6.324370836385059, CurrSamplesPerSec=5.6750559016422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:58:55,294] 
[INFO] [timer.py:197:stop] 0/8466, RunningAvgSamplesPerSec=6.324358709689297, CurrSamplesPerSec=5.686535985129173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:59:06,608] [INFO] [timer.py:197:stop] 0/8468, RunningAvgSamplesPerSec=6.324359433220492, CurrSamplesPerSec=5.6848040058428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:59:17,978] [INFO] [timer.py:197:stop] 0/8470, RunningAvgSamplesPerSec=6.324357024404786, CurrSamplesPerSec=5.676291700695601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:59:29,267] [INFO] [timer.py:197:stop] 0/8472, RunningAvgSamplesPerSec=6.324356235381352, CurrSamplesPerSec=5.695493528894086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:59:40,721] [INFO] [timer.py:197:stop] 0/8474, RunningAvgSamplesPerSec=6.324354550396047, CurrSamplesPerSec=5.685154362898246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 14:59:52,000] [INFO] [timer.py:197:stop] 0/8476, RunningAvgSamplesPerSec=6.324353183799122, CurrSamplesPerSec=5.679853301403965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:00:03,302] [INFO] [timer.py:197:stop] 0/8478, RunningAvgSamplesPerSec=6.324350961560836, CurrSamplesPerSec=5.685031793216535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:00:14,622] [INFO] [logging.py:68:log_dist] [Rank 0] step=4240, skipped=6, lr=[1.7044444444444448e-06], mom=[[0.9, 0.999]] [2022-12-19 15:00:14,623] [INFO] [timer.py:197:stop] 0/8480, RunningAvgSamplesPerSec=6.324350782532028, CurrSamplesPerSec=5.687238374449418, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.7044444444444448e-06, 'epoch': 31.76} [2022-12-19 15:00:25,956] [INFO] [timer.py:197:stop] 0/8482, RunningAvgSamplesPerSec=6.324351474897878, CurrSamplesPerSec=5.6915364843120715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:00:37,232] [INFO] [timer.py:197:stop] 0/8484, RunningAvgSamplesPerSec=6.3243509383471075, CurrSamplesPerSec=5.673400692756692, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:00:48,531] [INFO] [timer.py:197:stop] 0/8486, RunningAvgSamplesPerSec=6.324350475496616, CurrSamplesPerSec=5.683620335187676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:01:00,067] [INFO] [timer.py:197:stop] 0/8488, RunningAvgSamplesPerSec=6.32434666420298, CurrSamplesPerSec=5.683262225720369, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:01:11,389] [INFO] [timer.py:197:stop] 0/8490, RunningAvgSamplesPerSec=6.324344092434678, CurrSamplesPerSec=5.686257727914783, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:01:22,705] [INFO] [timer.py:197:stop] 0/8492, RunningAvgSamplesPerSec=6.324342209358211, CurrSamplesPerSec=5.673409086307404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:01:34,027] [INFO] [timer.py:197:stop] 0/8494, RunningAvgSamplesPerSec=6.3243406427540245, CurrSamplesPerSec=5.697523454077001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:01:45,333] [INFO] [timer.py:197:stop] 0/8496, RunningAvgSamplesPerSec=6.324342922088693, CurrSamplesPerSec=5.693810194968999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:01:56,805] [INFO] [timer.py:197:stop] 0/8498, RunningAvgSamplesPerSec=6.324339838785201, CurrSamplesPerSec=5.6841366409622625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:02:08,125] [INFO] [logging.py:68:log_dist] [Rank 0] step=4250, skipped=6, lr=[1.6822222222222224e-06], mom=[[0.9, 0.999]] [2022-12-19 15:02:08,126] [INFO] [timer.py:197:stop] 0/8500, RunningAvgSamplesPerSec=6.324337464630968, CurrSamplesPerSec=5.6852858482359645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:02:19,629] [INFO] [timer.py:197:stop] 0/8502, RunningAvgSamplesPerSec=6.324336612180026, CurrSamplesPerSec=5.679392566369344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:02:31,172] [INFO] [timer.py:197:stop] 0/8504, RunningAvgSamplesPerSec=6.324333880562854, CurrSamplesPerSec=5.648668662181801, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:02:42,500] [INFO] [timer.py:197:stop] 0/8506, RunningAvgSamplesPerSec=6.324330746332154, CurrSamplesPerSec=5.675144446541218, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:02:53,819] [INFO] [timer.py:197:stop] 0/8508, RunningAvgSamplesPerSec=6.324325145553044, CurrSamplesPerSec=5.670791742457364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:03:05,339] [INFO] [timer.py:197:stop] 0/8510, RunningAvgSamplesPerSec=6.324316539212508, CurrSamplesPerSec=5.668386751173881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:03:16,662] [INFO] [timer.py:197:stop] 0/8512, RunningAvgSamplesPerSec=6.324313913542862, CurrSamplesPerSec=5.688940490437187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:03:27,962] [INFO] [timer.py:197:stop] 0/8514, RunningAvgSamplesPerSec=6.3243118690754505, CurrSamplesPerSec=5.676007964138119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:03:39,548] [INFO] [timer.py:197:stop] 0/8516, RunningAvgSamplesPerSec=6.324311003254968, CurrSamplesPerSec=5.6941428203606685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:03:50,896] [INFO] [timer.py:197:stop] 0/8518, RunningAvgSamplesPerSec=6.324305122898119, CurrSamplesPerSec=5.684078867823448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:04:02,392] [INFO] [logging.py:68:log_dist] [Rank 0] step=4260, skipped=6, lr=[1.6600000000000002e-06], mom=[[0.9, 0.999]] [2022-12-19 15:04:02,393] [INFO] [timer.py:197:stop] 0/8520, RunningAvgSamplesPerSec=6.324301576865203, CurrSamplesPerSec=5.678978761690032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:04:13,709] [INFO] [timer.py:197:stop] 0/8522, RunningAvgSamplesPerSec=6.3242986990464996, CurrSamplesPerSec=5.67135892023476, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:04:25,132] [INFO] [timer.py:197:stop] 0/8524, RunningAvgSamplesPerSec=6.324294847843518, CurrSamplesPerSec=5.663693535908116, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:04:36,418] [INFO] [timer.py:197:stop] 0/8526, RunningAvgSamplesPerSec=6.324293960537501, CurrSamplesPerSec=5.694788374271629, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:04:47,726] [INFO] [timer.py:197:stop] 0/8528, RunningAvgSamplesPerSec=6.324291211979245, CurrSamplesPerSec=5.691470354896874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:04:59,173] [INFO] [timer.py:197:stop] 0/8530, RunningAvgSamplesPerSec=6.324289271033683, CurrSamplesPerSec=5.680743979280805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.648888888888889e-06, 'epoch': 31.95} [2022-12-19 15:05:10,462] [INFO] [timer.py:197:stop] 0/8532, RunningAvgSamplesPerSec=6.324288430762402, CurrSamplesPerSec=5.67508229685149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:05:21,759] [INFO] [timer.py:197:stop] 0/8534, RunningAvgSamplesPerSec=6.3242880709931955, CurrSamplesPerSec=5.6877861898620425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:05:33,063] [INFO] [timer.py:197:stop] 0/8536, RunningAvgSamplesPerSec=6.3242880582422405, CurrSamplesPerSec=5.682702770010962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:05:44,576] [INFO] [timer.py:197:stop] 0/8538, RunningAvgSamplesPerSec=6.324285615676115, CurrSamplesPerSec=5.676141907303084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:05:55,928] [INFO] [logging.py:68:log_dist] [Rank 0] step=4270, skipped=6, lr=[1.6377777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 15:05:55,929] [INFO] [timer.py:197:stop] 0/8540, RunningAvgSamplesPerSec=6.324281080630212, CurrSamplesPerSec=5.664133082625157, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:06:07,253] [INFO] [timer.py:197:stop] 0/8542, RunningAvgSamplesPerSec=6.3242784956525355, CurrSamplesPerSec=5.684596942251742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:06:17,645] [INFO] [timer.py:197:stop] 0/8544, 
RunningAvgSamplesPerSec=6.32439837768994, CurrSamplesPerSec=6.659132950535032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:06:29,119] [INFO] [timer.py:197:stop] 0/8546, RunningAvgSamplesPerSec=6.324393089667802, CurrSamplesPerSec=5.6580078121624275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:06:40,430] [INFO] [timer.py:197:stop] 0/8548, RunningAvgSamplesPerSec=6.324390545513252, CurrSamplesPerSec=5.686754032539105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:06:51,712] [INFO] [timer.py:197:stop] 0/8550, RunningAvgSamplesPerSec=6.32439284657065, CurrSamplesPerSec=5.714501369596124, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:07:03,040] [INFO] [timer.py:197:stop] 0/8552, RunningAvgSamplesPerSec=6.324387446694769, CurrSamplesPerSec=5.676283778731651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:07:14,370] [INFO] [timer.py:197:stop] 0/8554, RunningAvgSamplesPerSec=6.324382516470552, CurrSamplesPerSec=5.691534794855329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:07:25,669] [INFO] [timer.py:197:stop] 0/8556, RunningAvgSamplesPerSec=6.324383583254193, CurrSamplesPerSec=5.701409633888713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:07:37,213] [INFO] [timer.py:197:stop] 0/8558, RunningAvgSamplesPerSec=6.324379557020904, CurrSamplesPerSec=5.668694865022425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:07:48,523] [INFO] [logging.py:68:log_dist] [Rank 0] step=4280, skipped=6, lr=[1.6155555555555559e-06], mom=[[0.9, 0.999]] [2022-12-19 15:07:48,525] [INFO] [timer.py:197:stop] 0/8560, RunningAvgSamplesPerSec=6.3243778511211275, CurrSamplesPerSec=5.692809895867574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:07:59,814] [INFO] [timer.py:197:stop] 0/8562, RunningAvgSamplesPerSec=6.324377050378271, CurrSamplesPerSec=5.685019271660546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:08:11,204] [INFO] [timer.py:197:stop] 0/8564, 
RunningAvgSamplesPerSec=6.3243697928640366, CurrSamplesPerSec=5.653920697259605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:08:22,624] [INFO] [timer.py:197:stop] 0/8566, RunningAvgSamplesPerSec=6.324364847809089, CurrSamplesPerSec=5.660230459876417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:08:33,950] [INFO] [timer.py:197:stop] 0/8568, RunningAvgSamplesPerSec=6.324365883506766, CurrSamplesPerSec=5.694729176294523, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:08:45,453] [INFO] [timer.py:197:stop] 0/8570, RunningAvgSamplesPerSec=6.324363382273424, CurrSamplesPerSec=5.687454306683898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:08:56,724] [INFO] [timer.py:197:stop] 0/8572, RunningAvgSamplesPerSec=6.324367694254155, CurrSamplesPerSec=5.7147612286186735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:09:08,197] [INFO] [timer.py:197:stop] 0/8574, RunningAvgSamplesPerSec=6.324365372177889, CurrSamplesPerSec=5.695639511432995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:09:19,509] [INFO] [timer.py:197:stop] 0/8576, RunningAvgSamplesPerSec=6.324360883473087, CurrSamplesPerSec=5.675275949213582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:09:30,825] [INFO] [timer.py:197:stop] 0/8578, RunningAvgSamplesPerSec=6.324354859518585, CurrSamplesPerSec=5.667170423676084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:09:42,159] [INFO] [logging.py:68:log_dist] [Rank 0] step=4290, skipped=6, lr=[1.5933333333333335e-06], mom=[[0.9, 0.999]] [2022-12-19 15:09:42,160] [INFO] [timer.py:197:stop] 0/8580, RunningAvgSamplesPerSec=6.324352520807218, CurrSamplesPerSec=5.675452815519628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:09:53,475] [INFO] [timer.py:197:stop] 0/8582, RunningAvgSamplesPerSec=6.32435121446213, CurrSamplesPerSec=5.69674259820511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.5911111111111113e-06, 'epoch': 32.14} 
[2022-12-19 15:10:04,995] [INFO] [timer.py:197:stop] 0/8584, RunningAvgSamplesPerSec=6.3243484556541105, CurrSamplesPerSec=5.675891069086685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:10:16,300] [INFO] [timer.py:197:stop] 0/8586, RunningAvgSamplesPerSec=6.3243485579479914, CurrSamplesPerSec=5.694869078802632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:10:27,728] [INFO] [timer.py:197:stop] 0/8588, RunningAvgSamplesPerSec=6.3243474192757585, CurrSamplesPerSec=5.692518469718932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:10:39,012] [INFO] [timer.py:197:stop] 0/8590, RunningAvgSamplesPerSec=6.324348034247726, CurrSamplesPerSec=5.697069278969814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:10:50,350] [INFO] [timer.py:197:stop] 0/8592, RunningAvgSamplesPerSec=6.324346910798452, CurrSamplesPerSec=5.69260731893718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:11:01,678] [INFO] [timer.py:197:stop] 0/8594, RunningAvgSamplesPerSec=6.324343431453987, CurrSamplesPerSec=5.687533839428736, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:11:13,005] [INFO] [timer.py:197:stop] 0/8596, RunningAvgSamplesPerSec=6.324341724204936, CurrSamplesPerSec=5.673594949865487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:11:24,355] [INFO] [timer.py:197:stop] 0/8598, RunningAvgSamplesPerSec=6.324332845720422, CurrSamplesPerSec=5.625732170712635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:11:35,658] [INFO] [logging.py:68:log_dist] [Rank 0] step=4300, skipped=6, lr=[1.5711111111111113e-06], mom=[[0.9, 0.999]] [2022-12-19 15:11:35,660] [INFO] [timer.py:197:stop] 0/8600, RunningAvgSamplesPerSec=6.324333551988613, CurrSamplesPerSec=5.685459726464692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:11:46,964] [INFO] [timer.py:197:stop] 0/8602, RunningAvgSamplesPerSec=6.324333763417794, CurrSamplesPerSec=5.691221055991674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 15:11:58,474] [INFO] [timer.py:197:stop] 0/8604, RunningAvgSamplesPerSec=6.324330818183937, CurrSamplesPerSec=5.686188589437016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:12:09,758] [INFO] [timer.py:197:stop] 0/8606, RunningAvgSamplesPerSec=6.3243326984288535, CurrSamplesPerSec=5.718043617201387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:12:21,087] [INFO] [timer.py:197:stop] 0/8608, RunningAvgSamplesPerSec=6.324333608721895, CurrSamplesPerSec=5.700537402180953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:12:32,405] [INFO] [timer.py:197:stop] 0/8610, RunningAvgSamplesPerSec=6.324330549879674, CurrSamplesPerSec=5.684198266938145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:12:43,718] [INFO] [timer.py:197:stop] 0/8612, RunningAvgSamplesPerSec=6.324330174651443, CurrSamplesPerSec=5.682641417055105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:12:55,088] [INFO] [timer.py:197:stop] 0/8614, RunningAvgSamplesPerSec=6.32433295616388, CurrSamplesPerSec=5.727620369865789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:13:06,557] [INFO] [timer.py:197:stop] 0/8616, RunningAvgSamplesPerSec=6.32433079700079, CurrSamplesPerSec=5.697559733166986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:13:17,817] [INFO] [timer.py:197:stop] 0/8618, RunningAvgSamplesPerSec=6.324332602967465, CurrSamplesPerSec=5.704387207880025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:13:29,230] [INFO] [logging.py:68:log_dist] [Rank 0] step=4310, skipped=6, lr=[1.548888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 15:13:29,232] [INFO] [timer.py:197:stop] 0/8620, RunningAvgSamplesPerSec=6.324332993087232, CurrSamplesPerSec=5.707080313824146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:13:40,514] [INFO] [timer.py:197:stop] 0/8622, RunningAvgSamplesPerSec=6.3243333357328435, CurrSamplesPerSec=5.708528457062246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
15:13:51,822] [INFO] [timer.py:197:stop] 0/8624, RunningAvgSamplesPerSec=6.324333640096521, CurrSamplesPerSec=5.692415137703685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:14:03,139] [INFO] [timer.py:197:stop] 0/8626, RunningAvgSamplesPerSec=6.3243333580589, CurrSamplesPerSec=5.699427763010198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:14:14,443] [INFO] [timer.py:197:stop] 0/8628, RunningAvgSamplesPerSec=6.324336205661227, CurrSamplesPerSec=5.693742805016297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:14:25,731] [INFO] [timer.py:197:stop] 0/8630, RunningAvgSamplesPerSec=6.324338580073613, CurrSamplesPerSec=5.706564441668354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:14:37,274] [INFO] [timer.py:197:stop] 0/8632, RunningAvgSamplesPerSec=6.3243376605764245, CurrSamplesPerSec=5.68032108281229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.5355555555555558e-06, 'epoch': 32.33} [2022-12-19 15:14:48,587] [INFO] [timer.py:197:stop] 0/8634, RunningAvgSamplesPerSec=6.324335606354695, CurrSamplesPerSec=5.69015363759679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:14:59,986] [INFO] [timer.py:197:stop] 0/8636, RunningAvgSamplesPerSec=6.324322462510947, CurrSamplesPerSec=5.623543597173835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:15:11,308] [INFO] [timer.py:197:stop] 0/8638, RunningAvgSamplesPerSec=6.324317939941913, CurrSamplesPerSec=5.673480552402277, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:15:22,614] [INFO] [logging.py:68:log_dist] [Rank 0] step=4320, skipped=6, lr=[1.526666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 15:15:22,615] [INFO] [timer.py:197:stop] 0/8640, RunningAvgSamplesPerSec=6.324320022097739, CurrSamplesPerSec=5.707247519121006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:15:33,928] [INFO] [timer.py:197:stop] 0/8642, RunningAvgSamplesPerSec=6.324318241279973, CurrSamplesPerSec=5.69022673236781, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:15:45,219] [INFO] [timer.py:197:stop] 0/8644, RunningAvgSamplesPerSec=6.324318263464713, CurrSamplesPerSec=5.680256656131789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:15:56,539] [INFO] [timer.py:197:stop] 0/8646, RunningAvgSamplesPerSec=6.324314246461094, CurrSamplesPerSec=5.656256694056275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:16:07,877] [INFO] [timer.py:197:stop] 0/8648, RunningAvgSamplesPerSec=6.324307505789451, CurrSamplesPerSec=5.677602249860945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:16:19,210] [INFO] [timer.py:197:stop] 0/8650, RunningAvgSamplesPerSec=6.32430508913844, CurrSamplesPerSec=5.69032636621868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:16:30,562] [INFO] [timer.py:197:stop] 0/8652, RunningAvgSamplesPerSec=6.324296515429065, CurrSamplesPerSec=5.648543381732684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:16:42,108] [INFO] [timer.py:197:stop] 0/8654, RunningAvgSamplesPerSec=6.324290719927663, CurrSamplesPerSec=5.656649552153257, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:16:53,414] [INFO] [timer.py:197:stop] 0/8656, RunningAvgSamplesPerSec=6.3242904987899, CurrSamplesPerSec=5.664416828456128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:17:04,702] [INFO] [timer.py:197:stop] 0/8658, RunningAvgSamplesPerSec=6.3242858544320395, CurrSamplesPerSec=5.655579332725572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:17:15,969] [INFO] [logging.py:68:log_dist] [Rank 0] step=4330, skipped=6, lr=[1.5044444444444446e-06], mom=[[0.9, 0.999]] [2022-12-19 15:17:15,970] [INFO] [timer.py:197:stop] 0/8660, RunningAvgSamplesPerSec=6.324285840413869, CurrSamplesPerSec=5.685522103656761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:17:27,254] [INFO] [timer.py:197:stop] 0/8662, RunningAvgSamplesPerSec=6.324284667651946, CurrSamplesPerSec=5.674667680474356, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:17:38,619] [INFO] [timer.py:197:stop] 0/8664, RunningAvgSamplesPerSec=6.324276879768223, CurrSamplesPerSec=5.656827881965941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:17:49,925] [INFO] [timer.py:197:stop] 0/8666, RunningAvgSamplesPerSec=6.324275943607626, CurrSamplesPerSec=5.673101658740379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:18:01,435] [INFO] [timer.py:197:stop] 0/8668, RunningAvgSamplesPerSec=6.324271027839865, CurrSamplesPerSec=5.672188205562194, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:18:12,798] [INFO] [timer.py:197:stop] 0/8670, RunningAvgSamplesPerSec=6.324265965969047, CurrSamplesPerSec=5.67517444205224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:18:24,194] [INFO] [timer.py:197:stop] 0/8672, RunningAvgSamplesPerSec=6.324266747536519, CurrSamplesPerSec=5.703787469082713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:18:35,517] [INFO] [timer.py:197:stop] 0/8674, RunningAvgSamplesPerSec=6.324265201921424, CurrSamplesPerSec=5.682437638509645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:18:46,959] [INFO] [timer.py:197:stop] 0/8676, RunningAvgSamplesPerSec=6.324267580987817, CurrSamplesPerSec=5.701930872497705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:18:58,303] [INFO] [timer.py:197:stop] 0/8678, RunningAvgSamplesPerSec=6.3242630214844, CurrSamplesPerSec=5.675972439054684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:19:09,598] [INFO] [logging.py:68:log_dist] [Rank 0] step=4340, skipped=6, lr=[1.4822222222222224e-06], mom=[[0.9, 0.999]] [2022-12-19 15:19:09,600] [INFO] [timer.py:197:stop] 0/8680, RunningAvgSamplesPerSec=6.324263972404626, CurrSamplesPerSec=5.671304521841188, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:19:20,890] [INFO] [timer.py:197:stop] 0/8682, RunningAvgSamplesPerSec=6.324263818448559, CurrSamplesPerSec=5.6980255977371215, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.48e-06, 'epoch': 32.52} [2022-12-19 15:19:32,273] [INFO] [timer.py:197:stop] 0/8684, RunningAvgSamplesPerSec=6.324262577129733, CurrSamplesPerSec=5.6950623923697945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:19:43,599] [INFO] [timer.py:197:stop] 0/8686, RunningAvgSamplesPerSec=6.324259601530878, CurrSamplesPerSec=5.684225710185554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:19:55,029] [INFO] [timer.py:197:stop] 0/8688, RunningAvgSamplesPerSec=6.324237249965437, CurrSamplesPerSec=5.552336205067439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:20:06,516] [INFO] [timer.py:197:stop] 0/8690, RunningAvgSamplesPerSec=6.324236771520533, CurrSamplesPerSec=5.698772204177787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:20:17,824] [INFO] [timer.py:197:stop] 0/8692, RunningAvgSamplesPerSec=6.324235834441364, CurrSamplesPerSec=5.7012202482974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:20:29,105] [INFO] [timer.py:197:stop] 0/8694, RunningAvgSamplesPerSec=6.324235405430976, CurrSamplesPerSec=5.698434198995986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:20:40,388] [INFO] [timer.py:197:stop] 0/8696, RunningAvgSamplesPerSec=6.324235805644765, CurrSamplesPerSec=5.716125389434137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:20:51,680] [INFO] [timer.py:197:stop] 0/8698, RunningAvgSamplesPerSec=6.324234793634151, CurrSamplesPerSec=5.680016511345223, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:21:03,079] [INFO] [logging.py:68:log_dist] [Rank 0] step=4350, skipped=6, lr=[1.46e-06], mom=[[0.9, 0.999]] [2022-12-19 15:21:03,080] [INFO] [timer.py:197:stop] 0/8700, RunningAvgSamplesPerSec=6.324233869828957, CurrSamplesPerSec=5.694019621239892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:21:14,541] [INFO] [timer.py:197:stop] 0/8702, RunningAvgSamplesPerSec=6.324237258853427, 
CurrSamplesPerSec=5.717713552356454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:21:26,023] [INFO] [timer.py:197:stop] 0/8704, RunningAvgSamplesPerSec=6.324237810728861, CurrSamplesPerSec=5.696755413250745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:21:37,341] [INFO] [timer.py:197:stop] 0/8706, RunningAvgSamplesPerSec=6.324236820653229, CurrSamplesPerSec=5.702961761933135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:21:48,649] [INFO] [timer.py:197:stop] 0/8708, RunningAvgSamplesPerSec=6.3242360051872915, CurrSamplesPerSec=5.688889612268611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:21:59,978] [INFO] [timer.py:197:stop] 0/8710, RunningAvgSamplesPerSec=6.324232094022373, CurrSamplesPerSec=5.675533692930839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:22:11,307] [INFO] [timer.py:197:stop] 0/8712, RunningAvgSamplesPerSec=6.324227260505834, CurrSamplesPerSec=5.67039020988995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:22:22,614] [INFO] [timer.py:197:stop] 0/8714, RunningAvgSamplesPerSec=6.324226711586121, CurrSamplesPerSec=5.691715572414906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:22:33,936] [INFO] [timer.py:197:stop] 0/8716, RunningAvgSamplesPerSec=6.324224755953759, CurrSamplesPerSec=5.666530876466565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:22:45,259] [INFO] [timer.py:197:stop] 0/8718, RunningAvgSamplesPerSec=6.324223149846008, CurrSamplesPerSec=5.688248287807011, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:22:56,560] [INFO] [logging.py:68:log_dist] [Rank 0] step=4360, skipped=6, lr=[1.437777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 15:22:56,562] [INFO] [timer.py:197:stop] 0/8720, RunningAvgSamplesPerSec=6.324223438764905, CurrSamplesPerSec=5.687926474571801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:23:07,912] [INFO] [timer.py:197:stop] 0/8722, RunningAvgSamplesPerSec=6.3242134700485515, 
CurrSamplesPerSec=5.619807985387414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:23:19,267] [INFO] [timer.py:197:stop] 0/8724, RunningAvgSamplesPerSec=6.324208576587907, CurrSamplesPerSec=5.6763858057163805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:23:30,652] [INFO] [timer.py:197:stop] 0/8726, RunningAvgSamplesPerSec=6.324207861751254, CurrSamplesPerSec=5.690350008661587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:23:41,955] [INFO] [timer.py:197:stop] 0/8728, RunningAvgSamplesPerSec=6.324202809890134, CurrSamplesPerSec=5.670893332501825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:23:53,413] [INFO] [timer.py:197:stop] 0/8730, RunningAvgSamplesPerSec=6.32419060806549, CurrSamplesPerSec=5.615922946659628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:24:04,822] [INFO] [timer.py:197:stop] 0/8732, RunningAvgSamplesPerSec=6.324182391092218, CurrSamplesPerSec=5.671426500309331, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.4244444444444447e-06, 'epoch': 32.7} [2022-12-19 15:24:16,184] [INFO] [timer.py:197:stop] 0/8734, RunningAvgSamplesPerSec=6.324183613659826, CurrSamplesPerSec=5.710219534471811, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:24:27,617] [INFO] [timer.py:197:stop] 0/8736, RunningAvgSamplesPerSec=6.324180088841298, CurrSamplesPerSec=5.6672659015798565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:24:38,915] [INFO] [timer.py:197:stop] 0/8738, RunningAvgSamplesPerSec=6.324181351216188, CurrSamplesPerSec=5.707630987713289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:24:50,229] [INFO] [logging.py:68:log_dist] [Rank 0] step=4370, skipped=6, lr=[1.4155555555555556e-06], mom=[[0.9, 0.999]] [2022-12-19 15:24:50,231] [INFO] [timer.py:197:stop] 0/8740, RunningAvgSamplesPerSec=6.324177509059371, CurrSamplesPerSec=5.679264717836007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:25:01,536] [INFO] 
[timer.py:197:stop] 0/8742, RunningAvgSamplesPerSec=6.324179692207612, CurrSamplesPerSec=5.714873403532969, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:25:12,861] [INFO] [timer.py:197:stop] 0/8744, RunningAvgSamplesPerSec=6.3241774732944025, CurrSamplesPerSec=5.668343900385806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:25:24,257] [INFO] [timer.py:197:stop] 0/8746, RunningAvgSamplesPerSec=6.324173850931399, CurrSamplesPerSec=5.682090021713555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:25:35,669] [INFO] [timer.py:197:stop] 0/8748, RunningAvgSamplesPerSec=6.3241676617369205, CurrSamplesPerSec=5.66690889291206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:25:47,046] [INFO] [timer.py:197:stop] 0/8750, RunningAvgSamplesPerSec=6.324165182397935, CurrSamplesPerSec=5.688359665363504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:25:58,373] [INFO] [timer.py:197:stop] 0/8752, RunningAvgSamplesPerSec=6.3241633839094655, CurrSamplesPerSec=5.686433111107336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:26:09,687] [INFO] [timer.py:197:stop] 0/8754, RunningAvgSamplesPerSec=6.324163268170435, CurrSamplesPerSec=5.694805771462159, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:26:21,033] [INFO] [timer.py:197:stop] 0/8756, RunningAvgSamplesPerSec=6.324158843600378, CurrSamplesPerSec=5.675925633035851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:26:32,424] [INFO] [timer.py:197:stop] 0/8758, RunningAvgSamplesPerSec=6.32414598239083, CurrSamplesPerSec=5.666834003369758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:26:43,744] [INFO] [logging.py:68:log_dist] [Rank 0] step=4380, skipped=6, lr=[1.3933333333333335e-06], mom=[[0.9, 0.999]] [2022-12-19 15:26:43,746] [INFO] [timer.py:197:stop] 0/8760, RunningAvgSamplesPerSec=6.32414410528829, CurrSamplesPerSec=5.650882549583945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:26:55,040] [INFO] 
[timer.py:197:stop] 0/8762, RunningAvgSamplesPerSec=6.3241462050958255, CurrSamplesPerSec=5.711832127339839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:27:06,317] [INFO] [timer.py:197:stop] 0/8764, RunningAvgSamplesPerSec=6.324146976856106, CurrSamplesPerSec=5.687408998114716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:27:17,877] [INFO] [timer.py:197:stop] 0/8766, RunningAvgSamplesPerSec=6.324146503691765, CurrSamplesPerSec=5.6781510929885535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:27:29,207] [INFO] [timer.py:197:stop] 0/8768, RunningAvgSamplesPerSec=6.3241431389746605, CurrSamplesPerSec=5.672514713307682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:27:40,634] [INFO] [timer.py:197:stop] 0/8770, RunningAvgSamplesPerSec=6.324138651423073, CurrSamplesPerSec=5.657766921433482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:27:51,989] [INFO] [timer.py:197:stop] 0/8772, RunningAvgSamplesPerSec=6.324130481701845, CurrSamplesPerSec=5.624173712891171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:28:03,324] [INFO] [timer.py:197:stop] 0/8774, RunningAvgSamplesPerSec=6.324126831903174, CurrSamplesPerSec=5.663024905622038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:28:14,814] [INFO] [timer.py:197:stop] 0/8776, RunningAvgSamplesPerSec=6.324098554803283, CurrSamplesPerSec=5.496910710098048, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:28:26,332] [INFO] [timer.py:197:stop] 0/8778, RunningAvgSamplesPerSec=6.324100006635581, CurrSamplesPerSec=5.70339676128947, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:28:37,636] [INFO] [logging.py:68:log_dist] [Rank 0] step=4390, skipped=6, lr=[1.371111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 15:28:37,638] [INFO] [timer.py:197:stop] 0/8780, RunningAvgSamplesPerSec=6.3241024277202476, CurrSamplesPerSec=5.7128695183967775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:28:48,985] [INFO] 
[timer.py:197:stop] 0/8782, RunningAvgSamplesPerSec=6.324095449105546, CurrSamplesPerSec=5.675232994203316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.3688888888888891e-06, 'epoch': 32.89} [2022-12-19 15:29:00,564] [INFO] [timer.py:197:stop] 0/8784, RunningAvgSamplesPerSec=6.324093823946785, CurrSamplesPerSec=5.671800376630485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:29:12,023] [INFO] [timer.py:197:stop] 0/8786, RunningAvgSamplesPerSec=6.324071451920683, CurrSamplesPerSec=5.552195868173035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:29:23,322] [INFO] [timer.py:197:stop] 0/8788, RunningAvgSamplesPerSec=6.324070036161671, CurrSamplesPerSec=5.6921005775828375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:29:34,894] [INFO] [timer.py:197:stop] 0/8790, RunningAvgSamplesPerSec=6.32402816019948, CurrSamplesPerSec=5.420469180201747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:29:46,307] [INFO] [timer.py:197:stop] 0/8792, RunningAvgSamplesPerSec=6.324025519636047, CurrSamplesPerSec=5.670112332817148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:29:57,631] [INFO] [timer.py:197:stop] 0/8794, RunningAvgSamplesPerSec=6.324022170285646, CurrSamplesPerSec=5.685706353162136, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:30:09,044] [INFO] [timer.py:197:stop] 0/8796, RunningAvgSamplesPerSec=6.324017826859642, CurrSamplesPerSec=5.680973845956995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:30:20,392] [INFO] [timer.py:197:stop] 0/8798, RunningAvgSamplesPerSec=6.324012015405575, CurrSamplesPerSec=5.66020491874314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:30:31,740] [INFO] [logging.py:68:log_dist] [Rank 0] step=4400, skipped=6, lr=[1.3488888888888891e-06], mom=[[0.9, 0.999]] [2022-12-19 15:30:31,741] [INFO] [timer.py:197:stop] 0/8800, RunningAvgSamplesPerSec=6.324006622369081, CurrSamplesPerSec=5.642712703326814, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:30:43,087] [INFO] [timer.py:197:stop] 0/8802, RunningAvgSamplesPerSec=6.324002026326177, CurrSamplesPerSec=5.673635481729693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:30:54,625] [INFO] [timer.py:197:stop] 0/8804, RunningAvgSamplesPerSec=6.324002990270675, CurrSamplesPerSec=5.697042920657041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:31:06,208] [INFO] [timer.py:197:stop] 0/8806, RunningAvgSamplesPerSec=6.323961863992103, CurrSamplesPerSec=5.706127988636187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:31:17,524] [INFO] [timer.py:197:stop] 0/8808, RunningAvgSamplesPerSec=6.323962634838845, CurrSamplesPerSec=5.707496039725049, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:31:28,970] [INFO] [timer.py:197:stop] 0/8810, RunningAvgSamplesPerSec=6.3239409422629915, CurrSamplesPerSec=5.546073206678359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:31:39,348] [INFO] [timer.py:197:stop] 0/8812, RunningAvgSamplesPerSec=6.3240586297887695, CurrSamplesPerSec=5.690613707590018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:31:50,897] [INFO] [timer.py:197:stop] 0/8814, RunningAvgSamplesPerSec=6.324053543863657, CurrSamplesPerSec=5.665197694560647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:32:02,256] [INFO] [timer.py:197:stop] 0/8816, RunningAvgSamplesPerSec=6.324045785711556, CurrSamplesPerSec=5.677918332113927, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:32:13,658] [INFO] [timer.py:197:stop] 0/8818, RunningAvgSamplesPerSec=6.324044545105747, CurrSamplesPerSec=5.676413413648627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:32:24,997] [INFO] [logging.py:68:log_dist] [Rank 0] step=4410, skipped=6, lr=[1.3266666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 15:32:24,999] [INFO] [timer.py:197:stop] 0/8820, RunningAvgSamplesPerSec=6.324031898738353, CurrSamplesPerSec=5.595549099335644, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:32:36,491] [INFO] [timer.py:197:stop] 0/8822, RunningAvgSamplesPerSec=6.324031886886927, CurrSamplesPerSec=5.6928062739841945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:32:47,809] [INFO] [timer.py:197:stop] 0/8824, RunningAvgSamplesPerSec=6.32403075458222, CurrSamplesPerSec=5.681123894491168, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:32:59,357] [INFO] [timer.py:197:stop] 0/8826, RunningAvgSamplesPerSec=6.324029073005493, CurrSamplesPerSec=5.700459926430459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:33:10,639] [INFO] [timer.py:197:stop] 0/8828, RunningAvgSamplesPerSec=6.324030600826837, CurrSamplesPerSec=5.713928206099285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:33:22,067] [INFO] [timer.py:197:stop] 0/8830, RunningAvgSamplesPerSec=6.324017900071586, CurrSamplesPerSec=5.597418054426825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:33:33,588] [INFO] [timer.py:197:stop] 0/8832, RunningAvgSamplesPerSec=6.324018881530581, CurrSamplesPerSec=5.690499105327486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.3133333333333334e-06, 'epoch': 33.08} [2022-12-19 15:33:44,892] [INFO] [timer.py:197:stop] 0/8834, RunningAvgSamplesPerSec=6.324018803615403, CurrSamplesPerSec=5.687321998045718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:33:56,531] [INFO] [timer.py:197:stop] 0/8836, RunningAvgSamplesPerSec=6.324003289480892, CurrSamplesPerSec=5.662013418549301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:34:07,832] [INFO] [timer.py:197:stop] 0/8838, RunningAvgSamplesPerSec=6.324003798764334, CurrSamplesPerSec=5.681158041258299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:34:19,178] [INFO] [logging.py:68:log_dist] [Rank 0] step=4420, skipped=6, lr=[1.3044444444444446e-06], mom=[[0.9, 0.999]] [2022-12-19 15:34:19,178] [INFO] [timer.py:197:stop] 0/8840, 
RunningAvgSamplesPerSec=6.323993851867425, CurrSamplesPerSec=5.614067681320683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:34:30,712] [INFO] [timer.py:197:stop] 0/8842, RunningAvgSamplesPerSec=6.323991310173968, CurrSamplesPerSec=5.668320679838351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:34:42,019] [INFO] [timer.py:197:stop] 0/8844, RunningAvgSamplesPerSec=6.323988624455495, CurrSamplesPerSec=5.695025178488089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:34:53,389] [INFO] [timer.py:197:stop] 0/8846, RunningAvgSamplesPerSec=6.323984992307161, CurrSamplesPerSec=5.681078205796923, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:35:04,714] [INFO] [timer.py:197:stop] 0/8848, RunningAvgSamplesPerSec=6.323983296820101, CurrSamplesPerSec=5.681967343586617, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:35:16,425] [INFO] [timer.py:197:stop] 0/8850, RunningAvgSamplesPerSec=6.323923938454901, CurrSamplesPerSec=5.298612712701134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:35:27,738] [INFO] [timer.py:197:stop] 0/8852, RunningAvgSamplesPerSec=6.3239233248463265, CurrSamplesPerSec=5.681720800627362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:35:39,080] [INFO] [timer.py:197:stop] 0/8854, RunningAvgSamplesPerSec=6.323920956822353, CurrSamplesPerSec=5.675144206578408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:35:50,550] [INFO] [timer.py:197:stop] 0/8856, RunningAvgSamplesPerSec=6.3239191118272595, CurrSamplesPerSec=5.678839879094767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:36:01,884] [INFO] [timer.py:197:stop] 0/8858, RunningAvgSamplesPerSec=6.323915851030022, CurrSamplesPerSec=5.677623625134794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:36:13,362] [INFO] [logging.py:68:log_dist] [Rank 0] step=4430, skipped=6, lr=[1.2822222222222222e-06], mom=[[0.9, 0.999]] [2022-12-19 15:36:13,363] [INFO] [timer.py:197:stop] 0/8860, 
RunningAvgSamplesPerSec=6.323891959897214, CurrSamplesPerSec=5.553590824263321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:36:24,841] [INFO] [timer.py:197:stop] 0/8862, RunningAvgSamplesPerSec=6.3238861130622785, CurrSamplesPerSec=5.66753560215946, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:36:36,181] [INFO] [timer.py:197:stop] 0/8864, RunningAvgSamplesPerSec=6.323882984775831, CurrSamplesPerSec=5.669771731007823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:36:47,653] [INFO] [timer.py:197:stop] 0/8866, RunningAvgSamplesPerSec=6.323862659626423, CurrSamplesPerSec=5.659406098903907, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:36:58,970] [INFO] [timer.py:197:stop] 0/8868, RunningAvgSamplesPerSec=6.323861408719546, CurrSamplesPerSec=5.678617873276337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:37:10,325] [INFO] [timer.py:197:stop] 0/8870, RunningAvgSamplesPerSec=6.323853513033638, CurrSamplesPerSec=5.649505830344935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:37:21,867] [INFO] [timer.py:197:stop] 0/8872, RunningAvgSamplesPerSec=6.323855601003283, CurrSamplesPerSec=5.699714087955356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:37:33,173] [INFO] [timer.py:197:stop] 0/8874, RunningAvgSamplesPerSec=6.323853278587702, CurrSamplesPerSec=5.698003100968439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:37:44,543] [INFO] [timer.py:197:stop] 0/8876, RunningAvgSamplesPerSec=6.323843568980673, CurrSamplesPerSec=5.681931984404098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:37:56,097] [INFO] [timer.py:197:stop] 0/8878, RunningAvgSamplesPerSec=6.323840336054217, CurrSamplesPerSec=5.667914470727885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:38:07,472] [INFO] [logging.py:68:log_dist] [Rank 0] step=4440, skipped=6, lr=[1.26e-06], mom=[[0.9, 0.999]] [2022-12-19 15:38:07,474] [INFO] [timer.py:197:stop] 0/8880, 
RunningAvgSamplesPerSec=6.323832799841172, CurrSamplesPerSec=5.654717254506145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:38:18,754] [INFO] [timer.py:197:stop] 0/8882, RunningAvgSamplesPerSec=6.323833590079767, CurrSamplesPerSec=5.695831427016649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.2577777777777779e-06, 'epoch': 33.27} [2022-12-19 15:38:30,384] [INFO] [timer.py:197:stop] 0/8884, RunningAvgSamplesPerSec=6.323787383518383, CurrSamplesPerSec=5.362698821668626, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:38:41,759] [INFO] [timer.py:197:stop] 0/8886, RunningAvgSamplesPerSec=6.323782747255144, CurrSamplesPerSec=5.658938654034368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:38:53,043] [INFO] [timer.py:197:stop] 0/8888, RunningAvgSamplesPerSec=6.323784782799978, CurrSamplesPerSec=5.697492979998455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:39:04,464] [INFO] [timer.py:197:stop] 0/8890, RunningAvgSamplesPerSec=6.323774099837281, CurrSamplesPerSec=5.699743375560503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:39:15,786] [INFO] [timer.py:197:stop] 0/8892, RunningAvgSamplesPerSec=6.323771104644672, CurrSamplesPerSec=5.6639542919338055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:39:27,159] [INFO] [timer.py:197:stop] 0/8894, RunningAvgSamplesPerSec=6.32375754991367, CurrSamplesPerSec=5.589244149996165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:39:38,599] [INFO] [timer.py:197:stop] 0/8896, RunningAvgSamplesPerSec=6.32375304902901, CurrSamplesPerSec=5.678653431482366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:39:50,019] [INFO] [timer.py:197:stop] 0/8898, RunningAvgSamplesPerSec=6.323752471183498, CurrSamplesPerSec=5.683198213594311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:40:01,410] [INFO] [logging.py:68:log_dist] [Rank 0] step=4450, skipped=6, lr=[1.2377777777777778e-06], mom=[[0.9, 0.999]] 
[2022-12-19 15:40:01,412] [INFO] [timer.py:197:stop] 0/8900, RunningAvgSamplesPerSec=6.323738950654541, CurrSamplesPerSec=5.690224078729411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:40:13,020] [INFO] [timer.py:197:stop] 0/8902, RunningAvgSamplesPerSec=6.32373249003248, CurrSamplesPerSec=5.661313902603506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:40:24,449] [INFO] [timer.py:197:stop] 0/8904, RunningAvgSamplesPerSec=6.323717447624283, CurrSamplesPerSec=5.577436887087612, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:40:35,773] [INFO] [timer.py:197:stop] 0/8906, RunningAvgSamplesPerSec=6.323711371123048, CurrSamplesPerSec=5.677550853793822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:40:47,344] [INFO] [timer.py:197:stop] 0/8908, RunningAvgSamplesPerSec=6.323674731142299, CurrSamplesPerSec=5.423638647494091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:40:58,864] [INFO] [timer.py:197:stop] 0/8910, RunningAvgSamplesPerSec=6.323662704073645, CurrSamplesPerSec=5.618237521269672, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:41:10,193] [INFO] [timer.py:197:stop] 0/8912, RunningAvgSamplesPerSec=6.323659142428546, CurrSamplesPerSec=5.672600062204396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:41:21,758] [INFO] [timer.py:197:stop] 0/8914, RunningAvgSamplesPerSec=6.323643892335788, CurrSamplesPerSec=5.60769330721212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:41:33,143] [INFO] [timer.py:197:stop] 0/8916, RunningAvgSamplesPerSec=6.323642155872206, CurrSamplesPerSec=5.675269229951197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:41:44,518] [INFO] [timer.py:197:stop] 0/8918, RunningAvgSamplesPerSec=6.3236305913569275, CurrSamplesPerSec=5.591558212264175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:41:55,898] [INFO] [logging.py:68:log_dist] [Rank 0] step=4460, skipped=6, lr=[1.2155555555555557e-06], mom=[[0.9, 0.999]] [2022-12-19 
15:41:55,900] [INFO] [timer.py:197:stop] 0/8920, RunningAvgSamplesPerSec=6.323629865123412, CurrSamplesPerSec=5.685073933473597, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:42:07,248] [INFO] [timer.py:197:stop] 0/8922, RunningAvgSamplesPerSec=6.323630078248506, CurrSamplesPerSec=5.694695107812603, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:42:18,606] [INFO] [timer.py:197:stop] 0/8924, RunningAvgSamplesPerSec=6.323625981293052, CurrSamplesPerSec=5.695619933869941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:42:30,138] [INFO] [timer.py:197:stop] 0/8926, RunningAvgSamplesPerSec=6.323628595405352, CurrSamplesPerSec=5.70955760474091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:42:41,595] [INFO] [timer.py:197:stop] 0/8928, RunningAvgSamplesPerSec=6.323609476074532, CurrSamplesPerSec=5.554583709970171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:42:52,936] [INFO] [timer.py:197:stop] 0/8930, RunningAvgSamplesPerSec=6.323613318875628, CurrSamplesPerSec=5.713063812793581, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:43:04,685] [INFO] [timer.py:197:stop] 0/8932, RunningAvgSamplesPerSec=6.323560565763333, CurrSamplesPerSec=5.327803100244816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.2022222222222223e-06, 'epoch': 33.46} [2022-12-19 15:43:16,182] [INFO] [timer.py:197:stop] 0/8934, RunningAvgSamplesPerSec=6.323557436205139, CurrSamplesPerSec=5.686220869910774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:43:27,449] [INFO] [timer.py:197:stop] 0/8936, RunningAvgSamplesPerSec=6.323558129473465, CurrSamplesPerSec=5.71203096992881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:43:38,803] [INFO] [timer.py:197:stop] 0/8938, RunningAvgSamplesPerSec=6.323552141400391, CurrSamplesPerSec=5.668801647333834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:43:50,258] [INFO] [logging.py:68:log_dist] [Rank 0] step=4470, skipped=6, 
lr=[1.1933333333333335e-06], mom=[[0.9, 0.999]] [2022-12-19 15:43:50,260] [INFO] [timer.py:197:stop] 0/8940, RunningAvgSamplesPerSec=6.323552371249255, CurrSamplesPerSec=5.6848663686473335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:44:01,606] [INFO] [timer.py:197:stop] 0/8942, RunningAvgSamplesPerSec=6.323545519130629, CurrSamplesPerSec=5.648455664953627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:44:12,858] [INFO] [timer.py:197:stop] 0/8944, RunningAvgSamplesPerSec=6.323547115655917, CurrSamplesPerSec=5.690895528153259, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:44:24,191] [INFO] [timer.py:197:stop] 0/8946, RunningAvgSamplesPerSec=6.3235445771816705, CurrSamplesPerSec=5.660944511386886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:44:35,835] [INFO] [timer.py:197:stop] 0/8948, RunningAvgSamplesPerSec=6.323541190187725, CurrSamplesPerSec=5.686343731782379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:44:47,143] [INFO] [timer.py:197:stop] 0/8950, RunningAvgSamplesPerSec=6.323539834700049, CurrSamplesPerSec=5.692744219764854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:44:58,580] [INFO] [timer.py:197:stop] 0/8952, RunningAvgSamplesPerSec=6.323533330072924, CurrSamplesPerSec=5.648839832309697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:45:09,813] [INFO] [timer.py:197:stop] 0/8954, RunningAvgSamplesPerSec=6.32353588863964, CurrSamplesPerSec=5.707480263848071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:45:21,251] [INFO] [timer.py:197:stop] 0/8956, RunningAvgSamplesPerSec=6.323521181928573, CurrSamplesPerSec=5.617655054319776, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:45:32,815] [INFO] [timer.py:197:stop] 0/8958, RunningAvgSamplesPerSec=6.323517515155234, CurrSamplesPerSec=5.687101738302961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:45:44,147] [INFO] [logging.py:68:log_dist] [Rank 0] step=4480, skipped=6, 
lr=[1.171111111111111e-06], mom=[[0.9, 0.999]] [2022-12-19 15:45:44,148] [INFO] [timer.py:197:stop] 0/8960, RunningAvgSamplesPerSec=6.323516425689598, CurrSamplesPerSec=5.704852977261512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:45:55,666] [INFO] [timer.py:197:stop] 0/8962, RunningAvgSamplesPerSec=6.323515510312004, CurrSamplesPerSec=5.694460505434988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:46:07,017] [INFO] [timer.py:197:stop] 0/8964, RunningAvgSamplesPerSec=6.323509697666315, CurrSamplesPerSec=5.649801668483457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:46:18,350] [INFO] [timer.py:197:stop] 0/8966, RunningAvgSamplesPerSec=6.323503924117797, CurrSamplesPerSec=5.658061001691882, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:46:29,668] [INFO] [timer.py:197:stop] 0/8968, RunningAvgSamplesPerSec=6.323501613071905, CurrSamplesPerSec=5.676126784371111, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:46:41,086] [INFO] [timer.py:197:stop] 0/8970, RunningAvgSamplesPerSec=6.323499522049706, CurrSamplesPerSec=5.6753018665175325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:46:52,754] [INFO] [timer.py:197:stop] 0/8972, RunningAvgSamplesPerSec=6.323450511179718, CurrSamplesPerSec=5.670729688065295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:47:04,066] [INFO] [timer.py:197:stop] 0/8974, RunningAvgSamplesPerSec=6.323445382001952, CurrSamplesPerSec=5.668404226820925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:47:15,751] [INFO] [timer.py:197:stop] 0/8976, RunningAvgSamplesPerSec=6.323438510634855, CurrSamplesPerSec=5.656272426401999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:47:27,070] [INFO] [timer.py:197:stop] 0/8978, RunningAvgSamplesPerSec=6.3234380728983135, CurrSamplesPerSec=5.684726234841532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:47:38,389] [INFO] [logging.py:68:log_dist] [Rank 0] step=4490, skipped=6, 
lr=[1.148888888888889e-06], mom=[[0.9, 0.999]] [2022-12-19 15:47:38,390] [INFO] [timer.py:197:stop] 0/8980, RunningAvgSamplesPerSec=6.323433103975106, CurrSamplesPerSec=5.653447728500855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:47:49,917] [INFO] [timer.py:197:stop] 0/8982, RunningAvgSamplesPerSec=6.323432579955181, CurrSamplesPerSec=5.675697614754256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.1466666666666668e-06, 'epoch': 33.64} [2022-12-19 15:48:01,247] [INFO] [timer.py:197:stop] 0/8984, RunningAvgSamplesPerSec=6.323430885063757, CurrSamplesPerSec=5.692198104400633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:48:12,649] [INFO] [timer.py:197:stop] 0/8986, RunningAvgSamplesPerSec=6.323418745807003, CurrSamplesPerSec=5.692988339695532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:48:24,148] [INFO] [timer.py:197:stop] 0/8988, RunningAvgSamplesPerSec=6.32342035477588, CurrSamplesPerSec=5.717065712994098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:48:35,515] [INFO] [timer.py:197:stop] 0/8990, RunningAvgSamplesPerSec=6.323412011184888, CurrSamplesPerSec=5.625435546933448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:48:46,846] [INFO] [timer.py:197:stop] 0/8992, RunningAvgSamplesPerSec=6.3234134608403245, CurrSamplesPerSec=5.704590381630891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:48:58,296] [INFO] [timer.py:197:stop] 0/8994, RunningAvgSamplesPerSec=6.323395213513427, CurrSamplesPerSec=5.554774973175471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:49:09,927] [INFO] [timer.py:197:stop] 0/8996, RunningAvgSamplesPerSec=6.323390327651662, CurrSamplesPerSec=5.668797098229558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:49:21,252] [INFO] [timer.py:197:stop] 0/8998, RunningAvgSamplesPerSec=6.323390787929262, CurrSamplesPerSec=5.6868186066939925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:49:32,709] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=4500, skipped=6, lr=[1.1266666666666667e-06], mom=[[0.9, 0.999]] [2022-12-19 15:49:32,710] [INFO] [timer.py:197:stop] 0/9000, RunningAvgSamplesPerSec=6.323389771877217, CurrSamplesPerSec=5.68070575012107, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:49:44,163] [INFO] [timer.py:197:stop] 0/9002, RunningAvgSamplesPerSec=6.323390234159069, CurrSamplesPerSec=5.695671174196365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:49:55,574] [INFO] [timer.py:197:stop] 0/9004, RunningAvgSamplesPerSec=6.323381222952024, CurrSamplesPerSec=5.634086776815446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:50:06,950] [INFO] [timer.py:197:stop] 0/9006, RunningAvgSamplesPerSec=6.32337742542854, CurrSamplesPerSec=5.664655655951809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:50:18,290] [INFO] [timer.py:197:stop] 0/9008, RunningAvgSamplesPerSec=6.323377135338811, CurrSamplesPerSec=5.685753320632309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:50:29,952] [INFO] [timer.py:197:stop] 0/9010, RunningAvgSamplesPerSec=6.323378567104373, CurrSamplesPerSec=5.706249043681178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:50:41,323] [INFO] [timer.py:197:stop] 0/9012, RunningAvgSamplesPerSec=6.323376248276379, CurrSamplesPerSec=5.68428204189203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:50:52,851] [INFO] [timer.py:197:stop] 0/9014, RunningAvgSamplesPerSec=6.323369055960813, CurrSamplesPerSec=5.67895281079285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:51:04,338] [INFO] [timer.py:197:stop] 0/9016, RunningAvgSamplesPerSec=6.3233642715564695, CurrSamplesPerSec=5.671731589273814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:51:15,949] [INFO] [timer.py:197:stop] 0/9018, RunningAvgSamplesPerSec=6.323326568951416, CurrSamplesPerSec=5.443351181949506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:51:27,304] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=4510, skipped=6, lr=[1.1044444444444446e-06], mom=[[0.9, 0.999]] [2022-12-19 15:51:27,305] [INFO] [timer.py:197:stop] 0/9020, RunningAvgSamplesPerSec=6.323321913957894, CurrSamplesPerSec=5.653560128811695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:51:38,788] [INFO] [timer.py:197:stop] 0/9022, RunningAvgSamplesPerSec=6.323321157734947, CurrSamplesPerSec=5.692504707995691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:51:50,213] [INFO] [timer.py:197:stop] 0/9024, RunningAvgSamplesPerSec=6.323310246053406, CurrSamplesPerSec=5.695025178488089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:52:01,642] [INFO] [timer.py:197:stop] 0/9026, RunningAvgSamplesPerSec=6.32330556118568, CurrSamplesPerSec=5.668698935126467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:52:13,007] [INFO] [timer.py:197:stop] 0/9028, RunningAvgSamplesPerSec=6.323301919847579, CurrSamplesPerSec=5.673523241062834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:52:24,338] [INFO] [timer.py:197:stop] 0/9030, RunningAvgSamplesPerSec=6.323299646206129, CurrSamplesPerSec=5.670459204280251, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:52:35,804] [INFO] [timer.py:197:stop] 0/9032, RunningAvgSamplesPerSec=6.323296964391675, CurrSamplesPerSec=5.678262796159818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.0911111111111112e-06, 'epoch': 33.83} [2022-12-19 15:52:47,121] [INFO] [timer.py:197:stop] 0/9034, RunningAvgSamplesPerSec=6.323295050561048, CurrSamplesPerSec=5.707203593413499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:52:58,455] [INFO] [timer.py:197:stop] 0/9036, RunningAvgSamplesPerSec=6.323291064945384, CurrSamplesPerSec=5.67140684919959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:53:09,933] [INFO] [timer.py:197:stop] 0/9038, RunningAvgSamplesPerSec=6.3232843100616645, CurrSamplesPerSec=5.662207374367384, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:53:21,312] [INFO] [logging.py:68:log_dist] [Rank 0] step=4520, skipped=6, lr=[1.0822222222222222e-06], mom=[[0.9, 0.999]] [2022-12-19 15:53:21,314] [INFO] [timer.py:197:stop] 0/9040, RunningAvgSamplesPerSec=6.323275990682158, CurrSamplesPerSec=5.644575320735228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:53:32,757] [INFO] [timer.py:197:stop] 0/9042, RunningAvgSamplesPerSec=6.323265224467037, CurrSamplesPerSec=5.5891815399975755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:53:44,103] [INFO] [timer.py:197:stop] 0/9044, RunningAvgSamplesPerSec=6.323261622558663, CurrSamplesPerSec=5.665027444530056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:53:55,681] [INFO] [timer.py:197:stop] 0/9046, RunningAvgSamplesPerSec=6.32325577952559, CurrSamplesPerSec=5.664355869709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:54:07,346] [INFO] [timer.py:197:stop] 0/9048, RunningAvgSamplesPerSec=6.323209961565842, CurrSamplesPerSec=5.69359281375146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:54:18,678] [INFO] [timer.py:197:stop] 0/9050, RunningAvgSamplesPerSec=6.323208264984011, CurrSamplesPerSec=5.674456316621927, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:54:30,169] [INFO] [timer.py:197:stop] 0/9052, RunningAvgSamplesPerSec=6.3232057053201425, CurrSamplesPerSec=5.689054065366736, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:54:41,546] [INFO] [timer.py:197:stop] 0/9054, RunningAvgSamplesPerSec=6.323201937205017, CurrSamplesPerSec=5.6553491335501995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:54:53,025] [INFO] [timer.py:197:stop] 0/9056, RunningAvgSamplesPerSec=6.3231865052203515, CurrSamplesPerSec=5.5911831938735395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:55:04,408] [INFO] [timer.py:197:stop] 0/9058, RunningAvgSamplesPerSec=6.323181409221067, CurrSamplesPerSec=5.665261062704806, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:55:16,023] [INFO] [logging.py:68:log_dist] [Rank 0] step=4530, skipped=6, lr=[1.06e-06], mom=[[0.9, 0.999]] [2022-12-19 15:55:16,025] [INFO] [timer.py:197:stop] 0/9060, RunningAvgSamplesPerSec=6.323181327708091, CurrSamplesPerSec=5.681869204899072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:55:27,452] [INFO] [timer.py:197:stop] 0/9062, RunningAvgSamplesPerSec=6.323171284759019, CurrSamplesPerSec=5.681462254167408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:55:38,821] [INFO] [timer.py:197:stop] 0/9064, RunningAvgSamplesPerSec=6.323168170542251, CurrSamplesPerSec=5.683027601850898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:55:50,236] [INFO] [timer.py:197:stop] 0/9066, RunningAvgSamplesPerSec=6.32315948284428, CurrSamplesPerSec=5.651432663694737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:56:01,589] [INFO] [timer.py:197:stop] 0/9068, RunningAvgSamplesPerSec=6.323159765610076, CurrSamplesPerSec=5.699398720666729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:56:13,205] [INFO] [timer.py:197:stop] 0/9070, RunningAvgSamplesPerSec=6.323122888215652, CurrSamplesPerSec=5.439301903103003, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:56:24,744] [INFO] [timer.py:197:stop] 0/9072, RunningAvgSamplesPerSec=6.323113823077329, CurrSamplesPerSec=5.620993471923475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:56:36,075] [INFO] [timer.py:197:stop] 0/9074, RunningAvgSamplesPerSec=6.323108769214801, CurrSamplesPerSec=5.674531167854095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:56:47,380] [INFO] [timer.py:197:stop] 0/9076, RunningAvgSamplesPerSec=6.323107867811444, CurrSamplesPerSec=5.6964558464146355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:56:57,895] [INFO] [timer.py:197:stop] 0/9078, RunningAvgSamplesPerSec=6.323217859256297, CurrSamplesPerSec=6.641191412896074, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 15:57:09,277] [INFO] [logging.py:68:log_dist] [Rank 0] step=4540, skipped=6, lr=[1.0377777777777778e-06], mom=[[0.9, 0.999]] [2022-12-19 15:57:09,278] [INFO] [timer.py:197:stop] 0/9080, RunningAvgSamplesPerSec=6.32321317792037, CurrSamplesPerSec=5.655213548809802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:57:20,856] [INFO] [timer.py:197:stop] 0/9082, RunningAvgSamplesPerSec=6.323212513202633, CurrSamplesPerSec=5.6814961645018665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:57:32,207] [INFO] [timer.py:197:stop] 0/9084, RunningAvgSamplesPerSec=6.323210587883521, CurrSamplesPerSec=5.6602774848087805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.0333333333333333e-06, 'epoch': 34.02} [2022-12-19 15:57:43,565] [INFO] [timer.py:197:stop] 0/9086, RunningAvgSamplesPerSec=6.323206699639693, CurrSamplesPerSec=5.684198507667234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:57:55,211] [INFO] [timer.py:197:stop] 0/9088, RunningAvgSamplesPerSec=6.323198962633191, CurrSamplesPerSec=5.648550988736207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:58:06,753] [INFO] [timer.py:197:stop] 0/9090, RunningAvgSamplesPerSec=6.3231689617641935, CurrSamplesPerSec=5.454443813279142, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:58:18,078] [INFO] [timer.py:197:stop] 0/9092, RunningAvgSamplesPerSec=6.323167011076989, CurrSamplesPerSec=5.674251205576841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:58:29,490] [INFO] [timer.py:197:stop] 0/9094, RunningAvgSamplesPerSec=6.323160398512082, CurrSamplesPerSec=5.651149979225715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:58:41,053] [INFO] [timer.py:197:stop] 0/9096, RunningAvgSamplesPerSec=6.323162183947367, CurrSamplesPerSec=5.691885016548178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:58:52,440] [INFO] [timer.py:197:stop] 0/9098, 
RunningAvgSamplesPerSec=6.323152657462647, CurrSamplesPerSec=5.624471853488032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:59:03,807] [INFO] [logging.py:68:log_dist] [Rank 0] step=4550, skipped=6, lr=[1.0155555555555557e-06], mom=[[0.9, 0.999]] [2022-12-19 15:59:03,809] [INFO] [timer.py:197:stop] 0/9100, RunningAvgSamplesPerSec=6.323144060527465, CurrSamplesPerSec=5.684288782512897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:59:15,393] [INFO] [timer.py:197:stop] 0/9102, RunningAvgSamplesPerSec=6.323141524867252, CurrSamplesPerSec=5.684150843705397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:59:26,759] [INFO] [timer.py:197:stop] 0/9104, RunningAvgSamplesPerSec=6.323134906663921, CurrSamplesPerSec=5.617795663056977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:59:38,077] [INFO] [timer.py:197:stop] 0/9106, RunningAvgSamplesPerSec=6.323135938467985, CurrSamplesPerSec=5.703576838938539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 15:59:49,802] [INFO] [timer.py:197:stop] 0/9108, RunningAvgSamplesPerSec=6.323075340712299, CurrSamplesPerSec=5.2892895905121255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:00:01,174] [INFO] [timer.py:197:stop] 0/9110, RunningAvgSamplesPerSec=6.3230737521819655, CurrSamplesPerSec=5.686994265453489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:00:12,453] [INFO] [timer.py:197:stop] 0/9112, RunningAvgSamplesPerSec=6.323076807934703, CurrSamplesPerSec=5.692709209226383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:00:23,924] [INFO] [timer.py:197:stop] 0/9114, RunningAvgSamplesPerSec=6.323069394313172, CurrSamplesPerSec=5.686738371118032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:00:35,328] [INFO] [timer.py:197:stop] 0/9116, RunningAvgSamplesPerSec=6.3230646352619475, CurrSamplesPerSec=5.647732402525945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:00:46,909] [INFO] [timer.py:197:stop] 0/9118, 
RunningAvgSamplesPerSec=6.323028937779285, CurrSamplesPerSec=5.455993892045183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:00:58,260] [INFO] [logging.py:68:log_dist] [Rank 0] step=4560, skipped=6, lr=[9.933333333333333e-07], mom=[[0.9, 0.999]] [2022-12-19 16:00:58,261] [INFO] [timer.py:197:stop] 0/9120, RunningAvgSamplesPerSec=6.323019979101601, CurrSamplesPerSec=5.615687505758248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:01:09,646] [INFO] [timer.py:197:stop] 0/9122, RunningAvgSamplesPerSec=6.323011556998278, CurrSamplesPerSec=5.6334426155196375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:01:21,230] [INFO] [timer.py:197:stop] 0/9124, RunningAvgSamplesPerSec=6.323009188568488, CurrSamplesPerSec=5.67896434449565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:01:32,529] [INFO] [timer.py:197:stop] 0/9126, RunningAvgSamplesPerSec=6.32300937292142, CurrSamplesPerSec=5.6945059264255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:01:44,060] [INFO] [timer.py:197:stop] 0/9128, RunningAvgSamplesPerSec=6.323008459525045, CurrSamplesPerSec=5.692548166295795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:01:55,343] [INFO] [timer.py:197:stop] 0/9130, RunningAvgSamplesPerSec=6.323009711982276, CurrSamplesPerSec=5.722053963096774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:02:06,848] [INFO] [timer.py:197:stop] 0/9132, RunningAvgSamplesPerSec=6.322981986396933, CurrSamplesPerSec=5.486095769511735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:02:18,197] [INFO] [timer.py:197:stop] 0/9134, RunningAvgSamplesPerSec=6.322977955640329, CurrSamplesPerSec=5.657697520089129, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 9.77777777777778e-07, 'epoch': 34.21} [2022-12-19 16:02:29,490] [INFO] [timer.py:197:stop] 0/9136, RunningAvgSamplesPerSec=6.3229810616526745, CurrSamplesPerSec=5.70981385657702, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 16:02:40,876] [INFO] [timer.py:197:stop] 0/9138, RunningAvgSamplesPerSec=6.322965339746968, CurrSamplesPerSec=5.693481955721675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:02:52,186] [INFO] [logging.py:68:log_dist] [Rank 0] step=4570, skipped=6, lr=[9.711111111111111e-07], mom=[[0.9, 0.999]] [2022-12-19 16:02:52,187] [INFO] [timer.py:197:stop] 0/9140, RunningAvgSamplesPerSec=6.3229651410391305, CurrSamplesPerSec=5.6853202858770775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:03:03,743] [INFO] [timer.py:197:stop] 0/9142, RunningAvgSamplesPerSec=6.322948326799095, CurrSamplesPerSec=5.676845332699347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:03:15,026] [INFO] [timer.py:197:stop] 0/9144, RunningAvgSamplesPerSec=6.322950743281768, CurrSamplesPerSec=5.703764926793433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:03:26,362] [INFO] [timer.py:197:stop] 0/9146, RunningAvgSamplesPerSec=6.322944253924651, CurrSamplesPerSec=5.6533424763297555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:03:37,892] [INFO] [timer.py:197:stop] 0/9148, RunningAvgSamplesPerSec=6.322942555990443, CurrSamplesPerSec=5.685969621868994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:03:49,211] [INFO] [timer.py:197:stop] 0/9150, RunningAvgSamplesPerSec=6.322942546486573, CurrSamplesPerSec=5.69447113581463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:04:00,637] [INFO] [timer.py:197:stop] 0/9152, RunningAvgSamplesPerSec=6.32293242845732, CurrSamplesPerSec=5.702763065643517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:04:12,166] [INFO] [timer.py:197:stop] 0/9154, RunningAvgSamplesPerSec=6.322916868519529, CurrSamplesPerSec=5.643369426925937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:04:23,853] [INFO] [timer.py:197:stop] 0/9156, RunningAvgSamplesPerSec=6.3228607063155895, CurrSamplesPerSec=5.303596651762446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
16:04:35,190] [INFO] [timer.py:197:stop] 0/9158, RunningAvgSamplesPerSec=6.322859695014232, CurrSamplesPerSec=5.695184669976877, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:04:46,590] [INFO] [logging.py:68:log_dist] [Rank 0] step=4580, skipped=6, lr=[9.488888888888889e-07], mom=[[0.9, 0.999]] [2022-12-19 16:04:46,591] [INFO] [timer.py:197:stop] 0/9160, RunningAvgSamplesPerSec=6.322844451828596, CurrSamplesPerSec=5.591267976239167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:04:57,876] [INFO] [timer.py:197:stop] 0/9162, RunningAvgSamplesPerSec=6.322843342742733, CurrSamplesPerSec=5.698048336587061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:05:09,560] [INFO] [timer.py:197:stop] 0/9164, RunningAvgSamplesPerSec=6.322793548254315, CurrSamplesPerSec=5.356051698625808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:05:20,903] [INFO] [timer.py:197:stop] 0/9166, RunningAvgSamplesPerSec=6.322791414683152, CurrSamplesPerSec=5.661439033764716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:05:32,231] [INFO] [timer.py:197:stop] 0/9168, RunningAvgSamplesPerSec=6.322788955201434, CurrSamplesPerSec=5.670039993426664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:05:43,567] [INFO] [timer.py:197:stop] 0/9170, RunningAvgSamplesPerSec=6.322781734017509, CurrSamplesPerSec=5.682551916295465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:05:54,884] [INFO] [timer.py:197:stop] 0/9172, RunningAvgSamplesPerSec=6.3227805039065, CurrSamplesPerSec=5.680817553828689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:06:06,234] [INFO] [timer.py:197:stop] 0/9174, RunningAvgSamplesPerSec=6.322773698762592, CurrSamplesPerSec=5.667331948328022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:06:17,539] [INFO] [timer.py:197:stop] 0/9176, RunningAvgSamplesPerSec=6.3227728760051525, CurrSamplesPerSec=5.680489849179637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:06:28,878] 
[INFO] [timer.py:197:stop] 0/9178, RunningAvgSamplesPerSec=6.322766974751252, CurrSamplesPerSec=5.670391407694905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:06:40,217] [INFO] [logging.py:68:log_dist] [Rank 0] step=4590, skipped=6, lr=[9.266666666666667e-07], mom=[[0.9, 0.999]] [2022-12-19 16:06:40,219] [INFO] [timer.py:197:stop] 0/9180, RunningAvgSamplesPerSec=6.322761336970404, CurrSamplesPerSec=5.662203074710157, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:06:51,604] [INFO] [timer.py:197:stop] 0/9182, RunningAvgSamplesPerSec=6.322751895328947, CurrSamplesPerSec=5.618009411224992, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:07:02,962] [INFO] [timer.py:197:stop] 0/9184, RunningAvgSamplesPerSec=6.322745120595309, CurrSamplesPerSec=5.655626042091599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 9.222222222222222e-07, 'epoch': 34.4} [2022-12-19 16:07:14,321] [INFO] [timer.py:197:stop] 0/9186, RunningAvgSamplesPerSec=6.322738840490264, CurrSamplesPerSec=5.6165506505571345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:07:25,624] [INFO] [timer.py:197:stop] 0/9188, RunningAvgSamplesPerSec=6.322741837169283, CurrSamplesPerSec=5.70487843795727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:07:37,201] [INFO] [timer.py:197:stop] 0/9190, RunningAvgSamplesPerSec=6.322703367473801, CurrSamplesPerSec=5.424172586498367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:07:48,460] [INFO] [timer.py:197:stop] 0/9192, RunningAvgSamplesPerSec=6.322704868617776, CurrSamplesPerSec=5.707758902938432, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:07:59,790] [INFO] [timer.py:197:stop] 0/9194, RunningAvgSamplesPerSec=6.322700625087438, CurrSamplesPerSec=5.666088804759663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:08:11,428] [INFO] [timer.py:197:stop] 0/9196, RunningAvgSamplesPerSec=6.32269824413382, CurrSamplesPerSec=5.6853939790141785, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:08:22,746] [INFO] [timer.py:197:stop] 0/9198, RunningAvgSamplesPerSec=6.322695579811025, CurrSamplesPerSec=5.663583122169084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:08:34,141] [INFO] [logging.py:68:log_dist] [Rank 0] step=4600, skipped=6, lr=[9.044444444444445e-07], mom=[[0.9, 0.999]] [2022-12-19 16:08:34,143] [INFO] [timer.py:197:stop] 0/9200, RunningAvgSamplesPerSec=6.322680844739795, CurrSamplesPerSec=5.688727338692689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:08:45,427] [INFO] [timer.py:197:stop] 0/9202, RunningAvgSamplesPerSec=6.3226802738588175, CurrSamplesPerSec=5.6813275788211826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:08:56,747] [INFO] [timer.py:197:stop] 0/9204, RunningAvgSamplesPerSec=6.322677908800252, CurrSamplesPerSec=5.692617942398909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:09:08,059] [INFO] [timer.py:197:stop] 0/9206, RunningAvgSamplesPerSec=6.322678096818882, CurrSamplesPerSec=5.6798900769049485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:09:19,686] [INFO] [timer.py:197:stop] 0/9208, RunningAvgSamplesPerSec=6.322671463350533, CurrSamplesPerSec=5.671648663180514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:09:31,002] [INFO] [timer.py:197:stop] 0/9210, RunningAvgSamplesPerSec=6.322672781382841, CurrSamplesPerSec=5.690908799501505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:09:42,506] [INFO] [timer.py:197:stop] 0/9212, RunningAvgSamplesPerSec=6.322642808517744, CurrSamplesPerSec=5.4845715719897346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:09:53,814] [INFO] [timer.py:197:stop] 0/9214, RunningAvgSamplesPerSec=6.322640122785086, CurrSamplesPerSec=5.665974709788017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:10:05,196] [INFO] [timer.py:197:stop] 0/9216, RunningAvgSamplesPerSec=6.322629662711259, CurrSamplesPerSec=5.627873127763353, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:10:16,480] [INFO] [timer.py:197:stop] 0/9218, RunningAvgSamplesPerSec=6.322635004738392, CurrSamplesPerSec=5.721895890341347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:10:27,807] [INFO] [logging.py:68:log_dist] [Rank 0] step=4610, skipped=6, lr=[8.822222222222222e-07], mom=[[0.9, 0.999]] [2022-12-19 16:10:27,809] [INFO] [timer.py:197:stop] 0/9220, RunningAvgSamplesPerSec=6.3226336599319914, CurrSamplesPerSec=5.671459811647893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:10:39,128] [INFO] [timer.py:197:stop] 0/9222, RunningAvgSamplesPerSec=6.322632864197676, CurrSamplesPerSec=5.688736983222985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:10:50,561] [INFO] [timer.py:197:stop] 0/9224, RunningAvgSamplesPerSec=6.32262198980247, CurrSamplesPerSec=5.645254322549602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:11:02,480] [INFO] [timer.py:197:stop] 0/9226, RunningAvgSamplesPerSec=6.322615605071156, CurrSamplesPerSec=5.665794128606417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:11:14,413] [INFO] [timer.py:197:stop] 0/9228, RunningAvgSamplesPerSec=6.322608245957112, CurrSamplesPerSec=5.676591551285375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:11:26,000] [INFO] [timer.py:197:stop] 0/9230, RunningAvgSamplesPerSec=6.322605324169335, CurrSamplesPerSec=5.6723938866770744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:11:37,325] [INFO] [timer.py:197:stop] 0/9232, RunningAvgSamplesPerSec=6.322602102616689, CurrSamplesPerSec=5.66642752885074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:11:48,627] [INFO] [timer.py:197:stop] 0/9234, RunningAvgSamplesPerSec=6.322601629408505, CurrSamplesPerSec=5.686336986286977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 8.666666666666668e-07, 'epoch': 34.58} [2022-12-19 16:12:00,158] [INFO] [timer.py:197:stop] 0/9236, 
RunningAvgSamplesPerSec=6.322602399659632, CurrSamplesPerSec=5.684956905501303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:12:11,578] [INFO] [timer.py:197:stop] 0/9238, RunningAvgSamplesPerSec=6.322586555020157, CurrSamplesPerSec=5.59383642315159, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:12:23,140] [INFO] [logging.py:68:log_dist] [Rank 0] step=4620, skipped=6, lr=[8.6e-07], mom=[[0.9, 0.999]] [2022-12-19 16:12:23,142] [INFO] [timer.py:197:stop] 0/9240, RunningAvgSamplesPerSec=6.322581845349933, CurrSamplesPerSec=5.687652901694955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:12:34,456] [INFO] [timer.py:197:stop] 0/9242, RunningAvgSamplesPerSec=6.32257931100032, CurrSamplesPerSec=5.688466466554031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:12:45,773] [INFO] [timer.py:197:stop] 0/9244, RunningAvgSamplesPerSec=6.322576411850607, CurrSamplesPerSec=5.669241268905974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:12:57,095] [INFO] [timer.py:197:stop] 0/9246, RunningAvgSamplesPerSec=6.32257497179128, CurrSamplesPerSec=5.682327214447428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:13:08,496] [INFO] [timer.py:197:stop] 0/9248, RunningAvgSamplesPerSec=6.32257281595039, CurrSamplesPerSec=5.671384082840286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:13:19,859] [INFO] [timer.py:197:stop] 0/9250, RunningAvgSamplesPerSec=6.3225680117257275, CurrSamplesPerSec=5.669789694216079, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:13:31,168] [INFO] [timer.py:197:stop] 0/9252, RunningAvgSamplesPerSec=6.322568258980446, CurrSamplesPerSec=5.6834788187954155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:13:42,544] [INFO] [timer.py:197:stop] 0/9254, RunningAvgSamplesPerSec=6.3225671928430165, CurrSamplesPerSec=5.6757544975999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:13:53,867] [INFO] [timer.py:197:stop] 0/9256, 
RunningAvgSamplesPerSec=6.322565627566117, CurrSamplesPerSec=5.682580065424022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:14:05,169] [INFO] [timer.py:197:stop] 0/9258, RunningAvgSamplesPerSec=6.322564820362171, CurrSamplesPerSec=5.693224511243757, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:14:16,492] [INFO] [logging.py:68:log_dist] [Rank 0] step=4630, skipped=6, lr=[8.37777777777778e-07], mom=[[0.9, 0.999]] [2022-12-19 16:14:16,493] [INFO] [timer.py:197:stop] 0/9260, RunningAvgSamplesPerSec=6.322563200041035, CurrSamplesPerSec=5.673093266099292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:14:27,798] [INFO] [timer.py:197:stop] 0/9262, RunningAvgSamplesPerSec=6.322564474442468, CurrSamplesPerSec=5.675381779694881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:14:39,150] [INFO] [timer.py:197:stop] 0/9264, RunningAvgSamplesPerSec=6.322560424878411, CurrSamplesPerSec=5.658557883132318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:14:50,664] [INFO] [timer.py:197:stop] 0/9266, RunningAvgSamplesPerSec=6.3225589725791025, CurrSamplesPerSec=5.686341804496345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:15:01,952] [INFO] [timer.py:197:stop] 0/9268, RunningAvgSamplesPerSec=6.322556766259363, CurrSamplesPerSec=5.67901504528649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:15:13,428] [INFO] [timer.py:197:stop] 0/9270, RunningAvgSamplesPerSec=6.322556066552901, CurrSamplesPerSec=5.680674013096329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:15:24,732] [INFO] [timer.py:197:stop] 0/9272, RunningAvgSamplesPerSec=6.322554786418958, CurrSamplesPerSec=5.692616252300073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:15:36,182] [INFO] [timer.py:197:stop] 0/9274, RunningAvgSamplesPerSec=6.322558133974026, CurrSamplesPerSec=5.705088679245363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:15:47,546] [INFO] [timer.py:197:stop] 0/9276, 
RunningAvgSamplesPerSec=6.322556559290024, CurrSamplesPerSec=5.6849670188395764, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:15:58,866] [INFO] [timer.py:197:stop] 0/9278, RunningAvgSamplesPerSec=6.322554422527222, CurrSamplesPerSec=5.684907784084378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:16:10,166] [INFO] [logging.py:68:log_dist] [Rank 0] step=4640, skipped=6, lr=[8.155555555555557e-07], mom=[[0.9, 0.999]] [2022-12-19 16:16:10,167] [INFO] [timer.py:197:stop] 0/9280, RunningAvgSamplesPerSec=6.3225570088983645, CurrSamplesPerSec=5.706502087147441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:16:21,461] [INFO] [timer.py:197:stop] 0/9282, RunningAvgSamplesPerSec=6.322556722584169, CurrSamplesPerSec=5.7039450276923525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:16:32,901] [INFO] [timer.py:197:stop] 0/9284, RunningAvgSamplesPerSec=6.3225570787182175, CurrSamplesPerSec=5.708354621648811, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 8.111111111111112e-07, 'epoch': 34.77} [2022-12-19 16:16:44,239] [INFO] [timer.py:197:stop] 0/9286, RunningAvgSamplesPerSec=6.3225526322580405, CurrSamplesPerSec=5.677095775013672, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:16:55,567] [INFO] [timer.py:197:stop] 0/9288, RunningAvgSamplesPerSec=6.3225500992016785, CurrSamplesPerSec=5.6657034835843785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:17:07,010] [INFO] [timer.py:197:stop] 0/9290, RunningAvgSamplesPerSec=6.322547245771178, CurrSamplesPerSec=5.681442292954126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:17:18,511] [INFO] [timer.py:197:stop] 0/9292, RunningAvgSamplesPerSec=6.322547855369982, CurrSamplesPerSec=5.680758405512536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:17:29,855] [INFO] [timer.py:197:stop] 0/9294, RunningAvgSamplesPerSec=6.3225436139436955, CurrSamplesPerSec=5.674741577567657, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 16:17:41,137] [INFO] [timer.py:197:stop] 0/9296, RunningAvgSamplesPerSec=6.322546381994253, CurrSamplesPerSec=5.69664684987753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:17:52,558] [INFO] [timer.py:197:stop] 0/9298, RunningAvgSamplesPerSec=6.322544185608419, CurrSamplesPerSec=5.674994233791987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:18:03,905] [INFO] [logging.py:68:log_dist] [Rank 0] step=4650, skipped=6, lr=[7.933333333333335e-07], mom=[[0.9, 0.999]] [2022-12-19 16:18:03,906] [INFO] [timer.py:197:stop] 0/9300, RunningAvgSamplesPerSec=6.322538525878349, CurrSamplesPerSec=5.670304927477845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:18:15,303] [INFO] [timer.py:197:stop] 0/9302, RunningAvgSamplesPerSec=6.322538625222725, CurrSamplesPerSec=5.7027276894635035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:18:26,605] [INFO] [timer.py:197:stop] 0/9304, RunningAvgSamplesPerSec=6.322538976443515, CurrSamplesPerSec=5.710881131711393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:18:37,911] [INFO] [timer.py:197:stop] 0/9306, RunningAvgSamplesPerSec=6.322537433372484, CurrSamplesPerSec=5.695314202971585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:18:49,509] [INFO] [timer.py:197:stop] 0/9308, RunningAvgSamplesPerSec=6.32253341056151, CurrSamplesPerSec=5.670602229249404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:19:00,839] [INFO] [timer.py:197:stop] 0/9310, RunningAvgSamplesPerSec=6.322529008842687, CurrSamplesPerSec=5.671324891161495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:19:12,232] [INFO] [timer.py:197:stop] 0/9312, RunningAvgSamplesPerSec=6.32252607210859, CurrSamplesPerSec=5.66482827454733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:19:23,803] [INFO] [timer.py:197:stop] 0/9314, RunningAvgSamplesPerSec=6.322522320256179, CurrSamplesPerSec=5.673214122526761, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 16:19:35,137] [INFO] [timer.py:197:stop] 0/9316, RunningAvgSamplesPerSec=6.322518293132962, CurrSamplesPerSec=5.666383511566529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:19:46,464] [INFO] [timer.py:197:stop] 0/9318, RunningAvgSamplesPerSec=6.322516127940838, CurrSamplesPerSec=5.684512917259926, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:19:57,769] [INFO] [logging.py:68:log_dist] [Rank 0] step=4660, skipped=6, lr=[7.711111111111112e-07], mom=[[0.9, 0.999]] [2022-12-19 16:19:57,771] [INFO] [timer.py:197:stop] 0/9320, RunningAvgSamplesPerSec=6.322514942737274, CurrSamplesPerSec=5.700017870720625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:20:09,104] [INFO] [timer.py:197:stop] 0/9322, RunningAvgSamplesPerSec=6.322511049764516, CurrSamplesPerSec=5.654504277328681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:20:20,421] [INFO] [timer.py:197:stop] 0/9324, RunningAvgSamplesPerSec=6.32251067236725, CurrSamplesPerSec=5.688242261007351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:20:31,792] [INFO] [timer.py:197:stop] 0/9326, RunningAvgSamplesPerSec=6.3225037277584635, CurrSamplesPerSec=5.652431327819109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:20:43,100] [INFO] [timer.py:197:stop] 0/9328, RunningAvgSamplesPerSec=6.322501655269136, CurrSamplesPerSec=5.672527899064994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:20:54,503] [INFO] [timer.py:197:stop] 0/9330, RunningAvgSamplesPerSec=6.322491952372458, CurrSamplesPerSec=5.616162402315428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:21:06,015] [INFO] [timer.py:197:stop] 0/9332, RunningAvgSamplesPerSec=6.3224887917762915, CurrSamplesPerSec=5.687637235322494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:21:17,364] [INFO] [timer.py:197:stop] 0/9334, RunningAvgSamplesPerSec=6.322485980600014, CurrSamplesPerSec=5.687642778798268, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 7.555555555555556e-07, 'epoch': 34.96} [2022-12-19 16:21:28,979] [INFO] [timer.py:197:stop] 0/9336, RunningAvgSamplesPerSec=6.322480824000118, CurrSamplesPerSec=5.668199553056884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:21:40,325] [INFO] [timer.py:197:stop] 0/9338, RunningAvgSamplesPerSec=6.3224763570516, CurrSamplesPerSec=5.654018349205111, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:21:51,684] [INFO] [logging.py:68:log_dist] [Rank 0] step=4670, skipped=6, lr=[7.48888888888889e-07], mom=[[0.9, 0.999]] [2022-12-19 16:21:51,686] [INFO] [timer.py:197:stop] 0/9340, RunningAvgSamplesPerSec=6.322467576852614, CurrSamplesPerSec=5.629083267176576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:22:03,024] [INFO] [timer.py:197:stop] 0/9342, RunningAvgSamplesPerSec=6.322467151717146, CurrSamplesPerSec=5.694037979986531, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:22:14,540] [INFO] [timer.py:197:stop] 0/9344, RunningAvgSamplesPerSec=6.322465829203946, CurrSamplesPerSec=5.691404951037908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:22:24,943] [INFO] [timer.py:197:stop] 0/9346, RunningAvgSamplesPerSec=6.3225724872949955, CurrSamplesPerSec=5.675049182900858, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:22:36,269] [INFO] [timer.py:197:stop] 0/9348, RunningAvgSamplesPerSec=6.322567877718656, CurrSamplesPerSec=5.6646276841228245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:22:47,744] [INFO] [timer.py:197:stop] 0/9350, RunningAvgSamplesPerSec=6.3225646174546934, CurrSamplesPerSec=5.663898601417046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:22:59,052] [INFO] [timer.py:197:stop] 0/9352, RunningAvgSamplesPerSec=6.322563249095236, CurrSamplesPerSec=5.68122513363719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:23:10,356] [INFO] [timer.py:197:stop] 0/9354, 
RunningAvgSamplesPerSec=6.322566077516722, CurrSamplesPerSec=5.712588921280069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:23:21,688] [INFO] [timer.py:197:stop] 0/9356, RunningAvgSamplesPerSec=6.322565279182819, CurrSamplesPerSec=5.6819519490585035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:23:32,977] [INFO] [timer.py:197:stop] 0/9358, RunningAvgSamplesPerSec=6.322566113302093, CurrSamplesPerSec=5.691513797405224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:23:44,375] [INFO] [logging.py:68:log_dist] [Rank 0] step=4680, skipped=6, lr=[7.266666666666668e-07], mom=[[0.9, 0.999]] [2022-12-19 16:23:44,376] [INFO] [timer.py:197:stop] 0/9360, RunningAvgSamplesPerSec=6.322563148816672, CurrSamplesPerSec=5.666407673143306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:23:55,738] [INFO] [timer.py:197:stop] 0/9362, RunningAvgSamplesPerSec=6.322554418803424, CurrSamplesPerSec=5.688014939449925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:24:07,040] [INFO] [timer.py:197:stop] 0/9364, RunningAvgSamplesPerSec=6.322555389052311, CurrSamplesPerSec=5.711218913977723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:24:18,384] [INFO] [timer.py:197:stop] 0/9366, RunningAvgSamplesPerSec=6.322551879089973, CurrSamplesPerSec=5.69221234745815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:24:29,926] [INFO] [timer.py:197:stop] 0/9368, RunningAvgSamplesPerSec=6.3225436869175375, CurrSamplesPerSec=5.686026710797736, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:24:41,267] [INFO] [timer.py:197:stop] 0/9370, RunningAvgSamplesPerSec=6.322538957585107, CurrSamplesPerSec=5.651792960817362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:24:52,584] [INFO] [timer.py:197:stop] 0/9372, RunningAvgSamplesPerSec=6.322540522026257, CurrSamplesPerSec=5.697441706887684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:25:04,088] [INFO] [timer.py:197:stop] 0/9374, 
RunningAvgSamplesPerSec=6.322531712204792, CurrSamplesPerSec=5.64060927512237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:25:15,398] [INFO] [timer.py:197:stop] 0/9376, RunningAvgSamplesPerSec=6.322533308339677, CurrSamplesPerSec=5.691052857936798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:25:26,696] [INFO] [timer.py:197:stop] 0/9378, RunningAvgSamplesPerSec=6.3225360145688185, CurrSamplesPerSec=5.702894882123943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:25:37,988] [INFO] [logging.py:68:log_dist] [Rank 0] step=4690, skipped=6, lr=[7.044444444444446e-07], mom=[[0.9, 0.999]] [2022-12-19 16:25:37,990] [INFO] [timer.py:197:stop] 0/9380, RunningAvgSamplesPerSec=6.322536824223091, CurrSamplesPerSec=5.705999418762147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:25:49,280] [INFO] [timer.py:197:stop] 0/9382, RunningAvgSamplesPerSec=6.32253822970659, CurrSamplesPerSec=5.6901437470294445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:26:00,662] [INFO] [timer.py:197:stop] 0/9384, RunningAvgSamplesPerSec=6.322541666791659, CurrSamplesPerSec=5.699905067922258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 7.000000000000001e-07, 'epoch': 35.15} [2022-12-19 16:26:11,928] [INFO] [timer.py:197:stop] 0/9386, RunningAvgSamplesPerSec=6.322547766861077, CurrSamplesPerSec=5.700380273870067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:26:23,240] [INFO] [timer.py:197:stop] 0/9388, RunningAvgSamplesPerSec=6.322547988127305, CurrSamplesPerSec=5.696036167343904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:26:34,506] [INFO] [timer.py:197:stop] 0/9390, RunningAvgSamplesPerSec=6.322552759616199, CurrSamplesPerSec=5.72062260112955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:26:45,960] [INFO] [timer.py:197:stop] 0/9392, RunningAvgSamplesPerSec=6.3225544957001345, CurrSamplesPerSec=5.682313020816068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 16:26:57,360] [INFO] [timer.py:197:stop] 0/9394, RunningAvgSamplesPerSec=6.322562212419936, CurrSamplesPerSec=5.704583592775432, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:27:08,672] [INFO] [timer.py:197:stop] 0/9396, RunningAvgSamplesPerSec=6.322562191653972, CurrSamplesPerSec=5.6917095382657115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:27:20,165] [INFO] [timer.py:197:stop] 0/9398, RunningAvgSamplesPerSec=6.322570309840051, CurrSamplesPerSec=5.736198198704124, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:27:31,456] [INFO] [logging.py:68:log_dist] [Rank 0] step=4700, skipped=6, lr=[6.822222222222223e-07], mom=[[0.9, 0.999]] [2022-12-19 16:27:31,458] [INFO] [timer.py:197:stop] 0/9400, RunningAvgSamplesPerSec=6.32257372166807, CurrSamplesPerSec=5.709097623606374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:27:42,901] [INFO] [timer.py:197:stop] 0/9402, RunningAvgSamplesPerSec=6.322578544396791, CurrSamplesPerSec=5.698278638145243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:27:54,177] [INFO] [timer.py:197:stop] 0/9404, RunningAvgSamplesPerSec=6.322583048648643, CurrSamplesPerSec=5.709128464796089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:28:05,489] [INFO] [timer.py:197:stop] 0/9406, RunningAvgSamplesPerSec=6.322586299559417, CurrSamplesPerSec=5.696308855003016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:28:16,911] [INFO] [timer.py:197:stop] 0/9408, RunningAvgSamplesPerSec=6.32259269896845, CurrSamplesPerSec=5.714916474030157, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:28:28,205] [INFO] [timer.py:197:stop] 0/9410, RunningAvgSamplesPerSec=6.3225972721667025, CurrSamplesPerSec=5.702406416305743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:28:39,521] [INFO] [timer.py:197:stop] 0/9412, RunningAvgSamplesPerSec=6.322601553343033, CurrSamplesPerSec=5.70199627626406, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
16:28:50,853] [INFO] [timer.py:197:stop] 0/9414, RunningAvgSamplesPerSec=6.3225977690240995, CurrSamplesPerSec=5.707317413353358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:29:02,113] [INFO] [timer.py:197:stop] 0/9416, RunningAvgSamplesPerSec=6.322604399732466, CurrSamplesPerSec=5.714292623567177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:29:13,382] [INFO] [timer.py:197:stop] 0/9418, RunningAvgSamplesPerSec=6.322611052974593, CurrSamplesPerSec=5.705558201313372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:29:24,641] [INFO] [logging.py:68:log_dist] [Rank 0] step=4710, skipped=6, lr=[6.6e-07], mom=[[0.9, 0.999]] [2022-12-19 16:29:24,643] [INFO] [timer.py:197:stop] 0/9420, RunningAvgSamplesPerSec=6.322614534145536, CurrSamplesPerSec=5.703931695472888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:29:35,897] [INFO] [timer.py:197:stop] 0/9422, RunningAvgSamplesPerSec=6.3226199531901965, CurrSamplesPerSec=5.7234731059373445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:29:47,280] [INFO] [timer.py:197:stop] 0/9424, RunningAvgSamplesPerSec=6.322620861610948, CurrSamplesPerSec=5.681665722246253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:29:58,530] [INFO] [timer.py:197:stop] 0/9426, RunningAvgSamplesPerSec=6.322626950295691, CurrSamplesPerSec=5.70430429387278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:30:09,828] [INFO] [timer.py:197:stop] 0/9428, RunningAvgSamplesPerSec=6.322627606150757, CurrSamplesPerSec=5.688377746547162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:30:21,025] [INFO] [timer.py:197:stop] 0/9430, RunningAvgSamplesPerSec=6.322637433671302, CurrSamplesPerSec=5.731561435803076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:30:32,350] [INFO] [timer.py:197:stop] 0/9432, RunningAvgSamplesPerSec=6.322642250098064, CurrSamplesPerSec=5.710060657339424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:30:43,611] [INFO] 
[timer.py:197:stop] 0/9434, RunningAvgSamplesPerSec=6.322647809477121, CurrSamplesPerSec=5.703833038985861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 6.444444444444445e-07, 'epoch': 35.34} [2022-12-19 16:30:54,874] [INFO] [timer.py:197:stop] 0/9436, RunningAvgSamplesPerSec=6.32265700918031, CurrSamplesPerSec=5.741222607676481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:31:06,126] [INFO] [timer.py:197:stop] 0/9438, RunningAvgSamplesPerSec=6.322665088943316, CurrSamplesPerSec=5.733904967738228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:31:17,414] [INFO] [logging.py:68:log_dist] [Rank 0] step=4720, skipped=6, lr=[6.377777777777779e-07], mom=[[0.9, 0.999]] [2022-12-19 16:31:17,416] [INFO] [timer.py:197:stop] 0/9440, RunningAvgSamplesPerSec=6.322667134418642, CurrSamplesPerSec=5.6984317796331325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:31:28,659] [INFO] [timer.py:197:stop] 0/9442, RunningAvgSamplesPerSec=6.3226743026377985, CurrSamplesPerSec=5.721033962042501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:31:40,141] [INFO] [timer.py:197:stop] 0/9444, RunningAvgSamplesPerSec=6.322679829995748, CurrSamplesPerSec=5.701234294367075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:31:51,424] [INFO] [timer.py:197:stop] 0/9446, RunningAvgSamplesPerSec=6.322681268837546, CurrSamplesPerSec=5.67059959388322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:32:02,749] [INFO] [timer.py:197:stop] 0/9448, RunningAvgSamplesPerSec=6.322682953111194, CurrSamplesPerSec=5.692860119792155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:32:14,058] [INFO] [timer.py:197:stop] 0/9450, RunningAvgSamplesPerSec=6.322685566120479, CurrSamplesPerSec=5.69783861386416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:32:25,503] [INFO] [timer.py:197:stop] 0/9452, RunningAvgSamplesPerSec=6.322692680439932, CurrSamplesPerSec=5.73994438548925, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 16:32:36,828] [INFO] [timer.py:197:stop] 0/9454, RunningAvgSamplesPerSec=6.3226929192642825, CurrSamplesPerSec=5.698000681971632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:32:48,163] [INFO] [timer.py:197:stop] 0/9456, RunningAvgSamplesPerSec=6.3227014640883334, CurrSamplesPerSec=5.737893937540116, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:32:59,449] [INFO] [timer.py:197:stop] 0/9458, RunningAvgSamplesPerSec=6.322704570331068, CurrSamplesPerSec=5.702087601810985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:33:10,886] [INFO] [logging.py:68:log_dist] [Rank 0] step=4730, skipped=6, lr=[6.155555555555556e-07], mom=[[0.9, 0.999]] [2022-12-19 16:33:10,888] [INFO] [timer.py:197:stop] 0/9460, RunningAvgSamplesPerSec=6.322712190114878, CurrSamplesPerSec=5.711276511056498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:33:22,135] [INFO] [timer.py:197:stop] 0/9462, RunningAvgSamplesPerSec=6.322718294711852, CurrSamplesPerSec=5.7057194960959245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:33:33,389] [INFO] [timer.py:197:stop] 0/9464, RunningAvgSamplesPerSec=6.322723614150068, CurrSamplesPerSec=5.7157225230789726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:33:44,661] [INFO] [timer.py:197:stop] 0/9466, RunningAvgSamplesPerSec=6.322729404027132, CurrSamplesPerSec=5.721357581523905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:33:55,926] [INFO] [timer.py:197:stop] 0/9468, RunningAvgSamplesPerSec=6.3227354373705, CurrSamplesPerSec=5.698722359805214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:34:07,254] [INFO] [timer.py:197:stop] 0/9470, RunningAvgSamplesPerSec=6.322743105861195, CurrSamplesPerSec=5.726365787171248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:34:18,522] [INFO] [timer.py:197:stop] 0/9472, RunningAvgSamplesPerSec=6.322747629741205, CurrSamplesPerSec=5.704211200316061, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 16:34:29,807] [INFO] [timer.py:197:stop] 0/9474, RunningAvgSamplesPerSec=6.322754254332158, CurrSamplesPerSec=5.707866676572604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:34:41,043] [INFO] [timer.py:197:stop] 0/9476, RunningAvgSamplesPerSec=6.32275867062431, CurrSamplesPerSec=5.708665153517289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:34:52,436] [INFO] [timer.py:197:stop] 0/9478, RunningAvgSamplesPerSec=6.3227678525231426, CurrSamplesPerSec=5.727127170703068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:35:03,716] [INFO] [logging.py:68:log_dist] [Rank 0] step=4740, skipped=6, lr=[5.933333333333334e-07], mom=[[0.9, 0.999]] [2022-12-19 16:35:03,718] [INFO] [timer.py:197:stop] 0/9480, RunningAvgSamplesPerSec=6.322773720065898, CurrSamplesPerSec=5.705560869273194, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:35:15,118] [INFO] [timer.py:197:stop] 0/9482, RunningAvgSamplesPerSec=6.322779184483639, CurrSamplesPerSec=5.70726353639805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:35:26,432] [INFO] [timer.py:197:stop] 0/9484, RunningAvgSamplesPerSec=6.32278335070137, CurrSamplesPerSec=5.706433183347375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 5.888888888888889e-07, 'epoch': 35.52} [2022-12-19 16:35:37,730] [INFO] [timer.py:197:stop] 0/9486, RunningAvgSamplesPerSec=6.322785049685468, CurrSamplesPerSec=5.69952796136723, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:35:49,048] [INFO] [timer.py:197:stop] 0/9488, RunningAvgSamplesPerSec=6.322786367767769, CurrSamplesPerSec=5.684078386385559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:36:00,388] [INFO] [timer.py:197:stop] 0/9490, RunningAvgSamplesPerSec=6.322791900277129, CurrSamplesPerSec=5.722019079006213, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:36:11,951] [INFO] [timer.py:197:stop] 0/9492, RunningAvgSamplesPerSec=6.322795226130608, 
CurrSamplesPerSec=5.696374129857968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:36:23,351] [INFO] [timer.py:197:stop] 0/9494, RunningAvgSamplesPerSec=6.322802146626534, CurrSamplesPerSec=5.734045331937294, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:36:34,624] [INFO] [timer.py:197:stop] 0/9496, RunningAvgSamplesPerSec=6.322805671822339, CurrSamplesPerSec=5.709935068064965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:36:45,896] [INFO] [timer.py:197:stop] 0/9498, RunningAvgSamplesPerSec=6.322811268848838, CurrSamplesPerSec=5.713764500966252, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:36:57,290] [INFO] [logging.py:68:log_dist] [Rank 0] step=4750, skipped=6, lr=[5.711111111111111e-07], mom=[[0.9, 0.999]] [2022-12-19 16:36:57,292] [INFO] [timer.py:197:stop] 0/9500, RunningAvgSamplesPerSec=6.322820070394884, CurrSamplesPerSec=5.726683657282005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:37:08,547] [INFO] [timer.py:197:stop] 0/9502, RunningAvgSamplesPerSec=6.322823317899727, CurrSamplesPerSec=5.694380054075888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:37:20,000] [INFO] [timer.py:197:stop] 0/9504, RunningAvgSamplesPerSec=6.322829118957422, CurrSamplesPerSec=5.714649301423177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:37:31,467] [INFO] [timer.py:197:stop] 0/9506, RunningAvgSamplesPerSec=6.322830318792566, CurrSamplesPerSec=5.683597711345588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:37:42,726] [INFO] [timer.py:197:stop] 0/9508, RunningAvgSamplesPerSec=6.322836513577922, CurrSamplesPerSec=5.7118452534409325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:37:54,132] [INFO] [timer.py:197:stop] 0/9510, RunningAvgSamplesPerSec=6.322841453139544, CurrSamplesPerSec=5.714188986099581, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:38:05,499] [INFO] [timer.py:197:stop] 0/9512, RunningAvgSamplesPerSec=6.322843272319467, 
CurrSamplesPerSec=5.657462855990559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:38:17,007] [INFO] [timer.py:197:stop] 0/9514, RunningAvgSamplesPerSec=6.322844500303475, CurrSamplesPerSec=5.685806310504589, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:38:28,249] [INFO] [timer.py:197:stop] 0/9516, RunningAvgSamplesPerSec=6.322851054699384, CurrSamplesPerSec=5.705999903920593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:38:39,754] [INFO] [timer.py:197:stop] 0/9518, RunningAvgSamplesPerSec=6.322858089583397, CurrSamplesPerSec=5.718136675578738, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:38:51,038] [INFO] [logging.py:68:log_dist] [Rank 0] step=4760, skipped=6, lr=[5.48888888888889e-07], mom=[[0.9, 0.999]] [2022-12-19 16:38:51,040] [INFO] [timer.py:197:stop] 0/9520, RunningAvgSamplesPerSec=6.322859799380042, CurrSamplesPerSec=5.700610037609713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:39:02,336] [INFO] [timer.py:197:stop] 0/9522, RunningAvgSamplesPerSec=6.32286447677165, CurrSamplesPerSec=5.702797715341194, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:39:13,587] [INFO] [timer.py:197:stop] 0/9524, RunningAvgSamplesPerSec=6.32286958342354, CurrSamplesPerSec=5.7084068197179585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:39:24,901] [INFO] [timer.py:197:stop] 0/9526, RunningAvgSamplesPerSec=6.3228723945304806, CurrSamplesPerSec=5.696858661099253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:39:36,161] [INFO] [timer.py:197:stop] 0/9528, RunningAvgSamplesPerSec=6.322876796845835, CurrSamplesPerSec=5.708520930449372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:39:47,398] [INFO] [timer.py:197:stop] 0/9530, RunningAvgSamplesPerSec=6.322883081268796, CurrSamplesPerSec=5.734982984262713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:39:58,910] [INFO] [timer.py:197:stop] 0/9532, RunningAvgSamplesPerSec=6.322886231440559, 
CurrSamplesPerSec=5.706763160302638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:40:10,233] [INFO] [timer.py:197:stop] 0/9534, RunningAvgSamplesPerSec=6.322884891053721, CurrSamplesPerSec=5.674214503059211, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 5.333333333333335e-07, 'epoch': 35.71} [2022-12-19 16:40:21,539] [INFO] [timer.py:197:stop] 0/9536, RunningAvgSamplesPerSec=6.322887991006358, CurrSamplesPerSec=5.704079322978666, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:40:32,858] [INFO] [timer.py:197:stop] 0/9538, RunningAvgSamplesPerSec=6.322889248644229, CurrSamplesPerSec=5.683191475559661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:40:44,113] [INFO] [logging.py:68:log_dist] [Rank 0] step=4770, skipped=6, lr=[5.266666666666667e-07], mom=[[0.9, 0.999]] [2022-12-19 16:40:44,115] [INFO] [timer.py:197:stop] 0/9540, RunningAvgSamplesPerSec=6.322893858267986, CurrSamplesPerSec=5.702207516454134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:40:55,375] [INFO] [timer.py:197:stop] 0/9542, RunningAvgSamplesPerSec=6.322899179129556, CurrSamplesPerSec=5.700030700506221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:41:06,651] [INFO] [timer.py:197:stop] 0/9544, RunningAvgSamplesPerSec=6.322903811100707, CurrSamplesPerSec=5.7095879651317025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:41:18,116] [INFO] [timer.py:197:stop] 0/9546, RunningAvgSamplesPerSec=6.322911272570285, CurrSamplesPerSec=5.720660881776676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:41:29,417] [INFO] [timer.py:197:stop] 0/9548, RunningAvgSamplesPerSec=6.322911455084735, CurrSamplesPerSec=5.687993485861609, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:41:40,682] [INFO] [timer.py:197:stop] 0/9550, RunningAvgSamplesPerSec=6.322916572819354, CurrSamplesPerSec=5.714780937991874, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:41:51,928] [INFO] 
[timer.py:197:stop] 0/9552, RunningAvgSamplesPerSec=6.322924200970508, CurrSamplesPerSec=5.718155433786009, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:42:03,235] [INFO] [timer.py:197:stop] 0/9554, RunningAvgSamplesPerSec=6.322928386406965, CurrSamplesPerSec=5.693898118309175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:42:14,698] [INFO] [timer.py:197:stop] 0/9556, RunningAvgSamplesPerSec=6.32293267806449, CurrSamplesPerSec=5.696818763945972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:42:25,964] [INFO] [timer.py:197:stop] 0/9558, RunningAvgSamplesPerSec=6.322936599863463, CurrSamplesPerSec=5.708271349496127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:42:37,234] [INFO] [logging.py:68:log_dist] [Rank 0] step=4780, skipped=6, lr=[5.044444444444445e-07], mom=[[0.9, 0.999]] [2022-12-19 16:42:37,236] [INFO] [timer.py:197:stop] 0/9560, RunningAvgSamplesPerSec=6.322940064244079, CurrSamplesPerSec=5.704911658444872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:42:48,543] [INFO] [timer.py:197:stop] 0/9562, RunningAvgSamplesPerSec=6.322939753531428, CurrSamplesPerSec=5.6742368123760825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:42:59,942] [INFO] [timer.py:197:stop] 0/9564, RunningAvgSamplesPerSec=6.322947311025854, CurrSamplesPerSec=5.7126708605695695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:43:11,167] [INFO] [timer.py:197:stop] 0/9566, RunningAvgSamplesPerSec=6.322955589347097, CurrSamplesPerSec=5.726668996854246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:43:22,480] [INFO] [timer.py:197:stop] 0/9568, RunningAvgSamplesPerSec=6.322961078775288, CurrSamplesPerSec=5.706096694680524, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:43:33,757] [INFO] [timer.py:197:stop] 0/9570, RunningAvgSamplesPerSec=6.3229677355886444, CurrSamplesPerSec=5.7162337227042235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:43:45,259] [INFO] 
[timer.py:197:stop] 0/9572, RunningAvgSamplesPerSec=6.322974872862283, CurrSamplesPerSec=5.725272202441408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:43:56,536] [INFO] [timer.py:197:stop] 0/9574, RunningAvgSamplesPerSec=6.322980074219177, CurrSamplesPerSec=5.721171989687393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:44:07,990] [INFO] [timer.py:197:stop] 0/9576, RunningAvgSamplesPerSec=6.322986256331621, CurrSamplesPerSec=5.722005174277484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:44:19,247] [INFO] [timer.py:197:stop] 0/9578, RunningAvgSamplesPerSec=6.322992508196449, CurrSamplesPerSec=5.727458078901573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:44:30,528] [INFO] [logging.py:68:log_dist] [Rank 0] step=4790, skipped=6, lr=[4.822222222222222e-07], mom=[[0.9, 0.999]] [2022-12-19 16:44:30,529] [INFO] [timer.py:197:stop] 0/9580, RunningAvgSamplesPerSec=6.322997590432497, CurrSamplesPerSec=5.717934240939049, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:44:41,857] [INFO] [timer.py:197:stop] 0/9582, RunningAvgSamplesPerSec=6.323004059755914, CurrSamplesPerSec=5.730837534538466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:44:53,090] [INFO] [timer.py:197:stop] 0/9584, RunningAvgSamplesPerSec=6.323013517293927, CurrSamplesPerSec=5.728275737622947, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 4.777777777777778e-07, 'epoch': 35.9} [2022-12-19 16:45:04,367] [INFO] [timer.py:197:stop] 0/9586, RunningAvgSamplesPerSec=6.323015322671991, CurrSamplesPerSec=5.692311809439787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:45:15,675] [INFO] [timer.py:197:stop] 0/9588, RunningAvgSamplesPerSec=6.323019225000402, CurrSamplesPerSec=5.6917121932897805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:45:26,901] [INFO] [timer.py:197:stop] 0/9590, RunningAvgSamplesPerSec=6.32302640156682, CurrSamplesPerSec=5.709344362454474, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:45:38,180] [INFO] [timer.py:197:stop] 0/9592, RunningAvgSamplesPerSec=6.323030786455532, CurrSamplesPerSec=5.703342715976756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:45:49,528] [INFO] [timer.py:197:stop] 0/9594, RunningAvgSamplesPerSec=6.323023745540936, CurrSamplesPerSec=5.624391482041347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:46:00,800] [INFO] [timer.py:197:stop] 0/9596, RunningAvgSamplesPerSec=6.323028283741957, CurrSamplesPerSec=5.696181452465979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:46:12,174] [INFO] [timer.py:197:stop] 0/9598, RunningAvgSamplesPerSec=6.3230319231902685, CurrSamplesPerSec=5.713839663079542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:46:23,412] [INFO] [logging.py:68:log_dist] [Rank 0] step=4800, skipped=6, lr=[4.6000000000000004e-07], mom=[[0.9, 0.999]] [2022-12-19 16:46:23,414] [INFO] [timer.py:197:stop] 0/9600, RunningAvgSamplesPerSec=6.323042638717136, CurrSamplesPerSec=5.723836789165756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:46:34,847] [INFO] [timer.py:197:stop] 0/9602, RunningAvgSamplesPerSec=6.32304661378678, CurrSamplesPerSec=5.70779385610358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:46:46,082] [INFO] [timer.py:197:stop] 0/9604, RunningAvgSamplesPerSec=6.3230531733956195, CurrSamplesPerSec=5.728154235148083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:46:57,359] [INFO] [timer.py:197:stop] 0/9606, RunningAvgSamplesPerSec=6.323055415400089, CurrSamplesPerSec=5.695823692133438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:47:08,821] [INFO] [timer.py:197:stop] 0/9608, RunningAvgSamplesPerSec=6.323058257818612, CurrSamplesPerSec=5.69525451072896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:47:20,177] [INFO] [timer.py:197:stop] 0/9610, RunningAvgSamplesPerSec=6.323065043489573, CurrSamplesPerSec=5.7166707491593405, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:47:30,529] [INFO] [timer.py:197:stop] 0/9612, RunningAvgSamplesPerSec=6.323175263709056, CurrSamplesPerSec=6.711564603603194, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:47:41,773] [INFO] [timer.py:197:stop] 0/9614, RunningAvgSamplesPerSec=6.323182502834872, CurrSamplesPerSec=5.718741385206402, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:47:53,018] [INFO] [timer.py:197:stop] 0/9616, RunningAvgSamplesPerSec=6.323188862354847, CurrSamplesPerSec=5.725053633297326, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:48:04,281] [INFO] [timer.py:197:stop] 0/9618, RunningAvgSamplesPerSec=6.32319345956768, CurrSamplesPerSec=5.6962484166563465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:48:15,754] [INFO] [logging.py:68:log_dist] [Rank 0] step=4810, skipped=6, lr=[4.377777777777778e-07], mom=[[0.9, 0.999]] [2022-12-19 16:48:15,755] [INFO] [timer.py:197:stop] 0/9620, RunningAvgSamplesPerSec=6.323197186556051, CurrSamplesPerSec=5.717672388269841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:48:27,011] [INFO] [timer.py:197:stop] 0/9622, RunningAvgSamplesPerSec=6.323203389642964, CurrSamplesPerSec=5.713078890007706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:48:38,296] [INFO] [timer.py:197:stop] 0/9624, RunningAvgSamplesPerSec=6.323206629931356, CurrSamplesPerSec=5.702576982162341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:48:49,511] [INFO] [timer.py:197:stop] 0/9626, RunningAvgSamplesPerSec=6.323216095642306, CurrSamplesPerSec=5.734985924862038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:49:00,793] [INFO] [timer.py:197:stop] 0/9628, RunningAvgSamplesPerSec=6.323219957677922, CurrSamplesPerSec=5.708022761370585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:49:11,996] [INFO] [timer.py:197:stop] 0/9630, RunningAvgSamplesPerSec=6.323226761263225, CurrSamplesPerSec=5.727007183159122, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:49:23,200] [INFO] [timer.py:197:stop] 0/9632, RunningAvgSamplesPerSec=6.3232375054384695, CurrSamplesPerSec=5.7467609376344715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:49:34,672] [INFO] [timer.py:197:stop] 0/9634, RunningAvgSamplesPerSec=6.323239477406858, CurrSamplesPerSec=5.69246704467233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:49:45,959] [INFO] [timer.py:197:stop] 0/9636, RunningAvgSamplesPerSec=6.323243820529349, CurrSamplesPerSec=5.713548025670109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 4.2000000000000006e-07, 'epoch': 36.09} [2022-12-19 16:49:57,210] [INFO] [timer.py:197:stop] 0/9638, RunningAvgSamplesPerSec=6.323251035497274, CurrSamplesPerSec=5.7309077630605705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:50:08,468] [INFO] [logging.py:68:log_dist] [Rank 0] step=4820, skipped=6, lr=[4.155555555555556e-07], mom=[[0.9, 0.999]] [2022-12-19 16:50:08,470] [INFO] [timer.py:197:stop] 0/9640, RunningAvgSamplesPerSec=6.323258362437865, CurrSamplesPerSec=5.727101999783235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:50:19,914] [INFO] [timer.py:197:stop] 0/9642, RunningAvgSamplesPerSec=6.323263214946381, CurrSamplesPerSec=5.705041149294086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:50:31,216] [INFO] [timer.py:197:stop] 0/9644, RunningAvgSamplesPerSec=6.323265625540171, CurrSamplesPerSec=5.685891096354694, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:50:42,486] [INFO] [timer.py:197:stop] 0/9646, RunningAvgSamplesPerSec=6.323270174878071, CurrSamplesPerSec=5.710168274981378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:50:53,761] [INFO] [timer.py:197:stop] 0/9648, RunningAvgSamplesPerSec=6.32327220356879, CurrSamplesPerSec=5.686729697137326, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:51:05,028] [INFO] [timer.py:197:stop] 0/9650, 
RunningAvgSamplesPerSec=6.323279013209308, CurrSamplesPerSec=5.710868982014628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:51:16,344] [INFO] [timer.py:197:stop] 0/9652, RunningAvgSamplesPerSec=6.3232801909654945, CurrSamplesPerSec=5.68525406001446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:51:27,637] [INFO] [timer.py:197:stop] 0/9654, RunningAvgSamplesPerSec=6.323282289474732, CurrSamplesPerSec=5.692337882573081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:51:38,891] [INFO] [timer.py:197:stop] 0/9656, RunningAvgSamplesPerSec=6.323292496062131, CurrSamplesPerSec=5.7210963906823915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:51:50,210] [INFO] [timer.py:197:stop] 0/9658, RunningAvgSamplesPerSec=6.323292607086621, CurrSamplesPerSec=5.683836474162795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:52:01,536] [INFO] [logging.py:68:log_dist] [Rank 0] step=4830, skipped=6, lr=[3.9333333333333336e-07], mom=[[0.9, 0.999]] [2022-12-19 16:52:01,537] [INFO] [timer.py:197:stop] 0/9660, RunningAvgSamplesPerSec=6.323294440870088, CurrSamplesPerSec=5.685715024021303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:52:12,817] [INFO] [timer.py:197:stop] 0/9662, RunningAvgSamplesPerSec=6.323295318657922, CurrSamplesPerSec=5.684598627592176, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:52:24,078] [INFO] [timer.py:197:stop] 0/9664, RunningAvgSamplesPerSec=6.323298291528393, CurrSamplesPerSec=5.709907861832185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:52:35,374] [INFO] [timer.py:197:stop] 0/9666, RunningAvgSamplesPerSec=6.323305119693711, CurrSamplesPerSec=5.728683798745909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:52:46,644] [INFO] [timer.py:197:stop] 0/9668, RunningAvgSamplesPerSec=6.32331033963508, CurrSamplesPerSec=5.715936728934892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:52:58,001] [INFO] [timer.py:197:stop] 0/9670, 
RunningAvgSamplesPerSec=6.323314702007681, CurrSamplesPerSec=5.697821440022876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:53:09,307] [INFO] [timer.py:197:stop] 0/9672, RunningAvgSamplesPerSec=6.323321251904328, CurrSamplesPerSec=5.729921540327343, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:53:20,558] [INFO] [timer.py:197:stop] 0/9674, RunningAvgSamplesPerSec=6.3233247013245855, CurrSamplesPerSec=5.7032673450760365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:53:31,896] [INFO] [timer.py:197:stop] 0/9676, RunningAvgSamplesPerSec=6.323324942583187, CurrSamplesPerSec=5.6806141464923225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:53:43,261] [INFO] [timer.py:197:stop] 0/9678, RunningAvgSamplesPerSec=6.323324828537818, CurrSamplesPerSec=5.682811043164903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:53:54,585] [INFO] [logging.py:68:log_dist] [Rank 0] step=4840, skipped=6, lr=[3.7111111111111113e-07], mom=[[0.9, 0.999]] [2022-12-19 16:53:54,586] [INFO] [timer.py:197:stop] 0/9680, RunningAvgSamplesPerSec=6.323325963176956, CurrSamplesPerSec=5.686457203050335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:54:05,904] [INFO] [timer.py:197:stop] 0/9682, RunningAvgSamplesPerSec=6.323329224247756, CurrSamplesPerSec=5.6948031135511705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:54:17,232] [INFO] [timer.py:197:stop] 0/9684, RunningAvgSamplesPerSec=6.3233314037583614, CurrSamplesPerSec=5.678841080472173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:54:28,591] [INFO] [timer.py:197:stop] 0/9686, RunningAvgSamplesPerSec=6.323333515614688, CurrSamplesPerSec=5.705187379173971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 3.644444444444445e-07, 'epoch': 36.28} [2022-12-19 16:54:39,911] [INFO] [timer.py:197:stop] 0/9688, RunningAvgSamplesPerSec=6.323337240614044, CurrSamplesPerSec=5.707592881249909, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 16:54:51,124] [INFO] [timer.py:197:stop] 0/9690, RunningAvgSamplesPerSec=6.32334508019303, CurrSamplesPerSec=5.731190896894036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:55:02,395] [INFO] [timer.py:197:stop] 0/9692, RunningAvgSamplesPerSec=6.3233506764825735, CurrSamplesPerSec=5.711719342587083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:55:13,661] [INFO] [timer.py:197:stop] 0/9694, RunningAvgSamplesPerSec=6.323353857314337, CurrSamplesPerSec=5.700940067884895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:55:24,938] [INFO] [timer.py:197:stop] 0/9696, RunningAvgSamplesPerSec=6.323358431705155, CurrSamplesPerSec=5.697022366178724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:55:36,282] [INFO] [timer.py:197:stop] 0/9698, RunningAvgSamplesPerSec=6.323366436337318, CurrSamplesPerSec=5.7174276081625734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:55:47,808] [INFO] [logging.py:68:log_dist] [Rank 0] step=4850, skipped=6, lr=[3.488888888888889e-07], mom=[[0.9, 0.999]] [2022-12-19 16:55:47,810] [INFO] [timer.py:197:stop] 0/9700, RunningAvgSamplesPerSec=6.32337119988649, CurrSamplesPerSec=5.714169524083618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:55:59,055] [INFO] [timer.py:197:stop] 0/9702, RunningAvgSamplesPerSec=6.323376504569196, CurrSamplesPerSec=5.720176437741802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:56:10,310] [INFO] [timer.py:197:stop] 0/9704, RunningAvgSamplesPerSec=6.323383320557521, CurrSamplesPerSec=5.714719863709586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:56:21,530] [INFO] [timer.py:197:stop] 0/9706, RunningAvgSamplesPerSec=6.323390702746014, CurrSamplesPerSec=5.718576429564987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:56:32,961] [INFO] [timer.py:197:stop] 0/9708, RunningAvgSamplesPerSec=6.323398857542222, CurrSamplesPerSec=5.725019201020847, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 16:56:44,278] [INFO] [timer.py:197:stop] 0/9710, RunningAvgSamplesPerSec=6.323403177043801, CurrSamplesPerSec=5.704658756004614, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:56:55,743] [INFO] [timer.py:197:stop] 0/9712, RunningAvgSamplesPerSec=6.323400688509288, CurrSamplesPerSec=5.670077839632513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:57:07,127] [INFO] [timer.py:197:stop] 0/9714, RunningAvgSamplesPerSec=6.323391119259401, CurrSamplesPerSec=5.655331023437156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:57:18,417] [INFO] [timer.py:197:stop] 0/9716, RunningAvgSamplesPerSec=6.323391700566342, CurrSamplesPerSec=5.688196698815142, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:57:29,707] [INFO] [timer.py:197:stop] 0/9718, RunningAvgSamplesPerSec=6.323394326372075, CurrSamplesPerSec=5.699629857512362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:57:41,367] [INFO] [logging.py:68:log_dist] [Rank 0] step=4860, skipped=6, lr=[3.266666666666667e-07], mom=[[0.9, 0.999]] [2022-12-19 16:57:41,368] [INFO] [timer.py:197:stop] 0/9720, RunningAvgSamplesPerSec=6.323397201864483, CurrSamplesPerSec=5.7122159691696845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:57:52,634] [INFO] [timer.py:197:stop] 0/9722, RunningAvgSamplesPerSec=6.323401781000754, CurrSamplesPerSec=5.715725687366827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:58:04,207] [INFO] [timer.py:197:stop] 0/9724, RunningAvgSamplesPerSec=6.323398091850636, CurrSamplesPerSec=5.714017238386006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:58:15,473] [INFO] [timer.py:197:stop] 0/9726, RunningAvgSamplesPerSec=6.323402505809888, CurrSamplesPerSec=5.708725855640714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:58:26,735] [INFO] [timer.py:197:stop] 0/9728, RunningAvgSamplesPerSec=6.3234064812875515, CurrSamplesPerSec=5.713644829449157, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-19 16:58:38,100] [INFO] [timer.py:197:stop] 0/9730, RunningAvgSamplesPerSec=6.3234112367652955, CurrSamplesPerSec=5.711242973375676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:58:49,376] [INFO] [timer.py:197:stop] 0/9732, RunningAvgSamplesPerSec=6.323414327753692, CurrSamplesPerSec=5.702507203987727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:59:00,763] [INFO] [timer.py:197:stop] 0/9734, RunningAvgSamplesPerSec=6.323420616216106, CurrSamplesPerSec=5.736214869202727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:59:12,031] [INFO] [timer.py:197:stop] 0/9736, RunningAvgSamplesPerSec=6.3234261780348895, CurrSamplesPerSec=5.702066041935323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 3.088888888888889e-07, 'epoch': 36.46} [2022-12-19 16:59:23,566] [INFO] [timer.py:197:stop] 0/9738, RunningAvgSamplesPerSec=6.323398803471107, CurrSamplesPerSec=5.454487037693208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:59:35,051] [INFO] [logging.py:68:log_dist] [Rank 0] step=4870, skipped=6, lr=[3.0444444444444445e-07], mom=[[0.9, 0.999]] [2022-12-19 16:59:35,053] [INFO] [timer.py:197:stop] 0/9740, RunningAvgSamplesPerSec=6.323401096657717, CurrSamplesPerSec=5.700904230026162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:59:46,287] [INFO] [timer.py:197:stop] 0/9742, RunningAvgSamplesPerSec=6.3234053100874075, CurrSamplesPerSec=5.7127941386472045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 16:59:57,934] [INFO] [timer.py:197:stop] 0/9744, RunningAvgSamplesPerSec=6.323402048599632, CurrSamplesPerSec=5.6521080799730825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:00:09,193] [INFO] [timer.py:197:stop] 0/9746, RunningAvgSamplesPerSec=6.323407782180994, CurrSamplesPerSec=5.713171786856861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:00:20,592] [INFO] [timer.py:197:stop] 0/9748, 
RunningAvgSamplesPerSec=6.323397792952967, CurrSamplesPerSec=5.601454716822306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:00:32,069] [INFO] [timer.py:197:stop] 0/9750, RunningAvgSamplesPerSec=6.323403234742196, CurrSamplesPerSec=5.7308162460929815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:00:43,335] [INFO] [timer.py:197:stop] 0/9752, RunningAvgSamplesPerSec=6.323408132102696, CurrSamplesPerSec=5.6994943194970125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:00:54,622] [INFO] [timer.py:197:stop] 0/9754, RunningAvgSamplesPerSec=6.323411309498928, CurrSamplesPerSec=5.707904543950822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:01:05,906] [INFO] [timer.py:197:stop] 0/9756, RunningAvgSamplesPerSec=6.3234190494145315, CurrSamplesPerSec=5.737812008962458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:01:17,380] [INFO] [timer.py:197:stop] 0/9758, RunningAvgSamplesPerSec=6.323395258537348, CurrSamplesPerSec=5.479416343691206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:01:28,655] [INFO] [logging.py:68:log_dist] [Rank 0] step=4880, skipped=6, lr=[2.822222222222222e-07], mom=[[0.9, 0.999]] [2022-12-19 17:01:28,656] [INFO] [timer.py:197:stop] 0/9760, RunningAvgSamplesPerSec=6.323400824316397, CurrSamplesPerSec=5.705186166623416, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:01:40,002] [INFO] [timer.py:197:stop] 0/9762, RunningAvgSamplesPerSec=6.3234099681779, CurrSamplesPerSec=5.734172229047445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:01:51,247] [INFO] [timer.py:197:stop] 0/9764, RunningAvgSamplesPerSec=6.323413417246795, CurrSamplesPerSec=5.704784113280177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:02:02,569] [INFO] [timer.py:197:stop] 0/9766, RunningAvgSamplesPerSec=6.3234207639036, CurrSamplesPerSec=5.723009171808309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:02:13,916] [INFO] [timer.py:197:stop] 0/9768, 
RunningAvgSamplesPerSec=6.323416570093127, CurrSamplesPerSec=5.655698252476801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:02:25,182] [INFO] [timer.py:197:stop] 0/9770, RunningAvgSamplesPerSec=6.323421742411865, CurrSamplesPerSec=5.724882696993447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:02:36,848] [INFO] [timer.py:197:stop] 0/9772, RunningAvgSamplesPerSec=6.323375437667541, CurrSamplesPerSec=5.329538715890593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:02:48,178] [INFO] [timer.py:197:stop] 0/9774, RunningAvgSamplesPerSec=6.323380850812605, CurrSamplesPerSec=5.712075213072269, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:02:59,461] [INFO] [timer.py:197:stop] 0/9776, RunningAvgSamplesPerSec=6.3233833829325325, CurrSamplesPerSec=5.702837453974365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:03:10,809] [INFO] [timer.py:197:stop] 0/9778, RunningAvgSamplesPerSec=6.323393428834671, CurrSamplesPerSec=5.727111041738682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:03:22,114] [INFO] [logging.py:68:log_dist] [Rank 0] step=4890, skipped=6, lr=[2.6e-07], mom=[[0.9, 0.999]] [2022-12-19 17:03:22,115] [INFO] [timer.py:197:stop] 0/9780, RunningAvgSamplesPerSec=6.3233969957592135, CurrSamplesPerSec=5.693026251394755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:03:33,472] [INFO] [timer.py:197:stop] 0/9782, RunningAvgSamplesPerSec=6.323392449986547, CurrSamplesPerSec=5.649130845336326, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:03:44,897] [INFO] [timer.py:197:stop] 0/9784, RunningAvgSamplesPerSec=6.3233969932371386, CurrSamplesPerSec=5.711207005865852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:03:56,211] [INFO] [timer.py:197:stop] 0/9786, RunningAvgSamplesPerSec=6.323398599035512, CurrSamplesPerSec=5.677243457492178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 2.533333333333333e-07, 'epoch': 36.65} [2022-12-19 
17:04:07,469] [INFO] [timer.py:197:stop] 0/9788, RunningAvgSamplesPerSec=6.323402689112739, CurrSamplesPerSec=5.726839795494341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:04:18,965] [INFO] [timer.py:197:stop] 0/9790, RunningAvgSamplesPerSec=6.323407175788796, CurrSamplesPerSec=5.713520784997067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:04:30,391] [INFO] [timer.py:197:stop] 0/9792, RunningAvgSamplesPerSec=6.323390716490419, CurrSamplesPerSec=5.537303853409053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:04:41,700] [INFO] [timer.py:197:stop] 0/9794, RunningAvgSamplesPerSec=6.323391305236168, CurrSamplesPerSec=5.685007231754882, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:04:52,971] [INFO] [timer.py:197:stop] 0/9796, RunningAvgSamplesPerSec=6.3233936927824494, CurrSamplesPerSec=5.692741563777875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:05:04,529] [INFO] [timer.py:197:stop] 0/9798, RunningAvgSamplesPerSec=6.3233980189295265, CurrSamplesPerSec=5.714405996557844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:05:15,842] [INFO] [logging.py:68:log_dist] [Rank 0] step=4900, skipped=6, lr=[2.3777777777777777e-07], mom=[[0.9, 0.999]] [2022-12-19 17:05:15,844] [INFO] [timer.py:197:stop] 0/9800, RunningAvgSamplesPerSec=6.323398921679056, CurrSamplesPerSec=5.682945788851915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:05:27,402] [INFO] [timer.py:197:stop] 0/9802, RunningAvgSamplesPerSec=6.323386565866633, CurrSamplesPerSec=5.67253988617029, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:05:38,773] [INFO] [timer.py:197:stop] 0/9804, RunningAvgSamplesPerSec=6.323381143595169, CurrSamplesPerSec=5.708326702143215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:05:50,021] [INFO] [timer.py:197:stop] 0/9806, RunningAvgSamplesPerSec=6.323383719565044, CurrSamplesPerSec=5.698354603036143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
17:06:01,333] [INFO] [timer.py:197:stop] 0/9808, RunningAvgSamplesPerSec=6.3233876881096345, CurrSamplesPerSec=5.720229583608652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:06:12,622] [INFO] [timer.py:197:stop] 0/9810, RunningAvgSamplesPerSec=6.323389144160075, CurrSamplesPerSec=5.705668802467312, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:06:23,898] [INFO] [timer.py:197:stop] 0/9812, RunningAvgSamplesPerSec=6.323392759263094, CurrSamplesPerSec=5.702819523059122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:06:35,389] [INFO] [timer.py:197:stop] 0/9814, RunningAvgSamplesPerSec=6.323394272105478, CurrSamplesPerSec=5.710735581744126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:06:46,681] [INFO] [timer.py:197:stop] 0/9816, RunningAvgSamplesPerSec=6.323390733428622, CurrSamplesPerSec=5.643813656132533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:06:57,979] [INFO] [timer.py:197:stop] 0/9818, RunningAvgSamplesPerSec=6.323392278449553, CurrSamplesPerSec=5.679968436750331, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:07:09,414] [INFO] [logging.py:68:log_dist] [Rank 0] step=4910, skipped=6, lr=[2.155555555555556e-07], mom=[[0.9, 0.999]] [2022-12-19 17:07:09,414] [INFO] [timer.py:197:stop] 0/9820, RunningAvgSamplesPerSec=6.323378063430713, CurrSamplesPerSec=5.576562088084332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:07:20,766] [INFO] [timer.py:197:stop] 0/9822, RunningAvgSamplesPerSec=6.323386248420374, CurrSamplesPerSec=5.742783468410691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:07:32,045] [INFO] [timer.py:197:stop] 0/9824, RunningAvgSamplesPerSec=6.32339000331139, CurrSamplesPerSec=5.697318606818595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:07:43,639] [INFO] [timer.py:197:stop] 0/9826, RunningAvgSamplesPerSec=6.323396578914846, CurrSamplesPerSec=5.734371160145044, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:07:54,914] 
[INFO] [timer.py:197:stop] 0/9828, RunningAvgSamplesPerSec=6.323399315956231, CurrSamplesPerSec=5.701716746219607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:08:06,411] [INFO] [timer.py:197:stop] 0/9830, RunningAvgSamplesPerSec=6.3233765885096345, CurrSamplesPerSec=5.491488478378197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:08:17,738] [INFO] [timer.py:197:stop] 0/9832, RunningAvgSamplesPerSec=6.323375768723862, CurrSamplesPerSec=5.693486061497647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:08:29,062] [INFO] [timer.py:197:stop] 0/9834, RunningAvgSamplesPerSec=6.323380308389495, CurrSamplesPerSec=5.722716353635918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:08:40,363] [INFO] [timer.py:197:stop] 0/9836, RunningAvgSamplesPerSec=6.323380298668782, CurrSamplesPerSec=5.671522840646906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.9777777777777778e-07, 'epoch': 36.84} [2022-12-19 17:08:51,626] [INFO] [timer.py:197:stop] 0/9838, RunningAvgSamplesPerSec=6.323387260493729, CurrSamplesPerSec=5.727826179766741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:09:02,916] [INFO] [logging.py:68:log_dist] [Rank 0] step=4920, skipped=6, lr=[1.9333333333333337e-07], mom=[[0.9, 0.999]] [2022-12-19 17:09:02,918] [INFO] [timer.py:197:stop] 0/9840, RunningAvgSamplesPerSec=6.3233865036421015, CurrSamplesPerSec=5.688688519788957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:09:14,162] [INFO] [timer.py:197:stop] 0/9842, RunningAvgSamplesPerSec=6.323389642040837, CurrSamplesPerSec=5.709802683031585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:09:25,490] [INFO] [timer.py:197:stop] 0/9844, RunningAvgSamplesPerSec=6.323388078872663, CurrSamplesPerSec=5.627818380492583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:09:37,064] [INFO] [timer.py:197:stop] 0/9846, RunningAvgSamplesPerSec=6.323387296652491, CurrSamplesPerSec=5.6584073543846065, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:09:48,354] [INFO] [timer.py:197:stop] 0/9848, RunningAvgSamplesPerSec=6.323391086082492, CurrSamplesPerSec=5.695117489139653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:09:59,667] [INFO] [timer.py:197:stop] 0/9850, RunningAvgSamplesPerSec=6.323389797194416, CurrSamplesPerSec=5.707090506029179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:10:11,190] [INFO] [timer.py:197:stop] 0/9852, RunningAvgSamplesPerSec=6.323394222207052, CurrSamplesPerSec=5.700919243128358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:10:22,751] [INFO] [timer.py:197:stop] 0/9854, RunningAvgSamplesPerSec=6.32336128550676, CurrSamplesPerSec=5.422223866898835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:10:33,981] [INFO] [timer.py:197:stop] 0/9856, RunningAvgSamplesPerSec=6.323370408402808, CurrSamplesPerSec=5.7306028809853355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:10:45,431] [INFO] [timer.py:197:stop] 0/9858, RunningAvgSamplesPerSec=6.323351664591335, CurrSamplesPerSec=5.516382231013657, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:10:56,860] [INFO] [logging.py:68:log_dist] [Rank 0] step=4930, skipped=6, lr=[1.7111111111111114e-07], mom=[[0.9, 0.999]] [2022-12-19 17:10:56,862] [INFO] [timer.py:197:stop] 0/9860, RunningAvgSamplesPerSec=6.323356835238052, CurrSamplesPerSec=5.701128466201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:11:08,123] [INFO] [timer.py:197:stop] 0/9862, RunningAvgSamplesPerSec=6.323360081909127, CurrSamplesPerSec=5.701987313436989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:11:19,553] [INFO] [timer.py:197:stop] 0/9864, RunningAvgSamplesPerSec=6.3233662358716245, CurrSamplesPerSec=5.726819269797276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:11:30,757] [INFO] [timer.py:197:stop] 0/9866, RunningAvgSamplesPerSec=6.323375867308454, CurrSamplesPerSec=5.71373385290509, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:11:42,138] [INFO] [timer.py:197:stop] 0/9868, RunningAvgSamplesPerSec=6.323365276679728, CurrSamplesPerSec=5.581476569648438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:11:53,612] [INFO] [timer.py:197:stop] 0/9870, RunningAvgSamplesPerSec=6.32337420566333, CurrSamplesPerSec=5.734459605619558, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:12:04,881] [INFO] [timer.py:197:stop] 0/9872, RunningAvgSamplesPerSec=6.323381469654518, CurrSamplesPerSec=5.716360319593478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:12:16,208] [INFO] [timer.py:197:stop] 0/9874, RunningAvgSamplesPerSec=6.323379913771496, CurrSamplesPerSec=5.722218143584147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:12:27,640] [INFO] [timer.py:197:stop] 0/9876, RunningAvgSamplesPerSec=6.323384207188721, CurrSamplesPerSec=5.71225583916957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:12:38,914] [INFO] [timer.py:197:stop] 0/9878, RunningAvgSamplesPerSec=6.323388127235413, CurrSamplesPerSec=5.7080608735743485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:12:49,286] [INFO] [logging.py:68:log_dist] [Rank 0] step=4940, skipped=6, lr=[1.488888888888889e-07], mom=[[0.9, 0.999]] [2022-12-19 17:12:49,287] [INFO] [timer.py:197:stop] 0/9880, RunningAvgSamplesPerSec=6.323496218284474, CurrSamplesPerSec=5.720688190576901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:13:00,786] [INFO] [timer.py:197:stop] 0/9882, RunningAvgSamplesPerSec=6.323503552817446, CurrSamplesPerSec=5.731808896326255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:13:12,095] [INFO] [timer.py:197:stop] 0/9884, RunningAvgSamplesPerSec=6.323501812206461, CurrSamplesPerSec=5.711126080625087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:13:23,360] [INFO] [timer.py:197:stop] 0/9886, RunningAvgSamplesPerSec=6.3235080493377325, CurrSamplesPerSec=5.727790736389908, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 1.4222222222222224e-07, 'epoch': 37.03} [2022-12-19 17:13:34,681] [INFO] [timer.py:197:stop] 0/9888, RunningAvgSamplesPerSec=6.323512501018272, CurrSamplesPerSec=5.708170600722613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:13:45,958] [INFO] [timer.py:197:stop] 0/9890, RunningAvgSamplesPerSec=6.323516589826417, CurrSamplesPerSec=5.710861449228595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:13:57,208] [INFO] [timer.py:197:stop] 0/9892, RunningAvgSamplesPerSec=6.323521265668136, CurrSamplesPerSec=5.691522727336335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:14:08,688] [INFO] [timer.py:197:stop] 0/9894, RunningAvgSamplesPerSec=6.323527838379167, CurrSamplesPerSec=5.721116143777978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:14:19,953] [INFO] [timer.py:197:stop] 0/9896, RunningAvgSamplesPerSec=6.3235339967569, CurrSamplesPerSec=5.718331084789433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:14:31,224] [INFO] [timer.py:197:stop] 0/9898, RunningAvgSamplesPerSec=6.323536904869323, CurrSamplesPerSec=5.694063827369964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:14:42,740] [INFO] [logging.py:68:log_dist] [Rank 0] step=4950, skipped=6, lr=[1.2666666666666666e-07], mom=[[0.9, 0.999]] [2022-12-19 17:14:42,742] [INFO] [timer.py:197:stop] 0/9900, RunningAvgSamplesPerSec=6.323539756967406, CurrSamplesPerSec=5.702630528377929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:14:54,082] [INFO] [timer.py:197:stop] 0/9902, RunningAvgSamplesPerSec=6.323535296746084, CurrSamplesPerSec=5.619330354728013, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:15:05,308] [INFO] [timer.py:197:stop] 0/9904, RunningAvgSamplesPerSec=6.323544127689941, CurrSamplesPerSec=5.723214894494615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:15:16,992] [INFO] [timer.py:197:stop] 0/9906, 
RunningAvgSamplesPerSec=6.323493269399313, CurrSamplesPerSec=5.303811261104228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:15:28,314] [INFO] [timer.py:197:stop] 0/9908, RunningAvgSamplesPerSec=6.323500490052342, CurrSamplesPerSec=5.716827315839824, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:15:39,540] [INFO] [timer.py:197:stop] 0/9910, RunningAvgSamplesPerSec=6.323506164511322, CurrSamplesPerSec=5.7087945720022315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:15:50,882] [INFO] [timer.py:197:stop] 0/9912, RunningAvgSamplesPerSec=6.323504887922819, CurrSamplesPerSec=5.715926505108733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:16:02,264] [INFO] [timer.py:197:stop] 0/9914, RunningAvgSamplesPerSec=6.323507540561405, CurrSamplesPerSec=5.720276879384842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:16:13,744] [INFO] [timer.py:197:stop] 0/9916, RunningAvgSamplesPerSec=6.323485213702696, CurrSamplesPerSec=5.504966735443686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:16:24,985] [INFO] [timer.py:197:stop] 0/9918, RunningAvgSamplesPerSec=6.323493551840836, CurrSamplesPerSec=5.723259799158877, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:16:36,497] [INFO] [logging.py:68:log_dist] [Rank 0] step=4960, skipped=6, lr=[1.0444444444444445e-07], mom=[[0.9, 0.999]] [2022-12-19 17:16:36,498] [INFO] [timer.py:197:stop] 0/9920, RunningAvgSamplesPerSec=6.323497401230675, CurrSamplesPerSec=5.7079761535074365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:16:48,070] [INFO] [timer.py:197:stop] 0/9922, RunningAvgSamplesPerSec=6.323466349553814, CurrSamplesPerSec=5.683448254105506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:16:59,359] [INFO] [timer.py:197:stop] 0/9924, RunningAvgSamplesPerSec=6.323471692026716, CurrSamplesPerSec=5.7124340454234295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:17:10,663] [INFO] [timer.py:197:stop] 0/9926, 
RunningAvgSamplesPerSec=6.323470406968654, CurrSamplesPerSec=5.644789687229534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:17:21,915] [INFO] [timer.py:197:stop] 0/9928, RunningAvgSamplesPerSec=6.323474435035389, CurrSamplesPerSec=5.694953651873411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:17:33,565] [INFO] [timer.py:197:stop] 0/9930, RunningAvgSamplesPerSec=6.323431624229629, CurrSamplesPerSec=5.342863126482902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:17:44,866] [INFO] [timer.py:197:stop] 0/9932, RunningAvgSamplesPerSec=6.323438797823788, CurrSamplesPerSec=5.723793828202679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:17:56,196] [INFO] [timer.py:197:stop] 0/9934, RunningAvgSamplesPerSec=6.323438979102163, CurrSamplesPerSec=5.6803282948450935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:18:07,620] [INFO] [timer.py:197:stop] 0/9936, RunningAvgSamplesPerSec=6.323437264776838, CurrSamplesPerSec=5.694957759772246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 8.666666666666668e-08, 'epoch': 37.22} [2022-12-19 17:18:18,917] [INFO] [timer.py:197:stop] 0/9938, RunningAvgSamplesPerSec=6.323442110556006, CurrSamplesPerSec=5.7074373053636105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:18:30,415] [INFO] [logging.py:68:log_dist] [Rank 0] step=4970, skipped=6, lr=[8.222222222222223e-08], mom=[[0.9, 0.999]] [2022-12-19 17:18:30,424] [INFO] [timer.py:197:stop] 0/9940, RunningAvgSamplesPerSec=6.323413869992615, CurrSamplesPerSec=5.463416349739464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:18:41,951] [INFO] [timer.py:197:stop] 0/9942, RunningAvgSamplesPerSec=6.323420085790521, CurrSamplesPerSec=5.731753576887019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:18:53,227] [INFO] [timer.py:197:stop] 0/9944, RunningAvgSamplesPerSec=6.323422947132712, CurrSamplesPerSec=5.696280086190056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-19 17:19:04,776] [INFO] [timer.py:197:stop] 0/9946, RunningAvgSamplesPerSec=6.32342294646983, CurrSamplesPerSec=5.719176358189957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:19:16,045] [INFO] [timer.py:197:stop] 0/9948, RunningAvgSamplesPerSec=6.32342585419382, CurrSamplesPerSec=5.695067950328934, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:19:27,411] [INFO] [timer.py:197:stop] 0/9950, RunningAvgSamplesPerSec=6.323418110623573, CurrSamplesPerSec=5.603789899496652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:19:38,685] [INFO] [timer.py:197:stop] 0/9952, RunningAvgSamplesPerSec=6.323423480732561, CurrSamplesPerSec=5.703806618049058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:19:50,257] [INFO] [timer.py:197:stop] 0/9954, RunningAvgSamplesPerSec=6.3234223946571895, CurrSamplesPerSec=5.6685088435640765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:20:01,621] [INFO] [timer.py:197:stop] 0/9956, RunningAvgSamplesPerSec=6.323415651845205, CurrSamplesPerSec=5.698738087304345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:20:12,882] [INFO] [timer.py:197:stop] 0/9958, RunningAvgSamplesPerSec=6.323420961754914, CurrSamplesPerSec=5.719809563170383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:20:24,549] [INFO] [logging.py:68:log_dist] [Rank 0] step=4980, skipped=6, lr=[6.000000000000001e-08], mom=[[0.9, 0.999]] [2022-12-19 17:20:24,550] [INFO] [timer.py:197:stop] 0/9960, RunningAvgSamplesPerSec=6.3234236722681905, CurrSamplesPerSec=5.681341527042789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:20:35,818] [INFO] [timer.py:197:stop] 0/9962, RunningAvgSamplesPerSec=6.323428223665327, CurrSamplesPerSec=5.698850118001992, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:20:47,415] [INFO] [timer.py:197:stop] 0/9964, RunningAvgSamplesPerSec=6.32339384073767, CurrSamplesPerSec=5.398530603051255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
17:20:58,790] [INFO] [timer.py:197:stop] 0/9966, RunningAvgSamplesPerSec=6.323397299884263, CurrSamplesPerSec=5.69351552665189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:21:10,035] [INFO] [timer.py:197:stop] 0/9968, RunningAvgSamplesPerSec=6.323406304551056, CurrSamplesPerSec=5.731422661656761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:21:21,608] [INFO] [timer.py:197:stop] 0/9970, RunningAvgSamplesPerSec=6.323405460317379, CurrSamplesPerSec=5.713472384799131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:21:32,886] [INFO] [timer.py:197:stop] 0/9972, RunningAvgSamplesPerSec=6.323408975663026, CurrSamplesPerSec=5.687774861330066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:21:44,165] [INFO] [timer.py:197:stop] 0/9974, RunningAvgSamplesPerSec=6.323411223551371, CurrSamplesPerSec=5.688894193677786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:21:55,445] [INFO] [timer.py:197:stop] 0/9976, RunningAvgSamplesPerSec=6.3234162703923325, CurrSamplesPerSec=5.70913186463267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:22:06,926] [INFO] [timer.py:197:stop] 0/9978, RunningAvgSamplesPerSec=6.323421347738896, CurrSamplesPerSec=5.703013619096742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:22:18,445] [INFO] [logging.py:68:log_dist] [Rank 0] step=4990, skipped=6, lr=[3.777777777777778e-08], mom=[[0.9, 0.999]] [2022-12-19 17:22:18,447] [INFO] [timer.py:197:stop] 0/9980, RunningAvgSamplesPerSec=6.3233961016107285, CurrSamplesPerSec=5.711089142538764, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:22:29,858] [INFO] [timer.py:197:stop] 0/9982, RunningAvgSamplesPerSec=6.3233987277951424, CurrSamplesPerSec=5.6968753455290635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:22:41,178] [INFO] [timer.py:197:stop] 0/9984, RunningAvgSamplesPerSec=6.3233954560666215, CurrSamplesPerSec=5.653687775262317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 
17:22:52,494] [INFO] [timer.py:197:stop] 0/9986, RunningAvgSamplesPerSec=6.323397686167688, CurrSamplesPerSec=5.686217738206968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 3.1111111111111114e-08, 'epoch': 37.4} [2022-12-19 17:23:03,776] [INFO] [timer.py:197:stop] 0/9988, RunningAvgSamplesPerSec=6.323400632567268, CurrSamplesPerSec=5.696728332467407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:23:15,381] [INFO] [timer.py:197:stop] 0/9990, RunningAvgSamplesPerSec=6.323401288369146, CurrSamplesPerSec=5.683423224997966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:23:26,654] [INFO] [timer.py:197:stop] 0/9992, RunningAvgSamplesPerSec=6.3234033288983476, CurrSamplesPerSec=5.681763372858846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:23:38,150] [INFO] [timer.py:197:stop] 0/9994, RunningAvgSamplesPerSec=6.323402941882931, CurrSamplesPerSec=5.711038839465946, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:23:49,431] [INFO] [timer.py:197:stop] 0/9996, RunningAvgSamplesPerSec=6.323409178546611, CurrSamplesPerSec=5.714779234707342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:24:00,798] [INFO] [timer.py:197:stop] 0/9998, RunningAvgSamplesPerSec=6.32340002810726, CurrSamplesPerSec=5.61287455134299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:24:12,171] [INFO] [logging.py:68:log_dist] [Rank 0] step=5000, skipped=6, lr=[1.5555555555555557e-08], mom=[[0.9, 0.999]] [2022-12-19 17:24:12,173] [INFO] [timer.py:197:stop] 0/10000, RunningAvgSamplesPerSec=6.323406430612569, CurrSamplesPerSec=5.707368136281433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:24:23,474] [INFO] [timer.py:197:stop] 0/10002, RunningAvgSamplesPerSec=6.323404975797474, CurrSamplesPerSec=5.670214377612253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:24:34,827] [INFO] [timer.py:197:stop] 0/10004, RunningAvgSamplesPerSec=6.3234006226019375, 
CurrSamplesPerSec=5.699110249459824, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:24:46,412] [INFO] [timer.py:197:stop] 0/10006, RunningAvgSamplesPerSec=6.323399769946995, CurrSamplesPerSec=5.672203067800118, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:24:57,733] [INFO] [timer.py:197:stop] 0/10008, RunningAvgSamplesPerSec=6.323399414196627, CurrSamplesPerSec=5.661337065779127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:25:09,021] [INFO] [timer.py:197:stop] 0/10010, RunningAvgSamplesPerSec=6.323401782207115, CurrSamplesPerSec=5.699252545365281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:25:20,519] [INFO] [timer.py:197:stop] 0/10012, RunningAvgSamplesPerSec=6.3233793995297445, CurrSamplesPerSec=5.492515471849727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:25:31,970] [INFO] [timer.py:197:stop] 0/10014, RunningAvgSamplesPerSec=6.323384019741134, CurrSamplesPerSec=5.697356092523878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:25:43,299] [INFO] [timer.py:197:stop] 0/10016, RunningAvgSamplesPerSec=6.323389521084357, CurrSamplesPerSec=5.71487072685683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:25:54,668] [INFO] [timer.py:197:stop] 0/10018, RunningAvgSamplesPerSec=6.323377678552771, CurrSamplesPerSec=5.570472063788173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:26:05,914] [INFO] [logging.py:68:log_dist] [Rank 0] step=5010, skipped=6, lr=[0.0], mom=[[0.9, 0.999]] [2022-12-19 17:26:05,915] [INFO] [timer.py:197:stop] 0/10020, RunningAvgSamplesPerSec=6.323382480652598, CurrSamplesPerSec=5.713756960539757, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:26:17,209] [INFO] [timer.py:197:stop] 0/10022, RunningAvgSamplesPerSec=6.323385164084235, CurrSamplesPerSec=5.701519347665819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:26:28,796] [INFO] [timer.py:197:stop] 0/10024, RunningAvgSamplesPerSec=6.323390351005826, 
CurrSamplesPerSec=5.727174580609213, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:26:40,080] [INFO] [timer.py:197:stop] 0/10026, RunningAvgSamplesPerSec=6.323391542086301, CurrSamplesPerSec=5.679400497013081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:26:51,418] [INFO] [timer.py:197:stop] 0/10028, RunningAvgSamplesPerSec=6.323385713513873, CurrSamplesPerSec=5.646799304046079, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:27:02,705] [INFO] [timer.py:197:stop] 0/10030, RunningAvgSamplesPerSec=6.323388054568966, CurrSamplesPerSec=5.677830181016791, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:27:13,958] [INFO] [timer.py:197:stop] 0/10032, RunningAvgSamplesPerSec=6.323393176515903, CurrSamplesPerSec=5.714544921098391, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:27:25,518] [INFO] [timer.py:197:stop] 0/10034, RunningAvgSamplesPerSec=6.323399436206267, CurrSamplesPerSec=5.72215349486803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-19 17:27:36,811] [INFO] [timer.py:197:stop] 0/10036, RunningAvgSamplesPerSec=6.3234024936359425, CurrSamplesPerSec=5.70603240972438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0002, 'learning_rate': 0.0, 'epoch': 37.59} {'eval_loss': 0.331298828125, 'eval_wer': 15.600355766380078, 'eval_runtime': 1391.4178, 'eval_samples_per_second': 3.328, 'eval_steps_per_second': 0.416, 'epoch': 37.59} [2022-12-19 17:50:57,245] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step5018 is begin to save! [2022-12-19 17:50:57,255] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-5000/global_step5018/mp_rank_00_model_states.pt [2022-12-19 17:50:57,255] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-5000/global_step5018/mp_rank_00_model_states.pt... [2022-12-19 17:51:01,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-5000/global_step5018/mp_rank_00_model_states.pt. 
[2022-12-19 17:51:01,145] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-5000/global_step5018/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-19 17:51:17,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-5000/global_step5018/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2022-12-19 17:51:17,444] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-5000/global_step5018/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-19 17:51:17,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5018 is ready now! [2022-12-19 17:53:07,628] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown [2022-12-19 17:53:07,690] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2022-12-19 17:53:09,125] [WARNING] [cpu_adam.py:83:__init__] FP16 params for CPUAdam may not work on AMD CPUs Installed CUDA version 11.6 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Time to load cpu_adam op: 3.2118442058563232 seconds Adam Optimizer #1 is created with AVX2 arithmetic capability. 
Config: alpha=0.000010, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1 [2022-12-19 17:53:12,976] [INFO] [logging.py:68:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer [2022-12-19 17:53:13,280] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam [2022-12-19 17:53:13,280] [INFO] [utils.py:52:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type= [2022-12-19 17:53:13,280] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer [2022-12-19 17:53:13,280] [INFO] [stage_1_and_2.py:140:__init__] Reduce bucket size 200000000 [2022-12-19 17:53:13,280] [INFO] [stage_1_and_2.py:141:__init__] Allgather bucket size 200000000 [2022-12-19 17:53:13,280] [INFO] [stage_1_and_2.py:142:__init__] CPU Offload: True [2022-12-19 17:53:13,280] [INFO] [stage_1_and_2.py:143:__init__] Round robin gradient partitioning: False Time to load utils op: 0.0003826618194580078 seconds Rank: 0 partition count [1] and sizes[(1543304960, False)] [2022-12-19 17:53:16,589] [INFO] [utils.py:827:see_memory_usage] Before initializing optimizer states [2022-12-19 17:53:16,590] [INFO] [utils.py:828:see_memory_usage] MA 6.0 GB Max_MA 19.53 GB CA 29.61 GB Max_CA 30 GB [2022-12-19 17:53:16,590] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 48.69 GB, percent = 24.8% [2022-12-19 17:53:20,495] [INFO] [utils.py:827:see_memory_usage] After initializing optimizer states [2022-12-19 17:53:20,496] [INFO] [utils.py:828:see_memory_usage] MA 6.0 GB Max_MA 6.0 GB CA 29.61 GB Max_CA 30 GB [2022-12-19 17:53:20,496] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 68.29 GB, percent = 34.7% [2022-12-19 17:53:20,496] [INFO] [stage_1_and_2.py:525:__init__] optimizer state initialized [2022-12-19 17:53:20,573] [INFO] [utils.py:827:see_memory_usage] After initializing ZeRO optimizer [2022-12-19 17:53:20,574] [INFO] [utils.py:828:see_memory_usage] MA 6.0 GB Max_MA 
6.0 GB CA 29.61 GB Max_CA 30 GB [2022-12-19 17:53:20,574] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 68.3 GB, percent = 34.7% [2022-12-19 17:53:20,603] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw [2022-12-19 17:53:20,604] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupDecayLR [2022-12-19 17:53:20,604] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-12-19 17:53:20,604] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-19 17:53:20,605] [INFO] [config.py:1020:print] DeepSpeedEngine configuration: [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] amp_enabled .................. False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] amp_params ................... False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] bfloat16_enabled ............. False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] checkpoint_parallel_write_pipeline False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] checkpoint_tag_validation_enabled True [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] checkpoint_tag_validation_fail False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] comms_config ................. [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] communication_data_type ...... None [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] curriculum_enabled ........... False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] curriculum_params ............ False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] dataloader_drop_last ......... False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] disable_allgather ............ False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] dump_state ................... False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1} [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] eigenvalue_enabled ........... False [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] eigenvalue_gas_boundary_resolution 1 [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] eigenvalue_layer_name ........ 
bert.encoder.layer [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] eigenvalue_layer_num ......... 0 [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] eigenvalue_max_iter .......... 100 [2022-12-19 17:53:20,606] [INFO] [config.py:1024:print] eigenvalue_stability ......... 1e-06 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] eigenvalue_tol ............... 0.01 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] eigenvalue_verbose ........... False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] elasticity_enabled ........... False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] fp16_auto_cast ............... False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] fp16_enabled ................. True [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] fp16_master_weights_and_gradients False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] global_rank .................. 0 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] grad_accum_dtype ............. None [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] gradient_accumulation_steps .. 2 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] gradient_clipping ............ 1.0 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] gradient_predivide_factor .... 1.0 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] initial_dynamic_scale ........ 65536 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] load_universal_checkpoint .... False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] loss_scale ................... 0 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] memory_breakdown ............. False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] monitor_config ............... 
[2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] optimizer_legacy_fusion ...... False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] optimizer_name ............... adamw [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] optimizer_params ............. {'lr': 1e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.0} [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] pld_enabled .................. False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] pld_params ................... False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] prescale_gradients ........... False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] scheduler_name ............... WarmupDecayLR [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] scheduler_params ............. {'last_batch_iteration': -1, 'total_num_steps': 5000, 'warmup_min_lr': 0, 'warmup_max_lr': 1e-05, 'warmup_num_steps': 500} [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] sparse_attention ............. None [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] sparse_gradients_enabled ..... False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] steps_per_print .............. 10 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] train_batch_size ............. 64 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] train_micro_batch_size_per_gpu 32 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] use_node_local_storage ....... 
False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] wall_clock_breakdown ......... False [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] world_size ................... 1 [2022-12-19 17:53:20,607] [INFO] [config.py:1024:print] zero_allow_untested_optimizer False [2022-12-19 17:53:20,608] [INFO] [config.py:1024:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [2022-12-19 17:53:20,608] [INFO] [config.py:1024:print] zero_enabled ................. True [2022-12-19 17:53:20,608] [INFO] [config.py:1024:print] zero_optimization_stage ...... 
2 [2022-12-19 17:53:20,608] [INFO] [config.py:1009:print_user_config] json = { "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 16, "hysteresis": 2, "min_loss_scale": 1 }, "optimizer": { "type": "AdamW", "params": { "lr": 1e-05, "betas": [0.9, 0.999], "eps": 1e-08, "weight_decay": 0.0 } }, "scheduler": { "type": "WarmupDecayLR", "params": { "last_batch_iteration": -1, "total_num_steps": 5.000000e+03, "warmup_min_lr": 0, "warmup_max_lr": 1e-05, "warmup_num_steps": 500 } }, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 2.000000e+08, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2.000000e+08, "contiguous_gradients": true }, "gradient_accumulation_steps": 2, "gradient_clipping": 1.0, "train_batch_size": 64, "train_micro_batch_size_per_gpu": 32 } Time to load utils op: 0.0003173351287841797 seconds [2022-12-19 17:53:20,609] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from ./checkpoint-4000/global_step4015/mp_rank_00_model_states.pt... [2022-12-19 17:53:21,616] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from ./checkpoint-4000/global_step4015/mp_rank_00_model_states.pt. [2022-12-19 17:53:21,742] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from ./checkpoint-4000/global_step4015/mp_rank_00_model_states.pt... [2022-12-19 17:53:22,870] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from ./checkpoint-4000/global_step4015/mp_rank_00_model_states.pt. [2022-12-19 17:53:23,240] [INFO] [torch_checkpoint_engine.py:21:load] [Torch] Loading checkpoint from ./checkpoint-4000/global_step4015/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-19 17:53:29,120] [INFO] [torch_checkpoint_engine.py:23:load] [Torch] Loaded checkpoint from ./checkpoint-4000/global_step4015/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2022-12-19 17:53:29,120] [INFO] [engine.py:2900:_get_all_zero_checkpoint_state_dicts] successfully read 1 ZeRO state_dicts for rank 0
[2022-12-19 17:53:30,350] [INFO] [engine.py:2840:_load_zero_checkpoint] loading 1 zero partition checkpoints for rank 0
{'train_runtime': 64619.8702, 'train_samples_per_second': 4.952, 'train_steps_per_second': 0.077, 'train_loss': 0.01643497463464737, 'epoch': 37.59}
12/19/2022 17:56:04 - WARNING - huggingface_hub.repository - Several commits (2) will be pushed upstream.
12/19/2022 17:56:04 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
12/19/2022 17:56:11 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...
remote: LFS file scan complete.
To https://huggingface.co/mikr/whisper-large2-hu-cv11
   ac2e03c..27a232c  main -> main
12/19/2022 17:56:51 - WARNING - huggingface_hub.repository - To https://huggingface.co/mikr/whisper-large2-hu-cv11
   27a232c..22308dd  main -> main
***** train metrics *****
  epoch                    = 37.59
  train_loss               = 0.0164
  train_runtime            = 17:56:59.87
  train_samples_per_second = 4.952
  train_steps_per_second   = 0.077
12/19/2022 17:56:54 - INFO - __main__ - *** Evaluate ***
***** eval metrics *****
  epoch                   = 37.59
  eval_loss               = 0.3247
  eval_runtime            = 0:23:28.84
  eval_samples_per_second = 3.287
  eval_steps_per_second   = 0.411
  eval_wer                = 15.5944
12/19/2022 18:21:23 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...
remote: LFS file scan complete.
To https://huggingface.co/mikr/whisper-large2-hu-cv11
   22308dd..d9d4375  main -> main