[2022-12-14 16:16:14,483] [WARNING] [runner.py:179:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. [2022-12-14 16:16:14,494] [INFO] [runner.py:508:main] cmd = /home/milan/hf_env/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 run_speech_recognition_seq2seq_streaming.py --deepspeed=ds_config.json --model_name_or_path=openai/whisper-small --dataset_name=facebook/voxpopuli --dataset_config_name=hr --language=croatian --train_split_name=train+validation --eval_split_name=test --model_index_name=Whisper Small Croatian --max_steps=5000 --output_dir=./ --per_device_train_batch_size=64 --per_device_eval_batch_size=32 --logging_steps=25 --learning_rate=1e-5 --warmup_steps=500 --evaluation_strategy=steps --eval_steps=1000 --save_strategy=steps --save_steps=1000 --generation_max_length=225 --length_column_name=input_length --max_duration_in_seconds=30 --text_column_name=normalized_text --freeze_feature_encoder=False --report_to=tensorboard --metric_for_best_model=wer --greater_is_better=False --load_best_model_at_end --gradient_checkpointing --fp16 --overwrite_output_dir --do_train --do_eval --predict_with_generate --do_normalize_eval --streaming --use_auth_token --push_to_hub [2022-12-14 16:16:16,035] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]} [2022-12-14 16:16:16,035] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0 [2022-12-14 16:16:16,035] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(, {'localhost': [0]}) [2022-12-14 16:16:16,035] [INFO] [launch.py:162:main] dist_world_size=1 [2022-12-14 16:16:16,035] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0 [2022-12-14 16:16:20,163] [INFO] [comm.py:654:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl 12/14/2022 16:16:20 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 
16-bits training: True 12/14/2022 16:16:20 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=ds_config.json, disable_tqdm=False, do_eval=True, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=1000, evaluation_strategy=steps, fp16=True, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_max_length=225, generation_num_beams=None, gradient_accumulation_steps=1, gradient_checkpointing=True, greater_is_better=False, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=1e-05, length_column_name=input_length, load_best_model_at_end=True, local_rank=0, log_level=passive, log_level_replica=passive, log_on_each_node=True, logging_dir=./runs/Dec14_16-16-20_129-146-123-136, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=25, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=5000, metric_for_best_model=wer, mp_parameters=, no_cuda=False, num_train_epochs=3.0, optim=adamw_hf, optim_args=None, output_dir=./, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=32, per_device_train_batch_size=64, predict_with_generate=True, prediction_loss_only=False, push_to_hub=True, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, 
report_to=['tensorboard'], resume_from_checkpoint=None, run_name=./, save_on_each_node=False, save_steps=1000, save_strategy=steps, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=500, weight_decay=0.0, xpu_backend=None, )
12/14/2022 16:16:21 - INFO - datasets.utils.file_utils - https://huggingface.co/datasets/facebook/voxpopuli/resolve/main/voxpopuli.py not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmpytvg4dlk 12/14/2022 16:16:21 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/facebook/voxpopuli/resolve/main/voxpopuli.py in cache at /home/milan/.cache/huggingface/datasets/downloads/106bf524483c334048ae062c58ec9b0e6b97d2b58ee9189e35d7de119584e588.e2ef81d0abcf78daf2af04b0007d1e3b9b865252392ed14408b8bf57cce986b7.py 12/14/2022 16:16:21 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/106bf524483c334048ae062c58ec9b0e6b97d2b58ee9189e35d7de119584e588.e2ef81d0abcf78daf2af04b0007d1e3b9b865252392ed14408b8bf57cce986b7.py 12/14/2022 16:16:22 - INFO - datasets.utils.file_utils -
https://huggingface.co/datasets/facebook/voxpopuli/resolve/main/README.md not found in cache or force_download set to True, downloading to /home/milan/.cache/huggingface/datasets/downloads/tmpecycnlp_ 12/14/2022 16:16:22 - INFO - datasets.utils.file_utils - storing https://huggingface.co/datasets/facebook/voxpopuli/resolve/main/README.md in cache at /home/milan/.cache/huggingface/datasets/downloads/64e94143cbc03db672e215dcfdbc3f01c69ce8fd2fafc243def70ffe62bfcea6.e94f2c74349ad27464fb7b6584da6f25bff8564f6df09f92efdd7436695ce51b 12/14/2022 16:16:22 - INFO - datasets.utils.file_utils - creating metadata file for /home/milan/.cache/huggingface/datasets/downloads/64e94143cbc03db672e215dcfdbc3f01c69ce8fd2fafc243def70ffe62bfcea6.e94f2c74349ad27464fb7b6584da6f25bff8564f6df09f92efdd7436695ce51b 12/14/2022 16:16:22 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/facebook--voxpopuli/b5ff837284f0778eefe0f642734e142d8c3f574eba8c9c8a4b13602297f73604 12/14/2022 16:16:24 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/facebook--voxpopuli/b5ff837284f0778eefe0f642734e142d8c3f574eba8c9c8a4b13602297f73604 12/14/2022 16:16:25 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/facebook--voxpopuli/b5ff837284f0778eefe0f642734e142d8c3f574eba8c9c8a4b13602297f73604 12/14/2022 16:16:50 - WARNING - huggingface_hub.repository - /home/milan/whisper-small-hr-vox/./ is already a clone of https://huggingface.co/mikr/whisper-small-hr-vox. Make sure you pull the latest changes with `repo.git_pull()`. 
[2022-12-14 16:16:54,254] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown [2022-12-14 16:16:54,570] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2022-12-14 16:16:55,696] [WARNING] [cpu_adam.py:83:__init__] FP16 params for CPUAdam may not work on AMD CPUs Installed CUDA version 11.6 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination [1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/TH -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o [2/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" 
-I/home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/TH -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -c /home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o [3/3] c++ cpu_adam.o custom_cuda_kernel.cuda.o -shared -lcurand -L/home/milan/hf_env/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/lib64 -lcudart -o cpu_adam.so Time to load cpu_adam op: 27.422913312911987 seconds Adam Optimizer #0 is created with AVX2 arithmetic capability. 
Config: alpha=0.000010, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1 [2022-12-14 16:17:24,680] [INFO] [logging.py:68:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer [2022-12-14 16:17:24,726] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam [2022-12-14 16:17:24,726] [INFO] [utils.py:52:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type= [2022-12-14 16:17:24,727] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer [2022-12-14 16:17:24,727] [INFO] [stage_1_and_2.py:140:__init__] Reduce bucket size 200000000 [2022-12-14 16:17:24,727] [INFO] [stage_1_and_2.py:141:__init__] Allgather bucket size 200000000 [2022-12-14 16:17:24,727] [INFO] [stage_1_and_2.py:142:__init__] CPU Offload: True [2022-12-14 16:17:24,727] [INFO] [stage_1_and_2.py:143:__init__] Round robin gradient partitioning: False [1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/TH -isystem /home/milan/hf_env/lib/python3.8/site-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/milan/hf_env/lib/python3.8/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o [2/2] c++ flatten_unflatten.o -shared -L/home/milan/hf_env/lib/python3.8/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o utils.so Time to load utils op: 14.912862777709961 seconds Rank: 0 partition count [1] and sizes[(241734912, False)] [2022-12-14 16:17:40,429] [INFO] [utils.py:827:see_memory_usage] Before initializing 
optimizer states [2022-12-14 16:17:40,430] [INFO] [utils.py:828:see_memory_usage] MA 0.53 GB Max_MA 0.53 GB CA 0.94 GB Max_CA 1 GB [2022-12-14 16:17:40,430] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 7.45 GB, percent = 3.8% [2022-12-14 16:17:41,202] [INFO] [utils.py:827:see_memory_usage] After initializing optimizer states [2022-12-14 16:17:41,202] [INFO] [utils.py:828:see_memory_usage] MA 0.53 GB Max_MA 0.53 GB CA 0.94 GB Max_CA 1 GB [2022-12-14 16:17:41,203] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 10.27 GB, percent = 5.2% [2022-12-14 16:17:41,203] [INFO] [stage_1_and_2.py:525:__init__] optimizer state initialized [2022-12-14 16:17:41,269] [INFO] [utils.py:827:see_memory_usage] After initializing ZeRO optimizer [2022-12-14 16:17:41,270] [INFO] [utils.py:828:see_memory_usage] MA 0.53 GB Max_MA 0.53 GB CA 0.94 GB Max_CA 1 GB [2022-12-14 16:17:41,270] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 10.27 GB, percent = 5.2% [2022-12-14 16:17:41,279] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw [2022-12-14 16:17:41,279] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupLR [2022-12-14 16:17:41,279] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-12-14 16:17:41,279] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 16:17:41,280] [INFO] [config.py:1020:print] DeepSpeedEngine configuration: [2022-12-14 16:17:41,280] [INFO] [config.py:1024:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2022-12-14 16:17:41,280] [INFO] [config.py:1024:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] amp_enabled .................. False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] amp_params ................... False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] bfloat16_enabled ............. False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] checkpoint_parallel_write_pipeline False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] checkpoint_tag_validation_enabled True [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] checkpoint_tag_validation_fail False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] comms_config ................. [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] communication_data_type ...... None [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] curriculum_enabled ........... False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] curriculum_params ............ False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] dataloader_drop_last ......... False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] disable_allgather ............ False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] dump_state ................... False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1} [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] eigenvalue_enabled ........... False [2022-12-14 16:17:41,281] [INFO] [config.py:1024:print] eigenvalue_gas_boundary_resolution 1 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] eigenvalue_layer_name ........ 
bert.encoder.layer [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] eigenvalue_layer_num ......... 0 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] eigenvalue_max_iter .......... 100 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] eigenvalue_stability ......... 1e-06 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] eigenvalue_tol ............... 0.01 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] eigenvalue_verbose ........... False [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] elasticity_enabled ........... False [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] fp16_auto_cast ............... False [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] fp16_enabled ................. True [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] fp16_master_weights_and_gradients False [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] global_rank .................. 0 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] grad_accum_dtype ............. None [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] gradient_accumulation_steps .. 1 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] gradient_clipping ............ 1.0 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] gradient_predivide_factor .... 1.0 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] initial_dynamic_scale ........ 65536 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] load_universal_checkpoint .... False [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] loss_scale ................... 0 [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] memory_breakdown ............. False [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] monitor_config ............... 
[2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] optimizer_legacy_fusion ...... False [2022-12-14 16:17:41,282] [INFO] [config.py:1024:print] optimizer_name ............... adamw [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] optimizer_params ............. {'lr': 1e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.0} [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] pld_enabled .................. False [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] pld_params ................... False [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] prescale_gradients ........... False [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] scheduler_name ............... WarmupLR [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] scheduler_params ............. {'warmup_min_lr': 0, 'warmup_max_lr': 1e-05, 'warmup_num_steps': 500} [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] sparse_attention ............. None [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] sparse_gradients_enabled ..... False [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] steps_per_print .............. 10 [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] train_batch_size ............. 64 [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] train_micro_batch_size_per_gpu 64 [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] use_node_local_storage ....... False [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] wall_clock_breakdown ......... 
False [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] world_size ................... 1 [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] zero_allow_untested_optimizer False [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] zero_enabled ................. True [2022-12-14 16:17:41,283] [INFO] [config.py:1024:print] zero_optimization_stage ...... 
2 [2022-12-14 16:17:41,284] [INFO] [config.py:1009:print_user_config] json = { "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 16, "hysteresis": 2, "min_loss_scale": 1 }, "optimizer": { "type": "AdamW", "params": { "lr": 1e-05, "betas": [0.9, 0.999], "eps": 1e-08, "weight_decay": 0.0 } }, "scheduler": { "type": "WarmupLR", "params": { "warmup_min_lr": 0, "warmup_max_lr": 1e-05, "warmup_num_steps": 500 } }, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 2.000000e+08, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2.000000e+08, "contiguous_gradients": true }, "gradient_accumulation_steps": 1, "gradient_clipping": 1.0, "train_batch_size": 64, "train_micro_batch_size_per_gpu": 64 } Time to load utils op: 0.0003387928009033203 seconds [2022-12-14 16:19:01,227] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 65536 [2022-12-14 16:19:08,720] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768.0 [2022-12-14 16:19:15,883] [INFO] [timer.py:197:stop] 0/3, RunningAvgSamplesPerSec=29.600569261531366, CurrSamplesPerSec=29.600569261531366, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:19:22,453] [INFO] [timer.py:197:stop] 0/4, RunningAvgSamplesPerSec=29.84915059859713, CurrSamplesPerSec=30.10194239536303, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:19:29,055] [INFO] [timer.py:197:stop] 0/5, RunningAvgSamplesPerSec=29.860358523922898, CurrSamplesPerSec=29.882799644036442, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:19:35,856] [INFO] [timer.py:197:stop] 0/6, RunningAvgSamplesPerSec=29.738395511956575, CurrSamplesPerSec=29.378411930087303, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:19:42,753] [INFO] [timer.py:197:stop] 0/7, RunningAvgSamplesPerSec=29.8687180635254, CurrSamplesPerSec=30.40163431370184, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:19:49,351] [INFO] [timer.py:197:stop] 0/8, RunningAvgSamplesPerSec=29.89304192333273, CurrSamplesPerSec=30.01525790256393, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:19:56,629] [INFO] [timer.py:197:stop] 0/9, RunningAvgSamplesPerSec=29.940243610774065, CurrSamplesPerSec=30.226614036065587, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:20:03,623] [INFO] [logging.py:68:log_dist] [Rank 0] step=10, skipped=2, lr=[3.3460541819326935e-06], mom=[[0.9, 0.999]] [2022-12-14 16:20:03,623] [INFO] [timer.py:197:stop] 0/10, RunningAvgSamplesPerSec=29.907359011353645, CurrSamplesPerSec=29.679174019602694, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:20:11,000] [INFO] [timer.py:197:stop] 0/11, RunningAvgSamplesPerSec=29.902437357989612, CurrSamplesPerSec=29.86312236889635, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:20:17,763] [INFO] [timer.py:197:stop] 0/12, RunningAvgSamplesPerSec=29.86817169313274, CurrSamplesPerSec=29.56327853542334, 
MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:20:24,057] [INFO] [timer.py:197:stop] 0/13, RunningAvgSamplesPerSec=29.837106570960852, CurrSamplesPerSec=29.52997286879566, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:20:30,994] [INFO] [timer.py:197:stop] 0/14, RunningAvgSamplesPerSec=29.79637836635386, CurrSamplesPerSec=29.355598064273707, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:20:37,212] [INFO] [timer.py:197:stop] 0/15, RunningAvgSamplesPerSec=29.825892070138377, CurrSamplesPerSec=30.184671832841286, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:20:44,782] [INFO] [timer.py:197:stop] 0/16, RunningAvgSamplesPerSec=29.840228129760057, CurrSamplesPerSec=30.027858906049715, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:20:51,255] [INFO] [timer.py:197:stop] 0/17, RunningAvgSamplesPerSec=29.884671093684663, CurrSamplesPerSec=30.521068864924246, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:20:57,498] [INFO] [timer.py:197:stop] 0/18, RunningAvgSamplesPerSec=29.890886340636435, CurrSamplesPerSec=29.98442624269254, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:21:04,165] [INFO] [timer.py:197:stop] 0/19, RunningAvgSamplesPerSec=29.882107919579553, CurrSamplesPerSec=29.74235113697945, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:21:10,795] [INFO] [logging.py:68:log_dist] [Rank 0] step=20, skipped=2, lr=[4.650931663140581e-06], mom=[[0.9, 0.999]] [2022-12-14 16:21:10,796] [INFO] [timer.py:197:stop] 0/20, RunningAvgSamplesPerSec=29.867278880670913, CurrSamplesPerSec=29.61741821420407, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:21:17,272] [INFO] [timer.py:197:stop] 0/21, RunningAvgSamplesPerSec=29.844038356753472, CurrSamplesPerSec=29.431808249247034, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:21:24,087] [INFO] [timer.py:197:stop] 0/22, RunningAvgSamplesPerSec=29.8036097174558, CurrSamplesPerSec=29.055754893608178, 
MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:21:31,311] [INFO] [timer.py:197:stop] 0/23, RunningAvgSamplesPerSec=29.828601725834293, CurrSamplesPerSec=30.337394059176777, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:21:38,722] [INFO] [timer.py:197:stop] 0/24, RunningAvgSamplesPerSec=29.819506304936226, CurrSamplesPerSec=29.629775624935704, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:21:45,429] [INFO] [timer.py:197:stop] 0/25, RunningAvgSamplesPerSec=29.838995717230414, CurrSamplesPerSec=30.27430218153701, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 1.1454, 'learning_rate': 5.0453611334320685e-06, 'epoch': 0.01} [2022-12-14 16:21:52,173] [INFO] [timer.py:197:stop] 0/26, RunningAvgSamplesPerSec=29.848569209233407, CurrSamplesPerSec=30.070467621386197, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:21:59,549] [INFO] [timer.py:197:stop] 0/27, RunningAvgSamplesPerSec=29.84642259063844, CurrSamplesPerSec=29.794996211740298, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:06,836] [INFO] [timer.py:197:stop] 0/28, RunningAvgSamplesPerSec=29.83080926488274, CurrSamplesPerSec=29.44571657612171, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:13,897] [INFO] [timer.py:197:stop] 0/29, RunningAvgSamplesPerSec=29.798796059547932, CurrSamplesPerSec=28.9899154554482, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:20,873] [INFO] [logging.py:68:log_dist] [Rank 0] step=30, skipped=2, lr=[5.361890013661856e-06], mom=[[0.9, 0.999]] [2022-12-14 16:22:20,874] [INFO] [timer.py:197:stop] 0/30, RunningAvgSamplesPerSec=29.782128428164537, CurrSamplesPerSec=29.33904560362086, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:27,598] [INFO] [timer.py:197:stop] 0/31, RunningAvgSamplesPerSec=29.79391391839002, CurrSamplesPerSec=30.127737078406284, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:34,420] [INFO] [timer.py:197:stop] 0/32, 
RunningAvgSamplesPerSec=29.77782222578812, CurrSamplesPerSec=29.318607812202202, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:36,804] [INFO] [timer.py:197:stop] 0/33, RunningAvgSamplesPerSec=29.815271188935967, CurrSamplesPerSec=30.98425690543709, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:38,921] [INFO] [timer.py:197:stop] 0/34, RunningAvgSamplesPerSec=29.838762334848617, CurrSamplesPerSec=30.585807894767726, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:41,059] [INFO] [timer.py:197:stop] 0/35, RunningAvgSamplesPerSec=29.85245103221372, CurrSamplesPerSec=30.297219581472785, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:43,229] [INFO] [timer.py:197:stop] 0/36, RunningAvgSamplesPerSec=29.853716367740542, CurrSamplesPerSec=29.895532700546468, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:45,414] [INFO] [timer.py:197:stop] 0/37, RunningAvgSamplesPerSec=29.847766064629422, CurrSamplesPerSec=29.646857583115096, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:47,558] [INFO] [timer.py:197:stop] 0/38, RunningAvgSamplesPerSec=29.858230162057044, CurrSamplesPerSec=30.22915333852326, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:49,715] [INFO] [timer.py:197:stop] 0/39, RunningAvgSamplesPerSec=29.86308115526223, CurrSamplesPerSec=30.038772874542357, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:22:51,612] [INFO] [logging.py:68:log_dist] [Rank 0] step=40, skipped=2, lr=[5.853283267612517e-06], mom=[[0.9, 0.999]] [2022-12-14 16:22:51,613] [INFO] [timer.py:197:stop] 0/40, RunningAvgSamplesPerSec=29.962941114579873, CurrSamplesPerSec=34.1935478775732, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:24:12,666] [INFO] [timer.py:197:stop] 0/41, RunningAvgSamplesPerSec=29.970648309388704, CurrSamplesPerSec=30.266488744867694, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:24:19,348] [INFO] [timer.py:197:stop] 0/42, 
RunningAvgSamplesPerSec=29.967836807384465, CurrSamplesPerSec=29.85859816825456, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:24:26,128] [INFO] [timer.py:197:stop] 0/43, RunningAvgSamplesPerSec=29.915549602132852, CurrSamplesPerSec=27.96391714398192, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:24:32,349] [INFO] [timer.py:197:stop] 0/44, RunningAvgSamplesPerSec=29.924257849065697, CurrSamplesPerSec=30.285713834421983, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:24:38,469] [INFO] [timer.py:197:stop] 0/45, RunningAvgSamplesPerSec=29.933067535113366, CurrSamplesPerSec=30.30781697264223, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:24:44,757] [INFO] [timer.py:197:stop] 0/46, RunningAvgSamplesPerSec=29.92950799097278, CurrSamplesPerSec=29.777244382422875, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:24:51,275] [INFO] [timer.py:197:stop] 0/47, RunningAvgSamplesPerSec=29.921765216164907, CurrSamplesPerSec=29.585004544070873, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:24:57,333] [INFO] [timer.py:197:stop] 0/48, RunningAvgSamplesPerSec=29.93684697840389, CurrSamplesPerSec=30.631627222624974, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:25:04,379] [INFO] [timer.py:197:stop] 0/49, RunningAvgSamplesPerSec=29.945457187529467, CurrSamplesPerSec=30.346952574679083, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:25:10,893] [INFO] [logging.py:68:log_dist] [Rank 0] step=50, skipped=2, lr=[6.229195710491767e-06], mom=[[0.9, 0.999]] [2022-12-14 16:25:10,894] [INFO] [timer.py:197:stop] 0/50, RunningAvgSamplesPerSec=29.936708270411458, CurrSamplesPerSec=29.531197613525528, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.6809, 'learning_rate': 6.229195710491767e-06, 'epoch': 1.0} [2022-12-14 16:25:17,390] [INFO] [timer.py:197:stop] 0/51, RunningAvgSamplesPerSec=29.923998528458622, CurrSamplesPerSec=29.326368767303713, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 16:25:23,613] [INFO] [timer.py:197:stop] 0/52, RunningAvgSamplesPerSec=29.933519342684296, CurrSamplesPerSec=30.407578325703934, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:25:29,915] [INFO] [timer.py:197:stop] 0/53, RunningAvgSamplesPerSec=29.929955276297893, CurrSamplesPerSec=29.752827669312406, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:25:36,916] [INFO] [timer.py:197:stop] 0/54, RunningAvgSamplesPerSec=29.92547656948279, CurrSamplesPerSec=29.698826409611765, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:25:42,876] [INFO] [timer.py:197:stop] 0/55, RunningAvgSamplesPerSec=29.93065553177582, CurrSamplesPerSec=30.202454154083505, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:25:49,272] [INFO] [timer.py:197:stop] 0/56, RunningAvgSamplesPerSec=29.899238839639505, CurrSamplesPerSec=28.323559197599526, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:25:55,606] [INFO] [timer.py:197:stop] 0/57, RunningAvgSamplesPerSec=29.89069645566325, CurrSamplesPerSec=29.436546193811203, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:26:02,276] [INFO] [timer.py:197:stop] 0/58, RunningAvgSamplesPerSec=29.893624214223227, CurrSamplesPerSec=30.055538972204985, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:26:08,719] [INFO] [timer.py:197:stop] 0/59, RunningAvgSamplesPerSec=29.886494804178632, CurrSamplesPerSec=29.492603706809728, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:26:14,917] [INFO] [logging.py:68:log_dist] [Rank 0] step=60, skipped=2, lr=[6.533707268809618e-06], mom=[[0.9, 0.999]] [2022-12-14 16:26:14,918] [INFO] [timer.py:197:stop] 0/60, RunningAvgSamplesPerSec=29.889899928001835, CurrSamplesPerSec=30.085282977251627, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:26:21,235] [INFO] [timer.py:197:stop] 0/61, RunningAvgSamplesPerSec=29.895606343684157, CurrSamplesPerSec=30.23034825175896, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 16:26:27,350] [INFO] [timer.py:197:stop] 0/62, RunningAvgSamplesPerSec=29.883619954645205, CurrSamplesPerSec=29.19304253334299, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:26:35,474] [INFO] [timer.py:197:stop] 0/63, RunningAvgSamplesPerSec=29.879221451621035, CurrSamplesPerSec=29.617660032848754, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:26:41,644] [INFO] [timer.py:197:stop] 0/64, RunningAvgSamplesPerSec=29.87529914928358, CurrSamplesPerSec=29.637970544145965, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:26:48,110] [INFO] [timer.py:197:stop] 0/65, RunningAvgSamplesPerSec=29.870454393485574, CurrSamplesPerSec=29.573117752775172, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:26:54,077] [INFO] [timer.py:197:stop] 0/66, RunningAvgSamplesPerSec=29.870881651018696, CurrSamplesPerSec=29.897823538819935, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:00,909] [INFO] [timer.py:197:stop] 0/67, RunningAvgSamplesPerSec=29.87632228435029, CurrSamplesPerSec=30.228693781661896, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:07,172] [INFO] [timer.py:197:stop] 0/68, RunningAvgSamplesPerSec=29.873972757703843, CurrSamplesPerSec=29.722042161636438, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:13,579] [INFO] [timer.py:197:stop] 0/69, RunningAvgSamplesPerSec=29.87591488697557, CurrSamplesPerSec=30.00465614332373, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:20,026] [INFO] [logging.py:68:log_dist] [Rank 0] step=70, skipped=2, lr=[6.7896601657751925e-06], mom=[[0.9, 0.999]] [2022-12-14 16:27:20,027] [INFO] [timer.py:197:stop] 0/70, RunningAvgSamplesPerSec=29.874000743173745, CurrSamplesPerSec=29.74630946259818, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:26,280] [INFO] [timer.py:197:stop] 0/71, RunningAvgSamplesPerSec=29.87049695465629, CurrSamplesPerSec=29.634152227445416, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 16:27:32,641] [INFO] [timer.py:197:stop] 0/72, RunningAvgSamplesPerSec=29.84548165815693, CurrSamplesPerSec=28.215083637108194, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:35,527] [INFO] [timer.py:197:stop] 0/73, RunningAvgSamplesPerSec=29.84350267426596, CurrSamplesPerSec=29.70562296105138, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:37,710] [INFO] [timer.py:197:stop] 0/74, RunningAvgSamplesPerSec=29.841297256115247, CurrSamplesPerSec=29.685541368316066, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:39,889] [INFO] [timer.py:197:stop] 0/75, RunningAvgSamplesPerSec=29.839675017515923, CurrSamplesPerSec=29.723335549739872, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.5112, 'learning_rate': 6.903829450223392e-06, 'epoch': 1.01} [2022-12-14 16:27:42,044] [INFO] [timer.py:197:stop] 0/76, RunningAvgSamplesPerSec=29.84288306778642, CurrSamplesPerSec=30.07894860395078, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:44,204] [INFO] [timer.py:197:stop] 0/77, RunningAvgSamplesPerSec=29.844626360714454, CurrSamplesPerSec=29.974197679182442, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:46,327] [INFO] [timer.py:197:stop] 0/78, RunningAvgSamplesPerSec=29.853271035650735, CurrSamplesPerSec=30.516211301170568, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:48,474] [INFO] [timer.py:197:stop] 0/79, RunningAvgSamplesPerSec=29.85746118631038, CurrSamplesPerSec=30.179391434929336, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:27:50,396] [INFO] [logging.py:68:log_dist] [Rank 0] step=80, skipped=2, lr=[7.010432126517687e-06], mom=[[0.9, 0.999]] [2022-12-14 16:27:50,397] [INFO] [timer.py:197:stop] 0/80, RunningAvgSamplesPerSec=29.901887840184944, CurrSamplesPerSec=33.771141414456565, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:28:32,721] [INFO] [timer.py:197:stop] 0/81, 
RunningAvgSamplesPerSec=29.910223767557383, CurrSamplesPerSec=30.575063998118356, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:28:39,414] [INFO] [timer.py:197:stop] 0/82, RunningAvgSamplesPerSec=29.90696612188145, CurrSamplesPerSec=29.651835343147045, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:28:46,210] [INFO] [timer.py:197:stop] 0/83, RunningAvgSamplesPerSec=29.909513766649606, CurrSamplesPerSec=30.114741305460907, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:28:52,560] [INFO] [timer.py:197:stop] 0/84, RunningAvgSamplesPerSec=29.900797653094696, CurrSamplesPerSec=29.21127419310433, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:00,548] [INFO] [timer.py:197:stop] 0/85, RunningAvgSamplesPerSec=29.880049346772864, CurrSamplesPerSec=28.27140119897457, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:07,189] [INFO] [timer.py:197:stop] 0/86, RunningAvgSamplesPerSec=29.875095643265876, CurrSamplesPerSec=29.469586331958048, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:13,780] [INFO] [timer.py:197:stop] 0/87, RunningAvgSamplesPerSec=29.873480942036643, CurrSamplesPerSec=29.738466344800997, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:20,389] [INFO] [timer.py:197:stop] 0/88, RunningAvgSamplesPerSec=29.870455495306782, CurrSamplesPerSec=29.615513213515204, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:27,129] [INFO] [timer.py:197:stop] 0/89, RunningAvgSamplesPerSec=29.875796502723066, CurrSamplesPerSec=30.342380061133404, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:33,382] [INFO] [logging.py:68:log_dist] [Rank 0] step=90, skipped=2, lr=[7.204536060149867e-06], mom=[[0.9, 0.999]] [2022-12-14 16:29:33,383] [INFO] [timer.py:197:stop] 0/90, RunningAvgSamplesPerSec=29.869865350806034, CurrSamplesPerSec=29.36271696580997, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:40,351] [INFO] [timer.py:197:stop] 0/91, 
RunningAvgSamplesPerSec=29.868044681571746, CurrSamplesPerSec=29.708690315305468, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:46,887] [INFO] [timer.py:197:stop] 0/92, RunningAvgSamplesPerSec=29.86866381701748, CurrSamplesPerSec=29.923869862628116, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:53,102] [INFO] [timer.py:197:stop] 0/93, RunningAvgSamplesPerSec=29.865531586317697, CurrSamplesPerSec=29.58629581550289, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:29:59,546] [INFO] [timer.py:197:stop] 0/94, RunningAvgSamplesPerSec=29.870006835274786, CurrSamplesPerSec=30.28294637614371, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:30:05,944] [INFO] [timer.py:197:stop] 0/95, RunningAvgSamplesPerSec=29.874263124768145, CurrSamplesPerSec=30.27109985730257, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:30:14,446] [INFO] [timer.py:197:stop] 0/96, RunningAvgSamplesPerSec=29.86604516278834, CurrSamplesPerSec=29.121044226897173, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:30:21,070] [INFO] [timer.py:197:stop] 0/97, RunningAvgSamplesPerSec=29.867700867887265, CurrSamplesPerSec=30.024161112259424, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:30:27,501] [INFO] [timer.py:197:stop] 0/98, RunningAvgSamplesPerSec=29.871347653688392, CurrSamplesPerSec=30.221900778527722, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:30:34,630] [INFO] [timer.py:197:stop] 0/99, RunningAvgSamplesPerSec=29.876555553069753, CurrSamplesPerSec=30.38511280850852, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:30:41,121] [INFO] [logging.py:68:log_dist] [Rank 0] step=100, skipped=2, lr=[7.377725845391017e-06], mom=[[0.9, 0.999]] [2022-12-14 16:30:41,122] [INFO] [timer.py:197:stop] 0/100, RunningAvgSamplesPerSec=29.880088935430027, CurrSamplesPerSec=30.22684548296153, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.3992, 'learning_rate': 7.377725845391017e-06, 'epoch': 
2.0} [2022-12-14 16:30:47,576] [INFO] [timer.py:197:stop] 0/101, RunningAvgSamplesPerSec=29.882057356650414, CurrSamplesPerSec=30.07622891216638, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:30:53,800] [INFO] [timer.py:197:stop] 0/102, RunningAvgSamplesPerSec=29.877705999429168, CurrSamplesPerSec=29.45310547131088, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:31:00,954] [INFO] [timer.py:197:stop] 0/103, RunningAvgSamplesPerSec=29.876934423208343, CurrSamplesPerSec=29.799977530854, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:31:08,257] [INFO] [timer.py:197:stop] 0/104, RunningAvgSamplesPerSec=29.875741221222004, CurrSamplesPerSec=29.755716771416825, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:31:15,420] [INFO] [timer.py:197:stop] 0/105, RunningAvgSamplesPerSec=29.880548017566653, CurrSamplesPerSec=30.379101938528244, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:31:21,810] [INFO] [timer.py:197:stop] 0/106, RunningAvgSamplesPerSec=29.886416017391934, CurrSamplesPerSec=30.50341902252864, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:31:29,165] [INFO] [timer.py:197:stop] 0/107, RunningAvgSamplesPerSec=29.881744953860476, CurrSamplesPerSec=29.4037990737055, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:31:36,262] [INFO] [timer.py:197:stop] 0/108, RunningAvgSamplesPerSec=29.871422054696275, CurrSamplesPerSec=28.825819368241685, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:31:43,322] [INFO] [timer.py:197:stop] 0/109, RunningAvgSamplesPerSec=29.86719284968771, CurrSamplesPerSec=29.4255879791084, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:31:51,521] [INFO] [logging.py:68:log_dist] [Rank 0] step=110, skipped=2, lr=[7.5340731916996546e-06], mom=[[0.9, 0.999]] [2022-12-14 16:31:51,521] [INFO] [timer.py:197:stop] 0/110, RunningAvgSamplesPerSec=29.870941853730013, CurrSamplesPerSec=30.27759739069556, MemAllocated=0.53GB, MaxMemAllocated=17.47GB 
[2022-12-14 16:31:58,443] [INFO] [timer.py:197:stop] 0/111, RunningAvgSamplesPerSec=29.87520709214369, CurrSamplesPerSec=30.34313461958872, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:32:05,126] [INFO] [timer.py:197:stop] 0/112, RunningAvgSamplesPerSec=29.87780408333542, CurrSamplesPerSec=30.163608772135834, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:32:07,505] [INFO] [timer.py:197:stop] 0/113, RunningAvgSamplesPerSec=29.881818444378382, CurrSamplesPerSec=30.33008262048034, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:32:09,766] [INFO] [timer.py:197:stop] 0/114, RunningAvgSamplesPerSec=29.870171326592605, CurrSamplesPerSec=28.631438649619344, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:32:11,916] [INFO] [timer.py:197:stop] 0/115, RunningAvgSamplesPerSec=29.872571740612898, CurrSamplesPerSec=30.143881639579124, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:32:14,102] [INFO] [timer.py:197:stop] 0/116, RunningAvgSamplesPerSec=29.87038578645115, CurrSamplesPerSec=29.625416662757967, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:32:16,274] [INFO] [timer.py:197:stop] 0/117, RunningAvgSamplesPerSec=29.873335758051173, CurrSamplesPerSec=30.21349543216306, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:32:18,500] [INFO] [timer.py:197:stop] 0/118, RunningAvgSamplesPerSec=29.868307046180544, CurrSamplesPerSec=29.3010831127219, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:32:20,661] [INFO] [timer.py:197:stop] 0/119, RunningAvgSamplesPerSec=29.87283275972825, CurrSamplesPerSec=30.407288992055616, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:32:22,522] [INFO] [logging.py:68:log_dist] [Rank 0] step=120, skipped=2, lr=[7.676565519355727e-06], mom=[[0.9, 0.999]] [2022-12-14 16:32:22,522] [INFO] [timer.py:197:stop] 0/120, RunningAvgSamplesPerSec=29.909364193458057, CurrSamplesPerSec=34.90329605345792, MemAllocated=0.53GB, MaxMemAllocated=17.47GB 
[2022-12-14 16:33:04,127] [INFO] [timer.py:197:stop] 0/121, RunningAvgSamplesPerSec=29.911876844300526, CurrSamplesPerSec=30.21136337443817, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:33:10,170] [INFO] [timer.py:197:stop] 0/122, RunningAvgSamplesPerSec=29.908932172072607, CurrSamplesPerSec=29.562607844113302, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:33:18,146] [INFO] [timer.py:197:stop] 0/123, RunningAvgSamplesPerSec=29.909881954328096, CurrSamplesPerSec=30.024295439415305, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:33:24,693] [INFO] [timer.py:197:stop] 0/124, RunningAvgSamplesPerSec=29.90567085183338, CurrSamplesPerSec=29.404733142381172, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:33:31,608] [INFO] [timer.py:197:stop] 0/125, RunningAvgSamplesPerSec=29.90631773337061, CurrSamplesPerSec=29.985447808251635, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.3358, 'learning_rate': 7.743343231239583e-06, 'epoch': 3.0} [2022-12-14 16:33:38,458] [INFO] [timer.py:197:stop] 0/126, RunningAvgSamplesPerSec=29.90886748799945, CurrSamplesPerSec=30.22583803410272, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:33:45,022] [INFO] [timer.py:197:stop] 0/127, RunningAvgSamplesPerSec=29.908483296224187, CurrSamplesPerSec=29.860919888528024, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:33:51,120] [INFO] [timer.py:197:stop] 0/128, RunningAvgSamplesPerSec=29.905950933985928, CurrSamplesPerSec=29.592747339099116, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:33:57,483] [INFO] [timer.py:197:stop] 0/129, RunningAvgSamplesPerSec=29.90300921506578, CurrSamplesPerSec=29.53692634932309, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:34:04,064] [INFO] [logging.py:68:log_dist] [Rank 0] step=130, skipped=2, lr=[7.807459757842952e-06], mom=[[0.9, 0.999]] [2022-12-14 16:34:04,065] [INFO] [timer.py:197:stop] 0/130, RunningAvgSamplesPerSec=29.906086703096868, 
CurrSamplesPerSec=30.302144496625832, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:34:11,245] [INFO] [timer.py:197:stop] 0/131, RunningAvgSamplesPerSec=29.907799224541503, CurrSamplesPerSec=30.12863316903949, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:34:17,251] [INFO] [timer.py:197:stop] 0/132, RunningAvgSamplesPerSec=29.91030660270893, CurrSamplesPerSec=30.237322165174902, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:34:23,115] [INFO] [timer.py:197:stop] 0/133, RunningAvgSamplesPerSec=29.904425082389203, CurrSamplesPerSec=29.159032278282943, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:34:29,397] [INFO] [timer.py:197:stop] 0/134, RunningAvgSamplesPerSec=29.90399273353205, CurrSamplesPerSec=29.847462917146313, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:34:35,609] [INFO] [timer.py:197:stop] 0/135, RunningAvgSamplesPerSec=29.895897067159932, CurrSamplesPerSec=28.864418593198145, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:34:42,768] [INFO] [timer.py:197:stop] 0/136, RunningAvgSamplesPerSec=29.881730721352636, CurrSamplesPerSec=28.110149586142146, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:34:49,054] [INFO] [timer.py:197:stop] 0/137, RunningAvgSamplesPerSec=29.88144481168244, CurrSamplesPerSec=29.843182339438705, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:34:55,248] [INFO] [timer.py:197:stop] 0/138, RunningAvgSamplesPerSec=29.882595254226466, CurrSamplesPerSec=30.038722453127765, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:35:01,701] [INFO] [timer.py:197:stop] 0/139, RunningAvgSamplesPerSec=29.882899321443116, CurrSamplesPerSec=29.924310190278536, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:35:07,703] [INFO] [logging.py:68:log_dist] [Rank 0] step=140, skipped=2, lr=[7.928502661991142e-06], mom=[[0.9, 0.999]] [2022-12-14 16:35:07,704] [INFO] [timer.py:197:stop] 0/140, 
RunningAvgSamplesPerSec=29.881293437822908, CurrSamplesPerSec=29.662907025147042, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:35:14,151] [INFO] [timer.py:197:stop] 0/141, RunningAvgSamplesPerSec=29.882948600597054, CurrSamplesPerSec=30.113133245802892, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:35:20,851] [INFO] [timer.py:197:stop] 0/142, RunningAvgSamplesPerSec=29.882069588442416, CurrSamplesPerSec=29.760388013558035, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:35:28,341] [INFO] [timer.py:197:stop] 0/143, RunningAvgSamplesPerSec=29.87650998317363, CurrSamplesPerSec=29.118065427374287, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:35:35,007] [INFO] [timer.py:197:stop] 0/144, RunningAvgSamplesPerSec=29.878907635319162, CurrSamplesPerSec=30.22087324536063, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:35:41,232] [INFO] [timer.py:197:stop] 0/145, RunningAvgSamplesPerSec=29.882533726480787, CurrSamplesPerSec=30.40653123954701, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:35:47,312] [INFO] [timer.py:197:stop] 0/146, RunningAvgSamplesPerSec=29.881781988278192, CurrSamplesPerSec=29.774671445957235, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:35:54,533] [INFO] [timer.py:197:stop] 0/147, RunningAvgSamplesPerSec=29.8771309268439, CurrSamplesPerSec=29.22216240912961, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:01,232] [INFO] [timer.py:197:stop] 0/148, RunningAvgSamplesPerSec=29.875274842520092, CurrSamplesPerSec=29.608561876864286, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:07,750] [INFO] [timer.py:197:stop] 0/149, RunningAvgSamplesPerSec=29.87885509226357, CurrSamplesPerSec=30.410943956090765, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:14,287] [INFO] [logging.py:68:log_dist] [Rank 0] step=150, skipped=2, lr=[8.041073861170494e-06], mom=[[0.9, 0.999]] [2022-12-14 16:36:14,288] [INFO] [timer.py:197:stop] 0/150, 
RunningAvgSamplesPerSec=29.876929387415014, CurrSamplesPerSec=29.596525623964286, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.2592, 'learning_rate': 8.041073861170494e-06, 'epoch': 3.01} [2022-12-14 16:36:20,713] [INFO] [timer.py:197:stop] 0/151, RunningAvgSamplesPerSec=29.875216058247887, CurrSamplesPerSec=29.623791781520946, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:27,616] [INFO] [timer.py:197:stop] 0/152, RunningAvgSamplesPerSec=29.87231476962854, CurrSamplesPerSec=29.446230157271902, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:30,183] [INFO] [timer.py:197:stop] 0/153, RunningAvgSamplesPerSec=29.874785772461607, CurrSamplesPerSec=30.250123987715384, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:32,333] [INFO] [timer.py:197:stop] 0/154, RunningAvgSamplesPerSec=29.87641699052338, CurrSamplesPerSec=30.124792189955944, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:34,523] [INFO] [timer.py:197:stop] 0/155, RunningAvgSamplesPerSec=29.874405362716786, CurrSamplesPerSec=29.571755961243202, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:36,630] [INFO] [timer.py:197:stop] 0/156, RunningAvgSamplesPerSec=29.880095328286824, CurrSamplesPerSec=30.776961281986928, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:38,758] [INFO] [timer.py:197:stop] 0/157, RunningAvgSamplesPerSec=29.883637352381953, CurrSamplesPerSec=30.439317868331734, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:40,880] [INFO] [timer.py:197:stop] 0/158, RunningAvgSamplesPerSec=29.887815316898752, CurrSamplesPerSec=30.549836487779118, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:43,077] [INFO] [timer.py:197:stop] 0/159, RunningAvgSamplesPerSec=29.885222634072555, CurrSamplesPerSec=29.486199008190678, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:36:44,982] [INFO] [logging.py:68:log_dist] [Rank 0] step=160, skipped=2, lr=[8.146282038785833e-06], 
mom=[[0.9, 0.999]] [2022-12-14 16:36:44,983] [INFO] [timer.py:197:stop] 0/160, RunningAvgSamplesPerSec=29.90864738746714, CurrSamplesPerSec=34.10570813994707, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:37:28,021] [INFO] [timer.py:197:stop] 0/161, RunningAvgSamplesPerSec=29.90887691921785, CurrSamplesPerSec=29.94518724258098, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:37:34,791] [INFO] [timer.py:197:stop] 0/162, RunningAvgSamplesPerSec=29.910841113769926, CurrSamplesPerSec=30.22646427819808, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:37:41,259] [INFO] [timer.py:197:stop] 0/163, RunningAvgSamplesPerSec=29.909696215366104, CurrSamplesPerSec=29.72763448787347, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:37:48,767] [INFO] [timer.py:197:stop] 0/164, RunningAvgSamplesPerSec=29.901290616209685, CurrSamplesPerSec=28.606934240449807, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:37:55,281] [INFO] [timer.py:197:stop] 0/165, RunningAvgSamplesPerSec=29.895942698523754, CurrSamplesPerSec=29.054125860698672, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:38:01,495] [INFO] [timer.py:197:stop] 0/166, RunningAvgSamplesPerSec=29.899972330883493, CurrSamplesPerSec=30.571648017843103, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:38:07,907] [INFO] [timer.py:197:stop] 0/167, RunningAvgSamplesPerSec=29.90355808626835, CurrSamplesPerSec=30.503491813496467, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:38:14,322] [INFO] [timer.py:197:stop] 0/168, RunningAvgSamplesPerSec=29.906009140721142, CurrSamplesPerSec=30.31601125609953, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:38:21,526] [INFO] [timer.py:197:stop] 0/169, RunningAvgSamplesPerSec=29.903646064132417, CurrSamplesPerSec=29.516484659557914, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:38:28,351] [INFO] [logging.py:68:log_dist] [Rank 0] step=170, skipped=2, lr=[8.245031542220927e-06], 
mom=[[0.9, 0.999]] [2022-12-14 16:38:28,352] [INFO] [timer.py:197:stop] 0/170, RunningAvgSamplesPerSec=29.90449434601322, CurrSamplesPerSec=30.046835755175707, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:38:35,152] [INFO] [timer.py:197:stop] 0/171, RunningAvgSamplesPerSec=29.898398923967992, CurrSamplesPerSec=28.908475084433583, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:38:41,924] [INFO] [timer.py:197:stop] 0/172, RunningAvgSamplesPerSec=29.896456789041316, CurrSamplesPerSec=29.57182111587593, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:38:48,250] [INFO] [timer.py:197:stop] 0/173, RunningAvgSamplesPerSec=29.896608794692224, CurrSamplesPerSec=29.92247224176661, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:38:54,779] [INFO] [timer.py:197:stop] 0/174, RunningAvgSamplesPerSec=29.898726314889203, CurrSamplesPerSec=30.265287560614187, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:39:01,225] [INFO] [timer.py:197:stop] 0/175, RunningAvgSamplesPerSec=29.896820195289397, CurrSamplesPerSec=29.572544351580362, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.21, 'learning_rate': 8.292222957399574e-06, 'epoch': 4.0} [2022-12-14 16:39:07,847] [INFO] [timer.py:197:stop] 0/176, RunningAvgSamplesPerSec=29.889302012131214, CurrSamplesPerSec=28.64319465591145, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:39:14,184] [INFO] [timer.py:197:stop] 0/177, RunningAvgSamplesPerSec=29.888261139806485, CurrSamplesPerSec=29.708246446714497, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:39:20,655] [INFO] [timer.py:197:stop] 0/178, RunningAvgSamplesPerSec=29.886673310134526, CurrSamplesPerSec=29.611377297625964, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:39:27,290] [INFO] [timer.py:197:stop] 0/179, RunningAvgSamplesPerSec=29.887984821092203, CurrSamplesPerSec=30.12061758858853, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:39:33,945] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=180, skipped=2, lr=[8.338069703233054e-06], mom=[[0.9, 0.999]] [2022-12-14 16:39:33,946] [INFO] [timer.py:197:stop] 0/180, RunningAvgSamplesPerSec=29.885175803115224, CurrSamplesPerSec=29.396161263298527, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:39:40,842] [INFO] [timer.py:197:stop] 0/181, RunningAvgSamplesPerSec=29.884914796797343, CurrSamplesPerSec=29.83852819001653, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:39:46,981] [INFO] [timer.py:197:stop] 0/182, RunningAvgSamplesPerSec=29.88567300775043, CurrSamplesPerSec=30.022015398901747, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:39:53,691] [INFO] [timer.py:197:stop] 0/183, RunningAvgSamplesPerSec=29.88566176371313, CurrSamplesPerSec=29.883637974816008, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:00,231] [INFO] [timer.py:197:stop] 0/184, RunningAvgSamplesPerSec=29.886014268661206, CurrSamplesPerSec=29.949954924800284, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:06,660] [INFO] [timer.py:197:stop] 0/185, RunningAvgSamplesPerSec=29.888595078639423, CurrSamplesPerSec=30.365843797645397, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:13,513] [INFO] [timer.py:197:stop] 0/186, RunningAvgSamplesPerSec=29.884039474679355, CurrSamplesPerSec=29.073110080473356, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:20,556] [INFO] [timer.py:197:stop] 0/187, RunningAvgSamplesPerSec=29.880472326870574, CurrSamplesPerSec=29.238299771276466, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:27,253] [INFO] [timer.py:197:stop] 0/188, RunningAvgSamplesPerSec=29.876584153321808, CurrSamplesPerSec=29.174272386068537, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:33,431] [INFO] [timer.py:197:stop] 0/189, RunningAvgSamplesPerSec=29.879861588062557, CurrSamplesPerSec=30.502230152562593, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:39,511] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=190, skipped=2, lr=[8.426021206646023e-06], mom=[[0.9, 0.999]] [2022-12-14 16:40:39,511] [INFO] [timer.py:197:stop] 0/190, RunningAvgSamplesPerSec=29.882009257413785, CurrSamplesPerSec=30.289124311969722, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:45,859] [INFO] [timer.py:197:stop] 0/191, RunningAvgSamplesPerSec=29.87874432775277, CurrSamplesPerSec=29.277357707549758, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:52,573] [INFO] [timer.py:197:stop] 0/192, RunningAvgSamplesPerSec=29.878534970613565, CurrSamplesPerSec=29.8390190795545, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:55,042] [INFO] [timer.py:197:stop] 0/193, RunningAvgSamplesPerSec=29.88132102762991, CurrSamplesPerSec=30.420269643923607, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:57,215] [INFO] [timer.py:197:stop] 0/194, RunningAvgSamplesPerSec=29.880930133440202, CurrSamplesPerSec=29.806456398270033, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:40:59,358] [INFO] [timer.py:197:stop] 0/195, RunningAvgSamplesPerSec=29.88271622032109, CurrSamplesPerSec=30.229646952197744, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:41:01,487] [INFO] [timer.py:197:stop] 0/196, RunningAvgSamplesPerSec=29.885456800568246, CurrSamplesPerSec=30.423969107656138, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:41:03,626] [INFO] [timer.py:197:stop] 0/197, RunningAvgSamplesPerSec=29.887509851203806, CurrSamplesPerSec=30.291209250643238, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:41:05,748] [INFO] [timer.py:197:stop] 0/198, RunningAvgSamplesPerSec=29.89072355476984, CurrSamplesPerSec=30.530885862211292, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:41:07,897] [INFO] [timer.py:197:stop] 0/199, RunningAvgSamplesPerSec=29.892240725512746, CurrSamplesPerSec=30.1926094790247, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:41:09,791] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=200, skipped=2, lr=[8.509413541357755e-06], mom=[[0.9, 0.999]] [2022-12-14 16:41:09,792] [INFO] [timer.py:197:stop] 0/200, RunningAvgSamplesPerSec=29.911202818710223, CurrSamplesPerSec=34.182927857316585, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.185, 'learning_rate': 8.509413541357755e-06, 'epoch': 4.01} [2022-12-14 16:41:50,835] [INFO] [timer.py:197:stop] 0/201, RunningAvgSamplesPerSec=29.91464103479585, CurrSamplesPerSec=30.611342718507494, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:41:57,110] [INFO] [timer.py:197:stop] 0/202, RunningAvgSamplesPerSec=29.9141631965936, CurrSamplesPerSec=29.819376213057296, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:42:04,041] [INFO] [timer.py:197:stop] 0/203, RunningAvgSamplesPerSec=29.910821638508406, CurrSamplesPerSec=29.25718750006812, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:42:10,458] [INFO] [timer.py:197:stop] 0/204, RunningAvgSamplesPerSec=29.905148149922873, CurrSamplesPerSec=28.806866031640013, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:42:16,569] [INFO] [timer.py:197:stop] 0/205, RunningAvgSamplesPerSec=29.90836666698895, CurrSamplesPerSec=30.57302685378593, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:42:23,314] [INFO] [timer.py:197:stop] 0/206, RunningAvgSamplesPerSec=29.905273932183338, CurrSamplesPerSec=29.290420460606313, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:42:30,184] [INFO] [timer.py:197:stop] 0/207, RunningAvgSamplesPerSec=29.90471382678926, CurrSamplesPerSec=29.79088936476989, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:42:37,095] [INFO] [timer.py:197:stop] 0/208, RunningAvgSamplesPerSec=29.90266419580228, CurrSamplesPerSec=29.488340077369205, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:42:43,713] [INFO] [timer.py:197:stop] 0/209, RunningAvgSamplesPerSec=29.901245007746052, CurrSamplesPerSec=29.61173661258017, 
MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:42:50,193] [INFO] [logging.py:68:log_dist] [Rank 0] step=210, skipped=2, lr=[8.588696173868873e-06], mom=[[0.9, 0.999]] [2022-12-14 16:42:50,194] [INFO] [timer.py:197:stop] 0/210, RunningAvgSamplesPerSec=29.900280404110475, CurrSamplesPerSec=29.7019383708647, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:42:56,920] [INFO] [timer.py:197:stop] 0/211, RunningAvgSamplesPerSec=29.90117772847084, CurrSamplesPerSec=30.08899921559238, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:43:03,393] [INFO] [timer.py:197:stop] 0/212, RunningAvgSamplesPerSec=29.902888460119943, CurrSamplesPerSec=30.264779133499882, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:43:09,588] [INFO] [timer.py:197:stop] 0/213, RunningAvgSamplesPerSec=29.903968234020752, CurrSamplesPerSec=30.132461597972455, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:43:15,765] [INFO] [timer.py:197:stop] 0/214, RunningAvgSamplesPerSec=29.89919358249735, CurrSamplesPerSec=28.924732142145203, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:43:22,142] [INFO] [timer.py:197:stop] 0/215, RunningAvgSamplesPerSec=29.89956791530157, CurrSamplesPerSec=29.97913866033063, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:43:28,787] [INFO] [timer.py:197:stop] 0/216, RunningAvgSamplesPerSec=29.898774537732756, CurrSamplesPerSec=29.730739318228185, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:43:34,944] [INFO] [timer.py:197:stop] 0/217, RunningAvgSamplesPerSec=29.899603873458314, CurrSamplesPerSec=30.078146462794248, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:43:40,949] [INFO] [timer.py:197:stop] 0/218, RunningAvgSamplesPerSec=29.90012109619336, CurrSamplesPerSec=30.01174104585376, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:43:47,669] [INFO] [timer.py:197:stop] 0/219, RunningAvgSamplesPerSec=29.898399157609145, CurrSamplesPerSec=29.53105141849687, 
MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:43:54,203] [INFO] [logging.py:68:log_dist] [Rank 0] step=220, skipped=2, lr=[8.664255215314613e-06], mom=[[0.9, 0.999]] [2022-12-14 16:43:54,203] [INFO] [timer.py:197:stop] 0/220, RunningAvgSamplesPerSec=29.894954379871145, CurrSamplesPerSec=29.165755103686102, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:44:00,866] [INFO] [timer.py:197:stop] 0/221, RunningAvgSamplesPerSec=29.895256373815528, CurrSamplesPerSec=29.961237021349028, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:44:07,380] [INFO] [timer.py:197:stop] 0/222, RunningAvgSamplesPerSec=29.89420330961516, CurrSamplesPerSec=29.665355772586512, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:44:15,243] [INFO] [timer.py:197:stop] 0/223, RunningAvgSamplesPerSec=29.896603421247487, CurrSamplesPerSec=30.434165389201446, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:44:22,682] [INFO] [timer.py:197:stop] 0/224, RunningAvgSamplesPerSec=29.894522666023157, CurrSamplesPerSec=29.441673154166466, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:44:29,909] [INFO] [timer.py:197:stop] 0/225, RunningAvgSamplesPerSec=29.891065244111157, CurrSamplesPerSec=29.14281774910588, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.1341, 'learning_rate': 8.700744577655557e-06, 'epoch': 5.0} [2022-12-14 16:44:36,000] [INFO] [timer.py:197:stop] 0/226, RunningAvgSamplesPerSec=29.891384110688907, CurrSamplesPerSec=29.962661676823494, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:44:43,660] [INFO] [timer.py:197:stop] 0/227, RunningAvgSamplesPerSec=29.89109564404624, CurrSamplesPerSec=29.82661911927094, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:44:50,479] [INFO] [timer.py:197:stop] 0/228, RunningAvgSamplesPerSec=29.892177299288036, CurrSamplesPerSec=30.137556405024, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:44:57,338] [INFO] [timer.py:197:stop] 0/229, 
RunningAvgSamplesPerSec=29.89248940754049, CurrSamplesPerSec=29.9631934491519, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:03,748] [INFO] [logging.py:68:log_dist] [Rank 0] step=230, skipped=2, lr=[8.73642479617159e-06], mom=[[0.9, 0.999]] [2022-12-14 16:45:03,749] [INFO] [timer.py:197:stop] 0/230, RunningAvgSamplesPerSec=29.893747111293134, CurrSamplesPerSec=30.18201104151401, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:10,011] [INFO] [timer.py:197:stop] 0/231, RunningAvgSamplesPerSec=29.894146637027717, CurrSamplesPerSec=29.985518148113808, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:16,607] [INFO] [timer.py:197:stop] 0/232, RunningAvgSamplesPerSec=29.88797323962098, CurrSamplesPerSec=28.538380061238847, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:19,093] [INFO] [timer.py:197:stop] 0/233, RunningAvgSamplesPerSec=29.890426280544805, CurrSamplesPerSec=30.465528279545815, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:21,220] [INFO] [timer.py:197:stop] 0/234, RunningAvgSamplesPerSec=29.893281457558572, CurrSamplesPerSec=30.56777331911125, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:23,340] [INFO] [timer.py:197:stop] 0/235, RunningAvgSamplesPerSec=29.89609505979747, CurrSamplesPerSec=30.56348548955633, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:25,510] [INFO] [timer.py:197:stop] 0/236, RunningAvgSamplesPerSec=29.895865435989688, CurrSamplesPerSec=29.8424590762622, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:27,644] [INFO] [timer.py:197:stop] 0/237, RunningAvgSamplesPerSec=29.897799563113182, CurrSamplesPerSec=30.357371937140496, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:29,816] [INFO] [timer.py:197:stop] 0/238, RunningAvgSamplesPerSec=29.897440781526907, CurrSamplesPerSec=29.813365218906597, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:31,986] [INFO] [timer.py:197:stop] 0/239, 
RunningAvgSamplesPerSec=29.897258762652665, CurrSamplesPerSec=29.85436420049251, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:45:33,884] [INFO] [logging.py:68:log_dist] [Rank 0] step=240, skipped=2, lr=[8.805495997504354e-06], mom=[[0.9, 0.999]] [2022-12-14 16:45:33,885] [INFO] [timer.py:197:stop] 0/240, RunningAvgSamplesPerSec=29.912886875648738, CurrSamplesPerSec=34.14270222363772, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:46:16,834] [INFO] [timer.py:197:stop] 0/241, RunningAvgSamplesPerSec=29.91298844738415, CurrSamplesPerSec=29.937182154621404, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:46:23,171] [INFO] [timer.py:197:stop] 0/242, RunningAvgSamplesPerSec=29.9037725477038, CurrSamplesPerSec=27.85286664015944, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:46:29,529] [INFO] [timer.py:197:stop] 0/243, RunningAvgSamplesPerSec=29.904032567935364, CurrSamplesPerSec=29.96656846953241, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:46:35,647] [INFO] [timer.py:197:stop] 0/244, RunningAvgSamplesPerSec=29.902895421982265, CurrSamplesPerSec=29.631342286983593, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:46:41,721] [INFO] [timer.py:197:stop] 0/245, RunningAvgSamplesPerSec=29.904249114961235, CurrSamplesPerSec=30.2354864336176, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:46:48,420] [INFO] [timer.py:197:stop] 0/246, RunningAvgSamplesPerSec=29.906507298344664, CurrSamplesPerSec=30.465545567667025, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:46:54,997] [INFO] [timer.py:197:stop] 0/247, RunningAvgSamplesPerSec=29.904651479006407, CurrSamplesPerSec=29.45861320715065, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:47:01,096] [INFO] [timer.py:197:stop] 0/248, RunningAvgSamplesPerSec=29.9003623901369, CurrSamplesPerSec=28.885352992865485, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:47:09,485] [INFO] [timer.py:197:stop] 0/249, 
RunningAvgSamplesPerSec=29.898604913913687, CurrSamplesPerSec=29.472453044638062, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:47:16,980] [INFO] [logging.py:68:log_dist] [Rank 0] step=250, skipped=2, lr=[8.871723942761204e-06], mom=[[0.9, 0.999]] [2022-12-14 16:47:16,981] [INFO] [timer.py:197:stop] 0/250, RunningAvgSamplesPerSec=29.891859182639145, CurrSamplesPerSec=28.313972291623664, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.1139, 'learning_rate': 8.871723942761204e-06, 'epoch': 6.0} [2022-12-14 16:47:25,337] [INFO] [timer.py:197:stop] 0/251, RunningAvgSamplesPerSec=29.893462887658067, CurrSamplesPerSec=30.29656646644681, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:47:31,969] [INFO] [timer.py:197:stop] 0/252, RunningAvgSamplesPerSec=29.895590220264072, CurrSamplesPerSec=30.434890012926317, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:47:37,992] [INFO] [timer.py:197:stop] 0/253, RunningAvgSamplesPerSec=29.89430894502289, CurrSamplesPerSec=29.577399416793977, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:47:44,079] [INFO] [timer.py:197:stop] 0/254, RunningAvgSamplesPerSec=29.893720359990652, CurrSamplesPerSec=29.746714912627244, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:47:50,410] [INFO] [timer.py:197:stop] 0/255, RunningAvgSamplesPerSec=29.892986584361992, CurrSamplesPerSec=29.70921639895934, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:47:56,427] [INFO] [timer.py:197:stop] 0/256, RunningAvgSamplesPerSec=29.893559606344997, CurrSamplesPerSec=30.03924348257456, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:48:03,483] [INFO] [timer.py:197:stop] 0/257, RunningAvgSamplesPerSec=29.895903266581495, CurrSamplesPerSec=30.503335833276516, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:48:09,653] [INFO] [timer.py:197:stop] 0/258, RunningAvgSamplesPerSec=29.895849131860434, CurrSamplesPerSec=29.882051174157176, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 16:48:15,795] [INFO] [timer.py:197:stop] 0/259, RunningAvgSamplesPerSec=29.894191691255383, CurrSamplesPerSec=29.475847875960405, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:48:21,884] [INFO] [logging.py:68:log_dist] [Rank 0] step=260, skipped=2, lr=[8.935333486807386e-06], mom=[[0.9, 0.999]] [2022-12-14 16:48:21,884] [INFO] [timer.py:197:stop] 0/260, RunningAvgSamplesPerSec=29.896080018219457, CurrSamplesPerSec=30.389419539129022, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:48:28,516] [INFO] [timer.py:197:stop] 0/261, RunningAvgSamplesPerSec=29.898679389251754, CurrSamplesPerSec=30.584765921859635, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:48:34,747] [INFO] [timer.py:197:stop] 0/262, RunningAvgSamplesPerSec=29.89745708099995, CurrSamplesPerSec=29.58420896829772, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:48:41,539] [INFO] [timer.py:197:stop] 0/263, RunningAvgSamplesPerSec=29.895875918651516, CurrSamplesPerSec=29.490371303286864, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:48:48,203] [INFO] [timer.py:197:stop] 0/264, RunningAvgSamplesPerSec=29.895695793736387, CurrSamplesPerSec=29.84875728704407, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:48:54,408] [INFO] [timer.py:197:stop] 0/265, RunningAvgSamplesPerSec=29.896197225763203, CurrSamplesPerSec=30.028154500177248, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:00,447] [INFO] [timer.py:197:stop] 0/266, RunningAvgSamplesPerSec=29.89719937318894, CurrSamplesPerSec=30.163117312735828, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:08,292] [INFO] [timer.py:197:stop] 0/267, RunningAvgSamplesPerSec=29.897553374537637, CurrSamplesPerSec=29.9913038941201, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:14,785] [INFO] [timer.py:197:stop] 0/268, RunningAvgSamplesPerSec=29.897578563288715, CurrSamplesPerSec=29.904255078567953, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 16:49:21,282] [INFO] [timer.py:197:stop] 0/269, RunningAvgSamplesPerSec=29.8958208068856, CurrSamplesPerSec=29.435484219389963, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:27,474] [INFO] [logging.py:68:log_dist] [Rank 0] step=270, skipped=2, lr=[8.996523822524443e-06], mom=[[0.9, 0.999]] [2022-12-14 16:49:27,474] [INFO] [timer.py:197:stop] 0/270, RunningAvgSamplesPerSec=29.895295255381498, CurrSamplesPerSec=29.755631013731563, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:33,857] [INFO] [timer.py:197:stop] 0/271, RunningAvgSamplesPerSec=29.894348242444387, CurrSamplesPerSec=29.642693268242866, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:40,199] [INFO] [timer.py:197:stop] 0/272, RunningAvgSamplesPerSec=29.892225441201937, CurrSamplesPerSec=29.331934969068403, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:42,699] [INFO] [timer.py:197:stop] 0/273, RunningAvgSamplesPerSec=29.891306209120906, CurrSamplesPerSec=29.645164870613847, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:44,812] [INFO] [timer.py:197:stop] 0/274, RunningAvgSamplesPerSec=29.89405258775281, CurrSamplesPerSec=30.657396199190565, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:46,966] [INFO] [timer.py:197:stop] 0/275, RunningAvgSamplesPerSec=29.89472603154315, CurrSamplesPerSec=30.07903623540886, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0887, 'learning_rate': 9.026267958246849e-06, 'epoch': 6.01} [2022-12-14 16:49:49,092] [INFO] [timer.py:197:stop] 0/276, RunningAvgSamplesPerSec=29.896864601304152, CurrSamplesPerSec=30.492365761005505, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:51,222] [INFO] [timer.py:197:stop] 0/277, RunningAvgSamplesPerSec=29.89877526012867, CurrSamplesPerSec=30.431660527289253, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:53,386] [INFO] [timer.py:197:stop] 0/278, 
RunningAvgSamplesPerSec=29.898893049854365, CurrSamplesPerSec=29.931320483744, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:55,562] [INFO] [timer.py:197:stop] 0/279, RunningAvgSamplesPerSec=29.898438320139068, CurrSamplesPerSec=29.773459446395496, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:49:57,439] [INFO] [logging.py:68:log_dist] [Rank 0] step=280, skipped=2, lr=[9.055472243083868e-06], mom=[[0.9, 0.999]] [2022-12-14 16:49:57,440] [INFO] [timer.py:197:stop] 0/280, RunningAvgSamplesPerSec=29.912678965172393, CurrSamplesPerSec=34.459042160024424, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:50:43,017] [INFO] [timer.py:197:stop] 0/281, RunningAvgSamplesPerSec=29.9048444724259, CurrSamplesPerSec=27.875207011495448, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:50:50,331] [INFO] [timer.py:197:stop] 0/282, RunningAvgSamplesPerSec=29.90575769638925, CurrSamplesPerSec=30.16274449214962, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:50:56,769] [INFO] [timer.py:197:stop] 0/283, RunningAvgSamplesPerSec=29.90477983580079, CurrSamplesPerSec=29.633471773872163, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:51:03,807] [INFO] [timer.py:197:stop] 0/284, RunningAvgSamplesPerSec=29.903308059184262, CurrSamplesPerSec=29.495400358161355, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:51:10,377] [INFO] [timer.py:197:stop] 0/285, RunningAvgSamplesPerSec=29.901247990848436, CurrSamplesPerSec=29.331418957088907, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:51:17,026] [INFO] [timer.py:197:stop] 0/286, RunningAvgSamplesPerSec=29.901487696379526, CurrSamplesPerSec=29.969479157048916, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:51:23,751] [INFO] [timer.py:197:stop] 0/287, RunningAvgSamplesPerSec=29.90259543474367, CurrSamplesPerSec=30.22055002917519, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:51:30,123] [INFO] [timer.py:197:stop] 0/288, 
RunningAvgSamplesPerSec=29.904432389028543, CurrSamplesPerSec=30.437326381900572, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:51:37,048] [INFO] [timer.py:197:stop] 0/289, RunningAvgSamplesPerSec=29.903441184102462, CurrSamplesPerSec=29.622627992246166, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:51:43,348] [INFO] [logging.py:68:log_dist] [Rank 0] step=290, skipped=2, lr=[9.11233723905084e-06], mom=[[0.9, 0.999]] [2022-12-14 16:51:43,348] [INFO] [timer.py:197:stop] 0/290, RunningAvgSamplesPerSec=29.904026856633884, CurrSamplesPerSec=30.073068351850548, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:51:50,721] [INFO] [timer.py:197:stop] 0/291, RunningAvgSamplesPerSec=29.905649695422376, CurrSamplesPerSec=30.380473775460235, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:51:57,009] [INFO] [timer.py:197:stop] 0/292, RunningAvgSamplesPerSec=29.90447533133414, CurrSamplesPerSec=29.568905727084207, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:52:03,521] [INFO] [timer.py:197:stop] 0/293, RunningAvgSamplesPerSec=29.904385369870248, CurrSamplesPerSec=29.87831936394744, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:52:10,176] [INFO] [timer.py:197:stop] 0/294, RunningAvgSamplesPerSec=29.90432655541064, CurrSamplesPerSec=29.88722137103881, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:52:16,398] [INFO] [timer.py:197:stop] 0/295, RunningAvgSamplesPerSec=29.903681896366468, CurrSamplesPerSec=29.71662300353973, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:52:23,921] [INFO] [timer.py:197:stop] 0/296, RunningAvgSamplesPerSec=29.90237215457338, CurrSamplesPerSec=29.52349673046265, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:52:30,261] [INFO] [timer.py:197:stop] 0/297, RunningAvgSamplesPerSec=29.90029200070283, CurrSamplesPerSec=29.301025542338483, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:52:36,498] [INFO] [timer.py:197:stop] 0/298, 
RunningAvgSamplesPerSec=29.90018203540275, CurrSamplesPerSec=29.86777754782775, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:52:43,293] [INFO] [timer.py:197:stop] 0/299, RunningAvgSamplesPerSec=29.898101699743936, CurrSamplesPerSec=29.294790090283477, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:52:49,826] [INFO] [logging.py:68:log_dist] [Rank 0] step=300, skipped=2, lr=[9.16726106663399e-06], mom=[[0.9, 0.999]] [2022-12-14 16:52:49,826] [INFO] [timer.py:197:stop] 0/300, RunningAvgSamplesPerSec=29.898291975704918, CurrSamplesPerSec=29.95491131493284, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0656, 'learning_rate': 9.16726106663399e-06, 'epoch': 7.0} [2022-12-14 16:52:56,340] [INFO] [timer.py:197:stop] 0/301, RunningAvgSamplesPerSec=29.896398750531528, CurrSamplesPerSec=29.342701644581112, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:53:02,764] [INFO] [timer.py:197:stop] 0/302, RunningAvgSamplesPerSec=29.894165090958168, CurrSamplesPerSec=29.24094328583615, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:53:09,804] [INFO] [timer.py:197:stop] 0/303, RunningAvgSamplesPerSec=29.89530749415042, CurrSamplesPerSec=30.242016384861973, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:53:16,258] [INFO] [timer.py:197:stop] 0/304, RunningAvgSamplesPerSec=29.89465175580126, CurrSamplesPerSec=29.698573406581612, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:53:23,381] [INFO] [timer.py:197:stop] 0/305, RunningAvgSamplesPerSec=29.894623645037804, CurrSamplesPerSec=29.886136612595564, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:53:29,606] [INFO] [timer.py:197:stop] 0/306, RunningAvgSamplesPerSec=29.89623407037442, CurrSamplesPerSec=30.39231660535038, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:53:36,664] [INFO] [timer.py:197:stop] 0/307, RunningAvgSamplesPerSec=29.893678791257845, CurrSamplesPerSec=29.136611492788177, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 16:53:43,317] [INFO] [timer.py:197:stop] 0/308, RunningAvgSamplesPerSec=29.894424240974, CurrSamplesPerSec=30.123534617401965, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:53:50,202] [INFO] [timer.py:197:stop] 0/309, RunningAvgSamplesPerSec=29.89414779957358, CurrSamplesPerSec=29.809796199664163, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:53:56,919] [INFO] [logging.py:68:log_dist] [Rank 0] step=310, skipped=2, lr=[9.220371891879027e-06], mom=[[0.9, 0.999]] [2022-12-14 16:53:56,920] [INFO] [timer.py:197:stop] 0/310, RunningAvgSamplesPerSec=29.89206282018668, CurrSamplesPerSec=29.26543602356625, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:54:03,836] [INFO] [timer.py:197:stop] 0/311, RunningAvgSamplesPerSec=29.893254527161083, CurrSamplesPerSec=30.264878087773916, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:54:10,533] [INFO] [timer.py:197:stop] 0/312, RunningAvgSamplesPerSec=29.894660173393245, CurrSamplesPerSec=30.335429599287323, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:54:12,950] [INFO] [timer.py:197:stop] 0/313, RunningAvgSamplesPerSec=29.894290673977526, CurrSamplesPerSec=29.780184481847193, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:54:15,093] [INFO] [timer.py:197:stop] 0/314, RunningAvgSamplesPerSec=29.895338078289655, CurrSamplesPerSec=30.224680912200906, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:54:17,249] [INFO] [timer.py:197:stop] 0/315, RunningAvgSamplesPerSec=29.895872010624874, CurrSamplesPerSec=30.063395371027823, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:54:19,387] [INFO] [timer.py:197:stop] 0/316, RunningAvgSamplesPerSec=29.89718989442782, CurrSamplesPerSec=30.315477157809276, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:54:21,508] [INFO] [timer.py:197:stop] 0/317, RunningAvgSamplesPerSec=29.899231787911315, CurrSamplesPerSec=30.55448217960817, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 16:54:23,633] [INFO] [timer.py:197:stop] 0/318, RunningAvgSamplesPerSec=29.901080301258954, CurrSamplesPerSec=30.49496376986114, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:54:25,808] [INFO] [timer.py:197:stop] 0/319, RunningAvgSamplesPerSec=29.900744841635667, CurrSamplesPerSec=29.795115267668866, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:54:27,735] [INFO] [logging.py:68:log_dist] [Rank 0] step=320, skipped=2, lr=[9.271785592148743e-06], mom=[[0.9, 0.999]] [2022-12-14 16:54:27,736] [INFO] [timer.py:197:stop] 0/320, RunningAvgSamplesPerSec=29.911293719653127, CurrSamplesPerSec=33.67768812884054, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:55:10,865] [INFO] [timer.py:197:stop] 0/321, RunningAvgSamplesPerSec=29.91291173277871, CurrSamplesPerSec=30.436473953493223, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:55:17,550] [INFO] [timer.py:197:stop] 0/322, RunningAvgSamplesPerSec=29.91233759174174, CurrSamplesPerSec=29.730304668791224, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:55:24,060] [INFO] [timer.py:197:stop] 0/323, RunningAvgSamplesPerSec=29.911987843807836, CurrSamplesPerSec=29.80048700264239, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:55:30,812] [INFO] [timer.py:197:stop] 0/324, RunningAvgSamplesPerSec=29.91190763617, CurrSamplesPerSec=29.8861831956958, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:55:36,907] [INFO] [timer.py:197:stop] 0/325, RunningAvgSamplesPerSec=29.913488367401925, CurrSamplesPerSec=30.431322437018373, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0581, 'learning_rate': 9.296889251455016e-06, 'epoch': 8.0} [2022-12-14 16:55:43,524] [INFO] [timer.py:197:stop] 0/326, RunningAvgSamplesPerSec=29.914541973559068, CurrSamplesPerSec=30.25878507636347, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:55:50,284] [INFO] [timer.py:197:stop] 0/327, 
RunningAvgSamplesPerSec=29.90928590033612, CurrSamplesPerSec=28.298325738387017, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:55:57,195] [INFO] [timer.py:197:stop] 0/328, RunningAvgSamplesPerSec=29.910322053496838, CurrSamplesPerSec=30.250918282596864, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:56:05,130] [INFO] [timer.py:197:stop] 0/329, RunningAvgSamplesPerSec=29.90859380540244, CurrSamplesPerSec=29.35563337732087, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:56:12,058] [INFO] [logging.py:68:log_dist] [Rank 0] step=330, skipped=2, lr=[9.321607278590771e-06], mom=[[0.9, 0.999]] [2022-12-14 16:56:12,058] [INFO] [timer.py:197:stop] 0/330, RunningAvgSamplesPerSec=29.907732778590503, CurrSamplesPerSec=29.62881085286702, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:56:18,739] [INFO] [timer.py:197:stop] 0/331, RunningAvgSamplesPerSec=29.90681446695206, CurrSamplesPerSec=29.608620662079996, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:56:25,623] [INFO] [timer.py:197:stop] 0/332, RunningAvgSamplesPerSec=29.9050179571439, CurrSamplesPerSec=29.3254556889605, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:56:33,074] [INFO] [timer.py:197:stop] 0/333, RunningAvgSamplesPerSec=29.90644487108301, CurrSamplesPerSec=30.384882370912152, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:56:39,664] [INFO] [timer.py:197:stop] 0/334, RunningAvgSamplesPerSec=29.90749419731803, CurrSamplesPerSec=30.25891468936011, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:56:46,242] [INFO] [timer.py:197:stop] 0/335, RunningAvgSamplesPerSec=29.9074117732332, CurrSamplesPerSec=29.88007206780411, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:56:52,955] [INFO] [timer.py:197:stop] 0/336, RunningAvgSamplesPerSec=29.907204431777707, CurrSamplesPerSec=29.83831923518171, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:56:59,427] [INFO] [timer.py:197:stop] 0/337, 
RunningAvgSamplesPerSec=29.90783380265536, CurrSamplesPerSec=30.11953609922964, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:57:06,061] [INFO] [timer.py:197:stop] 0/338, RunningAvgSamplesPerSec=29.90741746181621, CurrSamplesPerSec=29.768592627258062, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:57:12,421] [INFO] [timer.py:197:stop] 0/339, RunningAvgSamplesPerSec=29.90760980362913, CurrSamplesPerSec=29.97237702369169, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:57:18,770] [INFO] [logging.py:68:log_dist] [Rank 0] step=340, skipped=2, lr=[9.369932589894792e-06], mom=[[0.9, 0.999]] [2022-12-14 16:57:18,771] [INFO] [timer.py:197:stop] 0/340, RunningAvgSamplesPerSec=29.908087945193127, CurrSamplesPerSec=30.070097087596157, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:57:25,561] [INFO] [timer.py:197:stop] 0/341, RunningAvgSamplesPerSec=29.90789551759885, CurrSamplesPerSec=29.84299654371694, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:57:31,888] [INFO] [timer.py:197:stop] 0/342, RunningAvgSamplesPerSec=29.90827921001602, CurrSamplesPerSec=30.038920778334997, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:57:38,720] [INFO] [timer.py:197:stop] 0/343, RunningAvgSamplesPerSec=29.908441918781286, CurrSamplesPerSec=29.96386571665629, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:57:45,378] [INFO] [timer.py:197:stop] 0/344, RunningAvgSamplesPerSec=29.91022627616752, CurrSamplesPerSec=30.531365071067608, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:57:51,830] [INFO] [timer.py:197:stop] 0/345, RunningAvgSamplesPerSec=29.91096045525199, CurrSamplesPerSec=30.164181596750968, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:57:58,953] [INFO] [timer.py:197:stop] 0/346, RunningAvgSamplesPerSec=29.909288355696958, CurrSamplesPerSec=29.346579967941675, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:05,990] [INFO] [timer.py:197:stop] 0/347, 
RunningAvgSamplesPerSec=29.90674383756232, CurrSamplesPerSec=29.05639020305504, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:12,277] [INFO] [timer.py:197:stop] 0/348, RunningAvgSamplesPerSec=29.90587822989526, CurrSamplesPerSec=29.610204684942264, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:18,735] [INFO] [timer.py:197:stop] 0/349, RunningAvgSamplesPerSec=29.906074783669883, CurrSamplesPerSec=29.97423784319403, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:25,228] [INFO] [logging.py:68:log_dist] [Rank 0] step=350, skipped=2, lr=[9.416848797368692e-06], mom=[[0.9, 0.999]] [2022-12-14 16:58:25,229] [INFO] [timer.py:197:stop] 0/350, RunningAvgSamplesPerSec=29.9041242453122, CurrSamplesPerSec=29.242309819913388, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0417, 'learning_rate': 9.416848797368692e-06, 'epoch': 8.01} [2022-12-14 16:58:31,496] [INFO] [timer.py:197:stop] 0/351, RunningAvgSamplesPerSec=29.906699262066486, CurrSamplesPerSec=30.83056683537779, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:38,803] [INFO] [timer.py:197:stop] 0/352, RunningAvgSamplesPerSec=29.903709786780272, CurrSamplesPerSec=28.8956542865573, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:41,143] [INFO] [timer.py:197:stop] 0/353, RunningAvgSamplesPerSec=29.90592156166608, CurrSamplesPerSec=30.700673884565617, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:43,289] [INFO] [timer.py:197:stop] 0/354, RunningAvgSamplesPerSec=29.906715970376048, CurrSamplesPerSec=30.188185203220396, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:45,473] [INFO] [timer.py:197:stop] 0/355, RunningAvgSamplesPerSec=29.905995404204994, CurrSamplesPerSec=29.65449520000884, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:47,676] [INFO] [timer.py:197:stop] 0/356, RunningAvgSamplesPerSec=29.905534106537285, CurrSamplesPerSec=29.74358037927585, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 16:58:49,843] [INFO] [timer.py:197:stop] 0/357, RunningAvgSamplesPerSec=29.906605904883662, CurrSamplesPerSec=30.29091187298816, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:51,963] [INFO] [timer.py:197:stop] 0/358, RunningAvgSamplesPerSec=29.908384659901674, CurrSamplesPerSec=30.553501459351626, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:54,105] [INFO] [timer.py:197:stop] 0/359, RunningAvgSamplesPerSec=29.909309229228302, CurrSamplesPerSec=30.242128818733313, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:58:55,988] [INFO] [logging.py:68:log_dist] [Rank 0] step=360, skipped=2, lr=[9.462435753420545e-06], mom=[[0.9, 0.999]] [2022-12-14 16:58:55,988] [INFO] [timer.py:197:stop] 0/360, RunningAvgSamplesPerSec=29.920161536970852, CurrSamplesPerSec=34.372581064208184, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:59:37,241] [INFO] [timer.py:197:stop] 0/361, RunningAvgSamplesPerSec=29.91911723401767, CurrSamplesPerSec=29.549883500331237, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:59:43,813] [INFO] [timer.py:197:stop] 0/362, RunningAvgSamplesPerSec=29.9191702094041, CurrSamplesPerSec=29.9382005034619, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:59:50,536] [INFO] [timer.py:197:stop] 0/363, RunningAvgSamplesPerSec=29.919841076456315, CurrSamplesPerSec=30.16332406266985, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 16:59:57,251] [INFO] [timer.py:197:stop] 0/364, RunningAvgSamplesPerSec=29.918559758093185, CurrSamplesPerSec=29.463065505831796, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:00:03,904] [INFO] [timer.py:197:stop] 0/365, RunningAvgSamplesPerSec=29.918710009373058, CurrSamplesPerSec=29.97320030741921, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:00:11,146] [INFO] [timer.py:197:stop] 0/366, RunningAvgSamplesPerSec=29.914147915879507, CurrSamplesPerSec=28.34520362504587, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:00:18,617] [INFO] [timer.py:197:stop] 0/367, RunningAvgSamplesPerSec=29.914952535081028, CurrSamplesPerSec=30.210737755913275, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:00:24,915] [INFO] [timer.py:197:stop] 0/368, RunningAvgSamplesPerSec=29.913759154102458, CurrSamplesPerSec=29.484443628225232, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:00:31,425] [INFO] [timer.py:197:stop] 0/369, RunningAvgSamplesPerSec=29.914507278671447, CurrSamplesPerSec=30.190857269980047, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:00:37,975] [INFO] [logging.py:68:log_dist] [Rank 0] step=370, skipped=2, lr=[9.506766709342328e-06], mom=[[0.9, 0.999]] [2022-12-14 17:00:37,975] [INFO] [timer.py:197:stop] 0/370, RunningAvgSamplesPerSec=29.91217844652727, CurrSamplesPerSec=29.081302379920864, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:00:44,783] [INFO] [timer.py:197:stop] 0/371, RunningAvgSamplesPerSec=29.911494985747904, CurrSamplesPerSec=29.66208431052056, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:00:51,367] [INFO] [timer.py:197:stop] 0/372, RunningAvgSamplesPerSec=29.910344162979843, CurrSamplesPerSec=29.49165108689956, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:00:57,593] [INFO] [timer.py:197:stop] 0/373, RunningAvgSamplesPerSec=29.911814208493062, CurrSamplesPerSec=30.465832553345333, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:01:03,946] [INFO] [timer.py:197:stop] 0/374, RunningAvgSamplesPerSec=29.91133592389149, CurrSamplesPerSec=29.73494158476053, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:01:10,190] [INFO] [timer.py:197:stop] 0/375, RunningAvgSamplesPerSec=29.909455629123215, CurrSamplesPerSec=29.226012113692462, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0344, 'learning_rate': 9.528482449516371e-06, 'epoch': 9.0} [2022-12-14 17:01:16,915] [INFO] [timer.py:197:stop] 0/376, 
RunningAvgSamplesPerSec=29.9101051810725, CurrSamplesPerSec=30.154372014238085, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:01:23,045] [INFO] [timer.py:197:stop] 0/377, RunningAvgSamplesPerSec=29.91008860016499, CurrSamplesPerSec=29.903888629634782, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:01:29,521] [INFO] [timer.py:197:stop] 0/378, RunningAvgSamplesPerSec=29.907999007179324, CurrSamplesPerSec=29.14445990744925, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:01:36,033] [INFO] [timer.py:197:stop] 0/379, RunningAvgSamplesPerSec=29.907106299706733, CurrSamplesPerSec=29.575183484595996, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:01:42,146] [INFO] [logging.py:68:log_dist] [Rank 0] step=380, skipped=2, lr=[9.549909023428816e-06], mom=[[0.9, 0.999]] [2022-12-14 17:01:42,146] [INFO] [timer.py:197:stop] 0/380, RunningAvgSamplesPerSec=29.908239058447485, CurrSamplesPerSec=30.34149178813811, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:01:48,943] [INFO] [timer.py:197:stop] 0/381, RunningAvgSamplesPerSec=29.909724541270474, CurrSamplesPerSec=30.48200930305233, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:01:55,168] [INFO] [timer.py:197:stop] 0/382, RunningAvgSamplesPerSec=29.91093039527844, CurrSamplesPerSec=30.375059350276846, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:02:01,996] [INFO] [timer.py:197:stop] 0/383, RunningAvgSamplesPerSec=29.910329917193575, CurrSamplesPerSec=29.683880342876517, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:02:08,807] [INFO] [timer.py:197:stop] 0/384, RunningAvgSamplesPerSec=29.9104266095955, CurrSamplesPerSec=29.947311964590977, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:02:15,112] [INFO] [timer.py:197:stop] 0/385, RunningAvgSamplesPerSec=29.910031136382674, CurrSamplesPerSec=29.759721546108228, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:02:21,627] [INFO] [timer.py:197:stop] 0/386, 
RunningAvgSamplesPerSec=29.90925423821233, CurrSamplesPerSec=29.614640850759056, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:02:28,953] [INFO] [timer.py:197:stop] 0/387, RunningAvgSamplesPerSec=29.908659900389534, CurrSamplesPerSec=29.68216698967376, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:02:35,421] [INFO] [timer.py:197:stop] 0/388, RunningAvgSamplesPerSec=29.909814630426162, CurrSamplesPerSec=30.36111106084269, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:02:41,686] [INFO] [timer.py:197:stop] 0/389, RunningAvgSamplesPerSec=29.911149013348545, CurrSamplesPerSec=30.435269591505826, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:02:48,287] [INFO] [logging.py:68:log_dist] [Rank 0] step=390, skipped=2, lr=[9.591924776618972e-06], mom=[[0.9, 0.999]] [2022-12-14 17:02:48,287] [INFO] [timer.py:197:stop] 0/390, RunningAvgSamplesPerSec=29.909647329276165, CurrSamplesPerSec=29.339600365231963, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:02:54,383] [INFO] [timer.py:197:stop] 0/391, RunningAvgSamplesPerSec=29.909537479726403, CurrSamplesPerSec=29.866976660661052, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:03:00,919] [INFO] [timer.py:197:stop] 0/392, RunningAvgSamplesPerSec=29.91063872707509, CurrSamplesPerSec=30.345264734811877, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:03:03,320] [INFO] [timer.py:197:stop] 0/393, RunningAvgSamplesPerSec=29.91263156233573, CurrSamplesPerSec=30.710624346670286, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:03:05,454] [INFO] [timer.py:197:stop] 0/394, RunningAvgSamplesPerSec=29.913726487749567, CurrSamplesPerSec=30.34807447455813, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:03:07,641] [INFO] [timer.py:197:stop] 0/395, RunningAvgSamplesPerSec=29.912948501963996, CurrSamplesPerSec=29.611063720791755, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:03:09,832] [INFO] [timer.py:197:stop] 0/396, 
RunningAvgSamplesPerSec=29.912030444337542, CurrSamplesPerSec=29.555544643126503, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:03:11,957] [INFO] [timer.py:197:stop] 0/397, RunningAvgSamplesPerSec=29.91370389986625, CurrSamplesPerSec=30.587944335471246, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:03:14,081] [INFO] [timer.py:197:stop] 0/398, RunningAvgSamplesPerSec=29.915155058106357, CurrSamplesPerSec=30.49958932114541, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:03:16,277] [INFO] [timer.py:197:stop] 0/399, RunningAvgSamplesPerSec=29.914107168157873, CurrSamplesPerSec=29.504834460612074, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:03:18,182] [INFO] [logging.py:68:log_dist] [Rank 0] step=400, skipped=2, lr=[9.632871309784314e-06], mom=[[0.9, 0.999]] [2022-12-14 17:03:18,182] [INFO] [timer.py:197:stop] 0/400, RunningAvgSamplesPerSec=29.92319330402866, CurrSamplesPerSec=34.02625278090482, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0277, 'learning_rate': 9.632871309784314e-06, 'epoch': 9.01} [2022-12-14 17:04:00,219] [INFO] [timer.py:197:stop] 0/401, RunningAvgSamplesPerSec=29.923624113130717, CurrSamplesPerSec=30.09607677028784, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:04:06,728] [INFO] [timer.py:197:stop] 0/402, RunningAvgSamplesPerSec=29.922805824325504, CurrSamplesPerSec=29.59984139050597, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:04:13,407] [INFO] [timer.py:197:stop] 0/403, RunningAvgSamplesPerSec=29.920893640128966, CurrSamplesPerSec=29.17513168104055, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:04:19,462] [INFO] [timer.py:197:stop] 0/404, RunningAvgSamplesPerSec=29.917980863159624, CurrSamplesPerSec=28.793949864383823, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:04:25,649] [INFO] [timer.py:197:stop] 0/405, RunningAvgSamplesPerSec=29.91689952783603, CurrSamplesPerSec=29.488443737625744, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:04:32,134] [INFO] [timer.py:197:stop] 0/406, RunningAvgSamplesPerSec=29.918150975053766, CurrSamplesPerSec=30.43115339470001, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:04:38,625] [INFO] [timer.py:197:stop] 0/407, RunningAvgSamplesPerSec=29.918880453970214, CurrSamplesPerSec=30.216529117446065, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:04:45,258] [INFO] [timer.py:197:stop] 0/408, RunningAvgSamplesPerSec=29.91474986404059, CurrSamplesPerSec=28.33066468533, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:04:51,742] [INFO] [timer.py:197:stop] 0/409, RunningAvgSamplesPerSec=29.915414740248504, CurrSamplesPerSec=30.187818552440135, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:04:57,894] [INFO] [logging.py:68:log_dist] [Rank 0] step=410, skipped=2, lr=[9.672801694334265e-06], mom=[[0.9, 0.999]] [2022-12-14 17:04:57,895] [INFO] [timer.py:197:stop] 0/410, RunningAvgSamplesPerSec=29.914526715351123, CurrSamplesPerSec=29.557425663643926, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:05:04,173] [INFO] [timer.py:197:stop] 0/411, RunningAvgSamplesPerSec=29.91348045431862, CurrSamplesPerSec=29.492626388986245, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:05:10,790] [INFO] [timer.py:197:stop] 0/412, RunningAvgSamplesPerSec=29.91303565367492, CurrSamplesPerSec=29.732214586842154, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:05:16,931] [INFO] [timer.py:197:stop] 0/413, RunningAvgSamplesPerSec=29.913353007247164, CurrSamplesPerSec=30.044037801689207, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:05:22,987] [INFO] [timer.py:197:stop] 0/414, RunningAvgSamplesPerSec=29.912661814384254, CurrSamplesPerSec=29.63126051555557, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:05:29,355] [INFO] [timer.py:197:stop] 0/415, RunningAvgSamplesPerSec=29.91309886807259, CurrSamplesPerSec=30.094258148395994, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:05:36,674] [INFO] [timer.py:197:stop] 0/416, RunningAvgSamplesPerSec=29.912853490135078, CurrSamplesPerSec=29.81185539961434, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:05:42,967] [INFO] [timer.py:197:stop] 0/417, RunningAvgSamplesPerSec=29.913983024887486, CurrSamplesPerSec=30.38905486447072, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:05:49,474] [INFO] [timer.py:197:stop] 0/418, RunningAvgSamplesPerSec=29.91317250928748, CurrSamplesPerSec=29.58055769397335, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:05:55,541] [INFO] [timer.py:197:stop] 0/419, RunningAvgSamplesPerSec=29.913339866874793, CurrSamplesPerSec=29.983123428928216, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:06:01,879] [INFO] [logging.py:68:log_dist] [Rank 0] step=420, skipped=2, lr=[9.71176514582969e-06], mom=[[0.9, 0.999]] [2022-12-14 17:06:01,880] [INFO] [timer.py:197:stop] 0/420, RunningAvgSamplesPerSec=29.913919604412595, CurrSamplesPerSec=30.157644554169018, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:06:08,258] [INFO] [timer.py:197:stop] 0/421, RunningAvgSamplesPerSec=29.913501817917656, CurrSamplesPerSec=29.73988307255285, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:06:14,629] [INFO] [timer.py:197:stop] 0/422, RunningAvgSamplesPerSec=29.913065095791428, CurrSamplesPerSec=29.731193738046514, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:06:21,032] [INFO] [timer.py:197:stop] 0/423, RunningAvgSamplesPerSec=29.91225092536343, CurrSamplesPerSec=29.57417339039528, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:06:27,596] [INFO] [timer.py:197:stop] 0/424, RunningAvgSamplesPerSec=29.910256400964272, CurrSamplesPerSec=29.093544308091587, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:06:33,738] [INFO] [timer.py:197:stop] 0/425, RunningAvgSamplesPerSec=29.91287926190205, CurrSamplesPerSec=31.062360908993288, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB {'loss': 0.0201, 'learning_rate': 9.73089868785391e-06, 'epoch': 10.01} [2022-12-14 17:06:40,148] [INFO] [timer.py:197:stop] 0/426, RunningAvgSamplesPerSec=29.91272795042511, CurrSamplesPerSec=29.84886017767289, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:06:46,889] [INFO] [timer.py:197:stop] 0/427, RunningAvgSamplesPerSec=29.912709036054824, CurrSamplesPerSec=29.904691497650607, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:06:53,615] [INFO] [timer.py:197:stop] 0/428, RunningAvgSamplesPerSec=29.913341011284587, CurrSamplesPerSec=30.184369755118354, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:06:59,514] [INFO] [timer.py:197:stop] 0/429, RunningAvgSamplesPerSec=29.91478897562596, CurrSamplesPerSec=30.544639577620245, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:06,840] [INFO] [logging.py:68:log_dist] [Rank 0] step=430, skipped=2, lr=[9.74980738869138e-06], mom=[[0.9, 0.999]] [2022-12-14 17:07:06,841] [INFO] [timer.py:197:stop] 0/430, RunningAvgSamplesPerSec=29.915653499972326, CurrSamplesPerSec=30.289428490057038, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:13,024] [INFO] [timer.py:197:stop] 0/431, RunningAvgSamplesPerSec=29.91436405162306, CurrSamplesPerSec=29.37250024783877, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:19,549] [INFO] [timer.py:197:stop] 0/432, RunningAvgSamplesPerSec=29.915314757222646, CurrSamplesPerSec=30.3288181299895, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:21,901] [INFO] [timer.py:197:stop] 0/433, RunningAvgSamplesPerSec=29.91688368249784, CurrSamplesPerSec=30.60712292393048, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:24,073] [INFO] [timer.py:197:stop] 0/434, RunningAvgSamplesPerSec=29.91664758696752, CurrSamplesPerSec=29.81523615013076, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:26,257] [INFO] [timer.py:197:stop] 0/435, 
RunningAvgSamplesPerSec=29.916357469082858, CurrSamplesPerSec=29.79155061630621, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:28,380] [INFO] [timer.py:197:stop] 0/436, RunningAvgSamplesPerSec=29.917709063700702, CurrSamplesPerSec=30.51465373852896, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:30,494] [INFO] [timer.py:197:stop] 0/437, RunningAvgSamplesPerSec=29.919365952286498, CurrSamplesPerSec=30.656205799767186, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:32,662] [INFO] [timer.py:197:stop] 0/438, RunningAvgSamplesPerSec=29.919260336993254, CurrSamplesPerSec=29.873388285391616, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:34,812] [INFO] [timer.py:197:stop] 0/439, RunningAvgSamplesPerSec=29.919722331839022, CurrSamplesPerSec=30.122520522464683, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:07:36,684] [INFO] [logging.py:68:log_dist] [Rank 0] step=440, skipped=2, lr=[9.786970978782465e-06], mom=[[0.9, 0.999]] [2022-12-14 17:07:36,685] [INFO] [timer.py:197:stop] 0/440, RunningAvgSamplesPerSec=29.928891947749587, CurrSamplesPerSec=34.557093097308496, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:08:15,939] [INFO] [timer.py:197:stop] 0/441, RunningAvgSamplesPerSec=29.92737256900503, CurrSamplesPerSec=29.276393397889215, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:08:22,268] [INFO] [timer.py:197:stop] 0/442, RunningAvgSamplesPerSec=29.92620809160595, CurrSamplesPerSec=29.423607594409464, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:08:28,552] [INFO] [timer.py:197:stop] 0/443, RunningAvgSamplesPerSec=29.92769489472171, CurrSamplesPerSec=30.596541907622502, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:08:34,735] [INFO] [timer.py:197:stop] 0/444, RunningAvgSamplesPerSec=29.92815181074884, CurrSamplesPerSec=30.131020749943314, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:08:40,814] [INFO] [timer.py:197:stop] 0/445, 
RunningAvgSamplesPerSec=29.928205059311185, CurrSamplesPerSec=29.951759489235975, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:08:46,831] [INFO] [timer.py:197:stop] 0/446, RunningAvgSamplesPerSec=29.929105655464873, CurrSamplesPerSec=30.333472249216904, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:08:52,752] [INFO] [timer.py:197:stop] 0/447, RunningAvgSamplesPerSec=29.93099027703326, CurrSamplesPerSec=30.791884221289614, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:08:58,645] [INFO] [timer.py:197:stop] 0/448, RunningAvgSamplesPerSec=29.932156225931976, CurrSamplesPerSec=30.460176820438846, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:09:04,880] [INFO] [timer.py:197:stop] 0/449, RunningAvgSamplesPerSec=29.931726632562764, CurrSamplesPerSec=29.741349363177118, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:09:10,763] [INFO] [logging.py:68:log_dist] [Rank 0] step=450, skipped=2, lr=[9.823295589572114e-06], mom=[[0.9, 0.999]] [2022-12-14 17:09:10,764] [INFO] [timer.py:197:stop] 0/450, RunningAvgSamplesPerSec=29.932036557922796, CurrSamplesPerSec=30.071218822455034, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0181, 'learning_rate': 9.823295589572114e-06, 'epoch': 11.0} [2022-12-14 17:09:17,267] [INFO] [timer.py:197:stop] 0/451, RunningAvgSamplesPerSec=29.932303563186675, CurrSamplesPerSec=30.05240294546299, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:09:23,285] [INFO] [timer.py:197:stop] 0/452, RunningAvgSamplesPerSec=29.93004199944191, CurrSamplesPerSec=28.94799226789532, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:09:28,954] [INFO] [timer.py:197:stop] 0/453, RunningAvgSamplesPerSec=29.930919146785687, CurrSamplesPerSec=30.330922245583842, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:09:35,276] [INFO] [timer.py:197:stop] 0/454, RunningAvgSamplesPerSec=29.92960850337359, CurrSamplesPerSec=29.349981161200425, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:09:41,289] [INFO] [timer.py:197:stop] 0/455, RunningAvgSamplesPerSec=29.928111312718112, CurrSamplesPerSec=29.2663772759807, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:09:47,400] [INFO] [timer.py:197:stop] 0/456, RunningAvgSamplesPerSec=29.92765817843899, CurrSamplesPerSec=29.72378974325869, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:09:53,344] [INFO] [timer.py:197:stop] 0/457, RunningAvgSamplesPerSec=29.927217697095426, CurrSamplesPerSec=29.728569490489278, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:09:59,370] [INFO] [timer.py:197:stop] 0/458, RunningAvgSamplesPerSec=29.9280542449503, CurrSamplesPerSec=30.313597692315163, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:10:05,163] [INFO] [timer.py:197:stop] 0/459, RunningAvgSamplesPerSec=29.928461450309317, CurrSamplesPerSec=30.115308896933534, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:10:11,063] [INFO] [logging.py:68:log_dist] [Rank 0] step=460, skipped=2, lr=[9.858818266705698e-06], mom=[[0.9, 0.999]] [2022-12-14 17:10:11,063] [INFO] [timer.py:197:stop] 0/460, RunningAvgSamplesPerSec=29.928808172618155, CurrSamplesPerSec=30.088105480972292, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:10:17,081] [INFO] [timer.py:197:stop] 0/461, RunningAvgSamplesPerSec=29.92835484408868, CurrSamplesPerSec=29.722163926200004, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:10:22,940] [INFO] [timer.py:197:stop] 0/462, RunningAvgSamplesPerSec=29.929744626066572, CurrSamplesPerSec=30.581577716675866, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:10:29,204] [INFO] [timer.py:197:stop] 0/463, RunningAvgSamplesPerSec=29.92965260123662, CurrSamplesPerSec=29.887381096724233, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:10:35,195] [INFO] [timer.py:197:stop] 0/464, RunningAvgSamplesPerSec=29.93083831266102, CurrSamplesPerSec=30.487641983343043, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:10:41,573] [INFO] [timer.py:197:stop] 0/465, RunningAvgSamplesPerSec=29.92979909767766, CurrSamplesPerSec=29.4572781031843, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:10:47,693] [INFO] [timer.py:197:stop] 0/466, RunningAvgSamplesPerSec=29.930972539365367, CurrSamplesPerSec=30.484342441125804, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:10:53,816] [INFO] [timer.py:197:stop] 0/467, RunningAvgSamplesPerSec=29.929620702712477, CurrSamplesPerSec=29.31527151212639, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:10:59,756] [INFO] [timer.py:197:stop] 0/468, RunningAvgSamplesPerSec=29.92980825417258, CurrSamplesPerSec=30.017275097820022, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:05,629] [INFO] [timer.py:197:stop] 0/469, RunningAvgSamplesPerSec=29.92970441981856, CurrSamplesPerSec=29.881395877980676, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:11,480] [INFO] [logging.py:68:log_dist] [Rank 0] step=470, skipped=2, lr=[9.893573655076761e-06], mom=[[0.9, 0.999]] [2022-12-14 17:11:11,481] [INFO] [timer.py:197:stop] 0/470, RunningAvgSamplesPerSec=29.929717990925777, CurrSamplesPerSec=29.936057043183304, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:17,581] [INFO] [timer.py:197:stop] 0/471, RunningAvgSamplesPerSec=29.930056363614536, CurrSamplesPerSec=30.089258914474957, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:23,593] [INFO] [timer.py:197:stop] 0/472, RunningAvgSamplesPerSec=29.931275847489342, CurrSamplesPerSec=30.514379707207503, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:26,000] [INFO] [timer.py:197:stop] 0/473, RunningAvgSamplesPerSec=29.932750033979097, CurrSamplesPerSec=30.642071623573287, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:28,123] [INFO] [timer.py:197:stop] 0/474, RunningAvgSamplesPerSec=29.93396511980529, CurrSamplesPerSec=30.51744983305158, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:11:30,246] [INFO] [timer.py:197:stop] 0/475, RunningAvgSamplesPerSec=29.935145340108186, CurrSamplesPerSec=30.502795114197653, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0133, 'learning_rate': 9.910673836465484e-06, 'epoch': 11.01} [2022-12-14 17:11:32,396] [INFO] [timer.py:197:stop] 0/476, RunningAvgSamplesPerSec=29.93557766144186, CurrSamplesPerSec=30.141475097828955, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:34,567] [INFO] [timer.py:197:stop] 0/477, RunningAvgSamplesPerSec=29.935370371403945, CurrSamplesPerSec=29.837437013698178, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:36,687] [INFO] [timer.py:197:stop] 0/478, RunningAvgSamplesPerSec=29.936667388038625, CurrSamplesPerSec=30.565723224535706, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:38,859] [INFO] [timer.py:197:stop] 0/479, RunningAvgSamplesPerSec=29.93640749713423, CurrSamplesPerSec=29.81320959449839, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:11:40,752] [INFO] [logging.py:68:log_dist] [Rank 0] step=480, skipped=2, lr=[9.927594201889966e-06], mom=[[0.9, 0.999]] [2022-12-14 17:11:40,753] [INFO] [timer.py:197:stop] 0/480, RunningAvgSamplesPerSec=29.944240423546947, CurrSamplesPerSec=34.21448501446722, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:12:21,340] [INFO] [timer.py:197:stop] 0/481, RunningAvgSamplesPerSec=29.945001994759394, CurrSamplesPerSec=30.313522381614455, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:12:27,776] [INFO] [timer.py:197:stop] 0/482, RunningAvgSamplesPerSec=29.942993718022933, CurrSamplesPerSec=29.01103235480864, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:12:34,217] [INFO] [timer.py:197:stop] 0/483, RunningAvgSamplesPerSec=29.94243941978199, CurrSamplesPerSec=29.6787244698835, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:12:40,642] [INFO] [timer.py:197:stop] 0/484, 
RunningAvgSamplesPerSec=29.94232694187008, CurrSamplesPerSec=29.888322847450954, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:12:46,689] [INFO] [timer.py:197:stop] 0/485, RunningAvgSamplesPerSec=29.943146399514973, CurrSamplesPerSec=30.343415873704867, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:12:53,308] [INFO] [timer.py:197:stop] 0/486, RunningAvgSamplesPerSec=29.94162294462178, CurrSamplesPerSec=29.223479466630902, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:12:59,603] [INFO] [timer.py:197:stop] 0/487, RunningAvgSamplesPerSec=29.939712076346126, CurrSamplesPerSec=29.042620908292943, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:13:06,807] [INFO] [timer.py:197:stop] 0/488, RunningAvgSamplesPerSec=29.939920573178014, CurrSamplesPerSec=30.041384935317712, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:13:13,282] [INFO] [timer.py:197:stop] 0/489, RunningAvgSamplesPerSec=29.941185675630535, CurrSamplesPerSec=30.568942935982353, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:13:19,587] [INFO] [logging.py:68:log_dist] [Rank 0] step=490, skipped=2, lr=[9.96091033869825e-06], mom=[[0.9, 0.999]] [2022-12-14 17:13:19,588] [INFO] [timer.py:197:stop] 0/490, RunningAvgSamplesPerSec=29.939995857533276, CurrSamplesPerSec=29.371577867128234, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:13:26,486] [INFO] [timer.py:197:stop] 0/491, RunningAvgSamplesPerSec=29.93944419906448, CurrSamplesPerSec=29.67263884137239, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:13:32,973] [INFO] [timer.py:197:stop] 0/492, RunningAvgSamplesPerSec=29.935721808624344, CurrSamplesPerSec=28.22001047492714, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:13:39,418] [INFO] [timer.py:197:stop] 0/493, RunningAvgSamplesPerSec=29.936618150213135, CurrSamplesPerSec=30.38237873695493, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:13:45,853] [INFO] [timer.py:197:stop] 0/494, 
RunningAvgSamplesPerSec=29.93721248149583, CurrSamplesPerSec=30.231907568353492, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:13:52,163] [INFO] [timer.py:197:stop] 0/495, RunningAvgSamplesPerSec=29.936497871787843, CurrSamplesPerSec=29.588999372474202, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:14:00,339] [INFO] [timer.py:197:stop] 0/496, RunningAvgSamplesPerSec=29.935169532812726, CurrSamplesPerSec=29.294345715798567, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:14:06,672] [INFO] [timer.py:197:stop] 0/497, RunningAvgSamplesPerSec=29.93406445150269, CurrSamplesPerSec=29.397951220302847, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:14:12,893] [INFO] [timer.py:197:stop] 0/498, RunningAvgSamplesPerSec=29.933623338039, CurrSamplesPerSec=29.71685657554701, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:14:19,557] [INFO] [timer.py:197:stop] 0/499, RunningAvgSamplesPerSec=29.93394986241016, CurrSamplesPerSec=30.096788758203907, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:14:25,984] [INFO] [logging.py:68:log_dist] [Rank 0] step=500, skipped=2, lr=[9.993550644973805e-06], mom=[[0.9, 0.999]] [2022-12-14 17:14:25,985] [INFO] [timer.py:197:stop] 0/500, RunningAvgSamplesPerSec=29.93467180396412, CurrSamplesPerSec=30.29783852798139, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0092, 'learning_rate': 9.993550644973805e-06, 'epoch': 12.0} [2022-12-14 17:14:32,128] [INFO] [timer.py:197:stop] 0/501, RunningAvgSamplesPerSec=29.93590029581805, CurrSamplesPerSec=30.560479157807972, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:14:38,648] [INFO] [timer.py:197:stop] 0/502, RunningAvgSamplesPerSec=29.934503480095145, CurrSamplesPerSec=29.253383773861543, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:14:47,339] [INFO] [timer.py:197:stop] 0/503, RunningAvgSamplesPerSec=29.934596817472947, CurrSamplesPerSec=29.981338523455513, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:14:53,647] [INFO] [timer.py:197:stop] 0/504, RunningAvgSamplesPerSec=29.93502506818842, CurrSamplesPerSec=30.15113066407518, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:00,118] [INFO] [timer.py:197:stop] 0/505, RunningAvgSamplesPerSec=29.936565291221, CurrSamplesPerSec=30.730298373077677, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:06,055] [INFO] [timer.py:197:stop] 0/506, RunningAvgSamplesPerSec=29.937643728614617, CurrSamplesPerSec=30.490128364204864, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:12,533] [INFO] [timer.py:197:stop] 0/507, RunningAvgSamplesPerSec=29.938508332497605, CurrSamplesPerSec=30.380717899779057, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:18,646] [INFO] [timer.py:197:stop] 0/508, RunningAvgSamplesPerSec=29.938156067615463, CurrSamplesPerSec=29.761315178815934, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:25,052] [INFO] [timer.py:197:stop] 0/509, RunningAvgSamplesPerSec=29.935042408863985, CurrSamplesPerSec=28.43845371182981, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:31,534] [INFO] [logging.py:68:log_dist] [Rank 0] step=510, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:15:31,534] [INFO] [timer.py:197:stop] 0/510, RunningAvgSamplesPerSec=29.935583597103015, CurrSamplesPerSec=30.212509275557178, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:37,409] [INFO] [timer.py:197:stop] 0/511, RunningAvgSamplesPerSec=29.936109258740398, CurrSamplesPerSec=30.205553599505432, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:43,906] [INFO] [timer.py:197:stop] 0/512, RunningAvgSamplesPerSec=29.934394845316323, CurrSamplesPerSec=29.08652379600514, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:46,438] [INFO] [timer.py:197:stop] 0/513, RunningAvgSamplesPerSec=29.935059830032312, CurrSamplesPerSec=30.278096002950708, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:15:48,659] [INFO] [timer.py:197:stop] 0/514, RunningAvgSamplesPerSec=29.93466336266809, CurrSamplesPerSec=29.733433111048786, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:50,832] [INFO] [timer.py:197:stop] 0/515, RunningAvgSamplesPerSec=29.934432558581378, CurrSamplesPerSec=29.816726440344212, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:52,988] [INFO] [timer.py:197:stop] 0/516, RunningAvgSamplesPerSec=29.93465054874488, CurrSamplesPerSec=30.04689965683451, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:55,118] [INFO] [timer.py:197:stop] 0/517, RunningAvgSamplesPerSec=29.93556163978349, CurrSamplesPerSec=30.41131949227055, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:57,234] [INFO] [timer.py:197:stop] 0/518, RunningAvgSamplesPerSec=29.936841827505667, CurrSamplesPerSec=30.611014586890274, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:15:59,388] [INFO] [timer.py:197:stop] 0/519, RunningAvgSamplesPerSec=29.93710048822155, CurrSamplesPerSec=30.07116829205778, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:16:01,269] [INFO] [logging.py:68:log_dist] [Rank 0] step=520, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:16:01,269] [INFO] [timer.py:197:stop] 0/520, RunningAvgSamplesPerSec=29.94468723686717, CurrSamplesPerSec=34.459568565162925, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:16:40,251] [INFO] [timer.py:197:stop] 0/521, RunningAvgSamplesPerSec=29.945293212773635, CurrSamplesPerSec=30.262520422248446, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:16:46,395] [INFO] [timer.py:197:stop] 0/522, RunningAvgSamplesPerSec=29.945163621262726, CurrSamplesPerSec=29.878056642381523, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:16:52,424] [INFO] [timer.py:197:stop] 0/523, RunningAvgSamplesPerSec=29.944881706657558, CurrSamplesPerSec=29.799001643280018, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:16:58,883] [INFO] [timer.py:197:stop] 0/524, RunningAvgSamplesPerSec=29.941114511350243, CurrSamplesPerSec=28.099368182694626, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:17:05,247] [INFO] [timer.py:197:stop] 0/525, RunningAvgSamplesPerSec=29.94193316883897, CurrSamplesPerSec=30.37547180998963, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0092, 'learning_rate': 1e-05, 'epoch': 13.0} [2022-12-14 17:17:11,682] [INFO] [timer.py:197:stop] 0/526, RunningAvgSamplesPerSec=29.941617837060992, CurrSamplesPerSec=29.777604430819256, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:17:17,851] [INFO] [timer.py:197:stop] 0/527, RunningAvgSamplesPerSec=29.940848585597934, CurrSamplesPerSec=29.54312550516034, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:17:23,771] [INFO] [timer.py:197:stop] 0/528, RunningAvgSamplesPerSec=29.942090296743146, CurrSamplesPerSec=30.608525902692566, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:17:30,362] [INFO] [timer.py:197:stop] 0/529, RunningAvgSamplesPerSec=29.938771305819966, CurrSamplesPerSec=28.2893461893045, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:17:36,369] [INFO] [logging.py:68:log_dist] [Rank 0] step=530, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:17:36,370] [INFO] [timer.py:197:stop] 0/530, RunningAvgSamplesPerSec=29.938377798347336, CurrSamplesPerSec=29.732428644673853, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:17:42,570] [INFO] [timer.py:197:stop] 0/531, RunningAvgSamplesPerSec=29.938816513641605, CurrSamplesPerSec=30.172267862427685, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:17:48,936] [INFO] [timer.py:197:stop] 0/532, RunningAvgSamplesPerSec=29.937821309254, CurrSamplesPerSec=29.420473067821234, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:17:54,790] [INFO] [timer.py:197:stop] 0/533, RunningAvgSamplesPerSec=29.93730378944752, 
CurrSamplesPerSec=29.665513135643888, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:00,885] [INFO] [timer.py:197:stop] 0/534, RunningAvgSamplesPerSec=29.937570827341574, CurrSamplesPerSec=30.080044033877936, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:06,530] [INFO] [timer.py:197:stop] 0/535, RunningAvgSamplesPerSec=29.9373927532391, CurrSamplesPerSec=29.84295673064897, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:12,414] [INFO] [timer.py:197:stop] 0/536, RunningAvgSamplesPerSec=29.9373822237188, CurrSamplesPerSec=29.931771043275468, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:18,622] [INFO] [timer.py:197:stop] 0/537, RunningAvgSamplesPerSec=29.936924294162537, CurrSamplesPerSec=29.694374842850227, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:24,989] [INFO] [timer.py:197:stop] 0/538, RunningAvgSamplesPerSec=29.936540938120686, CurrSamplesPerSec=29.732843596326063, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:31,273] [INFO] [timer.py:197:stop] 0/539, RunningAvgSamplesPerSec=29.934823608046056, CurrSamplesPerSec=29.041844809369366, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:37,454] [INFO] [logging.py:68:log_dist] [Rank 0] step=540, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:18:37,455] [INFO] [timer.py:197:stop] 0/540, RunningAvgSamplesPerSec=29.9340974316372, CurrSamplesPerSec=29.549164625339333, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:43,699] [INFO] [timer.py:197:stop] 0/541, RunningAvgSamplesPerSec=29.93523497487942, CurrSamplesPerSec=30.56003034651909, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:49,979] [INFO] [timer.py:197:stop] 0/542, RunningAvgSamplesPerSec=29.93270794768381, CurrSamplesPerSec=28.63002784446392, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:18:56,197] [INFO] [timer.py:197:stop] 0/543, RunningAvgSamplesPerSec=29.933072421243782, 
CurrSamplesPerSec=30.131193238573296, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:02,459] [INFO] [timer.py:197:stop] 0/544, RunningAvgSamplesPerSec=29.933049131013156, CurrSamplesPerSec=29.920454427659077, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:08,601] [INFO] [timer.py:197:stop] 0/545, RunningAvgSamplesPerSec=29.933497148926012, CurrSamplesPerSec=30.178312506415136, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:14,432] [INFO] [timer.py:197:stop] 0/546, RunningAvgSamplesPerSec=29.932962232425073, CurrSamplesPerSec=29.645299101879335, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:20,879] [INFO] [timer.py:197:stop] 0/547, RunningAvgSamplesPerSec=29.93385955084332, CurrSamplesPerSec=30.43010813391152, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:27,148] [INFO] [timer.py:197:stop] 0/548, RunningAvgSamplesPerSec=29.932970315349, CurrSamplesPerSec=29.456072411107687, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:33,271] [INFO] [timer.py:197:stop] 0/549, RunningAvgSamplesPerSec=29.93353806676922, CurrSamplesPerSec=30.24678022509466, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:39,134] [INFO] [logging.py:68:log_dist] [Rank 0] step=550, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:19:39,134] [INFO] [timer.py:197:stop] 0/550, RunningAvgSamplesPerSec=29.932414700657624, CurrSamplesPerSec=29.33031648301219, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.007, 'learning_rate': 1e-05, 'epoch': 13.01} [2022-12-14 17:19:45,132] [INFO] [timer.py:197:stop] 0/551, RunningAvgSamplesPerSec=29.932154784913248, CurrSamplesPerSec=29.790396751455614, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:51,488] [INFO] [timer.py:197:stop] 0/552, RunningAvgSamplesPerSec=29.93110911076577, CurrSamplesPerSec=29.367856793937193, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:53,850] [INFO] [timer.py:197:stop] 0/553, 
RunningAvgSamplesPerSec=29.932402473328587, CurrSamplesPerSec=30.661101033601796, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:56,002] [INFO] [timer.py:197:stop] 0/554, RunningAvgSamplesPerSec=29.932702508678116, CurrSamplesPerSec=30.098941798645164, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:19:58,157] [INFO] [timer.py:197:stop] 0/555, RunningAvgSamplesPerSec=29.933580666341967, CurrSamplesPerSec=30.426317509959787, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:20:00,266] [INFO] [timer.py:197:stop] 0/556, RunningAvgSamplesPerSec=29.9349376925135, CurrSamplesPerSec=30.70470527490693, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:20:02,409] [INFO] [timer.py:197:stop] 0/557, RunningAvgSamplesPerSec=29.935445114557695, CurrSamplesPerSec=30.219226616064777, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:20:04,542] [INFO] [timer.py:197:stop] 0/558, RunningAvgSamplesPerSec=29.936215321793203, CurrSamplesPerSec=30.3698839331272, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:20:06,719] [INFO] [timer.py:197:stop] 0/559, RunningAvgSamplesPerSec=29.935903215974935, CurrSamplesPerSec=29.763374286200246, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:20:08,601] [INFO] [logging.py:68:log_dist] [Rank 0] step=560, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:20:08,602] [INFO] [timer.py:197:stop] 0/560, RunningAvgSamplesPerSec=29.94283087015208, CurrSamplesPerSec=34.37354058199113, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:20:49,072] [INFO] [timer.py:197:stop] 0/561, RunningAvgSamplesPerSec=29.94246897467693, CurrSamplesPerSec=29.741886490081107, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:20:55,346] [INFO] [timer.py:197:stop] 0/562, RunningAvgSamplesPerSec=29.94006094572516, CurrSamplesPerSec=28.651987415711147, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:21:01,879] [INFO] [timer.py:197:stop] 0/563, 
RunningAvgSamplesPerSec=29.939146846456882, CurrSamplesPerSec=29.435871556806084, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:21:08,301] [INFO] [timer.py:197:stop] 0/564, RunningAvgSamplesPerSec=29.93940314258978, CurrSamplesPerSec=30.083880352800097, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:21:15,149] [INFO] [timer.py:197:stop] 0/565, RunningAvgSamplesPerSec=29.939253054901457, CurrSamplesPerSec=29.855141168138317, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:21:22,788] [INFO] [timer.py:197:stop] 0/566, RunningAvgSamplesPerSec=29.938512066217378, CurrSamplesPerSec=29.5270787166701, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:21:29,223] [INFO] [timer.py:197:stop] 0/567, RunningAvgSamplesPerSec=29.939151115508917, CurrSamplesPerSec=30.303974647367706, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:21:35,730] [INFO] [timer.py:197:stop] 0/568, RunningAvgSamplesPerSec=29.938168411703824, CurrSamplesPerSec=29.393067984610315, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:21:42,268] [INFO] [timer.py:197:stop] 0/569, RunningAvgSamplesPerSec=29.93778068343115, CurrSamplesPerSec=29.719926250289106, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:21:48,739] [INFO] [logging.py:68:log_dist] [Rank 0] step=570, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:21:48,740] [INFO] [timer.py:197:stop] 0/570, RunningAvgSamplesPerSec=29.937520957774137, CurrSamplesPerSec=29.79097863201337, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:21:55,234] [INFO] [timer.py:197:stop] 0/571, RunningAvgSamplesPerSec=29.936789292168935, CurrSamplesPerSec=29.52690333174352, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:22:01,783] [INFO] [timer.py:197:stop] 0/572, RunningAvgSamplesPerSec=29.936838649581954, CurrSamplesPerSec=29.964949435245355, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:22:08,995] [INFO] [timer.py:197:stop] 0/573, 
RunningAvgSamplesPerSec=29.93641579128206, CurrSamplesPerSec=29.697315026980718, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:22:15,170] [INFO] [timer.py:197:stop] 0/574, RunningAvgSamplesPerSec=29.93731802191041, CurrSamplesPerSec=30.46152833580602, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:22:21,451] [INFO] [timer.py:197:stop] 0/575, RunningAvgSamplesPerSec=29.937378482854818, CurrSamplesPerSec=29.97200221031068, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0055, 'learning_rate': 1e-05, 'epoch': 14.0} [2022-12-14 17:22:28,158] [INFO] [timer.py:197:stop] 0/576, RunningAvgSamplesPerSec=29.938017047807552, CurrSamplesPerSec=30.308450040454854, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:22:34,152] [INFO] [timer.py:197:stop] 0/577, RunningAvgSamplesPerSec=29.938265447869927, CurrSamplesPerSec=30.08153057561878, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:22:40,469] [INFO] [timer.py:197:stop] 0/578, RunningAvgSamplesPerSec=29.938827464718557, CurrSamplesPerSec=30.265519600142017, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:22:46,759] [INFO] [timer.py:197:stop] 0/579, RunningAvgSamplesPerSec=29.937391477994442, CurrSamplesPerSec=29.132538690205195, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:22:53,054] [INFO] [logging.py:68:log_dist] [Rank 0] step=580, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:22:53,055] [INFO] [timer.py:197:stop] 0/580, RunningAvgSamplesPerSec=29.936298461430226, CurrSamplesPerSec=29.318662249497258, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:22:59,504] [INFO] [timer.py:197:stop] 0/581, RunningAvgSamplesPerSec=29.936155076159384, CurrSamplesPerSec=29.85350759073089, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:23:06,719] [INFO] [timer.py:197:stop] 0/582, RunningAvgSamplesPerSec=29.9358291127924, CurrSamplesPerSec=29.748280778699144, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 
17:23:13,841] [INFO] [timer.py:197:stop] 0/583, RunningAvgSamplesPerSec=29.935918999511994, CurrSamplesPerSec=29.988144405793694, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:23:20,423] [INFO] [timer.py:197:stop] 0/584, RunningAvgSamplesPerSec=29.936315841039608, CurrSamplesPerSec=30.168673431394044, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:23:27,006] [INFO] [timer.py:197:stop] 0/585, RunningAvgSamplesPerSec=29.937301393180775, CurrSamplesPerSec=30.522116913602776, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:23:33,432] [INFO] [timer.py:197:stop] 0/586, RunningAvgSamplesPerSec=29.937878937409387, CurrSamplesPerSec=30.27842386653355, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:23:39,954] [INFO] [timer.py:197:stop] 0/587, RunningAvgSamplesPerSec=29.93758136993092, CurrSamplesPerSec=29.764806589794016, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:23:46,200] [INFO] [timer.py:197:stop] 0/588, RunningAvgSamplesPerSec=29.937691081078565, CurrSamplesPerSec=30.002010226830617, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:23:52,291] [INFO] [timer.py:197:stop] 0/589, RunningAvgSamplesPerSec=29.937609012585035, CurrSamplesPerSec=29.88959413867207, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:23:58,590] [INFO] [logging.py:68:log_dist] [Rank 0] step=590, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:23:58,591] [INFO] [timer.py:197:stop] 0/590, RunningAvgSamplesPerSec=29.93745058451544, CurrSamplesPerSec=29.844741787581757, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:04,951] [INFO] [timer.py:197:stop] 0/591, RunningAvgSamplesPerSec=29.935937226657522, CurrSamplesPerSec=29.07181283864351, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:11,173] [INFO] [timer.py:197:stop] 0/592, RunningAvgSamplesPerSec=29.937051837001732, CurrSamplesPerSec=30.608302534281275, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:13,610] 
[INFO] [timer.py:197:stop] 0/593, RunningAvgSamplesPerSec=29.937715644281155, CurrSamplesPerSec=30.334562300927477, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:15,729] [INFO] [timer.py:197:stop] 0/594, RunningAvgSamplesPerSec=29.938760246952178, CurrSamplesPerSec=30.56914136201017, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:17,872] [INFO] [timer.py:197:stop] 0/595, RunningAvgSamplesPerSec=29.939262663341207, CurrSamplesPerSec=30.239682719031034, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:20,018] [INFO] [timer.py:197:stop] 0/596, RunningAvgSamplesPerSec=29.939884486298144, CurrSamplesPerSec=30.313231411968463, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:22,178] [INFO] [timer.py:197:stop] 0/597, RunningAvgSamplesPerSec=29.939979467948685, CurrSamplesPerSec=29.996505265289297, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:24,291] [INFO] [timer.py:197:stop] 0/598, RunningAvgSamplesPerSec=29.941159771949764, CurrSamplesPerSec=30.660337582381636, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:26,482] [INFO] [timer.py:197:stop] 0/599, RunningAvgSamplesPerSec=29.940508060812352, CurrSamplesPerSec=29.557070920003078, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:24:28,404] [INFO] [logging.py:68:log_dist] [Rank 0] step=600, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:24:28,405] [INFO] [timer.py:197:stop] 0/600, RunningAvgSamplesPerSec=29.946012709155607, CurrSamplesPerSec=33.63814051266577, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0054, 'learning_rate': 1e-05, 'epoch': 14.01} [2022-12-14 17:25:11,118] [INFO] [timer.py:197:stop] 0/601, RunningAvgSamplesPerSec=29.94497381326137, CurrSamplesPerSec=29.3363618798321, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:25:17,788] [INFO] [timer.py:197:stop] 0/602, RunningAvgSamplesPerSec=29.94249628844026, CurrSamplesPerSec=28.52865023637711, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:25:24,256] [INFO] [timer.py:197:stop] 0/603, RunningAvgSamplesPerSec=29.943150736113477, CurrSamplesPerSec=30.341045956431085, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:25:30,845] [INFO] [timer.py:197:stop] 0/604, RunningAvgSamplesPerSec=29.94239256398053, CurrSamplesPerSec=29.493572590861497, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:25:37,306] [INFO] [timer.py:197:stop] 0/605, RunningAvgSamplesPerSec=29.942193305133994, CurrSamplesPerSec=29.822718910561324, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:25:44,147] [INFO] [timer.py:197:stop] 0/606, RunningAvgSamplesPerSec=29.941210554363934, CurrSamplesPerSec=29.360131670016457, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:25:50,876] [INFO] [timer.py:197:stop] 0/607, RunningAvgSamplesPerSec=29.94208792998009, CurrSamplesPerSec=30.481587022463803, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:25:57,052] [INFO] [timer.py:197:stop] 0/608, RunningAvgSamplesPerSec=29.94168968481071, CurrSamplesPerSec=29.70267784508315, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:26:03,878] [INFO] [timer.py:197:stop] 0/609, RunningAvgSamplesPerSec=29.94198498864028, CurrSamplesPerSec=30.12201688068914, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:26:10,026] [INFO] [logging.py:68:log_dist] [Rank 0] step=610, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:26:10,026] [INFO] [timer.py:197:stop] 0/610, RunningAvgSamplesPerSec=29.940197143817983, CurrSamplesPerSec=28.892995084534213, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:26:17,234] [INFO] [timer.py:197:stop] 0/611, RunningAvgSamplesPerSec=29.93937412067569, CurrSamplesPerSec=29.447215378806376, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:26:23,741] [INFO] [timer.py:197:stop] 0/612, RunningAvgSamplesPerSec=29.939336417720835, CurrSamplesPerSec=29.916392942937335, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:26:30,214] [INFO] [timer.py:197:stop] 0/613, RunningAvgSamplesPerSec=29.93554349915285, CurrSamplesPerSec=27.78810784668125, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:26:37,025] [INFO] [timer.py:197:stop] 0/614, RunningAvgSamplesPerSec=29.935314297987233, CurrSamplesPerSec=29.795925534676865, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:26:43,158] [INFO] [timer.py:197:stop] 0/615, RunningAvgSamplesPerSec=29.935948796085846, CurrSamplesPerSec=30.32937325917252, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:26:50,467] [INFO] [timer.py:197:stop] 0/616, RunningAvgSamplesPerSec=29.936661652696223, CurrSamplesPerSec=30.38012650678087, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:26:56,609] [INFO] [timer.py:197:stop] 0/617, RunningAvgSamplesPerSec=29.936631539472497, CurrSamplesPerSec=29.918153451161555, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:27:03,124] [INFO] [timer.py:197:stop] 0/618, RunningAvgSamplesPerSec=29.934643034320832, CurrSamplesPerSec=28.759787214256114, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:27:09,596] [INFO] [timer.py:197:stop] 0/619, RunningAvgSamplesPerSec=29.93468814872888, CurrSamplesPerSec=29.96250448987537, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:27:16,227] [INFO] [logging.py:68:log_dist] [Rank 0] step=620, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:27:16,227] [INFO] [timer.py:197:stop] 0/620, RunningAvgSamplesPerSec=29.93387322884523, CurrSamplesPerSec=29.439387109927548, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:27:22,416] [INFO] [timer.py:197:stop] 0/621, RunningAvgSamplesPerSec=29.934121295211654, CurrSamplesPerSec=30.088216773332718, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:27:28,855] [INFO] [timer.py:197:stop] 0/622, RunningAvgSamplesPerSec=29.934363897489884, CurrSamplesPerSec=30.08529309280681, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:27:35,674] [INFO] [timer.py:197:stop] 0/623, RunningAvgSamplesPerSec=29.934408659300466, CurrSamplesPerSec=29.96218677661173, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:27:42,249] [INFO] [timer.py:197:stop] 0/624, RunningAvgSamplesPerSec=29.934695121699956, CurrSamplesPerSec=30.113653481605166, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:27:48,834] [INFO] [timer.py:197:stop] 0/625, RunningAvgSamplesPerSec=29.934818345346983, CurrSamplesPerSec=30.011660516803193, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0043, 'learning_rate': 1e-05, 'epoch': 15.01} [2022-12-14 17:27:55,049] [INFO] [timer.py:197:stop] 0/626, RunningAvgSamplesPerSec=29.935537516656403, CurrSamplesPerSec=30.390400075716897, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:01,960] [INFO] [timer.py:197:stop] 0/627, RunningAvgSamplesPerSec=29.935972630517654, CurrSamplesPerSec=30.209972768473957, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:08,807] [INFO] [timer.py:197:stop] 0/628, RunningAvgSamplesPerSec=29.93518989971446, CurrSamplesPerSec=29.453861693322878, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:15,317] [INFO] [timer.py:197:stop] 0/629, RunningAvgSamplesPerSec=29.935582772968477, CurrSamplesPerSec=30.18356198147127, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:22,496] [INFO] [logging.py:68:log_dist] [Rank 0] step=630, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:28:22,497] [INFO] [timer.py:197:stop] 0/630, RunningAvgSamplesPerSec=29.935366301237053, CurrSamplesPerSec=29.8002521141341, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:29,053] [INFO] [timer.py:197:stop] 0/631, RunningAvgSamplesPerSec=29.935780965298868, CurrSamplesPerSec=30.198478823188875, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:35,566] [INFO] [timer.py:197:stop] 0/632, RunningAvgSamplesPerSec=29.935405623718616, 
CurrSamplesPerSec=29.701166070435132, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:37,918] [INFO] [timer.py:197:stop] 0/633, RunningAvgSamplesPerSec=29.936705686082373, CurrSamplesPerSec=30.778821005811658, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:40,032] [INFO] [timer.py:197:stop] 0/634, RunningAvgSamplesPerSec=29.937787222776993, CurrSamplesPerSec=30.63618244153188, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:42,193] [INFO] [timer.py:197:stop] 0/635, RunningAvgSamplesPerSec=29.93784448359478, CurrSamplesPerSec=29.974077187793498, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:44,375] [INFO] [timer.py:197:stop] 0/636, RunningAvgSamplesPerSec=29.937424562443457, CurrSamplesPerSec=29.673957455774342, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:46,523] [INFO] [timer.py:197:stop] 0/637, RunningAvgSamplesPerSec=29.93775304714489, CurrSamplesPerSec=30.147473550883088, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:48,723] [INFO] [timer.py:197:stop] 0/638, RunningAvgSamplesPerSec=29.937559809170597, CurrSamplesPerSec=29.815355367865557, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:50,805] [INFO] [timer.py:197:stop] 0/639, RunningAvgSamplesPerSec=29.939344076185485, CurrSamplesPerSec=31.118917686959534, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:28:52,715] [INFO] [logging.py:68:log_dist] [Rank 0] step=640, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:28:52,716] [INFO] [timer.py:197:stop] 0/640, RunningAvgSamplesPerSec=29.944845073483936, CurrSamplesPerSec=33.91420233694085, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:29:32,521] [INFO] [timer.py:197:stop] 0/641, RunningAvgSamplesPerSec=29.944714830333318, CurrSamplesPerSec=29.861850006518903, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:29:38,726] [INFO] [timer.py:197:stop] 0/642, RunningAvgSamplesPerSec=29.94567385476609, 
CurrSamplesPerSec=30.57131377331381, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:29:45,900] [INFO] [timer.py:197:stop] 0/643, RunningAvgSamplesPerSec=29.946052248800296, CurrSamplesPerSec=30.19020194172486, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:29:52,087] [INFO] [timer.py:197:stop] 0/644, RunningAvgSamplesPerSec=29.946184634610937, CurrSamplesPerSec=30.031285467952383, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:29:58,236] [INFO] [timer.py:197:stop] 0/645, RunningAvgSamplesPerSec=29.945897122688283, CurrSamplesPerSec=29.762446992643635, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:30:04,897] [INFO] [timer.py:197:stop] 0/646, RunningAvgSamplesPerSec=29.94412605273401, CurrSamplesPerSec=28.84711319266241, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:30:11,958] [INFO] [timer.py:197:stop] 0/647, RunningAvgSamplesPerSec=29.942835379522048, CurrSamplesPerSec=29.13412593592142, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:30:18,196] [INFO] [timer.py:197:stop] 0/648, RunningAvgSamplesPerSec=29.94174123519367, CurrSamplesPerSec=29.25229353349256, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:30:24,733] [INFO] [timer.py:197:stop] 0/649, RunningAvgSamplesPerSec=29.94113689007374, CurrSamplesPerSec=29.55576267302949, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:30:31,172] [INFO] [logging.py:68:log_dist] [Rank 0] step=650, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:30:31,172] [INFO] [timer.py:197:stop] 0/650, RunningAvgSamplesPerSec=29.941332313872845, CurrSamplesPerSec=30.068308548364097, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0048, 'learning_rate': 1e-05, 'epoch': 16.0} [2022-12-14 17:30:37,874] [INFO] [timer.py:197:stop] 0/651, RunningAvgSamplesPerSec=29.94140766555329, CurrSamplesPerSec=29.990315435442305, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:30:44,759] [INFO] [timer.py:197:stop] 0/652, 
RunningAvgSamplesPerSec=29.94150999144909, CurrSamplesPerSec=30.0080673478105, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:30:51,093] [INFO] [timer.py:197:stop] 0/653, RunningAvgSamplesPerSec=29.941519066289764, CurrSamplesPerSec=29.94741887680998, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:30:57,529] [INFO] [timer.py:197:stop] 0/654, RunningAvgSamplesPerSec=29.941778439563436, CurrSamplesPerSec=30.111589534979853, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:31:03,313] [INFO] [timer.py:197:stop] 0/655, RunningAvgSamplesPerSec=29.94239728262093, CurrSamplesPerSec=30.3514029224857, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:31:09,476] [INFO] [timer.py:197:stop] 0/656, RunningAvgSamplesPerSec=29.94315026104793, CurrSamplesPerSec=30.44306684954288, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:31:15,800] [INFO] [timer.py:197:stop] 0/657, RunningAvgSamplesPerSec=29.943477860492052, CurrSamplesPerSec=30.15927431360199, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:31:21,824] [INFO] [timer.py:197:stop] 0/658, RunningAvgSamplesPerSec=29.94284842413404, CurrSamplesPerSec=29.5361756056812, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:31:28,316] [INFO] [timer.py:197:stop] 0/659, RunningAvgSamplesPerSec=29.943811372207737, CurrSamplesPerSec=30.589139896632847, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:31:34,475] [INFO] [logging.py:68:log_dist] [Rank 0] step=660, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:31:34,476] [INFO] [timer.py:197:stop] 0/660, RunningAvgSamplesPerSec=29.94447126626885, CurrSamplesPerSec=30.3844008707988, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:31:41,217] [INFO] [timer.py:197:stop] 0/661, RunningAvgSamplesPerSec=29.94371674046536, CurrSamplesPerSec=29.455348396810333, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:31:47,405] [INFO] [timer.py:197:stop] 0/662, 
RunningAvgSamplesPerSec=29.94290469189936, CurrSamplesPerSec=29.417174796580912, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:31:54,022] [INFO] [timer.py:197:stop] 0/663, RunningAvgSamplesPerSec=29.94218795743871, CurrSamplesPerSec=29.47651139955624, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:00,499] [INFO] [timer.py:197:stop] 0/664, RunningAvgSamplesPerSec=29.941906019797887, CurrSamplesPerSec=29.756699721727358, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:07,196] [INFO] [timer.py:197:stop] 0/665, RunningAvgSamplesPerSec=29.94233806955962, CurrSamplesPerSec=30.231117676088203, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:13,226] [INFO] [timer.py:197:stop] 0/666, RunningAvgSamplesPerSec=29.94323547428685, CurrSamplesPerSec=30.55029543036369, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:20,063] [INFO] [timer.py:197:stop] 0/667, RunningAvgSamplesPerSec=29.942134021481632, CurrSamplesPerSec=29.228233313716128, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:26,125] [INFO] [timer.py:197:stop] 0/668, RunningAvgSamplesPerSec=29.941827406720932, CurrSamplesPerSec=29.739309775761544, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:32,712] [INFO] [timer.py:197:stop] 0/669, RunningAvgSamplesPerSec=29.941779963491797, CurrSamplesPerSec=29.910216131822896, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:38,930] [INFO] [logging.py:68:log_dist] [Rank 0] step=670, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:32:38,931] [INFO] [timer.py:197:stop] 0/670, RunningAvgSamplesPerSec=29.941666222575776, CurrSamplesPerSec=29.865993057185044, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:45,165] [INFO] [timer.py:197:stop] 0/671, RunningAvgSamplesPerSec=29.94161838884207, CurrSamplesPerSec=29.90969956867308, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:51,730] [INFO] [timer.py:197:stop] 0/672, 
RunningAvgSamplesPerSec=29.941266747280356, CurrSamplesPerSec=29.707855195769824, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:54,293] [INFO] [timer.py:197:stop] 0/673, RunningAvgSamplesPerSec=29.941105518249742, CurrSamplesPerSec=29.83347097816743, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:56,411] [INFO] [timer.py:197:stop] 0/674, RunningAvgSamplesPerSec=29.942047184018204, CurrSamplesPerSec=30.58754699764311, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:32:58,533] [INFO] [timer.py:197:stop] 0/675, RunningAvgSamplesPerSec=29.942911344485417, CurrSamplesPerSec=30.53512981460549, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0039, 'learning_rate': 1e-05, 'epoch': 16.01} [2022-12-14 17:33:00,671] [INFO] [timer.py:197:stop] 0/676, RunningAvgSamplesPerSec=29.943457745673662, CurrSamplesPerSec=30.315764746857838, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:33:02,884] [INFO] [timer.py:197:stop] 0/677, RunningAvgSamplesPerSec=29.943007273454043, CurrSamplesPerSec=29.64244122069648, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:33:05,076] [INFO] [timer.py:197:stop] 0/678, RunningAvgSamplesPerSec=29.942416946994346, CurrSamplesPerSec=29.54918739459907, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:33:07,212] [INFO] [timer.py:197:stop] 0/679, RunningAvgSamplesPerSec=29.943039846286545, CurrSamplesPerSec=30.370134759289076, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:33:09,129] [INFO] [logging.py:68:log_dist] [Rank 0] step=680, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:33:09,130] [INFO] [timer.py:197:stop] 0/680, RunningAvgSamplesPerSec=29.948531896284823, CurrSamplesPerSec=34.19457583116939, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:33:51,673] [INFO] [timer.py:197:stop] 0/681, RunningAvgSamplesPerSec=29.946870915473475, CurrSamplesPerSec=28.86159756657047, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 
17:33:58,134] [INFO] [timer.py:197:stop] 0/682, RunningAvgSamplesPerSec=29.94563833267967, CurrSamplesPerSec=29.131501698769487, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:34:04,306] [INFO] [timer.py:197:stop] 0/683, RunningAvgSamplesPerSec=29.946452534257954, CurrSamplesPerSec=30.510554188975416, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:34:10,443] [INFO] [timer.py:197:stop] 0/684, RunningAvgSamplesPerSec=29.94658118632476, CurrSamplesPerSec=30.034450693336062, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:34:16,488] [INFO] [timer.py:197:stop] 0/685, RunningAvgSamplesPerSec=29.94684454077219, CurrSamplesPerSec=30.12753757868994, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:34:22,999] [INFO] [timer.py:197:stop] 0/686, RunningAvgSamplesPerSec=29.946778391810298, CurrSamplesPerSec=29.901666808877618, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:34:29,675] [INFO] [timer.py:197:stop] 0/687, RunningAvgSamplesPerSec=29.94723168825824, CurrSamplesPerSec=30.260534945512305, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:34:36,085] [INFO] [timer.py:197:stop] 0/688, RunningAvgSamplesPerSec=29.946516385743518, CurrSamplesPerSec=29.464433475649116, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:34:42,601] [INFO] [timer.py:197:stop] 0/689, RunningAvgSamplesPerSec=29.9468376626777, CurrSamplesPerSec=30.168870085660856, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:34:48,804] [INFO] [logging.py:68:log_dist] [Rank 0] step=690, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:34:48,805] [INFO] [timer.py:197:stop] 0/690, RunningAvgSamplesPerSec=29.945510737648256, CurrSamplesPerSec=29.060882193066302, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:34:55,904] [INFO] [timer.py:197:stop] 0/691, RunningAvgSamplesPerSec=29.945228065837796, CurrSamplesPerSec=29.75200655433036, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:35:02,116] 
[INFO] [timer.py:197:stop] 0/692, RunningAvgSamplesPerSec=29.944197227234145, CurrSamplesPerSec=29.250428837026238, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:35:08,366] [INFO] [timer.py:197:stop] 0/693, RunningAvgSamplesPerSec=29.943576173332435, CurrSamplesPerSec=29.521103811480522, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:35:14,501] [INFO] [timer.py:197:stop] 0/694, RunningAvgSamplesPerSec=29.94298703405994, CurrSamplesPerSec=29.54136008989776, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:35:20,383] [INFO] [timer.py:197:stop] 0/695, RunningAvgSamplesPerSec=29.943743628724324, CurrSamplesPerSec=30.476638212315773, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:35:26,718] [INFO] [timer.py:197:stop] 0/696, RunningAvgSamplesPerSec=29.94378636394865, CurrSamplesPerSec=29.973431236586876, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:35:33,119] [INFO] [timer.py:197:stop] 0/697, RunningAvgSamplesPerSec=29.944249080683605, CurrSamplesPerSec=30.26886068127514, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:35:39,352] [INFO] [timer.py:197:stop] 0/698, RunningAvgSamplesPerSec=29.94379795273833, CurrSamplesPerSec=29.633517572651037, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:35:46,379] [INFO] [timer.py:197:stop] 0/699, RunningAvgSamplesPerSec=29.94387702493779, CurrSamplesPerSec=29.999012756054526, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:35:52,632] [INFO] [logging.py:68:log_dist] [Rank 0] step=700, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:35:52,633] [INFO] [timer.py:197:stop] 0/700, RunningAvgSamplesPerSec=29.944082284042878, CurrSamplesPerSec=30.087835684727857, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0033, 'learning_rate': 1e-05, 'epoch': 17.0} [2022-12-14 17:35:58,907] [INFO] [timer.py:197:stop] 0/701, RunningAvgSamplesPerSec=29.944393987861464, CurrSamplesPerSec=30.163557930765574, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:36:05,262] [INFO] [timer.py:197:stop] 0/702, RunningAvgSamplesPerSec=29.945362931147855, CurrSamplesPerSec=30.638350404386504, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:36:12,420] [INFO] [timer.py:197:stop] 0/703, RunningAvgSamplesPerSec=29.945381542068255, CurrSamplesPerSec=29.95841486455373, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:36:18,733] [INFO] [timer.py:197:stop] 0/704, RunningAvgSamplesPerSec=29.94560556538671, CurrSamplesPerSec=30.103474989379947, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:36:25,521] [INFO] [timer.py:197:stop] 0/705, RunningAvgSamplesPerSec=29.946610913853082, CurrSamplesPerSec=30.669424417566415, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:36:31,593] [INFO] [timer.py:197:stop] 0/706, RunningAvgSamplesPerSec=29.946690680560625, CurrSamplesPerSec=30.002872026710207, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:36:38,453] [INFO] [timer.py:197:stop] 0/707, RunningAvgSamplesPerSec=29.94644158725778, CurrSamplesPerSec=29.772102256103416, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:36:45,054] [INFO] [timer.py:197:stop] 0/708, RunningAvgSamplesPerSec=29.947529518888548, CurrSamplesPerSec=30.734710556539646, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:36:51,347] [INFO] [timer.py:197:stop] 0/709, RunningAvgSamplesPerSec=29.94781216761311, CurrSamplesPerSec=30.14870264827659, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:36:57,758] [INFO] [logging.py:68:log_dist] [Rank 0] step=710, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:36:57,759] [INFO] [timer.py:197:stop] 0/710, RunningAvgSamplesPerSec=29.946692340827518, CurrSamplesPerSec=29.17539487006926, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:37:03,887] [INFO] [timer.py:197:stop] 0/711, RunningAvgSamplesPerSec=29.94665207373273, CurrSamplesPerSec=29.91817012363132, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:37:10,009] [INFO] [timer.py:197:stop] 0/712, RunningAvgSamplesPerSec=29.94557329884982, CurrSamplesPerSec=29.199796933751866, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:37:12,391] [INFO] [timer.py:197:stop] 0/713, RunningAvgSamplesPerSec=29.946180824441708, CurrSamplesPerSec=30.383836847217218, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:37:14,542] [INFO] [timer.py:197:stop] 0/714, RunningAvgSamplesPerSec=29.946431577311444, CurrSamplesPerSec=30.1257861522287, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:37:16,660] [INFO] [timer.py:197:stop] 0/715, RunningAvgSamplesPerSec=29.947301346509917, CurrSamplesPerSec=30.579672079753646, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:37:18,773] [INFO] [timer.py:197:stop] 0/716, RunningAvgSamplesPerSec=29.948292550630356, CurrSamplesPerSec=30.67212628482552, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:37:20,948] [INFO] [timer.py:197:stop] 0/717, RunningAvgSamplesPerSec=29.94805854370298, CurrSamplesPerSec=29.781905865661404, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:37:23,122] [INFO] [timer.py:197:stop] 0/718, RunningAvgSamplesPerSec=29.947866062051528, CurrSamplesPerSec=29.810872112703084, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:37:25,264] [INFO] [timer.py:197:stop] 0/719, RunningAvgSamplesPerSec=29.948262221225615, CurrSamplesPerSec=30.2346282459778, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:37:27,159] [INFO] [logging.py:68:log_dist] [Rank 0] step=720, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:37:27,159] [INFO] [timer.py:197:stop] 0/720, RunningAvgSamplesPerSec=29.95349718608037, CurrSamplesPerSec=34.245554374923216, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:38:08,754] [INFO] [timer.py:197:stop] 0/721, RunningAvgSamplesPerSec=29.9527151375341, CurrSamplesPerSec=29.401551110320554, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:38:15,356] [INFO] [timer.py:197:stop] 0/722, RunningAvgSamplesPerSec=29.95225841448229, CurrSamplesPerSec=29.627440657474544, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:38:21,666] [INFO] [timer.py:197:stop] 0/723, RunningAvgSamplesPerSec=29.952156398368338, CurrSamplesPerSec=29.87888472982159, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:38:28,401] [INFO] [timer.py:197:stop] 0/724, RunningAvgSamplesPerSec=29.952301604573805, CurrSamplesPerSec=30.057363014082096, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:38:34,788] [INFO] [timer.py:197:stop] 0/725, RunningAvgSamplesPerSec=29.95252055258793, CurrSamplesPerSec=30.111440914719488, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.003, 'learning_rate': 1e-05, 'epoch': 18.0} [2022-12-14 17:38:41,253] [INFO] [timer.py:197:stop] 0/726, RunningAvgSamplesPerSec=29.95304727602677, CurrSamplesPerSec=30.338779276332595, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:38:47,984] [INFO] [timer.py:197:stop] 0/727, RunningAvgSamplesPerSec=29.9530281659767, CurrSamplesPerSec=29.939198886462442, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:38:54,513] [INFO] [timer.py:197:stop] 0/728, RunningAvgSamplesPerSec=29.953204153657197, CurrSamplesPerSec=30.081341800269463, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:39:03,690] [INFO] [timer.py:197:stop] 0/729, RunningAvgSamplesPerSec=29.9541638766691, CurrSamplesPerSec=30.66753934985864, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:39:10,089] [INFO] [logging.py:68:log_dist] [Rank 0] step=730, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:39:10,090] [INFO] [timer.py:197:stop] 0/730, RunningAvgSamplesPerSec=29.95451769072158, CurrSamplesPerSec=30.213971530807484, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:39:16,679] [INFO] [timer.py:197:stop] 0/731, RunningAvgSamplesPerSec=29.954606372018862, 
CurrSamplesPerSec=30.01930599253374, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:39:22,817] [INFO] [timer.py:197:stop] 0/732, RunningAvgSamplesPerSec=29.95368503722233, CurrSamplesPerSec=29.29678194875574, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:39:28,801] [INFO] [timer.py:197:stop] 0/733, RunningAvgSamplesPerSec=29.95448835392155, CurrSamplesPerSec=30.552635556567267, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:39:35,194] [INFO] [timer.py:197:stop] 0/734, RunningAvgSamplesPerSec=29.95282915513884, CurrSamplesPerSec=28.787218275881965, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:39:41,324] [INFO] [timer.py:197:stop] 0/735, RunningAvgSamplesPerSec=29.95254771290634, CurrSamplesPerSec=29.747941219005803, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:39:48,285] [INFO] [timer.py:197:stop] 0/736, RunningAvgSamplesPerSec=29.950949985254304, CurrSamplesPerSec=28.82394364911677, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:39:54,708] [INFO] [timer.py:197:stop] 0/737, RunningAvgSamplesPerSec=29.9502971979077, CurrSamplesPerSec=29.478706095268215, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:00,613] [INFO] [timer.py:197:stop] 0/738, RunningAvgSamplesPerSec=29.948894295267273, CurrSamplesPerSec=28.95212604461684, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:07,244] [INFO] [timer.py:197:stop] 0/739, RunningAvgSamplesPerSec=29.948381909972525, CurrSamplesPerSec=29.575962281746328, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:13,616] [INFO] [logging.py:68:log_dist] [Rank 0] step=740, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:40:13,617] [INFO] [timer.py:197:stop] 0/740, RunningAvgSamplesPerSec=29.9474119593635, CurrSamplesPerSec=29.249246392025206, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:20,057] [INFO] [timer.py:197:stop] 0/741, RunningAvgSamplesPerSec=29.946815245098577, 
CurrSamplesPerSec=29.51283061356041, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:26,527] [INFO] [timer.py:197:stop] 0/742, RunningAvgSamplesPerSec=29.94679403806896, CurrSamplesPerSec=29.931130251570035, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:33,190] [INFO] [timer.py:197:stop] 0/743, RunningAvgSamplesPerSec=29.947691174570096, CurrSamplesPerSec=30.62664355443937, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:39,702] [INFO] [timer.py:197:stop] 0/744, RunningAvgSamplesPerSec=29.94750432068207, CurrSamplesPerSec=29.809683647095078, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:46,181] [INFO] [timer.py:197:stop] 0/745, RunningAvgSamplesPerSec=29.94815980539915, CurrSamplesPerSec=30.442569693956496, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:52,290] [INFO] [timer.py:197:stop] 0/746, RunningAvgSamplesPerSec=29.94904397544926, CurrSamplesPerSec=30.620735850046653, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:40:59,066] [INFO] [timer.py:197:stop] 0/747, RunningAvgSamplesPerSec=29.949681787507362, CurrSamplesPerSec=30.431864075054744, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:05,509] [INFO] [timer.py:197:stop] 0/748, RunningAvgSamplesPerSec=29.948930562571036, CurrSamplesPerSec=29.399548200457406, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:12,098] [INFO] [timer.py:197:stop] 0/749, RunningAvgSamplesPerSec=29.9490099526876, CurrSamplesPerSec=30.008352488519744, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:18,380] [INFO] [logging.py:68:log_dist] [Rank 0] step=750, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:41:18,380] [INFO] [timer.py:197:stop] 0/750, RunningAvgSamplesPerSec=29.94826059600099, CurrSamplesPerSec=29.39877544645957, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0029, 'learning_rate': 1e-05, 'epoch': 18.01} [2022-12-14 17:41:24,572] [INFO] [timer.py:197:stop] 0/751, 
RunningAvgSamplesPerSec=29.949044591492033, CurrSamplesPerSec=30.547201311671962, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:30,854] [INFO] [timer.py:197:stop] 0/752, RunningAvgSamplesPerSec=29.94845207573533, CurrSamplesPerSec=29.511146682028613, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:33,166] [INFO] [timer.py:197:stop] 0/753, RunningAvgSamplesPerSec=29.949362421503853, CurrSamplesPerSec=30.648071544432522, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:35,348] [INFO] [timer.py:197:stop] 0/754, RunningAvgSamplesPerSec=29.949001848092923, CurrSamplesPerSec=29.680640893326785, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:37,493] [INFO] [timer.py:197:stop] 0/755, RunningAvgSamplesPerSec=29.94933358489095, CurrSamplesPerSec=30.200897874256878, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:39,631] [INFO] [timer.py:197:stop] 0/756, RunningAvgSamplesPerSec=29.949777502774428, CurrSamplesPerSec=30.287825642795063, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:41,762] [INFO] [timer.py:197:stop] 0/757, RunningAvgSamplesPerSec=29.95036625517746, CurrSamplesPerSec=30.400973246260317, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:43,896] [INFO] [timer.py:197:stop] 0/758, RunningAvgSamplesPerSec=29.950907492365626, CurrSamplesPerSec=30.36520146018776, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:46,029] [INFO] [timer.py:197:stop] 0/759, RunningAvgSamplesPerSec=29.951521207875523, CurrSamplesPerSec=30.42280021433681, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:41:47,909] [INFO] [logging.py:68:log_dist] [Rank 0] step=760, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:41:47,910] [INFO] [timer.py:197:stop] 0/760, RunningAvgSamplesPerSec=29.95670289263911, CurrSamplesPerSec=34.47114026873665, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:42:29,053] [INFO] [timer.py:197:stop] 0/761, 
RunningAvgSamplesPerSec=29.955623668517134, CurrSamplesPerSec=29.159345856726166, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:42:35,528] [INFO] [timer.py:197:stop] 0/762, RunningAvgSamplesPerSec=29.956153457536757, CurrSamplesPerSec=30.363741703389046, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:42:42,308] [INFO] [timer.py:197:stop] 0/763, RunningAvgSamplesPerSec=29.956826087527663, CurrSamplesPerSec=30.476911566185997, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:42:48,827] [INFO] [timer.py:197:stop] 0/764, RunningAvgSamplesPerSec=29.955522368631975, CurrSamplesPerSec=28.99523877676561, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:42:55,214] [INFO] [timer.py:197:stop] 0/765, RunningAvgSamplesPerSec=29.953454182943766, CurrSamplesPerSec=28.456367119215795, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:43:01,561] [INFO] [timer.py:197:stop] 0/766, RunningAvgSamplesPerSec=29.954140503284144, CurrSamplesPerSec=30.487132983231906, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:43:08,764] [INFO] [timer.py:197:stop] 0/767, RunningAvgSamplesPerSec=29.953917209852587, CurrSamplesPerSec=29.784288377684128, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:43:15,801] [INFO] [timer.py:197:stop] 0/768, RunningAvgSamplesPerSec=29.953624211751144, CurrSamplesPerSec=29.731147636852096, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:43:22,520] [INFO] [timer.py:197:stop] 0/769, RunningAvgSamplesPerSec=29.953060119084114, CurrSamplesPerSec=29.52711769138118, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:43:28,722] [INFO] [logging.py:68:log_dist] [Rank 0] step=770, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:43:28,723] [INFO] [timer.py:197:stop] 0/770, RunningAvgSamplesPerSec=29.952298716417154, CurrSamplesPerSec=29.379485867805617, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:43:35,081] [INFO] [timer.py:197:stop] 0/771, 
RunningAvgSamplesPerSec=29.951978236725544, CurrSamplesPerSec=29.707858483549952, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:43:41,819] [INFO] [timer.py:197:stop] 0/772, RunningAvgSamplesPerSec=29.95188148963192, CurrSamplesPerSec=29.877667556905042, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:43:47,781] [INFO] [timer.py:197:stop] 0/773, RunningAvgSamplesPerSec=29.952292383150183, CurrSamplesPerSec=30.27206253174538, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:43:54,232] [INFO] [timer.py:197:stop] 0/774, RunningAvgSamplesPerSec=29.951912052601106, CurrSamplesPerSec=29.661523841831258, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:44:00,576] [INFO] [timer.py:197:stop] 0/775, RunningAvgSamplesPerSec=29.951526131275514, CurrSamplesPerSec=29.656532999362092, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0029, 'learning_rate': 1e-05, 'epoch': 19.0} [2022-12-14 17:44:07,652] [INFO] [timer.py:197:stop] 0/776, RunningAvgSamplesPerSec=29.951469300464325, CurrSamplesPerSec=29.907603505189464, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:44:13,962] [INFO] [timer.py:197:stop] 0/777, RunningAvgSamplesPerSec=29.951367977302183, CurrSamplesPerSec=29.87314892179299, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:44:19,941] [INFO] [timer.py:197:stop] 0/778, RunningAvgSamplesPerSec=29.95189599453701, CurrSamplesPerSec=30.36678502567713, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:44:26,274] [INFO] [timer.py:197:stop] 0/779, RunningAvgSamplesPerSec=29.95153057130426, CurrSamplesPerSec=29.67062506445379, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:44:32,842] [INFO] [logging.py:68:log_dist] [Rank 0] step=780, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:44:32,843] [INFO] [timer.py:197:stop] 0/780, RunningAvgSamplesPerSec=29.951923174484907, CurrSamplesPerSec=30.2601187789513, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 
17:44:39,030] [INFO] [timer.py:197:stop] 0/781, RunningAvgSamplesPerSec=29.952499880001987, CurrSamplesPerSec=30.408008891921735, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:44:44,852] [INFO] [timer.py:197:stop] 0/782, RunningAvgSamplesPerSec=29.95231266691134, CurrSamplesPerSec=29.807181227154132, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:44:51,717] [INFO] [timer.py:197:stop] 0/783, RunningAvgSamplesPerSec=29.952279684377267, CurrSamplesPerSec=29.92657541381627, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:44:58,303] [INFO] [timer.py:197:stop] 0/784, RunningAvgSamplesPerSec=29.95153565040869, CurrSamplesPerSec=29.38151820433898, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:04,639] [INFO] [timer.py:197:stop] 0/785, RunningAvgSamplesPerSec=29.951720644552235, CurrSamplesPerSec=30.09708908548328, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:10,620] [INFO] [timer.py:197:stop] 0/786, RunningAvgSamplesPerSec=29.951473287414363, CurrSamplesPerSec=29.759038612906444, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:17,571] [INFO] [timer.py:197:stop] 0/787, RunningAvgSamplesPerSec=29.9505776856815, CurrSamplesPerSec=29.264529925897694, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:24,283] [INFO] [timer.py:197:stop] 0/788, RunningAvgSamplesPerSec=29.95060081262462, CurrSamplesPerSec=29.96876648818031, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:30,894] [INFO] [timer.py:197:stop] 0/789, RunningAvgSamplesPerSec=29.950743865270418, CurrSamplesPerSec=30.063607490135702, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:37,924] [INFO] [logging.py:68:log_dist] [Rank 0] step=790, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:45:37,925] [INFO] [timer.py:197:stop] 0/790, RunningAvgSamplesPerSec=29.95096170431202, CurrSamplesPerSec=30.123389259602273, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:44,122] [INFO] 
[timer.py:197:stop] 0/791, RunningAvgSamplesPerSec=29.951263697313006, CurrSamplesPerSec=30.191142499453672, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:50,226] [INFO] [timer.py:197:stop] 0/792, RunningAvgSamplesPerSec=29.950752865715643, CurrSamplesPerSec=29.55306518606724, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:52,655] [INFO] [timer.py:197:stop] 0/793, RunningAvgSamplesPerSec=29.950939786817106, CurrSamplesPerSec=30.099340043642762, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:54,840] [INFO] [timer.py:197:stop] 0/794, RunningAvgSamplesPerSec=29.95092876680191, CurrSamplesPerSec=29.942214474166928, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:56,981] [INFO] [timer.py:197:stop] 0/795, RunningAvgSamplesPerSec=29.951314396237077, CurrSamplesPerSec=30.259883411128012, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:45:59,098] [INFO] [timer.py:197:stop] 0/796, RunningAvgSamplesPerSec=29.952103802071427, CurrSamplesPerSec=30.591482491481614, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:46:01,241] [INFO] [timer.py:197:stop] 0/797, RunningAvgSamplesPerSec=29.952449034868724, CurrSamplesPerSec=30.22909887179499, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:46:03,370] [INFO] [timer.py:197:stop] 0/798, RunningAvgSamplesPerSec=29.953054183232208, CurrSamplesPerSec=30.44201041333066, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:46:05,543] [INFO] [timer.py:197:stop] 0/799, RunningAvgSamplesPerSec=29.9528890271097, CurrSamplesPerSec=29.8219999520067, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:46:07,465] [INFO] [logging.py:68:log_dist] [Rank 0] step=800, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:46:07,465] [INFO] [timer.py:197:stop] 0/800, RunningAvgSamplesPerSec=29.957094129557106, CurrSamplesPerSec=33.731335991607985, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0036, 'learning_rate': 1e-05, 'epoch': 
19.01} [2022-12-14 17:46:44,688] [INFO] [timer.py:197:stop] 0/801, RunningAvgSamplesPerSec=29.9559954778868, CurrSamplesPerSec=29.10423131988796, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:46:50,962] [INFO] [timer.py:197:stop] 0/802, RunningAvgSamplesPerSec=29.95631554067597, CurrSamplesPerSec=30.214250395552924, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:46:57,157] [INFO] [timer.py:197:stop] 0/803, RunningAvgSamplesPerSec=29.95649393934405, CurrSamplesPerSec=30.099896929318987, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:47:02,810] [INFO] [timer.py:197:stop] 0/804, RunningAvgSamplesPerSec=29.95693756854211, CurrSamplesPerSec=30.316555644876942, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:47:08,593] [INFO] [timer.py:197:stop] 0/805, RunningAvgSamplesPerSec=29.957274017680557, CurrSamplesPerSec=30.229561845241708, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:47:14,378] [INFO] [timer.py:197:stop] 0/806, RunningAvgSamplesPerSec=29.95815186780218, CurrSamplesPerSec=30.68007344415484, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:47:21,522] [INFO] [timer.py:197:stop] 0/807, RunningAvgSamplesPerSec=29.958575393637748, CurrSamplesPerSec=30.303009942228687, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:47:27,613] [INFO] [timer.py:197:stop] 0/808, RunningAvgSamplesPerSec=29.95766418662826, CurrSamplesPerSec=29.241695023043313, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:47:33,562] [INFO] [timer.py:197:stop] 0/809, RunningAvgSamplesPerSec=29.95581908801141, CurrSamplesPerSec=28.539090040099357, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:47:39,889] [INFO] [logging.py:68:log_dist] [Rank 0] step=810, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:47:39,890] [INFO] [timer.py:197:stop] 0/810, RunningAvgSamplesPerSec=29.955662202209513, CurrSamplesPerSec=29.829588866098618, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 
17:47:46,112] [INFO] [timer.py:197:stop] 0/811, RunningAvgSamplesPerSec=29.95491253383742, CurrSamplesPerSec=29.361201060144996, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:47:52,382] [INFO] [timer.py:197:stop] 0/812, RunningAvgSamplesPerSec=29.952767668722668, CurrSamplesPerSec=28.312700102108437, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:47:58,929] [INFO] [timer.py:197:stop] 0/813, RunningAvgSamplesPerSec=29.95219977266448, CurrSamplesPerSec=29.499170033329438, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:05,292] [INFO] [timer.py:197:stop] 0/814, RunningAvgSamplesPerSec=29.952355309110107, CurrSamplesPerSec=30.079029494509342, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:10,999] [INFO] [timer.py:197:stop] 0/815, RunningAvgSamplesPerSec=29.952290595114402, CurrSamplesPerSec=29.89983497114668, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:17,034] [INFO] [timer.py:197:stop] 0/816, RunningAvgSamplesPerSec=29.952799059922476, CurrSamplesPerSec=30.371973132274196, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:22,887] [INFO] [timer.py:197:stop] 0/817, RunningAvgSamplesPerSec=29.952219132315776, CurrSamplesPerSec=29.48749138642878, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:29,128] [INFO] [timer.py:197:stop] 0/818, RunningAvgSamplesPerSec=29.95220312414883, CurrSamplesPerSec=29.93916215547325, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:34,888] [INFO] [timer.py:197:stop] 0/819, RunningAvgSamplesPerSec=29.952414888975042, CurrSamplesPerSec=30.12621891820308, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:40,895] [INFO] [logging.py:68:log_dist] [Rank 0] step=820, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:48:40,895] [INFO] [timer.py:197:stop] 0/820, RunningAvgSamplesPerSec=29.95192287163559, CurrSamplesPerSec=29.555274550841734, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:46,760] 
[INFO] [timer.py:197:stop] 0/821, RunningAvgSamplesPerSec=29.95225777600475, CurrSamplesPerSec=30.2287414387611, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:52,507] [INFO] [timer.py:197:stop] 0/822, RunningAvgSamplesPerSec=29.95225783267183, CurrSamplesPerSec=29.952304243080608, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:48:58,838] [INFO] [timer.py:197:stop] 0/823, RunningAvgSamplesPerSec=29.952800792626054, CurrSamplesPerSec=30.404754113642355, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:49:04,840] [INFO] [timer.py:197:stop] 0/824, RunningAvgSamplesPerSec=29.951745011363496, CurrSamplesPerSec=29.109356803971885, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:49:10,925] [INFO] [timer.py:197:stop] 0/825, RunningAvgSamplesPerSec=29.95175123781871, CurrSamplesPerSec=29.95687025980358, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0039, 'learning_rate': 1e-05, 'epoch': 20.0} [2022-12-14 17:49:16,785] [INFO] [timer.py:197:stop] 0/826, RunningAvgSamplesPerSec=29.951944877397377, CurrSamplesPerSec=30.11216376340664, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:49:22,736] [INFO] [timer.py:197:stop] 0/827, RunningAvgSamplesPerSec=29.95234393118289, CurrSamplesPerSec=30.28481862590745, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:49:28,879] [INFO] [timer.py:197:stop] 0/828, RunningAvgSamplesPerSec=29.95245133301024, CurrSamplesPerSec=30.041321057113386, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:49:34,855] [INFO] [timer.py:197:stop] 0/829, RunningAvgSamplesPerSec=29.952159587444935, CurrSamplesPerSec=29.713103419267465, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:49:41,851] [INFO] [logging.py:68:log_dist] [Rank 0] step=830, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:49:41,851] [INFO] [timer.py:197:stop] 0/830, RunningAvgSamplesPerSec=29.951570381573674, CurrSamplesPerSec=29.472106809320344, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:49:47,889] [INFO] [timer.py:197:stop] 0/831, RunningAvgSamplesPerSec=29.949127887298303, CurrSamplesPerSec=28.05481504505161, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:49:54,217] [INFO] [timer.py:197:stop] 0/832, RunningAvgSamplesPerSec=29.949466147514606, CurrSamplesPerSec=30.232537468922544, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:49:56,573] [INFO] [timer.py:197:stop] 0/833, RunningAvgSamplesPerSec=29.94960797904154, CurrSamplesPerSec=30.067793246034746, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:49:58,678] [INFO] [timer.py:197:stop] 0/834, RunningAvgSamplesPerSec=29.950590321323443, CurrSamplesPerSec=30.789818083050186, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:50:00,816] [INFO] [timer.py:197:stop] 0/835, RunningAvgSamplesPerSec=29.951012792333742, CurrSamplesPerSec=30.306687782203632, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:50:02,989] [INFO] [timer.py:197:stop] 0/836, RunningAvgSamplesPerSec=29.950874008984027, CurrSamplesPerSec=29.835712520863623, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:50:05,178] [INFO] [timer.py:197:stop] 0/837, RunningAvgSamplesPerSec=29.950456250927598, CurrSamplesPerSec=29.606057190376877, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:50:07,372] [INFO] [timer.py:197:stop] 0/838, RunningAvgSamplesPerSec=29.949965033496984, CurrSamplesPerSec=29.54534639159077, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:50:09,539] [INFO] [timer.py:197:stop] 0/839, RunningAvgSamplesPerSec=29.95027161216249, CurrSamplesPerSec=30.20878626460527, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:50:11,477] [INFO] [logging.py:68:log_dist] [Rank 0] step=840, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:50:11,478] [INFO] [timer.py:197:stop] 0/840, RunningAvgSamplesPerSec=29.954046707856005, CurrSamplesPerSec=33.486918187070096, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB [2022-12-14 17:50:49,669] [INFO] [timer.py:197:stop] 0/841, RunningAvgSamplesPerSec=29.954186845075274, CurrSamplesPerSec=30.072084602975444, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:50:55,696] [INFO] [timer.py:197:stop] 0/842, RunningAvgSamplesPerSec=29.95285551021903, CurrSamplesPerSec=28.876068536966276, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:51:01,745] [INFO] [timer.py:197:stop] 0/843, RunningAvgSamplesPerSec=29.950942393634865, CurrSamplesPerSec=28.42585068000712, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:51:07,788] [INFO] [timer.py:197:stop] 0/844, RunningAvgSamplesPerSec=29.951350359366927, CurrSamplesPerSec=30.298430141375878, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:51:13,813] [INFO] [timer.py:197:stop] 0/845, RunningAvgSamplesPerSec=29.951228804527606, CurrSamplesPerSec=29.84922859832237, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:51:20,297] [INFO] [timer.py:197:stop] 0/846, RunningAvgSamplesPerSec=29.950233577544317, CurrSamplesPerSec=29.134144908031725, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:51:26,631] [INFO] [timer.py:197:stop] 0/847, RunningAvgSamplesPerSec=29.950584014063285, CurrSamplesPerSec=30.249305873506312, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:51:32,646] [INFO] [timer.py:197:stop] 0/848, RunningAvgSamplesPerSec=29.95042453540347, CurrSamplesPerSec=29.8162694021739, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:51:39,044] [INFO] [timer.py:197:stop] 0/849, RunningAvgSamplesPerSec=29.950200121006198, CurrSamplesPerSec=29.76154285456433, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:51:44,803] [INFO] [logging.py:68:log_dist] [Rank 0] step=850, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:51:44,803] [INFO] [timer.py:197:stop] 0/850, RunningAvgSamplesPerSec=29.950414852915838, CurrSamplesPerSec=30.133405326644393, MemAllocated=0.53GB, 
MaxMemAllocated=17.47GB {'loss': 0.0053, 'learning_rate': 1e-05, 'epoch': 21.0} [2022-12-14 17:51:50,648] [INFO] [timer.py:197:stop] 0/851, RunningAvgSamplesPerSec=29.949528569857904, CurrSamplesPerSec=29.216380213806485, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:51:57,058] [INFO] [timer.py:197:stop] 0/852, RunningAvgSamplesPerSec=29.949594961834055, CurrSamplesPerSec=30.006068160445086, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:03,273] [INFO] [timer.py:197:stop] 0/853, RunningAvgSamplesPerSec=29.948827953975787, CurrSamplesPerSec=29.31077735257529, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:09,485] [INFO] [timer.py:197:stop] 0/854, RunningAvgSamplesPerSec=29.94901054509995, CurrSamplesPerSec=30.105206940913103, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:15,331] [INFO] [timer.py:197:stop] 0/855, RunningAvgSamplesPerSec=29.949170009878827, CurrSamplesPerSec=30.085653885389696, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:21,613] [INFO] [timer.py:197:stop] 0/856, RunningAvgSamplesPerSec=29.950471130268827, CurrSamplesPerSec=31.1030887173733, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:27,406] [INFO] [timer.py:197:stop] 0/857, RunningAvgSamplesPerSec=29.950295395109798, CurrSamplesPerSec=29.800966716028636, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:33,345] [INFO] [timer.py:197:stop] 0/858, RunningAvgSamplesPerSec=29.949980279485672, CurrSamplesPerSec=29.682961278251067, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:39,272] [INFO] [timer.py:197:stop] 0/859, RunningAvgSamplesPerSec=29.949551530391194, CurrSamplesPerSec=29.586990409636414, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:45,232] [INFO] [logging.py:68:log_dist] [Rank 0] step=860, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:52:45,233] [INFO] [timer.py:197:stop] 0/860, RunningAvgSamplesPerSec=29.949359192083797, 
CurrSamplesPerSec=29.78542854949032, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:51,066] [INFO] [timer.py:197:stop] 0/861, RunningAvgSamplesPerSec=29.94974395909321, CurrSamplesPerSec=30.283557906721363, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:52:57,316] [INFO] [timer.py:197:stop] 0/862, RunningAvgSamplesPerSec=29.94979723780498, CurrSamplesPerSec=29.995633775762638, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:04,554] [INFO] [timer.py:197:stop] 0/863, RunningAvgSamplesPerSec=29.950096432458448, CurrSamplesPerSec=30.20963618627385, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:11,041] [INFO] [timer.py:197:stop] 0/864, RunningAvgSamplesPerSec=29.94945794792266, CurrSamplesPerSec=29.409642809875777, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:16,902] [INFO] [timer.py:197:stop] 0/865, RunningAvgSamplesPerSec=29.949901970650913, CurrSamplesPerSec=30.337610062438173, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:22,806] [INFO] [timer.py:197:stop] 0/866, RunningAvgSamplesPerSec=29.950423063374863, CurrSamplesPerSec=30.406989330867265, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:29,127] [INFO] [timer.py:197:stop] 0/867, RunningAvgSamplesPerSec=29.950203196940272, CurrSamplesPerSec=29.761437265378305, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:35,173] [INFO] [timer.py:197:stop] 0/868, RunningAvgSamplesPerSec=29.950589897225772, CurrSamplesPerSec=30.288867986864222, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:41,067] [INFO] [timer.py:197:stop] 0/869, RunningAvgSamplesPerSec=29.95005472173191, CurrSamplesPerSec=29.493663325738215, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:47,185] [INFO] [logging.py:68:log_dist] [Rank 0] step=870, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:53:47,186] [INFO] [timer.py:197:stop] 0/870, RunningAvgSamplesPerSec=29.949660479420707, 
CurrSamplesPerSec=29.611713746823284, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:53,366] [INFO] [timer.py:197:stop] 0/871, RunningAvgSamplesPerSec=29.949622342764865, CurrSamplesPerSec=29.916556314713592, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:53:59,519] [INFO] [timer.py:197:stop] 0/872, RunningAvgSamplesPerSec=29.949249417672515, CurrSamplesPerSec=29.628650608675766, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:54:02,183] [INFO] [timer.py:197:stop] 0/873, RunningAvgSamplesPerSec=29.949354886432694, CurrSamplesPerSec=30.041395021374807, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:54:04,348] [INFO] [timer.py:197:stop] 0/874, RunningAvgSamplesPerSec=29.94976765544327, CurrSamplesPerSec=30.313662733676008, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:54:06,487] [INFO] [timer.py:197:stop] 0/875, RunningAvgSamplesPerSec=29.950135298731468, CurrSamplesPerSec=30.274192922639127, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0057, 'learning_rate': 1e-05, 'epoch': 21.01} [2022-12-14 17:54:08,616] [INFO] [timer.py:197:stop] 0/876, RunningAvgSamplesPerSec=29.950693765863818, CurrSamplesPerSec=30.446312569739955, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:54:10,737] [INFO] [timer.py:197:stop] 0/877, RunningAvgSamplesPerSec=29.951347173881484, CurrSamplesPerSec=30.533539062468897, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:54:12,986] [INFO] [timer.py:197:stop] 0/878, RunningAvgSamplesPerSec=29.949957572129005, CurrSamplesPerSec=28.78154518899172, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:54:15,119] [INFO] [timer.py:197:stop] 0/879, RunningAvgSamplesPerSec=29.950440072072663, CurrSamplesPerSec=30.379167261287105, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:54:17,051] [INFO] [logging.py:68:log_dist] [Rank 0] step=880, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:54:17,051] [INFO] [timer.py:197:stop] 0/880, 
RunningAvgSamplesPerSec=29.954165091141252, CurrSamplesPerSec=33.6214184736314, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:55:11,289] [INFO] [timer.py:197:stop] 0/881, RunningAvgSamplesPerSec=29.953546479296797, CurrSamplesPerSec=29.420089359937826, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:55:19,115] [INFO] [timer.py:197:stop] 0/882, RunningAvgSamplesPerSec=29.95361357527779, CurrSamplesPerSec=30.012707428170206, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:55:26,279] [INFO] [timer.py:197:stop] 0/883, RunningAvgSamplesPerSec=29.95379130120383, CurrSamplesPerSec=30.111011950832655, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:55:33,202] [INFO] [timer.py:197:stop] 0/884, RunningAvgSamplesPerSec=29.953826965604925, CurrSamplesPerSec=29.985280333717558, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:55:39,777] [INFO] [timer.py:197:stop] 0/885, RunningAvgSamplesPerSec=29.954319772565878, CurrSamplesPerSec=30.39538286726253, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:55:46,156] [INFO] [timer.py:197:stop] 0/886, RunningAvgSamplesPerSec=29.95399446259059, CurrSamplesPerSec=29.66947726832376, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:55:53,223] [INFO] [timer.py:197:stop] 0/887, RunningAvgSamplesPerSec=29.953944133876423, CurrSamplesPerSec=29.90951960892786, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:55:59,873] [INFO] [timer.py:197:stop] 0/888, RunningAvgSamplesPerSec=29.953101911801618, CurrSamplesPerSec=29.22585301490071, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:56:06,625] [INFO] [timer.py:197:stop] 0/889, RunningAvgSamplesPerSec=29.953683075308145, CurrSamplesPerSec=30.477610543951222, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:56:12,957] [INFO] [logging.py:68:log_dist] [Rank 0] step=890, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:56:12,958] [INFO] [timer.py:197:stop] 0/890, 
RunningAvgSamplesPerSec=29.953470104868682, CurrSamplesPerSec=29.7657505334174, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:56:19,935] [INFO] [timer.py:197:stop] 0/891, RunningAvgSamplesPerSec=29.952973067015495, CurrSamplesPerSec=29.51801988838296, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:56:27,052] [INFO] [timer.py:197:stop] 0/892, RunningAvgSamplesPerSec=29.95435270914646, CurrSamplesPerSec=31.233280039764814, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:56:33,358] [INFO] [timer.py:197:stop] 0/893, RunningAvgSamplesPerSec=29.95329224207347, CurrSamplesPerSec=29.038338741784827, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:56:39,964] [INFO] [timer.py:197:stop] 0/894, RunningAvgSamplesPerSec=29.952973054472352, CurrSamplesPerSec=29.671254750338427, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:56:46,201] [INFO] [timer.py:197:stop] 0/895, RunningAvgSamplesPerSec=29.952907414192456, CurrSamplesPerSec=29.894470643197483, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:56:53,290] [INFO] [timer.py:197:stop] 0/896, RunningAvgSamplesPerSec=29.95252215839143, CurrSamplesPerSec=29.612399734891508, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:56:59,761] [INFO] [timer.py:197:stop] 0/897, RunningAvgSamplesPerSec=29.952765349182773, CurrSamplesPerSec=30.171769338291462, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:57:06,171] [INFO] [timer.py:197:stop] 0/898, RunningAvgSamplesPerSec=29.953207270086363, CurrSamplesPerSec=30.35402501818287, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:57:12,999] [INFO] [timer.py:197:stop] 0/899, RunningAvgSamplesPerSec=29.953641884103057, CurrSamplesPerSec=30.348191129459316, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:57:19,676] [INFO] [logging.py:68:log_dist] [Rank 0] step=900, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:57:19,677] [INFO] [timer.py:197:stop] 0/900, 
RunningAvgSamplesPerSec=29.952889294766806, CurrSamplesPerSec=29.292712195976844, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0065, 'learning_rate': 1e-05, 'epoch': 22.0} [2022-12-14 17:57:26,632] [INFO] [timer.py:197:stop] 0/901, RunningAvgSamplesPerSec=29.953470398237442, CurrSamplesPerSec=30.484564003961562, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:57:33,096] [INFO] [timer.py:197:stop] 0/902, RunningAvgSamplesPerSec=29.95314963482249, CurrSamplesPerSec=29.667536055276006, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:57:40,008] [INFO] [timer.py:197:stop] 0/903, RunningAvgSamplesPerSec=29.953133546752454, CurrSamplesPerSec=29.938661287341844, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:57:46,454] [INFO] [timer.py:197:stop] 0/904, RunningAvgSamplesPerSec=29.9537271561121, CurrSamplesPerSec=30.498303727527514, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:57:52,974] [INFO] [timer.py:197:stop] 0/905, RunningAvgSamplesPerSec=29.953721120006442, CurrSamplesPerSec=29.948277543253592, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:57:59,718] [INFO] [timer.py:197:stop] 0/906, RunningAvgSamplesPerSec=29.95353723256386, CurrSamplesPerSec=29.788403321656283, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:08,224] [INFO] [timer.py:197:stop] 0/907, RunningAvgSamplesPerSec=29.953934793520702, CurrSamplesPerSec=30.317699267366336, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:15,207] [INFO] [timer.py:197:stop] 0/908, RunningAvgSamplesPerSec=29.953055657284487, CurrSamplesPerSec=29.17804605087527, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:21,510] [INFO] [timer.py:197:stop] 0/909, RunningAvgSamplesPerSec=29.953695509358724, CurrSamplesPerSec=30.54485506651049, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:28,118] [INFO] [logging.py:68:log_dist] [Rank 0] step=910, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 
17:58:28,119] [INFO] [timer.py:197:stop] 0/910, RunningAvgSamplesPerSec=29.954260947460142, CurrSamplesPerSec=30.47605691912094, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:34,402] [INFO] [timer.py:197:stop] 0/911, RunningAvgSamplesPerSec=29.954027693899967, CurrSamplesPerSec=29.743722094705287, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:40,855] [INFO] [timer.py:197:stop] 0/912, RunningAvgSamplesPerSec=29.954401647753116, CurrSamplesPerSec=30.298231794426336, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:43,451] [INFO] [timer.py:197:stop] 0/913, RunningAvgSamplesPerSec=29.95406443827387, CurrSamplesPerSec=29.650318914217074, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:45,604] [INFO] [timer.py:197:stop] 0/914, RunningAvgSamplesPerSec=29.95421671477354, CurrSamplesPerSec=30.09358676469632, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:47,816] [INFO] [timer.py:197:stop] 0/915, RunningAvgSamplesPerSec=29.953870262213606, CurrSamplesPerSec=29.64120723158761, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:49,989] [INFO] [timer.py:197:stop] 0/916, RunningAvgSamplesPerSec=29.954001900258636, CurrSamplesPerSec=30.074672135261682, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:52,105] [INFO] [timer.py:197:stop] 0/917, RunningAvgSamplesPerSec=29.954699780727903, CurrSamplesPerSec=30.606456379588508, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:54,244] [INFO] [timer.py:197:stop] 0/918, RunningAvgSamplesPerSec=29.955185626090998, CurrSamplesPerSec=30.406438245250133, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:56,407] [INFO] [timer.py:197:stop] 0/919, RunningAvgSamplesPerSec=29.955163305629718, CurrSamplesPerSec=29.934731723658185, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:58:58,289] [INFO] [logging.py:68:log_dist] [Rank 0] step=920, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 17:58:58,290] 
[INFO] [timer.py:197:stop] 0/920, RunningAvgSamplesPerSec=29.959410156205593, CurrSamplesPerSec=34.43635567800256, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:59:40,810] [INFO] [timer.py:197:stop] 0/921, RunningAvgSamplesPerSec=29.95969549844748, CurrSamplesPerSec=30.223952651491906, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:59:47,291] [INFO] [timer.py:197:stop] 0/922, RunningAvgSamplesPerSec=29.95953106272203, CurrSamplesPerSec=29.809173860807547, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 17:59:54,292] [INFO] [timer.py:197:stop] 0/923, RunningAvgSamplesPerSec=29.959756535717027, CurrSamplesPerSec=30.168639525745096, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:00:00,707] [INFO] [timer.py:197:stop] 0/924, RunningAvgSamplesPerSec=29.959917196263245, CurrSamplesPerSec=30.10862078557418, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:00:07,341] [INFO] [timer.py:197:stop] 0/925, RunningAvgSamplesPerSec=29.959160002966673, CurrSamplesPerSec=29.276942599577808, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.008, 'learning_rate': 1e-05, 'epoch': 23.0} [2022-12-14 18:00:14,728] [INFO] [timer.py:197:stop] 0/926, RunningAvgSamplesPerSec=29.95880628755727, CurrSamplesPerSec=29.635850226241597, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:00:21,440] [INFO] [timer.py:197:stop] 0/927, RunningAvgSamplesPerSec=29.95896295108542, CurrSamplesPerSec=30.104423655497463, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:00:27,905] [INFO] [timer.py:197:stop] 0/928, RunningAvgSamplesPerSec=29.959170190690852, CurrSamplesPerSec=30.15210265726923, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:00:35,152] [INFO] [timer.py:197:stop] 0/929, RunningAvgSamplesPerSec=29.95755155933347, CurrSamplesPerSec=28.53019055203622, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:00:41,642] [INFO] [logging.py:68:log_dist] [Rank 0] step=930, skipped=2, lr=[1e-05], 
mom=[[0.9, 0.999]] [2022-12-14 18:00:41,643] [INFO] [timer.py:197:stop] 0/930, RunningAvgSamplesPerSec=29.958005442460372, CurrSamplesPerSec=30.384755115827144, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:00:48,797] [INFO] [timer.py:197:stop] 0/931, RunningAvgSamplesPerSec=29.957743687341846, CurrSamplesPerSec=29.71679078024391, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:00:54,871] [INFO] [timer.py:197:stop] 0/932, RunningAvgSamplesPerSec=29.957646938889162, CurrSamplesPerSec=29.86803676532276, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:01,253] [INFO] [timer.py:197:stop] 0/933, RunningAvgSamplesPerSec=29.956905982119917, CurrSamplesPerSec=29.283326984359707, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:08,148] [INFO] [timer.py:197:stop] 0/934, RunningAvgSamplesPerSec=29.956792503584186, CurrSamplesPerSec=29.851515665082655, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:14,275] [INFO] [timer.py:197:stop] 0/935, RunningAvgSamplesPerSec=29.957045139023403, CurrSamplesPerSec=30.194368684161713, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:21,235] [INFO] [timer.py:197:stop] 0/936, RunningAvgSamplesPerSec=29.957034958082044, CurrSamplesPerSec=29.947539153968812, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:27,367] [INFO] [timer.py:197:stop] 0/937, RunningAvgSamplesPerSec=29.95740706492968, CurrSamplesPerSec=30.30903864614626, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:33,732] [INFO] [timer.py:197:stop] 0/938, RunningAvgSamplesPerSec=29.956604779936853, CurrSamplesPerSec=29.22481255150955, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:40,366] [INFO] [timer.py:197:stop] 0/939, RunningAvgSamplesPerSec=29.956501799544046, CurrSamplesPerSec=29.86042163486685, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:46,898] [INFO] [logging.py:68:log_dist] [Rank 0] step=940, skipped=2, lr=[1e-05], mom=[[0.9, 
0.999]] [2022-12-14 18:01:46,899] [INFO] [timer.py:197:stop] 0/940, RunningAvgSamplesPerSec=29.956460860532417, CurrSamplesPerSec=29.918150116669832, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:53,564] [INFO] [timer.py:197:stop] 0/941, RunningAvgSamplesPerSec=29.956592551338247, CurrSamplesPerSec=30.08063054300245, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:01:59,924] [INFO] [timer.py:197:stop] 0/942, RunningAvgSamplesPerSec=29.95622625561409, CurrSamplesPerSec=29.61618304002401, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:02:07,011] [INFO] [timer.py:197:stop] 0/943, RunningAvgSamplesPerSec=29.9559598304527, CurrSamplesPerSec=29.70759875116189, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:02:13,678] [INFO] [timer.py:197:stop] 0/944, RunningAvgSamplesPerSec=29.95595621192119, CurrSamplesPerSec=29.952551561182112, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:02:20,082] [INFO] [timer.py:197:stop] 0/945, RunningAvgSamplesPerSec=29.956747738033258, CurrSamplesPerSec=30.721418034203392, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:02:26,288] [INFO] [timer.py:197:stop] 0/946, RunningAvgSamplesPerSec=29.95653459793609, CurrSamplesPerSec=29.756884444470558, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:02:33,248] [INFO] [timer.py:197:stop] 0/947, RunningAvgSamplesPerSec=29.956903376264254, CurrSamplesPerSec=30.309127623414344, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:02:40,140] [INFO] [timer.py:197:stop] 0/948, RunningAvgSamplesPerSec=29.95686131181474, CurrSamplesPerSec=29.917163139790546, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:02:46,424] [INFO] [timer.py:197:stop] 0/949, RunningAvgSamplesPerSec=29.956682417435267, CurrSamplesPerSec=29.78840001602413, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:02:53,332] [INFO] [logging.py:68:log_dist] [Rank 0] step=950, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 
18:02:53,333] [INFO] [timer.py:197:stop] 0/950, RunningAvgSamplesPerSec=29.954972354423592, CurrSamplesPerSec=28.41868533723887, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0066, 'learning_rate': 1e-05, 'epoch': 23.01} [2022-12-14 18:02:59,815] [INFO] [timer.py:197:stop] 0/951, RunningAvgSamplesPerSec=29.95480874504778, CurrSamplesPerSec=29.800506852545166, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:03:06,413] [INFO] [timer.py:197:stop] 0/952, RunningAvgSamplesPerSec=29.954411995632682, CurrSamplesPerSec=29.582575563633974, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:03:09,068] [INFO] [timer.py:197:stop] 0/953, RunningAvgSamplesPerSec=29.95431972308028, CurrSamplesPerSec=29.866916845086056, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:03:11,228] [INFO] [timer.py:197:stop] 0/954, RunningAvgSamplesPerSec=29.954343384286354, CurrSamplesPerSec=29.976862125206104, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:03:13,368] [INFO] [timer.py:197:stop] 0/955, RunningAvgSamplesPerSec=29.954671898814677, CurrSamplesPerSec=30.27072094904895, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:03:15,508] [INFO] [timer.py:197:stop] 0/956, RunningAvgSamplesPerSec=29.955008666737065, CurrSamplesPerSec=30.279427992927463, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:03:17,636] [INFO] [timer.py:197:stop] 0/957, RunningAvgSamplesPerSec=29.955512913341597, CurrSamplesPerSec=30.44442375263249, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:03:19,775] [INFO] [timer.py:197:stop] 0/958, RunningAvgSamplesPerSec=29.955950197003858, CurrSamplesPerSec=30.37946637435008, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:03:21,908] [INFO] [timer.py:197:stop] 0/959, RunningAvgSamplesPerSec=29.956491352308486, CurrSamplesPerSec=30.48293697690314, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:03:23,776] [INFO] [logging.py:68:log_dist] [Rank 0] step=960, skipped=2, 
lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 18:03:23,777] [INFO] [timer.py:197:stop] 0/960, RunningAvgSamplesPerSec=29.960755862270865, CurrSamplesPerSec=34.68625311056088, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:04:04,169] [INFO] [timer.py:197:stop] 0/961, RunningAvgSamplesPerSec=29.95987344970405, CurrSamplesPerSec=29.137743731789282, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:04:10,707] [INFO] [timer.py:197:stop] 0/962, RunningAvgSamplesPerSec=29.959653623653036, CurrSamplesPerSec=29.75031500247425, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:04:18,419] [INFO] [timer.py:197:stop] 0/963, RunningAvgSamplesPerSec=29.959806095616543, CurrSamplesPerSec=30.106898571322503, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:04:25,072] [INFO] [timer.py:197:stop] 0/964, RunningAvgSamplesPerSec=29.959178638153322, CurrSamplesPerSec=29.368100980539637, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:04:31,342] [INFO] [timer.py:197:stop] 0/965, RunningAvgSamplesPerSec=29.958862764659536, CurrSamplesPerSec=29.65804678723361, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:04:37,789] [INFO] [timer.py:197:stop] 0/966, RunningAvgSamplesPerSec=29.959215911279077, CurrSamplesPerSec=30.30320493086781, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:04:44,209] [INFO] [timer.py:197:stop] 0/967, RunningAvgSamplesPerSec=29.9580056082966, CurrSamplesPerSec=28.83505294948248, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:04:50,659] [INFO] [timer.py:197:stop] 0/968, RunningAvgSamplesPerSec=29.958086020662837, CurrSamplesPerSec=30.035885680884235, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:04:57,338] [INFO] [timer.py:197:stop] 0/969, RunningAvgSamplesPerSec=29.958378542797558, CurrSamplesPerSec=30.24364846154635, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:05:03,574] [INFO] [logging.py:68:log_dist] [Rank 0] step=970, skipped=2, lr=[1e-05], 
mom=[[0.9, 0.999]] [2022-12-14 18:05:03,574] [INFO] [timer.py:197:stop] 0/970, RunningAvgSamplesPerSec=29.958545909914733, CurrSamplesPerSec=30.121269899405554, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:05:10,253] [INFO] [timer.py:197:stop] 0/971, RunningAvgSamplesPerSec=29.95881101092534, CurrSamplesPerSec=30.217648197762117, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:05:16,394] [INFO] [timer.py:197:stop] 0/972, RunningAvgSamplesPerSec=29.9586092215385, CurrSamplesPerSec=29.76434454131486, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:05:23,295] [INFO] [timer.py:197:stop] 0/973, RunningAvgSamplesPerSec=29.95902700459004, CurrSamplesPerSec=30.36983926588914, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:05:29,602] [INFO] [timer.py:197:stop] 0/974, RunningAvgSamplesPerSec=29.959059005988113, CurrSamplesPerSec=29.990164659337623, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:05:35,723] [INFO] [timer.py:197:stop] 0/975, RunningAvgSamplesPerSec=29.95842694641888, CurrSamplesPerSec=29.356423127662467, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0054, 'learning_rate': 1e-05, 'epoch': 24.0} [2022-12-14 18:05:42,583] [INFO] [timer.py:197:stop] 0/976, RunningAvgSamplesPerSec=29.958490123773252, CurrSamplesPerSec=30.020088212062344, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:05:48,761] [INFO] [timer.py:197:stop] 0/977, RunningAvgSamplesPerSec=29.95878100843842, CurrSamplesPerSec=30.244810443942836, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:05:55,158] [INFO] [timer.py:197:stop] 0/978, RunningAvgSamplesPerSec=29.958751867098993, CurrSamplesPerSec=29.930366009839144, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:01,449] [INFO] [timer.py:197:stop] 0/979, RunningAvgSamplesPerSec=29.958584479165886, CurrSamplesPerSec=29.796100822725517, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:07,466] [INFO] [logging.py:68:log_dist] 
[Rank 0] step=980, skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 18:06:07,467] [INFO] [timer.py:197:stop] 0/980, RunningAvgSamplesPerSec=29.958040742150846, CurrSamplesPerSec=29.4360749130303, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:14,213] [INFO] [timer.py:197:stop] 0/981, RunningAvgSamplesPerSec=29.958042418619012, CurrSamplesPerSec=29.959682094311425, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:20,484] [INFO] [timer.py:197:stop] 0/982, RunningAvgSamplesPerSec=29.95754112803186, CurrSamplesPerSec=29.47469568157215, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:26,974] [INFO] [timer.py:197:stop] 0/983, RunningAvgSamplesPerSec=29.95819711097356, CurrSamplesPerSec=30.61517260445392, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:33,902] [INFO] [timer.py:197:stop] 0/984, RunningAvgSamplesPerSec=29.95826873303776, CurrSamplesPerSec=30.028695318133483, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:40,199] [INFO] [timer.py:197:stop] 0/985, RunningAvgSamplesPerSec=29.95871501821471, CurrSamplesPerSec=30.40347994581103, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:46,017] [INFO] [timer.py:197:stop] 0/986, RunningAvgSamplesPerSec=29.958173504465343, CurrSamplesPerSec=29.435167901395126, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:52,699] [INFO] [timer.py:197:stop] 0/987, RunningAvgSamplesPerSec=29.95803624905819, CurrSamplesPerSec=29.82358369465395, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:06:59,232] [INFO] [timer.py:197:stop] 0/988, RunningAvgSamplesPerSec=29.95784531175417, CurrSamplesPerSec=29.770946595492006, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:05,096] [INFO] [timer.py:197:stop] 0/989, RunningAvgSamplesPerSec=29.957517834637336, CurrSamplesPerSec=29.638071986711886, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:11,928] [INFO] [logging.py:68:log_dist] [Rank 0] step=990, 
skipped=2, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 18:07:11,929] [INFO] [timer.py:197:stop] 0/990, RunningAvgSamplesPerSec=29.95667237703617, CurrSamplesPerSec=29.144842787924468, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:18,510] [INFO] [timer.py:197:stop] 0/991, RunningAvgSamplesPerSec=29.956306177584285, CurrSamplesPerSec=29.598823083944364, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:24,958] [INFO] [timer.py:197:stop] 0/992, RunningAvgSamplesPerSec=29.955894695923767, CurrSamplesPerSec=29.554399225433507, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:27,471] [INFO] [timer.py:197:stop] 0/993, RunningAvgSamplesPerSec=29.955745975347195, CurrSamplesPerSec=29.809233445317886, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:29,577] [INFO] [timer.py:197:stop] 0/994, RunningAvgSamplesPerSec=29.956542970479035, CurrSamplesPerSec=30.767775356135235, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:31,703] [INFO] [timer.py:197:stop] 0/995, RunningAvgSamplesPerSec=29.957044204467838, CurrSamplesPerSec=30.462669094279722, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:33,846] [INFO] [timer.py:197:stop] 0/996, RunningAvgSamplesPerSec=29.957340558516023, CurrSamplesPerSec=30.254542569247434, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:36,017] [INFO] [timer.py:197:stop] 0/997, RunningAvgSamplesPerSec=29.957220845470836, CurrSamplesPerSec=29.838697346073946, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:38,189] [INFO] [timer.py:197:stop] 0/998, RunningAvgSamplesPerSec=29.95708638317248, CurrSamplesPerSec=29.82389184823155, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:40,392] [INFO] [timer.py:197:stop] 0/999, RunningAvgSamplesPerSec=29.956824777237212, CurrSamplesPerSec=29.698514263637744, MemAllocated=0.53GB, MaxMemAllocated=17.47GB [2022-12-14 18:07:42,328] [INFO] [logging.py:68:log_dist] [Rank 0] step=1000, skipped=2, 
lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-14 18:07:42,328] [INFO] [timer.py:197:stop] 0/1000, RunningAvgSamplesPerSec=29.95998316265552, CurrSamplesPerSec=33.479141595444545, MemAllocated=0.53GB, MaxMemAllocated=17.47GB {'loss': 0.0048, 'learning_rate': 1e-05, 'epoch': 24.01} {'eval_loss': 0.5263671875, 'eval_wer': 27.20744979243801, 'eval_runtime': 236.2022, 'eval_samples_per_second': 2.82, 'eval_steps_per_second': 0.089, 'epoch': 24.01} [2022-12-14 18:11:39,119] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is begin to save! [2022-12-14 18:11:39,124] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-1000/global_step1000/mp_rank_00_model_states.pt [2022-12-14 18:11:39,125] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-1000/global_step1000/mp_rank_00_model_states.pt... [2022-12-14 18:11:39,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-1000/global_step1000/mp_rank_00_model_states.pt. [2022-12-14 18:11:39,689] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-14 18:11:41,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2022-12-14 18:11:41,932] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-14 18:11:41,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now!