+ deepspeed --num_nodes=1 --num_gpus=8 --master_port 35109 --module safe_rlhf.finetune --train_datasets bt --model_name_or_path cerebras/btlm-3b-8k-base --max_length 8092 --trust_remote_code True --epochs 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 2 --gradient_accumulation_steps 1 --gradient_checkpointing --learning_rate 4.7e-6 --lr_scheduler_type cosine --num_warmup_steps 20 --weight_decay 0.0 --seed 42 --output_dir /home/paperspace/safe-rlhf/output/sft --log_type wandb --log_project BT-Training --zero_stage 2 --bf16 True --tf32 True
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/paperspace/.cache/torch_extensions/py39_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
wandb: Tracking run with wandb version 0.13.4
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
Training 1/16 epoch: 0%| | 0/880 [00:00<?, ?it/s]
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Training 1/16 epoch (loss 6.2812): 0%| | 0/880 [00:05<?, ?it/s]
Training 1/16 epoch (loss 6.2812): 0%| | 1/880 [00:05<1:27:36, 5.98s/it]
Training 1/16 epoch (loss 6.2812): 0%| | 1/880 [00:12<1:27:36, 5.98s/it]
Training 1/16 epoch (loss 6.2812): 0%| | 2/880 [00:12<1:28:07, 6.02s/it]
Training 1/16 epoch (loss 6.2812): 0%| | 2/880 [00:18<1:28:07, 6.02s/it]
Training 1/16 epoch (loss 6.2812): 0%| | 3/880 [00:18<1:30:05, 6.16s/it]
Training 1/16 epoch (loss 6.2812): 0%| | 3/880 [00:23<1:30:05, 6.16s/it]
Training 1/16 epoch (loss 6.2812): 0%| | 4/880 [00:23<1:22:47, 5.67s/it]
Training 1/16 epoch (loss 6.2188): 0%| | 4/880 [00:28<1:22:47, 5.67s/it]
Training 1/16 epoch (loss 6.2188): 1%| | 5/880 [00:28<1:22:41, 5.67s/it]
Training 1/16 epoch (loss 6.3750): 1%| | 5/880 [00:34<1:22:41, 5.67s/it]
Training 1/16 epoch (loss 6.3750): 1%| | 6/880 [00:34<1:23:25, 5.73s/it]
Training 1/16 epoch (loss 6.3750): 1%| | 6/880 [00:39<1:23:25, 5.73s/it]
Training 1/16 epoch (loss 6.3750): 1%| | 7/880 [00:39<1:18:11, 5.37s/it]
Training 1/16 epoch (loss 6.2500): 1%| | 7/880 [00:45<1:18:11, 5.37s/it]
Training 1/16 epoch (loss 6.2500): 1%| | 8/880 [00:45<1:21:21, 5.60s/it]
Training 1/16 epoch (loss 6.3438): 1%| | 8/880 [00:54<1:21:21, 5.60s/it]
Training 1/16 epoch (loss 6.3438): 1%| | 9/880 [00:54<1:35:15, 6.56s/it]
Training 1/16 epoch (loss 6.2500): 1%| | 9/880 [00:59<1:35:15, 6.56s/it]
Training 1/16 epoch (loss 6.2500): 1%| | 10/880 [00:59<1:28:31, 6.11s/it]
Training 1/16 epoch (loss 6.1250): 1%| | 10/880 [01:04<1:28:31, 6.11s/it]
Training 1/16 epoch (loss 6.1250): 1%|β | 11/880 [01:04<1:23:04, 5.74s/it]
Training 1/16 epoch (loss 6.1562): 1%|β | 11/880 [01:12<1:23:04, 5.74s/it]
Training 1/16 epoch (loss 6.1562): 1%|β | 12/880 [01:12<1:32:30, 6.39s/it]
Training 1/16 epoch (loss 6.0000): 1%|β | 12/880 [01:16<1:32:30, 6.39s/it]
Training 1/16 epoch (loss 6.0000): 1%|β | 13/880 [01:16<1:25:28, 5.92s/it]
Training 1/16 epoch (loss 6.0000): 1%|β | 13/880 [01:21<1:25:28, 5.92s/it]
Training 1/16 epoch (loss 6.0000): 2%|β | 14/880 [01:21<1:20:32, 5.58s/it]
Training 1/16 epoch (loss 5.9375): 2%|β | 14/880 [01:27<1:20:32, 5.58s/it]
Training 1/16 epoch (loss 5.9375): 2%|β | 15/880 [01:27<1:20:18, 5.57s/it]
Training 1/16 epoch (loss 5.7812): 2%|β | 15/880 [01:32<1:20:18, 5.57s/it]
Training 1/16 epoch (loss 5.7812): 2%|β | 16/880 [01:32<1:18:38, 5.46s/it]
Training 1/16 epoch (loss 5.6250): 2%|β | 16/880 [01:39<1:18:38, 5.46s/it]
Training 1/16 epoch (loss 5.6250): 2%|β | 17/880 [01:39<1:24:30, 5.88s/it]
Training 1/16 epoch (loss 5.3750): 2%|β | 17/880 [01:44<1:24:30, 5.88s/it]
Training 1/16 epoch (loss 5.3750): 2%|β | 18/880 [01:44<1:20:56, 5.63s/it]
Training 1/16 epoch (loss 5.2500): 2%|β | 18/880 [01:49<1:20:56, 5.63s/it]
Training 1/16 epoch (loss 5.2500): 2%|β | 19/880 [01:49<1:17:58, 5.43s/it]
Training 1/16 epoch (loss 5.0625): 2%|β | 19/880 [01:53<1:17:58, 5.43s/it]
Training 1/16 epoch (loss 5.0625): 2%|β | 20/880 [01:53<1:12:27, 5.06s/it]
Training 1/16 epoch (loss 5.0625): 2%|β | 20/880 [01:58<1:12:27, 5.06s/it]
Training 1/16 epoch (loss 5.0625): 2%|β | 21/880 [01:58<1:11:50, 5.02s/it]
Training 1/16 epoch (loss 5.1250): 2%|β | 21/880 [02:03<1:11:50, 5.02s/it]
Training 1/16 epoch (loss 5.1250): 2%|β | 22/880 [02:03<1:13:28, 5.14s/it]
Training 1/16 epoch (loss 5.0625): 2%|β | 22/880 [02:07<1:13:28, 5.14s/it]
Training 1/16 epoch (loss 5.0625): 3%|β | 23/880 [02:07<1:09:07, 4.84s/it]
Training 1/16 epoch (loss 4.9688): 3%|β | 23/880 [02:13<1:09:07, 4.84s/it]
Training 1/16 epoch (loss 4.9688): 3%|β | 24/880 [02:13<1:10:16, 4.93s/it]
Training 1/16 epoch (loss 4.9062): 3%|β | 24/880 [02:26<1:10:16, 4.93s/it]
Training 1/16 epoch (loss 4.9062): 3%|β | 25/880 [02:26<1:44:18, 7.32s/it]
Training 1/16 epoch (loss 4.7500): 3%|β | 25/880 [02:32<1:44:18, 7.32s/it]
Training 1/16 epoch (loss 4.7500): 3%|β | 26/880 [02:32<1:40:29, 7.06s/it]
Training 1/16 epoch (loss 4.7500): 3%|β | 26/880 [02:37<1:40:29, 7.06s/it]
Training 1/16 epoch (loss 4.7500): 3%|β | 27/880 [02:37<1:33:17, 6.56s/it]
Training 1/16 epoch (loss 4.6250): 3%|β | 27/880 [02:43<1:33:17, 6.56s/it]
Training 1/16 epoch (loss 4.6250): 3%|β | 28/880 [02:43<1:30:02, 6.34s/it]
Training 1/16 epoch (loss 4.6875): 3%|β | 28/880 [02:48<1:30:02, 6.34s/it]
Training 1/16 epoch (loss 4.6875): 3%|β | 29/880 [02:48<1:22:54, 5.85s/it]
Training 1/16 epoch (loss 4.6250): 3%|β | 29/880 [02:54<1:22:54, 5.85s/it]
Training 1/16 epoch (loss 4.6250): 3%|β | 30/880 [02:54<1:21:47, 5.77s/it]
Training 1/16 epoch (loss 4.6562): 3%|β | 30/880 [02:58<1:21:47, 5.77s/it]
Training 1/16 epoch (loss 4.6562): 4%|β | 31/880 [02:58<1:16:01, 5.37s/it]
Training 1/16 epoch (loss 4.5000): 4%|β | 31/880 [03:03<1:16:01, 5.37s/it]
Training 1/16 epoch (loss 4.5000): 4%|β | 32/880 [03:03<1:12:35, 5.14s/it]
Training 1/16 epoch (loss 4.5000): 4%|β | 32/880 [03:08<1:12:35, 5.14s/it]
Training 1/16 epoch (loss 4.5000): 4%|β | 33/880 [03:08<1:14:13, 5.26s/it]
Training 1/16 epoch (loss 4.5000): 4%|β | 33/880 [03:13<1:14:13, 5.26s/it]
Training 1/16 epoch (loss 4.5000): 4%|β | 34/880 [03:13<1:14:47, 5.30s/it]
Training 1/16 epoch (loss 4.4688): 4%|β | 34/880 [03:19<1:14:47, 5.30s/it]
Training 1/16 epoch (loss 4.4688): 4%|β | 35/880 [03:19<1:15:09, 5.34s/it]
Training 1/16 epoch (loss 4.5312): 4%|β | 35/880 [03:26<1:15:09, 5.34s/it]
Training 1/16 epoch (loss 4.5312): 4%|β | 36/880 [03:26<1:21:58, 5.83s/it]
Training 1/16 epoch (loss 4.5000): 4%|β | 36/880 [03:41<1:21:58, 5.83s/it]
Training 1/16 epoch (loss 4.5000): 4%|β | 37/880 [03:41<1:59:20, 8.49s/it]
Training 1/16 epoch (loss 4.3125): 4%|β | 37/880 [03:46<1:59:20, 8.49s/it]
Training 1/16 epoch (loss 4.3125): 4%|β | 38/880 [03:46<1:45:09, 7.49s/it]
Training 1/16 epoch (loss 4.4688): 4%|β | 38/880 [03:50<1:45:09, 7.49s/it]
Training 1/16 epoch (loss 4.4688): 4%|β | 39/880 [03:50<1:32:49, 6.62s/it]
Training 1/16 epoch (loss 4.3125): 4%|β | 39/880 [03:56<1:32:49, 6.62s/it]
Training 1/16 epoch (loss 4.3125): 5%|β | 40/880 [03:56<1:26:40, 6.19s/it]
Training 1/16 epoch (loss 4.2812): 5%|β | 40/880 [04:00<1:26:40, 6.19s/it]
Training 1/16 epoch (loss 4.2812): 5%|β | 41/880 [04:00<1:21:22, 5.82s/it]
Training 1/16 epoch (loss 4.3125): 5%|β | 41/880 [04:06<1:21:22, 5.82s/it]
Training 1/16 epoch (loss 4.3125): 5%|β | 42/880 [04:06<1:19:40, 5.70s/it]
Training 1/16 epoch (loss 4.2812): 5%|β | 42/880 [04:12<1:19:40, 5.70s/it]
Training 1/16 epoch (loss 4.2812): 5%|β | 43/880 [04:12<1:21:38, 5.85s/it]
Training 1/16 epoch (loss 4.2500): 5%|β | 43/880 [04:21<1:21:38, 5.85s/it]
Training 1/16 epoch (loss 4.2500): 5%|β | 44/880 [04:21<1:32:35, 6.65s/it]
Training 1/16 epoch (loss 4.3750): 5%|β | 44/880 [04:37<1:32:35, 6.65s/it]
Training 1/16 epoch (loss 4.3750): 5%|β | 45/880 [04:37<2:13:14, 9.57s/it]
Training 1/16 epoch (loss 4.0312): 5%|β | 45/880 [04:43<2:13:14, 9.57s/it]
Training 1/16 epoch (loss 4.0312): 5%|β | 46/880 [04:43<1:57:52, 8.48s/it]
Training 1/16 epoch (loss 3.9375): 5%|β | 46/880 [04:48<1:57:52, 8.48s/it]
Training 1/16 epoch (loss 3.9375): 5%|β | 47/880 [04:48<1:42:37, 7.39s/it]
Training 1/16 epoch (loss 4.0938): 5%|β | 47/880 [04:53<1:42:37, 7.39s/it]
Training 1/16 epoch (loss 4.0938): 5%|β | 48/880 [04:53<1:34:24, 6.81s/it]
Training 1/16 epoch (loss 3.9844): 5%|β | 48/880 [05:00<1:34:24, 6.81s/it]
Training 1/16 epoch (loss 3.9844): 6%|β | 49/880 [05:00<1:35:10, 6.87s/it]
Training 1/16 epoch (loss 4.1250): 6%|β | 49/880 [05:06<1:35:10, 6.87s/it]
Training 1/16 epoch (loss 4.1250): 6%|β | 50/880 [05:06<1:30:13, 6.52s/it]
Training 1/16 epoch (loss 4.1250): 6%|β | 50/880 [05:13<1:30:13, 6.52s/it]
Training 1/16 epoch (loss 4.1250): 6%|β | 51/880 [05:13<1:30:31, 6.55s/it]
Training 1/16 epoch (loss 3.8750): 6%|β | 51/880 [05:18<1:30:31, 6.55s/it]
Training 1/16 epoch (loss 3.8750): 6%|β | 52/880 [05:18<1:25:42, 6.21s/it]
Training 1/16 epoch (loss 3.9844): 6%|β | 52/880 [05:23<1:25:42, 6.21s/it]
Training 1/16 epoch (loss 3.9844): 6%|β | 53/880 [05:23<1:19:40, 5.78s/it]
Training 1/16 epoch (loss 3.8750): 6%|β | 53/880 [05:30<1:19:40, 5.78s/it]
Training 1/16 epoch (loss 3.8750): 6%|β | 54/880 [05:30<1:24:21, 6.13s/it]
Training 1/16 epoch (loss 3.7344): 6%|β | 54/880 [05:34<1:24:21, 6.13s/it]
Training 1/16 epoch (loss 3.7344): 6%|β | 55/880 [05:34<1:18:10, 5.69s/it]
Training 2/16 epoch (loss 3.7344): 6%|β | 55/880 [05:40<1:18:10, 5.69s/it]
Training 2/16 epoch (loss 3.7344): 6%|β | 56/880 [05:40<1:16:11, 5.55s/it]
Training 2/16 epoch (loss 3.7969): 6%|β | 56/880 [05:46<1:16:11, 5.55s/it]
Training 2/16 epoch (loss 3.7969): 6%|β | 57/880 [05:46<1:17:53, 5.68s/it]
Training 2/16 epoch (loss 3.6719): 6%|β | 57/880 [05:52<1:17:53, 5.68s/it]
Training 2/16 epoch (loss 3.6719): 7%|β | 58/880 [05:52<1:19:07, 5.78s/it]
Training 2/16 epoch (loss 3.7969): 7%|β | 58/880 [05:56<1:19:07, 5.78s/it]
Training 2/16 epoch (loss 3.7969): 7%|β | 59/880 [05:56<1:14:35, 5.45s/it]
Training 2/16 epoch (loss 3.6719): 7%|β | 59/880 [06:02<1:14:35, 5.45s/it]
Training 2/16 epoch (loss 3.6719): 7%|β | 60/880 [06:02<1:13:38, 5.39s/it]
Training 2/16 epoch (loss 3.6562): 7%|β | 60/880 [06:07<1:13:38, 5.39s/it]
Training 2/16 epoch (loss 3.6562): 7%|β | 61/880 [06:07<1:15:17, 5.52s/it]
Training 2/16 epoch (loss 3.5781): 7%|β | 61/880 [06:12<1:15:17, 5.52s/it]
Training 2/16 epoch (loss 3.5781): 7%|β | 62/880 [06:12<1:11:35, 5.25s/it]
Training 2/16 epoch (loss 3.5469): 7%|β | 62/880 [06:18<1:11:35, 5.25s/it]
Training 2/16 epoch (loss 3.5469): 7%|β | 63/880 [06:18<1:12:50, 5.35s/it]
Training 2/16 epoch (loss 3.5938): 7%|β | 63/880 [06:26<1:12:50, 5.35s/it]
Training 2/16 epoch (loss 3.5938): 7%|β | 64/880 [06:26<1:25:39, 6.30s/it]
Training 2/16 epoch (loss 3.5312): 7%|β | 64/880 [06:31<1:25:39, 6.30s/it]
Training 2/16 epoch (loss 3.5312): 7%|β | 65/880 [06:31<1:20:31, 5.93s/it]
Training 2/16 epoch (loss 3.5469): 7%|β | 65/880 [06:36<1:20:31, 5.93s/it]
Training 2/16 epoch (loss 3.5469): 8%|β | 66/880 [06:36<1:15:52, 5.59s/it]
Training 2/16 epoch (loss 3.4375): 8%|β | 66/880 [06:43<1:15:52, 5.59s/it]
Training 2/16 epoch (loss 3.4375): 8%|β | 67/880 [06:43<1:23:20, 6.15s/it]
Training 2/16 epoch (loss 3.1875): 8%|β | 67/880 [06:48<1:23:20, 6.15s/it]
Training 2/16 epoch (loss 3.1875): 8%|β | 68/880 [06:48<1:17:40, 5.74s/it]
Training 2/16 epoch (loss 3.5000): 8%|β | 68/880 [06:53<1:17:40, 5.74s/it]
Training 2/16 epoch (loss 3.5000): 8%|β | 69/880 [06:53<1:13:39, 5.45s/it]
Training 2/16 epoch (loss 3.3438): 8%|β | 69/880 [06:58<1:13:39, 5.45s/it]
Training 2/16 epoch (loss 3.3438): 8%|β | 70/880 [06:58<1:13:49, 5.47s/it]
Training 2/16 epoch (loss 3.4062): 8%|β | 70/880 [07:04<1:13:49, 5.47s/it]
Training 2/16 epoch (loss 3.4062): 8%|β | 71/880 [07:04<1:12:33, 5.38s/it]
Training 2/16 epoch (loss 3.1719): 8%|β | 71/880 [07:10<1:12:33, 5.38s/it]
Training 2/16 epoch (loss 3.1719): 8%|β | 72/880 [07:10<1:17:08, 5.73s/it]
Training 2/16 epoch (loss 3.2344): 8%|β | 72/880 [07:15<1:17:08, 5.73s/it]
Training 2/16 epoch (loss 3.2344): 8%|β | 73/880 [07:15<1:14:18, 5.52s/it]
Training 2/16 epoch (loss 3.2812): 8%|β | 73/880 [07:20<1:14:18, 5.52s/it]
Training 2/16 epoch (loss 3.2812): 8%|β | 74/880 [07:20<1:11:53, 5.35s/it]
Training 2/16 epoch (loss 2.9688): 8%|β | 74/880 [07:24<1:11:53, 5.35s/it]
Training 2/16 epoch (loss 2.9688): 9%|β | 75/880 [07:24<1:06:59, 4.99s/it]
Training 2/16 epoch (loss 3.1719): 9%|β | 75/880 [07:29<1:06:59, 4.99s/it]
Training 2/16 epoch (loss 3.1719): 9%|β | 76/880 [07:29<1:06:35, 4.97s/it]
Training 2/16 epoch (loss 3.3125): 9%|β | 76/880 [07:35<1:06:35, 4.97s/it]
Training 2/16 epoch (loss 3.3125): 9%|β | 77/880 [07:35<1:08:12, 5.10s/it]
Training 2/16 epoch (loss 3.2031): 9%|β | 77/880 [07:39<1:08:12, 5.10s/it]
Training 2/16 epoch (loss 3.2031): 9%|β | 78/880 [07:39<1:04:11, 4.80s/it]
Training 2/16 epoch (loss 3.1094): 9%|β | 78/880 [07:44<1:04:11, 4.80s/it]
Training 2/16 epoch (loss 3.1094): 9%|β | 79/880 [07:44<1:05:17, 4.89s/it]
Training 2/16 epoch (loss 3.1250): 9%|β | 79/880 [07:56<1:05:17, 4.89s/it]
Training 2/16 epoch (loss 3.1250): 9%|β | 80/880 [07:56<1:35:31, 7.16s/it]
Training 2/16 epoch (loss 2.9219): 9%|β | 80/880 [08:02<1:35:31, 7.16s/it]
Training 2/16 epoch (loss 2.9219): 9%|β | 81/880 [08:02<1:29:59, 6.76s/it]
Training 2/16 epoch (loss 3.0781): 9%|β | 81/880 [08:07<1:29:59, 6.76s/it]
Training 2/16 epoch (loss 3.0781): 9%|β | 82/880 [08:07<1:24:17, 6.34s/it]
Training 2/16 epoch (loss 2.8281): 9%|β | 82/880 [08:13<1:24:17, 6.34s/it]
Training 2/16 epoch (loss 2.8281): 9%|β | 83/880 [08:13<1:20:01, 6.02s/it]
Training 2/16 epoch (loss 3.0000): 9%|β | 83/880 [08:17<1:20:01, 6.02s/it]
Training 2/16 epoch (loss 3.0000): 10%|β | 84/880 [08:17<1:14:32, 5.62s/it]
Training 2/16 epoch (loss 2.9531): 10%|β | 84/880 [08:23<1:14:32, 5.62s/it]
Training 2/16 epoch (loss 2.9531): 10%|β | 85/880 [08:23<1:14:04, 5.59s/it]
Training 2/16 epoch (loss 3.0000): 10%|β | 85/880 [08:27<1:14:04, 5.59s/it]
Training 2/16 epoch (loss 3.0000): 10%|β | 86/880 [08:27<1:09:22, 5.24s/it]
Training 2/16 epoch (loss 2.8281): 10%|β | 86/880 [08:32<1:09:22, 5.24s/it]
Training 2/16 epoch (loss 2.8281): 10%|β | 87/880 [08:32<1:06:37, 5.04s/it]
Training 2/16 epoch (loss 2.9531): 10%|β | 87/880 [08:37<1:06:37, 5.04s/it]
Training 2/16 epoch (loss 2.9531): 10%|β | 88/880 [08:37<1:08:14, 5.17s/it]
Training 2/16 epoch (loss 2.8906): 10%|β | 88/880 [08:43<1:08:14, 5.17s/it]
Training 2/16 epoch (loss 2.8906): 10%|β | 89/880 [08:43<1:09:00, 5.23s/it]
Training 2/16 epoch (loss 2.7969): 10%|β | 89/880 [08:48<1:09:00, 5.23s/it]
Training 2/16 epoch (loss 2.7969): 10%|β | 90/880 [08:48<1:09:32, 5.28s/it]
Training 2/16 epoch (loss 2.9531): 10%|β | 90/880 [08:55<1:09:32, 5.28s/it]
Training 2/16 epoch (loss 2.9531): 10%|β | 91/880 [08:55<1:13:56, 5.62s/it]
Training 2/16 epoch (loss 2.9375): 10%|β | 91/880 [09:09<1:13:56, 5.62s/it]
Training 2/16 epoch (loss 2.9375): 10%|β | 92/880 [09:09<1:48:20, 8.25s/it]
Training 2/16 epoch (loss 2.7500): 10%|β | 92/880 [09:14<1:48:20, 8.25s/it]
Training 2/16 epoch (loss 2.7500): 11%|β | 93/880 [09:14<1:35:56, 7.31s/it]
Training 2/16 epoch (loss 2.9531): 11%|β | 93/880 [09:19<1:35:56, 7.31s/it]
Training 2/16 epoch (loss 2.9531): 11%|β | 94/880 [09:19<1:25:00, 6.49s/it]
Training 2/16 epoch (loss 2.7188): 11%|β | 94/880 [09:24<1:25:00, 6.49s/it]
Training 2/16 epoch (loss 2.7188): 11%|β | 95/880 [09:24<1:19:40, 6.09s/it]
Training 2/16 epoch (loss 2.8281): 11%|β | 95/880 [09:29<1:19:40, 6.09s/it]
Training 2/16 epoch (loss 2.8281): 11%|β | 96/880 [09:29<1:15:02, 5.74s/it]
Training 2/16 epoch (loss 2.8281): 11%|β | 96/880 [09:34<1:15:02, 5.74s/it]
Training 2/16 epoch (loss 2.8281): 11%|β | 97/880 [09:34<1:13:42, 5.65s/it]
Training 2/16 epoch (loss 2.7500): 11%|β | 97/880 [09:40<1:13:42, 5.65s/it]
Training 2/16 epoch (loss 2.7500): 11%|β | 98/880 [09:40<1:15:41, 5.81s/it]
Training 2/16 epoch (loss 2.9531): 11%|β | 98/880 [09:48<1:15:41, 5.81s/it]
Training 2/16 epoch (loss 2.9531): 11%|ββ | 99/880 [09:48<1:23:42, 6.43s/it]
Training 2/16 epoch (loss 2.9844): 11%|ββ | 99/880 [10:04<1:23:42, 6.43s/it]
Training 2/16 epoch (loss 2.9844): 11%|ββ | 100/880 [10:04<1:59:52, 9.22s/it]
Training 2/16 epoch (loss 2.5781): 11%|ββ | 100/880 [10:10<1:59:52, 9.22s/it]
Training 2/16 epoch (loss 2.5781): 11%|ββ | 101/880 [10:10<1:46:45, 8.22s/it]
Training 2/16 epoch (loss 2.5156): 11%|ββ | 101/880 [10:15<1:46:45, 8.22s/it]
Training 2/16 epoch (loss 2.5156): 12%|ββ | 102/880 [10:15<1:33:23, 7.20s/it]
Training 2/16 epoch (loss 2.7031): 12%|ββ | 102/880 [10:20<1:33:23, 7.20s/it]
Training 2/16 epoch (loss 2.7031): 12%|ββ | 103/880 [10:20<1:26:19, 6.67s/it]
Training 2/16 epoch (loss 2.6094): 12%|ββ | 103/880 [10:27<1:26:19, 6.67s/it]
Training 2/16 epoch (loss 2.6094): 12%|ββ | 104/880 [10:27<1:27:27, 6.76s/it]
Training 2/16 epoch (loss 2.7656): 12%|ββ | 104/880 [10:33<1:27:27, 6.76s/it]
Training 2/16 epoch (loss 2.7656): 12%|ββ | 105/880 [10:33<1:23:08, 6.44s/it]
Training 2/16 epoch (loss 2.8125): 12%|ββ | 105/880 [10:39<1:23:08, 6.44s/it]
Training 2/16 epoch (loss 2.8125): 12%|ββ | 106/880 [10:39<1:23:35, 6.48s/it]
Training 2/16 epoch (loss 2.4844): 12%|ββ | 106/880 [10:45<1:23:35, 6.48s/it]
Training 2/16 epoch (loss 2.4844): 12%|ββ | 107/880 [10:45<1:19:15, 6.15s/it]
Training 2/16 epoch (loss 2.7969): 12%|ββ | 107/880 [10:50<1:19:15, 6.15s/it]
Training 2/16 epoch (loss 2.7969): 12%|ββ | 108/880 [10:50<1:13:46, 5.73s/it]
Training 2/16 epoch (loss 2.6562): 12%|ββ | 108/880 [10:56<1:13:46, 5.73s/it]
Training 2/16 epoch (loss 2.6562): 12%|ββ | 109/880 [10:56<1:18:16, 6.09s/it]
Training 2/16 epoch (loss 2.4688): 12%|ββ | 109/880 [11:01<1:18:16, 6.09s/it]
Training 2/16 epoch (loss 2.4688): 12%|ββ | 110/880 [11:01<1:12:34, 5.65s/it]
Training 3/16 epoch (loss 2.6250): 12%|ββ | 110/880 [11:06<1:12:34, 5.65s/it]
Training 3/16 epoch (loss 2.6250): 13%|ββ | 111/880 [11:06<1:10:50, 5.53s/it]
Training 3/16 epoch (loss 2.6406): 13%|ββ | 111/880 [11:12<1:10:50, 5.53s/it]
Training 3/16 epoch (loss 2.6406): 13%|ββ | 112/880 [11:12<1:12:29, 5.66s/it]
Training 3/16 epoch (loss 2.5156): 13%|ββ | 112/880 [11:18<1:12:29, 5.66s/it]
Training 3/16 epoch (loss 2.5156): 13%|ββ | 113/880 [11:18<1:13:40, 5.76s/it]
Training 3/16 epoch (loss 2.7500): 13%|ββ | 113/880 [11:23<1:13:40, 5.76s/it]
Training 3/16 epoch (loss 2.7500): 13%|ββ | 114/880 [11:23<1:09:27, 5.44s/it]
Training 3/16 epoch (loss 2.6094): 13%|ββ | 114/880 [11:28<1:09:27, 5.44s/it]
Training 3/16 epoch (loss 2.6094): 13%|ββ | 115/880 [11:28<1:08:34, 5.38s/it]
Training 3/16 epoch (loss 2.6250): 13%|ββ | 115/880 [11:34<1:08:34, 5.38s/it]
Training 3/16 epoch (loss 2.6250): 13%|ββ | 116/880 [11:34<1:10:06, 5.51s/it]
Training 3/16 epoch (loss 2.6094): 13%|ββ | 116/880 [11:39<1:10:06, 5.51s/it]
Training 3/16 epoch (loss 2.6094): 13%|ββ | 117/880 [11:39<1:06:38, 5.24s/it]
Training 3/16 epoch (loss 2.5938): 13%|ββ | 117/880 [11:44<1:06:38, 5.24s/it]
Training 3/16 epoch (loss 2.5938): 13%|ββ | 118/880 [11:44<1:07:49, 5.34s/it]
Training 3/16 epoch (loss 2.6562): 13%|ββ | 118/880 [11:53<1:07:49, 5.34s/it]
Training 3/16 epoch (loss 2.6562): 14%|ββ | 119/880 [11:53<1:19:49, 6.29s/it]
Training 3/16 epoch (loss 2.6094): 14%|ββ | 119/880 [11:58<1:19:49, 6.29s/it]
Training 3/16 epoch (loss 2.6094): 14%|ββ | 120/880 [11:58<1:15:04, 5.93s/it]
Training 3/16 epoch (loss 2.6719): 14%|ββ | 120/880 [12:03<1:15:04, 5.93s/it]
Training 3/16 epoch (loss 2.6719): 14%|ββ | 121/880 [12:03<1:10:46, 5.60s/it]
Training 3/16 epoch (loss 2.5938): 14%|ββ | 121/880 [12:10<1:10:46, 5.60s/it]
Training 3/16 epoch (loss 2.5938): 14%|ββ | 122/880 [12:10<1:17:43, 6.15s/it]
Training 3/16 epoch (loss 2.3281): 14%|ββ | 122/880 [12:15<1:17:43, 6.15s/it]
Training 3/16 epoch (loss 2.3281): 14%|ββ | 123/880 [12:15<1:12:25, 5.74s/it]
Training 3/16 epoch (loss 2.7500): 14%|ββ | 123/880 [12:20<1:12:25, 5.74s/it]
Training 3/16 epoch (loss 2.7500): 14%|ββ | 124/880 [12:20<1:08:39, 5.45s/it]
Training 3/16 epoch (loss 2.5312): 14%|ββ | 124/880 [12:25<1:08:39, 5.45s/it]
Training 3/16 epoch (loss 2.5312): 14%|ββ | 125/880 [12:25<1:08:48, 5.47s/it]
Training 3/16 epoch (loss 2.6562): 14%|ββ | 125/880 [12:30<1:08:48, 5.47s/it]
Training 3/16 epoch (loss 2.6562): 14%|ββ | 126/880 [12:30<1:07:35, 5.38s/it]
Training 3/16 epoch (loss 2.4844): 14%|ββ | 126/880 [12:37<1:07:35, 5.38s/it]
Training 3/16 epoch (loss 2.4844): 14%|ββ | 127/880 [12:37<1:11:51, 5.73s/it]
Training 3/16 epoch (loss 2.5312): 14%|ββ | 127/880 [12:42<1:11:51, 5.73s/it]
Training 3/16 epoch (loss 2.5312): 15%|ββ | 128/880 [12:42<1:09:11, 5.52s/it]
Training 3/16 epoch (loss 2.6094): 15%|ββ | 128/880 [12:47<1:09:11, 5.52s/it]
Training 3/16 epoch (loss 2.6094): 15%|ββ | 129/880 [12:47<1:06:55, 5.35s/it]
Training 3/16 epoch (loss 2.2500): 15%|ββ | 129/880 [12:51<1:06:55, 5.35s/it]
Training 3/16 epoch (loss 2.2500): 15%|ββ | 130/880 [12:51<1:02:23, 4.99s/it]
Training 3/16 epoch (loss 2.4844): 15%|ββ | 130/880 [12:56<1:02:23, 4.99s/it]
Training 3/16 epoch (loss 2.4844): 15%|ββ | 131/880 [12:56<1:02:01, 4.97s/it]
Training 3/16 epoch (loss 2.7188): 15%|ββ | 131/880 [13:01<1:02:01, 4.97s/it]
Training 3/16 epoch (loss 2.7188): 15%|ββ | 132/880 [13:01<1:03:32, 5.10s/it]
Training 3/16 epoch (loss 2.5625): 15%|ββ | 132/880 [13:05<1:03:32, 5.10s/it]
Training 3/16 epoch (loss 2.5625): 15%|ββ | 133/880 [13:05<59:49, 4.80s/it]
Training 3/16 epoch (loss 2.4844): 15%|ββ | 133/880 [13:11<59:49, 4.80s/it]
Training 3/16 epoch (loss 2.4844): 15%|ββ | 134/880 [13:11<1:00:51, 4.89s/it]
Training 3/16 epoch (loss 2.5312): 15%|ββ | 134/880 [13:23<1:00:51, 4.89s/it]
Training 3/16 epoch (loss 2.5312): 15%|ββ | 135/880 [13:23<1:28:58, 7.17s/it]
Training 3/16 epoch (loss 2.3438): 15%|ββ | 135/880 [13:29<1:28:58, 7.17s/it]
Training 3/16 epoch (loss 2.3438): 15%|ββ | 136/880 [13:29<1:23:47, 6.76s/it]
Training 3/16 epoch (loss 2.5000): 15%|ββ | 136/880 [13:34<1:23:47, 6.76s/it]
Training 3/16 epoch (loss 2.5000): 16%|ββ | 137/880 [13:34<1:18:29, 6.34s/it]
Training 3/16 epoch (loss 2.2812): 16%|ββ | 137/880 [13:39<1:18:29, 6.34s/it]
Training 3/16 epoch (loss 2.2812): 16%|ββ | 138/880 [13:39<1:14:30, 6.03s/it]
Training 3/16 epoch (loss 2.4219): 16%|ββ | 138/880 [13:44<1:14:30, 6.03s/it]
Training 3/16 epoch (loss 2.4219): 16%|ββ | 139/880 [13:44<1:09:24, 5.62s/it]
Training 3/16 epoch (loss 2.4062): 16%|ββ | 139/880 [13:50<1:09:24, 5.62s/it]
Training 3/16 epoch (loss 2.4062): 16%|ββ | 140/880 [13:50<1:08:58, 5.59s/it]
Training 3/16 epoch (loss 2.4688): 16%|ββ | 140/880 [13:54<1:08:58, 5.59s/it]
Training 3/16 epoch (loss 2.4688): 16%|ββ | 141/880 [13:54<1:04:34, 5.24s/it]
Training 3/16 epoch (loss 2.2969): 16%|ββ | 141/880 [13:59<1:04:34, 5.24s/it]
Training 3/16 epoch (loss 2.2969): 16%|ββ | 142/880 [13:59<1:02:02, 5.04s/it]
Training 3/16 epoch (loss 2.4844): 16%|ββ | 142/880 [14:04<1:02:02, 5.04s/it]
Training 3/16 epoch (loss 2.4844): 16%|ββ | 143/880 [14:04<1:03:32, 5.17s/it]
Training 3/16 epoch (loss 2.3594): 16%|ββ | 143/880 [14:10<1:03:32, 5.17s/it]
Training 3/16 epoch (loss 2.3594): 16%|ββ | 144/880 [14:10<1:04:17, 5.24s/it]
Training 3/16 epoch (loss 2.2656): 16%|ββ | 144/880 [14:15<1:04:17, 5.24s/it]
Training 3/16 epoch (loss 2.2656): 16%|ββ | 145/880 [14:15<1:04:46, 5.29s/it]
Training 3/16 epoch (loss 2.4688): 16%|ββ | 145/880 [14:21<1:04:46, 5.29s/it]
Training 3/16 epoch (loss 2.4688): 17%|ββ | 146/880 [14:21<1:08:51, 5.63s/it]
Training 3/16 epoch (loss 2.4688): 17%|ββ | 146/880 [14:36<1:08:51, 5.63s/it]
Training 3/16 epoch (loss 2.4688): 17%|ββ | 147/880 [14:36<1:40:51, 8.26s/it]
Training 3/16 epoch (loss 2.2656): 17%|ββ | 147/880 [14:41<1:40:51, 8.26s/it]
Training 3/16 epoch (loss 2.2656): 17%|ββ | 148/880 [14:41<1:29:16, 7.32s/it]
Training 3/16 epoch (loss 2.5156): 17%|ββ | 148/880 [14:45<1:29:16, 7.32s/it]
Training 3/16 epoch (loss 2.5156): 17%|ββ | 149/880 [14:45<1:19:06, 6.49s/it]
Training 3/16 epoch (loss 2.2812): 17%|ββ | 149/880 [14:51<1:19:06, 6.49s/it]
Training 3/16 epoch (loss 2.2812): 17%|ββ | 150/880 [14:51<1:14:07, 6.09s/it]
Training 3/16 epoch (loss 2.3594): 17%|ββ | 150/880 [14:56<1:14:07, 6.09s/it]
Training 3/16 epoch (loss 2.3594): 17%|ββ | 151/880 [14:56<1:09:48, 5.75s/it]
Training 3/16 epoch (loss 2.4062): 17%|ββ | 151/880 [15:01<1:09:48, 5.75s/it]
Training 3/16 epoch (loss 2.4062): 17%|ββ | 152/880 [15:01<1:08:33, 5.65s/it]
Training 3/16 epoch (loss 2.3281): 17%|ββ | 152/880 [15:07<1:08:33, 5.65s/it]
Training 3/16 epoch (loss 2.3281): 17%|ββ | 153/880 [15:07<1:10:20, 5.81s/it]
Training 3/16 epoch (loss 2.5312): 17%|ββ | 153/880 [15:15<1:10:20, 5.81s/it]
Training 3/16 epoch (loss 2.5312): 18%|ββ | 154/880 [15:15<1:17:47, 6.43s/it]
Training 3/16 epoch (loss 2.5625): 18%|ββ | 154/880 [15:31<1:17:47, 6.43s/it]
Training 3/16 epoch (loss 2.5625): 18%|ββ | 155/880 [15:31<1:51:28, 9.23s/it]
Training 3/16 epoch (loss 2.2188): 18%|ββ | 155/880 [15:37<1:51:28, 9.23s/it]
Training 3/16 epoch (loss 2.2188): 18%|ββ | 156/880 [15:37<1:39:17, 8.23s/it]
Training 3/16 epoch (loss 2.1719): 18%|ββ | 156/880 [15:42<1:39:17, 8.23s/it]
Training 3/16 epoch (loss 2.1719): 18%|ββ | 157/880 [15:42<1:26:52, 7.21s/it]
Training 3/16 epoch (loss 2.2812): 18%|ββ | 157/880 [15:47<1:26:52, 7.21s/it]
Training 3/16 epoch (loss 2.2812): 18%|ββ | 158/880 [15:47<1:20:17, 6.67s/it]
Training 3/16 epoch (loss 2.2031): 18%|ββ | 158/880 [15:54<1:20:17, 6.67s/it]
Training 3/16 epoch (loss 2.2031): 18%|ββ | 159/880 [15:54<1:21:17, 6.77s/it]
Training 3/16 epoch (loss 2.4219): 18%|ββ | 159/880 [16:00<1:21:17, 6.77s/it]
Training 3/16 epoch (loss 2.4219): 18%|ββ | 160/880 [16:00<1:17:15, 6.44s/it]
Training 3/16 epoch (loss 2.4375): 18%|ββ | 160/880 [16:06<1:17:15, 6.44s/it]
Training 3/16 epoch (loss 2.4375): 18%|ββ | 161/880 [16:06<1:17:42, 6.48s/it]
Training 3/16 epoch (loss 2.1406): 18%|ββ | 161/880 [16:12<1:17:42, 6.48s/it]
Training 3/16 epoch (loss 2.1406): 18%|ββ | 162/880 [16:12<1:13:40, 6.16s/it]
Training 3/16 epoch (loss 2.4375): 18%|ββ | 162/880 [16:16<1:13:40, 6.16s/it]
Training 3/16 epoch (loss 2.4375): 19%|ββ | 163/880 [16:16<1:08:31, 5.73s/it]
Training 3/16 epoch (loss 2.2812): 19%|ββ | 163/880 [16:23<1:08:31, 5.73s/it]
Training 3/16 epoch (loss 2.2812): 19%|ββ | 164/880 [16:23<1:12:40, 6.09s/it]
Training 3/16 epoch (loss 2.1094): 19%|ββ | 164/880 [16:28<1:12:40, 6.09s/it]
Training 3/16 epoch (loss 2.1094): 19%|ββ | 165/880 [16:28<1:07:22, 5.65s/it]
Training 4/16 epoch (loss 2.2812): 19%|ββ | 165/880 [16:33<1:07:22, 5.65s/it]
Training 4/16 epoch (loss 2.2812): 19%|ββ | 166/880 [16:33<1:05:45, 5.53s/it]
Training 4/16 epoch (loss 2.3125): 19%|ββ | 166/880 [16:39<1:05:45, 5.53s/it]
Training 4/16 epoch (loss 2.3125): 19%|ββ | 167/880 [16:39<1:07:17, 5.66s/it]
Training 4/16 epoch (loss 2.1562): 19%|ββ | 167/880 [16:45<1:07:17, 5.66s/it]
Training 4/16 epoch (loss 2.1562): 19%|ββ | 168/880 [16:45<1:08:25, 5.77s/it]
Training 4/16 epoch (loss 2.3906): 19%|ββ | 168/880 [16:50<1:08:25, 5.77s/it]
Training 4/16 epoch (loss 2.3906): 19%|ββ | 169/880 [16:50<1:04:30, 5.44s/it]
Training 4/16 epoch (loss 2.2656): 19%|ββ | 169/880 [16:55<1:04:30, 5.44s/it]
Training 4/16 epoch (loss 2.2656): 19%|ββ | 170/880 [16:55<1:03:40, 5.38s/it]
Training 4/16 epoch (loss 2.2500): 19%|ββ | 170/880 [17:01<1:03:40, 5.38s/it]
Training 4/16 epoch (loss 2.2500): 19%|ββ | 171/880 [17:01<1:05:05, 5.51s/it]
Training 4/16 epoch (loss 2.2500): 19%|ββ | 171/880 [17:05<1:05:05, 5.51s/it]
Training 4/16 epoch (loss 2.2500): 20%|ββ | 172/880 [17:05<1:01:52, 5.24s/it]
Training 4/16 epoch (loss 2.2188): 20%|ββ | 172/880 [17:11<1:01:52, 5.24s/it]
Training 4/16 epoch (loss 2.2188): 20%|ββ | 173/880 [17:11<1:02:57, 5.34s/it]
Training 4/16 epoch (loss 2.3125): 20%|ββ | 173/880 [17:20<1:02:57, 5.34s/it]
Training 4/16 epoch (loss 2.3125): 20%|ββ | 174/880 [17:20<1:14:03, 6.29s/it]
Training 4/16 epoch (loss 2.2812): 20%|ββ | 174/880 [17:25<1:14:03, 6.29s/it]
Training 4/16 epoch (loss 2.2812): 20%|ββ | 175/880 [17:25<1:09:36, 5.92s/it]
Training 4/16 epoch (loss 2.3125): 20%|ββ | 175/880 [17:29<1:09:36, 5.92s/it]
Training 4/16 epoch (loss 2.3125): 20%|ββ | 176/880 [17:29<1:05:36, 5.59s/it]
Training 4/16 epoch (loss 2.2812): 20%|ββ | 176/880 [17:37<1:05:36, 5.59s/it]
Training 4/16 epoch (loss 2.2812): 20%|ββ | 177/880 [17:37<1:12:03, 6.15s/it]
Training 4/16 epoch (loss 1.9922): 20%|ββ | 177/880 [17:42<1:12:03, 6.15s/it]
Training 4/16 epoch (loss 1.9922): 20%|ββ | 178/880 [17:42<1:07:09, 5.74s/it]
Training 4/16 epoch (loss 2.4531): 20%|ββ | 178/880 [17:46<1:07:09, 5.74s/it]
Training 4/16 epoch (loss 2.4531): 20%|ββ | 179/880 [17:46<1:03:42, 5.45s/it]
Training 4/16 epoch (loss 2.2031): 20%|ββ | 179/880 [17:52<1:03:42, 5.45s/it]
Training 4/16 epoch (loss 2.2031): 20%|ββ | 180/880 [17:52<1:03:51, 5.47s/it]
Training 4/16 epoch (loss 2.3281): 20%|ββ | 180/880 [17:57<1:03:51, 5.47s/it]
Training 4/16 epoch (loss 2.3281): 21%|ββ | 181/880 [17:57<1:02:43, 5.38s/it]
Training 4/16 epoch (loss 2.1719): 21%|ββ | 181/880 [18:04<1:02:43, 5.38s/it]
Training 4/16 epoch (loss 2.1719): 21%|ββ | 182/880 [18:04<1:06:39, 5.73s/it]
Training 4/16 epoch (loss 2.2188): 21%|ββ | 182/880 [18:09<1:06:39, 5.73s/it]
Training 4/16 epoch (loss 2.2188): 21%|ββ | 183/880 [18:09<1:04:10, 5.52s/it]
Training 4/16 epoch (loss 2.3125): 21%|ββ | 183/880 [18:14<1:04:10, 5.52s/it]
Training 4/16 epoch (loss 2.3125): 21%|ββ | 184/880 [18:14<1:02:04, 5.35s/it]
Training 4/16 epoch (loss 1.9375): 21%|ββ | 184/880 [18:18<1:02:04, 5.35s/it]
Training 4/16 epoch (loss 1.9375): 21%|ββ | 185/880 [18:18<57:49, 4.99s/it]
Training 4/16 epoch (loss 2.1719): 21%|ββ | 185/880 [18:23<57:49, 4.99s/it]
Training 4/16 epoch (loss 2.1719): 21%|ββ | 186/880 [18:23<57:26, 4.97s/it]
Training 4/16 epoch (loss 2.4688): 21%|ββ | 186/880 [18:28<57:26, 4.97s/it]
Training 4/16 epoch (loss 2.4688): 21%|βββ | 187/880 [18:28<58:51, 5.10s/it]
Training 4/16 epoch (loss 2.2812): 21%|βββ | 187/880 [18:32<58:51, 5.10s/it]
Training 4/16 epoch (loss 2.2812): 21%|βββ | 188/880 [18:32<55:22, 4.80s/it]
Training 4/16 epoch (loss 2.1406): 21%|βββ | 188/880 [18:37<55:22, 4.80s/it]
Training 4/16 epoch (loss 2.1406): 21%|βββ | 189/880 [18:37<56:19, 4.89s/it]
Training 4/16 epoch (loss 2.2500): 21%|βββ | 189/880 [18:50<56:19, 4.89s/it]
Training 4/16 epoch (loss 2.2500): 22%|βββ | 190/880 [18:50<1:22:25, 7.17s/it]
Training 4/16 epoch (loss 2.0781): 22%|βββ | 190/880 [18:56<1:22:25, 7.17s/it]
Training 4/16 epoch (loss 2.0781): 22%|βββ | 191/880 [18:56<1:17:38, 6.76s/it]
Training 4/16 epoch (loss 2.2188): 22%|βββ | 191/880 [19:01<1:17:38, 6.76s/it]
Training 4/16 epoch (loss 2.2188): 22%|βββ | 192/880 [19:01<1:12:42, 6.34s/it]
Training 4/16 epoch (loss 1.9688): 22%|βββ | 192/880 [19:06<1:12:42, 6.34s/it]
Training 4/16 epoch (loss 1.9688): 22%|βββ | 193/880 [19:06<1:09:00, 6.03s/it]
Training 4/16 epoch (loss 2.1250): 22%|βββ | 193/880 [19:11<1:09:00, 6.03s/it]
Training 4/16 epoch (loss 2.1250): 22%|βββ | 194/880 [19:11<1:04:14, 5.62s/it]
Training 4/16 epoch (loss 2.1094): 22%|βββ | 194/880 [19:16<1:04:14, 5.62s/it]
Training 4/16 epoch (loss 2.1094): 22%|βββ | 195/880 [19:16<1:03:49, 5.59s/it]
Training 4/16 epoch (loss 2.1719): 22%|βββ | 195/880 [19:21<1:03:49, 5.59s/it]
Training 4/16 epoch (loss 2.1719): 22%|βββ | 196/880 [19:21<59:43, 5.24s/it]
Training 4/16 epoch (loss 2.0156): 22%|βββ | 196/880 [19:25<59:43, 5.24s/it]
Training 4/16 epoch (loss 2.0156): 22%|βββ | 197/880 [19:25<57:20, 5.04s/it]
Training 4/16 epoch (loss 2.2188): 22%|βββ | 197/880 [19:31<57:20, 5.04s/it]
Training 4/16 epoch (loss 2.2188): 22%|βββ | 198/880 [19:31<58:41, 5.16s/it]
Training 4/16 epoch (loss 2.0938): 22%|βββ | 198/880 [19:36<58:41, 5.16s/it]
Training 4/16 epoch (loss 2.0938): 23%|βββ | 199/880 [19:36<59:21, 5.23s/it]
Training 4/16 epoch (loss 1.9844): 23%|βββ | 199/880 [19:42<59:21, 5.23s/it]
Training 4/16 epoch (loss 1.9844): 23%|βββ | 200/880 [19:42<59:49, 5.28s/it]
Training 4/16 epoch (loss 2.2031): 23%|βββ | 200/880 [19:48<59:49, 5.28s/it]
Training 4/16 epoch (loss 2.2031): 23%|βββ | 201/880 [19:48<1:03:38, 5.62s/it]
Training 4/16 epoch (loss 2.2188): 23%|βββ | 201/880 [20:02<1:03:38, 5.62s/it]
Training 4/16 epoch (loss 2.2188): 23%|βββ | 202/880 [20:02<1:33:13, 8.25s/it]
Training 4/16 epoch (loss 2.0312): 23%|βββ | 202/880 [20:08<1:33:13, 8.25s/it]
Training 4/16 epoch (loss 2.0312): 23%|βββ | 203/880 [20:08<1:22:32, 7.32s/it]
Training 4/16 epoch (loss 2.2656): 23%|βββ | 203/880 [20:12<1:22:32, 7.32s/it]
Training 4/16 epoch (loss 2.2656): 23%|βββ | 204/880 [20:12<1:13:09, 6.49s/it]
Training 4/16 epoch (loss 2.0312): 23%|βββ | 204/880 [20:17<1:13:09, 6.49s/it]
Training 4/16 epoch (loss 2.0312): 23%|βββ | 205/880 [20:17<1:08:32, 6.09s/it]
Training 4/16 epoch (loss 2.0938): 23%|βββ | 205/880 [20:22<1:08:32, 6.09s/it]
Training 4/16 epoch (loss 2.0938): 23%|βββ | 206/880 [20:22<1:04:30, 5.74s/it]
Training 4/16 epoch (loss 2.1719): 23%|βββ | 206/880 [20:28<1:04:30, 5.74s/it]
Training 4/16 epoch (loss 2.1719): 24%|βββ | 207/880 [20:28<1:03:19, 5.65s/it]
Training 4/16 epoch (loss 2.0625): 24%|βββ | 207/880 [20:34<1:03:19, 5.65s/it]
Training 4/16 epoch (loss 2.0625): 24%|βββ | 208/880 [20:34<1:04:58, 5.80s/it]
Training 4/16 epoch (loss 2.3438): 24%|βββ | 208/880 [20:42<1:04:58, 5.80s/it]
Training 4/16 epoch (loss 2.3438): 24%|βββ | 209/880 [20:42<1:11:49, 6.42s/it]
Training 4/16 epoch (loss 2.2969): 24%|βββ | 209/880 [20:57<1:11:49, 6.42s/it]
Training 4/16 epoch (loss 2.2969): 24%|βββ | 210/880 [20:57<1:42:53, 9.21s/it]
Training 4/16 epoch (loss 2.0156): 24%|βββ | 210/880 [21:03<1:42:53, 9.21s/it]
Training 4/16 epoch (loss 2.0156): 24%|βββ | 211/880 [21:03<1:31:38, 8.22s/it]
Training 4/16 epoch (loss 1.9141): 24%|βββ | 211/880 [21:08<1:31:38, 8.22s/it]
Training 4/16 epoch (loss 1.9141): 24%|βββ | 212/880 [21:08<1:20:10, 7.20s/it]
Training 4/16 epoch (loss 2.0312): 24%|βββ | 212/880 [21:14<1:20:10, 7.20s/it]
Training 4/16 epoch (loss 2.0312): 24%|βββ | 213/880 [21:14<1:14:05, 6.67s/it]
Training 4/16 epoch (loss 1.9453): 24%|βββ | 213/880 [21:21<1:14:05, 6.67s/it]
Training 4/16 epoch (loss 1.9453): 24%|βββ | 214/880 [21:21<1:15:05, 6.76s/it]
Training 4/16 epoch (loss 2.1562): 24%|βββ | 214/880 [21:26<1:15:05, 6.76s/it]
Training 4/16 epoch (loss 2.1562): 24%|βββ | 215/880 [21:26<1:11:23, 6.44s/it]
Training 4/16 epoch (loss 2.2188): 24%|βββ | 215/880 [21:33<1:11:23, 6.44s/it]
Training 4/16 epoch (loss 2.2188): 25%|βββ | 216/880 [21:33<1:11:46, 6.49s/it]
Training 4/16 epoch (loss 1.9062): 25%|βββ | 216/880 [21:38<1:11:46, 6.49s/it]
Training 4/16 epoch (loss 1.9062): 25%|βββ | 217/880 [21:38<1:08:00, 6.16s/it]
Training 4/16 epoch (loss 2.2188): 25%|βββ | 217/880 [21:43<1:08:00, 6.16s/it]
Training 4/16 epoch (loss 2.2188): 25%|βββ | 218/880 [21:43<1:03:15, 5.73s/it]
Training 4/16 epoch (loss 2.0781): 25%|βββ | 218/880 [21:50<1:03:15, 5.73s/it]
Training 4/16 epoch (loss 2.0781): 25%|βββ | 219/880 [21:50<1:07:03, 6.09s/it]
Training 4/16 epoch (loss 1.9062): 25%|βββ | 219/880 [21:55<1:07:03, 6.09s/it]
Training 4/16 epoch (loss 1.9062): 25%|βββ | 220/880 [21:55<1:02:07, 5.65s/it]
Training 5/16 epoch (loss 2.0469): 25%|βββ | 220/880 [22:00<1:02:07, 5.65s/it]
Training 5/16 epoch (loss 2.0469): 25%|βββ | 221/880 [22:00<1:00:37, 5.52s/it]
Training 5/16 epoch (loss 2.1094): 25%|βββ | 221/880 [22:06<1:00:37, 5.52s/it]
Training 5/16 epoch (loss 2.1094): 25%|βββ | 222/880 [22:06<1:02:02, 5.66s/it]
Training 5/16 epoch (loss 1.9141): 25%|βββ | 222/880 [22:12<1:02:02, 5.66s/it]
Training 5/16 epoch (loss 1.9141): 25%|βββ | 223/880 [22:12<1:03:04, 5.76s/it]
Training 5/16 epoch (loss 2.1719): 25%|βββ | 223/880 [22:16<1:03:04, 5.76s/it]
Training 5/16 epoch (loss 2.1719): 25%|βββ | 224/880 [22:16<59:29, 5.44s/it]
Training 5/16 epoch (loss 2.0781): 25%|βββ | 224/880 [22:22<59:29, 5.44s/it]
Training 5/16 epoch (loss 2.0781): 26%|βββ | 225/880 [22:22<58:47, 5.39s/it]
Training 5/16 epoch (loss 1.9766): 26%|βββ | 225/880 [22:28<58:47, 5.39s/it]
Training 5/16 epoch (loss 1.9766): 26%|βββ | 226/880 [22:28<1:00:08, 5.52s/it]
Training 5/16 epoch (loss 2.0156): 26%|βββ | 226/880 [22:32<1:00:08, 5.52s/it]
Training 5/16 epoch (loss 2.0156): 26%|βββ | 227/880 [22:32<57:10, 5.25s/it]
Training 5/16 epoch (loss 2.0156): 26%|βββ | 227/880 [22:38<57:10, 5.25s/it]
Training 5/16 epoch (loss 2.0156): 26%|βββ | 228/880 [22:38<58:08, 5.35s/it]
Training 5/16 epoch (loss 2.1250): 26%|βββ | 228/880 [22:46<58:08, 5.35s/it]
Training 5/16 epoch (loss 2.1250): 26%|βββ | 229/880 [22:46<1:08:20, 6.30s/it]
Training 5/16 epoch (loss 2.0625): 26%|βββ | 229/880 [22:51<1:08:20, 6.30s/it]
Training 5/16 epoch (loss 2.0625): 26%|βββ | 230/880 [22:51<1:04:13, 5.93s/it]
Training 5/16 epoch (loss 2.0938): 26%|βββ | 230/880 [22:56<1:04:13, 5.93s/it]
Training 5/16 epoch (loss 2.0938): 26%|βββ | 231/880 [22:56<1:00:29, 5.59s/it]
Training 5/16 epoch (loss 2.0781): 26%|βββ | 231/880 [23:04<1:00:29, 5.59s/it]
Training 5/16 epoch (loss 2.0781): 26%|βββ | 232/880 [23:04<1:06:25, 6.15s/it]
Training 5/16 epoch (loss 1.7812): 26%|βββ | 232/880 [23:08<1:06:25, 6.15s/it]
Training 5/16 epoch (loss 1.7812): 26%|βββ | 233/880 [23:08<1:01:53, 5.74s/it]
Training 5/16 epoch (loss 2.2500): 26%|βββ | 233/880 [23:13<1:01:53, 5.74s/it]
Training 5/16 epoch (loss 2.2500): 27%|βββ | 234/880 [23:13<58:41, 5.45s/it]
Training 5/16 epoch (loss 2.0000): 27%|βββ | 234/880 [23:19<58:41, 5.45s/it]
Training 5/16 epoch (loss 2.0000): 27%|βββ | 235/880 [23:19<58:49, 5.47s/it]
Training 5/16 epoch (loss 2.0781): 27%|βββ | 235/880 [23:24<58:49, 5.47s/it]
Training 5/16 epoch (loss 2.0781): 27%|βββ | 236/880 [23:24<57:48, 5.39s/it]
Training 5/16 epoch (loss 1.9609): 27%|βββ | 236/880 [23:30<57:48, 5.39s/it]
Training 5/16 epoch (loss 1.9609): 27%|βββ | 237/880 [23:30<1:01:27, 5.73s/it]
Training 5/16 epoch (loss 2.0156): 27%|βββ | 237/880 [23:35<1:01:27, 5.73s/it]
Training 5/16 epoch (loss 2.0156): 27%|βββ | 238/880 [23:35<59:11, 5.53s/it]
Training 5/16 epoch (loss 2.0938): 27%|βββ | 238/880 [23:40<59:11, 5.53s/it]
Training 5/16 epoch (loss 2.0938): 27%|βββ | 239/880 [23:40<57:14, 5.36s/it]
Training 5/16 epoch (loss 1.7344): 27%|βββ | 239/880 [23:45<57:14, 5.36s/it]
Training 5/16 epoch (loss 1.7344): 27%|βββ | 240/880 [23:45<53:17, 5.00s/it]
Training 5/16 epoch (loss 1.9375): 27%|βββ | 240/880 [23:49<53:17, 5.00s/it]
Training 5/16 epoch (loss 1.9375): 27%|βββ | 241/880 [23:49<52:56, 4.97s/it]
Training 5/16 epoch (loss 2.2656): 27%|βββ | 241/880 [23:55<52:56, 4.97s/it]
Training 5/16 epoch (loss 2.2656): 28%|βββ | 242/880 [23:55<54:11, 5.10s/it]
Training 5/16 epoch (loss 2.0625): 28%|βββ | 242/880 [23:59<54:11, 5.10s/it]
Training 5/16 epoch (loss 2.0625): 28%|βββ | 243/880 [23:59<50:59, 4.80s/it]
Training 5/16 epoch (loss 1.9062): 28%|βββ | 243/880 [24:04<50:59, 4.80s/it]
Training 5/16 epoch (loss 1.9062): 28%|βββ | 244/880 [24:04<51:50, 4.89s/it]
Training 5/16 epoch (loss 2.0156): 28%|βββ | 244/880 [24:17<51:50, 4.89s/it]
Training 5/16 epoch (loss 2.0156): 28%|βββ | 245/880 [24:17<1:15:50, 7.17s/it]
Training 5/16 epoch (loss 1.8750): 28%|βββ | 245/880 [24:22<1:15:50, 7.17s/it]
Training 5/16 epoch (loss 1.8750): 28%|βββ | 246/880 [24:22<1:11:27, 6.76s/it]
Training 5/16 epoch (loss 2.0312): 28%|βββ | 246/880 [24:28<1:11:27, 6.76s/it]
Training 5/16 epoch (loss 2.0312): 28%|βββ | 247/880 [24:28<1:06:55, 6.34s/it]
Training 5/16 epoch (loss 1.7656): 28%|βββ | 247/880 [24:33<1:06:55, 6.34s/it]
Training 5/16 epoch (loss 1.7656): 28%|βββ | 248/880 [24:33<1:03:33, 6.03s/it]
Training 5/16 epoch (loss 1.9062): 28%|βββ | 248/880 [24:38<1:03:33, 6.03s/it]
Training 5/16 epoch (loss 1.9062): 28%|βββ | 249/880 [24:38<59:10, 5.63s/it]
Training 5/16 epoch (loss 1.8984): 28%|βββ | 249/880 [24:43<59:10, 5.63s/it]
Training 5/16 epoch (loss 1.8984): 28%|βββ | 250/880 [24:43<58:46, 5.60s/it]
Training 5/16 epoch (loss 1.9766): 28%|βββ | 250/880 [24:48<58:46, 5.60s/it]
Training 5/16 epoch (loss 1.9766): 29%|βββ | 251/880 [24:48<54:58, 5.24s/it]
Training 5/16 epoch (loss 1.7969): 29%|βββ | 251/880 [24:52<54:58, 5.24s/it]
Training 5/16 epoch (loss 1.7969): 29%|βββ | 252/880 [24:52<52:44, 5.04s/it]
Training 5/16 epoch (loss 2.0156): 29%|βββ | 252/880 [24:58<52:44, 5.04s/it]
Training 5/16 epoch (loss 2.0156): 29%|βββ | 253/880 [24:58<53:58, 5.17s/it]
Training 5/16 epoch (loss 1.8594): 29%|βββ | 253/880 [25:03<53:58, 5.17s/it]
Training 5/16 epoch (loss 1.8594): 29%|βββ | 254/880 [25:03<54:34, 5.23s/it]
Training 5/16 epoch (loss 1.7969): 29%|βββ | 254/880 [25:08<54:34, 5.23s/it]
Training 5/16 epoch (loss 1.7969): 29%|βββ | 255/880 [25:08<54:59, 5.28s/it]
Training 5/16 epoch (loss 2.0000): 29%|βββ | 255/880 [25:15<54:59, 5.28s/it]
Training 5/16 epoch (loss 2.0000): 29%|βββ | 256/880 [25:15<58:27, 5.62s/it]
Training 5/16 epoch (loss 2.0625): 29%|βββ | 256/880 [25:29<58:27, 5.62s/it]
Training 5/16 epoch (loss 2.0625): 29%|βββ | 257/880 [25:29<1:25:40, 8.25s/it]
Training 5/16 epoch (loss 1.8594): 29%|βββ | 257/880 [25:34<1:25:40, 8.25s/it]
Training 5/16 epoch (loss 1.8594): 29%|βββ | 258/880 [25:34<1:15:52, 7.32s/it]
Training 5/16 epoch (loss 2.0781): 29%|βββ | 258/880 [25:39<1:15:52, 7.32s/it]
Training 5/16 epoch (loss 2.0781): 29%|βββ | 259/880 [25:39<1:07:14, 6.50s/it]
Training 5/16 epoch (loss 1.8359): 29%|βββ | 259/880 [25:44<1:07:14, 6.50s/it]
Training 5/16 epoch (loss 1.8359): 30%|βββ | 260/880 [25:44<1:03:01, 6.10s/it]
Training 5/16 epoch (loss 1.9141): 30%|βββ | 260/880 [25:49<1:03:01, 6.10s/it]
Training 5/16 epoch (loss 1.9141): 30%|βββ | 261/880 [25:49<59:19, 5.75s/it]
Training 5/16 epoch (loss 1.9688): 30%|βββ | 261/880 [25:55<59:19, 5.75s/it]
Training 5/16 epoch (loss 1.9688): 30%|βββ | 262/880 [25:55<58:12, 5.65s/it]
Training 5/16 epoch (loss 1.8516): 30%|βββ | 262/880 [26:01<58:12, 5.65s/it]
Training 5/16 epoch (loss 1.8516): 30%|βββ | 263/880 [26:01<59:41, 5.81s/it]
Training 5/16 epoch (loss 2.1562): 30%|βββ | 263/880 [26:09<59:41, 5.81s/it]
Training 5/16 epoch (loss 2.1562): 30%|βββ | 264/880 [26:09<1:05:57, 6.43s/it]
Training 5/16 epoch (loss 2.0938): 30%|βββ | 264/880 [26:24<1:05:57, 6.43s/it]
Training 5/16 epoch (loss 2.0938): 30%|βββ | 265/880 [26:24<1:34:28, 9.22s/it]
Training 5/16 epoch (loss 1.7812): 30%|βββ | 265/880 [26:30<1:34:28, 9.22s/it]
Training 5/16 epoch (loss 1.7812): 30%|βββ | 266/880 [26:30<1:24:06, 8.22s/it]
Training 5/16 epoch (loss 1.7109): 30%|βββ | 266/880 [26:35<1:24:06, 8.22s/it]
Training 5/16 epoch (loss 1.7109): 30%|βββ | 267/880 [26:35<1:13:33, 7.20s/it]
Training 5/16 epoch (loss 1.8516): 30%|βββ | 267/880 [26:40<1:13:33, 7.20s/it]
Training 5/16 epoch (loss 1.8516): 30%|βββ | 268/880 [26:40<1:07:59, 6.67s/it]
Training 5/16 epoch (loss 1.8047): 30%|βββ | 268/880 [26:47<1:07:59, 6.67s/it]
Training 5/16 epoch (loss 1.8047): 31%|βββ | 269/880 [26:47<1:08:52, 6.76s/it]
Training 5/16 epoch (loss 1.9453): 31%|βββ | 269/880 [26:53<1:08:52, 6.76s/it]
Training 5/16 epoch (loss 1.9453): 31%|βββ | 270/880 [26:53<1:05:29, 6.44s/it]
Training 5/16 epoch (loss 2.0000): 31%|βββ | 270/880 [27:00<1:05:29, 6.44s/it]
Training 5/16 epoch (loss 2.0000): 31%|βββ | 271/880 [27:00<1:05:50, 6.49s/it]
Training 5/16 epoch (loss 1.7031): 31%|βββ | 271/880 [27:05<1:05:50, 6.49s/it]
Training 5/16 epoch (loss 1.7031): 31%|βββ | 272/880 [27:05<1:02:25, 6.16s/it]
Training 5/16 epoch (loss 2.0469): 31%|βββ | 272/880 [27:10<1:02:25, 6.16s/it]
Training 5/16 epoch (loss 2.0469): 31%|βββ | 273/880 [27:10<58:03, 5.74s/it]
Training 5/16 epoch (loss 1.9219): 31%|βββ | 273/880 [27:17<58:03, 5.74s/it]
Training 5/16 epoch (loss 1.9219): 31%|βββ | 274/880 [27:17<1:01:30, 6.09s/it]
Training 5/16 epoch (loss 1.7266): 31%|βββ | 274/880 [27:21<1:01:30, 6.09s/it]
Training 5/16 epoch (loss 1.7266): 31%|ββββ | 275/880 [27:21<56:58, 5.65s/it]
Training 6/16 epoch (loss 1.8828): 31%|ββββ | 275/880 [27:27<56:58, 5.65s/it]
Training 6/16 epoch (loss 1.8828): 31%|ββββ | 276/880 [27:27<55:35, 5.52s/it]
Training 6/16 epoch (loss 1.9531): 31%|ββββ | 276/880 [27:33<55:35, 5.52s/it]
Training 6/16 epoch (loss 1.9531): 31%|ββββ | 277/880 [27:33<56:51, 5.66s/it]
Training 6/16 epoch (loss 1.7422): 31%|ββββ | 277/880 [27:39<56:51, 5.66s/it]
Training 6/16 epoch (loss 1.7422): 32%|ββββ | 278/880 [27:39<57:48, 5.76s/it]
Training 6/16 epoch (loss 2.0312): 32%|ββββ | 278/880 [27:43<57:48, 5.76s/it]
Training 6/16 epoch (loss 2.0312): 32%|ββββ | 279/880 [27:43<54:28, 5.44s/it]
Training 6/16 epoch (loss 1.9375): 32%|ββββ | 279/880 [27:49<54:28, 5.44s/it]
Training 6/16 epoch (loss 1.9375): 32%|ββββ | 280/880 [27:49<53:48, 5.38s/it]
Training 6/16 epoch (loss 1.8047): 32%|ββββ | 280/880 [27:54<53:48, 5.38s/it]
Training 6/16 epoch (loss 1.8047): 32%|ββββ | 281/880 [27:54<55:01, 5.51s/it]
Training 6/16 epoch (loss 1.8594): 32%|ββββ | 281/880 [27:59<55:01, 5.51s/it]
Training 6/16 epoch (loss 1.8594): 32%|ββββ | 282/880 [27:59<52:18, 5.25s/it]
Training 6/16 epoch (loss 1.8594): 32%|ββββ | 282/880 [28:05<52:18, 5.25s/it]
Training 6/16 epoch (loss 1.8594): 32%|ββββ | 283/880 [28:05<53:12, 5.35s/it]
Training 6/16 epoch (loss 2.0000): 32%|ββββ | 283/880 [28:13<53:12, 5.35s/it]
Training 6/16 epoch (loss 2.0000): 32%|ββββ | 284/880 [28:13<1:02:34, 6.30s/it]
Training 6/16 epoch (loss 1.9375): 32%|ββββ | 284/880 [28:18<1:02:34, 6.30s/it]
Training 6/16 epoch (loss 1.9375): 32%|ββββ | 285/880 [28:18<58:48, 5.93s/it]
Training 6/16 epoch (loss 1.9609): 32%|ββββ | 285/880 [28:23<58:48, 5.93s/it]
Training 6/16 epoch (loss 1.9609): 32%|ββββ | 286/880 [28:23<55:22, 5.59s/it]
Training 6/16 epoch (loss 1.9219): 32%|ββββ | 286/880 [28:30<55:22, 5.59s/it]
Training 6/16 epoch (loss 1.9219): 33%|ββββ | 287/880 [28:30<1:00:46, 6.15s/it]
Training 6/16 epoch (loss 1.6797): 33%|ββββ | 287/880 [28:35<1:00:46, 6.15s/it]
Training 6/16 epoch (loss 1.6797): 33%|ββββ | 288/880 [28:35<56:36, 5.74s/it]
Training 6/16 epoch (loss 2.0938): 33%|ββββ | 288/880 [28:40<56:36, 5.74s/it]
Training 6/16 epoch (loss 2.0938): 33%|ββββ | 289/880 [28:40<53:40, 5.45s/it]
Training 6/16 epoch (loss 1.8203): 33%|ββββ | 289/880 [28:45<53:40, 5.45s/it]
Training 6/16 epoch (loss 1.8203): 33%|ββββ | 290/880 [28:45<53:47, 5.47s/it]
Training 6/16 epoch (loss 1.9375): 33%|ββββ | 290/880 [28:51<53:47, 5.47s/it]
Training 6/16 epoch (loss 1.9375): 33%|ββββ | 291/880 [28:51<52:49, 5.38s/it]
Training 6/16 epoch (loss 1.8203): 33%|ββββ | 291/880 [28:57<52:49, 5.38s/it]
Training 6/16 epoch (loss 1.8203): 33%|ββββ | 292/880 [28:57<56:08, 5.73s/it]
Training 6/16 epoch (loss 1.8516): 33%|ββββ | 292/880 [29:02<56:08, 5.73s/it]
Training 6/16 epoch (loss 1.8516): 33%|ββββ | 293/880 [29:02<54:03, 5.53s/it]
Training 6/16 epoch (loss 1.9375): 33%|ββββ | 293/880 [29:07<54:03, 5.53s/it]
Training 6/16 epoch (loss 1.9375): 33%|ββββ | 294/880 [29:07<52:16, 5.35s/it]
Training 6/16 epoch (loss 1.5938): 33%|ββββ | 294/880 [29:11<52:16, 5.35s/it]
Training 6/16 epoch (loss 1.5938): 34%|ββββ | 295/880 [29:11<48:41, 4.99s/it]
Training 6/16 epoch (loss 1.8125): 34%|ββββ | 295/880 [29:16<48:41, 4.99s/it]
Training 6/16 epoch (loss 1.8125): 34%|ββββ | 296/880 [29:16<48:21, 4.97s/it]
Training 6/16 epoch (loss 2.0781): 34%|ββββ | 296/880 [29:22<48:21, 4.97s/it]
Training 6/16 epoch (loss 2.0781): 34%|ββββ | 297/880 [29:22<49:31, 5.10s/it]
Training 6/16 epoch (loss 1.8750): 34%|ββββ | 297/880 [29:26<49:31, 5.10s/it]
Training 6/16 epoch (loss 1.8750): 34%|ββββ | 298/880 [29:26<46:34, 4.80s/it]
Training 6/16 epoch (loss 1.7266): 34%|ββββ | 298/880 [29:31<46:34, 4.80s/it]
Training 6/16 epoch (loss 1.7266): 34%|ββββ | 299/880 [29:31<47:21, 4.89s/it]
Training 6/16 epoch (loss 1.8594): 34%|ββββ | 299/880 [29:43<47:21, 4.89s/it]
Training 6/16 epoch (loss 1.8594): 34%|ββββ | 300/880 [29:43<1:09:15, 7.17s/it]
Training 6/16 epoch (loss 1.6953): 34%|ββββ | 300/880 [29:49<1:09:15, 7.17s/it]
Training 6/16 epoch (loss 1.6953): 34%|ββββ | 301/880 [29:49<1:05:13, 6.76s/it]
Training 6/16 epoch (loss 1.8516): 34%|ββββ | 301/880 [29:54<1:05:13, 6.76s/it]
Training 6/16 epoch (loss 1.8516): 34%|ββββ | 302/880 [29:54<1:01:03, 6.34s/it]
Training 6/16 epoch (loss 1.5781): 34%|ββββ | 302/880 [30:00<1:01:03, 6.34s/it]
Training 6/16 epoch (loss 1.5781): 34%|ββββ | 303/880 [30:00<57:56, 6.03s/it]
Training 6/16 epoch (loss 1.7578): 34%|ββββ | 303/880 [30:04<57:56, 6.03s/it]
Training 6/16 epoch (loss 1.7578): 35%|ββββ | 304/880 [30:04<53:57, 5.62s/it]
Training 6/16 epoch (loss 1.7422): 35%|ββββ | 304/880 [30:10<53:57, 5.62s/it]
Training 6/16 epoch (loss 1.7422): 35%|ββββ | 305/880 [30:10<53:34, 5.59s/it]
Training 6/16 epoch (loss 1.8047): 35%|ββββ | 305/880 [30:14<53:34, 5.59s/it]
Training 6/16 epoch (loss 1.8047): 35%|ββββ | 306/880 [30:14<50:07, 5.24s/it]
Training 6/16 epoch (loss 1.6328): 35%|ββββ | 306/880 [30:19<50:07, 5.24s/it]
Training 6/16 epoch (loss 1.6328): 35%|ββββ | 307/880 [30:19<48:06, 5.04s/it]
Training 6/16 epoch (loss 1.8438): 35%|ββββ | 307/880 [30:24<48:06, 5.04s/it]
Training 6/16 epoch (loss 1.8438): 35%|ββββ | 308/880 [30:24<49:15, 5.17s/it]
Training 6/16 epoch (loss 1.6875): 35%|ββββ | 308/880 [30:30<49:15, 5.17s/it]
Training 6/16 epoch (loss 1.6875): 35%|ββββ | 309/880 [30:30<49:47, 5.23s/it]
Training 6/16 epoch (loss 1.6328): 35%|ββββ | 309/880 [30:35<49:47, 5.23s/it]
Training 6/16 epoch (loss 1.6328): 35%|ββββ | 310/880 [30:35<50:09, 5.28s/it]
Training 6/16 epoch (loss 1.8359): 35%|ββββ | 310/880 [30:42<50:09, 5.28s/it]
Training 6/16 epoch (loss 1.8359): 35%|ββββ | 311/880 [30:42<53:19, 5.62s/it]
Training 6/16 epoch (loss 1.8906): 35%|ββββ | 311/880 [30:56<53:19, 5.62s/it]
Training 6/16 epoch (loss 1.8906): 35%|ββββ | 312/880 [30:56<1:18:08, 8.25s/it]
Training 6/16 epoch (loss 1.6953): 35%|ββββ | 312/880 [31:01<1:18:08, 8.25s/it]
Training 6/16 epoch (loss 1.6953): 36%|ββββ | 313/880 [31:01<1:09:09, 7.32s/it]
Training 6/16 epoch (loss 1.9141): 36%|ββββ | 313/880 [31:06<1:09:09, 7.32s/it]
Training 6/16 epoch (loss 1.9141): 36%|ββββ | 314/880 [31:06<1:01:15, 6.49s/it]
Training 6/16 epoch (loss 1.7188): 36%|ββββ | 314/880 [31:11<1:01:15, 6.49s/it]
Training 6/16 epoch (loss 1.7188): 36%|ββββ | 315/880 [31:11<57:23, 6.10s/it]
Training 6/16 epoch (loss 1.7422): 36%|ββββ | 315/880 [31:16<57:23, 6.10s/it]
Training 6/16 epoch (loss 1.7422): 36%|ββββ | 316/880 [31:16<54:00, 5.75s/it]
Training 6/16 epoch (loss 1.8047): 36%|ββββ | 316/880 [31:21<54:00, 5.75s/it]
Training 6/16 epoch (loss 1.8047): 36%|ββββ | 317/880 [31:21<53:00, 5.65s/it]
Training 6/16 epoch (loss 1.6875): 36%|ββββ | 317/880 [31:27<53:00, 5.65s/it]
Training 6/16 epoch (loss 1.6875): 36%|ββββ | 318/880 [31:27<54:23, 5.81s/it]
Training 6/16 epoch (loss 2.0156): 36%|ββββ | 318/880 [31:35<54:23, 5.81s/it]
Training 6/16 epoch (loss 2.0156): 36%|ββββ | 319/880 [31:35<1:00:06, 6.43s/it]
Training 6/16 epoch (loss 1.9453): 36%|ββββ | 319/880 [31:51<1:00:06, 6.43s/it]
Training 6/16 epoch (loss 1.9453): 36%|ββββ | 320/880 [31:51<1:26:04, 9.22s/it]
Training 6/16 epoch (loss 1.6406): 36%|ββββ | 320/880 [31:57<1:26:04, 9.22s/it]
Training 6/16 epoch (loss 1.6406): 36%|ββββ | 321/880 [31:57<1:16:37, 8.23s/it]
Training 6/16 epoch (loss 1.5469): 36%|ββββ | 321/880 [32:02<1:16:37, 8.23s/it]
Training 6/16 epoch (loss 1.5469): 37%|ββββ | 322/880 [32:02<1:07:01, 7.21s/it]
Training 6/16 epoch (loss 1.6797): 37%|ββββ | 322/880 [32:07<1:07:01, 7.21s/it]
Training 6/16 epoch (loss 1.6797): 37%|ββββ | 323/880 [32:07<1:01:55, 6.67s/it]
Training 6/16 epoch (loss 1.6406): 37%|ββββ | 323/880 [32:14<1:01:55, 6.67s/it]
Training 6/16 epoch (loss 1.6406): 37%|ββββ | 324/880 [32:14<1:02:42, 6.77s/it]
Training 6/16 epoch (loss 1.7812): 37%|ββββ | 324/880 [32:20<1:02:42, 6.77s/it]
Training 6/16 epoch (loss 1.7812): 37%|ββββ | 325/880 [32:20<59:34, 6.44s/it]
Training 6/16 epoch (loss 1.8438): 37%|ββββ | 325/880 [32:26<59:34, 6.44s/it]
Training 6/16 epoch (loss 1.8438): 37%|ββββ | 326/880 [32:26<59:52, 6.48s/it]
Training 6/16 epoch (loss 1.5391): 37%|ββββ | 326/880 [32:32<59:52, 6.48s/it]
Training 6/16 epoch (loss 1.5391): 37%|ββββ | 327/880 [32:32<56:43, 6.15s/it]
Training 6/16 epoch (loss 1.8906): 37%|ββββ | 327/880 [32:37<56:43, 6.15s/it]
Training 6/16 epoch (loss 1.8906): 37%|ββββ | 328/880 [32:37<52:45, 5.73s/it]
Training 6/16 epoch (loss 1.8047): 37%|ββββ | 328/880 [32:44<52:45, 5.73s/it]
Training 6/16 epoch (loss 1.8047): 37%|ββββ | 329/880 [32:44<55:55, 6.09s/it]
Training 6/16 epoch (loss 1.5781): 37%|ββββ | 329/880 [32:48<55:55, 6.09s/it]
Training 6/16 epoch (loss 1.5781): 38%|ββββ | 330/880 [32:48<51:49, 5.65s/it]
Training 7/16 epoch (loss 1.7578): 38%|ββββ | 330/880 [32:53<51:49, 5.65s/it]
Training 7/16 epoch (loss 1.7578): 38%|ββββ | 331/880 [32:53<50:33, 5.53s/it]
Training 7/16 epoch (loss 1.8125): 38%|ββββ | 331/880 [32:59<50:33, 5.53s/it]
Training 7/16 epoch (loss 1.8125): 38%|ββββ | 332/880 [32:59<51:44, 5.66s/it]
Training 7/16 epoch (loss 1.5938): 38%|ββββ | 332/880 [33:05<51:44, 5.66s/it]
Training 7/16 epoch (loss 1.5938): 38%|ββββ | 333/880 [33:05<52:33, 5.77s/it]
Training 7/16 epoch (loss 1.9141): 38%|ββββ | 333/880 [33:10<52:33, 5.77s/it]
Training 7/16 epoch (loss 1.9141): 38%|ββββ | 334/880 [33:10<49:31, 5.44s/it]
Training 7/16 epoch (loss 1.8203): 38%|ββββ | 334/880 [33:15<49:31, 5.44s/it]
Training 7/16 epoch (loss 1.8203): 38%|ββββ | 335/880 [33:15<48:53, 5.38s/it]
Training 7/16 epoch (loss 1.6953): 38%|ββββ | 335/880 [33:21<48:53, 5.38s/it]
Training 7/16 epoch (loss 1.6953): 38%|ββββ | 336/880 [33:21<49:56, 5.51s/it]
Training 7/16 epoch (loss 1.7578): 38%|ββββ | 336/880 [33:26<49:56, 5.51s/it]
Training 7/16 epoch (loss 1.7578): 38%|ββββ | 337/880 [33:26<47:27, 5.24s/it]
Training 7/16 epoch (loss 1.7422): 38%|ββββ | 337/880 [33:31<47:27, 5.24s/it]
Training 7/16 epoch (loss 1.7422): 38%|ββββ | 338/880 [33:31<48:15, 5.34s/it]
Training 7/16 epoch (loss 1.8906): 38%|ββββ | 338/880 [33:40<48:15, 5.34s/it]
Training 7/16 epoch (loss 1.8906): 39%|ββββ | 339/880 [33:40<56:44, 6.29s/it]
Training 7/16 epoch (loss 1.8594): 39%|ββββ | 339/880 [33:45<56:44, 6.29s/it]
Training 7/16 epoch (loss 1.8594): 39%|ββββ | 340/880 [33:45<53:19, 5.93s/it]
Training 7/16 epoch (loss 1.8672): 39%|ββββ | 340/880 [33:50<53:19, 5.93s/it]
Training 7/16 epoch (loss 1.8672): 39%|ββββ | 341/880 [33:50<50:14, 5.59s/it]
Training 7/16 epoch (loss 1.7969): 39%|ββββ | 341/880 [33:57<50:14, 5.59s/it]
Training 7/16 epoch (loss 1.7969): 39%|ββββ | 342/880 [33:57<55:09, 6.15s/it]
Training 7/16 epoch (loss 1.5859): 39%|ββββ | 342/880 [34:02<55:09, 6.15s/it]
Training 7/16 epoch (loss 1.5859): 39%|ββββ | 343/880 [34:02<51:23, 5.74s/it]
Training 7/16 epoch (loss 1.9922): 39%|ββββ | 343/880 [34:07<51:23, 5.74s/it]
Training 7/16 epoch (loss 1.9922): 39%|ββββ | 344/880 [34:07<48:42, 5.45s/it]
Training 7/16 epoch (loss 1.7266): 39%|ββββ | 344/880 [34:12<48:42, 5.45s/it]
Training 7/16 epoch (loss 1.7266): 39%|ββββ | 345/880 [34:12<48:47, 5.47s/it]
Training 7/16 epoch (loss 1.8281): 39%|ββββ | 345/880 [34:17<48:47, 5.47s/it]
Training 7/16 epoch (loss 1.8281): 39%|ββββ | 346/880 [34:17<47:54, 5.38s/it]
Training 7/16 epoch (loss 1.7031): 39%|ββββ | 346/880 [34:24<47:54, 5.38s/it]
Training 7/16 epoch (loss 1.7031): 39%|ββββ | 347/880 [34:24<50:52, 5.73s/it]
Training 7/16 epoch (loss 1.7031): 39%|ββββ | 347/880 [34:29<50:52, 5.73s/it]
Training 7/16 epoch (loss 1.7031): 40%|ββββ | 348/880 [34:29<48:58, 5.52s/it]
Training 7/16 epoch (loss 1.8047): 40%|ββββ | 348/880 [34:34<48:58, 5.52s/it]
Training 7/16 epoch (loss 1.8047): 40%|ββββ | 349/880 [34:34<47:19, 5.35s/it]
Training 7/16 epoch (loss 1.4531): 40%|ββββ | 349/880 [34:38<47:19, 5.35s/it]
Training 7/16 epoch (loss 1.4531): 40%|ββββ | 350/880 [34:38<44:04, 4.99s/it]
Training 7/16 epoch (loss 1.6641): 40%|ββββ | 350/880 [34:43<44:04, 4.99s/it]
Training 7/16 epoch (loss 1.6641): 40%|ββββ | 351/880 [34:43<43:46, 4.97s/it]
Training 7/16 epoch (loss 1.9453): 40%|ββββ | 351/880 [34:48<43:46, 4.97s/it]
Training 7/16 epoch (loss 1.9453): 40%|ββββ | 352/880 [34:48<44:50, 5.09s/it]
Training 7/16 epoch (loss 1.7422): 40%|ββββ | 352/880 [34:53<44:50, 5.09s/it]
Training 7/16 epoch (loss 1.7422): 40%|ββββ | 353/880 [34:53<42:11, 4.80s/it]
Training 7/16 epoch (loss 1.6016): 40%|ββββ | 353/880 [34:58<42:11, 4.80s/it]
Training 7/16 epoch (loss 1.6016): 40%|ββββ | 354/880 [34:58<42:55, 4.90s/it]
Training 7/16 epoch (loss 1.7500): 40%|ββββ | 354/880 [35:10<42:55, 4.90s/it]
Training 7/16 epoch (loss 1.7500): 40%|ββββ | 355/880 [35:10<1:02:45, 7.17s/it]
Training 7/16 epoch (loss 1.5547): 40%|ββββ | 355/880 [35:16<1:02:45, 7.17s/it]
Training 7/16 epoch (loss 1.5547): 40%|ββββ | 356/880 [35:16<59:04, 6.76s/it]
Training 7/16 epoch (loss 1.7344): 40%|ββββ | 356/880 [35:21<59:04, 6.76s/it]
Training 7/16 epoch (loss 1.7344): 41%|ββββ | 357/880 [35:21<55:16, 6.34s/it]
Training 7/16 epoch (loss 1.4609): 41%|ββββ | 357/880 [35:27<55:16, 6.34s/it]
Training 7/16 epoch (loss 1.4609): 41%|ββββ | 358/880 [35:27<52:26, 6.03s/it]
Training 7/16 epoch (loss 1.6406): 41%|ββββ | 358/880 [35:31<52:26, 6.03s/it]
Training 7/16 epoch (loss 1.6406): 41%|ββββ | 359/880 [35:31<48:47, 5.62s/it]
Training 7/16 epoch (loss 1.6172): 41%|ββββ | 359/880 [35:37<48:47, 5.62s/it]
Training 7/16 epoch (loss 1.6172): 41%|ββββ | 360/880 [35:37<48:26, 5.59s/it]
Training 7/16 epoch (loss 1.6719): 41%|ββββ | 360/880 [35:41<48:26, 5.59s/it]
Training 7/16 epoch (loss 1.6719): 41%|ββββ | 361/880 [35:41<45:17, 5.24s/it]
Training 7/16 epoch (loss 1.4922): 41%|ββββ | 361/880 [35:46<45:17, 5.24s/it]
Training 7/16 epoch (loss 1.4922): 41%|ββββ | 362/880 [35:46<43:28, 5.04s/it]
Training 7/16 epoch (loss 1.7266): 41%|ββββ | 362/880 [35:51<43:28, 5.04s/it]
Training 7/16 epoch (loss 1.7266): 41%|βββββ | 363/880 [35:51<44:30, 5.16s/it]
Training 7/16 epoch (loss 1.5625): 41%|βββββ | 363/880 [35:57<44:30, 5.16s/it]
Training 7/16 epoch (loss 1.5625): 41%|βββββ | 364/880 [35:57<45:00, 5.23s/it]
Training 7/16 epoch (loss 1.5078): 41%|βββββ | 364/880 [36:02<45:00, 5.23s/it]
Training 7/16 epoch (loss 1.5078): 41%|βββββ | 365/880 [36:02<45:21, 5.28s/it]
Training 7/16 epoch (loss 1.6953): 41%|βββββ | 365/880 [36:08<45:21, 5.28s/it]
Training 7/16 epoch (loss 1.6953): 42%|βββββ | 366/880 [36:08<48:12, 5.63s/it]
Training 7/16 epoch (loss 1.7500): 42%|βββββ | 366/880 [36:23<48:12, 5.63s/it]
Training 7/16 epoch (loss 1.7500): 42%|βββββ | 367/880 [36:23<1:10:33, 8.25s/it]
Training 7/16 epoch (loss 1.5703): 42%|βββββ | 367/880 [36:28<1:10:33, 8.25s/it]
Training 7/16 epoch (loss 1.5703): 42%|βββββ | 368/880 [36:28<1:02:26, 7.32s/it]
Training 7/16 epoch (loss 1.7891): 42%|βββββ | 368/880 [36:32<1:02:26, 7.32s/it]
Training 7/16 epoch (loss 1.7891): 42%|βββββ | 369/880 [36:32<55:16, 6.49s/it]
Training 7/16 epoch (loss 1.5703): 42%|βββββ | 369/880 [36:38<55:16, 6.49s/it]
Training 7/16 epoch (loss 1.5703): 42%|βββββ | 370/880 [36:38<51:46, 6.09s/it]
Training 7/16 epoch (loss 1.6016): 42%|βββββ | 370/880 [36:43<51:46, 6.09s/it]
Training 7/16 epoch (loss 1.6016): 42%|βββββ | 371/880 [36:43<48:42, 5.74s/it]
Training 7/16 epoch (loss 1.6875): 42%|βββββ | 371/880 [36:48<48:42, 5.74s/it]
Training 7/16 epoch (loss 1.6875): 42%|βββββ | 372/880 [36:48<47:46, 5.64s/it]
Training 7/16 epoch (loss 1.5547): 42%|βββββ | 372/880 [36:54<47:46, 5.64s/it]
Training 7/16 epoch (loss 1.5547): 42%|βββββ | 373/880 [36:54<49:00, 5.80s/it]
Training 7/16 epoch (loss 1.9453): 42%|βββββ | 373/880 [37:02<49:00, 5.80s/it]
Training 7/16 epoch (loss 1.9453): 42%|βββββ | 374/880 [37:02<54:10, 6.42s/it]
Training 7/16 epoch (loss 1.7969): 42%|βββββ | 374/880 [37:18<54:10, 6.42s/it]
Training 7/16 epoch (loss 1.7969): 43%|βββββ | 375/880 [37:18<1:17:36, 9.22s/it]
Training 7/16 epoch (loss 1.5312): 43%|βββββ | 375/880 [37:24<1:17:36, 9.22s/it]
Training 7/16 epoch (loss 1.5312): 43%|βββββ | 376/880 [37:24<1:09:07, 8.23s/it]
Training 7/16 epoch (loss 1.4297): 43%|βββββ | 376/880 [37:29<1:09:07, 8.23s/it]
Training 7/16 epoch (loss 1.4297): 43%|βββββ | 377/880 [37:29<1:00:27, 7.21s/it]
Training 7/16 epoch (loss 1.5703): 43%|βββββ | 377/880 [37:34<1:00:27, 7.21s/it]
Training 7/16 epoch (loss 1.5703): 43%|βββββ | 378/880 [37:34<55:51, 6.68s/it]
Training 7/16 epoch (loss 1.5234): 43%|βββββ | 378/880 [37:41<55:51, 6.68s/it]
Training 7/16 epoch (loss 1.5234): 43%|βββββ | 379/880 [37:41<56:31, 6.77s/it]
Training 7/16 epoch (loss 1.6797): 43%|βββββ | 379/880 [37:47<56:31, 6.77s/it]
Training 7/16 epoch (loss 1.6797): 43%|βββββ | 380/880 [37:47<53:40, 6.44s/it]
Training 7/16 epoch (loss 1.7266): 43%|βββββ | 380/880 [37:53<53:40, 6.44s/it]
Training 7/16 epoch (loss 1.7266): 43%|βββββ | 381/880 [37:53<53:55, 6.48s/it]
Training 7/16 epoch (loss 1.4062): 43%|βββββ | 381/880 [37:59<53:55, 6.48s/it]
Training 7/16 epoch (loss 1.4062): 43%|βββββ | 382/880 [37:59<51:04, 6.15s/it]
Training 7/16 epoch (loss 1.7734): 43%|βββββ | 382/880 [38:03<51:04, 6.15s/it]
Training 7/16 epoch (loss 1.7734): 44%|βββββ | 383/880 [38:03<47:28, 5.73s/it]
Training 7/16 epoch (loss 1.7031): 44%|βββββ | 383/880 [38:10<47:28, 5.73s/it]
Training 7/16 epoch (loss 1.7031): 44%|βββββ | 384/880 [38:10<50:18, 6.09s/it]
Training 7/16 epoch (loss 1.4609): 44%|βββββ | 384/880 [38:15<50:18, 6.09s/it]
Training 7/16 epoch (loss 1.4609): 44%|βββββ | 385/880 [38:15<46:36, 5.65s/it]
Training 8/16 epoch (loss 1.6875): 44%|βββββ | 385/880 [38:20<46:36, 5.65s/it]
Training 8/16 epoch (loss 1.6875): 44%|βββββ | 386/880 [38:20<45:29, 5.53s/it]
Training 8/16 epoch (loss 1.7031): 44%|βββββ | 386/880 [38:26<45:29, 5.53s/it]
Training 8/16 epoch (loss 1.7031): 44%|βββββ | 387/880 [38:26<46:32, 5.66s/it]
Training 8/16 epoch (loss 1.5078): 44%|βββββ | 387/880 [38:32<46:32, 5.66s/it]
Training 8/16 epoch (loss 1.5078): 44%|βββββ | 388/880 [38:32<47:18, 5.77s/it]
Training 8/16 epoch (loss 1.7734): 44%|βββββ | 388/880 [38:37<47:18, 5.77s/it]
Training 8/16 epoch (loss 1.7734): 44%|βββββ | 389/880 [38:37<44:34, 5.45s/it]
Training 8/16 epoch (loss 1.6875): 44%|βββββ | 389/880 [38:42<44:34, 5.45s/it]
Training 8/16 epoch (loss 1.6875): 44%|βββββ | 390/880 [38:42<43:58, 5.39s/it]
Training 8/16 epoch (loss 1.5625): 44%|βββββ | 390/880 [38:48<43:58, 5.39s/it]
Training 8/16 epoch (loss 1.5625): 44%|βββββ | 391/880 [38:48<44:54, 5.51s/it]
Training 8/16 epoch (loss 1.6562): 44%|βββββ | 391/880 [38:52<44:54, 5.51s/it]
Training 8/16 epoch (loss 1.6562): 45%|βββββ | 392/880 [38:52<42:39, 5.25s/it]
Training 8/16 epoch (loss 1.6406): 45%|βββββ | 392/880 [38:58<42:39, 5.25s/it]
Training 8/16 epoch (loss 1.6406): 45%|βββββ | 393/880 [38:58<43:22, 5.34s/it]
Training 8/16 epoch (loss 1.8281): 45%|βββββ | 393/880 [39:07<43:22, 5.34s/it]
Training 8/16 epoch (loss 1.8281): 45%|βββββ | 394/880 [39:07<50:58, 6.29s/it]
Training 8/16 epoch (loss 1.7969): 45%|βββββ | 394/880 [39:12<50:58, 6.29s/it]
Training 8/16 epoch (loss 1.7969): 45%|βββββ | 395/880 [39:12<47:52, 5.92s/it]
Training 8/16 epoch (loss 1.7812): 45%|βββββ | 395/880 [39:16<47:52, 5.92s/it]
Training 8/16 epoch (loss 1.7812): 45%|βββββ | 396/880 [39:16<45:05, 5.59s/it]
Training 8/16 epoch (loss 1.7031): 45%|βββββ | 396/880 [39:24<45:05, 5.59s/it]
Training 8/16 epoch (loss 1.7031): 45%|βββββ | 397/880 [39:24<49:30, 6.15s/it]
Training 8/16 epoch (loss 1.5000): 45%|βββββ | 397/880 [39:29<49:30, 6.15s/it]
Training 8/16 epoch (loss 1.5000): 45%|βββββ | 398/880 [39:29<46:08, 5.74s/it]
Training 8/16 epoch (loss 1.9219): 45%|βββββ | 398/880 [39:33<46:08, 5.74s/it]
Training 8/16 epoch (loss 1.9219): 45%|βββββ | 399/880 [39:33<43:44, 5.46s/it]
Training 8/16 epoch (loss 1.6328): 45%|βββββ | 399/880 [39:39<43:44, 5.46s/it]
Training 8/16 epoch (loss 1.6328): 45%|βββββ | 400/880 [39:39<43:49, 5.48s/it]
Training 8/16 epoch (loss 1.7500): 45%|βββββ | 400/880 [39:44<43:49, 5.48s/it]
Training 8/16 epoch (loss 1.7500): 46%|βββββ | 401/880 [39:44<43:01, 5.39s/it]
Training 8/16 epoch (loss 1.5938): 46%|βββββ | 401/880 [39:51<43:01, 5.39s/it]
Training 8/16 epoch (loss 1.5938): 46%|βββββ | 402/880 [39:51<45:41, 5.74s/it]
Training 8/16 epoch (loss 1.6172): 46%|βββββ | 402/880 [39:56<45:41, 5.74s/it]
Training 8/16 epoch (loss 1.6172): 46%|βββββ | 403/880 [39:56<43:57, 5.53s/it]
Training 8/16 epoch (loss 1.6953): 46%|βββββ | 403/880 [40:01<43:57, 5.53s/it]
Training 8/16 epoch (loss 1.6953): 46%|βββββ | 404/880 [40:01<42:28, 5.35s/it]
Training 8/16 epoch (loss 1.3359): 46%|βββββ | 404/880 [40:05<42:28, 5.35s/it]
Training 8/16 epoch (loss 1.3359): 46%|βββββ | 405/880 [40:05<39:31, 4.99s/it]
Training 8/16 epoch (loss 1.5547): 46%|βββββ | 405/880 [40:10<39:31, 4.99s/it]
Training 8/16 epoch (loss 1.5547): 46%|βββββ | 406/880 [40:10<39:14, 4.97s/it]
Training 8/16 epoch (loss 1.8438): 46%|βββββ | 406/880 [40:15<39:14, 4.97s/it]
Training 8/16 epoch (loss 1.8438): 46%|βββββ | 407/880 [40:15<40:10, 5.10s/it]
Training 8/16 epoch (loss 1.6406): 46%|βββββ | 407/880 [40:19<40:10, 5.10s/it]
Training 8/16 epoch (loss 1.6406): 46%|βββββ | 408/880 [40:19<37:47, 4.80s/it]
Training 8/16 epoch (loss 1.4844): 46%|βββββ | 408/880 [40:24<37:47, 4.80s/it]
Training 8/16 epoch (loss 1.4844): 46%|βββββ | 409/880 [40:24<38:25, 4.89s/it]
Training 8/16 epoch (loss 1.6250): 46%|βββββ | 409/880 [40:37<38:25, 4.89s/it]
Training 8/16 epoch (loss 1.6250): 47%|βββββ | 410/880 [40:37<56:10, 7.17s/it]
Training 8/16 epoch (loss 1.4297): 47%|βββββ | 410/880 [40:43<56:10, 7.17s/it]
Training 8/16 epoch (loss 1.4297): 47%|βββββ | 411/880 [40:43<52:53, 6.77s/it]
Training 8/16 epoch (loss 1.6094): 47%|βββββ | 411/880 [40:48<52:53, 6.77s/it]
Training 8/16 epoch (loss 1.6094): 47%|βββββ | 412/880 [40:48<49:30, 6.35s/it]
Training 8/16 epoch (loss 1.3438): 47%|βββββ | 412/880 [40:53<49:30, 6.35s/it]
Training 8/16 epoch (loss 1.3438): 47%|βββββ | 413/880 [40:53<46:56, 6.03s/it]
Training 8/16 epoch (loss 1.5078): 47%|βββββ | 413/880 [40:58<46:56, 6.03s/it]
Training 8/16 epoch (loss 1.5078): 47%|βββββ | 414/880 [40:58<43:40, 5.62s/it]
Training 8/16 epoch (loss 1.5078): 47%|βββββ | 414/880 [41:04<43:40, 5.62s/it]
Training 8/16 epoch (loss 1.5078): 47%|βββββ | 415/880 [41:04<43:20, 5.59s/it]
Training 8/16 epoch (loss 1.5625): 47%|βββββ | 415/880 [41:08<43:20, 5.59s/it]
Training 8/16 epoch (loss 1.5625): 47%|βββββ | 416/880 [41:08<40:31, 5.24s/it]
Training 8/16 epoch (loss 1.3828): 47%|βββββ | 416/880 [41:13<40:31, 5.24s/it]
Training 8/16 epoch (loss 1.3828): 47%|βββββ | 417/880 [41:13<38:52, 5.04s/it]
Training 8/16 epoch (loss 1.6016): 47%|βββββ | 417/880 [41:18<38:52, 5.04s/it]
Training 8/16 epoch (loss 1.6016): 48%|βββββ | 418/880 [41:18<39:46, 5.17s/it]
Training 8/16 epoch (loss 1.4141): 48%|βββββ | 418/880 [41:23<39:46, 5.17s/it]
Training 8/16 epoch (loss 1.4141): 48%|βββββ | 419/880 [41:23<40:12, 5.23s/it]
Training 8/16 epoch (loss 1.3984): 48%|βββββ | 419/880 [41:29<40:12, 5.23s/it]
Training 8/16 epoch (loss 1.3984): 48%|βββββ | 420/880 [41:29<40:29, 5.28s/it]
Training 8/16 epoch (loss 1.5938): 48%|βββββ | 420/880 [41:35<40:29, 5.28s/it]
Training 8/16 epoch (loss 1.5938): 48%|βββββ | 421/880 [41:35<43:02, 5.63s/it]
Training 8/16 epoch (loss 1.6484): 48%|βββββ | 421/880 [41:50<43:02, 5.63s/it]
Training 8/16 epoch (loss 1.6484): 48%|βββββ | 422/880 [41:50<1:02:59, 8.25s/it]
Training 8/16 epoch (loss 1.4531): 48%|βββββ | 422/880 [41:55<1:02:59, 8.25s/it]
Training 8/16 epoch (loss 1.4531): 48%|βββββ | 423/880 [41:55<55:43, 7.32s/it]
Training 8/16 epoch (loss 1.6484): 48%|βββββ | 423/880 [41:59<55:43, 7.32s/it]
Training 8/16 epoch (loss 1.6484): 48%|βββββ | 424/880 [41:59<49:20, 6.49s/it]
Training 8/16 epoch (loss 1.4531): 48%|βββββ | 424/880 [42:04<49:20, 6.49s/it]
Training 8/16 epoch (loss 1.4531): 48%|βββββ | 425/880 [42:04<46:11, 6.09s/it]
Training 8/16 epoch (loss 1.4922): 48%|βββββ | 425/880 [42:09<46:11, 6.09s/it]
Training 8/16 epoch (loss 1.4922): 48%|βββββ | 426/880 [42:09<43:26, 5.74s/it]
Training 8/16 epoch (loss 1.6094): 48%|βββββ | 426/880 [42:15<43:26, 5.74s/it]
Training 8/16 epoch (loss 1.6094): 49%|βββββ | 427/880 [42:15<42:36, 5.64s/it]
Training 8/16 epoch (loss 1.4453): 49%|βββββ | 427/880 [42:21<42:36, 5.64s/it]
Training 8/16 epoch (loss 1.4453): 49%|βββββ | 428/880 [42:21<43:41, 5.80s/it]
Training 8/16 epoch (loss 1.8594): 49%|βββββ | 428/880 [42:29<43:41, 5.80s/it]
Training 8/16 epoch (loss 1.8594): 49%|βββββ | 429/880 [42:29<48:16, 6.42s/it]
Training 8/16 epoch (loss 1.6562): 49%|βββββ | 429/880 [42:45<48:16, 6.42s/it]
Training 8/16 epoch (loss 1.6562): 49%|βββββ | 430/880 [42:45<1:09:08, 9.22s/it]
Training 8/16 epoch (loss 1.4219): 49%|βββββ | 430/880 [42:50<1:09:08, 9.22s/it]
Training 8/16 epoch (loss 1.4219): 49%|βββββ | 431/880 [42:50<1:01:31, 8.22s/it]
Training 8/16 epoch (loss 1.3281): 49%|βββββ | 431/880 [42:55<1:01:31, 8.22s/it]
Training 8/16 epoch (loss 1.3281): 49%|βββββ | 432/880 [42:55<53:48, 7.21s/it]
Training 8/16 epoch (loss 1.4844): 49%|βββββ | 432/880 [43:01<53:48, 7.21s/it]
Training 8/16 epoch (loss 1.4844): 49%|βββββ | 433/880 [43:01<49:41, 6.67s/it]
Training 8/16 epoch (loss 1.4375): 49%|βββββ | 433/880 [43:08<49:41, 6.67s/it]
Training 8/16 epoch (loss 1.4375): 49%|βββββ | 434/880 [43:08<50:17, 6.77s/it]
Training 8/16 epoch (loss 1.5781): 49%|βββββ | 434/880 [43:13<50:17, 6.77s/it]
Training 8/16 epoch (loss 1.5781): 49%|βββββ | 435/880 [43:13<47:46, 6.44s/it]
Training 8/16 epoch (loss 1.6406): 49%|βββββ | 435/880 [43:20<47:46, 6.44s/it]
Training 8/16 epoch (loss 1.6406): 50%|βββββ | 436/880 [43:20<48:00, 6.49s/it]
Training 8/16 epoch (loss 1.3047): 50%|βββββ | 436/880 [43:25<48:00, 6.49s/it]
Training 8/16 epoch (loss 1.3047): 50%|βββββ | 437/880 [43:25<45:27, 6.16s/it]
Training 8/16 epoch (loss 1.6797): 50%|βββββ | 437/880 [43:30<45:27, 6.16s/it]
Training 8/16 epoch (loss 1.6797): 50%|βββββ | 438/880 [43:30<42:14, 5.73s/it]
Training 8/16 epoch (loss 1.6172): 50%|βββββ | 438/880 [43:37<42:14, 5.73s/it]
Training 8/16 epoch (loss 1.6172): 50%|βββββ | 439/880 [43:37<44:44, 6.09s/it]
Training 8/16 epoch (loss 1.3906): 50%|βββββ | 439/880 [43:42<44:44, 6.09s/it]
Training 8/16 epoch (loss 1.3906): 50%|βββββ | 440/880 [43:42<41:26, 5.65s/it]
Training 9/16 epoch (loss 1.6172): 50%|βββββ | 440/880 [43:47<41:26, 5.65s/it]
Training 9/16 epoch (loss 1.6172): 50%|βββββ | 441/880 [43:47<40:24, 5.52s/it]
Training 9/16 epoch (loss 1.6484): 50%|βββββ | 441/880 [43:53<40:24, 5.52s/it]
Training 9/16 epoch (loss 1.6484): 50%|βββββ | 442/880 [43:53<41:18, 5.66s/it]
Training 9/16 epoch (loss 1.4453): 50%|βββββ | 442/880 [43:59<41:18, 5.66s/it]
Training 9/16 epoch (loss 1.4453): 50%|βββββ | 443/880 [43:59<41:57, 5.76s/it]
Training 9/16 epoch (loss 1.7188): 50%|βββββ | 443/880 [44:04<41:57, 5.76s/it]
Training 9/16 epoch (loss 1.7188): 50%|βββββ | 444/880 [44:04<39:31, 5.44s/it]
Training 9/16 epoch (loss 1.6250): 50%|βββββ | 444/880 [44:09<39:31, 5.44s/it]
Training 9/16 epoch (loss 1.6250): 51%|βββββ | 445/880 [44:09<39:00, 5.38s/it]
Training 9/16 epoch (loss 1.4609): 51%|βββββ | 445/880 [44:15<39:00, 5.38s/it]
Training 9/16 epoch (loss 1.4609): 51%|βββββ | 446/880 [44:15<39:52, 5.51s/it]
Training 9/16 epoch (loss 1.5547): 51%|βββββ | 446/880 [44:19<39:52, 5.51s/it]
Training 9/16 epoch (loss 1.5547): 51%|βββββ | 447/880 [44:19<37:52, 5.25s/it]
Training 9/16 epoch (loss 1.5625): 51%|βββββ | 447/880 [44:25<37:52, 5.25s/it]
Training 9/16 epoch (loss 1.5625): 51%|βββββ | 448/880 [44:25<38:29, 5.35s/it]
Training 9/16 epoch (loss 1.7266): 51%|βββββ | 448/880 [44:33<38:29, 5.35s/it]
Training 9/16 epoch (loss 1.7266): 51%|βββββ | 449/880 [44:33<45:12, 6.29s/it]
Training 9/16 epoch (loss 1.7031): 51%|βββββ | 449/880 [44:38<45:12, 6.29s/it]
Training 9/16 epoch (loss 1.7031): 51%|βββββ | 450/880 [44:38<42:27, 5.92s/it]
Training 9/16 epoch (loss 1.6875): 51%|βββββ | 450/880 [44:43<42:27, 5.92s/it]
Training 9/16 epoch (loss 1.6875): 51%|ββββββ | 451/880 [44:43<39:58, 5.59s/it]
Training 9/16 epoch (loss 1.5938): 51%|ββββββ | 451/880 [44:51<39:58, 5.59s/it]
Training 9/16 epoch (loss 1.5938): 51%|ββββββ | 452/880 [44:51<43:51, 6.15s/it]
Training 9/16 epoch (loss 1.3828): 51%|ββββββ | 452/880 [44:55<43:51, 6.15s/it]
Training 9/16 epoch (loss 1.3828): 51%|ββββββ | 453/880 [44:55<40:50, 5.74s/it]
Training 9/16 epoch (loss 1.8281): 51%|ββββββ | 453/880 [45:00<40:50, 5.74s/it]
Training 9/16 epoch (loss 1.8281): 52%|ββββββ | 454/880 [45:00<38:40, 5.45s/it]
Training 9/16 epoch (loss 1.5469): 52%|ββββββ | 454/880 [45:06<38:40, 5.45s/it]
Training 9/16 epoch (loss 1.5469): 52%|ββββββ | 455/880 [45:06<38:44, 5.47s/it]
Training 9/16 epoch (loss 1.6562): 52%|ββββββ | 455/880 [45:11<38:44, 5.47s/it]
Training 9/16 epoch (loss 1.6562): 52%|ββββββ | 456/880 [45:11<38:01, 5.38s/it]
Training 9/16 epoch (loss 1.5156): 52%|ββββββ | 456/880 [45:17<38:01, 5.38s/it]
Training 9/16 epoch (loss 1.5156): 52%|ββββββ | 457/880 [45:17<40:22, 5.73s/it]
Training 9/16 epoch (loss 1.5391): 52%|ββββββ | 457/880 [45:23<40:22, 5.73s/it]
Training 9/16 epoch (loss 1.5391): 52%|ββββββ | 458/880 [45:23<38:51, 5.52s/it]
Training 9/16 epoch (loss 1.6016): 52%|ββββββ | 458/880 [45:27<38:51, 5.52s/it]
Training 9/16 epoch (loss 1.6016): 52%|ββββββ | 459/880 [45:27<37:33, 5.35s/it]
Training 9/16 epoch (loss 1.2422): 52%|ββββββ | 459/880 [45:32<37:33, 5.35s/it]
Training 9/16 epoch (loss 1.2422): 52%|ββββββ | 460/880 [45:32<34:58, 5.00s/it]
Training 9/16 epoch (loss 1.4375): 52%|ββββββ | 460/880 [45:37<34:58, 5.00s/it]
Training 9/16 epoch (loss 1.4375): 52%|ββββββ | 461/880 [45:37<34:42, 4.97s/it]
Training 9/16 epoch (loss 1.7188): 52%|ββββββ | 461/880 [45:42<34:42, 4.97s/it]
Training 9/16 epoch (loss 1.7188): 52%|ββββββ | 462/880 [45:42<35:30, 5.10s/it]
Training 9/16 epoch (loss 1.5078): 52%|ββββββ | 462/880 [45:46<35:30, 5.10s/it]
Training 9/16 epoch (loss 1.5078): 53%|ββββββ | 463/880 [45:46<33:22, 4.80s/it]
Training 9/16 epoch (loss 1.3672): 53%|ββββββ | 463/880 [45:51<33:22, 4.80s/it]
Training 9/16 epoch (loss 1.3672): 53%|ββββββ | 464/880 [45:51<33:54, 4.89s/it]
Training 9/16 epoch (loss 1.5078): 53%|ββββββ | 464/880 [46:04<33:54, 4.89s/it]
Training 9/16 epoch (loss 1.5078): 53%|ββββββ | 465/880 [46:04<49:33, 7.16s/it]
Training 9/16 epoch (loss 1.3047): 53%|ββββββ | 465/880 [46:09<49:33, 7.16s/it]
Training 9/16 epoch (loss 1.3047): 53%|ββββββ | 466/880 [46:09<46:37, 6.76s/it]
Training 9/16 epoch (loss 1.5234): 53%|ββββββ | 466/880 [46:15<46:37, 6.76s/it]
Training 9/16 epoch (loss 1.5234): 53%|ββββββ | 467/880 [46:15<43:37, 6.34s/it]
Training 9/16 epoch (loss 1.2344): 53%|ββββββ | 467/880 [46:20<43:37, 6.34s/it]
Training 9/16 epoch (loss 1.2344): 53%|ββββββ | 468/880 [46:20<41:22, 6.02s/it]
Training 9/16 epoch (loss 1.3984): 53%|ββββββ | 468/880 [46:25<41:22, 6.02s/it]
Training 9/16 epoch (loss 1.3984): 53%|ββββββ | 469/880 [46:25<38:29, 5.62s/it]
Training 9/16 epoch (loss 1.4062): 53%|ββββββ | 469/880 [46:30<38:29, 5.62s/it]
Training 9/16 epoch (loss 1.4062): 53%|ββββββ | 470/880 [46:30<38:13, 5.59s/it]
Training 9/16 epoch (loss 1.4766): 53%|ββββββ | 470/880 [46:35<38:13, 5.59s/it]
Training 9/16 epoch (loss 1.4766): 54%|ββββββ | 471/880 [46:35<35:44, 5.24s/it]
Training 9/16 epoch (loss 1.2891): 54%|ββββββ | 471/880 [46:39<35:44, 5.24s/it]
Training 9/16 epoch (loss 1.2891): 54%|ββββββ | 472/880 [46:39<34:16, 5.04s/it]
Training 9/16 epoch (loss 1.5000): 54%|ββββββ | 472/880 [46:45<34:16, 5.04s/it]
Training 9/16 epoch (loss 1.5000): 54%|ββββββ | 473/880 [46:45<35:02, 5.17s/it]
Training 9/16 epoch (loss 1.3125): 54%|ββββββ | 473/880 [46:50<35:02, 5.17s/it]
Training 9/16 epoch (loss 1.3125): 54%|ββββββ | 474/880 [46:50<35:23, 5.23s/it]
Training 9/16 epoch (loss 1.2969): 54%|ββββββ | 474/880 [46:55<35:23, 5.23s/it]
Training 9/16 epoch (loss 1.2969): 54%|ββββββ | 475/880 [46:55<35:37, 5.28s/it]
Training 9/16 epoch (loss 1.4844): 54%|ββββββ | 475/880 [47:02<35:37, 5.28s/it]
Training 9/16 epoch (loss 1.4844): 54%|ββββββ | 476/880 [47:02<37:50, 5.62s/it]
Training 9/16 epoch (loss 1.5469): 54%|ββββββ | 476/880 [47:16<37:50, 5.62s/it]
Training 9/16 epoch (loss 1.5469): 54%|ββββββ | 477/880 [47:16<55:23, 8.25s/it]
Training 9/16 epoch (loss 1.3750): 54%|ββββββ | 477/880 [47:21<55:23, 8.25s/it]
Training 9/16 epoch (loss 1.3750): 54%|ββββββ | 478/880 [47:21<48:59, 7.31s/it]
Training 9/16 epoch (loss 1.5703): 54%|ββββββ | 478/880 [47:26<48:59, 7.31s/it]
Training 9/16 epoch (loss 1.5703): 54%|ββββββ | 479/880 [47:26<43:21, 6.49s/it]
Training 9/16 epoch (loss 1.3672): 54%|ββββββ | 479/880 [47:31<43:21, 6.49s/it]
Training 9/16 epoch (loss 1.3672): 55%|ββββββ | 480/880 [47:31<40:35, 6.09s/it]
Training 9/16 epoch (loss 1.3750): 55%|ββββββ | 480/880 [47:36<40:35, 6.09s/it]
Training 9/16 epoch (loss 1.3750): 55%|ββββββ | 481/880 [47:36<38:11, 5.74s/it]
Training 9/16 epoch (loss 1.5078): 55%|ββββββ | 481/880 [47:42<38:11, 5.74s/it]
Training 9/16 epoch (loss 1.5078): 55%|ββββββ | 482/880 [47:42<37:28, 5.65s/it]
Training 9/16 epoch (loss 1.3438): 55%|ββββββ | 482/880 [47:48<37:28, 5.65s/it]
Training 9/16 epoch (loss 1.3438): 55%|ββββββ | 483/880 [47:48<38:26, 5.81s/it]
Training 9/16 epoch (loss 1.8203): 55%|ββββββ | 483/880 [47:56<38:26, 5.81s/it]
Training 9/16 epoch (loss 1.8203): 55%|ββββββ | 484/880 [47:56<42:26, 6.43s/it]
Training 9/16 epoch (loss 1.5547): 55%|ββββββ | 484/880 [48:11<42:26, 6.43s/it]
Training 9/16 epoch (loss 1.5547): 55%|ββββββ | 485/880 [48:11<1:00:44, 9.23s/it]
Training 9/16 epoch (loss 1.3438): 55%|ββββββ | 485/880 [48:17<1:00:44, 9.23s/it]
Training 9/16 epoch (loss 1.3438): 55%|ββββββ | 486/880 [48:17<54:01, 8.23s/it]
Training 9/16 epoch (loss 1.2656): 55%|ββββββ | 486/880 [48:22<54:01, 8.23s/it]
Training 9/16 epoch (loss 1.2656): 55%|ββββββ | 487/880 [48:22<47:12, 7.21s/it]
Training 9/16 epoch (loss 1.3750): 55%|ββββββ | 487/880 [48:27<47:12, 7.21s/it]
Training 9/16 epoch (loss 1.3750): 55%|ββββββ | 488/880 [48:27<43:35, 6.67s/it]
Training 9/16 epoch (loss 1.3516): 55%|ββββββ | 488/880 [48:34<43:35, 6.67s/it]
Training 9/16 epoch (loss 1.3516): 56%|ββββββ | 489/880 [48:34<44:05, 6.77s/it]
Training 9/16 epoch (loss 1.4766): 56%|ββββββ | 489/880 [48:40<44:05, 6.77s/it]
Training 9/16 epoch (loss 1.4766): 56%|ββββββ | 490/880 [48:40<41:52, 6.44s/it]
Training 9/16 epoch (loss 1.5469): 56%|ββββββ | 490/880 [48:47<41:52, 6.44s/it]
Training 9/16 epoch (loss 1.5469): 56%|ββββββ | 491/880 [48:47<42:02, 6.49s/it]
Training 9/16 epoch (loss 1.2344): 56%|ββββββ | 491/880 [48:52<42:02, 6.49s/it]
Training 9/16 epoch (loss 1.2344): 56%|ββββββ | 492/880 [48:52<39:49, 6.16s/it]
Training 9/16 epoch (loss 1.6094): 56%|ββββββ | 492/880 [48:57<39:49, 6.16s/it]
Training 9/16 epoch (loss 1.6094): 56%|ββββββ | 493/880 [48:57<37:01, 5.74s/it]
Training 9/16 epoch (loss 1.5312): 56%|ββββββ | 493/880 [49:04<37:01, 5.74s/it]
Training 9/16 epoch (loss 1.5312): 56%|ββββββ | 494/880 [49:04<39:12, 6.10s/it]
Training 9/16 epoch (loss 1.2891): 56%|ββββββ | 494/880 [49:08<39:12, 6.10s/it]
Training 9/16 epoch (loss 1.2891): 56%|ββββββ | 495/880 [49:08<36:18, 5.66s/it]
Training 10/16 epoch (loss 1.5234): 56%|ββββββ | 495/880 [49:14<36:18, 5.66s/it]
Training 10/16 epoch (loss 1.5234): 56%|ββββββ | 496/880 [49:14<35:22, 5.53s/it]
Training 10/16 epoch (loss 1.5547): 56%|ββββββ | 496/880 [49:20<35:22, 5.53s/it]
Training 10/16 epoch (loss 1.5547): 56%|ββββββ | 497/880 [49:20<36:08, 5.66s/it]
Training 10/16 epoch (loss 1.3359): 56%|ββββββ | 497/880 [49:26<36:08, 5.66s/it]
Training 10/16 epoch (loss 1.3359): 57%|ββββββ | 498/880 [49:26<36:41, 5.76s/it]
Training 10/16 epoch (loss 1.6484): 57%|ββββββ | 498/880 [49:30<36:41, 5.76s/it]
Training 10/16 epoch (loss 1.6484): 57%|ββββββ | 499/880 [49:30<34:32, 5.44s/it]
Training 10/16 epoch (loss 1.5547): 57%|ββββββ | 499/880 [49:36<34:32, 5.44s/it]
Training 10/16 epoch (loss 1.5547): 57%|ββββββ | 500/880 [49:36<34:04, 5.38s/it]
Training 10/16 epoch (loss 1.3594): 57%|ββββββ | 500/880 [49:41<34:04, 5.38s/it]
Training 10/16 epoch (loss 1.3594): 57%|ββββββ | 501/880 [49:41<34:48, 5.51s/it]
Training 10/16 epoch (loss 1.4531): 57%|ββββββ | 501/880 [49:46<34:48, 5.51s/it]
Training 10/16 epoch (loss 1.4531): 57%|ββββββ | 502/880 [49:46<33:03, 5.25s/it]
Training 10/16 epoch (loss 1.4609): 57%|ββββββ | 502/880 [49:52<33:03, 5.25s/it]
Training 10/16 epoch (loss 1.4609): 57%|ββββββ | 503/880 [49:52<33:35, 5.35s/it]
Training 10/16 epoch (loss 1.6328): 57%|ββββββ | 503/880 [50:00<33:35, 5.35s/it]
Training 10/16 epoch (loss 1.6328): 57%|ββββββ | 504/880 [50:00<39:28, 6.30s/it]
Training 10/16 epoch (loss 1.6094): 57%|ββββββ | 504/880 [50:05<39:28, 6.30s/it]
Training 10/16 epoch (loss 1.6094): 57%|ββββββ | 505/880 [50:05<37:04, 5.93s/it]
Training 10/16 epoch (loss 1.5938): 57%|ββββββ | 505/880 [50:10<37:04, 5.93s/it]
Training 10/16 epoch (loss 1.5938): 57%|ββββββ | 506/880 [50:10<34:53, 5.60s/it]
Training 10/16 epoch (loss 1.5000): 57%|ββββββ | 506/880 [50:17<34:53, 5.60s/it]
Training 10/16 epoch (loss 1.5000): 58%|ββββββ | 507/880 [50:17<38:15, 6.15s/it]
Training 10/16 epoch (loss 1.2891): 58%|ββββββ | 507/880 [50:22<38:15, 6.15s/it]
Training 10/16 epoch (loss 1.2891): 58%|ββββββ | 508/880 [50:22<35:35, 5.74s/it]
Training 10/16 epoch (loss 1.7188): 58%|ββββββ | 508/880 [50:27<35:35, 5.74s/it]
Training 10/16 epoch (loss 1.7188): 58%|ββββββ | 509/880 [50:27<33:41, 5.45s/it]
Training 10/16 epoch (loss 1.4219): 58%|ββββββ | 509/880 [50:33<33:41, 5.45s/it]
Training 10/16 epoch (loss 1.4219): 58%|ββββββ | 510/880 [50:33<33:43, 5.47s/it]
Training 10/16 epoch (loss 1.5547): 58%|ββββββ | 510/880 [50:38<33:43, 5.47s/it]
Training 10/16 epoch (loss 1.5547): 58%|ββββββ | 511/880 [50:38<33:05, 5.38s/it]
Training 10/16 epoch (loss 1.4297): 58%|ββββββ | 511/880 [50:44<33:05, 5.38s/it]
Training 10/16 epoch (loss 1.4297): 58%|ββββββ | 512/880 [50:44<35:07, 5.73s/it]
Training 10/16 epoch (loss 1.4453): 58%|ββββββ | 512/880 [50:49<35:07, 5.73s/it]
Training 10/16 epoch (loss 1.4453): 58%|ββββββ | 513/880 [50:49<33:47, 5.52s/it]
Training 10/16 epoch (loss 1.5078): 58%|ββββββ | 513/880 [50:54<33:47, 5.52s/it]
Training 10/16 epoch (loss 1.5078): 58%|ββββββ | 514/880 [50:54<32:39, 5.35s/it]
Training 10/16 epoch (loss 1.1641): 58%|ββββββ | 514/880 [50:58<32:39, 5.35s/it]
Training 10/16 epoch (loss 1.1641): 59%|ββββββ | 515/880 [50:58<30:23, 5.00s/it]
Training 10/16 epoch (loss 1.3438): 59%|ββββββ | 515/880 [51:03<30:23, 5.00s/it]
Training 10/16 epoch (loss 1.3438): 59%|ββββββ | 516/880 [51:03<30:10, 4.97s/it]
Training 10/16 epoch (loss 1.6016): 59%|ββββββ | 516/880 [51:09<30:10, 4.97s/it]
Training 10/16 epoch (loss 1.6016): 59%|ββββββ | 517/880 [51:09<30:51, 5.10s/it]
Training 10/16 epoch (loss 1.3906): 59%|ββββββ | 517/880 [51:13<30:51, 5.10s/it]
Training 10/16 epoch (loss 1.3906): 59%|ββββββ | 518/880 [51:13<28:59, 4.81s/it]
Training 10/16 epoch (loss 1.2656): 59%|ββββββ | 518/880 [51:18<28:59, 4.81s/it]
Training 10/16 epoch (loss 1.2656): 59%|ββββββ | 519/880 [51:18<29:26, 4.89s/it]
Training 10/16 epoch (loss 1.4062): 59%|ββββββ | 519/880 [51:30<29:26, 4.89s/it]
Training 10/16 epoch (loss 1.4062): 59%|ββββββ | 520/880 [51:30<42:59, 7.17s/it]
Training 10/16 epoch (loss 1.1953): 59%|ββββββ | 520/880 [51:36<42:59, 7.17s/it]
Training 10/16 epoch (loss 1.1953): 59%|ββββββ | 521/880 [51:36<40:26, 6.76s/it]
Training 10/16 epoch (loss 1.4297): 59%|ββββββ | 521/880 [51:42<40:26, 6.76s/it]
Training 10/16 epoch (loss 1.4297): 59%|ββββββ | 522/880 [51:42<37:49, 6.34s/it]
Training 10/16 epoch (loss 1.1250): 59%|ββββββ | 522/880 [51:47<37:49, 6.34s/it]
Training 10/16 epoch (loss 1.1250): 59%|ββββββ | 523/880 [51:47<35:50, 6.02s/it]
Training 10/16 epoch (loss 1.3047): 59%|ββββββ | 523/880 [51:52<35:50, 6.02s/it]
Training 10/16 epoch (loss 1.3047): 60%|ββββββ | 524/880 [51:52<33:20, 5.62s/it]
Training 10/16 epoch (loss 1.3281): 60%|ββββββ | 524/880 [51:57<33:20, 5.62s/it]
Training 10/16 epoch (loss 1.3281): 60%|ββββββ | 525/880 [51:57<33:05, 5.59s/it]
Training 10/16 epoch (loss 1.3984): 60%|ββββββ | 525/880 [52:01<33:05, 5.59s/it]
Training 10/16 epoch (loss 1.3984): 60%|ββββββ | 526/880 [52:01<30:55, 5.24s/it]
Training 10/16 epoch (loss 1.2188): 60%|ββββββ | 526/880 [52:06<30:55, 5.24s/it]
Training 10/16 epoch (loss 1.2188): 60%|ββββββ | 527/880 [52:06<29:39, 5.04s/it]
Training 10/16 epoch (loss 1.4062): 60%|ββββββ | 527/880 [52:12<29:39, 5.04s/it]
Training 10/16 epoch (loss 1.4062): 60%|ββββββ | 528/880 [52:12<30:20, 5.17s/it]
Training 10/16 epoch (loss 1.2266): 60%|ββββββ | 528/880 [52:17<30:20, 5.17s/it]
Training 10/16 epoch (loss 1.2266): 60%|ββββββ | 529/880 [52:17<30:37, 5.24s/it]
Training 10/16 epoch (loss 1.2266): 60%|ββββββ | 529/880 [52:22<30:37, 5.24s/it]
Training 10/16 epoch (loss 1.2266): 60%|ββββββ | 530/880 [52:22<30:48, 5.28s/it]
Training 10/16 epoch (loss 1.4062): 60%|ββββββ | 530/880 [52:29<30:48, 5.28s/it]
Training 10/16 epoch (loss 1.4062): 60%|ββββββ | 531/880 [52:29<32:42, 5.62s/it]
Training 10/16 epoch (loss 1.4609): 60%|ββββββ | 531/880 [52:43<32:42, 5.62s/it]
Training 10/16 epoch (loss 1.4609): 60%|ββββββ | 532/880 [52:43<47:51, 8.25s/it]
Training 10/16 epoch (loss 1.2969): 60%|ββββββ | 532/880 [52:48<47:51, 8.25s/it]
Training 10/16 epoch (loss 1.2969): 61%|ββββββ | 533/880 [52:48<42:17, 7.31s/it]
Training 10/16 epoch (loss 1.5000): 61%|ββββββ | 533/880 [52:53<42:17, 7.31s/it]
Training 10/16 epoch (loss 1.5000): 61%|ββββββ | 534/880 [52:53<37:24, 6.49s/it]
Training 10/16 epoch (loss 1.3047): 61%|ββββββ | 534/880 [52:58<37:24, 6.49s/it]
Training 10/16 epoch (loss 1.3047): 61%|ββββββ | 535/880 [52:58<35:00, 6.09s/it]
Training 10/16 epoch (loss 1.3125): 61%|ββββββ | 535/880 [53:03<35:00, 6.09s/it]
Training 10/16 epoch (loss 1.3125): 61%|ββββββ | 536/880 [53:03<32:55, 5.74s/it]
Training 10/16 epoch (loss 1.4062): 61%|ββββββ | 536/880 [53:08<32:55, 5.74s/it]
Training 10/16 epoch (loss 1.4062): 61%|ββββββ | 537/880 [53:08<32:16, 5.65s/it]
Training 10/16 epoch (loss 1.2500): 61%|ββββββ | 537/880 [53:14<32:16, 5.65s/it]
Training 10/16 epoch (loss 1.2500): 61%|ββββββ | 538/880 [53:14<33:05, 5.81s/it]
Training 10/16 epoch (loss 1.7344): 61%|ββββββ | 538/880 [53:22<33:05, 5.81s/it]
Training 10/16 epoch (loss 1.7344): 61%|βββββββ | 539/880 [53:22<36:32, 6.43s/it]
Training 10/16 epoch (loss 1.4844): 61%|βββββββ | 539/880 [53:38<36:32, 6.43s/it]
Training 10/16 epoch (loss 1.4844): 61%|βββββββ | 540/880 [53:38<52:15, 9.22s/it]
Training 10/16 epoch (loss 1.2656): 61%|βββββββ | 540/880 [53:44<52:15, 9.22s/it]
Training 10/16 epoch (loss 1.2656): 61%|βββββββ | 541/880 [53:44<46:28, 8.23s/it]
Training 10/16 epoch (loss 1.1875): 61%|βββββββ | 541/880 [53:49<46:28, 8.23s/it]
Training 10/16 epoch (loss 1.1875): 62%|βββββββ | 542/880 [53:49<40:35, 7.20s/it]
Training 10/16 epoch (loss 1.3047): 62%|βββββββ | 542/880 [53:54<40:35, 7.20s/it]
Training 10/16 epoch (loss 1.3047): 62%|βββββββ | 543/880 [53:54<37:27, 6.67s/it]
Training 10/16 epoch (loss 1.2422): 62%|βββββββ | 543/880 [54:01<37:27, 6.67s/it]
Training 10/16 epoch (loss 1.2422): 62%|βββββββ | 544/880 [54:01<37:52, 6.76s/it]
Training 10/16 epoch (loss 1.3750): 62%|βββββββ | 544/880 [54:07<37:52, 6.76s/it]
Training 10/16 epoch (loss 1.3750): 62%|βββββββ | 545/880 [54:07<35:56, 6.44s/it]
Training 10/16 epoch (loss 1.4609): 62%|βββββββ | 545/880 [54:14<35:56, 6.44s/it]
Training 10/16 epoch (loss 1.4609): 62%|βββββββ | 546/880 [54:14<36:05, 6.48s/it]
Training 10/16 epoch (loss 1.1562): 62%|βββββββ | 546/880 [54:19<36:05, 6.48s/it]
Training 10/16 epoch (loss 1.1562): 62%|βββββββ | 547/880 [54:19<34:09, 6.15s/it]
Training 10/16 epoch (loss 1.5312): 62%|βββββββ | 547/880 [54:24<34:09, 6.15s/it]
Training 10/16 epoch (loss 1.5312): 62%|βββββββ | 548/880 [54:24<31:43, 5.73s/it]
Training 10/16 epoch (loss 1.4531): 62%|βββββββ | 548/880 [54:31<31:43, 5.73s/it]
Training 10/16 epoch (loss 1.4531): 62%|βββββββ | 549/880 [54:31<33:35, 6.09s/it]
Training 10/16 epoch (loss 1.2109): 62%|βββββββ | 549/880 [54:35<33:35, 6.09s/it]
Training 10/16 epoch (loss 1.2109): 62%|βββββββ | 550/880 [54:35<31:04, 5.65s/it]
Training 11/16 epoch (loss 1.4297): 62%|βββββββ | 550/880 [54:40<31:04, 5.65s/it]
Training 11/16 epoch (loss 1.4297): 63%|βββββββ | 551/880 [54:40<30:17, 5.52s/it]
Training 11/16 epoch (loss 1.4531): 63%|βββββββ | 551/880 [54:46<30:17, 5.52s/it]
Training 11/16 epoch (loss 1.4531): 63%|βββββββ | 552/880 [54:46<30:56, 5.66s/it]
Training 11/16 epoch (loss 1.2656): 63%|βββββββ | 552/880 [54:52<30:56, 5.66s/it]
Training 11/16 epoch (loss 1.2656): 63%|βββββββ | 553/880 [54:52<31:24, 5.76s/it]
Training 11/16 epoch (loss 1.5703): 63%|βββββββ | 553/880 [54:57<31:24, 5.76s/it]
Training 11/16 epoch (loss 1.5703): 63%|βββββββ | 554/880 [54:57<29:33, 5.44s/it]
Training 11/16 epoch (loss 1.4844): 63%|βββββββ | 554/880 [55:02<29:33, 5.44s/it]
Training 11/16 epoch (loss 1.4844): 63%|βββββββ | 555/880 [55:02<29:08, 5.38s/it]
Training 11/16 epoch (loss 1.2812): 63%|βββββββ | 555/880 [55:08<29:08, 5.38s/it]
Training 11/16 epoch (loss 1.2812): 63%|βββββββ | 556/880 [55:08<29:44, 5.51s/it]
Training 11/16 epoch (loss 1.3828): 63%|βββββββ | 556/880 [55:13<29:44, 5.51s/it]
Training 11/16 epoch (loss 1.3828): 63%|βββββββ | 557/880 [55:13<28:13, 5.24s/it]
Training 11/16 epoch (loss 1.3516): 63%|βββββββ | 557/880 [55:18<28:13, 5.24s/it]
Training 11/16 epoch (loss 1.3516): 63%|βββββββ | 558/880 [55:18<28:40, 5.34s/it]
Training 11/16 epoch (loss 1.5312): 63%|βββββββ | 558/880 [55:27<28:40, 5.34s/it]
Training 11/16 epoch (loss 1.5312): 64%|βββββββ | 559/880 [55:27<33:40, 6.29s/it]
Training 11/16 epoch (loss 1.5156): 64%|βββββββ | 559/880 [55:32<33:40, 6.29s/it]
Training 11/16 epoch (loss 1.5156): 64%|βββββββ | 560/880 [55:32<31:36, 5.93s/it]
Training 11/16 epoch (loss 1.5078): 64%|βββββββ | 560/880 [55:37<31:36, 5.93s/it]
Training 11/16 epoch (loss 1.5078): 64%|βββββββ | 561/880 [55:37<29:43, 5.59s/it]
Training 11/16 epoch (loss 1.4375): 64%|βββββββ | 561/880 [55:44<29:43, 5.59s/it]
Training 11/16 epoch (loss 1.4375): 64%|βββββββ | 562/880 [55:44<32:35, 6.15s/it]
Training 11/16 epoch (loss 1.2188): 64%|βββββββ | 562/880 [55:49<32:35, 6.15s/it]
Training 11/16 epoch (loss 1.2188): 64%|βββββββ | 563/880 [55:49<30:20, 5.74s/it]
Training 11/16 epoch (loss 1.6406): 64%|βββββββ | 563/880 [55:54<30:20, 5.74s/it]
Training 11/16 epoch (loss 1.6406): 64%|βββββββ | 564/880 [55:54<28:43, 5.45s/it]
Training 11/16 epoch (loss 1.3516): 64%|βββββββ | 564/880 [55:59<28:43, 5.45s/it]
Training 11/16 epoch (loss 1.3516): 64%|βββββββ | 565/880 [55:59<28:43, 5.47s/it]
Training 11/16 epoch (loss 1.4531): 64%|βββββββ | 565/880 [56:04<28:43, 5.47s/it]
Training 11/16 epoch (loss 1.4531): 64%|βββββββ | 566/880 [56:04<28:10, 5.38s/it]
Training 11/16 epoch (loss 1.3281): 64%|βββββββ | 566/880 [56:11<28:10, 5.38s/it]
Training 11/16 epoch (loss 1.3281): 64%|βββββββ | 567/880 [56:11<29:53, 5.73s/it]
Training 11/16 epoch (loss 1.3672): 64%|βββββββ | 567/880 [56:16<29:53, 5.73s/it]
Training 11/16 epoch (loss 1.3672): 65%|βββββββ | 568/880 [56:16<28:44, 5.53s/it]
Training 11/16 epoch (loss 1.4297): 65%|βββββββ | 568/880 [56:21<28:44, 5.53s/it]
Training 11/16 epoch (loss 1.4297): 65%|βββββββ | 569/880 [56:21<27:44, 5.35s/it]
Training 11/16 epoch (loss 1.0859): 65%|βββββββ | 569/880 [56:25<27:44, 5.35s/it]
Training 11/16 epoch (loss 1.0859): 65%|βββββββ | 570/880 [56:25<25:47, 4.99s/it]
Training 11/16 epoch (loss 1.2656): 65%|βββββββ | 570/880 [56:30<25:47, 4.99s/it]
Training 11/16 epoch (loss 1.2656): 65%|βββββββ | 571/880 [56:30<25:35, 4.97s/it]
Training 11/16 epoch (loss 1.5312): 65%|βββββββ | 571/880 [56:35<25:35, 4.97s/it]
Training 11/16 epoch (loss 1.5312): 65%|βββββββ | 572/880 [56:35<26:09, 5.10s/it]
Training 11/16 epoch (loss 1.3125): 65%|βββββββ | 572/880 [56:40<26:09, 5.10s/it]
Training 11/16 epoch (loss 1.3125): 65%|βββββββ | 573/880 [56:40<24:34, 4.80s/it]
Training 11/16 epoch (loss 1.1797): 65%|βββββββ | 573/880 [56:45<24:34, 4.80s/it]
Training 11/16 epoch (loss 1.1797): 65%|βββββββ | 574/880 [56:45<24:57, 4.89s/it]
Training 11/16 epoch (loss 1.3203): 65%|βββββββ | 574/880 [56:57<24:57, 4.89s/it]
Training 11/16 epoch (loss 1.3203): 65%|βββββββ | 575/880 [56:57<36:27, 7.17s/it]
Training 11/16 epoch (loss 1.1250): 65%|βββββββ | 575/880 [57:03<36:27, 7.17s/it]
Training 11/16 epoch (loss 1.1250): 65%|βββββββ | 576/880 [57:03<34:16, 6.76s/it]
Training 11/16 epoch (loss 1.3594): 65%|βββββββ | 576/880 [57:08<34:16, 6.76s/it]
Training 11/16 epoch (loss 1.3594): 66%|βββββββ | 577/880 [57:08<32:01, 6.34s/it]
Training 11/16 epoch (loss 1.0703): 66%|βββββββ | 577/880 [57:14<32:01, 6.34s/it]
Training 11/16 epoch (loss 1.0703): 66%|βββββββ | 578/880 [57:14<30:21, 6.03s/it]
Training 11/16 epoch (loss 1.2344): 66%|βββββββ | 578/880 [57:18<30:21, 6.03s/it]
Training 11/16 epoch (loss 1.2344): 66%|βββββββ | 579/880 [57:18<28:12, 5.62s/it]
Training 11/16 epoch (loss 1.2578): 66%|βββββββ | 579/880 [57:24<28:12, 5.62s/it]
Training 11/16 epoch (loss 1.2578): 66%|βββββββ | 580/880 [57:24<27:58, 5.59s/it]
Training 11/16 epoch (loss 1.3203): 66%|βββββββ | 580/880 [57:28<27:58, 5.59s/it]
Training 11/16 epoch (loss 1.3203): 66%|βββββββ | 581/880 [57:28<26:07, 5.24s/it]
Training 11/16 epoch (loss 1.1406): 66%|βββββββ | 581/880 [57:33<26:07, 5.24s/it]
Training 11/16 epoch (loss 1.1406): 66%|βββββββ | 582/880 [57:33<25:01, 5.04s/it]
Training 11/16 epoch (loss 1.3281): 66%|βββββββ | 582/880 [57:38<25:01, 5.04s/it]
Training 11/16 epoch (loss 1.3281): 66%|βββββββ | 583/880 [57:38<25:34, 5.17s/it]
Training 11/16 epoch (loss 1.1719): 66%|βββββββ | 583/880 [57:44<25:34, 5.17s/it]
Training 11/16 epoch (loss 1.1719): 66%|βββββββ | 584/880 [57:44<25:48, 5.23s/it]
Training 11/16 epoch (loss 1.1562): 66%|βββββββ | 584/880 [57:49<25:48, 5.23s/it]
Training 11/16 epoch (loss 1.1562): 66%|βββββββ | 585/880 [57:49<25:57, 5.28s/it]
Training 11/16 epoch (loss 1.3359): 66%|βββββββ | 585/880 [57:55<25:57, 5.28s/it]
Training 11/16 epoch (loss 1.3359): 67%|βββββββ | 586/880 [57:55<27:33, 5.62s/it]
Training 11/16 epoch (loss 1.3750): 67%|βββββββ | 586/880 [58:10<27:33, 5.62s/it]
Training 11/16 epoch (loss 1.3750): 67%|βββββββ | 587/880 [58:10<40:17, 8.25s/it]
Training 11/16 epoch (loss 1.2266): 67%|βββββββ | 587/880 [58:15<40:17, 8.25s/it]
Training 11/16 epoch (loss 1.2266): 67%|βββββββ | 588/880 [58:15<35:37, 7.32s/it]
Training 11/16 epoch (loss 1.4219): 67%|βββββββ | 588/880 [58:20<35:37, 7.32s/it]
Training 11/16 epoch (loss 1.4219): 67%|βββββββ | 589/880 [58:20<31:29, 6.49s/it]
Training 11/16 epoch (loss 1.2266): 67%|βββββββ | 589/880 [58:25<31:29, 6.49s/it]
Training 11/16 epoch (loss 1.2266): 67%|βββββββ | 590/880 [58:25<29:27, 6.09s/it]
Training 11/16 epoch (loss 1.2422): 67%|βββββββ | 590/880 [58:30<29:27, 6.09s/it]
Training 11/16 epoch (loss 1.2422): 67%|βββββββ | 591/880 [58:30<27:39, 5.74s/it]
Training 11/16 epoch (loss 1.3516): 67%|βββββββ | 591/880 [58:35<27:39, 5.74s/it]
Training 11/16 epoch (loss 1.3516): 67%|βββββββ | 592/880 [58:35<27:06, 5.65s/it]
Training 11/16 epoch (loss 1.1797): 67%|βββββββ | 592/880 [58:41<27:06, 5.65s/it]
Training 11/16 epoch (loss 1.1797): 67%|βββββββ | 593/880 [58:41<27:45, 5.80s/it]
Training 11/16 epoch (loss 1.6562): 67%|βββββββ | 593/880 [58:49<27:45, 5.80s/it]
Training 11/16 epoch (loss 1.6562): 68%|βββββββ | 594/880 [58:49<30:37, 6.42s/it]
Training 11/16 epoch (loss 1.3828): 68%|βββββββ | 594/880 [59:05<30:37, 6.42s/it]
Training 11/16 epoch (loss 1.3828): 68%|βββββββ | 595/880 [59:05<43:47, 9.22s/it]
Training 11/16 epoch (loss 1.1875): 68%|βββββββ | 595/880 [59:11<43:47, 9.22s/it]
Training 11/16 epoch (loss 1.1875): 68%|βββββββ | 596/880 [59:11<38:55, 8.22s/it]
Training 11/16 epoch (loss 1.1250): 68%|βββββββ | 596/880 [59:16<38:55, 8.22s/it]
Training 11/16 epoch (loss 1.1250): 68%|βββββββ | 597/880 [59:16<33:59, 7.21s/it]
Training 11/16 epoch (loss 1.2344): 68%|βββββββ | 597/880 [59:21<33:59, 7.21s/it]
Training 11/16 epoch (loss 1.2344): 68%|βββββββ | 598/880 [59:21<31:21, 6.67s/it]
Training 11/16 epoch (loss 1.1797): 68%|βββββββ | 598/880 [59:28<31:21, 6.67s/it]
Training 11/16 epoch (loss 1.1797): 68%|βββββββ | 599/880 [59:28<31:42, 6.77s/it]
Training 11/16 epoch (loss 1.3047): 68%|βββββββ | 599/880 [59:34<31:42, 6.77s/it]
Training 11/16 epoch (loss 1.3047): 68%|βββββββ | 600/880 [59:34<30:04, 6.44s/it]
Training 11/16 epoch (loss 1.3672): 68%|βββββββ | 600/880 [59:40<30:04, 6.44s/it]
Training 11/16 epoch (loss 1.3672): 68%|βββββββ | 601/880 [59:40<30:09, 6.49s/it]
Training 11/16 epoch (loss 1.0859): 68%|βββββββ | 601/880 [59:46<30:09, 6.49s/it]
Training 11/16 epoch (loss 1.0859): 68%|βββββββ | 602/880 [59:46<28:31, 6.16s/it]
Training 11/16 epoch (loss 1.4688): 68%|βββββββ | 602/880 [59:50<28:31, 6.16s/it]
Training 11/16 epoch (loss 1.4688): 69%|βββββββ | 603/880 [59:50<26:28, 5.73s/it]
Training 11/16 epoch (loss 1.3672): 69%|βββββββ | 603/880 [59:57<26:28, 5.73s/it]
Training 11/16 epoch (loss 1.3672): 69%|βββββββ | 604/880 [59:57<27:59, 6.09s/it]
Training 11/16 epoch (loss 1.1328): 69%|βββββββ | 604/880 [1:00:02<27:59, 6.09s/it]
Training 11/16 epoch (loss 1.1328): 69%|βββββββ | 605/880 [1:00:02<25:53, 5.65s/it]
Training 12/16 epoch (loss 1.3516): 69%|βββββββ | 605/880 [1:00:07<25:53, 5.65s/it]
Training 12/16 epoch (loss 1.3516): 69%|βββββββ | 606/880 [1:00:07<25:13, 5.52s/it]
Training 12/16 epoch (loss 1.3750): 69%|βββββββ | 606/880 [1:00:13<25:13, 5.52s/it]
Training 12/16 epoch (loss 1.3750): 69%|βββββββ | 607/880 [1:00:13<25:45, 5.66s/it]
Training 12/16 epoch (loss 1.1953): 69%|βββββββ | 607/880 [1:00:19<25:45, 5.66s/it]
Training 12/16 epoch (loss 1.1953): 69%|βββββββ | 608/880 [1:00:19<26:07, 5.76s/it]
Training 12/16 epoch (loss 1.5234): 69%|βββββββ | 608/880 [1:00:24<26:07, 5.76s/it]
Training 12/16 epoch (loss 1.5234): 69%|βββββββ | 609/880 [1:00:24<24:35, 5.44s/it]
Training 12/16 epoch (loss 1.4219): 69%|βββββββ | 609/880 [1:00:29<24:35, 5.44s/it]
Training 12/16 epoch (loss 1.4219): 69%|βββββββ | 610/880 [1:00:29<24:14, 5.39s/it]
Training 12/16 epoch (loss 1.2188): 69%|βββββββ | 610/880 [1:00:35<24:14, 5.39s/it]
Training 12/16 epoch (loss 1.2188): 69%|βββββββ | 611/880 [1:00:35<24:43, 5.52s/it]
Training 12/16 epoch (loss 1.3281): 69%|βββββββ | 611/880 [1:00:40<24:43, 5.52s/it]
Training 12/16 epoch (loss 1.3281): 70%|βββββββ | 612/880 [1:00:40<23:26, 5.25s/it]
Training 12/16 epoch (loss 1.3047): 70%|βββββββ | 612/880 [1:00:45<23:26, 5.25s/it]
Training 12/16 epoch (loss 1.3047): 70%|βββββββ | 613/880 [1:00:45<23:47, 5.35s/it]
Training 12/16 epoch (loss 1.4609): 70%|βββββββ | 613/880 [1:00:54<23:47, 5.35s/it]
Training 12/16 epoch (loss 1.4609): 70%|βββββββ | 614/880 [1:00:54<27:54, 6.30s/it]
Training 12/16 epoch (loss 1.4297): 70%|βββββββ | 614/880 [1:00:59<27:54, 6.30s/it]
Training 12/16 epoch (loss 1.4297): 70%|βββββββ | 615/880 [1:00:59<26:10, 5.93s/it]
Training 12/16 epoch (loss 1.4453): 70%|βββββββ | 615/880 [1:01:04<26:10, 5.93s/it]
Training 12/16 epoch (loss 1.4453): 70%|βββββββ | 616/880 [1:01:04<24:35, 5.59s/it]
Training 12/16 epoch (loss 1.3750): 70%|βββββββ | 616/880 [1:01:11<24:35, 5.59s/it]
Training 12/16 epoch (loss 1.3750): 70%|βββββββ | 617/880 [1:01:11<26:56, 6.15s/it]
Training 12/16 epoch (loss 1.1719): 70%|βββββββ | 617/880 [1:01:16<26:56, 6.15s/it]
Training 12/16 epoch (loss 1.1719): 70%|βββββββ | 618/880 [1:01:16<25:03, 5.74s/it]
Training 12/16 epoch (loss 1.5938): 70%|βββββββ | 618/880 [1:01:21<25:03, 5.74s/it]
Training 12/16 epoch (loss 1.5938): 70%|βββββββ | 619/880 [1:01:21<23:42, 5.45s/it]
Training 12/16 epoch (loss 1.3047): 70%|βββββββ | 619/880 [1:01:26<23:42, 5.45s/it]
Training 12/16 epoch (loss 1.3047): 70%|βββββββ | 620/880 [1:01:26<23:42, 5.47s/it]
Training 12/16 epoch (loss 1.3984): 70%|βββββββ | 620/880 [1:01:31<23:42, 5.47s/it]
Training 12/16 epoch (loss 1.3984): 71%|βββββββ | 621/880 [1:01:31<23:15, 5.39s/it]
Training 12/16 epoch (loss 1.2812): 71%|βββββββ | 621/880 [1:01:38<23:15, 5.39s/it]
Training 12/16 epoch (loss 1.2812): 71%|βββββββ | 622/880 [1:01:38<24:39, 5.73s/it]
Training 12/16 epoch (loss 1.2969): 71%|βββββββ | 622/880 [1:01:43<24:39, 5.73s/it]
Training 12/16 epoch (loss 1.2969): 71%|βββββββ | 623/880 [1:01:43<23:40, 5.53s/it]
Training 12/16 epoch (loss 1.3516): 71%|βββββββ | 623/880 [1:01:48<23:40, 5.53s/it]
Training 12/16 epoch (loss 1.3516): 71%|βββββββ | 624/880 [1:01:48<22:50, 5.35s/it]
Training 12/16 epoch (loss 1.0312): 71%|βββββββ | 624/880 [1:01:52<22:50, 5.35s/it]
Training 12/16 epoch (loss 1.0312): 71%|βββββββ | 625/880 [1:01:52<21:12, 4.99s/it]
Training 12/16 epoch (loss 1.2109): 71%|βββββββ | 625/880 [1:01:57<21:12, 4.99s/it]
Training 12/16 epoch (loss 1.2109): 71%|βββββββ | 626/880 [1:01:57<21:01, 4.97s/it]
Training 12/16 epoch (loss 1.4766): 71%|βββββββ | 626/880 [1:02:02<21:01, 4.97s/it]
Training 12/16 epoch (loss 1.4766): 71%|ββββββββ | 627/880 [1:02:02<21:28, 5.09s/it]
Training 12/16 epoch (loss 1.2734): 71%|ββββββββ | 627/880 [1:02:06<21:28, 5.09s/it]
Training 12/16 epoch (loss 1.2734): 71%|ββββββββ | 628/880 [1:02:06<20:09, 4.80s/it]
Training 12/16 epoch (loss 1.1484): 71%|ββββββββ | 628/880 [1:02:11<20:09, 4.80s/it]
Training 12/16 epoch (loss 1.1484): 71%|ββββββββ | 629/880 [1:02:11<20:26, 4.89s/it]
Training 12/16 epoch (loss 1.2656): 71%|ββββββββ | 629/880 [1:02:24<20:26, 4.89s/it]
Training 12/16 epoch (loss 1.2656): 72%|ββββββββ | 630/880 [1:02:24<29:52, 7.17s/it]
Training 12/16 epoch (loss 1.0625): 72%|ββββββββ | 630/880 [1:02:30<29:52, 7.17s/it]
Training 12/16 epoch (loss 1.0625): 72%|ββββββββ | 631/880 [1:02:30<28:04, 6.76s/it]
Training 12/16 epoch (loss 1.2969): 72%|ββββββββ | 631/880 [1:02:35<28:04, 6.76s/it]
Training 12/16 epoch (loss 1.2969): 72%|ββββββββ | 632/880 [1:02:35<26:13, 6.35s/it]
Training 12/16 epoch (loss 1.0078): 72%|ββββββββ | 632/880 [1:02:40<26:13, 6.35s/it]
Training 12/16 epoch (loss 1.0078): 72%|ββββββββ | 633/880 [1:02:40<24:50, 6.03s/it]
Training 12/16 epoch (loss 1.1953): 72%|ββββββββ | 633/880 [1:02:45<24:50, 6.03s/it]
Training 12/16 epoch (loss 1.1953): 72%|ββββββββ | 634/880 [1:02:45<23:03, 5.63s/it]
Training 12/16 epoch (loss 1.2031): 72%|ββββββββ | 634/880 [1:02:51<23:03, 5.63s/it]
Training 12/16 epoch (loss 1.2031): 72%|ββββββββ | 635/880 [1:02:51<22:50, 5.59s/it]
Training 12/16 epoch (loss 1.2734): 72%|ββββββββ | 635/880 [1:02:55<22:50, 5.59s/it]
Training 12/16 epoch (loss 1.2734): 72%|ββββββββ | 636/880 [1:02:55<21:18, 5.24s/it]
Training 12/16 epoch (loss 1.0859): 72%|ββββββββ | 636/880 [1:03:00<21:18, 5.24s/it]
Training 12/16 epoch (loss 1.0859): 72%|ββββββββ | 637/880 [1:03:00<20:24, 5.04s/it]
Training 12/16 epoch (loss 1.2656): 72%|ββββββββ | 637/880 [1:03:05<20:24, 5.04s/it]
Training 12/16 epoch (loss 1.2656): 72%|ββββββββ | 638/880 [1:03:05<20:50, 5.17s/it]
Training 12/16 epoch (loss 1.1094): 72%|ββββββββ | 638/880 [1:03:10<20:50, 5.17s/it]
Training 12/16 epoch (loss 1.1094): 73%|ββββββββ | 639/880 [1:03:10<21:00, 5.23s/it]
Training 12/16 epoch (loss 1.1172): 73%|ββββββββ | 639/880 [1:03:16<21:00, 5.23s/it]
Training 12/16 epoch (loss 1.1172): 73%|ββββββββ | 640/880 [1:03:16<21:06, 5.28s/it]
Training 12/16 epoch (loss 1.2891): 73%|ββββββββ | 640/880 [1:03:22<21:06, 5.28s/it]
Training 12/16 epoch (loss 1.2891): 73%|ββββββββ | 641/880 [1:03:22<22:23, 5.62s/it]
Training 12/16 epoch (loss 1.3281): 73%|ββββββββ | 641/880 [1:03:37<22:23, 5.62s/it]
Training 12/16 epoch (loss 1.3281): 73%|ββββββββ | 642/880 [1:03:37<32:44, 8.25s/it]
Training 12/16 epoch (loss 1.1641): 73%|ββββββββ | 642/880 [1:03:42<32:44, 8.25s/it]
Training 12/16 epoch (loss 1.1641): 73%|ββββββββ | 643/880 [1:03:42<28:54, 7.32s/it]
Training 12/16 epoch (loss 1.3516): 73%|ββββββββ | 643/880 [1:03:46<28:54, 7.32s/it]
Training 12/16 epoch (loss 1.3516): 73%|ββββββββ | 644/880 [1:03:46<25:33, 6.50s/it]
Training 12/16 epoch (loss 1.1719): 73%|ββββββββ | 644/880 [1:03:52<25:33, 6.50s/it]
Training 12/16 epoch (loss 1.1719): 73%|ββββββββ | 645/880 [1:03:52<23:53, 6.10s/it]
Training 12/16 epoch (loss 1.1875): 73%|ββββββββ | 645/880 [1:03:56<23:53, 6.10s/it]
Training 12/16 epoch (loss 1.1875): 73%|ββββββββ | 646/880 [1:03:56<22:25, 5.75s/it]
Training 12/16 epoch (loss 1.2969): 73%|ββββββββ | 646/880 [1:04:02<22:25, 5.75s/it]
Training 12/16 epoch (loss 1.2969): 74%|ββββββββ | 647/880 [1:04:02<21:56, 5.65s/it]
Training 12/16 epoch (loss 1.1250): 74%|ββββββββ | 647/880 [1:04:08<21:56, 5.65s/it]
Training 12/16 epoch (loss 1.1250): 74%|ββββββββ | 648/880 [1:04:08<22:26, 5.80s/it]
Training 12/16 epoch (loss 1.6172): 74%|ββββββββ | 648/880 [1:04:16<22:26, 5.80s/it]
Training 12/16 epoch (loss 1.6172): 74%|ββββββββ | 649/880 [1:04:16<24:43, 6.42s/it]
Training 12/16 epoch (loss 1.3281): 74%|ββββββββ | 649/880 [1:04:32<24:43, 6.42s/it]
Training 12/16 epoch (loss 1.3281): 74%|ββββββββ | 650/880 [1:04:32<35:19, 9.22s/it]
Training 12/16 epoch (loss 1.1250): 74%|ββββββββ | 650/880 [1:04:38<35:19, 9.22s/it]
Training 12/16 epoch (loss 1.1250): 74%|ββββββββ | 651/880 [1:04:38<31:22, 8.22s/it]
Training 12/16 epoch (loss 1.0703): 74%|ββββββββ | 651/880 [1:04:42<31:22, 8.22s/it]
Training 12/16 epoch (loss 1.0703): 74%|ββββββββ | 652/880 [1:04:42<27:21, 7.20s/it]
Training 12/16 epoch (loss 1.1719): 74%|ββββββββ | 652/880 [1:04:48<27:21, 7.20s/it]
Training 12/16 epoch (loss 1.1719): 74%|ββββββββ | 653/880 [1:04:48<25:13, 6.67s/it]
Training 12/16 epoch (loss 1.1250): 74%|ββββββββ | 653/880 [1:04:55<25:13, 6.67s/it]
Training 12/16 epoch (loss 1.1250): 74%|ββββββββ | 654/880 [1:04:55<25:29, 6.77s/it]
Training 12/16 epoch (loss 1.2422): 74%|ββββββββ | 654/880 [1:05:00<25:29, 6.77s/it]
Training 12/16 epoch (loss 1.2422): 74%|ββββββββ | 655/880 [1:05:00<24:09, 6.44s/it]
Training 12/16 epoch (loss 1.3203): 74%|ββββββββ | 655/880 [1:05:07<24:09, 6.44s/it]
Training 12/16 epoch (loss 1.3203): 75%|ββββββββ | 656/880 [1:05:07<24:13, 6.49s/it]
Training 12/16 epoch (loss 1.0469): 75%|ββββββββ | 656/880 [1:05:12<24:13, 6.49s/it]
Training 12/16 epoch (loss 1.0469): 75%|ββββββββ | 657/880 [1:05:12<22:53, 6.16s/it]
Training 12/16 epoch (loss 1.4297): 75%|ββββββββ | 657/880 [1:05:17<22:53, 6.16s/it]
Training 12/16 epoch (loss 1.4297): 75%|ββββββββ | 658/880 [1:05:17<21:13, 5.74s/it]
Training 12/16 epoch (loss 1.3281): 75%|ββββββββ | 658/880 [1:05:24<21:13, 5.74s/it]
Training 12/16 epoch (loss 1.3281): 75%|ββββββββ | 659/880 [1:05:24<22:26, 6.09s/it]
Training 12/16 epoch (loss 1.0859): 75%|ββββββββ | 659/880 [1:05:29<22:26, 6.09s/it]
Training 12/16 epoch (loss 1.0859): 75%|ββββββββ | 660/880 [1:05:29<20:43, 5.65s/it]
Training 13/16 epoch (loss 1.2969): 75%|ββββββββ | 660/880 [1:05:34<20:43, 5.65s/it]
Training 13/16 epoch (loss 1.2969): 75%|ββββββββ | 661/880 [1:05:34<20:09, 5.52s/it]
Training 13/16 epoch (loss 1.3125): 75%|ββββββββ | 661/880 [1:05:40<20:09, 5.52s/it]
Training 13/16 epoch (loss 1.3125): 75%|ββββββββ | 662/880 [1:05:40<20:33, 5.66s/it]
Training 13/16 epoch (loss 1.1484): 75%|ββββββββ | 662/880 [1:05:46<20:33, 5.66s/it]
Training 13/16 epoch (loss 1.1484): 75%|ββββββββ | 663/880 [1:05:46<20:50, 5.76s/it]
Training 13/16 epoch (loss 1.4766): 75%|ββββββββ | 663/880 [1:05:51<20:50, 5.76s/it]
Training 13/16 epoch (loss 1.4766): 75%|ββββββββ | 664/880 [1:05:51<19:35, 5.44s/it]
Training 13/16 epoch (loss 1.3906): 75%|ββββββββ | 664/880 [1:05:56<19:35, 5.44s/it]
Training 13/16 epoch (loss 1.3906): 76%|ββββββββ | 665/880 [1:05:56<19:17, 5.38s/it]
Training 13/16 epoch (loss 1.1641): 76%|ββββββββ | 665/880 [1:06:02<19:17, 5.38s/it]
Training 13/16 epoch (loss 1.1641): 76%|ββββββββ | 666/880 [1:06:02<19:40, 5.51s/it]
Training 13/16 epoch (loss 1.2812): 76%|ββββββββ | 666/880 [1:06:06<19:40, 5.51s/it]
Training 13/16 epoch (loss 1.2812): 76%|ββββββββ | 667/880 [1:06:06<18:38, 5.25s/it]
Training 13/16 epoch (loss 1.2578): 76%|ββββββββ | 667/880 [1:06:12<18:38, 5.25s/it]
Training 13/16 epoch (loss 1.2578): 76%|ββββββββ | 668/880 [1:06:12<18:54, 5.35s/it]
Training 13/16 epoch (loss 1.4141): 76%|ββββββββ | 668/880 [1:06:20<18:54, 5.35s/it]
Training 13/16 epoch (loss 1.4141): 76%|ββββββββ | 669/880 [1:06:20<22:09, 6.30s/it]
Training 13/16 epoch (loss 1.3984): 76%|ββββββββ | 669/880 [1:06:26<22:09, 6.30s/it]
Training 13/16 epoch (loss 1.3984): 76%|ββββββββ | 670/880 [1:06:26<20:45, 5.93s/it]
Training 13/16 epoch (loss 1.4062): 76%|ββββββββ | 670/880 [1:06:30<20:45, 5.93s/it]
Training 13/16 epoch (loss 1.4062): 76%|ββββββββ | 671/880 [1:06:30<19:29, 5.59s/it]
Training 13/16 epoch (loss 1.3281): 76%|ββββββββ | 671/880 [1:06:38<19:29, 5.59s/it]
Training 13/16 epoch (loss 1.3281): 76%|ββββββββ | 672/880 [1:06:38<21:19, 6.15s/it]
Training 13/16 epoch (loss 1.1250): 76%|ββββββββ | 672/880 [1:06:43<21:19, 6.15s/it]
Training 13/16 epoch (loss 1.1250): 76%|ββββββββ | 673/880 [1:06:43<19:48, 5.74s/it]
Training 13/16 epoch (loss 1.5547): 76%|ββββββββ | 673/880 [1:06:47<19:48, 5.74s/it]
Training 13/16 epoch (loss 1.5547): 77%|ββββββββ | 674/880 [1:06:47<18:43, 5.45s/it]
Training 13/16 epoch (loss 1.2656): 77%|ββββββββ | 674/880 [1:06:53<18:43, 5.45s/it]
Training 13/16 epoch (loss 1.2656): 77%|ββββββββ | 675/880 [1:06:53<18:41, 5.47s/it]
Training 13/16 epoch (loss 1.3594): 77%|ββββββββ | 675/880 [1:06:58<18:41, 5.47s/it]
Training 13/16 epoch (loss 1.3594): 77%|ββββββββ | 676/880 [1:06:58<18:18, 5.38s/it]
Training 13/16 epoch (loss 1.2500): 77%|ββββββββ | 676/880 [1:07:05<18:18, 5.38s/it]
Training 13/16 epoch (loss 1.2500): 77%|ββββββββ | 677/880 [1:07:05<19:23, 5.73s/it]
Training 13/16 epoch (loss 1.2500): 77%|ββββββββ | 677/880 [1:07:10<19:23, 5.73s/it]
Training 13/16 epoch (loss 1.2500): 77%|ββββββββ | 678/880 [1:07:10<18:36, 5.53s/it]
Training 13/16 epoch (loss 1.3203): 77%|ββββββββ | 678/880 [1:07:15<18:36, 5.53s/it]
Training 13/16 epoch (loss 1.3203): 77%|ββββββββ | 679/880 [1:07:15<17:55, 5.35s/it]
Training 13/16 epoch (loss 0.9922): 77%|ββββββββ | 679/880 [1:07:19<17:55, 5.35s/it]
Training 13/16 epoch (loss 0.9922): 77%|ββββββββ | 680/880 [1:07:19<16:38, 4.99s/it]
Training 13/16 epoch (loss 1.1641): 77%|ββββββββ | 680/880 [1:07:24<16:38, 4.99s/it]
Training 13/16 epoch (loss 1.1641): 77%|ββββββββ | 681/880 [1:07:24<16:28, 4.97s/it]
Training 13/16 epoch (loss 1.4375): 77%|ββββββββ | 681/880 [1:07:29<16:28, 4.97s/it]
Training 13/16 epoch (loss 1.4375): 78%|ββββββββ | 682/880 [1:07:29<16:48, 5.09s/it]
Training 13/16 epoch (loss 1.2344): 78%|ββββββββ | 682/880 [1:07:33<16:48, 5.09s/it]
Training 13/16 epoch (loss 1.2344): 78%|ββββββββ | 683/880 [1:07:33<15:45, 4.80s/it]
Training 13/16 epoch (loss 1.1094): 78%|ββββββββ | 683/880 [1:07:38<15:45, 4.80s/it]
Training 13/16 epoch (loss 1.1094): 78%|ββββββββ | 684/880 [1:07:38<15:58, 4.89s/it]
Training 13/16 epoch (loss 1.2344): 78%|ββββββββ | 684/880 [1:07:51<15:58, 4.89s/it]
Training 13/16 epoch (loss 1.2344): 78%|ββββββββ | 685/880 [1:07:51<23:17, 7.17s/it]
Training 13/16 epoch (loss 1.0234): 78%|ββββββββ | 685/880 [1:07:57<23:17, 7.17s/it]
Training 13/16 epoch (loss 1.0234): 78%|ββββββββ | 686/880 [1:07:57<21:51, 6.76s/it]
Training 13/16 epoch (loss 1.2500): 78%|ββββββββ | 686/880 [1:08:02<21:51, 6.76s/it]
Training 13/16 epoch (loss 1.2500): 78%|ββββββββ | 687/880 [1:08:02<20:23, 6.34s/it]
Training 13/16 epoch (loss 0.9688): 78%|ββββββββ | 687/880 [1:08:07<20:23, 6.34s/it]
Training 13/16 epoch (loss 0.9688): 78%|ββββββββ | 688/880 [1:08:07<19:17, 6.03s/it]
Training 13/16 epoch (loss 1.1562): 78%|ββββββββ | 688/880 [1:08:12<19:17, 6.03s/it]
Training 13/16 epoch (loss 1.1562): 78%|ββββββββ | 689/880 [1:08:12<17:53, 5.62s/it]
Training 13/16 epoch (loss 1.1719): 78%|ββββββββ | 689/880 [1:08:17<17:53, 5.62s/it]
Training 13/16 epoch (loss 1.1719): 78%|ββββββββ | 690/880 [1:08:17<17:42, 5.59s/it]
Training 13/16 epoch (loss 1.2188): 78%|ββββββββ | 690/880 [1:08:22<17:42, 5.59s/it]
Training 13/16 epoch (loss 1.2188): 79%|ββββββββ | 691/880 [1:08:22<16:30, 5.24s/it]
Training 13/16 epoch (loss 1.0391): 79%|ββββββββ | 691/880 [1:08:26<16:30, 5.24s/it]
Training 13/16 epoch (loss 1.0391): 79%|ββββββββ | 692/880 [1:08:26<15:47, 5.04s/it]
Training 13/16 epoch (loss 1.2344): 79%|ββββββββ | 692/880 [1:08:32<15:47, 5.04s/it]
Training 13/16 epoch (loss 1.2344): 79%|ββββββββ | 693/880 [1:08:32<16:06, 5.17s/it]
Training 13/16 epoch (loss 1.0625): 79%|ββββββββ | 693/880 [1:08:37<16:06, 5.17s/it]
Training 13/16 epoch (loss 1.0625): 79%|ββββββββ | 694/880 [1:08:37<16:13, 5.23s/it]
Training 13/16 epoch (loss 1.0703): 79%|ββββββββ | 694/880 [1:08:43<16:13, 5.23s/it]
Training 13/16 epoch (loss 1.0703): 79%|ββββββββ | 695/880 [1:08:43<16:16, 5.28s/it]
Training 13/16 epoch (loss 1.2500): 79%|ββββββββ | 695/880 [1:08:49<16:16, 5.28s/it]
Training 13/16 epoch (loss 1.2500): 79%|ββββββββ | 696/880 [1:08:49<17:14, 5.62s/it]
Training 13/16 epoch (loss 1.2812): 79%|ββββββββ | 696/880 [1:09:03<17:14, 5.62s/it]
Training 13/16 epoch (loss 1.2812): 79%|ββββββββ | 697/880 [1:09:03<25:09, 8.25s/it]
Training 13/16 epoch (loss 1.1250): 79%|ββββββββ | 697/880 [1:09:09<25:09, 8.25s/it]
Training 13/16 epoch (loss 1.1250): 79%|ββββββββ | 698/880 [1:09:09<22:11, 7.32s/it]
Training 13/16 epoch (loss 1.3125): 79%|ββββββββ | 698/880 [1:09:13<22:11, 7.32s/it]
Training 13/16 epoch (loss 1.3125): 79%|ββββββββ | 699/880 [1:09:13<19:35, 6.49s/it]
Training 13/16 epoch (loss 1.1328): 79%|ββββββββ | 699/880 [1:09:18<19:35, 6.49s/it]
Training 13/16 epoch (loss 1.1328): 80%|ββββββββ | 700/880 [1:09:18<18:16, 6.09s/it]
Training 13/16 epoch (loss 1.1406): 80%|ββββββββ | 700/880 [1:09:23<18:16, 6.09s/it]
Training 13/16 epoch (loss 1.1406): 80%|ββββββββ | 701/880 [1:09:23<17:08, 5.74s/it]
Training 13/16 epoch (loss 1.2500): 80%|ββββββββ | 701/880 [1:09:29<17:08, 5.74s/it]
Training 13/16 epoch (loss 1.2500): 80%|ββββββββ | 702/880 [1:09:29<16:45, 5.65s/it]
Training 13/16 epoch (loss 1.0859): 80%|ββββββββ | 702/880 [1:09:35<16:45, 5.65s/it]
Training 13/16 epoch (loss 1.0859): 80%|ββββββββ | 703/880 [1:09:35<17:07, 5.80s/it]
Training 13/16 epoch (loss 1.5859): 80%|ββββββββ | 703/880 [1:09:43<17:07, 5.80s/it]
Training 13/16 epoch (loss 1.5859): 80%|ββββββββ | 704/880 [1:09:43<18:51, 6.43s/it]
Training 13/16 epoch (loss 1.2812): 80%|ββββββββ | 704/880 [1:09:58<18:51, 6.43s/it]
Training 13/16 epoch (loss 1.2812): 80%|ββββββββ | 705/880 [1:09:58<26:53, 9.22s/it]
Training 13/16 epoch (loss 1.0781): 80%|ββββββββ | 705/880 [1:10:04<26:53, 9.22s/it]
Training 13/16 epoch (loss 1.0781): 80%|ββββββββ | 706/880 [1:10:04<23:50, 8.22s/it]
Training 13/16 epoch (loss 1.0312): 80%|ββββββββ | 706/880 [1:10:09<23:50, 8.22s/it]
Training 13/16 epoch (loss 1.0312): 80%|ββββββββ | 707/880 [1:10:09<20:46, 7.20s/it]
Training 13/16 epoch (loss 1.1406): 80%|ββββββββ | 707/880 [1:10:15<20:46, 7.20s/it]
Training 13/16 epoch (loss 1.1406): 80%|ββββββββ | 708/880 [1:10:15<19:06, 6.67s/it]
Training 13/16 epoch (loss 1.0938): 80%|ββββββββ | 708/880 [1:10:22<19:06, 6.67s/it]
Training 13/16 epoch (loss 1.0938): 81%|ββββββββ | 709/880 [1:10:22<19:16, 6.76s/it]
Training 13/16 epoch (loss 1.2188): 81%|ββββββββ | 709/880 [1:10:27<19:16, 6.76s/it]
Training 13/16 epoch (loss 1.2188): 81%|ββββββββ | 710/880 [1:10:27<18:14, 6.44s/it]
Training 13/16 epoch (loss 1.2891): 81%|ββββββββ | 710/880 [1:10:34<18:14, 6.44s/it]
Training 13/16 epoch (loss 1.2891): 81%|ββββββββ | 711/880 [1:10:34<18:15, 6.48s/it]
Training 13/16 epoch (loss 1.0078): 81%|ββββββββ | 711/880 [1:10:39<18:15, 6.48s/it]
Training 13/16 epoch (loss 1.0078): 81%|ββββββββ | 712/880 [1:10:39<17:13, 6.15s/it]
Training 13/16 epoch (loss 1.4062): 81%|ββββββββ | 712/880 [1:10:44<17:13, 6.15s/it]
Training 13/16 epoch (loss 1.4062): 81%|ββββββββ | 713/880 [1:10:44<15:57, 5.73s/it]
Training 13/16 epoch (loss 1.2969): 81%|ββββββββ | 713/880 [1:10:51<15:57, 5.73s/it]
Training 13/16 epoch (loss 1.2969): 81%|ββββββββ | 714/880 [1:10:51<16:50, 6.09s/it]
Training 13/16 epoch (loss 1.0625): 81%|ββββββββ | 714/880 [1:10:56<16:50, 6.09s/it]
Training 13/16 epoch (loss 1.0625): 81%|βββββββββ | 715/880 [1:10:56<15:32, 5.65s/it]
Training 14/16 epoch (loss 1.2656): 81%|βββββββββ | 715/880 [1:11:01<15:32, 5.65s/it]
Training 14/16 epoch (loss 1.2656): 81%|βββββββββ | 716/880 [1:11:01<15:06, 5.53s/it]
Training 14/16 epoch (loss 1.2969): 81%|βββββββββ | 716/880 [1:11:07<15:06, 5.53s/it]
Training 14/16 epoch (loss 1.2969): 81%|βββββββββ | 717/880 [1:11:07<15:22, 5.66s/it]
Training 14/16 epoch (loss 1.1094): 81%|βββββββββ | 717/880 [1:11:13<15:22, 5.66s/it]
Training 14/16 epoch (loss 1.1094): 82%|βββββββββ | 718/880 [1:11:13<15:33, 5.76s/it]
Training 14/16 epoch (loss 1.4531): 82%|βββββββββ | 718/880 [1:11:17<15:33, 5.76s/it]
Training 14/16 epoch (loss 1.4531): 82%|βββββββββ | 719/880 [1:11:17<14:35, 5.44s/it]
Training 14/16 epoch (loss 1.3594): 82%|βββββββββ | 719/880 [1:11:23<14:35, 5.44s/it]
Training 14/16 epoch (loss 1.3594): 82%|βββββββββ | 720/880 [1:11:23<14:20, 5.38s/it]
Training 14/16 epoch (loss 1.1484): 82%|βββββββββ | 720/880 [1:11:28<14:20, 5.38s/it]
Training 14/16 epoch (loss 1.1484): 82%|βββββββββ | 721/880 [1:11:28<14:35, 5.51s/it]
Training 14/16 epoch (loss 1.2578): 82%|βββββββββ | 721/880 [1:11:33<14:35, 5.51s/it]
Training 14/16 epoch (loss 1.2578): 82%|βββββββββ | 722/880 [1:11:33<13:48, 5.24s/it]
Training 14/16 epoch (loss 1.2344): 82%|βββββββββ | 722/880 [1:11:39<13:48, 5.24s/it]
Training 14/16 epoch (loss 1.2344): 82%|βββββββββ | 723/880 [1:11:39<13:58, 5.34s/it]
Training 14/16 epoch (loss 1.3984): 82%|βββββββββ | 723/880 [1:11:47<13:58, 5.34s/it]
Training 14/16 epoch (loss 1.3984): 82%|βββββββββ | 724/880 [1:11:47<16:21, 6.29s/it]
Training 14/16 epoch (loss 1.3672): 82%|βββββββββ | 724/880 [1:11:52<16:21, 6.29s/it]
Training 14/16 epoch (loss 1.3672): 82%|βββββββββ | 725/880 [1:11:52<15:18, 5.93s/it]
Training 14/16 epoch (loss 1.3672): 82%|βββββββββ | 725/880 [1:11:57<15:18, 5.93s/it]
Training 14/16 epoch (loss 1.3672): 82%|βββββββββ | 726/880 [1:11:57<14:21, 5.59s/it]
Training 14/16 epoch (loss 1.2969): 82%|βββββββββ | 726/880 [1:12:05<14:21, 5.59s/it]
Training 14/16 epoch (loss 1.2969): 83%|βββββββββ | 727/880 [1:12:05<15:41, 6.15s/it]
Training 14/16 epoch (loss 1.1016): 83%|βββββββββ | 727/880 [1:12:09<15:41, 6.15s/it]
Training 14/16 epoch (loss 1.1016): 83%|βββββββββ | 728/880 [1:12:09<14:32, 5.74s/it]
Training 14/16 epoch (loss 1.5312): 83%|βββββββββ | 728/880 [1:12:14<14:32, 5.74s/it]
Training 14/16 epoch (loss 1.5312): 83%|βββββββββ | 729/880 [1:12:14<13:43, 5.45s/it]
Training 14/16 epoch (loss 1.2422): 83%|βββββββββ | 729/880 [1:12:20<13:43, 5.45s/it]
Training 14/16 epoch (loss 1.2422): 83%|βββββββββ | 730/880 [1:12:20<13:40, 5.47s/it]
Training 14/16 epoch (loss 1.3438): 83%|βββββββββ | 730/880 [1:12:25<13:40, 5.47s/it]
Training 14/16 epoch (loss 1.3438): 83%|βββββββββ | 731/880 [1:12:25<13:21, 5.38s/it]
Training 14/16 epoch (loss 1.2188): 83%|βββββββββ | 731/880 [1:12:31<13:21, 5.38s/it]
Training 14/16 epoch (loss 1.2188): 83%|βββββββββ | 732/880 [1:12:31<14:07, 5.73s/it]
Training 14/16 epoch (loss 1.2266): 83%|βββββββββ | 732/880 [1:12:36<14:07, 5.73s/it]
Training 14/16 epoch (loss 1.2266): 83%|βββββββββ | 733/880 [1:12:36<13:31, 5.52s/it]
Training 14/16 epoch (loss 1.2969): 83%|βββββββββ | 733/880 [1:12:41<13:31, 5.52s/it]
Training 14/16 epoch (loss 1.2969): 83%|βββββββββ | 734/880 [1:12:41<13:00, 5.35s/it]
Training 14/16 epoch (loss 0.9688): 83%|βββββββββ | 734/880 [1:12:45<13:00, 5.35s/it]
Training 14/16 epoch (loss 0.9688): 84%|βββββββββ | 735/880 [1:12:45<12:03, 4.99s/it]
Training 14/16 epoch (loss 1.1484): 84%|βββββββββ | 735/880 [1:12:50<12:03, 4.99s/it]
Training 14/16 epoch (loss 1.1484): 84%|βββββββββ | 736/880 [1:12:50<11:55, 4.97s/it]
Training 14/16 epoch (loss 1.3984): 84%|βββββββββ | 736/880 [1:12:56<11:55, 4.97s/it]
Training 14/16 epoch (loss 1.3984): 84%|βββββββββ | 737/880 [1:12:56<12:08, 5.10s/it]
Training 14/16 epoch (loss 1.2031): 84%|βββββββββ | 737/880 [1:13:00<12:08, 5.10s/it]
Training 14/16 epoch (loss 1.2031): 84%|βββββββββ | 738/880 [1:13:00<11:22, 4.81s/it]
Training 14/16 epoch (loss 1.0781): 84%|βββββββββ | 738/880 [1:13:05<11:22, 4.81s/it]
Training 14/16 epoch (loss 1.0781): 84%|βββββββββ | 739/880 [1:13:05<11:30, 4.90s/it]
Training 14/16 epoch (loss 1.1953): 84%|βββββββββ | 739/880 [1:13:17<11:30, 4.90s/it]
Training 14/16 epoch (loss 1.1953): 84%|βββββββββ | 740/880 [1:13:17<16:43, 7.17s/it]
Training 14/16 epoch (loss 1.0000): 84%|βββββββββ | 740/880 [1:13:23<16:43, 7.17s/it]
Training 14/16 epoch (loss 1.0000): 84%|βββββββββ | 741/880 [1:13:23<15:39, 6.76s/it]
Training 14/16 epoch (loss 1.2266): 84%|βββββββββ | 741/880 [1:13:29<15:39, 6.76s/it]
Training 14/16 epoch (loss 1.2266): 84%|βββββββββ | 742/880 [1:13:29<14:35, 6.34s/it]
Training 14/16 epoch (loss 0.9492): 84%|βββββββββ | 742/880 [1:13:34<14:35, 6.34s/it]
Training 14/16 epoch (loss 0.9492): 84%|βββββββββ | 743/880 [1:13:34<13:45, 6.03s/it]
Training 14/16 epoch (loss 1.1250): 84%|βββββββββ | 743/880 [1:13:39<13:45, 6.03s/it]
Training 14/16 epoch (loss 1.1250): 85%|βββββββββ | 744/880 [1:13:39<12:44, 5.62s/it]
Training 14/16 epoch (loss 1.1484): 85%|βββββββββ | 744/880 [1:13:44<12:44, 5.62s/it]
Training 14/16 epoch (loss 1.1484): 85%|βββββββββ | 745/880 [1:13:44<12:35, 5.59s/it]
Training 14/16 epoch (loss 1.1953): 85%|βββββββββ | 745/880 [1:13:49<12:35, 5.59s/it]
Training 14/16 epoch (loss 1.1953): 85%|βββββββββ | 746/880 [1:13:49<11:42, 5.24s/it]
Training 14/16 epoch (loss 1.0078): 85%|βββββββββ | 746/880 [1:13:53<11:42, 5.24s/it]
Training 14/16 epoch (loss 1.0078): 85%|βββββββββ | 747/880 [1:13:53<11:10, 5.04s/it]
Training 14/16 epoch (loss 1.2109): 85%|βββββββββ | 747/880 [1:13:59<11:10, 5.04s/it]
Training 14/16 epoch (loss 1.2109): 85%|βββββββββ | 748/880 [1:13:59<11:22, 5.17s/it]
Training 14/16 epoch (loss 1.0312): 85%|βββββββββ | 748/880 [1:14:04<11:22, 5.17s/it]
Training 14/16 epoch (loss 1.0312): 85%|βββββββββ | 749/880 [1:14:04<11:26, 5.24s/it]
Training 14/16 epoch (loss 1.0469): 85%|βββββββββ | 749/880 [1:14:09<11:26, 5.24s/it]
Training 14/16 epoch (loss 1.0469): 85%|βββββββββ | 750/880 [1:14:09<11:27, 5.29s/it]
Training 14/16 epoch (loss 1.2266): 85%|βββββββββ | 750/880 [1:14:16<11:27, 5.29s/it]
Training 14/16 epoch (loss 1.2266): 85%|βββββββββ | 751/880 [1:14:16<12:06, 5.63s/it]
Training 14/16 epoch (loss 1.2656): 85%|βββββββββ | 751/880 [1:14:30<12:06, 5.63s/it]
Training 14/16 epoch (loss 1.2656): 85%|βββββββββ | 752/880 [1:14:30<17:37, 8.26s/it]
Training 14/16 epoch (loss 1.1094): 85%|βββββββββ | 752/880 [1:14:35<17:37, 8.26s/it]
Training 14/16 epoch (loss 1.1094): 86%|βββββββββ | 753/880 [1:14:35<15:29, 7.32s/it]
Training 14/16 epoch (loss 1.2969): 86%|βββββββββ | 753/880 [1:14:40<15:29, 7.32s/it]
Training 14/16 epoch (loss 1.2969): 86%|βββββββββ | 754/880 [1:14:40<13:38, 6.50s/it]
Training 14/16 epoch (loss 1.1094): 86%|βββββββββ | 754/880 [1:14:45<13:38, 6.50s/it]
Training 14/16 epoch (loss 1.1094): 86%|βββββββββ | 755/880 [1:14:45<12:41, 6.09s/it]
Training 14/16 epoch (loss 1.1328): 86%|βββββββββ | 755/880 [1:14:50<12:41, 6.09s/it]
Training 14/16 epoch (loss 1.1328): 86%|βββββββββ | 756/880 [1:14:50<11:52, 5.74s/it]
Training 14/16 epoch (loss 1.2266): 86%|βββββββββ | 756/880 [1:14:55<11:52, 5.74s/it]
Training 14/16 epoch (loss 1.2266): 86%|βββββββββ | 757/880 [1:14:55<11:34, 5.65s/it]
Training 14/16 epoch (loss 1.0625): 86%|βββββββββ | 757/880 [1:15:02<11:34, 5.65s/it]
Training 14/16 epoch (loss 1.0625): 86%|βββββββββ | 758/880 [1:15:02<11:48, 5.81s/it]
Training 14/16 epoch (loss 1.5703): 86%|βββββββββ | 758/880 [1:15:09<11:48, 5.81s/it]
Training 14/16 epoch (loss 1.5703): 86%|βββββββββ | 759/880 [1:15:09<12:57, 6.43s/it]
Training 14/16 epoch (loss 1.2578): 86%|βββββββββ | 759/880 [1:15:25<12:57, 6.43s/it]
Training 14/16 epoch (loss 1.2578): 86%|βββββββββ | 760/880 [1:15:25<18:27, 9.23s/it]
Training 14/16 epoch (loss 1.0547): 86%|βββββββββ | 760/880 [1:15:31<18:27, 9.23s/it]
Training 14/16 epoch (loss 1.0547): 86%|βββββββββ | 761/880 [1:15:31<16:19, 8.23s/it]
Training 14/16 epoch (loss 1.0078): 86%|βββββββββ | 761/880 [1:15:36<16:19, 8.23s/it]
Training 14/16 epoch (loss 1.0078): 87%|βββββββββ | 762/880 [1:15:36<14:11, 7.21s/it]
Training 14/16 epoch (loss 1.1172): 87%|βββββββββ | 762/880 [1:15:41<14:11, 7.21s/it]
Training 14/16 epoch (loss 1.1172): 87%|βββββββββ | 763/880 [1:15:41<13:00, 6.67s/it]
Training 14/16 epoch (loss 1.0625): 87%|βββββββββ | 763/880 [1:15:48<13:00, 6.67s/it]
Training 14/16 epoch (loss 1.0625): 87%|βββββββββ | 764/880 [1:15:48<13:05, 6.77s/it]
Training 14/16 epoch (loss 1.1875): 87%|βββββββββ | 764/880 [1:15:54<13:05, 6.77s/it]
Training 14/16 epoch (loss 1.1875): 87%|βββββββββ | 765/880 [1:15:54<12:20, 6.44s/it]
Training 14/16 epoch (loss 1.2656): 87%|βββββββββ | 765/880 [1:16:01<12:20, 6.44s/it]
Training 14/16 epoch (loss 1.2656): 87%|βββββββββ | 766/880 [1:16:01<12:19, 6.48s/it]
Training 14/16 epoch (loss 0.9805): 87%|βββββββββ | 766/880 [1:16:06<12:19, 6.48s/it]
Training 14/16 epoch (loss 0.9805): 87%|βββββββββ | 767/880 [1:16:06<11:35, 6.15s/it]
Training 14/16 epoch (loss 1.3750): 87%|βββββββββ | 767/880 [1:16:11<11:35, 6.15s/it]
Training 14/16 epoch (loss 1.3750): 87%|βββββββββ | 768/880 [1:16:11<10:42, 5.73s/it]
Training 14/16 epoch (loss 1.2734): 87%|βββββββββ | 768/880 [1:16:18<10:42, 5.73s/it]
Training 14/16 epoch (loss 1.2734): 87%|βββββββββ | 769/880 [1:16:18<11:15, 6.09s/it]
Training 14/16 epoch (loss 1.0391): 87%|βββββββββ | 769/880 [1:16:22<11:15, 6.09s/it]
Training 14/16 epoch (loss 1.0391): 88%|βββββββββ | 770/880 [1:16:22<10:21, 5.65s/it]
Training 15/16 epoch (loss 1.2500): 88%|βββββββββ | 770/880 [1:16:28<10:21, 5.65s/it]
Training 15/16 epoch (loss 1.2500): 88%|βββββββββ | 771/880 [1:16:28<10:02, 5.53s/it]
Training 15/16 epoch (loss 1.2734): 88%|βββββββββ | 771/880 [1:16:34<10:02, 5.53s/it]
Training 15/16 epoch (loss 1.2734): 88%|βββββββββ | 772/880 [1:16:34<10:11, 5.67s/it]
Training 15/16 epoch (loss 1.0938): 88%|βββββββββ | 772/880 [1:16:40<10:11, 5.67s/it]
Training 15/16 epoch (loss 1.0938): 88%|βββββββββ | 773/880 [1:16:40<10:17, 5.77s/it]
Training 15/16 epoch (loss 1.4219): 88%|βββββββββ | 773/880 [1:16:44<10:17, 5.77s/it]
Training 15/16 epoch (loss 1.4219): 88%|βββββββββ | 774/880 [1:16:44<09:37, 5.45s/it]
Training 15/16 epoch (loss 1.3281): 88%|βββββββββ | 774/880 [1:16:50<09:37, 5.45s/it]
Training 15/16 epoch (loss 1.3281): 88%|βββββββββ | 775/880 [1:16:50<09:25, 5.38s/it]
Training 15/16 epoch (loss 1.1172): 88%|βββββββββ | 775/880 [1:16:55<09:25, 5.38s/it]
Training 15/16 epoch (loss 1.1172): 88%|βββββββββ | 776/880 [1:16:55<09:33, 5.51s/it]
Training 15/16 epoch (loss 1.2344): 88%|βββββββββ | 776/880 [1:17:00<09:33, 5.51s/it]
Training 15/16 epoch (loss 1.2344): 88%|βββββββββ | 777/880 [1:17:00<09:00, 5.25s/it]
Training 15/16 epoch (loss 1.2109): 88%|βββββββββ | 777/880 [1:17:06<09:00, 5.25s/it]
Training 15/16 epoch (loss 1.2109): 88%|βββββββββ | 778/880 [1:17:06<09:05, 5.34s/it]
Training 15/16 epoch (loss 1.3828): 88%|βββββββββ | 778/880 [1:17:14<09:05, 5.34s/it]
Training 15/16 epoch (loss 1.3828): 89%|βββββββββ | 779/880 [1:17:14<10:35, 6.29s/it]
Training 15/16 epoch (loss 1.3438): 89%|βββββββββ | 779/880 [1:17:19<10:35, 6.29s/it]
Training 15/16 epoch (loss 1.3438): 89%|βββββββββ | 780/880 [1:17:19<09:52, 5.92s/it]
Training 15/16 epoch (loss 1.3516): 89%|βββββββββ | 780/880 [1:17:24<09:52, 5.92s/it]
Training 15/16 epoch (loss 1.3516): 89%|βββββββββ | 781/880 [1:17:24<09:13, 5.59s/it]
Training 15/16 epoch (loss 1.2734): 89%|βββββββββ | 781/880 [1:17:31<09:13, 5.59s/it]
Training 15/16 epoch (loss 1.2734): 89%|βββββββββ | 782/880 [1:17:31<10:02, 6.15s/it]
Training 15/16 epoch (loss 1.0859): 89%|βββββββββ | 782/880 [1:17:36<10:02, 6.15s/it]
Training 15/16 epoch (loss 1.0859): 89%|βββββββββ | 783/880 [1:17:36<09:17, 5.74s/it]
Training 15/16 epoch (loss 1.5156): 89%|βββββββββ | 783/880 [1:17:41<09:17, 5.74s/it]
Training 15/16 epoch (loss 1.5156): 89%|βββββββββ | 784/880 [1:17:41<08:43, 5.46s/it]
Training 15/16 epoch (loss 1.2188): 89%|βββββββββ | 784/880 [1:17:46<08:43, 5.46s/it]
Training 15/16 epoch (loss 1.2188): 89%|βββββββββ | 785/880 [1:17:46<08:40, 5.47s/it]
Training 15/16 epoch (loss 1.3125): 89%|βββββββββ | 785/880 [1:17:52<08:40, 5.47s/it]
Training 15/16 epoch (loss 1.3125): 89%|βββββββββ | 786/880 [1:17:52<08:26, 5.38s/it]
Training 15/16 epoch (loss 1.1953): 89%|βββββββββ | 786/880 [1:17:58<08:26, 5.38s/it]
Training 15/16 epoch (loss 1.1953): 89%|βββββββββ | 787/880 [1:17:58<08:52, 5.73s/it]
Training 15/16 epoch (loss 1.2188): 89%|βββββββββ | 787/880 [1:18:03<08:52, 5.73s/it]
Training 15/16 epoch (loss 1.2188): 90%|βββββββββ | 788/880 [1:18:03<08:28, 5.52s/it]
Training 15/16 epoch (loss 1.2734): 90%|βββββββββ | 788/880 [1:18:08<08:28, 5.52s/it]
Training 15/16 epoch (loss 1.2734): 90%|βββββββββ | 789/880 [1:18:08<08:06, 5.35s/it]
Training 15/16 epoch (loss 0.9453): 90%|βββββββββ | 789/880 [1:18:12<08:06, 5.35s/it]
Training 15/16 epoch (loss 0.9453): 90%|βββββββββ | 790/880 [1:18:12<07:29, 4.99s/it]
Training 15/16 epoch (loss 1.1250): 90%|βββββββββ | 790/880 [1:18:17<07:29, 4.99s/it]
Training 15/16 epoch (loss 1.1250): 90%|βββββββββ | 791/880 [1:18:17<07:21, 4.97s/it]
Training 15/16 epoch (loss 1.3828): 90%|βββββββββ | 791/880 [1:18:23<07:21, 4.97s/it]
Training 15/16 epoch (loss 1.3828): 90%|βββββββββ | 792/880 [1:18:23<07:28, 5.09s/it]
Training 15/16 epoch (loss 1.1797): 90%|βββββββββ | 792/880 [1:18:27<07:28, 5.09s/it]
Training 15/16 epoch (loss 1.1797): 90%|βββββββββ | 793/880 [1:18:27<06:57, 4.80s/it]
Training 15/16 epoch (loss 1.0625): 90%|βββββββββ | 793/880 [1:18:32<06:57, 4.80s/it]
Training 15/16 epoch (loss 1.0625): 90%|βββββββββ | 794/880 [1:18:32<07:00, 4.89s/it]
Training 15/16 epoch (loss 1.1719): 90%|βββββββββ | 794/880 [1:18:44<07:00, 4.89s/it]
Training 15/16 epoch (loss 1.1719): 90%|βββββββββ | 795/880 [1:18:44<10:09, 7.17s/it]
Training 15/16 epoch (loss 0.9844): 90%|βββββββββ | 795/880 [1:18:50<10:09, 7.17s/it]
Training 15/16 epoch (loss 0.9844): 90%|βββββββββ | 796/880 [1:18:50<09:28, 6.76s/it]
Training 15/16 epoch (loss 1.2109): 90%|βββββββββ | 796/880 [1:18:55<09:28, 6.76s/it]
Training 15/16 epoch (loss 1.2109): 91%|βββββββββ | 797/880 [1:18:55<08:46, 6.34s/it]
Training 15/16 epoch (loss 0.9297): 91%|βββββββββ | 797/880 [1:19:01<08:46, 6.34s/it]
Training 15/16 epoch (loss 0.9297): 91%|βββββββββ | 798/880 [1:19:01<08:14, 6.03s/it]
Training 15/16 epoch (loss 1.1094): 91%|βββββββββ | 798/880 [1:19:05<08:14, 6.03s/it]
Training 15/16 epoch (loss 1.1094): 91%|βββββββββ | 799/880 [1:19:05<07:35, 5.62s/it]
Training 15/16 epoch (loss 1.1328): 91%|βββββββββ | 799/880 [1:19:11<07:35, 5.62s/it]
Training 15/16 epoch (loss 1.1328): 91%|βββββββββ | 800/880 [1:19:11<07:27, 5.59s/it]
Training 15/16 epoch (loss 1.1719): 91%|βββββββββ | 800/880 [1:19:15<07:27, 5.59s/it]
Training 15/16 epoch (loss 1.1719): 91%|βββββββββ | 801/880 [1:19:15<06:53, 5.24s/it]
Training 15/16 epoch (loss 0.9961): 91%|βββββββββ | 801/880 [1:19:20<06:53, 5.24s/it]
Training 15/16 epoch (loss 0.9961): 91%|βββββββββ | 802/880 [1:19:20<06:33, 5.04s/it]
Training 15/16 epoch (loss 1.1875): 91%|βββββββββ | 802/880 [1:19:25<06:33, 5.04s/it]
Training 15/16 epoch (loss 1.1875): 91%|ββββββββββ| 803/880 [1:19:25<06:37, 5.17s/it]
Training 15/16 epoch (loss 1.0234): 91%|ββββββββββ| 803/880 [1:19:31<06:37, 5.17s/it]
Training 15/16 epoch (loss 1.0234): 91%|ββββββββββ| 804/880 [1:19:31<06:37, 5.23s/it]
Training 15/16 epoch (loss 1.0234): 91%|ββββββββββ| 804/880 [1:19:36<06:37, 5.23s/it]
Training 15/16 epoch (loss 1.0234): 91%|ββββββββββ| 805/880 [1:19:36<06:36, 5.28s/it]
Training 15/16 epoch (loss 1.2031): 91%|ββββββββββ| 805/880 [1:19:43<06:36, 5.28s/it]
Training 15/16 epoch (loss 1.2031): 92%|ββββββββββ| 806/880 [1:19:43<06:56, 5.63s/it]
Training 15/16 epoch (loss 1.2422): 92%|ββββββββββ| 806/880 [1:19:57<06:56, 5.63s/it]
Training 15/16 epoch (loss 1.2422): 92%|ββββββββββ| 807/880 [1:19:57<10:02, 8.25s/it]
Training 15/16 epoch (loss 1.0938): 92%|ββββββββββ| 807/880 [1:20:02<10:02, 8.25s/it]
Training 15/16 epoch (loss 1.0938): 92%|ββββββββββ| 808/880 [1:20:02<08:46, 7.32s/it]
Training 15/16 epoch (loss 1.2734): 92%|ββββββββββ| 808/880 [1:20:07<08:46, 7.32s/it]
Training 15/16 epoch (loss 1.2734): 92%|ββββββββββ| 809/880 [1:20:07<07:40, 6.49s/it]
Training 15/16 epoch (loss 1.0938): 92%|ββββββββββ| 809/880 [1:20:12<07:40, 6.49s/it]
Training 15/16 epoch (loss 1.0938): 92%|ββββββββββ| 810/880 [1:20:12<07:06, 6.09s/it]
Training 15/16 epoch (loss 1.1094): 92%|ββββββββββ| 810/880 [1:20:17<07:06, 6.09s/it]
Training 15/16 epoch (loss 1.1094): 92%|ββββββββββ| 811/880 [1:20:17<06:36, 5.74s/it]
Training 15/16 epoch (loss 1.2109): 92%|ββββββββββ| 811/880 [1:20:22<06:36, 5.74s/it]
Training 15/16 epoch (loss 1.2109): 92%|ββββββββββ| 812/880 [1:20:22<06:23, 5.65s/it]
Training 15/16 epoch (loss 1.0469): 92%|ββββββββββ| 812/880 [1:20:28<06:23, 5.65s/it]
Training 15/16 epoch (loss 1.0469): 92%|ββββββββββ| 813/880 [1:20:28<06:28, 5.80s/it]
Training 15/16 epoch (loss 1.5469): 92%|ββββββββββ| 813/880 [1:20:36<06:28, 5.80s/it]
Training 15/16 epoch (loss 1.5469): 92%|ββββββββββ| 814/880 [1:20:36<07:03, 6.42s/it]
Training 15/16 epoch (loss 1.2344): 92%|ββββββββββ| 814/880 [1:20:52<07:03, 6.42s/it]
Training 15/16 epoch (loss 1.2344): 93%|ββββββββββ| 815/880 [1:20:52<09:59, 9.22s/it]
Training 15/16 epoch (loss 1.0469): 93%|ββββββββββ| 815/880 [1:20:58<09:59, 9.22s/it]
Training 15/16 epoch (loss 1.0469): 93%|ββββββββββ| 816/880 [1:20:58<08:46, 8.22s/it]
Training 15/16 epoch (loss 1.0000): 93%|ββββββββββ| 816/880 [1:21:03<08:46, 8.22s/it]
Training 15/16 epoch (loss 1.0000): 93%|ββββββββββ| 817/880 [1:21:03<07:33, 7.20s/it]
Training 15/16 epoch (loss 1.0938): 93%|ββββββββββ| 817/880 [1:21:08<07:33, 7.20s/it]
Training 15/16 epoch (loss 1.0938): 93%|ββββββββββ| 818/880 [1:21:08<06:53, 6.67s/it]
Training 15/16 epoch (loss 1.0469): 93%|ββββββββββ| 818/880 [1:21:15<06:53, 6.67s/it]
Training 15/16 epoch (loss 1.0469): 93%|ββββββββββ| 819/880 [1:21:15<06:52, 6.77s/it]
Training 15/16 epoch (loss 1.1797): 93%|ββββββββββ| 819/880 [1:21:21<06:52, 6.77s/it]
Training 15/16 epoch (loss 1.1797): 93%|ββββββββββ| 820/880 [1:21:21<06:26, 6.44s/it]
Training 15/16 epoch (loss 1.2578): 93%|ββββββββββ| 820/880 [1:21:27<06:26, 6.44s/it]
Training 15/16 epoch (loss 1.2578): 93%|ββββββββββ| 821/880 [1:21:27<06:22, 6.49s/it]
Training 15/16 epoch (loss 0.9766): 93%|ββββββββββ| 821/880 [1:21:33<06:22, 6.49s/it]
Training 15/16 epoch (loss 0.9766): 93%|ββββββββββ| 822/880 [1:21:33<05:56, 6.15s/it]
Training 15/16 epoch (loss 1.3672): 93%|ββββββββββ| 822/880 [1:21:38<05:56, 6.15s/it]
Training 15/16 epoch (loss 1.3672): 94%|ββββββββββ| 823/880 [1:21:38<05:26, 5.73s/it]
Training 15/16 epoch (loss 1.2656): 94%|ββββββββββ| 823/880 [1:21:44<05:26, 5.73s/it]
Training 15/16 epoch (loss 1.2656): 94%|ββββββββββ| 824/880 [1:21:44<05:40, 6.09s/it]
Training 15/16 epoch (loss 1.0312): 94%|ββββββββββ| 824/880 [1:21:49<05:40, 6.09s/it]
Training 15/16 epoch (loss 1.0312): 94%|ββββββββββ| 825/880 [1:21:49<05:10, 5.65s/it]
Training 16/16 epoch (loss 1.2422): 94%|ββββββββββ| 825/880 [1:21:54<05:10, 5.65s/it]
Training 16/16 epoch (loss 1.2422): 94%|ββββββββββ| 826/880 [1:21:54<04:58, 5.52s/it]
Training 16/16 epoch (loss 1.2578): 94%|ββββββββββ| 826/880 [1:22:00<04:58, 5.52s/it]
Training 16/16 epoch (loss 1.2578): 94%|ββββββββββ| 827/880 [1:22:00<05:00, 5.66s/it]
Training 16/16 epoch (loss 1.0859): 94%|ββββββββββ| 827/880 [1:22:06<05:00, 5.66s/it]
Training 16/16 epoch (loss 1.0859): 94%|ββββββββββ| 828/880 [1:22:06<04:59, 5.76s/it]
Training 16/16 epoch (loss 1.4062): 94%|ββββββββββ| 828/880 [1:22:11<04:59, 5.76s/it]
Training 16/16 epoch (loss 1.4062): 94%|ββββββββββ| 829/880 [1:22:11<04:37, 5.44s/it]
Training 16/16 epoch (loss 1.3203): 94%|ββββββββββ| 829/880 [1:22:16<04:37, 5.44s/it]
Training 16/16 epoch (loss 1.3203): 94%|ββββββββββ| 830/880 [1:22:16<04:29, 5.38s/it]
Training 16/16 epoch (loss 1.1094): 94%|ββββββββββ| 830/880 [1:22:22<04:29, 5.38s/it]
Training 16/16 epoch (loss 1.1094): 94%|ββββββββββ| 831/880 [1:22:22<04:30, 5.51s/it]
Training 16/16 epoch (loss 1.2188): 94%|ββββββββββ| 831/880 [1:22:27<04:30, 5.51s/it]
Training 16/16 epoch (loss 1.2188): 95%|ββββββββββ| 832/880 [1:22:27<04:12, 5.25s/it]
Training 16/16 epoch (loss 1.1875): 95%|ββββββββββ| 832/880 [1:22:32<04:12, 5.25s/it]
Training 16/16 epoch (loss 1.1875): 95%|ββββββββββ| 833/880 [1:22:32<04:11, 5.35s/it]
Training 16/16 epoch (loss 1.3594): 95%|ββββββββββ| 833/880 [1:22:41<04:11, 5.35s/it]
Training 16/16 epoch (loss 1.3594): 95%|ββββββββββ| 834/880 [1:22:41<04:49, 6.30s/it]
Training 16/16 epoch (loss 1.3359): 95%|ββββββββββ| 834/880 [1:22:46<04:49, 6.30s/it]
Training 16/16 epoch (loss 1.3359): 95%|ββββββββββ| 835/880 [1:22:46<04:26, 5.93s/it]
Training 16/16 epoch (loss 1.3438): 95%|ββββββββββ| 835/880 [1:22:51<04:26, 5.93s/it]
Training 16/16 epoch (loss 1.3438): 95%|ββββββββββ| 836/880 [1:22:51<04:06, 5.59s/it]
Training 16/16 epoch (loss 1.2734): 95%|ββββββββββ| 836/880 [1:22:58<04:06, 5.59s/it]
Training 16/16 epoch (loss 1.2734): 95%|ββββββββββ| 837/880 [1:22:58<04:24, 6.15s/it]
Training 16/16 epoch (loss 1.0781): 95%|ββββββββββ| 837/880 [1:23:03<04:24, 6.15s/it]
Training 16/16 epoch (loss 1.0781): 95%|ββββββββββ| 838/880 [1:23:03<04:01, 5.74s/it]
Training 16/16 epoch (loss 1.5078): 95%|ββββββββββ| 838/880 [1:23:08<04:01, 5.74s/it]
Training 16/16 epoch (loss 1.5078): 95%|ββββββββββ| 839/880 [1:23:08<03:43, 5.45s/it]
Training 16/16 epoch (loss 1.2188): 95%|ββββββββββ| 839/880 [1:23:13<03:43, 5.45s/it]
Training 16/16 epoch (loss 1.2188): 95%|ββββββββββ| 840/880 [1:23:13<03:38, 5.47s/it]
Training 16/16 epoch (loss 1.3047): 95%|ββββββββββ| 840/880 [1:23:18<03:38, 5.47s/it]
Training 16/16 epoch (loss 1.3047): 96%|ββββββββββ| 841/880 [1:23:18<03:29, 5.38s/it]
Training 16/16 epoch (loss 1.1953): 96%|ββββββββββ| 841/880 [1:23:25<03:29, 5.38s/it]
Training 16/16 epoch (loss 1.1953): 96%|ββββββββββ| 842/880 [1:23:25<03:37, 5.73s/it]
Training 16/16 epoch (loss 1.2109): 96%|ββββββββββ| 842/880 [1:23:30<03:37, 5.73s/it]
Training 16/16 epoch (loss 1.2109): 96%|ββββββββββ| 843/880 [1:23:30<03:24, 5.53s/it]
Training 16/16 epoch (loss 1.2656): 96%|ββββββββββ| 843/880 [1:23:35<03:24, 5.53s/it]
Training 16/16 epoch (loss 1.2656): 96%|ββββββββββ| 844/880 [1:23:35<03:12, 5.36s/it]
Training 16/16 epoch (loss 0.9375): 96%|ββββββββββ| 844/880 [1:23:39<03:12, 5.36s/it]
Training 16/16 epoch (loss 0.9375): 96%|ββββββββββ| 845/880 [1:23:39<02:54, 5.00s/it]
Training 16/16 epoch (loss 1.1094): 96%|ββββββββββ| 845/880 [1:23:44<02:54, 5.00s/it]
Training 16/16 epoch (loss 1.1094): 96%|ββββββββββ| 846/880 [1:23:44<02:48, 4.97s/it]
Training 16/16 epoch (loss 1.3750): 96%|ββββββββββ| 846/880 [1:23:49<02:48, 4.97s/it]
Training 16/16 epoch (loss 1.3750): 96%|ββββββββββ| 847/880 [1:23:49<02:48, 5.10s/it]
Training 16/16 epoch (loss 1.1797): 96%|ββββββββββ| 847/880 [1:23:54<02:48, 5.10s/it]
Training 16/16 epoch (loss 1.1797): 96%|ββββββββββ| 848/880 [1:23:54<02:33, 4.80s/it]
Training 16/16 epoch (loss 1.0547): 96%|ββββββββββ| 848/880 [1:23:59<02:33, 4.80s/it]
Training 16/16 epoch (loss 1.0547): 96%|ββββββββββ| 849/880 [1:23:59<02:31, 4.89s/it]
Training 16/16 epoch (loss 1.1641): 96%|ββββββββββ| 849/880 [1:24:11<02:31, 4.89s/it]
Training 16/16 epoch (loss 1.1641): 97%|ββββββββββ| 850/880 [1:24:11<03:34, 7.17s/it]
Training 16/16 epoch (loss 0.9688): 97%|ββββββββββ| 850/880 [1:24:17<03:34, 7.17s/it]
Training 16/16 epoch (loss 0.9688): 97%|ββββββββββ| 851/880 [1:24:17<03:15, 6.76s/it]
Training 16/16 epoch (loss 1.2109): 97%|ββββββββββ| 851/880 [1:24:22<03:15, 6.76s/it]
Training 16/16 epoch (loss 1.2109): 97%|ββββββββββ| 852/880 [1:24:22<02:57, 6.34s/it]
Training 16/16 epoch (loss 0.9141): 97%|ββββββββββ| 852/880 [1:24:28<02:57, 6.34s/it]
Training 16/16 epoch (loss 0.9141): 97%|ββββββββββ| 853/880 [1:24:28<02:42, 6.03s/it]
Training 16/16 epoch (loss 1.1016): 97%|ββββββββββ| 853/880 [1:24:32<02:42, 6.03s/it]
Training 16/16 epoch (loss 1.1016): 97%|ββββββββββ| 854/880 [1:24:32<02:26, 5.62s/it]
Training 16/16 epoch (loss 1.1250): 97%|ββββββββββ| 854/880 [1:24:38<02:26, 5.62s/it]
Training 16/16 epoch (loss 1.1250): 97%|ββββββββββ| 855/880 [1:24:38<02:19, 5.59s/it]
Training 16/16 epoch (loss 1.1719): 97%|ββββββββββ| 855/880 [1:24:42<02:19, 5.59s/it]
Training 16/16 epoch (loss 1.1719): 97%|ββββββββββ| 856/880 [1:24:42<02:05, 5.24s/it]
Training 16/16 epoch (loss 0.9922): 97%|ββββββββββ| 856/880 [1:24:47<02:05, 5.24s/it]
Training 16/16 epoch (loss 0.9922): 97%|ββββββββββ| 857/880 [1:24:47<01:55, 5.04s/it]
Training 16/16 epoch (loss 1.1797): 97%|ββββββββββ| 857/880 [1:24:52<01:55, 5.04s/it]
Training 16/16 epoch (loss 1.1797): 98%|ββββββββββ| 858/880 [1:24:52<01:53, 5.17s/it]
Training 16/16 epoch (loss 1.0078): 98%|ββββββββββ| 858/880 [1:24:58<01:53, 5.17s/it]
Training 16/16 epoch (loss 1.0078): 98%|ββββββββββ| 859/880 [1:24:58<01:49, 5.23s/it]
Training 16/16 epoch (loss 1.0234): 98%|ββββββββββ| 859/880 [1:25:03<01:49, 5.23s/it]
Training 16/16 epoch (loss 1.0234): 98%|ββββββββββ| 860/880 [1:25:03<01:45, 5.28s/it]
Training 16/16 epoch (loss 1.2031): 98%|ββββββββββ| 860/880 [1:25:09<01:45, 5.28s/it]
Training 16/16 epoch (loss 1.2031): 98%|ββββββββββ| 861/880 [1:25:09<01:46, 5.62s/it]
Training 16/16 epoch (loss 1.2344): 98%|ββββββββββ| 861/880 [1:25:24<01:46, 5.62s/it]
Training 16/16 epoch (loss 1.2344): 98%|ββββββββββ| 862/880 [1:25:24<02:28, 8.25s/it]
Training 16/16 epoch (loss 1.0781): 98%|ββββββββββ| 862/880 [1:25:29<02:28, 8.25s/it]
Training 16/16 epoch (loss 1.0781): 98%|ββββββββββ| 863/880 [1:25:29<02:04, 7.32s/it]
Training 16/16 epoch (loss 1.2656): 98%|ββββββββββ| 863/880 [1:25:33<02:04, 7.32s/it]
Training 16/16 epoch (loss 1.2656): 98%|ββββββββββ| 864/880 [1:25:33<01:43, 6.49s/it]
Training 16/16 epoch (loss 1.0859): 98%|ββββββββββ| 864/880 [1:25:39<01:43, 6.49s/it]
Training 16/16 epoch (loss 1.0859): 98%|ββββββββββ| 865/880 [1:25:39<01:31, 6.10s/it]
Training 16/16 epoch (loss 1.1016): 98%|ββββββββββ| 865/880 [1:25:44<01:31, 6.10s/it]
Training 16/16 epoch (loss 1.1016): 98%|ββββββββββ| 866/880 [1:25:44<01:20, 5.75s/it]
Training 16/16 epoch (loss 1.1953): 98%|ββββββββββ| 866/880 [1:25:49<01:20, 5.75s/it]
Training 16/16 epoch (loss 1.1953): 99%|ββββββββββ| 867/880 [1:25:49<01:13, 5.65s/it]
Training 16/16 epoch (loss 1.0469): 99%|ββββββββββ| 867/880 [1:25:55<01:13, 5.65s/it]
Training 16/16 epoch (loss 1.0469): 99%|ββββββββββ| 868/880 [1:25:55<01:09, 5.81s/it]
Training 16/16 epoch (loss 1.5469): 99%|ββββββββββ| 868/880 [1:26:03<01:09, 5.81s/it]
Training 16/16 epoch (loss 1.5469): 99%|ββββββββββ| 869/880 [1:26:03<01:10, 6.43s/it]
Training 16/16 epoch (loss 1.2266): 99%|ββββββββββ| 869/880 [1:26:19<01:10, 6.43s/it]
Training 16/16 epoch (loss 1.2266): 99%|ββββββββββ| 870/880 [1:26:19<01:32, 9.22s/it]
Training 16/16 epoch (loss 1.0391): 99%|ββββββββββ| 870/880 [1:26:25<01:32, 9.22s/it]
Training 16/16 epoch (loss 1.0391): 99%|ββββββββββ| 871/880 [1:26:25<01:14, 8.22s/it]
Training 16/16 epoch (loss 0.9922): 99%|ββββββββββ| 871/880 [1:26:30<01:14, 8.22s/it]
Training 16/16 epoch (loss 0.9922): 99%|ββββββββββ| 872/880 [1:26:30<00:57, 7.20s/it]
Training 16/16 epoch (loss 1.0859): 99%|ββββββββββ| 872/880 [1:26:35<00:57, 7.20s/it]
Training 16/16 epoch (loss 1.0859): 99%|ββββββββββ| 873/880 [1:26:35<00:46, 6.67s/it]
Training 16/16 epoch (loss 1.0469): 99%|ββββββββββ| 873/880 [1:26:42<00:46, 6.67s/it]
Training 16/16 epoch (loss 1.0469): 99%|ββββββββββ| 874/880 [1:26:42<00:40, 6.76s/it]
Training 16/16 epoch (loss 1.1719): 99%|ββββββββββ| 874/880 [1:26:48<00:40, 6.76s/it]
Training 16/16 epoch (loss 1.1719): 99%|ββββββββββ| 875/880 [1:26:48<00:32, 6.44s/it]
Training 16/16 epoch (loss 1.2500): 99%|ββββββββββ| 875/880 [1:26:54<00:32, 6.44s/it]
Training 16/16 epoch (loss 1.2500): 100%|ββββββββββ| 876/880 [1:26:54<00:25, 6.49s/it]
Training 16/16 epoch (loss 0.9648): 100%|ββββββββββ| 876/880 [1:27:00<00:25, 6.49s/it]
Training 16/16 epoch (loss 0.9648): 100%|ββββββββββ| 877/880 [1:27:00<00:18, 6.16s/it]
Training 16/16 epoch (loss 1.3594): 100%|ββββββββββ| 877/880 [1:27:04<00:18, 6.16s/it]
Training 16/16 epoch (loss 1.3594): 100%|ββββββββββ| 878/880 [1:27:04<00:11, 5.74s/it]
Training 16/16 epoch (loss 1.2500): 100%|ββββββββββ| 878/880 [1:27:11<00:11, 5.74s/it]
Training 16/16 epoch (loss 1.2500): 100%|ββββββββββ| 879/880 [1:27:11<00:06, 6.09s/it]
Training 16/16 epoch (loss 1.0234): 100%|ββββββββββ| 879/880 [1:27:16<00:06, 6.09s/it]
Training 16/16 epoch (loss 1.0234): 100%|ββββββββββ| 880/880 [1:27:16<00:00, 5.65s/it]
Training 16/16 epoch (loss 1.0234): 100%|ββββββββββ| 880/880 [1:27:16<00:00, 5.95s/it]
/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Run history:
wandb: train/epoch [sparkline omitted]
wandb: train/loss [sparkline omitted]
wandb: train/lr [sparkline omitted]
wandb: train/step [sparkline omitted]
wandb:
wandb: Run summary:
wandb: train/epoch 16.0
wandb: train/loss 1.02344
wandb: train/lr 0.0
wandb: train/step 880
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /home/paperspace/safe-rlhf/output/sft/wandb/offline-run-20230725_194014-2rh62cpq
wandb: Find logs at: ./output/sft/wandb/offline-run-20230725_194014-2rh62cpq/logs