+ deepspeed --num_nodes=1 --num_gpus=8 --master_port 35109 --module safe_rlhf.finetune --train_datasets bt --model_name_or_path cerebras/btlm-3b-8k-base --max_length 8092 --trust_remote_code True --epochs 16 --per_device_train_batch_size 8 --per_device_eval_batch_size 2 --gradient_accumulation_steps 1 --gradient_checkpointing --learning_rate 4.7e-6 --lr_scheduler_type cosine --num_warmup_steps 20 --weight_decay 0.0 --seed 42 --output_dir /home/paperspace/safe-rlhf/output/sft --log_type wandb --log_project BT-Training --zero_stage 2 --bf16 True --tf32 True
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
WARNING:datasets.builder:Using custom data configuration robertmyers--sakura-541a529765142ab6
WARNING:datasets.builder:Reusing dataset parquet (/home/paperspace/.cache/huggingface/datasets/robertmyers___parquet/robertmyers--sakura-541a529765142ab6/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/paperspace/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/paperspace/.cache/torch_extensions/py39_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
wandb: Tracking run with wandb version 0.13.4
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.

Training 1/16 epoch:   0%|          | 0/880 [00:00<?, ?it/s]WARNING:transformers_modules.cerebras.btlm-3b-8k-base.099ed6b507c686ba96229c0ab34201fee7415cae.modeling_btlm:`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...

Training 1/16 epoch (loss 6.2812):   0%|          | 0/880 [00:05<?, ?it/s]
Training 1/16 epoch (loss 6.2812):   0%|          | 1/880 [00:05<1:27:36,  5.98s/it]
Training 1/16 epoch (loss 6.2812):   0%|          | 1/880 [00:12<1:27:36,  5.98s/it]
Training 1/16 epoch (loss 6.2812):   0%|          | 2/880 [00:12<1:28:07,  6.02s/it]
Training 1/16 epoch (loss 6.2812):   0%|          | 2/880 [00:18<1:28:07,  6.02s/it]
Training 1/16 epoch (loss 6.2812):   0%|          | 3/880 [00:18<1:30:05,  6.16s/it]
Training 1/16 epoch (loss 6.2812):   0%|          | 3/880 [00:23<1:30:05,  6.16s/it]
Training 1/16 epoch (loss 6.2812):   0%|          | 4/880 [00:23<1:22:47,  5.67s/it]
Training 1/16 epoch (loss 6.2188):   0%|          | 4/880 [00:28<1:22:47,  5.67s/it]
Training 1/16 epoch (loss 6.2188):   1%|          | 5/880 [00:28<1:22:41,  5.67s/it]
Training 1/16 epoch (loss 6.3750):   1%|          | 5/880 [00:34<1:22:41,  5.67s/it]
Training 1/16 epoch (loss 6.3750):   1%|          | 6/880 [00:34<1:23:25,  5.73s/it]
Training 1/16 epoch (loss 6.3750):   1%|          | 6/880 [00:39<1:23:25,  5.73s/it]
Training 1/16 epoch (loss 6.3750):   1%|          | 7/880 [00:39<1:18:11,  5.37s/it]
Training 1/16 epoch (loss 6.2500):   1%|          | 7/880 [00:45<1:18:11,  5.37s/it]
Training 1/16 epoch (loss 6.2500):   1%|          | 8/880 [00:45<1:21:21,  5.60s/it]
Training 1/16 epoch (loss 6.3438):   1%|          | 8/880 [00:54<1:21:21,  5.60s/it]
Training 1/16 epoch (loss 6.3438):   1%|          | 9/880 [00:54<1:35:15,  6.56s/it]
Training 1/16 epoch (loss 6.2500):   1%|          | 9/880 [00:59<1:35:15,  6.56s/it]
Training 1/16 epoch (loss 6.2500):   1%|          | 10/880 [00:59<1:28:31,  6.11s/it]
Training 1/16 epoch (loss 6.1250):   1%|          | 10/880 [01:04<1:28:31,  6.11s/it]
Training 1/16 epoch (loss 6.1250):   1%|▏         | 11/880 [01:04<1:23:04,  5.74s/it]
Training 1/16 epoch (loss 6.1562):   1%|▏         | 11/880 [01:12<1:23:04,  5.74s/it]
Training 1/16 epoch (loss 6.1562):   1%|▏         | 12/880 [01:12<1:32:30,  6.39s/it]
Training 1/16 epoch (loss 6.0000):   1%|▏         | 12/880 [01:16<1:32:30,  6.39s/it]
Training 1/16 epoch (loss 6.0000):   1%|▏         | 13/880 [01:16<1:25:28,  5.92s/it]
Training 1/16 epoch (loss 6.0000):   1%|▏         | 13/880 [01:21<1:25:28,  5.92s/it]
Training 1/16 epoch (loss 6.0000):   2%|▏         | 14/880 [01:21<1:20:32,  5.58s/it]
Training 1/16 epoch (loss 5.9375):   2%|▏         | 14/880 [01:27<1:20:32,  5.58s/it]
Training 1/16 epoch (loss 5.9375):   2%|▏         | 15/880 [01:27<1:20:18,  5.57s/it]
Training 1/16 epoch (loss 5.7812):   2%|▏         | 15/880 [01:32<1:20:18,  5.57s/it]
Training 1/16 epoch (loss 5.7812):   2%|▏         | 16/880 [01:32<1:18:38,  5.46s/it]
Training 1/16 epoch (loss 5.6250):   2%|▏         | 16/880 [01:39<1:18:38,  5.46s/it]
Training 1/16 epoch (loss 5.6250):   2%|▏         | 17/880 [01:39<1:24:30,  5.88s/it]
Training 1/16 epoch (loss 5.3750):   2%|▏         | 17/880 [01:44<1:24:30,  5.88s/it]
Training 1/16 epoch (loss 5.3750):   2%|▏         | 18/880 [01:44<1:20:56,  5.63s/it]
Training 1/16 epoch (loss 5.2500):   2%|▏         | 18/880 [01:49<1:20:56,  5.63s/it]
Training 1/16 epoch (loss 5.2500):   2%|▏         | 19/880 [01:49<1:17:58,  5.43s/it]
Training 1/16 epoch (loss 5.0625):   2%|▏         | 19/880 [01:53<1:17:58,  5.43s/it]
Training 1/16 epoch (loss 5.0625):   2%|▏         | 20/880 [01:53<1:12:27,  5.06s/it]
Training 1/16 epoch (loss 5.0625):   2%|▏         | 20/880 [01:58<1:12:27,  5.06s/it]
Training 1/16 epoch (loss 5.0625):   2%|▏         | 21/880 [01:58<1:11:50,  5.02s/it]
Training 1/16 epoch (loss 5.1250):   2%|▏         | 21/880 [02:03<1:11:50,  5.02s/it]
Training 1/16 epoch (loss 5.1250):   2%|▎         | 22/880 [02:03<1:13:28,  5.14s/it]
Training 1/16 epoch (loss 5.0625):   2%|▎         | 22/880 [02:07<1:13:28,  5.14s/it]
Training 1/16 epoch (loss 5.0625):   3%|▎         | 23/880 [02:07<1:09:07,  4.84s/it]
Training 1/16 epoch (loss 4.9688):   3%|▎         | 23/880 [02:13<1:09:07,  4.84s/it]
Training 1/16 epoch (loss 4.9688):   3%|▎         | 24/880 [02:13<1:10:16,  4.93s/it]
Training 1/16 epoch (loss 4.9062):   3%|▎         | 24/880 [02:26<1:10:16,  4.93s/it]
Training 1/16 epoch (loss 4.9062):   3%|▎         | 25/880 [02:26<1:44:18,  7.32s/it]
Training 1/16 epoch (loss 4.7500):   3%|▎         | 25/880 [02:32<1:44:18,  7.32s/it]
Training 1/16 epoch (loss 4.7500):   3%|▎         | 26/880 [02:32<1:40:29,  7.06s/it]
Training 1/16 epoch (loss 4.7500):   3%|▎         | 26/880 [02:37<1:40:29,  7.06s/it]
Training 1/16 epoch (loss 4.7500):   3%|▎         | 27/880 [02:37<1:33:17,  6.56s/it]
Training 1/16 epoch (loss 4.6250):   3%|▎         | 27/880 [02:43<1:33:17,  6.56s/it]
Training 1/16 epoch (loss 4.6250):   3%|▎         | 28/880 [02:43<1:30:02,  6.34s/it]
Training 1/16 epoch (loss 4.6875):   3%|▎         | 28/880 [02:48<1:30:02,  6.34s/it]
Training 1/16 epoch (loss 4.6875):   3%|▎         | 29/880 [02:48<1:22:54,  5.85s/it]
Training 1/16 epoch (loss 4.6250):   3%|▎         | 29/880 [02:54<1:22:54,  5.85s/it]
Training 1/16 epoch (loss 4.6250):   3%|▎         | 30/880 [02:54<1:21:47,  5.77s/it]
Training 1/16 epoch (loss 4.6562):   3%|▎         | 30/880 [02:58<1:21:47,  5.77s/it]
Training 1/16 epoch (loss 4.6562):   4%|▎         | 31/880 [02:58<1:16:01,  5.37s/it]
Training 1/16 epoch (loss 4.5000):   4%|▎         | 31/880 [03:03<1:16:01,  5.37s/it]
Training 1/16 epoch (loss 4.5000):   4%|▎         | 32/880 [03:03<1:12:35,  5.14s/it]
Training 1/16 epoch (loss 4.5000):   4%|▎         | 32/880 [03:08<1:12:35,  5.14s/it]
Training 1/16 epoch (loss 4.5000):   4%|▍         | 33/880 [03:08<1:14:13,  5.26s/it]
Training 1/16 epoch (loss 4.5000):   4%|▍         | 33/880 [03:13<1:14:13,  5.26s/it]
Training 1/16 epoch (loss 4.5000):   4%|▍         | 34/880 [03:13<1:14:47,  5.30s/it]
Training 1/16 epoch (loss 4.4688):   4%|▍         | 34/880 [03:19<1:14:47,  5.30s/it]
Training 1/16 epoch (loss 4.4688):   4%|▍         | 35/880 [03:19<1:15:09,  5.34s/it]
Training 1/16 epoch (loss 4.5312):   4%|▍         | 35/880 [03:26<1:15:09,  5.34s/it]
Training 1/16 epoch (loss 4.5312):   4%|▍         | 36/880 [03:26<1:21:58,  5.83s/it]
Training 1/16 epoch (loss 4.5000):   4%|▍         | 36/880 [03:41<1:21:58,  5.83s/it]
Training 1/16 epoch (loss 4.5000):   4%|▍         | 37/880 [03:41<1:59:20,  8.49s/it]
Training 1/16 epoch (loss 4.3125):   4%|▍         | 37/880 [03:46<1:59:20,  8.49s/it]
Training 1/16 epoch (loss 4.3125):   4%|▍         | 38/880 [03:46<1:45:09,  7.49s/it]
Training 1/16 epoch (loss 4.4688):   4%|▍         | 38/880 [03:50<1:45:09,  7.49s/it]
Training 1/16 epoch (loss 4.4688):   4%|▍         | 39/880 [03:50<1:32:49,  6.62s/it]
Training 1/16 epoch (loss 4.3125):   4%|▍         | 39/880 [03:56<1:32:49,  6.62s/it]
Training 1/16 epoch (loss 4.3125):   5%|▍         | 40/880 [03:56<1:26:40,  6.19s/it]
Training 1/16 epoch (loss 4.2812):   5%|▍         | 40/880 [04:00<1:26:40,  6.19s/it]
Training 1/16 epoch (loss 4.2812):   5%|▍         | 41/880 [04:00<1:21:22,  5.82s/it]
Training 1/16 epoch (loss 4.3125):   5%|▍         | 41/880 [04:06<1:21:22,  5.82s/it]
Training 1/16 epoch (loss 4.3125):   5%|▍         | 42/880 [04:06<1:19:40,  5.70s/it]
Training 1/16 epoch (loss 4.2812):   5%|▍         | 42/880 [04:12<1:19:40,  5.70s/it]
Training 1/16 epoch (loss 4.2812):   5%|▍         | 43/880 [04:12<1:21:38,  5.85s/it]
Training 1/16 epoch (loss 4.2500):   5%|▍         | 43/880 [04:21<1:21:38,  5.85s/it]
Training 1/16 epoch (loss 4.2500):   5%|▌         | 44/880 [04:21<1:32:35,  6.65s/it]
Training 1/16 epoch (loss 4.3750):   5%|▌         | 44/880 [04:37<1:32:35,  6.65s/it]
Training 1/16 epoch (loss 4.3750):   5%|▌         | 45/880 [04:37<2:13:14,  9.57s/it]
Training 1/16 epoch (loss 4.0312):   5%|▌         | 45/880 [04:43<2:13:14,  9.57s/it]
Training 1/16 epoch (loss 4.0312):   5%|▌         | 46/880 [04:43<1:57:52,  8.48s/it]
Training 1/16 epoch (loss 3.9375):   5%|▌         | 46/880 [04:48<1:57:52,  8.48s/it]
Training 1/16 epoch (loss 3.9375):   5%|▌         | 47/880 [04:48<1:42:37,  7.39s/it]
Training 1/16 epoch (loss 4.0938):   5%|▌         | 47/880 [04:53<1:42:37,  7.39s/it]
Training 1/16 epoch (loss 4.0938):   5%|▌         | 48/880 [04:53<1:34:24,  6.81s/it]
Training 1/16 epoch (loss 3.9844):   5%|▌         | 48/880 [05:00<1:34:24,  6.81s/it]
Training 1/16 epoch (loss 3.9844):   6%|▌         | 49/880 [05:00<1:35:10,  6.87s/it]
Training 1/16 epoch (loss 4.1250):   6%|▌         | 49/880 [05:06<1:35:10,  6.87s/it]
Training 1/16 epoch (loss 4.1250):   6%|▌         | 50/880 [05:06<1:30:13,  6.52s/it]
Training 1/16 epoch (loss 4.1250):   6%|▌         | 50/880 [05:13<1:30:13,  6.52s/it]
Training 1/16 epoch (loss 4.1250):   6%|▌         | 51/880 [05:13<1:30:31,  6.55s/it]
Training 1/16 epoch (loss 3.8750):   6%|▌         | 51/880 [05:18<1:30:31,  6.55s/it]
Training 1/16 epoch (loss 3.8750):   6%|▌         | 52/880 [05:18<1:25:42,  6.21s/it]
Training 1/16 epoch (loss 3.9844):   6%|▌         | 52/880 [05:23<1:25:42,  6.21s/it]
Training 1/16 epoch (loss 3.9844):   6%|▌         | 53/880 [05:23<1:19:40,  5.78s/it]
Training 1/16 epoch (loss 3.8750):   6%|▌         | 53/880 [05:30<1:19:40,  5.78s/it]
Training 1/16 epoch (loss 3.8750):   6%|▌         | 54/880 [05:30<1:24:21,  6.13s/it]
Training 1/16 epoch (loss 3.7344):   6%|▌         | 54/880 [05:34<1:24:21,  6.13s/it]
Training 1/16 epoch (loss 3.7344):   6%|▋         | 55/880 [05:34<1:18:10,  5.69s/it]
Training 2/16 epoch (loss 3.7344):   6%|▋         | 55/880 [05:40<1:18:10,  5.69s/it]
Training 2/16 epoch (loss 3.7344):   6%|▋         | 56/880 [05:40<1:16:11,  5.55s/it]
Training 2/16 epoch (loss 3.7969):   6%|▋         | 56/880 [05:46<1:16:11,  5.55s/it]
Training 2/16 epoch (loss 3.7969):   6%|▋         | 57/880 [05:46<1:17:53,  5.68s/it]
Training 2/16 epoch (loss 3.6719):   6%|▋         | 57/880 [05:52<1:17:53,  5.68s/it]
Training 2/16 epoch (loss 3.6719):   7%|▋         | 58/880 [05:52<1:19:07,  5.78s/it]
Training 2/16 epoch (loss 3.7969):   7%|▋         | 58/880 [05:56<1:19:07,  5.78s/it]
Training 2/16 epoch (loss 3.7969):   7%|▋         | 59/880 [05:56<1:14:35,  5.45s/it]
Training 2/16 epoch (loss 3.6719):   7%|▋         | 59/880 [06:02<1:14:35,  5.45s/it]
Training 2/16 epoch (loss 3.6719):   7%|▋         | 60/880 [06:02<1:13:38,  5.39s/it]
Training 2/16 epoch (loss 3.6562):   7%|▋         | 60/880 [06:07<1:13:38,  5.39s/it]
Training 2/16 epoch (loss 3.6562):   7%|▋         | 61/880 [06:07<1:15:17,  5.52s/it]
Training 2/16 epoch (loss 3.5781):   7%|▋         | 61/880 [06:12<1:15:17,  5.52s/it]
Training 2/16 epoch (loss 3.5781):   7%|▋         | 62/880 [06:12<1:11:35,  5.25s/it]
Training 2/16 epoch (loss 3.5469):   7%|▋         | 62/880 [06:18<1:11:35,  5.25s/it]
Training 2/16 epoch (loss 3.5469):   7%|▋         | 63/880 [06:18<1:12:50,  5.35s/it]
Training 2/16 epoch (loss 3.5938):   7%|▋         | 63/880 [06:26<1:12:50,  5.35s/it]
Training 2/16 epoch (loss 3.5938):   7%|▋         | 64/880 [06:26<1:25:39,  6.30s/it]
Training 2/16 epoch (loss 3.5312):   7%|▋         | 64/880 [06:31<1:25:39,  6.30s/it]
Training 2/16 epoch (loss 3.5312):   7%|▋         | 65/880 [06:31<1:20:31,  5.93s/it]
Training 2/16 epoch (loss 3.5469):   7%|▋         | 65/880 [06:36<1:20:31,  5.93s/it]
Training 2/16 epoch (loss 3.5469):   8%|▊         | 66/880 [06:36<1:15:52,  5.59s/it]
Training 2/16 epoch (loss 3.4375):   8%|▊         | 66/880 [06:43<1:15:52,  5.59s/it]
Training 2/16 epoch (loss 3.4375):   8%|▊         | 67/880 [06:43<1:23:20,  6.15s/it]
Training 2/16 epoch (loss 3.1875):   8%|▊         | 67/880 [06:48<1:23:20,  6.15s/it]
Training 2/16 epoch (loss 3.1875):   8%|▊         | 68/880 [06:48<1:17:40,  5.74s/it]
Training 2/16 epoch (loss 3.5000):   8%|▊         | 68/880 [06:53<1:17:40,  5.74s/it]
Training 2/16 epoch (loss 3.5000):   8%|▊         | 69/880 [06:53<1:13:39,  5.45s/it]
Training 2/16 epoch (loss 3.3438):   8%|▊         | 69/880 [06:58<1:13:39,  5.45s/it]
Training 2/16 epoch (loss 3.3438):   8%|▊         | 70/880 [06:58<1:13:49,  5.47s/it]
Training 2/16 epoch (loss 3.4062):   8%|▊         | 70/880 [07:04<1:13:49,  5.47s/it]
Training 2/16 epoch (loss 3.4062):   8%|▊         | 71/880 [07:04<1:12:33,  5.38s/it]
Training 2/16 epoch (loss 3.1719):   8%|▊         | 71/880 [07:10<1:12:33,  5.38s/it]
Training 2/16 epoch (loss 3.1719):   8%|▊         | 72/880 [07:10<1:17:08,  5.73s/it]
Training 2/16 epoch (loss 3.2344):   8%|▊         | 72/880 [07:15<1:17:08,  5.73s/it]
Training 2/16 epoch (loss 3.2344):   8%|▊         | 73/880 [07:15<1:14:18,  5.52s/it]
Training 2/16 epoch (loss 3.2812):   8%|▊         | 73/880 [07:20<1:14:18,  5.52s/it]
Training 2/16 epoch (loss 3.2812):   8%|▊         | 74/880 [07:20<1:11:53,  5.35s/it]
Training 2/16 epoch (loss 2.9688):   8%|▊         | 74/880 [07:24<1:11:53,  5.35s/it]
Training 2/16 epoch (loss 2.9688):   9%|▊         | 75/880 [07:24<1:06:59,  4.99s/it]
Training 2/16 epoch (loss 3.1719):   9%|▊         | 75/880 [07:29<1:06:59,  4.99s/it]
Training 2/16 epoch (loss 3.1719):   9%|▊         | 76/880 [07:29<1:06:35,  4.97s/it]
Training 2/16 epoch (loss 3.3125):   9%|▊         | 76/880 [07:35<1:06:35,  4.97s/it]
Training 2/16 epoch (loss 3.3125):   9%|▉         | 77/880 [07:35<1:08:12,  5.10s/it]
Training 2/16 epoch (loss 3.2031):   9%|▉         | 77/880 [07:39<1:08:12,  5.10s/it]
Training 2/16 epoch (loss 3.2031):   9%|▉         | 78/880 [07:39<1:04:11,  4.80s/it]
Training 2/16 epoch (loss 3.1094):   9%|▉         | 78/880 [07:44<1:04:11,  4.80s/it]
Training 2/16 epoch (loss 3.1094):   9%|▉         | 79/880 [07:44<1:05:17,  4.89s/it]
Training 2/16 epoch (loss 3.1250):   9%|▉         | 79/880 [07:56<1:05:17,  4.89s/it]
Training 2/16 epoch (loss 3.1250):   9%|▉         | 80/880 [07:56<1:35:31,  7.16s/it]
Training 2/16 epoch (loss 2.9219):   9%|▉         | 80/880 [08:02<1:35:31,  7.16s/it]
Training 2/16 epoch (loss 2.9219):   9%|▉         | 81/880 [08:02<1:29:59,  6.76s/it]
Training 2/16 epoch (loss 3.0781):   9%|▉         | 81/880 [08:07<1:29:59,  6.76s/it]
Training 2/16 epoch (loss 3.0781):   9%|▉         | 82/880 [08:07<1:24:17,  6.34s/it]
Training 2/16 epoch (loss 2.8281):   9%|▉         | 82/880 [08:13<1:24:17,  6.34s/it]
Training 2/16 epoch (loss 2.8281):   9%|▉         | 83/880 [08:13<1:20:01,  6.02s/it]
Training 2/16 epoch (loss 3.0000):   9%|▉         | 83/880 [08:17<1:20:01,  6.02s/it]
Training 2/16 epoch (loss 3.0000):  10%|▉         | 84/880 [08:17<1:14:32,  5.62s/it]
Training 2/16 epoch (loss 2.9531):  10%|▉         | 84/880 [08:23<1:14:32,  5.62s/it]
Training 2/16 epoch (loss 2.9531):  10%|▉         | 85/880 [08:23<1:14:04,  5.59s/it]
Training 2/16 epoch (loss 3.0000):  10%|▉         | 85/880 [08:27<1:14:04,  5.59s/it]
Training 2/16 epoch (loss 3.0000):  10%|▉         | 86/880 [08:27<1:09:22,  5.24s/it]
Training 2/16 epoch (loss 2.8281):  10%|▉         | 86/880 [08:32<1:09:22,  5.24s/it]
Training 2/16 epoch (loss 2.8281):  10%|▉         | 87/880 [08:32<1:06:37,  5.04s/it]
Training 2/16 epoch (loss 2.9531):  10%|▉         | 87/880 [08:37<1:06:37,  5.04s/it]
Training 2/16 epoch (loss 2.9531):  10%|█         | 88/880 [08:37<1:08:14,  5.17s/it]
Training 2/16 epoch (loss 2.8906):  10%|█         | 88/880 [08:43<1:08:14,  5.17s/it]
Training 2/16 epoch (loss 2.8906):  10%|█         | 89/880 [08:43<1:09:00,  5.23s/it]
Training 2/16 epoch (loss 2.7969):  10%|█         | 89/880 [08:48<1:09:00,  5.23s/it]
Training 2/16 epoch (loss 2.7969):  10%|█         | 90/880 [08:48<1:09:32,  5.28s/it]
Training 2/16 epoch (loss 2.9531):  10%|█         | 90/880 [08:55<1:09:32,  5.28s/it]
Training 2/16 epoch (loss 2.9531):  10%|█         | 91/880 [08:55<1:13:56,  5.62s/it]
Training 2/16 epoch (loss 2.9375):  10%|█         | 91/880 [09:09<1:13:56,  5.62s/it]
Training 2/16 epoch (loss 2.9375):  10%|█         | 92/880 [09:09<1:48:20,  8.25s/it]
Training 2/16 epoch (loss 2.7500):  10%|█         | 92/880 [09:14<1:48:20,  8.25s/it]
Training 2/16 epoch (loss 2.7500):  11%|█         | 93/880 [09:14<1:35:56,  7.31s/it]
Training 2/16 epoch (loss 2.9531):  11%|█         | 93/880 [09:19<1:35:56,  7.31s/it]
Training 2/16 epoch (loss 2.9531):  11%|█         | 94/880 [09:19<1:25:00,  6.49s/it]
Training 2/16 epoch (loss 2.7188):  11%|█         | 94/880 [09:24<1:25:00,  6.49s/it]
Training 2/16 epoch (loss 2.7188):  11%|█         | 95/880 [09:24<1:19:40,  6.09s/it]
Training 2/16 epoch (loss 2.8281):  11%|█         | 95/880 [09:29<1:19:40,  6.09s/it]
Training 2/16 epoch (loss 2.8281):  11%|█         | 96/880 [09:29<1:15:02,  5.74s/it]
Training 2/16 epoch (loss 2.8281):  11%|█         | 96/880 [09:34<1:15:02,  5.74s/it]
Training 2/16 epoch (loss 2.8281):  11%|█         | 97/880 [09:34<1:13:42,  5.65s/it]
Training 2/16 epoch (loss 2.7500):  11%|█         | 97/880 [09:40<1:13:42,  5.65s/it]
Training 2/16 epoch (loss 2.7500):  11%|█         | 98/880 [09:40<1:15:41,  5.81s/it]
Training 2/16 epoch (loss 2.9531):  11%|█         | 98/880 [09:48<1:15:41,  5.81s/it]
Training 2/16 epoch (loss 2.9531):  11%|█▏        | 99/880 [09:48<1:23:42,  6.43s/it]
Training 2/16 epoch (loss 2.9844):  11%|█▏        | 99/880 [10:04<1:23:42,  6.43s/it]
Training 2/16 epoch (loss 2.9844):  11%|█▏        | 100/880 [10:04<1:59:52,  9.22s/it]
Training 2/16 epoch (loss 2.5781):  11%|█▏        | 100/880 [10:10<1:59:52,  9.22s/it]
Training 2/16 epoch (loss 2.5781):  11%|█▏        | 101/880 [10:10<1:46:45,  8.22s/it]
Training 2/16 epoch (loss 2.5156):  11%|█▏        | 101/880 [10:15<1:46:45,  8.22s/it]
Training 2/16 epoch (loss 2.5156):  12%|█▏        | 102/880 [10:15<1:33:23,  7.20s/it]
Training 2/16 epoch (loss 2.7031):  12%|█▏        | 102/880 [10:20<1:33:23,  7.20s/it]
Training 2/16 epoch (loss 2.7031):  12%|█▏        | 103/880 [10:20<1:26:19,  6.67s/it]
Training 2/16 epoch (loss 2.6094):  12%|█▏        | 103/880 [10:27<1:26:19,  6.67s/it]
Training 2/16 epoch (loss 2.6094):  12%|█▏        | 104/880 [10:27<1:27:27,  6.76s/it]
Training 2/16 epoch (loss 2.7656):  12%|█▏        | 104/880 [10:33<1:27:27,  6.76s/it]
Training 2/16 epoch (loss 2.7656):  12%|█▏        | 105/880 [10:33<1:23:08,  6.44s/it]
Training 2/16 epoch (loss 2.8125):  12%|█▏        | 105/880 [10:39<1:23:08,  6.44s/it]
Training 2/16 epoch (loss 2.8125):  12%|█▏        | 106/880 [10:39<1:23:35,  6.48s/it]
Training 2/16 epoch (loss 2.4844):  12%|█▏        | 106/880 [10:45<1:23:35,  6.48s/it]
Training 2/16 epoch (loss 2.4844):  12%|█▏        | 107/880 [10:45<1:19:15,  6.15s/it]
Training 2/16 epoch (loss 2.7969):  12%|█▏        | 107/880 [10:50<1:19:15,  6.15s/it]
Training 2/16 epoch (loss 2.7969):  12%|█▏        | 108/880 [10:50<1:13:46,  5.73s/it]
Training 2/16 epoch (loss 2.6562):  12%|█▏        | 108/880 [10:56<1:13:46,  5.73s/it]
Training 2/16 epoch (loss 2.6562):  12%|█▏        | 109/880 [10:56<1:18:16,  6.09s/it]
Training 2/16 epoch (loss 2.4688):  12%|█▏        | 109/880 [11:01<1:18:16,  6.09s/it]
Training 2/16 epoch (loss 2.4688):  12%|█▎        | 110/880 [11:01<1:12:34,  5.65s/it]
Training 3/16 epoch (loss 2.6250):  12%|█▎        | 110/880 [11:06<1:12:34,  5.65s/it]
Training 3/16 epoch (loss 2.6250):  13%|█▎        | 111/880 [11:06<1:10:50,  5.53s/it]
Training 3/16 epoch (loss 2.6406):  13%|█▎        | 111/880 [11:12<1:10:50,  5.53s/it]
Training 3/16 epoch (loss 2.6406):  13%|█▎        | 112/880 [11:12<1:12:29,  5.66s/it]
Training 3/16 epoch (loss 2.5156):  13%|█▎        | 112/880 [11:18<1:12:29,  5.66s/it]
Training 3/16 epoch (loss 2.5156):  13%|█▎        | 113/880 [11:18<1:13:40,  5.76s/it]
Training 3/16 epoch (loss 2.7500):  13%|█▎        | 113/880 [11:23<1:13:40,  5.76s/it]
Training 3/16 epoch (loss 2.7500):  13%|█▎        | 114/880 [11:23<1:09:27,  5.44s/it]
Training 3/16 epoch (loss 2.6094):  13%|█▎        | 114/880 [11:28<1:09:27,  5.44s/it]
Training 3/16 epoch (loss 2.6094):  13%|█▎        | 115/880 [11:28<1:08:34,  5.38s/it]
Training 3/16 epoch (loss 2.6250):  13%|█▎        | 115/880 [11:34<1:08:34,  5.38s/it]
Training 3/16 epoch (loss 2.6250):  13%|█▎        | 116/880 [11:34<1:10:06,  5.51s/it]
Training 3/16 epoch (loss 2.6094):  13%|█▎        | 116/880 [11:39<1:10:06,  5.51s/it]
Training 3/16 epoch (loss 2.6094):  13%|█▎        | 117/880 [11:39<1:06:38,  5.24s/it]
Training 3/16 epoch (loss 2.5938):  13%|█▎        | 117/880 [11:44<1:06:38,  5.24s/it]
Training 3/16 epoch (loss 2.5938):  13%|█▎        | 118/880 [11:44<1:07:49,  5.34s/it]
Training 3/16 epoch (loss 2.6562):  13%|█▎        | 118/880 [11:53<1:07:49,  5.34s/it]
Training 3/16 epoch (loss 2.6562):  14%|█▎        | 119/880 [11:53<1:19:49,  6.29s/it]
Training 3/16 epoch (loss 2.6094):  14%|█▎        | 119/880 [11:58<1:19:49,  6.29s/it]
Training 3/16 epoch (loss 2.6094):  14%|█▎        | 120/880 [11:58<1:15:04,  5.93s/it]
Training 3/16 epoch (loss 2.6719):  14%|█▎        | 120/880 [12:03<1:15:04,  5.93s/it]
Training 3/16 epoch (loss 2.6719):  14%|█▍        | 121/880 [12:03<1:10:46,  5.60s/it]
Training 3/16 epoch (loss 2.5938):  14%|█▍        | 121/880 [12:10<1:10:46,  5.60s/it]
Training 3/16 epoch (loss 2.5938):  14%|█▍        | 122/880 [12:10<1:17:43,  6.15s/it]
Training 3/16 epoch (loss 2.3281):  14%|█▍        | 122/880 [12:15<1:17:43,  6.15s/it]
Training 3/16 epoch (loss 2.3281):  14%|█▍        | 123/880 [12:15<1:12:25,  5.74s/it]
Training 3/16 epoch (loss 2.7500):  14%|█▍        | 123/880 [12:20<1:12:25,  5.74s/it]
Training 3/16 epoch (loss 2.7500):  14%|█▍        | 124/880 [12:20<1:08:39,  5.45s/it]
Training 3/16 epoch (loss 2.5312):  14%|█▍        | 124/880 [12:25<1:08:39,  5.45s/it]
Training 3/16 epoch (loss 2.5312):  14%|█▍        | 125/880 [12:25<1:08:48,  5.47s/it]
Training 3/16 epoch (loss 2.6562):  14%|█▍        | 125/880 [12:30<1:08:48,  5.47s/it]
Training 3/16 epoch (loss 2.6562):  14%|█▍        | 126/880 [12:30<1:07:35,  5.38s/it]
Training 3/16 epoch (loss 2.4844):  14%|█▍        | 126/880 [12:37<1:07:35,  5.38s/it]
Training 3/16 epoch (loss 2.4844):  14%|█▍        | 127/880 [12:37<1:11:51,  5.73s/it]
Training 3/16 epoch (loss 2.5312):  14%|█▍        | 127/880 [12:42<1:11:51,  5.73s/it]
Training 3/16 epoch (loss 2.5312):  15%|█▍        | 128/880 [12:42<1:09:11,  5.52s/it]
Training 3/16 epoch (loss 2.6094):  15%|█▍        | 128/880 [12:47<1:09:11,  5.52s/it]
Training 3/16 epoch (loss 2.6094):  15%|█▍        | 129/880 [12:47<1:06:55,  5.35s/it]
Training 3/16 epoch (loss 2.2500):  15%|█▍        | 129/880 [12:51<1:06:55,  5.35s/it]
Training 3/16 epoch (loss 2.2500):  15%|█▍        | 130/880 [12:51<1:02:23,  4.99s/it]
Training 3/16 epoch (loss 2.4844):  15%|█▍        | 130/880 [12:56<1:02:23,  4.99s/it]
Training 3/16 epoch (loss 2.4844):  15%|█▍        | 131/880 [12:56<1:02:01,  4.97s/it]
Training 3/16 epoch (loss 2.7188):  15%|█▍        | 131/880 [13:01<1:02:01,  4.97s/it]
Training 3/16 epoch (loss 2.7188):  15%|█▌        | 132/880 [13:01<1:03:32,  5.10s/it]
Training 3/16 epoch (loss 2.5625):  15%|█▌        | 132/880 [13:05<1:03:32,  5.10s/it]
Training 3/16 epoch (loss 2.5625):  15%|█▌        | 133/880 [13:05<59:49,  4.80s/it]
Training 3/16 epoch (loss 2.4844):  15%|█▌        | 133/880 [13:11<59:49,  4.80s/it]
Training 3/16 epoch (loss 2.4844):  15%|█▌        | 134/880 [13:11<1:00:51,  4.89s/it]
Training 3/16 epoch (loss 2.5312):  15%|█▌        | 134/880 [13:23<1:00:51,  4.89s/it]
Training 3/16 epoch (loss 2.5312):  15%|█▌        | 135/880 [13:23<1:28:58,  7.17s/it]
Training 3/16 epoch (loss 2.3438):  15%|█▌        | 135/880 [13:29<1:28:58,  7.17s/it]
Training 3/16 epoch (loss 2.3438):  15%|█▌        | 136/880 [13:29<1:23:47,  6.76s/it]
Training 3/16 epoch (loss 2.5000):  15%|█▌        | 136/880 [13:34<1:23:47,  6.76s/it]
Training 3/16 epoch (loss 2.5000):  16%|█▌        | 137/880 [13:34<1:18:29,  6.34s/it]
Training 3/16 epoch (loss 2.2812):  16%|█▌        | 137/880 [13:39<1:18:29,  6.34s/it]
Training 3/16 epoch (loss 2.2812):  16%|█▌        | 138/880 [13:39<1:14:30,  6.03s/it]
Training 3/16 epoch (loss 2.4219):  16%|█▌        | 138/880 [13:44<1:14:30,  6.03s/it]
Training 3/16 epoch (loss 2.4219):  16%|█▌        | 139/880 [13:44<1:09:24,  5.62s/it]
Training 3/16 epoch (loss 2.4062):  16%|█▌        | 139/880 [13:50<1:09:24,  5.62s/it]
Training 3/16 epoch (loss 2.4062):  16%|█▌        | 140/880 [13:50<1:08:58,  5.59s/it]
Training 3/16 epoch (loss 2.4688):  16%|█▌        | 140/880 [13:54<1:08:58,  5.59s/it]
Training 3/16 epoch (loss 2.4688):  16%|█▌        | 141/880 [13:54<1:04:34,  5.24s/it]
Training 3/16 epoch (loss 2.2969):  16%|█▌        | 141/880 [13:59<1:04:34,  5.24s/it]
Training 3/16 epoch (loss 2.2969):  16%|█▌        | 142/880 [13:59<1:02:02,  5.04s/it]
Training 3/16 epoch (loss 2.4844):  16%|█▌        | 142/880 [14:04<1:02:02,  5.04s/it]
Training 3/16 epoch (loss 2.4844):  16%|█▋        | 143/880 [14:04<1:03:32,  5.17s/it]
Training 3/16 epoch (loss 2.3594):  16%|█▋        | 143/880 [14:10<1:03:32,  5.17s/it]
Training 3/16 epoch (loss 2.3594):  16%|█▋        | 144/880 [14:10<1:04:17,  5.24s/it]
Training 3/16 epoch (loss 2.2656):  16%|█▋        | 144/880 [14:15<1:04:17,  5.24s/it]
Training 3/16 epoch (loss 2.2656):  16%|█▋        | 145/880 [14:15<1:04:46,  5.29s/it]
Training 3/16 epoch (loss 2.4688):  16%|█▋        | 145/880 [14:21<1:04:46,  5.29s/it]
Training 3/16 epoch (loss 2.4688):  17%|█▋        | 146/880 [14:21<1:08:51,  5.63s/it]
Training 3/16 epoch (loss 2.4688):  17%|█▋        | 146/880 [14:36<1:08:51,  5.63s/it]
Training 3/16 epoch (loss 2.4688):  17%|█▋        | 147/880 [14:36<1:40:51,  8.26s/it]
Training 3/16 epoch (loss 2.2656):  17%|█▋        | 147/880 [14:41<1:40:51,  8.26s/it]
Training 3/16 epoch (loss 2.2656):  17%|█▋        | 148/880 [14:41<1:29:16,  7.32s/it]
Training 3/16 epoch (loss 2.5156):  17%|█▋        | 148/880 [14:45<1:29:16,  7.32s/it]
Training 3/16 epoch (loss 2.5156):  17%|█▋        | 149/880 [14:45<1:19:06,  6.49s/it]
Training 3/16 epoch (loss 2.2812):  17%|█▋        | 149/880 [14:51<1:19:06,  6.49s/it]
Training 3/16 epoch (loss 2.2812):  17%|█▋        | 150/880 [14:51<1:14:07,  6.09s/it]
Training 3/16 epoch (loss 2.3594):  17%|█▋        | 150/880 [14:56<1:14:07,  6.09s/it]
Training 3/16 epoch (loss 2.3594):  17%|█▋        | 151/880 [14:56<1:09:48,  5.75s/it]
Training 3/16 epoch (loss 2.4062):  17%|█▋        | 151/880 [15:01<1:09:48,  5.75s/it]
Training 3/16 epoch (loss 2.4062):  17%|█▋        | 152/880 [15:01<1:08:33,  5.65s/it]
Training 3/16 epoch (loss 2.3281):  17%|█▋        | 152/880 [15:07<1:08:33,  5.65s/it]
Training 3/16 epoch (loss 2.3281):  17%|█▋        | 153/880 [15:07<1:10:20,  5.81s/it]
Training 3/16 epoch (loss 2.5312):  17%|█▋        | 153/880 [15:15<1:10:20,  5.81s/it]
Training 3/16 epoch (loss 2.5312):  18%|█▊        | 154/880 [15:15<1:17:47,  6.43s/it]
Training 3/16 epoch (loss 2.5625):  18%|█▊        | 154/880 [15:31<1:17:47,  6.43s/it]
Training 3/16 epoch (loss 2.5625):  18%|█▊        | 155/880 [15:31<1:51:28,  9.23s/it]
Training 3/16 epoch (loss 2.2188):  18%|█▊        | 155/880 [15:37<1:51:28,  9.23s/it]
Training 3/16 epoch (loss 2.2188):  18%|█▊        | 156/880 [15:37<1:39:17,  8.23s/it]
Training 3/16 epoch (loss 2.1719):  18%|█▊        | 156/880 [15:42<1:39:17,  8.23s/it]
Training 3/16 epoch (loss 2.1719):  18%|█▊        | 157/880 [15:42<1:26:52,  7.21s/it]
Training 3/16 epoch (loss 2.2812):  18%|█▊        | 157/880 [15:47<1:26:52,  7.21s/it]
Training 3/16 epoch (loss 2.2812):  18%|█▊        | 158/880 [15:47<1:20:17,  6.67s/it]
Training 3/16 epoch (loss 2.2031):  18%|█▊        | 158/880 [15:54<1:20:17,  6.67s/it]
Training 3/16 epoch (loss 2.2031):  18%|█▊        | 159/880 [15:54<1:21:17,  6.77s/it]
Training 3/16 epoch (loss 2.4219):  18%|█▊        | 159/880 [16:00<1:21:17,  6.77s/it]
Training 3/16 epoch (loss 2.4219):  18%|█▊        | 160/880 [16:00<1:17:15,  6.44s/it]
Training 3/16 epoch (loss 2.4375):  18%|█▊        | 160/880 [16:06<1:17:15,  6.44s/it]
Training 3/16 epoch (loss 2.4375):  18%|█▊        | 161/880 [16:06<1:17:42,  6.48s/it]
Training 3/16 epoch (loss 2.1406):  18%|█▊        | 161/880 [16:12<1:17:42,  6.48s/it]
Training 3/16 epoch (loss 2.1406):  18%|█▊        | 162/880 [16:12<1:13:40,  6.16s/it]
Training 3/16 epoch (loss 2.4375):  18%|█▊        | 162/880 [16:16<1:13:40,  6.16s/it]
Training 3/16 epoch (loss 2.4375):  19%|█▊        | 163/880 [16:16<1:08:31,  5.73s/it]
Training 3/16 epoch (loss 2.2812):  19%|█▊        | 163/880 [16:23<1:08:31,  5.73s/it]
Training 3/16 epoch (loss 2.2812):  19%|█▊        | 164/880 [16:23<1:12:40,  6.09s/it]
Training 3/16 epoch (loss 2.1094):  19%|█▊        | 164/880 [16:28<1:12:40,  6.09s/it]
Training 3/16 epoch (loss 2.1094):  19%|█▉        | 165/880 [16:28<1:07:22,  5.65s/it]
Training 4/16 epoch (loss 2.2812):  19%|█▉        | 165/880 [16:33<1:07:22,  5.65s/it]
Training 4/16 epoch (loss 2.2812):  19%|█▉        | 166/880 [16:33<1:05:45,  5.53s/it]
Training 4/16 epoch (loss 2.3125):  19%|█▉        | 166/880 [16:39<1:05:45,  5.53s/it]
Training 4/16 epoch (loss 2.3125):  19%|█▉        | 167/880 [16:39<1:07:17,  5.66s/it]
Training 4/16 epoch (loss 2.1562):  19%|█▉        | 167/880 [16:45<1:07:17,  5.66s/it]
Training 4/16 epoch (loss 2.1562):  19%|█▉        | 168/880 [16:45<1:08:25,  5.77s/it]
Training 4/16 epoch (loss 2.3906):  19%|█▉        | 168/880 [16:50<1:08:25,  5.77s/it]
Training 4/16 epoch (loss 2.3906):  19%|█▉        | 169/880 [16:50<1:04:30,  5.44s/it]
Training 4/16 epoch (loss 2.2656):  19%|█▉        | 169/880 [16:55<1:04:30,  5.44s/it]
Training 4/16 epoch (loss 2.2656):  19%|█▉        | 170/880 [16:55<1:03:40,  5.38s/it]
Training 4/16 epoch (loss 2.2500):  19%|█▉        | 170/880 [17:01<1:03:40,  5.38s/it]
Training 4/16 epoch (loss 2.2500):  19%|█▉        | 171/880 [17:01<1:05:05,  5.51s/it]
Training 4/16 epoch (loss 2.2500):  19%|█▉        | 171/880 [17:05<1:05:05,  5.51s/it]
Training 4/16 epoch (loss 2.2500):  20%|█▉        | 172/880 [17:05<1:01:52,  5.24s/it]
Training 4/16 epoch (loss 2.2188):  20%|█▉        | 172/880 [17:11<1:01:52,  5.24s/it]
Training 4/16 epoch (loss 2.2188):  20%|█▉        | 173/880 [17:11<1:02:57,  5.34s/it]
Training 4/16 epoch (loss 2.3125):  20%|█▉        | 173/880 [17:20<1:02:57,  5.34s/it]
Training 4/16 epoch (loss 2.3125):  20%|█▉        | 174/880 [17:20<1:14:03,  6.29s/it]
Training 4/16 epoch (loss 2.2812):  20%|█▉        | 174/880 [17:25<1:14:03,  6.29s/it]
Training 4/16 epoch (loss 2.2812):  20%|█▉        | 175/880 [17:25<1:09:36,  5.92s/it]
Training 4/16 epoch (loss 2.3125):  20%|█▉        | 175/880 [17:29<1:09:36,  5.92s/it]
Training 4/16 epoch (loss 2.3125):  20%|██        | 176/880 [17:29<1:05:36,  5.59s/it]
Training 4/16 epoch (loss 2.2812):  20%|██        | 176/880 [17:37<1:05:36,  5.59s/it]
Training 4/16 epoch (loss 2.2812):  20%|██        | 177/880 [17:37<1:12:03,  6.15s/it]
Training 4/16 epoch (loss 1.9922):  20%|██        | 177/880 [17:42<1:12:03,  6.15s/it]
Training 4/16 epoch (loss 1.9922):  20%|██        | 178/880 [17:42<1:07:09,  5.74s/it]
Training 4/16 epoch (loss 2.4531):  20%|██        | 178/880 [17:46<1:07:09,  5.74s/it]
Training 4/16 epoch (loss 2.4531):  20%|██        | 179/880 [17:46<1:03:42,  5.45s/it]
Training 4/16 epoch (loss 2.2031):  20%|██        | 179/880 [17:52<1:03:42,  5.45s/it]
Training 4/16 epoch (loss 2.2031):  20%|██        | 180/880 [17:52<1:03:51,  5.47s/it]
Training 4/16 epoch (loss 2.3281):  20%|██        | 180/880 [17:57<1:03:51,  5.47s/it]
Training 4/16 epoch (loss 2.3281):  21%|██        | 181/880 [17:57<1:02:43,  5.38s/it]
Training 4/16 epoch (loss 2.1719):  21%|██        | 181/880 [18:04<1:02:43,  5.38s/it]
Training 4/16 epoch (loss 2.1719):  21%|██        | 182/880 [18:04<1:06:39,  5.73s/it]
Training 4/16 epoch (loss 2.2188):  21%|██        | 182/880 [18:09<1:06:39,  5.73s/it]
Training 4/16 epoch (loss 2.2188):  21%|██        | 183/880 [18:09<1:04:10,  5.52s/it]
Training 4/16 epoch (loss 2.3125):  21%|██        | 183/880 [18:14<1:04:10,  5.52s/it]
Training 4/16 epoch (loss 2.3125):  21%|██        | 184/880 [18:14<1:02:04,  5.35s/it]
Training 4/16 epoch (loss 1.9375):  21%|██        | 184/880 [18:18<1:02:04,  5.35s/it]
Training 4/16 epoch (loss 1.9375):  21%|██        | 185/880 [18:18<57:49,  4.99s/it]
Training 4/16 epoch (loss 2.1719):  21%|██        | 185/880 [18:23<57:49,  4.99s/it]
Training 4/16 epoch (loss 2.1719):  21%|██        | 186/880 [18:23<57:26,  4.97s/it]
Training 4/16 epoch (loss 2.4688):  21%|██        | 186/880 [18:28<57:26,  4.97s/it]
Training 4/16 epoch (loss 2.4688):  21%|██▏       | 187/880 [18:28<58:51,  5.10s/it]
Training 4/16 epoch (loss 2.2812):  21%|██▏       | 187/880 [18:32<58:51,  5.10s/it]
Training 4/16 epoch (loss 2.2812):  21%|██▏       | 188/880 [18:32<55:22,  4.80s/it]
Training 4/16 epoch (loss 2.1406):  21%|██▏       | 188/880 [18:37<55:22,  4.80s/it]
Training 4/16 epoch (loss 2.1406):  21%|██▏       | 189/880 [18:37<56:19,  4.89s/it]
Training 4/16 epoch (loss 2.2500):  21%|██▏       | 189/880 [18:50<56:19,  4.89s/it]
Training 4/16 epoch (loss 2.2500):  22%|██▏       | 190/880 [18:50<1:22:25,  7.17s/it]
Training 4/16 epoch (loss 2.0781):  22%|██▏       | 190/880 [18:56<1:22:25,  7.17s/it]
Training 4/16 epoch (loss 2.0781):  22%|██▏       | 191/880 [18:56<1:17:38,  6.76s/it]
Training 4/16 epoch (loss 2.2188):  22%|██▏       | 191/880 [19:01<1:17:38,  6.76s/it]
Training 4/16 epoch (loss 2.2188):  22%|██▏       | 192/880 [19:01<1:12:42,  6.34s/it]
Training 4/16 epoch (loss 1.9688):  22%|██▏       | 192/880 [19:06<1:12:42,  6.34s/it]
Training 4/16 epoch (loss 1.9688):  22%|██▏       | 193/880 [19:06<1:09:00,  6.03s/it]
Training 4/16 epoch (loss 2.1250):  22%|██▏       | 193/880 [19:11<1:09:00,  6.03s/it]
Training 4/16 epoch (loss 2.1250):  22%|██▏       | 194/880 [19:11<1:04:14,  5.62s/it]
Training 4/16 epoch (loss 2.1094):  22%|██▏       | 194/880 [19:16<1:04:14,  5.62s/it]
Training 4/16 epoch (loss 2.1094):  22%|██▏       | 195/880 [19:16<1:03:49,  5.59s/it]
Training 4/16 epoch (loss 2.1719):  22%|██▏       | 195/880 [19:21<1:03:49,  5.59s/it]
Training 4/16 epoch (loss 2.1719):  22%|██▏       | 196/880 [19:21<59:43,  5.24s/it]
Training 4/16 epoch (loss 2.0156):  22%|██▏       | 196/880 [19:25<59:43,  5.24s/it]
Training 4/16 epoch (loss 2.0156):  22%|██▏       | 197/880 [19:25<57:20,  5.04s/it]
Training 4/16 epoch (loss 2.2188):  22%|██▏       | 197/880 [19:31<57:20,  5.04s/it]
Training 4/16 epoch (loss 2.2188):  22%|██▎       | 198/880 [19:31<58:41,  5.16s/it]
Training 4/16 epoch (loss 2.0938):  22%|██▎       | 198/880 [19:36<58:41,  5.16s/it]
Training 4/16 epoch (loss 2.0938):  23%|██▎       | 199/880 [19:36<59:21,  5.23s/it]
Training 4/16 epoch (loss 1.9844):  23%|██▎       | 199/880 [19:42<59:21,  5.23s/it]
Training 4/16 epoch (loss 1.9844):  23%|██▎       | 200/880 [19:42<59:49,  5.28s/it]
Training 4/16 epoch (loss 2.2031):  23%|██▎       | 200/880 [19:48<59:49,  5.28s/it]
Training 4/16 epoch (loss 2.2031):  23%|██▎       | 201/880 [19:48<1:03:38,  5.62s/it]
Training 4/16 epoch (loss 2.2188):  23%|██▎       | 201/880 [20:02<1:03:38,  5.62s/it]
Training 4/16 epoch (loss 2.2188):  23%|██▎       | 202/880 [20:02<1:33:13,  8.25s/it]
Training 4/16 epoch (loss 2.0312):  23%|██▎       | 202/880 [20:08<1:33:13,  8.25s/it]
Training 4/16 epoch (loss 2.0312):  23%|██▎       | 203/880 [20:08<1:22:32,  7.32s/it]
Training 4/16 epoch (loss 2.2656):  23%|██▎       | 203/880 [20:12<1:22:32,  7.32s/it]
Training 4/16 epoch (loss 2.2656):  23%|██▎       | 204/880 [20:12<1:13:09,  6.49s/it]
Training 4/16 epoch (loss 2.0312):  23%|██▎       | 204/880 [20:17<1:13:09,  6.49s/it]
Training 4/16 epoch (loss 2.0312):  23%|██▎       | 205/880 [20:17<1:08:32,  6.09s/it]
Training 4/16 epoch (loss 2.0938):  23%|██▎       | 205/880 [20:22<1:08:32,  6.09s/it]
Training 4/16 epoch (loss 2.0938):  23%|██▎       | 206/880 [20:22<1:04:30,  5.74s/it]
Training 4/16 epoch (loss 2.1719):  23%|██▎       | 206/880 [20:28<1:04:30,  5.74s/it]
Training 4/16 epoch (loss 2.1719):  24%|██▎       | 207/880 [20:28<1:03:19,  5.65s/it]
Training 4/16 epoch (loss 2.0625):  24%|██▎       | 207/880 [20:34<1:03:19,  5.65s/it]
Training 4/16 epoch (loss 2.0625):  24%|██▎       | 208/880 [20:34<1:04:58,  5.80s/it]
Training 4/16 epoch (loss 2.3438):  24%|██▎       | 208/880 [20:42<1:04:58,  5.80s/it]
Training 4/16 epoch (loss 2.3438):  24%|██▍       | 209/880 [20:42<1:11:49,  6.42s/it]
Training 4/16 epoch (loss 2.2969):  24%|██▍       | 209/880 [20:57<1:11:49,  6.42s/it]
Training 4/16 epoch (loss 2.2969):  24%|██▍       | 210/880 [20:57<1:42:53,  9.21s/it]
Training 4/16 epoch (loss 2.0156):  24%|██▍       | 210/880 [21:03<1:42:53,  9.21s/it]
Training 4/16 epoch (loss 2.0156):  24%|██▍       | 211/880 [21:03<1:31:38,  8.22s/it]
Training 4/16 epoch (loss 1.9141):  24%|██▍       | 211/880 [21:08<1:31:38,  8.22s/it]
Training 4/16 epoch (loss 1.9141):  24%|██▍       | 212/880 [21:08<1:20:10,  7.20s/it]
Training 4/16 epoch (loss 2.0312):  24%|██▍       | 212/880 [21:14<1:20:10,  7.20s/it]
Training 4/16 epoch (loss 2.0312):  24%|██▍       | 213/880 [21:14<1:14:05,  6.67s/it]
Training 4/16 epoch (loss 1.9453):  24%|██▍       | 213/880 [21:21<1:14:05,  6.67s/it]
Training 4/16 epoch (loss 1.9453):  24%|██▍       | 214/880 [21:21<1:15:05,  6.76s/it]
Training 4/16 epoch (loss 2.1562):  24%|██▍       | 214/880 [21:26<1:15:05,  6.76s/it]
Training 4/16 epoch (loss 2.1562):  24%|██▍       | 215/880 [21:26<1:11:23,  6.44s/it]
Training 4/16 epoch (loss 2.2188):  24%|██▍       | 215/880 [21:33<1:11:23,  6.44s/it]
Training 4/16 epoch (loss 2.2188):  25%|██▍       | 216/880 [21:33<1:11:46,  6.49s/it]
Training 4/16 epoch (loss 1.9062):  25%|██▍       | 216/880 [21:38<1:11:46,  6.49s/it]
Training 4/16 epoch (loss 1.9062):  25%|██▍       | 217/880 [21:38<1:08:00,  6.16s/it]
Training 4/16 epoch (loss 2.2188):  25%|██▍       | 217/880 [21:43<1:08:00,  6.16s/it]
Training 4/16 epoch (loss 2.2188):  25%|██▍       | 218/880 [21:43<1:03:15,  5.73s/it]
Training 4/16 epoch (loss 2.0781):  25%|██▍       | 218/880 [21:50<1:03:15,  5.73s/it]
Training 4/16 epoch (loss 2.0781):  25%|██▍       | 219/880 [21:50<1:07:03,  6.09s/it]
Training 4/16 epoch (loss 1.9062):  25%|██▍       | 219/880 [21:55<1:07:03,  6.09s/it]
Training 4/16 epoch (loss 1.9062):  25%|██▌       | 220/880 [21:55<1:02:07,  5.65s/it]
Training 5/16 epoch (loss 2.0469):  25%|██▌       | 220/880 [22:00<1:02:07,  5.65s/it]
Training 5/16 epoch (loss 2.0469):  25%|██▌       | 221/880 [22:00<1:00:37,  5.52s/it]
Training 5/16 epoch (loss 2.1094):  25%|██▌       | 221/880 [22:06<1:00:37,  5.52s/it]
Training 5/16 epoch (loss 2.1094):  25%|██▌       | 222/880 [22:06<1:02:02,  5.66s/it]
Training 5/16 epoch (loss 1.9141):  25%|██▌       | 222/880 [22:12<1:02:02,  5.66s/it]
Training 5/16 epoch (loss 1.9141):  25%|██▌       | 223/880 [22:12<1:03:04,  5.76s/it]
Training 5/16 epoch (loss 2.1719):  25%|██▌       | 223/880 [22:16<1:03:04,  5.76s/it]
Training 5/16 epoch (loss 2.1719):  25%|██▌       | 224/880 [22:16<59:29,  5.44s/it]
Training 5/16 epoch (loss 2.0781):  25%|██▌       | 224/880 [22:22<59:29,  5.44s/it]
Training 5/16 epoch (loss 2.0781):  26%|██▌       | 225/880 [22:22<58:47,  5.39s/it]
Training 5/16 epoch (loss 1.9766):  26%|██▌       | 225/880 [22:28<58:47,  5.39s/it]
Training 5/16 epoch (loss 1.9766):  26%|██▌       | 226/880 [22:28<1:00:08,  5.52s/it]
Training 5/16 epoch (loss 2.0156):  26%|██▌       | 226/880 [22:32<1:00:08,  5.52s/it]
Training 5/16 epoch (loss 2.0156):  26%|██▌       | 227/880 [22:32<57:10,  5.25s/it]
Training 5/16 epoch (loss 2.0156):  26%|██▌       | 227/880 [22:38<57:10,  5.25s/it]
Training 5/16 epoch (loss 2.0156):  26%|██▌       | 228/880 [22:38<58:08,  5.35s/it]
Training 5/16 epoch (loss 2.1250):  26%|██▌       | 228/880 [22:46<58:08,  5.35s/it]
Training 5/16 epoch (loss 2.1250):  26%|██▌       | 229/880 [22:46<1:08:20,  6.30s/it]
Training 5/16 epoch (loss 2.0625):  26%|██▌       | 229/880 [22:51<1:08:20,  6.30s/it]
Training 5/16 epoch (loss 2.0625):  26%|██▌       | 230/880 [22:51<1:04:13,  5.93s/it]
Training 5/16 epoch (loss 2.0938):  26%|██▌       | 230/880 [22:56<1:04:13,  5.93s/it]
Training 5/16 epoch (loss 2.0938):  26%|██▋       | 231/880 [22:56<1:00:29,  5.59s/it]
Training 5/16 epoch (loss 2.0781):  26%|██▋       | 231/880 [23:04<1:00:29,  5.59s/it]
Training 5/16 epoch (loss 2.0781):  26%|██▋       | 232/880 [23:04<1:06:25,  6.15s/it]
Training 5/16 epoch (loss 1.7812):  26%|██▋       | 232/880 [23:08<1:06:25,  6.15s/it]
Training 5/16 epoch (loss 1.7812):  26%|██▋       | 233/880 [23:08<1:01:53,  5.74s/it]
Training 5/16 epoch (loss 2.2500):  26%|██▋       | 233/880 [23:13<1:01:53,  5.74s/it]
Training 5/16 epoch (loss 2.2500):  27%|██▋       | 234/880 [23:13<58:41,  5.45s/it]
Training 5/16 epoch (loss 2.0000):  27%|██▋       | 234/880 [23:19<58:41,  5.45s/it]
Training 5/16 epoch (loss 2.0000):  27%|██▋       | 235/880 [23:19<58:49,  5.47s/it]
Training 5/16 epoch (loss 2.0781):  27%|██▋       | 235/880 [23:24<58:49,  5.47s/it]
Training 5/16 epoch (loss 2.0781):  27%|██▋       | 236/880 [23:24<57:48,  5.39s/it]
Training 5/16 epoch (loss 1.9609):  27%|██▋       | 236/880 [23:30<57:48,  5.39s/it]
Training 5/16 epoch (loss 1.9609):  27%|██▋       | 237/880 [23:30<1:01:27,  5.73s/it]
Training 5/16 epoch (loss 2.0156):  27%|██▋       | 237/880 [23:35<1:01:27,  5.73s/it]
Training 5/16 epoch (loss 2.0156):  27%|██▋       | 238/880 [23:35<59:11,  5.53s/it]
Training 5/16 epoch (loss 2.0938):  27%|██▋       | 238/880 [23:40<59:11,  5.53s/it]
Training 5/16 epoch (loss 2.0938):  27%|██▋       | 239/880 [23:40<57:14,  5.36s/it]
Training 5/16 epoch (loss 1.7344):  27%|██▋       | 239/880 [23:45<57:14,  5.36s/it]
Training 5/16 epoch (loss 1.7344):  27%|██▋       | 240/880 [23:45<53:17,  5.00s/it]
Training 5/16 epoch (loss 1.9375):  27%|██▋       | 240/880 [23:49<53:17,  5.00s/it]
Training 5/16 epoch (loss 1.9375):  27%|██▋       | 241/880 [23:49<52:56,  4.97s/it]
Training 5/16 epoch (loss 2.2656):  27%|██▋       | 241/880 [23:55<52:56,  4.97s/it]
Training 5/16 epoch (loss 2.2656):  28%|██▊       | 242/880 [23:55<54:11,  5.10s/it]
Training 5/16 epoch (loss 2.0625):  28%|██▊       | 242/880 [23:59<54:11,  5.10s/it]
Training 5/16 epoch (loss 2.0625):  28%|██▊       | 243/880 [23:59<50:59,  4.80s/it]
Training 5/16 epoch (loss 1.9062):  28%|██▊       | 243/880 [24:04<50:59,  4.80s/it]
Training 5/16 epoch (loss 1.9062):  28%|██▊       | 244/880 [24:04<51:50,  4.89s/it]
Training 5/16 epoch (loss 2.0156):  28%|██▊       | 244/880 [24:17<51:50,  4.89s/it]
Training 5/16 epoch (loss 2.0156):  28%|██▊       | 245/880 [24:17<1:15:50,  7.17s/it]
Training 5/16 epoch (loss 1.8750):  28%|██▊       | 245/880 [24:22<1:15:50,  7.17s/it]
Training 5/16 epoch (loss 1.8750):  28%|██▊       | 246/880 [24:22<1:11:27,  6.76s/it]
Training 5/16 epoch (loss 2.0312):  28%|██▊       | 246/880 [24:28<1:11:27,  6.76s/it]
Training 5/16 epoch (loss 2.0312):  28%|██▊       | 247/880 [24:28<1:06:55,  6.34s/it]
Training 5/16 epoch (loss 1.7656):  28%|██▊       | 247/880 [24:33<1:06:55,  6.34s/it]
Training 5/16 epoch (loss 1.7656):  28%|██▊       | 248/880 [24:33<1:03:33,  6.03s/it]
Training 5/16 epoch (loss 1.9062):  28%|██▊       | 248/880 [24:38<1:03:33,  6.03s/it]
Training 5/16 epoch (loss 1.9062):  28%|██▊       | 249/880 [24:38<59:10,  5.63s/it]
Training 5/16 epoch (loss 1.8984):  28%|██▊       | 249/880 [24:43<59:10,  5.63s/it]
Training 5/16 epoch (loss 1.8984):  28%|██▊       | 250/880 [24:43<58:46,  5.60s/it]
Training 5/16 epoch (loss 1.9766):  28%|██▊       | 250/880 [24:48<58:46,  5.60s/it]
Training 5/16 epoch (loss 1.9766):  29%|██▊       | 251/880 [24:48<54:58,  5.24s/it]
Training 5/16 epoch (loss 1.7969):  29%|██▊       | 251/880 [24:52<54:58,  5.24s/it]
Training 5/16 epoch (loss 1.7969):  29%|██▊       | 252/880 [24:52<52:44,  5.04s/it]
Training 5/16 epoch (loss 2.0156):  29%|██▊       | 252/880 [24:58<52:44,  5.04s/it]
Training 5/16 epoch (loss 2.0156):  29%|██▉       | 253/880 [24:58<53:58,  5.17s/it]
Training 5/16 epoch (loss 1.8594):  29%|██▉       | 253/880 [25:03<53:58,  5.17s/it]
Training 5/16 epoch (loss 1.8594):  29%|██▉       | 254/880 [25:03<54:34,  5.23s/it]
Training 5/16 epoch (loss 1.7969):  29%|██▉       | 254/880 [25:08<54:34,  5.23s/it]
Training 5/16 epoch (loss 1.7969):  29%|██▉       | 255/880 [25:08<54:59,  5.28s/it]
Training 5/16 epoch (loss 2.0000):  29%|██▉       | 255/880 [25:15<54:59,  5.28s/it]
Training 5/16 epoch (loss 2.0000):  29%|██▉       | 256/880 [25:15<58:27,  5.62s/it]
Training 5/16 epoch (loss 2.0625):  29%|██▉       | 256/880 [25:29<58:27,  5.62s/it]
Training 5/16 epoch (loss 2.0625):  29%|██▉       | 257/880 [25:29<1:25:40,  8.25s/it]
Training 5/16 epoch (loss 1.8594):  29%|██▉       | 257/880 [25:34<1:25:40,  8.25s/it]
Training 5/16 epoch (loss 1.8594):  29%|██▉       | 258/880 [25:34<1:15:52,  7.32s/it]
Training 5/16 epoch (loss 2.0781):  29%|██▉       | 258/880 [25:39<1:15:52,  7.32s/it]
Training 5/16 epoch (loss 2.0781):  29%|██▉       | 259/880 [25:39<1:07:14,  6.50s/it]
Training 5/16 epoch (loss 1.8359):  29%|██▉       | 259/880 [25:44<1:07:14,  6.50s/it]
Training 5/16 epoch (loss 1.8359):  30%|██▉       | 260/880 [25:44<1:03:01,  6.10s/it]
Training 5/16 epoch (loss 1.9141):  30%|██▉       | 260/880 [25:49<1:03:01,  6.10s/it]
Training 5/16 epoch (loss 1.9141):  30%|██▉       | 261/880 [25:49<59:19,  5.75s/it]
Training 5/16 epoch (loss 1.9688):  30%|██▉       | 261/880 [25:55<59:19,  5.75s/it]
Training 5/16 epoch (loss 1.9688):  30%|██▉       | 262/880 [25:55<58:12,  5.65s/it]
Training 5/16 epoch (loss 1.8516):  30%|██▉       | 262/880 [26:01<58:12,  5.65s/it]
Training 5/16 epoch (loss 1.8516):  30%|██▉       | 263/880 [26:01<59:41,  5.81s/it]
Training 5/16 epoch (loss 2.1562):  30%|██▉       | 263/880 [26:09<59:41,  5.81s/it]
Training 5/16 epoch (loss 2.1562):  30%|███       | 264/880 [26:09<1:05:57,  6.43s/it]
Training 5/16 epoch (loss 2.0938):  30%|███       | 264/880 [26:24<1:05:57,  6.43s/it]
Training 5/16 epoch (loss 2.0938):  30%|███       | 265/880 [26:24<1:34:28,  9.22s/it]
Training 5/16 epoch (loss 1.7812):  30%|███       | 265/880 [26:30<1:34:28,  9.22s/it]
Training 5/16 epoch (loss 1.7812):  30%|███       | 266/880 [26:30<1:24:06,  8.22s/it]
Training 5/16 epoch (loss 1.7109):  30%|███       | 266/880 [26:35<1:24:06,  8.22s/it]
Training 5/16 epoch (loss 1.7109):  30%|███       | 267/880 [26:35<1:13:33,  7.20s/it]
Training 5/16 epoch (loss 1.8516):  30%|███       | 267/880 [26:40<1:13:33,  7.20s/it]
Training 5/16 epoch (loss 1.8516):  30%|███       | 268/880 [26:40<1:07:59,  6.67s/it]
Training 5/16 epoch (loss 1.8047):  30%|███       | 268/880 [26:47<1:07:59,  6.67s/it]
Training 5/16 epoch (loss 1.8047):  31%|███       | 269/880 [26:47<1:08:52,  6.76s/it]
Training 5/16 epoch (loss 1.9453):  31%|███       | 269/880 [26:53<1:08:52,  6.76s/it]
Training 5/16 epoch (loss 1.9453):  31%|███       | 270/880 [26:53<1:05:29,  6.44s/it]
Training 5/16 epoch (loss 2.0000):  31%|███       | 270/880 [27:00<1:05:29,  6.44s/it]
Training 5/16 epoch (loss 2.0000):  31%|███       | 271/880 [27:00<1:05:50,  6.49s/it]
Training 5/16 epoch (loss 1.7031):  31%|███       | 271/880 [27:05<1:05:50,  6.49s/it]
Training 5/16 epoch (loss 1.7031):  31%|███       | 272/880 [27:05<1:02:25,  6.16s/it]
Training 5/16 epoch (loss 2.0469):  31%|███       | 272/880 [27:10<1:02:25,  6.16s/it]
Training 5/16 epoch (loss 2.0469):  31%|███       | 273/880 [27:10<58:03,  5.74s/it]
Training 5/16 epoch (loss 1.9219):  31%|███       | 273/880 [27:17<58:03,  5.74s/it]
Training 5/16 epoch (loss 1.9219):  31%|███       | 274/880 [27:17<1:01:30,  6.09s/it]
Training 5/16 epoch (loss 1.7266):  31%|███       | 274/880 [27:21<1:01:30,  6.09s/it]
Training 5/16 epoch (loss 1.7266):  31%|███▏      | 275/880 [27:21<56:58,  5.65s/it]
Training 6/16 epoch (loss 1.8828):  31%|███▏      | 275/880 [27:27<56:58,  5.65s/it]
Training 6/16 epoch (loss 1.8828):  31%|███▏      | 276/880 [27:27<55:35,  5.52s/it]
Training 6/16 epoch (loss 1.9531):  31%|███▏      | 276/880 [27:33<55:35,  5.52s/it]
Training 6/16 epoch (loss 1.9531):  31%|███▏      | 277/880 [27:33<56:51,  5.66s/it]
Training 6/16 epoch (loss 1.7422):  31%|███▏      | 277/880 [27:39<56:51,  5.66s/it]
Training 6/16 epoch (loss 1.7422):  32%|███▏      | 278/880 [27:39<57:48,  5.76s/it]
Training 6/16 epoch (loss 2.0312):  32%|███▏      | 278/880 [27:43<57:48,  5.76s/it]
Training 6/16 epoch (loss 2.0312):  32%|███▏      | 279/880 [27:43<54:28,  5.44s/it]
Training 6/16 epoch (loss 1.9375):  32%|███▏      | 279/880 [27:49<54:28,  5.44s/it]
Training 6/16 epoch (loss 1.9375):  32%|███▏      | 280/880 [27:49<53:48,  5.38s/it]
Training 6/16 epoch (loss 1.8047):  32%|███▏      | 280/880 [27:54<53:48,  5.38s/it]
Training 6/16 epoch (loss 1.8047):  32%|███▏      | 281/880 [27:54<55:01,  5.51s/it]
Training 6/16 epoch (loss 1.8594):  32%|███▏      | 281/880 [27:59<55:01,  5.51s/it]
Training 6/16 epoch (loss 1.8594):  32%|███▏      | 282/880 [27:59<52:18,  5.25s/it]
Training 6/16 epoch (loss 1.8594):  32%|███▏      | 282/880 [28:05<52:18,  5.25s/it]
Training 6/16 epoch (loss 1.8594):  32%|███▏      | 283/880 [28:05<53:12,  5.35s/it]
Training 6/16 epoch (loss 2.0000):  32%|███▏      | 283/880 [28:13<53:12,  5.35s/it]
Training 6/16 epoch (loss 2.0000):  32%|███▏      | 284/880 [28:13<1:02:34,  6.30s/it]
Training 6/16 epoch (loss 1.9375):  32%|███▏      | 284/880 [28:18<1:02:34,  6.30s/it]
Training 6/16 epoch (loss 1.9375):  32%|███▏      | 285/880 [28:18<58:48,  5.93s/it]
Training 6/16 epoch (loss 1.9609):  32%|███▏      | 285/880 [28:23<58:48,  5.93s/it]
Training 6/16 epoch (loss 1.9609):  32%|███▎      | 286/880 [28:23<55:22,  5.59s/it]
Training 6/16 epoch (loss 1.9219):  32%|███▎      | 286/880 [28:30<55:22,  5.59s/it]
Training 6/16 epoch (loss 1.9219):  33%|███▎      | 287/880 [28:30<1:00:46,  6.15s/it]
Training 6/16 epoch (loss 1.6797):  33%|███▎      | 287/880 [28:35<1:00:46,  6.15s/it]
Training 6/16 epoch (loss 1.6797):  33%|███▎      | 288/880 [28:35<56:36,  5.74s/it]
Training 6/16 epoch (loss 2.0938):  33%|███▎      | 288/880 [28:40<56:36,  5.74s/it]
Training 6/16 epoch (loss 2.0938):  33%|███▎      | 289/880 [28:40<53:40,  5.45s/it]
Training 6/16 epoch (loss 1.8203):  33%|███▎      | 289/880 [28:45<53:40,  5.45s/it]
Training 6/16 epoch (loss 1.8203):  33%|███▎      | 290/880 [28:45<53:47,  5.47s/it]
Training 6/16 epoch (loss 1.9375):  33%|███▎      | 290/880 [28:51<53:47,  5.47s/it]
Training 6/16 epoch (loss 1.9375):  33%|███▎      | 291/880 [28:51<52:49,  5.38s/it]
Training 6/16 epoch (loss 1.8203):  33%|███▎      | 291/880 [28:57<52:49,  5.38s/it]
Training 6/16 epoch (loss 1.8203):  33%|███▎      | 292/880 [28:57<56:08,  5.73s/it]
Training 6/16 epoch (loss 1.8516):  33%|███▎      | 292/880 [29:02<56:08,  5.73s/it]
Training 6/16 epoch (loss 1.8516):  33%|███▎      | 293/880 [29:02<54:03,  5.53s/it]
Training 6/16 epoch (loss 1.9375):  33%|███▎      | 293/880 [29:07<54:03,  5.53s/it]
Training 6/16 epoch (loss 1.9375):  33%|███▎      | 294/880 [29:07<52:16,  5.35s/it]
Training 6/16 epoch (loss 1.5938):  33%|███▎      | 294/880 [29:11<52:16,  5.35s/it]
Training 6/16 epoch (loss 1.5938):  34%|███▎      | 295/880 [29:11<48:41,  4.99s/it]
Training 6/16 epoch (loss 1.8125):  34%|███▎      | 295/880 [29:16<48:41,  4.99s/it]
Training 6/16 epoch (loss 1.8125):  34%|███▎      | 296/880 [29:16<48:21,  4.97s/it]
Training 6/16 epoch (loss 2.0781):  34%|███▎      | 296/880 [29:22<48:21,  4.97s/it]
Training 6/16 epoch (loss 2.0781):  34%|███▍      | 297/880 [29:22<49:31,  5.10s/it]
Training 6/16 epoch (loss 1.8750):  34%|███▍      | 297/880 [29:26<49:31,  5.10s/it]
Training 6/16 epoch (loss 1.8750):  34%|███▍      | 298/880 [29:26<46:34,  4.80s/it]
Training 6/16 epoch (loss 1.7266):  34%|███▍      | 298/880 [29:31<46:34,  4.80s/it]
Training 6/16 epoch (loss 1.7266):  34%|███▍      | 299/880 [29:31<47:21,  4.89s/it]
Training 6/16 epoch (loss 1.8594):  34%|███▍      | 299/880 [29:43<47:21,  4.89s/it]
Training 6/16 epoch (loss 1.8594):  34%|███▍      | 300/880 [29:43<1:09:15,  7.17s/it]
Training 6/16 epoch (loss 1.6953):  34%|███▍      | 300/880 [29:49<1:09:15,  7.17s/it]
Training 6/16 epoch (loss 1.6953):  34%|███▍      | 301/880 [29:49<1:05:13,  6.76s/it]
Training 6/16 epoch (loss 1.8516):  34%|███▍      | 301/880 [29:54<1:05:13,  6.76s/it]
Training 6/16 epoch (loss 1.8516):  34%|███▍      | 302/880 [29:54<1:01:03,  6.34s/it]
Training 6/16 epoch (loss 1.5781):  34%|███▍      | 302/880 [30:00<1:01:03,  6.34s/it]
Training 6/16 epoch (loss 1.5781):  34%|███▍      | 303/880 [30:00<57:56,  6.03s/it]
Training 6/16 epoch (loss 1.7578):  34%|███▍      | 303/880 [30:04<57:56,  6.03s/it]
Training 6/16 epoch (loss 1.7578):  35%|███▍      | 304/880 [30:04<53:57,  5.62s/it]
Training 6/16 epoch (loss 1.7422):  35%|███▍      | 304/880 [30:10<53:57,  5.62s/it]
Training 6/16 epoch (loss 1.7422):  35%|███▍      | 305/880 [30:10<53:34,  5.59s/it]
Training 6/16 epoch (loss 1.8047):  35%|███▍      | 305/880 [30:14<53:34,  5.59s/it]
Training 6/16 epoch (loss 1.8047):  35%|███▍      | 306/880 [30:14<50:07,  5.24s/it]
Training 6/16 epoch (loss 1.6328):  35%|███▍      | 306/880 [30:19<50:07,  5.24s/it]
Training 6/16 epoch (loss 1.6328):  35%|███▍      | 307/880 [30:19<48:06,  5.04s/it]
Training 6/16 epoch (loss 1.8438):  35%|███▍      | 307/880 [30:24<48:06,  5.04s/it]
Training 6/16 epoch (loss 1.8438):  35%|███▌      | 308/880 [30:24<49:15,  5.17s/it]
Training 6/16 epoch (loss 1.6875):  35%|███▌      | 308/880 [30:30<49:15,  5.17s/it]
Training 6/16 epoch (loss 1.6875):  35%|███▌      | 309/880 [30:30<49:47,  5.23s/it]
Training 6/16 epoch (loss 1.6328):  35%|███▌      | 309/880 [30:35<49:47,  5.23s/it]
Training 6/16 epoch (loss 1.6328):  35%|███▌      | 310/880 [30:35<50:09,  5.28s/it]
Training 6/16 epoch (loss 1.8359):  35%|███▌      | 310/880 [30:42<50:09,  5.28s/it]
Training 6/16 epoch (loss 1.8359):  35%|███▌      | 311/880 [30:42<53:19,  5.62s/it]
Training 6/16 epoch (loss 1.8906):  35%|███▌      | 311/880 [30:56<53:19,  5.62s/it]
Training 6/16 epoch (loss 1.8906):  35%|███▌      | 312/880 [30:56<1:18:08,  8.25s/it]
Training 6/16 epoch (loss 1.6953):  35%|███▌      | 312/880 [31:01<1:18:08,  8.25s/it]
Training 6/16 epoch (loss 1.6953):  36%|███▌      | 313/880 [31:01<1:09:09,  7.32s/it]
Training 6/16 epoch (loss 1.9141):  36%|███▌      | 313/880 [31:06<1:09:09,  7.32s/it]
Training 6/16 epoch (loss 1.9141):  36%|███▌      | 314/880 [31:06<1:01:15,  6.49s/it]
Training 6/16 epoch (loss 1.7188):  36%|███▌      | 314/880 [31:11<1:01:15,  6.49s/it]
Training 6/16 epoch (loss 1.7188):  36%|███▌      | 315/880 [31:11<57:23,  6.10s/it]
Training 6/16 epoch (loss 1.7422):  36%|███▌      | 315/880 [31:16<57:23,  6.10s/it]
Training 6/16 epoch (loss 1.7422):  36%|███▌      | 316/880 [31:16<54:00,  5.75s/it]
Training 6/16 epoch (loss 1.8047):  36%|███▌      | 316/880 [31:21<54:00,  5.75s/it]
Training 6/16 epoch (loss 1.8047):  36%|███▌      | 317/880 [31:21<53:00,  5.65s/it]
Training 6/16 epoch (loss 1.6875):  36%|███▌      | 317/880 [31:27<53:00,  5.65s/it]
Training 6/16 epoch (loss 1.6875):  36%|███▌      | 318/880 [31:27<54:23,  5.81s/it]
Training 6/16 epoch (loss 2.0156):  36%|███▌      | 318/880 [31:35<54:23,  5.81s/it]
Training 6/16 epoch (loss 2.0156):  36%|β–ˆβ–ˆβ–ˆβ–‹      | 319/880 [31:35<1:00:06,  6.43s/it]
Training 6/16 epoch (loss 1.9453):  36%|β–ˆβ–ˆβ–ˆβ–‹      | 319/880 [31:51<1:00:06,  6.43s/it]
Training 6/16 epoch (loss 1.9453):  36%|β–ˆβ–ˆβ–ˆβ–‹      | 320/880 [31:51<1:26:04,  9.22s/it]
Training 6/16 epoch (loss 1.6406):  36%|β–ˆβ–ˆβ–ˆβ–‹      | 320/880 [31:57<1:26:04,  9.22s/it]
Training 6/16 epoch (loss 1.6406):  36%|β–ˆβ–ˆβ–ˆβ–‹      | 321/880 [31:57<1:16:37,  8.23s/it]
Training 6/16 epoch (loss 1.5469):  36%|β–ˆβ–ˆβ–ˆβ–‹      | 321/880 [32:02<1:16:37,  8.23s/it]
Training 6/16 epoch (loss 1.5469):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 322/880 [32:02<1:07:01,  7.21s/it]
Training 6/16 epoch (loss 1.6797):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 322/880 [32:07<1:07:01,  7.21s/it]
Training 6/16 epoch (loss 1.6797):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 323/880 [32:07<1:01:55,  6.67s/it]
Training 6/16 epoch (loss 1.6406):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 323/880 [32:14<1:01:55,  6.67s/it]
Training 6/16 epoch (loss 1.6406):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 324/880 [32:14<1:02:42,  6.77s/it]
Training 6/16 epoch (loss 1.7812):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 324/880 [32:20<1:02:42,  6.77s/it]
Training 6/16 epoch (loss 1.7812):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 325/880 [32:20<59:34,  6.44s/it]  
Training 6/16 epoch (loss 1.8438):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 325/880 [32:26<59:34,  6.44s/it]
Training 6/16 epoch (loss 1.8438):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 326/880 [32:26<59:52,  6.48s/it]
Training 6/16 epoch (loss 1.5391):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 326/880 [32:32<59:52,  6.48s/it]
Training 6/16 epoch (loss 1.5391):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 327/880 [32:32<56:43,  6.15s/it]
Training 6/16 epoch (loss 1.8906):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 327/880 [32:37<56:43,  6.15s/it]
Training 6/16 epoch (loss 1.8906):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 328/880 [32:37<52:45,  5.73s/it]
Training 6/16 epoch (loss 1.8047):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 328/880 [32:44<52:45,  5.73s/it]
Training 6/16 epoch (loss 1.8047):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 329/880 [32:44<55:55,  6.09s/it]
Training 6/16 epoch (loss 1.5781):  37%|β–ˆβ–ˆβ–ˆβ–‹      | 329/880 [32:48<55:55,  6.09s/it]
Training 6/16 epoch (loss 1.5781):  38%|β–ˆβ–ˆβ–ˆβ–Š      | 330/880 [32:48<51:49,  5.65s/it]
Training 7/16 epoch (loss 1.7578):  38%|███▊      | 330/880 [32:53<51:49,  5.65s/it]
Training 7/16 epoch (loss 1.7578):  38%|███▊      | 331/880 [32:53<50:33,  5.53s/it]
Training 7/16 epoch (loss 1.8125):  38%|███▊      | 331/880 [32:59<50:33,  5.53s/it]
Training 7/16 epoch (loss 1.8125):  38%|███▊      | 332/880 [32:59<51:44,  5.66s/it]
Training 7/16 epoch (loss 1.5938):  38%|███▊      | 332/880 [33:05<51:44,  5.66s/it]
Training 7/16 epoch (loss 1.5938):  38%|███▊      | 333/880 [33:05<52:33,  5.77s/it]
Training 7/16 epoch (loss 1.9141):  38%|███▊      | 333/880 [33:10<52:33,  5.77s/it]
Training 7/16 epoch (loss 1.9141):  38%|███▊      | 334/880 [33:10<49:31,  5.44s/it]
Training 7/16 epoch (loss 1.8203):  38%|███▊      | 334/880 [33:15<49:31,  5.44s/it]
Training 7/16 epoch (loss 1.8203):  38%|███▊      | 335/880 [33:15<48:53,  5.38s/it]
Training 7/16 epoch (loss 1.6953):  38%|███▊      | 335/880 [33:21<48:53,  5.38s/it]
Training 7/16 epoch (loss 1.6953):  38%|███▊      | 336/880 [33:21<49:56,  5.51s/it]
Training 7/16 epoch (loss 1.7578):  38%|███▊      | 336/880 [33:26<49:56,  5.51s/it]
Training 7/16 epoch (loss 1.7578):  38%|███▊      | 337/880 [33:26<47:27,  5.24s/it]
Training 7/16 epoch (loss 1.7422):  38%|███▊      | 337/880 [33:31<47:27,  5.24s/it]
Training 7/16 epoch (loss 1.7422):  38%|███▊      | 338/880 [33:31<48:15,  5.34s/it]
Training 7/16 epoch (loss 1.8906):  38%|███▊      | 338/880 [33:40<48:15,  5.34s/it]
Training 7/16 epoch (loss 1.8906):  39%|███▊      | 339/880 [33:40<56:44,  6.29s/it]
Training 7/16 epoch (loss 1.8594):  39%|███▊      | 339/880 [33:45<56:44,  6.29s/it]
Training 7/16 epoch (loss 1.8594):  39%|███▊      | 340/880 [33:45<53:19,  5.93s/it]
Training 7/16 epoch (loss 1.8672):  39%|███▊      | 340/880 [33:50<53:19,  5.93s/it]
Training 7/16 epoch (loss 1.8672):  39%|███▉      | 341/880 [33:50<50:14,  5.59s/it]
Training 7/16 epoch (loss 1.7969):  39%|███▉      | 341/880 [33:57<50:14,  5.59s/it]
Training 7/16 epoch (loss 1.7969):  39%|███▉      | 342/880 [33:57<55:09,  6.15s/it]
Training 7/16 epoch (loss 1.5859):  39%|███▉      | 342/880 [34:02<55:09,  6.15s/it]
Training 7/16 epoch (loss 1.5859):  39%|███▉      | 343/880 [34:02<51:23,  5.74s/it]
Training 7/16 epoch (loss 1.9922):  39%|███▉      | 343/880 [34:07<51:23,  5.74s/it]
Training 7/16 epoch (loss 1.9922):  39%|███▉      | 344/880 [34:07<48:42,  5.45s/it]
Training 7/16 epoch (loss 1.7266):  39%|███▉      | 344/880 [34:12<48:42,  5.45s/it]
Training 7/16 epoch (loss 1.7266):  39%|███▉      | 345/880 [34:12<48:47,  5.47s/it]
Training 7/16 epoch (loss 1.8281):  39%|███▉      | 345/880 [34:17<48:47,  5.47s/it]
Training 7/16 epoch (loss 1.8281):  39%|███▉      | 346/880 [34:17<47:54,  5.38s/it]
Training 7/16 epoch (loss 1.7031):  39%|███▉      | 346/880 [34:24<47:54,  5.38s/it]
Training 7/16 epoch (loss 1.7031):  39%|███▉      | 347/880 [34:24<50:52,  5.73s/it]
Training 7/16 epoch (loss 1.7031):  39%|███▉      | 347/880 [34:29<50:52,  5.73s/it]
Training 7/16 epoch (loss 1.7031):  40%|███▉      | 348/880 [34:29<48:58,  5.52s/it]
Training 7/16 epoch (loss 1.8047):  40%|███▉      | 348/880 [34:34<48:58,  5.52s/it]
Training 7/16 epoch (loss 1.8047):  40%|███▉      | 349/880 [34:34<47:19,  5.35s/it]
Training 7/16 epoch (loss 1.4531):  40%|███▉      | 349/880 [34:38<47:19,  5.35s/it]
Training 7/16 epoch (loss 1.4531):  40%|███▉      | 350/880 [34:38<44:04,  4.99s/it]
Training 7/16 epoch (loss 1.6641):  40%|███▉      | 350/880 [34:43<44:04,  4.99s/it]
Training 7/16 epoch (loss 1.6641):  40%|███▉      | 351/880 [34:43<43:46,  4.97s/it]
Training 7/16 epoch (loss 1.9453):  40%|███▉      | 351/880 [34:48<43:46,  4.97s/it]
Training 7/16 epoch (loss 1.9453):  40%|████      | 352/880 [34:48<44:50,  5.09s/it]
Training 7/16 epoch (loss 1.7422):  40%|████      | 352/880 [34:53<44:50,  5.09s/it]
Training 7/16 epoch (loss 1.7422):  40%|████      | 353/880 [34:53<42:11,  4.80s/it]
Training 7/16 epoch (loss 1.6016):  40%|████      | 353/880 [34:58<42:11,  4.80s/it]
Training 7/16 epoch (loss 1.6016):  40%|████      | 354/880 [34:58<42:55,  4.90s/it]
Training 7/16 epoch (loss 1.7500):  40%|████      | 354/880 [35:10<42:55,  4.90s/it]
Training 7/16 epoch (loss 1.7500):  40%|████      | 355/880 [35:10<1:02:45,  7.17s/it]
Training 7/16 epoch (loss 1.5547):  40%|████      | 355/880 [35:16<1:02:45,  7.17s/it]
Training 7/16 epoch (loss 1.5547):  40%|████      | 356/880 [35:16<59:04,  6.76s/it]
Training 7/16 epoch (loss 1.7344):  40%|████      | 356/880 [35:21<59:04,  6.76s/it]
Training 7/16 epoch (loss 1.7344):  41%|████      | 357/880 [35:21<55:16,  6.34s/it]
Training 7/16 epoch (loss 1.4609):  41%|████      | 357/880 [35:27<55:16,  6.34s/it]
Training 7/16 epoch (loss 1.4609):  41%|████      | 358/880 [35:27<52:26,  6.03s/it]
Training 7/16 epoch (loss 1.6406):  41%|████      | 358/880 [35:31<52:26,  6.03s/it]
Training 7/16 epoch (loss 1.6406):  41%|████      | 359/880 [35:31<48:47,  5.62s/it]
Training 7/16 epoch (loss 1.6172):  41%|████      | 359/880 [35:37<48:47,  5.62s/it]
Training 7/16 epoch (loss 1.6172):  41%|████      | 360/880 [35:37<48:26,  5.59s/it]
Training 7/16 epoch (loss 1.6719):  41%|████      | 360/880 [35:41<48:26,  5.59s/it]
Training 7/16 epoch (loss 1.6719):  41%|████      | 361/880 [35:41<45:17,  5.24s/it]
Training 7/16 epoch (loss 1.4922):  41%|████      | 361/880 [35:46<45:17,  5.24s/it]
Training 7/16 epoch (loss 1.4922):  41%|████      | 362/880 [35:46<43:28,  5.04s/it]
Training 7/16 epoch (loss 1.7266):  41%|████      | 362/880 [35:51<43:28,  5.04s/it]
Training 7/16 epoch (loss 1.7266):  41%|████▏     | 363/880 [35:51<44:30,  5.16s/it]
Training 7/16 epoch (loss 1.5625):  41%|████▏     | 363/880 [35:57<44:30,  5.16s/it]
Training 7/16 epoch (loss 1.5625):  41%|████▏     | 364/880 [35:57<45:00,  5.23s/it]
Training 7/16 epoch (loss 1.5078):  41%|████▏     | 364/880 [36:02<45:00,  5.23s/it]
Training 7/16 epoch (loss 1.5078):  41%|████▏     | 365/880 [36:02<45:21,  5.28s/it]
Training 7/16 epoch (loss 1.6953):  41%|████▏     | 365/880 [36:08<45:21,  5.28s/it]
Training 7/16 epoch (loss 1.6953):  42%|████▏     | 366/880 [36:08<48:12,  5.63s/it]
Training 7/16 epoch (loss 1.7500):  42%|████▏     | 366/880 [36:23<48:12,  5.63s/it]
Training 7/16 epoch (loss 1.7500):  42%|████▏     | 367/880 [36:23<1:10:33,  8.25s/it]
Training 7/16 epoch (loss 1.5703):  42%|████▏     | 367/880 [36:28<1:10:33,  8.25s/it]
Training 7/16 epoch (loss 1.5703):  42%|████▏     | 368/880 [36:28<1:02:26,  7.32s/it]
Training 7/16 epoch (loss 1.7891):  42%|████▏     | 368/880 [36:32<1:02:26,  7.32s/it]
Training 7/16 epoch (loss 1.7891):  42%|████▏     | 369/880 [36:32<55:16,  6.49s/it]
Training 7/16 epoch (loss 1.5703):  42%|████▏     | 369/880 [36:38<55:16,  6.49s/it]
Training 7/16 epoch (loss 1.5703):  42%|████▏     | 370/880 [36:38<51:46,  6.09s/it]
Training 7/16 epoch (loss 1.6016):  42%|████▏     | 370/880 [36:43<51:46,  6.09s/it]
Training 7/16 epoch (loss 1.6016):  42%|████▏     | 371/880 [36:43<48:42,  5.74s/it]
Training 7/16 epoch (loss 1.6875):  42%|████▏     | 371/880 [36:48<48:42,  5.74s/it]
Training 7/16 epoch (loss 1.6875):  42%|████▏     | 372/880 [36:48<47:46,  5.64s/it]
Training 7/16 epoch (loss 1.5547):  42%|████▏     | 372/880 [36:54<47:46,  5.64s/it]
Training 7/16 epoch (loss 1.5547):  42%|████▏     | 373/880 [36:54<49:00,  5.80s/it]
Training 7/16 epoch (loss 1.9453):  42%|████▏     | 373/880 [37:02<49:00,  5.80s/it]
Training 7/16 epoch (loss 1.9453):  42%|████▎     | 374/880 [37:02<54:10,  6.42s/it]
Training 7/16 epoch (loss 1.7969):  42%|████▎     | 374/880 [37:18<54:10,  6.42s/it]
Training 7/16 epoch (loss 1.7969):  43%|████▎     | 375/880 [37:18<1:17:36,  9.22s/it]
Training 7/16 epoch (loss 1.5312):  43%|████▎     | 375/880 [37:24<1:17:36,  9.22s/it]
Training 7/16 epoch (loss 1.5312):  43%|████▎     | 376/880 [37:24<1:09:07,  8.23s/it]
Training 7/16 epoch (loss 1.4297):  43%|████▎     | 376/880 [37:29<1:09:07,  8.23s/it]
Training 7/16 epoch (loss 1.4297):  43%|████▎     | 377/880 [37:29<1:00:27,  7.21s/it]
Training 7/16 epoch (loss 1.5703):  43%|████▎     | 377/880 [37:34<1:00:27,  7.21s/it]
Training 7/16 epoch (loss 1.5703):  43%|████▎     | 378/880 [37:34<55:51,  6.68s/it]
Training 7/16 epoch (loss 1.5234):  43%|████▎     | 378/880 [37:41<55:51,  6.68s/it]
Training 7/16 epoch (loss 1.5234):  43%|████▎     | 379/880 [37:41<56:31,  6.77s/it]
Training 7/16 epoch (loss 1.6797):  43%|████▎     | 379/880 [37:47<56:31,  6.77s/it]
Training 7/16 epoch (loss 1.6797):  43%|████▎     | 380/880 [37:47<53:40,  6.44s/it]
Training 7/16 epoch (loss 1.7266):  43%|████▎     | 380/880 [37:53<53:40,  6.44s/it]
Training 7/16 epoch (loss 1.7266):  43%|████▎     | 381/880 [37:53<53:55,  6.48s/it]
Training 7/16 epoch (loss 1.4062):  43%|████▎     | 381/880 [37:59<53:55,  6.48s/it]
Training 7/16 epoch (loss 1.4062):  43%|████▎     | 382/880 [37:59<51:04,  6.15s/it]
Training 7/16 epoch (loss 1.7734):  43%|████▎     | 382/880 [38:03<51:04,  6.15s/it]
Training 7/16 epoch (loss 1.7734):  44%|████▎     | 383/880 [38:03<47:28,  5.73s/it]
Training 7/16 epoch (loss 1.7031):  44%|████▎     | 383/880 [38:10<47:28,  5.73s/it]
Training 7/16 epoch (loss 1.7031):  44%|████▎     | 384/880 [38:10<50:18,  6.09s/it]
Training 7/16 epoch (loss 1.4609):  44%|████▎     | 384/880 [38:15<50:18,  6.09s/it]
Training 7/16 epoch (loss 1.4609):  44%|████▍     | 385/880 [38:15<46:36,  5.65s/it]
Training 8/16 epoch (loss 1.6875):  44%|████▍     | 385/880 [38:20<46:36,  5.65s/it]
Training 8/16 epoch (loss 1.6875):  44%|████▍     | 386/880 [38:20<45:29,  5.53s/it]
Training 8/16 epoch (loss 1.7031):  44%|████▍     | 386/880 [38:26<45:29,  5.53s/it]
Training 8/16 epoch (loss 1.7031):  44%|████▍     | 387/880 [38:26<46:32,  5.66s/it]
Training 8/16 epoch (loss 1.5078):  44%|████▍     | 387/880 [38:32<46:32,  5.66s/it]
Training 8/16 epoch (loss 1.5078):  44%|████▍     | 388/880 [38:32<47:18,  5.77s/it]
Training 8/16 epoch (loss 1.7734):  44%|████▍     | 388/880 [38:37<47:18,  5.77s/it]
Training 8/16 epoch (loss 1.7734):  44%|████▍     | 389/880 [38:37<44:34,  5.45s/it]
Training 8/16 epoch (loss 1.6875):  44%|████▍     | 389/880 [38:42<44:34,  5.45s/it]
Training 8/16 epoch (loss 1.6875):  44%|████▍     | 390/880 [38:42<43:58,  5.39s/it]
Training 8/16 epoch (loss 1.5625):  44%|████▍     | 390/880 [38:48<43:58,  5.39s/it]
Training 8/16 epoch (loss 1.5625):  44%|████▍     | 391/880 [38:48<44:54,  5.51s/it]
Training 8/16 epoch (loss 1.6562):  44%|████▍     | 391/880 [38:52<44:54,  5.51s/it]
Training 8/16 epoch (loss 1.6562):  45%|████▍     | 392/880 [38:52<42:39,  5.25s/it]
Training 8/16 epoch (loss 1.6406):  45%|████▍     | 392/880 [38:58<42:39,  5.25s/it]
Training 8/16 epoch (loss 1.6406):  45%|████▍     | 393/880 [38:58<43:22,  5.34s/it]
Training 8/16 epoch (loss 1.8281):  45%|████▍     | 393/880 [39:07<43:22,  5.34s/it]
Training 8/16 epoch (loss 1.8281):  45%|████▍     | 394/880 [39:07<50:58,  6.29s/it]
Training 8/16 epoch (loss 1.7969):  45%|████▍     | 394/880 [39:12<50:58,  6.29s/it]
Training 8/16 epoch (loss 1.7969):  45%|████▍     | 395/880 [39:12<47:52,  5.92s/it]
Training 8/16 epoch (loss 1.7812):  45%|████▍     | 395/880 [39:16<47:52,  5.92s/it]
Training 8/16 epoch (loss 1.7812):  45%|████▌     | 396/880 [39:16<45:05,  5.59s/it]
Training 8/16 epoch (loss 1.7031):  45%|████▌     | 396/880 [39:24<45:05,  5.59s/it]
Training 8/16 epoch (loss 1.7031):  45%|████▌     | 397/880 [39:24<49:30,  6.15s/it]
Training 8/16 epoch (loss 1.5000):  45%|████▌     | 397/880 [39:29<49:30,  6.15s/it]
Training 8/16 epoch (loss 1.5000):  45%|████▌     | 398/880 [39:29<46:08,  5.74s/it]
Training 8/16 epoch (loss 1.9219):  45%|████▌     | 398/880 [39:33<46:08,  5.74s/it]
Training 8/16 epoch (loss 1.9219):  45%|████▌     | 399/880 [39:33<43:44,  5.46s/it]
Training 8/16 epoch (loss 1.6328):  45%|████▌     | 399/880 [39:39<43:44,  5.46s/it]
Training 8/16 epoch (loss 1.6328):  45%|████▌     | 400/880 [39:39<43:49,  5.48s/it]
Training 8/16 epoch (loss 1.7500):  45%|████▌     | 400/880 [39:44<43:49,  5.48s/it]
Training 8/16 epoch (loss 1.7500):  46%|████▌     | 401/880 [39:44<43:01,  5.39s/it]
Training 8/16 epoch (loss 1.5938):  46%|████▌     | 401/880 [39:51<43:01,  5.39s/it]
Training 8/16 epoch (loss 1.5938):  46%|████▌     | 402/880 [39:51<45:41,  5.74s/it]
Training 8/16 epoch (loss 1.6172):  46%|████▌     | 402/880 [39:56<45:41,  5.74s/it]
Training 8/16 epoch (loss 1.6172):  46%|████▌     | 403/880 [39:56<43:57,  5.53s/it]
Training 8/16 epoch (loss 1.6953):  46%|████▌     | 403/880 [40:01<43:57,  5.53s/it]
Training 8/16 epoch (loss 1.6953):  46%|████▌     | 404/880 [40:01<42:28,  5.35s/it]
Training 8/16 epoch (loss 1.3359):  46%|████▌     | 404/880 [40:05<42:28,  5.35s/it]
Training 8/16 epoch (loss 1.3359):  46%|████▌     | 405/880 [40:05<39:31,  4.99s/it]
Training 8/16 epoch (loss 1.5547):  46%|████▌     | 405/880 [40:10<39:31,  4.99s/it]
Training 8/16 epoch (loss 1.5547):  46%|████▌     | 406/880 [40:10<39:14,  4.97s/it]
Training 8/16 epoch (loss 1.8438):  46%|████▌     | 406/880 [40:15<39:14,  4.97s/it]
Training 8/16 epoch (loss 1.8438):  46%|████▋     | 407/880 [40:15<40:10,  5.10s/it]
Training 8/16 epoch (loss 1.6406):  46%|████▋     | 407/880 [40:19<40:10,  5.10s/it]
Training 8/16 epoch (loss 1.6406):  46%|████▋     | 408/880 [40:19<37:47,  4.80s/it]
Training 8/16 epoch (loss 1.4844):  46%|████▋     | 408/880 [40:24<37:47,  4.80s/it]
Training 8/16 epoch (loss 1.4844):  46%|████▋     | 409/880 [40:24<38:25,  4.89s/it]
Training 8/16 epoch (loss 1.6250):  46%|████▋     | 409/880 [40:37<38:25,  4.89s/it]
Training 8/16 epoch (loss 1.6250):  47%|████▋     | 410/880 [40:37<56:10,  7.17s/it]
Training 8/16 epoch (loss 1.4297):  47%|████▋     | 410/880 [40:43<56:10,  7.17s/it]
Training 8/16 epoch (loss 1.4297):  47%|████▋     | 411/880 [40:43<52:53,  6.77s/it]
Training 8/16 epoch (loss 1.6094):  47%|████▋     | 411/880 [40:48<52:53,  6.77s/it]
Training 8/16 epoch (loss 1.6094):  47%|████▋     | 412/880 [40:48<49:30,  6.35s/it]
Training 8/16 epoch (loss 1.3438):  47%|████▋     | 412/880 [40:53<49:30,  6.35s/it]
Training 8/16 epoch (loss 1.3438):  47%|████▋     | 413/880 [40:53<46:56,  6.03s/it]
Training 8/16 epoch (loss 1.5078):  47%|████▋     | 413/880 [40:58<46:56,  6.03s/it]
Training 8/16 epoch (loss 1.5078):  47%|████▋     | 414/880 [40:58<43:40,  5.62s/it]
Training 8/16 epoch (loss 1.5078):  47%|████▋     | 414/880 [41:04<43:40,  5.62s/it]
Training 8/16 epoch (loss 1.5078):  47%|████▋     | 415/880 [41:04<43:20,  5.59s/it]
Training 8/16 epoch (loss 1.5625):  47%|████▋     | 415/880 [41:08<43:20,  5.59s/it]
Training 8/16 epoch (loss 1.5625):  47%|████▋     | 416/880 [41:08<40:31,  5.24s/it]
Training 8/16 epoch (loss 1.3828):  47%|████▋     | 416/880 [41:13<40:31,  5.24s/it]
Training 8/16 epoch (loss 1.3828):  47%|████▋     | 417/880 [41:13<38:52,  5.04s/it]
Training 8/16 epoch (loss 1.6016):  47%|████▋     | 417/880 [41:18<38:52,  5.04s/it]
Training 8/16 epoch (loss 1.6016):  48%|████▊     | 418/880 [41:18<39:46,  5.17s/it]
Training 8/16 epoch (loss 1.4141):  48%|████▊     | 418/880 [41:23<39:46,  5.17s/it]
Training 8/16 epoch (loss 1.4141):  48%|████▊     | 419/880 [41:23<40:12,  5.23s/it]
Training 8/16 epoch (loss 1.3984):  48%|████▊     | 419/880 [41:29<40:12,  5.23s/it]
Training 8/16 epoch (loss 1.3984):  48%|████▊     | 420/880 [41:29<40:29,  5.28s/it]
Training 8/16 epoch (loss 1.5938):  48%|████▊     | 420/880 [41:35<40:29,  5.28s/it]
Training 8/16 epoch (loss 1.5938):  48%|████▊     | 421/880 [41:35<43:02,  5.63s/it]
Training 8/16 epoch (loss 1.6484):  48%|████▊     | 421/880 [41:50<43:02,  5.63s/it]
Training 8/16 epoch (loss 1.6484):  48%|████▊     | 422/880 [41:50<1:02:59,  8.25s/it]
Training 8/16 epoch (loss 1.4531):  48%|████▊     | 422/880 [41:55<1:02:59,  8.25s/it]
Training 8/16 epoch (loss 1.4531):  48%|████▊     | 423/880 [41:55<55:43,  7.32s/it]
Training 8/16 epoch (loss 1.6484):  48%|████▊     | 423/880 [41:59<55:43,  7.32s/it]
Training 8/16 epoch (loss 1.6484):  48%|████▊     | 424/880 [41:59<49:20,  6.49s/it]
Training 8/16 epoch (loss 1.4531):  48%|████▊     | 424/880 [42:04<49:20,  6.49s/it]
Training 8/16 epoch (loss 1.4531):  48%|████▊     | 425/880 [42:04<46:11,  6.09s/it]
Training 8/16 epoch (loss 1.4922):  48%|████▊     | 425/880 [42:09<46:11,  6.09s/it]
Training 8/16 epoch (loss 1.4922):  48%|████▊     | 426/880 [42:09<43:26,  5.74s/it]
Training 8/16 epoch (loss 1.6094):  48%|████▊     | 426/880 [42:15<43:26,  5.74s/it]
Training 8/16 epoch (loss 1.6094):  49%|████▊     | 427/880 [42:15<42:36,  5.64s/it]
Training 8/16 epoch (loss 1.4453):  49%|████▊     | 427/880 [42:21<42:36,  5.64s/it]
Training 8/16 epoch (loss 1.4453):  49%|████▊     | 428/880 [42:21<43:41,  5.80s/it]
Training 8/16 epoch (loss 1.8594):  49%|████▊     | 428/880 [42:29<43:41,  5.80s/it]
Training 8/16 epoch (loss 1.8594):  49%|████▉     | 429/880 [42:29<48:16,  6.42s/it]
Training 8/16 epoch (loss 1.6562):  49%|████▉     | 429/880 [42:45<48:16,  6.42s/it]
Training 8/16 epoch (loss 1.6562):  49%|████▉     | 430/880 [42:45<1:09:08,  9.22s/it]
Training 8/16 epoch (loss 1.4219):  49%|████▉     | 430/880 [42:50<1:09:08,  9.22s/it]
Training 8/16 epoch (loss 1.4219):  49%|████▉     | 431/880 [42:50<1:01:31,  8.22s/it]
Training 8/16 epoch (loss 1.3281):  49%|████▉     | 431/880 [42:55<1:01:31,  8.22s/it]
Training 8/16 epoch (loss 1.3281):  49%|████▉     | 432/880 [42:55<53:48,  7.21s/it]
Training 8/16 epoch (loss 1.4844):  49%|████▉     | 432/880 [43:01<53:48,  7.21s/it]
Training 8/16 epoch (loss 1.4844):  49%|████▉     | 433/880 [43:01<49:41,  6.67s/it]
Training 8/16 epoch (loss 1.4375):  49%|████▉     | 433/880 [43:08<49:41,  6.67s/it]
Training 8/16 epoch (loss 1.4375):  49%|████▉     | 434/880 [43:08<50:17,  6.77s/it]
Training 8/16 epoch (loss 1.5781):  49%|████▉     | 434/880 [43:13<50:17,  6.77s/it]
Training 8/16 epoch (loss 1.5781):  49%|████▉     | 435/880 [43:13<47:46,  6.44s/it]
Training 8/16 epoch (loss 1.6406):  49%|████▉     | 435/880 [43:20<47:46,  6.44s/it]
Training 8/16 epoch (loss 1.6406):  50%|████▉     | 436/880 [43:20<48:00,  6.49s/it]
Training 8/16 epoch (loss 1.3047):  50%|████▉     | 436/880 [43:25<48:00,  6.49s/it]
Training 8/16 epoch (loss 1.3047):  50%|████▉     | 437/880 [43:25<45:27,  6.16s/it]
Training 8/16 epoch (loss 1.6797):  50%|████▉     | 437/880 [43:30<45:27,  6.16s/it]
Training 8/16 epoch (loss 1.6797):  50%|████▉     | 438/880 [43:30<42:14,  5.73s/it]
Training 8/16 epoch (loss 1.6172):  50%|████▉     | 438/880 [43:37<42:14,  5.73s/it]
Training 8/16 epoch (loss 1.6172):  50%|████▉     | 439/880 [43:37<44:44,  6.09s/it]
Training 8/16 epoch (loss 1.3906):  50%|████▉     | 439/880 [43:42<44:44,  6.09s/it]
Training 8/16 epoch (loss 1.3906):  50%|█████     | 440/880 [43:42<41:26,  5.65s/it]
Training 9/16 epoch (loss 1.6172):  50%|█████     | 440/880 [43:47<41:26,  5.65s/it]
Training 9/16 epoch (loss 1.6172):  50%|█████     | 441/880 [43:47<40:24,  5.52s/it]
Training 9/16 epoch (loss 1.6484):  50%|█████     | 441/880 [43:53<40:24,  5.52s/it]
Training 9/16 epoch (loss 1.6484):  50%|█████     | 442/880 [43:53<41:18,  5.66s/it]
Training 9/16 epoch (loss 1.4453):  50%|█████     | 442/880 [43:59<41:18,  5.66s/it]
Training 9/16 epoch (loss 1.4453):  50%|█████     | 443/880 [43:59<41:57,  5.76s/it]
Training 9/16 epoch (loss 1.7188):  50%|█████     | 443/880 [44:04<41:57,  5.76s/it]
Training 9/16 epoch (loss 1.7188):  50%|█████     | 444/880 [44:04<39:31,  5.44s/it]
Training 9/16 epoch (loss 1.6250):  50%|█████     | 444/880 [44:09<39:31,  5.44s/it]
Training 9/16 epoch (loss 1.6250):  51%|█████     | 445/880 [44:09<39:00,  5.38s/it]
Training 9/16 epoch (loss 1.4609):  51%|█████     | 445/880 [44:15<39:00,  5.38s/it]
Training 9/16 epoch (loss 1.4609):  51%|█████     | 446/880 [44:15<39:52,  5.51s/it]
Training 9/16 epoch (loss 1.5547):  51%|█████     | 446/880 [44:19<39:52,  5.51s/it]
Training 9/16 epoch (loss 1.5547):  51%|█████     | 447/880 [44:19<37:52,  5.25s/it]
Training 9/16 epoch (loss 1.5625):  51%|█████     | 447/880 [44:25<37:52,  5.25s/it]
Training 9/16 epoch (loss 1.5625):  51%|█████     | 448/880 [44:25<38:29,  5.35s/it]
Training 9/16 epoch (loss 1.7266):  51%|█████     | 448/880 [44:33<38:29,  5.35s/it]
Training 9/16 epoch (loss 1.7266):  51%|█████     | 449/880 [44:33<45:12,  6.29s/it]
Training 9/16 epoch (loss 1.7031):  51%|█████     | 449/880 [44:38<45:12,  6.29s/it]
Training 9/16 epoch (loss 1.7031):  51%|█████     | 450/880 [44:38<42:27,  5.92s/it]
Training 9/16 epoch (loss 1.6875):  51%|█████     | 450/880 [44:43<42:27,  5.92s/it]
Training 9/16 epoch (loss 1.6875):  51%|█████▏    | 451/880 [44:43<39:58,  5.59s/it]
Training 9/16 epoch (loss 1.5938):  51%|█████▏    | 451/880 [44:51<39:58,  5.59s/it]
Training 9/16 epoch (loss 1.5938):  51%|█████▏    | 452/880 [44:51<43:51,  6.15s/it]
Training 9/16 epoch (loss 1.3828):  51%|█████▏    | 452/880 [44:55<43:51,  6.15s/it]
Training 9/16 epoch (loss 1.3828):  51%|█████▏    | 453/880 [44:55<40:50,  5.74s/it]
Training 9/16 epoch (loss 1.8281):  51%|█████▏    | 453/880 [45:00<40:50,  5.74s/it]
Training 9/16 epoch (loss 1.8281):  52%|█████▏    | 454/880 [45:00<38:40,  5.45s/it]
Training 9/16 epoch (loss 1.5469):  52%|█████▏    | 454/880 [45:06<38:40,  5.45s/it]
Training 9/16 epoch (loss 1.5469):  52%|█████▏    | 455/880 [45:06<38:44,  5.47s/it]
Training 9/16 epoch (loss 1.6562):  52%|█████▏    | 455/880 [45:11<38:44,  5.47s/it]
Training 9/16 epoch (loss 1.6562):  52%|█████▏    | 456/880 [45:11<38:01,  5.38s/it]
Training 9/16 epoch (loss 1.5156):  52%|█████▏    | 456/880 [45:17<38:01,  5.38s/it]
Training 9/16 epoch (loss 1.5156):  52%|█████▏    | 457/880 [45:17<40:22,  5.73s/it]
Training 9/16 epoch (loss 1.5391):  52%|█████▏    | 457/880 [45:23<40:22,  5.73s/it]
Training 9/16 epoch (loss 1.5391):  52%|█████▏    | 458/880 [45:23<38:51,  5.52s/it]
Training 9/16 epoch (loss 1.6016):  52%|█████▏    | 458/880 [45:27<38:51,  5.52s/it]
Training 9/16 epoch (loss 1.6016):  52%|█████▏    | 459/880 [45:27<37:33,  5.35s/it]
Training 9/16 epoch (loss 1.2422):  52%|█████▏    | 459/880 [45:32<37:33,  5.35s/it]
Training 9/16 epoch (loss 1.2422):  52%|█████▏    | 460/880 [45:32<34:58,  5.00s/it]
Training 9/16 epoch (loss 1.4375):  52%|█████▏    | 460/880 [45:37<34:58,  5.00s/it]
Training 9/16 epoch (loss 1.4375):  52%|█████▏    | 461/880 [45:37<34:42,  4.97s/it]
Training 9/16 epoch (loss 1.7188):  52%|█████▏    | 461/880 [45:42<34:42,  4.97s/it]
Training 9/16 epoch (loss 1.7188):  52%|█████▎    | 462/880 [45:42<35:30,  5.10s/it]
Training 9/16 epoch (loss 1.5078):  52%|█████▎    | 462/880 [45:46<35:30,  5.10s/it]
Training 9/16 epoch (loss 1.5078):  53%|█████▎    | 463/880 [45:46<33:22,  4.80s/it]
Training 9/16 epoch (loss 1.3672):  53%|█████▎    | 463/880 [45:51<33:22,  4.80s/it]
Training 9/16 epoch (loss 1.3672):  53%|█████▎    | 464/880 [45:51<33:54,  4.89s/it]
Training 9/16 epoch (loss 1.5078):  53%|█████▎    | 464/880 [46:04<33:54,  4.89s/it]
Training 9/16 epoch (loss 1.5078):  53%|█████▎    | 465/880 [46:04<49:33,  7.16s/it]
Training 9/16 epoch (loss 1.3047):  53%|█████▎    | 465/880 [46:09<49:33,  7.16s/it]
Training 9/16 epoch (loss 1.3047):  53%|█████▎    | 466/880 [46:09<46:37,  6.76s/it]
Training 9/16 epoch (loss 1.5234):  53%|█████▎    | 466/880 [46:15<46:37,  6.76s/it]
Training 9/16 epoch (loss 1.5234):  53%|█████▎    | 467/880 [46:15<43:37,  6.34s/it]
Training 9/16 epoch (loss 1.2344):  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 467/880 [46:20<43:37,  6.34s/it]
Training 9/16 epoch (loss 1.2344):  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 468/880 [46:20<41:22,  6.02s/it]
Training 9/16 epoch (loss 1.3984):  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 468/880 [46:25<41:22,  6.02s/it]
Training 9/16 epoch (loss 1.3984):  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 469/880 [46:25<38:29,  5.62s/it]
Training 9/16 epoch (loss 1.4062):  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 469/880 [46:30<38:29,  5.62s/it]
Training 9/16 epoch (loss 1.4062):  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 470/880 [46:30<38:13,  5.59s/it]
Training 9/16 epoch (loss 1.4766):  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 470/880 [46:35<38:13,  5.59s/it]
Training 9/16 epoch (loss 1.4766):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 471/880 [46:35<35:44,  5.24s/it]
Training 9/16 epoch (loss 1.2891):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 471/880 [46:39<35:44,  5.24s/it]
Training 9/16 epoch (loss 1.2891):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 472/880 [46:39<34:16,  5.04s/it]
Training 9/16 epoch (loss 1.5000):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž    | 472/880 [46:45<34:16,  5.04s/it]
Training 9/16 epoch (loss 1.5000):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 473/880 [46:45<35:02,  5.17s/it]
Training 9/16 epoch (loss 1.3125):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 473/880 [46:50<35:02,  5.17s/it]
Training 9/16 epoch (loss 1.3125):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 474/880 [46:50<35:23,  5.23s/it]
Training 9/16 epoch (loss 1.2969):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 474/880 [46:55<35:23,  5.23s/it]
Training 9/16 epoch (loss 1.2969):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 475/880 [46:55<35:37,  5.28s/it]
Training 9/16 epoch (loss 1.4844):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 475/880 [47:02<35:37,  5.28s/it]
Training 9/16 epoch (loss 1.4844):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 476/880 [47:02<37:50,  5.62s/it]
Training 9/16 epoch (loss 1.5469):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 476/880 [47:16<37:50,  5.62s/it]
Training 9/16 epoch (loss 1.5469):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 477/880 [47:16<55:23,  8.25s/it]
Training 9/16 epoch (loss 1.3750):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 477/880 [47:21<55:23,  8.25s/it]
Training 9/16 epoch (loss 1.3750):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 478/880 [47:21<48:59,  7.31s/it]
Training 9/16 epoch (loss 1.5703):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 478/880 [47:26<48:59,  7.31s/it]
Training 9/16 epoch (loss 1.5703):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 479/880 [47:26<43:21,  6.49s/it]
Training 9/16 epoch (loss 1.3672):  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 479/880 [47:31<43:21,  6.49s/it]
Training 9/16 epoch (loss 1.3672):  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 480/880 [47:31<40:35,  6.09s/it]
Training 9/16 epoch (loss 1.3750):  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 480/880 [47:36<40:35,  6.09s/it]
Training 9/16 epoch (loss 1.3750):  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 481/880 [47:36<38:11,  5.74s/it]
Training 9/16 epoch (loss 1.5078):  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 481/880 [47:42<38:11,  5.74s/it]
Training 9/16 epoch (loss 1.5078):  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 482/880 [47:42<37:28,  5.65s/it]
Training 9/16 epoch (loss 1.3438):  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 482/880 [47:48<37:28,  5.65s/it]
Training 9/16 epoch (loss 1.3438):  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 483/880 [47:48<38:26,  5.81s/it]
Training 9/16 epoch (loss 1.8203):  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–    | 483/880 [47:56<38:26,  5.81s/it]
Training 9/16 epoch (loss 1.8203):  55%|█████▌    | 484/880 [47:56<42:26,  6.43s/it]
Training 9/16 epoch (loss 1.5547):  55%|█████▌    | 484/880 [48:11<42:26,  6.43s/it]
Training 9/16 epoch (loss 1.5547):  55%|█████▌    | 485/880 [48:11<1:00:44,  9.23s/it]
Training 9/16 epoch (loss 1.3438):  55%|█████▌    | 485/880 [48:17<1:00:44,  9.23s/it]
Training 9/16 epoch (loss 1.3438):  55%|█████▌    | 486/880 [48:17<54:01,  8.23s/it]
Training 9/16 epoch (loss 1.2656):  55%|█████▌    | 486/880 [48:22<54:01,  8.23s/it]
Training 9/16 epoch (loss 1.2656):  55%|█████▌    | 487/880 [48:22<47:12,  7.21s/it]
Training 9/16 epoch (loss 1.3750):  55%|█████▌    | 487/880 [48:27<47:12,  7.21s/it]
Training 9/16 epoch (loss 1.3750):  55%|█████▌    | 488/880 [48:27<43:35,  6.67s/it]
Training 9/16 epoch (loss 1.3516):  55%|█████▌    | 488/880 [48:34<43:35,  6.67s/it]
Training 9/16 epoch (loss 1.3516):  56%|█████▌    | 489/880 [48:34<44:05,  6.77s/it]
Training 9/16 epoch (loss 1.4766):  56%|█████▌    | 489/880 [48:40<44:05,  6.77s/it]
Training 9/16 epoch (loss 1.4766):  56%|█████▌    | 490/880 [48:40<41:52,  6.44s/it]
Training 9/16 epoch (loss 1.5469):  56%|█████▌    | 490/880 [48:47<41:52,  6.44s/it]
Training 9/16 epoch (loss 1.5469):  56%|█████▌    | 491/880 [48:47<42:02,  6.49s/it]
Training 9/16 epoch (loss 1.2344):  56%|█████▌    | 491/880 [48:52<42:02,  6.49s/it]
Training 9/16 epoch (loss 1.2344):  56%|█████▌    | 492/880 [48:52<39:49,  6.16s/it]
Training 9/16 epoch (loss 1.6094):  56%|█████▌    | 492/880 [48:57<39:49,  6.16s/it]
Training 9/16 epoch (loss 1.6094):  56%|█████▌    | 493/880 [48:57<37:01,  5.74s/it]
Training 9/16 epoch (loss 1.5312):  56%|█████▌    | 493/880 [49:04<37:01,  5.74s/it]
Training 9/16 epoch (loss 1.5312):  56%|█████▌    | 494/880 [49:04<39:12,  6.10s/it]
Training 9/16 epoch (loss 1.2891):  56%|█████▌    | 494/880 [49:08<39:12,  6.10s/it]
Training 9/16 epoch (loss 1.2891):  56%|█████▋    | 495/880 [49:08<36:18,  5.66s/it]
Training 10/16 epoch (loss 1.5234):  56%|█████▋    | 495/880 [49:14<36:18,  5.66s/it]
Training 10/16 epoch (loss 1.5234):  56%|█████▋    | 496/880 [49:14<35:22,  5.53s/it]
Training 10/16 epoch (loss 1.5547):  56%|█████▋    | 496/880 [49:20<35:22,  5.53s/it]
Training 10/16 epoch (loss 1.5547):  56%|█████▋    | 497/880 [49:20<36:08,  5.66s/it]
Training 10/16 epoch (loss 1.3359):  56%|█████▋    | 497/880 [49:26<36:08,  5.66s/it]
Training 10/16 epoch (loss 1.3359):  57%|█████▋    | 498/880 [49:26<36:41,  5.76s/it]
Training 10/16 epoch (loss 1.6484):  57%|█████▋    | 498/880 [49:30<36:41,  5.76s/it]
Training 10/16 epoch (loss 1.6484):  57%|█████▋    | 499/880 [49:30<34:32,  5.44s/it]
Training 10/16 epoch (loss 1.5547):  57%|█████▋    | 499/880 [49:36<34:32,  5.44s/it]
Training 10/16 epoch (loss 1.5547):  57%|█████▋    | 500/880 [49:36<34:04,  5.38s/it]
Training 10/16 epoch (loss 1.3594):  57%|█████▋    | 500/880 [49:41<34:04,  5.38s/it]
Training 10/16 epoch (loss 1.3594):  57%|█████▋    | 501/880 [49:41<34:48,  5.51s/it]
Training 10/16 epoch (loss 1.4531):  57%|█████▋    | 501/880 [49:46<34:48,  5.51s/it]
Training 10/16 epoch (loss 1.4531):  57%|█████▋    | 502/880 [49:46<33:03,  5.25s/it]
Training 10/16 epoch (loss 1.4609):  57%|█████▋    | 502/880 [49:52<33:03,  5.25s/it]
Training 10/16 epoch (loss 1.4609):  57%|█████▋    | 503/880 [49:52<33:35,  5.35s/it]
Training 10/16 epoch (loss 1.6328):  57%|█████▋    | 503/880 [50:00<33:35,  5.35s/it]
Training 10/16 epoch (loss 1.6328):  57%|█████▋    | 504/880 [50:00<39:28,  6.30s/it]
Training 10/16 epoch (loss 1.6094):  57%|█████▋    | 504/880 [50:05<39:28,  6.30s/it]
Training 10/16 epoch (loss 1.6094):  57%|█████▋    | 505/880 [50:05<37:04,  5.93s/it]
Training 10/16 epoch (loss 1.5938):  57%|█████▋    | 505/880 [50:10<37:04,  5.93s/it]
Training 10/16 epoch (loss 1.5938):  57%|█████▊    | 506/880 [50:10<34:53,  5.60s/it]
Training 10/16 epoch (loss 1.5000):  57%|█████▊    | 506/880 [50:17<34:53,  5.60s/it]
Training 10/16 epoch (loss 1.5000):  58%|█████▊    | 507/880 [50:17<38:15,  6.15s/it]
Training 10/16 epoch (loss 1.2891):  58%|█████▊    | 507/880 [50:22<38:15,  6.15s/it]
Training 10/16 epoch (loss 1.2891):  58%|█████▊    | 508/880 [50:22<35:35,  5.74s/it]
Training 10/16 epoch (loss 1.7188):  58%|█████▊    | 508/880 [50:27<35:35,  5.74s/it]
Training 10/16 epoch (loss 1.7188):  58%|█████▊    | 509/880 [50:27<33:41,  5.45s/it]
Training 10/16 epoch (loss 1.4219):  58%|█████▊    | 509/880 [50:33<33:41,  5.45s/it]
Training 10/16 epoch (loss 1.4219):  58%|█████▊    | 510/880 [50:33<33:43,  5.47s/it]
Training 10/16 epoch (loss 1.5547):  58%|█████▊    | 510/880 [50:38<33:43,  5.47s/it]
Training 10/16 epoch (loss 1.5547):  58%|█████▊    | 511/880 [50:38<33:05,  5.38s/it]
Training 10/16 epoch (loss 1.4297):  58%|█████▊    | 511/880 [50:44<33:05,  5.38s/it]
Training 10/16 epoch (loss 1.4297):  58%|█████▊    | 512/880 [50:44<35:07,  5.73s/it]
Training 10/16 epoch (loss 1.4453):  58%|█████▊    | 512/880 [50:49<35:07,  5.73s/it]
Training 10/16 epoch (loss 1.4453):  58%|█████▊    | 513/880 [50:49<33:47,  5.52s/it]
Training 10/16 epoch (loss 1.5078):  58%|█████▊    | 513/880 [50:54<33:47,  5.52s/it]
Training 10/16 epoch (loss 1.5078):  58%|█████▊    | 514/880 [50:54<32:39,  5.35s/it]
Training 10/16 epoch (loss 1.1641):  58%|█████▊    | 514/880 [50:58<32:39,  5.35s/it]
Training 10/16 epoch (loss 1.1641):  59%|█████▊    | 515/880 [50:58<30:23,  5.00s/it]
Training 10/16 epoch (loss 1.3438):  59%|█████▊    | 515/880 [51:03<30:23,  5.00s/it]
Training 10/16 epoch (loss 1.3438):  59%|█████▊    | 516/880 [51:03<30:10,  4.97s/it]
Training 10/16 epoch (loss 1.6016):  59%|█████▊    | 516/880 [51:09<30:10,  4.97s/it]
Training 10/16 epoch (loss 1.6016):  59%|█████▉    | 517/880 [51:09<30:51,  5.10s/it]
Training 10/16 epoch (loss 1.3906):  59%|█████▉    | 517/880 [51:13<30:51,  5.10s/it]
Training 10/16 epoch (loss 1.3906):  59%|█████▉    | 518/880 [51:13<28:59,  4.81s/it]
Training 10/16 epoch (loss 1.2656):  59%|█████▉    | 518/880 [51:18<28:59,  4.81s/it]
Training 10/16 epoch (loss 1.2656):  59%|█████▉    | 519/880 [51:18<29:26,  4.89s/it]
Training 10/16 epoch (loss 1.4062):  59%|█████▉    | 519/880 [51:30<29:26,  4.89s/it]
Training 10/16 epoch (loss 1.4062):  59%|█████▉    | 520/880 [51:30<42:59,  7.17s/it]
Training 10/16 epoch (loss 1.1953):  59%|█████▉    | 520/880 [51:36<42:59,  7.17s/it]
Training 10/16 epoch (loss 1.1953):  59%|█████▉    | 521/880 [51:36<40:26,  6.76s/it]
Training 10/16 epoch (loss 1.4297):  59%|█████▉    | 521/880 [51:42<40:26,  6.76s/it]
Training 10/16 epoch (loss 1.4297):  59%|█████▉    | 522/880 [51:42<37:49,  6.34s/it]
Training 10/16 epoch (loss 1.1250):  59%|█████▉    | 522/880 [51:47<37:49,  6.34s/it]
Training 10/16 epoch (loss 1.1250):  59%|█████▉    | 523/880 [51:47<35:50,  6.02s/it]
Training 10/16 epoch (loss 1.3047):  59%|█████▉    | 523/880 [51:52<35:50,  6.02s/it]
Training 10/16 epoch (loss 1.3047):  60%|█████▉    | 524/880 [51:52<33:20,  5.62s/it]
Training 10/16 epoch (loss 1.3281):  60%|█████▉    | 524/880 [51:57<33:20,  5.62s/it]
Training 10/16 epoch (loss 1.3281):  60%|█████▉    | 525/880 [51:57<33:05,  5.59s/it]
Training 10/16 epoch (loss 1.3984):  60%|█████▉    | 525/880 [52:01<33:05,  5.59s/it]
Training 10/16 epoch (loss 1.3984):  60%|█████▉    | 526/880 [52:01<30:55,  5.24s/it]
Training 10/16 epoch (loss 1.2188):  60%|█████▉    | 526/880 [52:06<30:55,  5.24s/it]
Training 10/16 epoch (loss 1.2188):  60%|█████▉    | 527/880 [52:06<29:39,  5.04s/it]
Training 10/16 epoch (loss 1.4062):  60%|█████▉    | 527/880 [52:12<29:39,  5.04s/it]
Training 10/16 epoch (loss 1.4062):  60%|██████    | 528/880 [52:12<30:20,  5.17s/it]
Training 10/16 epoch (loss 1.2266):  60%|██████    | 528/880 [52:17<30:20,  5.17s/it]
Training 10/16 epoch (loss 1.2266):  60%|██████    | 529/880 [52:17<30:37,  5.24s/it]
Training 10/16 epoch (loss 1.2266):  60%|██████    | 529/880 [52:22<30:37,  5.24s/it]
Training 10/16 epoch (loss 1.2266):  60%|██████    | 530/880 [52:22<30:48,  5.28s/it]
Training 10/16 epoch (loss 1.4062):  60%|██████    | 530/880 [52:29<30:48,  5.28s/it]
Training 10/16 epoch (loss 1.4062):  60%|██████    | 531/880 [52:29<32:42,  5.62s/it]
Training 10/16 epoch (loss 1.4609):  60%|██████    | 531/880 [52:43<32:42,  5.62s/it]
Training 10/16 epoch (loss 1.4609):  60%|██████    | 532/880 [52:43<47:51,  8.25s/it]
Training 10/16 epoch (loss 1.2969):  60%|██████    | 532/880 [52:48<47:51,  8.25s/it]
Training 10/16 epoch (loss 1.2969):  61%|██████    | 533/880 [52:48<42:17,  7.31s/it]
Training 10/16 epoch (loss 1.5000):  61%|██████    | 533/880 [52:53<42:17,  7.31s/it]
Training 10/16 epoch (loss 1.5000):  61%|██████    | 534/880 [52:53<37:24,  6.49s/it]
Training 10/16 epoch (loss 1.3047):  61%|██████    | 534/880 [52:58<37:24,  6.49s/it]
Training 10/16 epoch (loss 1.3047):  61%|██████    | 535/880 [52:58<35:00,  6.09s/it]
Training 10/16 epoch (loss 1.3125):  61%|██████    | 535/880 [53:03<35:00,  6.09s/it]
Training 10/16 epoch (loss 1.3125):  61%|██████    | 536/880 [53:03<32:55,  5.74s/it]
Training 10/16 epoch (loss 1.4062):  61%|██████    | 536/880 [53:08<32:55,  5.74s/it]
Training 10/16 epoch (loss 1.4062):  61%|██████    | 537/880 [53:08<32:16,  5.65s/it]
Training 10/16 epoch (loss 1.2500):  61%|██████    | 537/880 [53:14<32:16,  5.65s/it]
Training 10/16 epoch (loss 1.2500):  61%|██████    | 538/880 [53:14<33:05,  5.81s/it]
Training 10/16 epoch (loss 1.7344):  61%|██████    | 538/880 [53:22<33:05,  5.81s/it]
Training 10/16 epoch (loss 1.7344):  61%|██████▏   | 539/880 [53:22<36:32,  6.43s/it]
Training 10/16 epoch (loss 1.4844):  61%|██████▏   | 539/880 [53:38<36:32,  6.43s/it]
Training 10/16 epoch (loss 1.4844):  61%|██████▏   | 540/880 [53:38<52:15,  9.22s/it]
Training 10/16 epoch (loss 1.2656):  61%|██████▏   | 540/880 [53:44<52:15,  9.22s/it]
Training 10/16 epoch (loss 1.2656):  61%|██████▏   | 541/880 [53:44<46:28,  8.23s/it]
Training 10/16 epoch (loss 1.1875):  61%|██████▏   | 541/880 [53:49<46:28,  8.23s/it]
Training 10/16 epoch (loss 1.1875):  62%|██████▏   | 542/880 [53:49<40:35,  7.20s/it]
Training 10/16 epoch (loss 1.3047):  62%|██████▏   | 542/880 [53:54<40:35,  7.20s/it]
Training 10/16 epoch (loss 1.3047):  62%|██████▏   | 543/880 [53:54<37:27,  6.67s/it]
Training 10/16 epoch (loss 1.2422):  62%|██████▏   | 543/880 [54:01<37:27,  6.67s/it]
Training 10/16 epoch (loss 1.2422):  62%|██████▏   | 544/880 [54:01<37:52,  6.76s/it]
Training 10/16 epoch (loss 1.3750):  62%|██████▏   | 544/880 [54:07<37:52,  6.76s/it]
Training 10/16 epoch (loss 1.3750):  62%|██████▏   | 545/880 [54:07<35:56,  6.44s/it]
Training 10/16 epoch (loss 1.4609):  62%|██████▏   | 545/880 [54:14<35:56,  6.44s/it]
Training 10/16 epoch (loss 1.4609):  62%|██████▏   | 546/880 [54:14<36:05,  6.48s/it]
Training 10/16 epoch (loss 1.1562):  62%|██████▏   | 546/880 [54:19<36:05,  6.48s/it]
Training 10/16 epoch (loss 1.1562):  62%|██████▏   | 547/880 [54:19<34:09,  6.15s/it]
Training 10/16 epoch (loss 1.5312):  62%|██████▏   | 547/880 [54:24<34:09,  6.15s/it]
Training 10/16 epoch (loss 1.5312):  62%|██████▏   | 548/880 [54:24<31:43,  5.73s/it]
Training 10/16 epoch (loss 1.4531):  62%|██████▏   | 548/880 [54:31<31:43,  5.73s/it]
Training 10/16 epoch (loss 1.4531):  62%|██████▏   | 549/880 [54:31<33:35,  6.09s/it]
Training 10/16 epoch (loss 1.2109):  62%|██████▏   | 549/880 [54:35<33:35,  6.09s/it]
Training 10/16 epoch (loss 1.2109):  62%|██████▎   | 550/880 [54:35<31:04,  5.65s/it]
Training 11/16 epoch (loss 1.4297):  62%|██████▎   | 550/880 [54:40<31:04,  5.65s/it]
Training 11/16 epoch (loss 1.4297):  63%|██████▎   | 551/880 [54:40<30:17,  5.52s/it]
Training 11/16 epoch (loss 1.4531):  63%|██████▎   | 551/880 [54:46<30:17,  5.52s/it]
Training 11/16 epoch (loss 1.4531):  63%|██████▎   | 552/880 [54:46<30:56,  5.66s/it]
Training 11/16 epoch (loss 1.2656):  63%|██████▎   | 552/880 [54:52<30:56,  5.66s/it]
Training 11/16 epoch (loss 1.2656):  63%|██████▎   | 553/880 [54:52<31:24,  5.76s/it]
Training 11/16 epoch (loss 1.5703):  63%|██████▎   | 553/880 [54:57<31:24,  5.76s/it]
Training 11/16 epoch (loss 1.5703):  63%|██████▎   | 554/880 [54:57<29:33,  5.44s/it]
Training 11/16 epoch (loss 1.4844):  63%|██████▎   | 554/880 [55:02<29:33,  5.44s/it]
Training 11/16 epoch (loss 1.4844):  63%|██████▎   | 555/880 [55:02<29:08,  5.38s/it]
Training 11/16 epoch (loss 1.2812):  63%|██████▎   | 555/880 [55:08<29:08,  5.38s/it]
Training 11/16 epoch (loss 1.2812):  63%|██████▎   | 556/880 [55:08<29:44,  5.51s/it]
Training 11/16 epoch (loss 1.3828):  63%|██████▎   | 556/880 [55:13<29:44,  5.51s/it]
Training 11/16 epoch (loss 1.3828):  63%|██████▎   | 557/880 [55:13<28:13,  5.24s/it]
Training 11/16 epoch (loss 1.3516):  63%|██████▎   | 557/880 [55:18<28:13,  5.24s/it]
Training 11/16 epoch (loss 1.3516):  63%|██████▎   | 558/880 [55:18<28:40,  5.34s/it]
Training 11/16 epoch (loss 1.5312):  63%|██████▎   | 558/880 [55:27<28:40,  5.34s/it]
Training 11/16 epoch (loss 1.5312):  64%|██████▎   | 559/880 [55:27<33:40,  6.29s/it]
Training 11/16 epoch (loss 1.5156):  64%|██████▎   | 559/880 [55:32<33:40,  6.29s/it]
Training 11/16 epoch (loss 1.5156):  64%|██████▎   | 560/880 [55:32<31:36,  5.93s/it]
Training 11/16 epoch (loss 1.5078):  64%|██████▎   | 560/880 [55:37<31:36,  5.93s/it]
Training 11/16 epoch (loss 1.5078):  64%|██████▍   | 561/880 [55:37<29:43,  5.59s/it]
Training 11/16 epoch (loss 1.4375):  64%|██████▍   | 561/880 [55:44<29:43,  5.59s/it]
Training 11/16 epoch (loss 1.4375):  64%|██████▍   | 562/880 [55:44<32:35,  6.15s/it]
Training 11/16 epoch (loss 1.2188):  64%|██████▍   | 562/880 [55:49<32:35,  6.15s/it]
Training 11/16 epoch (loss 1.2188):  64%|██████▍   | 563/880 [55:49<30:20,  5.74s/it]
Training 11/16 epoch (loss 1.6406):  64%|██████▍   | 563/880 [55:54<30:20,  5.74s/it]
Training 11/16 epoch (loss 1.6406):  64%|██████▍   | 564/880 [55:54<28:43,  5.45s/it]
Training 11/16 epoch (loss 1.3516):  64%|██████▍   | 564/880 [55:59<28:43,  5.45s/it]
Training 11/16 epoch (loss 1.3516):  64%|██████▍   | 565/880 [55:59<28:43,  5.47s/it]
Training 11/16 epoch (loss 1.4531):  64%|██████▍   | 565/880 [56:04<28:43,  5.47s/it]
Training 11/16 epoch (loss 1.4531):  64%|██████▍   | 566/880 [56:04<28:10,  5.38s/it]
Training 11/16 epoch (loss 1.3281):  64%|██████▍   | 566/880 [56:11<28:10,  5.38s/it]
Training 11/16 epoch (loss 1.3281):  64%|██████▍   | 567/880 [56:11<29:53,  5.73s/it]
Training 11/16 epoch (loss 1.3672):  64%|██████▍   | 567/880 [56:16<29:53,  5.73s/it]
Training 11/16 epoch (loss 1.3672):  65%|██████▍   | 568/880 [56:16<28:44,  5.53s/it]
Training 11/16 epoch (loss 1.4297):  65%|██████▍   | 568/880 [56:21<28:44,  5.53s/it]
Training 11/16 epoch (loss 1.4297):  65%|██████▍   | 569/880 [56:21<27:44,  5.35s/it]
Training 11/16 epoch (loss 1.0859):  65%|██████▍   | 569/880 [56:25<27:44,  5.35s/it]
Training 11/16 epoch (loss 1.0859):  65%|██████▍   | 570/880 [56:25<25:47,  4.99s/it]
Training 11/16 epoch (loss 1.2656):  65%|██████▍   | 570/880 [56:30<25:47,  4.99s/it]
Training 11/16 epoch (loss 1.2656):  65%|██████▍   | 571/880 [56:30<25:35,  4.97s/it]
Training 11/16 epoch (loss 1.5312):  65%|██████▍   | 571/880 [56:35<25:35,  4.97s/it]
Training 11/16 epoch (loss 1.5312):  65%|██████▌   | 572/880 [56:35<26:09,  5.10s/it]
Training 11/16 epoch (loss 1.3125):  65%|██████▌   | 572/880 [56:40<26:09,  5.10s/it]
Training 11/16 epoch (loss 1.3125):  65%|██████▌   | 573/880 [56:40<24:34,  4.80s/it]
Training 11/16 epoch (loss 1.1797):  65%|██████▌   | 573/880 [56:45<24:34,  4.80s/it]
Training 11/16 epoch (loss 1.1797):  65%|██████▌   | 574/880 [56:45<24:57,  4.89s/it]
Training 11/16 epoch (loss 1.3203):  65%|██████▌   | 574/880 [56:57<24:57,  4.89s/it]
Training 11/16 epoch (loss 1.3203):  65%|██████▌   | 575/880 [56:57<36:27,  7.17s/it]
Training 11/16 epoch (loss 1.1250):  65%|██████▌   | 575/880 [57:03<36:27,  7.17s/it]
Training 11/16 epoch (loss 1.1250):  65%|██████▌   | 576/880 [57:03<34:16,  6.76s/it]
Training 11/16 epoch (loss 1.3594):  65%|██████▌   | 576/880 [57:08<34:16,  6.76s/it]
Training 11/16 epoch (loss 1.3594):  66%|██████▌   | 577/880 [57:08<32:01,  6.34s/it]
Training 11/16 epoch (loss 1.0703):  66%|██████▌   | 577/880 [57:14<32:01,  6.34s/it]
Training 11/16 epoch (loss 1.0703):  66%|██████▌   | 578/880 [57:14<30:21,  6.03s/it]
Training 11/16 epoch (loss 1.2344):  66%|██████▌   | 578/880 [57:18<30:21,  6.03s/it]
Training 11/16 epoch (loss 1.2344):  66%|██████▌   | 579/880 [57:18<28:12,  5.62s/it]
Training 11/16 epoch (loss 1.2578):  66%|██████▌   | 579/880 [57:24<28:12,  5.62s/it]
Training 11/16 epoch (loss 1.2578):  66%|██████▌   | 580/880 [57:24<27:58,  5.59s/it]
Training 11/16 epoch (loss 1.3203):  66%|██████▌   | 580/880 [57:28<27:58,  5.59s/it]
Training 11/16 epoch (loss 1.3203):  66%|██████▌   | 581/880 [57:28<26:07,  5.24s/it]
Training 11/16 epoch (loss 1.1406):  66%|██████▌   | 581/880 [57:33<26:07,  5.24s/it]
Training 11/16 epoch (loss 1.1406):  66%|██████▌   | 582/880 [57:33<25:01,  5.04s/it]
Training 11/16 epoch (loss 1.3281):  66%|██████▌   | 582/880 [57:38<25:01,  5.04s/it]
Training 11/16 epoch (loss 1.3281):  66%|██████▋   | 583/880 [57:38<25:34,  5.17s/it]
Training 11/16 epoch (loss 1.1719):  66%|██████▋   | 583/880 [57:44<25:34,  5.17s/it]
Training 11/16 epoch (loss 1.1719):  66%|██████▋   | 584/880 [57:44<25:48,  5.23s/it]
Training 11/16 epoch (loss 1.1562):  66%|██████▋   | 584/880 [57:49<25:48,  5.23s/it]
Training 11/16 epoch (loss 1.1562):  66%|██████▋   | 585/880 [57:49<25:57,  5.28s/it]
Training 11/16 epoch (loss 1.3359):  66%|██████▋   | 585/880 [57:55<25:57,  5.28s/it]
Training 11/16 epoch (loss 1.3359):  67%|██████▋   | 586/880 [57:55<27:33,  5.62s/it]
Training 11/16 epoch (loss 1.3750):  67%|██████▋   | 586/880 [58:10<27:33,  5.62s/it]
Training 11/16 epoch (loss 1.3750):  67%|██████▋   | 587/880 [58:10<40:17,  8.25s/it]
Training 11/16 epoch (loss 1.2266):  67%|██████▋   | 587/880 [58:15<40:17,  8.25s/it]
Training 11/16 epoch (loss 1.2266):  67%|██████▋   | 588/880 [58:15<35:37,  7.32s/it]
Training 11/16 epoch (loss 1.4219):  67%|██████▋   | 588/880 [58:20<35:37,  7.32s/it]
Training 11/16 epoch (loss 1.4219):  67%|██████▋   | 589/880 [58:20<31:29,  6.49s/it]
Training 11/16 epoch (loss 1.2266):  67%|██████▋   | 589/880 [58:25<31:29,  6.49s/it]
Training 11/16 epoch (loss 1.2266):  67%|██████▋   | 590/880 [58:25<29:27,  6.09s/it]
Training 11/16 epoch (loss 1.2422):  67%|██████▋   | 590/880 [58:30<29:27,  6.09s/it]
Training 11/16 epoch (loss 1.2422):  67%|██████▋   | 591/880 [58:30<27:39,  5.74s/it]
Training 11/16 epoch (loss 1.3516):  67%|██████▋   | 591/880 [58:35<27:39,  5.74s/it]
Training 11/16 epoch (loss 1.3516):  67%|██████▋   | 592/880 [58:35<27:06,  5.65s/it]
Training 11/16 epoch (loss 1.1797):  67%|██████▋   | 592/880 [58:41<27:06,  5.65s/it]
Training 11/16 epoch (loss 1.1797):  67%|██████▋   | 593/880 [58:41<27:45,  5.80s/it]
Training 11/16 epoch (loss 1.6562):  67%|██████▋   | 593/880 [58:49<27:45,  5.80s/it]
Training 11/16 epoch (loss 1.6562):  68%|██████▊   | 594/880 [58:49<30:37,  6.42s/it]
Training 11/16 epoch (loss 1.3828):  68%|██████▊   | 594/880 [59:05<30:37,  6.42s/it]
Training 11/16 epoch (loss 1.3828):  68%|██████▊   | 595/880 [59:05<43:47,  9.22s/it]
Training 11/16 epoch (loss 1.1875):  68%|██████▊   | 595/880 [59:11<43:47,  9.22s/it]
Training 11/16 epoch (loss 1.1875):  68%|██████▊   | 596/880 [59:11<38:55,  8.22s/it]
Training 11/16 epoch (loss 1.1250):  68%|██████▊   | 596/880 [59:16<38:55,  8.22s/it]
Training 11/16 epoch (loss 1.1250):  68%|██████▊   | 597/880 [59:16<33:59,  7.21s/it]
Training 11/16 epoch (loss 1.2344):  68%|██████▊   | 597/880 [59:21<33:59,  7.21s/it]
Training 11/16 epoch (loss 1.2344):  68%|██████▊   | 598/880 [59:21<31:21,  6.67s/it]
Training 11/16 epoch (loss 1.1797):  68%|██████▊   | 598/880 [59:28<31:21,  6.67s/it]
Training 11/16 epoch (loss 1.1797):  68%|██████▊   | 599/880 [59:28<31:42,  6.77s/it]
Training 11/16 epoch (loss 1.3047):  68%|██████▊   | 599/880 [59:34<31:42,  6.77s/it]
Training 11/16 epoch (loss 1.3047):  68%|██████▊   | 600/880 [59:34<30:04,  6.44s/it]
Training 11/16 epoch (loss 1.3672):  68%|██████▊   | 600/880 [59:40<30:04,  6.44s/it]
Training 11/16 epoch (loss 1.3672):  68%|██████▊   | 601/880 [59:40<30:09,  6.49s/it]
Training 11/16 epoch (loss 1.0859):  68%|██████▊   | 601/880 [59:46<30:09,  6.49s/it]
Training 11/16 epoch (loss 1.0859):  68%|██████▊   | 602/880 [59:46<28:31,  6.16s/it]
Training 11/16 epoch (loss 1.4688):  68%|██████▊   | 602/880 [59:50<28:31,  6.16s/it]
Training 11/16 epoch (loss 1.4688):  69%|██████▊   | 603/880 [59:50<26:28,  5.73s/it]
Training 11/16 epoch (loss 1.3672):  69%|██████▊   | 603/880 [59:57<26:28,  5.73s/it]
Training 11/16 epoch (loss 1.3672):  69%|██████▊   | 604/880 [59:57<27:59,  6.09s/it]
Training 11/16 epoch (loss 1.1328):  69%|██████▊   | 604/880 [1:00:02<27:59,  6.09s/it]
Training 11/16 epoch (loss 1.1328):  69%|██████▉   | 605/880 [1:00:02<25:53,  5.65s/it]
Training 12/16 epoch (loss 1.3516):  69%|██████▉   | 605/880 [1:00:07<25:53,  5.65s/it]
Training 12/16 epoch (loss 1.3516):  69%|██████▉   | 606/880 [1:00:07<25:13,  5.52s/it]
Training 12/16 epoch (loss 1.3750):  69%|██████▉   | 606/880 [1:00:13<25:13,  5.52s/it]
Training 12/16 epoch (loss 1.3750):  69%|██████▉   | 607/880 [1:00:13<25:45,  5.66s/it]
Training 12/16 epoch (loss 1.1953):  69%|██████▉   | 607/880 [1:00:19<25:45,  5.66s/it]
Training 12/16 epoch (loss 1.1953):  69%|██████▉   | 608/880 [1:00:19<26:07,  5.76s/it]
Training 12/16 epoch (loss 1.5234):  69%|██████▉   | 608/880 [1:00:24<26:07,  5.76s/it]
Training 12/16 epoch (loss 1.5234):  69%|██████▉   | 609/880 [1:00:24<24:35,  5.44s/it]
Training 12/16 epoch (loss 1.4219):  69%|██████▉   | 609/880 [1:00:29<24:35,  5.44s/it]
Training 12/16 epoch (loss 1.4219):  69%|██████▉   | 610/880 [1:00:29<24:14,  5.39s/it]
Training 12/16 epoch (loss 1.2188):  69%|██████▉   | 610/880 [1:00:35<24:14,  5.39s/it]
Training 12/16 epoch (loss 1.2188):  69%|██████▉   | 611/880 [1:00:35<24:43,  5.52s/it]
Training 12/16 epoch (loss 1.3281):  69%|██████▉   | 611/880 [1:00:40<24:43,  5.52s/it]
Training 12/16 epoch (loss 1.3281):  70%|██████▉   | 612/880 [1:00:40<23:26,  5.25s/it]
Training 12/16 epoch (loss 1.3047):  70%|██████▉   | 612/880 [1:00:45<23:26,  5.25s/it]
Training 12/16 epoch (loss 1.3047):  70%|██████▉   | 613/880 [1:00:45<23:47,  5.35s/it]
Training 12/16 epoch (loss 1.4609):  70%|██████▉   | 613/880 [1:00:54<23:47,  5.35s/it]
Training 12/16 epoch (loss 1.4609):  70%|██████▉   | 614/880 [1:00:54<27:54,  6.30s/it]
Training 12/16 epoch (loss 1.4297):  70%|██████▉   | 614/880 [1:00:59<27:54,  6.30s/it]
Training 12/16 epoch (loss 1.4297):  70%|██████▉   | 615/880 [1:00:59<26:10,  5.93s/it]
Training 12/16 epoch (loss 1.4453):  70%|██████▉   | 615/880 [1:01:04<26:10,  5.93s/it]
Training 12/16 epoch (loss 1.4453):  70%|███████   | 616/880 [1:01:04<24:35,  5.59s/it]
Training 12/16 epoch (loss 1.3750):  70%|███████   | 616/880 [1:01:11<24:35,  5.59s/it]
Training 12/16 epoch (loss 1.3750):  70%|███████   | 617/880 [1:01:11<26:56,  6.15s/it]
Training 12/16 epoch (loss 1.1719):  70%|███████   | 617/880 [1:01:16<26:56,  6.15s/it]
Training 12/16 epoch (loss 1.1719):  70%|███████   | 618/880 [1:01:16<25:03,  5.74s/it]
Training 12/16 epoch (loss 1.5938):  70%|███████   | 618/880 [1:01:21<25:03,  5.74s/it]
Training 12/16 epoch (loss 1.5938):  70%|███████   | 619/880 [1:01:21<23:42,  5.45s/it]
Training 12/16 epoch (loss 1.3047):  70%|███████   | 619/880 [1:01:26<23:42,  5.45s/it]
Training 12/16 epoch (loss 1.3047):  70%|███████   | 620/880 [1:01:26<23:42,  5.47s/it]
Training 12/16 epoch (loss 1.3984):  70%|███████   | 620/880 [1:01:31<23:42,  5.47s/it]
Training 12/16 epoch (loss 1.3984):  71%|███████   | 621/880 [1:01:31<23:15,  5.39s/it]
Training 12/16 epoch (loss 1.2812):  71%|███████   | 621/880 [1:01:38<23:15,  5.39s/it]
Training 12/16 epoch (loss 1.2812):  71%|███████   | 622/880 [1:01:38<24:39,  5.73s/it]
Training 12/16 epoch (loss 1.2969):  71%|███████   | 622/880 [1:01:43<24:39,  5.73s/it]
Training 12/16 epoch (loss 1.2969):  71%|███████   | 623/880 [1:01:43<23:40,  5.53s/it]
Training 12/16 epoch (loss 1.3516):  71%|███████   | 623/880 [1:01:48<23:40,  5.53s/it]
Training 12/16 epoch (loss 1.3516):  71%|███████   | 624/880 [1:01:48<22:50,  5.35s/it]
Training 12/16 epoch (loss 1.0312):  71%|███████   | 624/880 [1:01:52<22:50,  5.35s/it]
Training 12/16 epoch (loss 1.0312):  71%|███████   | 625/880 [1:01:52<21:12,  4.99s/it]
Training 12/16 epoch (loss 1.2109):  71%|███████   | 625/880 [1:01:57<21:12,  4.99s/it]
Training 12/16 epoch (loss 1.2109):  71%|███████   | 626/880 [1:01:57<21:01,  4.97s/it]
Training 12/16 epoch (loss 1.4766):  71%|███████   | 626/880 [1:02:02<21:01,  4.97s/it]
Training 12/16 epoch (loss 1.4766):  71%|███████▏  | 627/880 [1:02:02<21:28,  5.09s/it]
Training 12/16 epoch (loss 1.2734):  71%|███████▏  | 627/880 [1:02:06<21:28,  5.09s/it]
Training 12/16 epoch (loss 1.2734):  71%|███████▏  | 628/880 [1:02:06<20:09,  4.80s/it]
Training 12/16 epoch (loss 1.1484):  71%|███████▏  | 628/880 [1:02:11<20:09,  4.80s/it]
Training 12/16 epoch (loss 1.1484):  71%|███████▏  | 629/880 [1:02:11<20:26,  4.89s/it]
Training 12/16 epoch (loss 1.2656):  71%|███████▏  | 629/880 [1:02:24<20:26,  4.89s/it]
Training 12/16 epoch (loss 1.2656):  72%|███████▏  | 630/880 [1:02:24<29:52,  7.17s/it]
Training 12/16 epoch (loss 1.0625):  72%|███████▏  | 630/880 [1:02:30<29:52,  7.17s/it]
Training 12/16 epoch (loss 1.0625):  72%|███████▏  | 631/880 [1:02:30<28:04,  6.76s/it]
Training 12/16 epoch (loss 1.2969):  72%|███████▏  | 631/880 [1:02:35<28:04,  6.76s/it]
Training 12/16 epoch (loss 1.2969):  72%|███████▏  | 632/880 [1:02:35<26:13,  6.35s/it]
Training 12/16 epoch (loss 1.0078):  72%|███████▏  | 632/880 [1:02:40<26:13,  6.35s/it]
Training 12/16 epoch (loss 1.0078):  72%|███████▏  | 633/880 [1:02:40<24:50,  6.03s/it]
Training 12/16 epoch (loss 1.1953):  72%|███████▏  | 633/880 [1:02:45<24:50,  6.03s/it]
Training 12/16 epoch (loss 1.1953):  72%|███████▏  | 634/880 [1:02:45<23:03,  5.63s/it]
Training 12/16 epoch (loss 1.2031):  72%|███████▏  | 634/880 [1:02:51<23:03,  5.63s/it]
Training 12/16 epoch (loss 1.2031):  72%|███████▏  | 635/880 [1:02:51<22:50,  5.59s/it]
Training 12/16 epoch (loss 1.2734):  72%|███████▏  | 635/880 [1:02:55<22:50,  5.59s/it]
Training 12/16 epoch (loss 1.2734):  72%|███████▏  | 636/880 [1:02:55<21:18,  5.24s/it]
Training 12/16 epoch (loss 1.0859):  72%|███████▏  | 636/880 [1:03:00<21:18,  5.24s/it]
Training 12/16 epoch (loss 1.0859):  72%|███████▏  | 637/880 [1:03:00<20:24,  5.04s/it]
Training 12/16 epoch (loss 1.2656):  72%|███████▏  | 637/880 [1:03:05<20:24,  5.04s/it]
Training 12/16 epoch (loss 1.2656):  72%|███████▎  | 638/880 [1:03:05<20:50,  5.17s/it]
Training 12/16 epoch (loss 1.1094):  72%|███████▎  | 638/880 [1:03:10<20:50,  5.17s/it]
Training 12/16 epoch (loss 1.1094):  73%|███████▎  | 639/880 [1:03:10<21:00,  5.23s/it]
Training 12/16 epoch (loss 1.1172):  73%|███████▎  | 639/880 [1:03:16<21:00,  5.23s/it]
Training 12/16 epoch (loss 1.1172):  73%|███████▎  | 640/880 [1:03:16<21:06,  5.28s/it]
Training 12/16 epoch (loss 1.2891):  73%|███████▎  | 640/880 [1:03:22<21:06,  5.28s/it]
Training 12/16 epoch (loss 1.2891):  73%|███████▎  | 641/880 [1:03:22<22:23,  5.62s/it]
Training 12/16 epoch (loss 1.3281):  73%|███████▎  | 641/880 [1:03:37<22:23,  5.62s/it]
Training 12/16 epoch (loss 1.3281):  73%|███████▎  | 642/880 [1:03:37<32:44,  8.25s/it]
Training 12/16 epoch (loss 1.1641):  73%|███████▎  | 642/880 [1:03:42<32:44,  8.25s/it]
Training 12/16 epoch (loss 1.1641):  73%|███████▎  | 643/880 [1:03:42<28:54,  7.32s/it]
Training 12/16 epoch (loss 1.3516):  73%|███████▎  | 643/880 [1:03:46<28:54,  7.32s/it]
Training 12/16 epoch (loss 1.3516):  73%|███████▎  | 644/880 [1:03:46<25:33,  6.50s/it]
Training 12/16 epoch (loss 1.1719):  73%|███████▎  | 644/880 [1:03:52<25:33,  6.50s/it]
Training 12/16 epoch (loss 1.1719):  73%|███████▎  | 645/880 [1:03:52<23:53,  6.10s/it]
Training 12/16 epoch (loss 1.1875):  73%|███████▎  | 645/880 [1:03:56<23:53,  6.10s/it]
Training 12/16 epoch (loss 1.1875):  73%|███████▎  | 646/880 [1:03:56<22:25,  5.75s/it]
Training 12/16 epoch (loss 1.2969):  73%|███████▎  | 646/880 [1:04:02<22:25,  5.75s/it]
Training 12/16 epoch (loss 1.2969):  74%|███████▎  | 647/880 [1:04:02<21:56,  5.65s/it]
Training 12/16 epoch (loss 1.1250):  74%|███████▎  | 647/880 [1:04:08<21:56,  5.65s/it]
Training 12/16 epoch (loss 1.1250):  74%|███████▎  | 648/880 [1:04:08<22:26,  5.80s/it]
Training 12/16 epoch (loss 1.6172):  74%|███████▎  | 648/880 [1:04:16<22:26,  5.80s/it]
Training 12/16 epoch (loss 1.6172):  74%|███████▍  | 649/880 [1:04:16<24:43,  6.42s/it]
Training 12/16 epoch (loss 1.3281):  74%|███████▍  | 649/880 [1:04:32<24:43,  6.42s/it]
Training 12/16 epoch (loss 1.3281):  74%|███████▍  | 650/880 [1:04:32<35:19,  9.22s/it]
Training 12/16 epoch (loss 1.1250):  74%|███████▍  | 650/880 [1:04:38<35:19,  9.22s/it]
Training 12/16 epoch (loss 1.1250):  74%|███████▍  | 651/880 [1:04:38<31:22,  8.22s/it]
Training 12/16 epoch (loss 1.0703):  74%|███████▍  | 651/880 [1:04:42<31:22,  8.22s/it]
Training 12/16 epoch (loss 1.0703):  74%|███████▍  | 652/880 [1:04:42<27:21,  7.20s/it]
Training 12/16 epoch (loss 1.1719):  74%|███████▍  | 652/880 [1:04:48<27:21,  7.20s/it]
Training 12/16 epoch (loss 1.1719):  74%|███████▍  | 653/880 [1:04:48<25:13,  6.67s/it]
Training 12/16 epoch (loss 1.1250):  74%|███████▍  | 653/880 [1:04:55<25:13,  6.67s/it]
Training 12/16 epoch (loss 1.1250):  74%|███████▍  | 654/880 [1:04:55<25:29,  6.77s/it]
Training 12/16 epoch (loss 1.2422):  74%|███████▍  | 654/880 [1:05:00<25:29,  6.77s/it]
Training 12/16 epoch (loss 1.2422):  74%|███████▍  | 655/880 [1:05:00<24:09,  6.44s/it]
Training 12/16 epoch (loss 1.3203):  74%|███████▍  | 655/880 [1:05:07<24:09,  6.44s/it]
Training 12/16 epoch (loss 1.3203):  75%|███████▍  | 656/880 [1:05:07<24:13,  6.49s/it]
Training 12/16 epoch (loss 1.0469):  75%|███████▍  | 656/880 [1:05:12<24:13,  6.49s/it]
Training 12/16 epoch (loss 1.0469):  75%|███████▍  | 657/880 [1:05:12<22:53,  6.16s/it]
Training 12/16 epoch (loss 1.4297):  75%|███████▍  | 657/880 [1:05:17<22:53,  6.16s/it]
Training 12/16 epoch (loss 1.4297):  75%|███████▍  | 658/880 [1:05:17<21:13,  5.74s/it]
Training 12/16 epoch (loss 1.3281):  75%|███████▍  | 658/880 [1:05:24<21:13,  5.74s/it]
Training 12/16 epoch (loss 1.3281):  75%|███████▍  | 659/880 [1:05:24<22:26,  6.09s/it]
Training 12/16 epoch (loss 1.0859):  75%|███████▍  | 659/880 [1:05:29<22:26,  6.09s/it]
Training 12/16 epoch (loss 1.0859):  75%|███████▌  | 660/880 [1:05:29<20:43,  5.65s/it]
Training 13/16 epoch (loss 1.2969):  75%|███████▌  | 660/880 [1:05:34<20:43,  5.65s/it]
Training 13/16 epoch (loss 1.2969):  75%|███████▌  | 661/880 [1:05:34<20:09,  5.52s/it]
Training 13/16 epoch (loss 1.3125):  75%|███████▌  | 661/880 [1:05:40<20:09,  5.52s/it]
Training 13/16 epoch (loss 1.3125):  75%|███████▌  | 662/880 [1:05:40<20:33,  5.66s/it]
Training 13/16 epoch (loss 1.1484):  75%|███████▌  | 662/880 [1:05:46<20:33,  5.66s/it]
Training 13/16 epoch (loss 1.1484):  75%|███████▌  | 663/880 [1:05:46<20:50,  5.76s/it]
Training 13/16 epoch (loss 1.4766):  75%|███████▌  | 663/880 [1:05:51<20:50,  5.76s/it]
Training 13/16 epoch (loss 1.4766):  75%|███████▌  | 664/880 [1:05:51<19:35,  5.44s/it]
Training 13/16 epoch (loss 1.3906):  75%|███████▌  | 664/880 [1:05:56<19:35,  5.44s/it]
Training 13/16 epoch (loss 1.3906):  76%|███████▌  | 665/880 [1:05:56<19:17,  5.38s/it]
Training 13/16 epoch (loss 1.1641):  76%|███████▌  | 665/880 [1:06:02<19:17,  5.38s/it]
Training 13/16 epoch (loss 1.1641):  76%|███████▌  | 666/880 [1:06:02<19:40,  5.51s/it]
Training 13/16 epoch (loss 1.2812):  76%|███████▌  | 666/880 [1:06:06<19:40,  5.51s/it]
Training 13/16 epoch (loss 1.2812):  76%|███████▌  | 667/880 [1:06:06<18:38,  5.25s/it]
Training 13/16 epoch (loss 1.2578):  76%|███████▌  | 667/880 [1:06:12<18:38,  5.25s/it]
Training 13/16 epoch (loss 1.2578):  76%|███████▌  | 668/880 [1:06:12<18:54,  5.35s/it]
Training 13/16 epoch (loss 1.4141):  76%|███████▌  | 668/880 [1:06:20<18:54,  5.35s/it]
Training 13/16 epoch (loss 1.4141):  76%|███████▌  | 669/880 [1:06:20<22:09,  6.30s/it]
Training 13/16 epoch (loss 1.3984):  76%|███████▌  | 669/880 [1:06:26<22:09,  6.30s/it]
Training 13/16 epoch (loss 1.3984):  76%|███████▌  | 670/880 [1:06:26<20:45,  5.93s/it]
Training 13/16 epoch (loss 1.4062):  76%|███████▌  | 670/880 [1:06:30<20:45,  5.93s/it]
Training 13/16 epoch (loss 1.4062):  76%|███████▋  | 671/880 [1:06:30<19:29,  5.59s/it]
Training 13/16 epoch (loss 1.3281):  76%|███████▋  | 671/880 [1:06:38<19:29,  5.59s/it]
Training 13/16 epoch (loss 1.3281):  76%|███████▋  | 672/880 [1:06:38<21:19,  6.15s/it]
Training 13/16 epoch (loss 1.1250):  76%|███████▋  | 672/880 [1:06:43<21:19,  6.15s/it]
Training 13/16 epoch (loss 1.1250):  76%|███████▋  | 673/880 [1:06:43<19:48,  5.74s/it]
Training 13/16 epoch (loss 1.5547):  76%|███████▋  | 673/880 [1:06:47<19:48,  5.74s/it]
Training 13/16 epoch (loss 1.5547):  77%|███████▋  | 674/880 [1:06:47<18:43,  5.45s/it]
Training 13/16 epoch (loss 1.2656):  77%|███████▋  | 674/880 [1:06:53<18:43,  5.45s/it]
Training 13/16 epoch (loss 1.2656):  77%|███████▋  | 675/880 [1:06:53<18:41,  5.47s/it]
Training 13/16 epoch (loss 1.3594):  77%|███████▋  | 675/880 [1:06:58<18:41,  5.47s/it]
Training 13/16 epoch (loss 1.3594):  77%|███████▋  | 676/880 [1:06:58<18:18,  5.38s/it]
Training 13/16 epoch (loss 1.2500):  77%|███████▋  | 676/880 [1:07:05<18:18,  5.38s/it]
Training 13/16 epoch (loss 1.2500):  77%|███████▋  | 677/880 [1:07:05<19:23,  5.73s/it]
Training 13/16 epoch (loss 1.2500):  77%|███████▋  | 677/880 [1:07:10<19:23,  5.73s/it]
Training 13/16 epoch (loss 1.2500):  77%|███████▋  | 678/880 [1:07:10<18:36,  5.53s/it]
Training 13/16 epoch (loss 1.3203):  77%|███████▋  | 678/880 [1:07:15<18:36,  5.53s/it]
Training 13/16 epoch (loss 1.3203):  77%|███████▋  | 679/880 [1:07:15<17:55,  5.35s/it]
Training 13/16 epoch (loss 0.9922):  77%|███████▋  | 679/880 [1:07:19<17:55,  5.35s/it]
Training 13/16 epoch (loss 0.9922):  77%|███████▋  | 680/880 [1:07:19<16:38,  4.99s/it]
Training 13/16 epoch (loss 1.1641):  77%|███████▋  | 680/880 [1:07:24<16:38,  4.99s/it]
Training 13/16 epoch (loss 1.1641):  77%|███████▋  | 681/880 [1:07:24<16:28,  4.97s/it]
Training 13/16 epoch (loss 1.4375):  77%|███████▋  | 681/880 [1:07:29<16:28,  4.97s/it]
Training 13/16 epoch (loss 1.4375):  78%|███████▊  | 682/880 [1:07:29<16:48,  5.09s/it]
Training 13/16 epoch (loss 1.2344):  78%|███████▊  | 682/880 [1:07:33<16:48,  5.09s/it]
Training 13/16 epoch (loss 1.2344):  78%|███████▊  | 683/880 [1:07:33<15:45,  4.80s/it]
Training 13/16 epoch (loss 1.1094):  78%|███████▊  | 683/880 [1:07:38<15:45,  4.80s/it]
Training 13/16 epoch (loss 1.1094):  78%|███████▊  | 684/880 [1:07:38<15:58,  4.89s/it]
Training 13/16 epoch (loss 1.2344):  78%|███████▊  | 684/880 [1:07:51<15:58,  4.89s/it]
Training 13/16 epoch (loss 1.2344):  78%|███████▊  | 685/880 [1:07:51<23:17,  7.17s/it]
Training 13/16 epoch (loss 1.0234):  78%|███████▊  | 685/880 [1:07:57<23:17,  7.17s/it]
Training 13/16 epoch (loss 1.0234):  78%|███████▊  | 686/880 [1:07:57<21:51,  6.76s/it]
Training 13/16 epoch (loss 1.2500):  78%|███████▊  | 686/880 [1:08:02<21:51,  6.76s/it]
Training 13/16 epoch (loss 1.2500):  78%|███████▊  | 687/880 [1:08:02<20:23,  6.34s/it]
Training 13/16 epoch (loss 0.9688):  78%|███████▊  | 687/880 [1:08:07<20:23,  6.34s/it]
Training 13/16 epoch (loss 0.9688):  78%|███████▊  | 688/880 [1:08:07<19:17,  6.03s/it]
Training 13/16 epoch (loss 1.1562):  78%|███████▊  | 688/880 [1:08:12<19:17,  6.03s/it]
Training 13/16 epoch (loss 1.1562):  78%|███████▊  | 689/880 [1:08:12<17:53,  5.62s/it]
Training 13/16 epoch (loss 1.1719):  78%|███████▊  | 689/880 [1:08:17<17:53,  5.62s/it]
Training 13/16 epoch (loss 1.1719):  78%|███████▊  | 690/880 [1:08:17<17:42,  5.59s/it]
Training 13/16 epoch (loss 1.2188):  78%|███████▊  | 690/880 [1:08:22<17:42,  5.59s/it]
Training 13/16 epoch (loss 1.2188):  79%|███████▊  | 691/880 [1:08:22<16:30,  5.24s/it]
Training 13/16 epoch (loss 1.0391):  79%|███████▊  | 691/880 [1:08:26<16:30,  5.24s/it]
Training 13/16 epoch (loss 1.0391):  79%|███████▊  | 692/880 [1:08:26<15:47,  5.04s/it]
Training 13/16 epoch (loss 1.2344):  79%|███████▊  | 692/880 [1:08:32<15:47,  5.04s/it]
Training 13/16 epoch (loss 1.2344):  79%|███████▉  | 693/880 [1:08:32<16:06,  5.17s/it]
Training 13/16 epoch (loss 1.0625):  79%|███████▉  | 693/880 [1:08:37<16:06,  5.17s/it]
Training 13/16 epoch (loss 1.0625):  79%|███████▉  | 694/880 [1:08:37<16:13,  5.23s/it]
Training 13/16 epoch (loss 1.0703):  79%|███████▉  | 694/880 [1:08:43<16:13,  5.23s/it]
Training 13/16 epoch (loss 1.0703):  79%|███████▉  | 695/880 [1:08:43<16:16,  5.28s/it]
Training 13/16 epoch (loss 1.2500):  79%|███████▉  | 695/880 [1:08:49<16:16,  5.28s/it]
Training 13/16 epoch (loss 1.2500):  79%|███████▉  | 696/880 [1:08:49<17:14,  5.62s/it]
Training 13/16 epoch (loss 1.2812):  79%|███████▉  | 696/880 [1:09:03<17:14,  5.62s/it]
Training 13/16 epoch (loss 1.2812):  79%|███████▉  | 697/880 [1:09:03<25:09,  8.25s/it]
Training 13/16 epoch (loss 1.1250):  79%|███████▉  | 697/880 [1:09:09<25:09,  8.25s/it]
Training 13/16 epoch (loss 1.1250):  79%|███████▉  | 698/880 [1:09:09<22:11,  7.32s/it]
Training 13/16 epoch (loss 1.3125):  79%|███████▉  | 698/880 [1:09:13<22:11,  7.32s/it]
Training 13/16 epoch (loss 1.3125):  79%|███████▉  | 699/880 [1:09:13<19:35,  6.49s/it]
Training 13/16 epoch (loss 1.1328):  79%|███████▉  | 699/880 [1:09:18<19:35,  6.49s/it]
Training 13/16 epoch (loss 1.1328):  80%|███████▉  | 700/880 [1:09:18<18:16,  6.09s/it]
Training 13/16 epoch (loss 1.1406):  80%|███████▉  | 700/880 [1:09:23<18:16,  6.09s/it]
Training 13/16 epoch (loss 1.1406):  80%|███████▉  | 701/880 [1:09:23<17:08,  5.74s/it]
Training 13/16 epoch (loss 1.2500):  80%|███████▉  | 701/880 [1:09:29<17:08,  5.74s/it]
Training 13/16 epoch (loss 1.2500):  80%|███████▉  | 702/880 [1:09:29<16:45,  5.65s/it]
Training 13/16 epoch (loss 1.0859):  80%|███████▉  | 702/880 [1:09:35<16:45,  5.65s/it]
Training 13/16 epoch (loss 1.0859):  80%|███████▉  | 703/880 [1:09:35<17:07,  5.80s/it]
Training 13/16 epoch (loss 1.5859):  80%|███████▉  | 703/880 [1:09:43<17:07,  5.80s/it]
Training 13/16 epoch (loss 1.5859):  80%|████████  | 704/880 [1:09:43<18:51,  6.43s/it]
Training 13/16 epoch (loss 1.2812):  80%|████████  | 704/880 [1:09:58<18:51,  6.43s/it]
Training 13/16 epoch (loss 1.2812):  80%|████████  | 705/880 [1:09:58<26:53,  9.22s/it]
Training 13/16 epoch (loss 1.0781):  80%|████████  | 705/880 [1:10:04<26:53,  9.22s/it]
Training 13/16 epoch (loss 1.0781):  80%|████████  | 706/880 [1:10:04<23:50,  8.22s/it]
Training 13/16 epoch (loss 1.0312):  80%|████████  | 706/880 [1:10:09<23:50,  8.22s/it]
Training 13/16 epoch (loss 1.0312):  80%|████████  | 707/880 [1:10:09<20:46,  7.20s/it]
Training 13/16 epoch (loss 1.1406):  80%|████████  | 707/880 [1:10:15<20:46,  7.20s/it]
Training 13/16 epoch (loss 1.1406):  80%|████████  | 708/880 [1:10:15<19:06,  6.67s/it]
Training 13/16 epoch (loss 1.0938):  80%|████████  | 708/880 [1:10:22<19:06,  6.67s/it]
Training 13/16 epoch (loss 1.0938):  81%|████████  | 709/880 [1:10:22<19:16,  6.76s/it]
Training 13/16 epoch (loss 1.2188):  81%|████████  | 709/880 [1:10:27<19:16,  6.76s/it]
Training 13/16 epoch (loss 1.2188):  81%|████████  | 710/880 [1:10:27<18:14,  6.44s/it]
Training 13/16 epoch (loss 1.2891):  81%|████████  | 710/880 [1:10:34<18:14,  6.44s/it]
Training 13/16 epoch (loss 1.2891):  81%|████████  | 711/880 [1:10:34<18:15,  6.48s/it]
Training 13/16 epoch (loss 1.0078):  81%|████████  | 711/880 [1:10:39<18:15,  6.48s/it]
Training 13/16 epoch (loss 1.0078):  81%|████████  | 712/880 [1:10:39<17:13,  6.15s/it]
Training 13/16 epoch (loss 1.4062):  81%|████████  | 712/880 [1:10:44<17:13,  6.15s/it]
Training 13/16 epoch (loss 1.4062):  81%|████████  | 713/880 [1:10:44<15:57,  5.73s/it]
Training 13/16 epoch (loss 1.2969):  81%|████████  | 713/880 [1:10:51<15:57,  5.73s/it]
Training 13/16 epoch (loss 1.2969):  81%|████████  | 714/880 [1:10:51<16:50,  6.09s/it]
Training 13/16 epoch (loss 1.0625):  81%|████████  | 714/880 [1:10:56<16:50,  6.09s/it]
Training 13/16 epoch (loss 1.0625):  81%|████████▏ | 715/880 [1:10:56<15:32,  5.65s/it]
Training 14/16 epoch (loss 1.2656):  81%|████████▏ | 715/880 [1:11:01<15:32,  5.65s/it]
Training 14/16 epoch (loss 1.2656):  81%|████████▏ | 716/880 [1:11:01<15:06,  5.53s/it]
Training 14/16 epoch (loss 1.2969):  81%|████████▏ | 716/880 [1:11:07<15:06,  5.53s/it]
Training 14/16 epoch (loss 1.2969):  81%|████████▏ | 717/880 [1:11:07<15:22,  5.66s/it]
Training 14/16 epoch (loss 1.1094):  81%|████████▏ | 717/880 [1:11:13<15:22,  5.66s/it]
Training 14/16 epoch (loss 1.1094):  82%|████████▏ | 718/880 [1:11:13<15:33,  5.76s/it]
Training 14/16 epoch (loss 1.4531):  82%|████████▏ | 718/880 [1:11:17<15:33,  5.76s/it]
Training 14/16 epoch (loss 1.4531):  82%|████████▏ | 719/880 [1:11:17<14:35,  5.44s/it]
Training 14/16 epoch (loss 1.3594):  82%|████████▏ | 719/880 [1:11:23<14:35,  5.44s/it]
Training 14/16 epoch (loss 1.3594):  82%|████████▏ | 720/880 [1:11:23<14:20,  5.38s/it]
Training 14/16 epoch (loss 1.1484):  82%|████████▏ | 720/880 [1:11:28<14:20,  5.38s/it]
Training 14/16 epoch (loss 1.1484):  82%|████████▏ | 721/880 [1:11:28<14:35,  5.51s/it]
Training 14/16 epoch (loss 1.2578):  82%|████████▏ | 721/880 [1:11:33<14:35,  5.51s/it]
Training 14/16 epoch (loss 1.2578):  82%|████████▏ | 722/880 [1:11:33<13:48,  5.24s/it]
Training 14/16 epoch (loss 1.2344):  82%|████████▏ | 722/880 [1:11:39<13:48,  5.24s/it]
Training 14/16 epoch (loss 1.2344):  82%|████████▏ | 723/880 [1:11:39<13:58,  5.34s/it]
Training 14/16 epoch (loss 1.3984):  82%|████████▏ | 723/880 [1:11:47<13:58,  5.34s/it]
Training 14/16 epoch (loss 1.3984):  82%|████████▏ | 724/880 [1:11:47<16:21,  6.29s/it]
Training 14/16 epoch (loss 1.3672):  82%|████████▏ | 724/880 [1:11:52<16:21,  6.29s/it]
Training 14/16 epoch (loss 1.3672):  82%|████████▏ | 725/880 [1:11:52<15:18,  5.93s/it]
Training 14/16 epoch (loss 1.3672):  82%|████████▏ | 725/880 [1:11:57<15:18,  5.93s/it]
Training 14/16 epoch (loss 1.3672):  82%|████████▎ | 726/880 [1:11:57<14:21,  5.59s/it]
Training 14/16 epoch (loss 1.2969):  82%|████████▎ | 726/880 [1:12:05<14:21,  5.59s/it]
Training 14/16 epoch (loss 1.2969):  83%|████████▎ | 727/880 [1:12:05<15:41,  6.15s/it]
Training 14/16 epoch (loss 1.1016):  83%|████████▎ | 727/880 [1:12:09<15:41,  6.15s/it]
Training 14/16 epoch (loss 1.1016):  83%|████████▎ | 728/880 [1:12:09<14:32,  5.74s/it]
Training 14/16 epoch (loss 1.5312):  83%|████████▎ | 728/880 [1:12:14<14:32,  5.74s/it]
Training 14/16 epoch (loss 1.5312):  83%|████████▎ | 729/880 [1:12:14<13:43,  5.45s/it]
Training 14/16 epoch (loss 1.2422):  83%|████████▎ | 729/880 [1:12:20<13:43,  5.45s/it]
Training 14/16 epoch (loss 1.2422):  83%|████████▎ | 730/880 [1:12:20<13:40,  5.47s/it]
Training 14/16 epoch (loss 1.3438):  83%|████████▎ | 730/880 [1:12:25<13:40,  5.47s/it]
Training 14/16 epoch (loss 1.3438):  83%|████████▎ | 731/880 [1:12:25<13:21,  5.38s/it]
Training 14/16 epoch (loss 1.2188):  83%|████████▎ | 731/880 [1:12:31<13:21,  5.38s/it]
Training 14/16 epoch (loss 1.2188):  83%|████████▎ | 732/880 [1:12:31<14:07,  5.73s/it]
Training 14/16 epoch (loss 1.2266):  83%|████████▎ | 732/880 [1:12:36<14:07,  5.73s/it]
Training 14/16 epoch (loss 1.2266):  83%|████████▎ | 733/880 [1:12:36<13:31,  5.52s/it]
Training 14/16 epoch (loss 1.2969):  83%|████████▎ | 733/880 [1:12:41<13:31,  5.52s/it]
Training 14/16 epoch (loss 1.2969):  83%|████████▎ | 734/880 [1:12:41<13:00,  5.35s/it]
Training 14/16 epoch (loss 0.9688):  83%|████████▎ | 734/880 [1:12:45<13:00,  5.35s/it]
Training 14/16 epoch (loss 0.9688):  84%|████████▎ | 735/880 [1:12:45<12:03,  4.99s/it]
Training 14/16 epoch (loss 1.1484):  84%|████████▎ | 735/880 [1:12:50<12:03,  4.99s/it]
Training 14/16 epoch (loss 1.1484):  84%|████████▎ | 736/880 [1:12:50<11:55,  4.97s/it]
Training 14/16 epoch (loss 1.3984):  84%|████████▎ | 736/880 [1:12:56<11:55,  4.97s/it]
Training 14/16 epoch (loss 1.3984):  84%|████████▍ | 737/880 [1:12:56<12:08,  5.10s/it]
Training 14/16 epoch (loss 1.2031):  84%|████████▍ | 737/880 [1:13:00<12:08,  5.10s/it]
Training 14/16 epoch (loss 1.2031):  84%|████████▍ | 738/880 [1:13:00<11:22,  4.81s/it]
Training 14/16 epoch (loss 1.0781):  84%|████████▍ | 738/880 [1:13:05<11:22,  4.81s/it]
Training 14/16 epoch (loss 1.0781):  84%|████████▍ | 739/880 [1:13:05<11:30,  4.90s/it]
Training 14/16 epoch (loss 1.1953):  84%|████████▍ | 739/880 [1:13:17<11:30,  4.90s/it]
Training 14/16 epoch (loss 1.1953):  84%|████████▍ | 740/880 [1:13:17<16:43,  7.17s/it]
Training 14/16 epoch (loss 1.0000):  84%|████████▍ | 740/880 [1:13:23<16:43,  7.17s/it]
Training 14/16 epoch (loss 1.0000):  84%|████████▍ | 741/880 [1:13:23<15:39,  6.76s/it]
Training 14/16 epoch (loss 1.2266):  84%|████████▍ | 741/880 [1:13:29<15:39,  6.76s/it]
Training 14/16 epoch (loss 1.2266):  84%|████████▍ | 742/880 [1:13:29<14:35,  6.34s/it]
Training 14/16 epoch (loss 0.9492):  84%|████████▍ | 742/880 [1:13:34<14:35,  6.34s/it]
Training 14/16 epoch (loss 0.9492):  84%|████████▍ | 743/880 [1:13:34<13:45,  6.03s/it]
Training 14/16 epoch (loss 1.1250):  84%|████████▍ | 743/880 [1:13:39<13:45,  6.03s/it]
Training 14/16 epoch (loss 1.1250):  85%|████████▍ | 744/880 [1:13:39<12:44,  5.62s/it]
Training 14/16 epoch (loss 1.1484):  85%|████████▍ | 744/880 [1:13:44<12:44,  5.62s/it]
Training 14/16 epoch (loss 1.1484):  85%|████████▍ | 745/880 [1:13:44<12:35,  5.59s/it]
Training 14/16 epoch (loss 1.1953):  85%|████████▍ | 745/880 [1:13:49<12:35,  5.59s/it]
Training 14/16 epoch (loss 1.1953):  85%|████████▍ | 746/880 [1:13:49<11:42,  5.24s/it]
Training 14/16 epoch (loss 1.0078):  85%|████████▍ | 746/880 [1:13:53<11:42,  5.24s/it]
Training 14/16 epoch (loss 1.0078):  85%|████████▍ | 747/880 [1:13:53<11:10,  5.04s/it]
Training 14/16 epoch (loss 1.2109):  85%|████████▍ | 747/880 [1:13:59<11:10,  5.04s/it]
Training 14/16 epoch (loss 1.2109):  85%|████████▌ | 748/880 [1:13:59<11:22,  5.17s/it]
Training 14/16 epoch (loss 1.0312):  85%|████████▌ | 748/880 [1:14:04<11:22,  5.17s/it]
Training 14/16 epoch (loss 1.0312):  85%|████████▌ | 749/880 [1:14:04<11:26,  5.24s/it]
Training 14/16 epoch (loss 1.0469):  85%|████████▌ | 749/880 [1:14:09<11:26,  5.24s/it]
Training 14/16 epoch (loss 1.0469):  85%|████████▌ | 750/880 [1:14:09<11:27,  5.29s/it]
Training 14/16 epoch (loss 1.2266):  85%|████████▌ | 750/880 [1:14:16<11:27,  5.29s/it]
Training 14/16 epoch (loss 1.2266):  85%|████████▌ | 751/880 [1:14:16<12:06,  5.63s/it]
Training 14/16 epoch (loss 1.2656):  85%|████████▌ | 751/880 [1:14:30<12:06,  5.63s/it]
Training 14/16 epoch (loss 1.2656):  85%|████████▌ | 752/880 [1:14:30<17:37,  8.26s/it]
Training 14/16 epoch (loss 1.1094):  85%|████████▌ | 752/880 [1:14:35<17:37,  8.26s/it]
Training 14/16 epoch (loss 1.1094):  86%|████████▌ | 753/880 [1:14:35<15:29,  7.32s/it]
Training 14/16 epoch (loss 1.2969):  86%|████████▌ | 753/880 [1:14:40<15:29,  7.32s/it]
Training 14/16 epoch (loss 1.2969):  86%|████████▌ | 754/880 [1:14:40<13:38,  6.50s/it]
Training 14/16 epoch (loss 1.1094):  86%|████████▌ | 754/880 [1:14:45<13:38,  6.50s/it]
Training 14/16 epoch (loss 1.1094):  86%|████████▌ | 755/880 [1:14:45<12:41,  6.09s/it]
Training 14/16 epoch (loss 1.1328):  86%|████████▌ | 755/880 [1:14:50<12:41,  6.09s/it]
Training 14/16 epoch (loss 1.1328):  86%|████████▌ | 756/880 [1:14:50<11:52,  5.74s/it]
Training 14/16 epoch (loss 1.2266):  86%|████████▌ | 756/880 [1:14:55<11:52,  5.74s/it]
Training 14/16 epoch (loss 1.2266):  86%|████████▌ | 757/880 [1:14:55<11:34,  5.65s/it]
Training 14/16 epoch (loss 1.0625):  86%|████████▌ | 757/880 [1:15:02<11:34,  5.65s/it]
Training 14/16 epoch (loss 1.0625):  86%|████████▌ | 758/880 [1:15:02<11:48,  5.81s/it]
Training 14/16 epoch (loss 1.5703):  86%|████████▌ | 758/880 [1:15:09<11:48,  5.81s/it]
Training 14/16 epoch (loss 1.5703):  86%|████████▋ | 759/880 [1:15:09<12:57,  6.43s/it]
Training 14/16 epoch (loss 1.2578):  86%|████████▋ | 759/880 [1:15:25<12:57,  6.43s/it]
Training 14/16 epoch (loss 1.2578):  86%|████████▋ | 760/880 [1:15:25<18:27,  9.23s/it]
Training 14/16 epoch (loss 1.0547):  86%|████████▋ | 760/880 [1:15:31<18:27,  9.23s/it]
Training 14/16 epoch (loss 1.0547):  86%|████████▋ | 761/880 [1:15:31<16:19,  8.23s/it]
Training 14/16 epoch (loss 1.0078):  86%|████████▋ | 761/880 [1:15:36<16:19,  8.23s/it]
Training 14/16 epoch (loss 1.0078):  87%|████████▋ | 762/880 [1:15:36<14:11,  7.21s/it]
Training 14/16 epoch (loss 1.1172):  87%|████████▋ | 762/880 [1:15:41<14:11,  7.21s/it]
Training 14/16 epoch (loss 1.1172):  87%|████████▋ | 763/880 [1:15:41<13:00,  6.67s/it]
Training 14/16 epoch (loss 1.0625):  87%|████████▋ | 763/880 [1:15:48<13:00,  6.67s/it]
Training 14/16 epoch (loss 1.0625):  87%|████████▋ | 764/880 [1:15:48<13:05,  6.77s/it]
Training 14/16 epoch (loss 1.1875):  87%|████████▋ | 764/880 [1:15:54<13:05,  6.77s/it]
Training 14/16 epoch (loss 1.1875):  87%|████████▋ | 765/880 [1:15:54<12:20,  6.44s/it]
Training 14/16 epoch (loss 1.2656):  87%|████████▋ | 765/880 [1:16:01<12:20,  6.44s/it]
Training 14/16 epoch (loss 1.2656):  87%|████████▋ | 766/880 [1:16:01<12:19,  6.48s/it]
Training 14/16 epoch (loss 0.9805):  87%|████████▋ | 766/880 [1:16:06<12:19,  6.48s/it]
Training 14/16 epoch (loss 0.9805):  87%|████████▋ | 767/880 [1:16:06<11:35,  6.15s/it]
Training 14/16 epoch (loss 1.3750):  87%|████████▋ | 767/880 [1:16:11<11:35,  6.15s/it]
Training 14/16 epoch (loss 1.3750):  87%|████████▋ | 768/880 [1:16:11<10:42,  5.73s/it]
Training 14/16 epoch (loss 1.2734):  87%|████████▋ | 768/880 [1:16:18<10:42,  5.73s/it]
Training 14/16 epoch (loss 1.2734):  87%|████████▋ | 769/880 [1:16:18<11:15,  6.09s/it]
Training 14/16 epoch (loss 1.0391):  87%|████████▋ | 769/880 [1:16:22<11:15,  6.09s/it]
Training 14/16 epoch (loss 1.0391):  88%|████████▊ | 770/880 [1:16:22<10:21,  5.65s/it]
Training 15/16 epoch (loss 1.2500):  88%|████████▊ | 770/880 [1:16:28<10:21,  5.65s/it]
Training 15/16 epoch (loss 1.2500):  88%|████████▊ | 771/880 [1:16:28<10:02,  5.53s/it]
Training 15/16 epoch (loss 1.2734):  88%|████████▊ | 771/880 [1:16:34<10:02,  5.53s/it]
Training 15/16 epoch (loss 1.2734):  88%|████████▊ | 772/880 [1:16:34<10:11,  5.67s/it]
Training 15/16 epoch (loss 1.0938):  88%|████████▊ | 772/880 [1:16:40<10:11,  5.67s/it]
Training 15/16 epoch (loss 1.0938):  88%|████████▊ | 773/880 [1:16:40<10:17,  5.77s/it]
Training 15/16 epoch (loss 1.4219):  88%|████████▊ | 773/880 [1:16:44<10:17,  5.77s/it]
Training 15/16 epoch (loss 1.4219):  88%|████████▊ | 774/880 [1:16:44<09:37,  5.45s/it]
Training 15/16 epoch (loss 1.3281):  88%|████████▊ | 774/880 [1:16:50<09:37,  5.45s/it]
Training 15/16 epoch (loss 1.3281):  88%|████████▊ | 775/880 [1:16:50<09:25,  5.38s/it]
Training 15/16 epoch (loss 1.1172):  88%|████████▊ | 775/880 [1:16:55<09:25,  5.38s/it]
Training 15/16 epoch (loss 1.1172):  88%|████████▊ | 776/880 [1:16:55<09:33,  5.51s/it]
Training 15/16 epoch (loss 1.2344):  88%|████████▊ | 776/880 [1:17:00<09:33,  5.51s/it]
Training 15/16 epoch (loss 1.2344):  88%|████████▊ | 777/880 [1:17:00<09:00,  5.25s/it]
Training 15/16 epoch (loss 1.2109):  88%|████████▊ | 777/880 [1:17:06<09:00,  5.25s/it]
Training 15/16 epoch (loss 1.2109):  88%|████████▊ | 778/880 [1:17:06<09:05,  5.34s/it]
Training 15/16 epoch (loss 1.3828):  88%|████████▊ | 778/880 [1:17:14<09:05,  5.34s/it]
Training 15/16 epoch (loss 1.3828):  89%|████████▊ | 779/880 [1:17:14<10:35,  6.29s/it]
Training 15/16 epoch (loss 1.3438):  89%|████████▊ | 779/880 [1:17:19<10:35,  6.29s/it]
Training 15/16 epoch (loss 1.3438):  89%|████████▊ | 780/880 [1:17:19<09:52,  5.92s/it]
Training 15/16 epoch (loss 1.3516):  89%|████████▊ | 780/880 [1:17:24<09:52,  5.92s/it]
Training 15/16 epoch (loss 1.3516):  89%|████████▉ | 781/880 [1:17:24<09:13,  5.59s/it]
Training 15/16 epoch (loss 1.2734):  89%|████████▉ | 781/880 [1:17:31<09:13,  5.59s/it]
Training 15/16 epoch (loss 1.2734):  89%|████████▉ | 782/880 [1:17:31<10:02,  6.15s/it]
Training 15/16 epoch (loss 1.0859):  89%|████████▉ | 782/880 [1:17:36<10:02,  6.15s/it]
Training 15/16 epoch (loss 1.0859):  89%|████████▉ | 783/880 [1:17:36<09:17,  5.74s/it]
Training 15/16 epoch (loss 1.5156):  89%|████████▉ | 783/880 [1:17:41<09:17,  5.74s/it]
Training 15/16 epoch (loss 1.5156):  89%|████████▉ | 784/880 [1:17:41<08:43,  5.46s/it]
Training 15/16 epoch (loss 1.2188):  89%|████████▉ | 784/880 [1:17:46<08:43,  5.46s/it]
Training 15/16 epoch (loss 1.2188):  89%|████████▉ | 785/880 [1:17:46<08:40,  5.47s/it]
Training 15/16 epoch (loss 1.3125):  89%|████████▉ | 785/880 [1:17:52<08:40,  5.47s/it]
Training 15/16 epoch (loss 1.3125):  89%|████████▉ | 786/880 [1:17:52<08:26,  5.38s/it]
Training 15/16 epoch (loss 1.1953):  89%|████████▉ | 786/880 [1:17:58<08:26,  5.38s/it]
Training 15/16 epoch (loss 1.1953):  89%|████████▉ | 787/880 [1:17:58<08:52,  5.73s/it]
Training 15/16 epoch (loss 1.2188):  89%|████████▉ | 787/880 [1:18:03<08:52,  5.73s/it]
Training 15/16 epoch (loss 1.2188):  90%|████████▉ | 788/880 [1:18:03<08:28,  5.52s/it]
Training 15/16 epoch (loss 1.2734):  90%|████████▉ | 788/880 [1:18:08<08:28,  5.52s/it]
Training 15/16 epoch (loss 1.2734):  90%|████████▉ | 789/880 [1:18:08<08:06,  5.35s/it]
Training 15/16 epoch (loss 0.9453):  90%|████████▉ | 789/880 [1:18:12<08:06,  5.35s/it]
Training 15/16 epoch (loss 0.9453):  90%|████████▉ | 790/880 [1:18:12<07:29,  4.99s/it]
Training 15/16 epoch (loss 1.1250):  90%|████████▉ | 790/880 [1:18:17<07:29,  4.99s/it]
Training 15/16 epoch (loss 1.1250):  90%|████████▉ | 791/880 [1:18:17<07:21,  4.97s/it]
Training 15/16 epoch (loss 1.3828):  90%|████████▉ | 791/880 [1:18:23<07:21,  4.97s/it]
Training 15/16 epoch (loss 1.3828):  90%|█████████ | 792/880 [1:18:23<07:28,  5.09s/it]
Training 15/16 epoch (loss 1.1797):  90%|█████████ | 792/880 [1:18:27<07:28,  5.09s/it]
Training 15/16 epoch (loss 1.1797):  90%|█████████ | 793/880 [1:18:27<06:57,  4.80s/it]
Training 15/16 epoch (loss 1.0625):  90%|█████████ | 793/880 [1:18:32<06:57,  4.80s/it]
Training 15/16 epoch (loss 1.0625):  90%|█████████ | 794/880 [1:18:32<07:00,  4.89s/it]
Training 15/16 epoch (loss 1.1719):  90%|█████████ | 794/880 [1:18:44<07:00,  4.89s/it]
Training 15/16 epoch (loss 1.1719):  90%|█████████ | 795/880 [1:18:44<10:09,  7.17s/it]
Training 15/16 epoch (loss 0.9844):  90%|█████████ | 795/880 [1:18:50<10:09,  7.17s/it]
Training 15/16 epoch (loss 0.9844):  90%|█████████ | 796/880 [1:18:50<09:28,  6.76s/it]
Training 15/16 epoch (loss 1.2109):  90%|█████████ | 796/880 [1:18:55<09:28,  6.76s/it]
Training 15/16 epoch (loss 1.2109):  91%|█████████ | 797/880 [1:18:55<08:46,  6.34s/it]
Training 15/16 epoch (loss 0.9297):  91%|█████████ | 797/880 [1:19:01<08:46,  6.34s/it]
Training 15/16 epoch (loss 0.9297):  91%|█████████ | 798/880 [1:19:01<08:14,  6.03s/it]
Training 15/16 epoch (loss 1.1094):  91%|█████████ | 798/880 [1:19:05<08:14,  6.03s/it]
Training 15/16 epoch (loss 1.1094):  91%|█████████ | 799/880 [1:19:05<07:35,  5.62s/it]
Training 15/16 epoch (loss 1.1328):  91%|█████████ | 799/880 [1:19:11<07:35,  5.62s/it]
Training 15/16 epoch (loss 1.1328):  91%|█████████ | 800/880 [1:19:11<07:27,  5.59s/it]
Training 15/16 epoch (loss 1.1719):  91%|█████████ | 800/880 [1:19:15<07:27,  5.59s/it]
Training 15/16 epoch (loss 1.1719):  91%|█████████ | 801/880 [1:19:15<06:53,  5.24s/it]
Training 15/16 epoch (loss 0.9961):  91%|█████████ | 801/880 [1:19:20<06:53,  5.24s/it]
Training 15/16 epoch (loss 0.9961):  91%|█████████ | 802/880 [1:19:20<06:33,  5.04s/it]
Training 15/16 epoch (loss 1.1875):  91%|█████████ | 802/880 [1:19:25<06:33,  5.04s/it]
Training 15/16 epoch (loss 1.1875):  91%|█████████▏| 803/880 [1:19:25<06:37,  5.17s/it]
Training 15/16 epoch (loss 1.0234):  91%|█████████▏| 803/880 [1:19:31<06:37,  5.17s/it]
Training 15/16 epoch (loss 1.0234):  91%|█████████▏| 804/880 [1:19:31<06:37,  5.23s/it]
Training 15/16 epoch (loss 1.0234):  91%|█████████▏| 804/880 [1:19:36<06:37,  5.23s/it]
Training 15/16 epoch (loss 1.0234):  91%|█████████▏| 805/880 [1:19:36<06:36,  5.28s/it]
Training 15/16 epoch (loss 1.2031):  91%|█████████▏| 805/880 [1:19:43<06:36,  5.28s/it]
Training 15/16 epoch (loss 1.2031):  92%|█████████▏| 806/880 [1:19:43<06:56,  5.63s/it]
Training 15/16 epoch (loss 1.2422):  92%|█████████▏| 806/880 [1:19:57<06:56,  5.63s/it]
Training 15/16 epoch (loss 1.2422):  92%|█████████▏| 807/880 [1:19:57<10:02,  8.25s/it]
Training 15/16 epoch (loss 1.0938):  92%|█████████▏| 807/880 [1:20:02<10:02,  8.25s/it]
Training 15/16 epoch (loss 1.0938):  92%|█████████▏| 808/880 [1:20:02<08:46,  7.32s/it]
Training 15/16 epoch (loss 1.2734):  92%|█████████▏| 808/880 [1:20:07<08:46,  7.32s/it]
Training 15/16 epoch (loss 1.2734):  92%|█████████▏| 809/880 [1:20:07<07:40,  6.49s/it]
Training 15/16 epoch (loss 1.0938):  92%|█████████▏| 809/880 [1:20:12<07:40,  6.49s/it]
Training 15/16 epoch (loss 1.0938):  92%|█████████▏| 810/880 [1:20:12<07:06,  6.09s/it]
Training 15/16 epoch (loss 1.1094):  92%|█████████▏| 810/880 [1:20:17<07:06,  6.09s/it]
Training 15/16 epoch (loss 1.1094):  92%|█████████▏| 811/880 [1:20:17<06:36,  5.74s/it]
Training 15/16 epoch (loss 1.2109):  92%|█████████▏| 811/880 [1:20:22<06:36,  5.74s/it]
Training 15/16 epoch (loss 1.2109):  92%|█████████▏| 812/880 [1:20:22<06:23,  5.65s/it]
Training 15/16 epoch (loss 1.0469):  92%|█████████▏| 812/880 [1:20:28<06:23,  5.65s/it]
Training 15/16 epoch (loss 1.0469):  92%|█████████▏| 813/880 [1:20:28<06:28,  5.80s/it]
Training 15/16 epoch (loss 1.5469):  92%|█████████▏| 813/880 [1:20:36<06:28,  5.80s/it]
Training 15/16 epoch (loss 1.5469):  92%|█████████▎| 814/880 [1:20:36<07:03,  6.42s/it]
Training 15/16 epoch (loss 1.2344):  92%|█████████▎| 814/880 [1:20:52<07:03,  6.42s/it]
Training 15/16 epoch (loss 1.2344):  93%|█████████▎| 815/880 [1:20:52<09:59,  9.22s/it]
Training 15/16 epoch (loss 1.0469):  93%|█████████▎| 815/880 [1:20:58<09:59,  9.22s/it]
Training 15/16 epoch (loss 1.0469):  93%|█████████▎| 816/880 [1:20:58<08:46,  8.22s/it]
Training 15/16 epoch (loss 1.0000):  93%|█████████▎| 816/880 [1:21:03<08:46,  8.22s/it]
Training 15/16 epoch (loss 1.0000):  93%|█████████▎| 817/880 [1:21:03<07:33,  7.20s/it]
Training 15/16 epoch (loss 1.0938):  93%|█████████▎| 817/880 [1:21:08<07:33,  7.20s/it]
Training 15/16 epoch (loss 1.0938):  93%|█████████▎| 818/880 [1:21:08<06:53,  6.67s/it]
Training 15/16 epoch (loss 1.0469):  93%|█████████▎| 818/880 [1:21:15<06:53,  6.67s/it]
Training 15/16 epoch (loss 1.0469):  93%|█████████▎| 819/880 [1:21:15<06:52,  6.77s/it]
Training 15/16 epoch (loss 1.1797):  93%|█████████▎| 819/880 [1:21:21<06:52,  6.77s/it]
Training 15/16 epoch (loss 1.1797):  93%|█████████▎| 820/880 [1:21:21<06:26,  6.44s/it]
Training 15/16 epoch (loss 1.2578):  93%|█████████▎| 820/880 [1:21:27<06:26,  6.44s/it]
Training 15/16 epoch (loss 1.2578):  93%|█████████▎| 821/880 [1:21:27<06:22,  6.49s/it]
Training 15/16 epoch (loss 0.9766):  93%|█████████▎| 821/880 [1:21:33<06:22,  6.49s/it]
Training 15/16 epoch (loss 0.9766):  93%|█████████▎| 822/880 [1:21:33<05:56,  6.15s/it]
Training 15/16 epoch (loss 1.3672):  93%|█████████▎| 822/880 [1:21:38<05:56,  6.15s/it]
Training 15/16 epoch (loss 1.3672):  94%|█████████▎| 823/880 [1:21:38<05:26,  5.73s/it]
Training 15/16 epoch (loss 1.2656):  94%|█████████▎| 823/880 [1:21:44<05:26,  5.73s/it]
Training 15/16 epoch (loss 1.2656):  94%|█████████▎| 824/880 [1:21:44<05:40,  6.09s/it]
Training 15/16 epoch (loss 1.0312):  94%|█████████▎| 824/880 [1:21:49<05:40,  6.09s/it]
Training 15/16 epoch (loss 1.0312):  94%|█████████▍| 825/880 [1:21:49<05:10,  5.65s/it]
Training 16/16 epoch (loss 1.2422):  94%|█████████▍| 825/880 [1:21:54<05:10,  5.65s/it]
Training 16/16 epoch (loss 1.2422):  94%|█████████▍| 826/880 [1:21:54<04:58,  5.52s/it]
Training 16/16 epoch (loss 1.2578):  94%|█████████▍| 826/880 [1:22:00<04:58,  5.52s/it]
Training 16/16 epoch (loss 1.2578):  94%|█████████▍| 827/880 [1:22:00<05:00,  5.66s/it]
Training 16/16 epoch (loss 1.0859):  94%|█████████▍| 827/880 [1:22:06<05:00,  5.66s/it]
Training 16/16 epoch (loss 1.0859):  94%|█████████▍| 828/880 [1:22:06<04:59,  5.76s/it]
Training 16/16 epoch (loss 1.4062):  94%|█████████▍| 828/880 [1:22:11<04:59,  5.76s/it]
Training 16/16 epoch (loss 1.4062):  94%|█████████▍| 829/880 [1:22:11<04:37,  5.44s/it]
Training 16/16 epoch (loss 1.3203):  94%|█████████▍| 829/880 [1:22:16<04:37,  5.44s/it]
Training 16/16 epoch (loss 1.3203):  94%|█████████▍| 830/880 [1:22:16<04:29,  5.38s/it]
Training 16/16 epoch (loss 1.1094):  94%|█████████▍| 830/880 [1:22:22<04:29,  5.38s/it]
Training 16/16 epoch (loss 1.1094):  94%|█████████▍| 831/880 [1:22:22<04:30,  5.51s/it]
Training 16/16 epoch (loss 1.2188):  94%|█████████▍| 831/880 [1:22:27<04:30,  5.51s/it]
Training 16/16 epoch (loss 1.2188):  95%|█████████▍| 832/880 [1:22:27<04:12,  5.25s/it]
Training 16/16 epoch (loss 1.1875):  95%|█████████▍| 832/880 [1:22:32<04:12,  5.25s/it]
Training 16/16 epoch (loss 1.1875):  95%|█████████▍| 833/880 [1:22:32<04:11,  5.35s/it]
Training 16/16 epoch (loss 1.3594):  95%|█████████▍| 833/880 [1:22:41<04:11,  5.35s/it]
Training 16/16 epoch (loss 1.3594):  95%|█████████▍| 834/880 [1:22:41<04:49,  6.30s/it]
Training 16/16 epoch (loss 1.3359):  95%|█████████▍| 834/880 [1:22:46<04:49,  6.30s/it]
Training 16/16 epoch (loss 1.3359):  95%|█████████▍| 835/880 [1:22:46<04:26,  5.93s/it]
Training 16/16 epoch (loss 1.3438):  95%|█████████▍| 835/880 [1:22:51<04:26,  5.93s/it]
Training 16/16 epoch (loss 1.3438):  95%|█████████▌| 836/880 [1:22:51<04:06,  5.59s/it]
Training 16/16 epoch (loss 1.2734):  95%|█████████▌| 836/880 [1:22:58<04:06,  5.59s/it]
Training 16/16 epoch (loss 1.2734):  95%|█████████▌| 837/880 [1:22:58<04:24,  6.15s/it]
Training 16/16 epoch (loss 1.0781):  95%|█████████▌| 837/880 [1:23:03<04:24,  6.15s/it]
Training 16/16 epoch (loss 1.0781):  95%|█████████▌| 838/880 [1:23:03<04:01,  5.74s/it]
Training 16/16 epoch (loss 1.5078):  95%|█████████▌| 838/880 [1:23:08<04:01,  5.74s/it]
Training 16/16 epoch (loss 1.5078):  95%|█████████▌| 839/880 [1:23:08<03:43,  5.45s/it]
Training 16/16 epoch (loss 1.2188):  95%|█████████▌| 839/880 [1:23:13<03:43,  5.45s/it]
Training 16/16 epoch (loss 1.2188):  95%|█████████▌| 840/880 [1:23:13<03:38,  5.47s/it]
Training 16/16 epoch (loss 1.3047):  95%|█████████▌| 840/880 [1:23:18<03:38,  5.47s/it]
Training 16/16 epoch (loss 1.3047):  96%|█████████▌| 841/880 [1:23:18<03:29,  5.38s/it]
Training 16/16 epoch (loss 1.1953):  96%|█████████▌| 841/880 [1:23:25<03:29,  5.38s/it]
Training 16/16 epoch (loss 1.1953):  96%|█████████▌| 842/880 [1:23:25<03:37,  5.73s/it]
Training 16/16 epoch (loss 1.2109):  96%|█████████▌| 842/880 [1:23:30<03:37,  5.73s/it]
Training 16/16 epoch (loss 1.2109):  96%|█████████▌| 843/880 [1:23:30<03:24,  5.53s/it]
Training 16/16 epoch (loss 1.2656):  96%|█████████▌| 843/880 [1:23:35<03:24,  5.53s/it]
Training 16/16 epoch (loss 1.2656):  96%|█████████▌| 844/880 [1:23:35<03:12,  5.36s/it]
Training 16/16 epoch (loss 0.9375):  96%|█████████▌| 844/880 [1:23:39<03:12,  5.36s/it]
Training 16/16 epoch (loss 0.9375):  96%|█████████▌| 845/880 [1:23:39<02:54,  5.00s/it]
Training 16/16 epoch (loss 1.1094):  96%|█████████▌| 845/880 [1:23:44<02:54,  5.00s/it]
Training 16/16 epoch (loss 1.1094):  96%|█████████▌| 846/880 [1:23:44<02:48,  4.97s/it]
Training 16/16 epoch (loss 1.3750):  96%|█████████▌| 846/880 [1:23:49<02:48,  4.97s/it]
Training 16/16 epoch (loss 1.3750):  96%|█████████▋| 847/880 [1:23:49<02:48,  5.10s/it]
Training 16/16 epoch (loss 1.1797):  96%|█████████▋| 847/880 [1:23:54<02:48,  5.10s/it]
Training 16/16 epoch (loss 1.1797):  96%|█████████▋| 848/880 [1:23:54<02:33,  4.80s/it]
Training 16/16 epoch (loss 1.0547):  96%|█████████▋| 848/880 [1:23:59<02:33,  4.80s/it]
Training 16/16 epoch (loss 1.0547):  96%|█████████▋| 849/880 [1:23:59<02:31,  4.89s/it]
Training 16/16 epoch (loss 1.1641):  96%|█████████▋| 849/880 [1:24:11<02:31,  4.89s/it]
Training 16/16 epoch (loss 1.1641):  97%|█████████▋| 850/880 [1:24:11<03:34,  7.17s/it]
Training 16/16 epoch (loss 0.9688):  97%|█████████▋| 850/880 [1:24:17<03:34,  7.17s/it]
Training 16/16 epoch (loss 0.9688):  97%|█████████▋| 851/880 [1:24:17<03:15,  6.76s/it]
Training 16/16 epoch (loss 1.2109):  97%|█████████▋| 851/880 [1:24:22<03:15,  6.76s/it]
Training 16/16 epoch (loss 1.2109):  97%|█████████▋| 852/880 [1:24:22<02:57,  6.34s/it]
Training 16/16 epoch (loss 0.9141):  97%|█████████▋| 852/880 [1:24:28<02:57,  6.34s/it]
Training 16/16 epoch (loss 0.9141):  97%|█████████▋| 853/880 [1:24:28<02:42,  6.03s/it]
Training 16/16 epoch (loss 1.1016):  97%|█████████▋| 853/880 [1:24:32<02:42,  6.03s/it]
Training 16/16 epoch (loss 1.1016):  97%|█████████▋| 854/880 [1:24:32<02:26,  5.62s/it]
Training 16/16 epoch (loss 1.1250):  97%|█████████▋| 854/880 [1:24:38<02:26,  5.62s/it]
Training 16/16 epoch (loss 1.1250):  97%|█████████▋| 855/880 [1:24:38<02:19,  5.59s/it]
Training 16/16 epoch (loss 1.1719):  97%|█████████▋| 855/880 [1:24:42<02:19,  5.59s/it]
Training 16/16 epoch (loss 1.1719):  97%|█████████▋| 856/880 [1:24:42<02:05,  5.24s/it]
Training 16/16 epoch (loss 0.9922):  97%|█████████▋| 856/880 [1:24:47<02:05,  5.24s/it]
Training 16/16 epoch (loss 0.9922):  97%|█████████▋| 857/880 [1:24:47<01:55,  5.04s/it]
Training 16/16 epoch (loss 1.1797):  97%|█████████▋| 857/880 [1:24:52<01:55,  5.04s/it]
Training 16/16 epoch (loss 1.1797):  98%|█████████▊| 858/880 [1:24:52<01:53,  5.17s/it]
Training 16/16 epoch (loss 1.0078):  98%|█████████▊| 858/880 [1:24:58<01:53,  5.17s/it]
Training 16/16 epoch (loss 1.0078):  98%|█████████▊| 859/880 [1:24:58<01:49,  5.23s/it]
Training 16/16 epoch (loss 1.0234):  98%|█████████▊| 859/880 [1:25:03<01:49,  5.23s/it]
Training 16/16 epoch (loss 1.0234):  98%|█████████▊| 860/880 [1:25:03<01:45,  5.28s/it]
Training 16/16 epoch (loss 1.2031):  98%|█████████▊| 860/880 [1:25:09<01:45,  5.28s/it]
Training 16/16 epoch (loss 1.2031):  98%|█████████▊| 861/880 [1:25:09<01:46,  5.62s/it]
Training 16/16 epoch (loss 1.2344):  98%|█████████▊| 861/880 [1:25:24<01:46,  5.62s/it]
Training 16/16 epoch (loss 1.2344):  98%|█████████▊| 862/880 [1:25:24<02:28,  8.25s/it]
Training 16/16 epoch (loss 1.0781):  98%|█████████▊| 862/880 [1:25:29<02:28,  8.25s/it]
Training 16/16 epoch (loss 1.0781):  98%|█████████▊| 863/880 [1:25:29<02:04,  7.32s/it]
Training 16/16 epoch (loss 1.2656):  98%|█████████▊| 863/880 [1:25:33<02:04,  7.32s/it]
Training 16/16 epoch (loss 1.2656):  98%|█████████▊| 864/880 [1:25:33<01:43,  6.49s/it]
Training 16/16 epoch (loss 1.0859):  98%|█████████▊| 864/880 [1:25:39<01:43,  6.49s/it]
Training 16/16 epoch (loss 1.0859):  98%|█████████▊| 865/880 [1:25:39<01:31,  6.10s/it]
Training 16/16 epoch (loss 1.1016):  98%|█████████▊| 865/880 [1:25:44<01:31,  6.10s/it]
Training 16/16 epoch (loss 1.1016):  98%|█████████▊| 866/880 [1:25:44<01:20,  5.75s/it]
Training 16/16 epoch (loss 1.1953):  98%|█████████▊| 866/880 [1:25:49<01:20,  5.75s/it]
Training 16/16 epoch (loss 1.1953):  99%|█████████▊| 867/880 [1:25:49<01:13,  5.65s/it]
Training 16/16 epoch (loss 1.0469):  99%|█████████▊| 867/880 [1:25:55<01:13,  5.65s/it]
Training 16/16 epoch (loss 1.0469):  99%|█████████▊| 868/880 [1:25:55<01:09,  5.81s/it]
Training 16/16 epoch (loss 1.5469):  99%|█████████▊| 868/880 [1:26:03<01:09,  5.81s/it]
Training 16/16 epoch (loss 1.5469):  99%|█████████▉| 869/880 [1:26:03<01:10,  6.43s/it]
Training 16/16 epoch (loss 1.2266):  99%|█████████▉| 869/880 [1:26:19<01:10,  6.43s/it]
Training 16/16 epoch (loss 1.2266):  99%|█████████▉| 870/880 [1:26:19<01:32,  9.22s/it]
Training 16/16 epoch (loss 1.0391):  99%|█████████▉| 870/880 [1:26:25<01:32,  9.22s/it]
Training 16/16 epoch (loss 1.0391):  99%|█████████▉| 871/880 [1:26:25<01:14,  8.22s/it]
Training 16/16 epoch (loss 0.9922):  99%|█████████▉| 871/880 [1:26:30<01:14,  8.22s/it]
Training 16/16 epoch (loss 0.9922):  99%|█████████▉| 872/880 [1:26:30<00:57,  7.20s/it]
Training 16/16 epoch (loss 1.0859):  99%|█████████▉| 872/880 [1:26:35<00:57,  7.20s/it]
Training 16/16 epoch (loss 1.0859):  99%|█████████▉| 873/880 [1:26:35<00:46,  6.67s/it]
Training 16/16 epoch (loss 1.0469):  99%|█████████▉| 873/880 [1:26:42<00:46,  6.67s/it]
Training 16/16 epoch (loss 1.0469):  99%|█████████▉| 874/880 [1:26:42<00:40,  6.76s/it]
Training 16/16 epoch (loss 1.1719):  99%|█████████▉| 874/880 [1:26:48<00:40,  6.76s/it]
Training 16/16 epoch (loss 1.1719):  99%|█████████▉| 875/880 [1:26:48<00:32,  6.44s/it]
Training 16/16 epoch (loss 1.2500):  99%|█████████▉| 875/880 [1:26:54<00:32,  6.44s/it]
Training 16/16 epoch (loss 1.2500): 100%|█████████▉| 876/880 [1:26:54<00:25,  6.49s/it]
Training 16/16 epoch (loss 0.9648): 100%|█████████▉| 876/880 [1:27:00<00:25,  6.49s/it]
Training 16/16 epoch (loss 0.9648): 100%|█████████▉| 877/880 [1:27:00<00:18,  6.16s/it]
Training 16/16 epoch (loss 1.3594): 100%|█████████▉| 877/880 [1:27:04<00:18,  6.16s/it]
Training 16/16 epoch (loss 1.3594): 100%|█████████▉| 878/880 [1:27:04<00:11,  5.74s/it]
Training 16/16 epoch (loss 1.2500): 100%|█████████▉| 878/880 [1:27:11<00:11,  5.74s/it]
Training 16/16 epoch (loss 1.2500): 100%|█████████▉| 879/880 [1:27:11<00:06,  6.09s/it]
Training 16/16 epoch (loss 1.0234): 100%|█████████▉| 879/880 [1:27:16<00:06,  6.09s/it]
Training 16/16 epoch (loss 1.0234): 100%|██████████| 880/880 [1:27:16<00:00,  5.65s/it]
Training 16/16 epoch (loss 1.0234): 100%|██████████| 880/880 [1:27:16<00:00,  5.95s/it]
/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
wandb: Waiting for W&B process to finish... (success).
wandb: 
wandb: Run history:
wandb: train/epoch ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:  train/loss █▆▅▄▃▃▃▃▃▂▃▂▂▂▂▂▂▂▂▁▂▁▂▂▁▂▁▁▁▁▁▁▁▁▁▂▁▁▁▁
wandb:    train/lr ▄███████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb:  train/step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: 
wandb: Run summary:
wandb: train/epoch 16.0
wandb:  train/loss 1.02344
wandb:    train/lr 0.0
wandb:  train/step 880
wandb: 
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /home/paperspace/safe-rlhf/output/sft/wandb/offline-run-20230725_194014-2rh62cpq
wandb: Find logs at: ./output/sft/wandb/offline-run-20230725_194014-2rh62cpq/logs