2023-10-14 20:04:40,573 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,575 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 20:04:40,575 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,575 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
 - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
2023-10-14 20:04:40,575 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,575 Train: 3575 sentences
2023-10-14 20:04:40,575 (train_with_dev=False, train_with_test=False)
2023-10-14 20:04:40,575 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,575 Training Params:
2023-10-14 20:04:40,575  - learning_rate: "0.00016"
2023-10-14 20:04:40,575  - mini_batch_size: "4"
2023-10-14 20:04:40,575  - max_epochs: "10"
2023-10-14 20:04:40,575  - shuffle: "True"
2023-10-14 20:04:40,575 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,575 Plugins:
2023-10-14 20:04:40,575  - TensorboardLogger
2023-10-14 20:04:40,575  - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 20:04:40,575 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,575 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 20:04:40,575  - metric: "('micro avg', 'f1-score')"
2023-10-14 20:04:40,575 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,575 Computation:
2023-10-14 20:04:40,575  - compute on device: cuda:0
2023-10-14 20:04:40,575  - embedding storage: none
2023-10-14 20:04:40,575 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,576 Model training base path: "hmbench-hipe2020/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1"
2023-10-14 20:04:40,576 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,576 ----------------------------------------------------------------------------------------------------
2023-10-14 20:04:40,576 Logging anything other than scalars to TensorBoard is currently not supported.
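Note: the configuration logged above corresponds roughly to the following Flair fine-tuning script. This is a minimal sketch rather than the exact code behind this run: the ByT5Embeddings wrapper in the log comes from the hmBench setup and is approximated here with Flair's TransformerWordEmbeddings, the model identifier is inferred from the training base path, and the NER_HIPE_2022 constructor arguments are assumptions.

from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# HIPE-2022 hipe2020/de split with document separators (argument names are assumptions).
corpus = NER_HIPE_2022(dataset_name="hipe2020", language="de", add_document_separator=True)
label_dict = corpus.make_label_dictionary(label_type="ner")

# Byte-level T5 encoder; the run used a custom ByT5Embeddings class with equivalent settings.
embeddings = TransformerWordEmbeddings(
    "hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear classification head on top of the embeddings (no CRF, no RNN), as in the log.
tagger = SequenceTagger(
    hidden_size=256,  # unused when use_rnn=False
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-hipe2020/...",  # training base path (shortened here)
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
)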
2023-10-14 20:04:57,508 epoch 1 - iter 89/894 - loss 3.04433325 - time (sec): 16.93 - samples/sec: 546.74 - lr: 0.000016 - momentum: 0.000000
2023-10-14 20:05:14,096 epoch 1 - iter 178/894 - loss 3.00697322 - time (sec): 33.52 - samples/sec: 521.10 - lr: 0.000032 - momentum: 0.000000
2023-10-14 20:05:30,548 epoch 1 - iter 267/894 - loss 2.85458665 - time (sec): 49.97 - samples/sec: 514.10 - lr: 0.000048 - momentum: 0.000000
2023-10-14 20:05:47,119 epoch 1 - iter 356/894 - loss 2.63070465 - time (sec): 66.54 - samples/sec: 516.75 - lr: 0.000064 - momentum: 0.000000
2023-10-14 20:06:02,771 epoch 1 - iter 445/894 - loss 2.41357097 - time (sec): 82.19 - samples/sec: 508.03 - lr: 0.000079 - momentum: 0.000000
2023-10-14 20:06:19,218 epoch 1 - iter 534/894 - loss 2.15809760 - time (sec): 98.64 - samples/sec: 507.30 - lr: 0.000095 - momentum: 0.000000
2023-10-14 20:06:36,432 epoch 1 - iter 623/894 - loss 1.90629951 - time (sec): 115.86 - samples/sec: 512.68 - lr: 0.000111 - momentum: 0.000000
2023-10-14 20:06:52,947 epoch 1 - iter 712/894 - loss 1.73655924 - time (sec): 132.37 - samples/sec: 513.88 - lr: 0.000127 - momentum: 0.000000
2023-10-14 20:07:11,799 epoch 1 - iter 801/894 - loss 1.57462985 - time (sec): 151.22 - samples/sec: 516.25 - lr: 0.000143 - momentum: 0.000000
2023-10-14 20:07:28,037 epoch 1 - iter 890/894 - loss 1.46293211 - time (sec): 167.46 - samples/sec: 514.09 - lr: 0.000159 - momentum: 0.000000
2023-10-14 20:07:28,794 ----------------------------------------------------------------------------------------------------
2023-10-14 20:07:28,794 EPOCH 1 done: loss 1.4575 - lr: 0.000159
2023-10-14 20:07:51,443 DEV : loss 0.339751273393631 - f1-score (micro avg) 0.0234
2023-10-14 20:07:51,469 saving best model
2023-10-14 20:07:52,081 ----------------------------------------------------------------------------------------------------
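The lr column follows the LinearScheduler plugin configured above (warmup_fraction 0.1, peak learning_rate 0.00016, 894 batches x 10 epochs): the learning rate ramps up linearly over the first 10% of all batches, i.e. roughly epoch 1, and then decays linearly towards zero. A rough reconstruction of that schedule is sketched below; the exact step accounting inside Flair may differ slightly, so treat the helper as illustrative.

def linear_schedule_lr(step, total_steps=894 * 10, peak_lr=0.00016, warmup_fraction=0.1):
    """Approximate the learning rate reported in the lr column at a given batch step."""
    warmup_steps = int(total_steps * warmup_fraction)  # 894 steps, i.e. about one epoch here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warm-up
    # linear decay from peak_lr down to 0 over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# step 89  -> ~0.000016, matching the first epoch-1 log line above
# step 983 -> ~0.000158, matching epoch 2, iter 89 below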
2023-10-14 20:08:08,569 epoch 2 - iter 89/894 - loss 0.36784261 - time (sec): 16.49 - samples/sec: 517.02 - lr: 0.000158 - momentum: 0.000000
2023-10-14 20:08:25,152 epoch 2 - iter 178/894 - loss 0.35533195 - time (sec): 33.07 - samples/sec: 517.57 - lr: 0.000156 - momentum: 0.000000
2023-10-14 20:08:42,147 epoch 2 - iter 267/894 - loss 0.33472166 - time (sec): 50.06 - samples/sec: 529.77 - lr: 0.000155 - momentum: 0.000000
2023-10-14 20:09:00,581 epoch 2 - iter 356/894 - loss 0.32234839 - time (sec): 68.50 - samples/sec: 525.59 - lr: 0.000153 - momentum: 0.000000
2023-10-14 20:09:17,317 epoch 2 - iter 445/894 - loss 0.30927331 - time (sec): 85.23 - samples/sec: 523.45 - lr: 0.000151 - momentum: 0.000000
2023-10-14 20:09:34,281 epoch 2 - iter 534/894 - loss 0.29646500 - time (sec): 102.20 - samples/sec: 523.10 - lr: 0.000149 - momentum: 0.000000
2023-10-14 20:09:50,684 epoch 2 - iter 623/894 - loss 0.29525988 - time (sec): 118.60 - samples/sec: 518.76 - lr: 0.000148 - momentum: 0.000000
2023-10-14 20:10:07,451 epoch 2 - iter 712/894 - loss 0.29082523 - time (sec): 135.37 - samples/sec: 517.81 - lr: 0.000146 - momentum: 0.000000
2023-10-14 20:10:24,559 epoch 2 - iter 801/894 - loss 0.28139580 - time (sec): 152.48 - samples/sec: 516.17 - lr: 0.000144 - momentum: 0.000000
2023-10-14 20:10:40,763 epoch 2 - iter 890/894 - loss 0.27716837 - time (sec): 168.68 - samples/sec: 511.28 - lr: 0.000142 - momentum: 0.000000
2023-10-14 20:10:41,447 ----------------------------------------------------------------------------------------------------
2023-10-14 20:10:41,448 EPOCH 2 done: loss 0.2772 - lr: 0.000142
2023-10-14 20:11:06,565 DEV : loss 0.19352570176124573 - f1-score (micro avg) 0.6235
2023-10-14 20:11:06,591 saving best model
2023-10-14 20:11:11,217 ----------------------------------------------------------------------------------------------------
2023-10-14 20:11:27,822 epoch 3 - iter 89/894 - loss 0.20345638 - time (sec): 16.60 - samples/sec: 498.08 - lr: 0.000140 - momentum: 0.000000
2023-10-14 20:11:44,065 epoch 3 - iter 178/894 - loss 0.18152700 - time (sec): 32.85 - samples/sec: 500.96 - lr: 0.000139 - momentum: 0.000000
2023-10-14 20:12:01,084 epoch 3 - iter 267/894 - loss 0.17490578 - time (sec): 49.87 - samples/sec: 504.95 - lr: 0.000137 - momentum: 0.000000
2023-10-14 20:12:17,425 epoch 3 - iter 356/894 - loss 0.17202406 - time (sec): 66.21 - samples/sec: 508.41 - lr: 0.000135 - momentum: 0.000000
2023-10-14 20:12:36,005 epoch 3 - iter 445/894 - loss 0.16603411 - time (sec): 84.79 - samples/sec: 517.44 - lr: 0.000133 - momentum: 0.000000
2023-10-14 20:12:52,303 epoch 3 - iter 534/894 - loss 0.16465909 - time (sec): 101.08 - samples/sec: 516.75 - lr: 0.000132 - momentum: 0.000000
2023-10-14 20:13:08,435 epoch 3 - iter 623/894 - loss 0.15551952 - time (sec): 117.22 - samples/sec: 513.59 - lr: 0.000130 - momentum: 0.000000
2023-10-14 20:13:24,463 epoch 3 - iter 712/894 - loss 0.14985602 - time (sec): 133.24 - samples/sec: 512.13 - lr: 0.000128 - momentum: 0.000000
2023-10-14 20:13:41,373 epoch 3 - iter 801/894 - loss 0.14468106 - time (sec): 150.15 - samples/sec: 515.69 - lr: 0.000126 - momentum: 0.000000
2023-10-14 20:13:57,801 epoch 3 - iter 890/894 - loss 0.14031154 - time (sec): 166.58 - samples/sec: 516.59 - lr: 0.000125 - momentum: 0.000000
2023-10-14 20:13:58,586 ----------------------------------------------------------------------------------------------------
2023-10-14 20:13:58,587 EPOCH 3 done: loss 0.1400 - lr: 0.000125
2023-10-14 20:14:23,740 DEV : loss 0.16954360902309418 - f1-score (micro avg) 0.6643
2023-10-14 20:14:23,767 saving best model
2023-10-14 20:14:27,052 ----------------------------------------------------------------------------------------------------
2023-10-14 20:14:43,421 epoch 4 - iter 89/894 - loss 0.08846701 - time (sec): 16.37 - samples/sec: 515.92 - lr: 0.000123 - momentum: 0.000000
2023-10-14 20:14:59,309 epoch 4 - iter 178/894 - loss 0.09164804 - time (sec): 32.26 - samples/sec: 501.01 - lr: 0.000121 - momentum: 0.000000
2023-10-14 20:15:15,593 epoch 4 - iter 267/894 - loss 0.08809052 - time (sec): 48.54 - samples/sec: 501.81 - lr: 0.000119 - momentum: 0.000000
2023-10-14 20:15:32,110 epoch 4 - iter 356/894 - loss 0.09020029 - time (sec): 65.06 - samples/sec: 501.19 - lr: 0.000117 - momentum: 0.000000
2023-10-14 20:15:48,299 epoch 4 - iter 445/894 - loss 0.08499223 - time (sec): 81.24 - samples/sec: 500.34 - lr: 0.000116 - momentum: 0.000000
2023-10-14 20:16:05,170 epoch 4 - iter 534/894 - loss 0.08110597 - time (sec): 98.12 - samples/sec: 506.20 - lr: 0.000114 - momentum: 0.000000
2023-10-14 20:16:21,605 epoch 4 - iter 623/894 - loss 0.07754561 - time (sec): 114.55 - samples/sec: 505.87 - lr: 0.000112 - momentum: 0.000000
2023-10-14 20:16:38,148 epoch 4 - iter 712/894 - loss 0.07756119 - time (sec): 131.09 - samples/sec: 504.72 - lr: 0.000110 - momentum: 0.000000
2023-10-14 20:16:57,069 epoch 4 - iter 801/894 - loss 0.07686532 - time (sec): 150.01 - samples/sec: 508.09 - lr: 0.000109 - momentum: 0.000000
2023-10-14 20:17:15,151 epoch 4 - iter 890/894 - loss 0.07342823 - time (sec): 168.10 - samples/sec: 512.26 - lr: 0.000107 - momentum: 0.000000
2023-10-14 20:17:15,919 ----------------------------------------------------------------------------------------------------
2023-10-14 20:17:15,920 EPOCH 4 done: loss 0.0732 - lr: 0.000107
2023-10-14 20:17:41,184 DEV : loss 0.17608195543289185 - f1-score (micro avg) 0.7396
2023-10-14 20:17:41,211 saving best model
2023-10-14 20:17:41,886 ----------------------------------------------------------------------------------------------------
2023-10-14 20:17:58,085 epoch 5 - iter 89/894 - loss 0.04773510 - time (sec): 16.20 - samples/sec: 480.18 - lr: 0.000105 - momentum: 0.000000
2023-10-14 20:18:14,911 epoch 5 - iter 178/894 - loss 0.04110256 - time (sec): 33.02 - samples/sec: 489.92 - lr: 0.000103 - momentum: 0.000000
2023-10-14 20:18:32,101 epoch 5 - iter 267/894 - loss 0.04134861 - time (sec): 50.21 - samples/sec: 500.72 - lr: 0.000101 - momentum: 0.000000
2023-10-14 20:18:48,790 epoch 5 - iter 356/894 - loss 0.04668182 - time (sec): 66.90 - samples/sec: 503.49 - lr: 0.000100 - momentum: 0.000000
2023-10-14 20:19:05,227 epoch 5 - iter 445/894 - loss 0.04405493 - time (sec): 83.34 - samples/sec: 503.64 - lr: 0.000098 - momentum: 0.000000
2023-10-14 20:19:23,909 epoch 5 - iter 534/894 - loss 0.04580574 - time (sec): 102.02 - samples/sec: 505.88 - lr: 0.000096 - momentum: 0.000000
2023-10-14 20:19:40,166 epoch 5 - iter 623/894 - loss 0.04527345 - time (sec): 118.28 - samples/sec: 505.78 - lr: 0.000094 - momentum: 0.000000
2023-10-14 20:19:56,921 epoch 5 - iter 712/894 - loss 0.04590300 - time (sec): 135.03 - samples/sec: 509.19 - lr: 0.000093 - momentum: 0.000000
2023-10-14 20:20:13,926 epoch 5 - iter 801/894 - loss 0.04752093 - time (sec): 152.04 - samples/sec: 509.41 - lr: 0.000091 - momentum: 0.000000
2023-10-14 20:20:30,503 epoch 5 - iter 890/894 - loss 0.04824072 - time (sec): 168.62 - samples/sec: 510.41 - lr: 0.000089 - momentum: 0.000000
2023-10-14 20:20:31,281 ----------------------------------------------------------------------------------------------------
2023-10-14 20:20:31,281 EPOCH 5 done: loss 0.0486 - lr: 0.000089
2023-10-14 20:20:56,057 DEV : loss 0.20735777914524078 - f1-score (micro avg) 0.7229
2023-10-14 20:20:56,085 ----------------------------------------------------------------------------------------------------
2023-10-14 20:21:12,577 epoch 6 - iter 89/894 - loss 0.01695411 - time (sec): 16.49 - samples/sec: 525.25 - lr: 0.000087 - momentum: 0.000000
2023-10-14 20:21:28,712 epoch 6 - iter 178/894 - loss 0.02446819 - time (sec): 32.63 - samples/sec: 520.04 - lr: 0.000085 - momentum: 0.000000
2023-10-14 20:21:45,112 epoch 6 - iter 267/894 - loss 0.02659230 - time (sec): 49.03 - samples/sec: 518.96 - lr: 0.000084 - momentum: 0.000000
2023-10-14 20:22:01,419 epoch 6 - iter 356/894 - loss 0.02527246 - time (sec): 65.33 - samples/sec: 520.98 - lr: 0.000082 - momentum: 0.000000
2023-10-14 20:22:17,554 epoch 6 - iter 445/894 - loss 0.02492951 - time (sec): 81.47 - samples/sec: 517.28 - lr: 0.000080 - momentum: 0.000000
2023-10-14 20:22:35,796 epoch 6 - iter 534/894 - loss 0.02839357 - time (sec): 99.71 - samples/sec: 519.78 - lr: 0.000078 - momentum: 0.000000
2023-10-14 20:22:52,541 epoch 6 - iter 623/894 - loss 0.02825206 - time (sec): 116.45 - samples/sec: 523.70 - lr: 0.000077 - momentum: 0.000000
2023-10-14 20:23:09,200 epoch 6 - iter 712/894 - loss 0.02825234 - time (sec): 133.11 - samples/sec: 521.37 - lr: 0.000075 - momentum: 0.000000
2023-10-14 20:23:25,181 epoch 6 - iter 801/894 - loss 0.03010817 - time (sec): 149.10 - samples/sec: 518.59 - lr: 0.000073 - momentum: 0.000000
2023-10-14 20:23:41,905 epoch 6 - iter 890/894 - loss 0.03021347 - time (sec): 165.82 - samples/sec: 519.76 - lr: 0.000071 - momentum: 0.000000
2023-10-14 20:23:42,597 ----------------------------------------------------------------------------------------------------
2023-10-14 20:23:42,597 EPOCH 6 done: loss 0.0302 - lr: 0.000071
2023-10-14 20:24:07,543 DEV : loss 0.2202872484922409 - f1-score (micro avg) 0.7455
2023-10-14 20:24:07,569 saving best model
2023-10-14 20:24:11,367 ----------------------------------------------------------------------------------------------------
2023-10-14 20:24:29,869 epoch 7 - iter 89/894 - loss 0.02780046 - time (sec): 18.50 - samples/sec: 527.16 - lr: 0.000069 - momentum: 0.000000
2023-10-14 20:24:46,557 epoch 7 - iter 178/894 - loss 0.02662654 - time (sec): 35.19 - samples/sec: 523.71 - lr: 0.000068 - momentum: 0.000000
2023-10-14 20:25:03,009 epoch 7 - iter 267/894 - loss 0.03213013 - time (sec): 51.64 - samples/sec: 510.98 - lr: 0.000066 - momentum: 0.000000
2023-10-14 20:25:19,405 epoch 7 - iter 356/894 - loss 0.02646254 - time (sec): 68.03 - samples/sec: 511.03 - lr: 0.000064 - momentum: 0.000000
2023-10-14 20:25:36,170 epoch 7 - iter 445/894 - loss 0.02557549 - time (sec): 84.80 - samples/sec: 512.87 - lr: 0.000062 - momentum: 0.000000
2023-10-14 20:25:53,249 epoch 7 - iter 534/894 - loss 0.02364430 - time (sec): 101.88 - samples/sec: 514.82 - lr: 0.000061 - momentum: 0.000000
2023-10-14 20:26:09,746 epoch 7 - iter 623/894 - loss 0.02404203 - time (sec): 118.38 - samples/sec: 513.81 - lr: 0.000059 - momentum: 0.000000
2023-10-14 20:26:26,140 epoch 7 - iter 712/894 - loss 0.02297158 - time (sec): 134.77 - samples/sec: 514.00 - lr: 0.000057 - momentum: 0.000000
2023-10-14 20:26:42,777 epoch 7 - iter 801/894 - loss 0.02189655 - time (sec): 151.41 - samples/sec: 514.26 - lr: 0.000055 - momentum: 0.000000
2023-10-14 20:26:59,291 epoch 7 - iter 890/894 - loss 0.02080018 - time (sec): 167.92 - samples/sec: 513.25 - lr: 0.000053 - momentum: 0.000000
2023-10-14 20:27:00,016 ----------------------------------------------------------------------------------------------------
2023-10-14 20:27:00,016 EPOCH 7 done: loss 0.0208 - lr: 0.000053
2023-10-14 20:27:25,243 DEV : loss 0.24006003141403198 - f1-score (micro avg) 0.7596
2023-10-14 20:27:25,270 saving best model
2023-10-14 20:27:29,580 ----------------------------------------------------------------------------------------------------
2023-10-14 20:27:45,981 epoch 8 - iter 89/894 - loss 0.01819202 - time (sec): 16.40 - samples/sec: 505.35 - lr: 0.000052 - momentum: 0.000000
2023-10-14 20:28:02,806 epoch 8 - iter 178/894 - loss 0.02002471 - time (sec): 33.22 - samples/sec: 508.31 - lr: 0.000050 - momentum: 0.000000
2023-10-14 20:28:18,996 epoch 8 - iter 267/894 - loss 0.01725648 - time (sec): 49.41 - samples/sec: 504.36 - lr: 0.000048 - momentum: 0.000000
2023-10-14 20:28:36,157 epoch 8 - iter 356/894 - loss 0.01616480 - time (sec): 66.57 - samples/sec: 518.25 - lr: 0.000046 - momentum: 0.000000
2023-10-14 20:28:53,072 epoch 8 - iter 445/894 - loss 0.01677385 - time (sec): 83.49 - samples/sec: 523.17 - lr: 0.000045 - momentum: 0.000000
2023-10-14 20:29:09,355 epoch 8 - iter 534/894 - loss 0.01546777 - time (sec): 99.77 - samples/sec: 518.34 - lr: 0.000043 - momentum: 0.000000
2023-10-14 20:29:27,470 epoch 8 - iter 623/894 - loss 0.01582854 - time (sec): 117.89 - samples/sec: 514.01 - lr: 0.000041 - momentum: 0.000000
2023-10-14 20:29:44,191 epoch 8 - iter 712/894 - loss 0.01615447 - time (sec): 134.61 - samples/sec: 513.46 - lr: 0.000039 - momentum: 0.000000
2023-10-14 20:30:00,775 epoch 8 - iter 801/894 - loss 0.01593930 - time (sec): 151.19 - samples/sec: 511.91 - lr: 0.000038 - momentum: 0.000000
2023-10-14 20:30:17,616 epoch 8 - iter 890/894 - loss 0.01516070 - time (sec): 168.03 - samples/sec: 513.74 - lr: 0.000036 - momentum: 0.000000
2023-10-14 20:30:18,252 ----------------------------------------------------------------------------------------------------
2023-10-14 20:30:18,252 EPOCH 8 done: loss 0.0151 - lr: 0.000036
2023-10-14 20:30:43,109 DEV : loss 0.23652133345603943 - f1-score (micro avg) 0.7519
2023-10-14 20:30:43,135 ----------------------------------------------------------------------------------------------------
2023-10-14 20:31:01,634 epoch 9 - iter 89/894 - loss 0.01814129 - time (sec): 18.50 - samples/sec: 529.30 - lr: 0.000034 - momentum: 0.000000
2023-10-14 20:31:18,702 epoch 9 - iter 178/894 - loss 0.01179305 - time (sec): 35.57 - samples/sec: 533.30 - lr: 0.000032 - momentum: 0.000000
2023-10-14 20:31:35,620 epoch 9 - iter 267/894 - loss 0.01076027 - time (sec): 52.48 - samples/sec: 527.10 - lr: 0.000030 - momentum: 0.000000
2023-10-14 20:31:52,398 epoch 9 - iter 356/894 - loss 0.00961319 - time (sec): 69.26 - samples/sec: 528.25 - lr: 0.000029 - momentum: 0.000000
2023-10-14 20:32:08,429 epoch 9 - iter 445/894 - loss 0.01201137 - time (sec): 85.29 - samples/sec: 520.40 - lr: 0.000027 - momentum: 0.000000
2023-10-14 20:32:24,799 epoch 9 - iter 534/894 - loss 0.01096827 - time (sec): 101.66 - samples/sec: 517.67 - lr: 0.000025 - momentum: 0.000000
2023-10-14 20:32:41,012 epoch 9 - iter 623/894 - loss 0.01012014 - time (sec): 117.87 - samples/sec: 513.10 - lr: 0.000023 - momentum: 0.000000
2023-10-14 20:32:57,699 epoch 9 - iter 712/894 - loss 0.01039388 - time (sec): 134.56 - samples/sec: 513.70 - lr: 0.000022 - momentum: 0.000000
2023-10-14 20:33:14,257 epoch 9 - iter 801/894 - loss 0.01006521 - time (sec): 151.12 - samples/sec: 512.95 - lr: 0.000020 - momentum: 0.000000
2023-10-14 20:33:31,112 epoch 9 - iter 890/894 - loss 0.01011928 - time (sec): 167.98 - samples/sec: 513.55 - lr: 0.000018 - momentum: 0.000000
2023-10-14 20:33:31,781 ----------------------------------------------------------------------------------------------------
2023-10-14 20:33:31,781 EPOCH 9 done: loss 0.0101 - lr: 0.000018
2023-10-14 20:33:57,240 DEV : loss 0.25627970695495605 - f1-score (micro avg) 0.7519
2023-10-14 20:33:57,266 ----------------------------------------------------------------------------------------------------
2023-10-14 20:34:14,039 epoch 10 - iter 89/894 - loss 0.01200696 - time (sec): 16.77 - samples/sec: 526.50 - lr: 0.000016 - momentum: 0.000000
2023-10-14 20:34:30,197 epoch 10 - iter 178/894 - loss 0.00857499 - time (sec): 32.93 - samples/sec: 504.69 - lr: 0.000014 - momentum: 0.000000
2023-10-14 20:34:46,697 epoch 10 - iter 267/894 - loss 0.00703922 - time (sec): 49.43 - samples/sec: 506.28 - lr: 0.000013 - momentum: 0.000000
2023-10-14 20:35:04,040 epoch 10 - iter 356/894 - loss 0.00654044 - time (sec): 66.77 - samples/sec: 513.17 - lr: 0.000011 - momentum: 0.000000
2023-10-14 20:35:22,595 epoch 10 - iter 445/894 - loss 0.00710455 - time (sec): 85.33 - samples/sec: 517.56 - lr: 0.000009 - momentum: 0.000000
2023-10-14 20:35:39,208 epoch 10 - iter 534/894 - loss 0.00709956 - time (sec): 101.94 - samples/sec: 517.20 - lr: 0.000007 - momentum: 0.000000
2023-10-14 20:35:55,393 epoch 10 - iter 623/894 - loss 0.00690966 - time (sec): 118.13 - samples/sec: 512.29 - lr: 0.000006 - momentum: 0.000000
2023-10-14 20:36:11,258 epoch 10 - iter 712/894 - loss 0.00715382 - time (sec): 133.99 - samples/sec: 509.56 - lr: 0.000004 - momentum: 0.000000
2023-10-14 20:36:28,507 epoch 10 - iter 801/894 - loss 0.00646675 - time (sec): 151.24 - samples/sec: 513.66 - lr: 0.000002 - momentum: 0.000000
2023-10-14 20:36:44,895 epoch 10 - iter 890/894 - loss 0.00727492 - time (sec): 167.63 - samples/sec: 514.63 - lr: 0.000000 - momentum: 0.000000
2023-10-14 20:36:45,547 ----------------------------------------------------------------------------------------------------
2023-10-14 20:36:45,548 EPOCH 10 done: loss 0.0073 - lr: 0.000000
2023-10-14 20:37:10,711 DEV : loss 0.26143890619277954 - f1-score (micro avg) 0.75
2023-10-14 20:37:11,338 ----------------------------------------------------------------------------------------------------
2023-10-14 20:37:11,339 Loading model from best epoch ...
2023-10-14 20:37:13,530 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
2023-10-14 20:37:35,355 Results:
- F-score (micro) 0.759
- F-score (macro) 0.6755
- Accuracy 0.6254

By class:
              precision    recall  f1-score   support

         loc     0.8396    0.8607    0.8500       596
        pers     0.6815    0.7838    0.7291       333
         org     0.5397    0.5152    0.5271       132
        prod     0.6140    0.5303    0.5691        66
        time     0.7333    0.6735    0.7021        49

   micro avg     0.7447    0.7738    0.7590      1176
   macro avg     0.6816    0.6727    0.6755      1176
weighted avg     0.7441    0.7738    0.7576      1176

2023-10-14 20:37:35,355 ----------------------------------------------------------------------------------------------------
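Note: the best checkpoint from this run (dev micro-F1 0.7596 after epoch 7) is stored as best-model.pt under the training base path logged above and can be loaded back with Flair's standard API. A minimal usage sketch follows; the example sentence is hypothetical and the "ner" label type is an assumption based on the tag dictionary printed above.

from flair.data import Sentence
from flair.models import SequenceTagger

# Load the checkpoint saved as best-model.pt under the training base path from the log.
tagger = SequenceTagger.load(
    "hmbench-hipe2020/de-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-"
    "poolingfirst-layers-1-crfFalse-1/best-model.pt"
)

# Hypothetical historic German sentence; any text can be tagged.
sentence = Sentence("Der Gemeinderath von Zürich versammelte sich am Montag .")
tagger.predict(sentence)

# Print the predicted loc / pers / org / prod / time spans.
for span in sentence.get_spans("ner"):
    print(span)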