2023-10-12 08:01:20,925 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,929 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 08:01:20,929 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,930 MultiCorpus: 5777 train + 722 dev + 723 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 5777 train + 722 dev + 723 test sentences - /root/.flair/datasets/ner_icdar_europeana/nl
2023-10-12 08:01:20,930 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,930 Train:  5777 sentences
2023-10-12 08:01:20,930         (train_with_dev=False, train_with_test=False)
2023-10-12 08:01:20,930 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,930 Training Params:
2023-10-12 08:01:20,931  - learning_rate: "0.00015"
2023-10-12 08:01:20,931  - mini_batch_size: "8"
2023-10-12 08:01:20,931  - max_epochs: "10"
2023-10-12 08:01:20,931  - shuffle: "True"
2023-10-12 08:01:20,931 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,931 Plugins:
2023-10-12 08:01:20,931  - TensorboardLogger
2023-10-12 08:01:20,931  - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 08:01:20,931 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,932 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 08:01:20,932  - metric: "('micro avg', 'f1-score')"
2023-10-12 08:01:20,932 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,932 Computation:
2023-10-12 08:01:20,932  - compute on device: cuda:0
2023-10-12 08:01:20,932  - embedding storage: none
2023-10-12 08:01:20,932 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,932 Model training base path: "hmbench-icdar/nl-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-12 08:01:20,932 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,932 ----------------------------------------------------------------------------------------------------
2023-10-12 08:01:20,933 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 08:02:07,389 epoch 1 - iter 72/723 - loss 2.54132511 - time (sec): 46.45 - samples/sec: 371.47 - lr: 0.000015 - momentum: 0.000000
2023-10-12 08:02:51,483 epoch 1 - iter 144/723 - loss 2.49534387 - time (sec): 90.55 - samples/sec: 380.52 - lr: 0.000030 - momentum: 0.000000
2023-10-12 08:03:34,907 epoch 1 - iter 216/723 - loss 2.33709596 - time (sec): 133.97 - samples/sec: 391.12 - lr: 0.000045 - momentum: 0.000000
2023-10-12 08:04:16,305 epoch 1 - iter 288/723 - loss 2.14306603 - time (sec): 175.37 - samples/sec: 394.95 - lr: 0.000060 - momentum: 0.000000
2023-10-12 08:04:59,362 epoch 1 - iter 360/723 - loss 1.92003995 - time (sec): 218.43 - samples/sec: 399.64 - lr: 0.000074 - momentum: 0.000000
2023-10-12 08:05:39,114 epoch 1 - iter 432/723 - loss 1.70854498 - time (sec): 258.18 - samples/sec: 404.24 - lr: 0.000089 - momentum: 0.000000
2023-10-12 08:06:20,260 epoch 1 - iter 504/723 - loss 1.49829714 - time (sec): 299.32 - samples/sec: 413.09 - lr: 0.000104 - momentum: 0.000000
2023-10-12 08:07:00,496 epoch 1 - iter 576/723 - loss 1.35542792 - time (sec): 339.56 - samples/sec: 411.66 - lr: 0.000119 - momentum: 0.000000
2023-10-12 08:07:40,075 epoch 1 - iter 648/723 - loss 1.22705400 - time (sec): 379.14 - samples/sec: 415.25 - lr: 0.000134 - momentum: 0.000000
2023-10-12 08:08:20,913 epoch 1 - iter 720/723 - loss 1.12090252 - time (sec): 419.98 - samples/sec: 418.02 - lr: 0.000149 - momentum: 0.000000
2023-10-12 08:08:22,176 ----------------------------------------------------------------------------------------------------
2023-10-12 08:08:22,176 EPOCH 1 done: loss 1.1174 - lr: 0.000149
2023-10-12 08:08:42,896 DEV : loss 0.20851875841617584 - f1-score (micro avg) 0.2808
2023-10-12 08:08:42,927 saving best model
2023-10-12 08:08:43,817 ----------------------------------------------------------------------------------------------------
2023-10-12 08:09:22,535 epoch 2 - iter 72/723 - loss 0.16527334 - time (sec): 38.72 - samples/sec: 458.14 - lr: 0.000148 - momentum: 0.000000
2023-10-12 08:10:01,596 epoch 2 - iter 144/723 - loss 0.15920354 - time (sec): 77.78 - samples/sec: 461.01 - lr: 0.000147 - momentum: 0.000000
2023-10-12 08:10:41,275 epoch 2 - iter 216/723 - loss 0.14858254 - time (sec): 117.46 - samples/sec: 457.98 - lr: 0.000145 - momentum: 0.000000
2023-10-12 08:11:22,375 epoch 2 - iter 288/723 - loss 0.14005659 - time (sec): 158.56 - samples/sec: 460.85 - lr: 0.000143 - momentum: 0.000000
2023-10-12 08:12:00,825 epoch 2 - iter 360/723 - loss 0.13472715 - time (sec): 197.01 - samples/sec: 454.09 - lr: 0.000142 - momentum: 0.000000
2023-10-12 08:12:39,735 epoch 2 - iter 432/723 - loss 0.13544588 - time (sec): 235.92 - samples/sec: 449.60 - lr: 0.000140 - momentum: 0.000000
2023-10-12 08:13:20,878 epoch 2 - iter 504/723 - loss 0.13281162 - time (sec): 277.06 - samples/sec: 445.68 - lr: 0.000138 - momentum: 0.000000
2023-10-12 08:14:00,585 epoch 2 - iter 576/723 - loss 0.12972903 - time (sec): 316.77 - samples/sec: 442.83 - lr: 0.000137 - momentum: 0.000000
2023-10-12 08:14:42,263 epoch 2 - iter 648/723 - loss 0.12809968 - time (sec): 358.44 - samples/sec: 441.21 - lr: 0.000135 - momentum: 0.000000
2023-10-12 08:15:23,099 epoch 2 - iter 720/723 - loss 0.12592851 - time (sec): 399.28 - samples/sec: 439.75 - lr: 0.000133 - momentum: 0.000000
2023-10-12 08:15:24,361 ----------------------------------------------------------------------------------------------------
2023-10-12 08:15:24,362 EPOCH 2 done: loss 0.1257 - lr: 0.000133
2023-10-12 08:15:46,454 DEV : loss 0.11041188985109329 - f1-score (micro avg) 0.7636
2023-10-12 08:15:46,485 saving best model
2023-10-12 08:15:49,509 ----------------------------------------------------------------------------------------------------
2023-10-12 08:16:29,070 epoch 3 - iter 72/723 - loss 0.09753042 - time (sec): 39.56 - samples/sec: 443.00 - lr: 0.000132 - momentum: 0.000000
2023-10-12 08:17:08,243 epoch 3 - iter 144/723 - loss 0.09125376 - time (sec): 78.73 - samples/sec: 436.86 - lr: 0.000130 - momentum: 0.000000
2023-10-12 08:17:48,702 epoch 3 - iter 216/723 - loss 0.08108488 - time (sec): 119.19 - samples/sec: 437.59 - lr: 0.000128 - momentum: 0.000000
2023-10-12 08:18:28,176 epoch 3 - iter 288/723 - loss 0.08382117 - time (sec): 158.66 - samples/sec: 429.34 - lr: 0.000127 - momentum: 0.000000
2023-10-12 08:19:08,625 epoch 3 - iter 360/723 - loss 0.08097089 - time (sec): 199.11 - samples/sec: 434.97 - lr: 0.000125 - momentum: 0.000000
2023-10-12 08:19:48,758 epoch 3 - iter 432/723 - loss 0.08001548 - time (sec): 239.25 - samples/sec: 434.92 - lr: 0.000123 - momentum: 0.000000
2023-10-12 08:20:30,650 epoch 3 - iter 504/723 - loss 0.08074185 - time (sec): 281.14 - samples/sec: 435.28 - lr: 0.000122 - momentum: 0.000000
2023-10-12 08:21:11,019 epoch 3 - iter 576/723 - loss 0.07871326 - time (sec): 321.51 - samples/sec: 433.22 - lr: 0.000120 - momentum: 0.000000
2023-10-12 08:21:51,914 epoch 3 - iter 648/723 - loss 0.07725144 - time (sec): 362.40 - samples/sec: 433.28 - lr: 0.000118 - momentum: 0.000000
2023-10-12 08:22:35,378 epoch 3 - iter 720/723 - loss 0.07587084 - time (sec): 405.87 - samples/sec: 432.96 - lr: 0.000117 - momentum: 0.000000
2023-10-12 08:22:36,559 ----------------------------------------------------------------------------------------------------
2023-10-12 08:22:36,559 EPOCH 3 done: loss 0.0760 - lr: 0.000117
2023-10-12 08:22:58,358 DEV : loss 0.07043775916099548 - f1-score (micro avg) 0.8748
2023-10-12 08:22:58,410 saving best model
2023-10-12 08:23:01,442 ----------------------------------------------------------------------------------------------------
2023-10-12 08:23:42,834 epoch 4 - iter 72/723 - loss 0.04801858 - time (sec): 41.39 - samples/sec: 439.25 - lr: 0.000115 - momentum: 0.000000
2023-10-12 08:24:22,770 epoch 4 - iter 144/723 - loss 0.04609664 - time (sec): 81.32 - samples/sec: 436.70 - lr: 0.000113 - momentum: 0.000000
2023-10-12 08:25:02,522 epoch 4 - iter 216/723 - loss 0.05431924 - time (sec): 121.07 - samples/sec: 439.26 - lr: 0.000112 - momentum: 0.000000
2023-10-12 08:25:43,590 epoch 4 - iter 288/723 - loss 0.05298324 - time (sec): 162.14 - samples/sec: 440.08 - lr: 0.000110 - momentum: 0.000000
2023-10-12 08:26:21,973 epoch 4 - iter 360/723 - loss 0.05055245 - time (sec): 200.53 - samples/sec: 439.16 - lr: 0.000108 - momentum: 0.000000
2023-10-12 08:27:00,290 epoch 4 - iter 432/723 - loss 0.05084362 - time (sec): 238.84 - samples/sec: 441.50 - lr: 0.000107 - momentum: 0.000000
2023-10-12 08:27:40,073 epoch 4 - iter 504/723 - loss 0.04984247 - time (sec): 278.63 - samples/sec: 445.33 - lr: 0.000105 - momentum: 0.000000
2023-10-12 08:28:21,188 epoch 4 - iter 576/723 - loss 0.05030850 - time (sec): 319.74 - samples/sec: 441.11 - lr: 0.000103 - momentum: 0.000000
2023-10-12 08:29:00,573 epoch 4 - iter 648/723 - loss 0.05012050 - time (sec): 359.13 - samples/sec: 441.36 - lr: 0.000102 - momentum: 0.000000
2023-10-12 08:29:39,922 epoch 4 - iter 720/723 - loss 0.05042988 - time (sec): 398.47 - samples/sec: 440.78 - lr: 0.000100 - momentum: 0.000000
2023-10-12 08:29:41,107 ----------------------------------------------------------------------------------------------------
2023-10-12 08:29:41,108 EPOCH 4 done: loss 0.0503 - lr: 0.000100
2023-10-12 08:30:01,548 DEV : loss 0.07415352761745453 - f1-score (micro avg) 0.871
2023-10-12 08:30:01,581 ----------------------------------------------------------------------------------------------------
2023-10-12 08:30:40,430 epoch 5 - iter 72/723 - loss 0.02733231 - time (sec): 38.85 - samples/sec: 436.87 - lr: 0.000098 - momentum: 0.000000
2023-10-12 08:31:20,046 epoch 5 - iter 144/723 - loss 0.03866248 - time (sec): 78.46 - samples/sec: 434.61 - lr: 0.000097 - momentum: 0.000000
2023-10-12 08:32:01,229 epoch 5 - iter 216/723 - loss 0.03610868 - time (sec): 119.65 - samples/sec: 436.96 - lr: 0.000095 - momentum: 0.000000
2023-10-12 08:32:40,200 epoch 5 - iter 288/723 - loss 0.03310032 - time (sec): 158.62 - samples/sec: 442.71 - lr: 0.000093 - momentum: 0.000000
2023-10-12 08:33:20,657 epoch 5 - iter 360/723 - loss 0.03469398 - time (sec): 199.07 - samples/sec: 445.25 - lr: 0.000092 - momentum: 0.000000
2023-10-12 08:34:00,064 epoch 5 - iter 432/723 - loss 0.03360565 - time (sec): 238.48 - samples/sec: 444.15 - lr: 0.000090 - momentum: 0.000000
2023-10-12 08:34:39,275 epoch 5 - iter 504/723 - loss 0.03312167 - time (sec): 277.69 - samples/sec: 441.48 - lr: 0.000088 - momentum: 0.000000
2023-10-12 08:35:18,887 epoch 5 - iter 576/723 - loss 0.03433857 - time (sec): 317.30 - samples/sec: 441.66 - lr: 0.000087 - momentum: 0.000000
2023-10-12 08:35:58,667 epoch 5 - iter 648/723 - loss 0.03418627 - time (sec): 357.08 - samples/sec: 439.78 - lr: 0.000085 - momentum: 0.000000
2023-10-12 08:36:40,918 epoch 5 - iter 720/723 - loss 0.03402481 - time (sec): 399.33 - samples/sec: 439.89 - lr: 0.000083 - momentum: 0.000000
2023-10-12 08:36:42,163 ----------------------------------------------------------------------------------------------------
2023-10-12 08:36:42,163 EPOCH 5 done: loss 0.0341 - lr: 0.000083
2023-10-12 08:37:03,913 DEV : loss 0.07782100886106491 - f1-score (micro avg) 0.8638
2023-10-12 08:37:03,946 ----------------------------------------------------------------------------------------------------
2023-10-12 08:37:43,843 epoch 6 - iter 72/723 - loss 0.01464938 - time (sec): 39.89 - samples/sec: 452.14 - lr: 0.000082 - momentum: 0.000000
2023-10-12 08:38:22,116 epoch 6 - iter 144/723 - loss 0.01875237 - time (sec): 78.17 - samples/sec: 435.06 - lr: 0.000080 - momentum: 0.000000
2023-10-12 08:39:02,232 epoch 6 - iter 216/723 - loss 0.02077035 - time (sec): 118.28 - samples/sec: 433.66 - lr: 0.000078 - momentum: 0.000000
2023-10-12 08:39:41,484 epoch 6 - iter 288/723 - loss 0.02049800 - time (sec): 157.54 - samples/sec: 430.16 - lr: 0.000077 - momentum: 0.000000
2023-10-12 08:40:23,494 epoch 6 - iter 360/723 - loss 0.02030551 - time (sec): 199.55 - samples/sec: 431.18 - lr: 0.000075 - momentum: 0.000000
2023-10-12 08:41:04,121 epoch 6 - iter 432/723 - loss 0.02215626 - time (sec): 240.17 - samples/sec: 430.60 - lr: 0.000073 - momentum: 0.000000
2023-10-12 08:41:44,167 epoch 6 - iter 504/723 - loss 0.02096728 - time (sec): 280.22 - samples/sec: 431.63 - lr: 0.000072 - momentum: 0.000000
2023-10-12 08:42:23,101 epoch 6 - iter 576/723 - loss 0.02191746 - time (sec): 319.15 - samples/sec: 434.46 - lr: 0.000070 - momentum: 0.000000
2023-10-12 08:43:02,162 epoch 6 - iter 648/723 - loss 0.02354005 - time (sec): 358.21 - samples/sec: 438.29 - lr: 0.000068 - momentum: 0.000000
2023-10-12 08:43:40,781 epoch 6 - iter 720/723 - loss 0.02607016 - time (sec): 396.83 - samples/sec: 442.42 - lr: 0.000067 - momentum: 0.000000
2023-10-12 08:43:42,005 ----------------------------------------------------------------------------------------------------
2023-10-12 08:43:42,006 EPOCH 6 done: loss 0.0265 - lr: 0.000067
2023-10-12 08:44:02,246 DEV : loss 0.08978129923343658 - f1-score (micro avg) 0.854
2023-10-12 08:44:02,275 ----------------------------------------------------------------------------------------------------
2023-10-12 08:44:40,574 epoch 7 - iter 72/723 - loss 0.01729660 - time (sec): 38.30 - samples/sec: 487.34 - lr: 0.000065 - momentum: 0.000000
2023-10-12 08:45:18,498 epoch 7 - iter 144/723 - loss 0.01451077 - time (sec): 76.22 - samples/sec: 457.95 - lr: 0.000063 - momentum: 0.000000
2023-10-12 08:45:57,439 epoch 7 - iter 216/723 - loss 0.01907941 - time (sec): 115.16 - samples/sec: 460.81 - lr: 0.000062 - momentum: 0.000000
2023-10-12 08:46:35,776 epoch 7 - iter 288/723 - loss 0.01940333 - time (sec): 153.50 - samples/sec: 466.12 - lr: 0.000060 - momentum: 0.000000
2023-10-12 08:47:15,269 epoch 7 - iter 360/723 - loss 0.02120133 - time (sec): 192.99 - samples/sec: 462.97 - lr: 0.000058 - momentum: 0.000000
2023-10-12 08:47:55,556 epoch 7 - iter 432/723 - loss 0.01988510 - time (sec): 233.28 - samples/sec: 453.47 - lr: 0.000057 - momentum: 0.000000
2023-10-12 08:48:35,033 epoch 7 - iter 504/723 - loss 0.02053563 - time (sec): 272.76 - samples/sec: 452.06 - lr: 0.000055 - momentum: 0.000000
2023-10-12 08:49:13,606 epoch 7 - iter 576/723 - loss 0.02179287 - time (sec): 311.33 - samples/sec: 452.47 - lr: 0.000053 - momentum: 0.000000
2023-10-12 08:49:54,431 epoch 7 - iter 648/723 - loss 0.02091802 - time (sec): 352.15 - samples/sec: 448.33 - lr: 0.000052 - momentum: 0.000000
2023-10-12 08:50:35,056 epoch 7 - iter 720/723 - loss 0.02017315 - time (sec): 392.78 - samples/sec: 447.22 - lr: 0.000050 - momentum: 0.000000
2023-10-12 08:50:36,310 ----------------------------------------------------------------------------------------------------
2023-10-12 08:50:36,310 EPOCH 7 done: loss 0.0201 - lr: 0.000050
2023-10-12 08:50:58,422 DEV : loss 0.11653382331132889 - f1-score (micro avg) 0.8511
2023-10-12 08:50:58,458 ----------------------------------------------------------------------------------------------------
2023-10-12 08:51:40,177 epoch 8 - iter 72/723 - loss 0.01619778 - time (sec): 41.72 - samples/sec: 399.14 - lr: 0.000048 - momentum: 0.000000
2023-10-12 08:52:21,636 epoch 8 - iter 144/723 - loss 0.01782970 - time (sec): 83.18 - samples/sec: 414.15 - lr: 0.000047 - momentum: 0.000000
2023-10-12 08:53:02,713 epoch 8 - iter 216/723 - loss 0.01611145 - time (sec): 124.25 - samples/sec: 416.95 - lr: 0.000045 - momentum: 0.000000
2023-10-12 08:53:45,293 epoch 8 - iter 288/723 - loss 0.01547392 - time (sec): 166.83 - samples/sec: 418.76 - lr: 0.000043 - momentum: 0.000000
2023-10-12 08:54:27,936 epoch 8 - iter 360/723 - loss 0.01534668 - time (sec): 209.48 - samples/sec: 420.64 - lr: 0.000042 - momentum: 0.000000
2023-10-12 08:55:07,886 epoch 8 - iter 432/723 - loss 0.01896167 - time (sec): 249.43 - samples/sec: 423.89 - lr: 0.000040 - momentum: 0.000000
2023-10-12 08:55:45,574 epoch 8 - iter 504/723 - loss 0.01782626 - time (sec): 287.11 - samples/sec: 432.18 - lr: 0.000038 - momentum: 0.000000
2023-10-12 08:56:22,951 epoch 8 - iter 576/723 - loss 0.01762016 - time (sec): 324.49 - samples/sec: 438.31 - lr: 0.000037 - momentum: 0.000000
2023-10-12 08:56:59,857 epoch 8 - iter 648/723 - loss 0.01764830 - time (sec): 361.40 - samples/sec: 441.20 - lr: 0.000035 - momentum: 0.000000
2023-10-12 08:57:36,207 epoch 8 - iter 720/723 - loss 0.01780592 - time (sec): 397.75 - samples/sec: 441.77 - lr: 0.000033 - momentum: 0.000000
2023-10-12 08:57:37,268 ----------------------------------------------------------------------------------------------------
2023-10-12 08:57:37,268 EPOCH 8 done: loss 0.0178 - lr: 0.000033
2023-10-12 08:57:57,134 DEV : loss 0.10788174718618393 - f1-score (micro avg) 0.8644
2023-10-12 08:57:57,165 ----------------------------------------------------------------------------------------------------
2023-10-12 08:58:34,389 epoch 9 - iter 72/723 - loss 0.01835794 - time (sec): 37.22 - samples/sec: 462.34 - lr: 0.000032 - momentum: 0.000000
2023-10-12 08:59:12,112 epoch 9 - iter 144/723 - loss 0.01698658 - time (sec): 74.95 - samples/sec: 449.81 - lr: 0.000030 - momentum: 0.000000
2023-10-12 08:59:51,780 epoch 9 - iter 216/723 - loss 0.01309983 - time (sec): 114.61 - samples/sec: 459.12 - lr: 0.000028 - momentum: 0.000000
2023-10-12 09:00:29,513 epoch 9 - iter 288/723 - loss 0.01229615 - time (sec): 152.35 - samples/sec: 459.60 - lr: 0.000027 - momentum: 0.000000
2023-10-12 09:01:06,464 epoch 9 - iter 360/723 - loss 0.01261841 - time (sec): 189.30 - samples/sec: 461.10 - lr: 0.000025 - momentum: 0.000000
2023-10-12 09:01:44,103 epoch 9 - iter 432/723 - loss 0.01234066 - time (sec): 226.94 - samples/sec: 461.97 - lr: 0.000023 - momentum: 0.000000
2023-10-12 09:02:21,553 epoch 9 - iter 504/723 - loss 0.01224423 - time (sec): 264.39 - samples/sec: 463.87 - lr: 0.000022 - momentum: 0.000000
2023-10-12 09:02:58,848 epoch 9 - iter 576/723 - loss 0.01176578 - time (sec): 301.68 - samples/sec: 463.07 - lr: 0.000020 - momentum: 0.000000
2023-10-12 09:03:36,647 epoch 9 - iter 648/723 - loss 0.01323611 - time (sec): 339.48 - samples/sec: 464.50 - lr: 0.000018 - momentum: 0.000000
2023-10-12 09:04:14,638 epoch 9 - iter 720/723 - loss 0.01378374 - time (sec): 377.47 - samples/sec: 464.75 - lr: 0.000017 - momentum: 0.000000
2023-10-12 09:04:15,962 ----------------------------------------------------------------------------------------------------
2023-10-12 09:04:15,962 EPOCH 9 done: loss 0.0137 - lr: 0.000017
2023-10-12 09:04:36,857 DEV : loss 0.11403186619281769 - f1-score (micro avg) 0.8648
2023-10-12 09:04:36,886 ----------------------------------------------------------------------------------------------------
2023-10-12 09:05:15,113 epoch 10 - iter 72/723 - loss 0.00964273 - time (sec): 38.22 - samples/sec: 459.50 - lr: 0.000015 - momentum: 0.000000
2023-10-12 09:05:52,189 epoch 10 - iter 144/723 - loss 0.00804987 - time (sec): 75.30 - samples/sec: 462.18 - lr: 0.000013 - momentum: 0.000000
2023-10-12 09:06:30,158 epoch 10 - iter 216/723 - loss 0.00815867 - time (sec): 113.27 - samples/sec: 467.77 - lr: 0.000012 - momentum: 0.000000
2023-10-12 09:07:07,264 epoch 10 - iter 288/723 - loss 0.00943711 - time (sec): 150.38 - samples/sec: 463.33 - lr: 0.000010 - momentum: 0.000000
2023-10-12 09:07:44,150 epoch 10 - iter 360/723 - loss 0.00912893 - time (sec): 187.26 - samples/sec: 459.36 - lr: 0.000008 - momentum: 0.000000
2023-10-12 09:08:22,630 epoch 10 - iter 432/723 - loss 0.00970888 - time (sec): 225.74 - samples/sec: 461.10 - lr: 0.000007 - momentum: 0.000000
2023-10-12 09:09:02,323 epoch 10 - iter 504/723 - loss 0.00930103 - time (sec): 265.43 - samples/sec: 461.35 - lr: 0.000005 - momentum: 0.000000
2023-10-12 09:09:40,627 epoch 10 - iter 576/723 - loss 0.00981756 - time (sec): 303.74 - samples/sec: 459.74 - lr: 0.000003 - momentum: 0.000000
2023-10-12 09:10:20,091 epoch 10 - iter 648/723 - loss 0.01037803 - time (sec): 343.20 - samples/sec: 458.44 - lr: 0.000002 - momentum: 0.000000
2023-10-12 09:10:59,941 epoch 10 - iter 720/723 - loss 0.01182190 - time (sec): 383.05 - samples/sec: 459.01 - lr: 0.000000 - momentum: 0.000000
2023-10-12 09:11:01,081 ----------------------------------------------------------------------------------------------------
2023-10-12 09:11:01,081 EPOCH 10 done: loss 0.0118 - lr: 0.000000
2023-10-12 09:11:22,387 DEV : loss 0.12052779644727707 - f1-score (micro avg) 0.8641
2023-10-12 09:11:25,037 ----------------------------------------------------------------------------------------------------
2023-10-12 09:11:25,041 Loading model from best epoch ...
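(Editor's note, not part of the log:) A Flair fine-tuning script consistent with the parameters logged above would look roughly as follows. This is a sketch, not the run's actual script: the embedding class, model id, and corpus loader are assumptions inferred from the base path and corpus name in the log (`byt5-small-historic-multilingual-span20-flax`, `NER_ICDAR_EUROPEANA`, pooling "first", last layer only, no CRF), and the exact constructor arguments may differ across Flair versions.

```python
# Hypothetical reconstruction of the training setup logged above.
# Model id, corpus loader, and argument names are assumptions.
from flair.datasets import NER_ICDAR_EUROPEANA
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

corpus = NER_ICDAR_EUROPEANA(language="nl")  # 5777 train / 722 dev / 723 test
label_dict = corpus.make_label_dictionary(label_type="ner")

embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",               # "layers-1" in the base path: last layer only
    subtoken_pooling="first",  # "poolingfirst" in the base path
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,             # "crfFalse" in the base path
    use_rnn=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-icdar/nl-hmbyt5-preliminary/...",  # base path elided
    learning_rate=0.00015,
    mini_batch_size=8,
    max_epochs=10,
)
```

`fine_tune` applies a linear schedule with warmup by default, which matches the logged LR trace: with warmup_fraction 0.1 over 7230 total steps (10 × 723), warmup ends around step 723, so the LR peaks near 0.00015 at the end of epoch 1 (logged 0.000149) and decays linearly to 0.000000 by the end of epoch 10.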
2023-10-12 09:11:30,263 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-12 09:11:51,517 Results:
- F-score (micro) 0.8569
- F-score (macro) 0.6861
- Accuracy 0.7587

By class:
              precision    recall  f1-score   support

         PER     0.8466    0.8817    0.8638       482
         LOC     0.8827    0.9367    0.9089       458
         ORG     0.4828    0.2029    0.2857        69

   micro avg     0.8535    0.8603    0.8569      1009
   macro avg     0.7374    0.6738    0.6861      1009
weighted avg     0.8381    0.8603    0.8447      1009

2023-10-12 09:11:51,518 ----------------------------------------------------------------------------------------------------
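(Editor's note, not part of the log:) The aggregate F-scores in the table above can be re-derived from the per-class rows, which is a useful sanity check on the log: macro F1 is the unweighted mean of per-class F1, weighted F1 is the support-weighted mean, and micro F1 is the harmonic mean of the micro-averaged precision and recall.

```python
# Re-derive the aggregate F-scores from the per-class values logged above.
per_class = {
    # class: (precision, recall, f1, support)
    "PER": (0.8466, 0.8817, 0.8638, 482),
    "LOC": (0.8827, 0.9367, 0.9089, 458),
    "ORG": (0.4828, 0.2029, 0.2857, 69),
}

# Macro F1: unweighted mean of per-class F1.
macro_f1 = sum(f1 for _, _, f1, _ in per_class.values()) / len(per_class)

# Micro F1: harmonic mean of the micro-averaged precision and recall
# reported in the table.
p, r = 0.8535, 0.8603
micro_f1 = 2 * p * r / (p + r)

# Weighted F1: per-class F1 weighted by support.
total = sum(s for _, _, _, s in per_class.values())
weighted_f1 = sum(f1 * s for _, _, f1, s in per_class.values()) / total

print(round(macro_f1, 4), round(micro_f1, 4), round(weighted_f1, 4))
# -> 0.6861 0.8569 0.8447, matching the logged macro, micro, and weighted rows
```

All three recomputed values agree with the table, so the summary lines ("F-score (micro) 0.8569", "F-score (macro) 0.6861") are internally consistent with the per-class breakdown.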