2023-10-11 01:09:15,579 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,581 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 01:09:15,581 ----------------------------------------------------------------------------------------------------
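[Editor's note] The dump above is a Flair SequenceTagger that feeds byte-level ByT5 encoder embeddings (hidden size 1472, 12 encoder blocks) through locked dropout and a single linear head over 17 tags, with no RNN and no CRF (crfFalse in the base path). A minimal sketch of how a comparable model could be assembled is given below. It is an approximation, not the original training code: it swaps the benchmark-specific ByT5Embeddings wrapper for Flair's stock TransformerWordEmbeddings, the model name is inferred from the training base path, and keyword arguments can differ between Flair versions.

from flair.data import Dictionary
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

# Tag dictionary with the 17 BIOES labels listed at the end of this log.
tag_dictionary = Dictionary(add_unk=False)
for tag in [
    "O",
    "S-LOC", "B-LOC", "E-LOC", "I-LOC",
    "S-PER", "B-PER", "E-PER", "I-PER",
    "S-ORG", "B-ORG", "E-ORG", "I-ORG",
    "S-HumanProd", "B-HumanProd", "E-HumanProd", "I-HumanProd",
]:
    tag_dictionary.add_item(tag)

# Byte-level ByT5 encoder used as word embeddings: last layer only,
# first-subtoken pooling ("poolingfirst" / "layers-1" in the base path).
# Model name inferred from the base path -- treat as an assumption.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Plain linear classification head: no RNN, no CRF, matching the
# locked_dropout -> linear -> CrossEntropyLoss stack in the dump above.
tagger = SequenceTagger(
    hidden_size=256,                # not used by an RNN since use_rnn=False
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_rnn=False,
    use_crf=False,
    reproject_embeddings=False,
)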
2023-10-11 01:09:15,582 MultiCorpus: 1166 train + 165 dev + 415 test sentences
- NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 01:09:15,582 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 Train: 1166 sentences
2023-10-11 01:09:15,582 (train_with_dev=False, train_with_test=False)
2023-10-11 01:09:15,582 ----------------------------------------------------------------------------------------------------
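[Editor's note] The training data is the Finnish newseye subset of the HIPE-2022 shared task, loaded with document separators (the with_doc_seperator path above). A minimal loading sketch follows; the exact keyword arguments of NER_HIPE_2022 vary between Flair releases, so the call below is an assumption rather than the verbatim setup.

from flair.datasets import NER_HIPE_2022

# Finnish newseye split of HIPE-2022, v2.1, with document separators
# (keyword names assumed; check the installed Flair version).
corpus = NER_HIPE_2022(
    dataset_name="newseye",
    language="fi",
    version="v2.1",
    add_document_separator=True,
)

print(corpus)  # expected: 1166 train + 165 dev + 415 test sentences

# Label dictionary for the "ner" layer; recent Flair versions expand it
# to the BIOES scheme inside the tagger.
label_dictionary = corpus.make_label_dictionary(label_type="ner")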
2023-10-11 01:09:15,582 Training Params:
2023-10-11 01:09:15,582 - learning_rate: "0.00016"
2023-10-11 01:09:15,582 - mini_batch_size: "4"
2023-10-11 01:09:15,582 - max_epochs: "10"
2023-10-11 01:09:15,582 - shuffle: "True"
2023-10-11 01:09:15,582 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 Plugins:
2023-10-11 01:09:15,582 - TensorboardLogger
2023-10-11 01:09:15,583 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
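[Editor's note] With the corpus and tagger from the sketches above, the run itself reduces to a ModelTrainer.fine_tune call carrying the parameters listed here; fine_tune defaults to AdamW with a linear warmup schedule, which is consistent with the LinearScheduler plugin and the constant momentum 0.000000 in the iteration lines below. A hedged sketch, not the exact training script:

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

trainer.fine_tune(
    "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3",
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
    shuffle=True,
    warmup_fraction=0.1,  # linear warmup over the first 10% of steps
    # The log also lists a TensorboardLogger plugin; how it is attached
    # (plugins=[...] vs. use_tensorboard) depends on the Flair version.
)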
2023-10-11 01:09:15,583 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 01:09:15,583 - metric: "('micro avg', 'f1-score')"
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Computation:
2023-10-11 01:09:15,583 - compute on device: cuda:0
2023-10-11 01:09:15,583 - embedding storage: none
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 01:09:24,274 epoch 1 - iter 29/292 - loss 2.82138332 - time (sec): 8.69 - samples/sec: 445.61 - lr: 0.000015 - momentum: 0.000000
2023-10-11 01:09:33,645 epoch 1 - iter 58/292 - loss 2.81043074 - time (sec): 18.06 - samples/sec: 465.46 - lr: 0.000031 - momentum: 0.000000
2023-10-11 01:09:42,777 epoch 1 - iter 87/292 - loss 2.78816206 - time (sec): 27.19 - samples/sec: 461.02 - lr: 0.000047 - momentum: 0.000000
2023-10-11 01:09:51,955 epoch 1 - iter 116/292 - loss 2.72134020 - time (sec): 36.37 - samples/sec: 461.95 - lr: 0.000063 - momentum: 0.000000
2023-10-11 01:10:01,600 epoch 1 - iter 145/292 - loss 2.62129296 - time (sec): 46.01 - samples/sec: 466.00 - lr: 0.000079 - momentum: 0.000000
2023-10-11 01:10:12,265 epoch 1 - iter 174/292 - loss 2.51278156 - time (sec): 56.68 - samples/sec: 470.71 - lr: 0.000095 - momentum: 0.000000
2023-10-11 01:10:22,717 epoch 1 - iter 203/292 - loss 2.39802974 - time (sec): 67.13 - samples/sec: 468.38 - lr: 0.000111 - momentum: 0.000000
2023-10-11 01:10:32,310 epoch 1 - iter 232/292 - loss 2.29971317 - time (sec): 76.72 - samples/sec: 459.07 - lr: 0.000127 - momentum: 0.000000
2023-10-11 01:10:42,367 epoch 1 - iter 261/292 - loss 2.17326226 - time (sec): 86.78 - samples/sec: 455.96 - lr: 0.000142 - momentum: 0.000000
2023-10-11 01:10:53,044 epoch 1 - iter 290/292 - loss 2.04893235 - time (sec): 97.46 - samples/sec: 451.82 - lr: 0.000158 - momentum: 0.000000
2023-10-11 01:10:53,766 ----------------------------------------------------------------------------------------------------
2023-10-11 01:10:53,766 EPOCH 1 done: loss 2.0372 - lr: 0.000158
2023-10-11 01:10:59,447 DEV : loss 0.6696223616600037 - f1-score (micro avg) 0.0
2023-10-11 01:10:59,457 ----------------------------------------------------------------------------------------------------
2023-10-11 01:11:08,593 epoch 2 - iter 29/292 - loss 0.71111290 - time (sec): 9.13 - samples/sec: 431.77 - lr: 0.000158 - momentum: 0.000000
2023-10-11 01:11:18,636 epoch 2 - iter 58/292 - loss 0.67102332 - time (sec): 19.18 - samples/sec: 416.01 - lr: 0.000157 - momentum: 0.000000
2023-10-11 01:11:29,392 epoch 2 - iter 87/292 - loss 0.64741993 - time (sec): 29.93 - samples/sec: 408.28 - lr: 0.000155 - momentum: 0.000000
2023-10-11 01:11:39,986 epoch 2 - iter 116/292 - loss 0.62252721 - time (sec): 40.53 - samples/sec: 410.46 - lr: 0.000153 - momentum: 0.000000
2023-10-11 01:11:50,075 epoch 2 - iter 145/292 - loss 0.57528282 - time (sec): 50.62 - samples/sec: 421.13 - lr: 0.000151 - momentum: 0.000000
2023-10-11 01:12:00,596 epoch 2 - iter 174/292 - loss 0.57623628 - time (sec): 61.14 - samples/sec: 423.72 - lr: 0.000149 - momentum: 0.000000
2023-10-11 01:12:10,194 epoch 2 - iter 203/292 - loss 0.55821018 - time (sec): 70.74 - samples/sec: 425.76 - lr: 0.000148 - momentum: 0.000000
2023-10-11 01:12:19,878 epoch 2 - iter 232/292 - loss 0.53215406 - time (sec): 80.42 - samples/sec: 431.54 - lr: 0.000146 - momentum: 0.000000
2023-10-11 01:12:29,269 epoch 2 - iter 261/292 - loss 0.51244315 - time (sec): 89.81 - samples/sec: 433.02 - lr: 0.000144 - momentum: 0.000000
2023-10-11 01:12:39,746 epoch 2 - iter 290/292 - loss 0.49473626 - time (sec): 100.29 - samples/sec: 439.91 - lr: 0.000142 - momentum: 0.000000
2023-10-11 01:12:40,311 ----------------------------------------------------------------------------------------------------
2023-10-11 01:12:40,312 EPOCH 2 done: loss 0.4934 - lr: 0.000142
2023-10-11 01:12:46,218 DEV : loss 0.28755611181259155 - f1-score (micro avg) 0.2051
2023-10-11 01:12:46,227 saving best model
2023-10-11 01:12:47,301 ----------------------------------------------------------------------------------------------------
2023-10-11 01:12:57,364 epoch 3 - iter 29/292 - loss 0.36695085 - time (sec): 10.06 - samples/sec: 505.45 - lr: 0.000141 - momentum: 0.000000
2023-10-11 01:13:07,801 epoch 3 - iter 58/292 - loss 0.32901598 - time (sec): 20.50 - samples/sec: 504.89 - lr: 0.000139 - momentum: 0.000000
2023-10-11 01:13:17,256 epoch 3 - iter 87/292 - loss 0.36851166 - time (sec): 29.95 - samples/sec: 497.62 - lr: 0.000137 - momentum: 0.000000
2023-10-11 01:13:26,734 epoch 3 - iter 116/292 - loss 0.35011302 - time (sec): 39.43 - samples/sec: 479.28 - lr: 0.000135 - momentum: 0.000000
2023-10-11 01:13:37,161 epoch 3 - iter 145/292 - loss 0.33484524 - time (sec): 49.86 - samples/sec: 482.01 - lr: 0.000133 - momentum: 0.000000
2023-10-11 01:13:46,264 epoch 3 - iter 174/292 - loss 0.32914702 - time (sec): 58.96 - samples/sec: 474.27 - lr: 0.000132 - momentum: 0.000000
2023-10-11 01:13:55,667 epoch 3 - iter 203/292 - loss 0.31846277 - time (sec): 68.36 - samples/sec: 469.10 - lr: 0.000130 - momentum: 0.000000
2023-10-11 01:14:03,997 epoch 3 - iter 232/292 - loss 0.31671133 - time (sec): 76.69 - samples/sec: 463.95 - lr: 0.000128 - momentum: 0.000000
2023-10-11 01:14:12,410 epoch 3 - iter 261/292 - loss 0.31151005 - time (sec): 85.11 - samples/sec: 459.35 - lr: 0.000126 - momentum: 0.000000
2023-10-11 01:14:22,271 epoch 3 - iter 290/292 - loss 0.30167347 - time (sec): 94.97 - samples/sec: 464.61 - lr: 0.000125 - momentum: 0.000000
2023-10-11 01:14:22,839 ----------------------------------------------------------------------------------------------------
2023-10-11 01:14:22,839 EPOCH 3 done: loss 0.3006 - lr: 0.000125
2023-10-11 01:14:28,418 DEV : loss 0.20087367296218872 - f1-score (micro avg) 0.549
2023-10-11 01:14:28,431 saving best model
2023-10-11 01:14:34,733 ----------------------------------------------------------------------------------------------------
2023-10-11 01:14:43,795 epoch 4 - iter 29/292 - loss 0.21083576 - time (sec): 9.06 - samples/sec: 439.73 - lr: 0.000123 - momentum: 0.000000
2023-10-11 01:14:53,684 epoch 4 - iter 58/292 - loss 0.21088872 - time (sec): 18.95 - samples/sec: 469.36 - lr: 0.000121 - momentum: 0.000000
2023-10-11 01:15:02,612 epoch 4 - iter 87/292 - loss 0.20714432 - time (sec): 27.87 - samples/sec: 454.79 - lr: 0.000119 - momentum: 0.000000
2023-10-11 01:15:13,032 epoch 4 - iter 116/292 - loss 0.21322282 - time (sec): 38.29 - samples/sec: 447.09 - lr: 0.000117 - momentum: 0.000000
2023-10-11 01:15:24,001 epoch 4 - iter 145/292 - loss 0.21851823 - time (sec): 49.26 - samples/sec: 452.79 - lr: 0.000116 - momentum: 0.000000
2023-10-11 01:15:33,267 epoch 4 - iter 174/292 - loss 0.21415411 - time (sec): 58.53 - samples/sec: 447.76 - lr: 0.000114 - momentum: 0.000000
2023-10-11 01:15:42,652 epoch 4 - iter 203/292 - loss 0.20685407 - time (sec): 67.91 - samples/sec: 451.13 - lr: 0.000112 - momentum: 0.000000
2023-10-11 01:15:51,860 epoch 4 - iter 232/292 - loss 0.20571818 - time (sec): 77.12 - samples/sec: 454.82 - lr: 0.000110 - momentum: 0.000000
2023-10-11 01:16:01,069 epoch 4 - iter 261/292 - loss 0.20428922 - time (sec): 86.33 - samples/sec: 454.53 - lr: 0.000109 - momentum: 0.000000
2023-10-11 01:16:11,317 epoch 4 - iter 290/292 - loss 0.19531517 - time (sec): 96.58 - samples/sec: 459.32 - lr: 0.000107 - momentum: 0.000000
2023-10-11 01:16:11,698 ----------------------------------------------------------------------------------------------------
2023-10-11 01:16:11,699 EPOCH 4 done: loss 0.1951 - lr: 0.000107
2023-10-11 01:16:17,290 DEV : loss 0.15432208776474 - f1-score (micro avg) 0.7049
2023-10-11 01:16:17,299 saving best model
2023-10-11 01:16:26,781 ----------------------------------------------------------------------------------------------------
2023-10-11 01:16:36,180 epoch 5 - iter 29/292 - loss 0.14839972 - time (sec): 9.39 - samples/sec: 495.53 - lr: 0.000105 - momentum: 0.000000
2023-10-11 01:16:45,355 epoch 5 - iter 58/292 - loss 0.12955448 - time (sec): 18.57 - samples/sec: 482.74 - lr: 0.000103 - momentum: 0.000000
2023-10-11 01:16:54,182 epoch 5 - iter 87/292 - loss 0.14171934 - time (sec): 27.40 - samples/sec: 470.00 - lr: 0.000101 - momentum: 0.000000
2023-10-11 01:17:03,515 epoch 5 - iter 116/292 - loss 0.15582821 - time (sec): 36.73 - samples/sec: 461.41 - lr: 0.000100 - momentum: 0.000000
2023-10-11 01:17:13,187 epoch 5 - iter 145/292 - loss 0.14216304 - time (sec): 46.40 - samples/sec: 465.31 - lr: 0.000098 - momentum: 0.000000
2023-10-11 01:17:23,390 epoch 5 - iter 174/292 - loss 0.13851691 - time (sec): 56.60 - samples/sec: 474.52 - lr: 0.000096 - momentum: 0.000000
2023-10-11 01:17:32,810 epoch 5 - iter 203/292 - loss 0.13535683 - time (sec): 66.02 - samples/sec: 477.46 - lr: 0.000094 - momentum: 0.000000
2023-10-11 01:17:42,124 epoch 5 - iter 232/292 - loss 0.13076921 - time (sec): 75.34 - samples/sec: 476.46 - lr: 0.000093 - momentum: 0.000000
2023-10-11 01:17:51,626 epoch 5 - iter 261/292 - loss 0.12830175 - time (sec): 84.84 - samples/sec: 478.51 - lr: 0.000091 - momentum: 0.000000
2023-10-11 01:18:00,202 epoch 5 - iter 290/292 - loss 0.12627346 - time (sec): 93.42 - samples/sec: 473.61 - lr: 0.000089 - momentum: 0.000000
2023-10-11 01:18:00,666 ----------------------------------------------------------------------------------------------------
2023-10-11 01:18:00,666 EPOCH 5 done: loss 0.1260 - lr: 0.000089
2023-10-11 01:18:06,406 DEV : loss 0.1440184861421585 - f1-score (micro avg) 0.7292
2023-10-11 01:18:06,416 saving best model
2023-10-11 01:18:13,003 ----------------------------------------------------------------------------------------------------
2023-10-11 01:18:22,831 epoch 6 - iter 29/292 - loss 0.07376508 - time (sec): 9.82 - samples/sec: 505.21 - lr: 0.000087 - momentum: 0.000000
2023-10-11 01:18:32,018 epoch 6 - iter 58/292 - loss 0.08073227 - time (sec): 19.01 - samples/sec: 475.57 - lr: 0.000085 - momentum: 0.000000
2023-10-11 01:18:41,172 epoch 6 - iter 87/292 - loss 0.07525013 - time (sec): 28.16 - samples/sec: 466.55 - lr: 0.000084 - momentum: 0.000000
2023-10-11 01:18:50,865 epoch 6 - iter 116/292 - loss 0.07204318 - time (sec): 37.85 - samples/sec: 469.81 - lr: 0.000082 - momentum: 0.000000
2023-10-11 01:18:59,907 epoch 6 - iter 145/292 - loss 0.08487665 - time (sec): 46.90 - samples/sec: 461.45 - lr: 0.000080 - momentum: 0.000000
2023-10-11 01:19:10,685 epoch 6 - iter 174/292 - loss 0.09333786 - time (sec): 57.67 - samples/sec: 475.69 - lr: 0.000078 - momentum: 0.000000
2023-10-11 01:19:20,158 epoch 6 - iter 203/292 - loss 0.09481407 - time (sec): 67.15 - samples/sec: 471.43 - lr: 0.000077 - momentum: 0.000000
2023-10-11 01:19:29,890 epoch 6 - iter 232/292 - loss 0.09180752 - time (sec): 76.88 - samples/sec: 470.53 - lr: 0.000075 - momentum: 0.000000
2023-10-11 01:19:38,917 epoch 6 - iter 261/292 - loss 0.09055575 - time (sec): 85.91 - samples/sec: 467.31 - lr: 0.000073 - momentum: 0.000000
2023-10-11 01:19:48,145 epoch 6 - iter 290/292 - loss 0.08929013 - time (sec): 95.13 - samples/sec: 465.70 - lr: 0.000071 - momentum: 0.000000
2023-10-11 01:19:48,560 ----------------------------------------------------------------------------------------------------
2023-10-11 01:19:48,561 EPOCH 6 done: loss 0.0892 - lr: 0.000071
2023-10-11 01:19:54,239 DEV : loss 0.1254325956106186 - f1-score (micro avg) 0.7407
2023-10-11 01:19:54,249 saving best model
2023-10-11 01:20:02,471 ----------------------------------------------------------------------------------------------------
2023-10-11 01:20:11,914 epoch 7 - iter 29/292 - loss 0.06249801 - time (sec): 9.44 - samples/sec: 504.60 - lr: 0.000069 - momentum: 0.000000
2023-10-11 01:20:21,557 epoch 7 - iter 58/292 - loss 0.06865511 - time (sec): 19.08 - samples/sec: 511.26 - lr: 0.000068 - momentum: 0.000000
2023-10-11 01:20:31,122 epoch 7 - iter 87/292 - loss 0.06644168 - time (sec): 28.65 - samples/sec: 486.54 - lr: 0.000066 - momentum: 0.000000
2023-10-11 01:20:40,308 epoch 7 - iter 116/292 - loss 0.06012363 - time (sec): 37.83 - samples/sec: 478.10 - lr: 0.000064 - momentum: 0.000000
2023-10-11 01:20:49,672 epoch 7 - iter 145/292 - loss 0.06400556 - time (sec): 47.20 - samples/sec: 471.22 - lr: 0.000062 - momentum: 0.000000
2023-10-11 01:20:58,515 epoch 7 - iter 174/292 - loss 0.06634884 - time (sec): 56.04 - samples/sec: 465.54 - lr: 0.000061 - momentum: 0.000000
2023-10-11 01:21:08,284 epoch 7 - iter 203/292 - loss 0.06682537 - time (sec): 65.81 - samples/sec: 467.70 - lr: 0.000059 - momentum: 0.000000
2023-10-11 01:21:17,433 epoch 7 - iter 232/292 - loss 0.06617070 - time (sec): 74.96 - samples/sec: 460.15 - lr: 0.000057 - momentum: 0.000000
2023-10-11 01:21:28,335 epoch 7 - iter 261/292 - loss 0.06842093 - time (sec): 85.86 - samples/sec: 464.98 - lr: 0.000055 - momentum: 0.000000
2023-10-11 01:21:37,946 epoch 7 - iter 290/292 - loss 0.06787746 - time (sec): 95.47 - samples/sec: 462.54 - lr: 0.000054 - momentum: 0.000000
2023-10-11 01:21:38,498 ----------------------------------------------------------------------------------------------------
2023-10-11 01:21:38,498 EPOCH 7 done: loss 0.0676 - lr: 0.000054
2023-10-11 01:21:44,277 DEV : loss 0.12312442809343338 - f1-score (micro avg) 0.7511
2023-10-11 01:21:44,286 saving best model
2023-10-11 01:21:58,909 ----------------------------------------------------------------------------------------------------
2023-10-11 01:22:09,738 epoch 8 - iter 29/292 - loss 0.05496167 - time (sec): 10.82 - samples/sec: 492.68 - lr: 0.000052 - momentum: 0.000000
2023-10-11 01:22:18,926 epoch 8 - iter 58/292 - loss 0.06399886 - time (sec): 20.01 - samples/sec: 464.52 - lr: 0.000050 - momentum: 0.000000
2023-10-11 01:22:27,954 epoch 8 - iter 87/292 - loss 0.06407329 - time (sec): 29.04 - samples/sec: 456.08 - lr: 0.000048 - momentum: 0.000000
2023-10-11 01:22:37,360 epoch 8 - iter 116/292 - loss 0.06153453 - time (sec): 38.45 - samples/sec: 459.18 - lr: 0.000046 - momentum: 0.000000
2023-10-11 01:22:46,996 epoch 8 - iter 145/292 - loss 0.06161687 - time (sec): 48.08 - samples/sec: 462.18 - lr: 0.000045 - momentum: 0.000000
2023-10-11 01:22:56,006 epoch 8 - iter 174/292 - loss 0.06132676 - time (sec): 57.09 - samples/sec: 456.44 - lr: 0.000043 - momentum: 0.000000
2023-10-11 01:23:05,668 epoch 8 - iter 203/292 - loss 0.05657878 - time (sec): 66.75 - samples/sec: 457.15 - lr: 0.000041 - momentum: 0.000000
2023-10-11 01:23:15,049 epoch 8 - iter 232/292 - loss 0.05378747 - time (sec): 76.14 - samples/sec: 454.81 - lr: 0.000039 - momentum: 0.000000
2023-10-11 01:23:25,764 epoch 8 - iter 261/292 - loss 0.05161368 - time (sec): 86.85 - samples/sec: 458.00 - lr: 0.000038 - momentum: 0.000000
2023-10-11 01:23:35,701 epoch 8 - iter 290/292 - loss 0.05375375 - time (sec): 96.79 - samples/sec: 456.04 - lr: 0.000036 - momentum: 0.000000
2023-10-11 01:23:36,310 ----------------------------------------------------------------------------------------------------
2023-10-11 01:23:36,311 EPOCH 8 done: loss 0.0546 - lr: 0.000036
2023-10-11 01:23:41,764 DEV : loss 0.12851200997829437 - f1-score (micro avg) 0.7706
2023-10-11 01:23:41,777 saving best model
2023-10-11 01:23:47,109 ----------------------------------------------------------------------------------------------------
2023-10-11 01:23:57,164 epoch 9 - iter 29/292 - loss 0.05893243 - time (sec): 10.05 - samples/sec: 481.02 - lr: 0.000034 - momentum: 0.000000
2023-10-11 01:24:07,377 epoch 9 - iter 58/292 - loss 0.04499649 - time (sec): 20.26 - samples/sec: 468.75 - lr: 0.000032 - momentum: 0.000000
2023-10-11 01:24:16,535 epoch 9 - iter 87/292 - loss 0.04343655 - time (sec): 29.42 - samples/sec: 457.20 - lr: 0.000030 - momentum: 0.000000
2023-10-11 01:24:27,105 epoch 9 - iter 116/292 - loss 0.04156469 - time (sec): 39.99 - samples/sec: 453.70 - lr: 0.000029 - momentum: 0.000000
2023-10-11 01:24:37,237 epoch 9 - iter 145/292 - loss 0.04445814 - time (sec): 50.12 - samples/sec: 457.66 - lr: 0.000027 - momentum: 0.000000
2023-10-11 01:24:47,093 epoch 9 - iter 174/292 - loss 0.04209638 - time (sec): 59.98 - samples/sec: 456.91 - lr: 0.000025 - momentum: 0.000000
2023-10-11 01:24:56,547 epoch 9 - iter 203/292 - loss 0.04103595 - time (sec): 69.43 - samples/sec: 453.71 - lr: 0.000023 - momentum: 0.000000
2023-10-11 01:25:06,689 epoch 9 - iter 232/292 - loss 0.03963824 - time (sec): 79.57 - samples/sec: 451.66 - lr: 0.000022 - momentum: 0.000000
2023-10-11 01:25:16,772 epoch 9 - iter 261/292 - loss 0.04525124 - time (sec): 89.66 - samples/sec: 448.60 - lr: 0.000020 - momentum: 0.000000
2023-10-11 01:25:26,328 epoch 9 - iter 290/292 - loss 0.04628705 - time (sec): 99.21 - samples/sec: 446.02 - lr: 0.000018 - momentum: 0.000000
2023-10-11 01:25:26,812 ----------------------------------------------------------------------------------------------------
2023-10-11 01:25:26,813 EPOCH 9 done: loss 0.0461 - lr: 0.000018
2023-10-11 01:25:32,378 DEV : loss 0.1242719292640686 - f1-score (micro avg) 0.7554
2023-10-11 01:25:32,387 ----------------------------------------------------------------------------------------------------
2023-10-11 01:25:42,386 epoch 10 - iter 29/292 - loss 0.03981653 - time (sec): 10.00 - samples/sec: 493.34 - lr: 0.000016 - momentum: 0.000000
2023-10-11 01:25:52,278 epoch 10 - iter 58/292 - loss 0.04380214 - time (sec): 19.89 - samples/sec: 478.41 - lr: 0.000014 - momentum: 0.000000
2023-10-11 01:26:02,344 epoch 10 - iter 87/292 - loss 0.04599707 - time (sec): 29.96 - samples/sec: 488.29 - lr: 0.000013 - momentum: 0.000000
2023-10-11 01:26:11,860 epoch 10 - iter 116/292 - loss 0.04295599 - time (sec): 39.47 - samples/sec: 482.26 - lr: 0.000011 - momentum: 0.000000
2023-10-11 01:26:21,296 epoch 10 - iter 145/292 - loss 0.04384992 - time (sec): 48.91 - samples/sec: 480.38 - lr: 0.000009 - momentum: 0.000000
2023-10-11 01:26:30,621 epoch 10 - iter 174/292 - loss 0.04333406 - time (sec): 58.23 - samples/sec: 473.38 - lr: 0.000007 - momentum: 0.000000
2023-10-11 01:26:40,130 epoch 10 - iter 203/292 - loss 0.04254014 - time (sec): 67.74 - samples/sec: 468.29 - lr: 0.000006 - momentum: 0.000000
2023-10-11 01:26:49,781 epoch 10 - iter 232/292 - loss 0.04066173 - time (sec): 77.39 - samples/sec: 467.04 - lr: 0.000004 - momentum: 0.000000
2023-10-11 01:26:58,937 epoch 10 - iter 261/292 - loss 0.04215663 - time (sec): 86.55 - samples/sec: 460.97 - lr: 0.000002 - momentum: 0.000000
2023-10-11 01:27:08,995 epoch 10 - iter 290/292 - loss 0.04144527 - time (sec): 96.61 - samples/sec: 459.04 - lr: 0.000000 - momentum: 0.000000
2023-10-11 01:27:09,389 ----------------------------------------------------------------------------------------------------
2023-10-11 01:27:09,390 EPOCH 10 done: loss 0.0414 - lr: 0.000000
2023-10-11 01:27:15,055 DEV : loss 0.1256779134273529 - f1-score (micro avg) 0.757
2023-10-11 01:27:15,935 ----------------------------------------------------------------------------------------------------
2023-10-11 01:27:15,937 Loading model from best epoch ...
2023-10-11 01:27:20,118 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
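[Editor's note] best-model.pt is the checkpoint from the best dev epoch (epoch 8, micro F1 0.7706), and it is what gets reloaded here for the final test evaluation. A small usage sketch with a made-up Finnish sentence, purely for illustration:

from flair.data import Sentence
from flair.models import SequenceTagger

# Reload the checkpoint that scored best on the dev split.
tagger = SequenceTagger.load(
    "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

# Hypothetical example sentence, only to illustrate the prediction call.
sentence = Sentence("Helsingin Sanomat kertoi Mannerheimin vierailusta Tampereella .")
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    label = span.get_label("ner")
    print(span.text, label.value, round(label.score, 4))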
2023-10-11 01:27:32,789
Results:
- F-score (micro) 0.7207
- F-score (macro) 0.6769
- Accuracy 0.5807
By class:
                precision    recall  f1-score   support

         PER       0.8242    0.8218    0.8230       348
         LOC       0.5598    0.7893    0.6550       261
         ORG       0.3800    0.3654    0.3725        52
   HumanProd       0.9000    0.8182    0.8571        22

   micro avg       0.6739    0.7745    0.7207       683
   macro avg       0.6660    0.6987    0.6769       683
weighted avg       0.6918    0.7745    0.7256       683
2023-10-11 01:27:32,790 ----------------------------------------------------------------------------------------------------
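[Editor's note] As a quick sanity check on the table above, the micro-average F-score is simply the harmonic mean of the pooled precision and recall, which reproduces the reported 0.7207:

# Micro-averaged F1 from the pooled precision/recall in the final table.
precision, recall = 0.6739, 0.7745
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.7207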