2023-10-11 01:09:15,579 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,581 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
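The printed shapes above are enough to estimate the encoder's weight count by hand. A rough stdlib sketch (it assumes the usual T5 convention that `shared` and `embed_tokens` are tied, so the byte embedding is counted once):

```python
# Parameter-count estimate for the ByT5 encoder above, read off the printed shapes.
d_model, d_ff, d_kv_total = 1472, 3584, 384   # (o) maps 384 -> 1472, so q/k/v project to 384
vocab, n_blocks = 384, 12                     # Embedding(384, 1472); blocks (0) and (1-11)

attn = 4 * d_model * d_kv_total               # q, k, v, o (all bias-free)
ffn = 3 * d_model * d_ff                      # wi_0, wi_1, wo (gated activation)
norms = 2 * d_model                           # one RMSNorm weight vector per sub-layer
block = attn + ffn + norms

encoder = (
    n_blocks * block
    + vocab * d_model                         # shared == embed_tokens (assumed tied)
    + 32 * 6                                  # relative_attention_bias, block 0 only
    + d_model                                 # final_layer_norm
)
head = d_model * 17 + 17                      # the (linear) tagging head, bias=True
print(encoder, head)                          # roughly 217.7M encoder weights, 25k head weights
```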
|
2023-10-11 01:09:15,581 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 01:09:15,582 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 Train:  1166 sentences
2023-10-11 01:09:15,582         (train_with_dev=False, train_with_test=False)
2023-10-11 01:09:15,582 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 Training Params:
2023-10-11 01:09:15,582  - learning_rate: "0.00016"
2023-10-11 01:09:15,582  - mini_batch_size: "4"
2023-10-11 01:09:15,582  - max_epochs: "10"
2023-10-11 01:09:15,582  - shuffle: "True"
2023-10-11 01:09:15,582 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 Plugins:
2023-10-11 01:09:15,582  - TensorboardLogger
2023-10-11 01:09:15,583  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
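The lr column in the epoch logs below is consistent with a linear warmup over the first 10% of batch steps followed by linear decay to zero. A stdlib re-derivation (our own sketch, not Flair's scheduler code; the helper name is hypothetical):

```python
import math

# 1166 train sentences at mini_batch_size 4 -> 292 iterations per epoch (matches "iter .../292")
steps_per_epoch = math.ceil(1166 / 4)
total_steps = steps_per_epoch * 10            # max_epochs 10 -> 2920 batch steps
warmup_steps = int(0.1 * total_steps)         # warmup_fraction 0.1 -> 292, i.e. exactly epoch 1

def linear_lr(step, base_lr=0.00016):
    """Linear warmup to base_lr, then linear decay to 0 (0-indexed batch step)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

# Approximately matches the log, e.g. epoch 1 iter 29 -> ~0.000015, iter 290 -> ~0.000158
print(linear_lr(28), linear_lr(289))
```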
|
2023-10-11 01:09:15,583 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 01:09:15,583  - metric: "('micro avg', 'f1-score')"
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Computation:
2023-10-11 01:09:15,583  - compute on device: cuda:0
2023-10-11 01:09:15,583  - embedding storage: none
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Logging anything other than scalars to TensorBoard is currently not supported.
|
2023-10-11 01:09:24,274 epoch 1 - iter 29/292 - loss 2.82138332 - time (sec): 8.69 - samples/sec: 445.61 - lr: 0.000015 - momentum: 0.000000
2023-10-11 01:09:33,645 epoch 1 - iter 58/292 - loss 2.81043074 - time (sec): 18.06 - samples/sec: 465.46 - lr: 0.000031 - momentum: 0.000000
2023-10-11 01:09:42,777 epoch 1 - iter 87/292 - loss 2.78816206 - time (sec): 27.19 - samples/sec: 461.02 - lr: 0.000047 - momentum: 0.000000
2023-10-11 01:09:51,955 epoch 1 - iter 116/292 - loss 2.72134020 - time (sec): 36.37 - samples/sec: 461.95 - lr: 0.000063 - momentum: 0.000000
2023-10-11 01:10:01,600 epoch 1 - iter 145/292 - loss 2.62129296 - time (sec): 46.01 - samples/sec: 466.00 - lr: 0.000079 - momentum: 0.000000
2023-10-11 01:10:12,265 epoch 1 - iter 174/292 - loss 2.51278156 - time (sec): 56.68 - samples/sec: 470.71 - lr: 0.000095 - momentum: 0.000000
2023-10-11 01:10:22,717 epoch 1 - iter 203/292 - loss 2.39802974 - time (sec): 67.13 - samples/sec: 468.38 - lr: 0.000111 - momentum: 0.000000
2023-10-11 01:10:32,310 epoch 1 - iter 232/292 - loss 2.29971317 - time (sec): 76.72 - samples/sec: 459.07 - lr: 0.000127 - momentum: 0.000000
2023-10-11 01:10:42,367 epoch 1 - iter 261/292 - loss 2.17326226 - time (sec): 86.78 - samples/sec: 455.96 - lr: 0.000142 - momentum: 0.000000
2023-10-11 01:10:53,044 epoch 1 - iter 290/292 - loss 2.04893235 - time (sec): 97.46 - samples/sec: 451.82 - lr: 0.000158 - momentum: 0.000000
2023-10-11 01:10:53,766 ----------------------------------------------------------------------------------------------------
2023-10-11 01:10:53,766 EPOCH 1 done: loss 2.0372 - lr: 0.000158
2023-10-11 01:10:59,447 DEV : loss 0.6696223616600037 - f1-score (micro avg)  0.0
2023-10-11 01:10:59,457 ----------------------------------------------------------------------------------------------------
2023-10-11 01:11:08,593 epoch 2 - iter 29/292 - loss 0.71111290 - time (sec): 9.13 - samples/sec: 431.77 - lr: 0.000158 - momentum: 0.000000
2023-10-11 01:11:18,636 epoch 2 - iter 58/292 - loss 0.67102332 - time (sec): 19.18 - samples/sec: 416.01 - lr: 0.000157 - momentum: 0.000000
2023-10-11 01:11:29,392 epoch 2 - iter 87/292 - loss 0.64741993 - time (sec): 29.93 - samples/sec: 408.28 - lr: 0.000155 - momentum: 0.000000
2023-10-11 01:11:39,986 epoch 2 - iter 116/292 - loss 0.62252721 - time (sec): 40.53 - samples/sec: 410.46 - lr: 0.000153 - momentum: 0.000000
2023-10-11 01:11:50,075 epoch 2 - iter 145/292 - loss 0.57528282 - time (sec): 50.62 - samples/sec: 421.13 - lr: 0.000151 - momentum: 0.000000
2023-10-11 01:12:00,596 epoch 2 - iter 174/292 - loss 0.57623628 - time (sec): 61.14 - samples/sec: 423.72 - lr: 0.000149 - momentum: 0.000000
2023-10-11 01:12:10,194 epoch 2 - iter 203/292 - loss 0.55821018 - time (sec): 70.74 - samples/sec: 425.76 - lr: 0.000148 - momentum: 0.000000
2023-10-11 01:12:19,878 epoch 2 - iter 232/292 - loss 0.53215406 - time (sec): 80.42 - samples/sec: 431.54 - lr: 0.000146 - momentum: 0.000000
2023-10-11 01:12:29,269 epoch 2 - iter 261/292 - loss 0.51244315 - time (sec): 89.81 - samples/sec: 433.02 - lr: 0.000144 - momentum: 0.000000
2023-10-11 01:12:39,746 epoch 2 - iter 290/292 - loss 0.49473626 - time (sec): 100.29 - samples/sec: 439.91 - lr: 0.000142 - momentum: 0.000000
2023-10-11 01:12:40,311 ----------------------------------------------------------------------------------------------------
2023-10-11 01:12:40,312 EPOCH 2 done: loss 0.4934 - lr: 0.000142
2023-10-11 01:12:46,218 DEV : loss 0.28755611181259155 - f1-score (micro avg)  0.2051
2023-10-11 01:12:46,227 saving best model
2023-10-11 01:12:47,301 ----------------------------------------------------------------------------------------------------
2023-10-11 01:12:57,364 epoch 3 - iter 29/292 - loss 0.36695085 - time (sec): 10.06 - samples/sec: 505.45 - lr: 0.000141 - momentum: 0.000000
2023-10-11 01:13:07,801 epoch 3 - iter 58/292 - loss 0.32901598 - time (sec): 20.50 - samples/sec: 504.89 - lr: 0.000139 - momentum: 0.000000
2023-10-11 01:13:17,256 epoch 3 - iter 87/292 - loss 0.36851166 - time (sec): 29.95 - samples/sec: 497.62 - lr: 0.000137 - momentum: 0.000000
2023-10-11 01:13:26,734 epoch 3 - iter 116/292 - loss 0.35011302 - time (sec): 39.43 - samples/sec: 479.28 - lr: 0.000135 - momentum: 0.000000
2023-10-11 01:13:37,161 epoch 3 - iter 145/292 - loss 0.33484524 - time (sec): 49.86 - samples/sec: 482.01 - lr: 0.000133 - momentum: 0.000000
2023-10-11 01:13:46,264 epoch 3 - iter 174/292 - loss 0.32914702 - time (sec): 58.96 - samples/sec: 474.27 - lr: 0.000132 - momentum: 0.000000
2023-10-11 01:13:55,667 epoch 3 - iter 203/292 - loss 0.31846277 - time (sec): 68.36 - samples/sec: 469.10 - lr: 0.000130 - momentum: 0.000000
2023-10-11 01:14:03,997 epoch 3 - iter 232/292 - loss 0.31671133 - time (sec): 76.69 - samples/sec: 463.95 - lr: 0.000128 - momentum: 0.000000
2023-10-11 01:14:12,410 epoch 3 - iter 261/292 - loss 0.31151005 - time (sec): 85.11 - samples/sec: 459.35 - lr: 0.000126 - momentum: 0.000000
2023-10-11 01:14:22,271 epoch 3 - iter 290/292 - loss 0.30167347 - time (sec): 94.97 - samples/sec: 464.61 - lr: 0.000125 - momentum: 0.000000
2023-10-11 01:14:22,839 ----------------------------------------------------------------------------------------------------
2023-10-11 01:14:22,839 EPOCH 3 done: loss 0.3006 - lr: 0.000125
2023-10-11 01:14:28,418 DEV : loss 0.20087367296218872 - f1-score (micro avg)  0.549
2023-10-11 01:14:28,431 saving best model
2023-10-11 01:14:34,733 ----------------------------------------------------------------------------------------------------
2023-10-11 01:14:43,795 epoch 4 - iter 29/292 - loss 0.21083576 - time (sec): 9.06 - samples/sec: 439.73 - lr: 0.000123 - momentum: 0.000000
2023-10-11 01:14:53,684 epoch 4 - iter 58/292 - loss 0.21088872 - time (sec): 18.95 - samples/sec: 469.36 - lr: 0.000121 - momentum: 0.000000
2023-10-11 01:15:02,612 epoch 4 - iter 87/292 - loss 0.20714432 - time (sec): 27.87 - samples/sec: 454.79 - lr: 0.000119 - momentum: 0.000000
2023-10-11 01:15:13,032 epoch 4 - iter 116/292 - loss 0.21322282 - time (sec): 38.29 - samples/sec: 447.09 - lr: 0.000117 - momentum: 0.000000
2023-10-11 01:15:24,001 epoch 4 - iter 145/292 - loss 0.21851823 - time (sec): 49.26 - samples/sec: 452.79 - lr: 0.000116 - momentum: 0.000000
2023-10-11 01:15:33,267 epoch 4 - iter 174/292 - loss 0.21415411 - time (sec): 58.53 - samples/sec: 447.76 - lr: 0.000114 - momentum: 0.000000
2023-10-11 01:15:42,652 epoch 4 - iter 203/292 - loss 0.20685407 - time (sec): 67.91 - samples/sec: 451.13 - lr: 0.000112 - momentum: 0.000000
2023-10-11 01:15:51,860 epoch 4 - iter 232/292 - loss 0.20571818 - time (sec): 77.12 - samples/sec: 454.82 - lr: 0.000110 - momentum: 0.000000
2023-10-11 01:16:01,069 epoch 4 - iter 261/292 - loss 0.20428922 - time (sec): 86.33 - samples/sec: 454.53 - lr: 0.000109 - momentum: 0.000000
2023-10-11 01:16:11,317 epoch 4 - iter 290/292 - loss 0.19531517 - time (sec): 96.58 - samples/sec: 459.32 - lr: 0.000107 - momentum: 0.000000
2023-10-11 01:16:11,698 ----------------------------------------------------------------------------------------------------
2023-10-11 01:16:11,699 EPOCH 4 done: loss 0.1951 - lr: 0.000107
2023-10-11 01:16:17,290 DEV : loss 0.15432208776474 - f1-score (micro avg)  0.7049
2023-10-11 01:16:17,299 saving best model
2023-10-11 01:16:26,781 ----------------------------------------------------------------------------------------------------
2023-10-11 01:16:36,180 epoch 5 - iter 29/292 - loss 0.14839972 - time (sec): 9.39 - samples/sec: 495.53 - lr: 0.000105 - momentum: 0.000000
2023-10-11 01:16:45,355 epoch 5 - iter 58/292 - loss 0.12955448 - time (sec): 18.57 - samples/sec: 482.74 - lr: 0.000103 - momentum: 0.000000
2023-10-11 01:16:54,182 epoch 5 - iter 87/292 - loss 0.14171934 - time (sec): 27.40 - samples/sec: 470.00 - lr: 0.000101 - momentum: 0.000000
2023-10-11 01:17:03,515 epoch 5 - iter 116/292 - loss 0.15582821 - time (sec): 36.73 - samples/sec: 461.41 - lr: 0.000100 - momentum: 0.000000
2023-10-11 01:17:13,187 epoch 5 - iter 145/292 - loss 0.14216304 - time (sec): 46.40 - samples/sec: 465.31 - lr: 0.000098 - momentum: 0.000000
2023-10-11 01:17:23,390 epoch 5 - iter 174/292 - loss 0.13851691 - time (sec): 56.60 - samples/sec: 474.52 - lr: 0.000096 - momentum: 0.000000
2023-10-11 01:17:32,810 epoch 5 - iter 203/292 - loss 0.13535683 - time (sec): 66.02 - samples/sec: 477.46 - lr: 0.000094 - momentum: 0.000000
2023-10-11 01:17:42,124 epoch 5 - iter 232/292 - loss 0.13076921 - time (sec): 75.34 - samples/sec: 476.46 - lr: 0.000093 - momentum: 0.000000
2023-10-11 01:17:51,626 epoch 5 - iter 261/292 - loss 0.12830175 - time (sec): 84.84 - samples/sec: 478.51 - lr: 0.000091 - momentum: 0.000000
2023-10-11 01:18:00,202 epoch 5 - iter 290/292 - loss 0.12627346 - time (sec): 93.42 - samples/sec: 473.61 - lr: 0.000089 - momentum: 0.000000
2023-10-11 01:18:00,666 ----------------------------------------------------------------------------------------------------
2023-10-11 01:18:00,666 EPOCH 5 done: loss 0.1260 - lr: 0.000089
2023-10-11 01:18:06,406 DEV : loss 0.1440184861421585 - f1-score (micro avg)  0.7292
2023-10-11 01:18:06,416 saving best model
2023-10-11 01:18:13,003 ----------------------------------------------------------------------------------------------------
2023-10-11 01:18:22,831 epoch 6 - iter 29/292 - loss 0.07376508 - time (sec): 9.82 - samples/sec: 505.21 - lr: 0.000087 - momentum: 0.000000
2023-10-11 01:18:32,018 epoch 6 - iter 58/292 - loss 0.08073227 - time (sec): 19.01 - samples/sec: 475.57 - lr: 0.000085 - momentum: 0.000000
2023-10-11 01:18:41,172 epoch 6 - iter 87/292 - loss 0.07525013 - time (sec): 28.16 - samples/sec: 466.55 - lr: 0.000084 - momentum: 0.000000
2023-10-11 01:18:50,865 epoch 6 - iter 116/292 - loss 0.07204318 - time (sec): 37.85 - samples/sec: 469.81 - lr: 0.000082 - momentum: 0.000000
2023-10-11 01:18:59,907 epoch 6 - iter 145/292 - loss 0.08487665 - time (sec): 46.90 - samples/sec: 461.45 - lr: 0.000080 - momentum: 0.000000
2023-10-11 01:19:10,685 epoch 6 - iter 174/292 - loss 0.09333786 - time (sec): 57.67 - samples/sec: 475.69 - lr: 0.000078 - momentum: 0.000000
2023-10-11 01:19:20,158 epoch 6 - iter 203/292 - loss 0.09481407 - time (sec): 67.15 - samples/sec: 471.43 - lr: 0.000077 - momentum: 0.000000
2023-10-11 01:19:29,890 epoch 6 - iter 232/292 - loss 0.09180752 - time (sec): 76.88 - samples/sec: 470.53 - lr: 0.000075 - momentum: 0.000000
2023-10-11 01:19:38,917 epoch 6 - iter 261/292 - loss 0.09055575 - time (sec): 85.91 - samples/sec: 467.31 - lr: 0.000073 - momentum: 0.000000
2023-10-11 01:19:48,145 epoch 6 - iter 290/292 - loss 0.08929013 - time (sec): 95.13 - samples/sec: 465.70 - lr: 0.000071 - momentum: 0.000000
2023-10-11 01:19:48,560 ----------------------------------------------------------------------------------------------------
2023-10-11 01:19:48,561 EPOCH 6 done: loss 0.0892 - lr: 0.000071
2023-10-11 01:19:54,239 DEV : loss 0.1254325956106186 - f1-score (micro avg)  0.7407
2023-10-11 01:19:54,249 saving best model
2023-10-11 01:20:02,471 ----------------------------------------------------------------------------------------------------
2023-10-11 01:20:11,914 epoch 7 - iter 29/292 - loss 0.06249801 - time (sec): 9.44 - samples/sec: 504.60 - lr: 0.000069 - momentum: 0.000000
2023-10-11 01:20:21,557 epoch 7 - iter 58/292 - loss 0.06865511 - time (sec): 19.08 - samples/sec: 511.26 - lr: 0.000068 - momentum: 0.000000
2023-10-11 01:20:31,122 epoch 7 - iter 87/292 - loss 0.06644168 - time (sec): 28.65 - samples/sec: 486.54 - lr: 0.000066 - momentum: 0.000000
2023-10-11 01:20:40,308 epoch 7 - iter 116/292 - loss 0.06012363 - time (sec): 37.83 - samples/sec: 478.10 - lr: 0.000064 - momentum: 0.000000
2023-10-11 01:20:49,672 epoch 7 - iter 145/292 - loss 0.06400556 - time (sec): 47.20 - samples/sec: 471.22 - lr: 0.000062 - momentum: 0.000000
2023-10-11 01:20:58,515 epoch 7 - iter 174/292 - loss 0.06634884 - time (sec): 56.04 - samples/sec: 465.54 - lr: 0.000061 - momentum: 0.000000
2023-10-11 01:21:08,284 epoch 7 - iter 203/292 - loss 0.06682537 - time (sec): 65.81 - samples/sec: 467.70 - lr: 0.000059 - momentum: 0.000000
2023-10-11 01:21:17,433 epoch 7 - iter 232/292 - loss 0.06617070 - time (sec): 74.96 - samples/sec: 460.15 - lr: 0.000057 - momentum: 0.000000
2023-10-11 01:21:28,335 epoch 7 - iter 261/292 - loss 0.06842093 - time (sec): 85.86 - samples/sec: 464.98 - lr: 0.000055 - momentum: 0.000000
2023-10-11 01:21:37,946 epoch 7 - iter 290/292 - loss 0.06787746 - time (sec): 95.47 - samples/sec: 462.54 - lr: 0.000054 - momentum: 0.000000
2023-10-11 01:21:38,498 ----------------------------------------------------------------------------------------------------
2023-10-11 01:21:38,498 EPOCH 7 done: loss 0.0676 - lr: 0.000054
2023-10-11 01:21:44,277 DEV : loss 0.12312442809343338 - f1-score (micro avg)  0.7511
2023-10-11 01:21:44,286 saving best model
2023-10-11 01:21:58,909 ----------------------------------------------------------------------------------------------------
2023-10-11 01:22:09,738 epoch 8 - iter 29/292 - loss 0.05496167 - time (sec): 10.82 - samples/sec: 492.68 - lr: 0.000052 - momentum: 0.000000
2023-10-11 01:22:18,926 epoch 8 - iter 58/292 - loss 0.06399886 - time (sec): 20.01 - samples/sec: 464.52 - lr: 0.000050 - momentum: 0.000000
2023-10-11 01:22:27,954 epoch 8 - iter 87/292 - loss 0.06407329 - time (sec): 29.04 - samples/sec: 456.08 - lr: 0.000048 - momentum: 0.000000
2023-10-11 01:22:37,360 epoch 8 - iter 116/292 - loss 0.06153453 - time (sec): 38.45 - samples/sec: 459.18 - lr: 0.000046 - momentum: 0.000000
2023-10-11 01:22:46,996 epoch 8 - iter 145/292 - loss 0.06161687 - time (sec): 48.08 - samples/sec: 462.18 - lr: 0.000045 - momentum: 0.000000
2023-10-11 01:22:56,006 epoch 8 - iter 174/292 - loss 0.06132676 - time (sec): 57.09 - samples/sec: 456.44 - lr: 0.000043 - momentum: 0.000000
2023-10-11 01:23:05,668 epoch 8 - iter 203/292 - loss 0.05657878 - time (sec): 66.75 - samples/sec: 457.15 - lr: 0.000041 - momentum: 0.000000
2023-10-11 01:23:15,049 epoch 8 - iter 232/292 - loss 0.05378747 - time (sec): 76.14 - samples/sec: 454.81 - lr: 0.000039 - momentum: 0.000000
2023-10-11 01:23:25,764 epoch 8 - iter 261/292 - loss 0.05161368 - time (sec): 86.85 - samples/sec: 458.00 - lr: 0.000038 - momentum: 0.000000
2023-10-11 01:23:35,701 epoch 8 - iter 290/292 - loss 0.05375375 - time (sec): 96.79 - samples/sec: 456.04 - lr: 0.000036 - momentum: 0.000000
2023-10-11 01:23:36,310 ----------------------------------------------------------------------------------------------------
2023-10-11 01:23:36,311 EPOCH 8 done: loss 0.0546 - lr: 0.000036
2023-10-11 01:23:41,764 DEV : loss 0.12851200997829437 - f1-score (micro avg)  0.7706
2023-10-11 01:23:41,777 saving best model
2023-10-11 01:23:47,109 ----------------------------------------------------------------------------------------------------
2023-10-11 01:23:57,164 epoch 9 - iter 29/292 - loss 0.05893243 - time (sec): 10.05 - samples/sec: 481.02 - lr: 0.000034 - momentum: 0.000000
2023-10-11 01:24:07,377 epoch 9 - iter 58/292 - loss 0.04499649 - time (sec): 20.26 - samples/sec: 468.75 - lr: 0.000032 - momentum: 0.000000
2023-10-11 01:24:16,535 epoch 9 - iter 87/292 - loss 0.04343655 - time (sec): 29.42 - samples/sec: 457.20 - lr: 0.000030 - momentum: 0.000000
2023-10-11 01:24:27,105 epoch 9 - iter 116/292 - loss 0.04156469 - time (sec): 39.99 - samples/sec: 453.70 - lr: 0.000029 - momentum: 0.000000
2023-10-11 01:24:37,237 epoch 9 - iter 145/292 - loss 0.04445814 - time (sec): 50.12 - samples/sec: 457.66 - lr: 0.000027 - momentum: 0.000000
2023-10-11 01:24:47,093 epoch 9 - iter 174/292 - loss 0.04209638 - time (sec): 59.98 - samples/sec: 456.91 - lr: 0.000025 - momentum: 0.000000
2023-10-11 01:24:56,547 epoch 9 - iter 203/292 - loss 0.04103595 - time (sec): 69.43 - samples/sec: 453.71 - lr: 0.000023 - momentum: 0.000000
2023-10-11 01:25:06,689 epoch 9 - iter 232/292 - loss 0.03963824 - time (sec): 79.57 - samples/sec: 451.66 - lr: 0.000022 - momentum: 0.000000
2023-10-11 01:25:16,772 epoch 9 - iter 261/292 - loss 0.04525124 - time (sec): 89.66 - samples/sec: 448.60 - lr: 0.000020 - momentum: 0.000000
2023-10-11 01:25:26,328 epoch 9 - iter 290/292 - loss 0.04628705 - time (sec): 99.21 - samples/sec: 446.02 - lr: 0.000018 - momentum: 0.000000
2023-10-11 01:25:26,812 ----------------------------------------------------------------------------------------------------
2023-10-11 01:25:26,813 EPOCH 9 done: loss 0.0461 - lr: 0.000018
2023-10-11 01:25:32,378 DEV : loss 0.1242719292640686 - f1-score (micro avg)  0.7554
2023-10-11 01:25:32,387 ----------------------------------------------------------------------------------------------------
2023-10-11 01:25:42,386 epoch 10 - iter 29/292 - loss 0.03981653 - time (sec): 10.00 - samples/sec: 493.34 - lr: 0.000016 - momentum: 0.000000
2023-10-11 01:25:52,278 epoch 10 - iter 58/292 - loss 0.04380214 - time (sec): 19.89 - samples/sec: 478.41 - lr: 0.000014 - momentum: 0.000000
2023-10-11 01:26:02,344 epoch 10 - iter 87/292 - loss 0.04599707 - time (sec): 29.96 - samples/sec: 488.29 - lr: 0.000013 - momentum: 0.000000
2023-10-11 01:26:11,860 epoch 10 - iter 116/292 - loss 0.04295599 - time (sec): 39.47 - samples/sec: 482.26 - lr: 0.000011 - momentum: 0.000000
2023-10-11 01:26:21,296 epoch 10 - iter 145/292 - loss 0.04384992 - time (sec): 48.91 - samples/sec: 480.38 - lr: 0.000009 - momentum: 0.000000
2023-10-11 01:26:30,621 epoch 10 - iter 174/292 - loss 0.04333406 - time (sec): 58.23 - samples/sec: 473.38 - lr: 0.000007 - momentum: 0.000000
2023-10-11 01:26:40,130 epoch 10 - iter 203/292 - loss 0.04254014 - time (sec): 67.74 - samples/sec: 468.29 - lr: 0.000006 - momentum: 0.000000
2023-10-11 01:26:49,781 epoch 10 - iter 232/292 - loss 0.04066173 - time (sec): 77.39 - samples/sec: 467.04 - lr: 0.000004 - momentum: 0.000000
2023-10-11 01:26:58,937 epoch 10 - iter 261/292 - loss 0.04215663 - time (sec): 86.55 - samples/sec: 460.97 - lr: 0.000002 - momentum: 0.000000
2023-10-11 01:27:08,995 epoch 10 - iter 290/292 - loss 0.04144527 - time (sec): 96.61 - samples/sec: 459.04 - lr: 0.000000 - momentum: 0.000000
2023-10-11 01:27:09,389 ----------------------------------------------------------------------------------------------------
2023-10-11 01:27:09,390 EPOCH 10 done: loss 0.0414 - lr: 0.000000
2023-10-11 01:27:15,055 DEV : loss 0.1256779134273529 - f1-score (micro avg)  0.757
2023-10-11 01:27:15,935 ----------------------------------------------------------------------------------------------------
2023-10-11 01:27:15,937 Loading model from best epoch ...
|
2023-10-11 01:27:20,118 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
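The 17 tags follow the BIOES scheme (Single, Begin, Inside, End, plus O) over four entity types. A minimal stdlib decoder for such a tag sequence, for illustration only (not Flair's internal span extraction):

```python
def bioes_to_spans(tags):
    """Decode a BIOES tag sequence into (start, end_exclusive, type) spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":                          # single-token entity
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":                        # entity opens
            start = i
        elif prefix == "E" and start is not None:  # entity closes
            spans.append((start, i + 1, label))
            start = None
        elif prefix != "I":                        # "O" (or malformed tag) resets
            start = None
    return spans

print(bioes_to_spans(["S-LOC", "O", "B-PER", "I-PER", "E-PER"]))
# [(0, 1, 'LOC'), (2, 5, 'PER')]
```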
|
2023-10-11 01:27:32,789
Results:
- F-score (micro) 0.7207
- F-score (macro) 0.6769
- Accuracy 0.5807

By class:
              precision    recall  f1-score   support

         PER     0.8242    0.8218    0.8230       348
         LOC     0.5598    0.7893    0.6550       261
         ORG     0.3800    0.3654    0.3725        52
   HumanProd     0.9000    0.8182    0.8571        22

   micro avg     0.6739    0.7745    0.7207       683
   macro avg     0.6660    0.6987    0.6769       683
weighted avg     0.6918    0.7745    0.7256       683
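The averages can be re-derived from the per-class rows alone: each class's true-positive count is recall × support, and its predicted-span count is TP / precision. A stdlib check of the table's arithmetic (our own re-derivation, not Flair's evaluator):

```python
# (precision, recall, support) per class, copied from the table above
rows = {
    "PER":       (0.8242, 0.8218, 348),
    "LOC":       (0.5598, 0.7893, 261),
    "ORG":       (0.3800, 0.3654, 52),
    "HumanProd": (0.9000, 0.8182, 22),
}

tp = sum(round(r * s) for p, r, s in rows.values())               # true positives: 529
pred = sum(round(round(r * s) / p) for p, r, s in rows.values())  # predicted spans: 785
gold = sum(s for _, _, s in rows.values())                        # gold spans: 683

micro_p, micro_r = tp / pred, tp / gold
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

f1s = [2 * p * r / (p + r) for p, r, _ in rows.values()]
macro_f1 = sum(f1s) / len(f1s)

print(round(micro_p, 4), round(micro_r, 4), round(micro_f1, 4), round(macro_f1, 4))
# -> 0.6739 0.7745 0.7207 0.6769
```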
|
2023-10-11 01:27:32,790 ----------------------------------------------------------------------------------------------------