2023-10-11 11:23:41,250 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,253 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 11:23:41,253 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,253 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
- NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 11:23:41,253 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,253 Train: 7142 sentences
2023-10-11 11:23:41,253 (train_with_dev=False, train_with_test=False)
2023-10-11 11:23:41,253 ----------------------------------------------------------------------------------------------------
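The model dump and corpus summary above come from fine-tuning flair's SequenceTagger (a linear tag head with locked dropout on top of a byte-level T5 encoder) on the French NewsEye split of HIPE-2022. The following is a minimal sketch of how an equivalent setup can be assembled with flair's public API; it is not the original training script. The Hub model id and the pooling/layer/CRF settings are read off the training base path logged further below ("...poolingfirst-layers-1-crfFalse..."), the NER_HIPE_2022 constructor arguments are assumptions, and TransformerWordEmbeddings stands in for the ByT5Embeddings wrapper named in the dump.

```python
# Minimal setup sketch (assumptions noted inline); not the original training script.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

# Assumed constructor arguments for the NewsEye French split summarised above.
corpus = NER_HIPE_2022(dataset_name="newseye", language="fr")
label_dict = corpus.make_label_dictionary(label_type="ner")  # BIOES tags over PER/LOC/ORG/HumanProd

# Stand-in for the ByT5Embeddings wrapper in the model dump; hub id inferred from the base path.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",                # final encoder layer only ("layers-1" in the base path)
    subtoken_pooling="first",   # "poolingfirst" in the base path
    fine_tune=True,             # gradients flow into the ByT5 encoder
)

tagger = SequenceTagger(
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,              # "crfFalse" in the base path
    use_rnn=False,              # plain Linear(1472 -> 17) head, as in the dump above
    reproject_embeddings=False,
)
```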
2023-10-11 11:23:41,253 Training Params:
2023-10-11 11:23:41,253 - learning_rate: "0.00015"
2023-10-11 11:23:41,253 - mini_batch_size: "4"
2023-10-11 11:23:41,254 - max_epochs: "10"
2023-10-11 11:23:41,254 - shuffle: "True"
2023-10-11 11:23:41,254 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,254 Plugins:
2023-10-11 11:23:41,254 - TensorboardLogger
2023-10-11 11:23:41,254 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 11:23:41,254 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,254 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 11:23:41,254 - metric: "('micro avg', 'f1-score')"
2023-10-11 11:23:41,254 ----------------------------------------------------------------------------------------------------
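Putting the logged hyper-parameters and plugins together, a hedged sketch of the corresponding fine-tuning call is shown below, continuing from the setup sketch earlier in this log. flair's ModelTrainer.fine_tune applies AdamW with a linear learning-rate schedule; the warmup_fraction and use_final_model_for_eval keyword arguments are assumptions chosen to mirror the "LinearScheduler | warmup_fraction: '0.1'" plugin and the best-epoch evaluation noted above, and the TensorboardLogger plugin wiring is omitted.

```python
# Fine-tuning sketch matching the logged parameters; kwargs marked "assumed" may differ
# from how the original run was configured.
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)  # tagger / corpus from the setup sketch above

trainer.fine_tune(
    "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3",
    learning_rate=0.00015,                             # as logged
    mini_batch_size=4,                                 # as logged
    max_epochs=10,                                     # as logged
    shuffle=True,                                      # as logged (flair default)
    warmup_fraction=0.1,                               # assumed kwarg for the linear warm-up
    use_final_model_for_eval=False,                    # assumed: final scores use best-model.pt
    main_evaluation_metric=("micro avg", "f1-score"),  # as logged
)
```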
2023-10-11 11:23:41,254 Computation:
2023-10-11 11:23:41,254 - compute on device: cuda:0
2023-10-11 11:23:41,254 - embedding storage: none
2023-10-11 11:23:41,254 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,254 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-11 11:23:41,255 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,255 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,255 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 11:24:36,735 epoch 1 - iter 178/1786 - loss 2.81334662 - time (sec): 55.48 - samples/sec: 485.36 - lr: 0.000015 - momentum: 0.000000
2023-10-11 11:25:30,124 epoch 1 - iter 356/1786 - loss 2.65360553 - time (sec): 108.87 - samples/sec: 465.62 - lr: 0.000030 - momentum: 0.000000
2023-10-11 11:26:25,121 epoch 1 - iter 534/1786 - loss 2.38474398 - time (sec): 163.86 - samples/sec: 454.74 - lr: 0.000045 - momentum: 0.000000
2023-10-11 11:27:20,005 epoch 1 - iter 712/1786 - loss 2.09942018 - time (sec): 218.75 - samples/sec: 451.36 - lr: 0.000060 - momentum: 0.000000
2023-10-11 11:28:22,609 epoch 1 - iter 890/1786 - loss 1.82583585 - time (sec): 281.35 - samples/sec: 443.83 - lr: 0.000075 - momentum: 0.000000
2023-10-11 11:29:19,386 epoch 1 - iter 1068/1786 - loss 1.62155219 - time (sec): 338.13 - samples/sec: 439.95 - lr: 0.000090 - momentum: 0.000000
2023-10-11 11:30:14,335 epoch 1 - iter 1246/1786 - loss 1.44125903 - time (sec): 393.08 - samples/sec: 443.60 - lr: 0.000105 - momentum: 0.000000
2023-10-11 11:31:09,952 epoch 1 - iter 1424/1786 - loss 1.30760460 - time (sec): 448.70 - samples/sec: 443.19 - lr: 0.000120 - momentum: 0.000000
2023-10-11 11:32:05,906 epoch 1 - iter 1602/1786 - loss 1.19189571 - time (sec): 504.65 - samples/sec: 443.96 - lr: 0.000134 - momentum: 0.000000
2023-10-11 11:33:03,148 epoch 1 - iter 1780/1786 - loss 1.10257673 - time (sec): 561.89 - samples/sec: 441.25 - lr: 0.000149 - momentum: 0.000000
2023-10-11 11:33:04,915 ----------------------------------------------------------------------------------------------------
2023-10-11 11:33:04,916 EPOCH 1 done: loss 1.0997 - lr: 0.000149
2023-10-11 11:33:25,813 DEV : loss 0.17743352055549622 - f1-score (micro avg) 0.6248
2023-10-11 11:33:25,849 saving best model
2023-10-11 11:33:26,769 ----------------------------------------------------------------------------------------------------
2023-10-11 11:34:26,264 epoch 2 - iter 178/1786 - loss 0.16761718 - time (sec): 59.49 - samples/sec: 441.01 - lr: 0.000148 - momentum: 0.000000
2023-10-11 11:35:22,903 epoch 2 - iter 356/1786 - loss 0.17182138 - time (sec): 116.13 - samples/sec: 443.29 - lr: 0.000147 - momentum: 0.000000
2023-10-11 11:36:17,446 epoch 2 - iter 534/1786 - loss 0.16152548 - time (sec): 170.67 - samples/sec: 441.52 - lr: 0.000145 - momentum: 0.000000
2023-10-11 11:37:12,824 epoch 2 - iter 712/1786 - loss 0.14888668 - time (sec): 226.05 - samples/sec: 445.71 - lr: 0.000143 - momentum: 0.000000
2023-10-11 11:38:04,419 epoch 2 - iter 890/1786 - loss 0.14407949 - time (sec): 277.65 - samples/sec: 448.14 - lr: 0.000142 - momentum: 0.000000
2023-10-11 11:38:58,004 epoch 2 - iter 1068/1786 - loss 0.14005413 - time (sec): 331.23 - samples/sec: 453.55 - lr: 0.000140 - momentum: 0.000000
2023-10-11 11:39:50,454 epoch 2 - iter 1246/1786 - loss 0.13811135 - time (sec): 383.68 - samples/sec: 455.24 - lr: 0.000138 - momentum: 0.000000
2023-10-11 11:40:45,636 epoch 2 - iter 1424/1786 - loss 0.13523123 - time (sec): 438.86 - samples/sec: 451.08 - lr: 0.000137 - momentum: 0.000000
2023-10-11 11:41:43,985 epoch 2 - iter 1602/1786 - loss 0.13268221 - time (sec): 497.21 - samples/sec: 447.57 - lr: 0.000135 - momentum: 0.000000
2023-10-11 11:42:37,848 epoch 2 - iter 1780/1786 - loss 0.12982701 - time (sec): 551.08 - samples/sec: 449.89 - lr: 0.000133 - momentum: 0.000000
2023-10-11 11:42:39,599 ----------------------------------------------------------------------------------------------------
2023-10-11 11:42:39,599 EPOCH 2 done: loss 0.1296 - lr: 0.000133
2023-10-11 11:43:01,563 DEV : loss 0.1018710657954216 - f1-score (micro avg) 0.7677
2023-10-11 11:43:01,593 saving best model
2023-10-11 11:43:04,216 ----------------------------------------------------------------------------------------------------
2023-10-11 11:43:55,588 epoch 3 - iter 178/1786 - loss 0.06089358 - time (sec): 51.37 - samples/sec: 464.22 - lr: 0.000132 - momentum: 0.000000
2023-10-11 11:44:47,335 epoch 3 - iter 356/1786 - loss 0.06083272 - time (sec): 103.11 - samples/sec: 474.26 - lr: 0.000130 - momentum: 0.000000
2023-10-11 11:45:39,079 epoch 3 - iter 534/1786 - loss 0.06301108 - time (sec): 154.86 - samples/sec: 472.60 - lr: 0.000128 - momentum: 0.000000
2023-10-11 11:46:31,108 epoch 3 - iter 712/1786 - loss 0.06615584 - time (sec): 206.89 - samples/sec: 474.57 - lr: 0.000127 - momentum: 0.000000
2023-10-11 11:47:24,530 epoch 3 - iter 890/1786 - loss 0.07031467 - time (sec): 260.31 - samples/sec: 472.06 - lr: 0.000125 - momentum: 0.000000
2023-10-11 11:48:18,485 epoch 3 - iter 1068/1786 - loss 0.07283988 - time (sec): 314.26 - samples/sec: 469.38 - lr: 0.000123 - momentum: 0.000000
2023-10-11 11:49:17,279 epoch 3 - iter 1246/1786 - loss 0.07528099 - time (sec): 373.06 - samples/sec: 466.84 - lr: 0.000122 - momentum: 0.000000
2023-10-11 11:50:13,121 epoch 3 - iter 1424/1786 - loss 0.07404515 - time (sec): 428.90 - samples/sec: 461.84 - lr: 0.000120 - momentum: 0.000000
2023-10-11 11:51:06,569 epoch 3 - iter 1602/1786 - loss 0.07276736 - time (sec): 482.35 - samples/sec: 462.80 - lr: 0.000118 - momentum: 0.000000
2023-10-11 11:51:59,561 epoch 3 - iter 1780/1786 - loss 0.07349545 - time (sec): 535.34 - samples/sec: 463.53 - lr: 0.000117 - momentum: 0.000000
2023-10-11 11:52:01,090 ----------------------------------------------------------------------------------------------------
2023-10-11 11:52:01,091 EPOCH 3 done: loss 0.0738 - lr: 0.000117
2023-10-11 11:52:23,243 DEV : loss 0.11691577732563019 - f1-score (micro avg) 0.7871
2023-10-11 11:52:23,273 saving best model
2023-10-11 11:52:25,840 ----------------------------------------------------------------------------------------------------
2023-10-11 11:53:18,856 epoch 4 - iter 178/1786 - loss 0.03874746 - time (sec): 53.01 - samples/sec: 453.23 - lr: 0.000115 - momentum: 0.000000
2023-10-11 11:54:17,169 epoch 4 - iter 356/1786 - loss 0.04840883 - time (sec): 111.32 - samples/sec: 440.32 - lr: 0.000113 - momentum: 0.000000
2023-10-11 11:55:15,656 epoch 4 - iter 534/1786 - loss 0.04781593 - time (sec): 169.81 - samples/sec: 445.25 - lr: 0.000112 - momentum: 0.000000
2023-10-11 11:56:12,421 epoch 4 - iter 712/1786 - loss 0.05099290 - time (sec): 226.58 - samples/sec: 442.20 - lr: 0.000110 - momentum: 0.000000
2023-10-11 11:57:11,612 epoch 4 - iter 890/1786 - loss 0.05175638 - time (sec): 285.77 - samples/sec: 440.13 - lr: 0.000108 - momentum: 0.000000
2023-10-11 11:58:05,630 epoch 4 - iter 1068/1786 - loss 0.05367502 - time (sec): 339.79 - samples/sec: 437.42 - lr: 0.000107 - momentum: 0.000000
2023-10-11 11:59:01,799 epoch 4 - iter 1246/1786 - loss 0.05394290 - time (sec): 395.95 - samples/sec: 438.77 - lr: 0.000105 - momentum: 0.000000
2023-10-11 11:59:56,173 epoch 4 - iter 1424/1786 - loss 0.05369425 - time (sec): 450.33 - samples/sec: 439.57 - lr: 0.000103 - momentum: 0.000000
2023-10-11 12:00:51,253 epoch 4 - iter 1602/1786 - loss 0.05297193 - time (sec): 505.41 - samples/sec: 441.40 - lr: 0.000102 - momentum: 0.000000
2023-10-11 12:01:47,664 epoch 4 - iter 1780/1786 - loss 0.05177312 - time (sec): 561.82 - samples/sec: 441.43 - lr: 0.000100 - momentum: 0.000000
2023-10-11 12:01:49,317 ----------------------------------------------------------------------------------------------------
2023-10-11 12:01:49,317 EPOCH 4 done: loss 0.0517 - lr: 0.000100
2023-10-11 12:02:11,096 DEV : loss 0.1411616951227188 - f1-score (micro avg) 0.7951
2023-10-11 12:02:11,133 saving best model
2023-10-11 12:02:13,751 ----------------------------------------------------------------------------------------------------
2023-10-11 12:03:07,905 epoch 5 - iter 178/1786 - loss 0.03072434 - time (sec): 54.15 - samples/sec: 463.84 - lr: 0.000098 - momentum: 0.000000
2023-10-11 12:04:04,994 epoch 5 - iter 356/1786 - loss 0.03893563 - time (sec): 111.24 - samples/sec: 454.86 - lr: 0.000097 - momentum: 0.000000
2023-10-11 12:04:59,333 epoch 5 - iter 534/1786 - loss 0.03550559 - time (sec): 165.58 - samples/sec: 456.77 - lr: 0.000095 - momentum: 0.000000
2023-10-11 12:05:54,205 epoch 5 - iter 712/1786 - loss 0.03476839 - time (sec): 220.45 - samples/sec: 452.42 - lr: 0.000093 - momentum: 0.000000
2023-10-11 12:06:47,739 epoch 5 - iter 890/1786 - loss 0.03433559 - time (sec): 273.98 - samples/sec: 451.14 - lr: 0.000092 - momentum: 0.000000
2023-10-11 12:07:41,693 epoch 5 - iter 1068/1786 - loss 0.03434337 - time (sec): 327.94 - samples/sec: 450.14 - lr: 0.000090 - momentum: 0.000000
2023-10-11 12:08:37,890 epoch 5 - iter 1246/1786 - loss 0.03476657 - time (sec): 384.13 - samples/sec: 450.22 - lr: 0.000088 - momentum: 0.000000
2023-10-11 12:09:36,033 epoch 5 - iter 1424/1786 - loss 0.03524109 - time (sec): 442.28 - samples/sec: 446.25 - lr: 0.000087 - momentum: 0.000000
2023-10-11 12:10:33,305 epoch 5 - iter 1602/1786 - loss 0.03503721 - time (sec): 499.55 - samples/sec: 445.01 - lr: 0.000085 - momentum: 0.000000
2023-10-11 12:11:30,163 epoch 5 - iter 1780/1786 - loss 0.03719160 - time (sec): 556.41 - samples/sec: 445.83 - lr: 0.000083 - momentum: 0.000000
2023-10-11 12:11:31,834 ----------------------------------------------------------------------------------------------------
2023-10-11 12:11:31,835 EPOCH 5 done: loss 0.0373 - lr: 0.000083
2023-10-11 12:11:54,146 DEV : loss 0.1446281224489212 - f1-score (micro avg) 0.7989
2023-10-11 12:11:54,177 saving best model
2023-10-11 12:11:56,879 ----------------------------------------------------------------------------------------------------
2023-10-11 12:12:51,045 epoch 6 - iter 178/1786 - loss 0.03460331 - time (sec): 54.16 - samples/sec: 480.26 - lr: 0.000082 - momentum: 0.000000
2023-10-11 12:13:45,100 epoch 6 - iter 356/1786 - loss 0.03107979 - time (sec): 108.22 - samples/sec: 458.69 - lr: 0.000080 - momentum: 0.000000
2023-10-11 12:14:39,394 epoch 6 - iter 534/1786 - loss 0.03181132 - time (sec): 162.51 - samples/sec: 453.26 - lr: 0.000078 - momentum: 0.000000
2023-10-11 12:15:33,225 epoch 6 - iter 712/1786 - loss 0.03080483 - time (sec): 216.34 - samples/sec: 458.64 - lr: 0.000077 - momentum: 0.000000
2023-10-11 12:16:27,715 epoch 6 - iter 890/1786 - loss 0.02989970 - time (sec): 270.83 - samples/sec: 456.86 - lr: 0.000075 - momentum: 0.000000
2023-10-11 12:17:20,578 epoch 6 - iter 1068/1786 - loss 0.02943783 - time (sec): 323.70 - samples/sec: 454.31 - lr: 0.000073 - momentum: 0.000000
2023-10-11 12:18:16,492 epoch 6 - iter 1246/1786 - loss 0.02814408 - time (sec): 379.61 - samples/sec: 452.62 - lr: 0.000072 - momentum: 0.000000
2023-10-11 12:19:11,158 epoch 6 - iter 1424/1786 - loss 0.02929556 - time (sec): 434.28 - samples/sec: 456.28 - lr: 0.000070 - momentum: 0.000000
2023-10-11 12:20:06,940 epoch 6 - iter 1602/1786 - loss 0.02877508 - time (sec): 490.06 - samples/sec: 455.44 - lr: 0.000068 - momentum: 0.000000
2023-10-11 12:21:07,155 epoch 6 - iter 1780/1786 - loss 0.02880009 - time (sec): 550.27 - samples/sec: 450.81 - lr: 0.000067 - momentum: 0.000000
2023-10-11 12:21:08,998 ----------------------------------------------------------------------------------------------------
2023-10-11 12:21:08,998 EPOCH 6 done: loss 0.0287 - lr: 0.000067
2023-10-11 12:21:31,729 DEV : loss 0.18198005855083466 - f1-score (micro avg) 0.7913
2023-10-11 12:21:31,760 ----------------------------------------------------------------------------------------------------
2023-10-11 12:22:27,207 epoch 7 - iter 178/1786 - loss 0.02502610 - time (sec): 55.45 - samples/sec: 442.13 - lr: 0.000065 - momentum: 0.000000
2023-10-11 12:23:18,505 epoch 7 - iter 356/1786 - loss 0.02577891 - time (sec): 106.74 - samples/sec: 447.65 - lr: 0.000063 - momentum: 0.000000
2023-10-11 12:24:11,729 epoch 7 - iter 534/1786 - loss 0.02380424 - time (sec): 159.97 - samples/sec: 456.73 - lr: 0.000062 - momentum: 0.000000
2023-10-11 12:25:03,073 epoch 7 - iter 712/1786 - loss 0.02361198 - time (sec): 211.31 - samples/sec: 459.78 - lr: 0.000060 - momentum: 0.000000
2023-10-11 12:25:54,713 epoch 7 - iter 890/1786 - loss 0.02408280 - time (sec): 262.95 - samples/sec: 465.29 - lr: 0.000058 - momentum: 0.000000
2023-10-11 12:26:47,777 epoch 7 - iter 1068/1786 - loss 0.02326326 - time (sec): 316.01 - samples/sec: 468.21 - lr: 0.000057 - momentum: 0.000000
2023-10-11 12:27:42,548 epoch 7 - iter 1246/1786 - loss 0.02207530 - time (sec): 370.79 - samples/sec: 466.64 - lr: 0.000055 - momentum: 0.000000
2023-10-11 12:28:34,067 epoch 7 - iter 1424/1786 - loss 0.02234953 - time (sec): 422.31 - samples/sec: 469.14 - lr: 0.000053 - momentum: 0.000000
2023-10-11 12:29:25,675 epoch 7 - iter 1602/1786 - loss 0.02289223 - time (sec): 473.91 - samples/sec: 471.10 - lr: 0.000052 - momentum: 0.000000
2023-10-11 12:30:17,009 epoch 7 - iter 1780/1786 - loss 0.02244176 - time (sec): 525.25 - samples/sec: 472.48 - lr: 0.000050 - momentum: 0.000000
2023-10-11 12:30:18,464 ----------------------------------------------------------------------------------------------------
2023-10-11 12:30:18,465 EPOCH 7 done: loss 0.0224 - lr: 0.000050
2023-10-11 12:30:38,760 DEV : loss 0.19651886820793152 - f1-score (micro avg) 0.7944
2023-10-11 12:30:38,790 ----------------------------------------------------------------------------------------------------
2023-10-11 12:31:30,271 epoch 8 - iter 178/1786 - loss 0.01170122 - time (sec): 51.48 - samples/sec: 479.72 - lr: 0.000048 - momentum: 0.000000
2023-10-11 12:32:21,756 epoch 8 - iter 356/1786 - loss 0.01261849 - time (sec): 102.96 - samples/sec: 480.14 - lr: 0.000047 - momentum: 0.000000
2023-10-11 12:33:11,986 epoch 8 - iter 534/1786 - loss 0.01077078 - time (sec): 153.19 - samples/sec: 474.98 - lr: 0.000045 - momentum: 0.000000
2023-10-11 12:34:02,991 epoch 8 - iter 712/1786 - loss 0.01119283 - time (sec): 204.20 - samples/sec: 471.16 - lr: 0.000043 - momentum: 0.000000
2023-10-11 12:34:54,856 epoch 8 - iter 890/1786 - loss 0.01302730 - time (sec): 256.06 - samples/sec: 469.17 - lr: 0.000042 - momentum: 0.000000
2023-10-11 12:35:52,337 epoch 8 - iter 1068/1786 - loss 0.01505220 - time (sec): 313.55 - samples/sec: 467.37 - lr: 0.000040 - momentum: 0.000000
2023-10-11 12:36:46,291 epoch 8 - iter 1246/1786 - loss 0.01529191 - time (sec): 367.50 - samples/sec: 469.17 - lr: 0.000038 - momentum: 0.000000
2023-10-11 12:37:40,067 epoch 8 - iter 1424/1786 - loss 0.01565327 - time (sec): 421.28 - samples/sec: 472.27 - lr: 0.000037 - momentum: 0.000000
2023-10-11 12:38:33,123 epoch 8 - iter 1602/1786 - loss 0.01640331 - time (sec): 474.33 - samples/sec: 473.18 - lr: 0.000035 - momentum: 0.000000
2023-10-11 12:39:24,506 epoch 8 - iter 1780/1786 - loss 0.01586581 - time (sec): 525.71 - samples/sec: 471.92 - lr: 0.000033 - momentum: 0.000000
2023-10-11 12:39:26,070 ----------------------------------------------------------------------------------------------------
2023-10-11 12:39:26,070 EPOCH 8 done: loss 0.0158 - lr: 0.000033
2023-10-11 12:39:47,196 DEV : loss 0.20342709124088287 - f1-score (micro avg) 0.8005
2023-10-11 12:39:47,225 saving best model
2023-10-11 12:39:49,797 ----------------------------------------------------------------------------------------------------
2023-10-11 12:40:41,572 epoch 9 - iter 178/1786 - loss 0.01473686 - time (sec): 51.77 - samples/sec: 460.70 - lr: 0.000032 - momentum: 0.000000
2023-10-11 12:41:32,765 epoch 9 - iter 356/1786 - loss 0.01118027 - time (sec): 102.96 - samples/sec: 453.89 - lr: 0.000030 - momentum: 0.000000
2023-10-11 12:42:23,635 epoch 9 - iter 534/1786 - loss 0.01245056 - time (sec): 153.83 - samples/sec: 447.25 - lr: 0.000028 - momentum: 0.000000
2023-10-11 12:43:17,012 epoch 9 - iter 712/1786 - loss 0.01164884 - time (sec): 207.21 - samples/sec: 457.99 - lr: 0.000027 - momentum: 0.000000
2023-10-11 12:44:09,593 epoch 9 - iter 890/1786 - loss 0.01172585 - time (sec): 259.79 - samples/sec: 462.94 - lr: 0.000025 - momentum: 0.000000
2023-10-11 12:45:03,569 epoch 9 - iter 1068/1786 - loss 0.01162303 - time (sec): 313.77 - samples/sec: 465.80 - lr: 0.000023 - momentum: 0.000000
2023-10-11 12:45:58,302 epoch 9 - iter 1246/1786 - loss 0.01213149 - time (sec): 368.50 - samples/sec: 468.41 - lr: 0.000022 - momentum: 0.000000
2023-10-11 12:46:54,840 epoch 9 - iter 1424/1786 - loss 0.01176088 - time (sec): 425.04 - samples/sec: 467.35 - lr: 0.000020 - momentum: 0.000000
2023-10-11 12:47:47,993 epoch 9 - iter 1602/1786 - loss 0.01190777 - time (sec): 478.19 - samples/sec: 467.50 - lr: 0.000018 - momentum: 0.000000
2023-10-11 12:48:41,133 epoch 9 - iter 1780/1786 - loss 0.01190889 - time (sec): 531.33 - samples/sec: 466.14 - lr: 0.000017 - momentum: 0.000000
2023-10-11 12:48:42,969 ----------------------------------------------------------------------------------------------------
2023-10-11 12:48:42,969 EPOCH 9 done: loss 0.0120 - lr: 0.000017
2023-10-11 12:49:04,098 DEV : loss 0.21383821964263916 - f1-score (micro avg) 0.7934
2023-10-11 12:49:04,137 ----------------------------------------------------------------------------------------------------
2023-10-11 12:49:59,581 epoch 10 - iter 178/1786 - loss 0.00900177 - time (sec): 55.44 - samples/sec: 450.00 - lr: 0.000015 - momentum: 0.000000
2023-10-11 12:50:51,193 epoch 10 - iter 356/1786 - loss 0.01022974 - time (sec): 107.05 - samples/sec: 449.40 - lr: 0.000013 - momentum: 0.000000
2023-10-11 12:51:44,986 epoch 10 - iter 534/1786 - loss 0.00925737 - time (sec): 160.85 - samples/sec: 452.35 - lr: 0.000012 - momentum: 0.000000
2023-10-11 12:52:43,086 epoch 10 - iter 712/1786 - loss 0.00868107 - time (sec): 218.95 - samples/sec: 449.05 - lr: 0.000010 - momentum: 0.000000
2023-10-11 12:53:39,352 epoch 10 - iter 890/1786 - loss 0.00865781 - time (sec): 275.21 - samples/sec: 451.32 - lr: 0.000008 - momentum: 0.000000
2023-10-11 12:54:34,765 epoch 10 - iter 1068/1786 - loss 0.00851102 - time (sec): 330.62 - samples/sec: 449.74 - lr: 0.000007 - momentum: 0.000000
2023-10-11 12:55:30,317 epoch 10 - iter 1246/1786 - loss 0.00848258 - time (sec): 386.18 - samples/sec: 447.18 - lr: 0.000005 - momentum: 0.000000
2023-10-11 12:56:26,280 epoch 10 - iter 1424/1786 - loss 0.00850758 - time (sec): 442.14 - samples/sec: 446.92 - lr: 0.000003 - momentum: 0.000000
2023-10-11 12:57:23,983 epoch 10 - iter 1602/1786 - loss 0.00827861 - time (sec): 499.84 - samples/sec: 445.08 - lr: 0.000002 - momentum: 0.000000
2023-10-11 12:58:21,558 epoch 10 - iter 1780/1786 - loss 0.00830503 - time (sec): 557.42 - samples/sec: 445.35 - lr: 0.000000 - momentum: 0.000000
2023-10-11 12:58:23,007 ----------------------------------------------------------------------------------------------------
2023-10-11 12:58:23,007 EPOCH 10 done: loss 0.0083 - lr: 0.000000
2023-10-11 12:58:45,352 DEV : loss 0.2220211774110794 - f1-score (micro avg) 0.7952
2023-10-11 12:58:46,381 ----------------------------------------------------------------------------------------------------
2023-10-11 12:58:46,383 Loading model from best epoch ...
2023-10-11 12:58:50,444 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
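The 17-tag dictionary above is the BIOES tagging scheme over the four NewsEye entity types (PER, LOC, ORG, HumanProd) plus O. A small usage sketch for the saved checkpoint follows, with an invented example sentence; the checkpoint path is the training base path logged above plus best-model.pt.

```python
# Inference sketch: load the best checkpoint saved during training and tag a sentence.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

sentence = Sentence("Le général Boulanger quitte Paris pour Bruxelles .")  # invented example
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    label = span.get_label("ner")
    print(span.text, label.value, f"{label.score:.2f}")
```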
2023-10-11 13:00:06,150 
Results:
- F-score (micro) 0.7251
- F-score (macro) 0.6453
- Accuracy 0.5829

By class:
              precision    recall  f1-score   support

         LOC     0.7389    0.7443    0.7416      1095
         PER     0.7830    0.7846    0.7838      1012
         ORG     0.5083    0.5994    0.5501       357
   HumanProd     0.4074    0.6667    0.5057        33

   micro avg     0.7118    0.7389    0.7251      2497
   macro avg     0.6094    0.6987    0.6453      2497
weighted avg     0.7194    0.7389    0.7282      2497

2023-10-11 13:00:06,150 ----------------------------------------------------------------------------------------------------
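The report above is flair's final evaluation of best-model.pt on the 2,570 test sentences (2,497 gold spans). Below is a sketch of reproducing such a report outside the trainer; the dataset constructor arguments are again assumptions.

```python
# Evaluation sketch: score the saved model on the HIPE-2022 NewsEye French test split.
from flair.datasets import NER_HIPE_2022
from flair.models import SequenceTagger

corpus = NER_HIPE_2022(dataset_name="newseye", language="fr")  # assumed arguments
tagger = SequenceTagger.load(
    "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

result = tagger.evaluate(corpus.test, gold_label_type="ner", mini_batch_size=4)
print(result.main_score)        # micro-averaged F1, the selection metric used above
print(result.detailed_results)  # per-class precision/recall/F1 table, as printed above
```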