2023-10-11 10:27:33,571 ----------------------------------------------------------------------------------------------------
2023-10-11 10:27:33,574 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 10:27:33,574 ----------------------------------------------------------------------------------------------------
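Note: the module tree above is a Flair SequenceTagger with a ByT5 encoder as embeddings, no RNN, no CRF, and a single Linear(1472 -> 17) head with LockedDropout(0.5). A rough reconstruction with stock Flair classes is sketched below; the checkpoint name is inferred from the base path logged further down, and the pooling/layer settings from the run name (poolingfirst, layers -1, crfFalse), so treat every identifier and parameter as an assumption rather than the exact training script.

    # Sketch only (not the original script): rebuilding a comparable tagger with stock Flair classes.
    # The log's ByT5Embeddings is a ByT5-specific wrapper; TransformerWordEmbeddings stands in for it here.
    from flair.datasets import NER_HIPE_2022
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger

    corpus = NER_HIPE_2022(dataset_name="newseye", language="de")      # loader arguments are assumptions
    tag_dictionary = corpus.make_label_dictionary(label_type="ner")    # yields the 17-tag BIOES dictionary

    embeddings = TransformerWordEmbeddings(
        model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # inferred from the base path
        layers="-1",               # "layers-1" in the run name
        subtoken_pooling="first",  # "poolingfirst" in the run name
        fine_tune=True,
    )

    tagger = SequenceTagger(
        hidden_size=256,                # ignored when use_rnn=False
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="ner",
        use_crf=False,                  # "crfFalse" in the run name
        use_rnn=False,                  # no RNN module appears in the printed tree
        reproject_embeddings=False,     # only the Linear(1472 -> 17) head sits on top of the encoder
    )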
2023-10-11 10:27:33,574 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
- NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-11 10:27:33,574 ----------------------------------------------------------------------------------------------------
2023-10-11 10:27:33,574 Train: 20847 sentences
2023-10-11 10:27:33,575 (train_with_dev=False, train_with_test=False)
2023-10-11 10:27:33,575 ----------------------------------------------------------------------------------------------------
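Note: the MultiCorpus above wraps the German NewsEye split of HIPE-2022, cached under ~/.flair with document separators. A minimal loading sketch follows; the keyword names (dataset_name, language, add_document_separator) are assumptions based on the cache path, not taken from this log.

    # Sketch: loading the same corpus through flair.datasets; keyword names are assumptions.
    from flair.datasets import NER_HIPE_2022

    corpus = NER_HIPE_2022(
        dataset_name="newseye",
        language="de",
        add_document_separator=True,  # matches the ".../with_doc_seperator" cache directory above
    )
    print(corpus)  # expected to report 20847 train + 1123 dev + 3350 test sentences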
2023-10-11 10:27:33,575 Training Params:
2023-10-11 10:27:33,575 - learning_rate: "0.00015"
2023-10-11 10:27:33,575 - mini_batch_size: "4"
2023-10-11 10:27:33,575 - max_epochs: "10"
2023-10-11 10:27:33,575 - shuffle: "True"
2023-10-11 10:27:33,575 ----------------------------------------------------------------------------------------------------
2023-10-11 10:27:33,575 Plugins:
2023-10-11 10:27:33,575 - TensorboardLogger
2023-10-11 10:27:33,575 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 10:27:33,575 ----------------------------------------------------------------------------------------------------
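Note: with the corpus and tagger from the sketches above, the logged hyperparameters and plugins map onto Flair's fine-tuning API roughly as shown below. fine_tune applies a linear schedule with warmup by default, which matches the LinearScheduler entry; the argument and plugin names here are assumptions about the setup, not the recorded command.

    # Sketch: fine-tuning call matching the logged parameters; assumes `corpus` and `tagger` from the sketches above.
    from flair.trainers import ModelTrainer
    from flair.trainers.plugins import TensorboardLogger

    trainer = ModelTrainer(tagger, corpus)
    trainer.fine_tune(
        "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3",
        learning_rate=0.00015,
        mini_batch_size=4,
        max_epochs=10,
        shuffle=True,
        embeddings_storage_mode="none",                    # "embedding storage: none" logged below
        main_evaluation_metric=("micro avg", "f1-score"),  # final-evaluation metric logged below
        plugins=[TensorboardLogger()],                     # linear warmup schedule is fine_tune's default behaviour
    )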
2023-10-11 10:27:33,575 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 10:27:33,576 - metric: "('micro avg', 'f1-score')"
2023-10-11 10:27:33,576 ----------------------------------------------------------------------------------------------------
2023-10-11 10:27:33,576 Computation:
2023-10-11 10:27:33,576 - compute on device: cuda:0
2023-10-11 10:27:33,576 - embedding storage: none
2023-10-11 10:27:33,576 ----------------------------------------------------------------------------------------------------
2023-10-11 10:27:33,576 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-11 10:27:33,576 ----------------------------------------------------------------------------------------------------
2023-10-11 10:27:33,576 ----------------------------------------------------------------------------------------------------
2023-10-11 10:27:33,576 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 10:29:55,156 epoch 1 - iter 521/5212 - loss 2.77308848 - time (sec): 141.58 - samples/sec: 263.97 - lr: 0.000015 - momentum: 0.000000
2023-10-11 10:32:16,905 epoch 1 - iter 1042/5212 - loss 2.32230492 - time (sec): 283.33 - samples/sec: 270.92 - lr: 0.000030 - momentum: 0.000000
2023-10-11 10:34:35,422 epoch 1 - iter 1563/5212 - loss 1.84040633 - time (sec): 421.84 - samples/sec: 268.49 - lr: 0.000045 - momentum: 0.000000
2023-10-11 10:36:54,353 epoch 1 - iter 2084/5212 - loss 1.50205626 - time (sec): 560.77 - samples/sec: 266.53 - lr: 0.000060 - momentum: 0.000000
2023-10-11 10:39:16,176 epoch 1 - iter 2605/5212 - loss 1.29871005 - time (sec): 702.60 - samples/sec: 266.48 - lr: 0.000075 - momentum: 0.000000
2023-10-11 10:41:36,574 epoch 1 - iter 3126/5212 - loss 1.14819274 - time (sec): 842.99 - samples/sec: 265.07 - lr: 0.000090 - momentum: 0.000000
2023-10-11 10:43:56,922 epoch 1 - iter 3647/5212 - loss 1.03459929 - time (sec): 983.34 - samples/sec: 263.04 - lr: 0.000105 - momentum: 0.000000
2023-10-11 10:46:16,004 epoch 1 - iter 4168/5212 - loss 0.95149666 - time (sec): 1122.43 - samples/sec: 261.30 - lr: 0.000120 - momentum: 0.000000
2023-10-11 10:48:36,869 epoch 1 - iter 4689/5212 - loss 0.87440388 - time (sec): 1263.29 - samples/sec: 262.05 - lr: 0.000135 - momentum: 0.000000
2023-10-11 10:50:57,081 epoch 1 - iter 5210/5212 - loss 0.80881280 - time (sec): 1403.50 - samples/sec: 261.66 - lr: 0.000150 - momentum: 0.000000
2023-10-11 10:50:57,613 ----------------------------------------------------------------------------------------------------
2023-10-11 10:50:57,613 EPOCH 1 done: loss 0.8086 - lr: 0.000150
2023-10-11 10:51:33,753 DEV : loss 0.1371876299381256 - f1-score (micro avg) 0.3336
2023-10-11 10:51:33,806 saving best model
2023-10-11 10:51:34,714 ----------------------------------------------------------------------------------------------------
2023-10-11 10:53:54,909 epoch 2 - iter 521/5212 - loss 0.19618675 - time (sec): 140.19 - samples/sec: 263.48 - lr: 0.000148 - momentum: 0.000000
2023-10-11 10:56:12,183 epoch 2 - iter 1042/5212 - loss 0.18458622 - time (sec): 277.47 - samples/sec: 264.49 - lr: 0.000147 - momentum: 0.000000
2023-10-11 10:58:36,985 epoch 2 - iter 1563/5212 - loss 0.18961166 - time (sec): 422.27 - samples/sec: 267.02 - lr: 0.000145 - momentum: 0.000000
2023-10-11 11:00:58,974 epoch 2 - iter 2084/5212 - loss 0.18529432 - time (sec): 564.26 - samples/sec: 264.94 - lr: 0.000143 - momentum: 0.000000
2023-10-11 11:03:19,697 epoch 2 - iter 2605/5212 - loss 0.18045804 - time (sec): 704.98 - samples/sec: 261.97 - lr: 0.000142 - momentum: 0.000000
2023-10-11 11:05:42,377 epoch 2 - iter 3126/5212 - loss 0.17752518 - time (sec): 847.66 - samples/sec: 261.81 - lr: 0.000140 - momentum: 0.000000
2023-10-11 11:08:02,418 epoch 2 - iter 3647/5212 - loss 0.17641629 - time (sec): 987.70 - samples/sec: 258.68 - lr: 0.000138 - momentum: 0.000000
2023-10-11 11:10:25,754 epoch 2 - iter 4168/5212 - loss 0.17189078 - time (sec): 1131.04 - samples/sec: 258.18 - lr: 0.000137 - momentum: 0.000000
2023-10-11 11:12:50,890 epoch 2 - iter 4689/5212 - loss 0.16869126 - time (sec): 1276.17 - samples/sec: 258.97 - lr: 0.000135 - momentum: 0.000000
2023-10-11 11:15:14,550 epoch 2 - iter 5210/5212 - loss 0.16487858 - time (sec): 1419.83 - samples/sec: 258.73 - lr: 0.000133 - momentum: 0.000000
2023-10-11 11:15:14,995 ----------------------------------------------------------------------------------------------------
2023-10-11 11:15:14,995 EPOCH 2 done: loss 0.1649 - lr: 0.000133
2023-10-11 11:15:54,268 DEV : loss 0.14473062753677368 - f1-score (micro avg) 0.3188
2023-10-11 11:15:54,319 ----------------------------------------------------------------------------------------------------
2023-10-11 11:18:13,777 epoch 3 - iter 521/5212 - loss 0.10458077 - time (sec): 139.46 - samples/sec: 250.74 - lr: 0.000132 - momentum: 0.000000
2023-10-11 11:20:33,676 epoch 3 - iter 1042/5212 - loss 0.10612993 - time (sec): 279.36 - samples/sec: 254.51 - lr: 0.000130 - momentum: 0.000000
2023-10-11 11:22:53,147 epoch 3 - iter 1563/5212 - loss 0.10354381 - time (sec): 418.83 - samples/sec: 255.19 - lr: 0.000128 - momentum: 0.000000
2023-10-11 11:25:16,345 epoch 3 - iter 2084/5212 - loss 0.10944179 - time (sec): 562.02 - samples/sec: 259.35 - lr: 0.000127 - momentum: 0.000000
2023-10-11 11:27:35,775 epoch 3 - iter 2605/5212 - loss 0.11310926 - time (sec): 701.45 - samples/sec: 262.27 - lr: 0.000125 - momentum: 0.000000
2023-10-11 11:29:54,384 epoch 3 - iter 3126/5212 - loss 0.10867516 - time (sec): 840.06 - samples/sec: 261.83 - lr: 0.000123 - momentum: 0.000000
2023-10-11 11:32:12,976 epoch 3 - iter 3647/5212 - loss 0.10736809 - time (sec): 978.66 - samples/sec: 261.05 - lr: 0.000122 - momentum: 0.000000
2023-10-11 11:34:32,157 epoch 3 - iter 4168/5212 - loss 0.10821714 - time (sec): 1117.84 - samples/sec: 261.76 - lr: 0.000120 - momentum: 0.000000
2023-10-11 11:36:50,238 epoch 3 - iter 4689/5212 - loss 0.10784416 - time (sec): 1255.92 - samples/sec: 261.68 - lr: 0.000118 - momentum: 0.000000
2023-10-11 11:39:11,437 epoch 3 - iter 5210/5212 - loss 0.10805328 - time (sec): 1397.12 - samples/sec: 262.92 - lr: 0.000117 - momentum: 0.000000
2023-10-11 11:39:11,884 ----------------------------------------------------------------------------------------------------
2023-10-11 11:39:11,885 EPOCH 3 done: loss 0.1082 - lr: 0.000117
2023-10-11 11:39:50,899 DEV : loss 0.18280969560146332 - f1-score (micro avg) 0.4057
2023-10-11 11:39:50,958 saving best model
2023-10-11 11:39:53,591 ----------------------------------------------------------------------------------------------------
2023-10-11 11:42:14,816 epoch 4 - iter 521/5212 - loss 0.07278263 - time (sec): 141.22 - samples/sec: 250.15 - lr: 0.000115 - momentum: 0.000000
2023-10-11 11:44:36,537 epoch 4 - iter 1042/5212 - loss 0.07239825 - time (sec): 282.94 - samples/sec: 254.20 - lr: 0.000113 - momentum: 0.000000
2023-10-11 11:46:57,338 epoch 4 - iter 1563/5212 - loss 0.07231137 - time (sec): 423.74 - samples/sec: 258.25 - lr: 0.000112 - momentum: 0.000000
2023-10-11 11:49:15,233 epoch 4 - iter 2084/5212 - loss 0.07343325 - time (sec): 561.64 - samples/sec: 257.71 - lr: 0.000110 - momentum: 0.000000
2023-10-11 11:51:38,449 epoch 4 - iter 2605/5212 - loss 0.07181772 - time (sec): 704.85 - samples/sec: 262.36 - lr: 0.000108 - momentum: 0.000000
2023-10-11 11:53:56,748 epoch 4 - iter 3126/5212 - loss 0.07331153 - time (sec): 843.15 - samples/sec: 261.09 - lr: 0.000107 - momentum: 0.000000
2023-10-11 11:56:18,345 epoch 4 - iter 3647/5212 - loss 0.07316098 - time (sec): 984.75 - samples/sec: 261.72 - lr: 0.000105 - momentum: 0.000000
2023-10-11 11:58:40,109 epoch 4 - iter 4168/5212 - loss 0.07306211 - time (sec): 1126.51 - samples/sec: 264.45 - lr: 0.000103 - momentum: 0.000000
2023-10-11 12:00:54,299 epoch 4 - iter 4689/5212 - loss 0.07216413 - time (sec): 1260.70 - samples/sec: 263.41 - lr: 0.000102 - momentum: 0.000000
2023-10-11 12:03:10,241 epoch 4 - iter 5210/5212 - loss 0.07359050 - time (sec): 1396.65 - samples/sec: 263.05 - lr: 0.000100 - momentum: 0.000000
2023-10-11 12:03:10,639 ----------------------------------------------------------------------------------------------------
2023-10-11 12:03:10,640 EPOCH 4 done: loss 0.0736 - lr: 0.000100
2023-10-11 12:03:49,861 DEV : loss 0.27816441655158997 - f1-score (micro avg) 0.3524
2023-10-11 12:03:49,914 ----------------------------------------------------------------------------------------------------
2023-10-11 12:06:07,583 epoch 5 - iter 521/5212 - loss 0.04138265 - time (sec): 137.67 - samples/sec: 260.75 - lr: 0.000098 - momentum: 0.000000
2023-10-11 12:08:31,272 epoch 5 - iter 1042/5212 - loss 0.04749933 - time (sec): 281.36 - samples/sec: 261.65 - lr: 0.000097 - momentum: 0.000000
2023-10-11 12:10:54,820 epoch 5 - iter 1563/5212 - loss 0.04798952 - time (sec): 424.90 - samples/sec: 257.07 - lr: 0.000095 - momentum: 0.000000
2023-10-11 12:13:20,187 epoch 5 - iter 2084/5212 - loss 0.05106117 - time (sec): 570.27 - samples/sec: 255.58 - lr: 0.000093 - momentum: 0.000000
2023-10-11 12:15:47,123 epoch 5 - iter 2605/5212 - loss 0.05002478 - time (sec): 717.21 - samples/sec: 256.72 - lr: 0.000092 - momentum: 0.000000
2023-10-11 12:18:10,619 epoch 5 - iter 3126/5212 - loss 0.05055487 - time (sec): 860.70 - samples/sec: 254.97 - lr: 0.000090 - momentum: 0.000000
2023-10-11 12:20:35,848 epoch 5 - iter 3647/5212 - loss 0.05188152 - time (sec): 1005.93 - samples/sec: 254.69 - lr: 0.000088 - momentum: 0.000000
2023-10-11 12:22:59,869 epoch 5 - iter 4168/5212 - loss 0.05088261 - time (sec): 1149.95 - samples/sec: 253.41 - lr: 0.000087 - momentum: 0.000000
2023-10-11 12:25:25,685 epoch 5 - iter 4689/5212 - loss 0.05025009 - time (sec): 1295.77 - samples/sec: 253.90 - lr: 0.000085 - momentum: 0.000000
2023-10-11 12:27:51,044 epoch 5 - iter 5210/5212 - loss 0.05118220 - time (sec): 1441.13 - samples/sec: 254.89 - lr: 0.000083 - momentum: 0.000000
2023-10-11 12:27:51,510 ----------------------------------------------------------------------------------------------------
2023-10-11 12:27:51,511 EPOCH 5 done: loss 0.0512 - lr: 0.000083
2023-10-11 12:28:32,525 DEV : loss 0.3333892226219177 - f1-score (micro avg) 0.3863
2023-10-11 12:28:32,579 ----------------------------------------------------------------------------------------------------
2023-10-11 12:30:52,575 epoch 6 - iter 521/5212 - loss 0.03246002 - time (sec): 139.99 - samples/sec: 241.08 - lr: 0.000082 - momentum: 0.000000
2023-10-11 12:33:13,663 epoch 6 - iter 1042/5212 - loss 0.03096266 - time (sec): 281.08 - samples/sec: 243.35 - lr: 0.000080 - momentum: 0.000000
2023-10-11 12:35:38,110 epoch 6 - iter 1563/5212 - loss 0.03241169 - time (sec): 425.53 - samples/sec: 245.97 - lr: 0.000078 - momentum: 0.000000
2023-10-11 12:38:01,041 epoch 6 - iter 2084/5212 - loss 0.03229808 - time (sec): 568.46 - samples/sec: 247.46 - lr: 0.000077 - momentum: 0.000000
2023-10-11 12:40:25,250 epoch 6 - iter 2605/5212 - loss 0.03266383 - time (sec): 712.67 - samples/sec: 249.02 - lr: 0.000075 - momentum: 0.000000
2023-10-11 12:42:46,535 epoch 6 - iter 3126/5212 - loss 0.03217009 - time (sec): 853.95 - samples/sec: 249.45 - lr: 0.000073 - momentum: 0.000000
2023-10-11 12:45:11,195 epoch 6 - iter 3647/5212 - loss 0.03269803 - time (sec): 998.61 - samples/sec: 252.89 - lr: 0.000072 - momentum: 0.000000
2023-10-11 12:47:35,975 epoch 6 - iter 4168/5212 - loss 0.03242644 - time (sec): 1143.39 - samples/sec: 253.88 - lr: 0.000070 - momentum: 0.000000
2023-10-11 12:50:00,875 epoch 6 - iter 4689/5212 - loss 0.03391785 - time (sec): 1288.29 - samples/sec: 255.87 - lr: 0.000068 - momentum: 0.000000
2023-10-11 12:52:21,466 epoch 6 - iter 5210/5212 - loss 0.03471456 - time (sec): 1428.89 - samples/sec: 256.95 - lr: 0.000067 - momentum: 0.000000
2023-10-11 12:52:22,095 ----------------------------------------------------------------------------------------------------
2023-10-11 12:52:22,095 EPOCH 6 done: loss 0.0347 - lr: 0.000067
2023-10-11 12:53:00,341 DEV : loss 0.397626131772995 - f1-score (micro avg) 0.3773
2023-10-11 12:53:00,393 ----------------------------------------------------------------------------------------------------
2023-10-11 12:55:23,221 epoch 7 - iter 521/5212 - loss 0.02766582 - time (sec): 142.83 - samples/sec: 280.80 - lr: 0.000065 - momentum: 0.000000
2023-10-11 12:57:43,563 epoch 7 - iter 1042/5212 - loss 0.02549282 - time (sec): 283.17 - samples/sec: 268.30 - lr: 0.000063 - momentum: 0.000000
2023-10-11 13:00:06,987 epoch 7 - iter 1563/5212 - loss 0.02298030 - time (sec): 426.59 - samples/sec: 265.97 - lr: 0.000062 - momentum: 0.000000
2023-10-11 13:02:33,264 epoch 7 - iter 2084/5212 - loss 0.02542458 - time (sec): 572.87 - samples/sec: 265.28 - lr: 0.000060 - momentum: 0.000000
2023-10-11 13:04:53,874 epoch 7 - iter 2605/5212 - loss 0.02658181 - time (sec): 713.48 - samples/sec: 260.46 - lr: 0.000058 - momentum: 0.000000
2023-10-11 13:07:19,303 epoch 7 - iter 3126/5212 - loss 0.02678770 - time (sec): 858.91 - samples/sec: 260.17 - lr: 0.000057 - momentum: 0.000000
2023-10-11 13:09:41,217 epoch 7 - iter 3647/5212 - loss 0.02682370 - time (sec): 1000.82 - samples/sec: 259.27 - lr: 0.000055 - momentum: 0.000000
2023-10-11 13:12:01,858 epoch 7 - iter 4168/5212 - loss 0.02663508 - time (sec): 1141.46 - samples/sec: 258.26 - lr: 0.000053 - momentum: 0.000000
2023-10-11 13:14:23,833 epoch 7 - iter 4689/5212 - loss 0.02646033 - time (sec): 1283.44 - samples/sec: 258.05 - lr: 0.000052 - momentum: 0.000000
2023-10-11 13:16:45,213 epoch 7 - iter 5210/5212 - loss 0.02638106 - time (sec): 1424.82 - samples/sec: 257.84 - lr: 0.000050 - momentum: 0.000000
2023-10-11 13:16:45,633 ----------------------------------------------------------------------------------------------------
2023-10-11 13:16:45,633 EPOCH 7 done: loss 0.0264 - lr: 0.000050
2023-10-11 13:17:23,898 DEV : loss 0.39803504943847656 - f1-score (micro avg) 0.3855
2023-10-11 13:17:23,949 ----------------------------------------------------------------------------------------------------
2023-10-11 13:19:45,732 epoch 8 - iter 521/5212 - loss 0.01602735 - time (sec): 141.78 - samples/sec: 259.43 - lr: 0.000048 - momentum: 0.000000
2023-10-11 13:22:08,337 epoch 8 - iter 1042/5212 - loss 0.01754525 - time (sec): 284.39 - samples/sec: 260.76 - lr: 0.000047 - momentum: 0.000000
2023-10-11 13:24:29,046 epoch 8 - iter 1563/5212 - loss 0.01837014 - time (sec): 425.09 - samples/sec: 259.41 - lr: 0.000045 - momentum: 0.000000
2023-10-11 13:26:54,017 epoch 8 - iter 2084/5212 - loss 0.01699882 - time (sec): 570.07 - samples/sec: 258.05 - lr: 0.000043 - momentum: 0.000000
2023-10-11 13:29:16,314 epoch 8 - iter 2605/5212 - loss 0.01790932 - time (sec): 712.36 - samples/sec: 259.27 - lr: 0.000042 - momentum: 0.000000
2023-10-11 13:31:37,281 epoch 8 - iter 3126/5212 - loss 0.01863312 - time (sec): 853.33 - samples/sec: 259.21 - lr: 0.000040 - momentum: 0.000000
2023-10-11 13:33:56,013 epoch 8 - iter 3647/5212 - loss 0.01818979 - time (sec): 992.06 - samples/sec: 259.09 - lr: 0.000038 - momentum: 0.000000
2023-10-11 13:36:17,040 epoch 8 - iter 4168/5212 - loss 0.01807363 - time (sec): 1133.09 - samples/sec: 259.49 - lr: 0.000037 - momentum: 0.000000
2023-10-11 13:38:38,320 epoch 8 - iter 4689/5212 - loss 0.01736008 - time (sec): 1274.37 - samples/sec: 260.41 - lr: 0.000035 - momentum: 0.000000
2023-10-11 13:40:56,604 epoch 8 - iter 5210/5212 - loss 0.01727055 - time (sec): 1412.65 - samples/sec: 259.87 - lr: 0.000033 - momentum: 0.000000
2023-10-11 13:40:57,310 ----------------------------------------------------------------------------------------------------
2023-10-11 13:40:57,311 EPOCH 8 done: loss 0.0173 - lr: 0.000033
2023-10-11 13:41:35,861 DEV : loss 0.434541255235672 - f1-score (micro avg) 0.413
2023-10-11 13:41:35,914 saving best model
2023-10-11 13:41:38,523 ----------------------------------------------------------------------------------------------------
2023-10-11 13:44:00,885 epoch 9 - iter 521/5212 - loss 0.01391910 - time (sec): 142.36 - samples/sec: 269.72 - lr: 0.000032 - momentum: 0.000000
2023-10-11 13:46:21,602 epoch 9 - iter 1042/5212 - loss 0.01227543 - time (sec): 283.07 - samples/sec: 268.26 - lr: 0.000030 - momentum: 0.000000
2023-10-11 13:48:44,428 epoch 9 - iter 1563/5212 - loss 0.01127380 - time (sec): 425.90 - samples/sec: 261.83 - lr: 0.000028 - momentum: 0.000000
2023-10-11 13:51:05,408 epoch 9 - iter 2084/5212 - loss 0.01170911 - time (sec): 566.88 - samples/sec: 257.72 - lr: 0.000027 - momentum: 0.000000
2023-10-11 13:53:27,927 epoch 9 - iter 2605/5212 - loss 0.01188654 - time (sec): 709.40 - samples/sec: 258.85 - lr: 0.000025 - momentum: 0.000000
2023-10-11 13:55:49,808 epoch 9 - iter 3126/5212 - loss 0.01195441 - time (sec): 851.28 - samples/sec: 257.46 - lr: 0.000023 - momentum: 0.000000
2023-10-11 13:58:19,120 epoch 9 - iter 3647/5212 - loss 0.01196011 - time (sec): 1000.59 - samples/sec: 255.97 - lr: 0.000022 - momentum: 0.000000
2023-10-11 14:00:47,719 epoch 9 - iter 4168/5212 - loss 0.01172818 - time (sec): 1149.19 - samples/sec: 255.46 - lr: 0.000020 - momentum: 0.000000
2023-10-11 14:03:20,778 epoch 9 - iter 4689/5212 - loss 0.01252167 - time (sec): 1302.25 - samples/sec: 254.08 - lr: 0.000018 - momentum: 0.000000
2023-10-11 14:05:43,342 epoch 9 - iter 5210/5212 - loss 0.01279053 - time (sec): 1444.81 - samples/sec: 254.13 - lr: 0.000017 - momentum: 0.000000
2023-10-11 14:05:43,953 ----------------------------------------------------------------------------------------------------
2023-10-11 14:05:43,953 EPOCH 9 done: loss 0.0128 - lr: 0.000017
2023-10-11 14:06:24,418 DEV : loss 0.48711156845092773 - f1-score (micro avg) 0.3852
2023-10-11 14:06:24,474 ----------------------------------------------------------------------------------------------------
2023-10-11 14:08:53,718 epoch 10 - iter 521/5212 - loss 0.00536186 - time (sec): 149.24 - samples/sec: 244.44 - lr: 0.000015 - momentum: 0.000000
2023-10-11 14:11:12,907 epoch 10 - iter 1042/5212 - loss 0.00727275 - time (sec): 288.43 - samples/sec: 249.23 - lr: 0.000013 - momentum: 0.000000
2023-10-11 14:13:34,227 epoch 10 - iter 1563/5212 - loss 0.00704063 - time (sec): 429.75 - samples/sec: 252.94 - lr: 0.000012 - momentum: 0.000000
2023-10-11 14:15:56,744 epoch 10 - iter 2084/5212 - loss 0.00715484 - time (sec): 572.27 - samples/sec: 251.17 - lr: 0.000010 - momentum: 0.000000
2023-10-11 14:18:24,269 epoch 10 - iter 2605/5212 - loss 0.00703148 - time (sec): 719.79 - samples/sec: 254.39 - lr: 0.000008 - momentum: 0.000000
2023-10-11 14:20:45,693 epoch 10 - iter 3126/5212 - loss 0.00712965 - time (sec): 861.22 - samples/sec: 254.36 - lr: 0.000007 - momentum: 0.000000
2023-10-11 14:23:09,963 epoch 10 - iter 3647/5212 - loss 0.00771973 - time (sec): 1005.49 - samples/sec: 254.06 - lr: 0.000005 - momentum: 0.000000
2023-10-11 14:25:37,329 epoch 10 - iter 4168/5212 - loss 0.00782038 - time (sec): 1152.85 - samples/sec: 251.90 - lr: 0.000003 - momentum: 0.000000
2023-10-11 14:28:04,663 epoch 10 - iter 4689/5212 - loss 0.00789966 - time (sec): 1300.19 - samples/sec: 253.75 - lr: 0.000002 - momentum: 0.000000
2023-10-11 14:30:30,685 epoch 10 - iter 5210/5212 - loss 0.00775810 - time (sec): 1446.21 - samples/sec: 254.04 - lr: 0.000000 - momentum: 0.000000
2023-10-11 14:30:31,090 ----------------------------------------------------------------------------------------------------
2023-10-11 14:30:31,090 EPOCH 10 done: loss 0.0078 - lr: 0.000000
2023-10-11 14:31:09,345 DEV : loss 0.4862366318702698 - f1-score (micro avg) 0.3966
2023-10-11 14:31:10,297 ----------------------------------------------------------------------------------------------------
2023-10-11 14:31:10,299 Loading model from best epoch ...
2023-10-11 14:31:14,039 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 14:32:55,343
Results:
- F-score (micro) 0.4259
- F-score (macro) 0.2963
- Accuracy 0.2744

By class:
              precision    recall  f1-score   support

         LOC     0.4982    0.4596    0.4781      1214
         PER     0.3957    0.4369    0.4153       808
         ORG     0.2918    0.2918    0.2918       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4275    0.4243    0.4259      2390
   macro avg     0.2964    0.2971    0.2963      2390
weighted avg     0.4300    0.4243    0.4264      2390

2023-10-11 14:32:55,344 ----------------------------------------------------------------------------------------------------
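Note: the final test evaluation above is produced from the saved best-model.pt. Loading that checkpoint for inference follows the usual Flair pattern; a minimal sketch (path abbreviated, example sentence invented):

    # Sketch: using the trained checkpoint for prediction (path abbreviated from the base path logged above).
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("hmbench-newseye/.../best-model.pt")  # full path as in the base path above
    sentence = Sentence("Ein Beispielsatz aus einer historischen Zeitung .")
    tagger.predict(sentence)
    for span in sentence.get_spans("ner"):
        print(span.text, span.get_label("ner").value)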