2023-10-25 21:09:57,475 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,476 Model: "SequenceTagger(
(embeddings): TransformerWordEmbeddings(
(model): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(64001, 768)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0-11): 12 x BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
)
(locked_dropout): LockedDropout(p=0.5)
(linear): Linear(in_features=768, out_features=17, bias=True)
(loss_function): CrossEntropyLoss()
)"
2023-10-25 21:09:57,476 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,476 MultiCorpus: 1166 train + 165 dev + 415 test sentences
- NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-25 21:09:57,477 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,477 Train: 1166 sentences
2023-10-25 21:09:57,477 (train_with_dev=False, train_with_test=False)
2023-10-25 21:09:57,477 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,477 Training Params:
2023-10-25 21:09:57,477 - learning_rate: "3e-05"
2023-10-25 21:09:57,477 - mini_batch_size: "4"
2023-10-25 21:09:57,477 - max_epochs: "10"
2023-10-25 21:09:57,477 - shuffle: "True"
2023-10-25 21:09:57,477 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,477 Plugins:
2023-10-25 21:09:57,477 - TensorboardLogger
2023-10-25 21:09:57,477 - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 21:09:57,477 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,477 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 21:09:57,477 - metric: "('micro avg', 'f1-score')"
2023-10-25 21:09:57,477 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,477 Computation:
2023-10-25 21:09:57,477 - compute on device: cuda:0
2023-10-25 21:09:57,477 - embedding storage: none
2023-10-25 21:09:57,477 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,477 Model training base path: "hmbench-newseye/fi-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-25 21:09:57,477 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,477 ----------------------------------------------------------------------------------------------------
2023-10-25 21:09:57,477 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-25 21:09:58,743 epoch 1 - iter 29/292 - loss 2.73833700 - time (sec): 1.26 - samples/sec: 2733.50 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:10:00,016 epoch 1 - iter 58/292 - loss 2.06223157 - time (sec): 2.54 - samples/sec: 2941.83 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:10:01,360 epoch 1 - iter 87/292 - loss 1.51849599 - time (sec): 3.88 - samples/sec: 3112.72 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:10:02,634 epoch 1 - iter 116/292 - loss 1.26889759 - time (sec): 5.16 - samples/sec: 3123.06 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:10:03,888 epoch 1 - iter 145/292 - loss 1.09220417 - time (sec): 6.41 - samples/sec: 3199.90 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:10:05,205 epoch 1 - iter 174/292 - loss 0.98328127 - time (sec): 7.73 - samples/sec: 3233.86 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:10:06,492 epoch 1 - iter 203/292 - loss 0.88448692 - time (sec): 9.01 - samples/sec: 3301.34 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:10:07,957 epoch 1 - iter 232/292 - loss 0.81219091 - time (sec): 10.48 - samples/sec: 3328.88 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:10:09,285 epoch 1 - iter 261/292 - loss 0.73736454 - time (sec): 11.81 - samples/sec: 3386.30 - lr: 0.000027 - momentum: 0.000000
2023-10-25 21:10:10,589 epoch 1 - iter 290/292 - loss 0.68903885 - time (sec): 13.11 - samples/sec: 3376.55 - lr: 0.000030 - momentum: 0.000000
2023-10-25 21:10:10,669 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:10,669 EPOCH 1 done: loss 0.6887 - lr: 0.000030
2023-10-25 21:10:11,337 DEV : loss 0.14741386473178864 - f1-score (micro avg) 0.5684
2023-10-25 21:10:11,341 saving best model
2023-10-25 21:10:11,804 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:13,116 epoch 2 - iter 29/292 - loss 0.19802987 - time (sec): 1.31 - samples/sec: 3532.37 - lr: 0.000030 - momentum: 0.000000
2023-10-25 21:10:14,467 epoch 2 - iter 58/292 - loss 0.16963628 - time (sec): 2.66 - samples/sec: 3655.77 - lr: 0.000029 - momentum: 0.000000
2023-10-25 21:10:15,747 epoch 2 - iter 87/292 - loss 0.17025510 - time (sec): 3.94 - samples/sec: 3550.44 - lr: 0.000029 - momentum: 0.000000
2023-10-25 21:10:17,076 epoch 2 - iter 116/292 - loss 0.16977186 - time (sec): 5.27 - samples/sec: 3463.69 - lr: 0.000029 - momentum: 0.000000
2023-10-25 21:10:18,323 epoch 2 - iter 145/292 - loss 0.16943346 - time (sec): 6.52 - samples/sec: 3405.57 - lr: 0.000028 - momentum: 0.000000
2023-10-25 21:10:19,568 epoch 2 - iter 174/292 - loss 0.17536090 - time (sec): 7.76 - samples/sec: 3354.69 - lr: 0.000028 - momentum: 0.000000
2023-10-25 21:10:20,803 epoch 2 - iter 203/292 - loss 0.17323557 - time (sec): 9.00 - samples/sec: 3361.80 - lr: 0.000028 - momentum: 0.000000
2023-10-25 21:10:22,166 epoch 2 - iter 232/292 - loss 0.16456716 - time (sec): 10.36 - samples/sec: 3373.62 - lr: 0.000027 - momentum: 0.000000
2023-10-25 21:10:23,436 epoch 2 - iter 261/292 - loss 0.16135994 - time (sec): 11.63 - samples/sec: 3407.22 - lr: 0.000027 - momentum: 0.000000
2023-10-25 21:10:24,699 epoch 2 - iter 290/292 - loss 0.16019950 - time (sec): 12.89 - samples/sec: 3421.71 - lr: 0.000027 - momentum: 0.000000
2023-10-25 21:10:24,782 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:24,783 EPOCH 2 done: loss 0.1601 - lr: 0.000027
2023-10-25 21:10:25,689 DEV : loss 0.1006804034113884 - f1-score (micro avg) 0.7293
2023-10-25 21:10:25,694 saving best model
2023-10-25 21:10:26,488 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:27,806 epoch 3 - iter 29/292 - loss 0.07594065 - time (sec): 1.31 - samples/sec: 3394.59 - lr: 0.000026 - momentum: 0.000000
2023-10-25 21:10:29,004 epoch 3 - iter 58/292 - loss 0.07959637 - time (sec): 2.51 - samples/sec: 3064.08 - lr: 0.000026 - momentum: 0.000000
2023-10-25 21:10:30,354 epoch 3 - iter 87/292 - loss 0.08406265 - time (sec): 3.86 - samples/sec: 3187.84 - lr: 0.000026 - momentum: 0.000000
2023-10-25 21:10:31,639 epoch 3 - iter 116/292 - loss 0.08390541 - time (sec): 5.15 - samples/sec: 3101.09 - lr: 0.000025 - momentum: 0.000000
2023-10-25 21:10:33,068 epoch 3 - iter 145/292 - loss 0.08728900 - time (sec): 6.58 - samples/sec: 3334.11 - lr: 0.000025 - momentum: 0.000000
2023-10-25 21:10:34,367 epoch 3 - iter 174/292 - loss 0.08901608 - time (sec): 7.88 - samples/sec: 3386.20 - lr: 0.000025 - momentum: 0.000000
2023-10-25 21:10:35,652 epoch 3 - iter 203/292 - loss 0.09014633 - time (sec): 9.16 - samples/sec: 3410.32 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:10:36,942 epoch 3 - iter 232/292 - loss 0.08866790 - time (sec): 10.45 - samples/sec: 3369.56 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:10:38,213 epoch 3 - iter 261/292 - loss 0.08888259 - time (sec): 11.72 - samples/sec: 3339.15 - lr: 0.000024 - momentum: 0.000000
2023-10-25 21:10:39,580 epoch 3 - iter 290/292 - loss 0.08998420 - time (sec): 13.09 - samples/sec: 3344.03 - lr: 0.000023 - momentum: 0.000000
2023-10-25 21:10:39,684 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:39,685 EPOCH 3 done: loss 0.0907 - lr: 0.000023
2023-10-25 21:10:40,596 DEV : loss 0.10108631104230881 - f1-score (micro avg) 0.7149
2023-10-25 21:10:40,601 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:41,978 epoch 4 - iter 29/292 - loss 0.06981579 - time (sec): 1.38 - samples/sec: 3683.46 - lr: 0.000023 - momentum: 0.000000
2023-10-25 21:10:43,335 epoch 4 - iter 58/292 - loss 0.06503789 - time (sec): 2.73 - samples/sec: 3312.15 - lr: 0.000023 - momentum: 0.000000
2023-10-25 21:10:44,674 epoch 4 - iter 87/292 - loss 0.06470976 - time (sec): 4.07 - samples/sec: 3261.96 - lr: 0.000022 - momentum: 0.000000
2023-10-25 21:10:45,984 epoch 4 - iter 116/292 - loss 0.06055619 - time (sec): 5.38 - samples/sec: 3217.83 - lr: 0.000022 - momentum: 0.000000
2023-10-25 21:10:47,230 epoch 4 - iter 145/292 - loss 0.05764284 - time (sec): 6.63 - samples/sec: 3167.14 - lr: 0.000022 - momentum: 0.000000
2023-10-25 21:10:48,543 epoch 4 - iter 174/292 - loss 0.06190221 - time (sec): 7.94 - samples/sec: 3228.80 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:10:49,817 epoch 4 - iter 203/292 - loss 0.06220813 - time (sec): 9.22 - samples/sec: 3221.19 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:10:51,171 epoch 4 - iter 232/292 - loss 0.06042297 - time (sec): 10.57 - samples/sec: 3177.20 - lr: 0.000021 - momentum: 0.000000
2023-10-25 21:10:52,526 epoch 4 - iter 261/292 - loss 0.06188990 - time (sec): 11.92 - samples/sec: 3277.30 - lr: 0.000020 - momentum: 0.000000
2023-10-25 21:10:53,812 epoch 4 - iter 290/292 - loss 0.06067152 - time (sec): 13.21 - samples/sec: 3353.33 - lr: 0.000020 - momentum: 0.000000
2023-10-25 21:10:53,891 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:53,892 EPOCH 4 done: loss 0.0605 - lr: 0.000020
2023-10-25 21:10:54,800 DEV : loss 0.12220078706741333 - f1-score (micro avg) 0.7566
2023-10-25 21:10:54,805 saving best model
2023-10-25 21:10:55,307 ----------------------------------------------------------------------------------------------------
2023-10-25 21:10:56,618 epoch 5 - iter 29/292 - loss 0.07226843 - time (sec): 1.31 - samples/sec: 3651.83 - lr: 0.000020 - momentum: 0.000000
2023-10-25 21:10:57,855 epoch 5 - iter 58/292 - loss 0.05576230 - time (sec): 2.55 - samples/sec: 3360.82 - lr: 0.000019 - momentum: 0.000000
2023-10-25 21:10:59,178 epoch 5 - iter 87/292 - loss 0.05222793 - time (sec): 3.87 - samples/sec: 3532.22 - lr: 0.000019 - momentum: 0.000000
2023-10-25 21:11:00,461 epoch 5 - iter 116/292 - loss 0.04893205 - time (sec): 5.15 - samples/sec: 3442.37 - lr: 0.000019 - momentum: 0.000000
2023-10-25 21:11:01,744 epoch 5 - iter 145/292 - loss 0.04364075 - time (sec): 6.44 - samples/sec: 3410.85 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:11:03,003 epoch 5 - iter 174/292 - loss 0.04155609 - time (sec): 7.70 - samples/sec: 3347.83 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:11:04,289 epoch 5 - iter 203/292 - loss 0.04349718 - time (sec): 8.98 - samples/sec: 3332.55 - lr: 0.000018 - momentum: 0.000000
2023-10-25 21:11:05,576 epoch 5 - iter 232/292 - loss 0.04323634 - time (sec): 10.27 - samples/sec: 3396.22 - lr: 0.000017 - momentum: 0.000000
2023-10-25 21:11:06,959 epoch 5 - iter 261/292 - loss 0.04192164 - time (sec): 11.65 - samples/sec: 3438.20 - lr: 0.000017 - momentum: 0.000000
2023-10-25 21:11:08,205 epoch 5 - iter 290/292 - loss 0.04216845 - time (sec): 12.90 - samples/sec: 3436.94 - lr: 0.000017 - momentum: 0.000000
2023-10-25 21:11:08,280 ----------------------------------------------------------------------------------------------------
2023-10-25 21:11:08,280 EPOCH 5 done: loss 0.0421 - lr: 0.000017
2023-10-25 21:11:09,192 DEV : loss 0.14462324976921082 - f1-score (micro avg) 0.7615
2023-10-25 21:11:09,197 saving best model
2023-10-25 21:11:09,811 ----------------------------------------------------------------------------------------------------
2023-10-25 21:11:11,084 epoch 6 - iter 29/292 - loss 0.03071517 - time (sec): 1.27 - samples/sec: 3367.38 - lr: 0.000016 - momentum: 0.000000
2023-10-25 21:11:12,392 epoch 6 - iter 58/292 - loss 0.03420363 - time (sec): 2.58 - samples/sec: 3399.63 - lr: 0.000016 - momentum: 0.000000
2023-10-25 21:11:13,735 epoch 6 - iter 87/292 - loss 0.02882773 - time (sec): 3.92 - samples/sec: 3494.61 - lr: 0.000016 - momentum: 0.000000
2023-10-25 21:11:15,013 epoch 6 - iter 116/292 - loss 0.02863387 - time (sec): 5.20 - samples/sec: 3502.54 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:11:16,288 epoch 6 - iter 145/292 - loss 0.02715294 - time (sec): 6.47 - samples/sec: 3430.36 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:11:17,594 epoch 6 - iter 174/292 - loss 0.02561636 - time (sec): 7.78 - samples/sec: 3368.67 - lr: 0.000015 - momentum: 0.000000
2023-10-25 21:11:18,908 epoch 6 - iter 203/292 - loss 0.02415975 - time (sec): 9.09 - samples/sec: 3379.04 - lr: 0.000014 - momentum: 0.000000
2023-10-25 21:11:20,254 epoch 6 - iter 232/292 - loss 0.02654283 - time (sec): 10.44 - samples/sec: 3388.04 - lr: 0.000014 - momentum: 0.000000
2023-10-25 21:11:21,502 epoch 6 - iter 261/292 - loss 0.02923450 - time (sec): 11.69 - samples/sec: 3425.09 - lr: 0.000014 - momentum: 0.000000
2023-10-25 21:11:22,697 epoch 6 - iter 290/292 - loss 0.02964102 - time (sec): 12.88 - samples/sec: 3434.90 - lr: 0.000013 - momentum: 0.000000
2023-10-25 21:11:22,773 ----------------------------------------------------------------------------------------------------
2023-10-25 21:11:22,774 EPOCH 6 done: loss 0.0298 - lr: 0.000013
2023-10-25 21:11:23,685 DEV : loss 0.14538371562957764 - f1-score (micro avg) 0.7451
2023-10-25 21:11:23,689 ----------------------------------------------------------------------------------------------------
2023-10-25 21:11:24,920 epoch 7 - iter 29/292 - loss 0.01672833 - time (sec): 1.23 - samples/sec: 3498.10 - lr: 0.000013 - momentum: 0.000000
2023-10-25 21:11:26,397 epoch 7 - iter 58/292 - loss 0.02256440 - time (sec): 2.71 - samples/sec: 3800.73 - lr: 0.000013 - momentum: 0.000000
2023-10-25 21:11:27,816 epoch 7 - iter 87/292 - loss 0.02204005 - time (sec): 4.13 - samples/sec: 3378.64 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:11:29,078 epoch 7 - iter 116/292 - loss 0.02236449 - time (sec): 5.39 - samples/sec: 3287.29 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:11:30,408 epoch 7 - iter 145/292 - loss 0.02209624 - time (sec): 6.72 - samples/sec: 3341.87 - lr: 0.000012 - momentum: 0.000000
2023-10-25 21:11:31,689 epoch 7 - iter 174/292 - loss 0.02162009 - time (sec): 8.00 - samples/sec: 3385.55 - lr: 0.000011 - momentum: 0.000000
2023-10-25 21:11:32,976 epoch 7 - iter 203/292 - loss 0.02205763 - time (sec): 9.29 - samples/sec: 3396.11 - lr: 0.000011 - momentum: 0.000000
2023-10-25 21:11:34,194 epoch 7 - iter 232/292 - loss 0.02172002 - time (sec): 10.50 - samples/sec: 3344.86 - lr: 0.000011 - momentum: 0.000000
2023-10-25 21:11:35,497 epoch 7 - iter 261/292 - loss 0.02105659 - time (sec): 11.81 - samples/sec: 3360.74 - lr: 0.000010 - momentum: 0.000000
2023-10-25 21:11:36,791 epoch 7 - iter 290/292 - loss 0.01955965 - time (sec): 13.10 - samples/sec: 3379.17 - lr: 0.000010 - momentum: 0.000000
2023-10-25 21:11:36,874 ----------------------------------------------------------------------------------------------------
2023-10-25 21:11:36,874 EPOCH 7 done: loss 0.0195 - lr: 0.000010
2023-10-25 21:11:37,790 DEV : loss 0.13841482996940613 - f1-score (micro avg) 0.7592
2023-10-25 21:11:37,794 ----------------------------------------------------------------------------------------------------
2023-10-25 21:11:39,074 epoch 8 - iter 29/292 - loss 0.02671843 - time (sec): 1.28 - samples/sec: 3108.12 - lr: 0.000010 - momentum: 0.000000
2023-10-25 21:11:40,476 epoch 8 - iter 58/292 - loss 0.02178925 - time (sec): 2.68 - samples/sec: 3397.93 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:11:41,813 epoch 8 - iter 87/292 - loss 0.01899837 - time (sec): 4.02 - samples/sec: 3467.31 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:11:43,060 epoch 8 - iter 116/292 - loss 0.02281207 - time (sec): 5.27 - samples/sec: 3471.94 - lr: 0.000009 - momentum: 0.000000
2023-10-25 21:11:44,356 epoch 8 - iter 145/292 - loss 0.02038146 - time (sec): 6.56 - samples/sec: 3427.58 - lr: 0.000008 - momentum: 0.000000
2023-10-25 21:11:45,606 epoch 8 - iter 174/292 - loss 0.01938016 - time (sec): 7.81 - samples/sec: 3442.67 - lr: 0.000008 - momentum: 0.000000
2023-10-25 21:11:46,852 epoch 8 - iter 203/292 - loss 0.01829205 - time (sec): 9.06 - samples/sec: 3394.12 - lr: 0.000008 - momentum: 0.000000
2023-10-25 21:11:48,119 epoch 8 - iter 232/292 - loss 0.01759759 - time (sec): 10.32 - samples/sec: 3435.64 - lr: 0.000007 - momentum: 0.000000
2023-10-25 21:11:49,392 epoch 8 - iter 261/292 - loss 0.01611240 - time (sec): 11.60 - samples/sec: 3441.24 - lr: 0.000007 - momentum: 0.000000
2023-10-25 21:11:50,615 epoch 8 - iter 290/292 - loss 0.01512526 - time (sec): 12.82 - samples/sec: 3457.28 - lr: 0.000007 - momentum: 0.000000
2023-10-25 21:11:50,691 ----------------------------------------------------------------------------------------------------
2023-10-25 21:11:50,691 EPOCH 8 done: loss 0.0151 - lr: 0.000007
2023-10-25 21:11:51,601 DEV : loss 0.158660426735878 - f1-score (micro avg) 0.7716
2023-10-25 21:11:51,605 saving best model
2023-10-25 21:11:52,218 ----------------------------------------------------------------------------------------------------
2023-10-25 21:11:53,533 epoch 9 - iter 29/292 - loss 0.00434558 - time (sec): 1.31 - samples/sec: 3611.31 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:11:54,797 epoch 9 - iter 58/292 - loss 0.00498322 - time (sec): 2.58 - samples/sec: 3555.16 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:11:56,049 epoch 9 - iter 87/292 - loss 0.00460247 - time (sec): 3.83 - samples/sec: 3395.57 - lr: 0.000006 - momentum: 0.000000
2023-10-25 21:11:57,416 epoch 9 - iter 116/292 - loss 0.00516195 - time (sec): 5.20 - samples/sec: 3445.47 - lr: 0.000005 - momentum: 0.000000
2023-10-25 21:11:58,788 epoch 9 - iter 145/292 - loss 0.00698851 - time (sec): 6.57 - samples/sec: 3472.08 - lr: 0.000005 - momentum: 0.000000
2023-10-25 21:12:00,089 epoch 9 - iter 174/292 - loss 0.00881053 - time (sec): 7.87 - samples/sec: 3480.23 - lr: 0.000005 - momentum: 0.000000
2023-10-25 21:12:01,400 epoch 9 - iter 203/292 - loss 0.00796033 - time (sec): 9.18 - samples/sec: 3444.05 - lr: 0.000004 - momentum: 0.000000
2023-10-25 21:12:02,643 epoch 9 - iter 232/292 - loss 0.00873789 - time (sec): 10.42 - samples/sec: 3416.19 - lr: 0.000004 - momentum: 0.000000
2023-10-25 21:12:03,950 epoch 9 - iter 261/292 - loss 0.00842511 - time (sec): 11.73 - samples/sec: 3381.93 - lr: 0.000004 - momentum: 0.000000
2023-10-25 21:12:05,264 epoch 9 - iter 290/292 - loss 0.00863425 - time (sec): 13.04 - samples/sec: 3387.58 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:12:05,342 ----------------------------------------------------------------------------------------------------
2023-10-25 21:12:05,342 EPOCH 9 done: loss 0.0086 - lr: 0.000003
2023-10-25 21:12:06,262 DEV : loss 0.1723966747522354 - f1-score (micro avg) 0.7479
2023-10-25 21:12:06,266 ----------------------------------------------------------------------------------------------------
2023-10-25 21:12:07,519 epoch 10 - iter 29/292 - loss 0.00123487 - time (sec): 1.25 - samples/sec: 3447.36 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:12:08,777 epoch 10 - iter 58/292 - loss 0.00792751 - time (sec): 2.51 - samples/sec: 3468.05 - lr: 0.000003 - momentum: 0.000000
2023-10-25 21:12:10,077 epoch 10 - iter 87/292 - loss 0.01303484 - time (sec): 3.81 - samples/sec: 3461.35 - lr: 0.000002 - momentum: 0.000000
2023-10-25 21:12:11,290 epoch 10 - iter 116/292 - loss 0.01034843 - time (sec): 5.02 - samples/sec: 3469.73 - lr: 0.000002 - momentum: 0.000000
2023-10-25 21:12:12,506 epoch 10 - iter 145/292 - loss 0.00886202 - time (sec): 6.24 - samples/sec: 3469.77 - lr: 0.000002 - momentum: 0.000000
2023-10-25 21:12:13,730 epoch 10 - iter 174/292 - loss 0.00848014 - time (sec): 7.46 - samples/sec: 3435.23 - lr: 0.000001 - momentum: 0.000000
2023-10-25 21:12:15,122 epoch 10 - iter 203/292 - loss 0.00876510 - time (sec): 8.86 - samples/sec: 3504.95 - lr: 0.000001 - momentum: 0.000000
2023-10-25 21:12:16,413 epoch 10 - iter 232/292 - loss 0.00952739 - time (sec): 10.15 - samples/sec: 3475.76 - lr: 0.000001 - momentum: 0.000000
2023-10-25 21:12:17,803 epoch 10 - iter 261/292 - loss 0.00885530 - time (sec): 11.54 - samples/sec: 3462.61 - lr: 0.000000 - momentum: 0.000000
2023-10-25 21:12:19,084 epoch 10 - iter 290/292 - loss 0.00831916 - time (sec): 12.82 - samples/sec: 3445.59 - lr: 0.000000 - momentum: 0.000000
2023-10-25 21:12:19,171 ----------------------------------------------------------------------------------------------------
2023-10-25 21:12:19,171 EPOCH 10 done: loss 0.0083 - lr: 0.000000
2023-10-25 21:12:20,093 DEV : loss 0.17783689498901367 - f1-score (micro avg) 0.7511
2023-10-25 21:12:20,562 ----------------------------------------------------------------------------------------------------
2023-10-25 21:12:20,563 Loading model from best epoch ...
2023-10-25 21:12:22,175 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-25 21:12:23,911
Results:
- F-score (micro) 0.7554
- F-score (macro) 0.6669
- Accuracy 0.6324
By class:
              precision    recall  f1-score   support

         PER     0.7733    0.8333    0.8022       348
         LOC     0.7063    0.8199    0.7589       261
         ORG     0.4583    0.4231    0.4400        52
   HumanProd     0.6154    0.7273    0.6667        22

   micro avg     0.7207    0.7936    0.7554       683
   macro avg     0.6383    0.7009    0.6669       683
weighted avg     0.7186    0.7936    0.7537       683
2023-10-25 21:12:23,911 ----------------------------------------------------------------------------------------------------
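Note on the final scores: the averaged rows of the "By class" table can be re-derived from the per-class rows. The sketch below is not part of the training run. It copies the numbers from the table by hand and assumes the usual definitions: F1 as the harmonic mean of precision and recall, macro as the unweighted class mean, and weighted as the support-weighted class mean. The names `rows` and `f1` are illustrative only.

```python
# Per-class rows copied from the "By class" table above:
# (precision, recall, f1-score, support)
rows = {
    "PER": (0.7733, 0.8333, 0.8022, 348),
    "LOC": (0.7063, 0.8199, 0.7589, 261),
    "ORG": (0.4583, 0.4231, 0.4400, 52),
    "HumanProd": (0.6154, 0.7273, 0.6667, 22),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Micro F1 from the reported micro-averaged precision and recall.
micro_f1 = f1(0.7207, 0.7936)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(r[2] for r in rows.values()) / len(rows)

# Weighted F1: per-class F1 weighted by support.
total_support = sum(r[3] for r in rows.values())
weighted_f1 = sum(r[2] * r[3] for r in rows.values()) / total_support

print(f"support={total_support} micro={micro_f1:.4f} "
      f"macro={macro_f1:.4f} weighted={weighted_f1:.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed values may differ from the logged ones in the last digit.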