2023-10-11 01:09:15,579 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,581 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 01:09:15,581 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-11 01:09:15,582 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 Train: 1166 sentences
2023-10-11 01:09:15,582 (train_with_dev=False, train_with_test=False)
2023-10-11 01:09:15,582 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 Training Params:
2023-10-11 01:09:15,582 - learning_rate: "0.00016"
2023-10-11 01:09:15,582 - mini_batch_size: "4"
2023-10-11 01:09:15,582 - max_epochs: "10"
2023-10-11 01:09:15,582 - shuffle: "True"
2023-10-11 01:09:15,582 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,582 Plugins:
2023-10-11 01:09:15,582 - TensorboardLogger
2023-10-11 01:09:15,583 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 01:09:15,583 - metric: "('micro avg', 'f1-score')"
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Computation:
2023-10-11 01:09:15,583 - compute on device: cuda:0
2023-10-11 01:09:15,583 - embedding storage: none
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Model training base path: "hmbench-newseye/fi-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 ----------------------------------------------------------------------------------------------------
2023-10-11 01:09:15,583 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 01:09:24,274 epoch 1 - iter 29/292 - loss 2.82138332 - time (sec): 8.69 - samples/sec: 445.61 - lr: 0.000015 - momentum: 0.000000
2023-10-11 01:09:33,645 epoch 1 - iter 58/292 - loss 2.81043074 - time (sec): 18.06 - samples/sec: 465.46 - lr: 0.000031 - momentum: 0.000000
2023-10-11 01:09:42,777 epoch 1 - iter 87/292 - loss 2.78816206 - time (sec): 27.19 - samples/sec: 461.02 - lr: 0.000047 - momentum: 0.000000
2023-10-11 01:09:51,955 epoch 1 - iter 116/292 - loss 2.72134020 - time (sec): 36.37 - samples/sec: 461.95 - lr: 0.000063 - momentum: 0.000000
2023-10-11 01:10:01,600 epoch 1 - iter 145/292 - loss 2.62129296 - time (sec): 46.01 - samples/sec: 466.00 - lr: 0.000079 - momentum: 0.000000
2023-10-11 01:10:12,265 epoch 1 - iter 174/292 - loss 2.51278156 - time (sec): 56.68 - samples/sec: 470.71 - lr: 0.000095 - momentum: 0.000000
2023-10-11 01:10:22,717 epoch 1 - iter 203/292 - loss 2.39802974 - time (sec): 67.13 - samples/sec: 468.38 - lr: 0.000111 - momentum: 0.000000
2023-10-11 01:10:32,310 epoch 1 - iter 232/292 - loss 2.29971317 - time (sec): 76.72 - samples/sec: 459.07 - lr: 0.000127 - momentum: 0.000000
2023-10-11 01:10:42,367 epoch 1 - iter 261/292 - loss 2.17326226 - time (sec): 86.78 - samples/sec: 455.96 - lr: 0.000142 - momentum: 0.000000
2023-10-11 01:10:53,044 epoch 1 - iter 290/292 - loss 2.04893235 - time (sec): 97.46 - samples/sec: 451.82 - lr: 0.000158 - momentum: 0.000000
2023-10-11 01:10:53,766 ----------------------------------------------------------------------------------------------------
2023-10-11 01:10:53,766 EPOCH 1 done: loss 2.0372 - lr: 0.000158
2023-10-11 01:10:59,447 DEV : loss 0.6696223616600037 - f1-score (micro avg) 0.0
2023-10-11 01:10:59,457 ----------------------------------------------------------------------------------------------------
2023-10-11 01:11:08,593 epoch 2 - iter 29/292 - loss 0.71111290 - time (sec): 9.13 - samples/sec: 431.77 - lr: 0.000158 - momentum: 0.000000
2023-10-11 01:11:18,636 epoch 2 - iter 58/292 - loss 0.67102332 - time (sec): 19.18 - samples/sec: 416.01 - lr: 0.000157 - momentum: 0.000000
2023-10-11 01:11:29,392 epoch 2 - iter 87/292 - loss 0.64741993 - time (sec): 29.93 - samples/sec: 408.28 - lr: 0.000155 - momentum: 0.000000
2023-10-11 01:11:39,986 epoch 2 - iter 116/292 - loss 0.62252721 - time (sec): 40.53 - samples/sec: 410.46 - lr: 0.000153 - momentum: 0.000000
2023-10-11 01:11:50,075 epoch 2 - iter 145/292 - loss 0.57528282 - time (sec): 50.62 - samples/sec: 421.13 - lr: 0.000151 - momentum: 0.000000
2023-10-11 01:12:00,596 epoch 2 - iter 174/292 - loss 0.57623628 - time (sec): 61.14 - samples/sec: 423.72 - lr: 0.000149 - momentum: 0.000000
2023-10-11 01:12:10,194 epoch 2 - iter 203/292 - loss 0.55821018 - time (sec): 70.74 - samples/sec: 425.76 - lr: 0.000148 - momentum: 0.000000
2023-10-11 01:12:19,878 epoch 2 - iter 232/292 - loss 0.53215406 - time (sec): 80.42 - samples/sec: 431.54 - lr: 0.000146 - momentum: 0.000000
2023-10-11 01:12:29,269 epoch 2 - iter 261/292 - loss 0.51244315 - time (sec): 89.81 - samples/sec: 433.02 - lr: 0.000144 - momentum: 0.000000
2023-10-11 01:12:39,746 epoch 2 - iter 290/292 - loss 0.49473626 - time (sec): 100.29 - samples/sec: 439.91 - lr: 0.000142 - momentum: 0.000000
2023-10-11 01:12:40,311 ----------------------------------------------------------------------------------------------------
2023-10-11 01:12:40,312 EPOCH 2 done: loss 0.4934 - lr: 0.000142
2023-10-11 01:12:46,218 DEV : loss 0.28755611181259155 - f1-score (micro avg) 0.2051
2023-10-11 01:12:46,227 saving best model
2023-10-11 01:12:47,301 ----------------------------------------------------------------------------------------------------
2023-10-11 01:12:57,364 epoch 3 - iter 29/292 - loss 0.36695085 - time (sec): 10.06 - samples/sec: 505.45 - lr: 0.000141 - momentum: 0.000000
2023-10-11 01:13:07,801 epoch 3 - iter 58/292 - loss 0.32901598 - time (sec): 20.50 - samples/sec: 504.89 - lr: 0.000139 - momentum: 0.000000
2023-10-11 01:13:17,256 epoch 3 - iter 87/292 - loss 0.36851166 - time (sec): 29.95 - samples/sec: 497.62 - lr: 0.000137 - momentum: 0.000000
2023-10-11 01:13:26,734 epoch 3 - iter 116/292 - loss 0.35011302 - time (sec): 39.43 - samples/sec: 479.28 - lr: 0.000135 - momentum: 0.000000
2023-10-11 01:13:37,161 epoch 3 - iter 145/292 - loss 0.33484524 - time (sec): 49.86 - samples/sec: 482.01 - lr: 0.000133 - momentum: 0.000000
2023-10-11 01:13:46,264 epoch 3 - iter 174/292 - loss 0.32914702 - time (sec): 58.96 - samples/sec: 474.27 - lr: 0.000132 - momentum: 0.000000
2023-10-11 01:13:55,667 epoch 3 - iter 203/292 - loss 0.31846277 - time (sec): 68.36 - samples/sec: 469.10 - lr: 0.000130 - momentum: 0.000000
2023-10-11 01:14:03,997 epoch 3 - iter 232/292 - loss 0.31671133 - time (sec): 76.69 - samples/sec: 463.95 - lr: 0.000128 - momentum: 0.000000
2023-10-11 01:14:12,410 epoch 3 - iter 261/292 - loss 0.31151005 - time (sec): 85.11 - samples/sec: 459.35 - lr: 0.000126 - momentum: 0.000000
2023-10-11 01:14:22,271 epoch 3 - iter 290/292 - loss 0.30167347 - time (sec): 94.97 - samples/sec: 464.61 - lr: 0.000125 - momentum: 0.000000
2023-10-11 01:14:22,839 ----------------------------------------------------------------------------------------------------
2023-10-11 01:14:22,839 EPOCH 3 done: loss 0.3006 - lr: 0.000125
2023-10-11 01:14:28,418 DEV : loss 0.20087367296218872 - f1-score (micro avg) 0.549
2023-10-11 01:14:28,431 saving best model
2023-10-11 01:14:34,733 ----------------------------------------------------------------------------------------------------
2023-10-11 01:14:43,795 epoch 4 - iter 29/292 - loss 0.21083576 - time (sec): 9.06 - samples/sec: 439.73 - lr: 0.000123 - momentum: 0.000000
2023-10-11 01:14:53,684 epoch 4 - iter 58/292 - loss 0.21088872 - time (sec): 18.95 - samples/sec: 469.36 - lr: 0.000121 - momentum: 0.000000
2023-10-11 01:15:02,612 epoch 4 - iter 87/292 - loss 0.20714432 - time (sec): 27.87 - samples/sec: 454.79 - lr: 0.000119 - momentum: 0.000000
2023-10-11 01:15:13,032 epoch 4 - iter 116/292 - loss 0.21322282 - time (sec): 38.29 - samples/sec: 447.09 - lr: 0.000117 - momentum: 0.000000
2023-10-11 01:15:24,001 epoch 4 - iter 145/292 - loss 0.21851823 - time (sec): 49.26 - samples/sec: 452.79 - lr: 0.000116 - momentum: 0.000000
2023-10-11 01:15:33,267 epoch 4 - iter 174/292 - loss 0.21415411 - time (sec): 58.53 - samples/sec: 447.76 - lr: 0.000114 - momentum: 0.000000
2023-10-11 01:15:42,652 epoch 4 - iter 203/292 - loss 0.20685407 - time (sec): 67.91 - samples/sec: 451.13 - lr: 0.000112 - momentum: 0.000000
2023-10-11 01:15:51,860 epoch 4 - iter 232/292 - loss 0.20571818 - time (sec): 77.12 - samples/sec: 454.82 - lr: 0.000110 - momentum: 0.000000
2023-10-11 01:16:01,069 epoch 4 - iter 261/292 - loss 0.20428922 - time (sec): 86.33 - samples/sec: 454.53 - lr: 0.000109 - momentum: 0.000000
2023-10-11 01:16:11,317 epoch 4 - iter 290/292 - loss 0.19531517 - time (sec): 96.58 - samples/sec: 459.32 - lr: 0.000107 - momentum: 0.000000
2023-10-11 01:16:11,698 ----------------------------------------------------------------------------------------------------
2023-10-11 01:16:11,699 EPOCH 4 done: loss 0.1951 - lr: 0.000107
2023-10-11 01:16:17,290 DEV : loss 0.15432208776474 - f1-score (micro avg) 0.7049
2023-10-11 01:16:17,299 saving best model
2023-10-11 01:16:26,781 ----------------------------------------------------------------------------------------------------
2023-10-11 01:16:36,180 epoch 5 - iter 29/292 - loss 0.14839972 - time (sec): 9.39 - samples/sec: 495.53 - lr: 0.000105 - momentum: 0.000000
2023-10-11 01:16:45,355 epoch 5 - iter 58/292 - loss 0.12955448 - time (sec): 18.57 - samples/sec: 482.74 - lr: 0.000103 - momentum: 0.000000
2023-10-11 01:16:54,182 epoch 5 - iter 87/292 - loss 0.14171934 - time (sec): 27.40 - samples/sec: 470.00 - lr: 0.000101 - momentum: 0.000000
2023-10-11 01:17:03,515 epoch 5 - iter 116/292 - loss 0.15582821 - time (sec): 36.73 - samples/sec: 461.41 - lr: 0.000100 - momentum: 0.000000
2023-10-11 01:17:13,187 epoch 5 - iter 145/292 - loss 0.14216304 - time (sec): 46.40 - samples/sec: 465.31 - lr: 0.000098 - momentum: 0.000000
2023-10-11 01:17:23,390 epoch 5 - iter 174/292 - loss 0.13851691 - time (sec): 56.60 - samples/sec: 474.52 - lr: 0.000096 - momentum: 0.000000
2023-10-11 01:17:32,810 epoch 5 - iter 203/292 - loss 0.13535683 - time (sec): 66.02 - samples/sec: 477.46 - lr: 0.000094 - momentum: 0.000000
2023-10-11 01:17:42,124 epoch 5 - iter 232/292 - loss 0.13076921 - time (sec): 75.34 - samples/sec: 476.46 - lr: 0.000093 - momentum: 0.000000
2023-10-11 01:17:51,626 epoch 5 - iter 261/292 - loss 0.12830175 - time (sec): 84.84 - samples/sec: 478.51 - lr: 0.000091 - momentum: 0.000000
2023-10-11 01:18:00,202 epoch 5 - iter 290/292 - loss 0.12627346 - time (sec): 93.42 - samples/sec: 473.61 - lr: 0.000089 - momentum: 0.000000
2023-10-11 01:18:00,666 ----------------------------------------------------------------------------------------------------
2023-10-11 01:18:00,666 EPOCH 5 done: loss 0.1260 - lr: 0.000089
2023-10-11 01:18:06,406 DEV : loss 0.1440184861421585 - f1-score (micro avg) 0.7292
2023-10-11 01:18:06,416 saving best model
2023-10-11 01:18:13,003 ----------------------------------------------------------------------------------------------------
2023-10-11 01:18:22,831 epoch 6 - iter 29/292 - loss 0.07376508 - time (sec): 9.82 - samples/sec: 505.21 - lr: 0.000087 - momentum: 0.000000
2023-10-11 01:18:32,018 epoch 6 - iter 58/292 - loss 0.08073227 - time (sec): 19.01 - samples/sec: 475.57 - lr: 0.000085 - momentum: 0.000000
2023-10-11 01:18:41,172 epoch 6 - iter 87/292 - loss 0.07525013 - time (sec): 28.16 - samples/sec: 466.55 - lr: 0.000084 - momentum: 0.000000
2023-10-11 01:18:50,865 epoch 6 - iter 116/292 - loss 0.07204318 - time (sec): 37.85 - samples/sec: 469.81 - lr: 0.000082 - momentum: 0.000000
2023-10-11 01:18:59,907 epoch 6 - iter 145/292 - loss 0.08487665 - time (sec): 46.90 - samples/sec: 461.45 - lr: 0.000080 - momentum: 0.000000
2023-10-11 01:19:10,685 epoch 6 - iter 174/292 - loss 0.09333786 - time (sec): 57.67 - samples/sec: 475.69 - lr: 0.000078 - momentum: 0.000000
2023-10-11 01:19:20,158 epoch 6 - iter 203/292 - loss 0.09481407 - time (sec): 67.15 - samples/sec: 471.43 - lr: 0.000077 - momentum: 0.000000
2023-10-11 01:19:29,890 epoch 6 - iter 232/292 - loss 0.09180752 - time (sec): 76.88 - samples/sec: 470.53 - lr: 0.000075 - momentum: 0.000000
2023-10-11 01:19:38,917 epoch 6 - iter 261/292 - loss 0.09055575 - time (sec): 85.91 - samples/sec: 467.31 - lr: 0.000073 - momentum: 0.000000
2023-10-11 01:19:48,145 epoch 6 - iter 290/292 - loss 0.08929013 - time (sec): 95.13 - samples/sec: 465.70 - lr: 0.000071 - momentum: 0.000000
2023-10-11 01:19:48,560 ----------------------------------------------------------------------------------------------------
2023-10-11 01:19:48,561 EPOCH 6 done: loss 0.0892 - lr: 0.000071
2023-10-11 01:19:54,239 DEV : loss 0.1254325956106186 - f1-score (micro avg) 0.7407
2023-10-11 01:19:54,249 saving best model
2023-10-11 01:20:02,471 ----------------------------------------------------------------------------------------------------
2023-10-11 01:20:11,914 epoch 7 - iter 29/292 - loss 0.06249801 - time (sec): 9.44 - samples/sec: 504.60 - lr: 0.000069 - momentum: 0.000000
2023-10-11 01:20:21,557 epoch 7 - iter 58/292 - loss 0.06865511 - time (sec): 19.08 - samples/sec: 511.26 - lr: 0.000068 - momentum: 0.000000
2023-10-11 01:20:31,122 epoch 7 - iter 87/292 - loss 0.06644168 - time (sec): 28.65 - samples/sec: 486.54 - lr: 0.000066 - momentum: 0.000000
2023-10-11 01:20:40,308 epoch 7 - iter 116/292 - loss 0.06012363 - time (sec): 37.83 - samples/sec: 478.10 - lr: 0.000064 - momentum: 0.000000
2023-10-11 01:20:49,672 epoch 7 - iter 145/292 - loss 0.06400556 - time (sec): 47.20 - samples/sec: 471.22 - lr: 0.000062 - momentum: 0.000000
2023-10-11 01:20:58,515 epoch 7 - iter 174/292 - loss 0.06634884 - time (sec): 56.04 - samples/sec: 465.54 - lr: 0.000061 - momentum: 0.000000
2023-10-11 01:21:08,284 epoch 7 - iter 203/292 - loss 0.06682537 - time (sec): 65.81 - samples/sec: 467.70 - lr: 0.000059 - momentum: 0.000000
2023-10-11 01:21:17,433 epoch 7 - iter 232/292 - loss 0.06617070 - time (sec): 74.96 - samples/sec: 460.15 - lr: 0.000057 - momentum: 0.000000
2023-10-11 01:21:28,335 epoch 7 - iter 261/292 - loss 0.06842093 - time (sec): 85.86 - samples/sec: 464.98 - lr: 0.000055 - momentum: 0.000000
2023-10-11 01:21:37,946 epoch 7 - iter 290/292 - loss 0.06787746 - time (sec): 95.47 - samples/sec: 462.54 - lr: 0.000054 - momentum: 0.000000
2023-10-11 01:21:38,498 ----------------------------------------------------------------------------------------------------
2023-10-11 01:21:38,498 EPOCH 7 done: loss 0.0676 - lr: 0.000054
2023-10-11 01:21:44,277 DEV : loss 0.12312442809343338 - f1-score (micro avg) 0.7511
2023-10-11 01:21:44,286 saving best model
2023-10-11 01:21:58,909 ----------------------------------------------------------------------------------------------------
2023-10-11 01:22:09,738 epoch 8 - iter 29/292 - loss 0.05496167 - time (sec): 10.82 - samples/sec: 492.68 - lr: 0.000052 - momentum: 0.000000
2023-10-11 01:22:18,926 epoch 8 - iter 58/292 - loss 0.06399886 - time (sec): 20.01 - samples/sec: 464.52 - lr: 0.000050 - momentum: 0.000000
2023-10-11 01:22:27,954 epoch 8 - iter 87/292 - loss 0.06407329 - time (sec): 29.04 - samples/sec: 456.08 - lr: 0.000048 - momentum: 0.000000
2023-10-11 01:22:37,360 epoch 8 - iter 116/292 - loss 0.06153453 - time (sec): 38.45 - samples/sec: 459.18 - lr: 0.000046 - momentum: 0.000000
2023-10-11 01:22:46,996 epoch 8 - iter 145/292 - loss 0.06161687 - time (sec): 48.08 - samples/sec: 462.18 - lr: 0.000045 - momentum: 0.000000
2023-10-11 01:22:56,006 epoch 8 - iter 174/292 - loss 0.06132676 - time (sec): 57.09 - samples/sec: 456.44 - lr: 0.000043 - momentum: 0.000000
2023-10-11 01:23:05,668 epoch 8 - iter 203/292 - loss 0.05657878 - time (sec): 66.75 - samples/sec: 457.15 - lr: 0.000041 - momentum: 0.000000
2023-10-11 01:23:15,049 epoch 8 - iter 232/292 - loss 0.05378747 - time (sec): 76.14 - samples/sec: 454.81 - lr: 0.000039 - momentum: 0.000000
2023-10-11 01:23:25,764 epoch 8 - iter 261/292 - loss 0.05161368 - time (sec): 86.85 - samples/sec: 458.00 - lr: 0.000038 - momentum: 0.000000
2023-10-11 01:23:35,701 epoch 8 - iter 290/292 - loss 0.05375375 - time (sec): 96.79 - samples/sec: 456.04 - lr: 0.000036 - momentum: 0.000000
2023-10-11 01:23:36,310 ----------------------------------------------------------------------------------------------------
2023-10-11 01:23:36,311 EPOCH 8 done: loss 0.0546 - lr: 0.000036
2023-10-11 01:23:41,764 DEV : loss 0.12851200997829437 - f1-score (micro avg) 0.7706
2023-10-11 01:23:41,777 saving best model
2023-10-11 01:23:47,109 ----------------------------------------------------------------------------------------------------
2023-10-11 01:23:57,164 epoch 9 - iter 29/292 - loss 0.05893243 - time (sec): 10.05 - samples/sec: 481.02 - lr: 0.000034 - momentum: 0.000000
2023-10-11 01:24:07,377 epoch 9 - iter 58/292 - loss 0.04499649 - time (sec): 20.26 - samples/sec: 468.75 - lr: 0.000032 - momentum: 0.000000
2023-10-11 01:24:16,535 epoch 9 - iter 87/292 - loss 0.04343655 - time (sec): 29.42 - samples/sec: 457.20 - lr: 0.000030 - momentum: 0.000000
2023-10-11 01:24:27,105 epoch 9 - iter 116/292 - loss 0.04156469 - time (sec): 39.99 - samples/sec: 453.70 - lr: 0.000029 - momentum: 0.000000
2023-10-11 01:24:37,237 epoch 9 - iter 145/292 - loss 0.04445814 - time (sec): 50.12 - samples/sec: 457.66 - lr: 0.000027 - momentum: 0.000000
2023-10-11 01:24:47,093 epoch 9 - iter 174/292 - loss 0.04209638 - time (sec): 59.98 - samples/sec: 456.91 - lr: 0.000025 - momentum: 0.000000
2023-10-11 01:24:56,547 epoch 9 - iter 203/292 - loss 0.04103595 - time (sec): 69.43 - samples/sec: 453.71 - lr: 0.000023 - momentum: 0.000000
2023-10-11 01:25:06,689 epoch 9 - iter 232/292 - loss 0.03963824 - time (sec): 79.57 - samples/sec: 451.66 - lr: 0.000022 - momentum: 0.000000
2023-10-11 01:25:16,772 epoch 9 - iter 261/292 - loss 0.04525124 - time (sec): 89.66 - samples/sec: 448.60 - lr: 0.000020 - momentum: 0.000000
2023-10-11 01:25:26,328 epoch 9 - iter 290/292 - loss 0.04628705 - time (sec): 99.21 - samples/sec: 446.02 - lr: 0.000018 - momentum: 0.000000
2023-10-11 01:25:26,812 ----------------------------------------------------------------------------------------------------
2023-10-11 01:25:26,813 EPOCH 9 done: loss 0.0461 - lr: 0.000018
2023-10-11 01:25:32,378 DEV : loss 0.1242719292640686 - f1-score (micro avg) 0.7554
2023-10-11 01:25:32,387 ----------------------------------------------------------------------------------------------------
2023-10-11 01:25:42,386 epoch 10 - iter 29/292 - loss 0.03981653 - time (sec): 10.00 - samples/sec: 493.34 - lr: 0.000016 - momentum: 0.000000
2023-10-11 01:25:52,278 epoch 10 - iter 58/292 - loss 0.04380214 - time (sec): 19.89 - samples/sec: 478.41 - lr: 0.000014 - momentum: 0.000000
2023-10-11 01:26:02,344 epoch 10 - iter 87/292 - loss 0.04599707 - time (sec): 29.96 - samples/sec: 488.29 - lr: 0.000013 - momentum: 0.000000
2023-10-11 01:26:11,860 epoch 10 - iter 116/292 - loss 0.04295599 - time (sec): 39.47 - samples/sec: 482.26 - lr: 0.000011 - momentum: 0.000000
2023-10-11 01:26:21,296 epoch 10 - iter 145/292 - loss 0.04384992 - time (sec): 48.91 - samples/sec: 480.38 - lr: 0.000009 - momentum: 0.000000
2023-10-11 01:26:30,621 epoch 10 - iter 174/292 - loss 0.04333406 - time (sec): 58.23 - samples/sec: 473.38 - lr: 0.000007 - momentum: 0.000000
2023-10-11 01:26:40,130 epoch 10 - iter 203/292 - loss 0.04254014 - time (sec): 67.74 - samples/sec: 468.29 - lr: 0.000006 - momentum: 0.000000
2023-10-11 01:26:49,781 epoch 10 - iter 232/292 - loss 0.04066173 - time (sec): 77.39 - samples/sec: 467.04 - lr: 0.000004 - momentum: 0.000000
2023-10-11 01:26:58,937 epoch 10 - iter 261/292 - loss 0.04215663 - time (sec): 86.55 - samples/sec: 460.97 - lr: 0.000002 - momentum: 0.000000
2023-10-11 01:27:08,995 epoch 10 - iter 290/292 - loss 0.04144527 - time (sec): 96.61 - samples/sec: 459.04 - lr: 0.000000 - momentum: 0.000000
2023-10-11 01:27:09,389 ----------------------------------------------------------------------------------------------------
2023-10-11 01:27:09,390 EPOCH 10 done: loss 0.0414 - lr: 0.000000
2023-10-11 01:27:15,055 DEV : loss 0.1256779134273529 - f1-score (micro avg) 0.757
2023-10-11 01:27:15,935 ----------------------------------------------------------------------------------------------------
2023-10-11 01:27:15,937 Loading model from best epoch ...
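The lr column in the iteration records above is produced by the LinearScheduler plugin (warmup_fraction: 0.1): with 292 batches per epoch over 10 epochs there are 2,920 optimizer steps in total, the first 292 of which warm up linearly to the peak learning_rate of 0.00016, after which the rate decays linearly to zero. A minimal stand-alone sketch of that schedule (the function name and the 0-indexed step convention are assumptions for illustration, not Flair's API):

```python
def linear_warmup_decay(step: int, total_steps: int = 2920,
                        warmup_fraction: float = 0.1,
                        peak_lr: float = 0.00016) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 (step is 0-indexed)."""
    warmup_steps = int(total_steps * warmup_fraction)  # 292 steps here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# Epoch 1, iter 29 (global step 28) sits early in warmup; epoch 10, iter 290
# (global step 2917) is almost fully decayed, matching the logged lr values.
print(f"{linear_warmup_decay(28):.6f}")    # ~0.000015
print(f"{linear_warmup_decay(2917):.6f}")  # ~0.000000
```

This reproduces the logged values to the precision shown (e.g. lr: 0.000158 at the end of epoch 1, just before the warmup peak at step 292).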
2023-10-11 01:27:20,118 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 01:27:32,789 Results:
- F-score (micro) 0.7207
- F-score (macro) 0.6769
- Accuracy 0.5807

By class:
              precision    recall  f1-score   support

         PER     0.8242    0.8218    0.8230       348
         LOC     0.5598    0.7893    0.6550       261
         ORG     0.3800    0.3654    0.3725        52
   HumanProd     0.9000    0.8182    0.8571        22

   micro avg     0.6739    0.7745    0.7207       683
   macro avg     0.6660    0.6987    0.6769       683
weighted avg     0.6918    0.7745    0.7256       683

2023-10-11 01:27:32,790 ----------------------------------------------------------------------------------------------------
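As a sanity check on the final aggregates, the macro and weighted averages can be recomputed from the per-class rows reported above. The micro average is count-based, so it cannot be derived from per-class scores alone; here it is taken as the harmonic mean of the reported micro-averaged precision and recall. A minimal sketch (variable names are illustrative):

```python
# Per-class scores from the "By class" results: (precision, recall, f1, support)
by_class = {
    "PER":       (0.8242, 0.8218, 0.8230, 348),
    "LOC":       (0.5598, 0.7893, 0.6550, 261),
    "ORG":       (0.3800, 0.3654, 0.3725,  52),
    "HumanProd": (0.9000, 0.8182, 0.8571,  22),
}

# Macro F1: unweighted mean of the per-class F1 scores
macro_f1 = sum(f1 for _, _, f1, _ in by_class.values()) / len(by_class)

# Weighted F1: per-class F1 weighted by support (683 gold spans in total)
total_support = sum(s for *_, s in by_class.values())
weighted_f1 = sum(f1 * s for _, _, f1, s in by_class.values()) / total_support

# Micro F1: harmonic mean of the reported micro-averaged precision/recall
micro_p, micro_r = 0.6739, 0.7745
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(round(macro_f1, 4), round(weighted_f1, 4), round(micro_f1, 4))
# -> 0.6769 0.7256 0.7207
```

All three recomputed values agree with the logged F-score (macro) 0.6769, weighted avg 0.7256, and F-score (micro) 0.7207.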