2023-10-11 22:21:01,967 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,969 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 22:21:01,969 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,969 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 22:21:01,969 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,969 Train: 7142 sentences
2023-10-11 22:21:01,969 (train_with_dev=False, train_with_test=False)
2023-10-11 22:21:01,970 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,970 Training Params:
2023-10-11 22:21:01,970 - learning_rate: "0.00016"
2023-10-11 22:21:01,970 - mini_batch_size: "8"
2023-10-11 22:21:01,970 - max_epochs: "10"
2023-10-11 22:21:01,970 - shuffle: "True"
2023-10-11 22:21:01,970 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,970 Plugins:
2023-10-11 22:21:01,970 - TensorboardLogger
2023-10-11 22:21:01,970 - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 22:21:01,970 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,970 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 22:21:01,970 - metric: "('micro avg', 'f1-score')"
2023-10-11 22:21:01,970 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,970 Computation:
2023-10-11 22:21:01,970 - compute on device: cuda:0
2023-10-11 22:21:01,971 - embedding storage: none
2023-10-11 22:21:01,971 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,971 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-11 22:21:01,971 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,971 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,971 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 22:21:54,369 epoch 1 - iter 89/893 - loss 2.81524244 - time (sec): 52.40 - samples/sec: 516.95 - lr: 0.000016 - momentum: 0.000000
2023-10-11 22:22:46,282 epoch 1 - iter 178/893 - loss 2.73176101 - time (sec): 104.31 - samples/sec: 506.78 - lr: 0.000032 - momentum: 0.000000
2023-10-11 22:23:38,432 epoch 1 - iter 267/893 - loss 2.52603415 - time (sec): 156.46 - samples/sec: 509.33 - lr: 0.000048 - momentum: 0.000000
2023-10-11 22:24:27,279 epoch 1 - iter 356/893 - loss 2.31615221 - time (sec): 205.31 - samples/sec: 510.58 - lr: 0.000064 - momentum: 0.000000
2023-10-11 22:25:16,525 epoch 1 - iter 445/893 - loss 2.08800093 - time (sec): 254.55 - samples/sec: 508.25 - lr: 0.000080 - momentum: 0.000000
2023-10-11 22:26:05,675 epoch 1 - iter 534/893 - loss 1.87576125 - time (sec): 303.70 - samples/sec: 503.88 - lr: 0.000095 - momentum: 0.000000
2023-10-11 22:26:54,470 epoch 1 - iter 623/893 - loss 1.70576459 - time (sec): 352.50 - samples/sec: 502.24 - lr: 0.000111 - momentum: 0.000000
2023-10-11 22:27:42,644 epoch 1 - iter 712/893 - loss 1.56835597 - time (sec): 400.67 - samples/sec: 497.83 - lr: 0.000127 - momentum: 0.000000
2023-10-11 22:28:31,216 epoch 1 - iter 801/893 - loss 1.44155521 - time (sec): 449.24 - samples/sec: 497.76 - lr: 0.000143 - momentum: 0.000000
2023-10-11 22:29:19,353 epoch 1 - iter 890/893 - loss 1.33419276 - time (sec): 497.38 - samples/sec: 498.80 - lr: 0.000159 - momentum: 0.000000
2023-10-11 22:29:20,732 ----------------------------------------------------------------------------------------------------
2023-10-11 22:29:20,732 EPOCH 1 done: loss 1.3312 - lr: 0.000159
2023-10-11 22:29:40,984 DEV : loss 0.24347300827503204 - f1-score (micro avg) 0.4712
2023-10-11 22:29:41,014 saving best model
2023-10-11 22:29:41,873 ----------------------------------------------------------------------------------------------------
2023-10-11 22:30:32,952 epoch 2 - iter 89/893 - loss 0.28954077 - time (sec): 51.08 - samples/sec: 488.58 - lr: 0.000158 - momentum: 0.000000
2023-10-11 22:31:23,495 epoch 2 - iter 178/893 - loss 0.26838336 - time (sec): 101.62 - samples/sec: 495.29 - lr: 0.000156 - momentum: 0.000000
2023-10-11 22:32:12,958 epoch 2 - iter 267/893 - loss 0.24433660 - time (sec): 151.08 - samples/sec: 499.09 - lr: 0.000155 - momentum: 0.000000
2023-10-11 22:33:01,075 epoch 2 - iter 356/893 - loss 0.22640434 - time (sec): 199.20 - samples/sec: 501.23 - lr: 0.000153 - momentum: 0.000000
2023-10-11 22:33:50,431 epoch 2 - iter 445/893 - loss 0.20805597 - time (sec): 248.56 - samples/sec: 507.68 - lr: 0.000151 - momentum: 0.000000
2023-10-11 22:34:37,653 epoch 2 - iter 534/893 - loss 0.19855291 - time (sec): 295.78 - samples/sec: 505.35 - lr: 0.000149 - momentum: 0.000000
2023-10-11 22:35:25,559 epoch 2 - iter 623/893 - loss 0.18874509 - time (sec): 343.68 - samples/sec: 504.29 - lr: 0.000148 - momentum: 0.000000
2023-10-11 22:36:14,187 epoch 2 - iter 712/893 - loss 0.18046880 - time (sec): 392.31 - samples/sec: 506.71 - lr: 0.000146 - momentum: 0.000000
2023-10-11 22:37:02,132 epoch 2 - iter 801/893 - loss 0.17469471 - time (sec): 440.26 - samples/sec: 506.29 - lr: 0.000144 - momentum: 0.000000
2023-10-11 22:37:50,288 epoch 2 - iter 890/893 - loss 0.16713569 - time (sec): 488.41 - samples/sec: 506.95 - lr: 0.000142 - momentum: 0.000000
2023-10-11 22:37:52,010 ----------------------------------------------------------------------------------------------------
2023-10-11 22:37:52,010 EPOCH 2 done: loss 0.1669 - lr: 0.000142
2023-10-11 22:38:12,850 DEV : loss 0.09539955109357834 - f1-score (micro avg) 0.7653
2023-10-11 22:38:12,880 saving best model
2023-10-11 22:38:15,886 ----------------------------------------------------------------------------------------------------
2023-10-11 22:39:03,890 epoch 3 - iter 89/893 - loss 0.07439450 - time (sec): 48.00 - samples/sec: 511.31 - lr: 0.000140 - momentum: 0.000000
2023-10-11 22:39:52,230 epoch 3 - iter 178/893 - loss 0.07435914 - time (sec): 96.34 - samples/sec: 519.96 - lr: 0.000139 - momentum: 0.000000
2023-10-11 22:40:39,097 epoch 3 - iter 267/893 - loss 0.07407314 - time (sec): 143.21 - samples/sec: 514.84 - lr: 0.000137 - momentum: 0.000000
2023-10-11 22:41:26,901 epoch 3 - iter 356/893 - loss 0.07395440 - time (sec): 191.01 - samples/sec: 512.44 - lr: 0.000135 - momentum: 0.000000
2023-10-11 22:42:15,822 epoch 3 - iter 445/893 - loss 0.07186459 - time (sec): 239.93 - samples/sec: 515.32 - lr: 0.000133 - momentum: 0.000000
2023-10-11 22:43:06,799 epoch 3 - iter 534/893 - loss 0.07219677 - time (sec): 290.91 - samples/sec: 513.25 - lr: 0.000132 - momentum: 0.000000
2023-10-11 22:43:56,818 epoch 3 - iter 623/893 - loss 0.07111615 - time (sec): 340.93 - samples/sec: 510.62 - lr: 0.000130 - momentum: 0.000000
2023-10-11 22:44:45,025 epoch 3 - iter 712/893 - loss 0.07090444 - time (sec): 389.13 - samples/sec: 506.99 - lr: 0.000128 - momentum: 0.000000
2023-10-11 22:45:33,782 epoch 3 - iter 801/893 - loss 0.07231914 - time (sec): 437.89 - samples/sec: 506.23 - lr: 0.000126 - momentum: 0.000000
2023-10-11 22:46:23,881 epoch 3 - iter 890/893 - loss 0.07073149 - time (sec): 487.99 - samples/sec: 508.09 - lr: 0.000125 - momentum: 0.000000
2023-10-11 22:46:25,367 ----------------------------------------------------------------------------------------------------
2023-10-11 22:46:25,367 EPOCH 3 done: loss 0.0708 - lr: 0.000125
2023-10-11 22:46:46,567 DEV : loss 0.10698171705007553 - f1-score (micro avg) 0.7863
2023-10-11 22:46:46,596 saving best model
2023-10-11 22:46:49,112 ----------------------------------------------------------------------------------------------------
2023-10-11 22:47:39,142 epoch 4 - iter 89/893 - loss 0.04988314 - time (sec): 50.03 - samples/sec: 536.16 - lr: 0.000123 - momentum: 0.000000
2023-10-11 22:48:27,553 epoch 4 - iter 178/893 - loss 0.04846943 - time (sec): 98.44 - samples/sec: 516.18 - lr: 0.000121 - momentum: 0.000000
2023-10-11 22:49:16,600 epoch 4 - iter 267/893 - loss 0.04681982 - time (sec): 147.48 - samples/sec: 514.66 - lr: 0.000119 - momentum: 0.000000
2023-10-11 22:50:05,132 epoch 4 - iter 356/893 - loss 0.04623150 - time (sec): 196.02 - samples/sec: 512.55 - lr: 0.000117 - momentum: 0.000000
2023-10-11 22:50:53,786 epoch 4 - iter 445/893 - loss 0.04787124 - time (sec): 244.67 - samples/sec: 507.06 - lr: 0.000116 - momentum: 0.000000
2023-10-11 22:51:42,456 epoch 4 - iter 534/893 - loss 0.04769015 - time (sec): 293.34 - samples/sec: 509.14 - lr: 0.000114 - momentum: 0.000000
2023-10-11 22:52:29,965 epoch 4 - iter 623/893 - loss 0.04761173 - time (sec): 340.85 - samples/sec: 507.97 - lr: 0.000112 - momentum: 0.000000
2023-10-11 22:53:17,603 epoch 4 - iter 712/893 - loss 0.04741060 - time (sec): 388.49 - samples/sec: 507.58 - lr: 0.000110 - momentum: 0.000000
2023-10-11 22:54:06,700 epoch 4 - iter 801/893 - loss 0.04736835 - time (sec): 437.58 - samples/sec: 511.68 - lr: 0.000109 - momentum: 0.000000
2023-10-11 22:54:55,040 epoch 4 - iter 890/893 - loss 0.04709479 - time (sec): 485.92 - samples/sec: 510.57 - lr: 0.000107 - momentum: 0.000000
2023-10-11 22:54:56,494 ----------------------------------------------------------------------------------------------------
2023-10-11 22:54:56,495 EPOCH 4 done: loss 0.0471 - lr: 0.000107
2023-10-11 22:55:18,057 DEV : loss 0.12400590628385544 - f1-score (micro avg) 0.7966
2023-10-11 22:55:18,087 saving best model
2023-10-11 22:55:20,719 ----------------------------------------------------------------------------------------------------
2023-10-11 22:56:09,865 epoch 5 - iter 89/893 - loss 0.03397882 - time (sec): 49.14 - samples/sec: 499.21 - lr: 0.000105 - momentum: 0.000000
2023-10-11 22:56:58,565 epoch 5 - iter 178/893 - loss 0.03522845 - time (sec): 97.84 - samples/sec: 500.50 - lr: 0.000103 - momentum: 0.000000
2023-10-11 22:57:48,347 epoch 5 - iter 267/893 - loss 0.03521031 - time (sec): 147.62 - samples/sec: 503.01 - lr: 0.000101 - momentum: 0.000000
2023-10-11 22:58:35,076 epoch 5 - iter 356/893 - loss 0.03460673 - time (sec): 194.35 - samples/sec: 502.96 - lr: 0.000100 - momentum: 0.000000
2023-10-11 22:59:22,704 epoch 5 - iter 445/893 - loss 0.03523870 - time (sec): 241.98 - samples/sec: 502.99 - lr: 0.000098 - momentum: 0.000000
2023-10-11 23:00:10,809 epoch 5 - iter 534/893 - loss 0.03478198 - time (sec): 290.09 - samples/sec: 504.56 - lr: 0.000096 - momentum: 0.000000
2023-10-11 23:01:00,615 epoch 5 - iter 623/893 - loss 0.03495198 - time (sec): 339.89 - samples/sec: 510.85 - lr: 0.000094 - momentum: 0.000000
2023-10-11 23:01:50,082 epoch 5 - iter 712/893 - loss 0.03581308 - time (sec): 389.36 - samples/sec: 509.69 - lr: 0.000093 - momentum: 0.000000
2023-10-11 23:02:39,996 epoch 5 - iter 801/893 - loss 0.03648619 - time (sec): 439.27 - samples/sec: 508.22 - lr: 0.000091 - momentum: 0.000000
2023-10-11 23:03:30,163 epoch 5 - iter 890/893 - loss 0.03607845 - time (sec): 489.44 - samples/sec: 506.91 - lr: 0.000089 - momentum: 0.000000
2023-10-11 23:03:31,620 ----------------------------------------------------------------------------------------------------
2023-10-11 23:03:31,620 EPOCH 5 done: loss 0.0361 - lr: 0.000089
2023-10-11 23:03:52,415 DEV : loss 0.14019542932510376 - f1-score (micro avg) 0.8003
2023-10-11 23:03:52,447 saving best model
2023-10-11 23:03:54,985 ----------------------------------------------------------------------------------------------------
2023-10-11 23:04:45,324 epoch 6 - iter 89/893 - loss 0.02545772 - time (sec): 50.33 - samples/sec: 510.51 - lr: 0.000087 - momentum: 0.000000
2023-10-11 23:05:34,136 epoch 6 - iter 178/893 - loss 0.02647733 - time (sec): 99.15 - samples/sec: 502.13 - lr: 0.000085 - momentum: 0.000000
2023-10-11 23:06:25,652 epoch 6 - iter 267/893 - loss 0.02542927 - time (sec): 150.66 - samples/sec: 512.02 - lr: 0.000084 - momentum: 0.000000
2023-10-11 23:07:14,673 epoch 6 - iter 356/893 - loss 0.02606427 - time (sec): 199.68 - samples/sec: 507.41 - lr: 0.000082 - momentum: 0.000000
2023-10-11 23:08:04,660 epoch 6 - iter 445/893 - loss 0.02787874 - time (sec): 249.67 - samples/sec: 510.08 - lr: 0.000080 - momentum: 0.000000
2023-10-11 23:08:53,284 epoch 6 - iter 534/893 - loss 0.02723140 - time (sec): 298.29 - samples/sec: 508.54 - lr: 0.000078 - momentum: 0.000000
2023-10-11 23:09:42,484 epoch 6 - iter 623/893 - loss 0.02737784 - time (sec): 347.49 - samples/sec: 505.88 - lr: 0.000077 - momentum: 0.000000
2023-10-11 23:10:34,252 epoch 6 - iter 712/893 - loss 0.02676267 - time (sec): 399.26 - samples/sec: 503.29 - lr: 0.000075 - momentum: 0.000000
2023-10-11 23:11:22,438 epoch 6 - iter 801/893 - loss 0.02657678 - time (sec): 447.45 - samples/sec: 501.14 - lr: 0.000073 - momentum: 0.000000
2023-10-11 23:12:11,166 epoch 6 - iter 890/893 - loss 0.02720868 - time (sec): 496.18 - samples/sec: 499.17 - lr: 0.000071 - momentum: 0.000000
2023-10-11 23:12:12,922 ----------------------------------------------------------------------------------------------------
2023-10-11 23:12:12,922 EPOCH 6 done: loss 0.0273 - lr: 0.000071
2023-10-11 23:12:34,497 DEV : loss 0.15041321516036987 - f1-score (micro avg) 0.8117
2023-10-11 23:12:34,527 saving best model
2023-10-11 23:12:37,084 ----------------------------------------------------------------------------------------------------
2023-10-11 23:13:26,120 epoch 7 - iter 89/893 - loss 0.02780661 - time (sec): 49.03 - samples/sec: 490.86 - lr: 0.000069 - momentum: 0.000000
2023-10-11 23:14:16,338 epoch 7 - iter 178/893 - loss 0.02358968 - time (sec): 99.25 - samples/sec: 501.77 - lr: 0.000068 - momentum: 0.000000
2023-10-11 23:15:05,061 epoch 7 - iter 267/893 - loss 0.02258577 - time (sec): 147.97 - samples/sec: 497.61 - lr: 0.000066 - momentum: 0.000000
2023-10-11 23:15:55,092 epoch 7 - iter 356/893 - loss 0.02043234 - time (sec): 198.00 - samples/sec: 502.39 - lr: 0.000064 - momentum: 0.000000
2023-10-11 23:16:43,870 epoch 7 - iter 445/893 - loss 0.02126625 - time (sec): 246.78 - samples/sec: 504.49 - lr: 0.000062 - momentum: 0.000000
2023-10-11 23:17:34,601 epoch 7 - iter 534/893 - loss 0.02028243 - time (sec): 297.51 - samples/sec: 501.86 - lr: 0.000061 - momentum: 0.000000
2023-10-11 23:18:23,631 epoch 7 - iter 623/893 - loss 0.02053878 - time (sec): 346.54 - samples/sec: 500.97 - lr: 0.000059 - momentum: 0.000000
2023-10-11 23:19:12,400 epoch 7 - iter 712/893 - loss 0.02128759 - time (sec): 395.31 - samples/sec: 500.79 - lr: 0.000057 - momentum: 0.000000
2023-10-11 23:20:01,270 epoch 7 - iter 801/893 - loss 0.02189113 - time (sec): 444.18 - samples/sec: 502.62 - lr: 0.000055 - momentum: 0.000000
2023-10-11 23:20:50,098 epoch 7 - iter 890/893 - loss 0.02197161 - time (sec): 493.01 - samples/sec: 502.77 - lr: 0.000053 - momentum: 0.000000
2023-10-11 23:20:51,632 ----------------------------------------------------------------------------------------------------
2023-10-11 23:20:51,632 EPOCH 7 done: loss 0.0219 - lr: 0.000053
2023-10-11 23:21:13,195 DEV : loss 0.1641770303249359 - f1-score (micro avg) 0.8043
2023-10-11 23:21:13,225 ----------------------------------------------------------------------------------------------------
2023-10-11 23:22:01,599 epoch 8 - iter 89/893 - loss 0.01437145 - time (sec): 48.37 - samples/sec: 517.79 - lr: 0.000052 - momentum: 0.000000
2023-10-11 23:22:50,880 epoch 8 - iter 178/893 - loss 0.01549440 - time (sec): 97.65 - samples/sec: 515.13 - lr: 0.000050 - momentum: 0.000000
2023-10-11 23:23:39,221 epoch 8 - iter 267/893 - loss 0.01587140 - time (sec): 145.99 - samples/sec: 514.78 - lr: 0.000048 - momentum: 0.000000
2023-10-11 23:24:27,855 epoch 8 - iter 356/893 - loss 0.01530621 - time (sec): 194.63 - samples/sec: 505.15 - lr: 0.000046 - momentum: 0.000000
2023-10-11 23:25:15,842 epoch 8 - iter 445/893 - loss 0.01504943 - time (sec): 242.61 - samples/sec: 502.27 - lr: 0.000045 - momentum: 0.000000
2023-10-11 23:26:05,567 epoch 8 - iter 534/893 - loss 0.01505772 - time (sec): 292.34 - samples/sec: 506.57 - lr: 0.000043 - momentum: 0.000000
2023-10-11 23:26:53,841 epoch 8 - iter 623/893 - loss 0.01621767 - time (sec): 340.61 - samples/sec: 502.31 - lr: 0.000041 - momentum: 0.000000
2023-10-11 23:27:43,961 epoch 8 - iter 712/893 - loss 0.01613174 - time (sec): 390.73 - samples/sec: 504.62 - lr: 0.000039 - momentum: 0.000000
2023-10-11 23:28:34,573 epoch 8 - iter 801/893 - loss 0.01616556 - time (sec): 441.35 - samples/sec: 506.32 - lr: 0.000037 - momentum: 0.000000
2023-10-11 23:29:24,349 epoch 8 - iter 890/893 - loss 0.01640991 - time (sec): 491.12 - samples/sec: 504.76 - lr: 0.000036 - momentum: 0.000000
2023-10-11 23:29:25,954 ----------------------------------------------------------------------------------------------------
2023-10-11 23:29:25,955 EPOCH 8 done: loss 0.0164 - lr: 0.000036
2023-10-11 23:29:47,575 DEV : loss 0.1812078058719635 - f1-score (micro avg) 0.8045
2023-10-11 23:29:47,606 ----------------------------------------------------------------------------------------------------
2023-10-11 23:30:40,748 epoch 9 - iter 89/893 - loss 0.01204535 - time (sec): 53.14 - samples/sec: 487.81 - lr: 0.000034 - momentum: 0.000000
2023-10-11 23:31:32,198 epoch 9 - iter 178/893 - loss 0.00976354 - time (sec): 104.59 - samples/sec: 477.49 - lr: 0.000032 - momentum: 0.000000
2023-10-11 23:32:23,605 epoch 9 - iter 267/893 - loss 0.01155278 - time (sec): 156.00 - samples/sec: 476.39 - lr: 0.000030 - momentum: 0.000000
2023-10-11 23:33:14,498 epoch 9 - iter 356/893 - loss 0.01127452 - time (sec): 206.89 - samples/sec: 475.23 - lr: 0.000029 - momentum: 0.000000
2023-10-11 23:34:04,554 epoch 9 - iter 445/893 - loss 0.01129248 - time (sec): 256.95 - samples/sec: 478.02 - lr: 0.000027 - momentum: 0.000000
2023-10-11 23:34:56,240 epoch 9 - iter 534/893 - loss 0.01135105 - time (sec): 308.63 - samples/sec: 483.08 - lr: 0.000025 - momentum: 0.000000
2023-10-11 23:35:48,950 epoch 9 - iter 623/893 - loss 0.01222460 - time (sec): 361.34 - samples/sec: 485.20 - lr: 0.000023 - momentum: 0.000000
2023-10-11 23:36:39,707 epoch 9 - iter 712/893 - loss 0.01277049 - time (sec): 412.10 - samples/sec: 486.22 - lr: 0.000022 - momentum: 0.000000
2023-10-11 23:37:28,891 epoch 9 - iter 801/893 - loss 0.01295373 - time (sec): 461.28 - samples/sec: 486.62 - lr: 0.000020 - momentum: 0.000000
2023-10-11 23:38:17,433 epoch 9 - iter 890/893 - loss 0.01308695 - time (sec): 509.82 - samples/sec: 486.57 - lr: 0.000018 - momentum: 0.000000
2023-10-11 23:38:18,887 ----------------------------------------------------------------------------------------------------
2023-10-11 23:38:18,888 EPOCH 9 done: loss 0.0131 - lr: 0.000018
2023-10-11 23:38:40,916 DEV : loss 0.19328945875167847 - f1-score (micro avg) 0.8091
2023-10-11 23:38:40,947 ----------------------------------------------------------------------------------------------------
2023-10-11 23:39:33,172 epoch 10 - iter 89/893 - loss 0.01252914 - time (sec): 52.22 - samples/sec: 483.30 - lr: 0.000016 - momentum: 0.000000
2023-10-11 23:40:25,294 epoch 10 - iter 178/893 - loss 0.01243200 - time (sec): 104.34 - samples/sec: 483.94 - lr: 0.000014 - momentum: 0.000000
2023-10-11 23:41:17,755 epoch 10 - iter 267/893 - loss 0.01093071 - time (sec): 156.81 - samples/sec: 483.81 - lr: 0.000013 - momentum: 0.000000
2023-10-11 23:42:09,380 epoch 10 - iter 356/893 - loss 0.01152608 - time (sec): 208.43 - samples/sec: 485.20 - lr: 0.000011 - momentum: 0.000000
2023-10-11 23:43:01,810 epoch 10 - iter 445/893 - loss 0.01150384 - time (sec): 260.86 - samples/sec: 484.83 - lr: 0.000009 - momentum: 0.000000
2023-10-11 23:43:52,835 epoch 10 - iter 534/893 - loss 0.01091038 - time (sec): 311.89 - samples/sec: 482.41 - lr: 0.000007 - momentum: 0.000000
2023-10-11 23:44:45,715 epoch 10 - iter 623/893 - loss 0.01107542 - time (sec): 364.77 - samples/sec: 483.08 - lr: 0.000006 - momentum: 0.000000
2023-10-11 23:45:37,730 epoch 10 - iter 712/893 - loss 0.01039016 - time (sec): 416.78 - samples/sec: 480.44 - lr: 0.000004 - momentum: 0.000000
2023-10-11 23:46:29,653 epoch 10 - iter 801/893 - loss 0.01057808 - time (sec): 468.70 - samples/sec: 478.65 - lr: 0.000002 - momentum: 0.000000
2023-10-11 23:47:19,812 epoch 10 - iter 890/893 - loss 0.01067030 - time (sec): 518.86 - samples/sec: 478.32 - lr: 0.000000 - momentum: 0.000000
2023-10-11 23:47:21,250 ----------------------------------------------------------------------------------------------------
2023-10-11 23:47:21,250 EPOCH 10 done: loss 0.0106 - lr: 0.000000
2023-10-11 23:47:44,716 DEV : loss 0.19667339324951172 - f1-score (micro avg) 0.8085
2023-10-11 23:47:45,653 ----------------------------------------------------------------------------------------------------
2023-10-11 23:47:45,656 Loading model from best epoch ...
2023-10-11 23:47:49,477 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 23:49:01,259 
Results:
- F-score (micro) 0.7008
- F-score (macro) 0.6502
- Accuracy 0.5557

By class:
              precision    recall  f1-score   support

         LOC     0.6994    0.7160    0.7076      1095
         PER     0.7683    0.7767    0.7725      1012
         ORG     0.4524    0.5994    0.5157       357
   HumanProd     0.5349    0.6970    0.6053        33

   micro avg     0.6793    0.7237    0.7008      2497
   macro avg     0.6138    0.6973    0.6502      2497
weighted avg     0.6898    0.7237    0.7051      2497

2023-10-11 23:49:01,259 ----------------------------------------------------------------------------------------------------