2023-10-08 20:03:57,269 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:03:57,271 Model: "SequenceTagger( (embeddings): ByT5Embeddings( (model): T5EncoderModel( (shared): Embedding(384, 1472) (encoder): T5Stack( (embed_tokens): Embedding(384, 1472) (block): ModuleList( (0): T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) (relative_attention_bias): Embedding(32, 6) ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) (1-11): 11 x T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) 
(final_layer_norm): T5LayerNorm() (dropout): Dropout(p=0.1, inplace=False) ) ) ) (locked_dropout): LockedDropout(p=0.5) (linear): Linear(in_features=1472, out_features=25, bias=True) (loss_function): CrossEntropyLoss() )"
2023-10-08 20:03:57,271 ----------------------------------------------------------------------------------------------------
2023-10-08 20:03:57,271 MultiCorpus: 966 train + 219 dev + 204 test sentences
 - NER_HIPE_2022 Corpus: 966 train + 219 dev + 204 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/fr/with_doc_seperator
2023-10-08 20:03:57,271 ----------------------------------------------------------------------------------------------------
2023-10-08 20:03:57,271 Train: 966 sentences
2023-10-08 20:03:57,271 (train_with_dev=False, train_with_test=False)
2023-10-08 20:03:57,271 ----------------------------------------------------------------------------------------------------
2023-10-08 20:03:57,271 Training Params:
2023-10-08 20:03:57,271 - learning_rate: "0.00015"
2023-10-08 20:03:57,271 - mini_batch_size: "8"
2023-10-08 20:03:57,271 - max_epochs: "10"
2023-10-08 20:03:57,271 - shuffle: "True"
2023-10-08 20:03:57,271 ----------------------------------------------------------------------------------------------------
2023-10-08 20:03:57,271 Plugins:
2023-10-08 20:03:57,271 - TensorboardLogger
2023-10-08 20:03:57,272 - LinearScheduler | warmup_fraction: '0.1'
2023-10-08 20:03:57,272 ----------------------------------------------------------------------------------------------------
2023-10-08 20:03:57,272 Final evaluation on model from best epoch (best-model.pt)
2023-10-08 20:03:57,272 - metric: "('micro avg', 'f1-score')"
2023-10-08 20:03:57,272 ----------------------------------------------------------------------------------------------------
2023-10-08 20:03:57,272 Computation:
2023-10-08 20:03:57,272 - compute on device: cuda:0
2023-10-08 20:03:57,272 - embedding storage: none
2023-10-08 20:03:57,272 
---------------------------------------------------------------------------------------------------- 2023-10-08 20:03:57,272 Model training base path: "hmbench-ajmc/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2" 2023-10-08 20:03:57,272 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:03:57,272 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:03:57,272 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-08 20:04:06,249 epoch 1 - iter 12/121 - loss 3.22945229 - time (sec): 8.98 - samples/sec: 290.57 - lr: 0.000014 - momentum: 0.000000 2023-10-08 20:04:15,727 epoch 1 - iter 24/121 - loss 3.22405629 - time (sec): 18.45 - samples/sec: 293.17 - lr: 0.000029 - momentum: 0.000000 2023-10-08 20:04:24,171 epoch 1 - iter 36/121 - loss 3.21514515 - time (sec): 26.90 - samples/sec: 287.31 - lr: 0.000043 - momentum: 0.000000 2023-10-08 20:04:32,083 epoch 1 - iter 48/121 - loss 3.20153422 - time (sec): 34.81 - samples/sec: 284.49 - lr: 0.000058 - momentum: 0.000000 2023-10-08 20:04:40,570 epoch 1 - iter 60/121 - loss 3.17335679 - time (sec): 43.30 - samples/sec: 283.39 - lr: 0.000073 - momentum: 0.000000 2023-10-08 20:04:48,932 epoch 1 - iter 72/121 - loss 3.12161775 - time (sec): 51.66 - samples/sec: 282.54 - lr: 0.000088 - momentum: 0.000000 2023-10-08 20:04:57,610 epoch 1 - iter 84/121 - loss 3.05285612 - time (sec): 60.34 - samples/sec: 283.26 - lr: 0.000103 - momentum: 0.000000 2023-10-08 20:05:06,303 epoch 1 - iter 96/121 - loss 2.97593010 - time (sec): 69.03 - samples/sec: 284.12 - lr: 0.000118 - momentum: 0.000000 2023-10-08 20:05:15,331 epoch 1 - iter 108/121 - loss 2.88990794 - time (sec): 78.06 - samples/sec: 285.79 - lr: 0.000133 - momentum: 0.000000 2023-10-08 20:05:23,460 epoch 1 - iter 120/121 - loss 2.81371465 - time 
(sec): 86.19 - samples/sec: 284.98 - lr: 0.000148 - momentum: 0.000000 2023-10-08 20:05:24,011 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:05:24,012 EPOCH 1 done: loss 2.8066 - lr: 0.000148 2023-10-08 20:05:29,703 DEV : loss 1.867384672164917 - f1-score (micro avg) 0.0 2023-10-08 20:05:29,709 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:05:38,055 epoch 2 - iter 12/121 - loss 1.86281691 - time (sec): 8.34 - samples/sec: 278.26 - lr: 0.000148 - momentum: 0.000000 2023-10-08 20:05:46,497 epoch 2 - iter 24/121 - loss 1.76055145 - time (sec): 16.79 - samples/sec: 277.47 - lr: 0.000147 - momentum: 0.000000 2023-10-08 20:05:55,482 epoch 2 - iter 36/121 - loss 1.65321244 - time (sec): 25.77 - samples/sec: 281.12 - lr: 0.000145 - momentum: 0.000000 2023-10-08 20:06:04,158 epoch 2 - iter 48/121 - loss 1.55933884 - time (sec): 34.45 - samples/sec: 281.44 - lr: 0.000144 - momentum: 0.000000 2023-10-08 20:06:13,396 epoch 2 - iter 60/121 - loss 1.45948409 - time (sec): 43.69 - samples/sec: 277.82 - lr: 0.000142 - momentum: 0.000000 2023-10-08 20:06:22,138 epoch 2 - iter 72/121 - loss 1.37534962 - time (sec): 52.43 - samples/sec: 280.21 - lr: 0.000140 - momentum: 0.000000 2023-10-08 20:06:30,646 epoch 2 - iter 84/121 - loss 1.30524947 - time (sec): 60.94 - samples/sec: 280.72 - lr: 0.000139 - momentum: 0.000000 2023-10-08 20:06:39,176 epoch 2 - iter 96/121 - loss 1.23883488 - time (sec): 69.47 - samples/sec: 280.99 - lr: 0.000137 - momentum: 0.000000 2023-10-08 20:06:47,356 epoch 2 - iter 108/121 - loss 1.18670593 - time (sec): 77.65 - samples/sec: 280.10 - lr: 0.000135 - momentum: 0.000000 2023-10-08 20:06:56,694 epoch 2 - iter 120/121 - loss 1.11677159 - time (sec): 86.98 - samples/sec: 281.50 - lr: 0.000134 - momentum: 0.000000 2023-10-08 20:06:57,490 
---------------------------------------------------------------------------------------------------- 2023-10-08 20:06:57,491 EPOCH 2 done: loss 1.1103 - lr: 0.000134 2023-10-08 20:07:03,392 DEV : loss 0.6604976058006287 - f1-score (micro avg) 0.0 2023-10-08 20:07:03,398 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:07:12,382 epoch 3 - iter 12/121 - loss 0.57517981 - time (sec): 8.98 - samples/sec: 296.78 - lr: 0.000132 - momentum: 0.000000 2023-10-08 20:07:21,696 epoch 3 - iter 24/121 - loss 0.60544226 - time (sec): 18.30 - samples/sec: 291.52 - lr: 0.000130 - momentum: 0.000000 2023-10-08 20:07:30,560 epoch 3 - iter 36/121 - loss 0.59734278 - time (sec): 27.16 - samples/sec: 286.95 - lr: 0.000129 - momentum: 0.000000 2023-10-08 20:07:38,844 epoch 3 - iter 48/121 - loss 0.59586307 - time (sec): 35.45 - samples/sec: 282.32 - lr: 0.000127 - momentum: 0.000000 2023-10-08 20:07:47,306 epoch 3 - iter 60/121 - loss 0.57311330 - time (sec): 43.91 - samples/sec: 281.75 - lr: 0.000125 - momentum: 0.000000 2023-10-08 20:07:55,299 epoch 3 - iter 72/121 - loss 0.56528685 - time (sec): 51.90 - samples/sec: 279.73 - lr: 0.000124 - momentum: 0.000000 2023-10-08 20:08:04,090 epoch 3 - iter 84/121 - loss 0.55101440 - time (sec): 60.69 - samples/sec: 280.65 - lr: 0.000122 - momentum: 0.000000 2023-10-08 20:08:13,549 epoch 3 - iter 96/121 - loss 0.52090380 - time (sec): 70.15 - samples/sec: 281.90 - lr: 0.000120 - momentum: 0.000000 2023-10-08 20:08:21,933 epoch 3 - iter 108/121 - loss 0.50750413 - time (sec): 78.53 - samples/sec: 282.31 - lr: 0.000119 - momentum: 0.000000 2023-10-08 20:08:30,508 epoch 3 - iter 120/121 - loss 0.49848500 - time (sec): 87.11 - samples/sec: 282.37 - lr: 0.000117 - momentum: 0.000000 2023-10-08 20:08:31,057 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:08:31,058 EPOCH 3 done: loss 0.4982 - lr: 0.000117 2023-10-08 
20:08:36,857 DEV : loss 0.3904056251049042 - f1-score (micro avg) 0.0 2023-10-08 20:08:36,863 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:08:45,965 epoch 4 - iter 12/121 - loss 0.29145773 - time (sec): 9.10 - samples/sec: 291.95 - lr: 0.000115 - momentum: 0.000000 2023-10-08 20:08:55,051 epoch 4 - iter 24/121 - loss 0.29475320 - time (sec): 18.19 - samples/sec: 290.00 - lr: 0.000114 - momentum: 0.000000 2023-10-08 20:09:03,275 epoch 4 - iter 36/121 - loss 0.30877627 - time (sec): 26.41 - samples/sec: 286.73 - lr: 0.000112 - momentum: 0.000000 2023-10-08 20:09:11,553 epoch 4 - iter 48/121 - loss 0.31429897 - time (sec): 34.69 - samples/sec: 283.84 - lr: 0.000110 - momentum: 0.000000 2023-10-08 20:09:19,887 epoch 4 - iter 60/121 - loss 0.32056386 - time (sec): 43.02 - samples/sec: 283.99 - lr: 0.000109 - momentum: 0.000000 2023-10-08 20:09:28,539 epoch 4 - iter 72/121 - loss 0.31576489 - time (sec): 51.68 - samples/sec: 285.71 - lr: 0.000107 - momentum: 0.000000 2023-10-08 20:09:37,525 epoch 4 - iter 84/121 - loss 0.31745270 - time (sec): 60.66 - samples/sec: 286.40 - lr: 0.000105 - momentum: 0.000000 2023-10-08 20:09:46,334 epoch 4 - iter 96/121 - loss 0.31125288 - time (sec): 69.47 - samples/sec: 286.43 - lr: 0.000104 - momentum: 0.000000 2023-10-08 20:09:54,549 epoch 4 - iter 108/121 - loss 0.31157833 - time (sec): 77.68 - samples/sec: 284.81 - lr: 0.000102 - momentum: 0.000000 2023-10-08 20:10:03,547 epoch 4 - iter 120/121 - loss 0.30596827 - time (sec): 86.68 - samples/sec: 284.07 - lr: 0.000101 - momentum: 0.000000 2023-10-08 20:10:03,989 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:10:03,989 EPOCH 4 done: loss 0.3060 - lr: 0.000101 2023-10-08 20:10:09,817 DEV : loss 0.27008846402168274 - f1-score (micro avg) 0.5212 2023-10-08 20:10:09,823 saving best model 2023-10-08 20:10:10,691 
---------------------------------------------------------------------------------------------------- 2023-10-08 20:10:18,952 epoch 5 - iter 12/121 - loss 0.29786635 - time (sec): 8.26 - samples/sec: 281.03 - lr: 0.000099 - momentum: 0.000000 2023-10-08 20:10:28,023 epoch 5 - iter 24/121 - loss 0.27029483 - time (sec): 17.33 - samples/sec: 290.24 - lr: 0.000097 - momentum: 0.000000 2023-10-08 20:10:36,105 epoch 5 - iter 36/121 - loss 0.24812798 - time (sec): 25.41 - samples/sec: 285.81 - lr: 0.000095 - momentum: 0.000000 2023-10-08 20:10:45,258 epoch 5 - iter 48/121 - loss 0.23823213 - time (sec): 34.57 - samples/sec: 288.09 - lr: 0.000094 - momentum: 0.000000 2023-10-08 20:10:53,983 epoch 5 - iter 60/121 - loss 0.23295911 - time (sec): 43.29 - samples/sec: 284.80 - lr: 0.000092 - momentum: 0.000000 2023-10-08 20:11:03,197 epoch 5 - iter 72/121 - loss 0.22071234 - time (sec): 52.50 - samples/sec: 285.75 - lr: 0.000091 - momentum: 0.000000 2023-10-08 20:11:11,999 epoch 5 - iter 84/121 - loss 0.22227275 - time (sec): 61.31 - samples/sec: 285.29 - lr: 0.000089 - momentum: 0.000000 2023-10-08 20:11:20,569 epoch 5 - iter 96/121 - loss 0.22032985 - time (sec): 69.88 - samples/sec: 283.53 - lr: 0.000087 - momentum: 0.000000 2023-10-08 20:11:28,912 epoch 5 - iter 108/121 - loss 0.22030489 - time (sec): 78.22 - samples/sec: 281.94 - lr: 0.000086 - momentum: 0.000000 2023-10-08 20:11:38,172 epoch 5 - iter 120/121 - loss 0.21867167 - time (sec): 87.48 - samples/sec: 281.65 - lr: 0.000084 - momentum: 0.000000 2023-10-08 20:11:38,636 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:11:38,637 EPOCH 5 done: loss 0.2186 - lr: 0.000084 2023-10-08 20:11:44,603 DEV : loss 0.21157832443714142 - f1-score (micro avg) 0.6305 2023-10-08 20:11:44,609 saving best model 2023-10-08 20:11:45,563 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:11:53,816 epoch 6 - 
iter 12/121 - loss 0.15161460 - time (sec): 8.25 - samples/sec: 270.74 - lr: 0.000082 - momentum: 0.000000 2023-10-08 20:12:02,719 epoch 6 - iter 24/121 - loss 0.17781971 - time (sec): 17.15 - samples/sec: 272.23 - lr: 0.000081 - momentum: 0.000000 2023-10-08 20:12:11,880 epoch 6 - iter 36/121 - loss 0.18324335 - time (sec): 26.32 - samples/sec: 274.78 - lr: 0.000079 - momentum: 0.000000 2023-10-08 20:12:20,042 epoch 6 - iter 48/121 - loss 0.17626632 - time (sec): 34.48 - samples/sec: 272.23 - lr: 0.000077 - momentum: 0.000000 2023-10-08 20:12:29,495 epoch 6 - iter 60/121 - loss 0.17837208 - time (sec): 43.93 - samples/sec: 272.48 - lr: 0.000076 - momentum: 0.000000 2023-10-08 20:12:38,049 epoch 6 - iter 72/121 - loss 0.17914544 - time (sec): 52.48 - samples/sec: 271.36 - lr: 0.000074 - momentum: 0.000000 2023-10-08 20:12:47,277 epoch 6 - iter 84/121 - loss 0.17557073 - time (sec): 61.71 - samples/sec: 271.35 - lr: 0.000072 - momentum: 0.000000 2023-10-08 20:12:57,254 epoch 6 - iter 96/121 - loss 0.17412353 - time (sec): 71.69 - samples/sec: 272.48 - lr: 0.000071 - momentum: 0.000000 2023-10-08 20:13:06,467 epoch 6 - iter 108/121 - loss 0.16906061 - time (sec): 80.90 - samples/sec: 271.65 - lr: 0.000069 - momentum: 0.000000 2023-10-08 20:13:16,177 epoch 6 - iter 120/121 - loss 0.16809116 - time (sec): 90.61 - samples/sec: 271.63 - lr: 0.000067 - momentum: 0.000000 2023-10-08 20:13:16,733 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:13:16,734 EPOCH 6 done: loss 0.1682 - lr: 0.000067 2023-10-08 20:13:23,195 DEV : loss 0.1836400032043457 - f1-score (micro avg) 0.7872 2023-10-08 20:13:23,201 saving best model 2023-10-08 20:13:27,733 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:13:37,512 epoch 7 - iter 12/121 - loss 0.13480518 - time (sec): 9.78 - samples/sec: 271.45 - lr: 0.000066 - momentum: 0.000000 2023-10-08 20:13:47,078 epoch 
7 - iter 24/121 - loss 0.14910125 - time (sec): 19.34 - samples/sec: 273.06 - lr: 0.000064 - momentum: 0.000000 2023-10-08 20:13:56,780 epoch 7 - iter 36/121 - loss 0.14964913 - time (sec): 29.05 - samples/sec: 271.23 - lr: 0.000062 - momentum: 0.000000 2023-10-08 20:14:06,677 epoch 7 - iter 48/121 - loss 0.14783136 - time (sec): 38.94 - samples/sec: 270.68 - lr: 0.000061 - momentum: 0.000000 2023-10-08 20:14:15,421 epoch 7 - iter 60/121 - loss 0.14601029 - time (sec): 47.69 - samples/sec: 269.26 - lr: 0.000059 - momentum: 0.000000 2023-10-08 20:14:25,406 epoch 7 - iter 72/121 - loss 0.15061259 - time (sec): 57.67 - samples/sec: 269.70 - lr: 0.000057 - momentum: 0.000000 2023-10-08 20:14:34,983 epoch 7 - iter 84/121 - loss 0.14505992 - time (sec): 67.25 - samples/sec: 268.45 - lr: 0.000056 - momentum: 0.000000 2023-10-08 20:14:43,622 epoch 7 - iter 96/121 - loss 0.13963726 - time (sec): 75.89 - samples/sec: 266.16 - lr: 0.000054 - momentum: 0.000000 2023-10-08 20:14:52,982 epoch 7 - iter 108/121 - loss 0.13884359 - time (sec): 85.25 - samples/sec: 265.14 - lr: 0.000052 - momentum: 0.000000 2023-10-08 20:15:01,569 epoch 7 - iter 120/121 - loss 0.13721065 - time (sec): 93.83 - samples/sec: 262.44 - lr: 0.000051 - momentum: 0.000000 2023-10-08 20:15:02,069 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:15:02,070 EPOCH 7 done: loss 0.1372 - lr: 0.000051 2023-10-08 20:15:08,677 DEV : loss 0.15692389011383057 - f1-score (micro avg) 0.8074 2023-10-08 20:15:08,682 saving best model 2023-10-08 20:15:09,559 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:15:18,691 epoch 8 - iter 12/121 - loss 0.11442286 - time (sec): 9.13 - samples/sec: 253.10 - lr: 0.000049 - momentum: 0.000000 2023-10-08 20:15:27,589 epoch 8 - iter 24/121 - loss 0.12430889 - time (sec): 18.03 - samples/sec: 256.93 - lr: 0.000047 - momentum: 0.000000 2023-10-08 20:15:37,149 
epoch 8 - iter 36/121 - loss 0.12430682 - time (sec): 27.59 - samples/sec: 262.03 - lr: 0.000046 - momentum: 0.000000 2023-10-08 20:15:46,556 epoch 8 - iter 48/121 - loss 0.12357815 - time (sec): 37.00 - samples/sec: 262.98 - lr: 0.000044 - momentum: 0.000000 2023-10-08 20:15:56,158 epoch 8 - iter 60/121 - loss 0.11530164 - time (sec): 46.60 - samples/sec: 263.88 - lr: 0.000042 - momentum: 0.000000 2023-10-08 20:16:04,606 epoch 8 - iter 72/121 - loss 0.11815393 - time (sec): 55.05 - samples/sec: 262.87 - lr: 0.000041 - momentum: 0.000000 2023-10-08 20:16:14,049 epoch 8 - iter 84/121 - loss 0.11776244 - time (sec): 64.49 - samples/sec: 263.24 - lr: 0.000039 - momentum: 0.000000 2023-10-08 20:16:23,983 epoch 8 - iter 96/121 - loss 0.11906598 - time (sec): 74.42 - samples/sec: 264.83 - lr: 0.000038 - momentum: 0.000000 2023-10-08 20:16:33,422 epoch 8 - iter 108/121 - loss 0.11942338 - time (sec): 83.86 - samples/sec: 264.38 - lr: 0.000036 - momentum: 0.000000 2023-10-08 20:16:42,702 epoch 8 - iter 120/121 - loss 0.11588818 - time (sec): 93.14 - samples/sec: 263.15 - lr: 0.000034 - momentum: 0.000000 2023-10-08 20:16:43,488 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:16:43,489 EPOCH 8 done: loss 0.1156 - lr: 0.000034 2023-10-08 20:16:50,065 DEV : loss 0.15528881549835205 - f1-score (micro avg) 0.802 2023-10-08 20:16:50,071 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:16:59,468 epoch 9 - iter 12/121 - loss 0.12249039 - time (sec): 9.40 - samples/sec: 257.34 - lr: 0.000032 - momentum: 0.000000 2023-10-08 20:17:08,925 epoch 9 - iter 24/121 - loss 0.11379174 - time (sec): 18.85 - samples/sec: 258.68 - lr: 0.000031 - momentum: 0.000000 2023-10-08 20:17:19,142 epoch 9 - iter 36/121 - loss 0.10377943 - time (sec): 29.07 - samples/sec: 263.94 - lr: 0.000029 - momentum: 0.000000 2023-10-08 20:17:28,627 epoch 9 - iter 48/121 - loss 
0.10446604 - time (sec): 38.56 - samples/sec: 263.44 - lr: 0.000028 - momentum: 0.000000 2023-10-08 20:17:37,348 epoch 9 - iter 60/121 - loss 0.09853297 - time (sec): 47.28 - samples/sec: 262.37 - lr: 0.000026 - momentum: 0.000000 2023-10-08 20:17:46,644 epoch 9 - iter 72/121 - loss 0.09872369 - time (sec): 56.57 - samples/sec: 261.35 - lr: 0.000024 - momentum: 0.000000 2023-10-08 20:17:56,126 epoch 9 - iter 84/121 - loss 0.09623405 - time (sec): 66.05 - samples/sec: 260.83 - lr: 0.000023 - momentum: 0.000000 2023-10-08 20:18:05,122 epoch 9 - iter 96/121 - loss 0.09747693 - time (sec): 75.05 - samples/sec: 261.25 - lr: 0.000021 - momentum: 0.000000 2023-10-08 20:18:14,343 epoch 9 - iter 108/121 - loss 0.10123587 - time (sec): 84.27 - samples/sec: 262.04 - lr: 0.000019 - momentum: 0.000000 2023-10-08 20:18:23,524 epoch 9 - iter 120/121 - loss 0.10254680 - time (sec): 93.45 - samples/sec: 263.46 - lr: 0.000018 - momentum: 0.000000 2023-10-08 20:18:23,998 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:18:23,998 EPOCH 9 done: loss 0.1021 - lr: 0.000018 2023-10-08 20:18:30,037 DEV : loss 0.1471172720193863 - f1-score (micro avg) 0.8075 2023-10-08 20:18:30,043 saving best model 2023-10-08 20:18:34,404 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:18:43,434 epoch 10 - iter 12/121 - loss 0.09498921 - time (sec): 9.03 - samples/sec: 293.94 - lr: 0.000016 - momentum: 0.000000 2023-10-08 20:18:51,928 epoch 10 - iter 24/121 - loss 0.08553472 - time (sec): 17.52 - samples/sec: 285.41 - lr: 0.000014 - momentum: 0.000000 2023-10-08 20:19:00,083 epoch 10 - iter 36/121 - loss 0.08458895 - time (sec): 25.68 - samples/sec: 284.64 - lr: 0.000013 - momentum: 0.000000 2023-10-08 20:19:08,329 epoch 10 - iter 48/121 - loss 0.08874860 - time (sec): 33.92 - samples/sec: 279.45 - lr: 0.000011 - momentum: 0.000000 2023-10-08 20:19:17,188 epoch 10 - iter 
60/121 - loss 0.08784253 - time (sec): 42.78 - samples/sec: 281.16 - lr: 0.000009 - momentum: 0.000000 2023-10-08 20:19:25,230 epoch 10 - iter 72/121 - loss 0.08795295 - time (sec): 50.82 - samples/sec: 280.26 - lr: 0.000008 - momentum: 0.000000 2023-10-08 20:19:33,825 epoch 10 - iter 84/121 - loss 0.08567375 - time (sec): 59.42 - samples/sec: 280.11 - lr: 0.000006 - momentum: 0.000000 2023-10-08 20:19:43,038 epoch 10 - iter 96/121 - loss 0.08821429 - time (sec): 68.63 - samples/sec: 281.12 - lr: 0.000004 - momentum: 0.000000 2023-10-08 20:19:52,303 epoch 10 - iter 108/121 - loss 0.09340782 - time (sec): 77.90 - samples/sec: 283.29 - lr: 0.000003 - momentum: 0.000000 2023-10-08 20:20:01,300 epoch 10 - iter 120/121 - loss 0.09408182 - time (sec): 86.89 - samples/sec: 282.12 - lr: 0.000001 - momentum: 0.000000 2023-10-08 20:20:02,006 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:20:02,007 EPOCH 10 done: loss 0.0940 - lr: 0.000001 2023-10-08 20:20:07,837 DEV : loss 0.14591257274150848 - f1-score (micro avg) 0.8079 2023-10-08 20:20:07,843 saving best model 2023-10-08 20:20:13,091 ---------------------------------------------------------------------------------------------------- 2023-10-08 20:20:13,092 Loading model from best epoch ... 
2023-10-08 20:20:16,699 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-object, B-object, E-object, I-object, S-date, B-date, E-date, I-date
2023-10-08 20:20:22,583 
Results:
- F-score (micro) 0.8173
- F-score (macro) 0.4917
- Accuracy 0.7242

By class:
              precision    recall  f1-score   support

        pers     0.8633    0.8633    0.8633       139
       scope     0.8027    0.9147    0.8551       129
        work     0.6882    0.8000    0.7399        80
         loc     0.0000    0.0000    0.0000         9
        date     0.0000    0.0000    0.0000         3

   micro avg     0.7968    0.8389    0.8173       360
   macro avg     0.4708    0.5156    0.4917       360
weighted avg     0.7739    0.8389    0.8042       360

2023-10-08 20:20:22,583 ----------------------------------------------------------------------------------------------------