2023-10-14 17:40:36,963 ----------------------------------------------------------------------------------------------------
2023-10-14 17:40:36,965 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 17:40:36,965 ----------------------------------------------------------------------------------------------------
2023-10-14 17:40:36,965 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-14 17:40:36,965 ----------------------------------------------------------------------------------------------------
2023-10-14 17:40:36,965 Train:  14465 sentences
2023-10-14 17:40:36,965         (train_with_dev=False, train_with_test=False)
2023-10-14 17:40:36,965 ----------------------------------------------------------------------------------------------------
2023-10-14 17:40:36,965 Training Params:
2023-10-14 17:40:36,965  - learning_rate: "0.00016"
2023-10-14 17:40:36,965  - mini_batch_size: "4"
2023-10-14 17:40:36,966  - max_epochs: "10"
2023-10-14 17:40:36,966  - shuffle: "True"
2023-10-14 17:40:36,966 ----------------------------------------------------------------------------------------------------
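The Training Params above, together with the LinearScheduler plugin listed below, correspond to a standard Flair fine-tuning run. The following is a minimal reproduction sketch, not the original hmBench training script: it assumes a recent Flair release that ships the NER_HIPE_2022 corpus loader and substitutes Flair's generic TransformerWordEmbeddings for the ByT5Embeddings class shown in the model summary.

```python
# Hypothetical reproduction sketch for the logged configuration (an assumption,
# not the original hmBench script).
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Corpus: HIPE-2022 "letemps" (French), as reported in the MultiCorpus line above.
corpus = NER_HIPE_2022(dataset_name="letemps", language="fr")
label_dict = corpus.make_label_dictionary(label_type="ner")

# Embeddings: the hmByT5 checkpoint; pooling and layer choice follow the base-path
# suffix "poolingfirst-layers-1" (subtoken pooling "first", last layer only).
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Tagger without CRF ("crfFalse" in the base path); a single Linear layer maps the
# 1472-dim encoder states to the 13 BIOES tags listed at the end of the log.
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# fine_tune() uses AdamW with a linear schedule (warmup fraction 0.1 by default in
# recent releases), matching the Training Params and LinearScheduler plugin below.
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4",
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
)
```

The byte-level ByT5 encoder works without subword tokenization, which is why the embedding layers in the model summary above have a vocabulary of only 384 entries.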
2023-10-14 17:40:36,966 Plugins:
2023-10-14 17:40:36,966  - TensorboardLogger
2023-10-14 17:40:36,966  - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 17:40:36,966 ----------------------------------------------------------------------------------------------------
2023-10-14 17:40:36,966 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 17:40:36,966  - metric: "('micro avg', 'f1-score')"
2023-10-14 17:40:36,966 ----------------------------------------------------------------------------------------------------
2023-10-14 17:40:36,966 Computation:
2023-10-14 17:40:36,966  - compute on device: cuda:0
2023-10-14 17:40:36,966  - embedding storage: none
2023-10-14 17:40:36,966 ----------------------------------------------------------------------------------------------------
2023-10-14 17:40:36,966 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4"
2023-10-14 17:40:36,966 ----------------------------------------------------------------------------------------------------
2023-10-14 17:40:36,967 ----------------------------------------------------------------------------------------------------
2023-10-14 17:40:36,967 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-14 17:42:14,296 epoch 1 - iter 361/3617 - loss 2.49909092 - time (sec): 97.33 - samples/sec: 375.84 - lr: 0.000016 - momentum: 0.000000
2023-10-14 17:43:53,686 epoch 1 - iter 722/3617 - loss 2.08190665 - time (sec): 196.72 - samples/sec: 381.04 - lr: 0.000032 - momentum: 0.000000
2023-10-14 17:45:35,110 epoch 1 - iter 1083/3617 - loss 1.61898416 - time (sec): 298.14 - samples/sec: 382.50 - lr: 0.000048 - momentum: 0.000000
2023-10-14 17:47:13,733 epoch 1 - iter 1444/3617 - loss 1.28157513 - time (sec): 396.76 - samples/sec: 383.90 - lr: 0.000064 - momentum: 0.000000
2023-10-14 17:48:54,460 epoch 1 - iter 1805/3617 - loss 1.06836036 - time (sec): 497.49 - samples/sec: 381.65 - lr: 0.000080 - momentum: 0.000000
2023-10-14 17:50:36,133 epoch 1 - iter 2166/3617 - loss 0.91639413 - time (sec): 599.16 - samples/sec: 382.52 - lr: 0.000096 - momentum: 0.000000
2023-10-14 17:52:16,836 epoch 1 - iter 2527/3617 - loss 0.80845272 - time (sec): 699.87 - samples/sec: 382.12 - lr: 0.000112 - momentum: 0.000000
2023-10-14 17:53:58,096 epoch 1 - iter 2888/3617 - loss 0.72707977 - time (sec): 801.13 - samples/sec: 381.37 - lr: 0.000128 - momentum: 0.000000
2023-10-14 17:55:36,896 epoch 1 - iter 3249/3617 - loss 0.66281620 - time (sec): 899.93 - samples/sec: 379.83 - lr: 0.000144 - momentum: 0.000000
2023-10-14 17:57:15,182 epoch 1 - iter 3610/3617 - loss 0.60899005 - time (sec): 998.21 - samples/sec: 380.00 - lr: 0.000160 - momentum: 0.000000
2023-10-14 17:57:16,841 ----------------------------------------------------------------------------------------------------
2023-10-14 17:57:16,841 EPOCH 1 done: loss 0.6085 - lr: 0.000160
2023-10-14 17:57:54,156 DEV : loss 0.11666169762611389 - f1-score (micro avg)  0.59
2023-10-14 17:57:54,212 saving best model
2023-10-14 17:57:55,123 ----------------------------------------------------------------------------------------------------
2023-10-14 17:59:33,163 epoch 2 - iter 361/3617 - loss 0.10737844 - time (sec): 98.04 - samples/sec: 384.86 - lr: 0.000158 - momentum: 0.000000
2023-10-14 18:01:11,814 epoch 2 - iter 722/3617 - loss 0.10368130 - time (sec): 196.69 - samples/sec: 387.79 - lr: 0.000156 - momentum: 0.000000
2023-10-14 18:02:50,549 epoch 2 - iter 1083/3617 - loss 0.10379840 - time (sec): 295.42 - samples/sec: 382.95 - lr: 0.000155 - momentum: 0.000000
2023-10-14 18:04:29,909 epoch 2 - iter 1444/3617 - loss 0.10131836 - time (sec): 394.78 - samples/sec: 381.26 - lr: 0.000153 - momentum: 0.000000
2023-10-14 18:06:08,910 epoch 2 - iter 1805/3617 - loss 0.09916791 - time (sec): 493.78 - samples/sec: 380.66 - lr: 0.000151 - momentum: 0.000000
2023-10-14 18:07:47,200 epoch 2 - iter 2166/3617 - loss 0.09716807 - time (sec): 592.07 - samples/sec: 382.55 - lr: 0.000149 - momentum: 0.000000
2023-10-14 18:09:25,905 epoch 2 - iter 2527/3617 - loss 0.09527691 - time (sec): 690.78 - samples/sec: 385.06 - lr: 0.000148 - momentum: 0.000000
2023-10-14 18:11:05,585 epoch 2 - iter 2888/3617 - loss 0.09433563 - time (sec): 790.46 - samples/sec: 384.91 - lr: 0.000146 - momentum: 0.000000
2023-10-14 18:12:50,762 epoch 2 - iter 3249/3617 - loss 0.09149449 - time (sec): 895.64 - samples/sec: 382.24 - lr: 0.000144 - momentum: 0.000000
2023-10-14 18:14:36,327 epoch 2 - iter 3610/3617 - loss 0.09047103 - time (sec): 1001.20 - samples/sec: 378.94 - lr: 0.000142 - momentum: 0.000000
2023-10-14 18:14:38,129 ----------------------------------------------------------------------------------------------------
2023-10-14 18:14:38,130 EPOCH 2 done: loss 0.0908 - lr: 0.000142
2023-10-14 18:15:20,701 DEV : loss 0.11888301372528076 - f1-score (micro avg)  0.6097
2023-10-14 18:15:20,765 saving best model
2023-10-14 18:15:26,987 ----------------------------------------------------------------------------------------------------
2023-10-14 18:17:10,154 epoch 3 - iter 361/3617 - loss 0.06341204 - time (sec): 103.16 - samples/sec: 371.90 - lr: 0.000140 - momentum: 0.000000
2023-10-14 18:18:50,000 epoch 3 - iter 722/3617 - loss 0.06429965 - time (sec): 203.01 - samples/sec: 369.27 - lr: 0.000139 - momentum: 0.000000
2023-10-14 18:20:32,346 epoch 3 - iter 1083/3617 - loss 0.06696834 - time (sec): 305.35 - samples/sec: 369.13 - lr: 0.000137 - momentum: 0.000000
2023-10-14 18:22:19,653 epoch 3 - iter 1444/3617 - loss 0.06594952 - time (sec): 412.66 - samples/sec: 368.74 - lr: 0.000135 - momentum: 0.000000
2023-10-14 18:24:04,287 epoch 3 - iter 1805/3617 - loss 0.06548109 - time (sec): 517.30 - samples/sec: 368.45 - lr: 0.000133 - momentum: 0.000000
2023-10-14 18:25:47,789 epoch 3 - iter 2166/3617 - loss 0.06541805 - time (sec): 620.80 - samples/sec: 369.18 - lr: 0.000132 - momentum: 0.000000
2023-10-14 18:27:35,996 epoch 3 - iter 2527/3617 - loss 0.06512283 - time (sec): 729.00 - samples/sec: 366.63 - lr: 0.000130 - momentum: 0.000000
2023-10-14 18:29:23,835 epoch 3 - iter 2888/3617 - loss 0.06503497 - time (sec): 836.84 - samples/sec: 362.96 - lr: 0.000128 - momentum: 0.000000
2023-10-14 18:31:09,766 epoch 3 - iter 3249/3617 - loss 0.06475445 - time (sec): 942.77 - samples/sec: 362.35 - lr: 0.000126 - momentum: 0.000000
2023-10-14 18:32:53,571 epoch 3 - iter 3610/3617 - loss 0.06534905 - time (sec): 1046.58 - samples/sec: 362.42 - lr: 0.000124 - momentum: 0.000000
2023-10-14 18:32:55,349 ----------------------------------------------------------------------------------------------------
2023-10-14 18:32:55,349 EPOCH 3 done: loss 0.0654 - lr: 0.000124
2023-10-14 18:33:34,424 DEV : loss 0.1653740406036377 - f1-score (micro avg)  0.6296
2023-10-14 18:33:34,486 saving best model
2023-10-14 18:33:38,369 ----------------------------------------------------------------------------------------------------
2023-10-14 18:35:22,865 epoch 4 - iter 361/3617 - loss 0.04722605 - time (sec): 104.49 - samples/sec: 350.14 - lr: 0.000123 - momentum: 0.000000
2023-10-14 18:37:07,462 epoch 4 - iter 722/3617 - loss 0.04683519 - time (sec): 209.08 - samples/sec: 356.48 - lr: 0.000121 - momentum: 0.000000
2023-10-14 18:38:50,699 epoch 4 - iter 1083/3617 - loss 0.04733974 - time (sec): 312.32 - samples/sec: 363.87 - lr: 0.000119 - momentum: 0.000000
2023-10-14 18:40:34,273 epoch 4 - iter 1444/3617 - loss 0.04628566 - time (sec): 415.90 - samples/sec: 361.08 - lr: 0.000117 - momentum: 0.000000
2023-10-14 18:42:19,464 epoch 4 - iter 1805/3617 - loss 0.04635430 - time (sec): 521.09 - samples/sec: 360.41 - lr: 0.000116 - momentum: 0.000000
2023-10-14 18:44:05,578 epoch 4 - iter 2166/3617 - loss 0.04695993 - time (sec): 627.20 - samples/sec: 361.14 - lr: 0.000114 - momentum: 0.000000
2023-10-14 18:45:51,805 epoch 4 - iter 2527/3617 - loss 0.04750166 - time (sec): 733.43 - samples/sec: 361.15 - lr: 0.000112 - momentum: 0.000000
2023-10-14 18:47:45,483 epoch 4 - iter 2888/3617 - loss 0.04691613 - time (sec): 847.10 - samples/sec: 358.87 - lr: 0.000110 - momentum: 0.000000
2023-10-14 18:49:35,948 epoch 4 - iter 3249/3617 - loss 0.04659129 - time (sec): 957.57 - samples/sec: 356.07 - lr: 0.000108 - momentum: 0.000000
2023-10-14 18:51:24,470 epoch 4 - iter 3610/3617 - loss 0.04611898 - time (sec): 1066.09 - samples/sec: 355.80 - lr: 0.000107 - momentum: 0.000000
2023-10-14 18:51:26,173 ----------------------------------------------------------------------------------------------------
2023-10-14 18:51:26,173 EPOCH 4 done: loss 0.0461 - lr: 0.000107
2023-10-14 18:52:09,529 DEV : loss 0.2204860895872116 - f1-score (micro avg)  0.6404
2023-10-14 18:52:09,605 saving best model
2023-10-14 18:52:14,120 ----------------------------------------------------------------------------------------------------
2023-10-14 18:54:00,964 epoch 5 - iter 361/3617 - loss 0.02546993 - time (sec): 106.84 - samples/sec: 351.85 - lr: 0.000105 - momentum: 0.000000
2023-10-14 18:55:48,615 epoch 5 - iter 722/3617 - loss 0.02714827 - time (sec): 214.49 - samples/sec: 366.02 - lr: 0.000103 - momentum: 0.000000
2023-10-14 18:57:36,152 epoch 5 - iter 1083/3617 - loss 0.02846680 - time (sec): 322.03 - samples/sec: 364.04 - lr: 0.000101 - momentum: 0.000000
2023-10-14 18:59:19,149 epoch 5 - iter 1444/3617 - loss 0.02938967 - time (sec): 425.03 - samples/sec: 362.02 - lr: 0.000100 - momentum: 0.000000
2023-10-14 19:01:02,064 epoch 5 - iter 1805/3617 - loss 0.03031118 - time (sec): 527.94 - samples/sec: 361.07 - lr: 0.000098 - momentum: 0.000000
2023-10-14 19:02:49,967 epoch 5 - iter 2166/3617 - loss 0.03175675 - time (sec): 635.84 - samples/sec: 360.07 - lr: 0.000096 - momentum: 0.000000
2023-10-14 19:04:46,241 epoch 5 - iter 2527/3617 - loss 0.03094337 - time (sec): 752.12 - samples/sec: 354.43 - lr: 0.000094 - momentum: 0.000000
2023-10-14 19:06:38,828 epoch 5 - iter 2888/3617 - loss 0.03069326 - time (sec): 864.70 - samples/sec: 351.73 - lr: 0.000092 - momentum: 0.000000
2023-10-14 19:08:25,290 epoch 5 - iter 3249/3617 - loss 0.03146461 - time (sec): 971.17 - samples/sec: 351.36 - lr: 0.000091 - momentum: 0.000000
2023-10-14 19:10:15,061 epoch 5 - iter 3610/3617 - loss 0.03184261 - time (sec): 1080.94 - samples/sec: 350.82 - lr: 0.000089 - momentum: 0.000000
2023-10-14 19:10:17,200 ----------------------------------------------------------------------------------------------------
2023-10-14 19:10:17,201 EPOCH 5 done: loss 0.0318 - lr: 0.000089
2023-10-14 19:11:00,836 DEV : loss 0.276943564414978 - f1-score (micro avg)  0.6544
2023-10-14 19:11:00,899 saving best model
2023-10-14 19:11:06,852 ----------------------------------------------------------------------------------------------------
2023-10-14 19:13:01,884 epoch 6 - iter 361/3617 - loss 0.02353414 - time (sec): 115.03 - samples/sec: 341.58 - lr: 0.000087 - momentum: 0.000000
2023-10-14 19:14:48,500 epoch 6 - iter 722/3617 - loss 0.02260379 - time (sec): 221.64 - samples/sec: 345.08 - lr: 0.000085 - momentum: 0.000000
2023-10-14 19:16:39,421 epoch 6 - iter 1083/3617 - loss 0.02041161 - time (sec): 332.56 - samples/sec: 342.90 - lr: 0.000084 - momentum: 0.000000
2023-10-14 19:18:25,198 epoch 6 - iter 1444/3617 - loss 0.02098726 - time (sec): 438.34 - samples/sec: 344.92 - lr: 0.000082 - momentum: 0.000000
2023-10-14 19:20:12,176 epoch 6 - iter 1805/3617 - loss 0.02304267 - time (sec): 545.32 - samples/sec: 345.60 - lr: 0.000080 - momentum: 0.000000
2023-10-14 19:21:59,618 epoch 6 - iter 2166/3617 - loss 0.02397054 - time (sec): 652.76 - samples/sec: 347.81 - lr: 0.000078 - momentum: 0.000000
2023-10-14 19:23:51,331 epoch 6 - iter 2527/3617 - loss 0.02338402 - time (sec): 764.47 - samples/sec: 346.20 - lr: 0.000076 - momentum: 0.000000
2023-10-14 19:25:40,140 epoch 6 - iter 2888/3617 - loss 0.02356297 - time (sec): 873.28 - samples/sec: 347.55 - lr: 0.000075 - momentum: 0.000000
2023-10-14 19:27:29,300 epoch 6 - iter 3249/3617 - loss 0.02347202 - time (sec): 982.44 - samples/sec: 347.05 - lr: 0.000073 - momentum: 0.000000
2023-10-14 19:29:22,016 epoch 6 - iter 3610/3617 - loss 0.02303344 - time (sec): 1095.16 - samples/sec: 346.45 - lr: 0.000071 - momentum: 0.000000
2023-10-14 19:29:23,971 ----------------------------------------------------------------------------------------------------
2023-10-14 19:29:23,971 EPOCH 6 done: loss 0.0230 - lr: 0.000071
2023-10-14 19:30:06,159 DEV : loss 0.3062983453273773 - f1-score (micro avg)  0.6431
2023-10-14 19:30:06,218 ----------------------------------------------------------------------------------------------------
2023-10-14 19:31:58,733 epoch 7 - iter 361/3617 - loss 0.01295537 - time (sec): 112.51 - samples/sec: 360.93 - lr: 0.000069 - momentum: 0.000000
2023-10-14 19:33:45,291 epoch 7 - iter 722/3617 - loss 0.01390893 - time (sec): 219.07 - samples/sec: 354.76 - lr: 0.000068 - momentum: 0.000000
2023-10-14 19:35:29,705 epoch 7 - iter 1083/3617 - loss 0.01484254 - time (sec): 323.48 - samples/sec: 356.97 - lr: 0.000066 - momentum: 0.000000
2023-10-14 19:37:17,664 epoch 7 - iter 1444/3617 - loss 0.01638564 - time (sec): 431.44 - samples/sec: 353.72 - lr: 0.000064 - momentum: 0.000000
2023-10-14 19:39:06,477 epoch 7 - iter 1805/3617 - loss 0.01577639 - time (sec): 540.26 - samples/sec: 351.59 - lr: 0.000062 - momentum: 0.000000
2023-10-14 19:40:52,871 epoch 7 - iter 2166/3617 - loss 0.01583463 - time (sec): 646.65 - samples/sec: 352.37 - lr: 0.000060 - momentum: 0.000000
2023-10-14 19:42:38,683 epoch 7 - iter 2527/3617 - loss 0.01550215 - time (sec): 752.46 - samples/sec: 353.75 - lr: 0.000059 - momentum: 0.000000
2023-10-14 19:44:29,184 epoch 7 - iter 2888/3617 - loss 0.01587773 - time (sec): 862.96 - samples/sec: 354.14 - lr: 0.000057 - momentum: 0.000000
2023-10-14 19:46:26,131 epoch 7 - iter 3249/3617 - loss 0.01632535 - time (sec): 979.91 - samples/sec: 349.86 - lr: 0.000055 - momentum: 0.000000
2023-10-14 19:48:20,303 epoch 7 - iter 3610/3617 - loss 0.01660869 - time (sec): 1094.08 - samples/sec: 346.42 - lr: 0.000053 - momentum: 0.000000
2023-10-14 19:48:22,477 ----------------------------------------------------------------------------------------------------
2023-10-14 19:48:22,477 EPOCH 7 done: loss 0.0166 - lr: 0.000053
2023-10-14 19:49:06,170 DEV : loss 0.2951955795288086 - f1-score (micro avg)  0.6553
2023-10-14 19:49:06,247 saving best model
2023-10-14 19:49:12,289 ----------------------------------------------------------------------------------------------------
2023-10-14 19:51:02,047 epoch 8 - iter 361/3617 - loss 0.01175916 - time (sec): 109.75 - samples/sec: 349.93 - lr: 0.000052 - momentum: 0.000000
2023-10-14 19:52:51,592 epoch 8 - iter 722/3617 - loss 0.01207346 - time (sec): 219.30 - samples/sec: 353.91 - lr: 0.000050 - momentum: 0.000000
2023-10-14 19:54:36,484 epoch 8 - iter 1083/3617 - loss 0.01158310 - time (sec): 324.19 - samples/sec: 356.39 - lr: 0.000048 - momentum: 0.000000
2023-10-14 19:56:21,848 epoch 8 - iter 1444/3617 - loss 0.01138497 - time (sec): 429.55 - samples/sec: 354.33 - lr: 0.000046 - momentum: 0.000000
2023-10-14 19:58:12,503 epoch 8 - iter 1805/3617 - loss 0.01137053 - time (sec): 540.21 - samples/sec: 352.99 - lr: 0.000044 - momentum: 0.000000
2023-10-14 19:59:59,874 epoch 8 - iter 2166/3617 - loss 0.01136281 - time (sec): 647.58 - samples/sec: 350.48 - lr: 0.000043 - momentum: 0.000000
2023-10-14 20:01:57,527 epoch 8 - iter 2527/3617 - loss 0.01116022 - time (sec): 765.23 - samples/sec: 347.70 - lr: 0.000041 - momentum: 0.000000
2023-10-14 20:03:51,731 epoch 8 - iter 2888/3617 - loss 0.01129057 - time (sec): 879.44 - samples/sec: 344.69 - lr: 0.000039 - momentum: 0.000000
2023-10-14 20:05:45,668 epoch 8 - iter 3249/3617 - loss 0.01102905 - time (sec): 993.38 - samples/sec: 343.74 - lr: 0.000037 - momentum: 0.000000
2023-10-14 20:07:36,804 epoch 8 - iter 3610/3617 - loss 0.01076886 - time (sec): 1104.51 - samples/sec: 343.37 - lr: 0.000036 - momentum: 0.000000
2023-10-14 20:07:38,705 ----------------------------------------------------------------------------------------------------
2023-10-14 20:07:38,706 EPOCH 8 done: loss 0.0108 - lr: 0.000036
2023-10-14 20:08:21,212 DEV : loss 0.33818429708480835 - f1-score (micro avg)  0.6626
2023-10-14 20:08:21,286 saving best model
2023-10-14 20:08:27,683 ----------------------------------------------------------------------------------------------------
2023-10-14 20:10:12,574 epoch 9 - iter 361/3617 - loss 0.00533047 - time (sec): 104.88 - samples/sec: 347.07 - lr: 0.000034 - momentum: 0.000000
2023-10-14 20:12:00,413 epoch 9 - iter 722/3617 - loss 0.00755431 - time (sec): 212.72 - samples/sec: 342.76 - lr: 0.000032 - momentum: 0.000000
2023-10-14 20:13:47,284 epoch 9 - iter 1083/3617 - loss 0.00876838 - time (sec): 319.59 - samples/sec: 348.03 - lr: 0.000030 - momentum: 0.000000
2023-10-14 20:15:39,273 epoch 9 - iter 1444/3617 - loss 0.00873982 - time (sec): 431.58 - samples/sec: 347.12 - lr: 0.000028 - momentum: 0.000000
2023-10-14 20:17:28,419 epoch 9 - iter 1805/3617 - loss 0.00799554 - time (sec): 540.73 - samples/sec: 349.02 - lr: 0.000027 - momentum: 0.000000
2023-10-14 20:19:14,372 epoch 9 - iter 2166/3617 - loss 0.00816002 - time (sec): 646.68 - samples/sec: 349.48 - lr: 0.000025 - momentum: 0.000000
2023-10-14 20:21:01,820 epoch 9 - iter 2527/3617 - loss 0.00778069 - time (sec): 754.13 - samples/sec: 349.98 - lr: 0.000023 - momentum: 0.000000
2023-10-14 20:22:56,234 epoch 9 - iter 2888/3617 - loss 0.00744671 - time (sec): 868.54 - samples/sec: 349.39 - lr: 0.000021 - momentum: 0.000000
2023-10-14 20:24:47,946 epoch 9 - iter 3249/3617 - loss 0.00733316 - time (sec): 980.26 - samples/sec: 348.43 - lr: 0.000020 - momentum: 0.000000
2023-10-14 20:26:34,573 epoch 9 - iter 3610/3617 - loss 0.00753722 - time (sec): 1086.88 - samples/sec: 348.85 - lr: 0.000018 - momentum: 0.000000
2023-10-14 20:26:36,509 ----------------------------------------------------------------------------------------------------
2023-10-14 20:26:36,510 EPOCH 9 done: loss 0.0075 - lr: 0.000018
2023-10-14 20:27:21,761 DEV : loss 0.3862650990486145 - f1-score (micro avg)  0.6657
2023-10-14 20:27:21,827 saving best model
2023-10-14 20:27:27,609 ----------------------------------------------------------------------------------------------------
2023-10-14 20:29:16,160 epoch 10 - iter 361/3617 - loss 0.00549490 - time (sec): 108.55 - samples/sec: 344.88 - lr: 0.000016 - momentum: 0.000000
2023-10-14 20:31:04,359 epoch 10 - iter 722/3617 - loss 0.00485832 - time (sec): 216.75 - samples/sec: 349.65 - lr: 0.000014 - momentum: 0.000000
2023-10-14 20:32:56,980 epoch 10 - iter 1083/3617 - loss 0.00533342 - time (sec): 329.37 - samples/sec: 347.78 - lr: 0.000012 - momentum: 0.000000
2023-10-14 20:34:50,788 epoch 10 - iter 1444/3617 - loss 0.00475295 - time (sec): 443.17 - samples/sec: 343.13 - lr: 0.000011 - momentum: 0.000000
2023-10-14 20:36:40,096 epoch 10 - iter 1805/3617 - loss 0.00472621 - time (sec): 552.48 - samples/sec: 343.49 - lr: 0.000009 - momentum: 0.000000
2023-10-14 20:38:31,419 epoch 10 - iter 2166/3617 - loss 0.00448934 - time (sec): 663.81 - samples/sec: 343.72 - lr: 0.000007 - momentum: 0.000000
2023-10-14 20:40:22,160 epoch 10 - iter 2527/3617 - loss 0.00440123 - time (sec): 774.55 - samples/sec: 343.30 - lr: 0.000005 - momentum: 0.000000
2023-10-14 20:42:10,416 epoch 10 - iter 2888/3617 - loss 0.00441431 - time (sec): 882.80 - samples/sec: 345.63 - lr: 0.000004 - momentum: 0.000000
2023-10-14 20:44:01,192 epoch 10 - iter 3249/3617 - loss 0.00460905 - time (sec): 993.58 - samples/sec: 344.54 - lr: 0.000002 - momentum: 0.000000
2023-10-14 20:45:49,067 epoch 10 - iter 3610/3617 - loss 0.00453893 - time (sec): 1101.45 - samples/sec: 344.40 - lr: 0.000000 - momentum: 0.000000
2023-10-14 20:45:50,816 ----------------------------------------------------------------------------------------------------
2023-10-14 20:45:50,817 EPOCH 10 done: loss 0.0045 - lr: 0.000000
2023-10-14 20:46:33,079 DEV : loss 0.3649245500564575 - f1-score (micro avg)  0.6654
2023-10-14 20:46:34,180 ----------------------------------------------------------------------------------------------------
2023-10-14 20:46:34,182 Loading model from best epoch ...
2023-10-14 20:46:38,464 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-14 20:47:39,507 
Results:
- F-score (micro) 0.6404
- F-score (macro) 0.524
- Accuracy 0.4857

By class:
              precision    recall  f1-score   support

         loc     0.6349    0.7885    0.7034       591
        pers     0.5381    0.7311    0.6200       357
         org     0.2333    0.2658    0.2485        79

   micro avg     0.5714    0.7283    0.6404      1027
   macro avg     0.4688    0.5951    0.5240      1027
weighted avg     0.5704    0.7283    0.6394      1027

2023-10-14 20:47:39,507 ----------------------------------------------------------------------------------------------------
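The best checkpoint (saved at epoch 9, dev micro F1 0.6657) reaches a micro F-score of 0.6404 on the test split. Below is a minimal, hypothetical inference sketch for the saved checkpoint: it assumes the best-model.pt file from the base path above is available locally, and the French sentence is purely illustrative, not part of the evaluation data.

```python
# Hypothetical usage sketch: load the saved Flair checkpoint and tag a sentence.
from flair.data import Sentence
from flair.models import SequenceTagger

# Path assumed; in this run the file sits under the "Model training base path" above.
tagger = SequenceTagger.load("best-model.pt")

# Illustrative example sentence (not from the HIPE-2022 letemps data).
sentence = Sentence("Le Temps est un quotidien publié à Genève .")
tagger.predict(sentence)

# Print the recognized loc / pers / org spans with their confidence scores.
for span in sentence.get_spans("ner"):
    print(span)
```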