2023-10-14 11:52:28,279 ----------------------------------------------------------------------------------------------------
2023-10-14 11:52:28,281 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 11:52:28,281 ----------------------------------------------------------------------------------------------------
2023-10-14 11:52:28,281 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-14 11:52:28,281 ----------------------------------------------------------------------------------------------------
2023-10-14 11:52:28,282 Train: 14465 sentences
2023-10-14 11:52:28,282 (train_with_dev=False, train_with_test=False)
2023-10-14 11:52:28,282 ----------------------------------------------------------------------------------------------------
2023-10-14 11:52:28,282 Training Params:
2023-10-14 11:52:28,282 - learning_rate: "0.00016"
2023-10-14 11:52:28,282 - mini_batch_size: "8"
2023-10-14 11:52:28,282 - max_epochs: "10"
2023-10-14 11:52:28,282 - shuffle: "True"
2023-10-14 11:52:28,282 ----------------------------------------------------------------------------------------------------
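The configuration recorded above (ByT5 embeddings, last layer only, first-subtoken pooling, no CRF, plain linear + cross-entropy head, learning rate 0.00016, mini-batch size 8, 10 epochs) corresponds roughly to the Flair fine-tuning sketch below. This is a hedged reconstruction, not the original training script: the Hugging Face model ID and the NER_HIPE_2022 loader arguments are assumptions inferred from the base path and corpus name in this log.

```python
# Hedged sketch of a Flair fine-tuning setup matching the parameters logged above.
# The model ID and corpus arguments are assumptions inferred from the base path
# "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-..."
# and the NER_HIPE_2022 corpus name; the actual hmBench script may differ.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Assumed loader arguments for the letemps/fr HIPE-2022 corpus.
corpus = NER_HIPE_2022(dataset_name="letemps", language="fr")
label_dict = corpus.make_label_dictionary(label_type="ner")

embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # assumed HF model ID
    layers="-1",               # "layers-1" in the base path: last encoder layer only
    subtoken_pooling="first",  # "poolingfirst" in the base path
    fine_tune=True,
)

tagger = SequenceTagger(
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_rnn=False,                # matches the dump: no RNN on top of the embeddings
    use_crf=False,                # "crfFalse" in the base path
    reproject_embeddings=False,   # plain Linear(1472 -> 13) + CrossEntropyLoss head
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4",
    learning_rate=0.00016,
    mini_batch_size=8,
    max_epochs=10,
)
```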
2023-10-14 11:52:28,282 Plugins: 2023-10-14 11:52:28,282 - TensorboardLogger 2023-10-14 11:52:28,282 - LinearScheduler | warmup_fraction: '0.1' 2023-10-14 11:52:28,282 ---------------------------------------------------------------------------------------------------- 2023-10-14 11:52:28,282 Final evaluation on model from best epoch (best-model.pt) 2023-10-14 11:52:28,282 - metric: "('micro avg', 'f1-score')" 2023-10-14 11:52:28,283 ---------------------------------------------------------------------------------------------------- 2023-10-14 11:52:28,283 Computation: 2023-10-14 11:52:28,283 - compute on device: cuda:0 2023-10-14 11:52:28,283 - embedding storage: none 2023-10-14 11:52:28,283 ---------------------------------------------------------------------------------------------------- 2023-10-14 11:52:28,283 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4" 2023-10-14 11:52:28,283 ---------------------------------------------------------------------------------------------------- 2023-10-14 11:52:28,283 ---------------------------------------------------------------------------------------------------- 2023-10-14 11:52:28,283 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-14 11:54:01,906 epoch 1 - iter 180/1809 - loss 2.54702501 - time (sec): 93.62 - samples/sec: 389.75 - lr: 0.000016 - momentum: 0.000000 2023-10-14 11:55:34,893 epoch 1 - iter 360/1809 - loss 2.27602894 - time (sec): 186.61 - samples/sec: 400.06 - lr: 0.000032 - momentum: 0.000000 2023-10-14 11:57:10,309 epoch 1 - iter 540/1809 - loss 1.92434741 - time (sec): 282.02 - samples/sec: 402.87 - lr: 0.000048 - momentum: 0.000000 2023-10-14 11:58:44,161 epoch 1 - iter 720/1809 - loss 1.57584407 - time (sec): 375.88 - samples/sec: 404.57 - lr: 0.000064 - momentum: 0.000000 2023-10-14 12:00:18,267 epoch 1 - iter 900/1809 - loss 1.32127986 - time (sec): 469.98 - samples/sec: 403.03 - lr: 0.000080 - momentum: 0.000000 2023-10-14 12:01:56,606 epoch 1 - iter 1080/1809 - loss 1.13001630 - time (sec): 568.32 - samples/sec: 402.37 - lr: 0.000095 - momentum: 0.000000 2023-10-14 12:03:33,121 epoch 1 - iter 1260/1809 - loss 0.99420573 - time (sec): 664.84 - samples/sec: 401.11 - lr: 0.000111 - momentum: 0.000000 2023-10-14 12:05:10,045 epoch 1 - iter 1440/1809 - loss 0.89109878 - time (sec): 761.76 - samples/sec: 400.07 - lr: 0.000127 - momentum: 0.000000 2023-10-14 12:06:47,130 epoch 1 - iter 1620/1809 - loss 0.81100176 - time (sec): 858.85 - samples/sec: 397.02 - lr: 0.000143 - momentum: 0.000000 2023-10-14 12:08:23,566 epoch 1 - iter 1800/1809 - loss 0.74361325 - time (sec): 955.28 - samples/sec: 396.06 - lr: 0.000159 - momentum: 0.000000 2023-10-14 12:08:27,689 ---------------------------------------------------------------------------------------------------- 2023-10-14 12:08:27,690 EPOCH 1 done: loss 0.7418 - lr: 0.000159 2023-10-14 12:09:05,444 DEV : loss 0.13058921694755554 - f1-score (micro avg) 0.4901 2023-10-14 12:09:05,501 saving best model 2023-10-14 12:09:06,504 ---------------------------------------------------------------------------------------------------- 2023-10-14 12:10:41,018 epoch 2 - iter 180/1809 - loss 0.11763819 - time (sec): 94.51 - samples/sec: 398.13 - lr: 0.000158 - momentum: 0.000000 2023-10-14 12:12:15,894 epoch 2 - iter 360/1809 - loss 0.11288606 - time (sec): 189.39 - samples/sec: 401.69 - lr: 0.000156 - momentum: 0.000000 2023-10-14 
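The LinearScheduler plugin with warmup_fraction '0.1' explains the learning-rate column in the epoch logs that follow: with 10 epochs of 1809 batches (18,090 optimizer steps), the rate ramps from near zero to the peak of 0.00016 over the first ~1809 steps, i.e. roughly the first epoch, then decays linearly back to zero. A minimal standalone sketch of that schedule (an illustration, not Flair's own LinearScheduler code):

```python
# Minimal sketch of a linear warmup + linear decay schedule, assuming
# warmup_fraction=0.1, peak_lr=0.00016 and 10 epochs x 1809 batches = 18090 steps.
def linear_schedule(step: int, total_steps: int = 18090,
                    peak_lr: float = 0.00016, warmup_fraction: float = 0.1) -> float:
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # ramp up linearly during warmup
        return peak_lr * step / max(1, warmup_steps)
    # then decay linearly to zero by the last step
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# e.g. step 1800 (end of epoch 1) -> ~0.000159, matching "lr: 0.000159" logged below;
#      step 3600 (end of epoch 2) -> ~0.000142, matching "lr: 0.000142".
```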
2023-10-14 11:54:01,906 epoch 1 - iter 180/1809 - loss 2.54702501 - time (sec): 93.62 - samples/sec: 389.75 - lr: 0.000016 - momentum: 0.000000
2023-10-14 11:55:34,893 epoch 1 - iter 360/1809 - loss 2.27602894 - time (sec): 186.61 - samples/sec: 400.06 - lr: 0.000032 - momentum: 0.000000
2023-10-14 11:57:10,309 epoch 1 - iter 540/1809 - loss 1.92434741 - time (sec): 282.02 - samples/sec: 402.87 - lr: 0.000048 - momentum: 0.000000
2023-10-14 11:58:44,161 epoch 1 - iter 720/1809 - loss 1.57584407 - time (sec): 375.88 - samples/sec: 404.57 - lr: 0.000064 - momentum: 0.000000
2023-10-14 12:00:18,267 epoch 1 - iter 900/1809 - loss 1.32127986 - time (sec): 469.98 - samples/sec: 403.03 - lr: 0.000080 - momentum: 0.000000
2023-10-14 12:01:56,606 epoch 1 - iter 1080/1809 - loss 1.13001630 - time (sec): 568.32 - samples/sec: 402.37 - lr: 0.000095 - momentum: 0.000000
2023-10-14 12:03:33,121 epoch 1 - iter 1260/1809 - loss 0.99420573 - time (sec): 664.84 - samples/sec: 401.11 - lr: 0.000111 - momentum: 0.000000
2023-10-14 12:05:10,045 epoch 1 - iter 1440/1809 - loss 0.89109878 - time (sec): 761.76 - samples/sec: 400.07 - lr: 0.000127 - momentum: 0.000000
2023-10-14 12:06:47,130 epoch 1 - iter 1620/1809 - loss 0.81100176 - time (sec): 858.85 - samples/sec: 397.02 - lr: 0.000143 - momentum: 0.000000
2023-10-14 12:08:23,566 epoch 1 - iter 1800/1809 - loss 0.74361325 - time (sec): 955.28 - samples/sec: 396.06 - lr: 0.000159 - momentum: 0.000000
2023-10-14 12:08:27,689 ----------------------------------------------------------------------------------------------------
2023-10-14 12:08:27,690 EPOCH 1 done: loss 0.7418 - lr: 0.000159
2023-10-14 12:09:05,444 DEV : loss 0.13058921694755554 - f1-score (micro avg) 0.4901
2023-10-14 12:09:05,501 saving best model
2023-10-14 12:09:06,504 ----------------------------------------------------------------------------------------------------
2023-10-14 12:10:41,018 epoch 2 - iter 180/1809 - loss 0.11763819 - time (sec): 94.51 - samples/sec: 398.13 - lr: 0.000158 - momentum: 0.000000
2023-10-14 12:12:15,894 epoch 2 - iter 360/1809 - loss 0.11288606 - time (sec): 189.39 - samples/sec: 401.69 - lr: 0.000156 - momentum: 0.000000
2023-10-14 12:13:53,929 epoch 2 - iter 540/1809 - loss 0.11069431 - time (sec): 287.42 - samples/sec: 392.41 - lr: 0.000155 - momentum: 0.000000
2023-10-14 12:15:25,447 epoch 2 - iter 720/1809 - loss 0.10746830 - time (sec): 378.94 - samples/sec: 395.96 - lr: 0.000153 - momentum: 0.000000
2023-10-14 12:16:57,242 epoch 2 - iter 900/1809 - loss 0.10386008 - time (sec): 470.74 - samples/sec: 398.06 - lr: 0.000151 - momentum: 0.000000
2023-10-14 12:18:32,290 epoch 2 - iter 1080/1809 - loss 0.10038808 - time (sec): 565.78 - samples/sec: 399.27 - lr: 0.000149 - momentum: 0.000000
2023-10-14 12:20:11,902 epoch 2 - iter 1260/1809 - loss 0.09720591 - time (sec): 665.40 - samples/sec: 398.88 - lr: 0.000148 - momentum: 0.000000
2023-10-14 12:21:47,580 epoch 2 - iter 1440/1809 - loss 0.09543514 - time (sec): 761.07 - samples/sec: 398.87 - lr: 0.000146 - momentum: 0.000000
2023-10-14 12:23:21,787 epoch 2 - iter 1620/1809 - loss 0.09204174 - time (sec): 855.28 - samples/sec: 399.25 - lr: 0.000144 - momentum: 0.000000
2023-10-14 12:24:54,948 epoch 2 - iter 1800/1809 - loss 0.09031627 - time (sec): 948.44 - samples/sec: 399.02 - lr: 0.000142 - momentum: 0.000000
2023-10-14 12:24:59,031 ----------------------------------------------------------------------------------------------------
2023-10-14 12:24:59,031 EPOCH 2 done: loss 0.0905 - lr: 0.000142
2023-10-14 12:25:38,013 DEV : loss 0.09481607377529144 - f1-score (micro avg) 0.6302
2023-10-14 12:25:38,069 saving best model
2023-10-14 12:25:41,560 ----------------------------------------------------------------------------------------------------
2023-10-14 12:27:25,068 epoch 3 - iter 180/1809 - loss 0.05732527 - time (sec): 103.50 - samples/sec: 369.83 - lr: 0.000140 - momentum: 0.000000
2023-10-14 12:28:58,543 epoch 3 - iter 360/1809 - loss 0.05785601 - time (sec): 196.98 - samples/sec: 379.54 - lr: 0.000139 - momentum: 0.000000
2023-10-14 12:30:28,432 epoch 3 - iter 540/1809 - loss 0.05962819 - time (sec): 286.87 - samples/sec: 391.90 - lr: 0.000137 - momentum: 0.000000
2023-10-14 12:31:58,859 epoch 3 - iter 720/1809 - loss 0.05938527 - time (sec): 377.29 - samples/sec: 402.14 - lr: 0.000135 - momentum: 0.000000
2023-10-14 12:33:36,210 epoch 3 - iter 900/1809 - loss 0.05901793 - time (sec): 474.64 - samples/sec: 400.35 - lr: 0.000133 - momentum: 0.000000
2023-10-14 12:35:21,233 epoch 3 - iter 1080/1809 - loss 0.05760677 - time (sec): 579.67 - samples/sec: 394.24 - lr: 0.000132 - momentum: 0.000000
2023-10-14 12:37:02,151 epoch 3 - iter 1260/1809 - loss 0.05767185 - time (sec): 680.59 - samples/sec: 391.76 - lr: 0.000130 - momentum: 0.000000
2023-10-14 12:38:38,632 epoch 3 - iter 1440/1809 - loss 0.05754962 - time (sec): 777.07 - samples/sec: 390.27 - lr: 0.000128 - momentum: 0.000000
2023-10-14 12:40:18,411 epoch 3 - iter 1620/1809 - loss 0.05726013 - time (sec): 876.85 - samples/sec: 388.54 - lr: 0.000126 - momentum: 0.000000
2023-10-14 12:41:58,428 epoch 3 - iter 1800/1809 - loss 0.05737904 - time (sec): 976.86 - samples/sec: 387.15 - lr: 0.000125 - momentum: 0.000000
2023-10-14 12:42:03,646 ----------------------------------------------------------------------------------------------------
2023-10-14 12:42:03,646 EPOCH 3 done: loss 0.0576 - lr: 0.000125
2023-10-14 12:42:52,619 DEV : loss 0.1367848813533783 - f1-score (micro avg) 0.6292
2023-10-14 12:42:52,685 ----------------------------------------------------------------------------------------------------
2023-10-14 12:44:28,197 epoch 4 - iter 180/1809 - loss 0.03871364 - time (sec): 95.51 - samples/sec: 381.69 - lr: 0.000123 - momentum: 0.000000
2023-10-14 12:46:00,663 epoch 4 - iter 360/1809 - loss 0.03856325 - time (sec): 187.98 - samples/sec: 395.23 - lr: 0.000121 - momentum: 0.000000
2023-10-14 12:47:33,114 epoch 4 - iter 540/1809 - loss 0.04060601 - time (sec): 280.43 - samples/sec: 403.91 - lr: 0.000119 - momentum: 0.000000
2023-10-14 12:49:02,712 epoch 4 - iter 720/1809 - loss 0.03906737 - time (sec): 370.02 - samples/sec: 404.57 - lr: 0.000117 - momentum: 0.000000
2023-10-14 12:50:32,763 epoch 4 - iter 900/1809 - loss 0.03994477 - time (sec): 460.08 - samples/sec: 407.04 - lr: 0.000116 - momentum: 0.000000
2023-10-14 12:52:04,771 epoch 4 - iter 1080/1809 - loss 0.03951109 - time (sec): 552.08 - samples/sec: 408.84 - lr: 0.000114 - momentum: 0.000000
2023-10-14 12:53:36,911 epoch 4 - iter 1260/1809 - loss 0.04014654 - time (sec): 644.22 - samples/sec: 409.76 - lr: 0.000112 - momentum: 0.000000
2023-10-14 12:55:12,371 epoch 4 - iter 1440/1809 - loss 0.03986687 - time (sec): 739.68 - samples/sec: 409.89 - lr: 0.000110 - momentum: 0.000000
2023-10-14 12:56:48,117 epoch 4 - iter 1620/1809 - loss 0.03943713 - time (sec): 835.43 - samples/sec: 407.22 - lr: 0.000109 - momentum: 0.000000
2023-10-14 12:58:28,134 epoch 4 - iter 1800/1809 - loss 0.03893369 - time (sec): 935.45 - samples/sec: 404.27 - lr: 0.000107 - momentum: 0.000000
2023-10-14 12:58:32,852 ----------------------------------------------------------------------------------------------------
2023-10-14 12:58:32,853 EPOCH 4 done: loss 0.0389 - lr: 0.000107
2023-10-14 12:59:11,971 DEV : loss 0.19053448736667633 - f1-score (micro avg) 0.6399
2023-10-14 12:59:12,035 saving best model
2023-10-14 12:59:13,027 ----------------------------------------------------------------------------------------------------
2023-10-14 13:00:43,768 epoch 5 - iter 180/1809 - loss 0.02307068 - time (sec): 90.74 - samples/sec: 413.03 - lr: 0.000105 - momentum: 0.000000
2023-10-14 13:02:22,128 epoch 5 - iter 360/1809 - loss 0.02459229 - time (sec): 189.10 - samples/sec: 413.96 - lr: 0.000103 - momentum: 0.000000
2023-10-14 13:03:56,196 epoch 5 - iter 540/1809 - loss 0.02726105 - time (sec): 283.17 - samples/sec: 412.87 - lr: 0.000101 - momentum: 0.000000
2023-10-14 13:05:29,203 epoch 5 - iter 720/1809 - loss 0.02847084 - time (sec): 376.17 - samples/sec: 407.73 - lr: 0.000100 - momentum: 0.000000
2023-10-14 13:07:00,178 epoch 5 - iter 900/1809 - loss 0.02858724 - time (sec): 467.15 - samples/sec: 407.20 - lr: 0.000098 - momentum: 0.000000
2023-10-14 13:08:32,646 epoch 5 - iter 1080/1809 - loss 0.02877847 - time (sec): 559.62 - samples/sec: 407.69 - lr: 0.000096 - momentum: 0.000000
2023-10-14 13:10:07,347 epoch 5 - iter 1260/1809 - loss 0.02796208 - time (sec): 654.32 - samples/sec: 406.44 - lr: 0.000094 - momentum: 0.000000
2023-10-14 13:11:43,274 epoch 5 - iter 1440/1809 - loss 0.02825755 - time (sec): 750.24 - samples/sec: 404.27 - lr: 0.000093 - momentum: 0.000000
2023-10-14 13:13:26,339 epoch 5 - iter 1620/1809 - loss 0.02890447 - time (sec): 853.31 - samples/sec: 398.64 - lr: 0.000091 - momentum: 0.000000
2023-10-14 13:14:59,584 epoch 5 - iter 1800/1809 - loss 0.02958885 - time (sec): 946.55 - samples/sec: 399.42 - lr: 0.000089 - momentum: 0.000000
2023-10-14 13:15:03,940 ----------------------------------------------------------------------------------------------------
2023-10-14 13:15:03,940 EPOCH 5 done: loss 0.0297 - lr: 0.000089
2023-10-14 13:15:45,600 DEV : loss 0.23190708458423615 - f1-score (micro avg) 0.6378
2023-10-14 13:15:45,666 ----------------------------------------------------------------------------------------------------
2023-10-14 13:17:21,707 epoch 6 - iter 180/1809 - loss 0.01812127 - time (sec): 96.04 - samples/sec: 408.69 - lr: 0.000087 - momentum: 0.000000
2023-10-14 13:18:53,389 epoch 6 - iter 360/1809 - loss 0.02039867 - time (sec): 187.72 - samples/sec: 405.76 - lr: 0.000085 - momentum: 0.000000
2023-10-14 13:20:29,251 epoch 6 - iter 540/1809 - loss 0.01933017 - time (sec): 283.58 - samples/sec: 401.10 - lr: 0.000084 - momentum: 0.000000
2023-10-14 13:22:02,753 epoch 6 - iter 720/1809 - loss 0.02027772 - time (sec): 377.08 - samples/sec: 399.93 - lr: 0.000082 - momentum: 0.000000
2023-10-14 13:23:37,511 epoch 6 - iter 900/1809 - loss 0.02122834 - time (sec): 471.84 - samples/sec: 398.25 - lr: 0.000080 - momentum: 0.000000
2023-10-14 13:25:13,342 epoch 6 - iter 1080/1809 - loss 0.02222813 - time (sec): 567.67 - samples/sec: 398.86 - lr: 0.000078 - momentum: 0.000000
2023-10-14 13:26:46,706 epoch 6 - iter 1260/1809 - loss 0.02238050 - time (sec): 661.04 - samples/sec: 399.34 - lr: 0.000077 - momentum: 0.000000
2023-10-14 13:28:21,436 epoch 6 - iter 1440/1809 - loss 0.02161858 - time (sec): 755.77 - samples/sec: 400.35 - lr: 0.000075 - momentum: 0.000000
2023-10-14 13:29:57,051 epoch 6 - iter 1620/1809 - loss 0.02203621 - time (sec): 851.38 - samples/sec: 399.42 - lr: 0.000073 - momentum: 0.000000
2023-10-14 13:31:35,172 epoch 6 - iter 1800/1809 - loss 0.02164417 - time (sec): 949.50 - samples/sec: 398.33 - lr: 0.000071 - momentum: 0.000000
2023-10-14 13:31:39,297 ----------------------------------------------------------------------------------------------------
2023-10-14 13:31:39,298 EPOCH 6 done: loss 0.0216 - lr: 0.000071
2023-10-14 13:32:24,318 DEV : loss 0.272296279668808 - f1-score (micro avg) 0.6524
2023-10-14 13:32:24,397 saving best model
2023-10-14 13:32:31,032 ----------------------------------------------------------------------------------------------------
2023-10-14 13:34:10,803 epoch 7 - iter 180/1809 - loss 0.01105902 - time (sec): 99.77 - samples/sec: 406.73 - lr: 0.000069 - momentum: 0.000000
2023-10-14 13:35:57,197 epoch 7 - iter 360/1809 - loss 0.01259388 - time (sec): 206.16 - samples/sec: 376.08 - lr: 0.000068 - momentum: 0.000000
2023-10-14 13:37:31,991 epoch 7 - iter 540/1809 - loss 0.01335206 - time (sec): 300.95 - samples/sec: 382.63 - lr: 0.000066 - momentum: 0.000000
2023-10-14 13:39:05,951 epoch 7 - iter 720/1809 - loss 0.01459110 - time (sec): 394.91 - samples/sec: 385.81 - lr: 0.000064 - momentum: 0.000000
2023-10-14 13:40:38,429 epoch 7 - iter 900/1809 - loss 0.01490005 - time (sec): 487.39 - samples/sec: 388.72 - lr: 0.000062 - momentum: 0.000000
2023-10-14 13:42:11,096 epoch 7 - iter 1080/1809 - loss 0.01507190 - time (sec): 580.06 - samples/sec: 391.67 - lr: 0.000061 - momentum: 0.000000
2023-10-14 13:43:41,678 epoch 7 - iter 1260/1809 - loss 0.01473060 - time (sec): 670.64 - samples/sec: 395.61 - lr: 0.000059 - momentum: 0.000000
2023-10-14 13:45:12,938 epoch 7 - iter 1440/1809 - loss 0.01465842 - time (sec): 761.90 - samples/sec: 399.60 - lr: 0.000057 - momentum: 0.000000
2023-10-14 13:46:44,587 epoch 7 - iter 1620/1809 - loss 0.01508552 - time (sec): 853.55 - samples/sec: 400.09 - lr: 0.000055 - momentum: 0.000000
2023-10-14 13:48:13,609 epoch 7 - iter 1800/1809 - loss 0.01547761 - time (sec): 942.57 - samples/sec: 400.65 - lr: 0.000053 - momentum: 0.000000
2023-10-14 13:48:18,270 ----------------------------------------------------------------------------------------------------
2023-10-14 13:48:18,270 EPOCH 7 done: loss 0.0154 - lr: 0.000053
2023-10-14 13:48:57,977 DEV : loss 0.29768648743629456 - f1-score (micro avg) 0.6376
2023-10-14 13:48:58,052 ----------------------------------------------------------------------------------------------------
2023-10-14 13:50:33,364 epoch 8 - iter 180/1809 - loss 0.01237417 - time (sec): 95.31 - samples/sec: 401.98 - lr: 0.000052 - momentum: 0.000000
2023-10-14 13:52:16,095 epoch 8 - iter 360/1809 - loss 0.01212823 - time (sec): 198.04 - samples/sec: 390.76 - lr: 0.000050 - momentum: 0.000000
2023-10-14 13:53:49,344 epoch 8 - iter 540/1809 - loss 0.01054509 - time (sec): 291.29 - samples/sec: 395.46 - lr: 0.000048 - momentum: 0.000000
2023-10-14 13:55:23,207 epoch 8 - iter 720/1809 - loss 0.01148564 - time (sec): 385.15 - samples/sec: 393.79 - lr: 0.000046 - momentum: 0.000000
2023-10-14 13:56:59,828 epoch 8 - iter 900/1809 - loss 0.01104383 - time (sec): 481.77 - samples/sec: 394.42 - lr: 0.000044 - momentum: 0.000000
2023-10-14 13:58:32,234 epoch 8 - iter 1080/1809 - loss 0.01171246 - time (sec): 574.18 - samples/sec: 394.37 - lr: 0.000043 - momentum: 0.000000
2023-10-14 14:00:06,597 epoch 8 - iter 1260/1809 - loss 0.01136383 - time (sec): 668.54 - samples/sec: 396.67 - lr: 0.000041 - momentum: 0.000000
2023-10-14 14:01:39,016 epoch 8 - iter 1440/1809 - loss 0.01180198 - time (sec): 760.96 - samples/sec: 397.42 - lr: 0.000039 - momentum: 0.000000
2023-10-14 14:03:17,681 epoch 8 - iter 1620/1809 - loss 0.01178500 - time (sec): 859.63 - samples/sec: 395.94 - lr: 0.000037 - momentum: 0.000000
2023-10-14 14:04:51,444 epoch 8 - iter 1800/1809 - loss 0.01159822 - time (sec): 953.39 - samples/sec: 396.91 - lr: 0.000036 - momentum: 0.000000
2023-10-14 14:04:55,478 ----------------------------------------------------------------------------------------------------
2023-10-14 14:04:55,479 EPOCH 8 done: loss 0.0116 - lr: 0.000036
2023-10-14 14:05:34,550 DEV : loss 0.32400456070899963 - f1-score (micro avg) 0.6441
2023-10-14 14:05:34,614 ----------------------------------------------------------------------------------------------------
2023-10-14 14:07:04,671 epoch 9 - iter 180/1809 - loss 0.00428776 - time (sec): 90.06 - samples/sec: 403.01 - lr: 0.000034 - momentum: 0.000000
2023-10-14 14:08:39,160 epoch 9 - iter 360/1809 - loss 0.00657256 - time (sec): 184.54 - samples/sec: 394.04 - lr: 0.000032 - momentum: 0.000000
2023-10-14 14:10:22,284 epoch 9 - iter 540/1809 - loss 0.00795673 - time (sec): 287.67 - samples/sec: 385.39 - lr: 0.000030 - momentum: 0.000000
2023-10-14 14:11:58,636 epoch 9 - iter 720/1809 - loss 0.00803158 - time (sec): 384.02 - samples/sec: 389.03 - lr: 0.000028 - momentum: 0.000000
2023-10-14 14:13:34,188 epoch 9 - iter 900/1809 - loss 0.00763463 - time (sec): 479.57 - samples/sec: 392.43 - lr: 0.000027 - momentum: 0.000000
2023-10-14 14:15:10,464 epoch 9 - iter 1080/1809 - loss 0.00728638 - time (sec): 575.85 - samples/sec: 391.55 - lr: 0.000025 - momentum: 0.000000
2023-10-14 14:16:47,265 epoch 9 - iter 1260/1809 - loss 0.00750028 - time (sec): 672.65 - samples/sec: 391.08 - lr: 0.000023 - momentum: 0.000000
2023-10-14 14:18:20,876 epoch 9 - iter 1440/1809 - loss 0.00787103 - time (sec): 766.26 - samples/sec: 394.45 - lr: 0.000021 - momentum: 0.000000
2023-10-14 14:19:54,396 epoch 9 - iter 1620/1809 - loss 0.00770686 - time (sec): 859.78 - samples/sec: 396.21 - lr: 0.000020 - momentum: 0.000000
2023-10-14 14:21:40,785 epoch 9 - iter 1800/1809 - loss 0.00782706 - time (sec): 966.17 - samples/sec: 391.34 - lr: 0.000018 - momentum: 0.000000
2023-10-14 14:21:45,616 ----------------------------------------------------------------------------------------------------
2023-10-14 14:21:45,616 EPOCH 9 done: loss 0.0078 - lr: 0.000018
2023-10-14 14:22:26,271 DEV : loss 0.3510294556617737 - f1-score (micro avg) 0.6469
2023-10-14 14:22:26,332 ----------------------------------------------------------------------------------------------------
2023-10-14 14:24:08,212 epoch 10 - iter 180/1809 - loss 0.00465200 - time (sec): 101.88 - samples/sec: 366.12 - lr: 0.000016 - momentum: 0.000000
2023-10-14 14:25:42,942 epoch 10 - iter 360/1809 - loss 0.00473108 - time (sec): 196.61 - samples/sec: 384.51 - lr: 0.000014 - momentum: 0.000000
2023-10-14 14:27:21,238 epoch 10 - iter 540/1809 - loss 0.00620538 - time (sec): 294.90 - samples/sec: 387.50 - lr: 0.000012 - momentum: 0.000000
2023-10-14 14:28:57,390 epoch 10 - iter 720/1809 - loss 0.00594168 - time (sec): 391.06 - samples/sec: 387.91 - lr: 0.000011 - momentum: 0.000000
2023-10-14 14:30:32,580 epoch 10 - iter 900/1809 - loss 0.00587265 - time (sec): 486.25 - samples/sec: 389.34 - lr: 0.000009 - momentum: 0.000000
2023-10-14 14:32:07,017 epoch 10 - iter 1080/1809 - loss 0.00574894 - time (sec): 580.68 - samples/sec: 391.80 - lr: 0.000007 - momentum: 0.000000
2023-10-14 14:33:40,067 epoch 10 - iter 1260/1809 - loss 0.00587125 - time (sec): 673.73 - samples/sec: 393.69 - lr: 0.000005 - momentum: 0.000000
2023-10-14 14:35:15,347 epoch 10 - iter 1440/1809 - loss 0.00584501 - time (sec): 769.01 - samples/sec: 395.67 - lr: 0.000004 - momentum: 0.000000
2023-10-14 14:36:49,543 epoch 10 - iter 1620/1809 - loss 0.00598522 - time (sec): 863.21 - samples/sec: 395.43 - lr: 0.000002 - momentum: 0.000000
2023-10-14 14:38:23,873 epoch 10 - iter 1800/1809 - loss 0.00617272 - time (sec): 957.54 - samples/sec: 394.90 - lr: 0.000000 - momentum: 0.000000
2023-10-14 14:38:28,207 ----------------------------------------------------------------------------------------------------
2023-10-14 14:38:28,208 EPOCH 10 done: loss 0.0062 - lr: 0.000000
2023-10-14 14:39:11,475 DEV : loss 0.35460981726646423 - f1-score (micro avg) 0.6421
2023-10-14 14:39:13,412 ----------------------------------------------------------------------------------------------------
2023-10-14 14:39:13,414 Loading model from best epoch ...
2023-10-14 14:39:17,156 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-14 14:40:16,219 Results:
- F-score (micro) 0.6364
- F-score (macro) 0.4866
- Accuracy 0.4794

By class:
              precision    recall  f1-score   support

         loc     0.6326    0.7547    0.6883       591
        pers     0.5737    0.7087    0.6341       357
         org     0.1731    0.1139    0.1374        79

   micro avg     0.5910    0.6894    0.6364      1027
   macro avg     0.4598    0.5258    0.4866      1027
weighted avg     0.5768    0.6894    0.6271      1027

2023-10-14 14:40:16,219 ----------------------------------------------------------------------------------------------------
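For reference, the best checkpoint (best-model.pt, selected on dev micro F1 and evaluated above) can be loaded and applied with standard Flair calls. The snippet below is a hedged sketch: the checkpoint is assumed to sit under the logged base path, and the French example sentence is illustrative only.

```python
# Hedged sketch: load the best checkpoint produced by this run and tag one sentence.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax"
    "-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-4/best-model.pt"
)

# Illustrative sentence, not taken from the letemps corpus.
sentence = Sentence("M. Dupont est arrivé à Genève hier soir .")
tagger.predict(sentence)

# Prints each recognised span; labels come from the 13-tag BIOES dictionary
# listed above (loc, pers, org).
for span in sentence.get_spans("ner"):
    print(span)
```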