2023-10-12 18:46:41,769 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,771 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-12 18:46:41,772 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,772 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-12 18:46:41,772 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,772 Train: 20847 sentences
2023-10-12 18:46:41,772 (train_with_dev=False, train_with_test=False)
2023-10-12 18:46:41,772 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,772 Training Params:
2023-10-12 18:46:41,772 - learning_rate: "0.00015"
2023-10-12 18:46:41,772 - mini_batch_size: "4"
2023-10-12 18:46:41,773 - max_epochs: "10"
2023-10-12 18:46:41,773 - shuffle: "True"
2023-10-12 18:46:41,773 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,773 Plugins:
2023-10-12 18:46:41,773 - TensorboardLogger
2023-10-12 18:46:41,773 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 18:46:41,773 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,773 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 18:46:41,773 - metric: "('micro avg', 'f1-score')"
2023-10-12 18:46:41,773 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,773 Computation:
2023-10-12 18:46:41,773 - compute on device: cuda:0
2023-10-12 18:46:41,773 - embedding storage: none
2023-10-12 18:46:41,774 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,774 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
2023-10-12 18:46:41,774 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,774 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,774 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 18:49:15,237 epoch 1 - iter 521/5212 - loss 2.76292849 - time (sec): 153.46 - samples/sec: 265.22 - lr: 0.000015 - momentum: 0.000000
2023-10-12 18:51:48,941 epoch 1 - iter 1042/5212 - loss 2.34035816 - time (sec): 307.16 - samples/sec: 258.13 - lr: 0.000030 - momentum: 0.000000
2023-10-12 18:54:18,340 epoch 1 - iter 1563/5212 - loss 1.85312724 - time (sec): 456.56 - samples/sec: 252.44 - lr: 0.000045 - momentum: 0.000000
2023-10-12 18:56:50,681 epoch 1 - iter 2084/5212 - loss 1.51172426 - time (sec): 608.90 - samples/sec: 249.98 - lr: 0.000060 - momentum: 0.000000
2023-10-12 18:59:20,445 epoch 1 - iter 2605/5212 - loss 1.31079938 - time (sec): 758.67 - samples/sec: 249.93 - lr: 0.000075 - momentum: 0.000000
2023-10-12 19:01:53,736 epoch 1 - iter 3126/5212 - loss 1.15481567 - time (sec): 911.96 - samples/sec: 249.68 - lr: 0.000090 - momentum: 0.000000
2023-10-12 19:04:28,717 epoch 1 - iter 3647/5212 - loss 1.03999737 - time (sec): 1066.94 - samples/sec: 247.05 - lr: 0.000105 - momentum: 0.000000
2023-10-12 19:07:12,068 epoch 1 - iter 4168/5212 - loss 0.94975549 - time (sec): 1230.29 - samples/sec: 243.75 - lr: 0.000120 - momentum: 0.000000
2023-10-12 19:09:41,939 epoch 1 - iter 4689/5212 - loss 0.88001240 - time (sec): 1380.16 - samples/sec: 242.03 - lr: 0.000135 - momentum: 0.000000
2023-10-12 19:12:09,089 epoch 1 - iter 5210/5212 - loss 0.82123342 - time (sec): 1527.31 - samples/sec: 240.48 - lr: 0.000150 - momentum: 0.000000
2023-10-12 19:12:09,617 ----------------------------------------------------------------------------------------------------
2023-10-12 19:12:09,617 EPOCH 1 done: loss 0.8210 - lr: 0.000150
2023-10-12 19:12:47,892 DEV : loss 0.1282375454902649 - f1-score (micro avg) 0.1789
2023-10-12 19:12:47,951 saving best model
2023-10-12 19:12:48,879 ----------------------------------------------------------------------------------------------------
2023-10-12 19:15:16,200 epoch 2 - iter 521/5212 - loss 0.21112691 - time (sec): 147.32 - samples/sec: 242.92 - lr: 0.000148 - momentum: 0.000000
2023-10-12 19:17:45,002 epoch 2 - iter 1042/5212 - loss 0.18824513 - time (sec): 296.12 - samples/sec: 242.99 - lr: 0.000147 - momentum: 0.000000
2023-10-12 19:20:21,035 epoch 2 - iter 1563/5212 - loss 0.17597690 - time (sec): 452.15 - samples/sec: 236.05 - lr: 0.000145 - momentum: 0.000000
2023-10-12 19:23:00,731 epoch 2 - iter 2084/5212 - loss 0.17106880 - time (sec): 611.85 - samples/sec: 234.34 - lr: 0.000143 - momentum: 0.000000
2023-10-12 19:25:39,395 epoch 2 - iter 2605/5212 - loss 0.16970840 - time (sec): 770.51 - samples/sec: 233.26 - lr: 0.000142 - momentum: 0.000000
2023-10-12 19:28:16,729 epoch 2 - iter 3126/5212 - loss 0.16401569 - time (sec): 927.85 - samples/sec: 235.26 - lr: 0.000140 - momentum: 0.000000
2023-10-12 19:30:55,046 epoch 2 - iter 3647/5212 - loss 0.16203738 - time (sec): 1086.16 - samples/sec: 233.86 - lr: 0.000138 - momentum: 0.000000
2023-10-12 19:33:34,826 epoch 2 - iter 4168/5212 - loss 0.16114061 - time (sec): 1245.94 - samples/sec: 234.36 - lr: 0.000137 - momentum: 0.000000
2023-10-12 19:36:12,855 epoch 2 - iter 4689/5212 - loss 0.15881510 - time (sec): 1403.97 - samples/sec: 234.89 - lr: 0.000135 - momentum: 0.000000
2023-10-12 19:38:56,123 epoch 2 - iter 5210/5212 - loss 0.15670979 - time (sec): 1567.24 - samples/sec: 234.31 - lr: 0.000133 - momentum: 0.000000
2023-10-12 19:38:56,741 ----------------------------------------------------------------------------------------------------
2023-10-12 19:38:56,742 EPOCH 2 done: loss 0.1567 - lr: 0.000133
2023-10-12 19:39:39,874 DEV : loss 0.16946110129356384 - f1-score (micro avg) 0.3451
2023-10-12 19:39:39,943 saving best model
2023-10-12 19:39:42,763 ----------------------------------------------------------------------------------------------------
2023-10-12 19:42:21,014 epoch 3 - iter 521/5212 - loss 0.10074178 - time (sec): 158.25 - samples/sec: 224.62 - lr: 0.000132 - momentum: 0.000000
2023-10-12 19:44:59,477 epoch 3 - iter 1042/5212 - loss 0.10972804 - time (sec): 316.71 - samples/sec: 217.74 - lr: 0.000130 - momentum: 0.000000
2023-10-12 19:47:35,591 epoch 3 - iter 1563/5212 - loss 0.10493572 - time (sec): 472.82 - samples/sec: 225.90 - lr: 0.000128 - momentum: 0.000000
2023-10-12 19:50:16,866 epoch 3 - iter 2084/5212 - loss 0.10489412 - time (sec): 634.10 - samples/sec: 225.16 - lr: 0.000127 - momentum: 0.000000
2023-10-12 19:52:51,660 epoch 3 - iter 2605/5212 - loss 0.10195802 - time (sec): 788.89 - samples/sec: 227.66 - lr: 0.000125 - momentum: 0.000000
2023-10-12 19:55:27,756 epoch 3 - iter 3126/5212 - loss 0.10181064 - time (sec): 944.99 - samples/sec: 229.18 - lr: 0.000123 - momentum: 0.000000
2023-10-12 19:58:01,865 epoch 3 - iter 3647/5212 - loss 0.10418318 - time (sec): 1099.10 - samples/sec: 229.25 - lr: 0.000122 - momentum: 0.000000
2023-10-12 20:00:37,316 epoch 3 - iter 4168/5212 - loss 0.10492334 - time (sec): 1254.55 - samples/sec: 231.05 - lr: 0.000120 - momentum: 0.000000
2023-10-12 20:03:14,249 epoch 3 - iter 4689/5212 - loss 0.10268963 - time (sec): 1411.48 - samples/sec: 233.37 - lr: 0.000118 - momentum: 0.000000
2023-10-12 20:05:50,322 epoch 3 - iter 5210/5212 - loss 0.10507983 - time (sec): 1567.56 - samples/sec: 234.33 - lr: 0.000117 - momentum: 0.000000
2023-10-12 20:05:50,832 ----------------------------------------------------------------------------------------------------
2023-10-12 20:05:50,832 EPOCH 3 done: loss 0.1051 - lr: 0.000117
2023-10-12 20:06:34,155 DEV : loss 0.2250872701406479 - f1-score (micro avg) 0.3354
2023-10-12 20:06:34,209 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:06,488 epoch 4 - iter 521/5212 - loss 0.08085581 - time (sec): 152.28 - samples/sec: 234.78 - lr: 0.000115 - momentum: 0.000000
2023-10-12 20:11:41,459 epoch 4 - iter 1042/5212 - loss 0.07403465 - time (sec): 307.25 - samples/sec: 237.94 - lr: 0.000113 - momentum: 0.000000
2023-10-12 20:14:15,463 epoch 4 - iter 1563/5212 - loss 0.07375911 - time (sec): 461.25 - samples/sec: 237.25 - lr: 0.000112 - momentum: 0.000000
2023-10-12 20:16:48,933 epoch 4 - iter 2084/5212 - loss 0.07382402 - time (sec): 614.72 - samples/sec: 234.88 - lr: 0.000110 - momentum: 0.000000
2023-10-12 20:19:25,310 epoch 4 - iter 2605/5212 - loss 0.07632121 - time (sec): 771.10 - samples/sec: 236.86 - lr: 0.000108 - momentum: 0.000000
2023-10-12 20:22:01,623 epoch 4 - iter 3126/5212 - loss 0.07620217 - time (sec): 927.41 - samples/sec: 238.49 - lr: 0.000107 - momentum: 0.000000
2023-10-12 20:24:35,917 epoch 4 - iter 3647/5212 - loss 0.07340067 - time (sec): 1081.71 - samples/sec: 238.77 - lr: 0.000105 - momentum: 0.000000
2023-10-12 20:27:09,657 epoch 4 - iter 4168/5212 - loss 0.07375064 - time (sec): 1235.45 - samples/sec: 238.55 - lr: 0.000103 - momentum: 0.000000
2023-10-12 20:29:43,452 epoch 4 - iter 4689/5212 - loss 0.07293438 - time (sec): 1389.24 - samples/sec: 238.15 - lr: 0.000102 - momentum: 0.000000
2023-10-12 20:32:16,851 epoch 4 - iter 5210/5212 - loss 0.07341752 - time (sec): 1542.64 - samples/sec: 238.08 - lr: 0.000100 - momentum: 0.000000
2023-10-12 20:32:17,431 ----------------------------------------------------------------------------------------------------
2023-10-12 20:32:17,431 EPOCH 4 done: loss 0.0735 - lr: 0.000100
2023-10-12 20:32:59,595 DEV : loss 0.23921194672584534 - f1-score (micro avg) 0.3737
2023-10-12 20:32:59,649 saving best model
2023-10-12 20:33:02,488 ----------------------------------------------------------------------------------------------------
2023-10-12 20:35:36,727 epoch 5 - iter 521/5212 - loss 0.03913308 - time (sec): 154.23 - samples/sec: 233.83 - lr: 0.000098 - momentum: 0.000000
2023-10-12 20:38:11,970 epoch 5 - iter 1042/5212 - loss 0.05004518 - time (sec): 309.48 - samples/sec: 226.85 - lr: 0.000097 - momentum: 0.000000
2023-10-12 20:40:54,356 epoch 5 - iter 1563/5212 - loss 0.05132965 - time (sec): 471.86 - samples/sec: 228.10 - lr: 0.000095 - momentum: 0.000000
2023-10-12 20:43:33,992 epoch 5 - iter 2084/5212 - loss 0.05244491 - time (sec): 631.50 - samples/sec: 227.65 - lr: 0.000093 - momentum: 0.000000
2023-10-12 20:46:05,541 epoch 5 - iter 2605/5212 - loss 0.05103574 - time (sec): 783.05 - samples/sec: 231.43 - lr: 0.000092 - momentum: 0.000000
2023-10-12 20:48:38,344 epoch 5 - iter 3126/5212 - loss 0.04987451 - time (sec): 935.85 - samples/sec: 236.61 - lr: 0.000090 - momentum: 0.000000
2023-10-12 20:51:13,890 epoch 5 - iter 3647/5212 - loss 0.05008071 - time (sec): 1091.40 - samples/sec: 238.11 - lr: 0.000088 - momentum: 0.000000
2023-10-12 20:53:47,576 epoch 5 - iter 4168/5212 - loss 0.05030602 - time (sec): 1245.08 - samples/sec: 235.30 - lr: 0.000087 - momentum: 0.000000
2023-10-12 20:56:25,717 epoch 5 - iter 4689/5212 - loss 0.04910738 - time (sec): 1403.22 - samples/sec: 235.57 - lr: 0.000085 - momentum: 0.000000
2023-10-12 20:59:02,470 epoch 5 - iter 5210/5212 - loss 0.04976601 - time (sec): 1559.98 - samples/sec: 235.43 - lr: 0.000083 - momentum: 0.000000
2023-10-12 20:59:03,023 ----------------------------------------------------------------------------------------------------
2023-10-12 20:59:03,024 EPOCH 5 done: loss 0.0497 - lr: 0.000083
2023-10-12 20:59:45,820 DEV : loss 0.3263615667819977 - f1-score (micro avg) 0.384
2023-10-12 20:59:45,890 saving best model
2023-10-12 20:59:48,757 ----------------------------------------------------------------------------------------------------
2023-10-12 21:02:23,308 epoch 6 - iter 521/5212 - loss 0.02791178 - time (sec): 154.55 - samples/sec: 228.86 - lr: 0.000082 - momentum: 0.000000
2023-10-12 21:05:03,072 epoch 6 - iter 1042/5212 - loss 0.03190034 - time (sec): 314.31 - samples/sec: 236.91 - lr: 0.000080 - momentum: 0.000000
2023-10-12 21:07:39,800 epoch 6 - iter 1563/5212 - loss 0.03265756 - time (sec): 471.04 - samples/sec: 237.82 - lr: 0.000078 - momentum: 0.000000
2023-10-12 21:10:13,089 epoch 6 - iter 2084/5212 - loss 0.03451907 - time (sec): 624.33 - samples/sec: 235.96 - lr: 0.000077 - momentum: 0.000000
2023-10-12 21:12:46,251 epoch 6 - iter 2605/5212 - loss 0.03548737 - time (sec): 777.49 - samples/sec: 233.74 - lr: 0.000075 - momentum: 0.000000
2023-10-12 21:15:24,589 epoch 6 - iter 3126/5212 - loss 0.03533261 - time (sec): 935.83 - samples/sec: 236.24 - lr: 0.000073 - momentum: 0.000000
2023-10-12 21:18:01,460 epoch 6 - iter 3647/5212 - loss 0.03553896 - time (sec): 1092.70 - samples/sec: 237.24 - lr: 0.000072 - momentum: 0.000000
2023-10-12 21:20:34,172 epoch 6 - iter 4168/5212 - loss 0.03549980 - time (sec): 1245.41 - samples/sec: 235.23 - lr: 0.000070 - momentum: 0.000000
2023-10-12 21:23:07,241 epoch 6 - iter 4689/5212 - loss 0.03556410 - time (sec): 1398.48 - samples/sec: 235.14 - lr: 0.000068 - momentum: 0.000000
2023-10-12 21:25:43,273 epoch 6 - iter 5210/5212 - loss 0.03566000 - time (sec): 1554.51 - samples/sec: 236.10 - lr: 0.000067 - momentum: 0.000000
2023-10-12 21:25:44,091 ----------------------------------------------------------------------------------------------------
2023-10-12 21:25:44,091 EPOCH 6 done: loss 0.0356 - lr: 0.000067
2023-10-12 21:26:26,238 DEV : loss 0.4169439971446991 - f1-score (micro avg) 0.3567
2023-10-12 21:26:26,291 ----------------------------------------------------------------------------------------------------
2023-10-12 21:29:00,654 epoch 7 - iter 521/5212 - loss 0.02119835 - time (sec): 154.36 - samples/sec: 239.30 - lr: 0.000065 - momentum: 0.000000
2023-10-12 21:31:33,480 epoch 7 - iter 1042/5212 - loss 0.02655194 - time (sec): 307.19 - samples/sec: 238.23 - lr: 0.000063 - momentum: 0.000000
2023-10-12 21:34:06,511 epoch 7 - iter 1563/5212 - loss 0.02488243 - time (sec): 460.22 - samples/sec: 238.09 - lr: 0.000062 - momentum: 0.000000
2023-10-12 21:36:40,328 epoch 7 - iter 2084/5212 - loss 0.02403086 - time (sec): 614.03 - samples/sec: 238.85 - lr: 0.000060 - momentum: 0.000000
2023-10-12 21:39:12,655 epoch 7 - iter 2605/5212 - loss 0.02533732 - time (sec): 766.36 - samples/sec: 240.07 - lr: 0.000058 - momentum: 0.000000
2023-10-12 21:41:48,693 epoch 7 - iter 3126/5212 - loss 0.02466719 - time (sec): 922.40 - samples/sec: 244.66 - lr: 0.000057 - momentum: 0.000000
2023-10-12 21:44:21,354 epoch 7 - iter 3647/5212 - loss 0.02434422 - time (sec): 1075.06 - samples/sec: 243.70 - lr: 0.000055 - momentum: 0.000000
2023-10-12 21:46:50,093 epoch 7 - iter 4168/5212 - loss 0.02517306 - time (sec): 1223.80 - samples/sec: 241.32 - lr: 0.000053 - momentum: 0.000000
2023-10-12 21:49:21,621 epoch 7 - iter 4689/5212 - loss 0.02542309 - time (sec): 1375.33 - samples/sec: 240.61 - lr: 0.000052 - momentum: 0.000000
2023-10-12 21:51:50,963 epoch 7 - iter 5210/5212 - loss 0.02492844 - time (sec): 1524.67 - samples/sec: 240.92 - lr: 0.000050 - momentum: 0.000000
2023-10-12 21:51:51,473 ----------------------------------------------------------------------------------------------------
2023-10-12 21:51:51,474 EPOCH 7 done: loss 0.0249 - lr: 0.000050
2023-10-12 21:52:34,668 DEV : loss 0.44921109080314636 - f1-score (micro avg) 0.3588
2023-10-12 21:52:34,724 ----------------------------------------------------------------------------------------------------
2023-10-12 21:55:06,252 epoch 8 - iter 521/5212 - loss 0.01723823 - time (sec): 151.53 - samples/sec: 248.05 - lr: 0.000048 - momentum: 0.000000
2023-10-12 21:57:37,292 epoch 8 - iter 1042/5212 - loss 0.01658125 - time (sec): 302.57 - samples/sec: 242.31 - lr: 0.000047 - momentum: 0.000000
2023-10-12 22:00:10,298 epoch 8 - iter 1563/5212 - loss 0.01573445 - time (sec): 455.57 - samples/sec: 242.90 - lr: 0.000045 - momentum: 0.000000
2023-10-12 22:02:43,368 epoch 8 - iter 2084/5212 - loss 0.01677614 - time (sec): 608.64 - samples/sec: 243.81 - lr: 0.000043 - momentum: 0.000000
2023-10-12 22:05:14,154 epoch 8 - iter 2605/5212 - loss 0.01684349 - time (sec): 759.43 - samples/sec: 240.27 - lr: 0.000042 - momentum: 0.000000
2023-10-12 22:07:45,939 epoch 8 - iter 3126/5212 - loss 0.01664639 - time (sec): 911.21 - samples/sec: 240.75 - lr: 0.000040 - momentum: 0.000000
2023-10-12 22:10:15,255 epoch 8 - iter 3647/5212 - loss 0.01741545 - time (sec): 1060.53 - samples/sec: 239.06 - lr: 0.000038 - momentum: 0.000000
2023-10-12 22:12:44,856 epoch 8 - iter 4168/5212 - loss 0.01704525 - time (sec): 1210.13 - samples/sec: 239.20 - lr: 0.000037 - momentum: 0.000000
2023-10-12 22:15:17,516 epoch 8 - iter 4689/5212 - loss 0.01702684 - time (sec): 1362.79 - samples/sec: 242.11 - lr: 0.000035 - momentum: 0.000000
2023-10-12 22:17:46,344 epoch 8 - iter 5210/5212 - loss 0.01674797 - time (sec): 1511.62 - samples/sec: 243.02 - lr: 0.000033 - momentum: 0.000000
2023-10-12 22:17:46,804 ----------------------------------------------------------------------------------------------------
2023-10-12 22:17:46,804 EPOCH 8 done: loss 0.0167 - lr: 0.000033
2023-10-12 22:18:27,780 DEV : loss 0.4403105676174164 - f1-score (micro avg) 0.3928
2023-10-12 22:18:27,855 saving best model
2023-10-12 22:18:31,457 ----------------------------------------------------------------------------------------------------
2023-10-12 22:20:59,841 epoch 9 - iter 521/5212 - loss 0.01491704 - time (sec): 148.38 - samples/sec: 252.05 - lr: 0.000032 - momentum: 0.000000
2023-10-12 22:23:25,151 epoch 9 - iter 1042/5212 - loss 0.01413508 - time (sec): 293.69 - samples/sec: 239.66 - lr: 0.000030 - momentum: 0.000000
2023-10-12 22:25:53,399 epoch 9 - iter 1563/5212 - loss 0.01315106 - time (sec): 441.94 - samples/sec: 243.38 - lr: 0.000028 - momentum: 0.000000
2023-10-12 22:28:21,805 epoch 9 - iter 2084/5212 - loss 0.01196193 - time (sec): 590.34 - samples/sec: 245.16 - lr: 0.000027 - momentum: 0.000000
2023-10-12 22:30:52,285 epoch 9 - iter 2605/5212 - loss 0.01228176 - time (sec): 740.82 - samples/sec: 247.75 - lr: 0.000025 - momentum: 0.000000
2023-10-12 22:33:20,379 epoch 9 - iter 3126/5212 - loss 0.01164398 - time (sec): 888.92 - samples/sec: 246.57 - lr: 0.000023 - momentum: 0.000000
2023-10-12 22:35:51,301 epoch 9 - iter 3647/5212 - loss 0.01149750 - time (sec): 1039.84 - samples/sec: 247.54 - lr: 0.000022 - momentum: 0.000000
2023-10-12 22:38:21,783 epoch 9 - iter 4168/5212 - loss 0.01196632 - time (sec): 1190.32 - samples/sec: 246.94 - lr: 0.000020 - momentum: 0.000000
2023-10-12 22:40:52,663 epoch 9 - iter 4689/5212 - loss 0.01189000 - time (sec): 1341.20 - samples/sec: 246.35 - lr: 0.000018 - momentum: 0.000000
2023-10-12 22:43:19,842 epoch 9 - iter 5210/5212 - loss 0.01156761 - time (sec): 1488.38 - samples/sec: 246.82 - lr: 0.000017 - momentum: 0.000000
2023-10-12 22:43:20,283 ----------------------------------------------------------------------------------------------------
2023-10-12 22:43:20,284 EPOCH 9 done: loss 0.0116 - lr: 0.000017
2023-10-12 22:44:01,953 DEV : loss 0.44486090540885925 - f1-score (micro avg) 0.4028
2023-10-12 22:44:02,009 saving best model
2023-10-12 22:44:04,674 ----------------------------------------------------------------------------------------------------
2023-10-12 22:46:35,093 epoch 10 - iter 521/5212 - loss 0.01030731 - time (sec): 150.41 - samples/sec: 243.36 - lr: 0.000015 - momentum: 0.000000
2023-10-12 22:49:05,792 epoch 10 - iter 1042/5212 - loss 0.01158820 - time (sec): 301.11 - samples/sec: 241.13 - lr: 0.000013 - momentum: 0.000000
2023-10-12 22:51:50,787 epoch 10 - iter 1563/5212 - loss 0.00996155 - time (sec): 466.11 - samples/sec: 235.84 - lr: 0.000012 - momentum: 0.000000
2023-10-12 22:54:31,236 epoch 10 - iter 2084/5212 - loss 0.00898869 - time (sec): 626.56 - samples/sec: 231.92 - lr: 0.000010 - momentum: 0.000000
2023-10-12 22:57:06,304 epoch 10 - iter 2605/5212 - loss 0.00842857 - time (sec): 781.62 - samples/sec: 232.94 - lr: 0.000008 - momentum: 0.000000
2023-10-12 22:59:38,073 epoch 10 - iter 3126/5212 - loss 0.00869311 - time (sec): 933.39 - samples/sec: 234.97 - lr: 0.000007 - momentum: 0.000000
2023-10-12 23:02:07,453 epoch 10 - iter 3647/5212 - loss 0.00833985 - time (sec): 1082.77 - samples/sec: 234.54 - lr: 0.000005 - momentum: 0.000000
2023-10-12 23:04:44,311 epoch 10 - iter 4168/5212 - loss 0.00850446 - time (sec): 1239.63 - samples/sec: 234.93 - lr: 0.000003 - momentum: 0.000000
2023-10-12 23:07:20,219 epoch 10 - iter 4689/5212 - loss 0.00844827 - time (sec): 1395.54 - samples/sec: 235.44 - lr: 0.000002 - momentum: 0.000000
2023-10-12 23:09:57,735 epoch 10 - iter 5210/5212 - loss 0.00809271 - time (sec): 1553.06 - samples/sec: 236.51 - lr: 0.000000 - momentum: 0.000000
2023-10-12 23:09:58,253 ----------------------------------------------------------------------------------------------------
2023-10-12 23:09:58,253 EPOCH 10 done: loss 0.0081 - lr: 0.000000
2023-10-12 23:10:41,281 DEV : loss 0.502132773399353 - f1-score (micro avg) 0.4
2023-10-12 23:10:42,276 ----------------------------------------------------------------------------------------------------
2023-10-12 23:10:42,278 Loading model from best epoch ...
2023-10-12 23:10:46,633 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-12 23:12:31,935 Results:
- F-score (micro) 0.4616
- F-score (macro) 0.3197
- Accuracy 0.3054

By class:
              precision    recall  f1-score   support

         LOC     0.4873    0.5387    0.5117      1214
         PER     0.4096    0.5186    0.4577       808
         ORG     0.2997    0.3201    0.3096       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4314    0.4962    0.4616      2390
   macro avg     0.2992    0.3443    0.3197      2390
weighted avg     0.4303    0.4962    0.4604      2390
2023-10-12 23:12:31,936 ----------------------------------------------------------------------------------------------------
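For readers who want to chart the loss, learning-rate, or throughput curves from a log like the one above, the per-iteration progress lines can be parsed with a small regular expression. This is an illustrative sketch, not part of Flair itself: the field names in the returned dict and the helper `parse_progress` are assumptions chosen here, derived only from the line format visible in this log.

```python
import re

# Matches the per-iteration progress lines in the Flair training log above, e.g.:
# "... epoch 1 - iter 521/5212 - loss 2.76292849 - time (sec): 153.46 - samples/sec: 265.22 - lr: 0.000015 ..."
ITER_RE = re.compile(
    r"epoch (?P<epoch>\d+) - iter (?P<iter>\d+)/(?P<total>\d+) - "
    r"loss (?P<loss>[\d.]+) - time \(sec\): (?P<time>[\d.]+) - "
    r"samples/sec: (?P<rate>[\d.]+) - lr: (?P<lr>[\d.]+)"
)

def parse_progress(line: str):
    """Return the numeric fields of one progress line as a dict, or None if the line is not a progress line."""
    m = ITER_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "epoch": int(d["epoch"]),
        "iter": int(d["iter"]),
        "total": int(d["total"]),
        "loss": float(d["loss"]),
        "time": float(d["time"]),
        "samples_per_sec": float(d["rate"]),
        "lr": float(d["lr"]),
    }

# First progress line of epoch 1, copied from the log above.
sample = ("2023-10-12 18:49:15,237 epoch 1 - iter 521/5212 - loss 2.76292849 "
          "- time (sec): 153.46 - samples/sec: 265.22 - lr: 0.000015 - momentum: 0.000000")
record = parse_progress(sample)
```

Applying `parse_progress` line by line over the whole file and collecting the dicts gives a ready-to-plot series; non-progress lines (separators, DEV evaluations, "saving best model") simply yield `None` and can be skipped.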