2023-10-12 18:46:41,769 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,771 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
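The layer shapes in the dump above are enough to estimate the encoder's parameter count by hand. A minimal plain-Python sketch follows; it assumes the `shared` and `embed_tokens` tables are tied and counted once (standard for T5, but an assumption here), and reads all dimensions off the module dump:

```python
# Rough parameter count for the ByT5 encoder printed above.
# Assumption: shared/embed_tokens are tied, so the embedding is counted once.
d_model, d_ff, d_attn, vocab = 1472, 3584, 384, 384

embedding = vocab * d_model                # Embedding(384, 1472), tied
attention = 4 * d_model * d_attn           # q, k, v projections + output o
ffn = 3 * d_model * d_ff                   # wi_0, wi_1, wo (gated activation)
norms = 2 * d_model                        # two FusedRMSNorm weights per block
block = attention + ffn + norms
rel_bias = 32 * 6                          # relative_attention_bias, block 0 only
final_norm = d_model

encoder_total = embedding + 12 * block + rel_bias + final_norm
head = d_model * 17 + 17                   # the tagger's linear layer (17 tags)

print(f"encoder ~= {encoder_total:,} params, tagging head = {head:,}")
```

This puts the encoder at roughly 218M parameters, which is consistent with an encoder-only slice of a byt5-small-sized checkpoint.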
2023-10-12 18:46:41,772 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,772 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
- NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-12 18:46:41,772 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,772 Train: 20847 sentences
2023-10-12 18:46:41,772 (train_with_dev=False, train_with_test=False)
2023-10-12 18:46:41,772 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,772 Training Params:
2023-10-12 18:46:41,772 - learning_rate: "0.00015"
2023-10-12 18:46:41,772 - mini_batch_size: "4"
2023-10-12 18:46:41,773 - max_epochs: "10"
2023-10-12 18:46:41,773 - shuffle: "True"
2023-10-12 18:46:41,773 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,773 Plugins:
2023-10-12 18:46:41,773 - TensorboardLogger
2023-10-12 18:46:41,773 - LinearScheduler | warmup_fraction: '0.1'
2023-10-12 18:46:41,773 ----------------------------------------------------------------------------------------------------
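The lr column in the per-iteration lines below follows from LinearScheduler with warmup_fraction '0.1': linear warmup over the first 10% of all steps (10 epochs x 5212 iterations, so warmup spans epoch 1), then linear decay to zero. A plain-Python sketch of that schedule; this is an illustration, not Flair's implementation, and `linear_schedule_lr` is a made-up helper name:

```python
def linear_schedule_lr(step, total_steps, peak_lr, warmup_fraction=0.1):
    """Linear warmup to peak_lr over warmup_fraction of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total = 10 * 5212          # 10 epochs x 5212 iterations per epoch
peak = 0.00015

# epoch 1, iter 521: still warming up, lr ~ 0.000015 (matches the log)
print(linear_schedule_lr(521, total, peak))
# epoch 2, iter 521 (global step 5212 + 521): lr ~ 0.000148 (matches the log)
print(linear_schedule_lr(5212 + 521, total, peak))
# epoch 10, iter 5210: lr has decayed to ~ 0.000000
print(linear_schedule_lr(9 * 5212 + 5210, total, peak))
```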
2023-10-12 18:46:41,773 Final evaluation on model from best epoch (best-model.pt)
2023-10-12 18:46:41,773 - metric: "('micro avg', 'f1-score')"
2023-10-12 18:46:41,773 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,773 Computation:
2023-10-12 18:46:41,773 - compute on device: cuda:0
2023-10-12 18:46:41,773 - embedding storage: none
2023-10-12 18:46:41,774 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,774 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
2023-10-12 18:46:41,774 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,774 ----------------------------------------------------------------------------------------------------
2023-10-12 18:46:41,774 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-12 18:49:15,237 epoch 1 - iter 521/5212 - loss 2.76292849 - time (sec): 153.46 - samples/sec: 265.22 - lr: 0.000015 - momentum: 0.000000
2023-10-12 18:51:48,941 epoch 1 - iter 1042/5212 - loss 2.34035816 - time (sec): 307.16 - samples/sec: 258.13 - lr: 0.000030 - momentum: 0.000000
2023-10-12 18:54:18,340 epoch 1 - iter 1563/5212 - loss 1.85312724 - time (sec): 456.56 - samples/sec: 252.44 - lr: 0.000045 - momentum: 0.000000
2023-10-12 18:56:50,681 epoch 1 - iter 2084/5212 - loss 1.51172426 - time (sec): 608.90 - samples/sec: 249.98 - lr: 0.000060 - momentum: 0.000000
2023-10-12 18:59:20,445 epoch 1 - iter 2605/5212 - loss 1.31079938 - time (sec): 758.67 - samples/sec: 249.93 - lr: 0.000075 - momentum: 0.000000
2023-10-12 19:01:53,736 epoch 1 - iter 3126/5212 - loss 1.15481567 - time (sec): 911.96 - samples/sec: 249.68 - lr: 0.000090 - momentum: 0.000000
2023-10-12 19:04:28,717 epoch 1 - iter 3647/5212 - loss 1.03999737 - time (sec): 1066.94 - samples/sec: 247.05 - lr: 0.000105 - momentum: 0.000000
2023-10-12 19:07:12,068 epoch 1 - iter 4168/5212 - loss 0.94975549 - time (sec): 1230.29 - samples/sec: 243.75 - lr: 0.000120 - momentum: 0.000000
2023-10-12 19:09:41,939 epoch 1 - iter 4689/5212 - loss 0.88001240 - time (sec): 1380.16 - samples/sec: 242.03 - lr: 0.000135 - momentum: 0.000000
2023-10-12 19:12:09,089 epoch 1 - iter 5210/5212 - loss 0.82123342 - time (sec): 1527.31 - samples/sec: 240.48 - lr: 0.000150 - momentum: 0.000000
2023-10-12 19:12:09,617 ----------------------------------------------------------------------------------------------------
2023-10-12 19:12:09,617 EPOCH 1 done: loss 0.8210 - lr: 0.000150
2023-10-12 19:12:47,892 DEV : loss 0.1282375454902649 - f1-score (micro avg) 0.1789
2023-10-12 19:12:47,951 saving best model
2023-10-12 19:12:48,879 ----------------------------------------------------------------------------------------------------
2023-10-12 19:15:16,200 epoch 2 - iter 521/5212 - loss 0.21112691 - time (sec): 147.32 - samples/sec: 242.92 - lr: 0.000148 - momentum: 0.000000
2023-10-12 19:17:45,002 epoch 2 - iter 1042/5212 - loss 0.18824513 - time (sec): 296.12 - samples/sec: 242.99 - lr: 0.000147 - momentum: 0.000000
2023-10-12 19:20:21,035 epoch 2 - iter 1563/5212 - loss 0.17597690 - time (sec): 452.15 - samples/sec: 236.05 - lr: 0.000145 - momentum: 0.000000
2023-10-12 19:23:00,731 epoch 2 - iter 2084/5212 - loss 0.17106880 - time (sec): 611.85 - samples/sec: 234.34 - lr: 0.000143 - momentum: 0.000000
2023-10-12 19:25:39,395 epoch 2 - iter 2605/5212 - loss 0.16970840 - time (sec): 770.51 - samples/sec: 233.26 - lr: 0.000142 - momentum: 0.000000
2023-10-12 19:28:16,729 epoch 2 - iter 3126/5212 - loss 0.16401569 - time (sec): 927.85 - samples/sec: 235.26 - lr: 0.000140 - momentum: 0.000000
2023-10-12 19:30:55,046 epoch 2 - iter 3647/5212 - loss 0.16203738 - time (sec): 1086.16 - samples/sec: 233.86 - lr: 0.000138 - momentum: 0.000000
2023-10-12 19:33:34,826 epoch 2 - iter 4168/5212 - loss 0.16114061 - time (sec): 1245.94 - samples/sec: 234.36 - lr: 0.000137 - momentum: 0.000000
2023-10-12 19:36:12,855 epoch 2 - iter 4689/5212 - loss 0.15881510 - time (sec): 1403.97 - samples/sec: 234.89 - lr: 0.000135 - momentum: 0.000000
2023-10-12 19:38:56,123 epoch 2 - iter 5210/5212 - loss 0.15670979 - time (sec): 1567.24 - samples/sec: 234.31 - lr: 0.000133 - momentum: 0.000000
2023-10-12 19:38:56,741 ----------------------------------------------------------------------------------------------------
2023-10-12 19:38:56,742 EPOCH 2 done: loss 0.1567 - lr: 0.000133
2023-10-12 19:39:39,874 DEV : loss 0.16946110129356384 - f1-score (micro avg) 0.3451
2023-10-12 19:39:39,943 saving best model
2023-10-12 19:39:42,763 ----------------------------------------------------------------------------------------------------
2023-10-12 19:42:21,014 epoch 3 - iter 521/5212 - loss 0.10074178 - time (sec): 158.25 - samples/sec: 224.62 - lr: 0.000132 - momentum: 0.000000
2023-10-12 19:44:59,477 epoch 3 - iter 1042/5212 - loss 0.10972804 - time (sec): 316.71 - samples/sec: 217.74 - lr: 0.000130 - momentum: 0.000000
2023-10-12 19:47:35,591 epoch 3 - iter 1563/5212 - loss 0.10493572 - time (sec): 472.82 - samples/sec: 225.90 - lr: 0.000128 - momentum: 0.000000
2023-10-12 19:50:16,866 epoch 3 - iter 2084/5212 - loss 0.10489412 - time (sec): 634.10 - samples/sec: 225.16 - lr: 0.000127 - momentum: 0.000000
2023-10-12 19:52:51,660 epoch 3 - iter 2605/5212 - loss 0.10195802 - time (sec): 788.89 - samples/sec: 227.66 - lr: 0.000125 - momentum: 0.000000
2023-10-12 19:55:27,756 epoch 3 - iter 3126/5212 - loss 0.10181064 - time (sec): 944.99 - samples/sec: 229.18 - lr: 0.000123 - momentum: 0.000000
2023-10-12 19:58:01,865 epoch 3 - iter 3647/5212 - loss 0.10418318 - time (sec): 1099.10 - samples/sec: 229.25 - lr: 0.000122 - momentum: 0.000000
2023-10-12 20:00:37,316 epoch 3 - iter 4168/5212 - loss 0.10492334 - time (sec): 1254.55 - samples/sec: 231.05 - lr: 0.000120 - momentum: 0.000000
2023-10-12 20:03:14,249 epoch 3 - iter 4689/5212 - loss 0.10268963 - time (sec): 1411.48 - samples/sec: 233.37 - lr: 0.000118 - momentum: 0.000000
2023-10-12 20:05:50,322 epoch 3 - iter 5210/5212 - loss 0.10507983 - time (sec): 1567.56 - samples/sec: 234.33 - lr: 0.000117 - momentum: 0.000000
2023-10-12 20:05:50,832 ----------------------------------------------------------------------------------------------------
2023-10-12 20:05:50,832 EPOCH 3 done: loss 0.1051 - lr: 0.000117
2023-10-12 20:06:34,155 DEV : loss 0.2250872701406479 - f1-score (micro avg) 0.3354
2023-10-12 20:06:34,209 ----------------------------------------------------------------------------------------------------
2023-10-12 20:09:06,488 epoch 4 - iter 521/5212 - loss 0.08085581 - time (sec): 152.28 - samples/sec: 234.78 - lr: 0.000115 - momentum: 0.000000
2023-10-12 20:11:41,459 epoch 4 - iter 1042/5212 - loss 0.07403465 - time (sec): 307.25 - samples/sec: 237.94 - lr: 0.000113 - momentum: 0.000000
2023-10-12 20:14:15,463 epoch 4 - iter 1563/5212 - loss 0.07375911 - time (sec): 461.25 - samples/sec: 237.25 - lr: 0.000112 - momentum: 0.000000
2023-10-12 20:16:48,933 epoch 4 - iter 2084/5212 - loss 0.07382402 - time (sec): 614.72 - samples/sec: 234.88 - lr: 0.000110 - momentum: 0.000000
2023-10-12 20:19:25,310 epoch 4 - iter 2605/5212 - loss 0.07632121 - time (sec): 771.10 - samples/sec: 236.86 - lr: 0.000108 - momentum: 0.000000
2023-10-12 20:22:01,623 epoch 4 - iter 3126/5212 - loss 0.07620217 - time (sec): 927.41 - samples/sec: 238.49 - lr: 0.000107 - momentum: 0.000000
2023-10-12 20:24:35,917 epoch 4 - iter 3647/5212 - loss 0.07340067 - time (sec): 1081.71 - samples/sec: 238.77 - lr: 0.000105 - momentum: 0.000000
2023-10-12 20:27:09,657 epoch 4 - iter 4168/5212 - loss 0.07375064 - time (sec): 1235.45 - samples/sec: 238.55 - lr: 0.000103 - momentum: 0.000000
2023-10-12 20:29:43,452 epoch 4 - iter 4689/5212 - loss 0.07293438 - time (sec): 1389.24 - samples/sec: 238.15 - lr: 0.000102 - momentum: 0.000000
2023-10-12 20:32:16,851 epoch 4 - iter 5210/5212 - loss 0.07341752 - time (sec): 1542.64 - samples/sec: 238.08 - lr: 0.000100 - momentum: 0.000000
2023-10-12 20:32:17,431 ----------------------------------------------------------------------------------------------------
2023-10-12 20:32:17,431 EPOCH 4 done: loss 0.0735 - lr: 0.000100
2023-10-12 20:32:59,595 DEV : loss 0.23921194672584534 - f1-score (micro avg) 0.3737
2023-10-12 20:32:59,649 saving best model
2023-10-12 20:33:02,488 ----------------------------------------------------------------------------------------------------
2023-10-12 20:35:36,727 epoch 5 - iter 521/5212 - loss 0.03913308 - time (sec): 154.23 - samples/sec: 233.83 - lr: 0.000098 - momentum: 0.000000
2023-10-12 20:38:11,970 epoch 5 - iter 1042/5212 - loss 0.05004518 - time (sec): 309.48 - samples/sec: 226.85 - lr: 0.000097 - momentum: 0.000000
2023-10-12 20:40:54,356 epoch 5 - iter 1563/5212 - loss 0.05132965 - time (sec): 471.86 - samples/sec: 228.10 - lr: 0.000095 - momentum: 0.000000
2023-10-12 20:43:33,992 epoch 5 - iter 2084/5212 - loss 0.05244491 - time (sec): 631.50 - samples/sec: 227.65 - lr: 0.000093 - momentum: 0.000000
2023-10-12 20:46:05,541 epoch 5 - iter 2605/5212 - loss 0.05103574 - time (sec): 783.05 - samples/sec: 231.43 - lr: 0.000092 - momentum: 0.000000
2023-10-12 20:48:38,344 epoch 5 - iter 3126/5212 - loss 0.04987451 - time (sec): 935.85 - samples/sec: 236.61 - lr: 0.000090 - momentum: 0.000000
2023-10-12 20:51:13,890 epoch 5 - iter 3647/5212 - loss 0.05008071 - time (sec): 1091.40 - samples/sec: 238.11 - lr: 0.000088 - momentum: 0.000000
2023-10-12 20:53:47,576 epoch 5 - iter 4168/5212 - loss 0.05030602 - time (sec): 1245.08 - samples/sec: 235.30 - lr: 0.000087 - momentum: 0.000000
2023-10-12 20:56:25,717 epoch 5 - iter 4689/5212 - loss 0.04910738 - time (sec): 1403.22 - samples/sec: 235.57 - lr: 0.000085 - momentum: 0.000000
2023-10-12 20:59:02,470 epoch 5 - iter 5210/5212 - loss 0.04976601 - time (sec): 1559.98 - samples/sec: 235.43 - lr: 0.000083 - momentum: 0.000000
2023-10-12 20:59:03,023 ----------------------------------------------------------------------------------------------------
2023-10-12 20:59:03,024 EPOCH 5 done: loss 0.0497 - lr: 0.000083
2023-10-12 20:59:45,820 DEV : loss 0.3263615667819977 - f1-score (micro avg) 0.384
2023-10-12 20:59:45,890 saving best model
2023-10-12 20:59:48,757 ----------------------------------------------------------------------------------------------------
2023-10-12 21:02:23,308 epoch 6 - iter 521/5212 - loss 0.02791178 - time (sec): 154.55 - samples/sec: 228.86 - lr: 0.000082 - momentum: 0.000000
2023-10-12 21:05:03,072 epoch 6 - iter 1042/5212 - loss 0.03190034 - time (sec): 314.31 - samples/sec: 236.91 - lr: 0.000080 - momentum: 0.000000
2023-10-12 21:07:39,800 epoch 6 - iter 1563/5212 - loss 0.03265756 - time (sec): 471.04 - samples/sec: 237.82 - lr: 0.000078 - momentum: 0.000000
2023-10-12 21:10:13,089 epoch 6 - iter 2084/5212 - loss 0.03451907 - time (sec): 624.33 - samples/sec: 235.96 - lr: 0.000077 - momentum: 0.000000
2023-10-12 21:12:46,251 epoch 6 - iter 2605/5212 - loss 0.03548737 - time (sec): 777.49 - samples/sec: 233.74 - lr: 0.000075 - momentum: 0.000000
2023-10-12 21:15:24,589 epoch 6 - iter 3126/5212 - loss 0.03533261 - time (sec): 935.83 - samples/sec: 236.24 - lr: 0.000073 - momentum: 0.000000
2023-10-12 21:18:01,460 epoch 6 - iter 3647/5212 - loss 0.03553896 - time (sec): 1092.70 - samples/sec: 237.24 - lr: 0.000072 - momentum: 0.000000
2023-10-12 21:20:34,172 epoch 6 - iter 4168/5212 - loss 0.03549980 - time (sec): 1245.41 - samples/sec: 235.23 - lr: 0.000070 - momentum: 0.000000
2023-10-12 21:23:07,241 epoch 6 - iter 4689/5212 - loss 0.03556410 - time (sec): 1398.48 - samples/sec: 235.14 - lr: 0.000068 - momentum: 0.000000
2023-10-12 21:25:43,273 epoch 6 - iter 5210/5212 - loss 0.03566000 - time (sec): 1554.51 - samples/sec: 236.10 - lr: 0.000067 - momentum: 0.000000
2023-10-12 21:25:44,091 ----------------------------------------------------------------------------------------------------
2023-10-12 21:25:44,091 EPOCH 6 done: loss 0.0356 - lr: 0.000067
2023-10-12 21:26:26,238 DEV : loss 0.4169439971446991 - f1-score (micro avg) 0.3567
2023-10-12 21:26:26,291 ----------------------------------------------------------------------------------------------------
2023-10-12 21:29:00,654 epoch 7 - iter 521/5212 - loss 0.02119835 - time (sec): 154.36 - samples/sec: 239.30 - lr: 0.000065 - momentum: 0.000000
2023-10-12 21:31:33,480 epoch 7 - iter 1042/5212 - loss 0.02655194 - time (sec): 307.19 - samples/sec: 238.23 - lr: 0.000063 - momentum: 0.000000
2023-10-12 21:34:06,511 epoch 7 - iter 1563/5212 - loss 0.02488243 - time (sec): 460.22 - samples/sec: 238.09 - lr: 0.000062 - momentum: 0.000000
2023-10-12 21:36:40,328 epoch 7 - iter 2084/5212 - loss 0.02403086 - time (sec): 614.03 - samples/sec: 238.85 - lr: 0.000060 - momentum: 0.000000
2023-10-12 21:39:12,655 epoch 7 - iter 2605/5212 - loss 0.02533732 - time (sec): 766.36 - samples/sec: 240.07 - lr: 0.000058 - momentum: 0.000000
2023-10-12 21:41:48,693 epoch 7 - iter 3126/5212 - loss 0.02466719 - time (sec): 922.40 - samples/sec: 244.66 - lr: 0.000057 - momentum: 0.000000
2023-10-12 21:44:21,354 epoch 7 - iter 3647/5212 - loss 0.02434422 - time (sec): 1075.06 - samples/sec: 243.70 - lr: 0.000055 - momentum: 0.000000
2023-10-12 21:46:50,093 epoch 7 - iter 4168/5212 - loss 0.02517306 - time (sec): 1223.80 - samples/sec: 241.32 - lr: 0.000053 - momentum: 0.000000
2023-10-12 21:49:21,621 epoch 7 - iter 4689/5212 - loss 0.02542309 - time (sec): 1375.33 - samples/sec: 240.61 - lr: 0.000052 - momentum: 0.000000
2023-10-12 21:51:50,963 epoch 7 - iter 5210/5212 - loss 0.02492844 - time (sec): 1524.67 - samples/sec: 240.92 - lr: 0.000050 - momentum: 0.000000
2023-10-12 21:51:51,473 ----------------------------------------------------------------------------------------------------
2023-10-12 21:51:51,474 EPOCH 7 done: loss 0.0249 - lr: 0.000050
2023-10-12 21:52:34,668 DEV : loss 0.44921109080314636 - f1-score (micro avg) 0.3588
2023-10-12 21:52:34,724 ----------------------------------------------------------------------------------------------------
2023-10-12 21:55:06,252 epoch 8 - iter 521/5212 - loss 0.01723823 - time (sec): 151.53 - samples/sec: 248.05 - lr: 0.000048 - momentum: 0.000000
2023-10-12 21:57:37,292 epoch 8 - iter 1042/5212 - loss 0.01658125 - time (sec): 302.57 - samples/sec: 242.31 - lr: 0.000047 - momentum: 0.000000
2023-10-12 22:00:10,298 epoch 8 - iter 1563/5212 - loss 0.01573445 - time (sec): 455.57 - samples/sec: 242.90 - lr: 0.000045 - momentum: 0.000000
2023-10-12 22:02:43,368 epoch 8 - iter 2084/5212 - loss 0.01677614 - time (sec): 608.64 - samples/sec: 243.81 - lr: 0.000043 - momentum: 0.000000
2023-10-12 22:05:14,154 epoch 8 - iter 2605/5212 - loss 0.01684349 - time (sec): 759.43 - samples/sec: 240.27 - lr: 0.000042 - momentum: 0.000000
2023-10-12 22:07:45,939 epoch 8 - iter 3126/5212 - loss 0.01664639 - time (sec): 911.21 - samples/sec: 240.75 - lr: 0.000040 - momentum: 0.000000
2023-10-12 22:10:15,255 epoch 8 - iter 3647/5212 - loss 0.01741545 - time (sec): 1060.53 - samples/sec: 239.06 - lr: 0.000038 - momentum: 0.000000
2023-10-12 22:12:44,856 epoch 8 - iter 4168/5212 - loss 0.01704525 - time (sec): 1210.13 - samples/sec: 239.20 - lr: 0.000037 - momentum: 0.000000
2023-10-12 22:15:17,516 epoch 8 - iter 4689/5212 - loss 0.01702684 - time (sec): 1362.79 - samples/sec: 242.11 - lr: 0.000035 - momentum: 0.000000
2023-10-12 22:17:46,344 epoch 8 - iter 5210/5212 - loss 0.01674797 - time (sec): 1511.62 - samples/sec: 243.02 - lr: 0.000033 - momentum: 0.000000
2023-10-12 22:17:46,804 ----------------------------------------------------------------------------------------------------
2023-10-12 22:17:46,804 EPOCH 8 done: loss 0.0167 - lr: 0.000033
2023-10-12 22:18:27,780 DEV : loss 0.4403105676174164 - f1-score (micro avg) 0.3928
2023-10-12 22:18:27,855 saving best model
2023-10-12 22:18:31,457 ----------------------------------------------------------------------------------------------------
2023-10-12 22:20:59,841 epoch 9 - iter 521/5212 - loss 0.01491704 - time (sec): 148.38 - samples/sec: 252.05 - lr: 0.000032 - momentum: 0.000000
2023-10-12 22:23:25,151 epoch 9 - iter 1042/5212 - loss 0.01413508 - time (sec): 293.69 - samples/sec: 239.66 - lr: 0.000030 - momentum: 0.000000
2023-10-12 22:25:53,399 epoch 9 - iter 1563/5212 - loss 0.01315106 - time (sec): 441.94 - samples/sec: 243.38 - lr: 0.000028 - momentum: 0.000000
2023-10-12 22:28:21,805 epoch 9 - iter 2084/5212 - loss 0.01196193 - time (sec): 590.34 - samples/sec: 245.16 - lr: 0.000027 - momentum: 0.000000
2023-10-12 22:30:52,285 epoch 9 - iter 2605/5212 - loss 0.01228176 - time (sec): 740.82 - samples/sec: 247.75 - lr: 0.000025 - momentum: 0.000000
2023-10-12 22:33:20,379 epoch 9 - iter 3126/5212 - loss 0.01164398 - time (sec): 888.92 - samples/sec: 246.57 - lr: 0.000023 - momentum: 0.000000
2023-10-12 22:35:51,301 epoch 9 - iter 3647/5212 - loss 0.01149750 - time (sec): 1039.84 - samples/sec: 247.54 - lr: 0.000022 - momentum: 0.000000
2023-10-12 22:38:21,783 epoch 9 - iter 4168/5212 - loss 0.01196632 - time (sec): 1190.32 - samples/sec: 246.94 - lr: 0.000020 - momentum: 0.000000
2023-10-12 22:40:52,663 epoch 9 - iter 4689/5212 - loss 0.01189000 - time (sec): 1341.20 - samples/sec: 246.35 - lr: 0.000018 - momentum: 0.000000
2023-10-12 22:43:19,842 epoch 9 - iter 5210/5212 - loss 0.01156761 - time (sec): 1488.38 - samples/sec: 246.82 - lr: 0.000017 - momentum: 0.000000
2023-10-12 22:43:20,283 ----------------------------------------------------------------------------------------------------
2023-10-12 22:43:20,284 EPOCH 9 done: loss 0.0116 - lr: 0.000017
2023-10-12 22:44:01,953 DEV : loss 0.44486090540885925 - f1-score (micro avg) 0.4028
2023-10-12 22:44:02,009 saving best model
2023-10-12 22:44:04,674 ----------------------------------------------------------------------------------------------------
2023-10-12 22:46:35,093 epoch 10 - iter 521/5212 - loss 0.01030731 - time (sec): 150.41 - samples/sec: 243.36 - lr: 0.000015 - momentum: 0.000000
2023-10-12 22:49:05,792 epoch 10 - iter 1042/5212 - loss 0.01158820 - time (sec): 301.11 - samples/sec: 241.13 - lr: 0.000013 - momentum: 0.000000
2023-10-12 22:51:50,787 epoch 10 - iter 1563/5212 - loss 0.00996155 - time (sec): 466.11 - samples/sec: 235.84 - lr: 0.000012 - momentum: 0.000000
2023-10-12 22:54:31,236 epoch 10 - iter 2084/5212 - loss 0.00898869 - time (sec): 626.56 - samples/sec: 231.92 - lr: 0.000010 - momentum: 0.000000
2023-10-12 22:57:06,304 epoch 10 - iter 2605/5212 - loss 0.00842857 - time (sec): 781.62 - samples/sec: 232.94 - lr: 0.000008 - momentum: 0.000000
2023-10-12 22:59:38,073 epoch 10 - iter 3126/5212 - loss 0.00869311 - time (sec): 933.39 - samples/sec: 234.97 - lr: 0.000007 - momentum: 0.000000
2023-10-12 23:02:07,453 epoch 10 - iter 3647/5212 - loss 0.00833985 - time (sec): 1082.77 - samples/sec: 234.54 - lr: 0.000005 - momentum: 0.000000
2023-10-12 23:04:44,311 epoch 10 - iter 4168/5212 - loss 0.00850446 - time (sec): 1239.63 - samples/sec: 234.93 - lr: 0.000003 - momentum: 0.000000
2023-10-12 23:07:20,219 epoch 10 - iter 4689/5212 - loss 0.00844827 - time (sec): 1395.54 - samples/sec: 235.44 - lr: 0.000002 - momentum: 0.000000
2023-10-12 23:09:57,735 epoch 10 - iter 5210/5212 - loss 0.00809271 - time (sec): 1553.06 - samples/sec: 236.51 - lr: 0.000000 - momentum: 0.000000
2023-10-12 23:09:58,253 ----------------------------------------------------------------------------------------------------
2023-10-12 23:09:58,253 EPOCH 10 done: loss 0.0081 - lr: 0.000000
2023-10-12 23:10:41,281 DEV : loss 0.502132773399353 - f1-score (micro avg) 0.4
2023-10-12 23:10:42,276 ----------------------------------------------------------------------------------------------------
2023-10-12 23:10:42,278 Loading model from best epoch ...
2023-10-12 23:10:46,633 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
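The 17-tag dictionary above is the BIOES scheme: O plus S/B/E/I variants for each of the four entity types. It can be reconstructed in a couple of lines of illustrative Python (entity types copied from the line above):

```python
# Rebuild the BIOES tag dictionary from the four entity types
entity_types = ["LOC", "PER", "ORG", "HumanProd"]
tags = ["O"] + [f"{p}-{t}" for t in entity_types for p in ("S", "B", "E", "I")]

print(len(tags))   # 17
print(tags[:5])    # ['O', 'S-LOC', 'B-LOC', 'E-LOC', 'I-LOC']
```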
2023-10-12 23:12:31,935
Results:
- F-score (micro) 0.4616
- F-score (macro) 0.3197
- Accuracy 0.3054

By class:
              precision    recall  f1-score   support

         LOC     0.4873    0.5387    0.5117      1214
         PER     0.4096    0.5186    0.4577       808
         ORG     0.2997    0.3201    0.3096       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4314    0.4962    0.4616      2390
   macro avg     0.2992    0.3443    0.3197      2390
weighted avg     0.4303    0.4962    0.4604      2390
2023-10-12 23:12:31,936 ----------------------------------------------------------------------------------------------------
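As a sanity check on the table above: the micro-avg F-score is the harmonic mean of the micro precision and recall, and the macro F-score is the unweighted mean of the four per-class F-scores. A small verification in plain Python (numbers copied from the table; small rounding drift is expected, since the table shows rounded values):

```python
def f1(p, r):
    # harmonic mean of precision and recall
    return 2 * p * r / (p + r) if p + r else 0.0

# per-class f1-scores from the "By class" table
class_f1 = {"LOC": 0.5117, "PER": 0.4577, "ORG": 0.3096, "HumanProd": 0.0000}

micro_f1 = f1(0.4314, 0.4962)                      # micro-avg precision/recall
macro_f1 = sum(class_f1.values()) / len(class_f1)  # unweighted class mean

print(round(micro_f1, 4), round(macro_f1, 4))      # close to 0.4616 and 0.3197
```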