|
2023-10-10 01:25:46,775 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:25:46,778 Model: "SequenceTagger( |
|
(embeddings): ByT5Embeddings( |
|
(model): T5EncoderModel( |
|
(shared): Embedding(384, 1472) |
|
(encoder): T5Stack( |
|
(embed_tokens): Embedding(384, 1472) |
|
(block): ModuleList( |
|
(0): T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
(relative_attention_bias): Embedding(32, 6) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(1-11): 11 x T5Block( |
|
(layer): ModuleList( |
|
(0): T5LayerSelfAttention( |
|
(SelfAttention): T5Attention( |
|
(q): Linear(in_features=1472, out_features=384, bias=False) |
|
(k): Linear(in_features=1472, out_features=384, bias=False) |
|
(v): Linear(in_features=1472, out_features=384, bias=False) |
|
(o): Linear(in_features=384, out_features=1472, bias=False) |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
(1): T5LayerFF( |
|
(DenseReluDense): T5DenseGatedActDense( |
|
(wi_0): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wi_1): Linear(in_features=1472, out_features=3584, bias=False) |
|
(wo): Linear(in_features=3584, out_features=1472, bias=False) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
(act): NewGELUActivation() |
|
) |
|
(layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
) |
|
(final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) |
|
(dropout): Dropout(p=0.1, inplace=False) |
|
) |
|
) |
|
) |
|
(locked_dropout): LockedDropout(p=0.5) |
|
(linear): Linear(in_features=1472, out_features=17, bias=True) |
|
(loss_function): CrossEntropyLoss() |
|
)" |
|
2023-10-10 01:25:46,778 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:25:46,778 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences |
|
- NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator |
|
2023-10-10 01:25:46,778 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:25:46,778 Train: 20847 sentences |
|
2023-10-10 01:25:46,778 (train_with_dev=False, train_with_test=False) |
|
2023-10-10 01:25:46,778 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:25:46,779 Training Params: |
|
2023-10-10 01:25:46,779 - learning_rate: "0.00015" |
|
2023-10-10 01:25:46,779 - mini_batch_size: "4" |
|
2023-10-10 01:25:46,779 - max_epochs: "10" |
|
2023-10-10 01:25:46,779 - shuffle: "True" |
|
2023-10-10 01:25:46,779 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:25:46,779 Plugins: |
|
2023-10-10 01:25:46,779 - TensorboardLogger |
|
2023-10-10 01:25:46,779 - LinearScheduler | warmup_fraction: '0.1' |
|
2023-10-10 01:25:46,779 ---------------------------------------------------------------------------------------------------- |
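The `lr` column in the per-iteration lines below is consistent with a linear warmup over the first 10% of all steps (here exactly one epoch of 5212 iterations), followed by a linear decay to zero. The following is a minimal sketch of such a schedule under that assumption. It is not Flair's own scheduler code, and the helper name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, steps_per_epoch=5212, max_epochs=10,
                       peak_lr=0.00015, warmup_fraction=0.1):
    """Learning rate after `step` optimizer steps (hypothetical helper).

    Defaults are taken from the logged training parameters; the shape of the
    schedule (linear warmup, then linear decay to zero) is an assumption.
    """
    total_steps = steps_per_epoch * max_epochs
    warmup_steps = int(round(total_steps * warmup_fraction))
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: ramp linearly from peak_lr down to 0 at the final step.
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, total_steps - step) / decay_steps


if __name__ == "__main__":
    for epoch, it in [(1, 521), (1, 5210), (2, 521), (10, 5210)]:
        step = (epoch - 1) * 5212 + it
        print(f"epoch {epoch} iter {it}: lr {linear_schedule_lr(step):.6f}")
```

Rounded to six decimals, the four sampled steps agree with the `lr` values logged at the same iterations.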
|
2023-10-10 01:25:46,779 Final evaluation on model from best epoch (best-model.pt) |
|
2023-10-10 01:25:46,779 - metric: "('micro avg', 'f1-score')" |
|
2023-10-10 01:25:46,779 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:25:46,780 Computation: |
|
2023-10-10 01:25:46,780 - compute on device: cuda:0 |
|
2023-10-10 01:25:46,780 - embedding storage: none |
|
2023-10-10 01:25:46,780 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:25:46,780 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-1" |
|
2023-10-10 01:25:46,780 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:25:46,780 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:25:46,780 Logging anything other than scalars to TensorBoard is currently not supported. |
|
2023-10-10 01:28:19,901 epoch 1 - iter 521/5212 - loss 2.78721271 - time (sec): 153.12 - samples/sec: 258.19 - lr: 0.000015 - momentum: 0.000000 |
|
2023-10-10 01:30:47,683 epoch 1 - iter 1042/5212 - loss 2.40394453 - time (sec): 300.90 - samples/sec: 246.08 - lr: 0.000030 - momentum: 0.000000 |
|
2023-10-10 01:33:20,332 epoch 1 - iter 1563/5212 - loss 1.88420497 - time (sec): 453.55 - samples/sec: 243.15 - lr: 0.000045 - momentum: 0.000000 |
|
2023-10-10 01:35:50,068 epoch 1 - iter 2084/5212 - loss 1.56398813 - time (sec): 603.29 - samples/sec: 238.94 - lr: 0.000060 - momentum: 0.000000 |
|
2023-10-10 01:38:28,783 epoch 1 - iter 2605/5212 - loss 1.32922150 - time (sec): 762.00 - samples/sec: 239.39 - lr: 0.000075 - momentum: 0.000000 |
|
2023-10-10 01:41:01,466 epoch 1 - iter 3126/5212 - loss 1.16575348 - time (sec): 914.68 - samples/sec: 240.69 - lr: 0.000090 - momentum: 0.000000 |
|
2023-10-10 01:43:35,466 epoch 1 - iter 3647/5212 - loss 1.04771990 - time (sec): 1068.68 - samples/sec: 239.12 - lr: 0.000105 - momentum: 0.000000 |
|
2023-10-10 01:46:05,151 epoch 1 - iter 4168/5212 - loss 0.94981352 - time (sec): 1218.37 - samples/sec: 240.08 - lr: 0.000120 - momentum: 0.000000 |
|
2023-10-10 01:48:42,725 epoch 1 - iter 4689/5212 - loss 0.86675606 - time (sec): 1375.94 - samples/sec: 240.43 - lr: 0.000135 - momentum: 0.000000 |
|
2023-10-10 01:51:13,055 epoch 1 - iter 5210/5212 - loss 0.80285395 - time (sec): 1526.27 - samples/sec: 240.62 - lr: 0.000150 - momentum: 0.000000 |
|
2023-10-10 01:51:13,587 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:51:13,588 EPOCH 1 done: loss 0.8026 - lr: 0.000150 |
|
2023-10-10 01:51:50,130 DEV : loss 0.13000161945819855 - f1-score (micro avg) 0.2937 |
|
2023-10-10 01:51:50,181 saving best model |
|
2023-10-10 01:51:51,139 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 01:54:26,458 epoch 2 - iter 521/5212 - loss 0.17727772 - time (sec): 155.32 - samples/sec: 256.21 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-10 01:57:04,108 epoch 2 - iter 1042/5212 - loss 0.18552352 - time (sec): 312.97 - samples/sec: 251.81 - lr: 0.000147 - momentum: 0.000000 |
|
2023-10-10 01:59:40,241 epoch 2 - iter 1563/5212 - loss 0.17656777 - time (sec): 469.10 - samples/sec: 247.34 - lr: 0.000145 - momentum: 0.000000 |
|
2023-10-10 02:02:10,466 epoch 2 - iter 2084/5212 - loss 0.17347903 - time (sec): 619.32 - samples/sec: 243.06 - lr: 0.000143 - momentum: 0.000000 |
|
2023-10-10 02:04:44,612 epoch 2 - iter 2605/5212 - loss 0.17247510 - time (sec): 773.47 - samples/sec: 241.26 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-10 02:07:16,051 epoch 2 - iter 3126/5212 - loss 0.17019344 - time (sec): 924.91 - samples/sec: 240.53 - lr: 0.000140 - momentum: 0.000000 |
|
2023-10-10 02:09:48,370 epoch 2 - iter 3647/5212 - loss 0.16742648 - time (sec): 1077.23 - samples/sec: 240.03 - lr: 0.000138 - momentum: 0.000000 |
|
2023-10-10 02:12:23,147 epoch 2 - iter 4168/5212 - loss 0.16216771 - time (sec): 1232.01 - samples/sec: 240.31 - lr: 0.000137 - momentum: 0.000000 |
|
2023-10-10 02:15:01,539 epoch 2 - iter 4689/5212 - loss 0.15863022 - time (sec): 1390.40 - samples/sec: 239.62 - lr: 0.000135 - momentum: 0.000000 |
|
2023-10-10 02:17:30,737 epoch 2 - iter 5210/5212 - loss 0.15545238 - time (sec): 1539.60 - samples/sec: 238.55 - lr: 0.000133 - momentum: 0.000000 |
|
2023-10-10 02:17:31,246 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 02:17:31,247 EPOCH 2 done: loss 0.1554 - lr: 0.000133 |
|
2023-10-10 02:18:13,857 DEV : loss 0.1568019837141037 - f1-score (micro avg) 0.3643 |
|
2023-10-10 02:18:13,912 saving best model |
|
2023-10-10 02:18:16,652 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 02:20:47,238 epoch 3 - iter 521/5212 - loss 0.09227579 - time (sec): 150.58 - samples/sec: 242.13 - lr: 0.000132 - momentum: 0.000000 |
|
2023-10-10 02:23:18,633 epoch 3 - iter 1042/5212 - loss 0.10115948 - time (sec): 301.98 - samples/sec: 236.47 - lr: 0.000130 - momentum: 0.000000 |
|
2023-10-10 02:25:54,573 epoch 3 - iter 1563/5212 - loss 0.10224575 - time (sec): 457.92 - samples/sec: 241.87 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-10 02:28:26,479 epoch 3 - iter 2084/5212 - loss 0.10442613 - time (sec): 609.82 - samples/sec: 236.77 - lr: 0.000127 - momentum: 0.000000 |
|
2023-10-10 02:30:55,098 epoch 3 - iter 2605/5212 - loss 0.10550793 - time (sec): 758.44 - samples/sec: 234.00 - lr: 0.000125 - momentum: 0.000000 |
|
2023-10-10 02:33:31,679 epoch 3 - iter 3126/5212 - loss 0.10488651 - time (sec): 915.02 - samples/sec: 237.84 - lr: 0.000123 - momentum: 0.000000 |
|
2023-10-10 02:36:05,961 epoch 3 - iter 3647/5212 - loss 0.10697333 - time (sec): 1069.30 - samples/sec: 239.46 - lr: 0.000122 - momentum: 0.000000 |
|
2023-10-10 02:38:37,130 epoch 3 - iter 4168/5212 - loss 0.10672325 - time (sec): 1220.47 - samples/sec: 240.95 - lr: 0.000120 - momentum: 0.000000 |
|
2023-10-10 02:41:12,922 epoch 3 - iter 4689/5212 - loss 0.10596268 - time (sec): 1376.27 - samples/sec: 241.08 - lr: 0.000118 - momentum: 0.000000 |
|
2023-10-10 02:43:44,745 epoch 3 - iter 5210/5212 - loss 0.10482979 - time (sec): 1528.09 - samples/sec: 240.42 - lr: 0.000117 - momentum: 0.000000 |
|
2023-10-10 02:43:45,188 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 02:43:45,188 EPOCH 3 done: loss 0.1048 - lr: 0.000117 |
|
2023-10-10 02:44:26,403 DEV : loss 0.26355189085006714 - f1-score (micro avg) 0.3544 |
|
2023-10-10 02:44:26,460 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 02:47:01,262 epoch 4 - iter 521/5212 - loss 0.06438853 - time (sec): 154.80 - samples/sec: 238.52 - lr: 0.000115 - momentum: 0.000000 |
|
2023-10-10 02:49:38,517 epoch 4 - iter 1042/5212 - loss 0.06254566 - time (sec): 312.05 - samples/sec: 231.26 - lr: 0.000113 - momentum: 0.000000 |
|
2023-10-10 02:52:14,492 epoch 4 - iter 1563/5212 - loss 0.06423763 - time (sec): 468.03 - samples/sec: 230.64 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-10 02:54:44,857 epoch 4 - iter 2084/5212 - loss 0.06688261 - time (sec): 618.39 - samples/sec: 232.20 - lr: 0.000110 - momentum: 0.000000 |
|
2023-10-10 02:57:20,257 epoch 4 - iter 2605/5212 - loss 0.07115747 - time (sec): 773.79 - samples/sec: 234.30 - lr: 0.000108 - momentum: 0.000000 |
|
2023-10-10 02:59:54,457 epoch 4 - iter 3126/5212 - loss 0.06965197 - time (sec): 927.99 - samples/sec: 238.97 - lr: 0.000107 - momentum: 0.000000 |
|
2023-10-10 03:02:33,869 epoch 4 - iter 3647/5212 - loss 0.06885936 - time (sec): 1087.41 - samples/sec: 235.99 - lr: 0.000105 - momentum: 0.000000 |
|
2023-10-10 03:05:03,715 epoch 4 - iter 4168/5212 - loss 0.07019996 - time (sec): 1237.25 - samples/sec: 236.43 - lr: 0.000103 - momentum: 0.000000 |
|
2023-10-10 03:07:37,272 epoch 4 - iter 4689/5212 - loss 0.07063262 - time (sec): 1390.81 - samples/sec: 237.63 - lr: 0.000102 - momentum: 0.000000 |
|
2023-10-10 03:10:08,879 epoch 4 - iter 5210/5212 - loss 0.07185046 - time (sec): 1542.42 - samples/sec: 238.20 - lr: 0.000100 - momentum: 0.000000 |
|
2023-10-10 03:10:09,314 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 03:10:09,314 EPOCH 4 done: loss 0.0719 - lr: 0.000100 |
|
2023-10-10 03:10:57,812 DEV : loss 0.328808069229126 - f1-score (micro avg) 0.3675 |
|
2023-10-10 03:10:57,879 saving best model |
|
2023-10-10 03:11:09,277 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 03:13:41,434 epoch 5 - iter 521/5212 - loss 0.05178094 - time (sec): 152.15 - samples/sec: 227.02 - lr: 0.000098 - momentum: 0.000000 |
|
2023-10-10 03:16:13,044 epoch 5 - iter 1042/5212 - loss 0.05498718 - time (sec): 303.76 - samples/sec: 234.98 - lr: 0.000097 - momentum: 0.000000 |
|
2023-10-10 03:18:45,690 epoch 5 - iter 1563/5212 - loss 0.05426323 - time (sec): 456.41 - samples/sec: 241.01 - lr: 0.000095 - momentum: 0.000000 |
|
2023-10-10 03:21:23,172 epoch 5 - iter 2084/5212 - loss 0.05256449 - time (sec): 613.89 - samples/sec: 242.02 - lr: 0.000093 - momentum: 0.000000 |
|
2023-10-10 03:23:56,778 epoch 5 - iter 2605/5212 - loss 0.05299146 - time (sec): 767.50 - samples/sec: 238.88 - lr: 0.000092 - momentum: 0.000000 |
|
2023-10-10 03:26:30,782 epoch 5 - iter 3126/5212 - loss 0.05454252 - time (sec): 921.50 - samples/sec: 239.64 - lr: 0.000090 - momentum: 0.000000 |
|
2023-10-10 03:29:07,368 epoch 5 - iter 3647/5212 - loss 0.05456496 - time (sec): 1078.09 - samples/sec: 240.64 - lr: 0.000088 - momentum: 0.000000 |
|
2023-10-10 03:31:43,290 epoch 5 - iter 4168/5212 - loss 0.05333736 - time (sec): 1234.01 - samples/sec: 239.80 - lr: 0.000087 - momentum: 0.000000 |
|
2023-10-10 03:34:15,429 epoch 5 - iter 4689/5212 - loss 0.05223842 - time (sec): 1386.15 - samples/sec: 238.43 - lr: 0.000085 - momentum: 0.000000 |
|
2023-10-10 03:36:49,643 epoch 5 - iter 5210/5212 - loss 0.05325230 - time (sec): 1540.36 - samples/sec: 238.48 - lr: 0.000083 - momentum: 0.000000 |
|
2023-10-10 03:36:50,135 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 03:36:50,135 EPOCH 5 done: loss 0.0532 - lr: 0.000083 |
|
2023-10-10 03:37:32,491 DEV : loss 0.3098498284816742 - f1-score (micro avg) 0.3954 |
|
2023-10-10 03:37:32,548 saving best model |
|
2023-10-10 03:37:35,444 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 03:40:16,955 epoch 6 - iter 521/5212 - loss 0.03216296 - time (sec): 161.51 - samples/sec: 227.66 - lr: 0.000082 - momentum: 0.000000 |
|
2023-10-10 03:42:49,627 epoch 6 - iter 1042/5212 - loss 0.03763248 - time (sec): 314.18 - samples/sec: 225.89 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-10 03:45:24,399 epoch 6 - iter 1563/5212 - loss 0.03476631 - time (sec): 468.95 - samples/sec: 234.51 - lr: 0.000078 - momentum: 0.000000 |
|
2023-10-10 03:48:01,947 epoch 6 - iter 2084/5212 - loss 0.03495034 - time (sec): 626.50 - samples/sec: 232.55 - lr: 0.000077 - momentum: 0.000000 |
|
2023-10-10 03:50:33,387 epoch 6 - iter 2605/5212 - loss 0.03541576 - time (sec): 777.94 - samples/sec: 234.85 - lr: 0.000075 - momentum: 0.000000 |
|
2023-10-10 03:53:03,173 epoch 6 - iter 3126/5212 - loss 0.03439095 - time (sec): 927.72 - samples/sec: 233.04 - lr: 0.000073 - momentum: 0.000000 |
|
2023-10-10 03:55:38,848 epoch 6 - iter 3647/5212 - loss 0.03483107 - time (sec): 1083.40 - samples/sec: 233.74 - lr: 0.000072 - momentum: 0.000000 |
|
2023-10-10 03:58:12,884 epoch 6 - iter 4168/5212 - loss 0.03532550 - time (sec): 1237.44 - samples/sec: 235.10 - lr: 0.000070 - momentum: 0.000000 |
|
2023-10-10 04:00:56,870 epoch 6 - iter 4689/5212 - loss 0.03575938 - time (sec): 1401.42 - samples/sec: 235.54 - lr: 0.000068 - momentum: 0.000000 |
|
2023-10-10 04:03:29,722 epoch 6 - iter 5210/5212 - loss 0.03599569 - time (sec): 1554.27 - samples/sec: 236.34 - lr: 0.000067 - momentum: 0.000000 |
|
2023-10-10 04:03:30,188 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 04:03:30,188 EPOCH 6 done: loss 0.0360 - lr: 0.000067 |
|
2023-10-10 04:04:12,941 DEV : loss 0.3672010004520416 - f1-score (micro avg) 0.3823 |
|
2023-10-10 04:04:13,004 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 04:06:49,841 epoch 7 - iter 521/5212 - loss 0.01901470 - time (sec): 156.83 - samples/sec: 245.25 - lr: 0.000065 - momentum: 0.000000 |
|
2023-10-10 04:09:26,255 epoch 7 - iter 1042/5212 - loss 0.02194304 - time (sec): 313.25 - samples/sec: 244.81 - lr: 0.000063 - momentum: 0.000000 |
|
2023-10-10 04:12:00,065 epoch 7 - iter 1563/5212 - loss 0.02292694 - time (sec): 467.06 - samples/sec: 239.09 - lr: 0.000062 - momentum: 0.000000 |
|
2023-10-10 04:14:34,985 epoch 7 - iter 2084/5212 - loss 0.02286490 - time (sec): 621.98 - samples/sec: 239.67 - lr: 0.000060 - momentum: 0.000000 |
|
2023-10-10 04:17:07,483 epoch 7 - iter 2605/5212 - loss 0.02158969 - time (sec): 774.48 - samples/sec: 241.38 - lr: 0.000058 - momentum: 0.000000 |
|
2023-10-10 04:19:42,460 epoch 7 - iter 3126/5212 - loss 0.02437178 - time (sec): 929.45 - samples/sec: 241.11 - lr: 0.000057 - momentum: 0.000000 |
|
2023-10-10 04:22:14,067 epoch 7 - iter 3647/5212 - loss 0.02420860 - time (sec): 1081.06 - samples/sec: 240.27 - lr: 0.000055 - momentum: 0.000000 |
|
2023-10-10 04:24:44,274 epoch 7 - iter 4168/5212 - loss 0.02497424 - time (sec): 1231.27 - samples/sec: 240.58 - lr: 0.000053 - momentum: 0.000000 |
|
2023-10-10 04:27:18,230 epoch 7 - iter 4689/5212 - loss 0.02500752 - time (sec): 1385.22 - samples/sec: 240.06 - lr: 0.000052 - momentum: 0.000000 |
|
2023-10-10 04:29:46,445 epoch 7 - iter 5210/5212 - loss 0.02487125 - time (sec): 1533.44 - samples/sec: 239.57 - lr: 0.000050 - momentum: 0.000000 |
|
2023-10-10 04:29:46,901 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 04:29:46,902 EPOCH 7 done: loss 0.0249 - lr: 0.000050 |
|
2023-10-10 04:30:31,180 DEV : loss 0.43393340706825256 - f1-score (micro avg) 0.3819 |
|
2023-10-10 04:30:31,244 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 04:33:08,702 epoch 8 - iter 521/5212 - loss 0.02118548 - time (sec): 157.46 - samples/sec: 229.44 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-10 04:35:41,639 epoch 8 - iter 1042/5212 - loss 0.01953101 - time (sec): 310.39 - samples/sec: 235.18 - lr: 0.000047 - momentum: 0.000000 |
|
2023-10-10 04:38:18,307 epoch 8 - iter 1563/5212 - loss 0.01941108 - time (sec): 467.06 - samples/sec: 237.33 - lr: 0.000045 - momentum: 0.000000 |
|
2023-10-10 04:40:52,233 epoch 8 - iter 2084/5212 - loss 0.01883043 - time (sec): 620.99 - samples/sec: 236.34 - lr: 0.000043 - momentum: 0.000000 |
|
2023-10-10 04:43:26,171 epoch 8 - iter 2605/5212 - loss 0.01862250 - time (sec): 774.92 - samples/sec: 236.00 - lr: 0.000042 - momentum: 0.000000 |
|
2023-10-10 04:46:02,246 epoch 8 - iter 3126/5212 - loss 0.01937239 - time (sec): 931.00 - samples/sec: 236.09 - lr: 0.000040 - momentum: 0.000000 |
|
2023-10-10 04:48:33,038 epoch 8 - iter 3647/5212 - loss 0.01908805 - time (sec): 1081.79 - samples/sec: 235.79 - lr: 0.000038 - momentum: 0.000000 |
|
2023-10-10 04:51:12,633 epoch 8 - iter 4168/5212 - loss 0.01889720 - time (sec): 1241.39 - samples/sec: 236.80 - lr: 0.000037 - momentum: 0.000000 |
|
2023-10-10 04:53:47,905 epoch 8 - iter 4689/5212 - loss 0.01843401 - time (sec): 1396.66 - samples/sec: 236.95 - lr: 0.000035 - momentum: 0.000000 |
|
2023-10-10 04:56:21,396 epoch 8 - iter 5210/5212 - loss 0.01859840 - time (sec): 1550.15 - samples/sec: 236.93 - lr: 0.000033 - momentum: 0.000000 |
|
2023-10-10 04:56:21,934 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 04:56:21,934 EPOCH 8 done: loss 0.0186 - lr: 0.000033 |
|
2023-10-10 04:57:03,167 DEV : loss 0.46257734298706055 - f1-score (micro avg) 0.3734 |
|
2023-10-10 04:57:03,238 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 04:59:37,592 epoch 9 - iter 521/5212 - loss 0.01100351 - time (sec): 154.35 - samples/sec: 244.77 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-10 05:02:12,911 epoch 9 - iter 1042/5212 - loss 0.01142614 - time (sec): 309.67 - samples/sec: 244.21 - lr: 0.000030 - momentum: 0.000000 |
|
2023-10-10 05:04:45,707 epoch 9 - iter 1563/5212 - loss 0.01200825 - time (sec): 462.47 - samples/sec: 238.44 - lr: 0.000028 - momentum: 0.000000 |
|
2023-10-10 05:07:25,374 epoch 9 - iter 2084/5212 - loss 0.01167246 - time (sec): 622.13 - samples/sec: 236.70 - lr: 0.000027 - momentum: 0.000000 |
|
2023-10-10 05:10:00,448 epoch 9 - iter 2605/5212 - loss 0.01211194 - time (sec): 777.21 - samples/sec: 236.18 - lr: 0.000025 - momentum: 0.000000 |
|
2023-10-10 05:12:35,180 epoch 9 - iter 3126/5212 - loss 0.01227832 - time (sec): 931.94 - samples/sec: 236.70 - lr: 0.000023 - momentum: 0.000000 |
|
2023-10-10 05:15:13,447 epoch 9 - iter 3647/5212 - loss 0.01232401 - time (sec): 1090.21 - samples/sec: 235.04 - lr: 0.000022 - momentum: 0.000000 |
|
2023-10-10 05:17:46,800 epoch 9 - iter 4168/5212 - loss 0.01205408 - time (sec): 1243.56 - samples/sec: 234.71 - lr: 0.000020 - momentum: 0.000000 |
|
2023-10-10 05:20:19,024 epoch 9 - iter 4689/5212 - loss 0.01137052 - time (sec): 1395.78 - samples/sec: 235.39 - lr: 0.000018 - momentum: 0.000000 |
|
2023-10-10 05:23:03,373 epoch 9 - iter 5210/5212 - loss 0.01116459 - time (sec): 1560.13 - samples/sec: 235.41 - lr: 0.000017 - momentum: 0.000000 |
|
2023-10-10 05:23:03,935 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 05:23:03,935 EPOCH 9 done: loss 0.0112 - lr: 0.000017 |
|
2023-10-10 05:23:45,741 DEV : loss 0.49680188298225403 - f1-score (micro avg) 0.386 |
|
2023-10-10 05:23:45,810 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 05:26:19,459 epoch 10 - iter 521/5212 - loss 0.00618823 - time (sec): 153.65 - samples/sec: 239.16 - lr: 0.000015 - momentum: 0.000000 |
|
2023-10-10 05:28:50,815 epoch 10 - iter 1042/5212 - loss 0.00778641 - time (sec): 305.00 - samples/sec: 236.39 - lr: 0.000013 - momentum: 0.000000 |
|
2023-10-10 05:31:22,417 epoch 10 - iter 1563/5212 - loss 0.00880970 - time (sec): 456.60 - samples/sec: 231.53 - lr: 0.000012 - momentum: 0.000000 |
|
2023-10-10 05:34:00,744 epoch 10 - iter 2084/5212 - loss 0.00838074 - time (sec): 614.93 - samples/sec: 234.01 - lr: 0.000010 - momentum: 0.000000 |
|
2023-10-10 05:36:40,115 epoch 10 - iter 2605/5212 - loss 0.00876966 - time (sec): 774.30 - samples/sec: 238.88 - lr: 0.000008 - momentum: 0.000000 |
|
2023-10-10 05:39:13,455 epoch 10 - iter 3126/5212 - loss 0.00882349 - time (sec): 927.64 - samples/sec: 237.17 - lr: 0.000007 - momentum: 0.000000 |
|
2023-10-10 05:41:47,684 epoch 10 - iter 3647/5212 - loss 0.00895006 - time (sec): 1081.87 - samples/sec: 238.01 - lr: 0.000005 - momentum: 0.000000 |
|
2023-10-10 05:44:19,241 epoch 10 - iter 4168/5212 - loss 0.00888338 - time (sec): 1233.43 - samples/sec: 239.39 - lr: 0.000003 - momentum: 0.000000 |
|
2023-10-10 05:46:50,896 epoch 10 - iter 4689/5212 - loss 0.00858734 - time (sec): 1385.08 - samples/sec: 240.18 - lr: 0.000002 - momentum: 0.000000 |
|
2023-10-10 05:49:20,398 epoch 10 - iter 5210/5212 - loss 0.00861144 - time (sec): 1534.59 - samples/sec: 239.31 - lr: 0.000000 - momentum: 0.000000 |
|
2023-10-10 05:49:20,941 ---------------------------------------------------------------------------------------------------- |
|
2023-10-10 05:49:20,942 EPOCH 10 done: loss 0.0086 - lr: 0.000000 |
|
2023-10-10 05:50:01,786 DEV : loss 0.507175087928772 - f1-score (micro avg) 0.382 |
|
2023-10-10 05:50:02,826 ---------------------------------------------------------------------------------------------------- |
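The "saving best model" lines above appear only after epochs whose dev micro F1 beats every earlier epoch. A minimal sketch of that selection rule, using the ten DEV scores copied from this log, is shown below. The strict-improvement rule and the helper name `checkpoint_epochs` are assumptions for illustration, not Flair's implementation.

```python
# Dev micro-avg F1 after each epoch, copied from the DEV lines of this log.
dev_f1 = [0.2937, 0.3643, 0.3544, 0.3675, 0.3954,
          0.3823, 0.3819, 0.3734, 0.386, 0.382]


def checkpoint_epochs(scores):
    """Return the 1-based epochs at which a new best score is reached."""
    best = float("-inf")
    saved = []
    for epoch, score in enumerate(scores, start=1):
        if score > best:  # assumed rule: save only on strict improvement
            best = score
            saved.append(epoch)
    return saved


saved_epochs = checkpoint_epochs(dev_f1)
best_epoch = saved_epochs[-1]
print(saved_epochs, best_epoch, dev_f1[best_epoch - 1])
```

Under this rule the checkpoint is written after epochs 1, 2, 4 and 5, which matches the log, so the final evaluation loads the epoch 5 model (dev F1 0.3954).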
|
2023-10-10 05:50:02,828 Loading model from best epoch ... |
|
2023-10-10 05:50:07,141 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd |
|
2023-10-10 05:51:51,254 |
|
Results: |
|
- F-score (micro) 0.4873 |
|
- F-score (macro) 0.327 |
|
- Accuracy 0.3265 |
|
|
|
By class: |
|
              precision    recall  f1-score   support |
|
|
|
         LOC     0.5032    0.6400    0.5635      1214 |
|
         PER     0.4093    0.5025    0.4511       808 |
|
         ORG     0.2982    0.2890    0.2935       353 |
|
   HumanProd     0.0000    0.0000    0.0000        15 |
|
|
|
   micro avg     0.4456    0.5377    0.4873      2390 |
|
   macro avg     0.3027    0.3579    0.3270      2390 |
|
weighted avg     0.4380    0.5377    0.4821      2390 |
|
|
|
2023-10-10 05:51:51,255 ---------------------------------------------------------------------------------------------------- |
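The averaged rows of the results table can be cross-checked from the per-class rows. The sketch below recomputes the micro F1 from the reported micro precision and recall, and the macro and weighted F1 from the per-class values. All numbers are copied from the table above; the variable and function names are illustrative only. Because the inputs are already rounded to four decimals, agreement is expected only to about that precision.

```python
# Per-class (precision, recall, f1, support), copied from the "By class" table.
by_class = {
    "LOC": (0.5032, 0.6400, 0.5635, 1214),
    "PER": (0.4093, 0.5025, 0.4511, 808),
    "ORG": (0.2982, 0.2890, 0.2935, 353),
    "HumanProd": (0.0000, 0.0000, 0.0000, 15),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Micro F1 from the reported micro-averaged precision and recall.
micro_f1 = f1_score(0.4456, 0.5377)

# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(row[2] for row in by_class.values()) / len(by_class)

# Weighted F1: per-class F1 weighted by support.
total_support = sum(row[3] for row in by_class.values())
weighted_f1 = sum(row[2] * row[3] for row in by_class.values()) / total_support

print(f"micro {micro_f1:.4f}  macro {macro_f1:.4f}  weighted {weighted_f1:.4f}")
```

The 15 HumanProd spans contribute an F1 of zero, which has little effect on the micro and weighted averages but pulls the macro average down to 0.3270.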
|
|