2023-10-14 19:00:14,964 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,966 Model: "SequenceTagger( (embeddings): ByT5Embeddings( (model): T5EncoderModel( (shared): Embedding(384, 1472) (encoder): T5Stack( (embed_tokens): Embedding(384, 1472) (block): ModuleList( (0): T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) (relative_attention_bias): Embedding(32, 6) ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (1-11): 11 x T5Block( (layer): ModuleList( (0): T5LayerSelfAttention( (SelfAttention): T5Attention( (q): Linear(in_features=1472, out_features=384, bias=False) (k): Linear(in_features=1472, out_features=384, bias=False) (v): Linear(in_features=1472, out_features=384, bias=False) (o): Linear(in_features=384, out_features=1472, bias=False) ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): T5LayerFF( (DenseReluDense): T5DenseGatedActDense( (wi_0): Linear(in_features=1472, out_features=3584, bias=False) (wi_1): Linear(in_features=1472, out_features=3584, bias=False) (wo): Linear(in_features=3584, out_features=1472, 
bias=False) (dropout): Dropout(p=0.1, inplace=False) (act): NewGELUActivation() ) (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (locked_dropout): LockedDropout(p=0.5) (linear): Linear(in_features=1472, out_features=21, bias=True) (loss_function): CrossEntropyLoss() )" 2023-10-14 19:00:14,966 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,966 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator 2023-10-14 19:00:14,966 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,966 Train: 3575 sentences 2023-10-14 19:00:14,966 (train_with_dev=False, train_with_test=False) 2023-10-14 19:00:14,966 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,966 Training Params: 2023-10-14 19:00:14,966 - learning_rate: "0.00016" 2023-10-14 19:00:14,966 - mini_batch_size: "8" 2023-10-14 19:00:14,966 - max_epochs: "10" 2023-10-14 19:00:14,966 - shuffle: "True" 2023-10-14 19:00:14,966 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,966 Plugins: 2023-10-14 19:00:14,966 - TensorboardLogger 2023-10-14 19:00:14,966 - LinearScheduler | warmup_fraction: '0.1' 2023-10-14 19:00:14,966 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,966 Final evaluation on model from best epoch (best-model.pt) 2023-10-14 19:00:14,966 - metric: "('micro avg', 'f1-score')" 2023-10-14 19:00:14,966 
---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,966 Computation: 2023-10-14 19:00:14,966 - compute on device: cuda:0 2023-10-14 19:00:14,966 - embedding storage: none 2023-10-14 19:00:14,966 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,966 Model training base path: "hmbench-hipe2020/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-1" 2023-10-14 19:00:14,967 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,967 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:00:14,967 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-14 19:00:30,898 epoch 1 - iter 44/447 - loss 3.04933307 - time (sec): 15.93 - samples/sec: 576.13 - lr: 0.000015 - momentum: 0.000000 2023-10-14 19:00:45,957 epoch 1 - iter 88/447 - loss 3.03049307 - time (sec): 30.99 - samples/sec: 557.03 - lr: 0.000031 - momentum: 0.000000 2023-10-14 19:01:01,268 epoch 1 - iter 132/447 - loss 2.97087840 - time (sec): 46.30 - samples/sec: 550.45 - lr: 0.000047 - momentum: 0.000000 2023-10-14 19:01:16,602 epoch 1 - iter 176/447 - loss 2.83649791 - time (sec): 61.63 - samples/sec: 550.28 - lr: 0.000063 - momentum: 0.000000 2023-10-14 19:01:31,333 epoch 1 - iter 220/447 - loss 2.68309761 - time (sec): 76.37 - samples/sec: 542.95 - lr: 0.000078 - momentum: 0.000000 2023-10-14 19:01:46,512 epoch 1 - iter 264/447 - loss 2.50013370 - time (sec): 91.54 - samples/sec: 540.63 - lr: 0.000094 - momentum: 0.000000 2023-10-14 19:02:02,652 epoch 1 - iter 308/447 - loss 2.28314799 - time (sec): 107.68 - samples/sec: 545.51 - lr: 0.000110 - momentum: 0.000000 2023-10-14 19:02:18,362 epoch 1 - iter 352/447 - loss 2.09975990 - time (sec): 123.39 
- samples/sec: 544.15 - lr: 0.000126 - momentum: 0.000000 2023-10-14 19:02:36,146 epoch 1 - iter 396/447 - loss 1.89626135 - time (sec): 141.18 - samples/sec: 548.51 - lr: 0.000141 - momentum: 0.000000 2023-10-14 19:02:51,141 epoch 1 - iter 440/447 - loss 1.76849735 - time (sec): 156.17 - samples/sec: 545.52 - lr: 0.000157 - momentum: 0.000000 2023-10-14 19:02:53,538 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:02:53,538 EPOCH 1 done: loss 1.7498 - lr: 0.000157 2023-10-14 19:03:16,193 DEV : loss 0.46884480118751526 - f1-score (micro avg) 0.0 2023-10-14 19:03:16,219 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:03:31,567 epoch 2 - iter 44/447 - loss 0.50554270 - time (sec): 15.35 - samples/sec: 551.38 - lr: 0.000158 - momentum: 0.000000 2023-10-14 19:03:46,868 epoch 2 - iter 88/447 - loss 0.47764680 - time (sec): 30.65 - samples/sec: 549.37 - lr: 0.000157 - momentum: 0.000000 2023-10-14 19:04:02,679 epoch 2 - iter 132/447 - loss 0.43925964 - time (sec): 46.46 - samples/sec: 564.23 - lr: 0.000155 - momentum: 0.000000 2023-10-14 19:04:19,791 epoch 2 - iter 176/447 - loss 0.41230032 - time (sec): 63.57 - samples/sec: 560.04 - lr: 0.000153 - momentum: 0.000000 2023-10-14 19:04:35,331 epoch 2 - iter 220/447 - loss 0.39359034 - time (sec): 79.11 - samples/sec: 558.93 - lr: 0.000151 - momentum: 0.000000 2023-10-14 19:04:51,169 epoch 2 - iter 264/447 - loss 0.37438209 - time (sec): 94.95 - samples/sec: 558.11 - lr: 0.000150 - momentum: 0.000000 2023-10-14 19:05:06,366 epoch 2 - iter 308/447 - loss 0.37309143 - time (sec): 110.15 - samples/sec: 553.18 - lr: 0.000148 - momentum: 0.000000 2023-10-14 19:05:21,885 epoch 2 - iter 352/447 - loss 0.36297809 - time (sec): 125.66 - samples/sec: 552.95 - lr: 0.000146 - momentum: 0.000000 2023-10-14 19:05:37,947 epoch 2 - iter 396/447 - loss 0.35386572 - time (sec): 141.73 - samples/sec: 549.54 - 
lr: 0.000144 - momentum: 0.000000 2023-10-14 19:05:52,857 epoch 2 - iter 440/447 - loss 0.34628553 - time (sec): 156.64 - samples/sec: 545.93 - lr: 0.000143 - momentum: 0.000000 2023-10-14 19:05:55,141 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:05:55,142 EPOCH 2 done: loss 0.3467 - lr: 0.000143 2023-10-14 19:06:20,103 DEV : loss 0.24183328449726105 - f1-score (micro avg) 0.4694 2023-10-14 19:06:20,129 saving best model 2023-10-14 19:06:20,733 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:06:36,008 epoch 3 - iter 44/447 - loss 0.28141804 - time (sec): 15.27 - samples/sec: 538.43 - lr: 0.000141 - momentum: 0.000000 2023-10-14 19:06:51,185 epoch 3 - iter 88/447 - loss 0.24664339 - time (sec): 30.45 - samples/sec: 536.51 - lr: 0.000139 - momentum: 0.000000 2023-10-14 19:07:07,008 epoch 3 - iter 132/447 - loss 0.23953289 - time (sec): 46.27 - samples/sec: 538.68 - lr: 0.000137 - momentum: 0.000000 2023-10-14 19:07:22,222 epoch 3 - iter 176/447 - loss 0.23619688 - time (sec): 61.49 - samples/sec: 540.69 - lr: 0.000135 - momentum: 0.000000 2023-10-14 19:07:39,636 epoch 3 - iter 220/447 - loss 0.22688332 - time (sec): 78.90 - samples/sec: 549.23 - lr: 0.000134 - momentum: 0.000000 2023-10-14 19:07:54,915 epoch 3 - iter 264/447 - loss 0.22266002 - time (sec): 94.18 - samples/sec: 546.75 - lr: 0.000132 - momentum: 0.000000 2023-10-14 19:08:10,027 epoch 3 - iter 308/447 - loss 0.21742813 - time (sec): 109.29 - samples/sec: 546.15 - lr: 0.000130 - momentum: 0.000000 2023-10-14 19:08:24,715 epoch 3 - iter 352/447 - loss 0.21064762 - time (sec): 123.98 - samples/sec: 544.71 - lr: 0.000128 - momentum: 0.000000 2023-10-14 19:08:40,176 epoch 3 - iter 396/447 - loss 0.20461919 - time (sec): 139.44 - samples/sec: 547.69 - lr: 0.000127 - momentum: 0.000000 2023-10-14 19:08:55,401 epoch 3 - iter 440/447 - loss 0.19928171 - time (sec): 154.67 - 
samples/sec: 549.82 - lr: 0.000125 - momentum: 0.000000 2023-10-14 19:08:57,869 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:08:57,870 EPOCH 3 done: loss 0.1981 - lr: 0.000125 2023-10-14 19:09:22,759 DEV : loss 0.17111451923847198 - f1-score (micro avg) 0.6802 2023-10-14 19:09:22,785 saving best model 2023-10-14 19:09:26,745 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:09:41,997 epoch 4 - iter 44/447 - loss 0.14979506 - time (sec): 15.25 - samples/sec: 546.16 - lr: 0.000123 - momentum: 0.000000 2023-10-14 19:09:57,020 epoch 4 - iter 88/447 - loss 0.14345141 - time (sec): 30.27 - samples/sec: 530.81 - lr: 0.000121 - momentum: 0.000000 2023-10-14 19:10:12,122 epoch 4 - iter 132/447 - loss 0.13772700 - time (sec): 45.37 - samples/sec: 530.14 - lr: 0.000119 - momentum: 0.000000 2023-10-14 19:10:27,565 epoch 4 - iter 176/447 - loss 0.13768946 - time (sec): 60.82 - samples/sec: 529.88 - lr: 0.000118 - momentum: 0.000000 2023-10-14 19:10:42,615 epoch 4 - iter 220/447 - loss 0.13136140 - time (sec): 75.87 - samples/sec: 529.21 - lr: 0.000116 - momentum: 0.000000 2023-10-14 19:10:58,702 epoch 4 - iter 264/447 - loss 0.12547106 - time (sec): 91.95 - samples/sec: 534.89 - lr: 0.000114 - momentum: 0.000000 2023-10-14 19:11:13,915 epoch 4 - iter 308/447 - loss 0.11893535 - time (sec): 107.17 - samples/sec: 533.90 - lr: 0.000112 - momentum: 0.000000 2023-10-14 19:11:29,205 epoch 4 - iter 352/447 - loss 0.11655379 - time (sec): 122.46 - samples/sec: 533.75 - lr: 0.000111 - momentum: 0.000000 2023-10-14 19:11:46,858 epoch 4 - iter 396/447 - loss 0.11527255 - time (sec): 140.11 - samples/sec: 537.46 - lr: 0.000109 - momentum: 0.000000 2023-10-14 19:12:03,500 epoch 4 - iter 440/447 - loss 0.11047397 - time (sec): 156.75 - samples/sec: 541.22 - lr: 0.000107 - momentum: 0.000000 2023-10-14 19:12:06,159 
---------------------------------------------------------------------------------------------------- 2023-10-14 19:12:06,159 EPOCH 4 done: loss 0.1091 - lr: 0.000107 2023-10-14 19:12:31,058 DEV : loss 0.15927954018115997 - f1-score (micro avg) 0.7318 2023-10-14 19:12:31,084 saving best model 2023-10-14 19:12:35,558 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:12:50,508 epoch 5 - iter 44/447 - loss 0.05896953 - time (sec): 14.95 - samples/sec: 512.04 - lr: 0.000105 - momentum: 0.000000 2023-10-14 19:13:05,907 epoch 5 - iter 88/447 - loss 0.05595016 - time (sec): 30.35 - samples/sec: 527.00 - lr: 0.000103 - momentum: 0.000000 2023-10-14 19:13:21,775 epoch 5 - iter 132/447 - loss 0.05463563 - time (sec): 46.22 - samples/sec: 538.24 - lr: 0.000102 - momentum: 0.000000 2023-10-14 19:13:37,097 epoch 5 - iter 176/447 - loss 0.06369098 - time (sec): 61.54 - samples/sec: 538.54 - lr: 0.000100 - momentum: 0.000000 2023-10-14 19:13:52,538 epoch 5 - iter 220/447 - loss 0.06209360 - time (sec): 76.98 - samples/sec: 539.52 - lr: 0.000098 - momentum: 0.000000 2023-10-14 19:14:10,442 epoch 5 - iter 264/447 - loss 0.06529404 - time (sec): 94.88 - samples/sec: 538.92 - lr: 0.000096 - momentum: 0.000000 2023-10-14 19:14:25,491 epoch 5 - iter 308/447 - loss 0.06511075 - time (sec): 109.93 - samples/sec: 537.52 - lr: 0.000095 - momentum: 0.000000 2023-10-14 19:14:40,822 epoch 5 - iter 352/447 - loss 0.06432897 - time (sec): 125.26 - samples/sec: 539.81 - lr: 0.000093 - momentum: 0.000000 2023-10-14 19:14:56,567 epoch 5 - iter 396/447 - loss 0.06384615 - time (sec): 141.01 - samples/sec: 543.02 - lr: 0.000091 - momentum: 0.000000 2023-10-14 19:15:12,216 epoch 5 - iter 440/447 - loss 0.06466372 - time (sec): 156.66 - samples/sec: 543.72 - lr: 0.000089 - momentum: 0.000000 2023-10-14 19:15:14,647 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:15:14,648 
EPOCH 5 done: loss 0.0652 - lr: 0.000089 2023-10-14 19:15:39,413 DEV : loss 0.1628938466310501 - f1-score (micro avg) 0.7511 2023-10-14 19:15:39,439 saving best model 2023-10-14 19:15:43,868 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:15:59,093 epoch 6 - iter 44/447 - loss 0.03203516 - time (sec): 15.22 - samples/sec: 559.43 - lr: 0.000087 - momentum: 0.000000 2023-10-14 19:16:14,248 epoch 6 - iter 88/447 - loss 0.03647027 - time (sec): 30.38 - samples/sec: 553.80 - lr: 0.000086 - momentum: 0.000000 2023-10-14 19:16:29,750 epoch 6 - iter 132/447 - loss 0.04391932 - time (sec): 45.88 - samples/sec: 549.93 - lr: 0.000084 - momentum: 0.000000 2023-10-14 19:16:45,162 epoch 6 - iter 176/447 - loss 0.04250684 - time (sec): 61.29 - samples/sec: 550.85 - lr: 0.000082 - momentum: 0.000000 2023-10-14 19:17:00,391 epoch 6 - iter 220/447 - loss 0.04230744 - time (sec): 76.52 - samples/sec: 545.69 - lr: 0.000080 - momentum: 0.000000 2023-10-14 19:17:17,699 epoch 6 - iter 264/447 - loss 0.04276753 - time (sec): 93.83 - samples/sec: 546.15 - lr: 0.000079 - momentum: 0.000000 2023-10-14 19:17:33,820 epoch 6 - iter 308/447 - loss 0.04228643 - time (sec): 109.95 - samples/sec: 548.42 - lr: 0.000077 - momentum: 0.000000 2023-10-14 19:17:49,709 epoch 6 - iter 352/447 - loss 0.04210539 - time (sec): 125.84 - samples/sec: 545.91 - lr: 0.000075 - momentum: 0.000000 2023-10-14 19:18:04,818 epoch 6 - iter 396/447 - loss 0.04429856 - time (sec): 140.95 - samples/sec: 543.50 - lr: 0.000073 - momentum: 0.000000 2023-10-14 19:18:20,485 epoch 6 - iter 440/447 - loss 0.04396709 - time (sec): 156.61 - samples/sec: 543.95 - lr: 0.000072 - momentum: 0.000000 2023-10-14 19:18:22,901 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:18:22,901 EPOCH 6 done: loss 0.0437 - lr: 0.000072 2023-10-14 19:18:47,687 DEV : loss 0.20442622900009155 - f1-score (micro avg) 0.7406 
2023-10-14 19:18:47,714 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:19:04,763 epoch 7 - iter 44/447 - loss 0.04203418 - time (sec): 17.05 - samples/sec: 567.45 - lr: 0.000070 - momentum: 0.000000 2023-10-14 19:19:20,038 epoch 7 - iter 88/447 - loss 0.03827106 - time (sec): 32.32 - samples/sec: 567.39 - lr: 0.000068 - momentum: 0.000000 2023-10-14 19:19:34,797 epoch 7 - iter 132/447 - loss 0.04291611 - time (sec): 47.08 - samples/sec: 555.88 - lr: 0.000066 - momentum: 0.000000 2023-10-14 19:19:49,756 epoch 7 - iter 176/447 - loss 0.04006105 - time (sec): 62.04 - samples/sec: 555.08 - lr: 0.000064 - momentum: 0.000000 2023-10-14 19:20:05,094 epoch 7 - iter 220/447 - loss 0.03598479 - time (sec): 77.38 - samples/sec: 557.10 - lr: 0.000063 - momentum: 0.000000 2023-10-14 19:20:21,135 epoch 7 - iter 264/447 - loss 0.03417475 - time (sec): 93.42 - samples/sec: 556.20 - lr: 0.000061 - momentum: 0.000000 2023-10-14 19:20:36,240 epoch 7 - iter 308/447 - loss 0.03436650 - time (sec): 108.53 - samples/sec: 555.56 - lr: 0.000059 - momentum: 0.000000 2023-10-14 19:20:51,080 epoch 7 - iter 352/447 - loss 0.03224839 - time (sec): 123.37 - samples/sec: 555.02 - lr: 0.000057 - momentum: 0.000000 2023-10-14 19:21:06,308 epoch 7 - iter 396/447 - loss 0.03196958 - time (sec): 138.59 - samples/sec: 556.57 - lr: 0.000056 - momentum: 0.000000 2023-10-14 19:21:21,295 epoch 7 - iter 440/447 - loss 0.03053808 - time (sec): 153.58 - samples/sec: 555.18 - lr: 0.000054 - momentum: 0.000000 2023-10-14 19:21:23,626 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:21:23,627 EPOCH 7 done: loss 0.0304 - lr: 0.000054 2023-10-14 19:21:48,360 DEV : loss 0.20464175939559937 - f1-score (micro avg) 0.7543 2023-10-14 19:21:48,387 saving best model 2023-10-14 19:21:52,937 ---------------------------------------------------------------------------------------------------- 
2023-10-14 19:22:08,136 epoch 8 - iter 44/447 - loss 0.02645942 - time (sec): 15.20 - samples/sec: 538.20 - lr: 0.000052 - momentum: 0.000000 2023-10-14 19:22:23,894 epoch 8 - iter 88/447 - loss 0.03397824 - time (sec): 30.96 - samples/sec: 543.04 - lr: 0.000050 - momentum: 0.000000 2023-10-14 19:22:38,936 epoch 8 - iter 132/447 - loss 0.02970954 - time (sec): 46.00 - samples/sec: 537.04 - lr: 0.000048 - momentum: 0.000000 2023-10-14 19:22:54,728 epoch 8 - iter 176/447 - loss 0.02736998 - time (sec): 61.79 - samples/sec: 549.83 - lr: 0.000047 - momentum: 0.000000 2023-10-14 19:23:10,849 epoch 8 - iter 220/447 - loss 0.02505732 - time (sec): 77.91 - samples/sec: 553.41 - lr: 0.000045 - momentum: 0.000000 2023-10-14 19:23:25,908 epoch 8 - iter 264/447 - loss 0.02422629 - time (sec): 92.97 - samples/sec: 548.88 - lr: 0.000043 - momentum: 0.000000 2023-10-14 19:23:42,873 epoch 8 - iter 308/447 - loss 0.02549261 - time (sec): 109.93 - samples/sec: 546.97 - lr: 0.000041 - momentum: 0.000000 2023-10-14 19:23:58,183 epoch 8 - iter 352/447 - loss 0.02414264 - time (sec): 125.24 - samples/sec: 545.13 - lr: 0.000040 - momentum: 0.000000 2023-10-14 19:24:13,650 epoch 8 - iter 396/447 - loss 0.02399407 - time (sec): 140.71 - samples/sec: 544.19 - lr: 0.000038 - momentum: 0.000000 2023-10-14 19:24:29,061 epoch 8 - iter 440/447 - loss 0.02256911 - time (sec): 156.12 - samples/sec: 545.74 - lr: 0.000036 - momentum: 0.000000 2023-10-14 19:24:31,511 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:24:31,511 EPOCH 8 done: loss 0.0224 - lr: 0.000036 2023-10-14 19:24:56,306 DEV : loss 0.21208828687667847 - f1-score (micro avg) 0.7607 2023-10-14 19:24:56,333 saving best model 2023-10-14 19:25:00,659 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:25:17,927 epoch 9 - iter 44/447 - loss 0.03408679 - time (sec): 17.27 - samples/sec: 563.02 - lr: 0.000034 - 
momentum: 0.000000 2023-10-14 19:25:34,781 epoch 9 - iter 88/447 - loss 0.02496360 - time (sec): 34.12 - samples/sec: 550.06 - lr: 0.000032 - momentum: 0.000000 2023-10-14 19:25:50,098 epoch 9 - iter 132/447 - loss 0.02108694 - time (sec): 49.44 - samples/sec: 553.76 - lr: 0.000031 - momentum: 0.000000 2023-10-14 19:26:05,436 epoch 9 - iter 176/447 - loss 0.01990705 - time (sec): 64.77 - samples/sec: 557.87 - lr: 0.000029 - momentum: 0.000000 2023-10-14 19:26:20,317 epoch 9 - iter 220/447 - loss 0.01878604 - time (sec): 79.66 - samples/sec: 550.37 - lr: 0.000027 - momentum: 0.000000 2023-10-14 19:26:35,552 epoch 9 - iter 264/447 - loss 0.02147376 - time (sec): 94.89 - samples/sec: 548.57 - lr: 0.000025 - momentum: 0.000000 2023-10-14 19:26:50,978 epoch 9 - iter 308/447 - loss 0.01958079 - time (sec): 110.32 - samples/sec: 542.18 - lr: 0.000024 - momentum: 0.000000 2023-10-14 19:27:06,264 epoch 9 - iter 352/447 - loss 0.01867083 - time (sec): 125.60 - samples/sec: 542.81 - lr: 0.000022 - momentum: 0.000000 2023-10-14 19:27:21,734 epoch 9 - iter 396/447 - loss 0.01771515 - time (sec): 141.07 - samples/sec: 544.03 - lr: 0.000020 - momentum: 0.000000 2023-10-14 19:27:37,248 epoch 9 - iter 440/447 - loss 0.01853660 - time (sec): 156.59 - samples/sec: 544.27 - lr: 0.000018 - momentum: 0.000000 2023-10-14 19:27:39,658 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:27:39,658 EPOCH 9 done: loss 0.0184 - lr: 0.000018 2023-10-14 19:28:04,802 DEV : loss 0.21300099790096283 - f1-score (micro avg) 0.7555 2023-10-14 19:28:04,828 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:28:20,506 epoch 10 - iter 44/447 - loss 0.01825573 - time (sec): 15.68 - samples/sec: 558.22 - lr: 0.000016 - momentum: 0.000000 2023-10-14 19:28:35,569 epoch 10 - iter 88/447 - loss 0.01697777 - time (sec): 30.74 - samples/sec: 534.17 - lr: 0.000015 - momentum: 0.000000 
2023-10-14 19:28:50,986 epoch 10 - iter 132/447 - loss 0.01431545 - time (sec): 46.16 - samples/sec: 534.38 - lr: 0.000013 - momentum: 0.000000 2023-10-14 19:29:06,987 epoch 10 - iter 176/447 - loss 0.01363602 - time (sec): 62.16 - samples/sec: 540.38 - lr: 0.000011 - momentum: 0.000000 2023-10-14 19:29:24,791 epoch 10 - iter 220/447 - loss 0.01703956 - time (sec): 79.96 - samples/sec: 547.33 - lr: 0.000009 - momentum: 0.000000 2023-10-14 19:29:40,217 epoch 10 - iter 264/447 - loss 0.01589241 - time (sec): 95.39 - samples/sec: 546.05 - lr: 0.000008 - momentum: 0.000000 2023-10-14 19:29:55,393 epoch 10 - iter 308/447 - loss 0.01523190 - time (sec): 110.56 - samples/sec: 542.18 - lr: 0.000006 - momentum: 0.000000 2023-10-14 19:30:10,721 epoch 10 - iter 352/447 - loss 0.01453450 - time (sec): 125.89 - samples/sec: 536.79 - lr: 0.000004 - momentum: 0.000000 2023-10-14 19:30:26,320 epoch 10 - iter 396/447 - loss 0.01404711 - time (sec): 141.49 - samples/sec: 538.43 - lr: 0.000002 - momentum: 0.000000 2023-10-14 19:30:42,363 epoch 10 - iter 440/447 - loss 0.01559727 - time (sec): 157.53 - samples/sec: 540.67 - lr: 0.000001 - momentum: 0.000000 2023-10-14 19:30:44,798 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:30:44,798 EPOCH 10 done: loss 0.0161 - lr: 0.000001 2023-10-14 19:31:09,873 DEV : loss 0.2216092348098755 - f1-score (micro avg) 0.7554 2023-10-14 19:31:10,513 ---------------------------------------------------------------------------------------------------- 2023-10-14 19:31:10,514 Loading model from best epoch ... 
2023-10-14 19:31:12,868 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
2023-10-14 19:31:34,412 Results:
- F-score (micro) 0.752
- F-score (macro) 0.6601
- Accuracy 0.6184

By class:
              precision    recall  f1-score   support

         loc     0.8473    0.8658    0.8564       596
        pers     0.6649    0.7508    0.7052       333
         org     0.5351    0.4621    0.4959       132
        prod     0.6182    0.5152    0.5620        66
        time     0.7111    0.6531    0.6809        49

   micro avg     0.7448    0.7594    0.7520      1176
   macro avg     0.6753    0.6494    0.6601      1176
weighted avg     0.7421    0.7594    0.7493      1176

2023-10-14 19:31:34,412 ----------------------------------------------------------------------------------------------------