stefan-it committed on
Commit 86e313e
1 parent: 85981dc

Upload folder using huggingface_hub

best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7848ef354938f6969ce8a40d2ce9989c53a5fd98b45e95c1937e6a9e3d83435c
+ size 870817519
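These three lines are a Git LFS pointer, not the checkpoint itself. The repository tracks only the pointer spec version, the SHA-256 digest of the real file, and its size in bytes (870,817,519 bytes, roughly 871 MB). Below is a minimal sketch for checking a locally downloaded `best-model.pt` against the pointer. It assumes you have the pointer text as a string. `parse_lfs_pointer`, `matches_pointer` and the local path are hypothetical names, not part of any library.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ('key value' per line) into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare a file's byte count and streamed SHA-256 digest with the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {pointer['algo']}")
    if path.stat().st_size != pointer["size"]:  # cheap check first
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7848ef354938f6969ce8a40d2ce9989c53a5fd98b45e95c1937e6a9e3d83435c
size 870817519
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["size"])  # 870817519

# Hypothetical local path: only checked if the checkpoint has been downloaded.
ckpt = Path("best-model.pt")
if ckpt.exists():
    print(matches_pointer(ckpt, pointer))
```

The file is hashed in 1 MiB chunks, so memory use stays flat even for a checkpoint of this size. The size comparison runs first because it is cheap and already rejects truncated downloads.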
dev.tsv ADDED
The diff for this file is too large to render.
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52d22151c41dcf7ea66baedce86256fd85edf9c02ed48f7b86e4412a6ddae54a
+ size 870817636
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 10:51:33 0.0001 0.8086 0.1372 0.3139 0.3561 0.3336 0.2009
+ 2 11:15:54 0.0001 0.1649 0.1447 0.2514 0.4356 0.3188 0.1901
+ 3 11:39:50 0.0001 0.1082 0.1828 0.3149 0.5701 0.4057 0.2555
+ 4 12:03:49 0.0001 0.0736 0.2782 0.2683 0.5133 0.3524 0.2149
+ 5 12:28:32 0.0001 0.0512 0.3334 0.2874 0.5890 0.3863 0.2405
+ 6 12:53:00 0.0001 0.0347 0.3976 0.2728 0.6117 0.3773 0.2342
+ 7 13:17:23 0.0001 0.0264 0.3980 0.2879 0.5833 0.3855 0.2402
+ 8 13:41:35 0.0000 0.0173 0.4345 0.3180 0.5890 0.4130 0.2620
+ 9 14:06:24 0.0000 0.0128 0.4871 0.2791 0.6212 0.3852 0.2398
+ 10 14:31:09 0.0000 0.0078 0.4862 0.2934 0.6117 0.3966 0.2488
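Dev loss is lowest after epoch 1 (0.1372) and rises almost monotonically from there. Dev F1, by contrast, peaks at epoch 8 (0.4130). The training log shows checkpoint selection uses micro-averaged F1, so `best-model.pt` should correspond to epoch 8. The LEARNING_RATE column is rounded to four decimals. That is why it reads 0.0000 from epoch 8 onward, even though the schedule has not yet reached zero. Below is a small sketch that reads this table and recovers the selection. It assumes the columns are whitespace-separated; the committed file is presumably tab-separated, and `str.split()` handles both. The helper names are hypothetical.

```python
# Contents of loss.tsv as shown above.
LOSS_TSV = """\
EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
1 10:51:33 0.0001 0.8086 0.1372 0.3139 0.3561 0.3336 0.2009
2 11:15:54 0.0001 0.1649 0.1447 0.2514 0.4356 0.3188 0.1901
3 11:39:50 0.0001 0.1082 0.1828 0.3149 0.5701 0.4057 0.2555
4 12:03:49 0.0001 0.0736 0.2782 0.2683 0.5133 0.3524 0.2149
5 12:28:32 0.0001 0.0512 0.3334 0.2874 0.5890 0.3863 0.2405
6 12:53:00 0.0001 0.0347 0.3976 0.2728 0.6117 0.3773 0.2342
7 13:17:23 0.0001 0.0264 0.3980 0.2879 0.5833 0.3855 0.2402
8 13:41:35 0.0000 0.0173 0.4345 0.3180 0.5890 0.4130 0.2620
9 14:06:24 0.0000 0.0128 0.4871 0.2791 0.6212 0.3852 0.2398
10 14:31:09 0.0000 0.0078 0.4862 0.2934 0.6117 0.3966 0.2488
"""


def read_loss_tsv(text: str) -> list[dict]:
    """Turn the header row plus data rows into one dict per epoch."""
    header, *rows = [line.split() for line in text.strip().splitlines()]
    return [dict(zip(header, row)) for row in rows]


def best_epoch(rows: list[dict], metric: str = "DEV_F1") -> tuple[int, float]:
    """Epoch with the highest value of `metric`."""
    best = max(rows, key=lambda r: float(r[metric]))
    return int(best["EPOCH"]), float(best[metric])


def improvement_epochs(rows: list[dict], metric: str = "DEV_F1") -> list[int]:
    """Epochs at which `metric` sets a new running maximum."""
    best, epochs = float("-inf"), []
    for r in rows:
        if float(r[metric]) > best:
            best = float(r[metric])
            epochs.append(int(r["EPOCH"]))
    return epochs


rows = read_loss_tsv(LOSS_TSV)
print(best_epoch(rows))          # (8, 0.413)
print(improvement_epochs(rows))  # [1, 3, 8]
```

`improvement_epochs` yields the same epochs (1, 3 and 8) as the "saving best model" lines in training.log.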
runs/events.out.tfevents.1697020053.6d4c7681f95b.1253.10 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:680dc1cccb8b19f5105ba872d97e8a834b8977ba42f07bc00698e3700c52d561
+ size 2923780
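The event file name appears to follow TensorBoard's writer convention, `events.out.tfevents.<unix seconds>.<hostname>.<pid>.<counter>`. This naming scheme is an assumption; the commit itself does not state it. Decoding the timestamp gives 2023-10-11 10:27:33 UTC. That is the same second training.log starts, which suggests the log timestamps are in UTC. Below is a minimal sketch; `parse_event_filename` is a hypothetical helper.

```python
from datetime import datetime, timezone

PREFIX = "events.out.tfevents."


def parse_event_filename(name: str) -> dict:
    """Split an event file name, assuming <seconds>.<hostname>.<pid>.<counter> after the prefix."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not a TensorBoard event file name: {name}")
    seconds, rest = name[len(PREFIX):].split(".", 1)
    # rsplit from the right so that dots inside a hostname would stay intact
    hostname, pid, counter = rest.rsplit(".", 2)
    return {
        "created": datetime.fromtimestamp(int(seconds), tz=timezone.utc),
        "hostname": hostname,
        "pid": int(pid),
        "counter": int(counter),
    }


info = parse_event_filename("events.out.tfevents.1697020053.6d4c7681f95b.1253.10")
print(info["created"].isoformat())    # 2023-10-11T10:27:33+00:00
print(info["hostname"], info["pid"])  # 6d4c7681f95b 1253
```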
test.tsv ADDED
The diff for this file is too large to render.
training.log ADDED
@@ -0,0 +1,261 @@
+ 2023-10-11 10:27:33,571 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:27:33,574 Model: "SequenceTagger(
+   (embeddings): ByT5Embeddings(
+     (model): T5EncoderModel(
+       (shared): Embedding(384, 1472)
+       (encoder): T5Stack(
+         (embed_tokens): Embedding(384, 1472)
+         (block): ModuleList(
+           (0): T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                   (relative_attention_bias): Embedding(32, 6)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+           (1-11): 11 x T5Block(
+             (layer): ModuleList(
+               (0): T5LayerSelfAttention(
+                 (SelfAttention): T5Attention(
+                   (q): Linear(in_features=1472, out_features=384, bias=False)
+                   (k): Linear(in_features=1472, out_features=384, bias=False)
+                   (v): Linear(in_features=1472, out_features=384, bias=False)
+                   (o): Linear(in_features=384, out_features=1472, bias=False)
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (1): T5LayerFF(
+                 (DenseReluDense): T5DenseGatedActDense(
+                   (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+                   (wo): Linear(in_features=3584, out_features=1472, bias=False)
+                   (dropout): Dropout(p=0.1, inplace=False)
+                   (act): NewGELUActivation()
+                 )
+                 (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+           )
+         )
+         (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+     )
+   )
+   (locked_dropout): LockedDropout(p=0.5)
+   (linear): Linear(in_features=1472, out_features=17, bias=True)
+   (loss_function): CrossEntropyLoss()
+ )"
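As a sanity check on the printout, the parameter count can be tallied by hand from the layer shapes. Every projection is bias-free, and each block has two RMSNorm weight vectors. Only block 0 carries the relative-attention bias table (32 buckets by 6 heads). `embed_tokens` is assumed to reuse the `shared` embedding, as is standard for T5 weight tying. The sketch below is plain arithmetic on the printed dimensions, not a call into Flair or Transformers.

```python
# Dimensions read off the module printout (ByT5 encoder + linear tagging head).
VOCAB = 384        # Embedding(384, 1472)
D_MODEL = 1472
INNER = 384        # q/k/v output width (6 heads x 64)
D_FF = 3584        # wi_0 / wi_1 output width
N_BLOCKS = 12      # block (0) plus blocks (1-11)
REL_BUCKETS, N_HEADS = 32, 6  # relative_attention_bias: Embedding(32, 6)
N_TAGS = 17        # linear: out_features=17


def encoder_params() -> int:
    attention = 3 * D_MODEL * INNER + INNER * D_MODEL   # q, k, v, o (no biases)
    feed_forward = 2 * D_MODEL * D_FF + D_FF * D_MODEL  # wi_0, wi_1, wo (no biases)
    norms = 2 * D_MODEL                                 # two RMSNorm weights per block
    total = VOCAB * D_MODEL                  # shared embedding, counted once (assumed tied)
    total += N_BLOCKS * (attention + feed_forward + norms)
    total += REL_BUCKETS * N_HEADS           # relative position bias, block 0 only
    total += D_MODEL                         # final_layer_norm
    return total


def tagger_params() -> int:
    return encoder_params() + D_MODEL * N_TAGS + N_TAGS  # linear head incl. bias


print(encoder_params())     # 217657472
print(tagger_params())      # 217682513
print(4 * tagger_params())  # 870730052 bytes if every parameter is float32
```

That comes to about 217.7M parameters, or 870,730,052 bytes at 4 bytes each. This is about 87 KB below the 870,817,519-byte `best-model.pt` listed at the top of the commit. Assuming float32 weights, the small remainder would be serialization overhead plus the metadata Flair saves alongside the weights.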
70
+ 2023-10-11 10:27:33,574 ----------------------------------------------------------------------------------------------------
71
+ 2023-10-11 10:27:33,574 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
72
+ - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
73
+ 2023-10-11 10:27:33,574 ----------------------------------------------------------------------------------------------------
74
+ 2023-10-11 10:27:33,574 Train: 20847 sentences
75
+ 2023-10-11 10:27:33,575 (train_with_dev=False, train_with_test=False)
76
+ 2023-10-11 10:27:33,575 ----------------------------------------------------------------------------------------------------
77
+ 2023-10-11 10:27:33,575 Training Params:
78
+ 2023-10-11 10:27:33,575 - learning_rate: "0.00015"
79
+ 2023-10-11 10:27:33,575 - mini_batch_size: "4"
80
+ 2023-10-11 10:27:33,575 - max_epochs: "10"
81
+ 2023-10-11 10:27:33,575 - shuffle: "True"
82
+ 2023-10-11 10:27:33,575 ----------------------------------------------------------------------------------------------------
83
+ 2023-10-11 10:27:33,575 Plugins:
84
+ 2023-10-11 10:27:33,575 - TensorboardLogger
85
+ 2023-10-11 10:27:33,575 - LinearScheduler | warmup_fraction: '0.1'
86
+ 2023-10-11 10:27:33,575 ----------------------------------------------------------------------------------------------------
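With `warmup_fraction` 0.1 and 10 epochs of ceil(20847 / 4) = 5212 iterations, warmup covers 5212 of 52,120 steps, exactly the first epoch. The learning rate therefore climbs to 0.00015 by the end of epoch 1, then decays linearly to 0 by the end of epoch 10. The sketch below uses the usual linear-warmup/linear-decay formula, the same shape as Hugging Face's `get_linear_schedule_with_warmup`. It assumes, without confirmation from the source, that Flair's LinearScheduler computes it the same way. It does reproduce the `lr:` values printed in the iteration lines of this log. `lr_at` is a hypothetical helper.

```python
import math

# Values from the corpus size, Training Params and Plugins sections of this log.
BASE_LR = 0.00015
TRAIN_SENTENCES, BATCH_SIZE, EPOCHS = 20847, 4, 10
WARMUP_FRACTION = 0.1

steps_per_epoch = math.ceil(TRAIN_SENTENCES / BATCH_SIZE)  # 5212, as in "iter .../5212"
total_steps = steps_per_epoch * EPOCHS                      # 52120
warmup_steps = int(WARMUP_FRACTION * total_steps)           # 5212, i.e. all of epoch 1


def linear_warmup_lr(step: int) -> float:
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / max(1, total_steps - warmup_steps)


def lr_at(epoch: int, iteration: int) -> float:
    """Hypothetical helper: map a log line's (epoch, iter) pair to a global step."""
    return linear_warmup_lr((epoch - 1) * steps_per_epoch + iteration)


for epoch, iteration in [(1, 521), (2, 521), (5, 521), (8, 5210)]:
    print(epoch, iteration, f"{lr_at(epoch, iteration):.6f}")
# 1 521 0.000015
# 2 521 0.000148
# 5 521 0.000098
# 8 5210 0.000033
```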
+ 2023-10-11 10:27:33,575 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-11 10:27:33,576 - metric: "('micro avg', 'f1-score')"
+ 2023-10-11 10:27:33,576 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:27:33,576 Computation:
+ 2023-10-11 10:27:33,576 - compute on device: cuda:0
+ 2023-10-11 10:27:33,576 - embedding storage: none
+ 2023-10-11 10:27:33,576 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:27:33,576 Model training base path: "hmbench-newseye/de-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
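The base path appears to encode the run configuration. `bs4`, `e10` and `lr0.00015` match the Training Params of this run. `crfFalse` matches the plain linear + CrossEntropyLoss head in the model printout. `poolingfirst` and `layers-1` presumably correspond to Flair's `subtoken_pooling="first"` and `layers="-1"` (last encoder layer only); this reading of the name is an assumption. ByT5 operates on raw UTF-8 bytes, so its 384-entry vocabulary is 256 byte values plus 3 special ids (pad, eos, unk) plus 125 sentinel ids. "First" pooling then represents each word by the hidden state of its first byte. Below is a simplified sketch of the byte ids and first-byte positions; it is not Flair's actual token-alignment code, and both helper names are hypothetical.

```python
SPECIAL_OFFSET = 3  # ids 0, 1, 2 are reserved for <pad>, </s>, <unk> in ByT5
EOS_ID = 1


def byt5_ids(text: str) -> list[int]:
    """ByT5-style ids: each UTF-8 byte value shifted by 3, then </s>."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def first_byte_positions(words: list[str]) -> list[int]:
    """Index of each word's first byte when words are joined by single spaces.

    With 'first' pooling, the hidden state at this index stands in for the word.
    """
    positions, cursor = [], 0
    for word in words:
        positions.append(cursor)
        cursor += len(word.encode("utf-8")) + 1  # word bytes plus one space byte
    return positions


words = ["Wien", "über", "München"]
print(byt5_ids("ü"))                # [198, 191, 1]
print(first_byte_positions(words))  # [0, 5, 11]
```

Umlauts such as "ü" take two bytes, so in German text byte offsets and character offsets diverge. "First" pooling picks the first of those bytes.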
+ 2023-10-11 10:27:33,576 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:27:33,576 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:27:33,576 Logging anything other than scalars to TensorBoard is currently not supported.
+ 2023-10-11 10:29:55,156 epoch 1 - iter 521/5212 - loss 2.77308848 - time (sec): 141.58 - samples/sec: 263.97 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-11 10:32:16,905 epoch 1 - iter 1042/5212 - loss 2.32230492 - time (sec): 283.33 - samples/sec: 270.92 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-11 10:34:35,422 epoch 1 - iter 1563/5212 - loss 1.84040633 - time (sec): 421.84 - samples/sec: 268.49 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-11 10:36:54,353 epoch 1 - iter 2084/5212 - loss 1.50205626 - time (sec): 560.77 - samples/sec: 266.53 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-11 10:39:16,176 epoch 1 - iter 2605/5212 - loss 1.29871005 - time (sec): 702.60 - samples/sec: 266.48 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-11 10:41:36,574 epoch 1 - iter 3126/5212 - loss 1.14819274 - time (sec): 842.99 - samples/sec: 265.07 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-11 10:43:56,922 epoch 1 - iter 3647/5212 - loss 1.03459929 - time (sec): 983.34 - samples/sec: 263.04 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-11 10:46:16,004 epoch 1 - iter 4168/5212 - loss 0.95149666 - time (sec): 1122.43 - samples/sec: 261.30 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-11 10:48:36,869 epoch 1 - iter 4689/5212 - loss 0.87440388 - time (sec): 1263.29 - samples/sec: 262.05 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-11 10:50:57,081 epoch 1 - iter 5210/5212 - loss 0.80881280 - time (sec): 1403.50 - samples/sec: 261.66 - lr: 0.000150 - momentum: 0.000000
+ 2023-10-11 10:50:57,613 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:50:57,613 EPOCH 1 done: loss 0.8086 - lr: 0.000150
+ 2023-10-11 10:51:33,753 DEV : loss 0.1371876299381256 - f1-score (micro avg) 0.3336
+ 2023-10-11 10:51:33,806 saving best model
+ 2023-10-11 10:51:34,714 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 10:53:54,909 epoch 2 - iter 521/5212 - loss 0.19618675 - time (sec): 140.19 - samples/sec: 263.48 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-11 10:56:12,183 epoch 2 - iter 1042/5212 - loss 0.18458622 - time (sec): 277.47 - samples/sec: 264.49 - lr: 0.000147 - momentum: 0.000000
+ 2023-10-11 10:58:36,985 epoch 2 - iter 1563/5212 - loss 0.18961166 - time (sec): 422.27 - samples/sec: 267.02 - lr: 0.000145 - momentum: 0.000000
+ 2023-10-11 11:00:58,974 epoch 2 - iter 2084/5212 - loss 0.18529432 - time (sec): 564.26 - samples/sec: 264.94 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-11 11:03:19,697 epoch 2 - iter 2605/5212 - loss 0.18045804 - time (sec): 704.98 - samples/sec: 261.97 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-11 11:05:42,377 epoch 2 - iter 3126/5212 - loss 0.17752518 - time (sec): 847.66 - samples/sec: 261.81 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-11 11:08:02,418 epoch 2 - iter 3647/5212 - loss 0.17641629 - time (sec): 987.70 - samples/sec: 258.68 - lr: 0.000138 - momentum: 0.000000
+ 2023-10-11 11:10:25,754 epoch 2 - iter 4168/5212 - loss 0.17189078 - time (sec): 1131.04 - samples/sec: 258.18 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-11 11:12:50,890 epoch 2 - iter 4689/5212 - loss 0.16869126 - time (sec): 1276.17 - samples/sec: 258.97 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-11 11:15:14,550 epoch 2 - iter 5210/5212 - loss 0.16487858 - time (sec): 1419.83 - samples/sec: 258.73 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-11 11:15:14,995 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 11:15:14,995 EPOCH 2 done: loss 0.1649 - lr: 0.000133
+ 2023-10-11 11:15:54,268 DEV : loss 0.14473062753677368 - f1-score (micro avg) 0.3188
+ 2023-10-11 11:15:54,319 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 11:18:13,777 epoch 3 - iter 521/5212 - loss 0.10458077 - time (sec): 139.46 - samples/sec: 250.74 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-11 11:20:33,676 epoch 3 - iter 1042/5212 - loss 0.10612993 - time (sec): 279.36 - samples/sec: 254.51 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-11 11:22:53,147 epoch 3 - iter 1563/5212 - loss 0.10354381 - time (sec): 418.83 - samples/sec: 255.19 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-11 11:25:16,345 epoch 3 - iter 2084/5212 - loss 0.10944179 - time (sec): 562.02 - samples/sec: 259.35 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-11 11:27:35,775 epoch 3 - iter 2605/5212 - loss 0.11310926 - time (sec): 701.45 - samples/sec: 262.27 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-11 11:29:54,384 epoch 3 - iter 3126/5212 - loss 0.10867516 - time (sec): 840.06 - samples/sec: 261.83 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-11 11:32:12,976 epoch 3 - iter 3647/5212 - loss 0.10736809 - time (sec): 978.66 - samples/sec: 261.05 - lr: 0.000122 - momentum: 0.000000
+ 2023-10-11 11:34:32,157 epoch 3 - iter 4168/5212 - loss 0.10821714 - time (sec): 1117.84 - samples/sec: 261.76 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-11 11:36:50,238 epoch 3 - iter 4689/5212 - loss 0.10784416 - time (sec): 1255.92 - samples/sec: 261.68 - lr: 0.000118 - momentum: 0.000000
+ 2023-10-11 11:39:11,437 epoch 3 - iter 5210/5212 - loss 0.10805328 - time (sec): 1397.12 - samples/sec: 262.92 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-11 11:39:11,884 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 11:39:11,885 EPOCH 3 done: loss 0.1082 - lr: 0.000117
+ 2023-10-11 11:39:50,899 DEV : loss 0.18280969560146332 - f1-score (micro avg) 0.4057
+ 2023-10-11 11:39:50,958 saving best model
+ 2023-10-11 11:39:53,591 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 11:42:14,816 epoch 4 - iter 521/5212 - loss 0.07278263 - time (sec): 141.22 - samples/sec: 250.15 - lr: 0.000115 - momentum: 0.000000
+ 2023-10-11 11:44:36,537 epoch 4 - iter 1042/5212 - loss 0.07239825 - time (sec): 282.94 - samples/sec: 254.20 - lr: 0.000113 - momentum: 0.000000
+ 2023-10-11 11:46:57,338 epoch 4 - iter 1563/5212 - loss 0.07231137 - time (sec): 423.74 - samples/sec: 258.25 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-11 11:49:15,233 epoch 4 - iter 2084/5212 - loss 0.07343325 - time (sec): 561.64 - samples/sec: 257.71 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-11 11:51:38,449 epoch 4 - iter 2605/5212 - loss 0.07181772 - time (sec): 704.85 - samples/sec: 262.36 - lr: 0.000108 - momentum: 0.000000
+ 2023-10-11 11:53:56,748 epoch 4 - iter 3126/5212 - loss 0.07331153 - time (sec): 843.15 - samples/sec: 261.09 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-11 11:56:18,345 epoch 4 - iter 3647/5212 - loss 0.07316098 - time (sec): 984.75 - samples/sec: 261.72 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-11 11:58:40,109 epoch 4 - iter 4168/5212 - loss 0.07306211 - time (sec): 1126.51 - samples/sec: 264.45 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-11 12:00:54,299 epoch 4 - iter 4689/5212 - loss 0.07216413 - time (sec): 1260.70 - samples/sec: 263.41 - lr: 0.000102 - momentum: 0.000000
+ 2023-10-11 12:03:10,241 epoch 4 - iter 5210/5212 - loss 0.07359050 - time (sec): 1396.65 - samples/sec: 263.05 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-11 12:03:10,639 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 12:03:10,640 EPOCH 4 done: loss 0.0736 - lr: 0.000100
+ 2023-10-11 12:03:49,861 DEV : loss 0.27816441655158997 - f1-score (micro avg) 0.3524
+ 2023-10-11 12:03:49,914 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 12:06:07,583 epoch 5 - iter 521/5212 - loss 0.04138265 - time (sec): 137.67 - samples/sec: 260.75 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-11 12:08:31,272 epoch 5 - iter 1042/5212 - loss 0.04749933 - time (sec): 281.36 - samples/sec: 261.65 - lr: 0.000097 - momentum: 0.000000
+ 2023-10-11 12:10:54,820 epoch 5 - iter 1563/5212 - loss 0.04798952 - time (sec): 424.90 - samples/sec: 257.07 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-11 12:13:20,187 epoch 5 - iter 2084/5212 - loss 0.05106117 - time (sec): 570.27 - samples/sec: 255.58 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-11 12:15:47,123 epoch 5 - iter 2605/5212 - loss 0.05002478 - time (sec): 717.21 - samples/sec: 256.72 - lr: 0.000092 - momentum: 0.000000
+ 2023-10-11 12:18:10,619 epoch 5 - iter 3126/5212 - loss 0.05055487 - time (sec): 860.70 - samples/sec: 254.97 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-11 12:20:35,848 epoch 5 - iter 3647/5212 - loss 0.05188152 - time (sec): 1005.93 - samples/sec: 254.69 - lr: 0.000088 - momentum: 0.000000
+ 2023-10-11 12:22:59,869 epoch 5 - iter 4168/5212 - loss 0.05088261 - time (sec): 1149.95 - samples/sec: 253.41 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-11 12:25:25,685 epoch 5 - iter 4689/5212 - loss 0.05025009 - time (sec): 1295.77 - samples/sec: 253.90 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-11 12:27:51,044 epoch 5 - iter 5210/5212 - loss 0.05118220 - time (sec): 1441.13 - samples/sec: 254.89 - lr: 0.000083 - momentum: 0.000000
+ 2023-10-11 12:27:51,510 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 12:27:51,511 EPOCH 5 done: loss 0.0512 - lr: 0.000083
+ 2023-10-11 12:28:32,525 DEV : loss 0.3333892226219177 - f1-score (micro avg) 0.3863
+ 2023-10-11 12:28:32,579 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 12:30:52,575 epoch 6 - iter 521/5212 - loss 0.03246002 - time (sec): 139.99 - samples/sec: 241.08 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-11 12:33:13,663 epoch 6 - iter 1042/5212 - loss 0.03096266 - time (sec): 281.08 - samples/sec: 243.35 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-11 12:35:38,110 epoch 6 - iter 1563/5212 - loss 0.03241169 - time (sec): 425.53 - samples/sec: 245.97 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-11 12:38:01,041 epoch 6 - iter 2084/5212 - loss 0.03229808 - time (sec): 568.46 - samples/sec: 247.46 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-11 12:40:25,250 epoch 6 - iter 2605/5212 - loss 0.03266383 - time (sec): 712.67 - samples/sec: 249.02 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-11 12:42:46,535 epoch 6 - iter 3126/5212 - loss 0.03217009 - time (sec): 853.95 - samples/sec: 249.45 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-11 12:45:11,195 epoch 6 - iter 3647/5212 - loss 0.03269803 - time (sec): 998.61 - samples/sec: 252.89 - lr: 0.000072 - momentum: 0.000000
+ 2023-10-11 12:47:35,975 epoch 6 - iter 4168/5212 - loss 0.03242644 - time (sec): 1143.39 - samples/sec: 253.88 - lr: 0.000070 - momentum: 0.000000
+ 2023-10-11 12:50:00,875 epoch 6 - iter 4689/5212 - loss 0.03391785 - time (sec): 1288.29 - samples/sec: 255.87 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-11 12:52:21,466 epoch 6 - iter 5210/5212 - loss 0.03471456 - time (sec): 1428.89 - samples/sec: 256.95 - lr: 0.000067 - momentum: 0.000000
+ 2023-10-11 12:52:22,095 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 12:52:22,095 EPOCH 6 done: loss 0.0347 - lr: 0.000067
+ 2023-10-11 12:53:00,341 DEV : loss 0.397626131772995 - f1-score (micro avg) 0.3773
+ 2023-10-11 12:53:00,393 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 12:55:23,221 epoch 7 - iter 521/5212 - loss 0.02766582 - time (sec): 142.83 - samples/sec: 280.80 - lr: 0.000065 - momentum: 0.000000
+ 2023-10-11 12:57:43,563 epoch 7 - iter 1042/5212 - loss 0.02549282 - time (sec): 283.17 - samples/sec: 268.30 - lr: 0.000063 - momentum: 0.000000
+ 2023-10-11 13:00:06,987 epoch 7 - iter 1563/5212 - loss 0.02298030 - time (sec): 426.59 - samples/sec: 265.97 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-11 13:02:33,264 epoch 7 - iter 2084/5212 - loss 0.02542458 - time (sec): 572.87 - samples/sec: 265.28 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-11 13:04:53,874 epoch 7 - iter 2605/5212 - loss 0.02658181 - time (sec): 713.48 - samples/sec: 260.46 - lr: 0.000058 - momentum: 0.000000
+ 2023-10-11 13:07:19,303 epoch 7 - iter 3126/5212 - loss 0.02678770 - time (sec): 858.91 - samples/sec: 260.17 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-11 13:09:41,217 epoch 7 - iter 3647/5212 - loss 0.02682370 - time (sec): 1000.82 - samples/sec: 259.27 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-11 13:12:01,858 epoch 7 - iter 4168/5212 - loss 0.02663508 - time (sec): 1141.46 - samples/sec: 258.26 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-11 13:14:23,833 epoch 7 - iter 4689/5212 - loss 0.02646033 - time (sec): 1283.44 - samples/sec: 258.05 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-11 13:16:45,213 epoch 7 - iter 5210/5212 - loss 0.02638106 - time (sec): 1424.82 - samples/sec: 257.84 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-11 13:16:45,633 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 13:16:45,633 EPOCH 7 done: loss 0.0264 - lr: 0.000050
+ 2023-10-11 13:17:23,898 DEV : loss 0.39803504943847656 - f1-score (micro avg) 0.3855
+ 2023-10-11 13:17:23,949 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 13:19:45,732 epoch 8 - iter 521/5212 - loss 0.01602735 - time (sec): 141.78 - samples/sec: 259.43 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-11 13:22:08,337 epoch 8 - iter 1042/5212 - loss 0.01754525 - time (sec): 284.39 - samples/sec: 260.76 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-11 13:24:29,046 epoch 8 - iter 1563/5212 - loss 0.01837014 - time (sec): 425.09 - samples/sec: 259.41 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-11 13:26:54,017 epoch 8 - iter 2084/5212 - loss 0.01699882 - time (sec): 570.07 - samples/sec: 258.05 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-11 13:29:16,314 epoch 8 - iter 2605/5212 - loss 0.01790932 - time (sec): 712.36 - samples/sec: 259.27 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-11 13:31:37,281 epoch 8 - iter 3126/5212 - loss 0.01863312 - time (sec): 853.33 - samples/sec: 259.21 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-11 13:33:56,013 epoch 8 - iter 3647/5212 - loss 0.01818979 - time (sec): 992.06 - samples/sec: 259.09 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-11 13:36:17,040 epoch 8 - iter 4168/5212 - loss 0.01807363 - time (sec): 1133.09 - samples/sec: 259.49 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-11 13:38:38,320 epoch 8 - iter 4689/5212 - loss 0.01736008 - time (sec): 1274.37 - samples/sec: 260.41 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-11 13:40:56,604 epoch 8 - iter 5210/5212 - loss 0.01727055 - time (sec): 1412.65 - samples/sec: 259.87 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-11 13:40:57,310 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 13:40:57,311 EPOCH 8 done: loss 0.0173 - lr: 0.000033
+ 2023-10-11 13:41:35,861 DEV : loss 0.434541255235672 - f1-score (micro avg) 0.413
+ 2023-10-11 13:41:35,914 saving best model
+ 2023-10-11 13:41:38,523 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 13:44:00,885 epoch 9 - iter 521/5212 - loss 0.01391910 - time (sec): 142.36 - samples/sec: 269.72 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-11 13:46:21,602 epoch 9 - iter 1042/5212 - loss 0.01227543 - time (sec): 283.07 - samples/sec: 268.26 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-11 13:48:44,428 epoch 9 - iter 1563/5212 - loss 0.01127380 - time (sec): 425.90 - samples/sec: 261.83 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-11 13:51:05,408 epoch 9 - iter 2084/5212 - loss 0.01170911 - time (sec): 566.88 - samples/sec: 257.72 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-11 13:53:27,927 epoch 9 - iter 2605/5212 - loss 0.01188654 - time (sec): 709.40 - samples/sec: 258.85 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-11 13:55:49,808 epoch 9 - iter 3126/5212 - loss 0.01195441 - time (sec): 851.28 - samples/sec: 257.46 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-11 13:58:19,120 epoch 9 - iter 3647/5212 - loss 0.01196011 - time (sec): 1000.59 - samples/sec: 255.97 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-11 14:00:47,719 epoch 9 - iter 4168/5212 - loss 0.01172818 - time (sec): 1149.19 - samples/sec: 255.46 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-11 14:03:20,778 epoch 9 - iter 4689/5212 - loss 0.01252167 - time (sec): 1302.25 - samples/sec: 254.08 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-11 14:05:43,342 epoch 9 - iter 5210/5212 - loss 0.01279053 - time (sec): 1444.81 - samples/sec: 254.13 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-11 14:05:43,953 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 14:05:43,953 EPOCH 9 done: loss 0.0128 - lr: 0.000017
+ 2023-10-11 14:06:24,418 DEV : loss 0.48711156845092773 - f1-score (micro avg) 0.3852
+ 2023-10-11 14:06:24,474 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 14:08:53,718 epoch 10 - iter 521/5212 - loss 0.00536186 - time (sec): 149.24 - samples/sec: 244.44 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-11 14:11:12,907 epoch 10 - iter 1042/5212 - loss 0.00727275 - time (sec): 288.43 - samples/sec: 249.23 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-11 14:13:34,227 epoch 10 - iter 1563/5212 - loss 0.00704063 - time (sec): 429.75 - samples/sec: 252.94 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-11 14:15:56,744 epoch 10 - iter 2084/5212 - loss 0.00715484 - time (sec): 572.27 - samples/sec: 251.17 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-11 14:18:24,269 epoch 10 - iter 2605/5212 - loss 0.00703148 - time (sec): 719.79 - samples/sec: 254.39 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-11 14:20:45,693 epoch 10 - iter 3126/5212 - loss 0.00712965 - time (sec): 861.22 - samples/sec: 254.36 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-11 14:23:09,963 epoch 10 - iter 3647/5212 - loss 0.00771973 - time (sec): 1005.49 - samples/sec: 254.06 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-11 14:25:37,329 epoch 10 - iter 4168/5212 - loss 0.00782038 - time (sec): 1152.85 - samples/sec: 251.90 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-11 14:28:04,663 epoch 10 - iter 4689/5212 - loss 0.00789966 - time (sec): 1300.19 - samples/sec: 253.75 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-11 14:30:30,685 epoch 10 - iter 5210/5212 - loss 0.00775810 - time (sec): 1446.21 - samples/sec: 254.04 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-11 14:30:31,090 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 14:30:31,090 EPOCH 10 done: loss 0.0078 - lr: 0.000000
+ 2023-10-11 14:31:09,345 DEV : loss 0.4862366318702698 - f1-score (micro avg) 0.3966
+ 2023-10-11 14:31:10,297 ----------------------------------------------------------------------------------------------------
+ 2023-10-11 14:31:10,299 Loading model from best epoch ...
+ 2023-10-11 14:31:14,039 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
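The 17-tag dictionary consists of the outside tag O plus the BIOES prefixes (S = single, B = begin, I = inside, E = end) for each of the four entity types, listed type by type. The results report scores entity types rather than individual tags, so predicted tag sequences must first be decoded into spans. Below is a minimal strict decoder. It assumes ill-formed sequences are simply dropped; Flair's own decoding may handle them differently. `bioes_to_spans` is a hypothetical name.

```python
ENTITY_TYPES = ["LOC", "PER", "ORG", "HumanProd"]
# O first, then S/B/E/I for each type, in the same order as the printed dictionary.
TAGS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in "SBEI"]


def bioes_to_spans(tags: list[str]) -> list[tuple[int, int, str]]:
    """Decode BIOES tags into (start, end_exclusive, type) spans.

    Strict variant: a span opened by B- is kept only if a matching E- closes it;
    anything else (type switch, stray I-/E-, span left open at the end) is dropped.
    """
    spans, start, current = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, etype))
            start, current = None, None
        elif prefix == "B":
            start, current = i, etype
        elif prefix == "I" and current == etype:
            continue
        elif prefix == "E" and current == etype:
            spans.append((start, i + 1, etype))
            start, current = None, None
        else:  # "O" or an I-/E- tag that does not continue the open span
            start, current = None, None
    return spans


print(len(TAGS))  # 17
print(bioes_to_spans(["B-PER", "I-PER", "E-PER", "O", "S-LOC", "B-ORG", "E-ORG"]))
# [(0, 3, 'PER'), (4, 5, 'LOC'), (5, 7, 'ORG')]
```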
+ 2023-10-11 14:32:55,343
+ Results:
+ - F-score (micro) 0.4259
+ - F-score (macro) 0.2963
+ - Accuracy 0.2744
+
+ By class:
+               precision    recall  f1-score   support
+
+          LOC     0.4982    0.4596    0.4781      1214
+          PER     0.3957    0.4369    0.4153       808
+          ORG     0.2918    0.2918    0.2918       353
+    HumanProd     0.0000    0.0000    0.0000        15
+
+    micro avg     0.4275    0.4243    0.4259      2390
+    macro avg     0.2964    0.2971    0.2963      2390
+ weighted avg     0.4300    0.4243    0.4264      2390
+
+ 2023-10-11 14:32:55,344 ----------------------------------------------------------------------------------------------------
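The summary rows of the report follow from the per-class rows. Micro F1 is the harmonic mean of micro precision and recall. Macro F1 is the unweighted mean of the four class F1 scores. Weighted F1 weights each class by its support. The zero score on HumanProd, which has only 15 test mentions, pulls macro F1 (0.2963) far below micro F1 (0.4259). Below is a short check that uses only the numbers printed in the report; the helper names are illustrative.

```python
# Per-class (precision, recall, f1, support) copied from the "By class" report.
REPORT = {
    "LOC": (0.4982, 0.4596, 0.4781, 1214),
    "PER": (0.3957, 0.4369, 0.4153, 808),
    "ORG": (0.2918, 0.2918, 0.2918, 353),
    "HumanProd": (0.0, 0.0, 0.0, 15),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(report: dict) -> float:
    """Unweighted mean of per-class F1."""
    return sum(f for _, _, f, _ in report.values()) / len(report)


def weighted_f1(report: dict) -> float:
    """Per-class F1 weighted by support."""
    total = sum(s for *_, s in report.values())
    return sum(f * s for _, _, f, s in report.values()) / total


print(round(f1(0.4275, 0.4243), 4))   # 0.4259 (micro F1 from micro P and R)
print(round(macro_f1(REPORT), 4))     # 0.2963
print(round(weighted_f1(REPORT), 4))  # 0.4264
```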