2023-10-11 11:23:41,250 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,253 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 11:23:41,253 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,253 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 11:23:41,253 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,253 Train:  7142 sentences
2023-10-11 11:23:41,253         (train_with_dev=False, train_with_test=False)
2023-10-11 11:23:41,253 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,253 Training Params:
2023-10-11 11:23:41,253  - learning_rate: "0.00015" 
2023-10-11 11:23:41,253  - mini_batch_size: "4"
2023-10-11 11:23:41,254  - max_epochs: "10"
2023-10-11 11:23:41,254  - shuffle: "True"
2023-10-11 11:23:41,254 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,254 Plugins:
2023-10-11 11:23:41,254  - TensorboardLogger
2023-10-11 11:23:41,254  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 11:23:41,254 ----------------------------------------------------------------------------------------------------
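Note on the learning-rate column: the `lr` values logged in the iteration lines of this run are consistent with a linear warmup over the first 10% of all steps (1786 iterations x 10 epochs = 17860 steps) up to the configured peak of 0.00015, followed by a linear decay to zero. The sketch below reproduces that shape under this assumption. The function name `linear_warmup_lr` and its arguments are hypothetical and are not taken from Flair; this is not Flair's scheduler code.

```python
def linear_warmup_lr(step, peak_lr=0.00015, total_steps=17860, warmup_fraction=0.1):
    """Learning rate at a given global step (assumed schedule shape).

    Assumption: linear ramp from 0 to peak_lr over the warmup steps,
    then linear decay from peak_lr to 0 over the remaining steps.
    """
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup completed.
        return peak_lr * step / warmup_steps
    # Decay phase: scale the peak rate by the fraction of steps still remaining.
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)


if __name__ == "__main__":
    iters_per_epoch = 1786
    # (epoch, iteration) pairs taken from the log; the global step is derived from them.
    for epoch, it in [(1, 178), (1, 1780), (2, 178), (5, 178), (10, 1780)]:
        step = (epoch - 1) * iters_per_epoch + it
        print(f"epoch {epoch} iter {it}: lr {linear_warmup_lr(step):.6f}")
```

Comparing the printed values with the `lr` field on the matching log lines is a quick way to check that a run used the intended peak rate and warmup fraction.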
2023-10-11 11:23:41,254 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 11:23:41,254  - metric: "('micro avg', 'f1-score')"
2023-10-11 11:23:41,254 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,254 Computation:
2023-10-11 11:23:41,254  - compute on device: cuda:0
2023-10-11 11:23:41,254  - embedding storage: none
2023-10-11 11:23:41,254 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,254 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-3"
2023-10-11 11:23:41,255 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,255 ----------------------------------------------------------------------------------------------------
2023-10-11 11:23:41,255 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 11:24:36,735 epoch 1 - iter 178/1786 - loss 2.81334662 - time (sec): 55.48 - samples/sec: 485.36 - lr: 0.000015 - momentum: 0.000000
2023-10-11 11:25:30,124 epoch 1 - iter 356/1786 - loss 2.65360553 - time (sec): 108.87 - samples/sec: 465.62 - lr: 0.000030 - momentum: 0.000000
2023-10-11 11:26:25,121 epoch 1 - iter 534/1786 - loss 2.38474398 - time (sec): 163.86 - samples/sec: 454.74 - lr: 0.000045 - momentum: 0.000000
2023-10-11 11:27:20,005 epoch 1 - iter 712/1786 - loss 2.09942018 - time (sec): 218.75 - samples/sec: 451.36 - lr: 0.000060 - momentum: 0.000000
2023-10-11 11:28:22,609 epoch 1 - iter 890/1786 - loss 1.82583585 - time (sec): 281.35 - samples/sec: 443.83 - lr: 0.000075 - momentum: 0.000000
2023-10-11 11:29:19,386 epoch 1 - iter 1068/1786 - loss 1.62155219 - time (sec): 338.13 - samples/sec: 439.95 - lr: 0.000090 - momentum: 0.000000
2023-10-11 11:30:14,335 epoch 1 - iter 1246/1786 - loss 1.44125903 - time (sec): 393.08 - samples/sec: 443.60 - lr: 0.000105 - momentum: 0.000000
2023-10-11 11:31:09,952 epoch 1 - iter 1424/1786 - loss 1.30760460 - time (sec): 448.70 - samples/sec: 443.19 - lr: 0.000120 - momentum: 0.000000
2023-10-11 11:32:05,906 epoch 1 - iter 1602/1786 - loss 1.19189571 - time (sec): 504.65 - samples/sec: 443.96 - lr: 0.000134 - momentum: 0.000000
2023-10-11 11:33:03,148 epoch 1 - iter 1780/1786 - loss 1.10257673 - time (sec): 561.89 - samples/sec: 441.25 - lr: 0.000149 - momentum: 0.000000
2023-10-11 11:33:04,915 ----------------------------------------------------------------------------------------------------
2023-10-11 11:33:04,916 EPOCH 1 done: loss 1.0997 - lr: 0.000149
2023-10-11 11:33:25,813 DEV : loss 0.17743352055549622 - f1-score (micro avg)  0.6248
2023-10-11 11:33:25,849 saving best model
2023-10-11 11:33:26,769 ----------------------------------------------------------------------------------------------------
2023-10-11 11:34:26,264 epoch 2 - iter 178/1786 - loss 0.16761718 - time (sec): 59.49 - samples/sec: 441.01 - lr: 0.000148 - momentum: 0.000000
2023-10-11 11:35:22,903 epoch 2 - iter 356/1786 - loss 0.17182138 - time (sec): 116.13 - samples/sec: 443.29 - lr: 0.000147 - momentum: 0.000000
2023-10-11 11:36:17,446 epoch 2 - iter 534/1786 - loss 0.16152548 - time (sec): 170.67 - samples/sec: 441.52 - lr: 0.000145 - momentum: 0.000000
2023-10-11 11:37:12,824 epoch 2 - iter 712/1786 - loss 0.14888668 - time (sec): 226.05 - samples/sec: 445.71 - lr: 0.000143 - momentum: 0.000000
2023-10-11 11:38:04,419 epoch 2 - iter 890/1786 - loss 0.14407949 - time (sec): 277.65 - samples/sec: 448.14 - lr: 0.000142 - momentum: 0.000000
2023-10-11 11:38:58,004 epoch 2 - iter 1068/1786 - loss 0.14005413 - time (sec): 331.23 - samples/sec: 453.55 - lr: 0.000140 - momentum: 0.000000
2023-10-11 11:39:50,454 epoch 2 - iter 1246/1786 - loss 0.13811135 - time (sec): 383.68 - samples/sec: 455.24 - lr: 0.000138 - momentum: 0.000000
2023-10-11 11:40:45,636 epoch 2 - iter 1424/1786 - loss 0.13523123 - time (sec): 438.86 - samples/sec: 451.08 - lr: 0.000137 - momentum: 0.000000
2023-10-11 11:41:43,985 epoch 2 - iter 1602/1786 - loss 0.13268221 - time (sec): 497.21 - samples/sec: 447.57 - lr: 0.000135 - momentum: 0.000000
2023-10-11 11:42:37,848 epoch 2 - iter 1780/1786 - loss 0.12982701 - time (sec): 551.08 - samples/sec: 449.89 - lr: 0.000133 - momentum: 0.000000
2023-10-11 11:42:39,599 ----------------------------------------------------------------------------------------------------
2023-10-11 11:42:39,599 EPOCH 2 done: loss 0.1296 - lr: 0.000133
2023-10-11 11:43:01,563 DEV : loss 0.1018710657954216 - f1-score (micro avg)  0.7677
2023-10-11 11:43:01,593 saving best model
2023-10-11 11:43:04,216 ----------------------------------------------------------------------------------------------------
2023-10-11 11:43:55,588 epoch 3 - iter 178/1786 - loss 0.06089358 - time (sec): 51.37 - samples/sec: 464.22 - lr: 0.000132 - momentum: 0.000000
2023-10-11 11:44:47,335 epoch 3 - iter 356/1786 - loss 0.06083272 - time (sec): 103.11 - samples/sec: 474.26 - lr: 0.000130 - momentum: 0.000000
2023-10-11 11:45:39,079 epoch 3 - iter 534/1786 - loss 0.06301108 - time (sec): 154.86 - samples/sec: 472.60 - lr: 0.000128 - momentum: 0.000000
2023-10-11 11:46:31,108 epoch 3 - iter 712/1786 - loss 0.06615584 - time (sec): 206.89 - samples/sec: 474.57 - lr: 0.000127 - momentum: 0.000000
2023-10-11 11:47:24,530 epoch 3 - iter 890/1786 - loss 0.07031467 - time (sec): 260.31 - samples/sec: 472.06 - lr: 0.000125 - momentum: 0.000000
2023-10-11 11:48:18,485 epoch 3 - iter 1068/1786 - loss 0.07283988 - time (sec): 314.26 - samples/sec: 469.38 - lr: 0.000123 - momentum: 0.000000
2023-10-11 11:49:17,279 epoch 3 - iter 1246/1786 - loss 0.07528099 - time (sec): 373.06 - samples/sec: 466.84 - lr: 0.000122 - momentum: 0.000000
2023-10-11 11:50:13,121 epoch 3 - iter 1424/1786 - loss 0.07404515 - time (sec): 428.90 - samples/sec: 461.84 - lr: 0.000120 - momentum: 0.000000
2023-10-11 11:51:06,569 epoch 3 - iter 1602/1786 - loss 0.07276736 - time (sec): 482.35 - samples/sec: 462.80 - lr: 0.000118 - momentum: 0.000000
2023-10-11 11:51:59,561 epoch 3 - iter 1780/1786 - loss 0.07349545 - time (sec): 535.34 - samples/sec: 463.53 - lr: 0.000117 - momentum: 0.000000
2023-10-11 11:52:01,090 ----------------------------------------------------------------------------------------------------
2023-10-11 11:52:01,091 EPOCH 3 done: loss 0.0738 - lr: 0.000117
2023-10-11 11:52:23,243 DEV : loss 0.11691577732563019 - f1-score (micro avg)  0.7871
2023-10-11 11:52:23,273 saving best model
2023-10-11 11:52:25,840 ----------------------------------------------------------------------------------------------------
2023-10-11 11:53:18,856 epoch 4 - iter 178/1786 - loss 0.03874746 - time (sec): 53.01 - samples/sec: 453.23 - lr: 0.000115 - momentum: 0.000000
2023-10-11 11:54:17,169 epoch 4 - iter 356/1786 - loss 0.04840883 - time (sec): 111.32 - samples/sec: 440.32 - lr: 0.000113 - momentum: 0.000000
2023-10-11 11:55:15,656 epoch 4 - iter 534/1786 - loss 0.04781593 - time (sec): 169.81 - samples/sec: 445.25 - lr: 0.000112 - momentum: 0.000000
2023-10-11 11:56:12,421 epoch 4 - iter 712/1786 - loss 0.05099290 - time (sec): 226.58 - samples/sec: 442.20 - lr: 0.000110 - momentum: 0.000000
2023-10-11 11:57:11,612 epoch 4 - iter 890/1786 - loss 0.05175638 - time (sec): 285.77 - samples/sec: 440.13 - lr: 0.000108 - momentum: 0.000000
2023-10-11 11:58:05,630 epoch 4 - iter 1068/1786 - loss 0.05367502 - time (sec): 339.79 - samples/sec: 437.42 - lr: 0.000107 - momentum: 0.000000
2023-10-11 11:59:01,799 epoch 4 - iter 1246/1786 - loss 0.05394290 - time (sec): 395.95 - samples/sec: 438.77 - lr: 0.000105 - momentum: 0.000000
2023-10-11 11:59:56,173 epoch 4 - iter 1424/1786 - loss 0.05369425 - time (sec): 450.33 - samples/sec: 439.57 - lr: 0.000103 - momentum: 0.000000
2023-10-11 12:00:51,253 epoch 4 - iter 1602/1786 - loss 0.05297193 - time (sec): 505.41 - samples/sec: 441.40 - lr: 0.000102 - momentum: 0.000000
2023-10-11 12:01:47,664 epoch 4 - iter 1780/1786 - loss 0.05177312 - time (sec): 561.82 - samples/sec: 441.43 - lr: 0.000100 - momentum: 0.000000
2023-10-11 12:01:49,317 ----------------------------------------------------------------------------------------------------
2023-10-11 12:01:49,317 EPOCH 4 done: loss 0.0517 - lr: 0.000100
2023-10-11 12:02:11,096 DEV : loss 0.1411616951227188 - f1-score (micro avg)  0.7951
2023-10-11 12:02:11,133 saving best model
2023-10-11 12:02:13,751 ----------------------------------------------------------------------------------------------------
2023-10-11 12:03:07,905 epoch 5 - iter 178/1786 - loss 0.03072434 - time (sec): 54.15 - samples/sec: 463.84 - lr: 0.000098 - momentum: 0.000000
2023-10-11 12:04:04,994 epoch 5 - iter 356/1786 - loss 0.03893563 - time (sec): 111.24 - samples/sec: 454.86 - lr: 0.000097 - momentum: 0.000000
2023-10-11 12:04:59,333 epoch 5 - iter 534/1786 - loss 0.03550559 - time (sec): 165.58 - samples/sec: 456.77 - lr: 0.000095 - momentum: 0.000000
2023-10-11 12:05:54,205 epoch 5 - iter 712/1786 - loss 0.03476839 - time (sec): 220.45 - samples/sec: 452.42 - lr: 0.000093 - momentum: 0.000000
2023-10-11 12:06:47,739 epoch 5 - iter 890/1786 - loss 0.03433559 - time (sec): 273.98 - samples/sec: 451.14 - lr: 0.000092 - momentum: 0.000000
2023-10-11 12:07:41,693 epoch 5 - iter 1068/1786 - loss 0.03434337 - time (sec): 327.94 - samples/sec: 450.14 - lr: 0.000090 - momentum: 0.000000
2023-10-11 12:08:37,890 epoch 5 - iter 1246/1786 - loss 0.03476657 - time (sec): 384.13 - samples/sec: 450.22 - lr: 0.000088 - momentum: 0.000000
2023-10-11 12:09:36,033 epoch 5 - iter 1424/1786 - loss 0.03524109 - time (sec): 442.28 - samples/sec: 446.25 - lr: 0.000087 - momentum: 0.000000
2023-10-11 12:10:33,305 epoch 5 - iter 1602/1786 - loss 0.03503721 - time (sec): 499.55 - samples/sec: 445.01 - lr: 0.000085 - momentum: 0.000000
2023-10-11 12:11:30,163 epoch 5 - iter 1780/1786 - loss 0.03719160 - time (sec): 556.41 - samples/sec: 445.83 - lr: 0.000083 - momentum: 0.000000
2023-10-11 12:11:31,834 ----------------------------------------------------------------------------------------------------
2023-10-11 12:11:31,835 EPOCH 5 done: loss 0.0373 - lr: 0.000083
2023-10-11 12:11:54,146 DEV : loss 0.1446281224489212 - f1-score (micro avg)  0.7989
2023-10-11 12:11:54,177 saving best model
2023-10-11 12:11:56,879 ----------------------------------------------------------------------------------------------------
2023-10-11 12:12:51,045 epoch 6 - iter 178/1786 - loss 0.03460331 - time (sec): 54.16 - samples/sec: 480.26 - lr: 0.000082 - momentum: 0.000000
2023-10-11 12:13:45,100 epoch 6 - iter 356/1786 - loss 0.03107979 - time (sec): 108.22 - samples/sec: 458.69 - lr: 0.000080 - momentum: 0.000000
2023-10-11 12:14:39,394 epoch 6 - iter 534/1786 - loss 0.03181132 - time (sec): 162.51 - samples/sec: 453.26 - lr: 0.000078 - momentum: 0.000000
2023-10-11 12:15:33,225 epoch 6 - iter 712/1786 - loss 0.03080483 - time (sec): 216.34 - samples/sec: 458.64 - lr: 0.000077 - momentum: 0.000000
2023-10-11 12:16:27,715 epoch 6 - iter 890/1786 - loss 0.02989970 - time (sec): 270.83 - samples/sec: 456.86 - lr: 0.000075 - momentum: 0.000000
2023-10-11 12:17:20,578 epoch 6 - iter 1068/1786 - loss 0.02943783 - time (sec): 323.70 - samples/sec: 454.31 - lr: 0.000073 - momentum: 0.000000
2023-10-11 12:18:16,492 epoch 6 - iter 1246/1786 - loss 0.02814408 - time (sec): 379.61 - samples/sec: 452.62 - lr: 0.000072 - momentum: 0.000000
2023-10-11 12:19:11,158 epoch 6 - iter 1424/1786 - loss 0.02929556 - time (sec): 434.28 - samples/sec: 456.28 - lr: 0.000070 - momentum: 0.000000
2023-10-11 12:20:06,940 epoch 6 - iter 1602/1786 - loss 0.02877508 - time (sec): 490.06 - samples/sec: 455.44 - lr: 0.000068 - momentum: 0.000000
2023-10-11 12:21:07,155 epoch 6 - iter 1780/1786 - loss 0.02880009 - time (sec): 550.27 - samples/sec: 450.81 - lr: 0.000067 - momentum: 0.000000
2023-10-11 12:21:08,998 ----------------------------------------------------------------------------------------------------
2023-10-11 12:21:08,998 EPOCH 6 done: loss 0.0287 - lr: 0.000067
2023-10-11 12:21:31,729 DEV : loss 0.18198005855083466 - f1-score (micro avg)  0.7913
2023-10-11 12:21:31,760 ----------------------------------------------------------------------------------------------------
2023-10-11 12:22:27,207 epoch 7 - iter 178/1786 - loss 0.02502610 - time (sec): 55.45 - samples/sec: 442.13 - lr: 0.000065 - momentum: 0.000000
2023-10-11 12:23:18,505 epoch 7 - iter 356/1786 - loss 0.02577891 - time (sec): 106.74 - samples/sec: 447.65 - lr: 0.000063 - momentum: 0.000000
2023-10-11 12:24:11,729 epoch 7 - iter 534/1786 - loss 0.02380424 - time (sec): 159.97 - samples/sec: 456.73 - lr: 0.000062 - momentum: 0.000000
2023-10-11 12:25:03,073 epoch 7 - iter 712/1786 - loss 0.02361198 - time (sec): 211.31 - samples/sec: 459.78 - lr: 0.000060 - momentum: 0.000000
2023-10-11 12:25:54,713 epoch 7 - iter 890/1786 - loss 0.02408280 - time (sec): 262.95 - samples/sec: 465.29 - lr: 0.000058 - momentum: 0.000000
2023-10-11 12:26:47,777 epoch 7 - iter 1068/1786 - loss 0.02326326 - time (sec): 316.01 - samples/sec: 468.21 - lr: 0.000057 - momentum: 0.000000
2023-10-11 12:27:42,548 epoch 7 - iter 1246/1786 - loss 0.02207530 - time (sec): 370.79 - samples/sec: 466.64 - lr: 0.000055 - momentum: 0.000000
2023-10-11 12:28:34,067 epoch 7 - iter 1424/1786 - loss 0.02234953 - time (sec): 422.31 - samples/sec: 469.14 - lr: 0.000053 - momentum: 0.000000
2023-10-11 12:29:25,675 epoch 7 - iter 1602/1786 - loss 0.02289223 - time (sec): 473.91 - samples/sec: 471.10 - lr: 0.000052 - momentum: 0.000000
2023-10-11 12:30:17,009 epoch 7 - iter 1780/1786 - loss 0.02244176 - time (sec): 525.25 - samples/sec: 472.48 - lr: 0.000050 - momentum: 0.000000
2023-10-11 12:30:18,464 ----------------------------------------------------------------------------------------------------
2023-10-11 12:30:18,465 EPOCH 7 done: loss 0.0224 - lr: 0.000050
2023-10-11 12:30:38,760 DEV : loss 0.19651886820793152 - f1-score (micro avg)  0.7944
2023-10-11 12:30:38,790 ----------------------------------------------------------------------------------------------------
2023-10-11 12:31:30,271 epoch 8 - iter 178/1786 - loss 0.01170122 - time (sec): 51.48 - samples/sec: 479.72 - lr: 0.000048 - momentum: 0.000000
2023-10-11 12:32:21,756 epoch 8 - iter 356/1786 - loss 0.01261849 - time (sec): 102.96 - samples/sec: 480.14 - lr: 0.000047 - momentum: 0.000000
2023-10-11 12:33:11,986 epoch 8 - iter 534/1786 - loss 0.01077078 - time (sec): 153.19 - samples/sec: 474.98 - lr: 0.000045 - momentum: 0.000000
2023-10-11 12:34:02,991 epoch 8 - iter 712/1786 - loss 0.01119283 - time (sec): 204.20 - samples/sec: 471.16 - lr: 0.000043 - momentum: 0.000000
2023-10-11 12:34:54,856 epoch 8 - iter 890/1786 - loss 0.01302730 - time (sec): 256.06 - samples/sec: 469.17 - lr: 0.000042 - momentum: 0.000000
2023-10-11 12:35:52,337 epoch 8 - iter 1068/1786 - loss 0.01505220 - time (sec): 313.55 - samples/sec: 467.37 - lr: 0.000040 - momentum: 0.000000
2023-10-11 12:36:46,291 epoch 8 - iter 1246/1786 - loss 0.01529191 - time (sec): 367.50 - samples/sec: 469.17 - lr: 0.000038 - momentum: 0.000000
2023-10-11 12:37:40,067 epoch 8 - iter 1424/1786 - loss 0.01565327 - time (sec): 421.28 - samples/sec: 472.27 - lr: 0.000037 - momentum: 0.000000
2023-10-11 12:38:33,123 epoch 8 - iter 1602/1786 - loss 0.01640331 - time (sec): 474.33 - samples/sec: 473.18 - lr: 0.000035 - momentum: 0.000000
2023-10-11 12:39:24,506 epoch 8 - iter 1780/1786 - loss 0.01586581 - time (sec): 525.71 - samples/sec: 471.92 - lr: 0.000033 - momentum: 0.000000
2023-10-11 12:39:26,070 ----------------------------------------------------------------------------------------------------
2023-10-11 12:39:26,070 EPOCH 8 done: loss 0.0158 - lr: 0.000033
2023-10-11 12:39:47,196 DEV : loss 0.20342709124088287 - f1-score (micro avg)  0.8005
2023-10-11 12:39:47,225 saving best model
2023-10-11 12:39:49,797 ----------------------------------------------------------------------------------------------------
2023-10-11 12:40:41,572 epoch 9 - iter 178/1786 - loss 0.01473686 - time (sec): 51.77 - samples/sec: 460.70 - lr: 0.000032 - momentum: 0.000000
2023-10-11 12:41:32,765 epoch 9 - iter 356/1786 - loss 0.01118027 - time (sec): 102.96 - samples/sec: 453.89 - lr: 0.000030 - momentum: 0.000000
2023-10-11 12:42:23,635 epoch 9 - iter 534/1786 - loss 0.01245056 - time (sec): 153.83 - samples/sec: 447.25 - lr: 0.000028 - momentum: 0.000000
2023-10-11 12:43:17,012 epoch 9 - iter 712/1786 - loss 0.01164884 - time (sec): 207.21 - samples/sec: 457.99 - lr: 0.000027 - momentum: 0.000000
2023-10-11 12:44:09,593 epoch 9 - iter 890/1786 - loss 0.01172585 - time (sec): 259.79 - samples/sec: 462.94 - lr: 0.000025 - momentum: 0.000000
2023-10-11 12:45:03,569 epoch 9 - iter 1068/1786 - loss 0.01162303 - time (sec): 313.77 - samples/sec: 465.80 - lr: 0.000023 - momentum: 0.000000
2023-10-11 12:45:58,302 epoch 9 - iter 1246/1786 - loss 0.01213149 - time (sec): 368.50 - samples/sec: 468.41 - lr: 0.000022 - momentum: 0.000000
2023-10-11 12:46:54,840 epoch 9 - iter 1424/1786 - loss 0.01176088 - time (sec): 425.04 - samples/sec: 467.35 - lr: 0.000020 - momentum: 0.000000
2023-10-11 12:47:47,993 epoch 9 - iter 1602/1786 - loss 0.01190777 - time (sec): 478.19 - samples/sec: 467.50 - lr: 0.000018 - momentum: 0.000000
2023-10-11 12:48:41,133 epoch 9 - iter 1780/1786 - loss 0.01190889 - time (sec): 531.33 - samples/sec: 466.14 - lr: 0.000017 - momentum: 0.000000
2023-10-11 12:48:42,969 ----------------------------------------------------------------------------------------------------
2023-10-11 12:48:42,969 EPOCH 9 done: loss 0.0120 - lr: 0.000017
2023-10-11 12:49:04,098 DEV : loss 0.21383821964263916 - f1-score (micro avg)  0.7934
2023-10-11 12:49:04,137 ----------------------------------------------------------------------------------------------------
2023-10-11 12:49:59,581 epoch 10 - iter 178/1786 - loss 0.00900177 - time (sec): 55.44 - samples/sec: 450.00 - lr: 0.000015 - momentum: 0.000000
2023-10-11 12:50:51,193 epoch 10 - iter 356/1786 - loss 0.01022974 - time (sec): 107.05 - samples/sec: 449.40 - lr: 0.000013 - momentum: 0.000000
2023-10-11 12:51:44,986 epoch 10 - iter 534/1786 - loss 0.00925737 - time (sec): 160.85 - samples/sec: 452.35 - lr: 0.000012 - momentum: 0.000000
2023-10-11 12:52:43,086 epoch 10 - iter 712/1786 - loss 0.00868107 - time (sec): 218.95 - samples/sec: 449.05 - lr: 0.000010 - momentum: 0.000000
2023-10-11 12:53:39,352 epoch 10 - iter 890/1786 - loss 0.00865781 - time (sec): 275.21 - samples/sec: 451.32 - lr: 0.000008 - momentum: 0.000000
2023-10-11 12:54:34,765 epoch 10 - iter 1068/1786 - loss 0.00851102 - time (sec): 330.62 - samples/sec: 449.74 - lr: 0.000007 - momentum: 0.000000
2023-10-11 12:55:30,317 epoch 10 - iter 1246/1786 - loss 0.00848258 - time (sec): 386.18 - samples/sec: 447.18 - lr: 0.000005 - momentum: 0.000000
2023-10-11 12:56:26,280 epoch 10 - iter 1424/1786 - loss 0.00850758 - time (sec): 442.14 - samples/sec: 446.92 - lr: 0.000003 - momentum: 0.000000
2023-10-11 12:57:23,983 epoch 10 - iter 1602/1786 - loss 0.00827861 - time (sec): 499.84 - samples/sec: 445.08 - lr: 0.000002 - momentum: 0.000000
2023-10-11 12:58:21,558 epoch 10 - iter 1780/1786 - loss 0.00830503 - time (sec): 557.42 - samples/sec: 445.35 - lr: 0.000000 - momentum: 0.000000
2023-10-11 12:58:23,007 ----------------------------------------------------------------------------------------------------
2023-10-11 12:58:23,007 EPOCH 10 done: loss 0.0083 - lr: 0.000000
2023-10-11 12:58:45,352 DEV : loss 0.2220211774110794 - f1-score (micro avg)  0.7952
2023-10-11 12:58:46,381 ----------------------------------------------------------------------------------------------------
2023-10-11 12:58:46,383 Loading model from best epoch ...
2023-10-11 12:58:50,444 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 13:00:06,150 
Results:
- F-score (micro) 0.7251
- F-score (macro) 0.6453
- Accuracy 0.5829

By class:
              precision    recall  f1-score   support

         LOC     0.7389    0.7443    0.7416      1095
         PER     0.7830    0.7846    0.7838      1012
         ORG     0.5083    0.5994    0.5501       357
   HumanProd     0.4074    0.6667    0.5057        33

   micro avg     0.7118    0.7389    0.7251      2497
   macro avg     0.6094    0.6987    0.6453      2497
weighted avg     0.7194    0.7389    0.7282      2497

2023-10-11 13:00:06,150 ----------------------------------------------------------------------------------------------------
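Note on the final test scores: the averaged rows of the report above can be recomputed from the per-class rows as a sanity check. The sketch below uses the standard definitions (macro F1 as the unweighted mean of per-class F1, weighted F1 as the support-weighted mean, micro F1 as the harmonic mean of micro precision and recall). The numbers are copied from the log, so results agree only up to the rounding of the logged values. The variable and function names are illustrative.

```python
# Per-class rows copied from the report: (precision, recall, f1, support).
per_class = {
    "LOC": (0.7389, 0.7443, 0.7416, 1095),
    "PER": (0.7830, 0.7846, 0.7838, 1012),
    "ORG": (0.5083, 0.5994, 0.5501, 357),
    "HumanProd": (0.4074, 0.6667, 0.5057, 33),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


total_support = sum(row[3] for row in per_class.values())

# Macro average: every class counts equally, regardless of support.
macro_f1 = sum(row[2] for row in per_class.values()) / len(per_class)

# Weighted average: each class F1 is weighted by its support.
weighted_f1 = sum(row[2] * row[3] for row in per_class.values()) / total_support

# Micro F1 from the logged micro-averaged precision and recall.
micro_f1 = f1(0.7118, 0.7389)

if __name__ == "__main__":
    print(f"support  {total_support}")
    print(f"macro    {macro_f1:.4f}")
    print(f"weighted {weighted_f1:.4f}")
    print(f"micro    {micro_f1:.4f}")
```

The gap between micro F1 (0.7251) and macro F1 (0.6453) comes from the two low-support classes, ORG and HumanProd, which pull the unweighted mean down.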