2023-10-11 22:21:01,967 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,969 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 22:21:01,969 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,969 MultiCorpus: 7142 train + 698 dev + 2570 test sentences
 - NER_HIPE_2022 Corpus: 7142 train + 698 dev + 2570 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fr/with_doc_seperator
2023-10-11 22:21:01,969 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,969 Train:  7142 sentences
2023-10-11 22:21:01,969         (train_with_dev=False, train_with_test=False)
2023-10-11 22:21:01,970 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,970 Training Params:
2023-10-11 22:21:01,970  - learning_rate: "0.00016" 
2023-10-11 22:21:01,970  - mini_batch_size: "8"
2023-10-11 22:21:01,970  - max_epochs: "10"
2023-10-11 22:21:01,970  - shuffle: "True"
2023-10-11 22:21:01,970 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,970 Plugins:
2023-10-11 22:21:01,970  - TensorboardLogger
2023-10-11 22:21:01,970  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 22:21:01,970 ----------------------------------------------------------------------------------------------------
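The `LinearScheduler | warmup_fraction: '0.1'` plugin can be cross-checked against the per-iteration `lr` values printed during training. The sketch below rests on several inferences from this log, not on Flair's source. It assumes the common linear-warmup-then-linear-decay shape (as in Hugging Face's `get_linear_schedule_with_warmup`), 893 batches per epoch (ceil(7142 / 8)), warmup steps = int(0.1 × total steps), and that each logged `lr` is the value in effect before the scheduler steps for the current batch. The helper names `linear_warmup_decay` and `logged_lr` are hypothetical.

```python
import math

# Hyperparameters copied from the log header.
PEAK_LR = 0.00016
TRAIN_SENTENCES = 7142
BATCH_SIZE = 8
MAX_EPOCHS = 10
WARMUP_FRACTION = 0.1

# Assumed derivation of the step counts (not taken from Flair's source).
BATCHES_PER_EPOCH = math.ceil(TRAIN_SENTENCES / BATCH_SIZE)
TOTAL_STEPS = BATCHES_PER_EPOCH * MAX_EPOCHS
WARMUP_STEPS = int(WARMUP_FRACTION * TOTAL_STEPS)


def linear_warmup_decay(step: int) -> float:
    """Learning rate after `step` scheduler steps (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to PEAK_LR during warmup.
        return PEAK_LR * step / WARMUP_STEPS
    # Linear decay from PEAK_LR down to 0 at TOTAL_STEPS.
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))


def logged_lr(epoch: int, iteration: int) -> str:
    """Format the lr the way the log prints it, for 1-indexed epoch/iteration.

    Assumption: the value is read before the scheduler steps for this batch,
    i.e. after (global_batch - 1) steps have been taken.
    """
    global_batch = (epoch - 1) * BATCHES_PER_EPOCH + iteration
    return f"{linear_warmup_decay(global_batch - 1):.6f}"


if __name__ == "__main__":
    # A few (epoch, iteration) pairs that appear in the log.
    for epoch, it in [(1, 89), (1, 534), (2, 89), (5, 356), (10, 801), (10, 890)]:
        print(f"epoch {epoch} - iter {it}/{BATCHES_PER_EPOCH} - lr: {logged_lr(epoch, it)}")
```

Under these assumptions the sketch reproduces the printed values, including the peak near the end of epoch 1 and the decay to 0.000000 at the last iteration of epoch 10. Dropping the "minus one step" assumption makes some warmup values (e.g. epoch 1, iter 534) disagree with the log in the last digit.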
2023-10-11 22:21:01,970 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 22:21:01,970  - metric: "('micro avg', 'f1-score')"
2023-10-11 22:21:01,970 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,970 Computation:
2023-10-11 22:21:01,970  - compute on device: cuda:0
2023-10-11 22:21:01,971  - embedding storage: none
2023-10-11 22:21:01,971 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,971 Model training base path: "hmbench-newseye/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-11 22:21:01,971 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,971 ----------------------------------------------------------------------------------------------------
2023-10-11 22:21:01,971 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 22:21:54,369 epoch 1 - iter 89/893 - loss 2.81524244 - time (sec): 52.40 - samples/sec: 516.95 - lr: 0.000016 - momentum: 0.000000
2023-10-11 22:22:46,282 epoch 1 - iter 178/893 - loss 2.73176101 - time (sec): 104.31 - samples/sec: 506.78 - lr: 0.000032 - momentum: 0.000000
2023-10-11 22:23:38,432 epoch 1 - iter 267/893 - loss 2.52603415 - time (sec): 156.46 - samples/sec: 509.33 - lr: 0.000048 - momentum: 0.000000
2023-10-11 22:24:27,279 epoch 1 - iter 356/893 - loss 2.31615221 - time (sec): 205.31 - samples/sec: 510.58 - lr: 0.000064 - momentum: 0.000000
2023-10-11 22:25:16,525 epoch 1 - iter 445/893 - loss 2.08800093 - time (sec): 254.55 - samples/sec: 508.25 - lr: 0.000080 - momentum: 0.000000
2023-10-11 22:26:05,675 epoch 1 - iter 534/893 - loss 1.87576125 - time (sec): 303.70 - samples/sec: 503.88 - lr: 0.000095 - momentum: 0.000000
2023-10-11 22:26:54,470 epoch 1 - iter 623/893 - loss 1.70576459 - time (sec): 352.50 - samples/sec: 502.24 - lr: 0.000111 - momentum: 0.000000
2023-10-11 22:27:42,644 epoch 1 - iter 712/893 - loss 1.56835597 - time (sec): 400.67 - samples/sec: 497.83 - lr: 0.000127 - momentum: 0.000000
2023-10-11 22:28:31,216 epoch 1 - iter 801/893 - loss 1.44155521 - time (sec): 449.24 - samples/sec: 497.76 - lr: 0.000143 - momentum: 0.000000
2023-10-11 22:29:19,353 epoch 1 - iter 890/893 - loss 1.33419276 - time (sec): 497.38 - samples/sec: 498.80 - lr: 0.000159 - momentum: 0.000000
2023-10-11 22:29:20,732 ----------------------------------------------------------------------------------------------------
2023-10-11 22:29:20,732 EPOCH 1 done: loss 1.3312 - lr: 0.000159
2023-10-11 22:29:40,984 DEV : loss 0.24347300827503204 - f1-score (micro avg)  0.4712
2023-10-11 22:29:41,014 saving best model
2023-10-11 22:29:41,873 ----------------------------------------------------------------------------------------------------
2023-10-11 22:30:32,952 epoch 2 - iter 89/893 - loss 0.28954077 - time (sec): 51.08 - samples/sec: 488.58 - lr: 0.000158 - momentum: 0.000000
2023-10-11 22:31:23,495 epoch 2 - iter 178/893 - loss 0.26838336 - time (sec): 101.62 - samples/sec: 495.29 - lr: 0.000156 - momentum: 0.000000
2023-10-11 22:32:12,958 epoch 2 - iter 267/893 - loss 0.24433660 - time (sec): 151.08 - samples/sec: 499.09 - lr: 0.000155 - momentum: 0.000000
2023-10-11 22:33:01,075 epoch 2 - iter 356/893 - loss 0.22640434 - time (sec): 199.20 - samples/sec: 501.23 - lr: 0.000153 - momentum: 0.000000
2023-10-11 22:33:50,431 epoch 2 - iter 445/893 - loss 0.20805597 - time (sec): 248.56 - samples/sec: 507.68 - lr: 0.000151 - momentum: 0.000000
2023-10-11 22:34:37,653 epoch 2 - iter 534/893 - loss 0.19855291 - time (sec): 295.78 - samples/sec: 505.35 - lr: 0.000149 - momentum: 0.000000
2023-10-11 22:35:25,559 epoch 2 - iter 623/893 - loss 0.18874509 - time (sec): 343.68 - samples/sec: 504.29 - lr: 0.000148 - momentum: 0.000000
2023-10-11 22:36:14,187 epoch 2 - iter 712/893 - loss 0.18046880 - time (sec): 392.31 - samples/sec: 506.71 - lr: 0.000146 - momentum: 0.000000
2023-10-11 22:37:02,132 epoch 2 - iter 801/893 - loss 0.17469471 - time (sec): 440.26 - samples/sec: 506.29 - lr: 0.000144 - momentum: 0.000000
2023-10-11 22:37:50,288 epoch 2 - iter 890/893 - loss 0.16713569 - time (sec): 488.41 - samples/sec: 506.95 - lr: 0.000142 - momentum: 0.000000
2023-10-11 22:37:52,010 ----------------------------------------------------------------------------------------------------
2023-10-11 22:37:52,010 EPOCH 2 done: loss 0.1669 - lr: 0.000142
2023-10-11 22:38:12,850 DEV : loss 0.09539955109357834 - f1-score (micro avg)  0.7653
2023-10-11 22:38:12,880 saving best model
2023-10-11 22:38:15,886 ----------------------------------------------------------------------------------------------------
2023-10-11 22:39:03,890 epoch 3 - iter 89/893 - loss 0.07439450 - time (sec): 48.00 - samples/sec: 511.31 - lr: 0.000140 - momentum: 0.000000
2023-10-11 22:39:52,230 epoch 3 - iter 178/893 - loss 0.07435914 - time (sec): 96.34 - samples/sec: 519.96 - lr: 0.000139 - momentum: 0.000000
2023-10-11 22:40:39,097 epoch 3 - iter 267/893 - loss 0.07407314 - time (sec): 143.21 - samples/sec: 514.84 - lr: 0.000137 - momentum: 0.000000
2023-10-11 22:41:26,901 epoch 3 - iter 356/893 - loss 0.07395440 - time (sec): 191.01 - samples/sec: 512.44 - lr: 0.000135 - momentum: 0.000000
2023-10-11 22:42:15,822 epoch 3 - iter 445/893 - loss 0.07186459 - time (sec): 239.93 - samples/sec: 515.32 - lr: 0.000133 - momentum: 0.000000
2023-10-11 22:43:06,799 epoch 3 - iter 534/893 - loss 0.07219677 - time (sec): 290.91 - samples/sec: 513.25 - lr: 0.000132 - momentum: 0.000000
2023-10-11 22:43:56,818 epoch 3 - iter 623/893 - loss 0.07111615 - time (sec): 340.93 - samples/sec: 510.62 - lr: 0.000130 - momentum: 0.000000
2023-10-11 22:44:45,025 epoch 3 - iter 712/893 - loss 0.07090444 - time (sec): 389.13 - samples/sec: 506.99 - lr: 0.000128 - momentum: 0.000000
2023-10-11 22:45:33,782 epoch 3 - iter 801/893 - loss 0.07231914 - time (sec): 437.89 - samples/sec: 506.23 - lr: 0.000126 - momentum: 0.000000
2023-10-11 22:46:23,881 epoch 3 - iter 890/893 - loss 0.07073149 - time (sec): 487.99 - samples/sec: 508.09 - lr: 0.000125 - momentum: 0.000000
2023-10-11 22:46:25,367 ----------------------------------------------------------------------------------------------------
2023-10-11 22:46:25,367 EPOCH 3 done: loss 0.0708 - lr: 0.000125
2023-10-11 22:46:46,567 DEV : loss 0.10698171705007553 - f1-score (micro avg)  0.7863
2023-10-11 22:46:46,596 saving best model
2023-10-11 22:46:49,112 ----------------------------------------------------------------------------------------------------
2023-10-11 22:47:39,142 epoch 4 - iter 89/893 - loss 0.04988314 - time (sec): 50.03 - samples/sec: 536.16 - lr: 0.000123 - momentum: 0.000000
2023-10-11 22:48:27,553 epoch 4 - iter 178/893 - loss 0.04846943 - time (sec): 98.44 - samples/sec: 516.18 - lr: 0.000121 - momentum: 0.000000
2023-10-11 22:49:16,600 epoch 4 - iter 267/893 - loss 0.04681982 - time (sec): 147.48 - samples/sec: 514.66 - lr: 0.000119 - momentum: 0.000000
2023-10-11 22:50:05,132 epoch 4 - iter 356/893 - loss 0.04623150 - time (sec): 196.02 - samples/sec: 512.55 - lr: 0.000117 - momentum: 0.000000
2023-10-11 22:50:53,786 epoch 4 - iter 445/893 - loss 0.04787124 - time (sec): 244.67 - samples/sec: 507.06 - lr: 0.000116 - momentum: 0.000000
2023-10-11 22:51:42,456 epoch 4 - iter 534/893 - loss 0.04769015 - time (sec): 293.34 - samples/sec: 509.14 - lr: 0.000114 - momentum: 0.000000
2023-10-11 22:52:29,965 epoch 4 - iter 623/893 - loss 0.04761173 - time (sec): 340.85 - samples/sec: 507.97 - lr: 0.000112 - momentum: 0.000000
2023-10-11 22:53:17,603 epoch 4 - iter 712/893 - loss 0.04741060 - time (sec): 388.49 - samples/sec: 507.58 - lr: 0.000110 - momentum: 0.000000
2023-10-11 22:54:06,700 epoch 4 - iter 801/893 - loss 0.04736835 - time (sec): 437.58 - samples/sec: 511.68 - lr: 0.000109 - momentum: 0.000000
2023-10-11 22:54:55,040 epoch 4 - iter 890/893 - loss 0.04709479 - time (sec): 485.92 - samples/sec: 510.57 - lr: 0.000107 - momentum: 0.000000
2023-10-11 22:54:56,494 ----------------------------------------------------------------------------------------------------
2023-10-11 22:54:56,495 EPOCH 4 done: loss 0.0471 - lr: 0.000107
2023-10-11 22:55:18,057 DEV : loss 0.12400590628385544 - f1-score (micro avg)  0.7966
2023-10-11 22:55:18,087 saving best model
2023-10-11 22:55:20,719 ----------------------------------------------------------------------------------------------------
2023-10-11 22:56:09,865 epoch 5 - iter 89/893 - loss 0.03397882 - time (sec): 49.14 - samples/sec: 499.21 - lr: 0.000105 - momentum: 0.000000
2023-10-11 22:56:58,565 epoch 5 - iter 178/893 - loss 0.03522845 - time (sec): 97.84 - samples/sec: 500.50 - lr: 0.000103 - momentum: 0.000000
2023-10-11 22:57:48,347 epoch 5 - iter 267/893 - loss 0.03521031 - time (sec): 147.62 - samples/sec: 503.01 - lr: 0.000101 - momentum: 0.000000
2023-10-11 22:58:35,076 epoch 5 - iter 356/893 - loss 0.03460673 - time (sec): 194.35 - samples/sec: 502.96 - lr: 0.000100 - momentum: 0.000000
2023-10-11 22:59:22,704 epoch 5 - iter 445/893 - loss 0.03523870 - time (sec): 241.98 - samples/sec: 502.99 - lr: 0.000098 - momentum: 0.000000
2023-10-11 23:00:10,809 epoch 5 - iter 534/893 - loss 0.03478198 - time (sec): 290.09 - samples/sec: 504.56 - lr: 0.000096 - momentum: 0.000000
2023-10-11 23:01:00,615 epoch 5 - iter 623/893 - loss 0.03495198 - time (sec): 339.89 - samples/sec: 510.85 - lr: 0.000094 - momentum: 0.000000
2023-10-11 23:01:50,082 epoch 5 - iter 712/893 - loss 0.03581308 - time (sec): 389.36 - samples/sec: 509.69 - lr: 0.000093 - momentum: 0.000000
2023-10-11 23:02:39,996 epoch 5 - iter 801/893 - loss 0.03648619 - time (sec): 439.27 - samples/sec: 508.22 - lr: 0.000091 - momentum: 0.000000
2023-10-11 23:03:30,163 epoch 5 - iter 890/893 - loss 0.03607845 - time (sec): 489.44 - samples/sec: 506.91 - lr: 0.000089 - momentum: 0.000000
2023-10-11 23:03:31,620 ----------------------------------------------------------------------------------------------------
2023-10-11 23:03:31,620 EPOCH 5 done: loss 0.0361 - lr: 0.000089
2023-10-11 23:03:52,415 DEV : loss 0.14019542932510376 - f1-score (micro avg)  0.8003
2023-10-11 23:03:52,447 saving best model
2023-10-11 23:03:54,985 ----------------------------------------------------------------------------------------------------
2023-10-11 23:04:45,324 epoch 6 - iter 89/893 - loss 0.02545772 - time (sec): 50.33 - samples/sec: 510.51 - lr: 0.000087 - momentum: 0.000000
2023-10-11 23:05:34,136 epoch 6 - iter 178/893 - loss 0.02647733 - time (sec): 99.15 - samples/sec: 502.13 - lr: 0.000085 - momentum: 0.000000
2023-10-11 23:06:25,652 epoch 6 - iter 267/893 - loss 0.02542927 - time (sec): 150.66 - samples/sec: 512.02 - lr: 0.000084 - momentum: 0.000000
2023-10-11 23:07:14,673 epoch 6 - iter 356/893 - loss 0.02606427 - time (sec): 199.68 - samples/sec: 507.41 - lr: 0.000082 - momentum: 0.000000
2023-10-11 23:08:04,660 epoch 6 - iter 445/893 - loss 0.02787874 - time (sec): 249.67 - samples/sec: 510.08 - lr: 0.000080 - momentum: 0.000000
2023-10-11 23:08:53,284 epoch 6 - iter 534/893 - loss 0.02723140 - time (sec): 298.29 - samples/sec: 508.54 - lr: 0.000078 - momentum: 0.000000
2023-10-11 23:09:42,484 epoch 6 - iter 623/893 - loss 0.02737784 - time (sec): 347.49 - samples/sec: 505.88 - lr: 0.000077 - momentum: 0.000000
2023-10-11 23:10:34,252 epoch 6 - iter 712/893 - loss 0.02676267 - time (sec): 399.26 - samples/sec: 503.29 - lr: 0.000075 - momentum: 0.000000
2023-10-11 23:11:22,438 epoch 6 - iter 801/893 - loss 0.02657678 - time (sec): 447.45 - samples/sec: 501.14 - lr: 0.000073 - momentum: 0.000000
2023-10-11 23:12:11,166 epoch 6 - iter 890/893 - loss 0.02720868 - time (sec): 496.18 - samples/sec: 499.17 - lr: 0.000071 - momentum: 0.000000
2023-10-11 23:12:12,922 ----------------------------------------------------------------------------------------------------
2023-10-11 23:12:12,922 EPOCH 6 done: loss 0.0273 - lr: 0.000071
2023-10-11 23:12:34,497 DEV : loss 0.15041321516036987 - f1-score (micro avg)  0.8117
2023-10-11 23:12:34,527 saving best model
2023-10-11 23:12:37,084 ----------------------------------------------------------------------------------------------------
2023-10-11 23:13:26,120 epoch 7 - iter 89/893 - loss 0.02780661 - time (sec): 49.03 - samples/sec: 490.86 - lr: 0.000069 - momentum: 0.000000
2023-10-11 23:14:16,338 epoch 7 - iter 178/893 - loss 0.02358968 - time (sec): 99.25 - samples/sec: 501.77 - lr: 0.000068 - momentum: 0.000000
2023-10-11 23:15:05,061 epoch 7 - iter 267/893 - loss 0.02258577 - time (sec): 147.97 - samples/sec: 497.61 - lr: 0.000066 - momentum: 0.000000
2023-10-11 23:15:55,092 epoch 7 - iter 356/893 - loss 0.02043234 - time (sec): 198.00 - samples/sec: 502.39 - lr: 0.000064 - momentum: 0.000000
2023-10-11 23:16:43,870 epoch 7 - iter 445/893 - loss 0.02126625 - time (sec): 246.78 - samples/sec: 504.49 - lr: 0.000062 - momentum: 0.000000
2023-10-11 23:17:34,601 epoch 7 - iter 534/893 - loss 0.02028243 - time (sec): 297.51 - samples/sec: 501.86 - lr: 0.000061 - momentum: 0.000000
2023-10-11 23:18:23,631 epoch 7 - iter 623/893 - loss 0.02053878 - time (sec): 346.54 - samples/sec: 500.97 - lr: 0.000059 - momentum: 0.000000
2023-10-11 23:19:12,400 epoch 7 - iter 712/893 - loss 0.02128759 - time (sec): 395.31 - samples/sec: 500.79 - lr: 0.000057 - momentum: 0.000000
2023-10-11 23:20:01,270 epoch 7 - iter 801/893 - loss 0.02189113 - time (sec): 444.18 - samples/sec: 502.62 - lr: 0.000055 - momentum: 0.000000
2023-10-11 23:20:50,098 epoch 7 - iter 890/893 - loss 0.02197161 - time (sec): 493.01 - samples/sec: 502.77 - lr: 0.000053 - momentum: 0.000000
2023-10-11 23:20:51,632 ----------------------------------------------------------------------------------------------------
2023-10-11 23:20:51,632 EPOCH 7 done: loss 0.0219 - lr: 0.000053
2023-10-11 23:21:13,195 DEV : loss 0.1641770303249359 - f1-score (micro avg)  0.8043
2023-10-11 23:21:13,225 ----------------------------------------------------------------------------------------------------
2023-10-11 23:22:01,599 epoch 8 - iter 89/893 - loss 0.01437145 - time (sec): 48.37 - samples/sec: 517.79 - lr: 0.000052 - momentum: 0.000000
2023-10-11 23:22:50,880 epoch 8 - iter 178/893 - loss 0.01549440 - time (sec): 97.65 - samples/sec: 515.13 - lr: 0.000050 - momentum: 0.000000
2023-10-11 23:23:39,221 epoch 8 - iter 267/893 - loss 0.01587140 - time (sec): 145.99 - samples/sec: 514.78 - lr: 0.000048 - momentum: 0.000000
2023-10-11 23:24:27,855 epoch 8 - iter 356/893 - loss 0.01530621 - time (sec): 194.63 - samples/sec: 505.15 - lr: 0.000046 - momentum: 0.000000
2023-10-11 23:25:15,842 epoch 8 - iter 445/893 - loss 0.01504943 - time (sec): 242.61 - samples/sec: 502.27 - lr: 0.000045 - momentum: 0.000000
2023-10-11 23:26:05,567 epoch 8 - iter 534/893 - loss 0.01505772 - time (sec): 292.34 - samples/sec: 506.57 - lr: 0.000043 - momentum: 0.000000
2023-10-11 23:26:53,841 epoch 8 - iter 623/893 - loss 0.01621767 - time (sec): 340.61 - samples/sec: 502.31 - lr: 0.000041 - momentum: 0.000000
2023-10-11 23:27:43,961 epoch 8 - iter 712/893 - loss 0.01613174 - time (sec): 390.73 - samples/sec: 504.62 - lr: 0.000039 - momentum: 0.000000
2023-10-11 23:28:34,573 epoch 8 - iter 801/893 - loss 0.01616556 - time (sec): 441.35 - samples/sec: 506.32 - lr: 0.000037 - momentum: 0.000000
2023-10-11 23:29:24,349 epoch 8 - iter 890/893 - loss 0.01640991 - time (sec): 491.12 - samples/sec: 504.76 - lr: 0.000036 - momentum: 0.000000
2023-10-11 23:29:25,954 ----------------------------------------------------------------------------------------------------
2023-10-11 23:29:25,955 EPOCH 8 done: loss 0.0164 - lr: 0.000036
2023-10-11 23:29:47,575 DEV : loss 0.1812078058719635 - f1-score (micro avg)  0.8045
2023-10-11 23:29:47,606 ----------------------------------------------------------------------------------------------------
2023-10-11 23:30:40,748 epoch 9 - iter 89/893 - loss 0.01204535 - time (sec): 53.14 - samples/sec: 487.81 - lr: 0.000034 - momentum: 0.000000
2023-10-11 23:31:32,198 epoch 9 - iter 178/893 - loss 0.00976354 - time (sec): 104.59 - samples/sec: 477.49 - lr: 0.000032 - momentum: 0.000000
2023-10-11 23:32:23,605 epoch 9 - iter 267/893 - loss 0.01155278 - time (sec): 156.00 - samples/sec: 476.39 - lr: 0.000030 - momentum: 0.000000
2023-10-11 23:33:14,498 epoch 9 - iter 356/893 - loss 0.01127452 - time (sec): 206.89 - samples/sec: 475.23 - lr: 0.000029 - momentum: 0.000000
2023-10-11 23:34:04,554 epoch 9 - iter 445/893 - loss 0.01129248 - time (sec): 256.95 - samples/sec: 478.02 - lr: 0.000027 - momentum: 0.000000
2023-10-11 23:34:56,240 epoch 9 - iter 534/893 - loss 0.01135105 - time (sec): 308.63 - samples/sec: 483.08 - lr: 0.000025 - momentum: 0.000000
2023-10-11 23:35:48,950 epoch 9 - iter 623/893 - loss 0.01222460 - time (sec): 361.34 - samples/sec: 485.20 - lr: 0.000023 - momentum: 0.000000
2023-10-11 23:36:39,707 epoch 9 - iter 712/893 - loss 0.01277049 - time (sec): 412.10 - samples/sec: 486.22 - lr: 0.000022 - momentum: 0.000000
2023-10-11 23:37:28,891 epoch 9 - iter 801/893 - loss 0.01295373 - time (sec): 461.28 - samples/sec: 486.62 - lr: 0.000020 - momentum: 0.000000
2023-10-11 23:38:17,433 epoch 9 - iter 890/893 - loss 0.01308695 - time (sec): 509.82 - samples/sec: 486.57 - lr: 0.000018 - momentum: 0.000000
2023-10-11 23:38:18,887 ----------------------------------------------------------------------------------------------------
2023-10-11 23:38:18,888 EPOCH 9 done: loss 0.0131 - lr: 0.000018
2023-10-11 23:38:40,916 DEV : loss 0.19328945875167847 - f1-score (micro avg)  0.8091
2023-10-11 23:38:40,947 ----------------------------------------------------------------------------------------------------
2023-10-11 23:39:33,172 epoch 10 - iter 89/893 - loss 0.01252914 - time (sec): 52.22 - samples/sec: 483.30 - lr: 0.000016 - momentum: 0.000000
2023-10-11 23:40:25,294 epoch 10 - iter 178/893 - loss 0.01243200 - time (sec): 104.34 - samples/sec: 483.94 - lr: 0.000014 - momentum: 0.000000
2023-10-11 23:41:17,755 epoch 10 - iter 267/893 - loss 0.01093071 - time (sec): 156.81 - samples/sec: 483.81 - lr: 0.000013 - momentum: 0.000000
2023-10-11 23:42:09,380 epoch 10 - iter 356/893 - loss 0.01152608 - time (sec): 208.43 - samples/sec: 485.20 - lr: 0.000011 - momentum: 0.000000
2023-10-11 23:43:01,810 epoch 10 - iter 445/893 - loss 0.01150384 - time (sec): 260.86 - samples/sec: 484.83 - lr: 0.000009 - momentum: 0.000000
2023-10-11 23:43:52,835 epoch 10 - iter 534/893 - loss 0.01091038 - time (sec): 311.89 - samples/sec: 482.41 - lr: 0.000007 - momentum: 0.000000
2023-10-11 23:44:45,715 epoch 10 - iter 623/893 - loss 0.01107542 - time (sec): 364.77 - samples/sec: 483.08 - lr: 0.000006 - momentum: 0.000000
2023-10-11 23:45:37,730 epoch 10 - iter 712/893 - loss 0.01039016 - time (sec): 416.78 - samples/sec: 480.44 - lr: 0.000004 - momentum: 0.000000
2023-10-11 23:46:29,653 epoch 10 - iter 801/893 - loss 0.01057808 - time (sec): 468.70 - samples/sec: 478.65 - lr: 0.000002 - momentum: 0.000000
2023-10-11 23:47:19,812 epoch 10 - iter 890/893 - loss 0.01067030 - time (sec): 518.86 - samples/sec: 478.32 - lr: 0.000000 - momentum: 0.000000
2023-10-11 23:47:21,250 ----------------------------------------------------------------------------------------------------
2023-10-11 23:47:21,250 EPOCH 10 done: loss 0.0106 - lr: 0.000000
2023-10-11 23:47:44,716 DEV : loss 0.19667339324951172 - f1-score (micro avg)  0.8085
2023-10-11 23:47:45,653 ----------------------------------------------------------------------------------------------------
2023-10-11 23:47:45,656 Loading model from best epoch ...
2023-10-11 23:47:49,477 SequenceTagger predicts: Dictionary with 17 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-11 23:49:01,259 
Results:
- F-score (micro) 0.7008
- F-score (macro) 0.6502
- Accuracy 0.5557

By class:
              precision    recall  f1-score   support

         LOC     0.6994    0.7160    0.7076      1095
         PER     0.7683    0.7767    0.7725      1012
         ORG     0.4524    0.5994    0.5157       357
   HumanProd     0.5349    0.6970    0.6053        33

   micro avg     0.6793    0.7237    0.7008      2497
   macro avg     0.6138    0.6973    0.6502      2497
weighted avg     0.6898    0.7237    0.7051      2497

2023-10-11 23:49:01,259 ----------------------------------------------------------------------------------------------------
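The summary rows of the final test report can be checked against the per-class rows. This sketch assumes the definitions used by scikit-learn style classification reports: each F1 is the harmonic mean 2PR / (P + R), `macro avg` F1 is the unweighted mean of the per-class F1 scores, and `weighted avg` F1 weights each class by its support. The inputs are the 4-decimal values printed in the table, so recomputed values can differ from the printed ones in the last digit (ORG, for example, recomputes to 0.5156 against a printed 0.5157). The names `REPORT`, `MICRO`, and `f1` are illustrative.

```python
# (precision, recall, f1-score, support) copied from the "By class" table.
REPORT = {
    "LOC": (0.6994, 0.7160, 0.7076, 1095),
    "PER": (0.7683, 0.7767, 0.7725, 1012),
    "ORG": (0.4524, 0.5994, 0.5157, 357),
    "HumanProd": (0.5349, 0.6970, 0.6053, 33),
}
# (precision, recall, f1-score) from the "micro avg" row.
MICRO = (0.6793, 0.7237, 0.7008)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Macro F1: unweighted mean of the per-class F1 scores.
macro_f1 = sum(row[2] for row in REPORT.values()) / len(REPORT)

# Weighted F1: per-class F1 weighted by support.
total_support = sum(row[3] for row in REPORT.values())
weighted_f1 = sum(row[2] * row[3] for row in REPORT.values()) / total_support

# Micro F1: harmonic mean of the pooled (micro) precision and recall.
micro_f1 = f1(MICRO[0], MICRO[1])

if __name__ == "__main__":
    for label, (p, r, reported, _) in REPORT.items():
        print(f"{label:>10}  recomputed {f1(p, r):.4f}  reported {reported:.4f}")
    print(f"total support {total_support}")
    print(f"micro f1 {micro_f1:.4f}  macro f1 {macro_f1:.4f}  weighted f1 {weighted_f1:.4f}")
```

Note that the printed macro F1 (0.6502) matches the mean of the per-class F1 scores, not the harmonic mean of the macro precision and recall (which would be about 0.653).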