2023-10-27 20:07:01,765 ----------------------------------------------------------------------------------------------------
2023-10-27 20:07:01,767 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): XLMRobertaEmbeddings(
        (word_embeddings): Embedding(250003, 1024)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): XLMRobertaEncoder(
        (layer): ModuleList(
          (0-23): 24 x XLMRobertaLayer(
            (attention): XLMRobertaAttention(
              (self): XLMRobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): XLMRobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): XLMRobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): XLMRobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): XLMRobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
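The module tree above gives every weight shape, so the parameter count can be worked out by hand. This sketch assumes the usual PyTorch conventions: `Linear(in, out, bias=True)` holds `in*out + out` parameters, `Embedding(n, d)` holds `n*d`, and an affine `LayerNorm(d)` holds `2*d`. Dropout, activation, and loss modules have no parameters. The pooler is counted because it appears in the printed model. The helper names are illustrative.

```python
def linear(in_features: int, out_features: int) -> int:
    # Weight matrix plus bias vector (every Linear above has bias=True).
    return in_features * out_features + out_features

def embedding(num: int, dim: int) -> int:
    return num * dim

def layer_norm(dim: int) -> int:
    # elementwise_affine=True: one scale and one shift per feature.
    return 2 * dim

HIDDEN, FFN, LAYERS, NUM_TAGS = 1024, 4096, 24, 17

embeddings = (
    embedding(250003, HIDDEN)    # word_embeddings
    + embedding(514, HIDDEN)     # position_embeddings
    + embedding(1, HIDDEN)       # token_type_embeddings
    + layer_norm(HIDDEN)
)

per_layer = (
    3 * linear(HIDDEN, HIDDEN)   # query, key, value
    + linear(HIDDEN, HIDDEN)     # attention output dense
    + layer_norm(HIDDEN)
    + linear(HIDDEN, FFN)        # intermediate dense
    + linear(FFN, HIDDEN)        # output dense
    + layer_norm(HIDDEN)
)

pooler = linear(HIDDEN, HIDDEN)
encoder_total = embeddings + LAYERS * per_layer + pooler
tagger_head = linear(HIDDEN, NUM_TAGS)   # the final (linear) layer over 17 tags

print(f"encoder:     {encoder_total:,}")
print(f"tagger head: {tagger_head:,}")
print(f"total:       {encoder_total + tagger_head:,}")
```

This prints 559,891,456 for the encoder (the roughly 560M parameters usually quoted for XLM-R large), 17,425 for the tagging head, and 559,908,881 in total. More than 45% of the encoder sits in the 250,003-entry word embedding table.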
2023-10-27 20:07:01,767 ----------------------------------------------------------------------------------------------------
2023-10-27 20:07:01,767 Corpus: 14903 train + 3449 dev + 3658 test sentences
2023-10-27 20:07:01,767 ----------------------------------------------------------------------------------------------------
2023-10-27 20:07:01,767 Train:  14903 sentences
2023-10-27 20:07:01,767         (train_with_dev=False, train_with_test=False)
2023-10-27 20:07:01,767 ----------------------------------------------------------------------------------------------------
2023-10-27 20:07:01,767 Training Params:
2023-10-27 20:07:01,767  - learning_rate: "5e-06" 
2023-10-27 20:07:01,767  - mini_batch_size: "4"
2023-10-27 20:07:01,767  - max_epochs: "10"
2023-10-27 20:07:01,767  - shuffle: "True"
2023-10-27 20:07:01,767 ----------------------------------------------------------------------------------------------------
2023-10-27 20:07:01,768 Plugins:
2023-10-27 20:07:01,768  - TensorboardLogger
2023-10-27 20:07:01,768  - LinearScheduler | warmup_fraction: '0.1'
2023-10-27 20:07:01,768 ----------------------------------------------------------------------------------------------------
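The iteration counts and the `lr` column in the epoch lines below follow from these settings. With 14903 training sentences and `mini_batch_size` 4, one epoch is ceil(14903 / 4) = 3726 mini-batches. That is the `/3726` in every progress line, and 10 epochs give 37260 steps. The sketch assumes that the `LinearScheduler` plugin ramps the rate linearly over the first `warmup_fraction` of all steps and then decays it linearly to zero, stepping once per mini-batch. This is an assumption, not code taken from Flair. It does reproduce the six-decimal `lr` values printed in the log, including where they flip (for example 0.000003 at epoch 6 iter 1860 and 0.000002 at iter 2232). The function names are hypothetical.

```python
import math

def iters_per_epoch(num_sentences: int, batch_size: int) -> int:
    # A final, partial mini-batch still counts as one iteration.
    return math.ceil(num_sentences / batch_size)

def linear_warmup_decay_lr(step: int, peak_lr: float, total_steps: int,
                           warmup_fraction: float) -> float:
    """Assumed schedule: 0 -> peak_lr during warmup, then peak_lr -> 0 linearly."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

ITERS = iters_per_epoch(14903, 4)   # 3726 mini-batches per epoch
TOTAL = 10 * ITERS                  # 37260 steps for max_epochs=10

def global_step(epoch: int, it: int) -> int:
    # Progress lines report a 1-based epoch and the iteration within that epoch.
    return (epoch - 1) * ITERS + it

for epoch, it in [(1, 744), (1, 3720), (2, 3720), (6, 1860), (6, 2232), (10, 3720)]:
    lr = linear_warmup_decay_lr(global_step(epoch, it), 5e-6, TOTAL, 0.1)
    print(f"epoch {epoch} - iter {it}/{ITERS} - lr: {lr:.6f}")
```

With `warmup_fraction` 0.1, the warmup covers exactly the first epoch (3726 steps). This is why the rate peaks at 0.000005 only at the end of epoch 1 and reaches 0.000000 during epoch 10.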
2023-10-27 20:07:01,768 Final evaluation on model from best epoch (best-model.pt)
2023-10-27 20:07:01,768  - metric: "('micro avg', 'f1-score')"
2023-10-27 20:07:01,768 ----------------------------------------------------------------------------------------------------
2023-10-27 20:07:01,768 Computation:
2023-10-27 20:07:01,768  - compute on device: cuda:0
2023-10-27 20:07:01,768  - embedding storage: none
2023-10-27 20:07:01,768 ----------------------------------------------------------------------------------------------------
2023-10-27 20:07:01,768 Model training base path: "flair-clean-conll-lr5e-06-bs4-5"
2023-10-27 20:07:01,768 ----------------------------------------------------------------------------------------------------
2023-10-27 20:07:01,768 ----------------------------------------------------------------------------------------------------
2023-10-27 20:07:01,768 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-27 20:07:47,370 epoch 1 - iter 372/3726 - loss 2.80593129 - time (sec): 45.60 - samples/sec: 438.05 - lr: 0.000000 - momentum: 0.000000
2023-10-27 20:08:32,644 epoch 1 - iter 744/3726 - loss 1.85094677 - time (sec): 90.87 - samples/sec: 444.69 - lr: 0.000001 - momentum: 0.000000
2023-10-27 20:09:17,954 epoch 1 - iter 1116/3726 - loss 1.42058830 - time (sec): 136.18 - samples/sec: 444.93 - lr: 0.000001 - momentum: 0.000000
2023-10-27 20:10:03,236 epoch 1 - iter 1488/3726 - loss 1.16393809 - time (sec): 181.47 - samples/sec: 444.38 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:10:48,524 epoch 1 - iter 1860/3726 - loss 0.98551923 - time (sec): 226.75 - samples/sec: 446.04 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:11:33,918 epoch 1 - iter 2232/3726 - loss 0.84790959 - time (sec): 272.15 - samples/sec: 450.27 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:12:19,596 epoch 1 - iter 2604/3726 - loss 0.74598188 - time (sec): 317.83 - samples/sec: 451.02 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:13:05,071 epoch 1 - iter 2976/3726 - loss 0.66540477 - time (sec): 363.30 - samples/sec: 451.42 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:13:50,701 epoch 1 - iter 3348/3726 - loss 0.60748783 - time (sec): 408.93 - samples/sec: 449.21 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:14:35,944 epoch 1 - iter 3720/3726 - loss 0.55800133 - time (sec): 454.17 - samples/sec: 449.55 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:14:36,662 ----------------------------------------------------------------------------------------------------
2023-10-27 20:14:36,663 EPOCH 1 done: loss 0.5571 - lr: 0.000005
2023-10-27 20:15:00,976 DEV : loss 0.08272701501846313 - f1-score (micro avg)  0.9305
2023-10-27 20:15:01,029 saving best model
2023-10-27 20:15:02,837 ----------------------------------------------------------------------------------------------------
2023-10-27 20:15:49,496 epoch 2 - iter 372/3726 - loss 0.09075236 - time (sec): 46.66 - samples/sec: 446.77 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:16:35,366 epoch 2 - iter 744/3726 - loss 0.09661752 - time (sec): 92.53 - samples/sec: 440.88 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:17:20,750 epoch 2 - iter 1116/3726 - loss 0.09592189 - time (sec): 137.91 - samples/sec: 442.98 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:18:05,959 epoch 2 - iter 1488/3726 - loss 0.09322980 - time (sec): 183.12 - samples/sec: 443.25 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:18:51,330 epoch 2 - iter 1860/3726 - loss 0.08941354 - time (sec): 228.49 - samples/sec: 443.63 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:19:36,865 epoch 2 - iter 2232/3726 - loss 0.08782526 - time (sec): 274.03 - samples/sec: 444.12 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:20:21,760 epoch 2 - iter 2604/3726 - loss 0.08763755 - time (sec): 318.92 - samples/sec: 447.07 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:21:07,307 epoch 2 - iter 2976/3726 - loss 0.08459677 - time (sec): 364.47 - samples/sec: 449.80 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:21:52,612 epoch 2 - iter 3348/3726 - loss 0.08200724 - time (sec): 409.77 - samples/sec: 449.39 - lr: 0.000005 - momentum: 0.000000
2023-10-27 20:22:37,887 epoch 2 - iter 3720/3726 - loss 0.08090953 - time (sec): 455.05 - samples/sec: 448.96 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:22:38,602 ----------------------------------------------------------------------------------------------------
2023-10-27 20:22:38,603 EPOCH 2 done: loss 0.0808 - lr: 0.000004
2023-10-27 20:23:01,816 DEV : loss 0.0558977946639061 - f1-score (micro avg)  0.9643
2023-10-27 20:23:01,871 saving best model
2023-10-27 20:23:04,562 ----------------------------------------------------------------------------------------------------
2023-10-27 20:23:50,075 epoch 3 - iter 372/3726 - loss 0.04925132 - time (sec): 45.51 - samples/sec: 436.29 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:24:35,505 epoch 3 - iter 744/3726 - loss 0.05096433 - time (sec): 90.94 - samples/sec: 441.79 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:25:21,376 epoch 3 - iter 1116/3726 - loss 0.05345821 - time (sec): 136.81 - samples/sec: 444.14 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:26:06,796 epoch 3 - iter 1488/3726 - loss 0.05364040 - time (sec): 182.23 - samples/sec: 444.47 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:26:52,221 epoch 3 - iter 1860/3726 - loss 0.05380637 - time (sec): 227.66 - samples/sec: 447.62 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:27:37,903 epoch 3 - iter 2232/3726 - loss 0.05332169 - time (sec): 273.34 - samples/sec: 448.66 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:28:24,443 epoch 3 - iter 2604/3726 - loss 0.05365144 - time (sec): 319.88 - samples/sec: 446.15 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:29:09,620 epoch 3 - iter 2976/3726 - loss 0.05262140 - time (sec): 365.06 - samples/sec: 447.36 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:29:55,368 epoch 3 - iter 3348/3726 - loss 0.05205947 - time (sec): 410.80 - samples/sec: 448.23 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:30:40,872 epoch 3 - iter 3720/3726 - loss 0.05122877 - time (sec): 456.31 - samples/sec: 447.79 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:30:41,552 ----------------------------------------------------------------------------------------------------
2023-10-27 20:30:41,553 EPOCH 3 done: loss 0.0512 - lr: 0.000004
2023-10-27 20:31:04,451 DEV : loss 0.04910625144839287 - f1-score (micro avg)  0.969
2023-10-27 20:31:04,505 saving best model
2023-10-27 20:31:07,070 ----------------------------------------------------------------------------------------------------
2023-10-27 20:31:52,530 epoch 4 - iter 372/3726 - loss 0.03342383 - time (sec): 45.46 - samples/sec: 455.54 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:32:37,688 epoch 4 - iter 744/3726 - loss 0.03205166 - time (sec): 90.62 - samples/sec: 456.58 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:33:24,378 epoch 4 - iter 1116/3726 - loss 0.03357460 - time (sec): 137.31 - samples/sec: 451.87 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:34:10,145 epoch 4 - iter 1488/3726 - loss 0.03617981 - time (sec): 183.07 - samples/sec: 454.34 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:34:56,203 epoch 4 - iter 1860/3726 - loss 0.03561724 - time (sec): 229.13 - samples/sec: 450.05 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:35:42,349 epoch 4 - iter 2232/3726 - loss 0.03513961 - time (sec): 275.28 - samples/sec: 447.95 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:36:27,872 epoch 4 - iter 2604/3726 - loss 0.03560568 - time (sec): 320.80 - samples/sec: 446.14 - lr: 0.000004 - momentum: 0.000000
2023-10-27 20:37:13,373 epoch 4 - iter 2976/3726 - loss 0.03572121 - time (sec): 366.30 - samples/sec: 445.85 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:37:59,045 epoch 4 - iter 3348/3726 - loss 0.03483182 - time (sec): 411.97 - samples/sec: 446.11 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:38:44,300 epoch 4 - iter 3720/3726 - loss 0.03483957 - time (sec): 457.23 - samples/sec: 447.06 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:38:44,988 ----------------------------------------------------------------------------------------------------
2023-10-27 20:38:44,988 EPOCH 4 done: loss 0.0348 - lr: 0.000003
2023-10-27 20:39:07,891 DEV : loss 0.04652674123644829 - f1-score (micro avg)  0.9705
2023-10-27 20:39:07,943 saving best model
2023-10-27 20:39:10,583 ----------------------------------------------------------------------------------------------------
2023-10-27 20:39:56,189 epoch 5 - iter 372/3726 - loss 0.03173966 - time (sec): 45.60 - samples/sec: 442.22 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:40:42,588 epoch 5 - iter 744/3726 - loss 0.03324355 - time (sec): 92.00 - samples/sec: 441.81 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:41:28,491 epoch 5 - iter 1116/3726 - loss 0.03176114 - time (sec): 137.91 - samples/sec: 446.35 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:42:14,724 epoch 5 - iter 1488/3726 - loss 0.02967819 - time (sec): 184.14 - samples/sec: 446.19 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:42:59,527 epoch 5 - iter 1860/3726 - loss 0.03079490 - time (sec): 228.94 - samples/sec: 446.65 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:43:44,886 epoch 5 - iter 2232/3726 - loss 0.02966149 - time (sec): 274.30 - samples/sec: 445.62 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:44:30,379 epoch 5 - iter 2604/3726 - loss 0.03065570 - time (sec): 319.79 - samples/sec: 446.65 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:45:15,785 epoch 5 - iter 2976/3726 - loss 0.03042225 - time (sec): 365.20 - samples/sec: 446.89 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:46:01,755 epoch 5 - iter 3348/3726 - loss 0.03020845 - time (sec): 411.17 - samples/sec: 446.33 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:46:47,177 epoch 5 - iter 3720/3726 - loss 0.02983678 - time (sec): 456.59 - samples/sec: 447.55 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:46:47,923 ----------------------------------------------------------------------------------------------------
2023-10-27 20:46:47,924 EPOCH 5 done: loss 0.0299 - lr: 0.000003
2023-10-27 20:47:10,884 DEV : loss 0.050089359283447266 - f1-score (micro avg)  0.9712
2023-10-27 20:47:10,938 saving best model
2023-10-27 20:47:13,597 ----------------------------------------------------------------------------------------------------
2023-10-27 20:47:59,020 epoch 6 - iter 372/3726 - loss 0.02248010 - time (sec): 45.42 - samples/sec: 453.45 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:48:44,494 epoch 6 - iter 744/3726 - loss 0.01889729 - time (sec): 90.89 - samples/sec: 449.45 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:49:30,409 epoch 6 - iter 1116/3726 - loss 0.01902198 - time (sec): 136.81 - samples/sec: 446.61 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:50:16,297 epoch 6 - iter 1488/3726 - loss 0.02004643 - time (sec): 182.70 - samples/sec: 443.97 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:51:01,691 epoch 6 - iter 1860/3726 - loss 0.01983142 - time (sec): 228.09 - samples/sec: 444.93 - lr: 0.000003 - momentum: 0.000000
2023-10-27 20:51:47,098 epoch 6 - iter 2232/3726 - loss 0.02028738 - time (sec): 273.50 - samples/sec: 446.24 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:52:33,330 epoch 6 - iter 2604/3726 - loss 0.02002001 - time (sec): 319.73 - samples/sec: 446.18 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:53:19,605 epoch 6 - iter 2976/3726 - loss 0.02081204 - time (sec): 366.01 - samples/sec: 445.36 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:54:05,958 epoch 6 - iter 3348/3726 - loss 0.02032741 - time (sec): 412.36 - samples/sec: 445.83 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:54:52,761 epoch 6 - iter 3720/3726 - loss 0.02044719 - time (sec): 459.16 - samples/sec: 444.72 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:54:53,517 ----------------------------------------------------------------------------------------------------
2023-10-27 20:54:53,517 EPOCH 6 done: loss 0.0205 - lr: 0.000002
2023-10-27 20:55:17,099 DEV : loss 0.04764683172106743 - f1-score (micro avg)  0.9742
2023-10-27 20:55:17,154 saving best model
2023-10-27 20:55:19,755 ----------------------------------------------------------------------------------------------------
2023-10-27 20:56:05,378 epoch 7 - iter 372/3726 - loss 0.02366929 - time (sec): 45.62 - samples/sec: 447.27 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:56:51,118 epoch 7 - iter 744/3726 - loss 0.02311710 - time (sec): 91.36 - samples/sec: 439.82 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:57:36,527 epoch 7 - iter 1116/3726 - loss 0.02129467 - time (sec): 136.77 - samples/sec: 445.39 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:58:21,642 epoch 7 - iter 1488/3726 - loss 0.02001426 - time (sec): 181.88 - samples/sec: 447.89 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:59:08,061 epoch 7 - iter 1860/3726 - loss 0.01894813 - time (sec): 228.30 - samples/sec: 445.16 - lr: 0.000002 - momentum: 0.000000
2023-10-27 20:59:53,051 epoch 7 - iter 2232/3726 - loss 0.01829151 - time (sec): 273.29 - samples/sec: 443.22 - lr: 0.000002 - momentum: 0.000000
2023-10-27 21:00:38,919 epoch 7 - iter 2604/3726 - loss 0.01783981 - time (sec): 319.16 - samples/sec: 442.58 - lr: 0.000002 - momentum: 0.000000
2023-10-27 21:01:26,075 epoch 7 - iter 2976/3726 - loss 0.01776618 - time (sec): 366.32 - samples/sec: 442.68 - lr: 0.000002 - momentum: 0.000000
2023-10-27 21:02:13,955 epoch 7 - iter 3348/3726 - loss 0.01772398 - time (sec): 414.20 - samples/sec: 442.24 - lr: 0.000002 - momentum: 0.000000
2023-10-27 21:03:01,418 epoch 7 - iter 3720/3726 - loss 0.01723102 - time (sec): 461.66 - samples/sec: 442.49 - lr: 0.000002 - momentum: 0.000000
2023-10-27 21:03:02,183 ----------------------------------------------------------------------------------------------------
2023-10-27 21:03:02,184 EPOCH 7 done: loss 0.0177 - lr: 0.000002
2023-10-27 21:03:26,361 DEV : loss 0.05419960245490074 - f1-score (micro avg)  0.9746
2023-10-27 21:03:26,416 saving best model
2023-10-27 21:03:29,497 ----------------------------------------------------------------------------------------------------
2023-10-27 21:04:16,746 epoch 8 - iter 372/3726 - loss 0.01736122 - time (sec): 47.25 - samples/sec: 425.05 - lr: 0.000002 - momentum: 0.000000
2023-10-27 21:05:04,978 epoch 8 - iter 744/3726 - loss 0.01398385 - time (sec): 95.48 - samples/sec: 422.25 - lr: 0.000002 - momentum: 0.000000
2023-10-27 21:05:52,318 epoch 8 - iter 1116/3726 - loss 0.01274088 - time (sec): 142.82 - samples/sec: 424.69 - lr: 0.000002 - momentum: 0.000000
2023-10-27 21:06:39,260 epoch 8 - iter 1488/3726 - loss 0.01328050 - time (sec): 189.76 - samples/sec: 424.24 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:07:26,410 epoch 8 - iter 1860/3726 - loss 0.01227844 - time (sec): 236.91 - samples/sec: 427.47 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:08:13,172 epoch 8 - iter 2232/3726 - loss 0.01171643 - time (sec): 283.67 - samples/sec: 428.79 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:09:00,982 epoch 8 - iter 2604/3726 - loss 0.01235731 - time (sec): 331.48 - samples/sec: 428.53 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:09:51,061 epoch 8 - iter 2976/3726 - loss 0.01221098 - time (sec): 381.56 - samples/sec: 426.25 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:10:41,073 epoch 8 - iter 3348/3726 - loss 0.01227301 - time (sec): 431.57 - samples/sec: 426.32 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:11:30,225 epoch 8 - iter 3720/3726 - loss 0.01200497 - time (sec): 480.72 - samples/sec: 424.99 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:11:31,013 ----------------------------------------------------------------------------------------------------
2023-10-27 21:11:31,013 EPOCH 8 done: loss 0.0120 - lr: 0.000001
2023-10-27 21:11:56,731 DEV : loss 0.05550903454422951 - f1-score (micro avg)  0.9746
2023-10-27 21:11:56,806 ----------------------------------------------------------------------------------------------------
2023-10-27 21:12:46,264 epoch 9 - iter 372/3726 - loss 0.00563766 - time (sec): 49.46 - samples/sec: 405.90 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:13:36,129 epoch 9 - iter 744/3726 - loss 0.00454582 - time (sec): 99.32 - samples/sec: 411.65 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:14:26,227 epoch 9 - iter 1116/3726 - loss 0.00553718 - time (sec): 149.42 - samples/sec: 408.79 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:15:15,608 epoch 9 - iter 1488/3726 - loss 0.00675128 - time (sec): 198.80 - samples/sec: 409.19 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:16:05,355 epoch 9 - iter 1860/3726 - loss 0.00722006 - time (sec): 248.55 - samples/sec: 412.39 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:16:54,643 epoch 9 - iter 2232/3726 - loss 0.00736249 - time (sec): 297.83 - samples/sec: 411.62 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:17:45,323 epoch 9 - iter 2604/3726 - loss 0.00786494 - time (sec): 348.51 - samples/sec: 410.57 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:18:35,586 epoch 9 - iter 2976/3726 - loss 0.00784383 - time (sec): 398.78 - samples/sec: 410.83 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:19:25,103 epoch 9 - iter 3348/3726 - loss 0.00763218 - time (sec): 448.29 - samples/sec: 409.68 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:20:15,011 epoch 9 - iter 3720/3726 - loss 0.00729659 - time (sec): 498.20 - samples/sec: 409.86 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:20:15,793 ----------------------------------------------------------------------------------------------------
2023-10-27 21:20:15,793 EPOCH 9 done: loss 0.0073 - lr: 0.000001
2023-10-27 21:20:41,491 DEV : loss 0.056521423161029816 - f1-score (micro avg)  0.9737
2023-10-27 21:20:41,559 ----------------------------------------------------------------------------------------------------
2023-10-27 21:21:31,209 epoch 10 - iter 372/3726 - loss 0.00974983 - time (sec): 49.65 - samples/sec: 405.18 - lr: 0.000001 - momentum: 0.000000
2023-10-27 21:22:20,871 epoch 10 - iter 744/3726 - loss 0.00598632 - time (sec): 99.31 - samples/sec: 409.05 - lr: 0.000000 - momentum: 0.000000
2023-10-27 21:23:10,531 epoch 10 - iter 1116/3726 - loss 0.00650431 - time (sec): 148.97 - samples/sec: 415.14 - lr: 0.000000 - momentum: 0.000000
2023-10-27 21:24:00,035 epoch 10 - iter 1488/3726 - loss 0.00622288 - time (sec): 198.47 - samples/sec: 415.10 - lr: 0.000000 - momentum: 0.000000
2023-10-27 21:24:50,574 epoch 10 - iter 1860/3726 - loss 0.00650968 - time (sec): 249.01 - samples/sec: 412.96 - lr: 0.000000 - momentum: 0.000000
2023-10-27 21:25:40,130 epoch 10 - iter 2232/3726 - loss 0.00707901 - time (sec): 298.57 - samples/sec: 412.19 - lr: 0.000000 - momentum: 0.000000
2023-10-27 21:26:30,069 epoch 10 - iter 2604/3726 - loss 0.00707633 - time (sec): 348.51 - samples/sec: 408.38 - lr: 0.000000 - momentum: 0.000000
2023-10-27 21:27:20,314 epoch 10 - iter 2976/3726 - loss 0.00672126 - time (sec): 398.75 - samples/sec: 409.67 - lr: 0.000000 - momentum: 0.000000
2023-10-27 21:28:09,594 epoch 10 - iter 3348/3726 - loss 0.00659107 - time (sec): 448.03 - samples/sec: 408.65 - lr: 0.000000 - momentum: 0.000000
2023-10-27 21:28:58,954 epoch 10 - iter 3720/3726 - loss 0.00650167 - time (sec): 497.39 - samples/sec: 410.70 - lr: 0.000000 - momentum: 0.000000
2023-10-27 21:28:59,752 ----------------------------------------------------------------------------------------------------
2023-10-27 21:28:59,752 EPOCH 10 done: loss 0.0065 - lr: 0.000000
2023-10-27 21:29:25,442 DEV : loss 0.05730742961168289 - f1-score (micro avg)  0.9744
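The "saving best model" lines appear after epochs 1 through 7 but not after epoch 8, even though epochs 7 and 8 both print a dev micro-F1 of 0.9746. This pattern fits a rule that saves only on a strict improvement of the selection metric (`('micro avg', 'f1-score')`). The sketch below replays that assumed rule over the logged dev scores. The scores are rounded to four decimals, so the replay shows consistency with the log rather than proof of Flair's internal comparison. The function name is hypothetical.

```python
# Dev micro-F1 per epoch, copied from the "DEV :" lines above.
dev_f1 = [0.9305, 0.9643, 0.969, 0.9705, 0.9712,
          0.9742, 0.9746, 0.9746, 0.9737, 0.9744]

def checkpoint_epochs(scores):
    """Return the 1-based epochs at which a strictly better score was seen."""
    saved, best = [], float("-inf")
    for epoch, score in enumerate(scores, start=1):
        if score > best:  # a tie does not replace the stored checkpoint
            best = score
            saved.append(epoch)
    return saved

saved = checkpoint_epochs(dev_f1)
print("saved after epochs:", saved)
print("best epoch:", saved[-1])
```

Under this rule, epoch 7 provides the `best-model.pt` that is reloaded for the final test evaluation.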
2023-10-27 21:29:28,531 ----------------------------------------------------------------------------------------------------
2023-10-27 21:29:28,534 Loading model from best epoch ...
2023-10-27 21:29:38,713 SequenceTagger predicts: Dictionary with 17 tags: O, S-ORG, B-ORG, E-ORG, I-ORG, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-MISC, B-MISC, E-MISC, I-MISC
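The 17 tags are `O` plus the four BIOES prefixes (`S` single token, `B` begin, `I` inside, `E` end) for each of the four CoNLL entity types: 1 + 4 × 4 = 17. This matches `out_features=17` on the final linear layer. The sketch below rebuilds the dictionary in the printed order and decodes a BIOES tag sequence into entity spans. The decoder is illustrative: it drops malformed fragments (an `I-` or `E-` without an open `B-` of the same type), which is one possible policy and not necessarily Flair's. The sentence and tags are made up.

```python
from typing import List, Tuple

ENTITY_TYPES = ["ORG", "PER", "LOC", "MISC"]
PREFIXES = ["S", "B", "E", "I"]

# Same order as the dictionary printed in the log: O, then S/B/E/I per type.
TAGS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in PREFIXES]

def bioes_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode BIOES tags into (start, end_exclusive, type) spans.

    Malformed fragments (I-/E- without an open B- of the same type) are dropped.
    """
    spans: List[Tuple[int, int, str]] = []
    start, open_type = None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, etype))
            start, open_type = None, None
        elif prefix == "B":
            start, open_type = i, etype
        elif prefix in ("I", "E") and open_type == etype:
            if prefix == "E":
                spans.append((start, i + 1, etype))
                start, open_type = None, None
        else:
            # "O" or a malformed tag closes any unfinished entity without emitting it.
            start, open_type = None, None
    return spans

tokens = ["Peter", "Blackburn", "flew", "to", "New", "York", "City", "with", "EU", "officials"]
tags = ["B-PER", "E-PER", "O", "O", "B-LOC", "I-LOC", "E-LOC", "O", "S-ORG", "O"]
print(len(TAGS), TAGS[:5])
for s, e, t in bioes_to_spans(tags):
    print(t, " ".join(tokens[s:e]))
```

The loop prints `PER Peter Blackburn`, `LOC New York City`, and `ORG EU`. Span-level decoding like this is what the per-class entity scores below are based on, as opposed to per-token tag accuracy.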
2023-10-27 21:30:03,801 
Results:
- F-score (micro) 0.9699
- F-score (macro) 0.9647
- Accuracy 0.9567

By class:
              precision    recall  f1-score   support

         ORG     0.9662    0.9738    0.9700      1909
         PER     0.9956    0.9937    0.9947      1591
         LOC     0.9701    0.9632    0.9666      1413
        MISC     0.9170    0.9384    0.9276       812

   micro avg     0.9682    0.9717    0.9699      5725
   macro avg     0.9622    0.9673    0.9647      5725
weighted avg     0.9683    0.9717    0.9700      5725
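The three summary rows aggregate the per-class rows differently:

* The macro average is the unweighted mean of the four per-class scores.
* The weighted average weights each class by its support.
* The micro average pools matches over all 5725 gold entities before computing precision and recall, so frequent classes such as ORG dominate it.

Micro-F1 is the metric used for model selection above. The sketch below recomputes the F1 column of the summary rows from the table. The table values are rounded to four decimals, so the results agree with the log only up to that rounding.

```python
# Per-class rows from the table above: (precision, recall, f1, support).
by_class = {
    "ORG":  (0.9662, 0.9738, 0.9700, 1909),
    "PER":  (0.9956, 0.9937, 0.9947, 1591),
    "LOC":  (0.9701, 0.9632, 0.9666, 1413),
    "MISC": (0.9170, 0.9384, 0.9276,  812),
}

def f1(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if p + r else 0.0

support = sum(row[3] for row in by_class.values())                        # 5725
macro_f1 = sum(row[2] for row in by_class.values()) / len(by_class)        # unweighted mean
weighted_f1 = sum(row[2] * row[3] for row in by_class.values()) / support  # support-weighted
micro_f1 = f1(0.9682, 0.9717)   # from the pooled "micro avg" precision and recall

print(f"support={support} macro={macro_f1:.4f} weighted={weighted_f1:.4f} micro={micro_f1:.4f}")
```

This prints `support=5725 macro=0.9647 weighted=0.9700 micro=0.9699`, which matches the F-score summary. The macro score sits below the micro score mainly because MISC, the smallest and weakest class (F1 0.9276), counts as a full quarter of the macro mean.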

2023-10-27 21:30:03,801 ----------------------------------------------------------------------------------------------------