2023-10-18 19:25:43,700 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,700 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 128)
        (position_embeddings): Embedding(512, 128)
        (token_type_embeddings): Embedding(2, 128)
        (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-1): 2 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=128, out_features=128, bias=True)
                (key): Linear(in_features=128, out_features=128, bias=True)
                (value): Linear(in_features=128, out_features=128, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=128, out_features=128, bias=True)
                (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=128, out_features=512, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=512, out_features=128, bias=True)
              (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=128, out_features=128, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=128, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-18 19:25:43,700 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,700 MultiCorpus: 5901 train + 1287 dev + 1505 test sentences
 - NER_HIPE_2022 Corpus: 5901 train + 1287 dev + 1505 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/fr/with_doc_seperator
2023-10-18 19:25:43,700 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,700 Train:  5901 sentences
2023-10-18 19:25:43,700         (train_with_dev=False, train_with_test=False)
2023-10-18 19:25:43,701 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,701 Training Params:
2023-10-18 19:25:43,701  - learning_rate: "5e-05" 
2023-10-18 19:25:43,701  - mini_batch_size: "4"
2023-10-18 19:25:43,701  - max_epochs: "10"
2023-10-18 19:25:43,701  - shuffle: "True"
2023-10-18 19:25:43,701 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,701 Plugins:
2023-10-18 19:25:43,701  - TensorboardLogger
2023-10-18 19:25:43,701  - LinearScheduler | warmup_fraction: '0.1'
2023-10-18 19:25:43,701 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,701 Final evaluation on model from best epoch (best-model.pt)
2023-10-18 19:25:43,701  - metric: "('micro avg', 'f1-score')"
2023-10-18 19:25:43,701 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,701 Computation:
2023-10-18 19:25:43,701  - compute on device: cuda:0
2023-10-18 19:25:43,701  - embedding storage: none
2023-10-18 19:25:43,701 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,701 Model training base path: "hmbench-hipe2020/fr-dbmdz/bert-tiny-historic-multilingual-cased-bs4-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-18 19:25:43,701 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,701 ----------------------------------------------------------------------------------------------------
2023-10-18 19:25:43,701 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-18 19:25:46,078 epoch 1 - iter 147/1476 - loss 3.45082650 - time (sec): 2.38 - samples/sec: 6855.90 - lr: 0.000005 - momentum: 0.000000
2023-10-18 19:25:48,564 epoch 1 - iter 294/1476 - loss 2.95264139 - time (sec): 4.86 - samples/sec: 6711.01 - lr: 0.000010 - momentum: 0.000000
2023-10-18 19:25:50,700 epoch 1 - iter 441/1476 - loss 2.30518090 - time (sec): 7.00 - samples/sec: 7409.64 - lr: 0.000015 - momentum: 0.000000
2023-10-18 19:25:52,905 epoch 1 - iter 588/1476 - loss 1.93520085 - time (sec): 9.20 - samples/sec: 7380.55 - lr: 0.000020 - momentum: 0.000000
2023-10-18 19:25:55,256 epoch 1 - iter 735/1476 - loss 1.69071046 - time (sec): 11.55 - samples/sec: 7222.26 - lr: 0.000025 - momentum: 0.000000
2023-10-18 19:25:57,580 epoch 1 - iter 882/1476 - loss 1.53189135 - time (sec): 13.88 - samples/sec: 7155.68 - lr: 0.000030 - momentum: 0.000000
2023-10-18 19:25:59,878 epoch 1 - iter 1029/1476 - loss 1.39345233 - time (sec): 16.18 - samples/sec: 7142.25 - lr: 0.000035 - momentum: 0.000000
2023-10-18 19:26:02,193 epoch 1 - iter 1176/1476 - loss 1.29402372 - time (sec): 18.49 - samples/sec: 7083.22 - lr: 0.000040 - momentum: 0.000000
2023-10-18 19:26:04,530 epoch 1 - iter 1323/1476 - loss 1.20850834 - time (sec): 20.83 - samples/sec: 7090.57 - lr: 0.000045 - momentum: 0.000000
2023-10-18 19:26:06,892 epoch 1 - iter 1470/1476 - loss 1.12176032 - time (sec): 23.19 - samples/sec: 7158.61 - lr: 0.000050 - momentum: 0.000000
2023-10-18 19:26:06,982 ----------------------------------------------------------------------------------------------------
2023-10-18 19:26:06,982 EPOCH 1 done: loss 1.1200 - lr: 0.000050
2023-10-18 19:26:09,849 DEV : loss 0.3651462495326996 - f1-score (micro avg)  0.2263
2023-10-18 19:26:09,874 saving best model
2023-10-18 19:26:09,903 ----------------------------------------------------------------------------------------------------
2023-10-18 19:26:12,157 epoch 2 - iter 147/1476 - loss 0.43364940 - time (sec): 2.25 - samples/sec: 6739.81 - lr: 0.000049 - momentum: 0.000000
2023-10-18 19:26:14,566 epoch 2 - iter 294/1476 - loss 0.44810793 - time (sec): 4.66 - samples/sec: 7098.25 - lr: 0.000049 - momentum: 0.000000
2023-10-18 19:26:16,919 epoch 2 - iter 441/1476 - loss 0.45228143 - time (sec): 7.02 - samples/sec: 7106.62 - lr: 0.000048 - momentum: 0.000000
2023-10-18 19:26:19,206 epoch 2 - iter 588/1476 - loss 0.44605435 - time (sec): 9.30 - samples/sec: 7022.19 - lr: 0.000048 - momentum: 0.000000
2023-10-18 19:26:21,537 epoch 2 - iter 735/1476 - loss 0.44203136 - time (sec): 11.63 - samples/sec: 6960.52 - lr: 0.000047 - momentum: 0.000000
2023-10-18 19:26:23,893 epoch 2 - iter 882/1476 - loss 0.42953157 - time (sec): 13.99 - samples/sec: 6888.04 - lr: 0.000047 - momentum: 0.000000
2023-10-18 19:26:26,221 epoch 2 - iter 1029/1476 - loss 0.42536159 - time (sec): 16.32 - samples/sec: 6913.01 - lr: 0.000046 - momentum: 0.000000
2023-10-18 19:26:28,640 epoch 2 - iter 1176/1476 - loss 0.41686008 - time (sec): 18.74 - samples/sec: 6973.24 - lr: 0.000046 - momentum: 0.000000
2023-10-18 19:26:31,053 epoch 2 - iter 1323/1476 - loss 0.40319699 - time (sec): 21.15 - samples/sec: 7020.40 - lr: 0.000045 - momentum: 0.000000
2023-10-18 19:26:33,422 epoch 2 - iter 1470/1476 - loss 0.39495645 - time (sec): 23.52 - samples/sec: 7055.20 - lr: 0.000044 - momentum: 0.000000
2023-10-18 19:26:33,504 ----------------------------------------------------------------------------------------------------
2023-10-18 19:26:33,504 EPOCH 2 done: loss 0.3944 - lr: 0.000044
2023-10-18 19:26:40,653 DEV : loss 0.27972713112831116 - f1-score (micro avg)  0.4533
2023-10-18 19:26:40,679 saving best model
2023-10-18 19:26:40,712 ----------------------------------------------------------------------------------------------------
2023-10-18 19:26:42,802 epoch 3 - iter 147/1476 - loss 0.31787561 - time (sec): 2.09 - samples/sec: 7349.79 - lr: 0.000044 - momentum: 0.000000
2023-10-18 19:26:44,882 epoch 3 - iter 294/1476 - loss 0.31847873 - time (sec): 4.17 - samples/sec: 7727.01 - lr: 0.000043 - momentum: 0.000000
2023-10-18 19:26:46,953 epoch 3 - iter 441/1476 - loss 0.32050732 - time (sec): 6.24 - samples/sec: 7800.13 - lr: 0.000043 - momentum: 0.000000
2023-10-18 19:26:49,069 epoch 3 - iter 588/1476 - loss 0.33945722 - time (sec): 8.36 - samples/sec: 7971.13 - lr: 0.000042 - momentum: 0.000000
2023-10-18 19:26:51,403 epoch 3 - iter 735/1476 - loss 0.33243206 - time (sec): 10.69 - samples/sec: 7886.43 - lr: 0.000042 - momentum: 0.000000
2023-10-18 19:26:53,795 epoch 3 - iter 882/1476 - loss 0.33011776 - time (sec): 13.08 - samples/sec: 7712.96 - lr: 0.000041 - momentum: 0.000000
2023-10-18 19:26:56,101 epoch 3 - iter 1029/1476 - loss 0.32825472 - time (sec): 15.39 - samples/sec: 7623.57 - lr: 0.000041 - momentum: 0.000000
2023-10-18 19:26:58,429 epoch 3 - iter 1176/1476 - loss 0.32855013 - time (sec): 17.72 - samples/sec: 7510.53 - lr: 0.000040 - momentum: 0.000000
2023-10-18 19:27:00,729 epoch 3 - iter 1323/1476 - loss 0.32623173 - time (sec): 20.02 - samples/sec: 7447.76 - lr: 0.000039 - momentum: 0.000000
2023-10-18 19:27:03,055 epoch 3 - iter 1470/1476 - loss 0.32429762 - time (sec): 22.34 - samples/sec: 7420.44 - lr: 0.000039 - momentum: 0.000000
2023-10-18 19:27:03,150 ----------------------------------------------------------------------------------------------------
2023-10-18 19:27:03,150 EPOCH 3 done: loss 0.3243 - lr: 0.000039
2023-10-18 19:27:10,343 DEV : loss 0.2565878927707672 - f1-score (micro avg)  0.484
2023-10-18 19:27:10,369 saving best model
2023-10-18 19:27:10,401 ----------------------------------------------------------------------------------------------------
2023-10-18 19:27:12,751 epoch 4 - iter 147/1476 - loss 0.31044806 - time (sec): 2.35 - samples/sec: 7943.66 - lr: 0.000038 - momentum: 0.000000
2023-10-18 19:27:15,078 epoch 4 - iter 294/1476 - loss 0.30521810 - time (sec): 4.68 - samples/sec: 7449.04 - lr: 0.000038 - momentum: 0.000000
2023-10-18 19:27:17,467 epoch 4 - iter 441/1476 - loss 0.30286129 - time (sec): 7.06 - samples/sec: 7165.86 - lr: 0.000037 - momentum: 0.000000
2023-10-18 19:27:19,860 epoch 4 - iter 588/1476 - loss 0.29258445 - time (sec): 9.46 - samples/sec: 7050.26 - lr: 0.000037 - momentum: 0.000000
2023-10-18 19:27:22,211 epoch 4 - iter 735/1476 - loss 0.29065507 - time (sec): 11.81 - samples/sec: 7004.00 - lr: 0.000036 - momentum: 0.000000
2023-10-18 19:27:24,484 epoch 4 - iter 882/1476 - loss 0.28962621 - time (sec): 14.08 - samples/sec: 6895.49 - lr: 0.000036 - momentum: 0.000000
2023-10-18 19:27:26,848 epoch 4 - iter 1029/1476 - loss 0.28847591 - time (sec): 16.45 - samples/sec: 7154.21 - lr: 0.000035 - momentum: 0.000000
2023-10-18 19:27:29,219 epoch 4 - iter 1176/1476 - loss 0.28570559 - time (sec): 18.82 - samples/sec: 7139.19 - lr: 0.000034 - momentum: 0.000000
2023-10-18 19:27:31,547 epoch 4 - iter 1323/1476 - loss 0.28152220 - time (sec): 21.14 - samples/sec: 7085.25 - lr: 0.000034 - momentum: 0.000000
2023-10-18 19:27:33,865 epoch 4 - iter 1470/1476 - loss 0.28053742 - time (sec): 23.46 - samples/sec: 7069.46 - lr: 0.000033 - momentum: 0.000000
2023-10-18 19:27:33,955 ----------------------------------------------------------------------------------------------------
2023-10-18 19:27:33,955 EPOCH 4 done: loss 0.2808 - lr: 0.000033
2023-10-18 19:27:41,133 DEV : loss 0.25001299381256104 - f1-score (micro avg)  0.5105
2023-10-18 19:27:41,160 saving best model
2023-10-18 19:27:41,194 ----------------------------------------------------------------------------------------------------
2023-10-18 19:27:43,589 epoch 5 - iter 147/1476 - loss 0.24307097 - time (sec): 2.39 - samples/sec: 6941.11 - lr: 0.000033 - momentum: 0.000000
2023-10-18 19:27:45,981 epoch 5 - iter 294/1476 - loss 0.25891254 - time (sec): 4.79 - samples/sec: 7440.31 - lr: 0.000032 - momentum: 0.000000
2023-10-18 19:27:48,219 epoch 5 - iter 441/1476 - loss 0.26161265 - time (sec): 7.02 - samples/sec: 7284.10 - lr: 0.000032 - momentum: 0.000000
2023-10-18 19:27:50,608 epoch 5 - iter 588/1476 - loss 0.26397432 - time (sec): 9.41 - samples/sec: 7301.00 - lr: 0.000031 - momentum: 0.000000
2023-10-18 19:27:52,845 epoch 5 - iter 735/1476 - loss 0.26076338 - time (sec): 11.65 - samples/sec: 7331.47 - lr: 0.000031 - momentum: 0.000000
2023-10-18 19:27:55,258 epoch 5 - iter 882/1476 - loss 0.25857998 - time (sec): 14.06 - samples/sec: 7336.21 - lr: 0.000030 - momentum: 0.000000
2023-10-18 19:27:57,642 epoch 5 - iter 1029/1476 - loss 0.25772969 - time (sec): 16.45 - samples/sec: 7238.18 - lr: 0.000029 - momentum: 0.000000
2023-10-18 19:27:59,941 epoch 5 - iter 1176/1476 - loss 0.25641948 - time (sec): 18.75 - samples/sec: 7173.24 - lr: 0.000029 - momentum: 0.000000
2023-10-18 19:28:02,276 epoch 5 - iter 1323/1476 - loss 0.25486743 - time (sec): 21.08 - samples/sec: 7169.23 - lr: 0.000028 - momentum: 0.000000
2023-10-18 19:28:04,600 epoch 5 - iter 1470/1476 - loss 0.25455365 - time (sec): 23.41 - samples/sec: 7086.76 - lr: 0.000028 - momentum: 0.000000
2023-10-18 19:28:04,696 ----------------------------------------------------------------------------------------------------
2023-10-18 19:28:04,696 EPOCH 5 done: loss 0.2542 - lr: 0.000028
2023-10-18 19:28:11,973 DEV : loss 0.23268112540245056 - f1-score (micro avg)  0.5359
2023-10-18 19:28:11,999 saving best model
2023-10-18 19:28:12,037 ----------------------------------------------------------------------------------------------------
2023-10-18 19:28:14,319 epoch 6 - iter 147/1476 - loss 0.22702365 - time (sec): 2.28 - samples/sec: 6646.41 - lr: 0.000027 - momentum: 0.000000
2023-10-18 19:28:16,657 epoch 6 - iter 294/1476 - loss 0.22872241 - time (sec): 4.62 - samples/sec: 6983.40 - lr: 0.000027 - momentum: 0.000000
2023-10-18 19:28:19,038 epoch 6 - iter 441/1476 - loss 0.23060571 - time (sec): 7.00 - samples/sec: 7102.09 - lr: 0.000026 - momentum: 0.000000
2023-10-18 19:28:21,408 epoch 6 - iter 588/1476 - loss 0.22128491 - time (sec): 9.37 - samples/sec: 7211.04 - lr: 0.000026 - momentum: 0.000000
2023-10-18 19:28:23,834 epoch 6 - iter 735/1476 - loss 0.23096293 - time (sec): 11.80 - samples/sec: 7258.92 - lr: 0.000025 - momentum: 0.000000
2023-10-18 19:28:26,140 epoch 6 - iter 882/1476 - loss 0.23662672 - time (sec): 14.10 - samples/sec: 7219.87 - lr: 0.000024 - momentum: 0.000000
2023-10-18 19:28:28,585 epoch 6 - iter 1029/1476 - loss 0.23652904 - time (sec): 16.55 - samples/sec: 7056.48 - lr: 0.000024 - momentum: 0.000000
2023-10-18 19:28:30,842 epoch 6 - iter 1176/1476 - loss 0.23746363 - time (sec): 18.80 - samples/sec: 7043.73 - lr: 0.000023 - momentum: 0.000000
2023-10-18 19:28:33,211 epoch 6 - iter 1323/1476 - loss 0.23598701 - time (sec): 21.17 - samples/sec: 7095.16 - lr: 0.000023 - momentum: 0.000000
2023-10-18 19:28:35,508 epoch 6 - iter 1470/1476 - loss 0.23285328 - time (sec): 23.47 - samples/sec: 7060.49 - lr: 0.000022 - momentum: 0.000000
2023-10-18 19:28:35,599 ----------------------------------------------------------------------------------------------------
2023-10-18 19:28:35,600 EPOCH 6 done: loss 0.2325 - lr: 0.000022
2023-10-18 19:28:42,848 DEV : loss 0.23657557368278503 - f1-score (micro avg)  0.5489
2023-10-18 19:28:42,874 saving best model
2023-10-18 19:28:42,913 ----------------------------------------------------------------------------------------------------
2023-10-18 19:28:45,188 epoch 7 - iter 147/1476 - loss 0.20843898 - time (sec): 2.28 - samples/sec: 6965.88 - lr: 0.000022 - momentum: 0.000000
2023-10-18 19:28:47,506 epoch 7 - iter 294/1476 - loss 0.19848750 - time (sec): 4.59 - samples/sec: 6917.41 - lr: 0.000021 - momentum: 0.000000
2023-10-18 19:28:49,793 epoch 7 - iter 441/1476 - loss 0.20213122 - time (sec): 6.88 - samples/sec: 6857.08 - lr: 0.000021 - momentum: 0.000000
2023-10-18 19:28:52,149 epoch 7 - iter 588/1476 - loss 0.20460332 - time (sec): 9.24 - samples/sec: 6881.99 - lr: 0.000020 - momentum: 0.000000
2023-10-18 19:28:54,545 epoch 7 - iter 735/1476 - loss 0.20278281 - time (sec): 11.63 - samples/sec: 6900.60 - lr: 0.000019 - momentum: 0.000000
2023-10-18 19:28:56,668 epoch 7 - iter 882/1476 - loss 0.20307092 - time (sec): 13.75 - samples/sec: 6982.65 - lr: 0.000019 - momentum: 0.000000
2023-10-18 19:28:58,943 epoch 7 - iter 1029/1476 - loss 0.20304549 - time (sec): 16.03 - samples/sec: 7078.70 - lr: 0.000018 - momentum: 0.000000
2023-10-18 19:29:01,285 epoch 7 - iter 1176/1476 - loss 0.21048904 - time (sec): 18.37 - samples/sec: 7076.45 - lr: 0.000018 - momentum: 0.000000
2023-10-18 19:29:03,662 epoch 7 - iter 1323/1476 - loss 0.21393619 - time (sec): 20.75 - samples/sec: 7103.87 - lr: 0.000017 - momentum: 0.000000
2023-10-18 19:29:06,067 epoch 7 - iter 1470/1476 - loss 0.21913870 - time (sec): 23.15 - samples/sec: 7163.04 - lr: 0.000017 - momentum: 0.000000
2023-10-18 19:29:06,157 ----------------------------------------------------------------------------------------------------
2023-10-18 19:29:06,157 EPOCH 7 done: loss 0.2189 - lr: 0.000017
2023-10-18 19:29:13,450 DEV : loss 0.23048946261405945 - f1-score (micro avg)  0.5607
2023-10-18 19:29:13,476 saving best model
2023-10-18 19:29:13,514 ----------------------------------------------------------------------------------------------------
2023-10-18 19:29:15,943 epoch 8 - iter 147/1476 - loss 0.22164479 - time (sec): 2.43 - samples/sec: 7841.51 - lr: 0.000016 - momentum: 0.000000
2023-10-18 19:29:18,259 epoch 8 - iter 294/1476 - loss 0.21612436 - time (sec): 4.74 - samples/sec: 7213.85 - lr: 0.000016 - momentum: 0.000000
2023-10-18 19:29:20,610 epoch 8 - iter 441/1476 - loss 0.20551481 - time (sec): 7.10 - samples/sec: 7205.49 - lr: 0.000015 - momentum: 0.000000
2023-10-18 19:29:22,942 epoch 8 - iter 588/1476 - loss 0.20444008 - time (sec): 9.43 - samples/sec: 7104.72 - lr: 0.000014 - momentum: 0.000000
2023-10-18 19:29:25,290 epoch 8 - iter 735/1476 - loss 0.20490690 - time (sec): 11.78 - samples/sec: 7023.69 - lr: 0.000014 - momentum: 0.000000
2023-10-18 19:29:27,633 epoch 8 - iter 882/1476 - loss 0.20394011 - time (sec): 14.12 - samples/sec: 6971.94 - lr: 0.000013 - momentum: 0.000000
2023-10-18 19:29:29,954 epoch 8 - iter 1029/1476 - loss 0.20450972 - time (sec): 16.44 - samples/sec: 6981.55 - lr: 0.000013 - momentum: 0.000000
2023-10-18 19:29:32,268 epoch 8 - iter 1176/1476 - loss 0.20532453 - time (sec): 18.75 - samples/sec: 6988.05 - lr: 0.000012 - momentum: 0.000000
2023-10-18 19:29:34,611 epoch 8 - iter 1323/1476 - loss 0.20424926 - time (sec): 21.10 - samples/sec: 7020.74 - lr: 0.000012 - momentum: 0.000000
2023-10-18 19:29:37,049 epoch 8 - iter 1470/1476 - loss 0.20419251 - time (sec): 23.53 - samples/sec: 7048.45 - lr: 0.000011 - momentum: 0.000000
2023-10-18 19:29:37,132 ----------------------------------------------------------------------------------------------------
2023-10-18 19:29:37,132 EPOCH 8 done: loss 0.2038 - lr: 0.000011
2023-10-18 19:29:44,526 DEV : loss 0.2287674993276596 - f1-score (micro avg)  0.564
2023-10-18 19:29:44,553 saving best model
2023-10-18 19:29:44,588 ----------------------------------------------------------------------------------------------------
2023-10-18 19:29:46,979 epoch 9 - iter 147/1476 - loss 0.24117374 - time (sec): 2.39 - samples/sec: 7758.64 - lr: 0.000011 - momentum: 0.000000
2023-10-18 19:29:49,330 epoch 9 - iter 294/1476 - loss 0.20776964 - time (sec): 4.74 - samples/sec: 7452.71 - lr: 0.000010 - momentum: 0.000000
2023-10-18 19:29:51,767 epoch 9 - iter 441/1476 - loss 0.19920249 - time (sec): 7.18 - samples/sec: 7490.06 - lr: 0.000009 - momentum: 0.000000
2023-10-18 19:29:54,131 epoch 9 - iter 588/1476 - loss 0.19959637 - time (sec): 9.54 - samples/sec: 7400.95 - lr: 0.000009 - momentum: 0.000000
2023-10-18 19:29:56,441 epoch 9 - iter 735/1476 - loss 0.20235990 - time (sec): 11.85 - samples/sec: 7218.41 - lr: 0.000008 - momentum: 0.000000
2023-10-18 19:29:58,781 epoch 9 - iter 882/1476 - loss 0.20008424 - time (sec): 14.19 - samples/sec: 7133.15 - lr: 0.000008 - momentum: 0.000000
2023-10-18 19:30:01,113 epoch 9 - iter 1029/1476 - loss 0.19992171 - time (sec): 16.52 - samples/sec: 7066.75 - lr: 0.000007 - momentum: 0.000000
2023-10-18 19:30:03,499 epoch 9 - iter 1176/1476 - loss 0.20005741 - time (sec): 18.91 - samples/sec: 7012.25 - lr: 0.000007 - momentum: 0.000000
2023-10-18 19:30:05,824 epoch 9 - iter 1323/1476 - loss 0.19939868 - time (sec): 21.24 - samples/sec: 6998.13 - lr: 0.000006 - momentum: 0.000000
2023-10-18 19:30:08,170 epoch 9 - iter 1470/1476 - loss 0.19779017 - time (sec): 23.58 - samples/sec: 7031.87 - lr: 0.000006 - momentum: 0.000000
2023-10-18 19:30:08,259 ----------------------------------------------------------------------------------------------------
2023-10-18 19:30:08,259 EPOCH 9 done: loss 0.1978 - lr: 0.000006
2023-10-18 19:30:15,511 DEV : loss 0.23322512209415436 - f1-score (micro avg)  0.5669
2023-10-18 19:30:15,537 saving best model
2023-10-18 19:30:15,575 ----------------------------------------------------------------------------------------------------
2023-10-18 19:30:17,903 epoch 10 - iter 147/1476 - loss 0.17627206 - time (sec): 2.33 - samples/sec: 6928.85 - lr: 0.000005 - momentum: 0.000000
2023-10-18 19:30:20,279 epoch 10 - iter 294/1476 - loss 0.19028562 - time (sec): 4.70 - samples/sec: 6917.02 - lr: 0.000004 - momentum: 0.000000
2023-10-18 19:30:22,714 epoch 10 - iter 441/1476 - loss 0.20663725 - time (sec): 7.14 - samples/sec: 7221.33 - lr: 0.000004 - momentum: 0.000000
2023-10-18 19:30:25,056 epoch 10 - iter 588/1476 - loss 0.20586196 - time (sec): 9.48 - samples/sec: 7218.37 - lr: 0.000003 - momentum: 0.000000
2023-10-18 19:30:27,349 epoch 10 - iter 735/1476 - loss 0.19710005 - time (sec): 11.77 - samples/sec: 7161.12 - lr: 0.000003 - momentum: 0.000000
2023-10-18 19:30:29,703 epoch 10 - iter 882/1476 - loss 0.19316847 - time (sec): 14.13 - samples/sec: 7033.32 - lr: 0.000002 - momentum: 0.000000
2023-10-18 19:30:32,069 epoch 10 - iter 1029/1476 - loss 0.19035206 - time (sec): 16.49 - samples/sec: 7108.95 - lr: 0.000002 - momentum: 0.000000
2023-10-18 19:30:34,472 epoch 10 - iter 1176/1476 - loss 0.19179760 - time (sec): 18.90 - samples/sec: 7073.46 - lr: 0.000001 - momentum: 0.000000
2023-10-18 19:30:36,816 epoch 10 - iter 1323/1476 - loss 0.19167956 - time (sec): 21.24 - samples/sec: 7103.06 - lr: 0.000001 - momentum: 0.000000
2023-10-18 19:30:39,093 epoch 10 - iter 1470/1476 - loss 0.19294142 - time (sec): 23.52 - samples/sec: 7051.22 - lr: 0.000000 - momentum: 0.000000
2023-10-18 19:30:39,180 ----------------------------------------------------------------------------------------------------
2023-10-18 19:30:39,180 EPOCH 10 done: loss 0.1932 - lr: 0.000000
2023-10-18 19:30:46,482 DEV : loss 0.23437848687171936 - f1-score (micro avg)  0.5667
2023-10-18 19:30:46,542 ----------------------------------------------------------------------------------------------------
2023-10-18 19:30:46,542 Loading model from best epoch ...
2023-10-18 19:30:46,624 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-time, B-time, E-time, I-time, S-prod, B-prod, E-prod, I-prod
2023-10-18 19:30:49,805 
Results:
- F-score (micro) 0.5554
- F-score (macro) 0.3405
- Accuracy 0.4067

By class:
              precision    recall  f1-score   support

         loc     0.5913    0.7622    0.6660       858
        pers     0.4338    0.5493    0.4848       537
         org     0.2063    0.0985    0.1333       132
        time     0.4107    0.4259    0.4182        54
        prod     0.0000    0.0000    0.0000        61

   micro avg     0.5171    0.5999    0.5554      1642
   macro avg     0.3284    0.3672    0.3405      1642
weighted avg     0.4810    0.5999    0.5310      1642

2023-10-18 19:30:49,805 ----------------------------------------------------------------------------------------------------