selmamalak committed
Commit
9d78380
1 Parent(s): 5339525

End of training

Files changed (5)
  1. README.md +5 -5
  2. all_results.json +16 -0
  3. eval_results.json +11 -0
  4. train_results.json +8 -0
  5. trainer_state.json +1459 -0
README.md CHANGED
@@ -23,11 +23,11 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [microsoft/beit-base-patch16-224-pt22k-ft22k](https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k) on the medmnist-v2 dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0785
- - Accuracy: 0.9708
- - Precision: 0.9668
- - Recall: 0.9737
- - F1: 0.9698
+ - Loss: 0.0847
+ - Accuracy: 0.9737
+ - Precision: 0.9726
+ - Recall: 0.9724
+ - F1: 0.9724
 
  ## Model description
 
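The updated README bullets are simply the raw values from this commit's eval_results.json rounded to four decimal places. A minimal stdlib sketch of that rounding (values copied from the files below):

```python
# Raw eval metrics as stored in eval_results.json in this commit.
metrics = {
    "eval_loss": 0.08470147103071213,
    "eval_accuracy": 0.9736919029523531,
    "eval_precision": 0.9726422315270469,
    "eval_recall": 0.9724099912288373,
    "eval_f1": 0.9724362036912011,
}

# Round to the four decimals shown in the README bullet list.
readme = {k: round(v, 4) for k, v in metrics.items()}
print(readme)
```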
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9736919029523531,
+ "eval_f1": 0.9724362036912011,
+ "eval_loss": 0.08470147103071213,
+ "eval_precision": 0.9726422315270469,
+ "eval_recall": 0.9724099912288373,
+ "eval_runtime": 20.7035,
+ "eval_samples_per_second": 165.237,
+ "eval_steps_per_second": 10.336,
+ "total_flos": 9.328175742872125e+18,
+ "train_loss": 0.3662890907277398,
+ "train_runtime": 1600.7009,
+ "train_samples_per_second": 74.711,
+ "train_steps_per_second": 1.168
+ }
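The runtime and throughput fields above are mutually consistent, which gives a quick sanity check on the evaluation set size implied by this run. A minimal sketch; the derived sample, batch, and batch-size counts are inferences from the stored figures, not values stated anywhere in the files:

```python
# Figures copied from all_results.json in this commit.
eval_runtime = 20.7035             # seconds
eval_samples_per_second = 165.237
eval_steps_per_second = 10.336

# Derived (approximate) totals -- inferred, not stored in the file.
approx_eval_samples = eval_runtime * eval_samples_per_second   # ~3421 images
approx_eval_batches = eval_runtime * eval_steps_per_second     # ~214 batches
approx_batch_size = approx_eval_samples / approx_eval_batches  # ~16 per batch
print(round(approx_eval_samples), round(approx_eval_batches), round(approx_batch_size))
```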
eval_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.9736919029523531,
+ "eval_f1": 0.9724362036912011,
+ "eval_loss": 0.08470147103071213,
+ "eval_precision": 0.9726422315270469,
+ "eval_recall": 0.9724099912288373,
+ "eval_runtime": 20.7035,
+ "eval_samples_per_second": 165.237,
+ "eval_steps_per_second": 10.336
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 10.0,
+ "total_flos": 9.328175742872125e+18,
+ "train_loss": 0.3662890907277398,
+ "train_runtime": 1600.7009,
+ "train_samples_per_second": 74.711,
+ "train_steps_per_second": 1.168
+ }
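The training throughput numbers can be cross-checked the same way, and they line up with the 1870 optimizer steps recorded in trainer_state.json. A small sketch; the per-epoch sample count and effective batch size are inferences, not stored values:

```python
# Figures copied from train_results.json in this commit.
train_runtime = 1600.7009           # seconds
train_samples_per_second = 74.711
train_steps_per_second = 1.168
epochs = 10.0

total_steps = train_runtime * train_steps_per_second      # ~1870, matches global_step
total_samples = train_runtime * train_samples_per_second  # ~119,590 over 10 epochs
samples_per_epoch = total_samples / epochs                # ~11,959 training images
batch_size = total_samples / total_steps                  # ~64 samples per step
print(round(total_steps), round(samples_per_epoch), round(batch_size))
```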
trainer_state.json ADDED
@@ -0,0 +1,1459 @@
+ {
+ "best_metric": 0.9707943925233645,
+ "best_model_checkpoint": "beit-base-patch16-224-pt22k-ft22k-finetuned-lora-medmnistv2/checkpoint-1870",
+ "epoch": 10.0,
+ "eval_steps": 500,
+ "global_step": 1870,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.053475935828877004,
+ "grad_norm": 4.650041580200195,
+ "learning_rate": 0.004973262032085562,
+ "loss": 1.5063,
+ "step": 10
+ },
+ {
+ "epoch": 0.10695187165775401,
+ "grad_norm": 3.0658373832702637,
+ "learning_rate": 0.004946524064171123,
+ "loss": 0.8711,
+ "step": 20
+ },
+ {
+ "epoch": 0.16042780748663102,
+ "grad_norm": 2.9676272869110107,
+ "learning_rate": 0.004919786096256685,
+ "loss": 0.8,
+ "step": 30
+ },
+ {
+ "epoch": 0.21390374331550802,
+ "grad_norm": 2.5159189701080322,
+ "learning_rate": 0.004893048128342246,
+ "loss": 0.7794,
+ "step": 40
+ },
+ {
+ "epoch": 0.26737967914438504,
+ "grad_norm": 2.4576735496520996,
+ "learning_rate": 0.004868983957219251,
+ "loss": 0.8748,
+ "step": 50
+ },
+ {
+ "epoch": 0.32085561497326204,
+ "grad_norm": 1.9533675909042358,
+ "learning_rate": 0.004842245989304813,
+ "loss": 0.6213,
+ "step": 60
+ },
+ {
+ "epoch": 0.37433155080213903,
+ "grad_norm": 3.91825795173645,
+ "learning_rate": 0.004815508021390374,
+ "loss": 0.6883,
+ "step": 70
+ },
+ {
+ "epoch": 0.42780748663101603,
+ "grad_norm": 3.228422164916992,
+ "learning_rate": 0.004788770053475936,
+ "loss": 0.7019,
+ "step": 80
+ },
+ {
+ "epoch": 0.48128342245989303,
+ "grad_norm": 4.45206356048584,
+ "learning_rate": 0.004762032085561497,
+ "loss": 0.5394,
+ "step": 90
+ },
+ {
+ "epoch": 0.5347593582887701,
+ "grad_norm": 2.184957504272461,
+ "learning_rate": 0.004735294117647059,
+ "loss": 0.5543,
+ "step": 100
+ },
+ {
+ "epoch": 0.5882352941176471,
+ "grad_norm": 2.246079206466675,
+ "learning_rate": 0.00470855614973262,
+ "loss": 0.5738,
+ "step": 110
+ },
+ {
+ "epoch": 0.6417112299465241,
+ "grad_norm": 2.6914820671081543,
+ "learning_rate": 0.004681818181818182,
+ "loss": 0.6209,
+ "step": 120
+ },
+ {
+ "epoch": 0.6951871657754011,
+ "grad_norm": 2.5458545684814453,
+ "learning_rate": 0.0046550802139037435,
+ "loss": 0.5597,
+ "step": 130
+ },
+ {
+ "epoch": 0.7486631016042781,
+ "grad_norm": 2.676391363143921,
+ "learning_rate": 0.004628342245989305,
+ "loss": 0.5273,
+ "step": 140
+ },
+ {
+ "epoch": 0.8021390374331551,
+ "grad_norm": 2.5059385299682617,
+ "learning_rate": 0.0046016042780748665,
+ "loss": 0.5199,
+ "step": 150
+ },
+ {
+ "epoch": 0.8556149732620321,
+ "grad_norm": 1.451249122619629,
+ "learning_rate": 0.004574866310160428,
+ "loss": 0.5509,
+ "step": 160
+ },
+ {
+ "epoch": 0.9090909090909091,
+ "grad_norm": 2.5957276821136475,
+ "learning_rate": 0.00454812834224599,
+ "loss": 0.5336,
+ "step": 170
+ },
+ {
+ "epoch": 0.9625668449197861,
+ "grad_norm": 2.4229955673217773,
+ "learning_rate": 0.004521390374331551,
+ "loss": 0.4657,
+ "step": 180
+ },
+ {
+ "epoch": 1.0,
+ "eval_accuracy": 0.9094626168224299,
+ "eval_f1": 0.8972949130385568,
+ "eval_loss": 0.2451503425836563,
+ "eval_precision": 0.8964084875867973,
+ "eval_recall": 0.9082806506629539,
+ "eval_runtime": 10.2386,
+ "eval_samples_per_second": 167.21,
+ "eval_steps_per_second": 10.451,
+ "step": 187
+ },
+ {
+ "epoch": 1.0160427807486632,
+ "grad_norm": 2.3994851112365723,
+ "learning_rate": 0.004494652406417113,
+ "loss": 0.4772,
+ "step": 190
+ },
+ {
+ "epoch": 1.0695187165775402,
+ "grad_norm": 1.985571265220642,
+ "learning_rate": 0.004467914438502674,
+ "loss": 0.5995,
+ "step": 200
+ },
+ {
+ "epoch": 1.1229946524064172,
+ "grad_norm": 2.3798632621765137,
+ "learning_rate": 0.004441176470588235,
+ "loss": 0.5686,
+ "step": 210
+ },
+ {
+ "epoch": 1.1764705882352942,
+ "grad_norm": 3.1128406524658203,
+ "learning_rate": 0.004414438502673797,
+ "loss": 0.4984,
+ "step": 220
+ },
+ {
+ "epoch": 1.2299465240641712,
+ "grad_norm": 2.8572049140930176,
+ "learning_rate": 0.004387700534759359,
+ "loss": 0.5027,
+ "step": 230
+ },
+ {
+ "epoch": 1.2834224598930482,
+ "grad_norm": 5.178213119506836,
+ "learning_rate": 0.00436096256684492,
+ "loss": 0.4864,
+ "step": 240
+ },
+ {
+ "epoch": 1.3368983957219251,
+ "grad_norm": 1.9515773057937622,
+ "learning_rate": 0.004334224598930481,
+ "loss": 0.4528,
+ "step": 250
+ },
+ {
+ "epoch": 1.3903743315508021,
+ "grad_norm": 3.023959159851074,
+ "learning_rate": 0.0043074866310160425,
+ "loss": 0.5513,
+ "step": 260
+ },
+ {
+ "epoch": 1.4438502673796791,
+ "grad_norm": 2.371218204498291,
+ "learning_rate": 0.004280748663101605,
+ "loss": 0.442,
+ "step": 270
+ },
+ {
+ "epoch": 1.4973262032085561,
+ "grad_norm": 2.111191987991333,
+ "learning_rate": 0.004254010695187166,
+ "loss": 0.6163,
+ "step": 280
+ },
+ {
+ "epoch": 1.5508021390374331,
+ "grad_norm": 2.123419761657715,
+ "learning_rate": 0.004227272727272727,
+ "loss": 0.5522,
+ "step": 290
+ },
+ {
+ "epoch": 1.6042780748663101,
+ "grad_norm": 1.6425999402999878,
+ "learning_rate": 0.004200534759358289,
+ "loss": 0.4601,
+ "step": 300
+ },
+ {
+ "epoch": 1.6577540106951871,
+ "grad_norm": 3.847395420074463,
+ "learning_rate": 0.00417379679144385,
+ "loss": 0.5434,
+ "step": 310
+ },
+ {
+ "epoch": 1.7112299465240641,
+ "grad_norm": 1.8732799291610718,
+ "learning_rate": 0.004147058823529412,
+ "loss": 0.4952,
+ "step": 320
+ },
+ {
+ "epoch": 1.7647058823529411,
+ "grad_norm": 1.4881893396377563,
+ "learning_rate": 0.004120320855614973,
+ "loss": 0.4926,
+ "step": 330
+ },
+ {
+ "epoch": 1.8181818181818183,
+ "grad_norm": 1.9936500787734985,
+ "learning_rate": 0.004093582887700535,
+ "loss": 0.4582,
+ "step": 340
+ },
+ {
+ "epoch": 1.8716577540106951,
+ "grad_norm": 4.784737586975098,
+ "learning_rate": 0.004066844919786096,
+ "loss": 0.4839,
+ "step": 350
+ },
+ {
+ "epoch": 1.9251336898395723,
+ "grad_norm": 2.403982162475586,
+ "learning_rate": 0.004040106951871658,
+ "loss": 0.5868,
+ "step": 360
+ },
+ {
+ "epoch": 1.9786096256684491,
+ "grad_norm": 1.7464922666549683,
+ "learning_rate": 0.004013368983957219,
+ "loss": 0.4327,
+ "step": 370
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.9182242990654206,
+ "eval_f1": 0.9007413709436916,
+ "eval_loss": 0.21109923720359802,
+ "eval_precision": 0.9299210483133126,
+ "eval_recall": 0.8921235393972065,
+ "eval_runtime": 10.4332,
+ "eval_samples_per_second": 164.091,
+ "eval_steps_per_second": 10.256,
+ "step": 374
+ },
+ {
+ "epoch": 2.0320855614973263,
+ "grad_norm": 1.444707989692688,
+ "learning_rate": 0.003986631016042781,
+ "loss": 0.478,
+ "step": 380
+ },
+ {
+ "epoch": 2.085561497326203,
+ "grad_norm": 1.4123905897140503,
+ "learning_rate": 0.003959893048128342,
+ "loss": 0.5,
+ "step": 390
+ },
+ {
+ "epoch": 2.1390374331550803,
+ "grad_norm": 2.96335768699646,
+ "learning_rate": 0.003933155080213904,
+ "loss": 0.5348,
+ "step": 400
+ },
+ {
+ "epoch": 2.192513368983957,
+ "grad_norm": 1.4397529363632202,
+ "learning_rate": 0.0039064171122994654,
+ "loss": 0.4571,
+ "step": 410
+ },
+ {
+ "epoch": 2.2459893048128343,
+ "grad_norm": 1.821366548538208,
+ "learning_rate": 0.0038796791443850265,
+ "loss": 0.4982,
+ "step": 420
+ },
+ {
+ "epoch": 2.299465240641711,
+ "grad_norm": 2.112130641937256,
+ "learning_rate": 0.0038529411764705885,
+ "loss": 0.4343,
+ "step": 430
+ },
+ {
+ "epoch": 2.3529411764705883,
+ "grad_norm": 1.942734956741333,
+ "learning_rate": 0.00382620320855615,
+ "loss": 0.5078,
+ "step": 440
+ },
+ {
+ "epoch": 2.406417112299465,
+ "grad_norm": 2.774502754211426,
+ "learning_rate": 0.003799465240641711,
+ "loss": 0.4016,
+ "step": 450
+ },
+ {
+ "epoch": 2.4598930481283423,
+ "grad_norm": 2.139463424682617,
+ "learning_rate": 0.0037727272727272726,
+ "loss": 0.5415,
+ "step": 460
+ },
+ {
+ "epoch": 2.5133689839572195,
+ "grad_norm": 1.9148341417312622,
+ "learning_rate": 0.003745989304812834,
+ "loss": 0.4417,
+ "step": 470
+ },
+ {
+ "epoch": 2.5668449197860963,
+ "grad_norm": 1.9109567403793335,
+ "learning_rate": 0.003719251336898396,
+ "loss": 0.4273,
+ "step": 480
+ },
+ {
+ "epoch": 2.620320855614973,
+ "grad_norm": 2.2219059467315674,
+ "learning_rate": 0.0036925133689839572,
+ "loss": 0.5218,
+ "step": 490
+ },
+ {
+ "epoch": 2.6737967914438503,
+ "grad_norm": 3.378606081008911,
+ "learning_rate": 0.0036657754010695188,
+ "loss": 0.4318,
+ "step": 500
+ },
+ {
+ "epoch": 2.7272727272727275,
+ "grad_norm": 1.668760061264038,
+ "learning_rate": 0.0036390374331550803,
+ "loss": 0.4447,
+ "step": 510
+ },
+ {
+ "epoch": 2.7807486631016043,
+ "grad_norm": 1.830342411994934,
+ "learning_rate": 0.0036122994652406414,
+ "loss": 0.4507,
+ "step": 520
+ },
+ {
+ "epoch": 2.834224598930481,
+ "grad_norm": 2.2146425247192383,
+ "learning_rate": 0.0035855614973262034,
+ "loss": 0.4127,
+ "step": 530
+ },
+ {
+ "epoch": 2.8877005347593583,
+ "grad_norm": 1.3959295749664307,
+ "learning_rate": 0.003558823529411765,
+ "loss": 0.4353,
+ "step": 540
+ },
+ {
+ "epoch": 2.9411764705882355,
+ "grad_norm": 1.844604253768921,
+ "learning_rate": 0.0035320855614973264,
+ "loss": 0.3488,
+ "step": 550
+ },
+ {
+ "epoch": 2.9946524064171123,
+ "grad_norm": 1.421885371208191,
+ "learning_rate": 0.0035053475935828875,
+ "loss": 0.3977,
+ "step": 560
+ },
+ {
+ "epoch": 3.0,
+ "eval_accuracy": 0.9339953271028038,
+ "eval_f1": 0.924420495312186,
+ "eval_loss": 0.17427141964435577,
+ "eval_precision": 0.9228598461246502,
+ "eval_recall": 0.928247943129569,
+ "eval_runtime": 9.981,
+ "eval_samples_per_second": 171.527,
+ "eval_steps_per_second": 10.72,
+ "step": 561
+ },
+ {
+ "epoch": 3.0481283422459895,
+ "grad_norm": 2.2883894443511963,
+ "learning_rate": 0.003478609625668449,
+ "loss": 0.3909,
+ "step": 570
+ },
+ {
+ "epoch": 3.1016042780748663,
+ "grad_norm": 2.4753079414367676,
+ "learning_rate": 0.003451871657754011,
+ "loss": 0.4352,
+ "step": 580
+ },
+ {
+ "epoch": 3.1550802139037435,
+ "grad_norm": 2.298736572265625,
+ "learning_rate": 0.0034251336898395725,
+ "loss": 0.4641,
+ "step": 590
+ },
+ {
+ "epoch": 3.2085561497326203,
+ "grad_norm": 1.4368634223937988,
+ "learning_rate": 0.0033983957219251336,
+ "loss": 0.4225,
+ "step": 600
+ },
+ {
+ "epoch": 3.2620320855614975,
+ "grad_norm": 1.462842583656311,
+ "learning_rate": 0.003371657754010695,
+ "loss": 0.3958,
+ "step": 610
+ },
+ {
+ "epoch": 3.3155080213903743,
+ "grad_norm": 2.449066638946533,
+ "learning_rate": 0.0033449197860962567,
+ "loss": 0.3784,
+ "step": 620
+ },
+ {
+ "epoch": 3.3689839572192515,
+ "grad_norm": 1.5616710186004639,
+ "learning_rate": 0.0033181818181818186,
+ "loss": 0.4476,
+ "step": 630
+ },
+ {
+ "epoch": 3.4224598930481283,
+ "grad_norm": 2.284454345703125,
+ "learning_rate": 0.0032914438502673797,
+ "loss": 0.3725,
+ "step": 640
+ },
+ {
+ "epoch": 3.4759358288770055,
+ "grad_norm": 1.5143663883209229,
+ "learning_rate": 0.0032647058823529413,
+ "loss": 0.4597,
+ "step": 650
+ },
+ {
+ "epoch": 3.5294117647058822,
+ "grad_norm": 1.6112128496170044,
+ "learning_rate": 0.003237967914438503,
+ "loss": 0.4198,
+ "step": 660
+ },
+ {
+ "epoch": 3.5828877005347595,
+ "grad_norm": 1.2612804174423218,
+ "learning_rate": 0.003211229946524064,
+ "loss": 0.4785,
+ "step": 670
+ },
+ {
+ "epoch": 3.6363636363636362,
+ "grad_norm": 2.0233500003814697,
+ "learning_rate": 0.0031844919786096254,
+ "loss": 0.4276,
+ "step": 680
+ },
+ {
+ "epoch": 3.6898395721925135,
+ "grad_norm": 1.2161093950271606,
+ "learning_rate": 0.0031577540106951874,
+ "loss": 0.3865,
+ "step": 690
+ },
+ {
+ "epoch": 3.7433155080213902,
+ "grad_norm": 1.835656762123108,
+ "learning_rate": 0.003131016042780749,
+ "loss": 0.3202,
+ "step": 700
+ },
+ {
+ "epoch": 3.7967914438502675,
+ "grad_norm": 2.9908785820007324,
+ "learning_rate": 0.00310427807486631,
+ "loss": 0.3879,
+ "step": 710
+ },
+ {
+ "epoch": 3.8502673796791442,
+ "grad_norm": 1.587223768234253,
+ "learning_rate": 0.0030775401069518715,
+ "loss": 0.3682,
+ "step": 720
+ },
+ {
+ "epoch": 3.9037433155080214,
+ "grad_norm": 2.0039021968841553,
+ "learning_rate": 0.003050802139037433,
+ "loss": 0.4148,
+ "step": 730
+ },
+ {
+ "epoch": 3.9572192513368982,
+ "grad_norm": 1.8037409782409668,
+ "learning_rate": 0.003024064171122995,
+ "loss": 0.3318,
+ "step": 740
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.9351635514018691,
+ "eval_f1": 0.928485806906975,
+ "eval_loss": 0.17756415903568268,
+ "eval_precision": 0.9248343621199285,
+ "eval_recall": 0.9352570988138212,
+ "eval_runtime": 10.1719,
+ "eval_samples_per_second": 168.307,
+ "eval_steps_per_second": 10.519,
+ "step": 748
+ },
+ {
+ "epoch": 4.010695187165775,
+ "grad_norm": 2.230004072189331,
+ "learning_rate": 0.002997326203208556,
+ "loss": 0.4071,
+ "step": 750
+ },
+ {
+ "epoch": 4.064171122994653,
+ "grad_norm": 2.1018853187561035,
+ "learning_rate": 0.0029705882352941177,
+ "loss": 0.3498,
+ "step": 760
+ },
+ {
+ "epoch": 4.117647058823529,
+ "grad_norm": 1.6814857721328735,
+ "learning_rate": 0.002943850267379679,
+ "loss": 0.4085,
+ "step": 770
+ },
+ {
+ "epoch": 4.171122994652406,
+ "grad_norm": 2.0869903564453125,
+ "learning_rate": 0.0029171122994652403,
+ "loss": 0.4481,
+ "step": 780
+ },
+ {
+ "epoch": 4.224598930481283,
+ "grad_norm": 1.4043067693710327,
+ "learning_rate": 0.0028903743315508022,
+ "loss": 0.3234,
+ "step": 790
+ },
+ {
+ "epoch": 4.278074866310161,
+ "grad_norm": 2.0766959190368652,
+ "learning_rate": 0.0028636363636363638,
+ "loss": 0.3719,
+ "step": 800
+ },
+ {
+ "epoch": 4.331550802139038,
+ "grad_norm": 1.85934317111969,
+ "learning_rate": 0.0028368983957219253,
+ "loss": 0.4784,
+ "step": 810
+ },
+ {
+ "epoch": 4.385026737967914,
+ "grad_norm": 2.3728232383728027,
+ "learning_rate": 0.0028101604278074864,
+ "loss": 0.3704,
+ "step": 820
+ },
+ {
+ "epoch": 4.438502673796791,
+ "grad_norm": 1.2759883403778076,
+ "learning_rate": 0.002783422459893048,
+ "loss": 0.3283,
+ "step": 830
+ },
+ {
+ "epoch": 4.491978609625669,
+ "grad_norm": 1.2006633281707764,
+ "learning_rate": 0.00275668449197861,
+ "loss": 0.3792,
+ "step": 840
+ },
+ {
+ "epoch": 4.545454545454545,
+ "grad_norm": 2.0884652137756348,
+ "learning_rate": 0.0027299465240641714,
+ "loss": 0.4041,
+ "step": 850
+ },
+ {
+ "epoch": 4.598930481283422,
+ "grad_norm": 1.281827688217163,
+ "learning_rate": 0.0027032085561497325,
+ "loss": 0.352,
+ "step": 860
+ },
+ {
+ "epoch": 4.652406417112299,
+ "grad_norm": 1.7143138647079468,
+ "learning_rate": 0.002676470588235294,
+ "loss": 0.3896,
+ "step": 870
+ },
+ {
+ "epoch": 4.705882352941177,
+ "grad_norm": 2.069678544998169,
+ "learning_rate": 0.0026497326203208556,
+ "loss": 0.335,
+ "step": 880
+ },
+ {
+ "epoch": 4.759358288770054,
+ "grad_norm": 1.6988319158554077,
+ "learning_rate": 0.0026229946524064175,
+ "loss": 0.3693,
+ "step": 890
+ },
+ {
+ "epoch": 4.81283422459893,
+ "grad_norm": 1.6188457012176514,
+ "learning_rate": 0.0025962566844919786,
+ "loss": 0.337,
+ "step": 900
+ },
+ {
+ "epoch": 4.866310160427807,
+ "grad_norm": 2.0478222370147705,
+ "learning_rate": 0.00256951871657754,
+ "loss": 0.3156,
+ "step": 910
+ },
+ {
+ "epoch": 4.919786096256685,
+ "grad_norm": 1.7088401317596436,
+ "learning_rate": 0.0025427807486631017,
+ "loss": 0.3414,
+ "step": 920
+ },
+ {
+ "epoch": 4.973262032085562,
+ "grad_norm": 1.161230444908142,
+ "learning_rate": 0.002516042780748663,
+ "loss": 0.3461,
+ "step": 930
+ },
+ {
+ "epoch": 5.0,
+ "eval_accuracy": 0.9380841121495327,
+ "eval_f1": 0.9304948103477649,
+ "eval_loss": 0.17028363049030304,
+ "eval_precision": 0.9311071354745837,
+ "eval_recall": 0.9344001562456381,
+ "eval_runtime": 10.2604,
+ "eval_samples_per_second": 166.855,
+ "eval_steps_per_second": 10.428,
+ "step": 935
+ },
+ {
+ "epoch": 5.026737967914438,
+ "grad_norm": 1.723848819732666,
+ "learning_rate": 0.0024893048128342248,
+ "loss": 0.3622,
+ "step": 940
+ },
+ {
+ "epoch": 5.080213903743315,
+ "grad_norm": 2.0140602588653564,
+ "learning_rate": 0.002462566844919786,
+ "loss": 0.3973,
+ "step": 950
+ },
+ {
+ "epoch": 5.133689839572193,
+ "grad_norm": 1.5653032064437866,
+ "learning_rate": 0.002435828877005348,
+ "loss": 0.3106,
+ "step": 960
+ },
+ {
+ "epoch": 5.18716577540107,
+ "grad_norm": 1.7829616069793701,
+ "learning_rate": 0.002409090909090909,
+ "loss": 0.3723,
+ "step": 970
+ },
+ {
+ "epoch": 5.240641711229946,
+ "grad_norm": 0.9940521717071533,
+ "learning_rate": 0.0023823529411764704,
+ "loss": 0.3453,
+ "step": 980
+ },
+ {
+ "epoch": 5.294117647058823,
+ "grad_norm": 1.1114059686660767,
+ "learning_rate": 0.002355614973262032,
+ "loss": 0.3769,
+ "step": 990
+ },
+ {
+ "epoch": 5.347593582887701,
+ "grad_norm": 0.9444433450698853,
+ "learning_rate": 0.0023288770053475935,
+ "loss": 0.3489,
+ "step": 1000
+ },
+ {
+ "epoch": 5.401069518716578,
+ "grad_norm": 2.0856947898864746,
+ "learning_rate": 0.002302139037433155,
+ "loss": 0.374,
+ "step": 1010
+ },
+ {
+ "epoch": 5.454545454545454,
+ "grad_norm": 1.679477572441101,
+ "learning_rate": 0.0022754010695187166,
+ "loss": 0.3738,
+ "step": 1020
+ },
+ {
+ "epoch": 5.508021390374331,
+ "grad_norm": 1.3019518852233887,
+ "learning_rate": 0.002248663101604278,
+ "loss": 0.3634,
+ "step": 1030
+ },
+ {
+ "epoch": 5.561497326203209,
+ "grad_norm": 1.467846155166626,
+ "learning_rate": 0.0022219251336898396,
+ "loss": 0.3457,
+ "step": 1040
+ },
+ {
+ "epoch": 5.614973262032086,
+ "grad_norm": 1.6348631381988525,
+ "learning_rate": 0.002195187165775401,
+ "loss": 0.3216,
+ "step": 1050
+ },
+ {
+ "epoch": 5.668449197860962,
+ "grad_norm": 1.158215880393982,
+ "learning_rate": 0.0021684491978609627,
+ "loss": 0.3033,
+ "step": 1060
+ },
+ {
+ "epoch": 5.721925133689839,
+ "grad_norm": 0.8872423768043518,
+ "learning_rate": 0.002141711229946524,
+ "loss": 0.2919,
+ "step": 1070
+ },
+ {
+ "epoch": 5.775401069518717,
+ "grad_norm": 1.9146243333816528,
+ "learning_rate": 0.0021149732620320857,
+ "loss": 0.3228,
+ "step": 1080
+ },
+ {
+ "epoch": 5.828877005347594,
+ "grad_norm": 1.7084169387817383,
+ "learning_rate": 0.0020882352941176473,
+ "loss": 0.2754,
+ "step": 1090
+ },
+ {
+ "epoch": 5.882352941176471,
+ "grad_norm": 1.0626111030578613,
+ "learning_rate": 0.0020614973262032084,
+ "loss": 0.3165,
+ "step": 1100
+ },
+ {
+ "epoch": 5.935828877005347,
+ "grad_norm": 1.8155293464660645,
+ "learning_rate": 0.00203475935828877,
+ "loss": 0.2815,
+ "step": 1110
+ },
+ {
+ "epoch": 5.989304812834225,
+ "grad_norm": 1.8623782396316528,
+ "learning_rate": 0.0020080213903743314,
+ "loss": 0.3309,
+ "step": 1120
+ },
+ {
+ "epoch": 6.0,
+ "eval_accuracy": 0.9369158878504673,
+ "eval_f1": 0.9334719219156348,
+ "eval_loss": 0.19556888937950134,
+ "eval_precision": 0.9335706750233659,
+ "eval_recall": 0.9396740716392903,
+ "eval_runtime": 10.2767,
+ "eval_samples_per_second": 166.591,
+ "eval_steps_per_second": 10.412,
+ "step": 1122
+ },
+ {
+ "epoch": 6.042780748663102,
+ "grad_norm": 1.1055293083190918,
+ "learning_rate": 0.001981283422459893,
+ "loss": 0.3202,
+ "step": 1130
+ },
+ {
+ "epoch": 6.096256684491979,
+ "grad_norm": 1.7265422344207764,
+ "learning_rate": 0.0019545454545454545,
+ "loss": 0.2973,
+ "step": 1140
+ },
+ {
+ "epoch": 6.149732620320855,
+ "grad_norm": 2.0242912769317627,
+ "learning_rate": 0.001927807486631016,
+ "loss": 0.302,
+ "step": 1150
+ },
+ {
+ "epoch": 6.2032085561497325,
+ "grad_norm": 1.0210644006729126,
+ "learning_rate": 0.0019010695187165775,
+ "loss": 0.2785,
+ "step": 1160
+ },
+ {
+ "epoch": 6.25668449197861,
+ "grad_norm": 1.5111178159713745,
+ "learning_rate": 0.001874331550802139,
+ "loss": 0.2873,
+ "step": 1170
+ },
+ {
+ "epoch": 6.310160427807487,
+ "grad_norm": 1.060488224029541,
+ "learning_rate": 0.0018475935828877006,
+ "loss": 0.321,
+ "step": 1180
+ },
+ {
+ "epoch": 6.363636363636363,
+ "grad_norm": 1.0627189874649048,
+ "learning_rate": 0.0018208556149732621,
+ "loss": 0.2682,
+ "step": 1190
+ },
+ {
+ "epoch": 6.4171122994652405,
+ "grad_norm": 1.1237576007843018,
+ "learning_rate": 0.0017941176470588236,
+ "loss": 0.2383,
+ "step": 1200
+ },
+ {
+ "epoch": 6.470588235294118,
+ "grad_norm": 1.6101592779159546,
+ "learning_rate": 0.001767379679144385,
+ "loss": 0.3197,
+ "step": 1210
+ },
+ {
+ "epoch": 6.524064171122995,
+ "grad_norm": 0.6864691972732544,
+ "learning_rate": 0.0017406417112299467,
+ "loss": 0.2307,
+ "step": 1220
+ },
+ {
+ "epoch": 6.577540106951871,
+ "grad_norm": 1.339308500289917,
+ "learning_rate": 0.001713903743315508,
+ "loss": 0.2534,
+ "step": 1230
+ },
+ {
+ "epoch": 6.6310160427807485,
+ "grad_norm": 1.3319642543792725,
+ "learning_rate": 0.0016871657754010698,
+ "loss": 0.32,
+ "step": 1240
+ },
+ {
+ "epoch": 6.684491978609626,
+ "grad_norm": 1.4089816808700562,
+ "learning_rate": 0.001660427807486631,
+ "loss": 0.285,
+ "step": 1250
+ },
+ {
+ "epoch": 6.737967914438503,
+ "grad_norm": 1.212084174156189,
+ "learning_rate": 0.0016336898395721924,
+ "loss": 0.2217,
+ "step": 1260
+ },
+ {
+ "epoch": 6.791443850267379,
+ "grad_norm": 1.6609482765197754,
+ "learning_rate": 0.0016069518716577541,
+ "loss": 0.2952,
+ "step": 1270
+ },
+ {
+ "epoch": 6.8449197860962565,
+ "grad_norm": 1.060892105102539,
+ "learning_rate": 0.0015802139037433154,
+ "loss": 0.2524,
+ "step": 1280
+ },
+ {
+ "epoch": 6.898395721925134,
+ "grad_norm": 1.3365124464035034,
+ "learning_rate": 0.001553475935828877,
+ "loss": 0.2694,
+ "step": 1290
+ },
+ {
+ "epoch": 6.951871657754011,
+ "grad_norm": 1.1521918773651123,
+ "learning_rate": 0.0015267379679144385,
+ "loss": 0.3088,
+ "step": 1300
+ },
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.9532710280373832,
+ "eval_f1": 0.9461125894090557,
+ "eval_loss": 0.11792106181383133,
+ "eval_precision": 0.9426583892398479,
+ "eval_recall": 0.952515495389921,
+ "eval_runtime": 10.3853,
+ "eval_samples_per_second": 164.849,
+ "eval_steps_per_second": 10.303,
+ "step": 1309
+ },
+ {
+ "epoch": 7.005347593582887,
+ "grad_norm": 0.8682220578193665,
+ "learning_rate": 0.0015,
+ "loss": 0.2627,
+ "step": 1310
+ },
+ {
+ "epoch": 7.0588235294117645,
+ "grad_norm": 2.279827356338501,
+ "learning_rate": 0.0014732620320855616,
+ "loss": 0.2796,
+ "step": 1320
+ },
+ {
+ "epoch": 7.112299465240642,
+ "grad_norm": 1.3697049617767334,
+ "learning_rate": 0.001446524064171123,
+ "loss": 0.2369,
+ "step": 1330
+ },
+ {
+ "epoch": 7.165775401069519,
+ "grad_norm": 0.8857790231704712,
+ "learning_rate": 0.0014197860962566844,
+ "loss": 0.2648,
+ "step": 1340
+ },
+ {
+ "epoch": 7.219251336898395,
+ "grad_norm": 2.053224802017212,
+ "learning_rate": 0.0013930481283422461,
+ "loss": 0.212,
+ "step": 1350
+ },
+ {
+ "epoch": 7.2727272727272725,
+ "grad_norm": 1.619578242301941,
+ "learning_rate": 0.0013663101604278075,
+ "loss": 0.2229,
+ "step": 1360
+ },
+ {
+ "epoch": 7.32620320855615,
+ "grad_norm": 1.3765966892242432,
+ "learning_rate": 0.0013395721925133692,
+ "loss": 0.2311,
+ "step": 1370
+ },
+ {
+ "epoch": 7.379679144385027,
+ "grad_norm": 1.2967066764831543,
+ "learning_rate": 0.0013128342245989305,
+ "loss": 0.2402,
+ "step": 1380
+ },
+ {
+ "epoch": 7.433155080213904,
+ "grad_norm": 1.2961163520812988,
+ "learning_rate": 0.0012860962566844918,
+ "loss": 0.2318,
+ "step": 1390
+ },
+ {
+ "epoch": 7.4866310160427805,
+ "grad_norm": 1.6240290403366089,
+ "learning_rate": 0.0012593582887700536,
+ "loss": 0.2669,
+ "step": 1400
+ },
+ {
+ "epoch": 7.540106951871658,
+ "grad_norm": 1.1457808017730713,
+ "learning_rate": 0.0012326203208556149,
+ "loss": 0.2887,
+ "step": 1410
+ },
+ {
+ "epoch": 7.593582887700535,
+ "grad_norm": 1.303931474685669,
+ "learning_rate": 0.0012058823529411764,
+ "loss": 0.2862,
+ "step": 1420
+ },
+ {
+ "epoch": 7.647058823529412,
+ "grad_norm": 0.9429693222045898,
+ "learning_rate": 0.001179144385026738,
+ "loss": 0.2282,
+ "step": 1430
+ },
+ {
+ "epoch": 7.7005347593582885,
+ "grad_norm": 1.349269986152649,
+ "learning_rate": 0.0011524064171122995,
+ "loss": 0.2414,
+ "step": 1440
+ },
+ {
+ "epoch": 7.754010695187166,
+ "grad_norm": 1.185160517692566,
+ "learning_rate": 0.001125668449197861,
+ "loss": 0.219,
+ "step": 1450
+ },
+ {
+ "epoch": 7.807486631016043,
+ "grad_norm": 1.5935460329055786,
+ "learning_rate": 0.0010989304812834225,
+ "loss": 0.2109,
+ "step": 1460
1116
+ },
1117
+ {
1118
+ "epoch": 7.86096256684492,
1119
+ "grad_norm": 1.4563795328140259,
1120
+ "learning_rate": 0.001072192513368984,
1121
+ "loss": 0.2943,
1122
+ "step": 1470
1123
+ },
1124
+ {
1125
+ "epoch": 7.9144385026737964,
1126
+ "grad_norm": 1.2570650577545166,
1127
+ "learning_rate": 0.0010454545454545454,
1128
+ "loss": 0.2275,
1129
+ "step": 1480
1130
+ },
1131
+ {
1132
+ "epoch": 7.967914438502674,
1133
+ "grad_norm": 0.6930679082870483,
1134
+ "learning_rate": 0.001018716577540107,
1135
+ "loss": 0.2129,
1136
+ "step": 1490
1137
+ },
1138
+ {
1139
+ "epoch": 8.0,
1140
+ "eval_accuracy": 0.9637850467289719,
1141
+ "eval_f1": 0.9610548371575116,
1142
+ "eval_loss": 0.09920904040336609,
1143
+ "eval_precision": 0.9569323583080014,
1144
+ "eval_recall": 0.9673920345290172,
1145
+ "eval_runtime": 10.543,
1146
+ "eval_samples_per_second": 162.382,
1147
+ "eval_steps_per_second": 10.149,
1148
+ "step": 1496
1149
+ },
1150
+ {
1151
+ "epoch": 8.02139037433155,
1152
+ "grad_norm": 1.4018137454986572,
1153
+ "learning_rate": 0.0009919786096256684,
1154
+ "loss": 0.2638,
1155
+ "step": 1500
1156
+ },
1157
+ {
1158
+ "epoch": 8.074866310160427,
1159
+ "grad_norm": 1.2713522911071777,
1160
+ "learning_rate": 0.00096524064171123,
1161
+ "loss": 0.2099,
1162
+ "step": 1510
1163
+ },
1164
+ {
1165
+ "epoch": 8.128342245989305,
1166
+ "grad_norm": 1.004296064376831,
1167
+ "learning_rate": 0.0009385026737967915,
1168
+ "loss": 0.1801,
1169
+ "step": 1520
1170
+ },
1171
+ {
1172
+ "epoch": 8.181818181818182,
1173
+ "grad_norm": 0.7041844129562378,
1174
+ "learning_rate": 0.0009117647058823529,
1175
+ "loss": 0.1829,
1176
+ "step": 1530
1177
+ },
1178
+ {
1179
+ "epoch": 8.235294117647058,
1180
+ "grad_norm": 1.3204301595687866,
1181
+ "learning_rate": 0.0008850267379679144,
1182
+ "loss": 0.2444,
1183
+ "step": 1540
1184
+ },
1185
+ {
1186
+ "epoch": 8.288770053475936,
1187
+ "grad_norm": 1.261974573135376,
1188
+ "learning_rate": 0.000858288770053476,
1189
+ "loss": 0.2431,
1190
+ "step": 1550
1191
+ },
1192
+ {
1193
+ "epoch": 8.342245989304812,
1194
+ "grad_norm": 0.9899649024009705,
1195
+ "learning_rate": 0.0008315508021390375,
1196
+ "loss": 0.1808,
1197
+ "step": 1560
1198
+ },
1199
+ {
1200
+ "epoch": 8.39572192513369,
1201
+ "grad_norm": 1.150225281715393,
1202
+ "learning_rate": 0.0008048128342245989,
1203
+ "loss": 0.2048,
1204
+ "step": 1570
1205
+ },
1206
+ {
1207
+ "epoch": 8.449197860962567,
1208
+ "grad_norm": 0.9454184770584106,
1209
+ "learning_rate": 0.0007780748663101605,
1210
+ "loss": 0.1919,
1211
+ "step": 1580
1212
+ },
1213
+ {
1214
+ "epoch": 8.502673796791443,
1215
+ "grad_norm": 1.26669442653656,
1216
+ "learning_rate": 0.000751336898395722,
1217
+ "loss": 0.1837,
1218
+ "step": 1590
1219
+ },
1220
+ {
1221
+ "epoch": 8.556149732620321,
1222
+ "grad_norm": 0.8547130823135376,
1223
+ "learning_rate": 0.0007245989304812835,
1224
+ "loss": 0.1774,
1225
+ "step": 1600
1226
+ },
1227
+ {
1228
+ "epoch": 8.609625668449198,
1229
+ "grad_norm": 1.8781049251556396,
1230
+ "learning_rate": 0.000697860962566845,
1231
+ "loss": 0.2202,
1232
+ "step": 1610
1233
+ },
1234
+ {
1235
+ "epoch": 8.663101604278076,
1236
+ "grad_norm": 0.7876987457275391,
1237
+ "learning_rate": 0.0006711229946524064,
1238
+ "loss": 0.1781,
1239
+ "step": 1620
1240
+ },
1241
+ {
1242
+ "epoch": 8.716577540106952,
1243
+ "grad_norm": 1.2137806415557861,
1244
+ "learning_rate": 0.0006443850267379679,
1245
+ "loss": 0.1722,
1246
+ "step": 1630
1247
+ },
1248
+ {
1249
+ "epoch": 8.770053475935828,
1250
+ "grad_norm": 1.6328903436660767,
1251
+ "learning_rate": 0.0006176470588235294,
1252
+ "loss": 0.2085,
1253
+ "step": 1640
1254
+ },
1255
+ {
1256
+ "epoch": 8.823529411764707,
1257
+ "grad_norm": 0.9435901641845703,
1258
+ "learning_rate": 0.0005909090909090909,
1259
+ "loss": 0.2335,
1260
+ "step": 1650
1261
+ },
1262
+ {
1263
+ "epoch": 8.877005347593583,
1264
+ "grad_norm": 1.1905876398086548,
1265
+ "learning_rate": 0.0005641711229946525,
1266
+ "loss": 0.2387,
1267
+ "step": 1660
1268
+ },
1269
+ {
1270
+ "epoch": 8.93048128342246,
1271
+ "grad_norm": 0.8758776783943176,
1272
+ "learning_rate": 0.0005374331550802139,
1273
+ "loss": 0.2265,
1274
+ "step": 1670
1275
+ },
1276
+ {
1277
+ "epoch": 8.983957219251337,
1278
+ "grad_norm": 1.3745719194412231,
1279
+ "learning_rate": 0.0005106951871657754,
1280
+ "loss": 0.2049,
1281
+ "step": 1680
1282
+ },
1283
+ {
1284
+ "epoch": 9.0,
1285
+ "eval_accuracy": 0.967873831775701,
1286
+ "eval_f1": 0.9651132770824573,
1287
+ "eval_loss": 0.08469934016466141,
1288
+ "eval_precision": 0.9626628225985181,
1289
+ "eval_recall": 0.9683070024371949,
1290
+ "eval_runtime": 10.3829,
1291
+ "eval_samples_per_second": 164.887,
1292
+ "eval_steps_per_second": 10.305,
1293
+ "step": 1683
1294
+ },
1295
+ {
1296
+ "epoch": 9.037433155080214,
1297
+ "grad_norm": 0.9230683445930481,
1298
+ "learning_rate": 0.0004839572192513369,
1299
+ "loss": 0.1654,
1300
+ "step": 1690
1301
+ },
1302
+ {
1303
+ "epoch": 9.090909090909092,
1304
+ "grad_norm": 0.8362302184104919,
1305
+ "learning_rate": 0.0004572192513368984,
1306
+ "loss": 0.1918,
1307
+ "step": 1700
1308
+ },
1309
+ {
1310
+ "epoch": 9.144385026737968,
1311
+ "grad_norm": 1.3025470972061157,
1312
+ "learning_rate": 0.0004304812834224599,
1313
+ "loss": 0.1497,
1314
+ "step": 1710
1315
+ },
1316
+ {
1317
+ "epoch": 9.197860962566844,
1318
+ "grad_norm": 0.8339858055114746,
1319
+ "learning_rate": 0.00040374331550802143,
1320
+ "loss": 0.196,
1321
+ "step": 1720
1322
+ },
1323
+ {
1324
+ "epoch": 9.251336898395722,
1325
+ "grad_norm": 1.3273382186889648,
1326
+ "learning_rate": 0.00037700534759358285,
1327
+ "loss": 0.1912,
1328
+ "step": 1730
1329
+ },
1330
+ {
1331
+ "epoch": 9.304812834224599,
1332
+ "grad_norm": 0.5822441577911377,
1333
+ "learning_rate": 0.0003502673796791444,
1334
+ "loss": 0.1452,
1335
+ "step": 1740
1336
+ },
1337
+ {
1338
+ "epoch": 9.358288770053475,
1339
+ "grad_norm": 0.8451639413833618,
1340
+ "learning_rate": 0.0003235294117647059,
1341
+ "loss": 0.1877,
1342
+ "step": 1750
1343
+ },
1344
+ {
1345
+ "epoch": 9.411764705882353,
1346
+ "grad_norm": 1.0270066261291504,
1347
+ "learning_rate": 0.0002967914438502674,
1348
+ "loss": 0.1964,
1349
+ "step": 1760
1350
+ },
1351
+ {
1352
+ "epoch": 9.46524064171123,
1353
+ "grad_norm": 1.0621460676193237,
1354
+ "learning_rate": 0.00027005347593582886,
1355
+ "loss": 0.2015,
1356
+ "step": 1770
1357
+ },
1358
+ {
1359
+ "epoch": 9.518716577540108,
1360
+ "grad_norm": 0.9587564468383789,
1361
+ "learning_rate": 0.00024331550802139036,
1362
+ "loss": 0.1962,
1363
+ "step": 1780
1364
+ },
1365
+ {
1366
+ "epoch": 9.572192513368984,
1367
+ "grad_norm": 0.719536304473877,
1368
+ "learning_rate": 0.00021657754010695186,
1369
+ "loss": 0.1389,
1370
+ "step": 1790
1371
+ },
1372
+ {
1373
+ "epoch": 9.62566844919786,
1374
+ "grad_norm": 0.89113450050354,
1375
+ "learning_rate": 0.0001898395721925134,
1376
+ "loss": 0.1783,
1377
+ "step": 1800
1378
+ },
1379
+ {
1380
+ "epoch": 9.679144385026738,
1381
+ "grad_norm": 0.8831282258033752,
1382
+ "learning_rate": 0.0001631016042780749,
1383
+ "loss": 0.1871,
1384
+ "step": 1810
1385
+ },
1386
+ {
1387
+ "epoch": 9.732620320855615,
1388
+ "grad_norm": 0.6015557646751404,
1389
+ "learning_rate": 0.00013636363636363637,
1390
+ "loss": 0.1414,
1391
+ "step": 1820
1392
+ },
1393
+ {
1394
+ "epoch": 9.786096256684491,
1395
+ "grad_norm": 1.1582796573638916,
1396
+ "learning_rate": 0.00010962566844919787,
1397
+ "loss": 0.2408,
1398
+ "step": 1830
1399
+ },
1400
+ {
1401
+ "epoch": 9.83957219251337,
1402
+ "grad_norm": 0.7856789231300354,
1403
+ "learning_rate": 8.288770053475936e-05,
1404
+ "loss": 0.145,
1405
+ "step": 1840
1406
+ },
1407
+ {
1408
+ "epoch": 9.893048128342246,
1409
+ "grad_norm": 1.1010181903839111,
1410
+ "learning_rate": 5.614973262032086e-05,
1411
+ "loss": 0.1758,
1412
+ "step": 1850
1413
+ },
1414
+ {
1415
+ "epoch": 9.946524064171124,
1416
+ "grad_norm": 0.7676904797554016,
1417
+ "learning_rate": 2.9411764705882354e-05,
1418
+ "loss": 0.1683,
1419
+ "step": 1860
1420
+ },
1421
+ {
1422
+ "epoch": 10.0,
1423
+ "grad_norm": 1.4464507102966309,
1424
+ "learning_rate": 2.6737967914438504e-06,
1425
+ "loss": 0.2007,
1426
+ "step": 1870
1427
+ },
1428
+ {
1429
+ "epoch": 10.0,
1430
+ "eval_accuracy": 0.9707943925233645,
1431
+ "eval_f1": 0.9697517307733657,
1432
+ "eval_loss": 0.07853860408067703,
1433
+ "eval_precision": 0.9668363312878312,
1434
+ "eval_recall": 0.9737482240908748,
1435
+ "eval_runtime": 10.3924,
1436
+ "eval_samples_per_second": 164.735,
1437
+ "eval_steps_per_second": 10.296,
1438
+ "step": 1870
1439
+ },
1440
+ {
1441
+ "epoch": 10.0,
1442
+ "step": 1870,
1443
+ "total_flos": 9.328175742872125e+18,
1444
+ "train_loss": 0.3662890907277398,
1445
+ "train_runtime": 1600.7009,
1446
+ "train_samples_per_second": 74.711,
1447
+ "train_steps_per_second": 1.168
1448
+ }
1449
+ ],
1450
+ "logging_steps": 10,
1451
+ "max_steps": 1870,
1452
+ "num_input_tokens_seen": 0,
1453
+ "num_train_epochs": 10,
1454
+ "save_steps": 500,
1455
+ "total_flos": 9.328175742872125e+18,
1456
+ "train_batch_size": 16,
1457
+ "trial_name": null,
1458
+ "trial_params": null
1459
+ }
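
The `log_history` array added above interleaves per-step training entries (which carry a `"loss"` key) with per-epoch evaluation entries (which carry `"eval_*"` keys). A minimal sketch of separating the two, shown on a trimmed inline copy of the structure rather than the full file (standard library only; the variable names are illustrative, not part of the file format):

```python
import json

# A trimmed, inline example of the trainer_state.json structure above
state = json.loads("""
{
  "log_history": [
    {"epoch": 9.946524064171124, "loss": 0.1683, "step": 1860},
    {"epoch": 10.0,
     "eval_accuracy": 0.9707943925233645,
     "eval_f1": 0.9697517307733657,
     "eval_loss": 0.07853860408067703,
     "step": 1870}
  ]
}
""")

# Training entries have a "loss" key; evaluation entries have "eval_*" keys
train_logs = [e for e in state["log_history"] if "loss" in e]
eval_logs = [e for e in state["log_history"] if "eval_accuracy" in e]

# Pick the evaluation entry with the lowest eval loss
best = min(eval_logs, key=lambda e: e["eval_loss"])
print(round(best["eval_accuracy"], 4))  # → 0.9708
```

Against the real file, replace the inline string with `json.load(open("trainer_state.json"))`.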