Jayanth2002 committed on
Commit 0562c92 · 1 Parent(s): 69f322b

End of training

Files changed (4)
  1. all_results.json +13 -0
  2. eval_results.json +8 -0
  3. train_results.json +8 -0
  4. trainer_state.json +1810 -0
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+     "epoch": 9.98,
+     "eval_accuracy": 0.9342629482071713,
+     "eval_loss": 0.19922704994678497,
+     "eval_runtime": 73.0613,
+     "eval_samples_per_second": 54.968,
+     "eval_steps_per_second": 1.725,
+     "total_flos": 2.796343146304469e+19,
+     "train_loss": 0.5066332549913555,
+     "train_runtime": 12045.0104,
+     "train_samples_per_second": 30.002,
+     "train_steps_per_second": 0.234
+ }
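As a rough cross-check of the summary above, the throughput fields can be multiplied by the runtimes to recover approximate sample and step counts. This is a minimal sketch, not part of the commit: the name `implied_counts` is hypothetical, and the evaluation-set size it suggests is an inference from rounded figures rather than a value stated anywhere in these files.

```python
import json

# Values copied verbatim from the all_results.json diff in this commit.
ALL_RESULTS = json.loads("""
{
    "epoch": 9.98,
    "eval_accuracy": 0.9342629482071713,
    "eval_loss": 0.19922704994678497,
    "eval_runtime": 73.0613,
    "eval_samples_per_second": 54.968,
    "eval_steps_per_second": 1.725,
    "total_flos": 2.796343146304469e+19,
    "train_loss": 0.5066332549913555,
    "train_runtime": 12045.0104,
    "train_samples_per_second": 30.002,
    "train_steps_per_second": 0.234
}
""")


def implied_counts(results):
    """Multiply each rate by its runtime to estimate the underlying counts.

    The rates are rounded to three decimals in the JSON, so the products
    are approximations, not exact counts.
    """
    eval_rt = results["eval_runtime"]
    train_rt = results["train_runtime"]
    return {
        "eval_samples": results["eval_samples_per_second"] * eval_rt,
        "eval_batches": results["eval_steps_per_second"] * eval_rt,
        "train_steps": results["train_steps_per_second"] * train_rt,
        "train_samples_seen": results["train_samples_per_second"] * train_rt,
    }


counts = implied_counts(ALL_RESULTS)
for name, value in counts.items():
    print(f"{name}: {value:.1f}")

# If the evaluation set really has round(eval_samples) items, then
# accuracy * size should land on a whole number of correct predictions.
eval_size = round(counts["eval_samples"])
correct = ALL_RESULTS["eval_accuracy"] * eval_size
print(f"eval_size={eval_size}, implied correct predictions={correct:.4f}")
```

The implied step count comes out slightly under the `global_step` of 2820 recorded in `trainer_state.json`, which is expected given that `train_steps_per_second` is rounded to three decimals.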
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 9.98,
+     "eval_accuracy": 0.9342629482071713,
+     "eval_loss": 0.19922704994678497,
+     "eval_runtime": 73.0613,
+     "eval_samples_per_second": 54.968,
+     "eval_steps_per_second": 1.725
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 9.98,
+     "total_flos": 2.796343146304469e+19,
+     "train_loss": 0.5066332549913555,
+     "train_runtime": 12045.0104,
+     "train_samples_per_second": 30.002,
+     "train_steps_per_second": 0.234
+ }
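The `learning_rate` values logged every 10 steps in the `trainer_state.json` diff below look consistent with a linear warmup followed by a linear decay to zero. The sketch below checks a handful of the logged values against that shape. The peak rate of 5e-5 and the 282-step warmup are assumptions inferred from the numbers, not settings stated in this commit, and `linear_schedule_lr` is a hypothetical helper written for this check.

```python
import math

PEAK_LR = 5e-5       # assumption: inferred from the logged values
TOTAL_STEPS = 2820   # "global_step" in trainer_state.json
WARMUP_STEPS = 282   # assumption: 10% of TOTAL_STEPS


def linear_schedule_lr(step):
    """Learning rate under linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)


# (step, learning_rate) pairs copied from log_history in trainer_state.json.
LOGGED = {
    10: 1.7730496453900712e-06,
    280: 4.964539007092199e-05,
    290: 4.984239558707644e-05,
    300: 4.964539007092199e-05,
    1410: 2.777777777777778e-05,
    2140: 1.3396375098502758e-05,
}

# Compare each logged value with the assumed schedule.
matches = {
    step: math.isclose(linear_schedule_lr(step), lr, rel_tol=1e-9)
    for step, lr in LOGGED.items()
}
for step, ok in matches.items():
    print(f"step {step}: logged={LOGGED[step]:.6e} "
          f"predicted={linear_schedule_lr(step):.6e} match={ok}")
```

Agreement on these sample points supports the assumed schedule but does not prove it. The training arguments themselves are not among the files in this commit.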
trainer_state.json ADDED
@@ -0,0 +1,1810 @@
1
+ {
2
+ "best_metric": 0.9342629482071713,
3
+ "best_model_checkpoint": "vit_base_patch16_224-finetuned-SkinDisease/checkpoint-2820",
4
+ "epoch": 9.982300884955752,
5
+ "eval_steps": 500,
6
+ "global_step": 2820,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.04,
13
+ "learning_rate": 1.7730496453900712e-06,
14
+ "loss": 3.5665,
15
+ "step": 10
16
+ },
17
+ {
18
+ "epoch": 0.07,
19
+ "learning_rate": 3.5460992907801423e-06,
20
+ "loss": 3.539,
21
+ "step": 20
22
+ },
23
+ {
24
+ "epoch": 0.11,
25
+ "learning_rate": 5.319148936170213e-06,
26
+ "loss": 3.4681,
27
+ "step": 30
28
+ },
29
+ {
30
+ "epoch": 0.14,
31
+ "learning_rate": 7.092198581560285e-06,
32
+ "loss": 3.3703,
33
+ "step": 40
34
+ },
35
+ {
36
+ "epoch": 0.18,
37
+ "learning_rate": 8.865248226950355e-06,
38
+ "loss": 3.22,
39
+ "step": 50
40
+ },
41
+ {
42
+ "epoch": 0.21,
43
+ "learning_rate": 1.0638297872340426e-05,
44
+ "loss": 3.1027,
45
+ "step": 60
46
+ },
47
+ {
48
+ "epoch": 0.25,
49
+ "learning_rate": 1.2411347517730498e-05,
50
+ "loss": 2.9537,
51
+ "step": 70
52
+ },
53
+ {
54
+ "epoch": 0.28,
55
+ "learning_rate": 1.418439716312057e-05,
56
+ "loss": 2.7977,
57
+ "step": 80
58
+ },
59
+ {
60
+ "epoch": 0.32,
61
+ "learning_rate": 1.595744680851064e-05,
62
+ "loss": 2.5979,
63
+ "step": 90
64
+ },
65
+ {
66
+ "epoch": 0.35,
67
+ "learning_rate": 1.773049645390071e-05,
68
+ "loss": 2.5062,
69
+ "step": 100
70
+ },
71
+ {
72
+ "epoch": 0.39,
73
+ "learning_rate": 1.950354609929078e-05,
74
+ "loss": 2.2981,
75
+ "step": 110
76
+ },
77
+ {
78
+ "epoch": 0.42,
79
+ "learning_rate": 2.1276595744680852e-05,
80
+ "loss": 2.1816,
81
+ "step": 120
82
+ },
83
+ {
84
+ "epoch": 0.46,
85
+ "learning_rate": 2.3049645390070924e-05,
86
+ "loss": 2.0612,
87
+ "step": 130
88
+ },
89
+ {
90
+ "epoch": 0.5,
91
+ "learning_rate": 2.4822695035460995e-05,
92
+ "loss": 1.9642,
93
+ "step": 140
94
+ },
95
+ {
96
+ "epoch": 0.53,
97
+ "learning_rate": 2.6595744680851064e-05,
98
+ "loss": 1.7489,
99
+ "step": 150
100
+ },
101
+ {
102
+ "epoch": 0.57,
103
+ "learning_rate": 2.836879432624114e-05,
104
+ "loss": 1.7245,
105
+ "step": 160
106
+ },
107
+ {
108
+ "epoch": 0.6,
109
+ "learning_rate": 3.0141843971631207e-05,
110
+ "loss": 1.6003,
111
+ "step": 170
112
+ },
113
+ {
114
+ "epoch": 0.64,
115
+ "learning_rate": 3.191489361702128e-05,
116
+ "loss": 1.5055,
117
+ "step": 180
118
+ },
119
+ {
120
+ "epoch": 0.67,
121
+ "learning_rate": 3.3687943262411347e-05,
122
+ "loss": 1.3892,
123
+ "step": 190
124
+ },
125
+ {
126
+ "epoch": 0.71,
127
+ "learning_rate": 3.546099290780142e-05,
128
+ "loss": 1.2483,
129
+ "step": 200
130
+ },
131
+ {
132
+ "epoch": 0.74,
133
+ "learning_rate": 3.723404255319149e-05,
134
+ "loss": 1.2776,
135
+ "step": 210
136
+ },
137
+ {
138
+ "epoch": 0.78,
139
+ "learning_rate": 3.900709219858156e-05,
140
+ "loss": 1.1794,
141
+ "step": 220
142
+ },
143
+ {
144
+ "epoch": 0.81,
145
+ "learning_rate": 4.078014184397163e-05,
146
+ "loss": 1.1382,
147
+ "step": 230
148
+ },
149
+ {
150
+ "epoch": 0.85,
151
+ "learning_rate": 4.2553191489361704e-05,
152
+ "loss": 1.1257,
153
+ "step": 240
154
+ },
155
+ {
156
+ "epoch": 0.88,
157
+ "learning_rate": 4.432624113475177e-05,
158
+ "loss": 1.1178,
159
+ "step": 250
160
+ },
161
+ {
162
+ "epoch": 0.92,
163
+ "learning_rate": 4.609929078014185e-05,
164
+ "loss": 0.9985,
165
+ "step": 260
166
+ },
167
+ {
168
+ "epoch": 0.96,
169
+ "learning_rate": 4.787234042553192e-05,
170
+ "loss": 1.0547,
171
+ "step": 270
172
+ },
173
+ {
174
+ "epoch": 0.99,
175
+ "learning_rate": 4.964539007092199e-05,
176
+ "loss": 0.9099,
177
+ "step": 280
178
+ },
179
+ {
180
+ "epoch": 1.0,
181
+ "eval_accuracy": 0.764691235059761,
182
+ "eval_loss": 0.8247547149658203,
183
+ "eval_runtime": 88.9587,
184
+ "eval_samples_per_second": 45.145,
185
+ "eval_steps_per_second": 1.416,
186
+ "step": 282
187
+ },
188
+ {
189
+ "epoch": 1.03,
190
+ "learning_rate": 4.984239558707644e-05,
191
+ "loss": 0.9566,
192
+ "step": 290
193
+ },
194
+ {
195
+ "epoch": 1.06,
196
+ "learning_rate": 4.964539007092199e-05,
197
+ "loss": 0.8344,
198
+ "step": 300
199
+ },
200
+ {
201
+ "epoch": 1.1,
202
+ "learning_rate": 4.944838455476754e-05,
203
+ "loss": 0.8723,
204
+ "step": 310
205
+ },
206
+ {
207
+ "epoch": 1.13,
208
+ "learning_rate": 4.9251379038613084e-05,
209
+ "loss": 0.8365,
210
+ "step": 320
211
+ },
212
+ {
213
+ "epoch": 1.17,
214
+ "learning_rate": 4.905437352245863e-05,
215
+ "loss": 0.7845,
216
+ "step": 330
217
+ },
218
+ {
219
+ "epoch": 1.2,
220
+ "learning_rate": 4.885736800630418e-05,
221
+ "loss": 0.7433,
222
+ "step": 340
223
+ },
224
+ {
225
+ "epoch": 1.24,
226
+ "learning_rate": 4.8660362490149725e-05,
227
+ "loss": 0.7748,
228
+ "step": 350
229
+ },
230
+ {
231
+ "epoch": 1.27,
232
+ "learning_rate": 4.846335697399527e-05,
233
+ "loss": 0.7358,
234
+ "step": 360
235
+ },
236
+ {
237
+ "epoch": 1.31,
238
+ "learning_rate": 4.826635145784082e-05,
239
+ "loss": 0.7416,
240
+ "step": 370
241
+ },
242
+ {
243
+ "epoch": 1.35,
244
+ "learning_rate": 4.806934594168637e-05,
245
+ "loss": 0.713,
246
+ "step": 380
247
+ },
248
+ {
249
+ "epoch": 1.38,
250
+ "learning_rate": 4.787234042553192e-05,
251
+ "loss": 0.7054,
252
+ "step": 390
253
+ },
254
+ {
255
+ "epoch": 1.42,
256
+ "learning_rate": 4.7675334909377466e-05,
257
+ "loss": 0.628,
258
+ "step": 400
259
+ },
260
+ {
261
+ "epoch": 1.45,
262
+ "learning_rate": 4.747832939322301e-05,
263
+ "loss": 0.6797,
264
+ "step": 410
265
+ },
266
+ {
267
+ "epoch": 1.49,
268
+ "learning_rate": 4.728132387706856e-05,
269
+ "loss": 0.6752,
270
+ "step": 420
271
+ },
272
+ {
273
+ "epoch": 1.52,
274
+ "learning_rate": 4.7084318360914107e-05,
275
+ "loss": 0.6186,
276
+ "step": 430
277
+ },
278
+ {
279
+ "epoch": 1.56,
280
+ "learning_rate": 4.6887312844759653e-05,
281
+ "loss": 0.6372,
282
+ "step": 440
283
+ },
284
+ {
285
+ "epoch": 1.59,
286
+ "learning_rate": 4.669030732860521e-05,
287
+ "loss": 0.6946,
288
+ "step": 450
289
+ },
290
+ {
291
+ "epoch": 1.63,
292
+ "learning_rate": 4.6493301812450754e-05,
293
+ "loss": 0.5847,
294
+ "step": 460
295
+ },
296
+ {
297
+ "epoch": 1.66,
298
+ "learning_rate": 4.62962962962963e-05,
299
+ "loss": 0.5871,
300
+ "step": 470
301
+ },
302
+ {
303
+ "epoch": 1.7,
304
+ "learning_rate": 4.609929078014185e-05,
305
+ "loss": 0.627,
306
+ "step": 480
307
+ },
308
+ {
309
+ "epoch": 1.73,
310
+ "learning_rate": 4.5902285263987394e-05,
311
+ "loss": 0.5451,
312
+ "step": 490
313
+ },
314
+ {
315
+ "epoch": 1.77,
316
+ "learning_rate": 4.570527974783294e-05,
317
+ "loss": 0.608,
318
+ "step": 500
319
+ },
320
+ {
321
+ "epoch": 1.81,
322
+ "learning_rate": 4.550827423167849e-05,
323
+ "loss": 0.666,
324
+ "step": 510
325
+ },
326
+ {
327
+ "epoch": 1.84,
328
+ "learning_rate": 4.5311268715524035e-05,
329
+ "loss": 0.5734,
330
+ "step": 520
331
+ },
332
+ {
333
+ "epoch": 1.88,
334
+ "learning_rate": 4.511426319936958e-05,
335
+ "loss": 0.5667,
336
+ "step": 530
337
+ },
338
+ {
339
+ "epoch": 1.91,
340
+ "learning_rate": 4.491725768321513e-05,
341
+ "loss": 0.5508,
342
+ "step": 540
343
+ },
344
+ {
345
+ "epoch": 1.95,
346
+ "learning_rate": 4.4720252167060676e-05,
347
+ "loss": 0.5871,
348
+ "step": 550
349
+ },
350
+ {
351
+ "epoch": 1.98,
352
+ "learning_rate": 4.452324665090622e-05,
353
+ "loss": 0.5848,
354
+ "step": 560
355
+ },
356
+ {
357
+ "epoch": 2.0,
358
+ "eval_accuracy": 0.8747509960159362,
359
+ "eval_loss": 0.42364752292633057,
360
+ "eval_runtime": 85.8744,
361
+ "eval_samples_per_second": 46.766,
362
+ "eval_steps_per_second": 1.467,
363
+ "step": 565
364
+ },
365
+ {
366
+ "epoch": 2.02,
367
+ "learning_rate": 4.432624113475177e-05,
368
+ "loss": 0.5252,
369
+ "step": 570
370
+ },
371
+ {
372
+ "epoch": 2.05,
373
+ "learning_rate": 4.412923561859732e-05,
374
+ "loss": 0.4713,
375
+ "step": 580
376
+ },
377
+ {
378
+ "epoch": 2.09,
379
+ "learning_rate": 4.393223010244287e-05,
380
+ "loss": 0.4634,
381
+ "step": 590
382
+ },
383
+ {
384
+ "epoch": 2.12,
385
+ "learning_rate": 4.373522458628842e-05,
386
+ "loss": 0.5289,
387
+ "step": 600
388
+ },
389
+ {
390
+ "epoch": 2.16,
391
+ "learning_rate": 4.353821907013397e-05,
392
+ "loss": 0.4863,
393
+ "step": 610
394
+ },
395
+ {
396
+ "epoch": 2.19,
397
+ "learning_rate": 4.334121355397952e-05,
398
+ "loss": 0.4758,
399
+ "step": 620
400
+ },
401
+ {
402
+ "epoch": 2.23,
403
+ "learning_rate": 4.3144208037825064e-05,
404
+ "loss": 0.4495,
405
+ "step": 630
406
+ },
407
+ {
408
+ "epoch": 2.27,
409
+ "learning_rate": 4.294720252167061e-05,
410
+ "loss": 0.4246,
411
+ "step": 640
412
+ },
413
+ {
414
+ "epoch": 2.3,
415
+ "learning_rate": 4.275019700551616e-05,
416
+ "loss": 0.4965,
417
+ "step": 650
418
+ },
419
+ {
420
+ "epoch": 2.34,
421
+ "learning_rate": 4.2553191489361704e-05,
422
+ "loss": 0.4253,
423
+ "step": 660
424
+ },
425
+ {
426
+ "epoch": 2.37,
427
+ "learning_rate": 4.235618597320725e-05,
428
+ "loss": 0.4851,
429
+ "step": 670
430
+ },
431
+ {
432
+ "epoch": 2.41,
433
+ "learning_rate": 4.21591804570528e-05,
434
+ "loss": 0.4589,
435
+ "step": 680
436
+ },
437
+ {
438
+ "epoch": 2.44,
439
+ "learning_rate": 4.1962174940898345e-05,
440
+ "loss": 0.4366,
441
+ "step": 690
442
+ },
443
+ {
444
+ "epoch": 2.48,
445
+ "learning_rate": 4.176516942474389e-05,
446
+ "loss": 0.4655,
447
+ "step": 700
448
+ },
449
+ {
450
+ "epoch": 2.51,
451
+ "learning_rate": 4.156816390858944e-05,
452
+ "loss": 0.4857,
453
+ "step": 710
454
+ },
455
+ {
456
+ "epoch": 2.55,
457
+ "learning_rate": 4.1371158392434986e-05,
458
+ "loss": 0.4506,
459
+ "step": 720
460
+ },
461
+ {
462
+ "epoch": 2.58,
463
+ "learning_rate": 4.117415287628054e-05,
464
+ "loss": 0.4374,
465
+ "step": 730
466
+ },
467
+ {
468
+ "epoch": 2.62,
469
+ "learning_rate": 4.0977147360126086e-05,
470
+ "loss": 0.4443,
471
+ "step": 740
472
+ },
473
+ {
474
+ "epoch": 2.65,
475
+ "learning_rate": 4.078014184397163e-05,
476
+ "loss": 0.4467,
477
+ "step": 750
478
+ },
479
+ {
480
+ "epoch": 2.69,
481
+ "learning_rate": 4.058313632781718e-05,
482
+ "loss": 0.4704,
483
+ "step": 760
484
+ },
485
+ {
486
+ "epoch": 2.73,
487
+ "learning_rate": 4.0386130811662727e-05,
488
+ "loss": 0.425,
489
+ "step": 770
490
+ },
491
+ {
492
+ "epoch": 2.76,
493
+ "learning_rate": 4.018912529550828e-05,
494
+ "loss": 0.4493,
495
+ "step": 780
496
+ },
497
+ {
498
+ "epoch": 2.8,
499
+ "learning_rate": 3.999211977935383e-05,
500
+ "loss": 0.4149,
501
+ "step": 790
502
+ },
503
+ {
504
+ "epoch": 2.83,
505
+ "learning_rate": 3.9795114263199374e-05,
506
+ "loss": 0.4443,
507
+ "step": 800
508
+ },
509
+ {
510
+ "epoch": 2.87,
511
+ "learning_rate": 3.959810874704492e-05,
512
+ "loss": 0.4401,
513
+ "step": 810
514
+ },
515
+ {
516
+ "epoch": 2.9,
517
+ "learning_rate": 3.940110323089047e-05,
518
+ "loss": 0.4113,
519
+ "step": 820
520
+ },
521
+ {
522
+ "epoch": 2.94,
523
+ "learning_rate": 3.9204097714736014e-05,
524
+ "loss": 0.4106,
525
+ "step": 830
526
+ },
527
+ {
528
+ "epoch": 2.97,
529
+ "learning_rate": 3.900709219858156e-05,
530
+ "loss": 0.3952,
531
+ "step": 840
532
+ },
533
+ {
534
+ "epoch": 3.0,
535
+ "eval_accuracy": 0.9021414342629482,
536
+ "eval_loss": 0.3154027462005615,
537
+ "eval_runtime": 72.9456,
538
+ "eval_samples_per_second": 55.055,
539
+ "eval_steps_per_second": 1.727,
540
+ "step": 847
541
+ },
542
+ {
543
+ "epoch": 3.01,
544
+ "learning_rate": 3.881008668242711e-05,
545
+ "loss": 0.3375,
546
+ "step": 850
547
+ },
548
+ {
549
+ "epoch": 3.04,
550
+ "learning_rate": 3.8613081166272655e-05,
551
+ "loss": 0.3709,
552
+ "step": 860
553
+ },
554
+ {
555
+ "epoch": 3.08,
556
+ "learning_rate": 3.84160756501182e-05,
557
+ "loss": 0.3932,
558
+ "step": 870
559
+ },
560
+ {
561
+ "epoch": 3.12,
562
+ "learning_rate": 3.8219070133963755e-05,
563
+ "loss": 0.4064,
564
+ "step": 880
565
+ },
566
+ {
567
+ "epoch": 3.15,
568
+ "learning_rate": 3.80220646178093e-05,
569
+ "loss": 0.3827,
570
+ "step": 890
571
+ },
572
+ {
573
+ "epoch": 3.19,
574
+ "learning_rate": 3.782505910165485e-05,
575
+ "loss": 0.3432,
576
+ "step": 900
577
+ },
578
+ {
579
+ "epoch": 3.22,
580
+ "learning_rate": 3.7628053585500396e-05,
581
+ "loss": 0.3915,
582
+ "step": 910
583
+ },
584
+ {
585
+ "epoch": 3.26,
586
+ "learning_rate": 3.743104806934594e-05,
587
+ "loss": 0.3692,
588
+ "step": 920
589
+ },
590
+ {
591
+ "epoch": 3.29,
592
+ "learning_rate": 3.723404255319149e-05,
593
+ "loss": 0.4126,
594
+ "step": 930
595
+ },
596
+ {
597
+ "epoch": 3.33,
598
+ "learning_rate": 3.7037037037037037e-05,
599
+ "loss": 0.3977,
600
+ "step": 940
601
+ },
602
+ {
603
+ "epoch": 3.36,
604
+ "learning_rate": 3.6840031520882583e-05,
605
+ "loss": 0.3758,
606
+ "step": 950
607
+ },
608
+ {
609
+ "epoch": 3.4,
610
+ "learning_rate": 3.664302600472813e-05,
611
+ "loss": 0.4129,
612
+ "step": 960
613
+ },
614
+ {
615
+ "epoch": 3.43,
616
+ "learning_rate": 3.6446020488573684e-05,
617
+ "loss": 0.3527,
618
+ "step": 970
619
+ },
620
+ {
621
+ "epoch": 3.47,
622
+ "learning_rate": 3.624901497241923e-05,
623
+ "loss": 0.3177,
624
+ "step": 980
625
+ },
626
+ {
627
+ "epoch": 3.5,
628
+ "learning_rate": 3.605200945626478e-05,
629
+ "loss": 0.3571,
630
+ "step": 990
631
+ },
632
+ {
633
+ "epoch": 3.54,
634
+ "learning_rate": 3.5855003940110324e-05,
635
+ "loss": 0.359,
636
+ "step": 1000
637
+ },
638
+ {
639
+ "epoch": 3.58,
640
+ "learning_rate": 3.565799842395587e-05,
641
+ "loss": 0.3537,
642
+ "step": 1010
643
+ },
644
+ {
645
+ "epoch": 3.61,
646
+ "learning_rate": 3.546099290780142e-05,
647
+ "loss": 0.3786,
648
+ "step": 1020
649
+ },
650
+ {
651
+ "epoch": 3.65,
652
+ "learning_rate": 3.526398739164697e-05,
653
+ "loss": 0.3314,
654
+ "step": 1030
655
+ },
656
+ {
657
+ "epoch": 3.68,
658
+ "learning_rate": 3.506698187549252e-05,
659
+ "loss": 0.3432,
660
+ "step": 1040
661
+ },
662
+ {
663
+ "epoch": 3.72,
664
+ "learning_rate": 3.4869976359338065e-05,
665
+ "loss": 0.3584,
666
+ "step": 1050
667
+ },
668
+ {
669
+ "epoch": 3.75,
670
+ "learning_rate": 3.467297084318361e-05,
671
+ "loss": 0.3607,
672
+ "step": 1060
673
+ },
674
+ {
675
+ "epoch": 3.79,
676
+ "learning_rate": 3.447596532702916e-05,
677
+ "loss": 0.3257,
678
+ "step": 1070
679
+ },
680
+ {
681
+ "epoch": 3.82,
682
+ "learning_rate": 3.4278959810874706e-05,
683
+ "loss": 0.3938,
684
+ "step": 1080
685
+ },
686
+ {
687
+ "epoch": 3.86,
688
+ "learning_rate": 3.408195429472025e-05,
689
+ "loss": 0.3557,
690
+ "step": 1090
691
+ },
692
+ {
693
+ "epoch": 3.89,
694
+ "learning_rate": 3.38849487785658e-05,
695
+ "loss": 0.3264,
696
+ "step": 1100
697
+ },
698
+ {
699
+ "epoch": 3.93,
700
+ "learning_rate": 3.3687943262411347e-05,
701
+ "loss": 0.35,
702
+ "step": 1110
703
+ },
704
+ {
705
+ "epoch": 3.96,
706
+ "learning_rate": 3.349093774625689e-05,
707
+ "loss": 0.3431,
708
+ "step": 1120
709
+ },
710
+ {
711
+ "epoch": 4.0,
712
+ "learning_rate": 3.329393223010244e-05,
713
+ "loss": 0.3957,
714
+ "step": 1130
715
+ },
716
+ {
717
+ "epoch": 4.0,
718
+ "eval_accuracy": 0.9106075697211156,
719
+ "eval_loss": 0.2695058584213257,
720
+ "eval_runtime": 79.261,
721
+ "eval_samples_per_second": 50.668,
722
+ "eval_steps_per_second": 1.59,
723
+ "step": 1130
724
+ },
725
+ {
726
+ "epoch": 4.04,
727
+ "learning_rate": 3.309692671394799e-05,
728
+ "loss": 0.3067,
729
+ "step": 1140
730
+ },
731
+ {
732
+ "epoch": 4.07,
733
+ "learning_rate": 3.2899921197793534e-05,
734
+ "loss": 0.2966,
735
+ "step": 1150
736
+ },
737
+ {
738
+ "epoch": 4.11,
739
+ "learning_rate": 3.270291568163909e-05,
740
+ "loss": 0.2977,
741
+ "step": 1160
742
+ },
743
+ {
744
+ "epoch": 4.14,
745
+ "learning_rate": 3.2505910165484634e-05,
746
+ "loss": 0.3108,
747
+ "step": 1170
748
+ },
749
+ {
750
+ "epoch": 4.18,
751
+ "learning_rate": 3.230890464933019e-05,
752
+ "loss": 0.3401,
753
+ "step": 1180
754
+ },
755
+ {
756
+ "epoch": 4.21,
757
+ "learning_rate": 3.2111899133175735e-05,
758
+ "loss": 0.3415,
759
+ "step": 1190
760
+ },
761
+ {
762
+ "epoch": 4.25,
763
+ "learning_rate": 3.191489361702128e-05,
764
+ "loss": 0.3297,
765
+ "step": 1200
766
+ },
767
+ {
768
+ "epoch": 4.28,
769
+ "learning_rate": 3.171788810086683e-05,
770
+ "loss": 0.3289,
771
+ "step": 1210
772
+ },
773
+ {
774
+ "epoch": 4.32,
775
+ "learning_rate": 3.1520882584712375e-05,
776
+ "loss": 0.3149,
777
+ "step": 1220
778
+ },
779
+ {
780
+ "epoch": 4.35,
781
+ "learning_rate": 3.132387706855792e-05,
782
+ "loss": 0.3181,
783
+ "step": 1230
784
+ },
785
+ {
786
+ "epoch": 4.39,
787
+ "learning_rate": 3.112687155240347e-05,
788
+ "loss": 0.2779,
789
+ "step": 1240
790
+ },
791
+ {
792
+ "epoch": 4.42,
793
+ "learning_rate": 3.0929866036249016e-05,
794
+ "loss": 0.3016,
795
+ "step": 1250
796
+ },
797
+ {
798
+ "epoch": 4.46,
799
+ "learning_rate": 3.073286052009456e-05,
800
+ "loss": 0.2998,
801
+ "step": 1260
802
+ },
803
+ {
804
+ "epoch": 4.5,
805
+ "learning_rate": 3.053585500394011e-05,
806
+ "loss": 0.3195,
807
+ "step": 1270
808
+ },
809
+ {
810
+ "epoch": 4.53,
811
+ "learning_rate": 3.033884948778566e-05,
812
+ "loss": 0.3305,
813
+ "step": 1280
814
+ },
815
+ {
816
+ "epoch": 4.57,
817
+ "learning_rate": 3.0141843971631207e-05,
818
+ "loss": 0.3228,
819
+ "step": 1290
820
+ },
821
+ {
822
+ "epoch": 4.6,
823
+ "learning_rate": 2.9944838455476754e-05,
824
+ "loss": 0.3396,
825
+ "step": 1300
826
+ },
827
+ {
828
+ "epoch": 4.64,
829
+ "learning_rate": 2.97478329393223e-05,
830
+ "loss": 0.3228,
831
+ "step": 1310
832
+ },
833
+ {
834
+ "epoch": 4.67,
835
+ "learning_rate": 2.9550827423167847e-05,
836
+ "loss": 0.328,
837
+ "step": 1320
838
+ },
839
+ {
840
+ "epoch": 4.71,
841
+ "learning_rate": 2.9353821907013394e-05,
842
+ "loss": 0.2817,
843
+ "step": 1330
844
+ },
845
+ {
846
+ "epoch": 4.74,
847
+ "learning_rate": 2.9156816390858944e-05,
848
+ "loss": 0.3395,
849
+ "step": 1340
850
+ },
851
+ {
852
+ "epoch": 4.78,
853
+ "learning_rate": 2.895981087470449e-05,
854
+ "loss": 0.3263,
855
+ "step": 1350
856
+ },
857
+ {
858
+ "epoch": 4.81,
859
+ "learning_rate": 2.8762805358550045e-05,
860
+ "loss": 0.3101,
861
+ "step": 1360
862
+ },
863
+ {
864
+ "epoch": 4.85,
865
+ "learning_rate": 2.8565799842395592e-05,
866
+ "loss": 0.2777,
867
+ "step": 1370
868
+ },
869
+ {
870
+ "epoch": 4.88,
871
+ "learning_rate": 2.836879432624114e-05,
872
+ "loss": 0.305,
873
+ "step": 1380
874
+ },
875
+ {
876
+ "epoch": 4.92,
877
+ "learning_rate": 2.8171788810086685e-05,
878
+ "loss": 0.3065,
879
+ "step": 1390
880
+ },
881
+ {
882
+ "epoch": 4.96,
883
+ "learning_rate": 2.7974783293932232e-05,
884
+ "loss": 0.311,
885
+ "step": 1400
886
+ },
887
+ {
888
+ "epoch": 4.99,
889
+ "learning_rate": 2.777777777777778e-05,
890
+ "loss": 0.3146,
891
+ "step": 1410
892
+ },
893
+ {
894
+ "epoch": 5.0,
895
+ "eval_accuracy": 0.9198207171314741,
896
+ "eval_loss": 0.23812231421470642,
897
+ "eval_runtime": 71.133,
898
+ "eval_samples_per_second": 56.458,
899
+ "eval_steps_per_second": 1.771,
900
+ "step": 1412
901
+ },
902
+ {
903
+ "epoch": 5.03,
904
+ "learning_rate": 2.758077226162333e-05,
905
+ "loss": 0.2598,
906
+ "step": 1420
907
+ },
908
+ {
909
+ "epoch": 5.06,
910
+ "learning_rate": 2.7383766745468876e-05,
911
+ "loss": 0.2723,
912
+ "step": 1430
913
+ },
914
+ {
915
+ "epoch": 5.1,
916
+ "learning_rate": 2.7186761229314423e-05,
917
+ "loss": 0.2936,
918
+ "step": 1440
919
+ },
920
+ {
921
+ "epoch": 5.13,
922
+ "learning_rate": 2.698975571315997e-05,
923
+ "loss": 0.3201,
924
+ "step": 1450
925
+ },
926
+ {
927
+ "epoch": 5.17,
928
+ "learning_rate": 2.6792750197005517e-05,
929
+ "loss": 0.2537,
930
+ "step": 1460
931
+ },
932
+ {
933
+ "epoch": 5.2,
934
+ "learning_rate": 2.6595744680851064e-05,
935
+ "loss": 0.327,
936
+ "step": 1470
937
+ },
938
+ {
939
+ "epoch": 5.24,
940
+ "learning_rate": 2.639873916469661e-05,
941
+ "loss": 0.2937,
942
+ "step": 1480
943
+ },
944
+ {
945
+ "epoch": 5.27,
946
+ "learning_rate": 2.620173364854216e-05,
947
+ "loss": 0.2968,
948
+ "step": 1490
949
+ },
950
+ {
951
+ "epoch": 5.31,
952
+ "learning_rate": 2.6004728132387708e-05,
953
+ "loss": 0.2921,
954
+ "step": 1500
955
+ },
956
+ {
957
+ "epoch": 5.35,
958
+ "learning_rate": 2.5807722616233254e-05,
959
+ "loss": 0.3074,
960
+ "step": 1510
961
+ },
962
+ {
963
+ "epoch": 5.38,
964
+ "learning_rate": 2.56107171000788e-05,
965
+ "loss": 0.3168,
966
+ "step": 1520
967
+ },
968
+ {
969
+ "epoch": 5.42,
970
+ "learning_rate": 2.5413711583924348e-05,
971
+ "loss": 0.2683,
972
+ "step": 1530
973
+ },
974
+ {
975
+ "epoch": 5.45,
976
+ "learning_rate": 2.5216706067769895e-05,
977
+ "loss": 0.2985,
978
+ "step": 1540
979
+ },
980
+ {
981
+ "epoch": 5.49,
982
+ "learning_rate": 2.5019700551615445e-05,
983
+ "loss": 0.2838,
984
+ "step": 1550
985
+ },
986
+ {
987
+ "epoch": 5.52,
988
+ "learning_rate": 2.4822695035460995e-05,
989
+ "loss": 0.2936,
990
+ "step": 1560
991
+ },
992
+ {
993
+ "epoch": 5.56,
994
+ "learning_rate": 2.4625689519306542e-05,
995
+ "loss": 0.3081,
996
+ "step": 1570
997
+ },
998
+ {
999
+ "epoch": 5.59,
1000
+ "learning_rate": 2.442868400315209e-05,
1001
+ "loss": 0.2373,
1002
+ "step": 1580
1003
+ },
1004
+ {
1005
+ "epoch": 5.63,
1006
+ "learning_rate": 2.4231678486997636e-05,
1007
+ "loss": 0.2953,
1008
+ "step": 1590
1009
+ },
1010
+ {
1011
+ "epoch": 5.66,
1012
+ "learning_rate": 2.4034672970843186e-05,
1013
+ "loss": 0.2781,
1014
+ "step": 1600
1015
+ },
1016
+ {
1017
+ "epoch": 5.7,
1018
+ "learning_rate": 2.3837667454688733e-05,
1019
+ "loss": 0.277,
1020
+ "step": 1610
1021
+ },
1022
+ {
1023
+ "epoch": 5.73,
1024
+ "learning_rate": 2.364066193853428e-05,
1025
+ "loss": 0.2925,
1026
+ "step": 1620
1027
+ },
1028
+ {
1029
+ "epoch": 5.77,
1030
+ "learning_rate": 2.3443656422379827e-05,
1031
+ "loss": 0.2578,
1032
+ "step": 1630
1033
+ },
1034
+ {
1035
+ "epoch": 5.81,
1036
+ "learning_rate": 2.3246650906225377e-05,
1037
+ "loss": 0.2808,
1038
+ "step": 1640
1039
+ },
1040
+ {
1041
+ "epoch": 5.84,
1042
+ "learning_rate": 2.3049645390070924e-05,
1043
+ "loss": 0.3181,
1044
+ "step": 1650
1045
+ },
1046
+ {
1047
+ "epoch": 5.88,
1048
+ "learning_rate": 2.285263987391647e-05,
1049
+ "loss": 0.302,
1050
+ "step": 1660
1051
+ },
1052
+ {
1053
+ "epoch": 5.91,
1054
+ "learning_rate": 2.2655634357762018e-05,
1055
+ "loss": 0.2724,
1056
+ "step": 1670
1057
+ },
1058
+ {
1059
+ "epoch": 5.95,
1060
+ "learning_rate": 2.2458628841607564e-05,
1061
+ "loss": 0.2614,
1062
+ "step": 1680
1063
+ },
1064
+ {
1065
+ "epoch": 5.98,
1066
+ "learning_rate": 2.226162332545311e-05,
1067
+ "loss": 0.2883,
1068
+ "step": 1690
1069
+ },
1070
+ {
1071
+ "epoch": 6.0,
1072
+ "eval_accuracy": 0.921812749003984,
1073
+ "eval_loss": 0.24074074625968933,
1074
+ "eval_runtime": 81.9182,
1075
+ "eval_samples_per_second": 49.025,
1076
+ "eval_steps_per_second": 1.538,
1077
+ "step": 1695
1078
+ },
1079
+ {
1080
+ "epoch": 6.02,
1081
+ "learning_rate": 2.206461780929866e-05,
1082
+ "loss": 0.2695,
1083
+ "step": 1700
1084
+ },
1085
+ {
1086
+ "epoch": 6.05,
1087
+ "learning_rate": 2.186761229314421e-05,
1088
+ "loss": 0.31,
1089
+ "step": 1710
1090
+ },
1091
+ {
1092
+ "epoch": 6.09,
1093
+ "learning_rate": 2.167060677698976e-05,
1094
+ "loss": 0.2598,
1095
+ "step": 1720
1096
+ },
1097
+ {
1098
+ "epoch": 6.12,
1099
+ "learning_rate": 2.1473601260835305e-05,
1100
+ "loss": 0.26,
1101
+ "step": 1730
1102
+ },
1103
+ {
1104
+ "epoch": 6.16,
1105
+ "learning_rate": 2.1276595744680852e-05,
1106
+ "loss": 0.2491,
1107
+ "step": 1740
1108
+ },
1109
+ {
1110
+ "epoch": 6.19,
1111
+ "learning_rate": 2.10795902285264e-05,
1112
+ "loss": 0.2479,
1113
+ "step": 1750
1114
+ },
1115
+ {
1116
+ "epoch": 6.23,
1117
+ "learning_rate": 2.0882584712371946e-05,
1118
+ "loss": 0.2387,
1119
+ "step": 1760
1120
+ },
1121
+ {
1122
+ "epoch": 6.27,
1123
+ "learning_rate": 2.0685579196217493e-05,
1124
+ "loss": 0.2686,
1125
+ "step": 1770
1126
+ },
1127
+ {
1128
+ "epoch": 6.3,
1129
+ "learning_rate": 2.0488573680063043e-05,
1130
+ "loss": 0.2302,
1131
+ "step": 1780
1132
+ },
1133
+ {
1134
+ "epoch": 6.34,
1135
+ "learning_rate": 2.029156816390859e-05,
1136
+ "loss": 0.2592,
1137
+ "step": 1790
1138
+ },
1139
+ {
1140
+ "epoch": 6.37,
1141
+ "learning_rate": 2.009456264775414e-05,
1142
+ "loss": 0.2857,
1143
+ "step": 1800
1144
+ },
1145
+ {
1146
+ "epoch": 6.41,
1147
+ "learning_rate": 1.9897557131599687e-05,
1148
+ "loss": 0.2666,
1149
+ "step": 1810
1150
+ },
1151
+ {
1152
+ "epoch": 6.44,
1153
+ "learning_rate": 1.9700551615445234e-05,
1154
+ "loss": 0.2332,
1155
+ "step": 1820
1156
+ },
1157
+ {
1158
+ "epoch": 6.48,
1159
+ "learning_rate": 1.950354609929078e-05,
1160
+ "loss": 0.2552,
1161
+ "step": 1830
1162
+ },
1163
+ {
1164
+ "epoch": 6.51,
1165
+ "learning_rate": 1.9306540583136327e-05,
1166
+ "loss": 0.2688,
1167
+ "step": 1840
1168
+ },
1169
+ {
1170
+ "epoch": 6.55,
1171
+ "learning_rate": 1.9109535066981878e-05,
1172
+ "loss": 0.2424,
1173
+ "step": 1850
1174
+ },
1175
+ {
1176
+ "epoch": 6.58,
1177
+ "learning_rate": 1.8912529550827425e-05,
1178
+ "loss": 0.2981,
1179
+ "step": 1860
1180
+ },
1181
+ {
1182
+ "epoch": 6.62,
1183
+ "learning_rate": 1.871552403467297e-05,
1184
+ "loss": 0.2247,
1185
+ "step": 1870
1186
+ },
1187
+ {
1188
+ "epoch": 6.65,
1189
+ "learning_rate": 1.8518518518518518e-05,
1190
+ "loss": 0.2652,
1191
+ "step": 1880
1192
+ },
1193
+ {
1194
+ "epoch": 6.69,
1195
+ "learning_rate": 1.8321513002364065e-05,
1196
+ "loss": 0.2303,
1197
+ "step": 1890
1198
+ },
1199
+ {
1200
+ "epoch": 6.73,
1201
+ "learning_rate": 1.8124507486209615e-05,
1202
+ "loss": 0.2841,
1203
+ "step": 1900
1204
+ },
1205
+ {
1206
+ "epoch": 6.76,
1207
+ "learning_rate": 1.7927501970055162e-05,
1208
+ "loss": 0.2634,
1209
+ "step": 1910
1210
+ },
1211
+ {
1212
+ "epoch": 6.8,
1213
+ "learning_rate": 1.773049645390071e-05,
1214
+ "loss": 0.2628,
1215
+ "step": 1920
1216
+ },
1217
+ {
1218
+ "epoch": 6.83,
1219
+ "learning_rate": 1.753349093774626e-05,
1220
+ "loss": 0.2686,
1221
+ "step": 1930
1222
+ },
1223
+ {
1224
+ "epoch": 6.87,
1225
+ "learning_rate": 1.7336485421591806e-05,
1226
+ "loss": 0.2289,
1227
+ "step": 1940
1228
+ },
1229
+ {
1230
+ "epoch": 6.9,
1231
+ "learning_rate": 1.7139479905437353e-05,
1232
+ "loss": 0.2754,
1233
+ "step": 1950
1234
+ },
1235
+ {
1236
+ "epoch": 6.94,
1237
+ "learning_rate": 1.69424743892829e-05,
1238
+ "loss": 0.3029,
1239
+ "step": 1960
1240
+ },
1241
+ {
1242
+ "epoch": 6.97,
1243
+ "learning_rate": 1.6745468873128447e-05,
1244
+ "loss": 0.2264,
1245
+ "step": 1970
1246
+ },
1247
+ {
1248
+ "epoch": 7.0,
1249
+ "eval_accuracy": 0.9277888446215139,
1250
+ "eval_loss": 0.21604356169700623,
1251
+ "eval_runtime": 81.708,
1252
+ "eval_samples_per_second": 49.151,
1253
+ "eval_steps_per_second": 1.542,
1254
+ "step": 1977
1255
+ },
1256
+ {
1257
+ "epoch": 7.01,
1258
+ "learning_rate": 1.6548463356973994e-05,
1259
+ "loss": 0.2146,
1260
+ "step": 1980
1261
+ },
1262
+ {
1263
+ "epoch": 7.04,
1264
+ "learning_rate": 1.6351457840819544e-05,
1265
+ "loss": 0.246,
1266
+ "step": 1990
1267
+ },
1268
+ {
1269
+ "epoch": 7.08,
1270
+ "learning_rate": 1.6154452324665094e-05,
1271
+ "loss": 0.2321,
1272
+ "step": 2000
1273
+ },
1274
+ {
1275
+ "epoch": 7.12,
1276
+ "learning_rate": 1.595744680851064e-05,
1277
+ "loss": 0.2116,
1278
+ "step": 2010
1279
+ },
1280
+ {
1281
+ "epoch": 7.15,
1282
+ "learning_rate": 1.5760441292356188e-05,
1283
+ "loss": 0.2548,
1284
+ "step": 2020
1285
+ },
1286
+ {
1287
+ "epoch": 7.19,
1288
+ "learning_rate": 1.5563435776201735e-05,
1289
+ "loss": 0.2535,
1290
+ "step": 2030
1291
+ },
1292
+ {
1293
+ "epoch": 7.22,
1294
+ "learning_rate": 1.536643026004728e-05,
1295
+ "loss": 0.2411,
1296
+ "step": 2040
1297
+ },
1298
+ {
1299
+ "epoch": 7.26,
1300
+ "learning_rate": 1.516942474389283e-05,
1301
+ "loss": 0.2832,
1302
+ "step": 2050
1303
+ },
1304
+ {
1305
+ "epoch": 7.29,
1306
+ "learning_rate": 1.4972419227738377e-05,
1307
+ "loss": 0.2626,
1308
+ "step": 2060
1309
+ },
1310
+ {
1311
+ "epoch": 7.33,
1312
+ "learning_rate": 1.4775413711583924e-05,
1313
+ "loss": 0.2454,
1314
+ "step": 2070
1315
+ },
1316
+ {
1317
+ "epoch": 7.36,
1318
+ "learning_rate": 1.4578408195429472e-05,
1319
+ "loss": 0.2343,
1320
+ "step": 2080
1321
+ },
1322
+ {
1323
+ "epoch": 7.4,
1324
+ "learning_rate": 1.4381402679275022e-05,
1325
+ "loss": 0.235,
1326
+ "step": 2090
1327
+ },
1328
+ {
1329
+ "epoch": 7.43,
1330
+ "learning_rate": 1.418439716312057e-05,
1331
+ "loss": 0.22,
1332
+ "step": 2100
1333
+ },
1334
+ {
1335
+ "epoch": 7.47,
1336
+ "learning_rate": 1.3987391646966116e-05,
1337
+ "loss": 0.2518,
1338
+ "step": 2110
1339
+ },
1340
+ {
1341
+ "epoch": 7.5,
1342
+ "learning_rate": 1.3790386130811665e-05,
1343
+ "loss": 0.2151,
1344
+ "step": 2120
1345
+ },
1346
+ {
1347
+ "epoch": 7.54,
1348
+ "learning_rate": 1.3593380614657212e-05,
1349
+ "loss": 0.2332,
1350
+ "step": 2130
1351
+ },
1352
+ {
1353
+ "epoch": 7.58,
1354
+ "learning_rate": 1.3396375098502758e-05,
1355
+ "loss": 0.1913,
1356
+ "step": 2140
1357
+ },
1358
+ {
1359
+ "epoch": 7.61,
1360
+ "learning_rate": 1.3199369582348305e-05,
1361
+ "loss": 0.2233,
1362
+ "step": 2150
1363
+ },
1364
+ {
1365
+ "epoch": 7.65,
1366
+ "learning_rate": 1.3002364066193854e-05,
1367
+ "loss": 0.2342,
1368
+ "step": 2160
1369
+ },
1370
+ {
1371
+ "epoch": 7.68,
1372
+ "learning_rate": 1.28053585500394e-05,
1373
+ "loss": 0.221,
1374
+ "step": 2170
1375
+ },
1376
+ {
1377
+ "epoch": 7.72,
1378
+ "learning_rate": 1.2608353033884947e-05,
1379
+ "loss": 0.2295,
1380
+ "step": 2180
1381
+ },
1382
+ {
1383
+ "epoch": 7.75,
1384
+ "learning_rate": 1.2411347517730498e-05,
1385
+ "loss": 0.2108,
1386
+ "step": 2190
1387
+ },
1388
+ {
1389
+ "epoch": 7.79,
1390
+ "learning_rate": 1.2214342001576045e-05,
1391
+ "loss": 0.2356,
1392
+ "step": 2200
1393
+ },
1394
+ {
1395
+ "epoch": 7.82,
1396
+ "learning_rate": 1.2017336485421593e-05,
1397
+ "loss": 0.255,
1398
+ "step": 2210
1399
+ },
1400
+ {
1401
+ "epoch": 7.86,
1402
+ "learning_rate": 1.182033096926714e-05,
1403
+ "loss": 0.2301,
1404
+ "step": 2220
1405
+ },
1406
+ {
1407
+ "epoch": 7.89,
1408
+ "learning_rate": 1.1623325453112688e-05,
1409
+ "loss": 0.2541,
1410
+ "step": 2230
1411
+ },
1412
+ {
1413
+ "epoch": 7.93,
1414
+ "learning_rate": 1.1426319936958235e-05,
1415
+ "loss": 0.2411,
1416
+ "step": 2240
1417
+ },
1418
+ {
1419
+ "epoch": 7.96,
1420
+ "learning_rate": 1.1229314420803782e-05,
1421
+ "loss": 0.2147,
1422
+ "step": 2250
1423
+ },
1424
+ {
1425
+ "epoch": 8.0,
1426
+ "learning_rate": 1.103230890464933e-05,
1427
+ "loss": 0.2339,
1428
+ "step": 2260
1429
+ },
1430
+ {
1431
+ "epoch": 8.0,
1432
+ "eval_accuracy": 0.9282868525896414,
1433
+ "eval_loss": 0.21214231848716736,
1434
+ "eval_runtime": 78.6901,
1435
+ "eval_samples_per_second": 51.036,
1436
+ "eval_steps_per_second": 1.601,
1437
+ "step": 2260
1438
+ },
1439
+ {
1440
+ "epoch": 8.04,
1441
+ "learning_rate": 1.083530338849488e-05,
1442
+ "loss": 0.1966,
1443
+ "step": 2270
1444
+ },
1445
+ {
1446
+ "epoch": 8.07,
1447
+ "learning_rate": 1.0638297872340426e-05,
1448
+ "loss": 0.1993,
1449
+ "step": 2280
1450
+ },
1451
+ {
1452
+ "epoch": 8.11,
1453
+ "learning_rate": 1.0441292356185973e-05,
1454
+ "loss": 0.2137,
1455
+ "step": 2290
1456
+ },
1457
+ {
1458
+ "epoch": 8.14,
1459
+ "learning_rate": 1.0244286840031522e-05,
1460
+ "loss": 0.2221,
1461
+ "step": 2300
1462
+ },
1463
+ {
1464
+ "epoch": 8.18,
1465
+ "learning_rate": 1.004728132387707e-05,
1466
+ "loss": 0.2274,
1467
+ "step": 2310
1468
+ },
1469
+ {
1470
+ "epoch": 8.21,
1471
+ "learning_rate": 9.850275807722617e-06,
1472
+ "loss": 0.2008,
1473
+ "step": 2320
1474
+ },
1475
+ {
1476
+ "epoch": 8.25,
1477
+ "learning_rate": 9.653270291568164e-06,
1478
+ "loss": 0.244,
1479
+ "step": 2330
1480
+ },
1481
+ {
1482
+ "epoch": 8.28,
1483
+ "learning_rate": 9.456264775413712e-06,
1484
+ "loss": 0.207,
1485
+ "step": 2340
1486
+ },
1487
+ {
1488
+ "epoch": 8.32,
1489
+ "learning_rate": 9.259259259259259e-06,
1490
+ "loss": 0.2319,
1491
+ "step": 2350
1492
+ },
1493
+ {
1494
+ "epoch": 8.35,
1495
+ "learning_rate": 9.062253743104808e-06,
1496
+ "loss": 0.2151,
1497
+ "step": 2360
1498
+ },
1499
+ {
1500
+ "epoch": 8.39,
1501
+ "learning_rate": 8.865248226950355e-06,
1502
+ "loss": 0.2118,
1503
+ "step": 2370
1504
+ },
1505
+ {
1506
+ "epoch": 8.42,
1507
+ "learning_rate": 8.668242710795903e-06,
1508
+ "loss": 0.2449,
1509
+ "step": 2380
1510
+ },
1511
+ {
1512
+ "epoch": 8.46,
1513
+ "learning_rate": 8.47123719464145e-06,
1514
+ "loss": 0.2707,
1515
+ "step": 2390
1516
+ },
1517
+ {
1518
+ "epoch": 8.5,
1519
+ "learning_rate": 8.274231678486997e-06,
1520
+ "loss": 0.2238,
1521
+ "step": 2400
1522
+ },
1523
+ {
1524
+ "epoch": 8.53,
1525
+ "learning_rate": 8.077226162332547e-06,
1526
+ "loss": 0.2197,
1527
+ "step": 2410
1528
+ },
1529
+ {
1530
+ "epoch": 8.57,
1531
+ "learning_rate": 7.880220646178094e-06,
1532
+ "loss": 0.2109,
1533
+ "step": 2420
1534
+ },
1535
+ {
1536
+ "epoch": 8.6,
1537
+ "learning_rate": 7.68321513002364e-06,
1538
+ "loss": 0.2258,
1539
+ "step": 2430
1540
+ },
1541
+ {
1542
+ "epoch": 8.64,
1543
+ "learning_rate": 7.486209613869188e-06,
1544
+ "loss": 0.1806,
1545
+ "step": 2440
1546
+ },
1547
+ {
1548
+ "epoch": 8.67,
1549
+ "learning_rate": 7.289204097714736e-06,
1550
+ "loss": 0.2156,
1551
+ "step": 2450
1552
+ },
1553
+ {
1554
+ "epoch": 8.71,
1555
+ "learning_rate": 7.092198581560285e-06,
1556
+ "loss": 0.2298,
1557
+ "step": 2460
1558
+ },
1559
+ {
1560
+ "epoch": 8.74,
1561
+ "learning_rate": 6.895193065405832e-06,
1562
+ "loss": 0.1984,
1563
+ "step": 2470
1564
+ },
1565
+ {
1566
+ "epoch": 8.78,
1567
+ "learning_rate": 6.698187549251379e-06,
1568
+ "loss": 0.1997,
1569
+ "step": 2480
1570
+ },
1571
+ {
1572
+ "epoch": 8.81,
1573
+ "learning_rate": 6.501182033096927e-06,
1574
+ "loss": 0.2346,
1575
+ "step": 2490
1576
+ },
1577
+ {
1578
+ "epoch": 8.85,
1579
+ "learning_rate": 6.304176516942474e-06,
1580
+ "loss": 0.1982,
1581
+ "step": 2500
1582
+ },
1583
+ {
1584
+ "epoch": 8.88,
1585
+ "learning_rate": 6.107171000788022e-06,
1586
+ "loss": 0.2262,
1587
+ "step": 2510
1588
+ },
1589
+ {
1590
+ "epoch": 8.92,
1591
+ "learning_rate": 5.91016548463357e-06,
1592
+ "loss": 0.195,
1593
+ "step": 2520
1594
+ },
1595
+ {
1596
+ "epoch": 8.96,
1597
+ "learning_rate": 5.713159968479118e-06,
1598
+ "loss": 0.2519,
1599
+ "step": 2530
1600
+ },
1601
+ {
1602
+ "epoch": 8.99,
1603
+ "learning_rate": 5.516154452324665e-06,
1604
+ "loss": 0.1966,
1605
+ "step": 2540
1606
+ },
1607
+ {
1608
+ "epoch": 9.0,
1609
+ "eval_accuracy": 0.9302788844621513,
1610
+ "eval_loss": 0.20438914000988007,
1611
+ "eval_runtime": 93.0501,
1612
+ "eval_samples_per_second": 43.16,
1613
+ "eval_steps_per_second": 1.354,
1614
+ "step": 2542
1615
+ },
1616
+ {
1617
+ "epoch": 9.03,
1618
+ "learning_rate": 5.319148936170213e-06,
1619
+ "loss": 0.2231,
1620
+ "step": 2550
1621
+ },
1622
+ {
1623
+ "epoch": 9.06,
1624
+ "learning_rate": 5.122143420015761e-06,
1625
+ "loss": 0.2259,
1626
+ "step": 2560
1627
+ },
1628
+ {
1629
+ "epoch": 9.1,
1630
+ "learning_rate": 4.9251379038613084e-06,
1631
+ "loss": 0.2053,
1632
+ "step": 2570
1633
+ },
1634
+ {
1635
+ "epoch": 9.13,
1636
+ "learning_rate": 4.728132387706856e-06,
1637
+ "loss": 0.1838,
1638
+ "step": 2580
1639
+ },
1640
+ {
1641
+ "epoch": 9.17,
1642
+ "learning_rate": 4.531126871552404e-06,
1643
+ "loss": 0.2209,
1644
+ "step": 2590
1645
+ },
1646
+ {
1647
+ "epoch": 9.2,
1648
+ "learning_rate": 4.3341213553979515e-06,
1649
+ "loss": 0.2122,
1650
+ "step": 2600
1651
+ },
1652
+ {
1653
+ "epoch": 9.24,
1654
+ "learning_rate": 4.137115839243498e-06,
1655
+ "loss": 0.1958,
1656
+ "step": 2610
1657
+ },
1658
+ {
1659
+ "epoch": 9.27,
1660
+ "learning_rate": 3.940110323089047e-06,
1661
+ "loss": 0.1875,
1662
+ "step": 2620
1663
+ },
1664
+ {
1665
+ "epoch": 9.31,
1666
+ "learning_rate": 3.743104806934594e-06,
1667
+ "loss": 0.2083,
1668
+ "step": 2630
1669
+ },
1670
+ {
1671
+ "epoch": 9.35,
1672
+ "learning_rate": 3.5460992907801423e-06,
1673
+ "loss": 0.2043,
1674
+ "step": 2640
1675
+ },
1676
+ {
1677
+ "epoch": 9.38,
1678
+ "learning_rate": 3.3490937746256896e-06,
1679
+ "loss": 0.1946,
1680
+ "step": 2650
1681
+ },
1682
+ {
1683
+ "epoch": 9.42,
1684
+ "learning_rate": 3.152088258471237e-06,
1685
+ "loss": 0.194,
1686
+ "step": 2660
1687
+ },
1688
+ {
1689
+ "epoch": 9.45,
1690
+ "learning_rate": 2.955082742316785e-06,
1691
+ "loss": 0.209,
1692
+ "step": 2670
1693
+ },
1694
+ {
1695
+ "epoch": 9.49,
1696
+ "learning_rate": 2.7580772261623327e-06,
1697
+ "loss": 0.2092,
1698
+ "step": 2680
1699
+ },
1700
+ {
1701
+ "epoch": 9.52,
1702
+ "learning_rate": 2.5610717100078804e-06,
1703
+ "loss": 0.2246,
1704
+ "step": 2690
1705
+ },
1706
+ {
1707
+ "epoch": 9.56,
1708
+ "learning_rate": 2.364066193853428e-06,
1709
+ "loss": 0.2011,
1710
+ "step": 2700
1711
+ },
1712
+ {
1713
+ "epoch": 9.59,
1714
+ "learning_rate": 2.1670606776989758e-06,
1715
+ "loss": 0.1963,
1716
+ "step": 2710
1717
+ },
1718
+ {
1719
+ "epoch": 9.63,
1720
+ "learning_rate": 1.9700551615445235e-06,
1721
+ "loss": 0.2212,
1722
+ "step": 2720
1723
+ },
1724
+ {
1725
+ "epoch": 9.66,
1726
+ "learning_rate": 1.7730496453900712e-06,
1727
+ "loss": 0.214,
1728
+ "step": 2730
1729
+ },
1730
+ {
1731
+ "epoch": 9.7,
1732
+ "learning_rate": 1.5760441292356184e-06,
1733
+ "loss": 0.18,
1734
+ "step": 2740
1735
+ },
1736
+ {
1737
+ "epoch": 9.73,
1738
+ "learning_rate": 1.3790386130811663e-06,
1739
+ "loss": 0.183,
1740
+ "step": 2750
1741
+ },
1742
+ {
1743
+ "epoch": 9.77,
1744
+ "learning_rate": 1.182033096926714e-06,
1745
+ "loss": 0.2138,
1746
+ "step": 2760
1747
+ },
1748
+ {
1749
+ "epoch": 9.81,
1750
+ "learning_rate": 9.850275807722617e-07,
1751
+ "loss": 0.206,
1752
+ "step": 2770
1753
+ },
1754
+ {
1755
+ "epoch": 9.84,
1756
+ "learning_rate": 7.880220646178092e-07,
1757
+ "loss": 0.1992,
1758
+ "step": 2780
1759
+ },
1760
+ {
1761
+ "epoch": 9.88,
1762
+ "learning_rate": 5.91016548463357e-07,
1763
+ "loss": 0.2199,
1764
+ "step": 2790
1765
+ },
1766
+ {
1767
+ "epoch": 9.91,
1768
+ "learning_rate": 3.940110323089046e-07,
1769
+ "loss": 0.195,
1770
+ "step": 2800
1771
+ },
1772
+ {
1773
+ "epoch": 9.95,
1774
+ "learning_rate": 1.970055161544523e-07,
1775
+ "loss": 0.2096,
1776
+ "step": 2810
1777
+ },
1778
+ {
1779
+ "epoch": 9.98,
1780
+ "learning_rate": 0.0,
1781
+ "loss": 0.2366,
1782
+ "step": 2820
1783
+ },
1784
+ {
1785
+ "epoch": 9.98,
1786
+ "eval_accuracy": 0.9342629482071713,
1787
+ "eval_loss": 0.19922704994678497,
1788
+ "eval_runtime": 78.2576,
1789
+ "eval_samples_per_second": 51.318,
1790
+ "eval_steps_per_second": 1.61,
1791
+ "step": 2820
1792
+ },
1793
+ {
1794
+ "epoch": 9.98,
1795
+ "step": 2820,
1796
+ "total_flos": 2.796343146304469e+19,
1797
+ "train_loss": 0.5066332549913555,
1798
+ "train_runtime": 12045.0104,
1799
+ "train_samples_per_second": 30.002,
1800
+ "train_steps_per_second": 0.234
1801
+ }
1802
+ ],
1803
+ "logging_steps": 10,
1804
+ "max_steps": 2820,
1805
+ "num_train_epochs": 10,
1806
+ "save_steps": 500,
1807
+ "total_flos": 2.796343146304469e+19,
1808
+ "trial_name": null,
1809
+ "trial_params": null
1810
+ }
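The entries above mix per-step training records (`loss`, `learning_rate`), per-epoch evaluation records (`eval_loss`, `eval_accuracy`), and one closing summary record (`train_loss`, `train_runtime`). A minimal sketch of how such a file could be split and summarized follows. It assumes the array that closes just before `logging_steps` is stored under the key `log_history`, as Hugging Face Trainer state files normally do; that key is not visible in this part of the diff. The helper names `summarize_log_history` and `load_state` are hypothetical, and the sample records are copied from the entries shown above.

```python
import json


def summarize_log_history(state):
    """Split a trainer-state dict into training and evaluation records."""
    # Assumption: the log array lives under "log_history".
    history = state.get("log_history", [])
    # Training records carry both "loss" and "learning_rate";
    # the closing summary record uses "train_loss" and is skipped here.
    train = [e for e in history if "loss" in e and "learning_rate" in e]
    # Evaluation records carry "eval_loss".
    evals = [e for e in history if "eval_loss" in e]
    best = min(evals, key=lambda e: e["eval_loss"]) if evals else None
    return {"train_logs": len(train), "eval_logs": len(evals), "best_eval": best}


def load_state(path):
    """Read a trainer_state.json file from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# A few records copied from the diff above, used as in-memory sample data.
sample = {
    "log_history": [
        {"epoch": 9.95, "learning_rate": 1.970055161544523e-07,
         "loss": 0.2096, "step": 2810},
        {"epoch": 9.98, "learning_rate": 0.0, "loss": 0.2366, "step": 2820},
        {"epoch": 9.0, "eval_accuracy": 0.9302788844621513,
         "eval_loss": 0.20438914000988007, "step": 2542},
        {"epoch": 9.98, "eval_accuracy": 0.9342629482071713,
         "eval_loss": 0.19922704994678497, "step": 2820},
        {"epoch": 9.98, "step": 2820, "train_loss": 0.5066332549913555},
    ]
}

summary = summarize_log_history(sample)
```

On the real file, `summarize_log_history(load_state("trainer_state.json"))` would perform the same split. Selecting the best evaluation by lowest `eval_loss` is a choice made for this sketch; `eval_accuracy` could be used instead.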