pszemraj committed on
Commit 552a5a3
1 Parent(s): 9d0261a

End of training

Files changed (5)
  1. README.md +5 -3
  2. all_results.json +15 -0
  3. eval_results.json +9 -0
  4. train_results.json +10 -0
  5. trainer_state.json +1354 -0
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 library_name: transformers
+language:
+- en
 license: apache-2.0
 base_model: facebook/bart-large
 tags:
@@ -14,10 +16,10 @@ should probably proofread and complete it, then remove this comment. -->

 # bart-large-summary-map-reduce-1024

-This model is a fine-tuned version of [facebook/bart-large](https://huggingface.co/facebook/bart-large) on an unknown dataset.
+This model is a fine-tuned version of [facebook/bart-large](https://huggingface.co/facebook/bart-large) on the pszemraj/summary-map-reduce dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.7903
-- Num Input Tokens Seen: 12814620
+- Loss: 0.7894
+- Num Input Tokens Seen: 14258488

 ## Model description
all_results.json ADDED
@@ -0,0 +1,15 @@
+{
+  "epoch": 2.9906542056074765,
+  "eval_loss": 0.78944993019104,
+  "eval_runtime": 0.9197,
+  "eval_samples": 150,
+  "eval_samples_per_second": 163.104,
+  "eval_steps_per_second": 41.32,
+  "num_input_tokens_seen": 14258488,
+  "total_flos": 3.0175424769490944e+16,
+  "train_loss": 0.895275863011678,
+  "train_runtime": 860.318,
+  "train_samples": 16692,
+  "train_samples_per_second": 58.206,
+  "train_steps_per_second": 0.907
+}
eval_results.json ADDED
@@ -0,0 +1,9 @@
+{
+  "epoch": 2.9906542056074765,
+  "eval_loss": 0.78944993019104,
+  "eval_runtime": 0.9197,
+  "eval_samples": 150,
+  "eval_samples_per_second": 163.104,
+  "eval_steps_per_second": 41.32,
+  "num_input_tokens_seen": 14258488
+}
train_results.json ADDED
@@ -0,0 +1,10 @@
+{
+  "epoch": 2.9906542056074765,
+  "num_input_tokens_seen": 14258488,
+  "total_flos": 3.0175424769490944e+16,
+  "train_loss": 0.895275863011678,
+  "train_runtime": 860.318,
+  "train_samples": 16692,
+  "train_samples_per_second": 58.206,
+  "train_steps_per_second": 0.907
+}
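The throughput figures in train_results.json follow directly from the sample count, step count, and runtime. A minimal sketch of the arithmetic, assuming the run was configured for 3 epochs (the final epoch of ~2.99 reflects the last partial step of a 780-step schedule):

```python
# Reported values from train_results.json
train_samples = 16692
train_runtime = 860.318  # seconds
global_step = 780
num_train_epochs = 3  # assumed from the ~2.99 final epoch

# samples/sec counts every sample across all epochs
samples_per_second = train_samples * num_train_epochs / train_runtime
steps_per_second = global_step / train_runtime

print(round(samples_per_second, 3))  # 58.206, matching the report
print(round(steps_per_second, 3))    # 0.907, matching the report
```

The same relation holds for the eval block (150 samples / 0.9197 s), up to rounding of the reported runtime.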
trainer_state.json ADDED
@@ -0,0 +1,1354 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.9906542056074765,
+  "eval_steps": 100,
+  "global_step": 780,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.01917086029235562,
+      "grad_norm": 10.80891227722168,
+      "learning_rate": 1.282051282051282e-05,
+      "loss": 2.5867,
+      "num_input_tokens_seen": 85900,
+      "step": 5
+    },
+    {
+      "epoch": 0.03834172058471124,
+      "grad_norm": 5.320851802825928,
+      "learning_rate": 2.564102564102564e-05,
+      "loss": 1.778,
+      "num_input_tokens_seen": 168784,
+      "step": 10
+    },
+    {
+      "epoch": 0.05751258087706686,
+      "grad_norm": 2.427093982696533,
+      "learning_rate": 3.846153846153846e-05,
+      "loss": 1.5006,
+      "num_input_tokens_seen": 262708,
+      "step": 15
+    },
+    {
+      "epoch": 0.07668344116942248,
+      "grad_norm": 2.143993854522705,
+      "learning_rate": 5.128205128205128e-05,
+      "loss": 1.3452,
+      "num_input_tokens_seen": 350572,
+      "step": 20
+    },
+    {
+      "epoch": 0.09585430146177809,
+      "grad_norm": 2.46453595161438,
+      "learning_rate": 6.410256410256412e-05,
+      "loss": 1.2953,
+      "num_input_tokens_seen": 441976,
+      "step": 25
+    },
+    {
+      "epoch": 0.11502516175413371,
+      "grad_norm": 2.382268190383911,
+      "learning_rate": 7.692307692307693e-05,
+      "loss": 1.3104,
+      "num_input_tokens_seen": 525052,
+      "step": 30
+    },
+    {
+      "epoch": 0.13419602204648934,
+      "grad_norm": 2.990694761276245,
+      "learning_rate": 8.974358974358975e-05,
+      "loss": 1.306,
+      "num_input_tokens_seen": 626524,
+      "step": 35
+    },
+    {
+      "epoch": 0.15336688233884496,
+      "grad_norm": 1.8448060750961304,
+      "learning_rate": 9.99995506314361e-05,
+      "loss": 1.2473,
+      "num_input_tokens_seen": 709868,
+      "step": 40
+    },
+    {
+      "epoch": 0.17253774263120059,
+      "grad_norm": 2.6468124389648438,
+      "learning_rate": 9.998382357979809e-05,
+      "loss": 1.2091,
+      "num_input_tokens_seen": 823312,
+      "step": 45
+    },
+    {
+      "epoch": 0.19170860292355618,
+      "grad_norm": 1.919690728187561,
+      "learning_rate": 9.994563617659665e-05,
+      "loss": 1.1985,
+      "num_input_tokens_seen": 915100,
+      "step": 50
+    },
+    {
+      "epoch": 0.2108794632159118,
+      "grad_norm": 2.13997483253479,
+      "learning_rate": 9.988500558143337e-05,
+      "loss": 1.1922,
+      "num_input_tokens_seen": 1014472,
+      "step": 55
+    },
+    {
+      "epoch": 0.23005032350826743,
+      "grad_norm": 1.7454973459243774,
+      "learning_rate": 9.980195903881232e-05,
+      "loss": 1.2273,
+      "num_input_tokens_seen": 1111288,
+      "step": 60
+    },
+    {
+      "epoch": 0.24922118380062305,
+      "grad_norm": 2.0489871501922607,
+      "learning_rate": 9.969653386589748e-05,
+      "loss": 1.1729,
+      "num_input_tokens_seen": 1193792,
+      "step": 65
+    },
+    {
+      "epoch": 0.2683920440929787,
+      "grad_norm": 1.851652979850769,
+      "learning_rate": 9.956877743574438e-05,
+      "loss": 1.188,
+      "num_input_tokens_seen": 1293416,
+      "step": 70
+    },
+    {
+      "epoch": 0.2875629043853343,
+      "grad_norm": 2.3986475467681885,
+      "learning_rate": 9.94187471560127e-05,
+      "loss": 1.1459,
+      "num_input_tokens_seen": 1383712,
+      "step": 75
+    },
+    {
+      "epoch": 0.3067337646776899,
+      "grad_norm": 1.6811550855636597,
+      "learning_rate": 9.924651044317017e-05,
+      "loss": 1.1561,
+      "num_input_tokens_seen": 1483596,
+      "step": 80
+    },
+    {
+      "epoch": 0.3259046249700455,
+      "grad_norm": 2.3249671459198,
+      "learning_rate": 9.90521446921987e-05,
+      "loss": 1.1224,
+      "num_input_tokens_seen": 1561392,
+      "step": 85
+    },
+    {
+      "epoch": 0.34507548526240117,
+      "grad_norm": 2.17244553565979,
+      "learning_rate": 9.883573724181683e-05,
+      "loss": 1.1283,
+      "num_input_tokens_seen": 1660352,
+      "step": 90
+    },
+    {
+      "epoch": 0.36424634555475677,
+      "grad_norm": 2.0246732234954834,
+      "learning_rate": 9.859738533523383e-05,
+      "loss": 1.155,
+      "num_input_tokens_seen": 1751904,
+      "step": 95
+    },
+    {
+      "epoch": 0.38341720584711236,
+      "grad_norm": 1.6448779106140137,
+      "learning_rate": 9.833719607645324e-05,
+      "loss": 1.0645,
+      "num_input_tokens_seen": 1844404,
+      "step": 100
+    },
+    {
+      "epoch": 0.38341720584711236,
+      "eval_loss": 0.9265391826629639,
+      "eval_runtime": 0.8899,
+      "eval_samples_per_second": 168.564,
+      "eval_steps_per_second": 42.703,
+      "num_input_tokens_seen": 1844404,
+      "step": 100
+    },
+    {
+      "epoch": 0.402588066139468,
+      "grad_norm": 3.241382598876953,
+      "learning_rate": 9.805528638214542e-05,
+      "loss": 1.1597,
+      "num_input_tokens_seen": 1930376,
+      "step": 105
+    },
+    {
+      "epoch": 0.4217589264318236,
+      "grad_norm": 1.7404630184173584,
+      "learning_rate": 9.77517829291108e-05,
+      "loss": 1.112,
+      "num_input_tokens_seen": 2014196,
+      "step": 110
+    },
+    {
+      "epoch": 0.44092978672417926,
+      "grad_norm": 2.0619559288024902,
+      "learning_rate": 9.742682209735727e-05,
+      "loss": 1.1078,
+      "num_input_tokens_seen": 2105080,
+      "step": 115
+    },
+    {
+      "epoch": 0.46010064701653486,
+      "grad_norm": 1.618997573852539,
+      "learning_rate": 9.708054990881763e-05,
+      "loss": 1.123,
+      "num_input_tokens_seen": 2212296,
+      "step": 120
+    },
+    {
+      "epoch": 0.4792715073088905,
+      "grad_norm": 2.0891501903533936,
+      "learning_rate": 9.671312196173412e-05,
+      "loss": 1.1139,
+      "num_input_tokens_seen": 2297740,
+      "step": 125
+    },
+    {
+      "epoch": 0.4984423676012461,
+      "grad_norm": 2.8104231357574463,
+      "learning_rate": 9.632470336074009e-05,
+      "loss": 1.1419,
+      "num_input_tokens_seen": 2393552,
+      "step": 130
+    },
+    {
+      "epoch": 0.5176132278936018,
+      "grad_norm": 2.313002586364746,
+      "learning_rate": 9.591546864266983e-05,
+      "loss": 1.1122,
+      "num_input_tokens_seen": 2485636,
+      "step": 135
+    },
+    {
+      "epoch": 0.5367840881859574,
+      "grad_norm": 1.6471989154815674,
+      "learning_rate": 9.548560169812997e-05,
+      "loss": 1.0872,
+      "num_input_tokens_seen": 2579380,
+      "step": 140
+    },
+    {
+      "epoch": 0.555954948478313,
+      "grad_norm": 1.8052936792373657,
+      "learning_rate": 9.50352956888678e-05,
+      "loss": 1.1084,
+      "num_input_tokens_seen": 2660580,
+      "step": 145
+    },
+    {
+      "epoch": 0.5751258087706685,
+      "grad_norm": 1.9831300973892212,
+      "learning_rate": 9.45647529609736e-05,
+      "loss": 1.0753,
+      "num_input_tokens_seen": 2752664,
+      "step": 150
+    },
+    {
+      "epoch": 0.5942966690630243,
+      "grad_norm": 1.9232532978057861,
+      "learning_rate": 9.4074184953956e-05,
+      "loss": 1.0932,
+      "num_input_tokens_seen": 2835612,
+      "step": 155
+    },
+    {
+      "epoch": 0.6134675293553798,
+      "grad_norm": 3.607057571411133,
+      "learning_rate": 9.356381210573091e-05,
+      "loss": 1.0856,
+      "num_input_tokens_seen": 2927884,
+      "step": 160
+    },
+    {
+      "epoch": 0.6326383896477354,
+      "grad_norm": 1.5196906328201294,
+      "learning_rate": 9.303386375356752e-05,
+      "loss": 1.0765,
+      "num_input_tokens_seen": 3011316,
+      "step": 165
+    },
+    {
+      "epoch": 0.651809249940091,
+      "grad_norm": 2.0994279384613037,
+      "learning_rate": 9.248457803103476e-05,
+      "loss": 1.0743,
+      "num_input_tokens_seen": 3113324,
+      "step": 170
+    },
+    {
+      "epoch": 0.6709801102324466,
+      "grad_norm": 2.0873262882232666,
+      "learning_rate": 9.191620176099558e-05,
+      "loss": 1.0799,
+      "num_input_tokens_seen": 3205472,
+      "step": 175
+    },
+    {
+      "epoch": 0.6901509705248023,
+      "grad_norm": 1.7372593879699707,
+      "learning_rate": 9.132899034469647e-05,
+      "loss": 1.095,
+      "num_input_tokens_seen": 3303772,
+      "step": 180
+    },
+    {
+      "epoch": 0.7093218308171579,
+      "grad_norm": 1.7090203762054443,
+      "learning_rate": 9.072320764700223e-05,
+      "loss": 1.0832,
+      "num_input_tokens_seen": 3386080,
+      "step": 185
+    },
+    {
+      "epoch": 0.7284926911095135,
+      "grad_norm": 1.707291603088379,
+      "learning_rate": 9.009912587782771e-05,
+      "loss": 1.0376,
+      "num_input_tokens_seen": 3471524,
+      "step": 190
+    },
+    {
+      "epoch": 0.7476635514018691,
+      "grad_norm": 1.663960576057434,
+      "learning_rate": 8.945702546981969e-05,
+      "loss": 1.0802,
+      "num_input_tokens_seen": 3556336,
+      "step": 195
+    },
+    {
+      "epoch": 0.7668344116942247,
+      "grad_norm": 1.8464930057525635,
+      "learning_rate": 8.879719495234363e-05,
+      "loss": 1.0769,
+      "num_input_tokens_seen": 3640408,
+      "step": 200
+    },
+    {
+      "epoch": 0.7668344116942247,
+      "eval_loss": 0.862065851688385,
+      "eval_runtime": 0.8351,
+      "eval_samples_per_second": 179.628,
+      "eval_steps_per_second": 45.506,
+      "num_input_tokens_seen": 3640408,
+      "step": 200
+    },
+    {
+      "epoch": 0.7860052719865804,
+      "grad_norm": 1.7220999002456665,
+      "learning_rate": 8.811993082183243e-05,
+      "loss": 1.078,
+      "num_input_tokens_seen": 3731324,
+      "step": 205
+    },
+    {
+      "epoch": 0.805176132278936,
+      "grad_norm": 1.7034516334533691,
+      "learning_rate": 8.742553740855506e-05,
+      "loss": 1.0565,
+      "num_input_tokens_seen": 3822944,
+      "step": 210
+    },
+    {
+      "epoch": 0.8243469925712916,
+      "grad_norm": 2.327296257019043,
+      "learning_rate": 8.671432673986494e-05,
+      "loss": 1.0721,
+      "num_input_tokens_seen": 3922476,
+      "step": 215
+    },
+    {
+      "epoch": 0.8435178528636472,
+      "grad_norm": 1.6464370489120483,
+      "learning_rate": 8.598661839998972e-05,
+      "loss": 1.0573,
+      "num_input_tokens_seen": 4003388,
+      "step": 220
+    },
+    {
+      "epoch": 0.8626887131560029,
+      "grad_norm": 2.116698741912842,
+      "learning_rate": 8.524273938642538e-05,
+      "loss": 1.0459,
+      "num_input_tokens_seen": 4084052,
+      "step": 225
+    },
+    {
+      "epoch": 0.8818595734483585,
+      "grad_norm": 1.5513197183609009,
+      "learning_rate": 8.448302396299905e-05,
+      "loss": 1.073,
+      "num_input_tokens_seen": 4177072,
+      "step": 230
+    },
+    {
+      "epoch": 0.9010304337407141,
+      "grad_norm": 1.8118634223937988,
+      "learning_rate": 8.370781350966683e-05,
+      "loss": 1.0786,
+      "num_input_tokens_seen": 4272004,
+      "step": 235
+    },
+    {
+      "epoch": 0.9202012940330697,
+      "grad_norm": 1.542823076248169,
+      "learning_rate": 8.291745636911382e-05,
+      "loss": 1.0556,
+      "num_input_tokens_seen": 4367124,
+      "step": 240
+    },
+    {
+      "epoch": 0.9393721543254253,
+      "grad_norm": 1.5214637517929077,
+      "learning_rate": 8.211230769022551e-05,
+      "loss": 1.052,
+      "num_input_tokens_seen": 4454860,
+      "step": 245
+    },
+    {
+      "epoch": 0.958543014617781,
+      "grad_norm": 1.5787030458450317,
+      "learning_rate": 8.129272926850079e-05,
+      "loss": 1.032,
+      "num_input_tokens_seen": 4544744,
+      "step": 250
+    },
+    {
+      "epoch": 0.9777138749101366,
+      "grad_norm": 1.842089295387268,
+      "learning_rate": 8.045908938347828e-05,
+      "loss": 1.0585,
+      "num_input_tokens_seen": 4627284,
+      "step": 255
+    },
+    {
+      "epoch": 0.9968847352024922,
+      "grad_norm": 1.8476406335830688,
+      "learning_rate": 7.961176263324901e-05,
+      "loss": 1.0045,
+      "num_input_tokens_seen": 4715336,
+      "step": 260
+    },
+    {
+      "epoch": 1.0160555954948478,
+      "grad_norm": 1.540380835533142,
+      "learning_rate": 7.875112976612984e-05,
+      "loss": 0.8789,
+      "num_input_tokens_seen": 4811720,
+      "step": 265
+    },
+    {
+      "epoch": 1.0352264557872035,
+      "grad_norm": 1.593477725982666,
+      "learning_rate": 7.787757750957334e-05,
+      "loss": 0.8548,
+      "num_input_tokens_seen": 4904124,
+      "step": 270
+    },
+    {
+      "epoch": 1.054397316079559,
+      "grad_norm": 2.4607889652252197,
+      "learning_rate": 7.699149839639086e-05,
+      "loss": 0.8393,
+      "num_input_tokens_seen": 4997508,
+      "step": 275
+    },
+    {
+      "epoch": 1.0735681763719147,
+      "grad_norm": 2.0560553073883057,
+      "learning_rate": 7.609329058836695e-05,
+      "loss": 0.8517,
+      "num_input_tokens_seen": 5102324,
+      "step": 280
+    },
+    {
+      "epoch": 1.0927390366642702,
+      "grad_norm": 1.5116240978240967,
+      "learning_rate": 7.518335769734439e-05,
+      "loss": 0.8498,
+      "num_input_tokens_seen": 5203192,
+      "step": 285
+    },
+    {
+      "epoch": 1.111909896956626,
+      "grad_norm": 1.517767071723938,
+      "learning_rate": 7.426210860386031e-05,
+      "loss": 0.8269,
+      "num_input_tokens_seen": 5300184,
+      "step": 290
+    },
+    {
+      "epoch": 1.1310807572489816,
+      "grad_norm": 1.6254216432571411,
+      "learning_rate": 7.332995727341462e-05,
+      "loss": 0.8496,
+      "num_input_tokens_seen": 5405724,
+      "step": 295
+    },
+    {
+      "epoch": 1.150251617541337,
+      "grad_norm": 1.6421043872833252,
+      "learning_rate": 7.238732257045372e-05,
+      "loss": 0.849,
+      "num_input_tokens_seen": 5504644,
+      "step": 300
+    },
+    {
+      "epoch": 1.150251617541337,
+      "eval_loss": 0.8501759767532349,
+      "eval_runtime": 0.8525,
+      "eval_samples_per_second": 175.946,
+      "eval_steps_per_second": 44.573,
+      "num_input_tokens_seen": 5504644,
+      "step": 300
+    },
+    {
+      "epoch": 1.1694224778336928,
+      "grad_norm": 1.7481452226638794,
+      "learning_rate": 7.143462807015271e-05,
+      "loss": 0.8823,
+      "num_input_tokens_seen": 5597560,
+      "step": 305
+    },
+    {
+      "epoch": 1.1885933381260485,
+      "grad_norm": 1.725755214691162,
+      "learning_rate": 7.047230186808085e-05,
+      "loss": 0.8701,
+      "num_input_tokens_seen": 5691244,
+      "step": 310
+    },
+    {
+      "epoch": 1.207764198418404,
+      "grad_norm": 1.4096667766571045,
+      "learning_rate": 6.950077638783578e-05,
+      "loss": 0.8501,
+      "num_input_tokens_seen": 5773336,
+      "step": 315
+    },
+    {
+      "epoch": 1.2269350587107597,
+      "grad_norm": 1.5906323194503784,
+      "learning_rate": 6.8520488186733e-05,
+      "loss": 0.8684,
+      "num_input_tokens_seen": 5849224,
+      "step": 320
+    },
+    {
+      "epoch": 1.2461059190031152,
+      "grad_norm": 2.2204673290252686,
+      "learning_rate": 6.753187775963773e-05,
+      "loss": 0.8688,
+      "num_input_tokens_seen": 5943104,
+      "step": 325
+    },
+    {
+      "epoch": 1.2652767792954709,
+      "grad_norm": 1.40131413936615,
+      "learning_rate": 6.653538934102743e-05,
+      "loss": 0.8495,
+      "num_input_tokens_seen": 6051232,
+      "step": 330
+    },
+    {
+      "epoch": 1.2844476395878264,
+      "grad_norm": 1.578782558441162,
+      "learning_rate": 6.553147070537413e-05,
+      "loss": 0.8218,
+      "num_input_tokens_seen": 6137520,
+      "step": 335
+    },
+    {
+      "epoch": 1.303618499880182,
+      "grad_norm": 1.5068399906158447,
+      "learning_rate": 6.452057296593568e-05,
+      "loss": 0.8964,
+      "num_input_tokens_seen": 6237012,
+      "step": 340
+    },
+    {
+      "epoch": 1.3227893601725378,
+      "grad_norm": 1.6630327701568604,
+      "learning_rate": 6.350315037204714e-05,
+      "loss": 0.8748,
+      "num_input_tokens_seen": 6332764,
+      "step": 345
+    },
+    {
+      "epoch": 1.3419602204648933,
+      "grad_norm": 1.569263219833374,
+      "learning_rate": 6.247966010500258e-05,
+      "loss": 0.8478,
+      "num_input_tokens_seen": 6416688,
+      "step": 350
+    },
+    {
+      "epoch": 1.361131080757249,
+      "grad_norm": 1.4157441854476929,
+      "learning_rate": 6.145056207261964e-05,
+      "loss": 0.8624,
+      "num_input_tokens_seen": 6507660,
+      "step": 355
+    },
+    {
+      "epoch": 1.3803019410496047,
+      "grad_norm": 1.4510629177093506,
+      "learning_rate": 6.0416318702578826e-05,
+      "loss": 0.851,
+      "num_input_tokens_seen": 6608708,
+      "step": 360
+    },
+    {
+      "epoch": 1.3994728013419602,
+      "grad_norm": 1.660876989364624,
+      "learning_rate": 5.9377394734630464e-05,
+      "loss": 0.8401,
+      "num_input_tokens_seen": 6700852,
+      "step": 365
+    },
+    {
+      "epoch": 1.4186436616343159,
+      "grad_norm": 1.4707056283950806,
+      "learning_rate": 5.833425701176294e-05,
+      "loss": 0.8646,
+      "num_input_tokens_seen": 6792244,
+      "step": 370
+    },
+    {
+      "epoch": 1.4378145219266716,
+      "grad_norm": 1.563291311264038,
+      "learning_rate": 5.728737427042548e-05,
+      "loss": 0.8732,
+      "num_input_tokens_seen": 6875536,
+      "step": 375
+    },
+    {
+      "epoch": 1.456985382219027,
+      "grad_norm": 1.286574125289917,
+      "learning_rate": 5.623721692990039e-05,
+      "loss": 0.8449,
+      "num_input_tokens_seen": 6958384,
+      "step": 380
+    },
+    {
+      "epoch": 1.4761562425113828,
+      "grad_norm": 1.413732886314392,
+      "learning_rate": 5.518425688091906e-05,
+      "loss": 0.8459,
+      "num_input_tokens_seen": 7040740,
+      "step": 385
+    },
+    {
+      "epoch": 1.4953271028037383,
+      "grad_norm": 1.3040345907211304,
+      "learning_rate": 5.4128967273616625e-05,
+      "loss": 0.8539,
+      "num_input_tokens_seen": 7132892,
+      "step": 390
+    },
+    {
+      "epoch": 1.514497963096094,
+      "grad_norm": 1.4091562032699585,
+      "learning_rate": 5.307182230492088e-05,
+      "loss": 0.816,
+      "num_input_tokens_seen": 7217480,
+      "step": 395
+    },
+    {
+      "epoch": 1.5336688233884495,
+      "grad_norm": 1.5895339250564575,
+      "learning_rate": 5.201329700547076e-05,
+      "loss": 0.8612,
+      "num_input_tokens_seen": 7316212,
+      "step": 400
+    },
+    {
+      "epoch": 1.5336688233884495,
+      "eval_loss": 0.8288899064064026,
+      "eval_runtime": 0.8525,
+      "eval_samples_per_second": 175.948,
+      "eval_steps_per_second": 44.574,
+      "num_input_tokens_seen": 7316212,
+      "step": 400
+    },
+    {
+      "epoch": 1.5528396836808052,
+      "grad_norm": 1.523808240890503,
+      "learning_rate": 5.095386702616012e-05,
+      "loss": 0.8411,
+      "num_input_tokens_seen": 7397436,
+      "step": 405
+    },
+    {
+      "epoch": 1.5720105439731609,
+      "grad_norm": 1.4366319179534912,
+      "learning_rate": 4.989400842440289e-05,
+      "loss": 0.8179,
+      "num_input_tokens_seen": 7489804,
+      "step": 410
+    },
+    {
+      "epoch": 1.5911814042655164,
+      "grad_norm": 1.335564136505127,
+      "learning_rate": 4.883419745021554e-05,
+      "loss": 0.8321,
+      "num_input_tokens_seen": 7575200,
+      "step": 415
+    },
+    {
+      "epoch": 1.610352264557872,
+      "grad_norm": 1.5225696563720703,
+      "learning_rate": 4.7774910332213e-05,
+      "loss": 0.8408,
+      "num_input_tokens_seen": 7661620,
+      "step": 420
+    },
+    {
+      "epoch": 1.6295231248502278,
+      "grad_norm": 1.286909580230713,
+      "learning_rate": 4.6716623063614094e-05,
+      "loss": 0.8335,
+      "num_input_tokens_seen": 7751008,
+      "step": 425
+    },
+    {
+      "epoch": 1.6486939851425833,
+      "grad_norm": 1.399095058441162,
+      "learning_rate": 4.565981118835299e-05,
+      "loss": 0.8586,
+      "num_input_tokens_seen": 7847504,
+      "step": 430
+    },
+    {
+      "epoch": 1.6678648454349387,
+      "grad_norm": 1.5877128839492798,
+      "learning_rate": 4.4604949587392234e-05,
+      "loss": 0.8451,
+      "num_input_tokens_seen": 7940004,
+      "step": 435
+    },
+    {
+      "epoch": 1.6870357057272947,
+      "grad_norm": 1.661634087562561,
+      "learning_rate": 4.355251226533396e-05,
+      "loss": 0.8517,
+      "num_input_tokens_seen": 8037312,
+      "step": 440
+    },
+    {
+      "epoch": 1.7062065660196502,
+      "grad_norm": 1.2558486461639404,
+      "learning_rate": 4.250297213742473e-05,
+      "loss": 0.8115,
+      "num_input_tokens_seen": 8134436,
+      "step": 445
+    },
+    {
+      "epoch": 1.7253774263120056,
+      "grad_norm": 1.4006855487823486,
+      "learning_rate": 4.145680081704989e-05,
+      "loss": 0.8471,
+      "num_input_tokens_seen": 8223592,
+      "step": 450
+    },
+    {
+      "epoch": 1.7445482866043613,
+      "grad_norm": 1.5870461463928223,
+      "learning_rate": 4.0414468403813095e-05,
+      "loss": 0.8582,
+      "num_input_tokens_seen": 8312984,
+      "step": 455
+    },
+    {
+      "epoch": 1.763719146896717,
+      "grad_norm": 1.4359045028686523,
+      "learning_rate": 3.937644327229572e-05,
+      "loss": 0.8146,
+      "num_input_tokens_seen": 8402888,
+      "step": 460
+    },
+    {
+      "epoch": 1.7828900071890725,
+      "grad_norm": 1.4591436386108398,
+      "learning_rate": 3.8343191861591795e-05,
+      "loss": 0.8276,
+      "num_input_tokens_seen": 8503552,
+      "step": 465
+    },
+    {
+      "epoch": 1.8020608674814282,
+      "grad_norm": 1.6432543992996216,
+      "learning_rate": 3.7315178465712366e-05,
+      "loss": 0.8524,
+      "num_input_tokens_seen": 8604500,
+      "step": 470
+    },
+    {
+      "epoch": 1.821231727773784,
+      "grad_norm": 1.58722984790802,
+      "learning_rate": 3.629286502495394e-05,
+      "loss": 0.8257,
+      "num_input_tokens_seen": 8690612,
+      "step": 475
+    },
+    {
+      "epoch": 1.8404025880661394,
+      "grad_norm": 1.3321154117584229,
+      "learning_rate": 3.52767109183244e-05,
+      "loss": 0.8194,
+      "num_input_tokens_seen": 8785132,
+      "step": 480
+    },
+    {
+      "epoch": 1.8595734483584951,
+      "grad_norm": 1.8727712631225586,
+      "learning_rate": 3.426717275712e-05,
+      "loss": 0.8414,
+      "num_input_tokens_seen": 8882700,
+      "step": 485
+    },
+    {
+      "epoch": 1.8787443086508508,
+      "grad_norm": 1.427962303161621,
+      "learning_rate": 3.326470417974604e-05,
+      "loss": 0.8275,
+      "num_input_tokens_seen": 8982636,
+      "step": 490
+    },
+    {
+      "epoch": 1.8979151689432063,
+      "grad_norm": 1.391340970993042,
+      "learning_rate": 3.226975564787322e-05,
+      "loss": 0.8115,
+      "num_input_tokens_seen": 9090140,
+      "step": 495
+    },
+    {
+      "epoch": 1.9170860292355618,
+      "grad_norm": 1.465030312538147,
+      "learning_rate": 3.1282774244021715e-05,
+      "loss": 0.7934,
+      "num_input_tokens_seen": 9167936,
+      "step": 500
+    },
+    {
+      "epoch": 1.9170860292355618,
+      "eval_loss": 0.8071622252464294,
+      "eval_runtime": 0.8763,
+      "eval_samples_per_second": 171.177,
+      "eval_steps_per_second": 43.365,
+      "num_input_tokens_seen": 9167936,
+      "step": 500
+    },
+    {
+      "epoch": 1.9362568895279175,
+      "grad_norm": 1.6803951263427734,
+      "learning_rate": 3.0304203470663505e-05,
+      "loss": 0.821,
+      "num_input_tokens_seen": 9252204,
+      "step": 505
+    },
+    {
+      "epoch": 1.9554277498202732,
+      "grad_norm": 1.4982653856277466,
+      "learning_rate": 2.9334483050933503e-05,
+      "loss": 0.7982,
+      "num_input_tokens_seen": 9326444,
+      "step": 510
+    },
+    {
+      "epoch": 1.9745986101126287,
+      "grad_norm": 1.5553169250488281,
+      "learning_rate": 2.8374048731038898e-05,
+      "loss": 0.8183,
+      "num_input_tokens_seen": 9412396,
+      "step": 515
+    },
+    {
+      "epoch": 1.9937694704049844,
+      "grad_norm": 1.176651954650879,
+      "learning_rate": 2.7423332084455544e-05,
+      "loss": 0.7837,
+      "num_input_tokens_seen": 9497412,
+      "step": 520
+    },
+    {
+      "epoch": 2.01294033069734,
+      "grad_norm": 1.2269119024276733,
+      "learning_rate": 2.648276031799934e-05,
+      "loss": 0.726,
+      "num_input_tokens_seen": 9594720,
+      "step": 525
+    },
+    {
+      "epoch": 2.0321111909896956,
+      "grad_norm": 1.4332973957061768,
+      "learning_rate": 2.5552756079859903e-05,
+      "loss": 0.6847,
+      "num_input_tokens_seen": 9675564,
+      "step": 530
+    },
+    {
+      "epoch": 2.051282051282051,
+      "grad_norm": 1.3018600940704346,
+      "learning_rate": 2.4633737269682543e-05,
+      "loss": 0.682,
+      "num_input_tokens_seen": 9772140,
+      "step": 535
+    },
+    {
+      "epoch": 2.070452911574407,
+      "grad_norm": 1.2701021432876587,
+      "learning_rate": 2.3726116850783985e-05,
+      "loss": 0.6643,
+      "num_input_tokens_seen": 9850508,
+      "step": 540
+    },
+    {
+      "epoch": 2.0896237718667625,
+      "grad_norm": 1.2966877222061157,
+      "learning_rate": 2.283030266458644e-05,
+      "loss": 0.6868,
+      "num_input_tokens_seen": 9949592,
+      "step": 545
+    },
+    {
+      "epoch": 2.108794632159118,
+      "grad_norm": 1.4499340057373047,
+      "learning_rate": 2.194669724735296e-05,
+      "loss": 0.6713,
+      "num_input_tokens_seen": 10041244,
+      "step": 550
+    },
+    {
+      "epoch": 2.127965492451474,
+      "grad_norm": 1.791567325592041,
+      "learning_rate": 2.1075697649306835e-05,
+      "loss": 0.6893,
+      "num_input_tokens_seen": 10140004,
+      "step": 555
+    },
+    {
+      "epoch": 2.1471363527438294,
+      "grad_norm": 1.4607737064361572,
+      "learning_rate": 2.0217695256216195e-05,
+      "loss": 0.6784,
+      "num_input_tokens_seen": 10233820,
+      "step": 560
+    },
+    {
+      "epoch": 2.166307213036185,
+      "grad_norm": 1.4586418867111206,
+      "learning_rate": 1.937307561352373e-05,
+      "loss": 0.6829,
+      "num_input_tokens_seen": 10324844,
+      "step": 565
+    },
+    {
+      "epoch": 2.1854780733285404,
+      "grad_norm": 1.4719526767730713,
+      "learning_rate": 1.854221825310103e-05,
+      "loss": 0.6775,
+      "num_input_tokens_seen": 10413592,
+      "step": 570
+    },
+    {
+      "epoch": 2.2046489336208963,
+      "grad_norm": 1.4231044054031372,
+      "learning_rate": 1.7725496522704998e-05,
+      "loss": 0.6872,
+      "num_input_tokens_seen": 10503432,
+      "step": 575
+    },
+    {
+      "epoch": 2.223819793913252,
+      "grad_norm": 1.3985334634780884,
+      "learning_rate": 1.6923277418213117e-05,
+      "loss": 0.65,
+      "num_input_tokens_seen": 10600988,
+      "step": 580
+    },
+    {
+      "epoch": 2.2429906542056073,
+      "grad_norm": 1.3148375749588013,
+      "learning_rate": 1.6135921418712956e-05,
+      "loss": 0.6901,
+      "num_input_tokens_seen": 10692664,
+      "step": 585
+    },
+    {
+      "epoch": 2.262161514497963,
+      "grad_norm": 1.3653095960617065,
+      "learning_rate": 1.536378232452003e-05,
+      "loss": 0.6661,
+      "num_input_tokens_seen": 10781516,
+      "step": 590
+    },
+    {
+      "epoch": 2.2813323747903187,
+      "grad_norm": 1.4331799745559692,
+      "learning_rate": 1.4607207098196852e-05,
+      "loss": 0.669,
+      "num_input_tokens_seen": 10874028,
+      "step": 595
+    },
+    {
+      "epoch": 2.300503235082674,
+      "grad_norm": 1.3715401887893677,
+      "learning_rate": 1.3866535708644334e-05,
+      "loss": 0.6701,
+      "num_input_tokens_seen": 10969348,
+      "step": 600
+    },
+    {
+      "epoch": 2.300503235082674,
+      "eval_loss": 0.8050560355186462,
+      "eval_runtime": 0.8838,
+      "eval_samples_per_second": 169.723,
+      "eval_steps_per_second": 42.997,
+      "num_input_tokens_seen": 10969348,
+      "step": 600
+    },
+    {
+      "epoch": 2.31967409537503,
+      "grad_norm": 1.2875298261642456,
+      "learning_rate": 1.3142100978336069e-05,
+      "loss": 0.6877,
+      "num_input_tokens_seen": 11064696,
+      "step": 605
+    },
+    {
+      "epoch": 2.3388449556673856,
+      "grad_norm": 1.2775810956954956,
+      "learning_rate": 1.2434228433763657e-05,
+      "loss": 0.6752,
+      "num_input_tokens_seen": 11157516,
+      "step": 610
+    },
+    {
+      "epoch": 2.358015815959741,
+      "grad_norm": 1.3270821571350098,
+      "learning_rate": 1.1743236159160653e-05,
+      "loss": 0.6915,
+      "num_input_tokens_seen": 11236376,
+      "step": 615
+    },
+    {
+      "epoch": 2.377186676252097,
+      "grad_norm": 1.2592318058013916,
+      "learning_rate": 1.1069434653570631e-05,
+      "loss": 0.6786,
+      "num_input_tokens_seen": 11343512,
+      "step": 620
+    },
+    {
+      "epoch": 2.3963575365444525,
+      "grad_norm": 1.4185535907745361,
+ "learning_rate": 1.0413126691323666e-05,
1061
+ "loss": 0.6876,
1062
+ "num_input_tokens_seen": 11433156,
1063
+ "step": 625
1064
+ },
1065
+ {
1066
+ "epoch": 2.415528396836808,
1067
+ "grad_norm": 1.5168098211288452,
1068
+ "learning_rate": 9.774607185984002e-06,
1069
+ "loss": 0.6911,
1070
+ "num_input_tokens_seen": 11517328,
1071
+ "step": 630
1072
+ },
1073
+ {
1074
+ "epoch": 2.4346992571291635,
1075
+ "grad_norm": 1.2858244180679321,
1076
+ "learning_rate": 9.154163057829879e-06,
1077
+ "loss": 0.6618,
1078
+ "num_input_tokens_seen": 11602144,
1079
+ "step": 635
1080
+ },
1081
+ {
1082
+ "epoch": 2.4538701174215194,
1083
+ "grad_norm": 1.2297165393829346,
1084
+ "learning_rate": 8.552073104925295e-06,
1085
+ "loss": 0.6804,
1086
+ "num_input_tokens_seen": 11691216,
1087
+ "step": 640
1088
+ },
1089
+ {
1090
+ "epoch": 2.473040977713875,
1091
+ "grad_norm": 1.2545212507247925,
1092
+ "learning_rate": 7.968607877841332e-06,
1093
+ "loss": 0.6669,
1094
+ "num_input_tokens_seen": 11789340,
1095
+ "step": 645
1096
+ },
1097
+ {
1098
+ "epoch": 2.4922118380062304,
1099
+ "grad_norm": 1.3260319232940674,
1100
+ "learning_rate": 7.404029558083653e-06,
1101
+ "loss": 0.6847,
1102
+ "num_input_tokens_seen": 11884208,
1103
+ "step": 650
1104
+ },
1105
+ {
1106
+ "epoch": 2.5113826982985863,
1107
+ "grad_norm": 1.2979117631912231,
1108
+ "learning_rate": 6.858591840280626e-06,
1109
+ "loss": 0.6592,
1110
+ "num_input_tokens_seen": 11972124,
1111
+ "step": 655
1112
+ },
1113
+ {
1114
+ "epoch": 2.5305535585909418,
1115
+ "grad_norm": 1.2984734773635864,
1116
+ "learning_rate": 6.3325398181849845e-06,
1117
+ "loss": 0.6579,
1118
+ "num_input_tokens_seen": 12057320,
1119
+ "step": 660
1120
+ },
1121
+ {
1122
+ "epoch": 2.5497244188832973,
1123
+ "grad_norm": 1.2366037368774414,
1124
+ "learning_rate": 5.826109874540409e-06,
1125
+ "loss": 0.6666,
1126
+ "num_input_tokens_seen": 12154952,
1127
+ "step": 665
1128
+ },
1129
+ {
1130
+ "epoch": 2.5688952791756527,
1131
+ "grad_norm": 1.2756296396255493,
1132
+ "learning_rate": 5.33952957486234e-06,
1133
+ "loss": 0.6903,
1134
+ "num_input_tokens_seen": 12256604,
1135
+ "step": 670
1136
+ },
1137
+ {
1138
+ "epoch": 2.5880661394680087,
1139
+ "grad_norm": 1.3016470670700073,
1140
+ "learning_rate": 4.873017565180871e-06,
1141
+ "loss": 0.6654,
1142
+ "num_input_tokens_seen": 12341988,
1143
+ "step": 675
1144
+ },
1145
+ {
1146
+ "epoch": 2.607236999760364,
1147
+ "grad_norm": 1.3244975805282593,
1148
+ "learning_rate": 4.4267834737916296e-06,
1149
+ "loss": 0.6597,
1150
+ "num_input_tokens_seen": 12442644,
1151
+ "step": 680
1152
+ },
1153
+ {
1154
+ "epoch": 2.62640786005272,
1155
+ "grad_norm": 1.4668656587600708,
1156
+ "learning_rate": 4.001027817058789e-06,
1157
+ "loss": 0.6709,
1158
+ "num_input_tokens_seen": 12538688,
1159
+ "step": 685
1160
+ },
1161
+ {
1162
+ "epoch": 2.6455787203450756,
1163
+ "grad_norm": 1.4666439294815063,
1164
+ "learning_rate": 3.5959419093125946e-06,
1165
+ "loss": 0.6793,
1166
+ "num_input_tokens_seen": 12621632,
1167
+ "step": 690
1168
+ },
1169
+ {
1170
+ "epoch": 2.664749580637431,
1171
+ "grad_norm": 1.2680643796920776,
1172
+ "learning_rate": 3.211707776881739e-06,
1173
+ "loss": 0.6562,
1174
+ "num_input_tokens_seen": 12715832,
1175
+ "step": 695
1176
+ },
1177
+ {
1178
+ "epoch": 2.6839204409297865,
1179
+ "grad_norm": 1.3070735931396484,
1180
+ "learning_rate": 2.848498076299483e-06,
1181
+ "loss": 0.6579,
1182
+ "num_input_tokens_seen": 12814620,
1183
+ "step": 700
1184
+ },
1185
+ {
1186
+ "epoch": 2.6839204409297865,
1187
+ "eval_loss": 0.7903470396995544,
1188
+ "eval_runtime": 0.941,
1189
+ "eval_samples_per_second": 159.398,
1190
+ "eval_steps_per_second": 40.381,
1191
+ "num_input_tokens_seen": 12814620,
1192
+ "step": 700
1193
+ },
1194
+ {
1195
+ "epoch": 2.7030913012221425,
1196
+ "grad_norm": 1.4454238414764404,
1197
+ "learning_rate": 2.506476016719922e-06,
1198
+ "loss": 0.6818,
1199
+ "num_input_tokens_seen": 12909896,
1200
+ "step": 705
1201
+ },
1202
+ {
1203
+ "epoch": 2.722262161514498,
1204
+ "grad_norm": 1.249842882156372,
1205
+ "learning_rate": 2.1857952865796614e-06,
1206
+ "loss": 0.6655,
1207
+ "num_input_tokens_seen": 13001288,
1208
+ "step": 710
1209
+ },
1210
+ {
1211
+ "epoch": 2.7414330218068534,
1212
+ "grad_norm": 1.312296986579895,
1213
+ "learning_rate": 1.8865999845374793e-06,
1214
+ "loss": 0.656,
1215
+ "num_input_tokens_seen": 13094568,
1216
+ "step": 715
1217
+ },
1218
+ {
1219
+ "epoch": 2.7606038820992094,
1220
+ "grad_norm": 1.1614197492599487,
1221
+ "learning_rate": 1.6090245547232707e-06,
1222
+ "loss": 0.6664,
1223
+ "num_input_tokens_seen": 13181872,
1224
+ "step": 720
1225
+ },
1226
+ {
1227
+ "epoch": 2.779774742391565,
1228
+ "grad_norm": 1.249230146408081,
1229
+ "learning_rate": 1.353193726325247e-06,
1230
+ "loss": 0.6643,
1231
+ "num_input_tokens_seen": 13276044,
1232
+ "step": 725
1233
+ },
1234
+ {
1235
+ "epoch": 2.7989456026839203,
1236
+ "grad_norm": 1.3258079290390015,
1237
+ "learning_rate": 1.1192224575425848e-06,
1238
+ "loss": 0.6352,
1239
+ "num_input_tokens_seen": 13361008,
1240
+ "step": 730
1241
+ },
1242
+ {
1243
+ "epoch": 2.818116462976276,
1244
+ "grad_norm": 1.364700198173523,
1245
+ "learning_rate": 9.072158839286748e-07,
1246
+ "loss": 0.6686,
1247
+ "num_input_tokens_seen": 13458560,
1248
+ "step": 735
1249
+ },
1250
+ {
1251
+ "epoch": 2.8372873232686318,
1252
+ "grad_norm": 1.3140686750411987,
1253
+ "learning_rate": 7.172692711482021e-07,
1254
+ "loss": 0.6623,
1255
+ "num_input_tokens_seen": 13551928,
1256
+ "step": 740
1257
+ },
1258
+ {
1259
+ "epoch": 2.8564581835609872,
1260
+ "grad_norm": 1.4460844993591309,
1261
+ "learning_rate": 5.494679721693152e-07,
1262
+ "loss": 0.6658,
1263
+ "num_input_tokens_seen": 13632792,
1264
+ "step": 745
1265
+ },
1266
+ {
1267
+ "epoch": 2.875629043853343,
1268
+ "grad_norm": 1.2111197710037231,
1269
+ "learning_rate": 4.0388738891002366e-07,
1270
+ "loss": 0.6239,
1271
+ "num_input_tokens_seen": 13723172,
1272
+ "step": 750
1273
+ },
1274
+ {
1275
+ "epoch": 2.8947999041456987,
1276
+ "grad_norm": 1.2612218856811523,
1277
+ "learning_rate": 2.8059293835620003e-07,
1278
+ "loss": 0.6682,
1279
+ "num_input_tokens_seen": 13820332,
1280
+ "step": 755
1281
+ },
1282
+ {
1283
+ "epoch": 2.913970764438054,
1284
+ "grad_norm": 1.3601187467575073,
1285
+ "learning_rate": 1.7964002316628315e-07,
1286
+ "loss": 0.6729,
1287
+ "num_input_tokens_seen": 13907996,
1288
+ "step": 760
1289
+ },
1290
+ {
1291
+ "epoch": 2.9331416247304096,
1292
+ "grad_norm": 1.3947527408599854,
1293
+ "learning_rate": 1.0107400677596412e-07,
1294
+ "loss": 0.6736,
1295
+ "num_input_tokens_seen": 13993196,
1296
+ "step": 765
1297
+ },
1298
+ {
1299
+ "epoch": 2.9523124850227656,
1300
+ "grad_norm": 1.2355984449386597,
1301
+ "learning_rate": 4.493019301401446e-08,
1302
+ "loss": 0.6655,
1303
+ "num_input_tokens_seen": 14084984,
1304
+ "step": 770
1305
+ },
1306
+ {
1307
+ "epoch": 2.971483345315121,
1308
+ "grad_norm": 1.2893033027648926,
1309
+ "learning_rate": 1.1233810238425735e-08,
1310
+ "loss": 0.657,
1311
+ "num_input_tokens_seen": 14169848,
1312
+ "step": 775
1313
+ },
1314
+ {
1315
+ "epoch": 2.9906542056074765,
1316
+ "grad_norm": 1.3213859796524048,
1317
+ "learning_rate": 0.0,
1318
+ "loss": 0.6589,
1319
+ "num_input_tokens_seen": 14258488,
1320
+ "step": 780
1321
+ },
1322
+ {
1323
+ "epoch": 2.9906542056074765,
1324
+ "num_input_tokens_seen": 14258488,
1325
+ "step": 780,
1326
+ "total_flos": 3.0175424769490944e+16,
1327
+ "train_loss": 0.895275863011678,
1328
+ "train_runtime": 860.318,
1329
+ "train_samples_per_second": 58.206,
1330
+ "train_steps_per_second": 0.907
1331
+ }
1332
+ ],
1333
+ "logging_steps": 5,
1334
+ "max_steps": 780,
1335
+ "num_input_tokens_seen": 14258488,
1336
+ "num_train_epochs": 3,
1337
+ "save_steps": 100,
1338
+ "stateful_callbacks": {
1339
+ "TrainerControl": {
1340
+ "args": {
1341
+ "should_epoch_stop": false,
1342
+ "should_evaluate": false,
1343
+ "should_log": false,
1344
+ "should_save": true,
1345
+ "should_training_stop": true
1346
+ },
1347
+ "attributes": {}
1348
+ }
1349
+ },
1350
+ "total_flos": 3.0175424769490944e+16,
1351
+ "train_batch_size": 4,
1352
+ "trial_name": null,
1353
+ "trial_params": null
1354
+ }