terry69 committed on
Commit
3192fa3
1 Parent(s): 8694bce

Model save

README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ license: apache-2.0
+ base_model: mistralai/Mistral-7B-Instruct-v0.2
+ tags:
+ - trl
+ - sft
+ - generated_from_trainer
+ datasets:
+ - generator
+ model-index:
+ - name: feedback_p0.1_seed42_level2_style
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # feedback_p0.1_seed42_level2_style
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on the generator dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.3528
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 1e-05
+ - train_batch_size: 2
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 32
+ - total_eval_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 0.3863 | 0.9992 | 963 | 0.3528 |
+
+
+ ### Framework versions
+
+ - Transformers 4.43.4
+ - Pytorch 2.3.1+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
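As a quick sanity check on the hyperparameters above, the reported `total_train_batch_size` follows from the per-device batch size, device count, and gradient accumulation, and the warmup ratio together with the 963 steps in the results table implies the approximate warmup length. This is an illustrative calculation, not part of the generated card:

```python
# Effective batch size implied by the hyperparameters above.
per_device_train_batch_size = 2
num_devices = 4
gradient_accumulation_steps = 4

total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
print(total_train_batch_size)  # 32, matching the reported total_train_batch_size

# With 963 optimizer steps in one epoch and lr_scheduler_warmup_ratio 0.1,
# linear warmup covers roughly the first 96 steps before cosine decay begins.
total_steps = 963
warmup_steps = int(0.1 * total_steps)
print(warmup_steps)  # 96
```

This is consistent with the training log below, where the learning rate climbs toward its 1e-05 peak around step 95–100 and then decays.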
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 0.9992217898832685,
+ "total_flos": 201580263505920.0,
+ "train_loss": 0.5411187405403034,
+ "train_runtime": 23935.6127,
+ "train_samples": 98945,
+ "train_samples_per_second": 1.288,
+ "train_steps_per_second": 0.04
+ }
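The throughput fields in this file can be cross-checked against each other. A small sketch (the 963-step count comes from the training-results table in the README; the packing interpretation at the end is an inference, not stated in the files):

```python
train_runtime = 23935.6127  # seconds
steps = 963                 # optimizer steps, from the README results table
samples_per_second = 1.288  # as reported above

# Steps-per-second matches the reported 0.04.
print(round(steps / train_runtime, 2))  # 0.04

# The reported throughput implies ~30.8k training sequences were consumed,
# far fewer than the 98,945 raw "train_samples" -- consistent with raw
# examples being packed into longer sequences before training (an inference).
print(round(samples_per_second * train_runtime))  # 30829
```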
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 2,
+ "transformers_version": "4.43.4"
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 0.9992217898832685,
+ "total_flos": 201580263505920.0,
+ "train_loss": 0.5411187405403034,
+ "train_runtime": 23935.6127,
+ "train_samples": 98945,
+ "train_samples_per_second": 1.288,
+ "train_steps_per_second": 0.04
+ }
trainer_state.json ADDED
@@ -0,0 +1,1401 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 0.9992217898832685,
+ "eval_steps": 500,
+ "global_step": 963,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.0010376134889753567,
+ "grad_norm": 23.969658904297845,
+ "learning_rate": 1.0309278350515465e-07,
+ "loss": 1.3725,
+ "step": 1
+ },
+ {
+ "epoch": 0.005188067444876783,
+ "grad_norm": 22.449071975220065,
+ "learning_rate": 5.154639175257732e-07,
+ "loss": 1.3719,
+ "step": 5
+ },
+ {
+ "epoch": 0.010376134889753566,
+ "grad_norm": 8.548339530816044,
+ "learning_rate": 1.0309278350515464e-06,
+ "loss": 1.2558,
+ "step": 10
+ },
+ {
+ "epoch": 0.01556420233463035,
+ "grad_norm": 8.215121326664015,
+ "learning_rate": 1.5463917525773197e-06,
+ "loss": 1.081,
+ "step": 15
+ },
+ {
+ "epoch": 0.020752269779507133,
+ "grad_norm": 3.0841636846785283,
+ "learning_rate": 2.061855670103093e-06,
+ "loss": 0.9504,
+ "step": 20
+ },
+ {
+ "epoch": 0.02594033722438392,
+ "grad_norm": 2.3540131831672575,
+ "learning_rate": 2.577319587628866e-06,
+ "loss": 0.9092,
+ "step": 25
+ },
+ {
+ "epoch": 0.0311284046692607,
+ "grad_norm": 2.217217553056043,
+ "learning_rate": 3.0927835051546395e-06,
+ "loss": 0.8692,
+ "step": 30
+ },
+ {
+ "epoch": 0.03631647211413749,
+ "grad_norm": 2.2563290398206615,
+ "learning_rate": 3.6082474226804126e-06,
+ "loss": 0.8445,
+ "step": 35
+ },
+ {
+ "epoch": 0.041504539559014265,
+ "grad_norm": 2.2874485501473907,
+ "learning_rate": 4.123711340206186e-06,
+ "loss": 0.8336,
+ "step": 40
+ },
+ {
+ "epoch": 0.04669260700389105,
+ "grad_norm": 2.3330889165134967,
+ "learning_rate": 4.639175257731959e-06,
+ "loss": 0.8218,
+ "step": 45
+ },
+ {
+ "epoch": 0.05188067444876784,
+ "grad_norm": 2.16633956379982,
+ "learning_rate": 5.154639175257732e-06,
+ "loss": 0.8255,
+ "step": 50
+ },
+ {
+ "epoch": 0.057068741893644616,
+ "grad_norm": 2.355679025636223,
+ "learning_rate": 5.670103092783505e-06,
+ "loss": 0.7891,
+ "step": 55
+ },
+ {
+ "epoch": 0.0622568093385214,
+ "grad_norm": 2.4839432117685036,
+ "learning_rate": 6.185567010309279e-06,
+ "loss": 0.7814,
+ "step": 60
+ },
+ {
+ "epoch": 0.06744487678339818,
+ "grad_norm": 2.480013950899919,
+ "learning_rate": 6.701030927835052e-06,
+ "loss": 0.7674,
+ "step": 65
+ },
+ {
+ "epoch": 0.07263294422827497,
+ "grad_norm": 2.3378408946103284,
+ "learning_rate": 7.216494845360825e-06,
+ "loss": 0.766,
+ "step": 70
+ },
+ {
+ "epoch": 0.07782101167315175,
+ "grad_norm": 2.2751461973205482,
+ "learning_rate": 7.731958762886599e-06,
+ "loss": 0.7433,
+ "step": 75
+ },
+ {
+ "epoch": 0.08300907911802853,
+ "grad_norm": 2.428511002623931,
+ "learning_rate": 8.247422680412371e-06,
+ "loss": 0.7414,
+ "step": 80
+ },
+ {
+ "epoch": 0.08819714656290532,
+ "grad_norm": 2.4298836600324045,
+ "learning_rate": 8.762886597938146e-06,
+ "loss": 0.7358,
+ "step": 85
+ },
+ {
+ "epoch": 0.0933852140077821,
+ "grad_norm": 2.41313947506571,
+ "learning_rate": 9.278350515463918e-06,
+ "loss": 0.7319,
+ "step": 90
+ },
+ {
+ "epoch": 0.09857328145265888,
+ "grad_norm": 2.4451429150679274,
+ "learning_rate": 9.793814432989691e-06,
+ "loss": 0.7323,
+ "step": 95
+ },
+ {
+ "epoch": 0.10376134889753567,
+ "grad_norm": 2.450193589248992,
+ "learning_rate": 9.999703897419048e-06,
+ "loss": 0.7231,
+ "step": 100
+ },
+ {
+ "epoch": 0.10894941634241245,
+ "grad_norm": 2.271786014084883,
+ "learning_rate": 9.997894508649995e-06,
+ "loss": 0.7149,
+ "step": 105
+ },
+ {
+ "epoch": 0.11413748378728923,
+ "grad_norm": 2.354564055245926,
+ "learning_rate": 9.99444082710777e-06,
+ "loss": 0.708,
+ "step": 110
+ },
+ {
+ "epoch": 0.11932555123216602,
+ "grad_norm": 2.220428698962425,
+ "learning_rate": 9.989343989043563e-06,
+ "loss": 0.7216,
+ "step": 115
+ },
+ {
+ "epoch": 0.1245136186770428,
+ "grad_norm": 2.3141712328751396,
+ "learning_rate": 9.982605671302293e-06,
+ "loss": 0.7091,
+ "step": 120
+ },
+ {
+ "epoch": 0.1297016861219196,
+ "grad_norm": 2.100396054783955,
+ "learning_rate": 9.97422809077092e-06,
+ "loss": 0.7066,
+ "step": 125
+ },
+ {
+ "epoch": 0.13488975356679636,
+ "grad_norm": 2.2484885982675413,
+ "learning_rate": 9.9642140036491e-06,
+ "loss": 0.7085,
+ "step": 130
+ },
+ {
+ "epoch": 0.14007782101167315,
+ "grad_norm": 2.1795476193729413,
+ "learning_rate": 9.9525667045424e-06,
+ "loss": 0.6889,
+ "step": 135
+ },
+ {
+ "epoch": 0.14526588845654995,
+ "grad_norm": 2.1757051871338593,
+ "learning_rate": 9.93929002537839e-06,
+ "loss": 0.6921,
+ "step": 140
+ },
+ {
+ "epoch": 0.1504539559014267,
+ "grad_norm": 2.143005235580036,
+ "learning_rate": 9.924388334145943e-06,
+ "loss": 0.6907,
+ "step": 145
+ },
+ {
+ "epoch": 0.1556420233463035,
+ "grad_norm": 2.1989760690420157,
+ "learning_rate": 9.90786653345818e-06,
+ "loss": 0.6912,
+ "step": 150
+ },
+ {
+ "epoch": 0.1608300907911803,
+ "grad_norm": 2.004571277860471,
+ "learning_rate": 9.889730058939529e-06,
+ "loss": 0.6859,
+ "step": 155
+ },
+ {
+ "epoch": 0.16601815823605706,
+ "grad_norm": 2.05691987455993,
+ "learning_rate": 9.869984877437413e-06,
+ "loss": 0.6894,
+ "step": 160
+ },
+ {
+ "epoch": 0.17120622568093385,
+ "grad_norm": 2.230053895792029,
+ "learning_rate": 9.848637485059183e-06,
+ "loss": 0.6814,
+ "step": 165
+ },
+ {
+ "epoch": 0.17639429312581065,
+ "grad_norm": 1.9493958638517837,
+ "learning_rate": 9.82569490503491e-06,
+ "loss": 0.6731,
+ "step": 170
+ },
+ {
+ "epoch": 0.1815823605706874,
+ "grad_norm": 2.133120594361784,
+ "learning_rate": 9.80116468540677e-06,
+ "loss": 0.6594,
+ "step": 175
+ },
+ {
+ "epoch": 0.1867704280155642,
+ "grad_norm": 2.01624934264464,
+ "learning_rate": 9.775054896545755e-06,
+ "loss": 0.6751,
+ "step": 180
+ },
+ {
+ "epoch": 0.191958495460441,
+ "grad_norm": 2.1502691215852527,
+ "learning_rate": 9.747374128496541e-06,
+ "loss": 0.6789,
+ "step": 185
+ },
+ {
+ "epoch": 0.19714656290531776,
+ "grad_norm": 2.0484102083185194,
+ "learning_rate": 9.718131488151399e-06,
+ "loss": 0.6676,
+ "step": 190
+ },
+ {
+ "epoch": 0.20233463035019456,
+ "grad_norm": 2.0715841424222337,
+ "learning_rate": 9.687336596254045e-06,
+ "loss": 0.6616,
+ "step": 195
+ },
+ {
+ "epoch": 0.20752269779507135,
+ "grad_norm": 2.012157328183036,
+ "learning_rate": 9.654999584234444e-06,
+ "loss": 0.652,
+ "step": 200
+ },
+ {
+ "epoch": 0.2127107652399481,
+ "grad_norm": 2.0669739212271923,
+ "learning_rate": 9.621131090875603e-06,
+ "loss": 0.6426,
+ "step": 205
+ },
+ {
+ "epoch": 0.2178988326848249,
+ "grad_norm": 2.0105636015375143,
+ "learning_rate": 9.585742258813447e-06,
+ "loss": 0.6445,
+ "step": 210
+ },
+ {
+ "epoch": 0.2230869001297017,
+ "grad_norm": 2.1108266544110688,
+ "learning_rate": 9.548844730870903e-06,
+ "loss": 0.6438,
+ "step": 215
+ },
+ {
+ "epoch": 0.22827496757457846,
+ "grad_norm": 2.072355913378756,
+ "learning_rate": 9.51045064622747e-06,
+ "loss": 0.6565,
+ "step": 220
+ },
+ {
+ "epoch": 0.23346303501945526,
+ "grad_norm": 2.166007360772802,
+ "learning_rate": 9.470572636425451e-06,
+ "loss": 0.647,
+ "step": 225
+ },
+ {
+ "epoch": 0.23865110246433205,
+ "grad_norm": 2.022875957881762,
+ "learning_rate": 9.429223821214213e-06,
+ "loss": 0.6325,
+ "step": 230
+ },
+ {
+ "epoch": 0.2438391699092088,
+ "grad_norm": 2.006861087987301,
+ "learning_rate": 9.386417804233836e-06,
+ "loss": 0.6477,
+ "step": 235
+ },
+ {
+ "epoch": 0.2490272373540856,
+ "grad_norm": 2.0140489204477645,
+ "learning_rate": 9.34216866853954e-06,
+ "loss": 0.6391,
+ "step": 240
+ },
+ {
+ "epoch": 0.25421530479896237,
+ "grad_norm": 1.9489606047213677,
+ "learning_rate": 9.296490971968416e-06,
+ "loss": 0.6283,
+ "step": 245
+ },
+ {
+ "epoch": 0.2594033722438392,
+ "grad_norm": 2.072486707132733,
+ "learning_rate": 9.249399742349928e-06,
+ "loss": 0.6377,
+ "step": 250
+ },
+ {
+ "epoch": 0.26459143968871596,
+ "grad_norm": 1.9650189580925839,
+ "learning_rate": 9.20091047256181e-06,
+ "loss": 0.6261,
+ "step": 255
+ },
+ {
+ "epoch": 0.2697795071335927,
+ "grad_norm": 1.9241991797476943,
+ "learning_rate": 9.151039115432946e-06,
+ "loss": 0.6184,
+ "step": 260
+ },
+ {
+ "epoch": 0.27496757457846954,
+ "grad_norm": 1.9743470888532664,
+ "learning_rate": 9.099802078494947e-06,
+ "loss": 0.6142,
+ "step": 265
+ },
+ {
+ "epoch": 0.2801556420233463,
+ "grad_norm": 2.160988187935936,
+ "learning_rate": 9.047216218584105e-06,
+ "loss": 0.6094,
+ "step": 270
+ },
+ {
+ "epoch": 0.2853437094682231,
+ "grad_norm": 1.9697508480614465,
+ "learning_rate": 8.993298836295556e-06,
+ "loss": 0.6196,
+ "step": 275
+ },
+ {
+ "epoch": 0.2905317769130999,
+ "grad_norm": 1.8771524751425768,
+ "learning_rate": 8.93806767029143e-06,
+ "loss": 0.6163,
+ "step": 280
+ },
+ {
+ "epoch": 0.29571984435797666,
+ "grad_norm": 2.125863779805947,
+ "learning_rate": 8.88154089146488e-06,
+ "loss": 0.6167,
+ "step": 285
+ },
+ {
+ "epoch": 0.3009079118028534,
+ "grad_norm": 2.1188493077731514,
+ "learning_rate": 8.823737096961916e-06,
+ "loss": 0.5992,
+ "step": 290
+ },
+ {
+ "epoch": 0.30609597924773024,
+ "grad_norm": 2.1335267497592807,
+ "learning_rate": 8.764675304062992e-06,
+ "loss": 0.6071,
+ "step": 295
+ },
+ {
+ "epoch": 0.311284046692607,
+ "grad_norm": 2.036189297244598,
+ "learning_rate": 8.704374943926386e-06,
+ "loss": 0.609,
+ "step": 300
+ },
+ {
+ "epoch": 0.3164721141374838,
+ "grad_norm": 1.915927299304865,
+ "learning_rate": 8.642855855195394e-06,
+ "loss": 0.5945,
+ "step": 305
+ },
+ {
+ "epoch": 0.3216601815823606,
+ "grad_norm": 2.005194485630929,
+ "learning_rate": 8.580138277471476e-06,
+ "loss": 0.5959,
+ "step": 310
+ },
+ {
+ "epoch": 0.32684824902723736,
+ "grad_norm": 2.1368034472887527,
+ "learning_rate": 8.516242844655498e-06,
+ "loss": 0.5941,
+ "step": 315
+ },
+ {
+ "epoch": 0.3320363164721141,
+ "grad_norm": 1.9360804934529585,
+ "learning_rate": 8.45119057815922e-06,
+ "loss": 0.5915,
+ "step": 320
+ },
+ {
+ "epoch": 0.33722438391699094,
+ "grad_norm": 1.9356101875463727,
+ "learning_rate": 8.385002879989328e-06,
+ "loss": 0.5838,
+ "step": 325
+ },
+ {
+ "epoch": 0.3424124513618677,
+ "grad_norm": 2.4311425501079023,
+ "learning_rate": 8.317701525706226e-06,
+ "loss": 0.5946,
+ "step": 330
+ },
+ {
+ "epoch": 0.3476005188067445,
+ "grad_norm": 2.356263841306792,
+ "learning_rate": 8.249308657259943e-06,
+ "loss": 0.567,
+ "step": 335
+ },
+ {
+ "epoch": 0.3527885862516213,
+ "grad_norm": 2.048334150791661,
+ "learning_rate": 8.179846775705504e-06,
+ "loss": 0.5795,
+ "step": 340
+ },
+ {
+ "epoch": 0.35797665369649806,
+ "grad_norm": 1.9977511587812506,
+ "learning_rate": 8.109338733800132e-06,
+ "loss": 0.5751,
+ "step": 345
+ },
+ {
+ "epoch": 0.3631647211413748,
+ "grad_norm": 1.8688618314869894,
+ "learning_rate": 8.03780772848477e-06,
+ "loss": 0.568,
+ "step": 350
+ },
+ {
+ "epoch": 0.36835278858625164,
+ "grad_norm": 1.93022130905715,
+ "learning_rate": 7.965277293252354e-06,
+ "loss": 0.5682,
+ "step": 355
+ },
+ {
+ "epoch": 0.3735408560311284,
+ "grad_norm": 2.0382225242835528,
+ "learning_rate": 7.891771290405351e-06,
+ "loss": 0.5617,
+ "step": 360
+ },
+ {
+ "epoch": 0.3787289234760052,
+ "grad_norm": 1.9924209327442368,
+ "learning_rate": 7.817313903205148e-06,
+ "loss": 0.5577,
+ "step": 365
+ },
+ {
+ "epoch": 0.383916990920882,
+ "grad_norm": 1.9678458173326334,
+ "learning_rate": 7.741929627915814e-06,
+ "loss": 0.56,
+ "step": 370
+ },
+ {
+ "epoch": 0.38910505836575876,
+ "grad_norm": 2.2405618654805215,
+ "learning_rate": 7.66564326574491e-06,
+ "loss": 0.5513,
+ "step": 375
+ },
+ {
+ "epoch": 0.3942931258106355,
+ "grad_norm": 1.9971872990885233,
+ "learning_rate": 7.588479914683954e-06,
+ "loss": 0.5445,
+ "step": 380
+ },
+ {
+ "epoch": 0.39948119325551235,
+ "grad_norm": 2.06807252227761,
+ "learning_rate": 7.510464961251271e-06,
+ "loss": 0.5674,
+ "step": 385
+ },
+ {
+ "epoch": 0.4046692607003891,
+ "grad_norm": 1.9627368535332135,
+ "learning_rate": 7.431624072139884e-06,
+ "loss": 0.5435,
+ "step": 390
+ },
+ {
+ "epoch": 0.4098573281452659,
+ "grad_norm": 1.9716804464407136,
+ "learning_rate": 7.351983185773259e-06,
+ "loss": 0.5552,
+ "step": 395
+ },
+ {
+ "epoch": 0.4150453955901427,
+ "grad_norm": 1.9693396583392846,
+ "learning_rate": 7.271568503771632e-06,
+ "loss": 0.5343,
+ "step": 400
+ },
+ {
+ "epoch": 0.42023346303501946,
+ "grad_norm": 1.9432949161104107,
+ "learning_rate": 7.190406482331757e-06,
+ "loss": 0.5475,
+ "step": 405
+ },
+ {
+ "epoch": 0.4254215304798962,
+ "grad_norm": 2.0194917717314045,
+ "learning_rate": 7.108523823522891e-06,
+ "loss": 0.5477,
+ "step": 410
+ },
+ {
+ "epoch": 0.43060959792477305,
+ "grad_norm": 2.206404974952941,
+ "learning_rate": 7.0259474665018915e-06,
+ "loss": 0.5425,
+ "step": 415
+ },
+ {
+ "epoch": 0.4357976653696498,
+ "grad_norm": 1.9526533277899327,
+ "learning_rate": 6.942704578650312e-06,
+ "loss": 0.5161,
+ "step": 420
+ },
+ {
+ "epoch": 0.4409857328145266,
+ "grad_norm": 2.0097466124913117,
+ "learning_rate": 6.858822546636417e-06,
+ "loss": 0.5331,
+ "step": 425
+ },
+ {
+ "epoch": 0.4461738002594034,
+ "grad_norm": 1.8348649689633039,
+ "learning_rate": 6.774328967405035e-06,
+ "loss": 0.523,
+ "step": 430
+ },
+ {
+ "epoch": 0.45136186770428016,
+ "grad_norm": 2.139084532722164,
+ "learning_rate": 6.689251639098261e-06,
+ "loss": 0.5251,
+ "step": 435
+ },
+ {
+ "epoch": 0.4565499351491569,
+ "grad_norm": 1.9708479629081865,
+ "learning_rate": 6.603618551909935e-06,
+ "loss": 0.5232,
+ "step": 440
+ },
+ {
+ "epoch": 0.46173800259403375,
+ "grad_norm": 1.9331722289768318,
+ "learning_rate": 6.517457878876958e-06,
+ "loss": 0.5305,
+ "step": 445
+ },
+ {
+ "epoch": 0.4669260700389105,
+ "grad_norm": 1.859009250403284,
+ "learning_rate": 6.430797966610436e-06,
+ "loss": 0.5159,
+ "step": 450
+ },
+ {
+ "epoch": 0.4721141374837873,
+ "grad_norm": 1.986527066309499,
+ "learning_rate": 6.343667325969736e-06,
+ "loss": 0.5367,
+ "step": 455
+ },
+ {
+ "epoch": 0.4773022049286641,
+ "grad_norm": 1.9771277544299588,
+ "learning_rate": 6.256094622682493e-06,
+ "loss": 0.5123,
+ "step": 460
+ },
+ {
+ "epoch": 0.48249027237354086,
+ "grad_norm": 2.0022259730400904,
+ "learning_rate": 6.168108667913666e-06,
+ "loss": 0.5166,
+ "step": 465
+ },
+ {
+ "epoch": 0.4876783398184176,
+ "grad_norm": 1.9991961519932744,
+ "learning_rate": 6.079738408786753e-06,
+ "loss": 0.5161,
+ "step": 470
+ },
+ {
+ "epoch": 0.49286640726329445,
+ "grad_norm": 2.0805595238898307,
+ "learning_rate": 5.9910129188602665e-06,
+ "loss": 0.5179,
+ "step": 475
+ },
+ {
+ "epoch": 0.4980544747081712,
+ "grad_norm": 1.929253006230254,
+ "learning_rate": 5.9019613885626235e-06,
+ "loss": 0.5097,
+ "step": 480
+ },
+ {
+ "epoch": 0.503242542153048,
+ "grad_norm": 2.25129632838715,
+ "learning_rate": 5.812613115588575e-06,
+ "loss": 0.4971,
+ "step": 485
+ },
+ {
+ "epoch": 0.5084306095979247,
+ "grad_norm": 1.9119339241166262,
+ "learning_rate": 5.722997495260348e-06,
+ "loss": 0.4988,
+ "step": 490
+ },
+ {
+ "epoch": 0.5136186770428015,
+ "grad_norm": 1.8300200112998326,
+ "learning_rate": 5.6331440108566735e-06,
+ "loss": 0.4941,
+ "step": 495
+ },
+ {
+ "epoch": 0.5188067444876784,
+ "grad_norm": 1.9591247994452368,
+ "learning_rate": 5.543082223912875e-06,
+ "loss": 0.492,
+ "step": 500
+ },
+ {
+ "epoch": 0.5239948119325551,
+ "grad_norm": 1.99136453982626,
+ "learning_rate": 5.452841764495203e-06,
+ "loss": 0.5002,
+ "step": 505
+ },
+ {
+ "epoch": 0.5291828793774319,
+ "grad_norm": 1.9961024804052654,
+ "learning_rate": 5.362452321452636e-06,
+ "loss": 0.4772,
+ "step": 510
+ },
+ {
+ "epoch": 0.5343709468223087,
+ "grad_norm": 1.9607124098040063,
+ "learning_rate": 5.2719436326493255e-06,
+ "loss": 0.4908,
+ "step": 515
+ },
+ {
+ "epoch": 0.5395590142671854,
+ "grad_norm": 1.9303906010446525,
+ "learning_rate": 5.181345475180941e-06,
+ "loss": 0.4866,
+ "step": 520
+ },
+ {
+ "epoch": 0.5447470817120622,
+ "grad_norm": 2.0420688559734503,
+ "learning_rate": 5.090687655578078e-06,
+ "loss": 0.4769,
+ "step": 525
+ },
+ {
+ "epoch": 0.5499351491569391,
+ "grad_norm": 1.9908642175713687,
+ "learning_rate": 5e-06,
+ "loss": 0.4742,
+ "step": 530
+ },
+ {
+ "epoch": 0.5551232166018158,
+ "grad_norm": 1.9960779934532675,
+ "learning_rate": 4.909312344421923e-06,
+ "loss": 0.4666,
+ "step": 535
+ },
+ {
+ "epoch": 0.5603112840466926,
+ "grad_norm": 1.9274839933909422,
+ "learning_rate": 4.8186545248190604e-06,
+ "loss": 0.4866,
+ "step": 540
+ },
+ {
+ "epoch": 0.5654993514915694,
+ "grad_norm": 1.9162466337096817,
+ "learning_rate": 4.7280563673506745e-06,
+ "loss": 0.4692,
+ "step": 545
+ },
+ {
+ "epoch": 0.5706874189364461,
+ "grad_norm": 2.07386431606307,
+ "learning_rate": 4.637547678547366e-06,
+ "loss": 0.4859,
+ "step": 550
+ },
+ {
+ "epoch": 0.5758754863813229,
+ "grad_norm": 2.0201984812958385,
+ "learning_rate": 4.547158235504797e-06,
+ "loss": 0.4718,
+ "step": 555
+ },
+ {
+ "epoch": 0.5810635538261998,
+ "grad_norm": 1.95015272613481,
+ "learning_rate": 4.4569177760871255e-06,
+ "loss": 0.475,
+ "step": 560
+ },
+ {
+ "epoch": 0.5862516212710766,
+ "grad_norm": 1.944586565605588,
+ "learning_rate": 4.366855989143326e-06,
+ "loss": 0.4551,
+ "step": 565
+ },
+ {
+ "epoch": 0.5914396887159533,
+ "grad_norm": 1.9208589567145171,
+ "learning_rate": 4.277002504739653e-06,
+ "loss": 0.4686,
+ "step": 570
+ },
+ {
+ "epoch": 0.5966277561608301,
+ "grad_norm": 1.8639671285460482,
+ "learning_rate": 4.187386884411426e-06,
+ "loss": 0.4557,
+ "step": 575
+ },
+ {
+ "epoch": 0.6018158236057068,
+ "grad_norm": 1.9975578797091653,
+ "learning_rate": 4.098038611437377e-06,
+ "loss": 0.4651,
+ "step": 580
+ },
+ {
+ "epoch": 0.6070038910505836,
+ "grad_norm": 1.961651938542185,
+ "learning_rate": 4.008987081139734e-06,
+ "loss": 0.4643,
+ "step": 585
+ },
+ {
+ "epoch": 0.6121919584954605,
+ "grad_norm": 1.9374158302120401,
+ "learning_rate": 3.920261591213249e-06,
+ "loss": 0.4556,
+ "step": 590
+ },
+ {
+ "epoch": 0.6173800259403373,
+ "grad_norm": 1.9090835435895448,
+ "learning_rate": 3.8318913320863355e-06,
+ "loss": 0.4536,
+ "step": 595
+ },
+ {
+ "epoch": 0.622568093385214,
+ "grad_norm": 1.8975263865890188,
+ "learning_rate": 3.7439053773175092e-06,
+ "loss": 0.4615,
+ "step": 600
+ },
+ {
+ "epoch": 0.6277561608300908,
+ "grad_norm": 1.9060390294655216,
+ "learning_rate": 3.6563326740302664e-06,
+ "loss": 0.4459,
+ "step": 605
+ },
+ {
+ "epoch": 0.6329442282749675,
+ "grad_norm": 1.9725006931962796,
+ "learning_rate": 3.569202033389565e-06,
+ "loss": 0.4451,
+ "step": 610
+ },
+ {
+ "epoch": 0.6381322957198443,
+ "grad_norm": 1.9621067476956515,
+ "learning_rate": 3.4825421211230437e-06,
+ "loss": 0.4419,
+ "step": 615
+ },
+ {
+ "epoch": 0.6433203631647212,
+ "grad_norm": 2.098443239659209,
+ "learning_rate": 3.3963814480900665e-06,
+ "loss": 0.4415,
+ "step": 620
+ },
+ {
+ "epoch": 0.648508430609598,
+ "grad_norm": 1.8981208726840302,
+ "learning_rate": 3.310748360901741e-06,
+ "loss": 0.4456,
+ "step": 625
+ },
+ {
+ "epoch": 0.6536964980544747,
+ "grad_norm": 1.8947168989269416,
+ "learning_rate": 3.225671032594966e-06,
+ "loss": 0.4229,
+ "step": 630
+ },
+ {
+ "epoch": 0.6588845654993515,
+ "grad_norm": 2.0138652650509288,
+ "learning_rate": 3.1411774533635854e-06,
+ "loss": 0.437,
+ "step": 635
+ },
+ {
+ "epoch": 0.6640726329442282,
+ "grad_norm": 1.8903378440015823,
+ "learning_rate": 3.0572954213496897e-06,
+ "loss": 0.4454,
+ "step": 640
+ },
+ {
+ "epoch": 0.669260700389105,
+ "grad_norm": 1.8448484960177367,
+ "learning_rate": 2.9740525334981105e-06,
+ "loss": 0.4398,
+ "step": 645
+ },
+ {
+ "epoch": 0.6744487678339819,
+ "grad_norm": 1.9976530631786225,
+ "learning_rate": 2.8914761764771093e-06,
+ "loss": 0.429,
+ "step": 650
+ },
+ {
+ "epoch": 0.6796368352788587,
+ "grad_norm": 1.9155018572353837,
+ "learning_rate": 2.809593517668243e-06,
+ "loss": 0.4309,
+ "step": 655
+ },
+ {
+ "epoch": 0.6848249027237354,
+ "grad_norm": 1.942714148946629,
+ "learning_rate": 2.728431496228369e-06,
+ "loss": 0.4248,
+ "step": 660
+ },
+ {
+ "epoch": 0.6900129701686122,
+ "grad_norm": 2.013023734418392,
+ "learning_rate": 2.648016814226742e-06,
+ "loss": 0.4304,
+ "step": 665
+ },
+ {
+ "epoch": 0.695201037613489,
+ "grad_norm": 1.9023117871214554,
+ "learning_rate": 2.5683759278601174e-06,
+ "loss": 0.4338,
+ "step": 670
+ },
+ {
+ "epoch": 0.7003891050583657,
+ "grad_norm": 1.8911448184302957,
+ "learning_rate": 2.4895350387487304e-06,
+ "loss": 0.4245,
+ "step": 675
+ },
+ {
+ "epoch": 0.7055771725032426,
+ "grad_norm": 2.0358392917626813,
+ "learning_rate": 2.4115200853160475e-06,
+ "loss": 0.4194,
+ "step": 680
+ },
+ {
+ "epoch": 0.7107652399481194,
+ "grad_norm": 1.9510576677492195,
+ "learning_rate": 2.3343567342550933e-06,
+ "loss": 0.4267,
+ "step": 685
+ },
+ {
+ "epoch": 0.7159533073929961,
+ "grad_norm": 1.8690267408594539,
+ "learning_rate": 2.258070372084188e-06,
+ "loss": 0.4312,
+ "step": 690
+ },
+ {
+ "epoch": 0.7211413748378729,
+ "grad_norm": 1.8322122073891454,
+ "learning_rate": 2.182686096794852e-06,
+ "loss": 0.4207,
+ "step": 695
+ },
+ {
+ "epoch": 0.7263294422827496,
+ "grad_norm": 2.0311002524177253,
+ "learning_rate": 2.108228709594649e-06,
+ "loss": 0.4227,
+ "step": 700
+ },
+ {
+ "epoch": 0.7315175097276264,
+ "grad_norm": 1.8678394687630775,
+ "learning_rate": 2.0347227067476478e-06,
+ "loss": 0.4149,
+ "step": 705
+ },
+ {
+ "epoch": 0.7367055771725033,
+ "grad_norm": 1.8521301731665931,
+ "learning_rate": 1.962192271515232e-06,
+ "loss": 0.4192,
+ "step": 710
+ },
+ {
+ "epoch": 0.74189364461738,
+ "grad_norm": 1.9291143236144128,
+ "learning_rate": 1.8906612661998698e-06,
+ "loss": 0.4128,
+ "step": 715
+ },
+ {
+ "epoch": 0.7470817120622568,
+ "grad_norm": 1.991134829662921,
+ "learning_rate": 1.820153224294498e-06,
+ "loss": 0.4102,
+ "step": 720
+ },
+ {
+ "epoch": 0.7522697795071336,
+ "grad_norm": 1.8597303553848081,
+ "learning_rate": 1.750691342740058e-06,
+ "loss": 0.4104,
+ "step": 725
+ },
+ {
+ "epoch": 0.7574578469520103,
+ "grad_norm": 1.8334844899907363,
+ "learning_rate": 1.6822984742937764e-06,
+ "loss": 0.4049,
+ "step": 730
+ },
+ {
+ "epoch": 0.7626459143968871,
+ "grad_norm": 1.839241095874111,
+ "learning_rate": 1.6149971200106723e-06,
+ "loss": 0.4009,
+ "step": 735
+ },
+ {
+ "epoch": 0.767833981841764,
+ "grad_norm": 1.8614288971061537,
+ "learning_rate": 1.548809421840779e-06,
+ "loss": 0.4029,
+ "step": 740
+ },
+ {
+ "epoch": 0.7730220492866408,
+ "grad_norm": 1.9712640153496117,
+ "learning_rate": 1.483757155344503e-06,
+ "loss": 0.4056,
+ "step": 745
+ },
+ {
+ "epoch": 0.7782101167315175,
+ "grad_norm": 1.7973975593361922,
+ "learning_rate": 1.4198617225285244e-06,
+ "loss": 0.409,
+ "step": 750
+ },
+ {
+ "epoch": 0.7833981841763943,
+ "grad_norm": 1.8870139707940816,
+ "learning_rate": 1.3571441448046086e-06,
+ "loss": 0.4117,
+ "step": 755
+ },
+ {
+ "epoch": 0.788586251621271,
+ "grad_norm": 1.9255920717839368,
+ "learning_rate": 1.2956250560736143e-06,
+ "loss": 0.4097,
+ "step": 760
+ },
+ {
+ "epoch": 0.7937743190661478,
+ "grad_norm": 1.8605570734597534,
+ "learning_rate": 1.2353246959370086e-06,
+ "loss": 0.3885,
+ "step": 765
+ },
+ {
+ "epoch": 0.7989623865110247,
+ "grad_norm": 1.8678208345700735,
+ "learning_rate": 1.1762629030380867e-06,
+ "loss": 0.4044,
+ "step": 770
+ },
+ {
+ "epoch": 0.8041504539559015,
+ "grad_norm": 1.8406724398818959,
+ "learning_rate": 1.118459108535122e-06,
+ "loss": 0.3991,
+ "step": 775
+ },
+ {
+ "epoch": 0.8093385214007782,
+ "grad_norm": 1.9447178497450672,
+ "learning_rate": 1.061932329708572e-06,
+ "loss": 0.3878,
+ "step": 780
+ },
+ {
+ "epoch": 0.814526588845655,
+ "grad_norm": 1.903347183666585,
+ "learning_rate": 1.006701163704445e-06,
+ "loss": 0.3994,
+ "step": 785
+ },
+ {
+ "epoch": 0.8197146562905318,
1119
+ "grad_norm": 1.7817648380438804,
1120
+ "learning_rate": 9.527837814158963e-07,
1121
+ "loss": 0.3943,
1122
+ "step": 790
1123
+ },
1124
+ {
1125
+ "epoch": 0.8249027237354085,
1126
+ "grad_norm": 1.89718875917406,
1127
+ "learning_rate": 9.001979215050544e-07,
1128
+ "loss": 0.3929,
1129
+ "step": 795
1130
+ },
1131
+ {
1132
+ "epoch": 0.8300907911802854,
1133
+ "grad_norm": 1.8825895138353903,
1134
+ "learning_rate": 8.489608845670527e-07,
1135
+ "loss": 0.3924,
1136
+ "step": 800
1137
+ },
1138
+ {
1139
+ "epoch": 0.8352788586251622,
1140
+ "grad_norm": 1.7908515887362904,
1141
+ "learning_rate": 7.99089527438191e-07,
1142
+ "loss": 0.3919,
1143
+ "step": 805
1144
+ },
1145
+ {
1146
+ "epoch": 0.8404669260700389,
1147
+ "grad_norm": 2.0314129121613034,
1148
+ "learning_rate": 7.506002576500732e-07,
1149
+ "loss": 0.3941,
1150
+ "step": 810
1151
+ },
1152
+ {
1153
+ "epoch": 0.8456549935149157,
1154
+ "grad_norm": 1.838737045068825,
1155
+ "learning_rate": 7.035090280315854e-07,
1156
+ "loss": 0.398,
1157
+ "step": 815
1158
+ },
1159
+ {
1160
+ "epoch": 0.8508430609597925,
1161
+ "grad_norm": 1.8701463175206698,
1162
+ "learning_rate": 6.578313314604612e-07,
1163
+ "loss": 0.395,
1164
+ "step": 820
1165
+ },
1166
+ {
1167
+ "epoch": 0.8560311284046692,
1168
+ "grad_norm": 1.8734176088672492,
1169
+ "learning_rate": 6.135821957661658e-07,
1170
+ "loss": 0.3945,
1171
+ "step": 825
1172
+ },
1173
+ {
1174
+ "epoch": 0.8612191958495461,
1175
+ "grad_norm": 1.8454123160341045,
1176
+ "learning_rate": 5.707761787857879e-07,
1177
+ "loss": 0.3855,
1178
+ "step": 830
1179
+ },
1180
+ {
1181
+ "epoch": 0.8664072632944229,
1182
+ "grad_norm": 1.7755466173110739,
1183
+ "learning_rate": 5.294273635745517e-07,
1184
+ "loss": 0.3971,
1185
+ "step": 835
1186
+ },
1187
+ {
1188
+ "epoch": 0.8715953307392996,
1189
+ "grad_norm": 1.8394864397787671,
1190
+ "learning_rate": 4.895493537725326e-07,
1191
+ "loss": 0.3966,
1192
+ "step": 840
1193
+ },
1194
+ {
1195
+ "epoch": 0.8767833981841764,
1196
+ "grad_norm": 1.7915928948304078,
1197
+ "learning_rate": 4.511552691290988e-07,
1198
+ "loss": 0.3979,
1199
+ "step": 845
1200
+ },
1201
+ {
1202
+ "epoch": 0.8819714656290532,
1203
+ "grad_norm": 1.7863787006400424,
1204
+ "learning_rate": 4.1425774118655505e-07,
1205
+ "loss": 0.3826,
1206
+ "step": 850
1207
+ },
1208
+ {
1209
+ "epoch": 0.8871595330739299,
1210
+ "grad_norm": 1.8024141112662704,
1211
+ "learning_rate": 3.7886890912439633e-07,
1212
+ "loss": 0.3862,
1213
+ "step": 855
1214
+ },
1215
+ {
1216
+ "epoch": 0.8923476005188068,
1217
+ "grad_norm": 1.8180972720099156,
1218
+ "learning_rate": 3.4500041576555733e-07,
1219
+ "loss": 0.3859,
1220
+ "step": 860
1221
+ },
1222
+ {
1223
+ "epoch": 0.8975356679636836,
1224
+ "grad_norm": 1.7622051589037506,
1225
+ "learning_rate": 3.1266340374595693e-07,
1226
+ "loss": 0.3831,
1227
+ "step": 865
1228
+ },
1229
+ {
1230
+ "epoch": 0.9027237354085603,
1231
+ "grad_norm": 1.7926800043760007,
1232
+ "learning_rate": 2.818685118486025e-07,
1233
+ "loss": 0.3927,
1234
+ "step": 870
1235
+ },
1236
+ {
1237
+ "epoch": 0.9079118028534371,
1238
+ "grad_norm": 1.8515815235983688,
1239
+ "learning_rate": 2.526258715034602e-07,
1240
+ "loss": 0.3787,
1241
+ "step": 875
1242
+ },
1243
+ {
1244
+ "epoch": 0.9130998702983139,
1245
+ "grad_norm": 1.761786866550431,
1246
+ "learning_rate": 2.2494510345424657e-07,
1247
+ "loss": 0.3881,
1248
+ "step": 880
1249
+ },
1250
+ {
1251
+ "epoch": 0.9182879377431906,
1252
+ "grad_norm": 1.731506703869926,
1253
+ "learning_rate": 1.988353145932298e-07,
1254
+ "loss": 0.3762,
1255
+ "step": 885
1256
+ },
1257
+ {
1258
+ "epoch": 0.9234760051880675,
1259
+ "grad_norm": 1.8427166106595052,
1260
+ "learning_rate": 1.7430509496508985e-07,
1261
+ "loss": 0.3975,
1262
+ "step": 890
1263
+ },
1264
+ {
1265
+ "epoch": 0.9286640726329443,
1266
+ "grad_norm": 1.761769698023775,
1267
+ "learning_rate": 1.5136251494081822e-07,
1268
+ "loss": 0.3842,
1269
+ "step": 895
1270
+ },
1271
+ {
1272
+ "epoch": 0.933852140077821,
1273
+ "grad_norm": 1.8297504100937483,
1274
+ "learning_rate": 1.3001512256258841e-07,
1275
+ "loss": 0.3916,
1276
+ "step": 900
1277
+ },
1278
+ {
1279
+ "epoch": 0.9390402075226978,
1280
+ "grad_norm": 1.8143369848190358,
1281
+ "learning_rate": 1.1026994106047296e-07,
1282
+ "loss": 0.3911,
1283
+ "step": 905
1284
+ },
1285
+ {
1286
+ "epoch": 0.9442282749675746,
1287
+ "grad_norm": 1.7462314691918333,
1288
+ "learning_rate": 9.213346654182054e-08,
1289
+ "loss": 0.3888,
1290
+ "step": 910
1291
+ },
1292
+ {
1293
+ "epoch": 0.9494163424124513,
1294
+ "grad_norm": 1.842285372864709,
1295
+ "learning_rate": 7.561166585405789e-08,
1296
+ "loss": 0.3823,
1297
+ "step": 915
1298
+ },
1299
+ {
1300
+ "epoch": 0.9546044098573282,
1301
+ "grad_norm": 1.798454935332072,
1302
+ "learning_rate": 6.070997462161055e-08,
1303
+ "loss": 0.4032,
1304
+ "step": 920
1305
+ },
1306
+ {
1307
+ "epoch": 0.959792477302205,
1308
+ "grad_norm": 1.8579672164577692,
1309
+ "learning_rate": 4.743329545760122e-08,
1310
+ "loss": 0.3811,
1311
+ "step": 925
1312
+ },
1313
+ {
1314
+ "epoch": 0.9649805447470817,
1315
+ "grad_norm": 1.764976690651984,
1316
+ "learning_rate": 3.578599635090163e-08,
1317
+ "loss": 0.3806,
1318
+ "step": 930
1319
+ },
1320
+ {
1321
+ "epoch": 0.9701686121919585,
1322
+ "grad_norm": 1.7085373084916373,
1323
+ "learning_rate": 2.577190922908035e-08,
1324
+ "loss": 0.3888,
1325
+ "step": 935
1326
+ },
1327
+ {
1328
+ "epoch": 0.9753566796368353,
1329
+ "grad_norm": 1.7431684765639506,
1330
+ "learning_rate": 1.7394328697707407e-08,
1331
+ "loss": 0.3901,
1332
+ "step": 940
1333
+ },
1334
+ {
1335
+ "epoch": 0.980544747081712,
1336
+ "grad_norm": 1.8495600056895127,
1337
+ "learning_rate": 1.0656010956437979e-08,
1338
+ "loss": 0.3918,
1339
+ "step": 945
1340
+ },
1341
+ {
1342
+ "epoch": 0.9857328145265889,
1343
+ "grad_norm": 1.8616847493274582,
1344
+ "learning_rate": 5.5591728922316235e-09,
1345
+ "loss": 0.3895,
1346
+ "step": 950
1347
+ },
1348
+ {
1349
+ "epoch": 0.9909208819714657,
1350
+ "grad_norm": 1.8274058784400706,
1351
+ "learning_rate": 2.1054913500051512e-09,
1352
+ "loss": 0.3831,
1353
+ "step": 955
1354
+ },
1355
+ {
1356
+ "epoch": 0.9961089494163424,
1357
+ "grad_norm": 1.7888916632814764,
1358
+ "learning_rate": 2.9610258095169596e-10,
1359
+ "loss": 0.3863,
1360
+ "step": 960
1361
+ },
1362
+ {
1363
+ "epoch": 0.9992217898832685,
1364
+ "eval_loss": 0.35284245014190674,
1365
+ "eval_runtime": 0.9437,
1366
+ "eval_samples_per_second": 2.119,
1367
+ "eval_steps_per_second": 1.06,
1368
+ "step": 963
1369
+ },
1370
+ {
1371
+ "epoch": 0.9992217898832685,
1372
+ "step": 963,
1373
+ "total_flos": 201580263505920.0,
1374
+ "train_loss": 0.5411187405403034,
1375
+ "train_runtime": 23935.6127,
1376
+ "train_samples_per_second": 1.288,
1377
+ "train_steps_per_second": 0.04
1378
+ }
1379
+ ],
1380
+ "logging_steps": 5,
1381
+ "max_steps": 963,
1382
+ "num_input_tokens_seen": 0,
1383
+ "num_train_epochs": 1,
1384
+ "save_steps": 100,
1385
+ "stateful_callbacks": {
1386
+ "TrainerControl": {
1387
+ "args": {
1388
+ "should_epoch_stop": false,
1389
+ "should_evaluate": false,
1390
+ "should_log": false,
1391
+ "should_save": true,
1392
+ "should_training_stop": true
1393
+ },
1394
+ "attributes": {}
1395
+ }
1396
+ },
1397
+ "total_flos": 201580263505920.0,
1398
+ "train_batch_size": 2,
1399
+ "trial_name": null,
1400
+ "trial_params": null
1401
+ }