NicholasCorrado committed on
Commit
ccdfcea
1 Parent(s): d16c0da

Model save

Files changed (4)
  1. README.md +1 -28
  2. all_results.json +6 -19
  3. train_results.json +6 -6
  4. trainer_state.json +18 -1603
README.md CHANGED
@@ -3,15 +3,9 @@ library_name: transformers
3
  license: apache-2.0
4
  base_model: alignment-handbook/zephyr-7b-sft-full
5
  tags:
6
- - alignment-handbook
7
  - trl
8
  - dpo
9
  - generated_from_trainer
10
- - trl
11
- - dpo
12
- - generated_from_trainer
13
- datasets:
14
- - data/rlced_conifer
15
  model-index:
16
  - name: rlced-conifer-zephyr-7b-dpo-2e
17
  results: []
@@ -22,17 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
22
 
23
  # rlced-conifer-zephyr-7b-dpo-2e
24
 
25
- This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the data/rlced_conifer dataset.
26
- It achieves the following results on the evaluation set:
27
- - Loss: 0.1593
28
- - Rewards/chosen: -9.2247
29
- - Rewards/rejected: -22.0325
30
- - Rewards/accuracies: 0.9326
31
- - Rewards/margins: 12.8079
32
- - Logps/rejected: -2649.1938
33
- - Logps/chosen: -1345.8763
34
- - Logits/rejected: 3.3806
35
- - Logits/chosen: 0.2633
36
 
37
  ## Model description
38
 
@@ -67,17 +51,6 @@ The following hyperparameters were used during training:
67
 
68
  ### Training results
69
 
70
- | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
71
- |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
72
- | 0.2571 | 0.2076 | 100 | 0.2502 | -2.3096 | -5.4299 | 0.8897 | 3.1203 | -988.9271 | -654.3687 | -2.5612 | -2.5702 |
73
- | 0.1684 | 0.4152 | 200 | 0.1771 | -3.3724 | -8.6989 | 0.9142 | 5.3265 | -1315.8264 | -760.6486 | 0.9749 | -0.3442 |
74
- | 0.1506 | 0.6227 | 300 | 0.1640 | -3.1556 | -9.7233 | 0.9216 | 6.5677 | -1418.2717 | -738.9712 | 1.6434 | -0.4687 |
75
- | 0.1426 | 0.8303 | 400 | 0.1523 | -5.6795 | -14.2428 | 0.9301 | 8.5633 | -1870.2236 | -991.3617 | 3.6793 | 0.9505 |
76
- | 0.0881 | 1.0379 | 500 | 0.1592 | -8.3376 | -21.0153 | 0.9314 | 12.6778 | -2547.4749 | -1257.1659 | 4.5267 | 1.2345 |
77
- | 0.0774 | 1.2455 | 600 | 0.1560 | -8.5445 | -20.6787 | 0.9326 | 12.1342 | -2513.8113 | -1277.8566 | 4.0373 | 1.1427 |
78
- | 0.0747 | 1.4530 | 700 | 0.1579 | -8.8472 | -21.2653 | 0.9277 | 12.4181 | -2572.4675 | -1308.1294 | 3.5812 | 0.4989 |
79
- | 0.0811 | 1.6606 | 800 | 0.1545 | -8.3810 | -19.9040 | 0.9289 | 11.5230 | -2436.3406 | -1261.5127 | 3.2546 | 0.2172 |
80
- | 0.069 | 1.8682 | 900 | 0.1592 | -9.2177 | -21.9885 | 0.9326 | 12.7708 | -2644.7937 | -1345.1790 | 3.3692 | 0.2563 |
81
 
82
 
83
  ### Framework versions
 
3
  license: apache-2.0
4
  base_model: alignment-handbook/zephyr-7b-sft-full
5
  tags:
 
6
  - trl
7
  - dpo
8
  - generated_from_trainer
 
 
 
 
 
9
  model-index:
10
  - name: rlced-conifer-zephyr-7b-dpo-2e
11
  results: []
 
16
 
17
  # rlced-conifer-zephyr-7b-dpo-2e
18
 
19
+ This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
 
 
 
 
 
 
 
 
 
 
20
 
21
  ## Model description
22
 
 
51
 
52
  ### Training results
53
 
 
 
 
 
 
 
 
 
 
 
 
54
 
55
 
56
  ### Framework versions
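A quick consistency check on the evaluation summary that this commit removes from README.md: the reported `Rewards/margins` should equal `Rewards/chosen` minus `Rewards/rejected`. The sketch below assumes the margin is logged as the mean of per-example (chosen minus rejected) rewards, which is how DPO trainers usually report it; the helper name `reward_margin` is hypothetical and the numbers are copied from the removed summary lines.

```python
import math

# Final evaluation rewards, copied from the removed README summary.
rewards_chosen = -9.2247
rewards_rejected = -22.0325
reported_margin = 12.8079


def reward_margin(chosen: float, rejected: float) -> float:
    """Margin implied by the mean chosen and mean rejected rewards.

    Assumption: the logged margin is mean(chosen - rejected), so it
    equals the difference of the two logged means.
    """
    return chosen - rejected


margin = reward_margin(rewards_chosen, rewards_rejected)

# The README values are rounded to four decimals, so compare with a
# small absolute tolerance rather than exact equality.
margin_matches = math.isclose(margin, reported_margin, abs_tol=2e-4)

print(f"computed margin = {margin:.4f}, reported = {reported_margin}")
print(f"consistent within rounding: {margin_matches}")
```

The same check can be applied to any row of the removed training-results table, since each row lists all three reward columns.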
all_results.json CHANGED
@@ -1,22 +1,9 @@
1
  {
2
- "epoch": 1.996886351842242,
3
- "eval_logits/chosen": 0.2633172273635864,
4
- "eval_logits/rejected": 3.3806047439575195,
5
- "eval_logps/chosen": -1345.8763427734375,
6
- "eval_logps/rejected": -2649.19384765625,
7
- "eval_loss": 0.15931174159049988,
8
- "eval_rewards/accuracies": 0.9325980544090271,
9
- "eval_rewards/chosen": -9.224682807922363,
10
- "eval_rewards/margins": 12.807853698730469,
11
- "eval_rewards/rejected": -22.03253746032715,
12
- "eval_runtime": 298.1405,
13
- "eval_samples": 6491,
14
- "eval_samples_per_second": 21.772,
15
- "eval_steps_per_second": 0.342,
16
  "total_flos": 0.0,
17
- "train_loss": 0.15826490898606932,
18
- "train_runtime": 30237.5544,
19
- "train_samples": 123309,
20
- "train_samples_per_second": 8.156,
21
- "train_steps_per_second": 0.032
22
  }
 
1
  {
2
+ "epoch": 2.0,
 
 
 
 
 
 
 
 
 
 
 
 
 
3
  "total_flos": 0.0,
4
+ "train_loss": 0.1732867956161499,
5
+ "train_runtime": 97.0028,
6
+ "train_samples": 50,
7
+ "train_samples_per_second": 1.031,
8
+ "train_steps_per_second": 0.021
9
  }
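The throughput figures on both sides of this diff can be reproduced from the other fields. The sketch below assumes `train_samples_per_second` is computed as `train_samples` times the configured number of epochs, divided by `train_runtime`, and that the configured epoch count is 2 (suggested by the `2e` suffix in the model name, not stated in this diff). The function name `samples_per_second` is hypothetical.

```python
def samples_per_second(num_samples: int, num_epochs: int, runtime_s: float) -> float:
    """Training throughput rounded to three decimals.

    Assumption: throughput counts every sample once per configured epoch.
    """
    return round(num_samples * num_epochs / runtime_s, 3)


# Values before this commit (removed lines of all_results.json).
old_rate = samples_per_second(123309, 2, 30237.5544)

# Values after this commit (added lines of all_results.json).
new_rate = samples_per_second(50, 2, 97.0028)

print(f"old train_samples_per_second: {old_rate}")
print(f"new train_samples_per_second: {new_rate}")
```

Under these assumptions both results agree with the values recorded in the diff (8.156 before, 1.031 after). The new run covers only 50 samples in about 97 seconds, so the metrics written by this commit come from a much smaller run than the ones they replace.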
train_results.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
- "epoch": 1.996886351842242,
3
  "total_flos": 0.0,
4
- "train_loss": 0.15826490898606932,
5
- "train_runtime": 30237.5544,
6
- "train_samples": 123309,
7
- "train_samples_per_second": 8.156,
8
- "train_steps_per_second": 0.032
9
  }
 
1
  {
2
+ "epoch": 2.0,
3
  "total_flos": 0.0,
4
+ "train_loss": 0.1732867956161499,
5
+ "train_runtime": 97.0028,
6
+ "train_samples": 50,
7
+ "train_samples_per_second": 1.031,
8
+ "train_steps_per_second": 0.021
9
  }
trainer_state.json CHANGED
@@ -1,22 +1,21 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 1.996886351842242,
5
- "eval_steps": 100,
6
- "global_step": 962,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.0020757654385054488,
13
- "grad_norm": 7.45352963949478,
14
- "learning_rate": 5.154639175257731e-09,
15
- "logits/chosen": -2.730942726135254,
16
- "logits/rejected": -2.654609203338623,
17
- "logps/chosen": -350.489990234375,
18
- "logps/rejected": -325.546875,
19
- "loss": 0.6931,
20
  "rewards/accuracies": 0.0,
21
  "rewards/chosen": 0.0,
22
  "rewards/margins": 0.0,
@@ -24,1604 +23,20 @@
24
  "step": 1
25
  },
26
  {
27
- "epoch": 0.02075765438505449,
28
- "grad_norm": 7.9210410159092035,
29
- "learning_rate": 5.154639175257731e-08,
30
- "logits/chosen": -2.732799768447876,
31
- "logits/rejected": -2.7348814010620117,
32
- "logps/chosen": -366.5058288574219,
33
- "logps/rejected": -412.2818908691406,
34
- "loss": 0.6931,
35
- "rewards/accuracies": 0.4652777910232544,
36
- "rewards/chosen": -0.00011030762834707275,
37
- "rewards/margins": 0.00028051523258909583,
38
- "rewards/rejected": -0.0003908228827640414,
39
- "step": 10
40
- },
41
- {
42
- "epoch": 0.04151530877010898,
43
- "grad_norm": 7.570830729268252,
44
- "learning_rate": 1.0309278350515462e-07,
45
- "logits/chosen": -2.7170357704162598,
46
- "logits/rejected": -2.6939330101013184,
47
- "logps/chosen": -378.942138671875,
48
- "logps/rejected": -404.09735107421875,
49
- "loss": 0.6921,
50
- "rewards/accuracies": 0.640625,
51
- "rewards/chosen": 0.00050427729729563,
52
- "rewards/margins": 0.0023104187566787004,
53
- "rewards/rejected": -0.001806141110137105,
54
- "step": 20
55
- },
56
- {
57
- "epoch": 0.062272963155163466,
58
- "grad_norm": 7.630669954109631,
59
- "learning_rate": 1.5463917525773197e-07,
60
- "logits/chosen": -2.7188973426818848,
61
- "logits/rejected": -2.7039918899536133,
62
- "logps/chosen": -365.66571044921875,
63
- "logps/rejected": -388.82623291015625,
64
- "loss": 0.6864,
65
- "rewards/accuracies": 0.815625011920929,
66
- "rewards/chosen": 0.0047697038389742374,
67
- "rewards/margins": 0.01397106982767582,
68
- "rewards/rejected": -0.00920136459171772,
69
- "step": 30
70
- },
71
- {
72
- "epoch": 0.08303061754021795,
73
- "grad_norm": 7.812718404148332,
74
- "learning_rate": 2.0618556701030925e-07,
75
- "logits/chosen": -2.7190403938293457,
76
- "logits/rejected": -2.684091091156006,
77
- "logps/chosen": -350.58087158203125,
78
- "logps/rejected": -365.0170593261719,
79
- "loss": 0.6693,
80
- "rewards/accuracies": 0.8656250238418579,
81
- "rewards/chosen": 0.027430161833763123,
82
- "rewards/margins": 0.04917122796177864,
83
- "rewards/rejected": -0.021741071715950966,
84
- "step": 40
85
- },
86
- {
87
- "epoch": 0.10378827192527244,
88
- "grad_norm": 8.920978446164991,
89
- "learning_rate": 2.5773195876288655e-07,
90
- "logits/chosen": -2.729093074798584,
91
- "logits/rejected": -2.7239317893981934,
92
- "logps/chosen": -358.971435546875,
93
- "logps/rejected": -397.4567565917969,
94
- "loss": 0.6298,
95
- "rewards/accuracies": 0.859375,
96
- "rewards/chosen": 0.07278638333082199,
97
- "rewards/margins": 0.1439397633075714,
98
- "rewards/rejected": -0.07115338742733002,
99
- "step": 50
100
- },
101
- {
102
- "epoch": 0.12454592631032693,
103
- "grad_norm": 9.920116758805731,
104
- "learning_rate": 3.0927835051546394e-07,
105
- "logits/chosen": -2.6944854259490967,
106
- "logits/rejected": -2.677772045135498,
107
- "logps/chosen": -348.3173522949219,
108
- "logps/rejected": -426.961669921875,
109
- "loss": 0.548,
110
- "rewards/accuracies": 0.875,
111
- "rewards/chosen": -0.01142303366214037,
112
- "rewards/margins": 0.3693596422672272,
113
- "rewards/rejected": -0.3807826638221741,
114
- "step": 60
115
- },
116
- {
117
- "epoch": 0.14530358069538143,
118
- "grad_norm": 14.365707646692588,
119
- "learning_rate": 3.608247422680412e-07,
120
- "logits/chosen": -2.7343642711639404,
121
- "logits/rejected": -2.691175937652588,
122
- "logps/chosen": -450.86602783203125,
123
- "logps/rejected": -535.1449584960938,
124
- "loss": 0.4278,
125
- "rewards/accuracies": 0.856249988079071,
126
- "rewards/chosen": -0.5426548719406128,
127
- "rewards/margins": 0.7670809626579285,
128
- "rewards/rejected": -1.3097360134124756,
129
- "step": 70
130
- },
131
- {
132
- "epoch": 0.1660612350804359,
133
- "grad_norm": 17.310434488935346,
134
- "learning_rate": 4.123711340206185e-07,
135
- "logits/chosen": -2.697681188583374,
136
- "logits/rejected": -2.6777291297912598,
137
- "logps/chosen": -547.9215087890625,
138
- "logps/rejected": -699.3780517578125,
139
- "loss": 0.3591,
140
- "rewards/accuracies": 0.8656250238418579,
141
- "rewards/chosen": -1.5689979791641235,
142
- "rewards/margins": 1.4014527797698975,
143
- "rewards/rejected": -2.9704508781433105,
144
- "step": 80
145
- },
146
- {
147
- "epoch": 0.1868188894654904,
148
- "grad_norm": 18.777123719069813,
149
- "learning_rate": 4.639175257731959e-07,
150
- "logits/chosen": -2.646989107131958,
151
- "logits/rejected": -2.6466164588928223,
152
- "logps/chosen": -557.4159545898438,
153
- "logps/rejected": -791.6094970703125,
154
- "loss": 0.3056,
155
- "rewards/accuracies": 0.871874988079071,
156
- "rewards/chosen": -1.921556830406189,
157
- "rewards/margins": 2.1401381492614746,
158
- "rewards/rejected": -4.061694622039795,
159
- "step": 90
160
- },
161
- {
162
- "epoch": 0.2075765438505449,
163
- "grad_norm": 14.527848025037288,
164
- "learning_rate": 4.999851606199919e-07,
165
- "logits/chosen": -2.598267078399658,
166
- "logits/rejected": -2.5736942291259766,
167
- "logps/chosen": -581.7288818359375,
168
- "logps/rejected": -923.7081909179688,
169
- "loss": 0.2571,
170
- "rewards/accuracies": 0.9156249761581421,
171
- "rewards/chosen": -2.0622379779815674,
172
- "rewards/margins": 3.1660473346710205,
173
- "rewards/rejected": -5.228285789489746,
174
- "step": 100
175
- },
176
- {
177
- "epoch": 0.2075765438505449,
178
- "eval_logits/chosen": -2.570211172103882,
179
- "eval_logits/rejected": -2.5612430572509766,
180
- "eval_logps/chosen": -654.36865234375,
181
- "eval_logps/rejected": -988.9270629882812,
182
- "eval_loss": 0.250244677066803,
183
- "eval_rewards/accuracies": 0.8897058963775635,
184
- "eval_rewards/chosen": -2.3096041679382324,
185
- "eval_rewards/margins": 3.120264768600464,
186
- "eval_rewards/rejected": -5.429868698120117,
187
- "eval_runtime": 298.2798,
188
- "eval_samples_per_second": 21.761,
189
- "eval_steps_per_second": 0.342,
190
- "step": 100
191
- },
192
- {
193
- "epoch": 0.2283341982355994,
194
- "grad_norm": 20.425432278137276,
195
- "learning_rate": 4.997213984244138e-07,
196
- "logits/chosen": -2.288006067276001,
197
- "logits/rejected": -2.136007070541382,
198
- "logps/chosen": -608.41552734375,
199
- "logps/rejected": -1056.211669921875,
200
- "loss": 0.2306,
201
- "rewards/accuracies": 0.9281250238418579,
202
- "rewards/chosen": -2.3692779541015625,
203
- "rewards/margins": 4.241728782653809,
204
- "rewards/rejected": -6.611006259918213,
205
- "step": 110
206
- },
207
- {
208
- "epoch": 0.24909185262065386,
209
- "grad_norm": 13.460106318977088,
210
- "learning_rate": 4.991282726678214e-07,
211
- "logits/chosen": -1.7061725854873657,
212
- "logits/rejected": -1.2435765266418457,
213
- "logps/chosen": -665.3823852539062,
214
- "logps/rejected": -1083.344482421875,
215
- "loss": 0.2092,
216
- "rewards/accuracies": 0.9125000238418579,
217
- "rewards/chosen": -2.7041690349578857,
218
- "rewards/margins": 3.9961719512939453,
219
- "rewards/rejected": -6.700340270996094,
220
- "step": 120
221
- },
222
- {
223
- "epoch": 0.26984950700570837,
224
- "grad_norm": 19.086564033070054,
225
- "learning_rate": 4.982065656380468e-07,
226
- "logits/chosen": -1.483633041381836,
227
- "logits/rejected": -0.8341295123100281,
228
- "logps/chosen": -623.9300537109375,
229
- "logps/rejected": -1075.849853515625,
230
- "loss": 0.194,
231
- "rewards/accuracies": 0.903124988079071,
232
- "rewards/chosen": -2.44337797164917,
233
- "rewards/margins": 4.244832515716553,
234
- "rewards/rejected": -6.688210487365723,
235
- "step": 130
236
- },
237
- {
238
- "epoch": 0.29060716139076287,
239
- "grad_norm": 15.042161368077187,
240
- "learning_rate": 4.969574929966689e-07,
241
- "logits/chosen": -1.0453473329544067,
242
- "logits/rejected": -0.016147825866937637,
243
- "logps/chosen": -633.580078125,
244
- "logps/rejected": -1098.5224609375,
245
- "loss": 0.2082,
246
- "rewards/accuracies": 0.890625,
247
- "rewards/chosen": -2.631155490875244,
248
- "rewards/margins": 4.421234130859375,
249
- "rewards/rejected": -7.052389621734619,
250
- "step": 140
251
- },
252
- {
253
- "epoch": 0.3113648157758173,
254
- "grad_norm": 15.666211126762926,
255
- "learning_rate": 4.953827021756488e-07,
256
- "logits/chosen": -1.3857558965682983,
257
- "logits/rejected": -0.03474185988306999,
258
- "logps/chosen": -641.5021362304688,
259
- "logps/rejected": -1154.77587890625,
260
- "loss": 0.1733,
261
- "rewards/accuracies": 0.9281250238418579,
262
- "rewards/chosen": -2.8129067420959473,
263
- "rewards/margins": 4.9361677169799805,
264
- "rewards/rejected": -7.749074459075928,
265
- "step": 150
266
- },
267
- {
268
- "epoch": 0.3321224701608718,
269
- "grad_norm": 27.48848947004099,
270
- "learning_rate": 4.93484270204492e-07,
271
- "logits/chosen": -1.0413181781768799,
272
- "logits/rejected": 0.31071561574935913,
273
- "logps/chosen": -641.2700805664062,
274
- "logps/rejected": -1222.7462158203125,
275
- "loss": 0.1801,
276
- "rewards/accuracies": 0.9312499761581421,
277
- "rewards/chosen": -2.573660373687744,
278
- "rewards/margins": 5.559605121612549,
279
- "rewards/rejected": -8.133265495300293,
280
- "step": 160
281
- },
282
- {
283
- "epoch": 0.3528801245459263,
284
- "grad_norm": 20.795363548975157,
285
- "learning_rate": 4.91264700970804e-07,
286
- "logits/chosen": -0.4338255822658539,
287
- "logits/rejected": 0.8560503125190735,
288
- "logps/chosen": -664.4598388671875,
289
- "logps/rejected": -1253.003662109375,
290
- "loss": 0.1834,
291
- "rewards/accuracies": 0.903124988079071,
292
- "rewards/chosen": -3.056209087371826,
293
- "rewards/margins": 5.69351863861084,
294
- "rewards/rejected": -8.749728202819824,
295
- "step": 170
296
- },
297
- {
298
- "epoch": 0.3736377789309808,
299
- "grad_norm": 17.31415644551086,
300
- "learning_rate": 4.88726921917853e-07,
301
- "logits/chosen": -1.2734315395355225,
302
- "logits/rejected": 0.053898729383945465,
303
- "logps/chosen": -594.52685546875,
304
- "logps/rejected": -1158.65380859375,
305
- "loss": 0.1862,
306
- "rewards/accuracies": 0.925000011920929,
307
- "rewards/chosen": -2.2788708209991455,
308
- "rewards/margins": 5.445026397705078,
309
- "rewards/rejected": -7.723898410797119,
310
- "step": 180
311
- },
312
- {
313
- "epoch": 0.39439543331603527,
314
- "grad_norm": 14.853072406578221,
315
- "learning_rate": 4.858742801834942e-07,
316
- "logits/chosen": -0.5313376784324646,
317
- "logits/rejected": 0.8287642598152161,
318
- "logps/chosen": -676.4669189453125,
319
- "logps/rejected": -1298.01318359375,
320
- "loss": 0.173,
321
- "rewards/accuracies": 0.9312499761581421,
322
- "rewards/chosen": -2.9378976821899414,
323
- "rewards/margins": 5.850995063781738,
324
- "rewards/rejected": -8.788891792297363,
325
- "step": 190
326
- },
327
- {
328
- "epoch": 0.4151530877010898,
329
- "grad_norm": 14.26284853662138,
330
- "learning_rate": 4.827105381855496e-07,
331
- "logits/chosen": -0.06869231164455414,
332
- "logits/rejected": 1.29701828956604,
333
- "logps/chosen": -653.2413330078125,
334
- "logps/rejected": -1239.662353515625,
335
- "loss": 0.1684,
336
- "rewards/accuracies": 0.925000011920929,
337
- "rewards/chosen": -2.8566927909851074,
338
- "rewards/margins": 5.576047897338867,
339
- "rewards/rejected": -8.432741165161133,
340
- "step": 200
341
- },
342
- {
343
- "epoch": 0.4151530877010898,
344
- "eval_logits/chosen": -0.344154953956604,
345
- "eval_logits/rejected": 0.9748669266700745,
346
- "eval_logps/chosen": -760.6486206054688,
347
- "eval_logps/rejected": -1315.826416015625,
348
- "eval_loss": 0.1770959049463272,
349
- "eval_rewards/accuracies": 0.9142156839370728,
350
- "eval_rewards/chosen": -3.372403383255005,
351
- "eval_rewards/margins": 5.326459884643555,
352
- "eval_rewards/rejected": -8.69886302947998,
353
- "eval_runtime": 296.8516,
354
- "eval_samples_per_second": 21.866,
355
- "eval_steps_per_second": 0.344,
356
- "step": 200
357
- },
358
- {
359
- "epoch": 0.4359107420861443,
360
- "grad_norm": 13.687894338265775,
361
- "learning_rate": 4.79239868659464e-07,
362
- "logits/chosen": -0.5194350481033325,
363
- "logits/rejected": 1.2649457454681396,
364
- "logps/chosen": -699.4662475585938,
365
- "logps/rejected": -1312.603515625,
366
- "loss": 0.1582,
367
- "rewards/accuracies": 0.934374988079071,
368
- "rewards/chosen": -3.3037147521972656,
369
- "rewards/margins": 5.938404083251953,
370
- "rewards/rejected": -9.242118835449219,
371
- "step": 210
372
- },
373
- {
374
- "epoch": 0.4566683964711988,
375
- "grad_norm": 21.754337046254733,
376
- "learning_rate": 4.7546684915478443e-07,
377
- "logits/chosen": -0.8742658495903015,
378
- "logits/rejected": 1.3033313751220703,
379
- "logps/chosen": -670.0702514648438,
380
- "logps/rejected": -1371.7958984375,
381
- "loss": 0.1675,
382
- "rewards/accuracies": 0.9156249761581421,
383
- "rewards/chosen": -3.048536777496338,
384
- "rewards/margins": 6.837308406829834,
385
- "rewards/rejected": -9.885846138000488,
386
- "step": 220
387
- },
388
- {
389
- "epoch": 0.4774260508562532,
390
- "grad_norm": 27.016825262219978,
391
- "learning_rate": 4.7139645599771953e-07,
392
- "logits/chosen": -1.2103874683380127,
393
- "logits/rejected": 1.2142726182937622,
394
- "logps/chosen": -635.36376953125,
395
- "logps/rejected": -1365.181640625,
396
- "loss": 0.1733,
397
- "rewards/accuracies": 0.921875,
398
- "rewards/chosen": -2.743067502975464,
399
- "rewards/margins": 7.142025947570801,
400
- "rewards/rejected": -9.885092735290527,
401
- "step": 230
402
- },
403
- {
404
- "epoch": 0.49818370524130773,
405
- "grad_norm": 14.606776233016102,
406
- "learning_rate": 4.6703405772774325e-07,
407
- "logits/chosen": -0.6876879930496216,
408
- "logits/rejected": 1.9533984661102295,
409
- "logps/chosen": -648.3201904296875,
410
- "logps/rejected": -1378.4886474609375,
411
- "loss": 0.1624,
412
- "rewards/accuracies": 0.9312499761581421,
413
- "rewards/chosen": -2.901624917984009,
414
- "rewards/margins": 7.063530921936035,
415
- "rewards/rejected": -9.965155601501465,
416
- "step": 240
417
- },
418
- {
419
- "epoch": 0.5189413596263622,
420
- "grad_norm": 14.879180241164281,
421
- "learning_rate": 4.6238540801689896e-07,
422
- "logits/chosen": -1.068670630455017,
423
- "logits/rejected": 1.5525070428848267,
424
- "logps/chosen": -595.411376953125,
425
- "logps/rejected": -1218.922119140625,
426
- "loss": 0.1618,
427
- "rewards/accuracies": 0.9281250238418579,
428
- "rewards/chosen": -2.1561543941497803,
429
- "rewards/margins": 6.024182319641113,
430
- "rewards/rejected": -8.180335998535156,
431
- "step": 250
432
- },
433
- {
434
- "epoch": 0.5396990140114167,
435
- "grad_norm": 21.085080794483027,
436
- "learning_rate": 4.5745663808114316e-07,
437
- "logits/chosen": -1.2082252502441406,
438
- "logits/rejected": 1.332776427268982,
439
- "logps/chosen": -641.9990844726562,
440
- "logps/rejected": -1324.9752197265625,
441
- "loss": 0.164,
442
- "rewards/accuracies": 0.925000011920929,
443
- "rewards/chosen": -2.6882810592651367,
444
- "rewards/margins": 6.606905460357666,
445
- "rewards/rejected": -9.295186042785645,
446
- "step": 260
447
- },
448
- {
449
- "epoch": 0.5604566683964712,
450
- "grad_norm": 14.503555786921384,
451
- "learning_rate": 4.5225424859373684e-07,
452
- "logits/chosen": -0.5242375731468201,
453
- "logits/rejected": 2.3043034076690674,
454
- "logps/chosen": -642.212890625,
455
- "logps/rejected": -1466.6038818359375,
456
- "loss": 0.1535,
457
- "rewards/accuracies": 0.949999988079071,
458
- "rewards/chosen": -2.8858113288879395,
459
- "rewards/margins": 7.929835319519043,
460
- "rewards/rejected": -10.81564712524414,
461
- "step": 270
462
- },
463
- {
464
- "epoch": 0.5812143227815257,
465
- "grad_norm": 14.500876590205559,
466
- "learning_rate": 4.467851011113515e-07,
467
- "logits/chosen": -0.34295958280563354,
468
- "logits/rejected": 2.470365047454834,
469
- "logps/chosen": -626.3866577148438,
470
- "logps/rejected": -1347.487060546875,
471
- "loss": 0.1557,
472
- "rewards/accuracies": 0.921875,
473
- "rewards/chosen": -2.5040247440338135,
474
- "rewards/margins": 6.9828643798828125,
475
- "rewards/rejected": -9.486889839172363,
476
- "step": 280
477
- },
478
- {
479
- "epoch": 0.6019719771665801,
480
- "grad_norm": 21.11138081532862,
481
- "learning_rate": 4.410564090241966e-07,
482
- "logits/chosen": -0.4226152002811432,
483
- "logits/rejected": 2.2370669841766357,
484
- "logps/chosen": -715.9142456054688,
485
- "logps/rejected": -1435.792724609375,
486
- "loss": 0.166,
487
- "rewards/accuracies": 0.9125000238418579,
488
- "rewards/chosen": -3.411532163619995,
489
- "rewards/margins": 7.1244330406188965,
490
- "rewards/rejected": -10.535964012145996,
491
- "step": 290
492
- },
493
- {
494
- "epoch": 0.6227296315516346,
495
- "grad_norm": 12.837411656046333,
496
- "learning_rate": 4.35075728042106e-07,
497
- "logits/chosen": -0.646745502948761,
498
- "logits/rejected": 1.6430190801620483,
499
- "logps/chosen": -651.37060546875,
500
- "logps/rejected": -1312.945068359375,
501
- "loss": 0.1506,
502
- "rewards/accuracies": 0.9468749761581421,
503
- "rewards/chosen": -2.753232717514038,
504
- "rewards/margins": 6.365309715270996,
505
- "rewards/rejected": -9.118542671203613,
506
- "step": 300
507
- },
508
- {
509
- "epoch": 0.6227296315516346,
510
- "eval_logits/chosen": -0.4686531722545624,
511
- "eval_logits/rejected": 1.6434051990509033,
512
- "eval_logps/chosen": -738.97119140625,
513
- "eval_logps/rejected": -1418.271728515625,
514
- "eval_loss": 0.16402015089988708,
515
- "eval_rewards/accuracies": 0.9215686321258545,
516
- "eval_rewards/chosen": -3.1556289196014404,
517
- "eval_rewards/margins": 6.567685127258301,
518
- "eval_rewards/rejected": -9.72331428527832,
519
- "eval_runtime": 296.3154,
520
- "eval_samples_per_second": 21.906,
521
- "eval_steps_per_second": 0.344,
522
- "step": 300
523
- },
524
- {
525
- "epoch": 0.6434872859366891,
526
- "grad_norm": 12.299461167843033,
527
- "learning_rate": 4.2885094622913016e-07,
528
- "logits/chosen": -0.22457298636436462,
529
- "logits/rejected": 2.304126024246216,
530
- "logps/chosen": -696.1895141601562,
531
- "logps/rejected": -1399.352783203125,
532
- "loss": 0.1538,
533
- "rewards/accuracies": 0.903124988079071,
534
- "rewards/chosen": -3.2637131214141846,
535
- "rewards/margins": 6.956628322601318,
536
- "rewards/rejected": -10.220341682434082,
537
- "step": 310
538
- },
539
- {
540
- "epoch": 0.6642449403217436,
541
- "grad_norm": 13.016357165526102,
542
- "learning_rate": 4.223902735997788e-07,
543
- "logits/chosen": -0.8427863121032715,
544
- "logits/rejected": 1.8859798908233643,
545
- "logps/chosen": -618.1915283203125,
546
- "logps/rejected": -1367.597412109375,
547
- "loss": 0.1525,
548
- "rewards/accuracies": 0.9281250238418579,
549
- "rewards/chosen": -2.5306873321533203,
550
- "rewards/margins": 7.276186466217041,
551
- "rewards/rejected": -9.80687427520752,
552
- "step": 320
553
- },
554
- {
555
- "epoch": 0.6850025947067981,
556
- "grad_norm": 11.40441988789802,
557
- "learning_rate": 4.157022312906352e-07,
558
- "logits/chosen": -0.8463101387023926,
559
- "logits/rejected": 2.104464292526245,
560
- "logps/chosen": -682.7887573242188,
561
- "logps/rejected": -1518.4447021484375,
562
- "loss": 0.1436,
563
- "rewards/accuracies": 0.9156249761581421,
564
- "rewards/chosen": -3.10284423828125,
565
- "rewards/margins": 8.061182022094727,
566
- "rewards/rejected": -11.164026260375977,
567
- "step": 330
568
- },
569
- {
570
- "epoch": 0.7057602490918526,
571
- "grad_norm": 12.82722167792133,
572
- "learning_rate": 4.0879564032162425e-07,
573
- "logits/chosen": -0.18682530522346497,
574
- "logits/rejected": 3.2691047191619873,
575
- "logps/chosen": -832.2535400390625,
576
- "logps/rejected": -1820.017333984375,
577
- "loss": 0.1461,
578
- "rewards/accuracies": 0.921875,
579
- "rewards/chosen": -4.5879225730896,
580
- "rewards/margins": 9.579926490783691,
581
- "rewards/rejected": -14.16784954071045,
582
- "step": 340
583
- },
584
- {
585
- "epoch": 0.7265179034769071,
586
- "grad_norm": 16.44365591813322,
587
- "learning_rate": 4.016796099617569e-07,
588
- "logits/chosen": -0.138845294713974,
589
- "logits/rejected": 3.01670503616333,
590
- "logps/chosen": -769.3917236328125,
591
- "logps/rejected": -1580.634765625,
592
- "loss": 0.1552,
593
- "rewards/accuracies": 0.8843749761581421,
594
- "rewards/chosen": -4.0832200050354,
595
- "rewards/margins": 7.8098883628845215,
596
- "rewards/rejected": -11.893107414245605,
597
- "step": 350
598
- },
599
- {
600
- "epoch": 0.7472755578619616,
601
- "grad_norm": 19.24789069287223,
602
- "learning_rate": 3.9436352571469577e-07,
603
- "logits/chosen": 1.5992153882980347,
604
- "logits/rejected": 5.1132683753967285,
605
- "logps/chosen": -974.8264770507812,
606
- "logps/rejected": -1869.984375,
607
- "loss": 0.1447,
608
- "rewards/accuracies": 0.9281250238418579,
609
- "rewards/chosen": -6.103518962860107,
610
- "rewards/margins": 8.862874031066895,
611
- "rewards/rejected": -14.966394424438477,
612
- "step": 360
613
- },
614
- {
615
- "epoch": 0.768033212247016,
616
- "grad_norm": 14.764411673021232,
617
- "learning_rate": 3.868570369399893e-07,
618
- "logits/chosen": 0.504949152469635,
619
- "logits/rejected": 4.207425117492676,
620
- "logps/chosen": -842.4241943359375,
621
- "logps/rejected": -1723.845703125,
622
- "loss": 0.1461,
623
- "rewards/accuracies": 0.9125000238418579,
624
- "rewards/chosen": -4.756136894226074,
625
- "rewards/margins": 8.685012817382812,
626
- "rewards/rejected": -13.441149711608887,
627
- "step": 370
628
- },
629
- {
630
- "epoch": 0.7887908666320705,
631
- "grad_norm": 11.306971906638802,
632
- "learning_rate": 3.791700441262987e-07,
633
- "logits/chosen": 1.6941606998443604,
634
- "logits/rejected": 4.910915374755859,
635
- "logps/chosen": -991.244140625,
636
- "logps/rejected": -1979.561279296875,
637
- "loss": 0.1394,
638
- "rewards/accuracies": 0.925000011920929,
639
- "rewards/chosen": -6.388211727142334,
640
- "rewards/margins": 9.29551887512207,
641
- "rewards/rejected": -15.683731079101562,
642
- "step": 380
643
- },
644
- {
645
- "epoch": 0.809548521017125,
646
- "grad_norm": 13.65730703963766,
647
- "learning_rate": 3.7131268583340515e-07,
648
- "logits/chosen": 1.5387306213378906,
649
- "logits/rejected": 4.950864315032959,
650
- "logps/chosen": -1013.9693603515625,
651
- "logps/rejected": -1817.079345703125,
652
- "loss": 0.1518,
653
- "rewards/accuracies": 0.9468749761581421,
654
- "rewards/chosen": -6.383057117462158,
655
- "rewards/margins": 7.826146125793457,
656
- "rewards/rejected": -14.209203720092773,
657
- "step": 390
658
- },
659
- {
660
- "epoch": 0.8303061754021795,
661
- "grad_norm": 14.179745096678028,
662
- "learning_rate": 3.632953253202198e-07,
663
- "logits/chosen": 1.0768206119537354,
664
- "logits/rejected": 4.270804405212402,
665
- "logps/chosen": -915.9781494140625,
666
- "logps/rejected": -1812.755859375,
667
- "loss": 0.1426,
668
- "rewards/accuracies": 0.9281250238418579,
669
- "rewards/chosen": -5.507952690124512,
670
- "rewards/margins": 8.6758394241333,
671
- "rewards/rejected": -14.183792114257812,
672
- "step": 400
673
- },
674
- {
675
- "epoch": 0.8303061754021795,
676
- "eval_logits/chosen": 0.9505479335784912,
677
- "eval_logits/rejected": 3.6792540550231934,
678
- "eval_logps/chosen": -991.3616943359375,
679
- "eval_logps/rejected": -1870.2236328125,
680
- "eval_loss": 0.15229202806949615,
681
- "eval_rewards/accuracies": 0.9301470518112183,
682
- "eval_rewards/chosen": -5.6795334815979,
683
- "eval_rewards/margins": 8.563300132751465,
684
- "eval_rewards/rejected": -14.242834091186523,
685
- "eval_runtime": 298.0288,
686
- "eval_samples_per_second": 21.78,
687
- "eval_steps_per_second": 0.342,
688
- "step": 400
689
- },
690
- {
691
- "epoch": 0.851063829787234,
692
- "grad_norm": 14.225050299873093,
693
- "learning_rate": 3.551285368764321e-07,
694
- "logits/chosen": 1.3883212804794312,
695
- "logits/rejected": 4.056710243225098,
696
- "logps/chosen": -921.244140625,
697
- "logps/rejected": -1725.069091796875,
698
- "loss": 0.1414,
699
- "rewards/accuracies": 0.921875,
700
- "rewards/chosen": -5.557861328125,
701
- "rewards/margins": 7.890999794006348,
702
- "rewards/rejected": -13.448862075805664,
703
- "step": 410
704
- },
705
- {
706
- "epoch": 0.8718214841722886,
707
- "grad_norm": 21.908810437787125,
708
- "learning_rate": 3.468230918758242e-07,
709
- "logits/chosen": 1.0017262697219849,
710
- "logits/rejected": 4.086182594299316,
711
- "logps/chosen": -934.9280395507812,
712
- "logps/rejected": -1826.6614990234375,
713
- "loss": 0.1487,
714
- "rewards/accuracies": 0.9281250238418579,
715
- "rewards/chosen": -5.608671188354492,
716
- "rewards/margins": 8.596394538879395,
717
- "rewards/rejected": -14.205065727233887,
718
- "step": 420
719
- },
720
- {
721
- "epoch": 0.892579138557343,
722
- "grad_norm": 14.422793954218163,
723
- "learning_rate": 3.383899445696477e-07,
724
- "logits/chosen": 0.6862390637397766,
725
- "logits/rejected": 3.530297040939331,
726
- "logps/chosen": -932.3873291015625,
727
- "logps/rejected": -1817.0745849609375,
728
- "loss": 0.129,
729
- "rewards/accuracies": 0.921875,
730
- "rewards/chosen": -5.5939507484436035,
731
- "rewards/margins": 8.62092399597168,
732
- "rewards/rejected": -14.214874267578125,
733
- "step": 430
734
- },
735
- {
736
- "epoch": 0.9133367929423976,
737
- "grad_norm": 17.011117159702987,
738
- "learning_rate": 3.2984021763879756e-07,
739
- "logits/chosen": 1.3755428791046143,
740
- "logits/rejected": 4.330183029174805,
741
- "logps/chosen": -1071.127197265625,
742
- "logps/rejected": -2124.4716796875,
743
- "loss": 0.1488,
744
- "rewards/accuracies": 0.9281250238418579,
745
- "rewards/chosen": -6.959136009216309,
746
- "rewards/margins": 10.387044906616211,
747
- "rewards/rejected": -17.346179962158203,
748
- "step": 440
749
- },
750
- {
751
- "epoch": 0.934094447327452,
752
- "grad_norm": 14.158345398063162,
753
- "learning_rate": 3.211851875238408e-07,
754
- "logits/chosen": 1.2293100357055664,
755
- "logits/rejected": 4.549264430999756,
756
- "logps/chosen": -1061.044189453125,
757
- "logps/rejected": -2061.3359375,
758
- "loss": 0.1387,
759
- "rewards/accuracies": 0.940625011920929,
760
- "rewards/chosen": -6.939100742340088,
761
- "rewards/margins": 9.909866333007812,
762
- "rewards/rejected": -16.848966598510742,
763
- "step": 450
764
- },
765
- {
766
- "epoch": 0.9548521017125065,
767
- "grad_norm": 15.78022665142005,
768
- "learning_rate": 3.124362695522476e-07,
769
- "logits/chosen": 2.06438946723938,
770
- "logits/rejected": 4.793455123901367,
771
- "logps/chosen": -1161.7301025390625,
772
- "logps/rejected": -2112.584228515625,
773
- "loss": 0.1387,
774
- "rewards/accuracies": 0.949999988079071,
775
- "rewards/chosen": -7.919003963470459,
776
- "rewards/margins": 9.370214462280273,
777
- "rewards/rejected": -17.289216995239258,
778
- "step": 460
779
- },
780
- {
781
- "epoch": 0.975609756097561,
782
- "grad_norm": 14.672366285416803,
783
- "learning_rate": 3.036050028824415e-07,
784
- "logits/chosen": 1.6745359897613525,
785
- "logits/rejected": 4.174818515777588,
786
- "logps/chosen": -1067.4365234375,
787
- "logps/rejected": -1914.353515625,
788
- "loss": 0.1267,
789
- "rewards/accuracies": 0.9312499761581421,
790
- "rewards/chosen": -6.962777137756348,
791
- "rewards/margins": 8.22961711883545,
792
- "rewards/rejected": -15.192395210266113,
793
- "step": 470
794
- },
795
- {
796
- "epoch": 0.9963674104826155,
797
- "grad_norm": 10.30547100028778,
798
- "learning_rate": 2.9470303528452547e-07,
799
- "logits/chosen": 2.291341543197632,
800
- "logits/rejected": 4.340534687042236,
801
- "logps/chosen": -1134.796630859375,
802
- "logps/rejected": -2049.962890625,
803
- "loss": 0.1409,
804
- "rewards/accuracies": 0.925000011920929,
805
- "rewards/chosen": -7.828709602355957,
806
- "rewards/margins": 8.853391647338867,
807
- "rewards/rejected": -16.68210220336914,
808
- "step": 480
809
- },
810
- {
811
- "epoch": 1.01712506486767,
812
- "grad_norm": 15.21191565767335,
813
- "learning_rate": 2.8574210777775755e-07,
814
- "logits/chosen": 2.009263277053833,
815
- "logits/rejected": 4.9940924644470215,
816
- "logps/chosen": -1182.762939453125,
817
- "logps/rejected": -2222.476318359375,
818
- "loss": 0.0933,
819
- "rewards/accuracies": 0.949999988079071,
820
- "rewards/chosen": -8.175220489501953,
821
- "rewards/margins": 10.224291801452637,
822
- "rewards/rejected": -18.399513244628906,
823
- "step": 490
824
- },
825
- {
826
- "epoch": 1.0378827192527245,
827
- "grad_norm": 15.585906686341703,
828
- "learning_rate": 2.767340391450384e-07,
829
- "logits/chosen": 1.0047972202301025,
830
- "logits/rejected": 4.8316521644592285,
831
- "logps/chosen": -1162.712646484375,
832
- "logps/rejected": -2610.682373046875,
833
- "loss": 0.0881,
834
- "rewards/accuracies": 0.949999988079071,
835
- "rewards/chosen": -7.991292476654053,
836
- "rewards/margins": 14.244958877563477,
837
- "rewards/rejected": -22.236251831054688,
838
- "step": 500
839
- },
840
- {
841
- "epoch": 1.0378827192527245,
842
- "eval_logits/chosen": 1.2345364093780518,
843
- "eval_logits/rejected": 4.526747703552246,
844
- "eval_logps/chosen": -1257.1658935546875,
845
- "eval_logps/rejected": -2547.474853515625,
846
- "eval_loss": 0.15922115743160248,
847
- "eval_rewards/accuracies": 0.9313725233078003,
848
- "eval_rewards/chosen": -8.337576866149902,
849
- "eval_rewards/margins": 12.677770614624023,
850
- "eval_rewards/rejected": -21.01534652709961,
851
- "eval_runtime": 297.9014,
852
- "eval_samples_per_second": 21.789,
853
- "eval_steps_per_second": 0.342,
854
- "step": 500
855
- },
856
- {
857
- "epoch": 1.058640373637779,
858
- "grad_norm": 14.041828775113027,
859
- "learning_rate": 2.6769071034483407e-07,
860
- "logits/chosen": 1.5315182209014893,
861
- "logits/rejected": 5.140109062194824,
862
- "logps/chosen": -1049.8817138671875,
863
- "logps/rejected": -2273.5029296875,
864
- "loss": 0.0856,
865
- "rewards/accuracies": 0.9750000238418579,
866
- "rewards/chosen": -6.837597846984863,
867
- "rewards/margins": 11.946271896362305,
868
- "rewards/rejected": -18.783870697021484,
869
- "step": 510
870
- },
871
- {
872
- "epoch": 1.0793980280228335,
873
- "grad_norm": 18.45189270725633,
874
- "learning_rate": 2.5862404884109365e-07,
875
- "logits/chosen": 1.529955267906189,
876
- "logits/rejected": 4.867814064025879,
877
- "logps/chosen": -1038.77001953125,
878
- "logps/rejected": -2317.119384765625,
879
- "loss": 0.0826,
880
- "rewards/accuracies": 0.949999988079071,
881
- "rewards/chosen": -6.80930233001709,
882
- "rewards/margins": 12.373517990112305,
883
- "rewards/rejected": -19.18282127380371,
884
- "step": 520
885
- },
886
- {
887
- "epoch": 1.100155682407888,
888
- "grad_norm": 13.857547492254254,
889
- "learning_rate": 2.495460128718305e-07,
890
- "logits/chosen": 0.9307753443717957,
891
- "logits/rejected": 4.540700435638428,
892
- "logps/chosen": -1109.2952880859375,
893
- "logps/rejected": -2351.9580078125,
894
- "loss": 0.0791,
895
- "rewards/accuracies": 0.9593750238418579,
896
- "rewards/chosen": -7.225480556488037,
897
- "rewards/margins": 12.243474006652832,
898
- "rewards/rejected": -19.468952178955078,
899
- "step": 530
900
- },
901
- {
902
- "epoch": 1.1209133367929425,
903
- "grad_norm": 17.715699549300524,
904
- "learning_rate": 2.404685756771143e-07,
905
- "logits/chosen": 0.9044798612594604,
906
- "logits/rejected": 4.5168304443359375,
907
- "logps/chosen": -1108.01416015625,
908
- "logps/rejected": -2464.59375,
909
- "loss": 0.0872,
910
- "rewards/accuracies": 0.9750000238418579,
911
- "rewards/chosen": -7.498801231384277,
912
- "rewards/margins": 13.343009948730469,
913
- "rewards/rejected": -20.841812133789062,
914
- "step": 540
915
- },
916
- {
917
- "epoch": 1.141670991177997,
918
- "grad_norm": 11.817665551721014,
919
- "learning_rate": 2.314037097072764e-07,
920
- "logits/chosen": 1.3692632913589478,
921
- "logits/rejected": 4.569345474243164,
922
- "logps/chosen": -1150.0703125,
923
- "logps/rejected": -2468.03271484375,
924
- "loss": 0.0754,
925
- "rewards/accuracies": 0.9750000238418579,
926
- "rewards/chosen": -7.868742942810059,
927
- "rewards/margins": 13.00172233581543,
928
- "rewards/rejected": -20.870466232299805,
929
- "step": 550
930
- },
931
- {
932
- "epoch": 1.1624286455630513,
933
- "grad_norm": 13.98618450585046,
934
- "learning_rate": 2.2236337083215723e-07,
935
- "logits/chosen": 1.508049488067627,
936
- "logits/rejected": 5.1392316818237305,
937
- "logps/chosen": -1217.515625,
938
- "logps/rejected": -2590.57421875,
939
- "loss": 0.0853,
940
- "rewards/accuracies": 0.9624999761581421,
941
- "rewards/chosen": -8.472477912902832,
942
- "rewards/margins": 13.660308837890625,
943
- "rewards/rejected": -22.13278579711914,
944
- "step": 560
945
- },
946
- {
947
- "epoch": 1.183186299948106,
948
- "grad_norm": 9.239685997581025,
949
- "learning_rate": 2.13359482572222e-07,
950
- "logits/chosen": 0.9549428820610046,
951
- "logits/rejected": 4.445387363433838,
952
- "logps/chosen": -1040.99267578125,
953
- "logps/rejected": -2247.80615234375,
954
- "loss": 0.0847,
955
- "rewards/accuracies": 0.9468749761581421,
956
- "rewards/chosen": -6.785864353179932,
957
- "rewards/margins": 11.840021133422852,
958
- "rewards/rejected": -18.62588882446289,
959
- "step": 570
960
- },
961
- {
962
- "epoch": 1.2039439543331603,
963
- "grad_norm": 11.315109261026508,
964
- "learning_rate": 2.044039203723423e-07,
965
- "logits/chosen": 0.5468926429748535,
966
- "logits/rejected": 3.945660352706909,
967
- "logps/chosen": -1037.985595703125,
968
- "logps/rejected": -2178.81103515625,
969
- "loss": 0.0889,
970
- "rewards/accuracies": 0.953125,
971
- "rewards/chosen": -6.750271797180176,
972
- "rewards/margins": 11.077262878417969,
973
- "rewards/rejected": -17.827533721923828,
974
- "step": 580
975
- },
976
- {
977
- "epoch": 1.2247016087182148,
978
- "grad_norm": 20.88556623186479,
979
- "learning_rate": 1.955084959389864e-07,
980
- "logits/chosen": 1.7732290029525757,
981
- "logits/rejected": 4.805976867675781,
982
- "logps/chosen": -1254.5054931640625,
983
- "logps/rejected": -2677.98974609375,
984
- "loss": 0.0704,
985
- "rewards/accuracies": 0.981249988079071,
986
- "rewards/chosen": -9.03892993927002,
987
- "rewards/margins": 13.861906051635742,
988
- "rewards/rejected": -22.900836944580078,
989
- "step": 590
990
- },
991
- {
992
- "epoch": 1.2454592631032693,
993
- "grad_norm": 19.615657624738002,
994
- "learning_rate": 1.866849416614753e-07,
995
- "logits/chosen": 1.4045512676239014,
996
- "logits/rejected": 4.72921895980835,
997
- "logps/chosen": -1266.108154296875,
998
- "logps/rejected": -2709.53857421875,
999
- "loss": 0.0774,
1000
- "rewards/accuracies": 0.9781249761581421,
1001
- "rewards/chosen": -8.943242073059082,
1002
- "rewards/margins": 14.16535758972168,
1003
- "rewards/rejected": -23.108600616455078,
1004
- "step": 600
1005
- },
1006
- {
1007
- "epoch": 1.2454592631032693,
1008
- "eval_logits/chosen": 1.1427009105682373,
1009
- "eval_logits/rejected": 4.037267208099365,
1010
- "eval_logps/chosen": -1277.8565673828125,
1011
- "eval_logps/rejected": -2513.811279296875,
1012
- "eval_loss": 0.15603666007518768,
1013
- "eval_rewards/accuracies": 0.9325980544090271,
1014
- "eval_rewards/chosen": -8.544482231140137,
1015
- "eval_rewards/margins": 12.134225845336914,
1016
- "eval_rewards/rejected": -20.6787109375,
1017
- "eval_runtime": 298.0443,
1018
- "eval_samples_per_second": 21.779,
1019
- "eval_steps_per_second": 0.342,
1020
- "step": 600
1021
- },
1022
- {
1023
- "epoch": 1.2662169174883238,
1024
- "grad_norm": 20.13025478182326,
1025
- "learning_rate": 1.7794489513785227e-07,
1026
- "logits/chosen": 1.091370940208435,
1027
- "logits/rejected": 4.404737949371338,
1028
- "logps/chosen": -1142.038330078125,
1029
- "logps/rejected": -2377.242919921875,
1030
- "loss": 0.0835,
1031
- "rewards/accuracies": 0.971875011920929,
1032
- "rewards/chosen": -7.555553436279297,
1033
- "rewards/margins": 12.173652648925781,
1034
- "rewards/rejected": -19.729206085205078,
1035
- "step": 610
1036
- },
1037
- {
1038
- "epoch": 1.2869745718733783,
1039
- "grad_norm": 10.83536532630756,
1040
- "learning_rate": 1.692998838257744e-07,
1041
- "logits/chosen": 1.4595911502838135,
1042
- "logits/rejected": 4.403286457061768,
1043
- "logps/chosen": -1153.688720703125,
1044
- "logps/rejected": -2336.400634765625,
1045
- "loss": 0.0792,
1046
- "rewards/accuracies": 0.9781249761581421,
1047
- "rewards/chosen": -7.824929714202881,
1048
- "rewards/margins": 11.501016616821289,
1049
- "rewards/rejected": -19.32594871520996,
1050
- "step": 620
1051
- },
1052
- {
1053
- "epoch": 1.3077322262584328,
1054
- "grad_norm": 20.828438171756602,
1055
- "learning_rate": 1.6076130983867191e-07,
1056
- "logits/chosen": 1.5300322771072388,
1057
- "logits/rejected": 4.811184883117676,
1058
- "logps/chosen": -1134.029296875,
1059
- "logps/rejected": -2562.76025390625,
1060
- "loss": 0.076,
1061
- "rewards/accuracies": 0.9750000238418579,
1062
- "rewards/chosen": -7.6294426918029785,
1063
- "rewards/margins": 13.96863079071045,
1064
- "rewards/rejected": -21.59807014465332,
1065
- "step": 630
1066
- },
1067
- {
1068
- "epoch": 1.3284898806434873,
1069
- "grad_norm": 18.16265960764559,
1070
- "learning_rate": 1.5234043490722587e-07,
1071
- "logits/chosen": 0.9964359402656555,
1072
- "logits/rejected": 4.313546657562256,
1073
- "logps/chosen": -1163.1513671875,
1074
- "logps/rejected": -2395.21923828125,
1075
- "loss": 0.0823,
1076
- "rewards/accuracies": 0.956250011920929,
1077
- "rewards/chosen": -8.007986068725586,
1078
- "rewards/margins": 12.177099227905273,
1079
- "rewards/rejected": -20.185087203979492,
1080
- "step": 640
1081
- },
1082
- {
1083
- "epoch": 1.3492475350285418,
1084
- "grad_norm": 20.314345446941065,
1085
- "learning_rate": 1.44048365526001e-07,
1086
- "logits/chosen": 1.159566879272461,
1087
- "logits/rejected": 4.513047695159912,
1088
- "logps/chosen": -1154.093017578125,
1089
- "logps/rejected": -2576.34326171875,
1090
- "loss": 0.0833,
1091
- "rewards/accuracies": 0.971875011920929,
1092
- "rewards/chosen": -7.905519962310791,
1093
- "rewards/margins": 13.832204818725586,
1094
- "rewards/rejected": -21.73772430419922,
1095
- "step": 650
1096
- },
1097
- {
1098
- "epoch": 1.3700051894135963,
1099
- "grad_norm": 16.284929311743372,
1100
- "learning_rate": 1.3589603830482243e-07,
1101
- "logits/chosen": 0.8507378697395325,
1102
- "logits/rejected": 4.674212455749512,
1103
- "logps/chosen": -1195.9324951171875,
1104
- "logps/rejected": -2728.91748046875,
1105
- "loss": 0.0711,
1106
- "rewards/accuracies": 0.96875,
1107
- "rewards/chosen": -8.17365550994873,
1108
- "rewards/margins": 15.162320137023926,
1109
- "rewards/rejected": -23.335979461669922,
1110
- "step": 660
1111
- },
1112
- {
1113
- "epoch": 1.3907628437986508,
1114
- "grad_norm": 23.91471168720134,
1115
- "learning_rate": 1.2789420554421821e-07,
1116
- "logits/chosen": 0.8655555844306946,
1117
- "logits/rejected": 4.348161220550537,
1118
- "logps/chosen": -1252.838623046875,
1119
- "logps/rejected": -2602.3056640625,
1120
- "loss": 0.0839,
1121
- "rewards/accuracies": 0.9624999761581421,
1122
- "rewards/chosen": -8.924249649047852,
1123
- "rewards/margins": 13.221219062805176,
1124
- "rewards/rejected": -22.14546775817871,
1125
- "step": 670
1126
- },
1127
- {
1128
- "epoch": 1.4115204981837053,
1129
- "grad_norm": 15.367814675454188,
1130
- "learning_rate": 1.200534210539509e-07,
1131
- "logits/chosen": 0.17712223529815674,
1132
- "logits/rejected": 3.673604965209961,
1133
- "logps/chosen": -1158.31640625,
1134
- "logps/rejected": -2521.94482421875,
1135
- "loss": 0.0877,
1136
- "rewards/accuracies": 0.965624988079071,
1137
- "rewards/chosen": -7.891541481018066,
1138
- "rewards/margins": 13.315177917480469,
1139
- "rewards/rejected": -21.20671844482422,
1140
- "step": 680
1141
- },
1142
- {
1143
- "epoch": 1.4322781525687598,
1144
- "grad_norm": 22.079625577964986,
1145
- "learning_rate": 1.1238402623334492e-07,
1146
- "logits/chosen": 0.07270300388336182,
1147
- "logits/rejected": 3.5430798530578613,
1148
- "logps/chosen": -1109.5074462890625,
1149
- "logps/rejected": -2462.455078125,
1150
- "loss": 0.0899,
1151
- "rewards/accuracies": 0.953125,
1152
- "rewards/chosen": -7.3517746925354,
1153
- "rewards/margins": 13.166239738464355,
1154
- "rewards/rejected": -20.51801109313965,
1155
- "step": 690
1156
- },
1157
- {
1158
- "epoch": 1.4530358069538143,
1159
- "grad_norm": 11.289239524989192,
1160
- "learning_rate": 1.0489613643176479e-07,
1161
- "logits/chosen": 0.2879738509654999,
1162
- "logits/rejected": 3.8537840843200684,
1163
- "logps/chosen": -1176.748779296875,
1164
- "logps/rejected": -2515.61083984375,
1165
- "loss": 0.0747,
1166
- "rewards/accuracies": 0.9624999761581421,
1167
- "rewards/chosen": -7.953884124755859,
1168
- "rewards/margins": 13.24780559539795,
1169
- "rewards/rejected": -21.20168685913086,
1170
- "step": 700
1171
- },
1172
- {
1173
- "epoch": 1.4530358069538143,
1174
- "eval_logits/chosen": 0.4988805651664734,
1175
- "eval_logits/rejected": 3.581174612045288,
1176
- "eval_logps/chosen": -1308.12939453125,
1177
- "eval_logps/rejected": -2572.467529296875,
1178
- "eval_loss": 0.15788544714450836,
1179
- "eval_rewards/accuracies": 0.9276960492134094,
1180
- "eval_rewards/chosen": -8.847211837768555,
1181
- "eval_rewards/margins": 12.418061256408691,
1182
- "eval_rewards/rejected": -21.265270233154297,
1183
- "eval_runtime": 296.7811,
1184
- "eval_samples_per_second": 21.871,
1185
- "eval_steps_per_second": 0.344,
1186
- "step": 700
1187
- },
1188
- {
1189
- "epoch": 1.4737934613388686,
1190
- "grad_norm": 16.700822811969054,
1191
- "learning_rate": 9.759962760723855e-08,
1192
- "logits/chosen": 0.7035635113716125,
1193
- "logits/rejected": 4.354310035705566,
1194
- "logps/chosen": -1203.5103759765625,
1195
- "logps/rejected": -2514.58154296875,
1196
- "loss": 0.0671,
1197
- "rewards/accuracies": 0.96875,
1198
- "rewards/chosen": -8.514452934265137,
1199
- "rewards/margins": 12.95274543762207,
1200
- "rewards/rejected": -21.46719741821289,
1201
- "step": 710
1202
- },
1203
- {
1204
- "epoch": 1.4945511157239233,
1205
- "grad_norm": 20.004128089517163,
1206
- "learning_rate": 9.050412330081883e-08,
1207
- "logits/chosen": 0.296795517206192,
1208
- "logits/rejected": 4.032415390014648,
1209
- "logps/chosen": -1271.940185546875,
1210
- "logps/rejected": -2761.74365234375,
1211
- "loss": 0.0861,
1212
- "rewards/accuracies": 0.9624999761581421,
1213
- "rewards/chosen": -8.98901653289795,
1214
- "rewards/margins": 14.629063606262207,
1215
- "rewards/rejected": -23.61808204650879,
1216
- "step": 720
1217
- },
1218
- {
1219
- "epoch": 1.5153087701089776,
1220
- "grad_norm": 18.269599252001203,
1221
- "learning_rate": 8.36189819438625e-08,
1222
- "logits/chosen": 0.12335433810949326,
1223
- "logits/rejected": 3.770864963531494,
1224
- "logps/chosen": -1231.0367431640625,
1225
- "logps/rejected": -2575.371826171875,
1226
- "loss": 0.0842,
1227
- "rewards/accuracies": 0.9593750238418579,
1228
- "rewards/chosen": -8.523446083068848,
1229
- "rewards/margins": 13.358665466308594,
1230
- "rewards/rejected": -21.882112503051758,
1231
- "step": 730
1232
- },
1233
- {
1234
- "epoch": 1.5360664244940323,
1235
- "grad_norm": 21.887817722932024,
1236
- "learning_rate": 7.69532845149711e-08,
1237
- "logits/chosen": -0.06583809852600098,
1238
- "logits/rejected": 3.8187732696533203,
1239
- "logps/chosen": -1204.005615234375,
1240
- "logps/rejected": -2569.55615234375,
1241
- "loss": 0.0791,
1242
- "rewards/accuracies": 0.953125,
1243
- "rewards/chosen": -8.176721572875977,
1244
- "rewards/margins": 13.470640182495117,
1245
- "rewards/rejected": -21.647363662719727,
1246
- "step": 740
1247
- },
1248
- {
1249
- "epoch": 1.5568240788790866,
1250
- "grad_norm": 20.92101563028829,
1251
- "learning_rate": 7.051582256286929e-08,
1252
- "logits/chosen": 0.0026878931093961,
1253
- "logits/rejected": 3.554405927658081,
1254
- "logps/chosen": -1194.286376953125,
1255
- "logps/rejected": -2548.453125,
1256
- "loss": 0.0711,
1257
- "rewards/accuracies": 0.9624999761581421,
1258
- "rewards/chosen": -8.033935546875,
1259
- "rewards/margins": 13.331436157226562,
1260
- "rewards/rejected": -21.365371704101562,
1261
- "step": 750
1262
- },
1263
- {
1264
- "epoch": 1.5775817332641413,
1265
- "grad_norm": 18.796351433954527,
1266
- "learning_rate": 6.431508661101954e-08,
1267
- "logits/chosen": 0.2032267153263092,
1268
- "logits/rejected": 3.864830493927002,
1269
- "logps/chosen": -1201.6220703125,
1270
- "logps/rejected": -2415.40673828125,
1271
- "loss": 0.0881,
1272
- "rewards/accuracies": 0.956250011920929,
1273
- "rewards/chosen": -8.291030883789062,
1274
- "rewards/margins": 12.069987297058105,
1275
- "rewards/rejected": -20.36101722717285,
1276
- "step": 760
1277
- },
1278
- {
1279
- "epoch": 1.5983393876491956,
1280
- "grad_norm": 18.9865340327329,
1281
- "learning_rate": 5.8359254959266826e-08,
1282
- "logits/chosen": 0.17426332831382751,
1283
- "logits/rejected": 3.7378311157226562,
1284
- "logps/chosen": -1135.3184814453125,
1285
- "logps/rejected": -2408.099365234375,
1286
- "loss": 0.0744,
1287
- "rewards/accuracies": 0.965624988079071,
1288
- "rewards/chosen": -7.850682258605957,
1289
- "rewards/margins": 12.461448669433594,
1290
- "rewards/rejected": -20.312129974365234,
1291
- "step": 770
1292
- },
1293
- {
1294
- "epoch": 1.61909704203425,
1295
- "grad_norm": 14.350034716496568,
1296
- "learning_rate": 5.265618289728199e-08,
1297
- "logits/chosen": 0.2724596858024597,
1298
- "logits/rejected": 3.901758909225464,
1299
- "logps/chosen": -1125.8365478515625,
1300
- "logps/rejected": -2402.55029296875,
1301
- "loss": 0.0758,
1302
- "rewards/accuracies": 0.9624999761581421,
1303
- "rewards/chosen": -7.693342685699463,
1304
- "rewards/margins": 12.453673362731934,
1305
- "rewards/rejected": -20.147018432617188,
1306
- "step": 780
1307
- },
1308
- {
1309
- "epoch": 1.6398546964193046,
1310
- "grad_norm": 16.05161662904945,
1311
- "learning_rate": 4.721339234403121e-08,
1312
- "logits/chosen": 0.07050670683383942,
1313
- "logits/rejected": 3.7205798625946045,
1314
- "logps/chosen": -1121.5814208984375,
1315
- "logps/rejected": -2443.16015625,
1316
- "loss": 0.0857,
1317
- "rewards/accuracies": 0.971875011920929,
1318
- "rewards/chosen": -7.602121829986572,
1319
- "rewards/margins": 12.968803405761719,
1320
- "rewards/rejected": -20.5709228515625,
1321
- "step": 790
1322
- },
1323
- {
1324
- "epoch": 1.660612350804359,
1325
- "grad_norm": 8.47571609534727,
1326
- "learning_rate": 4.203806192693587e-08,
1327
- "logits/chosen": 0.38345223665237427,
1328
- "logits/rejected": 3.6582157611846924,
1329
- "logps/chosen": -1135.015380859375,
1330
- "logps/rejected": -2251.6806640625,
1331
- "loss": 0.0811,
1332
- "rewards/accuracies": 0.965624988079071,
1333
- "rewards/chosen": -7.660782814025879,
1334
- "rewards/margins": 10.944924354553223,
1335
- "rewards/rejected": -18.6057071685791,
1336
- "step": 800
1337
- },
1338
- {
1339
- "epoch": 1.660612350804359,
1340
- "eval_logits/chosen": 0.2172178030014038,
1341
- "eval_logits/rejected": 3.254581928253174,
1342
- "eval_logps/chosen": -1261.5126953125,
1343
- "eval_logps/rejected": -2436.340576171875,
1344
- "eval_loss": 0.15446949005126953,
1345
- "eval_rewards/accuracies": 0.9289215803146362,
1346
- "eval_rewards/chosen": -8.381046295166016,
1347
- "eval_rewards/margins": 11.522958755493164,
1348
- "eval_rewards/rejected": -19.90400505065918,
1349
- "eval_runtime": 297.9226,
1350
- "eval_samples_per_second": 21.788,
1351
- "eval_steps_per_second": 0.342,
1352
- "step": 800
1353
- },
1354
- {
1355
- "epoch": 1.6813700051894136,
1356
- "grad_norm": 11.983781457709991,
1357
- "learning_rate": 3.7137017513808544e-08,
1358
- "logits/chosen": 0.1190919280052185,
1359
- "logits/rejected": 3.681546449661255,
1360
- "logps/chosen": -1151.4061279296875,
1361
- "logps/rejected": -2382.40966796875,
1362
- "loss": 0.0787,
1363
- "rewards/accuracies": 0.9593750238418579,
1364
- "rewards/chosen": -7.798292636871338,
1365
- "rewards/margins": 12.149864196777344,
1366
- "rewards/rejected": -19.94815444946289,
1367
- "step": 810
1368
- },
1369
- {
1370
- "epoch": 1.702127659574468,
1371
- "grad_norm": 15.404111765676271,
1372
- "learning_rate": 3.251672321005147e-08,
1373
- "logits/chosen": 0.0303075909614563,
1374
- "logits/rejected": 3.4457364082336426,
1375
- "logps/chosen": -1154.9154052734375,
1376
- "logps/rejected": -2477.01220703125,
1377
- "loss": 0.0781,
1378
- "rewards/accuracies": 0.9750000238418579,
1379
- "rewards/chosen": -7.82059383392334,
1380
- "rewards/margins": 12.896100044250488,
1381
- "rewards/rejected": -20.716693878173828,
1382
- "step": 820
1383
- },
1384
- {
1385
- "epoch": 1.7228853139595226,
1386
- "grad_norm": 18.5892582498577,
1387
- "learning_rate": 2.8183272832992267e-08,
1388
- "logits/chosen": 0.05088377743959427,
1389
- "logits/rejected": 3.3782405853271484,
1390
- "logps/chosen": -1138.544921875,
1391
- "logps/rejected": -2476.583984375,
1392
- "loss": 0.0781,
1393
- "rewards/accuracies": 0.953125,
1394
- "rewards/chosen": -7.821805000305176,
1395
- "rewards/margins": 13.025341987609863,
1396
- "rewards/rejected": -20.847145080566406,
1397
- "step": 830
1398
- },
1399
- {
1400
- "epoch": 1.743642968344577,
1401
- "grad_norm": 15.008344755715465,
1402
- "learning_rate": 2.414238187460191e-08,
1403
- "logits/chosen": 0.09483002126216888,
1404
- "logits/rejected": 3.978058338165283,
1405
- "logps/chosen": -1192.2017822265625,
1406
- "logps/rejected": -2495.4228515625,
1407
- "loss": 0.0809,
1408
- "rewards/accuracies": 0.965624988079071,
1409
- "rewards/chosen": -8.194788932800293,
1410
- "rewards/margins": 12.927266120910645,
1411
- "rewards/rejected": -21.122053146362305,
1412
- "step": 840
1413
- },
1414
- {
1415
- "epoch": 1.7644006227296316,
1416
- "grad_norm": 18.202201847253505,
1417
- "learning_rate": 2.0399379963194713e-08,
1418
- "logits/chosen": 0.28226789832115173,
1419
- "logits/rejected": 4.104235649108887,
1420
- "logps/chosen": -1198.5599365234375,
1421
- "logps/rejected": -2697.082275390625,
1422
- "loss": 0.0726,
1423
- "rewards/accuracies": 0.96875,
1424
- "rewards/chosen": -8.269991874694824,
1425
- "rewards/margins": 14.753326416015625,
1426
- "rewards/rejected": -23.0233154296875,
1427
- "step": 850
1428
- },
1429
- {
1430
- "epoch": 1.7851582771146859,
1431
- "grad_norm": 12.20972861300344,
1432
- "learning_rate": 1.695920383405322e-08,
1433
- "logits/chosen": 0.1344248354434967,
1434
- "logits/rejected": 3.7496190071105957,
1435
- "logps/chosen": -1236.697509765625,
1436
- "logps/rejected": -2696.864990234375,
1437
- "loss": 0.0834,
1438
- "rewards/accuracies": 0.971875011920929,
1439
- "rewards/chosen": -8.43194580078125,
1440
- "rewards/margins": 14.396467208862305,
1441
- "rewards/rejected": -22.828411102294922,
1442
- "step": 860
1443
- },
1444
- {
1445
- "epoch": 1.8059159314997406,
1446
- "grad_norm": 23.070004800505263,
1447
- "learning_rate": 1.3826390818249434e-08,
1448
- "logits/chosen": 0.2998521625995636,
1449
- "logits/rejected": 3.7659850120544434,
1450
- "logps/chosen": -1205.6510009765625,
1451
- "logps/rejected": -2611.88525390625,
1452
- "loss": 0.0866,
1453
- "rewards/accuracies": 0.965624988079071,
1454
- "rewards/chosen": -8.398554801940918,
1455
- "rewards/margins": 13.68336296081543,
1456
- "rewards/rejected": -22.081918716430664,
1457
- "step": 870
1458
- },
1459
- {
1460
- "epoch": 1.826673585884795,
1461
- "grad_norm": 12.610473638421219,
1462
- "learning_rate": 1.1005072858249614e-08,
1463
- "logits/chosen": 0.12997238337993622,
1464
- "logits/rejected": 3.8439629077911377,
1465
- "logps/chosen": -1214.625,
1466
- "logps/rejected": -2609.048583984375,
1467
- "loss": 0.0713,
1468
- "rewards/accuracies": 0.9750000238418579,
1469
- "rewards/chosen": -8.35765266418457,
1470
- "rewards/margins": 13.758280754089355,
1471
- "rewards/rejected": -22.11593246459961,
1472
- "step": 880
1473
- },
1474
- {
1475
- "epoch": 1.8474312402698496,
1476
- "grad_norm": 17.876398721018592,
1477
- "learning_rate": 8.498971058195886e-09,
1478
- "logits/chosen": 0.18721507489681244,
1479
- "logits/rejected": 3.927520275115967,
1480
- "logps/chosen": -1229.69482421875,
1481
- "logps/rejected": -2654.132568359375,
1482
- "loss": 0.0685,
1483
- "rewards/accuracies": 0.9750000238418579,
1484
- "rewards/chosen": -8.53508472442627,
1485
- "rewards/margins": 14.071182250976562,
1486
- "rewards/rejected": -22.606266021728516,
1487
- "step": 890
1488
- },
1489
- {
1490
- "epoch": 1.868188894654904,
1491
- "grad_norm": 16.34958614861239,
1492
- "learning_rate": 6.311390776052527e-09,
1493
- "logits/chosen": 0.20388635993003845,
1494
- "logits/rejected": 3.8464550971984863,
1495
- "logps/chosen": -1190.66064453125,
1496
- "logps/rejected": -2582.68212890625,
1497
- "loss": 0.069,
1498
- "rewards/accuracies": 0.953125,
1499
- "rewards/chosen": -8.267062187194824,
1500
- "rewards/margins": 13.608221054077148,
1501
- "rewards/rejected": -21.87528419494629,
1502
- "step": 900
1503
- },
1504
- {
1505
- "epoch": 1.868188894654904,
1506
- "eval_logits/chosen": 0.2562587261199951,
1507
- "eval_logits/rejected": 3.369171380996704,
1508
- "eval_logps/chosen": -1345.178955078125,
1509
- "eval_logps/rejected": -2644.793701171875,
1510
- "eval_loss": 0.1591682732105255,
1511
- "eval_rewards/accuracies": 0.9325980544090271,
1512
- "eval_rewards/chosen": -9.217706680297852,
1513
- "eval_rewards/margins": 12.770830154418945,
1514
- "eval_rewards/rejected": -21.988536834716797,
1515
- "eval_runtime": 297.6019,
1516
- "eval_samples_per_second": 21.811,
1517
- "eval_steps_per_second": 0.343,
1518
- "step": 900
1519
- },
1520
- {
1521
- "epoch": 1.8889465490399586,
1522
- "grad_norm": 17.144158000944817,
1523
- "learning_rate": 4.445217264089751e-09,
1524
- "logits/chosen": 0.10267569869756699,
1525
- "logits/rejected": 4.0425705909729,
1526
- "logps/chosen": -1188.369873046875,
1527
- "logps/rejected": -2630.327392578125,
1528
- "loss": 0.0659,
1529
- "rewards/accuracies": 0.971875011920929,
1530
- "rewards/chosen": -8.262983322143555,
1531
- "rewards/margins": 14.21807861328125,
1532
- "rewards/rejected": -22.481060028076172,
1533
- "step": 910
1534
- },
1535
- {
1536
- "epoch": 1.909704203425013,
1537
- "grad_norm": 16.781554983940616,
1538
- "learning_rate": 2.902911863455121e-09,
1539
- "logits/chosen": 0.2743522524833679,
1540
- "logits/rejected": 4.136692047119141,
1541
- "logps/chosen": -1213.719970703125,
1542
- "logps/rejected": -2728.411376953125,
1543
- "loss": 0.0767,
1544
- "rewards/accuracies": 0.9750000238418579,
1545
- "rewards/chosen": -8.522599220275879,
1546
- "rewards/margins": 14.904146194458008,
1547
- "rewards/rejected": -23.42674446105957,
1548
- "step": 920
1549
- },
1550
- {
1551
- "epoch": 1.9304618578100676,
1552
- "grad_norm": 23.227111674511338,
1553
- "learning_rate": 1.686508757851507e-09,
1554
- "logits/chosen": 0.2027033567428589,
1555
- "logits/rejected": 3.8271079063415527,
1556
- "logps/chosen": -1229.591796875,
1557
- "logps/rejected": -2668.19970703125,
1558
- "loss": 0.0872,
1559
- "rewards/accuracies": 0.965624988079071,
1560
- "rewards/chosen": -8.599322319030762,
1561
- "rewards/margins": 14.134483337402344,
1562
- "rewards/rejected": -22.733806610107422,
1563
- "step": 930
1564
- },
1565
- {
1566
- "epoch": 1.951219512195122,
1567
- "grad_norm": 22.90252956159655,
1568
- "learning_rate": 7.976122906031557e-10,
1569
- "logits/chosen": 0.30234938859939575,
1570
- "logits/rejected": 3.9263412952423096,
1571
- "logps/chosen": -1212.529541015625,
1572
- "logps/rejected": -2572.277587890625,
1573
- "loss": 0.087,
1574
- "rewards/accuracies": 0.9593750238418579,
1575
- "rewards/chosen": -8.438420295715332,
1576
- "rewards/margins": 13.364117622375488,
1577
- "rewards/rejected": -21.802536010742188,
1578
- "step": 940
1579
- },
1580
- {
1581
- "epoch": 1.9719771665801764,
1582
- "grad_norm": 17.95357887786745,
1583
- "learning_rate": 2.37394848648792e-10,
1584
- "logits/chosen": 0.09388472139835358,
1585
- "logits/rejected": 3.5805296897888184,
1586
- "logps/chosen": -1182.0406494140625,
1587
- "logps/rejected": -2537.778564453125,
1588
- "loss": 0.0821,
1589
- "rewards/accuracies": 0.956250011920929,
1590
- "rewards/chosen": -8.203265190124512,
1591
- "rewards/margins": 13.360048294067383,
1592
- "rewards/rejected": -21.56331443786621,
1593
- "step": 950
1594
- },
1595
- {
1596
- "epoch": 1.992734820965231,
1597
- "grad_norm": 18.37972352009023,
1598
- "learning_rate": 6.5953162521614755e-12,
1599
- "logits/chosen": 0.16293412446975708,
1600
- "logits/rejected": 3.703944444656372,
1601
- "logps/chosen": -1242.895751953125,
1602
- "logps/rejected": -2706.61279296875,
1603
- "loss": 0.0738,
1604
- "rewards/accuracies": 0.981249988079071,
1605
- "rewards/chosen": -8.82257080078125,
1606
- "rewards/margins": 14.438652038574219,
1607
- "rewards/rejected": -23.261220932006836,
1608
- "step": 960
1609
- },
1610
- {
1611
- "epoch": 1.996886351842242,
1612
- "step": 962,
1613
  "total_flos": 0.0,
1614
- "train_loss": 0.15826490898606932,
1615
- "train_runtime": 30237.5544,
1616
- "train_samples_per_second": 8.156,
1617
- "train_steps_per_second": 0.032
1618
  }
1619
  ],
1620
  "logging_steps": 10,
1621
- "max_steps": 962,
1622
  "num_input_tokens_seen": 0,
1623
  "num_train_epochs": 2,
1624
- "save_steps": 100,
1625
  "stateful_callbacks": {
1626
  "TrainerControl": {
1627
  "args": {
 
  {
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 2.0,
+ "eval_steps": 240,
+ "global_step": 2,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
  {
+ "epoch": 1.0,
+ "learning_rate": 5e-07,
+ "logits/chosen": -2.766833543777466,
+ "logits/rejected": -2.7548677921295166,
+ "logps/chosen": -492.5103759765625,
+ "logps/rejected": -501.75994873046875,
+ "loss": 0.1733,
  "rewards/accuracies": 0.0,
  "rewards/chosen": 0.0,
  "rewards/margins": 0.0,

  "step": 1
  },
  {
+ "epoch": 2.0,
+ "step": 2,
  "total_flos": 0.0,
+ "train_loss": 0.1732867956161499,
+ "train_runtime": 97.0028,
+ "train_samples_per_second": 1.031,
+ "train_steps_per_second": 0.021
  }
  ],
  "logging_steps": 10,
+ "max_steps": 2,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 2,
+ "save_steps": 240,
  "stateful_callbacks": {
  "TrainerControl": {
  "args": {