---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1036
- Num Input Tokens Seen: 56860576
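
A loss value is easier to compare across runs when it is also expressed as perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is what `transformers` causal-LM heads return. If the SFT setup masks prompt tokens, the figure covers only the supervised tokens. The helper name `perplexity` is illustrative and not part of any library.

```python
import math

EVAL_LOSS = 1.1036        # final evaluation loss reported above
STEP0_VAL_LOSS = 1.3909   # validation loss at step 0 in the results table

def perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_ce_loss_nats)

if __name__ == "__main__":
    print(f"step 0 perplexity: {perplexity(STEP0_VAL_LOSS):.3f}")
    print(f"final perplexity:  {perplexity(EVAL_LOSS):.3f}")
```

Under that assumption, the drop from 1.3909 to 1.1036 corresponds to evaluation perplexity falling from roughly 4.02 to roughly 3.02.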

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6619        | 0.0048 | 5    | 1.3885          | 269208            |
| 1.5561        | 0.0095 | 10   | 1.3670          | 533104            |
| 1.5456        | 0.0143 | 15   | 1.3196          | 801872            |
| 1.417         | 0.0190 | 20   | 1.2681          | 1069744           |
| 1.309         | 0.0238 | 25   | 1.2331          | 1347056           |
| 1.2013        | 0.0286 | 30   | 1.1947          | 1624792           |
| 1.1202        | 0.0333 | 35   | 1.1898          | 1890872           |
| 1.0145        | 0.0381 | 40   | 1.2165          | 2160520           |
| 0.9685        | 0.0429 | 45   | 1.2058          | 2434752           |
| 0.8294        | 0.0476 | 50   | 1.2506          | 2702528           |
| 0.7017        | 0.0524 | 55   | 1.2530          | 2964728           |
| 0.6012        | 0.0571 | 60   | 1.2459          | 3236840           |
| 0.5699        | 0.0619 | 65   | 1.2474          | 3504304           |
| 0.5568        | 0.0667 | 70   | 1.2397          | 3774176           |
| 0.459         | 0.0714 | 75   | 1.2315          | 4045968           |
| 0.4137        | 0.0762 | 80   | 1.2152          | 4317176           |
| 0.3721        | 0.0810 | 85   | 1.2139          | 4593360           |
| 0.3437        | 0.0857 | 90   | 1.2082          | 4859344           |
| 0.2922        | 0.0905 | 95   | 1.2201          | 5128960           |
| 0.3979        | 0.0952 | 100  | 1.2055          | 5402832           |
| 0.3391        | 0.1000 | 105  | 1.1999          | 5668728           |
| 0.2973        | 0.1048 | 110  | 1.1868          | 5941816           |
| 0.2907        | 0.1095 | 115  | 1.2060          | 6211888           |
| 0.2166        | 0.1143 | 120  | 1.1917          | 6476504           |
| 0.3361        | 0.1191 | 125  | 1.2059          | 6748360           |
| 0.2939        | 0.1238 | 130  | 1.1898          | 7020304           |
| 0.2627        | 0.1286 | 135  | 1.2001          | 7298768           |
| 0.2605        | 0.1333 | 140  | 1.1872          | 7562912           |
| 0.2749        | 0.1381 | 145  | 1.1898          | 7831408           |
| 0.2645        | 0.1429 | 150  | 1.1817          | 8105208           |
| 0.2205        | 0.1476 | 155  | 1.1854          | 8374344           |
| 0.2723        | 0.1524 | 160  | 1.1811          | 8649800           |
| 0.27          | 0.1572 | 165  | 1.1824          | 8924904           |
| 0.2381        | 0.1619 | 170  | 1.1850          | 9194288           |
| 0.155         | 0.1667 | 175  | 1.1805          | 9460784           |
| 0.216         | 0.1714 | 180  | 1.1811          | 9734216           |
| 0.1763        | 0.1762 | 185  | 1.1816          | 10007184          |
| 0.1992        | 0.1810 | 190  | 1.1778          | 10282216          |
| 0.1878        | 0.1857 | 195  | 1.1736          | 10559320          |
| 0.2359        | 0.1905 | 200  | 1.1746          | 10832344          |
| 0.1692        | 0.1953 | 205  | 1.1731          | 11102976          |
| 0.2311        | 0.2000 | 210  | 1.1718          | 11377528          |
| 0.1974        | 0.2048 | 215  | 1.1683          | 11644848          |
| 0.2118        | 0.2095 | 220  | 1.1726          | 11917864          |
| 0.2069        | 0.2143 | 225  | 1.1676          | 12188040          |
| 0.2559        | 0.2191 | 230  | 1.1680          | 12460936          |
| 0.2865        | 0.2238 | 235  | 1.1701          | 12732608          |
| 0.2321        | 0.2286 | 240  | 1.1647          | 13005728          |
| 0.1391        | 0.2334 | 245  | 1.1630          | 13278536          |
| 0.2632        | 0.2381 | 250  | 1.1606          | 13549648          |
| 0.2139        | 0.2429 | 255  | 1.1660          | 13814824          |
| 0.1839        | 0.2476 | 260  | 1.1591          | 14088816          |
| 0.2781        | 0.2524 | 265  | 1.1601          | 14360152          |
| 0.2523        | 0.2572 | 270  | 1.1583          | 14634592          |
| 0.2295        | 0.2619 | 275  | 1.1594          | 14901720          |
| 0.1748        | 0.2667 | 280  | 1.1594          | 15169568          |
| 0.1853        | 0.2715 | 285  | 1.1590          | 15442312          |
| 0.2221        | 0.2762 | 290  | 1.1550          | 15712784          |
| 0.2346        | 0.2810 | 295  | 1.1557          | 15983504          |
| 0.1717        | 0.2857 | 300  | 1.1566          | 16252736          |
| 0.2572        | 0.2905 | 305  | 1.1528          | 16522656          |
| 0.1948        | 0.2953 | 310  | 1.1501          | 16804384          |
| 0.2072        | 0.3000 | 315  | 1.1551          | 17069304          |
| 0.2233        | 0.3048 | 320  | 1.1495          | 17347080          |
| 0.1787        | 0.3096 | 325  | 1.1473          | 17619848          |
| 0.1904        | 0.3143 | 330  | 1.1498          | 17894792          |
| 0.1648        | 0.3191 | 335  | 1.1474          | 18164360          |
| 0.1701        | 0.3238 | 340  | 1.1451          | 18438616          |
| 0.2417        | 0.3286 | 345  | 1.1465          | 18707840          |
| 0.2617        | 0.3334 | 350  | 1.1449          | 18975640          |
| 0.1717        | 0.3381 | 355  | 1.1437          | 19243800          |
| 0.2343        | 0.3429 | 360  | 1.1431          | 19518544          |
| 0.1921        | 0.3477 | 365  | 1.1381          | 19781600          |
| 0.1478        | 0.3524 | 370  | 1.1447          | 20050016          |
| 0.2128        | 0.3572 | 375  | 1.1449          | 20326736          |
| 0.2403        | 0.3619 | 380  | 1.1369          | 20593560          |
| 0.204         | 0.3667 | 385  | 1.1410          | 20865264          |
| 0.2372        | 0.3715 | 390  | 1.1429          | 21133128          |
| 0.2333        | 0.3762 | 395  | 1.1395          | 21402880          |
| 0.1617        | 0.3810 | 400  | 1.1404          | 21676584          |
| 0.1994        | 0.3858 | 405  | 1.1383          | 21956864          |
| 0.2082        | 0.3905 | 410  | 1.1367          | 22231848          |
| 0.1889        | 0.3953 | 415  | 1.1379          | 22504232          |
| 0.2024        | 0.4000 | 420  | 1.1365          | 22773424          |
| 0.1367        | 0.4048 | 425  | 1.1375          | 23040760          |
| 0.1531        | 0.4096 | 430  | 1.1358          | 23311384          |
| 0.2435        | 0.4143 | 435  | 1.1336          | 23579336          |
| 0.2576        | 0.4191 | 440  | 1.1348          | 23855568          |
| 0.1846        | 0.4239 | 445  | 1.1339          | 24131984          |
| 0.1664        | 0.4286 | 450  | 1.1335          | 24400920          |
| 0.1921        | 0.4334 | 455  | 1.1346          | 24671968          |
| 0.2055        | 0.4381 | 460  | 1.1328          | 24947688          |
| 0.2394        | 0.4429 | 465  | 1.1299          | 25216584          |
| 0.0912        | 0.4477 | 470  | 1.1309          | 25479304          |
| 0.1602        | 0.4524 | 475  | 1.1316          | 25751328          |
| 0.1711        | 0.4572 | 480  | 1.1297          | 26020464          |
| 0.1851        | 0.4620 | 485  | 1.1304          | 26297840          |
| 0.1544        | 0.4667 | 490  | 1.1306          | 26564208          |
| 0.2246        | 0.4715 | 495  | 1.1292          | 26837544          |
| 0.2593        | 0.4762 | 500  | 1.1293          | 27106200          |
| 0.1452        | 0.4810 | 505  | 1.1279          | 27379904          |
| 0.1888        | 0.4858 | 510  | 1.1285          | 27650704          |
| 0.1808        | 0.4905 | 515  | 1.1260          | 27917048          |
| 0.1349        | 0.4953 | 520  | 1.1271          | 28191080          |
| 0.1523        | 0.5001 | 525  | 1.1260          | 28456360          |
| 0.1804        | 0.5048 | 530  | 1.1271          | 28728472          |
| 0.1876        | 0.5096 | 535  | 1.1252          | 28996608          |
| 0.1901        | 0.5143 | 540  | 1.1257          | 29268304          |
| 0.1649        | 0.5191 | 545  | 1.1258          | 29534384          |
| 0.2207        | 0.5239 | 550  | 1.1258          | 29802704          |
| 0.1712        | 0.5286 | 555  | 1.1253          | 30074840          |
| 0.1941        | 0.5334 | 560  | 1.1235          | 30347512          |
| 0.1767        | 0.5382 | 565  | 1.1262          | 30613424          |
| 0.226         | 0.5429 | 570  | 1.1246          | 30886904          |
| 0.1604        | 0.5477 | 575  | 1.1226          | 31159704          |
| 0.1883        | 0.5524 | 580  | 1.1239          | 31438992          |
| 0.1438        | 0.5572 | 585  | 1.1238          | 31713696          |
| 0.1358        | 0.5620 | 590  | 1.1234          | 31989648          |
| 0.2459        | 0.5667 | 595  | 1.1219          | 32257152          |
| 0.1788        | 0.5715 | 600  | 1.1241          | 32528856          |
| 0.1915        | 0.5763 | 605  | 1.1232          | 32801536          |
| 0.1908        | 0.5810 | 610  | 1.1195          | 33067456          |
| 0.1838        | 0.5858 | 615  | 1.1215          | 33343248          |
| 0.1612        | 0.5905 | 620  | 1.1214          | 33614488          |
| 0.1305        | 0.5953 | 625  | 1.1185          | 33880584          |
| 0.1575        | 0.6001 | 630  | 1.1196          | 34151360          |
| 0.1482        | 0.6048 | 635  | 1.1222          | 34429648          |
| 0.1527        | 0.6096 | 640  | 1.1196          | 34713128          |
| 0.1519        | 0.6144 | 645  | 1.1201          | 34985712          |
| 0.1264        | 0.6191 | 650  | 1.1236          | 35249112          |
| 0.1938        | 0.6239 | 655  | 1.1188          | 35533992          |
| 0.1878        | 0.6286 | 660  | 1.1181          | 35799384          |
| 0.1363        | 0.6334 | 665  | 1.1197          | 36069736          |
| 0.2028        | 0.6382 | 670  | 1.1183          | 36342600          |
| 0.2482        | 0.6429 | 675  | 1.1157          | 36619536          |
| 0.1125        | 0.6477 | 680  | 1.1177          | 36896976          |
| 0.0909        | 0.6525 | 685  | 1.1208          | 37164168          |
| 0.2006        | 0.6572 | 690  | 1.1150          | 37434024          |
| 0.1549        | 0.6620 | 695  | 1.1159          | 37703384          |
| 0.2242        | 0.6667 | 700  | 1.1172          | 37976752          |
| 0.2624        | 0.6715 | 705  | 1.1150          | 38254056          |
| 0.2141        | 0.6763 | 710  | 1.1147          | 38520576          |
| 0.2093        | 0.6810 | 715  | 1.1186          | 38791592          |
| 0.199         | 0.6858 | 720  | 1.1183          | 39062104          |
| 0.16          | 0.6906 | 725  | 1.1158          | 39333144          |
| 0.1316        | 0.6953 | 730  | 1.1164          | 39601296          |
| 0.1405        | 0.7001 | 735  | 1.1165          | 39870328          |
| 0.164         | 0.7048 | 740  | 1.1156          | 40143536          |
| 0.2407        | 0.7096 | 745  | 1.1165          | 40413120          |
| 0.1927        | 0.7144 | 750  | 1.1157          | 40679328          |
| 0.1008        | 0.7191 | 755  | 1.1148          | 40954008          |
| 0.1801        | 0.7239 | 760  | 1.1155          | 41224728          |
| 0.1303        | 0.7287 | 765  | 1.1153          | 41500888          |
| 0.1614        | 0.7334 | 770  | 1.1137          | 41772264          |
| 0.1058        | 0.7382 | 775  | 1.1131          | 42037464          |
| 0.1393        | 0.7429 | 780  | 1.1144          | 42308296          |
| 0.1357        | 0.7477 | 785  | 1.1115          | 42577088          |
| 0.2385        | 0.7525 | 790  | 1.1114          | 42850208          |
| 0.1819        | 0.7572 | 795  | 1.1105          | 43131992          |
| 0.1754        | 0.7620 | 800  | 1.1143          | 43404648          |
| 0.1844        | 0.7668 | 805  | 1.1126          | 43675664          |
| 0.1651        | 0.7715 | 810  | 1.1107          | 43944080          |
| 0.1492        | 0.7763 | 815  | 1.1111          | 44212952          |
| 0.2447        | 0.7810 | 820  | 1.1129          | 44489064          |
| 0.2831        | 0.7858 | 825  | 1.1116          | 44757776          |
| 0.198         | 0.7906 | 830  | 1.1119          | 45025712          |
| 0.2413        | 0.7953 | 835  | 1.1144          | 45298744          |
| 0.2419        | 0.8001 | 840  | 1.1142          | 45569856          |
| 0.212         | 0.8049 | 845  | 1.1119          | 45843440          |
| 0.1282        | 0.8096 | 850  | 1.1096          | 46118392          |
| 0.2365        | 0.8144 | 855  | 1.1117          | 46387616          |
| 0.1231        | 0.8191 | 860  | 1.1110          | 46655568          |
| 0.1475        | 0.8239 | 865  | 1.1119          | 46926024          |
| 0.1728        | 0.8287 | 870  | 1.1104          | 47191592          |
| 0.1555        | 0.8334 | 875  | 1.1096          | 47468280          |
| 0.2101        | 0.8382 | 880  | 1.1083          | 47734104          |
| 0.1643        | 0.8430 | 885  | 1.1096          | 48010136          |
| 0.2671        | 0.8477 | 890  | 1.1119          | 48275232          |
| 0.2283        | 0.8525 | 895  | 1.1099          | 48542904          |
| 0.249         | 0.8572 | 900  | 1.1075          | 48814896          |
| 0.1618        | 0.8620 | 905  | 1.1086          | 49086616          |
| 0.1733        | 0.8668 | 910  | 1.1097          | 49358688          |
| 0.1571        | 0.8715 | 915  | 1.1093          | 49630792          |
| 0.207         | 0.8763 | 920  | 1.1101          | 49901280          |
| 0.2012        | 0.8811 | 925  | 1.1088          | 50172728          |
| 0.1682        | 0.8858 | 930  | 1.1079          | 50439696          |
| 0.1735        | 0.8906 | 935  | 1.1068          | 50710368          |
| 0.1766        | 0.8953 | 940  | 1.1095          | 50975280          |
| 0.1292        | 0.9001 | 945  | 1.1081          | 51245360          |
| 0.1688        | 0.9049 | 950  | 1.1079          | 51515104          |
| 0.1044        | 0.9096 | 955  | 1.1096          | 51781736          |
| 0.1414        | 0.9144 | 960  | 1.1114          | 52043768          |
| 0.1954        | 0.9192 | 965  | 1.1054          | 52312032          |
| 0.2114        | 0.9239 | 970  | 1.1042          | 52586368          |
| 0.2029        | 0.9287 | 975  | 1.1070          | 52858112          |
| 0.2393        | 0.9334 | 980  | 1.1046          | 53127184          |
| 0.1397        | 0.9382 | 985  | 1.1042          | 53393096          |
| 0.1867        | 0.9430 | 990  | 1.1053          | 53666584          |
| 0.1785        | 0.9477 | 995  | 1.1069          | 53930960          |
| 0.1624        | 0.9525 | 1000 | 1.1081          | 54208544          |
| 0.204         | 0.9573 | 1005 | 1.1064          | 54480672          |
| 0.2185        | 0.9620 | 1010 | 1.1057          | 54753248          |
| 0.1201        | 0.9668 | 1015 | 1.1045          | 55020320          |
| 0.2427        | 0.9715 | 1020 | 1.1038          | 55288376          |
| 0.1832        | 0.9763 | 1025 | 1.1038          | 55555816          |
| 0.1387        | 0.9811 | 1030 | 1.1040          | 55829144          |
| 0.1508        | 0.9858 | 1035 | 1.1034          | 56106152          |
| 0.2211        | 0.9906 | 1040 | 1.1030          | 56371320          |
| 0.2265        | 0.9954 | 1045 | 1.1044          | 56641808          |
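
The log above is easier to query once it is parsed. The sketch below parses a handful of rows copied verbatim from the table. It picks the logged step with the lowest validation loss and estimates tokens per optimizer step from the cumulative counter. The names `parse_rows` and `LogRow` are hypothetical. The division by 128 assumes the `total_train_batch_size` listed above. "Input Tokens Seen" is based on the size of the input tensor, so padding may be included and the per-example figure is best read as an upper bound.

```python
from typing import NamedTuple, Optional

# A few rows copied verbatim from the table above (not the full log).
ROWS = """\
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.1202        | 0.0333 | 35   | 1.1898          | 1890872           |
| 0.7017        | 0.0524 | 55   | 1.2530          | 2964728           |
| 0.2211        | 0.9906 | 1040 | 1.1030          | 56371320          |
| 0.2265        | 0.9954 | 1045 | 1.1044          | 56641808          |
"""

class LogRow(NamedTuple):
    train_loss: Optional[float]  # None where the Trainer printed "No log"
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int

def parse_rows(text: str) -> list[LogRow]:
    """Split each Markdown table row on '|' and convert the five cells."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        train = None if cells[0] == "No log" else float(cells[0])
        rows.append(LogRow(train, float(cells[1]), int(cells[2]),
                           float(cells[3]), int(cells[4])))
    return rows

rows = parse_rows(ROWS)
best = min(rows, key=lambda r: r.val_loss)
last = rows[-1]
tokens_per_step = last.tokens_seen / last.step
tokens_per_example = tokens_per_step / 128  # assumes total_train_batch_size = 128

print("best logged step:", best.step, "val loss:", best.val_loss)
print("tokens/step:", round(tokens_per_step), "tokens/example:", round(tokens_per_example))
```

Pasting the full table into `ROWS` applies the same logic to every logged evaluation.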


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1