---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1117
- Num Input Tokens Seen: 54862728

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
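
The batch-size entries and the warmup setting can be cross-checked with a short calculation. The sketch below is illustrative only. `num_devices = 1` and `total_steps = 1011` are assumptions: the card states neither, and the step count is estimated from the last logged row (step 1010 at epoch 0.9994). `constant_with_warmup_lr` is a hypothetical helper that follows the usual shape of a constant-with-warmup schedule; it is not the Trainer's own code.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 16
learning_rate = 8e-06
warmup_ratio = 0.05

# Assumption: a single device (the card does not state the device count).
num_devices = 1
# Assumption: about 1011 optimizer steps in total, estimated from the
# last logged row (step 1010 at epoch 0.9994).
total_steps = 1011


def effective_batch_size(per_device, accumulation, devices):
    """Sequences contributing to one optimizer update."""
    return per_device * accumulation * devices


def constant_with_warmup_lr(step, base_lr, warmup_steps):
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


total_train_batch_size = effective_batch_size(
    train_batch_size, gradient_accumulation_steps, num_devices
)
warmup_steps = math.ceil(total_steps * warmup_ratio)

print("total_train_batch_size:", total_train_batch_size)
print("warmup_steps:", warmup_steps)
for step in (0, 25, warmup_steps, 1010):
    print(step, constant_with_warmup_lr(step, learning_rate, warmup_steps))
```

With one device, 8 × 16 reproduces the reported `total_train_batch_size` of 128. Under the assumed step count, warmup would cover roughly the first 51 optimizer steps, after which the learning rate stays at 8e-06.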

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.7236        | 0.0049 | 5    | 1.3936          | 265256            |
| 1.7311        | 0.0099 | 10   | 1.3714          | 529168            |
| 1.7068        | 0.0148 | 15   | 1.3184          | 796184            |
| 1.4504        | 0.0198 | 20   | 1.2644          | 1069504           |
| 1.3511        | 0.0247 | 25   | 1.2207          | 1343872           |
| 1.2544        | 0.0297 | 30   | 1.1828          | 1618456           |
| 1.2515        | 0.0346 | 35   | 1.1715          | 1884184           |
| 1.2252        | 0.0396 | 40   | 1.1675          | 2151456           |
| 1.1025        | 0.0445 | 45   | 1.1678          | 2433096           |
| 1.0027        | 0.0495 | 50   | 1.1923          | 2703168           |
| 0.8733        | 0.0544 | 55   | 1.2430          | 2972760           |
| 0.9613        | 0.0594 | 60   | 1.2412          | 3245272           |
| 0.7516        | 0.0643 | 65   | 1.2432          | 3527240           |
| 0.725         | 0.0693 | 70   | 1.2423          | 3800664           |
| 0.5538        | 0.0742 | 75   | 1.2425          | 4073744           |
| 0.5691        | 0.0792 | 80   | 1.2359          | 4344752           |
| 0.5045        | 0.0841 | 85   | 1.2374          | 4613944           |
| 0.4573        | 0.0891 | 90   | 1.2367          | 4882056           |
| 0.4547        | 0.0940 | 95   | 1.2425          | 5156560           |
| 0.3726        | 0.0989 | 100  | 1.2205          | 5434880           |
| 0.4008        | 0.1039 | 105  | 1.2563          | 5702128           |
| 0.4398        | 0.1088 | 110  | 1.2211          | 5970840           |
| 0.4661        | 0.1138 | 115  | 1.2340          | 6246272           |
| 0.3855        | 0.1187 | 120  | 1.2227          | 6519744           |
| 0.3038        | 0.1237 | 125  | 1.2171          | 6800648           |
| 0.3275        | 0.1286 | 130  | 1.2151          | 7070088           |
| 0.3554        | 0.1336 | 135  | 1.2065          | 7337640           |
| 0.334         | 0.1385 | 140  | 1.2135          | 7612368           |
| 0.3194        | 0.1435 | 145  | 1.2118          | 7885880           |
| 0.3137        | 0.1484 | 150  | 1.2113          | 8158848           |
| 0.269         | 0.1534 | 155  | 1.2168          | 8429632           |
| 0.2767        | 0.1583 | 160  | 1.2060          | 8695800           |
| 0.2308        | 0.1633 | 165  | 1.2081          | 8965416           |
| 0.3005        | 0.1682 | 170  | 1.2097          | 9235448           |
| 0.3053        | 0.1732 | 175  | 1.2008          | 9501016           |
| 0.2627        | 0.1781 | 180  | 1.2050          | 9769336           |
| 0.3102        | 0.1831 | 185  | 1.1977          | 10039440          |
| 0.2434        | 0.1880 | 190  | 1.1970          | 10315680          |
| 0.2099        | 0.1929 | 195  | 1.1956          | 10593112          |
| 0.2217        | 0.1979 | 200  | 1.1947          | 10858264          |
| 0.3017        | 0.2028 | 205  | 1.1948          | 11129712          |
| 0.3016        | 0.2078 | 210  | 1.1907          | 11391368          |
| 0.2341        | 0.2127 | 215  | 1.1960          | 11671592          |
| 0.2846        | 0.2177 | 220  | 1.1854          | 11942936          |
| 0.2321        | 0.2226 | 225  | 1.1937          | 12216472          |
| 0.2581        | 0.2276 | 230  | 1.1934          | 12489632          |
| 0.3464        | 0.2325 | 235  | 1.1973          | 12762864          |
| 0.3527        | 0.2375 | 240  | 1.1906          | 13040536          |
| 0.2507        | 0.2424 | 245  | 1.1935          | 13313504          |
| 0.2061        | 0.2474 | 250  | 1.1851          | 13583408          |
| 0.3266        | 0.2523 | 255  | 1.1831          | 13850728          |
| 0.4595        | 0.2573 | 260  | 1.1863          | 14124576          |
| 0.2244        | 0.2622 | 265  | 1.1841          | 14398448          |
| 0.2672        | 0.2672 | 270  | 1.1829          | 14667184          |
| 0.2541        | 0.2721 | 275  | 1.1854          | 14941048          |
| 0.1679        | 0.2771 | 280  | 1.1851          | 15204600          |
| 0.1725        | 0.2820 | 285  | 1.1783          | 15480600          |
| 0.1721        | 0.2870 | 290  | 1.1806          | 15746904          |
| 0.281         | 0.2919 | 295  | 1.1750          | 16026392          |
| 0.2155        | 0.2968 | 300  | 1.1780          | 16291224          |
| 0.169         | 0.3018 | 305  | 1.1738          | 16559872          |
| 0.3579        | 0.3067 | 310  | 1.1797          | 16828144          |
| 0.2431        | 0.3117 | 315  | 1.1706          | 17096176          |
| 0.2496        | 0.3166 | 320  | 1.1731          | 17363720          |
| 0.2482        | 0.3216 | 325  | 1.1718          | 17633640          |
| 0.2215        | 0.3265 | 330  | 1.1728          | 17905448          |
| 0.263         | 0.3315 | 335  | 1.1684          | 18177864          |
| 0.1697        | 0.3364 | 340  | 1.1680          | 18453760          |
| 0.2254        | 0.3414 | 345  | 1.1685          | 18727584          |
| 0.2537        | 0.3463 | 350  | 1.1671          | 18996744          |
| 0.1607        | 0.3513 | 355  | 1.1692          | 19260984          |
| 0.1744        | 0.3562 | 360  | 1.1624          | 19528624          |
| 0.1572        | 0.3612 | 365  | 1.1659          | 19805200          |
| 0.2199        | 0.3661 | 370  | 1.1687          | 20082016          |
| 0.2309        | 0.3711 | 375  | 1.1616          | 20354376          |
| 0.2652        | 0.3760 | 380  | 1.1637          | 20626344          |
| 0.1892        | 0.3810 | 385  | 1.1604          | 20899232          |
| 0.2646        | 0.3859 | 390  | 1.1577          | 21175128          |
| 0.2623        | 0.3908 | 395  | 1.1575          | 21440072          |
| 0.2045        | 0.3958 | 400  | 1.1554          | 21710088          |
| 0.2057        | 0.4007 | 405  | 1.1542          | 21980272          |
| 0.177         | 0.4057 | 410  | 1.1547          | 22247080          |
| 0.1791        | 0.4106 | 415  | 1.1558          | 22519520          |
| 0.147         | 0.4156 | 420  | 1.1538          | 22791800          |
| 0.181         | 0.4205 | 425  | 1.1581          | 23060096          |
| 0.1925        | 0.4255 | 430  | 1.1542          | 23327888          |
| 0.226         | 0.4304 | 435  | 1.1546          | 23605640          |
| 0.2219        | 0.4354 | 440  | 1.1531          | 23873272          |
| 0.1997        | 0.4403 | 445  | 1.1515          | 24142160          |
| 0.2017        | 0.4453 | 450  | 1.1503          | 24408600          |
| 0.2191        | 0.4502 | 455  | 1.1489          | 24685024          |
| 0.1724        | 0.4552 | 460  | 1.1469          | 24957864          |
| 0.2203        | 0.4601 | 465  | 1.1483          | 25227120          |
| 0.2019        | 0.4651 | 470  | 1.1479          | 25495120          |
| 0.2099        | 0.4700 | 475  | 1.1453          | 25767128          |
| 0.241         | 0.4750 | 480  | 1.1447          | 26045272          |
| 0.1307        | 0.4799 | 485  | 1.1476          | 26323032          |
| 0.1545        | 0.4848 | 490  | 1.1466          | 26592416          |
| 0.1234        | 0.4898 | 495  | 1.1474          | 26858488          |
| 0.2571        | 0.4947 | 500  | 1.1487          | 27124856          |
| 0.1971        | 0.4997 | 505  | 1.1439          | 27397920          |
| 0.1973        | 0.5046 | 510  | 1.1430          | 27673136          |
| 0.1017        | 0.5096 | 515  | 1.1430          | 27940240          |
| 0.1398        | 0.5145 | 520  | 1.1435          | 28210584          |
| 0.23          | 0.5195 | 525  | 1.1442          | 28481592          |
| 0.2157        | 0.5244 | 530  | 1.1407          | 28751960          |
| 0.188         | 0.5294 | 535  | 1.1424          | 29032104          |
| 0.1906        | 0.5343 | 540  | 1.1449          | 29308024          |
| 0.2073        | 0.5393 | 545  | 1.1410          | 29572512          |
| 0.1434        | 0.5442 | 550  | 1.1409          | 29841968          |
| 0.2084        | 0.5492 | 555  | 1.1390          | 30114568          |
| 0.1681        | 0.5541 | 560  | 1.1375          | 30389328          |
| 0.1294        | 0.5591 | 565  | 1.1382          | 30663240          |
| 0.3395        | 0.5640 | 570  | 1.1378          | 30936928          |
| 0.1858        | 0.5690 | 575  | 1.1371          | 31205160          |
| 0.1672        | 0.5739 | 580  | 1.1371          | 31475368          |
| 0.1655        | 0.5788 | 585  | 1.1349          | 31754816          |
| 0.225         | 0.5838 | 590  | 1.1393          | 32025488          |
| 0.1848        | 0.5887 | 595  | 1.1365          | 32296504          |
| 0.1721        | 0.5937 | 600  | 1.1360          | 32568200          |
| 0.2217        | 0.5986 | 605  | 1.1389          | 32838328          |
| 0.1805        | 0.6036 | 610  | 1.1340          | 33109144          |
| 0.1842        | 0.6085 | 615  | 1.1356          | 33383840          |
| 0.2154        | 0.6135 | 620  | 1.1379          | 33653192          |
| 0.1544        | 0.6184 | 625  | 1.1345          | 33923880          |
| 0.15          | 0.6234 | 630  | 1.1345          | 34199032          |
| 0.2598        | 0.6283 | 635  | 1.1399          | 34474616          |
| 0.1512        | 0.6333 | 640  | 1.1339          | 34738176          |
| 0.1904        | 0.6382 | 645  | 1.1327          | 35007928          |
| 0.1674        | 0.6432 | 650  | 1.1337          | 35282072          |
| 0.2378        | 0.6481 | 655  | 1.1323          | 35560808          |
| 0.2768        | 0.6531 | 660  | 1.1310          | 35830608          |
| 0.1568        | 0.6580 | 665  | 1.1303          | 36099152          |
| 0.1588        | 0.6630 | 670  | 1.1319          | 36368888          |
| 0.1512        | 0.6679 | 675  | 1.1304          | 36643144          |
| 0.1405        | 0.6729 | 680  | 1.1287          | 36915576          |
| 0.1606        | 0.6778 | 685  | 1.1305          | 37188760          |
| 0.2743        | 0.6827 | 690  | 1.1299          | 37464904          |
| 0.2031        | 0.6877 | 695  | 1.1283          | 37735024          |
| 0.231         | 0.6926 | 700  | 1.1300          | 38009432          |
| 0.2176        | 0.6976 | 705  | 1.1279          | 38279672          |
| 0.168         | 0.7025 | 710  | 1.1283          | 38551560          |
| 0.2019        | 0.7075 | 715  | 1.1283          | 38819848          |
| 0.1824        | 0.7124 | 720  | 1.1266          | 39098320          |
| 0.1796        | 0.7174 | 725  | 1.1301          | 39369560          |
| 0.1729        | 0.7223 | 730  | 1.1279          | 39641720          |
| 0.1295        | 0.7273 | 735  | 1.1261          | 39910968          |
| 0.1952        | 0.7322 | 740  | 1.1287          | 40184432          |
| 0.199         | 0.7372 | 745  | 1.1257          | 40459144          |
| 0.2263        | 0.7421 | 750  | 1.1250          | 40731824          |
| 0.1827        | 0.7471 | 755  | 1.1241          | 41007352          |
| 0.2208        | 0.7520 | 760  | 1.1239          | 41285568          |
| 0.1647        | 0.7570 | 765  | 1.1269          | 41555600          |
| 0.1852        | 0.7619 | 770  | 1.1255          | 41828768          |
| 0.144         | 0.7669 | 775  | 1.1229          | 42093936          |
| 0.1777        | 0.7718 | 780  | 1.1250          | 42364320          |
| 0.1588        | 0.7767 | 785  | 1.1231          | 42641592          |
| 0.1641        | 0.7817 | 790  | 1.1227          | 42908024          |
| 0.2053        | 0.7866 | 795  | 1.1227          | 43174304          |
| 0.2087        | 0.7916 | 800  | 1.1205          | 43450320          |
| 0.1329        | 0.7965 | 805  | 1.1225          | 43725176          |
| 0.2402        | 0.8015 | 810  | 1.1220          | 43999000          |
| 0.199         | 0.8064 | 815  | 1.1183          | 44268504          |
| 0.1698        | 0.8114 | 820  | 1.1174          | 44536976          |
| 0.1965        | 0.8163 | 825  | 1.1181          | 44802256          |
| 0.2117        | 0.8213 | 830  | 1.1200          | 45077072          |
| 0.233         | 0.8262 | 835  | 1.1182          | 45342240          |
| 0.1588        | 0.8312 | 840  | 1.1198          | 45621432          |
| 0.1998        | 0.8361 | 845  | 1.1182          | 45892288          |
| 0.1661        | 0.8411 | 850  | 1.1197          | 46165816          |
| 0.1791        | 0.8460 | 855  | 1.1206          | 46442088          |
| 0.2373        | 0.8510 | 860  | 1.1169          | 46719776          |
| 0.1832        | 0.8559 | 865  | 1.1153          | 46988272          |
| 0.1202        | 0.8609 | 870  | 1.1187          | 47259640          |
| 0.1519        | 0.8658 | 875  | 1.1163          | 47525952          |
| 0.1704        | 0.8707 | 880  | 1.1149          | 47789840          |
| 0.2459        | 0.8757 | 885  | 1.1145          | 48067344          |
| 0.2517        | 0.8806 | 890  | 1.1131          | 48344352          |
| 0.1845        | 0.8856 | 895  | 1.1133          | 48615424          |
| 0.1957        | 0.8905 | 900  | 1.1164          | 48886208          |
| 0.1864        | 0.8955 | 905  | 1.1168          | 49153480          |
| 0.1807        | 0.9004 | 910  | 1.1162          | 49423304          |
| 0.1484        | 0.9054 | 915  | 1.1162          | 49696776          |
| 0.1922        | 0.9103 | 920  | 1.1164          | 49978144          |
| 0.2536        | 0.9153 | 925  | 1.1164          | 50246656          |
| 0.2772        | 0.9202 | 930  | 1.1146          | 50519528          |
| 0.1272        | 0.9252 | 935  | 1.1143          | 50784920          |
| 0.1583        | 0.9301 | 940  | 1.1163          | 51064104          |
| 0.2417        | 0.9351 | 945  | 1.1146          | 51336640          |
| 0.1931        | 0.9400 | 950  | 1.1127          | 51611928          |
| 0.1275        | 0.9450 | 955  | 1.1146          | 51881976          |
| 0.2402        | 0.9499 | 960  | 1.1160          | 52155824          |
| 0.1722        | 0.9549 | 965  | 1.1125          | 52423080          |
| 0.1641        | 0.9598 | 970  | 1.1132          | 52696224          |
| 0.156         | 0.9647 | 975  | 1.1160          | 52965272          |
| 0.1804        | 0.9697 | 980  | 1.1143          | 53236816          |
| 0.1858        | 0.9746 | 985  | 1.1138          | 53507704          |
| 0.1585        | 0.9796 | 990  | 1.1140          | 53783232          |
| 0.1601        | 0.9845 | 995  | 1.1132          | 54053576          |
| 0.1974        | 0.9895 | 1000 | 1.1144          | 54321712          |
| 0.2114        | 0.9944 | 1005 | 1.1117          | 54594056          |
| 0.2106        | 0.9994 | 1010 | 1.1117          | 54862728          |
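
In the table above, validation loss dips to 1.1675 at step 40, rises to 1.2563 at step 105, and then declines slowly to 1.1117, while training loss falls to roughly 0.2. The following minimal sketch shows one way to pull such figures out of the log. It parses only a four-row excerpt copied from the table, and the helper name `parse_rows` and the dictionary keys are hypothetical choices made for illustration.

```python
# A few rows copied verbatim from the table above (not the full log).
TABLE_EXCERPT = """\
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.2252        | 0.0396 | 40   | 1.1675          | 2151456           |
| 0.4008        | 0.1039 | 105  | 1.2563          | 5702128           |
| 0.2106        | 0.9994 | 1010 | 1.1117          | 54862728          |
"""


def parse_rows(text):
    """Turn Markdown table rows into dictionaries with numeric fields."""
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append(
            {
                # The first row has no training loss logged yet.
                "train_loss": None if cells[0] == "No log" else float(cells[0]),
                "epoch": float(cells[1]),
                "step": int(cells[2]),
                "val_loss": float(cells[3]),
                "tokens": int(cells[4]),
            }
        )
    return rows


rows = parse_rows(TABLE_EXCERPT)

# Row with the lowest validation loss in the excerpt.
best = min(rows, key=lambda r: r["val_loss"])

# Gap between validation and training loss at the final logged step.
final = rows[-1]
gap = final["val_loss"] - final["train_loss"]

# Average input tokens consumed per optimizer step over the run.
tokens_per_step = final["tokens"] / final["step"]

print("best step:", best["step"], "val_loss:", best["val_loss"])
print("final train/val gap:", round(gap, 4))
print("tokens per step:", round(tokens_per_step))
```

The same parser can be pointed at the full table to plot both loss curves or to locate the step at which validation loss first starts rising.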


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
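
To check whether a local environment matches the versions listed above, a sketch along the following lines may help. It uses only the standard library. The distribution names in `PINNED` are assumptions about how the packages are published (for example, PyTorch is assumed to be installed as `torch`), and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above; the keys are assumed
# distribution names as they would appear in `pip list`.
PINNED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(pinned):
    """Return {name: (wanted, installed_or_None, matches)} for each package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


report = compare_versions(PINNED)
for name, (wanted, installed, ok) in report.items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only indicates that the environment differs from the one recorded at training time.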