---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# collapse_gemma-2-2b_hs2_accumulate_iter10_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1051
- Num Input Tokens Seen: 51561256
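
For orientation, the loss can be converted to a perplexity. This is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats; the card does not state this explicitly. Under that assumption the loss corresponds to a perplexity of roughly 3.02.

```python
import math

eval_loss = 1.1051  # final evaluation loss reported above

# Assumption: the loss is the mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")
```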

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
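
The values above fit together as follows. This is a rough sketch, not the original training script: `num_devices` is not stated in this card and is assumed to be 1, which is the value consistent with a total batch size of 128. The warmup length is an estimate derived from the step and epoch columns of the results table, not a logged value.

```python
import math

train_batch_size = 8              # per-device batch size, from the list above
gradient_accumulation_steps = 16  # from the list above
num_devices = 1                   # assumption: not stated in the card

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)

# Estimate of the warmup length: step 950 is logged at epoch 0.9973,
# so one epoch is approximately 950 / 0.9973 optimizer steps.
approx_total_steps = round(950 / 0.9973)
warmup_steps = math.ceil(0.05 * approx_total_steps)  # lr_scheduler_warmup_ratio = 0.05
print(approx_total_steps, warmup_steps)
```

With `constant_with_warmup`, the learning rate ramps up over the warmup steps and then stays at 8e-06 for the rest of the epoch.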

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6395        | 0.0052 | 5    | 1.3878          | 267176            |
| 1.6107        | 0.0105 | 10   | 1.3632          | 540632            |
| 1.4506        | 0.0157 | 15   | 1.3071          | 813344            |
| 1.4544        | 0.0210 | 20   | 1.2594          | 1084816           |
| 1.3649        | 0.0262 | 25   | 1.2178          | 1358992           |
| 1.2237        | 0.0315 | 30   | 1.1855          | 1623176           |
| 1.0773        | 0.0367 | 35   | 1.2001          | 1896376           |
| 1.0064        | 0.0420 | 40   | 1.2029          | 2174144           |
| 0.881         | 0.0472 | 45   | 1.2243          | 2448704           |
| 0.7121        | 0.0525 | 50   | 1.2347          | 2725896           |
| 0.6637        | 0.0577 | 55   | 1.2538          | 3006256           |
| 0.5756        | 0.0630 | 60   | 1.2486          | 3270560           |
| 0.6061        | 0.0682 | 65   | 1.2163          | 3542448           |
| 0.5961        | 0.0735 | 70   | 1.2462          | 3812144           |
| 0.5377        | 0.0787 | 75   | 1.2146          | 4086216           |
| 0.435         | 0.0840 | 80   | 1.2242          | 4349616           |
| 0.4432        | 0.0892 | 85   | 1.2068          | 4621736           |
| 0.4509        | 0.0945 | 90   | 1.2115          | 4896352           |
| 0.3424        | 0.0997 | 95   | 1.2045          | 5167360           |
| 0.3246        | 0.1050 | 100  | 1.2066          | 5441952           |
| 0.2544        | 0.1102 | 105  | 1.2112          | 5718120           |
| 0.2491        | 0.1155 | 110  | 1.1994          | 5989720           |
| 0.2749        | 0.1207 | 115  | 1.1977          | 6257904           |
| 0.3471        | 0.1260 | 120  | 1.1918          | 6525952           |
| 0.3511        | 0.1312 | 125  | 1.1877          | 6798440           |
| 0.2885        | 0.1365 | 130  | 1.1872          | 7070104           |
| 0.2825        | 0.1417 | 135  | 1.1857          | 7336696           |
| 0.3035        | 0.1470 | 140  | 1.1862          | 7601464           |
| 0.321         | 0.1522 | 145  | 1.1838          | 7873064           |
| 0.269         | 0.1575 | 150  | 1.1795          | 8142088           |
| 0.2346        | 0.1627 | 155  | 1.1806          | 8420848           |
| 0.2221        | 0.1680 | 160  | 1.1779          | 8688752           |
| 0.2448        | 0.1732 | 165  | 1.1811          | 8950224           |
| 0.2481        | 0.1785 | 170  | 1.1734          | 9221064           |
| 0.2657        | 0.1837 | 175  | 1.1731          | 9490952           |
| 0.2078        | 0.1890 | 180  | 1.1774          | 9760272           |
| 0.1971        | 0.1942 | 185  | 1.1717          | 10030424          |
| 0.3152        | 0.1995 | 190  | 1.1740          | 10294632          |
| 0.3539        | 0.2047 | 195  | 1.1652          | 10570384          |
| 0.2638        | 0.2100 | 200  | 1.1660          | 10838120          |
| 0.2894        | 0.2152 | 205  | 1.1641          | 11112144          |
| 0.2773        | 0.2205 | 210  | 1.1633          | 11381896          |
| 0.2081        | 0.2257 | 215  | 1.1643          | 11648168          |
| 0.2585        | 0.2310 | 220  | 1.1650          | 11919680          |
| 0.2927        | 0.2362 | 225  | 1.1607          | 12199888          |
| 0.2706        | 0.2415 | 230  | 1.1563          | 12469960          |
| 0.2444        | 0.2467 | 235  | 1.1585          | 12742352          |
| 0.3255        | 0.2520 | 240  | 1.1537          | 13019496          |
| 0.1864        | 0.2572 | 245  | 1.1556          | 13291024          |
| 0.1361        | 0.2624 | 250  | 1.1589          | 13560408          |
| 0.2366        | 0.2677 | 255  | 1.1531          | 13829232          |
| 0.1542        | 0.2729 | 260  | 1.1539          | 14105144          |
| 0.2822        | 0.2782 | 265  | 1.1512          | 14376360          |
| 0.1825        | 0.2834 | 270  | 1.1496          | 14649592          |
| 0.2948        | 0.2887 | 275  | 1.1551          | 14921560          |
| 0.2679        | 0.2939 | 280  | 1.1502          | 15194304          |
| 0.158         | 0.2992 | 285  | 1.1546          | 15456096          |
| 0.2154        | 0.3044 | 290  | 1.1482          | 15730608          |
| 0.2468        | 0.3097 | 295  | 1.1464          | 16007568          |
| 0.2797        | 0.3149 | 300  | 1.1468          | 16277896          |
| 0.2034        | 0.3202 | 305  | 1.1468          | 16552168          |
| 0.207         | 0.3254 | 310  | 1.1477          | 16819608          |
| 0.1315        | 0.3307 | 315  | 1.1431          | 17095096          |
| 0.2116        | 0.3359 | 320  | 1.1466          | 17360040          |
| 0.1816        | 0.3412 | 325  | 1.1430          | 17628168          |
| 0.1886        | 0.3464 | 330  | 1.1419          | 17892832          |
| 0.2278        | 0.3517 | 335  | 1.1409          | 18160480          |
| 0.2196        | 0.3569 | 340  | 1.1372          | 18428000          |
| 0.1998        | 0.3622 | 345  | 1.1400          | 18703480          |
| 0.1677        | 0.3674 | 350  | 1.1422          | 18971744          |
| 0.2223        | 0.3727 | 355  | 1.1361          | 19232072          |
| 0.2093        | 0.3779 | 360  | 1.1416          | 19504104          |
| 0.1497        | 0.3832 | 365  | 1.1375          | 19778184          |
| 0.1653        | 0.3884 | 370  | 1.1388          | 20048968          |
| 0.2041        | 0.3937 | 375  | 1.1405          | 20317848          |
| 0.2684        | 0.3989 | 380  | 1.1339          | 20595176          |
| 0.1934        | 0.4042 | 385  | 1.1342          | 20872472          |
| 0.1928        | 0.4094 | 390  | 1.1338          | 21145584          |
| 0.2346        | 0.4147 | 395  | 1.1327          | 21416912          |
| 0.2328        | 0.4199 | 400  | 1.1342          | 21690224          |
| 0.164         | 0.4252 | 405  | 1.1311          | 21964640          |
| 0.2526        | 0.4304 | 410  | 1.1341          | 22238344          |
| 0.2819        | 0.4357 | 415  | 1.1312          | 22510304          |
| 0.239         | 0.4409 | 420  | 1.1300          | 22782368          |
| 0.2154        | 0.4462 | 425  | 1.1295          | 23049152          |
| 0.1869        | 0.4514 | 430  | 1.1303          | 23318744          |
| 0.1654        | 0.4567 | 435  | 1.1283          | 23590656          |
| 0.2803        | 0.4619 | 440  | 1.1289          | 23861080          |
| 0.1311        | 0.4672 | 445  | 1.1297          | 24130976          |
| 0.1567        | 0.4724 | 450  | 1.1267          | 24404720          |
| 0.2344        | 0.4777 | 455  | 1.1300          | 24675848          |
| 0.2017        | 0.4829 | 460  | 1.1268          | 24943744          |
| 0.1729        | 0.4882 | 465  | 1.1274          | 25217656          |
| 0.2135        | 0.4934 | 470  | 1.1255          | 25486608          |
| 0.2117        | 0.4987 | 475  | 1.1246          | 25756672          |
| 0.1748        | 0.5039 | 480  | 1.1274          | 26023496          |
| 0.2428        | 0.5092 | 485  | 1.1259          | 26297464          |
| 0.2141        | 0.5144 | 490  | 1.1225          | 26569864          |
| 0.1829        | 0.5197 | 495  | 1.1264          | 26841160          |
| 0.2652        | 0.5249 | 500  | 1.1240          | 27106056          |
| 0.2427        | 0.5301 | 505  | 1.1212          | 27368448          |
| 0.3393        | 0.5354 | 510  | 1.1203          | 27642144          |
| 0.1654        | 0.5406 | 515  | 1.1219          | 27909856          |
| 0.2285        | 0.5459 | 520  | 1.1240          | 28180576          |
| 0.1352        | 0.5511 | 525  | 1.1222          | 28446952          |
| 0.2311        | 0.5564 | 530  | 1.1222          | 28719016          |
| 0.1766        | 0.5616 | 535  | 1.1206          | 28991728          |
| 0.1618        | 0.5669 | 540  | 1.1222          | 29266888          |
| 0.2667        | 0.5721 | 545  | 1.1228          | 29536384          |
| 0.1595        | 0.5774 | 550  | 1.1198          | 29803968          |
| 0.1975        | 0.5826 | 555  | 1.1186          | 30077232          |
| 0.16          | 0.5879 | 560  | 1.1219          | 30344632          |
| 0.1519        | 0.5931 | 565  | 1.1203          | 30617848          |
| 0.2028        | 0.5984 | 570  | 1.1168          | 30886552          |
| 0.1633        | 0.6036 | 575  | 1.1172          | 31159704          |
| 0.2041        | 0.6089 | 580  | 1.1185          | 31435184          |
| 0.2646        | 0.6141 | 585  | 1.1188          | 31703784          |
| 0.1321        | 0.6194 | 590  | 1.1178          | 31965392          |
| 0.2071        | 0.6246 | 595  | 1.1189          | 32245064          |
| 0.1997        | 0.6299 | 600  | 1.1199          | 32522944          |
| 0.2234        | 0.6351 | 605  | 1.1158          | 32799088          |
| 0.2085        | 0.6404 | 610  | 1.1142          | 33066680          |
| 0.2189        | 0.6456 | 615  | 1.1181          | 33336744          |
| 0.1711        | 0.6509 | 620  | 1.1165          | 33608120          |
| 0.1327        | 0.6561 | 625  | 1.1165          | 33877624          |
| 0.1207        | 0.6614 | 630  | 1.1182          | 34153432          |
| 0.1734        | 0.6666 | 635  | 1.1163          | 34422440          |
| 0.2455        | 0.6719 | 640  | 1.1142          | 34691400          |
| 0.139         | 0.6771 | 645  | 1.1165          | 34963144          |
| 0.1745        | 0.6824 | 650  | 1.1162          | 35231216          |
| 0.1507        | 0.6876 | 655  | 1.1132          | 35499080          |
| 0.193         | 0.6929 | 660  | 1.1139          | 35771152          |
| 0.1836        | 0.6981 | 665  | 1.1190          | 36049240          |
| 0.1602        | 0.7034 | 670  | 1.1146          | 36323080          |
| 0.2058        | 0.7086 | 675  | 1.1125          | 36593960          |
| 0.1137        | 0.7139 | 680  | 1.1166          | 36855704          |
| 0.1914        | 0.7191 | 685  | 1.1165          | 37128976          |
| 0.1955        | 0.7244 | 690  | 1.1131          | 37393376          |
| 0.2652        | 0.7296 | 695  | 1.1144          | 37668392          |
| 0.2041        | 0.7349 | 700  | 1.1130          | 37941816          |
| 0.2098        | 0.7401 | 705  | 1.1137          | 38208872          |
| 0.1394        | 0.7454 | 710  | 1.1147          | 38483504          |
| 0.1655        | 0.7506 | 715  | 1.1131          | 38756464          |
| 0.2204        | 0.7559 | 720  | 1.1126          | 39022560          |
| 0.2006        | 0.7611 | 725  | 1.1147          | 39289592          |
| 0.1907        | 0.7664 | 730  | 1.1154          | 39561424          |
| 0.2051        | 0.7716 | 735  | 1.1157          | 39830136          |
| 0.1807        | 0.7769 | 740  | 1.1130          | 40099128          |
| 0.2034        | 0.7821 | 745  | 1.1115          | 40373136          |
| 0.2266        | 0.7873 | 750  | 1.1132          | 40649000          |
| 0.1649        | 0.7926 | 755  | 1.1125          | 40913656          |
| 0.1717        | 0.7978 | 760  | 1.1108          | 41190264          |
| 0.1176        | 0.8031 | 765  | 1.1117          | 41462168          |
| 0.2482        | 0.8083 | 770  | 1.1131          | 41737160          |
| 0.196         | 0.8136 | 775  | 1.1114          | 42007320          |
| 0.1976        | 0.8188 | 780  | 1.1120          | 42267744          |
| 0.2019        | 0.8241 | 785  | 1.1101          | 42538272          |
| 0.199         | 0.8293 | 790  | 1.1103          | 42808000          |
| 0.1572        | 0.8346 | 795  | 1.1096          | 43082512          |
| 0.2039        | 0.8398 | 800  | 1.1095          | 43352040          |
| 0.1645        | 0.8451 | 805  | 1.1079          | 43617936          |
| 0.1579        | 0.8503 | 810  | 1.1087          | 43885320          |
| 0.2538        | 0.8556 | 815  | 1.1083          | 44160984          |
| 0.2116        | 0.8608 | 820  | 1.1074          | 44432952          |
| 0.1852        | 0.8661 | 825  | 1.1074          | 44700144          |
| 0.1959        | 0.8713 | 830  | 1.1082          | 44965704          |
| 0.1776        | 0.8766 | 835  | 1.1083          | 45239256          |
| 0.2194        | 0.8818 | 840  | 1.1074          | 45506880          |
| 0.2027        | 0.8871 | 845  | 1.1061          | 45776016          |
| 0.2268        | 0.8923 | 850  | 1.1060          | 46047232          |
| 0.1698        | 0.8976 | 855  | 1.1069          | 46315208          |
| 0.1642        | 0.9028 | 860  | 1.1063          | 46586904          |
| 0.1471        | 0.9081 | 865  | 1.1056          | 46858472          |
| 0.1546        | 0.9133 | 870  | 1.1059          | 47129256          |
| 0.1702        | 0.9186 | 875  | 1.1068          | 47396888          |
| 0.1736        | 0.9238 | 880  | 1.1095          | 47666808          |
| 0.2423        | 0.9291 | 885  | 1.1071          | 47934920          |
| 0.1543        | 0.9343 | 890  | 1.1041          | 48209376          |
| 0.2803        | 0.9396 | 895  | 1.1065          | 48483448          |
| 0.1918        | 0.9448 | 900  | 1.1076          | 48746968          |
| 0.1441        | 0.9501 | 905  | 1.1020          | 49017312          |
| 0.2352        | 0.9553 | 910  | 1.1043          | 49295008          |
| 0.1239        | 0.9606 | 915  | 1.1040          | 49564488          |
| 0.2222        | 0.9658 | 920  | 1.1051          | 49834544          |
| 0.1531        | 0.9711 | 925  | 1.1042          | 50105776          |
| 0.1774        | 0.9763 | 930  | 1.1037          | 50374256          |
| 0.1364        | 0.9816 | 935  | 1.1044          | 50639352          |
| 0.1993        | 0.9868 | 940  | 1.1025          | 50908056          |
| 0.1525        | 0.9921 | 945  | 1.1039          | 51181256          |
| 0.2363        | 0.9973 | 950  | 1.1055          | 51454752          |
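
The lowest validation loss in the table is 1.1020, at step 905. The last logged row also gives a rough throughput figure, sketched below. It assumes a total batch size of 128 sequences per optimizer step, as listed in the hyperparameters; the token count may include padding, so the per-sequence figure is only an approximation.

```python
# Values taken from the last logged row of the table above.
last_step = 950
input_tokens_seen = 51_454_752
total_train_batch_size = 128  # from the hyperparameter list

# Average number of input tokens consumed per optimizer step.
tokens_per_step = input_tokens_seen / last_step

# Approximate average sequence length (may include padding tokens).
tokens_per_sequence = tokens_per_step / total_train_batch_size

print(f"{tokens_per_step:.0f} tokens per step")
print(f"{tokens_per_sequence:.0f} tokens per sequence")
```

This works out to roughly 54,000 tokens per step and about 423 tokens per sequence.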


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1