---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1029
- Num Input Tokens Seen: 41091672
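
As a quick usage sketch (hedged: the Hub repo id below is a placeholder, not something confirmed by this card), the checkpoint loads with the standard `transformers` API:

```python
# Minimal loading/generation sketch. The repo id is a placeholder;
# substitute the actual location of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```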

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
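
The total train batch size of 128 follows from the per-device batch size times the gradient accumulation steps (8 × 16 = 128). As a hedged sketch (the output directory is a placeholder and the original training script is not part of this card), these settings map onto `transformers.TrainingArguments` roughly as:

```python
# Sketch of the reported hyperparameters as TrainingArguments.
# output_dir is a placeholder, not taken from the original run.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    seed=2,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam betas (0.9, 0.999) and epsilon 1e-08 match the
    # TrainingArguments defaults, so they are not set explicitly here.
)
```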

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5665        | 0.0066 | 5    | 1.3873          | 272560            |
| 1.5456        | 0.0132 | 10   | 1.3529          | 547080            |
| 1.4344        | 0.0198 | 15   | 1.2836          | 822880            |
| 1.4512        | 0.0264 | 20   | 1.2345          | 1089864           |
| 1.3462        | 0.0330 | 25   | 1.1901          | 1361576           |
| 1.1712        | 0.0396 | 30   | 1.1835          | 1634504           |
| 1.0826        | 0.0462 | 35   | 1.1964          | 1895936           |
| 0.9291        | 0.0527 | 40   | 1.1914          | 2166120           |
| 0.8296        | 0.0593 | 45   | 1.2208          | 2435904           |
| 0.6654        | 0.0659 | 50   | 1.2499          | 2706240           |
| 0.6401        | 0.0725 | 55   | 1.2356          | 2984976           |
| 0.6449        | 0.0791 | 60   | 1.2089          | 3257728           |
| 0.5585        | 0.0857 | 65   | 1.2026          | 3526976           |
| 0.468         | 0.0923 | 70   | 1.2120          | 3804888           |
| 0.5271        | 0.0989 | 75   | 1.2040          | 4078544           |
| 0.3901        | 0.1055 | 80   | 1.1976          | 4356048           |
| 0.4389        | 0.1121 | 85   | 1.2049          | 4621624           |
| 0.3482        | 0.1187 | 90   | 1.1972          | 4888632           |
| 0.3224        | 0.1253 | 95   | 1.1926          | 5152168           |
| 0.4305        | 0.1319 | 100  | 1.1944          | 5423968           |
| 0.3758        | 0.1385 | 105  | 1.1825          | 5697240           |
| 0.3646        | 0.1450 | 110  | 1.1919          | 5971384           |
| 0.3215        | 0.1516 | 115  | 1.1776          | 6240360           |
| 0.3273        | 0.1582 | 120  | 1.1907          | 6509288           |
| 0.3152        | 0.1648 | 125  | 1.1786          | 6779048           |
| 0.2365        | 0.1714 | 130  | 1.1833          | 7048200           |
| 0.3342        | 0.1780 | 135  | 1.1750          | 7316656           |
| 0.3586        | 0.1846 | 140  | 1.1774          | 7590728           |
| 0.2927        | 0.1912 | 145  | 1.1737          | 7859680           |
| 0.3788        | 0.1978 | 150  | 1.1760          | 8126224           |
| 0.2964        | 0.2044 | 155  | 1.1741          | 8403808           |
| 0.2938        | 0.2110 | 160  | 1.1677          | 8672216           |
| 0.2518        | 0.2176 | 165  | 1.1735          | 8946264           |
| 0.3334        | 0.2242 | 170  | 1.1647          | 9208352           |
| 0.311         | 0.2308 | 175  | 1.1647          | 9477208           |
| 0.3065        | 0.2373 | 180  | 1.1620          | 9748024           |
| 0.2517        | 0.2439 | 185  | 1.1613          | 10021768          |
| 0.2672        | 0.2505 | 190  | 1.1569          | 10293208          |
| 0.2611        | 0.2571 | 195  | 1.1545          | 10569280          |
| 0.2265        | 0.2637 | 200  | 1.1548          | 10840984          |
| 0.3068        | 0.2703 | 205  | 1.1520          | 11116568          |
| 0.2929        | 0.2769 | 210  | 1.1568          | 11394928          |
| 0.3351        | 0.2835 | 215  | 1.1547          | 11666600          |
| 0.2687        | 0.2901 | 220  | 1.1544          | 11946656          |
| 0.2501        | 0.2967 | 225  | 1.1479          | 12224240          |
| 0.1991        | 0.3033 | 230  | 1.1520          | 12500672          |
| 0.2434        | 0.3099 | 235  | 1.1477          | 12767840          |
| 0.1667        | 0.3165 | 240  | 1.1453          | 13035688          |
| 0.2564        | 0.3231 | 245  | 1.1509          | 13312232          |
| 0.2856        | 0.3297 | 250  | 1.1436          | 13584328          |
| 0.305         | 0.3362 | 255  | 1.1425          | 13853288          |
| 0.2765        | 0.3428 | 260  | 1.1456          | 14113512          |
| 0.2209        | 0.3494 | 265  | 1.1455          | 14385280          |
| 0.2125        | 0.3560 | 270  | 1.1410          | 14660096          |
| 0.274         | 0.3626 | 275  | 1.1417          | 14931976          |
| 0.2181        | 0.3692 | 280  | 1.1411          | 15202008          |
| 0.2481        | 0.3758 | 285  | 1.1374          | 15468896          |
| 0.2629        | 0.3824 | 290  | 1.1372          | 15733744          |
| 0.2826        | 0.3890 | 295  | 1.1366          | 16004424          |
| 0.2646        | 0.3956 | 300  | 1.1363          | 16276088          |
| 0.2729        | 0.4022 | 305  | 1.1333          | 16547304          |
| 0.2735        | 0.4088 | 310  | 1.1350          | 16819224          |
| 0.2881        | 0.4154 | 315  | 1.1349          | 17088704          |
| 0.2208        | 0.4220 | 320  | 1.1304          | 17362560          |
| 0.1822        | 0.4285 | 325  | 1.1348          | 17632840          |
| 0.3197        | 0.4351 | 330  | 1.1306          | 17903232          |
| 0.1763        | 0.4417 | 335  | 1.1287          | 18171208          |
| 0.2851        | 0.4483 | 340  | 1.1333          | 18444312          |
| 0.2406        | 0.4549 | 345  | 1.1318          | 18716768          |
| 0.2571        | 0.4615 | 350  | 1.1291          | 18983016          |
| 0.3931        | 0.4681 | 355  | 1.1282          | 19256840          |
| 0.1952        | 0.4747 | 360  | 1.1287          | 19527776          |
| 0.227         | 0.4813 | 365  | 1.1282          | 19800232          |
| 0.2979        | 0.4879 | 370  | 1.1285          | 20074720          |
| 0.1515        | 0.4945 | 375  | 1.1280          | 20350824          |
| 0.336         | 0.5011 | 380  | 1.1254          | 20627392          |
| 0.2381        | 0.5077 | 385  | 1.1258          | 20900344          |
| 0.2331        | 0.5143 | 390  | 1.1253          | 21173120          |
| 0.2176        | 0.5209 | 395  | 1.1250          | 21442720          |
| 0.232         | 0.5274 | 400  | 1.1268          | 21711376          |
| 0.2648        | 0.5340 | 405  | 1.1246          | 21977752          |
| 0.2398        | 0.5406 | 410  | 1.1241          | 22247224          |
| 0.2246        | 0.5472 | 415  | 1.1245          | 22525976          |
| 0.2836        | 0.5538 | 420  | 1.1199          | 22795472          |
| 0.242         | 0.5604 | 425  | 1.1233          | 23063720          |
| 0.2369        | 0.5670 | 430  | 1.1230          | 23333144          |
| 0.2856        | 0.5736 | 435  | 1.1206          | 23599032          |
| 0.2595        | 0.5802 | 440  | 1.1208          | 23871616          |
| 0.2154        | 0.5868 | 445  | 1.1188          | 24144160          |
| 0.2541        | 0.5934 | 450  | 1.1208          | 24412552          |
| 0.2378        | 0.6000 | 455  | 1.1210          | 24683400          |
| 0.233         | 0.6066 | 460  | 1.1183          | 24956656          |
| 0.3136        | 0.6132 | 465  | 1.1211          | 25235888          |
| 0.2549        | 0.6197 | 470  | 1.1185          | 25505944          |
| 0.259         | 0.6263 | 475  | 1.1179          | 25776080          |
| 0.1539        | 0.6329 | 480  | 1.1197          | 26043984          |
| 0.2459        | 0.6395 | 485  | 1.1183          | 26318896          |
| 0.2342        | 0.6461 | 490  | 1.1182          | 26585616          |
| 0.2173        | 0.6527 | 495  | 1.1172          | 26862168          |
| 0.3048        | 0.6593 | 500  | 1.1172          | 27130760          |
| 0.2851        | 0.6659 | 505  | 1.1142          | 27397928          |
| 0.2091        | 0.6725 | 510  | 1.1148          | 27670712          |
| 0.3143        | 0.6791 | 515  | 1.1149          | 27933056          |
| 0.1672        | 0.6857 | 520  | 1.1152          | 28201952          |
| 0.3181        | 0.6923 | 525  | 1.1164          | 28477464          |
| 0.1914        | 0.6989 | 530  | 1.1174          | 28743664          |
| 0.2931        | 0.7055 | 535  | 1.1155          | 29016592          |
| 0.2285        | 0.7120 | 540  | 1.1133          | 29283872          |
| 0.2749        | 0.7186 | 545  | 1.1163          | 29554240          |
| 0.2901        | 0.7252 | 550  | 1.1145          | 29821128          |
| 0.2361        | 0.7318 | 555  | 1.1114          | 30095352          |
| 0.2654        | 0.7384 | 560  | 1.1125          | 30371160          |
| 0.1935        | 0.7450 | 565  | 1.1129          | 30645928          |
| 0.268         | 0.7516 | 570  | 1.1101          | 30919376          |
| 0.1795        | 0.7582 | 575  | 1.1139          | 31186848          |
| 0.2439        | 0.7648 | 580  | 1.1122          | 31459480          |
| 0.259         | 0.7714 | 585  | 1.1091          | 31733560          |
| 0.248         | 0.7780 | 590  | 1.1105          | 32003016          |
| 0.2186        | 0.7846 | 595  | 1.1106          | 32278448          |
| 0.1595        | 0.7912 | 600  | 1.1115          | 32538192          |
| 0.2058        | 0.7978 | 605  | 1.1117          | 32816064          |
| 0.2324        | 0.8044 | 610  | 1.1095          | 33087144          |
| 0.2045        | 0.8109 | 615  | 1.1094          | 33353000          |
| 0.2333        | 0.8175 | 620  | 1.1095          | 33621888          |
| 0.2159        | 0.8241 | 625  | 1.1076          | 33888104          |
| 0.2866        | 0.8307 | 630  | 1.1094          | 34159240          |
| 0.2268        | 0.8373 | 635  | 1.1101          | 34430064          |
| 0.1753        | 0.8439 | 640  | 1.1100          | 34700128          |
| 0.2076        | 0.8505 | 645  | 1.1089          | 34968768          |
| 0.1912        | 0.8571 | 650  | 1.1069          | 35250136          |
| 0.1534        | 0.8637 | 655  | 1.1074          | 35524024          |
| 0.1424        | 0.8703 | 660  | 1.1083          | 35789520          |
| 0.2325        | 0.8769 | 665  | 1.1076          | 36067376          |
| 0.2607        | 0.8835 | 670  | 1.1046          | 36340512          |
| 0.234         | 0.8901 | 675  | 1.1048          | 36603160          |
| 0.232         | 0.8967 | 680  | 1.1081          | 36872480          |
| 0.2998        | 0.9032 | 685  | 1.1080          | 37146736          |
| 0.1921        | 0.9098 | 690  | 1.1045          | 37414776          |
| 0.2492        | 0.9164 | 695  | 1.1060          | 37685600          |
| 0.27          | 0.9230 | 700  | 1.1068          | 37949648          |
| 0.2159        | 0.9296 | 705  | 1.1046          | 38226312          |
| 0.1912        | 0.9362 | 710  | 1.1062          | 38502072          |
| 0.23          | 0.9428 | 715  | 1.1076          | 38772744          |
| 0.3387        | 0.9494 | 720  | 1.1054          | 39041632          |
| 0.23          | 0.9560 | 725  | 1.1051          | 39313560          |
| 0.2785        | 0.9626 | 730  | 1.1065          | 39585992          |
| 0.2116        | 0.9692 | 735  | 1.1030          | 39856632          |
| 0.2378        | 0.9758 | 740  | 1.1040          | 40120176          |
| 0.2006        | 0.9824 | 745  | 1.1046          | 40392064          |
| 0.2418        | 0.9890 | 750  | 1.1024          | 40664776          |
| 0.2041        | 0.9955 | 755  | 1.1028          | 40931592          |


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1