---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0939
- Num Input Tokens Seen: 36687080
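
A minimal inference sketch with 🤗 Transformers is shown below. The repository path is an assumption built from the model name in the index above, not a confirmed repository id; replace the `<user>` namespace placeholder with the actual Hub location hosting this checkpoint.

```python
# Minimal inference sketch; the Hub path below is an assumption built from the
# model name in this card, not a confirmed repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<user>/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```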

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
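
Note that the total train batch size is the per-device batch size times the accumulation steps: 8 × 16 = 128 (on a single device). As a hedged reconstruction, the list above maps onto a TRL `SFTConfig` (equivalently, `transformers.TrainingArguments`) roughly as follows; `output_dir` and any argument not listed above are assumptions, not values from the original run.

```python
# Configuration sketch reconstructed from the hyperparameter list above.
# output_dir is assumed; everything else mirrors the listed values.
from trl import SFTConfig

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```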

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5743        | 0.0076 | 5    | 1.3850          | 286024            |
| 1.5698        | 0.0152 | 10   | 1.3359          | 565176            |
| 1.5023        | 0.0227 | 15   | 1.2721          | 843224            |
| 1.3784        | 0.0303 | 20   | 1.2210          | 1128808           |
| 1.1853        | 0.0379 | 25   | 1.1834          | 1409632           |
| 1.079         | 0.0455 | 30   | 1.1911          | 1688000           |
| 0.9274        | 0.0531 | 35   | 1.2022          | 1961576           |
| 0.8275        | 0.0607 | 40   | 1.2078          | 2242896           |
| 0.6817        | 0.0682 | 45   | 1.2485          | 2524032           |
| 0.5892        | 0.0758 | 50   | 1.2344          | 2801792           |
| 0.4418        | 0.0834 | 55   | 1.2415          | 3078040           |
| 0.4992        | 0.0910 | 60   | 1.1980          | 3358368           |
| 0.4529        | 0.0986 | 65   | 1.2040          | 3643320           |
| 0.4315        | 0.1062 | 70   | 1.2063          | 3920184           |
| 0.3633        | 0.1137 | 75   | 1.1887          | 4195744           |
| 0.3498        | 0.1213 | 80   | 1.1900          | 4474088           |
| 0.5205        | 0.1289 | 85   | 1.1810          | 4750552           |
| 0.4456        | 0.1365 | 90   | 1.1784          | 5033120           |
| 0.2259        | 0.1441 | 95   | 1.1689          | 5308224           |
| 0.2957        | 0.1517 | 100  | 1.1673          | 5584192           |
| 0.2861        | 0.1592 | 105  | 1.1622          | 5855384           |
| 0.396         | 0.1668 | 110  | 1.1576          | 6135472           |
| 0.2727        | 0.1744 | 115  | 1.1593          | 6417808           |
| 0.2863        | 0.1820 | 120  | 1.1536          | 6694768           |
| 0.3506        | 0.1896 | 125  | 1.1512          | 6974920           |
| 0.3593        | 0.1972 | 130  | 1.1506          | 7250952           |
| 0.3129        | 0.2047 | 135  | 1.1464          | 7528424           |
| 0.305         | 0.2123 | 140  | 1.1471          | 7796288           |
| 0.2969        | 0.2199 | 145  | 1.1458          | 8071736           |
| 0.3828        | 0.2275 | 150  | 1.1450          | 8354136           |
| 0.2908        | 0.2351 | 155  | 1.1426          | 8627856           |
| 0.3691        | 0.2427 | 160  | 1.1403          | 8906272           |
| 0.248         | 0.2502 | 165  | 1.1434          | 9190272           |
| 0.2853        | 0.2578 | 170  | 1.1398          | 9467688           |
| 0.336         | 0.2654 | 175  | 1.1423          | 9745264           |
| 0.2295        | 0.2730 | 180  | 1.1392          | 10022808          |
| 0.2522        | 0.2806 | 185  | 1.1382          | 10307056          |
| 0.2513        | 0.2882 | 190  | 1.1442          | 10582992          |
| 0.2799        | 0.2957 | 195  | 1.1370          | 10866240          |
| 0.2176        | 0.3033 | 200  | 1.1359          | 11148368          |
| 0.293         | 0.3109 | 205  | 1.1353          | 11433232          |
| 0.3076        | 0.3185 | 210  | 1.1317          | 11705656          |
| 0.2469        | 0.3261 | 215  | 1.1337          | 11983632          |
| 0.3734        | 0.3336 | 220  | 1.1323          | 12266112          |
| 0.2704        | 0.3412 | 225  | 1.1290          | 12547976          |
| 0.3469        | 0.3488 | 230  | 1.1300          | 12824592          |
| 0.3266        | 0.3564 | 235  | 1.1280          | 13098760          |
| 0.2528        | 0.3640 | 240  | 1.1268          | 13368616          |
| 0.2867        | 0.3716 | 245  | 1.1266          | 13650008          |
| 0.228         | 0.3791 | 250  | 1.1262          | 13927240          |
| 0.233         | 0.3867 | 255  | 1.1249          | 14203184          |
| 0.2724        | 0.3943 | 260  | 1.1250          | 14475384          |
| 0.2117        | 0.4019 | 265  | 1.1245          | 14760384          |
| 0.1981        | 0.4095 | 270  | 1.1226          | 15040960          |
| 0.2519        | 0.4171 | 275  | 1.1219          | 15323064          |
| 0.4068        | 0.4246 | 280  | 1.1205          | 15603904          |
| 0.2811        | 0.4322 | 285  | 1.1214          | 15883608          |
| 0.259         | 0.4398 | 290  | 1.1201          | 16159520          |
| 0.2938        | 0.4474 | 295  | 1.1208          | 16437656          |
| 0.2466        | 0.4550 | 300  | 1.1214          | 16716952          |
| 0.2997        | 0.4626 | 305  | 1.1162          | 16992344          |
| 0.2268        | 0.4701 | 310  | 1.1229          | 17268760          |
| 0.343         | 0.4777 | 315  | 1.1172          | 17547648          |
| 0.2424        | 0.4853 | 320  | 1.1154          | 17828288          |
| 0.2849        | 0.4929 | 325  | 1.1172          | 18107576          |
| 0.478         | 0.5005 | 330  | 1.1155          | 18387728          |
| 0.1959        | 0.5081 | 335  | 1.1162          | 18667088          |
| 0.1868        | 0.5156 | 340  | 1.1160          | 18950480          |
| 0.234         | 0.5232 | 345  | 1.1150          | 19228760          |
| 0.2519        | 0.5308 | 350  | 1.1135          | 19508952          |
| 0.2625        | 0.5384 | 355  | 1.1145          | 19787448          |
| 0.3843        | 0.5460 | 360  | 1.1109          | 20073168          |
| 0.3005        | 0.5536 | 365  | 1.1109          | 20343008          |
| 0.1833        | 0.5611 | 370  | 1.1110          | 20623352          |
| 0.2446        | 0.5687 | 375  | 1.1093          | 20901240          |
| 0.25          | 0.5763 | 380  | 1.1104          | 21185296          |
| 0.2897        | 0.5839 | 385  | 1.1103          | 21464672          |
| 0.168         | 0.5915 | 390  | 1.1099          | 21743520          |
| 0.2387        | 0.5991 | 395  | 1.1106          | 22023544          |
| 0.2066        | 0.6066 | 400  | 1.1072          | 22291944          |
| 0.2191        | 0.6142 | 405  | 1.1089          | 22572096          |
| 0.1869        | 0.6218 | 410  | 1.1085          | 22849472          |
| 0.1939        | 0.6294 | 415  | 1.1075          | 23126440          |
| 0.2368        | 0.6370 | 420  | 1.1091          | 23406096          |
| 0.2209        | 0.6445 | 425  | 1.1066          | 23678072          |
| 0.2523        | 0.6521 | 430  | 1.1077          | 23961192          |
| 0.2416        | 0.6597 | 435  | 1.1082          | 24240520          |
| 0.1964        | 0.6673 | 440  | 1.1057          | 24520856          |
| 0.2369        | 0.6749 | 445  | 1.1055          | 24798288          |
| 0.23          | 0.6825 | 450  | 1.1074          | 25075848          |
| 0.2349        | 0.6900 | 455  | 1.1046          | 25344112          |
| 0.243         | 0.6976 | 460  | 1.1063          | 25625216          |
| 0.3343        | 0.7052 | 465  | 1.1066          | 25901904          |
| 0.2341        | 0.7128 | 470  | 1.1042          | 26177128          |
| 0.283         | 0.7204 | 475  | 1.1059          | 26459400          |
| 0.3112        | 0.7280 | 480  | 1.1066          | 26736784          |
| 0.3015        | 0.7355 | 485  | 1.1042          | 27017152          |
| 0.2788        | 0.7431 | 490  | 1.1031          | 27295048          |
| 0.1838        | 0.7507 | 495  | 1.1025          | 27575392          |
| 0.2366        | 0.7583 | 500  | 1.1036          | 27852328          |
| 0.297         | 0.7659 | 505  | 1.1032          | 28130032          |
| 0.1622        | 0.7735 | 510  | 1.1015          | 28407672          |
| 0.165         | 0.7810 | 515  | 1.1012          | 28680696          |
| 0.3047        | 0.7886 | 520  | 1.1010          | 28957216          |
| 0.336         | 0.7962 | 525  | 1.1012          | 29235048          |
| 0.2728        | 0.8038 | 530  | 1.1011          | 29507352          |
| 0.2007        | 0.8114 | 535  | 1.1008          | 29778208          |
| 0.2253        | 0.8190 | 540  | 1.1013          | 30055416          |
| 0.2386        | 0.8265 | 545  | 1.0982          | 30333728          |
| 0.2056        | 0.8341 | 550  | 1.0989          | 30599088          |
| 0.2879        | 0.8417 | 555  | 1.1003          | 30883072          |
| 0.2207        | 0.8493 | 560  | 1.0993          | 31160232          |
| 0.2821        | 0.8569 | 565  | 1.0979          | 31441272          |
| 0.2246        | 0.8645 | 570  | 1.0982          | 31712696          |
| 0.3249        | 0.8720 | 575  | 1.0980          | 31991400          |
| 0.2616        | 0.8796 | 580  | 1.0985          | 32269224          |
| 0.2716        | 0.8872 | 585  | 1.0997          | 32542384          |
| 0.2898        | 0.8948 | 590  | 1.0979          | 32826016          |
| 0.2617        | 0.9024 | 595  | 1.0968          | 33110848          |
| 0.2057        | 0.9100 | 600  | 1.0988          | 33391352          |
| 0.293         | 0.9175 | 605  | 1.0965          | 33670472          |
| 0.2081        | 0.9251 | 610  | 1.0947          | 33950936          |
| 0.2801        | 0.9327 | 615  | 1.0963          | 34226952          |
| 0.2678        | 0.9403 | 620  | 1.0952          | 34502376          |
| 0.222         | 0.9479 | 625  | 1.0944          | 34774480          |
| 0.2561        | 0.9555 | 630  | 1.0944          | 35057720          |
| 0.2738        | 0.9630 | 635  | 1.0947          | 35333096          |
| 0.182         | 0.9706 | 640  | 1.0947          | 35614552          |
| 0.224         | 0.9782 | 645  | 1.0935          | 35890992          |
| 0.2861        | 0.9858 | 650  | 1.0935          | 36177736          |
| 0.2674        | 0.9934 | 655  | 1.0948          | 36462944          |
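
Validation loss falls steeply over the first ~25 optimization steps (1.39 → 1.18), rebounds briefly to about 1.25 while the training loss drops sharply, and then declines gradually to roughly 1.09 by the end of the epoch. If the Trainer's `trainer_state.json` was saved with the checkpoint, the curve can be re-plotted from its `log_history`; the snippet below is a sketch under that assumption.

```python
# Sketch: re-plot validation loss vs. input tokens from trainer_state.json.
# Assumes the file was saved alongside the checkpoint and that token counts
# were logged (include_num_input_tokens_seen=True during training).
import json

import matplotlib.pyplot as plt

with open("trainer_state.json") as f:
    state = json.load(f)

evals = [e for e in state["log_history"] if "eval_loss" in e]
tokens = [e.get("num_input_tokens_seen", e["step"]) for e in evals]
losses = [e["eval_loss"] for e in evals]

plt.plot(tokens, losses)
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0")
plt.savefig("eval_loss.png", dpi=150)
```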


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1