---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1
  results: []
---


# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0909
- Num Input Tokens Seen: 25913464
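
For reference, a loss of 1.0909 corresponds to a perplexity of exp(1.0909) ≈ 2.98.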

## Model description

This model is a supervised fine-tune (SFT) of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) trained with TRL's `SFTTrainer` (per the `trl` and `sft` tags). Judging from the repository name, it appears to be the seed-1 (`sftsd1`) run of the fifth data-accumulation iteration (`accumulate_iter5`) of a model-collapse experiment; no further details have been documented.

## Intended uses & limitations

The intended uses have not been documented. As a Gemma-2 derivative, the model is distributed under the Gemma license (see the `license` field above).
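
Since no usage instructions were published, the following is a minimal inference sketch; it assumes the checkpoint is hosted under this card's repository name (the hub id here is therefore a guess) and uses an illustrative prompt:

```python
# Minimal inference sketch. The hub id below is assumed from this card's
# repository name; adjust it to wherever the checkpoint actually lives.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1"  # assumed id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 was trained in bfloat16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As a base-model fine-tune with no documented chat template, plain text completion (rather than a chat format) is the safer default.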

## Training and evaluation data

Not documented; the Trainer recorded the training data only as "an unknown dataset" (see above).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
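
As a point of reference, the hyperparameters above map onto a TRL configuration roughly as follows. This is a hypothetical reconstruction, not the authors' script: the dataset is undocumented, so a small public corpus with a `text` column stands in for it, and any `SFTTrainer` extras (packing, sequence length, etc.) are unknown.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters
# listed above; the real dataset and script were not published.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Stand-in corpus with a "text" column; the actual training data is unknown.
train_dataset = load_dataset("stas/openwebtext-10k", split="train")

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    seed=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    dataset_text_field="text",
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```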

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5757        | 0.0106 | 5    | 1.3785          | 273728            |
| 1.5086        | 0.0212 | 10   | 1.3024          | 553176            |
| 1.3703        | 0.0318 | 15   | 1.2301          | 832928            |
| 1.2237        | 0.0424 | 20   | 1.1798          | 1112400           |
| 1.1043        | 0.0530 | 25   | 1.1741          | 1387256           |
| 0.8871        | 0.0636 | 30   | 1.1676          | 1667816           |
| 0.8128        | 0.0742 | 35   | 1.1807          | 1935720           |
| 0.8159        | 0.0848 | 40   | 1.1931          | 2212104           |
| 0.7139        | 0.0955 | 45   | 1.2108          | 2488864           |
| 0.6054        | 0.1061 | 50   | 1.1934          | 2759968           |
| 0.5794        | 0.1167 | 55   | 1.1874          | 3037768           |
| 0.4857        | 0.1273 | 60   | 1.1861          | 3315040           |
| 0.5228        | 0.1379 | 65   | 1.1744          | 3590680           |
| 0.5009        | 0.1485 | 70   | 1.1665          | 3866264           |
| 0.4853        | 0.1591 | 75   | 1.1741          | 4138640           |
| 0.4493        | 0.1697 | 80   | 1.1581          | 4408560           |
| 0.4206        | 0.1803 | 85   | 1.1612          | 4676520           |
| 0.3377        | 0.1909 | 90   | 1.1532          | 4956920           |
| 0.3708        | 0.2015 | 95   | 1.1524          | 5230480           |
| 0.4861        | 0.2121 | 100  | 1.1467          | 5510432           |
| 0.4150        | 0.2227 | 105  | 1.1487          | 5783888           |
| 0.3656        | 0.2333 | 110  | 1.1439          | 6059904           |
| 0.4284        | 0.2439 | 115  | 1.1477          | 6333552           |
| 0.3727        | 0.2545 | 120  | 1.1430          | 6607432           |
| 0.4572        | 0.2651 | 125  | 1.1448          | 6884048           |
| 0.3842        | 0.2758 | 130  | 1.1388          | 7161200           |
| 0.3452        | 0.2864 | 135  | 1.1418          | 7443528           |
| 0.3085        | 0.2970 | 140  | 1.1353          | 7719360           |
| 0.4154        | 0.3076 | 145  | 1.1353          | 8001024           |
| 0.3739        | 0.3182 | 150  | 1.1316          | 8281392           |
| 0.3435        | 0.3288 | 155  | 1.1313          | 8553600           |
| 0.3560        | 0.3394 | 160  | 1.1337          | 8825544           |
| 0.3751        | 0.3500 | 165  | 1.1262          | 9098040           |
| 0.3788        | 0.3606 | 170  | 1.1268          | 9377472           |
| 0.3203        | 0.3712 | 175  | 1.1266          | 9649408           |
| 0.3023        | 0.3818 | 180  | 1.1224          | 9930488           |
| 0.3961        | 0.3924 | 185  | 1.1217          | 10204672          |
| 0.4728        | 0.4030 | 190  | 1.1191          | 10476840          |
| 0.3212        | 0.4136 | 195  | 1.1211          | 10748672          |
| 0.3261        | 0.4242 | 200  | 1.1176          | 11022304          |
| 0.2691        | 0.4348 | 205  | 1.1170          | 11294832          |
| 0.2953        | 0.4454 | 210  | 1.1151          | 11571256          |
| 0.3242        | 0.4561 | 215  | 1.1162          | 11845312          |
| 0.3608        | 0.4667 | 220  | 1.1142          | 12124880          |
| 0.3344        | 0.4773 | 225  | 1.1133          | 12396192          |
| 0.2966        | 0.4879 | 230  | 1.1142          | 12663864          |
| 0.3665        | 0.4985 | 235  | 1.1141          | 12938920          |
| 0.3217        | 0.5091 | 240  | 1.1155          | 13209424          |
| 0.3376        | 0.5197 | 245  | 1.1119          | 13482760          |
| 0.3636        | 0.5303 | 250  | 1.1130          | 13749552          |
| 0.3988        | 0.5409 | 255  | 1.1115          | 14022304          |
| 0.3610        | 0.5515 | 260  | 1.1087          | 14298840          |
| 0.3727        | 0.5621 | 265  | 1.1117          | 14569648          |
| 0.3881        | 0.5727 | 270  | 1.1083          | 14844120          |
| 0.3240        | 0.5833 | 275  | 1.1086          | 15119496          |
| 0.4137        | 0.5939 | 280  | 1.1079          | 15395456          |
| 0.4208        | 0.6045 | 285  | 1.1058          | 15671704          |
| 0.2808        | 0.6151 | 290  | 1.1065          | 15944040          |
| 0.2928        | 0.6257 | 295  | 1.1055          | 16220520          |
| 0.4027        | 0.6364 | 300  | 1.1075          | 16491504          |
| 0.2943        | 0.6470 | 305  | 1.1053          | 16765024          |
| 0.3012        | 0.6576 | 310  | 1.1059          | 17039080          |
| 0.2789        | 0.6682 | 315  | 1.1039          | 17318648          |
| 0.3305        | 0.6788 | 320  | 1.1030          | 17596848          |
| 0.3210        | 0.6894 | 325  | 1.1018          | 17870976          |
| 0.3127        | 0.7000 | 330  | 1.1039          | 18137760          |
| 0.3792        | 0.7106 | 335  | 1.1030          | 18410248          |
| 0.3946        | 0.7212 | 340  | 1.0999          | 18677968          |
| 0.3340        | 0.7318 | 345  | 1.1031          | 18947432          |
| 0.3146        | 0.7424 | 350  | 1.1030          | 19227968          |
| 0.3158        | 0.7530 | 355  | 1.0988          | 19509360          |
| 0.2907        | 0.7636 | 360  | 1.1000          | 19785616          |
| 0.4204        | 0.7742 | 365  | 1.1001          | 20056848          |
| 0.2924        | 0.7848 | 370  | 1.1002          | 20335856          |
| 0.3222        | 0.7954 | 375  | 1.0997          | 20613064          |
| 0.3221        | 0.8060 | 380  | 1.0989          | 20884992          |
| 0.3005        | 0.8167 | 385  | 1.0967          | 21162232          |
| 0.3183        | 0.8273 | 390  | 1.0968          | 21438576          |
| 0.3396        | 0.8379 | 395  | 1.0980          | 21715544          |
| 0.3205        | 0.8485 | 400  | 1.0947          | 21988384          |
| 0.3199        | 0.8591 | 405  | 1.0972          | 22266120          |
| 0.3140        | 0.8697 | 410  | 1.0939          | 22539560          |
| 0.4633        | 0.8803 | 415  | 1.0941          | 22813776          |
| 0.3282        | 0.8909 | 420  | 1.0940          | 23090296          |
| 0.3576        | 0.9015 | 425  | 1.0933          | 23369344          |
| 0.3411        | 0.9121 | 430  | 1.0934          | 23645208          |
| 0.2557        | 0.9227 | 435  | 1.0935          | 23919016          |
| 0.4153        | 0.9333 | 440  | 1.0922          | 24194664          |
| 0.3082        | 0.9439 | 445  | 1.0929          | 24470512          |
| 0.2994        | 0.9545 | 450  | 1.0925          | 24748488          |
| 0.2968        | 0.9651 | 455  | 1.0915          | 25029504          |
| 0.3045        | 0.9757 | 460  | 1.0936          | 25307368          |
| 0.2730        | 0.9863 | 465  | 1.0917          | 25584672          |
| 0.3096        | 0.9970 | 470  | 1.0909          | 25862576          |
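
Validation loss falls from 1.3909 at initialization to 1.0909 by the end of the single epoch; after an early rise around steps 30–45, it declines steadily for the remainder of training.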


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
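
To recreate this environment, the listed versions can be installed directly (the `+cu121` build of PyTorch comes from the CUDA 12.1 wheel index):

```shell
pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```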