---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
results: []
---
# collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1114
- Num Input Tokens Seen: 21798600
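For a quick smoke test, the checkpoint loads with plain Transformers. A minimal sketch (the `repo_id` below is a placeholder; substitute the full Hub id of this repository):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"  # placeholder: use the full Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Greedy generation as a sanity check; sampling settings are up to the user.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```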
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a sketch of how they map onto a TRL run follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
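A minimal sketch of how these settings might map onto a TRL run, assuming a recent TRL where `SFTConfig` extends `TrainingArguments` and carries `dataset_text_field`. The training corpus below is a hypothetical stand-in, since the card does not name the dataset:

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical stand-in corpus -- the card does not say which dataset was used.
train_dataset = Dataset.from_dict({"text": ["example document one", "example document two"]})

# Hyperparameters copied from the list above; everything else is left at its default.
# The Adam betas/epsilon on the card match the TrainingArguments defaults.
config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size of 128 on one device
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    dataset_text_field="text",
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",  # base model from the card header
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```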
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.5454 | 0.0129 | 5 | 1.3798 | 284096 |
| 1.5595 | 0.0258 | 10 | 1.2917 | 565048 |
| 1.4801 | 0.0388 | 15 | 1.2113 | 857320 |
| 1.2583 | 0.0517 | 20 | 1.1640 | 1143680 |
| 1.2536 | 0.0646 | 25 | 1.1396 | 1426896 |
| 1.1682 | 0.0775 | 30 | 1.1244 | 1704888 |
| 1.1565 | 0.0905 | 35 | 1.1242 | 1985240 |
| 1.0138 | 0.1034 | 40 | 1.1384 | 2269216 |
| 0.9845 | 0.1163 | 45 | 1.1461 | 2554344 |
| 0.91 | 0.1292 | 50 | 1.1554 | 2839272 |
| 0.9047 | 0.1422 | 55 | 1.1678 | 3127496 |
| 0.9137 | 0.1551 | 60 | 1.1697 | 3415328 |
| 0.8846 | 0.1680 | 65 | 1.1704 | 3692024 |
| 0.9215 | 0.1809 | 70 | 1.1719 | 3967168 |
| 0.8233 | 0.1939 | 75 | 1.1850 | 4244568 |
| 0.6717 | 0.2068 | 80 | 1.1881 | 4531936 |
| 0.7733 | 0.2197 | 85 | 1.1770 | 4817232 |
| 0.6835 | 0.2326 | 90 | 1.1663 | 5103112 |
| 0.7503 | 0.2456 | 95 | 1.1860 | 5388248 |
| 0.6998 | 0.2585 | 100 | 1.1702 | 5669656 |
| 0.615 | 0.2714 | 105 | 1.1739 | 5956384 |
| 0.5807 | 0.2843 | 110 | 1.1799 | 6233928 |
| 0.6475 | 0.2973 | 115 | 1.1703 | 6517360 |
| 0.649 | 0.3102 | 120 | 1.1702 | 6802600 |
| 0.6409 | 0.3231 | 125 | 1.1747 | 7086032 |
| 0.6033 | 0.3360 | 130 | 1.1629 | 7364952 |
| 0.4875 | 0.3489 | 135 | 1.1752 | 7650744 |
| 0.6259 | 0.3619 | 140 | 1.1664 | 7933080 |
| 0.5287 | 0.3748 | 145 | 1.1703 | 8220488 |
| 0.4745 | 0.3877 | 150 | 1.1645 | 8501544 |
| 0.4469 | 0.4006 | 155 | 1.1667 | 8781400 |
| 0.5011 | 0.4136 | 160 | 1.1652 | 9056664 |
| 0.4512 | 0.4265 | 165 | 1.1630 | 9337208 |
| 0.5347 | 0.4394 | 170 | 1.1630 | 9620568 |
| 0.5226 | 0.4523 | 175 | 1.1626 | 9896128 |
| 0.4775 | 0.4653 | 180 | 1.1568 | 10176840 |
| 0.5018 | 0.4782 | 185 | 1.1642 | 10461520 |
| 0.508 | 0.4911 | 190 | 1.1530 | 10741632 |
| 0.3972 | 0.5040 | 195 | 1.1550 | 11024096 |
| 0.4409 | 0.5170 | 200 | 1.1539 | 11301736 |
| 0.5384 | 0.5299 | 205 | 1.1477 | 11579816 |
| 0.4633 | 0.5428 | 210 | 1.1501 | 11865648 |
| 0.5198 | 0.5557 | 215 | 1.1410 | 12156088 |
| 0.3293 | 0.5687 | 220 | 1.1480 | 12434448 |
| 0.4762 | 0.5816 | 225 | 1.1375 | 12720344 |
| 0.5467 | 0.5945 | 230 | 1.1424 | 13003704 |
| 0.4776 | 0.6074 | 235 | 1.1361 | 13292824 |
| 0.4567 | 0.6204 | 240 | 1.1398 | 13574560 |
| 0.4565 | 0.6333 | 245 | 1.1371 | 13859632 |
| 0.4899 | 0.6462 | 250 | 1.1369 | 14136888 |
| 0.3492 | 0.6591 | 255 | 1.1327 | 14421200 |
| 0.4968 | 0.6721 | 260 | 1.1315 | 14707344 |
| 0.3487 | 0.6850 | 265 | 1.1329 | 14988680 |
| 0.4001 | 0.6979 | 270 | 1.1258 | 15267688 |
| 0.3161 | 0.7108 | 275 | 1.1308 | 15540888 |
| 0.4089 | 0.7237 | 280 | 1.1262 | 15816840 |
| 0.3835 | 0.7367 | 285 | 1.1289 | 16098568 |
| 0.4023 | 0.7496 | 290 | 1.1270 | 16387224 |
| 0.5333 | 0.7625 | 295 | 1.1243 | 16672848 |
| 0.492 | 0.7754 | 300 | 1.1276 | 16955104 |
| 0.3361 | 0.7884 | 305 | 1.1215 | 17232984 |
| 0.4585 | 0.8013 | 310 | 1.1210 | 17517512 |
| 0.3541 | 0.8142 | 315 | 1.1232 | 17805408 |
| 0.4862 | 0.8271 | 320 | 1.1195 | 18086744 |
| 0.5085 | 0.8401 | 325 | 1.1208 | 18374072 |
| 0.4206 | 0.8530 | 330 | 1.1198 | 18654568 |
| 0.3501 | 0.8659 | 335 | 1.1154 | 18936680 |
| 0.4675 | 0.8788 | 340 | 1.1207 | 19213288 |
| 0.3692 | 0.8918 | 345 | 1.1151 | 19495512 |
| 0.3526 | 0.9047 | 350 | 1.1162 | 19777904 |
| 0.5192 | 0.9176 | 355 | 1.1134 | 20053800 |
| 0.5117 | 0.9305 | 360 | 1.1101 | 20335472 |
| 0.3685 | 0.9435 | 365 | 1.1152 | 20620416 |
| 0.3554 | 0.9564 | 370 | 1.1103 | 20898680 |
| 0.4323 | 0.9693 | 375 | 1.1123 | 21181272 |
| 0.4111 | 0.9822 | 380 | 1.1120 | 21465480 |
| 0.3962 | 0.9952 | 385 | 1.1119 | 21742008 |
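Validation loss drops quickly over the first ~30 steps (1.3956 → 1.1244), drifts back up to about 1.19 near step 80, then declines steadily to roughly 1.11 by the end of the single epoch.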
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
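To reproduce this environment, the pins above can be checked at runtime; a minimal sketch (the cu121 wheel index is an assumption, as the card does not state how PyTorch was installed):

```python
# Assumes the pinned packages are installed, e.g. via:
#   pip install transformers==4.44.0 datasets==2.20.0 tokenizers==0.19.1
#   pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
import datasets
import tokenizers
import torch
import transformers

# Versions this card reports; startswith() tolerates local build suffixes like "+cu121".
for module, expected in [
    (transformers, "4.44.0"),
    (torch, "2.4.0"),
    (datasets, "2.20.0"),
    (tokenizers, "0.19.1"),
]:
    assert module.__version__.startswith(expected), (module.__name__, module.__version__)
print("environment matches the card")
```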