---
license: gemma
base_model: google/gemma-2-9b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1
  results: []
---

# collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9448
- Num Input Tokens Seen: 14395896

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
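For reference, here is a minimal sketch (not the exact training script) of how these hyperparameters might map onto `transformers.TrainingArguments`; the `trl`/`sft` tags suggest training went through TRL's `SFTTrainer`. The `output_dir` is an assumption, and the training dataset is not documented in this card.

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
# output_dir is an assumption; the dataset used for training is unknown.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1",
    learning_rate=8e-06,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,               # lr_scheduler_warmup_ratio: 0.05
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas/epsilon match the
    adam_beta2=0.999,                # AdamW defaults listed above
    adam_epsilon=1e-08,
)
```

With `per_device_train_batch_size=4` and `gradient_accumulation_steps=32` on a single device, the effective batch size works out to the listed total_train_batch_size of 128.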
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.2335 | 0 |
| 1.2638 | 0.0178 | 5 | 1.1350 | 263564 |
| 1.088 | 0.0356 | 10 | 1.0496 | 522096 |
| 0.9278 | 0.0534 | 15 | 1.0059 | 770488 |
| 0.7003 | 0.0712 | 20 | 1.0038 | 1030136 |
| 0.6094 | 0.0889 | 25 | 1.0117 | 1293096 |
| 0.5915 | 0.1067 | 30 | 1.0084 | 1544952 |
| 0.571 | 0.1245 | 35 | 1.0023 | 1798880 |
| 0.4553 | 0.1423 | 40 | 1.0002 | 2052424 |
| 0.4776 | 0.1601 | 45 | 0.9951 | 2308216 |
| 0.4561 | 0.1779 | 50 | 0.9884 | 2565080 |
| 0.4392 | 0.1957 | 55 | 0.9841 | 2825996 |
| 0.4753 | 0.2135 | 60 | 0.9797 | 3082260 |
| 0.4597 | 0.2313 | 65 | 0.9759 | 3328388 |
| 0.436 | 0.2491 | 70 | 0.9738 | 3584552 |
| 0.3907 | 0.2668 | 75 | 0.9703 | 3839180 |
| 0.4001 | 0.2846 | 80 | 0.9676 | 4100568 |
| 0.4112 | 0.3024 | 85 | 0.9671 | 4356852 |
| 0.4249 | 0.3202 | 90 | 0.9659 | 4610688 |
| 0.3945 | 0.3380 | 95 | 0.9654 | 4859752 |
| 0.5615 | 0.3558 | 100 | 0.9627 | 5108284 |
| 0.3528 | 0.3736 | 105 | 0.9619 | 5363428 |
| 0.3511 | 0.3914 | 110 | 0.9629 | 5623372 |
| 0.3744 | 0.4092 | 115 | 0.9600 | 5876016 |
| 0.4473 | 0.4270 | 120 | 0.9598 | 6139008 |
| 0.465 | 0.4447 | 125 | 0.9595 | 6392720 |
| 0.4511 | 0.4625 | 130 | 0.9568 | 6655704 |
| 0.3273 | 0.4803 | 135 | 0.9570 | 6909620 |
| 0.3689 | 0.4981 | 140 | 0.9575 | 7163740 |
| 0.3782 | 0.5159 | 145 | 0.9551 | 7424140 |
| 0.4371 | 0.5337 | 150 | 0.9541 | 7682936 |
| 0.3295 | 0.5515 | 155 | 0.9543 | 7939780 |
| 0.3631 | 0.5693 | 160 | 0.9533 | 8196216 |
| 0.4747 | 0.5871 | 165 | 0.9532 | 8457568 |
| 0.4171 | 0.6048 | 170 | 0.9545 | 8708980 |
| 0.4043 | 0.6226 | 175 | 0.9535 | 8963244 |
| 0.3966 | 0.6404 | 180 | 0.9523 | 9216124 |
| 0.487 | 0.6582 | 185 | 0.9520 | 9470216 |
| 0.4243 | 0.6760 | 190 | 0.9523 | 9726172 |
| 0.338 | 0.6938 | 195 | 0.9505 | 9978316 |
| 0.3794 | 0.7116 | 200 | 0.9510 | 10237320 |
| 0.4474 | 0.7294 | 205 | 0.9515 | 10498692 |
| 0.498 | 0.7472 | 210 | 0.9510 | 10755164 |
| 0.3557 | 0.7650 | 215 | 0.9505 | 11013492 |
| 0.3772 | 0.7827 | 220 | 0.9503 | 11263256 |
| 0.4487 | 0.8005 | 225 | 0.9509 | 11524460 |
| 0.3492 | 0.8183 | 230 | 0.9481 | 11776848 |
| 0.4046 | 0.8361 | 235 | 0.9483 | 12034428 |
| 0.3995 | 0.8539 | 240 | 0.9484 | 12301540 |
| 0.345 | 0.8717 | 245 | 0.9485 | 12558184 |
| 0.3618 | 0.8895 | 250 | 0.9476 | 12818680 |
| 0.286 | 0.9073 | 255 | 0.9476 | 13077536 |
| 0.368 | 0.9251 | 260 | 0.9487 | 13332544 |
| 0.3742 | 0.9429 | 265 | 0.9456 | 13585628 |
| 0.4091 | 0.9606 | 270 | 0.9465 | 13838300 |
| 0.3315 | 0.9784 | 275 | 0.9469 | 14090880 |
| 0.3664 | 0.9962 | 280 | 0.9449 | 14344624 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
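### How to load (sketch)

A minimal inference sketch, assuming the checkpoint is hosted on the Hub under the model name above; `<user>` is a placeholder for the actual namespace, which is not given in this card, and the dtype/device choices are illustrative rather than requirements.

```python
# Sketch only: load the fine-tuned checkpoint for inference.
# "<user>" is a placeholder for the Hub namespace hosting this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<user>/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; Gemma 2 is commonly run in bf16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("Hello, ", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```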