---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: gemma-2-2b_hs2_iter1_sftsd1
  results: []
---

# gemma-2-2b_hs2_iter1_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4676
- Num Input Tokens Seen: 8712000
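
For a rough sense of scale, the reported loss can be converted to a perplexity. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models trained with the Trainer but is not confirmed by this card. The variable names are illustrative.

```python
import math

# Final evaluation loss reported above.
eval_loss = 1.4676

# Assumption: the loss is mean per-token cross-entropy measured in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the perplexity comes out at roughly 4.3.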

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 16
- num_epochs: 1
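
The relationship between the batch-size entries and the shape of the learning-rate schedule can be sketched in plain Python. This is an illustration, not the Trainer's own code: `num_devices = 1` is an assumption (the card does not state the device count, but it is the value that reproduces the listed total of 128), and `lr_at` is a hypothetical helper that mirrors the usual meaning of `constant_with_warmup`, a linear ramp followed by a constant rate.

```python
# Values copied from the hyperparameter list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_steps = 16

# Assumption: a single training device; the card does not state the device count.
num_devices = 1

# Number of sequences that contribute to each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def lr_at(step: int) -> float:
    """Linear warmup from 0 to learning_rate over warmup_steps, then constant."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 4, 8, 16, 155):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

With these values the learning rate reaches its full 8e-06 at optimizer step 16 and stays there for the rest of the epoch.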

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.7677        | 0.0320 | 5    | 1.3656          | 271360            |
| 1.5854        | 0.0639 | 10   | 1.2596          | 545792            |
| 1.4739        | 0.0959 | 15   | 1.1954          | 825128            |
| 1.2857        | 0.1279 | 20   | 1.1738          | 1108408           |
| 1.1840        | 0.1599 | 25   | 1.1751          | 1388256           |
| 0.9207        | 0.1918 | 30   | 1.2445          | 1667528           |
| 0.8665        | 0.2238 | 35   | 1.2921          | 1949712           |
| 0.7163        | 0.2558 | 40   | 1.4105          | 2223872           |
| 0.5853        | 0.2878 | 45   | 1.4211          | 2497184           |
| 0.5139        | 0.3197 | 50   | 1.5440          | 2777320           |
| 0.4299        | 0.3517 | 55   | 1.5069          | 3057528           |
| 0.3458        | 0.3837 | 60   | 1.5679          | 3331488           |
| 0.2913        | 0.4157 | 65   | 1.5084          | 3611304           |
| 0.2654        | 0.4476 | 70   | 1.5051          | 3897800           |
| 0.2249        | 0.4796 | 75   | 1.5396          | 4174176           |
| 0.3085        | 0.5116 | 80   | 1.5069          | 4450528           |
| 0.1601        | 0.5436 | 85   | 1.5507          | 4732680           |
| 0.1126        | 0.5755 | 90   | 1.4520          | 5015288           |
| 0.1922        | 0.6075 | 95   | 1.4548          | 5296336           |
| 0.1709        | 0.6395 | 100  | 1.4422          | 5578672           |
| 0.1558        | 0.6715 | 105  | 1.4477          | 5860328           |
| 0.0981        | 0.7034 | 110  | 1.4791          | 6141568           |
| 0.1635        | 0.7354 | 115  | 1.4351          | 6424480           |
| 0.1061        | 0.7674 | 120  | 1.4498          | 6706048           |
| 0.1590        | 0.7994 | 125  | 1.4220          | 6990040           |
| 0.0759        | 0.8313 | 130  | 1.4819          | 7264776           |
| 0.0897        | 0.8633 | 135  | 1.4187          | 7543304           |
| 0.1316        | 0.8953 | 140  | 1.4371          | 7823792           |
| 0.1955        | 0.9273 | 145  | 1.4277          | 8101728           |
| 0.1215        | 0.9592 | 150  | 1.4345          | 8378768           |
| 0.1174        | 0.9912 | 155  | 1.4765          | 8659448           |
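
In the table above, validation loss reaches its minimum early and then rises while training loss keeps falling. A minimal sketch for locating that minimum, using the (step, validation loss) pairs copied from the table; the names `eval_history`, `best_step`, and `best_loss` are illustrative only.

```python
# (step, validation loss) pairs copied from the training results table.
eval_history = [
    (0, 1.3956), (5, 1.3656), (10, 1.2596), (15, 1.1954),
    (20, 1.1738), (25, 1.1751), (30, 1.2445), (35, 1.2921),
    (40, 1.4105), (45, 1.4211), (50, 1.5440), (55, 1.5069),
    (60, 1.5679), (65, 1.5084), (70, 1.5051), (75, 1.5396),
    (80, 1.5069), (85, 1.5507), (90, 1.4520), (95, 1.4548),
    (100, 1.4422), (105, 1.4477), (110, 1.4791), (115, 1.4351),
    (120, 1.4498), (125, 1.4220), (130, 1.4819), (135, 1.4187),
    (140, 1.4371), (145, 1.4277), (150, 1.4345), (155, 1.4765),
]

# Pick the evaluation point with the lowest validation loss.
best_step, best_loss = min(eval_history, key=lambda row: row[1])

# Compare against the last logged evaluation.
last_step, last_loss = eval_history[-1]

print(f"lowest validation loss: {best_loss} at step {best_step}")
print(f"last logged validation loss: {last_loss} at step {last_step}")
```

The lowest logged validation loss is 1.1738 at step 20, compared with 1.4765 at step 155. Whether an earlier checkpoint would be preferable depends on the task and on which checkpoints were saved, neither of which this card records.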


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1