---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: gemma-2-2b_hs2_iter1_sftsd0
  results: []
---


# gemma-2-2b_hs2_iter1_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on a dataset that the training metadata does not identify.
It achieves the following results on the evaluation set:
- Loss: 1.5137
- Num Input Tokens Seen: 17829712
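
For readers who prefer perplexity, the reported loss can be converted with a one-line calculation. This is a minimal sketch that assumes the loss is the mean per-token cross-entropy in nats; the card does not state this, although it is the usual convention for causal language model training.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.5137

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the evaluation perplexity comes out at about 4.54.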

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 16
- num_epochs: 1
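
The batch-size and warmup settings above can be sanity-checked with a little arithmetic. The sketch below rests on two assumptions that this card does not confirm: that the total batch size is the per-device batch size times the accumulation steps times the number of devices, and that `constant_with_warmup` ramps the learning rate linearly from zero over the warmup steps and then holds it constant. The helper `lr_at` is a hypothetical name used only for illustration.

```python
# Values copied from the hyperparameter list in this card.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_steps = 16

# Assumption: total = per-device batch * accumulation steps * device count.
num_devices = total_train_batch_size // (
    train_batch_size * gradient_accumulation_steps
)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then constant.

    Assumption: the ramp starts at zero and reaches the full rate
    at `warmup_steps`.
    """
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print("implied device count:", num_devices)
for step in (0, 8, 16, 310):
    print(step, lr_at(step))
```

If the first assumption holds, the numbers imply training on a single device, with the full learning rate of 8e-06 in effect from step 16 onward.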

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6926        | 0.0160 | 5    | 1.3651          | 285696            |
| 1.5784        | 0.0320 | 10   | 1.2560          | 571560            |
| 1.4982        | 0.0480 | 15   | 1.1924          | 856528            |
| 1.3011        | 0.0640 | 20   | 1.1594          | 1137968           |
| 1.2692        | 0.0800 | 25   | 1.1378          | 1423392           |
| 1.2069        | 0.0960 | 30   | 1.1500          | 1706944           |
| 1.1563        | 0.1120 | 35   | 1.1761          | 1988224           |
| 1.0316        | 0.1279 | 40   | 1.2207          | 2272264           |
| 0.9047        | 0.1439 | 45   | 1.2716          | 2559864           |
| 0.8926        | 0.1599 | 50   | 1.3145          | 2846920           |
| 0.7537        | 0.1759 | 55   | 1.3610          | 3135896           |
| 0.7882        | 0.1919 | 60   | 1.4222          | 3418728           |
| 0.6266        | 0.2079 | 65   | 1.4826          | 3699056           |
| 0.5966        | 0.2239 | 70   | 1.5111          | 3982712           |
| 0.5862        | 0.2399 | 75   | 1.5479          | 4266016           |
| 0.4099        | 0.2559 | 80   | 1.5246          | 4545624           |
| 0.4380        | 0.2719 | 85   | 1.5312          | 4834416           |
| 0.4268        | 0.2879 | 90   | 1.5651          | 5115616           |
| 0.3835        | 0.3039 | 95   | 1.5781          | 5404872           |
| 0.3936        | 0.3199 | 100  | 1.6049          | 5693440           |
| 0.2999        | 0.3359 | 105  | 1.5558          | 5979936           |
| 0.3388        | 0.3519 | 110  | 1.5853          | 6265272           |
| 0.2141        | 0.3679 | 115  | 1.6082          | 6550008           |
| 0.1951        | 0.3838 | 120  | 1.5357          | 6829896           |
| 0.2827        | 0.3998 | 125  | 1.5383          | 7119640           |
| 0.1915        | 0.4158 | 130  | 1.5876          | 7401968           |
| 0.1656        | 0.4318 | 135  | 1.5285          | 7693464           |
| 0.1482        | 0.4478 | 140  | 1.5381          | 7979480           |
| 0.1831        | 0.4638 | 145  | 1.5497          | 8273408           |
| 0.2056        | 0.4798 | 150  | 1.5419          | 8564664           |
| 0.1866        | 0.4958 | 155  | 1.5257          | 8852896           |
| 0.1868        | 0.5118 | 160  | 1.5287          | 9138384           |
| 0.0985        | 0.5278 | 165  | 1.4843          | 9419648           |
| 0.1397        | 0.5438 | 170  | 1.4939          | 9704104           |
| 0.1592        | 0.5598 | 175  | 1.4628          | 9987840           |
| 0.1712        | 0.5758 | 180  | 1.4940          | 10272800          |
| 0.1482        | 0.5918 | 185  | 1.4714          | 10556720          |
| 0.0878        | 0.6078 | 190  | 1.4612          | 10842864          |
| 0.1269        | 0.6238 | 195  | 1.4885          | 11129280          |
| 0.0927        | 0.6397 | 200  | 1.4619          | 11410784          |
| 0.1429        | 0.6557 | 205  | 1.4507          | 11694648          |
| 0.1545        | 0.6717 | 210  | 1.4523          | 11981880          |
| 0.1168        | 0.6877 | 215  | 1.4535          | 12272496          |
| 0.1750        | 0.7037 | 220  | 1.4501          | 12558896          |
| 0.0869        | 0.7197 | 225  | 1.4673          | 12842440          |
| 0.1086        | 0.7357 | 230  | 1.4905          | 13130608          |
| 0.1035        | 0.7517 | 235  | 1.4422          | 13411360          |
| 0.1142        | 0.7677 | 240  | 1.4519          | 13695520          |
| 0.0910        | 0.7837 | 245  | 1.4698          | 13980728          |
| 0.1734        | 0.7997 | 250  | 1.4578          | 14276136          |
| 0.1470        | 0.8157 | 255  | 1.4818          | 14560480          |
| 0.1138        | 0.8317 | 260  | 1.4677          | 14848512          |
| 0.0635        | 0.8477 | 265  | 1.4703          | 15136488          |
| 0.2047        | 0.8637 | 270  | 1.4876          | 15423352          |
| 0.1162        | 0.8796 | 275  | 1.4672          | 15707888          |
| 0.1132        | 0.8956 | 280  | 1.4634          | 15990288          |
| 0.1231        | 0.9116 | 285  | 1.4662          | 16275832          |
| 0.1544        | 0.9276 | 290  | 1.5047          | 16564968          |
| 0.1852        | 0.9436 | 295  | 1.4825          | 16851368          |
| 0.1406        | 0.9596 | 300  | 1.4831          | 17142256          |
| 0.1188        | 0.9756 | 305  | 1.5429          | 17429064          |
| 0.1442        | 0.9916 | 310  | 1.5211          | 17714264          |
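
The table is easier to read with the best validation checkpoint picked out. The sketch below copies a handful of rows from the table (not all of them) and finds the step with the lowest validation loss; the variable names are illustrative only.

```python
# (step, training_loss, validation_loss) for a subset of rows from the table.
# Step 0 has no logged training loss, hence None.
rows = [
    (0, None, 1.3956),
    (5, 1.6926, 1.3651),
    (10, 1.5784, 1.2560),
    (15, 1.4982, 1.1924),
    (20, 1.3011, 1.1594),
    (25, 1.2692, 1.1378),
    (30, 1.2069, 1.1500),
    (35, 1.1563, 1.1761),
    (50, 0.8926, 1.3145),
    (100, 0.3936, 1.6049),
    (200, 0.0927, 1.4619),
    (310, 0.1442, 1.5211),
]

# Row with the lowest validation loss.
best = min(rows, key=lambda row: row[2])

# Gap between validation and training loss at the last logged step.
final_step, final_train, final_val = rows[-1]
gap = final_val - final_train

print(f"best validation loss {best[2]} at step {best[0]}")
print(f"final train/validation gap at step {final_step}: {gap:.4f}")
```

Validation loss is lowest at step 25 (1.1378) and rises afterwards while training loss keeps falling, so the final checkpoint is not the one with the best validation loss.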


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
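
To compare a local environment against these versions, something like the following may help. It is a sketch using only the standard library; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the PyPI names of the listed frameworks, and `version_mismatches` is a hypothetical helper. The local build suffix of PyTorch (here `+cu121`) may legitimately differ between installs.

```python
from importlib import metadata

# Versions listed in this card, keyed by the assumed PyPI distribution names.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected: dict) -> dict:
    """Return {package: (expected, installed_or_None)} for packages that differ."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in version_mismatches(EXPECTED).items():
    print(f"{name}: card lists {wanted}, found {installed or 'not installed'}")
```

An empty result means every listed package matches exactly; any line printed points at a package to pin or install before trying to reproduce the run.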