---
license: gemma
base_model: jkazdan/step_val_25_gemma-2-2b_hs2_iter1_sftsd2
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd2
  results: []
---


# augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd2

This model is a fine-tuned version of [jkazdan/step_val_25_gemma-2-2b_hs2_iter1_sftsd2](https://huggingface.co/jkazdan/step_val_25_gemma-2-2b_hs2_iter1_sftsd2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5025
- Num Input Tokens Seen: 7865232
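
A minimal loading sketch follows, assuming the checkpoint is published on the Hugging Face Hub under the same `jkazdan` namespace as the base model. The repository id, the `generate` helper, and the generation settings are assumptions made for illustration and are not taken from this card.

```python
# Assumed repository id: the base model's namespace plus this card's model name.
MODEL_ID = "jkazdan/augmented_step_val_25_gemma-2-2b_hs2_iter1_sftsd2"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return a greedy completion for `prompt`.

    Hypothetical helper: it reloads the model on every call, which is fine
    for a quick check but wasteful for repeated use.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding (do_sample=False) keeps the output deterministic.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Some prompt")` downloads the weights on first use. Gemma checkpoints are gated on the Hub, so this likely requires accepting the license and authenticating first.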

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
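
The batch-size figures above are related by simple arithmetic, sketched below. The sketch assumes a single training device, which the card does not state but which is consistent with 8 × 16 = 128. The `training_kwargs` dictionary is a hypothetical mapping of the listed values onto `transformers.TrainingArguments` field names, not the author's actual configuration.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 16
num_devices = 1  # assumption: the device count is not stated in the card

# Each optimizer step sees this many examples.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # 128

# Hypothetical mapping onto TrainingArguments keyword names.
# Treating train_batch_size as per-device relies on the single-device assumption.
training_kwargs = {
    "learning_rate": 8e-6,
    "per_device_train_batch_size": train_batch_size,
    "per_device_eval_batch_size": 16,
    "seed": 2,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_ratio": 0.05,
    "num_train_epochs": 1,
}
```

With `transformers` installed, these could be passed as `TrainingArguments(output_dir="...", **training_kwargs)`, where the output directory is a placeholder.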

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.0950          | 0                 |
| 1.4558        | 0.0345 | 5    | 1.0942          | 274624            |
| 1.2848        | 0.0690 | 10   | 1.1065          | 546200            |
| 1.0788        | 0.1035 | 15   | 1.1339          | 817584            |
| 0.9149        | 0.1380 | 20   | 1.1915          | 1088176           |
| 0.8855        | 0.1725 | 25   | 1.2506          | 1358336           |
| 0.8151        | 0.2070 | 30   | 1.3419          | 1637992           |
| 0.5913        | 0.2415 | 35   | 1.3767          | 1911376           |
| 0.5641        | 0.2760 | 40   | 1.4619          | 2181176           |
| 0.5135        | 0.3105 | 45   | 1.4701          | 2462856           |
| 0.3350        | 0.3450 | 50   | 1.4866          | 2737752           |
| 0.3320        | 0.3795 | 55   | 1.5121          | 3012656           |
| 0.3655        | 0.4140 | 60   | 1.4798          | 3279744           |
| 0.2490        | 0.4485 | 65   | 1.4564          | 3547808           |
| 0.2495        | 0.4830 | 70   | 1.4986          | 3817328           |
| 0.2821        | 0.5175 | 75   | 1.4208          | 4097184           |
| 0.1291        | 0.5520 | 80   | 1.4710          | 4367848           |
| 0.2026        | 0.5865 | 85   | 1.4296          | 4640592           |
| 0.2365        | 0.6210 | 90   | 1.5041          | 4922032           |
| 0.1523        | 0.6555 | 95   | 1.4437          | 5193088           |
| 0.1677        | 0.6900 | 100  | 1.4660          | 5464864           |
| 0.2233        | 0.7245 | 105  | 1.4473          | 5739032           |
| 0.1273        | 0.7589 | 110  | 1.4308          | 6012736           |
| 0.1756        | 0.7934 | 115  | 1.4913          | 6274808           |
| 0.1822        | 0.8279 | 120  | 1.4676          | 6548312           |
| 0.1255        | 0.8624 | 125  | 1.4698          | 6821112           |
| 0.1072        | 0.8969 | 130  | 1.4484          | 7098736           |
| 0.1329        | 0.9314 | 135  | 1.4401          | 7369552           |
| 0.1040        | 0.9659 | 140  | 1.4771          | 7640632           |
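
One way to read the table is to locate where validation loss bottoms out. The sketch below copies the (step, validation loss) pairs from the rows above and reports the lowest and highest values. Validation loss is lowest at step 5 and rises while training loss keeps falling, a pattern that often indicates overfitting, though the card itself draws no such conclusion.

```python
# (step, validation loss) pairs copied from the training results table.
val_loss_by_step = [
    (0, 1.0950), (5, 1.0942), (10, 1.1065), (15, 1.1339), (20, 1.1915),
    (25, 1.2506), (30, 1.3419), (35, 1.3767), (40, 1.4619), (45, 1.4701),
    (50, 1.4866), (55, 1.5121), (60, 1.4798), (65, 1.4564), (70, 1.4986),
    (75, 1.4208), (80, 1.4710), (85, 1.4296), (90, 1.5041), (95, 1.4437),
    (100, 1.4660), (105, 1.4473), (110, 1.4308), (115, 1.4913), (120, 1.4676),
    (125, 1.4698), (130, 1.4484), (135, 1.4401), (140, 1.4771),
]

# Select the rows with the smallest and largest validation loss.
best = min(val_loss_by_step, key=lambda row: row[1])
worst = max(val_loss_by_step, key=lambda row: row[1])
print(f"lowest validation loss:  {best[1]:.4f} at step {best[0]}")
print(f"highest validation loss: {worst[1]:.4f} at step {worst[0]}")

# Average input tokens consumed per optimizer step, from the last table row.
tokens_per_step = round(7640632 / 140)
print(f"approx. tokens per step: {tokens_per_step}")
```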


### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1