---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1004
- Num Input Tokens Seen: 20726616

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
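For reference, a minimal sketch of how these settings map onto `transformers.TrainingArguments`; the `output_dir` is an assumption, and the dataset/trainer wiring is omitted because the card does not specify them:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
# output_dir is a placeholder; the card does not name one.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=16,   # eval_batch_size
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```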
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5618 | 0.0133 | 5 | 1.3747 | 274336 |
| 1.4834 | 0.0266 | 10 | 1.2818 | 548560 |
| 1.2778 | 0.0399 | 15 | 1.2113 | 826768 |
| 1.2063 | 0.0532 | 20 | 1.1648 | 1100984 |
| 1.0763 | 0.0666 | 25 | 1.1554 | 1381272 |
| 1.0008 | 0.0799 | 30 | 1.1420 | 1655904 |
| 1.0066 | 0.0932 | 35 | 1.1522 | 1934384 |
| 1.0122 | 0.1065 | 40 | 1.1650 | 2209128 |
| 0.8869 | 0.1198 | 45 | 1.1676 | 2482008 |
| 0.8353 | 0.1331 | 50 | 1.1729 | 2757616 |
| 0.7535 | 0.1464 | 55 | 1.1702 | 3028816 |
| 0.677 | 0.1597 | 60 | 1.1699 | 3306688 |
| 0.6353 | 0.1730 | 65 | 1.1718 | 3583176 |
| 0.7474 | 0.1864 | 70 | 1.1582 | 3862120 |
| 0.6487 | 0.1997 | 75 | 1.1621 | 4134624 |
| 0.5399 | 0.2130 | 80 | 1.1678 | 4413112 |
| 0.4752 | 0.2263 | 85 | 1.1588 | 4680680 |
| 0.6822 | 0.2396 | 90 | 1.1598 | 4959520 |
| 0.5627 | 0.2529 | 95 | 1.1590 | 5237032 |
| 0.5604 | 0.2662 | 100 | 1.1571 | 5520816 |
| 0.4439 | 0.2795 | 105 | 1.1547 | 5791784 |
| 0.5118 | 0.2928 | 110 | 1.1562 | 6070648 |
| 0.5673 | 0.3062 | 115 | 1.1532 | 6350816 |
| 0.5077 | 0.3195 | 120 | 1.1491 | 6624856 |
| 0.4819 | 0.3328 | 125 | 1.1451 | 6903024 |
| 0.4622 | 0.3461 | 130 | 1.1461 | 7179008 |
| 0.5332 | 0.3594 | 135 | 1.1403 | 7459288 |
| 0.4536 | 0.3727 | 140 | 1.1447 | 7736168 |
| 0.4125 | 0.3860 | 145 | 1.1386 | 8007400 |
| 0.4507 | 0.3993 | 150 | 1.1381 | 8280296 |
| 0.4411 | 0.4126 | 155 | 1.1353 | 8563096 |
| 0.4867 | 0.4260 | 160 | 1.1342 | 8835744 |
| 0.4239 | 0.4393 | 165 | 1.1335 | 9116184 |
| 0.5198 | 0.4526 | 170 | 1.1308 | 9394976 |
| 0.502 | 0.4659 | 175 | 1.1320 | 9676488 |
| 0.5138 | 0.4792 | 180 | 1.1265 | 9952384 |
| 0.4501 | 0.4925 | 185 | 1.1288 | 10223640 |
| 0.4448 | 0.5058 | 190 | 1.1268 | 10503360 |
| 0.4864 | 0.5191 | 195 | 1.1272 | 10783504 |
| 0.5137 | 0.5324 | 200 | 1.1228 | 11061016 |
| 0.4463 | 0.5458 | 205 | 1.1251 | 11334176 |
| 0.5183 | 0.5591 | 210 | 1.1237 | 11611680 |
| 0.4873 | 0.5724 | 215 | 1.1226 | 11889528 |
| 0.4598 | 0.5857 | 220 | 1.1200 | 12165672 |
| 0.4974 | 0.5990 | 225 | 1.1180 | 12447680 |
| 0.307 | 0.6123 | 230 | 1.1191 | 12719352 |
| 0.4302 | 0.6256 | 235 | 1.1154 | 12992608 |
| 0.3704 | 0.6389 | 240 | 1.1187 | 13269640 |
| 0.43 | 0.6522 | 245 | 1.1155 | 13545056 |
| 0.3751 | 0.6656 | 250 | 1.1142 | 13821752 |
| 0.349 | 0.6789 | 255 | 1.1122 | 14096592 |
| 0.4908 | 0.6922 | 260 | 1.1105 | 14370976 |
| 0.4156 | 0.7055 | 265 | 1.1105 | 14647576 |
| 0.3021 | 0.7188 | 270 | 1.1102 | 14927104 |
| 0.4337 | 0.7321 | 275 | 1.1104 | 15202424 |
| 0.4187 | 0.7454 | 280 | 1.1080 | 15479160 |
| 0.3928 | 0.7587 | 285 | 1.1124 | 15758584 |
| 0.4093 | 0.7720 | 290 | 1.1058 | 16040872 |
| 0.474 | 0.7854 | 295 | 1.1074 | 16312664 |
| 0.4337 | 0.7987 | 300 | 1.1079 | 16592008 |
| 0.2634 | 0.8120 | 305 | 1.1057 | 16866912 |
| 0.3113 | 0.8253 | 310 | 1.1055 | 17146272 |
| 0.4897 | 0.8386 | 315 | 1.1059 | 17425624 |
| 0.4663 | 0.8519 | 320 | 1.1031 | 17698920 |
| 0.4878 | 0.8652 | 325 | 1.1059 | 17972416 |
| 0.3575 | 0.8785 | 330 | 1.1049 | 18246352 |
| 0.406 | 0.8918 | 335 | 1.1022 | 18522448 |
| 0.4651 | 0.9052 | 340 | 1.1042 | 18798208 |
| 0.4508 | 0.9185 | 345 | 1.1032 | 19069304 |
| 0.442 | 0.9318 | 350 | 1.1019 | 19352272 |
| 0.3781 | 0.9451 | 355 | 1.1029 | 19630952 |
| 0.4462 | 0.9584 | 360 | 1.0998 | 19903896 |
| 0.3345 | 0.9717 | 365 | 1.1027 | 20176392 |
| 0.4672 | 0.9850 | 370 | 1.1001 | 20451160 |
| 0.3621 | 0.9983 | 375 | 1.1004 | 20726616 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
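## How to use

A minimal inference sketch, assuming the checkpoint is published under the model name above (adjust `model_id` to the actual repository id):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with wherever this checkpoint is hosted.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```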