---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter8_sftsd2
    results: []
---

collapse_gemma-2-2b_hs2_replace_iter8_sftsd2

This model is a fine-tuned version of google/gemma-2-2b. The training dataset is not recorded in this card. It achieves the following results on the evaluation set:

Loss: 2.5160
Num Input Tokens Seen: 4690368

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 2
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
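
The relationship between these values can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the training script: `num_devices = 1` is inferred from 8 × 16 = 128, `assumed_total_steps = 98` is an estimate read off the results table (step 95 falls at epoch 0.97), and rounding the warmup length up is likewise an assumption. The helper `lr_at` is a hypothetical name that mimics a constant-with-warmup schedule.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 8e-06
train_batch_size = 8              # per-device batch size
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# Assumption: one training device, which is what makes 8 * 16 equal 128.
num_devices = 1
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: about 98 optimizer steps in the epoch, estimated from the
# results table. This is not a value reported by the card.
assumed_total_steps = 98
num_warmup_steps = math.ceil(assumed_total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear ramp from 0 to the peak rate during warmup, then constant."""
    if step < num_warmup_steps:
        return learning_rate * step / max(1, num_warmup_steps)
    return learning_rate


print("effective batch size:", effective_batch_size)
print("warmup steps (estimated):", num_warmup_steps)
for step in (0, 1, num_warmup_steps, 95):
    print(f"step {step:>2}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate reaches 8e-06 after roughly five optimizer steps and stays there for the rest of the epoch.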

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3909	0
1.6266	0.0511	5	1.2779	235280
0.9347	0.1021	10	1.2886	475376
0.4911	0.1532	15	1.4993	716736
0.186	0.2042	20	1.7498	957152
0.1313	0.2553	25	1.9671	1198616
0.0834	0.3063	30	2.1763	1438792
0.0339	0.3574	35	2.2526	1680696
0.0429	0.4084	40	2.3952	1923576
0.0276	0.4595	45	2.4498	2170944
0.0286	0.5105	50	2.4620	2412248
0.0248	0.5616	55	2.4851	2656800
0.0245	0.6126	60	2.4803	2896688
0.0223	0.6637	65	2.4790	3133576
0.0237	0.7147	70	2.4814	3379344
0.0232	0.7658	75	2.4871	3623816
0.0267	0.8168	80	2.4882	3864024
0.0206	0.8679	85	2.4975	4105224
0.0218	0.9190	90	2.5140	4349992
0.023	0.9700	95	2.5206	4592336
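
A short Python sketch can help when reading the table. It copies the logged rows into a list and reports where validation loss is lowest and how far it has moved by the last logged step. The variable names are illustrative and nothing here comes from the original training code. The card itself draws no conclusion from these numbers.

```python
# (step, training_loss, validation_loss) copied from the table above.
# The step-0 row has no logged training loss, so it is stored as None.
rows = [
    (0, None, 1.3909),
    (5, 1.6266, 1.2779),
    (10, 0.9347, 1.2886),
    (15, 0.4911, 1.4993),
    (20, 0.186, 1.7498),
    (25, 0.1313, 1.9671),
    (30, 0.0834, 2.1763),
    (35, 0.0339, 2.2526),
    (40, 0.0429, 2.3952),
    (45, 0.0276, 2.4498),
    (50, 0.0286, 2.4620),
    (55, 0.0248, 2.4851),
    (60, 0.0245, 2.4803),
    (65, 0.0223, 2.4790),
    (70, 0.0237, 2.4814),
    (75, 0.0232, 2.4871),
    (80, 0.0267, 2.4882),
    (85, 0.0206, 2.4975),
    (90, 0.0218, 2.5140),
    (95, 0.023, 2.5206),
]

# Row with the lowest validation loss.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Last logged row, and the change in validation loss relative to the best row.
final_step, final_train, final_val = rows[-1]
val_increase = round(final_val - best_val, 4)

print(f"lowest validation loss: {best_val} at step {best_step}")
print(f"final logged row: step {final_step}, train {final_train}, val {final_val}")
print(f"validation loss increase since best step: {val_increase}")
```

In the logged rows, validation loss is lowest at step 5 and rises afterwards while training loss keeps falling, a pattern commonly associated with overfitting.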

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1