metadata

license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter9_sftsd0
    results: []

collapse_gemma-2-2b_hs2_replace_iter9_sftsd0

This model is a fine-tuned version of google/gemma-2-2b; the training dataset is not specified. It achieves the following results on the evaluation set:

Loss: 2.5305
Num Input Tokens Seen: 4805008
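
For a rough sense of scale, the evaluation loss can be converted to a perplexity. This is a minimal sketch that assumes the reported loss is a mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math

eval_loss = 2.5305  # final evaluation loss reported above

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

Under that assumption the value comes out at roughly 12.6.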

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 0
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
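
The relationship between these values can be checked with a little arithmetic. The sketch below is illustrative only: `num_devices = 1` is an assumption (the card does not state the device count), the total step count is estimated from the last row of the training results table, and `lr_at` is a hypothetical helper that mirrors the usual linear-warmup-then-constant shape rather than the trainer's own code.

```python
import math

train_batch_size = 8
gradient_accumulation_steps = 16
num_devices = 1  # assumption: not stated in the card

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Estimate the total number of optimizer steps from the last logged row
# of the results table (step 95 at epoch 0.9756).
estimated_total_steps = round(95 / 0.9756)

warmup_ratio = 0.05
warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)


def lr_at(step, base_lr=8e-06, warmup=warmup_steps):
    """Hypothetical helper: linear warmup to base_lr, then constant."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr


print(total_train_batch_size, estimated_total_steps, warmup_steps)
print([lr_at(s) for s in range(0, 8)])
```

With a single device, 8 x 16 reproduces the listed total_train_batch_size of 128, and the warmup covers only the first handful of optimizer steps.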

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3909	0
1.5231	0.0513	5	1.2790	249840
0.9028	0.1027	10	1.2972	494032
0.5406	0.1540	15	1.5467	747896
0.2367	0.2054	20	1.8042	994456
0.1891	0.2567	25	1.9924	1238888
0.0891	0.3081	30	2.1582	1483192
0.0613	0.3594	35	2.3303	1726032
0.0361	0.4108	40	2.4317	1973864
0.0255	0.4621	45	2.4696	2224064
0.0251	0.5135	50	2.5037	2481064
0.0244	0.5648	55	2.5279	2724856
0.0234	0.6162	60	2.5367	2979392
0.0255	0.6675	65	2.5210	3223656
0.0291	0.7189	70	2.5165	3468936
0.0237	0.7702	75	2.4977	3711296
0.0233	0.8216	80	2.4937	3960920
0.0217	0.8729	85	2.5052	4202464
0.0228	0.9243	90	2.5141	4452272
0.0221	0.9756	95	2.5258	4700624
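
The table can be summarized programmatically. The sketch below copies the step, validation loss, and token columns from the rows above and reports where validation loss is lowest and highest, plus the average number of input tokens per optimizer step. The variable names are illustrative, and what exactly "Input Tokens Seen" counts (for example, whether padding is included) is not stated in the card.

```python
# (step, validation_loss, input_tokens_seen), copied from the table above
rows = [
    (0, 1.3909, 0), (5, 1.2790, 249840), (10, 1.2972, 494032),
    (15, 1.5467, 747896), (20, 1.8042, 994456), (25, 1.9924, 1238888),
    (30, 2.1582, 1483192), (35, 2.3303, 1726032), (40, 2.4317, 1973864),
    (45, 2.4696, 2224064), (50, 2.5037, 2481064), (55, 2.5279, 2724856),
    (60, 2.5367, 2979392), (65, 2.5210, 3223656), (70, 2.5165, 3468936),
    (75, 2.4977, 3711296), (80, 2.4937, 3960920), (85, 2.5052, 4202464),
    (90, 2.5141, 4452272), (95, 2.5258, 4700624),
]

# Rows with the lowest and highest validation loss.
best = min(rows, key=lambda r: r[1])
worst = max(rows, key=lambda r: r[1])

# Average input tokens consumed per optimizer step, using the last row.
last_step, _, last_tokens = rows[-1]
tokens_per_step = last_tokens / last_step

print(f"lowest validation loss:  {best[1]} at step {best[0]}")
print(f"highest validation loss: {worst[1]} at step {worst[0]}")
print(f"average input tokens per step: {tokens_per_step:.0f}")
```

In these rows, validation loss is lowest at step 5 and climbs afterwards while training loss keeps falling, so the final checkpoint scores worse on the evaluation set than the step-0 baseline of 1.3909.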

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1