---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter5_sftsd2
  results: []
---

# collapse_gemma-2-2b_hs2_replace_iter5_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 2.2408
- Num input tokens seen: 4919592

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
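As a sanity check, the effective batch size and approximate warmup length implied by these settings can be derived directly. The total optimizer step count used below is an estimate read off the results table (step 95 ≈ epoch 0.9750), not a logged value:

```python
import math

# Hyperparameters from the list above
train_batch_size = 8
gradient_accumulation_steps = 16
lr_scheduler_warmup_ratio = 0.05

# Effective (total) train batch size: per-device batch * accumulation steps
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the reported value

# Approximate warmup length: step 95 corresponds to epoch 0.9750 in the
# results table, so 95 / 0.9750 ≈ 97 optimizer steps per epoch (an estimate)
total_steps = round(95 / 0.9750)
warmup_steps = math.ceil(lr_scheduler_warmup_ratio * total_steps)
print(warmup_steps)  # 5
```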

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.7001        | 0.0513 | 5    | 1.2745          | 250576            |
| 0.9776        | 0.1026 | 10   | 1.2553          | 507984            |
| 0.6411        | 0.1539 | 15   | 1.4389          | 759912            |
| 0.3549        | 0.2053 | 20   | 1.6506          | 1017344           |
| 0.1684        | 0.2566 | 25   | 1.8128          | 1261720           |
| 0.0927        | 0.3079 | 30   | 1.9916          | 1509936           |
| 0.088         | 0.3592 | 35   | 2.1525          | 1762648           |
| 0.0417        | 0.4105 | 40   | 2.2521          | 2020112           |
| 0.0409        | 0.4618 | 45   | 2.2578          | 2273928           |
| 0.0342        | 0.5131 | 50   | 2.2295          | 2525056           |
| 0.0366        | 0.5645 | 55   | 2.2589          | 2779656           |
| 0.0259        | 0.6158 | 60   | 2.2810          | 3029816           |
| 0.0289        | 0.6671 | 65   | 2.2621          | 3284000           |
| 0.0332        | 0.7184 | 70   | 2.2593          | 3542064           |
| 0.0288        | 0.7697 | 75   | 2.2449          | 3801936           |
| 0.0246        | 0.8210 | 80   | 2.2357          | 4058824           |
| 0.025         | 0.8724 | 85   | 2.2324          | 4315048           |
| 0.0273        | 0.9237 | 90   | 2.2358          | 4563000           |
| 0.0226        | 0.9750 | 95   | 2.2411          | 4812984           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1