metadata

license: gemma
base_model: google/gemma-2-9b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd0
    results: []

collapse_gemma-2-9b_hs2_accumulate_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.9438
Num Input Tokens Seen: 9944616
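The card reports only the raw loss. If that loss is the mean per-token cross-entropy in nats (the usual convention for causal language modeling in Transformers, assumed here rather than stated in the card), it converts to perplexity as exp(loss). A minimal sketch; the `perplexity` helper is illustrative, not part of any library:

```python
import math

# Final evaluation loss reported above. Assumption: mean per-token
# cross-entropy measured in nats (natural log).
EVAL_LOSS = 0.9438


def perplexity(mean_ce_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(mean_ce_nats)


if __name__ == "__main__":
    print(f"perplexity = {perplexity(EVAL_LOSS):.3f}")  # perplexity = 2.570
```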

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 4
eval_batch_size: 16
seed: 0
gradient_accumulation_steps: 32
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
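The relationships among these values can be sketched in Python. The effective batch size is train_batch_size × gradient_accumulation_steps (4 × 32 = 128, consistent with a single device). A constant-with-warmup schedule ramps the learning rate linearly from 0 to 8e-06 over the warmup steps, then holds it flat. The card does not state the total number of optimizer steps; the Epoch column of the training results (0.9746 at step 185) suggests roughly 190, so `TOTAL_STEPS = 190` is an illustrative assumption, as is rounding the warmup length up with `math.ceil`. The function names are hypothetical, not a Transformers API.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 32
WARMUP_RATIO = 0.05

# Assumption: the total optimizer step count is not stated in the card;
# about 190 is inferred from the Epoch column of the results table.
TOTAL_STEPS = 190


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accum_steps * num_devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Warmup length, rounded up: ceil(total_steps * ratio)."""
    return math.ceil(total_steps * ratio)


def constant_with_warmup_lr(step: int, base_lr: float, warmup: int) -> float:
    """Linear ramp from 0 to base_lr over `warmup` steps, then constant."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr


if __name__ == "__main__":
    n_warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))  # 128
    print(n_warmup)  # 10
    for step in (0, 5, 10, 100):
        print(step, constant_with_warmup_lr(step, LEARNING_RATE, n_warmup))
```

Under these assumptions warmup lasts about 10 optimizer steps, so the learning rate would already be at its constant value of 8e-06 by the evaluation logged at step 10.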

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.2335	0
1.1198	0.0263	5	1.1072	256304
1.0654	0.0527	10	1.0185	519116
0.8862	0.0790	15	0.9889	775168
0.8666	0.1054	20	0.9891	1038920
0.7782	0.1317	25	0.9886	1306500
0.6537	0.1581	30	0.9872	1568200
0.7345	0.1844	35	0.9877	1831700
0.6292	0.2107	40	0.9795	2092712
0.6696	0.2371	45	0.9755	2353476
0.5445	0.2634	50	0.9722	2620524
0.6364	0.2898	55	0.9687	2886160
0.6564	0.3161	60	0.9671	3149304
0.5167	0.3424	65	0.9640	3413380
0.6553	0.3688	70	0.9627	3684636
0.5201	0.3951	75	0.9603	3947600
0.5839	0.4215	80	0.9603	4207528
0.5599	0.4478	85	0.9587	4468996
0.6981	0.4742	90	0.9590	4730728
0.582	0.5005	95	0.9558	4991328
0.5174	0.5268	100	0.9556	5253436
0.6031	0.5532	105	0.9545	5518624
0.6314	0.5795	110	0.9528	5780988
0.4925	0.6059	115	0.9527	6041796
0.5823	0.6322	120	0.9515	6307948
0.5974	0.6585	125	0.9498	6573748
0.4411	0.6849	130	0.9492	6836544
0.4604	0.7112	135	0.9489	7098504
0.564	0.7376	140	0.9475	7354740
0.5769	0.7639	145	0.9477	7620140
0.4886	0.7903	150	0.9468	7884420
0.5637	0.8166	155	0.9462	8151036
0.5161	0.8429	160	0.9460	8414540
0.633	0.8693	165	0.9459	8677992
0.5239	0.8956	170	0.9446	8937256
0.6149	0.9220	175	0.9465	9204996
0.5386	0.9483	180	0.9451	9467132
0.6638	0.9746	185	0.9446	9732120
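The table can be summarized programmatically. The sketch below copies the Step, Validation Loss, and Input Tokens Seen columns (training loss and epoch are omitted), finds the lowest logged validation loss, and computes the average number of input tokens consumed per optimizer step. The helper names are illustrative. Token counts logged by the trainer may include padding, so the per-step figure is a throughput number rather than a measure of sequence length.

```python
# (step, validation_loss, input_tokens_seen) copied from the table above.
RESULTS = [
    (0, 1.2335, 0),
    (5, 1.1072, 256304),
    (10, 1.0185, 519116),
    (15, 0.9889, 775168),
    (20, 0.9891, 1038920),
    (25, 0.9886, 1306500),
    (30, 0.9872, 1568200),
    (35, 0.9877, 1831700),
    (40, 0.9795, 2092712),
    (45, 0.9755, 2353476),
    (50, 0.9722, 2620524),
    (55, 0.9687, 2886160),
    (60, 0.9671, 3149304),
    (65, 0.9640, 3413380),
    (70, 0.9627, 3684636),
    (75, 0.9603, 3947600),
    (80, 0.9603, 4207528),
    (85, 0.9587, 4468996),
    (90, 0.9590, 4730728),
    (95, 0.9558, 4991328),
    (100, 0.9556, 5253436),
    (105, 0.9545, 5518624),
    (110, 0.9528, 5780988),
    (115, 0.9527, 6041796),
    (120, 0.9515, 6307948),
    (125, 0.9498, 6573748),
    (130, 0.9492, 6836544),
    (135, 0.9489, 7098504),
    (140, 0.9475, 7354740),
    (145, 0.9477, 7620140),
    (150, 0.9468, 7884420),
    (155, 0.9462, 8151036),
    (160, 0.9460, 8414540),
    (165, 0.9459, 8677992),
    (170, 0.9446, 8937256),
    (175, 0.9465, 9204996),
    (180, 0.9451, 9467132),
    (185, 0.9446, 9732120),
]


def best_eval(rows):
    """Row with the lowest validation loss; min() keeps the earliest on ties."""
    return min(rows, key=lambda r: r[1])


def mean_tokens_per_step(rows):
    """Average input tokens per optimizer step, using the last logged row."""
    step, _, tokens = rows[-1]
    return tokens / step


if __name__ == "__main__":
    step, loss, _ = best_eval(RESULTS)
    print(f"lowest logged validation loss: {loss} at step {step}")
    print(f"mean input tokens per optimizer step: {mean_tokens_per_step(RESULTS):,.0f}")
```

The lowest logged value, 0.9446, occurs at both step 170 and step 185; the sketch reports step 170 because it appears first.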

Framework versions

Transformers 4.44.0
PyTorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
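To check a local environment against these versions, a small sketch using the standard-library `importlib.metadata` follows. PyTorch is published on PyPI as `torch`. The `compare_versions` helper is hypothetical, and its exact string comparison is a simplifying choice: a CPU-only or differently tagged PyTorch build (anything other than `+cu121`) will be reported as differing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected, lookup=version):
    """Map each package to (expected, installed); installed is None if absent."""
    report = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except PackageNotFoundError:
            have = None
        report[name] = (want, have)
    return report


if __name__ == "__main__":
    for name, (want, have) in compare_versions(EXPECTED).items():
        status = "ok" if have == want else "differs"
        print(f"{name}: expected {want}, installed {have} ({status})")
```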