collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.9467
  • Num Input Tokens Seen: 14479720
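
The checkpoint is stored in BF16 with about 9.24B parameters, so the weights alone need roughly 18-19 GB of accelerator memory. Below is a minimal loading sketch with transformers, assuming the Hub repo id RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2 and that accelerate is installed for automatic device placement:

```python
# Sketch: load the checkpoint for inference. The repo id comes from this
# card; dtype matches the BF16 tensors it reports.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```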

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
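
These values map directly onto transformers' TrainingArguments, as in the sketch below. Only the hyperparameter values come from the card; the output directory and bf16 flag are assumptions, the Adam betas and epsilon are the Trainer defaults, and the actual dataset and training script are unknown:

```python
# Sketch: reproduce the reported hyperparameters with the Trainer API.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption: matches the BF16 checkpoint
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Trainer defaults.
)
```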

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.4137        | 0.0179 | 5    | 1.1303          | 252856            |
| 1.2151        | 0.0359 | 10   | 1.0399          | 513596            |
| 0.9459        | 0.0538 | 15   | 0.9985          | 774796            |
| 0.7877        | 0.0718 | 20   | 1.0018          | 1038520           |
| 0.6825        | 0.0897 | 25   | 1.0054          | 1297472           |
| 0.7017        | 0.1077 | 30   | 1.0039          | 1562648           |
| 0.556         | 0.1256 | 35   | 1.0021          | 1819784           |
| 0.5098        | 0.1436 | 40   | 0.9990          | 2083372           |
| 0.4798        | 0.1615 | 45   | 0.9966          | 2342284           |
| 0.4716        | 0.1795 | 50   | 0.9880          | 2603208           |
| 0.4492        | 0.1974 | 55   | 0.9852          | 2866544           |
| 0.5029        | 0.2153 | 60   | 0.9794          | 3124304           |
| 0.3482        | 0.2333 | 65   | 0.9775          | 3383204           |
| 0.4074        | 0.2512 | 70   | 0.9735          | 3640432           |
| 0.4432        | 0.2692 | 75   | 0.9713          | 3901272           |
| 0.4128        | 0.2871 | 80   | 0.9706          | 4166532           |
| 0.4293        | 0.3051 | 85   | 0.9697          | 4424764           |
| 0.2821        | 0.3230 | 90   | 0.9667          | 4679848           |
| 0.3497        | 0.3410 | 95   | 0.9671          | 4940480           |
| 0.4151        | 0.3589 | 100  | 0.9653          | 5199468           |
| 0.366         | 0.3769 | 105  | 0.9651          | 5457248           |
| 0.4383        | 0.3948 | 110  | 0.9628          | 5716508           |
| 0.5494        | 0.4127 | 115  | 0.9627          | 5982448           |
| 0.3396        | 0.4307 | 120  | 0.9612          | 6240068           |
| 0.416         | 0.4486 | 125  | 0.9602          | 6498568           |
| 0.3865        | 0.4666 | 130  | 0.9599          | 6757836           |
| 0.3436        | 0.4845 | 135  | 0.9588          | 7016324           |
| 0.3474        | 0.5025 | 140  | 0.9583          | 7273968           |
| 0.3378        | 0.5204 | 145  | 0.9566          | 7537436           |
| 0.5179        | 0.5384 | 150  | 0.9552          | 7805180           |
| 0.4688        | 0.5563 | 155  | 0.9555          | 8068284           |
| 0.4051        | 0.5742 | 160  | 0.9571          | 8328600           |
| 0.3992        | 0.5922 | 165  | 0.9531          | 8595768           |
| 0.4127        | 0.6101 | 170  | 0.9548          | 8853456           |
| 0.3901        | 0.6281 | 175  | 0.9533          | 9115420           |
| 0.466         | 0.6460 | 180  | 0.9522          | 9373484           |
| 0.3758        | 0.6640 | 185  | 0.9526          | 9633144           |
| 0.3675        | 0.6819 | 190  | 0.9542          | 9891312           |
| 0.3248        | 0.6999 | 195  | 0.9527          | 10151948          |
| 0.422         | 0.7178 | 200  | 0.9522          | 10417560          |
| 0.464         | 0.7358 | 205  | 0.9525          | 10675408          |
| 0.4374        | 0.7537 | 210  | 0.9505          | 10937468          |
| 0.3459        | 0.7716 | 215  | 0.9510          | 11198760          |
| 0.4153        | 0.7896 | 220  | 0.9505          | 11463912          |
| 0.3045        | 0.8075 | 225  | 0.9495          | 11723048          |
| 0.4015        | 0.8255 | 230  | 0.9516          | 11983792          |
| 0.4552        | 0.8434 | 235  | 0.9505          | 12241296          |
| 0.3746        | 0.8614 | 240  | 0.9490          | 12504660          |
| 0.3781        | 0.8793 | 245  | 0.9476          | 12765960          |
| 0.3656        | 0.8973 | 250  | 0.9496          | 13026072          |
| 0.3108        | 0.9152 | 255  | 0.9475          | 13285212          |
| 0.372         | 0.9332 | 260  | 0.9486          | 13546648          |
| 0.4381        | 0.9511 | 265  | 0.9493          | 13801364          |
| 0.416         | 0.9690 | 270  | 0.9488          | 14063576          |
| 0.3967        | 0.9870 | 275  | 0.9476          | 14329004          |
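
The training loss keeps falling (from about 1.41 to about 0.40) while the validation loss plateaus around 0.95 after roughly step 40. A minimal sketch to visualize this, assuming the table above has been saved as a hypothetical training_log.csv with the same five column names:

```python
# Sketch: plot the loss curves from the table above.
import pandas as pd
import matplotlib.pyplot as plt

log = pd.read_csv("training_log.csv")  # hypothetical filename
# The first training-loss entry is "No log", so coerce it to NaN.
log["Training Loss"] = pd.to_numeric(log["Training Loss"], errors="coerce")

plt.plot(log["Step"], log["Training Loss"], label="training loss")
plt.plot(log["Step"], log["Validation Loss"], label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```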

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
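
A quick sketch to confirm that a local environment matches these reported versions:

```python
# Sketch: print installed versions to compare against the card.
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # card reports 4.44.0
print(torch.__version__)         # card reports 2.4.0+cu121
print(datasets.__version__)      # card reports 2.20.0
print(tokenizers.__version__)    # card reports 0.19.1
```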