collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.0904
Num Input Tokens Seen: 15599864
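
For intuition, the evaluation loss can be converted to perplexity by exponentiating it. This is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats; the card does not state this explicitly, and the function name is illustrative.

```python
import math

def loss_to_perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)

final_eval_loss = 1.0904  # evaluation loss reported above
perplexity = loss_to_perplexity(final_eval_loss)
print(f"perplexity: {perplexity:.3f}")
```

Under that assumption the final evaluation loss corresponds to a perplexity of roughly 2.98.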

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 2
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
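
The relationship between these values can be sketched in a few lines. The example below assumes a single training device (the card does not state the device count, but 8 x 16 = 128 matches total_train_batch_size), estimates the total number of optimizer steps from the results table rather than from any logged value, and assumes the warmup step count is rounded up. The helper `lr_at` is a hypothetical stand-in for a constant-with-warmup schedule, not the library's own implementation.

```python
import math

per_device_batch = 8    # train_batch_size
grad_accum_steps = 16   # gradient_accumulation_steps
num_devices = 1         # ASSUMPTION: single device; not stated in the card

# Sequences contributing to each optimizer update.
effective_batch = per_device_batch * grad_accum_steps * num_devices

base_lr = 8e-06
warmup_ratio = 0.05

# ASSUMPTION: total optimizer steps inferred from the results table,
# where step 275 corresponds to epoch 0.9835.
estimated_total_steps = 275 / 0.9835
# ASSUMPTION: warmup steps are rounded up to the next integer.
warmup_steps = math.ceil(warmup_ratio * estimated_total_steps)

def lr_at(step: int) -> float:
    """Linear warmup from 0 to base_lr, then constant at base_lr."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr

print(effective_batch, warmup_steps, lr_at(7), lr_at(100))
```

With these assumptions the learning rate ramps linearly over about the first 14 optimizer steps and then stays at 8e-06 for the rest of the epoch.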

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3909	0
1.6915	0.0179	5	1.3538	280664
1.5025	0.0358	10	1.2398	557488
1.3663	0.0536	15	1.1737	840856
1.1851	0.0715	20	1.1503	1124920
1.0716	0.0894	25	1.1379	1402408
0.9914	0.1073	30	1.1383	1675520
0.9719	0.1252	35	1.1480	1951472
0.9529	0.1430	40	1.1500	2225736
0.8376	0.1609	45	1.1498	2510856
0.8176	0.1788	50	1.1574	2787800
0.7634	0.1967	55	1.1589	3060664
0.8431	0.2146	60	1.1481	3341424
0.6527	0.2325	65	1.1534	3619016
0.628	0.2503	70	1.1462	3897968
0.6262	0.2682	75	1.1411	4178920
0.7141	0.2861	80	1.1413	4459624
0.5843	0.3040	85	1.1416	4744144
0.6152	0.3219	90	1.1354	5023280
0.5608	0.3397	95	1.1409	5305880
0.6328	0.3576	100	1.1331	5583648
0.5968	0.3755	105	1.1343	5858848
0.4929	0.3934	110	1.1303	6140520
0.5384	0.4113	115	1.1285	6418144
0.6241	0.4291	120	1.1240	6699248
0.511	0.4470	125	1.1238	6981672
0.5549	0.4649	130	1.1240	7259432
0.5711	0.4828	135	1.1193	7540672
0.5146	0.5007	140	1.1201	7817576
0.4929	0.5186	145	1.1161	8095624
0.6243	0.5364	150	1.1159	8372336
0.505	0.5543	155	1.1139	8654856
0.5097	0.5722	160	1.1130	8927360
0.4289	0.5901	165	1.1105	9206880
0.5167	0.6080	170	1.1087	9485672
0.5748	0.6258	175	1.1068	9767928
0.5217	0.6437	180	1.1057	10050896
0.5644	0.6616	185	1.1029	10330480
0.4453	0.6795	190	1.1050	10608400
0.4872	0.6974	195	1.1007	10887048
0.5595	0.7152	200	1.1024	11167464
0.556	0.7331	205	1.0992	11446280
0.5089	0.7510	210	1.1001	11731144
0.5189	0.7689	215	1.0985	12011960
0.4552	0.7868	220	1.0964	12292104
0.4871	0.8046	225	1.0996	12570976
0.5506	0.8225	230	1.0935	12857496
0.5102	0.8404	235	1.0960	13141736
0.4703	0.8583	240	1.0955	13420224
0.4595	0.8762	245	1.0916	13698216
0.5256	0.8941	250	1.0931	13982192
0.464	0.9119	255	1.0934	14260960
0.4848	0.9298	260	1.0908	14538976
0.5636	0.9477	265	1.0911	14815712
0.5172	0.9656	270	1.0909	15101872
0.4533	0.9835	275	1.0907	15373280
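
A few derived quantities help when reading the table. The sketch below parses a subset of the rows above (copied verbatim, with whitespace-separated columns) and computes the average number of input tokens per optimizer step, the row with the lowest validation loss, and the gap between validation and training loss at the last logged step. The per-sequence figure assumes the total batch size of 128 from the hyperparameters and may include padding tokens; all helper names are illustrative.

```python
# Subset of rows from the results table:
# training loss, epoch, step, validation loss, input tokens seen.
ROWS = """\
No log 0 0 1.3909 0
1.0716 0.0894 25 1.1379 1402408
0.7634 0.1967 55 1.1589 3060664
0.6328 0.3576 100 1.1331 5583648
0.5595 0.7152 200 1.1024 11167464
0.4533 0.9835 275 1.0907 15373280
"""

def parse_row(line):
    # Split from the right so that "No log" stays together as one field.
    train_loss, epoch, step, val_loss, tokens = line.rsplit(maxsplit=4)
    return {
        "train_loss": None if train_loss == "No log" else float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "tokens": int(tokens),
    }

rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
last = rows[-1]

tokens_per_step = last["tokens"] / last["step"]
tokens_per_sequence = tokens_per_step / 128  # total_train_batch_size
best = min(rows, key=lambda r: r["val_loss"])
train_val_gap = last["val_loss"] - last["train_loss"]

print(f"tokens per step: {tokens_per_step:.0f}")
print(f"tokens per sequence: {tokens_per_sequence:.0f}")
print(f"best validation loss: {best['val_loss']} at step {best['step']}")
print(f"validation minus training loss at last step: {train_val_gap:.4f}")
```

In the rows sampled here, validation loss rises between steps 25 and 55 before declining again, while training loss keeps falling, so the gap between the two is worth watching when interpreting these results.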

Framework versions

Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
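
To check whether a local environment matches these versions, something like the following could be used. It is a sketch only: the distribution names in `CARD_VERSIONS` are assumed to be the usual PyPI names for the frameworks listed above, the helper names are hypothetical, and the comparison is an exact string match, which is stricter than most setups need (for example, a different CUDA build suffix on PyTorch would count as a mismatch).

```python
from importlib import metadata

# Versions listed in the card; keys are ASSUMED PyPI distribution names.
CARD_VERSIONS = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

def compare_versions(expected, lookup=installed_version):
    """Return {name: (expected, installed)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        have = lookup(name)
        if have != want:
            mismatches[name] = (want, have)
    return mismatches

if __name__ == "__main__":
    for name, (want, have) in compare_versions(CARD_VERSIONS).items():
        print(f"{name}: card lists {want}, installed {have}")
```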

RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd2
