collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1004
  • Num Input Tokens Seen: 20726616
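
For convenience, here is a minimal sketch of loading this checkpoint for generation with the transformers library. The repository ID is the one this card is published under; the dtype, device placement, and prompt are illustrative choices, not part of the card.

```python
# Minimal loading sketch for this checkpoint (illustrative, not part of
# the original card). device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```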

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128 (train_batch_size 8 × gradient_accumulation_steps 16)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
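
For reference, the hyperparameters above map onto a transformers TrainingArguments configuration roughly as sketched below. The output directory is a placeholder, and the exact Trainer setup used for this run is not documented in the card.

```python
# A sketch of the hyperparameters above expressed as transformers
# TrainingArguments. The output directory is a placeholder, not taken
# from the model card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # effective batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # the published checkpoint is stored in BF16
)
```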

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5618 | 0.0133 | 5 | 1.3747 | 274336 |
| 1.4834 | 0.0266 | 10 | 1.2818 | 548560 |
| 1.2778 | 0.0399 | 15 | 1.2113 | 826768 |
| 1.2063 | 0.0532 | 20 | 1.1648 | 1100984 |
| 1.0763 | 0.0666 | 25 | 1.1554 | 1381272 |
| 1.0008 | 0.0799 | 30 | 1.1420 | 1655904 |
| 1.0066 | 0.0932 | 35 | 1.1522 | 1934384 |
| 1.0122 | 0.1065 | 40 | 1.1650 | 2209128 |
| 0.8869 | 0.1198 | 45 | 1.1676 | 2482008 |
| 0.8353 | 0.1331 | 50 | 1.1729 | 2757616 |
| 0.7535 | 0.1464 | 55 | 1.1702 | 3028816 |
| 0.677 | 0.1597 | 60 | 1.1699 | 3306688 |
| 0.6353 | 0.1730 | 65 | 1.1718 | 3583176 |
| 0.7474 | 0.1864 | 70 | 1.1582 | 3862120 |
| 0.6487 | 0.1997 | 75 | 1.1621 | 4134624 |
| 0.5399 | 0.2130 | 80 | 1.1678 | 4413112 |
| 0.4752 | 0.2263 | 85 | 1.1588 | 4680680 |
| 0.6822 | 0.2396 | 90 | 1.1598 | 4959520 |
| 0.5627 | 0.2529 | 95 | 1.1590 | 5237032 |
| 0.5604 | 0.2662 | 100 | 1.1571 | 5520816 |
| 0.4439 | 0.2795 | 105 | 1.1547 | 5791784 |
| 0.5118 | 0.2928 | 110 | 1.1562 | 6070648 |
| 0.5673 | 0.3062 | 115 | 1.1532 | 6350816 |
| 0.5077 | 0.3195 | 120 | 1.1491 | 6624856 |
| 0.4819 | 0.3328 | 125 | 1.1451 | 6903024 |
| 0.4622 | 0.3461 | 130 | 1.1461 | 7179008 |
| 0.5332 | 0.3594 | 135 | 1.1403 | 7459288 |
| 0.4536 | 0.3727 | 140 | 1.1447 | 7736168 |
| 0.4125 | 0.3860 | 145 | 1.1386 | 8007400 |
| 0.4507 | 0.3993 | 150 | 1.1381 | 8280296 |
| 0.4411 | 0.4126 | 155 | 1.1353 | 8563096 |
| 0.4867 | 0.4260 | 160 | 1.1342 | 8835744 |
| 0.4239 | 0.4393 | 165 | 1.1335 | 9116184 |
| 0.5198 | 0.4526 | 170 | 1.1308 | 9394976 |
| 0.502 | 0.4659 | 175 | 1.1320 | 9676488 |
| 0.5138 | 0.4792 | 180 | 1.1265 | 9952384 |
| 0.4501 | 0.4925 | 185 | 1.1288 | 10223640 |
| 0.4448 | 0.5058 | 190 | 1.1268 | 10503360 |
| 0.4864 | 0.5191 | 195 | 1.1272 | 10783504 |
| 0.5137 | 0.5324 | 200 | 1.1228 | 11061016 |
| 0.4463 | 0.5458 | 205 | 1.1251 | 11334176 |
| 0.5183 | 0.5591 | 210 | 1.1237 | 11611680 |
| 0.4873 | 0.5724 | 215 | 1.1226 | 11889528 |
| 0.4598 | 0.5857 | 220 | 1.1200 | 12165672 |
| 0.4974 | 0.5990 | 225 | 1.1180 | 12447680 |
| 0.307 | 0.6123 | 230 | 1.1191 | 12719352 |
| 0.4302 | 0.6256 | 235 | 1.1154 | 12992608 |
| 0.3704 | 0.6389 | 240 | 1.1187 | 13269640 |
| 0.43 | 0.6522 | 245 | 1.1155 | 13545056 |
| 0.3751 | 0.6656 | 250 | 1.1142 | 13821752 |
| 0.349 | 0.6789 | 255 | 1.1122 | 14096592 |
| 0.4908 | 0.6922 | 260 | 1.1105 | 14370976 |
| 0.4156 | 0.7055 | 265 | 1.1105 | 14647576 |
| 0.3021 | 0.7188 | 270 | 1.1102 | 14927104 |
| 0.4337 | 0.7321 | 275 | 1.1104 | 15202424 |
| 0.4187 | 0.7454 | 280 | 1.1080 | 15479160 |
| 0.3928 | 0.7587 | 285 | 1.1124 | 15758584 |
| 0.4093 | 0.7720 | 290 | 1.1058 | 16040872 |
| 0.474 | 0.7854 | 295 | 1.1074 | 16312664 |
| 0.4337 | 0.7987 | 300 | 1.1079 | 16592008 |
| 0.2634 | 0.8120 | 305 | 1.1057 | 16866912 |
| 0.3113 | 0.8253 | 310 | 1.1055 | 17146272 |
| 0.4897 | 0.8386 | 315 | 1.1059 | 17425624 |
| 0.4663 | 0.8519 | 320 | 1.1031 | 17698920 |
| 0.4878 | 0.8652 | 325 | 1.1059 | 17972416 |
| 0.3575 | 0.8785 | 330 | 1.1049 | 18246352 |
| 0.406 | 0.8918 | 335 | 1.1022 | 18522448 |
| 0.4651 | 0.9052 | 340 | 1.1042 | 18798208 |
| 0.4508 | 0.9185 | 345 | 1.1032 | 19069304 |
| 0.442 | 0.9318 | 350 | 1.1019 | 19352272 |
| 0.3781 | 0.9451 | 355 | 1.1029 | 19630952 |
| 0.4462 | 0.9584 | 360 | 1.0998 | 19903896 |
| 0.3345 | 0.9717 | 365 | 1.1027 | 20176392 |
| 0.4672 | 0.9850 | 370 | 1.1001 | 20451160 |
| 0.3621 | 0.9983 | 375 | 1.1004 | 20726616 |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
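
To match this environment, the listed versions can be checked programmatically. The sketch below assumes the standard PyPI package names and is not part of the original card.

```python
# Quick environment check against the framework versions listed above
# (a convenience sketch, not part of the original card).
import importlib.metadata as md

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

for pkg, want in expected.items():
    have = md.version(pkg)
    status = "OK" if have == want else f"MISMATCH (have {have})"
    print(f"{pkg}: expected {want} -> {status}")
```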