collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. At the last logged evaluation (step 185, 10,659,304 input tokens seen) it reaches a validation loss of 1.0894.

Model description

More information needed

Training results

Training and validation loss were logged every 5 steps for just under one epoch (185 steps):

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3909	0
1.4818	0.0264	5	1.3328	284168
1.4014	0.0528	10	1.2146	570464
1.247	0.0792	15	1.1552	859552
1.2344	0.1056	20	1.1316	1139712
1.0727	0.1321	25	1.1148	1425952
1.0489	0.1585	30	1.1144	1712584
1.0564	0.1849	35	1.1157	1999000
1.0475	0.2113	40	1.1221	2278656
1.0397	0.2377	45	1.1144	2567096
0.9626	0.2641	50	1.1186	2858408
0.9346	0.2905	55	1.1198	3145312
0.9472	0.3169	60	1.1231	3435992
0.9308	0.3433	65	1.1217	3729256
0.7938	0.3698	70	1.1223	4015952
0.8555	0.3962	75	1.1211	4305600
0.8708	0.4226	80	1.1195	4599712
0.8453	0.4490	85	1.1167	4888360
0.7371	0.4754	90	1.1169	5180504
0.8233	0.5018	95	1.1128	5473352
0.8823	0.5282	100	1.1131	5765104
0.623	0.5546	105	1.1111	6052128
0.7361	0.5810	110	1.1069	6343856
0.8444	0.6075	115	1.1103	6631416
0.7777	0.6339	120	1.1068	6921552
0.6832	0.6603	125	1.1054	7209048
0.8106	0.6867	130	1.1039	7489664
0.6772	0.7131	135	1.1007	7782048
0.7388	0.7395	140	1.0992	8068440
0.8197	0.7659	145	1.0968	8360312
0.6981	0.7923	150	1.0959	8648720
0.6736	0.8188	155	1.0956	8940416
0.7139	0.8452	160	1.0935	9223368
0.8445	0.8716	165	1.0927	9508432
0.6475	0.8980	170	1.0919	9797464
0.7119	0.9244	175	1.0904	10086248
0.8095	0.9508	180	1.0897	10378552
0.6255	0.9772	185	1.0894	10659304
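
The log above can be summarized with a few lines of Python. The following is a minimal sketch, not part of the original training code: it copies a handful of rows from the table by hand, and it assumes the reported losses are mean per-token cross-entropy in nats (the usual convention for causal language model fine-tuning, but not confirmed by this card). The helper names `perplexity`, `best_row`, `tokens_per_step`, and `generalization_gap` are hypothetical and chosen for illustration only.

```python
import math

# A subset of rows copied by hand from the table above:
# (training_loss, epoch, step, validation_loss, input_tokens_seen)
ROWS = [
    (None,   0.0,    0,   1.3909, 0),
    (1.4818, 0.0264, 5,   1.3328, 284168),
    (1.0489, 0.1585, 30,  1.1144, 1712584),
    (0.8823, 0.5282, 100, 1.1131, 5765104),
    (0.8106, 0.6867, 130, 1.1039, 7489664),
    (0.6255, 0.9772, 185, 1.0894, 10659304),
]


def perplexity(loss):
    """Convert a loss to perplexity, ASSUMING it is mean cross-entropy in nats."""
    return math.exp(loss)


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def tokens_per_step(rows):
    """Average input tokens consumed per optimizer step between first and last row."""
    first, last = rows[0], rows[-1]
    return (last[4] - first[4]) / (last[2] - first[2])


def generalization_gap(row):
    """Validation loss minus training loss for one row (None if no training loss)."""
    training_loss, _, _, validation_loss, _ = row
    if training_loss is None:
        return None
    return validation_loss - training_loss


if __name__ == "__main__":
    best = best_row(ROWS)
    print(f"lowest validation loss: {best[3]} at step {best[2]}")
    print(f"validation perplexity there: {perplexity(best[3]):.2f}")
    print(f"average tokens per step: {tokens_per_step(ROWS):.0f}")
    print(f"final validation - training gap: {generalization_gap(ROWS[-1]):.4f}")
```

Under that assumption, the final validation loss of 1.0894 corresponds to a perplexity of roughly 2.97. The gap between validation and training loss widens from about step 30 onward (training loss keeps falling while validation loss stays near 1.09 to 1.12), which is worth keeping in mind when comparing checkpoints.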