collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3956	0
1.5182	0.0206	5	1.3566	278016
1.3436	0.0412	10	1.2405	555760
1.3099	0.0618	15	1.1728	837248
1.2444	0.0824	20	1.1443	1121888
1.1566	0.1030	25	1.1190	1405760
1.1179	0.1236	30	1.1236	1685480
1.0755	0.1441	35	1.1197	1969520
1.0909	0.1647	40	1.1289	2256056
1.0004	0.1853	45	1.1258	2535072
0.9337	0.2059	50	1.1361	2820504
0.9769	0.2265	55	1.1384	3097544
0.9309	0.2471	60	1.1453	3381016
0.8221	0.2677	65	1.1451	3662552
0.8448	0.2883	70	1.1362	3944008
0.8068	0.3089	75	1.1422	4228616
0.7794	0.3295	80	1.1449	4518704
0.839	0.3501	85	1.1377	4803488
0.7914	0.3707	90	1.1424	5092912
0.7824	0.3912	95	1.1396	5376328
0.7763	0.4118	100	1.1373	5657216
0.7058	0.4324	105	1.1450	5936696
0.7919	0.4530	110	1.1338	6218640
0.6291	0.4736	115	1.1381	6500728
0.6368	0.4942	120	1.1359	6781720
0.6676	0.5148	125	1.1343	7069904
0.6567	0.5354	130	1.1299	7351616
0.7838	0.5560	135	1.1330	7641760
0.6401	0.5766	140	1.1291	7931072
0.6275	0.5972	145	1.1238	8217432
0.6238	0.6178	150	1.1258	8498184
0.639	0.6384	155	1.1231	8779760
0.6416	0.6589	160	1.1231	9062392
0.6282	0.6795	165	1.1192	9342232
0.5363	0.7001	170	1.1197	9620560
0.6333	0.7207	175	1.1168	9904800
0.5421	0.7413	180	1.1152	10188928
0.5879	0.7619	185	1.1131	10471944
0.5608	0.7825	190	1.1117	10758568
0.4817	0.8031	195	1.1109	11046576
0.5578	0.8237	200	1.1081	11328352
0.5967	0.8443	205	1.1053	11609888
0.6086	0.8649	210	1.1074	11894256
0.6493	0.8855	215	1.1021	12180976
0.5754	0.9060	220	1.1066	12462336
0.5951	0.9266	225	1.1012	12744360
0.699	0.9472	230	1.1005	13035416
0.5918	0.9678	235	1.1012	13324984
0.6331	0.9884	240	1.0977	13606712