# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.0953
- Num Input Tokens Seen: 31512696
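
This card does not include usage code, so here is a minimal inference sketch with `transformers`. It assumes the checkpoint is published under the repo id matching this card's title (`RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0`); the prompt and generation settings are placeholders.

```python
# Minimal inference sketch; the repo id is taken from this card's title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: the usual dtype for Gemma 2 checkpoints
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```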
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
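
As a rough reconstruction, these settings map onto `transformers.TrainingArguments` approximately as follows. This is a sketch, not the original training script (which is not included in this card); the `output_dir` is a placeholder.

```python
# Approximate TrainingArguments matching the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 total (implies a single device)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```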
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6442 | 0.0089 | 5 | 1.3820 | 278856 |
1.6475 | 0.0178 | 10 | 1.3206 | 555112 |
1.4758 | 0.0267 | 15 | 1.2567 | 836016 |
1.2297 | 0.0356 | 20 | 1.2057 | 1125920 |
1.1437 | 0.0444 | 25 | 1.1841 | 1404848 |
1.0272 | 0.0533 | 30 | 1.1957 | 1688752 |
0.8967 | 0.0622 | 35 | 1.2130 | 1966072 |
0.7863 | 0.0711 | 40 | 1.2113 | 2237568 |
0.7654 | 0.08 | 45 | 1.2383 | 2519344 |
0.668 | 0.0889 | 50 | 1.2119 | 2797232 |
0.5498 | 0.0978 | 55 | 1.2164 | 3078520 |
0.4924 | 0.1067 | 60 | 1.1994 | 3354864 |
0.502 | 0.1156 | 65 | 1.1927 | 3641344 |
0.4306 | 0.1244 | 70 | 1.1936 | 3923864 |
0.4537 | 0.1333 | 75 | 1.1781 | 4209800 |
0.4149 | 0.1422 | 80 | 1.1854 | 4491264 |
0.3523 | 0.1511 | 85 | 1.1710 | 4767448 |
0.3391 | 0.16 | 90 | 1.1723 | 5048496 |
0.3477 | 0.1689 | 95 | 1.1642 | 5327992 |
0.3507 | 0.1778 | 100 | 1.1638 | 5609056 |
0.3321 | 0.1867 | 105 | 1.1618 | 5883360 |
0.2854 | 0.1956 | 110 | 1.1591 | 6163376 |
0.3745 | 0.2044 | 115 | 1.1553 | 6444888 |
0.3668 | 0.2133 | 120 | 1.1583 | 6728632 |
0.3377 | 0.2222 | 125 | 1.1487 | 7008152 |
0.3782 | 0.2311 | 130 | 1.1549 | 7282168 |
0.3287 | 0.24 | 135 | 1.1461 | 7559824 |
0.3681 | 0.2489 | 140 | 1.1483 | 7838312 |
0.2605 | 0.2578 | 145 | 1.1456 | 8118032 |
0.2678 | 0.2667 | 150 | 1.1411 | 8392096 |
0.3602 | 0.2756 | 155 | 1.1474 | 8674560 |
0.3069 | 0.2844 | 160 | 1.1387 | 8947032 |
0.3192 | 0.2933 | 165 | 1.1411 | 9222240 |
0.3828 | 0.3022 | 170 | 1.1382 | 9501128 |
0.179 | 0.3111 | 175 | 1.1384 | 9779776 |
0.3228 | 0.32 | 180 | 1.1375 | 10056488 |
0.3182 | 0.3289 | 185 | 1.1371 | 10331920 |
0.2623 | 0.3378 | 190 | 1.1346 | 10614768 |
0.3908 | 0.3467 | 195 | 1.1352 | 10903104 |
0.4084 | 0.3556 | 200 | 1.1310 | 11176968 |
0.2535 | 0.3644 | 205 | 1.1288 | 11462496 |
0.2713 | 0.3733 | 210 | 1.1326 | 11741232 |
0.2936 | 0.3822 | 215 | 1.1268 | 12020072 |
0.3277 | 0.3911 | 220 | 1.1267 | 12296064 |
0.3603 | 0.4 | 225 | 1.1277 | 12573368 |
0.2912 | 0.4089 | 230 | 1.1226 | 12851992 |
0.2475 | 0.4178 | 235 | 1.1249 | 13134768 |
0.3164 | 0.4267 | 240 | 1.1220 | 13415800 |
0.2098 | 0.4356 | 245 | 1.1233 | 13697896 |
0.2824 | 0.4444 | 250 | 1.1196 | 13971120 |
0.2863 | 0.4533 | 255 | 1.1197 | 14250744 |
0.3098 | 0.4622 | 260 | 1.1204 | 14533144 |
0.3439 | 0.4711 | 265 | 1.1174 | 14808272 |
0.336 | 0.48 | 270 | 1.1176 | 15092864 |
0.3359 | 0.4889 | 275 | 1.1181 | 15375104 |
0.2731 | 0.4978 | 280 | 1.1157 | 15657480 |
0.2818 | 0.5067 | 285 | 1.1157 | 15940656 |
0.3306 | 0.5156 | 290 | 1.1137 | 16225416 |
0.2837 | 0.5244 | 295 | 1.1142 | 16512184 |
0.3606 | 0.5333 | 300 | 1.1107 | 16796568 |
0.3058 | 0.5422 | 305 | 1.1121 | 17078072 |
0.3259 | 0.5511 | 310 | 1.1125 | 17362648 |
0.2235 | 0.56 | 315 | 1.1094 | 17647312 |
0.2725 | 0.5689 | 320 | 1.1082 | 17928848 |
0.3108 | 0.5778 | 325 | 1.1103 | 18205136 |
0.2642 | 0.5867 | 330 | 1.1092 | 18487016 |
0.2774 | 0.5956 | 335 | 1.1074 | 18770560 |
0.2155 | 0.6044 | 340 | 1.1070 | 19046272 |
0.234 | 0.6133 | 345 | 1.1091 | 19324080 |
0.2968 | 0.6222 | 350 | 1.1073 | 19609160 |
0.3449 | 0.6311 | 355 | 1.1054 | 19886296 |
0.3334 | 0.64 | 360 | 1.1060 | 20170488 |
0.2927 | 0.6489 | 365 | 1.1058 | 20452192 |
0.2632 | 0.6578 | 370 | 1.1031 | 20728320 |
0.2462 | 0.6667 | 375 | 1.1091 | 21015688 |
0.2949 | 0.6756 | 380 | 1.1056 | 21289616 |
0.2476 | 0.6844 | 385 | 1.1045 | 21555880 |
0.2329 | 0.6933 | 390 | 1.1046 | 21837392 |
0.2887 | 0.7022 | 395 | 1.1049 | 22118704 |
0.3022 | 0.7111 | 400 | 1.1033 | 22401016 |
0.2871 | 0.72 | 405 | 1.1013 | 22688808 |
0.2822 | 0.7289 | 410 | 1.1028 | 22967416 |
0.3034 | 0.7378 | 415 | 1.1028 | 23255720 |
0.3235 | 0.7467 | 420 | 1.1016 | 23544352 |
0.42 | 0.7556 | 425 | 1.1006 | 23825720 |
0.2494 | 0.7644 | 430 | 1.0996 | 24104072 |
0.2431 | 0.7733 | 435 | 1.1016 | 24378016 |
0.2956 | 0.7822 | 440 | 1.1003 | 24654072 |
0.2935 | 0.7911 | 445 | 1.1007 | 24934896 |
0.3467 | 0.8 | 450 | 1.0990 | 25218096 |
0.317 | 0.8089 | 455 | 1.0980 | 25498184 |
0.3065 | 0.8178 | 460 | 1.1002 | 25778064 |
0.2169 | 0.8267 | 465 | 1.1002 | 26058096 |
0.2623 | 0.8356 | 470 | 1.0994 | 26332744 |
0.258 | 0.8444 | 475 | 1.0967 | 26620248 |
0.1981 | 0.8533 | 480 | 1.0967 | 26906832 |
0.2399 | 0.8622 | 485 | 1.0976 | 27177320 |
0.3677 | 0.8711 | 490 | 1.0970 | 27455904 |
0.2889 | 0.88 | 495 | 1.0962 | 27741312 |
0.3128 | 0.8889 | 500 | 1.0967 | 28018736 |
0.2875 | 0.8978 | 505 | 1.0961 | 28299576 |
0.2512 | 0.9067 | 510 | 1.0953 | 28578336 |
0.3189 | 0.9156 | 515 | 1.0952 | 28853520 |
0.2676 | 0.9244 | 520 | 1.0968 | 29137216 |
0.3755 | 0.9333 | 525 | 1.0940 | 29424376 |
0.3404 | 0.9422 | 530 | 1.0931 | 29709304 |
0.2534 | 0.9511 | 535 | 1.0954 | 29995312 |
0.2709 | 0.96 | 540 | 1.0934 | 30284712 |
0.2448 | 0.9689 | 545 | 1.0929 | 30562744 |
0.2625 | 0.9778 | 550 | 1.0948 | 30837288 |
0.3507 | 0.9867 | 555 | 1.0930 | 31118808 |
0.2675 | 0.9956 | 560 | 1.0942 | 31401384 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
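
As a small convenience (an addition to this card, not part of the original), a sketch that compares an installed environment against the versions listed above:

```python
# Report installed versions of the libraries pinned in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    note = "matches card" if have == want else f"card lists {want}"
    print(f"{name}: {have} ({note})")
```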