collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0939
  • Num Input Tokens Seen: 36687080
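
A minimal loading sketch, assuming the checkpoint is published as RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0 and is used through the standard transformers text-generation API; the prompt is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0"

# Load tokenizer and weights; bfloat16 matches the BF16 tensor type of the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The capital of France is"  # illustrative prompt, not from the (undocumented) training data
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```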

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
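
The per-device train batch size of 8 combined with 16 gradient accumulation steps yields the listed effective batch size of 8 × 16 = 128. Below is a sketch of an equivalent TrainingArguments configuration; the output directory, the bf16 flag, and all Trainer/dataset wiring are assumptions, since only the hyperparameters above are documented:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd0",  # hypothetical path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective train batch size
    seed=0,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```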

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5743 | 0.0076 | 5 | 1.3850 | 286024 |
| 1.5698 | 0.0152 | 10 | 1.3359 | 565176 |
| 1.5023 | 0.0227 | 15 | 1.2721 | 843224 |
| 1.3784 | 0.0303 | 20 | 1.2210 | 1128808 |
| 1.1853 | 0.0379 | 25 | 1.1834 | 1409632 |
| 1.079 | 0.0455 | 30 | 1.1911 | 1688000 |
| 0.9274 | 0.0531 | 35 | 1.2022 | 1961576 |
| 0.8275 | 0.0607 | 40 | 1.2078 | 2242896 |
| 0.6817 | 0.0682 | 45 | 1.2485 | 2524032 |
| 0.5892 | 0.0758 | 50 | 1.2344 | 2801792 |
| 0.4418 | 0.0834 | 55 | 1.2415 | 3078040 |
| 0.4992 | 0.0910 | 60 | 1.1980 | 3358368 |
| 0.4529 | 0.0986 | 65 | 1.2040 | 3643320 |
| 0.4315 | 0.1062 | 70 | 1.2063 | 3920184 |
| 0.3633 | 0.1137 | 75 | 1.1887 | 4195744 |
| 0.3498 | 0.1213 | 80 | 1.1900 | 4474088 |
| 0.5205 | 0.1289 | 85 | 1.1810 | 4750552 |
| 0.4456 | 0.1365 | 90 | 1.1784 | 5033120 |
| 0.2259 | 0.1441 | 95 | 1.1689 | 5308224 |
| 0.2957 | 0.1517 | 100 | 1.1673 | 5584192 |
| 0.2861 | 0.1592 | 105 | 1.1622 | 5855384 |
| 0.396 | 0.1668 | 110 | 1.1576 | 6135472 |
| 0.2727 | 0.1744 | 115 | 1.1593 | 6417808 |
| 0.2863 | 0.1820 | 120 | 1.1536 | 6694768 |
| 0.3506 | 0.1896 | 125 | 1.1512 | 6974920 |
| 0.3593 | 0.1972 | 130 | 1.1506 | 7250952 |
| 0.3129 | 0.2047 | 135 | 1.1464 | 7528424 |
| 0.305 | 0.2123 | 140 | 1.1471 | 7796288 |
| 0.2969 | 0.2199 | 145 | 1.1458 | 8071736 |
| 0.3828 | 0.2275 | 150 | 1.1450 | 8354136 |
| 0.2908 | 0.2351 | 155 | 1.1426 | 8627856 |
| 0.3691 | 0.2427 | 160 | 1.1403 | 8906272 |
| 0.248 | 0.2502 | 165 | 1.1434 | 9190272 |
| 0.2853 | 0.2578 | 170 | 1.1398 | 9467688 |
| 0.336 | 0.2654 | 175 | 1.1423 | 9745264 |
| 0.2295 | 0.2730 | 180 | 1.1392 | 10022808 |
| 0.2522 | 0.2806 | 185 | 1.1382 | 10307056 |
| 0.2513 | 0.2882 | 190 | 1.1442 | 10582992 |
| 0.2799 | 0.2957 | 195 | 1.1370 | 10866240 |
| 0.2176 | 0.3033 | 200 | 1.1359 | 11148368 |
| 0.293 | 0.3109 | 205 | 1.1353 | 11433232 |
| 0.3076 | 0.3185 | 210 | 1.1317 | 11705656 |
| 0.2469 | 0.3261 | 215 | 1.1337 | 11983632 |
| 0.3734 | 0.3336 | 220 | 1.1323 | 12266112 |
| 0.2704 | 0.3412 | 225 | 1.1290 | 12547976 |
| 0.3469 | 0.3488 | 230 | 1.1300 | 12824592 |
| 0.3266 | 0.3564 | 235 | 1.1280 | 13098760 |
| 0.2528 | 0.3640 | 240 | 1.1268 | 13368616 |
| 0.2867 | 0.3716 | 245 | 1.1266 | 13650008 |
| 0.228 | 0.3791 | 250 | 1.1262 | 13927240 |
| 0.233 | 0.3867 | 255 | 1.1249 | 14203184 |
| 0.2724 | 0.3943 | 260 | 1.1250 | 14475384 |
| 0.2117 | 0.4019 | 265 | 1.1245 | 14760384 |
| 0.1981 | 0.4095 | 270 | 1.1226 | 15040960 |
| 0.2519 | 0.4171 | 275 | 1.1219 | 15323064 |
| 0.4068 | 0.4246 | 280 | 1.1205 | 15603904 |
| 0.2811 | 0.4322 | 285 | 1.1214 | 15883608 |
| 0.259 | 0.4398 | 290 | 1.1201 | 16159520 |
| 0.2938 | 0.4474 | 295 | 1.1208 | 16437656 |
| 0.2466 | 0.4550 | 300 | 1.1214 | 16716952 |
| 0.2997 | 0.4626 | 305 | 1.1162 | 16992344 |
| 0.2268 | 0.4701 | 310 | 1.1229 | 17268760 |
| 0.343 | 0.4777 | 315 | 1.1172 | 17547648 |
| 0.2424 | 0.4853 | 320 | 1.1154 | 17828288 |
| 0.2849 | 0.4929 | 325 | 1.1172 | 18107576 |
| 0.478 | 0.5005 | 330 | 1.1155 | 18387728 |
| 0.1959 | 0.5081 | 335 | 1.1162 | 18667088 |
| 0.1868 | 0.5156 | 340 | 1.1160 | 18950480 |
| 0.234 | 0.5232 | 345 | 1.1150 | 19228760 |
| 0.2519 | 0.5308 | 350 | 1.1135 | 19508952 |
| 0.2625 | 0.5384 | 355 | 1.1145 | 19787448 |
| 0.3843 | 0.5460 | 360 | 1.1109 | 20073168 |
| 0.3005 | 0.5536 | 365 | 1.1109 | 20343008 |
| 0.1833 | 0.5611 | 370 | 1.1110 | 20623352 |
| 0.2446 | 0.5687 | 375 | 1.1093 | 20901240 |
| 0.25 | 0.5763 | 380 | 1.1104 | 21185296 |
| 0.2897 | 0.5839 | 385 | 1.1103 | 21464672 |
| 0.168 | 0.5915 | 390 | 1.1099 | 21743520 |
| 0.2387 | 0.5991 | 395 | 1.1106 | 22023544 |
| 0.2066 | 0.6066 | 400 | 1.1072 | 22291944 |
| 0.2191 | 0.6142 | 405 | 1.1089 | 22572096 |
| 0.1869 | 0.6218 | 410 | 1.1085 | 22849472 |
| 0.1939 | 0.6294 | 415 | 1.1075 | 23126440 |
| 0.2368 | 0.6370 | 420 | 1.1091 | 23406096 |
| 0.2209 | 0.6445 | 425 | 1.1066 | 23678072 |
| 0.2523 | 0.6521 | 430 | 1.1077 | 23961192 |
| 0.2416 | 0.6597 | 435 | 1.1082 | 24240520 |
| 0.1964 | 0.6673 | 440 | 1.1057 | 24520856 |
| 0.2369 | 0.6749 | 445 | 1.1055 | 24798288 |
| 0.23 | 0.6825 | 450 | 1.1074 | 25075848 |
| 0.2349 | 0.6900 | 455 | 1.1046 | 25344112 |
| 0.243 | 0.6976 | 460 | 1.1063 | 25625216 |
| 0.3343 | 0.7052 | 465 | 1.1066 | 25901904 |
| 0.2341 | 0.7128 | 470 | 1.1042 | 26177128 |
| 0.283 | 0.7204 | 475 | 1.1059 | 26459400 |
| 0.3112 | 0.7280 | 480 | 1.1066 | 26736784 |
| 0.3015 | 0.7355 | 485 | 1.1042 | 27017152 |
| 0.2788 | 0.7431 | 490 | 1.1031 | 27295048 |
| 0.1838 | 0.7507 | 495 | 1.1025 | 27575392 |
| 0.2366 | 0.7583 | 500 | 1.1036 | 27852328 |
| 0.297 | 0.7659 | 505 | 1.1032 | 28130032 |
| 0.1622 | 0.7735 | 510 | 1.1015 | 28407672 |
| 0.165 | 0.7810 | 515 | 1.1012 | 28680696 |
| 0.3047 | 0.7886 | 520 | 1.1010 | 28957216 |
| 0.336 | 0.7962 | 525 | 1.1012 | 29235048 |
| 0.2728 | 0.8038 | 530 | 1.1011 | 29507352 |
| 0.2007 | 0.8114 | 535 | 1.1008 | 29778208 |
| 0.2253 | 0.8190 | 540 | 1.1013 | 30055416 |
| 0.2386 | 0.8265 | 545 | 1.0982 | 30333728 |
| 0.2056 | 0.8341 | 550 | 1.0989 | 30599088 |
| 0.2879 | 0.8417 | 555 | 1.1003 | 30883072 |
| 0.2207 | 0.8493 | 560 | 1.0993 | 31160232 |
| 0.2821 | 0.8569 | 565 | 1.0979 | 31441272 |
| 0.2246 | 0.8645 | 570 | 1.0982 | 31712696 |
| 0.3249 | 0.8720 | 575 | 1.0980 | 31991400 |
| 0.2616 | 0.8796 | 580 | 1.0985 | 32269224 |
| 0.2716 | 0.8872 | 585 | 1.0997 | 32542384 |
| 0.2898 | 0.8948 | 590 | 1.0979 | 32826016 |
| 0.2617 | 0.9024 | 595 | 1.0968 | 33110848 |
| 0.2057 | 0.9100 | 600 | 1.0988 | 33391352 |
| 0.293 | 0.9175 | 605 | 1.0965 | 33670472 |
| 0.2081 | 0.9251 | 610 | 1.0947 | 33950936 |
| 0.2801 | 0.9327 | 615 | 1.0963 | 34226952 |
| 0.2678 | 0.9403 | 620 | 1.0952 | 34502376 |
| 0.222 | 0.9479 | 625 | 1.0944 | 34774480 |
| 0.2561 | 0.9555 | 630 | 1.0944 | 35057720 |
| 0.2738 | 0.9630 | 635 | 1.0947 | 35333096 |
| 0.182 | 0.9706 | 640 | 1.0947 | 35614552 |
| 0.224 | 0.9782 | 645 | 1.0935 | 35890992 |
| 0.2861 | 0.9858 | 650 | 1.0935 | 36177736 |
| 0.2674 | 0.9934 | 655 | 1.0948 | 36462944 |
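
To visualize the trend in the table, the sketch below plots a handful of (input tokens seen, validation loss) pairs copied from the rows above; matplotlib is an assumption here and not part of the training setup:

```python
import matplotlib.pyplot as plt

# A subset of (input tokens seen, validation loss) points taken from the table above.
tokens_seen = [0, 1_409_632, 5_584_192, 11_148_368, 16_716_952,
               22_291_944, 27_852_328, 33_391_352, 36_462_944]
val_loss = [1.3909, 1.1834, 1.1673, 1.1359, 1.1214,
            1.1072, 1.1036, 1.0988, 1.0948]

plt.plot(tokens_seen, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("Validation loss vs. input tokens seen")
plt.show()
```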

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
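
A requirements.txt-style sketch pinning these versions (torch is the pip package name for PyTorch; the +cu121 build suffix depends on the local CUDA toolchain and is an assumption about how the wheel was installed):

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```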