# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1156
- Num Input Tokens Seen: 38394432
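
Assuming the reported loss is the standard per-token cross-entropy from the Hugging Face Trainer, the final evaluation loss of 1.1156 corresponds to a perplexity of exp(1.1156) ≈ 3.05.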

## Model description
More information needed

## Intended uses & limitations
More information needed
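
Pending fuller documentation, the checkpoint can presumably be loaded like any other `transformers` causal LM. The snippet below is a minimal sketch; the prompt, dtype, and generation settings are illustrative choices, not taken from the training setup.

```python
# Minimal usage sketch: load the fine-tuned checkpoint as a causal LM.
# The repo id comes from this card; everything else is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 checkpoints are commonly run in bf16
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```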

## Training and evaluation data
More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch of how they map to `TrainingArguments` follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
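
For reference, these settings map onto Hugging Face `TrainingArguments` roughly as below. This is a reconstruction from the list above, not the published training script; note that a per-device train batch size of 8 with 16 gradient accumulation steps gives the stated effective batch size of 8 × 16 = 128 on a single device.

```python
# Sketch of how the listed hyperparameters map to TrainingArguments.
# Reconstructed from this card; the actual training script is not published.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    seed=1,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```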

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.64 | 0.0071 | 5 | 1.3915 | 282928 |
1.717 | 0.0142 | 10 | 1.3495 | 547680 |
1.4756 | 0.0214 | 15 | 1.2809 | 819464 |
1.3413 | 0.0285 | 20 | 1.2255 | 1088176 |
1.2434 | 0.0356 | 25 | 1.1810 | 1359440 |
1.2176 | 0.0427 | 30 | 1.1672 | 1625784 |
1.2541 | 0.0499 | 35 | 1.1491 | 1899896 |
0.9819 | 0.0570 | 40 | 1.1533 | 2176760 |
0.947 | 0.0641 | 45 | 1.1622 | 2458784 |
0.8886 | 0.0712 | 50 | 1.1769 | 2731336 |
0.7859 | 0.0784 | 55 | 1.2131 | 3004608 |
0.7724 | 0.0855 | 60 | 1.2111 | 3276648 |
0.8257 | 0.0926 | 65 | 1.2124 | 3552744 |
0.7196 | 0.0997 | 70 | 1.2153 | 3828616 |
0.7089 | 0.1068 | 75 | 1.2123 | 4108840 |
0.7354 | 0.1140 | 80 | 1.2026 | 4391920 |
0.6275 | 0.1211 | 85 | 1.2205 | 4674200 |
0.5129 | 0.1282 | 90 | 1.2144 | 4945712 |
0.4506 | 0.1353 | 95 | 1.2009 | 5214520 |
0.5107 | 0.1425 | 100 | 1.2186 | 5484592 |
0.4638 | 0.1496 | 105 | 1.2054 | 5752320 |
0.4786 | 0.1567 | 110 | 1.2011 | 6028136 |
0.5751 | 0.1638 | 115 | 1.2009 | 6304032 |
0.4034 | 0.1710 | 120 | 1.2037 | 6579840 |
0.3894 | 0.1781 | 125 | 1.1952 | 6855056 |
0.4096 | 0.1852 | 130 | 1.1990 | 7132912 |
0.486 | 0.1923 | 135 | 1.1961 | 7401704 |
0.3722 | 0.1994 | 140 | 1.1943 | 7674144 |
0.3758 | 0.2066 | 145 | 1.1971 | 7955296 |
0.3871 | 0.2137 | 150 | 1.1955 | 8232712 |
0.3788 | 0.2208 | 155 | 1.1905 | 8504176 |
0.3235 | 0.2279 | 160 | 1.1879 | 8779072 |
0.3315 | 0.2351 | 165 | 1.1902 | 9059672 |
0.328 | 0.2422 | 170 | 1.1905 | 9336368 |
0.3476 | 0.2493 | 175 | 1.1880 | 9601712 |
0.2789 | 0.2564 | 180 | 1.1829 | 9871144 |
0.2937 | 0.2636 | 185 | 1.1835 | 10137584 |
0.3359 | 0.2707 | 190 | 1.1815 | 10406656 |
0.3616 | 0.2778 | 195 | 1.1803 | 10677608 |
0.3162 | 0.2849 | 200 | 1.1794 | 10948264 |
0.3174 | 0.2920 | 205 | 1.1750 | 11218000 |
0.2904 | 0.2992 | 210 | 1.1806 | 11498160 |
0.3929 | 0.3063 | 215 | 1.1692 | 11779608 |
0.2965 | 0.3134 | 220 | 1.1731 | 12049808 |
0.4205 | 0.3205 | 225 | 1.1692 | 12326136 |
0.2849 | 0.3277 | 230 | 1.1736 | 12596680 |
0.3107 | 0.3348 | 235 | 1.1665 | 12869960 |
0.2267 | 0.3419 | 240 | 1.1724 | 13145648 |
0.2392 | 0.3490 | 245 | 1.1708 | 13415312 |
0.1885 | 0.3562 | 250 | 1.1657 | 13690584 |
0.2722 | 0.3633 | 255 | 1.1676 | 13968448 |
0.2161 | 0.3704 | 260 | 1.1651 | 14239944 |
0.1734 | 0.3775 | 265 | 1.1659 | 14510952 |
0.3554 | 0.3846 | 270 | 1.1580 | 14780912 |
0.316 | 0.3918 | 275 | 1.1608 | 15055568 |
0.2742 | 0.3989 | 280 | 1.1562 | 15334424 |
0.1887 | 0.4060 | 285 | 1.1580 | 15606264 |
0.3007 | 0.4131 | 290 | 1.1570 | 15876168 |
0.1913 | 0.4203 | 295 | 1.1507 | 16146352 |
0.2763 | 0.4274 | 300 | 1.1523 | 16420864 |
0.3037 | 0.4345 | 305 | 1.1499 | 16693096 |
0.1839 | 0.4416 | 310 | 1.1526 | 16976408 |
0.2314 | 0.4488 | 315 | 1.1499 | 17252728 |
0.2425 | 0.4559 | 320 | 1.1526 | 17521216 |
0.2362 | 0.4630 | 325 | 1.1487 | 17788696 |
0.2139 | 0.4701 | 330 | 1.1502 | 18057744 |
0.2801 | 0.4773 | 335 | 1.1443 | 18332304 |
0.3707 | 0.4844 | 340 | 1.1458 | 18610592 |
0.2548 | 0.4915 | 345 | 1.1450 | 18881784 |
0.2455 | 0.4986 | 350 | 1.1418 | 19146128 |
0.2278 | 0.5057 | 355 | 1.1452 | 19420384 |
0.2771 | 0.5129 | 360 | 1.1420 | 19696584 |
0.2731 | 0.5200 | 365 | 1.1394 | 19967720 |
0.219 | 0.5271 | 370 | 1.1415 | 20241272 |
0.2432 | 0.5342 | 375 | 1.1457 | 20514896 |
0.1841 | 0.5414 | 380 | 1.1429 | 20779312 |
0.2617 | 0.5485 | 385 | 1.1404 | 21056016 |
0.2928 | 0.5556 | 390 | 1.1404 | 21327080 |
0.1952 | 0.5627 | 395 | 1.1354 | 21598992 |
0.227 | 0.5699 | 400 | 1.1381 | 21877208 |
0.2218 | 0.5770 | 405 | 1.1380 | 22149176 |
0.1683 | 0.5841 | 410 | 1.1375 | 22423056 |
0.3227 | 0.5912 | 415 | 1.1348 | 22693424 |
0.3058 | 0.5983 | 420 | 1.1357 | 22966920 |
0.1881 | 0.6055 | 425 | 1.1341 | 23246936 |
0.2359 | 0.6126 | 430 | 1.1314 | 23522192 |
0.2074 | 0.6197 | 435 | 1.1307 | 23801944 |
0.2584 | 0.6268 | 440 | 1.1328 | 24074328 |
0.2027 | 0.6340 | 445 | 1.1289 | 24348328 |
0.2897 | 0.6411 | 450 | 1.1305 | 24623816 |
0.2167 | 0.6482 | 455 | 1.1309 | 24902928 |
0.3028 | 0.6553 | 460 | 1.1306 | 25174984 |
0.2939 | 0.6625 | 465 | 1.1287 | 25447728 |
0.2679 | 0.6696 | 470 | 1.1262 | 25716008 |
0.3617 | 0.6767 | 475 | 1.1275 | 25994912 |
0.3261 | 0.6838 | 480 | 1.1266 | 26270048 |
0.2113 | 0.6909 | 485 | 1.1270 | 26541616 |
0.3059 | 0.6981 | 490 | 1.1287 | 26818200 |
0.2356 | 0.7052 | 495 | 1.1242 | 27087272 |
0.2931 | 0.7123 | 500 | 1.1246 | 27359208 |
0.2421 | 0.7194 | 505 | 1.1233 | 27638688 |
0.2792 | 0.7266 | 510 | 1.1252 | 27911800 |
0.2415 | 0.7337 | 515 | 1.1214 | 28186904 |
0.292 | 0.7408 | 520 | 1.1222 | 28462520 |
0.2697 | 0.7479 | 525 | 1.1214 | 28740360 |
0.2745 | 0.7551 | 530 | 1.1196 | 29013592 |
0.2365 | 0.7622 | 535 | 1.1221 | 29285096 |
0.2456 | 0.7693 | 540 | 1.1199 | 29557536 |
0.2182 | 0.7764 | 545 | 1.1208 | 29835096 |
0.3136 | 0.7835 | 550 | 1.1219 | 30112088 |
0.184 | 0.7907 | 555 | 1.1167 | 30387312 |
0.2508 | 0.7978 | 560 | 1.1200 | 30659104 |
0.2854 | 0.8049 | 565 | 1.1208 | 30939024 |
0.2423 | 0.8120 | 570 | 1.1186 | 31214856 |
0.3061 | 0.8192 | 575 | 1.1174 | 31487176 |
0.2599 | 0.8263 | 580 | 1.1176 | 31758936 |
0.1641 | 0.8334 | 585 | 1.1192 | 32029768 |
0.3293 | 0.8405 | 590 | 1.1180 | 32306824 |
0.1687 | 0.8477 | 595 | 1.1187 | 32583424 |
0.2466 | 0.8548 | 600 | 1.1157 | 32855528 |
0.2684 | 0.8619 | 605 | 1.1151 | 33131344 |
0.2623 | 0.8690 | 610 | 1.1156 | 33412888 |
0.3949 | 0.8761 | 615 | 1.1167 | 33688992 |
0.2317 | 0.8833 | 620 | 1.1167 | 33963096 |
0.2483 | 0.8904 | 625 | 1.1147 | 34243336 |
0.3731 | 0.8975 | 630 | 1.1142 | 34521472 |
0.2577 | 0.9046 | 635 | 1.1143 | 34794832 |
0.2225 | 0.9118 | 640 | 1.1139 | 35064072 |
0.1567 | 0.9189 | 645 | 1.1146 | 35342008 |
0.3207 | 0.9260 | 650 | 1.1146 | 35610720 |
0.1626 | 0.9331 | 655 | 1.1153 | 35880752 |
0.2122 | 0.9403 | 660 | 1.1138 | 36156864 |
0.2865 | 0.9474 | 665 | 1.1110 | 36433816 |
0.2319 | 0.9545 | 670 | 1.1134 | 36713952 |
0.1696 | 0.9616 | 675 | 1.1129 | 36980552 |
0.2326 | 0.9687 | 680 | 1.1120 | 37256536 |
0.2783 | 0.9759 | 685 | 1.1133 | 37524184 |
0.2046 | 0.9830 | 690 | 1.1113 | 37805352 |
0.2798 | 0.9901 | 695 | 1.1119 | 38079104 |
0.2794 | 0.9972 | 700 | 1.1159 | 38340280 |

### Framework versions
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1