# collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1045
- Num Input Tokens Seen: 36166336
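The checkpoint loads like any other `transformers` causal LM. Below is a minimal inference sketch, assuming the checkpoint is public under the `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2` repo id and that the framework versions pinned at the bottom of this card (or newer) are installed:

```python
# Minimal inference sketch (assumptions: the repo id below is public and a
# transformers/PyTorch stack matching the pinned versions is installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Generate a short continuation to sanity-check the checkpoint.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```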
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
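For reference, the list above maps onto Hugging Face `TrainingArguments` as sketched below. This is a hedged reconstruction, not the original training script: the dataset, `Trainer` wiring, and data collator are unknown, and a single training device is assumed so that a per-device batch of 8 with 16 accumulation steps reproduces the reported total train batch size of 128 (8 × 16).

```python
# A sketch of TrainingArguments matching the hyperparameters above.
# Assumes a single GPU, so 8 (per-device batch) x 16 (accumulation steps)
# yields the reported total_train_batch_size of 128.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter7_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    seed=2,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```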
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6049 | 0.0075 | 5 | 1.3862 | 273640 |
1.6224 | 0.0151 | 10 | 1.3404 | 554216 |
1.4024 | 0.0226 | 15 | 1.2742 | 825824 |
1.3776 | 0.0302 | 20 | 1.2246 | 1100896 |
1.2832 | 0.0377 | 25 | 1.1803 | 1379192 |
1.22 | 0.0452 | 30 | 1.1783 | 1656944 |
0.9584 | 0.0528 | 35 | 1.1731 | 1925784 |
0.8881 | 0.0603 | 40 | 1.2068 | 2192080 |
0.8391 | 0.0678 | 45 | 1.2100 | 2459864 |
0.7926 | 0.0754 | 50 | 1.2160 | 2736544 |
0.647 | 0.0829 | 55 | 1.2217 | 3005032 |
0.6438 | 0.0905 | 60 | 1.2151 | 3277256 |
0.5487 | 0.0980 | 65 | 1.2157 | 3547224 |
0.536 | 0.1055 | 70 | 1.2048 | 3817448 |
0.4943 | 0.1131 | 75 | 1.1964 | 4094432 |
0.5394 | 0.1206 | 80 | 1.1933 | 4367400 |
0.3851 | 0.1282 | 85 | 1.1909 | 4635248 |
0.4303 | 0.1357 | 90 | 1.1893 | 4903792 |
0.4199 | 0.1432 | 95 | 1.1818 | 5173464 |
0.3878 | 0.1508 | 100 | 1.1820 | 5446408 |
0.4044 | 0.1583 | 105 | 1.1846 | 5722824 |
0.3266 | 0.1658 | 110 | 1.1800 | 5998616 |
0.3367 | 0.1734 | 115 | 1.1756 | 6269328 |
0.2639 | 0.1809 | 120 | 1.1786 | 6542264 |
0.2647 | 0.1885 | 125 | 1.1753 | 6813600 |
0.3762 | 0.1960 | 130 | 1.1739 | 7087552 |
0.3209 | 0.2035 | 135 | 1.1699 | 7360376 |
0.3376 | 0.2111 | 140 | 1.1709 | 7632536 |
0.2674 | 0.2186 | 145 | 1.1719 | 7901296 |
0.2631 | 0.2262 | 150 | 1.1681 | 8167576 |
0.3092 | 0.2337 | 155 | 1.1664 | 8438360 |
0.3305 | 0.2412 | 160 | 1.1669 | 8709792 |
0.3066 | 0.2488 | 165 | 1.1607 | 8988856 |
0.2807 | 0.2563 | 170 | 1.1590 | 9265304 |
0.3085 | 0.2639 | 175 | 1.1574 | 9543928 |
0.2921 | 0.2714 | 180 | 1.1527 | 9817056 |
0.3605 | 0.2789 | 185 | 1.1557 | 10088872 |
0.2578 | 0.2865 | 190 | 1.1481 | 10360768 |
0.3511 | 0.2940 | 195 | 1.1570 | 10632016 |
0.3591 | 0.3015 | 200 | 1.1461 | 10907720 |
0.2076 | 0.3091 | 205 | 1.1540 | 11181728 |
0.3326 | 0.3166 | 210 | 1.1482 | 11460608 |
0.3914 | 0.3242 | 215 | 1.1478 | 11730288 |
0.304 | 0.3317 | 220 | 1.1487 | 12001208 |
0.3811 | 0.3392 | 225 | 1.1459 | 12272960 |
0.2744 | 0.3468 | 230 | 1.1408 | 12542408 |
0.326 | 0.3543 | 235 | 1.1443 | 12813656 |
0.3474 | 0.3619 | 240 | 1.1414 | 13084432 |
0.3346 | 0.3694 | 245 | 1.1430 | 13360240 |
0.2965 | 0.3769 | 250 | 1.1417 | 13639536 |
0.2382 | 0.3845 | 255 | 1.1373 | 13914080 |
0.2243 | 0.3920 | 260 | 1.1406 | 14189128 |
0.1954 | 0.3995 | 265 | 1.1370 | 14460672 |
0.2857 | 0.4071 | 270 | 1.1398 | 14727040 |
0.2819 | 0.4146 | 275 | 1.1351 | 15002688 |
0.2801 | 0.4222 | 280 | 1.1367 | 15275512 |
0.2907 | 0.4297 | 285 | 1.1351 | 15554848 |
0.2928 | 0.4372 | 290 | 1.1314 | 15828296 |
0.2588 | 0.4448 | 295 | 1.1358 | 16106416 |
0.2453 | 0.4523 | 300 | 1.1329 | 16381944 |
0.3333 | 0.4599 | 305 | 1.1309 | 16661632 |
0.1884 | 0.4674 | 310 | 1.1300 | 16934712 |
0.3095 | 0.4749 | 315 | 1.1309 | 17209816 |
0.2858 | 0.4825 | 320 | 1.1301 | 17484664 |
0.3195 | 0.4900 | 325 | 1.1264 | 17759488 |
0.3203 | 0.4975 | 330 | 1.1277 | 18034664 |
0.3492 | 0.5051 | 335 | 1.1266 | 18311424 |
0.3129 | 0.5126 | 340 | 1.1249 | 18584528 |
0.2546 | 0.5202 | 345 | 1.1277 | 18861208 |
0.2907 | 0.5277 | 350 | 1.1233 | 19135856 |
0.2693 | 0.5352 | 355 | 1.1235 | 19415704 |
0.2942 | 0.5428 | 360 | 1.1219 | 19685048 |
0.2393 | 0.5503 | 365 | 1.1222 | 19954816 |
0.2333 | 0.5579 | 370 | 1.1219 | 20226432 |
0.2208 | 0.5654 | 375 | 1.1232 | 20499384 |
0.2508 | 0.5729 | 380 | 1.1209 | 20779280 |
0.2002 | 0.5805 | 385 | 1.1235 | 21053584 |
0.3333 | 0.5880 | 390 | 1.1216 | 21325712 |
0.2492 | 0.5956 | 395 | 1.1233 | 21599000 |
0.2484 | 0.6031 | 400 | 1.1225 | 21871640 |
0.3439 | 0.6106 | 405 | 1.1191 | 22140448 |
0.3389 | 0.6182 | 410 | 1.1218 | 22409872 |
0.2778 | 0.6257 | 415 | 1.1197 | 22691600 |
0.2713 | 0.6332 | 420 | 1.1177 | 22961160 |
0.2169 | 0.6408 | 425 | 1.1194 | 23229808 |
0.2825 | 0.6483 | 430 | 1.1193 | 23493888 |
0.2436 | 0.6559 | 435 | 1.1170 | 23766688 |
0.3057 | 0.6634 | 440 | 1.1191 | 24038552 |
0.2639 | 0.6709 | 445 | 1.1159 | 24312808 |
0.322 | 0.6785 | 450 | 1.1162 | 24589072 |
0.1909 | 0.6860 | 455 | 1.1180 | 24855872 |
0.2823 | 0.6936 | 460 | 1.1171 | 25129120 |
0.2644 | 0.7011 | 465 | 1.1143 | 25401832 |
0.2379 | 0.7086 | 470 | 1.1151 | 25676584 |
0.2572 | 0.7162 | 475 | 1.1151 | 25946424 |
0.1768 | 0.7237 | 480 | 1.1121 | 26216712 |
0.3079 | 0.7312 | 485 | 1.1137 | 26483648 |
0.1986 | 0.7388 | 490 | 1.1112 | 26756200 |
0.2847 | 0.7463 | 495 | 1.1128 | 27024176 |
0.1732 | 0.7539 | 500 | 1.1135 | 27293512 |
0.2724 | 0.7614 | 505 | 1.1120 | 27569208 |
0.285 | 0.7689 | 510 | 1.1124 | 27836456 |
0.2303 | 0.7765 | 515 | 1.1100 | 28107632 |
0.2479 | 0.7840 | 520 | 1.1107 | 28377688 |
0.2432 | 0.7916 | 525 | 1.1109 | 28646944 |
0.3432 | 0.7991 | 530 | 1.1102 | 28922352 |
0.217 | 0.8066 | 535 | 1.1094 | 29197160 |
0.2464 | 0.8142 | 540 | 1.1099 | 29473128 |
0.3135 | 0.8217 | 545 | 1.1086 | 29746736 |
0.2532 | 0.8292 | 550 | 1.1095 | 30013224 |
0.3145 | 0.8368 | 555 | 1.1090 | 30281256 |
0.207 | 0.8443 | 560 | 1.1067 | 30549144 |
0.1811 | 0.8519 | 565 | 1.1080 | 30828416 |
0.3074 | 0.8594 | 570 | 1.1079 | 31104032 |
0.2753 | 0.8669 | 575 | 1.1048 | 31374216 |
0.155 | 0.8745 | 580 | 1.1082 | 31649384 |
0.2296 | 0.8820 | 585 | 1.1087 | 31920192 |
0.2206 | 0.8896 | 590 | 1.1057 | 32187320 |
0.2657 | 0.8971 | 595 | 1.1065 | 32463088 |
0.2821 | 0.9046 | 600 | 1.1069 | 32731832 |
0.2835 | 0.9122 | 605 | 1.1051 | 33003520 |
0.2168 | 0.9197 | 610 | 1.1063 | 33270088 |
0.2783 | 0.9273 | 615 | 1.1067 | 33542704 |
0.2993 | 0.9348 | 620 | 1.1048 | 33816144 |
0.2227 | 0.9423 | 625 | 1.1027 | 34089248 |
0.243 | 0.9499 | 630 | 1.1044 | 34359824 |
0.2575 | 0.9574 | 635 | 1.1044 | 34638264 |
0.1769 | 0.9649 | 640 | 1.1049 | 34910856 |
0.2472 | 0.9725 | 645 | 1.1055 | 35184536 |
0.2593 | 0.9800 | 650 | 1.1024 | 35455744 |
0.2254 | 0.9876 | 655 | 1.1048 | 35726536 |
0.1744 | 0.9951 | 660 | 1.1068 | 35999296 |
### Framework versions
- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1