# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0917
- Num Input Tokens Seen: 26240480
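
The card does not document intended usage, so as a hedged sketch (assuming the checkpoint behaves like any other causal LM checkpoint on the Hub), it can be loaded with the Transformers AutoClasses:

```python
# Minimal usage sketch (not from the original card); adjust dtype/device to your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```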
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
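
These values map directly onto the Hugging Face `TrainingArguments`; the sketch below reproduces only the listed arguments, since the dataset, model setup, and data collator are not documented in this card.

```python
# Hedged reproduction sketch of the hyperparameters listed above.
# Only TrainingArguments are shown; dataset and model setup are undocumented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=0,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```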
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6284 | 0.0107 | 5 | 1.3782 | 278448 |
1.5826 | 0.0214 | 10 | 1.3029 | 562472 |
1.4086 | 0.0321 | 15 | 1.2330 | 839608 |
1.2961 | 0.0429 | 20 | 1.1864 | 1119720 |
1.1031 | 0.0536 | 25 | 1.1751 | 1399416 |
0.9367 | 0.0643 | 30 | 1.1806 | 1678736 |
0.9432 | 0.0750 | 35 | 1.1916 | 1964312 |
0.8141 | 0.0857 | 40 | 1.2028 | 2241112 |
0.6907 | 0.0964 | 45 | 1.2212 | 2525528 |
0.6315 | 0.1071 | 50 | 1.2206 | 2805344 |
0.6921 | 0.1179 | 55 | 1.1809 | 3094824 |
0.6048 | 0.1286 | 60 | 1.1891 | 3364432 |
0.4934 | 0.1393 | 65 | 1.1748 | 3648168 |
0.4218 | 0.1500 | 70 | 1.1762 | 3925368 |
0.4922 | 0.1607 | 75 | 1.1702 | 4204840 |
0.429 | 0.1714 | 80 | 1.1683 | 4486552 |
0.4841 | 0.1821 | 85 | 1.1619 | 4772968 |
0.3137 | 0.1928 | 90 | 1.1625 | 5058728 |
0.5367 | 0.2036 | 95 | 1.1546 | 5342896 |
0.481 | 0.2143 | 100 | 1.1583 | 5623272 |
0.398 | 0.2250 | 105 | 1.1506 | 5905184 |
0.277 | 0.2357 | 110 | 1.1533 | 6183096 |
0.3657 | 0.2464 | 115 | 1.1452 | 6468464 |
0.3617 | 0.2571 | 120 | 1.1471 | 6753680 |
0.3776 | 0.2678 | 125 | 1.1407 | 7035008 |
0.4071 | 0.2786 | 130 | 1.1380 | 7316016 |
0.3776 | 0.2893 | 135 | 1.1405 | 7598456 |
0.3764 | 0.3000 | 140 | 1.1348 | 7881224 |
0.3814 | 0.3107 | 145 | 1.1378 | 8164064 |
0.3856 | 0.3214 | 150 | 1.1328 | 8450760 |
0.4684 | 0.3321 | 155 | 1.1329 | 8738544 |
0.3276 | 0.3428 | 160 | 1.1322 | 9021616 |
0.3594 | 0.3536 | 165 | 1.1308 | 9294312 |
0.3287 | 0.3643 | 170 | 1.1301 | 9574680 |
0.3978 | 0.3750 | 175 | 1.1293 | 9855416 |
0.3626 | 0.3857 | 180 | 1.1270 | 10138968 |
0.3565 | 0.3964 | 185 | 1.1270 | 10420488 |
0.4081 | 0.4071 | 190 | 1.1243 | 10704944 |
0.3186 | 0.4178 | 195 | 1.1241 | 10979600 |
0.4185 | 0.4286 | 200 | 1.1224 | 11263624 |
0.3312 | 0.4393 | 205 | 1.1217 | 11540344 |
0.3759 | 0.4500 | 210 | 1.1203 | 11817640 |
0.2892 | 0.4607 | 215 | 1.1183 | 12102472 |
0.3495 | 0.4714 | 220 | 1.1206 | 12389320 |
0.3283 | 0.4821 | 225 | 1.1152 | 12670872 |
0.4334 | 0.4928 | 230 | 1.1182 | 12952952 |
0.363 | 0.5035 | 235 | 1.1141 | 13244768 |
0.3329 | 0.5143 | 240 | 1.1122 | 13527824 |
0.3223 | 0.5250 | 245 | 1.1152 | 13809336 |
0.2902 | 0.5357 | 250 | 1.1121 | 14097024 |
0.2979 | 0.5464 | 255 | 1.1128 | 14374696 |
0.4016 | 0.5571 | 260 | 1.1113 | 14653824 |
0.297 | 0.5678 | 265 | 1.1105 | 14935640 |
0.354 | 0.5785 | 270 | 1.1091 | 15209152 |
0.3685 | 0.5893 | 275 | 1.1074 | 15489240 |
0.3976 | 0.6000 | 280 | 1.1085 | 15768680 |
0.416 | 0.6107 | 285 | 1.1056 | 16047216 |
0.3145 | 0.6214 | 290 | 1.1081 | 16324680 |
0.1919 | 0.6321 | 295 | 1.1058 | 16605528 |
0.357 | 0.6428 | 300 | 1.1047 | 16893672 |
0.3169 | 0.6535 | 305 | 1.1052 | 17177936 |
0.3618 | 0.6643 | 310 | 1.1024 | 17454088 |
0.3471 | 0.6750 | 315 | 1.1039 | 17735808 |
0.3151 | 0.6857 | 320 | 1.1047 | 18016344 |
0.3423 | 0.6964 | 325 | 1.1026 | 18295360 |
0.2432 | 0.7071 | 330 | 1.1038 | 18577320 |
0.2787 | 0.7178 | 335 | 1.1023 | 18851072 |
0.3253 | 0.7285 | 340 | 1.1017 | 19133608 |
0.3579 | 0.7393 | 345 | 1.1025 | 19414200 |
0.2788 | 0.7500 | 350 | 1.1017 | 19697808 |
0.2742 | 0.7607 | 355 | 1.1010 | 19977824 |
0.3208 | 0.7714 | 360 | 1.0994 | 20257536 |
0.3571 | 0.7821 | 365 | 1.0983 | 20540544 |
0.2397 | 0.7928 | 370 | 1.0998 | 20829384 |
0.2371 | 0.8035 | 375 | 1.1000 | 21110504 |
0.3228 | 0.8142 | 380 | 1.0973 | 21392184 |
0.304 | 0.8250 | 385 | 1.0978 | 21672896 |
0.2706 | 0.8357 | 390 | 1.0990 | 21953464 |
0.2939 | 0.8464 | 395 | 1.0971 | 22236192 |
0.3252 | 0.8571 | 400 | 1.0959 | 22517408 |
0.3147 | 0.8678 | 405 | 1.0963 | 22802832 |
0.4225 | 0.8785 | 410 | 1.0956 | 23080032 |
0.3225 | 0.8892 | 415 | 1.0941 | 23361360 |
0.2575 | 0.9000 | 420 | 1.0960 | 23646040 |
0.3977 | 0.9107 | 425 | 1.0947 | 23930880 |
0.3082 | 0.9214 | 430 | 1.0965 | 24218608 |
0.3658 | 0.9321 | 435 | 1.0950 | 24504168 |
0.2867 | 0.9428 | 440 | 1.0929 | 24781640 |
0.3007 | 0.9535 | 445 | 1.0946 | 25059120 |
0.3238 | 0.9642 | 450 | 1.0941 | 25337024 |
0.3597 | 0.9750 | 455 | 1.0921 | 25617136 |
0.2523 | 0.9857 | 460 | 1.0945 | 25902840 |
0.2519 | 0.9964 | 465 | 1.0920 | 26185736 |
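
The validation losses above are mean token-level cross-entropies (in nats), so an implied perplexity can be derived from the final reported loss; the figure below is illustrative and is not a metric reported by the card.

```python
# Illustrative only: perplexity implied by the final reported eval loss.
import math

final_eval_loss = 1.0917
print(f"perplexity ≈ {math.exp(final_eval_loss):.2f}")  # ≈ 2.98
```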
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
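
To match the training environment, the installed versions can be compared against those listed above; a small sketch using the standard library:

```python
# Compare installed package versions against those listed in the card.
from importlib.metadata import version

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for pkg, listed in expected.items():
    print(f"{pkg}: installed {version(pkg)}, card lists {listed}")
```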