# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1017
- Num Input Tokens Seen: 25730952
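As a minimal usage sketch (assuming the standard `transformers` causal-LM loading pattern; the prompt below is a placeholder, not from this card):

```python
# Minimal sketch: loading the checkpoint with the standard transformers
# causal-LM API. The prompt is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```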
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
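For reference, a hedged sketch of how these settings map onto `transformers.TrainingArguments` (the output directory is a placeholder, and model/dataset wiring is omitted; the total train batch size of 128 follows from 8 per-device samples × 16 accumulation steps):

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
# output_dir is a placeholder; trainer/model/dataset setup is omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,  # total_train_batch_size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```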
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:--------------|:------|:-----|:----------------|:------------------|
No log | 0 | 0 | 1.3909 | 0 |
1.5411 | 0.0106 | 5 | 1.3783 | 272168 |
1.5918 | 0.0213 | 10 | 1.3022 | 547248 |
1.3917 | 0.0319 | 15 | 1.2334 | 827136 |
1.2769 | 0.0425 | 20 | 1.1817 | 1108760 |
1.1975 | 0.0532 | 25 | 1.1698 | 1382240 |
1.037 | 0.0638 | 30 | 1.1491 | 1659392 |
0.9364 | 0.0744 | 35 | 1.1711 | 1933320 |
0.8512 | 0.0851 | 40 | 1.1805 | 2217472 |
0.8118 | 0.0957 | 45 | 1.1890 | 2494616 |
0.7426 | 0.1063 | 50 | 1.1920 | 2767552 |
0.687 | 0.1170 | 55 | 1.1935 | 3030400 |
0.6747 | 0.1276 | 60 | 1.1881 | 3301288 |
0.6189 | 0.1382 | 65 | 1.1822 | 3574336 |
0.6121 | 0.1489 | 70 | 1.1785 | 3843792 |
0.5065 | 0.1595 | 75 | 1.1724 | 4118648 |
0.5733 | 0.1701 | 80 | 1.1710 | 4387800 |
0.5961 | 0.1808 | 85 | 1.1766 | 4659672 |
0.5097 | 0.1914 | 90 | 1.1727 | 4933736 |
0.4812 | 0.2020 | 95 | 1.1689 | 5213232 |
0.4241 | 0.2127 | 100 | 1.1730 | 5484456 |
0.5009 | 0.2233 | 105 | 1.1617 | 5759048 |
0.4416 | 0.2339 | 110 | 1.1703 | 6035320 |
0.4452 | 0.2446 | 115 | 1.1592 | 6306832 |
0.3983 | 0.2552 | 120 | 1.1651 | 6575048 |
0.4051 | 0.2658 | 125 | 1.1574 | 6846416 |
0.4605 | 0.2764 | 130 | 1.1602 | 7119824 |
0.3852 | 0.2871 | 135 | 1.1570 | 7399680 |
0.4569 | 0.2977 | 140 | 1.1494 | 7679448 |
0.3371 | 0.3083 | 145 | 1.1536 | 7948392 |
0.4216 | 0.3190 | 150 | 1.1492 | 8221992 |
0.4162 | 0.3296 | 155 | 1.1495 | 8497688 |
0.4242 | 0.3402 | 160 | 1.1470 | 8769288 |
0.5207 | 0.3509 | 165 | 1.1482 | 9040440 |
0.5184 | 0.3615 | 170 | 1.1438 | 9303304 |
0.4073 | 0.3721 | 175 | 1.1446 | 9579608 |
0.5278 | 0.3828 | 180 | 1.1419 | 9852200 |
0.3397 | 0.3934 | 185 | 1.1405 | 10120216 |
0.3696 | 0.4040 | 190 | 1.1374 | 10399376 |
0.4079 | 0.4147 | 195 | 1.1387 | 10669696 |
0.3999 | 0.4253 | 200 | 1.1354 | 10945120 |
0.3623 | 0.4359 | 205 | 1.1349 | 11217216 |
0.3865 | 0.4466 | 210 | 1.1345 | 11490240 |
0.3609 | 0.4572 | 215 | 1.1319 | 11764136 |
0.329 | 0.4678 | 220 | 1.1320 | 12035936 |
0.318 | 0.4785 | 225 | 1.1304 | 12309960 |
0.3688 | 0.4891 | 230 | 1.1303 | 12587360 |
0.3825 | 0.4997 | 235 | 1.1296 | 12864056 |
0.3342 | 0.5104 | 240 | 1.1266 | 13141392 |
0.3556 | 0.5210 | 245 | 1.1297 | 13409248 |
0.3922 | 0.5316 | 250 | 1.1232 | 13685608 |
0.2913 | 0.5423 | 255 | 1.1275 | 13960768 |
0.2877 | 0.5529 | 260 | 1.1267 | 14229912 |
0.3073 | 0.5635 | 265 | 1.1215 | 14504880 |
0.3047 | 0.5742 | 270 | 1.1249 | 14781040 |
0.3112 | 0.5848 | 275 | 1.1212 | 15052056 |
0.3715 | 0.5954 | 280 | 1.1204 | 15331080 |
0.3126 | 0.6061 | 285 | 1.1210 | 15594416 |
0.2426 | 0.6167 | 290 | 1.1199 | 15871488 |
0.3172 | 0.6273 | 295 | 1.1201 | 16148664 |
0.3546 | 0.6380 | 300 | 1.1180 | 16420880 |
0.3447 | 0.6486 | 305 | 1.1167 | 16691672 |
0.3834 | 0.6592 | 310 | 1.1152 | 16963912 |
0.3802 | 0.6699 | 315 | 1.1149 | 17234816 |
0.4121 | 0.6805 | 320 | 1.1133 | 17507216 |
0.3417 | 0.6911 | 325 | 1.1138 | 17782816 |
0.3381 | 0.7018 | 330 | 1.1137 | 18051064 |
0.3219 | 0.7124 | 335 | 1.1119 | 18317872 |
0.3273 | 0.7230 | 340 | 1.1115 | 18592672 |
0.382 | 0.7337 | 345 | 1.1110 | 18868536 |
0.2966 | 0.7443 | 350 | 1.1109 | 19141216 |
0.3398 | 0.7549 | 355 | 1.1137 | 19414104 |
0.3522 | 0.7656 | 360 | 1.1101 | 19690832 |
0.2731 | 0.7762 | 365 | 1.1126 | 19964864 |
0.4028 | 0.7868 | 370 | 1.1089 | 20238104 |
0.3434 | 0.7974 | 375 | 1.1078 | 20510528 |
0.3365 | 0.8081 | 380 | 1.1091 | 20788304 |
0.3795 | 0.8187 | 385 | 1.1099 | 21068592 |
0.3514 | 0.8293 | 390 | 1.1061 | 21347680 |
0.3104 | 0.8400 | 395 | 1.1073 | 21620912 |
0.2955 | 0.8506 | 400 | 1.1061 | 21895216 |
0.3423 | 0.8612 | 405 | 1.1049 | 22169448 |
0.3246 | 0.8719 | 410 | 1.1072 | 22443272 |
0.3157 | 0.8825 | 415 | 1.1059 | 22717032 |
0.3253 | 0.8931 | 420 | 1.1058 | 22985352 |
0.4123 | 0.9038 | 425 | 1.1068 | 23257848 |
0.2308 | 0.9144 | 430 | 1.1055 | 23530088 |
0.3211 | 0.9250 | 435 | 1.1055 | 23802936 |
0.3404 | 0.9357 | 440 | 1.1038 | 24081728 |
0.2566 | 0.9463 | 445 | 1.1033 | 24356968 |
0.3221 | 0.9569 | 450 | 1.1028 | 24630208 |
0.3999 | 0.9676 | 455 | 1.1022 | 24903936 |
0.3544 | 0.9782 | 460 | 1.1022 | 25182688 |
0.287 | 0.9888 | 465 | 1.1035 | 25458936 |
0.2694 | 0.9995 | 470 | 1.1017 | 25730952 |
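The validation loss falls steeply over the first ~30 steps, ticks up briefly around steps 35-55, and then declines gradually to 1.1017 by the end of the epoch. Assuming the logged losses are mean per-token cross-entropy in nats (the Trainer default), they convert to perplexity as exp(loss):

```python
import math

# Convert a few logged validation losses from the table above to
# perplexity, assuming mean per-token cross-entropy in nats.
checkpoints = {0: 1.3909, 30: 1.1491, 470: 1.1017}
for step, loss in checkpoints.items():
    print(f"step {step:>3}: loss={loss:.4f}  perplexity={math.exp(loss):.2f}")
```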
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
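To reproduce this environment, the pinned versions can be checked at runtime; a small sanity-check sketch:

```python
# Sanity check: confirm the environment matches the pinned versions above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name] == want else f"got {installed[name]}"
    print(f"{name}: expected {want} -> {status}")
```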