# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1103
- Num Input Tokens Seen: 30159864
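The checkpoint can be loaded like any other causal language model from the Hugging Face Hub. Below is a minimal usage sketch, assuming the repository id `jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1` from this card; the prompt and generation settings are illustrative only.

```python
# Minimal usage sketch: load the fine-tuned checkpoint and generate text.
# Repository id comes from this model card; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 checkpoints are commonly run in bfloat16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```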
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a minimal `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
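For reproducibility, the list above maps onto a `transformers.TrainingArguments` configuration roughly as follows. This is a sketch, not the exact training script: the output directory and the logging/eval cadence of 5 steps are inferred from this card, everything else (data, model setup) is omitted. Note that the effective batch size of 128 is simply 8 examples per device times 16 gradient-accumulation steps.

```python
# Sketch of a TrainingArguments setup matching the hyperparameters listed above.
# Only the listed values are taken from this card; paths and cadence are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 effective train batch size
    num_train_epochs=1,
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    logging_steps=5,                  # training loss is logged every 5 steps in the table below
    eval_strategy="steps",
    eval_steps=5,                     # validation loss is reported every 5 steps
)
```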
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.6921 | 0.0091 | 5 | 1.3865 | 277592 |
1.5157 | 0.0183 | 10 | 1.3199 | 553240 |
1.4327 | 0.0274 | 15 | 1.2512 | 835040 |
1.3634 | 0.0366 | 20 | 1.1942 | 1119152 |
1.2062 | 0.0457 | 25 | 1.1630 | 1386248 |
1.1502 | 0.0548 | 30 | 1.1451 | 1658184 |
1.1499 | 0.0640 | 35 | 1.1355 | 1932840 |
1.0385 | 0.0731 | 40 | 1.1430 | 2203400 |
1.0015 | 0.0822 | 45 | 1.1660 | 2478336 |
0.898 | 0.0914 | 50 | 1.1775 | 2749216 |
0.8754 | 0.1005 | 55 | 1.1909 | 3024568 |
0.7831 | 0.1097 | 60 | 1.2013 | 3297256 |
0.7973 | 0.1188 | 65 | 1.2082 | 3567512 |
0.6224 | 0.1279 | 70 | 1.1975 | 3832728 |
0.7229 | 0.1371 | 75 | 1.2022 | 4107456 |
0.6716 | 0.1462 | 80 | 1.2067 | 4381328 |
0.6282 | 0.1554 | 85 | 1.1985 | 4664272 |
0.6613 | 0.1645 | 90 | 1.1931 | 4946808 |
0.5538 | 0.1736 | 95 | 1.1930 | 5225232 |
0.5592 | 0.1828 | 100 | 1.1906 | 5499184 |
0.4737 | 0.1919 | 105 | 1.1943 | 5773464 |
0.4775 | 0.2011 | 110 | 1.1922 | 6045360 |
0.5431 | 0.2102 | 115 | 1.1878 | 6319560 |
0.4571 | 0.2193 | 120 | 1.1972 | 6595248 |
0.4625 | 0.2285 | 125 | 1.1849 | 6867392 |
0.4473 | 0.2376 | 130 | 1.1891 | 7145000 |
0.5032 | 0.2467 | 135 | 1.1884 | 7422304 |
0.527 | 0.2559 | 140 | 1.1812 | 7692168 |
0.4619 | 0.2650 | 145 | 1.1891 | 7971504 |
0.3861 | 0.2742 | 150 | 1.1777 | 8252232 |
0.368 | 0.2833 | 155 | 1.1825 | 8524736 |
0.3585 | 0.2924 | 160 | 1.1737 | 8803376 |
0.3527 | 0.3016 | 165 | 1.1859 | 9079664 |
0.3797 | 0.3107 | 170 | 1.1770 | 9350760 |
0.3966 | 0.3199 | 175 | 1.1802 | 9632672 |
0.4109 | 0.3290 | 180 | 1.1730 | 9909824 |
0.3386 | 0.3381 | 185 | 1.1750 | 10173440 |
0.36 | 0.3473 | 190 | 1.1711 | 10449856 |
0.4232 | 0.3564 | 195 | 1.1766 | 10723480 |
0.3718 | 0.3655 | 200 | 1.1686 | 10996072 |
0.3378 | 0.3747 | 205 | 1.1685 | 11274712 |
0.3298 | 0.3838 | 210 | 1.1680 | 11548536 |
0.2605 | 0.3930 | 215 | 1.1632 | 11819712 |
0.3222 | 0.4021 | 220 | 1.1657 | 12095032 |
0.3331 | 0.4112 | 225 | 1.1652 | 12378464 |
0.2945 | 0.4204 | 230 | 1.1584 | 12652256 |
0.2602 | 0.4295 | 235 | 1.1626 | 12933344 |
0.3413 | 0.4387 | 240 | 1.1585 | 13206880 |
0.3522 | 0.4478 | 245 | 1.1545 | 13481312 |
0.3239 | 0.4569 | 250 | 1.1541 | 13757280 |
0.33 | 0.4661 | 255 | 1.1550 | 14035648 |
0.3271 | 0.4752 | 260 | 1.1496 | 14314056 |
0.3631 | 0.4844 | 265 | 1.1574 | 14591184 |
0.2662 | 0.4935 | 270 | 1.1473 | 14869784 |
0.3374 | 0.5026 | 275 | 1.1495 | 15145912 |
0.377 | 0.5118 | 280 | 1.1476 | 15422056 |
0.3415 | 0.5209 | 285 | 1.1429 | 15701624 |
0.3588 | 0.5300 | 290 | 1.1448 | 15975448 |
0.2623 | 0.5392 | 295 | 1.1429 | 16251672 |
0.3372 | 0.5483 | 300 | 1.1397 | 16532768 |
0.3099 | 0.5575 | 305 | 1.1411 | 16807688 |
0.3222 | 0.5666 | 310 | 1.1403 | 17084280 |
0.2805 | 0.5757 | 315 | 1.1359 | 17362984 |
0.3158 | 0.5849 | 320 | 1.1391 | 17636368 |
0.3678 | 0.5940 | 325 | 1.1345 | 17909736 |
0.2457 | 0.6032 | 330 | 1.1353 | 18187664 |
0.4106 | 0.6123 | 335 | 1.1346 | 18465160 |
0.4054 | 0.6214 | 340 | 1.1343 | 18735840 |
0.4196 | 0.6306 | 345 | 1.1306 | 19013544 |
0.3024 | 0.6397 | 350 | 1.1335 | 19291160 |
0.2863 | 0.6488 | 355 | 1.1335 | 19566392 |
0.3069 | 0.6580 | 360 | 1.1296 | 19846576 |
0.4561 | 0.6671 | 365 | 1.1286 | 20120792 |
0.3369 | 0.6763 | 370 | 1.1289 | 20397368 |
0.342 | 0.6854 | 375 | 1.1292 | 20674400 |
0.4051 | 0.6945 | 380 | 1.1252 | 20955416 |
0.1938 | 0.7037 | 385 | 1.1282 | 21228600 |
0.2087 | 0.7128 | 390 | 1.1273 | 21509832 |
0.2746 | 0.7220 | 395 | 1.1244 | 21781432 |
0.3352 | 0.7311 | 400 | 1.1271 | 22062768 |
0.2967 | 0.7402 | 405 | 1.1253 | 22336688 |
0.2059 | 0.7494 | 410 | 1.1242 | 22617384 |
0.2417 | 0.7585 | 415 | 1.1241 | 22888744 |
0.283 | 0.7676 | 420 | 1.1219 | 23166464 |
0.3493 | 0.7768 | 425 | 1.1223 | 23442624 |
0.3613 | 0.7859 | 430 | 1.1215 | 23724456 |
0.2175 | 0.7951 | 435 | 1.1199 | 23997552 |
0.3372 | 0.8042 | 440 | 1.1209 | 24271688 |
0.3313 | 0.8133 | 445 | 1.1184 | 24549464 |
0.3209 | 0.8225 | 450 | 1.1187 | 24830048 |
0.2609 | 0.8316 | 455 | 1.1187 | 25105840 |
0.335 | 0.8408 | 460 | 1.1176 | 25383592 |
0.2367 | 0.8499 | 465 | 1.1171 | 25654008 |
0.3219 | 0.8590 | 470 | 1.1170 | 25924368 |
0.29 | 0.8682 | 475 | 1.1189 | 26194176 |
0.231 | 0.8773 | 480 | 1.1164 | 26472920 |
0.2929 | 0.8865 | 485 | 1.1169 | 26748736 |
0.2734 | 0.8956 | 490 | 1.1169 | 27018208 |
0.3264 | 0.9047 | 495 | 1.1150 | 27298736 |
0.2777 | 0.9139 | 500 | 1.1144 | 27564544 |
0.3015 | 0.9230 | 505 | 1.1126 | 27841416 |
0.3482 | 0.9321 | 510 | 1.1137 | 28115128 |
0.3251 | 0.9413 | 515 | 1.1132 | 28395504 |
0.3143 | 0.9504 | 520 | 1.1135 | 28675176 |
0.3316 | 0.9596 | 525 | 1.1146 | 28940144 |
0.3076 | 0.9687 | 530 | 1.1105 | 29217824 |
0.3911 | 0.9778 | 535 | 1.1112 | 29503120 |
0.2661 | 0.9870 | 540 | 1.1114 | 29775240 |
0.3464 | 0.9961 | 545 | 1.1098 | 30047440 |
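For intuition, if the reported validation loss is the standard mean token-level cross-entropy (the usual metric for causal-LM fine-tuning with `Trainer`, an assumption here), the final evaluation loss of 1.1103 corresponds to a perplexity of roughly e^1.1103 ≈ 3.04:

```python
# Convert the reported cross-entropy loss to perplexity.
# Assumes the loss is the mean per-token negative log-likelihood.
import math

final_eval_loss = 1.1103
print(math.exp(final_eval_loss))  # ≈ 3.04
```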
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1