# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0909
- Num Input Tokens Seen: 25913464
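As a quick start, here is a minimal loading sketch using `transformers`, assuming the checkpoint is hosted on the Hugging Face Hub under the repo id `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1`:

```python
# Minimal sketch: load the fine-tuned checkpoint and generate a short completion.
# The repo id is assumed from the model's Hub page.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```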
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them is shown after the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
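For reference, here is a minimal sketch of these settings expressed as `transformers.TrainingArguments`. The total train batch size of 128 follows from 8 (per-device batch size) × 16 (gradient accumulation steps) on a single device, so only those two values are set explicitly; `output_dir` is a placeholder.

```python
# Sketch: TrainingArguments mirroring the hyperparameters listed above.
# total_train_batch_size = 8 (per device) * 16 (accumulation) = 128, assuming one device.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd1",  # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```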
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---------------|-------|------|-----------------|-------------------|
No log | 0 | 0 | 1.3909 | 0 |
1.5757 | 0.0106 | 5 | 1.3785 | 273728 |
1.5086 | 0.0212 | 10 | 1.3024 | 553176 |
1.3703 | 0.0318 | 15 | 1.2301 | 832928 |
1.2237 | 0.0424 | 20 | 1.1798 | 1112400 |
1.1043 | 0.0530 | 25 | 1.1741 | 1387256 |
0.8871 | 0.0636 | 30 | 1.1676 | 1667816 |
0.8128 | 0.0742 | 35 | 1.1807 | 1935720 |
0.8159 | 0.0848 | 40 | 1.1931 | 2212104 |
0.7139 | 0.0955 | 45 | 1.2108 | 2488864 |
0.6054 | 0.1061 | 50 | 1.1934 | 2759968 |
0.5794 | 0.1167 | 55 | 1.1874 | 3037768 |
0.4857 | 0.1273 | 60 | 1.1861 | 3315040 |
0.5228 | 0.1379 | 65 | 1.1744 | 3590680 |
0.5009 | 0.1485 | 70 | 1.1665 | 3866264 |
0.4853 | 0.1591 | 75 | 1.1741 | 4138640 |
0.4493 | 0.1697 | 80 | 1.1581 | 4408560 |
0.4206 | 0.1803 | 85 | 1.1612 | 4676520 |
0.3377 | 0.1909 | 90 | 1.1532 | 4956920 |
0.3708 | 0.2015 | 95 | 1.1524 | 5230480 |
0.4861 | 0.2121 | 100 | 1.1467 | 5510432 |
0.415 | 0.2227 | 105 | 1.1487 | 5783888 |
0.3656 | 0.2333 | 110 | 1.1439 | 6059904 |
0.4284 | 0.2439 | 115 | 1.1477 | 6333552 |
0.3727 | 0.2545 | 120 | 1.1430 | 6607432 |
0.4572 | 0.2651 | 125 | 1.1448 | 6884048 |
0.3842 | 0.2758 | 130 | 1.1388 | 7161200 |
0.3452 | 0.2864 | 135 | 1.1418 | 7443528 |
0.3085 | 0.2970 | 140 | 1.1353 | 7719360 |
0.4154 | 0.3076 | 145 | 1.1353 | 8001024 |
0.3739 | 0.3182 | 150 | 1.1316 | 8281392 |
0.3435 | 0.3288 | 155 | 1.1313 | 8553600 |
0.356 | 0.3394 | 160 | 1.1337 | 8825544 |
0.3751 | 0.3500 | 165 | 1.1262 | 9098040 |
0.3788 | 0.3606 | 170 | 1.1268 | 9377472 |
0.3203 | 0.3712 | 175 | 1.1266 | 9649408 |
0.3023 | 0.3818 | 180 | 1.1224 | 9930488 |
0.3961 | 0.3924 | 185 | 1.1217 | 10204672 |
0.4728 | 0.4030 | 190 | 1.1191 | 10476840 |
0.3212 | 0.4136 | 195 | 1.1211 | 10748672 |
0.3261 | 0.4242 | 200 | 1.1176 | 11022304 |
0.2691 | 0.4348 | 205 | 1.1170 | 11294832 |
0.2953 | 0.4454 | 210 | 1.1151 | 11571256 |
0.3242 | 0.4561 | 215 | 1.1162 | 11845312 |
0.3608 | 0.4667 | 220 | 1.1142 | 12124880 |
0.3344 | 0.4773 | 225 | 1.1133 | 12396192 |
0.2966 | 0.4879 | 230 | 1.1142 | 12663864 |
0.3665 | 0.4985 | 235 | 1.1141 | 12938920 |
0.3217 | 0.5091 | 240 | 1.1155 | 13209424 |
0.3376 | 0.5197 | 245 | 1.1119 | 13482760 |
0.3636 | 0.5303 | 250 | 1.1130 | 13749552 |
0.3988 | 0.5409 | 255 | 1.1115 | 14022304 |
0.361 | 0.5515 | 260 | 1.1087 | 14298840 |
0.3727 | 0.5621 | 265 | 1.1117 | 14569648 |
0.3881 | 0.5727 | 270 | 1.1083 | 14844120 |
0.324 | 0.5833 | 275 | 1.1086 | 15119496 |
0.4137 | 0.5939 | 280 | 1.1079 | 15395456 |
0.4208 | 0.6045 | 285 | 1.1058 | 15671704 |
0.2808 | 0.6151 | 290 | 1.1065 | 15944040 |
0.2928 | 0.6257 | 295 | 1.1055 | 16220520 |
0.4027 | 0.6364 | 300 | 1.1075 | 16491504 |
0.2943 | 0.6470 | 305 | 1.1053 | 16765024 |
0.3012 | 0.6576 | 310 | 1.1059 | 17039080 |
0.2789 | 0.6682 | 315 | 1.1039 | 17318648 |
0.3305 | 0.6788 | 320 | 1.1030 | 17596848 |
0.321 | 0.6894 | 325 | 1.1018 | 17870976 |
0.3127 | 0.7000 | 330 | 1.1039 | 18137760 |
0.3792 | 0.7106 | 335 | 1.1030 | 18410248 |
0.3946 | 0.7212 | 340 | 1.0999 | 18677968 |
0.334 | 0.7318 | 345 | 1.1031 | 18947432 |
0.3146 | 0.7424 | 350 | 1.1030 | 19227968 |
0.3158 | 0.7530 | 355 | 1.0988 | 19509360 |
0.2907 | 0.7636 | 360 | 1.1000 | 19785616 |
0.4204 | 0.7742 | 365 | 1.1001 | 20056848 |
0.2924 | 0.7848 | 370 | 1.1002 | 20335856 |
0.3222 | 0.7954 | 375 | 1.0997 | 20613064 |
0.3221 | 0.8060 | 380 | 1.0989 | 20884992 |
0.3005 | 0.8167 | 385 | 1.0967 | 21162232 |
0.3183 | 0.8273 | 390 | 1.0968 | 21438576 |
0.3396 | 0.8379 | 395 | 1.0980 | 21715544 |
0.3205 | 0.8485 | 400 | 1.0947 | 21988384 |
0.3199 | 0.8591 | 405 | 1.0972 | 22266120 |
0.314 | 0.8697 | 410 | 1.0939 | 22539560 |
0.4633 | 0.8803 | 415 | 1.0941 | 22813776 |
0.3282 | 0.8909 | 420 | 1.0940 | 23090296 |
0.3576 | 0.9015 | 425 | 1.0933 | 23369344 |
0.3411 | 0.9121 | 430 | 1.0934 | 23645208 |
0.2557 | 0.9227 | 435 | 1.0935 | 23919016 |
0.4153 | 0.9333 | 440 | 1.0922 | 24194664 |
0.3082 | 0.9439 | 445 | 1.0929 | 24470512 |
0.2994 | 0.9545 | 450 | 1.0925 | 24748488 |
0.2968 | 0.9651 | 455 | 1.0915 | 25029504 |
0.3045 | 0.9757 | 460 | 1.0936 | 25307368 |
0.273 | 0.9863 | 465 | 1.0917 | 25584672 |
0.3096 | 0.9970 | 470 | 1.0909 | 25862576 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
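To reproduce this environment, here is a small sketch that checks the installed package versions against those listed above (it assumes all four packages are importable):

```python
# Sketch: verify installed versions match those used for training.
import datasets, tokenizers, torch, transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "OK" if installed[name] == want else f"MISMATCH (have {installed[name]})"
    print(f"{name}=={want}: {status}")
```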