# collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of [google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.9280
- Num Input Tokens Seen: 13412700
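A minimal loading sketch with `transformers` is shown below for reference. The repository id comes from this model's page; the `bfloat16` dtype and `device_map="auto"` placement are assumptions, and a 27B checkpoint requires substantial GPU memory:

```python
# Minimal loading sketch for this checkpoint.
# dtype and device_map are illustrative assumptions, not from the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference
    device_map="auto",           # assumption: shard across available devices
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```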
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
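As referenced above, here is a minimal sketch of these settings expressed as `transformers.TrainingArguments`. The `output_dir` is a placeholder; everything else mirrors the list, with the total train batch size of 128 arising from 4 × 32 gradient accumulation steps:

```python
# Sketch of the reported hyperparameters as TrainingArguments.
# output_dir is a placeholder assumption, not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-27b_hs2_accumulate_iter3_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = total train batch size of 128
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```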
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:--------------|:------|:-----|:----------------|:------------------|
No log | 0 | 0 | 1.1282 | 0 |
2.1585 | 0.0187 | 5 | 1.0516 | 253472 |
2.0366 | 0.0374 | 10 | 0.9878 | 506396 |
2.2853 | 0.0562 | 15 | 0.9800 | 760944 |
1.9353 | 0.0749 | 20 | 0.9748 | 1012816 |
1.7788 | 0.0936 | 25 | 0.9765 | 1258660 |
1.5677 | 0.1123 | 30 | 0.9865 | 1505980 |
1.6266 | 0.1310 | 35 | 0.9797 | 1748944 |
1.3893 | 0.1498 | 40 | 0.9770 | 1996076 |
1.3214 | 0.1685 | 45 | 0.9758 | 2249964 |
1.2104 | 0.1872 | 50 | 0.9732 | 2502428 |
1.1943 | 0.2059 | 55 | 0.9673 | 2758156 |
0.9618 | 0.2246 | 60 | 0.9648 | 3002952 |
0.9917 | 0.2434 | 65 | 0.9608 | 3250420 |
0.9458 | 0.2621 | 70 | 0.9592 | 3498588 |
0.8799 | 0.2808 | 75 | 0.9541 | 3753220 |
0.9288 | 0.2995 | 80 | 0.9547 | 4005744 |
0.9042 | 0.3182 | 85 | 0.9524 | 4251648 |
0.7466 | 0.3370 | 90 | 0.9507 | 4504748 |
0.802 | 0.3557 | 95 | 0.9492 | 4759604 |
0.786 | 0.3744 | 100 | 0.9468 | 5010224 |
0.8059 | 0.3931 | 105 | 0.9463 | 5261388 |
0.7014 | 0.4118 | 110 | 0.9448 | 5508984 |
0.7977 | 0.4306 | 115 | 0.9438 | 5767344 |
0.9226 | 0.4493 | 120 | 0.9425 | 6015220 |
0.9092 | 0.4680 | 125 | 0.9414 | 6270096 |
0.692 | 0.4867 | 130 | 0.9401 | 6522928 |
0.7488 | 0.5054 | 135 | 0.9394 | 6774308 |
0.6813 | 0.5242 | 140 | 0.9378 | 7026956 |
0.9565 | 0.5429 | 145 | 0.9353 | 7281764 |
0.7867 | 0.5616 | 150 | 0.9364 | 7535708 |
0.6354 | 0.5803 | 155 | 0.9373 | 7783224 |
0.8341 | 0.5990 | 160 | 0.9340 | 8026812 |
0.834 | 0.6178 | 165 | 0.9358 | 8276260 |
0.7364 | 0.6365 | 170 | 0.9338 | 8529636 |
0.7822 | 0.6552 | 175 | 0.9329 | 8787372 |
0.8144 | 0.6739 | 180 | 0.9337 | 9033612 |
0.7588 | 0.6926 | 185 | 0.9321 | 9283952 |
0.6757 | 0.7114 | 190 | 0.9320 | 9528272 |
0.5925 | 0.7301 | 195 | 0.9327 | 9775216 |
0.6711 | 0.7488 | 200 | 0.9321 | 10031428 |
0.7888 | 0.7675 | 205 | 0.9301 | 10287112 |
0.7551 | 0.7862 | 210 | 0.9322 | 10539552 |
0.7367 | 0.8050 | 215 | 0.9328 | 10786728 |
0.6682 | 0.8237 | 220 | 0.9318 | 11033040 |
0.7802 | 0.8424 | 225 | 0.9310 | 11281864 |
0.7423 | 0.8611 | 230 | 0.9317 | 11537232 |
0.8502 | 0.8798 | 235 | 0.9309 | 11791856 |
0.7691 | 0.8986 | 240 | 0.9283 | 12041012 |
0.7173 | 0.9173 | 245 | 0.9318 | 12291188 |
0.7158 | 0.9360 | 250 | 0.9296 | 12542864 |
0.7733 | 0.9547 | 255 | 0.9307 | 12794508 |
0.6864 | 0.9734 | 260 | 0.9298 | 13055348 |
0.6458 | 0.9922 | 265 | 0.9288 | 13306708 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
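A small sanity-check sketch for confirming that a local environment matches these pins (assumes the packages are already installed):

```python
# Compare installed versions against the pins listed above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.4.0+cu121
print(datasets.__version__)      # expected: 2.20.0
print(tokenizers.__version__)    # expected: 0.19.1
```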