collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 1.0884
Num Input Tokens Seen: 15872728
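
For a rough sense of scale, the evaluation loss can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats; that is the usual convention for causal language model fine-tuning with Transformers, but the card does not state it.

```python
import math

eval_loss = 1.0884  # evaluation loss reported above

# Assumption: the loss is mean per-token cross-entropy in nats,
# in which case perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the evaluation perplexity comes out at roughly 2.97.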

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 1
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
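
The batch-size and warmup settings above interact, so a small worked example may help. This is a sketch under stated assumptions, not the original training code: the device count (1) is inferred from 8 x 16 = 128, and the total step count (about 282) is estimated from the results table, where step 280 is logged at epoch 0.9920. The helper `lr_at` is a hypothetical name that imitates the shape of a constant-with-warmup schedule (linear ramp, then flat).

```python
import math

learning_rate = 8e-06
train_batch_size = 8              # per-device batch size
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: a single device, which makes the listed total of 128 work out.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Assumption: about 282 optimizer steps in the epoch, estimated from the
# results table (280 / 0.9920). The card does not state the exact count.
estimated_total_steps = 282
warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear ramp from 0 to learning_rate, then constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, warmup_steps, lr_at(5), lr_at(100))
```

With these estimates the learning rate would reach its full value of 8e-06 after about 15 optimizer steps and stay there for the rest of the epoch.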

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.3909	0
1.474	0.0177	5	1.3567	285720
1.4369	0.0354	10	1.2496	565232
1.2263	0.0531	15	1.1836	850968
1.058	0.0709	20	1.1574	1131032
1.0558	0.0886	25	1.1384	1415088
0.9716	0.1063	30	1.1342	1699200
0.9471	0.1240	35	1.1414	1983400
0.871	0.1417	40	1.1586	2259680
0.8638	0.1594	45	1.1617	2532392
0.7182	0.1771	50	1.1642	2810528
0.7555	0.1949	55	1.1530	3099608
0.6293	0.2126	60	1.1573	3382000
0.7471	0.2303	65	1.1435	3664200
0.7487	0.2480	70	1.1445	3950688
0.6169	0.2657	75	1.1419	4230496
0.5751	0.2834	80	1.1417	4507120
0.5456	0.3012	85	1.1350	4786632
0.6307	0.3189	90	1.1295	5069384
0.6725	0.3366	95	1.1301	5352256
0.6452	0.3543	100	1.1266	5635872
0.5572	0.3720	105	1.1269	5913352
0.5333	0.3897	110	1.1220	6195264
0.5336	0.4074	115	1.1193	6482200
0.5775	0.4252	120	1.1233	6757120
0.5249	0.4429	125	1.1182	7043160
0.5661	0.4606	130	1.1146	7324248
0.3956	0.4783	135	1.1141	7610520
0.4829	0.4960	140	1.1137	7886808
0.433	0.5137	145	1.1106	8169464
0.5709	0.5314	150	1.1096	8446496
0.4519	0.5492	155	1.1087	8724352
0.5516	0.5669	160	1.1088	9001512
0.4438	0.5846	165	1.1054	9287232
0.464	0.6023	170	1.1069	9572824
0.5425	0.6200	175	1.1035	9852520
0.4022	0.6377	180	1.1044	10135104
0.6573	0.6554	185	1.1008	10419320
0.5222	0.6732	190	1.1032	10699120
0.5912	0.6909	195	1.1012	10975480
0.4845	0.7086	200	1.0997	11258720
0.5564	0.7263	205	1.0996	11541392
0.4095	0.7440	210	1.1012	11823104
0.4972	0.7617	215	1.0973	12106184
0.5316	0.7795	220	1.0985	12386192
0.4829	0.7972	225	1.0973	12667440
0.5517	0.8149	230	1.0951	12946864
0.5426	0.8326	235	1.0952	13228840
0.4625	0.8503	240	1.0943	13511512
0.6167	0.8680	245	1.0935	13788608
0.5621	0.8857	250	1.0924	14071152
0.4886	0.9035	255	1.0923	14352392
0.5573	0.9212	260	1.0907	14637472
0.4458	0.9389	265	1.0913	14920752
0.524	0.9566	270	1.0897	15194904
0.5246	0.9743	275	1.0898	15477960
0.3902	0.9920	280	1.0898	15763792
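
The table can be summarized programmatically. The sketch below parses a handful of rows copied from the table (not the full log) to locate the lowest logged validation loss and to estimate token throughput per optimizer step. The per-sequence figure assumes 128 sequences per step, taken from total_train_batch_size; the function name `parse_rows` is illustrative only.

```python
# A few rows copied from the table above, tab-separated in the same column order:
# Training Loss, Epoch, Step, Validation Loss, Input Tokens Seen
ROWS = """\
No log\t0\t0\t1.3909\t0
0.9716\t0.1063\t30\t1.1342\t1699200
0.7182\t0.1771\t50\t1.1642\t2810528
0.524\t0.9566\t270\t1.0897\t15194904
0.3902\t0.9920\t280\t1.0898\t15763792
"""


def parse_rows(text):
    """Turn tab-separated table rows into a list of dicts."""
    records = []
    for line in text.strip().splitlines():
        train, epoch, step, val, tokens = line.split("\t")
        records.append({
            "train_loss": None if train == "No log" else float(train),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val),
            "tokens": int(tokens),
        })
    return records


records = parse_rows(ROWS)

# Lowest validation loss among the parsed rows.
best = min(records, key=lambda r: r["val_loss"])

# Average tokens consumed per optimizer step, and per sequence
# (assumption: 128 sequences per step, from total_train_batch_size).
last = records[-1]
tokens_per_step = last["tokens"] / last["step"]
tokens_per_sequence = tokens_per_step / 128

print(best["step"], best["val_loss"])
print(round(tokens_per_step), round(tokens_per_sequence))
```

Across the full table the lowest logged validation loss is 1.0897 at step 270, while the training loss falls to roughly 0.4 to 0.6 over the same period. Validation loss also rises temporarily between steps 30 and 50 (1.1342 to 1.1642) before resuming its decline.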

Framework versions

Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1

RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd1
