collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1

This model is a fine-tuned version of google/gemma-2-9b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9448
  • Number of input tokens seen: 14,395,896
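
Below is a minimal inference sketch using the Transformers library. The repository ID is taken from the model tree at the end of this card, and BF16 loading matches the stored tensor type; treat this as a starting point rather than an official usage recipe.

```python
# Minimal inference sketch (not an official recipe from this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint tensors are stored in BF16
    device_map="auto",           # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```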

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 4
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
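
As a hedged illustration, these settings map onto transformers.TrainingArguments roughly as follows. The actual training script is not part of this card; in particular, the total train batch size of 128 is consistent with a single device at batch size 4 with 32 accumulation steps, and Trainer's default AdamW optimizer already uses the listed betas and epsilon.

```python
# Sketch only: mirrors the hyperparameters listed above, not the original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=32,  # 4 * 32 = 128 effective batch (single device assumed)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Trainer's AdamW defaults, shown for completeness
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption, based on the BF16 tensor type noted below
)
```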

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.2335          | 0                 |
| 1.2638        | 0.0178 | 5    | 1.1350          | 263564            |
| 1.088         | 0.0356 | 10   | 1.0496          | 522096            |
| 0.9278        | 0.0534 | 15   | 1.0059          | 770488            |
| 0.7003        | 0.0712 | 20   | 1.0038          | 1030136           |
| 0.6094        | 0.0889 | 25   | 1.0117          | 1293096           |
| 0.5915        | 0.1067 | 30   | 1.0084          | 1544952           |
| 0.571         | 0.1245 | 35   | 1.0023          | 1798880           |
| 0.4553        | 0.1423 | 40   | 1.0002          | 2052424           |
| 0.4776        | 0.1601 | 45   | 0.9951          | 2308216           |
| 0.4561        | 0.1779 | 50   | 0.9884          | 2565080           |
| 0.4392        | 0.1957 | 55   | 0.9841          | 2825996           |
| 0.4753        | 0.2135 | 60   | 0.9797          | 3082260           |
| 0.4597        | 0.2313 | 65   | 0.9759          | 3328388           |
| 0.436         | 0.2491 | 70   | 0.9738          | 3584552           |
| 0.3907        | 0.2668 | 75   | 0.9703          | 3839180           |
| 0.4001        | 0.2846 | 80   | 0.9676          | 4100568           |
| 0.4112        | 0.3024 | 85   | 0.9671          | 4356852           |
| 0.4249        | 0.3202 | 90   | 0.9659          | 4610688           |
| 0.3945        | 0.3380 | 95   | 0.9654          | 4859752           |
| 0.5615        | 0.3558 | 100  | 0.9627          | 5108284           |
| 0.3528        | 0.3736 | 105  | 0.9619          | 5363428           |
| 0.3511        | 0.3914 | 110  | 0.9629          | 5623372           |
| 0.3744        | 0.4092 | 115  | 0.9600          | 5876016           |
| 0.4473        | 0.4270 | 120  | 0.9598          | 6139008           |
| 0.465         | 0.4447 | 125  | 0.9595          | 6392720           |
| 0.4511        | 0.4625 | 130  | 0.9568          | 6655704           |
| 0.3273        | 0.4803 | 135  | 0.9570          | 6909620           |
| 0.3689        | 0.4981 | 140  | 0.9575          | 7163740           |
| 0.3782        | 0.5159 | 145  | 0.9551          | 7424140           |
| 0.4371        | 0.5337 | 150  | 0.9541          | 7682936           |
| 0.3295        | 0.5515 | 155  | 0.9543          | 7939780           |
| 0.3631        | 0.5693 | 160  | 0.9533          | 8196216           |
| 0.4747        | 0.5871 | 165  | 0.9532          | 8457568           |
| 0.4171        | 0.6048 | 170  | 0.9545          | 8708980           |
| 0.4043        | 0.6226 | 175  | 0.9535          | 8963244           |
| 0.3966        | 0.6404 | 180  | 0.9523          | 9216124           |
| 0.487         | 0.6582 | 185  | 0.9520          | 9470216           |
| 0.4243        | 0.6760 | 190  | 0.9523          | 9726172           |
| 0.338         | 0.6938 | 195  | 0.9505          | 9978316           |
| 0.3794        | 0.7116 | 200  | 0.9510          | 10237320          |
| 0.4474        | 0.7294 | 205  | 0.9515          | 10498692          |
| 0.498         | 0.7472 | 210  | 0.9510          | 10755164          |
| 0.3557        | 0.7650 | 215  | 0.9505          | 11013492          |
| 0.3772        | 0.7827 | 220  | 0.9503          | 11263256          |
| 0.4487        | 0.8005 | 225  | 0.9509          | 11524460          |
| 0.3492        | 0.8183 | 230  | 0.9481          | 11776848          |
| 0.4046        | 0.8361 | 235  | 0.9483          | 12034428          |
| 0.3995        | 0.8539 | 240  | 0.9484          | 12301540          |
| 0.345         | 0.8717 | 245  | 0.9485          | 12558184          |
| 0.3618        | 0.8895 | 250  | 0.9476          | 12818680          |
| 0.286         | 0.9073 | 255  | 0.9476          | 13077536          |
| 0.368         | 0.9251 | 260  | 0.9487          | 13332544          |
| 0.3742        | 0.9429 | 265  | 0.9456          | 13585628          |
| 0.4091        | 0.9606 | 270  | 0.9465          | 13838300          |
| 0.3315        | 0.9784 | 275  | 0.9469          | 14090880          |
| 0.3664        | 0.9962 | 280  | 0.9449          | 14344624          |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
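
To pin a matching environment, the versions above translate directly into a requirements file (a sketch; the `+cu121` PyTorch build additionally requires installing from the CUDA 12.1 wheel index):

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```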

Model size: 9.24B params (Safetensors, tensor type BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd1

  • Base model: google/gemma-2-9b
  • This model is one of its 193 fine-tunes.