# collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd0
This model is a fine-tuned version of [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9477
- Num Input Tokens Seen: 14793756
## Model description
More information needed
## Intended uses & limitations
More information needed
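
Pending documented usage guidance, below is a minimal sketch of loading this checkpoint for causal-LM inference with the standard Transformers API. The repo id is taken from this card; the dtype, device placement, and generation settings are illustrative assumptions, not documented choices.

```python
# A minimal, hedged sketch of loading this checkpoint for inference.
# dtype/device/generation settings are assumptions, not from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16, as is typical for Gemma-2
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```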
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 4
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
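
As a convenience, the sketch below maps the hyperparameters above onto the `transformers.TrainingArguments` API. The `output_dir` is a placeholder, and the model/dataset wiring is omitted; this is one plausible reconstruction, not the authors' actual training script.

```python
# A hedged sketch of the hyperparameters above, expressed as TrainingArguments.
# output_dir is a placeholder; model/dataset setup is not part of the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-9b_hs2_accumulate_iter3_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=32,  # 4 per device * 32 steps = 128 total batch
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```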
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.2335 | 0 |
1.3013 | 0.0175 | 5 | 1.1349 | 259944 |
1.148 | 0.0349 | 10 | 1.0479 | 514560 |
0.902 | 0.0524 | 15 | 1.0036 | 778996 |
0.7298 | 0.0698 | 20 | 1.0010 | 1038964 |
0.6962 | 0.0873 | 25 | 1.0130 | 1300792 |
0.5439 | 0.1047 | 30 | 1.0141 | 1557828 |
0.5019 | 0.1222 | 35 | 1.0046 | 1817320 |
0.4166 | 0.1396 | 40 | 0.9985 | 2069988 |
0.4424 | 0.1571 | 45 | 0.9901 | 2333472 |
0.4297 | 0.1745 | 50 | 0.9870 | 2600464 |
0.4457 | 0.1920 | 55 | 0.9801 | 2854620 |
0.495 | 0.2094 | 60 | 0.9794 | 3118804 |
0.4569 | 0.2269 | 65 | 0.9781 | 3365668 |
0.3777 | 0.2444 | 70 | 0.9738 | 3629244 |
0.3982 | 0.2618 | 75 | 0.9730 | 3897748 |
0.4096 | 0.2793 | 80 | 0.9705 | 4158168 |
0.3907 | 0.2967 | 85 | 0.9704 | 4410788 |
0.4164 | 0.3142 | 90 | 0.9673 | 4666960 |
0.4496 | 0.3316 | 95 | 0.9672 | 4931648 |
0.337 | 0.3491 | 100 | 0.9659 | 5191760 |
0.5405 | 0.3665 | 105 | 0.9639 | 5456488 |
0.484 | 0.3840 | 110 | 0.9637 | 5719168 |
0.4114 | 0.4014 | 115 | 0.9631 | 5975456 |
0.4027 | 0.4189 | 120 | 0.9625 | 6235256 |
0.3754 | 0.4363 | 125 | 0.9601 | 6491744 |
0.3875 | 0.4538 | 130 | 0.9617 | 6753820 |
0.3731 | 0.4713 | 135 | 0.9610 | 7011036 |
0.3216 | 0.4887 | 140 | 0.9580 | 7269372 |
0.4588 | 0.5062 | 145 | 0.9609 | 7522300 |
0.3542 | 0.5236 | 150 | 0.9578 | 7781528 |
0.4457 | 0.5411 | 155 | 0.9561 | 8041692 |
0.3787 | 0.5585 | 160 | 0.9582 | 8297540 |
0.3757 | 0.5760 | 165 | 0.9581 | 8554200 |
0.2727 | 0.5934 | 170 | 0.9550 | 8806200 |
0.4217 | 0.6109 | 175 | 0.9556 | 9061392 |
0.3614 | 0.6283 | 180 | 0.9542 | 9325600 |
0.3785 | 0.6458 | 185 | 0.9539 | 9584028 |
0.376 | 0.6632 | 190 | 0.9538 | 9843748 |
0.3718 | 0.6807 | 195 | 0.9543 | 10098676 |
0.3875 | 0.6982 | 200 | 0.9544 | 10361676 |
0.4865 | 0.7156 | 205 | 0.9530 | 10613476 |
0.3704 | 0.7331 | 210 | 0.9536 | 10873088 |
0.3826 | 0.7505 | 215 | 0.9526 | 11136952 |
0.4034 | 0.7680 | 220 | 0.9506 | 11391732 |
0.4117 | 0.7854 | 225 | 0.9510 | 11646964 |
0.4504 | 0.8029 | 230 | 0.9517 | 11902304 |
0.3987 | 0.8203 | 235 | 0.9498 | 12158984 |
0.3092 | 0.8378 | 240 | 0.9497 | 12418988 |
0.4653 | 0.8552 | 245 | 0.9518 | 12686188 |
0.3395 | 0.8727 | 250 | 0.9529 | 12946236 |
0.4376 | 0.8901 | 255 | 0.9503 | 13199912 |
0.3509 | 0.9076 | 260 | 0.9484 | 13460552 |
0.4473 | 0.9251 | 265 | 0.9504 | 13725908 |
0.3915 | 0.9425 | 270 | 0.9495 | 13977760 |
0.3943 | 0.9600 | 275 | 0.9485 | 14235544 |
0.3339 | 0.9774 | 280 | 0.9490 | 14488232 |
0.3577 | 0.9949 | 285 | 0.9479 | 14744980 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1