
llama8b-gsm-real-sftsd0

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows these results):

  • Loss: 1.0752
  • Num Input Tokens Seen: 1229006
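
As a quick smoke test of the checkpoint, here is a minimal inference sketch with transformers. It assumes the weights are published under the repository id jkazdan/llama8b-gsm-real-sftsd0 (the repo this card belongs to) and loads them in bfloat16, matching the tensor type reported for the checkpoint; the GSM-style prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id for this card; adjust if the weights are hosted elsewhere.
MODEL_ID = "jkazdan/llama8b-gsm-real-sftsd0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type of the checkpoint
    device_map="auto",
)

# Illustrative GSM-style word problem (not taken from the training data).
messages = [
    {"role": "user", "content": "A baker makes 24 muffins and sells them in boxes of 6. How many boxes can she fill?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```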

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
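
For reference, a minimal sketch of how the values above map onto transformers TrainingArguments. The output directory, bf16 flag, and logging/eval cadence are assumptions (the bf16 flag reflects the BF16 tensor type of the published checkpoint, and the 5-step cadence matches the results table below); the actual training script is not included on this card.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="llama8b-gsm-real-sftsd0",  # assumption: placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=0,
    gradient_accumulation_steps=16,        # 2 per device x 16 steps = 32 total
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                             # assumption: card reports BF16 tensors
    logging_steps=5,                       # assumption: matches the table's 5-step cadence
    eval_strategy="steps",
    eval_steps=5,
)
```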

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.8595          | 0                 |
| 1.6646        | 0.0214 | 5    | 1.6691          | 26714             |
| 1.3941        | 0.0428 | 10   | 1.3452          | 52296             |
| 1.2411        | 0.0642 | 15   | 1.2074          | 79864             |
| 1.144         | 0.0856 | 20   | 1.1764          | 104020            |
| 1.1912        | 0.1070 | 25   | 1.1616          | 130512            |
| 1.127         | 0.1284 | 30   | 1.1517          | 155912            |
| 1.1697        | 0.1499 | 35   | 1.1448          | 182116            |
| 1.0971        | 0.1713 | 40   | 1.1402          | 209706            |
| 1.0521        | 0.1927 | 45   | 1.1344          | 236660            |
| 1.0659        | 0.2141 | 50   | 1.1290          | 263428            |
| 1.1183        | 0.2355 | 55   | 1.1256          | 288292            |
| 1.1267        | 0.2569 | 60   | 1.1225          | 313402            |
| 1.1013        | 0.2783 | 65   | 1.1199          | 340332            |
| 1.1299        | 0.2997 | 70   | 1.1168          | 366298            |
| 1.1047        | 0.3211 | 75   | 1.1143          | 392504            |
| 1.0842        | 0.3425 | 80   | 1.1125          | 419160            |
| 1.0832        | 0.3639 | 85   | 1.1103          | 445990            |
| 1.0846        | 0.3853 | 90   | 1.1084          | 470416            |
| 1.1243        | 0.4067 | 95   | 1.1055          | 497082            |
| 1.1145        | 0.4282 | 100  | 1.1037          | 522912            |
| 1.0974        | 0.4496 | 105  | 1.1022          | 549760            |
| 1.1282        | 0.4710 | 110  | 1.1005          | 576006            |
| 1.0717        | 0.4924 | 115  | 1.0985          | 604070            |
| 1.115         | 0.5138 | 120  | 1.0969          | 629968            |
| 1.1012        | 0.5352 | 125  | 1.0961          | 655968            |
| 1.0704        | 0.5566 | 130  | 1.0944          | 681960            |
| 1.1512        | 0.5780 | 135  | 1.0931          | 707296            |
| 1.1787        | 0.5994 | 140  | 1.0914          | 733542            |
| 1.1522        | 0.6208 | 145  | 1.0905          | 760392            |
| 1.1262        | 0.6422 | 150  | 1.0902          | 786228            |
| 1.0528        | 0.6636 | 155  | 1.0900          | 813666            |
| 1.0857        | 0.6850 | 160  | 1.0889          | 841520            |
| 1.0427        | 0.7064 | 165  | 1.0878          | 869128            |
| 1.0686        | 0.7279 | 170  | 1.0866          | 894572            |
| 1.1171        | 0.7493 | 175  | 1.0850          | 919558            |
| 1.1109        | 0.7707 | 180  | 1.0850          | 946534            |
| 1.0353        | 0.7921 | 185  | 1.0829          | 972934            |
| 1.1547        | 0.8135 | 190  | 1.0821          | 999680            |
| 1.0947        | 0.8349 | 195  | 1.0813          | 1026274           |
| 1.0983        | 0.8563 | 200  | 1.0809          | 1053180           |
| 1.0926        | 0.8777 | 205  | 1.0794          | 1080840           |
| 1.0706        | 0.8991 | 210  | 1.0785          | 1107496           |
| 1.1047        | 0.9205 | 215  | 1.0776          | 1135776           |
| 1.0513        | 0.9419 | 220  | 1.0783          | 1162684           |
| 0.9836        | 0.9633 | 225  | 1.0768          | 1188342           |
| 1.1886        | 0.9847 | 230  | 1.0759          | 1213528           |

Framework versions

  • Transformers 4.46.0
  • PyTorch 2.4.1.post300
  • Datasets 2.20.0
  • Tokenizers 0.20.1
