# llama8b-gsm-real-sftsd2
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0758
- Num Input Tokens Seen: 1230344
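If the reported loss is the mean per-token cross-entropy (the transformers default for causal-language-model evaluation), this corresponds to an evaluation perplexity of exp(1.0758) ≈ 2.93.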
## Model description
More information needed
## Intended uses & limitations
More information needed
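Pending fuller documentation, here is a minimal inference sketch. It assumes the checkpoint is loaded from the Hub as jkazdan/llama8b-gsm-real-sftsd2 and that it uses the chat template inherited from Meta-Llama-3-8B-Instruct; the prompt, dtype, and generation settings are illustrative placeholders, not documented usage.

```python
# Minimal inference sketch (not an official usage example).
# Assumptions: the checkpoint lives on the Hub as jkazdan/llama8b-gsm-real-sftsd2
# and follows the chat template of its base model, Meta-Llama-3-8B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/llama8b-gsm-real-sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is the usual choice for Llama-3-8B
    device_map="auto",
)

messages = [
    {"role": "user", "content": "A train travels 60 miles in 1.5 hours. What is its average speed?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```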
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (mirrored in the sketch after this list):
- learning_rate: 8e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
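For reference, a sketch of how these settings map onto transformers TrainingArguments under the 4.46 API; the output directory is a placeholder, and any field not listed above is left at its default:

```python
# Sketch of the hyperparameters above as transformers TrainingArguments.
# output_dir is a placeholder; fields not listed on this card keep their defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama8b-gsm-real-sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=2,
    gradient_accumulation_steps=16,  # 2 per device x 16 steps = total batch size 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```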
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.8595 | 0 |
1.7928 | 0.0214 | 5 | 1.6692 | 24998 |
1.2768 | 0.0428 | 10 | 1.3468 | 51990 |
1.248 | 0.0642 | 15 | 1.2108 | 78552 |
1.183 | 0.0856 | 20 | 1.1767 | 104714 |
1.1417 | 0.1070 | 25 | 1.1611 | 130644 |
1.1608 | 0.1284 | 30 | 1.1526 | 157452 |
1.1661 | 0.1499 | 35 | 1.1440 | 183464 |
1.0883 | 0.1713 | 40 | 1.1382 | 208708 |
1.1298 | 0.1927 | 45 | 1.1333 | 234812 |
1.0514 | 0.2141 | 50 | 1.1295 | 260646 |
1.2335 | 0.2355 | 55 | 1.1261 | 286452 |
1.1238 | 0.2569 | 60 | 1.1214 | 313702 |
1.1498 | 0.2783 | 65 | 1.1190 | 339404 |
1.0992 | 0.2997 | 70 | 1.1170 | 366220 |
1.1073 | 0.3211 | 75 | 1.1143 | 391672 |
1.0477 | 0.3425 | 80 | 1.1115 | 418874 |
1.0637 | 0.3639 | 85 | 1.1097 | 444640 |
1.1512 | 0.3853 | 90 | 1.1077 | 472012 |
1.0145 | 0.4067 | 95 | 1.1054 | 498068 |
1.0404 | 0.4282 | 100 | 1.1038 | 524766 |
1.1086 | 0.4496 | 105 | 1.1029 | 550330 |
1.17 | 0.4710 | 110 | 1.1008 | 577238 |
1.0603 | 0.4924 | 115 | 1.1005 | 605334 |
1.0688 | 0.5138 | 120 | 1.0980 | 630636 |
1.032 | 0.5352 | 125 | 1.0974 | 655926 |
1.0415 | 0.5566 | 130 | 1.0953 | 683354 |
0.9503 | 0.5780 | 135 | 1.0945 | 711322 |
1.076 | 0.5994 | 140 | 1.0925 | 736596 |
1.0654 | 0.6208 | 145 | 1.0911 | 762078 |
1.0001 | 0.6422 | 150 | 1.0893 | 788874 |
1.1013 | 0.6636 | 155 | 1.0883 | 814254 |
1.0949 | 0.6850 | 160 | 1.0876 | 841134 |
1.1224 | 0.7064 | 165 | 1.0869 | 868964 |
1.1155 | 0.7279 | 170 | 1.0865 | 895250 |
1.0823 | 0.7493 | 175 | 1.0844 | 921904 |
1.0606 | 0.7707 | 180 | 1.0840 | 948558 |
1.089 | 0.7921 | 185 | 1.0835 | 973804 |
1.1386 | 0.8135 | 190 | 1.0828 | 1000896 |
1.1573 | 0.8349 | 195 | 1.0819 | 1027862 |
1.0802 | 0.8563 | 200 | 1.0800 | 1053914 |
1.0364 | 0.8777 | 205 | 1.0793 | 1080370 |
1.0947 | 0.8991 | 210 | 1.0786 | 1107266 |
1.074 | 0.9205 | 215 | 1.0778 | 1134620 |
1.0255 | 0.9419 | 220 | 1.0779 | 1161034 |
1.0109 | 0.9633 | 225 | 1.0763 | 1187784 |
1.0732 | 0.9847 | 230 | 1.0764 | 1213208 |
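To eyeball convergence, a small sketch (matplotlib assumed) that plots a subsample of the validation-loss checkpoints transcribed from the table above:

```python
# Plot validation loss vs. training step, using points transcribed from the
# training-results table above (subsampled roughly every 25 steps).
import matplotlib.pyplot as plt

steps = [0, 25, 50, 75, 100, 125, 150, 175, 200, 230]
val_loss = [1.8595, 1.1611, 1.1295, 1.1143, 1.1038, 1.0974, 1.0893, 1.0844, 1.0800, 1.0764]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("llama8b-gsm-real-sftsd2: validation loss over one epoch")
plt.savefig("val_loss.png", dpi=150)
```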
### Framework versions
- Transformers 4.46.0
- PyTorch 2.4.1.post300
- Datasets 2.20.0
- Tokenizers 0.20.1