llama-gsm-real-and-synthetic-sftsd0

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.9742
Num Input Tokens Seen: 3592016

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 0
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.5725	0
1.2448	0.0429	5	1.4163	158048
1.0428	0.0857	10	1.2385	310504
0.9981	0.1286	15	1.2024	463304
0.9389	0.1715	20	1.1632	614632
0.9091	0.2144	25	1.1256	770920
0.9176	0.2572	30	1.1003	924776
0.8873	0.3001	35	1.0791	1078728
0.8641	0.3430	40	1.0582	1226520
0.7978	0.3859	45	1.0406	1381640
0.7849	0.4287	50	1.0210	1535808
0.7892	0.4716	55	1.0047	1689112
0.747	0.5145	60	1.0008	1844416
0.7446	0.5573	65	0.9964	1996704
0.7652	0.6002	70	0.9885	2144184
0.7405	0.6431	75	0.9863	2302240
0.753	0.6860	80	0.9851	2467416
0.7522	0.7288	85	0.9806	2623776
0.7645	0.7717	90	0.9803	2778960
0.7327	0.8146	95	0.9795	2933976
0.7726	0.8574	100	0.9780	3089712
0.7622	0.9003	105	0.9747	3240232
0.7909	0.9432	110	0.9756	3402352
0.7459	0.9861	115	0.9739	3560928

Framework versions

Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1

jkazdan
/

llama3b-real-and-synthetic-sftsd0

llama-gsm-real-and-synthetic-sftsd0

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for jkazdan/llama3b-real-and-synthetic-sftsd0

Evaluation results