
llama8b-gsm-real-sftsd2

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0758
  • Num Input Tokens Seen: 1230344
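For reference, an eval cross-entropy loss of 1.0758 corresponds to a perplexity of roughly exp(1.0758) ≈ 2.93. Since usage details are not documented below, here is a minimal loading-and-generation sketch with transformers. The Hub repo ID is taken from the model tree at the end of this card; the prompt is a hypothetical GSM-style question (the card does not document a prompt format), and device_map="auto" assumes accelerate is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/llama8b-gsm-real-sftsd2"  # repo ID from the model tree below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",   # checkpoint tensors are stored in BF16
    device_map="auto",        # assumes accelerate is installed
)

# Hypothetical GSM-style prompt; not taken from this card's (undocumented) data.
messages = [{"role": "user", "content": "A robe takes 2 bolts of blue fiber and "
                                        "half that much white fiber. How many bolts in total?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```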

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
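
The settings above can be expressed as a TrainingArguments sketch. This is a reconstruction, not the actual training script: output_dir is a placeholder, the eval/logging cadence of 5 steps is inferred from the results table below, and bf16=True is an assumption based on the checkpoint's BF16 tensor type.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                        # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=2,
    gradient_accumulation_steps=16,          # 2 per device x 16 = total batch of 32
    optim="adamw_torch",                     # betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    eval_strategy="steps",                   # inferred: the table logs eval every 5 steps
    eval_steps=5,
    logging_steps=5,
    bf16=True,                               # assumption: checkpoint is stored in BF16
)
```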

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.8595          | 0                 |
| 1.7928        | 0.0214 | 5    | 1.6692          | 24998             |
| 1.2768        | 0.0428 | 10   | 1.3468          | 51990             |
| 1.248         | 0.0642 | 15   | 1.2108          | 78552             |
| 1.183         | 0.0856 | 20   | 1.1767          | 104714            |
| 1.1417        | 0.1070 | 25   | 1.1611          | 130644            |
| 1.1608        | 0.1284 | 30   | 1.1526          | 157452            |
| 1.1661        | 0.1499 | 35   | 1.1440          | 183464            |
| 1.0883        | 0.1713 | 40   | 1.1382          | 208708            |
| 1.1298        | 0.1927 | 45   | 1.1333          | 234812            |
| 1.0514        | 0.2141 | 50   | 1.1295          | 260646            |
| 1.2335        | 0.2355 | 55   | 1.1261          | 286452            |
| 1.1238        | 0.2569 | 60   | 1.1214          | 313702            |
| 1.1498        | 0.2783 | 65   | 1.1190          | 339404            |
| 1.0992        | 0.2997 | 70   | 1.1170          | 366220            |
| 1.1073        | 0.3211 | 75   | 1.1143          | 391672            |
| 1.0477        | 0.3425 | 80   | 1.1115          | 418874            |
| 1.0637        | 0.3639 | 85   | 1.1097          | 444640            |
| 1.1512        | 0.3853 | 90   | 1.1077          | 472012            |
| 1.0145        | 0.4067 | 95   | 1.1054          | 498068            |
| 1.0404        | 0.4282 | 100  | 1.1038          | 524766            |
| 1.1086        | 0.4496 | 105  | 1.1029          | 550330            |
| 1.17          | 0.4710 | 110  | 1.1008          | 577238            |
| 1.0603        | 0.4924 | 115  | 1.1005          | 605334            |
| 1.0688        | 0.5138 | 120  | 1.0980          | 630636            |
| 1.032         | 0.5352 | 125  | 1.0974          | 655926            |
| 1.0415        | 0.5566 | 130  | 1.0953          | 683354            |
| 0.9503        | 0.5780 | 135  | 1.0945          | 711322            |
| 1.076         | 0.5994 | 140  | 1.0925          | 736596            |
| 1.0654        | 0.6208 | 145  | 1.0911          | 762078            |
| 1.0001        | 0.6422 | 150  | 1.0893          | 788874            |
| 1.1013        | 0.6636 | 155  | 1.0883          | 814254            |
| 1.0949        | 0.6850 | 160  | 1.0876          | 841134            |
| 1.1224        | 0.7064 | 165  | 1.0869          | 868964            |
| 1.1155        | 0.7279 | 170  | 1.0865          | 895250            |
| 1.0823        | 0.7493 | 175  | 1.0844          | 921904            |
| 1.0606        | 0.7707 | 180  | 1.0840          | 948558            |
| 1.089         | 0.7921 | 185  | 1.0835          | 973804            |
| 1.1386        | 0.8135 | 190  | 1.0828          | 1000896           |
| 1.1573        | 0.8349 | 195  | 1.0819          | 1027862           |
| 1.0802        | 0.8563 | 200  | 1.0800          | 1053914           |
| 1.0364        | 0.8777 | 205  | 1.0793          | 1080370           |
| 1.0947        | 0.8991 | 210  | 1.0786          | 1107266           |
| 1.074         | 0.9205 | 215  | 1.0778          | 1134620           |
| 1.0255        | 0.9419 | 220  | 1.0779          | 1161034           |
| 1.0109        | 0.9633 | 225  | 1.0763          | 1187784           |
| 1.0732        | 0.9847 | 230  | 1.0764          | 1213208           |
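
The validation loss drops steeply over the first ~15 steps and then flattens. For a quick look at the curve, here is a small matplotlib sketch using a handful of (step, validation loss) points copied from the table above:

```python
import matplotlib.pyplot as plt

# Selected (step, validation loss) rows from the training results table.
steps  = [0, 5, 10, 20, 50, 100, 150, 200, 230]
losses = [1.8595, 1.6692, 1.3468, 1.1767, 1.1295, 1.1038, 1.0893, 1.0800, 1.0764]

plt.plot(steps, losses, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("llama8b-gsm-real-sftsd2: validation loss")
plt.show()
```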

Framework versions

  • Transformers 4.46.0
  • Pytorch 2.4.1.post300
  • Datasets 2.20.0
  • Tokenizers 0.20.1
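
To verify that a local environment matches these versions, a small check sketch (the torch build string 2.4.1.post300 looks like a vendor build suffix, so only the base version is compared here):

```python
import transformers, torch, datasets, tokenizers

# Versions reported in this card.
expected = {
    "transformers": "4.46.0",
    "torch": "2.4.1",       # card reports 2.4.1.post300
    "datasets": "2.20.0",
    "tokenizers": "0.20.1",
}
for name, module in [("transformers", transformers), ("torch", torch),
                     ("datasets", datasets), ("tokenizers", tokenizers)]:
    ok = module.__version__.startswith(expected[name])
    print(f"{name}: {module.__version__} (expected {expected[name]}) {'OK' if ok else 'MISMATCH'}")
```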
Model files

  • Format: Safetensors
  • Parameters: 8.03B
  • Tensor type: BF16

Model tree for jkazdan/llama8b-gsm-real-sftsd2

  • Base model: meta-llama/Meta-Llama-3-8B-Instruct