
llama8b-gsm-real-sftsd0

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows these results):

  • Loss: 1.0752
  • Num Input Tokens Seen: 1229006
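
As a quick smoke test of the checkpoint, here is a minimal inference sketch with transformers. It assumes the weights are published under the repository id jkazdan/llama8b-gsm-real-sftsd0 (the repo this card belongs to) and loads them in bfloat16, matching the tensor type reported for the checkpoint; the GSM-style prompt is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id for this card; adjust if the weights are hosted elsewhere.
MODEL_ID = "jkazdan/llama8b-gsm-real-sftsd0"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type of the checkpoint
    device_map="auto",
)

# Illustrative GSM-style word problem (not taken from the training data).
messages = [
    {"role": "user", "content": "A baker makes 24 muffins and sells them in boxes of 6. How many boxes can she fill?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```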

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
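
For reference, a minimal sketch of how the values above map onto transformers TrainingArguments. The output directory, bf16 flag, and logging/eval cadence are assumptions (the bf16 flag reflects the BF16 tensor type of the published checkpoint, and the 5-step cadence matches the results table below); the actual training script is not included on this card.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="llama8b-gsm-real-sftsd0",  # assumption: placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=0,
    gradient_accumulation_steps=16,        # 2 per device x 16 steps = 32 total
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                             # assumption: card reports BF16 tensors
    logging_steps=5,                       # assumption: matches the table's 5-step cadence
    eval_strategy="steps",
    eval_steps=5,
)
```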

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.8595          | 0                 |
| 1.6646        | 0.0214 | 5    | 1.6691          | 26714             |
| 1.3941        | 0.0428 | 10   | 1.3452          | 52296             |
| 1.2411        | 0.0642 | 15   | 1.2074          | 79864             |
| 1.144         | 0.0856 | 20   | 1.1764          | 104020            |
| 1.1912        | 0.1070 | 25   | 1.1616          | 130512            |
| 1.127         | 0.1284 | 30   | 1.1517          | 155912            |
| 1.1697        | 0.1499 | 35   | 1.1448          | 182116            |
| 1.0971        | 0.1713 | 40   | 1.1402          | 209706            |
| 1.0521        | 0.1927 | 45   | 1.1344          | 236660            |
| 1.0659        | 0.2141 | 50   | 1.1290          | 263428            |
| 1.1183        | 0.2355 | 55   | 1.1256          | 288292            |
| 1.1267        | 0.2569 | 60   | 1.1225          | 313402            |
| 1.1013        | 0.2783 | 65   | 1.1199          | 340332            |
| 1.1299        | 0.2997 | 70   | 1.1168          | 366298            |
| 1.1047        | 0.3211 | 75   | 1.1143          | 392504            |
| 1.0842        | 0.3425 | 80   | 1.1125          | 419160            |
| 1.0832        | 0.3639 | 85   | 1.1103          | 445990            |
| 1.0846        | 0.3853 | 90   | 1.1084          | 470416            |
| 1.1243        | 0.4067 | 95   | 1.1055          | 497082            |
| 1.1145        | 0.4282 | 100  | 1.1037          | 522912            |
| 1.0974        | 0.4496 | 105  | 1.1022          | 549760            |
| 1.1282        | 0.4710 | 110  | 1.1005          | 576006            |
| 1.0717        | 0.4924 | 115  | 1.0985          | 604070            |
| 1.115         | 0.5138 | 120  | 1.0969          | 629968            |
| 1.1012        | 0.5352 | 125  | 1.0961          | 655968            |
| 1.0704        | 0.5566 | 130  | 1.0944          | 681960            |
| 1.1512        | 0.5780 | 135  | 1.0931          | 707296            |
| 1.1787        | 0.5994 | 140  | 1.0914          | 733542            |
| 1.1522        | 0.6208 | 145  | 1.0905          | 760392            |
| 1.1262        | 0.6422 | 150  | 1.0902          | 786228            |
| 1.0528        | 0.6636 | 155  | 1.0900          | 813666            |
| 1.0857        | 0.6850 | 160  | 1.0889          | 841520            |
| 1.0427        | 0.7064 | 165  | 1.0878          | 869128            |
| 1.0686        | 0.7279 | 170  | 1.0866          | 894572            |
| 1.1171        | 0.7493 | 175  | 1.0850          | 919558            |
| 1.1109        | 0.7707 | 180  | 1.0850          | 946534            |
| 1.0353        | 0.7921 | 185  | 1.0829          | 972934            |
| 1.1547        | 0.8135 | 190  | 1.0821          | 999680            |
| 1.0947        | 0.8349 | 195  | 1.0813          | 1026274           |
| 1.0983        | 0.8563 | 200  | 1.0809          | 1053180           |
| 1.0926        | 0.8777 | 205  | 1.0794          | 1080840           |
| 1.0706        | 0.8991 | 210  | 1.0785          | 1107496           |
| 1.1047        | 0.9205 | 215  | 1.0776          | 1135776           |
| 1.0513        | 0.9419 | 220  | 1.0783          | 1162684           |
| 0.9836        | 0.9633 | 225  | 1.0768          | 1188342           |
| 1.1886        | 0.9847 | 230  | 1.0759          | 1213528           |

Framework versions

  • Transformers 4.46.0
  • PyTorch 2.4.1.post300
  • Datasets 2.20.0
  • Tokenizers 0.20.1
