llama8b-gsm-real-sftsd1

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0750
  • Num Input Tokens Seen: 1235796
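
As a quick start, here is a minimal, untested inference sketch with Transformers. The repo id jkazdan/llama8b-gsm-real-sftsd1 comes from this card's model tree, loading in bfloat16 matches the checkpoint's stored BF16 tensor type, and the example prompt is purely illustrative.

```python
# Minimal inference sketch (untested). The repo id is taken from this card;
# everything else follows standard Transformers usage for Llama-3 instruct models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/llama8b-gsm-real-sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",
)

# Llama-3 instruct models expect the chat template supplied by the tokenizer.
messages = [
    {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```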

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
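
For readers who want to reproduce the setup, the list above maps roughly onto transformers.TrainingArguments as sketched below. This is an approximation, not the original training script (which the card does not include); the output_dir is hypothetical, and the model, dataset, and Trainer wiring are omitted.

```python
# Approximate TrainingArguments reconstruction of the hyperparameters above.
# Not the original training script; output_dir is hypothetical.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama8b-gsm-real-sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=1,
    gradient_accumulation_steps=16,  # 2 per device * 16 steps = 32 total batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption: consistent with the BF16 checkpoint
)
```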

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.8595          | 0                 |
| 1.7608        | 0.0214 | 5    | 1.6700          | 25930             |
| 1.3248        | 0.0428 | 10   | 1.3475          | 52270             |
| 1.2071        | 0.0642 | 15   | 1.2084          | 79554             |
| 1.1995        | 0.0856 | 20   | 1.1763          | 105102            |
| 1.0962        | 0.1070 | 25   | 1.1607          | 131956            |
| 1.1212        | 0.1284 | 30   | 1.1494          | 158684            |
| 1.1985        | 0.1499 | 35   | 1.1423          | 184480            |
| 1.0998        | 0.1713 | 40   | 1.1370          | 211054            |
| 1.1959        | 0.1927 | 45   | 1.1324          | 236974            |
| 1.1464        | 0.2141 | 50   | 1.1279          | 262912            |
| 1.2088        | 0.2355 | 55   | 1.1243          | 289396            |
| 1.0862        | 0.2569 | 60   | 1.1215          | 316814            |
| 1.17          | 0.2783 | 65   | 1.1191          | 342274            |
| 1.079         | 0.2997 | 70   | 1.1173          | 369198            |
| 1.155         | 0.3211 | 75   | 1.1141          | 396132            |
| 1.122         | 0.3425 | 80   | 1.1118          | 421548            |
| 1.0646        | 0.3639 | 85   | 1.1104          | 449306            |
| 1.1247        | 0.3853 | 90   | 1.1071          | 473942            |
| 1.0455        | 0.4067 | 95   | 1.1065          | 500546            |
| 1.1771        | 0.4282 | 100  | 1.1047          | 525364            |
| 1.0121        | 0.4496 | 105  | 1.1031          | 552868            |
| 1.0939        | 0.4710 | 110  | 1.1028          | 579098            |
| 1.133         | 0.4924 | 115  | 1.1005          | 604876            |
| 1.0363        | 0.5138 | 120  | 1.0987          | 629760            |
| 0.9986        | 0.5352 | 125  | 1.0972          | 657158            |
| 1.0632        | 0.5566 | 130  | 1.0968          | 683064            |
| 1.0441        | 0.5780 | 135  | 1.0940          | 710802            |
| 1.0112        | 0.5994 | 140  | 1.0930          | 737182            |
| 1.0467        | 0.6208 | 145  | 1.0914          | 763298            |
| 1.0917        | 0.6422 | 150  | 1.0897          | 790790            |
| 1.0613        | 0.6636 | 155  | 1.0891          | 818288            |
| 0.9827        | 0.6850 | 160  | 1.0883          | 845282            |
| 1.1266        | 0.7064 | 165  | 1.0874          | 870452            |
| 1.0661        | 0.7279 | 170  | 1.0859          | 896976            |
| 1.1039        | 0.7493 | 175  | 1.0852          | 923846            |
| 1.0813        | 0.7707 | 180  | 1.0842          | 949236            |
| 1.0729        | 0.7921 | 185  | 1.0835          | 977230            |
| 1.0617        | 0.8135 | 190  | 1.0838          | 1003880           |
| 1.1071        | 0.8349 | 195  | 1.0825          | 1029762           |
| 1.0408        | 0.8563 | 200  | 1.0810          | 1057616           |
| 1.0801        | 0.8777 | 205  | 1.0799          | 1084200           |
| 1.0656        | 0.8991 | 210  | 1.0786          | 1110340           |
| 1.1181        | 0.9205 | 215  | 1.0787          | 1136600           |
| 0.9485        | 0.9419 | 220  | 1.0782          | 1164358           |
| 1.0608        | 0.9633 | 225  | 1.0772          | 1192626           |
| 1.1137        | 0.9847 | 230  | 1.0755          | 1219714           |
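
The validation loss drops steeply over roughly the first 150k input tokens and then flattens. The trend is easier to see plotted; below is an illustrative matplotlib sketch using every fifth logged row from the table above.

```python
# Plot validation loss against input tokens seen, using a subset of rows
# from the table above (every fifth logged step, for brevity).
import matplotlib.pyplot as plt

tokens = [0, 131956, 262912, 396132, 525364, 657158, 790790, 923846, 1057616, 1192626]
val_loss = [1.8595, 1.1607, 1.1279, 1.1141, 1.1047, 1.0972, 1.0897, 1.0852, 1.0810, 1.0772]

plt.plot(tokens, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("llama8b-gsm-real-sftsd1: validation loss vs. tokens seen")
plt.show()
```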

Framework versions

  • Transformers 4.46.0
  • Pytorch 2.4.1.post300
  • Datasets 2.20.0
  • Tokenizers 0.20.1