llama-gsm-real-and-synthetic-sftsd0

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9742
  • Num Input Tokens Seen: 3592016
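A minimal inference sketch for loading the checkpoint with `transformers`. The hub id and the prompt below are assumptions (the dataset is listed as unknown, and the "gsm" in the name only suggests math word problems), so verify the repository id before use:

```python
# Minimal sketch, not documented usage. The hub id is an assumption; verify it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/llama3b-real-and-synthetic-sftsd0"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative math word problem; the training data is not documented in this card.
messages = [{"role": "user", "content": "A bakery sells 12 muffins per tray. How many muffins are on 7 trays?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```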

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
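For reference, a minimal sketch of how these settings map onto `transformers.TrainingArguments` (Transformers 4.44.0). The `output_dir` and the `bf16` flag are assumptions, not values recorded in this card; the listed hyperparameters are copied verbatim:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-gsm-real-and-synthetic-sftsd0",  # assumed, not from the card
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = total batch of 128 (assuming one device)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam-family settings as listed (transformers uses AdamW by default):
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: the published weights are BF16
)
```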

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.5725          | 0                 |
| 1.2448        | 0.0429 | 5    | 1.4163          | 158048            |
| 1.0428        | 0.0857 | 10   | 1.2385          | 310504            |
| 0.9981        | 0.1286 | 15   | 1.2024          | 463304            |
| 0.9389        | 0.1715 | 20   | 1.1632          | 614632            |
| 0.9091        | 0.2144 | 25   | 1.1256          | 770920            |
| 0.9176        | 0.2572 | 30   | 1.1003          | 924776            |
| 0.8873        | 0.3001 | 35   | 1.0791          | 1078728           |
| 0.8641        | 0.3430 | 40   | 1.0582          | 1226520           |
| 0.7978        | 0.3859 | 45   | 1.0406          | 1381640           |
| 0.7849        | 0.4287 | 50   | 1.0210          | 1535808           |
| 0.7892        | 0.4716 | 55   | 1.0047          | 1689112           |
| 0.747         | 0.5145 | 60   | 1.0008          | 1844416           |
| 0.7446        | 0.5573 | 65   | 0.9964          | 1996704           |
| 0.7652        | 0.6002 | 70   | 0.9885          | 2144184           |
| 0.7405        | 0.6431 | 75   | 0.9863          | 2302240           |
| 0.753         | 0.6860 | 80   | 0.9851          | 2467416           |
| 0.7522        | 0.7288 | 85   | 0.9806          | 2623776           |
| 0.7645        | 0.7717 | 90   | 0.9803          | 2778960           |
| 0.7327        | 0.8146 | 95   | 0.9795          | 2933976           |
| 0.7726        | 0.8574 | 100  | 0.9780          | 3089712           |
| 0.7622        | 0.9003 | 105  | 0.9747          | 3240232           |
| 0.7909        | 0.9432 | 110  | 0.9756          | 3402352           |
| 0.7459        | 0.9861 | 115  | 0.9739          | 3560928           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
