Llama-3.1-8B-Instruct-KTO-400

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_400 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2541
  • Rewards/chosen: 0.0309
  • Logps/chosen: -16.8498
  • Logits/chosen: -5221032.2286
  • Rewards/rejected: -4.3105
  • Logps/rejected: -62.7444
  • Logits/rejected: -5284203.3778
  • Rewards/margins: 4.3414
  • KL: 0.0
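
Here, Rewards/margins is the gap between the chosen and rejected rewards (0.0309 − (−4.3105) ≈ 4.3414). The Framework versions below list PEFT, which suggests this checkpoint is an adapter on top of meta-llama/Meta-Llama-3.1-8B-Instruct. The snippet below is a minimal loading-and-inference sketch, not taken from the card: the adapter repository id chchen/Llama-3.1-8B-Instruct-KTO-400 and the generation settings are assumptions.

```python
# Minimal sketch: attach the KTO adapter to the base instruct model and run one chat turn.
# Assumed (not stated in the card): adapter repo id, bfloat16 inference, single-GPU setup.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-400"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # load the PEFT adapter weights
model.eval()

messages = [{"role": "user", "content": "Summarize KTO fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```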

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
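
The card does not say which training framework produced these numbers (the metric names and the PEFT dependency suggest a KTO trainer with a LoRA adapter). As a hedged illustration only, the hyperparameters above could be mapped onto TRL's KTOTrainer roughly as follows; the dataset path and the LoRA settings are assumptions, since bct_non_cot_kto_400 is not described in the card.

```python
# Hedged sketch: reproducing the listed hyperparameters with TRL's KTOTrainer.
# Assumed (not from the card): TRL as the training framework, a local JSON file for the
# dataset, and default LoRA adapter settings.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# KTOTrainer expects unpaired examples with "prompt", "completion", and boolean "label".
dataset = load_dataset("json", data_files="bct_non_cot_kto_400.json")["train"]  # assumed path

args = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-400",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 * 8 = total train batch size 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,                     # use tokenizer= on older TRL releases
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumed adapter configuration
)
trainer.train()
```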

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | KL     |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:------------:|:-------------:|:----------------:|:--------------:|:---------------:|:---------------:|:------:|
| 0.4989        | 1.1111 | 50   | 0.4995          | 0.0293         | -16.8661     | -6615963.4286 | 0.0230           | -19.4096       | -7226443.3778   | 0.0062          | 3.9760 |
| 0.4593        | 2.2222 | 100  | 0.4670          | 0.3125         | -14.0338     | -6310909.2571 | 0.0149           | -19.4909       | -7175433.9556   | 0.2976          | 7.8719 |
| 0.3701        | 3.3333 | 150  | 0.3606          | 0.2798         | -14.3610     | -5641927.3143 | -0.9773          | -29.4130       | -6731665.7778   | 1.2571          | 0.0    |
| 0.281         | 4.4444 | 200  | 0.3004          | 0.1701         | -15.4577     | -5451904.9143 | -2.0389          | -40.0285       | -6268727.4667   | 2.2090          | 0.0    |
| 0.2051        | 5.5556 | 250  | 0.2740          | 0.1961         | -15.1974     | -5351382.8571 | -2.8411          | -48.0507       | -5877686.7556   | 3.0372          | 0.0    |
| 0.2724        | 6.6667 | 300  | 0.2628          | 0.1057         | -16.1019     | -5272125.2571 | -3.6711          | -56.3511       | -5524427.3778   | 3.7768          | 0.0    |
| 0.2237        | 7.7778 | 350  | 0.2569          | 0.0482         | -16.6771     | -5216298.0571 | -4.1487          | -61.1272       | -5306349.8667   | 4.1969          | 0.0    |
| 0.2291        | 8.8889 | 400  | 0.2548          | 0.0426         | -16.7332     | -5214656.9143 | -4.2796          | -62.4359       | -5268033.4222   | 4.3222          | 0.0    |
| 0.1677        | 10.0   | 450  | 0.2541          | 0.0309         | -16.8498     | -5221032.2286 | -4.3105          | -62.7444       | -5284203.3778   | 4.3414          | 0.0    |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3