kto_trained_1

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the lightblue_kto_data dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3031
  • Rewards/chosen: 1.5421
  • Logps/chosen: -343.9051
  • Logits/chosen: -69679219.2
  • Rewards/rejected: -7.3046
  • Logps/rejected: -233.7684
  • Logits/rejected: -34451756.1379
  • Rewards/margins: 8.8467 (Rewards/chosen minus Rewards/rejected)
  • KL: 1080.3173
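
The card itself does not include a usage snippet; below is a minimal inference sketch with transformers (versions pinned under Framework versions). The repo id lightblue/qwen2.5-7B-instruct-kto is this card's repository; the prompt and generation settings are illustrative only, not recommendations from the authors.

```python
# Minimal inference sketch with transformers. The repo id below is this card's
# repository; the prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lightblue/qwen2.5-7B-instruct-kto"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",           # requires accelerate
)

messages = [{"role": "user", "content": "Summarise KTO fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```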

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a speculative training-script sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128 (1 per device × 8 GPUs × 16 accumulation steps)
  • total_eval_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 1.0
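
The card does not name the training framework. The hyperparameters above match Hugging Face Trainer arguments, and KTO training is most commonly run with TRL, so the following is a speculative sketch of an equivalent setup. Only the listed hyperparameter values come from the card; the KTOTrainer usage, the beta value, and the dataset path and column format are assumptions.

```python
# Speculative reconstruction with TRL's KTOTrainer. Only the hyperparameter
# values come from the card; the dataset path/columns and beta are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# KTO expects "prompt", "completion", and a boolean "label" column marking each
# completion as desirable/undesirable. The card only names "lightblue_kto_data";
# this file path is a hypothetical stand-in.
train_dataset = load_dataset("json", data_files="lightblue_kto_data.jsonl", split="train")

args = KTOConfig(
    output_dir="kto_trained_1",
    learning_rate=5e-6,
    per_device_train_batch_size=1,   # x 8 GPUs x 16 accumulation steps = 128
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    seed=42,
    bf16=True,                       # assumption, consistent with BF16 weights
    beta=0.1,                        # TRL default; not stated on the card
)

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # `tokenizer=` on older TRL releases
)
trainer.train()
```

To match num_devices: 8 and distributed_type: multi-GPU, a script like this would be launched under a multi-GPU launcher such as `accelerate launch` or `torchrun`.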

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen  | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | KL        |
|---------------|--------|------|-----------------|----------------|--------------|----------------|------------------|----------------|-----------------|-----------------|-----------|
| 0.2623        | 0.0997 | 36   | 0.3340          | 1.3847         | -345.4796    | -55713169.0667 | -3.6384          | -197.1070      | -40055004.6897  | 5.0231          | 890.2159  |
| 0.3222        | 0.1995 | 72   | 0.3273          | 1.5219         | -344.1068    | -61469499.7333 | -4.9277          | -209.9999      | -32503238.6207  | 6.4496          | 1189.5447 |
| 0.3798        | 0.2992 | 108  | 0.3185          | 1.5573         | -343.7531    | -63003302.4    | -5.7081          | -217.8038      | -31597484.1379  | 7.2654          | 955.4995  |
| 0.3755        | 0.3990 | 144  | 0.3016          | 0.8908         | -350.4181    | -63924428.8    | -6.8986          | -229.7092      | -27711788.1379  | 7.7895          | 705.8951  |
| 0.3454        | 0.4987 | 180  | 0.3053          | 1.4481         | -344.8449    | -67193476.2667 | -6.5311          | -226.0336      | -37107747.3103  | 7.9792          | 836.6326  |
| 0.2633        | 0.5984 | 216  | 0.3085          | 1.5864         | -343.4627    | -68801646.9333 | -6.4654          | -225.3766      | -37986458.4828  | 8.0517          | 974.3778  |
| 0.2519        | 0.6982 | 252  | 0.3109          | 1.5635         | -343.6908    | -69407142.4    | -6.4303          | -225.0262      | -34758311.7241  | 7.9939          | 1106.7635 |
| 0.2959        | 0.7979 | 288  | 0.3033          | 1.6631         | -342.6956    | -69444923.7333 | -7.0061          | -230.7837      | -36029797.5172  | 8.6691          | 1082.5067 |
| 0.2921        | 0.8977 | 324  | 0.3022          | 1.4322         | -345.0042    | -69711099.7333 | -7.5841          | -236.5635      | -35742644.9655  | 9.0163          | 1047.6223 |
| 0.3122        | 0.9974 | 360  | 0.3031          | 1.5421         | -343.9051    | -69679219.2    | -7.3046          | -233.7684      | -34451756.1379  | 8.8467          | 1080.3173 |

Framework versions

  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3