eurus-dpo-qlora-uf-ours-5e-6

This model is a fine-tuned version of openbmb/Eurus-7b-sft on the generation/UF dataset. It achieves the following results on the evaluation set:

  • Loss: 6.1425
  • Rewards/chosen: -23.7027
  • Rewards/rejected: -32.8691
  • Rewards/accuracies: 0.6260
  • Rewards/margins: 9.1664
  • Rewards/margins Max: 58.9042
  • Rewards/margins Min: -33.2590
  • Rewards/margins Std: 29.8583
  • Logps/rejected: -3544.4312
  • Logps/chosen: -2645.1541
  • Logits/rejected: -0.9100
  • Logits/chosen: -1.0759
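
The reward figures above can be cross-checked against each other. The sketch below assumes the metrics follow the usual DPO logging convention (margin = chosen reward minus rejected reward) and that the loss is the standard sigmoid DPO loss, -log(sigmoid(margin)); the card does not state which loss variant was used, so `sigmoid_dpo_loss` is an illustrative helper and not necessarily the author's objective.

```python
import math

# Values copied from the evaluation results listed above.
rewards_chosen = -23.7027
rewards_rejected = -32.8691
margin_min = -33.2590
margin_max = 58.9042

# Under the usual convention, the mean margin is the difference of the mean rewards.
mean_margin = rewards_chosen - rewards_rejected
print(round(mean_margin, 4))  # matches the reported Rewards/margins

def sigmoid_dpo_loss(margin: float) -> float:
    """Per-pair loss -log(sigmoid(margin)), written as a numerically stable softplus(-margin).

    Assumption: standard sigmoid DPO loss with beta already folded into the rewards.
    """
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# A pair with zero margin costs ln 2; a strongly misordered pair costs about |margin|.
print(sigmoid_dpo_loss(0.0))
print(sigmoid_dpo_loss(margin_min))
print(sigmoid_dpo_loss(margin_max))
```

If that loss assumption holds, the wide margin spread (Std 29.8583, Min -33.2590) would explain how the mean loss can sit well above ln 2 even though the mean margin is positive: misordered pairs contribute a loss close to the size of their negative margin, while well-ordered pairs contribute almost nothing.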

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
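
The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates the shape of a cosine schedule with 10% warmup. `lr_at` is a hypothetical helper written for this example, and `total_steps=1000` is an illustrative value, not a figure from the training run.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4            # per device
eval_batch_size = 8             # per device
num_devices = 2
gradient_accumulation_steps = 2
learning_rate = 5e-6
warmup_ratio = 0.1

# Effective batch sizes; evaluation does not use gradient accumulation.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to the peak learning rate, then cosine decay to zero (illustrative)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

# Illustrative schedule over a hypothetical 1000-step run.
for step in (0, 50, 100, 500, 1000):
    print(step, lr_at(step, total_steps=1000))
```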

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4256 | 0.28 | 100 | 0.8163 | -1.8022 | -1.9583 | 0.5610 | 0.1561 | 2.2049 | -1.8191 | 1.3259 | -453.3455 | -455.0959 | -1.9771 | -2.0751 |
| 0.1591 | 0.56 | 200 | 1.2122 | -5.0976 | -6.6216 | 0.6050 | 1.5239 | 9.9971 | -4.8753 | 4.8268 | -919.6762 | -784.6454 | -1.3460 | -1.4469 |
| 0.1126 | 0.85 | 300 | 1.7230 | -6.1628 | -8.5878 | 0.6090 | 2.4250 | 18.9102 | -8.2202 | 8.7236 | -1116.3019 | -891.1599 | -1.2133 | -1.3142 |
| 0.074 | 1.13 | 400 | 2.0005 | -8.7127 | -11.9396 | 0.6220 | 3.2269 | 20.1537 | -9.9867 | 9.6878 | -1451.4778 | -1146.1495 | -1.3244 | -1.4370 |
| 0.0551 | 1.41 | 500 | 2.6568 | -10.4325 | -15.1571 | 0.6260 | 4.7246 | 28.6045 | -13.6975 | 13.8040 | -1773.2283 | -1318.1323 | -1.2958 | -1.4257 |
| 0.169 | 1.69 | 600 | 3.7089 | -14.9797 | -20.5965 | 0.6160 | 5.6168 | 36.0405 | -19.8931 | 18.0728 | -2317.1677 | -1772.8466 | -1.0370 | -1.1529 |
| 0.0661 | 1.97 | 700 | 4.1957 | -15.9319 | -22.6457 | 0.6220 | 6.7138 | 41.9072 | -22.6906 | 20.9609 | -2522.0879 | -1868.0721 | -1.1163 | -1.2633 |
| 0.0044 | 2.25 | 800 | 5.9108 | -22.7617 | -31.4584 | 0.6230 | 8.6967 | 56.6380 | -31.9336 | 28.6036 | -3403.3569 | -2551.0461 | -0.9371 | -1.0936 |
| 0.011 | 2.54 | 900 | 5.9213 | -23.0839 | -32.0567 | 0.6230 | 8.9728 | 56.9548 | -32.0980 | 28.8598 | -3463.1873 | -2583.2671 | -0.9208 | -1.0846 |
| 0.0138 | 2.82 | 1000 | 6.0584 | -23.3438 | -32.4235 | 0.6280 | 9.0798 | 58.3224 | -32.8664 | 29.5381 | -3499.8743 | -2609.2573 | -0.9160 | -1.0810 |
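
The Epoch and Step columns allow a rough estimate of the number of optimizer steps per epoch, which the card does not state. The sketch below assumes each logged Epoch value is the true fractional epoch rounded to two decimals, and looks for integer step counts consistent with every logged row. The result is an inference from the log, not a reported figure.

```python
import math

# (epoch, step) pairs copied from the training results above.
logged = [
    (0.28, 100), (0.56, 200), (0.85, 300), (1.13, 400), (1.41, 500),
    (1.69, 600), (1.97, 700), (2.25, 800), (2.54, 900), (2.82, 1000),
]

# If epoch = step / steps_per_epoch was rounded to two decimals, each row
# bounds steps_per_epoch between step / (epoch + 0.005) and step / (epoch - 0.005).
lower = max(step / (epoch + 0.005) for epoch, step in logged)
upper = min(step / (epoch - 0.005) for epoch, step in logged)

# Integer step counts that satisfy every row's bounds.
candidates = list(range(math.ceil(lower), math.floor(upper) + 1))
print(candidates)  # -> [355]
```

With the listed total_train_batch_size of 16, about 355 steps per epoch would correspond to roughly 5,680 training pairs per epoch. Treat this as an estimate only, since partial final batches and the rounding assumption both affect it.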

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
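
Since PEFT is among the listed frameworks, the repository presumably holds a PEFT adapter to be applied on top of openbmb/Eurus-7b-sft. The sketch below shows one way such an adapter could be loaded. It rests on that assumption, `load_model_with_adapter` is a hypothetical helper name, and the function is only defined here, not run, because calling it downloads the full base model.

```python
BASE_MODEL_ID = "openbmb/Eurus-7b-sft"
ADAPTER_ID = "just1nseo/eurus-dpo-qlora-uf-ours-5e-6"

def load_model_with_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter weights (assumes a PEFT adapter repo)."""
    # Imported lazily so that defining this helper does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model

# Example usage (downloads several gigabytes of weights):
# tokenizer, model = load_model_with_adapter()
```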

Model repository: just1nseo/eurus-dpo-qlora-uf-ours-5e-6 (adapter)