---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 0.6929
- Rewards/chosen: -2.2624
- Rewards/rejected: -5.6900
- Rewards/accuracies: 0.7619
- Rewards/margins: 3.4275
- Logps/rejected: -348.8656
- Logps/chosen: -389.8162
- Logits/rejected: -2.8188
- Logits/chosen: -2.8149
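
A note on reading these metrics (this explanation is not part of the auto-generated card): assuming the standard DPO formulation, each response is scored by the implicit reward

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

where $\pi_{\mathrm{ref}}$ is the SFT base model. Rewards/margins is Rewards/chosen minus Rewards/rejected (here $-2.2624 - (-5.6900) = 3.4276 \approx 3.4275$, up to rounding), and Rewards/accuracies is the fraction of evaluation pairs in which the chosen response receives the higher reward.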

## Model description

More information needed

## Intended uses & limitations

More information needed
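
As a minimal usage sketch (not part of the original card; the repository id below is inferred from the upstream namespace and may differ, and the chat template is assumed to be inherited from the Zephyr SFT base):

```python
# Minimal inference sketch for this Zephyr-style chat model.
# Assumptions: the repo id is inferred (may differ), and the tokenizer ships
# a chat template inherited from alignment-handbook/zephyr-7b-sft-full.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yihang7/zephyr-7b-dpo-full"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain DPO in one sentence."},
]
# apply_chat_template is available from Transformers 4.34 onwards.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```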

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
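
The listed settings map onto `transformers.TrainingArguments` roughly as follows (a reconstruction, not the original training script; the run presumably drove trl's `DPOTrainer` with an equivalent configuration, and the total batch sizes come from 8 devices times the per-device sizes):

```python
# Sketch of the hyperparameters above expressed as TrainingArguments.
# This is a reconstruction for illustration, not the original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 8 GPUs = total train batch size 64
    per_device_eval_batch_size=4,   # x 8 GPUs = total eval batch size 32
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the default
    # AdamW settings in transformers, so no optimizer arguments are needed.
)
```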

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5504        | 0.1   | 100  | 0.5407          | 0.5287         | -0.1810          | 0.7579             | 0.7098          | -293.7762      | -361.9044    | -2.9360         | -2.9366       |
| 0.541         | 0.21  | 200  | 0.5221          | 0.6692         | -0.5569          | 0.7698             | 1.2261          | -297.5352      | -360.5003    | -2.9786         | -2.9802       |
| 0.6034        | 0.31  | 300  | 0.5459          | 0.7375         | -0.4578          | 0.7619             | 1.1953          | -296.5442      | -359.8170    | -3.0234         | -3.0360       |
| 0.5944        | 0.41  | 400  | 0.5573          | 0.4979         | -0.8938          | 0.7698             | 1.3917          | -300.9036      | -362.2126    | -2.9639         | -2.9621       |
| 0.5512        | 0.52  | 500  | 0.5257          | 0.4355         | -1.0167          | 0.7579             | 1.4522          | -302.1330      | -362.8364    | -3.0485         | -3.0406       |
| 0.5879        | 0.62  | 600  | 0.5288          | 0.4707         | -0.9291          | 0.7579             | 1.3998          | -301.2572      | -362.4848    | -2.9911         | -2.9869       |
| 0.6773        | 0.72  | 700  | 0.5853          | 0.0472         | -0.9185          | 0.7460             | 0.9657          | -301.1505      | -366.7194    | -3.0564         | -3.0418       |
| 0.5263        | 0.83  | 800  | 0.5151          | 0.2246         | -1.1914          | 0.7619             | 1.4160          | -303.8796      | -364.9458    | -2.9662         | -2.9637       |
| 0.5366        | 0.93  | 900  | 0.5134          | 0.2511         | -1.0873          | 0.75               | 1.3384          | -302.8385      | -364.6808    | -2.9824         | -2.9907       |
| 0.1034        | 1.03  | 1000 | 0.5107          | 0.3073         | -1.4321          | 0.7619             | 1.7394          | -306.2867      | -364.1185    | -2.9096         | -2.9202       |
| 0.1114        | 1.14  | 1100 | 0.5344          | 0.1332         | -1.8449          | 0.7460             | 1.9781          | -310.4148      | -365.8598    | -2.9561         | -2.9666       |
| 0.1338        | 1.24  | 1200 | 0.5350          | -0.0814        | -2.1418          | 0.7738             | 2.0604          | -313.3835      | -368.0058    | -2.9460         | -2.9508       |
| 0.0979        | 1.34  | 1300 | 0.5474          | -0.0945        | -2.2500          | 0.7659             | 2.1554          | -314.4657      | -368.1371    | -2.9172         | -2.9201       |
| 0.1366        | 1.44  | 1400 | 0.5440          | -0.4749        | -2.3968          | 0.7579             | 1.9219          | -315.9338      | -371.9403    | -2.9134         | -2.9144       |
| 0.1042        | 1.55  | 1500 | 0.5524          | -0.5014        | -2.6803          | 0.7698             | 2.1789          | -318.7686      | -372.2054    | -2.9361         | -2.9306       |
| 0.1313        | 1.65  | 1600 | 0.5333          | -0.2234        | -2.1867          | 0.75               | 1.9634          | -313.8333      | -369.4255    | -2.9060         | -2.8999       |
| 0.1629        | 1.75  | 1700 | 0.5655          | -0.3904        | -2.7591          | 0.75               | 2.3687          | -319.5572      | -371.0959    | -2.9182         | -2.9096       |
| 0.0993        | 1.86  | 1800 | 0.5605          | -0.7117        | -2.9701          | 0.7460             | 2.2584          | -321.6668      | -374.3084    | -2.8602         | -2.8477       |
| 0.1116        | 1.96  | 1900 | 0.5649          | -0.6379        | -2.7259          | 0.7540             | 2.0880          | -319.2250      | -373.5707    | -2.9277         | -2.9150       |
| 0.0193        | 2.06  | 2000 | 0.6122          | -0.9412        | -3.7861          | 0.7619             | 2.8449          | -329.8275      | -376.6041    | -2.8919         | -2.8825       |
| 0.0175        | 2.17  | 2100 | 0.6523          | -1.6027        | -4.6832          | 0.7659             | 3.0805          | -338.7977      | -383.2186    | -2.8474         | -2.8393       |
| 0.0131        | 2.27  | 2200 | 0.6702          | -1.8899        | -5.0304          | 0.7421             | 3.1406          | -342.2704      | -386.0904    | -2.8128         | -2.8069       |
| 0.0243        | 2.37  | 2300 | 0.6559          | -1.6715        | -4.7369          | 0.7698             | 3.0654          | -339.3347      | -383.9066    | -2.8547         | -2.8490       |
| 0.0142        | 2.48  | 2400 | 0.6734          | -1.9463        | -5.1224          | 0.7579             | 3.1761          | -343.1900      | -386.6547    | -2.8394         | -2.8352       |
| 0.0211        | 2.58  | 2500 | 0.6890          | -2.1114        | -5.5608          | 0.7698             | 3.4494          | -347.5744      | -388.3059    | -2.8369         | -2.8333       |
| 0.011         | 2.68  | 2600 | 0.6999          | -2.3020        | -5.8073          | 0.7659             | 3.5053          | -350.0389      | -390.2114    | -2.8299         | -2.8258       |
| 0.0114        | 2.79  | 2700 | 0.6951          | -2.2382        | -5.6885          | 0.7698             | 3.4503          | -348.8512      | -389.5739    | -2.8207         | -2.8172       |
| 0.0437        | 2.89  | 2800 | 0.6911          | -2.2294        | -5.6156          | 0.7659             | 3.3861          | -348.1217      | -389.4860    | -2.8151         | -2.8117       |
| 0.0109        | 2.99  | 2900 | 0.6909          | -2.2776        | -5.6932          | 0.7659             | 3.4156          | -348.8980      | -389.9677    | -2.8187         | -2.8148       |

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1