---
base_model: TII-Frontier-Team/falcon3-3b-instruct
datasets:
  - TII-Frontier-Team/Reasoning_DPO
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-qlora
    results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of TII-Frontier-Team/PEFT-falcon3b-it-gsm8k on the TII-Frontier-Team/Reasoning_DPO dataset. It achieves the following results on the evaluation set:

- Loss: 0.0286
- Rewards/chosen: -4.7078
- Rewards/rejected: -10.6652
- Rewards/accuracies: 0.9254
- Rewards/margins: 5.9575
- Logps/rejected: -1102.4209
- Logps/chosen: -503.5470
- Logits/rejected: 1.9412
- Logits/chosen: 2.1408
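
Because this repository holds a PEFT (QLoRA) adapter rather than full model weights, inference loads the base checkpoint from the `base_model` field above and then attaches the adapter. The snippet below is a minimal sketch, not an official usage example; the adapter id `RedaAlami/zephyr-7b-dpo-qlora` is an assumption inferred from this repo's name and may need adjusting.

```python
# Minimal inference sketch (repo ids are assumptions; adjust to your setup).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TII-Frontier-Team/falcon3-3b-instruct"   # from the metadata above
adapter_id = "RedaAlami/zephyr-7b-dpo-qlora"        # assumed id of this adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO-tuned adapter

messages = [{"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```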

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows this list):

- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
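
The hyperparameters above map directly onto TRL's `DPOConfig` (which extends Transformers `TrainingArguments`). The sketch below is an illustrative reconstruction, not the exact training script: the LoRA settings, DPO `beta`, and dataset split/column names are assumptions, and the 8-GPU layout listed above would be launched via `accelerate` rather than a single process.

```python
# Illustrative DPO/QLoRA setup mirroring the hyperparameters above.
# LoRA ranks, beta, and dataset columns are assumptions, not taken from this card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

base_id = "TII-Frontier-Team/falcon3-3b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto",
)

peft_config = LoraConfig(  # assumed QLoRA adapter settings
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
)

args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 4 x 4 x 8 GPUs = 128 effective train batch
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
    beta=0.1,                        # assumed DPO temperature
)

dataset = load_dataset("TII-Frontier-Team/Reasoning_DPO")  # assumed prompt/chosen/rejected columns

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    train_dataset=dataset["train"],
    processing_class=tokenizer,      # `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```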

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6914 | 0.0315 | 100 | 0.6912 | 0.0006 | -0.0036 | 0.6340 | 0.0042 | -36.2582 | -32.7125 | -1.6841 | -1.6367 |
| 0.6743 | 0.0629 | 200 | 0.6753 | -0.0009 | -0.0462 | 0.6321 | 0.0454 | -40.5232 | -32.8573 | -1.5154 | -1.4649 |
| 0.6112 | 0.0944 | 300 | 0.5905 | -0.5010 | -0.8365 | 0.6631 | 0.3356 | -119.5518 | -82.8670 | -0.5166 | -0.4325 |
| 0.4477 | 0.1258 | 400 | 0.4026 | -1.9267 | -3.0850 | 0.7201 | 1.1583 | -344.3972 | -225.4428 | -0.5023 | -0.3494 |
| 0.3583 | 0.1573 | 500 | 0.3063 | -2.4869 | -4.1367 | 0.7646 | 1.6498 | -449.5698 | -281.4605 | 0.3124 | 0.4717 |
| 0.3041 | 0.1887 | 600 | 0.2405 | -2.9070 | -4.9732 | 0.7918 | 2.0662 | -533.2189 | -323.4665 | 0.9644 | 1.1113 |
| 0.2487 | 0.2202 | 700 | 0.1964 | -3.4123 | -5.8172 | 0.8209 | 2.4050 | -617.6231 | -373.9985 | 1.1343 | 1.2933 |
| 0.218 | 0.2517 | 800 | 0.1547 | -3.6771 | -6.6251 | 0.8336 | 2.9480 | -698.4094 | -400.4795 | 1.5710 | 1.7290 |
| 0.1858 | 0.2831 | 900 | 0.1394 | -3.5484 | -6.6808 | 0.8485 | 3.1324 | -703.9799 | -387.6123 | 1.6988 | 1.8631 |
| 0.173 | 0.3146 | 1000 | 0.1176 | -3.4824 | -6.7705 | 0.8649 | 3.2881 | -712.9531 | -381.0118 | 1.8190 | 1.9776 |
| 0.1494 | 0.3460 | 1100 | 0.0979 | -3.7942 | -7.4529 | 0.8713 | 3.6587 | -781.1857 | -412.1861 | 1.8179 | 1.9865 |
| 0.149 | 0.3775 | 1200 | 0.0817 | -4.1856 | -8.2504 | 0.8843 | 4.0648 | -860.9355 | -451.3316 | 1.8715 | 2.0581 |
| 0.1143 | 0.4089 | 1300 | 0.0702 | -4.2444 | -8.6154 | 0.8884 | 4.3710 | -897.4431 | -457.2141 | 1.7765 | 1.9770 |
| 0.1204 | 0.4404 | 1400 | 0.0642 | -4.1442 | -8.6112 | 0.8966 | 4.4670 | -897.0154 | -447.1863 | 2.1996 | 2.3734 |
| 0.1013 | 0.4718 | 1500 | 0.0580 | -4.5031 | -9.1159 | 0.8951 | 4.6128 | -947.4904 | -483.0838 | 1.9514 | 2.1364 |
| 0.1011 | 0.5033 | 1600 | 0.0567 | -4.0373 | -8.5779 | 0.9067 | 4.5406 | -893.6846 | -436.5011 | 1.9239 | 2.1103 |
| 0.0853 | 0.5348 | 1700 | 0.0482 | -4.3119 | -9.2927 | 0.9067 | 4.9808 | -965.1708 | -463.9637 | 2.0648 | 2.2336 |
| 0.0897 | 0.5662 | 1800 | 0.0449 | -4.3018 | -9.4275 | 0.9101 | 5.1257 | -978.6490 | -462.9552 | 1.9037 | 2.0822 |
| 0.0717 | 0.5977 | 1900 | 0.0402 | -4.4391 | -9.8395 | 0.9112 | 5.4004 | -1019.8445 | -476.6779 | 2.0003 | 2.1749 |
| 0.0487 | 0.6291 | 2000 | 0.0368 | -5.4728 | -11.3180 | 0.9078 | 5.8452 | -1167.6968 | -580.0486 | 1.9355 | 2.1422 |
| 0.0683 | 0.6606 | 2100 | 0.0356 | -4.6736 | -10.2835 | 0.9190 | 5.6099 | -1064.2465 | -500.1268 | 2.0206 | 2.2058 |
| 0.0514 | 0.6920 | 2200 | 0.0341 | -4.6025 | -10.2228 | 0.9209 | 5.6203 | -1058.1812 | -493.0187 | 1.9362 | 2.1272 |
| 0.0623 | 0.7235 | 2300 | 0.0326 | -4.9398 | -10.7061 | 0.9213 | 5.7663 | -1106.5096 | -526.7491 | 1.8240 | 2.0327 |
| 0.0693 | 0.7550 | 2400 | 0.0313 | -4.8024 | -10.6310 | 0.9231 | 5.8286 | -1098.9999 | -513.0095 | 1.8580 | 2.0583 |
| 0.0543 | 0.7864 | 2500 | 0.0303 | -4.8132 | -10.7352 | 0.9228 | 5.9221 | -1109.4199 | -514.0873 | 1.9534 | 2.1471 |
| 0.0555 | 0.8179 | 2600 | 0.0301 | -4.7251 | -10.5626 | 0.9261 | 5.8375 | -1092.1620 | -505.2810 | 1.9398 | 2.1357 |
| 0.0646 | 0.8493 | 2700 | 0.0294 | -4.6930 | -10.6307 | 0.9261 | 5.9377 | -1098.9694 | -502.0694 | 2.0003 | 2.1947 |
| 0.0546 | 0.8808 | 2800 | 0.0287 | -4.8085 | -10.8169 | 0.9250 | 6.0084 | -1117.5887 | -513.6258 | 1.9596 | 2.1607 |
| 0.0702 | 0.9122 | 2900 | 0.0288 | -4.6970 | -10.6904 | 0.9243 | 5.9934 | -1104.9371 | -502.4718 | 1.9696 | 2.1647 |
| 0.0623 | 0.9437 | 3000 | 0.0286 | -4.7098 | -10.6743 | 0.9269 | 5.9645 | -1103.3302 | -503.7507 | 1.9440 | 2.1437 |
| 0.0593 | 0.9751 | 3100 | 0.0287 | -4.6985 | -10.6531 | 0.9276 | 5.9547 | -1101.2122 | -502.6163 | 1.9469 | 2.1464 |
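
For reading the table: under DPO, the "rewards" columns are the implicit rewards beta * (log pi_theta(y|x) - log pi_ref(y|x)) averaged over chosen and rejected completions, the margin is their difference, and the accuracy is the fraction of pairs where the chosen reward exceeds the rejected one. The sketch below shows that bookkeeping given per-sequence log-probabilities; the beta value is an assumption, not taken from this card.

```python
# How the rewards/* columns relate to per-sequence log-probabilities (sketch; beta assumed).
import torch

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    accuracy = (rewards_chosen > rewards_rejected).float().mean()
    # Sigmoid DPO loss over the log-ratio difference between chosen and rejected.
    loss = -torch.nn.functional.logsigmoid(beta * (
        (policy_chosen_logps - policy_rejected_logps)
        - (ref_chosen_logps - ref_rejected_logps)
    )).mean()
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": accuracy.item(),
        "loss": loss.item(),
    }
```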

### Framework versions

- PEFT 0.13.0
- Transformers 4.45.1
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0