---
base_model: TII-Frontier-Team/falcon3-3b-instruct
datasets:
- TII-Frontier-Team/Reasoning_DPO
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [TII-Frontier-Team/PEFT-falcon3b-it-gsm8k](https://huggingface.co/TII-Frontier-Team/PEFT-falcon3b-it-gsm8k) on the TII-Frontier-Team/Reasoning_DPO dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0286
- Rewards/chosen: -4.7078
- Rewards/rejected: -10.6652
- Rewards/accuracies: 0.9254
- Rewards/margins: 5.9575
- Logps/rejected: -1102.4209
- Logps/chosen: -503.5470
- Logits/rejected: 1.9412
- Logits/chosen: 2.1408

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
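Since the `trl` and `dpo` tags indicate the run was driven by TRL's `DPOTrainer`, the list above maps roughly onto a `DPOConfig` as sketched below. This is a minimal reconstruction, not the original training script: the DPO `beta`, the mixed-precision mode, and the QLoRA adapter settings are not recorded in this card, so the values marked as assumptions are common defaults.

```python
# Hedged reconstruction of the run configuration from the list above.
# Values marked "assumption" are NOT recorded in this model card.
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # "train_batch_size" above
    per_device_eval_batch_size=8,    # "eval_batch_size" above
    gradient_accumulation_steps=4,   # 4 per device * 4 accum * 8 GPUs = 128 total
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                        # assumption: DPO beta not stated in the card
    bf16=True,                       # assumption: precision not stated in the card
)
```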
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6914 | 0.0315 | 100 | 0.6912 | 0.0006 | -0.0036 | 0.6340 | 0.0042 | -36.2582 | -32.7125 | -1.6841 | -1.6367 |
| 0.6743 | 0.0629 | 200 | 0.6753 | -0.0009 | -0.0462 | 0.6321 | 0.0454 | -40.5232 | -32.8573 | -1.5154 | -1.4649 |
| 0.6112 | 0.0944 | 300 | 0.5905 | -0.5010 | -0.8365 | 0.6631 | 0.3356 | -119.5518 | -82.8670 | -0.5166 | -0.4325 |
| 0.4477 | 0.1258 | 400 | 0.4026 | -1.9267 | -3.0850 | 0.7201 | 1.1583 | -344.3972 | -225.4428 | -0.5023 | -0.3494 |
| 0.3583 | 0.1573 | 500 | 0.3063 | -2.4869 | -4.1367 | 0.7646 | 1.6498 | -449.5698 | -281.4605 | 0.3124 | 0.4717 |
| 0.3041 | 0.1887 | 600 | 0.2405 | -2.9070 | -4.9732 | 0.7918 | 2.0662 | -533.2189 | -323.4665 | 0.9644 | 1.1113 |
| 0.2487 | 0.2202 | 700 | 0.1964 | -3.4123 | -5.8172 | 0.8209 | 2.4050 | -617.6231 | -373.9985 | 1.1343 | 1.2933 |
| 0.218 | 0.2517 | 800 | 0.1547 | -3.6771 | -6.6251 | 0.8336 | 2.9480 | -698.4094 | -400.4795 | 1.5710 | 1.7290 |
| 0.1858 | 0.2831 | 900 | 0.1394 | -3.5484 | -6.6808 | 0.8485 | 3.1324 | -703.9799 | -387.6123 | 1.6988 | 1.8631 |
| 0.173 | 0.3146 | 1000 | 0.1176 | -3.4824 | -6.7705 | 0.8649 | 3.2881 | -712.9531 | -381.0118 | 1.8190 | 1.9776 |
| 0.1494 | 0.3460 | 1100 | 0.0979 | -3.7942 | -7.4529 | 0.8713 | 3.6587 | -781.1857 | -412.1861 | 1.8179 | 1.9865 |
| 0.149 | 0.3775 | 1200 | 0.0817 | -4.1856 | -8.2504 | 0.8843 | 4.0648 | -860.9355 | -451.3316 | 1.8715 | 2.0581 |
| 0.1143 | 0.4089 | 1300 | 0.0702 | -4.2444 | -8.6154 | 0.8884 | 4.3710 | -897.4431 | -457.2141 | 1.7765 | 1.9770 |
| 0.1204 | 0.4404 | 1400 | 0.0642 | -4.1442 | -8.6112 | 0.8966 | 4.4670 | -897.0154 | -447.1863 | 2.1996 | 2.3734 |
| 0.1013 | 0.4718 | 1500 | 0.0580 | -4.5031 | -9.1159 | 0.8951 | 4.6128 | -947.4904 | -483.0838 | 1.9514 | 2.1364 |
| 0.1011 | 0.5033 | 1600 | 0.0567 | -4.0373 | -8.5779 | 0.9067 | 4.5406 | -893.6846 | -436.5011 | 1.9239 | 2.1103 |
| 0.0853 | 0.5348 | 1700 | 0.0482 | -4.3119 | -9.2927 | 0.9067 | 4.9808 | -965.1708 | -463.9637 | 2.0648 | 2.2336 |
| 0.0897 | 0.5662 | 1800 | 0.0449 | -4.3018 | -9.4275 | 0.9101 | 5.1257 | -978.6490 | -462.9552 | 1.9037 | 2.0822 |
| 0.0717 | 0.5977 | 1900 | 0.0402 | -4.4391 | -9.8395 | 0.9112 | 5.4004 | -1019.8445 | -476.6779 | 2.0003 | 2.1749 |
| 0.0487 | 0.6291 | 2000 | 0.0368 | -5.4728 | -11.3180 | 0.9078 | 5.8452 | -1167.6968 | -580.0486 | 1.9355 | 2.1422 |
| 0.0683 | 0.6606 | 2100 | 0.0356 | -4.6736 | -10.2835 | 0.9190 | 5.6099 | -1064.2465 | -500.1268 | 2.0206 | 2.2058 |
| 0.0514 | 0.6920 | 2200 | 0.0341 | -4.6025 | -10.2228 | 0.9209 | 5.6203 | -1058.1812 | -493.0187 | 1.9362 | 2.1272 |
| 0.0623 | 0.7235 | 2300 | 0.0326 | -4.9398 | -10.7061 | 0.9213 | 5.7663 | -1106.5096 | -526.7491 | 1.8240 | 2.0327 |
| 0.0693 | 0.7550 | 2400 | 0.0313 | -4.8024 | -10.6310 | 0.9231 | 5.8286 | -1098.9999 | -513.0095 | 1.8580 | 2.0583 |
| 0.0543 | 0.7864 | 2500 | 0.0303 | -4.8132 | -10.7352 | 0.9228 | 5.9221 | -1109.4199 | -514.0873 | 1.9534 | 2.1471 |
| 0.0555 | 0.8179 | 2600 | 0.0301 | -4.7251 | -10.5626 | 0.9261 | 5.8375 | -1092.1620 | -505.2810 | 1.9398 | 2.1357 |
| 0.0646 | 0.8493 | 2700 | 0.0294 | -4.6930 | -10.6307 | 0.9261 | 5.9377 | -1098.9694 | -502.0694 | 2.0003 | 2.1947 |
| 0.0546 | 0.8808 | 2800 | 0.0287 | -4.8085 | -10.8169 | 0.9250 | 6.0084 | -1117.5887 | -513.6258 | 1.9596 | 2.1607 |
| 0.0702 | 0.9122 | 2900 | 0.0288 | -4.6970 | -10.6904 | 0.9243 | 5.9934 | -1104.9371 | -502.4718 | 1.9696 | 2.1647 |
| 0.0623 | 0.9437 | 3000 | 0.0286 | -4.7098 | -10.6743 | 0.9269 | 5.9645 | -1103.3302 | -503.7507 | 1.9440 | 2.1437 |
| 0.0593 | 0.9751 | 3100 | 0.0287 | -4.6985 | -10.6531 | 0.9276 | 5.9547 | -1101.2122 | -502.6163 | 1.9469 | 2.1464 |

### Framework versions

- PEFT 0.13.0
- Transformers 4.45.1
- PyTorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0
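For completeness, here is a minimal inference sketch using the pinned `peft` and `transformers` versions above. The adapter repository id is a placeholder (this card does not state where the adapter is hosted); the base checkpoint and the example prompt's math-reasoning flavor follow the card's metadata and dataset.

```python
# Minimal sketch: load the DPO-tuned PEFT adapter on top of its base model.
# "<this-repo-id>" is a placeholder; the card does not state the adapter's repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TII-Frontier-Team/falcon3-3b-instruct"  # base_model from the card metadata
adapter_id = "<this-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [{"role": "user", "content": "A car travels 60 km in 45 minutes. What is its average speed in km/h?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```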