---
base_model: TII-Frontier-Team/falcon3-3b-instruct
datasets:
- TII-Frontier-Team/Reasoning_DPO
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [TII-Frontier-Team/falcon3-3b-instruct](https://huggingface.co/TII-Frontier-Team/falcon3-3b-instruct) on the TII-Frontier-Team/Reasoning_DPO dataset.
It achieves the following results on the evaluation set (see the note after this list for how the reward metrics are defined):
- Loss: 0.0299
- Rewards/chosen: -4.6362
- Rewards/rejected: -10.4479
- Rewards/accuracies: 0.9306
- Rewards/margins: 5.8117
- Logps/rejected: -1080.7013
- Logps/chosen: -496.4129
- Logits/rejected: 2.0470
- Logits/chosen: 2.2558
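
For context, the reward figures above are the implicit DPO rewards that TRL's `DPOTrainer` logs, not outputs of a separate reward model. A brief sketch of the standard definitions follows; the DPO temperature β used for this run is not reported in this card.

```latex
% Implicit DPO reward of completion y for prompt x, where \pi_\theta is the
% trained policy, \pi_{\mathrm{ref}} the frozen reference model, and \beta the
% DPO temperature (not reported in this card):
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

% Rewards/margins is the mean gap between chosen and rejected rewards;
% Rewards/accuracies is the fraction of pairs where the chosen reward is higher:
\mathrm{margin}(x) = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})

% The DPO loss minimized during training:
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl( r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}}) \bigr)
```

Logps/chosen and Logps/rejected are the summed log-probabilities of the chosen and rejected completions under the trained policy; their increasingly negative values over training reflect the policy drifting away from the reference model, a commonly observed pattern in DPO runs.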

## Model description

More information needed. From the card metadata, this repository holds a PEFT (LoRA) adapter for [TII-Frontier-Team/falcon3-3b-instruct](https://huggingface.co/TII-Frontier-Team/falcon3-3b-instruct), trained with Direct Preference Optimization (DPO) via TRL and the alignment-handbook recipe on the `TII-Frontier-Team/Reasoning_DPO` dataset. The run name `zephyr-7b-dpo-qlora` appears to be inherited from the alignment-handbook Zephyr QLoRA recipe; the underlying base model is a 3B Falcon 3 instruct model, not a 7B Zephyr/Mistral model.

## Intended uses & limitations

More information needed
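
Pending a fuller description from the authors, the snippet below is a minimal usage sketch. It assumes this repository hosts a PEFT (LoRA) adapter for the listed base model and that the tokenizer was saved alongside it; the repository id in the snippet is a placeholder inferred from the model name and may differ from the actual Hub path.

```python
# Minimal sketch, assuming this repo is a LoRA adapter for
# TII-Frontier-Team/falcon3-3b-instruct. The adapter id below is a placeholder
# guess; replace it with the actual Hub path of this repository.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "TII-Frontier-Team/zephyr-7b-dpo-qlora"  # assumed repo id

# AutoPeftModelForCausalLM reads adapter_config.json, loads the base model,
# and attaches the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# If the tokenizer is not stored in the adapter repo, load it from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Explain why 17 is a prime number."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```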

## Training and evaluation data

More information needed. From the card metadata, training and evaluation use the `TII-Frontier-Team/Reasoning_DPO` preference dataset; the logged steps (3,100 optimizer steps at an effective batch size of 128, covering roughly 0.98 of an epoch) suggest a training split of roughly 400k preference pairs. Preprocessing and split construction are not documented here.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of a matching TRL configuration follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
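
These values match the layout of the alignment-handbook QLoRA DPO recipe. As a rough guide, the sketch below shows how they might map onto a TRL `DPOConfig`; the LoRA settings, precision, and DPO `beta` are assumptions, since the card does not report them.

```python
# Sketch only: reconstructs the reported hyperparameters as a TRL DPOConfig.
# LoRA rank/alpha, 4-bit quantization, mixed precision, and the DPO beta are
# NOT reported in this card; the values below are illustrative placeholders.
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

training_args = DPOConfig(
    output_dir="zephyr-7b-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=4,   # 8 GPUs x 4 per device x 4 steps = 128
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption; precision not reported
)

peft_config = LoraConfig(            # placeholder LoRA settings
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# trainer = DPOTrainer(
#     model=model,                   # base model loaded in 4-bit for QLoRA
#     args=training_args,
#     train_dataset=train_ds,        # TII-Frontier-Team/Reasoning_DPO splits
#     eval_dataset=eval_ds,
#     tokenizer=tokenizer,           # `processing_class` in newer TRL releases
#     peft_config=peft_config,
# )
# trainer.train()
```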

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6913        | 0.0315 | 100  | 0.6911          | 0.0007         | -0.0036          | 0.6220             | 0.0042          | -36.2718       | -32.7285     | -1.6824         | -1.6348       |
| 0.6742        | 0.0629 | 200  | 0.6751          | 0.0003         | -0.0454          | 0.6276             | 0.0458          | -40.4596       | -32.7631     | -1.5097         | -1.4586       |
| 0.6081        | 0.0944 | 300  | 0.5872          | -0.5193        | -0.8644          | 0.6619             | 0.3451          | -122.3552      | -84.7303     | -0.4701         | -0.3830       |
| 0.4463        | 0.1258 | 400  | 0.3978          | -2.0312        | -3.2212          | 0.7190             | 1.1900          | -358.0407      | -235.9217    | -0.3673         | -0.2101       |
| 0.3548        | 0.1573 | 500  | 0.3048          | -2.5142        | -4.1605          | 0.7698             | 1.6464          | -451.9689      | -284.2137    | 0.4417          | 0.6033        |
| 0.3014        | 0.1887 | 600  | 0.2395          | -2.7662        | -4.8033          | 0.7963             | 2.0371          | -516.2451      | -309.4138    | 1.0026          | 1.1670        |
| 0.25          | 0.2202 | 700  | 0.1989          | -3.1039        | -5.4194          | 0.8235             | 2.3155          | -577.8538      | -343.1828    | 1.3421          | 1.5051        |
| 0.2163        | 0.2517 | 800  | 0.1564          | -3.4535        | -6.3881          | 0.8369             | 2.9346          | -674.7255      | -378.1511    | 1.8084          | 1.9697        |
| 0.178         | 0.2831 | 900  | 0.1349          | -3.4355        | -6.5411          | 0.8586             | 3.1056          | -690.0276      | -376.3503    | 1.7688          | 1.9492        |
| 0.1736        | 0.3146 | 1000 | 0.1127          | -3.5471        | -6.9599          | 0.8668             | 3.4128          | -731.9055      | -387.5069    | 2.0848          | 2.2440        |
| 0.1474        | 0.3460 | 1100 | 0.0982          | -3.6177        | -7.2322          | 0.8799             | 3.6145          | -759.1403      | -394.5700    | 1.8280          | 2.0076        |
| 0.1382        | 0.3775 | 1200 | 0.0819          | -4.3123        | -8.3603          | 0.8862             | 4.0480          | -871.9455      | -464.0287    | 2.0966          | 2.2833        |
| 0.1133        | 0.4089 | 1300 | 0.0714          | -4.0671        | -8.3309          | 0.8955             | 4.2638          | -869.0029      | -439.5055    | 1.9082          | 2.1044        |
| 0.1209        | 0.4404 | 1400 | 0.0634          | -4.8366        | -9.4739          | 0.8933             | 4.6374          | -983.3081      | -516.4533    | 2.0574          | 2.2678        |
| 0.1057        | 0.4718 | 1500 | 0.0575          | -4.1835        | -8.8581          | 0.9019             | 4.6746          | -921.7241      | -451.1488    | 2.0907          | 2.2780        |
| 0.1057        | 0.5033 | 1600 | 0.0536          | -4.2093        | -8.9250          | 0.9131             | 4.7157          | -928.4156      | -453.7231    | 2.0198          | 2.2136        |
| 0.0881        | 0.5348 | 1700 | 0.0490          | -4.4577        | -9.3694          | 0.9101             | 4.9118          | -972.8605      | -478.5644    | 1.8760          | 2.0804        |
| 0.0847        | 0.5662 | 1800 | 0.0441          | -4.2531        | -9.4108          | 0.9131             | 5.1578          | -977.0005      | -458.1054    | 2.0999          | 2.2904        |
| 0.0713        | 0.5977 | 1900 | 0.0411          | -4.4101        | -9.6543          | 0.9168             | 5.2442          | -1001.3448     | -473.8065    | 2.0887          | 2.2861        |
| 0.0553        | 0.6291 | 2000 | 0.0378          | -4.9687        | -10.5782         | 0.9123             | 5.6095          | -1093.7402     | -529.6686    | 2.0469          | 2.2608        |
| 0.0668        | 0.6606 | 2100 | 0.0362          | -4.7485        | -10.3227         | 0.9190             | 5.5741          | -1068.1823     | -507.6488    | 2.1354          | 2.3368        |
| 0.0528        | 0.6920 | 2200 | 0.0356          | -4.6766        | -10.2170         | 0.9175             | 5.5404          | -1057.6173     | -500.4605    | 1.9572          | 2.1594        |
| 0.0596        | 0.7235 | 2300 | 0.0340          | -4.6180        | -10.2121         | 0.9235             | 5.5942          | -1057.1299     | -494.5929    | 2.0041          | 2.2117        |
| 0.063         | 0.7550 | 2400 | 0.0328          | -4.5357        | -10.1876         | 0.9257             | 5.6519          | -1054.6713     | -486.3653    | 2.1493          | 2.3488        |
| 0.0558        | 0.7864 | 2500 | 0.0311          | -4.7155        | -10.5680         | 0.9261             | 5.8526          | -1092.7185     | -504.3435    | 2.1208          | 2.3275        |
| 0.0552        | 0.8179 | 2600 | 0.0312          | -4.6574        | -10.3658         | 0.9254             | 5.7084          | -1072.4943     | -498.5399    | 2.0544          | 2.2592        |
| 0.066         | 0.8493 | 2700 | 0.0305          | -4.6506        | -10.4766         | 0.9287             | 5.8259          | -1083.5740     | -497.8611    | 2.0914          | 2.2968        |
| 0.0568        | 0.8808 | 2800 | 0.0302          | -4.6423        | -10.4629         | 0.9302             | 5.8206          | -1082.2051     | -497.0266    | 2.0957          | 2.3026        |
| 0.0602        | 0.9122 | 2900 | 0.0299          | -4.6260        | -10.4608         | 0.9299             | 5.8348          | -1081.9958     | -495.3989    | 2.0861          | 2.2911        |
| 0.0634        | 0.9437 | 3000 | 0.0298          | -4.6454        | -10.4843         | 0.9313             | 5.8389          | -1084.3455     | -497.3409    | 2.0655          | 2.2739        |
| 0.0602        | 0.9751 | 3100 | 0.0299          | -4.6289        | -10.4404         | 0.9302             | 5.8116          | -1079.9603     | -495.6860    | 2.0537          | 2.2623        |


### Framework versions

- PEFT 0.13.0
- Transformers 4.45.1
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0