---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: TheBloke/Mistral-7B-v0.1-GPTQ
model-index:
- name: mistral-dpo
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mistral-dpo

This model is a fine-tuned version of [TheBloke/Mistral-7B-v0.1-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ) on an unspecified dataset.
It achieves the following results on the evaluation set (a note on how the reward columns are defined, and a loading sketch, follow the list):
- Loss: 0.5603
- Rewards/chosen: -12.5467
- Rewards/rejected: -28.4037
- Rewards/accuracies: 0.8571
- Rewards/margins: 15.8571
- Logps/rejected: -411.7001
- Logps/chosen: -215.4742
- Logits/rejected: -0.7509
- Logits/chosen: -0.7707
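
For context (not stated in the original card), the `Rewards/*` columns follow TRL's DPO convention: the implicit reward of a completion $y$ for prompt $x$ is the scaled log-probability ratio between the trained policy and the frozen reference model,

$$r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),$$

so `Rewards/margins` is the mean gap between chosen and rejected rewards, and `Rewards/accuracies` is the fraction of pairs where the chosen completion receives the higher reward.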

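The snippet below is a minimal loading sketch, not part of the original card. It assumes the DPO-trained PEFT adapter is published under a repo id such as `your-username/mistral-dpo` (hypothetical; adjust to the actual location) and that `optimum`/`auto-gptq` are installed so the GPTQ base model can be loaded.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TheBloke/Mistral-7B-v0.1-GPTQ"
adapter_id = "your-username/mistral-dpo"  # hypothetical repo id for the adapter weights

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-trained adapter
model.eval()

prompt = "Explain what DPO training does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
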
## Model description

This is a PEFT adapter trained with Direct Preference Optimization (DPO, via TRL) on top of the GPTQ-quantized [TheBloke/Mistral-7B-v0.1-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ) base model. Further details have not been provided.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 250
- mixed_precision_training: Native AMP

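The script below is a reproduction sketch, not the author's actual training code: only the hyperparameters listed above come from this card, while the dataset, PEFT/LoRA settings, DPO `beta`, and sequence lengths are assumptions marked as such in the comments. It assumes a TRL release contemporary with the Transformers/PEFT versions listed under "Framework versions" (roughly TRL 0.7.x; newer TRL versions moved several of these arguments into `DPOConfig`), plus `optimum`/`auto-gptq` for the GPTQ base model.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/Mistral-7B-v0.1-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral's tokenizer has no pad token by default

model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Toy stand-in dataset: the card does not name the real preference data.
# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
train_dataset = Dataset.from_dict({
    "prompt": ["What does DPO optimize?"],
    "chosen": ["It optimizes a policy directly on preference pairs without a separate reward model."],
    "rejected": ["It optimizes database queries."],
})

# Assumed adapter configuration; not reported in the card.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Values below mirror the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="mistral-dpo",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=250,
    fp16=True,  # "Native AMP" mixed precision
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,         # with a PEFT adapter, TRL uses the frozen base model as the reference
    args=training_args,
    beta=0.1,               # assumed; the card does not report the DPO beta
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_prompt_length=512,  # assumed
    max_length=1024,        # assumed
)
trainer.train()
```
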
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6785        | 0.02  | 10   | 0.6291          | -0.0030        | -0.1321          | 0.875              | 0.1291          | -128.9836      | -90.0372     | -2.3988         | -2.3489       |
| 0.5661        | 0.04  | 20   | 0.4421          | 0.0008         | -0.6608          | 0.875              | 0.6616          | -134.2708      | -89.9997     | -2.3613         | -2.3042       |
| 0.3257        | 0.06  | 30   | 0.3584          | -0.7139        | -2.3035          | 0.8393             | 1.5897          | -150.6985      | -97.1463     | -2.2995         | -2.2546       |
| 0.3563        | 0.08  | 40   | 0.5522          | -3.0636        | -6.7067          | 0.8214             | 3.6431          | -194.7305      | -120.6441    | -2.1396         | -2.0849       |
| 0.0013        | 0.1   | 50   | 1.3365          | -8.4317        | -16.1649         | 0.8036             | 7.7332          | -289.3120      | -174.3246    | -1.8243         | -1.7710       |
| 0.0277        | 0.12  | 60   | 2.4224          | -14.8512       | -25.9570         | 0.8214             | 11.1059         | -387.2331      | -238.5192    | -1.5464         | -1.4950       |
| 1.5742        | 0.14  | 70   | 3.1075          | -17.8751       | -29.6755         | 0.8214             | 11.8004         | -424.4176      | -268.7585    | -1.4071         | -1.3681       |
| 14.1036       | 0.16  | 80   | 3.6238          | -20.4205       | -32.7881         | 0.8214             | 12.3675         | -455.5435      | -294.2129    | -1.3237         | -1.2729       |
| 8.531         | 0.18  | 90   | 3.7123          | -21.7863       | -36.0729         | 0.8214             | 14.2866         | -488.3922      | -307.8707    | -1.2975         | -1.2388       |
| 4.6429        | 0.2   | 100  | 2.0394          | -16.6472       | -29.8508         | 0.8393             | 13.2036         | -426.1712      | -256.4797    | -1.3189         | -1.2784       |
| 0.0565        | 0.22  | 110  | 1.6331          | -14.8501       | -27.2015         | 0.8393             | 12.3514         | -399.6779      | -238.5090    | -1.2425         | -1.2118       |
| 0.0056        | 0.24  | 120  | 1.4774          | -15.0784       | -28.0012         | 0.8214             | 12.9228         | -407.6750      | -240.7916    | -1.0819         | -1.0579       |
| 0.0804        | 0.26  | 130  | 1.5398          | -16.7630       | -30.6346         | 0.8393             | 13.8716         | -434.0091      | -257.6378    | -1.0054         | -0.9846       |
| 0.0001        | 0.28  | 140  | 1.5159          | -17.9940       | -33.3459         | 0.8393             | 15.3520         | -461.1225      | -269.9474    | -0.8887         | -0.8844       |
| 0.0           | 0.3   | 150  | 1.5062          | -18.4614       | -34.3481         | 0.8393             | 15.8868         | -471.1445      | -274.6213    | -0.8496         | -0.8503       |
| 0.0           | 0.32  | 160  | 1.5035          | -18.6474       | -34.7165         | 0.8393             | 16.0692         | -474.8286      | -276.4815    | -0.8343         | -0.8367       |
| 4.2123        | 0.34  | 170  | 1.2949          | -17.3471       | -32.6721         | 0.8571             | 15.3250         | -454.3839      | -263.4789    | -0.8672         | -0.8661       |
| 2.13          | 0.36  | 180  | 0.9892          | -15.2178       | -30.1177         | 0.8571             | 14.8999         | -428.8398      | -242.1859    | -0.8992         | -0.9047       |
| 2.0146        | 0.38  | 190  | 0.8365          | -13.9461       | -28.5983         | 0.8571             | 14.6522         | -413.6459      | -229.4683    | -0.9104         | -0.9224       |
| 0.0706        | 0.4   | 200  | 0.7897          | -14.5198       | -29.8989         | 0.8571             | 15.3791         | -426.6525      | -235.2058    | -0.8064         | -0.8224       |
| 5.2517        | 0.42  | 210  | 0.6621          | -13.7049       | -29.2354         | 0.8571             | 15.5305         | -420.0170      | -227.0569    | -0.7981         | -0.8124       |
| 0.0499        | 0.44  | 220  | 0.5752          | -12.8733       | -28.5299         | 0.8571             | 15.6566         | -412.9616      | -218.7403    | -0.7801         | -0.7990       |
| 0.5779        | 0.46  | 230  | 0.5611          | -12.6633       | -28.3836         | 0.8571             | 15.7203         | -411.4988      | -216.6405    | -0.7789         | -0.7975       |
| 0.0322        | 0.48  | 240  | 0.5624          | -12.6348       | -28.4766         | 0.8571             | 15.8418         | -412.4289      | -216.3556    | -0.7696         | -0.7878       |
| 0.1347        | 0.5   | 250  | 0.5603          | -12.5467       | -28.4037         | 0.8571             | 15.8571         | -411.7001      | -215.4742    | -0.7509         | -0.7707       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.0.1+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0