---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: TheBloke/Mistral-7B-v0.1-GPTQ
model-index:
- name: mistral-dpo
  results: []
---


# mistral-dpo

This model is a fine-tuned version of [TheBloke/Mistral-7B-v0.1-GPTQ](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ) on an unspecified preference dataset (the Trainer logged it as `None`).
It achieves the following results on the evaluation set (the reward metrics are defined after the list):
- Loss: 0.0000
- Rewards/chosen: -2.0502
- Rewards/rejected: -28.3632
- Rewards/accuracies: 1.0
- Rewards/margins: 26.3129
- Logps/rejected: -399.8283
- Logps/chosen: -35.7179
- Logits/rejected: -2.1171
- Logits/chosen: -1.8480
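
For context, these reward columns come from DPO's implicit reward, which TRL computes from the policy/reference log-probability ratio (β defaults to 0.1 in TRL; the value actually used here is not recorded):

$$r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$$

$$\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right]$$

`Rewards/chosen` and `Rewards/rejected` are the mean implicit rewards of the chosen and rejected completions, `Rewards/margins` is the mean of their difference, and `Rewards/accuracies` is the fraction of pairs in which the chosen reward exceeds the rejected one.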

## Model description

This repository provides a PEFT adapter trained with TRL's DPO trainer (per the card metadata) on top of the GPTQ-quantized Mistral-7B-v0.1 base published by TheBloke. The preference data and the adapter configuration are not documented here.
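
A minimal inference sketch, assuming the base model is loaded with GPTQ support (`auto-gptq` and `optimum` installed) and that the hypothetical adapter id below is replaced with this repository's actual hub id:

```python
# Minimal inference sketch (not the authors' exact code). Assumes the
# `auto-gptq` and `optimum` packages are installed so Transformers can
# load the GPTQ base. ADAPTER_ID is a hypothetical placeholder.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "TheBloke/Mistral-7B-v0.1-GPTQ"
ADAPTER_ID = "your-username/mistral-dpo"  # hypothetical: replace with the real adapter repo

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, device_map="auto", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach the DPO adapter

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```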

## Intended uses & limitations

The adapter is intended for causal text generation together with the base model above. Because the training data is undocumented and the validation loss collapses to ~0 with perfect reward accuracy from step 30 onward, the adapter may be heavily overfit to its preference set; evaluate it carefully before any downstream use.

## Training and evaluation data

Not documented; the Trainer recorded the dataset as `None`.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 250
- mixed_precision_training: Native AMP
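
These settings map onto TRL's `DPOTrainer` roughly as follows. This is a sketch under stated assumptions, not the authors' script: the dataset and LoRA values are invented, GPTQ-specific model preparation is omitted, and the 0.7-era TRL API (where `beta` is passed directly) is assumed to match the framework versions below.

```python
# Hedged reconstruction of the logged hyperparameters with TRL's DPOTrainer.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

BASE_ID = "TheBloke/Mistral-7B-v0.1-GPTQ"
model = AutoModelForCausalLM.from_pretrained(BASE_ID, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
tokenizer.pad_token = tokenizer.eos_token

# Toy stand-in for the (undocumented) preference dataset.
pairs = {
    "prompt": ["Describe the sky."],
    "chosen": ["The sky appears blue because of Rayleigh scattering."],
    "rejected": ["Sky."],
}
train_ds, eval_ds = Dataset.from_dict(pairs), Dataset.from_dict(pairs)

args = TrainingArguments(
    output_dir="mistral-dpo",
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    learning_rate=2e-4,              # learning_rate: 0.0002
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=250,                   # training_steps: 250
    seed=42,
    fp16=True,                       # mixed_precision_training: Native AMP
    evaluation_strategy="steps",
    eval_steps=10,                   # matches the eval cadence in the table below
    logging_steps=10,
    # The default AdamW optimizer already uses betas=(0.9, 0.999), epsilon=1e-08.
)

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # assumed values

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT model, TRL derives the reference by disabling adapters
    beta=0.1,              # TRL's default; the value actually used is not recorded
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```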

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6453        | 0.2   | 10   | 0.4086          | 0.1393         | -0.7001          | 1.0                | 0.8394          | -123.1976      | -13.8225     | -2.5461         | -2.5162       |
| 0.1759        | 0.4   | 20   | 0.0051          | 0.3963         | -6.4413          | 1.0                | 6.8376          | -180.6101      | -11.2527     | -2.5253         | -2.4045       |
| 0.0015        | 0.6   | 30   | 0.0000          | 0.2885         | -20.7441         | 1.0                | 21.0326         | -323.6376      | -12.3309     | -2.2440         | -1.8851       |
| 0.0           | 0.8   | 40   | 0.0000          | -0.6913        | -26.5964         | 1.0                | 25.9051         | -382.1607      | -22.1282     | -1.9054         | -1.5507       |
| 0.0           | 1.0   | 50   | 0.0000          | -1.6661        | -28.8376         | 1.0                | 27.1715         | -404.5731      | -31.8766     | -1.7581         | -1.4145       |
| 0.0           | 1.2   | 60   | 0.0000          | -2.1659        | -29.6823         | 1.0                | 27.5164         | -413.0200      | -36.8745     | -1.7071         | -1.3649       |
| 0.0           | 1.4   | 70   | 0.0000          | -2.0973        | -30.0476         | 1.0                | 27.9503         | -416.6729      | -36.1886     | -1.6955         | -1.3541       |
| 0.0           | 1.6   | 80   | 0.0000          | -2.0065        | -30.1726         | 1.0                | 28.1661         | -417.9230      | -35.2805     | -1.6941         | -1.3519       |
| 0.0           | 1.8   | 90   | 0.0000          | -1.9541        | -30.2266         | 1.0                | 28.2724         | -418.4622      | -34.7568     | -1.6935         | -1.3518       |
| 0.0023        | 2.0   | 100  | 0.0000          | -0.7061        | -30.2814         | 1.0                | 29.5753         | -419.0107      | -22.2763     | -1.7664         | -1.4215       |
| 0.0           | 2.2   | 110  | 0.0000          | -1.6234        | -29.4682         | 1.0                | 27.8448         | -410.8783      | -31.4494     | -2.0371         | -1.7164       |
| 0.0           | 2.4   | 120  | 0.0000          | -1.9528        | -28.6154         | 1.0                | 26.6626         | -402.3507      | -34.7431     | -2.0991         | -1.8126       |
| 0.0           | 2.6   | 130  | 0.0000          | -2.0210        | -28.3739         | 1.0                | 26.3529         | -399.9358      | -35.4253     | -2.1141         | -1.8394       |
| 0.0           | 2.8   | 140  | 0.0000          | -2.0443        | -28.2878         | 1.0                | 26.2435         | -399.0752      | -35.6588     | -2.1185         | -1.8487       |
| 0.0           | 3.0   | 150  | 0.0000          | -2.0504        | -28.2651         | 1.0                | 26.2147         | -398.8474      | -35.7192     | -2.1201         | -1.8510       |
| 0.0           | 3.2   | 160  | 0.0000          | -2.0500        | -28.2657         | 1.0                | 26.2157         | -398.8541      | -35.7157     | -2.1202         | -1.8519       |
| 0.0           | 3.4   | 170  | 0.0000          | -2.0530        | -28.2687         | 1.0                | 26.2157         | -398.8837      | -35.7460     | -2.1205         | -1.8521       |
| 0.0           | 3.6   | 180  | 0.0000          | -2.0529        | -28.2660         | 1.0                | 26.2131         | -398.8570      | -35.7444     | -2.1202         | -1.8515       |
| 0.0           | 3.8   | 190  | 0.0000          | -2.0531        | -28.2649         | 1.0                | 26.2119         | -398.8461      | -35.7464     | -2.1202         | -1.8519       |
| 0.0           | 4.0   | 200  | 0.0000          | -2.0579        | -28.3150         | 1.0                | 26.2571         | -399.3466      | -35.7943     | -2.1191         | -1.8507       |
| 0.0           | 4.2   | 210  | 0.0000          | -2.0509        | -28.3341         | 1.0                | 26.2832         | -399.5381      | -35.7246     | -2.1178         | -1.8487       |
| 0.0           | 4.4   | 220  | 0.0000          | -2.0516        | -28.3405         | 1.0                | 26.2889         | -399.6018      | -35.7316     | -2.1178         | -1.8490       |
| 0.0           | 4.6   | 230  | 0.0000          | -2.0516        | -28.3495         | 1.0                | 26.2979         | -399.6917      | -35.7317     | -2.1176         | -1.8489       |
| 0.0           | 4.8   | 240  | 0.0000          | -2.0508        | -28.3684         | 1.0                | 26.3176         | -399.8806      | -35.7236     | -2.1173         | -1.8488       |
| 0.0           | 5.0   | 250  | 0.0000          | -2.0502        | -28.3632         | 1.0                | 26.3129         | -399.8283      | -35.7179     | -2.1171         | -1.8480       |


### Framework versions

- PEFT 0.7.1
- Transformers 4.36.2
- Pytorch 2.0.1+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0