guoqiang-x committed eebedcd (parent: 5c4b6b8)

Model save

Changed files:
- README.md +110 -0
- all_results.json +9 -0
- train_results.json +9 -0
- trainer_state.json +0 -0

README.md (ADDED, @@ -0,0 +1,110 @@)
---
base_model: mistralai/Mistral-7B-v0.1
library_name: peft
license: apache-2.0
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4788
- Rewards/chosen: -2.6215
- Rewards/rejected: -3.9183
- Rewards/accuracies: 0.7475
- Rewards/margins: 1.2968
- Logps/rejected: -636.4029
- Logps/chosen: -526.7561
- Logits/rejected: -1.0296
- Logits/chosen: -1.1658
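The Rewards/* metrics above are DPO's implicit rewards: β times the policy-vs-reference log-probability gap for each response, with the margin being chosen minus rejected. A minimal sketch of that bookkeeping, assuming the sigmoid DPO loss and β = 0.1 (the `dpo_loss` helper, the example log-probs, and the β value are illustrative assumptions; the card does not state them):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss plus the implicit rewards logged in the card.

    Each logp is the summed log-probability of a full response under
    the policy or the frozen reference model. beta=0.1 is an assumption.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)): small when the policy separates the pair well
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, reward_chosen, reward_rejected, margin

# Toy pair: the policy has moved toward the chosen response and away
# from the rejected one relative to the reference model.
loss, rc, rr, margin = dpo_loss(-100.0, -110.0, -102.0, -108.0)
```

With a positive margin the loss drops below log 2 (the value at initialization, when policy and reference agree), which is why falling validation loss tracks the growing Rewards/margins column.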
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

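The listed totals are internally consistent: the effective batch size is the per-device batch size times the gradient accumulation steps times the number of optimization processes. Note that 16 = 4 × 4 implies a world size of 1 under this formula (an inference on my part, since the card says `distributed_type: multi-GPU` without listing a device count):

```python
# Effective batch size from the hyperparameters above.
train_batch_size = 4             # per device, from the card
gradient_accumulation_steps = 4  # from the card
world_size = 1                   # assumption: implied by 16 / (4 * 4)

total_train_batch_size = (train_batch_size
                          * gradient_accumulation_steps
                          * world_size)
```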
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6807        | 0.0262 | 100  | 0.6809          | 0.0514         | 0.0256           | 0.6555             | 0.0258          | -242.0131      | -259.4604    | -2.0551         | -2.1482       |
| 0.6438        | 0.0523 | 200  | 0.6356          | -0.1881        | -0.3389          | 0.6760             | 0.1508          | -278.4615      | -283.4154    | -2.0113         | -2.1000       |
| 0.6073        | 0.0785 | 300  | 0.6054          | -0.6866        | -0.9744          | 0.6815             | 0.2878          | -342.0091      | -333.2583    | -1.9949         | -2.0782       |
| 0.5956        | 0.1047 | 400  | 0.5824          | -1.4485        | -1.9599          | 0.6830             | 0.5114          | -440.5653      | -409.4522    | -1.5844         | -1.6758       |
| 0.5643        | 0.1309 | 500  | 0.5726          | -1.1458        | -1.7589          | 0.6915             | 0.6131          | -420.4636      | -379.1804    | -1.5624         | -1.6658       |
| 0.5373        | 0.1570 | 600  | 0.5631          | -1.1286        | -1.8164          | 0.7030             | 0.6878          | -426.2121      | -377.4605    | -1.6945         | -1.7955       |
| 0.5394        | 0.1832 | 700  | 0.5474          | -2.2700        | -3.0663          | 0.7040             | 0.7963          | -551.1992      | -491.6012    | -1.1628         | -1.2719       |
| 0.4983        | 0.2094 | 800  | 0.5323          | -1.5616        | -2.2966          | 0.7225             | 0.7349          | -474.2269      | -420.7654    | -1.5104         | -1.5996       |
| 0.4763        | 0.2355 | 900  | 0.5386          | -1.6130        | -2.4122          | 0.7160             | 0.7992          | -485.7890      | -425.9030    | -1.4156         | -1.4989       |
| 0.5266        | 0.2617 | 1000 | 0.5234          | -2.1788        | -3.0546          | 0.7280             | 0.8758          | -550.0311      | -482.4831    | -1.2043         | -1.3050       |
| 0.59          | 0.2879 | 1100 | 0.5278          | -1.6937        | -2.3427          | 0.7300             | 0.6490          | -478.8385      | -433.9710    | -0.9899         | -1.1100       |
| 0.5724        | 0.3141 | 1200 | 0.5071          | -1.5548        | -2.4072          | 0.7380             | 0.8523          | -485.2895      | -420.0863    | -1.1349         | -1.2473       |
| 0.5457        | 0.3402 | 1300 | 0.5013          | -1.7544        | -2.6264          | 0.7435             | 0.8721          | -507.2138      | -440.0385    | -1.2424         | -1.3403       |
| 0.5423        | 0.3664 | 1400 | 0.5132          | -1.6381        | -2.6114          | 0.7210             | 0.9733          | -505.7077      | -428.4097    | -1.5063         | -1.5869       |
| 0.4492        | 0.3926 | 1500 | 0.5122          | -1.5882        | -2.5891          | 0.7260             | 1.0010          | -503.4828      | -423.4175    | -1.4972         | -1.5950       |
| 0.5491        | 0.4187 | 1600 | 0.4956          | -1.6959        | -2.7056          | 0.7395             | 1.0098          | -515.1351      | -434.1913    | -1.1293         | -1.2525       |
| 0.5408        | 0.4449 | 1700 | 0.5111          | -3.0361        | -4.2392          | 0.7305             | 1.2030          | -668.4869      | -568.2142    | -1.0520         | -1.1774       |
| 0.4705        | 0.4711 | 1800 | 0.4949          | -2.1236        | -3.1894          | 0.7435             | 1.0658          | -563.5121      | -476.9663    | -1.3479         | -1.4508       |
| 0.4447        | 0.4973 | 1900 | 0.4984          | -2.0350        | -3.1505          | 0.7420             | 1.1155          | -559.6229      | -468.1011    | -1.1711         | -1.2951       |
| 0.4561        | 0.5234 | 2000 | 0.4929          | -1.9668        | -2.9588          | 0.7420             | 0.9919          | -540.4462      | -461.2839    | -1.3557         | -1.4696       |
| 0.5068        | 0.5496 | 2100 | 0.4969          | -3.1452        | -4.3633          | 0.7350             | 1.2180          | -680.8954      | -579.1231    | -1.1150         | -1.2426       |
| 0.4839        | 0.5758 | 2200 | 0.4927          | -2.3797        | -3.4376          | 0.7405             | 1.0579          | -588.3315      | -502.5681    | -1.2706         | -1.3886       |
| 0.4729        | 0.6019 | 2300 | 0.4924          | -2.8461        | -4.1210          | 0.7405             | 1.2749          | -656.6667      | -549.2124    | -1.0868         | -1.2145       |
| 0.4501        | 0.6281 | 2400 | 0.4900          | -2.9743        | -4.2366          | 0.7430             | 1.2623          | -668.2346      | -562.0333    | -0.9978         | -1.1257       |
| 0.4982        | 0.6543 | 2500 | 0.4872          | -2.4585        | -3.6758          | 0.7420             | 1.2173          | -612.1486      | -510.4511    | -1.0532         | -1.1862       |
| 0.4649        | 0.6805 | 2600 | 0.4881          | -2.5759        | -3.8831          | 0.7450             | 1.3072          | -632.8793      | -522.1908    | -1.0793         | -1.2115       |
| 0.556         | 0.7066 | 2700 | 0.4841          | -2.3432        | -3.5113          | 0.7460             | 1.1680          | -595.6959      | -498.9265    | -1.1004         | -1.2295       |
| 0.4617        | 0.7328 | 2800 | 0.4832          | -2.3495        | -3.6183          | 0.7460             | 1.2689          | -606.4033      | -499.5496    | -1.0627         | -1.1960       |
| 0.4916        | 0.7590 | 2900 | 0.4800          | -2.6711        | -3.9165          | 0.7455             | 1.2454          | -636.2195      | -531.7142    | -1.0032         | -1.1418       |
| 0.4708        | 0.7851 | 3000 | 0.4797          | -2.6166        | -3.7883          | 0.7475             | 1.1717          | -623.4008      | -526.2621    | -0.9962         | -1.1355       |
| 0.4804        | 0.8113 | 3100 | 0.4807          | -2.8224        | -4.1220          | 0.7475             | 1.2996          | -656.7728      | -546.8435    | -0.9953         | -1.1341       |
| 0.4866        | 0.8375 | 3200 | 0.4777          | -2.5496        | -3.7894          | 0.7475             | 1.2398          | -623.5103      | -519.5614    | -1.0276         | -1.1641       |
| 0.4967        | 0.8636 | 3300 | 0.4786          | -2.5578        | -3.8108          | 0.7480             | 1.2530          | -625.6535      | -520.3804    | -1.0241         | -1.1608       |
| 0.4272        | 0.8898 | 3400 | 0.4797          | -2.7223        | -4.0287          | 0.7460             | 1.3065          | -647.4435      | -536.8282    | -1.0071         | -1.1445       |
| 0.5272        | 0.9160 | 3500 | 0.4797          | -2.7144        | -4.0320          | 0.7470             | 1.3176          | -647.7730      | -536.0449    | -1.0233         | -1.1601       |
| 0.4441        | 0.9422 | 3600 | 0.4790          | -2.6459        | -3.9513          | 0.7470             | 1.3054          | -639.7043      | -529.1944    | -1.0278         | -1.1641       |
| 0.4823        | 0.9683 | 3700 | 0.4789          | -2.6279        | -3.9262          | 0.7480             | 1.2982          | -637.1880      | -527.3952    | -1.0329         | -1.1687       |
| 0.4996        | 0.9945 | 3800 | 0.4788          | -2.6215        | -3.9183          | 0.7475             | 1.2968          | -636.4029      | -526.7561    | -1.0296         | -1.1658       |

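The reward columns in the table are not independent: Rewards/margins is simply Rewards/chosen minus Rewards/rejected. A quick sanity check on the final evaluation row:

```python
# Final-row values from the training results table above.
rewards_chosen = -2.6215
rewards_rejected = -3.9183

# Rewards/margins is the difference between the two reward columns.
rewards_margin = rewards_chosen - rewards_rejected
```

The computed margin matches the logged value of 1.2968, so the same check can be applied to any row when auditing a run.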
### Framework versions

- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.1.2+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1

all_results.json (ADDED, @@ -0,0 +1,9 @@)
{
  "epoch": 1.0,
  "total_flos": 0.0,
  "train_loss": 0.517807064771465,
  "train_runtime": 164396.369,
  "train_samples": 61134,
  "train_samples_per_second": 0.372,
  "train_steps_per_second": 0.023
}
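The throughput fields in this file follow from the other entries: samples per second is `train_samples / train_runtime`, and steps per second follows once the samples are divided into optimizer steps of `total_train_batch_size` = 16 (taken from the hyperparameters in the README). A quick cross-check:

```python
import math

# Values from all_results.json.
train_samples = 61134
train_runtime = 164396.369  # seconds (~45.7 hours)
total_train_batch_size = 16  # from the README hyperparameters

samples_per_second = train_samples / train_runtime
steps = math.ceil(train_samples / total_train_batch_size)  # 3821 optimizer steps
steps_per_second = steps / train_runtime
```

Both derived rates round to the logged values (0.372 and 0.023), and the step count is consistent with the table above ending at step 3800.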
train_results.json (ADDED, @@ -0,0 +1,9 @@)
{
  "epoch": 1.0,
  "total_flos": 0.0,
  "train_loss": 0.517807064771465,
  "train_runtime": 164396.369,
  "train_samples": 61134,
  "train_samples_per_second": 0.372,
  "train_steps_per_second": 0.023
}
trainer_state.json (ADDED)

The diff for this file is too large to render. See raw diff.