guoqiang-x committed on
Commit eebedcd
1 Parent(s): 5c4b6b8

Model save

Files changed (4)
  1. README.md +110 -0
  2. all_results.json +9 -0
  3. train_results.json +9 -0
  4. trainer_state.json +0 -0
README.md ADDED
@@ -0,0 +1,110 @@
+ ---
+ base_model: mistralai/Mistral-7B-v0.1
+ library_name: peft
+ license: apache-2.0
+ tags:
+ - trl
+ - dpo
+ - generated_from_trainer
+ model-index:
+ - name: zephyr-7b-dpo-qlora
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # zephyr-7b-dpo-qlora
+
+ This model is a QLoRA (PEFT) adapter fine-tuned from [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) with DPO on an unknown dataset; a hedged loading sketch follows the results list below.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.4788
+ - Rewards/chosen: -2.6215
+ - Rewards/rejected: -3.9183
+ - Rewards/accuracies: 0.7475
+ - Rewards/margins: 1.2968
+ - Logps/rejected: -636.4029
+ - Logps/chosen: -526.7561
+ - Logits/rejected: -1.0296
+ - Logits/chosen: -1.1658
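+
+ A minimal sketch of how to load and query the adapter (the repo id `guoqiang-x/zephyr-7b-dpo-qlora` is an assumption based on the committer and model name, not confirmed by the card):
+
+ ```python
+ # Hypothetical usage sketch: load the QLoRA adapter on top of the base model.
+ # The adapter repo id below is an assumption; substitute the real location.
+ import torch
+ from peft import AutoPeftModelForCausalLM
+ from transformers import AutoTokenizer
+
+ model = AutoPeftModelForCausalLM.from_pretrained(
+     "guoqiang-x/zephyr-7b-dpo-qlora",  # assumed adapter repo id
+     torch_dtype=torch.bfloat16,
+ )
+ tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
+
+ inputs = tokenizer("What is direct preference optimization?", return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```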
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a hedged TRL configuration sketch follows this list):
+ - learning_rate: 5e-06
+ - train_batch_size: 4
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
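+
+ The training script itself is not part of this commit; the block below is a minimal sketch, assuming TRL's `DPOConfig`/`DPOTrainer` API (the dataset, LoRA shape, `beta`, and precision are assumptions, not taken from this card):
+
+ ```python
+ # Hypothetical sketch mapping the listed hyperparameters onto TRL's DPO API.
+ from datasets import load_dataset
+ from peft import LoraConfig
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from trl import DPOConfig, DPOTrainer
+
+ model_id = "mistralai/Mistral-7B-v0.1"
+ model = AutoModelForCausalLM.from_pretrained(model_id)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token
+
+ args = DPOConfig(
+     output_dir="zephyr-7b-dpo-qlora",
+     learning_rate=5e-6,
+     per_device_train_batch_size=4,    # train_batch_size: 4
+     per_device_eval_batch_size=8,     # eval_batch_size: 8
+     gradient_accumulation_steps=4,    # 4 x 4 = total_train_batch_size 16
+     num_train_epochs=1,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     seed=42,
+     beta=0.1,                         # assumed; the card does not list beta
+ )
+
+ peft_config = LoraConfig(             # assumed QLoRA adapter shape
+     r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
+ )
+
+ # Assumed preference dataset in TRL's prompt/chosen/rejected format.
+ dataset = load_dataset("trl-lib/ultrafeedback_binarized")
+
+ trainer = DPOTrainer(
+     model=model,
+     args=args,
+     train_dataset=dataset["train"],
+     eval_dataset=dataset["test"],
+     tokenizer=tokenizer,              # processing_class= in newer TRL releases
+     peft_config=peft_config,
+ )
+ trainer.train()
+ ```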
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.6807 | 0.0262 | 100 | 0.6809 | 0.0514 | 0.0256 | 0.6555 | 0.0258 | -242.0131 | -259.4604 | -2.0551 | -2.1482 |
+ | 0.6438 | 0.0523 | 200 | 0.6356 | -0.1881 | -0.3389 | 0.6760 | 0.1508 | -278.4615 | -283.4154 | -2.0113 | -2.1000 |
+ | 0.6073 | 0.0785 | 300 | 0.6054 | -0.6866 | -0.9744 | 0.6815 | 0.2878 | -342.0091 | -333.2583 | -1.9949 | -2.0782 |
+ | 0.5956 | 0.1047 | 400 | 0.5824 | -1.4485 | -1.9599 | 0.6830 | 0.5114 | -440.5653 | -409.4522 | -1.5844 | -1.6758 |
+ | 0.5643 | 0.1309 | 500 | 0.5726 | -1.1458 | -1.7589 | 0.6915 | 0.6131 | -420.4636 | -379.1804 | -1.5624 | -1.6658 |
+ | 0.5373 | 0.1570 | 600 | 0.5631 | -1.1286 | -1.8164 | 0.7030 | 0.6878 | -426.2121 | -377.4605 | -1.6945 | -1.7955 |
+ | 0.5394 | 0.1832 | 700 | 0.5474 | -2.2700 | -3.0663 | 0.7040 | 0.7963 | -551.1992 | -491.6012 | -1.1628 | -1.2719 |
+ | 0.4983 | 0.2094 | 800 | 0.5323 | -1.5616 | -2.2966 | 0.7225 | 0.7349 | -474.2269 | -420.7654 | -1.5104 | -1.5996 |
+ | 0.4763 | 0.2355 | 900 | 0.5386 | -1.6130 | -2.4122 | 0.7160 | 0.7992 | -485.7890 | -425.9030 | -1.4156 | -1.4989 |
+ | 0.5266 | 0.2617 | 1000 | 0.5234 | -2.1788 | -3.0546 | 0.7280 | 0.8758 | -550.0311 | -482.4831 | -1.2043 | -1.3050 |
+ | 0.59 | 0.2879 | 1100 | 0.5278 | -1.6937 | -2.3427 | 0.7300 | 0.6490 | -478.8385 | -433.9710 | -0.9899 | -1.1100 |
+ | 0.5724 | 0.3141 | 1200 | 0.5071 | -1.5548 | -2.4072 | 0.7380 | 0.8523 | -485.2895 | -420.0863 | -1.1349 | -1.2473 |
+ | 0.5457 | 0.3402 | 1300 | 0.5013 | -1.7544 | -2.6264 | 0.7435 | 0.8721 | -507.2138 | -440.0385 | -1.2424 | -1.3403 |
+ | 0.5423 | 0.3664 | 1400 | 0.5132 | -1.6381 | -2.6114 | 0.7210 | 0.9733 | -505.7077 | -428.4097 | -1.5063 | -1.5869 |
+ | 0.4492 | 0.3926 | 1500 | 0.5122 | -1.5882 | -2.5891 | 0.7260 | 1.0010 | -503.4828 | -423.4175 | -1.4972 | -1.5950 |
+ | 0.5491 | 0.4187 | 1600 | 0.4956 | -1.6959 | -2.7056 | 0.7395 | 1.0098 | -515.1351 | -434.1913 | -1.1293 | -1.2525 |
+ | 0.5408 | 0.4449 | 1700 | 0.5111 | -3.0361 | -4.2392 | 0.7305 | 1.2030 | -668.4869 | -568.2142 | -1.0520 | -1.1774 |
+ | 0.4705 | 0.4711 | 1800 | 0.4949 | -2.1236 | -3.1894 | 0.7435 | 1.0658 | -563.5121 | -476.9663 | -1.3479 | -1.4508 |
+ | 0.4447 | 0.4973 | 1900 | 0.4984 | -2.0350 | -3.1505 | 0.7420 | 1.1155 | -559.6229 | -468.1011 | -1.1711 | -1.2951 |
+ | 0.4561 | 0.5234 | 2000 | 0.4929 | -1.9668 | -2.9588 | 0.7420 | 0.9919 | -540.4462 | -461.2839 | -1.3557 | -1.4696 |
+ | 0.5068 | 0.5496 | 2100 | 0.4969 | -3.1452 | -4.3633 | 0.7350 | 1.2180 | -680.8954 | -579.1231 | -1.1150 | -1.2426 |
+ | 0.4839 | 0.5758 | 2200 | 0.4927 | -2.3797 | -3.4376 | 0.7405 | 1.0579 | -588.3315 | -502.5681 | -1.2706 | -1.3886 |
+ | 0.4729 | 0.6019 | 2300 | 0.4924 | -2.8461 | -4.1210 | 0.7405 | 1.2749 | -656.6667 | -549.2124 | -1.0868 | -1.2145 |
+ | 0.4501 | 0.6281 | 2400 | 0.4900 | -2.9743 | -4.2366 | 0.7430 | 1.2623 | -668.2346 | -562.0333 | -0.9978 | -1.1257 |
+ | 0.4982 | 0.6543 | 2500 | 0.4872 | -2.4585 | -3.6758 | 0.7420 | 1.2173 | -612.1486 | -510.4511 | -1.0532 | -1.1862 |
+ | 0.4649 | 0.6805 | 2600 | 0.4881 | -2.5759 | -3.8831 | 0.7450 | 1.3072 | -632.8793 | -522.1908 | -1.0793 | -1.2115 |
+ | 0.556 | 0.7066 | 2700 | 0.4841 | -2.3432 | -3.5113 | 0.7460 | 1.1680 | -595.6959 | -498.9265 | -1.1004 | -1.2295 |
+ | 0.4617 | 0.7328 | 2800 | 0.4832 | -2.3495 | -3.6183 | 0.7460 | 1.2689 | -606.4033 | -499.5496 | -1.0627 | -1.1960 |
+ | 0.4916 | 0.7590 | 2900 | 0.4800 | -2.6711 | -3.9165 | 0.7455 | 1.2454 | -636.2195 | -531.7142 | -1.0032 | -1.1418 |
+ | 0.4708 | 0.7851 | 3000 | 0.4797 | -2.6166 | -3.7883 | 0.7475 | 1.1717 | -623.4008 | -526.2621 | -0.9962 | -1.1355 |
+ | 0.4804 | 0.8113 | 3100 | 0.4807 | -2.8224 | -4.1220 | 0.7475 | 1.2996 | -656.7728 | -546.8435 | -0.9953 | -1.1341 |
+ | 0.4866 | 0.8375 | 3200 | 0.4777 | -2.5496 | -3.7894 | 0.7475 | 1.2398 | -623.5103 | -519.5614 | -1.0276 | -1.1641 |
+ | 0.4967 | 0.8636 | 3300 | 0.4786 | -2.5578 | -3.8108 | 0.7480 | 1.2530 | -625.6535 | -520.3804 | -1.0241 | -1.1608 |
+ | 0.4272 | 0.8898 | 3400 | 0.4797 | -2.7223 | -4.0287 | 0.7460 | 1.3065 | -647.4435 | -536.8282 | -1.0071 | -1.1445 |
+ | 0.5272 | 0.9160 | 3500 | 0.4797 | -2.7144 | -4.0320 | 0.7470 | 1.3176 | -647.7730 | -536.0449 | -1.0233 | -1.1601 |
+ | 0.4441 | 0.9422 | 3600 | 0.4790 | -2.6459 | -3.9513 | 0.7470 | 1.3054 | -639.7043 | -529.1944 | -1.0278 | -1.1641 |
+ | 0.4823 | 0.9683 | 3700 | 0.4789 | -2.6279 | -3.9262 | 0.7480 | 1.2982 | -637.1880 | -527.3952 | -1.0329 | -1.1687 |
+ | 0.4996 | 0.9945 | 3800 | 0.4788 | -2.6215 | -3.9183 | 0.7475 | 1.2968 | -636.4029 | -526.7561 | -1.0296 | -1.1658 |
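+
+ For reference, the reward columns above are TRL's implicit DPO rewards (a hedged reading, assuming TRL's standard DPO logging, with beta the DPO temperature):
+
+ $$
+ r_{\text{chosen}} = \beta\,\bigl(\log \pi_\theta(y_c \mid x) - \log \pi_{\text{ref}}(y_c \mid x)\bigr),
+ \qquad
+ r_{\text{rejected}} = \beta\,\bigl(\log \pi_\theta(y_r \mid x) - \log \pi_{\text{ref}}(y_r \mid x)\bigr)
+ $$
+
+ Rewards/margins is the mean of r_chosen - r_rejected, and Rewards/accuracies is the fraction of evaluation pairs with r_chosen > r_rejected; both rise steadily while the validation loss falls from 0.68 to 0.48.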
+
+
+ ### Framework versions
+
+ - PEFT 0.13.2
+ - Transformers 4.45.2
+ - Pytorch 2.1.2+cu121
+ - Datasets 3.0.1
+ - Tokenizers 0.20.1
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 1.0,
+     "total_flos": 0.0,
+     "train_loss": 0.517807064771465,
+     "train_runtime": 164396.369,
+     "train_samples": 61134,
+     "train_samples_per_second": 0.372,
+     "train_steps_per_second": 0.023
+ }
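
As a quick consistency check, the reported throughput follows from the raw counts in this file (the total_train_batch_size of 16 comes from the README):

```python
# Sanity-check the reported throughput against the raw counts above.
train_samples = 61134
train_runtime_s = 164396.369
total_train_batch_size = 16                       # from the README hyperparameters

print(round(train_samples / train_runtime_s, 3))  # 0.372 samples/s
steps = train_samples / total_train_batch_size    # ~3821 optimizer steps
print(round(steps / train_runtime_s, 3))          # 0.023 steps/s
```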
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 1.0,
+     "total_flos": 0.0,
+     "train_loss": 0.517807064771465,
+     "train_runtime": 164396.369,
+     "train_samples": 61134,
+     "train_samples_per_second": 0.372,
+     "train_steps_per_second": 0.023
+ }
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff