DisOOM committed
Commit ba10c40 (parent: b505e09)

Update README.md

Files changed (1): README.md (+58 −1)
---
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
tags:
- merge
- mergekit
- qwen2
- chat
- conversational
language:
- en
- zh
library_name: transformers
---
# Qwen1.5-124B-Chat-Merge

This is a 124B frankenmerge of [Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat), created by interleaving the model's layers with themselves using [mergekit](https://github.com/arcee-ai/mergekit).

*Inspired by other frankenmerges such as [**goliath-120b**](https://huggingface.co/alpindale/goliath-120b) and [**miqu-1-120b**](https://huggingface.co/wolfram/miqu-1-120b).*
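
To see where the 124B figure comes from, here is a small sanity-check sketch (mine, not from the original card) that counts the layers produced by the slice configuration shown below. Seven overlapping 20-layer slices of the 80-layer base give 140 layers, i.e. 1.75× the base depth, which puts the 72B base at roughly 124B parameters once the non-duplicated embedding layers are accounted for:

```python
# Slice ranges from the passthrough merge config below (half-open [start, end)).
slices = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]

# Passthrough merging simply stacks the slices, so depths add up.
total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 140 layers, up from the original 80
```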

**Quantization**

*Coming soon...*

**Merge Configuration**

The model was produced with the following mergekit YAML:
```yaml
dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [10, 30]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [20, 40]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [30, 50]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [40, 60]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [50, 70]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [60, 80]
    model: Qwen/Qwen1.5-72B-Chat
```
**Performance**

*Note: I don't have the resources to run formal benchmarks, and I haven't been able to use the model extensively, so my test results may not be entirely accurate.*

In most of my own (subjective) tests it performs better than the 72B version, including in comprehension, reasoning and coherence.
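
Since the card lists `library_name: transformers`, a minimal loading sketch may help; note the repo id below is my assumption from this page's title, and a 124B model in float16 needs on the order of 250 GB of memory, so multiple GPUs or offloading are assumed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DisOOM/Qwen1.5-124B-Chat-Merge"  # assumed repo id; adjust to the actual one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # float16 weights, per the merge config
    device_map="auto",   # shard across available GPUs / offload to CPU
)

# Qwen1.5 chat models ship a chat template, so the standard chat flow applies.
messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```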
59
+
60
+ **-Thanks**
61
+ * 1.The tool used to merge this model [mergekit](https://github.com/arcee-ai/mergekit)
62
+ * 2.Qwen team for the excellent base models.