---
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
tags:
- merge
- mergekit
- qwen2
- chat
- conversational
language:
- en
- zh
library_name: transformers
---
# Qwen1.5-124B-Chat-Merge
**This is a 124b frankenmerge of [Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat), created by interleaving the model's layers with itself using [mergekit](https://github.com/arcee-ai/mergekit).**

*Inspired by other frankenmerge models like [**goliath-120b**](https://huggingface.co/alpindale/goliath-120b) and [**miqu-1-120b**](https://huggingface.co/wolfram/miqu-1-120b)*

**-New Version Coming Soon**

I have recently created another 124B frankenmerge of Qwen1.5 that performs better than this one, especially in logical reasoning and comprehension (on some logic puzzles I designed myself, it approaches the level of proprietary models). It achieves this improvement through a different merge recipe and will be uploaded soon.

**-Quantize**

GGUF quantizations are available here: [Qwen1.5-124B-Chat-Merge-gguf](https://huggingface.co/DisOOM/Qwen1.5-124B-Chat-Merge-gguf/tree/main)
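To try a GGUF build locally, a minimal sketch with `llama-cpp-python` might look like the following. The quantization filename is an assumption, so check the repo's file list for the actual name (large models are often split into multiple parts).

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The filename below is hypothetical -- pick a real .gguf file from the repo.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen1.5-124B-Chat-Merge.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=4096,        # context window; adjust to your RAM/VRAM budget
    n_gpu_layers=-1,   # offload all layers to GPU if possible
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Briefly explain what a frankenmerge is."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```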

**-Merge Configuration**

The merge was produced from the YAML configuration below (reproducible with mergekit's `mergekit-yaml` command):
```yaml
dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [10, 30]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [20, 40]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [30, 50]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [40, 60]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [50, 70]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [60, 80]
    model: Qwen/Qwen1.5-72B-Chat
```
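For reference, loading the merged model with transformers follows the standard Qwen1.5 chat pattern; a minimal sketch is below. The repo id `DisOOM/Qwen1.5-124B-Chat-Merge` is an assumption inferred from the GGUF repo name, and a 124B float16 model needs on the order of 250 GB of memory, so `device_map="auto"` across multiple GPUs (or quantization) is effectively required.

```python
# Minimal sketch, assuming the merged weights live at
# DisOOM/Qwen1.5-124B-Chat-Merge (inferred from the GGUF repo name).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DisOOM/Qwen1.5-124B-Chat-Merge"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # matches the merge's dtype
    device_map="auto",          # shard across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```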
**-Performance**

* Note: I don't have the resources to run formal benchmarks, and I haven't been able to use the model extensively, so my impressions below may not be accurate.

In most of my own (subjective) tests, it performs better than the 72B version in comprehension, reasoning, and coherence, though the improvement is not as significant as I had hoped (I have only run a few tests). If you are interested in this model, feel free to test it yourself or share evaluations; everyone's feedback is welcome.