---
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
tags:
  - merge
  - mergekit
  - qwen2
  - chat
  - conversational
language:
  - en
  - zh
library_name: transformers
---

# Qwen1.5-124B-Chat-Merge

This is a 124B frankenmerge of Qwen1.5-72B-Chat, created by interleaving layers of the model with itself using mergekit.

It was inspired by other frankenmerge models such as goliath-120b and miqu-1-120b.
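For intuition on the size: the source model has 80 transformer layers, and the merge configuration below stacks seven overlapping 20-layer slices into a 140-layer model. A minimal sketch of that arithmetic (layer ranges are taken from the config; the scaling is rough, since embeddings and the LM head are not duplicated per layer):

```python
# Sketch: estimate the frankenmerge's size from its slice plan.
# The slice ranges below mirror the mergekit config in this card.
slices = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60), (50, 70), (60, 80)]

total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 140 layers, up from 80 in Qwen1.5-72B-Chat

# Naive parameter estimate: scale 72B by the layer ratio.
approx_params = 72e9 * total_layers / 80
print(f"~{approx_params / 1e9:.0f}B parameters")  # ~126B, close to the 124B in the name
```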

## Quantize

GGUF here: gguf
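A minimal sketch of running one of the GGUF quants locally with llama-cpp-python; the file name is a placeholder for whichever quant you download:

```python
# Hypothetical usage sketch: load a GGUF quant with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen1.5-124B-Chat-Merge.Q4_K_M.gguf",  # placeholder file name
    n_gpu_layers=-1,  # offload all layers to GPU if VRAM allows
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```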

## Merge Configuration

This model was merged with the following mergekit YAML:

```yaml
dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 20]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [10, 30]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [20, 40]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [30, 50]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [40, 60]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [50, 70]
    model: Qwen/Qwen1.5-72B-Chat
- sources:
  - layer_range: [60, 80]
    model: Qwen/Qwen1.5-72B-Chat
```
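To reproduce the merge, this config can be passed to mergekit's `mergekit-yaml` CLI, or run from Python. A hedged sketch using mergekit's Python API (file paths are placeholders):

```python
# Sketch: run the passthrough merge above via mergekit's Python API.
# Roughly equivalent to: mergekit-yaml config.yaml ./Qwen1.5-124B-Chat-Merge --cuda
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as f:  # placeholder path
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    config,
    out_path="./Qwen1.5-124B-Chat-Merge",  # placeholder output directory
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```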

## Performance

- Note: I don't have the resources to run proper benchmarks, nor have I been able to use the model extensively, so my test results may not be accurate.

In most of my own (subjective) tests, covering comprehension, reasoning, and coherence, it performs better than the 72B version, but the improvement isn't as significant as I had imagined (I've only run a few tests). If you're interested in this model's performance, feel free to test it yourself or share your evaluations; everyone's tests and feedback are welcome.
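If you want to try it, here is a minimal sketch for loading the full-precision weights with transformers (Qwen1.5 requires transformers >= 4.37; the repo id below is assumed to match this card):

```python
# Sketch: chat with the merged model via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DisOOM/Qwen1.5-124B-Chat-Merge"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Give me a short introduction to LLMs."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```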