---
license: apache-2.0
language:
- en
tags:
- merge
- moe
---
![image/png](https://i.ibb.co/7k4j8Gm/icon10.png)
(Maybe I'll change the icon picture later.)

An experimental MoE. The idea is to have more active parameters than a MoE of 7B experts would have, while keeping the total size under 20B. This model has ~19.2B parameters.

- [Exl2, 4.0 bpw](https://huggingface.co/xxx777xxxASD/PrimaMonarch-EroSumika-2x10.7B-128k-bpw-4.0) (fits in 12GB VRAM with 16k context and 4-bit cache)
- [Exl2, 6.0 bpw](https://huggingface.co/xxx777xxxASD/PrimaMonarch-EroSumika-2x10.7B-128k-bpw-6.0)
- [GGUF](https://huggingface.co/xxx777xxxASD/PrimaMonarch-EroSumika-2x10.7B-128k-GGUF)
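The two 10.7B experts described below are assembled into the 2x10.7B MoE with mergekit-moe. The exact gating setup isn't published here, so the following is only a minimal sketch under assumptions: the model aliases and positive prompts are illustrative placeholders, not the actual recipe.
```
# Hypothetical mergekit-moe config for the 2x10.7B assembly.
# Model aliases and positive prompts are illustrative assumptions.
base_model: MistralInstruct-v0.2-10.7B-128k-selfmerge   # the "Base model (self merge)" below
gate_mode: hidden                                       # gate vectors derived from hidden states of the prompts
dtype: bfloat16
experts:
  - source_model: PrimaSumika-10.7B-128k                # first expert ("sandwich" merge below)
    positive_prompts:
      - "Write a story"
  - source_model: MonarchHuman-10.7B-128k               # second expert (hypothetical alias for the config below)
    positive_prompts:
      - "Answer the question"
```
A config like this would typically be built with `mergekit-moe config.yaml ./output-model`.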
### Base model (self merge)
```
slices:
  - sources:
      - model: MistralInstruct-v0.2-128k
        layer_range: [0, 24]
  - sources:
      - model: MistralInstruct-v0.2-128k
        layer_range: [8, 24]
  - sources:
      - model: MistralInstruct-v0.2-128k
        layer_range: [24, 32]
# 24 + 16 + 8 = 48 layers total, stretching the 32-layer 7B base to ~10.7B
merge_method: passthrough
dtype: bfloat16
```
### First expert ("sandwich" merge)
[xxx777xxxASD/PrimaSumika-10.7B-128k](https://huggingface.co/xxx777xxxASD/PrimaSumika-10.7B-128k)
```
slices:
  - sources:
      - model: EroSumika-128k
        layer_range: [0, 24]
  - sources:
      - model: Prima-Lelantacles-128k
        layer_range: [8, 24]
  - sources:
      - model: EroSumika-128k
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
### Second expert ("sandwich" merge)
```
slices:
  - sources:
      - model: AlphaMonarch-7B-128k
        layer_range: [0, 24]
  - sources:
      - model: NeuralHuman-128k
        layer_range: [8, 24]
  - sources:
      - model: AlphaMonarch-7B-128k
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
Each 128k model above is a slerp merge of the corresponding base model with [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context) to extend its usable context.
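For illustration, such a slerp could look like the following mergekit config. Using Erosumika as the example base and `t: 0.5` are assumptions; the actual interpolation values are not documented here.
```
# Hypothetical slerp config for one of the 128k component models
# (EroSumika-128k in this example). The t value is an assumption.
slices:
  - sources:
      - model: localfultonextractor/Erosumika-7B
        layer_range: [0, 32]
      - model: Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context
        layer_range: [0, 32]
merge_method: slerp
base_model: localfultonextractor/Erosumika-7B
parameters:
  t:
    - value: 0.5   # equal blend of both models for all tensors (assumed)
dtype: bfloat16
```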
## Models used
- [localfultonextractor/Erosumika-7B](https://huggingface.co/localfultonextractor/Erosumika-7B)
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context)
- [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
- [Nitral-AI/Prima-LelantaclesV6-7b](https://huggingface.co/Nitral-AI/Prima-LelantaclesV6-7b)
- [NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story](https://huggingface.co/NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story)
- [valine/MoreHuman](https://huggingface.co/valine/MoreHuman)