|
---
license: apache-2.0
language:
- en
tags:
- merge
- moe
---
|
![image/png](https://i.ibb.co/7k4j8Gm/icon10.png) |
|
(Maybe I'll change the icon later.)
|
|
|
An experimental MoE. The idea is to get more active parameters than an Xx7B model would have, while keeping the total size under 20B.
|
|
|
This model has ~19.2B parameters. |
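As a rough sanity check (assuming standard Mistral-7B dimensions: 4096 hidden size, 14336 FFN size, 32k vocabulary), a 48-layer Mistral stack is ~10.7B parameters, and adding a second expert duplicates only the ~176M MLP parameters per layer while attention and embeddings stay shared, so 10.7B + 48 × 0.176B ≈ 19.2B.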
|
|
|
[Exl2, 4.0 bpw](https://huggingface.co/xxx777xxxASD/PrimaMonarch-EroSumika-2x10.7B-128k-bpw-4.0) (Fits in 12GB VRAM/16k context/4-bit cache) |
|
|
|
[Exl2, 6.0 bpw](https://huggingface.co/xxx777xxxASD/PrimaMonarch-EroSumika-2x10.7B-128k-bpw-6.0) |
|
|
|
[GGUF](https://huggingface.co/xxx777xxxASD/PrimaMonarch-EroSumika-2x10.7B-128k-GGUF) |
|
|
|
### Base model (self-merge)
|
```
slices:
  - sources:
      - model: MistralInstruct-v0.2-128k
        layer_range: [0, 24]
  - sources:
      - model: MistralInstruct-v0.2-128k
        layer_range: [8, 24]
  - sources:
      - model: MistralInstruct-v0.2-128k
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
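In mergekit's passthrough format this keeps layers 0–23 of the 32-layer base, repeats layers 8–23, and appends layers 24–31, producing a 48-layer (~10.7B) frankenmerge. The two experts below follow the same layer pattern.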
|
|
|
### First expert ("sandwich" merge) |
|
[xxx777xxxASD/PrimaSumika-10.7B-128k](https://huggingface.co/xxx777xxxASD/PrimaSumika-10.7B-128k) |
|
```
slices:
  - sources:
      - model: EroSumika-128k
        layer_range: [0, 24]
  - sources:
      - model: Prima-Lelantacles-128k
        layer_range: [8, 24]
  - sources:
      - model: EroSumika-128k
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
|
|
|
### Second expert ("sandwich" merge) |
|
```
slices:
  - sources:
      - model: AlphaMonarch-7B-128k
        layer_range: [0, 24]
  - sources:
      - model: NeuralHuman-128k
        layer_range: [8, 24]
  - sources:
      - model: AlphaMonarch-7B-128k
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
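The final MoE assembly config isn't shown above; a hypothetical mergekit-moe sketch of how the self-merged base and the two experts could be combined is below. The base and second-expert paths, the gate mode, and the `positive_prompts` are placeholders, not the values actually used.

```
base_model: ./MistralInstruct-v0.2-128k-selfmerge     # placeholder path to the self-merged base
gate_mode: hidden                                     # placeholder; mergekit also supports cheap_embed/random
dtype: bfloat16
experts:
  - source_model: xxx777xxxASD/PrimaSumika-10.7B-128k
    positive_prompts:
      - "placeholder routing prompt for expert 1"
  - source_model: ./AlphaMonarch-NeuralHuman-128k     # placeholder path to the second expert
    positive_prompts:
      - "placeholder routing prompt for expert 2"
```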
|
|
|
Each 128k model is a slerp merge with [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context) |
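The slerp configs themselves aren't included; a minimal mergekit sketch for one of them (e.g. EroSumika-128k), assuming a uniform interpolation factor, might look like this (`t: 0.5` is a placeholder, not the actual value):

```
slices:
  - sources:
      - model: localfultonextractor/Erosumika-7B
        layer_range: [0, 32]
      - model: Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context
        layer_range: [0, 32]
merge_method: slerp
base_model: localfultonextractor/Erosumika-7B
parameters:
  t: 0.5          # placeholder interpolation factor
dtype: bfloat16
```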
|
|
|
## Models used |
|
|
|
- [localfultonextractor/Erosumika-7B](https://huggingface.co/localfultonextractor/Erosumika-7B) |
|
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) |
|
- [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context) |
|
- [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B) |
|
- [Nitral-AI/Prima-LelantaclesV6-7b](https://huggingface.co/Nitral-AI/Prima-LelantaclesV6-7b) |
|
- [NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story](https://huggingface.co/NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story) |
|
- [valine/MoreHuman](https://huggingface.co/valine/MoreHuman) |