---
license: apache-2.0
language:
- en
tags:
- merge
- moe
---
![image/png](https://i.ibb.co/7k4j8Gm/icon10.png)
(Maybe I'll change the icon later.)

An experimental MoE. The idea is to have more active parameters than a 7xX model would, while keeping its total size under 20B.

This model has ~19.2B parameters.

[Exl2, 4.0 bpw](https://huggingface.co/xxx777xxxASD/PrimaMonarch-EroSumika-2x10.7B-128k-bpw-4.0) (Fits in 12GB VRAM/16k context/4-bit cache)

[Exl2, 6.0 bpw](https://huggingface.co/xxx777xxxASD/PrimaMonarch-EroSumika-2x10.7B-128k-bpw-6.0)

[GGUF](https://huggingface.co/xxx777xxxASD/PrimaMonarch-EroSumika-2x10.7B-128k-GGUF)

### Base model (self merge)
```
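# Note: three overlapping slices of one 32-layer 7B model (24 + 16 + 8 = 48 layers),
# a passthrough depth upscale to the ~10.7B size used for both experts.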
slices:
  - sources:
    - model: MistralInstruct-v0.2-128k
      layer_range: [0, 24]
  - sources:
    - model: MistralInstruct-v0.2-128k
      layer_range: [8, 24]
  - sources:
    - model: MistralInstruct-v0.2-128k
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

### First expert ("sandwich" merge)
[xxx777xxxASD/PrimaSumika-10.7B-128k](https://huggingface.co/xxx777xxxASD/PrimaSumika-10.7B-128k)
```
slices:
  - sources:
    - model: EroSumika-128k
      layer_range: [0, 24]
  - sources:
    - model: Prima-Lelantacles-128k
      layer_range: [8, 24]
  - sources:
    - model: EroSumika-128k
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

### Second expert ("sandwich" merge)
```
slices:
  - sources:
    - model: AlphaMonarch-7B-128k
      layer_range: [0, 24]
  - sources:
    - model: NeuralHuman-128k
      layer_range: [8, 24]
  - sources:
    - model: AlphaMonarch-7B-128k
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
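
To combine the pieces, a mergekit-moe config roughly like the one below would turn the self-merged base and the two experts into the 2x10.7B MoE. The model names, gate mode, and positive prompts here are placeholders for illustration; the actual gating setup isn't published in this card.

```
base_model: MistralInstruct-v0.2-128k-10.7B   # the self merge above (placeholder name)
gate_mode: hidden                             # placeholder; actual gate mode not stated
dtype: bfloat16
experts:
  - source_model: PrimaSumika-10.7B-128k      # first expert
    positive_prompts:
      - "roleplay"                            # placeholder prompt
  - source_model: AlphaMonarch-NeuralHuman-10.7B-128k   # second expert (placeholder name)
    positive_prompts:
      - "creative writing"                    # placeholder prompt
```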

Each of the 128k models used above is a SLERP merge with [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context).
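
A minimal sketch of such a SLERP merge, using EroSumika-128k as an example; the `t` value is a placeholder, not the weight actually used:

```
slices:
  - sources:
    - model: localfultonextractor/Erosumika-7B
      layer_range: [0, 32]
    - model: Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context
      layer_range: [0, 32]
merge_method: slerp
base_model: localfultonextractor/Erosumika-7B
parameters:
  t: 0.5   # placeholder interpolation weight
dtype: bfloat16
```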

## Models used

- [localfultonextractor/Erosumika-7B](https://huggingface.co/localfultonextractor/Erosumika-7B)
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context)
- [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
- [Nitral-AI/Prima-LelantaclesV6-7b](https://huggingface.co/Nitral-AI/Prima-LelantaclesV6-7b)
- [NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story](https://huggingface.co/NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story)
- [valine/MoreHuman](https://huggingface.co/valine/MoreHuman)