---
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: text-generation
tags:
- HQQ
- mixtral
- moe
- quantized
- 2bit
---

## NeverSleep's [Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss](https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss) 2 bit HQQ quant

## 18.2 GB

### the other 14 shannons will be remembered. [HQQ quantized](https://mobiusml.github.io/hqq_blog/) to 2 bits with 4 bit attention. Fits on a 3090 with room to grow. Supports full 32k context. I will not combine those assertions.

The attention tensors are kept at 4 bit because Mixtral reuses them across all eight experts, so they add only about 0.4 GB while quality improves dramatically. See [this](https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ) but horny and dying of chatml m<|alig>|nant tokenitis.|>
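A back-of-the-envelope check of that 0.4 GB figure, assuming Mixtral-8x7B's published dimensions (hidden size 4096, 32 layers, 32 query heads, 8 KV heads with grouped-query attention, head dim 128) — the variable names here are just for illustration:

```python
# Sketch: cost of holding attention at 4 bits instead of 2, assuming
# Mixtral-8x7B's published dims (not measured from this checkpoint).
hidden = 4096
layers = 32
q_heads, kv_heads, head_dim = 32, 8, 128

# Attention weights per layer: q_proj and o_proj are 4096x4096;
# k_proj and v_proj are 4096x1024 each under grouped-query attention.
per_layer = (hidden * q_heads * head_dim        # q_proj
             + q_heads * head_dim * hidden      # o_proj
             + 2 * hidden * kv_heads * head_dim)  # k_proj + v_proj
attn_params = layers * per_layer

# Going from 2 bits to 4 bits costs 2 extra bits per parameter.
extra_gb = attn_params * 2 / 8 / 1e9
print(f"{attn_params / 1e9:.2f}B attention params, +{extra_gb:.2f} GB")
```

That works out to roughly 1.34B attention parameters and about 0.34 GB of extra weight storage, consistent with the ~0.4 GB above once per-group scale and zero-point metadata is counted.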
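For a sense of what "2 bits" means mechanically: HQQ fits the scales and zero-points with a half-quadratic solver, but the storage format is ordinary group-wise affine quantization. A minimal sketch of the plain min/max baseline that HQQ improves on (the function names `quantize_2bit` / `dequantize` are hypothetical, not HQQ's API):

```python
import numpy as np

def quantize_2bit(w, group_size=64):
    """Group-wise 2-bit affine quantization: w ~= (q - z) * s per group.

    This is the naive min/max baseline, NOT the HQQ solver, which
    refines s and z with a half-quadratic optimization.
    """
    g = w.reshape(-1, group_size)
    w_min = g.min(axis=1, keepdims=True)
    w_max = g.max(axis=1, keepdims=True)
    s = (w_max - w_min) / 3.0            # 2 bits -> 4 levels: q in {0..3}
    z = np.round(-w_min / s)             # per-group zero-point
    q = np.clip(np.round(g / s + z), 0, 3).astype(np.uint8)
    return q, s, z

def dequantize(q, s, z, shape):
    # Reconstruct weights from codes plus per-group scale/zero metadata.
    return ((q.astype(np.float32) - z) * s).reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 64)).astype(np.float32)
q, s, z = quantize_2bit(w)
w_hat = dequantize(q, s, z, w.shape)
# Reconstruction error is bounded by half a quantization step per group.
```

The per-group scales and zero-points are also why the on-disk size exceeds a bare 2 bits per weight: each group of 64 codes carries its own floating-point metadata.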