---
license: apache-2.0
tags:
- moe
train: false
inference: false
pipeline_tag: text-generation
---
## Mixtral-8x7B-v0.1-hf-2bit_g16_s128-HQQ
This is a version of the Mixtral-8x7B-v0.1 model (https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) quantized to 2-bit via Half-Quadratic Quantization (HQQ).
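For reference, the settings implied by the model name (2-bit weights with group size 16, and scales quantized at group size 128) could be reproduced on the base model roughly as sketched below. This is an illustration of the HQQ quantization API, not the exact recipe used to produce this checkpoint; in particular, the scale/zero quantization options are an assumption inferred from the name.

```python
# Hypothetical re-quantization sketch (settings inferred from the model name)
from hqq.engine.hf import HQQModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig

# Load the full-precision base model
base_model = HQQModelForCausalLM.from_pretrained('mistralai/Mixtral-8x7B-v0.1')

# 2-bit weights, group size 16; quantizing the scales is an assumption (the "s128" suffix)
quant_config = BaseQuantizeConfig(nbits=2, group_size=16, quant_scale=True)
base_model.quantize_model(quant_config=quant_config)
```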
### Basic Usage
To run the model, install the HQQ library from https://github.com/mobiusml/hqq and use it as follows:
```python
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer

model_id = 'mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-2bit_g16_s128-HQQ'

# Load the tokenizer and the pre-quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = HQQModelForCausalLM.from_quantized(model_id)

# Optional: use the compiled PyTorch backend for faster dequantization
from hqq.core.quantize import HQQLinear, HQQBackend
HQQLinear.set_backend(HQQBackend.PYTORCH_COMPILE)
```
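Once loaded, the model can be used like any `transformers` causal LM. Below is a minimal generation sketch; the prompt, the CUDA device placement, and the decoding settings are illustrative and not part of the original card:

```python
# Minimal generation sketch (prompt and settings are illustrative)
inputs  = tokenizer("Mixture-of-experts models are", return_tensors='pt').to('cuda')
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```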