This is Mistral AI's Mixtral-8x7B-Instruct-v0.1 model, quantized to q5_k_m on 02/24/2024. It works well.
How to quantize your own models with Windows and an RTX GPU:
Requirements:
- git
- python
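If you want to confirm both requirements are reachable from the command prompt before starting, a minimal sketch like the one below can help. The helper name missing_tools is made up for this example, and it only checks that the executables are on PATH, not which versions they are.

```python
import shutil


def missing_tools(tools):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # "git" and "python" are the two requirements listed above.
    missing = missing_tools(["git", "python"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("git and python are both available.")
```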
Instructions:
The following example starts at the root of the D: drive and quantizes Mistral AI's Mixtral-8x7B-Instruct-v0.1.
Windows command prompt - Set up the folders and git clone llama.cpp:
- D:
- mkdir Mixtral
- git clone https://github.com/ggerganov/llama.cpp
Download the llama.cpp binaries
Assuming you want CUDA for your NVIDIA RTX GPU(s), use the links below, or grab the latest compiled executables from https://github.com/ggerganov/llama.cpp/releases
Latest version as of Feb 24, 2024:
- https://github.com/ggerganov/llama.cpp/releases/download/b2253/cudart-llama-bin-win-cu12.2.0-x64.zip
- https://github.com/ggerganov/llama.cpp/releases/download/b2253/llama-b2253-bin-win-cublas-cu12.2.0-x64.zip
Extract both .zip files directly into the llama.cpp folder you just cloned (D:\llama.cpp in this example), overwriting files when prompted.
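A quick way to check that the extraction landed in the right place is to look for the two files the later steps call. This is only a sketch: the helper name missing_files is hypothetical, and the file list comes from the convert and quantize commands in this guide, so adjust it if your llama.cpp build names things differently.

```python
from pathlib import Path

# File names taken from the convert and quantize commands used in this guide.
EXPECTED_FILES = ("convert.py", "quantize.exe")


def missing_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # D:\llama.cpp is the clone location used in this example.
    missing = missing_files(r"D:\llama.cpp")
    if missing:
        print("Missing from the llama.cpp folder:", ", ".join(missing))
    else:
        print("convert.py and quantize.exe are in place.")
```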
Download Mixtral
- Download the full-precision (unquantized) model by saving all .safetensors, .json, and .model files to D:\Mixtral:
- https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
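Clicking through every file by hand gets tedious, so here is one possible way to script the download. It assumes you have installed the huggingface_hub package (pip install huggingface_hub), which is not one of the requirements listed above, and the repository may ask you to log in or accept its terms first. The names is_wanted and download_model are made up for this sketch.

```python
from fnmatch import fnmatch

REPO_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"
# The three file types this guide says to download.
PATTERNS = ["*.safetensors", "*.json", "*.model"]


def is_wanted(filename, patterns=PATTERNS):
    """Return True if `filename` matches one of the wanted file types."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


def download_model(target_dir=r"D:\Mixtral"):
    """Download only the matching files into `target_dir`."""
    # Imported here so the rest of the script works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=target_dir,
        allow_patterns=PATTERNS,
    )


if __name__ == "__main__":
    # Hypothetical file names, only to preview what the patterns select.
    for name in ["example.safetensors", "config.json", "tokenizer.model", "README.md"]:
        print(name, "->", "download" if is_wanted(name) else "skip")
```

Calling download_model() starts the actual download. The files are large, so make sure D: has plenty of free space first.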
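Windows command prompt - Convert the model to fp16 (convert.py needs the Python packages listed in llama.cpp's requirements.txt; if they are missing, run pip install -r requirements.txt from D:\llama.cpp first):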
- D:\llama.cpp>python convert.py D:\Mixtral --outtype f16 --outfile D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin
Windows command prompt - Quantize the fp16 model to q5_k_m:
- D:\llama.cpp>quantize.exe D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.fp16.bin D:\Mixtral\Mixtral-8x7B-Instruct-v0.1.q5_k_m.gguf q5_k_m
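If you plan to repeat this for other models or quantization types, the two commands above can be wrapped in a small script. This is a sketch, not part of llama.cpp: build_commands and run_pipeline are hypothetical helper names, and the paths default to the ones used in this example. It uses full paths to convert.py and quantize.exe so it does not depend on the current folder.

```python
import subprocess
from pathlib import PureWindowsPath


def build_commands(llama_dir, model_dir, name, quant_type="q5_k_m"):
    """Build the convert and quantize command lines used in this guide."""
    llama_dir = PureWindowsPath(llama_dir)
    model_dir = PureWindowsPath(model_dir)
    fp16_file = model_dir / f"{name}.fp16.bin"
    quant_file = model_dir / f"{name}.{quant_type}.gguf"
    convert = [
        "python", str(llama_dir / "convert.py"), str(model_dir),
        "--outtype", "f16", "--outfile", str(fp16_file),
    ]
    quantize = [
        str(llama_dir / "quantize.exe"), str(fp16_file), str(quant_file), quant_type,
    ]
    return convert, quantize


def run_pipeline(llama_dir=r"D:\llama.cpp", model_dir=r"D:\Mixtral",
                 name="Mixtral-8x7B-Instruct-v0.1", quant_type="q5_k_m"):
    """Run convert, then quantize, stopping at the first failure."""
    for command in build_commands(llama_dir, model_dir, name, quant_type):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, as a dry run.
    for command in build_commands(r"D:\llama.cpp", r"D:\Mixtral",
                                  "Mixtral-8x7B-Instruct-v0.1"):
        print(" ".join(command))
```

Call run_pipeline() on the Windows machine once the dry-run output matches the commands you expect.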
That's it!
Model tree for OptimizeLLM/Mixtral-8x7B-Instruct-v0.1.q5_k_m
- Base model: mistralai/Mixtral-8x7B-v0.1
- Finetuned: mistralai/Mixtral-8x7B-Instruct-v0.1