Update README.md
README.md
CHANGED
@@ -60,4 +60,33 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Please note, building a car requires significant expertise, resources, and adherence to strict safety and regulatory standards. It is not a project that can be undertaken without extensive knowledge and experience in automotive engineering, manufacturing, and business management.

----------------------------------------------------------------------------------------------------------------------------------
</p>

### Quantization
You can reproduce the model using the following quant configs:

```python
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
from hqq.core.quantize import *

# hf_auth (your Hugging Face access token) and cache_path (a local cache
# directory) are assumed to be defined beforehand.
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
model = HQQModelForCausalLM.from_pretrained(model_id, use_auth_token=hf_auth, cache_dir=cache_path)

# Quantization settings: 4-bit attention projections, 2-bit experts
attn_params = BaseQuantizeConfig(nbits=4, group_size=64, quant_zero=True, quant_scale=True)
attn_params['scale_quant_params']['group_size'] = 256
experts_params = BaseQuantizeConfig(nbits=2, group_size=16, quant_zero=True, quant_scale=True)

# Each key is a linear-layer tag; every layer with that suffix gets the same settings
quant_config = {}
# Attention
quant_config['self_attn.q_proj'] = attn_params
quant_config['self_attn.k_proj'] = attn_params
quant_config['self_attn.v_proj'] = attn_params
quant_config['self_attn.o_proj'] = attn_params
# Experts
quant_config['block_sparse_moe.experts.w1'] = experts_params
quant_config['block_sparse_moe.experts.w2'] = experts_params
quant_config['block_sparse_moe.experts.w3'] = experts_params

# Quantize the model in place
model.quantize_model(quant_config=quant_config)
```
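
Once quantized, the model can be used like a standard `transformers` causal LM. Below is a minimal generation sketch; the tokenizer load, prompt, and `max_new_tokens` value are illustrative assumptions rather than part of the original recipe (Mixtral-Instruct expects the `[INST] ... [/INST]` prompt format):

```python
# Minimal inference sketch for the quantized model (illustrative; assumes a CUDA device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "[INST] How do I build a car? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Standard transformers generation; the quantized model keeps the same interface
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 2-bit setting on the experts targets the bulk of Mixtral's parameters, which is where most of the memory savings come from, while keeping the attention projections at 4 bits helps preserve output quality.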