mobicham committed on
Commit 4bf2205
1 Parent(s): 80be5fa

Update README.md

Files changed (1):
  1. README.md +30 -1
README.md CHANGED
@@ -60,4 +60,33 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  Please note, building a car requires significant expertise, resources, and adherence to strict safety and regulatory standards. It is not a project that can be undertaken without extensive knowledge and experience in automotive engineering, manufacturing, and business management.
 
  ----------------------------------------------------------------------------------------------------------------------------------
- </p>
+ </p>
+
+ ### Quantization
+ You can reproduce the model using the following quantization configs:
+
+ ``` python
+ from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
+ model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
+ model = HQQModelForCausalLM.from_pretrained(model_id, use_auth_token=hf_auth, cache_dir=cache_path)
+
+ # Quantization params: 4-bit attention, 2-bit experts
+ from hqq.core.quantize import BaseQuantizeConfig
+ attn_params = BaseQuantizeConfig(nbits=4, group_size=64, quant_zero=True, quant_scale=True)
+ attn_params['scale_quant_params']['group_size'] = 256
+ experts_params = BaseQuantizeConfig(nbits=2, group_size=16, quant_zero=True, quant_scale=True)
+
+ quant_config = {}
+ # Attention
+ quant_config['self_attn.q_proj'] = attn_params
+ quant_config['self_attn.k_proj'] = attn_params
+ quant_config['self_attn.v_proj'] = attn_params
+ quant_config['self_attn.o_proj'] = attn_params
+ # Experts
+ quant_config['block_sparse_moe.experts.w1'] = experts_params
+ quant_config['block_sparse_moe.experts.w2'] = experts_params
+ quant_config['block_sparse_moe.experts.w3'] = experts_params
+
+ # Quantize
+ model.quantize_model(quant_config=quant_config)
+ ```
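
A quick back-of-the-envelope check motivates the split in the config above (4-bit attention, 2-bit experts): in Mixtral-8x7B the expert weights dominate the parameter count, so they are where the aggressive quantization pays off. The dimensions below are assumptions taken from the public Mixtral config (hidden size 4096, FFN intermediate size 14336, 8 experts per layer, GQA with 8 KV heads of head dim 128), not from this commit:

```python
# Per-layer weight split in Mixtral-8x7B (dims assumed from the
# public model config: hidden 4096, FFN 14336, 8 experts, GQA with
# 8 KV heads of head_dim 128 -> k/v projection width 1024).
hidden, inter, n_experts = 4096, 14336, 8
kv_dim = 8 * 128

# q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim
attn = 2 * hidden * hidden + 2 * hidden * kv_dim
# each expert has three projections (w1, w2, w3) of size inter x hidden
experts = 3 * inter * hidden * n_experts

frac = experts / (experts + attn)
print(f"experts hold {frac:.1%} of the quantized weights per layer")  # ~97.1%
```

With roughly 97% of the per-layer weights sitting in the experts, taking them to 2-bit drives almost all of the memory savings, while the comparatively small attention projections can afford 4-bit precision.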