---
license: llama3.3
---

The original [Llama 3.3 70B Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) model, quantized to 4-bit with AutoAWQ. Follow the instructions [here](https://docs.vllm.ai/en/latest/quantization/auto_awq.html).
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'meta-llama/Llama-3.3-70B-Instruct'
quant_path = 'Llama-3.3-70B-Instruct-AWQ-4bit'

# 4-bit weights, group size 128, zero-point quantization, GEMM kernels
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, low_cpu_mem_usage=True, use_cache=False
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
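
Once saved, the quantized weights can be served with vLLM, per the docs linked above. A minimal sketch, assuming vLLM is installed and using the local `quant_path` directory from the script above (even at 4-bit, the 70B weights alone take roughly 35 GB of GPU memory, so a large GPU or tensor parallelism is still required):

```python
from vllm import LLM, SamplingParams

# quantization="awq" selects vLLM's AWQ kernels for the 4-bit weights
llm = LLM(model="Llama-3.3-70B-Instruct-AWQ-4bit", quantization="awq")

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain AWQ quantization in one sentence."], sampling_params)
print(outputs[0].outputs[0].text)
```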