chuanli-lambda's picture
Update README.md
a70257c verified
|
raw
history blame
2.45 kB
---
license: llama3.3
---
The original [Llama 3.3 70B Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) model quantized using AutoAWQ. Follow the instruction [here](https://docs.vllm.ai/en/latest/quantization/auto_awq.html).
```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
model_path = 'meta-llama/Llama-3.3-70B-Instruct'
quant_path = 'Llama-3.3-70B-Instruct-AWQ-4bit'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
# Load model
model = AutoAWQForCausalLM.from_pretrained(
model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
# Quantize
model.quantize(tokenizer, quant_config=quant_config)
# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
vLLM serve
```
vllm serve lambdalabs/Llama-3.3-70B-Instruct-AWQ-4bit \
--swap-space 16 \
--disable-log-requests \
--tokenizer meta-llama/Llama-3.3-70B-Instruct \
--tensor-parallel-size 2
```
Benchmark
```
python benchmark_serving.py \
--backend vllm \
--model lambdalabs/Llama-3.3-70B-Instruct-AWQ-4bit \
--tokenizer meta-llama/Meta-Llama-3-70B \
--dataset-name sharegpt \
--dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
--num-prompts 1000
============ Serving Benchmark Result ============
Successful requests: 902
Benchmark duration (s): 128.07
Total input tokens: 177877
Total generated tokens: 182359
Request throughput (req/s): 7.04
Output token throughput (tok/s): 1423.85
Total Token throughput (tok/s): 2812.71
---------------Time to First Token----------------
Mean TTFT (ms): 47225.59
Median TTFT (ms): 43313.95
P99 TTFT (ms): 105587.66
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 141.01
Median TPOT (ms): 148.94
P99 TPOT (ms): 174.16
---------------Inter-token Latency----------------
Mean ITL (ms): 131.55
Median ITL (ms): 150.82
P99 ITL (ms): 344.50
==================================================
```