chuanli-lambda committed a70257c (parent e841326): Update README.md
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
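Quantizing to 4 bits cuts the weight footprint to roughly a quarter of fp16, which is what makes a 70B model servable on a modest GPU count. A back-of-envelope estimate (ignoring AWQ's per-group scales/zero-points and activation memory, so real usage runs somewhat higher):

```python
# Rough weight-memory estimate for a 70B-parameter model at different precisions.
# Ignores AWQ per-group scale/zero-point overhead and activation/KV-cache memory.
PARAMS = 70e9

def weight_gib(bits_per_param: float) -> float:
    """Weight memory in GiB for a given bits-per-parameter precision."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16 = weight_gib(16)   # ~130 GiB
awq4 = weight_gib(4)    # ~33 GiB

print(f"fp16 weights: {fp16:.0f} GiB")
print(f"AWQ 4-bit weights: {awq4:.0f} GiB")
```

At roughly 33 GiB of weights (plus KV cache), the 4-bit model splits comfortably across two GPUs, which is consistent with the `--tensor-parallel-size 2` in the serve command.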

vLLM serve
```
vllm serve lambdalabs/Llama-3.3-70B-Instruct-AWQ-4bit \
    --swap-space 16 \
    --disable-log-requests \
    --tokenizer meta-llama/Llama-3.3-70B-Instruct \
    --tensor-parallel-size 2
```
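Once up, the server exposes vLLM's OpenAI-compatible HTTP API. A minimal request sketch (the `localhost:8000` endpoint is vLLM's default; adjust if you pass `--port`):

```python
import json

# Completion request body for vLLM's OpenAI-compatible /v1/completions endpoint.
# Assumes the server started above is listening on its default port 8000.
url = "http://localhost:8000/v1/completions"
payload = {
    "model": "lambdalabs/Llama-3.3-70B-Instruct-AWQ-4bit",
    "prompt": "What is AWQ quantization?",
    "max_tokens": 128,
    "temperature": 0.7,
}
body = json.dumps(payload)
print(body)

# To actually send it (requires `pip install requests`):
#   import requests
#   r = requests.post(url, data=body, headers={"Content-Type": "application/json"})
#   print(r.json()["choices"][0]["text"])
```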

Benchmark
```
python benchmark_serving.py \
    --backend vllm \
    --model lambdalabs/Llama-3.3-70B-Instruct-AWQ-4bit \
    --tokenizer meta-llama/Meta-Llama-3-70B \
    --dataset-name sharegpt \
    --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
    --num-prompts 1000

============ Serving Benchmark Result ============
Successful requests:                     902
Benchmark duration (s):                  128.07
Total input tokens:                      177877
Total generated tokens:                  182359
Request throughput (req/s):              7.04
Output token throughput (tok/s):         1423.85
Total Token throughput (tok/s):          2812.71
---------------Time to First Token----------------
Mean TTFT (ms):                          47225.59
Median TTFT (ms):                        43313.95
P99 TTFT (ms):                           105587.66
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          141.01
Median TPOT (ms):                        148.94
P99 TPOT (ms):                           174.16
---------------Inter-token Latency----------------
Mean ITL (ms):                           131.55
Median ITL (ms):                         150.82
P99 ITL (ms):                            344.50
==================================================
```
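The throughput lines in the report are just the raw totals divided by the benchmark duration, so the aggregate numbers can be cross-checked directly (values copied from the output above; small differences are rounding):

```python
# Sanity-check the reported throughputs against the raw totals above.
requests_ok = 902
duration_s = 128.07
input_tokens = 177_877
generated_tokens = 182_359

req_per_s = requests_ok / duration_s                              # ~7.04
out_tok_per_s = generated_tokens / duration_s                     # ~1423.9
total_tok_per_s = (input_tokens + generated_tokens) / duration_s  # ~2812.9

print(f"{req_per_s:.2f} req/s, {out_tok_per_s:.1f} out tok/s, "
      f"{total_tok_per_s:.1f} total tok/s")
```

Note the benchmark counts both prompt and generated tokens toward "Total Token throughput", which is why it is roughly double the output-only figure here.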