luow-amd and haoyang-amd committed
Commit 348942e
1 Parent(s): 4dd7e56

Update README.md (#6)


- Update README.md (4475161013cdda0a1b64676e3371789080163be7)


Co-authored-by: haoyanli <haoyang-amd@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -24,8 +24,9 @@ python3 quantize_quark.py \
     --output_dir Meta-Llama-3.1-405B-Instruct-FP8-KV \
     --quant_scheme w_fp8_a_fp8 \
     --kv_cache_dtype fp8 \
-    --num_calib_data 128 \
-    --model_export quark_safetensors
+    --num_calib_data 128 \
+    --model_export quark_safetensors \
+    --no_weight_matrix_merge
 
 # If model size is too large for single GPU, please use multi GPU instead.
 python3 quantize_quark.py \
@@ -33,8 +34,9 @@ python3 quantize_quark.py \
     --output_dir Meta-Llama-3.1-405B-Instruct-FP8-KV \
     --quant_scheme w_fp8_a_fp8 \
     --kv_cache_dtype fp8 \
-    --num_calib_data 128 \
+    --num_calib_data 128 \
     --model_export quark_safetensors \
+    --no_weight_matrix_merge \
     --multi_gpu
 ```
 ## Deployment
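
For reference, a minimal sketch of the multi-GPU invocation as it reads after this change, assembled from the new side of the diff above. The `--model_dir` value is an assumption standing in for whatever arguments precede `--output_dir` in the full README; the remaining flags are taken verbatim from the updated hunks.

```
# Sketch of the updated multi-GPU command; --model_dir is assumed,
# the remaining flags come from the new side of the diff.
python3 quantize_quark.py \
    --model_dir <path to Meta-Llama-3.1-405B-Instruct> \
    --output_dir Meta-Llama-3.1-405B-Instruct-FP8-KV \
    --quant_scheme w_fp8_a_fp8 \
    --kv_cache_dtype fp8 \
    --num_calib_data 128 \
    --model_export quark_safetensors \
    --no_weight_matrix_merge \
    --multi_gpu
```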