pooja-ganesh committed
Commit 73e22ec · verified · 1 Parent(s): 66d72d2

Update README.md

Files changed (1):
  1. README.md +33 -5
README.md CHANGED

# chatglm3-6b-awq-w-int4-asym-gs128-a-fp16-onnx-ryzen-strix-hybrid
- ## Introduction
  - Quantization Tool: Quark 0.6.0
  - OGA Model Builder: v0.5.1
  - Postprocess: generation of the final hybrid model
- ## Quantization Strategy
  - AWQ / Group 128 / Asymmetric / UINT4 weights / FP16 activations
  - Excluded Layers: None
```
python3 quantize_quark.py \
    --model_dir "$model" \
    --output_dir "$output_dir" \
    --quant_scheme w_uint4_per_group_asym \
    --num_calib_data 128 \
    --quant_algo awq \
    --dataset pileval_for_awq_benchmark \
    --seq_len 512 \
    --model_export quark_safetensors \
    --data_type float16 \
    --exclude_layers [] \
    --custom_mode awq
```
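For intuition, the sketch below shows what the `w_uint4_per_group_asym` scheme computes for a single weight matrix: each group of 128 values gets its own FP scale and unsigned zero point. This is a minimal NumPy illustration with made-up function names, not Quark code, and it leaves out AWQ's activation-aware scale search, which the command above applies before rounding:

```python
import numpy as np

def quantize_uint4_per_group_asym(w: np.ndarray, group_size: int = 128):
    """Asymmetric per-group UINT4 quantization of a 2-D weight matrix.

    Each row is split into groups of `group_size` values; every group gets its
    own scale and unsigned zero point, mirroring w_uint4_per_group_asym.
    """
    rows, cols = w.shape
    assert cols % group_size == 0
    g = w.reshape(rows, cols // group_size, group_size)

    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    scale = (w_max - w_min) / 15.0                # 4-bit unsigned range: 0..15
    scale = np.where(scale == 0.0, 1e-8, scale)   # guard against constant groups
    zero_point = np.clip(np.round(-w_min / scale), 0, 15)

    q = np.clip(np.round(g / scale) + zero_point, 0, 15).astype(np.uint8)
    return q.reshape(rows, cols), scale, zero_point

def dequantize(q, scale, zero_point, group_size: int = 128):
    """Reconstruct float weights from UINT4 codes, per-group scales and zero points."""
    rows, cols = q.shape
    g = q.reshape(rows, cols // group_size, group_size).astype(np.float32)
    return ((g - zero_point) * scale).reshape(rows, cols)

# Tiny demo: the reconstruction error stays within half a scale step per group.
w = np.random.randn(8, 256).astype(np.float32)
q, s, z = quantize_uint4_per_group_asym(w)
w_hat = dequantize(q, s, z)
print("max abs error:", np.abs(w - w_hat).max())
```

Keeping the scale and zero point per 128-element group bounds the rounding error locally, which is what lets the 4-bit weights track the FP16 activations closely.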
- ## OGA Model Builder
```
python builder.py \
    -i <quantized safetensors model dir> \
    -o <oga model output dir> \
    -p int4 \
    -e dml
```
- Postprocessed to generate the hybrid model
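As a quick sanity check of the builder output, the exported graph can be inspected without running it. This assumes the model builder wrote `model.onnx` (with its external weight data alongside) into the output directory; the path below is a placeholder:

```python
import onnx

# Placeholder: wherever `builder.py -o <oga model output dir>` wrote its output.
model_path = "oga_output/model.onnx"

# Skip loading the multi-GB external weights; the graph structure is enough here.
model = onnx.load(model_path, load_external_data=False)

print("opsets :", [f"{o.domain or 'ai.onnx'}:{o.version}" for o in model.opset_import])
print("inputs :", [i.name for i in model.graph.input])    # e.g. input_ids, attention_mask, past KV
print("outputs:", [o.name for o in model.graph.output])   # e.g. logits, present KV
```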
- ## Quick Start
For a quick start, refer to AMD [RyzenAI-SW-EA](https://account.amd.com/en/member/ryzenai-sw-ea.html).
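Once the Ryzen AI environment from the link above is installed, text generation against the exported model goes through onnxruntime-genai. The sketch below is illustrative only: the model directory is a placeholder and the calls shown match the 0.5.x line of onnxruntime-genai (newer releases replace `params.input_ids` and `compute_logits()` with `generator.append_tokens`):

```python
import onnxruntime_genai as og

# Placeholder: directory containing model.onnx and genai_config.json produced above.
model_dir = "chatglm3-6b-hybrid-onnx"

model = og.Model(model_dir)
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

prompt = "What is AWQ quantization?"
input_ids = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = input_ids   # 0.5.x-style API; newer releases use generator.append_tokens

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # Stream the decoded tokens as they are produced.
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```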
#### Evaluation scores
The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. The perplexity measured for a prompt length of 2K is 29.7801.
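For context on how such a score is obtained, the sketch below runs a standard chunked-perplexity loop over wikitext-2-raw-v1 with 2048-token windows, matching the stated prompt length. It evaluates the FP16 Hugging Face checkpoint with PyTorch purely to illustrate the metric; it is not the harness used to score the ONNX hybrid model:

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustration only: the FP16 baseline checkpoint, not the quantized ONNX model.
model_id = "THUDM/chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto"
)
model.eval()

# Tokenize the whole test split as one long stream, then score 2048-token chunks.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids
seq_len, chunk = ids.shape[1], 2048

nll_sum, token_count = 0.0, 0
with torch.no_grad():
    for start in range(0, seq_len - chunk, chunk):
        batch = ids[:, start : start + chunk].to(model.device)
        # With labels == input_ids, the model returns the mean next-token NLL.
        loss = model(batch, labels=batch).loss
        nll_sum += loss.float().item() * (chunk - 1)
        token_count += chunk - 1

print("perplexity:", math.exp(nll_sum / token_count))
```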

#### License
Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.