amd/Qwen1.5-7B-Chat-awq-g128-int4-asym-bf16-onnx-ryzen-strix

Text Generation

create_model #1

by haoyang-amd - opened Oct 22

base: refs/heads/main ← from: refs/pr/1

This PR is in draft mode

Files changed (4)

README.md +4 -4
config.json +0 -27
model.data +2 -2
model.onnx +2 -2

README.md CHANGED

@@ -12,14 +12,14 @@ tags:
 # Qwen/Qwen1.5-7B-Chat
 - ## Introduction
-  This model was created using Quark Quantization, followed by OGA Model Builder, and finalized with post-processing for NPU deployment.
 - ## Quantization Strategy
-  - AWQ / Group 128 / Asymmetric / BF16 activations / UINT4 Weights
 - ## Quick Start
-  For quickstart, refer to npu-llm-artifacts_1.3.0.zip available in [RyzenAI-SW-EA](https://account.amd.com/en/member/ryzenai-sw-ea.html)
 #### Evaluation scores
-The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. Perplexity score measured for prompt length 2k is 12.02751.
 #### License

 # Qwen/Qwen1.5-7B-Chat
 - ## Introduction
+  This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from Pile dataset, and applying [onnxruntime-genai model builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models) to convert to ONNX.
 - ## Quantization Strategy
+  - AWQ / Group 128 / Asymmetric / FP32 activations
 - ## Quick Start
+For quickstart, refer to AMD [RyzenAI-SW-EA](https://account.amd.com/en/member/ryzenai-sw-ea.html)
 #### Evaluation scores
+The perplexity measurement is run on the wikitext-2-raw-v1 (raw data) dataset provided by Hugging Face. Perplexity score measured for prompt length 2k is 12.02787.
 #### License
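
The "Quantization Strategy" line in the README (group size 128, asymmetric, 4-bit weights) can be illustrated with a small sketch. This is not Quark's implementation, and it leaves out the activation-aware scaling that AWQ performs before rounding; it only shows, under those assumptions, what asymmetric group-wise UINT4 rounding looks like. The function names and the sample weights are hypothetical.

```python
import math

GROUP_SIZE = 128  # "Group 128" in the README
QMAX = 15         # largest unsigned 4-bit code (0..15)


def quantize_group(weights):
    """Asymmetric 4-bit quantization of one group; returns (codes, scale, zero_point)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(0.0, min(weights))
    hi = max(0.0, max(weights))
    scale = (hi - lo) / QMAX
    if scale == 0.0:  # all-zero group: any positive scale works
        scale = 1.0
    zero_point = round(-lo / scale)  # integer code that maps back to 0.0
    codes = [min(QMAX, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map 4-bit codes back to approximate float weights."""
    return [(c - zero_point) * scale for c in codes]


def quantize(weights, group_size=GROUP_SIZE):
    """Quantize a flat weight list group by group, one (scale, zero_point) per group."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


# Hypothetical weights: two groups of 128 values each.
weights = [0.1 * math.sin(i) for i in range(256)]
groups = quantize(weights)
restored = [w for g in groups for w in dequantize_group(*g)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(len(groups), max_error)
```

Each group stores its own scale and zero point, so the rounding error of any weight is bounded by half of that group's step size. "Asymmetric" refers to the zero point: the 16 codes cover the range from the group minimum to the group maximum rather than a range centred on zero.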

config.json DELETED

@@ -1,27 +0,0 @@
-{
-  "architectures": [
-    "Qwen2ForCausalLM"
-  ],
-  "attention_dropout": 0.0,
-  "bos_token_id": 151643,
-  "eos_token_id": 151645,
-  "hidden_act": "silu",
-  "hidden_size": 4096,
-  "initializer_range": 0.02,
-  "intermediate_size": 11008,
-  "max_position_embeddings": 32768,
-  "max_window_layers": 28,
-  "model_type": "qwen2",
-  "num_attention_heads": 32,
-  "num_hidden_layers": 32,
-  "num_key_value_heads": 32,
-  "rms_norm_eps": 1e-06,
-  "rope_theta": 1000000.0,
-  "sliding_window": 32768,
-  "tie_word_embeddings": false,
-  "torch_dtype": "bfloat16",
-  "transformers_version": "4.37.0",
-  "use_cache": true,
-  "use_sliding_window": false,
-  "vocab_size": 151936
-}
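
As a reading aid for the deleted config.json, the sketch below derives two quantities from fields shown in the diff above. It assumes the common convention that the per-head dimension is `hidden_size / num_attention_heads`; the variable names are illustrative only.

```python
import json

# Subset of the deleted config.json, copied from the diff above.
config = json.loads("""
{
  "hidden_size": 4096,
  "num_attention_heads": 32,
  "num_key_value_heads": 32,
  "num_hidden_layers": 32
}
""")

# Assumed convention: each attention head gets an equal slice of the hidden size.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Grouped-query attention would use fewer key/value heads than query heads.
uses_gqa = config["num_key_value_heads"] < config["num_attention_heads"]

print(head_dim, uses_gqa)
```

With these values every query head has its own key/value head, so the configuration describes plain multi-head attention.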

model.data CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:88f9a71a98568c736d8aa98556351392ed8de6097accce977579aad832ff4091
-size 5136295424

 version https://git-lfs.github.com/spec/v1
+oid sha256:c3179365d3572c280dfe5b095dbc9f73a25c21e3d45e5f7d441a516d4eaa0b51
+size 5062793216

model.onnx CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:25f6e38e5f02bc0dba0ed8a15dca71f0f0cc031e0b9f8dd953bd7a1568b744d5
-size 244052

 version https://git-lfs.github.com/spec/v1
+oid sha256:5aeddbf135ebcdf7ae19d7b1c94d90989c9e47f315aee89a3a38a421364633f8
+size 309307
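
The model.data and model.onnx entries above are Git LFS pointer files, so the diff shows only the content hash and byte size of each binary. A minimal sketch, assuming the three-line pointer layout shown in the diff, parses the pointers and reports how much each file grew or shrank. The helper name `parse_lfs_pointer` is hypothetical; the oid and size values are copied from the diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer ("key value" per line) into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def pointer(oid, size):
    """Build pointer text in the layout shown in the diff."""
    return f"version https://git-lfs.github.com/spec/v1\noid sha256:{oid}\nsize {size}\n"


# (old, new) pointers per file, values taken from the diff above.
changes = {
    "model.data": (
        pointer("88f9a71a98568c736d8aa98556351392ed8de6097accce977579aad832ff4091", 5136295424),
        pointer("c3179365d3572c280dfe5b095dbc9f73a25c21e3d45e5f7d441a516d4eaa0b51", 5062793216),
    ),
    "model.onnx": (
        pointer("25f6e38e5f02bc0dba0ed8a15dca71f0f0cc031e0b9f8dd953bd7a1568b744d5", 244052),
        pointer("5aeddbf135ebcdf7ae19d7b1c94d90989c9e47f315aee89a3a38a421364633f8", 309307),
    ),
}

deltas = {}
for name, (old_text, new_text) in changes.items():
    old, new = parse_lfs_pointer(old_text), parse_lfs_pointer(new_text)
    deltas[name] = new["size"] - old["size"]
    print(f"{name}: {deltas[name]:+d} bytes")
# model.data: -73502208 bytes
# model.onnx: +65255 bytes
```

The external weight file shrinks by roughly 70 MiB while the ONNX graph file grows by about 64 KiB; the pointers alone do not say why.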