Qwen3-VL-4B-Instruct-per-grp-quant
Introduction
This model was quantized using amd_quark-0.11.
Quantization Strategy
- Quantized Layers: All linear layers
- Weights: uint4 asymmetric per-group quantization with group_size=128.
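As a rough illustration of the strategy above, asymmetric per-group quantization maps each group of 128 weights to the uint4 range [0, 15] with its own scale and zero point. The sketch below is not the Quark implementation (which lives in the quantization script); the function names and NumPy layout are illustrative only.

```python
import numpy as np

def quantize_per_group_uint4(w, group_size=128):
    """Illustrative asymmetric uint4 per-group quantization.

    w: 1-D float array whose length is a multiple of group_size.
    Returns (q, scale, zero_point) with one scale/zero point per group.
    """
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    # uint4 covers the 16 integers 0..15
    scale = (w_max - w_min) / 15.0
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    zero_point = np.clip(np.round(-w_min / scale), 0, 15)
    q = np.clip(np.round(groups / scale) + zero_point, 0, 15).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the quantized groups."""
    return (q.astype(np.float32) - zero_point) * scale
```

Because scale and zero point are stored per group of 128 values rather than per tensor, outliers in one group do not inflate the quantization step for the rest of the layer.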
Quick Start
- Download the Qwen3-VL-4B-Instruct model.
- Run the quantization script in the example folder using the following command line:
python run_qwen3_vl_4b_quant_model.py
Evaluation
Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py. The evaluation results are obtained in pseudo-quantization mode, which may differ slightly from actual quantized inference accuracy; they are provided for reference only.
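For reference, perplexity is conventionally the exponential of the average next-token cross-entropy over a held-out corpus such as wikitext2. The minimal sketch below shows that convention on precomputed logits; it is not the algorithm in quantize_quark.py, and the function name is illustrative.

```python
import numpy as np

def perplexity(logits, token_ids):
    """PPL = exp(mean next-token cross-entropy).

    logits: (T, V) array of next-token logits for a sequence of T tokens.
    token_ids: (T,) array of the actual token ids.
    """
    shift_logits = logits[:-1]        # position t predicts token t+1
    shift_labels = token_ids[1:]
    # numerically stable log-softmax over the vocabulary axis
    m = shift_logits.max(axis=-1, keepdims=True)
    log_probs = shift_logits - m - np.log(
        np.exp(shift_logits - m).sum(axis=-1, keepdims=True)
    )
    nll = -log_probs[np.arange(len(shift_labels)), shift_labels]
    return float(np.exp(nll.mean()))
```

Lower is better: a PPL increase after quantization (as in the table below, 10.54 to 11.66) indicates a modest accuracy loss from the uint4 weights.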
Evaluation scores
| Benchmark | Qwen3-VL-4B-Instruct | Qwen3-VL-4B-Instruct-per-grp-quant (this model) |
| --- | --- | --- |
| Perplexity-wikitext2 | 10.5369 | 11.6644 |
License
Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
Base model
Qwen/Qwen3-VL-4B-Instruct