Qwen3-VL-4B-Instruct-per-grp-quant

  • Introduction

    This model was quantized with amd_quark-0.11.
  • Quantization Strategy

    • Quantized layers: all linear layers
    • Weights: uint4, asymmetric, per-group with group_size=128.
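To make the strategy concrete, the sketch below shows what uint4 asymmetric per-group quantization with group_size=128 does to a weight tensor: each contiguous group of 128 values gets its own scale and zero point derived from the group's min/max. This is an illustrative NumPy sketch of the general technique, not Quark's actual implementation; function names are hypothetical.

```python
import numpy as np

def quantize_per_group_uint4(w, group_size=128):
    """Illustrative uint4 asymmetric per-group quantization.

    Assumes w.size is divisible by group_size. Not Quark's code.
    """
    g = w.reshape(-1, group_size)                  # one row per group
    w_min = g.min(axis=1, keepdims=True)
    w_max = g.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0                 # uint4 range is 0..15
    scale = np.where(scale == 0, 1.0, scale)       # guard constant groups
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(g / scale + zero_point), 0, 15).astype(np.uint8)
    return q.reshape(w.shape), scale, zero_point

def dequantize_per_group(q, scale, zero_point, group_size=128):
    g = q.reshape(-1, group_size).astype(np.float32)
    return ((g - zero_point) * scale).reshape(q.shape)

w = np.random.randn(256, 128).astype(np.float32)
q, s, z = quantize_per_group_uint4(w)
w_hat = dequantize_per_group(q, s, z)
print(np.abs(w - w_hat).max())  # per-element error is bounded by scale/2
```

Because each group's scale adapts to that group's range, per-group quantization typically loses less accuracy than a single per-tensor scale, at the cost of storing one scale and zero point per 128 weights.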
  • Quick Start

  1. Download the Qwen3-VL-4B-Instruct model.
  2. Run the quantization script in the example folder with the following command:
    python run_qwen3_vl_4b_quant_model.py
    

Evaluation

Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py. The quantization evaluation is conducted in pseudo-quantization mode, which may differ slightly from actual quantized inference accuracy; these results are provided for reference only.
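For readers unfamiliar with the metric: perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better and a rise after quantization (as in the table below) reflects accuracy loss. The sketch below shows the standard computation from raw logits; it is a generic illustration, not the exact algorithm in quantize_quark.py.

```python
import numpy as np

def token_nll(logits, targets):
    """Per-token negative log-likelihood from raw logits.

    logits: (T, V) array; targets: (T,) int array of next-token ids.
    """
    # log-softmax computed stably by subtracting the row max
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]

def perplexity(logits, targets):
    """PPL = exp(mean NLL per token); lower is better."""
    return float(np.exp(token_nll(logits, targets).mean()))

# Toy check: a model that is uniform over V tokens has PPL exactly V.
V, T = 8, 100
logits = np.zeros((T, V))
targets = np.random.randint(0, V, size=T)
print(perplexity(logits, targets))  # prints 8.0
```

In practice the logits come from running the (pseudo-quantized) model over a held-out corpus such as wikitext2 with a sliding context window, then averaging the NLL over all scored tokens.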

Evaluation scores

| Benchmark | Qwen3-VL-4B-Instruct | Qwen3-VL-4B-Instruct-per-grp-quant (this model) |
|---|---|---|
| Perplexity (wikitext2) | 10.5369 | 11.6644 |

License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
