Qwen3-VL-4B-Instruct-per-grp-quant

  • Introduction

    This model was quantized with amd_quark-0.11.
  • Quantization Strategy

    • Quantized layers: all linear layers
    • Weights: uint4, asymmetric, per-group with group_size=128.
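To make the strategy concrete, the sketch below shows what uint4 asymmetric per-group quantization with group_size=128 does to a weight tensor: each contiguous group of 128 values gets its own scale and zero point derived from the group's min/max. This is an illustrative NumPy sketch of the general technique, not Quark's actual implementation; function names are hypothetical.

```python
import numpy as np

def quantize_per_group_uint4(w, group_size=128):
    """Illustrative uint4 asymmetric per-group quantization.

    Assumes w.size is divisible by group_size. Not Quark's code.
    """
    g = w.reshape(-1, group_size)                  # one row per group
    w_min = g.min(axis=1, keepdims=True)
    w_max = g.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0                 # uint4 range is 0..15
    scale = np.where(scale == 0, 1.0, scale)       # guard constant groups
    zero_point = np.round(-w_min / scale)
    q = np.clip(np.round(g / scale + zero_point), 0, 15).astype(np.uint8)
    return q.reshape(w.shape), scale, zero_point

def dequantize_per_group(q, scale, zero_point, group_size=128):
    g = q.reshape(-1, group_size).astype(np.float32)
    return ((g - zero_point) * scale).reshape(q.shape)

w = np.random.randn(256, 128).astype(np.float32)
q, s, z = quantize_per_group_uint4(w)
w_hat = dequantize_per_group(q, s, z)
print(np.abs(w - w_hat).max())  # per-element error is bounded by scale/2
```

Because each group's scale adapts to that group's range, per-group quantization typically loses less accuracy than a single per-tensor scale, at the cost of storing one scale and zero point per 128 weights.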
  • Quick Start

  1. Download the Qwen3-VL-4B-Instruct model.
  2. Run the quantization script in the example folder with the following command:
    python run_qwen3_vl_4b_quant_model.py
    

Evaluation

Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in quantize_quark.py. The quantization evaluation is conducted in pseudo-quantization mode, which may differ slightly from actual quantized inference accuracy; these results are provided for reference only.
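For readers unfamiliar with the metric: perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better and a rise after quantization (as in the table below) reflects accuracy loss. The sketch below shows the standard computation from raw logits; it is a generic illustration, not the exact algorithm in quantize_quark.py.

```python
import numpy as np

def token_nll(logits, targets):
    """Per-token negative log-likelihood from raw logits.

    logits: (T, V) array; targets: (T,) int array of next-token ids.
    """
    # log-softmax computed stably by subtracting the row max
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]

def perplexity(logits, targets):
    """PPL = exp(mean NLL per token); lower is better."""
    return float(np.exp(token_nll(logits, targets).mean()))

# Toy check: a model that is uniform over V tokens has PPL exactly V.
V, T = 8, 100
logits = np.zeros((T, V))
targets = np.random.randint(0, V, size=T)
print(perplexity(logits, targets))  # prints 8.0
```

In practice the logits come from running the (pseudo-quantized) model over a held-out corpus such as wikitext2 with a sliding context window, then averaging the NLL over all scored tokens.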

Evaluation scores

| Benchmark | Qwen3-VL-4B-Instruct | Qwen3-VL-4B-Instruct-per-grp-quant (this model) |
|---|---|---|
| Perplexity (wikitext2) | 10.5369 | 11.6644 |

License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
