
# Qwen2-VL-2B-Instruct 4-bit Quantized

This is a 4-bit quantized version of the Qwen2-VL-2B-Instruct model.

## Model Description

  • Original Model: Qwen/Qwen2-VL-2B-Instruct
  • Quantization: 4-bit quantization using bitsandbytes
  • Usage: The 4-bit weights cut memory use to roughly a quarter of the fp16 footprint, with only a modest accuracy trade-off
  • License: Same as the original model
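As a rough illustration of the memory saving (a back-of-envelope sketch assuming ~2 billion parameters, ignoring quantization constants, activations, and the KV cache):

```python
# Hypothetical estimate: weight memory for a ~2B-parameter model
params = 2.0e9

fp16_gb = params * 2 / 1024**3    # fp16: 2 bytes per parameter
nf4_gb = params * 0.5 / 1024**3   # nf4: 4 bits (0.5 bytes) per parameter

# fp16 weights need ~3.7 GiB; nf4 weights need ~0.9 GiB
```

In practice the saving is slightly smaller, since the nf4 quantization constants and any layers kept in higher precision add overhead.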

## Usage

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch

# Load the quantized checkpoint; device_map="auto" places layers on available devices
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "ksukrit/qwen2-vl-2b-4bit",
    trust_remote_code=True,
    device_map="auto",
)

# Qwen2-VL is a vision-language model, so use the processor
# (tokenizer + image processor) rather than a tokenizer alone
processor = AutoProcessor.from_pretrained(
    "ksukrit/qwen2-vl-2b-4bit",
    trust_remote_code=True,
)
```
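For inference, prompts follow the Qwen2-VL chat message format, where a user turn can mix image and text parts. A minimal sketch (the image URL is a hypothetical placeholder):

```python
# A single user turn containing one image and one text instruction,
# in the structure expected by Qwen2-VL chat templates
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/demo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
```

This list is then passed to `processor.apply_chat_template(messages, add_generation_prompt=True)` to build the prompt string before tokenization and generation.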

## Quantization Details

  • Quantization Method: bitsandbytes 4-bit quantization
  • Compute dtype: float16
  • Double quantization: enabled
  • Quantization type: nf4
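The settings above correspond to a `BitsAndBytesConfig` along these lines (a sketch reconstructed from the listed details; the exact configuration used for the upload is not stated):

```python
import torch
from transformers import BitsAndBytesConfig

# Reconstructed from the quantization details above (assumed, not the author's exact config)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # normalized float 4-bit
    bnb_4bit_use_double_quant=True,     # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.float16,
)
```

A config like this would be passed as `quantization_config=bnb_config` to `from_pretrained` when quantizing the original Qwen/Qwen2-VL-2B-Instruct weights.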