This is an FP8-Dynamic quantization of Qwen2.5-VL-7B-Instruct, created with llm-compressor. It fits on GPUs with 16 GB of VRAM. First update vLLM and Transformers (the specifiers are quoted so the shell does not interpret `>=` as a redirect):

```shell
pip install "vllm>=0.7.2"
pip install "transformers>=4.49"
```

Then serve the model with:

```shell
vllm serve leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic --trust-remote-code
```
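By default, `vllm serve` exposes an OpenAI-compatible API on port 8000. A minimal sketch of a vision chat request payload for that endpoint is below; the image URL and prompt are placeholders for illustration, and the actual send step is shown as a comment since it requires the server to be running.

```python
import json

# Build an OpenAI-compatible chat completion request for the served model.
# The image URL and prompt below are hypothetical examples.
payload = {
    "model": "leon-se/Qwen2.5-VL-7B-Instruct-FP8-Dynamic",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    "max_tokens": 128,
}

# With the server running, POST this to the default endpoint, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d "$(python this_script.py)"
print(json.dumps(payload))
```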
Base model: Qwen/Qwen2.5-VL-7B-Instruct