VibeVoice-Large-Q8 - Selective 8-bit Quantization
Why This Model Is Different
If you've tried other 8-bit quantized VibeVoice models, you probably got nothing but static noise. This one actually works.
The secret? Selective quantization: I only quantized the language model (the most robust part), while keeping audio-critical components (diffusion head, VAE, connectors) at full precision.
Results
- Perfect audio, identical to the original model
- 11.6 GB instead of 18.7 GB (-38%)
- Uses ~12 GB VRAM instead of 20 GB
- Works on 12 GB GPUs (RTX 3060, 4070 Ti, etc.)
The Problem with Other 8-bit Models
Most 8-bit models you'll find online quantize everything aggressively. Result: audio components get quantized → numerical errors propagate → audio = pure noise.
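To see how 8-bit rounding can destroy small values, here is a toy sketch of symmetric absmax int8 quantization in plain Python. It is an illustration only, not the scheme used by bitsandbytes or by any specific VibeVoice checkpoint; the function names and sample weights are made up for the example. A single large outlier stretches the shared scale so far that the small values all round to zero.

```python
def quantize_absmax_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights: three small values and one outlier.
weights = [0.01, -0.02, 0.03, 8.0]
codes, scale = quantize_absmax_int8(weights)
restored = dequantize(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print("codes:   ", codes)
print("restored:", restored)
print("max abs error:", max(errors))
```

Each value can be off by up to half a quantization step. In a language model that is often tolerable; in components that produce the waveform, the same error is more likely to be audible.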
The Solution: Selective Quantization
I only quantized what can be safely quantized without losing quality.
Result: 52% of parameters quantized, 48% at full precision = perfect audio quality.
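A minimal sketch of how a selective setup like this could be expressed with the bitsandbytes integration in Transformers. It is not necessarily how this model was produced. The keywords (`diffusion_head`, `vae`, `connector`) and the sample module names are assumptions for illustration and are not taken from the actual checkpoint; `llm_int8_skip_modules` is the `BitsAndBytesConfig` option that leaves the listed modules unquantized.

```python
# Keywords for audio-critical parts to keep at full precision.
# Hypothetical names: inspect model.named_modules() for the real ones.
AUDIO_CRITICAL_KEYWORDS = ("diffusion_head", "vae", "connector")


def split_modules(module_names, keep_keywords=AUDIO_CRITICAL_KEYWORDS):
    """Return (to_quantize, to_keep) based on keyword matches in module names."""
    to_quantize, to_keep = [], []
    for name in module_names:
        if any(keyword in name for keyword in keep_keywords):
            to_keep.append(name)
        else:
            to_quantize.append(name)
    return to_quantize, to_keep


def build_quant_config(skip_modules):
    """Build an 8-bit config that skips the given modules (needs transformers)."""
    from transformers import BitsAndBytesConfig  # imported lazily

    return BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_skip_modules=list(skip_modules),
    )


# Made-up module names, for illustration only.
example_names = [
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.down_proj",
    "diffusion_head.layers.0.linear",
    "acoustic_vae.decoder.conv1",
    "acoustic_connector.fc1",
]
to_quantize, to_keep = split_modules(example_names)
print("quantize:", to_quantize)
print("keep:    ", to_keep)
```

Under these assumptions, passing `build_quant_config(to_keep)` as `quantization_config` to `from_pretrained` would quantize only the language-model layers.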
Quick Comparison
Model | Size | Audio Quality | Status |
---|---|---|---|
Original VibeVoice | 18.7 GB | ⭐⭐⭐⭐⭐ | Full precision |
Other 8-bit models | 10.6 GB | Noise | Don't work |
This model | 11.6 GB | ⭐⭐⭐⭐⭐ | Perfect |
+1.0 GB vs other 8-bit models = perfect audio instead of noise. Worth it.
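The size figures quoted above can be checked with a few lines of arithmetic. The sizes come from this card; the helper name is only for illustration.

```python
def percent_reduction(original_gb, reduced_gb):
    """Percentage by which reduced_gb is smaller than original_gb."""
    return (original_gb - reduced_gb) / original_gb * 100


ORIGINAL_GB = 18.7    # full-precision VibeVoice
THIS_MODEL_GB = 11.6  # this selectively quantized model
OTHER_8BIT_GB = 10.6  # other 8-bit models

saving = percent_reduction(ORIGINAL_GB, THIS_MODEL_GB)
extra = THIS_MODEL_GB - OTHER_8BIT_GB

print(f"Size reduction vs. original: {saving:.0f}%")
print(f"Extra size vs. other 8-bit models: {extra:.1f} GB")
```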
How to Use It
With Transformers
import scipy.io.wavfile as wavfile
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

# Load the model and processor; device_map="auto" places the weights on the GPU
model = AutoModelForCausalLM.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained(
    "FabioSarracino/VibeVoice-Large-Q8",
    trust_remote_code=True,
)

# Generate audio
text = "Hello, this is VibeVoice speaking."
inputs = processor(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=None)

# Save as a 24 kHz WAV file
# Cast to float32 first: NumPy cannot represent bfloat16 tensors
audio = output.speech_outputs[0].float().cpu().numpy()
wavfile.write("output.wav", 24000, audio)
With ComfyUI (recommended)
Install the custom node:
cd ComfyUI/custom_nodes
git clone https://github.com/Enemyx-net/VibeVoice-ComfyUI
Download this model to ComfyUI/models/vibevoice/
Restart ComfyUI and use it normally!
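One way to fetch the weights into that folder is `snapshot_download` from the `huggingface_hub` package. This is a sketch, not the node's documented procedure: the `VibeVoice-Large-Q8` subfolder name and the `comfy_root` argument are assumptions, so check the custom node's README for the layout it expects.

```python
from pathlib import Path

REPO_ID = "FabioSarracino/VibeVoice-Large-Q8"


def model_dir(comfy_root):
    """Target folder for the model inside a ComfyUI install (assumed layout)."""
    return Path(comfy_root) / "models" / "vibevoice" / REPO_ID.split("/")[-1]


def download_model(comfy_root):
    """Download the full repository into the target folder."""
    from huggingface_hub import snapshot_download  # imported lazily

    target = model_dir(comfy_root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target


# Show where the files would go, without downloading anything.
print(model_dir("ComfyUI"))
```

Calling `download_model("ComfyUI")` from the directory that contains your ComfyUI install would then pull the files, which take about 11.6 GB of disk space.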
System Requirements
Minimum
- VRAM: 12 GB
- RAM: 16 GB
- GPU: NVIDIA with CUDA (required)
- Storage: 11 GB
Recommended
- VRAM: 16+ GB
- RAM: 32 GB
- GPU: RTX 3090/4090, A5000 or better
Not supported: CPU, Apple Silicon (MPS), AMD GPUs
Limitations
- Requires NVIDIA GPU with CUDA - won't work on CPU or Apple Silicon
- Inference only - don't use for fine-tuning
- Requires:
  - transformers>=4.51.3
  - bitsandbytes>=0.43.0
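A small standard-library sketch for checking the installed versions against the minimums listed above. The version parsing is deliberately simplified (numeric components only) and the helper names are illustrative; for strict comparisons, a dedicated version parser would be more robust.

```python
from importlib import metadata

MINIMUM_VERSIONS = {"transformers": "4.51.3", "bitsandbytes": "0.43.0"}


def version_tuple(version):
    """Turn '4.51.3' into (4, 51, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_satisfied(installed, minimum):
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = (None, False)
            continue
        report[package] = (installed, is_satisfied(installed, minimum))
    return report


for package, (installed, ok) in check_requirements().items():
    print(f"{package}: installed={installed}, ok={ok}")
```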
When to Use This Model
Use this 8-bit model if:
- You have 12-16 GB VRAM
- You want maximum quality with reduced size
- You need a production-ready model
- You want the best size/quality balance
Use full precision (18.7 GB) if:
- You have ample VRAM (24+ GB)
- You're doing research requiring absolute precision
Use 4-bit NF4 (~6.6 GB) if:
- You only have 8-10 GB VRAM
- You can accept a small quality trade-off
Troubleshooting
"OutOfMemoryError" during loading
- Close other GPU applications
- Use device_map="auto"
- Reduce batch size to 1
"BitsAndBytes not found"
pip install "bitsandbytes>=0.43.0"
Audio sounds distorted
This shouldn't happen! If it does:
- Verify you downloaded the correct model
- Update transformers: pip install --upgrade transformers
- Check CUDA: torch.cuda.is_available() should return True
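The CUDA check can be combined with a VRAM check in one short script. This is a sketch under stated assumptions: the 12 GB threshold is the minimum from this card, interpreted here as decimal gigabytes, and the helper names are illustrative. `torch.cuda.is_available()` and `torch.cuda.get_device_properties()` are standard PyTorch calls.

```python
MIN_VRAM_GB = 12  # minimum from the system requirements in this card


def has_enough_vram(total_bytes, min_gb=MIN_VRAM_GB):
    """Compare total GPU memory in bytes against a threshold in decimal GB."""
    return total_bytes / 1e9 >= min_gb


def describe_gpu():
    """Report CUDA availability and the VRAM of device 0 (needs torch)."""
    import torch  # imported lazily

    if not torch.cuda.is_available():
        return "CUDA is not available"
    props = torch.cuda.get_device_properties(0)
    verdict = "enough" if has_enough_vram(props.total_memory) else "below minimum"
    return f"{props.name}: {props.total_memory / 1e9:.1f} GB VRAM ({verdict})"
```

On a machine with PyTorch installed, `print(describe_gpu())` gives a one-line summary to include in a bug report.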
Citation
@misc{vibevoice-q8-2025,
title={VibeVoice-Large-Q8: Selective 8-bit Quantization for Audio Quality},
author={Fabio Sarracino},
year={2025},
url={https://huggingface.co/FabioSarracino/VibeVoice-Large-Q8}
}
Original Model
@misc{vibevoice2024,
title={VibeVoice: High-Quality Text-to-Speech with Large Language Models},
author={Microsoft Research},
year={2024},
url={https://github.com/microsoft/VibeVoice}
}
Related Resources
- Original Model - Full precision base
- ComfyUI Node - ComfyUI integration
License
MIT License.
Support
- Issues: GitHub Issues
- Questions: HuggingFace Discussions
If this model helped you, leave a ⭐ on GitHub!
Created by Fabio Sarracino
The first 8-bit VibeVoice model that actually works