INT8 LLMs for vLLM - a neuralmagic Collection

Updated Sep 26

Accurate INT8 quantized models by Neural Magic, ready for use with vLLM!

neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8
Text Generation • Updated Oct 10 • 5.99k • 13

neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8
Text Generation • Updated Oct 23 • 7.76k • 12

neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8
Text Generation • Updated Oct 10 • 487 • 2

neuralmagic/Phi-3-medium-128k-instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 477 • 2

neuralmagic/Phi-3-mini-128k-instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 530

neuralmagic/gemma-2-9b-it-quantized.w8a8
Text Generation • Updated Oct 9 • 389 • 2

neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w8a16
Text Generation • Updated Jul 18 • 451 • 3

neuralmagic/Qwen2-72B-Instruct-quantized.w8a16
Text Generation • Updated Jul 18 • 420 • 1

neuralmagic/Llama-2-7b-chat-quantized.w8a16
Text Generation • Updated Jul 18 • 420

neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a16
Text Generation • Updated Jul 18 • 31k • 2

neuralmagic/Qwen2-0.5B-Instruct-quantized.w8a16
Text Generation • Updated Jul 18 • 16

neuralmagic/Qwen2-1.5B-Instruct-quantized.w8a16
Text Generation • Updated Jul 18 • 8

neuralmagic/Qwen2-7B-Instruct-quantized.w8a16
Text Generation • Updated Jul 18 • 90

neuralmagic/Mistral-7B-Instruct-v0.3-quantized.w8a16
Text Generation • Updated Jul 18 • 714

neuralmagic/Phi-3-mini-128k-instruct-quantized.w8a16
Text Generation • Updated Oct 9 • 16

neuralmagic/Phi-3-medium-128k-instruct-quantized.w8a16
Text Generation • Updated Oct 9 • 902 • 2

neuralmagic/Meta-Llama-3-8B-Instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 483 • 2

neuralmagic/Llama-2-7b-chat-quantized.w8a8
Text Generation • Updated Oct 9 • 556 • 1

neuralmagic/Qwen2-0.5B-Instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 44

neuralmagic/Qwen2-1.5B-Instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 1.62k

neuralmagic/Qwen2-7B-Instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 640

neuralmagic/Qwen2-72B-Instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 427 • 1

neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 26

neuralmagic/Mistral-7B-Instruct-v0.3-quantized.w8a8
Text Generation • Updated Oct 9 • 386

neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a16
Text Generation • Updated Oct 23 • 5.88k • 9

neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a16
Text Generation • Updated Oct 9 • 1.11k • 3

neuralmagic/Meta-Llama-3.1-8B-quantized.w8a16
Text Generation • Updated Oct 9 • 383 • 1

neuralmagic/Meta-Llama-3.1-8B-quantized.w8a8
Text Generation • Updated Oct 23 • 590 • 1

neuralmagic/starcoder2-7b-quantized.w8a16
Text Generation • Updated Oct 9 • 24

neuralmagic/starcoder2-15b-quantized.w8a16
Text Generation • Updated Oct 9 • 383

neuralmagic/starcoder2-3b-quantized.w8a16
Text Generation • Updated Oct 9 • 27

neuralmagic/starcoder2-15b-quantized.w8a8
Text Generation • Updated Oct 9 • 12

neuralmagic/starcoder2-7b-quantized.w8a8
Text Generation • Updated Oct 9 • 31

neuralmagic/starcoder2-3b-quantized.w8a8
Text Generation • Updated Oct 9 • 15

neuralmagic/gemma-2-2b-it-quantized.w8a16
Text Generation • Updated Oct 9 • 53 • 1

neuralmagic/Phi-3-small-128k-instruct-quantized.w8a16
Text Generation • Updated Oct 9 • 417

neuralmagic/SmolLM-1.7B-Instruct-quantized.w8a16
Text Generation • Updated Oct 9 • 20

neuralmagic/gemma-2-2b-quantized.w8a16
Text Generation • Updated Oct 9 • 48

neuralmagic/gemma-2-9b-it-quantized.w8a16
Text Generation • Updated Oct 9 • 771 • 1

neuralmagic/gemma-2-2b-it-quantized.w8a8
Text Generation • Updated Oct 9 • 1.53k

neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16
Text Generation • Updated Oct 9 • 506 • 2

neuralmagic/SmolLM-360M-Instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 12

neuralmagic/SmolLM-135M-Instruct-quantized.w8a8
Text Generation • Updated Oct 9 • 717

neuralmagic/Llama-3.2-3B-Instruct-quantized.w8a8
Text Generation • Updated Oct 16 • 6.61k • 1
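
Since the collection is described as ready for use with vLLM, a minimal sketch of loading one of the listed repositories may help. It is not taken from the collection page. It assumes vLLM is installed on a machine with a supported GPU, and it reads the `w8a8` / `w8a16` suffix as "8-bit weights with 8-bit or 16-bit activations", which is how these repository names appear to be structured. The helper names `parse_scheme` and `generate` are hypothetical and exist only for this illustration.

```python
import re
from typing import NamedTuple


class QuantScheme(NamedTuple):
    weight_bits: int
    activation_bits: int


def parse_scheme(model_id: str) -> QuantScheme:
    """Read the trailing wXaY suffix of a repository name, e.g. '...quantized.w8a8'.

    Assumption: X is the weight bit width and Y the activation bit width.
    """
    match = re.search(r"\.w(\d+)a(\d+)$", model_id)
    if match is None:
        raise ValueError(f"no wXaY suffix in {model_id!r}")
    return QuantScheme(int(match.group(1)), int(match.group(2)))


def generate(model_id: str, prompt: str, max_tokens: int = 64) -> str:
    """Load a checkpoint with vLLM's offline API and complete one raw prompt."""
    # Imported lazily so parse_scheme() stays usable without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id)  # downloads the weights from the Hub on first use
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    # One prompt in, one request out; take its first (and only) completion.
    return outputs[0].outputs[0].text
```

Calling `generate("neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8", "What is INT8 quantization?")` would then return the completion text. The sketch passes the prompt as raw text, whereas the Instruct variants generally expect their chat template to be applied first. Whether a given checkpoint fits on a given GPU depends on the model size and is not something this sketch checks.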