Neural Magic

company

Verified

https://neuralmagic.com/

neuralmagic

AI & ML interests

LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV

Organization Card

The Future of AI is Open

If you are looking for compressed models to run with vLLM, they have been moved to the RedHatAI organization. We look forward to continuing to publish optimized models for open source use!

Neural Magic (acquired by Red Hat) helps developers accelerate deep learning performance with automated model compression technologies and inference engines. Download our compression-aware inference engines and open source tools for fast model inference.

vLLM: A high-throughput and memory-efficient inference engine for at-scale deployment of performant open-source LLMs
LLM Compressor: HF-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM

In this profile we provide accurate model checkpoints, compressed with SOTA methods and ready to run in vLLM, in formats such as W4A16, W8A16, W8A8 (int8 and fp8), and many more! If you would like help quantizing a model, or want to request a new checkpoint, please open an issue at https://github.com/vllm-project/llm-compressor.
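The format names above follow a WxAy pattern, where x is the bit width of the stored weights and y is the bit width of the activations. As a rough illustration, the sketch below reads that suffix out of a checkpoint name and estimates weight storage. The helper names `parse_scheme` and `approx_weight_gib` are hypothetical and are not part of vLLM or LLM Compressor, and the size estimate is an assumption-laden back-of-the-envelope figure that ignores scales, zero points, and any layers left unquantized.

```python
import re


def parse_scheme(name: str) -> tuple[int, int]:
    """Return (weight_bits, activation_bits) from a name containing e.g. 'w4a16'.

    Hypothetical helper: it only recognizes the WxAy suffix style, so names
    such as '...-FP8-dynamic' raise ValueError.
    """
    match = re.search(r"w(\d+)a(\d+)", name.lower())
    if match is None:
        raise ValueError(f"no WxAy scheme found in {name!r}")
    return int(match.group(1)), int(match.group(2))


def approx_weight_gib(num_params: float, weight_bits: int) -> float:
    """Approximate weight storage in GiB for a given parameter count.

    Assumes every parameter is stored at weight_bits and ignores
    quantization metadata, so treat the result as a lower bound.
    """
    return num_params * weight_bits / 8 / 2**30


if __name__ == "__main__":
    repos = [
        "neuralmagic/Llama-3.2-3B-Instruct-quantized.w8a8",
        "RedHatAI/granite-3.1-8b-instruct-quantized.w4a16",
    ]
    for repo in repos:
        weight_bits, activation_bits = parse_scheme(repo)
        print(f"{repo}: {weight_bits}-bit weights, {activation_bits}-bit activations")
```

For example, under these assumptions a model with 8 billion parameters would need roughly half as much weight storage at 4-bit as at 8-bit, which is the main reason W4A16 checkpoints are attractive when memory is the constraint.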


spaces 3

Quant Llms Text Generation

🔥

Quantized vs. Unquantized LLM: Text Generation Comparison

Sparse Llama Gsm8k

📚

Solve math problems with chat-based guidance

models 1

neuralmagic/Llama-3.2-3B-Instruct-quantized.w8a8

Text Generation • 4B • Updated Jul 9, 2025 • 167

Collections 14

RedHatAI/DeepSeek-R1-Distill-Llama-8B-FP8-dynamic

RedHatAI/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic

RedHatAI/DeepSeek-R1-Distill-Qwen-32B-FP8-dynamic

RedHatAI/DeepSeek-R1-Distill-Llama-70B-quantized.w8a8

RedHatAI/granite-3.1-2b-instruct-quantized.w4a16

RedHatAI/granite-3.1-2b-instruct-quantized.w8a8

RedHatAI/granite-3.1-8b-instruct-quantized.w4a16

RedHatAI/granite-3.1-8b-instruct-quantized.w8a8

datasets 13

neuralmagic/calibration

neuralmagic/mmlu_it

neuralmagic/mmlu_fr

neuralmagic/mmlu_th

neuralmagic/mmlu_de

neuralmagic/mmlu_es

neuralmagic/mmlu_hi

neuralmagic/mmlu_pt

neuralmagic/quantized-llama-3.1-leaderboard-v2-evals

neuralmagic/quantized-llama-3.1-humaneval-evals

Team members 38
