canada-quant
/
DeepSeek-V4-Flash-W4A16-FP8

Text Generation
Safetensors
English
Chinese
vllm
deepseek_v4
deepseek
compressed-tensors
w4a16
gptq
fp8
mixture-of-experts
Community

2026-05-26 dual-spark bench: GSM8K 94-98%, AIME 4/5, concurrency wall 12→125 tok/s

#7 opened about 8 hours ago by
pastapaul

Compatible version with Ampere? SM8.6

#6 opened about 23 hours ago by
bullerwins

One-shot bootstrap for 2 Sparks not working

1
#4 opened 12 days ago by
easonchow0419

DGX Spark error: torch.AcceleratorError: CUDA error: an illegal memory access was encountered ......

3
#3 opened 12 days ago by
wangweiweihw

Need vllm-w4a16-dsv4:exp Thanks

2
#2 opened 20 days ago by
youcai666

Can I run this model on 2x H20 141GB?

3
#1 opened 20 days ago by
CHNtentes