Does this model only work on CUDA devices with compute capability >= 9.0 or 8.9/ROCm MI300+?
#4 opened by jcfasi
When trying to deploy this on SageMaker using the DJLServing 0.29.0 LMI image with vLLM, I get this error:
RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
DJLServing 0.29.0 LMI Image URI: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124
Hi @jcfasi, vLLM supports fp8 on Ampere (compute capability >= 8.0) as well! See https://docs.vllm.ai/en/latest/quantization/fp8.html
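For reference, here is a minimal sketch of enabling FP8 in vLLM per the docs linked above. The model name is just a placeholder, and the assumption (from the docs) is that on pre-Hopper GPUs vLLM uses a weight-only FP8 path rather than torch._scaled_mm:

```python
# Minimal sketch: dynamic FP8 quantization at load time with vLLM.
# "facebook/opt-125m" is a placeholder; substitute your model ID.
from vllm import LLM, SamplingParams

# quantization="fp8" requests FP8 quantization of the weights at load time;
# per the vLLM docs, Ampere GPUs (compute capability >= 8.0) are supported.
llm = LLM(model="facebook/opt-125m", quantization="fp8")

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```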
If you are running into issues, please post them at https://github.com/vllm-project/vllm/