Does this model only work on CUDA devices with compute capability >= 9.0 or 8.9/ROCm MI300+?
#4 opened by jcfasi
When trying to deploy this on SageMaker using the DJLServing 0.29.0 LMI image with vLLM, I get this error:
RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+
DJLServing 0.29.0 LMI Image URI: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124
Hi @jcfasi, vLLM supports fp8 on Ampere (compute capability >= 8.0) as well! See https://docs.vllm.ai/en/latest/quantization/fp8.html
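For reference, here is a minimal sketch of enabling FP8 in vLLM per the docs linked above. The model name is just a placeholder, and the assumption (from the docs) is that on pre-Hopper GPUs vLLM uses a weight-only FP8 path rather than torch._scaled_mm:

```python
# Minimal sketch: dynamic FP8 quantization at load time with vLLM.
# "facebook/opt-125m" is a placeholder; substitute your model ID.
from vllm import LLM, SamplingParams

# quantization="fp8" requests FP8 quantization of the weights at load time;
# per the vLLM docs, Ampere GPUs (compute capability >= 8.0) are supported.
llm = LLM(model="facebook/opt-125m", quantization="fp8")

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```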
If you are running into issues, please post them at https://github.com/vllm-project/vllm/