CUDA support #2
by mike-ravkine - opened
Hi @ddh0,
I am having trouble loading this model with CUDA, using both the latest llama.cpp and revision 18e43766 as per your example. I get:

```
GGML_ASSERT: ggml-cuda.cu:1278: to_fp32_cuda != nullptr
```
CPU-only inference seems to work fine, which makes me wonder which backend you are using for inference (Metal?).
Yes, I’m using Metal. CUDA support for bf16 is still being worked on in llama.cpp. You could try with batch size <= 16, or run on CPU, for the time being.
mike-ravkine changed discussion status to closed