CUDA support #2
by mike-ravkine - opened
Hi @ddh0,
I am having trouble loading this model with CUDA, using both the latest llama.cpp and revision 18e43766 as per your example. I get:

```
GGML_ASSERT: ggml-cuda.cu:1278: to_fp32_cuda != nullptr
```
CPU-only inference seems to work fine, which makes me wonder which backend you are using for inference (Metal?).
Yes, I’m using Metal. CUDA support for bf16 is still being worked on in llama.cpp. You could try with batch size <= 16, or run on CPU, for the time being.
mike-ravkine changed discussion status to closed