Issue with --n-gpu-layers 5 Parameter: Model Only Running on CPU

#10
by vuk123 - opened

Hi, I’m facing an issue where the --n-gpu-layers 5 parameter doesn’t seem to work. Despite having 2x NVIDIA A6000 GPUs, the model runs entirely on the CPU, with no GPU utilization. Has anyone else encountered this, or is there a fix for it?

This is how I run the model: llama-cli --model /home/user/mymodels/DeepSeek-V3-Q3_K_M/DeepSeek-V3-Q3_K_M-00001-of-00007.gguf --cache-type-k q5_0 --threads 16 --prompt '<|User|>What is 1+1?<|Assistant|>' --n-gpu-layers 5
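For context, this is how I'm checking GPU activity while the model loads; both commands are plain nvidia-smi usage, nothing specific to llama.cpp, and the utilization column stays at 0 the whole time:

watch -n 1 nvidia-smi
nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv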

It looks like the problem is that I installed llama.cpp with brew, so it wasn't compiled with CUDA...

I rebuilt it with cmake, and now it works...
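For reference, a source build of llama.cpp with CUDA looks roughly like this (the repo URL and the build/bin path are the upstream defaults; the model path and --n-gpu-layers value are just my own setup). The important part is to run the binary from build/bin instead of the brew-installed one:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
./build/bin/llama-cli --model /home/user/mymodels/DeepSeek-V3-Q3_K_M/DeepSeek-V3-Q3_K_M-00001-of-00007.gguf --n-gpu-layers 5 --prompt '<|User|>What is 1+1?<|Assistant|>'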

Unsloth AI org

I rebuilt it with cmake, and now it works...

glad you got it working!!


I rebuilt it with cmake, and now it works...

I built it with the commands:

cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

The GPU memory is occupied, but GPU utilization stays at 0%, and it still seems to be running on the CPU.
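In case it helps anyone debugging the same thing, two checks narrow this down: whether the shell is actually running the freshly built binary under build/bin (and not an older CPU-only copy earlier in PATH), and how many layers are really offloaded. With --n-gpu-layers 5 only five layers of a model with dozens of layers sit on the GPU, so almost all the compute stays on the CPU and nvidia-smi shows memory in use but near-0% utilization; that part is expected. The value 20 below is only illustrative and depends on available VRAM:

which llama-cli
./build/bin/llama-cli --version
./build/bin/llama-cli --model /home/user/mymodels/DeepSeek-V3-Q3_K_M/DeepSeek-V3-Q3_K_M-00001-of-00007.gguf --cache-type-k q5_0 --threads 16 --n-gpu-layers 20 --prompt '<|User|>What is 1+1?<|Assistant|>'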
