Tested to work correctly with multi-turn conversations using the llama3 chat template:
```
./server -ngl 99 -m shisa-v1-llama3-8b.Q5_K_M.gguf --chat-template llama3 -fa -v
```
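
For reference, one way to exercise the multi-turn path is to send a conversation that includes a prior assistant turn to the server's OpenAI-compatible chat endpoint, which formats the messages with the selected chat template. A minimal sketch, assuming the server is running on the default 127.0.0.1:8080; the message contents are made up:
```
# multi-turn request: includes an earlier assistant reply so the
# chat template has to render more than a single user turn
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hi! How can I help?"},
      {"role": "user", "content": "Summarize our chat so far."}
    ]
  }'
```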

Note: BF16 GGUFs have no CUDA implementation at the moment; see https://github.com/ggerganov/llama.cpp/issues/7211