leonardlin committed • Commit 03fc14c • Parent(s): 710c342 • Create README.md
Tested to work correctly with multiturn chat using the llama3 chat_template:

```
./server -ngl 99 -m shisa-v1-llama3-8b.Q5_K_M.gguf --chat-template llama3 -fa -v
```
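Once the server is running, multiturn handling can be exercised over its OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch in Python, assuming the llama.cpp server's default `http://localhost:8080` host/port; the message contents are placeholders, and a prior assistant turn is included so the chat template has to render more than one round:

```python
import json
from urllib import request

# Multi-turn conversation: includes an earlier assistant reply so the
# llama3 chat_template must format a full back-and-forth, not just one prompt.
payload = {
    "messages": [
        {"role": "user", "content": "Name one Japanese island."},
        {"role": "assistant", "content": "Honshu."},
        {"role": "user", "content": "Name another one."},
    ],
    "temperature": 0.7,
}
body = json.dumps(payload).encode("utf-8")

req = request.Request(
    "http://localhost:8080/v1/chat/completions",  # assumed default host/port
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server from the command above is up:
# with request.urlopen(req) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]
#     print(reply)
```

If the template is applied correctly, the reply should account for the earlier turns rather than treating the last message in isolation.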

Note: BF16 GGUFs have no CUDA implementation atm: https://github.com/ggerganov/llama.cpp/issues/7211