Tested to work correctly with multiturn chat using the llama3 chat_template:
```
./server -ngl 99 -m shisa-v1-llama3-8b.Q5_K_M.gguf --chat-template llama3 -fa -v
```
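
Once the server is running, multiturn requests can be sent to its OpenAI-compatible chat endpoint, with the llama3 template applied server-side. A minimal sketch, assuming the default `127.0.0.1:8080` binding and an illustrative conversation:
```
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "こんにちは!"},
          {"role": "assistant", "content": "こんにちは!今日はどのようにお手伝いできますか?"},
          {"role": "user", "content": "What did I just say?"}
        ]
      }'
```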

Note: BF16 GGUFs currently have no CUDA implementation: https://github.com/ggerganov/llama.cpp/issues/7211