Compatibility with llama-cpp and Ollama
Hi there!
I've tried some quantized versions of this model and ran into an issue. I use llama-cpp-python for model inference. When I ask a question, I get an endless stream of random characters as the result (see screenshot). But when I build a local model from the same quantized GGUF using a Modelfile for Ollama inference, everything works fine. So Ollama works, while llama-cpp-python produces random output. I noticed the same behavior with a couple of other models, such as defog/llama-3-sqlcoder-8b.
Is anyone else experiencing the same issue?
llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************
llm_load_vocab:
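For what it's worth, one common cause of runaway garbage output with raw llama-cpp-python completions is that the Llama 3 chat template and its `<|eot_id|>` stop token are never applied, whereas Ollama's Modelfile supplies both automatically. A minimal sketch of building the prompt by hand and passing the stop token (the model path and question below are placeholders, not from this thread):

```python
# Llama 3's instruct template (per Meta's model card). If it is not applied,
# the model never emits <|eot_id|> where the runtime expects to stop, and
# generation can run on forever.
def format_llama3_prompt(system: str, user: str) -> str:
    """Build a Llama 3 instruct prompt string by hand."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Hedged usage with llama-cpp-python (file name is hypothetical):
#
# from llama_cpp import Llama
# llm = Llama(model_path="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")
# out = llm(
#     format_llama3_prompt("You are a helpful assistant.", "What is GGUF?"),
#     max_tokens=256,
#     stop=["<|eot_id|>"],  # without a stop token the output can run away
# )
```

Alternatively, `llm.create_chat_completion(...)` can apply the chat template from GGUF metadata, but only if the quant actually carries that metadata.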
@liashchynskyi
Yes! I'm having the same issue with defog/llama-3-sqlcoder-8b. I'm using LangChain with llama-cpp-python - only GGUF models. I'm looking to use GGUF files others have created - I can look into generating my own if that's the only solution.
Output from defog/llama-3-sqlcoder-8b:
@jaycann2 can you try QuantFactory/Meta-Llama-3-8B-Instruct-GGUF-v2?
I'll update the defog quants today if you are facing issues with them.
@munish0838 But why do we receive random outputs? I've tried to quantize the original model myself and ran into the same issue.
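As an aside, the llm_load_vocab banner pasted above is llama.cpp's warning that the GGUF is missing pre-tokenizer metadata that newer conversion scripts write; quants produced with an older converter trigger it regardless of frontend, and re-converting with a current convert-hf-to-gguf.py is the usual fix. As a quick sanity check on a self-quantized file, the fixed-size GGUF header can be read directly (field layout per the GGUF spec; the path would be your own file):

```python
import struct

def read_gguf_header(path: str) -> dict:
    """Read the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    with open(path, "rb") as f:
        raw = f.read(24)  # 4-byte magic + uint32 version + two uint64 counts
    magic, version, n_tensors, n_kv = struct.unpack("<4sIQQ", raw)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

A full metadata dump (including the `tokenizer.ggml.pre` key, whose absence triggers this warning) requires parsing the KV section; the `gguf` Python package that ships with llama.cpp includes a reader for that.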
Thanks @munish0838 - I tried yesterday and got the same result. I'd be interested to see if you're able to duplicate the issue on your end with the GGUF version. If not, I might be able to learn what's going on from your code.
@jaycann2 I updated the quants yesterday in this repo and the defog-sql-llama repo; they are both working perfectly for me.