How to create a GGUF from this
Hi there, how can I run this LLM on an Ollama server? I tried to convert it to GGUF with llama.cpp without success. How can I use it? Thanks in advance.
This is a special 4-bit quant for finetuning with Unsloth. If you just want to run gemma-2-27b-it in Ollama, you'd probably just do this and let Ollama download it from its own repository for you:
ollama run gemma2:27b
If you do in fact want to build your own .gguf file locally with llama.cpp, use this repo instead:
https://huggingface.co/unsloth/gemma-2-27b-it
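If you go that route, the conversion is done with llama.cpp's convert script plus its quantize tool. Here's a rough sketch; the exact script and binary names have changed between llama.cpp versions, and the paths, output file names, and the Q4_K_M quant type are just illustrative choices:

git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# grab the full-precision weights (needs the huggingface_hub CLI installed)
huggingface-cli download unsloth/gemma-2-27b-it --local-dir gemma-2-27b-it

# convert the HF checkpoint to a 16-bit GGUF, then quantize it down to 4-bit
# (llama-quantize has to be built from the llama.cpp sources first, e.g. with cmake)
python llama.cpp/convert_hf_to_gguf.py gemma-2-27b-it --outfile gemma-2-27b-it-f16.gguf --outtype f16
llama.cpp/llama-quantize gemma-2-27b-it-f16.gguf gemma-2-27b-it-Q4_K_M.gguf Q4_K_M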
That being said, you can also just grab a GGUF someone else has already made:
https://huggingface.co/bartowski/gemma-2-27b-it-GGUF
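If you use that repo, you can download one of the quantized files and point Ollama at it with a Modelfile. A rough sketch; the .gguf file name and the Q4_K_M choice below are just examples, so check the repo for the variant you actually want:

huggingface-cli download bartowski/gemma-2-27b-it-GGUF gemma-2-27b-it-Q4_K_M.gguf --local-dir .

# a one-line Modelfile that points Ollama at the downloaded GGUF
echo "FROM ./gemma-2-27b-it-Q4_K_M.gguf" > Modelfile

ollama create gemma2-27b-q4 -f Modelfile
ollama run gemma2-27b-q4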
One final option is to have Hugging Face build you a GGUF file:
https://huggingface.co/spaces/ggml-org/gguf-my-repo
Thanks a lot for your reply.
I'm having trouble running the normal gemma2-27b; it is really slow, so I found this model. I read that it is faster than the normal one? Or is only the training faster with Unsloth? I don't need extra training at the moment, I just want to run the 27b at a higher speed, with more tokens/s than I get now.
Thanks so much.
It's only faster because it is 4-bit quantized, which is unrelated to Unsloth. That bitsandbytes 4-bit format can't be carried over into GGUF, so the best option you have is to use Bartowski's upload.
We do make training and inference of models faster, but currently our inference speedups only work on GPUs.
OK, thanks for making things clear to me :) I'll give another ready-to-use GGUF with a 4-bit quant a chance.
When you finetune a model with Unsloth, remember you can also export it directly to GGUF using Unsloth!