How to create a GGUF from this
Hi there, how can I run this LLM on an Ollama server? I tried to convert it to GGUF with llama.cpp without success. How can I use it? Thanks in advance.
This is a special 4-bit quant for finetuning with Unsloth. If you just want to run gemma-2-27b-it in Ollama, you'd probably just do this and let Ollama download it from its own repository for you:
ollama run gemma2:27b
If you do in fact want to build your own .gguf file locally with llama.cpp, use this repo instead:
https://huggingface.co/unsloth/gemma-2-27b-it
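If you go that route, the conversion is done with llama.cpp's convert script plus its quantize tool. Here's a rough sketch; the exact script and binary names have changed between llama.cpp versions, and the paths, output file names, and the Q4_K_M quant type are just illustrative choices:

git clone https://github.com/ggerganov/llama.cpp
pip install -r llama.cpp/requirements.txt

# grab the full-precision weights (needs the huggingface_hub CLI installed)
huggingface-cli download unsloth/gemma-2-27b-it --local-dir gemma-2-27b-it

# convert the HF checkpoint to a 16-bit GGUF, then quantize it down to 4-bit
# (llama-quantize has to be built from the llama.cpp sources first, e.g. with cmake)
python llama.cpp/convert_hf_to_gguf.py gemma-2-27b-it --outfile gemma-2-27b-it-f16.gguf --outtype f16
llama.cpp/llama-quantize gemma-2-27b-it-f16.gguf gemma-2-27b-it-Q4_K_M.gguf Q4_K_M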
That being said, you can also just grab a GGUF someone else has already made:
https://huggingface.co/bartowski/gemma-2-27b-it-GGUF
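If you use that repo, you can download one of the quantized files and point Ollama at it with a Modelfile. A rough sketch; the .gguf file name and the Q4_K_M choice below are just examples, so check the repo for the variant you actually want:

huggingface-cli download bartowski/gemma-2-27b-it-GGUF gemma-2-27b-it-Q4_K_M.gguf --local-dir .

# a one-line Modelfile that points Ollama at the downloaded GGUF
echo "FROM ./gemma-2-27b-it-Q4_K_M.gguf" > Modelfile

ollama create gemma2-27b-q4 -f Modelfile
ollama run gemma2-27b-q4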
One final option is to have Hugging Face build you a GGUF file:
https://huggingface.co/spaces/ggml-org/gguf-my-repo
Thanks a lot for your reply.
I'm having trouble running the normal gemma2-27b; it is really slow, so I found this model. I read that it is faster than the normal one? Or is only the training faster with Unsloth? I don't need extra training at the moment, I just want to run the 27b at a higher speed, with more tokens/s than I get now.
Thanks so much.
It's only faster because it is 4-bit quantized, which is unrelated to Unsloth. That bitsandbytes 4-bit format can't be carried over into GGUF, so the best option you have is to use Bartowski's upload.
We do make training and inference of models faster, but currently our inference speedups only work on GPUs.
OK, thanks for making things clear to me :) I'll give another ready-to-use GGUF with a 4-bit quant a chance.
When you finetune a model with Unsloth, remember you can also export it directly to GGUF using Unsloth!