Response Handling of gemma-2b-GGUF / gemma-2b.Q8_0.gguf
When I provide a user prompt to the Gemma model and request a response, I've noticed a few recurring issues:
Repetitive Answers: The model sometimes returns the same answer multiple times, which is redundant and unhelpful.
Language Inconsistencies: Occasionally, the model responds in Japanese even though I expect responses in English.
Partial Answers: There are instances where the model returns incomplete answers, cutting off mid-response or omitting important information.
My model configuration:

from langchain_community.llms import LlamaCpp

model_path = "gemma-2b.Q8_0.gguf"

def load_llm():
    llm = LlamaCpp(
        model_path=model_path,
        max_tokens=512,       # LlamaCpp's name for the generation cap (max_new_tokens is not a LlamaCpp parameter)
        temperature=0.4,      # low temperature for more deterministic output
        repeat_penalty=1.1,   # penalize repeated tokens to curb loops
        n_ctx=2048,           # context window size
    )
    return llm
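
For what it's worth, repetition and truncated answers often depend on how the model is called as much as on the model itself. Below is a minimal sketch of an invocation, assuming the instruction-tuned Gemma variant and its <start_of_turn>/<end_of_turn> chat template; the prompt text and stop sequence are illustrative, not taken from my actual app.

llm = load_llm()

# Gemma's instruction-tuned checkpoints expect this turn-based template;
# prompting the base model without it commonly produces rambling,
# repetition, or language drift.
prompt = (
    "<start_of_turn>user\n"
    "Explain the GGUF file format in two sentences.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Stopping on <end_of_turn> keeps the model from opening a new turn and
# repeating its answer; max_tokens in load_llm() caps the output length.
response = llm.invoke(prompt, stop=["<end_of_turn>"])
print(response)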
@aryaakp2302 thanks for taking the time to document and share your experience! It gives a nice picture of this model's quirks.
I'm not sure if this will address everything you've experienced, but as you may have heard, Google released an updated 1.1 version of this model. I'd be curious how that compares to this one.
I've just uploaded GGUFs for that here.
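
In case it's useful, here is a minimal sketch of pulling one of those files from the Hub and loading it with the same settings as above; the repo id and filename are placeholders, so substitute the actual 1.1 GGUF repo linked here.

from huggingface_hub import hf_hub_download
from langchain_community.llms import LlamaCpp

# Placeholder repo id and quant filename, not the real upload;
# swap in the repo and file you actually want to test.
model_path = hf_hub_download(
    repo_id="your-username/gemma-1.1-2b-it-GGUF",
    filename="gemma-1.1-2b-it.Q8_0.gguf",
)

llm = LlamaCpp(
    model_path=model_path,
    max_tokens=512,
    temperature=0.4,
    repeat_penalty=1.1,
    n_ctx=2048,
)
print(llm.invoke("Hello"))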