Response Handling of gemma-2b-GGUF / gemma-2b.Q8_0.gguf
When I provide a user prompt to the Gemma model and request a response, I've noticed a few recurring issues:
Repetitive Answers: The model sometimes returns the same answer multiple times, which is redundant and unhelpful.
Language Inconsistencies: Occasionally, the model responds in Japanese even though I expect responses in English.
Partial Answers: There are instances where the model returns incomplete answers, cutting off mid-response or omitting important information.
My model configuration:

from langchain_community.llms import LlamaCpp

model_path = "gemma-2b.Q8_0.gguf"

def load_llm():
    llm = LlamaCpp(
        model_path=model_path,
        max_tokens=512,       # LlamaCpp's name for the generation cap (max_new_tokens is not a LlamaCpp parameter)
        temperature=0.4,      # low temperature for more deterministic output
        repeat_penalty=1.1,   # penalize repeated tokens to curb loops
        n_ctx=2048,           # context window size
    )
    return llm
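
For what it's worth, repetition and truncated answers often depend on how the model is called as much as on the model itself. Below is a minimal sketch of an invocation, assuming the instruction-tuned Gemma variant and its <start_of_turn>/<end_of_turn> chat template; the prompt text and stop sequence are illustrative, not taken from my actual app.

llm = load_llm()

# Gemma's instruction-tuned checkpoints expect this turn-based template;
# prompting the base model without it commonly produces rambling,
# repetition, or language drift.
prompt = (
    "<start_of_turn>user\n"
    "Explain the GGUF file format in two sentences.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

# Stopping on <end_of_turn> keeps the model from opening a new turn and
# repeating its answer; max_tokens in load_llm() caps the output length.
response = llm.invoke(prompt, stop=["<end_of_turn>"])
print(response)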
@aryaakp2302 thanks for taking the time to document and share your experience! It gives a nice picture of this model's quirks.
I'm not sure if this will address everything you've experienced, but as you may have heard, Google released an updated 1.1 version of this model. I'd be curious how that compares to this one.
I've just uploaded GGUFs for that here.
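
In case it's useful, here is a minimal sketch of pulling one of those files from the Hub and loading it with the same settings as above; the repo id and filename are placeholders, so substitute the actual 1.1 GGUF repo linked here.

from huggingface_hub import hf_hub_download
from langchain_community.llms import LlamaCpp

# Placeholder repo id and quant filename, not the real upload;
# swap in the repo and file you actually want to test.
model_path = hf_hub_download(
    repo_id="your-username/gemma-1.1-2b-it-GGUF",
    filename="gemma-1.1-2b-it.Q8_0.gguf",
)

llm = LlamaCpp(
    model_path=model_path,
    max_tokens=512,
    temperature=0.4,
    repeat_penalty=1.1,
    n_ctx=2048,
)
print(llm.invoke("Hello"))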