No thinking tags when it runs?

#1
by Disdrix - opened

Getting an issue where thinking tags never appear, which results in no markdown formatting, making the model unusable. I've only tested it with Kimi-K2-Thinking-UD-Q3_K_XL. This works fine with DeepSeek R1, for example. I don't know if this is an issue with the latest llama.cpp or the GGUF.

I run with the following:
llama-server --model Kimi-K2-Thinking-UD-Q3_K_XL-00001-of-00010.gguf -ts 99,0 -fa on --temp 1.0 --min-p 0.01 -c 131072 --threads 38 -ngl 99 --n-cpu-moe 54

image

I have this problem too with GLM-4.5 and GLM-4.6 quantized by Unsloth, but it's random. The </think> tag only appears about 80% of the time, and the longer the context, the lower the probability of success. Not sure if it's related.

Unsloth AI org

You have to pass --special; then the think tokens come up and you'll see them. This is normal, expected behavior.

CC: @Disdrix @AliceThirty
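For example, appended to the invocation from the original report (same model file and flags assumed):

```shell
llama-server --model Kimi-K2-Thinking-UD-Q3_K_XL-00001-of-00010.gguf \
  -ts 99,0 -fa on --temp 1.0 --min-p 0.01 -c 131072 \
  --threads 38 -ngl 99 --n-cpu-moe 54 --special
```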

Thanks @danielhanchen - that worked. It would be good if this were added to the unsloth guide, as it doesn't seem documented anywhere and I've seen several people asking in various forums.

The only downside is that now it ends every answer with <|im_end|>. Maybe a template issue?

Also, it seems to have an identity crisis and thinks it's Claude. Probably an issue with the base model, but funny.
image

The only downside is that now it ends every answer with <|im_end|>.

Intended behavior when printing special tokens; <|im_end|> is a special token, after all. You can set <|im_end|> as a stop string in Open WebUI.

image

Most frontends (ST, etc.) should support that.
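If a frontend doesn't expose a stop-string setting, the trailing marker can also be stripped after the fact. A minimal sketch using sed on captured output (the example string is made up):

```shell
# Strip a trailing <|im_end|> marker from captured model output.
# In a basic regex, "<", "|", and the other marker characters are all
# literal, so this only removes the exact marker at the end of the line.
printf '%s\n' 'Once upon a time...<|im_end|>' | sed 's/<|im_end|>$//'
# prints: Once upon a time...
```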

Also, it seems to have an identity crisis and thinks it's Claude.

Makes sense

Unsloth AI org

Ok we'll add it to the guide @dsg22 thanks!

Unsloth AI org

Thanks @danielhanchen - that worked. It would be good if this were added to the unsloth guide, as it doesn't seem documented anywhere and I've seen several people asking in various forums.

The only downside is that now it ends every answer with <|im_end|>. Maybe a template issue?


We added it here: https://docs.unsloth.ai/models/kimi-k2-and-thinking-how-to-run-locally#thinking-tags

Here is how to get rid of the <|im_end|>. Add this to the custom JSON:

{"prompt": "...", "stop": ["<|im_end|>"]}

image

Another update: after some number of messages back and forth, markdown fails again. It seems that --special did not fix the issue entirely.

It seems to recur around 2000 tokens. I did a "write a story" initial prompt and then "continue the story" a couple of times.

Here, you can see it didn't even bother reasoning and went straight to normal text generation. It then did a weird thing where it added a think token and repeated this section of the story again exactly. Sometimes at this point it will actually reason, but won't produce markdown. After it finished, I did another "continue the story" and it did reason this time, but still no markdown.

image

image

image
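When debugging this failure mode, it can help to script a check over saved responses for whether the closing tag ever appeared. A minimal sketch, with a made-up response string:

```shell
# Hypothetical captured response; replace with real model output.
response='<think>some reasoning</think>The story continues.'

# grep -q exits 0 if the pattern is found, so this prints one line either way.
if printf '%s' "$response" | grep -q '</think>'; then
  echo "think block closed"
else
  echo "think block missing"
fi
# prints: think block closed
```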

Unsloth AI org

@Disdrix Could you try setting min_p = 0.01 and temperature = 0.8 to 1.0?

@Disdrix Could you try setting min_p = 0.01 and temperature = 0.8 to 1.0?

This is what I am doing already. The problem always starts after ~2k tokens; I've tried many new chats and it happens 100% of the time in that range.

I have not yet tried using something other than llama.cpp to rule that out as the issue.

Just updated again to the latest llama.cpp and it persists. I did find some interesting behavior, however: if I am writing a story and then suddenly prompt the AI with "hi", it will reason again.

Disdrix changed discussion status to closed
Disdrix changed discussion status to open

I'm not convinced we should be using it with only temp 1.0 and min-p 0.01 at these lower quants. The folk theory is that the harsher the quantization, the higher the effective temperature compared to the unquantized model.

I think I'm going to start at more like temp 0.7 and min-p 0.05 for UD-TQ1_0 next time I play with it.
