How to control thinking level?

#37 · opened by kalashshah19

I am running llama-server directly from llama.cpp. Is there a parameter or some other way to set the thinking level, e.g. Low, Medium, or High? The model often thinks too long, uses up all the remaining tokens in the context window, and never returns the final message.
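For reference, here is roughly what I have been trying against the OpenAI-compatible endpoint. I am assuming the server forwards a `chat_template_kwargs` field to the model's Jinja chat template and that the template understands a `reasoning_effort` key (as the gpt-oss template reportedly does); both names are guesses on my part, so please correct me if they are wrong:

```python
# Minimal sketch: ask a local llama-server for a lower reasoning effort.
# Assumes llama-server is running on its default port and that the loaded
# model's chat template accepts a "reasoning_effort" kwarg -- not verified.
import requests

response = requests.post(
    "http://localhost:8080/v1/chat/completions",  # default llama-server address
    json={
        "messages": [
            {"role": "user", "content": "Explain KV caching in two sentences."}
        ],
        # Forwarded to the chat template, if the server supports it;
        # expected values would be "low" | "medium" | "high".
        "chat_template_kwargs": {"reasoning_effort": "low"},
        "max_tokens": 512,  # hard cap so one reply cannot exhaust the context
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```

I also saw mention of a `--reasoning-budget` server flag, but as far as I can tell it currently only accepts -1 (unlimited) or 0 (disable thinking entirely), not graded levels like low/medium/high.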
