Disabling/Reducing model reasoning
#22 opened by Abdallah1997
I have important CoT prompts that guide the LLM in how to think. Using them leads to high latency and large token outputs, so I'd like to reduce the model's internal reasoning for those reasons.
Abdallah1997 changed discussion title from "Disabling/Reducing reasoning" to "Disabling/Reducing model reasoning"
We hear the ask; you are not alone. We will add it in the next version.
Ideally there would also be a non-thinking version, or a non-thinking switch, to keep the model responsive for local usage on consumer hardware, or when latency is key to the application (such as using text-to-speech to hold a conversation).
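For reference, some open reasoning models already expose exactly this kind of switch through their chat template. Below is a minimal sketch of that pattern, assuming a model whose template accepts an `enable_thinking` flag (the Qwen3 family documents this); the flag name and the checkpoint used here are illustrative of how such a switch could look, not a confirmed feature of this model.

```python
# Sketch: skipping "thinking" via a chat-template flag, assuming the model's
# template supports one (Qwen3's does; this model may not yet).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # illustrative checkpoint with an enable_thinking switch
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Give me a one-sentence weather report."}]

# enable_thinking=False asks the template to omit the <think>...</think> block,
# trading reasoning depth for lower latency and shorter outputs.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Until a native switch like this exists, capping `max_new_tokens` or instructing the model in the system prompt to answer directly can partially bound reasoning length, though neither is as reliable as a real template-level flag.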