
Disabling/Reducing model reasoning

#22
by Abdallah1997 - opened

I have important CoT prompts that guide the LLM in how to think. Using them leads to high latency and large token outputs, so I'd like to reduce the model's internal reasoning for those reasons.
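For now, a partial workaround is to strip the reasoning block from the output after generation; this saves downstream tokens, though not generation latency. A minimal sketch, assuming the model wraps its chain of thought in `<think>...</think>` tags (the exact tag names for step3p5 are an assumption):

```python
import re

# Drop the model's internal reasoning before passing the output downstream.
# Assumes the chain of thought is wrapped in <think>...</think> tags;
# the exact tags used by step3p5 are an assumption.
def strip_reasoning(text: str) -> str:
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_reasoning("<think>long internal reasoning...</think>The answer is 42."))
# -> "The answer is 42."
```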

Abdallah1997 changed discussion title from Disabling/Reducing reasoning to Disabling/Reducing model reasoning
StepFun org

We hear the ask, and you are not alone. We will add it in the next version.

Ideally there would also be a non-thinking version, or a non-thinking switch, to keep the model responsive for local use on consumer hardware, or when latency is key to the application (such as using text-to-speech to hold a conversation).
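For illustration, a minimal sketch of the kind of switch being requested, modeled on chat templates that accept an `enable_thinking` flag (Qwen3's template does; step3p5 does not expose one yet, and the model id below is an assumption):

```python
from transformers import AutoTokenizer

# Hypothetical usage: the repo id and the enable_thinking flag are assumptions,
# shown only to illustrate the requested non-thinking switch.
tokenizer = AutoTokenizer.from_pretrained("stepfun-ai/step3p5", trust_remote_code=True)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize this paragraph in one sentence."}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # hypothetical switch to skip the reasoning phase
)
```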
