toolcall inside thinking

#25
by snapo - opened

image

i get many time a tool call inside the thinking tag... even i use your profile

services:
llama-server:
image: ghcr.io/ggml-org/llama.cpp:full-cuda13-b9209
container_name: llama-server
restart: unless-stopped
ports:
- "16384:8080"
volumes:
- ./models:/models:ro
command: >
--server
--model /models/Qwen3.6-27B-Q4_K_M-uc-mtp-v2.gguf
--alias "Qwen3.6 27B"
--temp 0.6
--top-p 0.95
--min-p 0.00
--top-k 20
--port 8080
--host 0.0.0.0
--fit off
--ctx-size 200000
--presence-penalty 0.0
--repeat-penalty 1.0
--jinja
--chat-template-file /models/Qwen3.6-11.jinja
--mmproj /models/Qwen3.6-27B-Q4_K_M-MTP-mmproj-f16-uc-v2.gguf
--webui
--spec-draft-p-min 0.75
--spec-type draft-mtp
--spec-draft-n-max 3
--chat-template-kwargs '{"preserve_thinking": true}'
--reasoning-budget 8192
--reasoning-budget-message "... thinking budget exceeded, let's answer now.\n"
--split-mode tensor
user: "1000:1000"
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
environment:
- NVIDIA_VISIBLE_DEVICES=all

i tested it with your jinja 11 template for qwen 3.6 up to the template 18 ... and i still face this issue...
is this a problem of opencode or is this a problem with the template?

@herstrabol

because your template completely kills the thinking process.... i want the thinking process to occur....

image

your template works exactly the same as if i would turn off thinking...
i need thinking for the 8k given thinking budget, because this is what makes the model so extremely good. but i dont want to let it think for 65k tokens thats why i am limiting it.

additional your template causes which are not recognized as thinking/reasoning text tags as seen in the screenshot , no clue if its think or thinking as the tag... maybe model specific....

snapo changed discussion status to closed
snapo changed discussion status to open

Sign up or log in to comment