Support of 1M context doubt

#2
by clyang33 - opened

Hi Qwen Team,

You were using DCA (Dual Chunk Attention) to enable 1M context on the https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507 model. Why switch over to YaRN here? I'm just curious why DCA is being left off.
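For context, YaRN-based context extension on Qwen3 models is typically enabled through the rope_scaling entry in config.json. A minimal sketch, assuming the factor and native context length suggested for other Qwen3 releases (the exact values for this model may differ):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```

With a factor of 4.0 over a 262144-token native window, this would target roughly a 1M-token context.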

clyang33 changed discussion title from Support of 1M context to Support of 1M context doubt

I think the key here is that sparse_attention_config is not model-agnostic: the optimal values have to be searched for per model. Give it some time; maybe they will add it later.
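For comparison, the earlier DCA-based 1M releases shipped a per-model tuned attention block in config.json, roughly of this shape (field names and values below are illustrative, following the style of the Qwen2.5-1M configs, not taken from this model):

```json
{
  "dual_chunk_attention": {
    "chunk_size": 262144,
    "local_size": 8192,
    "original_max_position_embeddings": 262144
  }
}
```

Tuning values like these per model is exactly the kind of search that could delay adding DCA support here.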
