This is an MXFP4 quant of Qwen3-Next-80B-A3B-Thinking.

Download the latest llama.cpp to use it.
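If you prefer building from source, the standard CMake flow from the llama.cpp README is a simple option; this is just a generic sketch, nothing here is specific to this model:

```
# Clone and build llama.cpp (standard CMake flow from its README)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```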

The context has been extended from 256k to 1M using YaRN, as described in the original model repo.

To enable it, run llama.cpp with options like:

```
--ctx-size 0 --rope-scaling yarn --rope-scale 4
```

`--ctx-size 0` uses the model's full 1M context; otherwise set a smaller value, e.g. 524288 for 512k. A full invocation sketch follows below.
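As a concrete example, a `llama-server` run with the full extended context might look like this; the GGUF filename is a placeholder guessed from the repo name, so adjust it to the file you actually downloaded:

```
# Sketch: serve with the full 1M context via YaRN
# (filename assumed; adapt paths and add your usual GPU/offload flags)
llama-server \
  -m Qwen3-Next-80B-A3B-Thinking-1M-MXFP4_MOE.gguf \
  --ctx-size 0 \
  --rope-scaling yarn \
  --rope-scale 4
```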

You can also use it as normal if you don't want the extended context.
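For normal use, simply omit the YaRN flags and pick a regular context size; for example (filename again assumed):

```
# Sketch: standard run without the YaRN context extension
llama-cli \
  -m Qwen3-Next-80B-A3B-Thinking-1M-MXFP4_MOE.gguf \
  --ctx-size 32768
```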
