This is an MXFP4 quant of Qwen3-Next-80B-A3B-Thinking.

Download the latest llama.cpp to use it.
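If you prefer building from source, the standard CMake flow from the llama.cpp README is a simple option; this is just a generic sketch, nothing here is specific to this model:

```
# Clone and build llama.cpp (standard CMake flow from its README)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```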

The context has been extended from 256k to 1M using YaRN, as described in the original model repo.

To enable it, run llama.cpp with options like:

```
--ctx-size 0 --rope-scaling yarn --rope-scale 4
```

`--ctx-size 0` uses the model's full 1M context; otherwise set a smaller value, e.g. 524288 for 512k. A full invocation sketch follows below.
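As a concrete example, a `llama-server` run with the full extended context might look like this; the GGUF filename is a placeholder guessed from the repo name, so adjust it to the file you actually downloaded:

```
# Sketch: serve with the full 1M context via YaRN
# (filename assumed; adapt paths and add your usual GPU/offload flags)
llama-server \
  -m Qwen3-Next-80B-A3B-Thinking-1M-MXFP4_MOE.gguf \
  --ctx-size 0 \
  --rope-scaling yarn \
  --rope-scale 4
```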

You can also use it as normal if you don't want the extended context.
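For normal use, simply omit the YaRN flags and pick a regular context size; for example (filename again assumed):

```
# Sketch: standard run without the YaRN context extension
llama-cli \
  -m Qwen3-Next-80B-A3B-Thinking-1M-MXFP4_MOE.gguf \
  --ctx-size 32768
```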
