GadflyII

AI & ML interests

None yet

Recent Activity

new activity 11 days ago

GadflyII/GLM-4.7-Flash-MTP-NVFP4:SGLang and MTP

new activity 24 days ago

GadflyII/Qwen3-Coder-Next-NVFP4:Model requests?

new activity 24 days ago

GadflyII/Qwen3-Coder-Next-NVFP4:Why Your NVFP4 Model Is Slower Than FP8 on the GB10 (NVIDIA Spark) — And How to Fix It

Organizations

New activity in GadflyII/GLM-4.7-Flash-MTP-NVFP4 11 days ago

SGLang and MTP

#2 opened 23 days ago by

Michalea

New activity in GadflyII/Qwen3-Coder-Next-NVFP4 24 days ago

Model requests?

#4 opened about 1 month ago by

pathosethoslogos

Why Your NVFP4 Model Is Slower Than FP8 on the GB10 (NVIDIA Spark) — And How to Fix It

👍 2

#5 opened about 1 month ago by

scottgl

New activity in GadflyII/GLM-4.6V-NVFP4 24 days ago

Fails on a single DGX spark with errors below

#2 opened 29 days ago by

Adrian1234

New activity in GadflyII/GLM-4.7-Flash-MXFP4 about 1 month ago

Update MXFP4 format to compressed-tensors

#3 opened about 1 month ago by

mgoin

New activity in lukealonso/MiniMax-M2.5-NVFP4 about 1 month ago

Here's the vLLM recipe I'm using with 2x RTX Pro 6000

👍 3

#1 opened about 1 month ago by

zenmagnets

New activity in GadflyII/Qwen3-Coder-Next-NVFP4 about 2 months ago

MMLU PRO Benchmark

#3 opened about 2 months ago by

sevapru

vLLM 0.16?

#2 opened about 2 months ago by

MMaxHugg

Memory

#1 opened about 2 months ago by

struxx

New activity in GadflyII/GLM-4.7-Flash-NVFP4 about 2 months ago

confused response

#8 opened about 2 months ago by

jiangyizhi

MTP quality, 47 layer

#7 opened about 2 months ago by

Michalea

New activity in GadflyII/GLM-4.7-Flash-MTP-NVFP4 about 2 months ago

Upload folder using huggingface_hub

#1 opened about 2 months ago by

GadflyII

New activity in GadflyII/GLM-4.6V-NVFP4 about 2 months ago

Well done nvfp4 quant

#1 opened about 2 months ago by

josephbreda

New activity in GadflyII/GLM-4.7-Flash-NVFP4 2 months ago

Can't deploy by vllm 0.14.1 + transformers

#6 opened 2 months ago by

Butterfly-314

New activity in GadflyII/GLM-4.7-Flash-MXFP4 2 months ago

can not run

#1 opened 2 months ago by

aliez-ren

New activity in GadflyII/GLM-4.7-Flash-NVFP4 2 months ago

please create mlx version of this

#4 opened 2 months ago by

Narutoouz

Wasn't able to recreate MMLU-Pro benchmarks

#5 opened 2 months ago by

zenmagnets

New activity in GadflyII/MiniMax-M2.1-NVFP4 2 months ago

Request for GLM 4.6V

#1 opened 3 months ago by

SFPLM

New activity in GadflyII/GLM-4.7-Flash-NVFP4 2 months ago

GadflyII/GLM-4.7-Flash-NVFP4

#3 opened 2 months ago by

Yu21342

Really appreciate that you ran performance comparison tests with BF16!

#2 opened 2 months ago by

zenmagnets
