Michael Goin

mgoin

mgoin_
mgoin

AI & ML interests

LLM inference optimization, compression, quantization, pruning, distillation

Recent Activity

updated a model 1 day ago

mgoin/GLM-4.6-FP8-BLOCK

published a model 1 day ago

mgoin/GLM-4.6-FP8-BLOCK

updated a model 28 days ago

inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4

New activity in inference-optimization/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 28 days ago

Fix invalid config

#1 opened 28 days ago by

mgoin

New activity in kernels-community/vllm-flash-attn3 4 months ago

Support for B200s?

👀 5

#7 opened 5 months ago by

shriramc

New activity in RedHatAI/Mistral-Small-3.2-24B-Instruct-2506-FP8 7 months ago

Quantization recipe?

#3 opened 7 months ago by

veden

Not working with vLLM 0.9.1

#1 opened 7 months ago by

zacksiri

New activity in RedHatAI/Llama-3.2-3B-Instruct-quantized.w8a8 7 months ago

Update config.json with the correct state

#1 opened 7 months ago by

dsikka

New activity in MiniMaxAI/MiniMax-Text-01 8 months ago

Make model config compatible with Hugging Face MiniMax implementation

#39 opened 8 months ago by

geetu040

New activity in mistralai/Magistral-Small-2506 8 months ago

Missing Tokenizer/Processor for use with Transformers

👍 1

#3 opened 8 months ago by

mgoin

New activity in RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic 9 months ago

How should I input the image?

#3 opened 9 months ago by

CyberWolf0

New activity in RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8 9 months ago

Unable to start with vllm serve

#2 opened 11 months ago by

VenomEY

New activity in RedHatAI/Qwen2.5-VL-72B-Instruct-FP8-dynamic 10 months ago

Fix processor_class to match upstream

#4 opened 10 months ago by

zifeitong

New activity in RedHatAI/Qwen2.5-VL-3B-Instruct-FP8-dynamic 10 months ago

Remove image_processor_type

#1 opened 11 months ago by

pooya-davoodi-parasail

New activity in nm-testing/Llama-3_1-Nemotron-Ultra-253B-v1-FP8-dynamic 10 months ago

OSError: nm-testing/Llama-3_1-Nemotron-Ultra-253B-v1-FP8-dynamic does not appear to have a file named decilm.py

#2 opened 10 months ago by

TheDrummer

how to deploy this model without internet connection

#1 opened 10 months ago by

superahn

New activity in RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic 10 months ago

Why not FP8 with static and per-tensor quantization?

👍 1

#2 opened 10 months ago by

wanzhenchn

New activity in mistralai/Mistral-Small-3.1-24B-Instruct-2503 10 months ago

Address discrepancies in the languages supported by the Mistral Small 3.1 2503

🔥 1

#54 opened 10 months ago by

fpaupier

New activity in RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic 10 months ago

Please update the chat template

#1 opened 10 months ago by

stelterlab

New activity in mistralai/Mistral-Small-3.1-24B-Instruct-2503 10 months ago

FP8 Dynamic/W8A16 Quants Please

#44 opened 11 months ago by

rjmehta

Problem hosting the model using vllm

➕ 3

#45 opened 11 months ago by

ShaoServient

New activity in RedHatAI/Qwen2.5-VL-72B-Instruct-quantized.w8a8 11 months ago

Remove image_processor_type

#1 opened 11 months ago by

pooya-davoodi-parasail

New activity in RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8 11 months ago

Remove image_processor_type

#1 opened 11 months ago by

pooya-davoodi-parasail
