What engine should be used to infer this model?

#1
by RobertLiu0905 - opened

Thank you for your contribution. My question is: what engine should be used to run inference on this model?

I'm wondering whether this model was quantized with https://github.com/vllm-project/llm-compressor/blob/main/examples/quantizing_moe/deepseek_moe_w4a16.py. Could you share any quantization details?
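For reference, my understanding is that the linked script follows roughly this shape (a sketch only; the base model ID, calibration dataset, and ignore patterns below are my guesses, not confirmed details):

```python
# Sketch of a W4A16 MoE quantization flow with llm-compressor (GPTQ).
# MODEL_ID, the calibration dataset, and the ignore list are assumptions;
# the linked deepseek_moe_w4a16.py is the authoritative version.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "deepseek-ai/DeepSeek-V2.5"  # assumed base model
SAVE_DIR = "DeepSeek-V2.5-W4A16"        # assumed output path

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# 4-bit weights, 16-bit activations; keep lm_head and the MoE router
# gates unquantized (assumed ignore patterns).
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head", "re:.*mlp.gate$"],
)

# One-shot calibration pass; dataset name and sample count are assumptions.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```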

How do I run this model with vLLM? Could you give some tips or examples?
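For anyone else searching, the basic offline-inference pattern I would expect looks something like this (the local path, tensor-parallel size, and context length are placeholders to adjust for your checkpoint and hardware):

```python
# Minimal vLLM offline-inference sketch for the quantized checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/DeepSeek-V2.5-W4A16",  # assumed local path
    trust_remote_code=True,
    tensor_parallel_size=8,  # adjust for your GPU count
    max_model_len=4096,
)

params = SamplingParams(temperature=0.6, max_tokens=256)
outputs = llm.generate(["Explain mixture-of-experts in one paragraph."], params)
print(outputs[0].outputs[0].text)
```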

NM Testing org
edited Oct 31

Why does the quantized model run significantly slower?

NM Testing org

How are you running it?

> How are you running it?

After deepseek_moe_w4a16.py finishes, you get an INT4 model of roughly 112 GB. Then run it with vLLM 0.6; I failed with version 0.5.4, so skip that one. See https://github.com/vllm-project/llm-compressor/issues/857
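If you would rather hit it through vLLM's OpenAI-compatible server instead of offline inference, a query sketch looks roughly like this (the endpoint, port, and model path are assumptions):

```python
# Query a vLLM OpenAI-compatible server from Python. Assumes the server
# was started with something like:
#   vllm serve /models/DeepSeek-V2.5-W4A16 --trust-remote-code --tensor-parallel-size 8
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="/models/DeepSeek-V2.5-W4A16",  # must match the served model name
    messages=[{"role": "user", "content": "What engine should I use?"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```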
