Cannot run with tensor parallel > 1. Might need padding like on Qwen2.5-72B?

#2
by OwenArli - opened

Getting the same error in vllm as shown in this issue: https://github.com/vllm-project/vllm/issues/17604

Is this the same issue that prevented Qwen2.5-72B from running with tensor parallelism, namely that the weights need to be padded before being quantized to int4? https://qwen.readthedocs.io/zh-cn/latest/quantization/gptq.html#troubleshooting
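For context, the Qwen troubleshooting page describes the failure as a divisibility problem: when a GPTQ int4 checkpoint is sharded across GPUs, each shard of `intermediate_size` must still line up with the quantization group size (typically 128), otherwise vLLM refuses to load it. A minimal sketch of that check, assuming this is the same constraint here (the function name and padding logic are illustrative, not from vLLM):

```python
def shardable(intermediate_size: int, tp_size: int, group_size: int = 128) -> bool:
    """Check whether a GPTQ checkpoint can be split across tp_size GPUs.

    Assumption: each per-GPU shard of intermediate_size must be a whole
    multiple of the quantization group_size.
    """
    if intermediate_size % tp_size != 0:
        return False
    return (intermediate_size // tp_size) % group_size == 0


def padded_size(intermediate_size: int, tp_size: int, group_size: int = 128) -> int:
    """Smallest intermediate_size >= the original that shards cleanly."""
    multiple = tp_size * group_size
    return -(-intermediate_size // multiple) * multiple  # ceil to a multiple


# Qwen2.5-72B has intermediate_size = 29568 (divisible by 128, so TP=1 works)
print(shardable(29568, 1))  # True
print(shardable(29568, 2))  # False: 14784 is not a multiple of 128
print(padded_size(29568, 8))  # 29696, padded before quantization as in the Qwen docs
```

If the same divisibility check fails for this model, the fix the Qwen docs suggest is to zero-pad the MLP weights to the padded size before quantizing, rather than trying to load the existing quantized checkpoint with TP > 1.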
