qnemo file

#6
by willy1212009 - opened

Has anyone run the PTQ from nemo-framework to produce an fp8/int4 qnemo file for nemotron-340b? The conversion is supposed to need 16 H100 or 8 H200 GPUs, but we don't have that equipment QQ.
It's odd that we want to quantize the model, yet the quantization step itself needs 16 H100s first lol.
The paper says that with quantization, only 8 H100s are needed.

https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/ptq.html
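
For context, here is a rough back-of-the-envelope check of why the GPU counts above probably come out this way. It is only a sketch: it counts weight memory alone (no activations, KV cache, or calibration overhead), assumes roughly 340e9 parameters, and assumes 80 GB per H100 and 141 GB per H200. The helper names `weight_gb` and `fits` are made up for illustration and are not part of NeMo.

```python
# Rough weight-memory estimate only; real PTQ needs extra headroom for
# activations and calibration, so treat "fits" as a lower bound.
PARAMS = 340e9  # approximate parameter count (assumption)

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}
GPU_MEMORY_GB = {"H100": 80, "H200": 141}  # assumed HBM per GPU


def weight_gb(precision: str) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return PARAMS * BYTES_PER_PARAM[precision] / 1e9


def fits(precision: str, gpu: str, count: int) -> bool:
    """True if the weights alone fit in the combined memory of `count` GPUs."""
    return weight_gb(precision) <= GPU_MEMORY_GB[gpu] * count


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_gb(precision):.0f} GB of weights")
    for precision, gpu, count in [
        ("bf16", "H100", 8),
        ("bf16", "H100", 16),
        ("bf16", "H200", 8),
        ("fp8", "H100", 8),
    ]:
        print(f"{precision} on {count}x{gpu}: fits={fits(precision, gpu, count)}")
```

Under these assumptions the BF16 checkpoint is too large for 8 H100s, which would explain why the conversion has to load on 16 H100s or 8 H200s, while the FP8 result fits on 8 H100s, as the paper says.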

NVIDIA org

There's some quantization work in progress, though not sure about int4 yet. It will be shared once fully validated.
