Quantization support.

#1
by AV99 - opened

Are there any plans to release 8-bit support for this?

Add `_no_split_modules = ["CodeT5pBlock"]` to the `CodeT5pEncoderDecoderModel` class in modeling_codet5p.py, and `device_map="auto"` will work. Then you can use bitsandbytes for 8-bit inference, which lets you run this model on a 24 GB GPU.

```python
import transformers

# checkpoint: the CodeT5+ model id / local path containing the patched modeling code
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint, device_map="auto", load_in_8bit=True,
    low_cpu_mem_usage=True, trust_remote_code=True,
)
```
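For context, `_no_split_modules` is the standard transformers class attribute that tells accelerate which submodules must stay whole (on a single device) when it builds the automatic device map. A minimal sketch of the edit, with the rest of the class body in modeling_codet5p.py elided:

```python
# modeling_codet5p.py — sketch of the one-line change; the existing
# class body is unchanged and elided here.
class CodeT5pEncoderDecoderModel(PreTrainedModel):  # existing definition
    # Keep each CodeT5pBlock on a single device when accelerate shards
    # the model with device_map="auto".
    _no_split_modules = ["CodeT5pBlock"]
    ...
```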

If you are a Windows user, you can find a bnb build here: https://github.com/acpopescu/bitsandbytes/releases

Hey Verah, for https://huggingface.co/mosaicml/mpt-7b-instruct, where should I add `_no_split_modules`, and what should the value be?

Thanks in advance.
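For what it's worth, in MPT's remote modeling code the transformer layer class is `MPTBlock`, so the analogous edit would look something like the sketch below. The class names are taken from mosaicml's modeling_mpt.py and may differ between repo revisions, so treat this as an assumption to verify, not a confirmed fix:

```python
# modeling_mpt.py — sketch; the existing class body is elided.
class MPTPreTrainedModel(PreTrainedModel):  # existing definition
    # Keep each MPTBlock whole when accelerate shards the model with
    # device_map="auto".
    _no_split_modules = ["MPTBlock"]
    ...
```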

Are there any plans to release 4-bit support for this? Thanks.
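As a side note, recent transformers releases support 4-bit loading through bitsandbytes via `BitsAndBytesConfig`. A minimal sketch, assuming a bitsandbytes build with 4-bit support (>= 0.39) and the same patched `checkpoint` as in the 8-bit example above:

```python
import torch
import transformers

# Configure 4-bit (NF4) quantization; computation runs in fp16.
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = transformers.AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,  # same CodeT5+ checkpoint as in the 8-bit example
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```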
