How to load on A100 40GB with 8bit precision?

#5
by matorus - opened

The model card states that "The size of MPT-30B was also specifically chosen to make it easy to deploy on a single GPU—either 1xA100-80GB in 16-bit precision or 1xA100-40GB in 8-bit precision."

What is the correct way of loading the model in 8-bit precision?

When I tried loading with dtype=torch.uint8, the loader complains:

ValueError: Can't instantiate MPTForCausalLM model under dtype=torch.uint8 since it is not a floating point dtype

Adding load_in_8bit=True (and dropping the uint8 dtype) solves the problem.
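For reference, a minimal sketch of what worked for me, assuming transformers, accelerate, and bitsandbytes are installed and that the model id is mosaicml/mpt-30b (taken from the model card context):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-30b"  # assumed model id

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit=True quantizes the weights to int8 via bitsandbytes;
# do not also pass a non-floating-point torch_dtype such as torch.uint8.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,  # MPT uses custom modeling code from the Hub
)
```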

matorus changed discussion status to closed