How to load on A100 40GB with 8bit precision?
#5
by matorus - opened
The model card states that "The size of MPT-30B was also specifically chosen to make it easy to deploy on a single GPU—either 1xA100-80GB in 16-bit precision or 1xA100-40GB in 8-bit precision."
What is the correct way of loading the model in 8-bit precision?
When I tried loading with dtype=torch.uint8, the loader complained:
ValueError: Can't instantiate MPTForCausalLM model under dtype=torch.uint8 since it is not a floating point dtype
Adding load_in_8bit=True solves the problem.
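A minimal sketch of what that looks like, assuming the mosaicml/mpt-30b checkpoint and that transformers, accelerate, and bitsandbytes are installed (load_in_8bit=True together with device_map="auto" is the standard transformers 8-bit loading path, not something specific to this thread):

```python
# Minimal sketch: load MPT-30B in 8-bit on a single A100-40GB.
# Assumes `pip install transformers accelerate bitsandbytes` and the
# mosaicml/mpt-30b repo; adjust the model id for your variant.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-30b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,       # quantize weights to int8 via bitsandbytes
    device_map="auto",       # required with load_in_8bit; places layers on the GPU
    trust_remote_code=True,  # MPT uses custom modeling code from the hub repo
)

inputs = tokenizer("MPT-30B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that torch_dtype (or dtype) must stay a floating-point type; int8 quantization is handled by bitsandbytes through load_in_8bit, not by passing torch.uint8, which is why the original attempt raised the ValueError above.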
matorus changed discussion status to closed