How to load on A100 40GB with 8bit precision?
#5
by matorus - opened
The model card states that "The size of MPT-30B was also specifically chosen to make it easy to deploy on a single GPU—either 1xA100-80GB in 16-bit precision or 1xA100-40GB in 8-bit precision."
What is the correct way of loading the model in 8-bit precision?
When I tried loading with dtype=torch.uint8, the loader complained:
ValueError: Can't instantiate MPTForCausalLM model under dtype=torch.uint8 since it is not a floating point dtype
Adding load_in_8bit=True solves the problem.
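A minimal sketch of what that looks like, assuming the mosaicml/mpt-30b checkpoint and that transformers, accelerate, and bitsandbytes are installed (load_in_8bit=True together with device_map="auto" is the standard transformers 8-bit loading path, not something specific to this thread):

```python
# Minimal sketch: load MPT-30B in 8-bit on a single A100-40GB.
# Assumes `pip install transformers accelerate bitsandbytes` and the
# mosaicml/mpt-30b repo; adjust the model id for your variant.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-30b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,       # quantize weights to int8 via bitsandbytes
    device_map="auto",       # required with load_in_8bit; places layers on the GPU
    trust_remote_code=True,  # MPT uses custom modeling code from the hub repo
)

inputs = tokenizer("MPT-30B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that torch_dtype (or dtype) must stay a floating-point type; int8 quantization is handled by bitsandbytes through load_in_8bit, not by passing torch.uint8, which is why the original attempt raised the ValueError above.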
matorus changed discussion status to closed