How do I run inference on this model with transformers?

#2 opened by kopyl

This gives an error:

from transformers import LlamaForCausalLM

model_name = "AzureBlack/airoboros-l2-70b-2.2.1-5bpw-6h-exl2"
model = LlamaForCausalLM.from_pretrained(
    model_name,
    use_safetensors=True,
)

Could not locate model-00001-of-00015.safetensors inside AzureBlack/airoboros-l2-70b-2.2.1-5bpw-6h-exl2.

Transformers won't work with this model. You have to use ExLlamaV2 for this version. There is a link to the original model in the card if you want a version that works with transformers.

https://github.com/turboderp/exllamav2
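
For reference, here is a minimal sketch of what inference looks like with exllamav2, adapted from the example script in that repo. The local model path is a placeholder and the sampler settings are arbitrary; the API may have changed since this was written.

from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Placeholder: local directory containing the downloaded exl2 quant
model_directory = "/path/to/airoboros-l2-70b-2.2.1-5bpw-6h-exl2"

config = ExLlamaV2Config()
config.model_dir = model_directory
config.prepare()

# A lazy cache plus load_autosplit spreads the 70B across available GPUs
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.85
settings.top_p = 0.8

# Generate 150 new tokens from a short prompt
print(generator.generate_simple("Once upon a time,", settings, 150))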

@AzureBlack thanks. Could you please add this note to all your models? :)

@AzureBlack by the way, is there a reason why it doesn't work with transformers? Just curious...

The various quantization methods used to reduce model size all rely on different algorithms to approximate the weight values, and most of them have not been added to the transformers library yet. I don't follow the transformers project closely enough to know exactly why, but they likely have other, higher-priority work to focus on.
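
As an aside, transformers does support a few quantization schemes out of the box. A minimal sketch, assuming you point it at the original unquantized repo (the model id below is a placeholder) and quantize to 4-bit on the fly with bitsandbytes:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder: use the original full-precision repo linked in the model card
model_name = "path/to/original-airoboros-l2-70b-2.2.1"

# Quantize the weights to 4-bit at load time via bitsandbytes
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Once upon a time,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))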

@AzureBlack thank you very much
