Error when running the model with the code instructions from the model card
I was trying to run this model using the code instructions from the model card and ran into the error below. Note that I am running it on a Google Colab T4 runtime.
Code to replicate:
```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_dir = 'cmeraki/OpenHathi-7B-Hi-v0.1-Base-gptq'
model = AutoGPTQForCausalLM.from_quantized(model_dir, device="cuda:0")
tokenizer = AutoTokenizer.from_pretrained(model_dir, fast=True)
tokens = tokenizer("do aur do", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**tokens, max_length=1024)[0]))
```
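For reference, the library versions in the runtime can be printed with a snippet like the one below (this is my addition for context, not part of the model card instructions); I suspect a transformers / auto-gptq version mismatch may be involved.

```python
# Library versions in the Colab runtime (added for context; not part of the
# model card instructions). "auto-gptq" is the distribution name on PyPI.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("transformers", "auto-gptq", "optimum", "torch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```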
Error:
```
WARNING:auto_gptq.nn_modules.fused_llama_mlp:Skipping module injection for FusedLlamaMLPForQuantizedModel as currently not supported with use_triton=False.
TypeError Traceback (most recent call last)
in <cell line: 10>()
8 tokens = tokenizer("do aur do", return_tensors="pt").to(model.device)
9
---> 10 print(tokenizer.decode(model.generate(**tokens, max_length=1024)[0]))
4 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py in prepare_inputs_for_generation(self, input_ids, past_key_values, attention_mask, inputs_embeds, **kwargs)
1082 ):
1083 if past_key_values is not None:
-> 1084 past_length = past_key_values[0][0].shape[2]
1085
1086 # Some generation methods already pass only the last input ID
TypeError: 'NoneType' object is not subscriptable
```
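From the traceback, past_key_values is None inside prepare_inputs_for_generation, which looks related to the fused attention module that auto_gptq injects by default (the warning above already shows fused MLP injection being skipped). Below is a rough, untested sketch of a possible workaround, assuming from_quantized accepts the inject_fused_attention / inject_fused_mlp flags in the installed auto_gptq version:

```python
# Untested workaround sketch: load without auto_gptq's fused module injection.
# Assumes inject_fused_attention / inject_fused_mlp are supported by the
# installed auto_gptq version.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_dir = 'cmeraki/OpenHathi-7B-Hi-v0.1-Base-gptq'
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",
    inject_fused_attention=False,
    inject_fused_mlp=False,
)
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)

tokens = tokenizer("do aur do", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**tokens, max_length=1024)[0]))
```

I have not verified that this avoids the error; sharing it in case it helps narrow down the cause.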