Python instructions didn't just work
#2 by Njax - opened
I'm very new to GPTQ, so please excuse this message if I'm in error, but I couldn't get the example Python code to work as written. I wound up changing it to this:
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Assumed setup, not shown in my original snippet: point these at your model.
model_basename = "path/to/the-quantized-model"  # local dir or repo id (placeholder)
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_basename, trust_remote_code=True)

model = AutoGPTQForCausalLM.from_quantized(model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

user_input = '''
// A javascript function
function printHelloWorld() {
'''

inputs = tokenizer(user_input, return_tensors="pt").to(model.device)
# generate() returns token ids rather than embeddings, hence the name:
output_ids = model.generate(**inputs, max_new_tokens=40)[0]
outputs = tokenizer.decode(output_ids)
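In case anyone else finds the echoed prompt noisy, here's a minimal sketch (assuming the same tokenizer, inputs, and output_ids objects from above) that decodes only the newly generated tokens:

# Slice off the prompt tokens so only the completion is decoded.
prompt_len = inputs["input_ids"].shape[1]
completion = tokenizer.decode(output_ids[prompt_len:], skip_special_tokens=True)
print(completion)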
I used CUDA 11.7, torch 2.0.1+cu117, and auto-gptq 0.2.2, which perhaps explains the difference.
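For anyone wanting to compare environments, here's a quick way to print the same version info (standard-library and torch calls only, nothing specific to this model):

import importlib.metadata
import torch

print(torch.__version__)                        # e.g. 2.0.1+cu117
print(torch.version.cuda)                       # e.g. 11.7
print(importlib.metadata.version("auto-gptq"))  # e.g. 0.2.2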
Thanks for uploading this. Confusing stuff at times, but it sure is exciting to try something new!