---
library_name: transformers
tags: []
---
# yujiepan/llama-3-tiny-random-gptq-w4
4-bit weight-only quantization by AutoGPTQ on [yujiepan/llama-3-tiny-random](https://huggingface.co/yujiepan/llama-3-tiny-random).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "yujiepan/llama-3-tiny-random"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weight-only GPTQ; group_size=-1 uses a single quantization group
# per output channel. The "c4" calibration set is downloaded and tokenized
# automatically with the provided tokenizer.
quantization_config = GPTQConfig(
    bits=4,
    group_size=-1,
    dataset="c4",
    tokenizer=tokenizer,
)

# Passing a GPTQConfig makes quantization run during model loading.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
```
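To build intuition for what `bits=4, group_size=-1` means, here is a minimal NumPy sketch of grouped weight-only fake quantization. It is an illustration of the grouping scheme, not AutoGPTQ's actual algorithm (GPTQ additionally uses calibration data and second-order error correction); the function name `fake_quantize` and the asymmetric min/max rounding are assumptions for this sketch.

```python
import numpy as np

def fake_quantize(weight, bits=4, group_size=-1):
    """Quantize-dequantize each row of `weight` in groups along the input
    dimension. group_size=-1 means one group spanning the whole row, i.e.
    a single scale/zero-point per output channel."""
    out_features, in_features = weight.shape
    g = in_features if group_size == -1 else group_size
    qmax = 2 ** bits - 1  # 15 levels above zero for 4-bit
    result = np.empty_like(weight, dtype=np.float32)
    for start in range(0, in_features, g):
        block = weight[:, start:start + g]
        # Asymmetric min/max quantization: one (scale, zero) pair per
        # row within this group.
        wmin = block.min(axis=1, keepdims=True)
        wmax = block.max(axis=1, keepdims=True)
        scale = np.maximum(wmax - wmin, 1e-8) / qmax
        q = np.clip(np.round((block - wmin) / scale), 0, qmax)
        result[:, start:start + g] = q * scale + wmin
    return result

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
w_q = fake_quantize(w, bits=4, group_size=-1)
# The reconstruction error per weight is bounded by half a quantization
# step (scale / 2 for that group).
print(float(np.abs(w - w_q).max()))
```

Smaller positive `group_size` values (e.g. 128) give each group its own scale, which usually lowers quantization error at the cost of storing more scales; `-1` is the coarsest setting.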