aya-expanse-32b-gptq-4bit

Model Summary

This repository contains a quantized version of the CohereForAI/aya-expanse-32b model, produced with the GPTQ method in 4-bit precision.
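
The exact quantization recipe used for this checkpoint is not documented here. As a reference point, a GPTQ 4-bit quantization of the base model can be produced with the Transformers GPTQ integration; the calibration dataset and group size below are illustrative assumptions, not necessarily the settings actually used:

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_model = "CohereForAI/aya-expanse-32b"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Illustrative settings: 4-bit weights, group size 128, calibration on the c4 dataset
gptq_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

# Quantization runs during loading and needs enough GPU memory to calibrate a 32B model
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    quantization_config=gptq_config,
)

model.save_pretrained("aya-expanse-32b-gptq-4bit")
tokenizer.save_pretrained("aya-expanse-32b-gptq-4bit")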

How to Use the Quantized Model

1. Install the necessary packages

Before using the quantized model, please ensure your environment has PyTorch, Transformers, and a GPTQ inference backend installed.
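
The exact package versions are not pinned for this checkpoint. A typical setup (assuming the auto-gptq backend, which Transformers loads through optimum; recent Transformers releases can use gptqmodel instead) looks like:

pip install torch transformers accelerate optimum auto-gptq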

2. Run inference

Load and use the quantized model in Python as shown below:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set up device (adjust the index to match the GPU you want to use)
device = torch.device('cuda:1')

# Load model and tokenizer
model_name = "kevinbazira/aya-expanse-32b-gptq-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, 
    device_map={"": device.index}
)

# Prepare input
# https://huggingface.co/docs/transformers/en/pad_truncation
input_text = "Add your prompt here."
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding="max_length", max_length=64)
inputs = {key: value.to(device) for key, value in inputs.items()}

# Perform text generation (do_sample=False with num_beams=1 means greedy decoding)
# https://huggingface.co/docs/transformers/en/main_classes/text_generation
outputs = model.generate(
    **inputs,
    num_return_sequences=1,
    min_new_tokens=64,
    max_new_tokens=64,
    do_sample=False,
    use_cache=True,
    num_beams=1
)

# Decode and print the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
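
aya-expanse-32b is an instruction-tuned chat model, so prompts formatted with the tokenizer's chat template will usually behave better than raw text. A minimal sketch, reusing the model, tokenizer, and device loaded above (the prompt is a placeholder):

# Build a chat-formatted prompt with the tokenizer's built-in template
messages = [{"role": "user", "content": "Add your prompt here."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(device)

# Greedy generation, as in the example above
outputs = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))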

More Information

  • Original Model: For details about the original model's architecture, training dataset, and performance, please visit the CohereForAI aya-expanse-32b model card.
  • Support or inquiries: If you run into any issues or have questions about the quantized model, feel free to reach out via email: contact@kevinbazira.com. I'll be happy to help!