4-bit (GPTQ, Int4) quantized version of irlab-udc/Llama-3.1-8B-Instruct-Galician.
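
As a rough illustration of what 4-bit quantization saves, the sketch below estimates the memory needed for the weights alone. It assumes a nominal 8 billion parameters (taken from the "8B" in the model name) and ignores quantization scales, zero points, any layers left unquantized, activations, and the KV cache, so real usage will be higher. `estimate_weight_gb` is a hypothetical helper written for this example, not part of any library.

```python
def estimate_weight_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: nominal parameter count implied by "8B" in the model name.
NOMINAL_PARAMS = 8e9

fp16_gb = estimate_weight_gb(NOMINAL_PARAMS, 16)
int4_gb = estimate_weight_gb(NOMINAL_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gb:.0f} GB, 4-bit weights: ~{int4_gb:.0f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the space of 16-bit weights.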

How to Use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "irlab-udc/Llama-3.1-8B-Instruct-Galician-GPTQ-Int4"

# Load the tokenizer and the quantized model; device_map="auto" places the weights automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
  model_id,
  torch_dtype=torch.float16,
  low_cpu_mem_usage=True,
  device_map="auto",
)

messages = [
  {"role": "system", "content": "You are a conversational AI that responds in Galician."},
  {"role": "user", "content": "Cal é a principal vantaxe de Scrum?"},
]

# Apply the chat template and move the inputs to the device the model was loaded on.
inputs = tokenizer.apply_chat_template(
  messages,
  tokenize=True,
  add_generation_prompt=True,
  return_tensors="pt",
  return_dict=True,
).to(model.device)

# Sample up to 512 new tokens.
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=512)

# The decoded text contains the prompt followed by the generated reply.
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
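
The text printed by the example includes the prompt as well as the reply, because for decoder-only models `generate` returns the input tokens followed by the new ones. Below is a minimal sketch of keeping only the new tokens. Plain lists stand in for the token IDs so it runs without loading the model, and `strip_prompt` is a hypothetical helper written for this example.

```python
def strip_prompt(sequences, prompt_len):
    """Drop the first prompt_len token IDs from each generated sequence."""
    return [seq[prompt_len:] for seq in sequences]

# Toy stand-in: a 3-token prompt followed by 2 generated tokens.
fake_outputs = [[101, 102, 103, 7, 8]]
new_tokens = strip_prompt(fake_outputs, prompt_len=3)
print(new_tokens)  # [[7, 8]]
```

With the real tensors from the example, the equivalent slice would be `outputs[:, inputs["input_ids"].shape[1]:]`, which can then be passed to `tokenizer.batch_decode`.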

Model size: 1.99B params (Safetensors). Tensor types: I32, BF16.