metadata

license: llama3.1
pipeline_tag: text-generation
tags:
  - facebook
  - meta
  - pytorch
  - llama
  - llama-3
datasets:
  - Kushtrim/alpaca-cleaned-sq
language:
  - sq

Kushtrim/Llama-3.1-8B-Instruct-bnb-4bit-shqip

Model overview

Kushtrim/Llama-3.1-8B-Instruct-bnb-4bit-shqip is a fine-tuned version of Llama 3.1 8B Instruct, adapted for Albanian-language tasks. It is quantized to 4-bit precision with bitsandbytes (bnb), which reduces memory use at inference time compared with higher-precision weights.

Model Details

Model Name: Kushtrim/Llama-3.1-8B-Instruct-bnb-4bit-shqip
Base Model: Meta-Llama-3.1-8B-Instruct
Model Size: 8 billion parameters
Quantization: 4-bit precision (bitsandbytes, bnb)
Language: Albanian
License: llama3.1
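
As a rough illustration of what the 4-bit quantization listed above means for memory, the sketch below estimates weight storage for 8 billion parameters at two bit widths. The helper name `weight_footprint_gb` is hypothetical, and the estimate assumes every parameter is stored at the stated width. In practice some layers are usually kept in higher precision and quantization metadata adds overhead, so treat the 4-bit figure as a lower bound rather than a measured value.

```python
def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes).

    Assumption: all parameters use the same bit width and no extra
    quantization metadata is stored.
    """
    return num_params * bits_per_param / 8 / 1e9


params = 8e9  # "8 billion parameters" from the model details above
fp16_gb = weight_footprint_gb(params, 16)
int4_gb = weight_footprint_gb(params, 4)
print(f"16-bit: ~{fp16_gb:.0f} GB, 4-bit: ~{int4_gb:.0f} GB")
```

Under these assumptions the weights shrink by a factor of four. Activations and the key-value cache are not included in either figure.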

Limitations

Representation of Harms & Stereotypes: Potential for biased outputs reflecting real-world societal biases.
Inappropriate or Offensive Content: Risk of generating content that may be offensive or inappropriate in certain contexts.
Information Reliability: Possibility of producing inaccurate or outdated information.
Dataset Size: The Albanian dataset used for fine-tuning (Kushtrim/alpaca-cleaned-sq) is relatively small, which may limit the model's performance and topical coverage.

Intended Use

Intended Use Cases: This model is suitable for various NLP tasks in Albanian, including conversational AI, text generation, and language understanding.
Out-of-scope Use: This model should not be used in ways that violate laws, regulations, or ethical guidelines. It is also not intended for use in languages other than Albanian unless appropriately fine-tuned.

Responsible AI Considerations

Developers using this model should:

Evaluate and mitigate risks related to accuracy, safety, and fairness.
Ensure compliance with applicable laws and regulations.
Implement additional safeguards for high-risk scenarios and sensitive contexts.
Inform end-users that they are interacting with an AI system.
Use feedback mechanisms and grounding techniques such as retrieval-augmented generation (RAG) to improve output reliability.
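
One way to apply the grounding recommendation is to place retrieved passages directly in the chat messages, so the model answers from supplied text rather than from memory alone. The following is a minimal sketch, not the author's method: the function name `build_grounded_messages` and the `Context:` / `Question:` labels are hypothetical placeholders (Albanian labels may suit this model better), and the retrieval step that produces `passages` is assumed to exist elsewhere.

```python
from typing import Dict, List


def build_grounded_messages(question: str, passages: List[str]) -> List[Dict[str, str]]:
    """Build chat messages that carry retrieved passages alongside the question."""
    # Number each passage so an answer can point back to its source.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return [
        {"role": "system", "content": "Je një asistent inteligjent shumë i dobishëm."},
        # Hypothetical layout: context first, then the user's question.
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


grounded_messages = build_grounded_messages(
    "Identifiko emrat e personave në këtë artikull.",
    ["Majlinda Kelmendi (lindi më 9 maj 1991), është një xhudiste shqiptare nga Peja, Kosovë."],
)
print(grounded_messages[1]["content"])
```

The resulting list has the same shape as a `messages` list for `tokenizer.apply_chat_template`, so it can be passed through the same chat-template step.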

!pip3 install -U transformers peft accelerate bitsandbytes

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

hf_token = "hf_...."

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "Kushtrim/Llama-3.1-8B-Instruct-bnb-4bit-shqip",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    token=hf_token,
)

tokenizer = AutoTokenizer.from_pretrained("Kushtrim/Llama-3.1-8B-Instruct-bnb-4bit-shqip", token=hf_token)

messages = [
    {"role": "system", "content": "Je një asistent inteligjent shumë i dobishëm."},
    {"role": "user", "content": "Identifiko emrat e personave në këtë artikull 'Majlinda Kelmendi (lindi më 9 maj 1991), është një xhudiste shqiptare nga Peja, Kosovë.'"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 2048,
    "return_full_text": False,
    "temperature": 0.9,
    "do_sample": True,
}

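# add_generation_prompt=True appends the assistant header so the model
# writes a reply instead of continuing the user's turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)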
output = pipe(prompt, **generation_args)
print(output[0]['generated_text'])

Acknowledgements

This model builds on Meta-Llama-3.1-8B-Instruct, further fine-tuned for Albanian-language tasks. Special thanks to the developers and researchers who contributed to the original Llama 3.1.