Hugging Face welcomes the Aya Expanse family of multilingual models

Community Article · Published October 24, 2024


Cohere releases two strong multilingual models:

  1. aya-expanse-8b
  2. aya-expanse-32b

The models provide support for the following 23 languages, ensuring inclusivity for a broad range of users:

Language    | Script
----------- | ----------------
Hindi       | हिंदी
Turkish     | Türkçe
Persian     | فارسی
Indonesian  | Bahasa Indonesia
Arabic      | العربية
Chinese     | 中文
Czech       | Čeština
Dutch       | Nederlands
English     | English
French      | Français
German      | Deutsch
Italian     | Italiano
Greek       | Ελληνικά
Japanese    | 日本語
Korean      | 한국어
Polish      | Polski
Portuguese  | Português
Romanian    | Română
Russian     | Русский
Spanish     | Español
Ukrainian   | Українська
Vietnamese  | Tiếng Việt
Hebrew      | עברית

Both models' weights have been uploaded to the Hugging Face Hub under the CC-BY-NC license, with the additional requirement of complying with C4AI's Acceptable Use Policy.

You will find an in-depth analysis of the models and how they were trained in the official blog post by the Cohere team; in this blog post, we will help you get started with the models using the Hugging Face ecosystem.

Try out the model

Before downloading the weights, you can vibe check the model in the official Space hosted on Hugging Face.

Getting Started

Here is how you can use the pipeline API to quickly generate text with the model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "CohereForAI/aya-expanse-8b"  # or "CohereForAI/aya-expanse-32b"
dtype = torch.float16

# Load the tokenizer and the model, letting device_map="auto"
# place the weights across the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=dtype,
    device_map="auto",
)

generator = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
)

# Turkish: "Write a letter to my mom telling her how much I love her"
messages = [
    {"role": "user", "content": "Anneme onu ne kadar sevdiğimi anlatan bir mektup yaz"},
]
outputs = generator(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1]["content"])

Quantize the model

We at Hugging Face do not want people with limited GPU access to shy away from using the model. You can quantize the model to 4-bit with bitsandbytes and run inference on a free-tier Colab instance!

!pip install -Uq bitsandbytes

import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    pipeline,
)

model_id = "CohereForAI/aya-expanse-32b"
dtype = torch.float16

tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit NF4 quantization with double quantization to further reduce
# the memory footprint; compute still happens in float16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=dtype,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=dtype,
    quantization_config=quantization_config,
)

generator = pipeline(
    model=quantized_model,
    tokenizer=tokenizer,
    task="text-generation",
)

# Turkish: "Write a letter to my mom telling her how much I love her"
messages = [
    {"role": "user", "content": "Anneme onu ne kadar sevdiğimi anlatan bir mektup yaz"},
]
outputs = generator(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1]["content"])
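
To sanity-check how much memory the quantized weights actually take, you can use the get_memory_footprint() method that transformers exposes on the loaded model:

# Report the quantized model's approximate memory footprint in GB.
print(f"Memory footprint: {quantized_model.get_memory_footprint() / 1e9:.1f} GB")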

From the Cohere community

Fine-tune the model

Here is a notebook that shows an end-to-end pipeline for fine-tuning the Aya Expanse models on new languages, with a rough sketch of the idea below.
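
For orientation, a parameter-efficient setup might look like the following. This is a minimal sketch using PEFT's LoRA, assuming typical attention-projection module names; the notebook is the authoritative recipe.

from peft import LoraConfig, get_peft_model

# A minimal LoRA configuration. The target module names below are an
# assumption (typical attention projections) — check the notebook for
# the exact setup used with Aya Expanse.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)  # `model` loaded as above
peft_model.print_trainable_parameters()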

Other notebooks

The following notebooks, contributed by Cohere For AI community members, showcase how Aya Expanse can be used for different use cases:

  1. Multilingual Writing Assistant
  2. AyaMCooking
  3. Multilingual Question-Answering System

Acknowledgement

Thanks to Pedro Cuenca for reviewing the blog post.