Hugging Face welcomes the Aya Expanse family of multilingual models

Community Article · Published October 24, 2024


Cohere releases two strong multilingual models:

  1. aya-expanse-8b
  2. aya-expanse-32b

The models provide support for the following 23 languages, ensuring inclusivity for a broad range of users:

Language    | Script
----------- | ----------------
Hindi       | हिंदी
Turkish     | Türkçe
Persian     | فارسی
Indonesian  | Bahasa Indonesia
Arabic      | العربية
Chinese     | 中文
Czech       | Čeština
Dutch       | Nederlands
English     | English
French      | Français
German      | Deutsch
Italian     | Italiano
Greek       | Ελληνικά
Japanese    | 日本語
Korean      | 한국어
Polish      | Polski
Portuguese  | Português
Romanian    | Română
Russian     | Русский
Spanish     | Español
Ukrainian   | Українська
Vietnamese  | Tiếng Việt
Hebrew      | עברית

Both models' weights have been uploaded to the Hugging Face Hub under the CC-BY-NC license, with the additional requirement of complying with C4AI's Acceptable Use Policy.

You will find an in-depth analysis of the models and how they were trained in the official blog post by the Cohere team; in this blog post, we will help you get started with the models using the Hugging Face ecosystem.

Try out the model

Before downloading the weights, you can vibe check the model in the official Space hosted on Hugging Face.

Getting Started

Here is how you can use the pipeline API to quickly generate text with the model.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "CohereForAI/aya-expanse-8b"  # or "CohereForAI/aya-expanse-32b"
dtype = torch.float16

# Load the tokenizer and the model, letting device_map="auto"
# place the weights across the available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=dtype,
    device_map="auto",
)

generator = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
)

# Turkish: "Write a letter to my mom telling her how much I love her"
messages = [
    {"role": "user", "content": "Anneme onu ne kadar sevdiğimi anlatan bir mektup yaz"},
]
outputs = generator(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1]["content"])

Quantize the model

We at Hugging Face do not want people with limited GPU access to shy away from using the model. You can quantize the model to 4-bit with bitsandbytes and run inference on a free-tier Colab instance!

!pip install -Uq bitsandbytes

import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    pipeline,
)

model_id = "CohereForAI/aya-expanse-32b"
dtype = torch.float16

tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit NF4 quantization with double quantization to further reduce
# the memory footprint; compute still happens in float16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=dtype,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=dtype,
    quantization_config=quantization_config,
)

generator = pipeline(
    model=quantized_model,
    tokenizer=tokenizer,
    task="text-generation",
)

# Turkish: "Write a letter to my mom telling her how much I love her"
messages = [
    {"role": "user", "content": "Anneme onu ne kadar sevdiğimi anlatan bir mektup yaz"},
]
outputs = generator(messages, max_new_tokens=128)
print(outputs[0]["generated_text"][-1]["content"])
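
To sanity-check how much memory the quantized weights actually take, you can use the get_memory_footprint() method that transformers exposes on the loaded model:

# Report the quantized model's approximate memory footprint in GB.
print(f"Memory footprint: {quantized_model.get_memory_footprint() / 1e9:.1f} GB")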

From the Cohere community

Fine-tune the model

Here is a notebook that shows an end-to-end pipeline for fine-tuning the Aya Expanse models on new languages, with a rough sketch of the idea below.
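
For orientation, a parameter-efficient setup might look like the following. This is a minimal sketch using PEFT's LoRA, assuming typical attention-projection module names; the notebook is the authoritative recipe.

from peft import LoraConfig, get_peft_model

# A minimal LoRA configuration. The target module names below are an
# assumption (typical attention projections) — check the notebook for
# the exact setup used with Aya Expanse.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)  # `model` loaded as above
peft_model.print_trainable_parameters()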

Other notebooks

The following notebooks, contributed by Cohere For AI community members, showcase how Aya Expanse can be used for different use cases:

  1. Multilingual Writing Assistant
  2. AyaMCooking
  3. Multilingual Question-Answering System

Acknowledgement

Thanks to Pedro Cuenca for reviewing the blog post.