Hugging Face welcomes the Aya Expanse family of multilingual models
Cohere releases two strong multilingual models: Aya Expanse 8B and Aya Expanse 32B.
The models support the following 23 languages, making them accessible to a broad range of users:
Language | Native Name |
---|---|
Hindi | हिंदी |
Turkish | Türkçe |
Persian | فارسی |
Indonesian | Bahasa Indonesia |
Arabic | العربية |
Chinese | 中文 |
Czech | Čeština |
Dutch | Nederlands |
English | English |
French | Français |
German | Deutsch |
Italian | Italiano |
Greek | Ελληνικά |
Japanese | 日本語 |
Korean | 한국어 |
Polish | Polski |
Portuguese | Português |
Romanian | Română |
Russian | Русский |
Spanish | Español |
Ukrainian | Українська |
Vietnamese | Tiếng Việt |
Hebrew | עברית |
Both model weights have been uploaded to the Hugging Face Hub under the CC-BY-NC license, with the additional requirement to comply with C4AI's Acceptable Use Policy.
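If you would like the weights on disk before loading them, here is a minimal sketch using `huggingface_hub` (it assumes you have already accepted the license terms on the Hub and, if required, logged in with `huggingface-cli login`):

```python
from huggingface_hub import snapshot_download

# Download the full model repository to the local Hugging Face cache.
# Assumes the license terms have already been accepted on the Hub.
local_dir = snapshot_download("CohereForAI/aya-expanse-8b")
print(local_dir)
```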
You will find an in-depth analysis of the models and how they were trained in the official blog post by the Cohere team. In this blog post, we will help you get started with the models using the Hugging Face ecosystem.
Try out the model
Before downloading the weights, you can vibe-check the model using the official Space hosted on Hugging Face.
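You can also query a Space programmatically with `gradio_client`. The Space id and endpoint below are placeholders to illustrate the pattern; check the Space's "Use via API" panel for the real values:

```python
from gradio_client import Client

# Placeholder Space id and endpoint -- replace them with the values
# shown in the official Space's "Use via API" panel.
client = Client("CohereForAI/aya-expanse-demo")
result = client.predict("Merhaba! Nasılsın?", api_name="/chat")
print(result)
```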
Getting Started
Here is how you can use the pipeline API to quickly generate text from the model.
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/aya-expanse-8b"  # or "CohereForAI/aya-expanse-32b"
dtype = torch.float16
device = "auto"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=dtype,
    device_map=device,
)

generator = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
)

# Turkish for: "Write a letter to my mom telling her how much I love her"
messages = [
    {"role": "user", "content": "Anneme onu ne kadar sevdiğimi anlatan bir mektup yaz"},
]

generator(
    messages,
    max_new_tokens=128,
)[0]["generated_text"][-1]["content"]
Quantize the model
We at Hugging Face do not want people with limited GPU access to shy away from using the model. You can quantize the model and run inference on a free-tier Colab instance!
!pip install -Uq bitsandbytes

import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    pipeline,
)

model_id = "CohereForAI/aya-expanse-32b"
dtype = torch.float16
device = "auto"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit NF4 quantization with double quantization to reduce the memory footprint
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=dtype,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map=device,
    torch_dtype=dtype,
    quantization_config=quantization_config,
)

generator = pipeline(
    model=quantized_model,
    tokenizer=tokenizer,
    task="text-generation",
)

# Turkish for: "Write a letter to my mom telling her how much I love her"
messages = [
    {"role": "user", "content": "Anneme onu ne kadar sevdiğimi anlatan bir mektup yaz"},
]

generator(messages, max_new_tokens=128)[0]["generated_text"][-1]["content"]
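To sanity-check that the quantized model actually fits your GPU budget, you can inspect its in-memory size with `get_memory_footprint`; a small sketch:

```python
# Report the model's in-memory size (bytes -> GiB).
footprint_gib = quantized_model.get_memory_footprint() / 1024**3
print(f"Quantized model memory footprint: {footprint_gib:.2f} GiB")
```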
From the Cohere community
Fine-tune the model
Here is a notebook that shows an end-to-end pipeline for fine-tuning the Aya Expanse models on new languages.
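The notebook covers the full workflow; as a rough orientation, a parameter-efficient setup with `peft` might look like the sketch below (the LoRA hyperparameters and target modules are illustrative assumptions, not the notebook's actual settings):

```python
from peft import LoraConfig, get_peft_model

# Illustrative LoRA configuration -- tune r, alpha, and target modules
# to your task; the linked notebook's settings may differ.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Wrap the base model loaded earlier so only the LoRA adapters are trained.
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
```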
Other notebooks
The following notebooks, contributed by Cohere For AI community members, showcase how Aya Expanse can be used for different use cases:
Acknowledgement
Thanks to Pedro Cuenca for reviewing the blog post.