Correct some stuff
README.md
CHANGED
@@ -156,7 +156,8 @@ print(tokenizer.decode(outputs[0]))
 ### Training
 
 - Architecture: Same as [mt5-xxl](https://huggingface.co/google/mt5-xxl)
-- Finetuning
+- Number of Finetuning Samples: 25M
+- Batch size: 256
 - Hardware: TPUv4-128
 - Software: T5X, Jax
 
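The training bullets above describe an mt5-xxl-style encoder-decoder checkpoint, and the hunk context (`print(tokenizer.decode(outputs[0]))`) shows the README already carries a generation snippet further up. As a rough illustration of what that snippet does, here is a minimal seq2seq generation sketch; the checkpoint id is a placeholder, not taken from this diff:

```python
# Hedged sketch, not the README's exact snippet; the repo id below is hypothetical.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "org/aya-model"  # placeholder id; substitute the real Hub repo

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)  # mt5-xxl-style seq2seq

inputs = tokenizer("Translate to English: Comment vas-tu?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For scale, the new bullets imply roughly 25,000,000 / 256 ≈ 98k optimizer steps per pass over the finetuning mixture; the card itself does not state a step count, so treat this as back-of-envelope arithmetic.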
@@ -174,47 +175,15 @@ All datasets are subset to the 101 languages supported by [mT5]. See the [paper]
 
 ## Evaluation
 
-> We introduce extensive new evaluation suites that broaden the state-of-the-art for multilingual eval across 99 languages – including discriminative, generative tasks, human evaluation and simulated win rates that cover both held-out tasks and
-> in-distribution performance.
-
-Below, we provide evaluation results for the Aya model on unseen discriminative tasks and in-distribution generative tasks, compared to mT0, BLOOMZ, Bactrian-X 13B, and mT0x. To ensure a fair comparison with our Aya model in terms of language coverage, we finetune a new variant of mT5, which we dub mT0x. It is trained on the original datasets from the xP3 collection, extended to 101 languages (xP3x).
-
-For Multilingual MMLU and Simulated and Human Win-rates, please refer to the [paper](arxiv.com).
-
-### Discriminative Tasks
-
-| Model             | Base Model | IFT Mixture | XCOPA (Acc %) | XNLI (Acc %) | XSC (Acc %) | XWG (Acc %) | **<u>Avg</u>** |
-| :---------------- | :--------- | :---------: | :-----------: | :----------: | :---------: | :---------: | :------------: |
-| **46 Languages**  |            |             |               |              |             |             |                |
-| mT0               | mT5 13B    | xP3         | 75.6          | 55.3         | 87.2        | 73.6        | 72.9           |
-| BLOOMZ            | BLOOM 176B | xP3         | 64.3          | 52.0         | 82.6        | 63.3        | 65.5           |
-| **52 Languages**  |            |             |               |              |             |             |                |
-| Bactrian-X 13B    | Llama 13B  | Bactrian-X  | 52.4          | 34.5         | 51.8        | 50.5        | 47.3           |
-| **101 Languages** |            |             |               |              |             |             |                |
-| mT0x              | mT5 13B    | xP3x        | 71.7          | 45.9         | 85.1        | 60.6        | 65.8           |
-| Aya model         | mT5 13B    | All Mixture | 76.7          | 58.3         | 90.0        | 70.7        | 73.9           |
-
-### Generative Tasks
-
-| Model             | Base Model | IFT Mixture | FLORES-200 X → En (spBleu) | FLORES-200 En → X (spBleu) | XLSum (RougeLsum) | Tydi-QA (F1) |
-| :---------------- | :--------: | :---------- | :------------------------: | :------------------------: | :---------------: | :----------: |
-| **101 Languages** |            |             |                            |                            |                   |              |
-| mT0x              | mT5 13B    | xP3x        | 20.2                       | 14.5                       | 21.4              | 76.1         |
-| Aya model         | mT5 13B    | All Mixture | 29.1                       | 19.0                       | 22.0              | 77.8         |
-
-Note: We cannot compare mT0 and BLOOMZ on the above generative tasks, as the validation splits are part of mT0's and BLOOMZ's training data.
+We refer to Section 5 of our paper for multilingual evaluation across 99 languages – including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and in-distribution performance.
 
 ## Bias, Risks, and Limitations
 
-Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
-The Aya model, as released, does not include any safety filtering.
-We hope that the release of the Aya model will make community-based red-teaming efforts possible, by exposing an open-source massively multilingual model for community research.
 
 For a detailed overview of our effort at safety mitigation and benchmarking toxicity and bias across multiple languages, we refer to Sections 6 and 7 of our paper: [Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model](arxiv.com).
 
+We hope that the release of the Aya model will make community-based red-teaming efforts possible, by exposing an open-source massively multilingual model for community research.
+
 ## Citation
 
 **BibTeX:**
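A note on the metrics in the removed tables: spBleu is BLEU computed over SentencePiece pieces rather than word tokens, and RougeLsum is the summary-level ROUGE-L variant used for XLSum. A minimal sketch of how such scores are commonly computed, assuming sacrebleu's SentencePiece tokenizers and Google's rouge-score package; this is not the paper's actual evaluation harness:

```python
# Metric sketch only; library versions and tokenizer names are assumptions.
import sacrebleu                       # pip install "sacrebleu>=2.2"
from rouge_score import rouge_scorer   # pip install rouge-score

hyps = ["The cat sat on the mat."]
refs = ["The cat is sitting on the mat."]

# spBleu: assumes a sacrebleu release that ships the "flores200"
# SentencePiece tokenizer (older releases expose "spm" instead).
spbleu = sacrebleu.corpus_bleu(hyps, [refs], tokenize="flores200")
print(f"spBleu: {spbleu.score:.1f}")

# RougeLsum: split multi-sentence texts with newlines so the "sum"
# variant differs from plain rougeL.
scorer = rouge_scorer.RougeScorer(["rougeLsum"], use_stemmer=True)
rl = scorer.score(refs[0], hyps[0])["rougeLsum"].fmeasure
print(f"RougeLsum: {100 * rl:.1f}")
```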