viraat committed
Commit: aec01f8
Parent(s): b159569

Correct some stuff

Files changed (1): README.md (+5 -36)
README.md CHANGED

@@ -156,7 +156,8 @@ print(tokenizer.decode(outputs[0]))
 ### Training
 
 - Architecture: Same as [mt5-xxl](https://huggingface.co/google/mt5-xxl)
-- Finetuning Steps: 25000
+- Number of Finetuning Samples: 25M
+- Batch size: 256
 - Hardware: TPUv4-128
 - Software: T5X, Jax
 
@@ -174,47 +175,15 @@ All datasets are subset to the 101 languages supported by [mT5]. See the [paper]
 
 ## Evaluation
 
-<!-- This section describes the evaluation protocols and provides the results. -->
-
-> We introduce extensive new evaluation suites that broaden the state-of-art for multilingual eval across 99 languages – including discriminative, generative tasks, human evaluation and simulated win rates that cover both held-out tasks and
-> in-distribution performance.
-
-Below, we provide evaluation results for the Aya model on unseen discriminative tasks, and in-distribution generative tasks compared to mT0, BLOOMZ, Bactrian-X 13B, and mT0x. To ensure a fair comparison with our Aya model in terms of language coverage, we finetune a new variant of mT5, that we dub mT0x. It is trained using the original datasets that are part of the xP3 collection but extended to 101 languages (xP3x).
-
-For Multlingual MMLU, Simulated and Human Win-rates, please refer to the [paper](arxiv.com)
-
-### Discriminative Tasks
-
-| Model | Base Model | IFT Mixture | XCOPA (Acc %) | XNLI (Acc %) | XSC (Acc %) | XWG (Acc %) | **<u>Avg</u>** |
-| :---------------- | :--------- | :---------: | :-----------: | :----------: | :---------: | :---------: | :------------: |
-| **46 Languages** | | | | | | | |
-| mT0 | mT5 13B | xP3 | 75.6 | 55.3 | 87.2 | 73.6 | 72.9 |
-| BLOOMZ | BLOOM 176B | xP3 | 64.3 | 52.0 | 82.6 | 63.3 | 65.5 |
-| **52 Languages** | | | | | | | |
-| Bactrian-X 13B | Llama 13B | Bactrian-X | 52.4 | 34.5 | 51.8 | 50.5 | 47.3 |
-| **101 Languages** | | | | | | | |
-| mT0x | mT5 13B | xP3x | 71.7 | 45.9 | 85.1 | 60.6 | 65.8 |
-| Aya model | mT5 13B | All Mixture | 76.7 | 58.3 | 90.0 | 70.7 | 73.9 |
-
-### Generative Tasks
-
-| Model | Base Model | IFT Mixture | FLORES-200 (spBleu) | FLORES-200 (spBleu) | XLSum (RougeLsum) | Tydi-QA (F1) |
-| :---------------- | :--------: | :---------- | :-----------------: | :-----------------: | :---------------: | :----------: |
-| | | | X→ En | En → X | | |
-| **101 Languages** | | | | | | |
-| mT0x | mT5 13B | xP3x | 20.2 | 14.5 | 21.4 | 76.1 |
-| Aya Model | mT5 13B | All Mixture | 29.1 | 19.0 | 22.0 | 77.8 |
-
-Note: We cannot compare mT0, and BLOOMZ for the above generative tasks, as the validation splits are part of mT0 and BLOOMZ's training data.
+We refer to Section 5 of our paper for multilingual evaluation across 99 languages – including discriminative and generative tasks, human evaluation, and simulated win rates that cover both held-out tasks and in-distribution performance.
 
 ## Bias, Risks, and Limitations
 
-Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
-Aya model, as released, does not include any safety filtering.
-We hope that the release of the Aya model will make community-based redteaming efforts possible, by exposing an open-source massively-multilingual model for community research.
 
 For a detailed overview of our effort at safety mitigation and benchmarking toxicity and bias across multiple languages, we refer to Sections 6 and 7 of our paper: [Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model](arxiv.com).
 
+We hope that the release of the Aya model will make community-based redteaming efforts possible, by exposing an open-source massively-multilingual model for community research.
+
 ## Citation
 
 **BibTeX:**
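
As a quick consistency check on the discriminative-task table in the diff above: the **Avg** column matches an unweighted mean of the four task accuracies (XCOPA, XNLI, XSC, XWG), rounded to one decimal. A minimal sketch, with scores copied from the table (the unweighted-mean scheme is an inference from the numbers, not stated in the diff):

```python
# Per-task accuracies (XCOPA, XNLI, XSC, XWG) from the "Discriminative
# Tasks" table, paired with the reported Avg for each model.
scores = {
    "mT0":            ([75.6, 55.3, 87.2, 73.6], 72.9),
    "BLOOMZ":         ([64.3, 52.0, 82.6, 63.3], 65.5),
    "Bactrian-X 13B": ([52.4, 34.5, 51.8, 50.5], 47.3),
    "mT0x":           ([71.7, 45.9, 85.1, 60.6], 65.8),
    "Aya model":      ([76.7, 58.3, 90.0, 70.7], 73.9),
}

for model, (accs, reported_avg) in scores.items():
    mean = sum(accs) / len(accs)
    # Reported Avg should be the mean rounded to one decimal place,
    # so the two should agree to within half a rounding unit.
    assert abs(mean - reported_avg) < 0.06, (model, mean, reported_avg)
```

This only verifies that the table is internally consistent; it says nothing about the underlying benchmark runs.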