---
language:
- ja
- en
- it
- lv
- ru
- hu
- zh
- pl
- el
- de
- cs
- ko
- hi
- no
- da
- sk
- fr
- pt
- lt
- es
- nl
- sv
- ro
- fi
library_name: nemo
datasets:
- mc4
tags:
- pytorch
- seq2seq
- masked language modeling
- multilingual
license: cc-by-4.0
---
# NeMo Megatron-mT5 3B
<style>
img {
display: inline;
}
</style>
|[![Model architecture](https://img.shields.io/badge/Arch-Encoder--Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-3B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-Multilingual-green)](#datasets)
## Model Description
NeMo Megatron-mT5 3B is a *multilingual* transformer-based masked language model. [mT5](https://arxiv.org/abs/2010.11934) [1] is a class of encoder-decoder models trained with a span-based masked language modeling objective on a dataset comprising documents from many different languages. We follow the [T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1) approach of pre-training using only the masked language modeling objective. The model was trained with Tensor Parallelism (TP) of 2 and Pipeline Parallelism (PP) of 1. It should fit on a single NVIDIA GPU for inference and on 2 A100 80G GPUs for fine-tuning.
This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
**NOTE**: Weights are distributed in bfloat16.
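As a rough sanity check on the single-GPU claim, the sketch below estimates how much memory the weights alone occupy. It assumes the nominal "3B" means exactly 3e9 parameters, which is an approximation and not the exact parameter count, and it ignores activations, decoding state, and framework overhead.

```python
# Rough estimate of the memory needed just to hold the weights.
# Assumption: "3B" is treated as exactly 3e9 parameters (nominal, not exact).
NOMINAL_PARAMS = 3_000_000_000
BYTES_PER_PARAM = {"bfloat16": 2, "float32": 4}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the size of the weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(NOMINAL_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the bfloat16 weights come to roughly 5.6 GiB, half of what float32 storage would need.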
## List of Languages
We pre-trained our mT5 model on the following languages from the [mC4](https://github.com/allenai/allennlp/discussions/5265) dataset.
1. Japanese
2. English
3. Italian
4. Latvian
5. Russian
6. Hungarian
7. Chinese
8. Polish
9. Greek
10. German
11. Czech
12. Korean
13. Hindi
14. Norwegian
15. Danish
16. Slovak
17. French
18. Portuguese
19. Lithuanian
20. Spanish
21. Dutch
22. Swedish
23. Romanian
24. Finnish
*NOTE*: The English data used to train our model is the smaller "clean" version (C4) used in the [T5 paper](https://arxiv.org/abs/1910.10683) and not the larger one distributed as part of mC4.
## Getting started
### Step 1: Install NeMo and dependencies
You will need to install NVIDIA Apex and NeMo.
```
git clone https://github.com/ericharper/apex.git
cd apex
git checkout nm_v1.11.0
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
```
```
pip install nemo_toolkit['nlp']==1.12.0
```
Alternatively, you can use the NeMo Megatron training Docker container, which has all dependencies pre-installed: [https://developer.nvidia.com/nemo-megatron-open-beta](https://developer.nvidia.com/nemo-megatron-open-beta)
### Step 2: Run inference
**Note.** The model has been trained with Tensor Parallelism (TP) of 2 and Pipeline Parallelism (PP) of 1, but it should be possible to run inference with a tensor parallel size of 1 on most NVIDIA GPUs.
```
git clone https://github.com/NVIDIA/NeMo.git
cd NeMo/examples/nlp/language_modeling
git checkout r1.12.0
python megatron_t5_eval.py \
--model_file nemo_megatron_mt5_3b_bf16_tp2.nemo \
--prompt "La capitale de la France est <mask>" \
--tensor_model_parallel_size 2
```
The script will automatically replace all \<mask\> tokens with the appropriate sentinel tokens used while pre-training and attempt to fill them in autoregressively with greedy decoding.
*Expected Response*:
```
{
    'prompt': 'La capitale de la France est <mask>',
    'completion': {
        'text': 'Paris',
        'tokens': [(4586, '▁Paris', 0.0)]
    },
    'masked_input': '▁La ▁capital e ▁de ▁la ▁France ▁est ▁<extra_id_0>'
}
```
- prompt: The raw prompt provided as input.
- completion:
  - text: The final generated text from the model, including any special/sentinel tokens other than \</s\>.
  - tokens: Each individual subword that is generated, along with its log-probability.
- masked_input: The original raw prompt with \<mask\> replaced by the appropriate sentinel tokens.
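The mask replacement step can be illustrated with a minimal sketch. This is not the code used by `megatron_t5_eval.py`; the function name `replace_masks_with_sentinels` is hypothetical, and numbering the sentinels in order of appearance is an assumption based on the usual T5 convention and on the `<extra_id_0>` shown in the expected response.

```python
import itertools
import re


def replace_masks_with_sentinels(prompt: str, mask: str = "<mask>") -> str:
    """Replace each mask with <extra_id_0>, <extra_id_1>, ... in order.

    Hypothetical helper for illustration; the real script may differ.
    """
    ids = itertools.count()  # yields 0, 1, 2, ... one per mask found
    return re.sub(re.escape(mask), lambda _: f"<extra_id_{next(ids)}>", prompt)


print(replace_masks_with_sentinels("La capitale de la France est <mask>"))
# → La capitale de la France est <extra_id_0>
```

The real `masked_input` additionally shows SentencePiece subword pieces (the `▁` markers), which this sketch does not attempt to reproduce.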
## Training Data
The model was trained on the [mC4](https://github.com/allenai/allennlp/discussions/5265) dataset made available by AI2 and hosted on Hugging Face.
## Evaluation results
Zero-shot language transfer performance on the [XNLI](https://arxiv.org/abs/1809.05053) [4] dataset for a model fine-tuned on MNLI.
| English | Spanish | German | French | Chinese |
|---|---|---|---|---|
| 89.4 | 86.4 | 84.5 | 85.8 | 79.9 |
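For a quick summary of the table, the sketch below computes the mean score and each language's gap to English. These are derived figures calculated from the five numbers above, not additional reported results, and the variable names are illustrative.

```python
# Scores copied from the table above.
xnli_scores = {
    "English": 89.4,
    "Spanish": 86.4,
    "German": 84.5,
    "French": 85.8,
    "Chinese": 79.9,
}

# Unweighted mean over the five listed languages.
average = sum(xnli_scores.values()) / len(xnli_scores)

# Difference between the English score and each other language.
gap_to_english = {
    lang: round(xnli_scores["English"] - score, 1)
    for lang, score in xnli_scores.items()
    if lang != "English"
}

print(f"average: {average:.1f}")
print(gap_to_english)
```

The unweighted mean over the five listed languages is 85.2, with the largest gap to English on Chinese (9.5 points).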
## Limitations
The model was trained on data originally crawled from the Internet. This data contains toxic language and societal biases. Therefore, the model may amplify those biases and return toxic responses, especially when given toxic prompts.
## References
[1] [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934)
[2] [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/pdf/1909.08053.pdf)
[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
[4] [XNLI: Evaluating Cross-lingual Sentence Representations](https://arxiv.org/abs/1809.05053)
## License
Use of this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license. By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.