---
language:
- en
library_name: nemo
datasets:
- mc4
tags:
- text2text-generation
- pytorch
- seq2seq
- masked language modeling
- multilingual
license: cc-by-4.0
---
# NeMo Megatron-mT5 3B

<style>
img {
 display: inline;
}
</style>

|[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Encoder--Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-3B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-multilingual-lightgrey#model-badge)](#list-of-languages)|

## Model Description

NeMo Megatron-mT5 3B is a *multilingual* transformer-based masked language model. [mT5](https://arxiv.org/abs/2010.11934) [1] is a class of encoder-decoder models trained with a span-based masked language modeling objective on a dataset comprising documents from many different languages. We follow the [T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1) approach of pre-training with only the masked language modeling objective. The model was trained with Tensor Parallelism (TP) of 2 and Pipeline Parallelism (PP) of 1; it should fit on a single NVIDIA GPU for inference and on 2 A100 80G GPUs for fine-tuning.

This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
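
To make the span-based masked language modeling objective concrete, here is a small, illustrative sketch of T5-style span corruption: contiguous spans of the input are replaced with sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...) and the target enumerates the dropped spans. The `span_corrupt` helper and the hard-coded span positions are made up for illustration; the actual NeMo data pipeline samples spans randomly and operates on SentencePiece tokens.

```
# Illustrative only: T5-style span corruption on a whitespace-tokenized sentence.
# The real pipeline uses the model's SentencePiece tokenizer and random span sampling.

def span_corrupt(tokens, spans):
    """Replace the given (start, end) spans with sentinel tokens.

    Returns (input_tokens, target_tokens) in the T5 format.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = span_corrupt(tokens, spans=[(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```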

## List of Languages

We pre-trained our mT5 model on the following languages from the [mC4](https://github.com/allenai/allennlp/discussions/5265) dataset.

1. Japanese
2. English
3. Italian
4. Latvian
5. Russian
6. Hungarian
7. Chinese
8. Polish
9. Greek
10. German
11. Czech
12. Korean
13. Hindi
14. Norwegian
15. Danish
16. Slovak
17. French
18. Portuguese
19. Lithuanian
20. Spanish
21. Dutch
22. Swedish
23. Romanian
24. Finnish

*NOTE*: The English data used to train our model is the smaller "clean" version (C4) used in the [T5 paper](https://arxiv.org/abs/1910.10683) and not the larger one distributed as part of mC4.

## Getting started

### Step 1: Install NeMo and dependencies

You will need to install NVIDIA Apex and NeMo.

```
git clone https://github.com/ericharper/apex.git
cd apex
git checkout nm_v1.11.0
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
```

```
pip install nemo_toolkit['nlp']==1.11.0
```
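
Optionally, you can sanity-check the installation before moving on; the sketch below simply confirms that Apex and NeMo import cleanly and that the pinned 1.11.0 release is the version picked up.

```
# Optional sanity check: verify that Apex and NeMo were installed correctly.
import apex  # noqa: F401  (raises ImportError if the Apex build failed)
import nemo
import nemo.collections.nlp as nemo_nlp  # the NLP collection used for T5 models

print(nemo.__version__)  # expected: 1.11.0
```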

Alternatively, you can use the NeMo Megatron training docker container with all dependencies pre-installed: [https://developer.nvidia.com/nemo-megatron-open-beta](https://developer.nvidia.com/nemo-megatron-open-beta)

### Step 2: Run inference

**Note.** The model has been trained with Tensor Parallelism (TP) of 2 and Pipeline Parallelism (PP) of 1, but it should be possible to run inference with tensor parallel size 1 on most NVIDIA GPUs.

```
git clone https://github.com/NVIDIA/NeMo.git
cd NeMo/examples/nlp/language_modeling
git checkout v1.11.0
python megatron_t5_eval.py \
    --model_file /raid/Data/NMT/Models/t5_3b/nemo_megatron_mt5_3b_bf16_tp2.nemo \
    --prompt "La capitale de la France est <mask>" \
    --tensor_model_parallel_size 2
```

The script will automatically replace all \<mask\> tokens with the appropriate sentinel tokens used during pre-training and attempt to fill them in autoregressively with greedy decoding.

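For intuition only, the sketch below shows the kind of \<mask\>-to-sentinel rewriting described above. The `replace_masks` helper is illustrative; `megatron_t5_eval.py` performs this substitution internally, so you do not need to do it yourself.

```
# Illustration of how a prompt's <mask> tokens map onto T5 sentinel tokens.
# megatron_t5_eval.py performs this substitution internally.

def replace_masks(prompt: str) -> str:
    masked = prompt
    i = 0
    while "<mask>" in masked:
        masked = masked.replace("<mask>", f"<extra_id_{i}>", 1)
        i += 1
    return masked


print(replace_masks("La capitale de la France est <mask>"))
# -> La capitale de la France est <extra_id_0>
```
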
*Expected Response*:

```
{
    'prompt': 'La capitale de la France est <mask>',
    'completion': {
        'text': 'Paris',
        'tokens': [(4586, '▁Paris', 0.0)]
    },
    'masked_input': '▁La ▁capital e ▁de ▁la ▁France ▁est ▁<extra_id_0>'
}
```

- prompt: The provided raw prompt as input.
- completion:
  - text: The final text generated by the model, including special/sentinel tokens other than \</s\>.
  - tokens: Each generated subword token along with its log-probability.
- masked_input: The original raw prompt with \<mask\> tokens replaced by the appropriate sentinel tokens.

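If you capture a response dictionary like the one above, pulling out the generated text and per-token log-probabilities is straightforward; in this sketch `response` is just the example output pasted in, not the return value of any API call.

```
# Minimal example of reading the fields documented above.
# `response` mirrors the example output; it is not produced by an API call here.
response = {
    'prompt': 'La capitale de la France est <mask>',
    'completion': {
        'text': 'Paris',
        'tokens': [(4586, '▁Paris', 0.0)],
    },
    'masked_input': '▁La ▁capital e ▁de ▁la ▁France ▁est ▁<extra_id_0>',
}

print("Generated text:", response['completion']['text'])
for token_id, piece, logprob in response['completion']['tokens']:
    print(f"token_id={token_id} piece={piece!r} logprob={logprob}")
```
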
## Training Data

The model was trained on the [mC4](https://github.com/allenai/allennlp/discussions/5265) dataset made available by AI2 and hosted on Hugging Face.
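
If you want to inspect the pre-training data yourself, the `datasets` library can stream mC4 one language at a time. The snippet below is a hedged sketch that assumes the Hub's `mc4` dataset script accepts a language-code config such as `"fr"`; it is not part of the NeMo training pipeline.

```
# Hedged sketch: stream a single language subset of mC4 from the Hugging Face Hub.
# Assumes the `mc4` dataset script accepts a language-code config such as "fr".
import itertools

from datasets import load_dataset

mc4_fr = load_dataset("mc4", "fr", split="train", streaming=True)
for example in itertools.islice(mc4_fr, 3):
    print(example["text"][:100])
```
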
## Evaluation results

Zero-shot language transfer performance on the [XNLI](https://arxiv.org/abs/1809.05053) dataset for a model fine-tuned on MNLI.

| English | Spanish | German | French | Chinese |
|---|---|---|---|---|
| 89.4 | 86.4 | 84.5 | 85.8 | 79.9 |

## Limitations

The model was trained on data originally crawled from the Internet. This data contains toxic language and societal biases. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts.

## References

[1] [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934)

[2] [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/pdf/1909.08053.pdf)

[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)

[4] [XNLI: Evaluating Cross-lingual Sentence Representations](https://arxiv.org/abs/1809.05053)

## License

License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license. By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.