nvidia/nemo-megatron-t5-3B

masked language modeling


MaximumEntropy committed on Sep 20, 2022

Commit f967637 (1 parent: ebff116)

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -24,7 +24,7 @@ img {
 ## Model Description
-NeMo Megatron-T5 3B is a transformer-based masked language model. [T5](https://arxiv.org/abs/1910.10683) [1] is a class of encoder-decoder models trained with a span-based masked language modeling objective. We follow the [T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1) approach of pre-training using only the masked language modeling objective. It has Tensor Parallelism (TP) of 2, Pipeline Parallelism (PP) of 1 and should fit on a single NVIDIA GPU for inference and 2 A100 GPUs for finetuning.
+NeMo Megatron-T5 3B is a transformer-based masked language model. [T5](https://arxiv.org/abs/1910.10683) [1] is a class of encoder-decoder models trained with a span-based masked language modeling objective. We follow the [T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1) approach of pre-training using only the masked language modeling objective. It has Tensor Parallelism (TP) of 2, Pipeline Parallelism (PP) of 1 and should fit on a single NVIDIA GPU for inference and 2 A100 80G GPUs for finetuning.
 This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
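This commit narrows the fine-tuning requirement from "2 A100 GPUs" to "2 A100 80G GPUs". A rough back-of-the-envelope estimate helps show why memory per GPU matters. The sketch below is a hypothetical illustration, not NVIDIA's sizing method. It assumes about 3e9 parameters (the nominal size in the model name), 2 bytes per parameter for bf16/fp16 inference weights, and roughly 16 bytes per parameter for mixed-precision Adam fine-tuning (fp16 weights and gradients plus fp32 master weights and two fp32 moments). It also assumes that state is split evenly across the TP × PP model-parallel ranks. Activation memory is ignored.

```python
# Hypothetical back-of-the-envelope memory estimate for a ~3B-parameter model.
# Assumptions (NOT from the model card):
#   - inference weights stored in bf16/fp16: 2 bytes per parameter
#   - mixed-precision Adam fine-tuning: ~16 bytes per parameter
#     (fp16 weights 2 + fp16 grads 2 + fp32 master weights 4 + Adam m 4 + Adam v 4)
#   - model state split evenly over tensor_parallel * pipeline_parallel GPUs
#   - activations, buffers, and framework overhead are ignored

GB = 1e9  # decimal gigabytes keep the arithmetic easy to follow


def inference_weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold the weights alone at inference time."""
    return num_params * bytes_per_param / GB


def finetune_state_gb_per_gpu(num_params: float, tensor_parallel: int,
                              pipeline_parallel: int = 1,
                              bytes_per_param: int = 16) -> float:
    """Weights + gradients + optimizer state per GPU, split over TP x PP ranks."""
    model_parallel_size = tensor_parallel * pipeline_parallel
    return num_params * bytes_per_param / model_parallel_size / GB


if __name__ == "__main__":
    n = 3e9  # nominal size from the model name; the exact count may differ
    print(f"inference weights: ~{inference_weights_gb(n):.0f} GB")
    print(f"fine-tuning state per GPU (TP=2, PP=1): "
          f"~{finetune_state_gb_per_gpu(n, tensor_parallel=2):.0f} GB")
```

Under these assumptions, the weights alone come to about 6 GB, which is consistent with single-GPU inference. Fine-tuning state comes to about 24 GB per GPU with TP=2. Activation memory grows with batch size and sequence length, so the real per-GPU footprint is higher than this estimate.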