NeMo
PyTorch
English
seq2seq
masked language modeling
MaximumEntropy committed on
Commit f967637 (1 parent: ebff116)

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ img {
 
 ## Model Description
 
-NeMo Megatron-T5 3B is a transformer-based masked language model. [T5](https://arxiv.org/abs/1910.10683) [1] is a class of encoder-decoder models trained with a span-based masked language modeling objective. We follow the [T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1) approach of pre-training using only the masked language modeling objective. It has Tensor Parallelism (TP) of 2, Pipeline Parallelism (PP) of 1 and should fit on a single NVIDIA GPU for inference and 2 A100 GPUs for finetuning.
+NeMo Megatron-T5 3B is a transformer-based masked language model. [T5](https://arxiv.org/abs/1910.10683) [1] is a class of encoder-decoder models trained with a span-based masked language modeling objective. We follow the [T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1) approach of pre-training using only the masked language modeling objective. It has Tensor Parallelism (TP) of 2, Pipeline Parallelism (PP) of 1 and should fit on a single NVIDIA GPU for inference and 2 A100 80G GPUs for finetuning.
 
 This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
 
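The only change in this commit is the GPU memory size for fine-tuning (A100 80G instead of just "A100"). The sketch below illustrates why memory per GPU matters for a TP=2, PP=1 layout. It is a back-of-envelope estimate under assumptions that are not stated in the README: roughly 3e9 parameters, mixed-precision Adam at about 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two fp32 Adam moments), state split evenly across tensor-parallel ranks, and activations ignored. The helper names are hypothetical and are not NeMo APIs.

```python
# Back-of-envelope sketch (hypothetical helpers, not NeMo APIs).
# ASSUMPTIONS: ~3e9 parameters, ~16 bytes/parameter for mixed-precision Adam,
# model state split evenly over TP * PP ranks, activations/workspace ignored.


def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Megatron-style layout: world_size = TP * PP * DP."""
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by TP * PP")
    return world_size // model_parallel


def finetune_state_gb_per_gpu(num_params: float, tp: int, pp: int,
                              bytes_per_param: int = 16) -> float:
    """Weights + gradients + optimizer state per GPU, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / (tp * pp) / 1e9


if __name__ == "__main__":
    tp, pp = 2, 1  # values stated in the README
    # With 2 GPUs, both are consumed by tensor parallelism, so DP = 1.
    print("DP size on 2 GPUs:", data_parallel_size(2, tp, pp))
    # 3e9 * 16 bytes / 2 ranks = 24e9 bytes per GPU before activations.
    print(f"~{finetune_state_gb_per_gpu(3e9, tp, pp):.0f} GB of state per GPU")
```

Under these assumptions the model and optimizer state alone come to about 24 GB per GPU, before activations, which depend on sequence length, micro-batch size, and whether activation checkpointing is used. The README does not give the reasoning behind the 80G requirement, so treat this only as an illustration of the scale involved.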