MaximumEntropy committed
Commit f967637 (1 parent: ebff116)
Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ img {
 
 ## Model Description
 
-NeMo Megatron-T5 3B is a transformer-based masked language model. [T5](https://arxiv.org/abs/1910.10683) [1] is a class of encoder-decoder models trained with a span-based masked language modeling objective. We follow the [T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1) approach of pre-training using only the masked language modeling objective. It has Tensor Parallelism (TP) of 2, Pipeline Parallelism (PP) of 1 and should fit on a single NVIDIA GPU for inference and 2 A100 GPUs for finetuning.
+NeMo Megatron-T5 3B is a transformer-based masked language model. [T5](https://arxiv.org/abs/1910.10683) [1] is a class of encoder-decoder models trained with a span-based masked language modeling objective. We follow the [T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1) approach of pre-training using only the masked language modeling objective. It has Tensor Parallelism (TP) of 2, Pipeline Parallelism (PP) of 1 and should fit on a single NVIDIA GPU for inference and 2 A100 80G GPUs for finetuning.
 
 This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
 
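
The hardware figures in the changed line can be sanity-checked with a rough back-of-envelope estimate. The sketch below is not taken from the README: it assumes "3B" means exactly 3 billion parameters, half-precision weights for inference, and mixed-precision Adam for finetuning (about 16 bytes of model state per parameter), split evenly across the 2 tensor-parallel ranks. The helper name `estimate_memory_gib` is hypothetical, and activations and framework overhead are ignored.

```python
def estimate_memory_gib(num_params: float, bytes_per_param: float, num_gpus: int = 1) -> float:
    """Rough per-GPU memory for model state only (no activations, no overhead)."""
    return num_params * bytes_per_param / num_gpus / 2**30


# Assumption: "3B" is taken as exactly 3 billion parameters.
NUM_PARAMS = 3e9

# Inference: half-precision weights only, 2 bytes per parameter, one GPU.
inference_gib = estimate_memory_gib(NUM_PARAMS, bytes_per_param=2)

# Finetuning with mixed-precision Adam (assumed setup, not stated in the README):
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + fp32 Adam first and second moments (4 + 4) = 16 bytes per parameter,
# divided across the 2 tensor-parallel ranks.
finetune_gib = estimate_memory_gib(NUM_PARAMS, bytes_per_param=16, num_gpus=2)

print(f"inference, 1 GPU:   ~{inference_gib:.1f} GiB of weights")
print(f"finetuning, 2 GPUs: ~{finetune_gib:.1f} GiB of model state per GPU")
```

Under these assumptions the weights alone are well within a single modern GPU for inference. For finetuning, the estimate covers model state only; activation memory, which grows with batch size and sequence length, comes on top, which may be why the README specifies the 80G A100 variant.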