Joelzhang committed
Commit aea31a7
1 Parent(s): 90ce059

Update README.md

Files changed (1)
README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ inference: true
 
 ## 模型信息 Model Information
 Encoder结构为主的双向语言模型,专注于解决各种自然语言理解任务。
-我们跟进了[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)的工作,使用了32张A100,总共耗时14天在悟道语料库(180 GB版本)上训练了十亿级别参数量的BERT。同时,鉴于中文语法和大规模训练的难度,我们使用四种预训练策略来改进BERT:1) 整词掩码, 2) 知动态遮掩, 3) 句子顺序预测, 4) 层前归一化.
+我们跟进了[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)的工作,使用了32张A100,总共耗时14天在悟道语料库(180 GB版本)上训练了十亿级别参数量的BERT。同时,鉴于中文语法和大规模训练的难度,我们使用四种预训练策略来改进BERT:1) 整词掩码, 2) 知识动态遮掩, 3) 句子顺序预测, 4) 层前归一化.
 
 A bidirectional language model based on the Encoder structure, focusing on solving various NLU tasks.
 We follow [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), using 32 A100s and spending 14 days training a billion-level BERT on WuDao Corpora (180 GB version). Given Chinese grammar and the difficulty of large-scale training, we use four pre-training procedures to improve BERT: 1) Whole Word Masking (WWM), 2) Knowledge-based Dynamic Masking (KDM), 3) Sentence Order Prediction (SOP), 4) Pre-layer Normalization (Pre-LN).
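
The corrected line lists the four pre-training strategies. As a rough, hypothetical illustration of the first one, Whole Word Masking, here is a minimal sketch: it assumes WordPiece-style `##` continuation markers and is not the repository's actual training code.

```python
# Minimal sketch of Whole Word Masking (WWM), NOT the authors' implementation:
# when any piece of a word is selected for masking, all pieces of that word
# are masked together.
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """tokens: WordPiece tokens where continuation pieces start with '##'."""
    rng = random.Random(seed)

    # Group token indices into whole words using the '##' continuation marker.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    # Mask each word as a unit with probability mask_prob.
    masked = list(tokens)
    for word in words:
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked

print(whole_word_mask(["自", "然", "语", "言", "under", "##stand", "##ing"], mask_prob=0.5))
```

Note that for Chinese text, real WWM pipelines typically obtain word boundaries from an external word segmenter (each character is its own WordPiece token), and the selected positions are then replaced according to BERT's usual 80/10/10 mask/random/keep rule; both steps are omitted in this sketch.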