Joelzhang committed
Commit aea31a7
1 Parent(s): 90ce059

Update README.md

Files changed (1)
README.md +1 -1
README.md CHANGED
@@ -24,7 +24,7 @@ inference: true
 
 ## 模型信息 Model Information
 Encoder结构为主的双向语言模型,专注于解决各种自然语言理解任务。
-我们跟进了[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)的工作,使用了32张A100,总共耗时14天在悟道语料库(180 GB版本)上训练了十亿级别参数量的BERT。同时,鉴于中文语法和大规模训练的难度,我们使用四种预训练策略来改进BERT:1) 整词掩码, 2) 知动态遮掩, 3) 句子顺序预测, 4) 层前归一化.
+我们跟进了[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)的工作,使用了32张A100,总共耗时14天在悟道语料库(180 GB版本)上训练了十亿级别参数量的BERT。同时,鉴于中文语法和大规模训练的难度,我们使用四种预训练策略来改进BERT:1) 整词掩码, 2) 知识动态遮掩, 3) 句子顺序预测, 4) 层前归一化.
 
 A bidirectional language model based on the Encoder structure, focusing on solving various NLU tasks.
 We follow [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), using 32 A100s and spending 14 days training a billion-level BERT on WuDao Corpora (180 GB version). Given Chinese grammar and the difficulty of large-scale training, we use four pre-training procedures to improve BERT: 1) Whole Word Masking (WWM), 2) Knowledge-based Dynamic Masking (KDM), 3) Sentence Order Prediction (SOP), 4) Pre-layer Normalization (Pre-LN).
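
The corrected line lists the four pre-training strategies. As a rough, hypothetical illustration of the first one, Whole Word Masking, here is a minimal sketch: it assumes WordPiece-style `##` continuation markers and is not the repository's actual training code.

```python
# Minimal sketch of Whole Word Masking (WWM), NOT the authors' implementation:
# when any piece of a word is selected for masking, all pieces of that word
# are masked together.
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """tokens: WordPiece tokens where continuation pieces start with '##'."""
    rng = random.Random(seed)

    # Group token indices into whole words using the '##' continuation marker.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    # Mask each word as a unit with probability mask_prob.
    masked = list(tokens)
    for word in words:
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked

print(whole_word_mask(["自", "然", "语", "言", "under", "##stand", "##ing"], mask_prob=0.5))
```

Note that for Chinese text, real WWM pipelines typically obtain word boundaries from an external word segmenter (each character is its own WordPiece token), and the selected positions are then replaced according to BERT's usual 80/10/10 mask/random/keep rule; both steps are omitted in this sketch.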