Update README.md
README.md CHANGED
@@ -24,7 +24,7 @@ inference: true
 
 ## 模型信息 Model Information
 Encoder结构为主的双向语言模型,专注于解决各种自然语言理解任务。
-我们跟进了[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)的工作,使用了32张A100,总共耗时14天在悟道语料库(180 GB版本)上训练了十亿级别参数量的BERT。同时,鉴于中文语法和大规模训练的难度,我们使用四种预训练策略来改进BERT:1) 整词掩码, 2)
+我们跟进了[Megatron-LM](https://github.com/NVIDIA/Megatron-LM)的工作,使用了32张A100,总共耗时14天在悟道语料库(180 GB版本)上训练了十亿级别参数量的BERT。同时,鉴于中文语法和大规模训练的难度,我们使用四种预训练策略来改进BERT:1) 整词掩码, 2) 知识动态遮掩, 3) 句子顺序预测, 4) 层前归一化.
 
 A bidirectional language model based on the Encoder structure, focusing on solving various NLU tasks.
 We follow [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), using 32 A100s and spending 14 days training a billion-level BERT on WuDao Corpora (180 GB version). Given Chinese grammar and the difficulty of large-scale training, we use four pre-training procedures to improve BERT: 1) Whole Word Masking (WWM), 2) Knowledge-based Dynamic Masking (KDM), 3) Sentence Order Prediction (SOP), 4) Pre-layer Normalization (Pre-LN).
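The last of the four strategies named in the updated paragraph, pre-layer normalization (Pre-LN), refers to applying LayerNorm before the attention and feed-forward sub-layers rather than after the residual addition. The sketch below is only a minimal PyTorch illustration of that ordering, not code from this repository or from Megatron-LM; the class name `PreLNEncoderBlock` and the hidden/head sizes are placeholders chosen for the example.

```python
# Minimal sketch of a pre-layer-normalization (Pre-LN) encoder block.
# Illustrative only: names and sizes are placeholders, not this model's real config.
import torch
import torch.nn as nn

class PreLNEncoderBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.drop = nn.Dropout(dropout)

    def forward(self, x, attn_mask=None):
        # Pre-LN: normalize first, then attend, then add the residual.
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + self.drop(attn_out)
        # Same pattern for the feed-forward sub-layer.
        x = x + self.drop(self.ffn(self.ln2(x)))
        return x

block = PreLNEncoderBlock()
tokens = torch.randn(2, 16, 768)   # (batch, sequence length, hidden size)
print(block(tokens).shape)         # torch.Size([2, 16, 768])
```

A Post-LN block would instead normalize after each residual addition (`x = ln(x + sublayer(x))`); placing LayerNorm inside the residual branch is commonly reported to make training of very large models more stable.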