wanng committed
Commit ab206dd
Parent: d3779cf

Update README.md

Files changed (1): README.md (+4 -2)
README.md CHANGED
@@ -21,9 +21,9 @@ widget:

  ## 简介 Brief Introduction

- 善于处理NLU任务,采用全词掩码的,中文版的3.2亿参数DeBERTa-v2-large
+ 善于处理NLU任务,采用全词掩码的,中文版的3.2亿参数DeBERTa-v2-Large

- Good at solving NLU tasks, adopting Whole Word Masking, Chinese DeBERTa-v2-large with 320M parameters.
+ Good at solving NLU tasks, adopting Whole Word Masking, Chinese DeBERTa-v2-Large with 320M parameters.

  ## 模型分类 Model Taxonomy

@@ -33,6 +33,8 @@ Good at solving NLU tasks, adopting Whole Word Masking, Chinese DeBERTa-v2-large

  ## 模型信息 Model Information

+ 参考论文:[DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://readpaper.com/paper/3033187248)
+
  为了得到一个中文版的DeBERTa-v2-large(320M),我们用悟道语料库(180G版本)进行预训练。我们在MLM中使用了全词掩码(wwm)的方式。具体地,我们在预训练阶段中使用了[封神框架](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen)大概花费了8张A100(80G)约7天。

  To get a Chinese DeBERTa-v2-large (320M), we use WuDao Corpora (180 GB version) for pre-training. We employ the Whole Word Masking (wwm) in MLM. Specifically, we use the [fengshen framework](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen) in the pre-training phase which cost about 7 days with 8 A100(80G) GPUs.
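The Model Information paragraph describes the pre-training recipe (whole-word-masked MLM on the 180 GB WuDao corpus, trained with the fengshen framework on 8 A100 80G GPUs for about 7 days), but the diff does not show how to load the resulting checkpoint. Below is a minimal usage sketch with Hugging Face `transformers`. The repository ID `IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese`, the `use_fast=False` flag, and the example sentence are assumptions for illustration and are not stated in this commit.

```python
# Minimal sketch: load the Chinese DeBERTa-v2-large (320M) checkpoint and run a
# fill-mask smoke test. The model ID below is an assumption, not taken from this
# commit; replace it with the actual repository ID if it differs.
from transformers import AutoModelForMaskedLM, AutoTokenizer, FillMaskPipeline

model_id = "IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese"  # assumed repository ID

# use_fast=False is a precaution for custom Chinese tokenizers; drop it if the
# fast tokenizer loads cleanly for this repository.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# The checkpoint was pre-trained with whole-word-masked MLM, so predicting a
# [MASK] token is the most direct way to exercise it.
fill_mask = FillMaskPipeline(model=model, tokenizer=tokenizer)
print(fill_mask("生活的真谛是[MASK]。", top_k=5))
```

For the NLU tasks mentioned in the introduction, the same checkpoint can instead be loaded with `AutoModelForSequenceClassification` (or another task head) and fine-tuned as usual.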