Joelzhang committed
Commit 98002e1
1 Parent(s): f856c6b

Update README.md

Files changed (1)
  1. README.md +7 -14
README.md CHANGED
@@ -7,17 +7,10 @@ sdk: static
  pinned: false
  ---

- Remarkable advances in artificial intelligence have produced many great models; in particular, pre-trained foundation models have become an emerging paradigm. Whereas traditional AI models must be trained on huge, dedicated datasets for one or a few narrow scenarios, foundation models can be adapted to a wide range of downstream tasks, making it possible to deploy AI in low-resource settings.
- Today's foundation models, especially language models, are dominated by the English-language community. Meanwhile Chinese, the world's most spoken language by number of native speakers, lacks systematic research resources to support it, leaving research progress in the Chinese-language domain lagging behind English.
- To address this lag and the severe shortage of research resources in the Chinese-language domain, [IDEA](https://idea.edu.cn/) officially announces the launch of the "Fengshenbang" open-source ecosystem: a Chinese-driven foundation ecosystem that includes large pre-trained models, task-specific fine-tuning applications, benchmarks, and datasets. Our goal is to build a comprehensive, standardized, user-centered ecosystem. Although this goal can be reached in many ways, after re-examining and rethinking the Chinese community we propose the approach we consider most effective:
- - Step 1: Choose a pre-trained Chinese NLP model from our [Fengshenbang model library](https://huggingface.co/IDEA-CCNL).
- - Step 2: Adapt the model with the [Fengshen Framework](https://github.com/IDEA-CCNL/Fengshenbang-LM) by following our tutorial examples.
- - Step 3: Evaluate the model on downstream tasks, either on the [Fengshenbang Benchmarks](https://fengshenbang-lm.com/benchmarks) (coming soon) or on custom tasks.
- ____________________________________________
- Remarkable advances in Artificial Intelligence (AI) have produced great models; in particular, pre-trained foundation models have become an emerging paradigm. In contrast to traditional AI models that must be trained on vast datasets for one or a few scenarios, foundation models can be adapted to a wide range of downstream tasks, limiting the resources required to get an AI venture off the ground.
- Foundation models, most notably language models, are dominated by the English-language community.
- Chinese, however, despite being the world's most spoken language by number of native speakers, has no systematic research resources to support it, causing progress in the Chinese-language domain to lag behind others.
- [IDEA](https://idea.edu.cn/) (International Digital Economy Academy) officially announces the launch of the "Fengshenbang" open-source project: a Chinese-language-driven foundation ecosystem that incorporates pre-trained models, task-specific fine-tuning applications, benchmarks, and datasets. Our goal is to build a comprehensive, standardized, and user-centered ecosystem. Although this can be instantiated in a variety of ways, we present the following design, which we find particularly effective:
- - Step 1: Choose a pre-trained Chinese NLP model from our [open-source library](https://huggingface.co/IDEA-CCNL) of Fengshenbang models.
- - Step 2: Employ the [Fengshen Framework](https://github.com/IDEA-CCNL/Fengshenbang-LM) to adjust the model, guided by our tutorial examples.
- - Step 3: Evaluate the model on downstream tasks, such as the [Fengshenbang Benchmarks](https://fengshenbang-lm.com/benchmarks) (ongoing) or custom tasks.
 
+ **[IDEA](https://idea.edu.cn/)** (International Digital Economy Academy) officially announces the launch of the **"[Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)"** open-source project. It open-sources a series of large-scale pre-trained natural language models that together provide comprehensive coverage across model architectures, sizes, and expertise domains. We will continuously optimize these models with new datasets and the latest algorithms. Our aim is to build universal infrastructure for Chinese cognitive intelligence, prevent duplicated construction, and thereby save computing resources for the community.
+
+
+ We also call on businesses, universities, and institutions to join us in building this system of large-scale open-source models collaboratively. We envision that, in the near future, anyone who needs a new pre-trained model will first select the model closest to the desired scale, architecture, and domain from the series, and then train it further. The newly trained model is then added back to the series for future use. In this way the open-source system grows iteratively and collaboratively, while individual users obtain the models they need with minimal computing resources.
+
+
+ For a better open-source experience, all models in the Fengshenbang series are synchronized to the Hugging Face Hub and can be loaded with a few lines of code. Welcome to download and use our models from the **IDEA-CCNL** organization on Hugging Face.
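As a minimal sketch of that "few lines of code" workflow (assuming the Hugging Face `transformers` library; the model ID below is only one illustrative example from the IDEA-CCNL page, and any Fengshenbang checkpoint can be substituted):

```python
# Minimal sketch: load a Fengshenbang checkpoint from the Hugging Face Hub.
# "IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment" is an illustrative model ID;
# swap in any model listed at https://huggingface.co/IDEA-CCNL.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Score the sentiment of a short Chinese sentence.
inputs = tokenizer("今天心情很好", return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)
```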