Update README.md
README.md (CHANGED)
@@ -10,13 +10,12 @@ The Aquila language model technically inherits the architectural design strengths of GPT-3, LLaMA, and others
The Aquila language model inherits the architectural design advantages of GPT-3 and LLaMA, swapping in a batch of more efficient low-level operator implementations and redesigning the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8x the training efficiency of Megatron+DeepSpeed ZeRO-2 during Aquila's training. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and a range of training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first open-source large language model that supports Chinese-English bilingual knowledge, commercial licensing, and domestic data compliance requirements.

## 模型细节/Model details
-| Model | License | Commercial use?
-| :---------------- | :------- | :-- |:-- |
-| Aquila-7B
-| AquilaCode-7B-
-| AquilaChat-7B | Apache 2.0 | ✅ | 150K entries | 8/24x1x8 | Nvidia-A100 |
+| Model             | License    | Commercial use? | GPU             |
+| :---------------- | :--------- | :-------------- | :-------------- |
+| Aquila-7B         | Apache 2.0 | ✅              | Nvidia-A100     |
+| AquilaCode-7B-NV  | Apache 2.0 | ✅              | Nvidia-A100     |
+| AquilaCode-7B-TS  | Apache 2.0 | ✅              | Tianshu-BI-V100 |
+| AquilaChat-7B     | Apache 2.0 | ✅              | Nvidia-A100     |

We use a series of more efficient low-level operators to speed up model training, including an approach adapted from [flash-attention](https://github.com/HazyResearch/flash-attention) that replaces some of the intermediate computations, together with RMSNorm. On top of this, we upgraded [BMTrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training; it applies data parallelism, ZeRO (zero-redundancy optimizer), optimizer offloading, checkpointing and operator fusion, and communication-computation overlap to optimize the training process.
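
For readers unfamiliar with the operator-level changes named above, the sketch below illustrates RMSNorm and a fused, flash-attention-style attention call in plain PyTorch. It is a minimal illustration of the techniques, not code from Aquila, FlagAI, or BMTrain, and it assumes PyTorch 2.0+ for `torch.nn.functional.scaled_dot_product_attention`.

```python
# Illustrative only: RMSNorm plus a fused attention call, not the Aquila/BMTrain implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square layer norm (no mean subtraction, no bias), as used in LLaMA-style models."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the last dimension, then apply a learned scale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)


if __name__ == "__main__":
    batch, seq, heads, head_dim = 2, 128, 8, 64
    hidden = heads * head_dim

    x = torch.randn(batch, seq, hidden)
    x = RMSNorm(hidden)(x)

    # scaled_dot_product_attention can dispatch to a memory-efficient /
    # FlashAttention-style kernel on supported GPUs and falls back to the
    # standard computation elsewhere.
    q = k = v = x.view(batch, seq, heads, head_dim).transpose(1, 2)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    print(out.shape)  # torch.Size([2, 8, 128, 64])
```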