Anhforth committed
Commit f8a2e45 (parent: b3205c8)

Update README.md

Files changed (1)
  1. README.md +6 -7
README.md CHANGED
@@ -10,13 +10,12 @@ The Aquila large language model technically inherits the architectural design strengths of GPT-3, LLaMA, and others
 The Aquila language model inherits the architectural design advantages of GPT-3 and LLaMA, replacing a batch of underlying operators with more efficient implementations and redesigning the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8 times the training efficiency of Megatron+DeepSpeed ZeRO-2 during Aquila's training. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and a variety of training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first large-scale open-source language model that supports Chinese and English knowledge, commercial licensing, and compliance with domestic data regulations.
 
 ## 模型细节/Model details
- | Model | License | Commercial use? | Pretraining length (tokens) | Pretraining compute (GPU days) | GPU |
- | :---------------- | :------- | :-- | :-- | :-- | :-- |
- | Aquila-7B | Apache 2.0 | ✅ | 400B | dx22x8 | Nvidia-A100-40G |
- | Aquila-33B | Apache 2.0 | ✅ | xx | xx | Nvidia-A100 |
- | AquilaCode-7B-nv | Apache 2.0 | ✅ | 235B | 14x8x8 | Nvidia-A100 |
- | AquilaCode-7B-ts | Apache 2.0 | ✅ | 75B | 9x32x8 | Tianshu-BI-V100 |
- | AquilaChat-7B | Apache 2.0 | ✅ | 150K samples | 8/24x1x8 | Nvidia-A100 |
+ | Model | License | Commercial use? | GPU |
+ | :---------------- | :------- | :-- | :-- |
+ | Aquila-7B | Apache 2.0 | ✅ | Nvidia-A100 |
+ | AquilaCode-7B-NV | Apache 2.0 | ✅ | Nvidia-A100 |
+ | AquilaCode-7B-TS | Apache 2.0 | ✅ | Tianshu-BI-V100 |
+ | AquilaChat-7B | Apache 2.0 | ✅ | Nvidia-A100 |
 
 
 We use a series of more efficient low-level operators to speed up model training, including an approach that draws on [flash-attention](https://github.com/HazyResearch/flash-attention) and replaces some intermediate computations, together with RMSNorm. On top of this, we upgraded [BMtrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training; it combines data parallelism, ZeRO (zero-redundancy optimizer), optimizer offloading, checkpointing, operator fusion, and communication-computation overlap to optimize the model training process.
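The RMSNorm mentioned in that paragraph can be sketched in a few lines of PyTorch. This is a generic, illustrative implementation of the technique (the hidden size of 4096 is only a demo value), not the actual operator used in Aquila's training stack:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square layer normalization.

    Unlike LayerNorm, RMSNorm skips mean-centering and the bias term:
    it only rescales by the root mean square of the features, which is
    cheaper and works well in LLaMA-style decoder stacks.
    """

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute 1 / RMS(x) over the feature dimension in fp32 for
        # numerical stability, then cast back to the input dtype.
        rms_inv = torch.rsqrt(x.float().pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return (x.float() * rms_inv).type_as(x) * self.weight


# Quick check: normalize a dummy hidden-state tensor of shape
# (batch, seq_len, hidden_dim), as a transformer block would see it.
if __name__ == "__main__":
    norm = RMSNorm(dim=4096)
    hidden = torch.randn(2, 8, 4096)
    print(norm(hidden).shape)  # torch.Size([2, 8, 4096])
```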
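Since the card highlights the redesigned Chinese-English tokenizer, here is a hypothetical loading-and-generation sketch using the standard `transformers` Auto classes; the repo id `BAAI/Aquila-7B` and the `trust_remote_code=True` flag are assumptions and may need adjusting to the released checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; replace with the actual released checkpoint if it differs.
MODEL_ID = "BAAI/Aquila-7B"

# trust_remote_code is assumed to be required for the custom tokenizer/model code.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# The bilingual tokenizer should handle mixed Chinese/English prompts.
prompt = "北京的秋天是"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```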