Anhforth committed
Commit f8a2e45 (parent: b3205c8)

Update README.md

Files changed (1)
  1. README.md +6 -7
README.md CHANGED
@@ -10,13 +10,12 @@ The Aquila large language model technically inherits the architectural design strengths of GPT-3, LLaMA, and others
 The Aquila language model inherits the architectural design advantages of GPT-3 and LLaMA, replacing a batch of underlying operators with more efficient implementations and redesigning the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8 times the training efficiency of Megatron+DeepSpeed ZeRO-2 during Aquila's training. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and a variety of training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first large-scale open-source language model that supports Chinese and English knowledge, commercial licensing, and compliance with domestic data regulations.
 
 ## 模型细节/Model details
- | Model | License | Commercial use? | Pretraining length (tokens) | Pretraining compute (GPU days) | GPU |
- | :---------------- | :------- | :-- | :-- | :-- | :-- |
- | Aquila-7B | Apache 2.0 | ✅ | 400B | dx22x8 | Nvidia-A100-40G |
- | Aquila-33B | Apache 2.0 | ✅ | xx | xx | Nvidia-A100 |
- | AquilaCode-7B-nv | Apache 2.0 | ✅ | 235B | 14x8x8 | Nvidia-A100 |
- | AquilaCode-7B-ts | Apache 2.0 | ✅ | 75B | 9x32x8 | Tianshu-BI-V100 |
- | AquilaChat-7B | Apache 2.0 | ✅ | 150K samples | 8/24x1x8 | Nvidia-A100 |
+ | Model | License | Commercial use? | GPU |
+ | :---------------- | :------- | :-- | :-- |
+ | Aquila-7B | Apache 2.0 | ✅ | Nvidia-A100 |
+ | AquilaCode-7B-NV | Apache 2.0 | ✅ | Nvidia-A100 |
+ | AquilaCode-7B-TS | Apache 2.0 | ✅ | Tianshu-BI-V100 |
+ | AquilaChat-7B | Apache 2.0 | ✅ | Nvidia-A100 |
 
 
 We use a series of more efficient low-level operators to speed up model training, including an approach that draws on [flash-attention](https://github.com/HazyResearch/flash-attention) and replaces some intermediate computations, together with RMSNorm. On top of this, we upgraded [BMtrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training; it combines data parallelism, ZeRO (zero-redundancy optimizer), optimizer offloading, checkpointing, operator fusion, and communication-computation overlap to optimize the model training process.
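The RMSNorm mentioned in that paragraph can be sketched in a few lines of PyTorch. This is a generic, illustrative implementation of the technique (the hidden size of 4096 is only a demo value), not the actual operator used in Aquila's training stack:

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square layer normalization.

    Unlike LayerNorm, RMSNorm skips mean-centering and the bias term:
    it only rescales by the root mean square of the features, which is
    cheaper and works well in LLaMA-style decoder stacks.
    """

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-feature gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute 1 / RMS(x) over the feature dimension in fp32 for
        # numerical stability, then cast back to the input dtype.
        rms_inv = torch.rsqrt(x.float().pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return (x.float() * rms_inv).type_as(x) * self.weight


# Quick check: normalize a dummy hidden-state tensor of shape
# (batch, seq_len, hidden_dim), as a transformer block would see it.
if __name__ == "__main__":
    norm = RMSNorm(dim=4096)
    hidden = torch.randn(2, 8, 4096)
    print(norm(hidden).shape)  # torch.Size([2, 8, 4096])
```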
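Since the card highlights the redesigned Chinese-English tokenizer, here is a hypothetical loading-and-generation sketch using the standard `transformers` Auto classes; the repo id `BAAI/Aquila-7B` and the `trust_remote_code=True` flag are assumptions and may need adjusting to the released checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; replace with the actual released checkpoint if it differs.
MODEL_ID = "BAAI/Aquila-7B"

# trust_remote_code is assumed to be required for the custom tokenizer/model code.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

# The bilingual tokenizer should handle mixed Chinese/English prompts.
prompt = "北京的秋天是"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```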