Create README.md
README.md
ADDED
@@ -0,0 +1,23 @@
A pre-trained Japanese-language model with approximately 6.87 billion parameters, built on EleutherAI's Mesh Transformer JAX codebase and structurally similar to their GPT-J-6B pre-trained model.

- We used T5Tokenizer with SentencePiece instead of the GPT-2/3 tokenizer. The normalization performed by SentencePiece is a must for Japanese tokenization, since common symbols have many more variations than in Western languages. A hedged loading sketch follows this list.
- The tokenizer has a vocabulary of 52,500 tokens and was trained on a Japanese Wikipedia dump as of 1 Aug 2021.
- For inference, the model fits on 16 GB VRAM GPUs such as the P100 for context lengths up to 1,688 tokens. Output at the full 2,048 context length requires 20 GB of VRAM or more (e.g. RTX 3090/A5000); a rough memory estimate is sketched after the Specifications table.
- The model was trained for about 4 weeks on a TPU v3-128 generously provided by Google TRC.
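
For illustration, here is a minimal inference sketch using Hugging Face `transformers`. The local path `./gpt-j-japanese-6.8b` is a placeholder, not an official repo id, and the sketch assumes the Mesh Transformer JAX checkpoint has been converted to a GPT-J-compatible Hugging Face format:

```python
# Minimal inference sketch. Assumptions: the checkpoint has been converted to
# a GPT-J-compatible Hugging Face format, and "./gpt-j-japanese-6.8b" is a
# placeholder path, not an official repo id.
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

# T5Tokenizer wraps the SentencePiece model, so SentencePiece's
# normalization of Japanese symbol variants is applied automatically.
tokenizer = T5Tokenizer.from_pretrained("./gpt-j-japanese-6.8b")

# fp16 weights keep the ~6.87B-parameter model within 16 GB of VRAM
# for context lengths up to ~1,688 tokens, as noted above.
model = AutoModelForCausalLM.from_pretrained(
    "./gpt-j-japanese-6.8b", torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("日本語の文章を生成します。", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```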

## Specifications

| Hyperparameter    | Value |
|-------------------|-------|
| n_parameters      | 6,876,450,080 |
| n_layers          | 32 |
| d_model           | 4,096 |
| d_ff              | 16,384 |
| n_heads           | 16 |
| d_head            | 256 |
| n_ctx             | 2,048 |
| n_vocab           | 52,512 |
| position encoding | [Rotary position encodings (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE dimensions   | 64 |
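
As a back-of-the-envelope check on the VRAM figures above (my own arithmetic from the table values, not from the original authors), fp16 weights plus the per-sequence KV cache can be estimated as follows:

```python
# Back-of-the-envelope fp16 memory estimate from the Specifications table.
n_parameters = 6_876_450_080
n_layers, n_heads, d_head = 32, 16, 256
bytes_fp16 = 2

# Weights alone: ~12.8 GiB at fp16.
weights_gib = n_parameters * bytes_fp16 / 2**30

# KV cache for one sequence: 2 (K and V) * n_layers * n_ctx * n_heads * d_head.
def kv_cache_gib(n_ctx):
    return 2 * n_layers * n_ctx * n_heads * d_head * bytes_fp16 / 2**30

print(f"weights: {weights_gib:.1f} GiB")
print(f"KV cache @ 1688 tokens: {kv_cache_gib(1688):.2f} GiB")  # ~0.82 GiB
print(f"KV cache @ 2048 tokens: {kv_cache_gib(2048):.2f} GiB")  # ~1.0 GiB
```

The weights alone come to roughly 12.8 GiB at fp16, so on a 16 GB card only a few GiB remain for the KV cache and activations, which is consistent with the reduced 1,688-token usable context noted above.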