Create README.md
README.md
ADDED
@@ -0,0 +1,23 @@
A pre-trained Japanese-language model with approximately 6.87 billion parameters, built on EleutherAI's Mesh Transformer JAX codebase and structurally similar to their GPT-J-6B pre-trained model.

- We used T5Tokenizer with SentencePiece instead of the GPT-2/3 tokenizer. The normalization performed by SentencePiece is a must for Japanese tokenization, since common symbols have many more variations than in Western languages. A hedged loading sketch follows this list.
- The tokenizer has a vocabulary of 52,500 tokens and was trained on a Japanese Wikipedia dump as of 1 Aug 2021.
- For inference, the model fits on 16 GB VRAM GPUs such as the P100 for context lengths up to 1,688 tokens. Output at the full 2,048 context length requires 20 GB of VRAM or more (e.g. RTX 3090/A5000); a rough memory estimate is sketched after the Specifications table.
- The model was trained for about 4 weeks on a TPU v3-128 generously provided by Google TRC.
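
For illustration, here is a minimal inference sketch using Hugging Face `transformers`. The local path `./gpt-j-japanese-6.8b` is a placeholder, not an official repo id, and the sketch assumes the Mesh Transformer JAX checkpoint has been converted to a GPT-J-compatible Hugging Face format:

```python
# Minimal inference sketch. Assumptions: the checkpoint has been converted to
# a GPT-J-compatible Hugging Face format, and "./gpt-j-japanese-6.8b" is a
# placeholder path, not an official repo id.
import torch
from transformers import T5Tokenizer, AutoModelForCausalLM

# T5Tokenizer wraps the SentencePiece model, so SentencePiece's
# normalization of Japanese symbol variants is applied automatically.
tokenizer = T5Tokenizer.from_pretrained("./gpt-j-japanese-6.8b")

# fp16 weights keep the ~6.87B-parameter model within 16 GB of VRAM
# for context lengths up to ~1,688 tokens, as noted above.
model = AutoModelForCausalLM.from_pretrained(
    "./gpt-j-japanese-6.8b", torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("日本語の文章を生成します。", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```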

## Specifications

| Hyperparameter    | Value |
|-------------------|-------|
| n_parameters      | 6,876,450,080 |
| n_layers          | 32 |
| d_model           | 4,096 |
| d_ff              | 16,384 |
| n_heads           | 16 |
| d_head            | 256 |
| n_ctx             | 2,048 |
| n_vocab           | 52,512 |
| position encoding | [Rotary position encodings (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE dimensions   | 64 |
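
As a back-of-the-envelope check on the VRAM figures above (my own arithmetic from the table values, not from the original authors), fp16 weights plus the per-sequence KV cache can be estimated as follows:

```python
# Back-of-the-envelope fp16 memory estimate from the Specifications table.
n_parameters = 6_876_450_080
n_layers, n_heads, d_head = 32, 16, 256
bytes_fp16 = 2

# Weights alone: ~12.8 GiB at fp16.
weights_gib = n_parameters * bytes_fp16 / 2**30

# KV cache for one sequence: 2 (K and V) * n_layers * n_ctx * n_heads * d_head.
def kv_cache_gib(n_ctx):
    return 2 * n_layers * n_ctx * n_heads * d_head * bytes_fp16 / 2**30

print(f"weights: {weights_gib:.1f} GiB")
print(f"KV cache @ 1688 tokens: {kv_cache_gib(1688):.2f} GiB")  # ~0.82 GiB
print(f"KV cache @ 2048 tokens: {kv_cache_gib(2048):.2f} GiB")  # ~1.0 GiB
```

The weights alone come to roughly 12.8 GiB at fp16, so on a 16 GB card only a few GiB remain for the KV cache and activations, which is consistent with the reduced 1,688-token usable context noted above.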