Update README.md
README.md CHANGED
@@ -6,6 +6,8 @@ language:
 ---
 
 This is the pre-trained 3B model with a vocabulary size of 43K from the paper "Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies". We investigate how vocabulary size
-impacts language model scaling laws in this paper.
+impacts language model scaling laws in this paper.
+
+Based on our approach, we predict that the optimal vocabulary size for the 3B model is about 43K.
 Then, we train a Llama-based 3B model on a sampled version of the SlimPajama dataset. The model with the 43K vocabulary outperforms the model with the common 32K vocabulary, despite using fewer training tokens.
 It is noteworthy that the proposed approach can be used for different model sizes.
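To make the prediction step above concrete, here is a toy sketch of applying a fitted power law to choose a vocabulary size from a parameter count. The functional form and both constants are placeholder assumptions chosen only so that a 3B-parameter model maps to roughly 43K; they are not the paper's fitted values.

```python
# Purely illustrative: a toy power-law predictor for optimal vocabulary size.
# The form V_opt = k * N**gamma and the constants below are placeholder
# assumptions, NOT the paper's fitted scaling law.

def predict_optimal_vocab(non_vocab_params: float,
                          k: float = 0.785,
                          gamma: float = 0.5) -> int:
    """Predict a vocabulary size from the non-vocabulary parameter count."""
    return round(k * non_vocab_params ** gamma)

# With these made-up constants, 3e9 parameters maps to roughly 43K tokens.
print(predict_optimal_vocab(3e9))  # ~43000
```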
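For loading the checkpoint this README describes, a minimal sketch using the Hugging Face transformers API follows. The repo id is a placeholder assumption, not the card's confirmed model id; substitute the actual id shown on this page.

```python
# Minimal loading sketch. The repo id below is a placeholder for this
# card's actual model id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sail/scaling-with-vocab-3B-43K"  # placeholder, not confirmed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The tokenizer should report the 43K vocabulary described above.
print(len(tokenizer))

prompt = "Scaling laws with vocabulary:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```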