Update README.md
README.md CHANGED
@@ -6,6 +6,8 @@ language:
 ---
 
 This is the pre-trained 3B model with a vocabulary size of 43K from the paper "Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies". We investigate how vocabulary size
-impacts language model scaling laws in this paper.
+impacts language model scaling laws in this paper.
+
+Based on our approach, we predict that the optimal vocabulary size for the 3B model is about 43K.
 Then, we train a Llama-based 3B model on a sampled version of the SlimPajama dataset. The model with the 43K vocabulary outperforms the model with the common 32K vocabulary, despite using fewer training tokens.
 It is noteworthy that the proposed approach can be used for different model sizes.
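To make the prediction step above concrete, here is a toy sketch of applying a fitted power law to choose a vocabulary size from a parameter count. The functional form and both constants are placeholder assumptions chosen only so that a 3B-parameter model maps to roughly 43K; they are not the paper's fitted values.

```python
# Purely illustrative: a toy power-law predictor for optimal vocabulary size.
# The form V_opt = k * N**gamma and the constants below are placeholder
# assumptions, NOT the paper's fitted scaling law.

def predict_optimal_vocab(non_vocab_params: float,
                          k: float = 0.785,
                          gamma: float = 0.5) -> int:
    """Predict a vocabulary size from the non-vocabulary parameter count."""
    return round(k * non_vocab_params ** gamma)

# With these made-up constants, 3e9 parameters maps to roughly 43K tokens.
print(predict_optimal_vocab(3e9))  # ~43000
```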
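For loading the checkpoint this README describes, a minimal sketch using the Hugging Face transformers API follows. The repo id is a placeholder assumption, not the card's confirmed model id; substitute the actual id shown on this page.

```python
# Minimal loading sketch. The repo id below is a placeholder for this
# card's actual model id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sail/scaling-with-vocab-3B-43K"  # placeholder, not confirmed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The tokenizer should report the 43K vocabulary described above.
print(len(tokenizer))

prompt = "Scaling laws with vocabulary:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```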