Update vocab and model size (#1)
- Update vocab and model size (de665e61ebc0d8c8aba3266771919fd4579a45ce)
Co-authored-by: Terry Ming <terru3@users.noreply.huggingface.co>
app.py
CHANGED
@@ -17,8 +17,8 @@ def main():
 
 st.markdown("""We used the dataset from the [TinyStories Research Paper](https://arxiv.org/pdf/2305.07759.pdf) (Ronen Eldan and Yuanzhi Li, Microsoft),
 which consists of 2.1 million synthetic short children's stories generated by GPT-4, to train a Transformer LLM that we built from scratch in PyTorch.""")
-st.markdown("""Our final model uses EleutherAI's [gpt-neo-1.3B tokenizer](https://huggingface.co/EleutherAI/gpt-neo-1.3B) (vocab size 50,
-16 attention heads, and an embedding dimension of 768, for a total of
+st.markdown("""Our final model uses EleutherAI's [gpt-neo-1.3B tokenizer](https://huggingface.co/EleutherAI/gpt-neo-1.3B) (vocab size 50,257) and consists of 8 transformer blocks,
+16 attention heads, and an embedding dimension of 768, for a total of ~56M non-embedding parameters. The model was trained on 8 H100 GPUs for ~7 hours, achieving a cross-entropy validation loss of 1.16,
 which is superior to any model in the TinyStories paper (likely due to a larger vocab size and far more compute).""")
 st.markdown("""Despite the simple themes and limited vocabulary present in the training data, the model is
 quite effective at generating new short stories. **Try it out below!**""")
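As a rough sanity check of the figures in the new description, the sketch below recomputes the ~56M non-embedding parameter count from 8 blocks and an embedding dimension of 768, and converts the 1.16 validation loss to perplexity. It assumes a standard GPT-style block (4x MLP expansion, biases on every linear layer, two LayerNorms per block) and a loss measured in nats. Neither assumption is confirmed by the diff, and the repository's actual architecture may differ.

```python
import math

# Figures copied from the updated description in app.py.
VOCAB_SIZE = 50_257  # gpt-neo-1.3B tokenizer vocabulary
N_LAYERS = 8         # transformer blocks
D_MODEL = 768        # embedding dimension (16 heads split it, count is unchanged)
VAL_LOSS = 1.16      # cross-entropy validation loss

# Assumption (not confirmed by the diff): GPT-style blocks with a 4x MLP
# expansion, biases on every linear layer, and two LayerNorms per block.
D_FF = 4 * D_MODEL


def block_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one assumed GPT-style transformer block."""
    attn = 4 * (d_model * d_model + d_model)  # Q, K, V and output projections
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # up + down projections
    norms = 2 * (2 * d_model)  # two LayerNorms, each with scale and shift
    return attn + mlp + norms


non_embedding = N_LAYERS * block_params(D_MODEL, D_FF)
approx = 12 * N_LAYERS * D_MODEL**2  # common 12*L*d^2 shortcut (weights only)
token_embedding = VOCAB_SIZE * D_MODEL  # not part of the "non-embedding" figure

# Assumption: the loss is in nats, as with torch.nn.functional.cross_entropy.
perplexity = math.exp(VAL_LOSS)

print(f"non-embedding params:   {non_embedding:,}")    # 56,702,976
print(f"12*L*d^2 approximation: {approx:,}")           # 56,623,104
print(f"token-embedding params: {token_embedding:,}")  # 38,597,376
print(f"validation perplexity:  {perplexity:.2f}")     # 3.19
```

Under these assumptions both the exact count and the 12·L·d² shortcut land at roughly 56.6 to 56.7M, consistent with "~56M". The ~38.6M token-embedding weights are excluded from that figure, which is why it is quoted as a non-embedding count. A loss of 1.16 nats corresponds to a perplexity of about 3.19.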