---
library_name: transformers
tags:
- indonesia
license: mit
language:
- id
inference: true
---
# How small can language models be?
Sasando-1 is a tiny, highly experimental text generator built using the Phi-3 architecture.
❕Go straight to the Gradio demo❕
This repo contains the 7M version.
## About
Sasando-1 is a tiny, highly experimental text generator built on the Phi-3 architecture. It comes in two microscopic sizes: 7M and 25M parameters. It was trained on a tightly-controlled subset of the Indo4B dataset, filtered down to only 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrated that a tiny language model can produce fluent text when trained on a tightly-controlled dataset.
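As a rough illustration of how a checkpoint like this would typically be loaded and sampled with the transformers library, here is a minimal generation sketch. The repo id `afrizalha/Sasando-1-7M` is an assumption for illustration, not a confirmed path, and the prompt and sampling settings are arbitrary:

```python
# Minimal generation sketch using the transformers library.
# The repo id below is an assumption for illustration; substitute
# the actual Hugging Face model id for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "afrizalha/Sasando-1-7M"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short Indonesian continuation from a seed prompt.
inputs = tokenizer("Pada suatu hari,", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,     # sampling tends to suit tiny models better
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```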
## Specs
- Comes in 7M and 25M parameter versions
- Based on the Phi-3 architecture (a hypothetical configuration in this size class is sketched below)
- Embedding vocabulary of 4,096 tokens
- Trained on ~257M tokens × 4 epochs
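To make the specs above concrete, the sketch below shows one hypothetical way a Phi-3 configuration can land in this size class. Only the 4,096-token vocabulary comes from the list above; every other hyperparameter is an assumption chosen so the total parameter count ends up roughly in the 7M range.

```python
# Hypothetical sketch: the sizes below are NOT the released model's
# actual hyperparameters, just one way a Phi-3 configuration can
# reach roughly 7M parameters with a 4,096-token vocabulary.
from transformers import Phi3Config, Phi3ForCausalLM

config = Phi3Config(
    vocab_size=4096,               # stated in the specs above
    hidden_size=256,               # assumed
    intermediate_size=1024,        # assumed
    num_hidden_layers=4,           # assumed
    num_attention_heads=8,         # assumed
    max_position_embeddings=2048,  # assumed
)

# Instantiates a randomly initialized model from the config
# (no weights are downloaded) and reports its size.
model = Phi3ForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```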
## Acknowledgments
- Developed by: Afrizal Hasbi Azizy
- License: MIT