---
library_name: transformers
tags:
  - indonesia
license: mit
language:
  - id
inference: true
---

# Sasando-1

*How small can language models be?*

Sasando-1 is a tiny, highly experimental text generator built using the Phi-3 architecture.

❕Go straight to the Gradio demo❕

This repo contains the 7M-parameter version.

## About

Sasando-1 is a tiny, highly experimental text generator built on the Phi-3 architecture. It comes in two microscopic sizes: 7M and 25M parameters. It was trained on a tightly controlled subset of the Indo4B dataset, filtered to contain only 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrates that a tiny language model can produce fluent text when trained on a tightly controlled dataset.
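
As a loose illustration of this kind of data curation (this is *not* the author's actual pipeline; `build_vocab`, `filter_corpus`, and the sample lines are hypothetical), filtering a corpus down to a fixed word list might look like:

```python
# Illustrative sketch only: restrict a corpus to sentences made entirely of
# words from a fixed 18,000-word vocabulary, in the spirit of the
# TinyStories-style filtering described above.
from collections import Counter

def build_vocab(corpus, vocab_size=18_000):
    """Pick the vocab_size most frequent words in the corpus."""
    counts = Counter(word for line in corpus for word in line.lower().split())
    return {word for word, _ in counts.most_common(vocab_size)}

def filter_corpus(corpus, vocab):
    """Keep only lines whose words all fall inside the vocabulary."""
    return [line for line in corpus if all(w in vocab for w in line.lower().split())]

corpus = ["saya suka makan nasi", "ini contoh kalimat"]  # stand-in for Indo4B lines
vocab = build_vocab(corpus)
filtered = filter_corpus(corpus, vocab)
```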

## Specs

- Comes in 7M and 25M parameter variants
- Based on the Phi-3 architecture
- Embedding vocabulary of 4,096 tokens
- Trained on ~257M tokens × 4 epochs
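
A minimal inference sketch with 🤗 Transformers, assuming the Hub repo id `afrizalha/Sasando-1-7M` and the standard causal-LM API; the prompt and sampling settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "afrizalha/Sasando-1-7M"  # assumed repo id for the 7M variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Indonesian prompt: the model is trained on Indonesian text only
inputs = tokenizer("Pada suatu hari,", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```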

## Acknowledgments

- Developed by: Afrizal Hasbi Azizy
- License: MIT