---
library_name: transformers
tags:
  - indonesia
license: mit
language:
  - id
inference: true
---

# Sasando-1

*How small can language models be?*

Sasando-1 is a tiny, highly experimental text generator built using the Phi-3 architecture.

❕Go straight to the Gradio demo❕

This repo contains the 7M-parameter version.

## About

Sasando-1 is a tiny, highly experimental text generator built on the Phi-3 architecture. It comes in two microscopic sizes: 7M and 25M parameters. It was trained on a tightly controlled subset of the Indo4B dataset, filtered to contain only 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrates that a tiny language model can produce fluent text when trained on a tightly controlled dataset.
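
As a loose illustration of this kind of data curation (this is *not* the author's actual pipeline; `build_vocab`, `filter_corpus`, and the sample lines are hypothetical), filtering a corpus down to a fixed word list might look like:

```python
# Illustrative sketch only: restrict a corpus to sentences made entirely of
# words from a fixed 18,000-word vocabulary, in the spirit of the
# TinyStories-style filtering described above.
from collections import Counter

def build_vocab(corpus, vocab_size=18_000):
    """Pick the vocab_size most frequent words in the corpus."""
    counts = Counter(word for line in corpus for word in line.lower().split())
    return {word for word, _ in counts.most_common(vocab_size)}

def filter_corpus(corpus, vocab):
    """Keep only lines whose words all fall inside the vocabulary."""
    return [line for line in corpus if all(w in vocab for w in line.lower().split())]

corpus = ["saya suka makan nasi", "ini contoh kalimat"]  # stand-in for Indo4B lines
vocab = build_vocab(corpus)
filtered = filter_corpus(corpus, vocab)
```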

## Specs

- Comes in 7M and 25M parameter variants
- Based on the Phi-3 architecture
- Embedding vocabulary of 4,096 tokens
- Trained on ~257M tokens × 4 epochs
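
A minimal inference sketch with 🤗 Transformers, assuming the Hub repo id `afrizalha/Sasando-1-7M` and the standard causal-LM API; the prompt and sampling settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "afrizalha/Sasando-1-7M"  # assumed repo id for the 7M variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Indonesian prompt: the model is trained on Indonesian text only
inputs = tokenizer("Pada suatu hari,", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```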

## Acknowledgments

- Developed by: Afrizal Hasbi Azizy
- License: MIT