afrizalha/Sasando-1-7M

@@ -34,10 +34,10 @@ inference: true
     <p><em style="color: black; font-weight: bold;">This repo contains the 7M version.</em></p>
 </center>
-### 🎻 Welcome!
 Sasando-1 is a tiny, highly experimental Indonesian text generator built using the Phi-3 architecture. It comes in two microscopic sizes: 7M and 25M parameters. It is trained on a tightly controlled Indo4B dataset filtered to contain only 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrates that a tiny language model can produce fluent text when trained on a tightly controlled dataset.
-### 🇮🇩 Context
 Indonesia has 700+ languages, and many of them are dying at an alarming rate. Language technologies like generative AI can play a massive role in language preservation. However, Indonesia has several contextual issues:
 - Many languages, including those with millions of speakers, have low-volume digital resources
@@ -45,18 +45,18 @@ Indonesia has +700 languages, and many of them are dying at an alarming rate. La
 Overcoming these challenges requires developers to work with what little data and money they have. Sasando-1 is a prototypical demonstration that scarce resources can potentially still be leveraged to develop generative models with cheap compute.
-### ✨ Specs
 - Comes with 7M and 25M parameters
 - Based on Phi-3 architecture
 - Embedding vocab 4096
 - Trained on ~257M tokens * 4 epochs
-### 🔭 Out-of-Scope Use
 This is a research preview base model. It is not instruction-tuned and has minimal safety curation. It is not intended for commercial or practical applications.
 You are also not allowed to use this model without having fun.
-### Acknowledgments
 - **Developed by:** Afrizal Hasbi Azizy
 - **License:** MIT

     <p><em style="color: black; font-weight: bold;">This repo contains the 7M version.</em></p>
 </center>
+## 🎻 Welcome!
 Sasando-1 is a tiny, highly experimental Indonesian text generator built using the Phi-3 architecture. It comes in two microscopic sizes: 7M and 25M parameters. It is trained on a tightly controlled Indo4B dataset filtered to contain only 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrates that a tiny language model can produce fluent text when trained on a tightly controlled dataset.
+## 🇮🇩 Context
 Indonesia has 700+ languages, and many of them are dying at an alarming rate. Language technologies like generative AI can play a massive role in language preservation. However, Indonesia has several contextual issues:
 - Many languages, including those with millions of speakers, have low-volume digital resources
 Overcoming these challenges requires developers to work with what little data and money they have. Sasando-1 is a prototypical demonstration that scarce resources can potentially still be leveraged to develop generative models with cheap compute.
+## ✨ Specs
 - Comes with 7M and 25M parameters
 - Based on Phi-3 architecture
 - Embedding vocab 4096
 - Trained on ~257M tokens * 4 epochs
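
To make the training budget in the list above concrete, here is a rough back-of-the-envelope sketch. It only multiplies out the figures quoted in the specs; the parameter counts are the nominal 7M and 25M labels rather than exact values, and it assumes (the card does not say so explicitly) that both variants saw the same token budget. The helper names are made up for illustration.

```python
# Figures taken from the Specs list; both are approximate.
TOKENS_PER_EPOCH = 257_000_000  # "~257M tokens"
EPOCHS = 4

# Nominal parameter counts of the two variants (labels, not exact counts).
# Assumption: both variants were trained on the same token budget.
PARAM_COUNTS = {"7M": 7_000_000, "25M": 25_000_000}


def total_tokens_seen(tokens_per_epoch: int, epochs: int) -> int:
    """Total tokens processed during training, counting repeats across epochs."""
    return tokens_per_epoch * epochs


def tokens_per_parameter(total_tokens: int, n_params: int) -> float:
    """Ratio of training tokens seen to model parameters."""
    return total_tokens / n_params


total = total_tokens_seen(TOKENS_PER_EPOCH, EPOCHS)
ratios = {name: tokens_per_parameter(total, n) for name, n in PARAM_COUNTS.items()}

if __name__ == "__main__":
    print(f"Total tokens seen: {total:,}")
    for name, ratio in ratios.items():
        print(f"{name}: about {ratio:.0f} tokens per parameter")
```

Under these assumptions the model sees roughly one billion tokens in total, most of them repeats of the same ~257M-token corpus.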
+## 🔭 Out-of-Scope Use
 This is a research preview base model. It is not instruction-tuned and has minimal safety curation. It is not intended for commercial or practical applications.
 You are also not allowed to use this model without having fun.
+## Acknowledgments
 - **Developed by:** Afrizal Hasbi Azizy
 - **License:** MIT