Mangosteen, a 47-billion-token Thai corpus built with a Thai-adapted pipeline, improves the performance of language models trained on it on Thai benchmarks.
Wannaphong Phatthiyaphaibun