metadata
language: mn
ALBERT-Mongolian
Model description
Here we provide pretrained ALBERT model and trained SentencePiece model for Mongolia text. Training data is the Mongolian wikipedia corpus from Wikipedia Downloads and Mongolian News corpus.
Evaluation Result:
loss = 1.7478163
masked_lm_accuracy = 0.6838185
masked_lm_loss = 1.6687671
sentence_order_accuracy = 0.998125
sentence_order_loss = 0.007942731
Fine-tuning Result on Eduge Dataset:
precision recall f1-score support
байгал орчин 0.85 0.83 0.84 999
боловсрол 0.80 0.80 0.80 873
спорт 0.98 0.98 0.98 2736
технологи 0.88 0.93 0.91 1102
улс төр 0.92 0.85 0.89 2647
урлаг соёл 0.93 0.94 0.94 1457
хууль 0.89 0.87 0.88 1651
эдийн засаг 0.83 0.88 0.86 2509
эрүүл мэнд 0.89 0.92 0.90 1159
accuracy 0.90 15133
macro avg 0.89 0.89 0.89 15133
weighted avg 0.90 0.90 0.90 15133
Reference
- ALBERT - official repo
- WikiExtrator
- Mongolian BERT
- ALBERT - Japanese
- Mongolian Text Classification
- You's paper
Citation
@misc{albert-mongolian,
author = {Bayartsogt Yadamsuren},
title = {ALBERT Pretrained Model on Mongolian Datasets},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/bayartsogt-ya/albert-mongolian/}}
}
For More Information
Please contact by bayartsogtyadamsuren@icloud.com