rmihaylov/bert-base-pos-theseus-bg

Token Classification


rmihaylov committed on Apr 16, 2022

Commit e85ab91 (1 parent: eb118eb)

Update README.md

Files changed (1)

README.md +3 -1


@@ -16,10 +16,12 @@ tags:
 Pretrained model on Bulgarian language using a masked language modeling (MLM) objective. It was introduced in
 [this paper](https://arxiv.org/abs/1810.04805) and first released in
 [this repository](https://github.com/google-research/bert). This model is cased: it does make a difference
-between bulgarian and Bulgarian.
+between bulgarian and Bulgarian. The training data is Bulgarian text from [OSCAR](https://oscar-corpus.com/post/oscar-2019/), [Chitanka](https://chitanka.info/) and [Wikipedia](https://bg.wikipedia.org/).
 It was finetuned on public part-of-speech Bulgarian data.
+Then, it was compressed via [progressive module replacing](https://arxiv.org/abs/2002.02925).
 ### How to use
 Here is how to use this model in PyTorch:
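
The README's own usage example lies outside this hunk, so what follows is not the author's code. It is a minimal sketch of how a token-classification checkpoint like this one might be called for part-of-speech tagging. It assumes the repository loads with the standard `transformers` auto classes, ships a fast tokenizer (needed for `word_ids()`), and stores its tag names in `config.id2label`. The helper names `first_subtoken_tags` and `tag_sentence` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Repository name as shown on this page.
MODEL_ID = "rmihaylov/bert-base-pos-theseus-bg"


def first_subtoken_tags(
    word_ids: List[Optional[int]],
    label_ids: List[int],
    id2label: Dict[int, str],
) -> List[str]:
    """Keep one tag per word: the prediction for its first sub-token.

    `word_ids` maps each token position to the index of the word it came
    from, with None for special tokens such as [CLS] and [SEP].
    """
    tags: List[str] = []
    seen = set()
    for word_id, label_id in zip(word_ids, label_ids):
        if word_id is None or word_id in seen:
            continue  # skip special tokens and trailing sub-tokens
        seen.add(word_id)
        tags.append(id2label[label_id])
    return tags


def tag_sentence(words: List[str]) -> List[Tuple[str, str]]:
    """Tag a pre-split sentence. Downloads the model on first use."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # Assumption: the checkpoint is compatible with the auto classes.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    model.eval()

    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits

    # Highest-scoring label per token for the single sentence in the batch.
    label_ids = logits.argmax(dim=-1)[0].tolist()
    tags = first_subtoken_tags(
        encoding.word_ids(0), label_ids, model.config.id2label
    )
    return list(zip(words, tags))
```

Calling `tag_sentence(["Аз", "чета", "книга", "."])` should return one `(word, tag)` pair per input word if the assumptions above hold. The tag inventory depends on the fine-tuning data and is not stated in this diff.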