Update README.md
This model is a fine-tuned version of xlm-roberta-base on the monasterium.net dataset.
On top of this XLM-RoBERTa transformer model is a classification head. When using this model, please also refer to the [XLM-RoBERTa (base-sized model)](https://huggingface.co/xlm-roberta-base) card or to the paper [Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al.](https://arxiv.org/abs/1911.02116) for additional information.
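
If you want to work below the pipeline level, the checkpoint can also be loaded directly to inspect that classification head; a minimal sketch, assuming the model id used in the usage example further down this card:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed model id, taken from the usage example below.
model_id = "ERCDiDip/40_langdetect_v0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# The head maps the pooled XLM-RoBERTa representation to one logit per
# supported language; id2label holds the language codes.
print(model.config.num_labels)
print(model.config.id2label)
```
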
## Intended uses & limitations
You can directly use this model as a language detector, i.e. for sequence classification tasks. Currently, it supports the following 41 languages, modern and medieval:
Modern: Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Irish (ga), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Turkish (tr), Basque (eu), Catalan (ca), Albanian (sq), Serbian (se), Ukrainian (uk), Norwegian (no), Arabic (ar), Chinese (zh), Hebrew (he)
Medieval: Middle High German (mhd), Latin (la), Middle Low German (gml), Old French (fro), Old Church Slavonic (chu), Early New High German (fnhd), Ancient and Medieval Greek (grc)
## Training and evaluation data
The model was fine-tuned using the Monasterium and Wikipedia datasets, which consist of text sequences in 40 languages. The training set contains 80k samples, while the validation and test sets contain 16k samples each. The average accuracy on the test set is 99.59% (this matches the average macro/weighted F1-score, as the test set is perfectly balanced).
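
For reference, these are the standard metrics as computed by, e.g., scikit-learn; a minimal sketch with short hypothetical label lists standing in for the real test split:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and predictions; the real evaluation uses the
# balanced 16k-sample test split.
y_true = ["la", "en", "de", "fro", "en", "la"]
y_pred = ["la", "en", "de", "fro", "de", "la"]

print(accuracy_score(y_true, y_pred))                # accuracy
print(f1_score(y_true, y_pred, average="macro"))     # unweighted mean over classes
print(f1_score(y_true, y_pred, average="weighted"))  # support-weighted mean
```
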

Example usage:

```python
from transformers import pipeline

# Load the language-detection pipeline (model id as given in this card).
classificator = pipeline("text-classification", model="ERCDiDip/40_langdetect_v0")

classificator("clemens etc dilecto filio scolastico ecclesie wetflari ensi treveren dioc salutem etc significarunt nobis dilecti filii commendator et fratres hospitalis beate marie theotonicorum")
```
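
By default the pipeline returns only the best-scoring label. Recent versions of transformers also accept a `top_k` argument if you want the runner-up languages as well; a small sketch reusing the `classificator` from above, with a hypothetical Latin snippet:

```python
# Return the three highest-scoring languages with their confidence scores.
classificator("in nomine domini amen", top_k=3)
```
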
## Updates
- 21 November 2022: added Ancient and Medieval Greek (grc)
## Framework versions
- Transformers 4.24.0
- PyTorch 1.13.0