Cedric Lothritz
committed on
Commit
·
c279421
1
Parent(s):
51711b8
Update README.md
README.md
CHANGED
@@ -0,0 +1,8 @@
+## LuxemBERT
+
+LuxemBERT is a BERT model for the Luxembourgish language.
+It was trained using 6.1 million Luxembourgish sentences from various sources, including the Luxembourgish Wikipedia, the Leipzig Corpora Collection, and rtl.lu.
+In addition, we partially translated 6.1 million sentences from the German Wikipedia into Luxembourgish as a means of data augmentation. This gave us a dataset of 12.2 million sentences, which we used to train our LuxemBERT model.
+
+If you use our model, please cite our paper:
+[Will be added later]
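
The README does not say how the German sentences were "partially translated". As a rough illustration only, the sketch below assumes one plausible approach: word-level substitution from a bilingual lexicon, leaving words without an entry in German. The names `TOY_LEXICON`, `partially_translate`, and `augment` are hypothetical, and the four-entry lexicon is a toy stand-in, not the resource used for LuxemBERT.

```python
import re

# Hypothetical toy German -> Luxembourgish lexicon, for illustration only.
TOY_LEXICON = {
    "und": "an",
    "ist": "ass",
    "nicht": "net",
    "ich": "ech",
}


def partially_translate(sentence, lexicon):
    """Replace words found in the lexicon; leave every other token unchanged."""

    def swap(match):
        word = match.group(0)
        target = lexicon.get(word.lower())
        if target is None:
            return word  # no lexicon entry: keep the German word
        # Carry over the capitalization of the first letter.
        return target.capitalize() if word[0].isupper() else target

    return re.sub(r"\w+", swap, sentence)


def augment(native_sentences, german_sentences, lexicon):
    """Combine native sentences with partially translated German ones."""
    translated = [partially_translate(s, lexicon) for s in german_sentences]
    return list(native_sentences) + translated


if __name__ == "__main__":
    print(partially_translate("Das Haus ist nicht alt und klein.", TOY_LEXICON))
    # prints: Das Haus ass net alt an klein.
```

With equally sized inputs, as in the README (6.1 million native and 6.1 million translated sentences), `augment` returns a corpus twice the size of either input, which matches the 12.2 million total.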