Merge branch 'main' of https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-shuffled into main
README.md
ADDED
@@ -0,0 +1,66 @@
---
language: "nl"
thumbnail: "https://github.com/iPieter/RobBERT/raw/master/res/robbert_logo.png"
tags:
- Dutch
- Flemish
- RoBERTa
- RobBERT
- RobBERTje
license: mit
datasets:
- oscar
- oscar (NL)
- dbrd
- lassy-ud
- europarl-mono
- conll2002
widget:
- text: "Hallo, ik ben RobBERTje, een gedistilleerd <mask> taalmodel van de KU Leuven."
---

<p align="center">
<img src="https://github.com/iPieter/robbertje/raw/master/images/robbertje_logo_with_name.png" alt="RobBERTje: A collection of distilled Dutch BERT-based models" width="75%">
</p>

# About RobBERTje
RobBERTje is a collection of distilled models based on [RobBERT](http://github.com/iPieter/robbert). The models come in different sizes and were trained with different settings, so you can choose the one that best fits your use case.

We are also continuously working on releasing better-performing models, so watch [the repository](http://github.com/iPieter/robbertje) for updates.

# News
- **July 2, 2021**: Publicly released 4 RobBERTje models.
- **May 12, 2021**: RobBERTje was accepted at [CLIN31](https://www.clin31.ugent.be) for an oral presentation!

# The models
| Model | Description | Parameters | Training size | Hugging Face ID |
|-------|-------------|------------|---------------|-----------------|
| Non-shuffled | Trained on the non-shuffled variant of the OSCAR corpus, without any operations to preserve this order during training and distillation. | 74 M | 1 GB | this model |
| Shuffled | Trained on the publicly available and shuffled OSCAR corpus. | 74 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-shuffled](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-shuffled) |
| Merged (p=0.5) | Same as the non-shuffled variant, but sequential sentences of the same document are merged with a probability of 50%. | 74 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-merged](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-merged) |
| BORT | A smaller version with 8 attention heads instead of 12 and 4 layers instead of 6 (RobBERT has 12 layers). | 46 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-bort](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-bort) |
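
Any of the Hugging Face IDs in the table can be loaded by name. The snippet below is a minimal sketch, assuming the `transformers` library is installed and the model can be downloaded from the Hub; the helper name `fill_mask` and the `top_k` default are illustrative choices, not part of the RobBERTje release.

```python
from typing import List, Tuple

# Model ID taken from the table above; swap in another RobBERTje variant as needed.
MODEL_ID = "DTAI-KULeuven/robbertje-1-gb-shuffled"


def fill_mask(text: str, model_id: str = MODEL_ID, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (token, score) predictions for the single <mask> in `text`.

    Hypothetical helper for illustration; it downloads the model on first use.
    """
    # Imported lazily so that merely defining the helper does not require transformers.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    predictions = unmasker(text, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

Calling `fill_mask("Hallo, ik ben RobBERTje, een gedistilleerd <mask> taalmodel van de KU Leuven.")` would then return the highest-scoring candidates for the masked token.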

# Results

## Intrinsic results

We calculated the _pseudo perplexity_ (PPPL) from [cite](), which is a built-in metric in our distillation library. This metric indicates how well the model captures the input distribution; lower is better.

| Model             | PPPL  |
|-------------------|-------|
| RobBERT (teacher) | 7.76  |
| Non-shuffled      | 12.95 |
| Shuffled          | 18.74 |
| Merged (p=0.5)    | 17.10 |
| BORT              | 26.44 |
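
For readers unfamiliar with the metric, the sketch below shows how a pseudo-perplexity is commonly computed for a masked language model: each token is masked in turn, the log-probability the model assigns to the true token is recorded, and the exponential of the negative average is taken over all tokens. This is an illustration under that assumption, not the code of the distillation library; the function name `pseudo_perplexity` and its input format are hypothetical.

```python
import math
from typing import Sequence


def pseudo_perplexity(token_log_probs: Sequence[Sequence[float]]) -> float:
    """Compute PPPL from per-token conditional log-probabilities.

    `token_log_probs[s][t]` is assumed to hold log P(w_t | all other tokens)
    for token t of sentence s, obtained by masking that token and reading
    the model's (natural-log) probability for the original token.
    """
    total_log_prob = 0.0
    total_tokens = 0
    for sentence in token_log_probs:
        total_log_prob += sum(sentence)
        total_tokens += len(sentence)
    if total_tokens == 0:
        raise ValueError("at least one token is required")
    # Exponentiate the negative mean log-probability over the whole corpus.
    return math.exp(-total_log_prob / total_tokens)
```

As a sanity check, a model that assigns probability 0.5 to every masked token yields a PPPL of 2, and a model that is always certain of the correct token yields 1.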

## Extrinsic results
We also evaluated our models on several downstream tasks, the same ones used to evaluate the teacher model RobBERT. Since that evaluation, a [Dutch NLI task named SICK-NL](https://arxiv.org/abs/2101.05716) was also released, and we evaluated our models on it as well.

| Model             | DBRD | DIE-DAT | NER  | POS  | SICK-NL |
|-------------------|------|---------|------|------|---------|
| RobBERT (teacher) | 94.4 | 99.2    | 89.1 | 96.4 | 84.2    |
| Non-shuffled      | 90.2 | 98.4    | 82.9 | 95.5 | 83.4    |
| Shuffled          | 92.5 | 98.2    | 82.7 | 95.6 | 83.4    |
| Merged (p=0.5)    | 92.9 | 96.5    | 81.8 | 95.2 | 82.8    |
| BORT              | 89.6 | 92.2    | 79.7 | 94.3 | 81.0    |