Pieter Delobelle committed
Commit 27b9c63
2 parents: 1407707, cc44325

Merge branch 'main' of https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-shuffled into main

Files changed (1)
  1. README.md +66 -0
README.md ADDED
@@ -0,0 +1,66 @@
+ ---
+ language: "nl"
+ thumbnail: "https://github.com/iPieter/RobBERT/raw/master/res/robbert_logo.png"
+ tags:
+ - Dutch
+ - Flemish
+ - RoBERTa
+ - RobBERT
+ - RobBERTje
+ license: mit
+ datasets:
+ - oscar
+ - oscar (NL)
+ - dbrd
+ - lassy-ud
+ - europarl-mono
+ - conll2002
+ widget:
+ - text: "Hallo, ik ben RobBERTje, een gedistilleerd <mask> taalmodel van de KU Leuven."
+ ---
+
+ <p align="center">
+ <img src="https://github.com/iPieter/robbertje/raw/master/images/robbertje_logo_with_name.png" alt="RobBERTje: A collection of distilled Dutch BERT-based models" width="75%">
+ </p>
+
+ # About RobBERTje
+ RobBERTje is a collection of distilled models based on [RobBERT](http://github.com/iPieter/robbert). The models differ in size and training settings, so you can choose the one that best fits your use case.
+
+ We are continuously working on releasing better-performing models, so watch [the repository](http://github.com/iPieter/robbertje) for updates.
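
As a minimal sketch of trying one of these models, the snippet below runs the fill-mask pipeline from the `transformers` library on the widget sentence from this model card. It assumes that `transformers` and a backend such as PyTorch are installed and that the model id of this repository can be downloaded; the helper name `top_predictions` is hypothetical.

```python
MODEL_ID = "DTAI-KULeuven/robbertje-1-gb-shuffled"  # id of this repository
EXAMPLE = "Hallo, ik ben RobBERTje, een gedistilleerd <mask> taalmodel van de KU Leuven."


def top_predictions(text=EXAMPLE, model_id=MODEL_ID, top_k=5):
    """Return (token, score) pairs for the single <mask> in `text`."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    # For one string with one mask, the pipeline returns a list of dicts.
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=top_k)]
```

Calling `top_predictions()` downloads the model on first use and returns the most likely fillers for the masked position, ordered by score.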
+
+ # News
+ - **July 2, 2021**: Publicly released 4 RobBERTje models.
+ - **May 12, 2021**: RobBERTje was accepted at [CLIN31](https://www.clin31.ugent.be) for an oral presentation!
+
+ # The models
+ | Model | Description | Parameters | Training size | Hugging Face id |
+ |-------|-------------|------------|---------------|-----------------|
+ | Non-shuffled | Trained on the non-shuffled variant of the OSCAR corpus, without any operations to preserve this order during training and distillation. | 74 M | 1 GB | this model |
+ | Shuffled | Trained on the publicly available and shuffled OSCAR corpus. | 74 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-shuffled](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-shuffled) |
+ | Merged (p=0.5) | Same as the non-shuffled variant, but sequential sentences of the same document are merged with a probability of 50%. | 74 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-merged](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-merged) |
+ | BORT | A smaller version with 8 attention heads instead of 12 and 4 layers instead of 6 (RobBERT has 12 layers). | 46 M | 1 GB | [DTAI-KULeuven/robbertje-1-gb-bort](https://huggingface.co/DTAI-KULeuven/robbertje-1-gb-bort) |
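
The merging used for the Merged (p=0.5) variant can be illustrated with a short sketch. This is not the authors' preprocessing code: it assumes a document is given as a list of sentences and that each sentence is joined onto the preceding one with probability `p`. The function name `merge_sentences` and the space separator are hypothetical.

```python
import random


def merge_sentences(sentences, p=0.5, rng=None):
    """Join each sentence onto the previous one with probability p."""
    rng = rng or random.Random()
    merged = []
    for sentence in sentences:
        if merged and rng.random() < p:
            # Extend the previous training example with this sentence.
            merged[-1] = merged[-1] + " " + sentence
        else:
            # Start a new training example.
            merged.append(sentence)
    return merged
```

With `p=0.0` every sentence stays separate, and with `p=1.0` the whole document collapses into one example; `p=0.5` gives a mix of single and merged sentences.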
+
+ # Results
+
+ ## Intrinsic results
+
+ We calculated the _pseudo-perplexity_ (PPPL) from [cite](), which is a built-in metric in our distillation library. This metric indicates how well the model captures the input distribution; lower values are better.
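
For reference, pseudo-perplexity is commonly defined as the exponential of the negative mean pseudo-log-likelihood, where each token is masked in turn and scored by the model. The sketch below assumes the per-token log-probabilities have already been obtained from a masked language model. It is not the implementation in the distillation library, and `pseudo_perplexity` is a hypothetical name.

```python
import math


def pseudo_perplexity(token_log_probs):
    """Compute PPPL from natural-log probabilities log P(token | rest of sentence).

    `token_log_probs` is a list of sentences, each a list of per-token log-probs.
    """
    total = sum(sum(sentence) for sentence in token_log_probs)
    n_tokens = sum(len(sentence) for sentence in token_log_probs)
    if n_tokens == 0:
        raise ValueError("need at least one scored token")
    # Exponential of the negative average log-probability per token.
    return math.exp(-total / n_tokens)
```

Under this definition, a model that assigns probability 1/8 to every masked token has a PPPL of 8, and a model that predicts every token with certainty has a PPPL of 1.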
+
+ | Model             | PPPL  |
+ |-------------------|-------|
+ | RobBERT (teacher) | 7.76  |
+ | Non-shuffled      | 12.95 |
+ | Shuffled          | 18.74 |
+ | Merged (p=0.5)    | 17.10 |
+ | BORT              | 26.44 |
+
+ ## Extrinsic results
+ We also evaluated our models on several downstream tasks, just like the teacher model RobBERT. Since that evaluation, a [Dutch NLI task named SICK-NL](https://arxiv.org/abs/2101.05716) was also released, and we evaluated our models on it as well.
+
+ | Model             | DBRD | DIE-DAT | NER  | POS  | SICK-NL |
+ |-------------------|------|---------|------|------|---------|
+ | RobBERT (teacher) | 94.4 | 99.2    | 89.1 | 96.4 | 84.2    |
+ | Non-shuffled      | 90.2 | 98.4    | 82.9 | 95.5 | 83.4    |
+ | Shuffled          | 92.5 | 98.2    | 82.7 | 95.6 | 83.4    |
+ | Merged (p=0.5)    | 92.9 | 96.5    | 81.8 | 95.2 | 82.8    |
+ | BORT              | 89.6 | 92.2    | 79.7 | 94.3 | 81.0    |
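
One way to read the extrinsic table is to express each distilled model's score as a percentage of the teacher's score. The sketch below copies the numbers from the table above; the helper name `retained` and the choice of a simple ratio are assumptions made for illustration, not part of the original evaluation.

```python
TEACHER = {"DBRD": 94.4, "DIE-DAT": 99.2, "NER": 89.1, "POS": 96.4, "SICK-NL": 84.2}
STUDENTS = {
    "Non-shuffled": {"DBRD": 90.2, "DIE-DAT": 98.4, "NER": 82.9, "POS": 95.5, "SICK-NL": 83.4},
    "Shuffled": {"DBRD": 92.5, "DIE-DAT": 98.2, "NER": 82.7, "POS": 95.6, "SICK-NL": 83.4},
    "Merged (p=0.5)": {"DBRD": 92.9, "DIE-DAT": 96.5, "NER": 81.8, "POS": 95.2, "SICK-NL": 82.8},
    "BORT": {"DBRD": 89.6, "DIE-DAT": 92.2, "NER": 79.7, "POS": 94.3, "SICK-NL": 81.0},
}


def retained(student, teacher=TEACHER):
    """Return each task score as a percentage of the teacher's score."""
    return {task: 100 * score / teacher[task] for task, score in student.items()}
```

For example, `retained(STUDENTS["BORT"])` gives the share of RobBERT's score that the smallest model keeps on each task.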