---
license: mit
language:
- de
metrics:
- accuracy
tags:
- twitter
---
# T-GBERT
This is a [GBERT-base](https://huggingface.co/deepset/gbert-base) with continued pretraining
on roughly 33 million deduplicated German X/Twitter posts from 2020. The pretraining follows the task-adaptive pretraining setup suggested by
[Gururangan et al. (2020)](https://aclanthology.org/2020.acl-main.740). In total, the model was trained for 10 epochs. I am sharing this model as
it might be useful to some of you, and initial results suggest (some) improvements compared to [GBERT-base](https://huggingface.co/deepset/gbert-base)
(which is a common choice for supervised fine-tuning).
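For reference, below is a minimal sketch of this kind of continued (task-adaptive) masked-language-model pretraining with 🤗 Transformers. Only the base model and the 10 training epochs come from this card; the corpus, batch size, and learning rate are placeholders.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder in-domain corpus of preprocessed posts (see "Preprocessing" below).
tweets = ["Tolles Spiel heute, @user! https", "Danke @user für den Hinweis."]
dataset = Dataset.from_dict({"text": tweets})

tokenizer = AutoTokenizer.from_pretrained("deepset/gbert-base")
model = AutoModelForMaskedLM.from_pretrained("deepset/gbert-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard masked-language-modeling objective with dynamic masking.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="t-gbert",
    num_train_epochs=10,             # the model card reports 10 epochs
    per_device_train_batch_size=32,  # assumption; not stated in the card
    learning_rate=5e-5,              # assumption; not stated in the card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```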
## Performance
| | [GermEval-2017](https://sites.google.com/view/germeval2017-absa/home) (subtask B, synchronic test set) | [SB10k](https://aclanthology.org/W17-1106/) |
|:----------:|:-------------:|:-----:|
| GBERT-base | 79.77% | 82.29% |
| T-GBERT | 81.50% | 82.88% |
*Results report the accuracy (micro F1-score) on the test set of the respective dataset, averaged over five runs
with different seeds for data shuffling and parameter initialization.*
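A sketch of how such a five-seed evaluation can be reproduced with 🤗 Transformers is given below. The model id, dataset objects, label count, and hyperparameters are assumptions, not the exact settings behind the numbers above.

```python
import numpy as np
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    set_seed,
)

MODEL_ID = "T-GBERT"  # placeholder: replace with the hub id of this model

def run_once(seed: int, train_ds, test_ds, num_labels: int) -> float:
    """One fine-tuning run; train_ds/test_ds are assumed to have 'text' and 'label' columns."""
    set_seed(seed)  # controls data shuffling and classifier-head initialization
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=num_labels)

    def tok(batch):
        return tokenizer(batch["text"], truncation=True, max_length=128)

    train, test = train_ds.map(tok, batched=True), test_ds.map(tok, batched=True)
    args = TrainingArguments(output_dir=f"run-{seed}", num_train_epochs=3, seed=seed)  # epochs assumed
    trainer = Trainer(model=model, args=args, train_dataset=train)
    trainer.train()

    preds = trainer.predict(test)  # accuracy equals micro F1 for single-label classification
    return float((np.argmax(preds.predictions, axis=-1) == preds.label_ids).mean())

# Average over five seeds, as reported in the table:
# print(np.mean([run_once(s, train_ds, test_ds, num_labels=3) for s in range(5)]))
```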
## Preprocessing
URLs in posts were replaced with 'https', and user mentions were replaced with '@user'.
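A minimal sketch of this replacement is shown below; the exact regular expressions used for T-GBERT are not published, so the patterns here are assumptions.

```python
import re

def preprocess(post: str) -> str:
    """Replace URLs with 'https' and user mentions with '@user' (assumed patterns)."""
    post = re.sub(r"https?://\S+", "https", post)  # URLs -> 'https'
    post = re.sub(r"@\w+", "@user", post)          # mentions -> '@user'
    return post

print(preprocess("Danke @maxmuster! Mehr dazu: https://t.co/abc123"))
# -> "Danke @user! Mehr dazu: https"
```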