---
license: mit
language:
- de
metrics:
- accuracy
tags:
- twitter
---

# T-GBERT

This is a [GBERT-base](https://huggingface.co/deepset/gbert-base) model with continued pretraining on roughly 33 million deduplicated German X/Twitter posts from 2020. The pretraining follows the task-adaptive pretraining setup suggested by [Gururangan et al. (2020)](https://aclanthology.org/2020.acl-main.740). In total, the model was trained for 10 epochs. I am sharing this model as it might be useful to some of you, and initial results suggest (some) improvements over [GBERT-base](https://huggingface.co/deepset/gbert-base), which is a common choice for supervised fine-tuning.
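For orientation, here is a minimal sketch of what such task-adaptive continued pretraining could look like with the Hugging Face `Trainer`. The corpus file name, sequence length, batch size, and learning rate are illustrative assumptions; only the base checkpoint and the 10 epochs are taken from the description above.

```python
# Sketch of task-adaptive MLM pretraining (Gururangan et al., 2020).
# Hyperparameters and the corpus file are illustrative assumptions,
# not the exact setup used to train T-GBERT.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("deepset/gbert-base")
model = AutoModelForMaskedLM.from_pretrained("deepset/gbert-base")

# Hypothetical corpus file: one preprocessed tweet per line.
dataset = load_dataset("text", data_files={"train": "tweets_2020.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking with the standard 15% MLM probability.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="t-gbert",
    num_train_epochs=10,             # the card reports 10 epochs
    per_device_train_batch_size=32,  # assumed batch size
    learning_rate=5e-5,              # assumed learning rate
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```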
## Performance

|            | [GermEval-2017](https://sites.google.com/view/germeval2017-absa/home) (subtask B, synchronic test set) | [SB10k](https://aclanthology.org/W17-1106/) |
|:----------:|:-------------:|:-----:|
| GBERT-base | 79.77% | 82.29% |
| T-GBERT    | 81.50% | 82.88% |

*Results report the accuracy (micro F1-score) on the test set of the respective dataset, averaged over five runs with different seeds for data shuffling and parameter initialization.*

## Preprocessing

Weblinks in posts were replaced by 'https' and user mentions were replaced by '@user'.
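Below is a minimal sketch of this replacement step in Python. The regular expressions are assumptions and may differ from the original preprocessing script; only the replacement tokens 'https' and '@user' come from the description above.

```python
import re

def preprocess_tweet(text: str) -> str:
    """Replace weblinks with 'https' and user mentions with '@user'.

    The regex patterns are assumptions, not the original implementation.
    """
    text = re.sub(r"https?://\S+", "https", text)  # weblinks -> 'https'
    text = re.sub(r"@\w+", "@user", text)          # mentions -> '@user'
    return text

print(preprocess_tweet("@alice Schau mal https://example.com an!"))
# -> '@user Schau mal https an!'
```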