metadata
license: apache-2.0
language:
- de
tags:
- historical
- german
- teams
datasets:
- biglam/europeana_newspapers
- storytracer/German-PD-Newspapers
Zeitungs-LM
The Zeitungs-LM is a language model pretrained on historical German newspapers. Technically, the model is an ELECTRA model that was pretrained with the TEAMS (Training ELECTRA Augmented with Multi-word Selection) approach.
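The pure-Python sketch below shows, on toy inputs, what the two pretraining objectives involved actually compute. It assumes the standard ELECTRA replaced-token-detection (RTD) setup and our reading of the extra multi-word selection (MWS) task that TEAMS adds, where the discriminator also has to pick the original token from a small candidate set at a masked position. The function names are hypothetical, and this is not the training code used for Zeitungs-LM.

```python
import math


def rtd_labels(original, corrupted):
    """Replaced-token-detection targets: 1 where the corrupted token differs
    from the original one, 0 otherwise. A masked position that the generator
    refilled with the original token therefore counts as 'original'."""
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


def rtd_loss(logits, labels):
    """Mean binary cross-entropy between raw discriminator logits and RTD
    labels, using the numerically stable 'with logits' formulation."""
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(labels)


def mws_loss(candidate_scores, true_index):
    """Assumed multi-word selection loss for one masked position: softmax
    cross-entropy that rewards scoring the original token highest among
    the candidates (log-sum-exp is shifted by the max for stability)."""
    m = max(candidate_scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in candidate_scores))
    return log_norm - candidate_scores[true_index]


# Toy example: assume positions 1 and 3 were masked. The generator re-sampled
# "Zeitung" at position 1 and replaced "gestern" with "heute" at position 3.
original = ["Die", "Zeitung", "erschien", "gestern"]
corrupted = ["Die", "Zeitung", "erschien", "heute"]
labels = rtd_labels(original, corrupted)
print(labels)  # [0, 0, 0, 1]
print(round(rtd_loss([0.0, 0.0, 0.0, 0.0], labels), 4))  # 0.6931
print(mws_loss([2.0, 0.5, -1.0], true_index=0))
```

Computing the binary cross-entropy directly from logits avoids evaluating `log(0)` when the discriminator is very confident, which a naive sigmoid-then-log implementation would hit for large scores.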
Corpora
Version 1 of the Zeitungs-LM was pretrained on the following corpora, all of which are publicly available on the Model Hub:
- biglam/europeana_newspapers
- storytracer/German-PD-Newspapers
In total, the pretraining corpus has a size of 133 GB.
Changelog
- 02.10.2024: Initial version of the model. More details about pretraining and benchmarks on downstream tasks are coming very soon!
Acknowledgements
Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️