zeitungs-lm-v1 / README.md
metadata
license: apache-2.0
language:
  - de
tags:
  - historical
  - german
  - teams
datasets:
  - biglam/europeana_newspapers
  - storytracer/German-PD-Newspapers

Zeitungs-LM

Zeitungs-LM is a language model pretrained on historical German newspapers. Architecturally, it is an ELECTRA model that was pretrained with the TEAMS approach.
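Since the model is an ELECTRA model, it can presumably be loaded with the generic Auto classes from the `transformers` library. The following is a minimal sketch, not an official usage example: the repository ID `stefan-it/zeitungs-lm-v1` is an assumption and should be checked against the Model Hub, and the helper names `load_encoder` and `encode` are hypothetical.

```python
# Assumed repository ID (not confirmed by this README); verify it on the Hub.
MODEL_ID = "stefan-it/zeitungs-lm-v1"


def load_encoder(model_id: str = MODEL_ID):
    """Load the tokenizer and encoder weights via the generic Auto classes."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def encode(texts, model_id: str = MODEL_ID):
    """Return contextual token embeddings for a list of input strings."""
    import torch

    tokenizer, model = load_encoder(model_id)
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # Tensor of shape (batch_size, sequence_length, hidden_size)
    return outputs.last_hidden_state
```

Calling `encode(["Die Zeitung erscheint täglich."])` would download the weights on first use and return one embedding vector per token. For downstream tasks such as NER, the same checkpoint would typically be fine-tuned rather than used as a frozen encoder.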

Corpora

Version 1 of the Zeitungs-LM was pretrained on the following corpora, all of which are publicly available on the Model Hub:

  • biglam/europeana_newspapers
  • storytracer/German-PD-Newspapers

In total, the pretraining corpus has a size of 133 GB.

Changelog

  • 02.10.2024: Initial version of the model. More details about pretraining and benchmarks on downstream tasks are coming very soon!

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️