---
license: apache-2.0
language:
- de
tags:
- historical
- german
- teams
datasets:
- biglam/europeana_newspapers
- storytracer/German-PD-Newspapers
---

# Zeitungs-LM

Zeitungs-LM is a language model pretrained on historical German newspapers. Architecturally, it is an ELECTRA model pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) (Training ELECTRA Augmented with Multi-word Selection) approach.
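
Since TEAMS produces a standard ELECTRA-style encoder, the model should be loadable with the Hugging Face `transformers` library. A minimal sketch is shown below; the repository ID is a placeholder, as this card does not state the final Hub name:

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder repository ID — replace with the actual Hub name of Zeitungs-LM.
model_id = "zeitungs-lm"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a historical German sentence and inspect the contextual embeddings.
inputs = tokenizer("Die Zeitung berichtet über das Ereignis.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```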

## Corpora

Version 1 of Zeitungs-LM was pretrained on the following corpora, which are all publicly available on the Hugging Face Hub:

* [`biglam/europeana_newspapers`](https://huggingface.co/datasets/biglam/europeana_newspapers)
* [`storytracer/German-PD-Newspapers`](https://huggingface.co/datasets/storytracer/German-PD-Newspapers)

In total, the pretraining corpus has a size of 133 GB.
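
Both corpora can be inspected with the `datasets` library. A minimal sketch, using streaming to avoid downloading the full 133 GB; the configuration name for the Europeana dataset is an assumption — check each dataset card for the available configs and splits:

```python
from datasets import load_dataset

# "de" config is an assumption — the Europeana dataset is split by language.
europeana = load_dataset(
    "biglam/europeana_newspapers", "de", split="train", streaming=True
)
pd_news = load_dataset(
    "storytracer/German-PD-Newspapers", split="train", streaming=True
)

# Peek at one example from each corpus without materializing it on disk.
for example in europeana.take(1):
    print(example)
for example in pd_news.take(1):
    print(example)
```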

## Changelog

* 02.10.2024: Initial version of the model. More details about pretraining and benchmarks on downstream tasks are coming soon!

## Acknowledgements

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs ❤️