stefan-it committed on
Commit
2a48178
1 Parent(s): a762fbe

readme: add initial version

Files changed (1)
  1. README.md +34 -0
README.md ADDED
@@ -0,0 +1,34 @@
---
license: apache-2.0
language:
- de
tags:
- historical
- german
- teams
datasets:
- biglam/europeana_newspapers
- storytracer/German-PD-Newspapers
---

# Zeitungs-LM

The Zeitungs-LM is a language model pretrained on historical German newspapers. Technically, the model is an ELECTRA model that was pretrained with the [TEAMS](https://aclanthology.org/2021.findings-acl.219/) approach.
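
The following is a minimal usage sketch for loading the pretrained model with the 🤗 Transformers library, e.g. for feature extraction or as a starting point for fine-tuning. The repository ID below is a placeholder assumption; replace it with the actual Hub ID of this model.

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder repository ID (assumption) -- replace with the actual Hub ID of this model.
model_id = "stefan-it/zeitungs-lm"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode an example sentence and inspect the contextual embeddings.
inputs = tokenizer("Die Zeitung berichtet über ein Erdbeben in Lissabon.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```

For token-level downstream tasks such as NER on historical newspapers, the same checkpoint can be loaded with `AutoModelForTokenClassification` instead.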

## Corpora

Version 1 of the Zeitungs-LM was pretrained on the following corpora, all of which are publicly available on the Hugging Face Hub:

* [`biglam/europeana_newspapers`](https://huggingface.co/datasets/biglam/europeana_newspapers)
* [`storytracer/German-PD-Newspapers`](https://huggingface.co/datasets/storytracer/German-PD-Newspapers)

In total, the pretraining corpus has a size of 133 GB.
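
Both corpora can be inspected directly from the Hub with the 🤗 Datasets library. Since the combined size is 133 GB, the sketch below uses streaming mode so that nothing has to be downloaded in full; the split names are assumptions and should be checked against the respective dataset cards.

```python
from datasets import load_dataset

# Stream the corpora instead of downloading ~133 GB to disk.
# Split (and possible config) names are assumptions -- check the dataset cards.
europeana = load_dataset("biglam/europeana_newspapers", split="train", streaming=True)
german_pd = load_dataset("storytracer/German-PD-Newspapers", split="train", streaming=True)

# Peek at the first record of each corpus.
print(next(iter(europeana)))
print(next(iter(german_pd)))
```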

## Changelog

* 02.10.2024: Initial version of the model. More details about pretraining and benchmarks on downstream tasks will follow soon!

## Acknowledgements

Research supported with Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
Many thanks for providing access to the TPUs ❤️