---
license: mit
tags:
- flair
- token-classification
- sequence-tagger-model
language: de
widget:
- text: >-
    Namlich das Hanns Mulheim zer wirtshus zu Buchse sol gredt haben von
    Herren von Bern habind die von Zürich verratten oder wollend sy verratten.
---
Turmbücher NER
A Flair named entity recognition (NER) model for historical German, developed by Ismail Prada Ziegler as part of a research project at the University of Bern, Digital Humanities.
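A minimal usage sketch, assuming the model is loaded through Flair's `SequenceTagger` (as the `flair` and `sequence-tagger-model` tags suggest). The identifier in `MODEL_ID` is a placeholder, not the real repository name, and the label type `"ner"` is an assumption; both helper functions are hypothetical names introduced for illustration.

```python
# Placeholder: replace with the actual Hugging Face identifier of this model.
MODEL_ID = "your-namespace/your-model-name"


def load_tagger(model_id: str = MODEL_ID):
    """Load a Flair sequence tagger (requires `pip install flair`)."""
    from flair.models import SequenceTagger  # imported lazily

    return SequenceTagger.load(model_id)


def tag_entities(tagger, text: str, label_type: str = "ner"):
    """Return (text, tag, score) for each entity span predicted in `text`.

    `label_type` is assumed to be "ner"; adjust it if the model uses
    a different label name.
    """
    from flair.data import Sentence  # imported lazily

    sentence = Sentence(text)
    tagger.predict(sentence)
    results = []
    for span in sentence.get_spans(label_type):
        label = span.get_label(label_type)
        results.append((span.text, label.value, label.score))
    return results


# Example call (downloads the model on first use):
# tagger = load_tagger()
# for entity in tag_entities(tagger, "Namlich das Hanns Mulheim zer wirtshus zu Buchse"):
#     print(entity)
```

The Flair imports sit inside the functions so the snippet can be read and imported without Flair installed; in a real script they would normally go at the top of the file.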
Performance
| Metric | PER | ORG | LOC | Micro-Avg |
|---|---|---|---|---|
| Precision | 82.46% | 28.81% | 88.51% | 81.21% |
| Recall | 88.51% | 44.74% | 83.02% | 83.99% |
| F1-Score | 85.38% | 35.05% | 85.67% | 82.57% |
Note: ORG tags were annotated too inconsistently in the training data, so the model performs poorly on them.
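As a quick check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages in the table; because the inputs are already rounded, the results can differ from the reported F1 values by about 0.01 points. The function name `f1_score` is introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per tag in percent, copied from the table above.
scores = {
    "PER": (82.46, 88.51),
    "ORG": (28.81, 44.74),
    "LOC": (88.51, 83.02),
    "Micro-Avg": (81.21, 83.99),
}

for tag, (precision, recall) in scores.items():
    print(f"{tag}: F1 = {f1_score(precision, recall):.2f}%")
```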
Initial experiments indicate that the model also performs reasonably well on automatically transcribed text with a character error rate (CER) of around 5%.
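For readers unfamiliar with CER: it is commonly computed as the character-level edit distance between a reference transcription and the automatic transcription, divided by the length of the reference. The sketch below assumes that definition; the project's own evaluation setup is not described here, and the misread example string is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of `hypothesis` relative to `reference`."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one misread character in a 15-character reference.
reference = "Herren von Bern"
hypothesis = "Herren uon Bern"
print(f"CER = {cer(reference, hypothesis):.1%}")
```

A CER of around 5% corresponds to roughly one character error in every 20 characters of the reference.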
Data Set
Main data set: Berner Turmbücher, early volumes from the 16th century, written in Early New High German, 61k tokens of training data.
Secondary data sets:
- SSRQ - Fribourg, language model + tagging, 59k tokens.
- Chorgerichtsmanuale (unpublished), language model + tagging, 76k tokens.
- Königsfelden Charters, language model, 623k tokens.
- Talgerichtsprotokolle (unpublished), language model, 438k tokens.
Notice
This project is still in progress.