---
license: mit
metrics:
- cer
library_name: transformers
tags:
- medieval
- ocr
- htr
language:
- de
- fr
- la
- nl
---
# TrOCR Medieval Model with line masks generated in [eScriptorium](https://de.wikipedia.org/wiki/EScriptorium)
Base model: **microsoft/trocr-base-handwritten**
Epochs: 19.05 / 20
Eval CER: 0.0329
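
For readers unfamiliar with the metric: character error rate (CER) is commonly defined as the character-level edit distance between the model output and the ground truth, divided by the length of the ground truth, so a CER of 0.0329 corresponds to roughly 3.3 character errors per 100 reference characters. The sketch below illustrates that common definition only. The function names are hypothetical, and this card does not state how the evaluation script aggregated the score across lines, so it should not be read as the exact evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of one line: edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in a four-character reference gives a CER of 0.25.
print(cer("abcd", "abxd"))
```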
This model was trained on combined ground truth for various **charter** and **book scripts** from a range of projects and institutions, with the aim of building a generic model for Latin scripts of the Middle Ages.
It is mainly based on documents from the project CREMMA Manuscrits médiévaux latins, HIMANIS (CNRS), Itinera Nova (Stadsarchief Leuven), and Charters and Records of Königsfelden (Universität Zürich).
Based on the following data:
CREMMA Manuscrits médiévaux latins was produced by Clérice, Thibault; Chagué, Alix; Vlachou Efstathiou, Malamatenia. Licensed under a CC-BY 4.0 license.
URL: https://github.com/HTR-United/CREMMA-Medieval-LAT
HIMANIS is partially published as HIMANIS Guérin, produced by Stutzmann, Dominique; Hamel, Sébastien; Kernier, Iseut de; Mühlberger, Günter; Hackl, Günter. Licensed under a CC-BY 4.0 license.
DOI: 10.5281/zenodo.5535306
Charters and Records of Königsfelden Abbey and Bailiwick (1308-1662) was produced by Halter-Pernet, Colette; Teuscher, Simon; Hodel, Tobias; Barwitzki, Lukas; Egloff, Salome; Henggeler, Fabian; Nadig, Michael; Steinmann, Anina; Stettler, Sabine; Prada Ziegler, Ismail. Licensed under a CC-BY 4.0 license.
DOI: 10.5281/zenodo.5179361
The model is based on the same data as the following PyLaia model (available on Transkribus):
https://readcoop.eu/model/charter-scripts-german-latin-french/
The model has not been extensively tested.
Potential biases are still to be identified.
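
Since the model has seen little testing, it may be worth spot-checking it on a few of your own line images. The following is a minimal sketch using the standard TrOCR classes from `transformers`, not an official usage recipe for this model. It makes several assumptions: the function name and its parameters are hypothetical, the repository ID is a placeholder you must replace with the ID of this model, and the input is an image of a single text line (for example, a line cropped with a mask exported from eScriptorium). If the model repository does not ship processor files, passing the base model `microsoft/trocr-base-handwritten` as `processor_id` is a plausible fallback.

```python
def transcribe_line(image_path, model_id, processor_id=None):
    """Transcribe one text-line image with a TrOCR checkpoint (illustrative sketch)."""
    # Imported lazily so the function can be defined without the heavy dependencies.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # Assumption: the processor lives in the same repo unless processor_id is given.
    processor = TrOCRProcessor.from_pretrained(processor_id or model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # TrOCR expects RGB input; the processor resizes and normalises the image.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Autoregressively generate token IDs, then decode them to text.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


# Example call (placeholder repository ID and file name, replace both):
# text = transcribe_line("line_0001.png", "your-namespace/your-trocr-medieval-model")
# print(text)
```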