	---
	library_name: PyLaia
	license: mit
	tags:
	- PyLaia
	- PyTorch
	- atr
	- htr
	- ocr
	- historical
	- handwritten
	metrics:
	- CER
	- WER
	language:
	- fr
	datasets:
	- Teklia/Himanis
	pipeline_tag: image-to-text
	---

	# PyLaia - Himanis

	This model performs handwritten text recognition (HTR) on medieval French documents.

	## Model description

	The model was trained using the PyLaia library on two medieval datasets:
	* [Himanis](https://demo.arkindex.org/browse/5000e248-a624-4df1-8679-1b34679817ef?top_level=true&folder=true) (French)
	* [HOME Alcar](https://demo.arkindex.org/browse/46b9b1f4-baeb-4342-a501-e2f15472a276?top_level=true&folder=true) (Latin)

	For training, text lines were resized to a fixed height of 128 pixels, preserving their original aspect ratio.
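The resizing rule scales the width by the same factor as the height. The sketch below illustrates this on a grayscale image stored as a list of pixel rows. Nearest-neighbour sampling is an assumption made to keep the example dependency-free; it is not necessarily the interpolation PyLaia applies. With Pillow, the same target size could be passed to `Image.resize`.

```python
from __future__ import annotations

TARGET_HEIGHT = 128  # fixed line height stated in this model card


def target_size(width: int, height: int, target_height: int = TARGET_HEIGHT) -> tuple[int, int]:
    """Return (new_width, new_height) so that the height is fixed and the aspect ratio is kept."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


def resize_nearest(pixels: list[list[int]], new_width: int, new_height: int) -> list[list[int]]:
    """Nearest-neighbour resize of a 2-D grayscale image (assumed interpolation, for illustration)."""
    old_height, old_width = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_height // new_height][x * old_width // new_width] for x in range(new_width)]
        for y in range(new_height)
    ]


def resize_line(pixels: list[list[int]], target_height: int = TARGET_HEIGHT) -> list[list[int]]:
    """Resize a text-line image to a fixed height, preserving its aspect ratio."""
    new_width, new_height = target_size(len(pixels[0]), len(pixels), target_height)
    return resize_nearest(pixels, new_width, new_height)
```

For example, a line image 1500 pixels wide and 100 pixels high becomes 1920 × 128, so long lines stay long and no text is squeezed horizontally.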

	An external character-level 6-gram language model can optionally be combined with the optical model to improve recognition. It was trained on the text of the Himanis training set.
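A character 6-gram model predicts each character from the five characters that precede it. The card does not say how the language model was estimated or how its scores are combined with the optical model. The sketch below therefore makes two labelled assumptions. First, it uses add-k smoothing; production n-gram toolkits such as KenLM use modified Kneser-Ney smoothing instead. Second, it combines optical and language-model log-probabilities log-linearly through a hypothetical `weight` parameter. The padding symbols and all class and function names are also hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict
from typing import Iterable

PAD = "\x02"  # hypothetical start-of-line padding symbol
EOS = "\x03"  # hypothetical end-of-line symbol


class CharNgramLM:
    """Character n-gram language model with add-k smoothing (illustrative, not PyLaia's LM)."""

    def __init__(self, order: int = 6, k: float = 1.0) -> None:
        self.order = order
        self.k = k
        self.counts: defaultdict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = {EOS}

    def _events(self, line: str):
        """Yield (context, next_char) pairs; each context holds order - 1 characters."""
        padded = PAD * (self.order - 1) + line + EOS
        for i in range(self.order - 1, len(padded)):
            yield padded[i - self.order + 1 : i], padded[i]

    def fit(self, lines: Iterable[str]) -> "CharNgramLM":
        """Count (context, next character) events over the training lines."""
        for line in lines:
            self.vocab.update(line)
            for context, char in self._events(line):
                self.counts[context][char] += 1
        return self

    def log_prob(self, text: str) -> float:
        """Natural-log probability of a whole line, including the end-of-line symbol."""
        total = 0.0
        for context, char in self._events(text):
            seen = self.counts.get(context, Counter())  # .get avoids inserting empty contexts
            numerator = seen[char] + self.k
            denominator = sum(seen.values()) + self.k * len(self.vocab)
            total += math.log(numerator / denominator)
        return total


def fused_score(optical_log_prob: float, text: str, lm: CharNgramLM, weight: float = 0.5) -> float:
    """Log-linear combination of optical and LM scores; `weight` is a hypothetical tuning knob."""
    return optical_log_prob + weight * lm.log_prob(text)
```

Consider two candidate transcriptions with equal optical scores. The combined score favours the candidate whose character sequences occur in the training text, which is how the language model corrects plausible-looking optical errors.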

	## Evaluation results

	The model achieves the following results:

	| Set  | Language model | CER (%) | WER (%) | N lines |
	|:-----|:---------------|:-------:|:-------:|--------:|
	| test | no             | 9.87    | 29.25   | 2241    |
	| test | yes            | 8.87    | 24.37   | 2241    |
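CER and WER are edit-distance ratios. Each counts the character (or word) substitutions, insertions and deletions needed to turn the prediction into the reference, then divides by the length of the reference. The sketch below computes both at corpus level by summing errors and reference lengths over all lines. The card does not state whether the reported figures were aggregated this way or which text normalisation was applied, so both are assumptions here, as is splitting words on whitespace.

```python
from __future__ import annotations

from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, insertions and deletions between ref and hyp."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            current[j] = min(
                previous[j] + 1,             # deletion: reference item missing from hyp
                current[j - 1] + 1,          # insertion: extra item in hyp
                previous[j - 1] + (r != h),  # substitution (free if the items match)
            )
        previous = current
    return previous[-1]


def error_rates(refs: list[str], hyps: list[str]) -> tuple[float, float]:
    """Corpus-level (CER %, WER %) over aligned reference/prediction lines."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    char_errors = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    char_total = sum(len(r) for r in refs)
    word_errors = sum(levenshtein(r.split(), h.split()) for r, h in zip(refs, hyps))
    word_total = sum(len(r.split()) for r in refs)
    return 100 * char_errors / char_total, 100 * word_errors / word_total
```

Take the reference `le roy de france` and the prediction `le roi de france`. One substituted character out of 16 gives a CER of 6.25%, while one wrong word out of 4 gives a WER of 25%. A single character error spoils a whole word, which is why WER is consistently higher than CER in the table above.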

	## How to use

	Please refer to the [documentation](https://atr.pages.teklia.com/pylaia/).

	## Cite us

	```bibtex
	@inproceedings{pylaia-lib,
	author = "Tarride, Solène and Schneider, Yoann and Generali, Marie and Boillet, Melodie and Abadie, Bastien and Kermorvant, Christopher",
	title = "Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library",
	booktitle = "Submitted at ICDAR2024",
	year = "2024"
	}
	```