
metadata

license: apache-2.0
language:
  - en
  - fr
  - de

OCRerrcr is a small language model specialized in the detection of OCR errors.

OCRerrcr was trained by Elliot Jones for PleIAs on a sample of 1000 documents with labelled OCR errors from open data documents (Finance Commons) and cultural heritage sources (Common Corpus).

To date, OCRerrcr provides the most accurate agnostic estimate of the OCR error rate. PleIAs has also developed an alternative pipeline for this task, OCRoscope, which scales significantly better but is also significantly less accurate, especially for documents with fewer mistakes.
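This README does not specify how the error rate is derived from the model's output. As a purely illustrative sketch, assume a detector returns one boolean flag per token, where `True` marks a suspected OCR error. A document-level rate could then be taken as the share of flagged tokens. The function name `ocr_error_rate` and the sample flags are hypothetical and are not part of OCRerrcr or OCRoscope.

```python
from typing import Sequence


def ocr_error_rate(token_flags: Sequence[bool]) -> float:
    """Return the share of tokens flagged as OCR errors (0.0 for empty input)."""
    if not token_flags:
        return 0.0
    return sum(token_flags) / len(token_flags)


# Hypothetical per-token flags for a six-token passage:
# True marks a token the detector suspects of being an OCR error.
flags = [False, False, True, False, True, False]
rate = ocr_error_rate(flags)
print(f"Estimated OCR error rate: {rate:.1%}")
```

A rate computed this way needs no reference transcription, which is one plausible reading of "agnostic" here, though that reading is an assumption as well.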

The name OCRerrcr (instead of OCRerror) is a playful allusion to a common OCR misreading.
