---
license: openrail
---

# trocr-old-russian

## Info

This model recognizes printed text in Old Russian (pre-reform orthography).

- Base model for fine-tuning: microsoft/trocr-small-printed.
- Fine-tuned on 636k text images from the dataset: https://huggingface.co/datasets/nevmenandr/russian-old-orthography-ocr

## Usage

### Basic usage

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load the fine-tuned model and the processor (image preprocessing + tokenizer)
hf_model = VisionEncoderDecoderModel.from_pretrained("Serovvans/trocr-prereform-orthography")
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")

# Open an image of a single text line; convert to RGB, which the processor expects
image = Image.open("./path/to/your/image").convert("RGB")

# Preprocess the image, generate token ids, and decode them into text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = hf_model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```

## Usage for recognizing a book

```python
```

## Metrics on the test set

- CER (Character Error Rate) = 0.095
- WER (Word Error Rate) = 0.298
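
For readers who want to check the CER and WER figures on their own samples, the following is a minimal sketch of the standard definitions: edit distance divided by reference length, counted in characters for CER and in whitespace-separated words for WER. This card does not state how its reported numbers were aggregated (per line or over the whole test set), so treat this as an illustration only. The function names `edit_distance`, `cer`, and `wer` are hypothetical helpers, not part of any library.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Illustrative strings only (not taken from the test set):
# the hypothesis drops the final "ъ" of the first word.
reference = "миръ и война"
hypothesis = "мир и война"
print(f"CER = {cer(reference, hypothesis):.3f}")
print(f"WER = {wer(reference, hypothesis):.3f}")
```

Here `hypothesis` would be the `generated_text` returned by the model and `reference` the ground-truth transcription of the same line. A single missing character counts as one character edit for CER but makes the whole word wrong for WER, which is one reason WER is usually the larger of the two.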