---
license: openrail
---
# trocr-old-russian
## Info
The model is trained to recognize printed text in Old Russian (pre-reform orthography).
- Base model for fine-tuning: microsoft/trocr-small-printed.
- Fine-tuned on 636k text images from the dataset https://huggingface.co/datasets/nevmenandr/russian-old-orthography-ocr
## Usage
### Basic usage
```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load the fine-tuned model and the processor (image preprocessing + tokenizer)
hf_model = VisionEncoderDecoderModel.from_pretrained("Serovvans/trocr-prereform-orthography")
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")

# Convert to RGB so grayscale or RGBA scans are handled consistently
image = Image.open("./path/to/your/image").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate token ids and decode them into text
generated_ids = hf_model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
### Usage for recognizing a book
```python
```
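The card does not include the book-recognition code. As a minimal sketch only, assuming each page has already been split into single-line images (line segmentation is not covered here), a page could be recognized in batches like this. The helper names `batched`, `recognize_page`, and `make_trocr_recognizer`, and the `batch_size` default, are hypothetical and not part of this model's repository. `hf_model` and `processor` are the objects loaded in the example above.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def recognize_page(
    line_images: Sequence[T],
    recognize_batch: Callable[[Sequence[T]], List[str]],
    batch_size: int = 8,
) -> str:
    """Recognize every line image of a page and join the lines with newlines."""
    lines: List[str] = []
    for batch in batched(line_images, batch_size):
        lines.extend(recognize_batch(batch))
    return "\n".join(lines)


def make_trocr_recognizer(hf_model, processor):
    """Wrap an already loaded model and processor into a batch recognizer (hypothetical helper)."""
    def recognize_batch(images):
        # The processor accepts a list of PIL images and returns a stacked tensor
        pixel_values = processor(images=list(images), return_tensors="pt").pixel_values
        generated_ids = hf_model.generate(pixel_values)
        return processor.batch_decode(generated_ids, skip_special_tokens=True)
    return recognize_batch
```

With the model loaded, `recognize_page(line_images, make_trocr_recognizer(hf_model, processor))` would return the page text with one recognized line per row. Passing the recognizer in as a callable keeps the batching logic independent of the model, so it can be checked without downloading weights.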
## Metrics on the test set
- CER (Character Error Rate) = 0.095
- WER (Word Error Rate) = 0.298
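
For readers unfamiliar with these metrics, the sketch below shows the standard definitions: the Levenshtein edit distance between the recognized text and the reference, divided by the reference length in characters (CER) or words (WER). The card does not state how its numbers were computed (for example per-sample averaging versus a corpus-level ratio), so this is an illustration under that assumption, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate on whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# One dropped character in a 4-character reference
print(cer("миръ", "мир"))
# One wrong word out of three
print(wer("война и миръ", "война и мир"))
```

A single wrong character makes the whole word count as an error, which is why WER is typically much higher than CER for the same output.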