# Model Card for eshangj/TrOCR-Sinhala-finetuned
This model is a fine-tuned version of Microsoft's TrOCR model for Sinhala handwritten and printed text recognition. It can extract Sinhala text from scanned documents, printed text, and handwriting with high accuracy.
## Model Details

### Model Description
This is a Sinhala Optical Character Recognition (OCR) model based on the TrOCR architecture.
It has been fine-tuned by Eshan Gayanga on a custom Sinhala dataset containing printed and handwritten text samples.
The model builds on top of Ransaka Ravihara's pretrained checkpoint, which was adapted and extended for improved Sinhala text recognition.
- Developed by: Eshan Gayanga
- Acknowledgments: Credit to Ransaka Ravihara for providing the original TrOCR Sinhala checkpoint used for fine-tuning.
- Model type: VisionEncoderDecoderModel (TrOCR)
- Language: Sinhala (si)
- License: MIT
- Finetuned from: RansakaRavihara/TrOCR-Sinhala-base
## Model Sources
- Repository: https://huggingface.co/eshangj/TrOCR-Sinhala-finetuned
- Base model: microsoft/trocr-base-stage1
- Derived from checkpoint: RansakaRavihara/TrOCR-Sinhala-base
## Uses

### Direct Use

This model can be used directly for optical character recognition (OCR) on images containing Sinhala text. It performs well on:
- Scanned documents
- Printed Sinhala text
- Handwritten Sinhala notes
### Example Use Cases
- Digitizing old Sinhala printed or handwritten archives
- Building document understanding systems for Sinhala-language text
- Automatic marking and grading of Sinhala handwritten scripts
### Downstream Use
This model can be further fine-tuned for:
- Scene-text recognition in Sinhala
- Multi-language OCR (Sinhala + English)
- Document layout extraction pipelines
### Out-of-Scope Use
- Recognition of non-Sinhala scripts (Tamil, English, etc.)
- Highly degraded or extremely noisy handwritten documents
## Bias, Risks, and Limitations
- Model accuracy may drop for low-quality, blurred, or tilted images.
- Some handwritten characters may be misread, especially with non-standard handwriting.
- The model is not designed for recognizing mixed-language text.
### Recommendations
Users should:
- Preprocess input images (resize, denoise, binarize if needed).
- Avoid using the model for sensitive personal documents.
- Use post-processing (e.g., spell correction) to refine results.
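The preprocessing step recommended above can be sketched with Pillow. This is a minimal sketch, not part of the model itself; the `target_height` and `threshold` defaults are illustrative assumptions, and adaptive thresholding (e.g. Otsu) will often work better on real scans.

```python
from PIL import Image, ImageFilter

def preprocess(image: Image.Image, target_height: int = 384, threshold: int = 160) -> Image.Image:
    """Resize, denoise, and binarize a scan before OCR (illustrative defaults)."""
    # Scale to a fixed height, preserving aspect ratio
    scale = target_height / image.height
    image = image.resize((max(1, int(image.width * scale)), target_height))
    # Convert to grayscale and apply light median-filter denoising
    image = image.convert("L").filter(ImageFilter.MedianFilter(size=3))
    # Binarize with a fixed threshold (hypothetical value; tune per dataset)
    image = image.point(lambda p: 255 if p > threshold else 0)
    # TrOCR processors expect RGB input
    return image.convert("RGB")

# Demo on a synthetic grey image; replace with your scanned page
img = preprocess(Image.new("RGB", (800, 200), color=(200, 200, 200)))
print(img.size, img.mode)  # → (1536, 384) RGB
```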
## How to Get Started with the Model

```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Run on GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor
processor = TrOCRProcessor.from_pretrained("eshangj/TrOCR-Sinhala-finetuned")
model = VisionEncoderDecoderModel.from_pretrained("eshangj/TrOCR-Sinhala-finetuned").to(device)

# Load image (TrOCR expects RGB input)
img = Image.open("<path-to-your-image>").convert("RGB")

# OCR inference
pixel_values = processor(images=img, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values, num_beams=3, early_stopping=True)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
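The recommendations above suggest post-processing such as spell correction to refine OCR output. A minimal sketch of edit-distance-based correction follows; the Latin-script lexicon here is a hypothetical stand-in, and a real pipeline would load a Sinhala word list instead.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def correct(token: str, lexicon: list[str], max_dist: int = 2) -> str:
    """Snap a token to its closest lexicon entry if within max_dist edits."""
    best = min(lexicon, key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_dist else token

# Hypothetical lexicon; substitute a Sinhala dictionary in practice
lexicon = ["sinhala", "ocr", "model"]
print(correct("sinhla", lexicon))  # → sinhala
```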