Model Card for eshangj/TrOCR-Sinhala-finetuned

This model is a fine-tuned version of Microsoft's TrOCR model for Sinhala handwritten and printed text recognition. It can extract Sinhala text from scanned documents, printed text, and handwriting with high accuracy.


🧠 Model Details

Model Description

This is a Sinhala Optical Character Recognition (OCR) model based on the TrOCR architecture.
It has been fine-tuned by Eshan Gayanga on a custom Sinhala dataset containing printed and handwritten text samples.
The model builds on top of Ransaka Ravihara’s pretrained checkpoint, which was adapted and extended for improved Sinhala text recognition.

  • Developed by: Eshan Gayanga
  • Acknowledgments: Credit to Ransaka Ravihara for providing the original TrOCR Sinhala checkpoint used for fine-tuning.
  • Model type: VisionEncoderDecoderModel (TrOCR)
  • Language: Sinhala (si)
  • License: MIT
  • Finetuned from: RansakaRavihara/TrOCR-Sinhala-base

📂 Model Sources


🚀 Uses

Direct Use

This model can be used directly for optical character recognition (OCR) of Sinhala-language images. It performs well on:

  • Scanned documents
  • Printed Sinhala text
  • Handwritten Sinhala notes

Example Use Case

  • Digitizing old Sinhala printed or handwritten archives
  • Building document understanding systems for Sinhala-language text
  • Automatic marking and grading of handwritten Sinhala exam scripts

Downstream Use

This model can be further fine-tuned for:

  • Scene-text recognition in Sinhala
  • Multi-language OCR (Sinhala + English)
  • Document layout extraction pipelines
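As a starting point for such further fine-tuning, a minimal sketch using the Hugging Face `Seq2SeqTrainer` might look like the following. The dataset wrapper, hyperparameters, and output directory are illustrative assumptions, not values from the original training run:

```python
import torch
from torch.utils.data import Dataset


class SinhalaOCRDataset(Dataset):
    """(image, transcription) pairs prepared for TrOCR fine-tuning."""

    def __init__(self, images, texts, processor, max_length=64):
        self.images = images        # list of PIL.Image objects (RGB)
        self.texts = texts          # list of ground-truth strings
        self.processor = processor
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        pixel_values = self.processor(
            images=self.images[idx], return_tensors="pt"
        ).pixel_values.squeeze(0)
        labels = self.processor.tokenizer(
            self.texts[idx],
            padding="max_length",
            max_length=self.max_length,
            truncation=True,
        ).input_ids
        # Replace padding tokens with -100 so the loss ignores them
        labels = [
            tok if tok != self.processor.tokenizer.pad_token_id else -100
            for tok in labels
        ]
        return {"pixel_values": pixel_values, "labels": torch.tensor(labels)}


def finetune(train_images, train_texts, output_dir="trocr-sinhala-finetuned-v2"):
    """Wire the dataset into Seq2SeqTrainer; hyperparameters are illustrative."""
    from transformers import (
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
        TrOCRProcessor,
        VisionEncoderDecoderModel,
    )

    processor = TrOCRProcessor.from_pretrained("eshangj/TrOCR-Sinhala-finetuned")
    model = VisionEncoderDecoderModel.from_pretrained(
        "eshangj/TrOCR-Sinhala-finetuned"
    )

    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=8,
        num_train_epochs=3,
        predict_with_generate=True,
        logging_steps=50,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=SinhalaOCRDataset(train_images, train_texts, processor),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

For scene-text or multi-language targets, the same skeleton applies; only the training pairs (and possibly the tokenizer) change.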

Out-of-Scope Use

  • Recognition of non-Sinhala scripts (Tamil, English, etc.)
  • Highly degraded or extremely noisy handwritten documents

⚠️ Bias, Risks, and Limitations

  • Model accuracy may drop for low-quality, blurred, or tilted images.
  • Some handwritten characters may be misread, especially with non-standard handwriting.
  • The model is not designed for recognizing mixed-language text.

Recommendations

Users should:

  • Preprocess input images (resize, denoise, binarize if needed).
  • Avoid using the model for sensitive personal documents.
  • Use post-processing (e.g., spell correction) to refine results.
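The preprocessing steps above can be sketched with Pillow; the target width and binarization threshold here are illustrative defaults, not values the model was trained with:

```python
from PIL import Image, ImageFilter


def preprocess_for_ocr(
    img: Image.Image, target_width: int = 1024, threshold: int = 160
) -> Image.Image:
    """Resize, lightly denoise, and binarize an image before OCR."""
    img = img.convert("L")                           # grayscale
    if img.width > target_width:                     # shrink very large scans
        ratio = target_width / img.width
        img = img.resize((target_width, max(1, int(img.height * ratio))))
    img = img.filter(ImageFilter.MedianFilter(3))    # light denoising
    img = img.point(lambda p: 255 if p > threshold else 0)  # binarize
    return img.convert("RGB")                        # TrOCR expects RGB input
```

Binarization can hurt on clean colour photographs, so treat these steps as optional knobs to tune per document type rather than a fixed pipeline.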

💻 How to Get Started with the Model

```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load model and processor
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = TrOCRProcessor.from_pretrained("eshangj/TrOCR-Sinhala-finetuned")
model = VisionEncoderDecoderModel.from_pretrained(
    "eshangj/TrOCR-Sinhala-finetuned"
).to(device)

# Load image (TrOCR expects RGB input)
img = Image.open("<path-to-your-image>").convert("RGB")

# OCR inference
pixel_values = processor(images=img, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values, num_beams=3, early_stopping=True)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
Model size: 0.3B parameters (F32, Safetensors)