# Model Card for eshangj/TrOCR-Sinhala-finetuned
This model is a fine-tuned version of Microsoft's TrOCR model for Sinhala handwritten and printed text recognition. It can extract Sinhala text from scanned documents, printed text, and handwriting with high accuracy.
## Model Details

### Model Description
This is a Sinhala Optical Character Recognition (OCR) model based on the TrOCR architecture.
It has been fine-tuned by Eshan Gayanga on a custom Sinhala dataset containing printed and handwritten text samples.
The model builds on top of Ransaka Ravihara's pretrained checkpoint, which was adapted and extended for improved Sinhala text recognition.
- Developed by: Eshan Gayanga
- Acknowledgments: Credit to Ransaka Ravihara for providing the original TrOCR Sinhala checkpoint used for fine-tuning.
- Model type: VisionEncoderDecoderModel (TrOCR)
- Language: Sinhala (si)
- License: MIT
- Finetuned from: RansakaRavihara/TrOCR-Sinhala-base
## Model Sources
- Repository: https://huggingface.co/eshangj/TrOCR-Sinhala-finetuned
- Base model: microsoft/trocr-base-stage1
- Derived from checkpoint: RansakaRavihara/TrOCR-Sinhala-base
## Uses

### Direct Use

This model can be used directly for optical character recognition (OCR) on images containing Sinhala text. It performs well on:
- Scanned documents
- Printed Sinhala text
- Handwritten Sinhala notes
### Example Use Cases
- Digitizing old Sinhala printed or handwritten archives
- Building document understanding systems for Sinhala-language text
- Automatic marking and grading of Sinhala handwritten scripts
### Downstream Use
This model can be further fine-tuned for:
- Scene-text recognition in Sinhala
- Multi-language OCR (Sinhala + English)
- Document layout extraction pipelines
### Out-of-Scope Use
- Recognition of non-Sinhala scripts (Tamil, English, etc.)
- Highly degraded or extremely noisy handwritten documents
## Bias, Risks, and Limitations
- Model accuracy may drop for low-quality, blurred, or tilted images.
- Some handwritten characters may be misread, especially with non-standard handwriting.
- The model is not designed for recognizing mixed-language text.
### Recommendations
Users should:
- Preprocess input images (resize, denoise, binarize if needed).
- Avoid using the model for sensitive personal documents.
- Use post-processing (e.g., spell correction) to refine results.
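The preprocessing step recommended above can be sketched with Pillow. This is a minimal sketch, not part of the model itself; the `target_height` and `threshold` defaults are illustrative assumptions, and adaptive thresholding (e.g. Otsu) will often work better on real scans.

```python
from PIL import Image, ImageFilter

def preprocess(image: Image.Image, target_height: int = 384, threshold: int = 160) -> Image.Image:
    """Resize, denoise, and binarize a scan before OCR (illustrative defaults)."""
    # Scale to a fixed height, preserving aspect ratio
    scale = target_height / image.height
    image = image.resize((max(1, int(image.width * scale)), target_height))
    # Convert to grayscale and apply light median-filter denoising
    image = image.convert("L").filter(ImageFilter.MedianFilter(size=3))
    # Binarize with a fixed threshold (hypothetical value; tune per dataset)
    image = image.point(lambda p: 255 if p > threshold else 0)
    # TrOCR processors expect RGB input
    return image.convert("RGB")

# Demo on a synthetic grey image; replace with your scanned page
img = preprocess(Image.new("RGB", (800, 200), color=(200, 200, 200)))
print(img.size, img.mode)  # → (1536, 384) RGB
```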
## How to Get Started with the Model

```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Run on GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor
processor = TrOCRProcessor.from_pretrained("eshangj/TrOCR-Sinhala-finetuned")
model = VisionEncoderDecoderModel.from_pretrained("eshangj/TrOCR-Sinhala-finetuned").to(device)

# Load image (TrOCR expects RGB input)
img = Image.open("<path-to-your-image>").convert("RGB")

# OCR inference
pixel_values = processor(images=img, return_tensors="pt").pixel_values.to(device)
generated_ids = model.generate(pixel_values, num_beams=3, early_stopping=True)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
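The recommendations above suggest post-processing such as spell correction to refine OCR output. A minimal sketch of edit-distance-based correction follows; the Latin-script lexicon here is a hypothetical stand-in, and a real pipeline would load a Sinhala word list instead.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]

def correct(token: str, lexicon: list[str], max_dist: int = 2) -> str:
    """Snap a token to its closest lexicon entry if within max_dist edits."""
    best = min(lexicon, key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_dist else token

# Hypothetical lexicon; substitute a Sinhala dictionary in practice
lexicon = ["sinhala", "ocr", "model"]
print(correct("sinhla", lexicon))  # → sinhala
```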