---
library_name: transformers
license: cc-by-4.0
language:
- 'no'
- 'nb'
- 'nn'
base_model:
- microsoft/trocr-base-handwritten
---

# Model Card for Sprakbanken/TrOCR-norhand-v3

This is a TrOCR model for optical character recognition (OCR) of handwritten historical documents written in Norwegian. It recognizes text in images of handwritten text lines.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the processor (image preprocessing + tokenizer) and the fine-tuned model.
processor = TrOCRProcessor.from_pretrained("Sprakbanken/TrOCR-norhand-v3")
model = VisionEncoderDecoderModel.from_pretrained("Sprakbanken/TrOCR-norhand-v3")

# The input must be an image of a single line of text, in RGB mode.
image = Image.open("path_to_image.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate token IDs and decode them into the recognized text.
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## Model Details

This model is [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) fine-tuned on the [Hugging Face version](https://huggingface.co/datasets/Teklia/NorHand-v3-line) of the [NorHand v3 dataset](https://zenodo.org/records/10255840).

### Model Description

- **Developed by:** The National Library of Norway
- **Model type:** TrOCR
- **Languages:** Norwegian (mostly texts more than 100 years old)
- **License:** [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
- **Finetuned from model:** [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten)

## Uses

You can use the raw model for handwritten text recognition (HTR) on images of single lines of Norwegian text.

### Out-of-Scope Use

The model only works with images of single lines of text. If you have images of entire pages, you must first segment each page into line images and then run the model on each line.
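
The line segmentation step mentioned above can be sketched, for simple cases, with a horizontal projection profile: rows that contain dark pixels are grouped into bands, and each band is cropped out as one line image. This is not the method used by the model authors, only a rough illustration. The function names (`find_line_bands`, `image_to_rows`, `crop_lines`) and the default thresholds are hypothetical, and the approach assumes dark ink on a light background with roughly horizontal, non-overlapping lines. Historical handwriting often violates those assumptions, in which case a dedicated layout analysis tool is likely a better choice.

```python
from typing import List, Sequence, Tuple


def find_line_bands(
    rows: Sequence[Sequence[int]],
    ink_threshold: int = 128,   # assumption: grayscale values below this count as ink
    min_ink_pixels: int = 1,    # assumption: a row with this many ink pixels is "text"
    min_height: int = 2,        # assumption: shorter bands are treated as noise
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges that contain ink; bottom is exclusive."""
    bands: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(rows):
        has_ink = sum(1 for value in row if value < ink_threshold) >= min_ink_pixels
        if has_ink and start is None:
            start = y  # a new band of text rows begins here
        elif not has_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None  # the band ended at the first blank row
    # Close a band that runs to the bottom edge of the page.
    if start is not None and len(rows) - start >= min_height:
        bands.append((start, len(rows)))
    return bands


def image_to_rows(page) -> List[bytes]:
    """Convert a PIL image to a list of grayscale pixel rows (one byte per pixel)."""
    gray = page.convert("L")
    width, height = gray.size
    data = gray.tobytes()
    return [data[y * width:(y + 1) * width] for y in range(height)]


def crop_lines(page, bands: Sequence[Tuple[int, int]]) -> list:
    """Crop a PIL image into one full-width image per (top, bottom) band."""
    return [page.crop((0, top, page.width, bottom)) for top, bottom in bands]
```

With a page loaded as `page = Image.open(...).convert("RGB")`, the line images would be `crop_lines(page, find_line_bands(image_to_rows(page)))`, and each of them could then be passed through the `processor` and `model` from the getting-started snippet.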