Open Vision, Layout & OCR Models by Loay
Collection
This collection hosts a series of Vision Language Models (VLMs) fine-tuned for Optical Character Recognition (OCR) and Document Processing.
How to use loay/Arabic-OCR-DeepSeek-OCR-2 with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "image-to-text" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# pip install "transformers<5.0.0"
from transformers import pipeline
pipe = pipeline("image-to-text", model="loay/Arabic-OCR-DeepSeek-OCR-2", trust_remote_code=True)
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("loay/Arabic-OCR-DeepSeek-OCR-2", trust_remote_code=True, dtype="auto")
How to use loay/Arabic-OCR-DeepSeek-OCR-2 with Unsloth Studio:
# Install on macOS/Linux
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for loay/Arabic-OCR-DeepSeek-OCR-2 to start chatting
# Install on Windows (PowerShell)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for loay/Arabic-OCR-DeepSeek-OCR-2 to start chatting
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for loay/Arabic-OCR-DeepSeek-OCR-2 to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="loay/Arabic-OCR-DeepSeek-OCR-2",
    max_seq_length=2048,
)
This repository contains the bfloat16 merged version of the DeepSeek-OCR-2 (3B) model, fine-tuned by loay for high-precision Optical Character Recognition (OCR) and structural layout analysis of Arabic text in images.
The model was created by fine-tuning the unsloth/DeepSeek-OCR-2 model with LoRA adapters. Training was done with the Unsloth library, and the adapters were then merged back into the base model for easy deployment.
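To make the merge step concrete, here is a minimal sketch of how a LoRA update is usually folded into a base weight matrix: W_merged = W + (alpha / r) * B A. It uses tiny pure-Python matrices with hypothetical values, not the real model weights, and the helper names (`matmul`, `merge_lora`, `linear`) are made up for illustration. It assumes the standard LoRA scaling of alpha / r; the actual merge for this model was presumably done by the training tooling.

```python
# Illustrative only: toy matrices, not the real DeepSeek-OCR-2 weights.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, r, alpha):
    """Return W + (alpha / r) * (B @ A), the standard LoRA merge."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # shape: (out_features, in_features)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def linear(weight, x):
    """Apply y = W x to a single input vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in weight]

# Hypothetical toy shapes: out_features=2, in_features=3, rank r=1.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, -1.0, 0.0]]   # (r, in_features)
B = [[2.0], [4.0]]       # (out_features, r)
R, ALPHA = 1, 2          # same alpha / r ratio (2.0) as r=64, lora_alpha=128

x = [1.0, 2.0, 3.0]

# Adapter path: base output plus the scaled low-rank update, kept separate.
base_out = linear(W, x)
low_rank = linear(B, linear(A, x))
adapter_out = [b + (ALPHA / R) * l for b, l in zip(base_out, low_rank)]

# Merged path: a single weight matrix, no adapter needed at inference time.
merged = merge_lora(W, A, B, R, ALPHA)
merged_out = linear(merged, x)

print(adapter_out, merged_out)
```

Both paths give the same output, which is why a merged checkpoint can be loaded like any ordinary model without the adapter weights.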
Base model: unsloth/DeepSeek-OCR-2
Precision: bfloat16 precision and Flash Attention 2 for optimal quality and speed.
LoRA configuration: r=64 and lora_alpha=128. The targeted modules include ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"].
Cropping: enabled (crop_mode=True) to handle varying large page sizes correctly without downscaling text artifacts.
Merged output: bfloat16 precision model.
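As a rough illustration of what r=64 and lora_alpha=128 imply, the sketch below computes the standard LoRA scaling factor (alpha / r) and the number of adapter parameters added per transformer layer for the seven targeted modules. The layer dimensions (`HIDDEN`, `INTERMEDIATE`) and the assumption that all attention projections are square are hypothetical placeholders chosen for round numbers; they are not the real DeepSeek-OCR-2 sizes.

```python
# Values taken from the model card.
R = 64
LORA_ALPHA = 128
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj"]

# Hypothetical layer sizes, for illustration only (NOT the real model dims).
HIDDEN = 1024
INTERMEDIATE = 4096

def lora_scaling(r, alpha):
    """Standard LoRA scaling applied to the low-rank update B @ A."""
    return alpha / r

def lora_params(in_features, out_features, r):
    """Adapter parameters for one linear layer: A is (r, in), B is (out, r)."""
    return r * (in_features + out_features)

# Assumed (in_features, out_features) per targeted module.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

scaling = lora_scaling(R, LORA_ALPHA)
per_layer = sum(lora_params(*SHAPES[name], R) for name in TARGET_MODULES)

print(f"scaling = {scaling}")
print(f"adapter parameters per layer (hypothetical dims) = {per_layer:,}")
```

With these settings the low-rank update is weighted by a factor of 2.0; the per-layer count only indicates the order of magnitude and changes with the real layer sizes.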