OCR
Comprehensive Demo of Multimodal VLMs on the Hub
olmocr / nanonets ocr / qwen2vl ocr / aya vision / rolmocr for document parsing tasks
Florence-2-large / Florence-2-base
Experiment with the Tiny VLMs here
camel doc ocr / core ocr / docscope ocr / monkey ocr
Vision-Language Models for Document Conversion
nanonets ocr / smoldocling / monkey ocr / typhoon ocr
deepcaption / skycaptioner / spacethinker / spaceom / coreocr
cosmos reason1 / docscopeocr / visionocr / captioner relaxed
qwen2.5-vl-7b / qwen2.5-vl-3b / abliterated-caption-it / vlr
thinking / ocr / reasoning
OCR, VQA, Thinking and Object Detection.