Vision-Language - an OliP Collection

Vision-Language

Updated 2 days ago

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19 • 42
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

Paper • 2407.04172 • Published Jul 4 • 22
facebook/chameleon-7b

Image-Text-to-Text • Updated Jul 23 • 16.4k • 166
vidore/colpali

Updated Sep 27 • 27.6k • 386
E5-V: Universal Embeddings with Multimodal Large Language Models

Paper • 2407.12580 • Published Jul 17 • 39
Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26 • 31
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5 • 60
LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59
ColPali: Document Retrieval

Space • Running on Zero • 101
VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9 • 46
nvidia/NVLM-D-72B

Image-Text-to-Text • Updated Oct 18 • 12.5k • 744
mistralai/Pixtral-12B-2409

Updated 3 days ago • 524
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3 • 82
stepfun-ai/GOT-OCR2_0

Image-Text-to-Text • Updated Sep 18 • 751k • 1.23k
deepseek-ai/Janus-1.3B

Any-to-Any • Updated 14 days ago • 4.58k • 464
h2oai/h2ovl-mississippi-2b

Text Generation • Updated 13 days ago • 14.2k • 21
HuggingFaceM4/Idefics3-8B-Llama3

Image-Text-to-Text • Updated Sep 18 • 14.2k • 245
wyu1/Leopard-Idefics2

Updated 21 days ago • 29 • 3
HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated 1 day ago • 8.67k • 140