Holo1.5 Collection Holo1.5 - Open Foundation Models for Computer Use Agents • 5 items • Updated 29 days ago • 33
Towards Reliable and Interpretable Document Question Answering via VLMs Paper • 2509.10129 • Published Sep 12
FineWeb: decanting the web for the finest text data at scale 🍷 • Generate high-quality text data for LLMs using FineWeb • 1.09k
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2 • 35
Article • Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • Mar 12 • 465
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 543