Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline Paper • 2411.12814 • Published 28 days ago • 21
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published Nov 11 • 44
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published Nov 7 • 37
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 83
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6 • 119
Transformers compatible Mamba Collection This release includes the `mamba` repositories compatible with the `transformers` library • 5 items • Updated Mar 6 • 36
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception Paper • 2401.16158 • Published Jan 29 • 19