-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 25 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 12 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 39 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 20
Collections
Discover the best community collections!
Collections including paper arxiv:2412.09501
-
Rethinking Data Selection at Scale: Random Selection is Almost All You Need
Paper • 2410.09335 • Published • 16 -
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Paper • 2410.06456 • Published • 35 -
Emergent properties with repeated examples
Paper • 2410.07041 • Published • 8 -
Personalized Visual Instruction Tuning
Paper • 2410.07113 • Published • 69
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 12 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 86 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 31
-
XLabs-AI/flux-RealismLora
Text-to-Image • Updated • 296k • • 889 -
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper • 2412.07744 • Published • 18 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 43 -
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Paper • 2412.09501 • Published • 42
-
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 108 -
Video-Guided Foley Sound Generation with Multimodal Controls
Paper • 2411.17698 • Published • 7 -
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Paper • 2412.01064 • Published • 25 -
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Paper • 2412.01169 • Published • 10