Collections

Collections including paper arxiv:2412.09501

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7 • 39
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7 • 20

Rethinking Data Selection at Scale: Random Selection is Almost All You Need

Paper • 2410.09335 • Published Oct 12 • 16
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning

Paper • 2410.06456 • Published Oct 9 • 35
Emergent properties with repeated examples

Paper • 2410.07041 • Published Oct 9 • 8
Personalized Visual Instruction Tuning

Paper • 2410.07113 • Published Oct 9 • 69

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24 • 12
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 86
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27 • 31

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Paper • 2412.09501 • Published 6 days ago • 42

XLabs-AI/flux-RealismLora

Text-to-Image • Updated Aug 22 • 296k • 889
StyleMaster: Stylize Your Video with Artistic Generation and Translation

Paper • 2412.07744 • Published 8 days ago • 18
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published 8 days ago • 43
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Paper • 2412.09501 • Published 6 days ago • 42

Omni-Generation

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17 • 108
Video-Guided Foley Sound Generation with Multimodal Controls

Paper • 2411.17698 • Published 22 days ago • 7
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Paper • 2412.01064 • Published 17 days ago • 25
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published 17 days ago • 10