Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2412.01169

about 18 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 25
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6 • 12
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7 • 39
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7 • 20

about 23 hours ago

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24 • 12
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 86
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27 • 31

Multimodal LLMs

Personalized Multimodal Large Language Models: A Survey

Paper • 2412.02142 • Published 18 days ago • 12
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published 19 days ago • 10

Gen AI Diffusion

about 12 hours ago

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published Oct 14 • 54
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published Nov 7 • 70
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

Paper • 2411.04709 • Published Nov 5 • 25
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Paper • 2410.07171 • Published Oct 9 • 41

Omni-Generation

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17 • 108
Video-Guided Foley Sound Generation with Multimodal Controls

Paper • 2411.17698 • Published 24 days ago • 7
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait

Paper • 2412.01064 • Published 19 days ago • 25
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Paper • 2412.01169 • Published 19 days ago • 10

about 16 hours ago

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

Paper • 2312.04557 • Published Dec 7, 2023 • 12
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

Paper • 2312.04410 • Published Dec 7, 2023 • 14
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Paper • 2312.04461 • Published Dec 7, 2023 • 58
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

Paper • 2401.02955 • Published Jan 5 • 21

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs