Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.13327

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 45
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 24

protein-edit-task-vector

Edit Transfer: Learning Image Editing via Vision In-Context Relations

Paper • 2503.13327 • Published Mar 17 • 29
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Paper • 2410.12836 • Published Oct 3, 2024
Editing Implicit Assumptions in Text-to-Image Diffusion Models

Paper • 2303.08084 • Published Mar 14, 2023 • 2
EditP23: 3D Editing via Propagation of Image Prompts to Multi-View

Paper • 2506.20652 • Published Jun 25 • 2

manga_translation

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 24
PALO: A Polyglot Large Multimodal Model for 5B People

Paper • 2402.14818 • Published Feb 22, 2024 • 25
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 130
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9, 2024 • 31

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17, 2024 • 10
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 63
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 77

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

Paper • 2503.13434 • Published Mar 17 • 27
Edit Transfer: Learning Image Editing via Vision In-Context Relations

Paper • 2503.13327 • Published Mar 17 • 29
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes

Paper • 2503.13435 • Published Mar 17 • 18
MediaTek-Research/Llama-Breeze2-8B-Instruct

8B • Updated Mar 2 • 20.4k • 48

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 45
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 24

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17, 2024 • 10
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 63
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 77

protein-edit-task-vector

Edit Transfer: Learning Image Editing via Vision In-Context Relations

Paper • 2503.13327 • Published Mar 17 • 29
EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Paper • 2410.12836 • Published Oct 3, 2024
Editing Implicit Assumptions in Text-to-Image Diffusion Models

Paper • 2303.08084 • Published Mar 14, 2023 • 2
EditP23: 3D Editing via Propagation of Image Prompts to Multi-View

Paper • 2506.20652 • Published Jun 25 • 2

BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

Paper • 2503.13434 • Published Mar 17 • 27
Edit Transfer: Learning Image Editing via Vision In-Context Relations

Paper • 2503.13327 • Published Mar 17 • 29
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes

Paper • 2503.13435 • Published Mar 17 • 18
MediaTek-Research/Llama-Breeze2-8B-Instruct

8B • Updated Mar 2 • 20.4k • 48

manga_translation

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 24
PALO: A Polyglot Large Multimodal Model for 5B People

Paper • 2402.14818 • Published Feb 22, 2024 • 25
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14, 2024 • 130
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9, 2024 • 31

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs