InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 6 days ago • 24
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis Paper • 2409.08947 • Published 6 days ago • 11
Getty Images Brings High-Quality, Commercially Safe Dataset to Hugging Face Article • By andreagagliano • Published 13 days ago • 14
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published 7 days ago • 14
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Paper • 2409.07452 • Published 8 days ago • 18
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published 17 days ago • 94
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper • 2409.02095 • Published 16 days ago • 32
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts Paper • 2408.03209 • Published Aug 6 • 21
CSGO: Content-Style Composition in Text-to-Image Generation Paper • 2408.16766 • Published 21 days ago • 17
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping Paper • 2405.17251 • Published May 27 • 2
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published 23 days ago • 27
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation Paper • 2408.14819 • Published 24 days ago • 18
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data Paper • 2408.10119 • Published Aug 19 • 15
TrackGo: A Flexible and Efficient Method for Controllable Video Generation Paper • 2408.11475 • Published 30 days ago • 16
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering Paper • 2408.09702 • Published Aug 19 • 9
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96
MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing Paper • 2408.08000 • Published Aug 15 • 7
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance Paper • 2408.08189 • Published Aug 15 • 14
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models Paper • 2408.00735 • Published Aug 1 • 15
RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis Paper • 2408.03356 • Published Aug 6 • 8
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion Paper • 2408.03178 • Published Aug 6 • 35
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5 • 32
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets Paper • 2406.13897 • Published May 30 • 11
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation Paper • 2407.14505 • Published Jul 19 • 24
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person Paper • 2407.16224 • Published Jul 23 • 23
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Paper • 2407.16655 • Published Jul 23 • 28
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models Paper • 2407.15642 • Published Jul 22 • 10
Still-Moving: Customized Video Generation without Customized Video Data Paper • 2407.08674 • Published Jul 11 • 11
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix Paper • 2407.00367 • Published Jun 29 • 9
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network Paper • 2406.18284 • Published Jun 26 • 19
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models Paper • 2407.01519 • Published Jul 1 • 22
InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation Paper • 2407.00788 • Published Jun 30 • 21
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data Paper • 2406.18790 • Published Jun 26 • 33
Image Conductor: Precision Control for Interactive Video Synthesis Paper • 2406.15339 • Published Jun 21 • 8
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Paper • 2403.09622 • Published Mar 14 • 16
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published Jun 14 • 21
Interpreting the Weight Space of Customized Diffusion Models Paper • 2406.09413 • Published Jun 13 • 18
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6 • 71
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published Jun 6 • 26
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published Jun 6 • 36
VideoTetris: Towards Compositional Text-to-Video Generation Paper • 2406.04277 • Published Jun 6 • 22
I4VGen: Image as Stepping Stone for Text-to-Video Generation Paper • 2406.02230 • Published Jun 4 • 15
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model Paper • 2405.20222 • Published May 30 • 10
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29 • 20