zzfive's Collections
				
 - Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image
  Synthesis- 
			Paper
			 •- 
			2401.09048
			 •
			Published
				
			•- 
				10
			 
 - Improving fine-grained understanding in image-text pre-training- 
			Paper
			 •- 
			2401.09865
			 •
			Published
				
			•- 
				18
			 
 - Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data- 
			Paper
			 •- 
			2401.10891
			 •
			Published
				
			•- 
				62
			 
 - Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic
  Image Restoration In the Wild- 
			Paper
			 •- 
			2401.13627
			 •
			Published
				
			•- 
				77
			 
 - UNIMO-G: Unified Image Generation through Multimodal Conditional
  Diffusion- 
			Paper
			 •- 
			2401.13388
			 •
			Published
				
			•- 
				12
			 
 - DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image
  Editing- 
			Paper
			 •- 
			2402.02583
			 •
			Published
				
			•- 
				8
			 
 - SDXL-Lightning: Progressive Adversarial Diffusion Distillation- 
			Paper
			 •- 
			2402.13929
			 •
			Published
				
			•- 
				27
			 
 - T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with
  Trajectory Stitching- 
			Paper
			 •- 
			2402.14167
			 •
			Published
				
			•- 
				12
			 
 - Subobject-level Image Tokenization- 
			Paper
			 •- 
			2402.14327
			 •
			Published
				
			•- 
				18
			 
 - Gen4Gen: Generative Data Pipeline for Generative Multi-Concept
  Composition- 
			Paper
			 •- 
			2402.15504
			 •
			Published
				
			•- 
				22
			 
 - Multi-LoRA Composition for Image Generation- 
			Paper
			 •- 
			2402.16843
			 •
			Published
				
			•- 
				32
			 
 - EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with
  Audio2Video Diffusion Model under Weak Conditions- 
			Paper
			 •- 
			2402.17485
			 •
			Published
				
			•- 
				195
			 
 - DistriFusion: Distributed Parallel Inference for High-Resolution
  Diffusion Models- 
			Paper
			 •- 
			2402.19481
			 •
			Published
				
			•- 
				22
			 
 - Trajectory Consistency Distillation- 
			Paper
			 •- 
			2402.19159
			 •
			Published
				
			•- 
				16
			 
 - RealCustom: Narrowing Real Text Word for Real-Time Open-Domain
  Text-to-Image Customization- 
			Paper
			 •- 
			2403.00483
			 •
			Published
				
			•- 
				15
			 
 - ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models- 
			Paper
			 •- 
			2403.02084
			 •
			Published
				
			•- 
				15
			 
 - OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable
  Virtual Try-on- 
			Paper
			 •- 
			2403.01779
			 •
			Published
				
			•- 
				30
			 
 - Scaling Rectified Flow Transformers for High-Resolution Image Synthesis- 
			Paper
			 •- 
			2403.03206
			 •
			Published
				
			•- 
				70
			 
 - ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment- 
			Paper
			 •- 
			2403.05135
			 •
			Published
				
			•- 
				45
			 
 - Motion Mamba: Efficient and Long Sequence Motion Generation with
  Hierarchical and Bidirectional Selective SSM- 
			Paper
			 •- 
			2403.07487
			 •
			Published
				
			•- 
				17
			 
 - Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering- 
			Paper
			 •- 
			2403.09622
			 •
			Published
				
			•- 
				18
			 
 - StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based
  Semantic Control- 
			Paper
			 •- 
			2403.09055
			 •
			Published
				
			•- 
				27
			 
 - IDAdapter: Learning Mixed Features for Tuning-Free Personalization of
  Text-to-Image Models- 
			Paper
			 •- 
			2403.13535
			 •
			Published
				
			•- 
				23
			 
 - DepthFM: Fast Monocular Depth Estimation with Flow Matching- 
			Paper
			 •- 
			2403.13788
			 •
			Published
				
			•- 
				17
			 
 - Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos- 
			Paper
			 •- 
			2403.13044
			 •
			Published
				
			•- 
				15
			 
 - FlashFace: Human Image Personalization with High-fidelity Identity
  Preservation- 
			Paper
			 •- 
			2403.17008
			 •
			Published
				
			•- 
				21
			 
 - SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions- 
			Paper
			 •- 
			2403.16627
			 •
			Published
				
			•- 
				21
			 
 - ViTAR: Vision Transformer with Any Resolution- 
			Paper
			 •- 
			2403.18361
			 •
			Published
				
			•- 
				55
			 
 - ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object
  Removal and Insertion- 
			Paper
			 •- 
			2403.18818
			 •
			Published
				
			•- 
				28
			 
 - CosmicMan: A Text-to-Image Foundation Model for Humans- 
			Paper
			 •- 
			2404.01294
			 •
			Published
				
			•- 
				17
			 
 - Condition-Aware Neural Network for Controlled Image Generation- 
			Paper
			 •- 
			2404.01143
			 •
			Published
				
			•- 
				13
			 
 - Measuring Style Similarity in Diffusion Models- 
			Paper
			 •- 
			2404.01292
			 •
			Published
				
			•- 
				17
			 
 - CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept
  Matching- 
			Paper
			 •- 
			2404.03653
			 •
			Published
				
			•- 
				36
			 
 - RL for Consistency Models: Faster Reward Guided Text-to-Image Generation- 
			Paper
			 •- 
			2404.03673
			 •
			Published
				
			•- 
				16
			 
 - ControlNet++: Improving Conditional Controls with Efficient Consistency
  Feedback- 
			Paper
			 •- 
			2404.07987
			 •
			Published
				
			•- 
				48
			 
 - Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and
  Training Strategies- 
			Paper
			 •- 
			2404.08197
			 •
			Published
				
			•- 
				29
			 
 - Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse
  Controls to Any Diffusion Model- 
			Paper
			 •- 
			2404.09967
			 •
			Published
				
			•- 
				21
			 
 - HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing- 
			Paper
			 •- 
			2404.09990
			 •
			Published
				
			•- 
				13
			 
 - Dynamic Typography: Bringing Words to Life- 
			Paper
			 •- 
			2404.11614
			 •
			Published
				
			•- 
				45
			 
 - MoA: Mixture-of-Attention for Subject-Context Disentanglement in
  Personalized Image Generation- 
			Paper
			 •- 
			2404.11565
			 •
			Published
				
			•- 
				15
			 
 - Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image
  Synthesis- 
			Paper
			 •- 
			2404.13686
			 •
			Published
				
			•- 
				28
			 
 - Align Your Steps: Optimizing Sampling Schedules in Diffusion Models- 
			Paper
			 •- 
			2404.14507
			 •
			Published
				
			•- 
				23
			 
 - PuLID: Pure and Lightning ID Customization via Contrastive Alignment- 
			Paper
			 •- 
			2404.16022
			 •
			Published
				
			•- 
				25
			 
 - Editable Image Elements for Controllable Synthesis- 
			Paper
			 •- 
			2404.16029
			 •
			Published
				
			•- 
				12
			 
 - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with
  Reward Feedback Learning- 
			Paper
			 •- 
			2404.15449
			 •
			Published
				
			•- 
				14
			 
 - ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity
  Preserving- 
			Paper
			 •- 
			2404.16771
			 •
			Published
				
			•- 
				19
			 
 - StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video
  Generation- 
			Paper
			 •- 
			2405.01434
			 •
			Published
				
			•- 
				56
			 
 - Customizing Text-to-Image Models with a Single Image Pair- 
			Paper
			 •- 
			2405.01536
			 •
			Published
				
			•- 
				22
			 
 - Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and
  Attribute Control- 
			Paper
			 •- 
			2405.12970
			 •
			Published
				
			•- 
				25
			 
 - RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance- 
			Paper
			 •- 
			2405.14677
			 •
			Published
				
			•- 
				12
			 
 - DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis- 
			Paper
			 •- 
			2405.14224
			 •
			Published
				
			•- 
				16
			 
 - Semantica: An Adaptable Image-Conditioned Diffusion Model- 
			Paper
			 •- 
			2405.14857
			 •
			Published
				
			•- 
				11
			 
 - EM Distillation for One-step Diffusion Models- 
			Paper
			 •- 
			2405.16852
			 •
			Published
				
			•- 
				12
			 
 - Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models- 
			Paper
			 •- 
			2405.16759
			 •
			Published
				
			•- 
				8
			 
 - 
			Paper
			 •- 
			2405.18407
			 •
			Published
				
			•- 
				48
			 
 - BitsFusion: 1.99 bits Weight Quantization of Diffusion Model- 
			Paper
			 •- 
			2406.04333
			 •
			Published
				
			•- 
				38
			 
 - pOps: Photo-Inspired Diffusion Operators- 
			Paper
			 •- 
			2406.01300
			 •
			Published
				
			•- 
				18
			 
 - Zero-shot Image Editing with Reference Imitation- 
			Paper
			 •- 
			2406.07547
			 •
			Published
				
			•- 
				33
			 
 - An Image is Worth 32 Tokens for Reconstruction and Generation- 
			Paper
			 •- 
			2406.07550
			 •
			Published
				
			•- 
				59
			 
 - AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising- 
			Paper
			 •- 
			2406.06911
			 •
			Published
				
			•- 
				12
			 
 - FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent
  Font Effect Generation- 
			Paper
			 •- 
			2406.08392
			 •
			Published
				
			•- 
				21
			 
 - 
			Paper
			 •- 
			2406.09414
			 •
			Published
				
			•- 
				103
			 
 - An Image is Worth More Than 16x16 Patches: Exploring Transformers on
  Individual Pixels- 
			Paper
			 •- 
			2406.09415
			 •
			Published
				
			•- 
				51
			 
 - Alleviating Distortion in Image Generation via Multi-Resolution
  Diffusion Models- 
			Paper
			 •- 
			2406.09416
			 •
			Published
				
			•- 
				29
			 
 - EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal
  Prompts- 
			Paper
			 •- 
			2406.09162
			 •
			Published
				
			•- 
				14
			 
 - Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual
  Visual Text Rendering- 
			Paper
			 •- 
			2406.10208
			 •
			Published
				
			•- 
				22
			 
 - Exploring the Role of Large Language Models in Prompt Encoding for
  Diffusion Models- 
			Paper
			 •- 
			2406.11831
			 •
			Published
				
			•- 
				22
			 
 - The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN
  Inversion and High Quality Image Editing- 
			Paper
			 •- 
			2406.10601
			 •
			Published
				
			•- 
				70
			 
 - Invertible Consistency Distillation for Text-Guided Image Editing in
  Around 7 Steps- 
			Paper
			 •- 
			2406.14539
			 •
			Published
				
			•- 
				27
			 
 - DreamBench++: A Human-Aligned Benchmark for Personalized Image
  Generation- 
			Paper
			 •- 
			2406.16855
			 •
			Published
				
			•- 
				57
			 
 - Aligning Diffusion Models with Noise-Conditioned Perception- 
			Paper
			 •- 
			2406.17636
			 •
			Published
				
			•- 
				27
			 
 - Magic Insert: Style-Aware Drag-and-Drop- 
			Paper
			 •- 
			2407.02489
			 •
			Published
				
			•- 
				22
			 
 - DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents- 
			Paper
			 •- 
			2407.03300
			 •
			Published
				
			•- 
				14
			 
 - PartCraft: Crafting Creative Objects by Parts- 
			Paper
			 •- 
			2407.04604
			 •
			Published
				
			•- 
				6
			 
 - SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive
  Canvas Layout- 
			Paper
			 •- 
			2404.00412
			 •
			Published
				
			•- 
				2
			 
 - DataDream: Few-shot Guided Dataset Generation- 
			Paper
			 •- 
			2407.10910
			 •
			Published
				
			•- 
				10
			 
 - Scaling Diffusion Transformers to 16 Billion Parameters- 
			Paper
			 •- 
			2407.11633
			 •
			Published
				
			•- 
				26
			 
 - IMAGDressing-v1: Customizable Virtual Dressing- 
			Paper
			 •- 
			2407.12705
			 •
			Published
				
			•- 
				13
			 
 - CGB-DM: Content and Graphic Balance Layout Generation with
  Transformer-based Diffusion Model- 
			Paper
			 •- 
			2407.15233
			 •
			Published
				
			•- 
				7
			 
 - Artist: Aesthetically Controllable Text-Driven Stylization without
  Training- 
			Paper
			 •- 
			2407.15842
			 •
			Published
				
			•- 
				14
			 
 - 
			Paper
			 •- 
			2407.15595
			 •
			Published
				
			•- 
				14
			 
 - ViPer: Visual Personalization of Generative Models via Individual
  Preference Learning- 
			Paper
			 •- 
			2407.17365
			 •
			Published
				
			•- 
				13
			 
 - Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model- 
			Paper
			 •- 
			2407.16982
			 •
			Published
				
			•- 
				42
			 
 - BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular
  Depth Estimation- 
			Paper
			 •- 
			2407.17952
			 •
			Published
				
			•- 
				32
			 
 - SHIC: Shape-Image Correspondences with no Keypoint Supervision- 
			Paper
			 •- 
			2407.18907
			 •
			Published
				
			•- 
				41
			 
 - TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models- 
			Paper
			 •- 
			2408.00735
			 •
			Published
				
			•- 
				17
			 
 - Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy
  Curvature of Attention- 
			Paper
			 •- 
			2408.00760
			 •
			Published
				
			•- 
				8
			 
 - Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation
  with Multimodal Generative Pretraining- 
			Paper
			 •- 
			2408.02657
			 •
			Published
				
			•- 
				35
			 
 - ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative
  Generation- 
			Paper
			 •- 
			2408.02226
			 •
			Published
				
			•- 
				12
			 
 - IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning
  using Instruct Prompts- 
			Paper
			 •- 
			2408.03209
			 •
			Published
				
			•- 
				22
			 
 - Openstory++: A Large-scale Dataset and Benchmark for Instance-aware
  Open-domain Visual Storytelling- 
			Paper
			 •- 
			2408.03695
			 •
			Published
				
			•- 
				13
			 
 - ControlNeXt: Powerful and Efficient Control for Image and Video
  Generation- 
			Paper
			 •- 
			2408.06070
			 •
			Published
				
			•- 
				55
			 
 - BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion- 
			Paper
			 •- 
			2408.04785
			 •
			Published
				
			•- 
				9
			 
 - UniPortrait: A Unified Framework for Identity-Preserving Single- and
  Multi-Human Image Personalization- 
			Paper
			 •- 
			2408.05939
			 •
			Published
				
			•- 
				15
			 
 - 
			Paper
			 •- 
			2408.07009
			 •
			Published
				
			•- 
				62
			 
 - ZePo: Zero-Shot Portrait Stylization with Faster Sampling- 
			Paper
			 •- 
			2408.05492
			 •
			Published
				
			•- 
				7
			 
 - 
			Paper
			 •- 
			2408.07116
			 •
			Published
				
			•- 
				20
			 
 - JPEG-LM: LLMs as Image Generators with Canonical Codec Representations- 
			Paper
			 •- 
			2408.08459
			 •
			Published
				
			•- 
				45
			 
 - TurboEdit: Instant text-based image editing- 
			Paper
			 •- 
			2408.08332
			 •
			Published
				
			•- 
				20
			 
 - Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering- 
			Paper
			 •- 
			2408.09702
			 •
			Published
				
			•- 
				11
			 
 - TraDiffusion: Trajectory-Based Training-Free Image Generation- 
			Paper
			 •- 
			2408.09739
			 •
			Published
				
			•- 
				9
			 
 - MegaFusion: Extend Diffusion Models towards Higher-resolution Image
  Generation without Further Tuning- 
			Paper
			 •- 
			2408.11001
			 •
			Published
				
			•- 
				13
			 
 - The Brittleness of AI-Generated Image Watermarking Techniques: Examining
  Their Robustness Against Visual Paraphrasing Attacks- 
			Paper
			 •- 
			2408.10446
			 •
			Published
				
			•- 
				9
			 
 - Scalable Autoregressive Image Generation with Mamba- 
			Paper
			 •- 
			2408.12245
			 •
			Published
				
			•- 
				26
			 
 - CODE: Confident Ordinary Differential Editing- 
			Paper
			 •- 
			2408.12418
			 •
			Published
				
			•- 
				4
			 
 - SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its
  Teacher- 
			Paper
			 •- 
			2408.14176
			 •
			Published
				
			•- 
				62
			 
 - Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image
  Generation- 
			Paper
			 •- 
			2408.14819
			 •
			Published
				
			•- 
				22
			 
 - Distribution Backtracking Builds A Faster Convergence Trajectory for
  One-step Diffusion Distillation- 
			Paper
			 •- 
			2408.15991
			 •
			Published
				
			•- 
				16
			 
 - CSGO: Content-Style Composition in Text-to-Image Generation- 
			Paper
			 •- 
			2408.16766
			 •
			Published
				
			•- 
				18
			 
 - CoRe: Context-Regularized Text Embedding Learning for Text-to-Image
  Personalization- 
			Paper
			 •- 
			2408.15914
			 •
			Published
				
			•- 
				24
			 
 - VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion
  Transformers- 
			Paper
			 •- 
			2408.17131
			 •
			Published
				
			•- 
				11
			 
 - LinFusion: 1 GPU, 1 Minute, 16K Image- 
			Paper
			 •- 
			2409.02097
			 •
			Published
				
			•- 
				34
			 
 - Accurate Compression of Text-to-Image Diffusion Models via Vector
  Quantization- 
			Paper
			 •- 
			2409.00492
			 •
			Published
				
			•- 
				11
			 
 - Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free
  Real Image Editing- 
			Paper
			 •- 
			2409.01322
			 •
			Published
				
			•- 
				96
			 
 - IFAdapter: Instance Feature Control for Grounded Text-to-Image
  Generation- 
			Paper
			 •- 
			2409.08240
			 •
			Published
				
			•- 
				22
			 
 - InstantDrag: Improving Interactivity in Drag-based Image Editing- 
			Paper
			 •- 
			2409.08857
			 •
			Published
				
			•- 
				34
			 
 - StoryMaker: Towards Holistic Consistent Characters in Text-to-image
  Generation- 
			Paper
			 •- 
			2409.12576
			 •
			Published
				
			•- 
				16
			 
 - Imagine yourself: Tuning-Free Personalized Image Generation- 
			Paper
			 •- 
			2409.13346
			 •
			Published
				
			•- 
				70
			 
 - Colorful Diffuse Intrinsic Image Decomposition in the Wild- 
			Paper
			 •- 
			2409.13690
			 •
			Published
				
			•- 
				14
			 
 - Improvements to SDXL in NovelAI Diffusion V3- 
			Paper
			 •- 
			2409.15997
			 •
			Published
				
			•- 
				13
			 
 - Pixel-Space Post-Training of Latent Diffusion Models- 
			Paper
			 •- 
			2409.17565
			 •
			Published
				
			•- 
				21
			 
 - OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal
  Instruction- 
			Paper
			 •- 
			2410.04932
			 •
			Published
				
			•- 
				9
			 
 - Accelerating Auto-regressive Text-to-Image Generation with Training-free
  Speculative Jacobi Decoding- 
			Paper
			 •- 
			2410.01699
			 •
			Published
				
			•- 
				18
			 
 - IterComp: Iterative Composition-Aware Feedback Learning from Model
  Gallery for Text-to-Image Generation- 
			Paper
			 •- 
			2410.07171
			 •
			Published
				
			•- 
				43
			 
 - Story-Adapter: A Training-free Iterative Framework for Long Story
  Visualization- 
			Paper
			 •- 
			2410.06244
			 •
			Published
				
			•- 
				19
			 
 - Eliminating Oversaturation and Artifacts of High Guidance Scales in
  Diffusion Models- 
			Paper
			 •- 
			2410.02416
			 •
			Published
				
			•- 
				33
			 
 - DICE: Discrete Inversion Enabling Controllable Editing for Multinomial
  Diffusion and Masked Generative Models- 
			Paper
			 •- 
			2410.08207
			 •
			Published
				
			•- 
				19
			 
 - Meissonic: Revitalizing Masked Generative Transformers for Efficient
  High-Resolution Text-to-Image Synthesis- 
			Paper
			 •- 
			2410.08261
			 •
			Published
				
			•- 
				52
			 
 - EvolveDirector: Approaching Advanced Text-to-Image Generation with Large
  Vision-Language Models- 
			Paper
			 •- 
			2410.07133
			 •
			Published
				
			•- 
				19
			 
 - Semantic Image Inversion and Editing using Rectified Stochastic
  Differential Equations- 
			Paper
			 •- 
			2410.10792
			 •
			Published
				
			•- 
				31
			 
 - Efficient Diffusion Models: A Comprehensive Survey from Principles to
  Practices- 
			Paper
			 •- 
			2410.11795
			 •
			Published
				
			•- 
				18
			 
 - Improving Long-Text Alignment for Text-to-Image Diffusion Models- 
			Paper
			 •- 
			2410.11817
			 •
			Published
				
			•- 
				15
			 
 - Fluid: Scaling Autoregressive Text-to-image Generative Models with
  Continuous Tokens- 
			Paper
			 •- 
			2410.13863
			 •
			Published
				
			•- 
				38
			 
 - VidPanos: Generative Panoramic Videos from Casual Panning Videos- 
			Paper
			 •- 
			2410.13832
			 •
			Published
				
			•- 
				13
			 
 - FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion
  Model- 
			Paper
			 •- 
			2410.13925
			 •
			Published
				
			•- 
				24
			 
 - BiGR: Harnessing Binary Latent Codes for Image Generation and Improved
  Visual Representation Capabilities- 
			Paper
			 •- 
			2410.14672
			 •
			Published
				
			•- 
				8
			 
 - Scalable Ranked Preference Optimization for Text-to-Image Generation- 
			Paper
			 •- 
			2410.18013
			 •
			Published
				
			•- 
				15
			 
 - Stable Consistency Tuning: Understanding and Improving Consistency
  Models- 
			Paper
			 •- 
			2410.18958
			 •
			Published
				
			•- 
				10
			 
 - DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe
  Dataset Curation- 
			Paper
			 •- 
			2410.18666
			 •
			Published
				
			•- 
				19
			 
 - Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse
  Autoencoders- 
			Paper
			 •- 
			2410.22366
			 •
			Published
				
			•- 
				83
			 
 - Constant Acceleration Flow- 
			Paper
			 •- 
			2411.00322
			 •
			Published
				
			•- 
				24
			 
 - In-Context LoRA for Diffusion Transformers- 
			Paper
			 •- 
			2410.23775
			 •
			Published
				
			•- 
				11
			 
 - Training-free Regional Prompting for Diffusion Transformers- 
			Paper
			 •- 
			2411.02395
			 •
			Published
				
			•- 
				25
			 
 - Constrained Diffusion Implicit Models- 
			Paper
			 •- 
			2411.00359
			 •
			Published
				
			•- 
				6
			 
 - SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion
  Models- 
			Paper
			 •- 
			2411.05007
			 •
			Published
				
			•- 
				22
			 
 - Add-it: Training-Free Object Insertion in Images With Pretrained
  Diffusion Models- 
			Paper
			 •- 
			2411.07232
			 •
			Published
				
			•- 
				67
			 
 - OmniEdit: Building Image Editing Generalist Models Through Specialist
  Supervision- 
			Paper
			 •- 
			2411.07199
			 •
			Published
				
			•- 
				50
			 
 - Edify Image: High-Quality Image Generation with Pixel Space Laplacian
  Diffusion Models- 
			Paper
			 •- 
			2411.07126
			 •
			Published
				
			•- 
				30
			 
 - Watermark Anything with Localized Messages- 
			Paper
			 •- 
			2411.07231
			 •
			Published
				
			•- 
				21
			 
 - JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified
  Multimodal Understanding and Generation- 
			Paper
			 •- 
			2411.07975
			 •
			Published
				
			•- 
				30
			 
 - Scaling Properties of Diffusion Models for Perceptual Tasks- 
			Paper
			 •- 
			2411.08034
			 •
			Published
				
			•- 
				13
			 
 - MagicQuill: An Intelligent Interactive Image Editing System- 
			Paper
			 •- 
			2411.09703
			 •
			Published
				
			•- 
				78
			 
 - Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply
  Better Samples- 
			Paper
			 •- 
			2411.08954
			 •
			Published
				
			•- 
				10
			 
 - Region-Aware Text-to-Image Generation via Hard Binding and Soft
  Refinement- 
			Paper
			 •- 
			2411.06558
			 •
			Published
				
			•- 
				36
			 
 - FitDiT: Advancing the Authentic Garment Details for High-fidelity
  Virtual Try-on- 
			Paper
			 •- 
			2411.10499
			 •
			Published
				
			•- 
				13
			 
 - Continuous Speculative Decoding for Autoregressive Image Generation- 
			Paper
			 •- 
			2411.11925
			 •
			Published
				
			•- 
				16
			 
 - Stylecodes: Encoding Stylistic Information For Image Generation- 
			Paper
			 •- 
			2411.12811
			 •
			Published
				
			•- 
				12
			 
 - Generating Compositional Scenes via Text-to-image RGBA Instance
  Generation- 
			Paper
			 •- 
			2411.10913
			 •
			Published
				
			•- 
				4
			 
 - Stable Flow: Vital Layers for Training-Free Image Editing- 
			Paper
			 •- 
			2411.14430
			 •
			Published
				
			•- 
				21
			 
 - Style-Friendly SNR Sampler for Style-Driven Generation- 
			Paper
			 •- 
			2411.14793
			 •
			Published
				
			•- 
				39
			 
 - OminiControl: Minimal and Universal Control for Diffusion Transformer- 
			Paper
			 •- 
			2411.15098
			 •
			Published
				
			•- 
				61
			 
 - MyTimeMachine: Personalized Facial Age Transformation- 
			Paper
			 •- 
			2411.14521
			 •
			Published
				
			•- 
				22
			 
 - Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot
  Subject-Driven Image Generator- 
			Paper
			 •- 
			2411.15466
			 •
			Published
				
			•- 
				39
			 
 - One Diffusion to Generate Them All- 
			Paper
			 •- 
			2411.16318
			 •
			Published
				
			•- 
				30
			 
 - Controllable Human Image Generation with Personalized Multi-Garments- 
			Paper
			 •- 
			2411.16801
			 •
			Published
				
			•- 
				4
			 
 - ROICtrl: Boosting Instance Control for Visual Generation- 
			Paper
			 •- 
			2411.17949
			 •
			Published
				
			•- 
				87
			 
 - DreamCache: Finetuning-Free Lightweight Personalized Image Generation
  via Feature Caching- 
			Paper
			 •- 
			2411.17786
			 •
			Published
				
			•- 
				12
			 
 - Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient- 
			Paper
			 •- 
			2411.17787
			 •
			Published
				
			•- 
				12
			 
 - Diffusion Self-Distillation for Zero-Shot Customized Image Generation- 
			Paper
			 •- 
			2411.18616
			 •
			Published
				
			•- 
				16
			 
 - Omegance: A Single Parameter for Various Granularities in
  Diffusion-Based Synthesis- 
			Paper
			 •- 
			2411.17769
			 •
			Published
				
			•- 
				8
			 
 - Edit Away and My Face Will not Stay: Personal Biometric Defense against
  Malicious Generative Editing- 
			Paper
			 •- 
			2411.16832
			 •
			Published
				
			•- 
				2
			 
 - TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction
  using Diffusion Models- 
			Paper
			 •- 
			2411.18350
			 •
			Published
				
			•- 
				29
			 
 - FAM Diffusion: Frequency and Attention Modulation for High-Resolution
  Image Generation with Stable Diffusion- 
			Paper
			 •- 
			2411.18552
			 •
			Published
				
			•- 
				18
			 
 - Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis- 
			Paper
			 •- 
			2412.01819
			 •
			Published
				
			•- 
				35
			 
 - Art-Free Generative Models: Art Creation Without Graphic Art Knowledge- 
			Paper
			 •- 
			2412.00176
			 •
			Published
				
			•- 
				9
			 
 - SNOOPI: Supercharged One-step Diffusion Distillation with Proper
  Guidance- 
			Paper
			 •- 
			2412.02687
			 •
			Published
				
			•- 
				113
			 
 - TokenFlow: Unified Image Tokenizer for Multimodal Understanding and
  Generation- 
			Paper
			 •- 
			2412.03069
			 •
			Published
				
			•- 
				35
			 
 - LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene
  Relighting- 
			Paper
			 •- 
			2412.00177
			 •
			Published
				
			•- 
				8
			 
 - A Noise is Worth Diffusion Guidance- 
			Paper
			 •- 
			2412.03895
			 •
			Published
				
			•- 
				30
			 
 - Negative Token Merging: Image-based Adversarial Feature Guidance- 
			Paper
			 •- 
			2412.01339
			 •
			Published
				
			•- 
				23
			 
 - AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent
  Diffusion Models- 
			Paper
			 •- 
			2412.04146
			 •
			Published
				
			•- 
				23
			 
 - Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution
  Image Synthesis- 
			Paper
			 •- 
			2412.04431
			 •
			Published
				
			•- 
				18
			 
 - ZipAR: Accelerating Autoregressive Image Generation through Spatial
  Locality- 
			Paper
			 •- 
			2412.04062
			 •
			Published
				
			•- 
				9
			 
 - SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step
  Diffusion- 
			Paper
			 •- 
			2412.04301
			 •
			Published
				
			•- 
				41
			 
 - PanoDreamer: 3D Panorama Synthesis from a Single Image- 
			Paper
			 •- 
			2412.04827
			 •
			Published
				
			•- 
				11
			 
 - DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for
  Customized Manga Generation- 
			Paper
			 •- 
			2412.07589
			 •
			Published
				
			•- 
				48
			 
 - Hidden in the Noise: Two-Stage Robust Watermarking for Images- 
			Paper
			 •- 
			2412.04653
			 •
			Published
				
			•- 
				31
			 
 - FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion
  Models- 
			Paper
			 •- 
			2412.07674
			 •
			Published
				
			•- 
				20
			 
 - UniReal: Universal Image Generation and Editing via Learning Real-world
  Dynamics- 
			Paper
			 •- 
			2412.07774
			 •
			Published
				
			•- 
				30
			 
 - LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style
  Conditioned Image Generation- 
			Paper
			 •- 
			2412.05148
			 •
			Published
				
			•- 
				12
			 
 - Learning Flow Fields in Attention for Controllable Person Image
  Generation- 
			Paper
			 •- 
			2412.08486
			 •
			Published
				
			•- 
				36
			 
 - FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow
  Models- 
			Paper
			 •- 
			2412.08629
			 •
			Published
				
			•- 
				12
			 
 - StyleStudio: Text-Driven Style Transfer with Selective Control of Style
  Elements- 
			Paper
			 •- 
			2412.08503
			 •
			Published
				
			•- 
				8
			 
 - EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via
  Multimodal LLM- 
			Paper
			 •- 
			2412.09618
			 •
			Published
				
			•- 
				21
			 
 - SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices
  with Efficient Architectures and Training- 
			Paper
			 •- 
			2412.09619
			 •
			Published
				
			•- 
				27
			 
 - LoRACLR: Contrastive Adaptation for Customization of Diffusion Models- 
			Paper
			 •- 
			2412.09622
			 •
			Published
				
			•- 
				8
			 
 - FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free
  Scale Fusion- 
			Paper
			 •- 
			2412.09626
			 •
			Published
				
			•- 
				21
			 
 - ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven
  Generation- 
			Paper
			 •- 
			2412.08645
			 •
			Published
				
			•- 
				11
			 
 - FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing- 
			Paper
			 •- 
			2412.07517
			 •
			Published
				
			•- 
				11
			 
 - FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers- 
			Paper
			 •- 
			2412.09611
			 •
			Published
				
			•- 
				10
			 
 - BrushEdit: All-In-One Image Inpainting and Editing- 
			Paper
			 •- 
			2412.10316
			 •
			Published
				
			•- 
				35
			 
 - ColorFlow: Retrieval-Augmented Image Sequence Colorization- 
			Paper
			 •- 
			2412.11815
			 •
			Published
				
			•- 
				26
			 
 - Causal Diffusion Transformers for Generative Modeling- 
			Paper
			 •- 
			2412.12095
			 •
			Published
				
			•- 
				23
			 
 - FashionComposer: Compositional Fashion Image Generation- 
			Paper
			 •- 
			2412.14168
			 •
			Published
				
			•- 
				16
			 
 - ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting
  with Diffusion Transformers- 
			Paper
			 •- 
			2412.12571
			 •
			Published
				
			•- 
				8
			 
 - Flowing from Words to Pixels: A Framework for Cross-Modality Evolution- 
			Paper
			 •- 
			2412.15213
			 •
			Published
				
			•- 
				28
			 
 - Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion- 
			Paper
			 •- 
			2412.14462
			 •
			Published
				
			•- 
				15
			 
 - 
			Paper
			 •- 
			2412.18653
			 •
			Published
				
			•- 
				84
			 
 - The Superposition of Diffusion Models Using the Itô Density Estimator- 
			Paper
			 •- 
			2412.17762
			 •
			Published
				
			•- 
				13
			 
 - From Elements to Design: A Layered Approach for Automatic Graphic Design
  Composition- 
			Paper
			 •- 
			2412.19712
			 •
			Published
				
			•- 
				15
			 
 - VMix: Improving Text-to-Image Diffusion Model with Cross-Attention
  Mixing Control- 
			Paper
			 •- 
			2412.20800
			 •
			Published
				
			•- 
				11
			 
 - DepthMaster: Taming Diffusion Models for Monocular Depth Estimation- 
			Paper
			 •- 
			2501.02576
			 •
			Published
				
			•- 
				15
			 
 - MagicFace: High-Fidelity Facial Expression Editing with Action-Unit
  Control- 
			Paper
			 •- 
			2501.02260
			 •
			Published
				
			•- 
				5
			 
 - The GAN is dead; long live the GAN! A Modern GAN Baseline- 
			Paper
			 •- 
			2501.05441
			 •
			Published
				
			•- 
				95
			 
 - MangaNinja: Line Art Colorization with Precise Reference Following- 
			Paper
			 •- 
			2501.08332
			 •
			Published
				
			•- 
				60
			 
 - Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models- 
			Paper
			 •- 
			2501.06751
			 •
			Published
				
			•- 
				32
			 
 - Democratizing Text-to-Image Masked Generative Models with Compact
  Text-Aware One-Dimensional Tokens- 
			Paper
			 •- 
			2501.07730
			 •
			Published
				
			•- 
				17
			 
 - FramePainter: Endowing Interactive Image Editing with Video Diffusion
  Priors- 
			Paper
			 •- 
			2501.08225
			 •
			Published
				
			•- 
				19
			 
 - 3DIS-FLUX: simple and efficient multi-instance generation with DiT
  rendering- 
			Paper
			 •- 
			2501.05131
			 •
			Published
				
			•- 
				37
			 
 - Inference-Time Scaling for Diffusion Models beyond Scaling Denoising
  Steps- 
			Paper
			 •- 
			2501.09732
			 •
			Published
				
			•- 
				71
			 
 - SynthLight: Portrait Relighting with Diffusion Model by Learning to
  Re-render Synthetic Faces- 
			Paper
			 •- 
			2501.09756
			 •
			Published
				
			•- 
				19
			 
 - Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions- 
			Paper
			 •- 
			2501.10020
			 •
			Published
				
			•- 
				24
			 
 - TokenVerse: Versatile Multi-concept Personalization in Token Modulation
  Space- 
			Paper
			 •- 
			2501.12224
			 •
			Published
				
			•- 
				47
			 
 - GPS as a Control Signal for Image Generation- 
			Paper
			 •- 
			2501.12390
			 •
			Published
				
			•- 
				15
			 
 - Can We Generate Images with CoT? Let's Verify and Reinforce Image
  Generation Step by Step- 
			Paper
			 •- 
			2501.13926
			 •
			Published
				
			•- 
				42
			 
 - One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation
  Using a Single Prompt- 
			Paper
			 •- 
			2501.13554
			 •
			Published
				
			•- 
				9
			 
 - AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and
  Modulation- 
			Paper
			 •- 
			2403.14614
			 •
			Published
				
			•- 
				4
			 
 - Denoising as Adaptation: Noise-Space Domain Adaptation for Image
  Restoration- 
			Paper
			 •- 
			2406.18516
			 •
			Published
				
			•- 
				4
			 
 - Visual Generation Without Guidance- 
			Paper
			 •- 
			2501.15420
			 •
			Published
				
			•- 
				8
			 
 - SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute
  in Linear Diffusion Transformer- 
			Paper
			 •- 
			2501.18427
			 •
			Published
				
			•- 
				21
			 
 - Inverse Bridge Matching Distillation- 
			Paper
			 •- 
			2502.01362
			 •
			Published
				
			•- 
				28
			 
 - LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion
  Transformer- 
			Paper
			 •- 
			2502.01105
			 •
			Published
				
			•- 
				20
			 
 - Weak-to-Strong Diffusion with Reflection- 
			Paper
			 •- 
			2502.00473
			 •
			Published
				
			•- 
				23
			 
 - Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More- 
			Paper
			 •- 
			2502.03738
			 •
			Published
				
			•- 
				11
			 
 - Dual Caption Preference Optimization for Diffusion Models- 
			Paper
			 •- 
			2502.06023
			 •
			Published
				
			•- 
				9
			 
 - Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient
  Text-to-Image Generation- 
			Paper
			 •- 
			2502.08690
			 •
			Published
				
			•- 
				43
			 
 - ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation- 
			Paper
			 •- 
			2502.09411
			 •
			Published
				
			•- 
				21
			 
 - Precise Parameter Localization for Textual Generation in Diffusion
  Models- 
			Paper
			 •- 
			2502.09935
			 •
			Published
				
			•- 
				12
			 
 - Region-Adaptive Sampling for Diffusion Transformers- 
			Paper
			 •- 
			2502.10389
			 •
			Published
				
			•- 
				53
			 
 - I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
  in Diffusion Models- 
			Paper
			 •- 
			2502.10458
			 •
			Published
				
			•- 
				37
			 
 - Diffusion Models without Classifier-free Guidance- 
			Paper
			 •- 
			2502.12154
			 •
			Published
				
			•- 
				8
			 
 - PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data- 
			Paper
			 •- 
			2502.14397
			 •
			Published
				
			•- 
				41
			 
 - One-step Diffusion Models with f-Divergence Distribution Matching- 
			Paper
			 •- 
			2502.15681
			 •
			Published
				
			•- 
				8
			 
 - DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks- 
			Paper
			 •- 
			2502.17157
			 •
			Published
				
			•- 
				52
			 
 - GCC: Generative Color Constancy via Diffusing a Color Checker- 
			Paper
			 •- 
			2502.17435
			 •
			Published
				
			•- 
				28
			 
 - ART: Anonymous Region Transformer for Variable Multi-Layer Transparent
  Image Generation- 
			Paper
			 •- 
			2502.18364
			 •
			Published
				
			•- 
				37
			 
 - KV-Edit: Training-Free Image Editing for Precise Background Preservation- 
			Paper
			 •- 
			2502.17363
			 •
			Published
				
			•- 
				38
			 
 - K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs- 
			Paper
			 •- 
			2502.18461
			 •
			Published
				
			•- 
				17
			 
 - LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven
  Language Representation- 
			Paper
			 •- 
			2502.18302
			 •
			Published
				
			•- 
				5
			 
 - GHOST 2.0: generative high-fidelity one shot transfer of heads- 
			Paper
			 •- 
			2502.18417
			 •
			Published
				
			•- 
				67
			 
 - Distill Any Depth: Distillation Creates a Stronger Monocular Depth
  Estimator- 
			Paper
			 •- 
			2502.19204
			 •
			Published
				
			•- 
				11
			 
 - UniTok: A Unified Tokenizer for Visual Generation and Understanding- 
			Paper
			 •- 
			2502.20321
			 •
			Published
				
			•- 
				31
			 
 - Multimodal Representation Alignment for Image Generation: Text-Image
  Interleaved Control Is Easier Than You Think- 
			Paper
			 •- 
			2502.20172
			 •
			Published
				
			•- 
				28
			 
 - FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality
  Samples with Less Compute- 
			Paper
			 •- 
			2502.20126
			 •
			Published
				
			•- 
				20
			 
 - Training Consistency Models with Variational Noise Coupling- 
			Paper
			 •- 
			2502.18197
			 •
			Published
				
			•- 
				7
			 
 - How far can we go with ImageNet for Text-to-Image generation?- 
			Paper
			 •- 
			2502.21318
			 •
			Published
				
			•- 
				26
			 
 - RectifiedHR: Enable Efficient High-Resolution Image Generation via
  Energy Rectification- 
			Paper
			 •- 
			2503.02537
			 •
			Published
				
			•- 
				12
			 
 - Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with
  Multimodal Large Language Model- 
			Paper
			 •- 
			2503.06141
			 •
			Published
				
			•- 
				4
			 
 - Unleashing the Potential of Large Language Models for Text-to-Image
  Generation through Autoregressive Representation Alignment- 
			Paper
			 •- 
			2503.07334
			 •
			Published
				
			•- 
				16
			 
 - Seedream 2.0: A Native Chinese-English Bilingual Image Generation
  Foundation Model- 
			Paper
			 •- 
			2503.07703
			 •
			Published
				
			•- 
				37
			 
 - LightGen: Efficient Image Generation through Knowledge Distillation and
  Direct Preference Optimization- 
			Paper
			 •- 
			2503.08619
			 •
			Published
				
			•- 
				20
			 
 - ObjectMover: Generative Object Movement with Video Prior- 
			Paper
			 •- 
			2503.08037
			 •
			Published
				
			•- 
				5
			 
 - Alias-Free Latent Diffusion Models: Improving Fractional Shift
  Equivariance of Diffusion Latent Space- 
			Paper
			 •- 
			2503.09419
			 •
			Published
				
			•- 
				6
			 
 - CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing- 
			Paper
			 •- 
			2503.10613
			 •
			Published
				
			•- 
				79
			 
 - Silent Branding Attack: Trigger-free Data Poisoning Attack on
  Text-to-Image Diffusion Models- 
			Paper
			 •- 
			2503.09669
			 •
			Published
				
			•- 
				35
			 
 - OmniPaint: Mastering Object-Oriented Editing via Disentangled
  Insertion-Removal Inpainting- 
			Paper
			 •- 
			2503.08677
			 •
			Published
				
			•- 
				29
			 
 - SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency
  Distillation- 
			Paper
			 •- 
			2503.09641
			 •
			Published
				
			•- 
				40
			 
 - ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style
  Transfer- 
			Paper
			 •- 
			2503.10614
			 •
			Published
				
			•- 
				8
			 
 - Autoregressive Image Generation with Randomized Parallel Decoding- 
			Paper
			 •- 
			2503.10568
			 •
			Published
				
			•- 
				8
			 
 - Piece it Together: Part-Based Concepting with IP-Priors- 
			Paper
			 •- 
			2503.10365
			 •
			Published
				
			•- 
				8
			 
 - PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference
  Time by Leveraging Sparsity- 
			Paper
			 •- 
			2503.07677
			 •
			Published
				
			•- 
				86
			 
 - DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale
  Text-to-Image Models- 
			Paper
			 •- 
			2503.12885
			 •
			Published
				
			•- 
				43
			 
 - Edit Transfer: Learning Image Editing via Vision In-Context Relations- 
			Paper
			 •- 
			2503.13327
			 •
			Published
				
			•- 
				29
			 
 - BlobCtrl: A Unified and Flexible Framework for Element-level Image
  Generation and Editing- 
			Paper
			 •- 
			2503.13434
			 •
			Published
				
			•- 
				27
			 
 - Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation- 
			Paper
			 •- 
			2503.13070
			 •
			Published
				
			•- 
				10
			 
 - GenStereo: Towards Open-World Generation of Stereo Images and
  Unsupervised Matching- 
			Paper
			 •- 
			2503.12720
			 •
			Published
				
			•- 
				4
			 
 - CapArena: Benchmarking and Analyzing Detailed Image Captioning in the
  LLM Era- 
			Paper
			 •- 
			2503.12329
			 •
			Published
				
			•- 
				26
			 
 - Atlas: Multi-Scale Attention Improves Long Context Image Modeling- 
			Paper
			 •- 
			2503.12355
			 •
			Published
				
			•- 
				12
			 
 - Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion
  Transformers via In-Context Reflection- 
			Paper
			 •- 
			2503.12271
			 •
			Published
				
			•- 
				9
			 
 - LEGION: Learning to Ground and Explain for Synthetic Image Detection- 
			Paper
			 •- 
			2503.15264
			 •
			Published
				
			•- 
				21
			 
 - Scale-wise Distillation of Diffusion Models- 
			Paper
			 •- 
			2503.16397
			 •
			Published
				
			•- 
				41
			 
 - DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers- 
			Paper
			 •- 
			2503.14487
			 •
			Published
				
			•- 
				27
			 
 - Ultra-Resolution Adaptation with Ease- 
			Paper
			 •- 
			2503.16322
			 •
			Published
				
			•- 
				13
			 
 - FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields- 
			Paper
			 •- 
			2503.17095
			 •
			Published
				
			•- 
				5
			 
 - Single Image Iterative Subject-driven Generation and Editing- 
			Paper
			 •- 
			2503.16025
			 •
			Published
				
			•- 
				14
			 
 - Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent
  Diffusion Models- 
			Paper
			 •- 
			2503.18352
			 •
			Published
				
			•- 
				6
			 
 - RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame
  Animated Sticker Generation- 
			Paper
			 •- 
			2503.17735
			 •
			Published
				
			•- 
				3
			 
 - Inference-Time Scaling for Flow Models via Stochastic Generation and
  Rollover Budget Forcing- 
			Paper
			 •- 
			2503.19385
			 •
			Published
				
			•- 
				34
			 
 - Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection
  with Artifact Explanation- 
			Paper
			 •- 
			2503.14905
			 •
			Published
				
			•- 
				20
			 
 - Latent Space Super-Resolution for Higher-Resolution Image Generation
  with Diffusion Models- 
			Paper
			 •- 
			2503.18446
			 •
			Published
				
			•- 
				12
			 
 - Unconditional Priors Matter! Improving Conditional Generation of
  Fine-Tuned Diffusion Models- 
			Paper
			 •- 
			2503.20240
			 •
			Published
				
			•- 
				22
			 
 - LeX-Art: Rethinking Text Generation via Scalable High-Quality Data
  Synthesis- 
			Paper
			 •- 
			2503.21749
			 •
			Published
				
			•- 
				26
			 
 - Lumina-Image 2.0: A Unified and Efficient Image Generative Framework- 
			Paper
			 •- 
			2503.21758
			 •
			Published
				
			•- 
				22
			 
 - Unified Multimodal Discrete Diffusion- 
			Paper
			 •- 
			2503.20853
			 •
			Published
				
			•- 
				9
			 
 - TextCrafter: Accurately Rendering Multiple Texts in Complex Visual
  Scenes- 
			Paper
			 •- 
			2503.23461
			 •
			Published
				
			•- 
				94
			 
 - ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and
  Diffusion Refinement- 
			Paper
			 •- 
			2504.01934
			 •
			Published
				
			•- 
				22
			 
 - Boost Your Own Human Image Generation Model via Direct Preference
  Optimization with AI Feedback- 
			Paper
			 •- 
			2405.20216
			 •
			Published
				
			•- 
				21
			 
 - VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via
  Iterative Instruction Tuning and Reinforcement Learning- 
			Paper
			 •- 
			2504.02949
			 •
			Published
				
			•- 
				21
			 
 - SPF-Portrait: Towards Pure Portrait Customization with Semantic
  Pollution-Free Fine-tuning- 
			Paper
			 •- 
			2504.00396
			 •
			Published
				
			•- 
				3
			 
 - Concept Lancet: Image Editing with Compositional Representation
  Transplant- 
			Paper
			 •- 
			2504.02828
			 •
			Published
				
			•- 
				16
			 
 - An Empirical Study of GPT-4o Image Generation Capabilities- 
			Paper
			 •- 
			2504.05979
			 •
			Published
				
			•- 
				63
			 
 - Less-to-More Generalization: Unlocking More Controllability by
  In-Context Generation- 
			Paper
			 •- 
			2504.02160
			 •
			Published
				
			•- 
				37
			 
 - Tuning-Free Image Editing with Fidelity and Editability via Unified
  Latent Diffusion Model- 
			Paper
			 •- 
			2504.05594
			 •
			Published
				
			•- 
				11
			 
 - HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned
  Guidance- 
			Paper
			 •- 
			2504.06232
			 •
			Published
				
			•- 
				13
			 
 - DDT: Decoupled Diffusion Transformer- 
			Paper
			 •- 
			2504.05741
			 •
			Published
				
			•- 
				76
			 
 - Are We Done with Object-Centric Learning?- 
			Paper
			 •- 
			2504.07092
			 •
			Published
				
			•- 
				5
			 
 - VisualCloze: A Universal Image Generation Framework via Visual
  In-Context Learning- 
			Paper
			 •- 
			2504.07960
			 •
			Published
				
			•- 
				50
			 
 - Compass Control: Multi Object Orientation Control for Text-to-Image
  Generation- 
			Paper
			 •- 
			2504.06752
			 •
			Published
				
			•- 
				9
			 
 - GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
  Autoregressive Image Generation- 
			Paper
			 •- 
			2504.08736
			 •
			Published
				
			•- 
				46
			 
 - ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image
  Restoration- 
			Paper
			 •- 
			2504.08591
			 •
			Published
				
			•- 
				18
			 
 - PixelFlow: Pixel-Space Generative Models with Flow- 
			Paper
			 •- 
			2504.07963
			 •
			Published
				
			•- 
				18
			 
 - SimpleAR: Pushing the Frontier of Autoregressive Visual Generation
  through Pretraining, SFT, and RL- 
			Paper
			 •- 
			2504.11455
			 •
			Published
				
			•- 
				14
			 
 - D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation- 
			Paper
			 •- 
			2504.09454
			 •
			Published
				
			•- 
				11
			 
 - Cobra: Efficient Line Art COlorization with BRoAder References- 
			Paper
			 •- 
			2504.12240
			 •
			Published
				
			•- 
				27
			 
 - REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion
  Transformers- 
			Paper
			 •- 
			2504.10483
			 •
			Published
				
			•- 
				20
			 
 - DMM: Building a Versatile Image Generation Model via Distillation-Based
  Model Merging- 
			Paper
			 •- 
			2504.12364
			 •
			Published
				
			•- 
				22
			 
 - InstantCharacter: Personalize Any Characters with a Scalable Diffusion
  Transformer Framework- 
			Paper
			 •- 
			2504.12395
			 •
			Published
				
			•- 
				16
			 
 - Tokenize Image Patches: Global Context Fusion for Effective Haze Removal
  in Large Images- 
			Paper
			 •- 
			2504.09621
			 •
			Published
				
			•- 
				11
			 
 - LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping- 
			Paper
			 •- 
			2504.08902
			 •
			Published
				
			•- 
				7
			 
 - Personalized Text-to-Image Generation with Auto-Regressive Models- 
			Paper
			 •- 
			2504.13162
			 •
			Published
				
			•- 
				18
			 
 - From Reflection to Perfection: Scaling Inference-Time Optimization for
  Text-to-Image Diffusion Models via Reflection Tuning- 
			Paper
			 •- 
			2504.16080
			 •
			Published
				
			•- 
				15
			 
 - DreamID: High-Fidelity and Fast diffusion-based Face Swapping via
  Triplet ID Group Learning- 
			Paper
			 •- 
			2504.14509
			 •
			Published
				
			•- 
				50
			 
 - DreamO: A Unified Framework for Image Customization- 
			Paper
			 •- 
			2504.16915
			 •
			Published
				
			•- 
				24
			 
 - Step1X-Edit: A Practical Framework for General Image Editing- 
			Paper
			 •- 
			2504.17761
			 •
			Published
				
			•- 
				92
			 
 - RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image
  Generation- 
			Paper
			 •- 
			2504.17502
			 •
			Published
				
			•- 
				55
			 
 - Token-Shuffle: Towards High-Resolution Image Generation with
  Autoregressive Models- 
			Paper
			 •- 
			2504.17789
			 •
			Published
				
			•- 
				23
			 
 - Boosting Generative Image Modeling via Joint Image-Feature Synthesis- 
			Paper
			 •- 
			2504.16064
			 •
			Published
				
			•- 
				14
			 
 - RepText: Rendering Visual Text via Replicating- 
			Paper
			 •- 
			2504.19724
			 •
			Published
				
			•- 
				31
			 
 - In-Context Edit: Enabling Instructional Image Editing with In-Context
  Generation in Large Scale Diffusion Transformer- 
			Paper
			 •- 
			2504.20690
			 •
			Published
				
			•- 
				19
			 
 - PixelHacker: Image Inpainting with Structural and Semantic Consistency- 
			Paper
			 •- 
			2504.20438
			 •
			Published
				
			•- 
				44
			 
 - SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based
  Image Editing- 
			Paper
			 •- 
			2505.02370
			 •
			Published
				
			•- 
				14
			 
 - MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset
  via Attention Routing- 
			Paper
			 •- 
			2505.02823
			 •
			Published
				
			•- 
				5
			 
 - Flow-GRPO: Training Flow Matching Models via Online RL- 
			Paper
			 •- 
			2505.05470
			 •
			Published
				
			•- 
				85
			 
 - Unified Continuous Generative Models- 
			Paper
			 •- 
			2505.07447
			 •
			Published
				
			•- 
				43
			 
 - MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills- 
			Paper
			 •- 
			2505.06176
			 •
			Published
				
			•- 
				12
			 
 - LightLab: Controlling Light Sources in Images with Diffusion Models- 
			Paper
			 •- 
			2505.09608
			 •
			Published
				
			•- 
				36
			 
 - End-to-End Vision Tokenizer Tuning- 
			Paper
			 •- 
			2505.10562
			 •
			Published
				
			•- 
				22
			 
 - Hunyuan-Game: Industrial-grade Intelligent Game Creation Model- 
			Paper
			 •- 
			2505.14135
			 •
			Published
				
			•- 
				15
			 
 - KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models- 
			Paper
			 •- 
			2505.16707
			 •
			Published
				
			•- 
				45
			 
 - Scaling Diffusion Transformers Efficiently via μP- 
			Paper
			 •- 
			2505.15270
			 •
			Published
				
			•- 
				35
			 
 - OmniConsistency: Learning Style-Agnostic Consistency from Paired
  Stylization Data- 
			Paper
			 •- 
			2505.18445
			 •
			Published
				
			•- 
				64
			 
 - ImgEdit: A Unified Image Editing Dataset and Benchmark- 
			Paper
			 •- 
			2505.20275
			 •
			Published
				
			•- 
				18
			 
 - DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via
  Next-Detail Prediction- 
			Paper
			 •- 
			2505.21473
			 •
			Published
				
			•- 
				16
			 
 - D-AR: Diffusion via Autoregressive Models- 
			Paper
			 •- 
			2505.23660
			 •
			Published
				
			•- 
				34
			 
 - LoRAShop: Training-Free Multi-Concept Image Generation and Editing with
  Rectified Flow Transformers- 
			Paper
			 •- 
			2505.23758
			 •
			Published
				
			•- 
				22
			 
 - EasyText: Controllable Diffusion Transformer for Multilingual Text
  Rendering- 
			Paper
			 •- 
			2505.24417
			 •
			Published
				
			•- 
				13
			 
 - ReasonGen-R1: CoT for Autoregressive Image generation models through SFT
  and RL- 
			Paper
			 •- 
			2505.24875
			 •
			Published
				
			•- 
				10
			 
 - Cora: Correspondence-aware image editing using few step diffusion- 
			Paper
			 •- 
			2505.23907
			 •
			Published
				
			•- 
				11
			 
 - ComposeAnything: Composite Object Priors for Text-to-Image Generation- 
			Paper
			 •- 
			2505.24086
			 •
			Published
				
			•- 
				5
			 
 - SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image
  Distillation- 
			Paper
			 •- 
			2506.00523
			 •
			Published
				
			•- 
				3
			 
 - RelationAdapter: Learning and Transferring Visual Relation with
  Diffusion Transformers- 
			Paper
			 •- 
			2506.02528
			 •
			Published
				
			•- 
				15
			 
 - Image Editing As Programs with Diffusion Models- 
			Paper
			 •- 
			2506.04158
			 •
			Published
				
			•- 
				24
			 
 - DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via
  Diffusion Transformers- 
			Paper
			 •- 
			2505.21541
			 •
			Published
				
			•- 
				7
			 
 - RefEdit: A Benchmark and Method for Improving Instruction-based Image
  Editing Model on Referring Expressions- 
			Paper
			 •- 
			2506.03448
			 •
			Published
				
			•- 
				4
			 
 - MARBLE: Material Recomposition and Blending in CLIP-Space- 
			Paper
			 •- 
			2506.05313
			 •
			Published
				
			•- 
				2
			 
 - STARFlow: Scaling Latent Normalizing Flows for High-resolution Image
  Synthesis- 
			Paper
			 •- 
			2506.06276
			 •
			Published
				
			•- 
				23
			 
 - Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers- 
			Paper
			 •- 
			2506.07986
			 •
			Published
				
			•- 
				19
			 
 - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale
  Prediction- 
			Paper
			 •- 
			2404.02905
			 •
			Published
				
			•- 
				74
			 
 - Text-Aware Image Restoration with Diffusion Models- 
			Paper
			 •- 
			2506.09993
			 •
			Published
				
			•- 
				41
			 
 - Fine-Grained Perturbation Guidance via Attention Head Selection- 
			Paper
			 •- 
			2506.10978
			 •
			Published
				
			•- 
				25
			 
 - PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a
  Unified Framework- 
			Paper
			 •- 
			2506.10741
			 •
			Published
				
			•- 
				27
			 
 - CreatiPoster: Towards Editable and Controllable Multi-Layer Graphic
  Design Generation- 
			Paper
			 •- 
			2506.10890
			 •
			Published
				
			•- 
				9
			 
 - Token Perturbation Guidance for Diffusion Models- 
			Paper
			 •- 
			2506.10036
			 •
			Published
				
			•- 
				5
			 
 - AR-RAG: Autoregressive Retrieval Augmentation for Image Generation- 
			Paper
			 •- 
			2506.06962
			 •
			Published
				
			•- 
				28
			 
 - Ambient Diffusion Omni: Training Good Models with Bad Data- 
			Paper
			 •- 
			2506.10038
			 •
			Published
				
			•- 
				9
			 
 - Watermarking Autoregressive Image Generation- 
			Paper
			 •- 
			2506.16349
			 •
			Published
				
			•- 
				3
			 
 - Audit & Repair: An Agentic Framework for Consistent Story Visualization
  in Text-to-Image Diffusion Models- 
			Paper
			 •- 
			2506.18900
			 •
			Published
				
			•- 
				3
			 
 - Improving Progressive Generation with Decomposable Flow Matching- 
			Paper
			 •- 
			2506.19839
			 •
			Published
				
			•- 
				7
			 
 - ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image
  Generation- 
			Paper
			 •- 
			2506.18095
			 •
			Published
				
			•- 
				65
			 
 - Inverse-and-Edit: Effective and Fast Image Editing by Cycle Consistency
  Models- 
			Paper
			 •- 
			2506.19103
			 •
			Published
				
			•- 
				42
			 
 - XVerse: Consistent Multi-Subject Control of Identity and Semantic
  Attributes via DiT Modulation- 
			Paper
			 •- 
			2506.21416
			 •
			Published
				
			•- 
				28
			 
 - From Ideal to Real: Unified and Data-Efficient Dense Prediction for
  Real-World Scenarios- 
			Paper
			 •- 
			2506.20279
			 •
			Published
				
			•- 
				19
			 
 - Noise Consistency Training: A Native Approach for One-Step Generator in
  Learning Additional Controls- 
			Paper
			 •- 
			2506.19741
			 •
			Published
				
			•- 
				4
			 
 - Calligrapher: Freestyle Text Image Customization- 
			Paper
			 •- 
			2506.24123
			 •
			Published
				
			•- 
				37
			 
 - Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric
  Attention- 
			Paper
			 •- 
			2506.23542
			 •
			Published
				
			•- 
				14
			 
 - Heeding the Inner Voice: Aligning ControlNet Training via Intermediate
  Features Feedback- 
			Paper
			 •- 
			2507.02321
			 •
			Published
				
			•- 
				39
			 
 - Beyond Simple Edits: X-Planner for Complex Instruction-Based Image
  Editing- 
			Paper
			 •- 
			2507.05259
			 •
			Published
				
			•- 
				5
			 
 - NeoBabel: A Multilingual Open Tower for Visual Generation- 
			Paper
			 •- 
			2507.06137
			 •
			Published
				
			•- 
				4
			 
 - Vision Foundation Models as Effective Visual Tokenizers for
  Autoregressive Image Generation- 
			Paper
			 •- 
			2507.08441
			 •
			Published
				
			•- 
				61
			 
 - Subject-Consistent and Pose-Diverse Text-to-Image Generation- 
			Paper
			 •- 
			2507.08396
			 •
			Published
				
			•- 
				15
			 
 - DreamPoster: A Unified Framework for Image-Conditioned Generative Poster
  Design- 
			Paper
			 •- 
			2507.04218
			 •
			Published
				
			•- 
				12
			 
 - Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation
  from Diffusion Models- 
			Paper
			 •- 
			2507.07104
			 •
			Published
				
			•- 
				45
			 
 - CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models- 
			Paper
			 •- 
			2507.13984
			 •
			Published
				
			•- 
				24
			 
 - NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining- 
			Paper
			 •- 
			2507.14119
			 •
			Published
				
			•- 
				58
			 
 - Latent Denoising Makes Good Visual Tokenizers- 
			Paper
			 •- 
			2507.15856
			 •
			Published
				
			•- 
				9
			 
 - Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated
  Diffusion Transformers- 
			Paper
			 •- 
			2507.08422
			 •
			Published
				
			•- 
				36
			 
 - TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance- 
			Paper
			 •- 
			2507.18192
			 •
			Published
				
			•- 
				7
			 
 - X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image
  Generative Models Great Again- 
			Paper
			 •- 
			2507.22058
			 •
			Published
				
			•- 
				38
			 
 - MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE- 
			Paper
			 •- 
			2507.21802
			 •
			Published
				
			•- 
				15
			 
 - PixNerd: Pixel Neural Field Diffusion- 
			Paper
			 •- 
			2507.23268
			 •
			Published
				
			•- 
				51
			 
 - Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding
  and Generation- 
			Paper
			 •- 
			2508.03320
			 •
			Published
				
			•- 
				61
			 
 - The Promise of RL for Autoregressive Image Editing- 
			Paper
			 •- 
			2508.01119
			 •
			Published
				
			•- 
				11
			 
 - LAMIC: Layout-Aware Multi-Image Composition via Scalability of
  Multimodal Diffusion Transformer- 
			Paper
			 •- 
			2508.00477
			 •
			Published
				
			•- 
				8
			 
 - HPSv3: Towards Wide-Spectrum Human Preference Score- 
			Paper
			 •- 
			2508.03789
			 •
			Published
				
			•- 
				18
			 
 - The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in
  Text-to-Image Models- 
			Paper
			 •- 
			2507.23313
			 •
			Published
				
			•- 
				1
			 
 - Voost: A Unified and Scalable Diffusion Transformer for Bidirectional
  Virtual Try-On and Try-Off- 
			Paper
			 •- 
			2508.04825
			 •
			Published
				
			•- 
				57
			 
 - Story2Board: A Training-Free Approach for Expressive Storyboard
  Generation- 
			Paper
			 •- 
			2508.09983
			 •
			Published
				
			•- 
				68
			 
 - CannyEdit: Selective Canny Control and Dual-Prompt Guidance for
  Training-Free Image Editing- 
			Paper
			 •- 
			2508.06937
			 •
			Published
				
			•- 
				7
			 
 - NextStep-1: Toward Autoregressive Image Generation with Continuous
  Tokens at Scale- 
			Paper
			 •- 
			2508.10711
			 •
			Published
				
			•- 
				142
			 
 - Next Visual Granularity Generation- 
			Paper
			 •- 
			2508.12811
			 •
			Published
				
			•- 
				49
			 
 - S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of
  Diffusion Models- 
			Paper
			 •- 
			2508.12880
			 •
			Published
				
			•- 
				46
			 
 - Training-Free Text-Guided Color Editing with Multi-Modal Diffusion
  Transformer- 
			Paper
			 •- 
			2508.09131
			 •
			Published
				
			•- 
				16
			 
 - TempFlow-GRPO: When Timing Matters for GRPO in Flow Models- 
			Paper
			 •- 
			2508.04324
			 •
			Published
				
			•- 
				11
			 
 - Visual Autoregressive Modeling for Instruction-Guided Image Editing- 
			Paper
			 •- 
			2508.15772
			 •
			Published
				
			•- 
				9
			 
 - Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance
  for Text-to-Image Generation- 
			Paper
			 •- 
			2508.18032
			 •
			Published
				
			•- 
				40
			 
 - T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image
  Generation- 
			Paper
			 •- 
			2508.17472
			 •
			Published
				
			•- 
				26
			 
 - Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable
  Text-to-Image Reinforcement Learning- 
			Paper
			 •- 
			2508.20751
			 •
			Published
				
			•- 
				89
			 
 - USO: Unified Style and Subject-Driven Generation via Disentangled and
  Reward Learning- 
			Paper
			 •- 
			2508.18966
			 •
			Published
				
			•- 
				56
			 
 - OneReward: Unified Mask-Guided Image Generation via Multi-Task Human
  Preference Learning- 
			Paper
			 •- 
			2508.21066
			 •
			Published
				
			•- 
				13
			 
 - Discrete Noise Inversion for Next-scale Autoregressive Text-based Image
  Editing- 
			Paper
			 •- 
			2509.01984
			 •
			Published
				
			•- 
				6
			 
 - Interleaving Reasoning for Better Text-to-Image Generation- 
			Paper
			 •- 
			2509.06945
			 •
			Published
				
			•- 
				13
			 
 - Reconstruction Alignment Improves Unified Multimodal Models- 
			Paper
			 •- 
			2509.07295
			 •
			Published
				
			•- 
				39
			 
 - UMO: Scaling Multi-Identity Consistency for Image Customization via
  Matching Reward- 
			Paper
			 •- 
			2509.06818
			 •
			Published
				
			•- 
				29
			 
 - Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human
  Preference- 
			Paper
			 •- 
			2509.06942
			 •
			Published
				
			•- 
				16
			 
 - Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with
  Quantization-Aware Scheduling- 
			Paper
			 •- 
			2509.01624
			 •
			Published
				
			•- 
				7
			 
 - RewardDance: Reward Scaling in Visual Generation • Paper • 2509.08826 • Published • 72
			 
 - Can Understanding and Generation Truly Benefit Together -- or Just Coexist? • Paper • 2509.09666 • Published • 33
			 
 - InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis • Paper • 2509.10441 • Published • 30
			 
 - LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence • Paper • 2509.12203 • Published • 19
			 
 - Image Tokenizer Needs Post-Training • Paper • 2509.12474 • Published • 7
			 
 - Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation • Paper • 2509.18824 • Published • 22
			 
 - IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance • Paper • 2509.26231 • Published • 17
			 
 - DreamOmni2: Multimodal Instruction-based Editing and Generation • Paper • 2510.06679 • Published • 73
			 
 - Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer • Paper • 2510.06590 • Published • 69
			 
 - Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding • Paper • 2510.06308 • Published • 51
			 
 - Heptapod: Language Modeling on Visual Signals • Paper • 2510.06673 • Published • 3
			 
 - Latent Diffusion Model without Variational Autoencoder • Paper • 2510.15301 • Published • 39
			 
 - BLIP3o-NEXT: Next Frontier of Native Image Generation • Paper • 2510.15857 • Published • 21
			 
 - WithAnyone: Towards Controllable and ID Consistent Image Generation • Paper • 2510.14975 • Published • 76
			 
 - Learning an Image Editing Model without Image Editing Pairs • Paper • 2510.14978 • Published • 6
			 
 - Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing • Paper • 2510.08532 • Published • 5
			 
 - World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge • Paper • 2510.04201 • Published • 4
			 
 - VLM-Guided Adaptive Negative Prompting for Creative Generation • Paper • 2510.10715 • Published • 3