sdtana

roxani_17

AI & ML interests

None yet

sdtana's activity

upvoted a paper 13 days ago

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think

Paper • 2502.20172 • Published 15 days ago • 27

upvoted a paper 29 days ago

TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation

Paper • 2502.07870 • Published about 1 month ago • 43

upvoted 3 papers about 1 month ago

ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

Paper • 2502.04320 • Published Feb 6 • 35

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Paper • 2502.01105 • Published Feb 3 • 20

SliderSpace: Decomposing the Visual Capabilities of Diffusion Models

Paper • 2502.01639 • Published Feb 3 • 25

upvoted a paper about 2 months ago

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 70

upvoted 2 papers 2 months ago

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9 • 88

1.58-bit FLUX

Paper • 2412.18653 • Published Dec 24, 2024 • 80

upvoted 5 papers 3 months ago

Parallelized Autoregressive Visual Generation

Paper • 2412.15119 • Published Dec 19, 2024 • 51

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Paper • 2412.16112 • Published Dec 20, 2024 • 23

APOLLO: SGD-like Memory, AdamW-level Performance

Paper • 2412.05270 • Published Dec 6, 2024 • 38

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 138

Negative Token Merging: Image-based Adversarial Feature Guidance

Paper • 2412.01339 • Published Dec 2, 2024 • 23

upvoted 5 papers 4 months ago

Adaptive Blind All-in-One Image Restoration

Paper • 2411.18412 • Published Nov 27, 2024 • 4

ROICtrl: Boosting Instance Control for Visual Generation

Paper • 2411.17949 • Published Nov 27, 2024 • 83

Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis

Paper • 2411.17769 • Published Nov 26, 2024 • 7

Style-Friendly SNR Sampler for Style-Driven Generation

Paper • 2411.14793 • Published Nov 22, 2024 • 36

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Paper • 2411.05007 • Published Nov 7, 2024 • 18

upvoted 2 papers 6 months ago

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17, 2024 • 29

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17, 2024 • 112