TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published 7 days ago • 17
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published 9 days ago • 71
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published 14 days ago • 196
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Paper • 2410.21465 • Published 9 days ago • 9
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model Paper • 2410.13925 • Published 20 days ago • 21
Scalable Ranked Preference Optimization for Text-to-Image Generation Paper • 2410.18013 • Published 14 days ago • 14
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published 23 days ago • 14
BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way Paper • 2410.06241 • Published 29 days ago • 10
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published 27 days ago • 23
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Paper • 2410.07303 • Published 28 days ago • 16
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design Paper • 2410.05677 • Published 30 days ago • 14
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3 • 36
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published Oct 3 • 52
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 73