TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published 28 days ago • 22
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published about 1 month ago • 75
CLEAR: Character Unlearning in Textual and Visual Modalities Paper • 2410.18057 • Published Oct 23 • 200
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Paper • 2410.21465 • Published about 1 month ago • 10
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model Paper • 2410.13925 • Published Oct 17 • 22
Scalable Ranked Preference Optimization for Text-to-Image Generation Paper • 2410.18013 • Published Oct 23 • 14
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14 • 14
BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way Paper • 2410.06241 • Published Oct 8 • 10
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published Oct 10 • 24
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Paper • 2410.07303 • Published Oct 9 • 17
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design Paper • 2410.05677 • Published Oct 8 • 14
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3 • 36
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published Oct 3 • 52