ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published 11 days ago • 33
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published 10 days ago • 60
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 94
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences Paper • 2412.01292 • Published Dec 2, 2024 • 13
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images Paper • 2412.03517 • Published Dec 4, 2024 • 19
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion Paper • 2412.04462 • Published Dec 5, 2024 • 8
2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction Paper • 2412.03428 • Published Dec 4, 2024 • 11
PanoDreamer: 3D Panorama Synthesis from a Single Image Paper • 2412.04827 • Published Dec 6, 2024 • 11
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 133
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published Dec 10, 2024 • 50