Weighted-Reward Preference Optimization for Implicit Model Fusion Paper • 2412.03187 • Published Dec 4, 2024 • 9
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Paper • 2412.03565 • Published Dec 4, 2024 • 11
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation Paper • 2412.03558 • Published Dec 4, 2024 • 15
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding Paper • 2412.00493 • Published Nov 30, 2024 • 16
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images Paper • 2412.03517 • Published Dec 4, 2024 • 18
Imagine360: Immersive 360 Video Generation from Perspective Anchor Paper • 2412.03552 • Published Dec 4, 2024 • 26
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation Paper • 2412.03069 • Published Dec 4, 2024 • 30
DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling Paper • 2412.04905 • Published about 1 month ago • 7
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion Paper • 2412.04301 • Published Dec 5, 2024 • 34
2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction Paper • 2412.03428 • Published Dec 4, 2024 • 10
APOLLO: SGD-like Memory, AdamW-level Performance Paper • 2412.05270 • Published about 1 month ago • 38
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation Paper • 2412.04445 • Published Dec 5, 2024 • 21
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published about 1 month ago • 123
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published Sep 2, 2024 • 27
CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis Paper • 2408.14765 • Published Aug 27, 2024 • 14
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Paper • 2408.17253 • Published Aug 30, 2024 • 37