Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context • Paper • 2403.05530 • Published Mar 8, 2024 • 62
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation • Paper • 2410.11779 • Published Oct 15, 2024 • 25
What Matters in Transformers? Not All Attention is Needed • Paper • 2406.15786 • Published Jun 22, 2024 • 30
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention • Paper • 2410.10774 • Published Oct 14, 2024 • 25
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding • Paper • 2411.19527 • Published Nov 29, 2024 • 10