Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9 • 41
Merlin:Empowering Multimodal LLMs with Foresight Minds Paper • 2312.00589 • Published Nov 30, 2023 • 24
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis Paper • 2311.08667 • Published Nov 15, 2023 • 18
Single-Image 3D Human Digitization with Shape-Guided Diffusion Paper • 2311.09221 • Published Nov 15, 2023 • 20
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model Paper • 2311.09217 • Published Nov 15, 2023 • 21
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion Paper • 2311.07885 • Published Nov 14, 2023 • 39
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text Paper • 2311.07446 • Published Nov 13, 2023 • 28
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation Paper • 2311.07562 • Published Nov 13, 2023 • 12
ChatAnything: Facetime Chat with LLM-Enhanced Personas Paper • 2311.06772 • Published Nov 12, 2023 • 34
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 26
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 43
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models Paper • 2311.05997 • Published Nov 10, 2023 • 36
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model Paper • 2311.06214 • Published Nov 10, 2023 • 29
GPT4All: An Ecosystem of Open Source Compressed Language Models Paper • 2311.04931 • Published Nov 6, 2023 • 20
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model Paper • 2311.05348 • Published Nov 9, 2023 • 11
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving Paper • 2311.05332 • Published Nov 9, 2023 • 9
Prompt Cache: Modular Attention Reuse for Low-Latency Inference Paper • 2311.04934 • Published Nov 7, 2023 • 28
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 45