ImprovNet -- Generating Controllable Musical Improvisations with Iterative Corruption Refinement Paper • 2502.04522 • Published Feb 6, 2025 • 2
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs Paper • 2510.10689 • Published Oct 12, 2025 • 47
RUMAA: Repeat-Aware Unified Music Audio Analysis for Score-Performance Alignment, Transcription, and Mistake Detection Paper • 2507.12175 • Published Jul 16, 2025
Making Dialogue Grounding Data Rich: A Three-Tier Data Synthesis Framework for Generalized Referring Expression Comprehension Paper • 2512.02791 • Published Dec 2, 2025 • 1
Seeing the Forest and the Trees: Query-Aware Tokenizer for Long-Video Multimodal Language Models Paper • 2511.11910 • Published Nov 14, 2025 • 35
Sound Matching an Analogue Levelling Amplifier Using the Newton-Raphson Method Paper • 2509.10706 • Published Sep 12, 2025 • 1
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix Paper • 2505.13032 • Published May 19, 2025 • 4
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following Paper • 2506.12285 • Published Jun 14, 2025 • 54
$μ^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation Paper • 2507.00316 • Published Jun 30, 2025 • 15
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior Paper • 2505.11315 • Published May 16, 2025
DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions Paper • 2504.14735 • Published Apr 20, 2025 • 1
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning Paper • 2503.11495 • Published Mar 14, 2025 • 14
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11, 2025 • 72
CoS: Chain-of-Shot Prompting for Long Video Understanding Paper • 2502.06428 • Published Feb 10, 2025 • 10
INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation Paper • 2501.18753 • Published Jan 30, 2025 • 3