Apply for community grant: Academic project

by musicaudiopretrain (Multimodal Art Projection org)

Hi,

We are writing to apply for a community grant to support the development of our self-supervised music understanding model, MERT. Our model has achieved overall state-of-the-art (SOTA) performance on 14 Music Information Retrieval (MIR) tasks, including both classification and sequence labeling.

MERT is a lightweight model that supports on-the-fly inference on a single consumer-grade GPU, even for sequence labeling tasks such as beat tracking and source separation. Our model incorporates an acoustic teacher based on RVQ-VAE (Residual Vector Quantization Variational AutoEncoder) and a musical teacher based on the Constant-Q Transform (CQT), which together guide a BERT-style transformer encoder to model music audio. Additionally, we introduce an in-batch noise mixture augmentation to enhance representation robustness.
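To give a flavor of the augmentation, below is a minimal PyTorch sketch of an in-batch noise mixture: each clip is (with some probability) mixed with another clip drawn from the same batch, while the teacher targets remain those of the clean audio. The mixing probability, gain, and exact mixing rule here are illustrative placeholders, not our exact training configuration.

```python
import torch

def in_batch_noise_mixture(waveforms: torch.Tensor,
                           mix_prob: float = 0.5,
                           noise_gain: float = 0.1) -> torch.Tensor:
    """Mix each clip with a randomly chosen other clip from the same batch.

    waveforms: (batch, samples) float tensor of raw audio.
    Targets are still computed from the clean audio, so the encoder must
    learn representations that are robust to the injected interference.
    """
    batch = waveforms.size(0)
    # A random permutation picks a "noise" clip for every batch element.
    perm = torch.randperm(batch, device=waveforms.device)
    mixed = waveforms + noise_gain * waveforms[perm]
    # Apply the mixture only to a random subset of the batch.
    mask = (torch.rand(batch, device=waveforms.device) < mix_prob).float()
    return mask.unsqueeze(1) * mixed + (1.0 - mask).unsqueeze(1) * waveforms
```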

Music audio presents distinctive modeling challenges due to its tonal and pitched characteristics. MERT overcomes these challenges and outperforms conventional speech and general-audio approaches. We believe that MERT will pave the way for further exploration of self-supervised learning in music audio understanding.

We are applying for a community grant to support the hosting of our MERT demo on a Hugging Face Space covering multiple tasks. Our goal is to make it easier for people to learn how to use our pre-trained model and to inspire future related research. With access to GPU resources, we can improve the responsiveness of the demo and accommodate a larger number of concurrent users. This will help us reach a wider audience and facilitate the adoption of our model in the MIR community. We believe that this grant will be a valuable investment in the advancement of self-supervised learning for music audio understanding.
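As a pointer for prospective users, here is a minimal sketch of how a pre-trained MERT checkpoint on the Hugging Face Hub could be loaded to extract representations; the checkpoint ID below is a placeholder for whichever release we publish, and the pooling step is just one simple option for clip-level tasks.

```python
import torch
from transformers import AutoModel, Wav2Vec2FeatureExtractor

# Placeholder checkpoint ID; substitute the actual MERT release on the Hub.
CKPT = "m-a-p/MERT-v1-95M"

processor = Wav2Vec2FeatureExtractor.from_pretrained(CKPT)
model = AutoModel.from_pretrained(CKPT, trust_remote_code=True)

# One second of dummy audio at the feature extractor's sampling rate.
sr = processor.sampling_rate
audio = torch.randn(sr)

inputs = processor(audio.numpy(), sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Per-layer, frame-level representations; average over time for clip-level
# tasks, or keep the frame axis for sequence labeling (e.g. beat tracking).
hidden = torch.stack(outputs.hidden_states)  # (layers+1, batch, frames, dim)
clip_embedding = hidden.mean(dim=2)
print(clip_embedding.shape)
```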

Thank you for your consideration of our application.

Sincerely,

Multimodal Art Projection (MAP) research community