m-a-p/MERT-v0-public

Feature Extraction

Transformers

PyTorch

mert_model

music

custom_code

musicaudiopretrain committed on Mar 30, 2023

Commit

c31b901

1 Parent(s): a987a4a

Update README.md

README.md +30 -1

@@ -4,8 +4,37 @@ inference: false
 tags:
 - music
 ---
-# Introduction
+# Introduction to our model series
+The development log of our Music Audio Pre-training (m-a-p) model family:
+- 17/03/2023: we release two advanced music understanding models, [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) and [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M), trained with a new paradigm and dataset. They outperform the previous models and generalize better to more tasks.
+- 14/03/2023: we retrained the MERT-v0 model on open-source-only music data and released it as [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public).
+- 29/12/2022: we release [MERT-v0](https://huggingface.co/m-a-p/MERT-v0), a music understanding model trained with the **MLM** paradigm, which performs better on downstream tasks.
+- 29/10/2022: we release [music2vec](https://huggingface.co/m-a-p/music2vec-v1), a pre-trained MIR model trained with the **BYOL** paradigm.
+The following table gives a quick overview for choosing a model:
+| Name                                                         | Pre-train Paradigm | Training Data (hour) | Pre-train Context (second)   | Model Size | Transformer Layer-Dimension | Feature Rate | Sample Rate | Release Date |
+| ------------------------------------------------------------ | ------------------ | -------------------- | ---------------------------- | ---------- | --------------------------- | ------------ | ----------- | ------------ |
+| [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M)    | MLM                | 160K                 | 5                            | 330M       | 24-1024                     | 75 Hz        | 24K Hz      | 17/03/2023   |
+| [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M)      | MLM                | 20K                  | 5                            | 95M        | 12-768                      | 75 Hz        | 24K Hz      | 17/03/2023   |
+| [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public) | MLM                | 900                  | 5                            | 95M        | 12-768                      | 50 Hz        | 16K Hz      | 14/03/2023   |
+| [MERT-v0](https://huggingface.co/m-a-p/MERT-v0)              | MLM                | 1000                 | 5                            | 95M        | 12-768                      | 50 Hz        | 16K Hz      | 29/12/2022   |
+| [music2vec-v1](https://huggingface.co/m-a-p/music2vec-v1)    | BYOL               | 1000                 | 30                           | 95M        | 12-768                      | 50 Hz        | 16K Hz      | 30/10/2022   |
+## Explanation
+The m-a-p models share a similar architecture; the most distinctive difference between them is the pre-training paradigm. Beyond that, there are several technical configuration details to know before using them:
+- **Model Size**: the number of parameters that are loaded into memory. Select a size that fits your hardware.
+- **Transformer Layer-Dimension**: the number of transformer layers and the corresponding feature dimension that the model can output. This is noted because features extracted from **different layers can perform differently depending on the task**.
+- **Feature Rate**: the number of feature frames the model outputs for 1 second of audio input.
+- **Sample Rate**: the sampling frequency of the audio the model was trained on.
+# Introduction to MERT-v0-public
 **MERT-v0-public** is a completely unsupervised model trained on the **completely non-commercial, open-source** [Music4All](https://sites.google.com/view/contact4music4all) dataset and the part of the [FMA_full](https://github.com/mdeff/fma) dataset that does not include the tag "experimental".
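
To make the Feature Rate and Sample Rate columns of the table concrete, here is a minimal Python sketch that estimates how many feature frames to expect for a clip and whether the audio needs resampling first. The numbers are copied from the table in this README; the names `ModelSpec`, `SPECS`, `approx_feature_shape`, `samples_needed`, and `needs_resampling` are hypothetical helpers for illustration and are not part of the model's code. The frame count is an approximation: the real model may return a slightly different number of frames depending on how its feature extractor handles clip boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    layers: int           # number of transformer layers
    dim: int              # feature dimension of each output frame
    feature_rate_hz: int  # feature frames produced per second of audio
    sample_rate_hz: int   # sample rate of the audio used in training


# Values copied from the README table (hypothetical container, for illustration).
SPECS = {
    "MERT-v1-330M": ModelSpec(24, 1024, 75, 24_000),
    "MERT-v1-95M": ModelSpec(12, 768, 75, 24_000),
    "MERT-v0-public": ModelSpec(12, 768, 50, 16_000),
    "MERT-v0": ModelSpec(12, 768, 50, 16_000),
    "music2vec-v1": ModelSpec(12, 768, 50, 16_000),
}


def approx_feature_shape(name: str, seconds: float) -> tuple[int, int]:
    """Approximate (frames, dim) of one layer's output for a clip."""
    spec = SPECS[name]
    frames = round(seconds * spec.feature_rate_hz)
    return frames, spec.dim


def samples_needed(name: str, seconds: float) -> int:
    """Number of raw audio samples in a clip at the model's sample rate."""
    return round(seconds * SPECS[name].sample_rate_hz)


def needs_resampling(name: str, audio_sample_rate_hz: int) -> bool:
    """True if the audio must be resampled before being fed to the model."""
    return audio_sample_rate_hz != SPECS[name].sample_rate_hz


if __name__ == "__main__":
    name = "MERT-v0-public"
    print(approx_feature_shape(name, 5))   # about 5 s * 50 Hz frames, 768 dims
    print(samples_needed(name, 5))         # 5 s at 16 kHz
    print(needs_resampling(name, 44_100))  # CD-quality audio needs resampling
```

For example, a 5-second clip fed to MERT-v0-public corresponds to roughly 250 frames of 768-dimensional features per layer, and audio recorded at 44.1 kHz would have to be resampled to 16 kHz beforehand.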