Commit c31b901 (1 parent: a987a4a), committed by musicaudiopretrain
Update README.md

README.md CHANGED
@@ -4,8 +4,37 @@ inference: false
tags:
- music
---
+# Introduction to our series of work

-
+The development log of our Music Audio Pre-training (m-a-p) model family:
+- 17/03/2023: we released two advanced music understanding models, [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) and [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M), trained with a new paradigm and dataset. They outperform the previous models and generalize better to more tasks.
+- 14/03/2023: we retrained the MERT-v0 model on an open-source-only music dataset and released it as [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public).
+- 29/12/2022: we released [MERT-v0](https://huggingface.co/m-a-p/MERT-v0), a music understanding model trained with the **MLM** paradigm, which performs better on downstream tasks.
+- 29/10/2022: we released [music2vec](https://huggingface.co/m-a-p/music2vec-v1), a pre-trained MIR model trained with the **BYOL** paradigm.
+
+
+
+Here is a table for quick model selection:
+
+| Name | Pre-train Paradigm | Training Data (hours) | Pre-train Context (seconds) | Model Size | Transformer Layer-Dimension | Feature Rate | Sample Rate | Release Date |
+| ------------------------------------------------------------ | ------------------ | -------------------- | ---------------------------- | ---------- | --------------------------- | ------------ | ----------- | ------------ |
+| [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) | MLM | 160K | 5 | 330M | 24-1024 | 75 Hz | 24 kHz | 17/03/2023 |
+| [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) | MLM | 20K | 5 | 95M | 12-768 | 75 Hz | 24 kHz | 17/03/2023 |
+| [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public) | MLM | 900 | 5 | 95M | 12-768 | 50 Hz | 16 kHz | 14/03/2023 |
+| [MERT-v0](https://huggingface.co/m-a-p/MERT-v0) | MLM | 1000 | 5 | 95M | 12-768 | 50 Hz | 16 kHz | 29/12/2022 |
+| [music2vec-v1](https://huggingface.co/m-a-p/music2vec-v1) | BYOL | 1000 | 30 | 95M | 12-768 | 50 Hz | 16 kHz | 30/10/2022 |
+
+## Explanation
+
+The m-a-p models share a similar architecture; the most distinguishing difference is the paradigm used in pre-training. Beyond that, there are several technical configuration details to know before using them:
+
+- **Model Size**: the number of parameters that will be loaded into memory. Please select a size that fits your hardware.
+- **Transformer Layer-Dimension**: the number of transformer layers and the corresponding feature dimension that the model can output. This is called out because features extracted from **different layers can perform differently depending on the task**.
+- **Feature Rate**: the number of features the model outputs for a 1-second audio input.
+- **Sample Rate**: the sampling frequency of the audio the model was trained on.
+
+
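Review note (not part of the committed README): the Model Size, Feature Rate, and Sample Rate definitions can be turned into a quick back-of-the-envelope check. The sketch below is a minimal illustration in plain Python, assuming that sample and frame counts are simply duration times rate and that weights are stored as 4-byte floats. `ModelSpec`, `SPECS`, `nominal_shapes`, and `fp32_weight_megabytes` are hypothetical names introduced here, and the numbers are copied from the model table. A real model may emit slightly fewer frames than the nominal count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """One row of the model table (hypothetical helper type)."""
    name: str
    params_millions: int   # "Model Size" column
    layers: int            # first number in "Transformer Layer-Dimension"
    dim: int               # second number in "Transformer Layer-Dimension"
    feature_rate_hz: int   # "Feature Rate" column
    sample_rate_hz: int    # "Sample Rate" column


# Values copied from the model table; only three rows are shown.
SPECS = {
    "MERT-v1-330M": ModelSpec("MERT-v1-330M", 330, 24, 1024, 75, 24_000),
    "MERT-v1-95M": ModelSpec("MERT-v1-95M", 95, 12, 768, 75, 24_000),
    "MERT-v0-public": ModelSpec("MERT-v0-public", 95, 12, 768, 50, 16_000),
}


def nominal_shapes(spec: ModelSpec, seconds: float) -> dict:
    """Nominal input and output sizes for a clip of the given length.

    Assumption: counts are duration times rate. Edge effects in a real
    feature extractor can make the actual frame count slightly smaller.
    """
    return {
        "input_samples": round(seconds * spec.sample_rate_hz),
        "feature_frames": round(seconds * spec.feature_rate_hz),
        "feature_dim": spec.dim,
    }


def fp32_weight_megabytes(spec: ModelSpec) -> float:
    """Rough weight memory, assuming 4 bytes per parameter (float32)."""
    return spec.params_millions * 1_000_000 * 4 / 1_000_000


if __name__ == "__main__":
    for spec in SPECS.values():
        # 5 seconds matches the "Pre-train Context" column for these rows.
        print(spec.name, nominal_shapes(spec, 5), fp32_weight_megabytes(spec))
```

Under these assumptions, a 5-second clip for MERT-v0-public should be resampled to 16 kHz before it is fed to the model, and the output is nominally 250 feature vectors of dimension 768 per layer. The memory figure covers weights only and ignores activations.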
+# Introduction to MERT-v0-public

**MERT-v0-public** is a completely unsupervised model trained on the **completely non-commercial, open-source** [Music4All](https://sites.google.com/view/contact4music4all) dataset and the part of the [FMA_full](https://github.com/mdeff/fma) dataset that does not carry the tag "experimental".
