# Model card for vit_base_patch16_1024_128.audiomae_as2m_ft_as20k

A Vision Transformer (ViT) for audio, pretrained on AudioSet-2M with the self-supervised Masked Autoencoder (MAE) method and fine-tuned on AudioSet-20k.

This is a port of the AudioMAE ViT-B/16 weights for use with `timm`. The naming convention follows other `timm` ViT models.

See the original repo here: https://github.com/facebookresearch/AudioMAE

For the AudioSet-2M pre-trained checkpoint (without AudioSet-20k fine-tuning), see https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m

## Model Details

- **Model Type:** Audio classification / feature backbone