# Model card for vit_base_patch16_1024_128.audiomae_as2m_ft_as20k

A Vision Transformer (ViT) for audio, pretrained on AudioSet-2M with the self-supervised Masked Autoencoder (MAE) method and fine-tuned on AudioSet-20k.

This is a port of the AudioMAE ViT-B/16 weights for use with `timm`. The naming convention follows other `timm` ViT models.

See the original repo here: https://github.com/facebookresearch/AudioMAE

For the AudioSet-2M pre-trained checkpoint (without AudioSet-20k fine-tuning), see https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m

## Model Details

- **Model Type:** Audio classification / feature backbone