ConvNeXT (trained on XCL from BirdSet)

ConvNeXT trained on the XCL dataset from BirdSet, covering 9,736 bird species from Xeno-Canto. Please refer to the BirdSet paper and the BirdSet repository for further information.

Model Details

ConvNeXT is a pure convolutional model (ConvNet) that is inspired by the design of Vision Transformers and is reported by its authors to outperform them. This checkpoint has roughly 97.5M parameters stored as F32 safetensors.

How to use

The BirdSet data requires a custom processor, which is available in the BirdSet repository; this model does not ship with a processor of its own. The model accepts a mono spectrogram image as input, e.g. a tensor of shape torch.Size([16, 1, 128, 1024]) (batch, channels, mel bins, time frames). The preprocessing settings used during training are listed below; a sketch of the corresponding spectrogram pipeline follows the list.

  • The model is trained on 5-second clips of bird vocalizations.
  • num_channels: 1
  • pretrained checkpoint: facebook/convnext-base-224-22k
  • sampling_rate: 32_000
  • normalize spectrogram: mean: -4.268, std: 4.569 (statistics from ESC-50)
  • spectrogram: n_fft: 1024, hop_length: 320, power: 2.0
  • melscale: n_mels: 128, n_stft: 513
  • dbscale: top_db: 80
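
The following is a minimal preprocessing sketch, assuming the settings above translate directly into torchaudio transforms; it is not the official BirdSet processor, and the preprocess function and the placement of the normalization step are illustrative assumptions.

import torch
import torchaudio

# Transforms built from the settings listed above (assumed torchaudio equivalents)
spectrogram = torchaudio.transforms.Spectrogram(n_fft=1024, hop_length=320, power=2.0)
mel_scale = torchaudio.transforms.MelScale(n_mels=128, sample_rate=32_000, n_stft=513)
db_scale = torchaudio.transforms.AmplitudeToDB(stype="power", top_db=80)

def preprocess(waveform: torch.Tensor) -> torch.Tensor:
    # waveform: mono audio of shape (1, num_samples) at 32 kHz; a 5-second clip has 160_000 samples
    spec = db_scale(mel_scale(spectrogram(waveform)))  # (1, 128, time_frames)
    spec = (spec - (-4.268)) / 4.569                   # normalize with the ESC-50 statistics above
    return spec.unsqueeze(0)                           # (1, 1, 128, time_frames): one spectrogram "image"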
The snippet below loads a BirdSet evaluation subset (HSN) and the pretrained checkpoint:

import torch
from transformers import AutoModelForImageClassification
from datasets import load_dataset

# Load the HSN subset of BirdSet
dataset = load_dataset("DBD-research-group/BirdSet", "HSN")

# Load the pretrained ConvNeXT checkpoint from the Hugging Face Hub
model = AutoModelForImageClassification.from_pretrained("DBD-research-group/ConvNeXT-Base-BirdSet-XCL")
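
Given a batch of preprocessed spectrograms, inference then looks roughly as follows; the random dummy batch and the sigmoid readout are illustrative assumptions (BirdSet poses the task as multi-label classification over the 9,736 species).

model.eval()
# Stand-in for a processed batch: 16 mono spectrograms with 128 mel bins and 1024 time frames
spectrograms = torch.randn(16, 1, 128, 1024)
with torch.no_grad():
    logits = model(spectrograms).logits  # shape (16, 9736): one logit per species
probabilities = torch.sigmoid(logits)    # per-species probabilities for multi-label prediction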

Model Source

Citation
