UrbanSounds8K / article.md

Dataset

Audio files

The audio files are converted to mel spectrograms, which in general perform better than other visual transformations of such audio files.
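To give a feel for what the "mel" in mel spectrogram means, here is a minimal sketch of the mel scale that sets the frequency bands of such a spectrogram. It is not the conversion code used for this classifier. The HTK-style formula, the function names, and the band count and frequency range are all assumptions chosen for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 band-edge frequencies in Hz, evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


# Illustrative values only: 8 bands between 0 Hz and 8 kHz.
edges = mel_band_edges(n_mels=8, f_min=0.0, f_max=8000.0)
for low, high in zip(edges, edges[1:]):
    print(f"{low:8.1f} Hz -> {high:8.1f} Hz (width {high - low:7.1f} Hz)")
```

The printed band widths grow with frequency: the mel scale gives low frequencies finer resolution than high ones, which is roughly how human hearing resolves pitch.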

Training

With Fast.ai, a ResNet34 vision learner trained for three epochs with minimal lines of code approaches 95% accuracy. Validation used 20% of the entire dataset of 8732 labelled sound excerpts from the 10 classes shown above.

epoch  train_loss  valid_loss  accuracy  time
0      1.462791    0.710250    0.775487  01:12
0      0.600056    0.309964    0.892325  00:40
1      0.260431    0.200901    0.945017  00:39
2      0.090158    0.164748    0.950745  00:40
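The 20% validation figure can be made concrete with a little arithmetic. The sketch below assumes a simple random hold-out split. The article does not say how the split was actually made, and the `random_split` helper and its seed are hypothetical.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=42):
    """Return (train_idxs, valid_idxs) for a random hold-out split (hypothetical helper)."""
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    n_valid = int(valid_pct * n_items)  # fractional items are rounded down
    return idxs[n_valid:], idxs[:n_valid]


# 8732 is the dataset size stated in the article.
train_idxs, valid_idxs = random_split(8732, valid_pct=0.2)
print(len(train_idxs), len(valid_idxs))  # prints "6986 1746"
```

Under this assumption, the reported accuracy is measured on about 1746 held-out excerpts, while the remaining 6986 are used for training.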

Classical Approaches

Classical approaches on this dataset as of 2019

State of the Art Approaches

The state-of-the-art methods for audio classification approach this problem as an image classification task. For such image classification problems from audio samples, three common transformation approaches are:

Credits

Thanks to Kurian Benoy and the countless others who generously share code on GitHub for others to follow, or who write blogs that explain these topics online.

Code Repo & Blog

More details are available in my GitHub repo and on my blog, where I will add further notes on this Fast.ai build, audio transforms, and more.