
Dataset

UrbanSound8K

Audio files

The audio files are converted to mel spectrograms, which generally perform better than other visual representations of these audio files.
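For reference, a common (HTK-style) form of the mel scale and of a mel spectrogram is written out below. This is the textbook definition, not a statement of the exact settings used for this model. Other libraries use slightly different mel formulas. Here $X[k, t]$ is the short-time Fourier transform of frame $t$ at frequency bin $k$, and $H_b$ is the $b$-th triangular filter, with filter centers equally spaced in $m$. Equal steps in mel correspond to wider steps in Hz at high frequencies, so the image keeps more detail in the low frequencies.

```latex
m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)

S_{\mathrm{mel}}[b, t] = \sum_{k} H_b[k] \, \lvert X[k, t] \rvert^{2},
\qquad
S_{\mathrm{dB}}[b, t] = 10 \log_{10}\!\left(S_{\mathrm{mel}}[b, t] + \varepsilon\right)
```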

Training

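The classifier was trained with Fast.ai using a ResNet34 vision learner, in just a few lines of code. With 20% of the dataset's 8732 labelled sound excerpts held out for validation, three epochs of fine-tuning reach about 95% validation accuracy across the 10 classes (air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren and street music). The first row of the table below is consistent with fastai's fine_tune, which by default first trains one epoch with the pretrained ResNet34 body frozen. The remaining three rows are the three fine-tuning epochs.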

epoch	train_loss	valid_loss	accuracy	time
0	1.462791	0.710250	0.775487	01:12
0	0.600056	0.309964	0.892325	00:40
1	0.260431	0.200901	0.945017	00:39
2	0.090158	0.164748	0.950745	00:40
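The 20% validation hold-out can be sketched in standard-library Python. This sketch assumes the split follows the rule used by fastai's RandomSplitter: shuffle the items, then hold out int(valid_pct * n) of them. The seed is a hypothetical choice, so the clips selected will differ from the original run. Converting the accuracies reported in the training log back to counts yields whole numbers of a 1746-clip validation set, which is consistent with that assumption.

```python
import random

N_CLIPS = 8732    # labelled excerpts in UrbanSound8K
VALID_PCT = 0.2   # 20% validation hold-out


def random_split(n, valid_pct, seed=42):
    """Shuffle indices 0..n-1 and hold out int(valid_pct * n) for validation.

    Assumption: mirrors the cut rule of fastai's RandomSplitter; the seed is hypothetical.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(valid_pct * n)
    return idx[cut:], idx[:cut]  # (train, valid)


train_idx, valid_idx = random_split(N_CLIPS, VALID_PCT)
print(len(train_idx), len(valid_idx))  # 6986 1746

# Accuracies from the training log above, converted to correctly classified clips.
reported = [0.775487, 0.892325, 0.945017, 0.950745]
correct = [round(acc * len(valid_idx)) for acc in reported]
print(correct)  # [1354, 1558, 1650, 1660]
```

Each count divided by 1746 rounds back to the reported accuracy. For example, 1660 / 1746 ≈ 0.950745 in the final epoch.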

Classical Approaches

Classical approaches on this dataset as of 2019

State of the Art Approaches

Many state-of-the-art audio classification methods treat the problem as an image classification task. Three common ways to transform audio samples into images for such a classifier are:

Linear Spectrograms
Log Spectrograms
Mel Spectrograms
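The three representations can be sketched in standard-library Python. This is an illustration under stated assumptions, not the pipeline used for this model. It uses a naive DFT for readability, a Hann window, the HTK mel formula and unnormalized triangular filters. The n_fft, hop and n_mels values are hypothetical. Libraries such as librosa use a fast FFT and, by default, a slightly different (Slaney) mel scale.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (assumed convention): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_frames(signal, n_fft, hop):
    """|DFT|^2 of Hann-windowed frames; naive O(n_fft^2) DFT for readability."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        x = [signal[start + n] * window[n] for n in range(n_fft)]
        frame = []
        for k in range(n_fft // 2 + 1):  # non-negative frequency bins only
            s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            frame.append(abs(s) ** 2)
        frames.append(frame)
    return frames  # n_frames x (n_fft // 2 + 1)


def mel_filterbank(n_mels, n_fft, sr):
    """Unnormalized triangular filters, equally spaced in mel from 0 Hz to sr / 2."""
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for left, center, right in zip(edges, edges[1:], edges[2:]):
        row = []
        for f in bin_hz:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank  # n_mels x (n_fft // 2 + 1)


def spectrograms(signal, sr, n_fft=256, hop=128, n_mels=20):
    """Return (linear, log, mel) spectrograms, each a list of frames."""
    power = power_frames(signal, n_fft, hop)
    linear = [[math.sqrt(p) for p in frame] for frame in power]                # magnitude
    log_spec = [[10 * math.log10(p + 1e-10) for p in frame] for frame in power]  # power in dB
    bank = mel_filterbank(n_mels, n_fft, sr)
    mel = [[sum(w * p for w, p in zip(row, frame)) for row in bank] for frame in power]
    return linear, log_spec, mel


# Usage: a hypothetical 1 kHz test tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(2048)]
linear_spec, log_spec, mel_spec = spectrograms(tone, sr)
peak_bin = max(range(len(linear_spec[0])), key=lambda k: linear_spec[0][k])
peak_band = max(range(len(mel_spec[0])), key=lambda b: mel_spec[0][b])
print(len(linear_spec), len(linear_spec[0]), len(mel_spec[0]))  # 15 129 20
print(peak_bin, peak_bin * sr / 256)                            # 32 1000.0
```

For the 1 kHz tone, the linear spectrogram peaks in DFT bin 32 (32 × 8000 / 256 = 1000 Hz). The mel spectrogram peaks in the band whose center lies nearest 1 kHz. In practice the mel spectrogram is usually also converted to decibels before it is saved as an image.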

Credits

Thanks to Kurian Benoy and the countless others who generously share code on GitHub or write blog posts explaining these topics.

Code Repo & Blog

More details are in my GitHub repo and on my blog, where I will write further about this fast.ai build, audio transforms and more.