Dataset
Audio files
The audio files are converted to mel spectrograms, which generally work better than other visual representations of such audio.
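A minimal NumPy-only sketch of that conversion, assuming a mono signal already loaded as a float array. The frame size, hop length, number of mel bands, and the HTK-style mel formula are illustrative assumptions, not the settings used for this project; in practice an audio library would normally handle this step.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mel_spectrogram_db(y, sr, n_fft=1024, hop=256, n_mels=64):
    """Return a log-scaled mel spectrogram of shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # linear power spectrogram
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T      # pool FFT bins into mel bands
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T       # convert power to decibels

if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # synthetic stand-in for a real audio clip
    spec = mel_spectrogram_db(tone, sr)
    print(spec.shape)
```

The resulting 2-D array can be saved as an image, which is what lets an image classifier be trained on audio.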
Training
Fast.ai was used to train this classifier: a ResNet34 vision learner, trained for three epochs with minimal lines of code, approaches 95% accuracy. Validation uses a 20% hold-out from the entire dataset of 8732 labelled sound excerpts across 10 classes.
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 1.462791 | 0.710250 | 0.775487 | 01:12 |
0 | 0.600056 | 0.309964 | 0.892325 | 00:40 |
1 | 0.260431 | 0.200901 | 0.945017 | 00:39 |
2 | 0.090158 | 0.164748 | 0.950745 | 00:40 |
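As a rough sketch of what such a run might look like, the block below assumes the spectrograms were saved as images under a hypothetical `spectrograms/` folder with one subfolder per class. The folder name, `seed`, and `Resize(224)` are assumptions, not details from this project. With fastai's defaults, `fine_tune(3)` first trains one epoch with the pretrained body frozen and then three epochs unfrozen, which would be consistent with the four rows in the table above. The two small helpers check the arithmetic behind the reported numbers.

```python
from pathlib import Path

def validation_size(n_items: int, valid_pct: float = 0.2) -> int:
    """Items held out by a random split that truncates valid_pct * n_items."""
    return int(valid_pct * n_items)

def correct_predictions(accuracy: float, n_valid: int) -> int:
    """Number of correctly classified validation items implied by an accuracy."""
    return round(accuracy * n_valid)

def train(image_dir: Path, epochs: int = 3):
    """Hypothetical training run; not the author's actual script."""
    # Imported lazily so the helpers above work without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders, Resize, accuracy, resnet34, vision_learner,
    )
    dls = ImageDataLoaders.from_folder(
        image_dir, valid_pct=0.2, seed=42, item_tfms=Resize(224)  # assumed settings
    )
    learn = vision_learner(dls, resnet34, metrics=accuracy)
    learn.fine_tune(epochs)  # one frozen epoch, then `epochs` unfrozen epochs
    return learn

if __name__ == "__main__":
    n_valid = validation_size(8732)
    print("validation items:", n_valid)
    for acc in (0.775487, 0.892325, 0.945017, 0.950745):
        print(acc, "->", correct_predictions(acc, n_valid), "correct")
    # train(Path("spectrograms"))  # uncomment once the image folder exists
```

A 20% hold-out of 8732 items is 1746 items, so the final accuracy of 0.950745 corresponds to 1660 correct predictions out of 1746.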
Classical Approaches
Classical approaches on this dataset as of 2019
State of the Art Approaches
The state-of-the-art methods for audio classification approach this problem as an image classification task. For such image classification problems from audio samples, three common transformation approaches are:
- Linear Spectrograms
- Log Spectrograms
- Mel Spectrograms
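To illustrate how the three transforms differ, here is a small sketch on a synthetic signal: a loud 500 Hz tone plus a 4000 Hz tone at one hundredth of its amplitude. The sample rate, frame size, and the HTK-style mel formula are assumptions chosen for illustration. A linear spectrogram keeps raw power, a log spectrogram compresses the dynamic range into decibels, and a mel spectrogram additionally warps the frequency axis so that high frequencies are squeezed together.

```python
import numpy as np

def power_spectrum(y, n_fft=2048):
    """Power of one Hann-windowed frame: a single column of a linear spectrogram."""
    frame = y[:n_fft] * np.hanning(n_fft)
    return np.abs(np.fft.rfft(frame)) ** 2

def to_db(power, floor=1e-10):
    """Log scaling: convert power to decibels, clamping to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(power, floor))

def hz_to_mel(f):
    """Mel scaling of the frequency axis (HTK-style formula, an assumption)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

sr, n_fft = 16000, 2048
t = np.arange(n_fft) / sr
y = np.sin(2 * np.pi * 500 * t) + 0.01 * np.sin(2 * np.pi * 4000 * t)

freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
loud_bin = int(np.argmin(np.abs(freqs - 500)))
quiet_bin = int(np.argmin(np.abs(freqs - 4000)))

linear = power_spectrum(y)
log = to_db(linear)

# Linear: the quiet tone has about 1/10000 of the loud tone's power.
print("linear power ratio:", linear[quiet_bin] / linear[loud_bin])
# Log: the same gap becomes a difference of about 40 dB.
print("dB difference:", log[loud_bin] - log[quiet_bin])
# Mel: the 8x gap in Hz shrinks on the mel axis.
print("mel ratio:", hz_to_mel(4000.0) / hz_to_mel(500.0))
```

In a linear-power image the quiet tone would be nearly invisible, while the log and mel versions keep it visible to an image classifier.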
Credits
Thanks to Kurian Benoy and the countless others who generously share code on GitHub or write blogs that explain these topics online.
Code Repo & Blog
More details are available on my GitHub repo and my blog, where I will add notes on this Fast.ai build, audio transforms and more.