Audio classification using spectrograms and transfer learning
Project 1 - Research Methodology and Scientific Writing (EQ2425) - Federico Favia & Mayank Gulati, December 2019, KTH, Stockholm
This Python project, developed for the Research Methodology and Scientific Writing course at KTH, is a practical and theoretical study of audio classification of musical instruments from the NSynth audio dataset. It uses Convolutional Neural Networks (CNNs) and converts the .wav audio files into spectrogram images. The open-source Matlab library VLFeat is used for SIFT features. We build on state-of-the-art image classification architectures from the ImageNet challenge, namely ResNet18, AlexNet, and GoogLeNet, and apply transfer learning by adding custom layers at the end of these deep neural networks. See the report for more information.
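The audio-to-spectrogram step can be illustrated with a minimal sketch. This is not the project's own code: the function name `log_spectrogram`, the window length `n_fft=512`, the hop size `hop=128`, and the 16 kHz sample rate are all assumptions chosen for illustration, and a synthetic 440 Hz tone stands in for a real .wav file.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, n_fft=512, hop=128):
    """Return (freqs, times, spec_db) for a 1-D signal.

    Hypothetical helper: a plain short-time Fourier transform with a
    Hann window, converted to a log-magnitude (dB) scale.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pad very short signals so that at least one full frame exists.
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))

    window = np.hanning(n_fft)
    n_frames = 1 + (signal.size - n_fft) // hop

    # Slice overlapping frames and apply the window to each one.
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )

    # rfft keeps only the non-negative frequencies: n_fft // 2 + 1 bins.
    # Transpose so that rows are frequency bins and columns are time frames.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    spec_db = 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)

    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + n_fft / 2) / sample_rate
    return freqs, times, spec_db


# Synthetic stand-in for an audio clip: one second of a 440 Hz sine tone.
sample_rate = 16000  # assumed sample rate, for illustration only
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

freqs, times, spec_db = log_spectrogram(tone, sample_rate)

# The frequency bin with the highest average energy should sit near 440 Hz.
peak_hz = freqs[spec_db.mean(axis=1).argmax()]
```

For a real recording, the samples could be loaded with something like `scipy.io.wavfile.read`, and the resulting `spec_db` array saved as an image (for instance with `matplotlib.pyplot.imsave`) before being passed to an image classifier. Both choices are suggestions, not a description of the project's pipeline.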
Below you can see the spectrograms used: