Audio - Vision

Implementations and reviews of audio and computer vision papers in Python.

Most of our code uses the Keras_aud library.

Implementations

[1] Deep Neural Network Baseline for DCASE Challenge 2016 [Paper] [Code]

[2] CQT-Based Convolutional Neural Networks for Audio Scene Classification and Domestic Audio Tagging [Paper] [Code]

[3] A convolutional neural network approach for acoustic scene classification [Paper] [Code]

[4] Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection [Paper] [Code]

[5] FrameCNN: A Weakly-Supervised Learning Framework for Frame-Wise Acoustic Event Detection and Classification [Paper] [Code]

[6] Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging [Paper] [Code]

[7] Exploring Models and Data for Image Question Answering [Paper] [Code]

[8] Stacked Attention Networks for Image Question Answering [Paper] [Code]

[9] Sequence to Sequence Autoencoders for Unsupervised Representation Learning From Audio [Paper] [Code]

Applications

Applications are deployed on Heroku using Flask, with Python and JavaScript.

[1] Digit classifier [Implementation] [Application]

[2] MNIST Random Digit Regenerator [Paper] [Implementation] [Application]

Feedback

All kinds of feedback (code style, bugs, comments, etc.) are welcome. Please open an issue on this repository.

Contribution Guidelines

If you are familiar with the basics of contributing to GitHub repositories, feel free to skip this section. If you are a beginner, take a look at the blog post before contributing.

Team Roles

Aditya Arora: code environments, structuring, dynamic description, logic, feature logistics, documentation.

Akshita Gupta: paper selection, understanding theoretical concepts, model descriptions.

Upcoming Uploads

[1] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [Paper]