Convolutional Neural Network (CNN) functioning as a visual feature extractor and trained using raw speech information.