Scikit-learn and PyTorch solutions for predicting the intonation and content of spoken audio samples in the MLEnd dataset