Project author: SejongUni-Lecture

Project description:
🔚 Digital Sound lecture module 2: Speaker Recognition
Primary language: Python
Project address: git://github.com/SejongUni-Lecture/Digital_Sound_module2.git
Created: 2019-11-12T07:20:11Z
Project community: https://github.com/SejongUni-Lecture/Digital_Sound_module2

License: MIT License



Digital Sound module2

Introduction

This project is module #2 of the Digital Sound lecture: Speaker Recognition

Learning Objective

Implement the speaker recognition technology used in AI speakers

Content

Topic

Recognizing speaker’s voice

Principle & Theory

Distinguishing speakers' voice patterns, given that every person has different voice features

Problem

We want an AI speaker to provide a personalized service for each family member by using speaker recognition

Setting data

  • Collect audio data from 12 students. Each student reads the given script and records it with CoolEdit (WAV, 16000 Hz, 16-bit, mono).
  • After collecting the data, use it as training data
  • All data will be shared with classmates

All data used for training the models will NOT be opened to the public due to privacy issues
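Since every recording must follow the same spec, it can help to validate the collected files before training. A minimal sketch using only Python's standard wave module (the file name sample.wav is a placeholder):

```python
import wave

def check_recording(path):
    """Verify a collected WAV matches the course spec:
    16000 Hz sample rate, 16-bit (2-byte) samples, mono."""
    with wave.open(path, 'rb') as w:
        return (w.getframerate() == 16000
                and w.getsampwidth() == 2
                and w.getnchannels() == 1)

# Demo: write one second of silence in the required format, then check it.
with wave.open('sample.wav', 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b'\x00\x00' * 16000)

print(check_recording('sample.wav'))   # True
```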


Solution

I looked up the code from this blog and applied it to my project.
The image below is a flow chart of how the code works overall.



Code Description

train_model.py

  • Input: folder name of the training data (the folder has to exist in the executing directory)
  • Output: speaker_name.gmm
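The core of train_model.py is fitting one Gaussian mixture per speaker on that speaker's pooled MFCC frames and pickling it. A minimal sketch with sklearn; the function name and n_components=16 are assumptions, not the project's exact code:

```python
import pickle
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_model(features, speaker_name, n_components=16):
    """Fit a diagonal-covariance GMM on an (n_frames, n_ceps) array of
    MFCC vectors pooled from one speaker's training WAVs, and save it
    as <speaker_name>.gmm (a pickle), matching the documented output."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='diag', n_init=3)
    gmm.fit(features)
    with open(f'{speaker_name}.gmm', 'wb') as f:
        pickle.dump(gmm, f)
    return gmm
```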

test_model.py

  • Input: total number of test data and the folder name of the test data (the folder has to exist in the executing directory)
  • Output: identified speaker names
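Identification amounts to scoring the test features against every stored model and picking the best one. A sketch of that decision rule (the dict-based interface is an assumption; the real script loads the models from the *.gmm pickles):

```python
def identify_speaker(features, models):
    """Return the name of the speaker whose GMM assigns the highest
    average log-likelihood to the test features.

    features: (n_frames, n_ceps) MFCC array for one test utterance.
    models:   dict mapping speaker name -> fitted GaussianMixture.
    """
    scores = {name: gmm.score(features) for name, gmm in models.items()}
    return max(scores, key=scores.get)
```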

show_all_graphs.py
Plots Mel spectrogram, MFCC, and GMM graphs.
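The actual script plots the mel spectrogram, MFCC, and GMM figures for the real recordings; as a stand-in, here is a minimal headless plotting sketch using scipy and matplotlib on synthetic audio (the file name spectrogram.png and all signal parameters are placeholders):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')          # headless backend, so no display is needed
import matplotlib.pyplot as plt
from scipy import signal

# Hypothetical stand-in for a loaded WAV: one second of noise at 16 kHz.
sr = 16000
audio = np.random.default_rng(0).standard_normal(sr)

# 25 ms windows with 15 ms overlap, matching the recording sample rate.
f, t, sxx = signal.spectrogram(audio, fs=sr, nperseg=400, noverlap=240)

fig, ax = plt.subplots(figsize=(6, 3))
ax.pcolormesh(t, f, 10 * np.log10(sxx + 1e-10), shading='auto')
ax.set(title='Spectrogram', xlabel='Time [s]', ylabel='Frequency [Hz]')
fig.savefig('spectrogram.png')
```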






Minor issues

  • IO library problems: Occasionally the io library has trouble reading WAV files. Before inputting a WAV, strip the implicit metadata from the file; using ffmpeg is one way.
  • Unicode problems: Try adding "utf-8" as the encoding when reading WAV files.
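Besides ffmpeg, the metadata can also be stripped in pure Python, since a WAV is just a sequence of RIFF chunks and only the 'fmt ' and 'data' chunks are needed for playback. A sketch (the helper name and the demo bytes are illustrative, not from the project):

```python
import io
import struct
import wave

def strip_wav_metadata(wav_bytes):
    """Rebuild a RIFF/WAVE file keeping only the 'fmt ' and 'data'
    chunks, dropping metadata chunks (LIST/INFO etc.) that some
    readers choke on. A pure-Python alternative to ffmpeg."""
    assert wav_bytes[:4] == b'RIFF' and wav_bytes[8:12] == b'WAVE'
    pos, kept = 12, b''
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        size = struct.unpack('<I', wav_bytes[pos + 4:pos + 8])[0]
        if chunk_id in (b'fmt ', b'data'):
            kept += wav_bytes[pos:pos + 8 + size]
            if size % 2:                      # chunks are word-aligned
                kept += b'\x00'
        pos += 8 + size + (size % 2)
    return b'RIFF' + struct.pack('<I', 4 + len(kept)) + b'WAVE' + kept

# Demo: a WAV with a LIST/INFO metadata chunk between fmt and data.
fmt = struct.pack('<HHIIHH', 1, 1, 16000, 32000, 2, 16)   # PCM mono 16 kHz 16-bit
samples = struct.pack('<4h', 0, 1000, -1000, 0)
info = b'INFOICMT' + struct.pack('<I', 4) + b'test'
body = (b'fmt ' + struct.pack('<I', len(fmt)) + fmt
        + b'LIST' + struct.pack('<I', len(info)) + info
        + b'data' + struct.pack('<I', len(samples)) + samples)
dirty = b'RIFF' + struct.pack('<I', 4 + len(body)) + b'WAVE' + body

clean = strip_wav_metadata(dirty)
with wave.open(io.BytesIO(clean)) as w:
    print(w.getframerate(), w.getnframes())   # 16000 4
```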

Library

Mainly used sklearn and speakerfeatures with Python