Project author: SejongUni-Lecture

Project description:
🔚 Digital Sound lecture module 2: Speaker Recognition
Primary language: Python
Project address: git://github.com/SejongUni-Lecture/Digital_Sound_module2.git
Created: 2019-11-12T07:20:11Z
Project community: https://github.com/SejongUni-Lecture/Digital_Sound_module2

License: MIT License



Digital Sound module2

Introduction

This project is module #2 of the Digital Sound lecture: Speaker Recognition

Learning Objective

Implement the speaker recognition technology used in AI speakers

Content

Topic

Recognizing speaker’s voice

Principle & Theory

Distinguishing speakers' voice patterns, given that every person has different voice features

Problem

We want an AI speaker to provide a personalized service for each family member by using speaker recognition

Setting data

  • Collect audio data from 12 students. Each student reads the given script and records it with CoolEdit (WAV, 16000 Hz, 16-bit, mono).
  • After collecting the data, use it as training data
  • All data will be shared with classmates

All data used for training the models will NOT be opened to the public due to privacy issues
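Since every recording must follow the same spec, it can help to validate the collected files before training. A minimal sketch using only Python's standard wave module (the file name sample.wav is a placeholder):

```python
import wave

def check_recording(path):
    """Verify a collected WAV matches the course spec:
    16000 Hz sample rate, 16-bit (2-byte) samples, mono."""
    with wave.open(path, 'rb') as w:
        return (w.getframerate() == 16000
                and w.getsampwidth() == 2
                and w.getnchannels() == 1)

# Demo: write one second of silence in the required format, then check it.
with wave.open('sample.wav', 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b'\x00\x00' * 16000)

print(check_recording('sample.wav'))   # True
```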


Solution

I looked up the code from this blog and applied it to my project.
The image below is a flow chart of how the code works overall.



Code Description

train_model.py

  • Input: folder name of the training data (the folder has to exist in the executing directory)
  • Output: speaker_name.gmm
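The core of train_model.py is fitting one Gaussian mixture per speaker on that speaker's pooled MFCC frames and pickling it. A minimal sketch with sklearn; the function name and n_components=16 are assumptions, not the project's exact code:

```python
import pickle
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_model(features, speaker_name, n_components=16):
    """Fit a diagonal-covariance GMM on an (n_frames, n_ceps) array of
    MFCC vectors pooled from one speaker's training WAVs, and save it
    as <speaker_name>.gmm (a pickle), matching the documented output."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type='diag', n_init=3)
    gmm.fit(features)
    with open(f'{speaker_name}.gmm', 'wb') as f:
        pickle.dump(gmm, f)
    return gmm
```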

test_model.py

  • Input: total number of test data and the folder name of the test data (the folder has to exist in the executing directory)
  • Output: identified speaker names
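Identification amounts to scoring the test features against every stored model and picking the best one. A sketch of that decision rule (the dict-based interface is an assumption; the real script loads the models from the *.gmm pickles):

```python
def identify_speaker(features, models):
    """Return the name of the speaker whose GMM assigns the highest
    average log-likelihood to the test features.

    features: (n_frames, n_ceps) MFCC array for one test utterance.
    models:   dict mapping speaker name -> fitted GaussianMixture.
    """
    scores = {name: gmm.score(features) for name, gmm in models.items()}
    return max(scores, key=scores.get)
```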

show_all_graphs.py
Plots Mel spectrogram, MFCC, and GMM graphs.
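The actual script plots the mel spectrogram, MFCC, and GMM figures for the real recordings; as a stand-in, here is a minimal headless plotting sketch using scipy and matplotlib on synthetic audio (the file name spectrogram.png and all signal parameters are placeholders):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')          # headless backend, so no display is needed
import matplotlib.pyplot as plt
from scipy import signal

# Hypothetical stand-in for a loaded WAV: one second of noise at 16 kHz.
sr = 16000
audio = np.random.default_rng(0).standard_normal(sr)

# 25 ms windows with 15 ms overlap, matching the recording sample rate.
f, t, sxx = signal.spectrogram(audio, fs=sr, nperseg=400, noverlap=240)

fig, ax = plt.subplots(figsize=(6, 3))
ax.pcolormesh(t, f, 10 * np.log10(sxx + 1e-10), shading='auto')
ax.set(title='Spectrogram', xlabel='Time [s]', ylabel='Frequency [Hz]')
fig.savefig('spectrogram.png')
```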






Minor issues

  • IO library problems: Occasionally the io library has trouble reading WAV files. Before inputting a WAV, strip the implicit metadata from the file; using ffmpeg is one way.
  • Unicode problems: Try adding "utf-8" as the encoding when reading WAV files.
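Besides ffmpeg, the metadata can also be stripped in pure Python, since a WAV is just a sequence of RIFF chunks and only the 'fmt ' and 'data' chunks are needed for playback. A sketch (the helper name and the demo bytes are illustrative, not from the project):

```python
import io
import struct
import wave

def strip_wav_metadata(wav_bytes):
    """Rebuild a RIFF/WAVE file keeping only the 'fmt ' and 'data'
    chunks, dropping metadata chunks (LIST/INFO etc.) that some
    readers choke on. A pure-Python alternative to ffmpeg."""
    assert wav_bytes[:4] == b'RIFF' and wav_bytes[8:12] == b'WAVE'
    pos, kept = 12, b''
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        size = struct.unpack('<I', wav_bytes[pos + 4:pos + 8])[0]
        if chunk_id in (b'fmt ', b'data'):
            kept += wav_bytes[pos:pos + 8 + size]
            if size % 2:                      # chunks are word-aligned
                kept += b'\x00'
        pos += 8 + size + (size % 2)
    return b'RIFF' + struct.pack('<I', 4 + len(kept)) + b'WAVE' + kept

# Demo: a WAV with a LIST/INFO metadata chunk between fmt and data.
fmt = struct.pack('<HHIIHH', 1, 1, 16000, 32000, 2, 16)   # PCM mono 16 kHz 16-bit
samples = struct.pack('<4h', 0, 1000, -1000, 0)
info = b'INFOICMT' + struct.pack('<I', 4) + b'test'
body = (b'fmt ' + struct.pack('<I', len(fmt)) + fmt
        + b'LIST' + struct.pack('<I', len(info)) + info
        + b'data' + struct.pack('<I', len(samples)) + samples)
dirty = b'RIFF' + struct.pack('<I', 4 + len(body)) + b'WAVE' + body

clean = strip_wav_metadata(dirty)
with wave.open(io.BytesIO(clean)) as w:
    print(w.getframerate(), w.getnframes())   # 16000 4
```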

Library

Mainly used sklearn and speakerfeatures with Python