Project author: manojpamk

Project description:
Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196

Language: Python

Project URL: git://github.com/manojpamk/pytorch_xvectors.git

Created: 2020-03-25T06:34:48Z

Project community: https://github.com/manojpamk/pytorch_xvectors

License: MIT License

Deep speaker embeddings in PyTorch" class="reference-link">
Deep speaker embeddings in PyTorch

This repository contains code and models for training an x-vector speaker recognition model, using Kaldi for feature preparation and PyTorch for DNN model training. The MFCC feature configuration and TDNN model architecture follow the Voxceleb recipe in Kaldi (commit hash 9b4dc93c9). The training procedure, including the optimizer and step counts, is similar to, but not identical to, Kaldi's.
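For orientation, the sketch below shows a frame-level TDNN stack with statistics pooling in the style of the Kaldi Voxceleb x-vector architecture. It illustrates the layer layout only and is not the repository's exact model definition; the feature dimension and speaker count are placeholders.

```python
import torch
import torch.nn as nn

class TDNNSketch(nn.Module):
    """Illustrative x-vector style TDNN: frame-level dilated convolutions,
    statistics pooling, then segment-level embedding layers."""
    def __init__(self, feat_dim=30, num_spkrs=7323):  # placeholder sizes
        super().__init__()
        # Frame-level layers; kernel sizes and dilations mirror the
        # temporal contexts of the Kaldi Voxceleb recipe.
        self.frame = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, dilation=1), nn.ReLU(), nn.BatchNorm1d(512),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(), nn.BatchNorm1d(512),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(), nn.BatchNorm1d(512),
            nn.Conv1d(512, 512, kernel_size=1), nn.ReLU(), nn.BatchNorm1d(512),
            nn.Conv1d(512, 1500, kernel_size=1), nn.ReLU(), nn.BatchNorm1d(1500),
        )
        self.fc1 = nn.Linear(3000, 512)   # embedding ("x-vector") layer
        self.fc2 = nn.Linear(512, 512)
        self.out = nn.Linear(512, num_spkrs)

    def forward(self, x):                 # x: (batch, feat_dim, frames)
        x = self.frame(x)
        # Statistics pooling: concatenate mean and std over time.
        stats = torch.cat([x.mean(dim=2), x.std(dim=2)], dim=1)
        emb = self.fc1(stats)
        return self.out(torch.relu(self.fc2(torch.relu(emb))))
```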

Additionally, code for training meta-learning embeddings is available in train_proto.py and train_relation.py. An overview of these models is given at https://arxiv.org/abs/2007.16196 and in the figure below:

Figure: Overview of the meta-learning models
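As background on the prototypical variant: each training episode averages support embeddings into per-class prototypes and classifies query embeddings by their distance to those prototypes. A minimal sketch of the episode loss, assuming pre-computed embeddings (not the exact train_proto.py implementation):

```python
import torch
import torch.nn.functional as F

def prototypical_loss(support, query, labels):
    """support: (num_classes, shots, dim) embeddings per episode class;
    query: (num_query, dim) embeddings; labels: (num_query,) class indices."""
    prototypes = support.mean(dim=1)           # (num_classes, dim) class centroids
    dists = torch.cdist(query, prototypes)     # Euclidean distance to each prototype
    return F.cross_entropy(-dists, labels)     # nearest prototype should win
```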

Citation

If you find this toolkit useful in your research, please consider citing the following:

```
@misc{kumar2020designing,
      title={Designing Neural Speaker Embeddings with Meta Learning},
      author={Manoj Kumar and Tae Jin-Park and Somer Bishop and Catherine Lord and Shrikanth Narayanan},
      year={2020},
      eprint={2007.16196},
      archivePrefix={arXiv}
}
```

Requirements:

Python Libraries:

```
python==3.6.10
torch==1.4.0
kaldiio==2.15.1
kaldi-python-io==1.0.4
```
Other Tools:
  • Spectral clustering using normalized maximum eigengap (GitHub: https://github.com/tango4j/Auto-Tuning-Spectral-Clustering)
    • Used for speaker clustering during diarization
  • Diarization scoring tool (GitHub)
    • Used for computing the diarization error rate (DER)

Installation:

  • Install the python libraries listed in Requirements
  • Install Kaldi toolkit.
    • This repository is tested with commit hash 9b4dc93c9 of the above Kaldi repository.
    • Installing Kaldi in $HOME/kaldi is recommended.
  • Download this repository. NOTE: the destination need not be inside the Kaldi installation.
  • Set the voxcelebDir variable inside pytorch_run.sh.
  • (Optional) Install the Other Tools listed in Requirements.

Data preparation

Training data preparation

  • Training features are expected in Kaldi nnet3 egs format, and read using the nnet3EgsDL class defined in train_utils.py.
  • The Voxceleb recipe in pytorch_run.sh prepares them.
  • Extracted embeddings are written in Kaldi vector format, similar to xvector.ark.
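Once written, the archives can be inspected with the kaldiio library from Requirements. A minimal sketch; the xvector.ark path below is a placeholder for wherever your run writes embeddings:

```python
import kaldiio

# Iterate over (utterance-id, embedding) pairs in a Kaldi archive.
for utt, xvec in kaldiio.load_ark('xvectors/xvec_preTrained/test/xvector.ark'):
    print(utt, xvec.shape)  # one numpy vector per utterance
```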

Dataset for data augmentation

The pytorch_run.sh script augments the training data using the following two datasets:

  • Download MUSAN and extract to ./musan.
  • Download RIRS_NOISES and extract to ./RIRS_NOISES.

Training

```
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 train_xent.py <egsDir>
```

```
usage: train_xent.py [-h] [--local_rank LOCAL_RANK] [-modelType MODELTYPE]
                     [-featDim FEATDIM] [-resumeTraining RESUMETRAINING]
                     [-resumeModelDir RESUMEMODELDIR]
                     [-numArchives NUMARCHIVES] [-numSpkrs NUMSPKRS]
                     [-logStepSize LOGSTEPSIZE] [-batchSize BATCHSIZE]
                     [-numEgsPerArk NUMEGSPERARK]
                     [-preFetchRatio PREFETCHRATIO]
                     [-optimMomentum OPTIMMOMENTUM] [-baseLR BASELR]
                     [-maxLR MAXLR] [-numEpochs NUMEPOCHS]
                     [-noiseEps NOISEEPS] [-pDropMax PDROPMAX]
                     [-stepFrac STEPFRAC]
                     egsDir

positional arguments:
  egsDir                Directory with training archives

optional arguments:
  -h, --help            show this help message and exit
  --local_rank LOCAL_RANK
  -modelType MODELTYPE  Refer train_utils.py
  -featDim FEATDIM      Frame-level feature dimension
  -resumeTraining RESUMETRAINING
                        (1) Resume training, or (0) Train from scratch
  -resumeModelDir RESUMEMODELDIR
                        Path containing training checkpoints
  -numArchives NUMARCHIVES
                        Number of egs.*.ark files
  -numSpkrs NUMSPKRS    Number of output labels
  -logStepSize LOGSTEPSIZE
                        Iterations per log
  -batchSize BATCHSIZE  Batch size
  -numEgsPerArk NUMEGSPERARK
                        Number of training examples per egs file
  -preFetchRatio PREFETCHRATIO
                        xbatchSize to fetch from dataloader
  -optimMomentum OPTIMMOMENTUM
                        Optimizer momentum
  -baseLR BASELR        Initial LR
  -maxLR MAXLR          Maximum LR
  -numEpochs NUMEPOCHS  Number of training epochs
  -noiseEps NOISEEPS    Noise strength before pooling
  -pDropMax PDROPMAX    Maximum dropout probability
  -stepFrac STEPFRAC    Training iteration when dropout = pDropMax
```

egsDir contains the nnet3 egs files.
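The baseLR/maxLR pair suggests a learning rate that sweeps between a base and a maximum value over training. One way to realize such a schedule in PyTorch is torch.optim.lr_scheduler.CyclicLR, sketched below; this illustrates the idea and is not necessarily the exact schedule used in train_xent.py:

```python
import torch

model = torch.nn.Linear(512, 7323)   # stand-in for the actual TDNN model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.5)

# Sweep the learning rate between baseLR and maxLR across training steps.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-3, max_lr=2e-3, step_size_up=1000)

for step in range(10):    # in a real loop, compute loss and call backward() first
    optimizer.step()
    scheduler.step()
```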

Embedding extraction

```
usage: extract.py [-h] [-modelType MODELTYPE] [-numSpkrs NUMSPKRS]
                  modelDirectory featDir embeddingDir

positional arguments:
  modelDirectory        Directory containing the model checkpoints
  featDir               Directory containing features ready for extraction
  embeddingDir          Output directory

optional arguments:
  -h, --help            show this help message and exit
  -modelType MODELTYPE  Refer train_utils.py
  -numSpkrs NUMSPKRS    Number of output labels for model
```
The script pytorch_run.sh can be used to train embeddings with the Voxceleb recipe end to end.

Pretrained model

Downloading

Two ways to download the pre-trained model:

  1. Google Drive (file ID 1gbAWDdWN_pkOim4rWVXUlfuYjfyJqUHZ, as used in the command below), or
  2. Command line (reference: https://medium.com/@acpanjan/download-google-drive-files-using-wget-3c2c025a8b99):

```
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1gbAWDdWN_pkOim4rWVXUlfuYjfyJqUHZ' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1gbAWDdWN_pkOim4rWVXUlfuYjfyJqUHZ" -O preTrainedModel.zip && rm -rf /tmp/cookies.txt
```

Speaker Verification

To reproduce the Voxceleb EER results with the pretrained model, follow the steps below.
NOTE: The voxceleb features must be prepared using prepare_feats_for_egs.sh prior to evaluation.

1) Extract models/ and xvectors/ from the pre-trained archive into the installation directory
2) Set the following variables in pytorch_run.sh:

```
modelDir=models/xvec_preTrained
trainFeatDir=data/train_combined_no_sil
trainXvecDir=xvectors/xvec_preTrained/train
testFeatDir=data/voxceleb1_test_no_sil
testXvecDir=xvectors/xvec_preTrained/test
```

3) Extract embeddings and compute the EER and minDCF. Set stage=7 in pytorch_run.sh and execute:

```
bash pytorch_run.sh
```

4) Alternatively, a pretrained PLDA model is available inside the xvectors/train directory. Set stage=9 in pytorch_run.sh and execute:

```
bash pytorch_run.sh
```
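For intuition, the EER reported below is the operating point where the false-accept and false-reject rates cross. A minimal sketch of computing it from raw trial scores with scikit-learn (illustration only; the recipe itself scores trials with Kaldi's PLDA tooling):

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(scores, labels):
    """scores: one similarity score per trial; labels: 1 = same speaker, 0 = different."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))   # threshold where the two error rates meet
    return (fpr[idx] + fnr[idx]) / 2

print(compute_eer(np.array([0.9, 0.8, 0.3, 0.1]), np.array([1, 1, 0, 0])))  # -> 0.0
```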

Speaker Diarization

```
cd egs/
```

Place the audio files to diarize and their corresponding RTTM files in the demo_wav/ and demo_rttm/ directories, then execute:

```
bash diarize.sh
```
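Under the hood, diarization reduces to clustering per-segment x-vectors. The recipe uses the NME-SC tool linked in Requirements, which also auto-tunes the number of speakers; the generic scikit-learn sketch below only illustrates the clustering step on a cosine affinity matrix:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

segment_xvecs = np.random.randn(50, 512)   # stand-in for embeddings of speech segments
affinity = cosine_similarity(segment_xvecs)
affinity = np.clip(affinity, 0, None)      # spectral clustering expects non-negative affinities
labels = SpectralClustering(n_clusters=2,
                            affinity='precomputed').fit_predict(affinity)
print(labels)                              # one speaker index per segment
```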

Results

1. Speaker Verification (%EER)

|            | Kaldi | pytorch_xvectors |
|------------|-------|------------------|
| Vox1-test  | 3.13  | 2.82             |
| VOICES-dev | 10.30 | 8.59             |

2. Speaker Diarization (%DER)

NOTE: Clustering uses https://github.com/tango4j/Auto-Tuning-Spectral-Clustering

|                                                 | Kaldi | pytorch_xvectors |
|-------------------------------------------------|-------|------------------|
| DIHARD2 dev (no collar, oracle #spk)            | 26.97 | 27.50            |
| DIHARD2 dev (no collar, est #spk)               | 24.49 | 24.66            |
| AMI dev+test (26 meetings, collar, oracle #spk) | 6.39  | 6.30             |
| AMI dev+test (26 meetings, collar, est #spk)    | 7.29  | 10.14            |