Author: A-Jacobson

Description:
PyTorch Tacotron 2 https://arxiv.org/pdf/1712.05884.pdf
Primary language: Jupyter Notebook
Repository: git://github.com/A-Jacobson/tacotron2.git
Created: 2018-03-01T21:44:06Z
Project page: https://github.com/A-Jacobson/tacotron2


Tacotron2

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
https://arxiv.org/pdf/1712.05884.pdf

WaveNet: A Generative Model for Raw Audio
https://arxiv.org/abs/1609.03499

Contents

  • Simple LJ Speech DataLoader
  • Mel Spectrogram Prediction network (text to Spectrogram)
  • [TODO] WaveNet Vocoder (Spectrogram to raw audio)
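
As a rough illustration of what the data-loading step involves, the sketch below reads LJ Speech style metadata and turns transcripts into padded character-id batches. It is not the repository's loader. The LJ Speech `metadata.csv` is pipe-delimited (clip id, raw transcription, normalized transcription), and Tacotron 2 consumes character sequences; everything else here (the sample rows, `read_metadata`, `build_vocab`, `encode`, `pad_batch`, and the choice of 0 as the padding id) is an assumption made for the example.

```python
import csv
import io

# Made-up rows in the LJ Speech metadata.csv layout:
# clip id | raw transcription | normalized transcription
# In the real dataset the audio for each id lives at wavs/<id>.wav.
SAMPLE_METADATA = (
    "LJ001-0001|Printing, in 1473|Printing, in fourteen seventy-three\n"
    "LJ001-0002|in being comparatively modern.|in being comparatively modern.\n"
)


def read_metadata(handle):
    """Return a list of (clip_id, normalized_text) pairs."""
    # QUOTE_NONE keeps quotation marks inside transcripts as plain text.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    return [(row[0], row[2]) for row in reader if len(row) >= 3]


def build_vocab(texts):
    """Map each character to an integer id; 0 is reserved for padding."""
    chars = sorted(set("".join(texts)))
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Convert a transcript into a list of character ids."""
    return [vocab[ch] for ch in text]


def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]


entries = read_metadata(io.StringIO(SAMPLE_METADATA))
vocab = build_vocab(text for _, text in entries)
batch = pad_batch([encode(text, vocab) for _, text in entries])
```

With a real copy of the dataset, `read_metadata(open(path, encoding="utf-8"))` would replace the in-memory sample, and the padded lists would typically be converted to tensors before being fed to the spectrogram network.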

Status

  • The spectrogram network is functional but not fully trained.
    Training takes ~3 hours per epoch on an M6000 GPU.

Setup

  1. Install PyTorch and torchvision:

    1. conda install pytorch torchvision -c pytorch
  2. Install other requirements:

    1. pip install -r requirements.txt

Usage

Train the Spectrogram Prediction Network

  1. python train.py

View logs in TensorBoard

  1. tensorboard --logdir runs

WaveNet Resources

https://r9y9.github.io/wavenet_vocoder/
https://twitter.com/heiga_zen/status/832145314559750145
http://musyoku.github.io/2016/09/18/wavenet-a-generative-model-for-raw-audio/
https://www.slideshare.net/danilosoba1/generative-model-based-texttospeech