Author: BogiHsu

Description:
Real-Time High-Fidelity Speech Synthesis without GPU
Language: Python
Repository: git://github.com/BogiHsu/WG-WaveNet.git
Created: 2020-04-29T06:54:02Z

License: MIT License


WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

Po-chun Hsu, Hung-yi Lee

In our recent paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter. The two components are jointly trained by maximizing the likelihood of the training data and optimizing loss functions in the frequency domain. Because the flow-based model is heavily compressed, the proposed model requires far fewer computational resources than other waveform generation models during both training and inference; even though the model is highly compressed, the post-filter maintains the quality of the generated waveform. Our PyTorch implementation can be trained using less than 8 GB of GPU memory and generates audio samples at a rate of more than 5000 kHz on an NVIDIA 1080Ti GPU. Furthermore, even when synthesizing on a CPU, we show that the proposed method is capable of generating 44.1 kHz speech waveforms 1.2 times faster than real time. Experiments also show that the quality of the generated audio is comparable to that of other methods.
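
The speed figures above can be related to each other with a little arithmetic. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the GPU line assumes 44.1 kHz output, which the text states only for the CPU result.

```python
def real_time_factor(samples_per_second, sample_rate_hz):
    """Seconds of audio produced per second of wall-clock time."""
    if samples_per_second <= 0 or sample_rate_hz <= 0:
        raise ValueError("both rates must be positive")
    return samples_per_second / sample_rate_hz


def required_throughput(rtf, sample_rate_hz):
    """Samples per second needed to reach a given real-time factor."""
    return rtf * sample_rate_hz


def synthesis_time(audio_seconds, rtf):
    """Wall-clock seconds needed to generate `audio_seconds` of audio."""
    return audio_seconds / rtf


if __name__ == "__main__":
    # CPU figure from the abstract: 44.1 kHz audio, 1.2x faster than real time.
    cpu_rate = required_throughput(1.2, 44100)
    print("CPU throughput: {:.2f} kHz".format(cpu_rate / 1000))
    print("10 s clip takes {:.2f} s".format(synthesis_time(10.0, 1.2)))

    # GPU figure from the abstract: more than 5000 kHz.
    # ASSUMPTION: 44.1 kHz output; the abstract does not state the
    # sample rate used for the GPU measurement.
    gpu_rtf = real_time_factor(5000000, 44100)
    print("GPU real-time factor: {:.1f}".format(gpu_rtf))
```

Under these assumptions, running 1.2 times faster than real time at 44.1 kHz corresponds to a throughput of roughly 53 kHz, so a 10-second clip would take a little over 8 seconds to synthesize on the CPU.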

Visit the demo page for audio samples.


TODO

  • Release pretrained model.
  • Combine with Tacotron2.


Requirements

  • Python >= 3.5.2
  • torch >= 1.4.0
  • numpy
  • scipy
  • pickle
  • librosa
  • tensorboardX
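
Before training, it can help to confirm that the environment satisfies the list above. The following is a minimal sketch using only the standard library; the helper names are hypothetical and not part of the repository. It checks the interpreter version and whether each module can be found, but it does not verify the `torch >= 1.4.0` constraint.

```python
import importlib.util
import sys

# Values taken from the requirements list above.
MIN_PYTHON = (3, 5, 2)
REQUIRED_MODULES = ["torch", "numpy", "scipy", "pickle", "librosa", "tensorboardX"]


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    return tuple(version_info[:3]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python {}.{}.{} or newer is required".format(*MIN_PYTHON))
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed modules were found")
```

Note that `pickle` ships with the Python standard library, so it should never appear as missing.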



Training

  1. Download LJ Speech. In this example it’s in data/.

  2. For training, run the following command.

       python3 train.py --data_dir=<dir/to/dataset> --ckpt_dir=<dir/to/models>

  3. For training using a pretrained model, run the following command.

       python3 train.py --data_dir=<dir/to/dataset> --ckpt_dir=<dir/to/models> --ckpt_pth=<pth/to/pretrained/model>

  4. To log to TensorBoard (optional), run the following command.

       python3 train.py --data_dir=<dir/to/dataset> --ckpt_dir=<dir/to/models> --log_dir=<dir/to/logs>
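
The training invocations above differ only in their optional flags, so they can be assembled programmatically. The following is a minimal sketch: `build_train_command` is a hypothetical helper, the paths are placeholders, and passing `--ckpt_pth` and `--log_dir` together is an assumption, since the commands above show them only separately.

```python
import shlex


def build_train_command(data_dir, ckpt_dir, ckpt_pth=None, log_dir=None,
                        python="python3"):
    """Build the argument list for train.py from the flags shown above."""
    cmd = [
        python,
        "train.py",
        "--data_dir=" + data_dir,
        "--ckpt_dir=" + ckpt_dir,
    ]
    if ckpt_pth is not None:
        # Continue training from a pretrained model.
        cmd.append("--ckpt_pth=" + ckpt_pth)
    if log_dir is not None:
        # Optional TensorBoard logging.
        cmd.append("--log_dir=" + log_dir)
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own directories.
    cmd = build_train_command("data/", "models/", log_dir="logs/")
    print(" ".join(shlex.quote(part) for part in cmd))
    # To launch training from the repository root:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when a path contains spaces.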


Inference

  • To synthesize wav files, run the following command.

       python3 inference.py --ckpt_pth=<pth/to/model> --src_pth=<pth/to/src/wavs> --res_pth=<pth/to/save/wavs>

Pretrained Model

Work in progress.


We will combine this vocoder with Tacotron2. More information and a Colab demo will be released here.
