Project author: AshwinRJ

Project description:
VoiceGAN - Hallucinating Faces from Voices
Primary language: Jupyter Notebook
Repository: git://github.com/AshwinRJ/Face-Generation-from-Voice.git
Created: 2019-03-16T01:17:14Z
Project page: https://github.com/AshwinRJ/Face-Generation-from-Voice

License: MIT License



Face-Generation-from-Speech

Implementation Details - VoiceGAN

Overall architecture of our VoiceGAN:

Details

  1. Face Embedding Extraction from a Pre-trained DeepSphere Model
  2. Kaldi VoxCeleb X-Vector Extraction
  3. Joint Embedding Network using an MLP
  4. Conditional DCGAN for Image Synthesis with Scaling Loss
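Steps 3 and 4 can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's code. The two-layer ReLU MLP, the 512-d x-vector size, the 256 hidden units, the 128-d joint space, the 100-d noise vector and the helper names `init_mlp`, `project_to_joint` and `generator_input` are all hypothetical, and the weights are random rather than trained. The sketch only shows how a speaker embedding could be projected into a joint space and concatenated with noise to form a conditional generator's input. The scaling loss is not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 512-d x-vectors, 256 hidden units, 128-d joint space, 100-d noise.
XVEC_DIM, HIDDEN_DIM, JOINT_DIM, NOISE_DIM = 512, 256, 128, 100


def init_mlp(in_dim, hidden_dim, out_dim, rng):
    """Randomly initialise a two-layer MLP (in practice these weights are learned)."""
    return {
        "W1": rng.normal(0.0, 0.02, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.02, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def project_to_joint(params, x):
    """Map a batch of embeddings (N, in_dim) into the joint space and L2-normalise each row."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    z = h @ params["W2"] + params["b2"]
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)


def generator_input(joint_emb, rng):
    """Concatenate Gaussian noise with the joint embedding to condition a generator."""
    noise = rng.normal(size=(joint_emb.shape[0], NOISE_DIM))
    return np.concatenate([noise, joint_emb], axis=1)


voice_mlp = init_mlp(XVEC_DIM, HIDDEN_DIM, JOINT_DIM, rng)
xvectors = rng.normal(size=(4, XVEC_DIM))  # stand-in for Kaldi x-vectors
joint = project_to_joint(voice_mlp, xvectors)
g_in = generator_input(joint, rng)
print(joint.shape, g_in.shape)  # (4, 128) (4, 228)
```

Concatenating the condition with the noise vector is one common way to condition a DCGAN generator; whether VoiceGAN does exactly this is an assumption.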

Datasets:

VGGFace2, VoxCeleb2, VoxCeleb1 (used only for x-vector training)

  • This work pairs x-vector speaker embeddings with DeepSphere face embeddings to train a joint embedding network with the N-pair loss. A speaker embedding is then mapped into this joint embedding space, and the resulting vector conditions the GAN that generates the face image.
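The N-pair loss can be sketched as the multi-class N-pair objective of Sohn (2016). This is a minimal NumPy version under stated assumptions, not the repository's implementation: the voice embedding is taken as the anchor, the matching face is the positive, every other face in the batch is a negative, similarity is a plain dot product, and the embedding-norm regulariser from the original formulation is omitted. The function name `n_pair_loss` is hypothetical.

```python
import numpy as np


def n_pair_loss(voice_emb, face_emb):
    """Multi-class N-pair loss over N matched voice/face pairs.

    Row i of both arrays belongs to the same identity; every other face in
    the batch acts as a negative for voice i.
    """
    sim = voice_emb @ face_emb.T                  # (N, N) dot-product similarities
    row_max = sim.max(axis=1, keepdims=True)      # shift for a stable log-sum-exp
    lse = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    # log(1 + sum_{j != i} exp(s_ij - s_ii)) == logsumexp_j(s_ij) - s_ii
    return float(np.mean(lse - np.diag(sim)))


rng = np.random.default_rng(0)
faces = rng.normal(size=(8, 128))
voices = faces + 0.1 * rng.normal(size=(8, 128))  # voices close to their own faces
misaligned = np.roll(faces, 1, axis=0)            # each voice paired with a wrong face
print(n_pair_loss(voices, faces) < n_pair_loss(voices, misaligned))  # True
```

Writing the loss as a log-sum-exp over each row of the similarity matrix makes it a softmax cross-entropy with the diagonal as the target class, which is numerically stable and uses every other pair in the batch as a negative.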

Preliminary Results

Example faces generated conditioned solely on speech input.

Additional Resources

Papers