Project author: AshwinRJ
Project description:
VoiceGAN - Hallucinating Faces from Voices
Primary language: Jupyter Notebook
Project URL: git://github.com/AshwinRJ/Face-Generation-from-Voice.git
Face-Generation-from-Speech
Implementation Details - VoiceGAN
Overall architecture of our VoiceGAN:

Details
- Face Embedding Extraction from Pre-trained DeepSphere Model
- Kaldi VoxCeleb X-Vector Extraction
- Joint Embedding Network using MLP
- Conditional DCGAN for Image Synthesis with Scaling Loss
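The sketch below shows, in plain Python, one way the joint-embedding MLP and the conditional generator input could fit together. It is not the repository's code. The layer sizes, the use of a single voice-side MLP, L2 normalization of the joint embedding, and conditioning the DCGAN by concatenating a noise vector with the projected embedding are all assumptions made for illustration. 512-D is the usual x-vector size in Kaldi's VoxCeleb recipe; every other dimension and function name here is hypothetical.

```python
import math
import random

# Hypothetical sizes: 512-D matches typical Kaldi VoxCeleb x-vectors; the
# hidden width, joint-space size and noise size are illustrative assumptions.
XVEC_DIM, HIDDEN_DIM, JOINT_DIM, NOISE_DIM = 512, 256, 128, 100


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer, weights ~ U(-1/sqrt(n_in), 1/sqrt(n_in))."""
    bound = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def dense(x, layer):
    """Affine transform W x + b."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def l2_normalize(x, eps=1e-12):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


def project_voice(xvec, layers):
    """Hypothetical two-layer MLP mapping an x-vector into the joint space."""
    h = relu(dense(xvec, layers[0]))
    return l2_normalize(dense(h, layers[1]))


def generator_input(joint_emb, rng):
    """Assumed conditioning scheme: noise vector z concatenated with the condition."""
    z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    return z + joint_emb


rng = random.Random(0)
voice_mlp = [init_layer(XVEC_DIM, HIDDEN_DIM, rng), init_layer(HIDDEN_DIM, JOINT_DIM, rng)]
xvec = [rng.gauss(0.0, 1.0) for _ in range(XVEC_DIM)]  # stand-in for a real x-vector
emb = project_voice(xvec, voice_mlp)
g_in = generator_input(emb, rng)
print(len(emb), len(g_in))  # 128 228
```

In a trained model the MLP weights would be learned with the joint-embedding objective rather than left at their random initialization, and the concatenated vector would feed the generator's first layer, which then upsamples it to an image.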
Datasets:
VGGFace2, VoxCeleb2, VoxCeleb1 (used only for X-Vector training)
- This work uses X-Vector speaker embeddings together with DeepSphere face embeddings to train a joint embedding network with the N-pair loss. Face images are then generated conditioned on the given speaker embeddings after they have been mapped into the joint embedding space.
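The N-pair objective can be sketched as follows, assuming the standard multi-class N-pair formulation. For a batch of N matched (voice, face) pairs with similarity s_ij = v_i · f_j, each voice embedding should score its own face higher than the other N-1 faces in the batch, giving loss_i = log(1 + sum over j != i of exp(s_ij - s_ii)). Treating the voice as the anchor, using raw dot products as similarities, and omitting the embedding-norm regularizer are assumptions; the function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def n_pair_loss(voice_embs, face_embs):
    """Multi-class N-pair loss over a batch of matched (voice, face) pairs.

    Row i of voice_embs and row i of face_embs belong to the same identity;
    every other face in the batch acts as a negative for voice i.
    loss_i = log(1 + sum_{j != i} exp(s_ij - s_ii)) = logsumexp_j(s_ij) - s_ii
    """
    n = len(voice_embs)
    if n != len(face_embs) or n < 2:
        raise ValueError("need at least two matched (voice, face) pairs")
    total = 0.0
    for i, v in enumerate(voice_embs):
        sims = [dot(v, f) for f in face_embs]  # row i of the similarity matrix
        total += logsumexp(sims) - sims[i]
    return total / n


matched = [[1.0, 0.0], [0.0, 1.0]]
print(n_pair_loss(matched, matched))  # log(1 + e**-1) ≈ 0.3133
```

Because log(1 + sum_{j != i} exp(s_ij - s_ii)) equals logsumexp_j(s_ij) - s_ii, the loss is a softmax cross-entropy over each row of the similarity matrix with the matched face as the target; computing it through logsumexp avoids overflow for large similarities.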
Preliminary Results
Example faces generated conditioned solely on speech input.

Additional Resources
Papers