Project author: AshwinRJ

Project description:
VoiceGAN - Hallucinating Faces from Voices

Primary language: Jupyter Notebook

Project homepage:

Repository: git://github.com/AshwinRJ/Face-Generation-from-Voice.git

Created: 2019-03-16T01:17:14Z
Project community: https://github.com/AshwinRJ/Face-Generation-from-Voice
License: MIT License

Face-Generation-from-Speech

Implementation Details - VoiceGAN

Overall architecture of our VoiceGAN:

Details

Face Embedding Extraction from Pre-trained DeepSphere Model
Kaldi VoxCeleb X-Vector Extraction
Joint Embedding Network using MLP
Conditional DC GAN for Image Synthesis with Scaling Loss
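
The components listed above can be read as a data flow: a speaker embedding is projected into the joint space by an MLP, and the result conditions the generator. The following is a minimal sketch of that flow in plain Python, not the repository's code. The layer sizes, the single hidden layer, the L2 normalisation, and the choice to concatenate noise with the condition are all assumptions made for illustration; `mlp_project` and `generator_input` are hypothetical names.

```python
import math
import random


def mlp_project(x, w1, b1, w2, b2):
    """One-hidden-layer MLP with ReLU, followed by L2 normalisation.

    Weight matrices are stored as lists of rows, shape (out_dim, in_dim).
    """
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, row)) + b)
              for row, b in zip(w1, b1)]
    out = [sum(hi * w for hi, w in zip(hidden, row)) + b
           for row, b in zip(w2, b2)]
    norm = math.sqrt(sum(o * o for o in out)) or 1.0  # avoid division by zero
    return [o / norm for o in out]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def generator_input(voice_embedding, params, noise_dim, rng):
    """Build a conditional generator input: noise vector + joint embedding."""
    joint = mlp_project(voice_embedding, *params)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + joint  # list concatenation


# Toy dimensions chosen for readability (assumed, not taken from the project).
IN_DIM, HIDDEN_DIM, JOINT_DIM, NOISE_DIM = 8, 16, 4, 6

rng = random.Random(0)
params = (
    random_matrix(HIDDEN_DIM, IN_DIM, rng), [0.0] * HIDDEN_DIM,
    random_matrix(JOINT_DIM, HIDDEN_DIM, rng), [0.0] * JOINT_DIM,
)
x_vector = [rng.gauss(0.0, 1.0) for _ in range(IN_DIM)]  # stand-in speaker embedding
g_in = generator_input(x_vector, params, NOISE_DIM, rng)
print(len(g_in))  # NOISE_DIM + JOINT_DIM = 10
```

In an actual model the concatenated vector would be fed to the generator's first transposed-convolution layer. The scaling loss named in the list is not specified in this text, so it is not sketched here.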

Datasets:

VGGFace2, VoxCeleb2, VoxCeleb1 (used only for X-Vector training)

This work uses X-Vector speaker embeddings together with DeepSphere face embeddings to train a joint embedding network with the N-pair loss. Face images are then generated conditioned on the provided speaker embeddings after they have been projected into the joint embedding space.
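
To make the N-pair objective concrete, here is a minimal sketch of the multi-class N-pair loss over a batch of matched (voice, face) embedding pairs. It is written in plain Python so it runs without dependencies; it is not the repository's implementation. It assumes dot-product similarity, uses the voice embedding as the anchor, and omits the embedding regularisation term that is often added. The function name `n_pair_loss` is hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def n_pair_loss(voice_embs, face_embs):
    """Mean N-pair loss; voice_embs[i] and face_embs[i] share an identity.

    For anchor i the loss is log(1 + sum_{j != i} exp(s_ij - s_ii)),
    which equals logsumexp_j(s_ij) - s_ii, where s_ij = <voice_i, face_j>.
    """
    if len(voice_embs) != len(face_embs) or not voice_embs:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, anchor in enumerate(voice_embs):
        sims = [dot(anchor, face) for face in face_embs]
        peak = max(sims)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in sims))
        total += log_sum_exp - sims[i]
    return total / len(voice_embs)


# Toy batch: two identities with 2-D embeddings.
voices = [[1.0, 0.0], [0.0, 1.0]]
matched_faces = [[1.0, 0.0], [0.0, 1.0]]     # each face aligns with its voice
mismatched_faces = [[0.0, 1.0], [1.0, 0.0]]  # faces swapped between identities

print(n_pair_loss(voices, matched_faces))
print(n_pair_loss(voices, mismatched_faces))
```

The matched batch yields the lower loss, which is the behaviour the joint embedding network is trained to produce: each voice embedding should score higher against its own face than against the other faces in the batch.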

Preliminary Results

Example faces generated conditioned solely on speech input.

Additional Resources

Papers

ProGAN
AttnGAN
Text to Face Generation
Generating custom photo realistic faces
VGGFace2
VoxCeleb
Kaldi SRE X-Vectors
The Triplet Loss Implementation


