
AshwinRJ/Face-Generation-from-Voice

Face-Generation-from-Speech

Implementation Details - VoiceGAN

Figure: Overall architecture of our VoiceGAN.

Details

  1. Face Embedding Extraction from Pre-trained DeepSphere Model
  2. Kaldi VoxCeleb X-Vector Extraction
  3. Joint Embedding Network using MLP
  4. Conditional DCGAN for Image Synthesis with Scaling Loss
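
Steps 2 and 3 can be pictured as a small projection network: a Kaldi x-vector passes through an MLP and lands in the joint space shared with the face embeddings. The sketch below is a minimal NumPy forward pass with untrained weights. It assumes a 512-dimensional x-vector (the usual size in Kaldi's VoxCeleb recipe), a hypothetical 256-unit ReLU hidden layer, a hypothetical 128-dimensional joint space and L2-normalized outputs. The names `init_mlp` and `project_to_joint` are illustrative and not taken from this repository.

```python
import numpy as np

XVECTOR_DIM = 512   # assumption: typical Kaldi VoxCeleb x-vector size
HIDDEN_DIM = 256    # hypothetical hidden width
JOINT_DIM = 128     # hypothetical joint-embedding size


def init_mlp(rng, in_dim=XVECTOR_DIM, hidden=HIDDEN_DIM, out_dim=JOINT_DIM):
    """Random (untrained) weights for a two-layer MLP, He-style scaling."""
    return {
        "W1": rng.standard_normal((in_dim, hidden)) * np.sqrt(2.0 / in_dim),
        "b1": np.zeros(hidden),
        "W2": rng.standard_normal((hidden, out_dim)) * np.sqrt(2.0 / hidden),
        "b2": np.zeros(out_dim),
    }


def project_to_joint(params, xvectors):
    """Map x-vectors of shape (batch, XVECTOR_DIM) to unit-norm joint embeddings."""
    h = np.maximum(0.0, xvectors @ params["W1"] + params["b1"])  # ReLU hidden layer
    e = h @ params["W2"] + params["b2"]                          # linear output layer
    # L2-normalize so voice and face embeddings are compared on the unit sphere
    norms = np.maximum(np.linalg.norm(e, axis=1, keepdims=True), 1e-12)
    return e / norms


rng = np.random.default_rng(0)
params = init_mlp(rng)
xvecs = rng.standard_normal((4, XVECTOR_DIM))  # stand-in for real x-vectors
joint = project_to_joint(params, xvecs)
print(joint.shape)  # (4, 128)
```

Normalizing the output is a design assumption here. It keeps dot products bounded, which makes the similarity scores used by a metric-learning loss easier to compare across batches.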

Datasets:

VGGFace2, VoxCeleb2, VoxCeleb1 (used only for X-Vector training)

  • This work uses X-Vector speaker embeddings together with DeepSphere face embeddings to train a joint embedding network with the N-pair loss. Once a speaker embedding has been mapped into this joint embedding space, it is used as the condition for generating face images.
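
The multi-class N-pair loss treats a batch of N identities as an N-way classification problem. Each voice embedding should score its own identity's face embedding higher than the other N-1 faces in the batch. Below is a minimal NumPy sketch. It assumes both embeddings already live in the joint space and that row i of each matrix belongs to the same speaker. The function name `n_pair_loss` is illustrative, and the repository's exact formulation (scaling, regularization terms) may differ.

```python
import numpy as np


def n_pair_loss(voice, face):
    """Multi-class N-pair loss for N matched (voice, face) pairs.

    voice, face: arrays of shape (N, D); row i of both belongs to identity i.
    Equals softmax cross-entropy over similarities s_ij = voice_i . face_j with
    target j = i, i.e. mean_i log(1 + sum_{j != i} exp(s_ij - s_ii)).
    """
    sims = voice @ face.T                            # (N, N) similarity matrix
    sims = sims - sims.max(axis=1, keepdims=True)    # row shift for numerical stability
    log_norm = np.log(np.exp(sims).sum(axis=1))      # log-sum-exp over candidate faces j
    return float(np.mean(log_norm - np.diag(sims)))  # mean of -log softmax of the true pair


rng = np.random.default_rng(1)
faces = rng.standard_normal((8, 128))
faces /= np.linalg.norm(faces, axis=1, keepdims=True)

aligned = n_pair_loss(faces * 5.0, faces)          # each voice points at its own face
shuffled = n_pair_loss(faces[::-1] * 5.0, faces)   # each voice points at another face
print(aligned < shuffled)  # True
```

The factor of 5 only sharpens the softmax so the toy comparison is clear-cut. With unit-norm embeddings the raw dot products lie in [-1, 1], which is why metric-learning setups often add such a scale.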

Preliminary Results

Figure: Example faces generated conditioned solely on speech input.
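
At generation time only speech is needed: the speaker's joint embedding serves as the condition of the GAN. The sketch below shows how a conditional DCGAN generator's input might be assembled and how a DCGAN-style upsampling stack grows the image. The following sizes are hypothetical and not confirmed by this README: a 100-dimensional noise vector concatenated with a 128-dimensional joint embedding, a 4x4 starting feature map, and four transposed convolutions with kernel 4, stride 2 and padding 1 (the common DCGAN setting). The helper names are illustrative.

```python
import numpy as np

NOISE_DIM = 100   # hypothetical noise size
JOINT_DIM = 128   # hypothetical joint-embedding size (must match the embedding network)


def generator_input(joint_embedding, rng):
    """Concatenate a noise sample with the speech-derived condition vector."""
    z = rng.standard_normal(NOISE_DIM)
    return np.concatenate([z, joint_embedding])


def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


def upsampling_schedule(start=4, layers=4):
    """Feature-map side lengths through a DCGAN-style generator."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(deconv_out(sizes[-1]))
    return sizes


rng = np.random.default_rng(0)
cond = rng.standard_normal(JOINT_DIM)
cond /= np.linalg.norm(cond)          # stand-in for a projected x-vector
x = generator_input(cond, rng)
print(x.shape)                 # (228,)
print(upsampling_schedule())   # [4, 8, 16, 32, 64]
```

With kernel 4, stride 2 and padding 1, each transposed convolution exactly doubles the spatial size, so four layers take a 4x4 map to 64x64.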
