
Speech2Face

This project implements a framework that converts speech to facial features, as described in the CVPR 2019 paper Speech2Face: Learning the Face Behind a Voice by the MIT CSAIL group.

Steps Used

  • The goal is a model that reconstructs a person's face from a sample of their voice.
  • The project started with understanding the workflow and creating a timeline of the tasks to be done.
  • First, I extracted the voice segments and the speakers' faces from YouTube videos listed in the AVSpeech dataset.
  • Next, I augmented each audio segment with itself (repeating it) until it reached a fixed clip length, so that the inputs to the encoder network are uniform (see the audio preprocessing sketch after this list).
  • After that, I extracted facial features from the cropped faces using VGG-Face; these serve as the ground-truth outputs for the encoder network, which I then built and trained.
  • The next task was to reconstruct the image of a person from the encoder's output using a face decoder network.
  • I built the decoder with transpose convolution layers (see the decoder sketch after this list) and am currently tuning its hyper-parameters.
  • In the end, I was able to create a model that gives an approximation of a person's facial appearance from their audio samples.
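
The audio augmentation and spectrogram step above can be illustrated with a minimal sketch. The parameter values here (16 kHz mono audio, 6-second clips, 512-point FFT, hop length of 160) and the use of librosa are illustrative assumptions, not the exact settings of this repository:

import numpy as np
import librosa

def repeat_to_length(audio, target_len):
    # Tile a short waveform with itself until it reaches target_len samples.
    reps = int(np.ceil(target_len / len(audio)))
    return np.tile(audio, reps)[:target_len]

def audio_to_spectrogram(path, sr=16000, duration=6.0, n_fft=512, hop=160):
    # Load the clip, pad it by repetition to a fixed length, and compute a
    # power-law compressed magnitude spectrogram as the encoder input.
    audio, _ = librosa.load(path, sr=sr, mono=True)
    audio = repeat_to_length(audio, int(sr * duration))
    stft = librosa.stft(audio, n_fft=n_fft, hop_length=hop)
    return (np.abs(stft) ** 0.3).astype(np.float32)

The face decoder built from transpose convolution layers can likewise be sketched in Keras; the layer counts, filter sizes, 4096-dimensional feature input, and 64x64 output resolution are assumptions for illustration, not the architecture actually used here:

from tensorflow.keras import layers, models

def build_face_decoder(feature_dim=4096):
    # Map a face-feature vector to a 64x64 RGB image by repeated upsampling.
    inputs = layers.Input(shape=(feature_dim,))
    x = layers.Dense(4 * 4 * 256, activation='relu')(inputs)
    x = layers.Reshape((4, 4, 256))(x)
    # Each transpose convolution doubles the spatial size: 4 -> 8 -> 16 -> 32 -> 64.
    for filters in (128, 64, 32, 16):
        x = layers.Conv2DTranspose(filters, kernel_size=5, strides=2,
                                   padding='same', activation='relu')(x)
    outputs = layers.Conv2D(3, kernel_size=3, padding='same',
                            activation='sigmoid')(x)
    return models.Model(inputs, outputs, name='face_decoder')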

Getting Started

  1. Go to the preprocess folder and run prepare_directory.sh, then download the AVSpeech dataset. Run data_download.py to download the data from YouTube based on the AVSpeech metadata.
cd preprocess/
sh prepare_directory.sh

Download the AVSpeech dataset into this folder.

python3 data_download.py
usage: data_download.py [-h] [--from_id FROM_ID] [--to_id TO_ID]
                        [--low_memory LOW_MEMORY] [--sample_rate SAMPLE_RATE]
                        [--duration DURATION] [--fps FPS] [--mono MONO]
                        [--window WINDOW] [--stride STRIDE]
                        [--fft_length FFT_LENGTH] [--amp_norm AMP_NORM]
                        [--face_extraction_model FACE_EXTRACTION_MODEL]
                        [--verbose]
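
For example, to download and preprocess the first 100 clips (the ID range and parameter values are illustrative, not recommended settings; the flags come from the usage message above):

python3 data_download.py --from_id 0 --to_id 100 --sample_rate 16000 --duration 6 --verbose
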
  2. Now run base.py with the --train option if you want to train.
python3 base.py
usage: base.py [-h] [--from_id FROM_ID] [--to_id TO_ID] [--epochs EPOCHS]
               [--start_epoch START_EPOCH] [--batchsize BATCHSIZE]
               [--num_gpu NUM_GPU] [--num_samples NUM_SAMPLES]
               [--load_model LOAD_MODEL] [--save_model SAVE_MODEL] [--train]
               [--verbose]
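
For example, a training run on the first 100 preprocessed samples might look like this (all values are illustrative; the flags come from the usage message above):

python3 base.py --train --from_id 0 --to_id 100 --epochs 50 --batchsize 16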

Results

We used face retrieval performance as the evaluation metric and achieved decent accuracy. More computational power and training on the complete dataset should improve accuracy further.
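
Face retrieval here means ranking a gallery of true face features by their similarity to the feature vector predicted from speech, and checking whether the correct face appears near the top of that ranking. A minimal sketch of such a metric based on cosine similarity (the function name, shapes, and the recall-at-k convention are assumptions for illustration):

import numpy as np

def recall_at_k(pred_feats, true_feats, k=5):
    # pred_feats[i] is the feature predicted from speaker i's speech,
    # true_feats[i] is the feature extracted from speaker i's real face.
    pred = pred_feats / np.linalg.norm(pred_feats, axis=1, keepdims=True)
    true = true_feats / np.linalg.norm(true_feats, axis=1, keepdims=True)
    sims = pred @ true.T                   # cosine similarity matrix
    ranks = np.argsort(-sims, axis=1)      # best gallery matches first
    hits = [i in ranks[i, :k] for i in range(len(pred))]
    return float(np.mean(hits))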


License

This project is licensed under the MIT License - see the LICENSE file for details.

References

  1. Speech2Face: Learning the Face Behind a Voice (https://arxiv.org/pdf/1905.09773.pdf)
  2. Wav2Pix: Speech-conditioned face generation using generative adversarial networks (https://arxiv.org/pdf/1903.10195.pdf)
  3. AVSpeech Dataset (https://looking-to-listen.github.io/avspeech/download.html)
