Predicting-Speaker-Quality

Welcome to Predicting-Speaker-Quality! This repository contains the code for my Bachelor's thesis, Predicting Speaker Quality Using Embeddings. All of it is research code written by an inexperienced undergraduate student, so please don't expect perfect documentation. If you run into trouble, or want to improve or extend the code base, don't hesitate to reach out to me. Found a mistake? Let me know as well.

Besides reading this README, a good way to delve into the topic is to read the thesis itself, which is included in this repository as Predicting Speaker Quality Using Embeddings.pdf.

Setup

To set up the project, follow these steps:

1. Getting Started

Clone this repository.
Install the requirements from requirements.txt using pip install -r requirements.txt if they are not already satisfied. If you like, you can do this in a virtual environment to keep things tidy.

2. Getting and Creating Data

Download the Spoken Wikipedia Corpus (German, with audio) from https://nats.gitlab.io/swc/ and replace the directory german with it.
Navigate into the main project directory and execute the split.sh script using bash split.sh -m 10 -d 10 -p, which will generate up to 10 samples of length 10 seconds from each audio file in the wavs directory and its subdirectories. This may take a while. To see all available options, type bash split.sh -h.
Generate the GE2E and TRILL embeddings by running the update_embeddings.py script once. If you want to create new embeddings, for example because you have added new .wav files to the wavs/demo folder, just run it again. It remembers which embeddings have already been created and deletes embeddings that are no longer needed.
Navigate into the feature-scripts directory and execute the update_audio_features.sh script using bash update_audio_features.sh. Just like the previous script, this one does all the bookkeeping for you and tracks new and deleted .wav files.
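
The bookkeeping described in the last two steps can be pictured as a set difference between the recordings on disk and the embeddings already stored. The following is only a minimal sketch of that idea, not the code in update_embeddings.py: the function name sync_embeddings, the .emb extension, the directory layout, and the embed callable are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def sync_embeddings(
    wav_dir: Path,
    emb_dir: Path,
    embed: Callable[[Path], bytes],
) -> Tuple[List[str], List[str]]:
    """Create missing embedding files and delete stale ones.

    `embed` is a hypothetical stand-in for the real GE2E/TRILL encoder.
    Returns the relative names that were created and deleted.
    """
    emb_dir.mkdir(parents=True, exist_ok=True)

    # Key every file by its path relative to its root, without the extension.
    wavs = {p.relative_to(wav_dir).with_suffix("").as_posix(): p
            for p in wav_dir.rglob("*.wav")}
    embs = {p.relative_to(emb_dir).with_suffix("").as_posix(): p
            for p in emb_dir.rglob("*.emb")}

    created = sorted(wavs.keys() - embs.keys())  # recordings without an embedding
    deleted = sorted(embs.keys() - wavs.keys())  # embeddings whose recording is gone

    for key in created:
        target = emb_dir / (key + ".emb")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(embed(wavs[key]))

    for key in deleted:
        embs[key].unlink()

    return created, deleted
```

Calling such a function twice in a row without touching the recordings does nothing the second time, which is why re-running the update scripts after adding or removing a few .wav files is cheap.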

Training and Evaluating Models

In order to train and evaluate the neural network models (DNNs and LSTMs), simply run the keras_regressors.py script. All parameters like network architecture, learning rate, etc. can be modified inside the file itself.
For the kNN and random forest regressor, use the sklearn_regressors.py file. Like before, all parameters can be set inside the script itself.
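
For readers new to kNN regression, the idea is to predict a speaker's quality score as the average score of the k most similar embeddings in the training set. The sketch below shows this in plain Python with made-up toy data; it is not taken from sklearn_regressors.py, and the function name knn_predict and all values are assumptions for illustration.

```python
import math
from typing import Sequence


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    query: Sequence[float],
    k: int = 3,
) -> float:
    """Predict a score as the mean target of the k nearest training vectors."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training indices by Euclidean distance to the query vector.
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


# Toy data: four 2-dimensional "embeddings" with invented quality scores.
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
ratings = [2.0, 3.0, 8.0, 9.0]

# The two closest training points have scores 2.0 and 3.0.
print(knn_predict(embeddings, ratings, [0.05, 0.0], k=2))  # 2.5
```

Real embeddings have far more dimensions, and a library implementation will be much faster, but the prediction rule with uniform weights is the same average over neighbours.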

(Re-)Creating Plots

If you want to create plots from the resulting predictions (just like the ones seen in the thesis), take a look at the individual plotting scripts inside plot-scripts.

Demo

In order to evaluate the audio recordings inside wavs/demo, please use the script demo.py.

Acknowledgements

The code in the encoder directory, which generates the GE2E embeddings, is forked from Corentin Jemine's Real-Time-Voice-Cloning repository (https://github.com/CorentinJ/Real-Time-Voice-Cloning) and is available in a better-documented form under the name Resemblyzer.


