Learning and controlling the source-filter representation of speech with a variational autoencoder

This repository contains the code associated with the following publication:

Learning and controlling the source-filter representation of speech with a variational autoencoder
Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
Speech Communication, vol. 148, 2023.

If you use this code for your research, please cite the above paper.

Setup

PyPI:
- pip install -i https://test.pypi.org/simple/ sf-vae --no-deps
Install the package locally (for use on your system):
- In the source-filter-vae directory: pip install -e .
Virtual environment:
- conda create -n sf_vae python=3.8
- conda activate sf_vae
- In the source-filter-vae directory: pip install -r requirements.txt
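
Because the PyPI command above passes --no-deps, pip does not install dependencies such as torch for you. A minimal sketch for checking that the modules used in the examples below are importable; `is_importable` is a hypothetical helper and not part of sf_vae:

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # torch and sf_vae are the two modules imported by the usage examples.
    for name in ("torch", "sf_vae"):
        status = "found" if is_importable(name) else "MISSING"
        print(f"{name}: {status}")
```

If either module is reported as missing, install it inside the sf_vae environment before running the examples.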

Usage

LEARNING LATENT SUBSPACES ENCODING SOURCE-FILTER FACTORS OF VARIATION

import torch
from sf_vae import Learning
from sf_vae import VAE

vae = VAE()
checkpoint = torch.load(r"checkpoints\vae_trained")
vae.load_state_dict(checkpoint['model_state_dict'])
learn = Learning(config_factor=dict(factor="f1",  # f0: pitch (source); f1, f2, f3: formants (filter)
                                    path_trajectory=r"formant_1\f2-1600",
                                    dim=3),
                 model=vae,
                 path_save=r"checkpoints\pca-regression")
learn()

You can download the formant and pitch trajectories from the following link.
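
The paths in the examples use Windows-style backslashes. On Linux or macOS, the same locations can be built with pathlib, as in the sketch below. The directory names come from the examples. Whether sf_vae accepts Path objects directly is not documented here, so the sketch converts them to strings as a precaution:

```python
from pathlib import Path

# Directory and file names taken from the usage examples in this README.
CHECKPOINT_DIR = Path("checkpoints")
vae_checkpoint = CHECKPOINT_DIR / "vae_trained"
pca_regression = CHECKPOINT_DIR / "pca-regression"
trajectory = Path("formant_1") / "f2-1600"

# Assumption: the sf_vae classes expect plain strings, so convert before passing.
vae_checkpoint_str = str(vae_checkpoint)
pca_regression_str = str(pca_regression)
trajectory_str = str(trajectory)

if __name__ == "__main__":
    print(vae_checkpoint_str, pca_regression_str, trajectory_str)
```

These strings can then replace the raw-string literals, for example torch.load(vae_checkpoint_str).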

CONTROLLING THE FACTORS OF VARIATION FOR SPEECH TRANSFORMATION

import torch
from sf_vae import Controlling
from sf_vae import VAE

vae = VAE()
checkpoint = torch.load(r"checkpoints\vae_trained")
vae.load_state_dict(checkpoint['model_state_dict'])
control = Controlling(path=r"checkpoints\pca-regression",
                      model=vae,
                      device="cuda")
control(path_wav=r"01aa0101.wav",
        factor='f0',  # f0: pitch (source); f1, f2, f3: formants (filter)
        y=(85, 300))  # new values of the factor, in Hz
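
The call above takes a factor name and new values in Hz. A small sketch of validating both before calling control; `FACTORS` and `check_request` are hypothetical names for illustration, and the only assumptions are the four factor names listed in the comment above and that frequencies must be positive:

```python
from typing import Iterable, Tuple

# Factor names taken from the comments in the usage examples.
FACTORS = {
    "f0": "pitch (source)",
    "f1": "first formant (filter)",
    "f2": "second formant (filter)",
    "f3": "third formant (filter)",
}


def check_request(factor: str, y: Iterable[float]) -> Tuple[str, Tuple[float, ...]]:
    """Validate a factor name and its target values (Hz); return them normalized."""
    if factor not in FACTORS:
        raise ValueError(f"unknown factor {factor!r}; expected one of {sorted(FACTORS)}")
    values = tuple(float(v) for v in y)
    if not values or any(v <= 0 for v in values):
        raise ValueError("y must contain at least one positive frequency in Hz")
    return factor, values


if __name__ == "__main__":
    factor, y = check_request("f0", (85, 300))
    print(factor, FACTORS[factor], y)
```

The validated pair can then be passed on as control(path_wav=..., factor=factor, y=y).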

Phase reconstruction methods:
- RTISI_LA
- Griffin_lim
- WaveGlow

Whispering

import torch
from sf_vae import Controlling
from sf_vae import VAE

vae = VAE()
checkpoint = torch.load(r"checkpoints\vae_trained")
vae.load_state_dict(checkpoint['model_state_dict'])
control = Controlling(path=r"checkpoints\pca-regression",
                      model=vae,
                      device="cuda")
z_ = control.whispering(path_wav=r"01aa0101.wav")
control.reconstruction(z_, save=True)

GUI: graphical interface

from sf_vae import Interface
from sf_vae import VAE
import torch

vae = VAE()
checkpoint = torch.load(r"checkpoints\vae_trained")
vae.load_state_dict(checkpoint['model_state_dict'])
inter = Interface(device="cuda", model=vae, path=r"checkpoints\pca-regression")
inter.master.mainloop()
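
The examples above hard-code device="cuda". On a machine without a GPU, a fallback can be chosen at run time, as in the sketch below. `pick_device` is a hypothetical helper, and whether Controlling and Interface accept "cpu" is an assumption that this README does not confirm:

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return "cuda" only when requested and actually available; otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without torch
    except ImportError:
        return "cpu"
    if preferred == "cuda" and torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"using device: {device}")
    # When loading on CPU, torch.load accepts map_location to remap GPU tensors:
    # checkpoint = torch.load("checkpoints/vae_trained", map_location=device)
```

The returned string can be passed as the device argument in place of the literal "cuda".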

License

GNU Affero General Public License (version 3), see LICENSE.txt.
