Besides its linguistic content, our speech is rich in biometric information that can be inferred by classifiers. Learning privacy-preserving representations for speech signals enables downstream tasks, such as speech recognition, without sharing unnecessary private information about an individual. In this short presentation we show how the accuracy of gender recognition and speaker verification can be reduced to chance level, protecting against classification-based attacks.
Install pip dependencies:
pip install -r requirements.txt
- Create train/test splits and extract the dataset to the root directory
- Preprocess the audio and extract train/test mel spectrograms
- Download pretrained weights for the VQ-CPC model here
- Train the vocoder
Example usage: python train_vocoder.py cpc_checkpoint=checkpoints/cpc/english2019/model.ckpt-24000.pt checkpoint_dir=checkpoints/vocoder/english2019
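The mel spectrogram preprocessing step above can be sketched in pure NumPy. This is a minimal, self-contained illustration, not the repository's preprocessing script; all parameters (16 kHz sample rate, 512-point FFT, 160-sample hop, 80 mel bands) are assumed values for the sketch.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(y, sr=16000, n_fft=512, hop=160, n_mels=80):
    # Frame the waveform, window each frame, take the power spectrum,
    # then project onto the mel filterbank and compress with a log.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)

# Example: one second of a 440 Hz tone at 16 kHz.
t = np.linspace(0, 1, 16000, endpoint=False)
mels = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(mels.shape)  # (frames, mel bands)
```

The resulting (frames x 80) log-mel matrix is the kind of input the training scripts consume; in practice a library such as librosa computes this more robustly.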
This work is based on:
- Aaron van den Oord, Yazhe Li, and Oriol Vinyals. "Representation learning with contrastive predictive coding." arXiv preprint arXiv:1807.03748 (2018).
- Aaron van den Oord, and Oriol Vinyals. "Neural discrete representation learning." Advances in Neural Information Processing Systems. 2017.
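The VQ-CPC model used above combines these two ideas: contrastive predictive coding over a discrete, vector-quantized bottleneck. The core quantization step is a nearest-neighbour codebook lookup, sketched below with assumed sizes (512 codes of dimension 64); the repository's actual codebook configuration may differ.

```python
import numpy as np

def quantize(z, codebook):
    # Replace each continuous encoder vector with its nearest codebook
    # entry (Euclidean distance) -- the discrete bottleneck that discards
    # speaker-specific detail while keeping content.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))  # K=512 codes, dim 64 (assumed sizes)
z = rng.normal(size=(100, 64))         # 100 encoder output frames
z_q, idx = quantize(z, codebook)
print(z_q.shape, idx.shape)
```

Downstream modules (e.g. the vocoder) see only the quantized vectors `z_q` or the code indices `idx`, not the raw encoder output.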
@inproceedings{stoidis21_interspeech,
author={Dimitrios Stoidis and Andrea Cavallaro},
title={{Protecting Gender and Identity with Disentangled Speech Representations}},
year=2021,
booktitle={Proc. Interspeech 2021},
pages={1699--1703},
doi={10.21437/Interspeech.2021-2163}
}