Universal-representation-dynamics-of-deepfake-speech

This repo contains the implementation of the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection", by Yi Zhu, Saurabh Powar, and Tiago H. Falk.

Requirements and installations

  • Python == 3.10.2

  • PyTorch == 1.13.1

  • SpeechBrain == 0.5.14

  • torchaudio == 0.13.1

    cd YOUR-PROJECT-FOLDER
    git clone https://github.com/zhu00121/Universal-representation-dynamics-of-deepfake-speech.git
    cd Universal-representation-dynamics-of-deepfake-speech
    pip install -r requirements.txt

Data

We employed data from the ASVspoof 2019 LA track and the ASVspoof 2021 DF track. For both, we are unfortunately not authorised to redistribute the data and labels. Related information can be found on the challenge website.

2019 LA track

This track includes training, development, and evaluation sets, all zipped in the LA.zip file. Download link

2021 DF track

This track uses training and development data from the 2019 LA track, which is already included in the LA.zip file. The evaluation data can be accessed here.

Getting started

We offer two ways to replicate our results:

  1. Run python exps/train.py exps/hparams/XXX.yaml on your machine. This automatically trains and evaluates the model. Note that you may first need to unzip all the downloaded files and then edit the corresponding data paths in the .yaml file to point to your own data (see the sketch after this list).
  2. Run sbatch run.sh. This batch script was submitted to the Compute Canada cluster for model training and evaluation, so you may need to alter a few lines to meet your own requirements. The script moves all data to the desired folder, unzips it, and evaluates the models. More detailed instructions are provided in the batch_scripts folder.
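
If the training script follows the standard SpeechBrain pattern, the .yaml file is parsed with hyperpyyaml, and data paths can also be overridden programmatically instead of edited by hand. Below is a minimal sketch; the key name data_folder is a hypothetical placeholder, so check the actual path fields defined in your .yaml:

from hyperpyyaml import load_hyperpyyaml

# "data_folder" is a hypothetical key -- replace it with the actual
# path fields defined in exps/hparams/XXX.yaml.
overrides = {"data_folder": "/path/to/your/unzipped/data"}
with open("exps/hparams/XXX.yaml") as f:
    hparams = load_hyperpyyaml(f, overrides)
print(hparams["data_folder"])  # confirm the path points at your data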

Pre-trained models

We will soon release our pre-trained models in this repo.

Apply the modulation transformation block to other representations

One of the key elements of our model is the modulation transformation block, which converts any 2D (feature-by-time) representation into another 2D dynamic representation. We experimented with wav2vec 2.0 and WavLM in this project, but the transformation can be applied to other representations as well.

For flexibility, we define the modulator as an independent class in ssl_family.py, so it can be integrated with other DL model blocks. An example usage is provided below:

import torch
from ssl_family import modulator

MTB = modulator(
    sample_rate=50,   # frame rate (Hz) of the input representation
    win_length=128,
    hop_length=32,
)

x = torch.randn((1, 1000, 768))  # (batch, time, feature_channel)

output = MTB(x)
print(output.shape)
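
As a sketch of how the block can be fed an actual universal representation, the snippet below extracts wav2vec 2.0 features with torchaudio's pretrained bundle (our choice for illustration; the repo's own extractors live in ssl_family.py) and passes one layer through the modulator. wav2vec 2.0 produces frames at roughly 50 Hz, which is why sample_rate=50 is used above:

import torch
import torchaudio
from ssl_family import modulator

bundle = torchaudio.pipelines.WAV2VEC2_BASE
w2v2 = bundle.get_model()

MTB = modulator(sample_rate=50, win_length=128, hop_length=32)

waveform = torch.randn(1, bundle.sample_rate * 4)  # 4 s of dummy 16-kHz audio
with torch.inference_mode():
    # extract_features returns one (batch, time, 768) tensor per transformer layer
    layer_feats, _ = w2v2.extract_features(waveform)

dynamics = MTB(layer_feats[-1])  # modulation dynamics of the last layer
print(dynamics.shape)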

Below is a visualization of the modulation dynamics of different deepfakes (same speech content, same speaker).
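
To render a similar map for your own samples, a matplotlib sketch along these lines should work; the axis names, and the assumption that MTB returns a single 2D map per utterance, are ours rather than the repo's:

import matplotlib.pyplot as plt

img = output.squeeze(0)   # 'output' is the MTB result from the example above
if img.dim() > 2:         # if extra dimensions remain, show the first slice
    img = img[0]
plt.imshow(img.detach().numpy(), aspect="auto", origin="lower")
plt.xlabel("time frame")        # assumed layout of the dynamic representation
plt.ylabel("feature channel")
plt.colorbar()
plt.show()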

Contact

For questions, contact us at Yi.Zhu@inrs.ca.
