AudioVisualLip

Audio-Visual Voice Biometrics is an audio-visual speaker recognition task that leverages both the auditory and the visual speech in a video. Portrait- and linguistic-based speaker characteristics are extracted by modeling temporal dynamics. The task encompasses both conventional speaker recognition and lip biometrics.

Introduction

This is the official implementation of the ICASSP 2023 paper "Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification".

Datasets

Please see ./preprocess to extract the lip regions for the training and test datasets.

Running

Once the lip data for the training and test sets is ready, run ./main_audiovisuallip_DATASET_CM.py (where DATASET is lrs3 or vox) for both training and testing; you switch between them by changing the stage in the code. Before running, edit ./conf/config_audiovisuallip_DATASET_new.yaml to match your own configuration.
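As a minimal sketch of how the DATASET placeholder expands, the helper below builds the script and config paths for a given dataset tag. The function name resolve_paths is hypothetical and not part of this repository, and the config file names are an assumption obtained by substituting the tag into the pattern above. The stage itself is still set inside the main script; this helper does not touch it.

```python
from pathlib import Path
from typing import Tuple

# Dataset tags taken from the main_audiovisuallip_*_CM.py script names.
KNOWN_DATASETS = ("lrs3", "vox")


def resolve_paths(dataset: str, root: str = ".") -> Tuple[Path, Path]:
    """Return (script_path, config_path) for a dataset tag.

    Hypothetical helper: it only fills the DATASET placeholder in the
    file name patterns given in this README. It does not check that
    the files exist on disk.
    """
    if dataset not in KNOWN_DATASETS:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {KNOWN_DATASETS}"
        )
    base = Path(root)
    script = base / f"main_audiovisuallip_{dataset}_CM.py"
    # Assumed name: the README pattern with DATASET replaced by the tag.
    config = base / "conf" / f"config_audiovisuallip_{dataset}_new.yaml"
    return script, config


if __name__ == "__main__":
    for tag in KNOWN_DATASETS:
        script, config = resolve_paths(tag)
        print(f"{tag}: run {script} after editing {config}")
```

Running the snippet prints, for each dataset tag, the script to launch and the config file to edit first.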

Results

Pretrained models

The pretrained audio-only and visual-only models are available here: https://drive.google.com/drive/folders/1IalsNtmDH-qFnfgmn_O92J1MUHCaQepl?usp=sharing

Reference

AVLip:

@inproceedings{liu2023cross,
  title={Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification},
  author={Liu, Meng and Lee, Kong Aik and Wang, Longbiao and Zhang, Hanyi and Zeng, Chang and Dang, Jianwu},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}

DeepLip:

@inproceedings{liu2021deeplip,
  title={DeepLip: A Benchmark for Deep Learning-Based Audio-Visual Lip Biometrics},
  author={Liu, Meng and Wang, Longbiao and Lee, Kong Aik and Zhang, Hanyi and Zeng, Chang and Dang, Jianwu},
  booktitle={2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages={122--129},
  year={2021},
  organization={IEEE}
}
