Seeking the Shape of Sound

An implementation of the CVPR 2021 paper: Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association

Environments

Ubuntu 16.04
CUDA 10.2
Python 3.7.3
Pytorch 1.4.0

See requirements.txt.
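
A quick way to compare a local setup against the versions listed above is sketched below. It is an illustration only: the helper names `parse_version` and `check_environment` are hypothetical and not part of this repository, and comparing only major.minor versions is an assumption about how strict the match needs to be.

```python
import sys

# Versions taken from the Environments list above.
EXPECTED_PYTHON = (3, 7)
EXPECTED_TORCH = "1.4.0"


def parse_version(text):
    """Turn a string such as '1.4.0+cu102' into (1, 4, 0), dropping any local suffix."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def check_environment():
    """Return a list of warning strings; an empty list means major.minor versions match."""
    warnings = []
    if tuple(sys.version_info[:2]) != EXPECTED_PYTHON:
        warnings.append(
            "Python %d.%d found, README lists 3.7.3" % tuple(sys.version_info[:2])
        )
    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        warnings.append("PyTorch is not installed, README lists 1.4.0")
    else:
        if parse_version(torch.__version__)[:2] != parse_version(EXPECTED_TORCH)[:2]:
            warnings.append(
                "PyTorch %s found, README lists %s" % (torch.__version__, EXPECTED_TORCH)
            )
    return warnings
```

Calling `check_environment()` and printing each returned string gives a short report. Newer versions may still work; the list only reflects what the code was developed with.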

Data preparation

Download VoxCeleb and VGGFace, and unzip them into ./data.

Because of file size limits, only some of the query lists are included in ./data. The other lists used in the paper can be downloaded from Google drive or Baidu drive (passwd: rfri).

Training

Download pretrained models for backbones into ./pretrained_models.

Google drive:

SE-ResNet-50

Thin-ResNet-34

Baidu drive:

SE-ResNet-50 (passwd: jy55)

Thin-ResNet-34 (passwd: tc6i)

Train the model and update identity weights:

python3 train.py config/train_reweight.yaml

Extract identity weights from the saved model file:

python3 extract_id_weight.py config/train_reweight.yaml

Retrain the final model:

python3 train.py config/train_main.yaml

The model and log are saved in save/vox1_train/Voice2Face/main by default.
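
The three commands above must run in order, since each stage depends on the output of the previous one. A minimal Python sketch of that sequence follows. The names `STAGES`, `build_commands` and `run_pipeline` are hypothetical helpers introduced for illustration, and the sketch assumes it is run from the repository root.

```python
import subprocess
import sys

# The three training stages from this section, in the order they must run.
STAGES = [
    ("train the model and update identity weights",
     ["train.py", "config/train_reweight.yaml"]),
    ("extract identity weights from the saved model file",
     ["extract_id_weight.py", "config/train_reweight.yaml"]),
    ("retrain the final model",
     ["train.py", "config/train_main.yaml"]),
]


def build_commands(python=sys.executable):
    """Return one argument list per stage, prefixed with the interpreter to use."""
    return [[python] + args for _, args in STAGES]


def run_pipeline(python=sys.executable, dry_run=False):
    """Run each stage in order and stop at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the list of commands that were processed.
    """
    processed = []
    for (label, _), command in zip(STAGES, build_commands(python)):
        print("Stage: %s -> %s" % (label, " ".join(command)))
        if not dry_run:
            # check=True raises CalledProcessError if the stage exits non-zero.
            subprocess.run(command, check=True)
        processed.append(command)
    return processed
```

`run_pipeline(dry_run=True)` prints the commands without starting any training, which is a cheap way to confirm the order before committing GPU time.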

Evaluation

Download the pretrained model from Google drive or Baidu drive (passwd: 4vyf).
Edit config/train_main.yaml: set resume_eval to the path where the model is saved.
Run:

python3 eval.py config/train_main.yaml

Expected results (%):

	1:2 Matching (U)	1:2 Matching (G)	Verification (U)	Verification (G)	Retrieval
Voice-to-Face	87.2	77.7	87.2	77.5	5.5
Face-to-Voice	86.5	75.3	87.0	76.1	5.8

The results might slightly differ from the above due to random factors in the training process.
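
To judge whether a run is within the expected range, the numbers from the table can be compared against your own results. The sketch below is one way to do that. The 1.0-point default tolerance is an arbitrary assumption, not a figure from the paper, and the `measured` dictionary has to be filled in by hand from the output of eval.py, whose format is not described here.

```python
# Expected results (%) copied from the table above.
EXPECTED = {
    "Voice-to-Face": {
        "1:2 Matching (U)": 87.2,
        "1:2 Matching (G)": 77.7,
        "Verification (U)": 87.2,
        "Verification (G)": 77.5,
        "Retrieval": 5.5,
    },
    "Face-to-Voice": {
        "1:2 Matching (U)": 86.5,
        "1:2 Matching (G)": 75.3,
        "Verification (U)": 87.0,
        "Verification (G)": 76.1,
        "Retrieval": 5.8,
    },
}


def compare_to_expected(measured, tolerance=1.0):
    """List metrics that are missing or differ from EXPECTED by more than tolerance.

    measured uses the same nested layout as EXPECTED.
    Each entry is (direction, metric, expected_value, measured_value_or_None).
    """
    deviations = []
    for direction, metrics in EXPECTED.items():
        for metric, expected in metrics.items():
            got = measured.get(direction, {}).get(metric)
            if got is None or abs(got - expected) > tolerance:
                deviations.append((direction, metric, expected, got))
    return deviations
```

An empty list means every metric is within the chosen tolerance of the reported value.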

References

If this code is helpful to you, please consider citing our paper:

@inproceedings{wen2021seeking,
  title={Seeking the shape of sound: An adaptive framework for learning voice-face association},
  author={Wen, Peisong and Xu, Qianqian and Jiang, Yangbangyan and Yang, Zhiyong and He, Yuan and Huang, Qingming},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
