Densely Connected Time Delay Neural Network

PyTorch implementation of the Densely Connected Time Delay Neural Network (D-TDNN) from our paper "Densely Connected Time Delay Neural Network for Speaker Verification" (INTERSPEECH 2020).

News

[2023-05-04] 3D-Speaker supports training of the CAM++ model and can be easily extended to train the raw D-TDNN and CAM models. The project also released a Chinese speaker embedding model trained on 200k speakers and an English speaker embedding model trained on VoxCeleb.

[2023-03-04] CAM++ achieved superior performance with lower computational complexity and faster inference speed than the popular ECAPA-TDNN and ResNet34 systems.

H. Wang, S. Zheng, Y. Chen, L. Cheng, and Q. Chen, "CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking"

Model	VoxCeleb1-E	VoxCeleb1-H	CN-Celeb
ECAPA-TDNN	1.07/0.1185	1.98/0.1956	7.45/0.4127
D-TDNN	1.63/0.1748	2.86/0.2571	8.41/0.4683
CAM	1.18/0.1257*	2.15/0.1966*	-
CAM++	0.89/0.0995	1.76/0.1729	6.78/0.3830

[2021-09-05] TimeDelay is replaced by Conv1d by default, since convolution is better optimized in all kinds of deep learning frameworks (Note: The pretrained models are directly converted from the old ones so that the results might be slightly different from those in the paper).
[2021-08-28] D-TDNN and D-TDNN-SS outperform the SOTA system on the AP20-OLR-dialect-task of the oriental language recognition (OLR) challenge 2020 (WeChat article / paper), showing their potential on other speech processing tasks.
[2021-02-01] CAM adopts the D-TDNN backbone and is enhanced by context-aware masking.

Y.-Q. Yu, S. Zheng, H. Suo, Y. Lei, and W.-J. Li, "CAM: Context-Aware Masking for Robust Speaker Verification" (ICASSP 2021)

Model	VoxCeleb1-E	VoxCeleb1-H
CAM	1.18/0.1257	2.15/0.1966
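
To illustrate the 2021-09-05 note: a time-delay layer that splices frames at fixed offsets and then applies an affine transform computes the same result as a dilated 1-D convolution. The sketch below is not the repository's TimeDelay module. The context offsets, the output size, and the helper `time_delay` are assumptions chosen for illustration; only the 30-dim input follows the MFCC setup described in this README.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

in_dim, out_dim = 30, 8      # 30-dim MFCC input; output size is hypothetical
context = [-2, 0, 2]         # hypothetical, evenly spaced context offsets
dilation = context[1] - context[0]

# Dilated convolution covering the same three frames per output step.
conv = nn.Conv1d(in_dim, out_dim, kernel_size=len(context), dilation=dilation)


def time_delay(x, weight, bias, context):
    """Splice frames at the given offsets, then apply an affine transform.

    x: (batch, in_dim, frames); weight: (out_dim, in_dim, len(context)).
    """
    frames = x.size(2)
    start, end = -context[0], frames - context[-1]
    # Stack shifted views: (batch, in_dim, len(context), valid_frames).
    spliced = torch.stack([x[:, :, start + c:end + c] for c in context], dim=2)
    # Contract over input channels and context positions.
    return torch.einsum("bckt,ock->bot", spliced, weight) + bias.view(1, -1, 1)


x = torch.randn(2, in_dim, 50)
with torch.no_grad():
    y_conv = conv(x)
    y_tdnn = time_delay(x, conv.weight, conv.bias, context)

# Both paths should agree up to floating-point error.
print(y_conv.shape, torch.allclose(y_conv, y_tdnn, atol=1e-4))
```

Reusing the convolution's own weights in the spliced version makes the comparison direct: any difference between the two outputs would come from the frame indexing, not from the parameters.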

Pretrained Models

We provide pretrained models that can be used in many tasks, such as:

Speaker Verification
Speaker-Dependent Speech Separation
Multi-Speaker Text-to-Speech
Voice Conversion
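
For speaker verification, two utterance embeddings are commonly compared with cosine similarity, which is also the backend listed for D-TDNN in the evaluation table. The following is a minimal, dependency-free sketch; the function names, the toy 3-dim embeddings, and the threshold value are hypothetical and do not come from this repository.

```python
import math


def cosine_score(emb_a, emb_b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Accept the trial if the score reaches a (hypothetical) threshold."""
    return cosine_score(emb_a, emb_b) >= threshold


# Toy embeddings; real ones would be 512- or 128-dim model outputs.
enroll = [1.0, 2.0, 3.0]
test_close = [2.0, 4.0, 6.0]    # same direction as enroll
test_far = [3.0, 0.0, -1.0]     # orthogonal to enroll

print(same_speaker(enroll, test_close), same_speaker(enroll, test_far))
```

In practice the threshold would be tuned on a development set rather than fixed in advance.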

Usage

Data preparation

You can either use the Kaldi toolkit:

Download VoxCeleb1 test set and unzip it.
Place prepare_voxceleb1_test.sh under $kaldi_root/egs/voxceleb/v2 and change the $datadir and $voxceleb1_root in it.
Run chmod +x prepare_voxceleb1_test.sh && ./prepare_voxceleb1_test.sh to generate 30-dim MFCCs.
Place the trials under $datadir/test_no_sil.

Or check out the kaldifeat branch if you do not want to install Kaldi.
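
As a sanity check on the trials file, the sketch below parses trial lines under the assumption that they follow the usual Kaldi VoxCeleb convention of `<utt1> <utt2> <target|nontarget>`. The function name `read_trials` and the utterance IDs in the example are hypothetical; how `evaluate.py` reads the file is not shown in this README.

```python
def read_trials(lines):
    """Parse trial lines into (utt1, utt2, is_target) tuples.

    Assumes each non-empty line has three whitespace-separated fields,
    the last being 'target' or 'nontarget'.
    """
    trials = []
    for line_no, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 3 or parts[2] not in ("target", "nontarget"):
            raise ValueError(f"malformed trial on line {line_no}: {line!r}")
        trials.append((parts[0], parts[1], parts[2] == "target"))
    return trials


# Placeholder utterance IDs, not real VoxCeleb1 entries.
example = [
    "spkA-utt1 spkA-utt2 target",
    "spkA-utt1 spkB-utt1 nontarget",
    "",
]
trials = read_trials(example)
print(len(trials), sum(is_target for _, _, is_target in trials))
```

To check a real file, pass an open file handle instead of the list, for example `read_trials(open(path))` with `path` pointing at the trials file under `$datadir/test_no_sil`.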

Test

Download the pretrained D-TDNN model and run:

python evaluate.py --root $datadir/test_no_sil --model D-TDNN --checkpoint dtdnn.pth --device cuda

Evaluation

VoxCeleb1-O

Model	Emb.	Params (M)	Loss	Backend	EER (%)	DCF_0.01	DCF_0.001
TDNN	512	4.2	Softmax	PLDA	2.34	0.28	0.38
E-TDNN	512	6.1	Softmax	PLDA	2.08	0.26	0.41
F-TDNN	512	12.4	Softmax	PLDA	1.89	0.21	0.29
D-TDNN	512	2.8	Softmax	Cosine	1.81	0.20	0.28
D-TDNN-SS (0)	512	3.0	Softmax	Cosine	1.55	0.20	0.30
D-TDNN-SS	512	3.5	Softmax	Cosine	1.41	0.19	0.24
D-TDNN-SS	128	3.1	AAM-Softmax	Cosine	1.22	0.13	0.20
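
The EER and DCF columns can be computed from a list of trial scores and target/nontarget labels. The sketch below is a simplified, dependency-free version and not the repository's `metric.py`. It assumes that DCF_0.01 and DCF_0.001 denote the normalized minimum detection cost with target priors of 0.01 and 0.001 and unit miss and false-alarm costs; tied scores are not grouped, and the EER is taken at the sweep point where the two error rates are closest rather than interpolated.

```python
def error_rates(scores, labels):
    """Sweep a threshold upward over the sorted scores.

    Returns parallel lists of miss rates and false-alarm rates, starting
    from a threshold below every score (accept all trials).
    """
    pairs = sorted(zip(scores, labels))
    n_tar = sum(1 for _, is_target in pairs if is_target)
    n_non = len(pairs) - n_tar
    miss, fa = 0, n_non
    fnr, fpr = [0.0], [1.0]
    for _, is_target in pairs:
        # Raising the threshold just above this score rejects this trial.
        if is_target:
            miss += 1
        else:
            fa -= 1
        fnr.append(miss / n_tar)
        fpr.append(fa / n_non)
    return fnr, fpr


def compute_eer(fnr, fpr):
    """Error rate at the sweep point where miss and false alarm are closest."""
    idx = min(range(len(fnr)), key=lambda i: abs(fnr[i] - fpr[i]))
    return (fnr[idx] + fpr[idx]) / 2


def compute_min_dcf(fnr, fpr, p_target, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost for a given target prior."""
    cost = min(c_miss * p_target * m + c_fa * (1 - p_target) * f
               for m, f in zip(fnr, fpr))
    return cost / min(c_miss * p_target, c_fa * (1 - p_target))


# Toy scores; 1 marks a target trial, 0 a nontarget trial.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
fnr, fpr = error_rates(scores, labels)
eer = compute_eer(fnr, fpr)
min_dcf = compute_min_dcf(fnr, fpr, p_target=0.01)
print(f"EER = {100 * eer:.2f}%, minDCF(0.01) = {min_dcf:.4f}")
```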

Citation

If you find that D-TDNN helps your research, please cite:

@inproceedings{DBLP:conf/interspeech/YuL20,
  author    = {Ya-Qi Yu and
               Wu-Jun Li},
  title     = {Densely Connected Time Delay Neural Network for Speaker Verification},
  booktitle = {Annual Conference of the International Speech Communication Association (INTERSPEECH)},
  pages     = {921--925},
  year      = {2020}
}

Revision of the Paper

References:

[16] X. Li, W. Wang, X. Hu, and J. Yang, "Selective Kernel Networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 510-519.
