Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution

TensorFlow implementation of "Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution" (INTERSPEECH 2020).

Abstract

Keyword spotting (KWS) plays a vital role in human-computer interaction for smart on-device terminals and service robots. Achieving a good trade-off between small footprint and high accuracy remains challenging for the KWS task. In this paper, we explore the application of multi-scale temporal modeling to the small-footprint keyword spotting task. We propose a multi-branch temporal convolution module (MTConv), a CNN block consisting of multiple temporal convolution filters with different kernel sizes, which enriches the temporal feature space. In addition, taking advantage of temporal and depthwise convolution, we design a temporal efficient neural network (TENet) for KWS systems. Based on the proposed model, we replace the standard temporal convolution layers with MTConvs, which can be trained for better performance. At the inference stage, each MTConv can be equivalently converted to the base convolution architecture, so no extra parameters or computational cost are added compared to the base model. The results on the Google Speech Commands Dataset show that one of our models trained with MTConv achieves an accuracy of 96.8% with only 100K parameters.
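The claim that an MTConv can be converted to a single base convolution rests on the linearity of convolution. The following is a minimal sketch of that idea, not the code in tenet_fusion.py. It assumes a single channel, odd kernel sizes, zero "same" padding, and no bias or batch normalization. The helper names (conv1d_same, fuse_kernels) and the weight values are hypothetical and chosen only for illustration.

```python
def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel using zero 'same' padding."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(x))
    ]


def fuse_kernels(kernels):
    """Zero-pad each odd-length kernel to the largest size (centred) and sum them."""
    size = max(len(k) for k in kernels)
    fused = [0.0] * size
    for k in kernels:
        offset = (size - len(k)) // 2  # centre the smaller kernel
        for i, w in enumerate(k):
            fused[offset + i] += w
    return fused


# Hypothetical branch weights for kernel sizes 3, 5, 7 and 9.
branches = [
    [0.5, -1.0, 0.25],
    [0.1, 0.2, 0.3, 0.2, 0.1],
    [-0.3, 0.0, 0.4, 1.0, 0.4, 0.0, -0.3],
    [0.05, 0.1, 0.0, -0.2, 0.6, -0.2, 0.0, 0.1, 0.05],
]
signal = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5, 2.5, -0.5]

# Training-time view: run every branch and add the outputs.
multi_branch = [
    sum(vals) for vals in zip(*(conv1d_same(signal, k) for k in branches))
]

# Inference-time view: one convolution with the fused size-9 kernel.
single = conv1d_same(signal, fuse_kernels(branches))

# The two outputs agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(multi_branch, single))
print(max_diff < 1e-9)
```

Because the fused kernel has the size of the largest branch, the converted layer costs the same as a single convolution of that size, which is consistent with the statement above that no extra parameters are added at inference. If the branches also contain normalization layers, those would have to be folded into the weights before summing. That step is not shown here.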

Requirements

tensorflow==1.15.0
pandas

Run experiments

Train the base TENet12 model on the Google Speech Commands Dataset v0.01:

python -m main --dataset_path ${DATASET_PATH} --arch TENet12Model --save_folder ${SAVE_PATH}

Evaluate the model on the test set:

python -m main --mod eval --dataset_path ${DATASET_PATH} --dataset_name test --arch TENet12Model \
--checkpoint_path ${SAVE_PATH}/TENet12Model-30000

Train TENet12 with MTConvs:

python -m main --dataset_path ${DATASET_PATH} --arch TENet12Model --kernel_list 3,5,7,9 --save_folder ${SAVE_PATH}

Convert TENet12 with MTConvs to the base TENet12 and evaluate the converted model:

python -m tenet_fusion --arch TENet12Model --kernel_list 3,5,7,9 \
--save_folder ${SAVE_PATH} --checkpoint_path ${TENet12_MTConvs_CHECKPOINT_PATH}

python -m main --dataset_path ${DATASET_PATH} --mod eval --dataset_name test \
--arch TENet12Model --checkpoint_path ${SAVE_PATH}/TENet12Model-30000

Citation

If you find our work useful for your research, please consider citing the paper:

@inproceedings{Li2020,
  author={Ximin Li and Xiaodong Wei and Xiaowei Qin},
  title={{Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution}},
  year=2020,
  booktitle={Proc. Interspeech 2020}
}

Reference

The implementation of TC-ResNet: https://github.com/hyperconnect/TC-ResNet.
