- paper link
- source code is given in the speech_translation branch
This paper aims to improve end-to-end speech translation (ST) by improving the quality of speech features through feature selection. We argue that speech signals are often noisy and lengthy, containing a large amount of redundant signal that contributes little to speech recognition, and thus also little to speech translation. Our solution is to discard those transcript-irrelevant features so that speech translation models can access more meaningful speech signals, easing the learning of speech-target translation correspondence/alignment.
We propose adaptive feature selection (AFS), based on L0Drop, which learns to route information through a subset of speech features to support speech tasks. The learning process is automatic, with a hyperparameter controlling the degree of sparsity induced. The figure below shows the training procedure with AFS:
The example below shows the positions of the selected features used for speech translation:
In our experiments, we observe substantial BLEU improvements compared against an ASR-pretrained ST baseline, while our method filters out ~85% of the speech features (with a ~1.4x decoding speedup as a by-product).
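To make the selection mechanism concrete, below is a minimal NumPy sketch of the hard-concrete gating that underlies L0Drop: each speech frame gets a learned logit, the gate can reach exactly zero, and zero-gated frames are dropped. This is an illustrative toy (function and variable names such as `hard_concrete_gate` and `select_features` are our own, not from the codebase), not the actual implementation in the `speech_translation` branch.

```python
import numpy as np

def hard_concrete_gate(log_alpha, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1, u=None):
    """Hard-concrete gate (the distribution behind L0Drop).

    With a uniform sample u the gate is stochastic (training); without it,
    the gate is deterministic (inference). Stretching by (gamma, zeta) and
    clipping to [0, 1] lets gates be exactly 0 or 1.
    """
    if u is None:  # deterministic gate at inference time
        s = 1.0 / (1.0 + np.exp(-log_alpha))
    else:  # noisy gate at training time
        s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1.0 - u) + log_alpha) / beta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

def select_features(features, log_alpha):
    """Drop time steps whose gate is exactly zero; rescale the survivors."""
    g = hard_concrete_gate(log_alpha)          # shape (T,)
    keep = g > 0.0                             # boolean mask over frames
    return features[keep] * g[keep, None], keep

# Toy example: 6 speech frames with 4-dim features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4))
# Learned per-frame logits; strongly negative logits close their gates.
log_alpha = np.array([5.0, -8.0, 4.0, -8.0, -8.0, 3.0])
pruned, keep = select_features(feats, log_alpha)
sparsity = 1.0 - keep.mean()  # fraction of frames pruned away
```

With these logits, half of the frames receive a zero gate and are removed; in the real model an L0-style penalty pushes most logits negative, which is where the ~85% sparsity rates reported below come from.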
In short, our work demonstrates that E2E ST suffers from redundant speech features, with sparsification bringing significant performance improvements. The E2E ST task offers new opportunities for follow-up research in sparse models to deliver performance gains, apart from enhancing efficiency and/or interpretability.
Please go to the speech_translation branch for more details, where we provide an example for training/evaluation.
We provide pretrained models for MuST-C En-De and LibriSpeech En-Fr. We also provide our models' translations for each test set.
- BLEU score and sparsity rate on the MuST-C corpus. Our model outperforms the baselines substantially.
Metric | Model | De | Es | Fr | It | Nl | Pt | Ro | Ru |
---|---|---|---|---|---|---|---|---|---|
BLEU | ST | 17.44 | 23.85 | 28.43 | 19.54 | 21.23 | 22.55 | 17.66 | 12.10 |
| ST+ASR-PT | 20.67 | 25.96 | 32.24 | 20.84 | 23.27 | 24.83 | 19.94 | 13.96 |
| ST+AFS-t | 21.57 | 26.78 | 33.34 | 23.08 | 24.68 | 26.13 | 21.73 | 15.10 |
| ST+AFS-tf | 22.38 | 27.04 | 33.43 | 23.35 | 25.05 | 26.55 | 21.87 | 14.92 |
Sparsity Rate | ST+AFS-t | 84.4% | 84.5% | 83.2% | 84.9% | 84.4% | 84.4% | 84.7% | 84.2% |
| ST+AFS-tf | 85.1% | 84.5% | 84.7% | 84.9% | 83.5% | 85.1% | 84.8% | 84.7% |
- We offer our models' translations to ease direct comparison for follow-up studies.
Model | De | Es | Fr | It | Nl | Pt | Ro | Ru |
---|---|---|---|---|---|---|---|---|
ST | txt | txt | txt | txt | txt | txt | txt | txt |
ST+ASR-PT | txt | txt | txt | txt | txt | txt | txt | txt |
ST+AFS-t | txt | txt | txt | txt | txt | txt | txt | txt |
ST+AFS-tf | txt | txt | txt | txt | txt | txt | txt | txt |
- For MuST-C En-De, we also provide the preprocessed dataset for download (note: it is very large, ~66G). In addition, we provide the trained models below.
Model | MuST-C EnDe |
---|---|
ST | model |
ST+ASR-PT | model |
ST+AFS-t | model |
ST+AFS-tf | model |
For LibriSpeech En-Fr, similar to MuST-C, we provide the preprocessed dataset (~16G), translation performance, translation outputs, and pretrained models.
Model | LibriSpeech EnFr (BLEU) |
---|---|
ST | 14.32 txt model |
ST+ASR-PT | 17.05 txt model |
ST+AFS-t | 18.33 txt model |
ST+AFS-tf | 18.56 txt model |
Please go to AFS for E2E ST for more details.
Please consider citing our paper as follows:
Biao Zhang; Ivan Titov; Barry Haddow; Rico Sennrich (2020). Adaptive Feature Selection for End-to-End Speech Translation. In Findings of the Association for Computational Linguistics: EMNLP 2020.
@inproceedings{zhang-etal-2020-adaptive,
title = "Adaptive Feature Selection for End-to-End Speech Translation",
author = "Zhang, Biao and
Titov, Ivan and
Haddow, Barry and
Sennrich, Rico",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2020.findings-emnlp.230",
doi = "10.18653/v1/2020.findings-emnlp.230",
pages = "2533--2544",
abstract = "Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features. In this paper, we propose adaptive feature selection (AFS) for encoder-decoder based E2E ST. We first pre-train an ASR encoder and apply AFS to dynamically estimate the importance of each encoded speech feature to ASR. A ST encoder, stacked on top of the ASR encoder, then receives the filtered features from the (frozen) ASR encoder. We take L0DROP (Zhang et al., 2020) as the backbone for AFS, and adapt it to sparsify speech features with respect to both temporal and feature dimensions. Results on LibriSpeech EnFr and MuST-C benchmarks show that AFS facilitates learning of ST by pruning out {\textasciitilde}84{\%} temporal features, yielding an average translation gain of {\textasciitilde}1.3-1.6 BLEU and a decoding speedup of {\textasciitilde}1.4x. In particular, AFS reduces the performance gap compared to the cascade baseline, and outperforms it on LibriSpeech En-Fr with a BLEU score of 18.56 (without data augmentation).",
}