Note: this repository was archived by the owner on Oct 31, 2023 and is now read-only.

Learning Audio-Visual Dereverberation

Motivation

Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition. Prior work attempts to remove reverberation based on the audio modality only. Our idea is to learn to dereverberate speech from audio-visual observations. The visual environment surrounding a human speaker reveals important cues about the room geometry, materials, and speaker location, all of which influence the precise reverberation effects in the audio stream. We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene. In support of this new task, we develop a large-scale dataset that uses realistic acoustic renderings of speech in real-world 3D scans of homes offering a variety of room acoustics. Demonstrating our approach on both simulated and real imagery for speech enhancement, speech recognition, and speaker identification, we show it achieves state-of-the-art performance and substantially improves over traditional audio-only methods.

Citation

If you find this code useful, please cite the following paper:

@article{chen22av_dereverb,
  title     =     {Learning Audio-Visual Dereverberation},
  author    =     {Changan Chen and Wei Sun and David Harwath and Kristen Grauman},
  journal   =     {arXiv},
  year      =     {2022}
}

Installation

Install this repo with pip by running the following command:

pip install -e .

Usage

  1. Training

     python vida/trainer.py --model-dir data/models/vida --num-channel 2 --use-depth --use-rgb --log-mag --no-mask --phase-loss sin --phase-weight 0.1 --use-triplet-loss --exp-decay --triplet-margin 0.5 --mean-pool-visual --overwrite

  2. Evaluation

     python vida/evaluator.py --pretrained-path data/models/vida/best_val.pth --num-channel 2 --log-mag --no-mask --est-pred --use-rgb --use-depth --mean-pool-visual --eval-dereverb
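
The training command enables a triplet loss with margin 0.5 (`--use-triplet-loss --triplet-margin 0.5`). As a rough illustration of the concept only (not the authors' exact implementation, and the function name here is hypothetical), a margin-based triplet loss over embedding vectors can be sketched as:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Margin-based triplet loss: pull the anchor toward the positive
    embedding and push it at least `margin` further from the negative."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: the positive is close to the anchor, the negative far,
# so the margin is already satisfied and the loss is zero.
anchor = np.array([1.0, 0.0])
positive = np.array([1.0, 0.1])
negative = np.array([-1.0, 0.0])
print(triplet_margin_loss(anchor, positive, negative))  # prints 0.0
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so only violating triplets contribute gradient during training.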

Data

See the data page for instructions on how to download the data.

Contributing

See the CONTRIBUTING file for how to help out.

License

This repo is CC-BY-NC licensed, as found in the LICENSE file.
