DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

Overview

This repository contains a video demo of the IEEE/CVF International Conference on Computer Vision (ICCV) 2023 paper titled "DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding".

Demo video

The demo video contains the original speech, the speech generated by previous state-of-the-art works [1-3], and the speech generated by the proposed method, for three different speakers from each of the LRS2 and LRS3 datasets. The demo video is located in the demo-video folder of this repository and is also available on YouTube:

References

[1] Kim, Minsu, Joanna Hong, and Yong Man Ro. "Lip to speech synthesis with visual context attentional GAN." Advances in Neural Information Processing Systems 34 (2021): 2758-2770.

[2] Mira, Rodrigo, et al. "SVTS: scalable video-to-speech synthesis." arXiv preprint arXiv:2205.02058 (2022).

[3] Kim, Minsu, Joanna Hong, and Yong Man Ro. "Lip-to-speech synthesis in the wild with multi-task learning." ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023.
