Deep VSR (speechreading) with Face Inputs

Paper

Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition, FG 2020.
https://doi.ieeecomputersociety.org/10.1109/FG47880.2020.00134

Description

We provide PyTorch code and pretrained models for evaluating the methods in our paper Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition. Models are available for the LRW dataset (English) and the LRW-1000 dataset (Mandarin Chinese).

Content

Model Zoo
Citation
License
Contact

Model Zoo

Coming soon...

Citation

@inproceedings{zhang2020can,
    author = {Y. Zhang and S. Yang and J. Xiao and S. Shan and X. Chen},
    booktitle = {2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020)},
    title = {Can We Read Speech Beyond the Lips? {R}ethinking {RoI} Selection for Deep Visual Speech Recognition},
    year = {2020},
    pages = {851-858},
    keywords = {visual speech recognition},
    doi = {10.1109/FG47880.2020.00134},
    url = {https://doi.ieeecomputersociety.org/10.1109/FG47880.2020.00134},
    publisher = {IEEE Computer Society},
    address = {Los Alamitos, CA, USA}
}

License

TBD

Contact

Yuanhang Zhang
