In this repository, I use k2, icefall and Lhotse for lip reading. I am adapting the code to the lip-reading task and plan to add more lip-reading datasets.

luomingshuang/lipreading_with_icefall

Installation

Please refer to https://icefall.readthedocs.io/en/latest/installation/index.html for installation.

Recipes

Please refer to https://icefall.readthedocs.io/en/latest/recipes/index.html for more information.

We provide two recipes at present: yesno and GRID.

yesno

This is the simplest ASR recipe in icefall and can be run on CPU. Training takes less than 30 seconds and gives you the following WER:

[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
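As a rough illustration of how the percentage in that line relates to the counts in the brackets, here is a minimal sketch. It assumes the usual WER definition, (insertions + deletions + substitutions) divided by the number of reference words. The function name `wer_percent` is hypothetical and is not part of icefall.

```python
def wer_percent(ins: int, dels: int, subs: int, ref_words: int) -> float:
    """Word error rate in percent, from error counts and reference length."""
    if ref_words <= 0:
        raise ValueError("ref_words must be positive")
    return 100.0 * (ins + dels + subs) / ref_words


# Counts taken from the yesno result line: 1 error out of 240 reference words.
print(f"%WER {wer_percent(ins=0, dels=1, subs=0, ref_words=240):.2f}%")
# prints "%WER 0.42%"
```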

We provide a Colab notebook for this recipe: Open In Colab

GRID

For the VSR (visual speech recognition) task, we provide two models: Conv3d Map BiGRU CTC model and Conv3d ResNet18 BiGRU CTC model.
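All models in this recipe are trained with CTC. As a hedged illustration of what a CTC output looks like before it becomes a token sequence, the sketch below applies the standard collapse rule: merge consecutive repeated labels, then drop blanks. It assumes the blank symbol has index 0 and uses simple greedy (best-path) decoding; the recipes themselves may decode differently, for example with k2 lattices. The name `ctc_collapse` is hypothetical.

```python
from itertools import groupby
from typing import List


def ctc_collapse(frame_ids: List[int], blank: int = 0) -> List[int]:
    """Collapse a per-frame label sequence with the CTC rule.

    Consecutive repeats are merged first, then blank labels are removed.
    """
    return [label for label, _ in groupby(frame_ids) if label != blank]


# Per-frame argmax labels (assumed example data, blank = 0).
frames = [0, 1, 1, 0, 1, 2, 2, 0]
print(ctc_collapse(frames))  # prints [1, 1, 2]
```

The blank between the two runs of label 1 is what keeps them as two separate tokens after collapsing.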

Conv3d Map BiGRU CTC Model

The WER for this model on TEST is 15.68%.

We provide a Colab notebook to run a pre-trained Conv3d Map BiGRU CTC model: Open In Colab

Conv3d ResNet18 BiGRU CTC Model

The WER for this model on TEST is 13.63%.

We provide a Colab notebook to run a pre-trained Conv3d ResNet18 BiGRU CTC model: Open In Colab

For the ASR (automatic speech recognition) task, we provide one model: Tdnn Lstm CTC model.

Tdnn Lstm CTC Model

The WER for this model on TEST is 2.35%.

We provide a Colab notebook to run a pre-trained Tdnn Lstm CTC model: Open In Colab

For the AVSR (audio-visual speech recognition) task, we provide one model: CombineNet CTC model.

CombineNet CTC Model

The WER for this model on TEST is 1.71%.

We provide a Colab notebook to run a pre-trained CombineNet CTC model: Open In Colab

Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.

Please refer to the documentation https://icefall.readthedocs.io/en/latest/recipes/librispeech/conformer_ctc.html#deployment-with-c for how to do this.

We also provide a Colab notebook showing how to run a TorchScript model in k2 with C++: Open In Colab
