
Lipreading Chainer

This is the Chainer code for the paper Combining Residual Networks with LSTMs for Lipreading. You can find the paper here.
The authors present a word-level lipreading model based on ResNets. The input to the model is a silent video clip, and the model outputs the word it predicts was spoken. The paper frames this visual speech recognition task as video classification.
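
For orientation, here is a minimal Chainer sketch of such an architecture. It is an illustration rather than the repository's actual model: the layer sizes, the single 3D convolution standing in for the paper's 3D-conv + ResNet front-end, and the bidirectional LSTM back-end are all assumptions based on the paper's description.

```python
import chainer
import chainer.functions as F
import chainer.links as L


class LipreadingModel(chainer.Chain):
    """Illustrative front-end + LSTM back-end word classifier (a sketch)."""

    def __init__(self, n_words=500, hidden=256):
        super().__init__()
        with self.init_scope():
            # Spatiotemporal front-end: a 3D convolution over the clip,
            # standing in for the paper's 3D-conv + ResNet stack.
            self.conv3d = L.ConvolutionND(
                3, 1, 64, ksize=(5, 7, 7), stride=(1, 2, 2), pad=(2, 3, 3))
            self.frame_fc = L.Linear(None, hidden)  # per-frame embedding
            # Temporal back-end: 2-layer bidirectional LSTM over frames.
            self.lstm = L.NStepBiLSTM(2, hidden, hidden, dropout=0.5)
            self.fc = L.Linear(2 * hidden, n_words)  # 500-word classifier

    def __call__(self, x):
        # x: (batch, 1, frames, height, width) grayscale video.
        h = F.relu(self.conv3d(x))
        b, c, t, hh, ww = h.shape
        # Flatten each frame's feature map into a vector.
        h = F.transpose(h, (0, 2, 1, 3, 4))         # (b, t, c, hh, ww)
        h = F.reshape(h, (b * t, c * hh * ww))
        h = F.relu(self.frame_fc(h))
        seqs = list(F.separate(F.reshape(h, (b, t, -1)), axis=0))
        _, _, ys = self.lstm(None, None, seqs)      # ys: list of (t, 2*hidden)
        # Mean-pool the LSTM outputs over time, then classify.
        pooled = F.stack([F.mean(y, axis=0) for y in ys])
        return self.fc(pooled)
```

A batch of LRW clips (29 frames per clip at 25 fps) would flow through this end to end, producing one logit vector over the 500-word vocabulary per clip.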

The code is based on the PyTorch implementation of the same work, which can be found here.

Dataset

The model is trained on the Oxford-BBC Lip Reading in the Wild (LRW) dataset. The dataset consists of short video clips of news anchors speaking a single word each. The vocabulary contains 500 words, with roughly 1,000 utterances per word. The full dataset is around 70 GB.
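
As a rough illustration of how such data might be wrapped for Chainer, the sketch below assumes each preprocessed clip is stored as an .npz file with a "video" array under a <root>/<word>/<split>/ layout. This layout and file format are assumptions for this sketch; the actual preprocessing format is whatever the PyTorch counterpart's scripts produce.

```python
# Hypothetical loader for preprocessed LRW clips. The .npz format,
# the "video" key, and the <root>/<word>/<split>/*.npz layout are
# assumptions for illustration, not the repo's actual format.
import glob
import os

import numpy as np
import chainer


class LRWDataset(chainer.dataset.DatasetMixin):
    def __init__(self, root, split="train"):
        self.files = sorted(glob.glob(os.path.join(root, "*", split, "*.npz")))
        # Map each of the 500 word directories to an integer label.
        words = sorted({f.split(os.sep)[-3] for f in self.files})
        self.word_to_id = {w: i for i, w in enumerate(words)}

    def __len__(self):
        return len(self.files)

    def get_example(self, i):
        path = self.files[i]
        video = np.load(path)["video"].astype(np.float32)
        label = np.int32(self.word_to_id[path.split(os.sep)[-3]])
        return video, label
```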

How to Run

  1. Download the LRW dataset from this website.
  2. Preprocess the dataset as described in the PyTorch counterpart of this repo (available here).
  3. Set the dataset path in config.json.
  4. Run the following command:
python main.py --config config.json
  5. After training is over, change the mode variable in config.json to 'backendGRU' and run the above command again.
  6. Finally, fine-tune the model by switching the mode to 'finetuneGRU'.

Make sure you change the path variable in config.json to the saved-model location after step 4. A rough sketch of config.json is given below.
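
For concreteness, config.json might look roughly like the snippet below. Only mode and path are referenced in this README; every other key, and the 'temporalConv' value for the initial mode, are guesses modeled on the PyTorch counterpart, so check the code for the actual field names.

```json
{
    "mode": "temporalConv",
    "path": "",
    "dataset": "/path/to/LRW",
    "batch_size": 36,
    "epochs": 30,
    "lr": 0.0003
}
```

After the first run finishes, set "path" to the saved model file and change "mode" as described in steps 5 and 6.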

TODOs

  • Chainer code, tested
  • Tested on CPU
  • Make it work on GPU
