Saeed-Biabani/LipReading


LipReading

Visual Speech Recognition using deep learning methods

Fig. 1: LipNet architecture.

Dependencies

  • Install dependencies: $ pip install -r requirements.txt
  • Download Pretrained Weights Here

Getting Started

  • Project Structure
.
├── src
│   ├── nn
│   │   └── model.py
│   └── utils
│       ├── dataProvider.py
│       ├── labelConverter.py
│       ├── losses.py
│       ├── misc.py
│       ├── trainUtils.py
│       └── transforms.py
├── config.py
└── train.py

Training

CTC Loss

CTC takes a sequence $H = h_1, \ldots, h_T$, where $T$ is the sequence length, and outputs the probability of an alignment path $\pi = \pi_1, \ldots, \pi_T$, which is defined as

$$P(\pi|H) = \prod_{t = 1}^T y_{\pi_t}^t$$

where $y_{\pi_t}^t$ is the probability of generating character $\pi_t$ at time step $t$.
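
The path probability above can be checked by hand on a toy example. The following is a minimal sketch in plain Python, not the code from `src/utils/losses.py`. The function names, the three-symbol alphabet, and the probability values are all assumptions made up for illustration.

```python
import math

def path_probability(probs, path):
    """Return P(pi | H) as the product over t of y_{pi_t}^t.

    probs[t][k] is the (hypothetical) per-frame probability y_k^t of
    symbol k at time step t; path[t] is the symbol index pi_t.
    """
    if len(probs) != len(path):
        raise ValueError("path must have one symbol per time step")
    return math.prod(probs[t][k] for t, k in enumerate(path))

def path_log_probability(probs, path):
    """Same quantity in log space, which avoids underflow for long sequences."""
    if len(probs) != len(path):
        raise ValueError("path must have one symbol per time step")
    return sum(math.log(probs[t][k]) for t, k in enumerate(path))

# Toy per-frame distributions for T = 3 time steps over 3 symbols.
# Each row sums to 1; the values are invented for this example.
probs = [
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
    [0.2, 0.2, 0.6],
]
path = [1, 0, 2]  # pi_1 = 1, pi_2 = 0, pi_3 = 2

p = path_probability(probs, path)          # 0.6 * 0.5 * 0.6, about 0.18
log_p = path_log_probability(probs, path)  # log of the same value
```

The log-space variant is included because products of many small per-frame probabilities shrink quickly as $T$ grows. Working with sums of logarithms is a common way to keep the computation numerically stable.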

Fig. 2: Model Training History.

Dataset

  • You can find the dataset Here


🛡️ License

This project is distributed under the MIT License.
