oliviacarino/ASL_CNN_Project

🧠 ASL Detection with Computer Vision and a CNN 🧠

The goal of the model is to use CV and a 3-layer CNN (the layer count may increase or decrease) to identify all 26 letters of the English alphabet from real-time video capture. A user will be able to show their computer's webcam any ASL-based letter, and the model will output the corresponding symbol to the terminal.
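
The last step, turning the model's output into a letter printed to the terminal, could look roughly like the sketch below. It assumes the model emits one score per letter and that class index 0 is "A", index 1 is "B", and so on; `predicted_letter` is a hypothetical helper name, not something from this repository.

```python
import string

# Assumption: class index i corresponds to the i-th uppercase letter.
LETTERS = string.ascii_uppercase


def predicted_letter(scores):
    """Return the letter whose class score is highest."""
    if len(scores) != len(LETTERS):
        raise ValueError(f"expected {len(LETTERS)} scores, got {len(scores)}")
    # Index of the largest score (the arg max), using only the standard library.
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return LETTERS[best_index]


# Made-up scores for illustration: class 2 has the highest score.
scores = [0.0] * 26
scores[2] = 0.9
print(predicted_letter(scores))  # prints "C"
```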

⭐ Goals of Project:

[x] Successfully run the project locally with acceptable performance
[ ] Add functionality to store letters captured by webcam to stdout
[x] Deploy the model to Azure
[ ] Performance tuning

Overview

This video by Nicholas Renotte inspired me to build a project using object detection and a pretrained model from the TensorFlow Detection Model Zoo. I spent a ton of time trying to use the SSD MobileNet v2 320x320 (pretrained on the famous COCO dataset), its 640x640 variation, and even the SSD ResNet50 V1 FPN 640x640 (RetinaNet50), but I was unsuccessful with all of them. Each model failed to detect anything when more than 5 classes were added. At first no bounding boxes were drawn at all, and after retraining too many boxes were drawn.

I spent some time researching the pretrained TensorFlow SSD MobileNet models and noticed that a lot of people had tried using them for the same kind of project and were running into the same issues I was. I spent a few more days trying to fine-tune the hyperparameters while training on my own data. Still nothing. I refused to give up and started researching whether an LSTM would be the way to go. I tried installing the common Python package mediapipe, but it wouldn't work (I couldn't use conda because I had run out of storage on my tiny 128 GB MacBook).

I decided to build a CNN with Keras. It's a 5-layer model (counting only convolutional and dense layers), with batch normalization and dropout. These help prevent overfitting and speed up training. The training data was also binarized, which shortened training by letting the model converge faster. The shape of the model's input is (28, 28, 1).
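
As a rough illustration of that description, here is a minimal Keras sketch of a model with five weighted layers, batch normalization, dropout, and a (28, 28, 1) input. The split into three convolutional and two dense layers, the filter counts, the dropout rate, the optimizer, and the 26-class output are all assumptions made for the example; the model in this repository may differ.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_model(num_classes=26):
    """Build a small CNN: 3 convolutional + 2 dense layers (assumed split)."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),  # 28x28 grayscale images
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),           # 28x28 -> 14x14
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),           # 14x14 -> 7x7
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),           # 7x7 -> 3x3
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),             # assumed rate, randomly drops units while training
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Assumed training setup: integer class labels and the Adam optimizer.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
# Forward pass on a dummy batch of two blank images to check the shapes.
dummy_batch = np.zeros((2, 28, 28, 1), dtype="float32")
probabilities = model.predict(dummy_batch, verbose=0)
print(probabilities.shape)  # (2, 26)
```

Calling `model.summary()` lists each layer with its output shape and parameter count, which is a quick way to confirm the layer count.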

Evaluating the Model

References

  1. Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues by Al-Qurishi et al.
  2. Sign Language Recognition System using TensorFlow Object Detection API by Srivastava et al.
  3. I used the Sign-Language MNIST dataset from Kaggle.
  4. This Kaggle post helped me get started on the ASL CNN detection project. I used its image preprocessing steps, evaluation techniques, and CNN model architecture.
  5. Not super relevant to the entire project, but it helped me resolve an issue that I spent too many hours on and want to give credit. link
  6. I plan on using some of the computer vision and hand segmentation information from here for real-time image capturing via a webcam.

About

A 5-layer CNN built with Keras that can detect signed letters (ASL) through CV.
