🧠 ASL Detection with Computer Vision and a CNN 🧠

The goal of the model is to use CV and a 3-layer CNN (the layer count may increase or decrease) to identify all 26 letters of the English alphabet from real-time video capture. A user will be able to show their computer's webcam any ASL letter, and the model will output the corresponding symbol to the terminal.

⭐ Goals of Project:

[x] Successfully run the project locally with acceptable performance
[ ] Add functionality to write letters captured by the webcam to stdout
[x] Deploy the model to Azure
[ ] Performance tuning

Overview

This video by Nicholas Renotte inspired me to build a project using object detection and a pretrained model from the TensorFlow Detection Model Zoo. I spent a ton of time trying to use the SSD MobileNet v2 320x320 (pretrained on the famous COCO dataset), its 640x640 variation, and even the SSD ResNet50 V1 FPN 640x640 (RetinaNet50), but I was unsuccessful with all of them. Each model failed to detect anything when more than 5 classes were added. At first no bounding boxes were drawn at all, and after retraining too many boxes were drawn.

I spent some time researching the pretrained TensorFlow SSD MobileNet models and noticed that a lot of people had tried using them for the same project and were dealing with the same issues as me. I spent a few more days trying to fine-tune the hyperparameters while training on my own data. Still nothing. I refused to give up and started researching whether an LSTM would be the way to go. I tried installing the common Python package mediapipe, but it wouldn't work (I couldn't use conda because I had run out of storage on my tiny 128 GB MacBook).

I decided to build a CNN with Keras. It's a 5-layer model (counting only convolutional and dense layers) with batch normalization and dropout, which help prevent overfitting and speed up training. The training data was also binarized, which reduced training time by letting the model converge faster. The model's input shape is (28, 28, 1).
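As a rough illustration of the binarization and input shape described above, here is a minimal NumPy sketch. It assumes each image arrives as a flat row of 784 grayscale values in the range 0 to 255; the function name `binarize_and_reshape` and the threshold of 128 are hypothetical choices for this example, not values taken from the notebook.

```python
import numpy as np


def binarize_and_reshape(images, threshold=128):
    """Binarize flat grayscale images and reshape them to (N, 28, 28, 1).

    `threshold` is an assumed cutoff; the notebook may use a different one.
    """
    images = np.asarray(images)
    # Pixels at or above the threshold become 1.0, all others become 0.0.
    binary = (images >= threshold).astype(np.float32)
    # Restore the 28x28 grid and add a single channel axis for the CNN.
    return binary.reshape(-1, 28, 28, 1)


# Random stand-in data: 4 fake images of 784 pixels each (not real ASL data).
rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=(4, 784))
batch = binarize_and_reshape(fake_pixels)
print(batch.shape)  # (4, 28, 28, 1)
```

The resulting array matches the (28, 28, 1) per-image input shape, so it could be passed to a Keras model's `fit` or `predict` call after the same preprocessing is applied to webcam frames.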

Evaluating the Model

References

Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues by Al-Qurishi et al.
Sign Language Recognition System using TensorFlow Object Detection API by Srivastava et al.
I used the Sign-Language MNIST dataset from Kaggle.
This Kaggle post helped me get started on the ASL CNN detection project. I used the image preprocessing steps, evaluation techniques, and CNN model architecture.
Not super relevant to the entire project, but it helped me resolve an issue that I spent too many hours on and want to give credit. link
I plan on using some of the computer vision and hand segmentation information from here for real-time image capturing via a webcam.

oliviacarino/ASL_CNN_Project

Folders and files

Latest commit

History

Repository files navigation

🧠 ASL Detection with Computer Vision and a CNN 🧠

⭐ Goals of Project:

Overview

Evaluating the Model

References

About

Resources

Stars

Watchers

Forks

Languages