Visual-to-Audio aid for visually impaired

A system to process visual input on timed frames to produce sensible audio aid in accordance with human information processing limits, using image captioning, semantic text comparison and text-to-speech modules.

Requirements

install requirements from requirements.txt

Running application

change below lines for video source and frame selection interval, (set video_path to 0 for camera capture)

video_path = 'videoplayback.mp4'

frame_interval = 30

run aid.py to see output

Model details: (will be replaced with a HED + GAN architecture soon)

Image captioning model is trained with Bahdanau attention, with a CNN encoder and GRU based RNN decoder

The dataset is 50,000 random images out of >4,70,000 images in MS-COCO dataset due to colab infrastructure limitations, for better results train your own model with code from colab notebook "train_captioning_model.ipynb" (!!! migrate to own infrastructure or deep learning vms to do so as this is the maximium capability of colab !!!)

The Images are preprocessed with a pre-trained inceptionV3 application trained on ImageNet, the limiters are in place for colab limitations, feel free to lift them up if you are training yourself

The caption unique word store is limited using top_k, which is set to top 5000 words, feel free to change if you are training yourself

The tokeniser.pkl store in pickles folder is the one with limitations, if you decide to train yourself then make sure to pickle the tokeniser and replace the same here with same name.

architecture diagram and overview are in design folder

final text-to-speech

pyttsx3 - https://pypi.org/project/pyttsx3/

Pending Work:

semantic text similarity for comparing older and newer caption to avoid output congestion

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.idea		.idea
design		design
pickles		pickles
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
aid.py		aid.py
datafile.txt		datafile.txt
frame.jpg		frame.jpg
global_state.pickle		global_state.pickle
player.py		player.py
requirements.txt		requirements.txt
train_captioning_model.ipynb		train_captioning_model.ipynb
videoplayback.mp4		videoplayback.mp4

License

sidphbot/visual-to-audio-aid-for-visually-impaired

Folders and files

Latest commit

History

Repository files navigation

Visual-to-Audio aid for visually impaired

Requirements

Running application

Model details: (will be replaced with a HED + GAN architecture soon)

final text-to-speech

Pending Work:

About

Topics

Resources

License

Stars

Watchers

Forks

Languages