MI2020

Project for Multimodal Interaction course (A.Y. 2019/2020), codename GesturePad.

GesturePad is a text editor that produces HTML/Markdown documents and supports multiple input modalities: text, or voice and gestures.

The vocal interaction is based on a continuous, dictation-style, speaker-independent speech recognition model implemented by Google Cloud Speech-to-Text.

The gesture interaction is based on arbitrary semaphoric gestures. Their recognition relies on hand landmark detection by Google MediaPipe; the detected landmarks are then processed by a cloud-deployed Google Cloud Vision AutoML model.

Complete details can be found in the PDF report.

DEMO VIDEO

Instructions

GesturePad has been developed on Ubuntu 18.04 (LTS) with Python 3.6+. See further installation requirements for Google MediaPipe.

A Makefile is provided that installs all the required libraries and Python packages, so the suggested routine for running this project is the following:

Download (and unzip) or clone the project;
Move into the main directory (containing the Makefile);
Run the make command which will: (a) install required system libraries, (b) install the required Python packages, (c) clone Google MediaPipe from its official GitHub repository, and (d) patch the MediaPipe installation with our custom files;
Modify the config.json file according to your Google Cloud Platform subscription and AutoML settings;
Run gesture_pad.py in the main project directory and follow the instructions in the GUI.
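
Before launching the GUI, it can help to confirm that the steps above left the project in a usable state. The following is a minimal sketch, not part of the repository: the function name `preflight` is hypothetical, and the location of config.json (assumed here to sit in the main directory unless another path is given) is an assumption. It does not assume any particular key names in config.json; it only reports top-level settings that were left blank.

```python
import json
import sys
from pathlib import Path


def preflight(project_dir=".", config_path=None):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper: not part of GesturePad itself.
    """
    problems = []

    # The README names Python 3.6+ as the development baseline.
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or newer is required")

    root = Path(project_dir)
    # Both files are expected in the main project directory.
    for name in ("Makefile", "gesture_pad.py"):
        if not (root / name).is_file():
            problems.append("missing file: {}".format(name))

    # Assumption: config.json lives in the main directory unless told otherwise.
    config_file = Path(config_path) if config_path else root / "config.json"
    if not config_file.is_file():
        problems.append("missing file: {}".format(config_file.name))
        return problems

    try:
        config = json.loads(config_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        problems.append("config.json is not valid JSON: {}".format(exc))
        return problems

    # Flag top-level settings that are empty strings or null.
    if isinstance(config, dict):
        for key, value in config.items():
            if value is None or value == "":
                problems.append("config.json: '{}' is empty".format(key))
    return problems
```

Calling `preflight()` from the main directory and printing each returned entry gives a quick checklist; an empty list suggests that gesture_pad.py can be started.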

Note

It is advised to set up a Python virtual environment and to download/clone the project into a directory whose parent is not a root-protected directory.
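
Both recommendations can be checked and applied from Python itself. The sketch below is illustrative only: the helper names and the `.venv` directory name are hypothetical, while `os.access` and `venv.create` are standard-library calls. On Ubuntu, creating an environment with pip may additionally require the `python3-venv` system package.

```python
import os
import venv
from pathlib import Path


def parent_is_writable(project_dir):
    """True if the current user may create entries in the project's parent directory."""
    parent = Path(project_dir).resolve().parent
    # W_OK: may create files there; X_OK: may traverse the directory.
    return os.access(str(parent), os.W_OK | os.X_OK)


def create_virtualenv(project_dir, env_name=".venv", with_pip=True):
    """Create a virtual environment inside the project directory and return its path.

    The default name ".venv" is an arbitrary choice, not a project convention.
    """
    env_path = Path(project_dir) / env_name
    venv.create(str(env_path), with_pip=with_pip)
    return env_path
```

If `parent_is_writable(".")` returns False, moving the project to a location under your home directory is one way to follow the advice above. Once the environment is created, activate it before running the make command so that the Python packages are installed into it.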

License

The code contained in this repository is distributed under the AGPL-3.0 license, with the exceptions listed below. The file dataset.zip, which contains the gesture dataset created by us, is distributed under CC-BY-4.0.

Authors¹: Angelo Di Mambro, Emanuele Giona.

1: equal contribution; authors are listed in alphabetical order.

Acknowledgments

Files demo_run_graph_main.cc, end_loop_calculator.h, and landmarks_to_render_data_calculator.cc are unmodified copies of the ones present in the repository Sign language recognition with RNN and Mediapipe, which is the property of Anna Kim; the license of the original repository applies to them.

Files multi_hand_renderer_cpu.pbtxt and multi_hand_tracking_desktop_live.pbtxt are original modifications of the ones present in the repository MediaPipe: Cross-platform ML solutions made simple, which is the property of Google's MediaPipe team; they are redistributed under the same AGPL-3.0 license as the rest of this repository.
