Computer-Pointer-Controller

Introduction

The Computer Pointer Controller app controls the movement of the mouse pointer using the direction of the eyes and the estimated pose of the head. The app takes a video (or webcam stream) as input, estimates the eye gaze direction and head pose, and moves the mouse pointer based on that estimation.

Demo video

Project Set Up and Installation

Setup

Prerequisites

  • You need a working OpenVINO installation.
    See this guide for installing OpenVINO.

Step 1

Clone the repository:-

git clone https://github.com/mdfazal/Computer-Pointer-Controller

Step 2

Initialize the OpenVINO environment:-

source /opt/intel/openvino/bin/setupvars.sh -pyver 3.5

Step 3

Download the following models using the OpenVINO model downloader:-

1. Face Detection Model

python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "face-detection-adas-binary-0001"

2. Facial Landmarks Detection Model

python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "landmarks-regression-retail-0009"

3. Head Pose Estimation Model

python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "head-pose-estimation-adas-0001"

4. Gaze Estimation Model

python /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name "gaze-estimation-adas-0002"
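The four downloader commands above can also be scripted in one pass. A minimal sketch, assuming the default downloader path shown above and that the downloader's `-o` output-directory option is available in your OpenVINO version:

```python
import subprocess

# Assumed default installation path of the OpenVINO model downloader.
DOWNLOADER = "/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py"

# The four models the app needs, as listed in the steps above.
MODELS = [
    "face-detection-adas-binary-0001",
    "landmarks-regression-retail-0009",
    "head-pose-estimation-adas-0001",
    "gaze-estimation-adas-0002",
]

def download_all(output_dir: str = ".") -> None:
    """Fetch every required model, raising if any download fails."""
    for name in MODELS:
        subprocess.run(
            ["python", DOWNLOADER, "--name", name, "-o", output_dir],
            check=True,
        )
```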

Demo

Open a new terminal and run the following commands:-

1. Change the directory to src directory of project repository

cd <project-repo-path>/src

2. Run the main.py file

python main.py -f <Path of xml file of face detection model> \
-fl <Path of xml file of facial landmarks detection model> \
-hp <Path of xml file of head pose estimation model> \
-g <Path of xml file of gaze estimation model> \
-i <Path of input video file or enter cam for taking input video from webcam> 
  • If you want to run the app on GPU:-
python main.py -f <Path of xml file of face detection model> \
-fl <Path of xml file of facial landmarks detection model> \
-hp <Path of xml file of head pose estimation model> \
-g <Path of xml file of gaze estimation model> \
-i <Path of input video file or enter cam for taking input video from webcam> \
-d GPU
  • If you want to run the app on FPGA:-
python main.py -f <Path of xml file of face detection model> \
-fl <Path of xml file of facial landmarks detection model> \
-hp <Path of xml file of head pose estimation model> \
-g <Path of xml file of gaze estimation model> \
-i <Path of input video file or enter cam for taking input video from webcam> \
-d HETERO:FPGA,CPU

Documentation

Documentation of the models used

  1. Face Detection Model
  2. Facial Landmarks Detection Model
  3. Head Pose Estimation Model
  4. Gaze Estimation Model

Command Line Arguments for Running the app

The following command line arguments can be used when running main.py:-

  1. -h (optional) : Get information about all the command line arguments
  2. -f (required) : Specify the path of the Face Detection model's xml file
  3. -fl (required) : Specify the path of the Facial Landmarks Detection model's xml file
  4. -hp (required) : Specify the path of the Head Pose Estimation model's xml file
  5. -g (required) : Specify the path of the Gaze Estimation model's xml file
  6. -i (required) : Specify the path of the input video file, or enter cam to take input video from the webcam
  7. -d (optional) : Specify the target device to run inference on. Supported devices are: CPU, GPU, FPGA (for FPGA, use HETERO:FPGA,CPU), and MYRIAD.
  8. -l (optional) : Specify the absolute path of a CPU extension, if some layers of the models are not supported on the device.
  9. -prob (optional) : Specify the probability threshold for the face detection model to detect faces accurately from the video frame.
  10. -flags (optional) : Specify any of fd, fld, hp, ge to visualize the output of the corresponding model on each frame (write flags separated by spaces, e.g. -flags fd fld hp).
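The arguments above can be parsed with `argparse`. The following is a sketch mirroring the documented flags, not the repository's exact code; the defaults for `-d` and `-prob` are assumptions:

```python
import argparse

def build_argparser() -> argparse.ArgumentParser:
    """Build a CLI parser matching the documented arguments (illustrative)."""
    parser = argparse.ArgumentParser(description="Computer Pointer Controller")
    parser.add_argument("-f", required=True, help="Path to the face detection model's .xml file")
    parser.add_argument("-fl", required=True, help="Path to the facial landmarks detection model's .xml file")
    parser.add_argument("-hp", required=True, help="Path to the head pose estimation model's .xml file")
    parser.add_argument("-g", required=True, help="Path to the gaze estimation model's .xml file")
    parser.add_argument("-i", required=True, help="Path to the input video file, or 'cam' for webcam")
    parser.add_argument("-d", default="CPU", help="Target device: CPU, GPU, MYRIAD, or HETERO:FPGA,CPU")
    parser.add_argument("-l", default=None, help="Absolute path to a CPU extension library, if needed")
    parser.add_argument("-prob", type=float, default=0.6, help="Probability threshold for face detection")
    parser.add_argument("-flags", nargs="+", default=[], help="Any of: fd fld hp ge (visualize outputs)")
    return parser

# Example invocation with placeholder paths:
args = build_argparser().parse_args(
    ["-f", "face.xml", "-fl", "lm.xml", "-hp", "hp.xml",
     "-g", "gaze.xml", "-i", "cam", "-flags", "fd", "hp"]
)
print(args.d, args.flags)  # CPU ['fd', 'hp']
```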

Directory Structure of the project

computer-pointer-controller  
|
|--media
|   |--demo.mp4
|   |--fps_fp16.png
|   |--fps_fp32.png
|   |--fps_int8.png
|   |--inference_time_fp16.png
|   |--inference_time_fp32.png
|   |--model_loading_time_fp16.png
|   |--model_loading_time_fp32.png
|   |--model_loading_time_int8.png
|--README.md
|--requirements.txt
|--src
  |--face_detection.py
  |--facial_landmarks_detection.py
  |--gaze_estimation.py
  |--head_pose_estimation.py
  |--input_feeder.py
  |--main.py
  |--mouse_controller.py
  • The src folder contains all the source files:-

    1. face_detection.py

      • Preprocesses the video frame, performs inference on it to detect the face, and postprocesses the outputs.
    2. facial_landmarks_detection.py

      • Takes the detected face as input, preprocesses it, performs inference on it to detect the eye landmarks, and postprocesses the outputs.
    3. head_pose_estimation.py

      • Takes the detected face as input, preprocesses it, performs inference on it to estimate the head position by predicting the yaw, roll, and pitch angles, and postprocesses the outputs.
    4. gaze_estimation.py

      • Takes the left eye, right eye, and head pose angles as inputs, preprocesses them, performs inference to predict the gaze vector, and postprocesses the outputs.
    5. input_feeder.py

      • Contains the InputFeeder class, which initializes VideoCapture as per the user argument and returns the frames one by one.
    6. mouse_controller.py

      • Contains the MouseController class, which takes x and y coordinate values, speed, and precision, and moves the mouse pointer accordingly using the pyautogui library.
    7. main.py

      • Users need to run the main.py file to run the app.
  • The media folder contains a demo video which users can use for testing the app.
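The last step of the pipeline, turning the predicted gaze vector into a pointer movement, typically rotates the gaze's (x, y) components by the head's roll angle so the pointer moves relative to the screen rather than the tilted head. A minimal sketch of that common post-processing step (the repository's exact math may differ):

```python
import math

def gaze_to_mouse(gaze_vector, roll_deg):
    """Rotate the gaze (x, y) components by the head roll angle.

    gaze_vector: (x, y, z) gaze vector from the gaze estimation model.
    roll_deg: roll angle in degrees from the head pose model.
    Returns the (x, y) displacement to feed to the mouse controller.
    """
    roll = math.radians(roll_deg)
    gx, gy = gaze_vector[0], gaze_vector[1]
    mouse_x = gx * math.cos(roll) + gy * math.sin(roll)
    mouse_y = -gx * math.sin(roll) + gy * math.cos(roll)
    return mouse_x, mouse_y

# With zero roll, the gaze's x and y pass through unchanged:
print(gaze_to_mouse((0.5, -0.2, 0.1), 0.0))  # (0.5, -0.2)
```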

Benchmarks

Benchmark results of the models.

FP32

Inference Time
inference_time_fp32_image

Frames per Second
fps_fp32_image

Model Loading Time
model_loading_time_fp32_image

FP16

Inference Time
inference_time_fp16_image

Frames per Second
fps_fp16_image

Model Loading Time
model_loading_time_fp16_image

INT8

Inference Time
inference_time_int8_image

Frames per Second
fps_int8_image

Model Loading Time
model_loading_time_int8_image

Results

I ran the model on 5 different hardware configurations:-

  1. Intel Core i5-6500TE CPU
  2. Intel Core i5-6500TE GPU
  3. IEI Mustang F100-A10 FPGA
  4. Intel Xeon E3-1268L v5 CPU
  5. Intel Atom x7-E3950 UP2 GPU

I also compared their performance by inference time, frames per second, and model loading time.

As we can see from the attached snapshots, the FPGA took more time for inference than the other devices because it programs each gate of the FPGA specifically for each computer vision application. The GPU processed more frames per second than any other hardware, especially with FP16, because the GPU has multiple cores and instruction sets that are specifically optimized to run 16-bit floating point operations.

After running the models with different precisions, we can see that precision affects accuracy. Model size can be reduced by lowering the precision from FP32 to FP16 or INT8, and inference becomes faster. But as a result of lowering the precision, the model can lose some important information, which is why accuracy decreases.
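The rounding loss from lower precision can be seen directly: IEEE 754 half precision has only a 10-bit mantissa, so most weights cannot be stored exactly. A small stdlib-only illustration using the `struct` module's half-precision format (Python 3.6+):

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a double through IEEE 754 half precision ('e' format)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_fp16(0.1))  # 0.0999755859375 — the nearest representable half-precision value
print(to_fp16(1.0))  # 1.0 — powers of two survive exactly
```

Each weight that rounds like this nudges the model's outputs slightly, which is the mechanism behind the accuracy drop noted above.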

Stand Out Suggestions

Edge Cases

  1. If multiple people are in the frame, the application selects one face, runs inference on it, and ignores the other faces. This works in most cases but may introduce a flickering effect when the selection jumps between two heads.

  2. To avoid such edge cases, make sure there is enough lighting and only a single person in the frame, so the project runs more robustly.
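One way to pick a single face deterministically, and so reduce the flickering mentioned above, is to keep the detection with the largest bounding-box area instead of whichever comes first in the model's output. A sketch under that assumption (the repository may simply take the first detection):

```python
def pick_primary_face(boxes):
    """Given (xmin, ymin, xmax, ymax) face boxes, return the largest one.

    Selecting by area keeps the choice stable frame-to-frame when two
    detections trade places in the raw model output.
    """
    if not boxes:
        return None
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))

# The second box covers more pixels, so it wins:
faces = [(10, 10, 60, 70), (100, 20, 220, 180)]
print(pick_primary_face(faces))  # (100, 20, 220, 180)
```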
