CNN Drone BeFree

Code repository for my Master Thesis:
Interpretation of Neural Networks and Advanced Image Augmentation for Visual Control of Drones in Human Proximity

This project builds on this work by Dario Mantegazza, later expanded here by Nicky Zimmerman. The aim of the project is to make the drone fly outside of the room arena in which the dataset was collected. To do so, I use network interpretability techniques together with background replacement/randomization on the dataset images.

  • Check out the Jupyter Notebooks in notebooks for a better understanding of the initial work on interpretability (Grad-CAM) and human segmentation (Mask R-CNN)
  • Check out the Python scripts in src for a better understanding of the training procedure, conducted with TensorFlow and tf.data on the masked dataset (see the sketch after this list); testing and visualization scripts are also available and can easily be launched via the executables in src/runners
  • Some quick visual results are available in the results folder
  • Several models trained during the development of the project are available in the model folder
  • The best model so far, with both quantitative and qualitative results, is available here
  • My thesis report is available at https://github.com/mferri17/thesis-master
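As a rough idea of how a tf.data training pipeline like the one in src can be structured, here is a minimal sketch. The file layout, image size, and the 4-value pose label used below are assumptions for illustration, not the repository's actual definitions.

```python
# Minimal sketch of a tf.data training pipeline; field names and
# IMG_SIZE are hypothetical, not the repo's actual ones.
import tensorflow as tf

IMG_SIZE = (240, 320)  # assumed camera resolution

def parse_example(image_path, pose):
    """Load an image from disk and pair it with its pose label."""
    image = tf.io.read_file(image_path)
    image = tf.io.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, IMG_SIZE) / 255.0
    return image, pose

def make_dataset(image_paths, poses, batch_size=64):
    # image_paths: list of file paths; poses: float32 array of shape (N, 4)
    ds = tf.data.Dataset.from_tensor_slices((image_paths, poses))
    ds = ds.shuffle(len(image_paths))
    ds = ds.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```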

Deliverables

Abstract

We consider the task of predicting the pose of a person moving in front of a drone, using the input images coming from an on-board camera. We aim to improve a machine learning model designed for this purpose [Mantegazza et al., 2019]. The approach relies on supervised learning to perform a regression of the user's pose through a Residual Neural Network. The training data is collected in a dedicated drone arena, using a Motion Capture system to acquire the ground truth. The prototype achieves good performance inside the arena but fails to generalize to unknown environments.
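For concreteness, here is a hedged sketch of the kind of model the abstract describes: a residual backbone with a small regression head predicting the user's pose. The ResNet50 backbone, input resolution, and 4-dimensional pose output (x, y, z, yaw) are assumptions, not the exact architecture of [Mantegazza et al., 2019].

```python
# Sketch of a ResNet-based pose regressor; architecture details are assumed.
import tensorflow as tf
from tensorflow.keras import layers

def build_pose_regressor(input_shape=(240, 320, 3), pose_dim=4):
    backbone = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(backbone.output)
    x = layers.Dense(256, activation="relu")(x)
    pose = layers.Dense(pose_dim, name="pose")(x)  # linear output for regression
    model = tf.keras.Model(backbone.input, pose)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```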

First, we identify the main issues of the learned task through network interpretation. Applying Grad-CAM [Selvaraju et al., 2019], we observe that the model does not focus only on the user actually facing the drone's camera; instead, it considers various portions of the input images when making its predictions. We hypothesize that the neural network has undesirably learned details of the drone arena in which the dataset was collected.
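The Grad-CAM computation itself is straightforward to sketch for a regression output: weight each feature map of a late convolutional layer by the pooled gradient of the chosen output with respect to it, then keep only the positive part. The layer name and output index below are placeholders, not the repository's actual identifiers.

```python
# Minimal Grad-CAM sketch (after Selvaraju et al.) for a regression output.
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, output_index=0):
    """Heatmap of which image regions drive one regression output."""
    grad_model = tf.keras.Model(
        model.input,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(image[tf.newaxis, ...])
        target = preds[:, output_index]           # e.g. the predicted x coordinate
    grads = tape.gradient(target, conv_maps)      # d(output)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global-average-pool the gradients
    cam = tf.einsum("bhwc,bc->bhw", conv_maps, weights)
    cam = tf.nn.relu(cam)[0]                      # keep only positive influence
    return cam / (tf.reduce_max(cam) + 1e-8)      # normalize to [0, 1]
```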

As a solution, we develop an advanced data augmentation technique designed to enhance the generalization capabilities of the model. The goal is to break the relationship between the model's learning and the training room. Our approach consists of modifying the original dataset through background replacement, allowing us to simulate data collected in many different environments. We implement it by using Mask R-CNN [He et al., 2018] to compute the user's mask for each sample in the original training set. We then use the computed masks to replace the background of the corresponding images and retrain the neural network. Our experiments show that the proposal is successful from both quantitative and qualitative viewpoints. The new model, trained on the augmented dataset, produces satisfactory results in a large variety of real-world scenarios.
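The core compositing step of the background-replacement augmentation can be sketched in a few lines: keep the pixels under the person mask produced by Mask R-CNN and fill everything else with a new background. Function and variable names here are illustrative, not the repository's actual ones.

```python
# Sketch of the background-replacement step given a binary person mask.
import numpy as np

def replace_background(image, mask, background):
    """Keep masked (person) pixels, fill the rest with the new background.

    image, background: uint8 arrays of shape (H, W, 3)
    mask: boolean array of shape (H, W), True on the person
    """
    assert image.shape == background.shape
    # Broadcast the (H, W) mask over the 3 color channels.
    return np.where(mask[..., None], image, background)
```

Repeating this step with many different background images turns each original sample into a family of samples that appear to come from different environments.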
