

Dubai Satellite Imagery Semantic Segmentation Using Deep Learning

Abstract

Semantic segmentation is the task of clustering together parts of an image that belong to the same object class. It is a form of pixel-level prediction, because each pixel in an image is classified according to a category. In this project, I have performed semantic segmentation on Dubai's satellite imagery dataset by using transfer learning with an InceptionResNetV2-encoder-based UNet CNN model. To artificially increase the amount of data and avoid overfitting, I applied data augmentation to the training set. The model achieved a dice coefficient of ~81% and an accuracy of ~86% on the validation set.

Tech Stack

The Jupyter Notebook can be accessed from here.

The pre-trained model weights can be accessed from here.

Dataset

Humans in the Loop has published an open access dataset annotated for a joint project with the Mohammed Bin Rashid Space Center in Dubai, the UAE. The dataset consists of aerial imagery of Dubai obtained by MBRSC satellites and annotated with pixel-wise semantic segmentation in 6 classes. The images were segmented by the trainees of the Roia Foundation in Syria.

Semantic Annotation

The images are densely labeled and contain the following 6 classes:

| Name       | R   | G   | B   | Color   |
|------------|-----|-----|-----|---------|
| Building   | 60  | 16  | 152 | #3C1098 |
| Land       | 132 | 41  | 246 | #8429F6 |
| Road       | 110 | 193 | 228 | #6EC1E4 |
| Vegetation | 254 | 221 | 58  | #FEDD3A |
| Water      | 226 | 169 | 41  | #E2A929 |
| Unlabeled  | 155 | 155 | 155 | #9B9B9B |
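
Since the masks encode classes as RGB colors, training typically requires converting them into integer label maps. Below is a minimal sketch of that conversion, assuming the masks are loaded as (H, W, 3) uint8 NumPy arrays; the class ordering is an illustrative assumption, not necessarily the one used in this repository.

```python
import numpy as np

# Assumed class order (illustrative): maps each RGB color from the table above
# to an integer class index.
CLASS_COLORS = {
    0: (60, 16, 152),    # Building
    1: (132, 41, 246),   # Land
    2: (110, 193, 228),  # Road
    3: (254, 221, 58),   # Vegetation
    4: (226, 169, 41),   # Water
    5: (155, 155, 155),  # Unlabeled
}

def rgb_to_label(mask_rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB mask into an (H, W) array of class indices."""
    label = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
    for idx, color in CLASS_COLORS.items():
        label[np.all(mask_rgb == color, axis=-1)] = idx
    return label
```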

Sample Images & Masks

Technical Approach

Data Augmentation using Albumentations Library

Albumentations is a Python library for fast and flexible image augmentations. Albumentations efficiently implements a rich variety of image transform operations that are optimized for performance, and does so while providing a concise, yet powerful image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection.

There are only 72 images (of varying resolutions) in the dataset, of which I have used 56 images (~78%) for the training set and the remaining 16 images (~22%) for the validation set. Since this is a very small amount of data, I applied data augmentation to artificially increase the training set and avoid overfitting. Each training image was augmented 8 times, increasing the training data 9-fold: after augmentation, the training set contains 504 images (56 original + 448 augmented), while the validation set keeps its 16 original images.

Data augmentation is performed using the following techniques (a minimal sketch of such a pipeline follows the list):

  • Random Cropping
  • Horizontal Flipping
  • Vertical Flipping
  • Rotation
  • Random Brightness & Contrast
  • Contrast Limited Adaptive Histogram Equalization (CLAHE)
  • Grid Distortion
  • Optical Distortion
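
The sketch below shows how a pipeline covering these techniques can be composed with the Albumentations API; the crop size and probabilities are illustrative assumptions rather than the exact settings used in this project.

```python
import numpy as np
import albumentations as A

# Illustrative augmentation pipeline covering the techniques listed above.
transform = A.Compose([
    A.RandomCrop(height=512, width=512, p=1.0),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate(limit=90, p=0.5),
    A.RandomBrightnessContrast(p=0.5),
    A.CLAHE(p=0.5),
    A.GridDistortion(p=0.5),
    A.OpticalDistortion(p=0.5),
])

# Dummy image/mask stand-ins so the snippet runs on its own; in practice these
# come from the dataset. The same call transforms both so they stay aligned.
image = np.random.randint(0, 256, (1000, 1000, 3), dtype=np.uint8)
mask = np.random.randint(0, 6, (1000, 1000), dtype=np.uint8)
augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```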

Here are some sample augmented images and masks from the dataset:

InceptionResNetV2 Encoder based UNet Model

InceptionResNetV2 Architecture

Source: https://arxiv.org/pdf/1602.07261v2.pdf

UNet Architecture

Source: https://arxiv.org/pdf/1505.04597.pdf

InceptionResNetV2-UNet Architecture

  • InceptionResNetV2 model pre-trained on the ImageNet dataset has been used as an encoder network.

  • A decoder network is extended from the last feature map of the pre-trained encoder; at each upsampling step, the decoder feature maps are concatenated with the corresponding encoder feature maps through skip connections, as in the original UNet (a rough sketch is given below).

A detailed layout of the model is available here.
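
Below is a minimal sketch of how such an encoder-decoder can be wired up in Keras, assuming TensorFlow 2.x, a 512×512×3 input, and 6 output classes. The choice of skip-connection layers ('mixed_5b', 'block17_20_ac'), the decoder filter counts, and the resize-based alignment are illustrative assumptions, not the exact architecture used in this repository (see the detailed layout linked above for that).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def decoder_block(x, skip, filters):
    """Upsample, align to the skip feature map, concatenate, then convolve."""
    x = layers.UpSampling2D(2, interpolation="bilinear")(x)
    # InceptionResNetV2 uses 'valid' padding in places, so spatial sizes are not
    # exact powers of two; resize to match the skip tensor before concatenating.
    x = layers.Lambda(lambda t: tf.image.resize(t[0], tf.shape(t[1])[1:3]))([x, skip])
    x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(512, 512, 3), num_classes=6):
    # ImageNet-pre-trained encoder, without the classification head.
    encoder = tf.keras.applications.InceptionResNetV2(
        include_top=False, weights="imagenet", input_shape=input_shape)
    # Assumed skip connections, ordered from deep to shallow.
    skips = [encoder.get_layer(name).output for name in ("block17_20_ac", "mixed_5b")]
    x = encoder.output  # deepest feature map
    for skip, filters in zip(skips, (256, 128)):
        x = decoder_block(x, skip, filters)
    # Resize back to the input resolution and predict per-pixel class scores.
    x = layers.Lambda(lambda t: tf.image.resize(t, input_shape[:2]))(x)
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(encoder.input, outputs)

model = build_unet()
```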

Hyper-Parameters

  1. Batch Size = 16
  2. Steps per Epoch = 32
  3. Validation Steps = 4
  4. Input Shape = (512, 512, 3)
  5. Initial Learning Rate = 0.0001 (with an exponential-decay LearningRateScheduler callback)
  6. Number of Epochs = 45 (with ModelCheckpoint & EarlyStopping callbacks)
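
As a rough illustration, these settings map onto a Keras training call along the following lines. The decay factor, monitored quantity, patience, and checkpoint path are assumptions, and `model`, `train_gen`, and `val_gen` stand in for the compiled model and the augmented data generators defined elsewhere.

```python
import math
import tensorflow as tf

initial_lr = 1e-4  # initial learning rate from the list above

def lr_schedule(epoch, lr):
    # Exponential decay: scale the initial learning rate down each epoch
    # (the decay factor 0.05 is an illustrative assumption).
    return initial_lr * math.exp(-0.05 * epoch)

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(lr_schedule, verbose=1),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
]

# `model`, `train_gen`, and `val_gen` are assumed to be defined earlier
# (e.g., the model sketch above and batched generators of size 16).
history = model.fit(
    train_gen,
    steps_per_epoch=32,
    validation_data=val_gen,
    validation_steps=4,
    epochs=45,
    callbacks=callbacks,
)
```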

Results

Training Results

| Model | Epochs | Train Dice Coefficient | Train Accuracy | Train Loss | Val Dice Coefficient | Val Accuracy | Val Loss |
|-------|--------|------------------------|----------------|------------|----------------------|--------------|----------|
| InceptionResNetV2-UNet | 45 (best at epoch 34) | 0.8525 | 0.9152 | 0.2561 | 0.8112 | 0.8573 | 0.4268 |

The model_training.csv file contains epoch-wise training details of the model.
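
For reference, the dice coefficient is commonly computed as the soft dice between one-hot masks and softmax predictions; a minimal sketch is shown below, where the smoothing constant is an assumption and the exact implementation used in this repository may differ.

```python
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1.0):
    """Soft dice coefficient between one-hot ground truth and predicted probabilities."""
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    # A common companion loss; whether it was combined with cross-entropy here
    # is not restated in this README.
    return 1.0 - dice_coef(y_true, y_pred)
```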

Visual Results

Predictions on Validation Set Images:

All predictions on the validation set are available in the predictions directory.

Activations (Outputs) Visualization

Activations/Outputs of some layers of the model:

conv2d, conv2d_4, conv2d_8, conv2d_10, conv2d_22, conv2d_28, conv2d_29, conv2d_34, conv2d_35, conv2d_40, conv2d_61, conv2d_70

Some more activation maps are available in the activations directory.

Code for visualizing activations is in the get_activations.py file.
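
Below is a minimal sketch of how intermediate activations can be pulled out of a Keras model, in the spirit of get_activations.py; the layer name, model, and input image are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import Model

def get_layer_activation(model, layer_name, image):
    """Return the activations of a single named layer for one input image."""
    probe = Model(inputs=model.input, outputs=model.get_layer(layer_name).output)
    return probe.predict(image[None, ...])  # add a batch dimension

# Example usage (assumed names): visualize the first channel of 'conv2d_4'.
# acts = get_layer_activation(model, "conv2d_4", image)
# plt.imshow(acts[0, :, :, 0], cmap="viridis"); plt.show()
```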

References

  1. Humans in the Loop, “Semantic Segmentation Dataset.” [Online]. Available: https://humansintheloop.org/resources/datasets/semantic-segmentation-dataset/.
  2. C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” arXiv.org, 23-Aug-2016. [Online]. Available: https://arxiv.org/abs/1602.07261.
  3. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” arXiv.org, 18-May-2015. [Online]. Available: https://arxiv.org/abs/1505.04597.