neocsr/semantic-segmentation

Semantic Segmentation

In this project I trained a Fully Convolutional Network (FCN) to classify each pixel of an image as ROAD or NOT ROAD.

I used the KITTI Dataset, available at http://www.cvlibs.net/datasets/kitti/eval_road.php

The dataset consists of 289 training and 290 test images. It contains three categories of road scenes plus a combined category (training/test image counts in parentheses):

  • uu - urban unmarked (98/100)
  • um - urban marked (95/96)
  • umm - urban multiple marked lanes (96/94)
  • urban - combination of the three above
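
As a quick consistency check, the per-category counts above add up to the dataset totals when each pair is read as training/test images (an interpretation of the notation, not something the list states explicitly). A minimal Python sketch:

```python
# Per-category image counts, read as (training, test) pairs from the list above.
counts = {
    "uu": (98, 100),   # urban unmarked
    "um": (95, 96),    # urban marked
    "umm": (96, 94),   # urban multiple marked lanes
}

# The "urban" category is the combination of the three, so its size is the sum.
train_total = sum(train for train, _ in counts.values())
test_total = sum(test for _, test in counts.values())

print(train_total, test_total)  # 289 290
```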

Ground truth has been generated by manual annotation of the images and is available for two different road terrain types:

  • road - the road area, i.e., the composition of all lanes, and
  • lane - the ego-lane, i.e., the lane the vehicle is currently driving on (only available for category "um").

Ground truth is provided for training images only.

The paper by Jannik Fritsch et al. that introduced the KITTI road dataset can be found at http://www.cvlibs.net/publications/Fritsch2013ITSC.pdf

The FCN is based on the paper by Jonathan Long et al., available at https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf

Predictions

Urban Multiple Marked Lanes

[Three prediction images]

Urban Unmarked Lanes

[Three prediction images]

Urban Marked Lanes

[Three prediction images]

Misses

[Three images of missed predictions]

Model

Architecture

Following the paper by Jonathan Long et al., the model uses the original VGG-16 network and replaces the fully connected layers with three 1x1 convolutions, applied to layers 7, 4 and 3, with skip connections added between them.
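
To make the skip structure concrete, here is a minimal shape-only sketch in plain Python. It is not the project's TensorFlow code. It assumes the usual FCN-8s layout: VGG-16 layers 3, 4 and 7 at strides 8, 16 and 32 with 256, 512 and 4096 channels, upsampling done by transposed convolutions with factors 2, 2 and 8, and a hypothetical 160x576 input. None of these values are stated in this README.

```python
NUM_CLASSES = 2  # ROAD / NOT ROAD

def conv1x1(shape, out_channels):
    """A 1x1 convolution keeps height and width and only changes the channel count."""
    height, width, _ = shape
    return (height, width, out_channels)

def upsample(shape, factor):
    """Assumed transposed convolution: multiplies height and width by `factor`."""
    height, width, channels = shape
    return (height * factor, width * factor, channels)

def add_skip(a, b):
    """A skip connection is an element-wise sum, so both shapes must match."""
    assert a == b, f"shape mismatch: {a} vs {b}"
    return a

def fcn8_decoder_shapes(height, width):
    # Encoder outputs (assumed VGG-16 strides and channel counts).
    layer3 = (height // 8, width // 8, 256)
    layer4 = (height // 16, width // 16, 512)
    layer7 = (height // 32, width // 32, 4096)

    # One 1x1 convolution per tapped layer reduces channels to class scores.
    score7 = conv1x1(layer7, NUM_CLASSES)
    score4 = conv1x1(layer4, NUM_CLASSES)
    score3 = conv1x1(layer3, NUM_CLASSES)

    # Upsample and fuse: 7 -> 4 -> 3, then back to the input resolution.
    fused4 = add_skip(upsample(score7, 2), score4)
    fused3 = add_skip(upsample(fused4, 2), score3)
    return upsample(fused3, 8)

print(fcn8_decoder_shapes(160, 576))  # (160, 576, 2)
```

The output has one score per class for every input pixel, which is what per-pixel classification needs. The input height and width must be divisible by 32 for the skip shapes to line up in this sketch.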

Parameters
  • keep_prob: 0.5
  • learning_rate: 0.0005
  • epochs: 30
  • batch_size: 8

After several trials, a keep probability of 0.5, a learning rate of 0.0005 and 30 epochs with batches of 8 images gave good results. The loss decreased steadily and, by the 30th epoch, was between 0.0200 and 0.0300.

  ...
  - loss   0.0242 (images: 8, labels: 8)
  - loss   0.0289 (images: 8, labels: 8)
  - loss   0.0181 (images: 1, labels: 1)
Running epoch 30/100
  ...
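
The last loss line in the log above reports a batch of a single image. That is consistent with splitting 289 training images into batches of 8, assuming every epoch iterates over all training images (the README does not say so explicitly). A small sketch of the arithmetic, with a hypothetical helper name:

```python
def batch_sizes(num_images, batch_size):
    """Yield the number of images in each batch of one epoch."""
    for start in range(0, num_images, batch_size):
        yield min(batch_size, num_images - start)

sizes = list(batch_sizes(289, 8))

# 36 full batches of 8 images, plus one final batch with the leftover image.
print(len(sizes), sizes[0], sizes[-1])  # 37 8 1
```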

I ran the final model for 100 epochs with batches of 8 images. Training took 50 minutes on a GTX 1080 and reached a final loss of about 0.0100.

Saving the final network produced the following TensorFlow model files:

SIZE   NAME
----------------------------------------
513M - model_01.pb
513M - model_01.meta
513M - model_01.ckpt.meta
4.8K - model_01.ckpt.index
1.6G - model_01.ckpt.data-00000-of-00001

Original

[Image: original]

Resized

[Image: resized]

Softmax

[Image: softmax output]

Final Overlay

[Image: final overlay]
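
The Softmax and Final Overlay stages above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's code: the logit order (not road, road), the 0.5 threshold, the green overlay color and the 50% blend are all hypothetical choices, and images are represented as nested lists of RGB tuples to keep the example dependency-free.

```python
import math

def road_probability(logits):
    """Softmax over two per-pixel logits (not_road, road); returns P(road)."""
    not_road, road = logits
    peak = max(not_road, road)  # subtract the max for numerical stability
    e_not, e_road = math.exp(not_road - peak), math.exp(road - peak)
    return e_road / (e_not + e_road)

def overlay(image, logits, threshold=0.5, color=(0, 255, 0), alpha=0.5):
    """Blend `color` into every pixel whose road probability exceeds `threshold`."""
    result = []
    for image_row, logit_row in zip(image, logits):
        out_row = []
        for pixel, pixel_logits in zip(image_row, logit_row):
            if road_probability(pixel_logits) > threshold:
                pixel = tuple(
                    round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color)
                )
            out_row.append(pixel)
        result.append(out_row)
    return result

# A 1x2 "image": the first pixel scores as road, the second does not.
image = [[(120, 121, 120), (120, 121, 120)]]
logits = [[(-2.0, 3.0), (4.0, -1.0)]]
result = overlay(image, logits)
print(result)  # [[(60, 188, 60), (120, 121, 120)]]
```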

Model Prediction on Test Images

A few examples from the best model run:

Urban Marked Lanes
Urban Multiple Marked Lanes
Urban Unmarked Lanes

Videos

I ran the final model on some videos from my dashcam. The results are remarkably good on portions of the route whose characteristics are similar to the KITTI dataset.

Considering that none of these images were used for training and that the video is completely different from the training data, the predictions look good:

Complete videos:

Notes

When building the initial model, I didn't set the kernel_initializer parameter of the layers, so they used the default initializer. That caused the model to generate segmentations with noisy borders:

  • kernel_initializer with default values
  • kernel_initializer with truncated normal values
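
For readers unfamiliar with the term, a truncated normal initializer draws weights from a normal distribution and re-draws any value that falls more than two standard deviations from the mean, so no weight starts unusually large. The sketch below shows that sampling rule in plain Python. The standard deviation of 0.01 and the 1x1x4096x2 kernel shape are assumptions for illustration, since the README does not state the values used.

```python
import random

def truncated_normal(mean, stddev, rng):
    """Draw from N(mean, stddev), re-drawing samples beyond 2 * stddev from the mean."""
    while True:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            return value

def init_kernel(height, width, in_channels, out_channels, stddev=0.01, seed=0):
    """Build a nested-list kernel indexed as [height][width][in_channel][out_channel]."""
    rng = random.Random(seed)
    return [[[[truncated_normal(0.0, stddev, rng) for _ in range(out_channels)]
              for _ in range(in_channels)]
             for _ in range(width)]
            for _ in range(height)]

# Hypothetical 1x1 convolution kernel mapping 4096 channels to 2 classes.
kernel = init_kernel(1, 1, 4096, 2)
weights = [w for row in kernel for col in row for chan in col for w in chan]
print(len(weights), max(abs(w) for w in weights) <= 0.02)  # 8192 True
```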
