Vehicle Detection And Tracking

The goals / steps of this project are the following:

  • Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier.
  • Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
  • Note: for those first two steps don't forget to normalize your features and randomize a selection for training and testing.
  • Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
  • Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
  • Estimate a bounding box for vehicles detected.

1. Code And Data Setup

What?                 File
code: main script     bin/detect_and_track_vehicles.py
code: helper module   lib/helper_vehicle_detection.py
code: tracking class  lib/detection.py
code: tracking class  lib/vehicle.py
code: tracking class  lib/position.py
training data         etc/ml_train_img
input test images     inp/img/test_images/*
input project video   inp/vid/project.mp4
output test images    out/img/*
output project video  out/vid/project_output.mp4

2. Usage

usage: detect_and_track_vehicles.py [-h] [--video PATH] [--startTime INT]
                                    [--endTime INT] [--unroll] [--collect]
                                    [--visLog INT] [--format STRING]
                                    [--outDir PATH] [--mlDir PATH]

a tool for detecting and tracking vehicles in images and videos

optional arguments:
  -h, --help       show this help message and exit
  --video PATH     video from a front facing camera in which to detect and
                   track vehicles
  --startTime INT  when developing the image pipeline it can be helpful to
                   focus on the difficult parts of a video. Use this argument
                   to shift the entry point. Eg. --startTime=25 starts the
                   processing pipeline at the 25th second after video begin.
  --endTime INT    Use this argument to shift the exit point. Eg. --endTime=50
                   ends the processing pipeline at the 50th second after video
                   begin.
  --unroll         Use this argument to unroll the resulting video in single
                   frames.
  --collect        Use this argument to collect false positives to improve
                   learning.
  --visLog INT     for debugging or documentation of the pipeline you can
                   output the image at a certain processing step 1=detections,
                   2=heatmap, 3=thresholded_heatmap 4=result
  --format STRING  to visualize several steps of the image pipeline and plot
                   them in one single image. use --format=collage4 for a
                   4-image-collage
  --outDir PATH    directory for output data. must not exist at call time.
                   default is --outDir=output_directory_<time>
  --mlDir PATH     directory for machine learning training images. directory
                   must contain 2 subdirectories "vehicles" and "non-
                   vehicles". default is --mlDir=etc/ml_train_img

example call for processing a video:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4

example call for processing only the part of a video between 38 and 45 seconds:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4 --startTime 38 --endTime 45

example call for processing a video. This outputs a video of a certain step of the detection pipeline:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4 --visLog 2

example call for processing a video. This outputs a video of 4 important steps of the image pipeline:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4 --format collage4

example call for processing a video. This outputs a video as a mp4 file and for each frame of the video an image:
python bin/detect_and_track_vehicles.py --video inp/vid/project_video.mp4 --unroll

3. Create and Train a Support-Vector-Machine Classifier

The detection of vehicles is performed by a Linear Support Vector Machine classifier.

The high level code for the creation of such a classifier can be found in the 'createClassifier' function in lines 635 through 783 of file 'lib/helper_vehicle_detection.py'.

3.1 Training Data

To get a classifier which can distinguish between vehicles and non-vehicles, I trained the classifier with approximately 8500 images of each category. Every image is a 64 x 64 3-color image. The data can be found in the etc/ml_train_img directory. The vehicle training data looks like this:

[sample vehicle training images]

And the non-vehicle training data looks like this:

[sample non-vehicle training images]

3.2 Features from Histogram of Oriented Gradients (HOG)

A useful HOG representation of an image should generalize well over a variety of colors and different views of similar shapes, while staying distinct enough to distinguish an object class from other classes.

To get from the RGB image to a HOG representation, I did this:

  1. Convert the image into a color space that I know produces good HOG representations. I had good experiences with the color spaces 'LUV' and 'YCrCb'.
  2. Extract one color channel.
  3. Use the function 'skimage.feature.hog()' to convert this color channel into a HOG representation.
  4. For the HOG calculation, I used 9 orientations, 8 pixels_per_cell, and 2 cells_per_block.
  5. For the project, I convert all 3 channels into their HOG representations and concatenate them into the comprehensive feature vector of the image.

The code for the HOG Transformation can be found in function 'get_hog_features' in lines 611 through 637 of file lib/helper_vehicle_detection.py.

This is what the original image and the HOG representation of the Y channel (in YCrCb color space) look like.

[original test image and its Y-channel HOG representation]
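The following is a minimal sketch of this per-channel HOG extraction, assuming an RGB input and the parameters listed above; the function name and structure are illustrative, not the repository's exact 'get_hog_features' code.

```python
# Sketch of the per-channel HOG feature extraction (illustrative names).
import cv2
import numpy as np
from skimage.feature import hog

def get_hog_features_sketch(rgb_image, orientations=9, pix_per_cell=8, cell_per_block=2):
    # 1. convert to the YCrCb color space
    ycrcb = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2YCrCb)
    # 2.-5. compute HOG for every channel and concatenate into one feature vector
    features = []
    for channel in range(3):
        channel_features = hog(ycrcb[:, :, channel],
                               orientations=orientations,
                               pixels_per_cell=(pix_per_cell, pix_per_cell),
                               cells_per_block=(cell_per_block, cell_per_block),
                               block_norm='L2-Hys',
                               feature_vector=True)
        features.append(channel_features)
    return np.concatenate(features)
```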

3.3 Features from Histogram of Color

To improve the performance of the classifier, I decided to feed in more data and use 'Histogram of Color' as a source for my feature vector.

To get from the RGB image to the Histogram of Color, I did this:

  1. Convert the image into a color space that I know produces well distinguishable Histograms of Color. I don't have any experience with this, so I decided to simply use the same color space (YCrCb) as for the HOG extraction.
  2. Extract one color channel.
  3. Use the function np.histogram() to convert the color data into a histogram with 32 bins.
  4. For the project, I convert all 3 channels into their 'Histogram of Color' representations and concatenate them into the comprehensive feature vector of the image.

The code for the Histogram of Color can be found in function 'color_hist' in lines 506 through 514 of file lib/helper_vehicle_detection.py.
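A minimal sketch of such a color-histogram feature, assuming a YCrCb input image (the function name is illustrative, not the repository's exact 'color_hist' code):

```python
# Sketch of the color-histogram feature: one 32-bin histogram per channel.
import numpy as np

def color_hist_sketch(ycrcb_image, nbins=32):
    hist_features = [np.histogram(ycrcb_image[:, :, ch], bins=nbins, range=(0, 256))[0]
                     for ch in range(3)]
    return np.concatenate(hist_features)
```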

3.4 Features from Spatial Binning of Color

To improve the performance of the classifier, I decided to feed in more data and use 'Spatial Binning' as a source for my feature vector. Spatial Binning downsamples the image to a small resolution and uses the raw pixel values themselves as features.

To get from the RGB image to the Spatial Binned Representation, I did this:

  1. Convert the image into the same color space (YCrCb) as for the HOG extraction.
  2. Resize the image to 32 x 32 x 3.
  3. Flatten the resized image with the function np.ravel().
  4. Concatenate the flattened pixel values to the comprehensive feature vector of the image.

The code for the Spatial Binning can be found in function 'bin_spatial' in lines 498 through 502 of file lib/helper_vehicle_detection.py.
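A minimal sketch of the spatial-binning feature, again assuming a YCrCb input (illustrative, not the repository's exact 'bin_spatial' code):

```python
# Sketch of spatial binning: downsample and use the raw pixel values as features.
import cv2

def bin_spatial_sketch(ycrcb_image, size=(32, 32)):
    return cv2.resize(ycrcb_image, size).ravel()
```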

3.5 Parameter and Accuracy

I split the data into 80% training set and 20% test set. I scaled the features to zero mean and unit variance using the StandardScaler() from the sklearn module. For the feature generation, I tried several combinations of parameters. I achieved my best accuracy of 99.6% on the test set with the following configuration:

ALL          color_space: 'YCrCb'
SPATIAL-BIN  spatial_size: (32, 32)
COLOR-HIST   number of bins: 32
HOG          orientations: 9
HOG          pix_per_cell: 8
HOG          cell_per_block: 2
HOG          channels: 'ALL'
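The training flow described above could look roughly like the following sketch, assuming pre-computed feature vectors for the vehicle and non-vehicle images; the names are illustrative, not the repository's exact 'createClassifier' code.

```python
# Sketch of the training flow: scale features, split 80/20, train a linear SVM.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def train_classifier_sketch(vehicle_features, non_vehicle_features):
    # stack feature vectors and build labels: 1 = vehicle, 0 = non-vehicle
    X = np.vstack((vehicle_features, non_vehicle_features)).astype(np.float64)
    y = np.hstack((np.ones(len(vehicle_features)), np.zeros(len(non_vehicle_features))))

    # scale to zero mean and unit variance
    scaler = StandardScaler().fit(X)
    X_scaled = scaler.transform(X)

    # 80% training / 20% test split with shuffling
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)

    # train the linear SVM and report test accuracy
    svc = LinearSVC()
    svc.fit(X_train, y_train)
    print('test accuracy:', svc.score(X_test, y_test))
    return svc, scaler
```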

3.6 Save Classifier

For later usage I saved the classifier in a pickle file.
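A minimal sketch, assuming the svc and scaler objects from the training step; the file name is illustrative:

```python
# Sketch of persisting the trained classifier and scaler to a pickle file.
import pickle

def save_classifier_sketch(svc, scaler, path='classifier.p'):
    with open(path, 'wb') as f:
        pickle.dump({'svc': svc, 'scaler': scaler}, f)
```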

4. Detect Vehicles in Video

4.1 Sliding Window

To detect a vehicle, we only have to look at the parts of the image where a vehicle could occur. Vehicles that are farther away appear smaller in the image than vehicles that are near the camera. To address these circumstances, I did the following:

  1. Search for vehicles only in the image area 450 < y < 650.
  2. Use different search window sizes: (110 x 110), (90 x 90), (64 x 64) and (50 x 50).
  3. Use an overlap of 75% between search windows.

The code for determining the sliding windows can be found in function 'slide_windows' in lines 420 through 459 of file lib/helper_vehicle_detection.py.

This is what 64 x 64 search windows look like:

(left: 64 x 64 without overlap, right: 64 x 64 with 75% overlap)

[64 x 64 search windows without overlap and with 75% overlap]
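A rough sketch of such a sliding-window generator with the parameters above (illustrative, not the repository's exact 'slide_windows' code):

```python
# Sketch of sliding-window generation over the region 450 < y < 650.
def slide_windows_sketch(img_width, y_start=450, y_stop=650,
                         window_sizes=(110, 90, 64, 50), overlap=0.75):
    windows = []
    for size in window_sizes:
        step = int(size * (1 - overlap))   # 75% overlap -> step of 25% of the window size
        for y in range(y_start, y_stop - size + 1, step):
            for x in range(0, img_width - size + 1, step):
                windows.append(((x, y), (x + size, y + size)))
    return windows
```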

4.2 Hot Windows

The search windows for which the classifier detects a vehicle are called hot windows.

The code for determining the hot windows can be found in function 'search_window' in lines 463 through 495 of file lib/helper_vehicle_detection.py.

This is an image with several hot windows

[image with several hot windows]
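A rough sketch of the hot-window search, assuming the trained classifier and scaler from section 3 and a hypothetical extract_features helper that builds the combined feature vector (illustrative, not the repository's exact 'search_window' code):

```python
# Sketch of classifying every search window and keeping the "hot" ones.
import cv2
import numpy as np

def search_windows_sketch(image, windows, svc, scaler, extract_features):
    hot_windows = []
    for (x1, y1), (x2, y2) in windows:
        # resize the window content to the 64 x 64 training size and classify it
        crop = cv2.resize(image[y1:y2, x1:x2], (64, 64))
        features = scaler.transform(np.array(extract_features(crop)).reshape(1, -1))
        if svc.predict(features)[0] == 1:
            hot_windows.append(((x1, y1), (x2, y2)))
    return hot_windows
```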

4.3 Heatmap

The areas inside the hot windows are summed up into a heat map of possible vehicle positions. Not every hot window is a unique car - usually a vehicle is detected by a bunch of hot windows of different sizes. This leads to hot areas that are connected. To eliminate false positives, an area needs at least 2 overlapping detections to be considered a candidate for a vehicle position.

The code for generating the heat map can be found in function 'add_heat' in lines 245 through 253 of file lib/helper_vehicle_detection.py.

This is what a heat map looks like.

(left: hot windows, right: heat map)

[hot windows and resulting heat map]
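A minimal sketch of building and thresholding the heat map (illustrative, not the repository's exact 'add_heat' code):

```python
# Sketch of accumulating heat from hot windows and thresholding the result.
import numpy as np

def add_heat_sketch(heatmap, hot_windows):
    for (x1, y1), (x2, y2) in hot_windows:
        heatmap[y1:y2, x1:x2] += 1        # each hot window adds one unit of "heat"
    return heatmap

def apply_threshold_sketch(heatmap, threshold=2):
    heatmap[heatmap < threshold] = 0      # keep only areas with >= 2 overlapping detections
    return heatmap
```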

4.4 Labels

Every isolated hot area is a possible vehicle position.

The code for generating the labels can be found in lines 175 through 185 of file lib/helper_vehicle_detection.py.

(left: heat map, right: thresholded areas (labels))

[heat map and thresholded areas (labels)]
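A minimal sketch of turning the thresholded heat map into labeled vehicle candidates, using scipy's connected-component labeling (illustrative, not necessarily the repository's exact call):

```python
# Sketch of labeling isolated hot areas as vehicle candidates.
import numpy as np
from scipy.ndimage import label

thresholded_heatmap = np.zeros((720, 1280))   # placeholder heat map
thresholded_heatmap[400:460, 800:900] = 3     # one hot area as an example

labels, n_candidates = label(thresholded_heatmap)
# 'labels' assigns each isolated hot area its own integer id;
# 'n_candidates' is the number of possible vehicle positions in the frame
```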

4.5 Detect Vehicles

If 3 consecutive frames show a detection within a tolerance of 25 pixels from the anticipated position, this detection is considered a valid vehicle position. The anticipated position is calculated from the positions in these 3 consecutive frames.

The code for detecting and tracking vehicles in videos can be found in line 204 of file lib/helper_vehicle_detection.py and in the class files lib/detection.py, lib/vehicle.py and lib/position.py.

(left: thresholded areas (labels), right: resulting vehicle detections)

[thresholded areas (labels) and resulting vehicle detections]

4.6 Tracking

Each detected vehicle position is reviewed in every frame. This is similar to the first detection of a vehicle - a position has to be confirmed within a radius of 25 pixels. If a tracked vehicle cannot be confirmed in 2 consecutive frames, the vehicle is removed and considered to have left the image.

The code for detecting and tracking vehicles in videos can be found in line 204 of file lib/helper_vehicle_detection.py and in the class files lib/detection.py, lib/vehicle.py and lib/position.py.
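A rough sketch of this frame-to-frame confirmation logic; the class and attribute names are illustrative, not the repository's lib/vehicle.py implementation.

```python
# Sketch of per-vehicle tracking with a 25-pixel confirmation radius.
import numpy as np

class TrackedVehicleSketch:
    CONFIRM_RADIUS = 25      # pixels of tolerance around the anticipated position
    MAX_MISSED_FRAMES = 2    # drop the vehicle after 2 consecutive missed confirmations

    def __init__(self, centroid):
        self.positions = [centroid]
        self.missed_frames = 0

    def anticipated_position(self):
        # simple anticipation: mean of the last 3 confirmed positions
        return np.mean(self.positions[-3:], axis=0)

    def confirm(self, detections):
        # try to confirm this vehicle with one of the current frame's detections
        anticipated = self.anticipated_position()
        for centroid in detections:
            if np.linalg.norm(np.array(centroid) - anticipated) <= self.CONFIRM_RADIUS:
                self.positions.append(centroid)
                self.missed_frames = 0
                return True
        self.missed_frames += 1
        return False

    def is_lost(self):
        return self.missed_frames >= self.MAX_MISSED_FRAMES
```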

4.7 Result of Project Video

You can find the result of the project video here: out/vid/project_output.mp4


4.8 Result of Project Video With Image Pipeline Visualization

You can find the result of the project video with pipeline visualization here: out/vid/project_collage4_output.mp4


5. Discussion

5.1 Problems / Issues

Issue 1:

In my early attempts, there were a lot of false positives on the road surface near the second bridge.

I solved the problem by taking additional training samples from exactly those areas that produced the many false hot windows. This eliminated almost all false positives.

Issue 2:

At first there were 2 hot zones on the white car, which were recognized as 2 separate vehicles.

I solved the problem by adding 50 x 50 search windows and by increasing the overlap to 75% for all search window sizes.

Issue 3:

Occasionally the detection of the white car was lost.

I increased the position tolerance of the frame-to-frame confirmation radius from 15 to 25 pixels and allowed the confirmation to fail in one frame without dropping the vehicle.

5.2 Likely Fails

  • A lot of existing training examples are taken from the project video, so there is a strong bias towards this video. The classifier would most likely fail or at least perform significantly worse on other videos in other settings.

5.3 Improve Robustness

  • More training data would improve robustness

    • with different cars
    • different daytime and luminosity
    • different colors
    • different streets
    • different vegetation
    • urban / rural settings
  • I think a well trained CNN would generalize better than a linear SVM.
