Vehicle Detection Project
The goals / steps of this project are the following:
- Perform Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier.
- Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
- Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
- Estimate a bounding box for vehicles detected.
The labeled vehicle and non-vehicle images used to train the classifier are loaded first.
Here are some of the images from the dataset.
1. Explain how (and identify where in your code) you extracted HOG features from the training images.
To extract HOG features, the hog function from the skimage library is used.
The get_hog_features method takes an image along with parameters such as orient (the number of orientation bins), pixels per cell and cells per block, and returns the HOG features.
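A minimal sketch of such a wrapper is given here, assuming scikit-image's hog function and a 64x64 single-channel input. The image size, the default parameter values and the vis / feature_vec argument names are illustrative assumptions, not taken from the project code.

```python
import numpy as np
from skimage.feature import hog


def get_hog_features(img, orient=10, pix_per_cell=16, cell_per_block=2,
                     vis=False, feature_vec=True):
    """Return HOG features for one image channel (sketch, not the project code)."""
    # skimage returns (features, hog_image) when visualize=True,
    # and only the features otherwise.
    return hog(img,
               orientations=orient,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               block_norm='L2-Hys',
               transform_sqrt=False,
               visualize=vis,
               feature_vector=feature_vec)


# Synthetic 64x64 grayscale patch standing in for a training image.
rng = np.random.default_rng(0)
patch = rng.random((64, 64))
features = get_hog_features(patch)
features_vis, hog_image = get_hog_features(patch, vis=True)
```

With vis=True the second return value is an image of the dominant gradient directions per cell, which is the kind of picture a HOG visualization shows.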
Here is what the HOG visualization looks like:
Next, the extract_features function takes a collection of images along with additional parameters, such as the color space and the number of channels to use, and returns the HOG features for the entire set of images.
This feature set is then split into a training set and a test set.
I tried various combinations of HOG parameters; the results are shown below:
As seen from the table:
- Accuracy increases with the number of channels taken into consideration for extraction of HOG features.
- The RGB color space, as expected, does not perform as well as the other color spaces.
- The combination of the YCrCb color space with an orient value of 10, 16 pixels per cell and 2 cells per block does best on both accuracy and prediction time.
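One plausible contributor to the shorter prediction time is that larger cells produce a much shorter feature vector. The arithmetic below is a sketch that assumes 64x64 training images (the image size is not stated here) and blocks that slide one cell at a time. The helper name hog_feature_length is hypothetical.

```python
def hog_feature_length(img_size, orient, pix_per_cell, cell_per_block, channels):
    """Length of a flattened HOG vector for a square image (sketch)."""
    cells_per_side = img_size // pix_per_cell
    # Blocks slide one cell at a time across the cell grid.
    blocks_per_side = cells_per_side - cell_per_block + 1
    per_channel = blocks_per_side ** 2 * cell_per_block ** 2 * orient
    return per_channel * channels


# Same settings except for the cell size, all three channels used.
coarse = hog_feature_length(64, orient=10, pix_per_cell=16, cell_per_block=2, channels=3)
fine = hog_feature_length(64, orient=10, pix_per_cell=8, cell_per_block=2, channels=3)
print(coarse, fine)  # prints "1080 5880"
```

Under these assumptions, 16 pixels per cell gives a vector roughly five times shorter than 8 pixels per cell, so there is less for the classifier to evaluate per window.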
3. Describe how (and identify where in your code) you trained a classifier using your selected HOG features (and color features if you used them).
Next, in the Train the Classifier section, a Linear SVM is instantiated and trained on the training set.
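A minimal sketch of this step follows, assuming scikit-learn's LinearSVC and train_test_split. The feature arrays are synthetic stand-ins for the HOG vectors, and the 80/20 split and variable names are assumptions rather than the project's actual values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-ins for the HOG feature vectors (not the project data).
rng = np.random.default_rng(0)
car_features = rng.normal(loc=1.0, scale=0.5, size=(200, 20))
notcar_features = rng.normal(loc=-1.0, scale=0.5, size=(200, 20))

# Stack both classes into one feature matrix with labels 1 (car) and 0 (not car).
X = np.vstack((car_features, notcar_features))
y = np.hstack((np.ones(200), np.zeros(200)))

# Hold out 20% of the samples for testing (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

svc = LinearSVC()
svc.fit(X_train, y_train)

# Fraction of held-out samples classified correctly.
accuracy = svc.score(X_test, y_test)
```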
1. Describe how (and identify where in your code) you implemented a sliding window search. How did you decide what scales to search and how much to overlap windows?
The find_cars function extracts HOG features from the image once, then subsamples them window by window and makes a prediction for each window, which implements the sliding-window approach efficiently. The size of each window is determined by a scaling factor.
The function returns the bounding boxes for regions where the prediction was true (vehicle was detected) in the image.
However, closer inspection reveals a problem: the detections also include false positives.
Various scaling factors for different window sizes were tried. Here are the results:
The smallest scaling factor used is 1.0; anything smaller resulted in too many false positives. The search region is also restricted for each window scale. Smaller scales give smaller windows, which correspond to far-away vehicles and therefore need only a narrow vertical range. Larger scales are reserved for near (and hence larger-looking) vehicles and therefore search a larger vertical region.
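The sketch below illustrates the idea of per-scale search bands. It only enumerates window positions in image coordinates; it does not reproduce the HOG subsampling done in find_cars. The 1280x720 frame size, the 64-pixel base window, the step size and the (y_start, y_stop, scale) values are all hypothetical.

```python
def windows_for_scale(x_stop, y_start, y_stop, scale,
                      base_window=64, pix_per_cell=16, cells_per_step=1):
    """List window corners ((x1, y1), (x2, y2)) in original image coordinates."""
    window = int(base_window * scale)
    step = int(pix_per_cell * cells_per_step * scale)
    boxes = []
    for top in range(y_start, y_stop - window + 1, step):
        for left in range(0, x_stop - window + 1, step):
            boxes.append(((left, top), (left + window, top + window)))
    return boxes


# Hypothetical (y_start, y_stop, scale) bands for a 1280x720 frame:
# small windows near the horizon, large windows closer to the camera.
search_bands = [(400, 464, 1.0), (400, 528, 1.5), (400, 656, 2.0)]

all_windows = []
for y_start, y_stop, scale in search_bands:
    all_windows.extend(windows_for_scale(1280, y_start, y_stop, scale))
```

Restricting each scale to its own band keeps the number of windows, and therefore the number of classifier calls per frame, small.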
2. Show some examples of test images to demonstrate how your pipeline is working. What did you do to optimize the performance of your classifier?
The last stage of the pipeline is the removal of false positives, for which a heatmap approach is used.
As we can see from the original images and their heatmaps, positive detections have multiple bounding boxes, whereas false detections have just a single bounding box. We use this observation to filter out the false positives.
In the apply threshold function, we specify the number of bounding boxes required to consider a detection as true. We use 1 as our threshold.
As we can see, false detections are successfully filtered out.
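The heatmap filtering can be sketched as follows with NumPy. The function names add_heat and apply_threshold, the box format and the choice to zero out pixels whose heat is less than or equal to the threshold are assumptions, made so that a threshold of 1 removes regions covered by a single box.

```python
import numpy as np


def add_heat(heatmap, boxes):
    """Add 1 to every pixel covered by each box ((x1, y1), (x2, y2))."""
    for (x1, y1), (x2, y2) in boxes:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap


def apply_threshold(heatmap, threshold):
    """Zero out pixels whose heat does not exceed the threshold."""
    heatmap[heatmap <= threshold] = 0
    return heatmap


# Two overlapping boxes on one car, one isolated false positive.
boxes = [((10, 10), (50, 50)), ((20, 20), (60, 60)), ((100, 10), (130, 40))]
heat = add_heat(np.zeros((80, 160), dtype=np.float32), boxes)
heat = apply_threshold(heat, 1)
```

In this toy example only the area where the two boxes overlap stays hot, and the isolated box disappears.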
With only true detections left, we can label them using the label function of the scipy.ndimage library.
The final detection area is set to the extremities of each identified label:
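A minimal sketch of this labeling step, assuming scipy.ndimage.label with its default connectivity. The helper name labeled_bboxes and the toy heatmap are illustrative.

```python
import numpy as np
from scipy.ndimage import label


def labeled_bboxes(heatmap):
    """Return one ((x1, y1), (x2, y2)) box per connected hot region."""
    # label() assigns 1..n_regions to connected groups of nonzero pixels.
    labels, n_regions = label(heatmap)
    boxes = []
    for region in range(1, n_regions + 1):
        ys, xs = np.nonzero(labels == region)
        # The box spans the extreme pixel coordinates of the region.
        boxes.append(((int(xs.min()), int(ys.min())),
                      (int(xs.max()), int(ys.max()))))
    return boxes


heat = np.zeros((80, 160))
heat[20:50, 20:50] = 2      # first hot region
heat[30:60, 100:140] = 3    # second, separate hot region
boxes = labeled_bboxes(heat)
```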
The final result looks like this.
The following changes to the HOG parameters increased classifier accuracy:
- Using all channels instead of only one of them.
- Using a color space other than RGB.
- Varying the orient parameter value to see what gives better accuracy for each color space.
1. Provide a link to your final video output. Your pipeline should perform reasonably well on the entire project video (somewhat wobbly or unstable bounding boxes are ok as long as you are identifying the vehicles most of the time with minimal false positives.)
For the video implementation, one last thing has to be considered: using information from previous frames. A Vehicle Detection class is created to store the bounding boxes extracted from each frame, which are added to a collection. This collected history is added to the heatmap, and half the size of this collection is used as the heatmap threshold.
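One way such a history class could look is sketched below. The class layout, the method names and the max_frames limit are hypothetical, and the threshold is interpreted here as half the number of stored frames, which is an assumption.

```python
from collections import deque


class VehicleDetection:
    """Keep the bounding boxes found in the most recent frames (sketch)."""

    def __init__(self, max_frames=10):
        # Each entry is the list of boxes detected in one frame;
        # the oldest frame is dropped automatically once the deque is full.
        self.history = deque(maxlen=max_frames)

    def add_boxes(self, boxes):
        self.history.append(list(boxes))

    def all_boxes(self):
        """Flatten the stored frames into one list of boxes."""
        return [box for frame in self.history for box in frame]

    def threshold(self):
        """Half the number of stored frames (assumed interpretation)."""
        return len(self.history) // 2


tracker = VehicleDetection(max_frames=4)
frames = [[((0, 0), (10, 10))], [], [((1, 1), (11, 11))],
          [((2, 2), (12, 12))], [((3, 3), (13, 13))]]
for frame_boxes in frames:
    tracker.add_boxes(frame_boxes)
```

For each new frame, the boxes from all_boxes() would be added to the heatmap and threshold() passed to the thresholding step, so a detection has to persist across several frames to survive.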
1. Briefly discuss any problems / issues you faced in your implementation of this project. Where will your pipeline likely fail? What could you do to make it more robust?
- Choice of classifier is a significant issue in this project. Although we achieve 98% accuracy, there are still false positives, and the classifier would likely fail where conditions differ much from the training data.
Use of a more robust learner such as a convolutional neural network (YOLO, UNet or SSD) would likely perform much better.
- The pipeline takes time to process each frame and make a prediction for it, which makes it unsuitable for real-time use.
Again, use of networks such as YOLO, SSD and UNet could mitigate this.
- Although taking previous frames into consideration removes jitter from the video and, to some extent, false positives, a vehicle that changes direction frequently would render this useless.