
manavkataria/CarND-Vehicle-Detection


Table of Contents

  1. Files
  2. Pipeline
  3. Pipeline In Action
  4. Challenges
  5. Shortcomings
  6. Future Enhancements
  7. Acknowledgements & References

Vehicle Detection

Udacity - Self-Driving Car NanoDegree

This video contains results and illustration of challenges encountered during this project:

youtube_thumb


Files

The project was designed to be modular and reusable. Each significant independent domain gets its own class in a separate file:

  1. main.py - Main test runner with VehicleDetection Class and test functions like train_svc_with_color_hog_hist, test_sliding_window, train_or_load_model.
  2. utils.py - Handy utilities like imcompare, warper, and debug, shared across modules
  3. settings.py - Hyperparameters and settings shared across modules
  4. rolling_statistics.py - RollingStatistics class to compute moving_average and rolling_sum
  5. README.md - description of the development process (this file)

The Pipeline section below has a high-level description of the pipeline and pointers to the implementation. The code is fairly readable and contains detailed comments that explain how it works.

Usage

Set hyperparameters and configuration in settings.py and run the main.py script as shown below. The repository includes all required files and can be used to rerun vehicle detection & tracking on a given video. Refer to the References section below for the training dataset.

$ grep 'INPUT\|OUTPUT' -Hn settings.py
settings.py:9:      INPUT_VIDEOFILE = 'test_video.mp4'
settings.py:11:     OUTPUT_DIR = 'output_images/'

$ python main.py
[test_slide_search_window:369] Processing Video:  test_video.mp4
[MoviePy] >>>> Building video output_images/test_video_output.mp4
[MoviePy] Writing video output_images/test_video_output.mp4
 100%|██████████████████████████████████████████████████████████████████| 39/39 [02:01<00:03,  3.05s/it]
[MoviePy] Done.
[MoviePy] >>>> Video ready: output_images/test_video_output.mp4

$ open output_images/test_video_output.mp4

Pipeline

  1. Basic Data Exploration

    • Visually scanned images from each class and selected a Car and a Road as canonical "Vehicle" and "Non-Vehicle" class samples for data exploration. Figures in section Pipeline In Action
    • Explored Color Histograms and HOG features, comparing how well they separate the above classes. Figures in section Pipeline In Action
    • Validated that both "in class" and "out of class" samples have nearly equal sizes, ~8.5K samples each. We thus have a balanced dataset and training wouldn't be overly biased
  2. Feature Extraction - from "Vehicle" and "Non-Vehicle" classes in extract_features_hog and single_img_features

    • Image Spatial Features as spatial_features
    • Image Color Histograms as hist_features, and
    • Histogram of Oriented Gradients as hog_features
  3. Training Car Detector - with train_or_load_model using LinearSVC in train_svc_with_color_hog_hist

    • Initial classifier test accuracy was 90% without HOG features
    • Including HOG features, plus experimentation & careful combination of hyperparameters (settings.py:L31-L41), raised the accuracy to 99%
    • Save Model & Features - using joblib, not pickle, since joblib handles large numpy arrays far more efficiently
  4. Vehicle Detection - Class that utilizes a region-limited sliding-window search, heatmaps, thresholding, labeling, and a rolling sum to filter out vehicle detections.

    • __init__ - Initializes Instance Variables like Feature Extraction and Sliding Window Search
    • Memory - Rolling Statistics: Moving Average and Rolling Sum
      • RollingStatistics object with a circular queue for saving MEMORY_SIZE number of previous frames. Leverages Pandas underneath.
      • The rolling_sum based heatmap accumulates heatmaps from the past MEMORY_SIZE frames and thresholds them together, thus eliminating one-off noisy detections. I experimented with 25+ different MEMORY_SIZE and ROLLING_SUM_HEAT_THRESHOLD combinations to come up with a video that was smooth, avoided false positives, and was responsive enough to a visible car in the video.
      • Prior to rolling_sum I experimented with moving_average, but soon realized a literal moving average is a very strict thresholding criterion, so I graduated to rolling_sum, which is simpler, more intuitive, more lenient, and offers finer thresholding control.
    • sliding_window_search - Search sliding window
      • The window size (i.e., scale) XY_WINDOW and overlap XY_OVERLAP were set to (96, 96) and 70% respectively. A 96px window is a fair middle ground that identifies cars both near and far, and 70% overlap covers enough ground to avoid missing true positives. It also improves the heat score of a successful detection. Having a single scale makes the algorithm less robust; this could be improved. See the caching discussion further down.
      • Region-limited search to Y = [400, 656] for optimization. I did not limit the X region, since in the general case it is possible to find cars in lanes to the left and right of the autonomous car.
      • Utilizes memory, 🐛debug, & exception🎇 handling
    • update_overlay - Sliding Window Search Area Highlighting with
      • identifier, and
      • dimensions of bounding_box
    • heat_and_threshold - Computes the heatmaps🔥, labels and bounding boxes with metrics for each labelled box
    • rolling_sum - Gets a rolling_sum heatmap from memory
    • add_to_debugbar - Insets debug information picture-in-picture or rather video-in-video. Quite professional, you see! 👔
  5. Save as Video.mp4
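The rolling-sum memory at the heart of the pipeline can be sketched as follows. This is a minimal numpy/deque stand-in rather than the project's Pandas-backed RollingStatistics class, and the default parameter values here are illustrative only (the real ones live in settings.py as MEMORY_SIZE and ROLLING_SUM_HEAT_THRESHOLD):

```python
from collections import deque

import numpy as np


class RollingHeatmap:
    """Minimal sketch of the rolling-sum idea: keep the last
    memory_size per-frame heatmaps and threshold their sum."""

    def __init__(self, memory_size=19, threshold=19):
        # Illustrative defaults, not the project's tuned values.
        self.frames = deque(maxlen=memory_size)  # circular buffer
        self.threshold = threshold

    def add_frame(self, heatmap):
        self.frames.append(np.asarray(heatmap, dtype=np.float32))

    def rolling_sum(self):
        # Accumulate heat over the buffered frames.
        return np.sum(np.stack(list(self.frames)), axis=0)

    def thresholded(self):
        # Zero out pixels that were not hot often enough,
        # suppressing one-off noisy detections.
        summed = self.rolling_sum()
        summed[summed <= self.threshold] = 0
        return summed
```

Because the deque is bounded, an old frame's heat silently expires once MEMORY_SIZE newer frames arrive, which is what makes the scheme responsive as well as noise-tolerant.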

Pipeline In Action

Figure: Feature Vector - Spatial Features and HLS Histogram

Feature Vector - Spatial Features and HLS Histogram

Figure: Feature Vector - HOG and HLS

Feature Vector - HOG and HLS

Figure: Limited Sliding Window Search (Scale = 1x)

Limited Sliding Window Search

Figure: Rolling Sum is Robust to Noise

rolling sum's robustness to noise

Figure: Rolling Sum Picks up where current frame heatmap may not

rolling sum picks up where current frame heatmap does not

Figure: Rolling Sum is averse to cars on other side of the Highway

rolling sum robust to cars on other lanes

Figure: Rolling Sum smoother than current frame (See Video)

youtube_thumb

Challenges

Tuning the Hyperparameters -> Experimentation + Trial & Error

It was non-trivial to choose the hyperparameters; it has primarily been a trial-and-error process. Mohan Karthik's blogpost was precise and served as a good general direction for color-histogram and HOG parameters. I still experimented with them on my own to determine what works best for me. As mentioned earlier, just the spatial features and channel histograms yielded a classifier test accuracy of 90%. I chose the HLS color space as it (or HSV) yielded great results for lane keeping, and by some argument HLS is more intuitive than HSV. Adding HOG features bumped accuracy up to 99%.
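The two cheaper feature families, spatial binning and per-channel histograms, can be sketched with plain numpy. This is a hedged stand-in, not the project's extract_features_hog / single_img_features code: the function names and defaults here are illustrative, and the real pipeline additionally concatenates HOG features (e.g. via skimage.feature.hog) on top of these vectors:

```python
import numpy as np


def color_hist_features(img_hls, nbins=32, bins_range=(0, 256)):
    """Per-channel histograms of an HLS image, concatenated into
    one feature vector (sketch of the hist_features idea)."""
    hists = [np.histogram(img_hls[:, :, ch], bins=nbins, range=bins_range)[0]
             for ch in range(3)]
    return np.concatenate(hists)


def spatial_bin_features(img, size=(32, 32)):
    """Down-sample the image and flatten it (sketch of the
    spatial_features idea). Nearest-neighbour index sampling is a
    numpy-only stand-in for cv2.resize."""
    rows = np.linspace(0, img.shape[0] - 1, size[1]).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, size[0]).astype(int)
    return img[np.ix_(rows, cols)].ravel()
```

For a 64x64 training patch these give a 96-element histogram vector and a 3072-element spatial vector, which are then scaled and fed to the LinearSVC.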

Debugging Difficulties -> Enhanced Visualizations

It wasn't easy to visualize why the system didn't work for a given frame of video. Using a rolling sum made things even harder. Hence I decided to add a few elements to make my life easier:

  1. Add insets to preview Current Frame Heatmap and Rolling Sum Heatmap
  2. Color-coded the respective detections differently: thin red from the Current Frame Heatmap and THICK GREEN from the Rolling Sum Heatmap.
  3. Their respective thresholds are presented in the status screen as HeatTh: RollSum * CurFr, 19 & 1 respectively.
  4. Rolling window buffer size is also displayed as Memory
  5. Current frame id is displayed on the left as 1046
  6. Accuracy of the classifier used is also presented as Accuracy
  7. Bounding box ids and sizes are also displayed as id | width x height around each box; This will be useful in considering a weighted average (see Enhancements below)
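The inset previews from item 1 boil down to shrinking a debug image and pasting it into a corner of the output frame. A minimal sketch of that add_to_debugbar idea, using nearest-neighbour sampling as a numpy-only stand-in for cv2.resize (the function name and margins here are illustrative, not the project's actual API):

```python
import numpy as np


def inset_debug_view(frame, debug_img, scale=0.25, margin=10):
    """Paste a shrunken copy of debug_img (e.g. a heatmap preview)
    into the top-right corner of frame. Returns a new frame."""
    h = int(frame.shape[0] * scale)
    w = int(frame.shape[1] * scale)
    # Nearest-neighbour down-sample of the debug image.
    rows = np.linspace(0, debug_img.shape[0] - 1, h).astype(int)
    cols = np.linspace(0, debug_img.shape[1] - 1, w).astype(int)
    small = debug_img[np.ix_(rows, cols)]
    out = frame.copy()
    out[margin:margin + h, -margin - w:-margin] = small
    return out
```

Running this once per frame for both the Current Frame Heatmap and the Rolling Sum Heatmap yields the video-in-video debug bar shown in the figures.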

Shortcomings

False Positives / Needs Merge

Figure: Example of a frame where a shorter bounding box needs to be merged with an adjacent bigger one

new false positive

Long Tail Detection

Figure: Example of a frame where the current implementation detects a long tail due to long frame memory

There was a tradeoff between a long tail and the possibility of not detecting a car at all. I chose to be conservative and err on the side of the long tail.

long tail

Future Enhancements

  1. Optimize the hog feature extraction by caching and reusing hog for different windows and scales.
  2. Try different Neural Network based vehicle detection approaches, like YOLO, instead of an SVM (LinearSVC)
  3. Ideas to Reduce False Positives:
    1. Consider Weighted Averaging of frames (on Bounding Box size, for example); Penalize small boxes, sustain large ones
    2. Consider frame weight expiry by incorporating a decay factor, like half-life (λ)
    3. Consider using Optical Flow techniques
  4. Debug: Add heatmap preview window in upper Debug Bar of Video Output (done)
  5. Minor: Write asserts for unittests in rolling_statistics.py
  6. Use sliders to tune thresholds (Original idea courtesy Sagar Bhokre)
  7. Integrate with Advanced Lane Finding Implementation
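Enhancement 3.2 above, frame-weight expiry, could replace the flat rolling sum with an exponentially decayed one. A hedged sketch, where half_life is a hypothetical tuning parameter (a heatmap half_life frames old contributes half the weight of the current frame):

```python
import numpy as np


def decayed_heat(heatmaps, half_life=5.0):
    """Weighted sum of buffered heatmaps, oldest first, where each
    frame's weight halves every half_life frames of age."""
    lam = np.log(2.0) / half_life  # decay rate from the half-life
    n = len(heatmaps)
    # Age 0 is the most recent frame (last in the list).
    ages = np.arange(n - 1, -1, -1)
    weights = np.exp(-lam * ages)
    return sum(w * h for w, h in zip(weights, heatmaps))
```

Compared with the flat rolling sum, stale detections fade out gradually instead of dropping off a cliff when they leave the buffer, which should shorten the long tail described under Shortcomings.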

Acknowledgements & References