
2D Feature Tracking in Camera Images for Sensor Fusion

Goal

The goal of this project is to build a collision detection system by tracking features in an image. I have built the feature tracking part and implemented various detector/descriptor combinations and matching algorithms to see which ones perform best.

Introduction

The feature tracking project consists of four parts:

The Data Buffer: Loading the images, setting up the data structure, and putting everything into the data buffer.

Keypoint Detection: Integrate several keypoint detectors, such as HARRIS, FAST, BRISK, ORB, AKAZE, and SIFT, and compare them to each other based on the number of key points and speed.

Descriptor Extraction & Matching: Extract the descriptors and match them using the brute-force and FLANN approach.

Performance Evaluation: Compare the detector/descriptor combinations and evaluate which performs best with respect to the measured parameters (number of keypoints, number of matches, and computation time).

Data Buffer Optimization (MP 1)

Since computer vision algorithms are deployed on mobile hardware with limited resources, optimizing the amount of data held in memory is of significant importance. For this purpose, I have implemented a data buffer modeled on a queue: the first image to enter is also the first to leave, making room for the next image and keeping the buffer at a constant size, in this case 2.

        DataFrame frame;
        frame.cameraImg = imgGray;
        frame.imgName = imgFullFilename;
        // once the buffer is full, drop the oldest frame (FIFO behaviour)
        if (dataBuffer.size() >= dataBufferSize){
            dataBuffer.erase(dataBuffer.begin());
        }
        dataBuffer.push_back(frame);
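
Note that erasing from the front of a std::vector shifts all remaining elements. A std::deque would make the FIFO intent explicit; the following is a minimal sketch of such an alternative (assuming the same DataFrame type), not the code used in the project.

    // Hypothetical alternative using std::deque (requires #include <deque>);
    // pop_front() removes the oldest frame in O(1)
    std::deque<DataFrame> dataBuffer;
    const size_t dataBufferSize = 2;

    DataFrame frame;
    frame.cameraImg = imgGray;
    frame.imgName = imgFullFilename;
    if (dataBuffer.size() >= dataBufferSize)
        dataBuffer.pop_front(); // oldest image leaves first (FIFO)
    dataBuffer.push_back(frame);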

Keypoint Detection Algorithm Selection (MP 2)

The computer vision library provides various algorithms to detect keypoints in images. I have selected and tested the following: Harris, Shi-Tomasi, FAST, BRISK, ORB, AKAZE, and SIFT.

The program can run in two modes. If the variable is_single_run is set to true, the values of the strings det_type and des_type are used. If is_single_run is false, all combinations of detector and descriptor types are run and a performance analysis is conducted. Set the variable to true if you want to try your own combination.

    bool is_single_run = true;
    string det_type = "FAST"; // Detector Type
    string des_type = "ORB";  // Descriptor Type

The strings det_type and des_type are checked against the implemented algorithms, and the corresponding call is made.

        if (detectorType.compare("SHITOMASI") == 0)
        {
            detKeypointsShiTomasi(keypoints, imgGray, false, ctime_detection);
        }
        else if (detectorType.compare("HARRIS") == 0)
        {
            detKeypointsHarris(keypoints, imgGray, false, ctime_detection);
        }
        else if ((detectorType.compare("FAST") == 0) || (detectorType.compare("BRISK") == 0) || (detectorType.compare("ORB") == 0) || (detectorType.compare("AKAZE") == 0) || (detectorType.compare("SIFT") == 0))
        {
            detKeypointsModern(keypoints, imgGray, detectorType, false, ctime_detection);
        }

While Shi-Tomasi and Harris have their own function calls, the remaining algorithms are clustered into a single function call, detKeypointsModern(), implemented in matching2D_Student.cpp. Inside this function, the call to the respective algorithm is made as shown below (the function also supports SURF, although SURF is not part of the dispatch above).

 if (detectorType.compare("FAST") == 0){

        t = (double)cv::getTickCount();
        cv::Ptr<cv::FastFeatureDetector> fast_detect = cv::FastFeatureDetector::create(30,true);
        fast_detect->detect(img,keypoints);
        t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
        cout << "FAST detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;

    }

    else if (detectorType.compare("BRISK") == 0){

        t = (double)cv::getTickCount();
        cv::Ptr<cv::FeatureDetector> detector = cv::BRISK::create();
        detector->detect(img,keypoints);
        t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
        cout << "BRISK detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;

    }

    else if (detectorType.compare("ORB") == 0){

        t = (double)cv::getTickCount();
        cv::Ptr<cv::FeatureDetector> detector = cv::ORB::create();
        detector->detect(img,keypoints);
        t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
        cout << "ORB detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;

    }

    else if (detectorType.compare("AKAZE") == 0){

        t = (double)cv::getTickCount();
        cv::Ptr<cv::FeatureDetector> detector = cv::AKAZE::create();
        detector->detect(img,keypoints);
        t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
        cout << "AKAZE detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;

    }
    else if (detectorType.compare("SURF") == 0){


        int minHessian=400;
        t = (double)cv::getTickCount();
        cv::Ptr<cv::FeatureDetector> detector = cv::xfeatures2d::SURF::create(minHessian);
        detector->detect(img,keypoints);
        t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
        cout << "SURF detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;

    }
    else if (detectorType.compare("SIFT") == 0){

        t = (double)cv::getTickCount();
        cv::Ptr<cv::FeatureDetector> detector = cv::xfeatures2d::SIFT::create();
        detector->detect(img,keypoints);
        t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
        cout << "SIFT detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;
    }

Keypoint Removal (MP 3)

Since the project scope is restricted to detecting the vehicle in front, only the keypoints detected on the preceding vehicle are kept for further processing. The variable bFocusOnVehicle must be set to true to enable this restriction. The bounding box for the preceding vehicle is given by cv::Rect(). The keypoints are separated by looping through all detected keypoints and copying only those that fall inside the box into a fresh vector. The built-in cv::Rect::contains() could also be used to check whether a keypoint falls inside the bounding box; here, I have checked it manually (a sketch using contains() follows the code below).

        bool bFocusOnVehicle = true;
        cv::Rect vehicleRect(535, 180, 180, 150);
        if (bFocusOnVehicle)
        {
            vector<cv::KeyPoint> filteredPts;
            for (cv::KeyPoint kp : keypoints){
                // keep only keypoints strictly inside the bounding box
                if ((kp.pt.x > vehicleRect.x) && (kp.pt.x < (vehicleRect.x + vehicleRect.width)) && (kp.pt.y > vehicleRect.y) && (kp.pt.y < (vehicleRect.y + vehicleRect.height))){
                    filteredPts.push_back(kp);
                }
            }
            keypoints = filteredPts;
            cout << " NOTE: Keypoints restricted to box of preceding vehicle! " << keypoints.size() << endl;
        }
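
For comparison, the same filtering written with cv::Rect::contains() would look roughly like the sketch below; this is not the project's code, and since contains() includes the top-left border of the rectangle, the edge behaviour can differ marginally from the strict inequalities above.

        // Hypothetical variant using cv::Rect::contains()
        vector<cv::KeyPoint> filteredPts;
        for (const cv::KeyPoint &kp : keypoints)
        {
            if (vehicleRect.contains(kp.pt))
                filteredPts.push_back(kp);
        }
        keypoints = filteredPts;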

Keypoint Descriptors (MP 4)

Keypoint descriptor algorithms such as BRISK, BRIEF, ORB, FREAK, AKAZE, and SIFT are implemented in this project. However, some descriptors work only with a specific detector. Among the combinations available here, the AKAZE descriptor works only with keypoints detected by the AKAZE detector, and the ORB descriptor does not work with SIFT keypoints. These restrictions are also coded in this project (a sketch of such a guard is shown below). The call to the respective descriptor is made based on the string value des_type.
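
The exact code for these checks is not reproduced here, but a guard of this kind might look like the following sketch; the function name isValidCombination is hypothetical, not the project's exact code.

    // Hypothetical guard that skips invalid detector/descriptor pairs
    bool isValidCombination(const std::string &detType, const std::string &desType)
    {
        if (desType == "AKAZE" && detType != "AKAZE")
            return false; // AKAZE descriptors require AKAZE keypoints
        if (desType == "ORB" && detType == "SIFT")
            return false; // ORB extraction does not work on SIFT keypoints
        return true;
    }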

A general function call is made in MidTermProject_Camera_Student.cpp, passing the detected keypoints, the image, and the descriptor type to be used. The extracted descriptors and the time taken for the extraction are returned via the references descriptors and ctime_desextract. The descriptors are then assigned to the last DataFrame.

descKeypoints((dataBuffer.end() - 1)->keypoints, (dataBuffer.end() - 1)->cameraImg, descriptors, descriptorType,ctime_desextract);

(dataBuffer.end() - 1)->descriptors = descriptors;

In matching2D_Student.cpp, the function selects the right extractor and executes it. Finally, the descriptors and the execution time are passed back by reference.

    cv::Ptr<cv::DescriptorExtractor> extractor;
    if (descriptorType.compare("BRISK") == 0)
    {
        int threshold = 30;        // FAST/AGAST detection threshold score.
        int octaves = 3;           // detection octaves (use 0 to do single scale)
        float patternScale = 1.0f; // apply this scale to the pattern used for sampling the neighbourhood of a keypoint.

        extractor = cv::BRISK::create(threshold, octaves, patternScale);
    }
    else if (descriptorType.compare("BRIEF") == 0)
    {
        extractor = cv::xfeatures2d::BriefDescriptorExtractor::create();
    }
    else if (descriptorType.compare("ORB") == 0)
    {
        extractor = cv::ORB::create();
    }
    else if (descriptorType.compare("FREAK") == 0)
    {
        extractor = cv::xfeatures2d::FREAK::create();
    }
    else if (descriptorType.compare("AKAZE") == 0)
    {
        extractor = cv::AKAZE::create();
    }
    else if (descriptorType.compare("SIFT") == 0)
    {
        extractor = cv::xfeatures2d::SIFT::create();
    }

    // perform feature description
    double t = (double)cv::getTickCount();
    extractor->compute(img, keypoints, descriptors);
    t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
    compute_time = t * 1000 / 1.0;
    cout << descriptorType << " descriptor extraction in " << 1000 * t / 1.0 << " ms" << endl;

Descriptor Matching and Descriptor Ratio (MP 5 / 6)

The type of matcher to use is set via the string matcherType; brute-force matching and FLANN matching are implemented. The selector type is decided by the variable selectorType; the available options are nearest neighbour (SEL_NN) and k-nearest neighbour (SEL_KNN). The descriptor type must also be set, depending on whether a binary descriptor or a HOG (Histogram of Oriented Gradients) descriptor is used. In this project, SIFT is the only HOG-based descriptor.

            string matcherType = "MAT_BF";        // MAT_BF, MAT_FLANN
            string selectorType = "SEL_KNN";       // SEL_NN, SEL_KNN
            string descriptorType = descriptor_class[desIndex]; // DES_BINARY, DES_HOG

In matching2D_Student.cpp, the function matchDescriptors implements the matching process. The matcher type is selected first. For brute-force matching, the Hamming norm is used by default; for HOG descriptors, L2 normalisation is used instead. Then either nearest-neighbour or k-nearest-neighbour selection is applied, with a fixed value of k = 2 neighbours. A descriptor distance ratio threshold of 0.8 is used to accept a match.

    bool crossCheck = false;
    cv::Ptr<cv::DescriptorMatcher> matcher;

    if (matcherType.compare("MAT_BF") == 0)
    {
        int normType = cv::NORM_HAMMING;
        if (descriptorType == "DES_HOG"){
            // HOG-based descriptors (SIFT) are floating point, so use the L2 norm
            normType = cv::NORM_L2;
            cout << "switching to NORM_L2 for " << descriptorType << endl;
        }
        matcher = cv::BFMatcher::create(normType, crossCheck);
    }
    else if (matcherType.compare("MAT_FLANN") == 0)
    {
        // FLANN requires floating-point descriptors, so convert binary ones
        if (descSource.type() != CV_32F){
            descSource.convertTo(descSource, CV_32F);
            descRef.convertTo(descRef, CV_32F);
        }
        matcher = cv::FlannBasedMatcher::create();
    }

    // perform matching task
    if (selectorType.compare("SEL_NN") == 0)
    { // nearest neighbor (best match)
        matcher->match(descSource, descRef, matches); // finds the best match for each descriptor in descSource
    }
    else if (selectorType.compare("SEL_KNN") == 0)
    { // k nearest neighbors (k=2) with descriptor distance ratio test
        vector<vector<cv::DMatch>> knn_matches;
        matcher->knnMatch(descSource, descRef, knn_matches, 2);

        const float ratio_threshold = 0.8f;
        for (size_t i = 0; i < knn_matches.size(); i++){
            // guard against queries that returned fewer than two neighbours
            if (knn_matches[i].size() < 2)
                continue;
            float ratio = knn_matches[i][0].distance / knn_matches[i][1].distance;
            if (ratio < ratio_threshold)
                matches.push_back(knn_matches[i][0]);
        }
    }
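
The call to matchDescriptors in MidTermProject_Camera_Student.cpp is not reproduced above. It matches the descriptors of the two frames held in the data buffer; assuming a signature that mirrors the descKeypoints call shown earlier (the parameter order is an assumption), the call site might look roughly like this sketch.

    // Hypothetical call site, assuming at least two frames are buffered
    vector<cv::DMatch> matches;
    matchDescriptors((dataBuffer.end() - 2)->keypoints, (dataBuffer.end() - 1)->keypoints,
                     (dataBuffer.end() - 2)->descriptors, (dataBuffer.end() - 1)->descriptors,
                     matches, descriptorType, matcherType, selectorType);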

Keypoint size and distribution for all detectors (MP 7)

As seen from the statistics, the BRISK detector finds the largest number of keypoints, with AKAZE and FAST finding the next largest. The images below show the distribution of keypoints for the different detectors.

Keypoints detected by Harris detector

Keypoints detected by Shi-Tomasi detector

Keypoints detected by FAST detector

Keypoints detected by BRISK detector

Keypoints detected by ORB detector

Keypoints detected by AKAZE detector

Keypoints detected by SIFT detector

Keypoint matches for all detector / descriptor combinations (MP 8)

The BRISK - BRISK combination matched the maximum number of keypoints. As shown later, the FAST - BRISK and FAST - ORB combinations match a comparable number of keypoints at a much higher speed (highlighted in red).

Computation time for detector / descriptor combination (MP 9)

From the statistics it is evident that FAST - BRISK and FAST - ORB have the lowest computation times while still producing a reasonable number of keypoint matches, followed by the ORB - BRISK combination. My conclusion is to use the FAST - BRISK or FAST - ORB combination for tracking in camera images. Note: all values are averaged over all images.

Keypoint Matches using FAST - BRISK combination

Keypoint Matches using FAST - ORB combination

Keypoint Matches using ORB - BRISK combination

Dependencies for Running Locally

Building the project requires cmake, make, and a C++ compiler. OpenCV must be built with the opencv_contrib modules, since the code uses the xfeatures2d module for SIFT, SURF, BRIEF, and FREAK.

Basic Build Instructions

  1. Clone this repo.
  2. Make a build directory in the top level directory: mkdir build && cd build
  3. Compile: cmake .. && make
  4. Run it: ./2D_feature_tracking
