The goal of this project is to build a collision detection system by tracking the features in an image. I have built the feature tracking part and implemented various detector / descriptor combinations and matching algorithms to see which ones perform best.
The feature tracking project consists of four parts:
The Data Buffer: This consists of loading the images, setting up the data structure, and putting everything into the data buffer.
Keypoint Detection: Integrate several keypoint detectors, such as HARRIS, FAST, BRISK, ORB, AKAZE, and SIFT, and compare them to each other based on the number of key points and speed.
Descriptor Extraction & Matching: Extract the descriptors and match them using the brute-force and FLANN approaches.
Performance Evaluation: Compare and evaluate which combination of algorithms performs best with respect to the performance measurement parameters.
Since computer vision algorithms will be deployed on mobile hardware with limited resources, limiting the amount of data held in memory is important. For this purpose, I have implemented a data buffer modeled on a queue (FIFO) data structure: the first image to enter is also the first to leave, making room for the next image and keeping the buffer at a constant size, in this case 2.
DataFrame frame;
frame.cameraImg = imgGray;
frame.imgName = imgFullFilename;
if (dataBuffer.size() >= dataBufferSize){
dataBuffer.erase(dataBuffer.begin());
}
dataBuffer.push_back(frame);
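As an illustration of the same first-in, first-out idea, here is a minimal sketch that uses std::deque instead of a vector. The names Frame and FrameBuffer are hypothetical stand-ins and are not part of the project code; the real DataFrame also holds the image, keypoints and descriptors.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for the project's DataFrame; only the image name is kept.
struct Frame {
    std::string imgName;
};

// Fixed-capacity FIFO buffer: pushing onto a full buffer drops the oldest frame.
class FrameBuffer {
public:
    explicit FrameBuffer(std::size_t capacity) : capacity_(capacity) {}

    void push(const Frame &frame) {
        if (capacity_ == 0) return;          // nothing can be stored
        if (frames_.size() >= capacity_) {
            frames_.pop_front();             // constant time, no element shifting
        }
        frames_.push_back(frame);
    }

    std::size_t size() const { return frames_.size(); }
    const Frame &oldest() const { return frames_.front(); }
    const Frame &newest() const { return frames_.back(); }

private:
    std::size_t capacity_;
    std::deque<Frame> frames_;
};
```

With a buffer size of 2 the difference from erasing the first element of a vector is negligible; the deque mainly matters if the buffer is ever made larger.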
The OpenCV computer vision library provides various algorithms to detect keypoints in images. I have selected and tested the following algorithms: Harris, Shi-Tomasi, FAST, BRISK, ORB, AKAZE and SIFT.
The program can run in two modes. If the variable is_single_run is set to true, the values of the strings det_type and des_type are used. If is_single_run is false, all combinations of detector and descriptor types are run and a performance analysis is conducted. Set the variable to true if you want to use your own combination.
bool is_single_run = true;
string det_type = "FAST"; // Detector Type
string des_type = "ORB";// Descriptor Type
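The batch mode can be pictured as a nested loop over detector and descriptor names. The sketch below is only an illustration of that idea: the function name allCombinations is hypothetical, and it deliberately does not filter out unsupported pairs.

```cpp
#include <string>
#include <utility>
#include <vector>

// Returns every (detector, descriptor) pair in detector-major order.
// Compatibility filtering is intentionally left out of this sketch.
std::vector<std::pair<std::string, std::string>> allCombinations(
    const std::vector<std::string> &detectors,
    const std::vector<std::string> &descriptors) {
    std::vector<std::pair<std::string, std::string>> combos;
    combos.reserve(detectors.size() * descriptors.size());
    for (const auto &det : detectors) {
        for (const auto &des : descriptors) {
            combos.emplace_back(det, des);
        }
    }
    return combos;
}
```

Each returned pair would then be run over the full image sequence, in the same way a single det_type / des_type pair is run in single mode.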
The strings det_type and des_type are checked against the implemented algorithms, and the corresponding call is made.
if (detectorType.compare("SHITOMASI") == 0)
{
detKeypointsShiTomasi(keypoints, imgGray, false,ctime_detection);
}
else if(detectorType.compare("HARRIS") == 0)
{
detKeypointsHarris(keypoints, imgGray, false,ctime_detection);
}
else if( (detectorType.compare("FAST") == 0) || (detectorType.compare("BRISK") == 0) || (detectorType.compare("ORB") == 0) || (detectorType.compare("AKAZE") == 0) || (detectorType.compare("SIFT") == 0) )
{
detKeypointsModern(keypoints, imgGray,detectorType, false,ctime_detection);
}
While Shi-Tomasi and Harris have their own functions, the other algorithms are grouped into a single function, detKeypointsModern(), implemented in matching2D_Student.cpp. Inside this function, the call to the respective algorithm is made as shown below.
if (detectorType.compare("FAST") == 0){
t = (double)cv::getTickCount();
cv::Ptr<cv::FastFeatureDetector> fast_detect = cv::FastFeatureDetector::create(30,true);
fast_detect->detect(img,keypoints);
t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
cout << "FAST detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;
}
else if (detectorType.compare("BRISK") == 0){
t = (double)cv::getTickCount();
cv::Ptr<cv::FeatureDetector> detector = cv::BRISK::create();
detector->detect(img,keypoints);
t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
cout << "BRISK detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;
}
else if (detectorType.compare("ORB") == 0){
t = (double)cv::getTickCount();
cv::Ptr<cv::FeatureDetector> detector = cv::ORB::create();
detector->detect(img,keypoints);
t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
cout << "ORB detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;
}
else if (detectorType.compare("AKAZE") == 0){
t = (double)cv::getTickCount();
cv::Ptr<cv::FeatureDetector> detector = cv::AKAZE::create();
detector->detect(img,keypoints);
t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
cout << "AKAZE detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;
}
else if (detectorType.compare("SURF") == 0){
int minHessian=400;
t = (double)cv::getTickCount();
cv::Ptr<cv::FeatureDetector> detector = cv::xfeatures2d::SURF::create(minHessian);
detector->detect(img,keypoints);
t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
cout << "SURF detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;
}
else if (detectorType.compare("SIFT") == 0){
t = (double)cv::getTickCount();
cv::Ptr<cv::FeatureDetector> detector = cv::xfeatures2d::SIFT::create();
detector->detect(img,keypoints);
t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
cout << "SIFT detection with n=" << keypoints.size() << " keypoints in " << 1000 * t / 1.0 << " ms" << endl;
}
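Every branch above repeats the same timing pattern around detect(). One way to factor that out is a small helper that times any callable. This is a sketch only: the name elapsedMs is hypothetical, and it uses std::chrono rather than the cv::getTickCount() calls used in the project.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename Callable>
double elapsedMs(Callable &&work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Callable>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A detector call could then be timed as elapsedMs([&] { detector->detect(img, keypoints); }), which keeps each branch down to creating the detector.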
Since the project scope is restricted to the vehicle directly ahead, only the keypoints detected on the preceding vehicle are considered for further processing. Set the variable bFocusOnVehicle to true to restrict keypoints to the preceding vehicle. Its bounding box is given by a cv::Rect. The keypoints are separated by looping over all detected keypoints and copying only those that fall inside the box into a fresh vector. The built-in cv::Rect member function contains() can also be used for this check; here, I have checked the bounds manually.
bool bFocusOnVehicle = true;
cv::Rect vehicleRect(535, 180, 180, 150);
if (bFocusOnVehicle)
{
vector<cv::KeyPoint> filteredPts;
for (const cv::KeyPoint &kp : keypoints){
if ( ( (kp.pt.x > vehicleRect.x ) && (kp.pt.x < ( vehicleRect.x + vehicleRect.width ) ) ) && ( (kp.pt.y > vehicleRect.y ) && (kp.pt.y < ( vehicleRect.y + vehicleRect.height ) ) ) ){
filteredPts.push_back(kp);
}
}
keypoints = filteredPts;
cout << " NOTE: Keypoints restricted to box of preceding vehicle, n=" << keypoints.size() << endl;
}
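For comparison, the containment test can be written as a small predicate. The sketch below uses hypothetical stand-in types (Pt, Box) so that it builds without OpenCV. It follows the half-open convention that, to my understanding, cv::Rect::contains() uses: the left and top edges count as inside, the right and bottom edges as outside. The manual check above uses strict comparisons on all four sides, so the two differ only for points lying exactly on the left or top edge.

```cpp
#include <vector>

// Hypothetical stand-ins for cv::Point2f and cv::Rect.
struct Pt { float x; float y; };

struct Box {
    int x, y, width, height;
    // Half-open test: left/top edges are inside, right/bottom edges are outside.
    bool contains(const Pt &p) const {
        return p.x >= x && p.x < x + width && p.y >= y && p.y < y + height;
    }
};

// Keeps only the points that fall inside the box, preserving their order.
std::vector<Pt> filterInside(const std::vector<Pt> &points, const Box &box) {
    std::vector<Pt> kept;
    for (const Pt &p : points) {
        if (box.contains(p)) kept.push_back(p);
    }
    return kept;
}
```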
Keypoint descriptor algorithms such as BRISK, BRIEF, ORB, FREAK, AKAZE and SIFT are implemented in this project. However, some descriptors work only with a specific detector. Among the combinations available in this project, the AKAZE descriptor only works with keypoints detected by the AKAZE detector. Similarly, the ORB descriptor does not work with SIFT keypoints. These restrictions are also coded in this project. The call to the respective descriptor is made based on the string value des_type.
A general function call is made in MidTermProject_Camera_Student.cpp, passing the detected keypoints, the image, and the descriptor type to be used. The extracted descriptors and the time taken by the extraction are returned via the references descriptors and ctime_desextract. The descriptors are then assigned to the last DataFrame.
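The two restrictions named above can be expressed as a small guard that is evaluated before a combination is run. This is a sketch only: the function name isSupportedCombination is hypothetical, and it encodes just the two rules stated here, not every constraint OpenCV may impose.

```cpp
#include <string>

// Encodes only the two restrictions stated in the text; all other pairs are assumed to work.
bool isSupportedCombination(const std::string &detector, const std::string &descriptor) {
    if (descriptor == "AKAZE" && detector != "AKAZE") return false; // AKAZE needs AKAZE keypoints
    if (descriptor == "ORB" && detector == "SIFT") return false;    // ORB does not accept SIFT keypoints
    return true;
}
```

In batch mode, pairs for which this returns false would simply be skipped.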
descKeypoints((dataBuffer.end() - 1)->keypoints, (dataBuffer.end() - 1)->cameraImg, descriptors, descriptorType,ctime_desextract);
(dataBuffer.end() - 1)->descriptors = descriptors;
In matching2D_Student.cpp, the function selects the right extractor and runs it. Finally, the descriptors and the execution time are passed back by reference.
cv::Ptr<cv::DescriptorExtractor> extractor;
if (descriptorType.compare("BRISK") == 0)
{
int threshold = 30; // FAST/AGAST detection threshold score.
int octaves = 3; // detection octaves (use 0 to do single scale)
float patternScale = 1.0f; // apply this scale to the pattern used for sampling the neighbourhood of a keypoint.
extractor = cv::BRISK::create(threshold, octaves, patternScale);
}
else if (descriptorType.compare("BRIEF") == 0)
{
extractor = cv::xfeatures2d::BriefDescriptorExtractor::create();
//extractor->compute(img,keypoints,descriptors);
}
else if (descriptorType.compare("ORB") == 0)
{
extractor = cv::ORB::create();
//extractor->compute(img,keypoints,descriptors);
}
else if (descriptorType.compare("FREAK") == 0)
{
extractor = cv::xfeatures2d::FREAK::create();
//extractor->compute(img,keypoints,descriptors);
}
else if (descriptorType.compare("AKAZE") == 0)
{
extractor = cv::AKAZE::create();
//extractor->compute(img,keypoints,descriptors);
}
else if (descriptorType.compare("SIFT") == 0)
{
extractor = cv::xfeatures2d::SIFT::create();
//extractor->compute(img,keypoints,descriptors);
}
// perform feature description
double t = (double)cv::getTickCount();
extractor->compute(img, keypoints, descriptors);
t = ((double)cv::getTickCount() - t) / cv::getTickFrequency();
compute_time = t * 1000 / 1.0;
cout << descriptorType << " descriptor extraction in " << 1000 * t / 1.0 << " ms" << endl;
The type of matcher to be used is set in the string matcherType. Brute-force matching and FLANN matching are implemented. The selector is chosen by the variable selectorType; the available options are nearest neighbour (SEL_NN) and k-nearest neighbour (SEL_KNN). The descriptor type must be set to indicate whether a binary descriptor (DES_BINARY) or a HOG (Histogram of Oriented Gradients) descriptor (DES_HOG) is used. In this project, SIFT is the HOG-based descriptor.
string matcherType = "MAT_BF"; // MAT_BF, MAT_FLANN
string selectorType = "SEL_KNN"; // SEL_NN, SEL_KNN
string descriptorType = descriptor_class[desIndex]; // DES_BINARY, DES_HOG
In matching2D_Student.cpp, the function matchDescriptors implements the matching process. The matcher is selected first. For brute-force matching, the Hamming norm is the default distance measure; for HOG descriptors, the L2 norm is used instead. Then either nearest-neighbour or k-nearest-neighbour matching is run, the latter with a fixed k of 2. For k-nearest-neighbour matching, a match is kept only if the ratio of the best to the second-best distance is below the threshold of 0.8.
bool crossCheck = false;
cv::Ptr<cv::DescriptorMatcher> matcher;
if (matcherType.compare("MAT_BF") == 0)
{
int normType = cv::NORM_HAMMING;
if (descriptorType=="DES_HOG"){
normType = cv::NORM_L2;
cout<<"switching to L2_NORM for "<< descriptorType<<endl;
}
matcher = cv::BFMatcher::create(normType, crossCheck);
}
else if (matcherType.compare("MAT_FLANN") == 0)
{
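// FLANN expects floating-point descriptors; convert each matrix independently
if ( descSource.type() != CV_32F ){
descSource.convertTo(descSource,CV_32F);
}
if ( descRef.type() != CV_32F ){
descRef.convertTo(descRef,CV_32F);
}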
matcher = cv::FlannBasedMatcher::create();
}
// perform matching task
if (selectorType.compare("SEL_NN") == 0)
{ // nearest neighbor (best match)
matcher->match(descSource, descRef, matches); // Finds the best match for each descriptor in descSource
}
else if (selectorType.compare("SEL_KNN") == 0)
{ // k nearest neighbors (k=2)
vector<vector<cv::DMatch>> knn_matches;
matcher->knnMatch(descSource, descRef, knn_matches,2);
const float ratio_threshold=0.8f;
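for ( size_t i=0; i<knn_matches.size();i++){
if ( knn_matches[i].size() < 2 )
continue; // fewer than two neighbours returned: the ratio test cannot be applied
float ratio = knn_matches[i][0].distance / knn_matches[i][1].distance;
if ( ratio < ratio_threshold )
matches.push_back(knn_matches[i][0]);
}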
}
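The distance ratio test can be studied in isolation. The sketch below uses a hypothetical Match struct in place of cv::DMatch so that it builds without OpenCV, and the function name ratioTest is likewise an assumption. It compares by multiplication instead of division, which avoids dividing by a second-best distance of zero, and it skips queries that have fewer than two candidates.

```cpp
#include <vector>

// Hypothetical stand-in for cv::DMatch.
struct Match {
    int queryIdx;
    int trainIdx;
    float distance;
};

// Keeps the best candidate of each query when it is clearly better than the runner-up.
std::vector<Match> ratioTest(const std::vector<std::vector<Match>> &knnMatches,
                             float ratioThreshold) {
    std::vector<Match> good;
    for (const auto &candidates : knnMatches) {
        if (candidates.size() < 2) continue; // the ratio test needs two neighbours
        const Match &best = candidates[0];
        const Match &second = candidates[1];
        // Equivalent to best / second < threshold for positive distances.
        if (best.distance < ratioThreshold * second.distance) {
            good.push_back(best);
        }
    }
    return good;
}
```

With a threshold of 0.8, a best distance of 10 against a second-best of 40 is kept, while 30 against 32 is rejected as ambiguous.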
As the statistics show, the BRISK detector finds the most keypoints, followed by AKAZE and FAST. The images below show the distribution of keypoints in the image for the different detectors.
Keypoints detected by Harris detector
Keypoints detected by Shi Tomasi detector
Keypoints detected by FAST detector
Keypoints detected by BRISK detector
Keypoints detected by ORB detector
Keypoints detected by AKAZE detector
Keypoints detected by SIFT detector
The BRISK - BRISK combination matched the most keypoints. As shown later, the FAST - BRISK and FAST - ORB combinations match a comparable number of keypoints at a much higher speed (highlighted in red).
The statistics show that FAST - BRISK and FAST - ORB have the lowest computation time while still producing a reasonable number of keypoint matches, followed by the ORB - BRISK combination. My conclusion is to use the FAST - BRISK or FAST - ORB combination for tracking in camera images. Note: all values are averaged over all the images.
Keypoint Matches using FAST - BRISK combination
Keypoint Matches using FAST - ORB combination
Keypoint Matches using ORB - BRISK combination
- cmake >= 2.8
  - All OSes: click here for installation instructions
- make >= 4.1 (Linux, Mac), 3.81 (Windows)
  - Linux: make is installed by default on most Linux distros
  - Mac: install Xcode command line tools to get make
  - Windows: Click here for installation instructions
- OpenCV >= 4.1
  - This must be compiled from source using the -D OPENCV_ENABLE_NONFREE=ON cmake flag for testing the SIFT and SURF detectors.
  - The OpenCV 4.1.0 source code can be found here
- gcc/g++ >= 5.4
  - Linux: gcc / g++ is installed by default on most Linux distros
  - Mac: same deal as make - install Xcode command line tools
  - Windows: recommend using MinGW
- Clone this repo.
- Make a build directory in the top level directory:
mkdir build && cd build
- Compile:
cmake .. && make
- Run it:
./2D_feature_tracking