HOGpp

This repository contains an implementation of the rectangular histogram of oriented gradients (R-HOG) feature descriptor using integral histograms. The integral histogram representation makes it possible to compute HOG features of arbitrary rectangular subregions of an image in constant time. This is particularly useful if features must be computed repeatedly within the same image, e.g., in a sliding window manner.
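The following pure-Python sketch illustrates the idea behind integral histograms. It is not HOGpp's implementation, and the function names are hypothetical. Given a per-pixel orientation bin index and a vote weight (e.g., the gradient magnitude), one cumulative table per bin is built once. After that, the histogram of any axis-aligned rectangle costs four lookups per bin, independent of the rectangle's area.

```python
from typing import List, Sequence

Table = List[List[List[float]]]

def integral_histogram(bins: Sequence[Sequence[int]],
                       weights: Sequence[Sequence[float]],
                       num_bins: int) -> Table:
    """H[r][c][b] = sum of weights of pixels with bin b in rows < r and columns < c."""
    rows, cols = len(bins), len(bins[0])
    # An extra leading row and column of zeros avoids boundary checks in queries.
    H = [[[0.0] * num_bins for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            for b in range(num_bins):
                H[r + 1][c + 1][b] = H[r][c + 1][b] + H[r + 1][c][b] - H[r][c][b]
            H[r + 1][c + 1][bins[r][c]] += weights[r][c]
    return H

def region_histogram(H: Table, top: int, left: int, height: int, width: int) -> List[float]:
    """Histogram of a rectangle in O(num_bins), regardless of its size."""
    bottom, right = top + height, left + width
    return [H[bottom][right][b] - H[top][right][b] - H[bottom][left][b] + H[top][left][b]
            for b in range(len(H[0][0]))]

bins = [[0, 1, 2], [1, 1, 0]]
weights = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
H = integral_histogram(bins, weights, num_bins=3)
print(region_histogram(H, 0, 0, 2, 3))  # [7.0, 11.0, 3.0]
print(region_histogram(H, 1, 1, 1, 2))  # [6.0, 5.0, 0.0]
```

Building the table is linear in the number of pixels and bins. Every subsequent region query is O(number of bins), which is what makes repeated sliding-window extraction cheap.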

HOG features can be seen as a special case of the Scale-Invariant Feature Transform (SIFT) descriptor computed over a dense grid of keypoints, with each block additionally contrast-normalized.

Features

C++ templated implementation
Python support for 32-, 64-, and 80-bit floating-point precision
Unrestricted input size (in contrast, OpenCV as of version 4.5.5 requires the input size to be a multiple of the block size)
Support for arbitrary integer (8-bit to 64-bit, both signed and unsigned) and floating-point input (in contrast, OpenCV requires 8-bit unsigned integer input)
Masking support (i.e., spatial exclusion of gradient magnitudes from contributing to features)

For a complete summary of differences between HOGpp and existing implementations, refer to the feature matrix below.

Requirements

C++17 compiler
Boost 1.70
CMake 3.15
Eigen 3.4.0
fmt 6.0
OpenCV 4.0
pybind11 2.6.2 (version 2.9.0 is required for use with Visual Studio 17 2022 and above)

More recent versions of the above are expected to work as well.

Getting Started

In Python:

import cv2
from hogpp import IntegralHOGDescriptor

desc = IntegralHOGDescriptor()
# Load an image, e.g., using OpenCV (the file name is a placeholder)
image = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
# Precompute the gradient histograms. This needs to be done only once per image.
desc.compute(image)
# Extract the feature descriptor of a region of interest. The call can be
# repeated for different subregions of the same image. Note the use of matrix
# indexing along each axis as opposed to Cartesian coordinates.
roi = (0, 0, 128, 64)  # top-left corner (row, column), then size (height, width)
X = desc(roi)
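Because the precomputation happens once per image, a typical use is to enumerate many windows and query the descriptor for each. The generator below is a hypothetical helper, not part of HOGpp. The 128×64 window matches the ROI above, while the 8-pixel stride is an assumption. It yields ROIs in the same (row, column, height, width) order used by the example.

```python
from typing import Iterator, Tuple

Roi = Tuple[int, int, int, int]  # (row, column, height, width)

def sliding_window_rois(image_height: int, image_width: int,
                        window: Tuple[int, int] = (128, 64),
                        stride: Tuple[int, int] = (8, 8)) -> Iterator[Roi]:
    """Yield every window lying entirely inside the image, in row-major order."""
    win_h, win_w = window
    step_r, step_c = stride
    for row in range(0, image_height - win_h + 1, step_r):
        for col in range(0, image_width - win_w + 1, step_c):
            yield (row, col, win_h, win_w)

# With HOGpp, after a single desc.compute(image) call, each ROI could be queried as:
#     features = [desc(roi) for roi in sliding_window_rois(*image.shape[:2])]
print(len(list(sliding_window_rois(144, 80))))  # 9
```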

Comparison to Existing Libraries

The following feature matrix summarizes the differences between existing implementations.

Library	Signed Orientations	Custom Gradients	Masking	Arbitrary Input Size	Implementation
HOGpp	✔️	✔️	✔️	✔️	C++
OpenCV	✔️	✖	✖	✖	C++
scikit-image	✖	✖	✖	✔️	Cython/Python

Differences to Dalal & Triggs Formulation

When using HOGpp, one should be aware of subtle differences between the integral histogram implementation and the one originally proposed by Dalal & Triggs.

In general, computing R-HOG consists of the following steps:

(optional) gamma correction
gradient computation
orientation binning within a cell
- down-weighting of pixels using a Gaussian with respect to their position within a block
- trilinear interpolation of magnitude votes between neighboring bins in both orientation and position
block normalization
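The gradient computation and orientation binning steps can be sketched for a single cell as follows. This is a simplified illustration under our own assumptions, not HOGpp's code. It uses centered [-1, 0, 1] derivatives clamped at the image border and unsigned orientations. Each pixel votes into a single bin, with no Gaussian down-weighting and no interpolation. Such per-pixel votes do not depend on the queried region, which is why they can be accumulated in an integral histogram. The optional boolean mask (a hypothetical parameter) mirrors the masking feature by zeroing the magnitudes of excluded pixels.

```python
import math
from typing import List, Optional, Sequence

def cell_histogram(image: Sequence[Sequence[float]], top: int, left: int,
                   cell_size: int = 8, num_bins: int = 9,
                   mask: Optional[Sequence[Sequence[bool]]] = None) -> List[float]:
    """Orientation histogram of one cell using unsigned gradients and hard binning."""
    rows, cols = len(image), len(image[0])
    hist = [0.0] * num_bins
    bin_width = math.pi / num_bins  # unsigned orientations span [0, pi)
    for r in range(top, top + cell_size):
        for c in range(left, left + cell_size):
            # Centered differences, clamped at the image border.
            gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
            gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            if mask is not None and not mask[r][c]:
                magnitude = 0.0  # masked pixels do not contribute
            angle = math.atan2(gy, gx) % math.pi  # fold the signed angle into [0, pi)
            b = min(int(angle / bin_width), num_bins - 1)  # guard against rounding to pi
            hist[b] += magnitude
    return hist

ramp = [[float(c) for c in range(8)] for _ in range(8)]  # intensity grows to the right
print(cell_histogram(ramp, 0, 0))  # all votes land in bin 0 (horizontal gradient)
```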

Given these steps, R-HOG extracted using an integral histogram is, in principle, slightly inferior to the original formulation, because neither Gaussian down-weighting of pixels nor trilinear interpolation can be performed efficiently within the integral histogram framework. However, the integral histogram R-HOG formulation is substantially faster while remaining a sufficiently close approximation of the original R-HOG formulation.

Despite these limitations, our evaluation on the INRIA person dataset and the comparison against OpenCV's HOGDescriptor indicate that Gaussian down-weighting in particular does not necessarily improve the generalization ability of the associated classifiers.

For a comparison of both approaches, the interested reader should refer to:

Zhu, Q., Yeh, M.-C., Cheng, K.-T., & Avidan, S. (2006). Fast Human Detection Using a Cascade of Histograms of Oriented Gradients. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1491–1498). IEEE. DOI: 10.1109/CVPR.2006.119

Performance on the INRIA Person Dataset

The HOGpp implementation was validated by applying it to the task of pedestrian detection. For the most part, we replicated the experiments by Dalal & Triggs with a few alterations.

More specifically, we trained a linear support vector machine (SVM) in the primal using stochastic gradient descent (SGD) on features extracted from cropped annotations of the INRIA person dataset. We then quantitatively compared the performance of the obtained classifier against models trained on descriptors extracted using OpenCV and scikit-image.

The following figure illustrates the steps involved in training a pedestrian classifier on HOG features and in using it to predict the class of new feature vectors.

At a high level, HOG features describe the silhouette of a pedestrian, which the classifier then uses much like a template in template matching, albeit with some tolerance to pose variations.

Training

For comparison purposes, we trained all classifiers using the same fixed set of HOG parameters, producing a 3780-dimensional feature vector. Specifically, the parameters employed were:

9 orientation bins constructed from unsigned gradients
cell size of 8×8 pixels
overlapping blocks consisting of 16×16 pixels (or equivalently, 2×2 cells)
L2-Hys block normalization with clipping at 0.2
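These parameters account for the 3780 dimensions, assuming the usual 128×64 (height×width) detection window and a block stride of one cell. Under these assumptions, the window holds 16×8 cells and the 2×2-cell blocks occupy 15×7 = 105 positions. Each block contributes 2·2·9 = 36 values, and 105·36 = 3780. The sketch below reproduces this count and shows L2-Hys normalization of one block vector (L2-normalize, clip at 0.2, renormalize). The helper names and the small ε guard against division by zero are our assumptions.

```python
import math
from typing import List, Sequence, Tuple

def descriptor_length(window: Tuple[int, int] = (128, 64), cell: Tuple[int, int] = (8, 8),
                      block_cells: Tuple[int, int] = (2, 2),
                      block_stride_cells: Tuple[int, int] = (1, 1),
                      num_bins: int = 9) -> int:
    """Number of values in an R-HOG descriptor for one detection window."""
    cells_r, cells_c = window[0] // cell[0], window[1] // cell[1]
    blocks_r = (cells_r - block_cells[0]) // block_stride_cells[0] + 1
    blocks_c = (cells_c - block_cells[1]) // block_stride_cells[1] + 1
    return blocks_r * blocks_c * block_cells[0] * block_cells[1] * num_bins

def l2_hys(block: Sequence[float], clip: float = 0.2, eps: float = 1e-5) -> List[float]:
    """L2-normalize, clip each value at `clip`, then L2-normalize again."""
    def l2(v: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(x * x for x in v) + eps * eps)
        return [x / norm for x in v]
    return l2([min(x, clip) for x in l2(block)])

print(descriptor_length())  # 3780
```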

We then trained an initial SVM classifier using 5-fold stratified inner cross-validation, optimizing the regularization penalty via grid search. Additionally, 20% of the samples of each training split were held out as a validation split to allow for early stopping.
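For illustration, the following sketch trains a linear SVM in the primal with plain SGD on the L2-regularized hinge loss and stops early based on a held-out validation split. It is not the training code used for these experiments. The learning rate `eta`, the regularization weight `lam`, the `patience`, and all names are assumptions, and the cross-validated grid search over `lam` is omitted for brevity.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def decision_function(w: Vector, b: float, x: Vector) -> float:
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def accuracy(w: Vector, b: float, X: Sequence[Vector], y: Sequence[int]) -> float:
    return sum(1 for xi, yi in zip(X, y) if yi * decision_function(w, b, xi) > 0) / len(y)

def train_linear_svm_sgd(X: Sequence[Vector], y: Sequence[int],
                         X_val: Sequence[Vector], y_val: Sequence[int],
                         lam: float = 1e-3, eta: float = 0.1, max_epochs: int = 50,
                         patience: int = 5, seed: int = 0) -> Tuple[List[float], float]:
    """SGD on lam/2 * ||w||^2 + hinge loss, with labels in {-1, +1}.

    The bias is not regularized. Training stops once the validation accuracy has
    not improved for `patience` epochs; the best parameters seen are returned.
    """
    rng = random.Random(seed)
    w: List[float] = [0.0] * len(X[0])
    b = 0.0
    best_acc, best_w, best_b = -1.0, list(w), b
    stale_epochs = 0
    order = list(range(len(X)))
    for _ in range(max_epochs):
        rng.shuffle(order)
        for i in order:
            # Subgradient evaluated at the current parameters.
            violates_margin = y[i] * decision_function(w, b, X[i]) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 regularizer
            if violates_margin:  # step on the hinge loss
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
        val_acc = accuracy(w, b, X_val, y_val)
        if val_acc > best_acc:
            best_acc, best_w, best_b = val_acc, list(w), b
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_w, best_b
```

In the setup described above, `X_val` and `y_val` would correspond to the 20% validation portion of each training split, and the procedure would be repeated for every candidate penalty within the stratified cross-validation.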

After obtaining the initial models, we used each classifier to perform an exhaustive search for false positives (i.e., hard mining) and retrained the classifiers with the hard-mined samples included.

We used confidence-based sampling, as opposed to random sampling, to subsample the large set of false positives. Specifically, up to 30 of the most confident false positives (i.e., the samples farthest from the decision boundary) were selected as hard negatives.
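Confidence-based selection amounts to ranking the false positives by their decision value and keeping the top ones. A minimal sketch, assuming `scores` holds the decision values of candidate windows from pedestrian-free images (e.g., those of one negative image) and that a window counts as a detection when its score exceeds zero. The threshold and the function name are our assumptions.

```python
import heapq
from typing import List, Sequence

def select_hard_negatives(scores: Sequence[float], max_count: int = 30,
                          threshold: float = 0.0) -> List[int]:
    """Indices of the most confident false positives, most confident first.

    Every window scoring above `threshold` is a false positive, since no window
    contains a pedestrian; those farthest from the decision boundary are kept.
    """
    false_positives = [(s, i) for i, s in enumerate(scores) if s > threshold]
    return [i for _, i in heapq.nlargest(max_count, false_positives)]

print(select_hard_negatives([-1.0, 0.5, 2.0, 0.1, 3.0], max_count=2))  # [4, 2]
```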

Quantitative Results

The following plot summarizes the performance of the refined models at various decision thresholds.

Overall, the HOGpp-based model outperforms the models that use OpenCV and scikit-image HOG descriptors.

A closer look at additional classification metrics, however, shows that HOGpp achieves a lower precision than the other two implementations. Yet its recall, and consequently its F₁ score, are considerably higher, so that HOGpp outperforms both implementations.

Implementation	Precision	Recall	F₁ score	Accuracy
hogpp	95.45%	90.75%	93.04%	97.20%
skimage	96.95%	83.53%	89.74%	96.06%
cv2	98.32%	79.46%	87.89%	95.48%
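As a consistency check, the F₁ score is the harmonic mean of precision P and recall R, F₁ = 2PR / (P + R). Recomputing it from the rounded percentages in the table reproduces the listed values. Accuracy cannot be recomputed this way because it also depends on the true negatives.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# (precision, recall) taken from the table above
results = {"hogpp": (0.9545, 0.9075), "skimage": (0.9695, 0.8353), "cv2": (0.9832, 0.7946)}
for name, (p, r) in results.items():
    print(f"{name}: F1 = {f1_score(p, r):.2%}")
# hogpp: F1 = 93.04%
# skimage: F1 = 89.74%
# cv2: F1 = 87.89%
```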

Hard Negatives

It is also important to consider the number of hard negatives produced by each of the HOG descriptor implementations. The following table provides an overview of the corresponding absolute numbers.

Implementation	Hard negatives
hogpp	30584
cv2	31113
skimage	33433

In this specific application, the initial model obtained from HOGpp descriptors produces the fewest false positives usable for further refinement. Although the overall number of training samples is therefore the lowest, the HOGpp model still achieves the best performance in terms of the F₁ score and ROC AUC. This also indicates that the initial HOGpp model already generalizes better than the OpenCV- and scikit-image-based models.

Due to the stochastic nature of the learning process, the number of hard negatives in particular can vary with the chosen random seed. These numbers should therefore be taken with a grain of salt, since at times the OpenCV-based model can produce fewer hard negatives than HOGpp. This observation, however, does not affect the generalization ability of the refined models on this task.

Runtime Performance

The following bar plot summarizes the average runtime of individual HOG implementations for extracting the descriptor of a single 128×64 (height×width) region of interest (ROI) within a larger image as performed during hard mining.

The runtime of the precompute stage, which applies only to HOGpp, is negligible and therefore barely visible in the bar plot; the extract stage accounts for most of the computational cost. Nevertheless, HOGpp outperforms both other implementations in terms of the average cumulative runtime per ROI, requiring around 32 μs.

The speed-up factors achieved by HOGpp with respect to the OpenCV and scikit-image implementations are as follows:

	cv2	skimage
hogpp	×2.4	×7.3
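A minimal sketch of how per-ROI averages and speed-up factors of this kind could be measured. Here `extract` is a hypothetical callable standing in for one implementation's per-ROI extraction (for HOGpp, `desc` itself after a single `desc.compute(image)` call, with the precompute time measured separately). The function names are our assumptions, and the actual benchmark setup is not described here.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

Roi = Tuple[int, int, int, int]  # (row, column, height, width)

def mean_runtime_us(extract: Callable[[Roi], object], rois: Sequence[Roi],
                    repeats: int = 3) -> float:
    """Average wall-clock time per ROI in microseconds over `repeats` passes."""
    start = time.perf_counter()
    for _ in range(repeats):
        for roi in rois:
            extract(roi)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(rois)) * 1e6

def speedups(runtimes_us: Dict[str, float], reference: str) -> Dict[str, float]:
    """How many times faster `reference` is than each other implementation."""
    base = runtimes_us[reference]
    return {name: t / base for name, t in runtimes_us.items() if name != reference}
```

A speed-up factor is simply the ratio of the other implementation's average runtime to that of the reference, so ×2.4 means the reference needs about 1/2.4 of the other implementation's time per ROI.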

Final Remarks

As always, the provided results are specific to the described experiment, environment, and the setup used to evaluate the models, and therefore should not be extrapolated to different tasks without validation.

License

This document and all figures are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

HOGpp is provided under the Apache License 2.0.

sergiud/hogpp