Improving Hog allocations / Making a fast HOG version #52

Open
davidbp opened this issue Aug 20, 2018 · 2 comments

davidbp (Contributor) commented Aug 20, 2018

After looking carefully at the HOG code and timing pedestrian detection on an image (I found it too slow), I want to propose an alternative version that is as fast as possible, even though it may differ slightly from the version originally proposed in the literature. It would be nice to benchmark both versions in terms of F-score on a common benchmark task. In any case, some of the ideas in the proposed version could probably be reused to rethink parts of the current HOG implementation and reduce its compute time.

After some discussion with @zygmuntszpak on Slack, I will start by outlining the components needed to implement the standard HOG (a rough sketch in code follows the list):

  1. Divide window into adjacent non-overlapping cells of size 8 x 8 pixels.
  2. For each cell, compute a histogram of the gradient orientations binned into B bins.
  3. Group the cells into overlapping blocks of 2 x 2 cells (so each block has 16 x 16 pixels).
  4. Concatenate the four cell histograms in each block into a single block feature, and normalize the block feature by its Euclidean norm.
  5. The resulting HOG feature is the concatenation of all block features within a specified window (e.g. 128 x 64).
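To make the five steps concrete, here is a minimal, hedged sketch in Julia. Everything below is illustrative only and not the existing ImageFeatures implementation: the names `cell_histograms` and `hog_descriptor` are made up, the vote is a nearest-bin magnitude-weighted count rather than the trilinear interpolation the package uses, and the image is assumed to be a grayscale Float64 array.

    using Images, LinearAlgebra

    # Steps 1 and 2: one orientation histogram per non-overlapping cell_size x cell_size cell.
    function cell_histograms(img; cell_size = 8, nbins = 9)
        gx = imfilter(img, centered([-1 0 1]))
        gy = imfilter(img, centered([-1 0 1]'))
        mag   = hypot.(gx, gy)
        phase = mod.(atan.(gy, gx), π)          # unsigned orientation in [0, π)
        ncellr = size(img, 1) ÷ cell_size
        ncellc = size(img, 2) ÷ cell_size
        hist = zeros(nbins, ncellr, ncellc)
        binwidth = π / nbins
        for c in 1:ncellc*cell_size, r in 1:ncellr*cell_size
            b = clamp(floor(Int, phase[r, c] / binwidth) + 1, 1, nbins)
            hist[b, (r - 1) ÷ cell_size + 1, (c - 1) ÷ cell_size + 1] += mag[r, c]
        end
        return hist
    end

    # Steps 3, 4 and 5: overlapping 2 x 2 blocks of cells, L2-normalised and concatenated.
    function hog_descriptor(hist)
        nbins, ncellr, ncellc = size(hist)
        feature = Float64[]
        for bc in 1:ncellc-1, br in 1:ncellr-1
            block = vec(hist[:, br:br+1, bc:bc+1])
            append!(feature, block ./ (norm(block) + eps()))
        end
        return feature
    end

    img = rand(Float64, 128, 64)                # one 128 x 64 detection window
    descriptor = hog_descriptor(cell_histograms(img))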

Currently, the code performs this process for a given input image and HOG() struct. This creates a basic problem when users want to apply the descriptor to a 'big' image for object detection: a mix of redundant histogram computations (when windows overlap) and a lot of allocations (for each window several arrays are created: gradients in x and y, magnitudes, and orientations).

Fast HOG version 1

  1. Same
  2. Same
  3. Skip
  4. Skip
  5. The resulting HOG is a view of the cell features within a specified window

Why skip 3 and 4?
Well, if we do not normalize the histograms, the blocks seem a bit odd: we would end up with the exact same cell histograms copied into different blocks, which is a lot of redundant information. With normalization it makes sense, since each block's normalization factor changes its copy of the "redundant" cells.

I will call the array made of the cell histograms a hogmap, which might look like this:

C_11  C_12  C_13  C_14 ...
C_21  C_22  C_23  C_24 ...
C_31  C_32  C_33  C_34 ...
...

Where C_ij corresponds to a histogram with B bins.
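To make the "view" idea of step 5 concrete, here is a hedged sketch, assuming a hogmap laid out as (B, cell_rows, cell_cols), e.g. the output of the hypothetical cell_histograms above. A 128 x 64 window whose top-left corner falls on a cell boundary is just a view of 16 x 8 cell histograms, with no new histogram computation or allocation:

    hogmap = rand(9, 32, 32)                        # placeholder hogmap: 9 bins, 32 x 32 cells
    cell_size = 8
    wincells = (128 ÷ cell_size, 64 ÷ cell_size)    # a 128 x 64 window covers 16 x 8 cells
    cr, cc = 1, 1                                   # cell coordinates of the window's top-left corner
    fast_hog = @view hogmap[:, cr:cr+wincells[1]-1, cc:cc+wincells[2]-1]
    feature  = vec(fast_hog)                        # flatten if a 1-D feature vector is needed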

Hey, but this is not a HOG!

Well, it is a descriptor made of histograms of oriented gradients; it just skips the block-wise normalization in order to compute faster. I would like to test whether this really incurs a high performance penalty. When the original HOG was proposed, nobody (as far as I am aware) augmented training sets online. We could do that to generate samples with different illumination in different regions, letting the learning algorithm become invariant to such changes without the descriptor needing to perform local normalizations.

@zygmuntszpak (Member)

I've been thinking about how to restructure the code to best support your suggestion. I've taken some inspiration from Mathematica's API, in particular, the GradientOrientationFilter.

Looking at the function reference for the ImageFeatures package, I noticed that we have ImageFeatures.corner_orientations.

I propose that we add an ImageFeatures.gradient_orientations function to the API, which moves the following code from create_descriptor(img::AbstractArray{CT, 2}, params::HOG) where CT<:Images.NumberLike in HOG.jl into its own function body:

    # compute the gradient in x and y, then its polar form (magnitude and orientation)
    gx = imfilter(img, centered([-1 0 1]))
    gy = imfilter(img, centered([-1 0 1]'))
    mag = hypot.(gx, gy)
    phase = orientation.(gx, gy)

and, of course, the equivalent code for multi-channel images.
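For concreteness, one possible shape for that function might be the following. The name and signature are only a suggestion, not an existing ImageFeatures API; it simply wraps the code above (reusing the orientation helper already used in HOG.jl) and returns the polar form of the gradient:

    function gradient_orientations(img::AbstractArray{CT, 2}) where CT <: Images.NumberLike
        gx = imfilter(img, centered([-1 0 1]))
        gy = imfilter(img, centered([-1 0 1]'))
        mag   = hypot.(gx, gy)
        phase = orientation.(gx, gy)
        return mag, phase
    end

A method for multi-channel images could then dispatch on the element type in the same way.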

Additionally, we add an orientations2histograms or orientations_histograms function (or another name) to the API, where the user can specify the cell size as well as an interpolation scheme (i.e. the current gradient-weighted 'count' with trilinear_interpolation, a gradient-weighted 'count' with no interpolation, or a bona fide count with no gradient weighting).

The result of this function call will produce what you have called a hogmap.
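One way the three binning options could be exposed (all names here are hypothetical, just to illustrate the design choice) is with small singleton types, so the hot accumulation loop stays free of branches:

    abstract type BinningScheme end
    struct TrilinearWeighted <: BinningScheme end   # magnitude vote, trilinear interpolation
    struct MagnitudeWeighted <: BinningScheme end   # magnitude vote, nearest bin only
    struct PlainCount        <: BinningScheme end   # unweighted count, nearest bin only

    # Contribution of one pixel to its histogram bin under each scheme.
    vote(::MagnitudeWeighted, m) = m
    vote(::PlainCount, m)        = one(m)

A call such as orientations2histograms(mag, phase; cell_size = 8, scheme = MagnitudeWeighted()) could then accumulate vote(scheme, mag[r, c]) into the bin chosen from phase[r, c], with a TrilinearWeighted method splitting each vote across neighbouring bins and cells.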

The create_descriptor(img::AbstractArray{CT, 2}, params::HOG) where CT<:Images.NumberLike can then use orientations_histograms internally before constructing the canonical HOG feature using one of the specified block normalisation options.

A separate issue is adding a framework where the user can specify a region of interest in an image, as well as a window size and stride, so that the features are constructed for each window and stride inside the region of interest. We want to do this without recomputing the gradients etc. for each window inside the region of interest. I think we can handle this with the mapwindow function.
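For reference, ImageFiltering's mapwindow applies a function to every sliding window of an array; whether the per-window HOG construction can be phrased this way efficiently is exactly the open question. A minimal usage example (a 3 x 3 median filter, unrelated to HOG itself):

    using ImageFiltering
    using Statistics: median

    A = rand(Float64, 64, 64)
    B = mapwindow(median, A, (3, 3))    # apply `median` to every 3 x 3 neighbourhood of A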

davidbp (Contributor, Author) commented Aug 23, 2018

I like the approach you propose.

I am not sure about the mapwindow function. Nevertheless, I think we both agree that we have to avoid RECOMPUTING the gradients in polar form and the histograms for each window of a bigger image under consideration.

Then the example from the documentation, which currently does the following:

for j in 32:10:cols-32
    for i in 64:10:rows-64
        box = img[i-63:i+64, j-31:j+32]
        descriptor[:, 1] = create_descriptor(box, HOG())
        predicted_label, s = svmpredict(model, descriptor);
    end
end

might instead do something like:

for j in 32:10:cols-32
    for i in 64:10:rows-64
        box = (i-63:i+64, j-31:j+32)
        descriptor[:, 1] = create_descriptor(orientation_histograms, box, HOG())
        predicted_label, s = svmpredict(model, descriptor);
    end
end

Here create_descriptor does not recompute gradients or histograms; it simply takes a slice of orientation_histograms (previously built) to generate a descriptor. The only issue I see with this approach is that the window has to move in steps that are multiples of cell_size, right?

Consider the case where we build the orientations_histogram with cell_size=8 and then want to take a patch that starts at position 6 of the original image. Then we "have a problem". We should restrict patches to be placed at multiples of cell_size to be able to reuse all the histograms already built.
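A small sketch of that constraint (the hogmap layout and all names are illustrative, as in the earlier sketches): if the window's top-left pixel is (i, j), the precomputed histograms can only be reused when i - 1 and j - 1 are multiples of cell_size, so the sweep steps by cell_size (or a multiple of it) instead of by 10 pixels as in the documentation example:

    cell_size = 8
    rows, cols = 256, 256                       # size of the image the hogmap was built from
    hogmap = rand(9, rows ÷ cell_size, cols ÷ cell_size)   # placeholder: 9 bins per 8 x 8 cell
    win_h, win_w = 128, 64                      # window size in pixels

    for j in 1:cell_size:cols-win_w+1, i in 1:cell_size:rows-win_h+1
        cr = (i - 1) ÷ cell_size + 1            # first cell row covered by the window
        cc = (j - 1) ÷ cell_size + 1            # first cell column covered by the window
        descriptor = vec(@view hogmap[:, cr:cr+win_h÷cell_size-1, cc:cc+win_w÷cell_size-1])
        # feed `descriptor` to the classifier here
    end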

I did something similar here:
https://github.com/davidbp/learn_julia/blob/master/JuliaImages/pedestrian_detection_perceptron_customhog.ipynb

It takes 0.044186 seconds (50.51 k allocations: 8.154 MiB) (cell 58) to build the orientations_histogram and apply a model over views of the orientations_histogram array.
