stbt.match: Merge neighbouring ROIs from previous pyramid level #558

Open

@drothlis drothlis commented Jan 24, 2019

If the previous pyramid level (that is, running cv2.matchTemplate on a
smaller version of the image) identified Regions Of Interest (ROIs)
that are small and close together, then at the next pyramid level we
have to invoke cv2.matchTemplate (now on the full-sized image)
multiple times.

This is still a worthwhile optimisation compared to running
cv2.matchTemplate on the entire full-sized image; but if we merge
ROIs that are very close together, it's even faster.
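To make the effect concrete, here is a minimal sketch that merges ROIs lying within a few pixels of each other and counts how many searches remain. It uses plain rectangle arithmetic, which is not how this PR does it (the PR applies a morphological closing to the thresholded match image); the `(x, y, width, height)` tuple format, the function names and the `max_gap` parameter are all made up for illustration.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height): hypothetical ROI format


def gap(a: Rect, b: Rect) -> int:
    """Largest axis-aligned gap between two rectangles (0 if they touch or overlap)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return max(dx, dy)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that contains both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def merge_close_rois(rois: List[Rect], max_gap: int) -> List[Rect]:
    """Repeatedly merge any two ROIs that are at most max_gap pixels apart."""
    merged = list(rois)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if gap(merged[i], merged[j]) <= max_gap:
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Three "words" on one line of text, 4 pixels apart, plus one distant ROI.
word_rois = [(10, 20, 30, 8), (44, 20, 25, 8), (73, 20, 40, 8), (300, 200, 30, 8)]
merged = merge_close_rois(word_rois, max_gap=10)
print(len(word_rois), "searches before merging,", len(merged), "after")
```

Each remaining rectangle would need one cv2.matchTemplate call at the next pyramid level. A merged rectangle covers at least as many pixels as the ROIs it replaces, so fewer calls is not automatically faster.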

This effect is particularly pronounced, now that we've tightened the
match threshold, if you're looking for an image of a word and the frame
has a full paragraph of text; the word might (tentatively) match at
every word in the paragraph, but not at the spaces between the words.

In the above scenario, I measured 47ms vs. 69ms, on my laptop.

N.B. The morphological operation "Closing" means to dilate and then
erode -- it has the effect of closing holes or gaps. Using a square
kernel (all ones) is 4 times faster than using an elliptical kernel
(250µs for the cv2.morphologyEx call vs. 1.1ms).
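As an illustration of what closing does to a row of word-sized matches, here is a small pure-Python sketch on a binary mask. It is not the PR's code (the PR calls cv2.morphologyEx, where closing is selected with cv2.MORPH_CLOSE); the helper names are hypothetical, the kernel is a square of ones with an odd size, and pixels outside the mask are simply ignored, which is an assumption of this sketch.

```python
def _window_filter(mask, size, combine):
    """Apply combine (any or all) over a size x size window around each pixel.

    Assumes an odd size; pixels outside the mask are ignored.
    """
    half = size // 2
    height, width = len(mask), len(mask[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                mask[yy][xx]
                for yy in range(max(0, y - half), min(height, y + half + 1))
                for xx in range(max(0, x - half), min(width, x + half + 1))
            ]
            row.append(int(combine(window)))
        out.append(row)
    return out


def dilate(mask, size):
    return _window_filter(mask, size, any)


def erode(mask, size):
    return _window_filter(mask, size, all)


def close(mask, size):
    """Closing: dilate, then erode with the same square kernel."""
    return erode(dilate(mask, size), size)


def count_regions(mask):
    """Count 4-connected regions of set pixels with a flood fill."""
    height, width = len(mask), len(mask[0])
    seen = set()
    regions = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] and (y, x) not in seen:
                regions += 1
                seen.add((y, x))
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return regions


# Three tentative "word" matches on one line, separated by 2-pixel "spaces".
lines = ["." * 20, "." * 20, "..###..###..###.....", "." * 20, "." * 20]
mask = [[int(c == "#") for c in line] for line in lines]
closed = close(mask, 3)

print("regions before:", count_regions(mask))
print("regions after: ", count_regions(closed))
print("".join("#" if v else "." for v in closed[2]))
```

The dilation bridges the gaps between the words and the erosion shrinks the result back to the original outline, leaving a single region to search instead of three.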

TODO:

  • Fix match failures at edge locations.
  • Measure the performance impact across a wider range of examples.

drothlis commented Jan 24, 2019

Looking for this reference image: [image: template]
In this frame: [image: source]

| | master (sqdiff @ 0.98) | this branch | v29 (sqdiff-normed @ 0.8) |
|---|---|---|---|
| ROIs identified at previous pyramid level | [image: level1-source_matchtemplate_threshold] | [image: level1-source_matchtemplate_threshold] | [image: level1-source_matchtemplate_threshold] |
| Becomes ROIs in the full sized image | [image: level0-source_with_rois] | [image: level0-source_with_rois] | [image: level0-source_with_rois] |
| Number of cv2.matchTemplate calls | 97 | 12 | 19 |
| %timeit stbt.match(template, frame) takes | 69ms | 47ms | 54ms |

@drothlis

Ironically the old sqdiff-normed @ 0.80 takes 54ms: better than the current master, but not as fast as this PR.

Because the old threshold is more generous, the ROIs were more likely to be joined already.

@drothlis

test_match_visualisation was failing because match locations at the very edge of the image were being eroded away. This was because I used an even-sized kernel. Changing the kernel size from 10 to 11 fixed it.
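The edge problem can be reproduced with a one-dimensional model of the closing. This is only a sketch under an assumption about how the window is anchored: for a kernel of length `size` the window is taken to cover offsets `-(size // 2)` to `size - size // 2 - 1`, so an even size is lopsided, and pixels outside the row are ignored. The function names are hypothetical and none of this is the PR's code.

```python
def _window(x, size, width):
    """Indices covered by a kernel of length size anchored at size // 2 (assumed)."""
    anchor = size // 2
    return range(max(0, x - anchor), min(width, x - anchor + size))


def dilate(row, size):
    return [int(any(row[i] for i in _window(x, size, len(row)))) for x in range(len(row))]


def erode(row, size):
    return [int(all(row[i] for i in _window(x, size, len(row)))) for x in range(len(row))]


def close(row, size):
    """Closing: dilate, then erode with the same kernel length."""
    return erode(dilate(row, size), size)


# A single match location at the very right-hand edge of a 30-pixel row.
row = [0] * 29 + [1]

even = close(row, 10)
odd = close(row, 11)

print("kernel 10 keeps the edge match:", bool(even[-1]))
print("kernel 11 keeps the edge match:", bool(odd[-1]))
```

In this model the lopsided even-sized window does not undo its own dilation symmetrically, so the match at the last pixel disappears, whereas the odd-sized window returns the row unchanged.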

I have also made the kernel 11x11 instead of 11x1 (horizontal), because it should handle more cases. The cost of the cv2.morphologyEx call went from 200µs to 250µs.

Note that this filter is square (all ones). If I use an elliptical filter, cv2.morphologyEx takes 4x longer (1.1ms).

@drothlis drothlis force-pushed the merge-match-rois branch 2 times, most recently from 5164d0c to 3f3fcb2 Compare January 25, 2019 16:38
@drothlis

Rebased onto master to get test fixes from #560.

drothlis commented Jan 27, 2019

On 1518 template/screenshot pairs from a real test-pack (69 templates x 22 screenshots):

Greatest improvement (absolute): 130ms (37%)
Greatest improvement (relative): 44% (17ms)
Worst deterioration (absolute): 53ms (14%)
Worst deterioration (relative): 24% (22ms)

Average difference (absolute): 1.758s improvement

Here's the worst deterioration:

| | master (sqdiff @ 0.98) | this branch |
|---|---|---|
| ROIs identified at pyramid level 2 | [image: level2-source_matchtemplate_threshold] | [image: level2-source_matchtemplate_threshold] |
| Becomes ROIs in pyramid level 1 | [image: level1-source_with_rois] | [image: level1-source_with_rois] |
| Time of stbt.match | 91ms | 112ms |

drothlis commented Jan 27, 2019

Reducing the kernel size of the closing operation at the smallest pyramid level to (5,5) seems to be an improvement but still isn't unambiguously better than master:

Greatest improvement (absolute): 133ms (37%)
Greatest improvement (relative): 42% (106ms)
Worst deterioration (absolute): 59ms (31%)
Worst deterioration (relative): same one

Average improvement (absolute): 2.876s
(Edit: That can't be the average improvement because it's 20x larger than the greatest improvement. Unfortunately I can't find the spreadsheet where I did these calculations 2 weeks ago).
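Since the spreadsheet is lost, here is a rough sketch of how summary figures of this kind could be recomputed from per-pair timings, including the sanity check that an average can never exceed the greatest improvement. The function name and the last two timing pairs are made up; the first two pairs are the 69ms/47ms and 91ms/112ms measurements quoted earlier in this thread.

```python
from statistics import mean


def summarise(pairs):
    """Summarise (master_ms, branch_ms) timing pairs; positive means an improvement."""
    diffs = [before - after for before, after in pairs]
    rel = [(before - after) / before for before, after in pairs]
    return {
        "greatest_improvement_ms": max(diffs),
        "greatest_improvement_rel": max(rel),
        "worst_deterioration_ms": -min(diffs),
        "worst_deterioration_rel": -min(rel),
        "mean_improvement_ms": mean(diffs),
    }


pairs = [
    (69, 47),    # measured: paragraph-of-text example
    (91, 112),   # measured: worst deterioration example
    (200, 150),  # hypothetical
    (120, 120),  # hypothetical
]
summary = summarise(pairs)

# A mean cannot be larger than the largest single value it averages.
assert summary["mean_improvement_ms"] <= summary["greatest_improvement_ms"]

for key, value in summary.items():
    print(key, round(value, 3))
```

The greatest absolute and greatest relative improvements can come from different pairs (here 200ms to 150ms and 69ms to 47ms respectively), which is why the two are reported separately.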

All the differences (positive means an improvement):
14a5897

@drothlis

There are almost as many regressions in performance as there are improvements, so I'm giving up on this idea.

@drothlis drothlis closed this Feb 11, 2019
@drothlis drothlis deleted the merge-match-rois branch February 11, 2019 16:04
@drothlis drothlis restored the merge-match-rois branch February 14, 2019 15:53
@drothlis

Reopening -- I want to revisit this now that we've merged #569.

@drothlis

Average improvement of 7ms per match across 1200 screenshot×template pairs.

[image: screenshot from 2019-02-15 14-18-17]
