Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large geometry intersection check perf speedup #701

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

urschrei
Copy link
Member

Above some threshold, it may be faster to load the geometry or geometries into an R*-star tree and query for a subset of intersection candidates.

This is a draft PR because:

  • I haven't benchmarked the tree-query logic in order to figure out even an approximate value for MAX_NAIVE_SEGMENTS

  • I haven't figured out whether checking the number of segments in order to decide to switch to trees is an unacceptable perf hit

  • I agree to follow the project's code of conduct.

  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.


@urschrei
Copy link
Member Author

urschrei commented Dec 22, 2021

Screenshot 2021-12-22 at 13 31 36

This is with length checking turned on, RTree (old) vs naïve (new) polygon-line intersection check using the louisiana.rs text fixture (1350 vertices). We're going to need a bigger polygon.

@urschrei
Copy link
Member Author

Same test setup (RTree is old, naïve is new), using norway_main.rs (8534 vertices)
Screenshot 2021-12-22 at 13 46 05

@urschrei
Copy link
Member Author

Same test setup (RTree is old, naïve is new), using a new big.rs (100610 vertices) valid polygon:
Screenshot 2021-12-22 at 15 37 19

If I'm interpreting this correctly:

  1. Using a tree for a single Polygon-line intersection check probably isn't worth it for many applications
  2. The length checks aren't significant in wall-clock terms
  3. The existing logic is very fast in wall-clock terms!

Now, on to Polygon-Polygon intersection checking…

@urschrei
Copy link
Member Author

Now we're talking:

Old (red, naïve) vs new (blue, tree) (sorry for reversing the setup this time!)
Polygon-Polygon intersection check using big.rs and norway_main.rs (100610 and 8534 vertices, respectively)

Screenshot 2021-12-22 at 17 53 34

The naïve version has to potentially do ~860 million line-line intersection checks so 165 ms isn't bad, but the R*-tree mean is only 35.4 ms

@michaelkirk
Copy link
Member

michaelkirk commented Dec 22, 2021

Cool!

Just as a reminder, @rmanoka did a somewhat related investigation of intersection perf here:
#649

It's a slightly different use case, but the findings might be relevant. In particular, I sort of expect the performance of R-Tree to be better than the naive approach as a.segments() * b.segements() scales.

So if you did want to have a threshold to toggle between "naive" vs. "rtree" using a multiplicative, rather than an additive might make more sense.

if poly_a.num_segments() * poly_b.num_segements() > x {
    use_rtree()
} else {
    use_naive()
}

@urschrei
Copy link
Member Author

Yeah, the multiplicative is definitely better. Next up, I need to figure out a way of getting the large polygon data into the benchmark; include!ing it is taking about 45 mins to compile, and it's only around 2.8 mb.

@michaelkirk
Copy link
Member

Maybe #566?

(lol not to just dump my wishlist on you or anything...)

@urschrei
Copy link
Member Author

Hmm as constructed intersection_candidates_with_other_tree isn't actually producing any candidates for our (known-to-intersect) test data. I wonder what's going on…

@urschrei
Copy link
Member Author

Update: we're checking for line intersections in this case, since we're decomposing the polygons into lines. But that excludes the possibility of polygon a completely enclosing polygon b (e.g. large square with smaller square inside). 🤔

@urschrei
Copy link
Member Author

urschrei commented Dec 23, 2021

If / when #351 lands, the easier case (A contains B) will be solved, but the more tricky case (A contains B inside one of its holes) remains.

geo_types::Line<T>: RTreeObject,
{
fn intersects(&self, linestring: &LineString<T>) -> bool {
if (self.exterior().0.len() + self.interiors().iter().map(|ls| ls.0.len()).sum::<usize>())
Copy link
Member

@frewsxcv frewsxcv Dec 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        if (self.exterior().0.len() + self.interiors().iter().map(|ls| ls.0.len()).sum::<usize>())

#707 😄

let lines_a: Vec<_> = self
.exterior()
.lines()
.chain(self.interiors().iter().flat_map(|ls| ls.lines()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the last coordinate of the exterior ring connect to the first coordinate of the first interior ring?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, that's not what's happening here, nevermind!

Copy link
Member

@frewsxcv frewsxcv Dec 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meep meep #708

@michaelkirk
Copy link
Member

Now that #829 is merged, we might get comparable perf via something like a.relate(b).is_intersects()

@urschrei
Copy link
Member Author

a.relate(b).is_intersects()

relate requires a GeoFloat bound (currently GeoNum), and from a first cut, use of relate introduces extreme perf regressions (900 %+ in some case) in our nice new boolean ops intersection benchmark suite…

@michaelkirk
Copy link
Member

from a first cut, use of relate introduces extreme perf regressions (900 %+ in some case) in our nice new boolean ops intersection benchmark suite…

Do you still have an example of the data you were using to see this kind of performance?

I'd expect the R-Tree approach to be a constant factor slower, but better asymptotically — much better for big inputs and a little slower for small inputs. I am curious though as to what range of geometry sizes we can expect to see the tradeoffs.

@urschrei
Copy link
Member Author

If you check out this branch and run the boolean ops benchmarks you should see the perf regression in the criterion output. I should also note that this was a quick check that I didn't dig into, so I may have messed something up.

Above some threshold, it may be faster to load the geometry or geometries into an R*-star tree and query for a subset of intersection candidates.
Also use multiplicative threshold (TODO: what should it be?)
@urschrei
Copy link
Member Author

(I rebased against main just in case)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants