Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify boxes in result by default? #48

Open
dpsanders opened this issue Dec 14, 2019 · 22 comments · May be fixed by #64
Open

Unify boxes in result by default? #48

dpsanders opened this issue Dec 14, 2019 · 22 comments · May be fixed by #64

Comments

@dpsanders
Copy link
Member

There's an implementation using LightGraphs in #5

It doesn't seem so easy to write an implementation by hand. It's basically a union-find algorithm

@dpsanders
Copy link
Member Author

There should be a unify=true keyword argument (true by default)

@Suavesito-Olimpiada Suavesito-Olimpiada linked a pull request May 27, 2021 that will close this issue
@Suavesito-Olimpiada
Copy link
Contributor

Suavesito-Olimpiada commented May 27, 2021

Hi! I decided to take a chance into this and found there is something weird with the implementation described above for 2 dimensions and above. The weird case is easily described with the next example

       ┌───────────┐
       │           │
       │           │
       │      B₁   │
┌──────┼───┐       │
│      │   │       │
│      └───┼───────┘
│          │
│    B₂    │ ┌────┐
│          │ │ B₃ │
│          │ └────┘
└──────────┘

with the above algorithm the boxes are reduced to this

┌──────────────────┐
│                  │
│                  │
│                  │
│                  │
│                  │
│       B₁∪B₂      │
│                  │
│           ┌────┐ │
│           │ B₃ │ │
│           └────┘ │
└──────────────────┘

But I think thay should be reduced to just B₁∪B₂∪B₃.

In #64 I add the unify keyword with a small but fast implementation of union-find (weighted and with path compression, reference on the algorithm here).

The implementation is quite fast, two examples are the next ones (first from the tutorial)

julia> using IntervalArithmetic, IntervalOptimisation, BenchmarkTools

julia> f(x) = x^2 - 2x + 1
f (generic function with 1 method)

julia> @time minval, minimisers = minimise(f, 0..2; tol=1e-6, unify=false);
  3.626429 seconds (15.91 M allocations: 1.621 GiB, 8.09% gc time)

julia> @show length(minimisers)
length(minimisers) = 2990
2990

julia> @btime unified_minimisers = IntervalOptimisation.unify_boxes(minimisers)
  29.142 ms (75 allocations: 117.45 KiB)
14-element Vector{Interval{Float64}}:
 [0.998687, 1.00136]
 [0.998642, 0.998687]
 [0.998635, 0.998642]
 [0.998623, 0.998627]
 [1.00136, 1.00137]
 [1.00135, 1.00137]
 [1.00138, 1.00139]
 [1.00137, 1.00138]
 [1.00135, 1.00136]
 [0.998632, 0.998634]
 [0.998621, 0.998623]
 [1.00137, 1.00138]
 [1.00137, 1.00138]
 [0.998617, 0.998619]

the other mine

julia> g(x) = abs(x[1]-x[2]) + (x[1]+x[2])^2
g (generic function with 1 method)

julia> @time minval, minimisers = minimise(g, (-1+1e-10..1)×(-1..1); tol=1e-10, unify=false); # initialize with these give a special bad behaviour
  1.815546 seconds (8.21 M allocations: 747.165 MiB, 18.85% gc time)

julia> @show length(minimisers)
length(minimisers) = 27670
27670

julia> @btime unified_minimisers = IntervalOptimisation.unify_boxes(minimisers)
  2.666 s (28 allocations: 946.11 KiB)
1-element Vector{IntervalBox{2, Float64}}:
 [-3.80622e-07, 3.80622e-07] × [-3.80672e-07, 3.80625e-07]

One thing is that if the implementation is changed so that the example showed at the begginig returns B₁∪B₂∪B₃, unify_boxes can be written to be faster (now is cuadratic on the number of minimisers, but it can be O(n log(n))) and avoid the union-find algorithm altogether. (edit: typos and wording)

I'm open to thoughts and ideas. Hope is useful.

P.D. Kudos to asciiflow for the ascii drawing. 😛

@dpsanders
Copy link
Member Author

I don't understand why the algorithm doesn't reduce your example to B₁ ∪ B₂ ∪ B₃.

@Suavesito-Olimpiada
Copy link
Contributor

Suavesito-Olimpiada commented May 28, 2021

Is because the B₁∩B₃ and B₂∩B₃ is empty, even if (B₁∪B₂)∩B₃ is not, and the algorithm is just testing pairwise the intersections.

@dpsanders
Copy link
Member Author

Nice catch! Does your version give the correct result?

@dpsanders
Copy link
Member Author

A solution seems to be to modify the list of intervals as you go by taking the union (hull) when two intervals are combined, and checking the intersection with this modified list instead of the original list.

@dpsanders
Copy link
Member Author

Hmmm. But actually in your original example we would prefer not to unify (B1 ∪ B2) with B3, since they are actually disjoint. Maybe taking the hull is actually not a good idea.

@dpsanders
Copy link
Member Author

We can just return separate lists for each connected component and the user can take the hull if they prefer; or we can give a convenience function to do so.

@Suavesito-Olimpiada
Copy link
Contributor

Maybe we can add three options for unify, :hull, :connected and :none?

@Suavesito-Olimpiada
Copy link
Contributor

I can add the algorithm for the hull too. I'll add it to the PR and make the changes on the documentation and add tests.

@dpsanders
Copy link
Member Author

@Suavesito-Olimpiada: Sorry for the delay in replying. I think it would be great to have these different options. Is there any progress on this?

@dpsanders
Copy link
Member Author

Note that there is also an implementation of disjoint sets in DataStructures.jl. I'm not sure how it compares to yours.

@Suavesito-Olimpiada
Copy link
Contributor

I can check the differences between my inplementation and theirs, also see performan differences and reply back later.

@Suavesito-Olimpiada
Copy link
Contributor

On the hull algorithm, I still don't have progress. But once I finish my semester I'll do it. :)

@dpsanders
Copy link
Member Author

I think there should be a way to use interval trees for this. There is an implementation in Julia.

@dpsanders
Copy link
Member Author

Or rather R-trees

@Suavesito-Olimpiada
Copy link
Contributor

I'll check out both R-trees and Interval Trees, so I can understand better both structures. But as I see (just over-looking both articles) I think a variant of R-trees is closer to the hull idea (that is, so we can get the maximum number of non-intersecting hulls of intersecting intervals).

@dpsanders
Copy link
Member Author

I think the idea of R-trees is to make it more efficient to check which other boxes a given box intersects.

@Suavesito-Olimpiada
Copy link
Contributor

From the R-Tree article

The key idea of the data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree

and from the Interval Tree article

...it allows one to efficiently find all intervals that overlap with any given interval or point

@dpsanders
Copy link
Member Author

And then

The key idea is to use the bounding boxes to decide whether or not to search inside a subtree. In this way, most of the nodes in the tree are never read during a search

The bounding boxes are used to narrow down the set of existing boxes that a new box must be checked against, as far as I understand.

@dpsanders
Copy link
Member Author

See Samet, Foundations of Multidimensional and Metric Data Structures, chap. 3

@Suavesito-Olimpiada
Copy link
Contributor

Ok, ok I think both structures are used to optimise the search of intervals that overlaps other intervals, but I need to check the differences. Thanks for the information and all the references, I'll check them out. 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants