
Algorithmic complexity of healpix_rangesearch() is way too high #104

Open
mreineck opened this issue Aug 31, 2018 · 6 comments

@mreineck

I just tried to run this simple cone search:

int *ind = NULL;
int npix = healpix_rangesearch_radec_simple(1.5, 0., 1.5, 128, &ind);

When compiling the C library with "-O2", this computation took 40 seconds on my laptop, which is way off the scale; expected run times should be below a millisecond.

From looking at the sources, I fear that the run time of healpix_rangesearch is not only scaling with the number of returned pixels, but actually with its square (this is caused by the calls to ll_contains, which have linear complexity by themselves). This won't do for any application where the cone search returns more than a handful of pixels.
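To make the quadratic scaling concrete, here is a toy sketch (an illustration of the pattern, not the actual library code): appending n distinct pixels while rejecting duplicates with a linear scan, the way repeated ll_contains calls do, costs about n^2/2 comparisons in total.

```c
/* Toy illustration of quadratic blowup from linear membership checks.
   The counter tracks how many element comparisons are performed. */
static long comparisons;

/* Linear scan, analogous in cost to ll_contains on a flat list. */
static int contains(const int *a, int n, int v) {
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (a[i] == v) return 1;
    }
    return 0;
}

/* Append v only if not already present; returns the new length.
   The i-th append scans i elements, so n appends cost ~n^2/2. */
int append_unique(int *a, int n, int v) {
    if (!contains(a, n, v)) a[n++] = v;
    return n;
}
```

For n = 1000 distinct values this already performs 499500 comparisons; at the pixel counts returned by a cone search at Nside = 128 the cost becomes seconds, matching the reported run time.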

@cdeil cdeil added the bug label Aug 31, 2018
@cdeil cdeil added this to the 1.0 milestone Aug 31, 2018
@cdeil
Member

cdeil commented Aug 31, 2018

cc @dstndstn

@dstndstn
Collaborator

This code was written for small Nside (= 1 or 2!), so there was never a large number of pixels in our application.

Without changing the algorithm, you could try:

  • change ll_contains to ll_sorted_contains
  • change ll_append to ll_insert_ascending

This "ll" data structure is a block-list: a linked list of arrays, so the sorted inserts and checks can jump by the array size (256 here) rather than iterating through the whole list. Technically this doesn't improve the complexity, but it should improve the speed!
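A minimal sketch of such a block-list, under the assumptions above (sorted blocks of 256, skip a whole block by comparing against its last element). The names and layout here are illustrative, not the actual ll implementation:

```c
/* Sketch of a block-list: a linked list of fixed-size sorted arrays.
   Assumes values are appended in ascending order, as a sorted insert
   would maintain; a full ll_insert_ascending would also shift within
   a block for out-of-order values (omitted for brevity). */
#include <stdlib.h>

#define BLOCK_SIZE 256

typedef struct block {
    int data[BLOCK_SIZE];
    int n;                 /* number of used slots */
    struct block *next;
} block_t;

/* Append a value (caller passes values in sorted order); returns head. */
static block_t *append_sorted(block_t *head, int value) {
    block_t *b = head;
    while (b && b->next) b = b->next;
    if (!b || b->n == BLOCK_SIZE) {
        block_t *nb = calloc(1, sizeof(block_t));
        nb->data[nb->n++] = value;
        if (b) b->next = nb; else head = nb;
        return head;
    }
    b->data[b->n++] = value;
    return head;
}

/* Membership test: skip any block whose last element is too small,
   then binary-search inside the one candidate block. */
static int sorted_contains(const block_t *head, int value) {
    for (const block_t *b = head; b; b = b->next) {
        if (b->n == 0) continue;
        if (b->data[b->n - 1] < value) continue;   /* skip whole block */
        int lo = 0, hi = b->n - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (b->data[mid] == value) return 1;
            if (b->data[mid] < value) lo = mid + 1; else hi = mid - 1;
        }
        return 0;  /* value could only have been in this block */
    }
    return 0;
}
```

The outer loop still walks the list block by block, which is why the asymptotic complexity is unchanged; only the constant factor shrinks by roughly the block size.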

The Astrometry.net code also includes the "bt.c" code, which implements a similar API but uses an AVL tree under the hood, so it should yield fast inserts and searches. However, it is derived from GNU libavl, so it probably won't work for your licensing situation.

More broadly: the algorithm here expands a frontier, finding healpix neighbours and checking them individually. As Nside increases this is not a good idea: the pixelization becomes more like a regular grid, and you could use a different approach, say finding the start and end pixels within each row inside each base healpixel.

@mreineck
Author

mreineck commented Aug 31, 2018

Oh, I had assumed that this code was supposed to be the equivalent of query_disc()!

The proposed change to sorted lists will actually reduce the complexity, as far as I understand: from O(n**2) to O(n log n), if n is the number of returned pixels.

However, it might be worth thinking about going all the way and reaching a complexity of O(sqrt(n)). Here is a rough sketch for the NESTED scheme:

  • put all 12 base pixels in a work list
  • while the list is not empty:
    • take the next pixel from the list
    • if it is completely outside the disk, do nothing
    • if it is completely inside, add the range of its subpixels to the output
    • otherwise (i.e. the pixel partially overlaps the disk): if we are at the desired resolution, add the pixel to the output; else add its four subpixels at the next refinement level to the work list.

@dstndstn
Collaborator

No, complexity is still O(N**2) because the (block-list versions of) insert_ascending and sorted_contains are still O(N); the top-level structure is a linked list, so can't be binary searched.

@mreineck
Author

For the RING scheme, you already proposed an efficient way: go through all Healpix rings touching the disk, determine which range of pixels in the ring overlaps with the disk, and append it to the output.

These range-based approaches are important for catalog cross-matching etc. In these applications one often needs cone searches at very high resolutions and not-too-small radii, so storing explicit pixel lists becomes prohibitively expensive.

@mreineck
Author

mreineck commented Aug 31, 2018

Sorry, you are of course correct about the complexity staying O(n**2)!
