Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to bucket sort in igraph_i_is_bigraphical_simple #2546

Open
Tagl opened this issue Mar 26, 2024 · 4 comments · May be fixed by #2605
Open

Switch to bucket sort in igraph_i_is_bigraphical_simple #2546

Tagl opened this issue Mar 26, 2024 · 4 comments · May be fixed by #2605
Assignees
Labels
performance Issues concerning the performance of igraph or individual functions

Comments

@Tagl
Copy link
Contributor

Tagl commented Mar 26, 2024

IGRAPH_CHECK(igraph_vector_int_init_copy(&sorted_deg1, degrees1));
IGRAPH_FINALLY(igraph_vector_int_destroy, &sorted_deg1);
igraph_vector_int_reverse_sort(&sorted_deg1); /* decreasing sort */
IGRAPH_CHECK(igraph_vector_int_init_copy(&sorted_deg2, degrees2));
IGRAPH_FINALLY(igraph_vector_int_destroy, &sorted_deg2);
igraph_vector_int_sort(&sorted_deg2); /* increasing sort */

Switching the sort here from a comparison based sort to a bucket sort improves the time complexity to $\mathcal{O}(N_1 + N_2)$.
Then all the implemented graphicality checks run in linear time, as far as I can tell.

@ntamas
Copy link
Member

ntamas commented Mar 27, 2024

I think the problem is that we need to cater for cases when the degree distribution is extremely skewed (like, many vertices with degree 1 and one vertex with a high degree). This would need lots of empty buckets, which might be prohibitive in terms of memory usage.

One thing that I was considering for a while now though is to switch from the built-in comparison-based sort of the standard C library to timsort. Or, sort routines on integer vectors could use the difference between the minimum and maximum of the vector as a proxy to decide whether a bucket sort would be feasible, and fall back to timsort if needed.

@szhorvat
Copy link
Member

When considering graphs without multi-edges, the max degree is no greater than the vertex count. Thus for a bucket sort we only need an array of the same size as the degree vector (we only really need the degree counts, as vertex indices are irrelevant). This does double the memory usage compared to an in-place sort, but perhaps that's not so bad.

Some of these graphicality checks can in fact be implemented by working directly with an array containing degree counts. We don't need to keep track of the vertices' identities, just how many vertices are there of degree 1, 2, etc. Reordering the degree sequences doesn't affect their graphicality. (This is not true for directed graphs where we do need to keep track of which in-degree belongs to which out-degree.)

This is indeed the approach currently used in igraph_i_is_graphical_undirected_simple(). It should also work for igraph_i_is_bigraphical_simple().

@szhorvat
Copy link
Member

szhorvat commented Mar 27, 2024

#2286 is related: when we do need to sort vectors of plain numbers (as opposed to more complex objects), we can make an improvement by (1) using better algorithms such as timsort or (2) inlining the comparator for special cases such as integer vectors.

@szhorvat
Copy link
Member

This does double the memory usage compared to an in-place sort

Actually we aren't even using an in-place sort, we're copying the degree vector (which is const) before sorting. So the memory usage won't be affected either.

@szhorvat szhorvat added the performance Issues concerning the performance of igraph or individual functions label Apr 17, 2024
@gendelpiekel gendelpiekel linked a pull request May 4, 2024 that will close this issue
@szhorvat szhorvat linked a pull request May 4, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues concerning the performance of igraph or individual functions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants