
Hourglass algorithm #2303

Draft · wants to merge 2 commits into master
Conversation

rfulekjames
Contributor

Follow-up of #2267

This is an implementation of the hourglass algorithm from this recent improvement of the Floyd-Warshall algorithm, as suggested here.
It builds upon the tree algorithm from the same paper. Here, in addition to the $IN_k$ trees, we also construct $OUT_k$ trees, which are traversed in a DFS manner while dynamically pruning $IN_k$ for each $k$.

The implementation of the hourglass version doesn't outperform the tree version (see benchmarks below), and a more delicate approach is probably needed to beat the tree version on some instances, as discussed here and here; that is, the pruning of $IN_k$ should be done more efficiently. Currently, we still visit the roots of the pruned subtrees to check that they are no longer present (i.e. active), which could be avoided if we modified $IN_k$ dynamically.

I am not sure how to proceed at this point, i.e. whether introducing a dynamic tree is worth the trouble, or what the other options are.

| 1 vcount=100, p=0.5, Unweighted, 200x 0.08s 0.08s 0s
| 2 vcount=100, p=0.5, Dijkstra, 200x 0.829s 0.829s 0s
| 3 vcount=100, p=0.5, Unweighted Floyd-Warshall, 200x 0.157s 0.157s 0s
| 4 vcount=100, p=0.5, Unweighted Floyd-Warshall-tree-speedup, 200x 0.274s 0.274s 0s
| 5 vcount=100, p=0.5, Unweighted Floyd-Warshall-hourglass-speedup, 200x 0.462s 0.462s 0s

| 6 vcount=100, p=0.5, Floyd-Warshall, 200x 0.239s 0.239s 0s
| 7 vcount=100, p=0.5, Floyd-Warshall-tree-speedup, 200x 0.222s 0.222s 0s
| 8 vcount=100, p=0.5, Floyd-Warshall-hourglass-speedup, 200x 0.332s 0.332s 0s

| 9 vcount=100, p=0.5, Bellman-Ford (negative), 200x 0.629s 0.629s 0s
| 10 vcount=100, p=0.5, Johnson (negative), 200x 0.87s 0.86s 0.01s
| 11 vcount=100, p=0.5, Floyd-Warshall (negative), 200x 0.249s 0.249s 0s
| 12 vcount=100, p=0.5, Floyd-Warshall-tree-speedup (negative), 200x 0.218s 0.218s 0s
| 13 vcount=100, p=0.5, Floyd-Warshall-hourglass-speedup (negative), 200x 0.308s 0.308s 0s

| 1 vcount=500, p=0.1, Unweighted, 10x 0.115s 0.115s 0s
| 2 vcount=500, p=0.1, Dijkstra, 10x 1.3s 1.3s 0s
| 3 vcount=500, p=0.1, Unweighted Floyd-Warshall, 10x 0.881s 0.881s 0s
| 4 vcount=500, p=0.1, Unweighted Floyd-Warshall-tree-speedup, 10x 0.447s 0.447s 0s
| 5 vcount=500, p=0.1, Unweighted Floyd-Warshall-hourglass-speedup, 10x 0.625s 0.615s 0.01s
| 6 vcount=500, p=0.1, Floyd-Warshall, 10x 1.04s 1.04s 0s
| 7 vcount=500, p=0.1, Floyd-Warshall-tree-speedup, 10x 0.502s 0.502s 0s
| 8 vcount=500, p=0.1, Floyd-Warshall-hourglass-speedup, 10x 0.702s 0.702s 0s

| 9 vcount=500, p=0.1, Bellman-Ford (negative), 10x 1.24s 1.24s 0s
| 10 vcount=500, p=0.1, Johnson (negative), 10x 1.32s 1.32s 0s
| 11 vcount=500, p=0.1, Floyd-Warshall (negative), 10x 1.04s 1.04s 0s
| 12 vcount=500, p=0.1, Floyd-Warshall-tree-speedup (negative), 10x 0.486s 0.486s 0s
| 13 vcount=500, p=0.1, Floyd-Warshall-hourglass-speedup (negative), 10x 0.65s 0.64s 0.01s

| 1 vcount=1500, p=0.02, Unweighted, 1x 0.101s 0.09s 0.01s
| 2 vcount=1500, p=0.02, Dijkstra, 1x 0.89s 0.89s 0s
| 3 vcount=1500, p=0.02, Unweighted Floyd-Warshall, 1x 2.26s 2.26s 0s
| 4 vcount=1500, p=0.02, Unweighted Floyd-Warshall-tree-speedup, 1x 0.784s 0.774s 0.01s
| 5 vcount=1500, p=0.02, Unweighted Floyd-Warshall-hourglass-speedup, 1x 1.24s 1.23s 0.01s

| 6 vcount=1500, p=0.02, Floyd-Warshall, 1x 2.46s 2.45s 0s
| 7 vcount=1500, p=0.02, Floyd-Warshall-tree-speedup, 1x 1.27s 1.27s 0s
| 8 vcount=1500, p=0.02, Floyd-Warshall-hourglass-speedup, 1x 1.87s 1.86s 0.01s

| 9 vcount=1500, p=0.02, Bellman-Ford (negative), 1x 1.39s 1.39s 0s
| 10 vcount=1500, p=0.02, Johnson (negative), 1x 0.926s 0.926s 0s
| 11 vcount=1500, p=0.02, Floyd-Warshall (negative), 1x 2.49s 2.49s 0s
| 12 vcount=1500, p=0.02, Floyd-Warshall-tree-speedup (negative), 1x 1.29s 1.28s 0.01s
| 13 vcount=1500, p=0.02, Floyd-Warshall-hourglass-speedup (negative), 1x 1.85s 1.84s 0.01s

@codecov

codecov bot commented Feb 8, 2023

Codecov Report

Merging #2303 (9b6dcdd) into master (a575cec) will increase coverage by 0.02%.
The diff coverage is 99.13%.


@@            Coverage Diff             @@
##           master    #2303      +/-   ##
==========================================
+ Coverage   83.42%   83.45%   +0.02%     
==========================================
  Files         375      375              
  Lines       61587    61686      +99     
==========================================
+ Hits        51381    51479      +98     
- Misses      10206    10207       +1     
Impacted Files Coverage Δ
src/paths/floyd_warshall.c 97.40% <99.13%> (+1.19%) ⬆️


@rfulekjames
Contributor Author

rfulekjames commented Feb 10, 2023

> The implementation of the hourglass version doesn't outperform the tree version (see benchmarks below), and a more delicate approach is probably needed to beat the tree version on some instances, as discussed here and here; that is, the pruning of $IN_k$ should be done more efficiently. Currently, we still visit the roots of the pruned subtrees to check that they are no longer present (i.e. active), which could be avoided if we modified $IN_k$ dynamically.
>
> I am not sure how to proceed at this point, i.e. whether introducing a dynamic tree is worth the trouble, or what the other options are.

Now I think it is possible to keep dynamically modifying the `dfs_skip_in` array, and therefore there shouldn't be any need for a dynamic tree data structure. However, on the current benchmarks this is still worse than the tree algorithm, and almost always worse than the "active vertices" version from the previous commit:

| 1 vcount=100, p=0.5, Unweighted, 200x 0.082s 0.082s 0s
| 2 vcount=100, p=0.5, Dijkstra, 200x 0.821s 0.821s 0s
| 3 vcount=100, p=0.5, Unweighted Floyd-Warshall, 200x 0.157s 0.157s 0s
| 4 vcount=100, p=0.5, Unweighted Floyd-Warshall-tree-speedup, 200x 0.27s 0.27s 0s
| 5 vcount=100, p=0.5, Unweighted Floyd-Warshall-hourglass-speedup, 200x 0.478s 0.478s 0s

| 6 vcount=100, p=0.5, Floyd-Warshall, 200x 0.247s 0.247s 0s
| 7 vcount=100, p=0.5, Floyd-Warshall-tree-speedup, 200x 0.224s 0.224s 0s
| 8 vcount=100, p=0.5, Floyd-Warshall-hourglass-speedup, 200x 0.371s 0.371s 0s

| 9 vcount=100, p=0.5, Bellman-Ford (negative), 200x 0.639s 0.639s 0s
| 10 vcount=100, p=0.5, Johnson (negative), 200x 0.869s 0.869s 0s
| 11 vcount=100, p=0.5, Floyd-Warshall (negative), 200x 0.251s 0.251s 0s
| 12 vcount=100, p=0.5, Floyd-Warshall-tree-speedup (negative), 200x 0.217s 0.217s 0s
| 13 vcount=100, p=0.5, Floyd-Warshall-hourglass-speedup (negative), 200x 0.357s 0.357s 0s

| 1 vcount=500, p=0.1, Unweighted, 10x 0.117s 0.117s 0s
| 2 vcount=500, p=0.1, Dijkstra, 10x 1.28s 1.28s 0s
| 3 vcount=500, p=0.1, Unweighted Floyd-Warshall, 10x 0.886s 0.886s 0s
| 4 vcount=500, p=0.1, Unweighted Floyd-Warshall-tree-speedup, 10x 0.442s 0.442s 0s
| 5 vcount=500, p=0.1, Unweighted Floyd-Warshall-hourglass-speedup, 10x 0.49s 0.49s 0s

| 6 vcount=500, p=0.1, Floyd-Warshall, 10x 1.05s 1.05s 0s
| 7 vcount=500, p=0.1, Floyd-Warshall-tree-speedup, 10x 0.502s 0.502s 0s
| 8 vcount=500, p=0.1, Floyd-Warshall-hourglass-speedup, 10x 0.897s 0.887s 0.01s

| 9 vcount=500, p=0.1, Bellman-Ford (negative), 10x 1.22s 1.22s 0s
| 10 vcount=500, p=0.1, Johnson (negative), 10x 1.28s 1.28s 0s
| 11 vcount=500, p=0.1, Floyd-Warshall (negative), 10x 1.05s 1.05s 0s
| 12 vcount=500, p=0.1, Floyd-Warshall-tree-speedup (negative), 10x 0.493s 0.493s 0s
| 13 vcount=500, p=0.1, Floyd-Warshall-hourglass-speedup (negative), 10x 0.887s 0.877s 0.01s

| 1 vcount=1500, p=0.02, Unweighted, 1x 0.102s 0.102s 0s
| 2 vcount=1500, p=0.02, Dijkstra, 1x 0.885s 0.885s 0s
| 3 vcount=1500, p=0.02, Unweighted Floyd-Warshall, 1x 2.31s 2.31s 0s
| 4 vcount=1500, p=0.02, Unweighted Floyd-Warshall-tree-speedup, 1x 0.876s 0.876s 0s
| 5 vcount=1500, p=0.02, Unweighted Floyd-Warshall-hourglass-speedup, 1x 1.63s 1.62s 0.01s

| 6 vcount=1500, p=0.02, Floyd-Warshall, 1x 2.49s 2.49s 0s
| 7 vcount=1500, p=0.02, Floyd-Warshall-tree-speedup, 1x 1.14s 1.13s 0.01s
| 8 vcount=1500, p=0.02, Floyd-Warshall-hourglass-speedup, 1x 2.88s 2.88s 0s

| 9 vcount=1500, p=0.02, Bellman-Ford (negative), 1x 1.39s 1.39s 0s
| 10 vcount=1500, p=0.02, Johnson (negative), 1x 0.914s 0.901s 0.01s
| 11 vcount=1500, p=0.02, Floyd-Warshall (negative), 1x 2.49s 2.49s 0s
| 12 vcount=1500, p=0.02, Floyd-Warshall-tree-speedup (negative), 1x 1.18s 1.18s 0s
| 13 vcount=1500, p=0.02, Floyd-Warshall-hourglass-speedup (negative), 1x 2.68s 2.68s 0s

@rfulekjames
Contributor Author

rfulekjames commented Feb 10, 2023

At least I could reproduce the result from the paper (Figure 4) for the unweighted case, so the hourglass algorithm is sometimes better.

| 3 vcount=512, p=0.25, Unweighted Floyd-Warshall, 10x 0.94s 0.94s 0s
| 4 vcount=512, p=0.25, Unweighted Floyd-Warshall-tree-speedup, 10x 0.872s 0.871s 0s
| 5 vcount=512, p=0.25, Unweighted Floyd-Warshall-hourglass-speedup, 10x 0.843s 0.833s 0.01s

@ntamas
Member

ntamas commented Feb 10, 2023

So it looks like the hourglass algorithm only brings some (not significant) improvement compared to the tree algorithm if the graph is relatively large (i.e. vcount=100 is not enough) and p is in a certain "largish but not fully connected" range? Is my understanding correct?

If that is the case, I'm not sure whether it's worth making the code more complex, especially if it comes at a cost in other cases that are more likely to occur in practice with igraph (large vcount and small p). Opinions @szhorvat?

@szhorvat
Member

szhorvat commented Feb 10, 2023

I am overwhelmed at the moment, and will come back to this later. Thanks for your work on this @rfulekjames, and also for your input on the upward planarization layout.

@ntamas Given all the effort we put in, it would be nice to finish the hourglass algorithm, even if it does not end up being very fast. It is not yet clear how far we can push the performance. Practical performance depends both on the algorithm and its implementation. According to the paper, the hourglass algorithm takes fewer steps, but since these steps are more expensive, it's unclear if (or when) it runs faster in practice.

My plan was to look through the code to see if there are optimization opportunities that don't make things more complicated, but I haven't had time to look at it all yet.

At this point, it may be worth contacting the authors, as they might be able to offer some hints too.

@rfulekjames
Contributor Author

> So it looks like the hourglass algorithm only brings some (not significant) improvement compared to the tree algorithm if the graph is relatively large (i.e. vcount=100 is not enough) and p is in a certain "largish but not fully connected" range? Is my understanding correct?

That's correct for the unweighted case. So the benchmarks currently do not really reproduce the findings reported in Figure 3 (but rather those in Figure 4, contrary to what I stated above), which I assume are stated for the weighted case. That is, Figure 4 concerns the unweighted case, and Figure 3 concerns uniform random digraphs, which I assume to be graphs "with arc weights selected independently at random from the uniform distribution on [0, 1]", as stated in the abstract.

For the weighted case, the hourglass algorithm is currently significantly worse than the tree algorithm on large instances with 500-3000 vertices and 0.25 edge density, which is what I tested.
So, currently this doesn't match Figure 3, since 0.25 edge density corresponds to roughly $n^{1.67}$ edges; but the implementation details can of course significantly affect the performance, and I am also not sure whether they consider the bidirectional case here, whereas I don't.

@ntamas @szhorvat

@rfulekjames
Contributor Author

> Now I think it is possible to keep dynamically modifying the `dfs_skip_in` array, and therefore there shouldn't be any need for a dynamic tree data structure. However, on the current benchmarks this is still worse than the tree algorithm, and almost always worse than the "active vertices" version from the previous commit:

There is still probably a performance problem with sometimes revisiting vertices that have been removed.

> My plan was to look through the code to see if there are optimization opportunities that don't make things more complicated, but I haven't had time to look at it all yet.

I have some ideas for how to modify the code so that we hit removed vertices less often, using one more integer array, but this still wouldn't be equivalent to a dynamic tree.

> At this point, it may be worth contacting the authors, as they might be able to offer some hints too.

I can reach out to them after the next iteration of edits. Just let me know what you prefer.

@szhorvat @ntamas
