
Hourglass algorithm #2303

Draft · wants to merge 2 commits into master
Conversation

rfulekjames
Contributor

Follow-up of #2267

This is an implementation of the hourglass algorithm from this recent improvement of the Floyd-Warshall algorithm, as suggested here.
It builds upon the tree algorithm from the same paper. Here, in addition to the $IN_k$ trees, we also construct $OUT_k$ trees, which are traversed in a DFS manner while dynamically pruning $IN_k$ for each $k$.

The implementation of the hourglass version doesn't outperform the tree version (see benchmarks below), and a more delicate approach is probably needed to beat the tree version on some instances, as discussed here and here; that is, the pruning of $IN_k$ should be done more efficiently. Currently, we still visit the roots of the pruned subtrees to check that they are no longer present (i.e. active), which could be avoided if we modified $IN_k$ dynamically.

I am not sure how to proceed at this point, i.e. whether introducing a dynamic tree is worth the trouble, or what the other options are.

| 1 vcount=100, p=0.5, Unweighted, 200x 0.08s 0.08s 0s
| 2 vcount=100, p=0.5, Dijkstra, 200x 0.829s 0.829s 0s
| 3 vcount=100, p=0.5, Unweighted Floyd-Warshall, 200x 0.157s 0.157s 0s
| 4 vcount=100, p=0.5, Unweighted Floyd-Warshall-tree-speedup, 200x 0.274s 0.274s 0s
| 5 vcount=100, p=0.5, Unweighted Floyd-Warshall-hourglass-speedup, 200x 0.462s 0.462s 0s

| 6 vcount=100, p=0.5, Floyd-Warshall, 200x 0.239s 0.239s 0s
| 7 vcount=100, p=0.5, Floyd-Warshall-tree-speedup, 200x 0.222s 0.222s 0s
| 8 vcount=100, p=0.5, Floyd-Warshall-hourglass-speedup, 200x 0.332s 0.332s 0s

| 9 vcount=100, p=0.5, Bellman-Ford (negative), 200x 0.629s 0.629s 0s
| 10 vcount=100, p=0.5, Johnson (negative), 200x 0.87s 0.86s 0.01s
| 11 vcount=100, p=0.5, Floyd-Warshall (negative), 200x 0.249s 0.249s 0s
| 12 vcount=100, p=0.5, Floyd-Warshall-tree-speedup (negative), 200x 0.218s 0.218s 0s
| 13 vcount=100, p=0.5, Floyd-Warshall-hourglass-speedup (negative), 200x 0.308s 0.308s 0s

| 1 vcount=500, p=0.1, Unweighted, 10x 0.115s 0.115s 0s
| 2 vcount=500, p=0.1, Dijkstra, 10x 1.3s 1.3s 0s
| 3 vcount=500, p=0.1, Unweighted Floyd-Warshall, 10x 0.881s 0.881s 0s
| 4 vcount=500, p=0.1, Unweighted Floyd-Warshall-tree-speedup, 10x 0.447s 0.447s 0s
| 5 vcount=500, p=0.1, Unweighted Floyd-Warshall-hourglass-speedup, 10x 0.625s 0.615s 0.01s
| 6 vcount=500, p=0.1, Floyd-Warshall, 10x 1.04s 1.04s 0s
| 7 vcount=500, p=0.1, Floyd-Warshall-tree-speedup, 10x 0.502s 0.502s 0s
| 8 vcount=500, p=0.1, Floyd-Warshall-hourglass-speedup, 10x 0.702s 0.702s 0s

| 9 vcount=500, p=0.1, Bellman-Ford (negative), 10x 1.24s 1.24s 0s
| 10 vcount=500, p=0.1, Johnson (negative), 10x 1.32s 1.32s 0s
| 11 vcount=500, p=0.1, Floyd-Warshall (negative), 10x 1.04s 1.04s 0s
| 12 vcount=500, p=0.1, Floyd-Warshall-tree-speedup (negative), 10x 0.486s 0.486s 0s
| 13 vcount=500, p=0.1, Floyd-Warshall-hourglass-speedup (negative), 10x 0.65s 0.64s 0.01s

| 1 vcount=1500, p=0.02, Unweighted, 1x 0.101s 0.09s 0.01s
| 2 vcount=1500, p=0.02, Dijkstra, 1x 0.89s 0.89s 0s
| 3 vcount=1500, p=0.02, Unweighted Floyd-Warshall, 1x 2.26s 2.26s 0s
| 4 vcount=1500, p=0.02, Unweighted Floyd-Warshall-tree-speedup, 1x 0.784s 0.774s 0.01s
| 5 vcount=1500, p=0.02, Unweighted Floyd-Warshall-hourglass-speedup, 1x 1.24s 1.23s 0.01s

| 6 vcount=1500, p=0.02, Floyd-Warshall, 1x 2.46s 2.45s 0s
| 7 vcount=1500, p=0.02, Floyd-Warshall-tree-speedup, 1x 1.27s 1.27s 0s
| 8 vcount=1500, p=0.02, Floyd-Warshall-hourglass-speedup, 1x 1.87s 1.86s 0.01s

| 9 vcount=1500, p=0.02, Bellman-Ford (negative), 1x 1.39s 1.39s 0s
| 10 vcount=1500, p=0.02, Johnson (negative), 1x 0.926s 0.926s 0s
| 11 vcount=1500, p=0.02, Floyd-Warshall (negative), 1x 2.49s 2.49s 0s
| 12 vcount=1500, p=0.02, Floyd-Warshall-tree-speedup (negative), 1x 1.29s 1.28s 0.01s
| 13 vcount=1500, p=0.02, Floyd-Warshall-hourglass-speedup (negative), 1x 1.85s 1.84s 0.01s

@codecov

codecov bot commented Feb 8, 2023

Codecov Report

Merging #2303 (9b6dcdd) into master (a575cec) will increase coverage by 0.02%.
The diff coverage is 99.13%.


@@            Coverage Diff             @@
##           master    #2303      +/-   ##
==========================================
+ Coverage   83.42%   83.45%   +0.02%     
==========================================
  Files         375      375              
  Lines       61587    61686      +99     
==========================================
+ Hits        51381    51479      +98     
- Misses      10206    10207       +1     
Impacted Files Coverage Δ
src/paths/floyd_warshall.c 97.40% <99.13%> (+1.19%) ⬆️


@rfulekjames
Contributor Author

rfulekjames commented Feb 10, 2023

> The implementation of the hourglass version doesn't outperform the tree version (see benchmarks below), and a more delicate approach is probably needed to beat the tree version on some instances, as discussed here and here; that is, the pruning of $IN_k$ should be done more efficiently. Currently, we still visit the roots of the pruned subtrees to check that they are no longer present (i.e. active), which could be avoided if we modified $IN_k$ dynamically.
>
> I am not sure how to proceed at this point, i.e. whether introducing a dynamic tree is worth the trouble, or what the other options are.

Now I think it is possible to keep dynamically modifying the `dfs_skip_in` array, and therefore there shouldn't be any need for a dynamic tree data structure. However, on the current benchmarks this is still worse than the tree algorithm, and almost always worse than the "active vertices" version from the previous commit:

| 1 vcount=100, p=0.5, Unweighted, 200x 0.082s 0.082s 0s
| 2 vcount=100, p=0.5, Dijkstra, 200x 0.821s 0.821s 0s
| 3 vcount=100, p=0.5, Unweighted Floyd-Warshall, 200x 0.157s 0.157s 0s
| 4 vcount=100, p=0.5, Unweighted Floyd-Warshall-tree-speedup, 200x 0.27s 0.27s 0s
| 5 vcount=100, p=0.5, Unweighted Floyd-Warshall-hourglass-speedup, 200x 0.478s 0.478s 0s

| 6 vcount=100, p=0.5, Floyd-Warshall, 200x 0.247s 0.247s 0s
| 7 vcount=100, p=0.5, Floyd-Warshall-tree-speedup, 200x 0.224s 0.224s 0s
| 8 vcount=100, p=0.5, Floyd-Warshall-hourglass-speedup, 200x 0.371s 0.371s 0s

| 9 vcount=100, p=0.5, Bellman-Ford (negative), 200x 0.639s 0.639s 0s
| 10 vcount=100, p=0.5, Johnson (negative), 200x 0.869s 0.869s 0s
| 11 vcount=100, p=0.5, Floyd-Warshall (negative), 200x 0.251s 0.251s 0s
| 12 vcount=100, p=0.5, Floyd-Warshall-tree-speedup (negative), 200x 0.217s 0.217s 0s
| 13 vcount=100, p=0.5, Floyd-Warshall-hourglass-speedup (negative), 200x 0.357s 0.357s 0s

| 1 vcount=500, p=0.1, Unweighted, 10x 0.117s 0.117s 0s
| 2 vcount=500, p=0.1, Dijkstra, 10x 1.28s 1.28s 0s
| 3 vcount=500, p=0.1, Unweighted Floyd-Warshall, 10x 0.886s 0.886s 0s
| 4 vcount=500, p=0.1, Unweighted Floyd-Warshall-tree-speedup, 10x 0.442s 0.442s 0s
| 5 vcount=500, p=0.1, Unweighted Floyd-Warshall-hourglass-speedup, 10x 0.49s 0.49s 0s

| 6 vcount=500, p=0.1, Floyd-Warshall, 10x 1.05s 1.05s 0s
| 7 vcount=500, p=0.1, Floyd-Warshall-tree-speedup, 10x 0.502s 0.502s 0s
| 8 vcount=500, p=0.1, Floyd-Warshall-hourglass-speedup, 10x 0.897s 0.887s 0.01s

| 9 vcount=500, p=0.1, Bellman-Ford (negative), 10x 1.22s 1.22s 0s
| 10 vcount=500, p=0.1, Johnson (negative), 10x 1.28s 1.28s 0s
| 11 vcount=500, p=0.1, Floyd-Warshall (negative), 10x 1.05s 1.05s 0s
| 12 vcount=500, p=0.1, Floyd-Warshall-tree-speedup (negative), 10x 0.493s 0.493s 0s
| 13 vcount=500, p=0.1, Floyd-Warshall-hourglass-speedup (negative), 10x 0.887s 0.877s 0.01s

| 1 vcount=1500, p=0.02, Unweighted, 1x 0.102s 0.102s 0s
| 2 vcount=1500, p=0.02, Dijkstra, 1x 0.885s 0.885s 0s
| 3 vcount=1500, p=0.02, Unweighted Floyd-Warshall, 1x 2.31s 2.31s 0s
| 4 vcount=1500, p=0.02, Unweighted Floyd-Warshall-tree-speedup, 1x 0.876s 0.876s 0s
| 5 vcount=1500, p=0.02, Unweighted Floyd-Warshall-hourglass-speedup, 1x 1.63s 1.62s 0.01s

| 6 vcount=1500, p=0.02, Floyd-Warshall, 1x 2.49s 2.49s 0s
| 7 vcount=1500, p=0.02, Floyd-Warshall-tree-speedup, 1x 1.14s 1.13s 0.01s
| 8 vcount=1500, p=0.02, Floyd-Warshall-hourglass-speedup, 1x 2.88s 2.88s 0s

| 9 vcount=1500, p=0.02, Bellman-Ford (negative), 1x 1.39s 1.39s 0s
| 10 vcount=1500, p=0.02, Johnson (negative), 1x 0.914s 0.901s 0.01s
| 11 vcount=1500, p=0.02, Floyd-Warshall (negative), 1x 2.49s 2.49s 0s
| 12 vcount=1500, p=0.02, Floyd-Warshall-tree-speedup (negative), 1x 1.18s 1.18s 0s
| 13 vcount=1500, p=0.02, Floyd-Warshall-hourglass-speedup (negative), 1x 2.68s 2.68s 0s

@rfulekjames
Contributor Author

rfulekjames commented Feb 10, 2023

At least I could reproduce the result from the paper (Figure 4) for the unweighted case, so the hourglass algorithm is sometimes better.

| 3 vcount=512, p=0.25, Unweighted Floyd-Warshall, 10x 0.94s 0.94s 0s
| 4 vcount=512, p=0.25, Unweighted Floyd-Warshall-tree-speedup, 10x 0.872s 0.871s 0s
| 5 vcount=512, p=0.25, Unweighted Floyd-Warshall-hourglass-speedup, 10x 0.843s 0.833s 0.01s

@ntamas
Member

ntamas commented Feb 10, 2023

So it looks like the hourglass algorithm only brings some (not significant) improvement compared to the tree algorithm if the graph is relatively large (i.e. vcount=100 is not enough) and p is in a certain "largish but not fully connected" range? Is my understanding correct?

If that is the case, I'm not sure whether it's worth making the code more complex, especially if it comes at a cost in other cases that are more likely to occur in practice with igraph (large vcount and small p). Opinions @szhorvat?

@szhorvat
Member

szhorvat commented Feb 10, 2023

I am overwhelmed at the moment, and will come back to this later. Thanks for your work on this @rfulekjames, and also for your input on the upward planarization layout.

@ntamas Given all the effort we put in, it would be nice to finish the hourglass algorithm, even if it does not end up being very fast. It is not yet clear how far we can push the performance. Practical performance depends both on the algorithm and its implementation. According to the paper, the hourglass algorithm takes fewer steps, but since these steps are more expensive, it's unclear if (or when) it runs faster in practice.

My plan was to look through the code to see if there are optimization opportunities that don't make things more complicated, but I haven't had time to look at it all yet.

At this point, it may be worth contacting the authors, as they might be able to offer some hints too.

@rfulekjames
Contributor Author

> So it looks like the hourglass algorithm only brings some (not significant) improvement compared to the tree algorithm if the graph is relatively large (i.e. vcount=100 is not enough) and p is in a certain "largish but not fully connected" range? Is my understanding correct?

That's correct for the unweighted case. So the benchmarks currently do not really reproduce the findings reported in Figure 3 (but rather those in Figure 4, contrary to what I stated above), which I assume are stated for the weighted case. That is, Figure 4 concerns the unweighted case, and Figure 3 concerns uniform random digraphs, which I assume to be graphs "with arc weights selected independently at random from the uniform distribution on [0, 1]", as stated in the abstract.

For the weighted case, the hourglass algorithm is currently significantly worse than the tree algorithm on large instances with 500-3000 vertices and 0.25 edge density, which is what I tested.
So, currently this doesn't match Figure 3, since 0.25 edge density corresponds to roughly $n^{1.67}$ edges; but the implementation details can of course significantly affect the performance, and I am also not sure whether they consider the bidirectional case here, whereas I don't.

@ntamas @szhorvat

@rfulekjames
Contributor Author

> Now I think it is possible to keep dynamically modifying the `dfs_skip_in` array, and therefore there shouldn't be any need for a dynamic tree data structure. However, on the current benchmarks this is still worse than the tree algorithm, and almost always worse than the "active vertices" version from the previous commit:

There is still probably a performance problem with sometimes revisiting vertices that have been removed.

> My plan was to look through the code to see if there are optimization opportunities that don't make things more complicated, but I haven't had time to look at it all yet.

I have some ideas for how to modify the code so that we hit removed vertices less often, using one more integer array, but this still wouldn't be equivalent to a dynamic tree.

> At this point, it may be worth contacting the authors, as they might be able to offer some hints too.

I can reach out to them after the next iteration of edits. Just let me know what you prefer.

@szhorvat @ntamas
