New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimized DFS iterator and added additional functionality #423
base: master
Are you sure you want to change the base?
Conversation
|
||
private Deque<Object> stack = new ArrayDeque<>(); | ||
//private Stack<V> stack = new Stack<>(); | ||
private Stack<StackEntry> stack = new Stack<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jkinable. I've not had a look at the rest of the PR yet, but I wanted to say, looking at this, that I feel a bit iffy with Stack
being used here rather than Deque
.
If I may, I would like to suggest using Deque
here instead, for a couple of reasons.
Firstly, according to the ArrayDeque
javadoc, it's explained that ArrayDeque "is likely to be faster than Stack when used as a stack".
And finally, the Stack
javadoc explains that it extends Vector
, which IIRC has built-in synchronization, which I would reasonably guess makes Stack
slower than necessary in this case, since AFAICT the stack isn't being used in an asynchronous fashion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seem to raise a good point. I completely forgot about this. I always find this amusingly contradictory: "This class is likely to be faster than Stack when used as a stack".
Any idea whether it is better to use a ArrayDeque or a LinkedList as stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably ArrayDeque
I think.
LinkedList
uses more memory per element than array-based data structures if I understand correctly (although, thinking more about it, this may be moot, since array-based data structures by their nature need to grow their internal arrays occasionally if they get too full, meaning array-based data structures are usually larger than necessary at any given point in time).
Furthermore I would guess that LinkedList
is slower on modern hardware, since it uses pointers a lot, which I believe is not CPU-cache-friendly.
So I'm inclined to go for ArrayDeque
unless profiling shows that LinkedList
is better in some way that matters for JGraphT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope this helps. :)
@jkinable I haven't had a chance to review the DFS change yet...could we keep that separate from the other changes to make reviewing easier? |
@jsichi I've separated the DFS changes and the connectivity changes (separate PR). |
@@ -55,7 +60,7 @@ String getExpectedStr2() | |||
@Override | |||
String getExpectedFinishString() | |||
{ | |||
return "6:4:9:2:8:7:5:3:1:orphan:"; | |||
return "1:3:6:5:7:9:4:8:2:orphan:"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've eliminated the old white/gray/black bookkeeping that was producing the correct finish order. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure that I understand your comment. In DepthFirstSearch there is no such thing as a 'correct finish order'. Quoting from the original author of the test class:
NOTE: This test uses hard-coded expected ordering isn't really guaranteed by the specification of
the algorithm. This could cause false failures if the traversal implementation changes.
In other words, alternative Depth First Search implementations may produce a different traversal order, while still producing a different but correct Depth First Search Tree.
The original implementation using white/gray/black bookkeeping was surprisingly slow. Some tests on reasonable sized graphs gave me one to two orders of magnitude speedup with the new implementation. Unless I misunderstood something, the book keeping was rather unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that the finish order is not deterministic, which I believe is what that comment meant. However, there are constraints on the allowable orderings:
http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/GraphAlgor/depthSearch.htm
So I don't think we can eliminate the existing logic. I'm guessing the dominant cost is in finding+removing nodes from the middle of the deque, so a better data structure is called for if we want to optimize that.
(If that's not possible, then instead of replacing the existing iterator, we could just add a new one.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I studied the reference you stated - the coloring logic being used (white/grey/black) is just syntactic sugar which simplifies the discussion of the algorithm, as well as some of the proofs. The implementation in this PR satisfies all constraints wrt allowable orderings, while being significantly more efficient. It's quite easy to see how different (but valid) orderings can be obtained.
- Example 1: triangle graph (a,b)(b,c)(c,a). A valid DFS ordering: [a,b,c]. Another valid ordering: [a,c,b].
- Example 2: even without cycles, different orderings exist. For instance, in a graph with 2 edge (a,b)(a,c) again two orderings exist: [a,b,c] and [a,c,b].
The existence of different orderings is a direct consequence of the fact that in a graph, the neighbors of a vertex form an unordered set.
Just to be sure, I also checked some other reference works:
- Graphs Networks and algorithms by Dieter Jungnickel
- Introduction to Algorithms by Thomas H. Cormen , Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
None of these references state additional constraints on feasible orderings aside from the ones that have already been implemented in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm referring to the constraint between discovery order and finish order. See the "parenthesis theorem" in the link above. Your getExpectedFinishString is identical to getExpectedStr2 because you are calling finishVertex as soon as you encounter a vertex, which is incorrect. We're not supposed to finish a vertex until we've finished everything reachable from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkinable @jsichi Indeed, you need some extra bookkeeping in order to call finishVertex()
at the correct time. One solution is to add an extra flag on the stack entry to indicate whether this is the first time that this vertex has been added in the stack. When you pop()
that entry, you can call finishVertex()
.
@@ -35,6 +39,7 @@ | |||
* @author Liviu Rau, Patrick Sharp | |||
* @since Jul 30, 2003 | |||
*/ | |||
//public class DepthFirstIteratorTest{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete comment
@@ -55,7 +60,7 @@ String getExpectedStr2() | |||
@Override | |||
String getExpectedFinishString() | |||
{ | |||
return "6:4:9:2:8:7:5:3:1:orphan:"; | |||
return "1:3:6:5:7:9:4:8:2:orphan:"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm referring to the constraint between discovery order and finish order. See the "parenthesis theorem" in the link above. Your getExpectedFinishString is identical to getExpectedStr2 because you are calling finishVertex as soon as you encounter a vertex, which is incorrect. We're not supposed to finish a vertex until we've finished everything reachable from it.
*/ | ||
public class DepthFirstIterator<V, E> | ||
extends CrossComponentIterator<V, E, DepthFirstIterator.VisitColor> | ||
extends CrossComponentIterator<V, E, DepthFirstIterator.SearchNodeData> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to extends CrossComponentIterator<V, E, DepthFirstIterator<V, E>.SearchNodeData>
.
hi there |
@mykolapolonskyi in order to make the DFS implementation much more efficient I had to take some shortcuts. The algorithm is supposed to invoke the finishVertex() method in a rather specific order, see: http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/GraphAlgor/depthSearch.htm
Either way, I need to find some time to finish this PR (meaning that I need to add some extra bookkeeping to call finishVertex in the right order). |
It is 2020 already, how much longer will it take? |
I've re-implemented the DFS iterator:
Let me know what you think about this new extension. I could also add the same functionality to the BFS iterator.
Finally, I believe there's a minor bug in the BFS iterator: the BFS iterator never invokes
finishVertex(vertex);
, i.e. the code should probably read: