Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimized DFS iterator and added additional functionality #423

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

jkinable
Copy link
Collaborator

@jkinable jkinable commented Aug 4, 2017

I've re-implemented the DFS iterator:

  • new implementation is faster, more concise and better readable than original
  • added additional functionality which allows one to query the depth of a node in the DFS tree. Moreover, you can now also query the parent of a node in the DFS tree, as well as the edge between a child node and its parent. The latter is particularly useful in a multigraph. This functionality could (as far as I could tell) not be implemented through a TraversalListener. Obviously, this additional functionality requires a little more bookkeeping which comes with some computational overhead.

Let me know what you think about this new extension. I could also add the same functionality to the BFS iterator.

Finally, I believe there's a minor bug in the BFS iterator: the BFS iterator never invokes finishVertex(vertex);, i.e. the code should probably read:

@Override
    protected void encounterVertex(V vertex, E edge)
    {
        putSeenData(vertex, null);
        finishVertex(vertex);
        queue.add(vertex);
    }


private Deque<Object> stack = new ArrayDeque<>();
//private Stack<V> stack = new Stack<>();
private Stack<StackEntry> stack = new Stack<>();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jkinable. I've not had a look at the rest of the PR yet, but I wanted to say, looking at this, that I feel a bit iffy with Stack being used here rather than Deque.

If I may, I would like to suggest using Deque here instead, for a couple of reasons.

Firstly, according to the ArrayDeque javadoc, it's explained that ArrayDeque "is likely to be faster than Stack when used as a stack".

And finally, the Stack javadoc explains that it extends Vector, which IIRC has built-in synchronization, which I would reasonably guess makes Stack slower than necessary in this case, since AFAICT the stack isn't being used in an asynchronous fashion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to raise a good point. I completely forgot about this. I always find this amusingly contradictory: "This class is likely to be faster than Stack when used as a stack".
Any idea whether it is better to use a ArrayDeque or a LinkedList as stack?

Copy link

@jbduncan jbduncan Aug 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably ArrayDeque I think.

LinkedList uses more memory per element than array-based data structures if I understand correctly (although, thinking more about it, this may be moot, since array-based data structures by their nature need to grow their internal arrays occasionally if they get too full, meaning array-based data structures are usually larger than necessary at any given point in time).

Furthermore I would guess that LinkedList is slower on modern hardware, since it uses pointers a lot, which I believe is not CPU-cache-friendly.

So I'm inclined to go for ArrayDeque unless profiling shows that LinkedList is better in some way that matters for JGraphT.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope this helps. :)

@jsichi
Copy link
Member

jsichi commented Aug 16, 2017

@jkinable I haven't had a chance to review the DFS change yet...could we keep that separate from the other changes to make reviewing easier?

@jkinable
Copy link
Collaborator Author

@jsichi I've separated the DFS changes and the connectivity changes (separate PR).

@@ -55,7 +60,7 @@ String getExpectedStr2()
@Override
String getExpectedFinishString()
{
return "6:4:9:2:8:7:5:3:1:orphan:";
return "1:3:6:5:7:9:4:8:2:orphan:";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've eliminated the old white/gray/black bookkeeping that was producing the correct finish order. Why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure that I understand your comment. In DepthFirstSearch there is no such thing as a 'correct finish order'. Quoting from the original author of the test class:

NOTE: This test uses hard-coded expected ordering isn't really guaranteed by the specification of
the algorithm. This could cause false failures if the traversal implementation changes.
In other words, alternative Depth First Search implementations may produce a different traversal order, while still producing a different but correct Depth First Search Tree.

The original implementation using white/gray/black bookkeeping was surprisingly slow. Some tests on reasonable sized graphs gave me one to two orders of magnitude speedup with the new implementation. Unless I misunderstood something, the book keeping was rather unnecessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that the finish order is not deterministic, which I believe is what that comment meant. However, there are constraints on the allowable orderings:

http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/GraphAlgor/depthSearch.htm

So I don't think we can eliminate the existing logic. I'm guessing the dominant cost is in finding+removing nodes from the middle of the deque, so a better data structure is called for if we want to optimize that.

(If that's not possible, then instead of replacing the existing iterator, we could just add a new one.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I studied the reference you stated - the coloring logic being used (white/grey/black) is just syntactic sugar which simplifies the discussion of the algorithm, as well as some of the proofs. The implementation in this PR satisfies all constraints wrt allowable orderings, while being significantly more efficient. It's quite easy to see how different (but valid) orderings can be obtained.

  • Example 1: triangle graph (a,b)(b,c)(c,a). A valid DFS ordering: [a,b,c]. Another valid ordering: [a,c,b].
  • Example 2: even without cycles, different orderings exist. For instance, in a graph with 2 edge (a,b)(a,c) again two orderings exist: [a,b,c] and [a,c,b].
    The existence of different orderings is a direct consequence of the fact that in a graph, the neighbors of a vertex form an unordered set.

Just to be sure, I also checked some other reference works:

  • Graphs Networks and algorithms by Dieter Jungnickel
  • Introduction to Algorithms by Thomas H. Cormen , Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
    None of these references state additional constraints on feasible orderings aside from the ones that have already been implemented in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referring to the constraint between discovery order and finish order. See the "parenthesis theorem" in the link above. Your getExpectedFinishString is identical to getExpectedStr2 because you are calling finishVertex as soon as you encounter a vertex, which is incorrect. We're not supposed to finish a vertex until we've finished everything reachable from it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkinable @jsichi Indeed, you need some extra bookkeeping in order to call finishVertex() at the correct time. One solution is to add an extra flag on the stack entry to indicate whether this is the first time that this vertex has been added in the stack. When you pop() that entry, you can call finishVertex().

@@ -35,6 +39,7 @@
* @author Liviu Rau, Patrick Sharp
* @since Jul 30, 2003
*/
//public class DepthFirstIteratorTest{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete comment

@@ -55,7 +60,7 @@ String getExpectedStr2()
@Override
String getExpectedFinishString()
{
return "6:4:9:2:8:7:5:3:1:orphan:";
return "1:3:6:5:7:9:4:8:2:orphan:";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm referring to the constraint between discovery order and finish order. See the "parenthesis theorem" in the link above. Your getExpectedFinishString is identical to getExpectedStr2 because you are calling finishVertex as soon as you encounter a vertex, which is incorrect. We're not supposed to finish a vertex until we've finished everything reachable from it.

*/
public class DepthFirstIterator<V, E>
extends CrossComponentIterator<V, E, DepthFirstIterator.VisitColor>
extends CrossComponentIterator<V, E, DepthFirstIterator.SearchNodeData>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to extends CrossComponentIterator<V, E, DepthFirstIterator<V, E>.SearchNodeData>.

@mykolapolonskyi
Copy link

hi there
I`ve found this PR very useful - can't understand status - has it been rejected?

@jkinable
Copy link
Collaborator Author

@mykolapolonskyi in order to make the DFS implementation much more efficient I had to take some shortcuts. The algorithm is supposed to invoke the finishVertex() method in a rather specific order, see: http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/GraphAlgor/depthSearch.htm
At least this is what the old (less efficient) implementation did. We always try to keep things the same as they were.
Notice however that:

  • the BestFirstIterator (incorrectly) doesn't call the finishVertex() method
  • aside from the above website, I don't know any other source that specifically states the correct finish order.

Either way, I need to find some time to finish this PR (meaning that I need to add some extra bookkeeping to call finishVertex in the right order).

@Overmind91
Copy link

It is 2020 already, how much longer will it take?
For me, the library almost useless without this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants