Skip to content

spkl/Diffs

Repository files navigation

Diffs

Provides a .NET implementation to the diff algorithm (shortest edit script) described by Eugene Myers in "An O(ND) Difference Algorithm and Its Variations". Unlike some other implementations, this one can compare sequences of any object type, using the standard Equals method or a custom IEqualityComparer.

Example Usage

In this example, we will diff two string sequences:

   0 1 2 3 4 5 6 (These are the indexes)
A: a b c a b b a
B: c b a b a c

To do this, we use the following code:

// using spkl.Diffs;
string[] a = new[] { "a", "b", "c", "a", "b", "b", "a" };
string[] b = new[] { "c", "b", "a", "b", "a", "c" };
MyersDiff<string> diff = new MyersDiff<string>(a, b);

Using generic typing, you can diff all sorts of types. You can also customize which values are considered equal by specifying a third constructor parameter (IEqualityComparer<T>).

There are two methods to access the results: GetEditScript() and GetResult().

GetEditScript()

GetEditScript() returns a sequence of edit instructions. You can understand these as instructions to follow to transform sequence A to sequence B. Every instruction contains four integers, a starting line number in A and B and the number of lines to add or remove. In this case, the return value would look like this:

LineA LineB CountA CountB
0 0 1 1
2 2 1 0
5 4 1 0
7 5 0 1

To better understand this, let's follow these instructions:

  1. In A, starting at line 0 (LineA), remove 1 line (CountA). From B, starting at line 0 (LineB), add 1 line (CountB) at position 0 (LineA). (This removes the first "a" and adds the first "c")

  2. In A, starting at line 2, remove 1 line. (This removes the "c")

  3. In A, starting at line 5, remove 1 line. (This removes the last "b")

  4. From B, starting at line 5 (LineB), add 1 line (CountB) at position 7 (LineA). (This adds the last "c")

Following these instructions, abcabba is transformed to cbabac.

GetResult()

GetResult() returns the resulting diff as a sequence of tuples that represent lines. In this case, the return value would look like this:

ResultType AItem BItem
A a
B c
Both b b
A c
Both a a
Both b b
A b
Both a a
B c

This is similar to how the result would be displayed in a visual comparison application. The AItem and BItem columns contain sequence A and B, respectively. The ResultType column shows whether a line contains a value from sequence A, B, or from both. You can see which values were matched with each other.

GetResult(ResultOrder)

You can specify the order in which you want unmatched lines to appear by using the GetResult(ResultOrder) overload. To visualize this, we need to use a new example.

A: a b c
B: x y

The following table shows how the result would be returned when specifying the different ResultOrder values AABB, BBAA, ABAB and BABA:

AABB BBAA ABAB BABA
a - - x a - - x
b - - y - x a -
c - a - b - - y
- x b - - y b -
- y c - c - c -