Skip to content

Line Based Diffs

Wilfred Hughes edited this page Jan 23, 2022 · 5 revisions

Most diff tools today use LCS algorithms. They typically apply to lines, but in some cases they're word-based.

Myers' diff algorithm

This is the default diff algorithm in GNU diff and git diff. It finds the longest common subsequence (LCS) and is used on a line-by-line basis.

There's a great introduction here and the original paper is An O(ND) Difference Algorithm and Its Variations, Myers 1986.

# Modern diff supports colour, but see also
# https://www.colordiff.org/
$ diff --color=always -u sample_files/css_before.css sample_files/css_after.css

Note that GNU diff originally used the Hunt-McIlroy algorithm.

Patience Diff

Myer's diff has a problem with sliders:

 if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {
                        $smtp_server = $_;

Instead of:

+if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
 if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {

Git has a --indent-heuristic that was added to reduce the likelihood of making a bad choice. There's a corpus of test files where the ideal diff has been chosen by a human, to test different heuristics.

The patience diff algorithm is an LCS algorithm that aims to do a better job with sliders. It produces great results by doing more work.

# Original behaviour
$ git diff --no-indent-heuristic --no-index sample_files/css_before.css sample_files/css_after.css
# As of git 2.11, this heuristic is enabled by default.
$ git diff --indent-heuristic --no-index sample_files/css_before.css sample_files/css_after.css
# Patience algorithm does a better a job in this example.
$ git diff --patience --no-index sample_files/css_before.css sample_files/css_after.css

Diff Match Patch also has some excellent discussions of diff designs on the author's website (e.g diff strategies).

Histogram Diff

Git 1.7.7+ also has a histogram algorithm, which aims to produce better results than Myers' algorithm but without the slowdown of the patience algorithm.

# Inferior to patience on this example file.
$ git diff --histogram --no-index sample_files/css_before.css sample_files/css_after.css

Smart Hunk Headers

https://tekin.co.uk/2020/10/better-git-diff-output-for-ruby-python-elixir-and-more

@@ -24,7 +24,7 @@ class TicketPdf
     ApplicationController.render(
       "tickets/index.html.haml",
       layout: "tickets",
-      assigns: { tickets: tickets }
+      assigns: { tickets: tickets, event_name: event_name }
     )
   end

Note how class TicketPdf is shown at the beginning.

Side-by-side Diff

$ diff -y --color=always sample_files/css_before.css sample_files/css_after.css

prettydiff

prettydiff does really well out of the box with the sample files here. It implements LCS on words.

wu-diff

wu-diff doesn't have much documentation, but it gives the same results as other LCS implementations in Rust.

delta

delta performs line based diffs, then uses levenshtein distance to highlight parts within the line. It also offers syntax highlighting.

diff-so-fancy

diff-so-fancy consumes normal diff output, so it's line based. It also performs word highlighting within lines, and generally has a prettier set of defaults.

git-split-diffs

git-split-diffs emulates GitHub's diff formatting style in a terminal.