Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patch_fromText and lines breaks issue in Python #157

Open
weeger opened this issue Apr 9, 2024 · 6 comments
Open

patch_fromText and lines breaks issue in Python #157

weeger opened this issue Apr 9, 2024 · 6 comments

Comments

@weeger
Copy link

weeger commented Apr 9, 2024

Hello,

I am looking to utilize this fantastic library for applying patches to a file, but I'm encountering issues with broken line breaks. I have attempted to use the patch_make function to compare the behavior, and it operates effectively when the diff is generated by dmp. Could someone please guide me in identifying the potential issues in my diff file?

The wrong result :

"express": "^4.18.2",    "lodash": "^4.17.21",

The expected one :

 "express": "^4.18.2",
 "lodash": "^4.17.21",

This is my simple test code :

from diff_match_patch import diff_match_patch

original_json = '''{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "puppeteer": "^21.7.0"
  }
}'''

expected_json = '''{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}'''

# The unidiff string, which is an exact clone of what dmp generates for the same changes
diff_str = '''@@ -82,16 +82,42 @@
 .18.2",
+    "lodash": "^4.17.21",
     "pup'''

print("____________ BROKEN EXAMPLE")
dmp = diff_match_patch()
patches = dmp.patch_fromText(diff_str)
patch_text = dmp.patch_toText(patches)
new_text, _ = dmp.patch_apply(patches, original_json)
print(patch_text)
print(new_text)

print("____________ WORKING EXAMPLE")
patches = dmp.patch_make(original_json, expected_json)
patch_text = dmp.patch_toText(patches)
new_text, _ = dmp.patch_apply(patches, original_json)
print(patch_text)
print(new_text)

And the result

____________ BROKEN EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,
+    %22lodash%22: %22%5E4.17.21%22,
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}
____________ WORKING EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,%0A
+    %22lodash%22: %22%5E4.17.21%22,%0A
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}

We can clearely see that dmp adds a "%0A" (an encoded \n) in the generated patch, which is missing when creating it from text.

  • I conducted various tests on multiple patch formats.
  • Upon inspecting the core of the DMP, I discovered that the final aggregated text does not include line breaks in the diffs.

Version info :

diff-match-patch 20230430
system Ubuntu
python 3.10.12
@dmsnell
Copy link

dmsnell commented Apr 9, 2024

@weeger did you confirm if this works as expected in the other languages? in JavaScript, Java, Objective C, or any others?

@weeger
Copy link
Author

weeger commented Apr 13, 2024

Same problem, just tested in Javascript :

// Import the diff-match-patch library
const DiffMatchPatch = require('diff-match-patch');
const dmp = new DiffMatchPatch();

// Original JSON string
const originalJson = `{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "puppeteer": "^21.7.0"
  }
}`;

// Expected JSON string with a new dependency added
const expectedJson = `{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}`;

// Unidiff string representing the change
const diffStr = '@@ -82,16 +82,42 @@\n .18.2",\n+    "lodash": "^4.17.21",\n     "pup';

console.log("____________ BROKEN EXAMPLE");
// Converting the diff string to patches
let patches = dmp.patch_fromText(diffStr);
// Converting patches back to a diff string for display
let patchText = dmp.patch_toText(patches);
// Applying patches to the original JSON string
let [newText, results] = dmp.patch_apply(patches, originalJson);
console.log(patchText);
console.log(newText);

console.log("____________ WORKING EXAMPLE");
// Creating patches directly from original and expected JSON strings
patches = dmp.patch_make(originalJson, expectedJson);
// Converting patches to a diff string for display
patchText = dmp.patch_toText(patches);
// Applying these patches to the original JSON
[newText, results] = dmp.patch_apply(patches, originalJson);
console.log(patchText);
console.log(newText);

Result

____________ BROKEN EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,
+    %22lodash%22: %22%5E4.17.21%22,
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",    "puppeteer": "^21.7.0"
  }
}
____________ WORKING EXAMPLE
@@ -82,16 +82,42 @@
 .18.2%22,%0A
+    %22lodash%22: %22%5E4.17.21%22,%0A
     %22pup

{
  "type": "module",
  "dependencies": {
    "ejs": "^3.1.9",
    "express": "^4.18.2",
    "lodash": "^4.17.21",
    "puppeteer": "^21.7.0"
  }
}

@dmsnell
Copy link

dmsnell commented Apr 24, 2024

there one thing that stands out: in the broken example we have a patch file where the newline is percent-encoded as %0A. it's becoming part of the diff, but the patch tool already works on the line-level.

makes me wonder if there's something about the diff output that patch takes that makes an assumption about a single trailing newline vs. many. as in, maybe diff-match-patch should strip a single trailing newline from its output before percent-encoding it.

@Emasoft
Copy link

Emasoft commented Apr 28, 2024

@dmsnell I just encountered this issue myself. This bug is troublesome. Do you plan to fix it and publish the fix in your improved fork? ( https://github.com/dmsnell/diff-match-patch )? That would be great. Can't believe Google abandoned this project... 😔

@dmsnell
Copy link

dmsnell commented Apr 29, 2024

@Emasoft it's doubtful I'll fix this myself on any timeline someone else wants, but I'm willing to work with people who propose fixes and get it merged. I don't have the capacity do everything myself.

@17351269
Copy link

Lang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants