Difference in result of cleanupSemantic() function on iOS and Android. #121

Open
pradiptilala opened this issue Oct 29, 2021 · 3 comments

pradiptilala commented Oct 29, 2021

We use the library's Objective-C code on iOS and its Java code on Android, calling the same set of functions on both platforms to compute the difference between two strings.

But for some texts we get different results from the cleanupSemantic() function on Android and iOS.

  1. First we call diff_main(), which returns a list of Diff objects. On both iOS and Android it returns a list of length 399.

  2. Next we call cleanupSemantic() on that list. On iOS the resulting list has length 8, whereas on Android the same function produces a list of length 11.

  3. Then we call diff_prettyHtml to get the highlighted text, and load the resulting HTML into a WebView.

  4. We have customized diff_prettyHtml to show inserted text with only a green background, without the underline.

Because cleanupSemantic() gives different results on iOS and Android, we are not able to show the same text difference on both iOS and Android devices.
iOS result image:
iOS_text

Android result image:
android_text-min
From the images above you can see that the last 2 lines on Android differ from those on iOS.
You can also use the previousText and updatedText values posted in the example below to reproduce the issue.

The code used on the Android side is as follows:

String previousText = "Customer shall pay Consultant’s expenses $200000 per month, as determined by Consultant in its reasonable business judgment, for performing the Services under this Agreement. Consultant shall invoice Customer for Services. All such invoiced amounts become due and payable to Consultant upon Client’s receipt of such invoice. Amounts that are not paid within fifteen (15) days following Customer’s receipt of such invoice will incur a penalty of $10000 per month or the maximum allowed by law, whichever is less.  Customer shall pay any amounts incurred by Consultant in the collection of past-due amounts owed, including, but not limited to, reasonable attorneys’ fees and costs. This is updated now<br>";

String updatedText = "!%@%%!%%@%%@515252555115   A computer program can easily produce gibberish - especially if it has been provided with garbage beforehand. This program does something a little different. It takes a block of text as input and works out the proportion of characters within the text according to a chosen order. For example, an order of 2 means the program looks at pairs of letters, an order of 3 means triplets of letters and so on. The software can reg";

diff_match_patch diffMatchPatch = new diff_match_patch();
                   
LinkedList<diff_match_patch.Diff> listStrings = diffMatchPatch.diff_main(previousText, updatedText);

diffMatchPatch.diff_cleanupSemantic(listStrings);

String sb = diffMatchPatch.diff_prettyHtmlCustom(listStrings);

webViewInstance.loadDataWithBaseURL("", sb, "text/html", "utf-8", "");
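
The customized diff_prettyHtmlCustom function is not shown in this issue. As a rough illustration only, the sketch below shows one way such a function might render insertions with a green background and no underline. The `Op` and `Diff` types here are simplified stand-ins for the library's own classes, and the colors and the `text-decoration:none` style are assumptions, not the poster's actual code.

```java
import java.util.List;

final class PrettyHtmlSketch {
    enum Op { EQUAL, DELETE, INSERT }

    // Simplified stand-in for the library's Diff class (hypothetical).
    static final class Diff {
        final Op op;
        final String text;
        Diff(Op op, String text) { this.op = op; this.text = text; }
    }

    // Escape the characters that would otherwise be read as HTML markup.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\n", "<br>");
    }

    // Render a diff list as HTML; insertions get a green background
    // and the default <ins> underline is switched off.
    static String prettyHtmlCustom(List<Diff> diffs) {
        StringBuilder html = new StringBuilder();
        for (Diff d : diffs) {
            String text = escape(d.text);
            switch (d.op) {
                case INSERT:
                    html.append("<ins style=\"background:#e6ffe6;text-decoration:none;\">")
                        .append(text).append("</ins>");
                    break;
                case DELETE:
                    html.append("<del style=\"background:#ffe6e6;\">")
                        .append(text).append("</del>");
                    break;
                default:
                    html.append("<span>").append(text).append("</span>");
                    break;
            }
        }
        return html.toString();
    }
}
```

Because the renderer only walks the list it is given, any difference in the list returned by cleanupSemantic() shows up directly in the HTML.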

We have a few questions about this:

  1. Is this difference in results due to different implementations of the library in Objective-C and Java?
  2. Is there any suggested modification or solution to make the results match on both iOS and Android?

Please suggest a solution to overcome this issue as soon as possible.

We are very close to our app release, so a prompt reply would be very helpful.

Thanks,


dmsnell commented Oct 29, 2021

Hi @pradiptilala,

The guarantees in this library revolve more around producing a diff that when applied against a source text will produce the output text. To that end many of the separate implementations will produce different but legitimate diffs.
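
To make that guarantee concrete, here is a minimal sketch with two different diff lists that describe the same edit. The `Op` and `Diff` types and the helper names `sourceOf` and `targetOf` are hypothetical stand-ins written for this example; the library's diff_text1 and diff_text2 perform the same kind of reconstruction on its own Diff objects.

```java
import java.util.Arrays;
import java.util.List;

final class DiffEquivalenceSketch {
    enum Op { EQUAL, DELETE, INSERT }

    // Simplified stand-in for the library's Diff class (hypothetical).
    static final class Diff {
        final Op op;
        final String text;
        Diff(Op op, String text) { this.op = op; this.text = text; }
    }

    // Source text = every segment that is not an insertion.
    static String sourceOf(List<Diff> diffs) {
        StringBuilder sb = new StringBuilder();
        for (Diff d : diffs) {
            if (d.op != Op.INSERT) sb.append(d.text);
        }
        return sb.toString();
    }

    // Target text = every segment that is not a deletion.
    static String targetOf(List<Diff> diffs) {
        StringBuilder sb = new StringBuilder();
        for (Diff d : diffs) {
            if (d.op != Op.DELETE) sb.append(d.text);
        }
        return sb.toString();
    }

    // A character-level diff of "the cat" -> "the cart".
    static List<Diff> fineGrained() {
        return Arrays.asList(
            new Diff(Op.EQUAL, "the ca"),
            new Diff(Op.INSERT, "r"),
            new Diff(Op.EQUAL, "t"));
    }

    // A word-level diff of the same edit.
    static List<Diff> coarseGrained() {
        return Arrays.asList(
            new Diff(Op.EQUAL, "the "),
            new Diff(Op.DELETE, "cat"),
            new Diff(Op.INSERT, "cart"));
    }
}
```

Both lists reconstruct the same source and target text even though their lengths and contents differ, which is the sense in which an 8-element and an 11-element result can both be legitimate.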

If you need to guarantee that the generated diffs are the same you may need to look into using a diffing library focused on that, or you could try to submit a patch to the Java or Objective-C code where they differ. Matching the algorithms should be easier between those two since they both represent strings internally with UTF-16, but having worked in both libraries, I'm not sure where the differences lie.

You might start by reviewing cleanupSemantic itself. If you are in a crunch you might also consider looking at line-mode diffs which I think should be inherently much closer if not identical across the libraries. From there you might be able to re-apply diff-match-patch on those changed lines if you can reliably pair up the ones that are replaced.
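
As a rough sketch of the line-level idea, the code below compares two texts line by line. It does not use the library's own line-mode helpers; it swaps in a plain longest-common-subsequence comparison, and all names in it are hypothetical. Adjacent DELETE and INSERT entries in its output are the candidate line pairs that could then be passed to a character-level diff.

```java
import java.util.ArrayList;
import java.util.List;

final class LineDiffSketch {
    enum Op { EQUAL, DELETE, INSERT }

    static final class LineDiff {
        final Op op;
        final String line;
        LineDiff(Op op, String line) { this.op = op; this.line = line; }
    }

    // Line-level diff built from a longest-common-subsequence table.
    static List<LineDiff> diffLines(String before, String after) {
        String[] a = before.split("\n", -1);
        String[] b = after.split("\n", -1);

        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start, emitting one entry per line.
        List<LineDiff> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                out.add(new LineDiff(Op.EQUAL, a[i]));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add(new LineDiff(Op.DELETE, a[i++]));
            } else {
                out.add(new LineDiff(Op.INSERT, b[j++]));
            }
        }
        while (i < a.length) out.add(new LineDiff(Op.DELETE, a[i++]));
        while (j < b.length) out.add(new LineDiff(Op.INSERT, b[j++]));
        return out;
    }
}
```

Note that when each input is a single paragraph with no newline characters, a line-level comparison like this reduces to one deletion and one insertion, so it only helps for multi-line documents.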


pradiptilala commented Oct 30, 2021

Hey @dmsnell ,
Thanks a bunch for your comments. We have tried evaluating the suggestions provided.
Please find our comments below. It would be helpful if you could guide us further.

Internally the library uses UTF-8 strings, not UTF-16. We have also checked line-mode diffs, but they do not give the expected result. We have reviewed the cleanupSemantic() function in both the iOS and Android code; the implementations look the same and we could not find any difference.

Request your guidance.
Cheers!


dmsnell commented Oct 31, 2021

@pradiptilala

> Internally the library uses UTF-8 strings, not UTF-16.

I was speaking about Java and Objective-C themselves; for instance, a Java Character represents a UTF-16 code unit. But I didn't mean to make this a point. I only meant to say that the two languages are compatible in how they treat strings. That wouldn't be the case, for example, when comparing Java and Python3, or even Python2 and Python3, as Python3 stores strings as a sequence of Unicode code points instead. But I digress; again, this isn't a sticking point and I didn't mean to make it sound more important than it is.
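
A small illustration of the code-unit point, using only standard `java.lang.String` methods (the class and method names are made up for the example): a character outside the Basic Multilingual Plane occupies two UTF-16 code units in a Java string but is a single code point. NSString's `length` likewise counts UTF-16 code units, so string offsets line up between the two platforms.

```java
final class Utf16LengthSketch {
    // Number of UTF-16 code units; this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600, written as a surrogate pair.
        String emoji = "\uD83D\uDE00";
        System.out.println(codeUnits(emoji));  // prints 2
        System.out.println(codePoints(emoji)); // prints 1
    }
}
```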

> We have reviewed the cleanupSemantic() function in both the iOS and Android code; the implementations look the same and we could not find any difference.

If you verified that those functions operate identically then you'll just want to scan around to the other functions. The point I should have stressed more is that the diffs themselves aren't guaranteed to be identical across the different libraries in this project. The guarantee is that once you apply the diffs to the same source text you will find the same output text, and those diffs generally should be human-readable.

You could ask, for your application, whether it is critical that Android and iOS show the same diffs, the same patches. Consider that core tools like git don't guarantee particular diffs, and it's possible to get different diffs for the same change simply by telling git to use a different diff algorithm. If the goal is understanding how a document changed between versions, it may be good enough to have different behaviors on the two platforms, because both should give an accurate view.

If you have to have identical diffs, it could be the case that you need to look for a library that guarantees that. I'm not a maintainer here but I don't think producing identical diffs is even a goal of the library, and if it happens to do that, I'm not sure it will stay that way as the library grows over time.

Hope this helps!
