Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low footprint long text diffs #13

Open
aaronshaf opened this issue May 6, 2014 · 8 comments
Open

Low footprint long text diffs #13

aaronshaf opened this issue May 6, 2014 · 8 comments

Comments

@aaronshaf
Copy link
Contributor

One of the great things about benjamine's jsondiffpatch is the ability to generate terse patches for changes made to long strings. See the "text" example here.

Right now, with JSON Patch it seems one is stuck doing a large replace. This is doubly awful if one supplies a test operation for the inverse.

Could jiff amend rfc6902 with a patch operation, and use google-diff-match-patch for it?

My use case is making/tracking changes made to wiki articles without wholesale PUTs. I don't want to store the entire article on every PATCH.

Thanks for the wonderful library!

@briancavalier
Copy link
Member

This is definitely interesting, and makes me think that we should find a way to support an extensible set of patch operation types. For example, one thing I can imagine is allowing new ops, such as perhaps text-patch, whose value is actually a contextual text diff.

It seems like each new patch type would need to support a few key operations:

  1. A patch operation, obviously!
  2. A diff operation
  3. A predicate for determining if the diff operation should be applied to a particular path+data pair. For example, in the "long text" case, the predicate could check the string length, or match against a regex, or something similar.
  4. An inverse operation
  5. It's not clear to me yet how commutation fits in here. It would be a shame if every patch type needed to know how to commute with every other patch type.

There would also need to be some API for the user to configure the set of patch operations he/she wants to use.

Does all of that spark any more thoughts, @aaronshaf ?

@aaronshaf
Copy link
Contributor Author

I believe jsondiffpatch provides everything needed (1, 2, 3, 4) to help jiff accomplish this.

3: It seems to already check for the presence of text from which to contextualize its change. Checking the string length would be a bad idea, since two commutative text-patches (such as those sent in separate PATCH requests) need not operate on a string with the same length.

4: Everything in a patch generated by jsondiffpatch enables one to reconstruct an inverse, hence its jsondiffpatch.reverse(delta).

5: This isn't clear to me either. But I assume you're trying to achieve commutation within batches of operations sent within individual PATCH requests. Since a text-patch can be limited to only individual properties, I don't think this would be an issue.

@briancavalier
Copy link
Member

I believe jsondiffpatch provides everything needed (1, 2, 3, 4) to help jiff accomplish this

Thanks, I'll have a look at their APIs and implementation.

Checking the string length would be a bad idea, since two commutative text-patches (such as those sent in separate PATCH requests) need not operate on a string with the same length.

Indeed! Good point :) Having pluggable patch types def requires some sort of predicate, I think (even tho string length is a terrible one, haha!). Do you know if jsondiffpatch allows new patch types to be provided by the user?

But I assume you're trying to achieve commutation within batches of operations sent within individual PATCH requests

Patch commutation is still fairly new to me too, but my current thinking is, at the most basic level, for JSON Patch, commutation means commuting 2 adjacent patch operations, eg an add and a remove. I believe that being able to do simple pairwise commutation, by extension, allows commuting two whole JSON Patches (ie batches), by commuting each operation in one batch "through" the entirety of the other batch, one pairwise commute at a time. IOW, once you can commute the atoms, you can commute batches of atoms.

Since a text-patch can be limited to only individual properties, I don't think this would be an issue.

I think it might still be an issue. Imagine two parties generate JSON Patches (from the same source document), each of which wants to modify the same long string property using a textual patch. It seems like the act of commuting the two JSON Patches would involve commuting the two textual patches contained within. Does that seem right?

@aaronshaf
Copy link
Contributor Author

Without some serious restraints, I don't think being conflict-free will be entirely possible. Especially with a text-patch addendum.

But most don't have eventual consistency as a high priority. Many systems are just fine dealing with last-wins.

@briancavalier
Copy link
Member

Without some serious restraints, I don't think being conflict-free will be entirely possible. Especially with a text-patch addendum

I totally agree. I think jiff needs to be able to expose conflicts to the developer somehow, possibly via exception, or some other mechanism (conflict resolver callback, etc.), so that he/she can deal with them in whatever way is appropriate for the application.

But most don't have eventual consistency as a high priority. Many systems are just fine dealing with last-wins.

Interestingly enough, it's a driving use case on my end right now :) But I think you're right in that many folks will be fine with a simpler "last-wins" strategy, so it'd be great to allow the conflict resolution strategy to be configured.

@aaronshaf
Copy link
Contributor Author

Perhaps one solution would be to split text into an array of paragraphs and/or words, and then generate a normal jiff patch on it.

@briancavalier
Copy link
Member

Does your new stuff put the entire value, though, in an "test" operation?

Unfortunately, it does right now :/ but only because otherwise, there's no way to support inverses.

Your idea is still really interesting, though! Let's both noodle on it a bit more and see if there's a way we could make it work.

I abstracted the patch operations types recently, which is a first step to allowing pluggable patch operations. All that's really needed now is to allow a caller to provide the set of patch operations he/she wants to use. That might be a viable alternative, too, but obviously would be outside the RFC.

@briancavalier
Copy link
Member

I raised this for discussion over in json-patch/json-patch2#6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants