Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State of diffing/patching redux #120

Open
kuukunen opened this issue Feb 3, 2020 · 0 comments
Open

State of diffing/patching redux #120

kuukunen opened this issue Feb 3, 2020 · 0 comments

Comments

@kuukunen
Copy link

kuukunen commented Feb 3, 2020

Hi. Just wanted to check if there's any update for plans for diffing and/or patching support?
I read through the previous issues I found, #58 and #67, but the discussions died down a bit and you were wondering about use cases.

Basically we are currently evaluating using immutable data types and Redux-style architecture for creating an editor for our games.

So one use case is tens/hundreds of thousands of game objects in some combination of maps and arrays.

Another would be for example painting the shape/texture/blend values of a huge terrain, say, 16 km x 16 km. In the terrain case of course the actual data would be in just plain byte buffers, but for undo and optimization/streaming from disk it needs to be chunked in some manageable sizes. (Example of a tool: https://youtu.be/YPR1XFOUvp4?t=361 )

Part of the solution might have to have some sort of spatial data structure on top, but even still the chunks might be pretty big for efficient handling, otherwise the spatial bookkeeping becomes an issue and it would be a waste to write some sort of structural sharing system in addition to the existing one.

(Slightly contrived) example of a demanding operation: Artist is editing a forest that is part of a 16km x 16km world with millions of trees. They box select ~thousand trees, then they move them a bit. Ideally the moving would be smooth (everything rendering 60fps, so 16ms per frame) and the trees would conform to the terrain as they slide.

We're actually just starting with that part, so we currently don't have specifics or any real world data, though.

Making an actual diff/patch functionality might of course be a bit gnarly, since you have to handle additions, deletions, changes, etc. There is some prior art in JS world, though:
https://github.com/intelie/immutable-js-diff
https://github.com/elierotenberg/remutable

I think for our use case the actual patching is not necessary, so one could do it with a diff iterator (possibly dereferencing, to a tuple, at least in a map case (Key1, Value1*, Value2*)). that could be used with a range based for.

I used pointers, but could also be (std::)optional. So if Value1 is nullptr/nullopt, you know Value2 was added, if vice versa, it was deleted, otherwise it was changed.

Or it could be its own special function, using a visitor, similar to for_each_chunk. And actually for performance reasons, especially for vector, it might even be necessary to have a for_each_diff_chunk that gives you all the chunks that might be different, based on the structural sharing, and leaves the element-wise diffing to the user.

It does become a bit complicated... Here's a quick sketch of usage...

immer::vector<Object> state;
...
state.for_each_diff_chunk(oldState, [](size_t index, auto begin1, auto end1, auto begin2, auto end2) {
    auto it1 = begin1; // Or you could just use begin1 directly, but that's a bit hacky.
    auto it2 = begin2;
    for(; it1 != end1 && it2 != end2; ++it1, ++it2)
    {
        if(*it1 != *it2)
        {
            doDiffStuff(*it1, *it2, index++)
        }
    }
    for(; it1 != end1; ++it1)
    {
        doDeleteStuff(*it1, index++)
    }
    for(; it2 != end2; ++it2)
    {
        doAddStuff(*it2, index++)
    }
};

I think this should take care of all cases:

  1. Normal case where there's just some changes in the chunk.
  2. It's the last chunk and elements were deleted. (Range 2 is shorter than 1.)
  3. It's the last chunk and elements were added. (Range 1 is shorter than 2.)
  4. The whole chunk was deleted. (begin2 == end2)
  5. The whole chunk added deleted. (begin1 == end1)

For vector I assume the implementation is somewhat straightforward, but based on my understanding of flex_vector, it might be more difficult and it might be necessary to split chunks to different calls, have bunch of one element "chunks" etc. But on the other hand, the whole use-case of flex_vector is that adding to the middle, concatenation etc is efficient, in which case iterating over differences doesn't make much sense anyway. (If you add/delete an element in the middle, the only thing that makes sense is to treat the whole vector as different from that point on.)

I also didn't really look into the map implementation, but based on a cursory look over the CHAMP paper, it sounds like similar efficient implementation of for_each_diff_chunk would be difficult and/or require exposing the internals to the API.

Perhaps another kind of interface is possible, one that takes three functions as parameters, one for each operation: change, add and delete, and then the map will try to figure out the longest continuous piece of memory the operation applies to. In fact, maybe it would be cleaner/more efficient even for the vector case.

Apologies, this ended up becoming long-winded brainstorming. It is also possible we're forced to do lower level optimizations and instead handle diffing manually with something like dirty-lists, but that flies against the whole architecture and would be a mess, so I would like to try to keep it as pure as possible.

PS. In case you're interested about people using immutable data structures in production, there is actually some prior art in games. Insomniac rewrote their old Javascript tools to use C++ (Qt) with immutable data for Marvel's Spider-Man (2018), heavily relying on structural sharing etc. Ron Pieket made a talk about it at GDC and CEDEC, but unfortunately the material is behind a big paywall.

PPS. I noticed your allocators match my allocator system pretty well, so it would be possible to use my system with immer with minimal hassle, which is nice. Did you by any chance also see Alexandrescu's allocator talk?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants