Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store extra information in iterator needed to find base of allocation #190

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

dwightguth
Copy link
Contributor

@dwightguth dwightguth commented Oct 18, 2021

When the user intends to use an explicit, relocating garbage collector
in conjunction with the champ type, it becomes necessary to be able to
adjust the pointers inside a champ_iterator to point to the new address
of the relocated collection node that is currently on the iterator's
path. In order to do this, it is necessary to be able to compute the
base pointer of the allocation for the node containing the buffer of
node pointers into which that particular path entry is pointing.
However, this is not currently deducible from the information stored
inside the iterator itself.

Here, at the cost of an additional max_depth<B> bytes (which amounts to
16 bytes by default including padding) per iterator created, we maintain
that information in the form of an integer offset for each nonzero entry
in the path. The garbage collector can then take each node** in the path
and subtract the corresponding offset from it using pointer arithmetic
in order to obtain the address corresponding to the start of the
children buffer, from which the base of the allocation is a known
constant offset.

In order to save space, we assume that each node can have at most 256
children. This seems to me to be a safe assumption, because the B value
that is a template parameter appears to trigger a static assertion if it
is ever set greater than 6.

By the same logic, we also allocate an additional 4 bytes in order to
perform the same set of steps with the pointers to the values. By
default, on 64-bit machines, these four bytes live inside what was
previously padding between depth_ and path_, thus they do not
actually affect the overall size of the iterator. This may not be true
on 32-bit machines, however.

When the user intends to use an explicit, relocating garbage collector
in conjunction with the champ type, it becomes necessary to be able to
adjust the pointers inside a champ_iterator to point to the new address
of the relocated collection node that is currently on the iterator's
path. In order to do this, it is necessary to be able to compute the
base pointer of the allocation for the node containing the buffer of
node pointers into which that particular path entry is pointing.
However, this is not currently deducible from the information stored
inside the iterator itself.

Here, at the cost of an additional max_depth<B> bytes (which amounts to
16 bytes by default including padding) per iterator created, we maintain
that information in the form of an integer offset for each nonzero entry
in the path. The garbage collector can then take each node** in the path
and subtract the corresponding offset from it using pointer arithmetic
in order to obtain the address corresponding to the start of the
children buffer, from which the base of the allocation is a known
constant offset.

In order to save space, we assume that each node can have at most 256
children. This seems to me to be a safe assumption, because the B value
that is a template parameter appears to trigger a static assertion if it
is ever set greater than 6.

By the same logic, we also allocate an additional 4 bytes in order to
perform the same set of steps with the pointers to the values. By
default, on 64-bit machines, these four bytes live inside what was
previously padding between `depth_` and `path_`, thus they do not
actually affect the overall size of the iterator. This may not be true
on 32-bit machines, however.
@dwightguth
Copy link
Contributor Author

From what I can tell, this doesn't actually affect the coverage of the code at all; the only reason it shows the coverage going down is that there are some basic blocks that weren't covered before that have gotten a little bigger now. Are there any specific tests you'd like to see me write here?

@dwightguth dwightguth marked this pull request as ready for review October 19, 2021 14:48
@codecov-commenter
Copy link

codecov-commenter commented Oct 22, 2021

Codecov Report

Merging #190 (71ba3b5) into master (36d1000) will increase coverage by 0.02%.
The diff coverage is 97.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #190      +/-   ##
==========================================
+ Coverage   91.27%   91.30%   +0.02%     
==========================================
  Files         104      105       +1     
  Lines        9574     9614      +40     
==========================================
+ Hits         8739     8778      +39     
- Misses        835      836       +1     
Impacted Files Coverage Δ
immer/detail/hamts/champ_iterator.hpp 91.56% <95.45%> (+0.79%) ⬆️
test/set/relocation.cpp 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36d1000...71ba3b5. Read the comment docs.

@arximboldi
Copy link
Owner

Hi @dwightguth! This is actually very interesting work. I find it quite exciting to see the library used for the runtime of such a language, and it was one of the main use-cases highlighted in the original paper.

One thing of concern about this PR though is that it adds some overhead to iterators that is not needed in most common scenarios. Ideally, I would like to provide an API between the data-structure and the memory_policy, such that this behavior can be toggled at compile time without overhead for other users. One "not-so-much-thought-into-it" way would be to add a flag expose_pointers_for_relocation or something like this to the memory policy and then the data-structures adapt their interface or implementation based on this.

But maybe there is a smarter way, were the data-structures interacts directly with the copying collector somehow via a custom API exposed through the memory policy. For that I would need to sit down and think about it further, and also maybe learn more about the implementation of your copying GC. But maybe you also have ideas here...

@dwightguth
Copy link
Contributor Author

Well, if your main concern is eliminating overhead when the user doesn't care about this additional information, that's going to be difficult to completely eliminate unless everything to do with deciding which behavior to use happens at compile time, like you said.

Originally I considered putting the changes to the data structure layout under a preprocessor definition in config.hpp, but decided after I got everything implemented that a couple extra memory reads and writes per iterator access, plus 16 additional bytes in a previously 136-byte type weren't, in the end, that significant.

If you feel it's justified, I could just put that code back in place. There is some precedent for using preprocessor defines for this already, for example with IMMER_TAGGED_NODE. That being said, if you want it to be part of the memory policy, I should think that ought to be doable as well. It would require a bit of tuning with the templates, but champ_iterator is already a class template, so you would just need to figure out how to make an extra template specialization for the case when the boolean template parameter is true, and then provide slightly different behavior based on whether you are using the primary template or the explicit specialization.

It might be tricky to get it to a point where the code doesn't have any duplication in this approach, though. I should think you might still have some overhead in the form of a function call to a function that does nothing, but at least that call could be removed by the optimizer if you're building in release mode.

I'm willing to give either a try, so just let me know which approach you favor and I'll try and get started on it on my next workday.

@dwightguth
Copy link
Contributor Author

@arximboldi did you have any preference for or against either of the solutions I proposed here?

@arximboldi
Copy link
Owner

arximboldi commented Nov 5, 2021

Hi @dwightguth!

My apologies for the very delayed response.

I think my preferred approach is to solve it via templates. This may be a bit more work on top of your current solution, but I think it makes things a bit more clearer. I would try as much as possible to avoid code duplication: please do not just specialize the whole champ_iterator class. There are a couple of ways to do this:

  1. Figure out if there is an underlying "abstraction" behind what you are trying to accomplish (this is, a policy),and define an interface for it and provide an "empty" implementation either the implementation you are looking for through the memory_policy. An example of this are the "reference counting" policies defined in immer/refcount the "heap policies" in immer/heap or the "transcience policies" in immer/transience. Maybe this would make sense as an extension of the heap policy to interact with the underlying heap/gc in marking these "moveable roots", or something else. Note that I am not worried of the overhead of function calls or empty function calls or type, since they are optimized away and debug mode performance is not a priority for the library.

  2. If that proves to be too difficult, I may find it acceptable to expose this "property" just as a boolean in the memory policy, that then the code selects based on that. There are already instances of that like prefer_fewer_bigger_objects or use_transient_rvalues. Sadly we can not use if constexpr to select on that because of C++14 support, but you may use the static_if facility in the library to achieve a similar result. You can find various uses in the library already as well.

I hope this helps and best of luck with the integration!

This commit refactors the previous implementation of this PR so that
both the memory and compute overhead of tracking relocation information
in the iterator is only performed if a specific memory_policy is
enabled. We create a new relocation_policy policy and track two
different implementations: no_relocation_policy, which is the default in
all cases, and gc_relocation_policy, which should be enabled if the user
is using the library with a relocating garbage collector.

Because so much information about how to track relocation information is
type-specific, the policy acts merely as a marker, and each type that
needs to track relocation information is responsible for defining a
template that does nothing and has no members if relocation is disabled,
and otherwise stores the necessary information and maintains it if that
information is required. We then instrument the class that needs to
track data by inserting calls to methods on the relocation information
template, which will be responsible for updating the information as
necessary.
@dwightguth
Copy link
Contributor Author

Looks like your CI is no longer working. We had the same problem with nix a couple days back. It seems to be an incompatibility between the latest version of the nix GitHub Actions and the 2.4 release of Nix that happened a couple days ago. Here is the temporary workaround for the issue that we went with in our repositories: runtimeverification/llvm-backend@6105287

I am working on the fix you proposed; still testing it. But I wanted to give you this information since it seems to be the reason all the CI checks are failing. I did run make check locally on this commit and it passed, but I still haven't tested with my own implementation yet or written the unit tests that I intend to.

@dwightguth
Copy link
Contributor Author

I've tested this code now with both make check and against our own integration tests, and it seems to provide the necessary functionality. I intend to write some unit tests next week, but this is now complete enough that you ought to be able to provide feedback on the design when you have a chance

Copy link
Owner

@arximboldi arximboldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This is really going in the right direction. Thanks for all the effort into integrating your changes upstream.

Besides what I have commented inline, however, the only thing left bothering me is that I do not really understand how your code is actually using this. There is no client in the codebase for all this new code at all, which makes it quite esoteric and also prone to be broken again in the future. The code is not even part of the public interface of the iterator. Also it leaves me wondering if there is a simpler way to do some of this maybe.

So! My suggestion would be:

  1. Write some unit tests so there is actually some client in the code-base that uses this, there is a better understanding of what this does, and lower chances of` breaking this in the future.
  2. Can you point me to links in your code-base that uses this?

immer/memory_policy.hpp Outdated Show resolved Hide resolved
immer/detail/hamts/champ_iterator.hpp Outdated Show resolved Hide resolved
@arximboldi
Copy link
Owner

I have btw also pushed the fix you suggested for the Nix issues, thank you very much for the workaround!

@dwightguth
Copy link
Contributor Author

This is the code that makes use of this functionality: https://github.com/kframework/llvm-backend/blob/immer_update/runtime/collect/migrate_collection.cpp

Essentially, it needs to be able to take each object on the heap containing a collection or iterator, and fix up all the pointers stored in those objects to point to the relocated versions of those objects. There are a couple places where we end up relying on some hacky behavior like const_cast or casting a class to a struct with the same layout, but it functions, which is my primary goal. I intend to rely on friend classes for the unit tests.

After giving careful thought, I decided that the relocation information
of a class probably ought to be public since the garbage collector is
expected to need to be able to read this information.
This test is slightly fragile in that it probably will not work if the
depth of the tree is uneven across different nodes. However, in order to
make it completely robust irrespective of the platform-specific details
of the hash function and the number of elements in the set being tested,
it would be necessary to re-implement a lot of the functionality for
walking the tree in the test, which seems counterproductive to me as we
would end up just testing the business logic against itself rather than
against a flat set of expectations.

So intead, I ended up just implementing a partial check for the
relocation information which tests the major behavior of the step_right
and step_down functions in the iterator, rather than testing the entire
set exhaustively. This allows the expected results of the test to be
specified declaratively, which makes the test more appropriate in terms
of what functionality it is actually testing.
@dwightguth
Copy link
Contributor Author

Looks like this is still failing CI due to the SSH key issue, but it is now ready for review again.

@arximboldi
Copy link
Owner

Sorry, I forgot to merge #193 which fixes the ssh key issue.

@dwightguth
Copy link
Contributor Author

@arximboldi This is still failing due to the missing SSH key even though #193 is now merged in. Are you sure you correctly solved the issue?

@arximboldi
Copy link
Owner

Hmmm, ok, I guess my fix was wrong :)

@arximboldi
Copy link
Owner

Also: I did take a look at your latest changes and at the code that uses these changes. I see how your migration traversal really relies on the internals of the library. I am wondering, if maybe the traversal itself should somehow be made part of the library itself. What do you think? Basically have a special funcion migrate([](void*& { ... } )) that visits every allocated pointer for you and calls itself recursively on nested types too.

@dwightguth
Copy link
Contributor Author

I mean, there's room for maybe putting some of the traversal stuff in our code into the library, I agree. That doesn't really affect this particular change though because the code that uses this data actually is not performing a traversal of the data structure. all it's doing is updating every pointer within a single iterator object. I'm not opposed, but I'd rather not confuse different concepts by putting something like that into this PR. With that being said, we could maybe try to provide a similar interface for fixing up pointers within a single object that abstracts away some of the internal implementation details. The reason I'm hesitant though is because it might not actually end up extracting away that much information at all unless we encode details of the garbage collector into the library code, which would essentially make the function useless for anyone except me. Right now the code does kind of rely on knowledge of the internals of the library, it's true, but that doesn't really bother me much because we pin ourselves to a specific version and don't update it very frequently at all.

@arximboldi
Copy link
Owner

Yes, I understand your rationale. The thing is also the new relocation_info() also is somehow bound to your specific collector (maybe even more). Since there is no rush to merge this, let me think about it a bit more...

@dwightguth
Copy link
Contributor Author

Well, it's certainly the case that the relocation info struct is specifically designed for use by a relocating garbage collector, but any garbage collector which moves objects will need this information in some form in order to be able to relocate iterators. So there's a certain amount of generality there. My concern with making an explicit function for fixing up pointers is more that while any garbage collector that needs to move objects will need that information, how it uses that information to perform its task is somewhat dependent on other details of the gc implementation. For example, the garbage collector I wrote needs to be able to find the base of the allocation from any pointer to the root of a champ, but how it does this is partially dependent on the implementation of the champ and partially dependent on the implementation of the struct containing the champ that actually forms the allocation. It's possible this could still be worked around in order to come up with a fully generic mechanism for providing this information, but it's definitely going to prove finicky. Letting the garbage collector choose how to use the information it needs seems like a simpler solution to me. But if you want me to try to dive into trying to encapsulate the information a bit more, I can give it a try and let you know how it goes. I just don't know for sure that it's going to prove possible.

@arximboldi
Copy link
Owner

Sorry if this wasn't clear, but my suggestion was to add a traversal function to the library, that visits every pointer allocated in the data-structure, passing it as a reference so the callback can modify it. The actual relocation would still be part of your library. The idea would be to basically expose the same information (where all the nodes allocated by the data structure are, and how to mutate them for relocation) but in a simple interface that basically is just one single function, and does not require the caller to know where to go and find the actual pointers inside the data structure (which is what your client code does). Does this make sense?

@dwightguth
Copy link
Contributor Author

I mean, that really doesn't work, because a garbage collection algorithm depends on visiting pointers in a certain order that is probably not known to immer.

@arximboldi
Copy link
Owner

I see.

@dwightguth
Copy link
Contributor Author

dwightguth commented Jan 11, 2022

@arximboldi I am returning to this after our company's winter break. What about the following interface to iterator instead of what I have in this pr?

void relocate(tree_t *relocateRoot(tree_t *oldRoot),  node_t *const *relocatePath(node_t *const *derived, node_t *base));

This function would essentially inline the logic currently found here, in order to implement the following logic:

  1. Call relocateRoot to get the path to the relocated root of the collection being iterated on, and use it to update the pointer to the root inside the iterator
  2. For each level of depth in the iterator, call relocatePath, passing the derived and base pointer for that node in the path, and replacing the node with the pointer returned by the function.
  3. Replace the end and cur pointers with the relocated pointers to the same elements (no lambda is required to compute these values once the path has been relocated)

This would be an operation on an iterator that would be used to fix up the pointers inside a single iterator during a garbage collection cycle, without needing to have the garbage collector understand the internal details of the iterator structure or how that structure chooses to store information about the root of allocations.

Would this satisfy you?

@dwightguth
Copy link
Contributor Author

@arximboldi have you had a chance to take a look at the proposed encapsulated design I made? I am now focused on a different project, but it would be nice to not have to use a fork of your project. If you're not interested in merging this change though, we can live with a fork; it would be nice to get confirmation if that is the case though so that I can take this off my list of open tasks.

@arximboldi
Copy link
Owner

Hi @dwightguth!

I'm very sorry for the delayed reply. Reviewing this work requires a fair amount of brain power and is also a bit of an edge-case, so it always ends up getting to the bottom of my todo list. But I really appreciate your insistence in getting this through, as I really agree with you that it would be best if your use-case did not require a fork of the library!

I have a couple of considerations:

  1. Why make this a member of iterator instead of the data-structure itself? In Immer in general it's not safe to use iterators when the root of the data-structure is gone (even though this may not be 100% true in your case because of the use of garbage collection).

  2. The relocate_root and relocate_path functions do not need to look into the internals of tree_t and node_t, correct? In that case, I think that a better API would be to pass and receive void* to/from it, so those implementation types remain hidden.

  3. From an API point of view, the "modern C++" way would be to receive function obects via a template instead of function pointers. (You'd need to documment in a comment what the signature of these function objects is). For example:

template <typename RelocateRootFn, typename RelocatePathFn>
void relocate(RelocateRootFn&& relocate_root,  RelocatePathFn&& relocate_path);

Once again thank you for your contribution and sorry for the delayed reviews! I'll try stay in the loop to get this relocate(...) based version in.

@arximboldi
Copy link
Owner

To be clear, ideally your code would not use .impl() at all. I really have the feeling that this could be achieved, by adding relocate functions like the one you are suggesting to both iterator, map, vector, etc. The important property is that your library provides callbacks that operate on void pointers, and Immer does the iteration itself.

You mentioned before that the algorithm needs to traverse the pointers in certain order that is not known to Immer, but the pointers themselves are allocated by Immer, how can your algorithm have special information about the order of the pointers? If the information is something as simple as "deepth first" or "breath first", that is something that can be specified at an API level, isn't it?

@dwightguth
Copy link
Contributor Author

Thanks for the response! No worries at all about the fact that this is taking some time, I'm just happy we're making progress. I'm going to try to address your comments one at a time, and then get started on creating a PR with the proposed (modified) design, because I think we're close to converging.

  1. The reason I proposed making this particular method be on the iterator and not the collection is because it relocates the iterator itself. This is the particular use case that I've been striving towards in this PR. There is separate code in my project for relocating maps themselves, and we can maybe think about how to create an encapsulated relocation interface for them next if you would like, but that's not what this PR is trying to accomplish.
  2. You are correct that those methods can probably take/return a void * instead. I'm happy to make that change.
  3. I am not that familiar with modern C++ design idioms, but I'm happy to look into this and use this interface instead.

As far as the question of a relocate method for the collection itself... You may be right that this is possible. I would have to give some more thought to how this could be achieved, and ideally that would be a separate pull request. But I'll do some thinking about it once I have finalized this part of the design and created an updated PR.

That being said, the particular question you are asking about with respect to the order that the gc processes objects... The gc algorithm our own personal gc is based on is called Cheney's algorithm, and it has the property that it can operate in linear time based on the amount of still-live memory, because it relocates objects in a particular order: namely, it first relocates all garbage collection roots, adding them to a queue, then it loops through the queue updating the pointers inside each object and adding any objects referenced by that object to the end of the queue if it hasn't seen them yet. This loop is part of the core gc algorithm. As such, while it may be possible to relocate an entire collection at once, hiding the internal details, simply by doing a breadth first traversal of the trie, this would certainly be a disruption to the logic of the algorithm, especially because it would need to add objects that are /not/ allocated by immer to the queue as it proceeds (namely, the collection elements). Our algorithm makes some compromises here in terms of relocating the entire collection as a group, but this is definitely non-standard and I wouldn't necessarily expect other relocating garbage collectors to do so. So it may be possible? I'm willing to give it a try and see what it does to the code base, after we got this PR in. But I'm not entirely sure how reusable the result would actually end up being. It may be that if we want a truly reusable solution, it would have to be over the classes in the implementation rather than the interface. Let me know what your thoughts are here.

Regardless, I will try and push an update to this PR soon.

@arximboldi
Copy link
Owner

As such, while it may be possible to relocate an entire collection at once, hiding the internal details, simply by doing a breadth first traversal of the trie, this would certainly be a disruption to the logic of the algorithm, especially because it would need to add objects that are /not/ allocated by immer to the queue as it proceeds (namely, the collection elements).

What I see in your code is that you are already implementing a special case traversal for Immer data-structures. What I am trying to figure out is if the fundamentals of this traversal could be abstracted out into a general mechanism in Immer, with a bit of modern C++ in there. I feel like a similar interface to what you has suggested for relocate() could be used for that. Maybe an extra parameter would be needed, to be able to recur into the the actual objects from the collection...

Anyway, happy to review your next batch of changes!

@dwightguth
Copy link
Contributor Author

Yes, if we focus on specifically replicating the type of traversal used in our codebase, I imagine a function signature very like the one I mentioned above could be effectively used to centralize the details of traversing the data structure into the immer library itself. If that's something you're interested in, I'm happy to try to sketch up a PR for it when I have time after this change gets merged. I guess what I'm saying though is that such a solution may not be particularly useful for any other relocating gc algorithms that come along and try to use this library. But I can definitely try to create some code later if you still think it's worth it.

@arximboldi
Copy link
Owner

Ok @dwightguth,

After all this discussion, I think I'm happy to attempt a merge of this and the other open PRs.

I think it would be good to make your code independent of implementation details by providing traversal or relocation algorithms as discussed within the library. Good not only for this, but also for the longevity of your code (implementation details may change in the future and make your code incompatible with upgrades of the library...).

So, if you want: I will merge this PR, and follow up with a small PR from me where I cleanup a couple of things (rename some of the new data structures, add some comments so people can reach you for maintenance of the relocation info, etc.). In exchange, you will make over the next few weeks a new PR where you sketch a generic relocate() function provided in Immer for both data-structures and iterators that allows you to stop using impl() or relocation_info() directly. Does this sound ok for you?

@dwightguth
Copy link
Contributor Author

I would like to at least try to sketch an update to this PR to encapsulate the relocation information in the iterator before getting this merged. I just don't know exactly when I will have time to return to it.

@arximboldi
Copy link
Owner

Awesome. Thanks!

@arximboldi
Copy link
Owner

No rush from my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants