Sorting out the object header, pinning, inflation, locks, and the GC contract #664
Replies: 6 comments 16 replies
-
I forgot to cover temporary pinning, where an object needs to be held at one address but only for the duration of some short operation. I believe that GraalVM fulfills this need by preventing safepoints but I could be wrong about that. Also worth noting that ad-hoc permanent object pinning likely requires some kind of safepoint coordination.
-
Thanks for doing the initial write-up on this. I'm going to quibble with a couple of the initial points to ensure we all have the same view of the foundation we're building on. I'll note that a lot of GC development has moved away from the idea of permanent generations as better algorithms have been developed that benefit from letting the GC have full(er) control, and that there's a lot of research (and practice!) in this area we can learn from.
I'll also mention that perm gen / initial heap doesn't equal lower footprint for multiple deployments on the same machine. Any sharing will be quickly broken by the first write back to that heap segment. We should discount any footprint rationale on that basis.
Allocating things in an immovable generation leads to fragmentation when the object dies and we can't compact the space. We should try to avoid relying on immovable gen as much as possible to avoid the dreaded The design @theresa-m and I had been talking about for this was to borrow from OpenJ9 and use a circular linked list of
This is typically old gen or tenure space in a generational GC. These objects don't move often unless there is sufficient memory pressure that an old-gen collection is required, in which case being able to compact the space is critical to make effective use of the reclaimed memory. Pinning complicates the GC's life. It's best to avoid it and use other tools instead, like Handles, which provide a level of indirection. Ripping pinning back out later is hard but usually necessary to get better GC efficiency. Do we want to base the design on extensive use of pinning, which will limit our ability to use better GC tech in the future? I'd rather avoid introducing it if we can (and experience shows we definitely can).
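To make the handle idea concrete, here is a minimal sketch in Java. It is not qbicc or OpenJ9 code: `HandleTableSketch`, its methods, and the use of plain `long` values as simulated addresses are all assumptions made for illustration. The point is only that outside code holds a stable slot index, so the collector can relocate the object by updating one slot instead of requiring the object to be pinned.

```java
import java.util.Arrays;

/** Hypothetical handle table; addresses are simulated as plain longs. */
final class HandleTableSketch {
    private long[] slots = new long[4];
    private int count;

    /** Registers an object's current address and returns a stable handle. */
    int create(long address) {
        if (count == slots.length) {
            slots = Arrays.copyOf(slots, count * 2); // grow the table on demand
        }
        slots[count] = address;
        return count++;
    }

    /** Resolves a handle to wherever the object currently lives. */
    long resolve(int handle) {
        return slots[handle];
    }

    /** Called by the (simulated) collector after it moves the object. */
    void relocate(int handle, long newAddress) {
        slots[handle] = newAddress;
    }
}
```

The cost is one extra load on every access through a handle, which is the indirection trade-off mentioned above.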
-
It's also worth re-reading the ObjectModel doc that @dgrove-oss put together as it lays out some of the design space and tradeoffs for locks and hashcodes: https://github.com/qbicc/qbicc/blob/main/docs/ObjectModel.adoc
-
One comment about inflation. As noted in the main description, inflation usually involves moving the object (to get the space one needs for the inflated object). That implies it needs to be done during a GC cycle, because you have to find and forward all of the pointers to the object so that they refer to the new location. So, inflation is a natural fit for dealing with identity hash codes (use the address of the object until the object is moved by GC, then inflate when copying to keep the original hash code value in the inflated form). It's a harder fit for locking, since the need for a lock isn't tied to the GC cycle so directly. That's why for locking I would tend to go for some variant of the lock nursery design (maybe not a great name, but it's what we called it 20 years ago). This avoids inline inflation for locking by making fairly accurate guesses (based on the presence of synchronized methods) about whether or not a type is likely to be locked frequently.
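The identity hash scheme described here can be sketched as a small simulation. Everything in it is an assumption for illustration: `SimObject`, the two header flags, and deriving the hash with `Long.hashCode` are stand-ins, and a real collector would grow the copied object rather than set a field.

```java
/** Simulation of address-based identity hashes that inflate on move. */
final class IdentityHashSketch {
    /** Stand-in for a heap object; address is a simulated location. */
    static final class SimObject {
        long address;
        boolean hashObserved; // header bit: someone asked for the hash
        boolean inflated;     // header bit: a saved-hash slot now exists
        int savedHash;        // only meaningful when inflated

        SimObject(long address) {
            this.address = address;
        }
    }

    /** Returns the saved hash if inflated, else a hash of the address. */
    static int identityHash(SimObject o) {
        if (o.inflated) {
            return o.savedHash;
        }
        o.hashObserved = true;
        return Long.hashCode(o.address);
    }

    /** Simulated GC copy: inflate only objects whose hash was observed. */
    static void moveDuringGc(SimObject o, long newAddress) {
        if (o.hashObserved && !o.inflated) {
            o.savedHash = Long.hashCode(o.address); // preserve the old value
            o.inflated = true;
        }
        o.address = newAddress;
    }
}
```

Objects that are never hashed never pay for the extra slot, and the inflation work happens at a point where the collector is already rewriting every pointer to the object.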
-
A note on POSIX thread mutexes. The |
-
These topics are intertwined. Here are the facts as I see them:
`Class` objects that are deserialized into an array that is indexed by type ID and thus they cannot change or be moved, and are GC roots.
`j.l.Thread` is the immediate example) to be directly allocated in an immovable generation.
`pthread_mutex_t`/`pthread_cond_t` pair requires 88 bytes of contiguous space). This implies that using POSIX mutexes "in line", while giving potentially good locality with fewer indirections, will consume significant memory unless inflated lazily. Additionally, neither POSIX mutexes nor conditions may safely be moved in memory (by specification). This means that the object would have to be pinned after inflation (and will require initially pinned allocations to be pre-inflated), and further prevents techniques such as creating the mutex and condition separately on demand.
`ReentrantLock` for the monitor and associated condition means that, using in-line object storage for the monitor, the allocated object itself would only need to accommodate one or two reference values (for just the lock or the lock+condition, respectively) which would be a maximum of between 8 and 16 bytes total depending on the platform and whether reference compression is in effect. Using inflation is simpler for this kind of solution than (for example) in-line POSIX mutexes, since the object can be safely moved at any time even if the lock has been inflated, and is "GC friendly" in that the inflated references can be treated as normal reachable objects. Out of line techniques (e.g. a map) don't have even these minimal storage requirements but generally require more indirection and may pose GC complications.
`Monitor` (final) class which implements both lock and condition behavior, along with in-line inflation, would have the benefits of the above `ReentrantLock` solution while only requiring one reference value. Using `ReentrantLock` to implement the functionality within that class would be a pragmatic short-term solution though would not necessarily be optimal in the long term. Another potentially superior solution which is easy enough to be done in the near term but may also be viable in the long term (depending on how OS-portable it is) would be to inline POSIX mutex and condition structures into the `Monitor` object as value fields (which btw should work today). This solution would require instances of the `Monitor` object to always be pinned upon allocation as mentioned above. Acquisition of the `Monitor` for a given object instance could be accomplished using a `getMonitor` VM helper method which would read the hidden field, taking into account object inflation possibly at a later date. This would allow the monitor bytecodes to be implemented as a pair of method calls (get monitor, call `lock`/`unlock`) and the `Object` `wait`/`notify` methods to be implemented in pure Java (get monitor, call `await`/`signal`).
If we use in-line inflation for both locks and IHC (which represents both a "best case" in that object size is potentially minimized as well as a "worst case" in that the GC needs awareness and we have to do some work to locate and/or create the extra fields), a potential approach presents itself.
Whereas naively placing extra data at the end of the object would solve the problem, the size of the object would have to be used as a part of the calculation as to where to find the extra data. It would be possible to avoid this issue by placing the inflated fields before the object in memory at negative offsets from the object's base pointer. If we only had one kind of inflated information, the calculation would be fairly simple: if the header bits indicate the data is not present, inflate the object; then, add a constant negative offset to the base pointer to get the correct data pointer.
With multiple data items that can be independently inflated, the offset computation becomes slightly more complex. Given a total order over the potential inflated data, the offset for a given item would depend (solely) on what other earlier items are present. The data should be sorted by size/alignment requirement in such a way as to minimize space wastage.
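The offset rule above can be sketched as follows. The slot sizes (8 bytes for a monitor reference, 4 bytes for the IHC), the header bit assignment, and all names are assumptions chosen for illustration, not a proposed layout. Slots are declared in their total order, larger first, so each one stays naturally aligned when the base pointer is 8-byte aligned.

```java
/** Negative-offset lookup for independently inflated header data. */
final class InflatedLayoutSketch {
    /** Inflatable slots in their total order, nearest the base first. */
    enum Slot {
        MONITOR(8),        // assumed size of one uncompressed reference
        IDENTITY_HASH(4);  // assumed size of the saved hash code

        final int size;
        final int presentBit;

        Slot(int size) {
            this.size = size;
            this.presentBit = 1 << ordinal(); // one header bit per slot
        }
    }

    /** Returns the (negative) offset of a present slot from the base pointer. */
    static int offsetOf(int headerBits, Slot wanted) {
        if ((headerBits & wanted.presentBit) == 0) {
            throw new IllegalStateException(wanted + " is not inflated");
        }
        int offset = 0;
        for (Slot s : Slot.values()) {
            if ((headerBits & s.presentBit) != 0) {
                offset -= s.size; // only slots that are present take space
            }
            if (s == wanted) {
                return offset;
            }
        }
        throw new AssertionError("unreachable: wanted slot not in order");
    }
}
```

With these assumed sizes, the IHC sits at offset -4 when it is the only inflated item and at -12 when the monitor is also present, while the monitor is always at -8. The object's own size never enters the calculation.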
Here are some visual representations I've cooked up just now. The images are not necessarily proportional. Word sizes are variable among platforms, and type ID size might also vary, which would alter the layout in various ways, so I've deliberately left off any kind of numerical offsets. But I think for purposes of illustration it should be pretty clear what is going on.
Uninflated object
Inflated object, just monitor
Inflated object, just IHC
Fully inflated object
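For the short-term `Monitor`-over-`ReentrantLock` option described earlier in this comment, a minimal sketch might look like the following. The `Holder` class and its `AtomicReference` field are purely hypothetical stand-ins for an object with a hidden, lazily inflated monitor field; in the VM, `getMonitor` would be a helper that reads the object header instead.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class MonitorSketch {
    /** One object providing both lock and condition behavior. */
    static final class Monitor {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition condition = lock.newCondition();

        void lock() { lock.lock(); }
        void unlock() { lock.unlock(); }
        void await() throws InterruptedException { condition.await(); }
        void signal() { condition.signal(); }
        void signalAll() { condition.signalAll(); }
        boolean isHeldByCurrentThread() { return lock.isHeldByCurrentThread(); }
    }

    /** Stand-in for any object; the field models the hidden monitor slot. */
    static final class Holder {
        private final AtomicReference<Monitor> hidden = new AtomicReference<>();

        /** Models the VM helper: inflate on first use, then reuse. */
        Monitor getMonitor() {
            Monitor m = hidden.get();
            if (m == null) {
                hidden.compareAndSet(null, new Monitor()); // only one racer wins
                m = hidden.get();
            }
            return m;
        }
    }
}
```

Under this sketch, `monitorenter`/`monitorexit` would map to `getMonitor().lock()` and `getMonitor().unlock()`, and `Object.wait`/`notify`/`notifyAll` to `await`/`signal`/`signalAll` on the same monitor.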