Should we rename the `Edge` trait? #687

wks · 2022-10-24T07:30:37Z

I am not planning to rename it now. I would like to discuss two things before we change the code.

What is the proper name for that trait currently named "Edge"?
Is that trait the right abstraction the API should provide?

The Edge trait

The Edge trait was introduced in #606

The purpose is to support different VMs that store object references in fields differently, giving the VM a place to customise. It is defined as:

pub trait Edge: Copy + Send + Debug + PartialEq + Eq + Hash {
    /// Load object reference from the edge.
    fn load(&self) -> ObjectReference;

    /// Store the object reference `object` into the edge.
    fn store(&self, object: ObjectReference);

    /// Prefetch the edge so that a subsequent `load` will be faster.
    #[inline(always)]
    fn prefetch_load(&self) {
        // no-op by default
    }

    /// Prefetch the edge so that a subsequent `store` will be faster.
    #[inline(always)]
    fn prefetch_store(&self) {
        // no-op by default
    }
}

An edge can be anything that supports the following operations:

load: load an object reference from it. Used to support any GC.
store: store an updated object reference back to it. Used to support copying GC.
prefetching: as a way of optimisation.

And it needs to support Copy + Send in order to be transferred between work packets, and Debug + PartialEq + Eq + Hash to support sanity GC.

So an implementation of the Edge trait could be

a field that holds an ordinary (uncompressed, unoffsetted, untagged) pointer,
a field that holds compressed pointer (so it can re-compress it when stored back),
a field that holds pointer with an offset (so it can re-apply the offset when stored back),
a field that holds tagged pointer (so it can restore the tag bits when stored back),
a local variable on the stack,
a memory location considered as global root,
an entry of a HashSet (when updated, needs to be stored to a different bucket)
etc.

A ProcessEdgesWork should hold a list of Edge instances, and process each of them. I am open to the idea of specialising work packets for each different kind of edges. I do not know whether it is more efficient to use enum to make a union type of different kinds of edges in a VM (such as OpenJDK) and put all of them into a Vec<OpenJDKEdge>, or scatter different edges into different work packets.

What's the right name?

But recently, debates around the meaning of "object reference" let us rethink whether "edge" is the proper name of this thing.

The GC Handbook does not define "edge" in its glossary, but mentioned "edge" when defining object graph. An edge (an object graph) is "a reference from a source node or a root to a destination node".

The GC Handbook defines "field" as "a component of an object holding a reference or scalar value", and "slot" as a synonym of "field".

According to these definitions, what I defined as the trait Edge satisfies the definition of "edge" of the GC handbook. But a trait Edge instance does not have to be a field or slot, because it may be used to represent a local variable root from the stack or a global root.

Do we need a `Slot` or `Field` trait?

Recently, @wenyuzhao mentioned the need to get the address of a field in order to implement field-remembering barriers. If a trait Edge instance is always a field, then it is easy to just add a get_address() method so the barrier can use that to access the metadata.

However, a trait Edge instance doesn't have to be a field, and the field-remembering barrier doesn't make sense for roots. So although we can let Edge::get_address() return an address on the stack, it still looks out of place.

Currently (even before I introduced the Edge trait), the ProcessEdgesWork work packet does not distinguish between object field edges and root edges. Before the introduction of the Edge trait, ProcessEdgesWork holds a Vec<Address> where the Address can point to anywhere, including object fields, stack variables and global variables. With trait Edge introduced, it is now a Vec<VM::VMEdgeType>, but still doesn't differentiate root edges and field edges. I am not sure if that is a problem if we merge LXR into MMTk core. It looks like as long as we have the Edge::load(), Edge::store() and Edge::prefetch_xxx() methods, we can update any edge for copying GC, regardless of whether those edges are from roots or from fields.

Correction: Currently, the boolean field ProcessEdgesBase::root is set to true if the current work packet holds root edges instead of field edges.

The text was updated successfully, but these errors were encountered:

wks · 2023-10-18T03:48:03Z

In today's meeting, we think Slot is a better name because "Edge" may refer to the content of the slot, while "slot" unambiguously refers to the address of it.

lnihlen · 2024-04-15T10:53:24Z

Implementor feedback here, in case it's helpful.

I'm learning MMTk while implementing support for my rust implementation of the SuperCollider interpreter. To me I use the name Slot internally to reference the type-tagged pointer data structure that holds runtime dynamic type+data information. This is also how the current SuperCollider interpreter names these things.

I didn't have an issue with Edge so much as I found the documentation to be misleading, thinking that I needed to extend my Slot implementation in order to support the Edge trait. So the documentation lead me to believe that Edge is a container for a value which might possibly contain a pointer. Rather, after some questions on Zulip, I learned that it is a pointer to a value which might possibly contain a pointer.

Which leads me to my question - is there a situation where the structure implementing Edge would need to contain anything other than an Address? For me, even just reading in the documentation that the struct is usually just an Address would likely have been enough.
Or one could make these free functions, perhaps in a larger trait, that accept an Address and return an ObjectReference.

Thanks, I think MMTk is really neat!

wks · 2024-04-15T11:43:13Z

is there a situation where the structure implementing Edge would need to contain anything other than an Address?

One use case is a situation where an object contains an array, and a slot holds a pointer to an element of that array, and the index of the element it points to is only known at run time instead of compile time. For example:

struct Array {
    Header header;
    size_t length;
    int32_t elements[];
};

struct Foo {
    int blahblah;
    size_t index;
    void* pointer;
};

void someFunction() {
    Array *ary = array_new(10000);
    Foo *someObject = foo_new(...);
    size_t index = read_number();
    someObject->index = index;
    someObject->pointer = &ary->elements[index];
}

In the code above, someObject->pointer points to the middle of an Array object. It is insufficient to mark the Array instance using this pointer because the object metadata is in the header. We know the index is stored in the someObject->index field, but we don't know it at compile time.

In that case, an implementation of Edge will contain two fields:

The pointer to the element, and
The offset from the beginning of the object to the field

To represent the field someObject->pointer, we create such an Edge instance with the pointer as &someObject->pointer, and the offset being offsetof(struct Array, elements) + someObject->index * sizeof(int32_t). The VM binding gives such an Edge to mmtk-core. Then mmtk-core can reconstruct the address to the header by loading from the pointer and subtracting the offset. It can do the usual marking, forwarding, etc. If mmtk-core moved the Array, it can inform the VM binding so that the VM binding can implement Edge::store by adding the offset to the address of the moved object and storing it back to the pointer. In this way, someObject->pointer still points to the same element even after the Array instance has been moved.

lnihlen · 2024-04-15T12:33:05Z

I appreciate the explanation, thanks very much!

This commit changes the term "edge" to "slot" where it actually refers to a field of an object or a variable on the stack or in any global data structures. Notably, the trait `Edge` is renamed to `Slot`, and related types and methods are renamed, too. We still use the term "edge" to refer to edges in the object graph, regardless whether the edge is represented by a slot or by an object reference pointing to the target. The work packet trait `ProcessEdgesWork` and its implementations still use the term "edge". Although it was originally designed to process slots, it can be used (and is currently actually used) in node-enqueuing tracing as well as weak reference processing, in which case it is used as a provider of the `trace_object` method which traces object graph edges represented as `ObjectReference` to the target objects. Note: This commit only does renaming, and does not change the program logic. The behavior of the program should be exactly the same after this change. Fixes: #687

udesou added the P-high Priority: High. A high-priority issue should be fixed as soon as possible. label Nov 13, 2023

wks mentioned this issue May 13, 2024

Rename edge to slot #1134

Merged

wks closed this as completed in #1134 May 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should we rename the `Edge` trait? #687

Should we rename the `Edge` trait? #687

wks commented Oct 24, 2022 •

edited

wks commented Oct 18, 2023

lnihlen commented Apr 15, 2024

wks commented Apr 15, 2024

lnihlen commented Apr 15, 2024

Should we rename the Edge trait? #687

Should we rename the Edge trait? #687

Comments

wks commented Oct 24, 2022 • edited

The Edge trait

What's the right name?

Do we need a Slot or Field trait?

wks commented Oct 18, 2023

lnihlen commented Apr 15, 2024

wks commented Apr 15, 2024

lnihlen commented Apr 15, 2024

Should we rename the `Edge` trait? #687

Should we rename the `Edge` trait? #687

wks commented Oct 24, 2022 •

edited

Do we need a `Slot` or `Field` trait?