Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kill off-heap ObjectReference, or make a safer to_address method #1120

Open
wks opened this issue Apr 17, 2024 · 21 comments
Open

Kill off-heap ObjectReference, or make a safer to_address method #1120

wks opened this issue Apr 17, 2024 · 21 comments
Labels
P-normal Priority: Normal.

Comments

@wks
Copy link
Collaborator

wks commented Apr 17, 2024

TL;DR: We currently don't prevent ObjectReference to refer to off-heap objects, and we even have ActivePlan::vm_trace_object for off-heap objects. However, if we call ObjectReference(15).to_address(), and ObjectModel::to_address(object) returns object.to_raw_address - 16, it will overflow and have undefined behavior. That basically prevents us from doing anything useful that involve to_address (including accessing SFT or on-the-side metadata) without an extra overflow test. And we basically have to choose between killing off-heap ObjectReference and adding overflow check to every to_address call.

The problem

I found the problem when testing the function is_in_mmtk_space. The current signature and implementation is:

pub fn is_in_mmtk_spaces<VM: VMBinding>(object: ObjectReference) -> bool {
    use crate::mmtk::SFT_MAP;
    SFT_MAP
        .get_checked(object.to_address::<VM>())
        .is_in_space(object)
}

Suppose the value of object is 1, and the VM may implement ObjectModel::to_address(obj) as return obj - 16; (like what JikesRVM does). Then it will call ObjectReference(1).to_address(), and it will attempt to compute 1 - 16 which will cause an overflow. It will panic in the debug mode, or have undefined behavior in the release mode.

The cause

The cause is that the user may pass any value allowed by ObjectReference as the object argument. Currently it is usize, and will be changed to NonZeroUsize later. This is allowed because is_in_mmtk_spaces is mainly used for differentiating MMTk objects from off-heap objects, and is supposed to let the user check against anything allowed to be ObjectReference.

Our current definition of ObjectReference doesn't forbid encoding addresses of off-heap object as an ObjectReference. We don't define what a reference to an off-heap object looks like. 0 is obviously unlikely to be a valid reference, but we don't prevent the VM from using small integers like 1, 2, 3, 4, ..., 15, ... to represent a reference to an off-heap object. But ObjectReference::to_address(self) -> Address also doesn't require self to be an MMTk object, either. So it is possible that someone may call ObjectReference(small_integer).to_address().

A deeper problem

That leads to a deeper problem: How many places in MMTk-core assumes ObjectReference refers to an MMTk object?

One obvious problem is the SFT. The to_address method is originally introduced to help accessing on-the-side metadata and accessing the SFT. That's why to_address returns an address guaranteed to be within the allocated range of the object. If an ObjectReference refers to an off-heap object, that will not make sense. But in order to access the SFT, we always call objref.to_address() first. This means we can't do anything with SFT if object is off-heap. That include basic things, such as calling trace_object(object), even though we defined ActivePlan::vm_trace_object for tracing off-heap objects. Currently, SFTProcessEdges never calls ActivePlan::vm_trace_object. EmptySpaceSFT::sft_trace_object panics immediately.

Solution

There are several solutions;

Require that ObjectReference always refers to MMTk object

This immediately kills ActivePlan::vm_trace_object and everything supporting that. Currently no VM uses ActivePlan::vm_trace_object. Despite that, we need to think carefully when making this decision.

Make a safer ObjectModel::to_address

Make an ObjectModel::to_address_safe(object: ObjectReference) -> Option<Address> so that if it ever attempts to execute something like 1 - 16, it returns None instead of having undefined behavior. usize::checked_sub can help with this, but all users now require a check against None, including sft_trace_object which, when a VM does not have off-heap object, should always see Some(addr).

@qinsoon
Copy link
Member

qinsoon commented Apr 17, 2024

The example does not explain why this is a problem. We allow off-heap ObjectReference, but it still needs to point to a valid object. If a VM uses 16 bytes header, ObjectReference(1) or ObjectReference(15) is NOT a valid object reference for the VM.

@qinsoon
Copy link
Member

qinsoon commented Apr 17, 2024

Currently no VM uses ActivePlan::vm_trace_object.

To what I know, Julia and GHC use vm_trace_object.

@wks
Copy link
Collaborator Author

wks commented Apr 17, 2024

The example does not explain why this is a problem. We allow off-heap ObjectReference, but it still needs to point to a valid object. If a VM uses 16 bytes header, ObjectReference(1) or ObjectReference(15) is NOT a valid object reference for the VM.

The problem in the current is_in_mmtk_space(object) is that it does not require object to be a valid object reference.

Currently, bool MMTkHeap::is_in(const void* p) is implemented with is_in_mmtk_heap.

  // Returns "TRUE" iff "p" points into the committed areas of the heap.
  // This method can be expensive so avoid using it in performance critical
  // code.
  virtual bool is_in(const void* p) const = 0;

So it doesn't guarantee p to be a valid object reference.

One possible solution to is_in_mmtk_space alone is requiring its argument to be a valid reference. We can change its contract (the documentation) that (1) object must be a valid object reference, and (2) if a VM allows off-heap objects, then object.to_address() must always succeed as long as object is valid, whatever the definition "valid" is for off-heap objects. In this way, the VM will either make ObjectReference(15) invalid, or do not do object - 16 in to_address. Then we'll probably need another interface for debug purpose which works for all inputs, valid or invalid. But that'll probably be vaguely defined, too.

Other public interfaces that involve ObjectReference all require the ObjectReference to be valid.

@qinsoon
Copy link
Member

qinsoon commented Apr 17, 2024

The problem in the current is_in_mmtk_space(object) is that it does not require object to be a valid object reference.

ObjectReference has to be a valid object. This function is only used to check if an object is in MMTk spaces or not, not an arbitrary address. I would say OpenJDK use it in a wrong way.

@playX18
Copy link

playX18 commented Apr 18, 2024

This immediately kills ActivePlan::vm_trace_object and everything supporting that. Currently no VM uses ActivePlan::vm_trace_object.

I am using this feature for data section in bytecode. Some objects are in this static data section which is mmaped to memory allocated by VM rather than MMTK

@wks
Copy link
Collaborator Author

wks commented Apr 18, 2024

OK. We can't just remove ActivePlan::vm_trace_object because many bindings are using it.

The cleanest solution seems to be requiring all ObjectReference to be valid, including is_in_mmtk_space. But we will then need to create another function for OpenJDK's MMTkHeap::is_in, and should have a closer look at JikesRVM's requirement for isMappedObject.

@k-sareen
Copy link
Collaborator

k-sareen commented Apr 18, 2024

I kinda disagree with this whole conversation. I believe that it is actually is_in_mmtk_spaces that has an incorrect API not ObjectReference or OpenJDK etc. The API should be similar to is_mmtk_object, i.e. it should take an arbitrary address. ART uses it, for example, to check if an arbitrary address can be a valid object and also to distinguish between objects in boot image vs objects in MMTk. We do not always want to use heavy-weight mechanisms such as the VO-bit to query such fundamental things.

Going by the doc comment for the function, nothing there mentions that it should be a valid object and neither does the SFT (which does the actual check) require it to actually be an object. This function should simply be to check if an arbitrary address can potentially be allocated by MMTk. Hence, a more appropriate name would be is_address_in_heap_space.

/// Return true if the `object` lies in a region of memory where
/// - only MMTk can allocate into, or
/// - only MMTk's delegated memory allocator (such as a malloc implementation) can allocate into
/// for allocation requests from MMTk.
/// Return false otherwise. This function never panics.

EDIT: Re: What happens if ObjectReference is a handle or something equivalent so checking an arbitrary address does not help in querying if it is in the heap space. Look, my opinion/stance is that not all functions can be used by all bindings. If someone has a wacky way of implementing ObjectReference, then they should not use this function. The fact of the matter is that most bindings will be using schemes where Address and ObjectReference have a "sensible" relationship, so we should still support this important use-case. Not allowing bindings to query if an arbitrary address can potentially be allocated by MMTk because of niche ObjectReference implementations is not ideal.

@wks
Copy link
Collaborator Author

wks commented Apr 18, 2024

I think is_mmtk_object has a similar problem. The current implementation works only if object.to_address() == object.to_raw_address(). It is true for Ruby, Julia, OpenJDK, V8, but not JikesRVM. But luckily JikesRVM doesn't use it.

Excerpt from memory_model.rs:

pub fn is_mmtk_object(addr: Address) -> bool {
    use crate::mmtk::SFT_MAP;
    SFT_MAP.get_checked(addr).is_mmtk_object(addr)
}

SFT_MAP.get_checked(addr) is a misuse. It should be SFT_MAP.get_checked(ObjectReference::from_raw_address(addr).to_address()). However we shouldn't convert an Address to ObjectReference before we confirm that it is valid.

Maybe neither is_mmtk_object nor is_in_mmtk_space would work if object.to_address() == object.to_raw_address() doesn't always hold.

From the historical point of view, we had is_mmtk_object (introduced in March 2022) before we start to distinguish ObjectReference::to_address from ObjectReference::to_raw_address (since November 2022).

I believe that it is actually is_in_mmtk_spaces that has an incorrect API... The API should be similar to is_mmtk_object, i.e. it should take an arbitrary address.

The problem is, if objref.to_address() < objref.to_raw_address(), the raw address you find from the stack (an arbitrary word) can be several bytes after the region of an MMTk space, but the object is actually inside the space. That had caused some problems with JikesRVM. Before we had to_raw_address, we implemented a mechanism that refuses to allocate the last few bytes of a block so that the ObjectReference would remain in the same block. See: #555

There is no general way to find an address that is guaranteed to be inside the object given an ObjectReference. That's why we let the VM implement ObjectModel::ref_to_address(object: ObjectReference) -> Address. But that assumes object: ObjectReference is valid.

To implement a checking method (such as is_mmtk_object(addr: Address) -> bool or is_address_in_mmtk_heap_space(addr: Address), we need a "safe" method such as

trait ObjectModel {
    fn get_address_inside_object_if_it_were_objref(maybe_objref: Address) -> Address;
}

The semantics is:

  • If ObjectReference::from_raw_address(maybe_objref) happens to be valid, it shall always return Some(addr) where addr == ObjectReference::from_raw_address(maybe_objref).to_address().
  • Otherwise, it shall return addr2 where addr2 must be different from any objref.to_address() where objref is a valid object reference. In other words, if maybe_objref1 is valid but maybe_objref2 is invalid, they must not map to the same address.

In this way, return maybe_objref.wrapping_sub(16) may be a valid implementation because 15 maps to 0xffffffffffffffff, and no valid object reference maps to 0xffffffffffffffff.

Alternatively, we change is_mmtk_object to

fn is_mmtk_object(maybe_objref: Address, maybe_in_object: Address) -> bool;

where maybe_in_object has the same value as get_address_inside_object_if_it_were_objref(maybe_objref), except we tell the user not to call is_mmtk_object if the appropriate maybe_in_object cannot be computed (for example, maybe_objref == 15 && maybe_in_object == 15 - 16).

@k-sareen
Copy link
Collaborator

k-sareen commented Apr 18, 2024

I'm not sure I follow. is_mmtk_object is using the VO-bit. If the VO-bit is not set, then it is not an object. If it is set, then it is an object and can be safely converted to an ObjectReference (using the ObjectModel::address_to_ref function). There is no concern about object.to_address() == object.to_raw_address() since the VO-bit will only be set for object.to_address(). That is not what the contract is for is_in_mmtk_spaces (i.e. is_address_in_heap_space). The contract for that is a rough/quick yes/no of is this address possibly allocated by MMTk.

@wks
Copy link
Collaborator Author

wks commented Apr 18, 2024

... If it is set, then it is an object and can be safely converted to an ObjectReference. There is no concern about object.to_address() == object.to_raw_address() since the VO-bit will only be set for object.to_address(). ...

For example, in a VM, object.to_address() == object.to_raw_address() - 16. If a VO bit is set at address 0x20000008, it means ObjectReference::from_raw_address(0x20000018) is a valid reference (note that the addresses are not the same). In this case, is_mmtk_object(0x20000018) shall be true.

... If it is set, then it is an object and can be safely converted to an ObjectReference (using the ObjectModel::address_to_ref function).

No. Not ObjectModel::address_to_ref, but ObjectReference::from_raw_address. The VM stores raw ObjectReference values on the stack. When scanning the stack conservatively, we get the raw addresses.

Using the example above, if we have an objref ObjectReference::from_raw_address(0x20000018), the value 0x20000018 will appear on the stack. Then the stack scanner will call is_mmtk_object(0x20000018). Then is_mmtk_object will check if VO bit is set at 0x20000008 (Careful! This is different!). If it is, it means ObjectReference::from_raw_address(0x20000018) is valid.

The problem is, the VM could potentially call is_mmtk_object(8). The VM should have filtered it out using address ranges, but the current API says is_mmtk_object(addr) accepts any addr. In this case, somewhere in is_mmtk_object(8) will try to compute 8 - 16 (because it wants to check if VO bit is set at 8 - 16) and that's an overflow.

@k-sareen
Copy link
Collaborator

k-sareen commented Apr 18, 2024

In your example, you can have two ways of implementing the object model. One is the internal address of an ObjectReference is the start of the header. And the other is the internal address is the start of the cell (i.e. the part of the allocation the VM actually uses for storing fields).

       cell
        v
    +---+--------+
    +---+--------+
    ^
  header

In the case of the latter, there is no issue as the thing stored on the stack is the cell and everything is happy. The case of the former is where you get the issues you mentioned.

Also I feel like we always end up having discussions about ObjectReference. There are quite a few issues on GitHub and threads on Zulip. In particular we seem to have discussed this exact issue (at least in the context of is_mmtk_object) on Zulip and I gave a suggestion here.

@wks
Copy link
Collaborator Author

wks commented Apr 18, 2024

Actually my example was like this:

0       8       16
+-------+
+-------+
^               ^
header          objref

That is, objref points at an address after the last byte of an object. In that case, header must be inside the object, and therefore must be in the same block as the object; but objref may be in the next block, and the side metadata may not have been allocated (and it may not even be in the MMTk heap).

I remember that conversation now. The one on zulip talked about exactly this problem. I think your suggestion here was a good one, and now I am seriously considering it.

MMTk may present the interface of is_mmtk_object like this:

type MaybeObjectReference = Address;

fn is_mmtk_object(possible_object: MaybeObjectReference) -> bool;

I am just using MaybeObjectReference for clearness. The final interface may simply use Address instead because any Address can be converted to MaybeObjectReference, as there is no validity requirements for MaybeObjectReference.

But MaybeObjectReference may not be in the same block as the object. Therefore we can't set VO bit at MaybeObjectReference. That's what ObjectReference.to_address (equivalently ObjectModel::ref_to_address) was introduced for. The address returned by to_address is guaranteed to be safe for setting side metadata.

But ObjectReference::to_address requires the ObjectReference to be valid in the first place. I think that was also Yi's concern here:

My concern is: do we allow calling ref_to_address() with an invalid object reference? It seems we have to allow that. Otherwise, we cannot check valid object bit for an arbitrary address.

We must have a version of ref_to_address that allows the input to be invalid. We can add it, and it was the get_address_inside_object_if_it_were_objref I mentioned above.

In the Zulip conversation, I was arguing that the offset must be constant. But after a second thought, at least for implementing is_mmtk_object, it doesn't have to. It only has to satisfy

  1. For all possible_object where ObjectReference.from_raw_address(possible_object) is valid, the return value must be the same as ObjectReference.from_raw_address(possible_object).to_address().
  2. For all possible_object where ObjectReference.from_raw_address(possible_object) is invalid, the return value must not be the same as the return value for any valid references. In other words, a valid possible_object and an invalid possible_object never map to the same address.

Because VO bits are set exactly where valid objects are, as long as invalid possible_object values map to invalid addresses, it will find VO bit not to be set. And one way to implement this mapping is addr.as_usize().wrapping_sub(16). It silently maps 15 to 0xffffffffffffffff and there is no SFT entry for it. And we don't enforce any "constant offset" thing.

I don't know what we should do for is_in_mmtk_space, as the purpose is not clear. If we only need to distinguish between valid in-heap ObjectReference and valid off-heap ObjectReference (by whatever definition a "valid off-heap ObjectReference" is), we can require the VM binding to implement ObjectModel::ref_to_address so that it never panicks when given a "valid off-heap ObjectReference".

But if we need something like OpenJDK's CollectedHeap::is_in, perhaps we can ignore ObjectReference altogether because is_in queries address range, not object references. But internally, I see OpenJDK passes oop to is_in, but OpenJDK's oop always points inside an object.

@k-sareen
Copy link
Collaborator

I'm not sure why simply adding bounds check wont solve the issue you have?

@qinsoon
Copy link
Member

qinsoon commented Apr 18, 2024

Adding another version of ref_to_address() for potential object reference sounds reasonable. But we already state that ref_to_address may have an invalid object reference.

/// Note that MMTk may forge an arbitrary address
/// directly into a potential object reference, and call this method on the 'object reference'.
/// In that case, the argument `object` may not be a valid object reference,
/// and the implementation of this method should not use any object metadata.
///
/// MMTk uses this method more frequently than [`crate::vm::ObjectModel::ref_to_object_start`].
///
/// Arguments:
/// * `object`: The object to be queried. It should not be null.
fn ref_to_address(object: ObjectReference) -> Address;

If I understand right, there are two problems with this issue:

  1. What if the address is low and address calculation may overflow?
  2. Can we call ref_to_address() with an invalid object reference?

The answer to 1 is simple: we can just filter out any obviously low address (e.g. anything below 1K cannot be an object). We have two choices for 2: either have a different ref_to_address that allows potentially invalid object reference, or we filter low addresses and use the same ref_to_address with the document saying the object ref could be invalid.

One thing that can help us to decide is whether a binding would implement the two ref_to_address methods differently, except for one more bound check.

@wks
Copy link
Collaborator Author

wks commented Apr 19, 2024

I'm not sure why simply adding bounds check wont solve the issue you have?

(1) Because ref_to_address is called very often, even in the tracing loop. Therefore I hesitate to add a bounds check unless necessary.

(2) Even with a bounds check, what should it return if it finds 15 - 16 will be out of bound? There is no obvious answer.

@wks
Copy link
Collaborator Author

wks commented Apr 19, 2024

Adding another version of ref_to_address() for potential object reference sounds reasonable. But we already state that ref_to_address may have an invalid object reference.

Yes. The doc comment you quoted already considered this possibility. But the semantics is under-specified.

  1. What should ref_to_address(object) return if object is off-heap, but is "valid" as defined by the specific VM binding? Then what does "Return an address guaranteed to be inside the storage associated with an object" mean in this case?
  2. What should ref_to_address(object) return if object is not valid at all?

Most operations, such as trace_object(object), always require object to be valid, even off-heap. Some operations, such as is_reachable(object), requires object to be an MMTk object. Some, such as is_mmtk_object(addr), may forge addr directly into an ObjectReference via ObjectReference::from_raw_address.

The answer to 1 is simple: we can just filter out any obviously low address (e.g. anything below 1K cannot be an object).
... or we filter low addresses and use the same ref_to_address with the document saying the object ref could be invalid.

Yes. When the user calls is_mmtk_object, it should always do filtering first, filtering out addresses that are too low, too high, or misaligned. Currently is_mmtk_object(addr) requires addr to be aligned, or it will give wrong answer (due to the granularity of the VO bits). We can further require the addr not to be too low or too high. We probably can't say that directly because it is vague what "too high" or "too low" means. We can explicitly tell the user that is_mmtk_object uses ObjectModel::ref_to_address, and require that "When calling is_mmtk_object(addr), the expression ObjectReference::from_raw_address(addr).to_address() must not panic", and also require invalid addresses are never mapped to the same address as valid ones (or it will erroneously identify an invalid obj ref as a valid one). Anyway, the statement that "is_mmtk_object(addr) will never panic" is no longer true. I think that's still OK.

We have two choices for 2: either have a different ref_to_address that allows potentially invalid object reference, ...

As I discussed before, this is also possible.

... whether a binding would implement the two ref_to_address methods differently, except for one more bound check.

And I also noticed that the doc comment of ObjectModel::ref_to_address does require the return value to be a constant offset from the input.

/// ...
/// ... For a given object, the returned address
/// should be a constant offset from the object reference address.
/// ...
fn ref_to_address(object: ObjectReference) -> Address;

Due to this requirement, there is only one way to implement this function, as long as not out of bound.

I can't think of a VM that doesn't require object.to_raw_address() to be a constant offset from any address that must be inside the object. Even the SableVM satisfies this (just return object.to_raw_address()). We may go back to the "what is ObjectReference" debate where there are more possibilities, but I have shown that "compressed pointers", "fat pointers", "tagged pointers" and "handles" don't need to be presented to MMTk. They can be converted in slot.load(), and back at slot.store().

I don't know which one is better

  1. Letting the VM implementing another safe_ref_to_address keeps the original ref_to_address clean semantically. In the debug mode, it will panic if an address is too low. But the VM binding needs to implement two functions, complicating the API.
  2. It is more convenient to require that ref_to_address never panics. Just implement it with wrapping_sub. And it's efficient, too. But it feels strange that it allows invalid inputs.
  3. We can also let the user do the filtering before calling ref_to_address or any functions that depend on it.

@k-sareen
Copy link
Collaborator

The bounds check I'm describing is for the is_mmtk_object and is_in_mmtk_spaces function(s). Not for ref_to_address.

@wks
Copy link
Collaborator Author

wks commented Apr 19, 2024

The bounds check I'm describing is for the is_mmtk_object and is_in_mmtk_spaces function(s). Not for ref_to_address.

Well, in this case bounds check should help. But since MMTk doesn't know the offset, we can offload the responsibility of bounds check to the user. It's the same as "the caller must ensure ObjectReference::from_raw_address(addr).to_address() never fails", and the VM binding ensures this by doing bounds check.

@k-sareen
Copy link
Collaborator

The point is that if is_mmtk_object or is_in_mmtk_spaces fails then you cannot convert that address to a valid object reference. I'm not sure what you mean by "since MMTk doesn't know the offset"? I think this conversation is getting a bit confused. My opinion/stance is that the API for is_mmtk_object and is_in_mmtk_spaces should be lenient in its input since asking if an arbitrary address is potentially (and in the case of is_mmtk_object: definitely) allocated by MMTk is a simple question that most, if not all, VMs need.

Re: How the VM is coming up with these addresses is a slightly different question, i.e. if the VM finds an address on the stack while conservatively stack scanning. I don't have a great answer to this question. MMTk can't magically make inefficient conversions between Address and ObjectReference more efficient. If the VM has a wacky way to implement ObjectReference then they need to pay the cost some way.

@wks
Copy link
Collaborator Author

wks commented Apr 19, 2024

... asking if an arbitrary address is potentially (and in the case of is_mmtk_object: definitely) allocated by MMTk is a simple question that most, if not all, VMs need.

The question is simple, but the answer is not. The capability for MMTk to answer the question is limited by the implementation.

  • Because objref.to_address() may not be the same as objref.to_raw_address(), we can't give any value of addr to is_mmtk_object(addr) since it will need to compute ObjectReference::from_raw_address(addr).to_address() to query the VO bit. In other words, if we give it a value that is too low, it will crash. And is_mmtk_object(addr) cannot know if addr is "too low" That's what I mean by "MMTk doesn't know the offset". If the offset is 0, addr will never be too low. If the offset is 0x30000000, then even 0x2ffffff8 can be "too low".
  • Because the granularity of VO bit is one bit per minimum object alignment (usually a word), if a VO bit is set for address 0x20000000, it won't tell which one of 0x20000000, 0x20000001, 0x20000002 and 0x20000003 is a valid obj ref. That's why the caller of is_mmtk_object must check the alignment first, in a VM-specific way.

That's basically what makes this "simple" question difficult to answer.

And the VM must help in answering the question by implementing get_address_inside_object_if_it_were_objref to map an Address (or more precisely a MaybeObjectReference) to another Address where the VO bit is set. The filtering before calling is_mmtk_object, the is_mmtk_object function itself, and the get_address_inside_object_if_it_were_objref function, when worked together, must not panic when given any input. We may have multiple choices.

  • If we require is_mmtk_object(addr) can accept any address without panicking, then get_address_inside_object_if_it_were_objref(addr) must tolerate invalid `addr;
    • If we further require is_mmtk_object(addr) to always give the right answer, we may further require the VM binding to implement a function filter_addr(addr) which is_mmtk_object(addr) will call.
  • If we require the user to filter addr and remove addresses that are too low, too high, or misaligned, before calling is_mmtk_object(addr), then we can loosen the requirement of get_address_inside_object_if_it_were_objref(addr) so that it may panic or have undefined behavior (e.g. because of overflow) when given some addr. We can even use ObjectReference::from_raw_address(addr).to_address() directly if you don't mind converting a potentially invalid addr to ObjectReference using from_raw_address.

@k-sareen
Copy link
Collaborator

My point is that if the question is simple then the API should be simple as well. Yes, the answer may not be simple, which is fine. We need to make the contract clearer in such a case. But a complex API will only increase confusion instead. get_address_inside_object_if_it_were_objref is arguably not great API design since it's confusing.

Because objref.to_address() may not be the same as objref.to_raw_address(), we can't give any value of addr to is_mmtk_object(addr) since it will need to compute ObjectReference::from_raw_address(addr).to_address() to query the VO bit. In other words, if we give it a value that is too low, it will crash. And is_mmtk_object(addr) cannot know if addr is "too low" That's what I mean by "MMTk doesn't know the offset". If the offset is 0, addr will never be too low. If the offset is 0x30000000, then even 0x2ffffff8 can be "too low".

Right I see.

Because the granularity of VO bit is one bit per minimum object alignment (usually a word), if a VO bit is set for address 0x20000000, it won't tell which one of 0x20000000, 0x20000001, 0x20000002 and 0x20000003 is a valid obj ref. That's why the caller of is_mmtk_object must check the alignment first, in a VM-specific way.

This is an expectation we have from the users of the API. Only aligned pointers should be queried. Currently we don't have VM-specific configurable OBJECT_ALIGNMENT or MIN_OBJECT_SIZE but once we do, we can assert inside our code as well instead of relying on the VM-binding developer to check.

@udesou udesou added the P-normal Priority: Normal. label May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-normal Priority: Normal.
Projects
None yet
Development

No branches or pull requests

5 participants