
Two step space creation #1004

Open

wks wants to merge 16 commits into master
Conversation

@wks wks (Collaborator) commented Oct 30, 2023

We let plans specify the requirements of all spaces first, and then compute the start and extent of all spaces at the same time. This way, when a space (including a discontiguous space) is created, its start and extent are already known, and we no longer need to mutate the space later. This removes the global vec of unsafe mutable references to FreeListPageResource in Map32.

Concretely, the changes include:

  1. Splitting VMRequest into two structures, VMRequest and VMResponse (see the sketch after this list).
    • VMRequest specifies the requirements, like the old VMRequest, but the Discontiguous variant is changed to Unrestricted. The plan no longer requests a "discontiguous" space that is later changed to "contiguous" on 64-bit machines.
    • VMResponse represents the placement decision (contiguousness, start and extent). If the VMRequest is "Unrestricted", the VMResponse may be contiguous or discontiguous.
  2. The procedure of creating spaces is split into a specification phase and a creation phase.
    • The plan first specifies all spaces, providing their requirements.
    • The HeapMeta type determines the contiguousness and address range of each space.
    • The plan then creates contiguous or discontiguous spaces in those ranges.
  3. The shared_fl_map: Vec<Option<NonNull<CommonFreeListPageResource>>> field in Map32 is removed. That global map recorded all FreeListPageResource instances so that their start addresses could be fixed up once the start address of the discontiguous range was computed, after the spaces had been created. Since we now know the discontiguous range before creating any space, we no longer need that vec.
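
To make the shape of the split concrete, here is a rough sketch (the names follow the description above, but the Extent variant and the Address stand-in are illustrative, and the actual definitions in this PR may differ):

// Stand-in for mmtk's Address type; for illustration only.
type Address = usize;

// What the plan requests. No placement has been decided yet.
pub enum VMRequest {
    Unrestricted,                        // may end up contiguous or discontiguous
    Extent { extent: usize, top: bool }, // a contiguous space of a fixed size
}

// What HeapMeta decides for each space.
pub enum VMResponse {
    Contiguous { start: Address, extent: usize },
    Discontiguous, // placed in the shared discontiguous range
}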

Related issues and PRs

This PR is part of the greater goal of removing unsafe operations related to page resources and VMMap, described in #853.

An older PR #953 also removes the unsafe Map32::shared_fl_map, but it does so by letting plans provide a method for_each_space_mut to mutate spaces after creation. It doesn't fix the root of the problem because it still mutates spaces after creation.

Things not fixed in this PR

Map64 also holds global mutable references to FreeListPageResource and mutates their start addresses, but for a completely different reason (see #853). Since fixing that problem will also require non-trivial changes, I'll postpone it to another PR, possibly a shrunk-down version of #953 that focuses on Map64 alone.

This PR does not remove the UnsafeCell<Map32Inner> in Map32. It will be removed later.

@wks wks requested review from qinsoon and wenyuzhao October 30, 2023 07:54
@wks wks (Collaborator, Author) commented Oct 30, 2023

Should we rename HeapMeta?

I reused the struct of the same name, and extended its responsibility. The HeapMeta type did not exist in JikesRVM. In JikesRVM, heapCursor and heapLimit were static fields of the abstract class Space. HeapMeta was introduced in 32158cd so that heap_cursor and heap_limit can be shared between spaces.

I think we can rename this type to more accurately describe its role. I am thinking about "SpacePlacer", because that is what it does: deciding the placement of spaces.

And there is "SpaceMeta"

So the placement decision for each space is represented by a "SpaceMeta". I just named it after HeapMeta. Maybe there is a better name.

Space "allocation" or "placement"?

I kept using the phrase "space placement" to refer to the action of deciding the start and extent of spaces. I feel that we have always used the word "allocation" to refer to allocating memory for things, and "space allocation" could be interpreted as "doing mmap so that the space is backed by actual memory". Let's discuss if there is a better phrase than "space placement".

@qinsoon qinsoon (Member) left a comment

The PR illustrates that we can remove Map32::shared_fl_map by introducing an explicit two-step space creation. I think it is a good idea. However, implementation-wise, the PR is too complex at the moment.

I suggest we keep using the name VMRequest. I see in the PR that some enum variants need to be renamed, but I don't see a good reason to rename the enum itself, and the new name does not seem clearer. I think we can just keep the name; it will also be helpful when we need to cross-reference Rust and Java MMTk.

The concepts of asynchronous programming (future/promise/resolution) are not really applicable to our context, in my opinion, and the implementation is too complex. We can just ask each plan to provide a list of VMRequests when we create HeapMeta. HeapMeta resolves the VMRequests when it is created and stores the results. Then we pass HeapMeta to new() for each plan in the same way as now, and the plan can get the corresponding SpaceMeta and use it to create the spaces. Concretely, we could also ask for a HashMap<&str, VMRequest> (where &str is the space name) and provide fn get_space_meta(name: &str) -> SpaceMeta on HeapMeta.
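
A minimal sketch of what I have in mind (stand-in types and a placeholder placement policy, not the actual mmtk-core API):

use std::collections::HashMap;

// Stand-ins for illustration only.
pub enum VMRequest {
    Unrestricted,
}

#[derive(Clone, Copy, Debug)]
pub struct SpaceMeta {
    pub start: usize,
    pub extent: usize,
}

pub struct HeapMeta {
    resolved: HashMap<&'static str, SpaceMeta>,
}

impl HeapMeta {
    // Resolve every request up front, before any plan constructor runs.
    pub fn new(requests: HashMap<&'static str, VMRequest>) -> Self {
        let mut resolved = HashMap::new();
        let mut cursor = 0usize;
        for (name, _req) in requests {
            // Placeholder placement policy, just to make the sketch complete.
            resolved.insert(name, SpaceMeta { start: cursor, extent: 1 << 30 });
            cursor += 1 << 30;
        }
        HeapMeta { resolved }
    }

    // Each plan fetches the placement decision for its own spaces by name.
    pub fn get_space_meta(&self, name: &str) -> SpaceMeta {
        self.resolved[name]
    }
}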

@qinsoon qinsoon (Member) commented Oct 30, 2023

> Should we rename HeapMeta?
>
> I reused the struct of the same name, and extended its responsibility. The HeapMeta type did not exist in JikesRVM. In JikesRVM, heapCursor and heapLimit were static fields of the abstract class Space. HeapMeta was introduced in 32158cd so that heap_cursor and heap_limit can be shared between spaces.
>
> I think we can rename this type to more accurately describe its role. I am thinking about "SpacePlacer", because that is what it does: deciding the placement of spaces.

We can rename HeapMeta if we have a good name. This does not need to be done in this PR (but you can include it in this PR if you think it is appropriate).

What HeapMeta is actually doing is laying out the virtual memory for the spaces. My suggestion is to rename the current VMLayout to VMLayoutParameters, and to rename HeapMeta to VMLayout.

@wks wks (Collaborator, Author) commented Oct 30, 2023

> I suggest we keep using the name VMRequest. I see in the PR that some enum variants need to be renamed, but I don't see a good reason to rename the enum itself, and the new name does not seem clearer. I think we can just keep the name; it will also be helpful when we need to cross-reference Rust and Java MMTk.

Yes. On second thought, I think we should keep the name VMRequest. We can also rename SpaceMeta to VMResponse to pair it with VMRequest.

> The concepts of asynchronous programming (future/promise/resolution) are not really applicable to our context, in my opinion, and the implementation is too complex. We can just ask each plan to provide a list of VMRequests when we create HeapMeta. HeapMeta resolves the VMRequests when it is created and stores the results. Then we pass HeapMeta to new() for each plan in the same way as now, and the plan can get the corresponding SpaceMeta and use it to create the spaces. Concretely, we could also ask for a HashMap<&str, VMRequest> (where &str is the space name) and provide fn get_space_meta(name: &str) -> SpaceMeta on HeapMeta.

I designed it this way so that the SpaceMeta and the constructors of the concrete spaces are in the same scope. For example, for SemiSpace, we define both copyspace0_meta and copyspace1_meta in the same function, SemiSpace::new(). In this way, we never miss the specification of any space: if we miss a VMRequest, or if we consume the same VMResponse twice, it is a compile error (see the sketch below).
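
To illustrate, here is a simplified sketch of how the move checker enforces this (hypothetical types; the real FutureSpaceMeta defers the decision until HeapMeta resolves it, while this sketch holds the value directly to stay short):

// The placement decision for one space.
pub struct SpaceMeta {
    pub start: usize,
    pub extent: usize,
}

// A placeholder handed to the plan before HeapMeta makes its decision.
pub struct FutureSpaceMeta(SpaceMeta);

impl FutureSpaceMeta {
    // Takes self by value, so consuming the same placeholder twice
    // is a move error, caught at compile time.
    pub fn unwrap(self) -> SpaceMeta {
        self.0
    }
}

fn demo(copyspace0_meta: FutureSpaceMeta) {
    let meta = copyspace0_meta.unwrap();
    // let again = copyspace0_meta.unwrap(); // error[E0382]: use of moved value
    let _ = meta;
}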

But if we use HashMap<&str, VMRequest>, and require the plan to provide a list of VMRequest instances, we will have to write one function to provide a list of (name, VMRequest) pairs, and another function, new(), to actually instantiate the Plan and the spaces. It will increase the chance of mistakes because:

  1. The space names in both functions must match, and
  2. The number of spaces must match, and
  3. The compiler cannot check whether they match at compile time.

If the idea of future/promise makes this hard to understand, we can rename FutureSpaceMeta to PendingSpaceMeta or PendingVMResponse.

@qinsoon qinsoon (Member) commented Oct 30, 2023

> > The concepts of asynchronous programming (future/promise/resolution) are not really applicable to our context, in my opinion, and the implementation is too complex. We can just ask each plan to provide a list of VMRequests when we create HeapMeta. HeapMeta resolves the VMRequests when it is created and stores the results. Then we pass HeapMeta to new() for each plan in the same way as now, and the plan can get the corresponding SpaceMeta and use it to create the spaces. Concretely, we could also ask for a HashMap<&str, VMRequest> (where &str is the space name) and provide fn get_space_meta(name: &str) -> SpaceMeta on HeapMeta.
>
> I designed it this way so that the SpaceMeta and the constructors of the concrete spaces are in the same scope. For example, for SemiSpace, we define both copyspace0_meta and copyspace1_meta in the same function, SemiSpace::new(). In this way, we never miss the specification of any space: if we miss a VMRequest, or if we consume the same VMResponse twice, it is a compile error.

We can achieve the same goal with a hash map. We can make sure that a space meta is fetched exactly once (we remove the entry from the result hashmap once it is fetched), and that after plan creation the result hashmap is empty (every resolved space meta matched a VMRequest). Fetching a wrong space meta will panic. (See the sketch below.)
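
Continuing the sketch from my earlier comment, the run-time checks could look like this (take_space_meta and assert_all_consumed are hypothetical names):

impl HeapMeta {
    // Fetching removes the entry, so fetching the same space twice,
    // or fetching an unknown space, panics.
    pub fn take_space_meta(&mut self, name: &str) -> SpaceMeta {
        self.resolved
            .remove(name)
            .unwrap_or_else(|| panic!("unknown or already-fetched space: {}", name))
    }

    // Called after plan creation: every resolved request must have been consumed.
    pub fn assert_all_consumed(&self) {
        let leftover: Vec<_> = self.resolved.keys().collect();
        assert!(leftover.is_empty(), "unfetched spaces: {:?}", leftover);
    }
}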

> But if we use HashMap<&str, VMRequest>, and require the plan to provide a list of VMRequest instances, we will have to write one function to provide a list of (name, VMRequest) pairs, and another function, new(), to actually instantiate the Plan and the spaces. It will increase the chance of mistakes because:
>
> 1. The space names in both functions must match, and
> 2. The number of spaces must match, and
> 3. The compiler cannot check whether they match at compile time.
>
> If the idea of future/promise makes this hard to understand, we can rename FutureSpaceMeta to PendingSpaceMeta or PendingVMResponse.

  1. Using a constant for the space names would solve this.
  2. This can be checked easily.
  3. It is an advantage of compile-time checking, but it is not decisive.

These are the two main reasons why I suggested providing a list of VMRequests:

1. It is structurally and logically simpler

This is what the current code does. The plan needs to make sure the ordering is correct: creating its own VMRequests, letting the parent plan create its VMRequests, and then resolving its own space creation.

Plan: VMRequest1
ParentPlan: VMRequest2
ParentParentPlan: VMRequest3
ParentParentPlan: SpaceMeta3
ParentPlan: SpaceMeta2
Plan: SpaceMeta1

What I propose: all the requests are known beforehand and are resolved up front. During plan creation, each plan just needs to get the space metas for its own spaces.

MMTK: [VMRequest1, VMRequest2, VMRequest3]
HeapMeta: [SpaceMeta1, SpaceMeta2, SpaceMeta3]
Plan: Fetch SpaceMeta1
ParentPlan: Fetch SpaceMeta2
ParentParentPlan: Fetch SpaceMeta3

2. It matches the design of "plan as configuration"

We have a general goal of being able to provide a configuration file for plans. We currently ask the plan for a configuration of allocators, allocation/copy semantics mapping, etc. Asking the plan to provide a configuration of VMRequests fits the design.


I am happy to discuss this further.

@wks wks (Collaborator, Author) commented Oct 31, 2023

> We can achieve the same goal with a hash map. We can make sure that a space meta is fetched exactly once (we remove the entry from the result hashmap once it is fetched), and that after plan creation the result hashmap is empty (every resolved space meta matched a VMRequest). Fetching a wrong space meta will panic.

It is still run-time checking instead of compile-time checking.

> 1. Using a constant for the space names would solve this.
>
> 2. This can be checked easily.
>
> 3. It is an advantage of compile-time checking, but it is not decisive.

That mitigates the problem, but I still believe compile-time checking (with the help of the ownership and move checker) is a better solution because it prevents those mistakes entirely.

> These are the two main reasons why I suggested providing a list of VMRequests:
>
> 1. It is structurally and logically simpler
>
> This is what the current code does. The plan needs to make sure the ordering is correct: creating its own VMRequests, letting the parent plan create its VMRequests, and then resolving its own space creation.
>
> Plan: VMRequest1
> ParentPlan: VMRequest2
> ParentParentPlan: VMRequest3
> ParentParentPlan: SpaceMeta3
> ParentPlan: SpaceMeta2
> Plan: SpaceMeta1
>
> What I propose: all the requests are known beforehand and are resolved up front. During plan creation, each plan just needs to get the space metas for its own spaces.
>
> MMTK: [VMRequest1, VMRequest2, VMRequest3]
> HeapMeta: [SpaceMeta1, SpaceMeta2, SpaceMeta3]
> Plan: Fetch SpaceMeta1
> ParentPlan: Fetch SpaceMeta2
> ParentParentPlan: Fetch SpaceMeta3

I don't like the approach of creating all VMRequest instances in MMTK. The plan determines the set of spaces. If we put all VMRequest objects into MMTK, it will be cluttered with the spaces of every implemented plan, and will soon become chaotic. For example, what if we have two generational plans whose ideal nursery sizes are different? We may end up having nursery_for_plan1: VMRequest, nursery_for_plan2: VMRequest, ...

There is one alternative that I actually considered: letting each concrete plan implement a function that returns a vec (or map) of space/VMRequest pairs. For example:

impl SemiSpace {
    pub fn specify_spaces() -> Vec<(&'static str, VMRequest)> {
        let mut result = vec![
            ("copyspace0", VMRequest::Unrestricted),
            ("copyspace1", VMRequest::Unrestricted),
        ];
        // Also include the spaces defined by the parent (common) plan.
        result.extend(CommonSpace::specify_spaces());
        result
    }
}

But I didn't adopt that approach, in favour of the one used in this PR, mainly because of the ability to check things statically.

> 2. It matches the design of "plan as configuration"
>
> We have a general goal of being able to provide a configuration file for plans. We currently ask the plan for a configuration of allocators, allocation/copy semantics mapping, etc. Asking the plan to provide a configuration of VMRequests fits the design.

Yes, I agree. Ideally, spaces should be specified declaratively, and MMTk should be able to instantiate them in the right order, i.e. knowing their locations and contiguousness before instantiating the spaces. Maybe it is possible to make some kind of DSL (perhaps using macros) and hide some of the details in the implementation.
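
Purely as a thought experiment, a declarative macro could turn a space list into (name, VMRequest) pairs and hide the bookkeeping (specify_spaces! and everything around it is hypothetical):

pub enum VMRequest {
    Unrestricted,
}

macro_rules! specify_spaces {
    ($($name:ident => $req:expr),* $(,)?) => {
        vec![ $( (stringify!($name), $req) ),* ]
    };
}

fn main() {
    let requests: Vec<(&str, VMRequest)> = specify_spaces! {
        copyspace0 => VMRequest::Unrestricted,
        copyspace1 => VMRequest::Unrestricted,
    };
    assert_eq!(requests.len(), 2);
}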

I'll try making another version of this PR later to explore some alternative approaches, e.g. HashMap and macros.

@steveblackburn (Contributor) commented

This looks interesting.

However, I would like to flag this for deep discussion before we go any further with it. The ideas at the heart of this issue are foundational ones.

Experience with ProcessEdgesWork suggests that well-intentioned refactoring can have profound impacts on usability that undermine the most fundamental elements of the MMTk design.

My suggestion is that we do not make any such change until:

  1. We address all high-priority issues,
  2. We get CI to the point that we have confidence in it, and
  3. We fix ProcessEdgesWork and the associated changes so that, in that example, simple changes to tracing loops can be made by any competent coder.

We now have a situation where the code has become so complex that this is not possible, even for highly experienced team members. The design ethos of MMTk was to allow inexperienced people to come and rapidly explore GC designs. It is extremely important that we don't erode this principle.
