Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generalize the Domain.DLS interface to split PRNG state for child domains #10887

Merged
merged 7 commits into from
Jan 14, 2022

Conversation

gasche
Copy link
Member

@gasche gasche commented Jan 12, 2022

This PR intends to demonstrate an approach to implement a "proper" PRNG+Domains semantics where spawning a domain "splits" the PRNG state. This required changes to the Domain.DLS interface to support domain-local values that are "inherited" from the parent/spawning domain, with a user-provided function to derive the child state from the parent state.

(The PRNG "split" function implemented in this PR is naive and probably bad, or at least not suspected to be any good, the intent is to replace it with a proper splittable PRNG, see #10877, ocaml/RFCs#28, #10742 )

Note: this PR was initially submitted at ocaml-multicore/ocaml-multicore#756 , but did not receive feedback from Multicore devs yet. I am re-submitting upstream now that the Multicore tree has been merged.

Some parts of the code are from @xavierleroy.

cc: people who have worked on Domain: @kayceesrk, @ctk21, @Sudha247, @Engil

Copy link
Contributor

@kayceesrk kayceesrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition looks nice and modular. The cost is pay-as-you-go for splitting; if the program does not use any splittable keys, then there is only a small constant cost at domain creation over the current implementation.

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good to me. Thanks! Just one suggestion below. The LXM pull request is coming very soon...

stdlib/filename.ml Outdated Show resolved Hide resolved
Copy link
Contributor

@ctk21 ctk21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a useful feature. I didn't spot any problems with how it has been added.
Minor point around how best to force domain overload interleaving in the testsuite.

let c = Random.int 100 in
(a, b, c)

(* We intentionally spawn many more domains than hardware threads, to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One technique for achieving the same aim (at least on Linux) is to use taskset to limit the number of available cores. This is currently implemented in the CI here:

- name: Run the testsuite (taskset -c 0)
if: ${{ matrix.id == 'taskset' }}
run: |
bash -xe tools/ci/actions/runner.sh test_multicore 1 "parallel" \
"lib-threads" "lib-systhreads" "weak-ephe-final"

Not sure if using this approach with a smaller domain_count works for you. That does have the downside (or upside depending on viewpoint) that it isn't exercised on a default local run of the testsuite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the maximal number of domains is 128. In this case it's risky to spawn 1000. I guess it works because each domain exits almost immediately, so domains die faster than they are created. But in this case the OS scheduler doesn't get much chance to interleave executions, does it?

Bottom line: I'd rather have a test with a small number of domains, and each domain waits a variable -- maybe even random! -- amount of time before drawing random numbers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have a test with a small number of domains, and each domain waits a variable -- maybe even random! -- amount of time before drawing random numbers.

Took the liberty to push the revised test on this PR's branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original PR had the split-generator initialization done on the spawned domain (after just computing the splitting seed on the parent domain). I wanted to ensure in the test that not only the random draws would be interleaved, but also the initialization itself, and in particular that trying to mutate the parent domain from the child (which can happens if Domain is implemented incorrectly) would blow up. I think that I did check that writing Domain incorrectly in this way would fail the test.

The new implementation does all the computation on the parent, and of course those will never be interleaved (spawning multiple domains from a fixed parent), so we can do with a weaker test.

I'm not sure what failure modes we can observe with the revised test; ideally I would want to break the code again to see whether the test catches various issues.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised to learn that domains are limited to 128, and that my test was not reaching that limit. I'll double-check. (128 seems a bit low, I could buy AMD Threadripper machines with that many threads today.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised to learn that domains are limited to 128,

#define Max_domains 128

Copy link
Contributor

@kayceesrk kayceesrk Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised to learn that domains are limited to 128

We needed max domains to be a small number for fast read barrier checks. See the last paragraph in section 4.3.2 in https://arxiv.org/pdf/2004.11663.pdf. While we no longer use read barriers, we still have the virtual memory layout since it turns out to be good for minimizing memory hierarchy overheads. See ocaml-multicore/ocaml-multicore#508 for the experiments. The max domains value affects the size of the virtual address space reserved.

I have a proposal for removing this restriction here: ocaml-multicore/ocaml-multicore#795. For example, you could have max domains to be 256 with a max minor heap size of 2M, and the program will only reserve 512M of virtual memory space. Currently, we reserve 256G.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I integrated @xavierleroy's proposed test in the commit history. The test doesn't allow to detect races if several child are initialized at the same time, but it does detect that seeding is performed (that the random stream does not depend on the scheduling order of random draws).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message for af8a906 still describes the previous, 1000-domain version of the test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. (I squashed most test-related commits, and added comments in the test to explain what is going on.)

Copy link
Contributor

@xavierleroy xavierleroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good to me. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants