-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generalize the Domain.DLS interface to split PRNG state for child domains #10887
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition looks nice and modular. The cost is pay-as-you-go for splitting; if the program does not use any splittable keys, then there is only a small constant cost at domain creation over the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good to me. Thanks! Just one suggestion below. The LXM pull request is coming very soon...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a useful feature. I didn't spot any problems with how it has been added.
Minor point around how best to force domain overload interleaving in the testsuite.
let c = Random.int 100 in | ||
(a, b, c) | ||
|
||
(* We intentionally spawn many more domains than hardware threads, to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One technique for achieving the same aim (at least on Linux) is to use taskset to limit the number of available cores. This is currently implemented in the CI here:
ocaml/.github/workflows/build.yml
Lines 99 to 103 in 34776e7
- name: Run the testsuite (taskset -c 0) | |
if: ${{ matrix.id == 'taskset' }} | |
run: | | |
bash -xe tools/ci/actions/runner.sh test_multicore 1 "parallel" \ | |
"lib-threads" "lib-systhreads" "weak-ephe-final" |
Not sure if using this approach with a smaller domain_count
works for you. That does have the downside (or upside depending on viewpoint) that it isn't exercised on a default local run of the testsuite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the maximal number of domains is 128. In this case it's risky to spawn 1000. I guess it works because each domain exits almost immediately, so domains die faster than they are created. But in this case the OS scheduler doesn't get much chance to interleave executions, does it?
Bottom line: I'd rather have a test with a small number of domains, and each domain waits a variable -- maybe even random! -- amount of time before drawing random numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have a test with a small number of domains, and each domain waits a variable -- maybe even random! -- amount of time before drawing random numbers.
Took the liberty to push the revised test on this PR's branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original PR had the split-generator initialization done on the spawned domain (after just computing the splitting seed on the parent domain). I wanted to ensure in the test that not only the random draws would be interleaved, but also the initialization itself, and in particular that trying to mutate the parent domain from the child (which can happens if Domain is implemented incorrectly) would blow up. I think that I did check that writing Domain incorrectly in this way would fail the test.
The new implementation does all the computation on the parent, and of course those will never be interleaved (spawning multiple domains from a fixed parent), so we can do with a weaker test.
I'm not sure what failure modes we can observe with the revised test; ideally I would want to break the code again to see whether the test catches various issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised to learn that domains are limited to 128, and that my test was not reaching that limit. I'll double-check. (128 seems a bit low, I could buy AMD Threadripper machines with that many threads today.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised to learn that domains are limited to 128,
Line 256 in 750e212
#define Max_domains 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised to learn that domains are limited to 128
We needed max domains to be a small number for fast read barrier checks. See the last paragraph in section 4.3.2 in https://arxiv.org/pdf/2004.11663.pdf. While we no longer use read barriers, we still have the virtual memory layout since it turns out to be good for minimizing memory hierarchy overheads. See ocaml-multicore/ocaml-multicore#508 for the experiments. The max domains value affects the size of the virtual address space reserved.
I have a proposal for removing this restriction here: ocaml-multicore/ocaml-multicore#795. For example, you could have max domains to be 256 with a max minor heap size of 2M, and the program will only reserve 512M of virtual memory space. Currently, we reserve 256G.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I integrated @xavierleroy's proposed test in the commit history. The test doesn't allow to detect races if several child are initialized at the same time, but it does detect that seeding is performed (that the random stream does not depend on the scheduling order of random draws).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit message for af8a906 still describes the previous, 1000-domain version of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. (I squashed most test-related commits, and added comments in the test to explain what is going on.)
1f44f52
to
38f2de8
Compare
38f2de8
to
c55a33a
Compare
…d keys Compute the initial value fully in the parent domain (no lazy result). Store all parent-initialized keys and their splitting functions in a global list, so that there is no need to store "protocols" in the DLS tables.
c55a33a
to
e29ec97
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good to me. Thanks!
This PR intends to demonstrate an approach to implement a "proper" PRNG+Domains semantics where spawning a domain "splits" the PRNG state. This required changes to the Domain.DLS interface to support domain-local values that are "inherited" from the parent/spawning domain, with a user-provided function to derive the child state from the parent state.
(The PRNG "split" function implemented in this PR is naive and probably bad, or at least not suspected to be any good, the intent is to replace it with a proper splittable PRNG, see #10877, ocaml/RFCs#28, #10742 )
Note: this PR was initially submitted at ocaml-multicore/ocaml-multicore#756 , but did not receive feedback from Multicore devs yet. I am re-submitting upstream now that the Multicore tree has been merged.
Some parts of the code are from @xavierleroy.
cc: people who have worked on Domain: @kayceesrk, @ctk21, @Sudha247, @Engil