generalize the Domain.DLS interface to split PRNG state for child domains #10887

gasche · 2022-01-12T22:00:52Z

This PR intends to demonstrate an approach to implement a "proper" PRNG+Domains semantics where spawning a domain "splits" the PRNG state. This required changes to the Domain.DLS interface to support domain-local values that are "inherited" from the parent/spawning domain, with a user-provided function to derive the child state from the parent state.
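As a minimal sketch of what an "inherited" domain-local value looks like from user code, assuming the `?split_from_parent` optional argument of `Domain.DLS.new_key` as it exists in the OCaml 5 standard library: the key below (`depth_key`, a hypothetical name used only for illustration) holds an integer, and each child derives its value by adding one to its parent's.

```ocaml
(* Hypothetical key: counts how many spawns separate a domain from the
   initial one. The split function is applied to the parent's value when a
   child is spawned; the init function is used for the initial domain. *)
let depth_key : int Domain.DLS.key =
  Domain.DLS.new_key
    ~split_from_parent:(fun parent_depth -> parent_depth + 1)
    (fun () -> 0)

let depth () = Domain.DLS.get depth_key

(* Spawn a child, which itself spawns a grandchild, and report both depths. *)
let child_and_grandchild () =
  Domain.join
    (Domain.spawn (fun () ->
         let child = depth () in
         let grandchild = Domain.join (Domain.spawn depth) in
         (child, grandchild)))
```

For the PRNG use case, the same shape would apply with a key holding the generator state and a split function that derives an independent child state from the parent state.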

(The PRNG "split" function implemented in this PR is naive and probably bad, or at least not suspected to be any good. The intent is to replace it with a proper splittable PRNG; see #10877, ocaml/RFCs#28, #10742.)

Note: this PR was initially submitted at ocaml-multicore/ocaml-multicore#756, but has not received feedback from Multicore devs yet. I am re-submitting upstream now that the Multicore tree has been merged.

Some parts of the code are from @xavierleroy.

cc: people who have worked on Domain: @kayceesrk, @ctk21, @Sudha247, @Engil

kayceesrk

The addition looks nice and modular. The cost is pay-as-you-go for splitting; if the program does not use any splittable keys, then there is only a small constant cost at domain creation over the current implementation.

xavierleroy

Looks very good to me. Thanks! Just one suggestion below. The LXM pull request is coming very soon...

stdlib/filename.ml

ctk21

Looks like a useful feature. I didn't spot any problems with how it has been added.
Minor point around how best to force domain overload interleaving in the testsuite.

ctk21 · 2022-01-13T09:53:35Z

testsuite/tests/lib-random/parallel.ml

+  let c = Random.int 100 in
+  (a, b, c)
+
+(* We intentionally spawn many more domains than hardware threads, to


One technique for achieving the same aim (at least on Linux) is to use taskset to limit the number of available cores. This is currently implemented in the CI here:

ocaml/.github/workflows/build.yml

Lines 99 to 103 in 34776e7

    - name: Run the testsuite (taskset -c 0)
      if: ${{ matrix.id == 'taskset' }}
      run: |
        bash -xe tools/ci/actions/runner.sh test_multicore 1 "parallel" \
          "lib-threads" "lib-systhreads" "weak-ephe-final"

Not sure if using this approach with a smaller domain_count works for you. That does have the downside (or upside depending on viewpoint) that it isn't exercised on a default local run of the testsuite.

I thought the maximal number of domains is 128. In this case it's risky to spawn 1000. I guess it works because each domain exits almost immediately, so domains die faster than they are created. But in this case the OS scheduler doesn't get much chance to interleave executions, does it?

Bottom line: I'd rather have a test with a small number of domains, and each domain waits a variable -- maybe even random! -- amount of time before drawing random numbers.

> I'd rather have a test with a small number of domains, and each domain waits a variable -- maybe even random! -- amount of time before drawing random numbers.

Took the liberty to push the revised test on this PR's branch.
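A test in this spirit could be sketched as follows. This is not the test pushed to the branch, only an illustration under stated assumptions: it assumes the OCaml 5 behaviour where `Domain.spawn` derives the child's default `Random` state from the parent's, and the helper names (`delay`, `run_once`) are hypothetical. Each child busy-waits for a different amount of time before drawing, and the whole experiment is run twice from the same seed to check that the draws do not depend on scheduling.

```ocaml
(* Busy-wait for a variable amount of time without needing the Unix library. *)
let delay n =
  for _i = 1 to n * 50_000 do
    Domain.cpu_relax ()
  done

(* Seed the parent, spawn a few children, and collect their first draws. *)
let run_once ~seed ~domain_count =
  Random.init seed;
  let children =
    Array.init domain_count (fun i ->
        Domain.spawn (fun () ->
            (* Earlier-spawned domains wait longer before drawing. *)
            delay (domain_count - i);
            let a = Random.int 100 in
            let b = Random.int 100 in
            (a, b)))
  in
  Array.map Domain.join children

let () =
  let first = run_once ~seed:42 ~domain_count:4 in
  let second = run_once ~seed:42 ~domain_count:4 in
  (* Same seed, same spawn order: the streams should match across runs. *)
  assert (first = second)
```

Such a check only shows that each child's stream is fixed at spawn time; it says nothing about the statistical quality of the split.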

The original PR had the split-generator initialization done on the spawned domain (after just computing the splitting seed on the parent domain). I wanted to ensure in the test that not only the random draws would be interleaved, but also the initialization itself, and in particular that trying to mutate the parent domain from the child (which can happen if Domain is implemented incorrectly) would blow up. I think that I did check that writing Domain incorrectly in this way would fail the test.

The new implementation does all the computation on the parent, and of course those will never be interleaved (spawning multiple domains from a fixed parent), so we can do with a weaker test.

I'm not sure what failure modes we can observe with the revised test; ideally I would want to break the code again to see whether the test catches various issues.

I'm surprised to learn that domains are limited to 128, and that my test was not reaching that limit. I'll double-check. (128 seems a bit low, I could buy AMD Threadripper machines with that many threads today.)

> I'm surprised to learn that domains are limited to 128,

ocaml/runtime/caml/config.h

Line 256 in 750e212

#define Max_domains 128

> I'm surprised to learn that domains are limited to 128

We needed max domains to be a small number for fast read barrier checks. See the last paragraph in section 4.3.2 in https://arxiv.org/pdf/2004.11663.pdf. While we no longer use read barriers, we still have the virtual memory layout since it turns out to be good for minimizing memory hierarchy overheads. See ocaml-multicore/ocaml-multicore#508 for the experiments. The max domains value affects the size of the virtual address space reserved.

I have a proposal for removing this restriction here: ocaml-multicore/ocaml-multicore#795. For example, you could have max domains to be 256 with a max minor heap size of 2M, and the program will only reserve 512M of virtual memory space. Currently, we reserve 256G.
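The figures above can be checked with a small piece of arithmetic. This is only a sketch: it assumes the reservation is simply the maximum number of domains times the maximum minor heap size, treats "M" and "G" as abstract units (the comment does not say words or bytes), and the function name `reserved` is hypothetical. The 2G per-domain figure for the current layout is derived from the quoted numbers, not stated in the thread.

```ocaml
(* "M" and "G" as abstract units; words vs. bytes is left unspecified. *)
let m = 1 lsl 20
let g = 1 lsl 30

(* Assumed model: one maximum-size minor heap area reserved per possible domain. *)
let reserved ~max_domains ~max_minor_heap = max_domains * max_minor_heap

(* Proposed configuration: 256 domains, 2M minor heap each. *)
let proposed = reserved ~max_domains:256 ~max_minor_heap:(2 * m)
let () = assert (proposed = 512 * m)

(* Derived, not stated in the thread: 256G over 128 domains is 2G per domain. *)
let current_per_domain = 256 * g / 128
let () = assert (current_per_domain = 2 * g)
```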

I integrated @xavierleroy's proposed test in the commit history. The test cannot detect races that would occur if several children were initialized at the same time, but it does detect that seeding is performed (that the random stream does not depend on the scheduling order of random draws).

The commit message for af8a906 still describes the previous, 1000-domain version of the test.

Done. (I squashed most test-related commits, and added comments in the test to explain what is going on.)

…d keys Compute the initial value fully in the parent domain (no lazy result). Store all parent-initialized keys and their splitting functions in a global list, so that there is no need to store "protocols" in the DLS tables.
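The idea in this commit message can be modelled in user code. The sketch below is not the standard library implementation; it is a simplified model built on plain `Domain.DLS` keys, with hypothetical names (`registry`, `new_inherited_key`, `spawn`). Every inherited key registers a closure in a global list; at spawn time each closure runs on the parent, computes the child's value eagerly, and returns an installer that the child runs before the user function.

```ocaml
(* Global registry: each entry, when run on the parent, returns an
   installer to be run on the child. *)
let registry : (unit -> (unit -> unit)) list Atomic.t = Atomic.make []

let new_inherited_key ~split init =
  let key = Domain.DLS.new_key init in
  let prepare () =
    (* Runs on the parent: the child's value is fully computed here. *)
    let child_value = split (Domain.DLS.get key) in
    (* Runs on the child: only stores the precomputed value. *)
    fun () -> Domain.DLS.set key child_value
  in
  let rec add () =
    let old = Atomic.get registry in
    if not (Atomic.compare_and_set registry old (prepare :: old)) then add ()
  in
  add ();
  key

(* Wrapper around Domain.spawn that installs inherited values first. *)
let spawn f =
  let installers = List.map (fun prepare -> prepare ()) (Atomic.get registry) in
  Domain.spawn (fun () ->
      List.iter (fun install -> install ()) installers;
      f ())

(* Illustrative key: each child sees its parent's value plus one. *)
let depth = new_inherited_key ~split:(fun d -> d + 1) (fun () -> 0)
```

Because the split runs on the parent, the child never reads or mutates the parent's state, which is the property the earlier discussion of the test was concerned with.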

xavierleroy

Looks very good to me. Thanks!

This was referenced Jan 12, 2022

RFC: generalize the Domain.DLS interface to split PRNG state for child domains ocaml-multicore/ocaml-multicore#756

Closed

Move Random to a splitting generator #10877

Closed

kayceesrk approved these changes Jan 13, 2022

xavierleroy reviewed Jan 13, 2022

ctk21 reviewed Jan 13, 2022

xavierleroy force-pushed the multicore-random branch from 1f44f52 to 38f2de8 Compare January 13, 2022 13:44

gasche force-pushed the multicore-random branch from 38f2de8 to c55a33a Compare January 13, 2022 15:32

gasche and others added 7 commits January 14, 2022 08:42

Domain.DLS: derive DLS values from parent domain

6538151

testsuite: test random numbers produced by child domains

076d42f

Random: implement lame-duck splitting

dec9024

use Random.State.split to seed child domains PRNG state

a172684

Domain.DLS: delay the computation of inherited keys on the child

68c93d4

DLS.new_key ~split_from_parent: document where the key is computed

e29ec97

gasche force-pushed the multicore-random branch from c55a33a to e29ec97 Compare January 14, 2022 09:37

xavierleroy approved these changes Jan 14, 2022

xavierleroy merged commit 8db757e into ocaml:trunk Jan 14, 2022

xavierleroy mentioned this pull request Jan 14, 2022

Reimplementation of Random using an LXM pseudo-random number generator #10742

Merged

kayceesrk mentioned this pull request Jan 23, 2022

Allow newly spawned domains to have a copy of the parent domain's local state ocaml-multicore/ocaml-multicore#587

Closed
