Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplementation of Random using the Xoshiro256++ PRNG #10701

Closed
wants to merge 8 commits into from

Conversation

xavierleroy
Copy link
Contributor

@xavierleroy xavierleroy commented Oct 14, 2021

[ I am not proposing this PR for inclusion in OCaml 4.14. It is intended for Multicore OCaml and OCaml 5.00. ]

Following extensive experimentation (https://github.com/xavierleroy/pringo/) and discussions (ocaml/RFCs#28), this PR reimplements the Random standard library module using the Xoshiro256++ pseudo-random number generator by Blackman and Vigna.

The main benefit of the new PRNG is that it supports "jumping", i.e. advancing quickly very far into the pseudo-random sequence. (The current Random implementation does not support this.) Jumping will come very handy for Multicore OCaml, enabling each domain to maintain its own PRNG state in domain-local storage, with "jumping" being used to give a fresh PRNG state when a new domain is created.

Jumping also provides the ability to split a PRNG state in two independent PRNGs, something that can be quite useful for Quickcheck-style generation of random functions and streams.

Other benefits of the Xoshiro256-based PRNG include

  • A modern design, extensively analyzed by Vigna et al.
  • 256-bit internal state that makes reseeding unnecessary even in the presence of intensive jumping.
  • Nice performance improvements w.r.t the current implementation on 64-bit platforms (but: not nice performance degradation on 32-bit platforms). [See benchmarks below]

Two other PRNGs were tried in Pringo. SplitMix has a small internal state (2^64 bits) which probably requires reseeding after 2^32 draws or splits. Chacha is a lot of code for a PRNG (it's based on a cryptographic cipher) and is oriented towards generating byte streams, requiring even more buffering code to produce integers and FP numbers. In the end, Xoshiro wins.

The implementation in stdlib/random.ml is going to change.
Also add the ability to split the PRNGs (Random.split, Random.State.split)
Also: add trivial test for Random.split.
- Use binary digits of pi from Wolfram Alpha for the magic constants.

- Test against Blackman and Vigna's reference C implementation
  with known initial state [| 1; 2; 4; 8 |]
Use multiply-and-add rather than rotate-and-xor to mix the integers
from the array given to Random.full_init.
@xavierleroy
Copy link
Contributor Author

Some benchmarks:

Times are in seconds for 50_000_000 repetitions, unless indicated.

x86 64 bits:

4.13.1   Xoshiro

  0.30    0.22  Random.bits
  0.78    0.62  Random.int 0xFFEE
  1.18    0.71  Random.int32 0xFFEEDD
  1.39    0.68  Random.int64 0xFFEEDDCCAA
  0.73    0.32  Random.float 1.0
 11.16    0.01  Random.full_init (/100)

ARM 32 bits:

4.13.1   Xoshiro

  1.59    2.45  Random.bits
  4.62    5.75  Random.int 0xFFEE
  5.13    6.22  Random.int32 0xFFEEDD
 29.64   21.16  Random.int64 0xFFEEDDCCAA
  3.36    7.51  Random.float 1.0
 53.69    0.05  Random.full_init 3 (/100)

@xavierleroy
Copy link
Contributor Author

xavierleroy commented Oct 14, 2021

Some implementation remarks:

The internal state is an array of four 64-bit integers. (To support splitting, there are two such arrays, but only one changes when drawing numbers.) It is crucial for performance that the integers are unboxed.

The current implementation uses a 1D bigarray. It's a bit wasteful of space, but has many qualities:

  • The array is 64-bit aligned even in 32-bit platforms, saving some C headaches.
  • We get marshalling, unmarshalling, and other ad-hoc primitives for free, all the work has already been done for bigarrays.
  • Non-speed-critical operations (e.g. allocation, copying) use Bigarray functions, no need to provide specific C implementations of these operations.

An alternative that I used in Pringo is an Abstract block or Custom block that carries the four 64-bit integers directly as payload. This saves some space, but raises some delicate alignment issues on 32-bit platforms. Plus marshaling etc. needs to be implemented by hand.

I'm not sure which of the two approaches is best for JS-of-OCaml. Does JS-of-OCaml has a reasonably efficient implementation of bigarrays?

@nojb
Copy link
Contributor

nojb commented Oct 14, 2021

I'm not sure which of the two approaches is best for JS-of-OCaml. Does JS-of-OCaml has a reasonably efficient implementation of bigarrays?

@hhugo is the person to ask. But based on a brief glance it seems that int64 bigarrays are not currently supported:

https://github.com/ocsigen/js_of_ocaml/blob/f80c50d2d9cba273ddea46c5613f43d5b69a0ff3/runtime/bigarray.js#L74

@dbuenzli
Copy link
Contributor

dbuenzli commented Oct 14, 2021

I think they are by piggybacking on Int32Array. Note sure why BigInt64Array is not used but maybe an array element representation mismatch.

@jhjourdan
Copy link
Contributor

It turns out that Memprof is also using Xoshiro, but I don't think this would be a good thing to share the implementations, since Memprof is using a specialized version to make it possible to use SIMD instructions.

Apart from that, the current implementation of the split function has two "defects", which would at least need to be documented: first, by sharing the origin field, this makes the split function thread-unsafe, which is somewhat unexpected. Second, this implementation make results of sampling depend on the order of split calls: s1 = split s; s2 = split s1; s3 = split s is not equivalent to s1 = split s; s3 = split s; s2 = split s1. In a multithreaded environment, this means that sampling can depend on scheduling, while we could expect it to be deterministic.

@xavierleroy
Copy link
Contributor Author

first, by sharing the origin field, this makes the split function thread-unsafe, which is somewhat unexpected

If by "thread" you mean the current systhreads, the copy-and-jump-forward is supposed to be atomic w.r.t. preemption, so two splits in two different threads should not result in the same state.

If by "thread" you mean Multicore OCaml domains, it is a race to use a PRNG state in two domains without synchronization (not just splitting, but also drawing numbers).

Did I miss something?

@jhjourdan
Copy link
Contributor

If by "thread" you mean Multicore OCaml domains, it is a race to use a PRNG state in two domains without synchronization (not just splitting, but also drawing numbers).

I was indeed thinking of Multicore domains, since this PR is to be merged in multicore OCaml.

What I feel unexpected is that when you call split, I expect the two PRNGs to be completely independent. But actually, they have hidden shared state: the origin field. So if we call split concurrently on these two PRNG, then we get a race.

@jhjourdan
Copy link
Contributor

In other word, the following program is racy:

s = init
s1 = split s
fork(fun _ -> s2 = split s1)
s3 = split s

I think that this is unexpected.

@xavierleroy
Copy link
Contributor Author

OK, I see your point. But I don't see how to implement splitting by jumping forward if we don't have a piece of shared state to advance. How to make sure we don't get the same PRNG if we split twice?

@gasche
Copy link
Member

gasche commented Oct 14, 2021

One naive approach would be to halve the jump distance at each split:

  • each generator "owns" a contiguous portion of the search space, indicating by a (mutable) "depth" counter: if s has a "depth" D, it owns 2^D draws
  • if s has depth A+1, then split(s) decrements s' depth (it now only owns 2^A draws) and returns a state that is shifted by 2^A.

The obvious problem is that it's relatively easy to run out of space by repeatedly splitting in an unbalanced way.

(I would expect the Haskell people to have thought about this problem and come up with good solutions, since obviously they care about observable purity of the generator state.)

@lthls
Copy link
Contributor

lthls commented Oct 14, 2021

As an exercise I tried to make a decently efficient implementation of the Xoshiro module in pure OCaml, and to my own surprise I managed to get less than 10% overhead compared to the C implementation on my laptop.
If you're interested I can share my code ("pure OCaml" might be a stretch, as it uses the %caml_bytes_get64u and %caml_bytes_set64u primitives).

@gasche
Copy link
Member

gasche commented Oct 14, 2021

If you're interested I can share my code

Yes please!

@lthls
Copy link
Contributor

lthls commented Oct 14, 2021

Here it is (sorry for the wall of text):

module Xoshiro = struct
  type state = Bytes.t

  external get64u : Bytes.t -> int -> Int64.t = "%caml_bytes_get64u"

  external set64u : Bytes.t -> int -> Int64.t -> unit = "%caml_bytes_set64u"

  let[@inline] get s i = get64u s (i * 8)

  let[@inline] set s i n = set64u s (i * 8) n

  let[@inline] rotl x k =
    Int64.(logor (shift_left x k) (shift_right_logical x (64 - k)))

  let next (s : state) : Int64.t =
    let s0 = ref (get s 0) in
    let s1 = ref (get s 1) in
    let s2 = ref (get s 2) in
    let s3 = ref (get s 3) in
    let result = Int64.add (rotl (Int64.add !s0 !s3) 23) !s0 in
    let t = Int64.shift_left !s1 17 in
    s2 := Int64.logxor !s2 !s0;
    s3 := Int64.logxor !s3 !s1;
    s1 := Int64.logxor !s1 !s2;
    s0 := Int64.logxor !s0 !s3;
    s2 := Int64.logxor !s2 t;
    s3 := rotl !s3 45;
    set s 0 !s0;
    set s 1 !s1;
    set s 2 !s2;
    set s 3 !s3;
    result

  let long_jump_array = Bytes.create 32
  let () =
    set long_jump_array 0 0x76e15d3efefdcbbfL;
    set long_jump_array 1 0xc5004e441c522fb3L;
    set long_jump_array 2 0x77710069854ee241L;
    set long_jump_array 3 0x39109bb02acbe635L;
    ()

  let jump_array = Bytes.create 32
  let () =
    set jump_array 0 0x180ec6d33cfd0abaL;
    set jump_array 1 0xd5a61266f0c9392cL;
    set jump_array 2 0xa9582618e03fc9aaL;
    set jump_array 3 0x39abdc4529b1661cL;
    ()

  let[@inline] jump_gen arr (s : state) =
    let s0 = ref 0L in
    let s1 = ref 0L in
    let s2 = ref 0L in
    let s3 = ref 0L in
    for i = 0 to ((Bytes.length arr) / 8) - 1 do
      for b = 0 to 63 do
        if Int64.logand (get arr i) (Int64.shift_left 1L b) <> 0L
        then begin
          s0 := Int64.logxor !s0 (get s 0);
          s1 := Int64.logxor !s1 (get s 1);
          s2 := Int64.logxor !s2 (get s 2);
          s3 := Int64.logxor !s3 (get s 3);
          ()
        end;
        ignore (next s)
      done
    done;
    set s 0 !s0;
    set s 1 !s1;
    set s 2 !s2;
    set s 3 !s3;
    ()

  let jump s = jump_gen jump_array s

  let _long_jump s = jump_gen long_jump_array s

  let init s vinit =
    let mix = 6364136223846793005L in
    set s 0 0L;
    set s 1 0L;
    set s 2 0L;
    set s 3 0L;
    for i = 0 to Array.length vinit - 1 do
      set s (i mod 4) (Int64.add (Int64.mul (get s (i mod 4)) mix) (Int64.of_int vinit.(i)))
    done;
    for i = 0 to 3 do
      if get s i = 0L then set s i (Int64.shift_left 1L i)
    done;
    ()

  let create () : state =
    Bytes.create 32

  let assign (dst: state) (src: state) =
    Bytes.blit src 0 dst 0 32

  let copy st = Bytes.copy st
end

@jhjourdan
Copy link
Contributor

OK, I see your point. But I don't see how to implement splitting by jumping forward if we don't have a piece of shared state to advance. How to make sure we don't get the same PRNG if we split twice?

Perhaps we could expose the jump and long_jump functions instead of a split function. This offers a slightly different API, but, in practice, I don't think it is much more difficult to use (especially if split is not thread safe).

The jump-based implementation of Random.State.split doesn't quite match
the standard notion of PRNG splitting.  Better leave this for future work.
@xavierleroy
Copy link
Contributor Author

I agree I shot myself in the foot with this jump-based, not-quite-right implementation of splitting. I just pushed a new version of this PR that exposes no split function, just a jump function that operates on the default state exclusively (I'm not sure jumping from an arbitrary Random.State.t is useful).

@xavierleroy
Copy link
Contributor Author

I would expect the Haskell people to have thought about this problem and come up with good solutions, since obviously they care about observable purity of the generator state

AFAIK they use the same trick that PRINGO's Chacha-based generator uses: from the current N-bits state, draw N pseudorandom bits and use that as the state for the new PRNG returned by split. There's a semi-formal argument that this construction is safe (there are no significant correlations between the new PRNG and the original PRNG) if the PRNG is cryptographically strong.

SplitMix is not cryptographically strong, so it takes some extra steps on top of the construction above to avoid producing a weak PRNG by splitting.

@xavierleroy
Copy link
Contributor Author

If you're interested I can share my code ("pure OCaml" might be a stretch, as it uses the %caml_bytes_get64u and %caml_bytes_set64u primitives).

Strange choice. These primitives are very slow on machines that don't support unaligned memory accesses. What about a bigarray-based representation (as in my code) plus unchecked bigarray accesses? That should produce nice code too, while avoiding alignment issues.

@lthls
Copy link
Contributor

lthls commented Oct 15, 2021

Strange choice. These primitives are very slow on machines that don't support unaligned memory accesses. What about a bigarray-based representation (as in my code) plus unchecked bigarray accesses? That should produce nice code too, while avoiding alignment issues.

I forgot about that actually. A bigarray-based representation should work too; you would only have to change the get and set functions and a few calls to Bytes functions. I expect that the result will be slightly slower because of the extra indirection, but I'm not sure I'll be able to observe it.

By the way, since all of the accesses are actually aligned on 64 bits, I would have liked the compiler to notice it and generate good code even on architectures that don't support unaligned access (at least for the accesses with constant indices), but apparently this optimisation isn't done at the moment.

Finally, while this code should be 100% compatible with Js_of_ocaml (as long as we're using the Bytes version), it is going to be very slow on 32-bit native architectures as every Int64 operation goes through a C call.

@jhjourdan
Copy link
Contributor

I agree I shot myself in the foot with this jump-based, not-quite-right implementation of splitting. I just pushed a new version of this PR that exposes no split function, just a jump function that operates on the default state exclusively (I'm not sure jumping from an arbitrary Random.State.t is useful).

Indeed, I think this is better.

However, I still have a small improvement to suggest: the current implementation adds an additional internal state, default_origin, which is difficult to control from users. It is modified by full_init, but not by set_state, which might be confusing for users who desire reproducibility. Instead, I would propose jump to return the current internal state and make the internal state jump. That way, we don't rely on an additional internal state, and every state returned by jump is effectively statistically independent with anything else.

Also, even if jumping might not be useful on arbitrary state, it is neither very costly to provide it (the function already exists...).

@gasche
Copy link
Member

gasche commented Oct 19, 2021

I have mixed feelings about exposing jump directly:

  1. It means that we won't be able in the future to switch to a generator that does not allow jumping. This may not be an issue, if we feel certain that jumping is an important primitive for a good-quality PRNG, but it's worth thinking about.

  2. The more important issue is that, if I understand correctly, we don't actually know how to implement split from jump. This is probably bad news for the QuickCheck use-case, which I suppose really relies on split. This is probably fine for Multicore, as jumping by 2^128 on each new domain sounds reasonable, but I would like to work the details out.

For the Multicore use-case, initially I thought that the spawning domain would split its generator, with the new state going to the spawned domain. With this approach it's not clear how to jump, because it is not the case that only the first domain will spawn new domains, we effectively have an arbitrary tree of spawns (so for example jumping by a fixed amount on domain creation, such as 2^128, means that sometimes two new domains will end up in the same state).
Another idea is to use global synchronized state, for example have a locked global generator that jumps on every new domain creation. This works, and having a lock is not an issue as long as it's only on domain creation. But it also makes it more work to implement, especially if we want a library-side implementation (instead of modifying the global runtime to support the PRNG).

@gasche
Copy link
Member

gasche commented Oct 19, 2021

To clarify: I think that the current API is probably fine, and I'm not worried about the Multicore use-case which I think should be the primary concern. I just think that it's worth making sure that we really want to give up on split for the standard generator: it's reasonable (we never supported splitting in the first place, and my understanding is that we don't fully trust the statistical guarantees of splittable gnerators), but has downsides.

As suggested by @jhjourdan, it's simpler to take a copy of the current
default state then jump it.
@xavierleroy
Copy link
Contributor Author

I would propose jump to return the current internal state and make the internal state jump.

Nice simplification! Implemented in aa36f91 .

@xavierleroy
Copy link
Contributor Author

Another idea is to use global synchronized state, for example have a locked global generator that jumps on every new domain creation. This works, and having a lock is not an issue as long as it's only on domain creation.

That was my original plan, using a "long jump" (advancing by 2^192 steps) instead of a "jump" (2^128 steps), so that each domain can still jump to its heart's content.

But it also makes it more work to implement, especially if we want a library-side implementation (instead of modifying the global runtime to support the PRNG).

Not sure a modification is needed in the runtime. I thought that domain-local variables can have an initializer function (in OCaml) that is called when a new domain is created, to initialize the domain-local variable. Here, the state of the "basic generator" would be domain-local, and would be initialized as you describe (lock global generator, copy it, longjump it, unlock), all in OCaml.

@alainfrisch
Copy link
Contributor

(lock global generator, copy it, longjump it, unlock)

I did not follow all the discussion, and I'm not an expert on this topic, so perhaps my question is really stupid, but : I thought that using PRNG with an explicit state and splitting (rather than re-initializing) should allow to get a deterministic behavior for pseudo-random numbers, even in presence of multi-domain parallelism; but using a global generator introduces a racy semantics (the pseudo-random numbers drawn by a given domain depend on when the domain was started, compared to other domains). Isn't it problematic?

@gasche
Copy link
Member

gasche commented Oct 27, 2021

Yes indeed, I believe @alainfrisch is right: one good reason to avoid global synchronization is to make random draws depend only on the code executed on each domain, and not their scheduling. (The code executed on each domain may itself depend on the scheduling.)

@xavierleroy
Copy link
Contributor Author

xavierleroy commented Oct 28, 2021

I'd like to better understand the point @alainfrisch is trying to make.

The "(lock global generator, copy it, longjump it, unlock)" proposal is for the default behavior of programs that create new domains but don't do anything special to control the PRNG. It is a better alternative to reseeding the PRNG from system data each time a domain is created, like Multicore OCaml does today. (Better because faster and with stronger independence guarantees.) Neither approach is deterministic, but I don't think determinism can be achieved here at all given that the creation of domains is itself nondeterministic (depends on scheduling) in the general case.

A program that wants to control multiple PRNGs deterministically can do it with jumps and the proposed API, there's often no need for splitting. Consider

let main () =
  let r1 = Random.jump() in let r2 = Random.jump() in 
  let d1 = Domain.spawn (fun () -> ... use Random.State functions with r1 ... ) in
  let d2 = Domain.spawn (fun () -> ... use Random.State functions with r2 ...) in ...

The only thing you cannot do with jumps is an arbitrarily-deep deterministic tree of PRNGs. jump gives you one level of splitting, jump and longjump give two, but split is required to get arbitrarily-many levels.

Was this the point being made?

@alainfrisch
Copy link
Contributor

The only thing you cannot do with jumps is an arbitrarily-deep deterministic tree of PRNGs. jump gives you one level of splitting, jump and longjump give two, but split is required to get arbitrarily-many levels.

Indeed, I had in mind calling jump from each sub-domain and I was considering the nested case.

Now, I'm wondering why jump is only provided for the global "default" state. Wouldn't it be useful to expose also a version with an explicit input state?

@gasche
Copy link
Member

gasche commented Oct 28, 2021

@xavierleroy There is a very natural implementation of PRNG initialization on Domain.spawn using split: the spawning domain splits its state in two, and gives one of the split states to the spawned domain. But given that any domain can spawn a new domain, this does require supporting a "tree of PRNGs", so it cannot be done easily with jump. Unlike the synchronized-jump approach, this implementation is deterministic -- the generator state in each domain depends only on the code that was executed in this domain's past (including parent domains until spawn), not on scheduling.

In practice, the "deterministic trees of PRNGs" used for domains have relatively few nodes, but they may be very unbalanced (for example: the first domain creates all other domains). I would expect approaches based on "random jumps in the state space" to give very good statistical properties.

@xavierleroy
Copy link
Contributor Author

xavierleroy commented Oct 28, 2021

Now, I'm wondering why jump is only provided for the global "default" state. Wouldn't it be useful to expose also a version with an explicit input state?

There's a high risk of getting the same PRNG state by two apparently-different paths. E.g. if you have state A, jump it to get state B, then jump B to get state C, you end up with A = B.

@xavierleroy
Copy link
Contributor Author

the spawning domain splits its state in two, and gives one of the split states to the spawned domain.

Note that Multicore OCaml currently gives no simple way to pass some data (here the split state) from the parent domain (the one that calls Domain.spawn) to the spawned domain (the one created by Domain.spawn). So, we'd need to work harder here.

If you have such a mechanism, the "lock global generator, copy it, longjump it, unlock" sequence could be run in the parent domain instead of the spawned domain, making PRNG states a bit more deterministic.

@gasche
Copy link
Member

gasche commented Oct 28, 2021

So, we'd need to work harder here.

Yes, we would generalize the domain-local-state module to move from a unit -> 'a "initializer" for new domains to something closer to 'a * ('a -> 'a). See also ocaml-multicore/ocaml-multicore#582 (comment) from @lpw25.

@alainfrisch
Copy link
Contributor

There's a high risk of getting the same PRNG state by two apparently-different paths. E.g. if you have state A, jump it to get state B, then jump B to get state C, you end up with A = B.

I understand the risk, but forcing to use the global state is also risky; any third-party library can think it's a good idea to use that global state as well, making your code (which needs to behave deterministically) unreliable.

Btw, just to illustrate that I'm a complete outsider to this discussion: what's the problem of implementing a split with State.make applied to some numbers drawn from the initial state (possibly applying some cryptographic digest)? Are there (known or possible) problems such as some correlation between the new state and the previous one? Or is it considered to be too slow?

@xavierleroy
Copy link
Contributor Author

I understand the risk, but forcing to use the global state is also risky; any third-party library can think it's a good idea to use that global state as well, making your code (which needs to behave deterministically) unreliable.

Good statistical properties of the random numbers are more important than determinism. Don't force us into an inferior solution because you're obsessed with determinism.

Btw, just to illustrate that I'm a complete outsider to this discussion: what's the problem of implementing a split with State.make applied to some numbers drawn from the initial state (possibly applying some cryptographic digest)? Are there (known or possible) problems such as some correlation between the new state and the previous one? Or is it considered to be too slow?

This is essentially the Haskell / Pringo+Chacha solution I mentioned previously. It's rumoured to be OK statistically speaking if the PRNG is cryptographically strong. If it is not, like Xoshiro, there's always a risk that a long sequence of splits will converge to a not-at-all-random state of the PRNG. Speed is not the issue here; randomness is.

@xavierleroy
Copy link
Contributor Author

Closing in favor of #10742.

@xavierleroy xavierleroy closed this Nov 1, 2021
@curiousleo
Copy link
Contributor

curiousleo commented Nov 14, 2021

I would expect the Haskell people to have thought about this problem and come up with good solutions, since obviously they care about observable purity of the generator state

AFAIK they use the same trick that PRINGO's Chacha-based generator uses: from the current N-bits state, draw N pseudorandom bits and use that as the state for the new PRNG returned by split. There's a semi-formal argument that this construction is safe (there are no significant correlations between the new PRNG and the original PRNG) if the PRNG is cryptographically strong.

The Haskell people then thought about this some more and realised that the semi-formal argument wasn't good enough (or rather, assuming that the PRNG being used -- L'Ecuyer 1988 -- was cryptographically strong was too much of a stretch)!

The handmade split in Haskell's random prior to v1.2.0 did not result in independent sequences: haskell/random#25.

The solution we settled on was to switch to SplitMix: idontgetoutmuch/random#22. (This repo was used to collaborate on random v1.2.0. Official release notes for v1.2.0 are here: https://github.com/haskell/random/blob/master/CHANGELOG.md#120).

A few things to mention here:

  • Back when we worked on random v1.2.0, I looked around a bit at other PRNG APIs, perhaps some of the pointers come in handy here: Prior art: API design idontgetoutmuch/random#13 and Seeding idontgetoutmuch/random#18
  • To test various PRNGs, we put together a Nix setup with PRNG test batteries. If you have Nix installed, this makes it very easy to run PractRand (recommended), TestU01, dieharder, and a couple of other tests: https://github.com/tweag/random-quality
  • I created a branch over the weekend where PCG is used in stdlib/random.ml. I was just about to write a post about it on the forum when I realised that this would be affected by the Multicore effort, and found this thread. I'd quite like to add an implementation of PCG-DXSM to Pringo to see how it compares.

@gasche
Copy link
Member

gasche commented Nov 14, 2021

@curiousleo thanks for the feedback! If you are interested in the OCaml-side discussion, I would recommend looking at RFCs#28 which is somehow the "meta-issue" for this PR and the followup #10742. (The discussion is spread between the three issues, which makes it a bit difficult to follow, but oh well.)

Sebastiano Vigna, a co-author of Xoshiro, has a rather negative-sounding page about PCG (The wrap-up on PCG generators). This being said, Melissa O'Neil, the author of PCG generators, has a negative-sounding page about Xoshiro (Exploring Xoshiro's close-repeat flaws). I guess that it is difficult for non-experts to sort out what do believe about those generators :-)

(For me the main conclusion of this cross-issue/PRs discussion is that we really want a split function, not just jump; I think that it would be reasonable to use any well-tested splittable PRNG for this, and we can always change algorithms later if the random-generation community decides on something else. LXM, used in the latest proposal by @xavierleroy, seems to qualify as a good proposal in this sense.)

@curiousleo
Copy link
Contributor

curiousleo commented Nov 15, 2021

If you are interested in the OCaml-side discussion, I would recommend looking at RFCs#28 which is somehow the "meta-issue" for this PR and the followup #10742. (The discussion is spread between the three issues, which makes it a bit difficult to follow, but oh well.)

Thank you! I found the RFC and other PR right after posting on this closed PR, where visibility is probably lowest. I intend to catch up on the discussions you linked to over the next few days.

Sebastiano Vigna, a co-author of Xoshiro, has a rather negative-sounding page about PCG (The wrap-up on PCG generators). This being said, Melissa O'Neil, the author of PCG generators, has a negative-sounding page about Xoshiro (Exploring Xoshiro's close-repeat flaws). I guess that it is difficult for non-experts to sort out what do believe about those generators :-)

I'm aware -- and I agree that it's tricky to figure out how significant those weaknesses are. Both are in use: Lua 5.4 switched to Xoshiro, numpy uses PCG by default.

(For me the main conclusion of this cross-issue/PRs discussion is that we really want a split function, not just jump; I think that it would be reasonable to use any well-tested splittable PRNG for this, and we can always change algorithms later if the random-generation community decides on something else. LXM, used in the latest proposal by @xavierleroy, seems to qualify as a good proposal in this sense.)

Agreed again! If in the end people want to split their PRNGs, I think it's a good idea to put thought and effort into a good split function. Letting users figure out how to split only given jump sounds like a loaded footgun ready to misfire. (This discussion is the first time I've heard of LXM, another PRNG to check out!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants