MAINT: remove xoshiro* BitGenerators #13793

mattip · 2019-06-16T11:05:05Z

In order to ground the discussion about which BitGenerator to include/remove, this PR suggests removing Xoshiro512 and Xoshiro256 and changing the default to PCG64. Quoting @rkern (hopefully not taken too much out of context):

> Possibly irrationally (I'll leave it to others to judge), I just don't want Xoshiro in numpy. I used to use the predecessors of this algorithm in C code where I needed a good, but short implementation, and the author's claims of statistical quality were enticing. But then other people ran the tests and found that they failed, for a couple of generations of this lineage of algorithms. Having been burnt by this family of algorithms once, I'm not eager to give it pride of place in numpy. While this particular member passes the current tests, analysis shows some weird behaviors that could easily become a future failure.

Any other thoughts?

mattip · 2019-06-16T11:05:53Z

There are still a few references in the performance tables that I will remove if the direction is to accept this PR.

mattip · 2019-06-16T17:42:28Z

xref #13675 and @rkern's comment there

mattip · 2019-06-20T08:53:19Z

I have moved the code to numpy/bitgenerators. As of today it is not yet building; I hope to fix that over the next few days. Having the code there means this PR can be merged without the code getting lost.

bashtage · 2019-06-20T12:00:00Z

I'm fine with dropping Xoshiro512. The extra state doesn't add anything important and it is slower.

I am on the other side for Xoshiro256. It is a fast (fastest among the remaining ones) generator that has good performance across many platforms, and in particular, doesn't require uint128 or a custom implementation. It is able to pass tests under a wide range of specifications.

charris · 2019-06-20T15:28:51Z

Needs rebase. I'm not greatly opposed to keeping one of these, but only because it seems to be popular. The main considerations for dropping both would be: one, is it needed, and two, @rkern seems to distrust the series in general. Its history does look like a series of hacks to improve its statistical qualities.

rkern · 2019-06-22T03:32:33Z

> It is a fast (fastest among the remaining ones) generator that has good performance across many platforms, and in particular, doesn't require uint128 or a custom implementation.

I would still prefer JSF64 to satisfy those requirements. In my tests (64-bit Linux only, I'm afraid), it had the same performance as Xoshiro256.

❯ python benchmark.py
--------------------------------------------------------------------------------

Time to produce 1,000,000 Uniforms
************************************************************
JSF64          3.05 ms
MT19937        8.51 ms
PCG32          4.63 ms
PCG64          4.52 ms
Philox         8.65 ms
ThreeFry      10.37 ms
Xoshiro256     3.07 ms
numpy          8.34 ms
dtype: object

Uniforms per second
************************************************************
JSF64         327.89 million
MT19937       117.45 million
PCG32         216.09 million
PCG64         221.37 million
Philox        115.57 million
ThreeFry       96.41 million
Xoshiro256    326.01 million
numpy         119.90 million
dtype: object

Speed-up relative to NumPy
************************************************************
JSF64         173.5%
MT19937        -2.0%
PCG32          80.2%
PCG64          84.6%
Philox         -3.6%
ThreeFry      -19.6%
Xoshiro256    171.9%
dtype: object
--------------------------------------------------------------------------------
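
As a cross-check on how the three tables relate, the second and third can be re-derived from the first. This is only a sketch: it uses the rounded millisecond figures printed above, so results can differ from the tables in the last digit, and the helper names are illustrative rather than taken from benchmark.py.

```python
# Timings copied from the "Time to produce 1,000,000 Uniforms" table (ms).
times_ms = {
    "JSF64": 3.05, "MT19937": 8.51, "PCG32": 4.63, "PCG64": 4.52,
    "Philox": 8.65, "ThreeFry": 10.37, "Xoshiro256": 3.07, "numpy": 8.34,
}
N_DRAWS = 1_000_000


def millions_per_second(ms):
    """Draws per second, in millions, when N_DRAWS take `ms` milliseconds."""
    return N_DRAWS / (ms / 1000.0) / 1e6


def speedup_vs_numpy(ms):
    """Percentage speed-up relative to the legacy `numpy` row."""
    return 100.0 * (times_ms["numpy"] / ms - 1.0)


rates = {name: millions_per_second(ms) for name, ms in times_ms.items()}
speedups = {
    name: speedup_vs_numpy(ms)
    for name, ms in times_ms.items()
    if name != "numpy"  # the baseline is not compared against itself
}

for name in sorted(speedups):
    print(f"{name:<12}{rates[name]:8.2f} million/s {speedups[name]:7.1f}%")
```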

bashtage · 2019-06-22T07:03:49Z

JSF64, at least the one in the 2009 article, only has a 64 bit seed space which may be too small to consider for sequenced generators.


rkern · 2019-06-22T17:16:25Z

You don't have to seed it exactly as in the article; C is very limiting. I implemented a 192-bit seed (mattip#41).

bashtage · 2019-06-22T17:31:37Z

The challenge of seeding from a larger space is that less is known about its short-cycle properties. The 32-bit version has been tested fairly thoroughly and doesn't seem to have any short cycles when seeded using 32 bits. There appear to be some very short cycles possible, so it seems risky to come up with a second ad hoc seeding scheme.


rkern · 2019-06-22T17:58:19Z

The single-number seeding of jsf32 was thoroughly studied by iterating over all of the 32-bit inputs. jsf64 was not studied in that way; even a single 64-bit integer is too big a space to enumerate. But the math about these cycles is well understood, and there is no cause to worry if we use a good entropy-processor like SeedSequence.

But if it still worries you, then I also provide the related SFC64, which incorporates a 64-bit counter, so that even if you are extraordinarily unlucky in your seeding, the period is at least 2**64.
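
To make the counter argument concrete, here is a minimal pure-Python sketch of an SFC64-style step. The shift constants (11, 3, 24), the 12-round warm-up, the starting counter of 1 and the three-word seeding are assumptions based on the commonly circulated reference code; this is not the implementation proposed for numpy. The point it illustrates is that the counter is part of the state and advances by exactly one per step, so the full state cannot repeat in fewer than 2**64 steps.

```python
MASK64 = (1 << 64) - 1


def rotl64(x, k):
    """Rotate a 64-bit value left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64


class SFC64Sketch:
    """Illustrative SFC64-style generator (hypothetical class name)."""

    def __init__(self, a, b, c, counter=1, warmup=12):
        self.a, self.b, self.c = a & MASK64, b & MASK64, c & MASK64
        self.counter = counter & MASK64
        for _ in range(warmup):  # discard early outputs to mix the seed words
            self.next64()

    def next64(self):
        out = (self.a + self.b + self.counter) & MASK64
        # The counter advances by one on every call, whatever a, b, c hold.
        self.counter = (self.counter + 1) & MASK64
        self.a = self.b ^ (self.b >> 11)
        self.b = (self.c + (self.c << 3)) & MASK64
        self.c = (rotl64(self.c, 24) + out) & MASK64
        return out


gen = SFC64Sketch(1, 2, 3)
start = gen.counter
draws = [gen.next64() for _ in range(5)]
assert gen.counter == (start + 5) & MASK64
```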

bashtage · 2019-06-22T20:46:10Z

SFC is certainly attractive since it provides guarantees, which are always desirable, and 2**64 is enough. If it had a jump/advance then I think it would dominate. I do believe that first-class citizens should all provide this alternative (and widely used) method for generating distinct sequences.
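
For context on what jump/advance means in practice, here is a sketch for a generator whose state transition is a plain LCG, the kind of transition the PCG family builds on. Because each step is an affine map, n steps can be composed in O(log n) multiplications by repeated squaring. The 64-bit modulus and Knuth's MMIX constants are illustrative choices, not the parameters of any BitGenerator in this PR.

```python
MASK64 = (1 << 64) - 1
MULT = 6364136223846793005  # Knuth's MMIX multiplier (illustrative choice)
INC = 1442695040888963407   # Knuth's MMIX increment (illustrative choice)


def lcg_step(state):
    """One ordinary step of the LCG modulo 2**64."""
    return (state * MULT + INC) & MASK64


def lcg_advance(state, delta):
    """Return the state reached after `delta` steps, in O(log delta) work."""
    acc_mult, acc_inc = 1, 0        # accumulated map, starts as the identity
    cur_mult, cur_inc = MULT, INC   # map for 2**i steps, starts as one step
    while delta > 0:
        if delta & 1:
            # Compose the accumulated map with the current 2**i-step map.
            acc_mult = (acc_mult * cur_mult) & MASK64
            acc_inc = (acc_inc * cur_mult + cur_inc) & MASK64
        # Square the current map: m*(m*x + c) + c = m*m*x + (m + 1)*c.
        cur_inc = ((cur_mult + 1) * cur_inc) & MASK64
        cur_mult = (cur_mult * cur_mult) & MASK64
        delta >>= 1
    return (acc_mult * state + acc_inc) & MASK64


state = 42
stepped = state
for _ in range(1000):
    stepped = lcg_step(stepped)
assert lcg_advance(state, 1000) == stepped

# A jump far too long to reach by stepping is just as cheap.
far = lcg_advance(state, 1 << 40)
```

Distinct sequences can then be handed out by jumping a common starting state by a large fixed stride, which is the method referred to above.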

bashtage · 2019-06-22T21:12:19Z

I added timings in the other thread. Based on these, a jumpable SFC64 might be the best choice for a default :-).

mattip · 2019-06-24T17:46:49Z

@rkern, @bashtage: I understand merging this (removing these BitGenerators) is acceptable?

imneme · 2019-06-24T20:16:54Z

Just to be clear, I don't think we have any evidence (except for a copy-paste-o comment from me) that SFC is jumpable. But jumping isn't a feature most people need.

rkern · 2019-06-25T20:13:06Z

> @rkern, @bashtage: I understand merging this (removing these BitGenerators) is acceptable?

It's acceptable to me.

mattip · 2019-06-25T20:17:16Z

I will rebase this after #13833 is merged

charris · 2019-06-25T20:25:37Z

#13833 is merged.

mattip · 2019-06-25T21:55:37Z

rebased, tests passing

charris · 2019-06-25T22:05:21Z

Thanks Matti.

vigna · 2020-05-19T20:31:58Z

I was just pointed at this discussion by a user. Three points that might be of interest:

1. The statistical problems of weak scramblers (xoroshiro128+, etc.) have always been well-known and documented: see http://xoshiro.di.unimi.it/lowcomp.php . I never understood why people rediscovering hot water made such a big fuss of it. The few lower bits have some linear dependencies, but GCC, Python, etc. use a generator in which all bits have linear dependencies, and nobody cares. Generators using strong scramblers (xoshiro256++, xoshiro256**, etc.) have no such dependency and pass all tests I'm aware of. If you use just the upper bits of a weakly scrambled generator (i.e., to generate floats) everything is fine with BigCrush.
2. SFC64 is not jumpable. There is no way to iterate the next-state function multiple times. You can jump at random into multiple points of the state space, though, and hope that the resulting sequences do not overlap. The chance that this happens is very low, so low as to be negligible. If you want to nitpick, that probability is much lower for full-period generators such as xoshiro/xoroshiro with the same amount of state. If you're OK with not having full period, I think SFC64 is some of the best you can find around in terms of speed and statistical strength.
3. PCG generators have a lot of major statistical problems of self-correlation. You will not see this by a first-order statistical test, you must do second-order testing. In particular, there are non-overlapping sequences that are strongly correlated, and "streams" generated by changing the constant of the underlying LCG are strongly correlated. You can find examples here: http://prng.di.unimi.it/pcg.php . There has been ample discussion about these problems in the Rust community, the Julia community, Apache Commons, etc. A few pointers:

rust-random/rand#905
rust-random/rand#907
https://issues.apache.org/jira/browse/RNG-123

If you're not using the "streams" features, of course, you're free from that type of correlation. But, still, the orbit of a PCG generator has tons of pairs of correlated nonoverlapping subsequences (see the examples pointed to above). This does not happen with a good linear generator (even better, if scrambled), or in fact, with any good generator. The problem is that LCGs with power-of-2 moduli are terrible generators, and the superficial scrambling performed by most PCG generators does not hide this fact.
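
One well-known property behind the remark about power-of-2 moduli can be checked directly: in a full-period LCG modulo 2**64, the low k bits of the raw state repeat with period 2**k. The sketch below demonstrates only that property of the raw LCG state, using Knuth's MMIX constants as an illustrative choice. It is not PCG, which applies an output permutation on top of the state, and it does not test the second-order correlations described above.

```python
MASK64 = (1 << 64) - 1
MULT = 6364136223846793005  # Knuth's MMIX multiplier (illustrative choice)
INC = 1442695040888963407   # Knuth's MMIX increment (illustrative choice)


def lcg_states(seed, n):
    """Return n successive raw states of the LCG modulo 2**64."""
    state = seed & MASK64
    states = []
    for _ in range(n):
        state = (state * MULT + INC) & MASK64
        states.append(state)
    return states


def low_bits_period(values, k):
    """Smallest p for which the low k bits of `values` repeat with period p."""
    low = [v & ((1 << k) - 1) for v in values]
    for p in range(1, len(low) // 2 + 1):
        if all(low[i] == low[i + p] for i in range(len(low) - p)):
            return p
    return None  # no period found within the sample


states = lcg_states(seed=12345, n=256)
periods = {k: low_bits_period(states, k) for k in range(1, 6)}
print(periods)
```

The lowest bit simply alternates, which is why raw low bits of such an LCG are never used as output directly.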

charris · 2020-05-19T21:50:17Z

@vigna You can see a more extended discussion at #13635.

mattip added the component: numpy.random label Jun 16, 2019

mattip force-pushed the remove-xoshiro branch from 75b0066 to 9dce94b Compare June 16, 2019 11:54

charris added this to the 1.17.0 release milestone Jun 16, 2019

charris added 54 - Needs decision 03 - Maintenance labels Jun 16, 2019

mattip force-pushed the remove-xoshiro branch 2 times, most recently from 5440bd9 to 791b87a Compare June 20, 2019 09:04

mattip force-pushed the remove-xoshiro branch 2 times, most recently from 4e34710 to aa6a476 Compare June 24, 2019 16:01

mattip force-pushed the remove-xoshiro branch from aa6a476 to cd0a8a0 Compare June 25, 2019 20:38

mattip mentioned this pull request Jun 25, 2019

ENH: use SeedSequence instead of seed() #13780

Merged

MAINT: remove xoshiro* BitGenerators

02f63e0

mattip force-pushed the remove-xoshiro branch from cd0a8a0 to 02f63e0 Compare June 25, 2019 21:08

charris merged commit 8bb4645 into numpy:master Jun 25, 2019

mattip deleted the remove-xoshiro branch August 8, 2019 17:20

zhudotexe mentioned this pull request Sep 21, 2021

[Enhancement] Random.py Package Improvement avrae/d20#7

Open
