
Consider a faster alternative algorithm for random.shuffle() #108598

Open
d-rideout opened this issue Aug 29, 2023 · 72 comments
Labels
3.11 (only security fixes), 3.12 (bugs and security fixes), 3.13 (new features, bugs and security fixes), docs (Documentation in the Doc dir), stdlib (Python modules in the Lib dir)

Comments

@d-rideout

d-rideout commented Aug 29, 2023

The documentation for the shuffle() function in the random module expresses concern that, for sequences of length larger than 2080, the shuffle() function is not able to produce all permutations of the sequence. While true, I find the emphasis on the finite period of the Mersenne Twister random number generator misleading, since the generator is only called n-1 times for a sequence of length n. The real issue is that any algorithm based on a sequence of calls to a pseudorandom number generator will not be able to generate more permutations than the generator has internal states.

(I also wonder how large of a sequence can be sent to shuffle() in practice, to be guaranteed a uniform distribution over permutations. Is 2079 small enough? 2078? Might there be a check for this?)

Linked PRs

@d-rideout added the docs label Aug 29, 2023
@AA-Turner added the stdlib, 3.11, 3.12, and 3.13 labels Aug 29, 2023
d-rideout added a commit to d-rideout/cpython that referenced this issue Aug 29, 2023
@AA-Turner
Member

@d-rideout we'd welcome a suggested rephrasing of the offending paragraph if you have one, please!

Quoting for posterity

Note that even for small len(x), the total number of permutations of x can quickly grow larger than the period of most random number generators. This implies that most permutations of a long sequence can never be generated. For example, a sequence of length 2080 is the largest that can fit within the period of the Mersenne Twister random number generator.
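
A quick way to sanity-check the 2080 figure is to find the largest n whose factorial still fits within the Twister's period; a minimal sketch:

period = 2**19937 - 1   # Mersenne Twister period
f = n = 1
while f * (n + 1) <= period:
    n += 1
    f *= n
print(n)  # 2080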

On your aside, the tests for shuffle() don't seem to exercise very long sequences, only checking that a sequence of one thousand elements is different when shuffled.

def test_shuffle(self):
shuffle = self.gen.shuffle
lst = []
shuffle(lst)
self.assertEqual(lst, [])
lst = [37]
shuffle(lst)
self.assertEqual(lst, [37])
seqs = [list(range(n)) for n in range(10)]
shuffled_seqs = [list(range(n)) for n in range(10)]
for shuffled_seq in shuffled_seqs:
shuffle(shuffled_seq)
for (seq, shuffled_seq) in zip(seqs, shuffled_seqs):
self.assertEqual(len(seq), len(shuffled_seq))
self.assertEqual(set(seq), set(shuffled_seq))
# The above tests all would pass if the shuffle was a
# no-op. The following non-deterministic test covers that. It
# asserts that the shuffled sequence of 1000 distinct elements
# must be different from the original one. Although there is
# mathematically a non-zero probability that this could
# actually happen in a genuinely random shuffle, it is
# completely negligible, given that the number of possible
# permutations of 1000 objects is 1000! (factorial of 1000),
# which is considerably larger than the number of atoms in the
# universe...
lst = list(range(1000))
shuffled_lst = list(range(1000))
shuffle(shuffled_lst)
self.assertTrue(lst != shuffled_lst)
shuffle(lst)
self.assertTrue(lst != shuffled_lst)
self.assertRaises(TypeError, shuffle, (1, 2, 3))

cc: @rhettinger & @mdickinson as the listed random experts

A

@tim-one
Member

tim-one commented Aug 29, 2023

I don't see a good reason to change this.

There are, in general, no wholly "safe" lengths, in the sense of making all permutations equally likely. For a PRNG with P states, and a fixed list length K, there are P possible outcomes from shuffling the list (one for each possible starting state), so some outcomes are necessarily more common than others unless K! divides P exactly, and assuming all internal states are equally likely at the start.
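
As a toy illustration, one can enumerate every starting state of a tiny hypothetical generator and count how often each permutation of a 3-element list comes out; with, say, P = 13 states and 3! = 6 permutations, the counts cannot possibly all be equal:

from collections import Counter

P = 13  # states of a toy generator, purely for illustration

def toy_next(state):
    return (5 * state + 3) % P   # a tiny LCG-style state transition

def shuffle_from(state):
    xs = [0, 1, 2]
    for i in range(len(xs) - 1, 0, -1):
        state = toy_next(state)
        j = state % (i + 1)      # crude stand-in for _randbelow(i + 1)
        xs[i], xs[j] = xs[j], xs[i]
    return tuple(xs)

print(Counter(shuffle_from(s) for s in range(P)))  # 13 outcomes over 6 permutations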

So the Wikipedia article on "Fisher–Yates shuffle" (which Python implements) says - but without quantitative supporting argument:

"""
Thus, to minimize bias, the number of states of the PRNG should exceed the number of permutations by at least several orders of magnitude.
"""

When the Python docs were first written, Python used a Wichmann-Hill generator, with far smaller state space. Then even all permutations of a deck of cards were far out of reach, which I'm told some people had actually noticed in practice.

That gross problems are far harder to provoke today doesn't mean they can/should be ignored. But neither are the Python docs the place for extended discussion of possible problems.

@d-rideout
Author

I see. Thank you for the clarifications.

There appear to be numerous unstated assumptions being made in all this discourse, including in the Wikipedia article that you cite. And I appreciate your admonition that the Python docs are not the correct place to iron out these ambiguities. However, I feel that the documentation should be correct, in the sense that the output of the algorithm should be as claimed.

I submitted the following rewording as a pull request, which is cited above, before reading your response.

Note that the total number of permutations of *x* grows rapidly with its length,
such that the number of permutations can easily exceed the number of internal states of
a pseudo random number generator. This implies that most permutations of a long sequence
can never be generated. For example, a sequence of length greater than 2080 can not
be sampled uniformly by the Mersenne Twister random number generator.

How about the following, as a second, more accurate, attempt?

Note that generating random samples from a large discrete sample space, such as the set of permutations of an even moderately large set of 1000s of elements, using a pseudorandom number generator as the source of randomness, is a delicate business. The distribution of permutations returned by sample(x) is guaranteed to be exactly uniform only if len(x) < 3. If you need exactly uniform samples, try choice(list(itertools.permutations(x))) instead.

@tim-one
Member

tim-one commented Aug 29, 2023

Leaving this to others - I'm not interested in pursuing it. The potential issues are too numerous and subtle to bother trying to be "more accurate" in some uselessly academic sense. It's uselessly academic enough already 😉.

For example, even

The distribution of permutations returned by sample(x) is guaranteed to be exactly uniform only if len(x) < 3

is wrong. Python the language guarantees no such thing. And for CPython using the Mersenne Twister, it's not true. Leaving aside that sample(x) isn't a valid call (perhaps you intended shuffle()? sample() requires a second argument), the Twister has an odd number of reachable states (the all-0 state is in a self-cycle, and cannot be reached from CPython code short of forcing it directly via setstate()). So across all its 2**19937-1 non-degenerate starting states, at best one shuffling of [1, 2] will appear one more time than the other. And it may be worse than that - even if 2! divided P (which it doesn't in this case), that's a necessary condition for uniformity, not necessarily a sufficient condition.

Going on to suggest choice(list(itertools.permutations(x))) as an alternative is worse - it's grossly impractical for all but very small lists. For example, for a 15-element list it would, on a 64-bit box, require over 10 terabytes of RAM just to hold the C-level array of pointers the list() intermediate would need to build.

Another alternative: remove all cautionary text from the docs. Offhand, I can't recall any other language/library writeup that gives any warnings about its shuffling code, despite that they're all vulnerable to the same things. As said before, the Python warnings date back to days when CPython was using a much feebler PRNG. As a practical matter, it may now be doing more harm than good to highlight any cautions.

@d-rideout
Author

Yes, I meant shuffle(x).

The algorithm does not need to return a permutation for every internal state of the PRNG.

The revulsive character of the itertools.permutations(x) suggestion was intentional... But maybe it goes too far.

Thanks for considering it anyway.

@tim-one
Member

tim-one commented Aug 29, 2023

The algorithm does not need to return a permutation for every internal state of the PRNG.

Sorry, I don't understand your intent there. When shuffle() is called, the PRNG does have a specific internal state at the time. That state entirely determines the result. Therefore, if there are P possible internal states, there are P possible results (which, of course, may not all be distinct).

You could, e.g., hypothesize a quite different implementation that, say, looks at the input and goes "oh! this is a list of length 3, and now I'll consult a table indexed by choice(6) to see which of the 6 possible permutations to apply".

I'm only talking about the Fisher-Yates shuffling algorithm we have (and have had since day 1). If you're going on to advocate for a different algorithm, then this is far from just a "doc" issue.

@d-rideout
Author

Okay. I agree with your assessment of Fisher-Yates. I was indeed imagining a more general algorithm, which might circumvent the divisor issue that you raise by a special handling of some of the internal states.

@tim-one
Member

tim-one commented Aug 30, 2023

In the realm of different algorithms, this one comes to mind: pick an int in range(length!), and use an unranking algorithm to drive swaps. Like, e.g., so:

from math import factorial
from random import randrange
def shuffle(xs):
    n = len(xs)
    f = factorial(n)
    r = randrange(f)
    for i in range(n - 1):
        f //= n - i
        d, r = divmod(r, f)
        xs[i], xs[i+d] = xs[i+d], xs[i]

Then uniformity depends entirely on how well randrange() can pick an integer uniformly at random from a potentially very large space.

I never really thought about that. Seems possible that it may hit its own limitations.

In any case, the giant-int arithmetic makes it much slower - it's been running over an hour on my box just now just to try to shuffle a list with a measly million elements 😉.

@pochmann
Contributor

pochmann commented Aug 30, 2023

Seems simpler and a bit faster to divmod by n-i:

def shuffle(xs):
    n = len(xs)
    r = randrange(factorial(n))
    for i in range(n - 1):
        r, d = divmod(r, n-i)
        xs[i], xs[i+d] = xs[i+d], xs[i]

@tim-one
Member

tim-one commented Aug 30, 2023

Seems simpler and a bit faster to divmod by n-i:

Cool! Then perhaps a tinier saving by reversing the loop order (while, e.g., a C compiler would optimize the repeated i+d, CPython won't):

    for i in reversed(range(1, n)):
        r, j = divmod(r, i + 1)
        xs[i], xs[j] = xs[j], xs[i]

But does it actually address the original concerns? After some thought, I don't think so. It's still the case that the outcome is wholly determined by the Twister's initial state, which can have one of "only" 2**19937-1 values.

@d-rideout
Author

d-rideout commented Aug 30, 2023

Thank you everybody!

My main takeaway is a cultural one: The declared output of library routines can not be assumed to be mathematically exact unless the documentation explicitly states otherwise.

I am happy with whatever resolution you all deem is reasonable.

I very much appreciate your thoughts about an exactly uniform algorithm for shuffling. But agree that this is a much larger, separate, topic.


Regarding the proposals for an exactly uniform shuffling algorithm, are these using the system entropy to generate the enormous random numbers? If so, then can't the Fisher-Yates algorithm be adapted to use these random bits as well, and thereby also provide exactly uniform random permutations? (I have not thought through the details carefully.) If not, then won't randrange() run into the same problems as Fisher-Yates was encountering, in its attempt to generate uniform samples from an enormous finite set?

[I just saw the previous post now, which raises the same concern.]

@pochmann
Contributor

But does it actually address the original concerns?

No, was just a little code improvement.

Offhand, I can't recall any other language/library writeup that gives any warnings about its shuffling code, despite that they're all vulnerable to the same things.

Java says:

Randomly permutes the specified list using a default source of randomness. All permutations occur with approximately equal likelihood.

The hedge "approximately" is used in the foregoing description because default source of randomness is only approximately an unbiased source of independently chosen bits. If it were a perfect source of randomly chosen bits, then the algorithm would choose permutations with perfect uniformity.

@tim-one
Member

tim-one commented Aug 30, 2023

I confess I'm still not clear on what the objection is. The docs don't promise mathematical perfection, but explicitly point out one possible class of imperfection, which was of some "practical" importance some years ago.

Fine by me if that was removed entirely. What I'm least keen on is expanding on it, because more than just that can go wrong, and explaining all that is a chore better suited to a paper, or at least a blog post. That, e.g., over the Twister's whole period, one permutation of [1, 2] is necessarily more likely than the other is of absolutely no practical importance. That some permutations of, say, range(3000) can never be generated seems almost as inconsequential.

are these using the system entropy to generate the enormous random number ...

Everything I talked about here was in reference to the shuffle() method of the Random class, all of whose methods build on the Mersenne Twister. The SystemRandom class instead builds on the platform's urandom() function (which, e.g., on Windows Python maps to Microsoft's CryptGenRandom). That inherits most method implementations (including shuffle()'s) from the Random class; only a few core methods appeal to the underlying generator. So, ya, SystemRandom.shuffle() uses Fisher-Yates too, but building on the "most random" source that can be gotten at in a more-or-less portable way. SystemRandom has no concept of "state", and results can't be reproduced except by accident.

But we can't make promises about that either, because we're at the mercy of the platform's implementation of urandom(). Most platforms appear to mix sources of "real" entropy with a "crypto strength" deterministic PRNG, where the "real entropy" can mutate the PRNG's current state on the fly.
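
Concretely, switching to the OS entropy source is just a matter of using a SystemRandom instance; the shuffling algorithm itself is unchanged:

import random

deck = list(range(52))
random.SystemRandom().shuffle(deck)   # same Fisher-Yates, but fed by os.urandom()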

@d-rideout
Author

My reading of the random.shuffle() code seemed to indicate that one could use the Mersenne Twister or urandom() as the source of randomness. Which seems like a good idea, to help separate the discussion of algorithms from discussion of sources of randomness.

I like the hedge "approximately". But defer to your judgement.

In my context I am fairly deeply troubled by some of the permutations of range(3000) being entirely missing from the output. These are the sorts of things that keep me up at night... 😜

@tim-one
Member

tim-one commented Aug 31, 2023

@d-rideout, Java's brief "approximate" dodge read well to me too. Care to take a stab at that?

Note that SystemRandom.shuffle() does use urandom(), so it's as close to perfection as can be gotten in Python without heroic effort (e.g., installing a hardware random-noise chip and writing your own driver for it). But SystemRandom methods run slower, because urandom() is generally slower than the Twister - and, without a concept of state, there's no concept of "seeding" either, so no way to reproduce results later.

I stopped staying up at night over these things a couple decades ago 😉. For example, there are states in the Mersenne Twister where, if you're asking for 32-bit unsigned ints, you can get back exactly zero 600 times in a row. More than 600? Ya, but not a lot more. IIRC, the most often it can happen is 623. Presumably there's no upper bound on how often that can happen building on urandom() instead. But if it was seen to happen even twice in a row, almost anyone would conclude the RNG was garbage (1/2**64 chance "should be the same as impossible").

@pochmann, you like pursuing speed, so I'll note that the last sample code unranking randrange(factorial) actually runs significantly faster than Python's shuffle() for smaller lists. A quick trial showed it running faster for lists of lengths up to 100+ under CPython and up to 60+ under PyPy (but excluding very short lists, like less than 4).

@pochmann
Contributor

Yep, up to a factor of 1.6 faster for me. Most extreme for 12 and 19 elements. Maybe because 12! and 19! are the largest fitting in one and two 30-bit digits, respectively. How much faster does it need to be to have any chance to make it into the stdlib despite producing different shuffles?

@rhettinger
Contributor

How much faster does it need to be to have any chance to make it into the stdlib despite producing different shuffles?

A factor of 1.6x is pretty good and would seem worthwhile to me. However, there is the matter of factorials quickly growing very large and of the time it takes to apply divmod() to huge numbers. Right now, we can shuffle large lists (say 10 million elements) without any memory overhead and without any dependency on large multiprecision arithmetic having a fast implementation. So, I would vote for leaving shuffle() as-is.
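
To put a rough number on that: for a 10-million-element list, the argument to randrange() would itself be an enormous integer (a back-of-the-envelope check using the log-gamma function):

from math import lgamma, log

n = 10_000_000
bits = lgamma(n + 1) / log(2)    # log2(n!)
print(f"n! is about {bits / 8 / 1e6:.0f} MB as an int")   # roughly 27 MB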

Also, I prefer to leave the wording in the doc as-is. That way, people will think about the issue when looking at proposals to substitute other PRNGs with a comparatively tiny state space.

@rhettinger
Contributor

Side note: I vaguely remember years ago there was a successful lawsuit against a gambling device company where certain winning hands from shuffled cards were not possible given that the possible card arrangements exceeded the state space for the underlying PRNG. I think it is reasonable for the docs to point out that large shuffles may not cover all possible permutations.

@tim-one
Member

tim-one commented Aug 31, 2023

Any change along these lines would need to be conditional on the list length n. The unranking approach takes time roughly proportional to the square of n * log(n) (based on a very crude simplification of Stirling's approximation n! ~ (n/e)**n * sqrt(2*pi*n), but also a not-horrible match to observed timings).

So, e.g., shuffle-by-unranking a million-element list takes well over 100 times longer than a 100-thousand element list. Completely impractical.

This just reflects that k-by-1 bigint division takes time proportional to the number of digits "k", and that the number of numerator digits goes down roughly by 1 each time through the loop.

Not even using GMP would help that much. Dividing by a single CPython 30-bit digit is pretty fast already.

But the base idea may be generalizable: in effect, the unranking approach works by encoding the entire sequence of swap indices in a single int, spanning as small a range as possible. There's only one direct call to randrange instead of n-1 direct calls to randbelow. It minimizes the number of times the underlying generator has to be accessed and the number of bits asked of it. But, in return, decoding the giant int becomes ever-more expensive.

"All or nothing" are two extremes. In between, for example, could be repeatedly encoding "just the next two" swap indices in a single int. Then (for most plausible n) we'd be sticking to at worst 2-digit CPython ints, where arithmetic is as fast as it gets.

Or "just the next J" swap indices, where J varies dynamically so that the product of "the next J" fits in 60 bits (2 CPython digits).

But, ya, details left as an exercise for the reader 😆
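
One possible shape for that exercise (a rough sketch only, not measured for speed): draw one randrange() covering the next few swap indices, keeping each encoded chunk under 2**60:

from random import randrange

def shuffle_batched(xs):
    # Encode "the next J" swap indices in one int, where J is chosen so the
    # product of their ranges stays below 2**60 (two 30-bit CPython digits).
    n = len(xs)
    i = n - 1
    while i > 0:
        hi = i
        prod = i + 1
        i -= 1
        while i > 0 and prod * (i + 1) < (1 << 60):
            prod *= i + 1
            i -= 1
        r = randrange(prod)          # one draw covers several swap indices
        for k in range(hi, i, -1):
            r, j = divmod(r, k + 1)  # j is uniform over range(k + 1)
            xs[k], xs[j] = xs[j], xs[k]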

@pochmann
Contributor

@tim-one Yeah I meant using this only for small sizes where it's faster, or a hybrid like you just described, or different optimizations I played with a while back.

@d-rideout
Author

shuffle() doc apology take 3:
"""
Note that generating uniformly random samples from a large finite set, such as the set of permutations of an even moderately long sequence, using a pseudorandom number generator as the source of randomness, is a delicate
business. The permutations generated by shuffle() are therefore only approximately uniform. The probability of a given permutation may differ from the uniform distribution by $n!/p$ or more, depending on details of the pseudorandom number generator, where $p$ is the number of internal states of the generator, and $n =$ len(x). For the Mersenne Twister, which is the current default generator, $p=2^{19937}-1$, which is less than 2081!, so some permutations of a sequence of length 2081 will not occur at all. At the other extreme, $2075! / p < 2 \times 10^{-18}$.
"""

if it was seen to happen even twice in a row, almost anyone would conclude the RNG was garbage (1/2**64 chance "should be the same as impossible").

My extremely rough estimate leads me to not be terribly surprised that some machine somewhere sees two consecutive 32-bit zeros emerging from a perfectly random source of independent random bits, produced at the rate that Mersenne Twister library functions might be called worldwide. But I may be a bit surprised at three! My first reaction might be to try to tighten my estimate.

@pochmann
Contributor

pochmann commented Aug 31, 2023

shuffle() doc apology take 3:

Might just be me, but that seems rather long and far too technical and thereby more confusing/intimidating than helpful. (And "delicate business" sounds strange in the doc.) I much prefer the current text. The only part I might change is the one I bolded:

Note that even for small len(x), the total number of permutations of x can quickly grow larger than the period of most random number generators. This implies that most permutations of a long sequence can never be generated. For example, a sequence of length 2080 is the largest that can fit within the period of the Mersenne Twister random number generator.

That sounds like comparing length 2080 with length 2**19937-1. Might be better to say:

a sequence of length 2080 is the largest whose number of permutations can fit within the period

@d-rideout
Author

As a developer of scientific software, I appreciate understanding this issue better, including how the lack of uniformity transitions from an irrelevant technicality at $n=2075$ to permutations being completely excluded at $n=2081$.

Does documentation like this have footnotes?

@tim-one
Member

tim-one commented Sep 3, 2023

How messy and convoluted are we willing to make the code to save a few cycles?

The speedup is better than a factor of 2 - more than just "a few cycles" 😉.

they trade away both the clean abstraction of _randbelow and the dirt simple accept/reject algorithm.

Almost exactly the same dirt simple accept/reject approach is taken, but it's inlined. It's a trivial 3 lines of code (which could be reduced to two, including a pass statement, using the walrus operator - but why bother?). Part of the savings is from inlining, but more from exploiting (which cannot be done without inlining) that n.bit_length() is invariant across most iterations (it only changes when an index crosses a power-of-2 boundary).

It's only "almost exactly the same" approach because it's actually implementing "rand below or equal" instead of "rand strictly below". That, e.g., spares proposal3b from needing to adjust by +1 or -1 in various places in its innermost loops.

@rhettinger
Contributor

@tim-one Yes, the speed-up is compelling. It's mostly a matter of picking which alternative everyone likes best. Also, it's okay for me to say a little eulogy for the demise of the current, somewhat beautiful code ;-)

@mdickinson
Mark, do you have any thoughts about the different variants?

@tim-one
Member

tim-one commented Sep 3, 2023

@rhettinger, Stefan and I appear to agree that proposal3b or the newcomer proposal3c are the best of the bunch. 3b is very clean and easy to follow, and, in fact, never needs to invoke .bit_length() at all. The handling of endcases is more obscure in 3c, although that could be addressed by adding (a technically redundant) if n <= 1: return near its start. There's no real difference in speed between them.

I agree that the current code is delightful! Except for the "+ 1" 😆 .

@tim-one
Member

tim-one commented Sep 3, 2023

and would use fewer bits from the underlying generator.

Note that while my "pairing" algorithm would consume fewer bits, these other ones don't, really. The pairing algorithm is much slower than Stefan's alternatives, and is actually slower than random.shuffle() under the current PyPy.

Stefan's may ask for a tiny bit less from the Twister because it doesn't suffer _randbelow_with_getrandbits(n)'s slight inelegance of asking for k+1 bits (instead of the sufficient k) when n == 2**k.

I have another method that never asks for an unnecessary bit from the Twister, and never rejects anything, but I've done everyone a favor by not mentioning it before now (heh - but it's not speed-competitive anyway).

@pochmann
Contributor

pochmann commented Sep 5, 2023

Stefan and I appear to agree that proposal3b or the newcomer proposal3c are the best of the bunch.

Yes, I agree (if changing the shuffle results is ok, which it seems to be, as Raymond's new comments mention downsides but not this).

@rhettinger
Contributor

rhettinger commented Sep 9, 2023

The speed improvement is worthwhile. The resampling permutation test in the examples section would show an immediate benefit.

Code clarity (and to some extent brevity) are important as well, so I prefer proposal3c over proposal3b. Also, that variant fits in my head better than the other variants.

Reproducibility is important. In my field (public accounting), reproducible samples are a fundamental requirement. So we should follow the path taken for random.seed() and have an optional version 1 for the current algorithm and a default to version 2 for the new algorithm. That way previous work can still be reproduced when needed. Also, it may help new readers of the code to be able to see the more intelligible non-optimized version that more likely matches what they would see elsewhere.

def shuffle(self, x, version=2):
    if version == 2:
        proposal3c()
    elif version == 1:
        current_fisher_yates()
    else:
        raise ValueError(f'Unknown version:  {version!r}')

@rhettinger
Contributor

rhettinger commented Sep 9, 2023

Adding my timing notes here so they won't get lost. On an M1 Max, _randbelow takes 103 ns for an average case (n=192 being half-way between powers of two). The optimization in proposal3c saves:

  • 20 ns by inlining _randbelow() to avoid the method call overhead
  • 18 ns by avoiding the + 1 in j = randbelow(i + 1)
  • 11 ns by avoiding a call to the bit_length() method

There is some additional O(1) and O(log n) overhead so that the speed benefit is lost in very small shuffles. Bigger shuffles benefit considerably.

@pochmann
Contributor

pochmann commented Sep 9, 2023

I just realized I hadn't written a for k in range(...) version equivalent to the current code, i.e., resulting in the same shuffles.

Here are minor variations of that, in case that's still interesting, as it combines reproducibility with the style of proposal3c, and is only a bit slower than that. The i-range is really ugly, though.

Benchmark with 100000 elements:
 14.95 ± 0.20 ms  proposal3c
 16.00 ± 0.39 ms  proposal2c3
 16.14 ± 0.23 ms  proposal2c2
 16.41 ± 0.42 ms  proposal2c1
 36.85 ± 0.76 ms  shuffle
from random import getrandbits

def proposal2c1(x):
    n = len(x)
    randbits = getrandbits
    for k in range(n.bit_length(), 1, -1):
        for i in range((1 << (k-1)) - 1, min((1 << k) - 1, n))[::-1]:
            j = randbits(k)
            while j > i:
                j = randbits(k)
            x[i], x[j] = x[j], x[i]

def proposal2c2(x):
    n = len(x)
    randbits = getrandbits
    for k in range(n.bit_length(), 1, -1):
        for i in range(min((1 << k) - 1, n) - 1, (1 << (k-1)) - 2, -1):
            j = randbits(k)
            while j > i:
                j = randbits(k)
            x[i], x[j] = x[j], x[i]

def proposal2c3(x):
    n = len(x)
    randbits = getrandbits
    i = n
    for k in range(n.bit_length(), 1, -1):
        for i in range(i - 1, (1 << (k-1)) - 2, -1):
            j = randbits(k)
            while j > i:
                j = randbits(k)
            x[i], x[j] = x[j], x[i]

Attempt This Online!

@tim-one
Member

tim-one commented Sep 9, 2023

If adopted, I would like to see a high-level comment block added; e.g.,

# Invariant: x[:i] is a random permutation of the original x[:i].
# To extend that to x[:i+1], we just need to swap x[i] with a
# random element in x[:i+1].
# Picking a random index <= i to swap with is done by an inlined
# variant of _randbelow() (fiddled a bit to pick an index below or
# equal, not just strictly below).

@rhettinger
Contributor

Picking a rsndom index <= i to swap with is done by an inlined

Should that be "random" or "ransom" ;-)

@pochmann
Contributor

pochmann commented Sep 9, 2023

Maybe some less optimized pseudo-implementation in that comment block?

Like that it's mostly an optimized version of this:

def shuffle(x):
    for i in range(1, len(x)):
        j = randint(0, i)
        x[i], x[j] = x[j], x[i]

And like this, just avoiding bit_length() calls:

def shuffle(x):
    for i in range(1, len(x)):
        k = i.bit_length()
        j = getrandbits(k)
        while j > i:
            j = getrandbits(k)
        x[i], x[j] = x[j], x[i]

I'll btw likely be mostly offline for a week now. I can get back to this afterwards, but if one of you wants to do it, I'm happy with that. Making finishing touches might be more efficient anyway if you just do it yourself.

@mdickinson
Member

mdickinson commented Sep 10, 2023

Mark, do you have any thoughts about the different variants?

No strong opinion; happy with whatever you pick. The speedup looks nice.

On the original doc issue, I'd be delighted to see the caveat and the explicit 2080 dropped entirely - I think it's more a distraction (and an invitation to issues like this one) than actually helpful.

As a thought experiment: suppose you have the Mersenne-Twister-based random.shuffle, along with a hypothetical function random.perfect_shuffle where the latter is (somehow, magically) free of all bias. Now suppose someone hands you one or the other of these functions but doesn't tell you which one. Could you write a piece of code that, over a large number of trials, can detect whether it's been given shuffle or perfect_shuffle with >51% accuracy? (No cheating by using timing or other side-channel attacks - the code should be based solely on the outputs of the generator.) I suspect that it would be rather difficult to do so. Probably not impossible - reverse-engineering the MT state from the shuffle results is likely feasible with enough work. But more practically, if we're not thinking adversarially, it's hard to imagine a piece of real-world code that would accidentally give a statistically-detectable difference depending on whether it gets handed shuffle or perfect_shuffle. So in the absence of evidence to the contrary, I think the caveat has no practical value - there's no action that a user could or should reasonably take to mitigate.

For historical contrast, if random.shuffle were based on a PRNG with a mere 7 trillion different internal states (looking at you, Wichmann–Hill), then it would be easy to write a perfect_shuffle detector: you'd just generate 10 million shuffles of a 100-element list and look for duplicates - the birthday paradox says that with shuffle the chance of getting at least one duplicate is >99.92%, while with perfect_shuffle the chance of a duplicate is essentially zero.

@tim-one
Member

tim-one commented Sep 10, 2023

As usual, I'm most sympathetic with Mark here. I added this disclaimer way back in the Wichmann-Hill days, when someone actually came up with a permutation of a deck of cards that shuffle() couldn't produce.

Now it's a technical curiosity of no seeming practical importance - now it's arguably needless FUD. In a blog post or appendix it could be fun to explore, but the audience would be hardcore nerds with nothing better to do 😉.

Should we add this warning to random.random()?

There are only 2**53 different floats random() can return. If N successive calls are made to random(), and the results placed in an N-tuple, not all N-tuples so constructed are equally likely, and for large enough N some N-tuples of the possible floats can never be constructed.

It's the same thing, really. The starting state wholly determines the N-tuple constructed, so no more than P (number of possible states) different N-tuples can possibly be constructed. (2**53)**377 is already larger than the Twister's P, so N=377 is already too large in this context.
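
That 377 figure is just exponent arithmetic, since 53 * 376 = 19928 stays under 19937 while 53 * 377 = 19981 exceeds it:

P = 2**19937 - 1
print((2**53)**376 < P < (2**53)**377)   # True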

So this kind of thing wasn't unique to shuffle() to begin with - what was unique about shuffle() is that it was the only Python-supplied method that could make many choices under the covers with a single call. Now that's no longer the case; e.g., the later random.choices() effectively does what the hypothetical warning above describes. But, no, I really don't want to add such a warning to the choices() docs either. Or to sample()'s.

@d-rideout
Author

I'd like to argue for a slight shift in attitude towards computer documentation. Along lines that have been discussed above: that it can be seen as an important opportunity to warn the user about potentially severe issues that they may not have considered previously.

All these claims of the form "everything will be fine in practice" seem not so dissimilar to the shoddy reasoning that I presented to argue that 2075! "was a small number", which was, rightly so, rejected. We have no proof that

a piece of real-world code [will not] accidentally give a statistically-detectable difference depending on whether it gets handed shuffle or perfect_shuffle.

How do we know that it is "a technical curiosity of no seeming practical importance"1?

How about promoting the random() warning to the top level random module documentation, next to the warning about cryptographic usage? So that users of the entire module may be aware of this issue? Why are we so concerned about cryptographic usage, but not scientific usage?

My impression is that many scientific users of this code trust it to produce the output it describes, at times, long before they become intimately familiar with the intricacies of pseudo-random number generators (PRNGs). And my feeling is that the public trust will be better served by admitting this potential pitfall in a prominent place.

I, for one, greatly appreciate that the remark about 2080! was left in the shuffle() documentation.

Footnotes

  1. To raise an extreme hypothetical scenario: How do we know that the Monte Carlo computations which computed the outcome of the first atomic bomb explosion did not systematically neglect events which ignite atoms in the atmosphere, due to some peculiarity of the underlying PRNG?

@tim-one
Member

tim-one commented Sep 12, 2023

My problem remains that I don't know what could be said that would do more good than harm.

Crypto is very different. There's nothing hypothetical about the complete unsuitability of the Twister in crypto contexts. Extremely effective attacks against crypto algorithms using non-crypto-strength RNGs have been published and are widely known in the security world. Real-world examples of relevant security breaches are also known. Crypto newbies need to be warned about this - it's a highly visible failure mode that makes "the news" routinely because people actually fall into it.

What can we say about non-crypto contexts? The Twister has been around for 25 years now, has to be the most heavily tested PRNG in history, and there are just no stories I've heard of it getting the blame for a real-world problem. It fails some statistical tests that are trying to detect whether a generator is doing arithmetic in GF(2), and that's about it for non-theoretical "quality" complaints. I haven't heard of an organization dropping the Twister for failing its own "due diligence" tests - quite the contrary, it gained near-universal adoption because it passed so many organizations' tests.

I can't give an example of it failing, because I've never heard of one(*), and don't know how to provoke one in less than an absurd amount of time. I can't tell a user how to detect a failure if one occurs. And, if one does, I can't tell the user how to avoid it or sidestep it.

So warning about the theoretical possibilities is pretty much the definition of FUD (stoking fear, uncertainty, and doubt about pure hypotheticals). For most users, that would do them more harm than good.

(*) I can construct internal states that, e.g., would lead to getrandbits(32) returning 0 hundreds of times in a row, but I haven't heard of that happening in real life since about a week after the Twister was first released (in the very first release, the Twister's seeding function was lazy, often leaving the constructed state filled overwhelmingly with 0 bits - but they repaired that very quickly, after realizing that users just couldn't be trusted to supply ~20000 "pretty random" bits to start with on their own).

In scientific code, users are far more likely to get burned badly by custom code that hasn't been analyzed at all by a numerical expert for potentially damaging loss of precision due to a numerically naïve algorithm failing to control for rounding errors. That's a problem that burns people every day, but the Python docs don't even mention it once anywhere 😉

@tim-one
Member

tim-one commented Sep 12, 2023

FYI, this paper was written just a few years ago:

It Is High Time We Let Go Of The Mersenne Twister

I think it's fair overall, although the author has his own PRNG schemes to sell and is "highly motivated" to diss others' work.

What I take from it is that nothing has really changed since the last time I looked: he has no real-life Twister failure anecdotes to share, just repeating that it's known to fail some specific tests that all GF(2)-based PRNGs fail. Which has been known for quite a long time.

One of these tests was due to George Marsaglia, the late RNG genius who first proved that n-tuples created from the successive terms of an LCG fall in a relatively small number of (n-1)-dimensional hyperplanes. He realized too that for generators "like" the Twister, large "random" 0/1 matrices created from their bits will have, overall, ranks that are too small. Which is the test.

While that doesn't appear to have consequences anyone cares about 😜, all else being equal, of course we'd prefer a generator that didn't fail any known test.

But that's not the point here: this paper is likely the strongest informed criticism of the Twister that can be found. And it's weak tea. Anything going beyond it would be speculative - FUD-like.

Note that I'm not wedded to the Twister. I would prefer a generator with much smaller state (preferably fitting in a cache line or two), simpler code, faster code, fast "jumpahead" abilities, and one that passes all known tests. I don't really care about having a massive period.

But I also think Python should be a follower in this area, not a pioneer. So, @mdickinson, if you're still here, I believe numpy switched to a member of the much newer PCG family. If so, how has that been working out for you? I'm a big fan of Melissa O'Neill's writing and thinking on the topic, so I'm looking for "reasons" to consider whether Python should follow numpy's lead here.

@d-rideout
Author

Okay.

Thank you everybody for considering my concerns.

I am going to think about this further, along the lines of a variant of the test that Mark Dickinson has proposed. Anyone care to join me?

@tim-one
Member

tim-one commented Sep 12, 2023

For historical contrast, if random.shuffle were based on a PRNG with a mere 7 trillion different internal states (looking at you, Wichmann–Hill), then it would be easy to write a perfect_shuffle detector: you'd just generate 10 million shuffles of a 100-element list and look for duplicates - the birthday paradox says that with shuffle the chance of getting at least one duplicate is >99.92%, while with perfect_shuffle the chance of a duplicate is essentially zero.

@mdickinson, I didn't follow this line of argument. It's certain that WH can't generate more than ~7e12 (its period) distinct permutations, but it goes through every one of its possible internal states before repeating. So if there's a duplicate permutation generated before then, it's not due to WH suffering a birthday-paradox internal state repetition, but due to the shuffling algorithm mapping distinct WH states to the same permutation. It's the algorithm that's subject to the birthday paradox, not the underlying generator.

Some quick coding doesn't change my mind about that 😉. Here's output from a run with a sequence with only 17 elements (I don't want to wait forever on this). Even at such a "small" size, there was a case where using Wichmann-Hill as the generator took over 45 million shuffles to find the first duplicate. Offhand, WH doesn't appear to be doing any worse at this than using the Twister as the generator, or using the system urandom() (via random.SystemRandom.random). But the variance across tries is high, so my "appears" is a very subjective guess:

EDIT: to try to nail this, I put in a far feebler version of WH:

    def whbase(c=4991):
        while True:
            c = 170 * c % 30323
            yield c/30323.0
    wh = whbase().__next__

30323 is a prime, so c internally goes through all values in range(1, 30323). And, indeed, if the input list is large enough, using this generates exactly 30322 distinct permutations - or 15161 - before finding its first duplicate. I don't know why sometimes it's the maximum possible, while other times only half that. But that's a pit I'm not inclined to jump into. The point was to verify that the generator itself is exempt from birthday paradox arguments.

EDIT 2: "all or half?" are special cases of a more general pattern, which turns out to be shallow. If I pass a list with 15162 elements, the horridly feeble WH variant generates only 2 distinct permutations. Why? Because the shuffling algorithm calls the generator N-1 times for a list of length N. So, whatever the starting state, shuffling a 15162-list advances the state by 15161 positions, and doing it a second time does the same, leaving the starting state advanced by 30322 positions in all: exactly back to the state we started with. 15161 is very special, because it's half the period P. For smaller sizes, "even or odd?" makes the difference. If the length is odd, shuffling consumes an even number of states, so (because P is also even) only half the states can be starting states for the next shuffle - only P/2 distinct perms can be generated. While if the length is even, shuffling consumes an odd number of states, and (again because P is even) no state can be ruled out as the starting state - all P perms can potentially be generated. And 2 and 15161 are P's only non-trivial divisors, so there are no other patterns of this kind in this case.

Note that the above is nearly irrelevant for the Twister, because its period is prime.

N=17 N!=355_687_428_096_000 isqrt=18_859_677
using random.SystemRandom().random
dup found at count=6_759_620
dup found at count=18_331_040
dup found at count=24_891_577
dup found at count=3_873_247
dup found at count=8_787_305

using random.random
dup found at count=80_778_804
dup found at count=38_294_138
dup found at count=4_863_331
dup found at count=19_520_382
dup found at count=17_565_755

using WH
dup found at count=11_945_489
dup found at count=111_367_397
dup found at count=32_105_454
dup found at count=45_458_159
dup found at count=17_871_475
Click here for the code
import math
import random

def shuf(xs, r):
    for i in range(len(xs)-1, 0, -1):
        j = int(r() * (i+1))
        xs[i], xs[j] = xs[j], xs[i]

def whbase(a=123, b=145, c=4991):
    while True:
        a = 171 * a % 30269
        b = 172 * b % 30307
        c = 170 * c % 30323
        yield (a/30269.0 + b/30307.0 + c/30323.0) % 1.0
wh = whbase().__next__

def dupdetect(r, n):
    count = 0
    base = bytearray(range(n))
    seen = set()
    while True:
        count += 1
        xs = base[:]
        shuf(xs, r)
        t = bytes(xs)
        if t in seen:
            print(f"dup found at {count=:_}")
            break
        seen.add(t)

N = 17
fac = math.factorial(N)
s = math.isqrt(fac)
print(f"{N=:} N!={fac:_} isqrt={s:_}")
for meth, name in [(random.SystemRandom().random, "random.SystemRandom().random"),
                   (random.random, "random.random"),
                   (wh, "WH")]:
    print("using", name)
    for i in range(5):
        dupdetect(meth, N)
    print()

@tim-one
Member

tim-one commented Sep 13, 2023

I am going to think about this further, along the lines of a variant of the test that Mark Dickinson has proposed.

Please see my reply to Mark just above - I believe he suffered a rare thinko when proposing that test (best I can tell, it won't work unless you try a number of shuffles proportional to min(period, sqrt(factorial(len(input))))).

Anyone care to join me?

I'd like to at least lurk, but can't commit to another time sink. How about setting up a new github repo for this? Then those interested can discuss, and share code, and lurk, in a more-or-less convenient shared space.

Suggestion: tackle Wichmann-Hill first. I gave an implementation in my reply to Mark. It should be far easier to detect problems in that. "It exhibits 12 clear failures in the TestU01 Crush suite and 22 in the BigCrush suite (L'Ecuyer, 2007)"

@mdickinson
Member

mdickinson commented Sep 13, 2023

@tim-one:

@mdickinson, I didn't follow this line of argument. It's certain that WH can't generate more than ~7e12 (its period) distinct permutations, but it goes through every one of its possible internal states before repeating. [...]

Aargh! Yes, you're right of course. 😳 Please ignore everything I said. :-)

I believe numpy switched to a member of the much newer PCG family. If so, how has that been working out for you?

On NumPy's use of PCG: Robert Kern (@rkern) would be the person to ask; I've been following, but not closely. The most interesting development was the somewhat recent addition of a new PCG variant PCG-64 DXSM, which mitigates concerns about possible correlations between what were intended to be statistically independent streams, and doesn't appear to have any significant downside relative to the original PCG XSL RR 128/64. I believe NumPy intends to eventually make PCG-64 DXSM the default bit generator, but hasn't yet done so.

Despite having read the relevant thread several times, I still can't claim to have a clear picture of the status of the correlation concerns (if any) with PCG-64 DXSM. My takeaway understanding (and I hope @rkern will chime in to correct me if necessary) is that with "reasonable" data sizes and a "reasonable" number of non-adversarially-seeded (and non-naively-seeded) streams, there aren't currently grounds for concern, but I can't quantify "reasonable" here. As far as I can tell, adopting PCG-64 DXSM for Python wouldn't be at all unreasonable (at least from a statistical point of view; I'm totally neglecting all the logistic aspects of changing the default generator).

Additional references:

@tim-one
Member

tim-one commented Sep 13, 2023

There is an application of the birthday paradox to how many permutations a PRNG can generate. Suppose we're looking at lists with N elements, and suppose we have a PRNG with period P=N!. Can it generate all permutations? Well, if it's a "good" generator, almost certainly not. Consider this loop:

states = a list of all the generator's P possible states
for state in states:
    force the generator's state to state
    shuffle range(N)
    if we've seen the resulting permutation before:
        raise ValueError("not possible to generate all permutations")
    remember that we've seen the resulting permutation
print("WOW - got 'em all")

If the generator is "good", forcing the starting state shouldn't change that the shuffle yields a random-ish permutation.

But, if so, we expect to see a duplicate after O(sqrt(N!)) iterations. Far, far short of N!.

Empirically, I built 5 different generators with period 10! and tried that for N=10, but let the loop go on to try all P initial states:

N=10 N!=3_628_800 isqrt=1_904

first dup at 1787
found 2293877 unique, of 3628800

first dup at 1046
found 2294123 unique, of 3628800

first dup at 1080
found 2293113 unique, of 3628800

first dup at 813
found 2292896 unique, of 3628800

first dup at 5520
found 2293632 unique, of 3628800

So, that was expected: lots of permutations couldn't be generated.

Making the period 10x longer helped, but still left a few hundred permutations unreachable.

After making the period 20x longer, all permutations were reached, in each of 3 runs.

Tech note: how to make a good generator with an arbitrary period P? I just precomputed a list of P random floats from the system urandom(). The mutable state is then just an index into that list.

class ArbPeriod:
    def __init__(self, P):
        import random
        r = random.SystemRandom().random
        self.P = P
        self.floats = [r() for i in range(P)]
        self.i = 0

    def random(self):
        i = self.i
        result = self.floats[i]
        i += 1
        if i >= self.P:
            i -= self.P
        self.i = i
        return result

    def setstate(self, i):
        self.i = i % self.P
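
A driver along the lines of the pseudo-loop above might look like this (a hypothetical harness; slow, but it produces the kind of numbers shown):

from math import factorial

def shuf(xs, r):
    for i in range(len(xs) - 1, 0, -1):
        j = int(r() * (i + 1))
        xs[i], xs[j] = xs[j], xs[i]

N = 10
P = factorial(N)
gen = ArbPeriod(P)

seen = set()
first_dup = None
for state in range(P):
    gen.setstate(state)          # force each possible starting state in turn
    xs = list(range(N))
    shuf(xs, gen.random)
    t = tuple(xs)
    if first_dup is None and t in seen:
        first_dup = state + 1
    seen.add(t)

print(f"first dup at {first_dup}")
print(f"found {len(seen)} unique, of {P}")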

@rkern

rkern commented Sep 14, 2023

My takeaway understanding (and I hope @rkern will chime in to correct me if necessary) is that with "reasonable" data sizes and a "reasonable" number of non-adversarially-seeded (and non-naively-seeded) streams, there aren't currently grounds for concern, but I can't quantify "reasonable" here.

My most approachable attempt at summarizing the issue is now in our documentation. To put some numbers on it, with DXSM if the 128-bit states share the lower 110 bits or less, and have exactly the same increment, I don't see a problem out to 256 GiB in PractRand (before I get bored and kill it). If I share all 128 bits of state and have increments with a Hamming weight between 16 and 100, roughly, then I don't see problems with this interleaving test either. So to get the same interleaving problem that we see earlier with XSL-RR, I need to share (up to certain equivalence classes) significantly more than 128 bits between the two (state, increment) pairs. 128 bits is my "reasonable" limit of caring about collisions (c.f. IPv6, UUID, etc.).

All that said, we're not super-concerned about the issue with XSL-RR to begin with and have no particular timeline for making the switch (though that's in some part just held up by my shame at misimplementing the initialization). Even my summary probably overstates the problem out of an abundance of caution. To even observe the PractRand failure in a real situation, you have to find and pull these two "colliding" streams together out of the horde of streams that you've made and interleave their raw bits precisely aligned before PractRand's sensitive tests find an issue. If you do any useful work with those raw bitstreams, you wouldn't see a problem from the results of that work.

But if looking for options to migrate to the PCG family from another family, the DXSM variant is the one I'd choose. It's easier to document its properties without too many asterisks. I have unpublished work looking at some variants of the DXSM output function that improve its avalanche properties without paying a performance tax, but I'm not nearly invested enough to devote the CPU hours of testing much less publication and evangelism to make it a viable option. I also personally like SFC64 if one is looking around.

@tim-one
Member

tim-one commented Sep 14, 2023

@mdickinson
On NumPy's use of PCG: ...

Thanks for the links, Mark! Looks like PCG has been working fine for numpy. What I took from the thread is that the recent churn was all about the possibility of correlations among "Independent" streams, which could occur if:

  1. A researcher contrives "bad" initial states (BTW, that appears to be the same researcher who wrote the paper (which I linked to a few messages above) dissing the Twister); or,
  2. The birthday paradox gives a non-negligible chance of a pair correlation if millions of non-contrived streams are created, each consuming gigabytes of generated output. Which nobody does - yet.

It's great that numpy is pro-actively addressing the possibility, instead of waiting for the atmosphere to explode due to a flawed Monte Carlo code 😉.

@rkern

rkern commented Sep 14, 2023

How do we know that it is "a technical curiosity of no seeming practical importance"?

Anecdata: I asked a knowledgeable person who is in fact convinced that this is a problem, and he had never been able to construct even an adversarial example of a practical problem.

@tim-one
Member

tim-one commented Sep 14, 2023

@rkern

I have unpublished work looking at some variants of the DXSM output function that improve its avalanche properties without paying a performance tax, but I'm not nearly invested enough to devote the CPU hours of testing much less publication and evangelism to make it a viable option.

Talk imneme into thinking they're her ideas? 😉. I saw that, in one of the numpy threads, Vigna pointed out that avalanche could be improved just by picking different multipliers. Modern "fast" (non-crypto) hash/digest functions have put a lot of thought into that. Not so much in the way of theory, but more computer searches for sequences of operations (multiplies and shifts by fixed constants) with demonstrably good avalanche behavior.

CPython got spectacular improvement in some pathological cases of tuple hashing by implementing part of xxHash's scheme. dict lookup speed is best if the low-order bits of hashes don't collide. The specific multipliers and shift counts used were important, but the original author had no account for "why?" other than they did best in extensive avalanche testing.

I also personally like SFC64 if one is looking around.

Thanks for the tip! On your recommendation, I'll stop ignoring it 😉 - I always just knee-jerk assumed that something so short and simple and multiplication-free just had to be suffering major shortcomings. But maybe not!

he had never been able to construct even an adversarial example of a practical problem.

I think shuffling is a red herring here. Cut to the heart: using the Twister, we can only generate an inconceivably small fraction of the mathematically legitimate results from randrange(1 << 30000). What does it matter? How could a program even tell (using only feasible amounts of storage and time)? It's not like, e.g., we'll only get odd ints back, or only primes, or ...

@rkern

rkern commented Sep 15, 2023

Talk imneme into thinking they're her ideas? 😉.

Oh, it's been offered up. Playing around, just reversing the order of the multiplications (free) so that the low bits get mixed in first made a big difference to the avalanche properties, then adding on one more xorshift cleans it up even more (minor penalty, but just bringing it back into line with XSL-RR and the older expensive LCG multiplier). I think I played around with the multiplier constant, but I don't think I saw any noticeable difference. DXSM as it stands has enough of a safety factor above my 128-bit-do-I-really-care? line for this test that I don't care to pursue it as a new option.

I think shuffling is a red herring here. [...]How could a program even tell (using only feasible amounts of storage and time)?

Indeed. My practical-minded response to this concern about missing permutations is that I want to see a new test that I could add to PractRand that would show a problem with a PRNG faster than other tests already in PractRand that fail for those PRNGs. One can use small PRNG variants for this purpose, of course, because most of the ones under consideration are already too big to fail.

Re: SFC64, I liked imneme's articles about its proximate ancestor, Jenkins's Small Fast generator, and the statistics of random invertible mappings to get a feel for what this class of PRNG does. SFC64 is in the same style, but adds in a counter to the state so that the cycles are all a minimum of 2**64, so there is no chance, rather than a merely ignorably-tiny chance, of being on a small cycle. It has no party tricks like jumpahead, but by offloading parallel seeding to other mechanisms, I have no use for jumpahead.

@tim-one
Member

tim-one commented Sep 15, 2023

Jenkins has some very interesting things to say about avalanche.

Short course: tests that manipulate the internal state bits directly and measure resulting avalanche in outputs can run orders of magnitude faster than chi-square tests (which compare some statistic computed from the output stream against an ideal distribution). So much faster that it's thoroughly practical to test all possible combinations of shift/rotate constants for avalanche effects.

If avalanche is poor, chi-square tests fail too. But if avalanche is good, chi-square tests also pass. In his experience, at least with his style of generator.

But while "close to perfect" avalanche behavior is vital for block ciphers and digest functions, at least his kind of RNG is far more forgiving. Turns out that none of his initial tries for 32-bit generators flipped more than 4 bits (ideal would be 16) on average (but exactly how he measured this isn't yet clear to me). They all eventually failed some chi-square tests. But by using rotate constants found by computer search, the average went up to 8.8 bits (still far short of 16!), and that passed all tests. And that's the one he released to the world. He later found another brief sequence of operations that achieved 13 flipped bits on average, but apparently thinks that even the earlier 8.8 was overkill in this context.

I'll note that the Twister is extraordinarily slow at diffusing internal bit changes. Take a Twister instance, and clone it, but flip one bit in its state. Then run the original and the clone in lockstep. For the first 1000 (or so) iterations, they'll typically produce identical 32-bit outputs on most iterations. When they differ, the Hamming distance will be low. This can persist for dozens of thousands of iterations.
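
That lockstep experiment is easy to reproduce; a sketch (flipping one arbitrary low-order bit of a cloned state):

import random

a = random.Random(12345)
version, state, gauss_next = a.getstate()
words = list(state)
words[100] ^= 1                      # flip a single bit in one of the 624 state words
b = random.Random()
b.setstate((version, tuple(words), gauss_next))

same = sum(a.getrandbits(32) == b.getrandbits(32) for _ in range(1000))
print(f"{same} of the first 1000 32-bit outputs are identical")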

So that demonstrates that passing most chi-square tests (which the Twister does) doesn't require "reasonably fast" avalanche from internal state bit changes to outputs, or even anything close to reasonably fast - but then "too big to fail" covers a multitude of sins 😉.

@d-rideout
Author

Hi all,

Apologies for the delayed response. I have been reading up on some of the background that you all have been pointing to. I have yet to digest the O'Neill PCG paper, which appears to be very important. In the mean time:

@mdickinson 's proposal is very much along the lines that I was imagining, and I think is exactly correct. The fact that @tim-one 's experiments appear to contradict this test is indeed curious! And perhaps worth pursuing. But I would like to understand what O'Neill has to say on the subject before creating a repository to explore the details.

@tim-one
Member

tim-one commented Sep 20, 2023

@d-rideout, there's no rush 😄.

@mdickinson 's proposal is very much along the lines that I was imagining, and I think is exactly correct.

If you didn't notice, @mdickinson retracted that suggestion. It can't work, and for the reasons I sketched. Think carefully:

For historical contrast, if random.shuffle were based on a PRNG with a mere 7 trillion different internal states (looking at you, Wichmann–Hill), then it would be easy to write a perfect_shuffle detector: you'd just generate 10 million shuffles of a 100-element list and look for duplicates - the birthday paradox says that with shuffle the chance of getting at least one duplicate is >99.92%, while with perfect_shuffle the chance of a duplicate is essentially zero.

To connect the dots explicitly, after throwing 10 million balls into 7 trillion bins picked uniformly at random with replacement, the chance that at least one bin contains more than one ball is > 99.9209511800985347485%.

But the internal state transitions in Wichmann-Hill don't behave at all in a pseudo-random fashion. Very much the contrary: by design, you never hit a duplicate state until after the full period has been exhausted. The birthday paradox is simply irrelevant to the generator's internal states as the program goes on. The total number of times WH was invoked can be relevant, but 10 million shuffles of a 100-element list consumes an insignificant fraction of WH's period.

The fact that @tim-one 's experiments appear to contradict this test is indeed curious! And perhaps worth pursuing.

Trust me when I say it's very much worth your fighting your way to understanding. It's fundamental, which isn't to say easy or self-evident. Reasoning about "randomness" can be subtle! Write code with small examples - don't trust your head 😆
