[Enhancement] Random.py Package Improvement #7

Open
Evitcani opened this issue Sep 18, 2021 · 19 comments

@Evitcani

Evitcani commented Sep 18, 2021

Background

A friend and I were looking into the avrae/d20 package and noticed some concerning code flow with regard to how Random is being used. The module being used (random.py) is based on the old version of the MT algorithm, which tends toward statistically lower outcomes compared to the 2002 or newer versions. It's also hit or miss given the lack of explicit seeding, which means we'd expect the randomness to basically end up being subsets of older rolls (deterministic for the purpose of statistical modeling). Also, it's one of the slowest random algorithms.

There are lots of alternatives that are easy to use and provide better randomness and higher throughput.

Suggested Improvement

Upgrade to a new random package to be used on lines 405 and 407 of expression.py. A package in the PCG family would likely be optimal for Avrae's usage.

Quote from the page's comparison to other algorithms:

PCG was developed by applying TestU01 to reduced-size variants,[7] and determining the minimum number of internal state bits required to pass BigCrush. BigCrush examines enough data to detect a period of 2^35, so even an ideal generator requires 36 bits of state to pass it. Some very poor generators can pass if given a large enough state;[8] passing despite a small state is a measure of an algorithm's quality, and shows how large a safety margin exists between that lower limit and the state size used in practical applications.

PCG-RXS-M-XS (with 32-bit output) passes BigCrush with 36 bits of state (the minimum possible), PCG-XSH-RR (pcg32() above) requires 39, and PCG-XSH-RS (pcg32_fast() above) requires 49 bits of state. For comparison, xorshift*, one of the best of the alternatives, requires 40 bits of state,[5]: 19  and Mersenne twister fails despite 19937 bits of state.[9]

This could avoid the known issues with recovery from many-zero states in the pre-2002 MT algorithm that random.py uses. The best outcome for Avrae would be faster generation.
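
For illustration, here is a minimal sketch of what a PCG-backed roll could look like, assuming numpy's PCG64 bit generator (the names are illustrative, not Avrae's actual code):

    # illustrative only; assumes numpy is available
    from numpy.random import Generator, PCG64

    _rng = Generator(PCG64())  # seeded from OS entropy by default

    def roll_die(size: int) -> int:
        # integers() uses a half-open interval, so this yields a value in [1, size]
        return int(_rng.integers(1, size + 1))

    roll_die(20)  # e.g. 14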

@zhudotexe
Collaborator

I did some digging and it appears that the implementation of MT used by CPython's random is indeed MT2002 (as referenced at https://github.com/python/cpython/blob/main/Modules/_randommodule.c#L5). I'll look into this more later in the week.

@posita

posita commented Sep 20, 2021

Apologies for butting in, but these seemed relevant:

@Evitcani
Author

Apologies for butting in, but these seemed relevant:

Good point on the PCG generators. I only pointed it out as it seemed the best suited (there are other, older generators that could be considered in its stead).

@Evitcani
Author

Just read your edit with the secondary link. It appears NumPy adopted a PCG algorithm in 2019 after a long discussion. The Python group seems hesitant due to its novelty (it's only a 6-7 year old algorithm, if I'm reading correctly). They know the Twister has problems for which solutions exist.

A Xorshift generator might be better, as its failure modes are known and tested. It has some issues that make it difficult to achieve a long period without sufficient initialization. However, it is notably faster than the Twister, which would address the speed considerations.

@posita

posita commented Sep 20, 2021

This isn't my show, but one possibility might be to opportunistically defer to a package like numpy, meaning don't make it an install requirement, but use it if present. (This is the approach I'm currently considering for dyce.)
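
A minimal sketch of that "use it if present" pattern (module and function names here are illustrative, not dyce's or d20's actual code):

    import random

    try:
        from numpy.random import Generator, PCG64
        _np_gen = Generator(PCG64())

        def randrange_below(stop: int) -> int:
            # integers(stop) draws from the half-open interval [0, stop)
            return int(_np_gen.integers(stop))
    except ImportError:
        # numpy not installed: fall back to the stdlib Mersenne Twister
        def randrange_below(stop: int) -> int:
            return random.randrange(stop)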

@zhudotexe
Collaborator

zhudotexe commented Sep 20, 2021

I think that makes a lot of sense; we could expose numpy as an extra dependency (i.e. installed via d20[pcg] or similar) and keep its use transparent to the user (in addition to documenting it, of course).

@Evitcani
Author

That sounds like a great plan.

We could also consider the Random.org API library, in addition to the NumPy option, for people who want to pay for something near true RNG. I wouldn't mind making a fork for it, though it would require async HTTP requests with caching options. It gets a bit nasty and would slow performance.

@zhudotexe
Collaborator

Haha! I think that's a little out of scope to bundle with d20 (especially since the package itself tries to make few assumptions about the environment it's running in re. sync/async), but if the plan is to make opportunistic usage of alternate RNG providers (e.g. numpy), it'll likely expose a randrange_impl function that can be easily monkey-patched to provide this functionality. I'll keep that in mind during implementation.
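
For example, a module-level hook along these lines (hypothetical names, not d20's actual API) could be rebound by a consumer to point at any RNG provider, including an HTTP-backed one:

    import random

    # default implementation; a consumer may rebind this module attribute
    randrange_impl = random.randrange

    def _roll_single_die(size: int) -> int:
        # looked up at call time, so monkey-patching randrange_impl takes effect
        return randrange_impl(size) + 1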

posita added a commit to posita/dyce that referenced this issue Sep 21, 2021
@zhudotexe
Collaborator

I've done a little bit more reading into the articles/issues linked above and some articles linked to by them.

Reading list (in the order I read them):

Personal Thoughts

I can't speak for the quality of PCG vs. XorShift, and this topic seems to be of some debate in the mathematical community. What does seem to be agreed upon, at least, is that both of these are "better" than Mersenne Twister.

With regards to speed, the table I mentioned in the second linked article seems to display these results:

    PRNG            Footprint (bits)  BigCrush Failures  HWD Failures  ns/64b
    xoroshiro128++  128               --                 --            0.95
    PCG64DXSM       128               --                 --            1.44
    MT19937         20032             LinearComp         --            2.19

This table suggests a 1.52x speedup in PCG64 over MT, and 2.31x when using xoroshiro128++ (other variants excluded for brevity). However, in the context of d20, the actual time it takes to generate a random number is effectively negligible compared to the time spent in other code (i.e. parsing/arithmetic evaluation/function call overhead/Python overhead). For example, the 2.19ns/64b presented in the above table is less than 7% of the 32.7ns it takes to generate a random number in Python (timed with $ python -m timeit -s "from random import random" "random()" on an i9 @ 2.3GHz), and less than 0.01% of the time d20.roll("1d20") takes (from the performance listed in the readme). When considered alongside Amdahl's Law, I believe that the speed gains from the choice of RNG are not a major factor here.
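
(For reference, a rough way to reproduce that kind of measurement; absolute numbers will vary by machine:)

    import timeit

    # per-call cost of the stdlib Mersenne Twister random()
    n = 1_000_000
    t = timeit.timeit("random()", setup="from random import random", number=n)
    print(f"random(): {t / n * 1e9:.1f} ns/call")

    # per-call cost of a full d20 roll, parsing and evaluation included
    n = 10_000
    t = timeit.timeit('roll("1d20")', setup="from d20 import roll", number=n)
    print(f'd20.roll("1d20"): {t / n * 1e6:.2f} µs/call')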

The most recent version of NumPy includes implementations of the Mersenne Twister, PCG64 (and the DXSM variant), Philox, and SFC64 (I have done no reading into the latter two, so I can't comment on them). Bringing this back to what it means for d20: I believe that if we implement some method for the user to supply which RNG to use (i.e. some interface for exposing a randrange implementation) and default to a sane implementation (MT for a thin installation, PCG64 if numpy is installed), that leaves any library consumer in a good place.

@Evitcani
Author

Great and thoughtful research that was quite fascinating to read. I agree with the conclusions you’ve drawn for moving forward with d20. It’d provide a great interface for the user.

My suggestion would be to include the links provided here in the external or code documentation, so users can reach a thoughtful conclusion at their own pace.

@posita

posita commented Sep 22, 2021

I took a stab at implementing a random.Random with numpy.random primitives underneath. (I'm actually kind of surprised someone else hasn't done this. If they have, I was unable to find it. There are probably improvements over my own approach. If you think of any and feel like sharing, please let me know.)

Code is here. Tests are here.

In my case, I merely import the resulting RNG and use it like any other random.Random implementation.

@zhudotexe
Collaborator

zhudotexe commented Sep 22, 2021

That looks pretty clean! I think it might be worth implementing getrandbits() in your subclass as well since NumPy's BitGenerator certainly implements it, and the Random class takes advantage of it if supplied to allow for arbitrarily large randrange (and I think it may be slightly faster in discrete cases by inspecting the implementation of _randbelow_with_getrandbits vs _randbelow_without_getrandbits below).

Do you know if subclassing random.Random causes CPython to initialize the MT even if you don't use it? My intuition says that since you aren't calling super().__init__() it won't, but I'm not super familiar with the low-level workings of classes that eventually inherit from C extensions. (Also, random.Random.__init__() doesn't even call the super method from the C class it inherits from...)

Edit: It looks like the implementation of BitGenerator.random_raw differs slightly from _random.getrandbits, and I bet implementing Python logic to try and replicate this C logic would actually be slower than the implementation of _randbelow_without_getrandbits 😛.

Edit 2: Or maybe not! I bet you could replicate SystemRandom.getrandbits using Generator.bytes.

@posita

posita commented Sep 22, 2021

Do you know if subclassing random.Random causes CPython to initialize the MT even if you don't use it? My intuition says that since you aren't calling super().__init__() it won't, but I'm not super familiar with the low-level workings of classes that eventually inherit from C extensions. (Also, random.Random.__init__() doesn't even call the super method from the C class it inherits from...)

I didn't take any time to look into initialization. 😞 I honestly hadn't thought about its implications. 😬 Not calling super().__init__() was probably an oversight on my part. Are you worried about performance on generator creation?

Edit: It looks like the implementation of BitGenerator.random_raw differs slightly from _random.getrandbits, and I bet implementing Python logic to try and replicate this C logic would actually be slower than the implementation of _randbelow_without_getrandbits 😛.

Edit 2: Or maybe not! I bet you could replicate SystemRandom.getrandbits using Generator.bytes.

Yeah, I think replacing _urandom(numbytes) in that implementation with numpy.random.Generator.bytes seems pretty straightforward:

    class NumpyRandom(Random):
        # …
        def getrandbits(self, k: int) -> int:
            # Adapted from random.SystemRandom. See
            # <https://github.com/python/cpython/blob/8c21941ddafdf4925170f9cea22e2382dd3b0184/Lib/random.py#L800-L806>.
            if k < 0:
                raise ValueError("number of bits must be non-negative")

            numbytes = (k + 7) // 8  # bits / 8 and rounded up
            x = int.from_bytes(self.randbytes(numbytes), "big")  # 'big' matches random.SystemRandom

            return x >> (numbytes * 8 - k)  # trim excess bits

        def randbytes(self, n: int) -> bytes:
            return self._g.bytes(n)

@zhudotexe
Collaborator

Are you worried about performance on generator creation?

Curiosity more than anything to be honest; since the MT is implemented in C it's pretty lightweight (relatively) anyway. Looking at the C code, it seems like it will allocate the space for the MT state (https://github.com/python/cpython/blob/main/Modules/_randommodule.c#L103) but never initialize it (which would normally be done here, which is only called by seed, which is overridden).

@posita

posita commented Sep 22, 2021

Are you worried about performance on generator creation?

Curiosity more than anything to be honest; since the MT is implemented in C it's pretty lightweight (relatively) anyway. Looking at the C code, it seems like it will allocate the space for the MT state (https://github.com/python/cpython/blob/main/Modules/_randommodule.c#L103) but never initialize it (which would normally be done here, which is only called by seed, which is overridden).

Also, the parent random.Random constructor itself calls self.seed, so I'd have to rethink my own __init__ method. This or similar gymnastics just seem icky:

    class NumpyRandom(Random):
        # …
        def __init__(self, bit_generator: BitGenerator):
            self._g = Generator(bit_generator)
            super().__init__()
            self._g = Generator(bit_generator)
        # …

(FWIW, I'm not married to the implementation. I threw it together as a proof-of-concept over an hour or two without thinking too much about it.)

@zhudotexe
Collaborator

zhudotexe commented Sep 22, 2021

Maybe something like this?

    class NumpyRandom(random.Random):
        _gen: Generator

        def __init__(self, bitgen_type: Type[_BitGenT], x: _SeedT = None):
            self._bitgen_type = bitgen_type
            super().__init__(x)

        # ...

        def seed(self, a: _SeedT = None, version: int = 2):
            # ...
            self._gen = Generator(self._bitgen_type(a))

    # ...

    if hasattr(numpy.random, "PCG64DXSM"):  # available in numpy 1.21 and up
        random_impl = NumpyRandom(numpy.random.PCG64DXSM)

@posita

posita commented Sep 23, 2021

I may be coming around to your proposal. I started to explore something like this:

import random
from typing import Sequence, Union

from numpy.random import BitGenerator, Generator, default_rng

# Same stuff that numpy.random.default_rng takes as its seed argument
_NumpySeed = Union[None, int, Sequence[int], BitGenerator, Generator]

class NumpyRandom(random.Random):
    _gen: Generator

    def __init__(self, numpy_seed: _NumpySeed):
        super().__init__(numpy_seed)

    def seed(
        self,
        a: _NumpySeed,  # type: ignore
        version: int = 0,  # ignored
    ) -> None:
        self._gen = default_rng(a)
    # …

So basically, one could do:

from numpy.random import PCG64DXSM
anything_that_default_rng_would_take = PCG64DXSM(7)
rng = NumpyRandom(anything_that_default_rng_would_take)
# …

All good, right? Except:

# …
anything_that_default_rng_would_take = [1, 2]
default_rng(anything_that_default_rng_would_take)  # this works
NumpyRandom(PCG64DXSM([1, 2]))  # this works
NumpyRandom(default_rng(anything_that_default_rng_would_take))  # this works
NumpyRandom(anything_that_default_rng_would_take)  # WTF?

Results in:

Traceback (most recent call last):
  File "/…….py", line …, in <module>
    NumpyRandom(anything_that_default_rng_would_take)  # WTF?
TypeError: unhashable type: 'list'

🤦‍♂️

So there's obviously some magic fighting against Random living up to its own marketing as a derivable interface (presumably the C machinery underlying random.Random tries to hash a non-integer seed at construction time, before the overridden seed method is ever consulted). To be fair, your proposal effectively side-steps this issue because the first argument to its __init__ is a class, which is always hashable. I hate that one has to engage in such shenanigans, though….

@zhudotexe
Collaborator

Interesting! I think the implementation of NumpyRandom(anything_that_default_rng_would_take) creates a foot-gun, though - since this passes it directly to NumpyRandom.seed to initialize the backing generator, if a user with no understanding of this interaction calls rng.seed(123), it's possible that they could be changing the implementation of the BitGenerator from an explicitly defined one to whatever happens to be exposed by default_rng.
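
Concretely, with the _NumpySeed-based sketch above, something like this would silently swap the explicitly chosen bit generator for default_rng's default (a hypothetical illustration of the interaction, not tested against your code):

    from numpy.random import PCG64DXSM

    rng = NumpyRandom(PCG64DXSM(7))  # explicitly chose PCG64DXSM...
    rng.seed(123)                    # ...but seed() rebuilds self._gen via
                                     # default_rng(123), i.e. a plain PCG64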

If you wanted to avoid passing the class as an argument, one could store the bitgen class as a classvar:

class NumpyRandom(random.Random, abc.ABC):
    _gen: Generator
    _bitgen_type: Type[BitGenerator]  # to be implemented by subclasses

    def __init__(self,  x: _SeedT = None):
        super().__init__(x)

    # ...

class PCG64DXSMRandom(NumpyRandom):
    _bitgen_type = numpy.random.PCG64DXSM

# ...

But I'm uncertain that doing that really adds much value; I think it just boils down to code style preferences at that point.

posita added a commit to posita/dyce that referenced this issue Sep 24, 2021
Follow-up to 3c9b258 to restructure NumPy random implementation according to a more
rigid class structure in line with [this
advice](avrae/d20#7 (comment)).
@posita

posita commented Sep 24, 2021

You're right, of course. There's another foot gun, in that NumPy "seeds" can be stateful:

>>> from numpy.random import default_rng, PCG64DXSM
>>> seed = PCG64DXSM()
>>> generator = default_rng(seed)
>>> state = seed.state
>>> generator.random()
0.8390240576385848
>>> seed.state == state
False
>>> seed.state = state
>>> generator.random()
0.8390240576385848

So if you keep a reference to the BitGenerator outside of the NumpyRandom object, you can manipulate how the latter behaves (and use of the latter will change the state of the thing referenced). That and the case you point out make for some sharp edges that don't justify their presence.

I like the subclass thing. I took a stab (and solved the non-hashable seed problem) here.
