ENH: randomgen #13163

mattip · 2019-03-20T14:06:20Z

~~A start at m~~Merging bashtage/randomgen into numpy, as part of NEP 19.

The original repo was cloned, moved to a subdirectory, and then merged into numpy, as documented in _randomgen/README-git.md.
Then I moved the code into numpy/random and the docs into doc/source/random and doc/source/papers.

~~Still very much a work in progress.~~

mattip · 2019-03-20T14:07:14Z

So far I have begun to extend the setup.py to build the modules, but our cythonize script is very primitive.

rgommers · 2019-03-20T20:25:18Z

License related question: if we list this under "bundled libraries" with its own license, does it mean that randomgen will continue to be developed standalone? I thought it was meant solely for inclusion in NumPy.

rkern · 2019-03-20T21:22:33Z

It's intended for incorporation. randomgen as a separate package will be frozen after this gets merged.

rgommers · 2019-03-20T22:43:41Z

> It's intended for incorporation. randomgen as a separate package will be frozen after this gets merged.

Nice. If that NCSA license is absolutely necessary then I guess it's no problem. However, for something developed for inclusion in NumPy and only being a part of NumPy going forward, just having it under the NumPy license would be preferred.

mattip · 2019-03-21T11:29:32Z

I had to move pcg32.pyx, pcg64.pyx, and the examples temporarily to randomgen_ignore.

The pcg cython files use compiler directives, which we will need to refactor since the source files generated will differ by platform, preventing distributing a single source-tarball.

The examples use from randomgen import ... or so, and our cythonize script runs from the directory containing the pyx file, so it cannot import that way.

mattip · 2019-03-21T11:32:59Z

@bashtage, @rkern: I have imported randomgen in a way that preserves all the commits from that repo, assuming we wish to keep the commit history. Is that OK? Are there things we should git rebase to collapse into a single commit?

bashtage · 2019-03-21T11:42:19Z

@mattip I would think that it should be squashed at least some eventually. When picking, I think it would be nice to keep at least one commit from each of the randomgen contributors in, if possible.

mattip · 2019-03-21T13:28:54Z

The current failures are due to cython using its own numpy.pxd rather than the one we ship alongside mtrand.pyx (since that is in a different subdir), which has the line cdef extern from "numpy/npy_no_deprecated_api.h": pass. Choices are:

1. Work on getting #12284 (ENH: supply our version of numpy.pxd, requires cython>=0.29) merged, then update the PR to use it.
2. Copy the numpy.pxd from random/mtrand into random/randomgen.
3. Do the relevant parts of #12284 for randomgen to get this working without actually merging #12284.

bashtage · 2019-03-21T18:15:58Z

I think a clear official #12284 would be the best path since it would resolve this ambiguity once and for all.

mattip · 2019-03-21T18:21:06Z

In d079759 I took the third path, which it turns out is necessary anyway to merge this work with #12284. It was enough on my machine to quiet all the warnings and build the c-extension modules via cython.

mattip · 2019-03-23T17:15:23Z

> Can we add a FIXME here ...

@rkern please feel free to push to my branch, I would be happy for collaboration.

charris · 2019-03-23T19:24:52Z

> When picking, I think it would be nice to keep at least one commit from randomgen contributors in, if possible.

Could maybe add a Thanks file or similar listing all the contributors. Should be easy to generate with git.

mattip · 2019-03-23T21:19:56Z

The code is sourced from places other than git. For instance, this file has contributors that do not appear in the git history, and has a different license than the rest of randomgen

bashtage · 2019-03-23T21:31:20Z

Many (all) of the C source files for the BRNGs have no git history since they were made available from the authors directly.

bashtage · 2019-03-23T21:33:41Z

@rkern's idea is clearly the right way to do compat in PCG64. I once tried to do it but gave up since the upside for me was small in randomgen. I found it a bit tricky since Cython understands uint128, so using a fake type wasn't so simple. At one point I thought some small abstraction beyond a simple #ifdef was needed before I stopped trying.

mattip · 2019-03-23T22:22:33Z

Which platforms do not support __uint128?

Does the strategy of breaking the single 128bit int into two 64 bit integers with high and low support both big- and little- endian systems transparently?

bashtage · 2019-03-23T23:40:13Z

Microsoft compilers (of course).

bashtage · 2019-03-23T23:42:20Z

I looked through it and I believe that it is endian independent (and it passed on ARM, but I'm not sure whether this is BE or LE, and it might depend). @rkern wrote it, so he would know better.
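
A minimal Python sketch of the high/low question above, not the code from this PR: 128-bit addition and multiplication built from two 64-bit halves. The helper names `split128`, `join128`, `add128` and `mul128` are hypothetical. The arithmetic works on integer values rather than on bytes in memory, so nothing in it depends on byte order. Endianness would only come into play if the two halves were reinterpreted from raw memory.

```python
MASK64 = (1 << 64) - 1


def split128(x):
    """Split a non-negative integer below 2**128 into (high, low) halves."""
    return (x >> 64) & MASK64, x & MASK64


def join128(high, low):
    """Recombine (high, low) 64-bit halves into one integer."""
    return (high << 64) | low


def add128(a, b):
    """Add two (high, low) pairs modulo 2**128."""
    low = (a[1] + b[1]) & MASK64
    # The low half wrapped around exactly when the result is smaller
    # than one of the operands.
    carry = 1 if low < a[1] else 0
    high = (a[0] + b[0] + carry) & MASK64
    return high, low


def mul128(a, b):
    """Multiply two (high, low) pairs modulo 2**128."""
    # Python integers are unbounded, so the full low-by-low product is
    # available directly. In C this step needs a 64x64 -> 128 bit
    # multiply, either from an intrinsic or from 32-bit pieces.
    low_product = a[1] * b[1]
    low = low_product & MASK64
    # Cross terms only affect the high half; a[0] * b[0] is shifted
    # out entirely by the 2**128 modulus.
    high = ((low_product >> 64) + a[0] * b[1] + a[1] * b[0]) & MASK64
    return high, low
```

One step of a linear congruential update, `state * multiplier + increment`, is then `add128(mul128(state, multiplier), increment)` on the split values.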

mattip · 2019-03-24T08:49:44Z

Tests that come with randomgen now pass.
I think next I should try to move randomgen into the numpy.random namespace and drop mtrand. It seems

numpy.random.RandomState = randomgen.legacy.LegacyGenerator
numpy.random.mtrand = randomgen.legacy.LegacyGenerator()
for x in dir(mtrand):
    if x[0] != '_':
        locals()[x] = getattr(mtrand, x)

answers the demands of the NEP (selectively quoted below) except for the missing set_state and get_state. LegacyGenerator().__getstate__ returns a dictionary, whereas numpy.random.get_state returns a tuple. I could make that work by coercing the dict to a tuple, and vice versa for set_state.

Am I on the right track?

> First, we will maintain API source compatibility just as we do with the rest of numpy. If we must make a breaking change, we will only do so with an appropriate deprecation period and warnings.
>
> Second, breaking stream-compatibility in order to introduce new features or improve performance will be allowed with caution. ...
>
> [The] legacy distributions class MUST be accessible under the name numpy.random.RandomState for backwards compatibility. All current ways of instantiating numpy.random.RandomState with a given state should instantiate the Mersenne Twister basic RNG with the same state. ... Instances of the legacy distributions class MUST respond True to isinstance(rg, numpy.random.RandomState) because there is current utility code that relies on that check. Similarly, old pickles of numpy.random.RandomState instances MUST unpickle correctly.
>
> Specifically, the initial release of the new PRNG subsystem SHALL leave these (numpy.random.*) convenience functions as aliases to the methods on a global RandomState that is initialized with a Mersenne Twister basic RNG object. A call to numpy.random.seed() will be forwarded to that basic RNG object. In addition, the global RandomState instance MUST be accessible in this initial release by the name numpy.random.mtrand.
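
The get_state/set_state coercion mentioned above could look roughly like the following sketch. The tuple layout `(name, key, pos, has_gauss, cached_gaussian)` is the one `numpy.random.get_state()` returns. The dictionary layout (keys `"brng"`, `"state"`, `"has_gauss"`, `"gauss"`) is an assumption made for illustration and may not match what `LegacyGenerator` produces, and the two function names are hypothetical. Plain lists stand in for the key array so the sketch has no dependencies.

```python
def state_dict_to_tuple(state):
    """Convert the assumed dictionary layout to the legacy 5-tuple."""
    inner = state["state"]
    return (
        state["brng"],
        list(inner["key"]),
        int(inner["pos"]),
        int(state["has_gauss"]),
        float(state["gauss"]),
    )


def state_tuple_to_dict(state):
    """Convert a legacy tuple to the assumed dictionary layout.

    A 3-tuple without the cached Gaussian fields is also accepted,
    in which case the cache is treated as empty.
    """
    if len(state) not in (3, 5):
        raise ValueError("state must have 3 or 5 elements")
    name, key, pos = state[:3]
    has_gauss, gauss = state[3:] if len(state) == 5 else (0, 0.0)
    return {
        "brng": name,
        "state": {"key": list(key), "pos": int(pos)},
        "has_gauss": int(has_gauss),
        "gauss": float(gauss),
    }
```

A round trip through both functions should return the original value, which gives a cheap regression test for the compatibility layer.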

bashtage · 2019-03-24T14:00:10Z

I used a property for state, which seems more natural than a java-ish set/get paradigm. I also went with a dictionary, which is much easier to handle in case of breaks since nothing in the dictionary is really a commitment as long as future versions use different keys.

One or both of these should possibly be overruled.

Does the method you have above produce an identical dir to the current random? I can't remember what I did with ancient function names like ranf that are only aliases.

mattip · 2019-03-24T14:26:35Z

The aliases are set in __init__.py. The only missing attributes from dir(randomgen.legacy.LegacyGenerator()) are set_state and get_state.
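
A minimal sketch of the aliasing pattern discussed above, together with a check for names that are still missing. `LegacyLike` is a hypothetical stand-in for the legacy generator, and `export_public` and `missing_names` are illustrative helpers, not functions from numpy or randomgen.

```python
class LegacyLike:
    """Hypothetical stand-in for a legacy generator instance."""

    def __init__(self):
        self._counter = 0

    def seed(self, value=0):
        self._counter = value

    def random_sample(self):
        self._counter += 1
        return self._counter


def export_public(instance, namespace):
    """Copy every public attribute of instance into namespace."""
    exported = []
    for name in dir(instance):
        if not name.startswith("_"):
            # Bound methods keep a reference to the instance, so all
            # aliases share the state of the one global object.
            namespace[name] = getattr(instance, name)
            exported.append(name)
    return exported


def missing_names(required, namespace):
    """Return the required names that namespace does not provide."""
    return sorted(set(required) - set(namespace))


namespace = {}
export_public(LegacyLike(), namespace)
required = ["seed", "random_sample", "get_state", "set_state"]
still_missing = missing_names(required, namespace)
```

Here `still_missing` lists `get_state` and `set_state`, which mirrors the gap reported in the comment above.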

bashtage · 2019-03-24T15:48:24Z

One further issue about using the same BRNG between legacy and a modern generator -- the legacy state has an additional value containing the next gaussian, since it uses a polar transformation. This might have some implications for setting the state if the basic RNG is shared.
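
To make the extra state concrete, here is a minimal sketch of a polar-method Gaussian sampler that produces two values per round and caches one of them. It uses Python's `random.Random` as the underlying source purely for illustration. The class and attribute names are hypothetical, and the output is not claimed to match the stream of numpy or randomgen.

```python
import math
import random


class PolarGaussian:
    """Polar-method normal sampler that caches the second value."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.has_gauss = False
        self.gauss = 0.0

    def next_gauss(self):
        if self.has_gauss:
            # Hand out the cached value without touching the RNG.
            self.has_gauss = False
            value, self.gauss = self.gauss, 0.0
            return value
        while True:
            x1 = 2.0 * self._rng.random() - 1.0
            x2 = 2.0 * self._rng.random() - 1.0
            r2 = x1 * x1 + x2 * x2
            if 0.0 < r2 < 1.0:
                break
        f = math.sqrt(-2.0 * math.log(r2) / r2)
        self.gauss = f * x1
        self.has_gauss = True
        return f * x2

    def get_state(self):
        """Full state: underlying RNG state plus the Gaussian cache."""
        return self._rng.getstate(), self.has_gauss, self.gauss

    def set_state(self, state):
        rng_state, self.has_gauss, self.gauss = state
        self._rng.setstate(rng_state)
```

Restoring only the underlying RNG state after an odd number of draws drops the cached value, so the next draw differs from the original stream. That is the hazard when a legacy wrapper and a modern generator share one basic RNG.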

mattip · 2019-03-25T15:08:02Z

I am getting a non-terminating loop in random_zipf from distributions.c for certain conditions in tests.

I modified the code to print the values in each loop, details are below. I am not sure of how to provide a better stopping condition when a == 2.0, X == -sys.maxint - 1, V is very large.

code

```c
int64_t random_zipf(brng_t *brng_state, double a) {
  double T, U, V;
  int64_t X;
  double am1, b;

  am1 = a - 1.0;
  b = pow(2.0, am1);
  do {
    U = 1.0 - next_double(brng_state);
    V = next_double(brng_state);
    X = (int64_t)floor(pow(U, -1.0 / am1));
    /* The real result may be above what can be represented in a int64.
     * It will get casted to -sys.maxint-1. Since this is
     * a straightforward rejection algorithm, we can just reject this value
     * in the rejection condition below. This function then models a Zipf
     * distribution truncated to sys.maxint.
     */
    T = pow(1.0 + 1.0 / X, am1);
    fprintf(stdout, "am1 %10.3g T %10.3g U %10.3g V %10.3g X %ld\n", T, U, V, X);
  } while (((V * X * (T - 1.0) / (b - 1.0)) > (T / b)) || X < 1);
  return X;
}
```

which prints

am1 1 T 0.188 U 0.518 V 1.8e+308 X -9223372036854775808
am1 1 T 0.524 U 0.319 V 1.8e+308 X -9223372036854775808
am1 1 T 0.638 U 0.816 V 1.8e+308 X -9223372036854775808
am1 1 T 0.867 U 0.56 V 1.8e+308 X -9223372036854775808
am1 1 T 0.841 U 0.822 V 1.8e+308 X -9223372036854775808
am1 1 T 0.525 U 0.189 V 1.8e+308 X -9223372036854775808
am1 1 T 0.735 U 0.15 V 1.8e+308 X -9223372036854775808
am1 1 T 0.839 U 0.546 V 1.8e+308 X -9223372036854775808
am1 1 T 0.582 U 0.753 V 1.8e+308 X -9223372036854775808
am1 1 T 0.0421 U 0.854 V 1.8e+308 X -9223372036854775808
am1 1 T 0.0792 U 0.67 V 1.8e+308 X -9223372036854775808
am1 1 T 0.31 U 0.436 V 1.8e+308 X -9223372036854775808
am1 1 T 0.951 U 0.889 V 1.8e+308 X -9223372036854775808
am1 1 T 0.862 U 0.641 V 1.8e+308 X -9223372036854775808
am1 1 T 0.967 U 0.285 V 1.8e+308 X -9223372036854775808
am1 1 T 0.0981 U 0.865 V 1.8e+308 X -9223372036854775808
am1 1 T 0.444 U 0.884 V 1.8e+308 X -9223372036854775808
am1 1 T 0.73 U 0.122 V 1.8e+308 X -9223372036854775808
...

mattip · 2019-03-25T15:25:31Z

It helps to debug the debugging code. The printf statement was off-by-one, fixing it shows a==NAN.
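
A Python sketch of the same rejection loop with the parameter validated up front, which is one way to avoid the hang described above. It is an illustration under stated assumptions, not the fix that went into numpy: the function name mirrors the C one, the `rng` argument is assumed to be a `random.Random`, and the loop is written for clarity rather than speed. With `a == NaN`, the cast in the C version yielded the minimum int64 seen in the log, so `X < 1` held on every iteration. The check `a > 1.0` is false for NaN, so testing it before the loop rejects that input.

```python
import math
import random

INT64_MAX = 2**63 - 1


def random_zipf(rng, a):
    """Draw one Zipf variate, truncated to the int64 range."""
    # NaN fails `a > 1.0`, and infinity is excluded explicitly
    # because it would also make the acceptance test undecidable.
    if not (math.isfinite(a) and a > 1.0):
        raise ValueError("a must be finite and greater than 1")
    am1 = a - 1.0
    b = 2.0 ** am1
    while True:
        u = 1.0 - rng.random()  # in (0, 1], so the power is defined
        v = rng.random()
        try:
            x = math.floor(u ** (-1.0 / am1))
        except OverflowError:
            # Beyond the float range, hence beyond int64: reject.
            continue
        if x < 1 or x > INT64_MAX:
            continue
        t = (1.0 + 1.0 / x) ** am1
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return x
```

For `a` only slightly above 1, almost every candidate exceeds the int64 range and is rejected, so the loop can run for a very long time. The sketch does not try to handle that regime.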

mattip · 2019-05-27T20:46:45Z

hmm, getting a FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'. It seems there is bitrot when merging with master

mattip · 2019-05-28T04:29:56Z

tests are passing

seberg · 2019-05-28T17:05:29Z

I would like to thank everyone for making this milestone possible. Especially Robert and Kevin who made all of randomgen possible. I am also very happy and impressed to see so many specialists weighing in!
Probably, there will be a few follow-up issues, but I trust that this is very high quality work and we have discussed the exact API in depth so that I believe it is hashed out very well.

I am looking forward to using the new API, thanks all!

bashtage · 2019-05-28T17:13:11Z

Great. How long does it take for this to make it out to projects that test using prerelease builds of NumPy? As fun as the API discussion is about Generator, my main concern is verifying that I didn't accidentally break something in RandomState.

seberg · 2019-05-28T17:19:06Z

Indeed, I do think we have a few projects testing against master so that should be fairly quick. The big test will be the rc releases probably.

WarrenWeckesser · 2019-06-09T20:52:20Z

Would anyone object to a pull request that renames distributions-boxmuller.c in numpy/random/src/legacy/ to distributions-legacy.c? Or would that be needless churn? The file contains all the legacy distributions, not just the normal distribution.

When I first saw the file name, I didn't realize that it was the file I was looking for. It only took a quick look to figure it out, so this isn't a high priority, but I think a more accurate name would be helpful.

bashtage · 2019-06-09T20:59:55Z

I switched in randomgen after @mattip started the process with legacy-distributions.c and .h

https://github.com/bashtage/randomgen/tree/master/randomgen/src/legacy

I also moved them to src/legacy which I'm not sure is totally necessary, so I'm +1 on the clarifications. IMO a git mv is minimal churn.

Perhaps a comment at the top to explain that this is the file where hanged fucntions from distributions.c go to maintain the RandomState promise would be helpful too.

WarrenWeckesser · 2019-06-09T21:57:38Z

OK, pull request is #13743

mattip · 2019-06-10T04:44:12Z

> hanged fucntions from distributions.c

@rkern could you rephrase that request in the PR? Do you mean frozen versions of the functions?

WarrenWeckesser · 2019-06-10T23:01:51Z

@mattip, it was @bashtage who made that comment.

bashtage · 2019-06-10T23:06:37Z

> Do you mean frozen versions of the functions?

Yes. I mean the canonical version that must be used in RandomState. legacy-distributions.c is the file where the canonical version should be moved to when the generator for a specific distribution in distributions.c is improved.

WarrenWeckesser · 2019-06-12T05:10:33Z

numpy/random/generator.pyx

+        Does their energy intake deviate systematically from the recommended
+        value of 7725 kJ?
+
+        We ha


Why is the actual output not shown here? Is it because the indeterminate value causes problems with doctests?

mattip added 01 - Enhancement component: numpy.random labels Mar 20, 2019

mattip mentioned this pull request Mar 20, 2019

ENH: tracking issue for merging randomgen into numpy #13164

Closed

16 tasks

mattip force-pushed the randomgen branch from 1368fc9 to 0adab55 Compare March 21, 2019 11:24

mattip force-pushed the randomgen branch from d092529 to 407b96a Compare March 23, 2019 21:30

mattip force-pushed the randomgen branch from 407b96a to a3303a2 Compare March 23, 2019 22:17

bashtage and others added 5 commits May 27, 2019 22:58

MAINT: Remove remnants of bit generators

dabf42b

Remove traces of the three removed bit generators Add lock to Cython examples

BLD: Improve setup

3db5a77

Attempt to avoid defining variables that are incorrect for some platforms

Revert "MAINT: Implement API changes for randomgen-derived code"

58c0e72

This reverts commit 17e0070.

STY: Clean up code

23853d6

Pep8 fixes Remove unused imports Fix name error

PERF: Reorder header for philox (#34)

9c261e6

* PERF: Reorder header for philox Reorder header so that uint128 support is always used if available, irrespective of platform

mattip force-pushed the randomgen branch from dbc1374 to 9c261e6 Compare May 27, 2019 19:59

MAINT: fix for dtype specification

70d6293

mattip force-pushed the randomgen branch from 6152173 to 70d6293 Compare May 27, 2019 21:09

mattip mentioned this pull request May 28, 2019

ENH: prevent access to default BitGenerator #13650

Closed

seberg merged commit 22239d1 into numpy:master May 28, 2019

glemaitre mentioned this pull request Jun 4, 2019

Wrong value for the sum of p in np.random.choice #13713

Closed

bashtage mentioned this pull request Jun 5, 2019

BUG: Fix random.choice when probability is not C contiguous #13716

Merged

WarrenWeckesser reviewed Jun 12, 2019

mattip deleted the randomgen branch August 8, 2019 17:19

anjakefala mentioned this pull request Oct 23, 2019

Generate a Gaussian noise dataset radiocosmology/draco#52

Closed

4 tasks

sethtroisi mentioned this pull request Jan 7, 2020

Set Cython 'language_level` directive after 1.16 release. #12356

Closed

mattip mentioned this pull request Dec 21, 2021

ENH: reduce the overhead of the checks in the multinomial function #20636

Closed

lucyleeow mentioned this pull request Aug 4, 2023

Should we consider moving from legacy numpy RandomState to Random.Generator? scikit-learn/scikit-learn#27008

Closed

jameslamb mentioned this pull request Nov 7, 2023

[python-package] Accept numpy generators as random_state microsoft/LightGBM#6174

Merged
