
Decide on new PRNG BitGenerator default #13635

Closed
rkern opened this issue May 27, 2019 · 166 comments

@rkern
Member

rkern commented May 27, 2019

#13163 will be bringing in the long-awaited replacement of numpy's PRNG infrastructure. In the interest of keeping that PR manageable, we will merge it to master before all of the decisions are finalized, like which BitGenerator will be nominated as the default.

We must make a decision before the first release with the new infrastructure. Once released, we will be stuck with our choice for a while, so we should be sure that we are comfortable with our decision.

On the other hand, the choice of the default does not have that many consequences. We are not talking about the default BitGenerator underlying the numpy.random.* convenience functions. Per NEP 19, these remain aliases to the legacy RandomState, whose BitGenerator remains MT19937. The only place where the default comes in is when Generator() is instantiated without arguments; i.e. when a user requests a Generator with an arbitrary state, presumably to then call the .seed() method on it. This would probably be pretty rare, as it would be about as easy to just explicitly instantiate it with the seeded BitGenerator that they actually want. A legitimate choice here might actually be to nominate no default and always require the user to specify a BitGenerator.

Nonetheless, we will have recommendations as to which BitGenerator people should use most of the time, and while we can change recommendations fairly freely, whichever one has pride of place will probably get written about most in books, blogs, tutorials, and such.

IMO, there are a few main options (with my commentary, please feel free to disagree; I have not attempted to port over all the relevant comments from #13163):

No default

Always require Generator(ChosenBitGenerator(maybe_seed)). This is a little unfriendly, but as it's a pretty convenient way to get the generator properly initialized for reproducibility, people may end up doing this anyways, even if we do have a default.
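For concreteness, the explicit spelling looks something like this (a sketch using the BitGenerator names from #13163; exact module paths may differ from what ships):

```python
import numpy as np

# Explicit choice of BitGenerator: reproducible because the seed is pinned
# to a named algorithm rather than to whatever the default happens to be.
rng = np.random.Generator(np.random.PCG64(12345))
print(rng.random())
```

Two generators built this way from the same seed produce identical streams, regardless of any future change to the default.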

MT19937

This would be a good conservative choice. It is certainly no worse than the status quo. As the Mersenne Twister is still widely regarded as "the standard" choice, it might help academic users who need their papers to be reviewed by people who might question "non-standard" choices, regardless of the specific qualities of the PRNG. "No one ever got fired for hiring IBM." The main downsides of MT19937 are that it is slower than some of the available alternatives, due to its very large state, and that it fails some statistical quality tests. In choosing another PRNG, we have an opportunity (but not an obligation, IMO) to be opinionated here and try to move "the standard", if we wish.

PCG64

This is likely the one that I'll be using most often, personally. The main downside is that it uses 128-bit integer arithmetic, which is emulated in C if the compiler does not provide such an integer type. The two main platforms for which this is the case are 32-bit CPUs and 64-bit MSVC, which just does not support 128-bit integers even when the CPU does. Personally, I do not suggest letting the performance on increasingly-rare 32-bit CPUs dictate our choices. The MSVC performance is important, though, since our Windows builds do need that compiler and not other Windows compilers. It can probably be addressed with some assembly/compiler intrinsics, but someone would have to write them. The fact that it's only MSVC that we have to do this for makes this somewhat more palatable than other times when we are confronted with assembly.

Xoshiro256

Another modern choice for a small, fast PRNG. It does have a few known statistical quirks, but they are unlikely to be a major factor for most uses. Those quirks make me shy away from it, but that's my personal preference for the code I'll be writing.

@charris
Member

charris commented May 27, 2019

What does the Intel windows compiler do for 128 bit integers? How much slower is PCG64 compiled with MSVC compared to MT19937 on windows? I suspect that the jump ahead feature will be widely used, so it might be good to have it by default.

@rkern
Member Author

rkern commented May 27, 2019

What does the Intel windows compiler do for 128 bit integers?

Not entirely sure; I don't know if there are ABI implications that ICC would care to be constrained by. If we just want to get any idea of the generated assembly that we could use, then this is a handy resource: https://godbolt.org/z/kBntXH

I suspect that the jump ahead feature will be widely used, so it might be good to have it by default.

Do you mean settable streams, rather? That's a good point, but I wonder if it might not cut the other way. If our choice of default actually matters much, then maybe if we pick one of these more fully-featured PRNGs, people will use those features more extensively in library code without documenting that they require those "advanced" features, because, after all, they are available "standard". But then if another user tries to use that library with a less-featured BitGenerator for speed or other reasons, then they'll hit a brick wall. In a No default or MT19937 world, libraries would be more likely to think about and document the advanced features that they require.

On the gripping hand, that eventuality would make the BitGenerators without settable streams look less desirable, and I do like the notion of advancing what's considered to be best practice in that direction (purely personally; I don't feel an obligation to make NumPy-the-project share that notion). It might help avoid some of the abuses that I see with people .seed()ing in the middle of their code. But again, all that's predicated on the notion that having a default will change people's behaviors significantly, so all of these concerns are likely quite attenuated.

@imneme

imneme commented May 27, 2019

How much slower is PCG64 compiled with MSVC compared to MT1993 on windows?

In benchmarks posted by @bashtage in #13163, PCG64 is nearly half the speed of MT19937, which is pretty disappointing performance out of MSVC and friends. By contrast, PCG64 is 23% faster than MT19937 on Linux.

What does the Intel windows compiler do for 128 bit integers?

Other compilers like Clang, GCC, and the Intel compiler implement 128-bit integers on 64-bit systems the same way that they implemented 64-bit integers on 32-bit systems. All the same techniques with no new ideas needed. Microsoft didn't bother to do that for MSVC so there are no 128-bit integers directly supported by the compiler.

As a result, for MSVC the existing implementation of PCG64 in #13163 hand-implements 128-bit math by calling out to Microsoft intrinsics like _umul128 on x86_64 (and it could presumably also use equivalent and more portable Intel intrinsics like _mulx_u64 instead), thereby coding what GCC, Clang and the Intel compiler would do by themselves. The biggest issue is likely that Microsoft's compiler doesn't optimize these intrinsics very well (hopefully they're at least inlined?). It's possible that hand-coded assembler might go faster, but the proper fix would be for the compiler not to be so diabolically poor.

I suspect that the jump ahead feature will be widely used, so it might be good to have it by default.

I'm glad you like jump ahead, but I'm curious why you think it'd be widely used. (Personally, I really like distance, which tells you how far apart two PRNGs are. That's in the C++ version of PCG, but not the C one. It'd be trivial enough to add it though if there was interest.)

@charris
Member

charris commented May 27, 2019

I'm glad you like jump ahead, but I'm curious why you think it'd be widely used.

Probably unfamiliarity with current terminology. What I mean is easily obtained independent streams that can be used to run simulations in parallel. I don't know how many simulation problems can be parallelized, but I suspect it is a lot and given the number of cores people get on a chip these days, that could easily make up for a speed disadvantage.
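That parallel-simulation workflow can be sketched with the new infrastructure (a hedged sketch: it assumes the `jumped` method that PCG64 exposes in #13163; jump sizes and exact spellings may differ):

```python
import numpy as np

# Independent streams for parallel workers: each worker's bit generator is
# jumped far enough ahead of the base state that the streams cannot overlap
# in any practical amount of drawing.
base = np.random.PCG64(2019)
workers = [np.random.Generator(base.jumped(i + 1)) for i in range(4)]
draws = [w.random() for w in workers]
```

Each element of `workers` can then be handed to a separate process or thread without any coordination between them.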

Microsoft didn't bother to do that for MSVC so there are no 128-bit integers directly supported by the compiler.

So that will hurt our wheels. OTOH, many folks on Windows get their packages from Anaconda or Enthought, both of which use Intel compilers, and folks who really care about performance are probably on Linux, Mac, or maybe AIX.

EDIT: And perhaps if Microsoft is concerned, they could offer a bounty for fixing the problem.

@rkern
Member Author

rkern commented May 27, 2019

FWIW, here's the assembly that clang would generate for the critical function, including the bits needed to unpack/repack the uint128_t into the struct of uint64_ts: https://godbolt.org/z/Gtp3os

@imneme

imneme commented May 27, 2019

Very cool, @rkern. Any chance you can do the same to see what MSVC is doing with the hand-written 128-bit code?

@rkern
Member Author

rkern commented May 27, 2019

Very cool, @rkern. Any chance you can do the same to see what MSVC is doing with the hand-written 128-bit code?

It's, uh, not pretty. https://godbolt.org/z/a5L5Gz

Oops, forgot to add -O3, but it's still ugly: https://godbolt.org/z/yiQRhd

@imneme

imneme commented May 27, 2019

It's not quite that bad. You didn't have optimization on, so it didn't inline anything. I've added /Ox (maybe there's a better option?). I also fixed the code to use the built-in rotate intrinsic (_rotr64) since apparently MSVC is incapable of spotting the C rotate idiom.

Still kind-of a train wreck though. But I think it's fair to say that with a bit of attention, the PCG64 code could be tweaked to compile on MSVC into something that isn't utterly embarrassing for everyone.

@eric-wieser
Member

In order to allow everything else to be merged, why not pick "no default" for now? That leaves us free to make the decision on the default later (even after one or more releases) without breaking compatibility.

@mattip
Member

mattip commented May 27, 2019

Most of our users are not random number experts, we should be providing defaults for them.

Beyond the prosaic "now they need to type more code", what happens when we change something? In the case where the BitGenerator is hard coded, (because we did not provide a default), every non-sophisticated user will now have to refactor their code, and hopefully understand the nuances of their choice (note we cannot even agree amongst ourselves what is best). However if we provide a default, we might noisily break their tests because the new default or new version is not bit-stream compatible.

Between the assumption that the bit-stream will always be constant versus the assumption that the NumPy developers know what they are doing and the default values should be best-of-brand, I would err on the side of the second assumption, even if it breaks the first.

Edit: clarify which developers should know what they are doing

@rkern
Member Author

rkern commented May 27, 2019

Most of our users are not random number experts, we should be providing defaults for them.

Well, we'll certainly be documenting recommendations, at the very least, regardless of whether or not we have a default or what the default is.

Beyond the prosaic "now they need to type more code", what happens when we change something?

What "something" are you thinking about? I can't follow your argument.

@bashtage
Contributor

Beyond the prosaic "now they need to type more code", what happens when we change something?

What "something" are you thinking about? I can't follow your argument.

@mattip is referring to changing the default bit generator.

This would make users who have adopted it mad, and could require some code changes.

For example, if you used

g = Generator()
g.bit_generator.seed(1234)

and the underlying bit generator was changed, then this would be wrong.

If you did the more sane thing and used

Generator(BitGenerator(1234))

then you would not see it.

IMO, when considering the choice of default, we should treat it as fixed until a fatal flaw is found in the underlying bit generator, or Intel adds a QUANTUM_NI to its chips that produces an orders-of-magnitude improvement in random-number performance.

@imneme

imneme commented May 27, 2019

I realize I'm a bit of an outsider here, but I don't think it's reasonable to expect that which PRNG is the default choice is fixed forever and never changes. (In C++, for example, std::default_random_engine is at the discretion of the implementation and can change from release to release.)

Rather, there needs to be a mechanism to reproduce prior results. Thus once a particular implementation exists, it is very uncool to change it (e.g., the MT19937 is MT19937, you can't tweak it to give different output). [And it's also uncool to remove an implementation that already exists.]

When the default changes, people who want to keep reproducing old results will need to ask for the previous default by name. (You could make that by providing a mechanism to select the default corresponding to a prior release.)

That said, even if you're allowed to swap out the default generator for something else, it really needs to be strictly better — any features present in the default generator represent a commitment to support that feature in the future. If your default generator has efficient advance, you can't really take that away later. (You could potentially wall off advanced functionality in the default generator to avoid this issue.)

In summary, there are ways to make sure users can have reproducible results without trying to lock yourselves into a contract where the default is unchanged forever. It'll also reduce the stakes for the choice you make.

(FWIW, this is what I did in PCG. The default PCG 32-bit PRNG is currently the XSH-RR variant [accessed as pcg_setseq_64_xsh_rr_32_random_r in the C library and the pcg_engines::setseq_xsh_rr_64_32 class in the C++ library], but in principle if you really want future-proof reproducibility you should specify XSH-RR explicitly, rather than use pcg32_random_r or pcg32 which are aliases and in principle can be upgraded to something else.)

@bashtage
Contributor

It is not really forever (this entire project is 90% driven by a real, genuine, and honored forever-promise made about 14 years ago), but as you say, switching needs (a) a compelling reason to change and (b) would take at least a few years given the deprecation cycle.

It is much better to try hard today to get it as close to right as possible.

One thing that isn't banned, of course, is improving PRNG code after release as long as it produces the same values. For example, if we went with a PRNG that used uint128, we could let MS add uint128 support (fat chance) or add assembly for Win64 in a future version.

@rkern
Member Author

rkern commented May 27, 2019

For example, if you used

g = Generator()
g.bit_generator.seed(1234)

and the underlying bit generator was changed, then this would be wrong.

Right, that seems to be arguing, with @eric-wieser, for the "No default" option, which I can't square with the initial statement "Most of our users are not random number experts, we should be providing defaults for them."

@bashtage
Contributor

Between no default and a friendly, fully assuming default, I would always choose the latter:

Now:

Generator() # OK
Generator(DefaultBitGenerator(seed)) # OK
Generator(seed) # error

my preference:

Generator(1234) == Generator(DefaultBitGenerator(1234))
Generator(*args, **kwargs) == Generator(DefaultBitGenerator(*args, **kwargs))

Now I don't think this is going to get in, but I think one way to prolong the use of RandomState is to make this only available to users who feel they are expert enough to choose a bit generator.

@rkern
Member Author

rkern commented May 27, 2019

In summary, there are ways to make sure uses can have reproducible results without trying to lock yourselves into a contract where the default is unchanged forever. It'll also reduce the stakes for the choice you make.

Yes, we have that. Users can grab BitGenerators by name (e.g. MT19937, PCG64, etc.) and instantiate them with seeds. BitGenerator objects implement the core uniform PRNG algorithm with a limited set of methods for drawing uniform [0..1) float64s and integers (as well as whatever fun jumpahead/stream capabilities they have). The Generator class that we are talking about takes a provided BitGenerator object and wraps around it to provide all of the non-uniform distributions, the Gaussians, the gammas, the binomials, etc. We have strict stream compatibility guarantees for the BitGenerators. We won't be getting rid of any (that make it to release), nor will we be changing them.

The central question about the default is "What does the code g = Generator(), with no arguments, do?" Right now, in the PR, it creates a Xoshiro256 BitGenerator with an arbitrary state (i.e. drawn from a good entropy source like /dev/urandom). The "No default" option would be to make that an error; users would have to explicitly name the BitGenerator that they want. @eric-wieser's point is that "No default" is a categorically safe option for the first release. A later release providing a default won't cause problems in the same way that changing an existing default does.

@imneme

imneme commented May 27, 2019

@rkern, If you only care about the no arguments case where the seed is autogenerated from available entropy, then it really doesn't matter much what the underlying generator is — it could change on an hourly basis since the results would never be reproducible (different runs would get different seeds).

In contrast, @bashtage seems to care about a default generator that's provided with a seed.

@rkern
Member Author

rkern commented May 27, 2019

@rkern, If you only care about the no arguments case where the seed is autogenerated from available entropy, then it really doesn't matter much what the underlying generator is — it could change on an hourly basis since the results would never be reproducible (different runs would get different seeds).

You can reseed the BitGenerator after it's created. So if Generator() works, what I'm fully expecting to happen is that people who want a seeded PRNG will just seed it in the next line, as in @bashtage's example:

g = Generator()
g.bit_generator.seed(seed)

That's somewhat tedious, which is why I suggested at top that maybe most people would usually opt for Generator(PCG64(<seed>)) anyways, since it's just about as convenient typing-wise. However, @bashtage correctly notes some resistance when faced with making an extra decision.

So I guess we also have a broader question in front of us: "What are all the ways that we want users to instantiate one of these? And if those ways have default settings, what should those defaults be?" We have some open design space, and @bashtage's suggestions of Generator(<seed>) or Generator(DefaultBitGenerator(<seed>)) are still possibilities.

@bashtage How much do you think documentation would help? That is, if we said at the top "PCG64 is our preferred default BitGenerator" and used Generator(PCG64(seed)) consistently in all examples (when not specifically demonstrating other algorithms)?

I might be more convinced to have a default_generator(<seed>) function over Generator(<seed>) or g=Generator();g.seed(<seed>). Then if we really needed to change it and didn't want to break stuff, we could just add a new function and add warnings to the old one. I might recommend marking it experimental for the first release, giving us some time to watch this infrastructure in the wild before making a firm commitment.
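The factory-function idea can be sketched as follows (hedged: `default_generator` is a hypothetical name from this comment, not a real NumPy API; the sketch just pins the current preference in one swappable place):

```python
import numpy as np

def default_generator(seed=None):
    """Hypothetical factory: return a Generator backed by the project's
    currently preferred BitGenerator (PCG64 here, as an assumption)."""
    return np.random.Generator(np.random.PCG64(seed))

g = default_generator(1234)
```

If the preferred BitGenerator ever changed, only the body of this one function would change, and a deprecation warning could be attached to it without touching the Generator class itself.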

@shoyer
Member

shoyer commented May 27, 2019

What about actually making a DefaultBitGenerator object that doesn't expose any details of its internal state? This would be a proxy for one of the other bit generator objects, but could in principle be wrapping any of them -- except of course for its specific sequence of generated numbers. This would hopefully discourage users from making programmatic assumptions about what they can do with the default BitGenerator, while allowing us to still use an improved algorithm.

I agree with @bashtage that it would much more friendly to directly support integer seeds as arguments to Generator, e.g., np.random.Generator(1234). This would, of course, make use of DefaultBitGenerator.

In documentation for Generator, we could give a full history of what the default bit generator was in each past version of NumPy. This is basically @imneme's suggestion, and I think would suffice for reproducibility purposes.

@imneme

imneme commented May 27, 2019

(Just saw this edit to an earlier comment)

Oops, forgot to add -O3, but it's still ugly: https://godbolt.org/z/yiQRhd

For MSVC, it's not -O3, it's /O2 or /Ox (but not /O3!).

@shoyer
Member

shoyer commented May 27, 2019

In documentation for Generator, we could give a full history of what the default bit generator was in each past version of NumPy. This is basically @imneme's suggestion, and I think would suffice for reproducibility purposes.

Actually, even better would be to include an explicit version argument, like pickle's protocol argument, in Generator/DefaultBitGenerator. Then you could write something like np.random.Generator(123, version=1) to indicate that you want "version 1" random numbers (whatever that is) or np.random.Generator(123, version=np.random.HIGHEST_VERSION) (default behavior) to indicate that you want the latest/greatest bit generator (whatever that is).

Presumably version=0 would be the MT19937 that NumPy has used up to now, and version=1 could be whatever new default we pick.
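A minimal sketch of the `version` idea (hedged: every name here is hypothetical and for illustration only; the table maps version numbers to the real BitGenerator classes from #13163):

```python
import numpy as np

# Hypothetical version -> algorithm table; nothing here is a real NumPy API.
_VERSIONED_BIT_GENERATORS = {0: np.random.MT19937, 1: np.random.PCG64}
HIGHEST_VERSION = max(_VERSIONED_BIT_GENERATORS)

def versioned_generator(seed, version=HIGHEST_VERSION):
    """version=0 reproduces the historical default; the default argument
    always tracks the newest entry in the table."""
    return np.random.Generator(_VERSIONED_BIT_GENERATORS[version](seed))
```

Old code asking for `version=0` would keep its stream forever, while unversioned calls silently pick up improvements.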

@rkern
Member Author

rkern commented May 27, 2019

What about actually making a DefaultBitGenerator object that doesn't expose any details of its internal state? This would be a proxy for one of the other bit generator objects, but could in principle be wrapping any of them -- except of course for its specific sequence of generated numbers. This would hopefully discourage users from making programmatic assumptions about what they can do with the default BitGenerator, while allowing us to still use an improved algorithm.

Hmmm. That's appealing. It feels like it's overcomplicating things and adding another loop to this Gordian knot (and that there should be a more Alexander-esque stroke available to us), but that's really the only bad thing I have to say about it. It does make the remaining decisions easier: we can focus on statistical quality and performance.

Actually, even better would be to include an explicit version argument, like pickle, in Generator/DefaultBitGenerator.

I'm less a fan of this. Unlike the pickle case, these things have meaningful names that we can use, and we already have the mechanism implemented.

@shoyer
Member

shoyer commented May 27, 2019

I'm less a fan of this. Unlike the pickle case, these things have meaningful names that we can use, and we already have the mechanism implemented.

Consider the following from the perspective of a typical NumPy user:

  • np.random.Generator(seed, version=0) vs np.random.Generator(seed, version=1)
  • np.random.Generator(MT19937(seed)) vs np.random.Generator(PCG64(seed))

I think it's safe to assume that most of our users know very little about the relative merits of RNG algorithms. But even without reading any docs, they can safely guess that version=1 (the newer default) must be better in most cases than version=0. For most users, that's really all they need to know.

In contrast, names like MT19937 and PCG64 are really only meaningful for experts, or people who have already read our documentation :).

@rkern
Member Author

rkern commented May 27, 2019

In your use case, no one is selecting the version that they want. They only select the version that they need in order to replicate the results of a known version. They are always looking for a specific value that was used (implicitly, because we allowed it to be implicit) in the results that they want to replicate; they don't need to reason about the relationship between multiple values.

And in any case, that level of cross-release reproducibility is something that we've disclaimed in NEP 19. The arguments against versioning the distributions apply just as well here.

@rgommers
Member

A few thoughts on the default:

  • 99.9% of users won't care or want to know about underlying algorithms, they just want random numbers. So +1 for making an opinionated choice for the default, please don't make users choose.
  • dSFMT seems to be simply a faster version of MT19937 (it would be nice to state in the docs how fast, and remove "SSE2"). Since we're not guaranteeing bitstream reproducibility anyway, the internal state differences are not very interesting, and dSFMT should be preferred over MT19937 even if the winning argument here is "make life easier during article review".
  • Performance matters to a significant fraction of the user base. Statistical properties of generators only matters to a very small fraction of users. All included generators are fine for normal use cases. So +1 for choosing the fastest one as default.

@matthew-brett
Contributor

Sorry to say - but 32-bit still matters on Windows - see pypa/manylinux#118 (comment)

@imneme

imneme commented Jun 23, 2019

Yes, when he explains the basics it's considered okay!

@lemire

lemire commented Jun 23, 2019

@rkern @imneme Simplicity is a feature, both in software and in mathematics. That some are unimpressed by simple work should not be taken as contradictory evidence.

@imneme

imneme commented Jun 24, 2019

@lemire: There's a humor piece I like that I think has a lot of truth to it called How To Criticize Computer Scientists. The underlying idea behind the piece is that theorists favor sophistication and experimentalists favor simplicity. So if your audience is one of experimentalists, they'll be delighted by simplicity, but if your audience is one of theorists, not so much.

@rkern
Member Author

rkern commented Jun 26, 2019

The default BitGenerator is PCG64. Thank you all for your thoughtful contributions. And stamina!

@imneme

imneme commented Jun 27, 2019

Very much inspired by this thread, I have some news to report…

Background

By many measures pcg64 is pretty good; for example, under the usual measures of statistical quality, it gets a clean bill of health. It's been tested in various ways; most recently I've run it all the way out to half a petabyte with PractRand. It works well in normal use cases.

BUT, the pathologies that came up in this thread didn't sit well with me. Sure, I could say “well, don't hold it that way”, but the whole point of a general-purpose PRNG is that it ought to be robust. I wanted to do better...

So, about 25 days ago I began thinking about designing a new member of the PCG family…

Goal

My goal was to design a new PCG family member that could be a drop-in replacement for the current pcg64 variant. As such:

  • The output function should scramble the bits more than XSL RR (because doing so will avoid the issues that came up in this thread).
  • The performance should be about as fast (or faster) than the current pcg64.
  • The design must be PCG-ish (i.e., don't be trivially predictable, and thus don't allow any of the work of the output function to be easily undone).

As always there is a trade-off as we try to get the best quality we can as quickly as we can. If we didn't care at all about speed, we could have more steps in the output function to produce more heavily scrambled output, but the point of PCG was that the underlying LCG was “almost good enough” and so we didn't need to go to quite as much effort as we would with something like a counter incrementing by 1.

Spoiler

I'm pleased to report success! About 25 days ago when I was first thinking about this I was actually on vacation. When I got back about ten days ago, I tried the ideas I had and was pleased to find that they worked well. The subsequent time has mostly been spent on various kinds of testing. Yesterday I was satisfied enough that I pushed the code into the C++ version of PCG. Tests at small sizes indicate that it is much better than XSL RR, and competitive with RXS M, but it actually shines at larger sizes. It meets all the other goals as well.

Details

FWIW, the new output function is (for the 64-bit output case):

uint64_t output(__uint128_t internal)
{
    uint64_t hi = internal >> 64;
    uint64_t lo = internal;

    lo |= 1;
    hi ^= hi >> 32;
    hi *= 0xda942042e4dd58b5ULL;
    hi ^= hi >> 48;
    hi *= lo;
    return hi;
}

This output function is inspired by xorshift-multiply, which is widely used. The choice of multipliers is (a) to keep the number of magic constants down, and (b) to prevent the permutation from being undone (if you don't have access to the low-order bits), while also providing the whole “randomized-by-itself” quality that PCG output functions typically have.
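For readers without a C compiler handy, here is the same output function as a direct Python transcription (the masks emulate uint64_t wraparound; the constant is taken verbatim from the C above):

```python
MASK64 = (1 << 64) - 1

def dxsm_output(internal: int) -> int:
    """64-bit output from 128-bit internal state, mirroring the C version."""
    hi = (internal >> 64) & MASK64
    lo = internal & MASK64
    lo |= 1                                    # force the multiplier odd
    hi ^= hi >> 32
    hi = (hi * 0xDA942042E4DD58B5) & MASK64    # wrap like uint64_t
    hi ^= hi >> 48
    hi = (hi * lo) & MASK64                    # multiply by the (odd) low half
    return hi
```

Note that because `lo` is forced odd, the final multiply is a bijection on the high word for any fixed low word.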

Other changes

It's also the case that 0xda942042e4dd58b5 is the LCG multiplier for this PRNG (and all cm_ prefixed 128-bit-state PCG generators). As compared to 0x2360ed051fc65da44385df649fccf645 used by pcg64, this constant is actually still fairly good in terms of spectral-test properties, but is cheaper to multiply by because 128-bit × 64-bit is easier than 128-bit × 128-bit. I've used this LCG constant for several years without issue. When using the cheap-multiplier variant, I run the output function on the pre-iterated state rather than the post-iterated state for greater instruction-level parallelism.

Testing

I've tested it thoroughly (PractRand and TestU01) and I'm happy with it. Tests included scenarios outlined in this thread (e.g., taking a gang of generators either on sequential streams or advanced by 2^64 and interleaving their output — I tested a gang of four and a gang of 8192 out to 8 TB with no issues, as well as a stream and its opposite-land counterpart).

Speed

I could go on at length about speed tests and benchmarks. There are all sorts of factors that influence whether one PRNG runs faster than another in a given benchmark, but overall, this variant seems to often be a little faster, sometimes a lot faster, and occasionally a little slower. Factors like the compiler and the application have a much greater impact on benchmark variability.

Availability

Users of the C++ header can access this new family member now as pcg_engines::cm_setseq_dxsm_128_64; at some point in the future, I'll switch pcg64 from pcg_engines::setseq_xsl_rr_128_64 to this new scheme. My current plan is to do so this summer as part of a PCG 2.0 version bump.

Formal Announcements

Overall, I'm very happy with this new family member and at some point later in the summer, there will be blog posts with more detail, likely referencing this thread.

Your Choices...

Of course, you have to work out what to do with this. Regardless of whether you'd use it or not, I'd actually be pretty curious to see whether it does better or worse in your speed benchmarks.

@lemire

lemire commented Jun 27, 2019

@imneme Does it get around the need to have a fast full 64-bit multiplier? (Which is super fast on x64 but a tad slower on some weaker architectures.)

@imneme

imneme commented Jun 27, 2019

@lemire: It's a 128-bit × 64-bit multiply, which internally will be done with two multiply instructions on x86 (a 64-bit × 64-bit → 128-bit multiply for the low-order bits, and a 64-bit × 64-bit multiply for the high-order bits; both multiplies can take place in parallel, and then the two results need to be added).

It's still potentially better than 128-bit × 128-bit. Although how much better depends on how well instruction scheduling goes at that moment.

You're right that on ARM, 64-bit × 64-bit → 128-bit result is actually two instructions.

(It's totally possible, of course, to gang together two 64-bit LCGs and mix them. The space of all the PRNGs that could exist and would work well is pretty huge.)
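The decomposition into two partial products described above can be sketched in Python (using unbounded ints to stand in for the machine instructions; variable names are illustrative):

```python
MASK64 = (1 << 64) - 1


def mul_128x64(state_hi, state_lo, mult):
    """Multiply a 128-bit value (given as two 64-bit halves) by a
    64-bit multiplier, mod 2**128, using two 64x64 partial products."""
    lo_prod = state_lo * mult              # 64x64 -> 128-bit low product
    hi_prod = (state_hi * mult) & MASK64   # only its low 64 bits survive mod 2**128
    new_lo = lo_prod & MASK64
    new_hi = ((lo_prod >> 64) + hi_prod) & MASK64  # add the carry from the low product
    return new_hi, new_lo
```

The two partial products are independent (hence can issue in parallel); only the final add chains them together.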

@rkern
Member Author

rkern commented Jun 28, 2019

A quick-and-dirty implementation in our framework suggests a mild performance improvement, at least on 64-bit Linux:

Time to produce 1,000,000 64-bit unsigned integers
************************************************************
MT19937      5.42 ms
PCG64        2.59 ms
PCG64DXSM    2.41 ms
Philox       4.37 ms
SFC64        2.07 ms
numpy        5.41 ms
dtype: object

64-bit unsigned integers per second
************************************************************
MT19937      184.39 million
PCG64        386.01 million
PCG64DXSM    415.02 million
Philox       228.88 million
SFC64        483.94 million
numpy        184.79 million
dtype: object

I think it's outputting from the pre-iterated state that gives it the bump, at least in this context. I left that out at first and got essentially the same performance as PCG64 1.0. The real win would be under 128-bit emulation, I suspect, but I didn't get around to writing that up, and I don't have a good way to test it on the platforms that matter.

I guess the real question for you, @imneme, is how annoyed will you be with the name numpy.random.PCG64 implementing the 1.0 algorithm? The release is imminent and already delayed, so I don't think we're going to change the algorithm at this time. If the performance on 32-bit platforms is particularly good, then I think we might add PCG64DXSM in a following release, and maybe reconsider the default a few releases down the line.

@imneme

imneme commented Jun 28, 2019

It's your choice to make!

I have no problem with your shipping the 1.0 version of PCG64. Plenty of other folks have used that variant.

I do think the DXSM variant has the advantage of avoiding the edge-case-usage issues that came up in this thread (after all, that's pretty much why it exists), but on the other hand, it has the disadvantage that it's late to the party. It might even seem quite reckless to ship a PRNG that is less than a month old out to users (even though it is based on the same ideas as the more time-tested PCG variants).

(That said, if it were my choice, notwithstanding possible accusations of recklessness, I'd probably ship the new one; I think the delay to get it up and running in Numpy is pretty minimal. And the risk is very low — it's already thoroughly tested with BigCrush, and tested with PractRand out to 16 TB (including cm_mcg_dxsm_64_32 which is a quarter of the size [no streams, 32-bit output]), and will likely hit 32 TB in less than a week.)

[Glad the performance did get a tad better. Five years ago, using the pre-iterated state was a pessimization for 128-bit sizes with 128-bit multipliers. But that was then, on the machines I was testing on, with the benchmarks I was using.]

@rkern
Member Author

rkern commented Jun 28, 2019

I meant more about using the name PCG64 for the 1.0 variant when you are going to be using that name to refer to the 2.0 variant.

@lemire

lemire commented Jun 28, 2019

@rkern If it is just a naming issue, then PCG64DXSM and PCG64 sets them apart nicely, no?

@rkern
Member Author

rkern commented Jun 28, 2019

For numpy, certainly. I am just wondering if @imneme would prefer that we not name our 1.0 implementation PCG64 when she is going to be promoting the 2.0 variant under that name in the C++ version. I am sensitive to the fact that being loose with the names means that some people might test numpy's PCG64 and compare that to the claims that will be made on pcg-random.org about the 2.0 version. Cf. just about any conversation about Bob Jenkins's PRNGs.

@imneme

imneme commented Jun 28, 2019

In Section 6.3 of the PCG paper, it says:

Note also that although the generators are presented with mnemonic names based on the permutations they perform, users of the PCG library should rarely select family members by these mnemonics. The library provides named generators based on their properties, not their underlying implementations (e.g., pcg32_unique for a general-purpose 32-bit generator with a unique stream). That way, when future family members that perform even better are discovered and added (hopefully due to the discoveries of others), users can switch seamlessly over to them.

And the C and C++ libraries are structured that way. The libraries provide

  • a low-level interface that lets you pick a specific family member via the name of its permutation, the bit sizes it operates at, and the characteristics of the underlying LCG
  • a high level interface that provides convenient aliases like pcg64 that connect to a pre-chosen low-level family member.

In this way, the aliases can be updated to point to newer family members, but users who want to exactly reproduce older results will still be able to by using the low-level interface to select the family member that was previously reachable by a convenient high-level alias.

If you're going to ship a PRNG called PCG64, I'd say it is sufficient to say in your documentation which specific PCG variant that is — in other words, say which family member it corresponds to in the low-level C or C++ library interface.
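On the NumPy side, the analogous practice is to name the BitGenerator explicitly rather than rely on whatever a default alias happens to point at. A short usage sketch against NumPy's released API (here `PCG64` is the XSL-RR 1.0 variant under discussion):

```python
from numpy.random import Generator, PCG64

# Pin the exact family member by name so results stay reproducible
# even if a convenience alias is later re-pointed at a newer variant.
rng = Generator(PCG64(12345))
x = rng.integers(0, 10)
```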

@shoyer
Member

shoyer commented Jun 30, 2019

The default generator is implemented as np.random.default_gen() in #13840. (@rkern for future reference, it's probably good to explicitly call out PRs -- they are easy to miss in GitHub if you only provide a back-link, since there are no notifications for that.)

One minor nit: what about calling this np.random.default_generator() instead? gen feels too short/non-obvious to me. I would be curious what others think.

@charris
Member

charris commented Jun 30, 2019

what about calling this np.random.default_generator() instead?

I had the same thought, but then, np.random.default_generator() is a hair on the long side, so I played with default_rng.

@shoyer
Member

shoyer commented Jun 30, 2019

👍 I like default_rng better than default_gen, too. I would be happy with either of these, though I would still lean towards default_generator.

@rkern
Member Author

rkern commented Jun 30, 2019

👍 for default_rng().
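For readers following along: default_rng is the spelling that ultimately shipped in NumPy, and it works as a simple constructor for a seeded Generator:

```python
from numpy.random import default_rng

rng = default_rng(42)    # seeded, so the stream is reproducible
draws = rng.random(3)    # three floats in [0, 1)
```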
