Seed generation issue? #7

mattfidler · 2018-05-10T05:43:08Z

I was looking at your code, and noticed

NumericVector seeds = runif(ncores, 1.0, std::numeric_limits<uint32_t>::max());

Have you considered https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/

Perhaps instead:

NumericVector seeds(ncores);
seeds[ncores-1] = runif(1, 1.0, std::numeric_limits<uint32_t>::max());
for (unsigned int j = ncores-1; j--;){
   seeds[j] = seeds[j+1]-1;
   if (seeds[j] == 0) seeds[j] = std::numeric_limits<uint32_t>::max()-1;
}

While it may not be quite as fast as your solution, it can avoid the birthday problem

mattfidler · 2018-05-10T05:44:21Z

This would reduce the probability of having similar seeds for each core, though with the size of the uint32 and the number of cores people typically have, this still would be low.

mfasiolo · 2018-05-10T10:22:36Z

Hi Matt, thanks for pointing this out. I wonder whether the best option would be depending directly on the sitmo package which might be setting the seed in a more sophisticated way (I haven't checked). If we believe the first approximation to the birthday problem given in Wikipedia, than the probability of a collision when using a 1000 cores is around

1 - exp(-1000^2/(2*(2^32)))
[1] 0.0001164085

with a more realistic 64 cores we have O(10^-7) probability of collision. I'll think about it.

mattfidler · 2018-05-10T10:26:09Z

I looked at the examples in the sitmo package and they have all the seeds specified by the user for each core. I don't think they actually do anything for this problem. I could be wrong.

mattfidler · 2018-05-10T10:30:20Z

As I said, the probability is low. However, the probability with 1000 cores using the sequential solution is 0.

mattfidler · 2018-06-14T16:35:21Z

Wouldn't the probabily increase as you make multiple calls to the mvnfast? It generates a seed from the uniform distribution every time it starts the mvnfast number generation. Still be low, but it isn't only based on the number of threads.

mfasiolo · 2018-06-20T18:48:38Z

Hi Matt,

ok, hopefully this 5094d5a addresses your concerns.

mattfidler · 2018-06-21T14:19:15Z

Indeed it is better. Now it only depends on how many times you call mvnfast.

The only way to completely overcome it is to create your own seed setting routine or reading the R seed's state and using the R seed (I think that would work...?)

According to https://github.com/wch/r-source/blob/3105967d568ce6e411c63a0cbecabaec695b5cc0/src/library/base/man/Random-user.Rd

You could read the seed state by:

user_unif_nseed

and using the R's seed as sitmo's seed. You have to advance this seed somehow probably by a throw-away uniform random number.

mattfidler · 2018-06-21T14:24:19Z

If you are interested in this approach, it wouldn't be too hard to adapt...?

mattfidler · 2018-06-21T14:27:01Z

Also, I have heard that the construct

for (unsigned int j = ncores-1; j--;){
   seeds[j] = seeds[j+1]-1;
   if (seeds[j] == 0) seeds[j] = std::numeric_limits<uint32_t>::max()-1;
}

Takes slightly less operations than a full for loop, and therefore should be slightly faster. On modern machines, it shouldn't matter too much, but that is why I chose this loop style.

mattfidler · 2018-06-21T14:30:44Z

Now the reference:

https://www.thegeekstuff.com/2015/01/c-cpp-code-optimization/

Item 7 talks about the for-loop optimization above.

rstub · 2023-08-31T21:20:43Z

Interesting problem. One alternative could be to use the same seed value on all cores but pre-set the internal counter. For example, in dqrng I use

template<>
inline void random_64bit_wrapper<sitmo::threefry_20_64>::seed(result_type seed, result_type stream) {
    gen.seed(seed);
    gen.set_counter(0, 0, 0, stream);
    cache = false;
}

This way I get 2^64 streams with 3 64bit integers per-stream for counting.

mattfidler · 2023-08-31T22:50:15Z

I do something similar for parallel solving in rxode2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Seed generation issue? #7

Seed generation issue? #7

mattfidler commented May 10, 2018

mattfidler commented May 10, 2018

mfasiolo commented May 10, 2018

mattfidler commented May 10, 2018

mattfidler commented May 10, 2018

mattfidler commented Jun 14, 2018

mfasiolo commented Jun 20, 2018

mattfidler commented Jun 21, 2018

mattfidler commented Jun 21, 2018

mattfidler commented Jun 21, 2018

mattfidler commented Jun 21, 2018

rstub commented Aug 31, 2023

mattfidler commented Aug 31, 2023

Seed generation issue? #7

Seed generation issue? #7

Comments

mattfidler commented May 10, 2018

mattfidler commented May 10, 2018

mfasiolo commented May 10, 2018

mattfidler commented May 10, 2018

mattfidler commented May 10, 2018

mattfidler commented Jun 14, 2018

mfasiolo commented Jun 20, 2018

mattfidler commented Jun 21, 2018

mattfidler commented Jun 21, 2018

mattfidler commented Jun 21, 2018

mattfidler commented Jun 21, 2018

rstub commented Aug 31, 2023

mattfidler commented Aug 31, 2023