This repository has been archived by the owner on Mar 15, 2022. It is now read-only.

Seeding behaviour issue #408

Open
obriente opened this issue Jun 11, 2020 · 0 comments


So I just saw the seeding behaviour in _study.py, e.g. at:

save_x_vals, seeds[i] if seeds is not None else

the offending statement being the line:

seeds[i] if seeds is not None else numpy.random.randint(2**16)

The issue with this line is that if a user runs multiple copies of their code in parallel on a cluster, each copy's internal random number generator is often seeded from the system time when numpy is imported, which can leave several copies with identical generators. This propagates through to any later seeding that draws from these internal generators, and winds up giving correlated data that the user doesn't expect.
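To make the failure mode concrete, here is a minimal sketch. It does not use the real `_study.py` code: `derive_seeds` is a hypothetical stand-in for the fallback `numpy.random.randint(2**16)`, and the two `RandomState` objects stand in for the global generators of two cluster jobs that are assumed to have started from the same state.

```python
import numpy


def derive_seeds(global_rng, n):
    """Draw n fallback seeds the way the quoted line does, but from an
    explicit generator so the effect can be shown in a single process."""
    return [int(global_rng.randint(2**16)) for _ in range(n)]


# Assumption for illustration: both jobs' generators start from the same
# state (12345 is an arbitrary value, not taken from the issue).
job_a = numpy.random.RandomState(12345)
job_b = numpy.random.RandomState(12345)

seeds_a = derive_seeds(job_a, 5)
seeds_b = derive_seeds(job_b, 5)

# Every derived seed matches, so everything seeded downstream matches too.
identical = seeds_a == seeds_b
```

Under that assumption `identical` is `True`, which is how the two jobs end up producing the same "random" data.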

As we give the user the option to specify their own seeds, they can definitely circumvent the issue themselves. But if they don't know about the problem, it becomes a notoriously hard error to find and debug: it usually only presents as seemingly random correlations, or signal noise being larger than expected, and it doesn't replicate easily.

Also, I'm slightly worried that passing around seeds and updating numpy.random with them can lead to some really funky behaviour if numpy is called separately in two files (at least, I've observed this in the past) - namely, that there can be multiple internal rngs hiding behind the scenes.

I don't know if there's a 'standard' method for fixing this, but I have two suggestions. Firstly, I would suggest adding a warning whenever we need to seed an rng and the user doesn't provide a seed to use. Secondly, I would suggest passing explicit numpy.random.RandomState objects around instead of seeds for numpy.random, as this makes it easier to keep track of which rngs we actually have.
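A rough sketch of how the two suggestions might fit together is below. The names `resolve_rng` and `sample_values` are hypothetical and not part of the existing code; the only numpy calls used are `numpy.random.RandomState` and its `random_sample` method.

```python
import warnings

import numpy


def resolve_rng(rng=None):
    """Return a numpy.random.RandomState, warning if the caller gave nothing.

    Accepts an existing RandomState (returned unchanged), an integer seed,
    or None. This helper is hypothetical, not an existing function.
    """
    if isinstance(rng, numpy.random.RandomState):
        return rng
    if rng is None:
        warnings.warn(
            "No seed or RandomState provided; creating an unseeded generator. "
            "Parallel runs may need explicit, distinct seeds.",
            UserWarning,
            stacklevel=2,
        )
        # With no argument, RandomState seeds itself from OS entropy if
        # available, otherwise from the clock.
        return numpy.random.RandomState()
    return numpy.random.RandomState(rng)


def sample_values(n, rng=None):
    """Example consumer: draws from the explicit generator only and
    never touches the global numpy.random state."""
    rng = resolve_rng(rng)
    return rng.random_sample(n)
```

Calling `sample_values(3, rng=7)` twice gives the same values, while `sample_values(3)` emits the warning. Passing one shared `RandomState` through a chain of calls keeps a single, visible generator instead of reseeding `numpy.random` in several places.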
