
Use Generators with a global prng within pymoo. #476

Draft
wants to merge 8 commits into base: main

Conversation

CompRhys

Addresses #469

This is a minimal attempt to fix the issue of pymoo changing the global seed.

  • ensure all tests pass

@CompRhys CompRhys marked this pull request as draft August 31, 2023 23:44
@CompRhys
Author

CompRhys commented Sep 1, 2023

@jacktang I made a first pass at this by creating a replacement global PRNG in the package, but I haven't been able to work out why the same-seed tests fail. Perhaps some randomness is still not going through the PRNG, or I am overlooking something with respect to managing global variables.

The tests failed due to the implications of the from x import y behavior, which I have now changed. Nonetheless, the solution here doesn't seem fully robust, and some tests still fail, primarily the hash-checking tests I think?

@jacktang

jacktang commented Sep 1, 2023

@CompRhys I suggest designing pymoo's own RandGenerator class that keeps the state, rather than using a variable. The simplest implementation may look like:

import numpy as np

class RandGenerator:
    def __init__(self, seed=None):
        self.rng = np.random.default_rng(seed)
        
    def ints(self, low, high):
        return self.rng.integers(low=low, high=high)
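
(For example, usage of that class would then look something like:)

gen = RandGenerator(seed=42)
print(gen.ints(1, 10))  # reproducible draw from the encapsulated generator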
        

I am still looking into the multi-threading part of the random generation.

@CompRhys
Author

CompRhys commented Sep 1, 2023

Instantiating new PRNGs everywhere seems like a much bigger refactor, and it potentially runs the risk of users using the same seed everywhere and ending up with correlated PRNGs?

If it would just be a Generator class that wraps the numpy one, that also seems non-ideal, particularly given the example here where you've changed the API; it would then shift the onus of documentation onto pymoo?

@jacktang

jacktang commented Sep 1, 2023

Instantiating new PRNGs everywhere seems like a much bigger refactor, and it potentially runs the risk of users using the same seed everywhere and ending up with correlated PRNGs?

If it would just be a Generator class that wraps the numpy one, that also seems non-ideal, particularly given the example here where you've changed the API; it would then shift the onus of documentation onto pymoo?

Nice catch, @CompRhys! We can add a seed function to the RandGenerator to set the global seed; an admittedly ugly implementation looks like:

import numpy as np

_PYMOO_SEED = None

class RandGenerator:
    def __init__(self, seed=None):
        seed = _PYMOO_SEED if seed is None else seed  # allow an explicit seed of 0
        self.rng = np.random.default_rng(seed)
        
    def integers(self, low, high):
        return self.rng.integers(low=low, high=high)
        
    def normal(self, mu, sigma):
        return self.rng.normal(loc=mu, scale=sigma)

    @classmethod
    def seed(cls, seed):
        global _PYMOO_SEED
        _PYMOO_SEED = seed



np.random.seed(100)
RandGenerator.seed(42)
rng = RandGenerator()
print(rng.integers(1, 10))
print('np rand=', np.random.randint(10))
print(rng.normal(0, 1))

And if you prefer a numpy-style API, you can instantiate the local random generator when the module is imported.

@CompRhys
Author

CompRhys commented Sep 1, 2023

I think this also suffers from correlated PRNGs, as your global seed is fixed, meaning that every PRNG you spawn will start at the same point in the sequence. You would need to use the BitGenerator's spawn mechanism to ensure independence, which necessitates having some global PRNG.

With multi-threading we will definitely need to work out a spawn strategy. I honestly have no idea what would happen with this PR's implementation currently; I assume all the processes would copy the global PRNG and use correlated sequences, but threads might be okay? I am not sure whether there is any multi-processing state management currently, as this PR stays as close as reasonable to using the np global state without actually doing so.
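
(For reference, numpy's SeedSequence.spawn is one way to derive statistically independent per-worker streams from a single user seed; a minimal sketch, not what this PR currently does:)

import numpy as np
from concurrent.futures import ThreadPoolExecutor

# one parent SeedSequence derived from the user-supplied seed ...
seed_seq = np.random.SeedSequence(42)

# ... spawned into independent children, one generator per worker
rngs = [np.random.default_rng(s) for s in seed_seq.spawn(4)]

def work(rng):
    return rng.standard_normal(3)

with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(work, rngs))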

@jacktang

jacktang commented Sep 2, 2023

Hello @CompRhys, regarding a multi-threaded local random generator in pymoo, I didn't have enough time to look deeply into it this week. Here is the documentation on multi-threaded numpy random generation: https://numpy.org/doc/stable/reference/random/multithreading.html, hope it helps. Happy hacking :)

@CompRhys
Author

CompRhys commented Sep 6, 2023

@blankjul Any thoughts on this PR as it stands? The only tests that currently fail are the hash-assert tests that check there haven't been any changes in output. Due to the randomness of the GA and the changes to how the state is handled, these hashes will need to be updated before merging. I will update those tests if you're happy with the rest of the PR.

@blankjul
Collaborator

Thanks for putting so much effort into this! And sorry for the delay, but I was out of the office during the summer.

In the first release of pymoo, I actually implemented my own pseudo-random generator and also had the option to swap it out (e.g. use the one from numpy instead). After another issue a few months ago, I realized that globally using the numpy random methods is not the best design; however, at this point it is used throughout the whole framework.

I had a look at the comments throughout this PR, and agree that if we re-engineer this then we might want to think about multi-threading as well. In my opinion, the options are:

  • Global Instance (like in this PR currently). I don't think this is thread-safe.
  • Singleton Pattern (very similar but more flexible, e.g. it could return a PRG for different thread ids)
  • Binding a PRG to an algorithm (one downside is that it then needs to be passed to all operators and other methods). But multiple algorithms could also run with different PRGs.
  • Creating a Context or Environment object to generalize run-specific variables (I have seen this in other frameworks but am not sure if this is overkill here)

What is your opinion on these solutions?

@CompRhys
Author

So my belief is that the way this is implemented here doesn't introduce any new issues that aren't already present with the current numpy global state. A strict singleton would be better, but I am not sure there is a natural/clean way to define singletons in Python? The only problem with this global is that you can't use the from x import y pattern, as the imported name then no longer tracks the rng once it is recreated.
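
(A minimal single-file sketch of that pitfall; seed() here is a hypothetical re-seeding hook standing in for whatever the package would expose:)

import numpy as np

# module-level generator, standing in for a hypothetical pymoo.random module
rng = np.random.default_rng(0)

def seed(value):
    # re-seed by rebinding the module-level name
    global rng
    rng = np.random.default_rng(value)

# `from pymoo.random import rng` behaves like taking a local reference:
imported_rng = rng
seed(42)
print(imported_rng is rng)  # False: the imported name still points at the old generator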

I also think binding the PRG to the algorithm would work, but you would need to ensure that users didn't decide to just seed everything with 0, which could be fairly common.

I am not really sure what the Context implementation might look like, and so can't really say anything.

@jacktang

jacktang commented Sep 13, 2023

  • Global Instance (like in this PR currently). I don't think this is thread-safe.
  • Singleton Pattern (very similar but more flexible, e.g. it could return a PRG for different thread ids)
  • Binding a PRG to an algorithm (one downside is that it then needs to be passed to all operators and other methods). But multiple algorithms could also run with different PRGs.
  • Creating a Context or Environment object to generalize run-specific variables (I have seen this in other frameworks but am not sure if this is overkill here)

Hello @blankjul, what's the difference between the 2nd solution and the 4th solution? In the 4th solution, did you mean that each algorithm owns one Context instance and other objects access the random generator via that Context, while the singleton solution maintains only one Context globally?

@blankjul
Collaborator

@CompRhys I will try this weekend to work on your PR to add a Singleton interface. Should generally not be that difficult.

@blankjul
Collaborator

Instead of just providing a PRG, the context could store more variables, e.g. machine accuracy, eps, ... I am currently not sure whether it makes sense to bind this or not. Basically, this could also be a global Config.
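
(Just to sketch what that could look like; a rough, hypothetical Config with illustrative field names, not a proposal for the final API:)

import numpy as np

class Config:
    # global run-specific settings: one place for the PRG, tolerances, etc.
    eps = 1e-16                       # machine accuracy / tolerance used across operators
    rng = np.random.default_rng()     # shared pseudo-random generator

    @classmethod
    def seed(cls, value):
        cls.rng = np.random.default_rng(value)

# usage
Config.seed(1)
Config.rng.integers(1, 10)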

@blankjul blankjul self-assigned this Sep 18, 2023
@blankjul
Collaborator

blankjul commented Sep 18, 2023

@CompRhys Can you give me permission to push to your PR? (I think it is not letting me do that because the fork persists in your repo)

If that does not work please let me know and I will create a separate pull request.

My solution would be to create an object

import numpy as np

class RandomState(object):
    _instance = None

    def __new__(cls, seed=None, new=False):
        # lazily create a single shared numpy RandomState on the first call and
        # return the same instance afterwards (later seed arguments are ignored)
        if cls._instance is None:
            cls._instance = np.random.RandomState(seed)

        return cls._instance


which can then be used throughout the framework by, e.g.

from pymoo.random import RandomState
RandomState().choice([a, b])

@jacktang

Hello @blankjul, please consider a thread-safe singleton:

import threading

class Singleton(object):
    _lock = threading.Lock()
    _instance = None

    @classmethod
    def instance(cls):
        # double-checked locking: skip taking the lock once the instance exists
        if not cls._instance:
            with cls._lock:
                if not cls._instance:
                    cls._instance = cls()
        return cls._instance

@CompRhys
Author

It's set to allow maintainer edits, so I'm not sure why it doesn't work. I am not sure it makes sense to give you access to my fork, so it's probably better to fork off this and make a new PR. I quickly did a find-and-replace with Jack's singleton pattern, but it errors when taking in an argument, so it perhaps also needs a __new__.
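
(For what it's worth, a rough sketch of a __new__-based variant of Jack's pattern that takes the seed; untested, and it uses default_rng rather than RandomState:)

import threading
import numpy as np

class RandomState:
    _lock = threading.Lock()
    _instance = None

    def __new__(cls, seed=None):
        # double-checked locking so concurrent first calls create only one generator
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = np.random.default_rng(seed)
        return cls._instance

# usage: the seed only matters on the very first call
RandomState(42).integers(1, 10)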

@CompRhys
Author

I think there would need to be further logic to ensure that the threading lock and the thread singletons were still deterministically set according to the seed.
