Centralization of the seeded random number generator #2021

quaquel · 2024-01-31T16:14:42Z

This is a proposed solution for the issue raised in #1981. Currently, the seeded random number generator resides in the model. Any other class that might need to generate random numbers (e.g., agent, AgentSet, the tentative CellCollection, the various spaces) thus needs a reference to the model in order to use the seeded random number generator.

This PR offers a complementary solution that can be used throughout MESA. Rather than using a Singleton (as suggested in #1981), I have modeled it on how logging works. So, we have a new rng module with a global variable containing the default seeded random instance. The model sets it. A simple get_default_rng function can access the default random number generator.
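As a rough illustration of the logging-style pattern described above, here is a minimal sketch. Only get_default_rng is named in this PR description; set_default_rng and the _default_rng variable are hypothetical names chosen for the example, and the actual module in the PR may look different.

```python
import random

# Hypothetical module-level state, playing the role that the root logger
# plays in the logging module.
_default_rng = random.Random()


def set_default_rng(seed=None):
    """Replace the shared generator (assumed to be called by the model)."""
    global _default_rng
    _default_rng = random.Random(seed)
    return _default_rng


def get_default_rng():
    """Return whichever shared generator is currently in effect."""
    return _default_rng


# Seeding the default twice with the same seed reproduces the same draw.
set_default_rng(42)
first = get_default_rng().random()
set_default_rng(42)
second = get_default_rng().random()
```

Any class can then call get_default_rng() without holding a reference to the model.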

Also, many current classes already have a random property to get the random number generator from the model. I propose to generalize this by adding a descriptor class (the proper use of a descriptor this time). In short, this descriptor will retrieve the default random number generator when it is set to None.
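To make the descriptor idea concrete, here is a minimal sketch under stated assumptions. The name RandomDescriptor comes from this PR, but the body below is an illustration rather than the PR's code, and the AgentSet class is a simplified stand-in, not Mesa's actual class.

```python
import random

_default_rng = random.Random(42)  # stand-in for the shared default generator


def get_default_rng():
    return _default_rng


class RandomDescriptor:
    """Return the instance's own generator, or the shared default if None."""

    def __set_name__(self, owner, name):
        # Store the per-instance value under a private attribute name.
        self._private_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class rather than an instance
        value = getattr(instance, self._private_name, None)
        return get_default_rng() if value is None else value

    def __set__(self, instance, value):
        setattr(instance, self._private_name, value)


class AgentSet:  # simplified stand-in for illustration only
    random = RandomDescriptor()

    def __init__(self, agents, random=None):
        self.agents = list(agents)
        self.random = random  # None means "fall back to the default"


shared = AgentSet([1, 2, 3])
private = AgentSet([1, 2, 3], random=random.Random(7))
```

Because the lookup happens on every access, an instance left at None always follows whatever default is current at that moment.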

For an example of how it is all used, check the modifications in the agent module.

github-actions · 2024-01-31T16:19:20Z

Performance benchmarks:

Model	Size	Init time [95% CI]	Run time [95% CI]
Schelling	small	🔴 +27.7% [+27.2%, +28.3%]	🔴 +15.3% [+15.2%, +15.5%]
Schelling	large	🔵 -18.0% [-41.3%, +5.8%]	🔴 +26.3% [+24.3%, +28.7%]
WolfSheep	small	🔴 +26.0% [+25.2%, +26.7%]	🔴 +35.4% [+33.5%, +37.3%]
WolfSheep	large	🔴 +26.8% [+21.3%, +36.6%]	🔴 +54.1% [+49.4%, +58.8%]
BoidFlockers	small	🔴 +10.9% [+10.3%, +11.5%]	🔵 +0.3% [-0.4%, +1.0%]
BoidFlockers	large	🔴 +11.3% [+10.4%, +12.1%]	🔵 -0.4% [-0.8%, -0.0%]


quaquel · 2024-01-31T19:44:13Z

The benchmark results are outdated. My main branch was not in sync with upstream. I fixed it, and the performance differences are gone.
Currently, one test is failing. What is the desired behavior when setting model.random? This is a tricky question. The simple solution might seem to be to change the default rng as well. The current code does not implement this. It also means that the descriptor needs a minor modification. The second option is to not allow setting model.random directly, but only allow its setting through the keyword argument on model.__init__.
The current solution implicitly means that if you create multiple model instances, they will share their random number generator. Again, I am unsure whether this is the desired behavior. It differs from the current behavior, where the random number generator is encapsulated within the model class.
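The two options for model.random mentioned above could be sketched roughly as follows. This is an illustration only: set_default_rng, Model, and SealedModel are hypothetical names for the example and do not reflect the code in this PR.

```python
import random

_default_rng = random.Random()


def get_default_rng():
    return _default_rng


def set_default_rng(rng):
    global _default_rng
    _default_rng = rng


class Model:
    """Option 1: assigning model.random also replaces the shared default."""

    def __init__(self, seed=None):
        self._random = random.Random(seed)
        set_default_rng(self._random)

    @property
    def random(self):
        return self._random

    @random.setter
    def random(self, rng):
        self._random = rng
        set_default_rng(rng)


class SealedModel:
    """Option 2: the generator can only be chosen through __init__."""

    def __init__(self, seed=None):
        self._random = random.Random(seed)
        set_default_rng(self._random)

    @property
    def random(self):  # no setter, so assignment raises AttributeError
        return self._random


model = Model(seed=1)
replacement = random.Random(2)
model.random = replacement  # the shared default now follows the assignment
```

With option 1 the default and model.random stay in sync; with option 2 they cannot diverge because reassignment is rejected.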

github-actions · 2024-01-31T20:02:03Z

Performance benchmarks:

Model	Size	Init time [95% CI]	Run time [95% CI]
Schelling	small	🔴 +26.8% [+26.4%, +27.2%]	🔵 +3.0% [+2.9%, +3.2%]
Schelling	large	🔴 +18.1% [+12.5%, +23.8%]	🔵 +3.1% [+0.9%, +5.7%]
WolfSheep	small	🔴 +24.0% [+23.7%, +24.4%]	🔴 +4.4% [+4.2%, +4.5%]
WolfSheep	large	🔴 +25.8% [+24.7%, +26.9%]	🔴 +12.6% [+11.0%, +13.9%]
BoidFlockers	small	🔴 +9.4% [+8.9%, +9.8%]	🔵 +0.4% [-0.2%, +1.0%]
BoidFlockers	large	🔴 +10.0% [+9.6%, +10.5%]	🔵 -0.9% [-1.5%, -0.4%]

rht · 2024-02-01T03:32:43Z

mesa/agent.py

@@ -25,6 +24,8 @@
    from mesa.model import Model
    from mesa.space import Position

+from mesa.rng import RandomDescriptor


mesa.random is less surprising than mesa.rng. People are used to CPython's random module, and numpy.random.

rht · 2024-02-01T05:46:09Z

I'm fine with the change since it doesn't break compatibility, and I can always swap with NumPy RNG when needed. NumPy RNG is much more performant at pre-emptively producing lots of random numbers at once.

I see it as CPython's import random; random.random() with the extra feature that the global instance is seedable.

quaquel · 2024-02-01T06:43:21Z

I'm fine with the change since it doesn't break compatibility, and I can always swap with NumPy RNG when needed. NumPy RNG is much more performant at pre-emptively producing lots of random numbers at once.

I see it as CPython's import random; random.random() with the extra feature that the global instance is seedable.

The exact naming is the least of my concerns at the moment. So if others agree, I am fine with renaming.

I added a third point above. I would appreciate input from anyone on all three.

rht · 2024-02-01T06:51:18Z

The current solution implicitly means that if you create multiple model instances, they will share their random number generator. Again, I am unsure whether this is the desired behavior. It differs from the current behavior, where the random number generator is encapsulated within the model class.

I suppose it is the intended behavior anyway when you don't seed each model instance.

Also, many current classes already have a random property to get the random number generator from the model. I propose to generalize this by adding a descriptor class (the proper use of a descriptor this time). In short, this descriptor will retrieve the default random number generator when it is set to None.

Why are the time and steps accessed via the model (obj.model.steps, obj.model.time), but the random should be obj.random? If you want to avoid the object having model as its attribute, it is still unavoidable because sometimes the object wants to check the time and steps.

Edit: clarify last paragraph.

quaquel · 2024-02-01T06:57:16Z

Why are the time and steps accessed via the model (obj.model.steps, obj.model.time), but the random should be obj.random? If you want to avoid the object having model as its attribute, it is still unavoidable because sometimes the object wants to check the time and steps.

I don't focus on this in this PR. I started this because of the need for random in CellCollection, AgentSet, and DiscreteSpace and its subclasses. We can generalize this solution if we discover that many classes need access to time and step.

Corvince · 2024-02-01T08:26:22Z

mesa/agent.py


-    def __init__(self, agents: Iterable[Agent], model: Model):
+    def __init__(self, agents: Iterable[Agent], model: Model, random=None):


This PR means we can remove model here, right? That would be great

Agents might still need to access the clock info (time and steps).

This is within the AgentSet, not the agent.

Corvince · 2024-02-01T08:37:15Z

The benchmark results are outdated. My main branch was not in sync with upstream. I fixed it, and the performance differences are gone.

This is an interesting caveat of the benchmarks script, @EwoutH. It probably means the benchmark action should always merge main into the PR, to compare the actual differences.

Currently, one test is failing. What is the desired behavior when setting model.random? This is a tricky question. The simple solution might seem to be to change the default rng as well. The current code does not implement this. It also means that the descriptor needs a minor modification. The second option is to not allow setting model.random directly, but only allow its setting through the keyword argument on model.__init__.

Can you give more details on why the test is failing and what the implications are? I am unsure at the moment.

The current solution implicitly means that if you create multiple model instances, they will share their random number generator. Again, I am unsure whether this is the desired behavior. It differs from the current behavior, where the random number generator is encapsulated within the model class.

That's an interesting caveat. I think if someone creates multiple model instances, they would expect the models to lead to different outcomes if they don't explicitly set the seed to the same value. So I'm not too happy with this one. Although wait, this shouldn't actually matter. If you run the models after each other, they should still give different results, right? So what are the possible downsides of this?

rht · 2024-02-01T09:55:52Z

Why are the time and steps accessed via the model (obj.model.steps, obj.model.time), but the random should be obj.random? If you want to avoid the object having model as its attribute, it is still unavoidable because sometimes the object wants to check the time and steps.
I don't focus on this in this PR. I started this because of the need for random in CellCollection, AgentSet, and DiscreteSpace and its subclasses.

The time and steps are similar to the rng in that any constituent objects need to be able to access values from the "admin of the Matrix". I prefer the time, steps, and the rng to be accessed via the same method, for consistency. This used to be from the model object.

We can generalize this solution if we discover that many classes need access to time and step.

There are at least 3, which is plenty: the current data collector, the current batch_run, and the Poisson activation scheduler / any discrete event scheduler.

quaquel · 2024-02-01T09:57:40Z

Can you give more details on why the test is failing and what the implications are? I am unsure at the moment.

The test that is failing is in test_time.py:

    def test_shuffle_shuffles_agents(self):
        model = MockModel(shuffle=True)
        model.random = mock.Mock()
        assert model.random.shuffle.call_count == 0
        model.step()
        assert model.random.shuffle.call_count == 1

What happens is that in MockModel, the default rng is set. Next, we assign a mock to model.random. This, however, does not change the default rng. So, the AgentSet within the scheduler uses the first default rather than the one later assigned to model.random. Hence, the assertion fails because the call_count does not match.
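The failure mode can be reproduced in isolation with a small sketch. ToyModel and AgentSet below are simplified stand-ins invented for this illustration, not the classes from test_time.py or from Mesa; only the sequence of events mirrors the description above.

```python
import random
from unittest import mock

_default_rng = random.Random()


def get_default_rng():
    return _default_rng


class AgentSet:  # stand-in: always shuffles with the shared default
    def __init__(self, agents):
        self.agents = list(agents)

    def shuffle(self):
        get_default_rng().shuffle(self.agents)


class ToyModel:  # stand-in for the MockModel used in the test
    def __init__(self, seed=None):
        global _default_rng
        self.random = random.Random(seed)
        _default_rng = self.random  # the default is fixed at construction
        self.agents = AgentSet(range(5))

    def step(self):
        self.agents.shuffle()


model = ToyModel()
model.random = mock.Mock()  # plain attribute assignment; default unchanged
model.step()                # shuffles with the original generator
```

After the step, model.random.shuffle.call_count is still 0, which is exactly why the assertion in the test fails.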

That's an interesting caveat. I think if someone creates multiple model instances, they would expect the models to lead to different outcomes if they don't explicitly set the seed to the same value. So I'm not too happy with this one. Although wait, this shouldn't actually matter. If you run the models after each other, they should still give different results, right? So what are the possible downsides of this?

I think this requires a more detailed explanation than I currently have time for. There is no problem creating model1, running it; creating model2, running it; etc. So, for batch runs and replications, there is no problem. You can get a problem if you have two models running in lockstep. In short, you can get into situations where the model's behavior is no longer reproducible. For example:

model1 = Model(seed=42)
model1.step()

model2 = Model(seed=None) # changes the default rng

...

# which rng is now being used? Random(42) or Random(None).
model1.step()
model2.step() # same question
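A self-contained version of this scenario, as a sketch only: the Model class here is a toy stand-in whose step() just draws one number from the shared default, and model2 gets a fixed seed (7) instead of None so that the outcome of the example is deterministic.

```python
import random

_default_rng = random.Random()


def get_default_rng():
    return _default_rng


class Model:  # toy stand-in; step() returns a single draw
    def __init__(self, seed=None):
        global _default_rng
        _default_rng = random.Random(seed)  # every new model replaces it

    def step(self):
        return get_default_rng().random()


def run_alone():
    model1 = Model(seed=42)
    return [model1.step(), model1.step()]


def run_interleaved():
    model1 = Model(seed=42)
    first = model1.step()
    Model(seed=7)            # a second model silently swaps the default
    second = model1.step()   # now drawn from Random(7), not Random(42)
    return [first, second]


alone = run_alone()
interleaved = run_interleaved()
```

The first step matches in both runs, but the second step of model1 differs once another model has been created in between, so model1 with seed 42 no longer reproduces its own trajectory.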

Corvince · 2024-02-01T10:35:42Z

Thanks for the clarification, I'll think about that.

quaquel · 2024-02-01T11:03:54Z

The time and steps are similar to the rng in that any constituent objects need to be able to access values from the "admin of the Matrix". I prefer the time, steps, and the rng to be accessed via the same method, for consistency. This used to be from the model object.

We can generalize this solution if we discover that many classes need access to time and step.

There are at least 3, which is plenty: the current data collector, the current batch_run, and the Poisson activation scheduler / any discrete event scheduler.

I agree in principle. But please let's keep this PR focussed on addressing the issue with random. Once this PR is complete and we have a solution, I'll happily open another PR for time etc. At the moment, I am still not entirely sure the presented approach is the way forward because of the issues I have raised.

quaquel and others added 5 commits January 31, 2024 19:02

b3d848f  Create util.py
5fca55a  ongoing tests
51e70b0  added RandomDescriptor, use in agent.py and cleanup
c29fef2  [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
e4ae8a9  code cleaning

quaquel force-pushed the rng branch from 34ea7b1 to e4ae8a9 on January 31, 2024 18:02

EwoutH added the trigger-benchmarks and enhancement labels and removed the trigger-benchmarks label on Jan 31, 2024

rht reviewed Feb 1, 2024

Corvince reviewed Feb 1, 2024
