
Add system state tracking #1933

Draft: wants to merge 1 commit into main


quaquel
Contributor

@quaquel quaquel commented Dec 31, 2023

This PR adds functionality for tracking the overall system state. It builds on discussions in #574 and #1930. A key design choice is that only attributes explicitly declared to be part of the system state are tracked. This is done using the AgentState and ModelState descriptors. At the moment, it is fully backward compatible: it just adds a new optional feature.

This PR is a first draft implementation and open for feedback, suggestions, and improvements.

Key changes

To track the overall system state, this PR adds 4 new classes: SystemState, AgentState, ModelState, and the helper class Field. SystemState is meant to be used within the model instance and is a singleton (not explicitly enforced). SystemState is in essence a sparse matrix with object identifiers ("model" or agent.unique_id at the moment) as rows and attribute names as columns. A given object identifier and attribute name point to a Field instance. This is a helper class with only a value attribute (see implementation details below). AgentState and ModelState are descriptors. Only attributes declared as AgentState or ModelState are tracked in the SystemState.
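As a rough illustration of the sparse-matrix idea, here is a minimal sketch assuming two dict-of-dicts indexes (one by row, one by column) that share the same Field objects. get_field mirrors the call used in the PR's AgentState.__set__; get_row, get_column, and remove are hypothetical names chosen for this sketch, not the PR's actual API.

```python
class Field:
    """One cell of the system state; it only holds the tracked value."""

    __slots__ = ("value",)

    def __init__(self, value=None):
        self.value = value


class SystemState:
    """Sparse matrix: object identifiers as rows, attribute names as columns."""

    def __init__(self):
        # Two indexes over the same Field objects: one per row, one per column.
        self._rows = {}     # object id -> {attribute name: Field}
        self._columns = {}  # attribute name -> {object id: Field}

    def get_field(self, key, attribute):
        """Return the Field for (key, attribute), creating the cell on first use."""
        row = self._rows.setdefault(key, {})
        field = row.get(attribute)
        if field is None:
            field = Field()
            row[attribute] = field
            self._columns.setdefault(attribute, {})[key] = field
        return field

    def get_row(self, key):
        """All tracked values of one object."""
        return {name: f.value for name, f in self._rows.get(key, {}).items()}

    def get_column(self, attribute):
        """One tracked attribute across all objects."""
        return {key: f.value for key, f in self._columns.get(attribute, {}).items()}

    def remove(self, key):
        """Drop an object's row, e.g. when the agent is garbage collected."""
        for attribute in self._rows.pop(key, {}):
            del self._columns[attribute][key]
```

Because each Field is shared by both indexes, a single assignment to field.value is immediately visible in both the row view and the column view.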

Usage examples

1. Initialize the system state object

The current implementation leaves SystemState optional, but assumes that, if used, it is assigned to self.system_state. In the future, I would make this the default behavior and simply move self.system_state = SystemState() to Model.__init__.

from mesa import Model
from mesa.system_state import SystemState


class MoneyModel(Model):
    """A model with some number of agents.

    Parameters
    ----------
    N : int
        the number of agents in the space
    width : int
        the width of the space
    height : int
        the height of the space
    """

    def __init__(self, N, width, height):
        super().__init__()
        self.system_state = SystemState()

2. Declare an agent state or model state as observable

To enable state variables to be tracked in the system state, the relevant attribute has to be declared as an AgentState or ModelState. This assigns the relevant descriptor to the attribute. Any object can be assigned to the attribute. Tracking of the state starts with the first assignment to the attribute. So, wealth = AgentState() makes wealth observable, but tracking only starts once self.wealth = 1 is executed. This triggers the __set__ method on the descriptor, which registers the agent with the system state and retrieves the field in which to store the state. Every subsequent assignment to self.wealth also updates the system state. If an agent is garbage collected, it is automatically removed from the system state (using weakref.finalize).

from mesa import Agent, Model
from mesa.system_state import AgentState, ModelState


class MoneyAgent(Agent):
    """An agent with fixed initial wealth."""

    wealth = AgentState()

    def __init__(self, unique_id, model):
        super().__init__(unique_id, model)
        self.wealth = 1


class MoneyModel(Model):
    gini = ModelState()
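The automatic clean-up relies on weakref.finalize, which runs a callback once an object has been garbage collected. A minimal, self-contained sketch of that mechanism, assuming a plain dict as a hypothetical stand-in for the SystemState rows and a hypothetical track helper:

```python
import gc
from weakref import finalize


class Agent:
    """Hypothetical stand-in for a Mesa agent; only unique_id matters here."""

    def __init__(self, unique_id):
        self.unique_id = unique_id


# Hypothetical registry playing the role of the SystemState rows.
rows = {}


def track(agent, registry):
    registry[agent.unique_id] = {}
    # The callback and its arguments must not reference the agent itself,
    # otherwise the agent could never become unreachable.
    finalize(agent, registry.pop, agent.unique_id, None)


a = Agent(1)
track(a, rows)
assert 1 in rows
del a
gc.collect()  # CPython already frees the agent on del; this is a safety net
assert 1 not in rows
```

The finalizer keeps itself alive internally, so there is no need to store the object returned by finalize.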

3. Fast querying of system state

The system state is designed for rapid access by both row and column. This is particularly useful for data collection. For example, below is compute_gini from the Boltzmann wealth model. Rather than getting the wealth from every agent, we can simply retrieve the wealth column from the system state. Note also that the column can be accessed like an attribute (i.e., system_state.wealth). This returns a dict with object identifiers as keys and the states themselves (not the Fields) as values.

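def compute_gini(model):
    # Read the whole wealth column at once instead of looping over the agents.
    x = sorted(model.system_state.wealth.values())
    n = len(x)
    b = sum(xi * (n - i) for i, xi in enumerate(x)) / (n * sum(x))
    return 1 + (1 / n) - 2 * b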
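Attribute-like column access such as system_state.wealth can be provided with __getattr__, which Python calls only after normal attribute lookup fails. A minimal sketch, assuming a hypothetical _columns index and a hypothetical set_value helper (in the PR, values are written through the descriptors instead):

```python
class Field:
    __slots__ = ("value",)

    def __init__(self, value=None):
        self.value = value


class SystemState:
    def __init__(self):
        self._columns = {}  # attribute name -> {object id: Field}

    def set_value(self, key, attribute, value):
        # Hypothetical helper for this sketch only.
        column = self._columns.setdefault(attribute, {})
        if key not in column:
            column[key] = Field()
        column[key].value = value

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so regular attributes and
        # methods are never shadowed by a tracked column.
        columns = self.__dict__.get("_columns", {})
        if name not in columns:
            raise AttributeError(name)
        # Return the states themselves, not the Field objects.
        return {key: field.value for key, field in columns[name].items()}
```

One consequence of this design: a tracked attribute whose name collides with a real attribute or method of SystemState (here, set_value) cannot be reached through attribute syntax, so an explicit column lookup method is still useful.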

Implementation details, choices, and assumptions

The basic structure of this PR rests on the use of descriptors for tracking state attributes. Descriptors offer a highly performant way to control getting and setting attributes. They also give users full control over which attributes are tracked as part of the system state and which are ignored. Thus, MESA itself does not need to make any assumptions about this (contra #574). In the current implementation, declaring an attribute observable (i.e., to be tracked) happens at the class level, so in principle the attribute is observable for all instances of the class. However, because tracking only starts with the first assignment to the attribute within an instance, more fine-grained control over what is tracked, and when, is possible. Also, tracking stops automatically once an instance is garbage collected, preventing memory leaks.

SystemState is currently implemented as a sparse matrix using nested dictionaries. The key design consideration was rapid access to both individual rows (i.e., all tracked attributes of one object) and individual columns (i.e., one tracked attribute across all objects). I briefly looked at the sparse matrices in SciPy, but those won't work because they do not support dtype=object. I later discovered that there is a sparse dataframe in pandas. I haven't tested this yet, but I am happy to do so if desired.

The current implementation uses the Field helper class. Each active cell in the SystemState is a Field instance. Field has only a value attribute, which holds the value of the instance attribute being tracked. I chose this approach because it allows rapid updating of values in the system state. Below is the __set__ method for AgentState. First, we try to retrieve the relevant Field instance from the Agent instance (field = getattr(instance, self.private_field)). If this succeeds, the agent/attribute combination is already being tracked, so we can assign the new value directly (field.value = value). Otherwise, the agent/attribute combination needs to be added to the system state.

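    # Assumes `from weakref import finalize` at module level; `remover` is a helper
    # defined elsewhere in the PR that deletes the agent's row from the system state.
    def __set__(self, instance, value):
        key = instance.unique_id

        try:
            # Fast path: this agent/attribute pair is already tracked.
            field = getattr(instance, self.private_field)
        except AttributeError:
            # First assignment: register the pair and cache its Field on the instance.
            system_state = instance.model.system_state
            field = system_state.get_field(key, self.public_name)
            setattr(instance, self.private_field, field)

            # Register only one finalizer per agent, so its row is dropped on garbage collection.
            if not hasattr(instance, "_finalizer_set"):
                finalize(instance, remover, key, system_state)
                instance._finalizer_set = True

        field.value = value
        setattr(instance, self.private_name, value)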
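To make the Field-caching idea concrete, here is a self-contained sketch of the descriptor together with minimal stand-ins for the model and the system state. The attribute names _wealth and _wealth_field, the (key, name)-keyed fields dict, and the Model and MoneyAgent classes are all hypothetical simplifications, not Mesa classes; the finalizer is omitted to keep the focus on the fast path.

```python
class Field:
    __slots__ = ("value",)

    def __init__(self, value=None):
        self.value = value


class SystemState:
    """Hypothetical minimal stand-in: only what __set__ needs."""

    def __init__(self):
        self.fields = {}  # (object id, attribute name) -> Field

    def get_field(self, key, name):
        return self.fields.setdefault((key, name), Field())


class AgentState:
    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = f"_{name}"
        self.private_field = f"_{name}_field"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        try:
            # Fast path: the Field was cached on the instance by an earlier assignment.
            field = getattr(instance, self.private_field)
        except AttributeError:
            # First assignment: create the cell in the system state and cache it.
            state = instance.model.system_state
            field = state.get_field(instance.unique_id, self.public_name)
            setattr(instance, self.private_field, field)
        field.value = value
        setattr(instance, self.private_name, value)


class Model:
    def __init__(self):
        self.system_state = SystemState()


class MoneyAgent:
    wealth = AgentState()

    def __init__(self, unique_id, model):
        self.unique_id = unique_id
        self.model = model
        self.wealth = 1  # tracking starts here
```

After the first assignment, every update is a single attribute lookup on the instance plus one attribute write on the Field; no dicts in the system state are touched.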

An alternative implementation would be to store only the row and column identifiers and set the value on the relevant dicts in the SystemState. However, this requires a lookup in 3 dicts on every update, and dict lookups are an order of magnitude slower than setting a value on an object.
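The relative cost of the two update strategies can be checked with a small timeit microbenchmark. This is a sketch only: the dict layout (a row index and a column index that must both be kept in sync) is an assumption that roughly mirrors the alternative described above, and timings will vary by machine and Python version, so no numbers are claimed here.

```python
import timeit


class Field:
    __slots__ = ("value",)

    def __init__(self, value=None):
        self.value = value


# Strategy 1: the agent keeps a direct reference to its Field.
field = Field(0)

# Strategy 2 (hypothetical layout): only identifiers are kept, and every
# update has to go through the row and column dicts of the system state.
rows = {42: {"wealth": 0}}
columns = {"wealth": {42: 0}}


def update_via_field(value=1):
    field.value = value


def update_via_dicts(value=1, key=42, name="wealth"):
    # Both indexes must be updated to stay in sync.
    rows[key][name] = value
    columns[name][key] = value


if __name__ == "__main__":
    n = 200_000
    for fn in (update_via_field, update_via_dicts):
        seconds = timeit.timeit(fn, number=n)
        print(f"{fn.__name__}: {seconds:.4f} s for {n:,} updates")
```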

In #1930, there was a short discussion on event tracking using the Observer design pattern. This PR does not yet add this. However, it would be quite easy to build on this PR if you want to track SystemStateChange events. For example, you could modify Field to fire such an event each time Field.value is set to a new value.
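A minimal sketch of such an observable Field, assuming listeners are plain callables with a hypothetical (key, attribute, old, new) signature, and that "a new value" means one that compares unequal to the old value:

```python
class Field:
    """Sketch of a Field that notifies listeners whenever its value changes."""

    __slots__ = ("_value", "_listeners", "key", "attribute")

    def __init__(self, key, attribute, value=None):
        self.key = key
        self.attribute = attribute
        self._value = value
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old, self._value = self._value, new
        # Only fire on an actual change; values such as NumPy arrays would
        # need a different comparison than !=.
        if new != old:
            for listener in self._listeners:
                listener(self.key, self.attribute, old, new)
```

The descriptor code does not need to change: it still assigns field.value, and the property setter takes care of emitting the event.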

Performance implications

Some quick tests on the Boltzmann wealth model suggest that adding the system state carries a performance penalty: with only the system state added and everything else unchanged, the model is about 16% slower. However, once the system state is used in data collection, the model is actually about 10% faster than the original. Fully realizing this gain, however, requires a more elaborate rewrite and rethink of the DataCollector. The Boltzmann wealth model is something of a worst case for testing the system state, because agent.wealth can be updated many times within a single tick while the model contains no other expensive logic.


codecov bot commented Dec 31, 2023

Codecov Report

Attention: 80 lines in your changes are missing coverage. Please review.

Comparison is base (97178f6) 79.38% compared to head (7071dd3) 74.13%.
Report is 15 commits behind head on main.

Files                   Patch %   Lines
mesa/system_state.py    0.00%     80 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1933      +/-   ##
==========================================
- Coverage   79.38%   74.13%   -5.25%     
==========================================
  Files          15       16       +1     
  Lines        1130     1210      +80     
  Branches      244      250       +6     
==========================================
  Hits          897      897              
- Misses        202      282      +80     
  Partials       31       31              


@quaquel quaquel marked this pull request as draft December 31, 2023 11:35
@quaquel quaquel changed the title Adding system state tracking Add system state tracking Dec 31, 2023
@EwoutH
Contributor

EwoutH commented Dec 31, 2023

Very, very interesting PR, thanks a lot.

The Boltzman wealth model is a bit of worst case model for testing the system state, because it can have many updates to agent.wealth within a single tick while there is no further expensive logic within the model.

That's an interesting case. For data collection you would (normally) only need one update per tick (model step), instead of an update every time the value is modified. Is there a way to optimize this for data collection? Because if (in the long term) we want to use this (by default) for data collection, it needs to be performant even when states are updated many times within a tick.

@quaquel
Contributor Author

quaquel commented Dec 31, 2023

In the case of the Boltzmann model, if you retrieve wealth directly from SystemState rather than from the agent set, the model with SystemState is about 10% faster (despite the many within-tick state updates). You can probably speed this up even more: Gini and wealth are based on the same underlying data, so you could avoid retrieving it twice.

@Corvince
Contributor

Corvince commented Jan 3, 2024

Very interesting PR indeed, could you clarify some points for me?

First of all I am not sure what to think about having to declare tracked attributes up front. For my taste this mixes model implementation and data collection a bit too much. Currently one can just implement their ABMs and afterwards determine what attributes are of interest and collect those. With AgentState you either need to know interesting attributes beforehand or redeclare your attributes as AgentState and make changes to the agent/model class every time you want to change datacollection.

Also, the type of each attribute becomes AgentState, right? I understand that when I retrieve an AgentState I get the actual value, but what happens if I want to use an external function that, for example, expects a float and is implemented something like this:

def some_func(arg: float):
    if not isinstance(arg, float):
        raise Exception

Will this work or fail? I honestly don't know.
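In plain Python this question has a definite answer: a data descriptor intercepts attribute access on instances, and its __get__ returns whatever object was stored, so the value's type is unchanged. Only access on the class itself (MoneyAgent.wealth) yields the descriptor object. A minimal sketch, assuming AgentState.__get__ returns the stored value as the PR's __set__ suggests (it stores the value under a private name); the classes here are simplified stand-ins:

```python
class AgentState:
    """Minimal descriptor: stores the value under a private name, returns it unchanged."""

    def __set_name__(self, owner, name):
        self.private_name = f"_{name}"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        # Tracking logic would go here; the stored object itself is untouched.
        setattr(instance, self.private_name, value)


class MoneyAgent:
    wealth = AgentState()

    def __init__(self):
        self.wealth = 1.0


def some_func(arg: float):
    if not isinstance(arg, float):
        raise TypeError(f"expected float, got {type(arg).__name__}")
    return arg * 2


agent = MoneyAgent()
print(type(agent.wealth))       # <class 'float'>
print(some_func(agent.wealth))  # 2.0
```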

Unfortunately I haven't had time to play around with this PR, so I don't have a feeling for this. But I don't fully understand the data structure yet. Does this only track the current system state or the whole history?
As I understand it, it only tracks the current (latest) state. How does this then help with intra-tick updates? Is there a way to track those or can we still only collect data at fixed intervals (i.e. ticks)?

Regarding performance, I am a bit skeptical that this will increase performance. It obviously comes with a penalty that you pay every time, whether or not you currently need data collection (e.g., interactive visualizations). Again, currently you can comment out data collection very quickly, but with AgentState you would need to modify your actual model. And I can't imagine this penalty is fully recovered, let alone reversed, in data collection. Either way, +17% or -10% isn't much, but it will be interesting to see how this scales with more collected attributes (say 10 instead of 1).

To summarize this possibly confusing comment: I don't yet see the added benefit of this approach. Maybe you can clear that up for me.

@quaquel
Contributor Author

quaquel commented Jan 3, 2024

The aim of SystemState is to have a convenient way of describing the subset of state variables that one is interested in at any given time instant. This can be for aiding data collection, but it can also be for visualization, or for rare event analysis. In fact, it is possible to replay the entire simulation by just saving all changes to the system state. This is how many discrete event packages make it possible to save and replay animations without actually rerunning the model.

The idea for this PR emerged from playing around with other ways of doing data collection, but then @EwoutH mentioned rare event analysis (#1930) and I was pointed to #574 as well.

First of all I am not sure what to think about having to declare tracked attributes up front. For my taste this mixes model implementation and data collection a bit too much.

Unfortunately, this is the only way I could come up with to make it very easy for the user to declare what should and should not be tracked. Also, since SystemState is not the same as data collection, I don't agree that you are mixing implementation and data collection. All you do with wealth = AgentState() is declare that the wealth attribute is tracked in the system state. So, in fact, you are very explicitly declaring what should be tracked. You can even easily change what is tracked by commenting out this one statement without touching any other part of the code. So you can do exactly what you describe: build your model as normal and only later declare specific attributes to be tracked by the system state. Moreover, you can still build and run your model without it if you so desire.

Also the type of each attribute becomes AgentState, right? I understand that I retrieve an AgentState i get the actual value, but what happens if I want to use an external function that for example expects a float and is implemented something like this:

SystemState is in essence a sparse matrix. The rows are object identifiers (i.e., agent.unique_id, or "model"). The columns are attribute names. You can quickly retrieve by row or column. For convenience, I made it possible to retrieve a column like this:

wealth = system_state.wealth

If you retrieve a column via attribute lookup, you get a dict with the object identifiers as keys and the underlying actual values (so floats in this case) as values. So, taking your example, all you need to do before calling some_func is system_state.wealth.values().

To summarize this possibly confusing comment: I don't yet see the added benefit of this approach. Maybe you can clear that up for me.

I hope this answer clarifies the purpose of SystemState. It is an additional layer of abstraction that can be used for various purposes, including data collection, animation, model replay without rerunning the model, rare event analysis, and possibly other use cases.

@Corvince
Contributor

Corvince commented Jan 3, 2024

Thank you for your extensive reply! Yes this clarifies a lot of points, but also raises some other questions ;)

The aim of SystemState is to have a convenient way of describing the subset of state variables that one is interested in at any given time instant. This can be for aiding data collection, but it can also be for visualization, or for rare event analysis. In fact, it is possible to replay the entire simulation by just saving all changes to the system state. This is how many discrete event packages make it possible to save and replay animations without actually rerunning the model.

I still don't completely understand what makes this unique in its ability to facilitate those points.

For simplicity let's assume for the Boltzmann model the system state is captured by wealth and the gini index. We can already capture those with the normal data collector. The difference with this implementation is just how the data is stored. In fact we don't expose how data is stored in the data collector, so it might as well have the same structure as implemented here. The difference is just in how the values are collected. I find the data collector approach to be more declarative, where here it is made imperative through AgentState.

In fact, you could argue that things like replaying are only enabled by data collection, and SystemState is just a form of data collection.

I am not saying (yet) one approach is better than the other. I just have the feeling this does the same thing as data collection, just differently.

@quaquel
Contributor Author

quaquel commented Jan 3, 2024

For simplicity let's assume for the Boltzmann model the system state is captured by wealth and the gini index. We can already capture those with the normal data collector.

There is a fundamental difference between the data collector and what SystemState offers. The datacollector only collects data when it is called, which is typically once per model step. So the datacollector represents the state of the system at fixed time intervals (i.e., ticks) and does not capture any state changes that happen within a tick. In contrast, SystemState always reflects the state of the simulation at that instant, including within ticks.

@Corvince
Contributor

Corvince commented Jan 3, 2024

I am sorry if this is tiring because I am overlooking something, but I fail to see how this advantage can be utilized. In my mind, SystemState is just an abstraction over the current system state. The question is how it is used. If it is used "within" the model, for example by agents querying the system state for some condition, then I think it would be simpler for the agents to query the "true" state directly. If SystemState is observed from "outside" through an observer, for example for visualization purposes, I think you could still just query the true model directly, or you need some way to query the current system state, at which point this is equivalent to calling the datacollector multiple times within a tick and using its data. So I guess my primary question would be: "Could there be specific scenarios where SystemState provides unique advantages over querying the model directly?"

Again, I am not saying one approach is better than the other. Having state update automatically can have its advantages, especially if one could "listen" to changes, which I assume is within reach of this PR. And I now agree that you don't really need to rewrite your model/agent to enable/disable tracking: wealth = AgentState() is indeed just a one-liner independent of the rest. But I also like the explicitness and independence of the current data collector. Maybe there is some way to combine the best of both worlds.

@Corvince
Contributor

Corvince commented Jan 3, 2024

I mean you could for example get the wealth of each agent in the same way your api provides by doing

wealth = {agent.unique_id: agent.wealth for agent in model.agents}

And you can do this whenever you want or need this (within ticks or at ticks). Or am I missing something?

@quaquel
Contributor Author

quaquel commented Jan 3, 2024

The bigger picture is indeed being able to listen to events. It would be trivial to use AgentState and ModelState for firing StateChangeEvents and add a way to subscribe to these events (e.g., for logging, animation etc.).

The SystemState class is in principle separate from having a mechanism for event tracking. And you are right, you can always loop over all agents. The main benefit SystemState offers is that state updates are pushed to it, rather than all states being pulled from the agents when needed. This can give substantial speed advantages, especially when state changes are rare relative to how often the states are retrieved.

It seems you keep equating SystemState and data collection. For me, they are quite different things where the latter can use the former but need not rely on the former.

@Corvince
Contributor

Corvince commented Jan 4, 2024

Well, indeed, for me system state and data collection are very closely related, as I hinted earlier. It would be super simple to modify DataCollector to create a SystemState each time collect() is called. For me, the difference is indeed only push vs. pull. And I disagree about the performance implications: you don't need to call collect() on each tick. You could also do something like this:

# An agent method
def some_rare_event(self):
    self.wealth = 55
    # DataCollector.collect() takes the model as its argument
    self.model.datacollector.collect(self.model)

So you manually trigger collect only when needed. I agree that the API in this PR is more elegant, but unless you have lots of rare events (are they rare anymore?), I don't think this is too cumbersome. In contrast, collecting lots of attributes could become awkward. Taking the Epstein model as an example, you would need to do something like this:

class Citizen:
    hardship = AgentState()
    regime_legitimacy = AgentState()
    risk_aversion = AgentState()
    threshold = AgentState()
    # ... (possibly more attributes)

    def __init__(self, ...):
        self.hardship = hardship
        self.regime_legitimacy = regime_legitimacy
        ...

This doesn't feel very pythonic to me. Plus you have to explain this to new users and give some explanation about AgentState, while its internals are definitely an advanced topic.

But I wonder if we could indeed combine the data collector with this push-based approach. Playing around a bit, it seems possible to add AgentState descriptors on instances (edit: not on instances, but dynamically) since Python 3.6. Here is a rough idea for an API (Agent hasn't declared any AgentState here):

DataCollector({Agent: ["wealth"]})  # or SystemState()?
# and its implementation would look like this:
ags = AgentState()
setattr(Agent, "wealth", ags)
ags.__set_name__(Agent, "wealth")
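A runnable sketch of the dynamic approach: since Python 3.6, __set_name__ is called automatically only while the class body is being created, so a descriptor attached later with setattr has to be told its name explicitly. The make_tracked helper and the changes list used to record updates are hypothetical, and MoneyAgent is a plain stand-in class.

```python
class AgentState:
    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = f"_{name}"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        setattr(instance, self.private_name, value)
        # Hypothetical hook: record that this attribute changed.
        instance.changes.append((self.public_name, value))


def make_tracked(cls, name):
    """Attach an AgentState descriptor to an already existing class."""
    descriptor = AgentState()
    setattr(cls, name, descriptor)
    # Not called automatically for descriptors added after class creation.
    descriptor.__set_name__(cls, name)
    return descriptor


class MoneyAgent:
    def __init__(self):
        self.changes = []
        self.wealth = 1
```

One caveat: a data descriptor takes precedence over the instance __dict__, so instances created before make_tracked runs would fail on reading wealth (their value sits in __dict__, not in _wealth). The descriptor therefore has to be attached before any agents are created.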

What do you think?

@quaquel
Contributor Author

quaquel commented Jan 4, 2024

I have been thinking about this quite a bit. In my view, there are 3 different questions that have to be answered.

  1. Should MESA have some form of event sourcing? That is, do we want to add some kind of Observer and Observable to MESA. In my view, this is highly desirable for a whole host of reasons. If you want to support event sourcing for the change in an attribute on an instance, you need to do two things:
self.wealth = new_value
self.fire_event(Event.STATE_CHANGE, "wealth")

Having a descriptor on wealth, as with AgentState in this PR, is just a form of syntactic sugar. You don't have to remember to fire a StateChange event every time you change wealth; the descriptor handles this for you. Is it a problem that you have to declare descriptors at the class level? No, in fact this is how it should be done in Python, because it is part of the behavior of all instances of the class. I am not sure I follow your second code example, but explicitly calling dunder methods does not strike me as particularly pythonic.

  2. What should the future of data collection be? I like the declarative API suggested by @Corvince in Multi-Agent data collection #348. In my view, data collection is just the collection of an attribute from an object, potentially applying a callable to it. Whether the object is an agent, an AgentSet, the model, or something else the user created is immaterial. I also think that data collection typically happens at fixed intervals, so you get clean, evenly spaced dynamics over time. Calling the datacollector as suggested above thus does not make much sense to me. This does not, however, mean that it should not be supported.

  3. If we have 1 and 2, do we still need a ModelState class as suggested in Model state format #574? If you have event sourcing, you can do whatever you want for push-oriented data collectors. Take, for example, Axelrod's evolution of cooperation. Here you have different strategies, implemented as different classes of agents, and you want to track how many instances there are of each strategy. If you have event sourcing (1), you can write a simple counter like the one below. If you have 2, you could simply query this object for its counts each time datacollector.collect() is called. I think a strategy counter implemented through event sourcing (1) and collectable as per (2) is a very clean way to do this type of task. It would also largely render SystemState redundant.

from collections import defaultdict


class StrategyCounter:
    def __init__(self):
        self.counter = defaultdict(int)
        # assume we can subscribe to the class (although instance subscription should also be supported)
        Agent.subscribe(Event.AGENT_CREATED, self.notify_created)
        Agent.subscribe(Event.AGENT_KILLED, self.notify_killed)

    def notify_created(self, event):
        agent, time_instant = event
        name = agent.__class__.__name__
        self.counter[name] += 1

        # makes it possible to retrieve agent counts as attributes, so counter.TitForTat
        # returns self.counter.get("TitForTat"); the property getter receives the counter instance
        setattr(StrategyCounter, name, property(lambda obj, name=name: obj.counter.get(name)))

    def notify_killed(self, event):
        agent, time_instant = event
        self.counter[agent.__class__.__name__] -= 1
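To try the counter without Mesa changes, the class-level subscription it assumes can be mocked. In this sketch, Agent.subscribe, Agent.fire_event, the Event members, and the (agent, time_instant) payload are all hypothetical, not Mesa's API; attribute access is provided through __getattr__ instead of adding properties to the class on every event.

```python
from collections import defaultdict
from enum import Enum, auto


class Event(Enum):
    AGENT_CREATED = auto()
    AGENT_KILLED = auto()


class Agent:
    """Hypothetical agent base class with class-level subscriptions."""

    _subscribers = defaultdict(list)

    @classmethod
    def subscribe(cls, event, handler):
        cls._subscribers[event].append(handler)

    @classmethod
    def fire_event(cls, event, payload):
        for handler in cls._subscribers[event]:
            handler(payload)

    def __init__(self, time_instant=0):
        Agent.fire_event(Event.AGENT_CREATED, (self, time_instant))

    def kill(self, time_instant=0):
        Agent.fire_event(Event.AGENT_KILLED, (self, time_instant))


class StrategyCounter:
    def __init__(self):
        self.counter = defaultdict(int)
        Agent.subscribe(Event.AGENT_CREATED, self.notify_created)
        Agent.subscribe(Event.AGENT_KILLED, self.notify_killed)

    def notify_created(self, event):
        agent, time_instant = event
        self.counter[type(agent).__name__] += 1

    def notify_killed(self, event):
        agent, time_instant = event
        self.counter[type(agent).__name__] -= 1

    def __getattr__(self, name):
        # Reached only for unknown attributes, e.g. counter.TitForTat.
        if name.startswith("_"):
            raise AttributeError(name)
        return self.__dict__.get("counter", {}).get(name, 0)


class TitForTat(Agent):
    pass


class AlwaysDefect(Agent):
    pass
```

Strategies that were never created simply report a count of 0.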

@quaquel
Contributor Author

quaquel commented Jan 4, 2024

So, I played around a bit more. Adding event sourcing can be quite simple:

from collections import defaultdict
from collections.abc import Callable
from enum import StrEnum, auto  # StrEnum requires Python 3.11+

class Event(StrEnum):
    STATE_CHANGE = auto()
    AGENT_ADDED = auto()
    AGENT_REMOVED = auto()


class EventProducer:

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event, eventhandler: Callable):
        self.subscribers[event].append(eventhandler)

    def unsubscribe(self, event, eventhandler):
        try:
            self.subscribers[event].remove(eventhandler)
        except ValueError:
            pass  # the handler was not subscribed

    def fire_event(self, event, *args, **kwargs):
        # Handlers receive the producer first, followed by any event-specific arguments.
        for entry in self.subscribers[event]:
            entry(self, *args, **kwargs)


class Observable:

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = f"_{name}"

    def __set__(self, instance, value):
        setattr(instance, self.private_name, value)
        instance.fire_event(Event.STATE_CHANGE, self.public_name)
        

Here, I defined 3 events using an enum to keep things clear. AGENT_ADDED and AGENT_REMOVED are tied to model._agents. For now, I used EventProducer as a mixin on Model and Agent. This does require moving the actual model initialization into a separate setup method, because you can only subscribe to AGENT_ADDED and AGENT_REMOVED events after the model instance has been created. Observable is just a convenient way to always fire a STATE_CHANGE event when changing the value of any attribute declared Observable. Observable is still a descriptor and thus has to be declared at the class level.

With this, you can do simple stuff like this.

def eventlogger(model, instance):
    print(f"agent {instance.unique_id} created")


model.subscribe(Event.AGENT_ADDED, eventlogger)

model.setup()

for _ in range(100):
    model.step()

It would be trivial to expand this example to simply write all events to some database:

class ComplexLogger:
    # write_to_log stands in for whatever logging backend or database writer is used
    def __init__(self, model):
        model.subscribe(Event.AGENT_ADDED, self.agent_added_handler)
        model.subscribe(Event.AGENT_REMOVED, self.agent_removed_handler)

    def agent_added_handler(self, model, instance):
        # Start listening to the new agent's state changes.
        instance.subscribe(Event.STATE_CHANGE, self.state_change_handler)
        write_to_log(f"Agent {instance.unique_id} created")

    def agent_removed_handler(self, model, instance):
        write_to_log(f"Agent {instance.unique_id} removed")

    def state_change_handler(self, instance, state):
        write_to_log(f"Agent {instance.unique_id} {state} changed to {getattr(instance, state)}")
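An end-to-end sketch wiring EventProducer, Observable, and a logger together. To keep it self-contained and runnable on Python versions before 3.11, it uses a plain Enum instead of StrEnum. The Model and MoneyAgent classes are hypothetical stand-ins in which the model fires AGENT_ADDED and AGENT_REMOVED itself from explicit add and remove methods, which sidesteps the separate setup step; ListLogger replaces write_to_log with an in-memory list.

```python
from collections import defaultdict
from enum import Enum, auto


class Event(Enum):
    STATE_CHANGE = auto()
    AGENT_ADDED = auto()
    AGENT_REMOVED = auto()


class EventProducer:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event, handler):
        self.subscribers[event].append(handler)

    def fire_event(self, event, *args, **kwargs):
        for handler in self.subscribers[event]:
            handler(self, *args, **kwargs)


class Observable:
    def __set_name__(self, owner, name):
        self.public_name = name
        self.private_name = f"_{name}"

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private_name)

    def __set__(self, instance, value):
        setattr(instance, self.private_name, value)
        instance.fire_event(Event.STATE_CHANGE, self.public_name)


class Model(EventProducer):
    """Hypothetical stand-in that announces agent additions and removals."""

    def __init__(self):
        super().__init__()
        self.agents = []

    def add(self, agent):
        self.agents.append(agent)
        self.fire_event(Event.AGENT_ADDED, agent)

    def remove(self, agent):
        self.agents.remove(agent)
        self.fire_event(Event.AGENT_REMOVED, agent)


class MoneyAgent(EventProducer):
    wealth = Observable()

    def __init__(self, unique_id):
        super().__init__()  # subscribers must exist before the first assignment
        self.unique_id = unique_id
        self.wealth = 1


class ListLogger:
    """Variant of ComplexLogger that appends messages to a list."""

    def __init__(self, model):
        self.lines = []
        model.subscribe(Event.AGENT_ADDED, self.agent_added_handler)
        model.subscribe(Event.AGENT_REMOVED, self.agent_removed_handler)

    def agent_added_handler(self, model, instance):
        instance.subscribe(Event.STATE_CHANGE, self.state_change_handler)
        self.lines.append(f"Agent {instance.unique_id} created")

    def agent_removed_handler(self, model, instance):
        self.lines.append(f"Agent {instance.unique_id} removed")

    def state_change_handler(self, instance, state):
        self.lines.append(f"Agent {instance.unique_id} {state} changed to {getattr(instance, state)}")
```

The initial self.wealth = 1 in MoneyAgent.__init__ fires a STATE_CHANGE event before the logger subscribes to the agent, so only changes made after the agent has been added to the model are logged.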

@rht
Contributor

rht commented Jan 7, 2024

My "ChatGPT" summary of the discussion: IIUC, marking the attributes with SystemState, AgentState, ModelState makes them effectively immutable (and thus picklable), and in effect they become inspectable at any given time. This is similar to objects in Clojure, which are immutable and are designed so that operations on them remain performant. However, the problem with this PR is that you have to mark the attributes manually, which mixes the model specification with how users want them to be observed.

SystemState is currently implemented as a sparse matrix using a bunch of dictionaries. The key design consideration was to have rapid access to both individual rows (i.e., all object attributes being tracked) and individual columns (i.e., a given attribute being tracked for Class). I briefly looked at the sparse matrices in SciPy, but those won't work because you cannot have dtype=object. I later discovered that there is a sparse dataframe in pandas. I haven't tested this yet but I am happy to do so if desired.

I see that the closest approximation of this PR would be to implement the agent and model as a pandas DataFrame (which already has immutability by default, for all of the columns ("attributes")). Snapshots of the whole network of DFs at any given moment would be the system state. We can thus draw inspiration for the data collection / observation of this state from how existing logging libraries have been designed to keep track of objects in general.

I might be missing some of the points that have been raised.

@wang-boyu
Member

... implement the agent and model as a pandas DataFrame ...

This was precisely the idea I had some time last year while thinking about creating an ABM library in R. Having agents and models as data frames would allow seamless integration with other data science libraries, particularly for analysis and visualization.

@quaquel
Contributor Author

quaquel commented Jan 8, 2024

IIUC, marking the attributes with SystemState, AgentState, ModelState turns them to be effectively immutable (and thus can be pickled), and in effect, they become inspectable at any given time.

I think ChatGPT is suffering, as usual in my experience, from hallucinations 😃, because this is not really what is going on. Let me try to summarize:

  1. SystemState is like a view in NumPy on the state of the simulation. I built it in response to Model state format #574.
  2. To automate the updating of SystemState, but in the absence of eventsourcing/support for the observer design pattern, I propose to use descriptors (i.e., AgentState and ModelState).

The net result of 1 and 2 is that you end up with a SystemState instance that is automatically updated whenever an underlying model or agent attribute declared to be part of the system state is updated. There is thus nothing immutable about any of this.

My thoughts are evolving on this PR. I believe that point 2 should be handled explicitly through eventsourcing/the observer pattern. It is relatively straightforward to add this to mesa (see #1933 (comment) for a quick sketch).

This leaves open the question of whether point 1 is needed. What is the value of having a view of the overall state of the simulation? @Corvince seems unconvinced of the value of this.

@quaquel
Contributor Author

quaquel commented Jan 8, 2024

But I wonder if we could indeed combine data collector with this push based approach. Playing around a bit it seems to be possible to add AgentState descriptors on instances(edit: not on instances but dynamically) since Python 3.6. Rough idea for an API (Agent hasn't declared any AgentState here)

DataCollector({Agent: ["wealth"]})  # Or SystemState()?
# and its implementation would look like this:
ags = AgentState()
setattr(Agent, "wealth", ags)
ags.__set_name__(Agent, "wealth")

What do you think?

Sorry for not replying to this question before. Are you suggesting that the datacollector dynamically adds the descriptors for the states it wants to track? If so, I can see merit in doing something along those lines in addition to the user having the freedom to explicitly declare it themselves if they choose to do so.
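A runnable sketch of the dynamic variant, assuming a hypothetical `track` helper that a DataCollector could call for each requested attribute; `track`, `history`, and this stripped-down `AgentState` are illustrative, not Mesa API. Because `__set_name__` only runs automatically when the class body is executed, a descriptor attached later via `setattr` has to be given its name explicitly.

```python
class AgentState:
    """Stripped-down tracking descriptor (illustrative, not the PR's class)."""

    def __init__(self):
        self.history = []  # (unique_id, value) for every tracked assignment

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Data descriptors take precedence over the instance __dict__,
        # so the value can live there under the attribute's own name.
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        self.history.append((instance.unique_id, value))


def track(cls, name):
    """Hypothetical helper: attach a tracking descriptor to an existing class."""
    descriptor = AgentState()
    setattr(cls, name, descriptor)
    # __set_name__ is not called automatically for attributes added after class creation.
    descriptor.__set_name__(cls, name)
    return descriptor


class Agent:  # declares no AgentState of its own
    def __init__(self, unique_id):
        self.unique_id = unique_id
        self.wealth = 1
```

For example, an `Agent(0)` created before `track(Agent, "wealth")` keeps its value of 1, and a later `a.wealth += 2` is recorded as `(0, 3)`. Storing the value under the attribute's own name is what lets agents created before tracking was switched on keep working.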

@rht
Contributor

rht commented Jan 8, 2024

SystemState is like a view in NumPy on the state of the simulation. I build it in response to #574.

I find this deviates from the point of #574, which is primarily to have a concrete, pickleable representation of the model; it doesn't discuss tracking. A NumPy array has a read-only view because it is a mutable object. If it were immutable from the start, the concept of a view wouldn't be necessary in the first place.
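The view-versus-immutability distinction can be illustrated without NumPy, using the standard library's `types.MappingProxyType` as a stand-in for a NumPy view (an analogy, not what the PR uses): a view tracks its mutable source, while a pickleable snapshot has to be an explicit copy.

```python
import pickle
from types import MappingProxyType

state = {"wealth": 1}
view = MappingProxyType(state)  # read-only window onto the same dict, no copy

state["wealth"] = 5             # mutate the underlying object...
assert view["wealth"] == 5      # ...and the view reflects it immediately

try:
    view["wealth"] = 0          # writing through the view is rejected
except TypeError:
    pass

try:
    pickle.dumps(view)          # a live view is not itself a pickleable snapshot
except TypeError:
    pass

# An immutable, pickleable representation requires an explicit copy instead.
snapshot = pickle.loads(pickle.dumps(dict(view)))
state["wealth"] = 9             # the copy does not follow later changes
```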

This leaves open the question of whether point 1 is needed. What is the value of having a view of the overall state of the simulation? @Corvince seems unconvinced of the value of this.

My concern is having to manually mark which specific attributes get a view. I'd prefer that all of them either have an immutable representation that can be enabled via a switch, or be immutable from the get-go, so that users won't have to consciously mix model specification and data collection, which goes against the principle of separation of concerns.
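One way to read the "switch" idea, as a sketch under assumptions: when a hypothetical `record_snapshots` flag is on, the model copies every public instance attribute of every agent into frozen dataclasses after each step, so nothing has to be marked by hand. `AgentSnapshot`, `snapshot`, and `record_snapshots` are made-up names. The copy is shallow, so mutable attribute values (lists, dicts) would still need deep-copying; module-level frozen dataclasses holding plain tuples can be pickled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSnapshot:
    unique_id: int
    attributes: tuple  # sorted (name, value) pairs


def snapshot(agents, exclude=("model",)):
    """Copy every public instance attribute of each agent; no opt-in marking."""
    return tuple(
        AgentSnapshot(
            agent.unique_id,
            tuple(sorted(
                (name, value)
                for name, value in vars(agent).items()
                if not name.startswith("_") and name not in exclude
            )),
        )
        for agent in agents
    )


class Agent:  # hypothetical stand-in
    def __init__(self, unique_id, model=None):
        self.unique_id = unique_id
        self.model = model  # excluded: a back-reference, not state
        self.wealth = 1
        self._scratch = 0   # private: skipped by snapshot()

    def step(self):
        self.wealth += 1


class Model:  # hypothetical stand-in
    def __init__(self, agents, record_snapshots=False):
        self.agents = agents
        self.record_snapshots = record_snapshots  # the hypothetical "switch"
        self.history = []

    def step(self):
        for agent in self.agents:
            agent.step()
        if self.record_snapshots:
            self.history.append(snapshot(self.agents))
```

The trade-off quaquel raises in the next comment is visible here: every public attribute is copied on every step, whether or not anyone needs it.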

@quaquel
Contributor Author

quaquel commented Jan 8, 2024

I think we have a bit of a different reading of #574. I understood it as having an object that can be queried at any point in time and would return the simulation's overall state at that moment. Is this in line with your reading?

The point of the descriptors is that I wanted to avoid automatically tracking all attributes because of memory and performance concerns. In my experience, typically, only a subset of attributes of an agent is relevant to the outside world. Many other attributes are part of the internal functioning of the agent and might even be constant within the instance or even across all instances.

@Corvince
Contributor

Corvince commented Jan 9, 2024

Sorry for not replying to this question before. Are you suggesting that the datacollector dynamically adds the descriptors for the states it wants to track? If so, I can see merit in doing something along those lines in addition to the user having the freedom to explicitly declare it themselves if they choose to do so.

Yes, exactly, that was my proposal. But as you correctly stated above, I am still not convinced of the usefulness of an abstract model state. For me, it just means we have an actual model state and an abstract model state, but I think it is better to try to work with the actual model state.

@quaquel
Contributor Author

quaquel commented Jan 9, 2024

I think I now broadly understand your point and I might even be persuaded that a global 'view' on the model state might not be necessary.

I also like the idea of automatically adding descriptors, but this would interact with #1947 and #1944. So it might be better to first discuss those and depending on what is decided on those return to this question.

@EwoutH EwoutH added the feature Release notes label label Jan 9, 2024
@EwoutH
Contributor

EwoutH commented Jan 21, 2024

@quaquel do you know in which direction you want to move with this?

@quaquel
Contributor Author

quaquel commented Jan 22, 2024

I have 2 questions that will help me decide whether to close this PR.

  1. I have only really discussed the SystemState idea with @Corvince. He made good arguments for why it is not needed. If other maintainers agree, I would suggest closing this PR and closing Model state format #574 with a not-going-to-do message and a link to the discussion here.
  2. The implementation details are superseded by Proposal: adding some form of event sourcing to MESA #1947. Again, so far I have only seen two reactions, to which I have replied. I am still strongly in favor of Proposal: adding some form of event sourcing to MESA #1947.

@rht
Contributor

rht commented Feb 6, 2024

I'm revisiting this PR in light of the measure and data collection divide (point 3 in #1944 (comment)), to recycle its ideas. I find the API proposed here, in particular

def compute_gini(model):
    x = sorted(model.system_state.wealth.values())

and the wealth = AgentState() declaration, to be uncannily similar to the Measure-as-a-model-attribute API discussed on Matrix, except that the "sets" covered by the two concepts don't fully overlap:

Descriptors offer a highly performant way to control accessing and setting attributes. It also offers users full control over which attributes should be tracked as part of the system state and which ones can be ignored.

Both a subset of measures and the model/agent internal attributes may be tracked as part of the system state, or ignored.

SystemState is currently implemented as a sparse matrix using a bunch of dictionaries. The key design consideration was to have rapid access to both individual rows (i.e., all object attributes being tracked) and individual columns (i.e., a given attribute being tracked for Class). I briefly looked at the sparse matrices in SciPy, but those won't work because you cannot have dtype=object. I later discovered that there is a sparse dataframe in pandas. I haven't tested this yet but I am happy to do so if desired.

This makes me wonder whether all the measures should be grouped together within an object and stored in a DF (Polars?) for performance reasons. They could then be accessed via compute_gini(model.measures.wealth.value).
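A minimal sketch of the dictionary-based sparse matrix described above, assuming two mirrored dict-of-dicts that share one `Field` per cell, so that a row (one object) and a column (one attribute) are both a single lookup. The method names `set`, `row`, and `column` are hypothetical (the PR itself exposes columns as attributes, e.g. `system_state.wealth`), and `compute_gini` reuses the Gini formula from Mesa's Boltzmann wealth example but takes the state rather than the model.

```python
class Field:
    """One cell of the sparse table; shared by its row and its column."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


class SystemState:
    def __init__(self):
        self._rows = {}     # obj_id -> {attribute -> Field}
        self._columns = {}  # attribute -> {obj_id -> Field}

    def set(self, obj_id, attribute, value):
        row = self._rows.setdefault(obj_id, {})
        field = row.get(attribute)
        if field is None:
            # New cell: register the same Field object in both indexes.
            field = Field(value)
            row[attribute] = field
            self._columns.setdefault(attribute, {})[obj_id] = field
        else:
            field.value = value  # one write updates the row and the column

    def row(self, obj_id):
        """All tracked attributes of one object."""
        return {name: f.value for name, f in self._rows.get(obj_id, {}).items()}

    def column(self, attribute):
        """One tracked attribute across all objects."""
        return {obj_id: f.value for obj_id, f in self._columns.get(attribute, {}).items()}


def compute_gini(state):
    # Assumes at least one agent and non-zero total wealth.
    x = sorted(state.column("wealth").values())
    n = len(x)
    b = sum(xi * (n - i) for i, xi in enumerate(x)) / (n * sum(x))
    return 1 + (1 / n) - 2 * b
```

For four agents with wealth `[0, 0, 0, 4]`, `compute_gini(state)` returns `0.75`; setting every agent's wealth to 1 brings it to `0.0`.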

@quaquel
Contributor Author

quaquel commented Feb 6, 2024

There is indeed a family resemblance between Measure and SystemState. Both reflect something of the model at a particular time instant. However, SystemState was conceptualized as a single object that contains everything you want to track. So it's like a collection of Measures. All are contained in a single, rather complex, data structure. Because of this, I am not sure whether we would need/want to group all measures together again in some bigger data structure.

Measure has the potential to be more fine-grained and lightweight. A Measure reflects a single aspect of the overall system state at a time instant. Also, at least in my current thinking, I was defining Measures only on the model itself, and a Measure would pull in the necessary data from the object/group specified as its input.

I don't think Measure can be implemented as a descriptor. Descriptors are defined at the class level, so it would be impossible to specify up front the object/group upon which the measure would operate.
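To make the class-level argument concrete, a sketch of a pull-based `Measure` created as an ordinary instance attribute in the model's `__init__`, where the group it operates on already exists. The constructor signature `(group, attribute, func)` is an assumption loosely following the examples in this thread, not a settled API; `Agent` and `Model` are stand-ins.

```python
class Measure:
    """Hypothetical pull-based Measure bound to a concrete group."""

    def __init__(self, group, attribute, func=None):
        self.group = group  # the specific objects this measure reads from
        self.attribute = attribute
        self.func = func if func is not None else (lambda values: values)

    @property
    def value(self):
        # Pull the current attribute values on every access.
        return self.func([getattr(obj, self.attribute) for obj in self.group])


class Agent:  # stand-in
    def __init__(self, wealth):
        self.wealth = wealth


class Model:  # stand-in
    def __init__(self, wealths):
        self.agents = [Agent(w) for w in wealths]
        # Built per model instance, after self.agents exists. A descriptor is
        # created with the class body, before any particular group exists.
        self.total_wealth = Measure(self.agents, "wealth", sum)
```

Here `Model([1, 2, 3]).total_wealth.value` is `6`, and it follows later changes to the agents because the values are read on demand.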

Also, I am currently inclined to first work out data collection in the absence of any pub/sub, but design it in such a way that it should be easy to keep the API while shifting to pub/sub.

@rht
Contributor

rht commented Feb 10, 2024

There is indeed a family resemblance between Measure and SystemState.

That's a concise term for describing the situation.

All are contained in a single, rather complex, data structure. Because of this, I am not sure whether we would need/want to group all measures together again in some bigger data structure.

I suppose Measure may accept multiple attributes/functions, for the case when the measures can be grouped into one DF.
Based on #1944's code example:

def __init__(self, *args, **kwargs):
    # Instead of 2 separate measures
    # May accept multiple attributes/functions
    # Measure is a proposed API; linear_transform stands for a user-defined function
    self.agents_pos = Measure(self.agents, "x", lambda agents: [linear_transform(agent.pos)[1] for agent in agents])
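A sketch of the multi-attribute idea, assuming a hypothetical `Measure(group, **columns)` signature in which each keyword maps a column name to a per-object function. The result is a dict of equal-length lists, the column layout that both the pandas and Polars `DataFrame` constructors accept; `linear_transform` is left out because its definition isn't given here.

```python
from types import SimpleNamespace


class Measure:
    """Hypothetical Measure with several named columns over one group."""

    def __init__(self, group, **columns):
        self.group = group
        self.columns = columns  # column name -> function(obj) -> value

    @property
    def value(self):
        # Column-oriented: {name: [value for each object in the group]}
        return {name: [func(obj) for obj in self.group] for name, func in self.columns.items()}


# Two related measures grouped into one table instead of two separate Measures.
agents = [SimpleNamespace(pos=(1, 2)), SimpleNamespace(pos=(3, 4))]
agents_pos = Measure(agents, x=lambda a: a.pos[0], y=lambda a: a.pos[1])
```

Grouping the columns this way means one pass over the group per evaluation, and the output can be handed to a DataFrame constructor without reshaping.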

Labels
feature Release notes label

5 participants