Aggregated agent metric in DataCollection, graph in ChartModule #1145

EwoutH · 2022-01-23T21:48:32Z

Aggregated agent variable
Currently it's not possible to quickly get an aggregated metric of an agent variable. This PR adds a method to the DataCollector class called get_agent_metric that returns a single value describing the agent variable, based on a statistic.

By default, it takes the mean of all agents' values for that variable. It always reports the variable at the current time step. The function supports all functions in the statistics module, as well as the built-in min(), max(), sum() and len() functions.

To support this:

statistics is imported
Adds agent_attr_index dictionary, which lists the position of each reporter in the _agent_records dictionary
Adds self.agent_name_index, which can be used to lookup the reporter for each input variable name
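A minimal sketch of the kind of dispatch described above, assuming a standalone helper rather than the PR's actual get_agent_metric method. The names aggregate_values and _BUILTIN_METRICS are hypothetical, and the input is a plain list of per-agent values rather than a DataCollector.

```python
import statistics

# Built-ins that the description lists alongside the statistics module.
_BUILTIN_METRICS = {"min": min, "max": max, "sum": sum, "len": len}


def aggregate_values(values, metric="mean"):
    """Reduce a list of per-agent values to one number (hypothetical helper)."""
    if metric in _BUILTIN_METRICS:
        return _BUILTIN_METRICS[metric](values)
    # Otherwise look for a function of that name in the statistics module.
    func = getattr(statistics, metric, None)
    if not callable(func):
        raise ValueError(f"Unknown metric: {metric!r}")
    return func(values)


# Made-up values, one per agent.
neighbours = [0, 0, 1, 3]
print(aggregate_values(neighbours))            # default: mean
print(aggregate_values(neighbours, "median"))
print(aggregate_values(neighbours, "max"))
```

Looking metrics up by name keeps the call site short, at the cost of accepting any callable name that happens to exist in statistics; an explicit whitelist would be stricter.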

Example
A model called model1 is created, with agents that report a variable called "Neighbours" through an agent_reporter in the datacollector:

        self.datacollector = DataCollector(
            model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
            agent_reporters={"Neighbours": "neighbours"},
        )

The new get_agent_metric() function can now be used to get an aggregate-level statistic of the number of neighbours of the agents:

model1.datacollector.get_agent_metric("Neighbours")
0.8984375
model1.datacollector.get_agent_metric("Neighbours", "min")
0
model1.datacollector.get_agent_metric("Neighbours", "median")
0.0
model1.datacollector.get_agent_metric("Neighbours", "max")
3

Plotting agent variables
The ChartModule is also updated to support displaying agent variables. If it can't find a variable in the model variables, it checks if it is present in the agent variables, and if so, adds it to the chart.
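The fallback described above could look roughly like the following. This is a sketch under assumptions, not the ChartModule code from this PR: latest_chart_value is a hypothetical helper, the two dictionaries are simplified stand-ins for the collector's storage, and the mean is used as the reducer because it is the default metric mentioned earlier.

```python
import statistics


def latest_chart_value(label, model_vars, agent_values, reducer=statistics.mean):
    """Return the newest value to plot for `label` (hypothetical helper).

    model_vars:   {name: [value_at_step_0, value_at_step_1, ...]}
    agent_values: {name: [value_per_agent, ...]} for the current step
    """
    if label in model_vars:
        # Model-level variables already hold one value per step.
        return model_vars[label][-1]
    if label in agent_values:
        # Agent-level variables are reduced to a single number first.
        return reducer(agent_values[label])
    raise KeyError(f"{label!r} is neither a model nor an agent variable")


# Made-up data for illustration only.
model_vars = {"Agents": [190, 187, 192]}
agent_values = {"Neighbours": [0, 1, 3, 0]}
print(latest_chart_value("Agents", model_vars, agent_values))
print(latest_chart_value("Neighbours", model_vars, agent_values))
```

Checking model variables first means a model reporter wins if both kinds share a label.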

Example
In a Game of Life model I built, the DataCollector looks like this:

        self.datacollector = DataCollector(
            model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
            agent_reporters={"Neighbours": "neighbours"},
        )

The server contains two charts, one with the Agents, which is a model variable, and one with Neighbours, an agent variable.

chart1 = ChartModule([{"Label": "Agents", "Color": "Black"}], data_collector_name="datacollector")
chart2 = ChartModule([{"Label": "Neighbours", "Color": "Black"}], data_collector_name="datacollector")

server = ModularServer(LifeModel, [grid, chart1, chart2], "Game of Life", {"p": 0.12, "width": 40, "height": 40})

On the main branch, only the first chart is displayed correctly. On this branch, both are.

@tpike3, @rht and others, I would love your feedback on this PR! Please consider performance, the naming of variables and functions and API stability. Also please let me know if (and where) tests and documentation should be added.

- Implements getting a single aggregated value from an agent variable
- Allows the ChartModule to plot agent-level variables

codecov · 2022-01-23T21:49:35Z

Codecov Report

Merging #1145 (51df4fe) into main (4a79705) will decrease coverage by 1.17%.
The diff coverage is 25.00%.

@@            Coverage Diff             @@
##             main    #1145      +/-   ##
==========================================
- Coverage   89.30%   88.13%   -1.18%     
==========================================
  Files          19       19              
  Lines        1253     1289      +36     
  Branches      256      259       +3     
==========================================
+ Hits         1119     1136      +17     
- Misses         98      116      +18     
- Partials       36       37       +1

Impacted Files	Coverage Δ
mesa/visualization/modules/ChartVisualization.py	`70.96% <20.00%> (-20.70%)`	⬇️
mesa/datacollection.py	`88.11% <28.57%> (-9.59%)`	⬇️
mesa/space.py	`94.91% <0.00%> (-1.00%)`	⬇️
mesa/batchrunner.py	`92.28% <0.00%> (+0.72%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

rht · 2022-01-24T08:14:10Z

mesa/datacollection.py

@@ -100,6 +101,8 @@ class attributes of model

        self.model_vars = {}
        self._agent_records = {}
+        self.agent_attr_index = {}


This needs to be self.agent_attr_indexes, to make it consistent with existing dict attribute names, e.g. self.model_vars, etc.

I thought of it as an index between reporter names and their position in the _agent_records. So a single index.

Maybe agent_records_index might be even clearer.

In that case, the usage of the term "an index" as a collection of the mapping is inconsistent with its later use:

# Get the index of the reporter
attr_index = self.agent_attr_index[reporter]

Where this is a single index.

Do you have a suggestion for a better name?

EwoutH · 2022-01-30T08:09:25Z

@tpike3 Do you have time to review this PR?

@rht Any more comments?

rht · 2022-01-30T08:33:55Z

self.agent_name_index is redundant with self.agent_reporters.
I find the extra agent_attr_index construct to be unnecessary. You can define aggregate measure of agent variables within the framework of the existing model-level data collection.


EwoutH · 2022-01-31T17:30:58Z

self.agent_name_index is redundant with self.agent_reporters.

Good catch, can't believe I missed that. I already found it weird that there wasn't such a dictionary, and it turns out there was one. I fixed it in 0d1aeef.

I find the extra agent_attr_index construct to be unnecessary. You can define aggregate measure of agent variables within the framework of the existing model-level data collection.

The dictionary is created to keep track of which metric is stored where in the list of _agent_records. Unfortunately, that information isn't easily viewable for the user. Of course you can dig deep into the code to find out what each number in [1, 2, 4.5, 6.4, 3.8] stands for, but that should be easier, or handled by the back-end, as this approach does.

Anyways I don't think it makes a big performance impact and it does make the code a bit more resilient if agent_reporters are defined in a weird way.

But if you suggest another implementation, I'm open to incorporating it!
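The name-to-position lookup discussed above can be sketched as follows. The record layout (step, agent id, then one value per reporter) and the reporter names are assumptions made for illustration; the numbers are the ones quoted in the comment.

```python
# Hypothetical reporter names, in the order they were registered.
reporter_names = ["Neighbours", "Wealth", "Age"]

# Assumed record layout: (step, agent_id, value_1, value_2, ...).
record = (1, 2, 4.5, 6.4, 3.8)

# Position of each reporter inside a record; the two leading slots are skipped.
OFFSET = 2
attr_index = {name: OFFSET + i for i, name in enumerate(reporter_names)}


def value_of(record, name):
    """Look up one reporter's value in a record by name."""
    return record[attr_index[name]]


print(attr_index)
print(value_of(record, "Wealth"))
```

With such a mapping the user never needs to know the tuple layout; only the reporter name is required.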

rht · 2022-02-01T00:51:16Z

I wasn't referring to _agent_records. I was referring to storing the values in model_vars. The aggregated agent metric is a model-wide measure, a summary of individual agent properties.

tpike3 · 2022-02-03T10:28:47Z

@ewout, in general I think this is a good idea, but it is hard to decide how to implement it.

1st, an unhelpful philosophical rabbit hole:
This is pretty profound. To put it in my own terms: what is the right setup to optimize user ease as the general population becomes more technically literate? This is a constant issue for me right now, and it speaks to @EwoutH's point: what setup allows users to easily and intuitively see key parts of the model?

2nd, some thoughts that are hopefully helpful:
@EwoutH, I didn't get a chance to play with it and really understand it, but to @rht's point, can you describe the difference between model_vars and agent_attr_index? Couldn't you just put the metrics in model_vars, or even in the dataframe (which has some interesting ease of use and cost dynamics)?

To a specific question, the testing would go in the data_collector

Hope this helps.


EwoutH · 2022-02-07T11:35:47Z

Looked a bit more into it. They are indeed identical, but currently model_vars is used for model reporter variables, and agent_attr_index for agent reporter variables. I think it's better to keep them separate, just for the case in which there is a model variable and agent variable with the same name that are both collected.

I renamed it to agent_vars however, to make their similarity more clear.

So with my current skill set, I think this is the best implementation I can do. Then the question is: is this good enough in terms of performance and maintainability? If so, I can add tests and update the docs further.

If not, @rht would you be open to re-implementing this functionality from a clean sheet?

rht · 2022-02-07T14:24:19Z

Any aggregate metric, by definition, is a model-level variable. The examples you showed in #1145 (comment) can be put in model_vars.
Avoiding data collection key name collision is a separate problem. If anything, calling it agent_vars is misleading, because once again, those are model-level vars.

You should stick to the existing API whenever possible. Adding more machinery will cause the library to be more complex and harder to learn.

I would do something like this:

def get_neighbors_min(model):
    neighbors = model.datacollector.get_last_agent_report("Neighbors")
    return min(neighbors)

# Later on, when initializing the DataCollector
model_reporters={"neighbors_min": get_neighbors_min, ...}

This way, the user is the one responsible for naming the model-level var, and there is no key collision at all.
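To make the suggestion concrete without depending on Mesa, here is a self-contained sketch. TinyCollector, Agent and Model are stand-ins written for this example, and get_last_agent_report is implemented here as a hypothetical helper; none of this is the library's own code. Collecting agent data before model reporters is a choice of this sketch.

```python
class Agent:
    def __init__(self, unique_id, neighbors):
        self.unique_id = unique_id
        self.neighbors = neighbors


class TinyCollector:
    """Stand-in collector: agent reporters are attribute names,
    model reporters are callables that take the model."""

    def __init__(self, model_reporters, agent_reporters):
        self.model_reporters = model_reporters
        self.agent_reporters = agent_reporters
        self.model_vars = {name: [] for name in model_reporters}
        self.agent_records = []  # one {reporter: [values]} dict per step

    def collect(self, model):
        # Agent-level data is stored first so model reporters can read it.
        self.agent_records.append(
            {name: [getattr(agent, attr) for agent in model.agents]
             for name, attr in self.agent_reporters.items()}
        )
        for name, reporter in self.model_reporters.items():
            self.model_vars[name].append(reporter(model))

    def get_last_agent_report(self, name):
        # Hypothetical helper: the most recent per-agent values for one reporter.
        return self.agent_records[-1][name]


def get_neighbors_min(model):
    return min(model.datacollector.get_last_agent_report("Neighbors"))


class Model:
    def __init__(self):
        # Made-up neighbour counts for three agents.
        self.agents = [Agent(i, n) for i, n in enumerate([2, 0, 3])]
        self.datacollector = TinyCollector(
            model_reporters={"neighbors_min": get_neighbors_min},
            agent_reporters={"Neighbors": "neighbors"},
        )


model = Model()
model.datacollector.collect(model)
print(model.datacollector.model_vars)     # the aggregate, one value per step
print(model.datacollector.agent_records)  # the per-agent data is kept as well
```

The aggregate lives under a user-chosen key in model_vars, while the per-agent values stay in the agent records, so the two names cannot collide.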

EwoutH · 2022-02-10T09:04:37Z

Thanks for your comment, I now understand your issue.

The current architecture is as follows:

1. DataCollector collects a variable from all agents each timestep, keeping all values.
2. get_agent_metric aggregates the values from all agents into a single value.
3. ChartVisualization plots this single value each timestep.

What you are suggesting is merging steps 1 and 2, if I understand correctly. While this has the advantage that it can simplify the code and reduce the amount of information stored, it does throw away a lot of data that could be analysed afterwards.

rht · 2022-02-10T13:02:31Z

No data are thrown away. See my example. I took the agent-level vars from an existing, separately-defined agent reporter.

EwoutH · 2022-10-07T22:34:53Z

@rht @tpike3 @jackiekazil Maybe we could give this PR/idea another spin. I think the main questions are:

Do we want aggregated agent metrics (for plotting) in Mesa?
If so, what would a clean implementation look like, one that (preferably) doesn't break backwards compatibility?

wang-boyu · 2022-10-29T16:11:48Z

Following our discussion in the dev meeting earlier today, you might need to consider the case where different types of agents may have different attributes. As discussed, the implemented interface could be used as a default where all types of agents are assumed to share a common attribute.
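One way to handle agent types with different attributes is to aggregate only over the agents that define the attribute. The sketch below is an illustration under assumptions: Sheep, Grass and metric_over_agents are hypothetical names, and returning None for an attribute nobody has is just one possible policy.

```python
import statistics


class Sheep:
    def __init__(self, energy):
        self.energy = energy


class Grass:
    """An agent type without an `energy` attribute."""


def metric_over_agents(agents, attr_name, reducer=statistics.mean):
    """Aggregate `attr_name` over the agents that define it (hypothetical helper)."""
    values = [getattr(agent, attr_name) for agent in agents if hasattr(agent, attr_name)]
    # Return None rather than raising when no agent has the attribute.
    return reducer(values) if values else None


# Made-up mixed population.
agents = [Sheep(4), Grass(), Sheep(8)]
print(metric_over_agents(agents, "energy"))
print(metric_over_agents(agents, "height"))
```

Treating the shared-attribute case as the default, as proposed above, corresponds to calling this on a population where every agent defines the attribute.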

wang-boyu · 2022-10-29T16:46:54Z

Another major concern that I had is how to differentiate agent_reporters from model_reporters? How do we tell the users when to use agent_reporters vs. when to use the other?

Without this PR what I'll do would probably look very much like what was mentioned in #1145 (comment).

As an alternative yet similar example:

def get_min_neighbors(model):
    return min([getattr(agent, "neighbors") for agent in model.schedule.agents])  # or model.grid.agents or other similar places

and use this in model_reporters.

The question is, is this generic enough to be provided as an API to the users? For instance we can have:

def get_agent_metric(model, attr_name, metric="mean"):
    values = [getattr(agent, attr_name) for agent in model.schedule.agents]  
    if metric in ["min", "max", "sum", "len"]:
        # similar to what was implemented in this PR
        result = ...
    else:
        result = ...
    return result

so that the users can do something like:

from functools import partial

model_reporters={
    "neighbors_min": partial(get_agent_metric, attr_name="neighbors", metric="min"),
    ...
}

Personally I don't think this is really needed, since the users can fairly easily define their own functions.

How about the agent_reporters like in this PR, i.e.

self.data_collector = DataCollector(
    model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
    agent_reporters={"Neighbours": "neighbours"},
)

vs.

self.data_collector = DataCollector(
    model_reporters={
        "Agents": lambda m: m.schedule.get_agent_count(),
        "Neighbours": get_min_neighbors,
    }
)

Again I don't really see the need to introduce agent_reporters here.

wang-boyu · 2022-10-29T16:53:31Z

On second thought, it might be useful when the users need to define lots of similar functions, such as:

self.data_collector = DataCollector(
    model_reporters={
        "Agents": lambda m: m.schedule.get_agent_count(),
        "Min Neighbours": get_min_neighbors,
        "Mean Neighbours": get_mean_neighbors,
        "Max Neighbours": get_max_neighbors,
    }
)

In this case it could be easier for the users to have a common interface such as agent_reporters or the get_agent_metric function mentioned previously, so that they don't have to rewrite lots of short functions. Sorry that I missed this point which was mentioned in the PR.
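A sketch of how a shared helper could replace the three near-identical functions. It assumes a variant of get_agent_metric that takes the reducer function directly instead of a metric name, and it reads agents from a hypothetical model.agents attribute rather than model.schedule.agents, so that it runs without Mesa.

```python
import statistics
from functools import partial
from types import SimpleNamespace


def get_agent_metric(model, attr_name, reducer):
    """Apply `reducer` to one attribute of every agent (hypothetical helper)."""
    return reducer([getattr(agent, attr_name) for agent in model.agents])


REDUCERS = {"Min": min, "Mean": statistics.mean, "Max": max}

# One model reporter per statistic, generated instead of hand-written.
model_reporters = {
    f"{label} Neighbours": partial(get_agent_metric, attr_name="neighbours", reducer=func)
    for label, func in REDUCERS.items()
}

# Stand-in model with made-up data: any object with an `agents` iterable works here.
model = SimpleNamespace(agents=[SimpleNamespace(neighbours=n) for n in (0, 1, 5)])
print({name: reporter(model) for name, reporter in model_reporters.items()})
```

Each generated entry is an ordinary callable taking the model, so it fits the existing model_reporters interface unchanged.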

EwoutH · 2023-10-24T11:05:37Z

On second thought, it might be useful when the users need to define lots of similar functions, such as:

self.data_collector = DataCollector(
    model_reporters={
        "Agents": lambda m: m.schedule.get_agent_count(),
        "Min Neighbours": get_min_neighbors,
        "Mean Neighbours": get_mean_neighbors,
        "Max Neighbours": get_max_neighbors,
    }
)

This is exactly what I see students (and myself, sometimes) do all the time. The main use case of this feature I see, is that you want to collect all the agent data for later proper statistical analysis, but you also want some quick values for eye-ball validation and visualisation.

If I want to do that with the current datacollector possibilities, I have to define both agent and model reporters, or write custom code to transform the agent data to the thing I want.

Also, I think that there should be a really easy way to plot a general statistic, like the mean of an agent variable, with real-time visualisation. Heck, NetLogo has done this for 20+ years.

Maybe some of the Solara stuff leapfrogs this, but those use cases should be included in my opinion:

Get quick aggregate values
Plot aggregate metrics in real time

(both while still collecting full agent data for proper analysis)

Aggregated agent metric in DataCollection, graph in ChartModule

5c3088a

- Implements getting a single aggregated value from an agent variable
- Allows the ChartModule to plot agent-level variables

EwoutH mentioned this pull request Jan 23, 2022

PoC: Multiple-agent types scheduling and datacollection #1142

Closed

9 tasks

rht reviewed Jan 24, 2022


Remove duplicate agent_name_index dictionary

0d1aeef

Thanks for spotting @rht!

datacollection: Rename agent_attr_index to agent_vars

51df4fe

Note that if model_vars and agent_vars would be the same variable, a datacollector with an agent_reporter and model_reporter with an identical variable name would not function correctly.

tpike3 modified the milestones: Major Change Efforts, v1.2.0 Taylor Oct 31, 2022

tpike3 modified the milestones: v1.2.0 Taylor, Mesa 2.0 Dec 8, 2022

tpike3 modified the milestones: Mesa 2.0, Major Change Efforts Jun 18, 2023

EwoutH mentioned this pull request Dec 9, 2023

Support plotting agent-level variables Corvince/mesa-interactive#9

Open

rht mentioned this pull request Jan 28, 2024

feat: Implement experimental DataCollector API #2013

Closed
