Aggregated agent metric in DataCollection, graph in ChartModule #1145

Open
wants to merge 3 commits into
base: main
from

Conversation

EwoutH
Contributor

@EwoutH EwoutH commented Jan 23, 2022

Aggregated agent variable
Currently there is no quick way to get an aggregated metric of an agent variable. This PR adds a method to the DataCollector class, get_agent_metric, that returns a single value describing the agent variable based on a statistic.

By default, it takes the mean of all agents' values for that variable. It always reports the variable at the current time step. The method supports all functions in the statistics module, as well as the built-in min(), max(), sum() and len() functions.

To support this:

  • Imports the statistics module
  • Adds the agent_attr_index dictionary, which lists the position of each reporter in the _agent_records dictionary
  • Adds self.agent_name_index, which can be used to look up the reporter for each input variable name
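
As a rough illustration of the dispatch described above (a metric name resolved either to a built-in or to a function in the statistics module), here is a minimal sketch. The helper name `aggregate` and the constant `BUILTIN_METRICS` are hypothetical and are not taken from the PR's code; the actual method on DataCollector may be structured differently.

```python
import builtins
import statistics

# Built-ins that the PR description lists as supported next to the statistics module.
BUILTIN_METRICS = ("min", "max", "sum", "len")


def aggregate(values, metric="mean"):
    """Reduce a list of agent values to a single number (hypothetical helper)."""
    if metric in BUILTIN_METRICS:
        func = getattr(builtins, metric)
    else:
        # Any function of the statistics module, e.g. "mean" or "median".
        func = getattr(statistics, metric)
    return func(values)


neighbours = [0, 0, 1, 3, 0, 2]
print(aggregate(neighbours))            # default metric: mean
print(aggregate(neighbours, "median"))  # 0.5
print(aggregate(neighbours, "max"))     # 3
```

Looking the name up with `getattr` keeps the helper short, but an unknown metric name raises AttributeError, so a real implementation would probably want a clearer error message.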

Example
A model called model1 is created, whose agents have an agent reporter called "Neighbours" in the datacollector:

        self.datacollector = DataCollector(
            model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
            agent_reporters={"Neighbours": "neighbours"},
        )

The new get_agent_metric() method can now be used to get an aggregate-level statistic of the agents' number of neighbours:

>>> model1.datacollector.get_agent_metric("Neighbours")
0.8984375
>>> model1.datacollector.get_agent_metric("Neighbours", "min")
0
>>> model1.datacollector.get_agent_metric("Neighbours", "median")
0.0
>>> model1.datacollector.get_agent_metric("Neighbours", "max")
3

Plotting agent variables
The ChartModule is also updated to support displaying agent variables. If it can't find a variable in the model variables, it checks if it is present in the agent variables, and if so, adds it to the chart.
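
The fallback lookup described above could look roughly like the sketch below. It uses plain dictionaries instead of Mesa's classes; the function `latest_chart_value` and the two dictionary layouts are assumptions made for illustration, not the ChartModule code from this PR.

```python
import statistics


def latest_chart_value(label, model_vars, agent_values):
    """Return the newest value to plot for `label` (hypothetical helper).

    model_vars:   {name: [one value per step]}
    agent_values: {name: [one value per agent in the current step]}
    """
    if label in model_vars:
        # Model-level series: plot the most recent entry as-is.
        return model_vars[label][-1]
    if label in agent_values:
        # Agent-level variable: reduce it to a single number first.
        return statistics.mean(agent_values[label])
    raise KeyError(f"{label!r} is neither a model nor an agent variable")


model_vars = {"Agents": [200, 198, 195]}
agent_values = {"Neighbours": [0, 1, 3, 0]}

print(latest_chart_value("Agents", model_vars, agent_values))
print(latest_chart_value("Neighbours", model_vars, agent_values))
```

Checking the model variables first means a model reporter wins if both kinds of reporter share a label.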

Example
In a Game of Life model I built, the DataCollector looks like this:

        self.datacollector = DataCollector(
            model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
            agent_reporters={"Neighbours": "neighbours"},
        )

The server contains two charts, one with the Agents, which is a model variable, and one with Neighbours, an agent variable.

chart1 = ChartModule([{"Label": "Agents", "Color": "Black"}], data_collector_name="datacollector")
chart2 = ChartModule([{"Label": "Neighbours", "Color": "Black"}], data_collector_name="datacollector")

server = ModularServer(LifeModel, [grid, chart1, chart2], "Game of Life", {"p": 0.12, "width": 40, "height": 40})

On the main branch, only the first chart is displayed correctly. On this branch, both are.

Screenshot_692


@tpike3, @rht and others, I would love your feedback on this PR! Please consider performance, the naming of variables and functions and API stability. Also please let me know if (and where) tests and documentation should be added.

 - Implements getting a single aggregated value from an agent variable
 - Allows the ChartModule to plot agent-level variables
@codecov

codecov bot commented Jan 23, 2022

Codecov Report

Merging #1145 (51df4fe) into main (4a79705) will decrease coverage by 1.17%.
The diff coverage is 25.00%.


@@            Coverage Diff             @@
##             main    #1145      +/-   ##
==========================================
- Coverage   89.30%   88.13%   -1.18%     
==========================================
  Files          19       19              
  Lines        1253     1289      +36     
  Branches      256      259       +3     
==========================================
+ Hits         1119     1136      +17     
- Misses         98      116      +18     
- Partials       36       37       +1     
| Impacted Files | Coverage Δ |
|---|---|
| mesa/visualization/modules/ChartVisualization.py | 70.96% <20.00%> (-20.70%) ⬇️ |
| mesa/datacollection.py | 88.11% <28.57%> (-9.59%) ⬇️ |
| mesa/space.py | 94.91% <0.00%> (-1.00%) ⬇️ |
| mesa/batchrunner.py | 92.28% <0.00%> (+0.72%) ⬆️ |


@@ -100,6 +101,8 @@ class attributes of model

self.model_vars = {}
self._agent_records = {}
self.agent_attr_index = {}
Contributor

This needs to be self.agent_attr_indexes, to make it consistent with existing dict attribute names, e.g. self.model_vars, etc.

Contributor Author

I thought of it as an index between reporter names and their position in the _agent_records. So a single index.

Maybe agent_records_index might be even clearer.

Contributor

In that case, the usage of the term "an index" as a collection of the mapping is inconsistent with its later use:

        # Get the index of the reporter
        attr_index = self.agent_attr_index[reporter]

Where this is a single index.

Contributor Author

Do you have a suggestion for a better name?

@EwoutH
Contributor Author

EwoutH commented Jan 30, 2022

@tpike3 Do you have time to review this PR?

@rht Any more comments?

@rht
Contributor

rht commented Jan 30, 2022

  • self.agent_name_index is redundant with self.agent_reporters.
  • I find the extra agent_attr_index construct to be unnecessary. You can define aggregate measure of agent variables within the framework of the existing model-level data collection.

@EwoutH
Contributor Author

EwoutH commented Jan 31, 2022

  • self.agent_name_index is redundant with self.agent_reporters.

Good catch, can't believe I missed that. I already found it weird that there wasn't such a dictionary, but there was. I fixed it in 0d1aeef.

  • I find the extra agent_attr_index construct to be unnecessary. You can define aggregate measure of agent variables within the framework of the existing model-level data collection.

The dictionary keeps track of which metric is collected where in the list of _agent_records. Unfortunately, that information isn't easily visible to the user. Of course you can dig deep into the code to find out what each number in [1, 2, 4.5, 6.4, 3.8] stands for, but that should be easier, or handled by the back end, as this approach does.

Anyway, I don't think it has a big performance impact, and it does make the code a bit more resilient if agent_reporters are defined in a weird way.

But if you suggest another implementation, I'm open to incorporating it!
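
To make the purpose of such an index concrete, here is a small sketch of a name-to-position mapping over raw records. The record layout (step, agent id, then one value per reporter in definition order), the `OFFSET` constant, and the names `reporter_index` and `column` are assumptions for illustration only; they are not claimed to match the internals of `_agent_records`.

```python
# Reporter names mapped to agent attribute names, as in agent_reporters.
agent_reporters = {"Wealth": "wealth", "Neighbours": "neighbours"}

# Assumed record layout: (step, agent_id, value for each reporter in order).
OFFSET = 2
reporter_index = {name: OFFSET + i for i, name in enumerate(agent_reporters)}

# Three agents' records for step 1 under the assumed layout.
records = [
    (1, 0, 4.5, 2),
    (1, 1, 6.4, 0),
    (1, 2, 3.8, 3),
]


def column(records, reporter):
    """Pull one reporter's values out of the raw records by name."""
    idx = reporter_index[reporter]
    return [rec[idx] for rec in records]


print(reporter_index)
print(column(records, "Neighbours"))
```

With the mapping in place, callers refer to a reporter by name and never need to know at which position its value is stored.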

@rht
Contributor

rht commented Feb 1, 2022

I wasn't referring to _agent_records. I was referring to storing the values in model_vars. The aggregated agent metric is a model-wide measure, a summary of individual agent properties.

@tpike3
Contributor

tpike3 commented Feb 3, 2022

@ewout, in general I think this is a good idea, but it is hard to implement.

First, an unhelpful philosophical rabbit hole:
This is pretty profound. To put it in my own terms: what is the right setup to optimize user ease as the general population becomes more technically literate? This is a constant issue for me right now, and it speaks to @EwoutH's point: what setup allows users to easily and intuitively see key parts of the model?

Second, some thoughts that are hopefully helpful:
@EwoutH, I didn't get a chance to play with it and really understand it. But, to @rht's point, can you describe the difference between model_vars and agent_attr_index? Couldn't you just put the metrics in model_vars, or even the dataframe (which has some interesting ease-of-use and cost dynamics)?

To a specific question, the testing would go in the data_collector

Hope this helps.

Note that if model_vars and agent_vars were the same variable, a datacollector with an agent_reporter and a model_reporter with identical variable names would not function correctly.
@EwoutH
Contributor Author

EwoutH commented Feb 7, 2022

Looked a bit more into it. They are indeed identical, but currently model_vars is used for model reporter variables, and agent_attr_index for agent reporter variables. I think it's better to keep them separate, just for the case in which there is a model variable and agent variable with the same name that are both collected.

I renamed it to agent_vars however, to make their similarity more clear.

So with my current skill set, I think this is the best implementation I can do. The question then is: is this good enough in terms of performance and maintainability? If so, I can add tests and update the docs further.

If not, @rht would you be open to re-implementing this functionality from a clean sheet?

@rht
Contributor

rht commented Feb 7, 2022

Any aggregate metric, by definition, is a model-level variable. The examples you showed in #1145 (comment) can be put in model_vars.
Avoiding data collection key name collisions is a separate problem. If anything, calling it agent_vars is misleading, because once again, those are model-level vars.

You should stick to the existing API whenever possible. Adding more machinery will cause the library to be more complex and harder to learn.

I would do something like this:

def get_neighbors_min(model):
    neighbors = model.datacollector.get_last_agent_report("Neighbors")
    return min(neighbors)

# Later on, in the model reporter initialization
model_reporters={"neighbors_min": get_neighbors_min, ...}

This way, the user is the one responsible for naming the model-level var, and there is no key collision at all.

@EwoutH
Contributor Author

EwoutH commented Feb 10, 2022

Thanks for your comment, I now understand your issue.

The current architecture is as follows:

  1. DataCollector collects a variable from all agents each timestep, keeping all values.
  2. get_agent_metric aggregates the values from all agents into a single value.
  3. ChartVisualization plots this single value each timestep.

What you are suggesting is merging steps 1 and 2, if I understand correctly. While this has the advantage that it can simplify the code and reduce the amount of information stored, it does throw away a lot of data that could be analysed afterwards.

@rht
Contributor

rht commented Feb 10, 2022

No data are thrown away. See my example. I took the agent-level vars from an existing, separately-defined agent reporter.
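
The point that a summary derived from an existing agent reporter keeps the raw data can be shown with a toy collector. `TinyCollector` below is a hypothetical stand-in written for this sketch and is not Mesa's DataCollector; it only demonstrates that the per-agent values and the derived model-level value can be stored side by side.

```python
from types import SimpleNamespace


class TinyCollector:
    """Toy stand-in for a data collector (not Mesa's DataCollector)."""

    def __init__(self, agent_attr, summary):
        self.agent_attr = agent_attr
        self.summary = summary      # e.g. min, max, statistics.mean
        self.agent_records = []     # one list of per-agent values per step
        self.model_vars = []        # one summary value per step

    def collect(self, agents):
        values = [getattr(a, self.agent_attr) for a in agents]
        self.agent_records.append(values)             # full agent-level data is kept
        self.model_vars.append(self.summary(values))  # derived model-level value


agents = [SimpleNamespace(neighbours=n) for n in (0, 2, 1)]
collector = TinyCollector("neighbours", summary=min)

collector.collect(agents)      # step 1
agents[0].neighbours = 3
collector.collect(agents)      # step 2

print(collector.agent_records)
print(collector.model_vars)
```

Both series remain available afterwards: the per-agent lists for later analysis and the per-step summary for plotting.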

@EwoutH
Contributor Author

EwoutH commented Oct 7, 2022

@rht @tpike3 @jackiekazil Maybe we could give this PR/idea another spin. I think the main questions are:

  1. Do we want aggregated agent metrics (for plotting) in Mesa?
  2. If so, what would a clean implementation look like that (preferably) doesn’t break backwards compatibility?

@wang-boyu
Member

Following our discussion in the dev meeting earlier today, you might need to consider the case where different types of agents may have different attributes. As discussed, the implemented interface could be used as a default where all types of agents are assumed to share a common attribute.
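
One possible way to handle agent types with different attributes is to aggregate only over the agents that define the requested attribute. The sketch below assumes that skipping agents without the attribute is acceptable; the helper `values_for` and the example agents are hypothetical.

```python
import statistics
from types import SimpleNamespace


def values_for(agents, attr_name):
    """Collect attr_name from the agents that define it, skipping the rest."""
    return [getattr(a, attr_name) for a in agents if hasattr(a, attr_name)]


# Two agent "types": sheep have energy, grass patches do not.
agents = [
    SimpleNamespace(kind="sheep", energy=4),
    SimpleNamespace(kind="sheep", energy=8),
    SimpleNamespace(kind="grass", grown=True),
]

energies = values_for(agents, "energy")
print(energies)
print(statistics.mean(energies))
```

If no agent defines the attribute, the list is empty and `statistics.mean` raises StatisticsError, so a real implementation would need to decide what to report in that case.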

@wang-boyu
Member

Another major concern that I had is how to differentiate agent_reporters from model_reporters? How do we tell the users when to use agent_reporters vs. when to use the other?

Without this PR what I'll do would probably look very much like what was mentioned in #1145 (comment).

As an alternative yet similar example:

def get_min_neighbors(model):
    return min([getattr(agent, "neighbors") for agent in model.schedule.agents])  # or model.grid.agents or other similar places

and use this in model_reporters.

The question is, is this generic enough to be provided as an API to the users? For instance we can have:

def get_agent_metric(model, attr_name, metric="mean"):
    values = [getattr(agent, attr_name) for agent in model.schedule.agents]  
    if metric in ["min", "max", "sum", "len"]:
        # similar to what was implemented in this PR
        result = ...
    else:
        result = ...
    return result

so that the users can do something like:

from functools import partial

model_reporters={
    "neighbors_min": partial(get_agent_metric, attr_name="neighbors", metric="min"),
    ...
}
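
For reference, the two `result = ...` placeholders in the function above could be filled in along these lines. This is only one possible completion: the explicit `METRICS` table is a design choice made for this sketch, and the `model` object is a stand-in that provides just `schedule.agents`, not a real Mesa model.

```python
import statistics
from functools import partial
from types import SimpleNamespace

# Explicit whitelist of supported metric names (an assumption of this sketch).
METRICS = {
    "min": min,
    "max": max,
    "sum": sum,
    "len": len,
    "mean": statistics.mean,
    "median": statistics.median,
}


def get_agent_metric(model, attr_name, metric="mean"):
    """Aggregate one attribute over all agents of the model."""
    values = [getattr(agent, attr_name) for agent in model.schedule.agents]
    try:
        func = METRICS[metric]
    except KeyError:
        raise ValueError(f"unknown metric {metric!r}") from None
    return func(values)


# Stand-in for a Mesa model: only `schedule.agents` is needed here.
model = SimpleNamespace(
    schedule=SimpleNamespace(
        agents=[SimpleNamespace(neighbors=n) for n in (2, 0, 3, 1)]
    )
)

model_reporters = {
    "neighbors_min": partial(get_agent_metric, attr_name="neighbors", metric="min"),
    "neighbors_mean": partial(get_agent_metric, attr_name="neighbors"),
}

# Call each reporter the way a collector would: with the model as argument.
results = {name: reporter(model) for name, reporter in model_reporters.items()}
print(results)
```

An explicit table restricts the metric names to a known set and gives a clear error for typos, at the cost of having to list every supported function.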

Personally I don't think this is really needed, since the users can fairly easily define their own functions.

How about the agent_reporters like in this PR, i.e.

self.data_collector = DataCollector(
    model_reporters={"Agents": lambda m: m.schedule.get_agent_count()},
    agent_reporters={"Neighbours": "neighbours"},
)

vs.

self.data_collector = DataCollector(
    model_reporters={
        "Agents": lambda m: m.schedule.get_agent_count(),
        "Neighbours": get_min_neighbors,
    }
)

Again I don't really see the need to introduce agent_reporters here.

@wang-boyu
Member

On second thought, it might be useful when users need to define lots of similar functions, such as:

self.data_collector = DataCollector(
    model_reporters={
        "Agents": lambda m: m.schedule.get_agent_count(),
        "Min Neighbours": get_min_neighbors,
        "Mean Neighbours": get_mean_neighbors,
        "Max Neighbours": get_max_neighbors,
    }
)

In this case it could be easier for the users to have a common interface such as agent_reporters or the get_agent_metric function mentioned previously, so that they don't have to rewrite lots of short functions. Sorry that I missed this point which was mentioned in the PR.
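
Short of a new API, the repetition can also be avoided by generating the reporter functions. The sketch below is one way to do that; `make_reporters` is a hypothetical helper, and the `model` object is a stand-in exposing only `schedule.agents`.

```python
import statistics
from types import SimpleNamespace


def make_reporters(attr_name, label, metrics):
    """Build {"<Metric> <label>": reporter} entries for model_reporters."""

    def make_one(func):
        # Bind func here; a bare lambda in the comprehension would late-bind
        # and every reporter would end up using the last function.
        return lambda model: func(
            [getattr(a, attr_name) for a in model.schedule.agents]
        )

    return {
        f"{name.capitalize()} {label}": make_one(func)
        for name, func in metrics.items()
    }


reporters = make_reporters(
    "neighbours",
    "Neighbours",
    {"min": min, "mean": statistics.mean, "max": max},
)

# Stand-in for a Mesa model: only `schedule.agents` is needed here.
model = SimpleNamespace(
    schedule=SimpleNamespace(
        agents=[SimpleNamespace(neighbours=n) for n in (1, 2, 6)]
    )
)

for name, reporter in reporters.items():
    print(name, reporter(model))
```

The resulting dictionary can be merged into a larger `model_reporters` dictionary with `{"Agents": ..., **reporters}`.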

@tpike3 tpike3 modified the milestones: v1.2.0 Taylor, Mesa 2.0 Dec 8, 2022
@tpike3 tpike3 modified the milestones: Mesa 2.0, Major Change Efforts Jun 18, 2023
@EwoutH
Contributor Author

EwoutH commented Oct 24, 2023

On second thought, it might be useful when users need to define lots of similar functions, such as:

self.data_collector = DataCollector(
    model_reporters={
        "Agents": lambda m: m.schedule.get_agent_count(),
        "Min Neighbours": get_min_neighbors,
        "Mean Neighbours": get_mean_neighbors,
        "Max Neighbours": get_max_neighbors,
    }
)

This is exactly what I see students (and myself, sometimes) do all the time. The main use case I see for this feature is that you want to collect all the agent data for proper statistical analysis later, but you also want some quick values for eyeball validation and visualisation.

If I want to do that with the current datacollector possibilities, I have to define both agent and model reporters, or write custom code to transform the agent data to the thing I want.

Also, I think there should be a really easy way to plot a general statistic, like the mean of an agent variable, in real-time visualisation. Heck, NetLogo has done this for 20+ years.

Maybe some of the Solara stuff leapfrogs this, but those use cases should be included in my opinion:

  • Get quick aggregate values
  • Plot aggregate metrics in real time

(both while still collecting full agent data for proper analysis)

4 participants