Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a Sparse data structure for (Multi)Grid #2107

Open
lucid-jake opened this issue Apr 8, 2024 · 0 comments
Open

Use a Sparse data structure for (Multi)Grid #2107

lucid-jake opened this issue Apr 8, 2024 · 0 comments

Comments

@lucid-jake
Copy link

lucid-jake commented Apr 8, 2024

What's the problem this feature will solve?
Incredible speed-up of grid-initialisation. An empty grid created with say 10,000 x 10,000 cells, takes up many gigabytes of memory for a structure that has nothing of interest yet embedded

Describe the solution you'd like
I swapped out the default list-of-lists implementation underlying Grid and MultiGrid, with pythons collections.defaultdict and saw a memory footprint 40x smaller than the original implementation.

Here's the simple sub-class I created to override the datastructure:

class SparseMultiGrid(MultiGrid):
    def __init__(self, width: int, height: int, torus: bool) -> None:
        """Create a new grid.

        Args:
            width, height: The width and height of the grid
            torus: Boolean whether the grid wraps or not.
        """
        self.height = height
        self.width = width
        self.torus = torus
        self.num_cells = height * width

        # Internal list-of-lists which holds the grid cells themselves
        self._grid: list[list[GridContent]]
        self._grid = defaultdict(lambda: defaultdict(dict))

        # Flag to check if the empties set has been created. Better than initializing
        # _empties as set() because in this case it would become impossible to discern
        # if the set hasn't still being built or if it has become empty after creation.
        self._empties_built = False

        # Neighborhood Cache
        self._neighborhood_cache: dict[Any, Sequence[Coordinate]] = {}

        # Cutoff used inside self.move_to_empty. The parameters are fitted on Python
        # 3.11 and it was verified that they are roughly the same for 3.10. Refer to
        # the code in PR#1565 to check for their stability when a new release gets out.
        self.cutoff_empties = 7.953 * self.num_cells**0.384

    def place_agent(self, agent: Agent, pos: Coordinate) -> None:
        """Place the agent at the specified location, and set its pos variable."""
        x, y = pos
        if agent.pos is None or agent not in self._grid[x][y]:
            if type(self._grid[x][y]) is not list:
                self._grid[x][y] = [] # this is needed to initialise empty grid cells before we place the first agent there
            self._grid[x][y].append(agent)
            agent.pos = pos
            if self._empties_built:
                self._empties.discard(pos)
                self._empty_mask[agent.pos] = True

Additional context
I'd like to hear the opinions of the Mesa devs on this, perhaps there was some reason that the more memory intensive data-structure was used, but as far as I can tell, there is a lot of wasted overhead being used to initialise the models currently

With MultiGrid (Before):

Building grid with size: 12263 x 13231 (162251753) ...
Grid built in 78.20502400398254s
Partition of a set of 162792840 objects. Total size = 10644672072 bytes.
                                                             >>> [ 10.6GB !!! ]
 Index  Count   %     Size   % Cumulative  % Kind (class [/](https://file+.vscode-resource.vscode-cdn.net/) dict of class)
     0 162268589 100 10409893848  98 10409893848  98 list
     1     88   0 162411957   2 10572305805  99 numpy.ndarray
     2 154815   0 22097607   0 10594403412 100 str
     3 128390   0  9516512   0 10603919924 100 tuple
     4  36690   0  6535161   0 10610455085 100 types.CodeType

With SparseMultiGrid (After):

Building grid with size: 12263 x 13231 (162251753) ...
Grid built in 7.867813110351562e-06s [ !!! ]
Partition of a set of 528821 objects. Total size = 73084683 bytes. 
                                                             >>> [ 73MB ! ]
 Index  Count   %     Size   % Cumulative  % Kind (class [/](https://file+.vscode-resource.vscode-cdn.net/) dict of class)
     0 154815  29 22097609  30  22097609  30 str
     1 128389  24  9516464  13  31614073  43 tuple
     2  36690   7  6534483   9  38148556  52 types.CodeType
     3  60704  11  5210761   7  43359317  59 bytes
     ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant