Minimal PR to add cache dict to graphs and clear them #7344

Status: Closed (wants to merge 1 commit)
2 changes: 2 additions & 0 deletions networkx/algorithms/distance_measures.py
@@ -629,6 +629,8 @@ def barycenter(G, weight=None, attr=None, sp=None):
barycenter_vertices = [v]
elif barycentricity == smallest:
barycenter_vertices.append(v)
if attr is not None:
G.__networkx_cache__.clear()
Comment on lines +632 to +633 from Member:

I don't think changing attribute values should cause the cache to be cleared.

And if we decide that it does cause the cache to be cleared, then we should change the functions that compute a property and add it as an attribute to instead compute the property and return it as a dict keyed by node/edge to a value. That dict information can be "easily" added to the graph if desired with nx.set_node_attributes. But the idea that computing barycenter or flows (or other properties) makes all the cached values go away doesn't seem right to me.

Thoughts?
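The return-a-dict pattern suggested above might look like the following sketch. The function name and signature are hypothetical, not the NetworkX API; barycentricity here is the sum of shortest-path lengths to all other nodes, matching the surrounding code's usage.

```python
# Hypothetical sketch: compute a node -> value mapping and return it,
# letting callers attach it to the graph explicitly if they want it there.
import networkx as nx

def barycentricity(G, weight=None, sp=None):
    """Return {node: sum of shortest-path lengths to all other nodes}."""
    sp = sp or dict(nx.shortest_path_length(G, weight=weight))
    return {v: sum(dists.values()) for v, dists in sp.items()}

G = nx.path_graph(4)  # 0 - 1 - 2 - 3
scores = barycentricity(G)
# Callers who want the values on the graph can opt in explicitly:
nx.set_node_attributes(G, scores, "barycentricity")
```

With this shape, computing the property never mutates the input graph, so no cache clearing is needed inside the algorithm.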

Comment from @eriknw (Contributor Author), Mar 12, 2024:

To me, node data is still part of the graph, and changing node data means changing the graph (which means clearing the cache if we want to be conservative and safe). Maybe caching can become smarter someday and selectively be cleared like you have suggested (for example, adding node data wouldn't change the number of connected components).

barycenter is a special case, and I can't find another algorithm like it. Is there some reason why it's beneficial to add data to nodes within the algorithm? Are these values typically used elsewhere? Why don't any of the centralities (pagerank, betweenness, etc) use the same pattern? edmonds_karp_core is the closest (and only other) function that's similar in that it updates the edges in the residual graph.

Even though I'm not very familiar with barycenter, it does "smell" weird to me. I think it would be much more natural to return a dict with nodes to their barycentricity.

To find where input graph mutation occurs, you can search for mutates_input= which is part of _dispatchable.
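The "smarter, selective clearing" idea mentioned above (e.g. adding node data wouldn't change the number of connected components) could be illustrated like this. This is purely an illustrative sketch, not anything in NetworkX: cache keys record which aspects of the graph they depend on, so a mutation clears only the entries that could be stale.

```python
# Illustrative only: a cache whose entries record whether they depend on
# graph structure, node data, or edge data, so a node-data change can
# invalidate only the entries that could actually be stale.
class SelectiveCache(dict):
    def clear_dependents(self, *aspects):
        # Keys are (name, tuple_of_dependencies) pairs.
        stale = [k for k in self if set(k[1]) & set(aspects)]
        for k in stale:
            del self[k]

cache = SelectiveCache()
cache[("number_connected_components", ("structure",))] = 1
cache[("barycenter", ("structure", "edge_data"))] = [2]

cache.clear_dependents("node_data")  # keeps both entries
cache.clear_dependents("edge_data")  # drops only the barycenter entry
```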

Comment from Member:


I think we should probably change barycenter to return a dict. But it also looks like the other two functions you treat specially in this PR store attributes on the graph that would not affect anything for cached values not directly related to those functions.

The main problem I see is that even if we track down all the code in NetworkX that changes the attributes in a graph, most likely people are changing data attributes outside of the graph functions/methods. We can't track those changes, so that limits caching quite a bit. We have to warn users (and rely on them) not to change aspects of the graph that the cached values depend on. Or we need a way to check whether the values of the data have changed (like a hash). How do arrays do this? Are there array libraries that cache values based on the array?
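One possible answer to the "has the data changed?" question is a cheap content fingerprint, recomputed before trusting a cached value. This is a sketch only; `data_fingerprint` is a hypothetical helper, not a NetworkX function, and a real implementation would need to handle non-JSON-serializable and unorderable attribute values more carefully.

```python
# Hypothetical sketch: fingerprint the node/edge attribute dicts so a
# cache can detect out-of-band data mutations that no Graph method sees.
import hashlib
import json

import networkx as nx

def data_fingerprint(G):
    payload = {
        "nodes": sorted((repr(n), sorted(d.items())) for n, d in G.nodes(data=True)),
        "edges": sorted(
            (repr(u), repr(v), sorted(d.items())) for u, v, d in G.edges(data=True)
        ),
    }
    return hashlib.sha256(json.dumps(payload, default=repr).encode()).hexdigest()

G = nx.Graph()
G.add_edge(0, 1, weight=2)
before = data_fingerprint(G)
G[0][1]["weight"] = 3  # out-of-band mutation that no Graph method observes
changed = data_fingerprint(G) != before
```

The obvious trade-off is that fingerprinting is O(graph size), so it only pays off when the cached computation is much more expensive than a full scan of the data.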

Have you thought at all about caching the results on the backend graph object? Or on a middle-ground object between the backend graph and the networkx graph? For example, it could store the networkx graph as an attribute, and the cached objects in a dict _networkx_cache_ (don't dunders indicate Python private attributes? so shouldn't we be using single underscores? :)). We could have the networkx classes clear the networkx cache, and the backend cache could look at that to see whether the network structure has changed. And it could use any other methods needed to check that the data values are the same.

I think making functions outside the base classes clear the cache is fooling ourselves into a false sense of confidence that caching will work. We need users to feel that it is a sneaky, slightly dangerous move that might save lots of time. So they should probably have to turn it on for a function by using a keyword arg, and we'll need the docstrings for that keyword arg to warn them not to change the data values between calls. They should probably also point to a networkx doc page that describes caching and the potential dangers therein. That way we are still in "the wild west", where anything goes, but users can still take advantage of caching if they are careful. And ideally, users can easily figure out which changes are dangerous for caching.
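The opt-in keyword-arg pattern described above might be sketched as follows. The function and cache key are hypothetical examples, not proposed API; the sketch only assumes a `__networkx_cache__` dict like the one this PR adds (and falls back gracefully when it is absent).

```python
# Hypothetical sketch of opt-in caching: nothing is cached unless the
# caller explicitly asks for it via a keyword argument.
import networkx as nx

def number_of_nodes_cached(G, *, use_cache=False):
    """Toy example of a cached computation.

    use_cache : bool (default False)
        If True, reuse a previously stored result from G.__networkx_cache__.
        Do NOT enable this if the graph may have changed between calls.
    """
    cache = getattr(G, "__networkx_cache__", None)
    if use_cache and cache is not None and "number_of_nodes" in cache:
        return cache["number_of_nodes"]
    result = len(G)  # stand-in for an expensive computation
    if use_cache and cache is not None:
        cache["number_of_nodes"] = result
    return result

G = nx.path_graph(3)
if not hasattr(G, "__networkx_cache__"):
    G.__networkx_cache__ = {}  # provided by the base classes in this PR
n = number_of_nodes_cached(G, use_cache=True)
```

Because the default is `use_cache=False`, callers who never opt in get today's behavior, and the docstring is the natural place for the "don't mutate between calls" warning.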

return barycenter_vertices


1 change: 1 addition & 0 deletions networkx/algorithms/flow/edmondskarp.py
@@ -94,6 +94,7 @@ def bidirectional_bfs():
path.append(u)
flow_value += augment(path)

R.__networkx_cache__.clear()
return flow_value


11 changes: 11 additions & 0 deletions networkx/classes/digraph.py
@@ -355,6 +355,7 @@ def __init__(self, incoming_graph_data=None, **attr):
self._pred = self.adjlist_outer_dict_factory() # predecessor
# Note: self._succ = self._adj # successor

self.__networkx_cache__ = {}
# attempt to load graph with data
if incoming_graph_data is not None:
convert.to_networkx_graph(incoming_graph_data, create_using=self)
@@ -465,6 +466,7 @@ def add_node(self, node_for_adding, **attr):
attr_dict.update(attr)
else: # update attr even if node already exists
self._node[node_for_adding].update(attr)
self.__networkx_cache__.clear()

def add_nodes_from(self, nodes_for_adding, **attr):
"""Add multiple nodes.
@@ -543,6 +545,7 @@ def add_nodes_from(self, nodes_for_adding, **attr):
self._pred[n] = self.adjlist_inner_dict_factory()
self._node[n] = self.node_attr_dict_factory()
self._node[n].update(newdict)
self.__networkx_cache__.clear()

def remove_node(self, n):
"""Remove node n.
@@ -585,6 +588,7 @@ def remove_node(self, n):
for u in self._pred[n]:
del self._succ[u][n] # remove all edges n-u in digraph
del self._pred[n] # remove node from pred
self.__networkx_cache__.clear()

def remove_nodes_from(self, nodes):
"""Remove multiple nodes.
@@ -639,6 +643,7 @@ def remove_nodes_from(self, nodes):
del self._pred[n] # now remove node
except KeyError:
pass # silent failure on remove
self.__networkx_cache__.clear()

def add_edge(self, u_of_edge, v_of_edge, **attr):
"""Add an edge between u and v.
@@ -709,6 +714,7 @@ def add_edge(self, u_of_edge, v_of_edge, **attr):
datadict.update(attr)
self._succ[u][v] = datadict
self._pred[v][u] = datadict
self.__networkx_cache__.clear()

def add_edges_from(self, ebunch_to_add, **attr):
"""Add all the edges in ebunch_to_add.
@@ -791,6 +797,7 @@ def add_edges_from(self, ebunch_to_add, **attr):
datadict.update(dd)
self._succ[u][v] = datadict
self._pred[v][u] = datadict
self.__networkx_cache__.clear()

def remove_edge(self, u, v):
"""Remove the edge between u and v.
@@ -824,6 +831,7 @@ def remove_edge(self, u, v):
del self._pred[v][u]
except KeyError as err:
raise NetworkXError(f"The edge {u}-{v} not in graph.") from err
self.__networkx_cache__.clear()

def remove_edges_from(self, ebunch):
"""Remove all edges specified in ebunch.
@@ -856,6 +864,7 @@ def remove_edges_from(self, ebunch):
if u in self._succ and v in self._succ[u]:
del self._succ[u][v]
del self._pred[v][u]
self.__networkx_cache__.clear()

def has_successor(self, u, v):
"""Returns True if node u has successor v.
@@ -1195,6 +1204,7 @@ def clear(self):
self._pred.clear()
self._node.clear()
self.graph.clear()
self.__networkx_cache__.clear()

def clear_edges(self):
"""Remove all edges from the graph without altering nodes.
@@ -1213,6 +1223,7 @@
predecessor_dict.clear()
for successor_dict in self._succ.values():
successor_dict.clear()
self.__networkx_cache__.clear()

def is_multigraph(self):
"""Returns True if graph is a multigraph, False otherwise."""
12 changes: 12 additions & 0 deletions networkx/classes/graph.py
@@ -365,6 +365,7 @@ def __init__(self, incoming_graph_data=None, **attr):
self.graph = self.graph_attr_dict_factory() # dictionary for graph attributes
self._node = self.node_dict_factory() # empty node attribute dict
self._adj = self.adjlist_outer_dict_factory() # empty adjacency dict
self.__networkx_cache__ = {}
# attempt to load graph with data
if incoming_graph_data is not None:
convert.to_networkx_graph(incoming_graph_data, create_using=self)
@@ -559,6 +560,7 @@ def add_node(self, node_for_adding, **attr):
attr_dict.update(attr)
else: # update attr even if node already exists
self._node[node_for_adding].update(attr)
self.__networkx_cache__.clear()

def add_nodes_from(self, nodes_for_adding, **attr):
"""Add multiple nodes.
@@ -636,6 +638,7 @@ def add_nodes_from(self, nodes_for_adding, **attr):
self._adj[n] = self.adjlist_inner_dict_factory()
self._node[n] = self.node_attr_dict_factory()
self._node[n].update(newdict)
self.__networkx_cache__.clear()

def remove_node(self, n):
"""Remove node n.
@@ -676,6 +679,7 @@ def remove_node(self, n):
for u in nbrs:
del adj[u][n] # remove all edges n-u in graph
del adj[n] # now remove node
self.__networkx_cache__.clear()

def remove_nodes_from(self, nodes):
"""Remove multiple nodes.
@@ -728,6 +732,7 @@ def remove_nodes_from(self, nodes):
del adj[n]
except KeyError:
pass
self.__networkx_cache__.clear()

@cached_property
def nodes(self):
@@ -957,6 +962,7 @@ def add_edge(self, u_of_edge, v_of_edge, **attr):
datadict.update(attr)
self._adj[u][v] = datadict
self._adj[v][u] = datadict
self.__networkx_cache__.clear()

def add_edges_from(self, ebunch_to_add, **attr):
"""Add all the edges in ebunch_to_add.
@@ -1037,6 +1043,7 @@ def add_edges_from(self, ebunch_to_add, **attr):
datadict.update(dd)
self._adj[u][v] = datadict
self._adj[v][u] = datadict
self.__networkx_cache__.clear()

def add_weighted_edges_from(self, ebunch_to_add, weight="weight", **attr):
"""Add weighted edges in `ebunch_to_add` with specified weight attr
@@ -1087,6 +1094,7 @@ def add_weighted_edges_from(self, ebunch_to_add, weight="weight", **attr):
>>> G.add_weighted_edges_from(list((5, n, weight) for n in G.nodes))
"""
self.add_edges_from(((u, v, {weight: d}) for u, v, d in ebunch_to_add), **attr)
self.__networkx_cache__.clear()

def remove_edge(self, u, v):
"""Remove the edge between u and v.
@@ -1120,6 +1128,7 @@ def remove_edge(self, u, v):
del self._adj[v][u]
except KeyError as err:
raise NetworkXError(f"The edge {u}-{v} is not in the graph") from err
self.__networkx_cache__.clear()

def remove_edges_from(self, ebunch):
"""Remove all edges specified in ebunch.
@@ -1154,6 +1163,7 @@ def remove_edges_from(self, ebunch):
del adj[u][v]
if u != v: # self loop needs only one entry removed
del adj[v][u]
self.__networkx_cache__.clear()

def update(self, edges=None, nodes=None):
"""Update the graph using nodes/edges/graphs as input.
@@ -1526,6 +1536,7 @@ def clear(self):
self._adj.clear()
self._node.clear()
self.graph.clear()
self.__networkx_cache__.clear()

def clear_edges(self):
"""Remove all edges from the graph without altering nodes.
@@ -1541,6 +1552,7 @@
"""
for nbr_dict in self._adj.values():
nbr_dict.clear()
self.__networkx_cache__.clear()

def is_multigraph(self):
"""Returns True if graph is a multigraph, False otherwise."""
2 changes: 2 additions & 0 deletions networkx/classes/multidigraph.py
@@ -508,6 +508,7 @@ def add_edge(self, u_for_edge, v_for_edge, key=None, **attr):
keydict[key] = datadict
self._succ[u][v] = keydict
self._pred[v][u] = keydict
self.__networkx_cache__.clear()
return key

def remove_edge(self, u, v, key=None):
@@ -583,6 +584,7 @@ def remove_edge(self, u, v, key=None):
# remove the key entries if last edge
del self._succ[u][v]
del self._pred[v][u]
self.__networkx_cache__.clear()

@cached_property
def edges(self):
4 changes: 4 additions & 0 deletions networkx/classes/multigraph.py
@@ -520,6 +520,7 @@ def add_edge(self, u_for_edge, v_for_edge, key=None, **attr):
keydict[key] = datadict
self._adj[u][v] = keydict
self._adj[v][u] = keydict
self.__networkx_cache__.clear()
return key

def add_edges_from(self, ebunch_to_add, **attr):
@@ -616,6 +617,7 @@ def add_edges_from(self, ebunch_to_add, **attr):
key = self.add_edge(u, v, key)
self[u][v][key].update(ddd)
keylist.append(key)
self.__networkx_cache__.clear()
return keylist

def remove_edge(self, u, v, key=None):
@@ -695,6 +697,7 @@ def remove_edge(self, u, v, key=None):
del self._adj[u][v]
if u != v: # check for selfloop
del self._adj[v][u]
self.__networkx_cache__.clear()

def remove_edges_from(self, ebunch):
"""Remove all edges specified in ebunch.
@@ -753,6 +756,7 @@ def remove_edges_from(self, ebunch):
self.remove_edge(*e[:3])
except NetworkXError:
pass
self.__networkx_cache__.clear()

def has_edge(self, u, v, key=None):
"""Returns True if the graph has an edge between nodes u and v.
1 change: 1 addition & 0 deletions networkx/generators/stochastic.py
@@ -50,4 +50,5 @@ def stochastic_graph(G, copy=True, weight="weight"):
d[weight] = 0
else:
d[weight] = d.get(weight, 1) / degree[u]
G.__networkx_cache__.clear()
Comment from Member:


For stochastic_graph, I think the fix is to change it so that, instead of copy making an entirely new graph, it creates a new attribute that holds the stochastic weight. There is no reason to copy the whole graph just to save the original weight values.
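That suggestion might be sketched as follows. The function name and the `new_weight` parameter are hypothetical; the normalization mirrors the diff above (edge weight divided by the node's weighted out-degree), and zero-degree handling is omitted for brevity.

```python
# Hypothetical sketch: store the normalized value under a separate edge
# attribute instead of copying the graph, leaving original weights intact.
import networkx as nx

def add_stochastic_weight(G, weight="weight", new_weight="stochastic_weight"):
    """Set d[new_weight] = d.get(weight, 1) / weighted out-degree of u
    on every edge (u, v), without touching the original weight values."""
    degree = dict(G.out_degree(weight=weight))
    for u, v, d in G.edges(data=True):
        d[new_weight] = d.get(weight, 1) / degree[u]
    return G

DG = nx.DiGraph([(0, 1), (0, 2)])  # unweighted: each edge counts as 1
add_stochastic_weight(DG)
```

Since the original attributes are never overwritten, there is nothing to restore and no full-graph copy to pay for.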

return G