New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimal PR to add cache dict to graphs and clear them #7344
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -50,4 +50,5 @@ def stochastic_graph(G, copy=True, weight="weight"): | |
d[weight] = 0 | ||
else: | ||
d[weight] = d.get(weight, 1) / degree[u] | ||
G.__networkx_cache__.clear() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the |
||
return G |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think changing attribute values should cause the cache to be cleared.
And if we decide that it does cause the cache to be cleared, then we should change the functions that compute a property and add it as an attribute to instead compute the property and return it as a dict keyed by node/edge to a value. That dict information can be "easily" added to the graph if desired with
nx.set_node_attributes
. But the idea that computingbarycenter
or flows (or other properties) makes all the cached values go away doesn't seem right to me.Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, node data is still part of the graph, and changing node data means changing the graph (which means clearing the cache if we want to be conservative and safe). Maybe caching can become smarter someday and selectively be cleared like you have suggested (for example, adding node data wouldn't change the number of connected components).
barycenter
is a special case, and I can't find another algorithm like it. Is there some reason why it's beneficial to add data to nodes within the algorithm? Are these values typically used elsewhere? Why don't any of the centralities (pagerank, betweenness, etc) use the same pattern?edmonds_karp_core
is the closest (and only other) function that's similar in that it updates the edges in the residual graph.Even though I'm not very familiar with
barycenter
, it does "smell" weird to me. I think it would be much more natural to return a dict with nodes to their barycentricity.To find where input graph mutation occurs, you can search for
mutates_input=
which is part of_dispatchable
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably change barycenter to return a dict. But it also looks like the other two functions you treat specially in this PR store attributes on the graph that would not affect anything for cached values not directed related to those functions.
The main problem I see is that even if we track down all the code in NetworkX that changes the attributes in a graph, most likely people are changing data attributes outside of using the graph functions/methods. We can't track those changes. So, that limits caching quite a bit. We have to warn user (and rely on users to) not change aspects of the graph that the cached values depend on. Or -- we need a way to check if the values of the data have changed (like a hash). How do arrays do this? Are there array libraries that cache values based on the array?
Have you thought at all about caching the results on the backend graph object? Or on a middle-ground object between the backend graph and the networkx graph. For example, it could store the networkx graph as an attribute, and the cached objects in a dict
_networkx_cache_
(don't dunders indicate python private attributes? so shouldn't we be usingunder
s? :)). We could have the networkx classesclear
the networkx cache, and the backend cache could look at that to see whether the network structure has changed. And it could use any other methods needed to check that the data values are the same.I think making functions outside the base classes clear the cache is fooling ourselves into a false sense of confidence that caching will work. We need users to feel that it is a sneaky, slightly dangerous move that might save lots of time. So they should probably have to turn it on for a function by using a keyword arg, and we'll need the doc_strings for that keyword arg to warn them not to change the data values between calls. And it should probably point to a networkx doc page that describes caching and the potential dangers therein. That way we are still in "the wild west", where anything goes. But users can still take advantage of caching if they are careful. And ideally, users can easily figure out which changes are dangerous for caching.