New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add nx.config
dict for configuring dispatching and backends
#7225
Conversation
(backend, info["default_config"]) | ||
for backend, info in backend_info.items() | ||
if "default_config" in info | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here, it might be more optimal to have a separate function(like get_info
) that returns only the backend's configurations, like get_config(config_dict=None)
(if config_dict
is not None
then the configuration dictionary is updated and returned otherwise the default configurations are returned) because config dictionary would probably be updated quite frequently and we probably shouldn't load all the backend_info
every time. Please let me know if I'm missing something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I completely understand your comment. get_info
is called exactly once at import time for each backend (before this PR and in this PR). Among other things, this helps the dispatch machinery know which backends implement which functions.
By optionally including "default_config"
in the dict returned by get_info
, we are able to initialize the default configuration for each backend at import time. This happens only once. This may be nice for users so they can do things like:
import networkx as nx
nx.backend_config["parallel"]["n_job"] = 5
nx.backend_config
is how we expect users to update configuration.
We could make a separate entry point for the default config, but I don't know if it would be worth doing since it seems to me that the "backend_info" entry point works well enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I thought this kind of thing was possible :
import networkx as nx
nx.backend_config["parallel"]["n_job"] = 5
# defining G
print(nx.betweenness_centrality(G, backend="parallel"))
nx.backend_config["parallel"]["n_job"] = -1
nx.backend_config["parallel"]["backend"] = 'threading'
print(nx.betweenness_centrality(G, backend="parallel"))
so I thought we would be extracting all the configurations and adding them in the joblib.Parallel()
in all the functions, so having a separate get_config
function made sense to me.
But, I think, based on the clarification above, that all the configuration code lines will be executed while importing networkx so the final configuration for both the function calls would be {"n_jobs" : -1, "backend" : "threading"}
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to this question:
- I think the code you write above would work with this configuration system.
- the default values will be obtained from the backend_info
get_info
function. - the user overrides those values as desired for each function call.
- the function dispatches to nx-parallel which looks at the config to get the values to send to joblib.
I think your get_config
function would live in nx-parallel and look at nx.config.backends.parallel.<name>
for each option. Or, we could use the whole config object like: joblib.Parallel(**nx.config.backends.parallel)
. But the user would be able to change them between each function call.
networkx/utils/backends.py
Outdated
== "true", | ||
"automatic_backends": [ | ||
x.strip() | ||
for x in os.environ.get("NETWORKX_AUTOMATIC_BACKENDS", "").split(",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does NETWORKX_AUTOMATIC_BACKENDS
mean here? like what's its functionality? is it the default backend?
Sorry, I don't think I understand all of the backend functionality fully, so sorry if I am being repetitive here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, thanks for asking!
Previously, configuring backends was only possible via environment variables. NETWORKX_AUTOMATIC_BACKENDS
environment variable is a comma separated list of backends that indicates which backends we want to automatically convert to and run when e.g. the input graph is a NetworkX graph. This is a prioritized list, so backends listed first are tried first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, thank you :)
As discussed in the weekly dispatching meeting, |
nx.backend_config
dict for configuring dispatching and backendsnx.config
dict for configuring dispatching and backends
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good -- just a couple of comments.
- I would envision
config
will eventually be defined/created in the primary__init__.py
module. No need to do that now since backends are the only part of nx using it. But if/when it becomes part of visualization or something else.... - I don't think we should subclass
dict
to get doc_strings. But that's only because I've had bad experience subclassingdict
before. (The performance hit is quite large even though the subclassed dict doesn't do anything. Better to make the dict an attribute and add methods for the dict operations which just call the operation on the dict attribute.) What kind of documentation are we looking for? doc_string for a developer reading the code? doc_string for the html documentation? inspection related doc_strings which appear with ipython ? or ?? operation? I think the html version can be it's own rst page. The others are more tricky. Let's talk more about this. Are we looking fornx.config?
to return something nice?
That's the main thing I was thinking, so something like: class Config(dict):
"""Something nice; go look online for more."""
config = Config() Does this actually perform worse in recent Python versions? I'm fine with regular dict too. |
As I hinted above, my knee-jerk reaction about subclass-ing dict is based solely on my experience with performance issues. That experience may not apply here. For example, maybe performance doesn't matter for the config dict. Let's see... Are there guidelines for when to use an attribute and when to use a method? It seems like an attribute is documented in the class documentation while a method has its own documentation. Why is that? Maybe something like attributes tend to be data structures/values, while methods tend to provide functionality. Do you have some experience with that kind of rule-of-thumb? My intuition is to have a dict attribute and document that in the class documentation. But I could easily be swayed here. |
I'd be fine using a regular Performance-wise, I think subclassing from You're not supposed to subclass from |
networkx/utils/backends.py
Outdated
config = { | ||
# Get default configuration from environment variables at import time | ||
"automatic_backends": [ | ||
x.strip() | ||
for x in os.environ.get("NETWORKX_AUTOMATIC_BACKENDS", "").split(",") | ||
if x.strip() | ||
], | ||
# Initialize default configuration for backends | ||
"backends": { | ||
backend: info.get("default_config", {}) | ||
for backend, info in backend_info.items() | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be for a separate PR but I think the config entry "automatic backends" could be named better (same for the corresponding env var), especially when seen next to the config entry here for "backends".
We should discuss this a bit more, but I think "backend priority" (config["backend_priority"]
, NETWORKX_BACKEND_PRIORITY
) would be a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good thing to discuss 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think maybe we can think a bit more about the schema for the config dictionary, but for now let's go with this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some big picture questions, not a blocker here:
It's pretty easy to just mess up the config dictionary right now, there is no specific protection here. I can just delete the config or rewrite it. Maybe we don't even need to worry about this.
In [1]: import networkx as nx
In [2]: nx.config
Out[2]: {'automatic_backends': [], 'backends': {'graphblas': {}}}
In [3]: nx.config = 'something'
In [4]: nx.config
Out[4]: 'something'
Currently it won't be able to persist the config between 2 runs, it's declared at runtime. I guess that's fine as these configs should just stay in some config file, but we would still need a way to write the config to disk (it could just be a toml/yaml export)
I fixed the merge conflict, and I updated the name to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little leery about having the name config
exposed in the main namespace, especially since the configuration info contained here is related specifically to backends.
How would folks think about instead keeping the config dict exposure linked to the backends
module? Does this interfere with it's behavior/implementation? From a user/API perspective, I think nx.backends.config
is both more accurate and clear than nx.config
.
@rossbar what do you think about having |
I just created a The UX of the config is based on conversations and feedback from several people. I actually experimented with ~4 approaches that all basically do the same thing, and I was surprised that I settled on the one that uses After talking with @rlratzel multiple times about config, I believe he prefers attribute access (e.g. @dschult is sensitive to performance. Dataclasses are wicked-fast for attribute access. They are not as fast as some options for bracket syntax (e.g., I debated with myself whether the config should actually be a I believe @Schefflera-Arboricola first argued for making this a class during a call (but I may not remember correctly). Please take a look. @MridulS above complaineid how easy it was to mess up the config. There were no protections. Now there are. Things we are not yet doing for config, and I would say are out of scope for this PR unless any would fundamentally change our approach:
For fun, here is another approach I tried. Its performance is very reasonable. The thing I like most about it is that you can provide documentation for every single configuration. for example, doing e.g.
|
networkx/utils/configs.py
Outdated
"""Write me!""" | ||
|
||
backend_priority: list[str] | ||
backends: Config[str, Config] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy
reminds me again that I still know very little about type annotations. mypy
doesn't like Config[str, Config]
.
My "solution" will be to delete __class_getitem__
so this syntax isn't even possible. The working solution probably includes Generic
and TypeVar
and who knows what. Oh well... I thought this was worth trying.
This allows configs to be added and deleted.
The latest commit added the ability to define a subclass with |
btw, I'm currently dyslexic/confused when it comes to single |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked for single/double backticks issues and found none. But that doesn't mean they aren't there. :) I also looked through the recent changes and I think this is ready to go in. We should keep an eye on it especially early on to see if anything should be fixed or tweaked to make it better.
We've got two approvals. I'll give @rossbar some time to look if he gets a chance. But will likely merge later today. Ross can make a separate PR with corrections and so forth if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this would be better served by living in the nx.backends
namespace rather than being available in nx
, but I don't feel strongly enough that it should be a blocker! If others are happy then +1 from me!
…kx#7225) * Add `nx.backend_config` dict for configuring dispatching and backends * Rename `nx.backend_config` to `nx.config` * "fallback_to_nx" is for testing, not for user config * Move config of backends to e.g. `nx.config["backends"]["cugraph"]` * How do you like this mypy?! * Rename `automatic_backends` to `backend_priority` (and env variables) * Create a class to handle configuration * Oops thanks mypy * Fix to work with more strict config * Support (and test) default values * Remove `__class_getitem__` and add docstring * Allow `strict=False` when defining subclasses. This allows configs to be added and deleted. * Move `__init_subclass__`
This adds a global config dict for dispatching and backends. Is this what we discussed in the last networkx dispatching meeting? It allows backends to provide default configuration in
"default_config"
in the dict returned bynetworkx.backend_info
entry-point. It saves this object tonx.backend_config[backend_name]
. This config is initialized using environment variables if they are defined.I think we should consider the fields of this config and whether we like the names, because this is now becoming public. Do we like
fallback_to_nx: bool
andautomatic_backends: list
?Also, how should we document this? Should we subclass
dict
just to provide a docstring, such as:CC @MridulS @rlratzel @Schefflera-Arboricola @jim22k