Add `nx.config` dict for configuring dispatching and backends #7225

eriknw · 2024-01-13T17:23:36Z

This adds a global config dict for dispatching and backends. Is this what we discussed in the last networkx dispatching meeting? It allows backends to provide default configuration in "default_config" in the dict returned by networkx.backend_info entry-point. It saves this object to nx.backend_config[backend_name]. This config is initialized using environment variables if they are defined.

I think we should consider the fields of this config and whether we like the names, because this is now becoming public. Do we like fallback_to_nx: bool and automatic_backends: list?

Also, how should we document this? Should we subclass dict just to provide a docstring, such as:

class BackendConfigDict(dict):
    """blah blah blah"""

CC @MridulS @rlratzel @Schefflera-Arboricola @jim22k

Schefflera-Arboricola · 2024-01-14T10:04:37Z

networkx/utils/backends.py

+    (backend, info["default_config"])
+    for backend, info in backend_info.items()
+    if "default_config" in info
+)


here, it might be more optimal to have a separate function(like get_info) that returns only the backend's configurations, like get_config(config_dict=None)(if config_dict is not None then the configuration dictionary is updated and returned otherwise the default configurations are returned) because config dictionary would probably be updated quite frequently and we probably shouldn't load all the backend_info every time. Please let me know if I'm missing something here.

I'm not sure I completely understand your comment. get_info is called exactly once at import time for each backend (before this PR and in this PR). Among other things, this helps the dispatch machinery know which backends implement which functions.

By optionally including "default_config" in the dict returned by get_info, we are able to initialize the default configuration for each backend at import time. This happens only once. This may be nice for users so they can do things like:

import networkx as nx nx.backend_config["parallel"]["n_job"] = 5

nx.backend_config is how we expect users to update configuration.

We could make a separate entry point for the default config, but I don't know if it would be worth doing since it seems to me that the "backend_info" entry point works well enough.

ok, I thought this kind of thing was possible :

import networkx as nx nx.backend_config["parallel"]["n_job"] = 5 # defining G print(nx.betweenness_centrality(G, backend="parallel")) nx.backend_config["parallel"]["n_job"] = -1 nx.backend_config["parallel"]["backend"] = 'threading' print(nx.betweenness_centrality(G, backend="parallel"))

so I thought we would be extracting all the configurations and adding them in the joblib.Parallel() in all the functions, so having a separate get_config function made sense to me.

But, I think, based on the clarification above, that all the configuration code lines will be executed while importing networkx so the final configuration for both the function calls would be {"n_jobs" : -1, "backend" : "threading"}, right?

Coming back to this question:

I think the code you write above would work with this configuration system.

the default values will be obtained from the backend_info get_info function.

the user overrides those values as desired for each function call.

the function dispatches to nx-parallel which looks at the config to get the values to send to joblib.

I think your get_config function would live in nx-parallel and look at nx.config.backends.parallel.<name> for each option. Or, we could use the whole config object like: joblib.Parallel(**nx.config.backends.parallel). But the user would be able to change them between each function call.

Schefflera-Arboricola · 2024-01-14T10:09:53Z

networkx/utils/backends.py

+    == "true",
+    "automatic_backends": [
+        x.strip()
+        for x in os.environ.get("NETWORKX_AUTOMATIC_BACKENDS", "").split(",")


what does NETWORKX_AUTOMATIC_BACKENDS mean here? like what's its functionality? is it the default backend?
Sorry, I don't think I understand all of the backend functionality fully, so sorry if I am being repetitive here.

No worries, thanks for asking!

Previously, configuring backends was only possible via environment variables. NETWORKX_AUTOMATIC_BACKENDS environment variable is a comma separated list of backends that indicates which backends we want to automatically convert to and run when e.g. the input graph is a NetworkX graph. This is a prioritized list, so backends listed first are tried first.

okay, thank you :)

eriknw · 2024-01-29T18:43:04Z

As discussed in the weekly dispatching meeting, "fallback_to_nx" probably doesn't make sense (yet) as a user-facing configuration option, so I removed it from nx.config. It is meant to be used when testing backends. We may soon have more generalized behavior for backend-to-backend conversions, which will allow "falling back" to other backends, not just networkx.

dschult

This looks good -- just a couple of comments.

I would envision config will eventually be defined/created in the primary __init__.py module. No need to do that now since backends are the only part of nx using it. But if/when it becomes part of visualization or something else....
I don't think we should subclass dict to get doc_strings. But that's only because I've had bad experience subclassing dict before. (The performance hit is quite large even though the subclassed dict doesn't do anything. Better to make the dict an attribute and add methods for the dict operations which just call the operation on the dict attribute.) What kind of documentation are we looking for? doc_string for a developer reading the code? doc_string for the html documentation? inspection related doc_strings which appear with ipython ? or ?? operation? I think the html version can be it's own rst page. The others are more tricky. Let's talk more about this. Are we looking for nx.config? to return something nice?

eriknw · 2024-02-07T03:01:31Z

Are we looking for nx.config? to return something nice?

That's the main thing I was thinking, so something like:

class Config(dict):
    """Something nice; go look online for more."""

config = Config()

Does this actually perform worse in recent Python versions? I'm fine with regular dict too.

dschult · 2024-02-07T03:35:00Z

As I hinted above, my knee-jerk reaction about subclass-ing dict is based solely on my experience with performance issues. That experience may not apply here. For example, maybe performance doesn't matter for the config dict.

Let's see... Are there guidelines for when to use an attribute and when to use a method? It seems like an attribute is documented in the class documentation while a method has its own documentation. Why is that? Maybe something like attributes tend to be data structures/values, while methods tend to provide functionality. Do you have some experience with that kind of rule-of-thumb? My intuition is to have a dict attribute and document that in the class documentation. But I could easily be swayed here.

eriknw · 2024-02-07T14:45:42Z

I'd be fine using a regular dict (and eventually adding online docs) so we can avoid these issues for now.

Performance-wise, I think subclassing from dict would be faster than subclassing from collections.UserDict or using a dict attribute. It's true that subclassing dict makes some operations such as __getitem__ and __contains__ slower than for regular dicts (CPython uses some tricks with hard-coded C slots and that get lost when building the descriptors for these slots for subclasses; in other words, my understanding is that dicts are fast, and dict subclasses are "regular speed").

You're not supposed to subclass from dict directly if you want to change methods, but we're not doing that here.

rlratzel · 2024-02-12T16:32:50Z

networkx/utils/backends.py

+config = {
+    # Get default configuration from environment variables at import time
+    "automatic_backends": [
+        x.strip()
+        for x in os.environ.get("NETWORKX_AUTOMATIC_BACKENDS", "").split(",")
+        if x.strip()
+    ],
+    # Initialize default configuration for backends
+    "backends": {
+        backend: info.get("default_config", {})
+        for backend, info in backend_info.items()
+    },
+}


This may be for a separate PR but I think the config entry "automatic backends" could be named better (same for the corresponding env var), especially when seen next to the config entry here for "backends".

We should discuss this a bit more, but I think "backend priority" (config["backend_priority"], NETWORKX_BACKEND_PRIORITY) would be a better name.

Sounds like a good thing to discuss 👍

I do think maybe we can think a bit more about the schema for the config dictionary, but for now let's go with this!

MridulS

Just some big picture questions, not a blocker here:

It's pretty easy to just mess up the config dictionary right now, there is no specific protection here. I can just delete the config or rewrite it. Maybe we don't even need to worry about this.

In [1]: import networkx as nx

In [2]: nx.config
Out[2]: {'automatic_backends': [], 'backends': {'graphblas': {}}}

In [3]: nx.config = 'something'

In [4]: nx.config
Out[4]: 'something'

Currently it won't be able to persist the config between 2 runs, it's declared at runtime. I guess that's fine as these configs should just stay in some config file, but we would still need a way to write the config to disk (it could just be a toml/yaml export)

eriknw · 2024-02-29T23:58:36Z

I fixed the merge conflict, and I updated the name to "backend_priority". Anything else?

rossbar

I'm a little leery about having the name config exposed in the main namespace, especially since the configuration info contained here is related specifically to backends.

How would folks think about instead keeping the config dict exposure linked to the backends module? Does this interfere with it's behavior/implementation? From a user/API perspective, I think nx.backends.config is both more accurate and clear than nx.config.

dschult · 2024-03-06T06:21:07Z

@rossbar what do you think about having nx.config but having the backend related keys in a dict nx.config['backends']? What are the tradeoffs between using subpackages and dict hierarchies. NetworkX has leaned toward flat namespaces with "dicts all the way down" for storing/organizing data. I can imagine config options for visualization, for backends, and for some of the more complex algorithms, but probably only one of two of those. So there shouldn't be many config options overall. Are you worried about namespace collisions, ease of finding the options, documentation, or "all of the above"? :)

eriknw · 2024-03-11T01:43:33Z

I just created a Config class. It has a docstring that y'all should read. There are even tests (before you comment--yes, I do try to write tests, and it's nice to add something that is easy to write tests for!).

The UX of the config is based on conversations and feedback from several people. I actually experimented with ~4 approaches that all basically do the same thing, and I was surprised that I settled on the one that uses dataclass. I think the end result is nice for the user and nice for backend developers who can use Config to make their own configuration objects.

After talking with @rlratzel multiple times about config, I believe he prefers attribute access (e.g. cfg.x). Dataclasses support this natively.

@dschult is sensitive to performance. Dataclasses are wicked-fast for attribute access. They are not as fast as some options for bracket syntax (e.g., cfg["x"]), but they're not bad (and some implementations are much slower).

I debated with myself whether the config should actually be a collections.abc.Mapping, and I think they should. They are very mapping-like. I also really like .get for writing portable code. I don't think we need all the other methods from MutableMapping.

I believe @Schefflera-Arboricola first argued for making this a class during a call (but I may not remember correctly). Please take a look.

@MridulS above complaineid how easy it was to mess up the config. There were no protections. Now there are.

Things we are not yet doing for config, and I would say are out of scope for this PR unless any would fundamentally change our approach:

thread-local storage
contexts
persistence
searching paths for config files
automagically getting config from environment variables (this is done manually)

For fun, here is another approach I tried. Its performance is very reasonable. The thing I like most about it is that you can provide documentation for every single configuration. for example, doing e.g. config.val? in IPython or Jupyter will show you the value and docstring.

KrazyKonfig

class KrazyKonfig:
    """Config class doc"""
    __slots__ = ()

    def __new__(cls, config_data, config_docs=None, doc=None):
        if config_docs is None:
            config_docs = {}
        else:
            config_docs = dict(config_docs)
        config_data = {key: config_data[key] for key in sorted(config_data)}
        class_dict = {
            "__slots__": (),  # config.newkey = val  # RAISE
            "_data": config_data,  # config._data
            "_docs": config_docs,  # config._docs
            "__doc__": doc,  # help(config)
            "__getitem__": config_data.__getitem__, # config[key]
            "__contains__": config_data.__contains__,  # key in config
            "__len__": config_data.__len__,  # len(config)
            "__iter__": config_data.__iter__,  # iter(config)
            "__reversed__": config_data.__reversed__,  # reversed(config)
            "__hash__": None,  # hash(config)  # RAISE
            "__dir__": config_data.keys,  # config.<TAB>
            "_ipython_key_completions_": config_data.keys,  # config["<TAB>
            # Make this a `collections.abc.Mapping`
            "get": config_data.get,
            "items": config_data.items,
            "keys": config_data.keys,
            "values": config_data.values,
        }
        for key in config_data:
            # Create properties for all config items for speed and docs
            val = property(op.itemgetter(key))  # config.x
            val = val.setter(partial(cls._setitem, key))  # config.x = y
            val.__doc__ = config_docs.get(key)  # config.x?  # IPython/Jupyter help
            class_dict[key] = val
        new_cls = type(cls.__name__, (cls,), class_dict)
        instance = object.__new__(new_cls)
        return instance

    @staticmethod
    def _setitem(key, self, value):
        if key not in self._data:
            raise AttributeError(f"Invalid config name: {key!r}")
        # Do checks...
        self._data[key] = value

    def __setitem__(self, key, value):
        try:
            self._setitem(key, self, value)
        except AttributeError as err:
            raise KeyError(*err.args) from None

    def __delitem__(self, key):
        raise TypeError(f"Configuration items can't be deleted (can't delete {key!r}).")

    __delattr__ = __delitem__

    def __repr__(self):
        return f'{type(self).__name__}({self._data!r})'

    def __reduce__(self):
        return type(self).mro()[1], (self._data, self._docs, self.__doc__)

    def __eq__(self, other):
        if not isinstance(other, KrazyKonfig):
            return NotImplemented
        return self.__reduce__() == other.__reduce__()

KrazyKonfig has a little more magic going on, so I don't think it's particularly accessible or maintainable (fwiw, I don't think it's too bad, and I think it's a good learning experience for the magic it uses). I think the config class currently in this PR is much more clear and maintainable (and if I ought to write more code comments, please let me know!).

eriknw · 2024-03-13T03:01:45Z

networkx/utils/configs.py

+    """Write me!"""
+
+    backend_priority: list[str]
+    backends: Config[str, Config]


mypy reminds me again that I still know very little about type annotations. mypy doesn't like Config[str, Config].

My "solution" will be to delete __class_getitem__ so this syntax isn't even possible. The working solution probably includes Generic and TypeVar and who knows what. Oh well... I thought this was worth trying.

This allows configs to be added and deleted.

eriknw · 2024-03-13T22:33:43Z

The latest commit added the ability to define a subclass with strict=False to allow configs to be added and deleted. It works, but it does make things a little more complicated.

eriknw · 2024-03-14T23:17:12Z

btw, I'm currently dyslexic/confused when it comes to single ` vs double `` when writing docs. Please check the usage of them in this PR and guide me in the right direction, thanks.

dschult

I looked for single/double backticks issues and found none. But that doesn't mean they aren't there. :) I also looked through the recent changes and I think this is ready to go in. We should keep an eye on it especially early on to see if anything should be fixed or tweaked to make it better.

We've got two approvals. I'll give @rossbar some time to look if he gets a chance. But will likely merge later today. Ross can make a separate PR with corrections and so forth if needed.

rossbar

I still think this would be better served by living in the nx.backends namespace rather than being available in nx, but I don't feel strongly enough that it should be a blocker! If others are happy then +1 from me!

…kx#7225) * Add `nx.backend_config` dict for configuring dispatching and backends * Rename `nx.backend_config` to `nx.config` * "fallback_to_nx" is for testing, not for user config * Move config of backends to e.g. `nx.config["backends"]["cugraph"]` * How do you like this mypy?! * Rename `automatic_backends` to `backend_priority` (and env variables) * Create a class to handle configuration * Oops thanks mypy * Fix to work with more strict config * Support (and test) default values * Remove `__class_getitem__` and add docstring * Allow `strict=False` when defining subclasses. This allows configs to be added and deleted. * Move `__init_subclass__`

Add nx.backend_config dict for configuring dispatching and backends

52f96fe

Schefflera-Arboricola reviewed Jan 14, 2024

View reviewed changes

eriknw added 4 commits January 18, 2024 19:47

Merge branch 'main' into backend_config

023c47f

Rename nx.backend_config to nx.config

0510588

Merge branch 'main' into backend_config

ee437ad

"fallback_to_nx" is for testing, not for user config

93ba39e

eriknw changed the title ~~Add nx.backend_config dict for configuring dispatching and backends~~ Add nx.config dict for configuring dispatching and backends Jan 30, 2024

rossbar added the Dispatching Related to dispatching and backend support label Jan 31, 2024

eriknw added 3 commits February 1, 2024 20:41

Merge branch 'main' into backend_config

276a371

Move config of backends to e.g. nx.config["backends"]["cugraph"]

dc8366a

How do you like this mypy?!

438c83a

dschult added the type: Maintenance label Feb 4, 2024

dschult approved these changes Feb 4, 2024

View reviewed changes

eriknw mentioned this pull request Feb 10, 2024

Automatically cache results from simple functions #7283

Draft

rlratzel approved these changes Feb 12, 2024

View reviewed changes

eriknw mentioned this pull request Feb 14, 2024

WIP: add to_backend_graph and from_backend_graph #7294

Draft

MridulS approved these changes Feb 18, 2024

View reviewed changes

eriknw added 2 commits February 29, 2024 14:00

Merge branch 'main' into backend_config

9980f6a

Rename automatic_backends to backend_priority (and env variables)

ba886c7

Merge branch 'main' into backend_config

40a27a7

rossbar reviewed Mar 5, 2024

View reviewed changes

eriknw added 2 commits March 10, 2024 13:50

Merge branch 'main' into backend_config

ae480b2

Create a class to handle configuration

c81595e

Oops thanks mypy

c4b7892

eriknw added 2 commits March 10, 2024 21:55

Fix to work with more strict config

65c8583

Support (and test) default values

dd0096e

eriknw mentioned this pull request Mar 13, 2024

ENH: Cache graphs objects when converting to a backend #7345

Merged

eriknw commented Mar 13, 2024

View reviewed changes

eriknw added 3 commits March 13, 2024 12:37

Merge branch 'main' into backend_config

eb7ea72

Remove __class_getitem__ and add docstring

2323189

Allow strict=False when defining subclasses.

f45b1d2

This allows configs to be added and deleted.

Move __init_subclass__

71d37e7

dschult approved these changes Mar 15, 2024

View reviewed changes

rossbar approved these changes Mar 15, 2024

View reviewed changes

dschult merged commit f336cf2 into networkx:main Mar 15, 2024
41 checks passed

jarrodmillman added this to the 3.3 milestone Mar 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `nx.config` dict for configuring dispatching and backends #7225

Add `nx.config` dict for configuring dispatching and backends #7225

eriknw commented Jan 13, 2024 •

edited

Schefflera-Arboricola Jan 14, 2024

eriknw Jan 14, 2024

Schefflera-Arboricola Jan 14, 2024 •

edited

dschult Mar 15, 2024

Schefflera-Arboricola Jan 14, 2024

eriknw Jan 14, 2024

Schefflera-Arboricola Jan 14, 2024

eriknw commented Jan 29, 2024

dschult left a comment

eriknw commented Feb 7, 2024

dschult commented Feb 7, 2024

eriknw commented Feb 7, 2024

rlratzel Feb 12, 2024

eriknw Feb 12, 2024

MridulS Feb 18, 2024

MridulS left a comment

eriknw commented Feb 29, 2024

rossbar left a comment

dschult commented Mar 6, 2024

eriknw commented Mar 11, 2024 •

edited

eriknw Mar 13, 2024

eriknw commented Mar 13, 2024

eriknw commented Mar 14, 2024

dschult left a comment

rossbar left a comment

Add nx.config dict for configuring dispatching and backends #7225

Add nx.config dict for configuring dispatching and backends #7225

Conversation

eriknw commented Jan 13, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Schefflera-Arboricola Jan 14, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eriknw commented Jan 29, 2024

dschult left a comment

Choose a reason for hiding this comment

eriknw commented Feb 7, 2024

dschult commented Feb 7, 2024

eriknw commented Feb 7, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MridulS left a comment

Choose a reason for hiding this comment

eriknw commented Feb 29, 2024

rossbar left a comment

Choose a reason for hiding this comment

dschult commented Mar 6, 2024

eriknw commented Mar 11, 2024 • edited

Choose a reason for hiding this comment

eriknw commented Mar 13, 2024

eriknw commented Mar 14, 2024

dschult left a comment

Choose a reason for hiding this comment

rossbar left a comment

Choose a reason for hiding this comment

Add `nx.config` dict for configuring dispatching and backends #7225

Add `nx.config` dict for configuring dispatching and backends #7225

eriknw commented Jan 13, 2024 •

edited

Schefflera-Arboricola Jan 14, 2024 •

edited

eriknw commented Mar 11, 2024 •

edited