Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upath: implement poc for flavour base vendoring #200

Merged
merged 25 commits into from Mar 3, 2024

Conversation

ap--
Copy link
Collaborator

@ap-- ap-- commented Feb 21, 2024

Closes #199
Closes #198
Closes #195

POC implementation.
Still need to think about if this is actually the best way forward.

@ap-- ap-- self-assigned this Feb 21, 2024
@ap--
Copy link
Collaborator Author

ap-- commented Feb 23, 2024

This is going in the right direction, and will improve the UPath internals by quite a bit.

UPath._flavour is a descriptor, that retrieves the protocol of the instance and adjusts the pathlib_abc.FlavourBase interface accordingly.

WrappedFileSystemFlavour is a wrapper around fsspec.AbstractFileSystem classes or instances and provides the FlavourBase implementation.

To be able to provide the correct flavour behavior, we generate code from all known_implementations of fsspec classes. The interface needed is very small. Three methods: _strip_protocol, _parent, _get_kwargs_from_urls as well as sep, root_marker, protocol. This allows us to avoid any imports and in the future will allow manipulating fsspec URI using PureUPath instances without requiring the relevant fsspec filesystem dependency.

This could even make the subclassing and dispatch mechanism obsolete (at least for PureUPath classes, the concrete UPath classes might still have to fix accessor behavior).

There are 4 remaining issues:

  • three windows tests fail currently? (might be some issue in the flavour, where i need to use os.path instead of posixpath)
  • I had to change _parse_path and _format_path_parts to instance methods instead of classmethods (they are classmethods on pathlib.Path). This is needed so that the _flavour descriptor can get the protocol from the current instance. Interestingly this is not a limitation in pathlib_abc.PathBase. Also in pathlib.Path 3.12 those classmethods are only called on instances. My guess is that this is intentional. It allows us to do this change easily. I just need to consider if I should provide some fallback mechanism if someone calls these on the class. (Should be doable with a custom descriptor and the default_flavour which is not protocol specific).
  • I need to check if it's nicer to define the flavour config settings on the specific UPath classes. I could grab the settings via LazyFlavourDescriptor.__set_name__ and set it permanently. This could also invert the registry and import mechanism, which could now become non-lazy... (need to think about this one a bit) Best to postpone this until we find a better use case. This would couple the protocols to the classes. (Which is currently done indirectly via the registry. We should wait how the pathlib.PathBase refactor develops, and check if flavours should be configurable on the instance ??zarr-style uri chaining??)
  • I need to write a whole bunch of tests around the flavours and especially the deprecated behavior. Would be easier to just force everyone to the new interface...

It's not a lot a work, but it has to be done well, to avoid that this will all become a problem when we move to pathlib_abc.PathBase. We can initially introduce a PureUPath with pathlib_abc.PurePathBase without changing any public behavior.

@ap-- ap-- marked this pull request as ready for review March 3, 2024 22:05
@ap-- ap-- merged commit 3646cd2 into fsspec:main Mar 3, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant