Use fsspec instead of pyfilesystem2 #102

Open
dhirschfeld opened this issue Apr 22, 2021 · 9 comments

@dhirschfeld

Before fsspec existed I used pyfilesystem2 and was very happy with it. It's a great library; however, it (apparently) didn't meet all the requirements for dask, so fsspec was built, primarily to support dask. It's also used in intake and as a generic filesystem API. As such, it has a robust community around it and is continually improving and maturing.

Because it comes from the distributed computing world, fsspec has first-class support for cloud storage and, in particular (for my use case), Azure Data Lake.

I haven't actually used the cloud storage plugins in pyfilesystem2, but they don't seem to have a lot of development momentum behind them, unlike fsspec.

To better support cloud filesystems, I think it would be great if jupyter-fs could make use of fsspec rather than pyfilesystem2.

@dhirschfeld
Author

TBF, I think fsspec still isn't quite as mature as pyfilesystem2 and doesn't have quite as polished an API; however, it does seem to have much better support for the use cases I care about.

@dhirschfeld
Author

xref: #7

@telamonian
Collaborator

I've been talking to @martindurant about fsspec for a while now (he's the creator). My current preference is to not throw the baby out with the pyfilesystem bathwater, and instead include some kind of support for both pyfilesystem2 and fsspec. Martin has actually been kind enough to get an implementation of fsspec for jupyter-fs started in his changes branch here.

@dhirschfeld I don't have much bandwidth to work on jupyter-fs right now, and most of my effort is currently going towards the new tree-finder based filebrowser. But if you want to take a crack at it, I would not say no to an fsspec PR.

@dhirschfeld
Author

It's an itch I'd like to scratch, but realistically I won't have time to look at it any time soon.

I'm using fsspec to access data on cloud storage from JupyterLab, and I thought it would be nice to be able to browse that same storage from within JupyterLab to, e.g., check whether my f.write(data) call really worked. There's slight friction in having to switch to the Azure Portal to check whether the files that should have been written to cloud storage really were written.

Unfortunately, since it's a "nice-to-have" rather than a "can't live without", I won't be able to invest time in it in the medium term - I can't even keep up with my can't-live-withouts :/

@reoono
Contributor

reoono commented Mar 26, 2022

Is there any update on this?

If not, I would like to work on this issue, to use fsspec for protocols not supported by pyfilesystem2.

> My current preference is to not throw the baby out with the pyfilesystem bathwater, and instead include some kind of support for both pyfilesystem2 and fsspec.

Based on the above comments, I am considering either of the following policies, but would appreciate comments if you have a preference.

  1. Select the backend for each resource via a setting, as in the following example.
    (For backward compatibility, use pyfilesystem2 if the setting is absent.)
{
  "resources": [
    {
      "name": "explicit_pyfilesystem2_resource",
      "url": "osfs:///Users/foo/test",
      "backend": "pyfilesystem2"
    },
    {
      "name": "implicit_pyfilesystem2_resource",
      "url": "osfs:///Users/foo/test"
    },
    {
      "name": "fsspec_resource",
      "url": "s3://test",
      "backend": "fsspec"
    }
  ]
}
  2. Check if the protocol is supported by pyfilesystem2, and if so, use pyfilesystem2.
    Otherwise, use fsspec.
    https://github.com/PyFilesystem/pyfilesystem2/blob/master/fs/opener/registry.py#L93

If there is no preference, I would like to proceed with the first option (the explicit backend setting), since it leaves more room for future expansion.
Any comments or suggestions would be appreciated.
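
For illustration, the two policies above could be sketched roughly as follows. This is not code from jupyter-fs: the function names are hypothetical, and `EXAMPLE_PYFS_PROTOCOLS` is a hand-picked subset standing in for whatever pyfilesystem2's opener registry reports at runtime.

```python
from urllib.parse import urlsplit

# Illustrative subset only (an assumption): the real list would be read from
# pyfilesystem2's opener registry rather than hard-coded.
EXAMPLE_PYFS_PROTOCOLS = frozenset({"osfs", "ftp", "mem", "zip", "tar"})

DEFAULT_BACKEND = "pyfilesystem2"
KNOWN_BACKENDS = ("pyfilesystem2", "fsspec")


def backend_from_setting(resource):
    """Policy 1: honour an explicit "backend" key, defaulting to pyfilesystem2."""
    backend = resource.get("backend", DEFAULT_BACKEND)
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}")
    return backend


def backend_from_protocol(resource, pyfs_protocols=EXAMPLE_PYFS_PROTOCOLS):
    """Policy 2: use pyfilesystem2 when it knows the URL scheme, else fsspec."""
    protocol = urlsplit(resource["url"]).scheme
    return "pyfilesystem2" if protocol in pyfs_protocols else "fsspec"


# The resources from the settings example above.
resources = [
    {"name": "explicit_pyfilesystem2_resource", "url": "osfs:///Users/foo/test", "backend": "pyfilesystem2"},
    {"name": "implicit_pyfilesystem2_resource", "url": "osfs:///Users/foo/test"},
    {"name": "fsspec_resource", "url": "s3://test", "backend": "fsspec"},
]

for resource in resources:
    print(resource["name"], backend_from_setting(resource), backend_from_protocol(resource))
```

With an explicit key, a user could also point a protocol that both libraries understand at fsspec, which the protocol-based policy cannot express.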

@martindurant

Note that fsspec instances generally need more configuration. Whilst it is possible to set the default values for any particular protocol, it is very conceivable to want different configurations for, e.g., an owned bucket, a public bucket and a requestor-pays bucket on S3 (or even a different S3-compatible service).
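
To make that concrete, here is a rough sketch of how one protocol might need several option sets. The keyword names (`anon`, `requester_pays`, `client_kwargs`) are options accepted by s3fs; the set names, the helper functions and the endpoint URL are made up for illustration.

```python
# Hypothetical named option sets for a single protocol ("s3").
S3_OPTION_SETS = {
    "owned": {},  # credentials picked up from the environment
    "public": {"anon": True},
    "requester_pays": {"requester_pays": True},
    "s3_compatible": {"client_kwargs": {"endpoint_url": "https://s3.example.invalid"}},
}


def storage_options(kind):
    """Return a copy of the keyword arguments for the named option set."""
    return dict(S3_OPTION_SETS[kind])


def make_s3_filesystem(kind):
    """Create an S3 filesystem configured for the named option set."""
    import fsspec  # lazy import: requires fsspec and s3fs to be installed

    return fsspec.filesystem("s3", **storage_options(kind))


for kind in S3_OPTION_SETS:
    print(kind, storage_options(kind))
```

A single per-protocol default cannot cover all four cases at once, which is why the options probably need to live on each resource rather than on the protocol.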

@reoono
Contributor

reoono commented Apr 1, 2022

Thank you for your comment.
I believe that the feature will be worthwhile even with default values at first, since it will also support protocols that are not yet supported by pyfilesystem2.
Therefore, I would like to proceed initially with default values, as is done with pyfilesystem2 today.
As for more detailed configurations, I would be willing to consider them if necessary.

(Not related to the issue, but I also find fsspec useful on a daily basis.
Thank you for developing a very cool and useful product.)

@reoono
Contributor

reoono commented Apr 7, 2022

I have started to implement fsspec support.

Since fsspec.core.url_to_fs() is used internally to create instances, I have come to think that making 'kwargs' configurable in addition to 'backend' would solve the problem you mentioned.
(I would like to be able to pass options such as client_kwargs.)

Of course, as an interface to JupyterLab's settings, this would be redundant. However, this is not a big problem because this option is only for users who want to do complicated things.
(Basic users will still be able to use it with the same settings.)
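
A minimal sketch of that idea, assuming a per-resource "kwargs" key that is forwarded unchanged to `fsspec.core.url_to_fs()`. The key name, the helper names and the second resource are assumptions for illustration, not the actual implementation.

```python
def split_resource(resource):
    """Return (url, kwargs) for a resource entry; the "kwargs" key is optional."""
    return resource["url"], dict(resource.get("kwargs", {}))


def resource_to_fs(resource):
    """Build (filesystem, path) for a resource whose backend is fsspec."""
    from fsspec.core import url_to_fs  # lazy import: fsspec is an optional dependency here

    url, kwargs = split_resource(resource)
    return url_to_fs(url, **kwargs)


# Basic users keep the same settings as before: no "kwargs" key at all.
basic = {"name": "fsspec_resource", "url": "s3://test", "backend": "fsspec"}

# Hypothetical advanced resource pointing at an S3-compatible service.
advanced = {
    "name": "s3_compatible_resource",
    "url": "s3://test",
    "backend": "fsspec",
    "kwargs": {"client_kwargs": {"endpoint_url": "https://s3.example.invalid"}},
}

print(split_resource(basic))
print(split_resource(advanced))
```

Calling `resource_to_fs(advanced)` would need fsspec and s3fs installed; the split helper alone shows that a resource without "kwargs" behaves exactly as it does today.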

@martindurant

Thanks @reoono, let me know if I can help.


5 participants