use fsspec simplecache #323

Draft
wants to merge 1 commit into
base: master

@rabernat
Contributor

rabernat commented Mar 9, 2022

This PR would get rid of our custom local file copying logic in favor of fsspec's simplecache mechanism.

After filing intake/intake-xarray#116 I realized how easy it should be. @martindurant has been saying this for a while. 🙃

@rabernat
Contributor Author

rabernat commented Mar 9, 2022

Comprehensive test suite FTW! The test failure is due to HTTP credentials not being passed through via fsspec_open_kwargs.

Here is a minimal reproducer.

import aiohttp
import fsspec 

url = "http://httpbin.org/basic-auth/foo/bar"
auth = aiohttp.BasicAuth(login='foo', password='bar')

with fsspec.open(url, auth=auth) as fp:
    data = fp.read()  # works
    
with fsspec.open("simplecache::" + url, auth=auth) as fp:
    data = fp.read()  # ClientResponseError

@martindurant - would you consider this an fsspec issue? Or is there a way to pass the credentials through?

@martindurant
Contributor

Should be

with fsspec.open("simplecache::" + url, http=dict(auth=auth)) as fp:
    data = fp.read()  # should now authenticate

because the URL now has two components, and fsspec needs to know which of these to send the kwargs to.

@rabernat
Contributor Author

rabernat commented Mar 9, 2022

Great, thanks for chiming in!

The problem is that in Pangeo Forge (in contrast to the toy example I shared), we don't necessarily know that it's an HTTP link. It could be FTP, or anything else. We could try to parse that out, but that feels fragile. Is there a different way of invoking the simplecache?
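
For reference, the "parse that out" option might look roughly like the sketch below. `protocol_kwargs` is a hypothetical helper, not part of fsspec or Pangeo Forge. It assumes a single, un-chained URL and falls back to `file` when no scheme is present, which is part of why this route feels fragile.

```python
def protocol_kwargs(url, open_kwargs):
    """Nest open_kwargs under the URL's protocol so a chained open can route them.

    Hypothetical helper for illustration only; assumes `url` is not itself chained.
    """
    # Everything before the first "://" is treated as the protocol.
    protocol = url.split("://", 1)[0] if "://" in url else "file"
    return {protocol: open_kwargs}


# Intended use (not executed here, since it needs fsspec and network access):
#   fsspec.open("simplecache::" + url, **protocol_kwargs(url, fsspec_open_kwargs))
routed = protocol_kwargs("ftp://example.com/data.nc", {"username": "foo"})
print(routed)
```

Any URL that is already chained, or that relies on an implicit local path, would need extra handling beyond this.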

@martindurant
Contributor

Ah, I see. The system isn't designed for passing arbitrary arguments to "the second filesystem", but I'll have a think about what can be done.

fsspec uses the following to parse the URL pieces:

x = re.compile(".*[^a-z]+.*")
bits = (
    [p if "://" in p or x.match(p) else p + "://" for p in path.split("::")]
    if "::" in path
    else [path]
)
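
As a rough, self-contained illustration of what that snippet does to the chained URL from the reproducer, here is the same expression wrapped in a function. The name `split_chain` is made up for this sketch and is not an fsspec function.

```python
import re


def split_chain(path):
    """Split a chained URL on '::' using the same expression quoted above."""
    # Matches any piece containing a character other than a lowercase letter.
    x = re.compile(".*[^a-z]+.*")
    return (
        [p if "://" in p or x.match(p) else p + "://" for p in path.split("::")]
        if "::" in path
        else [path]
    )


bits = split_chain("simplecache::http://httpbin.org/basic-auth/foo/bar")
print(bits)
# ['simplecache://', 'http://httpbin.org/basic-auth/foo/bar']
```

A bare protocol name such as `simplecache` gets `://` appended, while the piece that already contains `://` is left alone, so each piece ends up with its own protocol prefix.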

@martindurant
Copy link
Contributor

I suppose you could do the following:

In [16]: fs, _ = fsspec.core.url_to_fs(url, auth=auth)

In [17]: fs2 = fsspec.filesystem("simplecache", fs=fs)

In [18]: with fs2.open(url) as f:
    ...:     print(f.read())
    ...:
b'{\n  "authenticated": true, \n  "user": "foo"\n}\n'

@rabernat
Contributor Author

rabernat commented Mar 9, 2022

What about this?

import fsspec.implementations.cached  # make sure the submodule is loaded

fs, _, paths = fsspec.get_fs_token_paths(url, storage_options={'auth': auth})
cache_fs = fsspec.implementations.cached.CachingFileSystem(fs=fs)

with cache_fs.open(paths[0]) as fp:
    data = fp.read()

@martindurant
Contributor

I'm pretty sure that amounts to the same thing.
