access anonymous / public AWS S3 object #197

Open
fils opened this issue Jul 22, 2021 · 2 comments
@fils
Contributor

fils commented Jul 22, 2021

With Dask I can do:

df = dd.read_parquet('s3://bucket/key', storage_options={'anon': True})

and it works for a public bucket / object on AWS S3.

Trying the same with kglab:

kg = kglab.KnowledgeGraph()

kg.load_parquet('s3://bucket/key', storage_options={'anon': True})

returns: NoCredentialsError: Unable to locate credentials

I'm curious what the right way is to pass the anon: True option.

@ceteri
Collaborator

ceteri commented Jul 22, 2021

Great point @fils !

Would it work to wrap these S3 URLs within some of the other libraries for working with them? In the load_parquet method there's support for using:

Although I haven't had a really good use case yet to test with for AWS – much of our testing is on GCP at the moment.

FWIW, we tried to integrate pathy as well, although we ran into some installation problems. If that'd work better, we could revisit pathy?
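
As a rough sketch of the wrapping idea above, one option might be to open the object anonymously with fsspec and hand the open file to load_parquet. This assumes s3fs is installed alongside fsspec and that load_parquet accepts a readable file-like object, neither of which is confirmed in this thread. The helper name load_anonymous_parquet and its open_fn parameter are hypothetical, used only for illustration.

```python
from typing import Any, Callable, Optional


def load_anonymous_parquet(
    kg: Any,
    s3_url: str,
    open_fn: Optional[Callable[..., Any]] = None,
) -> None:
    """Open a public S3 object without credentials and load it into `kg`.

    Hypothetical helper: assumes `kg.load_parquet` accepts a readable
    file-like object.
    """
    if open_fn is None:
        # Imported lazily so the helper can be defined without fsspec installed.
        import fsspec  # needs the s3fs backend for s3:// URLs

        open_fn = fsspec.open

    # Extra keyword arguments to fsspec.open are passed to the filesystem;
    # for S3, anon=True requests unsigned (credential-free) access.
    with open_fn(s3_url, mode="rb", anon=True) as handle:
        kg.load_parquet(handle)
```

With a kg = kglab.KnowledgeGraph() in hand, the call would be load_anonymous_parquet(kg, 's3://bucket/key'). The open_fn argument is only there so a different opener can be swapped in without touching the rest.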

@fils
Contributor Author

fils commented Jul 22, 2021

@ceteri I likely lack the depth of experience to suggest a path. :)

What little I do know makes me think fsspec sounds interesting, if only because I am learning Dask and there seems to be a relationship there.

I could sidestep this rather easily in many ways. Crudely, I could simply pull down the Parquet file and load it locally, or just use my credentials. Anonymous AWS access is perhaps an edge case given the issues it could raise for a data provider's wallet.

Our use case is that it might be nice to allow people to explore with some small data without any need for credentials and we have to be using AWS S3... so here we are.

Anonymous access for kg.load_parquet could have its uses. If you have suggestions on a path for now, I'd take any guidance.
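
The crude workaround mentioned above (pull the file down, then load it locally) could look roughly like the sketch below. It relies on the fact that a publicly readable S3 object can also be fetched over plain HTTPS with no credentials, using the virtual-hosted-style address. The function names public_s3_https_url and download_public_object are hypothetical, and the sketch assumes a bucket name without dots and a key without '?' or '#' characters.

```python
import shutil
import tempfile
import urllib.request
from urllib.parse import quote, urlparse


def public_s3_https_url(s3_url: str) -> str:
    """Translate s3://bucket/key into its virtual-hosted-style HTTPS URL."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    # Percent-encode the key but keep "/" so the key's path layout survives.
    key = quote(parsed.path.lstrip("/"))
    return f"https://{parsed.netloc}.s3.amazonaws.com/{key}"


def download_public_object(s3_url: str) -> str:
    """Fetch a public object over HTTPS and return the local file path."""
    with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as tmp:
        with urllib.request.urlopen(public_s3_https_url(s3_url)) as response:
            shutil.copyfileobj(response, tmp)
    return tmp.name
```

The returned path is an ordinary local file name, so it could then be passed along as kg.load_parquet(download_public_object('s3://bucket/key')). This only works when the object itself is publicly readable; it does nothing for private buckets.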

@ceteri added this to To do in kglab on Aug 29, 2021