S3 dataset access #243

Open
robmarkcole opened this issue May 2, 2024 · 2 comments

@robmarkcole

Hi,
I understand the dataset can be streamed from S3. Following the example in the docs, I get the error below, so I assume access must be granted?

 > aws s3 ls s3://clay-tiles-02/02/27WXN/

An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
@brunosan
Member

brunosan commented May 7, 2024

Hi Rob!!
I think the right move here is to copy a representative sample of embeddings to source.coop.

I don't think it makes sense to publicly host a copy of the whole training set when it is just a cropped selection of data that is already available. E.g. for v1 we have 50M chips, and we are moving towards streaming from the source COGs into the GPUs during training anyway: https://github.com/Clay-foundation/stacchip

In the meantime I've just activated requester pays on this bucket.
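
For anyone scripting this rather than using the CLI, here is a minimal sketch of listing a requester-pays prefix from Python. It assumes boto3 is installed and AWS credentials are configured, because requester-pays requests must be signed and anonymous requests are rejected. The helper name `list_requester_pays` is hypothetical, and the sketch does not by itself guarantee access: the caller's identity still needs permission to list the bucket.

```python
def list_requester_pays(client, bucket, prefix):
    """Return all keys under `prefix`, marking each request as requester-pays.

    `client` is expected to behave like a boto3 S3 client
    (i.e. expose `list_objects_v2`).
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix, "RequestPayer": "requester"}
    while True:
        resp = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in resp.get("Contents", []))
        if not resp.get("IsTruncated"):
            return keys
        # Follow pagination until the listing is complete.
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]


# Usage (requires boto3 and configured credentials; the requester is billed):
#   import boto3
#   s3 = boto3.client("s3")
#   for key in list_requester_pays(s3, "clay-tiles-02", "02/27WXN/"):
#       print(key)
```

Passing the client in as an argument keeps the helper independent of how credentials are loaded, so it can be tried against a stub before making billed requests.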

brunosan closed this as completed May 7, 2024
@robmarkcole
Author

@brunosan I get an error:

⚡ ~/Clay-Foundation-Model aws s3 ls s3://clay-tiles-02/02/27WXN/ --request-payer requester

An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

brunosan reopened this May 8, 2024