
can we let z5 write to object storage directly #129

Open
halehawk opened this issue Aug 16, 2019 · 9 comments

Comments

@halehawk

We have now integrated z5 into CESM (an earth system model). We want to test it on the cloud, writing out to object storage directly. Do you know whether we can do this with the current z5, or whether we need additional setup to write to object storage?

@constantinpape
Owner

> We have now integrated z5 into CESM (an earth system model).

Great!

> We want to test it on the cloud, writing out to object storage directly.

What exactly do you have in mind? aws-s3?

> Do you know whether we can do this with the current z5, or whether we need additional setup to write to object storage?

z5 does not support cloud object stores yet. However, there is an n5 Java implementation for aws-s3 and Google buckets, and zarr-python also supports some cloud stores.

I am very interested in supporting this directly in z5 as well and would be happy to help out if you or @weilewei wanted to contribute to this.

It would be good if you could elaborate on your use-case a bit more. What exactly do you need? Would an implementation along the lines of n5-aws serve your purposes?

@clbarnes
Contributor

I guess the easiest MVP would be to refactor the C++ end to handle the chunking and compression, communicating with Python via lists of (block_index, bytes). That would work for block-aligned reads and writes; for non-aligned IO you'd have to negotiate with Python to get the edge blocks (index-bytes tuples) to pass in to C++.

However, with that in place, you could expand fairly rapidly to any object storage Python supports via optional dependencies. I suspect you could even reuse some of the utilities in zarr-python for handling them.
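
A minimal C++ interface sketch of the (block_index, bytes) exchange described above; all names (encodeBlocks, decodeBlocks, BlockList) are hypothetical and only illustrate the idea, they are not actual z5 code:

```cpp
// Hypothetical sketch: the C++ side only handles chunking and compression,
// while the Python side moves raw (block_index, bytes) pairs to and from
// whatever object store it has a client for.
#include <cstddef>
#include <map>
#include <vector>

using BlockIndex = std::vector<std::size_t>;          // chunk grid coordinates
using BlockBytes = std::vector<char>;                 // compressed chunk payload
using BlockList  = std::map<BlockIndex, BlockBytes>;  // crosses the C++/Python boundary

// Compress a block-aligned region into chunks; the caller (e.g. a pybind11
// wrapper) hands the resulting BlockList to a Python store.
BlockList encodeBlocks(const float* data,
                       const std::vector<std::size_t>& shape,
                       const std::vector<std::size_t>& chunkShape);

// Decompress chunks fetched by Python back into a contiguous array. For
// non-aligned reads, Python would fetch the edge chunks first and pass them
// in here so the C++ side can crop out the requested region.
void decodeBlocks(const BlockList& blocks,
                  const std::vector<std::size_t>& roiBegin,
                  const std::vector<std::size_t>& roiShape,
                  float* out);
```

With a boundary like this, the actual transport (s3fs, google-cloud-storage, etc.) stays an optional Python dependency.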

@halehawk
Author

halehawk commented Aug 17, 2019 via email

@constantinpape
Owner

@clbarnes

> I guess the easiest MVP would be to refactor the C++ end to handle the chunking and compression, communicating with Python via lists of (block_index, bytes).

Yes, that would probably be the fastest way to get something working, but I think it would add more value if we implemented a complete C++ solution (which can then be wrapped for Python). That way z5 would allow access to zarr/n5 cloud storage from C++, which is currently not available.

@halehawk

> I am thinking about aws-s3. I hope that z5 can have this feature directly.

Ok, I also think that aws-s3 is the first cloud storage that should be implemented.

> I’d like to help but I don’t know how to help yet.

From the implementation perspective, looking into the AWS C++ SDK is a good starting point (see the sketch after this comment).

> For details, I need to look into it and discuss with other people who are familiar with object storage.

Great, please share any feedback that you get.
In the meantime I will have a look into how to integrate cloud storage support in the z5 C++ codebase.
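
For reference, a minimal sketch of writing a single chunk through the AWS C++ SDK; the bucket name and object key are placeholders, and a real z5 backend would stream the compressed chunk bytes instead of a dummy string:

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/memory/stl/AWSStringStream.h>
#include <aws/s3/S3Client.h>
#include <aws/s3/model/PutObjectRequest.h>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        // picks up credentials from the usual environment / config chain
        Aws::S3::S3Client client;

        Aws::S3::Model::PutObjectRequest request;
        request.SetBucket("my-z5-test-bucket");    // placeholder bucket
        request.SetKey("data.zarr/volume/0.0.0");  // one chunk key in zarr layout

        // In a real backend this would be the compressed chunk produced by z5.
        auto body = Aws::MakeShared<Aws::StringStream>("z5-chunk");
        *body << "compressed chunk bytes";
        request.SetBody(body);

        auto outcome = client.PutObject(request);
        if (!outcome.IsSuccess()) {
            // handle or log outcome.GetError().GetMessage()
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```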

@halehawk
Author

halehawk commented Aug 19, 2019 via email

@constantinpape
Owner

constantinpape commented Aug 19, 2019

I had a look into how to integrate an AWS (or other cloud storage) backend into the z5 C++ API.
It took some refactoring, but I arrived at an implementation that should work, see #130.

The main idea is to put the backend implementations into separate namespaces. You can find the implementation for the default filesystem backend and a mock-up implementation for AWS there.

Now we would need to actually implement the AWS part using the AWS SDK. Any help here would be very welcome! Let me know if there are any questions.

Note that the changes in the C++ API are breaking, so merging this would mean bumping the version to 2. Also, I haven't adapted the Python bindings yet, but that should be fairly straightforward.
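
To make the namespace idea concrete, here is a rough illustration only; the real layout and signatures are defined in #130, not here:

```cpp
// Illustration of per-backend namespaces: the same chunk-level operation
// exists once per backend, with identical signatures.
#include <string>
#include <vector>

namespace z5 {
namespace filesystem {
    // write a compressed chunk to <root>/<chunkKey> on the local filesystem
    void writeChunk(const std::string& root,
                    const std::string& chunkKey,
                    const std::vector<char>& data);
}
namespace s3 {
    // same operation, but 'root' is a bucket/prefix and the write goes
    // through the AWS SDK (PutObject) instead of the filesystem
    void writeChunk(const std::string& root,
                    const std::string& chunkKey,
                    const std::vector<char>& data);
}
}
```

Higher-level dataset code would then dispatch on the backend handle and never need to know which storage it is talking to.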

@halehawk
Author

halehawk commented Aug 20, 2019 via email

@constantinpape
Owner

> We just use S3 instead of the boost filesystem in your mock-up implementation for AWS.

Exactly.

> I need to figure out how to get an Amazon S3 account now.

I will have a look into getting an AWS account and setting up some test data too.

@constantinpape constantinpape added this to the Major release 2.0.0 milestone Aug 22, 2019
@constantinpape
Owner

Short update on this:
I have now finished implementing the new C++ API, adapted the Python bindings, and merged everything into master; see also #133.
I have made separate issues for implementing an AWS-S3 backend (#136) and a Google Cloud Storage backend (#137) for more technical discussion.
Any help on this is very welcome!
