
Add more Store implementations #9

Open
Hirevo opened this issue Oct 13, 2019 · 6 comments
Labels
C-enhancement Category: Enhancement good first issue Good for newcomers help wanted Extra attention is needed P-medium Priority: Medium

Comments

@Hirevo
Owner

Hirevo commented Oct 13, 2019

This is a tracking issue for additional Store implementations.

Here is a non-exhaustive list of potential candidates for a Store implementation:

  • S3Storage:
    This would store crate tarballs and rendered README pages into Amazon's S3.
  • RemoteStorage:
    Requires implementing a companion server.
    The idea is to have a reserved folder on another machine running that companion server; you would then point Alexandrie to that server, and it would store crate tarballs and README pages on that machine on behalf of the registry.
    The point of this is to offer an easy way to outsource crate storage for people who have multiple machines available to them.
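To make the list above concrete, here is a minimal sketch of what a Store-style abstraction over such backends could look like, with a trivial in-memory backend as a reference. The trait and names used here (`Store`, `get_crate`, `store_crate`, `MemStore`) are hypothetical and do not reflect Alexandrie's actual trait:

```rust
use std::collections::HashMap;
use std::io;

// Hypothetical sketch of a `Store`-like trait; the real Alexandrie
// trait may differ in names and signatures.
pub trait Store {
    /// Retrieve the tarball for `name` at `version`.
    fn get_crate(&self, name: &str, version: &str) -> io::Result<Vec<u8>>;
    /// Store the tarball for `name` at `version`.
    fn store_crate(&mut self, name: &str, version: &str, data: Vec<u8>) -> io::Result<()>;
}

/// A trivial in-memory backend, useful as a reference implementation.
pub struct MemStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl MemStore {
    pub fn new() -> Self {
        MemStore { blobs: HashMap::new() }
    }

    fn key(name: &str, version: &str) -> String {
        format!("{}/{}", name, version)
    }
}

impl Store for MemStore {
    fn get_crate(&self, name: &str, version: &str) -> io::Result<Vec<u8>> {
        self.blobs
            .get(&Self::key(name, version))
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such crate"))
    }

    fn store_crate(&mut self, name: &str, version: &str, data: Vec<u8>) -> io::Result<()> {
        self.blobs.insert(Self::key(name, version), data);
        Ok(())
    }
}
```

An S3Storage or RemoteStorage backend would implement the same trait against its remote service instead of a local map.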
@Hirevo Hirevo added C-enhancement Category: Enhancement good first issue Good for newcomers help wanted Extra attention is needed P-medium Priority: Medium labels Oct 13, 2019
@Hirevo Hirevo added this to the Core feature-set milestone Oct 13, 2019
@Hirevo Hirevo pinned this issue Oct 13, 2019
@Hirevo Hirevo changed the title from "Add more Store implementors" to "Add more Store implementations" Oct 13, 2019
@danieleades
Contributor

I think using Artifactory for binary storage would be useful for a lot of enterprise users.

@jgallagher-cs
Contributor

Hi, and thanks for this project! I'm evaluating it for internal use at my company, and I'd like to add support for S3 storage. I've started by trying to use rusoto_s3. Implementing the "get" methods is no problem, but implementing the "store" methods seems to be pretty painful:

  • The put_object method wants a StreamingBody as its payload.
  • Constructing a StreamingBody (which is a type alias for rusoto_core::ByteStream) appears to need a Stream<Item = Result<bytes::Bytes, std::io::Error>> + Send + Sync + 'static. That already seems difficult to satisfy with the current store type bound of data: T where T: io::Read, both because of the inherent pain of converting a blocking reader into a stream and because the extra Send + Sync + 'static bounds are missing. But on top of that...
  • Apparently S3 puts actually require you to know the content size up front. So in addition to the type-bound restrictions and the sync-to-async conversion, we also have to know the full size ahead of time.

It seems like the only way to satisfy that is to read the full data into a Vec<u8>, which can trivially be converted into a ByteStream. I found another S3 library that has a method taking a blocking std::io::Read for puts, and that's exactly what it does.

Looking higher up the call stack, the store methods are called from alexandrie/src/crates/publish.rs, and it looks like the callers have already read the full data into memory. So passing a T: Read down to the store methods, which would read it into a new Vec, would just be making a copy. That doesn't seem terrible, but it doesn't seem ideal either. It looks like with a small amount of work, the caller could pass an owned Vec<u8> down to the store methods, which would avoid the double copy.

Do you have any interest in changing the store methods to take a Vec<u8> instead of a T: Read? Or would you rather keep T: Read but need to make a copy when storing to S3? (Or if you see another avenue out of this, that'd be great too!)
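The buffering workaround described above can be sketched as follows. `buffer_payload` is a hypothetical helper, not part of Alexandrie or rusoto; it shows why a `T: Read` parameter forces a full in-memory copy before an S3 put can even be attempted:

```rust
use std::io::{self, Read};

// With a `T: Read` parameter, an S3 backend has to buffer the whole
// payload before it can build a streaming body: rusoto's `ByteStream`
// implements `From<Vec<u8>>`, and S3 needs the content size up front.
fn buffer_payload<T: Read>(mut data: T) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    data.read_to_end(&mut buf)?; // full in-memory copy; size now known
    // with rusoto, one would then do:
    // let body = rusoto_core::ByteStream::from(buf);
    Ok(buf)
}
```

If the caller already holds a `Vec<u8>`, passing it down directly makes this copy unnecessary.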

@Hirevo
Owner Author

Hirevo commented Jul 17, 2020

@jgallagher-cs Hello, and thank you for the interest and willingness to help!
It indeed would be really cool to have the ability to leverage S3 as a storage option.

Reading your comment made me realize that the Storage abstraction is trying to do more than what is actually needed from it.
You're completely right that we have no use for streaming bytes into the storage without having buffered the entire crate already.
I most likely made the mistake of adding this trait method as a way-too-early guess that it could be useful one day. But, as you've shown, even if streaming were needed or desired, the current implementation is not the right one.

I am in favor of changing the API as you suggested, by simply taking a Vec<u8>.
I think it models better what the registry really does and needs.

@jgallagher-cs
Contributor

👍 Great, I'll make that change.

Do you want S3 support gated behind a Cargo feature? It ends up pulling in quite a few dependencies, so I would assume "yes", but features don't really play nicely with workspaces (e.g., rust-lang/cargo#7507). The docs for building with S3 support would be something like

cd alexandrie
cargo build --release --features s3

:(
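Assuming the feature is wired up the usual way for optional dependencies, the Cargo.toml side could look roughly like this (the crate versions and feature name below are illustrative only, not Alexandrie's actual manifest):

```toml
# Hypothetical sketch: gating the S3 backend behind a Cargo feature.
[dependencies]
rusoto_core = { version = "0.44", optional = true }
rusoto_s3 = { version = "0.44", optional = true }

[features]
# Enabling `s3` pulls in the optional rusoto dependencies.
s3 = ["rusoto_core", "rusoto_s3"]
```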

@Hirevo
Owner Author

Hirevo commented Jul 23, 2020

I would tend to be in favor of putting it behind a feature flag.
Since the user already has to choose a database implementation using a feature flag, I would say that is a price we have already paid.
Also, I think that handling features in workspaces is going to improve in usability in the future, according to this issue.

jgallagher-cs added a commit to jgallagher-cs/alexandrie that referenced this issue Jul 24, 2020
This is primarily to simplify the S3 implementation (which would
otherwise have to read the data into a local Vec, effectively making an
in-memory copy since we already had the data in memory anyway). See
discussion at
Hirevo#9 (comment).
@Xuanwo

Xuanwo commented Sep 18, 2023

Hi, I'm the committer of OpenDAL, a data access layer that allows users to easily and efficiently retrieve data from various storage services in a unified way.

Do you think it's a good idea to integrate OpenDAL as a Store implementation, so that users can use s3, azblob, gcs, fs, or hdfs as the storage backend?
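For illustration, such an integration could route reads and writes through an OpenDAL Operator, with the service choice (S3, Azblob, Gcs, Fs, Hdfs, ...) driven by configuration. This is only a sketch under that assumption: the builder method names follow OpenDAL's documented API but vary between releases, and the paths shown are made up, so treat it as illustrative rather than working code:

```rust
// Hypothetical sketch of backing a Store implementation with OpenDAL.
// The `Fs` service is used here for simplicity; swapping in `S3` or
// another service would leave the read/write code unchanged.
use opendal::{services, Operator};

async fn demo() -> opendal::Result<()> {
    // Builder configuration is illustrative; exact methods depend on
    // the OpenDAL release in use.
    let builder = services::Fs::default().root("/tmp/alexandrie-storage");
    let op = Operator::new(builder)?.finish();

    // Store and retrieve a crate tarball through the unified API.
    op.write("crates/serde/1.0.0.crate", b"tarball".to_vec()).await?;
    let _data = op.read("crates/serde/1.0.0.crate").await?;
    Ok(())
}
```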

Projects
None yet
Development

No branches or pull requests

4 participants