Interfacing with WASAPI #12

Open
machawk1 opened this issue Aug 13, 2020 · 1 comment

Comments

@machawk1
Contributor

Some of the collection-based retrieval aspects of this specification are particularly interesting, like the ability to specify particular pageIDs of interest.

As you are very well aware, @ikreymer, WASAPI is an abstracted spec for WARC retrieval with a few specifications. I can imagine a WACZ layer to make WASAPI implementations a bit more usable from both a macro and collection-based querying standpoint, as it seems to provide some standard semantics.

Because you have solicited thoughts in this repo, I wondered about consideration of interfacing with WASAPI and/or potentially providing endpoints or routes that align with WACZ.

I am looking forward to further discussion.

@ikreymer
Member

Yes, WASAPI is a data transfer API, while WACZ is designed to be a storage specification, so there isn't any overlap, but they could definitely complement one another!

I think a main limitation of WASAPI is that it allows you to download a bunch of WARCs in bulk, but then what do you do with them?
A tool could use WASAPI to download WARCs in bulk and then assemble them into a WACZ file, which could be a stable format that could then be instantly usable in replayweb.page or added to other storage.
I believe WASAPI is also missing support for any metadata, such as page/seed lists, which would probably also need to be added.
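
To make the download-then-assemble idea above a bit more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not an agreed design: the `files`, `filename`, and `locations` keys follow the general shape of a WASAPI file listing as I understand it, the helper names `warc_locations` and `pack_wacz_like` are hypothetical, and the resulting ZIP only contains `archive/` and `pages/pages.jsonl`, so it is not a complete WACZ (no index or package metadata). The page list has to come from somewhere other than WASAPI, which is the gap mentioned above.

```python
import io
import json
import zipfile


def warc_locations(wasapi_response):
    """Map each filename to its first download URL in a WASAPI-style listing.

    Assumes a response shaped like {"files": [{"filename": ..., "locations": [...]}]}.
    Entries without any location are skipped.
    """
    return {
        f["filename"]: f["locations"][0]
        for f in wasapi_response.get("files", [])
        if f.get("locations")
    }


def pack_wacz_like(warcs, pages):
    """Bundle already-downloaded WARCs and a page list into one ZIP (as bytes).

    warcs: dict of filename -> raw WARC bytes
    pages: list of dicts, e.g. {"url": "https://example.com/"}
    """
    buf = io.BytesIO()
    # Stored without extra compression, since .warc.gz files are already gzipped.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in warcs.items():
            zf.writestr(f"archive/{name}", data)
        # One JSON object per line; the exact page schema is an assumption here.
        page_lines = "\n".join(json.dumps(page) for page in pages)
        zf.writestr("pages/pages.jsonl", page_lines + "\n")
    return buf.getvalue()
```

A tool would fetch each URL returned by `warc_locations` (with whatever authentication the WASAPI provider requires), pass the downloaded bytes to `pack_wacz_like`, and write the result to disk. For large collections, streaming to a file rather than building the ZIP in memory would be the more practical choice.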
