WACZ futurism: mimetype and Pronom ID #41

Open
DiegoPino opened this issue Dec 3, 2020 · 4 comments

@DiegoPino

Good morning!

Just an idea. As early adopters of the WACZ format, we were thinking it would be nice to have a specific way to identify WACZ files (before any processing). Requesting a new media type from IANA seems a bit out of scope (or not?), but WACZ inherits from the data package spec, and a media type can carry extra parameters, so maybe we can build on that.

See https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types and https://github.com/frictionlessdata/specs/tree/master/data-package, where the data package media type is application/vnd.datapackage+json. That seems to be for the JSON descriptor, though; I'm not sure about the gzipped version of the full package (I may be lacking coffee here). Some options:

application/vnd.datapackage+json;parameter=value

or

application/zip;parameter=value?

application/vnd.datapackage+zip or +gzip?

parameter = content?
value = webarchive?

Too liberal?

Anyhow. Just ideas. On our side we can start by using application/vnd.datapackage+gzip
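
To make the "parameter=value" idea concrete, here is a minimal sketch of how a consumer could split such a media type into its base type and parameters using only the Python standard library. The helper name `parse_media_type` and the `content=webarchive` parameter are illustrative only; neither is part of any registered media type or of the WACZ spec.

```python
from email.message import Message


def parse_media_type(value: str):
    """Split a media type string into (base_type, params).

    Hypothetical helper for illustration; it reuses the stdlib
    Content-Type header parser instead of hand-rolled splitting.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns [(base_type, ""), (name, value), ...];
    # the first entry is the type itself, so skip it.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


if __name__ == "__main__":
    # "content=webarchive" is the speculative parameter floated above.
    print(parse_media_type("application/zip;content=webarchive"))
```

A client that does not understand the parameter would still see a plain `application/zip`, which is the main appeal of the parameter approach over a brand new type.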

PRONOM goes a bit further by registering file content characteristics. It would still be important for digital preservation to have at least some discussion; maybe someone from that realm could give us a hint.

Thanks

@edsu
Collaborator

edsu commented Nov 24, 2021

Since effort is now underway to make WACZ into more of a standard I think working towards an IETF media type might actually be a good idea. Registering a media type doesn't require the standard be developed at IETF.

@ikreymer
Member

ikreymer commented Dec 3, 2021

PRONOM also came up today in another conversation. I'm not entirely sure what is needed for that, but happy to explore if there is interest. It looks like the submission form is here: https://www.nationalarchives.gov.uk/PRONOM/submit.htm

@DiegoPino
Author

DiegoPino commented Dec 3, 2021 via email

@tballison

We're adding WACZ detection (maybe parsing?) over on Apache Tika now. As a temporary placeholder at least, is application/wacz appropriate?

https://issues.apache.org/jira/browse/TIKA-3696
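
For anyone wanting to experiment before a media type is settled, content-based detection could be sketched roughly as below. This is not how Tika does it. It assumes a WACZ is a ZIP file with a `datapackage.json` at the root and at least one entry under `archive/`, and the function name `looks_like_wacz` is made up for this example.

```python
import io
import zipfile


def looks_like_wacz(source) -> bool:
    """Heuristic check: is `source` (a path or binary file object) a WACZ?

    Assumption: a WACZ is a ZIP with datapackage.json at the root
    and at least one entry under archive/.
    """
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
    has_descriptor = "datapackage.json" in names
    has_archive = any(name.startswith("archive/") for name in names)
    return has_descriptor and has_archive


if __name__ == "__main__":
    # Build a tiny in-memory ZIP with the assumed WACZ layout.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("datapackage.json", "{}")
        zf.writestr("archive/data.warc.gz", b"")
    print(looks_like_wacz(buf))
```

A check like this only inspects the ZIP directory, so it stays cheap even for large archives, but a plain data package that happens to contain an `archive/` folder would also match.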
