Use Case: How to get the contents of a RO as a zip file? #228

Open
dgarijo opened this issue Jan 12, 2023 · 4 comments · Fixed by #255 · May be fixed by #296
use-case A (potential) use-case for ROLite creation, consumption or integration

Comments

@dgarijo
Contributor

dgarijo commented Jan 12, 2023

As a programmer, I want to obtain the aggregated contents of a Research Object as a downloadable resource.

Ideally, I would like to do so through a request with content negotiation, but I do not see an agreement on how to serve the RO-Crate itself. Can we agree on something like application/zip?
Can we have some community-agreed guidelines?
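
As a rough illustration of the request side, the sketch below builds (without sending) a GET request that asks for a ZIP representation through the Accept header. The URL `https://example.org/ro/my-crate` and the helper name `build_crate_request` are hypothetical, and whether a server actually answers such a request with application/zip is exactly the open question in this issue.

```python
import urllib.request


def build_crate_request(pid_url, media_type="application/zip"):
    """Build, but do not send, a GET request for one representation of a crate."""
    return urllib.request.Request(
        pid_url,
        headers={"Accept": media_type},  # content negotiation via the Accept header
        method="GET",
    )


# Hypothetical persistent identifier, used only for illustration.
req = build_crate_request("https://example.org/ro/my-crate")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `media_type="application/ld+json"` instead would ask for the metadata document rather than the archive.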

@dgarijo dgarijo added the use-case A (potential) use-case for ROLite creation, consumption or integration label Jan 12, 2023
@stain
Contributor

stain commented Mar 23, 2023

https://signposting.org/adopters/#workflowhub documents how we do this with Signposting in WorkflowHub. Could we generalize this?

Let's make a new section on retrieving an RO-Crate and move some of the content negotiation described in https://www.researchobject.org/ro-crate/1.2-DRAFT/profiles#how-to-retrieve-a-profile-crate
into it, perhaps allowing both application/zip and application/ld+json.

We can then add signposting, particularly where the persistent identifier has an HTML landing page (which may be ro-crate-preview.html as suggested by Profile Crate) -- see #160

See also #149

@stain
Contributor

stain commented May 18, 2023

Not sure we should close this, as we don't detail what to expect in the zip file.

@dgarijo -- is the text in https://www.researchobject.org/ro-crate/1.2-DRAFT/root-data-entity.html#root-data-entity-identifier sufficient for 1.2 to close this?

Here's one take with BagIt: https://trefx.uk/trusted-wfrun-crate/0.3/#archive-serialisation which assumes a single folder (with arbitrary name) that again contains bagit.txt and manifest-sha512.txt with checksums and then data/ro-crate-metadata.json -- I'm trying to formalize this into an update of https://github.com/ResearchObject/bagit-ro profile but it is mostly already in https://www.researchobject.org/ro-crate/1.2-DRAFT/appendix/implementation-notes.html#adding-ro-crate-to-bagit
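
To make that BagIt layout concrete, here is a minimal sketch that writes such an archive in memory: one top-level folder holding bagit.txt, manifest-sha512.txt, and data/ro-crate-metadata.json. The function name `build_bagit_zip`, the folder name `my-crate`, and the empty `@graph` are assumptions for illustration only; a real bag would carry more payload files and possibly tag manifests.

```python
import hashlib
import io
import json
import zipfile


def build_bagit_zip(metadata, folder="my-crate"):
    """Return the bytes of a ZIP with a single top-level folder laid out as a bag."""
    payload = json.dumps(metadata, indent=2).encode("utf-8")
    digest = hashlib.sha512(payload).hexdigest()
    files = {
        # Bag declaration (BagIt 1.0 wording).
        "bagit.txt": b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        # Payload manifest: checksum, whitespace, path relative to the bag root.
        "manifest-sha512.txt": f"{digest}  data/ro-crate-metadata.json\n".encode("utf-8"),
        "data/ro-crate-metadata.json": payload,
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(f"{folder}/{name}", content)
    return buffer.getvalue()


# Placeholder metadata document; a real crate would describe its entities here.
archive = build_bagit_zip({"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": []})
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(sorted(zf.namelist()))
```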

Workflow RO-Crate takes a different approach, where the ZIP file has no top-level directory at all (that is, ro-crate-metadata.json and the other files sit directly in the ZIP root). This is easy to access programmatically, but may surprise users of a classic unzip, as the current directory will be filled with multiple files. (I think the Windows/macOS integrations will make a folder for you.)

ROHub also exports directly with ro-crate-metadata.json in the root.
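
A consumer that has to cope with both layouts could inspect the archive's entry names before extracting anything. The following is only a sketch: the labels "plain", "bagit", and "unknown" and the function names are my own, not terms from any specification.

```python
import io
import zipfile

METADATA = "ro-crate-metadata.json"


def classify_layout(names):
    """Guess the archive layout from its entry names."""
    names = set(names)
    if METADATA in names:
        return "plain"  # metadata file directly in the ZIP root
    top_levels = {name.split("/", 1)[0] for name in names}
    if len(top_levels) == 1:
        folder = next(iter(top_levels))
        if {f"{folder}/bagit.txt", f"{folder}/data/{METADATA}"} <= names:
            return "bagit"  # single folder wrapping a bag
    return "unknown"


def layout_of_zip(zip_bytes):
    """Classify a ZIP archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return classify_layout(zf.namelist())


print(classify_layout([METADATA, "workflow.ga"]))
print(classify_layout(["crate/bagit.txt", "crate/manifest-sha512.txt", f"crate/data/{METADATA}"]))
```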

As I listed in https://trefx.uk/trusted-wfrun-crate/0.3/#zip-expectations, certain ZIP features should not be used (e.g. multipart, for floppies!), whereas ZIP64 extensions are needed for archives larger than 2 GB, etc. These are documented fairly well in https://www.w3.org/publishing/epub32/epub-ocf.html#sec-zip-container-zipreqs

@stain stain reopened this May 18, 2023
@stain
Contributor

stain commented May 18, 2023

I'm starting to think that we need multiple profiles, depending on whether it's a BagIt-wrapping ZIP, a "plain" RO-Crate, or a detached RO-Crate JSON-LD.

A ZIP archive with ro-crate-metadata.json in the root:

Link: <https://example.com/workflows/419/ro_crate.zip> ;
      rel="item" ;
      type="application/zip" ;
      profile="https://w3id.org/ro/crate#archive" 

(or make a new w3id PID space for that)

A bagit zip according to https://www.researchobject.org/ro-crate/1.2-DRAFT/appendix/implementation-notes.html#adding-ro-crate-to-bagit aka foo-something/data/ro-crate-metadata.json:

Link: <https://example.com/workflows/419/bagit.zip> ;
      rel="item" ;
      type="application/zip" ;
      profile="https://w3id.org/ro/bagit/profile/0.3" 

An RO-Crate Metadata Document straight on the web (Detached or Attached):

Link: <https://example.com/workflows/419/ro-crate-metadata.json> ;
      rel="item" ;
      type="application/ld+json" ;
      profile="https://w3id.org/ro/crate" 

And then only the final one corresponds to the profile registered in https://www.iana.org/assignments/profile-uris/profile-uris.xhtml as a JSON-LD profile.
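
For anyone wanting to emit such headers from a server, a small formatter might look like the sketch below. `format_link` is a hypothetical helper, and it assumes parameter values contain no double quotes (no escaping is done).

```python
def format_link(target, **params):
    """Format one Link header value: <target>; name="value"; ..."""
    parts = [f"<{target}>"]
    parts += [f'{name}="{value}"' for name, value in params.items()]
    return "; ".join(parts)


print(
    "Link: "
    + format_link(
        "https://example.com/workflows/419/ro_crate.zip",
        rel="item",
        type="application/zip",
        profile="https://w3id.org/ro/crate#archive",
    )
)
```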

In any of these cases, when retrieving, the profile will be provided as a Link header as described in https://trefx.uk/trusted-wfrun-crate/0.3/#media-type-and-profiles

GET http://example.com/crates/42.zip HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/zip
Link: <https://w3id.org/ro/crate#archive>; rel="profile"

Or from a landing page, with signposting as above:

HEAD http://example.com/crates/42.html HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/html
Link: <https://example.com/query-12389.zip>; rel="item"; type="application/zip"
Link: <https://w3id.org/ro/crate>; rel="profile"; type="application/zip";
   anchor="https://example.com/query-12389.zip"
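
On the client side, picking the ZIP out of such a landing-page response could be sketched roughly as follows. This is a simplified parser for illustration, not a full RFC 8288 implementation; `parse_links` and `zip_items` are hypothetical names, and the header values in the example are written with `;` between all parameters.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_links(header_values):
    """Turn Link header values into dicts with 'target' plus link parameters."""
    links = []
    for value in header_values:
        for target, params in _LINK.findall(value):
            attrs = {name.lower(): (quoted or bare) for name, quoted, bare in _PARAM.findall(params)}
            attrs["target"] = target
            links.append(attrs)
    return links


def zip_items(links):
    """Return (zip_url, profile_url_or_None) for every rel="item" ZIP link."""
    profiles = {
        link["anchor"]: link["target"]
        for link in links
        if "profile" in link.get("rel", "").split() and "anchor" in link
    }
    return [
        (link["target"], profiles.get(link["target"]))
        for link in links
        if "item" in link.get("rel", "").split() and link.get("type") == "application/zip"
    ]


headers = [
    '<https://example.com/query-12389.zip>; rel="item"; type="application/zip"',
    '<https://w3id.org/ro/crate>; rel="profile"; anchor="https://example.com/query-12389.zip"',
]
print(zip_items(parse_links(headers)))
```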

@dgarijo
Contributor Author

dgarijo commented May 18, 2023

Hmm, you may be correct, although it complicates things a little.

From my end, I am interested in knowing what to prepare when someone asks for one of my ROs with permanent ids.
For example, for https://w3id.org/dgarijo/ro/sepln2022 I set up JSON-LD (the RO-Crate metadata file) and the HTML. But I did not find a recommendation on how to create the zip file when I last browsed the spec.

The text in https://www.researchobject.org/ro-crate/1.2-DRAFT/root-data-entity.html#root-data-entity-identifier points me to https://www.researchobject.org/ro-crate/1.2-DRAFT/profiles.html#how-to-retrieve-a-profile-crate, but it is not clear how I should structure the contents of the zip file.

Also, should my root data entity contain a link to the zip file with the downloadable RO-Crate? Maybe using the schema.org distribution properties used for datasets.
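
To illustrate what I have in mind, the sketch below builds the two JSON-LD entities such a link might involve, using schema.org's `distribution`, `DataDownload`, and `encodingFormat`. The download URL and the function name are hypothetical, and whether this is the recommended pattern is the question being asked, not something the sketch settles.

```python
import json


def root_entity_with_download(zip_url):
    """Return a root data entity and the DataDownload entity it points to."""
    root = {
        "@id": "./",
        "@type": "Dataset",
        "distribution": {"@id": zip_url},  # schema.org Dataset -> DataDownload
    }
    download = {
        "@id": zip_url,
        "@type": "DataDownload",
        "encodingFormat": "application/zip",
    }
    return root, download


# Hypothetical download location, used only for illustration.
root, download = root_entity_with_download("https://example.org/ro/my-crate.zip")
print(json.dumps([root, download], indent=2))
```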

2 participants