Use Case: Describe/include software containers #39

stain · 2019-08-12T09:36:51Z

As an open science researcher, I want to provide Docker/Singularity container images so that others can reliably reproduce my results or reuse the same software.

This implies that the container images and their recipes (e.g. Dockerfile) should be included in the RO-Crate and typed as such, so users know they can be executed.

It is also desirable to use tooling to expand the description with a list of the dependencies installed in the container; this will help provide light-weight software citations.

Related efforts to align with:


stain · 2019-08-12T12:07:33Z

Example descriptions generated by extract-dockerfile

From a Dockerfile we describe a ContainerRecipe (specializes SoftwareSourceCode):

{
    "@context": "http://www.schema.org",
    "@type": "ContainerRecipe",
    "name": "vsoch/salad",
    "description": "A Dockerfile build recipe",
    "containerImage": "gliderlabs/alpine:3.4",

    "labels": [
        [
            "MAINTAINER toasterlint \"henry@toasterlint.com"
        ]
    ],
    "environment": [
        "RPCPORT=4000"
    ],
    "entrypoint": [
        "/entrypoint"
    ]
}

(see openschemas/specifications#10)

From a Docker image we describe a ContainerImage:

{
    "environment": [
        "SRC_DIR=/go/src/github.com/vsoch/salad/"
    ],
    "entrypoint": [
        "/code/salad"
    ],
    "description": "A Dockerfile build recipe",
    "name": "vanessa/sregistry",
    "ContainerImage": "iron/go:dev",
    "operatingSystem": "linux",
    "softwareVersion": "sha256:8d1e7f244db9e7cb85d5867bb3230f756460900e5801ff2303e44a79369640f4",
    "identifier": [
        "vanessa/sregistry:latest"
    ],
    "url": "https://hub.docker.com/r/vanessa/sregistry",
    "alternateName": "Singularity Registry",
    "softwareHelp": "https://singularityhub.github.io/sregistry",
    "citation": "http://joss.theoj.org/papers/050362b7e7691d2a5d0ebed8251bc01e",
    "license": "https://github.com/singularityhub/sregistry/blob/master/LICENSE",
    "keywords": "container, containers, singularity, singularity registry",
    "softwareRequirements": [
        "Pip > xmlsec==1.3.3"
    ],
    "@context": "http://www.schema.org",
    "@type": "ImageDefinition"
}

Above, extract-dockerfile has actually extracted the softwareRequirements of pip installs from inside the container.

(However, this type is called ContainerImage rather than ImageDefinition, so some stability with upstream specs would be needed; see openbases/extract-dockerfile#6.)
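The softwareRequirements entries above could feed light-weight software citations. A minimal Python sketch, assuming every entry follows the `<installer> > <package>==<version>` shape of the single example in this thread; the function name and the returned field names are hypothetical and not part of extract-dockerfile:

```python
import json


def parse_requirements(description):
    """Split softwareRequirements strings into installer, package and version.

    Assumes the "<installer> > <package>==<version>" shape of the one example
    in this thread; entries that do not match keep only their raw text.
    """
    parsed = []
    for entry in description.get("softwareRequirements", []):
        installer, separator, spec = entry.partition(" > ")
        if not separator:
            parsed.append({"raw": entry})
            continue
        package, _, version = spec.partition("==")
        parsed.append({
            "installer": installer.strip(),
            "package": package.strip(),
            "version": version.strip() or None,
            "raw": entry,
        })
    return parsed


# Trimmed copy of the ImageDefinition description shown above.
example = json.loads("""
{
    "@type": "ImageDefinition",
    "name": "vanessa/sregistry",
    "softwareRequirements": ["Pip > xmlsec==1.3.3"]
}
""")

for requirement in parse_requirements(example):
    print(requirement)
```

Keeping the raw string alongside the parsed fields means nothing is lost if the tool emits a shape this sketch does not anticipate.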

vsoch · 2019-08-12T13:03:08Z

See the discussion in openbases/extract-dockerfile#6 about the name. My preference is for what is represented in https://openschemas.github.io/specifications/ because (as you correctly bring up) an ImageDefinition could refer to other kinds of images, whereas ContainerImage is clearer.

dgarijo · 2019-08-12T16:31:40Z

This is interesting! Would this need to be related to cwl as well? (which defines how to invoke the image as opposed to the definition of the image itself)

In Dockerpedia they have done a thorough extraction of images, although it's not aligned with schema. Maybe we can use their service for extraction too. An example: https://dockerpedia.inf.utfsm.cl/resource/SoftwareImage/dockerpedia-pegasus_workflow_images_latest

vsoch · 2019-08-12T18:01:20Z

I don't think it would be wise to "hard code" (so to speak) any particular workflow manager or description (e.g., cwl, snakemake, nextflow) directly into the specification. On the other hand, if there is an appropriate field to describe this same entity, it would be logical to include (e.g., if I find that it's snakemake, I should look for a Snakefile somewhere...)

For CWL, is there a definitive specification for interaction? For example, for a scif container, you can be absolutely sure how to discover applications inside (singularity run container.sif apps) and then how to run / inspect / shell / otherwise interact with an application you just found (e.g., singularity run container.sif run <app>).

dgarijo · 2019-08-12T19:00:41Z

CWL has a field for pulling from a docker container. Maybe that could be the hook.
My point is not necessarily to use a particular workflow spec. What I want to record is how the app in the container is supposed to be invoked and how to pass on the files. Since cwl describes this, it could be a starting point.

vsoch · 2019-08-12T19:03:34Z

Yes, understood! To be more clear, there are many different tools that describe in a structured way how a container (or app inside) is supposed to be invoked. Actually, those two things are different - cwl could describe an app in a container (and it would have to be provided via the entrypoint so the user could run it to find it) while SCIF describes how to invoke the container itself (of which cwl could be one or more entrypoints).

But from how you describe it - that there is a field for pulling the container, this sounds like it would need to be stored outside of the container, which is another point to discuss. SCIF is a specification that describes standard interaction with a container, and is installed inside the container, along with the SCIF filesystem and other metadata files that are defined for each app.

craig-willis · 2019-10-03T22:09:56Z

This is a necessary use case for Whole Tale. A few questions:

What about RO-Crates with repo2docker compatible configurations?
In the case of a Docker image, is the idea that the RO-Crate would contain a tar archive of the image or a reference to the image in a registry (or either)?
While not containers in the same sense, sciunit and reprozip also produce re-executable packages that could be parts of RO-Crates. Are these in scope?

vsoch · 2019-10-05T13:21:04Z

Having a repo2docker configuration is an interesting and useful idea, but I think it would be done in addition to a container recipe - repo2docker in and of itself doesn't translate to reproducibility - it just means that (assuming a version of repo2docker is available) you could build a container for it. You can think of it like an extra layer to essentially create a Dockerfile (that could be built). It also assumes a user "jovyan", which makes things a bit challenging when converting to Singularity (e.g., for use on HPC) because of the cardinal rule "the user inside the container is the user outside the container."

Re-reading what @stain mentioned - it sounds like he wants the full container, in which case Docker wouldn't be as feasible as it means layers that need to be assembled and require the Docker daemon. A Singularity (sif) binary would be more reasonable, albeit large, and still require Singularity to run. It's really the case that any level of recipe without the container runs the risk of not being able to be built, so probably providing the container somewhere is needed. In the case of Singularity, the recipe file is kept inside the container as well. In the case of Docker, the recipe (and other metadata) would serve as an external way to peep inside without invoking the container.

I'm not super familiar with RO-Crates, but reading the description:

"RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata."

it does sound like a wrapper (with metadata) to a container is wanted? The container, considered as some kind of data, could also fit into the specification, and as @stain showed, metadata could be extracted for the jsonld.
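Treating the container as one more data file with metadata attached could look roughly like the Python sketch below. It is only an illustration: the `container_entity` helper, the `containers/sregistry.sif` path, and the `@graph` layout are assumptions, not something taken from the RO-Crate specification or from extract-dockerfile.

```python
import json


def container_entity(path, extracted):
    """Describe a container file as a JSON-LD entity.

    `path` is the file's location relative to the package root and becomes
    the "@id". `extracted` is a description like the ones shown earlier in
    this thread. Its own "@context" is dropped because the wrapping document
    below declares one context for all entities.
    """
    entity = {key: value for key, value in extracted.items() if key != "@context"}
    entity["@id"] = path
    return entity


# Trimmed copy of the extracted description from earlier in the thread.
extracted = {
    "@context": "http://www.schema.org",
    "@type": "ImageDefinition",
    "name": "vanessa/sregistry",
    "operatingSystem": "linux",
}

# Hypothetical wrapper document; the file path is made up for illustration.
crate = {
    "@context": "http://www.schema.org",
    "@graph": [container_entity("containers/sregistry.sif", extracted)],
}
print(json.dumps(crate, indent=2))
```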

jmfernandez · 2019-10-09T17:48:45Z

"Re-reading what @stain mentioned - it sounds like he wants the full container, in which case Docker wouldn't be as feasible as it means layers that need to be assembled and require the Docker daemon."

Indeed, with docker save you can generate a tar file containing the layers of one or more tagged Docker images, which can later be used to generate a Singularity image with singularity import.
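Such a tar file can also be inspected without the Docker daemon. The Python sketch below assumes the archive carries a top-level manifest.json listing `RepoTags` and `Layers` for each image, which is how docker save archives are commonly laid out; the helper names are hypothetical, and a small stand-in archive is written so the example runs without Docker.

```python
import io
import json
import tarfile
import tempfile


def list_saved_images(archive_path):
    """Return (tags, layer count) pairs read from an image tar archive.

    Assumes a top-level manifest.json holding a list of objects with
    "RepoTags" and "Layers" keys.
    """
    with tarfile.open(archive_path) as archive:
        manifest = json.load(archive.extractfile("manifest.json"))
    return [
        (entry.get("RepoTags") or [], len(entry.get("Layers", [])))
        for entry in manifest
    ]


def write_stand_in_archive(path):
    """Write a tiny fake archive so the sketch runs without Docker."""
    manifest = json.dumps([{
        "RepoTags": ["vanessa/sregistry:latest"],
        "Layers": ["layer-1/layer.tar", "layer-2/layer.tar"],
    }]).encode()
    info = tarfile.TarInfo("manifest.json")
    info.size = len(manifest)
    with tarfile.open(path, "w") as archive:
        archive.addfile(info, io.BytesIO(manifest))


with tempfile.TemporaryDirectory() as tmp:
    archive_path = tmp + "/image.tar"
    write_stand_in_archive(archive_path)
    print(list_saved_images(archive_path))
```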

I also agree the container recipe is worth saving (or referencing along with a fingerprint), as the base image of the recipe could contain a bug, and you would like to re-create it.
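One way to produce such a fingerprint is a SHA-256 digest of the recipe file. This is a minimal Python sketch: the `fingerprint` helper is hypothetical, and the `sha256:` prefix is only chosen to mirror the softwareVersion value shown earlier in the thread.

```python
import hashlib
import os
import tempfile


def fingerprint(path, chunk_size=1 << 20):
    """Return a "sha256:<hex>" digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# Demonstration on a throwaway recipe file; a real crate would point at
# its own Dockerfile or Singularity recipe instead.
with tempfile.TemporaryDirectory() as tmp:
    recipe_path = os.path.join(tmp, "Dockerfile")
    with open(recipe_path, "wb") as handle:
        handle.write(b"FROM gliderlabs/alpine:3.4\n")
    print(fingerprint(recipe_path))
```

Recording the digest next to a reference lets a consumer check that the recipe they fetched is byte-for-byte the one that was described.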

stain added the use-case A (potential) use-case for ROLite creation, consumption or integration label Aug 12, 2019

stain changed the title ~~Use Case: ...~~ Use Case: Describe software containers Aug 12, 2019

stain changed the title ~~Use Case: Describe software containers~~ Use Case: Describe/include software containers Aug 12, 2019

stain mentioned this issue Aug 12, 2019

spython is missing DockerRecipe openbases/extract-dockerfile#3

Closed

ThomasThelen mentioned this issue Nov 2, 2020

Allow exporting of whole renku projects SwissDataScienceCenter/renku-python#1327

Open
