Add new command nf-core rocrate to create a Research Object (RO) crate for a pipeline #2680

Open
wants to merge 26 commits into
base: dev
from

Conversation

mashehu
Contributor

@mashehu mashehu commented Jan 23, 2024

Example crate from the rnaseq pipeline:
ro-crate-metadata.json

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 65 lines in your changes are missing coverage. Please review.

Comparison is base (31c61ca) 73.41% compared to head (d0e03b1) 73.39%.
Report is 23 commits behind head on dev.

Files                                      Patch %   Lines
nf_core/rocrate.py                         73.71%    46 Missing ⚠️
nf_core/__main__.py                        31.81%    15 Missing ⚠️
nf_core/components/components_command.py   0.00%     4 Missing ⚠️

@mashehu
Contributor Author

mashehu commented Jan 24, 2024

@nf-core-bot changelog: Add new command nf-core rocrate to create a Research Object (RO) crate for a pipeline

@mashehu mashehu requested a review from ewels January 24, 2024 16:48
@mashehu mashehu marked this pull request as ready for review January 24, 2024 16:49
Member

@ewels ewels left a comment

Super nice! 👏🏻

Couple of minor comments and haven't tried running myself, but from a quick run through of the code I think it looks great 👍🏻

nf_core/rocrate.py (outdated)
self.add_main_authors(wf_file)
wf_file.append_to("programmingLanguage", {"@id": "#nextflow"})
# get keywords from nf-core website
remote_workflows = requests.get("https://nf-co.re/pipelines.json").json()["remote_workflows"]
Member

Nice 👍🏻
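As a minimal sketch of the keyword lookup in the snippet above: once the JSON is fetched, the keywords for one pipeline can be picked out of `remote_workflows`. The `name` and `topics` fields, the helper name `keywords_for_pipeline`, and the sample data are assumptions for illustration, not taken from this PR.

```python
from typing import Any


def keywords_for_pipeline(pipelines: dict[str, Any], pipeline_name: str) -> list[str]:
    """Return the topic keywords listed for one pipeline, or [] if it is not found."""
    for workflow in pipelines.get("remote_workflows", []):
        # Assumption: each entry carries a "name" and a "topics" list.
        if workflow.get("name") == pipeline_name:
            return list(workflow.get("topics", []))
    return []


# Hand-written stand-in for the parsed JSON; a real run would fetch it with requests.
sample = {
    "remote_workflows": [
        {"name": "rnaseq", "topics": ["rna", "rna-seq"]},
        {"name": "sarek", "topics": ["variant-calling"]},
    ]
}
print(keywords_for_pipeline(sample, "rnaseq"))
```

Returning an empty list for an unknown pipeline keeps the crate creation going for pipelines that are not listed on the website.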

def add_main_authors(self, wf_file):
    """
    Add workflow authors to the crate
    NB: We don't have much metadata here - scope to improve in the future
Member

Had some discussion about revisiting the CITATIONS.cff file and other things. For example, in theory I think that we can even associate ORCiD identifiers here and all sorts?

It'd also be cool if we could scrape GitHub contributors or something 🤔

Member

Nooo way, you did ORCiD lookups?? haha, amazing 🙇🏻

Contributor Author

@mashehu mashehu Jan 25, 2024

Now added all contributors to "main.nf" according to git, if they have an "identifiable" name (i.e. one with a space in it): e8959ef
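The "identifiable name" filter described here could look roughly like the sketch below. The function name `identifiable_contributors` and the sample names are hypothetical; in practice the names might come from something like `git log --format=%an -- main.nf`, which is an assumption about how the list is obtained rather than what the commit does.

```python
def identifiable_contributors(names: list[str]) -> list[str]:
    """Keep unique names that contain a space, preserving first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        cleaned = name.strip()
        # A space inside the name is used as a rough "first name + surname" heuristic.
        if " " in cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Hypothetical author names as they might appear in the git history of main.nf
git_names = ["Jane Doe", "jdoe42", "Jane Doe", " nf-core-bot ", "Max Mustermann"]
print(identifiable_contributors(git_names))
```

The heuristic drops handles and bot accounts, but it also drops real people who commit under a single-word name.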

nf_core/rocrate.py
Comment on lines 119 to 120
Args:
path (Path): Path to the pipeline directory
Member

Suggested change
Args:
path (Path): Path to the pipeline directory

# Conform to RO-Crate 1.1 and workflowhub-ro-crate
self.crate.update_jsonld(
    {
        "@id": "ro-crate-metadata.json",
Member

If metadata_fn is None, would this be an error?

Contributor Author

I changed the behavior of this parameter so that it is now only the path, because ro-crate-metadata.json seems to be the standard name for it.

if fn.endswith(".png"):
    log.debug(f"Adding workflow image file: {fn}")
    self.crate.add_jsonld({"@id": Path(fn).name, "@type": ["File", "ImageObject"]})
    if "metro_map" in fn:
Member

I think not everyone calls it "metro_map", should we add a point to the docs to let people know that they can use this in the file name?

Contributor Author

yeah, didn't come up with another good filter for it, but good point about the docs.
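To make the file-name convention under discussion concrete, here is a small sketch of the filter. The helper `describe_image` and the example paths are hypothetical, and what the command does with a detected metro map is deliberately left out because the hunk above does not show it.

```python
from pathlib import Path
from typing import Optional


def describe_image(fn: str) -> Optional[dict]:
    """Return a JSON-LD entity for a PNG plus a flag for the "metro_map" naming convention."""
    if not fn.endswith(".png"):
        return None
    return {
        "entity": {"@id": Path(fn).name, "@type": ["File", "ImageObject"]},
        # Same substring test as in the reviewed hunk: only matching names are flagged.
        "is_metro_map": "metro_map" in fn,
    }


print(describe_image("docs/images/nf-core-rnaseq_metro_map_grey.png"))
print(describe_image("docs/images/workflow_overview.png"))
```

A diagram named without "metro_map" is still added as an image, but it is not recognised as the workflow diagram, which is why a note in the docs would help.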

@ewels ewels added this to the 2.13 milestone Feb 15, 2024
@ewels
Member

ewels commented Feb 16, 2024

TODO:

  • Add linting tests
    • Check file paths are still valid
    • ...?
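A possible shape for the "check file paths are still valid" lint test, as a sketch only: it reads `ro-crate-metadata.json` and reports local `File` entities that no longer exist on disk. The function name `missing_crate_files` and the demo files are hypothetical, and this is not wired into the nf-core lint framework.

```python
import json
import tempfile
from pathlib import Path


def missing_crate_files(crate_dir: Path) -> list[str]:
    """List the @id of every local File entity that is missing from crate_dir."""
    metadata = json.loads((crate_dir / "ro-crate-metadata.json").read_text())
    missing = []
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        entity_id = entity.get("@id", "")
        # Skip non-file entities, remote URLs and fragment identifiers.
        if "File" not in types or "://" in entity_id or entity_id.startswith("#"):
            continue
        if not (crate_dir / entity_id).is_file():
            missing.append(entity_id)
    return missing


def demo() -> list[str]:
    """Build a throwaway crate with one present and one missing file."""
    with tempfile.TemporaryDirectory() as tmp:
        crate_dir = Path(tmp)
        (crate_dir / "main.nf").write_text("// workflow\n")
        graph = [
            {"@id": "main.nf", "@type": ["File", "ComputationalWorkflow"]},
            {"@id": "docs/missing.png", "@type": ["File", "ImageObject"]},
        ]
        (crate_dir / "ro-crate-metadata.json").write_text(json.dumps({"@graph": graph}))
        return missing_crate_files(crate_dir)


print(demo())
```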

@stain

stain commented Feb 16, 2024

CreativeWorkStatus should have lower case c to match https://schema.org/creativeWorkStatus

@stain

stain commented Feb 16, 2024

#main.nf should be main.nf as it's a retrievable File and not a concept.

#nextflow should instead be https://w3id.org/workflowhub/workflow-ro-crate#nextflow to match https://about.workflowhub.eu/Workflow-RO-Crate/
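The three corrections above (lower-case `creativeWorkStatus`, `main.nf` instead of `#main.nf`, and the full Workflow RO-Crate identifier for Nextflow) can be illustrated on a plain JSON-LD structure. This is a sketch on hand-written sample data; the helper `normalise` is hypothetical and does not use the ro-crate library.

```python
NEXTFLOW_ID = "https://w3id.org/workflowhub/workflow-ro-crate#nextflow"
ID_FIXES = {"#main.nf": "main.nf", "#nextflow": NEXTFLOW_ID}
KEY_FIXES = {"CreativeWorkStatus": "creativeWorkStatus"}


def normalise(node):
    """Recursively rename mis-cased keys and rewrite the affected @id values."""
    if isinstance(node, list):
        return [normalise(item) for item in node]
    if isinstance(node, dict):
        fixed = {}
        for key, value in node.items():
            key = KEY_FIXES.get(key, key)
            if key == "@id" and isinstance(value, str):
                fixed[key] = ID_FIXES.get(value, value)
            else:
                fixed[key] = normalise(value)
        return fixed
    return node


# Hand-written sample graph showing the three issues raised in the comments.
graph = [
    {
        "@id": "#main.nf",
        "CreativeWorkStatus": "Stable",
        "programmingLanguage": {"@id": "#nextflow"},
    },
    {"@id": "#nextflow", "name": "Nextflow"},
]
print(normalise(graph))
```

Rewriting references and definitions in the same pass keeps every `{"@id": ...}` pointer consistent with the entity it refers to.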

@mashehu mashehu modified the milestones: 2.13, 3.0 Feb 19, 2024
@stefanches7

stefanches7 commented Mar 18, 2024

Throws an error if downloaded using nf-core download and not git clone

image

nf-core rocrate still writes an output file, but with heavily truncated metadata. As discussed with @mashehu, it seems to fail because the ORCID lookup breaks when there is no .git directory (i.e. the pipeline was not cloned with git).
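One way to guard against this, sketched under the assumption that contributor names are read from the git history: check for a `.git` entry first and fall back to a supplied author list otherwise. The names `contributor_names`, `from_git` and `fallback` are hypothetical and do not describe the code in this PR.

```python
import tempfile
from pathlib import Path
from typing import Callable


def contributor_names(
    pipeline_dir: Path,
    from_git: Callable[[Path], list[str]],
    fallback: list[str],
) -> list[str]:
    """Use git history when available, otherwise return the fallback authors."""
    if not (pipeline_dir / ".git").exists():
        # e.g. a pipeline fetched with nf-core download has no repository metadata
        return list(fallback)
    return from_git(pipeline_dir)


def demo() -> tuple[list[str], list[str]]:
    """Compare a directory without .git against one that has it."""
    fake_git_lookup = lambda path: ["Jane Doe", "Max Mustermann"]
    with tempfile.TemporaryDirectory() as tmp:
        downloaded = Path(tmp) / "downloaded"
        cloned = Path(tmp) / "cloned"
        downloaded.mkdir()
        (cloned / ".git").mkdir(parents=True)
        return (
            contributor_names(downloaded, fake_git_lookup, ["Manifest Author"]),
            contributor_names(cloned, fake_git_lookup, ["Manifest Author"]),
        )


print(demo())
```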

@stefanches7

"Data entities representing workflows (@type: ComputationalWorkflow) SHOULD comply with the Bioschemas ComputationalWorkflow profile, where possible." - https://www.researchobject.org/ro-crate/1.1/workflows.html#complying-with-bioschemas-computational-workflow-profile

@stefanches7

We could include subworkflow / module information in the RO-Crate to increase machine readability. The trade-off is, of course, a larger metadata file.

The versioning situation is unclear: how do we identify RO-Crates of the same workflow but of different versions (especially if the changes are not yet committed to git)?
