Add new command nf-core rocrate to create a Research Object (RO) crate for a pipeline #2680
base: dev
# Conflicts: # nf_core/__main__.py
@nf-core-bot changelog: Add new command nf-core rocrate to create a Research Object (RO) crate for a pipeline
Super nice! 👏🏻
Couple of minor comments and haven't tried running myself, but from a quick run through of the code I think it looks great 👍🏻
self.add_main_authors(wf_file)
wf_file.append_to("programmingLanguage", {"@id": "#nextflow"})
# get keywords from nf-core website
remote_workflows = requests.get("https://nf-co.re/pipelines.json").json()["remote_workflows"]
Nice 👍🏻
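A minimal sketch of how the keyword lookup after that request might look, for anyone reading along. It assumes each entry in `remote_workflows` carries a `name` and a `topics` field; both field names, the helper `find_keywords` and the example data are assumptions for illustration, not taken from the diff.

```python
from __future__ import annotations

from typing import Any


def find_keywords(remote_workflows: list[dict[str, Any]], pipeline_name: str) -> list[str]:
    """Return the keywords listed for one pipeline, or an empty list if none are found."""
    for workflow in remote_workflows:
        # "name" and "topics" are assumed field names, not confirmed by the diff
        if workflow.get("name") == pipeline_name:
            return list(workflow.get("topics") or [])
    return []


# Hypothetical stand-in for the "remote_workflows" list fetched from the website
example = [
    {"name": "rnaseq", "topics": ["rna", "rna-seq"]},
    {"name": "sarek", "topics": None},
]
print(find_keywords(example, "rnaseq"))
```

Returning an empty list for a missing pipeline or missing topics keeps crate creation working for pipelines that are not listed on the website yet.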
nf_core/rocrate.py
Outdated
def add_main_authors(self, wf_file):
    """
    Add workflow authors to the crate
    NB: We don't have much metadata here - scope to improve in the future
Had some discussion about revisiting the CITATIONS.cff file and other things. For example, in theory I think that we can even associate ORCiD identifiers here and all sorts?
It'd also be cool if we could scrape GitHub contributors or something 🤔
Nooo way, you did ORCiD lookups?? haha, amazing 🙇🏻
Now added all contributors to "main.nf" according to git, as long as they have an "identifiable" name (i.e. one with a space in it): e8959ef
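Roughly, the "identifiable name" filter could be sketched like this. It is only an illustration: the helper `identifiable_names` is hypothetical, and the names are assumed to come from something like `git log --format=%an -- main.nf`, which is not run here.

```python
def identifiable_names(author_names):
    """Keep author names that contain a space, de-duplicated, in first-seen order."""
    seen = set()
    result = []
    for name in author_names:
        # Collapse repeated whitespace so "Jane  Doe" and "Jane Doe" count as one author
        cleaned = " ".join(name.split())
        if " " in cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Hypothetical author names, e.g. one per commit touching main.nf
authors = ["Jane Doe", "jdoe42", "Jane Doe", "  Max   Mustermann "]
print(identifiable_names(authors))
```

The space check is a heuristic: it drops bare usernames, but it also drops people who commit under a single name.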
nf_core/rocrate.py
Outdated
Args:
    path (Path): Path to the pipeline directory
Args:
    path (Path): Path to the pipeline directory
# Conform to RO-Crate 1.1 and workflowhub-ro-crate
self.crate.update_jsonld(
    {
        "@id": "ro-crate-metadata.json",
If metadata_fn is None, would this be an error?
I changed the behavior of this parameter now to only be the path, because ro-crate-metadata.json seems to be the standard name for it.
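As a rough sketch of that behavior: the parameter carries only a directory and the file name stays fixed. The helper `metadata_location` and the fallback to the current working directory when no path is given are assumptions for illustration, not what the PR necessarily does.

```python
from pathlib import Path

# Fixed file name, as mentioned in the comment above
METADATA_NAME = "ro-crate-metadata.json"


def metadata_location(path=None):
    """Return where the metadata file would be written for a given output directory."""
    # Assumption: fall back to the current working directory when no path is given
    base = Path(path) if path is not None else Path.cwd()
    return base / METADATA_NAME


print(metadata_location("results"))
```

With a fixed file name, a `None` value only has to be handled for the directory, which sidesteps the question of a missing file name.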
if fn.endswith(".png"):
    log.debug(f"Adding workflow image file: {fn}")
    self.crate.add_jsonld({"@id": Path(fn).name, "@type": ["File", "ImageObject"]})
    if "metro_map" in fn:
I think not everyone calls it "metro_map", should we add a point to the docs to let people know that they can use this in the file name?
yeah, didn't come up with another good filter for it, but good point about the docs.
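For the docs, the file-name convention could be illustrated with a small sketch like the one below. The helper `image_entity` and its `marker` parameter are hypothetical, and what the real code does inside the `metro_map` branch is not shown in the diff, so the sketch only reports whether the marker matched.

```python
from pathlib import Path


def image_entity(fn, marker="metro_map"):
    """Return (JSON-LD entity, matched_marker) for a PNG file, or None for other files."""
    if not fn.endswith(".png"):
        return None
    entity = {"@id": Path(fn).name, "@type": ["File", "ImageObject"]}
    # Same substring check as in the diff: the marker may appear anywhere in the path
    return entity, marker in fn


# Hypothetical file names
print(image_entity("docs/images/nf-core-rnaseq_metro_map_grey.png"))
print(image_entity("docs/images/logo.png"))
```

Making the marker a parameter would let pipelines that name their diagram differently opt in without renaming the file.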
TODO:
Throws an error if downloaded using
"Data entities representing workflows (@type: ComputationalWorkflow) SHOULD comply with the Bioschemas ComputationalWorkflow profile, where possible." - https://www.researchobject.org/ro-crate/1.1/workflows.html#complying-with-bioschemas-computational-workflow-profile
We could add subworkflow and module information to the RO-Crate to increase machine readability, at the cost of a larger metadata file. The versioning situation is unclear: how do we identify RO-Crates of the same workflow at different versions (especially if the changes are not yet committed to git)?
Example crate from the rnaseq pipeline:
ro-crate-metadata.json