Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add "qualimap bamqc" wrapper #533

Merged
merged 10 commits into from Aug 16, 2022

Conversation

mobilegenome
Copy link
Contributor

Description

qualimap bamqc is a tool to provide comprehensive quality control reports for alignments in BAM format.

The wrapper repository already contains a wrapper for qualimap rnaseq, this PR migrates this wrapper for bamqc.

QC

  • I confirm that:

For all wrappers added by this PR,

  • there is a test case which covers any introduced changes,
  • input: and output: file paths in the resulting rule can be changed arbitrarily,
  • either the wrapper can only use a single core, or the example rule contains a threads: x statement with x being a reasonable default,
  • rule names in the test case are in snake_case and somehow tell what the rule is about or match the tools purpose or name (e.g., map_reads for a step that maps reads),
  • all environment.yaml specifications follow the respective best practices,
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:),
  • all fields of the example rules in the Snakefiles and their entries are explained via comments (input:/output:/params: etc.),
  • stderr and/or stdout are logged correctly (log:), depending on the wrapped tool,
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to (see here; this also means that using any Python tempfile default behavior works),
  • the meta.yaml contains a link to the documentation of the respective tool or command,
  • Snakefiles pass the linting (snakemake --lint),
  • Snakefiles are formatted with snakefmt,
  • Python wrapper scripts are formatted with black.

test.py Outdated Show resolved Hide resolved
# BAM aligned, splicing-aware, to reference genome
bam="mapped/a.bam",
output:
directory("qc/a"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IS the output just a HTML/PDF report? If so, it would be better to specifically name the output file, instead for the output folder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this. The output file is always named genome_results.txt inside the directory currently specified. I think it would be confusing if the output file would be specified in the rule. This would create the impression the filename could be changed, and lead to errors if a different name would be specified. Thus, I would suggest to keep the output as directory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But is it the only output file?
Alternatively, we could set the output folder to a temp folder (using tempfile) and at the end, depending on which output files were specified, copy them to their final destination (e.g. like in star).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a very good suggestion, but unfortunately it breaks the MultiQC compatibility that is based on exact file namings of the TXT and HTML reports based on the programs default. As I assume MultiQC is a major use-case of bamqc, it is definitly the better option to use the directory directive instead of specifying exact filenames.

I've add a parameter to create HTML reports by defaults, as these are also parsed by MultiQC and documented its use.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that then would be up to the user to properly define the output file name, no?
I was just wondering if, under some circumstances, the user might want to have different files placed in different places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fgvieira, I don't mean to push this review - but while my mind is still fresh on the topic - wanted to ask if there are any more concerns or if I can provide more information/changes to make it pass. :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to always avoid directory() and refer to the output files directly, since it allows to reference specific files in downstream rules and mark unnecessary files as temp().

Also in the PR checklist, it states that:

  • input: and output: file paths in the resulting rule can be changed arbitrarily,

but not sure if it refers to single files or also to folders, or how much @johanneskoester wants to enforce it.

As for the rest, I think it looks great! 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry folks. I merged this accidentally before these have been fixed. I'll fix them myself now since it was my fault.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, actually having it as a dir is correct, since this is just an HTML folder hierachy. Nevertheless, I have added a sample report marker so that is becomes more obvious how to use this in a typical workflow.
#539

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, nowadays people might rather use it via multiqc. Hence, I will close my PR and leave it as it is.

Co-authored-by: Filipe G. Vieira <fgarrettvieira@gmail.com>
@johanneskoester johanneskoester merged commit 3af38c8 into snakemake:master Aug 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants