Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementation of goleft/indexcov #1312

Open
wants to merge 11 commits into
base: dev
Choose a base branch
from
Open

Conversation

lindenb
Copy link

@lindenb lindenb commented Nov 3, 2023

This is a preliminary PR to implement goleft/indexcov in sarek: https://nfcore.slack.com/archives/CE6SDEDAA/p1696931394165659 . For now I haven't looked at the indentation, syntax, doc, etc...

Basically the BAMs are re-indexed (see below) and collected into indexcov. The output of indexcov is a set of png, html, bed.gz, etc... files.

NEW FILE: modules/local/samtools_index_for_indexcov/main.nf instead of using the original bam files, I choose to re-index the bam, ignoring the unmapped, secondary, etc... It's quite fast as we only need a BAM header and a BAM index.

NEW FILE: modules/local/goleft/indexcov/main.nf the process for indexcov itself. I know I should have used the nf-core/module I designed but somehow it was easier for me to handle the 'input:' from sarek. Also I don't know what is meta.id' in that context.

NEW_FILE: subworkflows/local/bam_goleft_indexcov/main.nf the glue for the two processes above. I don't like that id:"PREFIX".

NEW FILE: modules/goleft.indexcov.config a config for goleft/indexcov . I cannot make it work: the publishDir, the files are currently not copied into results

NEW FILE: modules/indexcov_reindex.config a config for re-indexing the bam

MODIFIED: subworkflows/local/bam_variant_calling_germline_all/main.nf . I only added the subworkflow for indexcov here, just after manta. I'm not sure it's the best place to insert it.

MODIFIED: conf/test_full_germline.config . I added indexcov to the list of toolsMODIFIED: nextflow_schema.json, same, I added indexcov in the pattern.

MODIFIED: nextflow.config to include the configs above.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • [?] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Copy link

github-actions bot commented Nov 3, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 3c7ef81

+| ✅ 144 tests passed       |+
#| ❔   9 tests were ignored |#
!| ❗   1 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in WorkflowSarek.groovy: Optionally add in-text citation tools to this list.

❔ Tests ignored:

  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: conf/modules.config
  • files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_ci - actions_ci
  • template_strings - template_strings

✅ Tests passed:

Run details

  • nf-core/tools version 2.10
  • Run at 2023-12-08 11:29:09

Comment on lines +2 to +33
label 'process_single'

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/goleft:0.2.4--h9ee0642_1':
'biocontainers/goleft:0.2.4--h9ee0642_1' }"

input:
val(meta)
path(bams)
path(fasta)
path(fai)
output:
path("${prefix}/*"),emit:output
path("versions.yml"),emit:versions
script:
def args = task.ext.args ?: ''
prefix = task.ext.prefix ?: "${meta.id}"
def input_files = bams.findAll{it.name.endsWith(".bam")} + bams.findAll{it.name.endsWith(".crai")}
def extranormalize = input_files.any{it.name.endsWith(".crai")} ? " --extranormalize " : ""
"""
goleft indexcov \\
--fai "${fai}" \\
--directory "${prefix}" \\
${extranormalize} \\
$args \\
${input_files.join(" ")}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
goleft: \$(goleft --version 2>&1 | head -n 1 | sed 's/^.*goleft Version: //')
END_VERSIONS
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you indent so it is similar to the other modules?

Copy link
Contributor

@FriederikeHanssen FriederikeHanssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • How about combining goleft.indexcov.config and indexcov_reindex.config since they belong to the same tool?

  • Would it be possible to add the modules upstream to nf-core/modules so everyone can use them?

  • Missing some output docs

Looks super good already! 🚀 ❤️


input:
val(meta)
path(bams)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the tool run on all bams for all samples in one job?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind I see it further down. The docs say 30sec per 30 WGS 🤯 . Any idea how it scales with number of genomes?

@@ -100,7 +100,7 @@
"fa_icon": "fas fa-toolbox",
"description": "Tools to use for duplicate marking, variant calling and/or for annotation.",
"help_text": "Multiple tools separated with commas.\n\n**Variant Calling:**\n\nGermline variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: DeepVariant, FreeBayes, GATK HaplotypeCaller, mpileup, Sentieon Haplotyper, Strelka\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit\n\nTumor-only somatic variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, mpileup, Mutect2, Strelka\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit, ControlFREEC\n\nSomatic variant calling can currently only be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, Mutect2, Strelka2\n- Structural variants: Manta, TIDDIT\n- Copy-Number: ASCAT, CNVKit, Control-FREEC\n- Microsatellite Instability: MSIsensorpro\n\n> **NB** Mutect2 for somatic variant calling cannot be combined with `--no_intervals`\n\n**Annotation:**\n \n- snpEff, VEP, merge (both consecutively).\n\n> **NB** As Sarek will use bgzip and tabix to compress and index VCF files annotated, it expects VCF files to be sorted when starting from `--step annotate`.",
"pattern": "^((ascat|bcfann|cnvkit|controlfreec|deepvariant|freebayes|haplotypecaller|sentieon_dnascope|sentieon_haplotyper|manta|merge|mpileup|msisensorpro|mutect2|sentieon_dedup|snpeff|strelka|tiddit|vep)?,?)*(?<!,)$"
"pattern": "^((ascat|bcfann|cnvkit|controlfreec|deepvariant|freebayes|haplotypecaller|sentieon_dnascope|sentieon_haplotyper|manta|merge|mpileup|msisensorpro|mutect2|sentieon_dedup|snpeff|strelka|tiddit|vep|indexcov)?,?)*(?<!,)$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need alphabetical sorting or Maxime won't let it pass 😄

workflow BAM_GOLEFT_INDEXCOV {
take:
bam // channel: [mandatory] [ meta, bam, bai ]
fasta // channel: [mandatory] [ fasta ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fasta // channel: [mandatory] [ fasta ]
fasta // channel: [mandatory] [ fasta ]

@lindenb
Copy link
Author

lindenb commented Nov 13, 2023

Hi, thank you for the review. FYI I currently don't have the time to work on this. I'll fix things later.

@maxulysse
Copy link
Member

Hi, thank you for the review. FYI I currently don't have the time to work on this. I'll fix things later.

No worries, we're getting a release ready, so this one will only be included in the next one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants