feat: add option to allow mapping to pangenome #274

huzuner · 2023-11-16T16:45:33Z

This PR adds optional pangenome alignment with vg giraffe.
The following additions and changes are made:

A new option, pangenome, is added to the config.
A rule to download the vg pangenome index is added to ref.smk.
Six further rules for aligning reads to the pangenome and preprocessing the resulting BAM are added to mapping.smk.
All input functions are updated to return the pangenome-aligned BAM when pangenome is activated in the config file.


huzuner · 2024-02-15T09:24:57Z

There is a bug in vg that affects alignments with deletions at their ends: vgteam/vg#4204. The issue has been fixed, and the new release is due on Feb. 26.


johanneskoester · 2024-05-08T13:58:19Z

config/config.yaml

@@ -112,7 +115,7 @@ calling:
    gene_list_filter:
      aux-files:
        super_interesting_genes: "config/super_interesting_genes.tsv"
-      expression: "ANN['GENE'] in AUX['super_interesting_genes']"


@FelixMoelder please have a look. Did VEP change the column name for the gene?

johanneskoester · 2024-05-08T13:59:15Z

config/config.yaml

-          - somatic_tumor_high
-          - somatic_tumor_medium


Let's rather not change the scenario of the default config (same below in config/scenario.yaml).

huzuner · 2024-05-28

Updated to use the default workflow config and scenario.

johanneskoester · 2024-05-08T14:00:22Z

workflow/rules/common.smk

+            return "results/mapped/vg/{{sample}}_rg_added.{ext}".format(ext=ext)
+        else:
+            return "results/mapped/bwa/{{sample}}.{ext}".format(ext=ext)


Use an f-string in both cases, and use {mapper} or something similar so that the f-string is not mostly repeated.
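One way this suggestion could look, as a minimal sketch. The helper name `get_mapped_path` and the boolean `use_pangenome` are hypothetical stand-ins for the workflow's own function and config check. The doubled braces keep `{sample}` as a literal wildcard placeholder, as in the `.format` version.

```python
def get_mapped_path(use_pangenome: bool, ext: str = "bam") -> str:
    """Return the path pattern of the mapped reads.

    `use_pangenome` is a stand-in for the workflow's config lookup.
    """
    # Select only the parts that differ between the two branches.
    mapper, suffix = ("vg", "_rg_added") if use_pangenome else ("bwa", "")
    # Doubled braces leave {sample} in place as a wildcard placeholder.
    return f"results/mapped/{mapper}/{{sample}}{suffix}.{ext}"


print(get_mapped_path(True))
print(get_mapped_path(False, ext="bai"))
```

With this shape, a further mapper only needs a new `(mapper, suffix)` pair rather than another copy of the path.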

johanneskoester · 2024-05-08T14:01:53Z

workflow/rules/common.smk

+        if get_sample_datatype(wildcards.sample) == "rna":
+            aligner = "star"
+        elif get_sample_datatype(wildcards.sample) == "dna" & is_activated(
+            "ref/pangenome"
+        ):
+            aligner = "vg"
+        else:
+            aligner = "bwa"


Same code as above: create a new function get_aligner(wildcards) and use it in both blocks.

The same applies below as well.
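A possible shape for such a helper, as a sketch only. `get_sample_datatype` and `is_activated` are replaced here by simplified stand-ins, and the `CONFIG` dictionary is an assumption made so the example runs on its own. Note that the condition uses `and`: in Python, `&` binds more tightly than `==`, so `x == "dna" & flag` would try to evaluate `"dna" & flag` first.

```python
from types import SimpleNamespace

# Stand-in config; the real workflow reads this from config.yaml.
CONFIG = {"ref": {"pangenome": {"activate": True}}}


def get_sample_datatype(sample: str) -> str:
    # Stand-in: the real workflow looks the datatype up in the sample sheet.
    return "rna" if sample.endswith("_rna") else "dna"


def is_activated(config_path: str) -> bool:
    # Stand-in: walk a slash-separated path through the config dict.
    node = CONFIG
    for key in config_path.split("/"):
        node = node.get(key, {})
    return bool(node.get("activate", False))


def get_aligner(wildcards) -> str:
    """Decide the aligner once so that callers do not repeat the branching."""
    datatype = get_sample_datatype(wildcards.sample)
    if datatype == "rna":
        return "star"
    # Boolean `and`, not bitwise `&`.
    if datatype == "dna" and is_activated("ref/pangenome"):
        return "vg"
    return "bwa"


print(get_aligner(SimpleNamespace(sample="tumor_rna")))
print(get_aligner(SimpleNamespace(sample="tumor_dna")))
```

Each block that currently repeats the `if`/`elif`/`else` chain could then reduce to `aligner = get_aligner(wildcards)`.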

johanneskoester · 2024-05-08T14:03:11Z

workflow/rules/filtering.smk

@@ -19,6 +19,7 @@ rule filter_candidates_by_annotation:
 rule filter_by_annotation:
    input:
        bcf=get_annotated_bcf,
+        csi=lambda wc: get_annotated_bcf(wc, index=True),


Suggested change

-        csi=lambda wc: get_annotated_bcf(wc, index=True),
+        csi=partial(get_annotated_bcf, index=True),
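The equivalence behind this suggestion can be checked in plain Python. In the sketch below, `get_annotated_bcf` is a simplified stand-in with an illustrative path, not the workflow's actual function. Using `partial` also means the rules file needs `from functools import partial`.

```python
from functools import partial
from types import SimpleNamespace


def get_annotated_bcf(wildcards, index=False):
    # Stand-in for the workflow's input function; the path is illustrative.
    ext = "bcf.csi" if index else "bcf"
    return f"results/calls/{wildcards.group}.annotated.{ext}"


# Both callables take the wildcards object and fix index=True.
with_lambda = lambda wc: get_annotated_bcf(wc, index=True)
with_partial = partial(get_annotated_bcf, index=True)

wc = SimpleNamespace(group="g1")
print(with_lambda(wc) == with_partial(wc))
```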

johanneskoester · 2024-05-08T14:06:12Z

workflow/rules/mapping.smk

+rule map_reads_vg_giraffe:
+    input:
+        reads=get_map_reads_input,
+        idx="resources/pangenome/hprc-v1.0-mc-grch38.xg",


This should not be hardcoded. What if a different pangenome is used? The pangenome index should be configurable via the config.
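A rough sketch of how the index path might be read from the config instead. The key `ref/pangenome/index` and the helper `get_pangenome_index` are hypothetical; the PR does not define them, and the actual config layout may differ.

```python
def get_pangenome_index(config: dict) -> str:
    """Return the pangenome index path from the config (hypothetical key)."""
    pangenome = config.get("ref", {}).get("pangenome", {})
    try:
        return pangenome["index"]
    except KeyError:
        # Fail early with a clear message rather than with a missing-file error.
        raise ValueError(
            "ref/pangenome/index must be set when pangenome mapping is activated"
        ) from None


# Example config; the value reuses the path from the rule under review.
example_config = {
    "ref": {
        "pangenome": {
            "activate": True,
            "index": "resources/pangenome/hprc-v1.0-mc-grch38.xg",
        }
    }
}
print(get_pangenome_index(example_config))
```

The rule could then use such a lookup for its `idx` input, so that switching to a different pangenome only requires a config change.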

johanneskoester · 2024-05-08T14:07:35Z

workflow/rules/mapping.smk

+    benchmark:
+        "benchmarks/samtools_view_primary_chr/{sample}.tsv"
+    params:
+        region="GRCh38.chr1 GRCh38.chr2 GRCh38.chr3 GRCh38.chr4 GRCh38.chr5 GRCh38.chr6 GRCh38.chr7 GRCh38.chr8 GRCh38.chr9 GRCh38.chr10 GRCh38.chr11 GRCh38.chr12 GRCh38.chr13 GRCh38.chr14 GRCh38.chr15 GRCh38.chr16 GRCh38.chr17 GRCh38.chr18 GRCh38.chr19 GRCh38.chr20 GRCh38.chr21 GRCh38.chr22 GRCh38.chrX GRCh38.chrY GRCh38.chrM",


This should not be hardcoded. The workflow has to work for all species in principle (as long as there is a pangenome index). We already have a selection of the first n chromosomes implemented in the workflow (at the level of the variants). Isn't that enough?
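If a region list turns out to be needed at this step after all, it could be derived from a FASTA index rather than written out by hand. The sketch below is only an illustration of that idea: the helper `first_n_contigs` is hypothetical, and the offsets in the sample `.fai` text are made up. It relies only on the `.fai` layout, where the contig name is the first tab-separated column.

```python
def first_n_contigs(fai_text: str, n: int) -> list[str]:
    """Return the names of the first n contigs listed in a .fai index."""
    names = [
        line.split("\t")[0]  # column 1 of a .fai line is the contig name
        for line in fai_text.splitlines()
        if line.strip()
    ]
    return names[:n]


# Illustrative .fai content; offsets are not taken from a real index.
fai_text = (
    "GRCh38.chr1\t248956422\t13\t60\t61\n"
    "GRCh38.chr2\t242193529\t253105722\t60\t61\n"
    "GRCh38.chr3\t198295559\t499335823\t60\t61\n"
)

region = " ".join(first_n_contigs(fai_text, 2))
print(region)
```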

johanneskoester · 2024-05-08T14:08:16Z

workflow/rules/mapping.smk

+rule add_rg:
+    input:
+        "results/mapped/vg/{sample}_reheadered.bam",
+    output:
+        "results/mapped/vg/{sample}.bam",
+    log:
+        "logs/picard/add_rg/{sample}.log",
+    params:
+        extra="--RGLB lib1 --RGPL illumina --RGPU {sample} --RGSM {sample}",
+    resources:
+        mem_mb=60000,
+    wrapper:
+        "v2.3.2/bio/picard/addorreplacereadgroups"


Are you sure that it is impossible to define the readgroup directly when calling vg giraffe?

johanneskoester · 2024-05-08T14:10:13Z

workflow/rules/ref.smk

+
+rule get_vg_pangenome:
+    output:
+        "resources/pangenome/hprc-v1.0-mc-grch38.xg",


Rather name this analogously to the bwa index output, hence:

Suggested change

-        "resources/pangenome/hprc-v1.0-mc-grch38.xg",
+        f"{genome}.xg",


huzuner added 2 commits November 16, 2023 16:36

add option to allow mapping to pangenome

f353a70

fix lambda function for input files

97c098a

huzuner changed the title ~~add option to allow mapping to pangenome~~ feat: add option to allow mapping to pangenome Nov 17, 2023

huzuner added 7 commits November 17, 2023 16:00

fix vg input problem

8f455cb

sort vg_mapped bam file

ce395ab

format chr names and index

d593932

add read group information

cc11c4a

remove nonclassical chromosomes from headers convert mitochrondiral notation

d8cb96e

correct chr names for extractiong of primary chromosomes

45ef707

extract only properly paired reads

7ba7c54

huzuner added 8 commits March 25, 2024 15:11

Merge branch 'master' into allow-pangenome-mapping

4c27790

Merge branch 'master' of github.com:snakemake-workflows/dna-seq-varlociraptor into allow-pangenome-mapping

9251141

modify input-output paths of vg related rules and fix 'and' operator bug

5a45b64

add index as input to filter_by_annotation rule to avoid index not found errors

5953c14

upload example config and fix typo in vembrane expression

af6ecaf

provide example event, scenario and sample

0f6b011

Merge branch 'master' into allow-pangenome-mapping

d1173ac

fmt

e78e396

johanneskoester requested changes May 8, 2024


huzuner added 2 commits May 28, 2024 11:35

use default workflow config and scenario and add purity value to samples sheet

e5dd722

define and use get_aligner func

4563490
