-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add option to allow mapping to pangenome #274
base: master
Are you sure you want to change the base?
Conversation
Bug with vg: alignments with deletions on ends: vgteam/vg#4204. They fixed the issue and the new release is coming on Feb. 26. |
…ciraptor into allow-pangenome-mapping
@@ -112,7 +115,7 @@ calling: | |||
gene_list_filter: | |||
aux-files: | |||
super_interesting_genes: "config/super_interesting_genes.tsv" | |||
expression: "ANN['GENE'] in AUX['super_interesting_genes']" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FelixMoelder please have a look. Did VEP change the column name for the gene?
config/config.yaml
Outdated
- somatic_tumor_high | ||
- somatic_tumor_medium |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rather not change the scenario of the default config (same below in config/scenario.yaml).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to use default workflow config and scenario.
workflow/rules/common.smk
Outdated
return "results/mapped/vg/{{sample}}_rg_added.{ext}".format(ext=ext) | ||
else: | ||
return "results/mapped/bwa/{{sample}}.{ext}".format(ext=ext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use f-string in both cases, use {mapper}
or something like that in order to not mostly repeat the f-string.
if get_sample_datatype(wildcards.sample) == "rna": | ||
aligner = "star" | ||
elif get_sample_datatype(wildcards.sample) == "dna" & is_activated( | ||
"ref/pangenome" | ||
): | ||
aligner = "vg" | ||
else: | ||
aligner = "bwa" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same code as above, create new function get_aligner(wildcards)
and use in both blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and below as well
@@ -19,6 +19,7 @@ rule filter_candidates_by_annotation: | |||
rule filter_by_annotation: | |||
input: | |||
bcf=get_annotated_bcf, | |||
csi=lambda wc: get_annotated_bcf(wc, index=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
csi=lambda wc: get_annotated_bcf(wc, index=True), | |
csi=partial(get_annotated_bcf, index=True), |
rule map_reads_vg_giraffe: | ||
input: | ||
reads=get_map_reads_input, | ||
idx="resources/pangenome/hprc-v1.0-mc-grch38.xg", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be hardcoded. What if a different pangenome is used? The pangenome index should be configurable via the config.
benchmark: | ||
"benchmarks/samtools_view_primary_chr/{sample}.tsv" | ||
params: | ||
region="GRCh38.chr1 GRCh38.chr2 GRCh38.chr3 GRCh38.chr4 GRCh38.chr5 GRCh38.chr6 GRCh38.chr7 GRCh38.chr8 GRCh38.chr9 GRCh38.chr10 GRCh38.chr11 GRCh38.chr12 GRCh38.chr13 GRCh38.chr14 GRCh38.chr15 GRCh38.chr16 GRCh38.chr17 GRCh38.chr18 GRCh38.chr19 GRCh38.chr20 GRCh38.chr21 GRCh38.chr22 GRCh38.chrX GRCh38.chrY GRCh38.chrM", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be hardcoded. Workflow has to work for all species in principle (as long as there is a pangenome index). We have a selection for the first n chromosomes already implemented in the workflow (at the level of the variants). Isn't that enough?
rule add_rg: | ||
input: | ||
"results/mapped/vg/{sample}_reheadered.bam", | ||
output: | ||
"results/mapped/vg/{sample}.bam", | ||
log: | ||
"logs/picard/add_rg/{sample}.log", | ||
params: | ||
extra="--RGLB lib1 --RGPL illumina --RGPU {sample} --RGSM {sample}", | ||
resources: | ||
mem_mb=60000, | ||
wrapper: | ||
"v2.3.2/bio/picard/addorreplacereadgroups" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that it is impossible to define the readgroup directly when calling vg giraffe?
|
||
rule get_vg_pangenome: | ||
output: | ||
"resources/pangenome/hprc-v1.0-mc-grch38.xg", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather name this analogously to the bwa index output, hence:
"resources/pangenome/hprc-v1.0-mc-grch38.xg", | |
f"{genome}.xg", |
This PR allows an optional pangenome alignment with
vg giraffe
.The following additions/changes are made:
pangenome
is added to the config.ref.smk
.mapping.smk
.pangenome
in the config file.