Missing Indels #49

nathanhaigh · 2020-03-16T06:40:59Z

My own analysis of Illumina (SRR11140750, bottom track) and nanopore (SRR11140751, top track) data from the same swab sample shows that your variant analysis doesn't include indels.

You should probably include --call-indels in your call to lofreq call.

nekrut · 2020-03-16T20:52:52Z

what is your protocol?

nathanhaigh · 2020-03-16T23:10:43Z

For these mappings, it's simply bwa mem for the short reads and minimap2 for the nanopore reads.

Given the small size of the genome, I just eyeballed possible variants and saw this, then realised you don't call indels in your variant analysis.

This is the content I've developed for a Master/Major in Bioinformatics Genomics course:
https://uofabioinformaticshub.github.io/genomics_applications/Practicals/resequencing/resequencing.html

nathanhaigh · 2020-03-16T23:18:54Z

Perhaps you are already calling variants. However, they are missing from these outputs: https://github.com/galaxyproject/SARS-CoV-2/tree/master/4-Variation#outputs

wm75 · 2020-03-21T12:51:19Z

The current workflow fails to call indels because it does not first use lofreq indelqual to add indel qualities to the mapped reads. Without those, lofreq call skips indels even when the --call-indels option is in use (which is already the case).
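The order of operations described here can be sketched as below. This is only a minimal sketch, not the workflow's actual implementation: the helper name and file names are hypothetical, and the use of --dindel (rather than uniform indel qualities) is an assumption that suits Illumina data.

```python
import shlex


def lofreq_indel_commands(ref_fasta, in_bam, indelqual_bam, out_vcf):
    """Return the two lofreq commands in the order they need to run.

    All file names are placeholders supplied by the caller.
    """
    # Step 1: add indel qualities to the mapped reads.
    # --dindel is an assumed choice here; it is intended for Illumina data.
    add_indelquals = [
        "lofreq", "indelqual", "--dindel",
        "-f", ref_fasta, "-o", indelqual_bam, in_bam,
    ]
    # Step 2: call variants, including indels, on the BAM written by step 1.
    call_variants = [
        "lofreq", "call", "--call-indels",
        "-f", ref_fasta, "-o", out_vcf, indelqual_bam,
    ]
    return [add_indelquals, call_variants]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    for cmd in lofreq_indel_commands(
        "reference.fasta", "mapped.bam", "mapped.indelqual.bam", "variants.vcf"
    ):
        print(shlex.join(cmd))
```

The point of the sketch is the dependency between the steps: the BAM written by lofreq indelqual is the input to lofreq call.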

@nekrut are you interested in having indels called, and do you want to update the workflow accordingly yourself?

saramonzon · 2020-03-27T12:07:16Z

Hi! I was just stopping by to say this, and you already did! Indels should be called, as we have observed a bunch of them in covid19 data. Moreover, they are indels that conserve the open reading frame, deleting only whole amino acids (aa), which makes them most probably functional variants.

Also, I think you don't filter host reads in the variant pipeline, but I'm not sure whether you do in the assembly. We have observed that bwa mem is too sensitive and soft-clips a lot of reads against the human genome; bowtie2 is a much better option for this type of data, especially if you are using amplicon data.
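As a rough illustration of what conserving the open reading frame means for an indel: the length change between the reference and alternate alleles is a non-zero multiple of 3. The function name and the example alleles below are made up for illustration, and the check looks only at length; it does not verify that the variant lies in a coding region or that no stop codon is introduced.

```python
def is_in_frame(ref, alt):
    """Return True if an indel changes the sequence length by a
    non-zero multiple of 3, so downstream codons keep their frame."""
    length_change = len(alt) - len(ref)
    return length_change != 0 and length_change % 3 == 0


if __name__ == "__main__":
    # Hypothetical VCF-style alleles, not taken from any real sample.
    print(is_in_frame("ATCTG", "AT"))  # 3-base deletion
    print(is_in_frame("AT", "A"))      # 1-base deletion
```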

Thanks for the repo!

tseemann · 2020-05-19T07:33:15Z

@saramonzon do you have evidence for host reads (human) aligning to the SARS genome?
It could be possible with bwa mem default settings like minscore=30 but that should not be used here.

You can reduce soft clipping by changing the end-bonus settings or, as you say, by using glocal alignment via bowtie2 --end-to-end.
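To gauge how much soft clipping a mapper produces, one option is to look at the S operations in each read's CIGAR string; an end-to-end alignment contains none. The following is a minimal sketch under that assumption: the function name and the example CIGAR strings are invented for illustration and are not taken from the data discussed here.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that consume bases of the read sequence as stored in SAM.
QUERY_CONSUMING = set("MIS=X")


def soft_clipped_fraction(cigar):
    """Return the fraction of read bases that are soft-clipped (S)."""
    clipped = 0
    read_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in QUERY_CONSUMING:
            read_len += n
        if op == "S":
            clipped += n
    return clipped / read_len if read_len else 0.0


if __name__ == "__main__":
    # Made-up CIGAR strings: a locally aligned read and an end-to-end one.
    for cigar in ("20S100M30S", "150M"):
        print(cigar, soft_clipped_fraction(cigar))
```

Comparing the distribution of this fraction between bwa mem and bowtie2 runs on the same reads would be one way to quantify the difference in clipping.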

saramonzon · 2020-05-19T08:03:59Z

Hi @tseemann, yes, using bwa with default parameters we obtain a considerable percentage of reads mapping to both human and SARS-CoV-2, depending on the sample. We have fixed it, as you say, using bowtie2, but with --local, not --end-to-end.
It's more likely that virus reads map to the human genome than that human reads map to SARS-CoV-2, but the latter is a possibility worth avoiding just in case.
I'm sharing my notes from when I was analyzing this in case someone finds them useful; they are just personal notes, sorry about the writing!
README.txt

fmaguire mentioned this issue May 22, 2020

Working on step 7 ("HISAT2 confirmation of removal of human data") jaleezyy/covid-19-signal#21

Closed

mvdbeek added a commit to mvdbeek/SARS-CoV-2 that referenced this issue Jun 1, 2020

Update variation workflows to include indelquals

6b50e02

Updates lofreq version and adds indelquals. xref galaxyproject#49
