How to visualize reads containing expansions #20

gspirito · 2023-12-20T09:06:02Z

Hello, here's my issue:

I ran tandem-genotypes on long reads (Oxford Nanopore) on a RepeatMasker locus and obtained this result:
chr11 70487135 70487173 TGC SHANK2 coding 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,2,2,3 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,2,3

Therefore there should be 13 reads with additional copies of the sequence 'TGC' compared to the reference genome.
However, if I extract all reads mapping to the locus 'chr11:70487135-70487173' from the MAF file and convert them to BAM (with LAST), I cannot see any insertion in IGV in any read mapped to that locus.
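
The figure of 13 comes from the last two columns of the output line, which hold one copy-number change per forward-strand read and per reverse-strand read. A minimal sketch of that count follows, assuming whitespace-separated columns and that '.' marks a strand with no reads. The function names are made up for illustration, and the example line is the SHANK2 line with most of its zero entries trimmed.

```python
def parse_changes(field):
    """Turn a comma-separated column such as '0,0,1,3' into a list of ints."""
    if field == ".":  # assumption: '.' marks a strand with no covering reads
        return []
    return [int(x) for x in field.split(",")]


def count_expanded_reads(line):
    """Return (forward, reverse) counts of reads with a positive copy-number change."""
    cols = line.split()
    forward = parse_changes(cols[6])  # column 7: forward-strand reads
    reverse = parse_changes(cols[7])  # column 8: reverse-strand reads
    return sum(x > 0 for x in forward), sum(x > 0 for x in reverse)


# Abbreviated version of the SHANK2 line: most zero entries are trimmed.
line = "chr11 70487135 70487173 TGC SHANK2 coding 0,0,1,1,2,2,2,3 0,0,1,1,1,1,1,2,3"
forward, reverse = count_expanded_reads(line)
print(forward, reverse, forward + reverse)  # 6 7 13
```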

How can I visualize the STR expansions? Is there a way to know which specific reads support the expansions?

Thanks in advance,

Giovanni

mcfrith · 2023-12-20T11:57:22Z

Many thanks for your interest in tandem-genotypes. What you're doing seems correct: I don't know why it doesn't work. Maybe if you could share your intermediate files...

To know which reads support the expansions, you can use tandem-genotypes option -v.

gspirito · 2024-01-08T11:08:05Z

Thank you very much for the answer. I attach the locus I used for the analysis, the result I got from tandem-genotypes, and the MAF file containing the reads mapping to that locus:

SHANK2_locus_rpmsk.txt
SAMPLE_tg_SHANK2.txt
SAMPLE_MAF.txt

mcfrith · 2024-01-08T12:18:28Z

Thanks for this interesting example!
In short, tandem-genotypes is "working as designed", but the design isn't looking good in this case.

It's faithfully following the "tandem-genotypes method" described here: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1667-6

This dotplot shows the alignment (red) of one read that supposedly has 3 additional copies of TGC:

To the left of the repeat (purple), there's an insertion and deletion almost adjacent to each other. tandem-genotypes is counting the insertion as a repeat expansion. It counts insertions that are slightly outside the repeat: we found it necessary to do that in general, because the precise boundaries of repeats can be fuzzy and ambiguous (for non-exact repeats).

You could use tandem-genotypes option -n20 (to only count insertions <= 20 bp outside the repeat, instead of 60).
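
As a rough illustration of what the window size changes, here is a toy check of whether an insertion near the repeat would be counted. This is only a sketch of the idea, not the tandem-genotypes code: the function name is hypothetical, and the insertion position (30 bp to the left of the repeat) is invented for the example.

```python
def insertion_is_counted(ins_pos, repeat_start, repeat_end, near=60):
    """True if an insertion lies inside the repeat or within `near` bp of it."""
    return repeat_start - near <= ins_pos <= repeat_end + near


repeat_start, repeat_end = 70487135, 70487173  # the SHANK2 TGC repeat
ins_pos = repeat_start - 30                    # invented position, for illustration

print(insertion_is_counted(ins_pos, repeat_start, repeat_end, near=60))  # True
print(insertion_is_counted(ins_pos, repeat_start, repeat_end, near=20))  # False
```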

Maybe tandem-genotypes should be changed like this: when an insertion and deletion are so close to each other, merge them into one "in-del".
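
A minimal sketch of what such merging could look like, to make the idea concrete. This is not how tandem-genotypes works today: the `Gap` record, the `max_distance` threshold, and the coordinates are all assumptions made for the example.

```python
from typing import NamedTuple


class Gap(NamedTuple):
    ref_start: int  # reference coordinate where the gap begins
    ref_end: int    # reference coordinate where the gap ends
    ins_len: int    # bases present in the read but not in the reference
    del_len: int    # reference bases missing from the read


def merge_nearby_gaps(gaps, max_distance=10):
    """Merge gaps separated by at most `max_distance` reference bases."""
    merged = []
    for gap in sorted(gaps):
        if merged and gap.ref_start - merged[-1].ref_end <= max_distance:
            prev = merged[-1]
            merged[-1] = Gap(prev.ref_start,
                             max(prev.ref_end, gap.ref_end),
                             prev.ins_len + gap.ins_len,
                             prev.del_len + gap.del_len)
        else:
            merged.append(gap)
    return merged


def net_length_change(gap):
    """Net change in read length relative to the reference, in bp."""
    return gap.ins_len - gap.del_len


# Invented example: a 9 bp insertion 3 bp away from a 9 bp deletion.
gaps = [Gap(100, 100, 9, 0), Gap(103, 112, 0, 9)]
merged = merge_nearby_gaps(gaps)
print([net_length_change(g) for g in merged])  # [0]
```

With the two gaps merged into one "in-del", the net length change is zero, so the read would no longer look like an expansion.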

gspirito · 2024-05-31T07:48:58Z

Hi, thank you for the response. Could you provide the command you used to make the plot you showed? Thank you very much.

mcfrith · 2024-06-03T04:35:32Z

Amazingly, it's still in my shell's history:
grep -B3 6f8e3f3a SAMPLE_MAF.txt | last-dotplot -a SHANK2_locus_rpmsk.txt -1 chr11:70487085-70487223 - myfig.png
