Consensus sequences disagree with predicted numbers of repeats #13

MaestSi · 2021-08-19T13:21:27Z

Dear tandem-genotypes developers,
I am using tandem-genotypes v1.8.3 to obtain consensus sequences for the two alleles.
After running the alignment with last, I am running:

tandem-genotypes -v -o2 $TARGET $SAMPLE_NAME".maf.gz" > $SAMPLE_NAME"_tandem_genotypes_output_o2.txt"
tandem-genotypes-merge $FASTQ_READS $SAMPLE_NAME".par" $SAMPLE_NAME"_tandem_genotypes_output_o2.txt" > $SAMPLE_NAME"_lamassemble_consensus_sequences.fasta"

I am analysing some "difficult" samples, where one allele may show somatic mosaicism, namely different expansion lengths for different cells. Therefore, I am aware that the diploid assumption may not hold true in this case. In particular I am now interested in producing a consensus sequence for the wild-type allele. When looking at tandem-genotypes output (-o2 option) I can see the number of repeats for the two alleles is predicted correctly. However, neither of the two consensus sequences from lamassemble represents the wild-type allele.
My questions are:
1- How are reads assigned to each allele?
2- Is it possible to retrieve which reads are used for producing consensus sequences for the two alleles?

Thanks in advance,
Simone

mcfrith · 2021-08-20T04:21:45Z

1- Each read is simply assigned to the allele with nearest copy-number-change (breaking ties by choosing the shorter allele). The -v output shows each read's copy-number-change.
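
The rule above can be sketched in a few lines of Python. This is only an illustration of the described logic, not the tool's actual code: the function name `assign_reads`, the input dictionaries, and the example numbers are all hypothetical, and "shorter allele" is assumed here to mean the allele with the smaller copy-number change.

```python
def assign_reads(read_changes, allele_changes):
    """Map each read name to the index of the allele it would be assigned to.

    read_changes:   {read_name: copy-number change of that read}  (hypothetical input)
    allele_changes: predicted copy-number change of each allele   (hypothetical input)
    """
    assignment = {}
    for name, change in read_changes.items():
        # Sort key is (distance to allele, allele value): the nearest allele wins,
        # and an exact tie goes to the allele with the smaller copy-number change
        # (assumed meaning of "shorter allele").
        best = min(
            range(len(allele_changes)),
            key=lambda i: (abs(change - allele_changes[i]), allele_changes[i]),
        )
        assignment[name] = best
    return assignment


# Made-up example: a wild-type allele (change 0) and an expanded allele (change 40).
reads = {"read1": 0, "read2": 1, "read3": 20, "read4": 42, "read5": 120}
alleles = [0, 40]
print(assign_reads(reads, alleles))
```

In this made-up example, `read3` is equidistant from both alleles and goes to the wild-type one, while `read5` (change 120) is still assigned to the expanded allele because that is the nearest of the two. Under this rule every read lands in one of the two groups, however far it is from both predicted alleles.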

2- You can do this:

tandem-genotypes-merge seqs.fx tan-gen.txt > unmerged-sequences.fx

That will retrieve the reads for both alleles, mixed together. There doesn't seem to be a way to get the reads for each allele separately; maybe that should be fixed somehow...

MaestSi · 2021-08-20T07:18:31Z

Dear Martin,
I think this explains the issue I faced with these "difficult" samples, since outliers due to somatic mosaicism are not treated as such, and they participate in the consensus sequence as well.
Thank you for the information,
Simone

mcfrith · 2021-08-20T11:02:52Z

The "consensus" should be robust to "outliers"... but only up to a point.

MaestSi · 2021-08-20T11:51:10Z

Yes, I agree. Thank you for your quick answers.
Simone
