Consensus sequences disagree with predicted numbers of repeats #13

Open
MaestSi opened this issue Aug 19, 2021 · 4 comments

MaestSi commented Aug 19, 2021

Dear tandem-genotypes developers,
I am using tandem-genotypes v1.8.3 to obtain consensus sequences for the two alleles.
After aligning the reads with LAST, I run:

tandem-genotypes -v -o2 "$TARGET" "${SAMPLE_NAME}.maf.gz" > "${SAMPLE_NAME}_tandem_genotypes_output_o2.txt"
tandem-genotypes-merge "$FASTQ_READS" "${SAMPLE_NAME}.par" "${SAMPLE_NAME}_tandem_genotypes_output_o2.txt" > "${SAMPLE_NAME}_lamassemble_consensus_sequences.fasta"

I am analysing some "difficult" samples in which one allele may show somatic mosaicism, namely different expansion lengths in different cells. I am therefore aware that the diploid assumption may not hold in this case. In particular, I am interested in producing a consensus sequence for the wild-type allele. In the tandem-genotypes output (-o2 option), the number of repeats is predicted correctly for both alleles. However, neither of the two consensus sequences from lamassemble represents the wild-type allele.
My questions are:
1- How are reads assigned to each allele?
2- Is it possible to retrieve which reads are used for producing consensus sequences for the two alleles?

Thanks in advance,
Simone

Owner

mcfrith commented Aug 20, 2021

1- Each read is simply assigned to the allele with nearest copy-number-change (breaking ties by choosing the shorter allele). The -v output shows each read's copy-number-change.
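
The rule described above can be sketched in a few lines of Python. This is only an illustration, not the tandem-genotypes source code: the function name `assign_read` is hypothetical, and "shorter allele" is assumed here to mean the allele with the smaller copy-number change.

```python
def assign_read(change, allele_a, allele_b):
    """Return the allele (a predicted copy-number change) nearest to a read.

    Hypothetical helper: 'change' is the read's copy-number change as listed
    in the -v output; allele_a and allele_b are the two predicted changes.
    """
    shorter, longer = sorted((allele_a, allele_b))
    # "<=" sends an exact tie to the shorter allele (assumed interpretation).
    if abs(change - shorter) <= abs(change - longer):
        return shorter
    return longer


# Example with predicted alleles 0 (wild type) and 20 (expanded).
for change in (1, 10, 18, 45):
    print(change, "->", assign_read(change, 0, 20))
```

Under this rule every read is assigned to one of the two alleles, so a read far from both predicted values (45 in the example) still joins its nearer allele.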

2- You can do this:

tandem-genotypes-merge seqs.fx tan-gen.txt > unmerged-sequences.fx

That will retrieve the reads for both alleles, mixed together. There doesn't seem to be a way to get the reads for each allele separately; maybe that should be fixed somehow...
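
Until such an option exists, one possible workaround is to split the unmerged reads by name outside the tool. The sketch below assumes you have already collected the read names for one allele yourself (for example from the per-read copy-number changes in the -v output). The function `select_fastq_records` is hypothetical, and it assumes plain four-line FASTQ records with no line wrapping.

```python
import io


def select_fastq_records(fastq_lines, wanted_names):
    """Yield (header, sequence, plus, quality) for reads named in wanted_names.

    Assumes well-formed four-line FASTQ records. The read name is taken as
    the first whitespace-delimited word after the leading '@'.
    """
    it = iter(fastq_lines)
    # zip over the same iterator four times groups the lines into records.
    for header, seq, plus, qual in zip(it, it, it, it):
        name = header[1:].split()[0]
        if name in wanted_names:
            yield tuple(s.rstrip("\n") for s in (header, seq, plus, qual))


# Tiny in-memory example; a real run would pass an open file handle instead.
demo = "@r1 first read\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n"
for record in select_fastq_records(io.StringIO(demo), {"r2"}):
    print("\n".join(record))
```

The selected records for each allele could then be written to separate files and merged independently.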

Author

MaestSi commented Aug 20, 2021

Dear Martin,
I think this explains the issue I faced with these "difficult" samples, since outliers due to somatic mosaicism are not treated as such, and they participate in the consensus sequence as well.
Thank you for the information,
Simone

Owner

mcfrith commented Aug 20, 2021

The "consensus" should be robust to "outliers"... but only up to a point.

Author

MaestSi commented Aug 20, 2021

Yes, I agree. Thank you for your quick answers.
Simone
