Identifying SV type from output #41

Open
janawold1 opened this issue Feb 3, 2022 · 1 comment

Comments

@janawold1

Hello @jonassibbesen,
I'm just reaching out for a bit of assistance.

Out of 33,462 candidate SVs called with Manta, I have successfully genotyped 26,606 SVs. To compare the relative frequency of SV type and size with those identified with other pipelines, I would like to count the number of deletions, duplications, insertions and inversions, and estimate their sizes.

I have tried to add symbolic alleles as per this approach; however, only a few hundred deletions and insertions could be identified.

I have also tried to identify SVs that overlap with the candidate SV file before and after running bayestyperTools convertAllele and found that very few sites intersect or overlap (again, all deletions or insertions).

After trawling through the internet I wasn't able to find much about converting from the long-sequence SV annotation format to symbolic alleles, which has me thinking that I'm missing something super obvious. In the original paper for BayesTyper, how did you identify the different SV types for comparisons?

I have thought about splitting the candidate SV calls into groups by type and then carrying on from the convertAllele step, but I wasn't sure whether this would negatively impact the genotype outputs, especially since these type-specific converted VCFs did not overlap well with the final output from running all SVs together.

Anyway, really like the tool and super keen for any help or advice you may have!

@janawold1 janawold1 changed the title from "Identifying SV type and" to "Identifying SV type from output" Feb 3, 2022
@jonassibbesen
Contributor

Hi,

Thanks. If you are just interested in summary statistics (type, length, frequency etc.) of the genotyped variants you can use this script: "src/bayesTyperTools/scripts/getSummary.cpp" (binary should be under "bin/scripts" when compiled).

There is also a script for converting sequences to symbolic alleles ("convertSeqToAlleleId.cpp"); however, I think the genotypes might be lost when using it. Also, the script only works for DEL, DUP and INV.
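For a quick cross-check outside the BayesTyper scripts, the type and size of a sequence-resolved allele can be roughly inferred from the REF and ALT strings alone. The sketch below is not what getSummary or convertSeqToAlleleId do; the function names (`classify_allele`, `summarise`) and the 50 bp size cutoff are assumptions made for illustration. Duplications cannot be told apart from other insertions without reference context, so they are counted as INS here, and an allele is only called INV when ALT is the exact reverse complement of REF.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_allele(ref, alt):
    """Return (type, size) for one sequence-resolved REF/ALT pair.

    Heuristic only: symbolic, breakend and missing alleles are skipped,
    and duplications are reported as INS.
    """
    if alt.startswith("<") or "[" in alt or "]" in alt or alt in {"*", "."}:
        return "OTHER", 0
    diff = len(alt) - len(ref)
    if diff > 0:
        return "INS", diff
    if diff < 0:
        return "DEL", -diff
    if len(ref) > 1 and alt.upper() == revcomp(ref).upper():
        return "INV", len(ref)
    return "SUB", len(ref)  # SNV or same-length substitution


def summarise(vcf_lines, min_size=50):
    """Count alleles per inferred type, keeping those of at least min_size bp."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            sv_type, size = classify_allele(ref, alt)
            if size >= min_size:
                counts[sv_type] += 1
    return counts
```

Passing an open VCF file handle to `summarise` gives a per-type tally; collecting the sizes in a list per type instead of a count would give the length distribution.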

Regarding the small overlap between the pre- and post-convertAllele SV sets: I am really surprised about this, since convertAllele should not trim the alleles and only moves inversions (by a single base). Can you provide more details on how you did this comparison?

Best,

Jonas
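One way to make the pre/post comparison concrete is to key each allele on chromosome, position, REF and ALT and compare the two sets directly. This is a minimal sketch with hypothetical helper names (`site_keys`, `compare_sites`), not the comparison used in the thread. Exact matching is deliberately strict: any difference in padding bases or position between the two files shows up as a non-match, which makes it easier to see where the sets diverge.

```python
def site_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys, one per ALT allele."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref.upper(), allele.upper()))
    return keys


def compare_sites(before_lines, after_lines):
    """Count alleles shared between, and unique to, two VCFs."""
    before = site_keys(before_lines)
    after = site_keys(after_lines)
    return {
        "shared": len(before & after),
        "only_before": len(before - after),
        "only_after": len(after - before),
    }
```

Printing a handful of the `before - after` keys next to nearby `after` records is usually enough to show whether the mismatch is a position shift, a representation change, or genuinely different sites.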
