When genotyping SVs with BayesTyper, the genotyping rate decreases as sample sequencing depth increases. #53

Open
yangqimeng99 opened this issue Apr 17, 2024 · 0 comments

Dear BayesTyper developer,

I hope this message finds you well. I am reaching out about an unexpected issue I encountered while using BayesTyper, a tool I greatly admire for its excellence in variant genotyping.

I am testing with a human sample, HG002, sequenced with 2x150 bp short reads, to genotype a set of structural variants (SVs) derived from HiFi reads. This SV set comprises only insertions (INS) and deletions (DEL) with alleles >50 bp. However, the genotyping rate is lowest with 30x short reads compared to runs at 20x and 10x coverage, which contradicts the common expectation that higher sequencing depth yields a better genotyping rate.

To be thorough, I ran the tests from both BAM and FASTQ input, and the outcomes consistently show the issue described above. Here is a brief outline of the commands I used (shown for BAM input):

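# Count 55-mers in the sample reads (BAM input, minimum k-mer count of 1)
kmc -k55 -ci1 -fbam ${inputBam} ${outputPrefix} ./kmc_tmp
# Build the Bloom filter from the KMC k-mer table
bayesTyperTools makeBloom -k ${outputPrefix} -p ${threads}
# Group the candidate variants into clusters
bayesTyper cluster -v ${inputVCF} -s ${sampleTsv} -g ${refCanon} -d ${refDecoy} -p ${threads}
# Genotype the clustered variants for the first unit
bayesTyper genotype -v bayestyper_unit_1/variant_clusters.bin -c bayestyper_cluster_data -s ${sampleTsv} -g ${refCanon} -d ${refDecoy} -o bayestyper_unit_1/bayestyper -z -p ${threads}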

With the commands above, I obtained genotyping rates of 0.45, 0.48, 0.50, and 0.48 at sequencing depths of 30x, 20x, 10x, and 5x, respectively.

I am at a loss as to why the genotyping rate drops at higher sequencing depths, and I would deeply appreciate any insights or suggestions. Is there a factor in BayesTyper that reduces the genotyping rate as read depth increases, specifically when genotyping SVs with short reads? Or could my commands or approach be introducing a bias or an error?
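How the genotyping rate is defined affects the comparison across depths. The post does not say how the rates were calculated, so the following is only a minimal sketch under stated assumptions: the rate is taken as the fraction of VCF records whose first sample has a fully called genotype (no `.` allele in the GT field), and the FILTER column is ignored. The function name `genotyping_rate` is hypothetical and is not part of BayesTyper.

```shell
# Hypothetical helper (not part of BayesTyper): print the fraction of VCF
# records whose first sample column has a fully called GT (no "." allele).
# Assumption: FILTER is ignored, so filtered records still count as called.
genotyping_rate() {
    vcf=$1
    # Read gzipped files through gzip, plain-text files through cat.
    case "$vcf" in
        *.gz) gzip -dc "$vcf" ;;
        *)    cat "$vcf" ;;
    esac | awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            total++
            # Find the position of GT inside the FORMAT column (column 9).
            n = split($9, fmt, ":")
            gt_idx = 0
            for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt_idx = i
            if (gt_idx == 0) next              # no GT field: counts as uncalled
            # Pull the matching value from the first sample (column 10).
            split($10, smp, ":")
            gt = smp[gt_idx]
            # A genotype is "called" only if it has no missing allele.
            if (gt != "" && gt !~ /\./) called++
        }
        END {
            if (total > 0) printf "%.4f\n", called / total
            else print "NA"
        }
    '
}
```

It could be run once per depth, for example `genotyping_rate bayestyper_unit_1/bayestyper.vcf.gz`, where that output path is an assumption based on the `-o` prefix and `-z` option in the commands above. Comparing the count of missing genotypes against the count of filtered records at each depth would show which of the two is driving the lower rate at 30x.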

Thank you very much for your time and assistance. I look forward to any guidance you can offer.

Best regards
