Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

genotype failed: Assertion `parameters.first > 0' failed. #50

Open
Jesson-mark opened this issue Feb 8, 2023 · 3 comments
Open

genotype failed: Assertion `parameters.first > 0' failed. #50

Jesson-mark opened this issue Feb 8, 2023 · 3 comments

Comments

@Jesson-mark
Copy link

Hi,

I am using Bayestyper to genotype SNPs of HG002 but I'm getting an assertion failure error. The reference SNPs are fetched from file processed by GIAB using freebayes.

The commands I used are listed below:

kmc -k55 -fbam 20_snps_reads.sorted.bam 55mers .

num_threads=1
bayesTyperTools makeBloom -k 55mers -p $num_threads 

ref_build=genome/bayestyper_GRCh38_bundle_v1.3/GRCh38
bayesTyper cluster -v test_20_snps.with_chr.vcf.gz -s samples.tsv -g ${ref_build}_canon.fa -d ${ref_build}_decoy.fa -p $num_threads

bayesTyper genotype -v bayestyper_unit_1/variant_clusters.bin -c bayestyper_cluster_data \
    -s samples.tsv -g ${ref_build}_canon.fa -d ${ref_build}_decoy.fa \
    -o bayestyper_unit_1/bayestyper -z -p $num_threads 

The logs of bayesTyper genotype process are below:

[08/02/2023 12:52:36] You are using BayesTyper (v1.5)

[08/02/2023 12:52:36] Seeding pseudo-random number generator with 1675831956 ...
[08/02/2023 12:52:36] Setting the kmer size to 55 ...

[08/02/2023 12:52:36] Parsed information for 1 sample(s)

[08/02/2023 12:52:36] Parsing reference genome ...
[08/02/2023 12:53:29] Parsed 65 reference genome chromosomes(s) (3095211400 nucleotides)

[08/02/2023 12:53:29] Parsing decoy sequence(s) ...
[08/02/2023 12:53:30] Parsed 2515 decoy sequence(s) (10503663 nucleotides)

[08/02/2023 12:53:37] Maximum resident set size: 3.28199 Gb


[08/02/2023 12:53:37] Parsing variant clusters ...
[08/02/2023 12:53:38] Parsed 53 variant clusters (100 variants)

[08/02/2023 12:53:39] Parsing parameter kmers ...
[08/02/2023 12:53:41] Parsed 1000000 kmers

[08/02/2023 12:53:41] Maximum resident set size: 5.02302 Gb


[08/02/2023 12:53:41] Counting kmers in variant cluster paths ...
[08/02/2023 12:53:41] Counting kmers in inter-cluster regions and decoy sequence(s) ...

[08/02/2023 13:25:38] Parsing KMC table containing 19636 kmers for sample HG002 ...

[08/02/2023 13:25:38] Classifying kmers in variant cluster paths ...
[08/02/2023 13:25:38] Out of 2113360 kmers:

	- 5163 have a match to a single variant cluster
	- 0 have a match to single variant cluster group and multiple variant clusters

	- 0 have match to at least one variant cluster and has match to a decoy sequence (not used for inference)
	- 6 have match to at least one variant cluster and has a maximum haploid multiplicity higher than 127 (not used for inference)
	- 0 have matches to multiple variant cluster groups within or across inference units (not used for inference)

	- 2108191 have no match to a variant cluster (includes parameter kmers)

[08/02/2023 13:25:38] Maximum resident set size: 5.2132 Gb


[08/02/2023 13:25:38] Estimating genomic haploid kmer count distribution(s) from parameter kmers ...

bayesTyper: /isdata/kroghgrp/jasi/bayesTyper/code/releases/v1.5_static/BayesTyper-1.5/src/bayesTyper/NegativeBinomialDistribution.cpp:94: void NegativeBinomialDistribution::setParameters(const std::pair<double, double>&): Assertion `parameters.first > 0' failed.

Do you have any suggestions about this error?

Many thanks,
Jesson-mark.

@jonassibbesen
Copy link
Contributor

Hi Jesson-mark,

Thanks for writing. This error likely happens because the estimated variance for the genomic count distribution is zero. This can happen if you have no kmers in regions between the variants since these are used to estimate the parameters for the distribution. I can see that the KMC table only contains 19636 kmers, which is really low. Does the 20_snps_reads.sorted.bam alignment file only contain a subset of the reads?

Best,

Jonas

@Jesson-mark
Copy link
Author

Hi, @jonassibbesen

Sorry for late reply!

Yes, file 20_snps_reads.sorted.bam only contains chr1:1-100000 reads because SNPs in these regions are used as test case. So if the problem is that some SNPs are not covered by kmers in KMC table, should I increase the number of alignments in bam file?

Best regards,

Jesson-mark

@jonassibbesen
Copy link
Contributor

Hi Jesson-mark,

The issue is actually that you provided the whole genome as input, but only gave reads for a small region. Non-variant regions are used to estimate the parameters for the genomic count distribution, but in your case almost all of these do not contain any read kmers (e.g. all kmers on other chromosomes).

Therefore, if you want to run BayesTyper on a smaller region for your test case you should also subset the reference genome (chr1:1-100000).

Let me know if that does not solve it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants