Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with idxdepth: "Assertion failed: _impl->header_contig_map.count(chr) != 0" #64

Open
cassisnaro opened this issue May 4, 2021 · 1 comment

Comments

@cassisnaro
Copy link

hi,

I have a BAM file downloaded from the GDC and the corresponding reference fasta (GRCh38.d1.vd1.fa). I have the following assertion which is triggered:
Assertion failed: _impl->header_contig_map.count(chr) != 0
After adding log messages, at least one case where it fails (I am not sure how multithreading might change results here) is with the sequence 'HCV-1'. I get the log line
Thread 140029280118528 estimating depth for HCV-1
but if I add a warning in BamReader::estimateDepth I see that what is being worked on is 'HCV'. I believe that the problem resides in the separators used in StringUtil::parsePos at line 159.
I should note that HCV-1 and HCV-2 were added to reference_chromosomes map in estimateDepths and to both_chromosomes in the same function.

I do not know what the correct path for correction is, as I expect you had your reasons to include '-' as a separator. I expect that if I modify the fasta and prepare a new header for my Bam in order to substitute the '-' with '_' this would correct the situation. Or would you plan an update to address this?

If I may also take the opportunity to hint at the warning: BAM header only has a subset of the reference chromosomes -- please make sure they match!. With my input, many contigs are not included in the both_chromosomes map. For example, among others, chr1_KI270706v1_random, chr1_KI270707v1_random, HTLV-1, or HPV100. Clearly, disregarding is on purpose by testing include_alt_contig, but then the warning is misleading. (And I am not sure if later on, missing information in the both_chromosomes, might trigger more errors).

Regards,

@traxexx
Copy link
Contributor

traxexx commented May 5, 2021

Thanks for pointing this out! You're right this is because '-' is a separator. See the function at StringUtil.hh line 156.
To make your pipeline more robust, I would recommend using underscore in chromosome names instead of dash. I am not if other tools have similar issues...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants