`samtools faidx` fails to retrieve large scaffolds #1660
I also notice that
This was probably a left-over from the transition to 64-bit positions in HTSlib. Having the limit in fai_retrieve() caused very long references to be truncated even though programs like `samtools faidx` should be able to support them (see issue samtools/samtools#1660 - samtools faidx fails to retrieve large scaffolds). The limit is useful for legacy faidx interfaces that return the size in an `int *`, so tests for sizes over INT_MAX have been applied to them.
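The guard described in this commit message can be sketched in plain C. This is an illustration, not HTSlib code: `legacy_length()` is a hypothetical helper, and `int64_t` stands in for HTSlib's 64-bit `hts_pos_t`. Lengths that fit in an `int` are passed through; anything above `INT_MAX` is reported as an error instead of being silently truncated.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical legacy-style accessor: the length is handed back through
 * an int *, so any length above INT_MAX cannot be represented.
 * Instead of truncating it, return -1 and set *len_out to -1. */
static int legacy_length(int64_t len64, int *len_out)
{
    if (len64 < 0 || len64 > INT_MAX) {
        *len_out = -1;
        return -1;
    }
    *len_out = (int) len64;
    return 0;
}
```

Callers that need the full range are better served by HTSlib's 64-bit variants, such as `fai_fetch64()` and `faidx_fetch_seq64()`, which return the length through an `hts_pos_t *`.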
samtools/htslib#1446 should have gone most of the way to fixing this. I'm leaving this issue open for the moment, though, as there are a few other improvements that could be made here, notably to the amount of memory used when fetching a long sequence, which starts to get a bit excessive.
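As a rough illustration of why memory use grows with sequence length, assume the fetched region is held in a single NUL-terminated buffer at one byte per base (an assumption; the actual allocations in HTSlib and samtools may be larger). Under that assumption a 10 Gbp scaffold, the size range mentioned in this issue, needs about 10^10 bytes (roughly 9.3 GiB) before any other overhead. The helper name `region_buffer_bytes()` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: minimum bytes needed to hold one fetched region
 * [beg, end] (0-based, inclusive) in a single NUL-terminated buffer.
 * This is a lower bound only; it ignores extra copies or working
 * buffers that a real implementation might allocate. */
static uint64_t region_buffer_bytes(int64_t beg, int64_t end)
{
    if (end < beg)
        return 1;                      /* empty region: just the NUL */
    return (uint64_t) (end - beg + 1) + 1;
}
```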
Thanks @daviesrob. I pulled in the current develop head(s) and it still fails to retrieve.
Getting access to your files would be useful. It worked on my test file, but I cheated a bit by making my sequences all "N" so they compressed down to a reasonable size.
I have sent you an email.
Thanks to @daviesrob, who identified that this was because I was running out of memory on the node. Running on a node with enough memory worked as expected.
Good to hear that. I'm looking at making it a bit less memory hungry, but it'll need a bit more work in HTSlib to get there. I've also noticed that
Are you using the latest version of samtools and HTSlib? If not, please specify.
Please describe your environment.
- OS (run `uname -sr` on Linux/Mac OS or `wmic os get Caption, Version` on Windows)
- Machine architecture (run `uname -m` on Linux/Mac OS or `wmic os get OSArchitecture` on Windows)
- Compiler (run `gcc --version` or `clang --version`)

Linux 4.15.0-175-generic/x86_64
Please specify the steps taken to generate the issue, the command you are running and the relevant output.
We have a large plant assembly with some scaffolds larger than 10G in length.
`samtools faidx` has successfully indexed this file (the first part of the `.fai` file is below), but retrieving the contigs that are > 2^32 in length fails. Scaffolds smaller than 2^32 appear to be okay.

The end of the strace for the above is:
The top of the `.fai` file looks like this:
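For reference, each line of a `.fai` index has five tab-separated columns (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH), and the byte offset of a 0-based position `pos` is OFFSET + (pos / LINEBASES) * LINEWIDTH + pos % LINEBASES. The sketch below does that arithmetic with 64-bit integers so positions beyond 2^32 stay exact. The struct and function names are hypothetical (not HTSlib's internals), and because the actual `.fai` lines from this issue were not captured, any entry used with it is made up.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record for one .fai line:
 * NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
 * Numeric fields are 64-bit so lengths above 2^32 are exact. */
typedef struct {
    char name[256];
    int64_t length, offset, linebases, linewidth;
} fai_entry;

/* Parse one tab-separated .fai line; returns 0 on success, -1 otherwise. */
static int parse_fai_line(const char *line, fai_entry *e)
{
    int n = sscanf(line, "%255s\t%" SCNd64 "\t%" SCNd64 "\t%" SCNd64 "\t%" SCNd64,
                   e->name, &e->length, &e->offset,
                   &e->linebases, &e->linewidth);
    return (n == 5 && e->linebases > 0) ? 0 : -1;
}

/* Byte offset in the FASTA file of 0-based position pos in this sequence:
 * skip whole lines (each LINEWIDTH bytes, newline included), then the
 * remaining bases on the last partial line. */
static int64_t fai_byte_offset(const fai_entry *e, int64_t pos)
{
    return e->offset + (pos / e->linebases) * e->linewidth
                     + pos % e->linebases;
}
```

For example, for a made-up entry with LENGTH 5000000000, OFFSET 6, LINEBASES 60 and LINEWIDTH 61, position 4999999999 maps to byte offset 5083333338, a value that would not survive being stored in a 32-bit integer.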