samtools view command error #1622
Comments
It looks like a network error. Is your error reproducible? Does your command work if you download the reference and index and run it locally? I ran your command on some of our human data and it worked normally, but since I don't know what is in your ourfile.bam I am not sure it is a good test.
Initially, it works for some time. After 5 or 10 minutes it stops working with the above error. It is currently running inside a docker container on a GCP VM. Testing the command with local files works fine.
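Since the local-file run works, one way to sidestep the remote read is to fetch the reference and its index once and encode against the local copies. The following is only a minimal sketch under assumptions: REF_DIR and the helper names local_path, fetch_once and encode_with_local_ref are hypothetical, and it relies on samtools finding the .fai next to the local FASTA.

```shell
#!/bin/sh
# Sketch: download the reference once, then run the conversion against local files.
# REF_URL is the URL from the original command; REF_DIR is a hypothetical cache directory.
REF_URL="https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta"
REF_DIR="${REF_DIR:-./ref}"

# Map a URL to its local path under REF_DIR (keeps only the file name).
local_path() {
    printf '%s/%s\n' "$REF_DIR" "${1##*/}"
}

# Download a URL unless a non-empty local copy already exists.
fetch_once() {
    dest=$(local_path "$1")
    [ -s "$dest" ] && return 0
    mkdir -p "$REF_DIR" &&
    curl --fail --location --retry 3 --output "$dest" "$1"
}

# Fetch FASTA and index, then encode against the local FASTA.
# $1 = input BAM, $2 = output CRAM
encode_with_local_ref() {
    fetch_once "$REF_URL" &&
    fetch_once "$REF_URL.fai" &&
    samtools view -C -T "$(local_path "$REF_URL")" -o "$2" "$1"
}

# Usage: encode_with_local_ref ourfile.bam somatic-test.cram
```

Nothing is downloaded or converted until encode_with_local_ref is called, so the file can be sourced safely. The trade-off is local disk space for the reference in exchange for not holding an HTTPS connection open during the run.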
Not until tomorrow. My current working machine does not allow containers. I would have to set something up.
Sounds good. Could you also verify whether we can work with gs:// URLs instead of local files or HTTPS URLs?
@skatragadda-nygc I have run it in a docker container using a 1000 Genomes file and it worked for me. The running time was about 30 minutes on both coordinate-sorted and name-sorted data. HTTPS and GS URLs both worked. It may be something to do with the GCP VM, but I do not have enough in-depth knowledge to help there.
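For reference, the path-style HTTPS URL used in the original command and the gs:// form name the same bucket and object, so switching between them is a prefix swap. A small sketch follows; the helper name https_to_gs is hypothetical, and access to non-public buckets would need credentials, which this does not cover.

```shell
#!/bin/sh
# Convert https://storage.googleapis.com/BUCKET/OBJECT to gs://BUCKET/OBJECT.
# Any other input is printed unchanged.
https_to_gs() {
    prefix="https://storage.googleapis.com/"
    case "$1" in
        "$prefix"*) printf 'gs://%s\n' "${1#"$prefix"}" ;;
        *)          printf '%s\n' "$1" ;;
    esac
}

REF_HTTPS="https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta"
REF_GS=$(https_to_gs "$REF_HTTPS")
echo "$REF_GS"
# prints gs://genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta
```

The resulting value can be passed to -T in place of the HTTPS URL, with "$REF_GS.fai" for the index.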
@whitwham I'm @skatragadda-nygc's colleague. We did more debugging. It turns out that our end users have been running samtools with the default 1 thread. This always triggers the libcurl error at exactly this position. If we increase the threads to 2 or more, the run completes successfully. I'm wondering if you're running samtools with more threads?
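A sketch of that workaround applied to the original command, assuming the thread count was raised with the --threads (-@) option of samtools view. The wrapper name encode_threaded and the RUN dry-run switch are hypothetical; by default the command is only printed.

```shell
#!/bin/sh
# Reference URL from the original command.
REF="https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta"

# Dry-run switch: defaults to "echo", which prints the command.
# Set RUN to the empty string (RUN= sh script.sh) to actually execute samtools.
RUN="${RUN-echo}"

# $1 = input BAM, $2 = output CRAM, $3 = thread count (default 2, an assumed value;
# the report above only says "2 or more").
encode_threaded() {
    $RUN samtools view --threads "${3:-2}" -C -T "$REF" -t "$REF.fai" -o "$2" "$1"
}

encode_threaded ourfile.bam somatic-test.cram
```

With RUN left at its default this prints the full samtools view command line with --threads 2 so it can be checked before a long run.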
@xk42 I'm running with only one thread. It is pretty much the same as the original command given by @skatragadda-nygc, but with my own input file and no verbosity option. I took a generic ubuntu image off docker and it is using an older Linux kernel. I have a VM with a newer kernel ready and I will see what running on that does.
We tested on two different bam files. It does look like the size might be triggering this. One of our bam files is 25G and we don't have any issues with it using 1 or more threads. The other one is 150G and this is the one that triggers the error when running with 1 thread.
Okay, the test with the newer kernel worked, though only on a 2.5G file. I'm now running a test on a 340G file, the biggest I have easily available. It may take some time.
Well this is interesting.
This was with the 340G bam file. It failed after 14 minutes, which is less time than the successful runs took. This, as you say, implies size is a factor. It did manage to write out a 2.7G cram file. I'll need to discuss this with my colleagues after the weekend.
I wonder if it's timing-related. Possibly Google is timing out the https connection between samtools reading the first and second chromosome references?
I'm not at all familiar with GCS, but it occurs to me that this may be an issue of keeping a file handle open for too long without interim usage. There is a significant difference in how CRAM works when threaded and unthreaded, which hints at this. When unthreaded, it opens the appropriate reference file and then periodically reads from it ( Now this is where threading dramatically changes things. CRAM threaded encoding doesn't load the reference in bits as it goes, as it's hard to work out when a reference section has gone out of scope and can be discarded. Instead it loads an entire chromosome at a time and holds it in memory. This makes the open/read calls (assuming separate chromosomes are separate connections) close together in time and avoids time-out issues. Obviously using 2 threads (or maybe even explicitly asking for I'm unsure of how we solve this properly in htslib, except perhaps with some error catch so if a
Are you using the latest version of samtools and HTSlib? If not, please specify.
(run samtools --version)
samtools 1.15
Using htslib 1.15
Please describe your environment.
(run uname -sr on Linux/Mac OS or wmic os get Caption, Version on Windows)
(run uname -m on Linux/Mac OS or wmic os get OSArchitecture on Windows)
(run gcc --version or clang --version)
Linux 5.10.0-12-cloud-amd64
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Please specify the steps taken to generate the issue, the command you are running and the relevant output.
samtools view -C -T https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta -t https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta.fai -o somatic-test.cram ourfile.bam --verbosity=8
Here is the error message:
[I::cram_next_container] Flush container 1/10059..-1
[E::easy_errno] Libcurl reported error 16 (Error in the HTTP2 framing layer)
[E::bgzf_read_block] Failed to read uncompressed data at offset 258539632: Input/output error
[E::bgzf_read] Read block operation failed with error 4 after 7093421 of 244615464 bytes
bgzf_read() on reference file: Input/output error
[E::cram_encode_container] Failed to load reference #1
samtools view: writing to "somatic-test.cram" failed: Input/output error
[I::cram_encode_compression_header] Wrote compression block header in 6 bytes
[E::easy_errno] Libcurl reported error 16 (Error in the HTTP2 framing layer)