Sniffles error #478
Comments
Hi @ajbarley, it looks like either your BAM file contains reads with invalid SA tags (chimeric alignments are not properly represented), or there is something else there that we do not expect. Can you share a couple of supplementary reads, i.e. reads with an SA tag? Also, I'll mark this as a bug: Sniffles should handle this case better. Thanks,
Hello, I am encountering a very similar bug (same line of code, but in my case more than 6 right-hand values are found, while @ajbarley sees 1 right-hand value when 6 are expected). I generated SAM alignments with minimap2, and then converted them to sorted and indexed CRAM or BAM files for Sniffles2. I get the same error on both CRAM and BAM files (I first tried CRAM, then switched to BAM to see if it would fix the error, but ran into the same error message). Here is an example of the Sniffles2 output I get -- I am also happy to share the SAM/CRAM/BAM inputs over Google Drive if this would be helpful. I am running sniffles 2.3.2 on our HPC (Duke Compute Cluster) running GNU/Linux, x86-64, and I installed sniffles2 with "pip install sniffles". Thanks so much for your time!
Hello @rohanmaddamsetti, yes, uploading the files would be very helpful for investigating what's going on. This seems to be an invalid SA tag, which Sniffles expects to have 6 elements according to the specification (https://samtools.github.io/hts-specs/SAMtags.pdf). Thanks,
Hi @hermannromanek, thanks for looking into this, and sharing the SAM/BAM/CRAM spec. Here is a link to a folder containing the input files: This contains the sorted BAM files and indexes for two samples, one with Oxford Nanopore data, and the other with PacBio data. The corresponding Sniffles2 log files are also in there. There is also a table that lists which sample is PacBio and which is ONT data. Let me know if it would be helpful to upload the SAM files that I used to generate these sorted and indexed BAM files (using samtools). My upload speeds right now are slow, so I can upload them later this evening or tomorrow on a faster network if needed. Thanks,
Thank you for uploading the files, @rohanmaddamsetti. I took a quick look at them; the problem is that your reference names contain a comma: This is the reason Sniffles sees 7 fields instead of the expected 6. SAM specification section 1.2.1 excludes the comma as a valid character for a reference name (although for some reason it allows a semicolon, which I now suspect may be the cause of the error @ajbarley saw, since it equally throws off parsing the contents of an SA tag). Thanks,
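To make the field counts above concrete, here is a minimal sketch of a naive SA-tag parser. It is not Sniffles' actual code, and the reference names `contig_1,len=5000` and `contig;1` are made up for illustration. It assumes only what the SAM tags specification states: each SA entry has the form `rname,pos,strand,CIGAR,mapQ,NM`, and entries are terminated by semicolons.

```python
def split_sa_naive(sa_value):
    """Split an SA tag value on ';' into entries, then each entry on ','."""
    entries = [entry for entry in sa_value.split(";") if entry]
    return [entry.split(",") for entry in entries]


# A well-formed entry yields the expected 6 fields.
ok = split_sa_naive("chr1,100,+,50M50S,60,2;")

# Hypothetical reference name containing a comma: one field too many.
bad_comma = split_sa_naive("contig_1,len=5000,100,+,50M50S,60,2;")

# Hypothetical reference name containing a semicolon: the entry is cut in two.
bad_semicolon = split_sa_naive("contig;1,100,+,50M50S,60,2;")

print([len(fields) for fields in ok])             # [6]
print([len(fields) for fields in bad_comma])      # [7]
print([len(fields) for fields in bad_semicolon])  # [1, 6]
```

The comma case reproduces the "7 fields instead of 6" symptom, and the semicolon case produces a fragment with a single field, which matches the "1 value when 6 are expected" symptom reported at the top of this issue.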
Wow, thanks so much Hermann! Easy for me to fix -- ensure that the FASTA headers in the reference genome used by minimap2 are SAM-compliant, specifically excluding commas and semicolons. Would have taken me a long time to figure out though! Thanks again for your help! I'll post again when I re-run sniffles 2.3.2 to confirm that it works on my input data. Cheers 🍻
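A header check like the one described above could be sketched as follows. The function name is hypothetical, and the sketch assumes that the aligner takes the first whitespace-delimited word of each FASTA header as the reference name; it flags only commas and semicolons, the two characters that broke SA-tag parsing in this thread, rather than validating names against the full SAM grammar.

```python
def noncompliant_fasta_names(lines, forbidden=",;"):
    """Return sequence names whose first header word contains a forbidden character."""
    bad = []
    for line in lines:
        if line.startswith(">"):
            # Assumption: the reference name is the first word after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            if any(ch in name for ch in forbidden):
                bad.append(name)
    return bad


# Small in-memory example with made-up headers.
example = [
    ">Chr_1 some description, with a comma",
    "ACGT",
    ">contig_1,len=5000",
    "ACGT",
    ">contig;2",
    "ACGT",
]
print(noncompliant_fasta_names(example))  # ['contig_1,len=5000', 'contig;2']
```

To check a real reference, pass an open file handle instead of the list, for example `noncompliant_fasta_names(open("reference.fasta"))`, where `reference.fasta` is a placeholder path. Note that a comma in the description part of the header (after the first space) is not flagged, because under the stated assumption it does not become part of the reference name.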
Hi @hermannromanek, confirming that I got sniffles2 working. Thanks again! If this gets sniffles2 working for @ajbarley, then this issue can be closed.
Hey Hermann, thanks again for your help! Anthony
@ajbarley if you haven't already done so, you should check the headers in your FASTA references for your alignments to make sure that they are being parsed correctly (see above). Good luck!
Thanks. Yeah, I suspect that it is not the headers in the FASTA file, as those are much simpler and do not contain commas (e.g., Chr_1, Chr_2). I did finish mapping with ngmlr and had the same issue (here's some data from that file: https://www.dropbox.com/scl/fo/eieb5fc0e9htrarhfj93z/ACHAmwErMmYTOvIx6378cwM?rlkey=awc6d2tficjssddq18hue95at&dl=0). Let me know if you know what the issue might be. Thanks again for your help on this!
Hi @ajbarley, thanks for the file. It confirmed the suspected problem of reference names containing semicolons in your BAM file: ... Although this is legal according to the current spec (which I believe to be an error), it throws off the parser for SA tags, since semicolons are used there as the separator between alignment entries. I'll see if I can add some code to support reference names with semicolons without costing us too much performance. Thanks,
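One way such tolerant parsing could look is sketched below. This is purely illustrative and is not the fix that went into Sniffles; the function name, the regular expression, and the example reference names are all assumptions. The idea is to anchor on the five well-structured trailing fields (`pos,strand,CIGAR,mapQ,NM;`) and treat whatever precedes them as the reference name, so that commas or semicolons inside the name no longer split the entry.

```python
import re

# Hypothetical pattern: a lazily matched reference name, followed by
# pos, strand, CIGAR (or '*'), mapQ and NM, terminated by ';'.
SA_ENTRY = re.compile(r"(.+?),(\d+),([+-]),([0-9MIDNSHP=X]+|\*),(\d+),(\d+);")


def parse_sa_tolerant(sa_value):
    """Parse an SA tag value into (rname, pos, strand, cigar, mapq, nm) tuples."""
    return [
        (m.group(1), int(m.group(2)), m.group(3), m.group(4),
         int(m.group(5)), int(m.group(6)))
        for m in SA_ENTRY.finditer(sa_value)
    ]


entries = parse_sa_tolerant("contig;1,100,+,50M50S,60,2;chr2,5,-,10S90M,30,0;")
for entry in entries:
    print(entry)
```

A regular expression is slower than plain `str.split`, which is the performance trade-off mentioned above; a real implementation might try the fast split first and fall back to a pattern like this only when the field count is not 6.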
Ah, yep, you are right, that's the issue, thanks! Seems to work well now, thanks so much for your help!!
Hi,
I am encountering an error when I run sniffles2 on my dataset, and I was wondering if you could advise me on a solution. I mapped my ONT data to a reference genome using minimap2 (minimap2 -ax map-ont -t 30 AspMarm2.0.fasta mergedtigris.fastq.gz | samtools sort -@ 4 -m 4G > np_mapping.bam), indexed the BAM file (samtools index np_mapping.bam), and then ran sniffles2 on Linux (sniffles -i np_mapping.bam -v punc_marm.vcf). The analysis starts, but throws an error before completing (standard output attached here). Do you know what the issue is? Thanks.
sniffles_stdout.txt