Sniffles error #478

Closed
ajbarley opened this issue Apr 22, 2024 · 13 comments
@ajbarley

Hi,
I'm encountering an error when I run sniffles2 on my dataset, and I was wondering if you could advise me on a solution. I mapped my ONT data to a reference genome using minimap2 (minimap2 -ax map-ont -t 30 AspMarm2.0.fasta mergedtigris.fastq.gz | samtools sort -@ 4 -m 4G > np_mapping.bam), indexed the BAM file (samtools index np_mapping.bam), and then ran sniffles2 on Linux (sniffles -i np_mapping.bam -v punc_marm.vcf). The analysis starts but throws an error before completing (standard output attached here). Do you know what the issue is? Thanks.
sniffles_stdout.txt

@hermannromanek
Collaborator

Hi @ajbarley

It looks like either your bam file contains reads with invalid SA tags (chimeric alignments are not properly represented), or there is something else there we do not expect.

Can you share a couple of supplementary reads, i.e. reads with an SA tag? Also, I'll mark this as a bug - sniffles should handle this case better.

Thanks,
Hermann

@rohanmaddamsetti

rohanmaddamsetti commented Apr 28, 2024

Hello, I am encountering a very similar bug (same line of code, but in my case more than 6 right-hand values are found, while @ajbarley sees 1 right-hand value when 6 are expected). I generated SAM alignments with minimap2, and then converted them to sorted and indexed CRAM or BAM files for Sniffles2. I get the same error on both CRAM and BAM files (I first tried CRAM, then switched to BAM to see if it would fix the error, but ran into the same error message).

Here is an example of the Sniffles2 output I get-- also happy to share the SAM/CRAM/BAM inputs over google drive if this would be helpful.

I am running sniffles 2.3.2 on our HPC (Duke Compute Cluster) running GNU/Linux, x86-64, and I installed sniffles2 with "pip install sniffles".

Thanks so much for your time!
slurm-7892652.out.txt

@hermannromanek
Collaborator

Hello @rohanmaddamsetti

Yes, uploading the files would be very helpful to investigate what's going on. This seems to be an invalid SA tag, which sniffles expects to have 6 elements according to the specification (https://samtools.github.io/hts-specs/SAMtags.pdf).

Thanks,
Hermann
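
For readers following along, here is a minimal sketch of what "6 elements" means. The SAMtags specification describes the SA value as a list of entries of the form rname,pos,strand,CIGAR,mapQ,NM, each terminated by a semicolon. The `SAEntry` type and `parse_sa_tag` function below are hypothetical names made up for illustration; this is not Sniffles' actual parsing code.

```python
from typing import NamedTuple


class SAEntry(NamedTuple):
    """One supplementary alignment from an SA tag (hypothetical helper type)."""
    rname: str
    pos: int
    strand: str
    cigar: str
    mapq: int
    nm: int


def parse_sa_tag(value: str) -> list[SAEntry]:
    """Parse an SA tag value (the part after 'SA:Z:') with a plain split.

    Illustration only: assumes reference names contain no ',' or ';'.
    """
    entries = []
    # Each supplementary alignment is terminated by a semicolon.
    for chunk in value.split(";"):
        if not chunk:
            continue  # empty piece after the trailing ';'
        fields = chunk.split(",")
        if len(fields) != 6:
            raise ValueError(f"expected 6 SA fields, got {len(fields)}: {chunk!r}")
        rname, pos, strand, cigar, mapq, nm = fields
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries
```

With a well-formed value such as "Chr_1,1500,+,30S70M,60,1;" this returns one entry; anything that does not split into exactly 6 fields raises a ValueError.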

@rohanmaddamsetti

Hi @hermannromanek, thanks for looking into this, and sharing the SAM/BAM/CRAM spec.

Here is a link to a folder containing the input files:
https://drive.google.com/drive/folders/1Idf2LK0a2jHOHCNVFcUfbwngxGlPtMP1?usp=sharing

This contains the sorted BAM files and indexes for two samples, one with Oxford Nanopore data, and the other with PacBio data. The corresponding Sniffles2 log files are also in there. There is also a table that lists which sample is PacBio and which is ONT data.

Let me know if it would be helpful to upload the SAM files that I used to generate these sorted and indexed BAM files (using samtools). My upload speeds right now are slow, I can upload them later this evening or tomorrow on a faster network if needed.

Thanks,
Rohan

@hermannromanek
Collaborator

Thank you for uploading the files @rohanmaddamsetti

I took a quick look at them; the problem is that your reference names contain a comma:
[screenshot]

This is the reason sniffles sees 7 fields instead of the expected 6. SAM specification section 1.2.1 excludes the comma as a valid character for a reference name (although for some reason it allows a semicolon, which I now suspect may be the cause of the error @ajbarley saw - this equally throws off parsing the contents of an SA tag).

Thanks,
Hermann
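
The two failure modes reported in this thread can be reproduced with a plain split. The sketch below only counts fields; `sa_field_counts` is a hypothetical helper, and the comma-containing name is a made-up example (the semicolon-containing name is one that appears later in this thread).

```python
def sa_field_counts(value: str) -> list[int]:
    """Count comma-separated fields in each ';'-separated piece of an SA value."""
    return [len(piece.split(",")) for piece in value.split(";") if piece]


# Well-formed entry: one piece with 6 fields.
print(sa_field_counts("Chr_1,1500,+,30S70M,60,1;"))                    # [6]

# Made-up reference name containing a comma: 7 fields instead of 6.
print(sa_field_counts("contig_1,len=5000,1500,+,30S70M,60,1;"))        # [7]

# Reference name containing a semicolon: the name is cut off into its
# own piece with a single field, and the rest looks like a 6-field entry.
print(sa_field_counts("Scaffold_822;HRSCAF=896,1500,+,30S70M,60,1;"))  # [1, 6]
```

This matches the two reports: more than 6 values with a comma in the name, and 1 value where 6 are expected with a semicolon in the name.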

@rohanmaddamsetti

Wow, thanks so much Hermann!

Easy for me to fix -- ensure that the FASTA headers in the reference genome used by minimap2 are SAM-compliant, specifically excluding commas and semicolons. Would have taken me a long time to figure out though! Thanks again for your help!

I'll post again when I re-run sniffles 2.3.2 to confirm that it works on my input data.

Cheers 🍻
Rohan
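
A check along these lines could be sketched as follows. The pattern is the reference-name regular expression from section 1.2.1 of the SAM specification as best I recall it, so please verify it against the current spec. The function name `check_fasta_names` is hypothetical, and the sketch assumes the aligner uses the first whitespace-delimited token of each FASTA header as the reference name.

```python
import re

# Reference-name pattern from SAM spec section 1.2.1 (reproduced from
# memory; verify against the current specification before relying on it).
SAM_RNAME = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)


def check_fasta_names(lines):
    """Return (name, problem) pairs for FASTA headers that may break SA parsing."""
    problems = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line
        header = line[1:].strip()
        # Assumption: the name is the first whitespace-delimited token.
        name = header.split()[0] if header else ""
        if not SAM_RNAME.fullmatch(name):
            problems.append((name, "not a valid SAM reference name"))
        elif ";" in name:
            # Legal per the spec, but collides with the SA entry separator.
            problems.append((name, "valid, but ';' collides with the SA separator"))
    return problems
```

Usage would be something like `check_fasta_names(open("reference.fasta"))`, where the file name is a placeholder; an empty list means no header was flagged.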

@rohanmaddamsetti

Hi @hermannromanek,

Confirming that I got sniffles2 working. Thanks again! If this gets sniffles2 working for @ajbarley then this issue can be closed.

@ajbarley
Author

Hey Hermann,
Thanks for getting back to me, and sorry for taking so long to get back to you. I was trying to do the mapping with ngmlr to see if the issue was specific to minimap, or perhaps an issue with the samtools conversion. But I saw you were able to diagnose a related issue for another user, so I figured I shouldn't wait longer to send this along. Here's a link to some of the reads (https://www.dropbox.com/scl/fo/cf79raq2ki18u30hrmqlb/AM7LIll-p6PEC0Trf8-j69s?rlkey=rn818buikq9t5mnx0xonmnufo&dl=0). Let me know if you see what the issue is or need anything else.

Thanks again for your help!

Anthony

@rohanmaddamsetti

@ajbarley if you haven't already done so, you should check the headers in your FASTA references for your alignments to make sure that they are being parsed correctly (see above). Good luck!

@ajbarley
Author

ajbarley commented May 1, 2024

Thanks. Yeah, I suspect that it is not the headers in the fasta file, as those are much simpler and do not contain commas (e.g., Chr_1, Chr_2). I did finish mapping with ngmlr, and had the same issue (here's some data from that file: https://www.dropbox.com/scl/fo/eieb5fc0e9htrarhfj93z/ACHAmwErMmYTOvIx6378cwM?rlkey=awc6d2tficjssddq18hue95at&dl=0). Let me know if you know what the issue might be. Thanks again for your help on this!
Anthony
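
Since a large reference can mix simple chromosome names with less obvious scaffold names, one way to double-check is to inspect the names actually recorded in the alignment file, for example from the text printed by `samtools view -H np_mapping.bam`. The sketch below works on that header text as a string; `reference_names` and `names_with` are hypothetical helper names, and the header in the usage note is invented for illustration.

```python
def reference_names(header_text: str) -> list[str]:
    """Extract the SN values from the @SQ lines of a SAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def names_with(header_text: str, chars: str = ",;") -> list[str]:
    """Return reference names containing any of the given characters."""
    return [n for n in reference_names(header_text) if any(c in n for c in chars)]
```

For a header containing `@SQ` lines for `Chr_1` and a made-up `Scaffold_1;X=2`, `names_with` would return only the second name.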

@hermannromanek
Collaborator

Hi @ajbarley

Thanks for the file - it confirmed the suspected problem of references containing semicolons in your bam file:

...
Chr_23
Scaffold_822;HRSCAF=896
Scaffold_2574;HRSCAF=2795
Scaffold_684;HRSCAF=744
...

Although this is legal according to the current spec (which I believe to be an error), this throws the parser for SA tags off, since semicolons are used there as the separator between the listed alignments.

I'll see if I can add some code to support reference names with semicolons without costing us too much performance.

Thanks,
Hermann
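
As an illustration of one possible approach (not the fix that went into Sniffles, which this thread does not show), an SA value can be matched entry by entry with a regular expression that anchors on the five well-structured fields after the name, so that the name itself may contain semicolons. `SA_ENTRY` and `parse_sa_tolerant` are hypothetical names.

```python
import re

# One SA entry: a lazily matched name followed by pos, strand, CIGAR,
# mapQ and NM, terminated by ';'. The name may contain ';' or ','.
SA_ENTRY = re.compile(
    r"(?P<rname>.+?),"
    r"(?P<pos>\d+),"
    r"(?P<strand>[+-]),"
    r"(?P<cigar>\*|(?:\d+[MIDNSHP=X])+),"
    r"(?P<mapq>\d+),"
    r"(?P<nm>\d+);"
)


def parse_sa_tolerant(value: str):
    """Parse an SA value into (rname, pos, strand, cigar, mapq, nm) tuples."""
    entries = []
    offset = 0
    while offset < len(value):
        m = SA_ENTRY.match(value, offset)
        if m is None:
            raise ValueError(f"cannot parse SA value at offset {offset}: {value!r}")
        entries.append(
            (m["rname"], int(m["pos"]), m["strand"], m["cigar"],
             int(m["mapq"]), int(m["nm"]))
        )
        offset = m.end()
    return entries
```

A regular expression is likely slower than a plain split, so a sketch like this might only be worth using as a fallback when the split yields the wrong number of fields. It also cannot resolve every ambiguity: a name that itself looks like a complete entry would still be misread.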

@ajbarley
Author

ajbarley commented May 2, 2024

Ah, yep, you are right, that's the issue, thanks! Seems to work well now, thanks so much for your help!!

@lfpaulin
Collaborator

Looks completed 3 weeks ago
