BAM Sequencing Integrity Hash #1935

Open
hammad93 opened this issue Nov 8, 2023 · 1 comment

hammad93 commented Nov 8, 2023

Is your feature request related to a problem? Please specify.

If I want to be sure that a BAM or related file is exactly the same, without any alterations, I can use a hashing algorithm. Some popular ones are:

  • MD5
  • SHA256

Describe the solution you would like.

https://github.com/hammad93/BAM-integrity-hash/tree/main

I used MD5 because it is faster than SHA256 and is sufficient for this purpose.
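As an illustration of the file-level approach, here is a minimal sketch that hashes a file's raw bytes with Python's standard `hashlib`. It is not taken from the linked repository, which may work differently; the function name `file_digest` and its parameters are hypothetical.

```python
import hashlib


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of the raw bytes of the file at `path`.

    `algorithm` is any name accepted by hashlib.new, e.g. "md5" or "sha256".
    The file is read in chunks so large BAM files need not fit in memory.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Note that a digest of the raw bytes only matches for byte-identical files. Recompressing or re-sorting a BAM changes the digest even when the records themselves are unchanged.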

jkbonfield (Contributor) commented

Is it like this? https://manpages.debian.org/unstable/biobambam2/bamseqchksum.1.en.html

I believe we use this locally to offer similar functionality, but there has been discussion about having something native to samtools too. That tool lets us compute sequence and quality checksums on FASTQs or unaligned BAMs, and then recompute them after alignment, sorting, and various other steps to ensure all the data is still present and correct. (It did spot errors too, which turned out to be hardware memory corruption.)
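The idea of a checksum that survives alignment and sorting can be sketched as follows. This is an assumption-laden illustration and not bamseqchksum's actual algorithm: each read is digested on its own, and the per-read digests are combined with a commutative operation (addition modulo 2^64) so that record order does not matter. The `Read` tuple shape and both function names are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple

# Hypothetical record shape for this sketch: (read name, sequence, quality string).
Read = Tuple[str, str, str]

MASK_64 = (1 << 64) - 1


def read_digest(read: Read) -> int:
    """Digest one read into a 64-bit integer (first 8 bytes of SHA-256)."""
    name, sequence, quality = read
    # A NUL separator keeps ("AB", "C") distinct from ("A", "BC").
    payload = "\0".join((name, sequence, quality)).encode("utf-8")
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")


def order_independent_checksum(reads: Iterable[Read]) -> Tuple[int, int]:
    """Return (read count, sum of per-read digests modulo 2**64).

    Addition is commutative, so sorting or shuffling the reads leaves
    the result unchanged, while a lost or altered read changes it.
    """
    count = 0
    total = 0
    for read in reads:
        total = (total + read_digest(read)) & MASK_64
        count += 1
    return count, total
```

Computing this once on the input reads and again after each processing step gives a cheap check that no read was dropped or modified, provided the steps do not legitimately rewrite the hashed fields.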

daviesrob added this to the wishlist milestone on Nov 14, 2023