Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge alignments into existing BAM/CRAM/SAM #1979

Draft
wants to merge 1 commit into
base: develop
Choose a base branch
from
Draft

Conversation

mp15
Copy link
Member

@mp15 mp15 commented Feb 1, 2024

No description provided.

@jkbonfield
Copy link
Contributor

Your develop is way behind. Please update so you can then rebase it. A lot of the test failures are elsewhere in samtools, but have been fixed already.

@jkbonfield
Copy link
Contributor

jkbonfield commented Feb 2, 2024

Regarding the functionality, I'm tackling this from a different direction and this is ongoing work that's likely to land in a PR soon.

  • Base-callers emit MM/ML tags
  • Aligners align data and keep reads clustered together, so primary + supplementary + secondary.
  • Some reads will be hard-clipped. Some aligners update MM/ML, but most do not.
  • User cannot disambiguate, unless base-caller also added MN, and even then that just means we reject data rather than correct it.

However, starting from that point, we have name-collated aligned data with enough information in the primary alignment to correct MM/ML problems in the secondary alignments. We already usually run fixmate at this point to fix up aligner glitches, but potentially also to cache some useful information for a subsequent duplicate removal step. So adding MM/ML to fixmate here also feels like a natural step. This avoids having to use the original FASTQ or uBAM to try and patch the aligner output.

That strategy - a lift over from another file - is still valid and useful in other cases, obviously headers, but potentially some tags also. See https://gitlab.com/german.tischler/biobambam2/-/blob/master/src/programs/bamauxmerge.1?ref_type=heads for an example tool which does this already, lifting tags from one file to another. Samtools having this functionality may well be a useful addition.

@mp15
Copy link
Member Author

mp15 commented Feb 4, 2024

Thanks James. I've just been looking at your fixmate_MM branch and I think I might be able to reuse a few of your functions to trim MM tags. Maybe you can put them into htslib? Part of my motivation is that basemods aren't the only AUX tag that my mapper hasn't copied appropriately.

@mp15
Copy link
Member Author

mp15 commented Feb 4, 2024

Also are MN tags specified anywhere? I can't find them on hts-spec's samtags?

@jkbonfield
Copy link
Contributor

Also are MN tags specified anywhere? I can't find them on hts-spec's samtags?

Yes, as a PR that really needs some traction:

samtools/hts-specs#714

@jkbonfield
Copy link
Contributor

jkbonfield commented Feb 5, 2024

Thanks James. I've just been looking at your fixmate_MM branch and I think I might be able to reuse a few of your functions to trim MM tags. Maybe you can put them into htslib? Part of my motivation is that basemods aren't the only AUX tag that my mapper hasn't copied appropriately.

I'd like them in htslib at some point, but right now I'm just trying to get my head around the nightmare of fixmate main loop! Fundamentally it's broken for what we need to do. As soon as it finds the mate, it clears all "prev"ious meta-data. Hence if we have a primary pair and a secondary aligned pair then there's no way to lift over MM information from the primaries to the secondaries. I've no idea why this code is so convoluted, but I'm thinking of just rewriting it to buffer up all reads with the same name (assuming not "*") and process in batches of name-by-name. It's cleaner and simpler. Attempting to toggle between two bam1_t structs is obscure and not sadly also insufficient.

The alternative (perhaps safe) option is to leave fixmate as a "it works, don't touch!" thing and write a new processing command. Tempting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants