sam: Quality score ambiguity when sequence is a single base #715

Open
zaeleus opened this issue Apr 13, 2023 · 2 comments · May be fixed by #724

@zaeleus

zaeleus commented Apr 13, 2023

This is in regard to the Sequence Alignment/Map Format Specification (2022-08-22), § 1.4 "The alignment section: mandatory fields".

In the following SAM record, the quality scores field (QUAL) is ambiguous.

*	4	*	0	255	*	*	0	0	A	*

Since there is a single base in the sequence, the quality scores field can be read either as unavailable (*) or as the single score [9] ('*' is ASCII 42, and 42 - 33 = 9 in the Phred+33 encoding).
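To make the collision concrete, here is a minimal Python sketch. It assumes the standard Phred+33 encoding of QUAL; `decode_qual` is a hypothetical helper written for this illustration, not part of any SAM library, and it deliberately ignores the `*` convention.

```python
def decode_qual(qual: str) -> list[int]:
    """Decode a Phred+33 QUAL string into scores.

    Hypothetical helper for illustration: it does NOT special-case "*",
    which is exactly what exposes the ambiguity.
    """
    return [ord(c) - 33 for c in qual]


# "*" is ASCII 42, so a naive decode yields one score of 9.
print(decode_qual("*"))  # [9]
```

A one-character QUAL of `*` is therefore byte-for-byte identical to the "unavailable" marker, and only a convention can separate the two readings.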

@jkbonfield
Contributor

jkbonfield commented Apr 14, 2023

This has been a known issue for a long time, although probably not tracked here. I don't think there's anything we can do about it really. Fortunately, it only arises for a length 1 sequence, which doesn't generally happen in the wild, so it's a moot point. Most implementations just take the most probable view, which is to interpret it as unknown, and attempting to remove the ambiguity would turn a harmless issue into a potentially more serious one.
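As a rough sketch of that "most probable view", a parser can check for `*` before decoding, so a lone `*` is always reported as unavailable. `parse_qual` and its length check are assumptions made for this illustration (Phred+33 encoding assumed); this is not how any particular implementation is written.

```python
from typing import Optional


def parse_qual(seq: str, qual: str) -> Optional[list[int]]:
    """Parse the QUAL field of a SAM record (hypothetical helper).

    Returns None when QUAL is "*", including the length-1 case where
    "*" could otherwise be decoded as a single score of 9.
    """
    if qual == "*":
        # Always "unavailable", regardless of the sequence length.
        return None
    if seq != "*" and len(qual) != len(seq):
        raise ValueError("QUAL length must match SEQ length")
    return [ord(c) - 33 for c in qual]


print(parse_qual("A", "*"))    # None
print(parse_qual("AC", "*I"))  # [9, 40]
```

The trade-off is that a genuine length-1 read with quality 9 cannot be represented in text SAM under this rule, which matches the view that the case is too rare to matter.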

Edit: as an aside, I note you're also using a MAPQ of 255 for "unavailable". Commendable, but my experience is that everyone just uses 0 for unmapped data. I think this is because when FLAG 4 is set the specification states that no assumption can be made about MAPQ, so it just feels cleaner to zero it out, as all the other fields have been.

@jkbonfield jkbonfield added the sam label Apr 24, 2023
@jkbonfield
Contributor

TODO: Add footnote to say a single "*" for length 1 is still "unavailable"

@jkbonfield jkbonfield self-assigned this Apr 25, 2023
jkbonfield added a commit to jkbonfield/hts-specs that referenced this issue May 2, 2023
This is an extreme edge case likely to never occur, but nevertheless
tool implementors still need to know how to handle it.  Given it *may*
be QUAL 9 or it *may* be QUAL "unknown", we treat it as always unknown.

Fixes samtools#715
jkbonfield added a commit to jkbonfield/hts-specs that referenced this issue Jan 29, 2024
"*" is either QUAL 9, or QUAL unavailable.  Made a recommendation in
a footnote, mainly as an indication that the ambiguity exists.  In
practice it's vanishingly unlikely to matter.

Fixes samtools#715
Projects
Status: Progressing