Add an MN:i tag.
This is used as a sanity check on the validity of the MM and ML tags.
It holds the length of SEQ at the time MM and ML were produced and/or
updated.  The intention is to provide a mechanism to detect when
hard-clipping has been performed by a tool that is not MM/ML aware.

Fixes samtools#646
jkbonfield committed May 2, 2023
1 parent 144e32a commit 3aa0ded
Showing 1 changed file (SAMtags.tex) with 11 additions and 0 deletions.
@@ -92,6 +92,7 @@ \section{Standard tags}
{\tt MI} & Z & Molecular identifier; a string that uniquely identifies the molecule from which the record was derived \\
{\tt ML} & B,C & Base modification probabilities \\
{\tt MM} & Z & Base modifications / methylation \\
{\tt MN} & i & Length of sequence at the time {\tt MM} and {\tt ML} were produced \\
{\tt MQ} & i & Mapping quality of the mate/next segment \\
{\tt NH} & i & Number of reported alignments that contain the query in the current record \\
{\tt NM} & i & Edit distance to the reference \\
@@ -621,6 +622,16 @@ \subsection{Base modifications}
{\tt ML} values for ambiguity codes give the probability that the modification is one of the possible codes compatible with that ambiguity code.
For example {\tt MM:Z:C+C,10; ML:B:C,229} indicates a C call with a probability of 90\% of having some form of unspecified modification.

\item[MN:i:\tagvalue{length}]
\hfill\\
Tools may edit the {\sf SEQ} field, for example by hard-clipping an alignment.
If the sequence is shortened in this way, the base offsets in {\tt MM} (and the corresponding probabilities in {\tt ML}) become invalid unless both tags are updated accordingly.

Some hard-clipping tools update {\tt MM} and {\tt ML} while others do not, so the {\tt MN} tag offers a simple sanity check.
It holds the length of the sequence at the time {\tt MM} and {\tt ML} were last written.
Tools that wish to validate {\tt MM} should compare the length of the {\sf SEQ} field with the value of the {\tt MN} tag; a mismatch indicates that the sequence has changed since {\tt MM} and {\tt ML} were written.
The tag is optional but recommended. If it is absent, the {\tt MM} data is implicitly assumed to be valid unless there is evidence to the contrary, such as offsets that extend beyond the end of the sequence.

\end{description}
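The {\tt MN} check and its fallback can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the htslib or samtools implementation: the function names {\tt mn\_matches}, {\tt mm\_offsets\_in\_range} and {\tt mm\_looks\_valid} and the {\tt is\_reverse} parameter are hypothetical, the {\tt MM} parsing is simplified (strand and modification codes are ignored), {\sf SEQ} is assumed to be present rather than {\tt *}, and each skip count is assumed to count occurrences of the entry's base in the originally sequenced orientation, so the complement base is counted in {\sf SEQ} for reverse-complemented records.

```python
from typing import Optional

# Complements used when FLAG 0x10 is set and SEQ is stored reverse-complemented.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A", "N": "N"}


def mn_matches(seq: str, mn: Optional[int]) -> Optional[bool]:
    """Compare len(SEQ) with MN; None means MN is absent, so no explicit check."""
    if mn is None:
        return None
    return len(seq) == mn


def mm_offsets_in_range(seq: str, mm: str, is_reverse: bool = False) -> bool:
    """Fallback check (hypothetical helper): do the MM skip counts stay within SEQ?"""
    seq = seq.upper()
    for entry in mm.split(";"):
        if not entry:
            continue  # a trailing ';' leaves an empty final entry
        fields = entry.split(",")
        base = fields[0][0].upper()  # e.g. "C+m" -> "C"; strand and codes ignored
        skips = [int(s) for s in fields[1:] if s]
        if base == "N":
            available = len(seq)  # N matches every base
        else:
            # Assumption: counts refer to the originally sequenced orientation.
            target = _COMPLEMENT[base] if is_reverse else base
            available = seq.count(target)
        # Each listed call consumes (skip + 1) occurrences of the base.
        needed = sum(skips) + len(skips)
        if needed > available:
            return False
    return True


def mm_looks_valid(seq: str, mm: str, mn: Optional[int] = None,
                   is_reverse: bool = False) -> bool:
    """An MN mismatch is decisive; otherwise fall back to the bounds check."""
    if mn_matches(seq, mn) is False:
        return False
    return mm_offsets_in_range(seq, mm, is_reverse)


# Example: SEQ "ACGCCGTC" has four Cs; "C+m,0,2;" marks the 1st and 4th.
seq = "ACGCCGTC"
print(mm_looks_valid(seq, "C+m,0,2;", mn=8))      # True
print(mm_looks_valid(seq[3:], "C+m,0,2;", mn=8))  # False: MN is 8, SEQ length is 5
print(mm_looks_valid(seq[3:], "C+m,0,2;"))        # False: only 3 Cs remain
```

Here a mismatch between {\tt MN} and the length of {\sf SEQ} is treated as decisive. When {\tt MN} is absent or matches, the sketch still runs the bounds check, which only catches clipping that removes bases still referenced by {\tt MM}; it cannot detect every invalid record, which is why the explicit {\tt MN} tag is the more reliable signal.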

\section{Draft tags}