-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify that @RG PI
values are integers
#721
Conversation
SAMv1.tex
Outdated
@@ -315,7 +315,7 @@ \subsection{The header section} | |||
& {\tt KS} & The array of nucleotide bases that correspond to the key sequence of each read.\\\cline{2-3} | |||
& {\tt LB} & Library.\\\cline{2-3} | |||
& {\tt PG} & Programs used for processing the read group.\\\cline{2-3} | |||
& {\tt PI} & Predicted median insert size.\\\cline{2-3} | |||
& {\tt PI} & Predicted median insert size. \emph{Valid values}: Integers matching \verb"/^[0-9]+$/"\\\cline{2-3} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe ^(0|[1-9][0-9]*)$
? I'm concerned about questions over 0123
should be interpreted as octal. I think htsjdk would do so already, as would any C tool using strtol or atoi.
I feel we're best off just rejecting such data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description should also be updated, e.g., to rounded predicted median insert size. Otherwise, further text is owed for as to why it's an integer rather than a decimal value.
302de8f
to
f5facde
Compare
The rest of the spec is agnostic re decimal versus octal versus etc: e.g., the table in §1.4 just says “Int” (and gives ranges) for FLAG, POS, MAPQ, PNEXT, and TLEN and offers no further hints as to how these Ints are to be represented in text. I don't propose to address this in this PR (or anytime soon), so I've reworded it to just say it's an integer without any regex. I'm not sure I agree that the spec is obliged to say why this is an integer, but nonetheless I've replaced the previous text added with just “[…], rounded to the nearest integer”. |
Looks good to me. |
From the field's description as an insert size, the value is clearly numeric (and non-negative). Further, HTSJDK has required it to be an integer since 2010 (see HTSJDK 2799b1f30b9d53a2949861a31bb85bfda08a4023, 2010-02-23, "Add getter and setter for predicted median insert size to SAMReadGroupRecord"). Fixes samtools#720. Adjust test files that had `PI:<floating-point value>`.
From the field's description as an insert size, the value is clearly numeric. Further, HTSJDK has required it to be an integer since 2010 (see samtools/htsjdk@2799b1f, 2010-02-23, "Add getter and setter for predicted median insert size to SAMReadGroupRecord").
So the proposal is to clarify that PI values should be integers, as enforced by one of the widely used implementations. Furthermore as insert sizes, the values are naturally non-negative. Fixes #720.
Also adjust test files that had
PI:<floating-point value>
.