I was wondering if there is a way, or a specification, for SAM headers to describe which custom tags they are using, for example the lowercase and X/Y/Z-prefixed tags. My angle on this is just showing users at a glance what various fields mean in a genome browser, but I can imagine it being useful in other circumstances.
VCF kind of has this with e.g. "1.4.4 Individual format field format", which allows a file to self-describe the custom fields in its FORMAT column.
It could make it easier for a human to understand a data file at a glance. Possible caveats:
Some fields require lengthy descriptions to even begin to explain them.
If the description is free text, it may not be very 'semantic' or machine-parseable. As a bit of a tangent, the VCF CSQ field is an example of this: what I think should be a machine-readable description is stored in a 'human-readable' field, so to meaningfully parse CSQ a program needs to take the CSQ description from the VCF header and split the text that follows "Format:".
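To make the CSQ caveat concrete, here is a minimal Python sketch of what a consumer ends up doing. The function name `csq_fields` and the sample header line (with a shortened field list) are illustrative assumptions, not part of any specification.

```python
import re

def csq_fields(header_line):
    """Return the sub-field names listed after "Format:" in a CSQ
    INFO description, or None if the line has no such description."""
    # The machine-readable part is buried inside the free-text Description.
    m = re.search(r'Description="[^"]*Format:\s*([^"]+)"', header_line)
    if m is None:
        return None
    return m.group(1).strip().split("|")

# Illustrative header line; real files usually list many more sub-fields.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">')
names = csq_fields(header)

# One CSQ entry from a record can then be paired with those names.
entry = dict(zip(names, "T|missense_variant|MODERATE|BRCA2".split("|")))
```

This works, but only because the tool and the parser agree on an informal "Format:" convention inside a free-text field, which is the concern with free-text descriptions in a SAM header as well.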
I like this idea, but sadly nothing like it currently exists.
It'd need to go in @CO (comment) header lines to avoid breaking existing parsers that validate the headers, at least until that mythical time we develop SAM 2.0. That's not ideal, but we are where we are.
I guess we could carve out a namespace within @CO for additional commentary, e.g.:
@CO @TAG ID:X0 TY:i DS:Number of best hits
You're perfectly at liberty to start doing this already, although it'd obviously need buy-in from the genome browsers. I'm not sure we'd want to add something formal to the specification unless we see active buy-in from multiple implementations.
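As a rough illustration of what a genome browser would need to do, here is a minimal Python sketch that reads such lines. It assumes the hypothetical `@CO @TAG` convention exactly as written in the example above (fields in the order ID, TY, DS, with DS running to the end of the line); none of this is in the SAM specification, and `parse_tag_comments` is a made-up name.

```python
import re

# Assumed layout: "@CO", then "@TAG", then ID:, TY:, DS: in that order,
# separated by tabs or spaces. DS takes the rest of the line.
TAG_COMMENT = re.compile(
    r"^@CO\s+@TAG\s+ID:(?P<id>\S+)\s+TY:(?P<type>\S+)\s+DS:(?P<desc>.*)$"
)

def parse_tag_comments(header_text):
    """Map tag name -> (type, description) for every @CO @TAG line."""
    tags = {}
    for line in header_text.splitlines():
        m = TAG_COMMENT.match(line.rstrip())
        if m:  # ordinary @CO comments and other header lines are skipped
            tags[m.group("id")] = (m.group("type"), m.group("desc"))
    return tags

header = (
    "@HD\tVN:1.6\n"
    "@CO\t@TAG\tID:X0\tTY:i\tDS:Number of best hits\n"
    "@CO\tan ordinary free-text comment\n"
)
tags = parse_tag_comments(header)
```

Because the descriptions live in ordinary comment lines, tools that don't know the convention simply see comments and carry on.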