This is long overdue. Its memory usage is excessive on long-read technologies, and the output simply isn't that useful either.
It was first written in an era of small (e.g. 75bp) fixed-size alignments. With reads potentially 1Mb long it becomes totally unwieldy and consumes many GBs of RAM doing things which are, frankly, not remotely useful to the end user. An example is FFQ, which reports the frequency of quality values as a table broken down per quality value and per base position. Per-base-position data isn't useful when you have Mb-long reads! A consumer of the data would need to do some aggregating and smoothing to get useful results, so the program should be doing that itself, perhaps with a parameter, or maybe using a log scale so that bins grow the further out you get.
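To make the log-scale idea concrete, here is a minimal sketch in Python. It is not how samtools stats works today; the names `log_bin` and `accumulate` and the `exact` parameter are made up for illustration. Positions below `exact` keep one bin each, and beyond that every power-of-two range of positions shares one bin.

```python
from collections import defaultdict


def log_bin(pos, exact=64):
    """Map a 0-based read position to a bin index (hypothetical scheme).

    Positions below `exact` get one bin each. From there on, each range
    [2^k, 2^(k+1)) shares a single bin, so bins double in width.
    `exact` must be a power of two.
    """
    if exact <= 0 or exact & (exact - 1):
        raise ValueError("exact must be a positive power of two")
    if pos < exact:
        return pos
    return exact + (pos.bit_length() - exact.bit_length())


def accumulate(quals_per_read, exact=64):
    """Count quality values per position bin.

    `quals_per_read` is an iterable of per-read lists of integer qualities.
    Returns a nested mapping: bin index -> quality value -> count.
    Only bins and qualities that actually occur take up any space.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for quals in quals_per_read:
        for pos, q in enumerate(quals):
            counts[log_bin(pos, exact)][q] += 1
    return counts
```

With `exact=64`, a read of one million bases touches 78 bins rather than one million rows, and because the counts are stored in dictionaries, quality values that never occur cost nothing.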
Additionally, the per-quality breakdown produces excessive tables on instruments with binned qualities (most of the table is full of zeros on modern Illumina or Revio).
However, this would change the output formats. Hence I think a stats2 is the better solution, but that is a longer-term wish-list item.
Shorter term, we may just want command line options that disable some features, so we can get basic stats without the worst excesses.
I was also recently wondering what the best way would be to add flags to samtools stats that let you control which statistics are calculated. In my case I usually only need the number of reads (potentially filtered by the flag bits) and sometimes the total base count of the reads. To avoid excess compute I have made an app to do this, based on one of the htslib demo apps, but I think it would be nice to have this option in the official stats app.
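For illustration, the counting I mean is roughly the following. This is a Python sketch over SAM text lines rather than my htslib-based app, and the function name `count_reads` and its `require`/`exclude` parameters are made up here. It assumes tab-separated SAM records with FLAG in column 2 and SEQ in column 10.

```python
def count_reads(sam_lines, require=0, exclude=0):
    """Count reads and total bases in an iterable of SAM text lines.

    A read is kept when all bits in `require` are set in its FLAG and
    none of the bits in `exclude` are set. Header lines (starting with
    "@") and lines with fewer than 11 fields are skipped.
    Returns (number of reads, total bases).
    """
    n_reads = 0
    n_bases = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if (flag & require) != require or (flag & exclude):
            continue
        n_reads += 1
        seq = fields[9]
        if seq != "*":  # "*" means the sequence is not stored
            n_bases += len(seq)
    return n_reads, n_bases
```

The lines could come from the text output of `samtools view`, for example, and `exclude=0x904` would drop unmapped, secondary and supplementary records.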
Thanks for the suggestion. Please do keep them coming, although right now this is rather a wish-list item and we haven't yet decided what priority the many competing ideas have, so I don't have any time scales on rewrites.