I think toil stats should output a comma-separated table with one row per task. It should be possible to paste the output straight into Excel or Google Sheets. (Even for users slicing the data programmatically, a table is easier.)
The current (6.1.0) output of toil stats, when run with default output formatting, produces five lines per task. Here is an example for one task:
assembly_qc_wf.align_ont_winnowmap.compressAssembly.inputs
Total Cores: 1.0
Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
1 | 0.25 0.25 0.25 0.25 0.25 | 0.25 0.25 0.25 0.25 0.25 | 0.00 0.00 0.00 0.00 0.00 | 656440Ki 656440Ki 656440Ki 656440Ki 656440Ki | 8Ki 8Ki 8Ki 8Ki 8Ki
For big workflows there are hundreds of entries like this.
The lines have a variable number of columns and more than one separator (variable spaces and pipes).
To get around this I am currently using a nasty set of bash commands to create the type of output I'd like to see:
sed 's/^[ \t]*//' HG03704/final_qc_stats.txt \
  | sed -E 's/ +/ /g' \
  | awk 'BEGIN {
      FS=" +";  # Split input fields on runs of spaces
      OFS=",";  # Join output fields with commas
      print "task_name,total_cores,tasks,real_time_s_min,real_time_s_med,real_time_s_ave,real_time_s_max,real_time_s_total,cpu_time_cores_min,cpu_time_cores_med,cpu_time_cores_ave,cpu_time_cores_max,cpu_time_cores_total,cpu_wait_cores_min,cpu_wait_cores_med,cpu_wait_cores_ave,cpu_wait_cores_max,cpu_wait_cores_total,memory_b_min,memory_b_med,memory_b_ave,memory_b_max,memory_b_total,disk_b_min,disk_b_med,disk_b_ave,disk_b_max,disk_b_total"
    }
    /assembly_qc_wf/ {
      $1=$1; task_name = $0  # Task name line, matched by the workflow name
      getline
      total_cores = $3       # Third field of the "Total Cores: 1.0" line
      getline                # Advance to the header line (Count | Real Time ...)
      getline                # Advance to the categories line (n | min med ...)
      getline                # Advance to the values line
      # Fields 2, 8, 14, 20 and 26 are the pipe separators, so they are left out
      print task_name, total_cores, $1, $3, $4, $5, $6, $7, $9, $10, $11, $12, $13, $15, $16, $17, $18, $19, $21, $22, $23, $24, $25, $27, $28, $29, $30, $31
    }' > HG03704/final_qc_stats_h.txt
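A slightly more general sketch of the same idea, not tied to my workflow name: it keys on the Total Cores: line and takes the line before it as the task name. It assumes every task block has the five-line shape shown above, which I have not checked against every section of the toil stats output. The function name toil_stats_to_csv is made up for illustration, and it reads the stats text on stdin.

```shell
# Sketch only: toil_stats_to_csv is a hypothetical helper, not part of Toil.
# Reads toil stats text on stdin and writes one CSV row per task block.
# Assumption: each block looks like the example above, with the task name on
# the line directly before its "Total Cores:" line.
toil_stats_to_csv() {
  awk '
    BEGIN {
      OFS = ","
      header = "task_name,total_cores,tasks"
      n = split("real_time_s cpu_time_cores cpu_wait_cores memory_b disk_b", metric, " ")
      m = split("min med ave max total", stat, " ")
      for (i = 1; i <= n; i++)
        for (j = 1; j <= m; j++)
          header = header OFS metric[i] "_" stat[j]
      print header
    }
    { gsub(/^[ \t]+|[ \t]+$/, "") }      # trim each line
    /^Total Cores:/ {
      task_name = previous_line
      total_cores = $3
      getline; getline                   # move past the two header lines
      if ((getline) <= 0) exit           # values line; stop if input is truncated
      gsub(/\|/, " ")                    # pipes become plain separators
      row = task_name OFS total_cores
      for (i = 1; i <= NF; i++) row = row OFS $i
      print row
    }
    { previous_line = $0 }
  '
}
```

With that defined, the pipeline above would shrink to something like: toil_stats_to_csv < HG03704/final_qc_stats.txt > HG03704/final_qc_stats_h.txt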
There are a few other changes that I would also request:
I don't think CPU wait metrics are helpful for most users.
Units for memory should be MB (or variable, but not Ki)
Units for storage should be GB (or variable, but not Ki)
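As a rough sketch of the unit change, here is the conversion I currently have to do by hand. It assumes the Ki suffix in the output means KiB (1024 bytes) and that binary units (MiB, GiB) would be acceptable; convert_size is a hypothetical helper name, not anything in Toil.

```shell
# Sketch only: convert_size is a hypothetical helper, not part of Toil.
# Usage: convert_size VALUE TARGET, e.g. convert_size 656440Ki Mi
# Assumption: suffixes are binary (Ki = 1024 B); a bare number is taken as bytes.
convert_size() {
  awk -v value="$1" -v target="$2" 'BEGIN {
    factor["B"] = 1
    factor["Ki"] = 1024
    factor["Mi"] = 1024 ^ 2
    factor["Gi"] = 1024 ^ 3
    unit = value
    sub(/^[0-9.]+/, "", unit)            # whatever follows the number
    if (unit == "") unit = "B"
    if (!(unit in factor) || !(target in factor)) exit 1
    number = value + 0                   # awk keeps the leading numeric part
    printf "%.2f\n", number * factor[unit] / factor[target]
  }'
}
```

For the task above, convert_size 656440Ki Mi gives 641.05, which is much easier to read than the Ki figure. Disk values as small as 8Ki round to 0.00 in Gi, which is one argument for the "variable" option.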
Here is a (somewhat) made-up example of what I'd like to get from toil stats
@adamnovak
┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1561