toil stats should output a comma separated table with one row per task #4913

Open
juklucas opened this issue May 3, 2024 · 0 comments
@adamnovak

I think toil stats should output a comma-separated table with one row per task. The output should be pasteable into Excel or Google Sheets. (Even for users who slice the data programmatically, a table is easier to work with.)

With the default output formatting, running the current release (6.1.0) of toil stats as follows:

toil stats --outputFile "${WDL_NAME}_stats.txt" "${LOCAL_FOLDER}/jobstore"

produces five lines per task. Here is an example for one task:

assembly_qc_wf.align_ont_winnowmap.compressAssembly.inputs
    Total Cores: 1.0
    Count |                                  Real Time (s)* |                                        CPU Time (core·s) |                                        CPU Wait (core·s) |                                                Memory (B) |                                                           Disk (B) 
        n |       min     med*      ave      max      total |         min        med        ave        max       total |         min        med        ave        max       total |         min        med        ave        max        total |           min          med          ave          max         total 
        1 |      0.25     0.25     0.25     0.25       0.25 |        0.25       0.25       0.25       0.25        0.25 |        0.00       0.00       0.00       0.00        0.00 |    656440Ki   656440Ki   656440Ki   656440Ki     656440Ki |           8Ki          8Ki          8Ki          8Ki           8Ki

For big workflows there are hundreds of entries like this.

The lines have a variable number of columns and more than one separator (variable spaces and pipes).

To get around this I am currently using a nasty set of bash commands to create the type of output I'd like to see:

sed 's/^[ \t]*//' HG03704/final_qc_stats.txt \
    | sed -e's/  */ /g' \
    | awk 'BEGIN {
        FS=" +";   # Set field separator to handle spaces
        OFS=",";   # Set output field separator to comma
        print "task_name,total_cores,tasks,real_time_s_min,real_time_s_med,real_time_s_ave,real_time_s_max,real_time_s_total,cpu_time_cores_min,cpu_time_cores_med,cpu_time_cores_ave,cpu_time_cores_max,cpu_time_cores_total,cpu_wait_cores_min,cpu_wait_cores_med,cpu_wait_cores_ave,cpu_wait_cores_max,cpu_wait_cores_total,memory_b_min,memory_b_med,memory_b_ave,memory_b_max,memory_b_total,disk_b_min,disk_b_med,disk_b_ave,disk_b_max,disk_b_total"
    }

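    # Match task-name lines; this pattern is specific to my workflow's name.
    /assembly_qc_wf/ {
        $1=$1; task_name = $0
        getline;  # Read the "Total Cores" line
        total_cores = $3  # Third field is the core count
        getline;  # Skip the category header line (Count | Real Time (s)* | ...)
        getline;  # Skip the column header line (n | min med* ave max total | ...)
        getline;  # Read the values line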

        print task_name, total_cores, $1, $3, $4, $5, $6, $7, $9, $10, $11, $12, $13, $15, $16, $17, $18, $19, $21, $22, $23, $24, $25, $27, $28, $29, $30, $31
    }' > HG03704/final_qc_stats_h.txt
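
For comparison, here is a rough Python sketch of the same conversion. It assumes every task block has exactly the five-line layout shown above (name, "Total Cores", two header lines, one values line); I have not checked this against other sections of the report. The names parse_stats and to_csv are made up for illustration and are not part of Toil.

```python
import csv
import io

# Column groups in the order they appear in the human-readable report.
GROUPS = ["real_time_s", "cpu_time_cores", "cpu_wait_cores", "memory_b", "disk_b"]
FIELDS = ["min", "med", "ave", "max", "total"]
HEADER = ["task_name", "total_cores", "tasks"] + [
    f"{group}_{field}" for group in GROUPS for field in FIELDS
]

# The example block from this issue, with runs of spaces collapsed.
SAMPLE = """\
assembly_qc_wf.align_ont_winnowmap.compressAssembly.inputs
    Total Cores: 1.0
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
        n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
        1 | 0.25 0.25 0.25 0.25 0.25 | 0.25 0.25 0.25 0.25 0.25 | 0.00 0.00 0.00 0.00 0.00 | 656440Ki 656440Ki 656440Ki 656440Ki 656440Ki | 8Ki 8Ki 8Ki 8Ki 8Ki
"""


def parse_stats(text):
    """Yield one flat row (list of strings) per task block."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        # Assumed layout: the task name sits right above "Total Cores:",
        # and the values line is three lines below it.
        if not line.startswith("Total Cores:") or i == 0 or i + 3 >= len(lines):
            continue
        task_name = lines[i - 1]
        total_cores = line.split(":", 1)[1].strip()
        # Drop the pipe separators and split on whitespace.
        values = lines[i + 3].replace("|", " ").split()
        if len(values) != 1 + len(GROUPS) * len(FIELDS):
            continue  # not a values line in the assumed layout
        yield [task_name, total_cores] + values


def to_csv(text):
    """Return the whole report as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(parse_stats(text))
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(SAMPLE), end="")
```

Unlike the awk version, this keys on the "Total Cores:" line rather than on the workflow name, so it should not need editing per workflow.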

There are a few other changes that I would also request:

  • I don't think CPU wait metrics are helpful for most users.
  • Units for memory should be MB (or variable, but not Ki).
  • Units for storage should be GB (or variable, but not Ki).
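
To make the unit request concrete, here is a small Python sketch of the conversion I have in mind. It assumes the Ki/Mi/Gi suffixes in the current output are binary multiples of bytes (Ki = 1024 B) and that the target units are decimal (MB = 10^6 B); both are my assumptions, and to_bytes, convert, and humanize are hypothetical helper names, not Toil functions.

```python
import re

# Assumption: suffixes in the current output are binary multiples of bytes.
_BINARY = {"": 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
# Assumption: the requested units are decimal multiples of bytes.
_DECIMAL = {"B": 1, "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}


def to_bytes(value):
    """Parse a string such as '656440Ki' into a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT]i)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, suffix = match.groups()
    return float(number) * _BINARY[suffix or ""]


def convert(value, unit):
    """Express a size in one fixed unit, rounded to two decimals."""
    return round(to_bytes(value) / _DECIMAL[unit], 2)


def humanize(value):
    """Pick the largest decimal unit that keeps the number at or above 1."""
    n = to_bytes(value)
    for unit in ("TB", "GB", "MB", "KB"):
        if n >= _DECIMAL[unit]:
            return f"{n / _DECIMAL[unit]:.2f}{unit}"
    return f"{n:.0f}B"


if __name__ == "__main__":
    print(convert("656440Ki", "MB"))  # memory from the example above, in MB
    print(humanize("8Ki"))            # disk from the example above, auto-scaled
```

A fixed GB unit rounds small values such as the 8Ki disk figure down to 0.0, which is why the "variable" option may be the safer default.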

Here is a (somewhat) made-up example of what I'd like to get from toil stats:

task_name,total_cores,tasks,real_time_s_min,real_time_s_med,real_time_s_ave,real_time_s_max,real_time_s_total,cpu_time_cores_min,cpu_time_cores_med,cpu_time_cores_ave,cpu_time_cores_max,cpu_time_cores_total,memory_b_min,memory_b_med,memory_b_ave,memory_b_max,memory_b_total,disk_b_min,disk_b_med,disk_b_ave,disk_b_max,disk_b_total
assembly_qc_wf.align_ont_winnowmap.alignment.command,96,3,51289.76,56955.81,68112.03,96090.53,204336.09,1128240.9,1313831.48,1527506.5,2140447.11,4582519.49,35GB,35GB,35GB,35GB,105GB,200GB,200GB,200GB,200GB,600GB
assembly_qc_wf.align_ont_winnowmap.alignment.sort,96,3,51289.76,56955.81,68112.03,96090.53,204336.09,1128240.9,1313831.48,1527506.5,2140447.11,4582519.49,35GB,35GB,35GB,35GB,105GB,200GB,200GB,200GB,200GB,600GB
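
As an illustration of why a flat table helps with programmatic slicing, here is a minimal Python sketch that reads such a table with the standard csv module and computes core-hours per task. The excerpt uses only a subset of the proposed columns with the made-up values from my example, and core_hours_by_task is a hypothetical helper name.

```python
import csv
import io

# Hypothetical excerpt of the proposed output: a subset of the columns,
# with the made-up values from the example in this issue.
TABLE = """\
task_name,total_cores,tasks,real_time_s_total,cpu_time_cores_total
assembly_qc_wf.align_ont_winnowmap.alignment.command,96,3,204336.09,4582519.49
assembly_qc_wf.align_ont_winnowmap.alignment.sort,96,3,204336.09,4582519.49
"""


def core_hours_by_task(table_text):
    """Map each task name to its total CPU time in core-hours."""
    reader = csv.DictReader(io.StringIO(table_text))
    return {
        row["task_name"]: float(row["cpu_time_cores_total"]) / 3600
        for row in reader
    }


if __name__ == "__main__":
    for task, hours in core_hours_by_task(TABLE).items():
        print(f"{task}: {hours:.1f} core-hours")
```

With the current five-line layout, the same query needs a custom parser first.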

┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1561
