Improve results output from UX perspective #100

Open
ecurtin opened this issue Sep 21, 2017 · 0 comments

ecurtin commented Sep 21, 2017

This is an issue that was originally opened on ecurtin/spark-bench. I am copy-pasting the ensuing discussion here:


Based on feedback from @brad-kaiser

As a user I want to hang onto the config file I used in its entirety for each run so that I can have easy reference and copy-paste access to the exact settings I used.

As a user, I want to output results in the same directory both from multiple suites and from multiple runs of spark-bench.

As a user, I find it overwhelming to see such wide rows in my results with so much repeated information.

FYI, speaking for myself here, I think outputting results on a suite-by-suite basis is a little bit of an odd choice. It was my choice, but I still think it's weird. I definitely think this could be improved.


@showermat commented on Jul 28:

At the moment, the trend seems to be toward wrapping every element of the configuration and environment into the output as columns in the results table. As we try to add OS and hardware information, the number of columns will increase. Having one table may be convenient for certain kinds of analysis if you're using SQL, but it is not very user-friendly when it comes to casually going over output -- particularly on the console. There is a lot of duplicated information -- the OS is the same for all workloads on one machine, for example, and Spark options are the same for each spark-submit. Since the testing structure is hierarchical (submits -> suites -> workloads), it may make more sense to have similarly hierarchical output. In the discussion that spawned this issue, the idea was mentioned of having a directory for each submit and placing the configuration file for that submit in there along with the results output CSV, or something like that. I think it would be worth looking into extending this idea to a multi-level directory hierarchy, something like this:

  • Base directory for this run
    • Original config file that created this run
    • Information common to all submits -- environment variables, parallelism, etc.
    • Directory for submit 1
      • If debugging is enabled, maybe the temporary config file created by spark-launch? Or we could just put the temp files there by default rather than putting them in /tmp and trying to delete them, which often fails
      • CSV or other table format with the hardware and OS information, Spark version and parameters, and anything else common to the entire submit
      • Directory for suite 1
        • spark-bench configuration information for suite 1 and anything else that can change between suites
        • At this point, we could go another step further and create a directory for each workload, containing its configuration and its output as a CSV or similar. I think this might be going a bit far. Instead, we could put the existing results tables in this directory, only without the information that has already been included at a higher level. Only the workload parameters and results would need to be included in this table.
      • Directory for suite 2...
    • Directory for submit 2...
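
To make the proposed layout concrete, here is a minimal sketch that builds an empty skeleton of the hierarchy listed above. It is only an illustration, not how spark-bench works today: the function name `create_run_layout` and the file and directory names (`common.csv`, `submit-N`, `submit-info.csv`, `suite-N`, `suite-config.conf`, `results.csv`) are all assumptions made up for this example. Python is used purely for brevity.

```python
from pathlib import Path
import shutil


def create_run_layout(base_dir, config_file, suites_per_submit):
    """Create an empty skeleton of the proposed output hierarchy.

    suites_per_submit is a list with one entry per spark-submit,
    giving the number of suites in that submit.
    All file and directory names below are hypothetical.
    """
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)

    # Keep the original config file, in its entirety, next to the results.
    shutil.copy(config_file, base / Path(config_file).name)

    # Placeholder for information common to all submits.
    (base / "common.csv").touch()

    suite_dirs = []
    for s, n_suites in enumerate(suites_per_submit, start=1):
        submit_dir = base / f"submit-{s}"
        submit_dir.mkdir(exist_ok=True)
        # Placeholder for hardware/OS info, Spark version and parameters.
        (submit_dir / "submit-info.csv").touch()

        for k in range(1, n_suites + 1):
            suite_dir = submit_dir / f"suite-{k}"
            suite_dir.mkdir(exist_ok=True)
            # Suite-level configuration, plus a results table that holds
            # only workload parameters and results.
            (suite_dir / "suite-config.conf").touch()
            (suite_dir / "results.csv").touch()
            suite_dirs.append(suite_dir)
    return suite_dirs
```

Calling `create_run_layout("run-1", "my-bench.conf", [2, 1])` would produce one base directory holding a copy of the config file, two submit directories, and three suite directories in total. The per-workload directory level is left out here, in line with the concern that it might be going a bit far.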

I think something like this would go a long way toward making the output more user-friendly, and it also handles the "easy reference" issue mentioned above.

I'm not sure how far this goes toward a solution, but these are my initial thoughts on the matter.


@showermat commented on Aug 18:

It's important to consider scriptability of the results in the new format. Craig points out that while the hierarchical format is convenient for human readers, it may make it difficult to automatically extract results for reporting. We need to be careful that the new format is machine-friendly as well as human-friendly.
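
One way to keep a hierarchical format scriptable is to make it cheap to flatten back into a single wide table on demand. The sketch below shows the idea under stated assumptions: a base directory containing a one-row `common.csv`, `submit-*` directories each with a one-row `submit-info.csv`, and `suite-*` directories each with a `results.csv`. None of these names come from spark-bench; they are hypothetical and exist only for this example.

```python
import csv
from pathlib import Path


def read_single_row(path):
    """Read a one-row CSV into a dict; return {} if missing or empty."""
    path = Path(path)
    if not path.is_file():
        return {}
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    return rows[0] if rows else {}


def flatten_results(base_dir):
    """Rebuild wide rows by merging each level's fields into every result.

    Assumes the hypothetical layout described in the text above.
    Directories are visited in lexicographic order of their names.
    """
    base = Path(base_dir)
    common = read_single_row(base / "common.csv")
    flat = []
    for submit_dir in sorted(base.glob("submit-*")):
        submit_info = read_single_row(submit_dir / "submit-info.csv")
        for suite_dir in sorted(submit_dir.glob("suite-*")):
            results = suite_dir / "results.csv"
            if not results.is_file():
                continue
            with results.open(newline="") as f:
                for row in csv.DictReader(f):
                    # Lower levels override higher ones on a name clash.
                    flat.append({
                        **common,
                        **submit_info,
                        "submit": submit_dir.name,
                        "suite": suite_dir.name,
                        **row,
                    })
    return flat
```

With something like this, the on-disk output stays compact for human readers, while a reporting script can still get one dict per workload result with all of the inherited fields attached.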
