
Refactor of YAML config file #147

Closed
nmerket opened this issue Apr 6, 2020 · 4 comments · Fixed by #187
Assignees
Labels
aws eagle enhancement New feature or request

Comments

@nmerket
Member

nmerket commented Apr 6, 2020

Given the organic growth of the config (yaml) file to include support for additional compute environments and analysis types, there are some confusing ways of specifying details. This seems like a good time to reconsider how we organize the YAML file. Here are my current thoughts:

  • Switch from yamale to json schema as there is better tooling around it including real time validation plugins for VSCode, etc. We'd still accept YAML, but would also accept JSON as an input file.
  • Have a spec for a custom docker or singularity image (see Figure out how to handle custom docker/singularity images #83)
  • Make a more generic spec to set the following classes (this will replace the spec for stock_type)
    • A sampler class and properties
    • A workflow generator class and properties
    • Consolidate the docker/singularity sampler classes into one and have the environment pass which method to use.
  • Move output_directory under the compute environment since it varies by compute environment.
  • Clear up the confusion about there being two places to set an S3 location: postprocessing.s3 and aws.s3.
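A restructured file along these lines might look roughly like the sketch below. This is purely illustrative; every key name here is an assumption, not a final spec:

```
schema_version: '0.3'
sampler:                          # generic class spec, replaces stock_type
  type: residential_quota
  args:
    n_datapoints: 350000
workflow_generator:
  type: residential_default
  args: {}
eagle:                            # each compute environment owns its outputs
  output_directory: /scratch/user/project_out
aws:
  output_directory: s3://mybucket/project_out   # single place for the S3 location
```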

cc @rHorsey @joseph-robertson @rajeee

@nmerket nmerket added enhancement New feature or request eagle aws labels Apr 6, 2020
@nmerket nmerket self-assigned this Apr 6, 2020
@rajeee
Contributor

rajeee commented Apr 6, 2020

Sounds like a good plan to improve the config file. I would however vote against the JSON schema since JSON doesn't support comments. In big config files with plenty of upgrades, comments help a lot. VSCode does have a plugin for YAML validation and it works pretty well. And supporting both formats might create more confusion than it solves, I think.
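For what it's worth, the two aren't mutually exclusive: the VSCode YAML extension (yaml-language-server) can validate a YAML file against a JSON Schema via a modeline comment, so you keep YAML comments while still getting JSON Schema tooling. The schema URL below is illustrative only:

```
# yaml-language-server: $schema=https://example.com/buildstockbatch.schema.json
# Comments still work in the config itself:
schema_version: '0.3'  # pinned so validation matches the code version
```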

@nmerket nmerket mentioned this issue Apr 22, 2020
@asparke2
Member

Generalize the reporting measures & standardize res/com approach as much as possible

@nmerket Over the last few projects we've done w/ ComStock, we've needed to add project-specific reporting measures. The most expedient approach has been to hard-code them into the create_osw method in commercial.py, but this is annoying because it requires making special bsb envs on eagle for each project. In looking through the bsb code, here's what I've come up with as a possible approach.

1. Standardize the timeseries_csv_export input

The TimeSeriesCSVExport measure is partially a normal reporting measure with arguments, but it is also special because including it triggers the timeseries postprocessing step. Also, right now only the create_osw method in residential.py actually takes the values from this input and sets the measure arguments; commercial.py hard-codes the measure inputs. I think it would make sense to separate the timeseries postprocessing trigger from the TimeSeriesCSVExport measure... although the validation code would then need a check that the two are synchronized if both are present... so maybe that's a good enough reason to keep this one "special" and therefore separate from the other reporting measures?
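One way the decoupling could look, sketched below. All key names are hypothetical, not current bsb inputs:

```
postprocessing:
  timeseries: true            # explicit trigger for the timeseries aggregation step
reporting_measures:
  - measure_dir_name: TimeSeriesCSVExport   # now just an ordinary reporting measure
    arguments:
      reporting_frequency: Hourly
```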

2. Remove the include_qaqc input

This is currently only applicable to stock_type: commercial anyway.

3. Allow reporting measure w/ arguments to be added dynamically.

I can think of two different approaches; either a) make it work like upgrades, where the measure argument values are all defined in options_lookup.tsv, or b) specify the measure name and argument values directly in the YML. I think I prefer a) because it's consistent and therefore doesn't require much staff training, it keeps the YMLs readable in the event that there are a lot of arguments for a reporting measure, and it can separate out the responsibilities for maintaining the inputs to the reporting measures vs. telling users what to put into the YML. The main argument for b) that I can see is that it keeps the create_osw code simpler. However, code to add upgrades from options_lookup.tsv is already in create_osw, so it doesn't seem like a big deal to extend this approach to reporting measures.

a) like upgrades

reporting_measures:
- report_name: LA100 Report
    - qaqc_checks|detailed
- report_name: Some Other Reporting Measure
    - whatever_output_checks|only_timeseries
- report_name: Maximum Quality
    - qaqc_checks|detailed

b) specify measure dir & arguments directly

reporting_measures:
- report_measure_dir: LA100Report
    - some_arg_1|true
    - some_arg_arg2|7.0
    - some_arg_arg3|9.8
- report_measure_dir: SomeOtherReportingMeasure
    - export_summary_whatever|true
- report_measure_dir: MaximumQuality
    - whatever|detailed
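Option b) could plug into OSW generation roughly as sketched below. This is a minimal illustration, not the actual bsb code: the function name `build_reporting_steps` and the `measure_dir_name`/`arguments` entry shape are assumptions about how the YAML might be parsed into OpenStudio Workflow measure steps.

```python
def build_reporting_steps(reporting_measures):
    """Translate config entries into OSW measure steps.

    Each entry is assumed to look like:
        {"measure_dir_name": "LA100Report", "arguments": {"some_arg_1": "true"}}
    """
    steps = []
    for entry in reporting_measures:
        steps.append({
            "measure_dir_name": entry["measure_dir_name"],
            # Arguments are optional; default to no arguments.
            "arguments": entry.get("arguments", {}),
        })
    return steps

# Appending the reporting steps to a workflow dict built elsewhere:
osw = {"steps": [{"measure_dir_name": "BuildExistingModel", "arguments": {}}]}
osw["steps"].extend(build_reporting_steps([
    {"measure_dir_name": "LA100Report", "arguments": {"some_arg_1": "true"}},
    {"measure_dir_name": "MaximumQuality", "arguments": {"whatever": "detailed"}},
]))
```

Since the YAML entries map one-to-one onto OSW steps, create_osw stays simple, which is the main argument for b) noted above.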

@asparke2
Member

After further investigation, I think option b) is actually required unless we rewrite bsb to do the options_lookup.tsv parsing/translation into actual measure inputs.

@nmerket
Member Author

nmerket commented Oct 28, 2020

@asparke2 Interesting ideas. I'm working on the first part of this now, but I like this more generalizable approach.
