
Refactor of YAML config file #147

Closed
nmerket opened this issue Apr 6, 2020 · 4 comments · Fixed by #187
Assignees
Labels
aws eagle enhancement New feature or request

Comments

@nmerket
Member

nmerket commented Apr 6, 2020

Given the organic growth of the config (yaml) file to include support for additional compute environments and analysis types, there are some confusing ways of specifying details. This seems like a good time to reconsider how we organize the YAML file. Here are my current thoughts:

  • Switch from yamale to json schema as there is better tooling around it including real time validation plugins for VSCode, etc. We'd still accept YAML, but would also accept JSON as an input file.
  • Have a spec for a custom docker or singularity image (see Figure out how to handle custom docker/singularity images #83)
  • Make a more generic spec to set the following classes (this will replace the spec for stock_type)
    • A sampler class and properties
    • A workflow generator class and properties
    • Consolidate the docker/singularity sampler classes into one and have the environment pass which method to use.
  • Move output_directory under the compute environment since it varies by compute environment.
  • Clear up the confusion about there being two places to set an S3 location: postprocessing.s3 and aws.s3.
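A restructured file along these lines might look roughly like the sketch below. This is purely illustrative; every key name here is an assumption, not a final spec:

```
schema_version: '0.3'
sampler:                          # generic class spec, replaces stock_type
  type: residential_quota
  args:
    n_datapoints: 350000
workflow_generator:
  type: residential_default
  args: {}
eagle:                            # each compute environment owns its outputs
  output_directory: /scratch/user/project_out
aws:
  output_directory: s3://mybucket/project_out   # single place for the S3 location
```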

cc @rHorsey @joseph-robertson @rajeee

@nmerket nmerket added enhancement New feature or request eagle aws labels Apr 6, 2020
@nmerket nmerket self-assigned this Apr 6, 2020
@rajeee
Contributor

rajeee commented Apr 6, 2020

Sounds like a good plan to improve the config file. I would however vote against the JSON schema since JSON doesn't support comments. In big config files with plenty of upgrades, comments help a lot. VSCode does have a plugin for YAML validation and it works pretty well. And supporting both formats might create more confusion than it solves, I think.
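For what it's worth, the two aren't mutually exclusive: the VSCode YAML extension (yaml-language-server) can validate a YAML file against a JSON Schema via a modeline comment, so you keep YAML comments while still getting JSON Schema tooling. The schema URL below is illustrative only:

```
# yaml-language-server: $schema=https://example.com/buildstockbatch.schema.json
# Comments still work in the config itself:
schema_version: '0.3'  # pinned so validation matches the code version
```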

@nmerket nmerket mentioned this issue Apr 22, 2020
@asparke2
Member

Generalize the reporting measures & standardize res/com approach as much as possible

@nmerket Over the last few projects we've done w/ ComStock, we've needed to add project-specific reporting measures. The most expedient approach has been to hard-code them into the create_osw method in commercial.py, but this is annoying because it requires making special bsb envs on eagle for each project. In looking through the bsb code, here's what I've come up with as a possible approach.

1. Standardize the timeseries_csv_export input

The TimeSeriesCSVExport measure is partially a normal reporting measure with arguments, but it is also special because including it triggers the timeseries postprocessing step. Also, right now only the create_osw method in residential.py actually takes the values from this input and sets the measure arguments; commercial.py hard-codes the measure inputs. I think it would make sense to separate the timeseries postprocessing trigger from the TimeSeriesCSVExport measure... although the validation code would then need a check that the two are synchronized if both are present... so maybe that's a good enough reason to keep this one "special" and therefore separate from the other reporting measures?
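One way the decoupling could look, sketched below. All key names are hypothetical, not current bsb inputs:

```
postprocessing:
  timeseries: true            # explicit trigger for the timeseries aggregation step
reporting_measures:
  - measure_dir_name: TimeSeriesCSVExport   # now just an ordinary reporting measure
    arguments:
      reporting_frequency: Hourly
```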

2. Remove the include_qaqc input

This is currently only applicable to stock_type: commercial anyway.

3. Allow reporting measure w/ arguments to be added dynamically.

I can think of two different approaches; either a) make it work like upgrades, where the measure argument values are all defined in options_lookup.tsv, or b) specify the measure name and argument values directly in the YML. I think I prefer a) because it's consistent and therefore doesn't require much staff training, it keeps the YMLs readable in the event that there are a lot of arguments for a reporting measure, and it can separate out the responsibilities for maintaining the inputs to the reporting measures vs. telling users what to put into the YML. The main argument for b) that I can see is that it keeps the create_osw code simpler. However, code to add upgrades from options_lookup.tsv is already in create_osw, so it doesn't seem like a big deal to extend this approach to reporting measures.

a) like upgrades

reporting_measures:
- report_name: LA100 Report
    - qaqc_checks|detailed
- report_name: Some Other Reporting Measure
    - whatever_output_checks|only_timeseries
- report_name: Maximum Quality
    - qaqc_checks|detailed

b) specify measure dir & arguments directly

reporting_measures:
- report_measure_dir: LA100Report
    - some_arg_1|true
    - some_arg_arg2|7.0
    - some_arg_arg3|9.8
- report_measure_dir: SomeOtherReportingMeasure
    - export_summary_whatever|true
- report_measure_dir: MaximumQuality
    - whatever|detailed
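Option b) could plug into OSW generation roughly as sketched below. This is a minimal illustration, not the actual bsb code: the function name `build_reporting_steps` and the `measure_dir_name`/`arguments` entry shape are assumptions about how the YAML might be parsed into OpenStudio Workflow measure steps.

```python
def build_reporting_steps(reporting_measures):
    """Translate config entries into OSW measure steps.

    Each entry is assumed to look like:
        {"measure_dir_name": "LA100Report", "arguments": {"some_arg_1": "true"}}
    """
    steps = []
    for entry in reporting_measures:
        steps.append({
            "measure_dir_name": entry["measure_dir_name"],
            # Arguments are optional; default to no arguments.
            "arguments": entry.get("arguments", {}),
        })
    return steps

# Appending the reporting steps to a workflow dict built elsewhere:
osw = {"steps": [{"measure_dir_name": "BuildExistingModel", "arguments": {}}]}
osw["steps"].extend(build_reporting_steps([
    {"measure_dir_name": "LA100Report", "arguments": {"some_arg_1": "true"}},
    {"measure_dir_name": "MaximumQuality", "arguments": {"whatever": "detailed"}},
]))
```

Since the YAML entries map one-to-one onto OSW steps, create_osw stays simple, which is the main argument for b) noted above.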

@asparke2
Member

After further investigation, I think option b) is actually required unless we rewrite bsb to do the options_lookup.tsv parsing/translation into actual measure inputs.

@nmerket
Member Author

nmerket commented Oct 28, 2020

@asparke2 Interesting ideas. I'm working on the first part of this now, but I like this more generalizable approach.
