-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle remapping BMI module output variable names #746
Comments
There are a few other edge cases that I will document in this issue in a bit. Need to run a few errands. |
I need clarification on one of the situations that were noted, but there may not be a bug here. The one I am uncertain of is when the @aaraney, I can't tell from the description whether something else is being observed. You mentioned "last" module, but there should only be one (at most) without an alias. Are you saying that whether and how (i.e., to which module) the alias was applied was irrelevant to what was actually used for Regarding the other described scenarios:
I think those are all that were mentioned, but let me know if I'm missing something. |
Not exactly, I am saying that the top level // ...
"main_output_variable": "main_output", // last module will be queried for this output variable name verbatim. alias is not translated
"modules": [
{
// ...
"main_output_variable": "output",
"variables_names_map": {
"input": "value",
"output": "main_output"
}
}
},
{
// ...
"main_output_variable": "output",
"uses_forcing_file": false,
"variables_names_map": {
"input": "value"
}
// ... |
// ...
"main_output_variable": "output", // last module will still be queried output despite the presence of an alias
"modules": [
{
// ...
"main_output_variable": "output",
"variables_names_map": {
"input": "value_2"
}
}
},
{
// ...
"main_output_variable": "output",
"uses_forcing_file": false,
"variables_names_map": {
"input": "value",
"output": "dontuse"
}
// ... |
I might be entirely wrong here, but isn't this intended? Multi-BMI, to my knowledge, was to pipe models together in a way that forms a tree of models -- whereas the above is attempting to output an ensemble of models? Again I might be completely wrong there |
Fair point, I may be misinterpreting the intended functionality of a multi-bmi. In my mind, it seems that you should be able to create a pipe of models that has an overall |
Other weird edge case that does not throw an error nor do what I would think it would. This example has two model:
Forcing contains two variables realization config{
"global": {
"formulations": [
{
"name": "bmi_multi",
"params": {
"model_type_name": "",
"allow_exceed_end_time": true,
"main_output_variable": "output_2",
"init_config": "/dev/null",
"uses_forcing_file": false,
"modules": [
{
"name": "bmi_python",
"params": {
// has one input, `input`, and two outputs `output` and `output_2`.
// `input` is copied to `output` and `output_2`
"python_type": "model.pipe_model_2",
"model_type_name": "pipe_model_2",
"name": "python_bmi_1",
"init_config": "green",
"allow_exceed_end_time": true,
"main_output_variable": "output",
// adding `output_variables` has no effect
// "output_variables": ["output", "output_2"],
"uses_forcing_file": false,
"variables_names_map": {
"input": "value_2", // value_2 is series 2,4,6,8,...
"output": "dontuse"
}
}
},
{
"name": "bmi_python",
"params": {
// has one input, `input`, and one output, `output`
"python_type": "model.pipe_model",
"model_type_name": "pipe_model",
"name": "python_bmi_2",
"init_config": "blue",
"allow_exceed_end_time": true,
"main_output_variable": "output",
"uses_forcing_file": false,
"variables_names_map": {
"input": "value" // value is series 1,2,3,4,...
}
}
}
]
}
}
],
"forcing": {
"file_pattern": "forcing.csv",
"path": "./data/",
"provider": "CsvPerFeature"
}
},
"time": {
"start_time": "2020-01-01 00:00:00",
"end_time": "2020-01-01 01:00:00",
"output_interval": 3600
}
} The odd thing is the catchment output. catchment output
edit: Moving |
Okay, so it seems that the solution is to specify the output variable alias names in the |
This might be documented somewhere, but i've not found it. The variables that are output by default in single formulation case vs a multi-bmi case differ. In a single formulation case, unless |
@program-- and @robertbartel see the |
Ok, @aaraney, I'm still remembering parts of how things work ... Indeed, For multi-formulations, technically the
It's a long time ago, but I think the original intent was to be able to modularly compose modeling. If not originally, it was not long thereafter a desired capability to be able to compose not just the process modeling, but also the per-time-step results (i.e., data output to files), from any output variables within the composition. I think that's less of a tree and more of an ensemble (though I feel like "ensemble" often has contextual implications, and I'm not sure if any are intended here). That said, the behavior around |
@robertbartel Haha yeah sorry, I should've been more specific with what I meant by tree and ensemble. For trees, what I meant is multiple models that are composed (in the way you're describing), such that the "tree" of models represents 1 single model flowchart LR
A --> C
B --> C
D --> E
E --> F
C --> F
where each model passes its outputs to the next model, and F's outputs are the final outputs of the multi-bmi model. i.e. a pipeline of models representing a single model For an ensemble: flowchart LR
A --> Output
B --> Output
c --> Output
where each model in the ensemble passes their outputs to catchment file, so the output catchment file would have all outputs from each model, and they may/may not interact with one another. I think this is what the intention of making the |
As always, thanks for the in-depth response, @robertbartel!
Yeah, I went down a rabbit hole on Friday to better understand the implementation of the catchment to nexus aggregation specifically looking in |
@program--, I think you are right on with that description. However, I don't think that implies that the last model's output is what is written to disk. It could be that the processes each model represents necessitate ordering the models in a realization config file such that the last model does not output the variable you are interested in. Of course we could create some kind of sink model that takes as input the model output that you want, but I would hope we can find a more longterm solution.
Yeah, I don't know that the vision was for the framework to handle "creating" aggregate ensembles (so the actual aggregation of multiple |
@program--, thanks for the explanations. Unfortunately, things are coupled a bit more loosely than that. It is in some ways both, but strictly neither of those things. Output file generation is controlled by the formulation. The formulation has access to output variables from the full ensemble of nested modules, but it selects what from the ensemble to include in output according to the configuration. The formulation defers to its nested modules to actually perform the modeling, and it processes them in a particular order. The formulation can provide any of the outputs of any of its nested modules to serve as an input value for any of its nested modules. Typically that looks something like this, which essential fits under the tree category: flowchart LR
A --> B
B --> C
A --> C
C --> D
B --> D
However, it can also look like this: flowchart LR
A --> B
B --> C
A --> C
C --> A
C --> D
B --> D
Where the C to A edge spans across the boundary of the current and the previous time step; i.e., if C` represents module C of the previous time step, something like this: flowchart LR
A --> B
B --> C
A --> C
C` --> A
C --> D
B --> D
That's still kind of a tree, but not in the domain of what I think you had in mind. |
@robertbartel Huh, that's interesting. Thank you for that explanation! I realize I should've probably said DAG instead of tree, but you got what I meant 😄. I didn't consider a formulation having a back edge in the way you described C` and C. How would that be modeled though? For example, the base case for A when C has no previous timestep? |
So the formulation definition being a list is a pre-cursor to supporting this type of functionality. The easiest case is definitely just running multiple formulations and each generating their own output (the spaghetti plot). However, once this capability exists, it wouldn't be too challenging, I think, to add a configuration option that allows some form of ensemble aggregation to occur during the model run, supporting some common aggregation mechanisms like "mean, median" ect and use that throughout the simulation as well, if that is a useful mechanic (I suspect it will be once we get there). |
@hellkite500, exactly. It just comes down to what you want to support and what is in scope for framework capabilities. |
@program--, great question! You should specify the initial input in a top-level |
I think I found a case where the framework should throw? Here is the setup and what should be happening graphically: flowchart LR
subgraph f [forcing]
value
value_2
end
subgraph m1 [module 1]
input[input: value]
input_2[input_2: value_2]
output[output: output_2]
output_2[output_2: output]
end
subgraph m2 [module 2]
input_m2[input: output]
output_m2[output: output_sink]
end
value --1,2,3 --> input
value_2 --2,4,6 --> input_2
input --> output
input_2 --> output_2
output_2 --> input_m2
input_m2 --> output_m2
output_m2 --> output_sink
in "variables_names_map": {
"input": "value",
"input_2": "value_2",
"output": "output_2",
"output_2": "output"
}
// ...
"variables_names_map": {
"input": "output",
"output": "output_sink"
} However, the catchment output file has the series |
It seems sensible for the framework to throw if an output aliases an existing output in the same module. It appears to just ignore it. |
Configuring a multi bmi realization with two modules that share a common output variable name can produces errors or unexpected results. Similarly, if the outer multi bmi module's
main_output_variable
is the shared common output variable name, the last module listed in themodules
section wins. Meaning the last module's output variable with the name associated with the multi bmi'smain_output_variable
section will be the output.To explore this problem I created a simple BMI python module with a single
input
scalar variable and a singleoutput
scalar variable. Theoutput
is simply assigned toinput
. I put the module "back to back" in a multi bmi realization (see below realization configs) to show case these issues.Failing to remapping either module's output name produces the following error:
realization file
Error with remapping but not correctly specifying multi bmi
main_output_variable
(not the most helpful b.c. its a symptom):realization file
Hopefully this will be more helpful in showcasing what is occurring. In the below details section are the BMI function calls made by each module leading up to the error (kind of a poor mans stack trace). They are labeled 1 and 2 respectively following their ordering in the realization config file.
BMI module function calls
python bmi module source
To use the realization files included above, you will need to create a dir structure of
model/bmi.py
, setup a virtual environment, installbmipy
, and that should mostly work.edit: I no longer think this is strictly a bug, there could be unintended behavior, but that is up for debate. Instead, I did not understand the behavior of multi-bmi vs a single formulation and that led to confusion. At a minimum, I will contribute documentation to clarify the behavioral differences.
My confusion stemmed from the differences in what is output to disk (catchment output file) in a single formulation case vs a multi-bmi case. The variables that are output by default in single formulation case vs a multi-bmi case differ.
Single formulation: all output variables are written as output unless
output_variables
is specified. Ifoutput_variables
is specified, the set of output variables written can be limited.multi-bmi module: only the last listed module's output variables in the
modules
list are written as output. If the last listed module specifiesoutput_variables
, the set of output variables (must use output name not alias name) written can be limited just as in single formulation case.modules
in a multi-bmi formulation, you must list all desired output variables or their alias equivalent name in the multi-bmi configsoutput_variables
section (e.g.global.formulations[0].params.output_variables
).Similarly, a multi-bmi formulation's
main_output_variable
does not control if a value is output for not. It is valid to specify amain_output_variable
that is not written as output. I don't know the functional purpose ofmain_output_variable
, however for NextGen to run, a valid output variable name (not alias) from any module in the module's list must be provided.The text was updated successfully, but these errors were encountered: