Handle remapping BMI module output variable names #746

Open
aaraney opened this issue Feb 27, 2024 · 21 comments
Labels
bug Something isn't working

Comments

@aaraney
Member

aaraney commented Feb 27, 2024

Configuring a multi-BMI realization with two modules that share a common output variable name can produce errors or unexpected results. Similarly, if the outer multi-BMI formulation's main_output_variable is the shared output variable name, the last module listed in the modules section wins, meaning the output comes from the last module's output variable with that name.

To explore this problem I created a simple BMI Python module with a single scalar input variable and a single scalar output variable; the input is simply copied to the output. I put the module "back to back" in a multi-BMI realization (see the realization configs below) to showcase these issues.

Failing to remap either module's output name produces the following error:

std::runtime_error: Multi BMI cannot be created with module BmiA with output variable output because a previous module is using this output variable name/alias.
realization file
{
  "global": {
    "formulations": [
      {
        "name": "bmi_multi",
        "params": {
          "model_type_name": "",
          "allow_exceed_end_time": true,
          "main_output_variable": "output", // which module's `output` should this be? Currently it is always the last.
          "init_config": "/dev/null",
          "uses_forcing_file": false,
          "modules": [
            {
              "name": "bmi_python",
              "params": {
                "python_type": "model.BmiA",
                "model_type_name": "BmiA",
                "name": "python_bmi_1",
                "init_config": "green",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "DLWRF_surface"
                }
              }
            },
            {
              "name": "bmi_python",
              "params": {
                "python_type": "model.BmiA",
                "model_type_name": "BmiA",
                "name": "python_bmi_2",
                "init_config": "blue",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "DLWRF_surface"
                }
              }
            }
          ]
        }
      }
    ],
    "forcing": {
      "file_pattern": "cat-27.csv",
      "path": "./data/",
      "provider": "CsvPerFeature"
    }
  },
  "time": {
    "start_time": "2015-12-01 00:00:00",
    "end_time": "2015-12-01 23:00:00",
    "output_interval": 3600
  }
}

Error when remapping is done but the multi-BMI main_output_variable is not correctly specified (not the most helpful error because it's a symptom):

libc++abi: terminating due to uncaught exception of type pybind11::error_already_set: KeyError: ''
realization file
{
  "global": {
    "formulations": [
      {
        "name": "bmi_multi",
        "params": {
          "model_type_name": "output",
          "allow_exceed_end_time": true,
          "main_output_variable": "", // <- note this is not set. this is where the empty string comes from in the error.
          "init_config": "/dev/null",
          "uses_forcing_file": false,
          "modules": [
            {
              "name": "bmi_python",
              "params": {
                "python_type": "model.BmiA",
                "model_type_name": "BmiA",
                "name": "python_bmi_1",
                "init_config": "green",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "DLWRF_surface",
                  "output": "output_1"
                }
              }
            },
            {
              "name": "bmi_python",
              "params": {
                "python_type": "model.BmiA",
                "model_type_name": "BmiA",
                "name": "python_bmi_2",
                "init_config": "blue",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "DLWRF_surface"
                }
              }
            }
          ]
        }
      }
    ],
    "forcing": {
      "file_pattern": "cat-27.csv",
      "path": "./data/",
      "provider": "CsvPerFeature"
    }
  },
  "time": {
    "start_time": "2015-12-01 00:00:00",
    "end_time": "2015-12-01 23:00:00",
    "output_interval": 3600
  }
}

Hopefully this will be more helpful in showcasing what is occurring. The details section below contains the BMI function calls made by each module leading up to the error (a poor man's stack trace). The calls are labeled 1 and 2, following the modules' order in the realization config file.

BMI module function calls
NGen Framework 0.1.0
Building Nexus collection
Building Catchment collection
Initializing formulations
1: initialize('green')
1: get_time_units()
1: get_start_time()
1: get_output_item_count()
1: get_output_var_names()
1: get_input_item_count()
1: get_input_var_names()
1: get_output_item_count()
1: get_output_var_names()
2: initialize('blue')
2: get_time_units()
2: get_start_time()
2: get_output_item_count()
2: get_output_var_names()
2: get_input_item_count()
2: get_input_var_names()
2: get_output_item_count()
2: get_output_var_names()
Building Feature Index
Catchment topology is dendritic.
Running Models
Running timestep 0
1: get_current_time()
1: get_current_time()
1: get_current_time()
1: get_input_item_count()
1: get_input_var_names()
1: get_var_nbytes('input')
1: get_var_itemsize('input')
1: get_var_type('input')
1: get_var_units('input')
WARN: Unit conversion unsuccessful - Returning unconverted value! ("Unable to parse out_units value -")
1: get_var_itemsize('input')
1: get_var_type('input')
1: get_var_nbytes('input')
1: get_var_itemsize('input')
1: set_value('input', array([361.30001831]))
1: get_time_step()
1: update()
1: time stepped 1
1: get_var_type('output')
1: get_var_itemsize('output')
1: get_var_type('output')
1: get_var_itemsize('output')
1: get_input_item_count()
1: get_input_var_names()
1: get_var_type('output')
1: get_value_at_indices('output', array([0.]), array([0], dtype=int32))
2: get_current_time()
2: get_current_time()
2: get_input_item_count()
2: get_input_var_names()
2: get_var_nbytes('input')
2: get_var_itemsize('input')
2: get_var_type('input')
2: get_var_units('input')
WARN: Unit conversion unsuccessful - Returning unconverted value! ("Unable to parse out_units value -")
2: get_var_itemsize('input')
2: get_var_type('input')
2: get_var_nbytes('input')
2: get_var_itemsize('input')
2: set_value('input', array([0.]))
2: get_time_step()
2: update()
2: time stepped 1
2: get_var_type('output')
2: get_var_itemsize('output')
2: get_var_type('output')
2: get_var_itemsize('output')
2: get_input_item_count()
2: get_input_var_names()
2: get_var_type('output')
2: get_value_at_indices('output', array([0.]), array([0], dtype=int32))
2: get_var_type('')
libc++abi: terminating due to uncaught exception of type pybind11::error_already_set: KeyError: ''
python bmi module source

To use the realization files included above, you will need to create a directory structure of model/bmi.py, set up a virtual environment, install bmipy, and that should mostly work.

import numpy as np
from bmipy import Bmi
from typing_extensions import override


class BmiA(Bmi):
    _name = "A"
    _input_var_names = ("input",)
    _output_var_names = ("output",)
    _time_step_size = 3600  # sec in hour

    def __init__(self):
        input = np.array([0], dtype=np.float64)
        unit = "-"
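        # note: "input" and "output" intentionally reference the same array,
        # so a value set on "input" is read back directly from "output"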
        self._values = {
            "input": input,
            "output": input,
        }
        self._var_units = {
            "input": unit,
            "output": unit,
        }
        self._var_loc = {
            "input": "node",
            "output": "node",
        }

        self.started = False

        self._time_step = 0
        self._start_time = 0.0
        self._end_time = np.finfo("d").max
        self._time_units = "s"

    def _run_timestep(self):
        self._time_step += 1
        print(f"time stepped {self._time_step}")

    @override
    def initialize(self, config_file: str) -> None:
        ...


    @override
    def update(self):
        self.update_until(self._time_step_size + self.get_current_time())

    @override
    def update_until(self, time: float) -> None:
        self.started = True
        n_steps = (time - self.get_current_time()) / self.get_time_step()
        for _ in range(int(n_steps)):
            self._run_timestep()

    @override
    def finalize(self): ...

    @override
    def get_var_type(self, name: str) -> str:
        return str(self.get_value_ptr(name).dtype)

    @override
    def get_var_units(self, name: str) -> str:
        return self._var_units[name]

    @override
    def get_var_nbytes(self, name: str) -> int:
        return self.get_value_ptr(name).nbytes

    @override
    def get_var_itemsize(self, name: str) -> int:
        return np.dtype(self.get_var_type(name)).itemsize

    @override
    def get_var_location(self, name: str) -> str:
        return self._var_loc[name]

    @override
    def get_value_ptr(self, name: str) -> np.ndarray:
        return self._values[name]

    @override
    def get_value(self, name: str, dest: np.ndarray) -> np.ndarray:
        dest[:] = self.get_value_ptr(name).copy().flatten()
        return dest

    @override
    def get_value_at_indices(
        self, name: str, dest: np.ndarray, inds: np.ndarray
    ) -> np.ndarray:
        dest[:] = self.get_value_ptr(name).take(inds)
        return dest

    @override
    def set_value(self, name: str, src: np.ndarray) -> None:
        val = self.get_value_ptr(name)
        val[:] = src.copy().reshape(val.shape)

    @override
    def set_value_at_indices(
        self, name: str, inds: np.ndarray, src: np.ndarray
    ) -> None:
        val = self.get_value_ptr(name)
        val.flat[inds] = src.copy()

    @override
    def get_component_name(self) -> str:
        return self._name

    @override
    def get_input_item_count(self) -> int:
        return len(self._input_var_names)

    @override
    def get_output_item_count(self) -> int:
        return len(self._output_var_names)

    @override
    def get_input_var_names(self) -> tuple[str]:
        return self._input_var_names

    @override
    def get_output_var_names(self) -> tuple[str]:
        return self._output_var_names

    @override
    def get_start_time(self) -> float:
        return self._start_time

    @override
    def get_end_time(self) -> float:
        return self._end_time  # type: ignore

    @override
    def get_current_time(self) -> float:
        return self._time_step * self._time_step_size

    @override
    def get_time_step(self) -> float:
        return self._time_step_size

    @override
    def get_time_units(self) -> str:
        return self._time_units

    @override
    def get_grid_shape(self, grid: int, shape: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_shape")

    @override
    def get_grid_spacing(self, grid: int, spacing: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_spacing")

    @override
    def get_grid_origin(self, grid: int, origin: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_origin")

    @override
    def get_grid_type(self, grid: int) -> str:
        raise NotImplementedError("get_grid_type")

    @override
    def get_var_grid(self, name: str) -> int:
        raise NotImplementedError("get_var_grid")

    @override
    def get_grid_rank(self, grid: int) -> int:
        raise NotImplementedError("get_grid_rank")

    @override
    def get_grid_size(self, grid: int) -> int:
        raise NotImplementedError("get_grid_size")

    @override
    def get_grid_edge_count(self, grid: int) -> int:
        raise NotImplementedError("get_grid_edge_count")

    @override
    def get_grid_edge_nodes(self, grid: int, edge_nodes: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_edge_nodes")

    @override
    def get_grid_face_count(self, grid: int) -> int:
        raise NotImplementedError("get_grid_face_count")

    @override
    def get_grid_face_nodes(self, grid: int, face_nodes: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_face_nodes")

    @override
    def get_grid_node_count(self, grid: int) -> int:
        raise NotImplementedError("get_grid_node_count")

    @override
    def get_grid_nodes_per_face(
        self, grid: int, nodes_per_face: np.ndarray
    ) -> np.ndarray:
        raise NotImplementedError("get_grid_nodes_per_face")

    @override
    def get_grid_face_edges(self, grid: int, face_edges: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_face_edges")

    @override
    def get_grid_x(self, grid: int, x: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_x")

    @override
    def get_grid_y(self, grid: int, y: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_y")

    @override
    def get_grid_z(self, grid: int, z: np.ndarray) -> np.ndarray:
        raise NotImplementedError("get_grid_z")

edit: I no longer think this is strictly a bug; there could be unintended behavior, but that is up for debate. Instead, I did not understand the behavior of a multi-BMI formulation vs. a single formulation, and that led to confusion. At a minimum, I will contribute documentation to clarify the behavioral differences.

My confusion stemmed from the differences in what is written to disk (the catchment output file) in the single-formulation case vs. the multi-BMI case; the variables that are output by default differ between the two.

  • Single formulation: all output variables are written unless output_variables is specified, in which case the set of written variables is limited to those listed.

  • Multi-BMI formulation: only the output variables of the last module listed in modules are written. If the last module specifies output_variables, the set of written variables can be limited (the output name, not the alias, must be used), just as in the single-formulation case.

    • To include output variables from other modules in a multi-BMI formulation, you must list all desired output variables (or their aliases) in the multi-BMI config's output_variables section (e.g. global.formulations[0].params.output_variables); see the sketch below.

Similarly, a multi-BMI formulation's main_output_variable does not control whether a value is written as output; it is valid to specify a main_output_variable that is never written. I don't know the functional purpose of main_output_variable, but for NextGen to run, it must be set to a valid output variable name (not an alias) from some module in the modules list.
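For example, here is a sketch (abbreviated with // ... in the style of the snippets later in this thread) of a multi-BMI params section that exposes outputs from both modules of the second realization file above; whether these exact values are otherwise valid is untested here:

          // sketch only: expose outputs from both nested modules
          "main_output_variable": "output",            // must be a module's output name, not an alias
          "output_variables": ["output_1", "output"],  // names/aliases from any nested module to write
          "modules": [
            {
              // first module remaps its `output` to the alias `output_1`
              "variables_names_map": { "input": "DLWRF_surface", "output": "output_1" }
              // ...
            },
            {
              // second module keeps the name `output`
              "variables_names_map": { "input": "DLWRF_surface" }
              // ...
            }
          ]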

@aaraney
Member Author

aaraney commented Feb 27, 2024

There are a few other edge cases that I will document in this issue in a bit. Need to run a few errands.

program-- added the bug label Feb 28, 2024
@robertbartel
Contributor

I need clarification on one of the situations that were noted, but there may not be a bug here.

The one I am uncertain of is when the main_output_variable is set to a duplicated BMI output variable name used in multiple nested modules. That should work fine as long as all but one nested module has a mapped alias configured for said variable. That should make the "main" output come from the duplicated variable of the nested module without a configured alias for it.

@aaraney, I can't tell from the description whether something else is being observed. You mentioned "last" module, but there should only be one (at most) without an alias. Are you saying that whether and how (i.e., to which module) the alias was applied was irrelevant to what was actually used for main_output_variable?

Regarding the other described scenarios:

  • Two (or more) nested modules that share an output variable name must have explicit variable name mapping alias(es) set up for the configuration to be valid; I'd expect exactly the error in the description if this isn't done.
  • The main_output_variable value is a required config param, so it being missing or configured to something that doesn't match any of the available outputs should also produce an error.

I think those are all that were mentioned, but let me know if I'm missing something.

@aaraney
Member Author

aaraney commented Mar 1, 2024

Are you saying that whether and how (i.e., to which module) the alias was applied was irrelevant to what was actually used for main_output_variable?

Not exactly, I am saying that the top level main_output_variable in a multi-bmi does not resolve alias names.

           // ...
          "main_output_variable": "main_output", // last module will be queried for this output variable name verbatim. alias is not translated
          "modules": [
            {
                // ...
                "main_output_variable": "output",
                "variables_names_map": {
                  "input": "value",
                  "output": "main_output"
                }
              }
            },
            {
                // ...
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "value"
                }
                // ...

@aaraney
Member Author

aaraney commented Mar 1, 2024

           // ...
          "main_output_variable": "output", // last module will still be queried output despite the presence of an alias
          "modules": [
            {
                // ...
                "main_output_variable": "output",
                "variables_names_map": {
                  "input": "value_2"
                }
              }
            },
            {
                // ...
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "value",
                  "output": "dontuse"
                }
                // ...

@program--
Contributor

I might be entirely wrong here, but isn't this intended? Multi-BMI, to my knowledge, was to pipe models together in a way that forms a tree of models -- whereas the above is attempting to output an ensemble of models? Again I might be completely wrong there

@aaraney
Member Author

aaraney commented Mar 1, 2024

I might be entirely wrong here, but isn't this intended?

Fair point, I may be misinterpreting the intended functionality of a multi-BMI. In my mind, it seems that you should be able to create a pipe of models that has an overall main_output_variable (top level in the multi-BMI params section) from any stage in the pipe. That might not be possible due to output alias constraints, though?

@aaraney
Member Author

aaraney commented Mar 1, 2024

Here is another weird edge case that neither throws an error nor does what I would expect. This example has two models:

  1. pipe_model_2, which has one input, input, and two outputs, output and output_2. input is copied to both output and output_2 (a sketch of this module follows the forcing description below).
  2. pipe_model, the same module I've been using in the previous examples: one input and one output variable, input and output respectively. input is copied to output.

Forcing contains two variables value and value_2. value looks like 1,2,3,4,...; value_2 looks like 2,4,6,8,...
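For reference, here is a minimal sketch of what pipe_model_2 could look like, assuming it simply extends the BmiA class posted above; the class body is an assumption for illustration, not the exact module used:

# hypothetical sketch: a BmiA variant with a second output that also mirrors
# the input; not the exact pipe_model_2 source used for this example
class pipe_model_2(BmiA):
    _output_var_names = ("output", "output_2")

    def __init__(self):
        super().__init__()
        # like "output", "output_2" references the same array as "input",
        # so the input value appears on both outputs
        self._values["output_2"] = self._values["input"]
        self._var_units["output_2"] = self._var_units["input"]
        self._var_loc["output_2"] = "node"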

realization config
{
  "global": {
    "formulations": [
      {
        "name": "bmi_multi",
        "params": {
          "model_type_name": "",
          "allow_exceed_end_time": true,
          "main_output_variable": "output_2",
          "init_config": "/dev/null",
          "uses_forcing_file": false,
          "modules": [
            {
              "name": "bmi_python",
              "params": {
                // has one input, `input`, and two outputs `output` and `output_2`.
                // `input` is copied to `output` and `output_2`
                "python_type": "model.pipe_model_2",
                "model_type_name": "pipe_model_2",
                "name": "python_bmi_1",
                "init_config": "green",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                // adding `output_variables` has no effect
                // "output_variables": ["output", "output_2"],
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "value_2", // value_2 is series 2,4,6,8,...
                  "output": "dontuse"
                }
              }
            },
            {
              "name": "bmi_python",
              "params": {
                // has one input, `input`, and one output, `output`
                "python_type": "model.pipe_model",
                "model_type_name": "pipe_model",
                "name": "python_bmi_2",
                "init_config": "blue",
                "allow_exceed_end_time": true,
                "main_output_variable": "output",
                "uses_forcing_file": false,
                "variables_names_map": {
                  "input": "value" // value is series 1,2,3,4,...
                }
              }
            }
          ]
        }
      }
    ],
    "forcing": {
      "file_pattern": "forcing.csv",
      "path": "./data/",
      "provider": "CsvPerFeature"
    }
  },
  "time": {
    "start_time": "2020-01-01 00:00:00",
    "end_time": "2020-01-01 01:00:00",
    "output_interval": 3600
  }
}

The odd thing is the catchment output. output_2 is not present. Adding output_variables has no effect. Note that output is from pipe_model (the second module) not the first. This detail is consistent with the above examples.

catchment output
Time Step,Time,output
0,2020-01-01 00:00:00,1.000000000
1,2020-01-01 01:00:00,2.000000000

edit: Moving "output_variables": ["output", "output_2"] to the multi-BMI params does work. However, the "last module wins" problem remains: if two modules both output, for example, evapotranspiration under the same output variable name and you want the value from the module listed earlier in the modules list, there doesn't appear to be a way to do that unless I'm missing something.
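For concreteness, the placement that works looks roughly like this (abbreviated; the two modules are the same ones from the realization config above):

          // sketch: output_variables placed at the multi-BMI level, not inside a module
          "main_output_variable": "output_2",
          "output_variables": ["output", "output_2"],
          "modules": [
            // ... pipe_model_2 and pipe_model modules as configured above ...
          ]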

@aaraney
Member Author

aaraney commented Mar 1, 2024

Okay, so it seems that the solution is to specify the output variable alias names in output_variables in the top-level multi-BMI params section. output_variables does respect alias names, whereas the top-level main_output_variable apparently does not. Additionally, the top-level main_output_variable seems to have no effect on whether a variable is included in the final catchment output (see the last example). Looking at the BMI calls the framework makes, the main_output_variable is queried, but it is not included in the output.

@aaraney
Member Author

aaraney commented Mar 1, 2024

This might be documented somewhere, but I've not found it. The variables that are output by default differ between the single-formulation case and the multi-BMI case. In a single formulation, unless output_variables is specified, all output variables are written. In the multi-BMI case, only the last module's output variables are written. The top-level multi-BMI main_output_variable must be some module's output variable, but it does not dictate what output is written. The only way, it seems, to have output variables written from multiple modules in the multi-BMI case is to specify a top-level output_variables (in the multi-BMI params section) listing the output variables / aliases to include. I'm not sure if this behavior is intended; if it is, we should more clearly document the differences.

@aaraney
Member Author

aaraney commented Mar 1, 2024

@program-- and @robertbartel see the edit: section of this issue.

@robertbartel
Contributor

Ok, @aaraney, I'm still remembering parts of how things work ...

Indeed, main_output_variable doesn't have much to do with output files; it controls the returned value of the formulation's get_response function: i.e., what's "output" when the framework advances modeling. The function returning a double value goes back all the way to (at least) the original Simple_Lumped_Model_Realization 4 years ago. I want to say that was the first Catchment_Formulation implementation. Whether or not get_response still should return a value like this is up for debate.

For multi-formulations, technically the get_response "main" return value will come from the module at the primary_module_index. The default, set by Bmi_Multi_Formulation::create_multi_formulation, is the last index. While there is a set_index_for_primary_module setter, I'm fairly sure it is never used anywhere (i.e., only the default of "last module" is ever used). One way or another, that's probably technically a bug (and it probably is still there because that functionality hasn't really mattered).

I might be entirely wrong here, but isn't this intended? Multi-BMI, to my knowledge, was to pipe models together in a way that forms a tree of models -- whereas the above is attempting to output an ensemble of models? Again I might be completely wrong there

It was a long time ago, but I think the original intent was to be able to modularly compose modeling. If not originally, it became a desired capability not long thereafter to compose not just the process modeling but also the per-time-step results (i.e., data output to files) from any output variables within the composition. I think that's less of a tree and more of an ensemble (though I feel like "ensemble" often has contextual implications, and I'm not sure if any are intended here). That said, the behavior around main_output_variable does work differently, which is another good reason to consider whether it still belongs.

@program--
Contributor

I think that's less of a tree and more of an ensemble (though I feel like "ensemble" often has contextual implications, and I'm not sure if any are intended here).

@robertbartel Haha yeah sorry, I should've been more specific with what I meant by tree and ensemble. For trees, what I meant is multiple models that are composed (in the way you're describing), such that the "tree" of models represents 1 single model

flowchart LR

A --> C
B --> C
D --> E
E --> F
C --> F

where each model passes its outputs to the next model, and F's outputs are the final outputs of the multi-bmi model. i.e. a pipeline of models representing a single model

For an ensemble:

flowchart LR

A --> Output
B --> Output
C --> Output

where each model in the ensemble passes its outputs to the catchment file, so the output catchment file would have all outputs from each model, and the models may or may not interact with one another. I think this is what the intention of making the formulations config option an array was?

@aaraney
Member Author

aaraney commented Mar 4, 2024

As always, thanks for the in-depth response, @robertbartel!

Whether or not get_response still should return a value like this is up for debate.

Yeah, I went down a rabbit hole on Friday to better understand the implementation of the catchment-to-nexus aggregation, specifically looking at Layer::update_models(). There definitely seems to be some cruft accumulated in this section of the codebase (e.g. magic numbers and undocumented assumptions). At a minimum, we should bolster the documentation around main_output_variable: specifically, what are the implications, assumptions, and invariants of being the output of get_response()? More broadly, since we have been talking about / working on layer mechanics, it seems like a good time to consider whether get_response needs to return a value. Pinging @hellkite500 and @donaldwj just so you have this on your radar.

@aaraney
Member Author

aaraney commented Mar 4, 2024

where each model passes its outputs to the next model, and F's outputs are the final outputs of the multi-bmi model. i.e. a pipeline of models representing a single model

@program--, I think you are right on with that description. However, I don't think that implies that the last model's output is what is written to disk. It could be that the processes each model represents necessitate ordering the models in a realization config file such that the last model does not output the variable you are interested in. Of course we could create some kind of sink model that takes as input the model output that you want, but I would hope we can find a more long-term solution.

I think this is what the intention of making the formulations config option an array was for?

Yeah, I don't know that the vision was for the framework to handle "creating" aggregate ensembles (i.e., the actual aggregation of multiple formulations into a single output), but instead to run an ensemble of model formulations. To be fair, the domain language around "ensemble" is pretty fuzzy. It's common to use ensemble to mean either an aggregation of multiple models into a single forecast or multiple forecasts that originate from a common initialization time (think spaghetti plot). @hellkite500, can you speak to that?

@robertbartel
Contributor

@program--, thanks for the explanations. Unfortunately, things are coupled a bit more loosely than that. It is in some ways both, but strictly neither of those things.

Output file generation is controlled by the formulation. The formulation has access to output variables from the full ensemble of nested modules, but it selects what from the ensemble to include in output according to the configuration.

The formulation defers to its nested modules to actually perform the modeling, and it processes them in a particular order. The formulation can provide any of the outputs of any of its nested modules to serve as an input value for any of its nested modules. Typically that looks something like this, which essentially fits under the tree category:

flowchart LR

A --> B
B --> C
A --> C
C --> D
B --> D

However, it can also look like this:

flowchart LR

A --> B
B --> C
A --> C
C --> A
C --> D
B --> D

Where the C to A edge spans across the boundary of the current and the previous time step; i.e., if C` represents module C of the previous time step, something like this:

flowchart LR

A --> B
B --> C
A --> C
C` --> A
C --> D
B --> D

That's still kind of a tree, but not in the domain of what I think you had in mind.

@program--
Contributor

@robertbartel Huh, that's interesting. Thank you for that explanation! I realize I should've probably said DAG instead of tree, but you got what I meant 😄.

I didn't consider a formulation having a back edge in the way you described C` and C. How would that be modeled though? For example, the base case for A when C has no previous timestep?

@hellkite500
Member

Yeah, I don't know that the vision was for the framework to handle "creating" aggregate ensembles (so the actual aggregation of multiple formulations into a single output). But instead running an ensemble of model formulations

So the formulation definition being a list is a precursor to supporting this type of functionality. The easiest case is definitely just running multiple formulations, each generating its own output (the spaghetti plot). However, once this capability exists, it wouldn't be too challenging, I think, to add a configuration option that allows some form of ensemble aggregation to occur during the model run, supporting common aggregation mechanisms like mean, median, etc., and to use that throughout the simulation as well, if that is a useful mechanic (I suspect it will be once we get there).

@aaraney
Member Author

aaraney commented Mar 6, 2024

@hellkite500, exactly. It just comes down to what you want to support and what is in scope for framework capabilities.

@aaraney
Member Author

aaraney commented Mar 6, 2024

I didn't consider a formulation having a back edge in the way you described C` and C. How would that be modeled though? For example, the base case for A when C has no previous timestep?

@program--, great question! You should specify the initial input in a top-level default_output_values block. This is where it gets tricky and where we need more documentation. NextGen can, in some cases, provide a default value (e.g. 0.0 for doubles) even when an input is not specified in default_output_values. However, it's not as clear what datatypes are supported and what their defaults are. I think @robertbartel is going to clarify this in a docs PR soon.
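Very roughly, the idea looks like the following sketch; the exact entry format of default_output_values shown here ("name"/"value" pairs) is a guess for illustration and should be checked against the ngen schema:

          // hypothetical sketch only: the entry format of default_output_values is assumed
          "default_output_values": [
            { "name": "output", "value": 0.0 }  // seeds C's output so A has an input at the first time step
          ],
          "modules": [
            // ...
          ]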

@aaraney
Member Author

aaraney commented Mar 7, 2024

I think I found a case where the framework should throw?

Here is the setup and what should be happening graphically:

flowchart LR

subgraph f [forcing]
value
value_2
end

subgraph m1 [module 1]
input[input: value]
input_2[input_2: value_2]
output[output: output_2]
output_2[output_2: output]
end

subgraph m2 [module 2]
input_m2[input: output]
output_m2[output: output_sink]
end

value --1,2,3 --> input
value_2 --2,4,6 --> input_2
input --> output
input_2 --> output_2

output_2 --> input_m2 
input_m2 --> output_m2

output_m2 --> output_sink

in variables_names_map form

"variables_names_map": {
  "input": "value",
  "input_2": "value_2",
  "output": "output_2",
  "output_2": "output"
}
// ...
"variables_names_map": {
  "input": "output",
  "output": "output_sink"
}

However, the catchment output file has the series 1,2,3... instead of 2,4,6,....

@aaraney
Member Author

aaraney commented Mar 7, 2024

It seems sensible for the framework to throw if an output aliases an existing output in the same module. It appears to just ignore it.
