ENDOGEN

Dynamic simulation of socio-economic and political systems

This repository hosts the code for ENDOGEN, a dynamic endogenous simulator. ENDOGEN lets you estimate a set of statistical models and feed the predictions from each model into the other models as simulated input. At the core of the system are a model scheduler built on NetworkX, a data model based on xarray, and a set of options for variable transformations. ENDOGEN currently supports the models available through Nixtla MLForecast. Models and simulations are set up and configured with hydra, which makes it easy to bootstrap and build extensive models in YAML.
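To illustrate what a model scheduler has to work out, here is a minimal sketch, not ENDOGEN's implementation. It swaps in the standard-library `graphlib` for NetworkX so that it runs without extra dependencies. The `conflict` variable, the same-year dependency of `psecprop` on `gdppc`, and the helper names `is_lagged` and `schedule` are all hypothetical; only the `_l1` lag-suffix naming is taken from this README.

```python
from graphlib import TopologicalSorter

# Hypothetical models: output variable -> input variables.
# A "_l1" suffix marks a one-step lag, following the naming used in this README.
models = {
    "gdppc": ["gdppc_l1", "psecprop_l1"],
    "psecprop": ["gdppc", "psecprop_l1"],              # uses same-year gdppc
    "conflict": ["gdppc", "psecprop", "conflict_l1"],  # hypothetical variable
}


def is_lagged(name: str) -> bool:
    """Return True for names such as 'gdppc_l1' or 'gdppc_l12'."""
    base, sep, suffix = name.rpartition("_l")
    return bool(base) and bool(sep) and suffix.isdigit()


def schedule(models: dict[str, list[str]]) -> list[str]:
    """Order models so that same-step inputs are predicted before they are used."""
    # Lagged inputs are already known from the previous step, so only
    # unlagged inputs that are themselves model outputs create a dependency.
    graph = {
        output: {v for v in inputs if not is_lagged(v) and v in models}
        for output, inputs in models.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = schedule(models)
print(order)  # ['gdppc', 'psecprop', 'conflict']
```

If two models depended on each other's same-year output, `static_order()` would raise `graphlib.CycleError`. A real scheduler needs to handle that case explicitly.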

ENDOGEN is currently under development. Expect breaking changes with each version, and use the tagged versions rather than the main branch.

ENDOGEN is developed through POLIMPACT, a research project funded by an ERC Advanced Grant that runs from fall 2022 until fall 2027; see https://www.prio.org/projects/polimpact and https://erc.easme-web.eu/?p=101055133.

Please see our webpage for further documentation.

Minimal example without configuration files

from endogen.endogen import EndogenousSystem
from dataclasses import asdict
from endogen.config import GlobalSimConfig, InputModel, Lags
from mlforecast.forecast import MLForecast
from sklearn.linear_model import LinearRegression

# Global simulation settings: input data, time and unit identifiers,
# simulation horizon, and the variables to simulate.
gc = GlobalSimConfig(input_data = "data/cy_data_static_test.csv",
                     time_var = "year",
                     unit_var = "gwcode",
                     nsim = 10,
                     end = 2050,
                     include_past_n = 30,
                     start = 2015,
                     vars = ['gdppc', 'psecprop'])

# Each model predicts one output variable from the one-step lags
# of both variables.
gdppc_model = InputModel(stage = "writing",
           output_var= "gdppc",
           input_vars = ["gdppc_l1", "psecprop_l1"],
           model = MLForecast(models = LinearRegression()),
           lags = [Lags(num_lag = 1, input_vars = ["gdppc", "psecprop"])])

edu_model = InputModel(stage = "writing",
           output_var= "psecprop",
           input_vars = ["gdppc_l1", "psecprop_l1"],
           model = MLForecast(models = LinearRegression()),
           lags = [Lags(num_lag = 1, input_vars = ["gdppc", "psecprop"])])

# Build the system, register both models, then fit and simulate.
s = EndogenousSystem(**asdict(gc))
s.models.add_models([gdppc_model, edu_model])
s.create_forecast_container()
s.fit_models()
s.simulate()

# Plot the simulated variables for the unit with gwcode 475.
s.plot("gdppc", unit = 475)
s.plot("psecprop", unit = 475)
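The idea behind an example like this, where each model's prediction becomes the next step's lagged input for both models, can be sketched in plain Python. This is a toy illustration under stated assumptions, not how ENDOGEN works internally. The coefficients are invented, `start` is treated here as the last observed year, and the multiplicative noise term is only a placeholder for whatever source of variation distinguishes the `nsim` runs.

```python
import random


def next_gdppc(gdppc_l1, psecprop_l1):
    # Hypothetical coefficients standing in for a fitted regression.
    return 1.01 * gdppc_l1 + 50.0 * psecprop_l1


def next_psecprop(gdppc_l1, psecprop_l1):
    # Hypothetical coefficients; the share is capped at 1.0.
    return min(1.0, 0.98 * psecprop_l1 + 1e-6 * gdppc_l1 + 0.01)


def simulate(start, end, initial, nsim, noise_sd=0.0, seed=0):
    """Run nsim recursive one-step-ahead simulations from start to end."""
    rng = random.Random(seed)
    runs = []
    for _ in range(nsim):
        # Assumption: `start` holds the last observed values.
        path = {start: dict(initial)}
        for year in range(start + 1, end + 1):
            prev = path[year - 1]
            # Placeholder noise on gdppc; equals 1.0 when noise_sd is 0.
            shock = 1.0 + rng.gauss(0.0, noise_sd)
            # Both models read last year's simulated values of both variables.
            path[year] = {
                "gdppc": next_gdppc(prev["gdppc"], prev["psecprop"]) * shock,
                "psecprop": next_psecprop(prev["gdppc"], prev["psecprop"]),
            }
        runs.append(path)
    return runs


runs = simulate(2015, 2050, {"gdppc": 2000.0, "psecprop": 0.3}, nsim=10)
print(len(runs), len(runs[0]))
```

With `noise_sd=0.0` all runs are identical. Passing a positive `noise_sd` makes the trajectories diverge, because each shock carries forward into later years through the lagged inputs.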

Installation

Requirements: a recently updated Linux or macOS operating system (tested with Ubuntu 20.04), Windows with WSL, or native Windows 10/11. Note that simulation appears to be much slower on native Windows than on Unix systems; even WSL is faster. Certain future features might not be supported on Windows (e.g., we might use JAX for certain model support, and JAX does not support running on GPU on Windows).

  1. Install mamba.

  2. Add conda-lock to the base environment.

$ mamba install --channel=conda-forge --name=base conda-lock
$ mamba update conda-lock

  3. Install git.

  4. Download our package from GitHub:

$ git clone https://github.com/prio-data/endogen
$ cd endogen

  5. Create the virtual environment from the lock file generated from environment.yml.

$ conda-lock install -n endogen_env --mamba
$ mamba activate endogen_env

  6. Run poetry to add the additional Python package requirements.

(endogen_env) $ poetry install

  7. Optionally install graphviz to visualize graphs.

On Ubuntu/Debian:

```console
$ sudo apt-get install graphviz
```

On macOS with Homebrew:

```console
$ brew install graphviz
```

Otherwise, install from [here](https://graphviz.org/download/).
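
After installation, a quick check along the following lines can confirm that the environment is usable. This is a hedged sketch, not part of ENDOGEN. The package list is an assumption based on the libraries this README mentions, `check_environment` is a hypothetical helper, and `dot` is the Graphviz command-line program.

```python
from importlib.util import find_spec
from shutil import which

# Import names of the packages this README mentions (an assumed list);
# "hydra" is the import name of the hydra-core distribution.
PACKAGES = ["endogen", "mlforecast", "networkx", "xarray", "hydra", "sklearn"]


def check_environment(packages=PACKAGES):
    """Report which packages are importable and whether `dot` is on PATH."""
    report = {name: find_spec(name) is not None for name in packages}
    report["graphviz (dot)"] = which("dot") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:<16} {'ok' if ok else 'MISSING'}")
```

Run it inside the activated `endogen_env` environment. A missing `graphviz (dot)` entry only matters if you want to visualize graphs.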