NSAPH-Projects/space

🚀 Description

Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, SpaCE provides realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. The datasets cover real treatments and covariates from diverse domains, including climate, health, and social sciences. Realistic semi-synthetic outcomes and counterfactuals are generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. SpaCE facilitates an automated end-to-end machine learning pipeline, simplifying data loading, experimental setup, and model evaluation.

🐍 Installation

Install the PyPI version:

pip install "spacebench[all]"

The option [all] installs all dependencies necessary for the spatial confounding algorithms and the examples. If you only want to use the SpaceDatasets, use pip install spacebench instead.

You can also install the latest 🔥 features from the development version:

pip install "git+https://github.com/NSAPH-Projects/space@dev#egg=spacebench[all]"

Python 3.10 or higher is required. See the docs and requirements.txt for more information.

🐢 Getting started

To obtain a benchmark dataset for spatial confounding, you need to 1) create a SpaceEnv, which contains real treatment and confounder data and a realistic semi-synthetic outcome, and 2) create a SpaceDataset, which masks a spatially varying confounder and facilitates the data loading pipeline for causal inference.

from spacebench import SpaceEnv
env = SpaceEnv('healthd_dmgrcs_mortality_disc')
dataset = env.make()
print(dataset)
SpaceDataset with a missing spatial confounder:
  treatment: (3109,) (binary)
  confounders: (3109, 30)
  outcome: (3109,)
  counterfactuals: (3109, 2)
  confounding score of missing: 0.02
  spatial smoothness score of missing: 0.11
  graph edge list: (9237, 2)
  graph node coordinates: (3109, 2)
  parent SpaceEnv: healthd_dmgrcs_mortality_disc
WARNING ⚠️ : this dataset contains a (realistic) synthetic outcome!
By using it, you agree to understand its limitations.  The variable
names have been masked to emphasize that no  inferences can be made
about the source data.
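
Because each dataset ships the true counterfactuals, an estimator can be scored against the true effect. The sketch below is a toy illustration only: it does not call spacebench, and the lists `u`, `treatment`, `counterfactuals`, and `outcome` are hypothetical stand-ins for the fields listed in the printout above. It assumes column 0 of the counterfactuals is the untreated outcome and column 1 the treated one, and it compares a naive difference in means with the true average treatment effect (ATE) when a hidden confounder drives both treatment and outcome.

```python
from statistics import mean

# Toy stand-ins for the printed fields (hypothetical values, not real SpaCE data).
# A hidden confounder u varies smoothly along eight sites on a line.
u = [0.0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1.0]
treatment = [1 if ui > 0.5 else 0 for ui in u]  # binary treatment driven by u

# counterfactuals[i] = (outcome if untreated, outcome if treated); assumed layout.
TRUE_EFFECT = 1.0
counterfactuals = [(2.0 * ui, 2.0 * ui + TRUE_EFFECT) for ui in u]

# The observed outcome is the counterfactual matching the received treatment.
outcome = [cf[a] for cf, a in zip(counterfactuals, treatment)]


def true_ate(cfs):
    """Average treatment effect computed directly from the counterfactuals."""
    return mean(y1 - y0 for y0, y1 in cfs)


def naive_ate(y, a):
    """Difference in mean outcomes between treated and untreated units."""
    treated = [yi for yi, ai in zip(y, a) if ai == 1]
    control = [yi for yi, ai in zip(y, a) if ai == 0]
    return mean(treated) - mean(control)


ate = true_ate(counterfactuals)
estimate = naive_ate(outcome, treatment)
print(f"true ATE: {ate:.2f}")
print(f"naive estimate: {estimate:.2f}")
print(f"absolute error: {abs(estimate - ate):.2f}")
```

In this toy setup the naive estimate overstates the effect because the confounder is ignored, which is the kind of error a benchmark with known counterfactuals makes measurable.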

Available SpaceEnvs

The list of available environments can be found in the documentation or in an interactive session with:

from spacebench import DataMaster
dm = DataMaster()
dm.master.head()
| environments | treatment_type | collection |
| --- | --- | --- |
| healthd_dmgrcs_mortality_disc | binary | Air Pollution and Mortality |
| cdcsvi_limteng_hburdic_cont | continuous | Social Vulnerability and Welfare |
| climate_relhum_wfsmoke_cont | continuous | Heat Exposure and Wildfires |
| climate_wfsmoke_minrty_disc | binary | Heat Exposure and Wildfires |
| healthd_hhinco_mortality_cont | continuous | Air Pollution and Mortality |
| healthd_pollutn_mortality_cont | continuous | Air Pollution and Mortality |
| county_educatn_election_cont | continuous | Welfare and Elections |
| county_phyactiv_lifexpcy_cont | continuous | Welfare and Elections |
| county_dmgrcs_election_disc | binary | Welfare and Elections |
| cdcsvi_nohsdp_poverty_cont | continuous | Social Vulnerability and Welfare |
| cdcsvi_nohsdp_poverty_disc | binary | Social Vulnerability and Welfare |
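
In the table above, every binary environment name ends in `_disc` and every continuous one ends in `_cont`. The sketch below relies on that observation to pick environments by treatment type without loading any data. The suffix rule is inferred from the listed rows and is an assumption, not a documented spacebench convention, and the helper `treatment_type_from_name` is hypothetical.

```python
# Rows copied from the table above: (environment, treatment_type).
ROWS = [
    ("healthd_dmgrcs_mortality_disc", "binary"),
    ("cdcsvi_limteng_hburdic_cont", "continuous"),
    ("climate_relhum_wfsmoke_cont", "continuous"),
    ("climate_wfsmoke_minrty_disc", "binary"),
    ("healthd_hhinco_mortality_cont", "continuous"),
    ("healthd_pollutn_mortality_cont", "continuous"),
    ("county_educatn_election_cont", "continuous"),
    ("county_phyactiv_lifexpcy_cont", "continuous"),
    ("county_dmgrcs_election_disc", "binary"),
    ("cdcsvi_nohsdp_poverty_cont", "continuous"),
    ("cdcsvi_nohsdp_poverty_disc", "binary"),
]

# Assumed mapping from name suffix to treatment type (inferred from the table).
SUFFIX_TO_TYPE = {"disc": "binary", "cont": "continuous"}


def treatment_type_from_name(env_name: str) -> str:
    """Guess the treatment type from the last underscore-separated token."""
    suffix = env_name.rsplit("_", 1)[-1]
    if suffix not in SUFFIX_TO_TYPE:
        raise ValueError(f"unrecognized suffix in {env_name!r}")
    return SUFFIX_TO_TYPE[suffix]


# Check that the suffix rule reproduces the treatment_type column for each row.
rule_holds = all(treatment_type_from_name(name) == kind for name, kind in ROWS)

binary_envs = [name for name, kind in ROWS if kind == "binary"]
print(rule_holds)
print(binary_envs)
```

When spacebench is installed, filtering the `treatment_type` column of `dm.master` is the more reliable route, since it does not depend on naming.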

To learn more about the data collections and the environments, see the docs. The data collections and environments are hosted at the Harvard Dataverse. Data "nutrition labels" for the collections can be found here. The environments are produced using the space-data repository from a data collection with a configuration file. Don't forget to read our paper.

🙉 Code of Conduct

Please note that the SpaCE project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

👽 Contact

We welcome contributions and feedback about spacebench. If you have any suggestions or ideas, please open an issue or submit a pull request.

Documentation

The documentation is hosted at https://nsaph-projects.github.io/space/.

About

SpaCE, the Spatial Confounding Environment, loads benchmark datasets for causal inference methods tackling spatial confounding
