dosearch

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

The dosearch R package identifies causal effects from arbitrary combinations of observational and experimental probability distributions. It uses a search-based algorithm that applies the rules of do-calculus together with standard probability manipulations (Tikka et al., 2019, 2021). When the target distribution is identifiable, its formula is returned as a character string in LaTeX syntax. The causal graph may additionally include mechanisms related to:

  • Selection bias (Bareinboim and Tian, 2015)
  • Transportability (Bareinboim and Pearl, 2014)
  • Missing data (Mohan et al., 2013)
  • Context-specific independence (Corander et al., 2019)

See the package vignette or the references for further information.
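As a concrete illustration of what identification means here, consider the classic back-door setting. The sketch assumes a graph in which z is an observed common cause of x and y (edges x -> y, z -> x, z -> y) and that the joint distribution p(x,y,z) is available. Under those assumptions, the interventional distribution can be written purely in terms of observational quantities:

```latex
% Back-door adjustment, assuming z -> x, z -> y, x -> y and input p(x, y, z)
p(y \mid do(x)) = \sum_{z} p(z)\, p(y \mid x, z)
```

The left-hand side contains the do-operator, while the right-hand side uses only terms computable from p(x,y,z). Formulas of this kind are what the package searches for.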

Citing the package

If you use the dosearch package in a publication, please cite the corresponding paper in the Journal of Statistical Software:

Tikka S, Hyttinen A, Karvanen J (2021). “Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-Based Approach.” Journal of Statistical Software, 99(5), 1–40. doi:10.18637/jss.v099.i05.

Installation

You can install the latest release version from CRAN:

install.packages("dosearch")

Alternatively, you can install the latest development version of dosearch from GitHub:

# install.packages("devtools")
devtools::install_github("santikka/dosearch")

Examples

library(dosearch)

# back-door formula
# data: available input distributions, query: target causal effect
data <- "p(x,y,z)"
query <- "p(y|do(x))"
graph <- "
  x -> y
  z -> x
  z -> y
"
dosearch(data, query, graph)
#> \sum_{z}\left(p(z)p(y|x,z)\right)

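# front-door formula (data and query are reused from the previous example);
# x <-> y denotes an unobserved common cause of x and y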
graph <- "
  x -> z
  z -> y
  x <-> y
"
dosearch(data, query, graph)
#> \sum_{z}\left(p(z|x)\sum_{x}\left(p(x)p(y|z,x)\right)\right)

# the 'napkin' graph (query is reused; the data now also include w)
data <- "p(x,y,z,w)"
graph <- "
  x -> y
  z -> x
  w -> z
  x <-> w
  w <-> y
"
dosearch(data, query, graph)
#> \frac{\sum_{w}\left(p(w)p(y,x|z,w)\right)}{\sum_{y}\sum_{w}\left(p(w)p(y,x|z,w)\right)}

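# case-control design with missing data:
# x* and y* are the observed proxies of x and y, and r_x and r_y are
# the corresponding response indicators (declared via missing_data)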
data <- "
  p(x*,y*,r_x,r_y)
  p(y)
"
graph <- "
  x -> y
  y -> r_y
  r_y -> r_x
"
md <- "r_x : x, r_y : y"
dosearch(data, query, graph, missing_data = md)
#> \frac{\left(p(y)p(x|r_x = 1,y,r_y = 1)\right)}{\sum_{y}\left(p(y)p(x|r_x = 1,y,r_y = 1)\right)}
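The examples above print the resulting formula directly. A minimal sketch of handling the result programmatically follows. It assumes a recent package version that exports the accessor functions is_identifiable() and get_formula(); the object names res, res_bow and tex are illustrative only. The second call uses the "bow-arc" graph, a standard example of a causal effect that cannot be identified from observational data alone.

```r
library(dosearch)

# Back-door setting: the effect of x on y is identifiable from p(x,y,z)
res <- dosearch("p(x,y,z)", "p(y|do(x))", "
  x -> y
  z -> x
  z -> y
")

# Logical flag: TRUE when the search found a formula for the query
is_identifiable(res)

# The formula as a character string in LaTeX syntax
tex <- get_formula(res)
cat(tex, "\n")

# Bow-arc graph: x and y share an unobserved confounder (x <-> y),
# so p(y|do(x)) is not identifiable from p(x,y) alone
res_bow <- dosearch("p(x,y)", "p(y|do(x))", "
  x -> y
  x <-> y
")
is_identifiable(res_bow)
```

Checking is_identifiable() before using the formula avoids passing an empty or placeholder string to downstream code, for example when generating a report.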

References