Causal Discovery with Equal Variance Assumption

Abstract

Prior work has shown that causal structure can be uniquely identified from observational data when these follow a structural equation model whose error terms have equal variances. We show that this fact is implied by an ordering among (conditional) variances. We demonstrate that ordering estimates of these variances yields a simple yet state-of-the-art method for causal structure learning that is readily extendable to high-dimensional problems.

This repository maintains the code for this project.

Installation

The EqVarDAG package:

The package contains the four methods (top-down and bottom-up approach, for low-dimensional settings, and high-dimensional settings.) Among the methods, high-dimensional bottom-up is implemented with the CLIME estimator, as proposed in [Ghoshal and Honorio (2018)][http://proceedings.mlr.press/v84/ghoshal18a/]. The package can be installed by devtools
```
# install.packages("devtools")
devtools::install_github("WY-Chen/EqVarDAG")
```

Methods

Let X be an n-by-p matrix generated according to an SEM with equal variance, we can estimate the topological ordering and the corresponding DAG structure using any of the following:

# low-dimensional, top-down
EqVarDAG_TD(X)
# low-dimensional, bottom-up
EqVarDAG_BU(X)
# high-dimensional, top-down, assuming maximum in-degree <= J
EqVarDAG_HD_TD(X,J)
# high-dimensional, bottom-up, using CLIME, as in [Ghoshal and Honorio (2018)]
EqVarDAG_HD_CLIME(X)

Specifially, the arguments to the function is

# low dimensional example
EqVarDAG_TD(
  X,
  mtd = "ztest",
  alpha = 0.05,
  threshold = NULL,
  FCD = NULL,
  precmtd = NULL
)
# high dimensional example
EqVarDAG_HD_CLIME(
  X,
  mtd = "dlasso",
  alpha = 0.05,
  threshold = NULL,
  FCD = TRUE,
  precmtd = "sqrtlasso"
)

X: n by p data matrix
mtd: DAG inference method, we implemented the following options:
- "ztest": (only for p<n) Simultaneous tests of partial correlations, see [Drton and Perlman.2007];
- "dlasso": debiased lasso (default with FCD=True and precmtd="sqrtlasso", see below for details);
- "lasso": lasso with fixed lambda, see [Shojaie and Michailidis. 2010];
- "adalasso": adaptive lasso with fixed lambda, see [Shojaie and Michailidis. 2010];
- "cvlasso": cross-validated lasso;
- "scallasso": scaled lasso;
- "rls": (only for p<n) fit recursive least squares and threshold the regression coefs
- "chol": (only for p<n) perform cholesky decomposition and threshold the regression coefs
alpha: level for DAG inference step
threshold: threshold for "rls" or "chol" methods
FCD: only used in debiased lasso
- TRUE: use the FCD procedure [Javanmard and Montanari. 2018]
- FALSE: use individual tests to select
precmtd: only used in debiased lasso, choose how to compute debiasing matrix
- "cv": node-wise lasso with joint 10 fold cv
- "sqrtlasso": square-root lasso (no tune)

We proposed a two step procedure for DAG learning under equal error-variance. First, a topological ordering is learned via the iterative algorithm; Then, the DAG is inferred using the learned ordering. Since the main contribution of this paper is the first step (the second step is a well-studied problem, see e.g., [Shojaie and Michailidis. 2010]), in the original version of this paper/package we only implemented/reported results of the "cvlasso" method. However, as we realized people not only use this package to learn orderings but also DAGs, we implemented a variety of DAG learning algorithms to choose from. In particular, we recommend using the simultaneous test of partial correlation ("ztest") in low dimensional problems; for high dimensional problems, we recommend using the FCD procedure with debiased lasso whose debiasing matrix is computed by square-root lasso ("dlasso" with FCD=TRUE and precmtd="sqrtlasso"). These two methods both consistently recovers the correct gragh given true ordering.

Simulation studies

The details of the simulation settings are specified in our paper.

In low dimensional settings, we compare our algorithms with the Greedy DAG Search algorithm (GDS) proposed by [Peters and Bühlmann (2014)][https://doi.org/10.1093/biomet/ast043]. The GDS algorithm is implemented in the supplement of said paper. A slightly modified version is maintained in this repository.

# download this repository
setwd(dir = "/some/path/")
download.file(url = "https://github.com/WY-Chen/EqVarDAG/archive/master.zip",
              destfile = "EqVarDAG.zip")
# unzip the .zip file
unzip(zipfile = "EqVarDAG.zip")
setwd(dir = "/some/path/EqVarDAG-master")

Low dimensional simulation with N replicates, each problem with p variables and n data can be performed by calling

# run simulations
source("/some/path/EqVarDAG/Experiments/sims_low.R")
simulation_lowD(p,n,N) 			# low-dimensional settings

High dimensional simulation is implemented with varies graph types, specified in the paper. We can call type with choices 'er', 'chain', or 'hub'.

# run simulations
source("/some/path/EqVarDAG/Experiments/sims_high.R")
simulation_highD(p,n,N,type) 	# high-dimensional settings

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
Experiments		Experiments
GDS		GDS
R		R
man		man
DESCRIPTION		DESCRIPTION
EqVarDAG.Rproj		EqVarDAG.Rproj
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Experiments

Experiments

GDS

GDS

R

R

man

man

DESCRIPTION

DESCRIPTION

EqVarDAG.Rproj

EqVarDAG.Rproj

LICENSE

LICENSE

NAMESPACE

NAMESPACE

README.md

README.md

Repository files navigation

Causal Discovery with Equal Variance Assumption

Abstract

Installation

Methods

Simulation studies

About

Releases

Packages

Contributors 2

Languages

License

WY-Chen/EqVarDAG

Folders and files

Latest commit

History

Repository files navigation

Causal Discovery with Equal Variance Assumption

Abstract

Installation

Methods

Simulation studies

About

Resources

License

Stars

Watchers

Forks

Languages