Kernel Conditional Independence Tests #226

Open
rflperry opened this issue Oct 26, 2021 · 5 comments

@rflperry (Member) commented Oct 26, 2021

Testing for conditional independence, X ⊥ Y | Z, is a common problem in causal discovery and feature selection. The following two kernel-based methods can perform this test under relatively weak assumptions; a third, regression-based test (FIT) is also listed for comparison.

Kernel Conditional Independence (KCI) Test [paper][matlab code]

  • Well known and, to my understanding, commonly used in practice.
  • Computes kernel matrices on each of the variables X, Y, Z and combines them into a test statistic (see the sketch after this list).
  • Approximates the null distribution analytically with a Gamma distribution; no permutation test is available.
  • [Edit] Python code in this package
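
To make the KCI construction more concrete, here is a rough numpy/scipy sketch of the statistic and its Gamma approximation as I read them from the paper. The median-heuristic bandwidth, the ridge parameter `eps`, and the moment-matching expressions are illustrative choices on my part, not the authors' exact implementation:

```python
import numpy as np
from scipy.stats import gamma
from scipy.spatial.distance import pdist, squareform


def rbf_kernel(a, sigma=None):
    """Gaussian (RBF) kernel matrix with a median-heuristic bandwidth."""
    a = a.reshape(len(a), -1)
    d = squareform(pdist(a))
    if sigma is None:
        sigma = np.median(d[d > 0])
    return np.exp(-d ** 2 / (2 * sigma ** 2))


def kci_test(x, y, z, eps=1e-3):
    """Sketch of the KCI statistic with a Gamma null approximation.

    Kernel matrices are computed on (X, Z), Y, and Z, the Z-dependence is
    "regressed out" in the RKHS with a ridge-style projection, and the null
    of the trace statistic is approximated by moment-matching a Gamma.
    """
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix

    Kx = H @ rbf_kernel(np.column_stack([x, z])) @ H
    Ky = H @ rbf_kernel(y) @ H
    Kz = H @ rbf_kernel(z) @ H

    # Projection that removes the part of each kernel explained by Z.
    Rz = eps * np.linalg.inv(Kz + eps * np.eye(n))
    Kx_z = Rz @ Kx @ Rz
    Ky_z = Rz @ Ky @ Rz

    stat = np.trace(Kx_z @ Ky_z)

    # Gamma approximation: match mean and variance of the statistic under the
    # null (simple approximations here; the paper derives the moments carefully).
    mean_approx = np.trace(Kx_z) * np.trace(Ky_z) / n
    var_approx = 2.0 * np.sum(Kx_z ** 2) * np.sum(Ky_z ** 2) / n ** 2
    shape, scale = mean_approx ** 2 / var_approx, var_approx / mean_approx
    pvalue = 1 - gamma.cdf(stat, a=shape, scale=scale)
    return stat, pvalue
```

For example, on data generated as x = z + noise and y = z + noise (so X ⊥ Y | Z holds), `kci_test(x, y, z)` should tend to return a large p-value.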

A Permutation-Based Kernel Conditional Independence (KCIP) Test [paper][matlab code]

  • Potentially an improvement over KCI, but not as widely used or known, partly due to speed constraints.
  • Computes kernel matrices on each of the variables X, Y, Z.
  • Also provides a two-layer bootstrap permutation test by:
    • Finding a permutation Y' of Y that minimally disturbs the pairwise Z distances.
    • Performing a two-sample test (MMD) between the original (X, Y, Z) and permuted (X, Y', Z) samples (see the sketch after this list).
  • Improves upon KCI when its null is not well specified (complex, higher-dimensional Z), or when Z can be clustered well or is discrete.
  • Also provides an analytic approximation of the null distribution using a Gamma distribution.
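
A very loose sketch of the permutation scheme in those bullets, just to show the shape of the procedure. I'm substituting a k-means clustering of Z with within-cluster permutations for the paper's distance-minimizing permutation, and a naive permuted-vs-permuted second layer for its bootstrap, so treat this purely as an illustration:

```python
import numpy as np
from sklearn.cluster import KMeans


def _rbf_mmd2(a, b, sigma=1.0):
    """Biased squared MMD between samples a and b with an RBF kernel."""
    def k(u, v):
        d2 = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()


def kcip_test(x, y, z, n_perms=200, n_clusters=10, seed=0):
    """Sketch of the two-layer permutation idea: permute Y within groups of
    similar Z, compare original vs. permuted joint samples with MMD, and
    calibrate against permuted-vs-permuted MMDs."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(z.reshape(len(z), -1))

    def z_preserving_permutation():
        # Permute Y only within Z-clusters so Z distances are roughly preserved.
        perm = np.arange(len(y))
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            perm[idx] = rng.permutation(idx)
        return perm

    def joint(yy):
        return np.column_stack([x, yy, z])

    # First layer: MMD between the observed sample and one Z-preserving permutation.
    observed = _rbf_mmd2(joint(y), joint(y[z_preserving_permutation()]))

    # Second layer: null distribution from permuted-vs-permuted comparisons.
    null = np.array([
        _rbf_mmd2(joint(y[z_preserving_permutation()]),
                  joint(y[z_preserving_permutation()]))
        for _ in range(n_perms)
    ])
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perms)
```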

A nonparametric test based on regression error (FIT) [paper] [python code]

  • A bit more fringe than KCI/KCIP, but the paper provides good simulation comparisons among all three methods and more.
  • Uses a nonparametric regression (in their case, a decision tree) to examine the change in predictive power when some variables are included versus excluded.
  • Uses the mean squared error as a test statistic and an analytic Gaussian/t-test approach to compute a p-value (see the sketch after this list).
  • Seemingly efficient for large sample sizes compared to other kernel-based approaches.
  • Interesting connections, in that trees/forests are adaptive kernel methods, with possible extensions to forests, honesty, and leaf permutations.
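
A small sketch of how I read the FIT idea: compare the held-out squared error of predicting Y from Z alone versus from (X, Z), then run a paired one-sided t-test on the per-sample errors. The tree depth, the 50/50 split, and the exact error comparison are my placeholder choices, not the paper's:

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split


def fit_test(x, y, z, seed=0):
    """Sketch of a regression-error conditional independence test.

    If X adds no predictive power for Y beyond Z, the per-sample squared
    errors of the two regressions should not differ; a paired one-sided
    t-test on those errors gives an approximate p-value.
    """
    x2, z2 = x.reshape(len(x), -1), z.reshape(len(z), -1)
    xz = np.hstack([x2, z2])

    # Hold out a test split so the MSE comparison is out-of-sample.
    xz_tr, xz_te, z_tr, z_te, y_tr, y_te = train_test_split(
        xz, z2, y, test_size=0.5, random_state=seed)

    err_z = (DecisionTreeRegressor(max_depth=5, random_state=seed)
             .fit(z_tr, y_tr).predict(z_te) - y_te) ** 2
    err_xz = (DecisionTreeRegressor(max_depth=5, random_state=seed)
              .fit(xz_tr, y_tr).predict(xz_te) - y_te) ** 2

    # One-sided paired t-test: does adding X reduce the squared error?
    tstat, p_two_sided = ttest_rel(err_z, err_xz)
    pvalue = p_two_sided / 2 if tstat > 0 else 1 - p_two_sided / 2
    return tstat, pvalue
```
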
rflperry added the enhancement (New feature or request) label on Oct 26, 2021
@zdbzdb123123 (Contributor)

Interested

@sampan501 (Member)

@zdbzdb123123 which one? Once you have decided, please make a new issue with the description and link to this issue

@zdbzdb123123 (Contributor)

KCI, and will do

@rflperry (Member, Author) commented Feb 4, 2022

I also discovered a package with Python code and MATLAB wrappers.

  1. KCI code
  2. KCIP code

The package has some other things, including a small notebook with simulations to test the tools.

@MatthewZhao26 (Contributor)

Interested in FIT

sampan501 changed the title from "Kernel Conditional Independence Test" to "Kernel Conditional Independence Tests" on Feb 14, 2022
sampan501 added the ndd (Issues for NeuroData Design) label on Feb 15, 2022