Conditional Correlation Tests #349

Open: wants to merge 152 commits into base: main

Commits (152)
6b2d89b
Update permutation tree/block docstrings (#157)
rflperry Jan 13, 2021
bdc3e4b
create wrapper class for Energy (#160)
sampan501 Jan 13, 2021
bf0e37b
Move to Circle CI (#163)
sampan501 Jan 19, 2021
6666ded
speed up circleci builds (#164)
sampan501 Jan 20, 2021
bcab033
use xml for coverage reports (#165)
sampan501 Jan 20, 2021
97607d2
update readme badges (#166)
sampan501 Jan 20, 2021
ccc9d09
Update README.md
sampan501 Jan 20, 2021
409022c
improve documentation (#167)
sampan501 Feb 8, 2021
5780b9d
Release v0.2.0 (#169)
sampan501 Feb 8, 2021
cf3e3c1
change pypi long description format
sampan501 Feb 9, 2021
9140ecd
release v0.2.1 (#179)
sampan501 Feb 25, 2021
2fde59f
rename badges to main in README
sampan501 Feb 25, 2021
367087e
make file names for benchmarks lower case
sampan501 Feb 25, 2021
c2da915
Remove extraneous benchmarks (#180)
sampan501 Feb 25, 2021
d0b16ab
make changes to improve diversity and inclusion (#189)
sampan501 Mar 16, 2021
33db884
change l1 distance to l2 for median heuristic (#191)
sampan501 Mar 23, 2021
c4f6657
add reference to paper within fast docstring
sampan501 Apr 29, 2021
f7df5bb
reformat to black
sampan501 Apr 29, 2021
eddab0d
make median kernel default for rbf and gaussian (#196)
sampan501 May 7, 2021
abce18f
Switch to MIT license (#198)
PSSF23 May 18, 2021
9e41f17
Added typing to output of test methods. Updated requirements. (#201)
hadasarik Jun 20, 2021
aa6f4ca
fix docs website formatting issues (#197)
sampan501 Jun 21, 2021
5c2693c
update preprints to published version (#202)
sampan501 Jun 21, 2021
f54a49e
add CITATION.cff (#204)
sampan501 Jul 29, 2021
54c628f
add bibtex to docs (#205)
sampan501 Jul 29, 2021
1847084
remove master references (#208)
sampan501 Aug 23, 2021
0aa0ab6
remove file references in bibtex
sampan501 Aug 31, 2021
e96c986
EHN update pytest orbs version (#217)
PSSF23 Oct 13, 2021
7c4df6d
Reproducibility to Perm Tests (#212)
kareef928 Oct 14, 2021
8407bb5
MAINT update license in setup (#224)
PSSF23 Oct 20, 2021
771bda8
MGC redundancy warning (#125) (#220)
Verathagnus Oct 27, 2021
e69fe12
Fix #228 (#230)
rflperry Nov 9, 2021
b8c04d0
add 3.9 support (#223)
sampan501 Nov 10, 2021
7aa48d7
release hyppo 0.2.2 (#236)
sampan501 Dec 7, 2021
135cfcd
release hyppo 0.2.2 (#236) (#237) (#241)
sampan501 Dec 8, 2021
fc4c229
add permutation test example to docs (#242)
sampan501 Dec 13, 2021
5db67a7
Adding dHSIC (#233)
diane-lee-01 Dec 13, 2021
f7a1f6c
Edited the types in the documentation section. (#244)
Dec 17, 2021
7f3b291
Fast tstest (#234)
MatthewZhao26 Dec 20, 2021
5a0fc78
Fast HHG Test (#238)
TacticalFallacy Dec 20, 2021
5e0fe5e
Creating a goodness-of-fit module in hyppo (#232)
darsh-patel Dec 20, 2021
e257704
Friedman Rafsky PR (#239)
zdbzdb123123 Dec 20, 2021
bdee8eb
refactor docs and add contribution bot (#248)
sampan501 Dec 21, 2021
6e1aec9
docs: add sampan501 as a contributor for bug, code, doc, ideas, maint…
allcontributors[bot] Dec 21, 2021
e697d6e
docs: add cshen6 as a contributor for code (#274)
allcontributors[bot] Dec 21, 2021
79fc8b9
fix all-contrib errors
sampan501 Dec 21, 2021
6cc0ed2
change research to ideas (#275)
sampan501 Dec 22, 2021
0b22124
docs: add jovo as a contributor for fundingFinding, mentoring, ideas …
allcontributors[bot] Dec 22, 2021
05e3070
docs: add tpsatish95 as a contributor for code, ideas (#277)
allcontributors[bot] Dec 22, 2021
0783910
docs: add junhaobearxiong as a contributor for code, ideas (#278)
allcontributors[bot] Dec 22, 2021
22ba58c
docs: add ebridge2 as a contributor for bug, ideas (#279)
allcontributors[bot] Dec 22, 2021
bbf34b8
docs: add ronakdm as a contributor for bug, code, ideas (#280)
allcontributors[bot] Dec 22, 2021
7f0f686
remove bug for ronak
sampan501 Dec 22, 2021
c0a8bd8
remove bug for ronak
sampan501 Dec 22, 2021
c9a578d
docs: add j1c as a contributor for bug (#281)
allcontributors[bot] Dec 22, 2021
b0ef56c
docs: add jdey4 as a contributor for code (#282)
allcontributors[bot] Dec 22, 2021
67770b2
docs: add bvarjavand as a contributor for code (#283)
allcontributors[bot] Dec 22, 2021
75f3766
docs: add bdpedigo as a contributor for bug, code (#284)
allcontributors[bot] Dec 22, 2021
fd5e201
docs: add alyakin314 as a contributor for code, ideas (#285)
allcontributors[bot] Dec 22, 2021
ede1a1e
docs: add v715 as a contributor for code (#286)
allcontributors[bot] Dec 22, 2021
f6a4c7d
docs: add rflperry as a contributor for bug, code, ideas, review (#287)
allcontributors[bot] Dec 22, 2021
08571ae
docs: add rflperry as a contributor for doc (#288)
allcontributors[bot] Dec 22, 2021
fce463c
docs: add PSSF23 as a contributor for code, doc, review (#289)
allcontributors[bot] Dec 22, 2021
d78f4f7
docs: add hadasarik as a contributor for code (#290)
allcontributors[bot] Dec 22, 2021
d386643
docs: add kareef928 as a contributor for code (#291)
allcontributors[bot] Dec 22, 2021
06c3ed3
docs: add Verathagnus as a contributor for code (#292)
allcontributors[bot] Dec 22, 2021
e633dca
docs: add dlee0156 as a contributor for code (#293)
allcontributors[bot] Dec 22, 2021
7e89518
docs: add najmieh as a contributor for doc (#294)
allcontributors[bot] Dec 22, 2021
8cd054b
docs: add TacticalFallacy as a contributor for code (#295)
allcontributors[bot] Dec 22, 2021
5b1d4ec
docs: add darsh-patel as a contributor for code (#296)
allcontributors[bot] Dec 22, 2021
3a2c44e
docs: add zdbzdb123123 as a contributor for code (#297)
allcontributors[bot] Dec 22, 2021
2aca663
Bump ipython from 7.19.0 to 7.31.1 in /docs (#299)
dependabot[bot] Jan 28, 2022
f743337
Copy SciPy private `_contains_nan` function (#304)
bdpedigo Feb 7, 2022
2151811
release v0.3.0
sampan501 Feb 10, 2022
8faa51a
run jobs in parallel
sampan501 Feb 10, 2022
a0346cc
release v0.3.0
sampan501 Feb 10, 2022
1d0c1dd
add autograd as a dependency
sampan501 Feb 10, 2022
4a7ea32
release v0.3.1
sampan501 Feb 10, 2022
773c533
remove emojis so windows can build package
sampan501 Feb 10, 2022
9e0b31b
release v0.3.2
sampan501 Feb 10, 2022
96ce8bc
run pytest in parallel (#310)
sampan501 Feb 22, 2022
bd33fc7
add skip decorator for two sample circleci tests (#321)
sampan501 May 5, 2022
058b4cd
update sphinx to fix jinja error (#319)
sampan501 May 5, 2022
6abb306
Refactor kgof module (#318)
darsh-patel May 11, 2022
a956de6
FCIT (#315)
MatthewZhao26 May 13, 2022
32a06d0
KCI Dev Pull Request (#317)
zdbzdb123123 May 16, 2022
1760db8
Fast HHG 2-Sample Test (#314)
TacticalFallacy May 16, 2022
8c28ff6
fix typos (#327)
oakla Aug 16, 2022
9aa5c56
Stat ranges to docs (#331)
oakla Aug 30, 2022
29dda59
DOC remove extra comma in dcorr docstring (#335)
harsh204016 Oct 13, 2022
8dd4090
don't normalize kmerf importances
sampan501 Oct 26, 2022
f63e352
make k-sample error more clear
sampan501 Oct 26, 2022
ef323f3
MANOVA rank errors if circle used here
sampan501 Oct 28, 2022
8f9f437
update netlify image
sampan501 Oct 30, 2022
f0a9651
make docs clear about FR corrected stat (#336)
sampan501 Oct 30, 2022
8bf37e3
fix two typos in docs (#337)
sampan501 Oct 30, 2022
d045018
Fix typo in distance covariance equation (#340)
j1c Dec 12, 2022
bdd4710
start
j1c Jan 19, 2023
0265539
update sims
j1c Jan 19, 2023
9ad69e9
Update kernel estimation
j1c Jan 20, 2023
57b93e4
Update and fix computation
j1c Jan 20, 2023
35096fd
Add permutation test
j1c Jan 23, 2023
ec43bc7
Add input checks
j1c Jan 23, 2023
9847c9d
Fix errors
j1c Jan 23, 2023
246735a
Separate cdcorr and cdcov
j1c Jan 23, 2023
c752837
combine cdcov and cdcorr
j1c Jan 23, 2023
f1720bb
Update init
j1c Jan 23, 2023
8e34be2
Add docstrings
j1c Jan 24, 2023
d611c69
Add more options
j1c Jan 24, 2023
71fb83f
update docstring
j1c Jan 24, 2023
455b09b
Add docstrings, update random_state
j1c Jan 24, 2023
e82b31c
Add more sims
j1c Jan 24, 2023
8ae2d2a
Update simulations
j1c Jan 26, 2023
79b1d8f
Fix statistics bug
j1c Jan 26, 2023
2dfdb8d
Fix bugs in sims
j1c Jan 26, 2023
155e392
Update rand to random
j1c Jan 26, 2023
c47289c
Add more checks and features
j1c Jan 26, 2023
42c8f3b
Add conditional tests to power function
j1c Jan 26, 2023
3843d8c
Add script
j1c Jan 26, 2023
76d792d
Fix error in sim
j1c Jan 26, 2023
3d7f692
Add partial correlation
j1c Jan 26, 2023
17df81f
Add partial correlation
j1c Jan 26, 2023
549b983
Update
j1c Jan 26, 2023
9c34f5d
Update
j1c Jan 26, 2023
d865c72
Add pdcorr
j1c Jan 31, 2023
e42daf1
Fix error
j1c Jan 31, 2023
f02696e
Fix indepedent lognormal sim
j1c Jan 31, 2023
da019b6
Update lognormal sim
j1c Jan 31, 2023
6c4f27c
Deal with zero variance
j1c Jan 31, 2023
020d9a9
Tests for utils
j1c Feb 3, 2023
8c84cfc
add cdcorr tests
j1c Feb 3, 2023
3b6374e
Update
j1c Feb 6, 2023
5c647f7
Fix errors
j1c Feb 6, 2023
035f055
Update covariance
j1c Feb 16, 2023
e31486b
Fix simultation
j1c Feb 16, 2023
7a8d04b
Update docstrings
j1c Feb 16, 2023
18359b9
Fix docstring warnings
j1c Mar 8, 2023
9e40dd3
Partial Correlation tests
j1c Mar 8, 2023
9e6cf15
Update cdcorr test
j1c Mar 8, 2023
284db55
Fix errors
j1c Mar 8, 2023
25accb6
PDcorr tests
j1c Mar 8, 2023
c043d65
Fix tests
j1c Mar 8, 2023
dcdefcd
Update test
j1c Mar 8, 2023
0c0a28a
Fix imports
j1c Mar 8, 2023
6907d64
Update docstrings and fix RF error
j1c Mar 8, 2023
5801783
Merge branch 'main' into cdcorr
sampan501 Dec 27, 2023
e846d93
fix tests failing
sampan501 Dec 27, 2023
a6bcfb8
Update conditional independence simulation tests
j1c Jan 9, 2024
dba8243
Add no pragma for cov/corr computation
j1c Jan 10, 2024
957b0d6
Add more tests
j1c Jan 10, 2024
161e436
More unit tests and black
j1c Jan 11, 2024
83087b0
add support for constant input z for cdcorr
j1c Jan 11, 2024
3 changes: 1 addition & 2 deletions README.md
@@ -140,6 +140,5 @@ Contributions of any kind are welcome!
 ## Project History
 
 hyppo is a rebranding of mgcpy, which was founded in November 2018.
-mgcpy was designed and written by Satish Palaniappan, Sambit
-Panda, Junhao Xiong, Sandhya Ramachandran, and Ronak Mehtra. hyppo
+mgcpy was designed and written by @tpsatish95, @sampan501, @junhaobearxiong, @sundaysundya, @ananyas713, and @ronakdm. hyppo
 was designed and written by Sambit Panda and first released in February 2020.
132 changes: 132 additions & 0 deletions benchmarks/condi_indep_power_sampsize.py
@@ -0,0 +1,132 @@
"""
1D Conditional Independence Testing Power vs. Sample Size
=========================================================

Here, we show finite testing power comparisons between the various conditional
independence tests within hyppo. For a test to be consistent, we would expect power to
converge to 1 as sample size increases. Tests that converge to 1 quicker have higher
finite testing power and are likely better to use for your use case. The simulation in
the bottom right is used to check that each test properly controls type I error: its
power should stay at the alpha level (0.05), and a test that fails this check is
invalid.
"""

import os
import sys

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from hyppo.conditional import COND_INDEP_TESTS
from hyppo.tools import COND_SIMULATIONS, power
from joblib import Parallel, delayed

sys.path.append(os.path.realpath(".."))

# make plots look pretty
sns.set(color_codes=True, style="white", context="talk", font_scale=2)
PALETTE = sns.color_palette("Set1")
sns.set_palette(PALETTE[1:])

# constants
MAX_SAMPLE_SIZE = 100
STEP_SIZE = 5
SAMP_SIZES = range(5, MAX_SAMPLE_SIZE + STEP_SIZE, STEP_SIZE)
POWER_REPS = 5

# simulation titles
SIM_TITLES = [k for k, v in COND_SIMULATIONS.items()]

# these tests only make sense for > 1 dimension data
remove = ["fcit", "kci"]
COND_INDEP_TESTS = {k: v for k, v in COND_INDEP_TESTS.items() if k not in remove}


def estimate_power(sim, test, auto=False):
    """Compute the mean of the estimated power of 5 replications over sample sizes."""
    if test == "MaxMargin":
        test = ["MaxMargin", "Dcorr"]
    est_power = np.array(
        [
            np.mean(
                [
                    power(test, pow_type="indep", sim=sim, n=i, p=1, auto=auto)
                    for _ in range(POWER_REPS)
                ]
            )
            for i in SAMP_SIZES
        ]
    )
    np.savetxt(
        "../benchmarks/conditional_vs_samplesize/{}_{}.csv".format(sim, test),
        est_power,
        delimiter=",",
    )

    return est_power


# At this point, we would run this bit of code to generate the data for the figure and
# store it under the "conditional_vs_samplesize" directory. Since this code takes a very
# long time to run, we have commented out these lines of code. If you would like to
# generate the data, uncomment these lines and run the file.

# outputs = Parallel(n_jobs=-1, verbose=100)(
#     [
#         delayed(estimate_power)(sim_name, test)
#         for sim_name in COND_SIMULATIONS.keys()
#         for test in COND_INDEP_TESTS.keys()
#     ]
# )


def plot_power():
    fig, ax = plt.subplots(nrows=4, ncols=5, figsize=(25, 20))
    plt.suptitle(
        "1D Conditional Independence Testing (Increasing Sample Size)",
        y=0.93,
        va="baseline",
    )

    for i, row in enumerate(ax):
        for j, col in enumerate(row):
            count = 5 * i + j
            sim = list(COND_SIMULATIONS.keys())[count]

            for test in COND_INDEP_TESTS.keys():
                est_power = np.genfromtxt(
                    "../benchmarks/conditional_vs_samplesize/{}_{}.csv".format(sim, test),
                    delimiter=",",
                )

                col.plot(
                    SAMP_SIZES, est_power, label=COND_INDEP_TESTS[test].__name__, lw=2
                )
                col.set_xticks([])
                if i == 3:
                    col.set_xticks([SAMP_SIZES[0], SAMP_SIZES[-1]])
                col.set_ylim(-0.05, 1.05)
                col.set_yticks([])
                if j == 0:
                    col.set_yticks([0, 1])
                col.set_title(SIM_TITLES[count])

    fig.text(0.5, 0.05, "Sample Size", ha="center")
    fig.text(
        0.07,
        0.5,
        "Statistical Power",
        va="center",
        rotation="vertical",
    )
    leg = plt.legend(
        bbox_to_anchor=(0.5, 0.05),
        bbox_transform=plt.gcf().transFigure,
        ncol=len(COND_INDEP_TESTS.keys()),
        loc="upper center",
    )
    leg.get_frame().set_linewidth(0.0)
    for legobj in leg.legendHandles:
        legobj.set_linewidth(5.0)
    plt.subplots_adjust(hspace=0.50)


# plot the power
plot_power()
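Statistical power is the probability that a test rejects the null at level alpha, and a benchmark like the one in this file estimates it by simulation over growing sample sizes. A minimal self-contained sketch of that Monte Carlo procedure follows, assuming a classical Fisher z test on the partial correlation as a stand-in test; it is not hyppo's `PartialCorr` or its `power` function. `partial_corr_pvalue`, `simulate`, and `mc_power` are hypothetical helpers, and the two simulation settings (a direct x-to-y link versus x and y independent given z) are illustrative choices.

```python
import math

import numpy as np


def partial_corr_pvalue(x, y, z):
    """Two-sided p-value for the partial correlation of x and y given z.

    x, y have shape (n,); z has shape (n, k). Residualizes x and y on z by least
    squares, then applies the Fisher z-transform to the residual correlation.
    """
    n, k = z.shape
    design = np.column_stack([np.ones(n), z])  # intercept + conditioning variables
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    # Clip so that atanh stays finite
    r = float(np.clip(np.corrcoef(rx, ry)[0, 1], -0.999999, 0.999999))
    stat = math.atanh(r) * math.sqrt(n - k - 3)
    # 2 * (1 - Phi(|stat|)) written with the complementary error function
    return math.erfc(abs(stat) / math.sqrt(2))


def simulate(n, dependent, rng):
    """x and y both depend on z; if dependent, y also depends directly on x."""
    z = rng.standard_normal((n, 1))
    x = z[:, 0] + rng.standard_normal(n)
    y = z[:, 0] + rng.standard_normal(n)
    if dependent:
        y = y + x
    return x, y, z


def mc_power(n, dependent, reps=200, alpha=0.05, seed=0):
    """Fraction of `reps` simulated datasets on which the test rejects at `alpha`."""
    rng = np.random.default_rng(seed)
    rejections = sum(
        partial_corr_pvalue(*simulate(n, dependent, rng)) < alpha for _ in range(reps)
    )
    return rejections / reps


sample_sizes = [10, 20, 50, 100]
alt_power = [mc_power(n, dependent=True) for n in sample_sizes]
null_power = [mc_power(n, dependent=False) for n in sample_sizes]
print("dependent:", alt_power)
print("conditionally independent:", null_power)
```

Under the conditionally independent setting the rejection rate should hover near alpha at every sample size, which is the type I error check the docstring describes; under the dependent setting it should climb toward 1.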
17 changes: 17 additions & 0 deletions docs/changelog/v0.3.3.md
@@ -0,0 +1,17 @@
# hyppo v0.3.3

> **Note:** hyppo v0.3.3 has not been released yet!

This is a minor release for bug fixes, documentation improvements, and general package maintenance.

## Authors

<a href="https://github.com/sampan501">
<img src="https://github.com/sampan501.png" width="50">
</a>

## Issues Closed

## PRs Merged

* [#310](https://github.com/neurodata/hyppo/pull/310): run pytest in parallel
12 changes: 10 additions & 2 deletions hyppo/conditional/__init__.py
@@ -1,7 +1,15 @@
+from .cdcorr import ConditionalDcorr
 from .FCIT import FCIT
 from .kci import KCI
-
+from .pcorr import PartialCorr
+from .pdcorr import PartialDcorr
 
 __all__ = [s for s in dir()]  # add imported tests to __all__
 
-COND_INDEP_TESTS = {"fcit": FCIT, "kci": KCI}
+COND_INDEP_TESTS = {
+    "fcit": FCIT,
+    "kci": KCI,
+    "conditionaldcorr": ConditionalDcorr,
+    "partialcorr": PartialCorr,
+    "partialdcorr": PartialDcorr,
+}
95 changes: 95 additions & 0 deletions hyppo/conditional/_utils.py
@@ -0,0 +1,95 @@
import numpy as np

from ..tools import check_ndarray_xyz, check_reps, contains_nan, convert_xyz_float64


class _CheckInputs:
    """Checks inputs for all conditional independence tests"""

    def __init__(self, x, y, z, reps=None, max_dims=None, ignore_z_var=False):
        self.x = x
        self.y = y
        self.z = z
        self.reps = reps
        self.max_dims = max_dims
        self.ignore_z_var = ignore_z_var  # to allow for constant z input

    def __call__(self):
        check_ndarray_xyz(self.x, self.y, self.z)
        contains_nan(self.x)
        contains_nan(self.y)
        contains_nan(self.z)
        self.x, self.y, self.z = self.check_dim_xyz(max_dims=self.max_dims)
        self.x, self.y, self.z = convert_xyz_float64(self.x, self.y, self.z)
        self._check_min_samples()
        self._check_variance()

        if self.reps:
            check_reps(self.reps)

        return self.x, self.y, self.z

    def check_dim_xyz(self, max_dims):
        """Check and convert x, y, and z to proper dimensions"""
        if self.x.ndim == 1:
            self.x = self.x[:, np.newaxis]
        elif self.x.ndim != 2:
            raise ValueError(
                "Expected a 2-D array `x`, found shape " "{}".format(self.x.shape)
            )
        if self.y.ndim == 1:
            self.y = self.y[:, np.newaxis]
        elif self.y.ndim != 2:
            raise ValueError(
                "Expected a 2-D array `y`, found shape " "{}".format(self.y.shape)
            )
        if self.z.ndim == 1:
            self.z = self.z[:, np.newaxis]
        elif self.z.ndim != 2:
            raise ValueError(
                "Expected a 2-D array `z`, found shape " "{}".format(self.z.shape)
            )

        if max_dims is not None:
            _, dx = self.x.shape
            _, dy = self.y.shape
            _, dz = self.z.shape

            if np.any(np.array([dx, dy, dz]) > max_dims):
                raise ValueError(
                    f"x, y, z must be univariate and have shape [n,{max_dims}]"
                )

        self._check_nd_indeptest()

        return self.x, self.y, self.z

    def _check_nd_indeptest(self):
        """Check if number of samples is the same"""
        nx, _ = self.x.shape
        ny, _ = self.y.shape
        nz, _ = self.z.shape
        if not np.all(np.array([nx, ny, nz]) == nx):
            raise ValueError(
                "Shape mismatch, x, y and z must have shape "
                + "[n, p], [n, q] and [n, r]."
            )

    def _check_min_samples(self):
        """Check that the number of samples is greater than 3"""
        nx = self.x.shape[0]
        ny = self.y.shape[0]
        nz = self.z.shape[0]

        if nx <= 3 or ny <= 3 or nz <= 3:
            raise ValueError("Number of samples is too low")

    def _check_variance(self):
        if np.var(self.x) == 0:
            # or np.var(self.y) == 0 or np.var(self.z) == 0:
            raise ValueError("Test cannot be run. Input array x has 0 variance.")
        if np.var(self.y) == 0:
            raise ValueError("Test cannot be run. Input array y has 0 variance")
        if not self.ignore_z_var:
            if np.var(self.z) == 0:
                raise ValueError("Test cannot be run. Input array z has 0 variance")
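`_CheckInputs` validates in a fixed order: NaN checks, reshaping of 1-D inputs to column vectors, an optional column cap, a sample-count match, a minimum sample size, and nonzero variance, with `ignore_z_var` allowing a constant `z`. A standalone NumPy sketch of that behavior is handy for seeing which inputs pass. `check_xyz` is a hypothetical helper, not part of hyppo; unlike `check_ndarray_xyz` it accepts any array-like, and it converts to float64 first (the class converts after the dimension checks) so the NaN check works on integer input.

```python
import numpy as np


def check_xyz(x, y, z, max_dims=None, ignore_z_var=False):
    """Sketch of the validation performed by _CheckInputs (hypothetical helper)."""
    checked = []
    for name, arr in (("x", x), ("y", y), ("z", z)):
        arr = np.asarray(arr, dtype=np.float64)
        if np.isnan(arr).any():
            raise ValueError(f"Input `{name}` contains NaN values")
        if arr.ndim == 1:
            arr = arr[:, np.newaxis]  # promote a 1-D input to shape (n, 1)
        elif arr.ndim != 2:
            raise ValueError(f"Expected a 2-D array `{name}`, found shape {arr.shape}")
        checked.append(arr)
    x, y, z = checked

    # Optional cap on the number of columns (e.g. max_dims=1 for univariate tests)
    if max_dims is not None and any(a.shape[1] > max_dims for a in checked):
        raise ValueError(f"x, y, z must have at most {max_dims} column(s)")
    # All inputs must share the same number of samples n
    if not x.shape[0] == y.shape[0] == z.shape[0]:
        raise ValueError(
            "Shape mismatch, x, y and z must have shape [n, p], [n, q] and [n, r]."
        )
    # Same threshold as the diff: n <= 3 is rejected
    if x.shape[0] <= 3:
        raise ValueError("Number of samples is too low")
    # x and y must vary; a constant z is allowed only when ignore_z_var=True
    for name, arr in (("x", x), ("y", y)):
        if np.var(arr) == 0:
            raise ValueError(f"Test cannot be run. Input array {name} has 0 variance.")
    if not ignore_z_var and np.var(z) == 0:
        raise ValueError("Test cannot be run. Input array z has 0 variance.")
    return x, y, z


x, y, z = check_xyz(np.arange(10.0), np.arange(10.0) ** 2, np.ones(10), ignore_z_var=True)
print(x.shape, z.shape)  # (10, 1) (10, 1)
```

Passing the same constant `z` without `ignore_z_var=True` raises a `ValueError`, which mirrors the constant-`z` support added for `ConditionalDcorr` in this PR.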
6 changes: 5 additions & 1 deletion hyppo/conditional/base.py
@@ -13,7 +13,11 @@ class ConditionalIndependenceTest(ABC):
 
     """
 
-    def __init__(self):
+    def __init__(self, **kwargs):
+        self.stat = None
+        self.pvalue = None
+        self.kwargs = kwargs
+
         super().__init__()
 
     @abstractmethod
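The diff only shows the base class initializing `stat` and `pvalue` and storing extra `**kwargs` on the instance; how the subclasses in this PR read `self.kwargs` is not shown. A minimal sketch of the pattern follows, assuming subclasses consume their own named options and forward the rest to the base. `CondTestBase`, `ToyTest`, `statistic`, `scale`, and `extra_option` are hypothetical names for illustration, not hyppo's API.

```python
from abc import ABC, abstractmethod


class CondTestBase(ABC):
    """Stand-in for the base class as changed in this diff (hypothetical name)."""

    def __init__(self, **kwargs):
        self.stat = None
        self.pvalue = None
        self.kwargs = kwargs  # extra keyword options, kept for subclasses to consult
        super().__init__()

    @abstractmethod
    def statistic(self, x, y, z):
        """Compute the test statistic (abstract placeholder)."""


class ToyTest(CondTestBase):
    """Hypothetical subclass: consumes its own option and forwards the rest."""

    def __init__(self, scale=1.0, **kwargs):
        self.scale = scale
        super().__init__(**kwargs)

    def statistic(self, x, y, z):
        # Toy statistic for illustration only: a scaled inner product of x and y
        self.stat = self.scale * sum(a * b for a, b in zip(x, y))
        return self.stat


test = ToyTest(scale=2.0, extra_option="demo")
print(test.kwargs)  # {'extra_option': 'demo'}
print(test.statistic([1, 2], [3, 4], None))  # 22.0
```

Because the base class is abstract, instantiating `CondTestBase` directly raises `TypeError`; only concrete subclasses that implement the abstract methods can be created.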