Commit: 4 changed files with 282 additions and 32 deletions.
# Independence Testing for Multivariate Time Series

Code accompanying the publication: [Independence Testing for Multivariate Time Series](https://arxiv.org/abs/1908.06486).

## Abstract

Complex data structures such as time series are increasingly prevalent in modern data science problems. A fundamental question is whether two such time series have a statistically significant relationship. Many current approaches rely on making parametric assumptions about the random processes, detect only linear associations, require multiple tests, or sacrifice power in high-dimensional and nonlinear settings. The distribution of any test statistic under the null hypothesis is challenging to estimate, as the standard permutation test is typically invalid. This study combines distance correlation (Dcorr) and multiscale graph correlation (MGC) from the independence-testing literature with block permutation from time-series analysis to address these challenges. The proposed nonparametric procedure is asymptotically valid, consistent for dependence testing under stationary time series, and able to estimate the optimal lag that maximizes the dependence. It eliminates the need for multiple testing and exhibits superior power in high-dimensional, low-sample-size, and nonlinear settings. The analysis of neural connectivity with fMRI data reveals a linear dependence of signals within the visual network and default mode network and nonlinear relationships in other networks. This work provides a primary data analysis tool with open-source code, impacting a wide range of scientific disciplines.

## Repo Structure

## Guide to using the repository

- Navigate to a directory where you want to store the project, and clone this repo:

```
git clone https://github.com/neurodata/bilateral-connectome
```
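The abstract above pairs a distance-correlation statistic with block permutation of one series. A minimal NumPy-only sketch of that idea (the helper names, `block_size`, and `reps` are illustrative, not this repository's API):

```python
import numpy as np

def dcorr(x, y):
    """Sample distance correlation between two 1-D series."""
    def doubly_centered(a):
        d = np.abs(a[:, None] - a[None, :])  # pairwise distance matrix
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A = doubly_centered(np.asarray(x, dtype=float))
    B = doubly_centered(np.asarray(y, dtype=float))
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return (A * B).mean() / denom if denom > 0 else 0.0

def block_permutation_test(x, y, block_size=10, reps=500, seed=None):
    """P-value for independence of two series via block permutation of x."""
    rng = np.random.default_rng(seed)
    n = len(x)
    observed = dcorr(x, y)
    starts = np.arange(0, n, block_size)
    null_stats = np.empty(reps)
    for i in range(reps):
        # shuffle whole contiguous blocks, preserving short-range autocorrelation
        order = rng.permutation(len(starts))
        idx = np.concatenate(
            [np.arange(s, min(s + block_size, n)) for s in starts[order]]
        )
        null_stats[i] = dcorr(x[idx], y)
    pvalue = (1 + np.sum(null_stats >= observed)) / (1 + reps)
    return observed, pvalue
```

For identical series the statistic is 1 and the p-value is small; for independent noise the test should not reject. The block shuffle, rather than an elementwise shuffle, is what keeps the null distribution honest for autocorrelated data.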
---
# Abstract

Complex data structures such as time series are increasingly prevalent in modern data science problems. A fundamental question is whether two such time series have a statistically significant relationship. Many current approaches rely on making parametric assumptions about the random processes, detect only linear associations, require multiple tests, or sacrifice power in high-dimensional and nonlinear settings. The distribution of any test statistic under the null hypothesis is challenging to estimate, as the standard permutation test is typically invalid. This study combines distance correlation (Dcorr) and multiscale graph correlation (MGC) from the independence-testing literature with block permutation from time-series analysis to address these challenges. The proposed nonparametric procedure is asymptotically valid, consistent for dependence testing under stationary time series, and able to estimate the optimal lag that maximizes the dependence. It eliminates the need for multiple testing and exhibits superior power in high-dimensional, low-sample-size, and nonlinear settings. The analysis of neural connectivity with fMRI data reveals a linear dependence of signals within the visual network and default mode network and nonlinear relationships in other networks. This work provides a primary data analysis tool with open-source code, impacting a wide range of scientific disciplines.
---
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "746578a4-88dd-4cec-b437-d6f1d0865ef6",
   "metadata": {},
   "source": [
    "# Generating data for analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f12ed4de-a7b6-4090-9eea-13fefdce9d1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy as sp\n",
    "import scipy.io  # explicit import; `import scipy` alone does not load scipy.io on older SciPy\n",
    "from pathlib import Path\n",
    "from hyppo.tools import ts_sim\n",
    "\n",
    "# time-series simulation settings used across the experiments below\n",
    "TS_SIMS = [\n",
    "    \"indep_ar\",\n",
    "    \"cross_corr_ar\",\n",
    "    \"nonlinear_process\",\n",
    "    \"extinct_gaussian_process\",\n",
    "]\n",
    "\n",
    "# output directory for the generated .mat files\n",
    "p = \"./data/\"\n",
    "Path(p).mkdir(exist_ok=True)  # ensure the directory exists before saving"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91de8db0-7809-4dd8-89a3-d3b7c098aab6",
   "metadata": {},
   "source": [
    "## Generate Experiment 1 - Independent AR(1) with increasing sample size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "03d96d14-c068-471b-b7c1-80e17621f73e",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"1-independent_ar_n\"\n",
    "\n",
    "n = 200\n",
    "reps = 300\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "datas = [ts_sim(\"indep_ar\", n) for _ in range(reps)]\n",
    "\n",
    "X = np.hstack([data[0] for data in datas])\n",
    "Y = np.hstack([data[1] for data in datas])\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ba61b56-eecf-4ea2-9ea1-cf58b704257d",
   "metadata": {},
   "source": [
    "## Generate Experiment 2 - Independent AR(1) with increasing phi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d6a02659-109a-451f-9d06-773e0726e9b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"2-independent_ar_phi\"\n",
    "\n",
    "n = 1200\n",
    "reps = 300\n",
    "phis = np.arange(0.2, 1, 0.025)\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "Xs = []\n",
    "Ys = []\n",
    "\n",
    "for phi in phis:\n",
    "    datas = [ts_sim(\"indep_ar\", n, phi=float(phi)) for _ in range(reps)]\n",
    "    Xs.append(np.hstack([data[0] for data in datas]))\n",
    "    Ys.append(np.hstack([data[1] for data in datas]))\n",
    "\n",
    "\n",
    "X = np.stack(Xs)\n",
    "Y = np.stack(Ys)\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "    'phi': phis\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d804f8d-a21c-4c3d-a5d0-dc918ebea269",
   "metadata": {},
   "source": [
    "## Generate Experiment 3 - Linear cross correlated AR(1) with increasing sample size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4cc407eb-14a6-4441-b8da-32afca1f6465",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"3-linear_ar\"\n",
    "\n",
    "n = 200\n",
    "reps = 300\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "datas = [ts_sim(\"cross_corr_ar\", n) for _ in range(reps)]\n",
    "\n",
    "X = np.hstack([data[0] for data in datas])\n",
    "Y = np.hstack([data[1] for data in datas])\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a46bc6e9-a0dc-458a-bb24-e0453e1ab000",
   "metadata": {},
   "source": [
    "## Generate Experiment 4 - Non-linearly cross correlated AR(1) with increasing sample size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4c232a9d-5c05-4ed9-8a51-1daef8efb0f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"4-nonlinear_ar\"\n",
    "\n",
    "n = 200\n",
    "reps = 300\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "datas = [ts_sim(\"nonlinear_process\", n) for _ in range(reps)]\n",
    "\n",
    "X = np.hstack([data[0] for data in datas])\n",
    "Y = np.hstack([data[1] for data in datas])\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d60065b-fc9e-4dbf-958c-137558de83fb",
   "metadata": {},
   "source": [
    "## Generate Experiment 5 - Extinct Gaussian process with increasing sample size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "32aaef0f-8d47-4b8c-9a29-e1a749996063",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"5-extinct_gaussian\"\n",
    "\n",
    "n = 200\n",
    "reps = 300\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "datas = [ts_sim(\"extinct_gaussian_process\", n) for _ in range(reps)]\n",
    "\n",
    "X = np.hstack([data[0] for data in datas])\n",
    "Y = np.hstack([data[1] for data in datas])\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98cf8254-3a4e-4421-a272-5a0eb81d2c37",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate Experiment 6 - optimal lag estimation\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
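Every experiment in the notebook is written out with `scipy.io.savemat` under compression and is presumably read back elsewhere (e.g. from MATLAB or another script). A quick round-trip sketch of that save format, with illustrative shapes and file name standing in for one experiment's stacked replicates:

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# stand-ins for one experiment's output: rows are time points, columns are replicates
X = np.random.default_rng(1).normal(size=(200, 300))
Y = np.random.default_rng(2).normal(size=(200, 300))

# same call pattern the notebook uses: a dict of named arrays, compressed
path = os.path.join(tempfile.mkdtemp(), "1-independent_ar_n.mat")
savemat(path, {"X": X, "Y": Y}, do_compression=True)

# reading it back recovers the arrays exactly (compression is lossless)
loaded = loadmat(path)
print(loaded["X"].shape, np.allclose(loaded["X"], X))  # prints: (200, 300) True
```

`loadmat` also injects MATLAB header keys (`__header__`, `__version__`, `__globals__`) into the returned dict, so downstream code should index by variable name rather than iterate over all keys.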