Commit: 4 changed files with 282 additions and 32 deletions.
# Independence Testing for Multivariate Time Series

Code accompanying the publication: [Independence Testing for Multivariate Time Series](https://arxiv.org/abs/1908.06486).

## Abstract

Complex data structures such as time series are increasingly prevalent in modern data science problems. A fundamental question is whether two such time series have a statistically significant relationship. Many current approaches rely on making parametric assumptions about the random processes, detect only linear associations, require multiple tests, or sacrifice power in high-dimensional and nonlinear settings. The distribution of any test statistic under the null hypothesis is challenging to estimate, as the standard permutation test is typically invalid. This study combines distance correlation (Dcorr) and multiscale graph correlation (MGC) from the independence-testing literature with block permutation from time-series analysis to address these challenges. The proposed nonparametric procedure is asymptotically valid, consistent for dependence testing under stationary time series, and able to estimate the optimal lag that maximizes the dependence. It eliminates the need for multiple testing and exhibits superior power in high-dimensional, low-sample-size, and nonlinear settings. The analysis of neural connectivity with fMRI data reveals a linear dependence of signals within the visual network and default mode network and nonlinear relationships in other networks. This work provides a primary data analysis tool with open-source code, impacting a wide range of scientific disciplines.

## Repo Structure

## Guide to using the repository

- Navigate to a directory where you want to store the project, and clone this repo:

```
git clone https://github.com/neurodata/bilateral-connectome
```
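The abstract above pairs a distance-correlation statistic with block permutation of one series. A minimal NumPy-only sketch of that idea (the helper names, `block_size`, and `reps` are illustrative, not this repository's API):

```python
import numpy as np

def dcorr(x, y):
    """Sample distance correlation between two 1-D series."""
    def doubly_centered(a):
        d = np.abs(a[:, None] - a[None, :])  # pairwise distance matrix
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A = doubly_centered(np.asarray(x, dtype=float))
    B = doubly_centered(np.asarray(y, dtype=float))
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return (A * B).mean() / denom if denom > 0 else 0.0

def block_permutation_test(x, y, block_size=10, reps=500, seed=None):
    """P-value for independence of two series via block permutation of x."""
    rng = np.random.default_rng(seed)
    n = len(x)
    observed = dcorr(x, y)
    starts = np.arange(0, n, block_size)
    null_stats = np.empty(reps)
    for i in range(reps):
        # shuffle whole contiguous blocks, preserving short-range autocorrelation
        order = rng.permutation(len(starts))
        idx = np.concatenate(
            [np.arange(s, min(s + block_size, n)) for s in starts[order]]
        )
        null_stats[i] = dcorr(x[idx], y)
    pvalue = (1 + np.sum(null_stats >= observed)) / (1 + reps)
    return observed, pvalue
```

For identical series the statistic is 1 and the p-value is small; for independent noise the test should not reject. The block shuffle, rather than an elementwise shuffle, is what keeps the null distribution honest for autocorrelated data.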
---
# Abstract

Complex data structures such as time series are increasingly prevalent in modern data science problems. A fundamental question is whether two such time series have a statistically significant relationship. Many current approaches rely on making parametric assumptions about the random processes, detect only linear associations, require multiple tests, or sacrifice power in high-dimensional and nonlinear settings. The distribution of any test statistic under the null hypothesis is challenging to estimate, as the standard permutation test is typically invalid. This study combines distance correlation (Dcorr) and multiscale graph correlation (MGC) from the independence-testing literature with block permutation from time-series analysis to address these challenges. The proposed nonparametric procedure is asymptotically valid, consistent for dependence testing under stationary time series, and able to estimate the optimal lag that maximizes the dependence. It eliminates the need for multiple testing and exhibits superior power in high-dimensional, low-sample-size, and nonlinear settings. The analysis of neural connectivity with fMRI data reveals a linear dependence of signals within the visual network and default mode network and nonlinear relationships in other networks. This work provides a primary data analysis tool with open-source code, impacting a wide range of scientific disciplines.
---
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "746578a4-88dd-4cec-b437-d6f1d0865ef6",
   "metadata": {},
   "source": [
    "# Generating data for analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f12ed4de-a7b6-4090-9eea-13fefdce9d1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy as sp\n",
    "import scipy.io  # explicit import; `import scipy` alone does not load scipy.io on older SciPy\n",
    "from pathlib import Path\n",
    "from hyppo.tools import ts_sim\n",
    "\n",
    "# time-series simulation settings used across the experiments below\n",
    "TS_SIMS = [\n",
    "    \"indep_ar\",\n",
    "    \"cross_corr_ar\",\n",
    "    \"nonlinear_process\",\n",
    "    \"extinct_gaussian_process\",\n",
    "]\n",
    "\n",
    "# output directory for the generated .mat files\n",
    "p = \"./data/\"\n",
    "Path(p).mkdir(exist_ok=True)  # ensure the directory exists before saving"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91de8db0-7809-4dd8-89a3-d3b7c098aab6",
   "metadata": {},
   "source": [
    "## Generate Experiment 1 - Independent AR(1) with increasing sample size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "03d96d14-c068-471b-b7c1-80e17621f73e",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"1-independent_ar_n\"\n",
    "\n",
    "n = 200\n",
    "reps = 300\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "datas = [ts_sim(\"indep_ar\", n) for _ in range(reps)]\n",
    "\n",
    "X = np.hstack([data[0] for data in datas])\n",
    "Y = np.hstack([data[1] for data in datas])\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ba61b56-eecf-4ea2-9ea1-cf58b704257d",
   "metadata": {},
   "source": [
    "## Generate Experiment 2 - Independent AR(1) with increasing phi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d6a02659-109a-451f-9d06-773e0726e9b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"2-independent_ar_phi\"\n",
    "\n",
    "n = 1200\n",
    "reps = 300\n",
    "phis = np.arange(0.2, 1, 0.025)\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "Xs = []\n",
    "Ys = []\n",
    "\n",
    "for phi in phis:\n",
    "    datas = [ts_sim(\"indep_ar\", n, phi=float(phi)) for _ in range(reps)]\n",
    "    Xs.append(np.hstack([data[0] for data in datas]))\n",
    "    Ys.append(np.hstack([data[1] for data in datas]))\n",
    "\n",
    "\n",
    "X = np.stack(Xs)\n",
    "Y = np.stack(Ys)\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "    'phi': phis\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d804f8d-a21c-4c3d-a5d0-dc918ebea269",
   "metadata": {},
   "source": [
    "## Generate Experiment 3 - Linear cross correlated AR(1) with increasing sample size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4cc407eb-14a6-4441-b8da-32afca1f6465",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"3-linear_ar\"\n",
    "\n",
    "n = 200\n",
    "reps = 300\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "datas = [ts_sim(\"cross_corr_ar\", n) for _ in range(reps)]\n",
    "\n",
    "X = np.hstack([data[0] for data in datas])\n",
    "Y = np.hstack([data[1] for data in datas])\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a46bc6e9-a0dc-458a-bb24-e0453e1ab000",
   "metadata": {},
   "source": [
    "## Generate Experiment 4 - Non-linearly cross correlated AR(1) with increasing sample size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4c232a9d-5c05-4ed9-8a51-1daef8efb0f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"4-nonlinear_ar\"\n",
    "\n",
    "n = 200\n",
    "reps = 300\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "datas = [ts_sim(\"nonlinear_process\", n) for _ in range(reps)]\n",
    "\n",
    "X = np.hstack([data[0] for data in datas])\n",
    "Y = np.hstack([data[1] for data in datas])\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d60065b-fc9e-4dbf-958c-137558de83fb",
   "metadata": {},
   "source": [
    "## Generate Experiment 5 - Extinct Gaussian process with increasing sample size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "32aaef0f-8d47-4b8c-9a29-e1a749996063",
   "metadata": {},
   "outputs": [],
   "source": [
    "fname = \"5-extinct_gaussian\"\n",
    "\n",
    "n = 200\n",
    "reps = 300\n",
    "\n",
    "np.random.seed(1)\n",
    "\n",
    "datas = [ts_sim(\"extinct_gaussian_process\", n) for _ in range(reps)]\n",
    "\n",
    "X = np.hstack([data[0] for data in datas])\n",
    "Y = np.hstack([data[1] for data in datas])\n",
    "\n",
    "savedict = {\n",
    "    'X' : X,\n",
    "    'Y' : Y,\n",
    "}\n",
    "\n",
    "# save to disk\n",
    "sp.io.savemat(f'{p}{fname}.mat', savedict, do_compression=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98cf8254-3a4e-4421-a272-5a0eb81d2c37",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate Experiment 6 - optimal lag estimation\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
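Every experiment in the notebook is written out with `scipy.io.savemat` under compression and is presumably read back elsewhere (e.g. from MATLAB or another script). A quick round-trip sketch of that save format, with illustrative shapes and file name standing in for one experiment's stacked replicates:

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# stand-ins for one experiment's output: rows are time points, columns are replicates
X = np.random.default_rng(1).normal(size=(200, 300))
Y = np.random.default_rng(2).normal(size=(200, 300))

# same call pattern the notebook uses: a dict of named arrays, compressed
path = os.path.join(tempfile.mkdtemp(), "1-independent_ar_n.mat")
savemat(path, {"X": X, "Y": Y}, do_compression=True)

# reading it back recovers the arrays exactly (compression is lossless)
loaded = loadmat(path)
print(loaded["X"].shape, np.allclose(loaded["X"], X))  # prints: (200, 300) True
```

`loadmat` also injects MATLAB header keys (`__header__`, `__version__`, `__globals__`) into the returned dict, so downstream code should index by variable name rather than iterate over all keys.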