
Analysis of the effect of different caffeine doses on quantitative sleep EEG utilizing statistics and machine learning in a data-driven manner.


PhilippThoelke/caffeine-sleep


Data-mining sleep brain signals using machine-learning: Exploring the effect of caffeine on EEG network dynamics

This repository contains the code for the analysis of sleep EEG recorded under two conditions. In the first condition, subjects ingested 200 mg of caffeine before going to sleep. In the second condition, they took a placebo pill, otherwise following the same procedure. The same dataset was previously analyzed by Drapeau et al. (2006) and Robillard et al. (2015) using traditional statistical analyses of several sleep-related variables. Here we take a data-driven approach, extending traditional statistics with machine-learning-based exploration of the data.

Installation

Clone this repository and install the dependencies from requirements.txt:

git clone git@github.com:PhilippThoelke/caffeine-sleep.git
cd caffeine-sleep
pip install -r requirements.txt

We used a modified version of MNE-Python (0.19) for some of the visualizations, which is automatically installed when using our requirements.txt file. The modifications made to the original code are available here.

Usage

Preprocessing

Scripts for running the preprocessing pipeline are located in the preprocessing directory.

  1. Run ExtractFeatures.py to extract features from the raw EEG. Before running, adjust the global variables at the top of the script to match your data and setup. The script can load data in two formats, selected by the SPLIT_STAGES variable: when set to True, it expects raw EEG and the corresponding hypnograms as .npy files; when set to False, it loads data that was previously split into sleep stages (ExtractRawSamples.py can be used to split the data into sleep stages without extracting features). In that case, the data must also be in .npy format and follow the naming scheme <subject-id>_<sleep-stage>_*.npy.
  2. Compute the differences in sample count between the awake (AWA) and wake after sleep onset (WASO/AWSL) stages using ComputeSampleDifferences.py. The script saves a file called sample_difference<caffeine-dose>.pickle, which is required for the next step.
  3. Run CombineFeatures.py to combine the extracted features from all subjects into a single file, normalize them, and average them across subjects. The resulting files, containing the averaged features, condition labels and subject labels, are called data_avg.pickle, labels_avg.pickle and groups_avg.pickle, respectively. These files are used in the analysis.
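As an illustration of the pre-split input format from step 1, the following sketch groups files that follow the <subject-id>_<sleep-stage>_*.npy naming scheme by subject and sleep stage. The helper name index_stage_files is hypothetical, and it assumes that neither the subject ID nor the stage name contains an underscore; ExtractFeatures.py may parse file names differently.

```python
from collections import defaultdict
from pathlib import Path


def index_stage_files(data_dir):
    """Group .npy files named <subject-id>_<sleep-stage>_*.npy by (subject, stage).

    Hypothetical helper: assumes subject IDs and stage names contain no underscores.
    """
    index = defaultdict(list)
    # Only files with at least two underscores and a .npy suffix match the scheme
    for path in sorted(Path(data_dir).glob("*_*_*.npy")):
        # "S01_N2_003" -> subject "S01", stage "N2"; the remainder is ignored
        subject, stage = path.stem.split("_", 2)[:2]
        index[(subject, stage)].append(path)
    return dict(index)
```

For example, index_stage_files("data")[("S01", "N2")] would return the sorted list of N2 files for a subject with ID S01 (both identifiers are made up for illustration).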
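Step 3 does not specify the normalization method or exactly how samples are averaged. The sketch below assumes a per-feature z-score across all samples, and it averages each subject's samples within a condition into a single row (one plausible reading, given that the output keeps a subject label per row). The function name, the input layout and both of these choices are assumptions, not necessarily what CombineFeatures.py does.

```python
import numpy as np


def normalize_and_average(features, labels, groups):
    """Z-score each feature column, then average rows per (subject, condition).

    features: (n_samples, n_features) array; labels: condition per row
    (e.g. 0 = placebo, 1 = caffeine); groups: subject ID per row.
    Returns averaged features plus matching condition and subject labels.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    groups = np.asarray(groups)

    # Z-score every feature across all samples (hypothetical normalization choice)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    z = (features - features.mean(axis=0)) / std

    data_avg, labels_avg, groups_avg = [], [], []
    for subject in np.unique(groups):
        for condition in np.unique(labels):
            mask = (groups == subject) & (labels == condition)
            if mask.any():  # skip subject/condition pairs without samples
                data_avg.append(z[mask].mean(axis=0))
                labels_avg.append(condition)
                groups_avg.append(subject)
    return np.array(data_avg), np.array(labels_avg), np.array(groups_avg)
```

The three returned arrays could then be written to data_avg.pickle, labels_avg.pickle and groups_avg.pickle with pickle.dump.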

Analysis

The analysis is split into three parts: statistics, single-feature machine learning and multi-feature machine learning. The corresponding files can be found in the statistics, singleFeatureML and multiFeatureML directories, respectively.

  1. Statistics: The Statistics.ipynb notebook contains the code for the statistical comparison of the caffeine and placebo conditions across all features. It runs permutation t-tests and then generates a figure summarizing the statistical results.

  2. Single-feature ML: For the single-feature, single-electrode analysis, run SingleFeatureML-Classifier.py to train and evaluate a machine-learning classifier on the previously extracted features. The classifier is selected through command-line arguments; run the script without arguments to see instructions. Final accuracy metrics are printed once training finishes, and a summary of the results is saved as a pickle file. Afterwards, use the SingleFeatureML-Figures.ipynb notebook to visualize and compare results across classifiers.

  3. Multi-feature ML: To train random forests on the complete multi-feature, multi-electrode data, run the MultiFeatureRF-Classifier.py script. By default, it trains 1000 random forests and saves their scores and feature importances to disk. After training, use the MultiFeatureRF-Figures.ipynb notebook to visualize the random forest results.
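The statistics notebook is described as running permutation t-tests. Assuming a within-subject design in which each subject contributes data to both conditions, one common variant is a paired test that randomly flips the sign of each subject's caffeine-minus-placebo difference. The minimal sketch below applies it to a single feature at a single electrode; the function name and defaults are hypothetical, and the notebook's exact procedure may differ.

```python
import numpy as np


def paired_permutation_t_test(caffeine, placebo, n_permutations=10000, seed=0):
    """Two-sided paired permutation t-test via random sign flips.

    caffeine, placebo: arrays of shape (n_subjects,) with one value per subject.
    Returns the observed t-statistic and its permutation p-value.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(caffeine, dtype=float) - np.asarray(placebo, dtype=float)
    n = diff.size

    def t_stat(d):
        # One-sample t-statistic of the differences against zero (last axis)
        return d.mean(axis=-1) / (d.std(axis=-1, ddof=1) / np.sqrt(n))

    t_obs = t_stat(diff)
    # Under H0 the sign of each subject's difference is exchangeable
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, n))
    t_null = t_stat(signs * diff)
    # Count the observed statistic as one permutation so p is never 0
    p = (np.sum(np.abs(t_null) >= np.abs(t_obs)) + 1) / (n_permutations + 1)
    return t_obs, p
```

When many features and electrodes are tested, the resulting p-values additionally need a multiple-comparison correction. MNE-Python's mne.stats.permutation_t_test performs a one-sample version of this test on difference data and uses the maximum statistic across variables for that purpose.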
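The single-feature, single-electrode idea can be sketched as fitting a separate classifier on each (feature, electrode) column and scoring it with leave-one-subject-out cross-validation, using the subject labels as groups so that no subject appears in both training and test data. The (samples, features, electrodes) array layout, the function name and the default LinearDiscriminantAnalysis classifier are assumptions; SingleFeatureML-Classifier.py lets you choose the classifier and may organize the data differently.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score


def single_feature_accuracies(data, labels, groups, make_classifier=LinearDiscriminantAnalysis):
    """Leave-one-subject-out accuracy for every (feature, electrode) pair.

    data: array of shape (n_samples, n_features, n_electrodes) (hypothetical layout);
    labels: condition per sample; groups: subject ID per sample.
    """
    _, n_features, n_electrodes = data.shape
    scores = np.zeros((n_features, n_electrodes))
    cv = LeaveOneGroupOut()  # each fold holds out all samples of one subject
    for f in range(n_features):
        for e in range(n_electrodes):
            X = data[:, f, e].reshape(-1, 1)  # one feature at one electrode
            acc = cross_val_score(make_classifier(), X, labels, groups=groups, cv=cv)
            scores[f, e] = acc.mean()
    return scores
```

The resulting (features x electrodes) accuracy matrix is the kind of summary that could be plotted as one topographic map per feature.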
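For the multi-feature analysis, the sketch below repeats random forest training on subject-wise random train/test splits and collects each forest's test accuracy and feature importances, analogous to the scores and importances that MultiFeatureRF-Classifier.py saves. It assumes all features from all electrodes are flattened into a 2D matrix X; the split strategy (GroupShuffleSplit holding out 20% of subjects), the number of trees and the function name are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit


def repeated_random_forests(X, y, groups, n_forests=1000, test_size=0.2, n_estimators=100):
    """Train many random forests on subject-wise random splits.

    Returns test accuracies (n_forests,) and feature importances (n_forests, n_features).
    """
    # Each split holds out whole subjects, so no subject is in both train and test
    splitter = GroupShuffleSplit(n_splits=n_forests, test_size=test_size, random_state=0)
    scores, importances = [], []
    for i, (train, test) in enumerate(splitter.split(X, y, groups)):
        forest = RandomForestClassifier(n_estimators=n_estimators, random_state=i)
        forest.fit(X[train], y[train])
        scores.append(forest.score(X[test], y[test]))  # mean test accuracy
        importances.append(forest.feature_importances_)
    return np.array(scores), np.array(importances)
```

Averaging the importances over many forests, e.g. importances.mean(axis=0), gives a more stable ranking of features than a single forest would.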

Related work
