Data-mining sleep brain signals using machine-learning: Exploring the effect of caffeine on EEG network dynamics
This repository contains the code for the analysis of sleep EEG recorded under two conditions. In the first condition, subjects ingested 200 mg of caffeine before going to sleep. In the second condition, subjects received a placebo pill and otherwise followed the same procedure. The same dataset has previously been explored by Drapeau et al. (2006) and Robillard et al. (2015) using traditional statistical analysis of several sleep-related variables. Here we take a data-driven approach, extending traditional statistics with machine-learning-based exploration of the data.
Clone this repository and install the dependencies from requirements.txt:
git clone git@github.com:PhilippThoelke/caffeine-sleep.git
cd caffeine-sleep
pip install -r requirements.txt
We used a modified version of MNE-Python (0.19) for some of the visualizations, which is installed automatically when using our requirements.txt file. The modifications made to the original code are available here.
Scripts for running the preprocessing pipeline are located in the preprocessing directory.
- Run ExtractFeatures.py to extract features from the raw EEG. Before running, adjust the global variables at the top of the script accordingly. The script can load the data in two different formats, depending on the SPLIT_STAGES variable: when set to true, the script expects raw EEG and corresponding hypnograms as .npy files. When set to false, data that was previously split into sleep stages will be loaded (ExtractRawSamples.py can be used to split the data into sleep stages without extracting features). The data is also expected to be in .npy format with the following naming scheme: <subject-id>_<sleep-stage>_*.npy.
- Compute differences in sample count between the awake (AWA) and wake after sleep onset (WASO/AWSL) stages using ComputeSampleDifferences.py. The script will save a file called sample_difference<caffeine-dose>.pickle, which is required for the next step.
- Run CombineFeatures.py to group the extracted features from all subjects into a single file, normalize them and average across subjects. The resulting files containing averaged features, condition labels and subject labels are called data_avg.pickle, labels_avg.pickle and groups_avg.pickle, respectively. These files are used for the analysis.
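The naming scheme for pre-split data can be illustrated with a small sketch. This is not code from the repository: the regular expression, the function name group_by_subject_and_stage and the example file names are hypothetical, and the sketch assumes that subject IDs and stage names contain no underscores.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical pattern for <subject-id>_<sleep-stage>_*.npy.
# Assumption: neither the subject ID nor the stage name contains "_".
PATTERN = re.compile(r"^(?P<subject>[^_]+)_(?P<stage>[^_]+)_.*\.npy$")


def group_by_subject_and_stage(filenames):
    """Map (subject, stage) to the sorted list of matching file names."""
    groups = defaultdict(list)
    for name in filenames:
        base = Path(name).name
        match = PATTERN.match(base)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        groups[(match["subject"], match["stage"])].append(base)
    return {key: sorted(names) for key, names in groups.items()}


# Made-up file names, for illustration only
example = [
    "s01_AWA_raw.npy",
    "s01_AWSL_raw.npy",
    "s02_AWA_raw.npy",
    "notes.txt",
]
grouped = group_by_subject_and_stage(example)
for key, names in sorted(grouped.items()):
    print(key, names)
```

Files that do not match the scheme are ignored here; a stricter variant could raise an error instead so that misnamed files are noticed early.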
The analysis is split up into three parts: statistics, single-feature machine learning and multi-feature machine learning. The corresponding files can be found in the statistics, singleFeatureML and multiFeatureML directories.
- Statistics: The Statistics.ipynb notebook contains the code used for the statistical analysis of the caffeine vs. placebo condition for all features. It runs permutation t-tests and subsequently generates a figure that shows the statistical results visually.
- Single-feature ML: For the single-feature, single-electrode analysis, run SingleFeatureML-Classifier.py to train and evaluate a machine learning classifier on the previously extracted features. You can select the classifier to train through command line arguments; running the script without arguments prints usage instructions. Final accuracy metrics are printed once training has finished, and a summary of the results is saved as a pickle file. Afterwards, use the SingleFeatureML-Figures.ipynb notebook to visualize and compare results between classifiers.
- Multi-feature ML: To train random forests on the complete multi-feature, multi-electrode data, run the MultiFeatureRF-Classifier.py script. By default, it will train 1000 random forests and save the scores and feature importances to disk. After training, use the MultiFeatureRF-Figures.ipynb notebook to visualize the random forest results.
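As a rough illustration of what a permutation t-test does, the following sketch uses only the Python standard library. It is not the implementation from Statistics.ipynb: it assumes a paired design (one caffeine and one placebo value per subject) with sign-flip permutations, and the function names, the number of permutations and the example values are all made up for illustration.

```python
import math
import random
import statistics


def paired_t(diffs):
    """One-sample t statistic of the paired differences against zero."""
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # Degenerate case: all differences are identical
        return math.copysign(math.inf, mean) if mean != 0 else 0.0
    return mean / (sd / math.sqrt(len(diffs)))


def permutation_t_test(caffeine, placebo, n_permutations=1000, seed=0):
    """Two-sided sign-flip permutation test on paired samples (assumed design)."""
    rng = random.Random(seed)
    diffs = [c - p for c, p in zip(caffeine, placebo)]
    observed = paired_t(diffs)
    exceed = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the sign of each difference is arbitrary
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(paired_t(flipped)) >= abs(observed):
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up feature values for eight hypothetical subjects
caffeine = [1.30, 1.52, 1.21, 1.44, 1.65, 1.18, 1.39, 1.57]
placebo = [1.00, 1.10, 1.05, 1.02, 1.15, 1.01, 1.08, 1.12]
t_stat, p_value = permutation_t_test(caffeine, placebo)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

In a multi-feature, multi-electrode setting a test like this would be repeated per feature and electrode, which raises the question of correcting for multiple comparisons; that step is left out of this sketch.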
- Caffeine Caused a Widespread Increase of Resting Brain Entropy
- Effects of caffeine on daytime recovery sleep: A double challenge to the sleep–wake cycle in aging
- Sleep is more sensitive to high doses of caffeine in the middle years of life
- Caffeine intake (200 mg) in the morning affects human sleep and EEG power spectra at night