Driver fatigue detection through multiple entropy fusion analysis in an EEG-based system

Assignment

Grade: 40/40

Implement the steps described in the research paper and reproduce similar results.

Paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188756

⬇️ Setup

[ ! -d "venv" ] && (echo "Creating python3 virtual environment"; python3 -m venv venv)

Install required packages:

pip install -r requirements.txt

📁 Directory structure

Directory	Description
data	dataset
models	saved and trained models
references	research paper
reports	model statistics and figures
src	python source code

📋 Todo:

Utils:

Create a report file saver and loader so results can be checked easily and reproducibly

Signal:

Apply filters to remove noise
- notch filter 50Hz
- band pass 0.15Hz to 40Hz
Crop the signal to 5 minutes (300 seconds)
Load the signal for all drivers
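
A minimal sketch of the signal steps above with SciPy, assuming the raw EEG is already loaded as a NumPy array of shape (n_channels, n_samples). The sampling rate, the notch quality factor Q=30, the 4th-order Butterworth design and keeping the last 300 seconds (following the paper notes below) are assumptions, not necessarily what the code in src does.

```python
import numpy as np
from scipy import signal

SFREQ = 1000.0      # hypothetical sampling rate (Hz); read the real one from the recording
CROP_SECONDS = 300  # keep 5 minutes


def preprocess(raw: np.ndarray, sfreq: float = SFREQ) -> np.ndarray:
    """Notch at 50 Hz, band-pass 0.15-40 Hz, then keep the last CROP_SECONDS.

    raw has shape (n_channels, n_samples).
    """
    # 50 Hz notch against power-line noise; Q=30 is an assumed notch width
    b, a = signal.iirnotch(w0=50.0, Q=30.0, fs=sfreq)
    x = signal.filtfilt(b, a, raw, axis=-1)  # zero-phase (forward and backward pass)

    # 0.15-40 Hz Butterworth band-pass as second-order sections for numerical stability
    sos = signal.butter(4, [0.15, 40.0], btype="bandpass", fs=sfreq, output="sos")
    x = signal.sosfiltfilt(sos, x, axis=-1)

    # Crop after filtering so the start-up transient at the beginning is discarded
    n_keep = int(CROP_SECONDS * sfreq)
    return x[..., -n_keep:]
```

Zero-phase filtering (filtfilt / sosfiltfilt) avoids shifting the signal in time, which matters when it is later cut into 1-second epochs.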

Feature extraction

Create epochs from the raw signal using a 1-second window
Create a class for easier signal preprocessing procedures
Create a class for dynamic feature extraction
Define features:
- 4 different entropies for each 1-second epoch
- standard deviation, mean, power spectral density
Use the product of preprocessing procedures and feature extractors to extract more features
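
The product of preprocessing procedures and features can be sketched with itertools.product. The registries below (PROCEDURES, FEATURES and the "rectified" procedure) are hypothetical placeholders, not the classes in src; the key format mirrors the dataframe column names such as CP3_PE_standard.

```python
from itertools import product

import numpy as np

# Hypothetical registries; the real project organizes these in its own classes.
PROCEDURES = {
    "standard": lambda x: x,  # signal as-is
    "rectified": np.abs,      # hypothetical extra procedure
}
FEATURES = {
    "mean": np.mean,
    "std": np.std,
}


def extract(epoch: np.ndarray, channels: list) -> dict:
    """Apply every (feature, procedure) pair to every channel of one epoch.

    epoch has shape (n_channels, n_samples); keys follow the
    "<channel>_<feature>_<procedure>" pattern of the dataframe columns.
    """
    features = {}
    for (idx, ch), (f_name, feat), (p_name, proc) in product(
        enumerate(channels), FEATURES.items(), PROCEDURES.items()
    ):
        features[f"{ch}_{f_name}_{p_name}"] = float(feat(proc(epoch[idx])))
    return features
```

With 7 features, 30 channels and N procedures, one epoch yields 7 * 30 * N values.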

Dataframe:

Concatenate features into a final dataframe
Clean the dataframe, replace bad values
Create a normalization process that avoids data leakage

Train:

Use the LOO (leave one participant out) approach to find the best C and gamma parameters for the SVM model
Train the SVM model with every combination of entropies (the powerset function) to find out which entropy combination has the highest accuracy on the training set
Train the following models using the Grid Search method:
- SVM
- Neural network (BP)
- KNN
- Random Forest (RF)
Validate accuracy on the test set and report the performance of each model
Determine significant electrodes by calculating the weight of each electrode for each driver with the formula described in the research paper (see Significant electrodes below)
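
The entropy combination search can be sketched as a powerset over the four entropy names, skipping the empty set (2^4 - 1 = 15 combinations). powerset and columns_for are hypothetical helper names; columns_for assumes feature columns follow the <channel>_<feature>_<procedure> naming used in the dataframe.

```python
from itertools import chain, combinations

ENTROPIES = ["PE", "AE", "SE", "FE"]


def powerset(items):
    """All non-empty combinations of items, from single items up to all of them."""
    return list(
        chain.from_iterable(combinations(items, r) for r in range(1, len(items) + 1))
    )


def columns_for(combo, columns):
    """Keep columns named <channel>_<feature>_<procedure> whose feature is in combo."""
    selected = []
    for col in columns:
        parts = col.split("_")
        if len(parts) == 3 and parts[1] in combo:
            selected.append(col)
    return selected


combos = powerset(ENTROPIES)  # 2**4 - 1 = 15 combinations
```

Each combination then selects its feature columns before the SVM is trained and scored on them.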

Improvements:

Fix the code in the EntropyHub library (sample entropy) that returns NaN when np.log is applied to a power density of 0 at a given frequency
Add a preprocessing procedure that decomposes the signal into alpha and beta waves
ICA - Independent component analysis
- high-pass filter at 1 Hz to remove drifts
Fix data leakage: don't scale on the whole dataset. Fit the scaler on the training set only, then use it to transform both the training and the test set
Fix the code for dataframe creation - add an interface that allows feature concatenation
Explore which features are most important for prediction
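
A minimal sketch of the leakage-free scaling rule with scikit-learn's MinMaxScaler, using the [-1, 1] range from the paper notes. The function name is hypothetical. The test sample lands outside [-1, 1] because the scaler never saw it, which is expected.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def scale_without_leakage(X_train: np.ndarray, X_test: np.ndarray):
    """Fit min-max scaling to [-1, 1] on the training set only."""
    scaler = MinMaxScaler(feature_range=(-1, 1))
    X_train_scaled = scaler.fit_transform(X_train)  # fit + transform on train
    X_test_scaled = scaler.transform(X_test)         # transform only on test
    return X_train_scaled, X_test_scaled


X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[20.0]])
tr, te = scale_without_leakage(X_train, X_test)  # tr -> -1, 0, 1; te -> 3
```

Inside cross-validation the same rule applies per fold; putting the scaler and the model in a scikit-learn Pipeline refits the scaler on each training fold automatically.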

Optional:

Visualize training/testing error
Visualize weight-based topographies for each subject
Visualize weight-based topographies average

🖼️ Figures

3-component (x, y, color) t-SNE of the dataset

Metrics - 50:50 train test split

Model comparison

Model name	F1 score	Accuracy	Area under curve
RandomForestClassifier	1	1	1
SVC	0.999182	0.999167	0.999172
MLPClassifier	0.994481	0.994444	0.994444
KNeighborsClassifier	0.982515	0.9825	0.98258

Receiver operating characteristic (ROC)

ROC curves (figure): RandomForestClassifier (AUC = 1.000), SVM (AUC = 0.999), KNeighborsClassifier (AUC = 0.983), MLPClassifier (AUC = 0.994)

Metrics - Leave one driver out (LOO)

Model comparison

Model name	F1 score	Accuracy	Area under curve
KNeighborsClassifier	0.462041	0.4425	0.4425
RandomForestClassifier	0.35129	0.365278	0.365278
MLPClassifier	0.346792	0.385556	0.385556
SVC	0.321702	0.384722	0.384722

Receiver operating characteristic (ROC)

ROC curves (figure): RandomForestClassifier (AUC = 0.365), SVM (AUC = 0.385), KNeighborsClassifier (AUC = 0.443), MLPClassifier (AUC = 0.386)

🏗️ Dataframe structure

Rows

Each row is defined by a triplet:

driver (driver_id)
epoch (epoch_id)
driving state (y=is_fatigue_state)

Number of rows:

drivers (NUM_DRIVERS=12) *
epochs (SIGNAL_DURATION_SECONDS_DEFAULT=300) *
driving_states (len(driving_states)=2)

12 * 300 * 2 = 7200 rows
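
The row index can be generated as a Cartesian product of the three identifiers. The constant names come from this README; numbering drivers from 0 and encoding the states as 0 (normal) / 1 (fatigue), as in Class_label, are assumptions.

```python
import pandas as pd

NUM_DRIVERS = 12
SIGNAL_DURATION_SECONDS_DEFAULT = 300
driving_states = [0, 1]  # 0 = normal, 1 = fatigue

index = pd.MultiIndex.from_product(
    [range(NUM_DRIVERS), range(SIGNAL_DURATION_SECONDS_DEFAULT), driving_states],
    names=["driver_id", "epoch_id", "is_fatigue_state"],
)
# len(index) == 12 * 300 * 2 == 7200
```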

Columns

Each feature column is defined by a triplet:

feature (mean, approximate entropy (AE), standard deviation (std)...)
channel (FC4, T6, P3...) - electrodes on the cap the driver wears during the driving session
preprocessing procedure (standard, alpha waves (AL), channel rereferencing...)

Number of columns:

driver_id (1) +
is_fatigue_state (1) +
epoch_id (1) +
features (len(feature_names)=7) *
channels (len(channels_good)=30) *
N preprocessing procedures (N ≥ 1)

1 + 1 + 1 + 7 * 30 * N = 3 + 210 * N

in most cases N = 5 (standard, AL, AH, BL, BH)
number of feature columns = 7 * 30 * 5 = 1050 (1053 including the three identifier columns)

The number of feature columns is the product of the number of features (7), channels (30) and preprocessing procedures (N)

Each driver has two driving states (normal and fatigue), which gives 2 raw signals per driver.

Each signal is 300 seconds long and is split into 300 one-second epochs.

For each epoch (row), all columns are calculated.
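
A minimal NumPy sketch of the epoching step, assuming an integer sampling rate and non-overlapping 1-second windows; to_epochs is a hypothetical name, not the function in src. Samples that do not fill a whole epoch are dropped.

```python
import numpy as np


def to_epochs(data: np.ndarray, sfreq: int, epoch_seconds: int = 1) -> np.ndarray:
    """Split (n_channels, n_samples) into (n_epochs, n_channels, samples_per_epoch)."""
    samples_per_epoch = sfreq * epoch_seconds
    n_epochs = data.shape[-1] // samples_per_epoch
    trimmed = data[:, : n_epochs * samples_per_epoch]  # drop the incomplete tail
    # (channels, epochs, samples) -> (epochs, channels, samples)
    return trimmed.reshape(data.shape[0], n_epochs, samples_per_epoch).transpose(1, 0, 2)
```

For a 300-second recording this returns 300 epochs, one dataframe row each.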

	is_fatigued	epoch_id	CP3_PE_standard	CP3_PE_AL	...	FT7_PE_standard	FT7_PE_AL	...
0	0	0	0.361971	0.361971	...	1.84037e-23	1.84037e-23	...
1	0	1	0.232837	0.232837	...	1.4759e-23	1.4759e-23	...
2	0	2	0.447734	0.447734	...	1.27735e-23	1.27735e-23	...
3	1	0	3.18712	3.18712	...	1.4759e-23	1.4759e-23	...
4	1	1	2.81654	2.81654	...	1.27735e-23	1.27735e-23	...

📝 Dataset notes

EEG data:

.cnt files were recorded with a 40-channel Neuroscan amplifier and contain the EEG data for both driving states.

Entropy data (not used):

four entropies of twelve healthy subjects for driver fatigue detection
the number in each file name identifies the participant
each .mat file contains five variables
- FE - fuzzy entropy
- SE - sample entropy
- AE - approximate entropy
- PE - spectral entropy
- Class_label 0 or 1
  - 1 represents the fatigue state
  - 0 represents the normal state

📝 Research paper notes

Goal

analyze the multiple entropy fusion method and evaluate several channel regions to effectively detect a driver’s fatigue state based on electroencephalogram (EEG) records

Data:

collected by attaching electrodes to the driver's scalp
non-fatigue data: the driver drove for 20 minutes; the last 5 minutes were recorded as non-fatigue data
fatigue data: the driver drove for 40-60 minutes; the last 5 minutes were recorded as fatigue data
dataset is split randomly 50:50 train/test
5 minutes of EEG data from 30 electrodes
- sectioned into 1-second epochs
- 5 * 60 s = 300 s, split into 1 s epochs = 300 epochs per participant per driving state
- total 12 * 300 = 3600 fatigue units and 3600 normal units

Electrode cap:

32 channels (30 effective and 2 reference channels)

Entropies:

PE - spectral entropy - calculated by applying the Shannon function to the normalized power spectrum based on the peaks of a Fourier transform
AE - approximate entropy - calculated in the time domain without phase-space reconstruction of the signal; suitable for short time series [41]
SE - sample entropy - similar to AE, but less sensitive to changes in data length; larger values correspond to greater complexity or irregularity in the data [41]
FE - fuzzy entropy - gives stable results across different parameter values and has the best noise resistance, thanks to its fuzzy membership function
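
A minimal sketch of spectral entropy (PE) following the description above: Shannon entropy of the normalized power spectrum. It uses the plain FFT power spectrum rather than the peak-based spectrum mentioned in the paper, and it is not EntropyHub's implementation. Zero-power bins are dropped before taking the logarithm, which is one way to avoid the NaN from np.log(0) listed under Improvements; returning 0.0 for an all-zero (flatlined) epoch is an assumption.

```python
import numpy as np


def spectral_entropy(x, normalize: bool = True) -> float:
    """Shannon entropy of the normalized FFT power spectrum of a 1-D signal."""
    psd = np.abs(np.fft.rfft(np.asarray(x, dtype=float))) ** 2
    total = psd.sum()
    if total == 0:
        return 0.0  # all-zero (flatlined) epoch: assumed to carry no spectral information
    p = psd / total  # normalize the spectrum to a probability distribution
    p = p[p > 0]     # p * log(p) -> 0 as p -> 0, so zero bins are skipped instead of giving NaN
    h = -np.sum(p * np.log2(p))
    if normalize:
        h /= np.log2(psd.size)  # divide by the maximum possible entropy to map into [0, 1]
    return float(h)
```

A pure tone concentrates its power in one bin and scores near 0; an impulse has a flat spectrum and scores 1.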

Entropy Parameters (AE, SE, FE):

m: dimension of phase space
- m = 2
r: similarity tolerance
- r = 0.2 * SD (SD = standard deviation of the time series)
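
A minimal O(n^2) NumPy sketch of sample entropy (SE) with these parameters (m = 2, r = 0.2 * SD), using the Chebyshev distance and excluding self-matches, as in the common definition. It is meant for reading, not as a replacement for EntropyHub or Antropy; returning inf when no (m + 1)-length matches exist is one convention among several.

```python
import numpy as np


def sample_entropy(x, m: int = 2, r_factor: float = 0.2) -> float:
    """SampEn(m, r) with tolerance r = r_factor * std(x).

    B counts pairs of length-m templates within distance r, A does the same
    for length m + 1, and SampEn = -ln(A / B). Self-matches are not counted.
    """
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n_templates = x.size - m  # same template count for both lengths

    def count_matches(length: int) -> int:
        templates = np.array([x[i : i + length] for i in range(n_templates)])
        count = 0
        for i in range(n_templates - 1):
            # Chebyshev distance between template i and all later templates
            dist = np.max(np.abs(templates[i + 1 :] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0:
        return float("inf")  # no (m + 1)-length matches: SampEn is undefined/infinite
    return float(-np.log(a / b))
```

A strictly periodic signal gives 0 (every length-m match extends to length m + 1), while white noise gives a clearly positive value.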

Feature normalization

Features were normalized to [-1, 1] using min-max normalization:

The feature vector is built by concatenating the individual features.
The min-max normalization of each feature $x_i$, $i = 1, \dots, n$, maps the minimum of that feature to -1 and its maximum to 1.
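
The paper's equation did not survive in these notes. Assuming the standard min-max mapping onto [-1, 1], with the minimum and maximum taken over the (training) values of feature $x_i$, it reads:

```latex
x_i' = 2\,\frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)} - 1
```

At $x_i = \min(x_i)$ this gives -1 and at $x_i = \max(x_i)$ it gives 1.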

4 classifiers

Support vector machine (SVM)
Back propagation neural network (BP)
Random forest (RF)
K-nearest neighbor (KNN)

SVM Parameters

Parameters found with leave-one-out (LOO) cross-validation:

c=-1 - the penalty parameter
g=-5 - the kernel parameter
AR order 10.
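
The notes give c = -1 and g = -5, but scikit-learn's SVC requires C > 0 and gamma >= 0, so these are most likely log2 exponents as in libsvm's grid search, i.e. C = 2^-1 = 0.5 and gamma = 2^-5 = 0.03125. That reading, the RBF kernel, the grid bounds and the function name are assumptions. The sketch searches such a log2 grid with one fold per driver (LeaveOneGroupOut), matching the leave-one-driver-out setup from the Train section.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def fit_svm_leave_one_driver_out(X, y, driver_ids):
    """Grid-search C and gamma with one cross-validation fold per driver."""
    # log2 grids; they include log2(C) = -1 and log2(gamma) = -5 from the notes
    param_grid = {
        "svc__C": 2.0 ** np.arange(-5, 6),
        "svc__gamma": 2.0 ** np.arange(-8, 3),
    }
    # The scaler sits inside the pipeline, so it is fitted on the training drivers of each fold
    model = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVC(kernel="rbf"))
    search = GridSearchCV(model, param_grid, cv=LeaveOneGroupOut(), scoring="accuracy")
    search.fit(X, y, groups=driver_ids)
    return search
```

search.best_params_ then holds the selected C and gamma, and search.best_score_ the mean accuracy over the held-out drivers.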

Entropy combining

According to the paper, combining multiple entropies yields better accuracy than using a single entropy.

Significant electrodes

Significant electrodes were chosen from the 30 electrodes:

Calculate the accuracy Acc(i) of each single electrode i with the multiple entropy fusion method on the training data, using an SVM classifier
Recalculate the accuracy for each pair of electrodes: electrode i combined with each of the other 29 electrodes j gives Acc(ij)
Calculate the weight for each electrode $V_i=\frac{Acc(i) + \sum_{j=1, j\neq i}^{30}\left(Acc(ij) + Acc(i) - Acc(j)\right)}{30}$
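
A NumPy sketch of the weight formula above, assuming the single-electrode accuracies Acc(i) and the pairwise accuracies Acc(ij) have already been computed with the SVM; electrode_weights is a hypothetical name. n is 30 in the paper.

```python
import numpy as np


def electrode_weights(acc_single: np.ndarray, acc_pair: np.ndarray) -> np.ndarray:
    """V_i = (Acc(i) + sum_{j != i} (Acc(ij) + Acc(i) - Acc(j))) / n.

    acc_single[i] is Acc(i); acc_pair[i, j] is Acc(ij), the accuracy of
    electrodes i and j used together (the diagonal is ignored).
    """
    n = acc_single.size
    off_diag = ~np.eye(n, dtype=bool)
    # pair_terms[i, j] = Acc(ij) + Acc(i) - Acc(j)
    pair_terms = acc_pair + acc_single[:, None] - acc_single[None, :]
    summed = np.where(off_diag, pair_terms, 0.0).sum(axis=1)  # sum over j != i
    return (acc_single + summed) / n
```

The 10 electrodes with the largest weights are then np.argsort(weights)[::-1][:10].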

Pick the 10 electrodes with the largest weights. These 10 electrodes form 4 clusters (regions): A, B, C and D.

Region A gives the best prediction results, even better than when all electrodes are used for prediction.

📝 Notes to self:

Many channels are flat during the driving session and spike at certain moments
In addition, each BCIT dataset includes 4 additional EOG channels placed vertically above the right eye (veou), vertically below the right eye (veol), horizontally on the outside of the right eye (heor), and horizontally on the outside of the left eye (heol)
Run ipython kernel install --user --name=eeg (with the venv activated) to use the venv in Jupyter
Two different libraries (EntropyHub and Antropy) produce the same result for sample entropy
Applying the filter before versus after splitting into epochs gives different results

ciglenecki/eeg-driver-fatigue-detection
