Question about the train_test_spliter #22

Open
zewail-liu opened this issue Apr 14, 2022 · 15 comments

@zewail-liu

Hi there, this is an inspiring method and a great paper.
I have been trying to apply your work to another dataset for a couple of days, but I can't achieve good results.
After rechecking the code, I think it may be a data-split problem.
For example, see MI-EEG-1D-CNN/models/train_a.py, line 45:
[image]
Here x is the loaded data, already shaped (events_num, 2, 640).
As we know, within one specific MI task, the different channel couples in one ROI behave similarly.
In line 52, splitting reshape_x may put channel couples from the same task into both the train set and the test set at the same time, so the rise in accuracy may come from this overlap rather than from the model.
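
A rough way to see how often this could happen is sketched below. It is only an approximation under an assumption that is not taken from the repository: each channel couple of a trial is treated as landing in the test set independently with probability test_size, which is close to what a plain random split does on a large dataset. The function name prob_trial_is_split is hypothetical.

```python
def prob_trial_is_split(n_pairs: int, test_size: float) -> float:
    """Approximate probability that one trial's channel couples land in both sets.

    Assumes each couple is sent to the test set independently with
    probability `test_size` (an approximation of a plain random split).
    """
    all_in_test = test_size ** n_pairs
    all_in_train = (1.0 - test_size) ** n_pairs
    # The trial is "split" unless every couple ended up on the same side.
    return 1.0 - all_in_test - all_in_train


if __name__ == "__main__":
    for n_pairs in (2, 3, 6):
        print(n_pairs, round(prob_trial_is_split(n_pairs, 0.2), 3))
```

Under this approximation, with three channel couples per trial and test_size=0.2, roughly 48% of trials would have couples on both sides of the split.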

The data loading code of your work is a little hard for me to read, so I tried writing my own data loading function (a humble one, without the base-type event or SMOTE). It splits the data into train and test sets first and only then reshapes it from (events_num, channels_num, 640) into channel couples of shape (2, 640). I then used HopefullNet to fit them, and it didn't end well.
I will paste my function below once I figure out how.

I hope to get your response; instructions on how to transfer HopefullNet to other datasets would be more than great.
Best wishes

@zewail-liu
Author

zewail-liu commented Apr 14, 2022

Here is the function:

import mne
import matplotlib.pyplot as plt
from sklearn import preprocessing
import numpy as np
from keras.utils import to_categorical  # keras.utils.np_utils is not available in newer Keras releases
from sklearn.model_selection import train_test_split

"""
    Physionet MI-EEG Dataset
    64 channels EEG,160hz freq, 4 seconds MI-task
    14 runs for each of the 109 subjects
        runs [1, 2] is baseline
        others with marker
            T0   : rest, 
            T1/T2: left/right fist in runs [3, 4, 7, 8, 11, 12]
                   both fists/feet in runs [5, 6, 9, 10, 13, 14]

"""
data_path = r'D:\00-data\PhysioNet\ori\S001\\'
LR_fist_run = [3, 4, 7, 8, 11, 12]
fist_feet_run = [5, 6, 9, 10, 13, 14]
rename_mapping = {'Fc5.': 'FC5', 'Fc3.': 'FC3', 'Fc1.': 'FC1', 'Fcz.': 'FCZ', 'Fc2.': 'FC2', 'Fc4.': 'FC4',
                  'Fc6.': 'FC6', 'C5..': 'C5', 'C3..': 'C3', 'C1..': 'C1', 'Cz..': 'CZ', 'C2..': 'C2', 'C4..': 'C4',
                  'C6..': 'C6', 'Cp5.': 'CP5', 'Cp3.': 'CP3', 'Cp1.': 'CP1', 'Cpz.': 'CPZ', 'Cp2.': 'CP2',
                  'Cp4.': 'CP4', 'Cp6.': 'CP6', 'Fp1.': 'FP1', 'Fpz.': 'FPZ', 'Fp2.': 'FP2', 'Af7.': 'AF7',
                  'Af3.': 'AF3', 'Afz.': 'AFZ', 'Af4.': 'AF4', 'Af8.': 'AF8', 'F7..': 'F7', 'F5..': 'F5', 'F3..': 'F3',
                  'F1..': 'F1', 'Fz..': 'FZ', 'F2..': 'F2', 'F4..': 'F4', 'F6..': 'F6', 'F8..': 'F8', 'Ft7.': 'FT7',
                  'Ft8.': 'FT8', 'T7..': 'T7', 'T8..': 'T8', 'T9..': 'T9', 'T10.': 'T10', 'Tp7.': 'TP7', 'Tp8.': 'TP8',
                  'P7..': 'P7', 'P5..': 'P5', 'P3..': 'P3', 'P1..': 'P1', 'Pz..': 'PZ', 'P2..': 'P2', 'P4..': 'P4',
                  'P6..': 'P6', 'P8..': 'P8', 'Po7.': 'PO7', 'Po3.': 'PO3', 'Poz.': 'POZ', 'Po4.': 'PO4', 'Po8.': 'PO8',
                  'O1..': 'O1', 'Oz..': 'OZ', 'O2..': 'O2', 'Iz..': 'IZ'}


def get_physionet(subject: int):
    """
    :param subject: SN of subject : [1,109]
    :return: data shapes (-1, channels, 640)
    """
    # loading from file
    for r in LR_fist_run:
        raw_new = mne.io.read_raw_edf(data_path + 'S%03d' % subject + 'R%02d.edf' % r, verbose='ERROR')
        if r == LR_fist_run[0]:
            raw_LR_fist = raw_new
        else:
            raw_LR_fist.append(raw_new)
    for r in fist_feet_run:
        raw_new = mne.io.read_raw_edf(data_path + 'S%03d' % subject + 'R%02d.edf' % r, verbose='ERROR')
        if r == fist_feet_run[0]:
            raw_fist_feet = raw_new
        else:
            raw_fist_feet.append(raw_new)

    raw_LR_fist.rename_channels(rename_mapping)
    raw_fist_feet.rename_channels(rename_mapping)
    ch_pick = ["FC1", "FC2", "FC3", "FC4", "C3", "C4", "C1", "C2",
               "CP1", "CP2", "CP3", "CP4"]

    # get the data and labels
    event_id_LR_fist = dict(T1=0, T2=1)
    events, _ = mne.events_from_annotations(raw_LR_fist, event_id_LR_fist, verbose='ERROR')
    epochs_LR_fist = mne.Epochs(raw_LR_fist, events, tmin=1 / 160, tmax=4, baseline=None, preload=True,
                                verbose='ERROR')
    event_id_fist_feet = dict(T1=2, T2=3)
    events, _ = mne.events_from_annotations(raw_fist_feet, event_id_fist_feet, verbose='ERROR')
    epochs_fist_feet = mne.Epochs(raw_fist_feet, events, tmin=1 / 160, tmax=4, baseline=None, preload=True,
                                  verbose='ERROR')
    data = np.concatenate((epochs_LR_fist.get_data(picks=ch_pick), epochs_fist_feet.get_data(picks=ch_pick)))
    scaler = preprocessing.StandardScaler()
    for i in range(len(data)):
        scaler.fit(data[i])
        data[i] = scaler.transform(data[i])
    labels = np.concatenate((epochs_LR_fist.events[:, 2], epochs_fist_feet.events[:, 2]))
    labels = to_categorical(labels)  # one-hot

    # reshape and return
    train_data_ori, test_data_ori, train_label_ori, test_label_ori = train_test_split(data, labels, test_size=0.2,
                                                                                      random_state=42)
    train_data = np.empty((0, 2, train_data_ori.shape[2]))
    train_label = np.empty((0, 4))
    test_data = np.empty((0, 2, test_data_ori.shape[2]))
    test_label = np.empty((0, 4))
    for i in range(0, len(ch_pick), 2):
        train_data = np.concatenate((train_data, train_data_ori[:, i:i + 2, :]))
        test_data = np.concatenate((test_data, test_data_ori[:, i:i + 2, :]))
        train_label = np.concatenate((train_label, train_label_ori))
        test_label = np.concatenate((test_label, test_label_ori))
    print('data loaded.')
    return train_data, test_data, train_label, test_label


if __name__ == '__main__':
    res = get_physionet(1)
    for r in res:
        print(r.shape)

@ambitious-octopus
Owner

To test your assertion ("in line 52, spliting reshape_x may split channel-couples in one task into train_set and test_set at the same time, that may be cause the acc rise not for the Model cause."), I used the following script, which tests two things:

  1. Whether there are duplicate instances in the entire reshaped dataset, the train set, or the validation/test set.
  2. Whether any instance is present in both the train set and the test/validation set.

import sys
sys.path.append("/workspace")
import numpy as np
import tensorflow as tf
from data_processing.general_processor import Utils
from sklearn.model_selection import train_test_split
tf.autograph.set_verbosity(0)
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print(physical_devices)
config = tf.config.experimental.set_memory_growth(physical_devices[0], True)

#Params
source_path = "/dataset/paper/"

# Load data
channels = Utils.combinations["a"]  # [["FC1", "FC2"], ["FC3", "FC4"], ["FC5", "FC6"]]

exclude =  [38, 88, 89, 92, 100, 104]
subjects = [n for n in np.arange(1,110) if n not in exclude]
#Load data
x, y = Utils.load(channels, subjects, base_path=source_path)
#Transform y to one-hot-encoding
y_one_hot  = Utils.to_one_hot(y, by_sub=False)
#Reshape for scaling
reshaped_x = x.reshape(x.shape[0], x.shape[1] * x.shape[2])
#Grab a test set before SMOTE

def check_duplicate(element_list):
    for elem in element_list:
        if element_list.count(elem) > 1:
            return True
    return False

x_train_raw, x_valid_test_raw, y_train_raw, y_valid_test_raw = train_test_split(reshaped_x,
                                                                            y_one_hot,
                                                                            stratify=y_one_hot,
                                                                            test_size=0.20,
                                                                            random_state=42)
reshaped = reshaped_x.tolist()
x_train = x_train_raw.tolist()
x_valid = x_valid_test_raw.tolist()
print(check_duplicate(reshaped))
print(check_duplicate(x_train))
print(check_duplicate(x_valid))

for sample in reshaped_x.tolist():
    if (sample in x_train) and (sample in x_valid):
        print("Problems")

This simple script shows that no instance is present in both the train set and the test/validation set.
If this is not the answer you were expecting, please describe your problem in more detail.
Thanks

@zewail-liu
Author

zewail-liu commented Apr 14, 2022

Thanks for your reply, but that is not exactly what I'm asking, sorry.

Let me try to describe it with figures.
Say the raw data looks like this, with 6 channels and 2 events:
[image: raw data, 6 channels, 2 events]

We load and reshape it to get X, like the x in train_a.py line 46:
[image: reshaped X]

Within this specific green task 'T1', the channel couples [FC1, FC2], [FC3, FC4], [FC5, FC6] are different but mostly similar.
If we randomly split this X, it is likely that some of those green channel couples are assigned to the train set and some of them to the test set.
As the dataset grows larger, this is certain to happen.

In my opinion, we should use the green task's samples to predict the red task's samples, and should not use the green task's samples to predict other samples of the same green task.
In a real application, new input data will be more like the red sample: none of its channels are in the train set.

My question is how your work makes sure of that.

Thank you very much!
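
The concern above can be checked directly if each reshaped row keeps a record of the trial it was cut from. The sketch below uses synthetic data, not the repository's loader: the array sizes and the trial_id bookkeeping are assumptions for illustration, and it assumes the rows are ordered trial by trial (if the rows are ordered couple by couple, np.tile would be needed instead of np.repeat).

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trials, n_pairs, n_samples = 200, 3, 640

# Synthetic stand-in for the reshaped x: one row per (trial, channel couple).
x = rng.standard_normal((n_trials * n_pairs, 2, n_samples))
# trial_id[i] records which original trial row i was cut from.
trial_id = np.repeat(np.arange(n_trials), n_pairs)

# Plain random split over rows, ignoring trial membership.
train_idx, test_idx = train_test_split(
    np.arange(len(x)), test_size=0.2, random_state=42
)

# Row-level check: no row index appears on both sides.
row_overlap = np.intersect1d(train_idx, test_idx)
# Trial-level check: trials that contribute rows to both sides.
shared = np.intersect1d(trial_id[train_idx], trial_id[test_idx])
leak_fraction = len(shared) / n_trials

print(f"rows in both sets: {len(row_overlap)}")
print(f"trials in both sets: {len(shared)} of {n_trials}")
```

The row-level overlap is empty even though many trials are shared, which is why a check for identical rows alone does not reveal this kind of leakage.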

@huangliuzhou

I have the same issue as you. Maybe we can learn from each other. WeChat (weixin): laoyao_023

@ambitious-octopus
Owner

Forgive me for responding so late. I took some time to check your hypothesis and unfortunately it is correct! Thank you for finding this serious bug in the code. I am actively working to see if the same accuracy can be achieved by removing this bug. At the moment I see it as a little difficult to achieve the same accuracy as the network is heavily overfitting. If it is not possible to solve this bug I will personally contact the journal. I will post in the next few days the resolution of this bug. In case you are working and want to share some thoughts please don't hesitate. Thanks again for the support.

@zewail-liu
Author

I'm more than sorry to hear that.
It's still a very remarkable and enlightening work despite the accuracy.
Thanks!

@ambitious-octopus
Owner

Thank you so much for pointing out this error. 🥇
I'm reopening the issue because it might be helpful to others.
I will post the fix shortly.

@DrugLover

I tried splitting the train/test/validation sets before concatenating channels, and my accuracy on the BCI IV 2a dataset turned out to be 24.6%. That is about the 25% chance level for four classes, which means the network didn't work at all. I read your paper and also "A_Simplified_CNN_Classification_Method_for_MI-EEG" from your references. That author's accuracy is about 97%; maybe she made the same mistake?

@ambitious-octopus
Owner

I tried to replicate the code from "A Simplified CNN Classification Method for MI-EEG". Unfortunately, I was unable to achieve the accuracy they claim. Be careful the problem is not reshaping the data. The problem is in the generation. Give me a few days and I'll insert the fix below.

@leoeooe

leoeooe commented May 13, 2022

Here are some papers that may have the same bug:
"Multi-class motor imagery EEG classification method with high accuracy and low individual differences based on hybrid neural network" https://iopscience.iop.org/article/10.1088/1741-2552/ac1ed0
"DWT and CNN based multi-class motor imagery electroencephalographic signal recognition" https://iopscience.iop.org/article/10.1088/1741-2552/ab6f15
According to my current knowledge, it is nearly impossible to push 4-class MI accuracy above 90% on public MI datasets such as BCI IV 2a and PhysioNet (High-Gamma is an exception, since it is a motor-execution dataset). I think the reasons are that current MI recordings still have a low SNR and that we can't instruct subjects to imagine the movement in a similar way.
This is just my opinion, and I'm also glad to see progress in MI. So if you have implemented papers that bring accuracy near or above 90%, please contact me at lia@sari.ac.cn
Thanks!

@yosider

yosider commented May 26, 2022

@Kubasinska
Hello,

Be careful the problem is not reshaping the data. The problem is in the generation.

Could you please explain the bug in the generation process?
Thanks!

@ambitious-octopus
Owner

Hi everyone, sorry I am responding late. I integrated the fix into the main branch; find the new generator and loader in the fix folder. I also updated the readme and alerted the Journal.
I explain below the problem that @zewail-liu (whom I thank again) pointed out.

Take a single trial: the subject thinks about moving the right fist for 4 seconds, and 64 channels record the brain activity. Of these 64 channels, we take only 4 for this example: C3, C4, CP3 and CP4. The idea described in the paper is to divide this single instance into two, one consisting of C3 and C4 and one consisting of CP3 and CP4. So now we have two arrays: one of size (2, 640) composed of C3 and C4, and one of the same size (2, 640) composed of CP3 and CP4. The label corresponding to these two examples is the same: imagined movement of the right fist. Ideally, these two examples should both go into the same dataset, either both into the training set or both into the test set. What can happen instead is that one goes into the training set and the other into the test set. The image below clarifies this example.

[image: the two channel-pair examples from one trial, one assigned to train and one to test]
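
For readers who want to avoid this in their own pipeline, one possible approach is a group-aware split, sketched below with scikit-learn's GroupShuffleSplit. This is not the code from the fix folder; it is a minimal illustration on synthetic data in which the groups array (one trial id per row) and all sizes are assumptions. Note that GroupShuffleSplit does not stratify by label, so class balance should be checked separately.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_trials, n_pairs, n_samples = 100, 2, 640

# Synthetic stand-in: every trial yields n_pairs rows of shape (2, 640).
x = rng.standard_normal((n_trials * n_pairs, 2, n_samples))
# Rows cut from the same trial share the same label ...
y = np.repeat(rng.integers(0, 4, size=n_trials), n_pairs)
# ... and the same group id, which is what keeps them together.
groups = np.repeat(np.arange(n_trials), n_pairs)

# test_size is the fraction of groups (trials), not of rows.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(x, y, groups=groups))

x_train, x_test = x[train_idx], x[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# No trial contributes rows to both sides of the split.
shared_trials = set(groups[train_idx]) & set(groups[test_idx])
print(f"trials in both sets: {len(shared_trials)}")
```

For a subject-independent evaluation, the same pattern could use a subject id per row as the group instead of a trial id.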

@Ananas120

Hi, I just started a PhD thesis on EEG analysis and BCI, so I am currently looking for papers to reproduce and to compare my results with.
I am sorry that the results of this paper were obtained with a bug in the splitting procedure, because they were quite impressive and I like the 1D-CNN architecture.
As you mentioned, @leoeooe, many papers seem to claim this kind of (too) high accuracy (this one also) without sharing their code, which prevents verification.
I am currently trying to reproduce CSP with filter bank and time-window data processing, as it seems to give really impressive results (if they are correct).
Nevertheless, I have found another paper using an SVM on BCI-IV 2a with CSP + FB + time windows. It achieves around 70-80% accuracy on average and the code seems consistent (I will share the link when I find it again), but that is not 90% yet and some subjects have poor accuracy (around 50%), so claiming >90% for all subjects seems really surprising.
Have you managed to reproduce some of these results since your last comment?

Next week I will try to integrate this kind of processing, as well as the STFT proposed in this paper, with the 1D-CNN and see whether such high accuracy is plausible on both the PhysioNet and BCI-IV 2a datasets.

@dawin2015

Thanks for sharing this information. I am reproducing the paper "BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to Learn From Massive Amounts of EEG Data", which claims an accuracy of up to 90% in MI. Have you ever taken a look at this paper? Thanks again!

@Mnaser95

Thanks all for the useful discussion. This is exactly why I insist on publishing the code that corresponds to any article I publish. Unfortunately, programming mistakes will happen sometimes and there's no way around it, no matter how hard you work to check your code before submission.
