
Data split for deep learning methods #474

Open
Sara04 opened this issue Aug 29, 2023 · 0 comments
Sara04 commented Aug 29, 2023

Hello,

At the moment, the data is split into train and test sets via StratifiedKFold cross-validation, so that the class distribution is preserved across the train and test sets. For the deep learning methods, validation_split is 0.2, so the last 20 percent of the training samples are used for validation (https://www.tensorflow.org/api_docs/python/tf/keras/Model). This means that chronology is ignored in the train/test split, while in the train/validation split it is respected. In addition, for Cho2017 and PhysionetMI (at least), the class distribution is not uniform over time, so there is a large mismatch between the training and validation subsets.
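A minimal sketch of the mismatch described above, using a hypothetical label array whose class distribution drifts over time (the array and all ratios here are illustrative assumptions, not taken from any of the datasets mentioned):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical session: the class distribution drifts over time
# (early trials are mostly class 0, late trials mostly class 1).
y = np.array([0] * 40 + [1] * 10 + [0] * 10 + [1] * 40)
X = np.arange(len(y)).reshape(-1, 1)  # trial index stands in for the epochs

# Train/test split via StratifiedKFold: class ratios are preserved in
# each fold, but trials from all time points end up in the training set.
skf = StratifiedKFold(n_splits=5, shuffle=False)
train_idx, test_idx = next(skf.split(X, y))

# Keras-style validation_split=0.2: the *last* 20 percent of the training
# samples become the validation set, i.e. chronology is respected here.
n_val = int(0.2 * len(train_idx))
fit_idx, val_idx = train_idx[:-n_val], train_idx[-n_val:]

# Because the class distribution drifts over time, the validation subset
# can be dominated by one class while the fit subset is not.
print("fit class counts:", np.bincount(y[fit_idx], minlength=2))
print("val class counts:", np.bincount(y[val_idx], minlength=2))
```

With this toy array the validation subset consists entirely of one class, illustrating how a chronological tail split interacts badly with a time-varying class distribution.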
