
Data split for deep learning methods #474

Open
Sara04 opened this issue Aug 29, 2023 · 0 comments
Sara04 commented Aug 29, 2023

Hello,

At the moment, the data is split into train and test sets via StratifiedKFold cross-validation, so that the class distribution is preserved across the train and test sets. For the deep learning methods, validation_split is 0.2, so the last 20 percent of the training samples are used for validation (https://www.tensorflow.org/api_docs/python/tf/keras/Model). This means that chronology is ignored in the train/test split, while in the train/validation split it is respected. In addition, for Cho2017 and PhysionetMI (at least), the class distribution is not uniform over time, so there is a large mismatch between the training and validation subsets.
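A minimal sketch of the mismatch described above, using a hypothetical label array whose class distribution drifts over time (the array and all ratios here are illustrative assumptions, not taken from any of the datasets mentioned):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical session: the class distribution drifts over time
# (early trials are mostly class 0, late trials mostly class 1).
y = np.array([0] * 40 + [1] * 10 + [0] * 10 + [1] * 40)
X = np.arange(len(y)).reshape(-1, 1)  # trial index stands in for the epochs

# Train/test split via StratifiedKFold: class ratios are preserved in
# each fold, but trials from all time points end up in the training set.
skf = StratifiedKFold(n_splits=5, shuffle=False)
train_idx, test_idx = next(skf.split(X, y))

# Keras-style validation_split=0.2: the *last* 20 percent of the training
# samples become the validation set, i.e. chronology is respected here.
n_val = int(0.2 * len(train_idx))
fit_idx, val_idx = train_idx[:-n_val], train_idx[-n_val:]

# Because the class distribution drifts over time, the validation subset
# can be dominated by one class while the fit subset is not.
print("fit class counts:", np.bincount(y[fit_idx], minlength=2))
print("val class counts:", np.bincount(y[val_idx], minlength=2))
```

With this toy array the validation subset consists entirely of one class, illustrating how a chronological tail split interacts badly with a time-varying class distribution.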
