
Test train split of physnet2012 is scientifically flawed. #409

Open · 2 tasks done
Twofigo opened this issue May 11, 2024 · 2 comments
Labels: bug (Something isn't working)


Twofigo commented May 11, 2024

1. System Info

A recent, standard installation of PyPOTS, using scripts derived from BrewPOTS for some comparative model training.

2. Information

  • The official example scripts
  • My own created scripts

3. Reproduction

Call gene_physionet2012() anywhere; two separate runs can yield different splits (see the sketch below).
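
A minimal sketch of the repro, assuming gene_physionet2012 is importable from pypots.data and returns a dict of numpy arrays (the import path, keys, and call signature may vary by version):

```python
# Minimal repro sketch. Assumes gene_physionet2012 is importable from
# pypots.data and returns a dict of numpy arrays keyed by split name;
# exact import path, keys, and call signature may vary across versions.
import numpy as np
from pypots.data import gene_physionet2012

def per_split_checksums(dataset: dict) -> dict:
    # Per-split sums (ignoring NaNs): a different random split moves samples
    # between train/val/test, so these values change run to run even though
    # the sum over the whole dataset stays constant.
    return {k: float(np.nansum(v)) for k, v in dataset.items()
            if isinstance(v, np.ndarray)}

print(per_split_checksums(gene_physionet2012()))  # run 1
print(per_split_checksums(gene_physionet2012()))  # run 2: likely different
```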

4. Expected behavior

The gene_physionet2012() function in pypots.data performs a train/test split of the whole dataset with an unseeded random state. This means separate runs of the same program can produce different splits, which breaks the scientific utility of comparing results between runs. The function should accept a random seed so the split is fixed across executions; a sketch of what that could look like follows.
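
For illustration, a hypothetical sketch of the requested behavior; the `seed` parameter and the helper below are not existing PyPOTS API, and scikit-learn stands in for whatever splitter the library uses internally:

```python
# Hypothetical sketch only: `seed` is the argument this issue proposes,
# not an existing gene_physionet2012 parameter; sklearn is a stand-in.
from sklearn.model_selection import train_test_split

def split_deterministically(X, y, seed: int = 42):
    # A fixed random_state pins the partition, so every rerun, process,
    # and machine that passes the same seed gets the identical split.
    return train_test_split(X, y, test_size=0.2, random_state=seed)
```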

Furthermore, PhysioNet 2012 ships in three subsets (A, B, and C) that TSDB merges into one pool. I originally assumed this official partition was the split I was given when running gene_physionet2012(), but that appears to have been an incorrect assumption.

To be fully correct in my scientific methodology, I now need to rewrite both TSDB and gene_physionet2012() and rerun every experiment up to this point with a fixed seed (at minimum), or use the proper official split if I want to publish my results without a disclaimer.

To be clear, the examples in BrewPOTS do include a utility function that sets the torch and numpy random seeds globally, but this is a fragile workaround. You can only trust results when running the exact same code and code path: any other library that happens to draw from numpy.random before gene_physionet2012() is called advances the global random state, introducing a systematic discrepancy between code paths. The sketch below demonstrates the failure mode.
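
A self-contained demonstration of that failure mode in plain NumPy:

```python
# Why a global seed is fragile: one extra draw from the global NumPy RNG
# between seeding and splitting shifts the state and changes the result.
import numpy as np

np.random.seed(0)
split_a = np.random.permutation(10)  # the "split" code-path A produces

np.random.seed(0)
_ = np.random.random()               # some other library draws once first
split_b = np.random.permutation(10)  # code-path B: same seed, shifted state

print(np.array_equal(split_a, split_b))  # almost certainly False
```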

@Twofigo Twofigo added the bug Something isn't working label May 11, 2024
WenjieDu (Owner) commented

Hi there 👋,

Thank you so much for your attention to PyPOTS! You can follow me on GitHub to receive the latest news about PyPOTS. If you find PyPOTS helpful to your work, please star ⭐️ this repository. Your star is your recognition, which can help more people notice PyPOTS and grow the PyPOTS community. It matters and is definitely a kind of contribution to the community.

I have received your message and will respond ASAP. Thank you for your patience! 😃

Best,
Wenjie

@WenjieDu WenjieDu self-assigned this May 14, 2024
WenjieDu (Owner) commented

Hi @Twofigo, thank you for raising this discussion, but "scientifically flawed", seriously? The function just splits the dataset in a way that doesn't follow the official partition. At https://physionet.org/content/challenge-2012/1.0.0, the official documentation says "Four thousand records comprise training set A, and the remaining records form test sets B and C". Hence, the three sets A, B, and C should be assumed to share the same data distribution, right? So the splitting itself isn't a problem, but we do thank you for your feedback. In the result returned from the TSDB load function, we could include information about which set each sample comes from, or add an argument to return only samples from set A/B/C; a hypothetical sketch of that option is below.
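
For example (a hypothetical sketch; `subset` is not an existing tsdb.load() parameter, just an illustration of the idea):

```python
# Hypothetical sketch: `subset` is NOT an existing tsdb.load() parameter,
# only an illustration of the option proposed above.
import tsdb

data = tsdb.load("physionet_2012")  # current behavior: sets A, B, C merged
# proposed: tsdb.load("physionet_2012", subset="A")  # official training set only
```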

Let's talk about the random seed setting. Note that although a random seed can help, it is never a guaranteed way to reproduce your results, and there are many discussions on the net for you to refer to. The only way to ensure your generated dataset is the same everywhere (across dev environments, settings, and devices) is to serialize it, namely to save it into a file. And we do provide such functions: try pypots.data.saving.save_dict_into_h5() or pypots.data.saving.pickle_dump(). PyPOTS models can also directly take saved h5 files for model training; see the sketch below.
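
A sketch of that route, using the functions named above (exact signatures may differ across PyPOTS versions):

```python
# Generate the random split exactly once, serialize it, and have every
# experiment load the identical file. Exact signatures may differ across
# PyPOTS versions; see the pypots.data.saving docs.
from pypots.data import gene_physionet2012
from pypots.data.saving import save_dict_into_h5

dataset = gene_physionet2012()  # the one-and-only random split
save_dict_into_h5(dataset, "physionet2012_split.h5")

# Later runs on any machine train from the saved file instead of
# re-splitting; PyPOTS models can take h5 files directly, as noted above.
```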

Of course the random state changes when you call numpy.random again. This is not an issue. What else would you expect, that everything stays intact? Then why call it at all? All libraries and toolkits provide standard APIs to make people's lives easier, but that doesn't mean you can use them without understanding what they're doing.
