
DGAN for ECG dataset #162

Open
sanketahegde opened this issue Jul 21, 2023 · 3 comments

Comments

@sanketahegde

Hello,

I have been trying to apply the DoppelGANger (DGAN) model to my 1-lead ECG dataset to generate synthetic data, but after several attempts and some basic hyperparameter tuning, the model does not learn the pattern of the ECG.
So I wanted to confirm whether DGAN is even applicable to biosignal or ECG data generation.
Any suggestions are welcome!

Thank you in advance.

@kboyd
Contributor

kboyd commented Jul 29, 2023

Hi @sanketahegde, thanks for trying out DGAN and asking questions! In general, DGAN is quite good for biosignals when sufficient training data is available, but I know ECG data has very specific properties that need to be preserved. To get the most out of DGAN, I'd recommend thinking about the following items:

  1. What is an example? That is, how long are the sequences that DGAN independently generates (the max_sequence_len parameter)? Generally speaking, shorter sequences are easier. I'd try sequences that contain only 2-3 heartbeats (maybe even just 1) to start, and then expand to longer sequences once the shorter ones are working (see the config sketch after this list).

  2. How much data do you have? DGAN really excels when there are lots of training examples, 10k or more, and you might even target 100k+ to learn the intricacies of ECGs. So if you use 2 seconds of ECG as the example length, that would mean on the order of 100k 2-second snippets. With time series, you can use sliding windows to increase the number of training examples when splitting up longer sequences (see the windowing sketch after this list). But that may also make the model's learning task a bit harder if each training sequence starts at a very different point in the ECG period. Definitely experiment with different ways to construct the training data.

  3. Hyperparameters are absolutely key. It's great that you've already explored some hyperparameter tuning. I've found the most impactful parameters to explore are the learning rates and the number of epochs. Besides finding the right order of magnitude, DGAN can be fairly sensitive to even 30% changes in these values, so a thorough exploration with grid search, or with a library like optuna, can be really powerful (see the search sketch below). And of course having a good metric to optimize for is critical: there's not really a loss that can be used for early stopping with GANs, so utilizing metrics tailored to ECGs would be best.
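
On item 1, here's a minimal sketch of how the example length maps to the DGAN config in gretel-synthetics; the 250 Hz sampling rate, array shapes, and random placeholder data are assumptions for illustration:

```python
import numpy as np
from gretel_synthetics.timeseries_dgan.config import DGANConfig
from gretel_synthetics.timeseries_dgan.dgan import DGAN

# Assumed setup: 2-second examples of 1-lead ECG sampled at 250 Hz.
config = DGANConfig(
    max_sequence_len=500,  # 2 s * 250 Hz; each generated sequence has this length
    sample_len=10,         # must evenly divide max_sequence_len
    batch_size=1000,
    epochs=100,
)

model = DGAN(config)

# Replace with real ECG windows; shape is (n_examples, max_sequence_len, 1).
features = np.random.rand(10_000, 500, 1).astype(np.float32)
model.train_numpy(features=features)

# generate_numpy returns (attributes, features); attributes is None here.
_, synthetic = model.generate_numpy(1000)
```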
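
On item 2, a small sketch of the sliding-window construction; the window and stride values are just illustrative:

```python
import numpy as np

def sliding_windows(signal: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Split a long 1-D signal into (possibly overlapping) fixed-length windows."""
    starts = range(0, len(signal) - window + 1, stride)
    return np.stack([signal[s:s + window] for s in starts])

# Illustrative: 2-second windows at 250 Hz with 50% overlap roughly doubles
# the number of training examples versus non-overlapping windows.
ecg = np.random.rand(250 * 60 * 60)  # placeholder for a 1-hour recording
windows = sliding_windows(ecg, window=500, stride=250)
features = windows[:, :, np.newaxis]  # (n_examples, 500, 1) for train_numpy
```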
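
And on item 3, a sketch of what an optuna search over the learning rates and epochs could look like, reusing the names from the sketches above; ecg_fidelity_score stands in for a user-supplied, ECG-specific metric (e.g., comparing heart-rate or QRS-interval statistics between real and synthetic data); the library doesn't provide one:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    config = DGANConfig(
        max_sequence_len=500,
        sample_len=10,
        generator_learning_rate=trial.suggest_float("g_lr", 1e-5, 1e-3, log=True),
        discriminator_learning_rate=trial.suggest_float("d_lr", 1e-5, 1e-3, log=True),
        epochs=trial.suggest_int("epochs", 50, 400),
        batch_size=1000,
    )
    model = DGAN(config)
    model.train_numpy(features=features)
    _, synthetic = model.generate_numpy(500)
    return ecg_fidelity_score(synthetic)  # hypothetical ECG-specific metric

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```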

Hope that provides some experiment ideas. And if you're willing to share a notebook or code snippet showing how you're setting up the training data and the model, I'm happy to take a look and see if there are any more specific recommendations.

@sanketahegde
Author

Hi @kboyd,

Thank you very much for your detailed reply with suggestions.
My work with DGAN is currently on hold, but I shall try to apply your suggestions and report back here if I get better results.

@Manuelhrokr

This is an interesting discussion.

I have been running some basic experiments on my TS data using DGAN. My main goal is to create synthetic time series while preserving, as well as possible, the fidelity and flexibility properties of my data (i.e., as defined by the original authors of the method in their paper). However, there's no free lunch, and in my particular case ~2k to 2.5k samples with max_sequence_len = 24 is the best I can do, given the hourly resolution of my data. Hence, following the recommendations from @kboyd, I mostly rely on (3) to enhance, as much as possible, the fidelity and flexibility of the synthetic samples.

Finally, does the DGAN implementation allow using a seed S to generate N samples, with a different seed each time? That is, assuming I have 2k new 24-hour synthetic TS samples, I would like to use a new seed S to generate another set of 2k synthetic samples, and so on. I assume a new run of DGAN would approximate this behavior, right? Something like the sketch below is what I have in mind.
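
For concreteness, a hypothetical helper illustrating the behavior I'm after; I'm assuming DGAN draws its noise from torch's (and numpy's) global RNGs, so seeding them before each call would make each batch reproducible:

```python
import numpy as np
import torch

def generate_with_seed(model, n, seed):
    # Hypothetical helper: seed the global RNGs that (presumably) drive
    # DGAN's noise inputs, so each seed yields a reproducible, distinct batch.
    torch.manual_seed(seed)
    np.random.seed(seed)
    _, features = model.generate_numpy(n)
    return features

batch_a = generate_with_seed(model, 2000, seed=0)
batch_b = generate_with_seed(model, 2000, seed=1)  # a different draw, same trained model
```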

Comments/feedback on these questions would be appreciated.

Thanks!
