Overall discussion for Tip 4 #245

Open
SiminaB opened this issue Oct 7, 2020 · 7 comments

Comments

@SiminaB
Collaborator

SiminaB commented Oct 7, 2020

This is to discuss outstanding issues for Tip 4: Know your data and your question.
https://github.com/Benjamin-Lee/deep-rules/blob/master/content/06.know-your-problem.md

@SiminaB
Collaborator Author

SiminaB commented Oct 7, 2020

  • I'm not sure the data simulation portion belongs here:

Data simulation is a powerful approach to develop an understanding of how data and analytical
methods interact. In data simulation, a model is used to learn the true distribution of a training set for
the purpose of creating new data points. Often, researchers may perform simulations under some
assumptions about the data generating process to identify useful model architectures and
hyperparameters. Simulated datasets can be used to verify the correctness of a model’s
implementation. To accurately test the performance of the model, it is important that simulated
datasets be generated for a range of parameters. For example, varying the parameters to violate the
model’s assumptions can test the sensitivity of the model’s performance. Parameter tuning the
simulation can help researchers identify the key features that drive method performance. In other
cases, neural networks can be used to simulate data to better understand how to structure analyses.
For example, it is possible to study how analytical strategies cope with varying number of noise
sources by using neural networks to simulate genome-wide data [24]. Simulating data from
assumptions about the data generating distribution can help to debug or characterize deep learning
models, and deep learning models can also simulate data in cases where it is hard to make
reasonable assumptions from first principles.
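
For readers of this thread, the sensitivity check described in the quoted paragraph (simulate data under known assumptions, then vary the simulation parameters to violate them) could be sketched roughly as below. This is only an illustration and not code from the manuscript: an ordinary least-squares line fit stands in for the deep learning model, and every name and parameter value (`simulate`, `fit_ols`, `slope_error`, `outlier_frac`, the true slope and intercept, the factor of 20 for outlier noise) is a hypothetical choice made for the example.

```python
import random
import statistics

# Assumed ground truth for the simulation (hypothetical values).
TRUE_SLOPE = 2.0
TRUE_INTERCEPT = 1.0


def simulate(n, noise_sd, outlier_frac, rng):
    """Draw n (x, y) pairs from a linear model with Gaussian noise.

    A fraction `outlier_frac` of points gets 20x larger noise, which
    violates the constant-variance assumption behind least squares.
    """
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        sd = noise_sd * 20 if rng.random() < outlier_frac else noise_sd
        xs.append(x)
        ys.append(TRUE_SLOPE * x + TRUE_INTERCEPT + rng.gauss(0.0, sd))
    return xs, ys


def fit_ols(xs, ys):
    """Closed-form least-squares fit; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def slope_error(outlier_frac, n_reps=200, seed=0):
    """Mean absolute slope error over repeated simulated datasets."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_reps):
        xs, ys = simulate(100, 1.0, outlier_frac, rng)
        est_slope, _ = fit_ols(xs, ys)
        errors.append(abs(est_slope - TRUE_SLOPE))
    return statistics.fmean(errors)


# Sweep the parameter that controls how badly the assumption is violated.
results = {frac: slope_error(frac) for frac in (0.0, 0.1, 0.3)}
for frac, err in results.items():
    print(f"outlier_frac={frac:.1f}  mean |slope error|={err:.3f}")
```

Because the true slope is known, the first setting (`outlier_frac=0.0`) doubles as an implementation check, while the remaining settings show how quickly the estimate degrades as the assumption is broken.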

@pstew
Collaborator

pstew commented Oct 7, 2020

Agreed. It still feels out of place, even after it was moved (from tip 1?) and further tweaked (e.g. #234). Not sure about where it fits best but open to suggestions.

@signalbash
Collaborator

Simulating data from assumptions about the data generating distribution can help to debug or characterize deep learning models, and deep learning models can also simulate data in cases where it is hard to make reasonable assumptions from first principles.

First half of this sentence feels very hard to read.

@Benjamin-Lee
Owner

@pstew and @SiminaB the data simulation paragraph just jumped out at me for feeling out of place both content-wise and stylistically. I'm tempted to cut it down to a sentence or two and tuck it into an existing paragraph somewhere. The paper is already quite long and I don't think this paragraph is adding a ton of useful content.

@pstew
Collaborator

pstew commented Oct 19, 2020

@Benjamin-Lee I agree. Go for it.

@Benjamin-Lee
Owner

@pstew review requested for #269

@pstew
Collaborator

pstew commented Oct 20, 2020

@Benjamin-Lee Thanks! Approved!
