Overall discussion for Tip 4 #245

SiminaB · 2020-10-07T18:12:41Z

This is to discuss outstanding issues for Tip 4: Know your data and your question.
https://github.com/Benjamin-Lee/deep-rules/blob/master/content/06.know-your-problem.md

SiminaB · 2020-10-07T18:36:53Z

I'm not sure the data simulation portion belongs here:

Data simulation is a powerful approach to develop an understanding of how data and analytical
methods interact. In data simulation, a model is used to learn the true distribution of a training set for
the purpose of creating new data points. Often, researchers may perform simulations under some
assumptions about the data generating process to identify useful model architectures and
hyperparameters. Simulated datasets can be used to verify the correctness of a model’s
implementation. To accurately test the performance of the model, it is important that simulated
datasets be generated for a range of parameters. For example, varying the parameters to violate the
model’s assumptions can test the sensitivity of the model’s performance. Parameter tuning the
simulation can help researchers identify the key features that drive method performance. In other
cases, neural networks can be used to simulate data to better understand how to structure analyses.
For example, it is possible to study how analytical strategies cope with varying number of noise
sources by using neural networks to simulate genome-wide data [24]. Simulating data from
assumptions about the data generating distribution can help to debug or characterize deep learning
models, and deep learning models can also simulate data in cases where it is hard to make
reasonable assumptions from first principles.
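A minimal sketch of what the quoted paragraph describes: simulate data from a known generating process, check that a fitting routine recovers the true parameters (verifying the implementation), then deliberately violate an assumption (here, Gaussian noise) and sweep a parameter to see how performance degrades. A plain linear model fit by ordinary least squares stands in for a deep learning model; `simulate`, `fit_ols` and all parameter values are hypothetical illustrations, not taken from the manuscript.

```python
import random


def simulate(n, slope, intercept, noise_sd, heavy_tailed=False, seed=0):
    """Draw (x, y) pairs from a known model: y = intercept + slope * x + noise."""
    rng = random.Random(seed)  # seeded so each simulated dataset is reproducible
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        if heavy_tailed:
            # Violate the Gaussian-noise assumption: a ratio of two normals is
            # Cauchy-distributed; the denominator is floored to avoid division by zero.
            noise = noise_sd * rng.gauss(0.0, 1.0) / max(abs(rng.gauss(0.0, 1.0)), 1e-3)
        else:
            noise = rng.gauss(0.0, noise_sd)
        xs.append(x)
        ys.append(intercept + slope * x + noise)
    return xs, ys


def fit_ols(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    true_slope = 2.0
    # Sweep the noise level, with and without the assumption violation.
    for noise_sd in (0.1, 0.5, 1.0):
        for heavy in (False, True):
            xs, ys = simulate(2000, true_slope, 0.5, noise_sd, heavy_tailed=heavy)
            est_slope, _ = fit_ols(xs, ys)
            print(f"noise_sd={noise_sd} heavy_tailed={heavy} "
                  f"slope error={abs(est_slope - true_slope):.3f}")
```

Because the true parameters are known by construction, any large recovery error under the well-specified setting points to an implementation bug rather than to the data.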

pstew · 2020-10-07T19:35:43Z

Agreed. It still feels out of place, even after it was moved (from tip 1?) and further tweaked (e.g. #234). Not sure about where it fits best but open to suggestions.

signalbash · 2020-10-11T22:46:20Z

Simulating data from assumptions about the data generating distribution can help to debug or characterize deep learning models, and deep learning models can also simulate data in cases where it is hard to make reasonable assumptions from first principles.

First half of this sentence feels very hard to read.

Benjamin-Lee · 2020-10-17T18:52:33Z

@pstew and @SiminaB the data simulation paragraph just jumped out at me for feeling out of place both content-wise and stylistically. I'm tempted to cut it down to a sentence or two and tuck it into an existing paragraph somewhere. The paper is already quite long and I don't think this paragraph is adding a ton of useful content.

pstew · 2020-10-19T13:19:37Z

@Benjamin-Lee I agree. Go for it.

Benjamin-Lee · 2020-10-19T20:42:36Z

@pstew review requested for #269

pstew · 2020-10-20T13:32:18Z

@Benjamin-Lee Thanks! Approved!

SiminaB mentioned this issue on Oct 7, 2020 in #252, "Outstanding issues not specific to any tips" (open).
