Some kind of exemplar(s) for exploratory analysis (as opposed to flexible / hacked analysis) #21
Maybe also discuss the process of using a flexible/hacked pilot experiment to explore, formulate a hypothesis, and estimate noise. Then use those results to specify a confirmatory experiment. It's often tough to figure out parameter ranges and the required N without a purposefully sketchy pilot.
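A minimal sketch of the pilot-to-confirmatory step this describes, assuming Python with statsmodels (the pilot means, SD, and target power below are all made-up numbers for illustration):

```python
import math
from statsmodels.stats.power import TTestIndPower

# Hypothetical pilot results (all numbers made up): condition means and a
# pooled SD estimated from a small, purposefully sketchy pilot.
pilot_mean_a, pilot_mean_b, pilot_sd = 4.2, 5.0, 1.6

# The pilot's noise estimate gives a standardized effect size (Cohen's d)...
d = (pilot_mean_b - pilot_mean_a) / pilot_sd

# ...which lets us solve for the per-group N of the confirmatory experiment.
n_per_group = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.8)
print(f"pilot-based d = {d:.2f}; need ~{math.ceil(n_per_group)} per group")
```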
I think this may be beyond the scope of the FAQ. This issue basically introduces the CHI community to a new meta-method for statistical analysis. I'd think that a suitable form for communicating this is an archival publication together with a tool that people can use to do such analysis (like Jake's note on the Aligned Rank Transform).
As for a general FAQ entry on exploratory analysis guidelines, I like what Pierre mentioned in the email thread "Re: Exploratory versus confirmatory analyses in a 7th grade class". I quote this below in list form.
Another quote from Tukey's EDA book:
Taking this quote in the context of 2017, could it be that the problem is the use of confirmatory procedures to explore data? (One instance is p-hacking.)
Interesting question. I do tend to think that p-values are almost meaningless in exploratory analysis (I suppose the multiverse paper is proposing one way around that). I don't think this is out of scope, though. There is so much exploratory analysis going on at CHI that I think this is firmly within scope. I agree that a paper could probably also be written on it, but that doesn't put it out of scope.
Pilot studies are standard practice and often reported, so I'd say they're within scope. A similar practice is multi-experiment studies, where each experiment informs the next one. I've not read the multiverse paper yet, but I love the idea, although it seems more like a new direction for improving practice than something that's really widespread. We can briefly mention new and emerging ideas/directions but should probably treat them differently from widespread practices?
p-values and exploratory analysis: this could be something interesting to discuss in one of the guidelines. I think it depends on whether the p-values are cherry-picked. Say my exploratory analysis involves 20 t-tests and I report all of them (or similarly, I report 20 CIs and look at which ones don't cross zero). I think in an exploratory analysis that's fine. There's no correction for multiplicity, but each p-value taken separately is "correct", i.e., it meets its definition. It's very different from reporting only the p-values that are statistically significant, because in that case they're not correct anymore. I agree the first case still feels somehow dangerous; perhaps it's worth reminding the reader that some or many of the significant outcomes may be pure noise (and if there are only 1–3 of these, it's likely all of them).

Now I could decide against p-values and CIs because they're too misleading, and only report point estimates. Is that better? I'm not sure, as we'd now be missing information about statistical error. And suppose for some reason I chose to report only the most "impressive" point estimates and not mention the others: that seems just as problematic as the p-based cherry-picking from before. Maybe the problem is more a problem of cherry-picking than a problem of p-values?

I suppose it's possible for an exploratory analysis to be planned and to report everything graphically in the paper. The author could make crazy speculations based on the graphs, but at least everything is there for the reader to see. But would that still be called an exploratory analysis? I don't know.
The case of 20 t-tests or CIs is one I've encountered a number of times in CHI papers. Maybe an exemplar of using a multiple comparison adjustment (e.g., Bonferroni) would be useful?
As an exemplar, OK, but as a recommendation it's complicated. There are some pretty good arguments against the systematic use of adjustments.
The arguments for and against are complicated, but if we can manage to summarize the literature and distill it into general guidelines, it would be incredibly useful to CHI. I went through some of the literature already but would have to dive into it again.
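A minimal sketch of what such an exemplar could look like, assuming Python with NumPy/SciPy/statsmodels (the data, conditions, and effect sizes are invented for illustration):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Hypothetical exploratory study: 20 outcome measures compared between
# two conditions (all numbers are made up for illustration).
n_outcomes, n_per_group = 20, 30
cond_a = rng.normal(0.0, 1.0, size=(n_outcomes, n_per_group))
cond_b = rng.normal(0.2, 1.0, size=(n_outcomes, n_per_group))

# Run all 20 t-tests and keep every p-value, not just the significant ones.
p_values = np.array([
    stats.ttest_ind(cond_a[i], cond_b[i]).pvalue for i in range(n_outcomes)
])

# Bonferroni: multiply each p-value by the number of tests (capped at 1).
p_bonferroni = np.minimum(p_values * n_outcomes, 1.0)

# Holm is uniformly more powerful than Bonferroni with no extra assumptions.
reject, p_holm, _, _ = multipletests(p_values, alpha=0.05, method="holm")

for i in range(n_outcomes):
    print(f"outcome {i:2d}: p = {p_values[i]:.3f}, "
          f"Bonferroni = {p_bonferroni[i]:.3f}, Holm = {p_holm[i]:.3f}")
```

Reporting all three columns side by side would let the exemplar show both positions in this thread: each unadjusted p-value is individually "correct", while the adjusted columns make the multiplicity risk visible.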
From Pierre:
Two approaches we might make exemplars of (rough sketches of each follow the list):
The "multiverse analysis" idea: reporting many possible analyses and the conclusions that would result from them (like this: http://www.stat.columbia.edu/~gelman/research/published/multiverse_published.pdf, though I would prefer a forest plot over a histogram of p values, to emphasize estimation over testing).
"Run all the models and then combine them" approach, per McElreath (chapter 6 of http://xcelab.net/rm/statistical-rethinking/). This is use WAIC or LOO (or some measure of model performance) to make a weighted average of the models.