SMEP: Multiple Comparisons
Status: work in progress and wishlist
After working on multiple testing p-value corrections, I started to work on multiple comparisons. However, I stopped working on this partway through writing the basic versions. TukeyHSD is the only statistical test that has unit tests and is exposed in the documentation and API.
For background on the current version, see http://jpktd.blogspot.com/2013/03/multiple-comparison-and-tukey-hsd-or_25.html for the multiple comparisons and http://jpktd.blogspot.com/2013/04/multiple-testing-p-value-corrections-in.html for the p-value corrections.
- Interface : results, data format
- which comparisons : Tukey, Dunnett, all versus best, all versus mean, ...
- which statistic : mean, proportion, ranks (for nonparametric), ...
- alternative : one-sided, two-sided, equivalence, non-inferiority, ...
- which multiple testing procedure : p-value correction, and test statistic comparison (single step, step-wise), resampling/permutation tests
- associated results : confidence intervals if available (available in one-step procedures)
This is the simplest case: run any set of tests and call multipletests on the resulting p-values.
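A minimal sketch of this p-value correction route, assuming `scipy.stats.ttest_ind` as the per-comparison test and `statsmodels.stats.multitest.multipletests` with Holm's method. The data, group shifts, and variable names are made up for illustration.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Hypothetical data: three treatment groups, each compared with one control.
control = rng.normal(loc=0.0, scale=1.0, size=30)
treatments = [rng.normal(loc=shift, scale=1.0, size=30) for shift in (0.0, 0.5, 1.5)]

# Step 1: run any set of tests and collect the raw p-values.
raw_pvalues = np.array([stats.ttest_ind(t, control).pvalue for t in treatments])

# Step 2: correct the p-values for multiplicity (Holm step-down here).
reject, pvals_corrected, _, _ = multipletests(raw_pvalues, alpha=0.05, method="holm")

for raw, adj, rej in zip(raw_pvalues, pvals_corrected, reject):
    print(f"raw p={raw:.4f}  adjusted p={adj:.4f}  reject={rej}")
```

This route only needs p-values, so it works for any test, but it does not produce simultaneous confidence intervals.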
- examples :
-
-
- pairwise_proportion : Tukey, all pairs
- pairwise_proportion_control : Dunnett comparison with control
-
Assumptions : all groups are independent random variables, variance is identical, sample size can differ
- example :
-
- TukeyHSD : sandbox.stats.multicomp
- Dunnett (missing)
critical values and p-values : tabulated. For TukeyHSD they are available from Roger Lew; for Dunnett there is only a coarse table of critical values. Any other structures might need explicit integration as in "General Parametric Models".
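For reference, a short usage sketch of the one test that is exposed, assuming the public entry point `statsmodels.stats.multicomp.pairwise_tukeyhsd`. The groups, means, and sample sizes are invented; sample sizes differ on purpose, the variance is the same in all groups.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(12345)

# Hypothetical one-way layout: equal variance, unequal sample sizes.
group_means = {"a": 0.0, "b": 0.2, "c": 3.0}
group_sizes = {"a": 10, "b": 15, "c": 12}

values = np.concatenate(
    [rng.normal(group_means[g], 1.0, size=group_sizes[g]) for g in group_means]
)
labels = np.repeat(list(group_means), list(group_sizes.values()))

# All pairwise comparisons of means with simultaneous confidence intervals.
res = pairwise_tukeyhsd(values, labels, alpha=0.05)
print(res.summary())

# One row per pair, ordered (a, b), (a, c), (b, c).
print(res.meandiffs)  # differences in group means
print(res.confint)    # simultaneous confidence intervals
print(res.reject)     # True where the interval excludes zero
```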
Assumptions : groups are independent or paired, variances and sample sizes differ across groups
Some articles recommend using the Welch t-test in this case, e.g. Keselman 1998 for repeated measures.
(This should already work by calling pairwise on t_test.)
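A minimal sketch of the unequal-variance case for independent groups, assuming Welch t-tests from `scipy.stats.ttest_ind(..., equal_var=False)` on every pair followed by a Holm correction. This is the generic p-value route, not a dedicated procedure, and the group names and data are hypothetical. The paired case is not covered here.

```python
from itertools import combinations

import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2013)

# Hypothetical groups with different variances and sample sizes.
groups = {
    "g1": rng.normal(0.0, 1.0, size=12),
    "g2": rng.normal(0.0, 3.0, size=25),
    "g3": rng.normal(4.0, 0.5, size=8),
}

# All pairs: (g1, g2), (g1, g3), (g2, g3).
pairs = list(combinations(groups, 2))

# Welch t-test for each pair: no pooled variance estimate.
raw = np.array(
    [stats.ttest_ind(groups[a], groups[b], equal_var=False).pvalue for a, b in pairs]
)

# Adjust for the number of comparisons.
reject, adjusted, _, _ = multipletests(raw, alpha=0.05, method="holm")

for (a, b), p_adj, rej in zip(pairs, adjusted, reject):
    print(f"{a} vs {b}: adjusted p={p_adj:.4g}, reject={rej}")
```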
Based on a linear model with weak assumptions: variance heterogeneity can be modelled or handled with a "sandwich" covariance estimate. Assumptions : (asymptotic) normality of the parameter estimates and a consistent estimate of cov_params.
see Herberich, Sikorski, Hothorn 2010 for use with HAC covariance
my old start on adding one-way and two-way multiple comparison on top of OLS: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/stats/contrast_tools.py
Needs box interval probabilities for the normal and t distributions: these are available in the sandbox but are not numerically robust and need tweaking of options for more extreme cases. They are not available for the easier and faster case with a product correlation structure.
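To illustrate why the product correlation case is easier, here is a sketch for the normal distribution only. If corr(Z_i, Z_j) = lam_i * lam_j, each Z_i can be written as lam_i * U + sqrt(1 - lam_i**2) * e_i with independent standard normal U and e_i, so conditional on U the components are independent and the box probability reduces to a one-dimensional integral. The function name `box_prob_product_corr` is hypothetical and nothing like it exists in the sandbox; the t distribution would need an additional integration over the variance estimate.

```python
import numpy as np
from scipy import integrate, stats


def box_prob_product_corr(c, lam):
    """P(|Z_i| <= c for all i), Z standard normal with corr(Z_i, Z_j) = lam_i * lam_j.

    Hypothetical helper for illustration; requires |lam_i| < 1.
    """
    lam = np.asarray(lam, dtype=float)
    if np.any(np.abs(lam) >= 1):
        raise ValueError("each lam_i must satisfy |lam_i| < 1")
    scale = np.sqrt(1.0 - lam**2)

    def integrand(u):
        # Conditional on U = u, Z_i ~ N(lam_i * u, 1 - lam_i**2), independently.
        upper = stats.norm.cdf((c - lam * u) / scale)
        lower = stats.norm.cdf((-c - lam * u) / scale)
        return stats.norm.pdf(u) * np.prod(upper - lower)

    prob, _ = integrate.quad(integrand, -np.inf, np.inf)
    return prob


k = 3
# Comparison with a control and equal group sizes: all correlations are 0.5.
prob_dunnett = box_prob_product_corr(2.0, np.full(k, np.sqrt(0.5)))
# Independent statistics as a check: the probability is a plain product.
prob_indep = box_prob_product_corr(2.0, np.zeros(k))
print(prob_dunnett, prob_indep)
```

A critical value for a given coverage could then be found by root finding on `c`, for example with `scipy.optimize.brentq`.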
Horn and Dunnett 2004, ...
Tamhane and Dunnett 1999
Bofinger and Bofinger 1995
the book Multiple Comparisons Using R, and articles by the same authors
Bretz, Frank, Torsten Hothorn, and Peter H. Westfall. 2011. Multiple Comparisons Using R. Boca Raton, FL: CRC Press.
Herberich, Esther, Johannes Sikorski, and Torsten Hothorn. 2010. “A Robust Procedure for Comparing Multiple Means Under Heteroscedasticity in Unbalanced Designs.” PLoS ONE 5 (3) (March 29): e9788. doi:10.1371/journal.pone.0009788.
Hothorn, Torsten, Frank Bretz, and Peter Westfall. 2008. “Simultaneous Inference in General Parametric Models.” Biometrical Journal 50 (3) (June): 346–363. doi:10.1002/bimj.200810425.