Reference: scaling of scatter matrix to get covariance #3220
Comments
adding this here: We should have some helper function or additional methods attached to the robust |
reminder to myself "M:\josef\eclipsegworkspace\statsmodels-git\local_scripts\local_scripts\try_rlm_winsorized.py" |
example calculation using scipy.stats expect method (lines copied out of sequence)
|
this is also related to #3181
another related reference looks useful, but I don't know where it should go
Boudt, Kris, Jonathan Cornelissen, and Christophe Croux. 2011. “The Gaussian Rank Correlation Estimator: Robustness Properties.” Statistics and Computing 22 (2): 471–83. doi:10.1007/s11222-011-9237-0.
the Gaussian rank correlation is consistent and asymptotically efficient (same asymptotic variance as Pearson) at the normal distribution
not sure yet where to put this
something like this for the asymptotic variance, matching the examples in the two articles:
spearman is more complicated with terms like this (if my odint does what I think it does)
|
two more references with consistency factors for covariance
I'm using Table 1 from Croux and Haesbroeck as test reference numbers (I initially wrote my function partly by trial and error to get correct results in Monte Carlo).
They have more general expressions for elliptically symmetric distributions (based on the g function).
Croux, Christophe, and Gentiane Haesbroeck. 1999. “Influence Function and Efficiency of the Minimum Covariance Determinant Scatter Matrix Estimator.” Journal of Multivariate Analysis 71 (2): 161–90. doi:10.1006/jmva.1999.1839.
Riani, Marco, Andrea Cerioli, and Francesca Torti. 2014. “On Consistency Factors and Efficiency of Robust S-Estimators.” TEST 23 (2): 356–87. doi:10.1007/s11749-014-0357-7.
(not sure what's the closest issue to this) I just saw MASS has a function |
" Soloveychik, Ilya, and Ami Wiesel. “Performance Analysis of Tyler’s Covariance Estimator.” IEEE Transactions on Signal Processing 63, no. 2 (January 2015): 418–26. https://doi.org/10.1109/TSP.2014.2376911. the usual normalized Tyler's scatter matrix has trace(S) = p aside: I saw several articles for regularized or shrinkage Tyler scatter matrix (in analogy to regularizing/shrinking sample cov) large overview of Tyler's scatter and several more recent articles (I skimmed only a few parts) Ashurbekova, Karina, Antoine Usseglio-Carleve, Florence Forbes, and Sophie Achard. “Optimal Shrinkage for Robust Covariance Matrix Estimators in a Small Sample Size Setting,” March 2021. https://hal.science/hal-02378034. Goes, John, Gilad Lerman, and Boaz Nadler. “Robust Sparse Covariance Estimation by Thresholding Tyler’s M-Estimator.” The Annals of Statistics 48, no. 1 (February 2020): 86–110. https://doi.org/10.1214/18-AOS1793. Hediger, Simon, Jeffrey Näf, and Michael Wolf. “R-NL: Covariance Matrix Estimation for Elliptical Distributions Based on Nonlinear Shrinkage.” IEEE Transactions on Signal Processing 71 (2023): 1657–68. https://doi.org/10.1109/TSP.2023.3270742. Ollila, Esa. “Linear Shrinkage of Sample Covariance Matrix or Matrices under Elliptical Distributions: A Review.” arXiv, August 9, 2023. https://doi.org/10.48550/arXiv.2308.04721. Ollila, Esa, Daniel P. Palomar, and Frédéric Pascal. “Shrinking the Eigenvalues of M-Estimators of Covariance Matrix.” IEEE Transactions on Signal Processing 69 (2021): 256–69. https://doi.org/10.1109/TSP.2020.3043952. Zhang, Teng, and Ami Wiesel. “Automatic Diagonal Loading for Tyler’s Robust Covariance Estimator.” In 2016 IEEE Statistical Signal Processing Workshop (SSP), 1–5, 2016. https://doi.org/10.1109/SSP.2016.7551741. another recent review article that looks good and is shorter than the Wiesel now mini-book Taskinen, Sara, Gabriel Frahm, Klaus Nordhausen, and Hannu Oja. 
“A Review of Tyler’s Shape Matrix and Its Extensions.” In Robust and Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler, edited by Mengxi Yi and Klaus Nordhausen, 23–41. Cham: Springer International Publishing, 2023. https://doi.org/10.1007/978-3-031-22687-8_2. aside: Nordhausen is co-author or maintainer of several R packages that include extensions of Tyler's scatter estimation |
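For orientation, the trace normalization mentioned above can be sketched with the standard fixed-point iteration for Tyler's shape matrix. This is a minimal sketch and not statsmodels code: the function name `tyler_shape` and its arguments are hypothetical, and it assumes the center is known (or already subtracted), that n > p, and that no observation equals the center exactly.

```python
import numpy as np


def tyler_shape(data, center=None, maxiter=200, tol=1e-8):
    """Tyler's shape matrix via fixed-point iteration, with trace(S) = p.

    Hypothetical helper for illustration; assumes a known center.
    """
    x = np.asarray(data, dtype=float)
    if center is not None:
        x = x - center
    n, p = x.shape
    shape = np.eye(p)
    for _ in range(maxiter):
        # d_i = x_i' S^{-1} x_i for the current iterate S
        d = np.sum(x * np.linalg.solve(shape, x.T).T, axis=1)
        # weighted scatter: (p / n) * sum_i x_i x_i' / d_i
        new = (p / n) * (x / d[:, None]).T @ x
        # Tyler's estimator is only defined up to scale; fix trace(S) = p
        new *= p / np.trace(new)
        converged = np.max(np.abs(new - shape)) < tol
        shape = new
        if converged:
            break
    return shape
```

The result only identifies the shape of the scatter matrix. Turning it into a covariance estimate still needs a separate "size" factor, which is the topic of this issue.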
New article with an explicit scale estimate for Tyler's shape matrix
Ollila, Esa, Daniel P. Palomar, and Frederic Pascal. “Affine Equivariant Tyler’s M-Estimator Applied to Tail Parameter Learning of Elliptical Distributions.” arXiv, May 7, 2023. https://doi.org/10.48550/arXiv.2305.04330.
brief skimming: I can try it out in PR #8129
in #9227 I use an M-scale to scale the shape matrix with det(shape)=1, with consistency, scale_bias at normal distribution. |
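As a small illustration of the det(shape)=1 convention, a scatter matrix can be split into a shape with unit determinant and a scalar size. This sketch is not the code from #9227 and does not include the M-scale or scale_bias step; the function name `normalize_det` is hypothetical.

```python
import numpy as np


def normalize_det(scatter):
    """Split a positive definite scatter matrix into (shape, size), det(shape) = 1.

    Hypothetical helper for illustration only.
    """
    scatter = np.asarray(scatter, dtype=float)
    p = scatter.shape[0]
    # slogdet avoids overflow/underflow of the determinant in higher dimensions
    sign, logdet = np.linalg.slogdet(scatter)
    if sign <= 0:
        raise ValueError("scatter must be positive definite")
    size = np.exp(logdet / p)  # det(scatter) ** (1 / p)
    return scatter / size, size
```

With this decomposition, `size * shape` reproduces the original matrix, so a consistent scale estimate can replace `size` without changing the shape.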
(parking a reference to computational detail)
how do we normalize a scatter matrix so that it is consistent for a specific distribution, commonly the normal?
cov = sigma = c * scatter -> find the "size" c
related: mad, iqr and similar have normalization constants
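For comparison, the univariate constants can be sketched as follows. Both divide by a quantile of the standard normal so that the estimate is consistent for the standard deviation at the normal distribution. The helper names `mad_normal` and `iqr_normal` are hypothetical.

```python
import numpy as np
from scipy import stats


def mad_normal(x):
    """Median absolute deviation, scaled to estimate sigma at the normal."""
    x = np.asarray(x, dtype=float)
    mad = np.median(np.abs(x - np.median(x)))
    # at N(0, 1) the median of |X| is norm.ppf(0.75), approx 0.6745
    return mad / stats.norm.ppf(0.75)


def iqr_normal(x):
    """Interquartile range, scaled to estimate sigma at the normal."""
    q25, q75 = np.percentile(x, [25, 75])
    # at N(0, 1) the IQR is 2 * norm.ppf(0.75), approx 1.349
    return (q75 - q25) / (2 * stats.norm.ppf(0.75))


rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=50_000)
print(mad_normal(x), iqr_normal(x))  # both should be close to sigma = 2
```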
here it is for the multivariate case
Maronna et al text book on robust statistics 2006 section 6.3.2 on page 186
using the chi2 distribution of the squared Mahalanobis distances d_i under normality, we can calculate
c = median( {d_i}_i ) / chi2.ppf(0.5, k_vars)
this has also been used without reference in Maronna and Zamar 2002 on cov_ogk.
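The median-based factor can be sketched in a few lines, assuming d_i are the squared Mahalanobis distances computed with the unscaled scatter matrix and a given center. The function name `scale_scatter_normal` and its signature are hypothetical, not an existing statsmodels API.

```python
import numpy as np
from scipy import stats


def scale_scatter_normal(data, center, scatter):
    """Rescale `scatter` to be consistent for the covariance at the normal.

    Hypothetical helper: returns (c * scatter, c) with
    c = median(d_i) / chi2.ppf(0.5, k_vars).
    """
    data = np.asarray(data, dtype=float)
    scatter = np.asarray(scatter, dtype=float)
    k_vars = data.shape[1]
    resid = data - center
    # squared Mahalanobis distances d_i = r_i' scatter^{-1} r_i
    d = np.sum(resid * np.linalg.solve(scatter, resid.T).T, axis=1)
    c = np.median(d) / stats.chi2.ppf(0.5, k_vars)
    return c * scatter, c


rng = np.random.default_rng(12345)
cov_true = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
x = rng.multivariate_normal(np.zeros(3), cov_true, size=20_000)
# pretend the scatter estimate has the right shape but the wrong size
cov_hat, c = scale_scatter_normal(x, np.zeros(3), cov_true / 3)
print(c)  # should be close to 3
```

Because c scales inversely with the input scatter, the returned covariance does not depend on how the scatter matrix was normalized (trace, determinant, or otherwise).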
I didn't see anything mentioned for the "size" estimates in Tyler estimator for scatter in elliptical distribution.
There are several references (*2) for consistency and small sample scaling of MCD and similar but I didn't look carefully (brief browsing or skimming doesn't show any obvious answer)
Many articles just mention the scaling factors but they don't show the numbers or formulas.
Maronna, Ricardo A., Douglas Martin, and Víctor J. Yohai. 2006. Robust Statistics: Theory and Methods. Reprinted with corr. Wiley Series in Probability and Statistics. Chichester: Wiley.
Maronna, Ricardo A., and Ruben H. Zamar. 2002. “Robust Estimates of Location and Dispersion for High-Dimensional Datasets.” Technometrics 44 (4): 307–17. doi:10.1198/004017002188618509.
(*2)
Hardin, Johanna, and David M. Rocke. 2005. “The Distribution of Robust Distances.” Journal of Computational and Graphical Statistics 14 (4): 928–46. doi:10.1198/106186005X77685.
Pison, G., S. Van Aelst, and G. Willems. n.d. “Small Sample Corrections for LTS and MCD.” Metrika 55 (1–2): 111–23. doi:10.1007/s001840200191.