Rank based alternatives for sample covariance/correlation matrix #77

Open
mnarayan opened this issue Dec 11, 2016 · 0 comments

Description

Enables a simple nonparametric variant of the graphical lasso, and of any other future estimators for the precision matrix.

Overview of nonparametric alternatives

The nonparanormal (monotone transformations + Gaussian graphical models) and transelliptical (monotone transformations + elliptical graphical models) model classes are covered by using the following estimates:

  • Spearman's rho
  • Kendall's tau
  • Winsorized or trimmed-mean correlation estimates
  • symmetric rank covariances, which include Hoeffding's D and the Bergsma-Dassios sign covariance
  • k-root of the sample covariance: substitutes the k-th root for the sample covariance or correlation. Use k=2 to mirror the benefits of the sqrt-lasso. Cannot support negative eigenvalues.
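As a rough illustration of the k-root item, the sketch below takes the k-th root of the sample covariance through its eigendecomposition. The function name `kth_root_covariance` and the round-off tolerance are assumptions made for this example, not part of the project's API, and the exact construction in Avagyan et al. may differ.

```python
import numpy as np


def kth_root_covariance(X, k=2):
    """Hypothetical helper: k-th root of the sample covariance of X.

    X has shape (n_samples, n_features).
    """
    S = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    # Tolerance for round-off only (an assumption of this sketch).
    tol = 1e-10 * max(eigvals.max(), 1.0)
    if eigvals.min() < -tol:
        # Real k-th roots are undefined for negative eigenvalues.
        raise ValueError("matrix has negative eigenvalues; k-root undefined")
    eigvals = np.clip(eigvals, 0.0, None)
    # V diag(w^(1/k)) V^T
    return (eigvecs * eigvals ** (1.0 / k)) @ eigvecs.T
```

With k=2 the result multiplied by itself recovers the sample covariance, which is a quick sanity check on the construction.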

Implementation

The task involves creating alternative functions for the rank-based sample correlation/covariance and offering them as an alternative to the current empirical covariance or correlation options.

Key Steps:

  1. Take the rows of observations and transform them into ranks, separately for each feature
  2. Compute the unbiased Spearman's rank correlation (eq. 8 and eq. 9) for all pairs of features
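The two steps above can be sketched as follows. Since eq. 8 and eq. 9 are not reproduced in the text, this example assumes the adjustment is the 2 sin(pi / 6 * rho) transform commonly used in the nonparanormal literature; the function name `spearman_correlation` and its `correct_bias` flag are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_correlation(X, correct_bias=True):
    """Hypothetical helper: Spearman correlation matrix of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    # Step 1: rank the observations within each feature (column).
    ranks = np.apply_along_axis(rankdata, 0, X)
    # Step 2: Pearson correlation of the ranks gives Spearman's rho for all pairs.
    rho = np.corrcoef(ranks, rowvar=False)
    if correct_bias:
        # Assumed adjustment: 2 * sin(pi / 6 * rho); keeps the diagonal at 1.
        rho = 2.0 * np.sin(np.pi / 6.0 * rho)
    return rho
```

Because everything is computed from ranks, the estimate is unchanged by any strictly increasing transformation of an individual feature.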

[Image: screenshot, 2016-12-11]

The Kendall's tau concordance variant is another alternative, with better variations for handling ties and for weighting higher ranks differently from lower ranks. However, scipy.stats.kendalltau does not support 2D arrays, so it is likely to be slow for large dimensions.
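A minimal sketch of the pairwise workaround, assuming a plain Python loop over feature pairs is acceptable: `scipy.stats.kendalltau` is called once per pair, so the cost grows quadratically with the number of features. The name `kendall_tau_matrix` is hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kendalltau


def kendall_tau_matrix(X):
    """Hypothetical helper: Kendall's tau for all feature pairs of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    T = np.eye(p)
    # kendalltau only accepts two 1-D arrays, so loop over the p * (p - 1) / 2 pairs.
    for i, j in combinations(range(p), 2):
        tau, _ = kendalltau(X[:, i], X[:, j])
        T[i, j] = T[j, i] = tau
    return T
```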

Furthermore, the bias correction for correlation via kendalltau amounts to using sin(pi / 2 * kendalltau).
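As a small sketch, the correction can be applied elementwise to a scalar tau or to a whole matrix of pairwise taus. The function name `kendall_to_correlation` is made up for this example.

```python
import numpy as np


def kendall_to_correlation(tau):
    """Hypothetical helper: map Kendall's tau to a correlation via sin(pi / 2 * tau)."""
    return np.sin(np.pi / 2.0 * np.asarray(tau, dtype=float))
```

The transform leaves -1, 0 and 1 fixed, so a unit diagonal stays a unit diagonal, while intermediate values are pushed away from zero (a tau of 1/3 maps to a correlation of 0.5).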

Note 1: These rank correlation estimates can violate the positive semi-definite requirement (although the biased Spearman and Kendall estimates are always P.S.D.). Thus, not all graphical model estimators will be compatible with these rank correlation estimates. **However, regularized Dantzig-CLIME, CONCORD, and some pseudolikelihood-type estimators should be able to handle rank_correlation matrices with negative eigenvalues.**
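One way to see whether a given estimate is affected is to inspect its smallest eigenvalue. The sketch below also shows eigenvalue clipping followed by rescaling to a unit diagonal as one simple repair; that repair is an assumption of this example and is not proposed in the issue, and both function names are hypothetical.

```python
import numpy as np


def min_eigenvalue(R):
    """Smallest eigenvalue of a symmetric matrix; negative means R is not P.S.D."""
    return float(np.linalg.eigvalsh(R).min())


def clip_to_psd(R, eps=0.0):
    """Hypothetical repair: clip eigenvalues at eps, then rescale to a unit diagonal.

    Assumes R is a correlation-like matrix with a positive diagonal.
    """
    R = (R + R.T) / 2.0
    w, V = np.linalg.eigh(R)
    R_psd = (V * np.clip(w, eps, None)) @ V.T
    d = np.sqrt(np.diag(R_psd))
    return R_psd / np.outer(d, d)


# Example of an indefinite "correlation" matrix (eigenvalues 1.9, 1.9, -0.8).
R_bad = np.array([[1.0, 0.9, -0.9],
                  [0.9, 1.0, 0.9],
                  [-0.9, 0.9, 1.0]])
R_fixed = clip_to_psd(R_bad)
```

Estimators that tolerate negative eigenvalues would not need the repair step and could consume the uncorrected matrix directly.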

Note 2: Take advantage of RobustScaler and RankScaler if possible, as well as other BaseEstimators from sklearn.

References

Liu, Han, John Lafferty, and Larry Wasserman.
"The nonparanormal: Semiparametric estimation of high dimensional undirected graphs."
Journal of Machine Learning Research 10.Oct (2009): 2295-2328.

Wilcox, R. R. (1993), Some results on a Winsorized correlation coefficient. British Journal of Mathematical and Statistical Psychology, 46: 339–349.
doi:10.1111/j.2044-8317.1993.tb01020.x

Xue, Lingzhou; Zou, Hui.
Regularized rank-based estimation of high-dimensional nonparanormal graphical models.
Ann. Statist. 40 (2012), no. 5, 2541--2571. doi:10.1214/12-AOS1041.
http://projecteuclid.org/download/pdfview_1/euclid.aos/1359987530

Liu, Han; Han, Fang; Yuan, Ming; Lafferty, John; Wasserman, Larry.
High-dimensional semiparametric Gaussian copula graphical models.
Ann. Statist. 40 (2012), no. 4, 2293--2326. doi:10.1214/12-AOS1037.
https://projecteuclid.org/euclid.aos/1358951383

Rina Foygel Barber, Mladen Kolar
"ROCKET: Robust Confidence Intervals via Kendall's Tau for Transelliptical Graphical Models"
https://arxiv.org/abs/1502.07641

Vahe Avagyan, Andrés M. Alonso & Francisco J. Nogales (2017)
"Improving the Graphical Lasso Estimation for the Precision Matrix Through Roots of the Sample Covariance Matrix", Journal of Computational and Graphical Statistics, 26:4, 865-872, DOI: 10.1080/10618600.2017.1340890

mnarayan added a commit that referenced this issue Jul 10, 2017
@mnarayan mnarayan self-assigned this Jul 10, 2017
@mnarayan mnarayan added this to the Public Version 0.3 milestone Jul 10, 2017