ENH: distance between two covariance or scatter matrices #9204

Open
wants to merge 1 commit into
base: main
112 changes: 112 additions & 0 deletions statsmodels/robust/scatter_distance.py
@@ -0,0 +1,112 @@
"""
Created on Apr. 12, 2024 12:50:27 p.m.

Author: Josef Perktold
License: BSD-3
"""

import numpy as np
from scipy import linalg as splinalg


def corrdist(x1, x2):
    """Correlation distance based on the uncentered correlation coefficient.

    The distance is one minus the correlation coefficient computed without
    subtracting the mean.

    Parameters
    ----------
    x1, x2 : ndarray
        Two one-dimensional arrays.

    Returns
    -------
    cmd : float
        Correlation distance ``1 - x1 @ x2 / sqrt((x1 @ x1) * (x2 @ x2))``.

    References
    ----------
    Herdin, M., N. Czink, H. Ozcelik, and E. Bonek. “Correlation Matrix Distance,
    a Meaningful Measure for Evaluation of Non-Stationary MIMO Channels.”
    In 2005 IEEE 61st Vehicular Technology Conference, 1:136-140 Vol. 1, 2005.
    https://doi.org/10.1109/VETECS.2005.1543265.

    """
    if x1.ndim != 1 or x2.ndim != 1:
        raise ValueError("data should be 1-dimensional")
    cross = x1.T @ x2
    s1 = (x1**2).sum(0)
    # the second norm has to use x2, not x1
    s2 = (x2**2).sum(0)
    cmd = 1 - cross / np.sqrt(s1 * s2)
    return cmd
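
A quick way to sanity-check the correlation distance is a minimal standalone sketch like the one below. The helper name `uncentered_corr_distance` is hypothetical and not part of this PR; it only assumes the definition given in the docstring (one minus the correlation coefficient without mean subtraction). Proportional vectors should give a distance of zero, orthogonal vectors a distance of one, and the measure should be symmetric in its arguments.

```python
import numpy as np


def uncentered_corr_distance(x1, x2):
    """One minus the uncentered correlation of two 1-dim arrays.

    Hypothetical helper for illustration only, not the PR's function.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    if x1.ndim != 1 or x2.ndim != 1:
        raise ValueError("data should be 1-dimensional")
    cross = x1 @ x2
    # each vector contributes its own squared norm to the denominator
    return 1 - cross / np.sqrt((x1 @ x1) * (x2 @ x2))


a = np.array([1.0, 2.0, 3.0])
b = a[::-1]

# proportional vectors: distance should be zero up to rounding
d_scaled = uncentered_corr_distance(a, 5 * a)
# orthogonal vectors: distance should be one
d_orth = uncentered_corr_distance([1.0, 0.0], [0.0, 1.0])
# swapping the arguments should not change the result
d_ab = uncentered_corr_distance(a, b)
d_ba = uncentered_corr_distance(b, a)
```

A unit test along these lines would also cover the lines that Codecov reports as untested.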


def cov_distance(cov1, cov2, method="kl", compare_scatter=False):
    """Distance and divergence measures between two covariance matrices.

    Parameters
    ----------
    cov1, cov2 : array_like
        Covariance or scatter matrices of the same shape.
    method : str
        Distance or divergence measure. Options are "kl", "kl-sym", "corrd",
        "Frobenius" or "square", "relevals-trace", "relevals-logdet",
        "relevals-range" and "jb-logdet".
    compare_scatter : bool
        If True, then both matrices are normalized using their determinant
        before the distance is computed.

    Returns
    -------
    dist : float
        Distance or divergence between cov1 and cov2.

    Notes
    -----
    Some measures require additional restrictions on the
    covariance matrix, e.g. symmetry, full rank, positive definite.
    Those restrictions are currently neither checked nor imposed.

    Some measures are not proper distance measures and violate
    properties such as symmetry d(c1, c2) == d(c2, c1).
    For some of those measures a symmetrized method is additionally
    available.

    Distance equal to zero means that the two matrices are equal or
    within the same equivalence class, for example they could be identical
    up to an arbitrary scaling factor as in scatter matrices,
    i.e. cov1 = k cov2 for some k>0.

    References
    ----------
    Cherian, Anoop, Suvrit Sra, Arindam Banerjee, and Nikolaos
    Papanikolopoulos. 2011. “Efficient Similarity Search for Covariance
    Matrices via the Jensen-Bregman LogDet Divergence.”
    In 2011 International Conference on Computer Vision, 2399–2406.
    https://doi.org/10.1109/ICCV.2011.6126523.

    Cherian, Anoop, Suvrit Sra, Arindam Banerjee, and Nikolaos
    Papanikolopoulos. 2013. “Jensen-Bregman LogDet Divergence with
    Application to Efficient Similarity Search for Covariance Matrices.”
    IEEE Transactions on Pattern Analysis and Machine Intelligence
    35 (9): 2161–74. https://doi.org/10.1109/TPAMI.2012.259.

    """

    cov1 = np.asarray(cov1)
    cov2 = np.asarray(cov2)
    if cov1.shape != cov2.shape:
        raise ValueError("Matrices cov1 and cov2 do not have the same shape.")

    k = cov1.shape[1]

    if compare_scatter:
        # normalize both matrices to unit determinant so that the overall
        # scale does not matter; dividing by det ** (1 / k) removes any
        # positive scaling factor, dividing by det alone does not
        cov1 = cov1 / np.linalg.det(cov1) ** (1 / k)
        cov2 = cov2 / np.linalg.det(cov2) ** (1 / k)

    if method == "kl":
        # numpy has no logdet function, slogdet returns (sign, logabsdet);
        # the dimension k belongs inside the factor 0.5
        dist = 0.5 * (np.trace(np.linalg.solve(cov1, cov2)) - k +
                      np.linalg.slogdet(cov1)[1] -
                      np.linalg.slogdet(cov2)[1])
    elif method == "kl-sym":
        dist = 0.5 * (np.trace(np.linalg.solve(cov1, cov2)) +
                      np.trace(np.linalg.solve(cov2, cov1))) - k
    elif method == "corrd":
        dist = corrdist(cov1.ravel(), cov2.ravel())
    elif method in ["Frobenius", "square"]:
        dist = splinalg.norm(cov1 - cov2)
    elif method == "relevals-trace":
        dist = np.trace(np.linalg.solve(cov1, cov2)) - k
    elif method == "relevals-logdet":
        # slogdet returns (sign, logabsdet), numpy has no logdet function
        dist = np.linalg.slogdet(np.linalg.solve(cov1, cov2))[1]
    elif method == "relevals-range":
        # eigvals is the numpy function for eigenvalues of a general matrix
        ev = np.linalg.eigvals(np.linalg.solve(cov1, cov2))
        dist = np.ptp(ev)
    elif method == "jb-logdet":
        # Jensen-Bregman LogDet Divergence, Cherian et al. 2013
        # log|(X + Y) / 2| - 0.5 * log|X Y|
        dist = (np.linalg.slogdet((cov1 + cov2) / 2)[1] -
                0.5 * np.linalg.slogdet(cov1 @ cov2)[1]
                )
    else:
        raise ValueError("method not recognized")

    return dist
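
The properties stated in the Notes section (asymmetry of some measures, zero distance for equal matrices, and equivalence up to a scaling factor for scatter matrices) can be checked numerically. The following is a minimal standalone sketch under stated assumptions, not the PR's implementation: the helper names `kl_gaussian_cov`, `jb_logdet` and `unit_det` are hypothetical, the KL divergence is taken for zero-mean Gaussians, the Jensen-Bregman LogDet divergence follows the definition in Cherian et al. (2013), and both inputs are assumed to be symmetric positive definite.

```python
import numpy as np


def kl_gaussian_cov(cov1, cov2):
    """KL divergence of N(0, cov2) from N(0, cov1) (hypothetical helper)."""
    k = cov1.shape[0]
    # slogdet returns (sign, log|det|); index 1 is the log-determinant
    logdet1 = np.linalg.slogdet(cov1)[1]
    logdet2 = np.linalg.slogdet(cov2)[1]
    trace = np.trace(np.linalg.solve(cov1, cov2))
    return 0.5 * (trace - k + logdet1 - logdet2)


def jb_logdet(cov1, cov2):
    """Jensen-Bregman LogDet divergence (hypothetical helper)."""
    logdet_avg = np.linalg.slogdet((cov1 + cov2) / 2)[1]
    logdet1 = np.linalg.slogdet(cov1)[1]
    logdet2 = np.linalg.slogdet(cov2)[1]
    # log|X Y| = log|X| + log|Y|, which avoids forming the product
    return logdet_avg - 0.5 * (logdet1 + logdet2)


def unit_det(cov):
    """Rescale a positive definite matrix to determinant one."""
    k = cov.shape[0]
    return cov / np.linalg.det(cov) ** (1 / k)


c1 = np.array([[2.0, 0.5], [0.5, 1.0]])
c2 = np.array([[1.0, 0.2], [0.2, 3.0]])

kl_12 = kl_gaussian_cov(c1, c2)
kl_21 = kl_gaussian_cov(c2, c1)      # differs from kl_12: KL is not symmetric
kl_11 = kl_gaussian_cov(c1, c1)      # zero for identical matrices

jb_12 = jb_logdet(c1, c2)
jb_21 = jb_logdet(c2, c1)            # equal to jb_12: JB LogDet is symmetric

# scatter comparison: a rescaled copy is in the same equivalence class
kl_scaled = kl_gaussian_cov(unit_det(c1), unit_det(7.0 * c1))
```

Checks of this kind could serve as a starting point for the unit tests that the Codecov annotations indicate are still missing.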