
BUG: IndexError when giving a zero distance matrix to multiscale_graphcorr #19769

Open
Rayerdyne opened this issue Dec 27, 2023 · 6 comments
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats

Comments

@Rayerdyne

Describe your issue.

Error

When scipy.stats.multiscale_graphcorr is called with an x or y parameter (passed as a distance matrix, hence with compute_distance=None) that contains only zero values, an IndexError is raised.

Investigation

Investigating the stack trace led me to lines 6471 to 6479* of scipy/stats/_stats_py.py, where I see:

    # calculate MGC map and optimal scale
    stat_mgc_map = _local_correlations(distx, disty, global_corr='mgc')

    n, m = stat_mgc_map.shape
    if m == 1 or n == 1:
        # the global scale at is the statistic calculated at maximial nearest
        # neighbors. There is not enough local scale to search over, so
        # default to global scale
        stat = stat_mgc_map[m - 1][n - 1]

*These line numbers are taken from the version of the file in the repository as of now (ec98497).

This looks to me like a mismatch between the definition n, m = stat_mgc_map.shape and its use in stat_mgc_map[m - 1][n - 1], since the arrays output by _local_correlations do not seem to be necessarily square.
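A minimal plain-NumPy sketch of the suspected indexing mismatch. The stand-in array and its (1, 6) shape are assumptions read off the traceback in the error message (index 5, axis 0 of size 1), not output captured from _local_correlations itself:

```python
import numpy as np

# Stand-in for the MGC map in the failing call. The (1, 6) shape is an
# assumption read off the traceback ("index 5 ... axis 0 with size 1").
stat_mgc_map = np.arange(6, dtype=float).reshape(1, 6)

n, m = stat_mgc_map.shape  # n == 1 row, m == 6 columns

# Indexing as in the quoted code puts m - 1 on the row axis. With a single
# row, row index 5 does not exist.
try:
    stat_mgc_map[m - 1][n - 1]
except IndexError as exc:
    print("IndexError:", exc)  # index 5 is out of bounds for axis 0 with size 1

# Indexing consistently with the shape reaches the last entry.
stat = stat_mgc_map[n - 1][m - 1]
print(stat)  # 5.0
```

With a square map the two orderings pick the same element, which would explain why the mismatch only surfaces when the map is not square.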

However, I cannot say I fully understand how MGC works, so I wonder whether this is intended.

Context

For information, I encountered this error through the hyppo package; see this issue.

Reproducing Code Example

import numpy as np

from scipy.stats import multiscale_graphcorr
from sklearn.metrics import pairwise_distances

n = 6

x = np.arange(n).reshape(-1, 1)
distx = pairwise_distances(x, x)
disty = np.zeros((n, n))
mgc = multiscale_graphcorr(disty, distx, compute_distance=None, reps=0)

Error message

.../.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py:6477: RuntimeWarning: The number of replications is low (under 1000), and p-value calculations may be unreliable. Use the p-value result, with caution!
  warnings.warn(msg, RuntimeWarning)
Traceback (most recent call last):
  File ".../problem-scipy.py", line 15, in <module>
    main()
  File ".../problem-scipy.py", line 12, in main
    mgc = multiscale_graphcorr(disty, distx, compute_distance=None, reps=0)
  File ".../.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6490, in multiscale_graphcorr
    stat, stat_dict = _mgc_stat(x, y)
  File ".../.env/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 6545, in _mgc_stat
    stat = stat_mgc_map[m - 1][n - 1]
IndexError: index 5 is out of bounds for axis 0 with size 1

SciPy/NumPy/Python version and system information

1.11.3 1.23.5 sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= HASWELL MAX_THREADS=2
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.21.dev
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= HASWELL MAX_THREADS=2
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.21.dev
  pybind11:
    detection method: config-tool
    include directory: unknown
    name: pybind11
    version: 2.11.0
Compilers:
  c:
    commands: cc
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  c++:
    commands: c++
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 0.29.36
  fortran:
    commands: gfortran
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  pythran:
    include directory: ../../tmp/pip-build-env-6ckgqyn6/overlay/lib/python3.10/site-packages/pythran
    version: 0.14.0
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
  cross-compiled: false
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
Python Information:
  path: /opt/python/cp310-cp310/bin/python
  version: '3.10'
@Rayerdyne Rayerdyne added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Dec 27, 2023
@sampan501
Contributor

@lucascolley I'm the author of both bits of code. Could I be assigned this issue?

@lucascolley
Member

Hi @sampan501, go ahead! We don't assign people to issues, feel free to take on anything which nobody else is working on 👍

@sampan501
Contributor

@Rayerdyne Looking into the linked issue, I believe the problem lies more with the MGCX code in hyppo than with the MGC code in SciPy. The _local_correlations function does not always output a square matrix: when there are redundant rows or columns, the output is not square. That is a property of the algorithm.
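A rough sketch of how redundant rows or columns can shrink one side of the map. It rests on an assumption: that the number of scales searched along each axis tracks how many distinct distance values (dense ranks) each sample sees, so a matrix of all-equal distances collapses to a single scale. The helper num_scales is hypothetical and not part of SciPy:

```python
import numpy as np
from scipy.stats import rankdata


def num_scales(dist):
    """Hypothetical helper (not SciPy's code): the largest number of distinct
    dense ranks found in any row of a distance matrix, used here as an
    assumed proxy for the number of neighborhood scales."""
    ranks = np.array([rankdata(row, method="dense") for row in dist])
    return int(ranks.max())


n = 6
x = np.arange(n).reshape(-1, 1)
distx = np.abs(x - x.T)   # pairwise |x_i - x_j| distances, up to 6 distinct per row
disty = np.zeros((n, n))  # every distance identical, so one rank everywhere

print(num_scales(distx), num_scales(disty))  # 6 1
```

Under that assumption, pairing the all-zero matrix with distx gives a 1-by-6 map rather than a 6-by-6 one, which matches the shape implied by the traceback.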

As for the reproducing code example, I'm realizing now that the code should throw an error when the inputs have zero variance (it doesn't make much sense to run an independence test in this setting).
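One possible shape for such a check, sketched as a hypothetical helper (check_nonconstant is not a SciPy function, and the error text is illustrative). It assumes the inputs are already distance matrices from a proper metric, so a sample with zero variance (all observations identical) shows up as a matrix whose entries are all zero:

```python
import numpy as np


def check_nonconstant(dist, name):
    """Hypothetical check, not SciPy's API: raise if every pairwise distance
    is zero, i.e. all samples are identical and the data have zero variance."""
    dist = np.asarray(dist, dtype=float)
    if np.all(dist == 0):
        raise ValueError(f"{name} has zero variance; an independence test "
                         "is not meaningful for constant data.")


n = 6
x = np.arange(n).reshape(-1, 1)
distx = np.abs(x - x.T)   # pairwise |x_i - x_j| distances, not constant
disty = np.zeros((n, n))  # all samples identical

check_nonconstant(distx, "x")  # passes silently

try:
    check_nonconstant(disty, "y")
except ValueError as exc:
    message = str(exc)
    print(message)
```

Raising a ValueError up front would replace the opaque IndexError from the reproducing example with a message that names the offending input.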

@Rayerdyne
Author

@sampan501 I see, thanks!

@lucascolley
Member

As for the reproducing code example, I'm realizing now that the code should throw an error when the inputs have 0 variance (it doesn't make much sense to run an independence test in this setting)

Would either of you like to submit a PR for this?

@sampan501
Contributor

I can do this

Development

No branches or pull requests

3 participants