vindex as outer indexer: memory and time performance #11018

Open
ilan-gold opened this issue Mar 23, 2024 · 0 comments
Labels: array, needs triage


ilan-gold commented Mar 23, 2024

Describe the issue:

Emulating outer indexing via `vindex` + `np.ix_` appears to be much slower and more memory-intensive than indexing twice, once per axis. For very large arrays it is prohibitive: for a 1,000,000 x 1,000,000 array, it tried to allocate 1.5 TB of memory. I know this is basically stated in the docs, but maybe there is something to be done here? If not, feel free to close.
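For readers unfamiliar with the two approaches, here is a small NumPy-only sketch of what "outer indexing" means and why the `np.ix_` route implies one coordinate pair per output element. The array `x` and the index values are made up for illustration. The byte estimate at the end assumes the broadcast indices are fully materialized as int64; whether dask does exactly that internally is an assumption, not something verified here.

```python
import numpy as np

# Toy data (illustrative only, not from the report)
x = np.arange(30).reshape(5, 6)
rows = np.array([0, 2, 4])
cols = np.array([1, 3])

# np.ix_ builds an open mesh: index arrays of shape (3, 1) and (1, 2)
ix_rows, ix_cols = np.ix_(rows, cols)

# Both forms select the same 3 x 2 block
outer = x[ix_rows, ix_cols]
twice = x[rows, :][:, cols]

# Pointwise indexing broadcasts the mesh to one (row, col) pair per output element
full_rows, full_cols = np.broadcast_arrays(ix_rows, ix_cols)
n_pairs = full_rows.size

# Same arithmetic for the example below: n_points indices along each axis
n_points = 5000
pairs_in_mcve = n_points * n_points
# Assumption: two int64 coordinates stored per pair
bytes_if_materialized = pairs_in_mcve * 2 * np.dtype(np.int64).itemsize
```

Under that assumption, the pointwise index alone grows with the product of the two index lengths, while indexing twice only ever handles the two 1-D index arrays.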

Minimal Complete Verifiable Example:

%load_ext memory_profiler

import dask.array as da
import numpy as np

chunksize = 100
size = 10_000
n_points = 5000

# 10,000 x 10,000 array split into 100 x 100 chunks
X = da.random.poisson(15, (size, size), chunks=(chunksize, chunksize))

# Sorted random row and column indices (duplicates are possible)
index_0 = np.random.randint(0, X.shape[0], n_points)
index_0.sort()
index_1 = np.random.randint(0, X.shape[1], n_points)
index_1.sort()

# Outer indexing emulated with vindex + np.ix_
print('vindex timing:')
%timeit X.vindex[np.ix_(index_0, index_1)].compute()
# No .compute() here: this measures building the indexed array only
print('vindex memory usage:')
%memit X.vindex[np.ix_(index_0, index_1)]

# Outer indexing by indexing one axis at a time
print('double-index timing:')
%timeit X[index_0, :][:, index_1].compute()
# No .compute() here either
print('double-index memory usage:')
%memit X[index_0, :][:, index_1]
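As a sanity check that the two expressions being compared really select the same elements, here is a minimal sketch on a small array that runs outside IPython. The helper name `outer_index` is hypothetical (it is not part of dask), and the sizes are arbitrary.

```python
import dask.array as da
import numpy as np


def outer_index(x, rows, cols):
    """Hypothetical helper: outer-index a 2-D dask array one axis at a time."""
    return x[rows, :][:, cols]


rng = np.random.default_rng(0)
data = rng.integers(0, 100, size=(40, 40))
X_small = da.from_array(data, chunks=(10, 10))

# Sorted index arrays, as in the example above
rows = np.sort(rng.integers(0, 40, size=7))
cols = np.sort(rng.integers(0, 40, size=5))

via_double = outer_index(X_small, rows, cols).compute()
via_vindex = X_small.vindex[np.ix_(rows, cols)].compute()

# Reference result computed directly with NumPy
expected = data[np.ix_(rows, cols)]
```

Both results should match `expected`, so the difference reported here is about cost rather than output.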

Anything else we need to know?:

Environment:

  • Dask version: 2024.3.1
  • Python version: 3.12
  • Operating System: mac
  • Install method (conda, pip, source): pip
github-actions bot added the needs triage label Mar 23, 2024
phofl added the array label Apr 4, 2024