Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VeloVI uses np.argsort, does not work with sparse spliced/unspliced layers #2687

Open
jberkh opened this issue Apr 4, 2024 · 4 comments
Open
Assignees
Labels

Comments

@jberkh
Copy link

jberkh commented Apr 4, 2024

Issue

The scvi.external.velovi.VELOVI class uses np.argsort on the spliced/unspliced matrices, but this returns array([0]) when these matrices are sparse.

Minimally reproducible code

import scvi
import anndata
from scipy.sparse import csr_matrix
import numpy as np

adata = anndata.AnnData(X = np.random.randint(0, 100, 10000).reshape(100, 100))
adata.layers["spliced"] = csr_matrix(adata.X.copy())
adata.layers["unspliced"] = csr_matrix(adata.X.copy())

scvi.external.velovi.VELOVI.setup_anndata(adata, "spliced", "unspliced")
model = scvi.external.velovi.VELOVI(adata)
model.train()
IndexError                                Traceback (most recent call last)
Cell In[1], [line 11]
      [8] adata.layers["unspliced"] = csr_matrix(adata.X.copy())
     [10] scvi.external.velovi.VELOVI.setup_anndata(adata, "spliced", "unspliced")
---> [11] model = scvi.external.velovi.VELOVI(adata)
     [12] model.train()

File [~/miniconda3/envs/envSCVI-2/lib/python3.10/site-packages/scvi/external/velovi/_model.py:77] in VELOVI.__init__(self, adata, n_hidden, n_latent, n_layers, dropout_rate, gamma_init_data, linear_decoder, **model_kwargs)
     [75] sorted_unspliced = np.argsort(unspliced, axis=0)
     [76] ind = int(adata.n_obs * 0.99)
---> [77] us_upper_ind = sorted_unspliced[ind:, :]
     [79] us_upper = []
     [80] ms_upper = []

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

Versions:

conda list | grep scvi            
scvi-tools                1.1.2              pyhd8ed1ab_0    conda-forge

(Installed with pip install git+https://github.com/scverse/scvi-tools on 4/4/24 ~12:00, as velovi is not available in the current pypi/conda release)

@jberkh jberkh added the bug label Apr 4, 2024
@martinkim0
Copy link
Contributor

Thanks! I'll take a look at this

@martinkim0 martinkim0 self-assigned this Apr 4, 2024
@martinkim0
Copy link
Contributor

Just to give a quick update on this, the simplest patch is to explicitly convert the unspliced data into a dense array when it comes in as sparse since there's no argsort equivalent for csr_matrix. I don't think this is the best solution because it will require more memory than necessary.

@jberkh
Copy link
Author

jberkh commented Apr 17, 2024

I figured as much, given that np.argsort by defnition returns a dense matrix since all the zero value return non-zero indices. Unfortunately, calling .todense() on too large datasets does blow up in memory quite a bit, possibly to the point of crashing.

Maybe solution could be to convert the matrices to dense in batches, something along these lines:

if isinstance(unspliced, csr_matrix) or isinstance(unspliced, csr_matrix):
    sorted_unspliced = np.empty(unspliced.shape)
    # batch size of 1000 here
    for i in range(0, unspliced_shape.shape[1], 1000):
        sorted[:, i:i+1000] = np.argsort(unspliced[:, i:i+1000].todense(), axis = 0)

@martinkim0
Copy link
Contributor

Thanks for the suggestion. I'm personally leaning towards a solution that natively deals with the sparse matrix instead of having to implicitly convert to dense. I'm also looking into what might be a potential bug in this whole preprocessing step.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants