OverflowError: value too large to convert to int #50

Open
slowkow opened this issue Nov 2, 2020 · 2 comments

slowkow commented Nov 2, 2020

Could I ask if you might have any tips on how to overcome this error?

I'm running your 1M-cell code, but on my own dataset of 2.8M cells.

Here's my matrix:

sparse_gpu_array.shape
# (2886934, 33567)

sparse_gpu_array.nnz
# 4128695018
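Note that this nnz is larger than the maximum signed 32-bit integer, which I assume is the `int` that `csr2coo` is failing to convert it to:

```python
# nnz from above vs. the signed 32-bit integer maximum; presumably the
# cuSPARSE csr2coo call in the traceback needs nnz to fit in an int.
INT32_MAX = 2**31 - 1   # 2147483647
nnz = 4128695018        # sparse_gpu_array.nnz
print(nnz > INT32_MAX)  # True
```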

Let's try to run this:

sparse_gpu_array, genes = rapids_scanpy_funcs.filter_genes(sparse_gpu_array, genes, min_cells=1000)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<timed exec> in <module>

~/work/github.com/slowkow/rapids-single-cell-examples/notebooks/rapids_scanpy_funcs.py in filter_genes(sparse_gpu_array, genes_idx, min_cells)
    269         Genes containing a number of cells below this value will be filtered
    270     """
--> 271     thr = np.asarray(sparse_gpu_array.sum(axis=0) >= min_cells).ravel()
    272     filtered_genes = cp.sparse.csr_matrix(sparse_gpu_array[:, thr])
    273     genes_idx = genes_idx[np.where(thr)[0]]

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/base.py in sum(self, axis, dtype, out)
    388 
    389         if axis == 0:
--> 390             ret = self.T.dot(cupy.ones(m, dtype=self.dtype)).reshape(1, n)
    391         else:  # axis == 1
    392             ret = self.dot(cupy.ones(n, dtype=self.dtype)).reshape(m, 1)

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/base.py in dot(self, other)
    307     def dot(self, other):
    308         """Ordinary dot product"""
--> 309         return self * other
    310 
    311     def getH(self):

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csc.py in __mul__(self, other)
    111                 return self._with_data(self.data * other)
    112             elif other.ndim == 1:
--> 113                 self.sum_duplicates()
    114                 if cusparse.check_availability('csrmv'):
    115                     csrmv = cusparse.csrmv

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/compressed.py in sum_duplicates(self)
    333             self._has_canonical_format = True
    334             return
--> 335         coo = self.tocoo()
    336         coo.sum_duplicates()
    337         self.__init__(coo.asformat(self.format))

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csc.py in tocoo(self, copy)
    214 
    215         """
--> 216         return self.T.tocoo(copy).T
    217 
    218     def tocsc(self, copy=None):

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csr.py in tocoo(self, copy)
    268             indices = self.indices
    269 
--> 270         return cusparse.csr2coo(self, data, indices)
    271 
    272     def tocsc(self, copy=False):

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupy/cusparse.py in csr2coo(x, data, indices)
    900     cusparse.xcsr2coo(
    901         handle, x.indptr.data.ptr, nnz, m, row.data.ptr,
--> 902         cusparse.CUSPARSE_INDEX_BASE_ZERO)
    903     # data and indices did not need to be copied already
    904     return cupyx.scipy.sparse.coo_matrix(

cupy/cuda/cusparse.pyx in cupy.cuda.cusparse.xcsr2coo()

OverflowError: value too large to convert to int

cjnolet (Member) commented Nov 4, 2020

Hi @slowkow,

It looks like this issue may have been addressed already in cupy/cupy#4223. We are running into similar problems as we work through upcoming changes to use CuPy 8.0 and move more of the filtering logic onto the GPU.

An option for us to get around the size limitation in the gene filtering step might be to allocate an empty 1-d output array of size n_cells and then perform the sum over a few batches. Take the following as an example to populate the summed array with the sums across the genes for the first 100 cells:

summed_gpu_array = cp.empty(sparse_gpu_array.shape[0], dtype=cp.float32)
# sum across genes (axis=1) to get one value per cell
summed_gpu_array[0:100] = sparse_gpu_array[0:100].sum(axis=1).ravel()
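For the per-gene counts that `filter_genes` actually needs (the `sum(axis=0)` in the traceback), the same batching idea would accumulate row batches into an array of size `n_genes`. A rough sketch, shown with `scipy.sparse` on the CPU for illustration (the `batched_gene_counts` helper and the batch size are mine, not from the repo; on the GPU you would substitute `cupy` and `cupyx.scipy.sparse`):

```python
import numpy as np
import scipy.sparse as sp

def batched_gene_counts(mat, batch_size=100_000):
    """Sum a CSR matrix over axis=0 (cells) in row batches, so each
    per-batch sparse op only sees a slice whose nnz fits in an int."""
    totals = np.zeros(mat.shape[1], dtype=np.float64)
    for start in range(0, mat.shape[0], batch_size):
        batch = mat[start:start + batch_size]
        totals += np.asarray(batch.sum(axis=0)).ravel()
    return totals

# Tiny demo matrix standing in for sparse_gpu_array
demo = sp.random(10, 5, density=0.5, format="csr", random_state=0)
assert np.allclose(batched_gene_counts(demo, batch_size=3),
                   np.asarray(demo.sum(axis=0)).ravel())
```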

slowkow commented Nov 4, 2020

Corey, thanks for the reply! If I eventually get back to this error, I might try to modify your function filter_genes() to perform a sum over multiple batches and see if the code runs from that point onward.

Could I please ask if you have successfully run the RAPIDS analysis on a real dataset that is larger than the 1M cell dataset?
