OverflowError: value too large to convert to int #50

Open
slowkow opened this issue Nov 2, 2020 · 2 comments

slowkow commented Nov 2, 2020

Could I ask if you might have any tips on how to overcome this error?

I'm running your 1M-cell code, but on my own dataset of 2.8M cells.

Here's my matrix:

sparse_gpu_array.shape
# (2886934, 33567)

sparse_gpu_array.nnz
# 4128695018
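Note that this nnz is larger than the maximum signed 32-bit integer, which I assume is the `int` that `csr2coo` is failing to convert it to:

```python
# nnz from above vs. the signed 32-bit integer maximum; presumably the
# cuSPARSE csr2coo call in the traceback needs nnz to fit in an int.
INT32_MAX = 2**31 - 1   # 2147483647
nnz = 4128695018        # sparse_gpu_array.nnz
print(nnz > INT32_MAX)  # True
```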

Let's try to run this:

sparse_gpu_array, genes = rapids_scanpy_funcs.filter_genes(sparse_gpu_array, genes, min_cells=1000)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<timed exec> in <module>

~/work/github.com/slowkow/rapids-single-cell-examples/notebooks/rapids_scanpy_funcs.py in filter_genes(sparse_gpu_array, genes_idx, min_cells)
    269         Genes containing a number of cells below this value will be filtered
    270     """
--> 271     thr = np.asarray(sparse_gpu_array.sum(axis=0) >= min_cells).ravel()
    272     filtered_genes = cp.sparse.csr_matrix(sparse_gpu_array[:, thr])
    273     genes_idx = genes_idx[np.where(thr)[0]]

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/base.py in sum(self, axis, dtype, out)
    388 
    389         if axis == 0:
--> 390             ret = self.T.dot(cupy.ones(m, dtype=self.dtype)).reshape(1, n)
    391         else:  # axis == 1
    392             ret = self.dot(cupy.ones(n, dtype=self.dtype)).reshape(m, 1)

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/base.py in dot(self, other)
    307     def dot(self, other):
    308         """Ordinary dot product"""
--> 309         return self * other
    310 
    311     def getH(self):

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csc.py in __mul__(self, other)
    111                 return self._with_data(self.data * other)
    112             elif other.ndim == 1:
--> 113                 self.sum_duplicates()
    114                 if cusparse.check_availability('csrmv'):
    115                     csrmv = cusparse.csrmv

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/compressed.py in sum_duplicates(self)
    333             self._has_canonical_format = True
    334             return
--> 335         coo = self.tocoo()
    336         coo.sum_duplicates()
    337         self.__init__(coo.asformat(self.format))

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csc.py in tocoo(self, copy)
    214 
    215         """
--> 216         return self.T.tocoo(copy).T
    217 
    218     def tocsc(self, copy=None):

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csr.py in tocoo(self, copy)
    268             indices = self.indices
    269 
--> 270         return cusparse.csr2coo(self, data, indices)
    271 
    272     def tocsc(self, copy=False):

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupy/cusparse.py in csr2coo(x, data, indices)
    900     cusparse.xcsr2coo(
    901         handle, x.indptr.data.ptr, nnz, m, row.data.ptr,
--> 902         cusparse.CUSPARSE_INDEX_BASE_ZERO)
    903     # data and indices did not need to be copied already
    904     return cupyx.scipy.sparse.coo_matrix(

cupy/cuda/cusparse.pyx in cupy.cuda.cusparse.xcsr2coo()

OverflowError: value too large to convert to int

cjnolet (Member) commented Nov 4, 2020

Hi @slowkow,

It looks like this issue may have been addressed already in cupy/cupy#4223. We are running into similar problems as we work through upcoming changes to use CuPy 8.0 and move more of the filtering logic onto the GPU.

An option for us to get around the size limitation in the gene filtering step might be to allocate an empty 1-d output array of size n_cells and then perform the sum over a few batches. Take the following as an example to populate the summed array with the sums across the genes for the first 100 cells:

summed_gpu_array = cp.empty(sparse_gpu_array.shape[0], dtype=cp.float32)
# sum across genes (axis=1) to get one value per cell
summed_gpu_array[0:100] = sparse_gpu_array[0:100].sum(axis=1).ravel()
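For the per-gene counts that `filter_genes` actually needs (the `sum(axis=0)` in the traceback), the same batching idea would accumulate row batches into an array of size `n_genes`. A rough sketch, shown with `scipy.sparse` on the CPU for illustration (the `batched_gene_counts` helper and the batch size are mine, not from the repo; on the GPU you would substitute `cupy` and `cupyx.scipy.sparse`):

```python
import numpy as np
import scipy.sparse as sp

def batched_gene_counts(mat, batch_size=100_000):
    """Sum a CSR matrix over axis=0 (cells) in row batches, so each
    per-batch sparse op only sees a slice whose nnz fits in an int."""
    totals = np.zeros(mat.shape[1], dtype=np.float64)
    for start in range(0, mat.shape[0], batch_size):
        batch = mat[start:start + batch_size]
        totals += np.asarray(batch.sum(axis=0)).ravel()
    return totals

# Tiny demo matrix standing in for sparse_gpu_array
demo = sp.random(10, 5, density=0.5, format="csr", random_state=0)
assert np.allclose(batched_gene_counts(demo, batch_size=3),
                   np.asarray(demo.sum(axis=0)).ravel())
```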

slowkow commented Nov 4, 2020

Corey, thanks for the reply! If I eventually get back to this error, I might try to modify your function filter_genes() to perform a sum over multiple batches and see if the code runs from that point onward.

Could I please ask if you have successfully run the RAPIDS analysis on a real dataset that is larger than the 1M cell dataset?
