What is your question?
While processing a large NLP dataset, I found a very good example on the cuML documentation site. Following its instructions, I wrote my own version for my dataset. My dataset contains 6 million phrases, and I wish to run a clustering algorithm to begin testing.
After preprocessing the data, the X variable is of the same type that the RAPIDS example presents. Unfortunately, I run into a problem when I load it into the KMeans model:
2024-03-15 12:49:50,008 - distributed.worker - WARNING - Compute Failed
Key: _func_fit-3f8c5c07-bc70-4821-9613-bc8545faf086
Function: _func_fit
args: (b'\x99\xab\xfbBR\xf5M\xd1\x91v\x1f9\x0et\xaa\xd3', [<cupyx.scipy.sparse._csr.csr_matrix object at 0x7f134f8825f0>], 'cupy', False)
kwargs: {'n_clusters': 51, 'verbose': False}
Exception: 'AttributeError("\'NoneType\' object has no attribute \'shape\'")'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/erico/lab/packages_dask/cuml/dask/cluster/kmeans.py", line 198, in fit_predict
return self.fit(X, sample_weight=sample_weight).predict(
File "/home/erico/lab/packages_dask/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
return func(*args, **kwargs)
File "/home/erico/lab/packages_dask/cuml/dask/cluster/kmeans.py", line 175, in fit
wait_and_raise_from_futures(kmeans_fit)
File "/home/erico/lab/packages_dask/cuml/dask/common/utils.py", line 164, in wait_and_raise_from_futures
raise_exception_from_futures(futures)
File "/home/erico/lab/packages_dask/cuml/dask/common/utils.py", line 152, in raise_exception_from_futures
raise RuntimeError(
RuntimeError: 1 of 1 worker jobs failed: 'NoneType' object has no attribute 'shape'
If I try to run yhat = kmeans_float.fit_predict(X.compute()) instead, the error changes to:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/erico/lab/packages_dask/cuml/dask/cluster/kmeans.py", line 198, in fit_predict
return self.fit(X, sample_weight=sample_weight).predict(
File "/home/erico/lab/packages_dask/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
return func(*args, **kwargs)
File "/home/erico/lab/packages_dask/cuml/dask/cluster/kmeans.py", line 154, in fit
data = DistributedDataHandler.create(inputs, client=self.client)
File "/home/erico/lab/packages_dask/cuml/dask/common/input_utils.py", line 108, in create
datatype, multiple = _get_datatype_from_inputs(data)
File "/home/erico/lab/packages_dask/cuml/dask/common/input_utils.py", line 193, in _get_datatype_from_inputs
validate_dask_array(data)
File "/home/erico/lab/packages_dask/cuml/dask/common/dask_arr_utils.py", line 34, in validate_dask_array
if len(darray.chunks) > 2:
AttributeError: 'csr_matrix' object has no attribute 'chunks'
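As far as I can tell from the traceback, X.compute() hands the model a concrete csr_matrix instead of a Dask array, and the check in validate_dask_array reads darray.chunks, which only the Dask wrapper has. The sketch below reproduces that mechanism in plain Python. FakeDaskArray and FakeCsrMatrix are hypothetical stand-ins, not the real Dask or CuPy classes, and the ValueError text is a placeholder.

```python
# Stand-in types only; these are NOT the real Dask or CuPy classes.
class FakeDaskArray:
    """Mimics a Dask array: lazy, and carries a `chunks` tuple per dimension."""

    def __init__(self, value, chunks):
        self._value = value
        self.chunks = chunks

    def compute(self):
        # Materializing returns the underlying concrete object.
        return self._value


class FakeCsrMatrix:
    """Mimics a concrete sparse matrix: it has a shape but no `chunks`."""

    def __init__(self, shape):
        self.shape = shape


def validate_dask_array(darray):
    # Same attribute access as the line shown in the traceback.
    if len(darray.chunks) > 2:
        raise ValueError("too many dimensions")  # placeholder message


X = FakeDaskArray(FakeCsrMatrix((4, 3)), chunks=((4,), (3,)))
validate_dask_array(X)  # passes: the lazy wrapper has `chunks`

try:
    validate_dask_array(X.compute())  # the concrete matrix does not
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, the X.compute() variant fails before any clustering starts, so it says nothing new about the original 'NoneType' error.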
Changing the clustering algorithm also does not help. For instance, calling fit_predict on the cuml.dask DBSCAN model fails with this error:
Key: _func-1d45cd66-2d8a-4525-9c2e-f576d861a7c5
Function: _func
args: (b'\\\x97\xd5\x9fJ\xd3@\xe3\x92\xc7Y\x00\xad\x8a\xb7\x0b', dask.array<from-value, shape=(6261516, 232309), dtype=float64, chunksize=(6261516, 232309), chunktype=cupyx.csr_matrix>)
kwargs: {'min_samples': 5, 'gen_min_span_tree': True, 'verbose': False}
Exception: "ValueError('setting an array element with a sequence.')"
2024-03-15 12:54:36,754 - distributed.worker - WARNING - Compute Failed
Key: _func-092bbae9-6a71-4b2e-b670-445edd0005e9
Function: _func
args: (b'\\\x97\xd5\x9fJ\xd3@\xe3\x92\xc7Y\x00\xad\x8a\xb7\x0b', dask.array<from-value, shape=(6261516, 232309), dtype=float64, chunksize=(6261516, 232309), chunktype=cupyx.csr_matrix>)
kwargs: {'min_samples': 5, 'gen_min_span_tree': True, 'verbose': False}
Exception: "ValueError('setting an array element with a sequence.')"
2024-03-15 12:54:36,757 - distributed.worker - WARNING - Compute Failed
Key: _func-9cc5519b-dda6-4208-8e76-a1a6f345c4ad
Function: _func
args: (b'\\\x97\xd5\x9fJ\xd3@\xe3\x92\xc7Y\x00\xad\x8a\xb7\x0b', dask.array<from-value, shape=(6261516, 232309), dtype=float64, chunksize=(6261516, 232309), chunktype=cupyx.csr_matrix>)
kwargs: {'min_samples': 5, 'gen_min_span_tree': True, 'verbose': False}
Exception: "ValueError('setting an array element with a sequence.')"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/erico/lab/packages_dask/cuml/dask/cluster/dbscan.py", line 160, in fit_predict
self.fit(X, out_dtype)
File "/home/erico/lab/packages_dask/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
return func(*args, **kwargs)
File "/home/erico/lab/packages_dask/cuml/dask/cluster/dbscan.py", line 133, in fit
wait_and_raise_from_futures(dbscan_fit)
File "/home/erico/lab/packages_dask/cuml/dask/common/utils.py", line 164, in wait_and_raise_from_futures
raise_exception_from_futures(futures)
File "/home/erico/lab/packages_dask/cuml/dask/common/utils.py", line 152, in raise_exception_from_futures
raise RuntimeError(
RuntimeError: 4 of 4 worker jobs failed: setting an array element with a sequence., setting an array element with a sequence., setting an array element with a sequence., setting an array element with a sequence.
Any help is appreciated
Thanks for the issue! I'm not entirely sure what's happening. Any chance you could run the script https://github.com/rapidsai/cuml/blob/branch-24.04/print_env.sh and post the output? Knowing which versions of cuml, dask, etc. you have would be very useful for reproducing this.
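If the shell script is awkward to run, a rough Python-side substitute might look like the sketch below. It only reports Python package versions (not driver or CUDA details, which print_env.sh also covers), and the distribution names in PACKAGES are assumptions: pip wheels may carry a CUDA suffix such as cuml-cu12, so adjust the list to match your install.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names are assumptions; edit to match how the packages
# were installed (conda vs. pip wheels with a CUDA suffix).
PACKAGES = ["cuml", "dask", "distributed", "dask-cuda", "cupy", "numpy"]


def collect_versions(names):
    """Return a {distribution name: version string} mapping."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


versions = collect_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}: {ver}")
```

Run it inside the same environment that the Dask workers use, since a client and worker on different environments can report different versions.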