[BUG] running the Leiden algorithm doesn't support oversubscription #171

Open
Lem-P opened this issue Apr 17, 2024 · 3 comments
Labels
bug Something isn't working

Comments


Lem-P commented Apr 17, 2024

Describe the bug
While running the Leiden algorithm with rsc.tl.leiden(adata, key_added="leiden_res0_25", resolution=0.25), I got a CUDA error: "101 cudaErrorInvalidDevice invalid device ordinal".

Calling rmm.reinitialize with managed_memory set to False resolved the issue.
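For anyone landing here, the workaround above can be sketched roughly as follows. This is a minimal sketch, not the exact code from my notebook: the helper names `rmm_settings` and `configure_gpu_memory` are made up for illustration, and it assumes an RMM version that ships `rmm.allocators.cupy`. The GPU libraries are imported lazily so the helper module itself loads on a machine without a GPU.

```python
def rmm_settings(oversubscribe: bool = False, pool: bool = False) -> dict:
    """Build keyword arguments for rmm.reinitialize (hypothetical helper)."""
    return {
        # managed_memory=True enables unified memory (oversubscription);
        # False was the setting that avoided the error in this report.
        "managed_memory": oversubscribe,
        "pool_allocator": pool,
    }


def configure_gpu_memory(oversubscribe: bool = False, pool: bool = False) -> None:
    """Reinitialize RMM and route CuPy allocations through it."""
    # Imported here so that defining these helpers does not require a GPU.
    import cupy as cp
    import rmm
    from rmm.allocators.cupy import rmm_cupy_allocator

    rmm.reinitialize(**rmm_settings(oversubscribe, pool))
    cp.cuda.set_allocator(rmm_cupy_allocator)
```

Calling `configure_gpu_memory()` once, before loading data and before `rsc.tl.leiden`, is the intended usage. Note that turning managed memory off also gives up oversubscription, so the data has to fit in GPU memory.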

Expected behavior
None; this is just information for other people who run into the same error.

Environment details (please complete the following information):

  • Environment location: Conda running in WSL2
  • Linux Distro/Architecture: Ubuntu 22.04.4 LTS
  • GPU Model/Driver: RTX 3070, driver 31.0.15.5161
  • CUDA: 12.4
  • Method of Rapids install: conda
Lem-P added the bug label Apr 17, 2024
Member

Intron7 commented Apr 17, 2024

Where did you get the error? That is, do you get it in rsc or in cugraph? Could you also please upload the full stack trace? If you can reproduce the error with cugraph alone, it would be amazing if you created an issue there too.

Author

Lem-P commented Apr 17, 2024

I get this in rapids_singlecell. Full stack trace:


RuntimeError Traceback (most recent call last)
Cell In[42], line 1
----> 1 rsc.tl.leiden(adata, key_added="leiden_res0_25", resolution=0.25)
2 rsc.tl.leiden(adata, key_added="leiden_res0_5", resolution=0.5)
3 rsc.tl.leiden(adata, key_added="leiden_res0_1", resolution=0.1)

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/rapids_singlecell/tools/_clustering.py:125, in leiden(adata, resolution, random_state, restrict_to, key_added, adjacency, n_iterations, use_weights, neighbors_key, obsp, copy)
117 restrict_key, restrict_categories = restrict_to
118 adjacency, restrict_indices = restrict_adjacency(
119 adata=adata,
120 restrict_key=restrict_key,
121 restrict_categories=restrict_categories,
122 adjacency=adjacency,
123 )
--> 125 g = _create_graph(adjacency, use_weights)
126 # Cluster
127 leiden_parts, _ = culeiden(
128 g,
129 resolution=resolution,
130 random_state=random_state,
131 max_iter=n_iterations,
132 )

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/rapids_singlecell/tools/_clustering.py:31, in _create_graph(adjacency, use_weights)
29 warnings.simplefilter("ignore")
30 if use_weights:
---> 31 g.from_cudf_edgelist(
32 df, source="source", destination="destination", weight="weights"
33 )
34 else:
35 g.from_cudf_edgelist(df, source="source", destination="destination")

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/graph_classes.py:193, in Graph.from_cudf_edgelist(self, input_df, source, destination, edge_attr, weight, edge_id, edge_type, renumber, store_transposed, legacy_renum_only)
191 elif self._Impl.edgelist is not None or self._Impl.adjlist is not None:
192 raise RuntimeError("Graph already has values")
--> 193 self._Impl._simpleGraphImpl__from_edgelist(
194 input_df,
195 source=source,
196 destination=destination,
197 edge_attr=edge_attr,
198 weight=weight,
199 edge_id=edge_id,
200 edge_type=edge_type,
201 renumber=renumber,
202 store_transposed=store_transposed,
203 legacy_renum_only=legacy_renum_only,
204 )

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/graph_implementation/simpleGraph.py:262, in simpleGraphImpl.__from_edgelist(self, input_df, source, destination, edge_attr, weight, edge_id, edge_type, renumber, legacy_renum_only, store_transposed)
257 # The dataframe will be symmetrized iff the graph is undirected
258 # otherwise the inital dataframe will be returned. Duplicated edges
259 # will be dropped unless the graph is a MultiGraph(Not Implemented yet)
260 # TODO: Update Symmetrize to work on Graph and/or DataFrame
261 if edge_attr is not None:
--> 262 source_col, dest_col, value_col = symmetrize(
263 elist,
264 source,
265 destination,
266 edge_attr,
267 multi=self.properties.multi_edge, # Deprecated parameter
268 symmetrize=not self.properties.directed,
269 )
271 if isinstance(value_col, cudf.DataFrame):
272 value_dict = {}

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/symmetrize.py:281, in symmetrize(input_df, source_col_name, dest_col_name, value_col_name, multi, symmetrize, do_expensive_check)
272 output_df = symmetrize_ddf(
273 input_df,
274 source_col_name,
(...)
278 symmetrize,
279 )
280 else:
--> 281 output_df = symmetrize_df(
282 input_df,
283 source_col_name,
284 dest_col_name,
285 value_col_name,
286 multi,
287 symmetrize,
288 )
289 if value_col_name is not None:
290 value_col = output_df[value_col_name]

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cugraph/structure/symmetrize.py:100, in symmetrize_df(df, src_name, dst_name, weight_name, multi, symmetrize)
93 warnings.warn(
94 "Multi is deprecated and the removal of multi edges will no longer be "
95 "supported from 'symmetrize'. Multi edges will be removed upon creation "
96 "of graph instance.",
97 FutureWarning,
98 )
99 vertex_col_name = src_name + dst_name
--> 100 result = result.groupby(by=[*vertex_col_name], as_index=False).min()
101 return result

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cudf/core/mixins/mixin_factory.py:11, in _partialmethod.<locals>.wrapper(self, *args2, **kwargs2)
10 def wrapper(self, *args2, **kwargs2):
---> 11 return method(self, *args1, *args2, **kwargs1, **kwargs2)

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cudf/core/groupby/groupby.py:701, in GroupBy._reduce(self, op, numeric_only, min_count, *args, **kwargs)
697 if min_count != 0:
698 raise NotImplementedError(
699 "min_count parameter is not implemented yet"
700 )
--> 701 return self.agg(op)

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/nvtx/nvtx.py:116, in annotate.__call__.<locals>.inner(*args, **kwargs)
113 @wraps(func)
114 def inner(*args, **kwargs):
115 libnvtx_push_range(self.attributes, self.domain.handle)
--> 116 result = func(*args, **kwargs)
117 libnvtx_pop_range(self.domain.handle)
118 return result

File ~/anaconda3/envs/sc_rapids/lib/python3.10/site-packages/cudf/core/groupby/groupby.py:567, in GroupBy.agg(self, func)
558 orig_dtypes = tuple(c.dtype for c in columns)
560 # Note: When there are no key columns, the below produces
561 # a Float64Index, while Pandas returns an Int64Index
562 # (GH: 6945)
563 (
564 result_columns,
565 grouped_key_cols,
566 included_aggregations,
--> 567 ) = self._groupby.aggregate(columns, normalized_aggs)
569 result_index = self.grouping.keys._from_columns_like_self(
570 grouped_key_cols,
571 )
573 multilevel = _is_multi_agg(func)

File groupby.pyx:350, in cudf._lib.groupby.GroupBy.aggregate()

File groupby.pyx:252, in cudf._lib.groupby.GroupBy.aggregate_internal()

RuntimeError: CUDA error encountered at: /opt/conda/conda-bld/work/cpp/src/hash/concurrent_unordered_map.cuh:546: 101 cudaErrorInvalidDevice invalid device ordinal

Member

Intron7 commented Apr 17, 2024

OK, I can't reproduce the error. Can you open an issue on cugraph? This happens inside cugraph's graph construction. They should know about it, because they might be able to fix it.
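A cugraph-only reproduction attempt for such an issue might look like the sketch below. It is untested and may not trigger the error on a given machine. It assumes that managed memory is the trigger (based on the workaround reported above) and mirrors the `from_cudf_edgelist` call shown in the stack trace. The function names `ring_edges` and `try_reproduce` and the toy ring graph are made up for illustration.

```python
def ring_edges(n: int):
    """Edge list of a ring graph on n vertices, with unit weights (toy data)."""
    sources = list(range(n))
    destinations = [(i + 1) % n for i in range(n)]
    weights = [1.0] * n
    return sources, destinations, weights


def try_reproduce(n: int = 1000) -> int:
    """Build an undirected cugraph Graph with managed memory enabled."""
    # Imported here so that defining these helpers does not require a GPU.
    import cudf
    import cugraph
    import rmm

    # Assumption: managed (unified) memory is what provokes the error.
    rmm.reinitialize(managed_memory=True)

    sources, destinations, weights = ring_edges(n)
    df = cudf.DataFrame(
        {"source": sources, "destination": destinations, "weights": weights}
    )
    g = cugraph.Graph()
    # Same call that appears in the stack trace; symmetrization happens inside.
    g.from_cudf_edgelist(
        df, source="source", destination="destination", weight="weights"
    )
    return g.number_of_edges()
```

If `try_reproduce()` raises the same `cudaErrorInvalidDevice` without rapids_singlecell imported, that would point to cugraph or cudf rather than rsc.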
