[BUG] dask-cudf isin errors when passing in a list of values #15768

Closed
ayushdg opened this issue May 16, 2024 · 1 comment · Fixed by #15771
Labels: bug (Something isn't working)

ayushdg commented May 16, 2024

Describe the bug
Calling the isin method on a dask-cudf series with a list of values results in an error.
This used to work in 24.04 and seems to be a regression in the nightlies.

Steps/Code to reproduce bug

import cudf
import dask_cudf

ser = cudf.Series([1, 2, 3])
ddf = dask_cudf.from_cudf(ser, npartitions=1)
ddf.isin([1, 5]).head()  # ERROR, TypeError: Cannot convert a integer of object type

ser.isin([1, 5])  # works

Expected behavior
Returns the result

0     True
1    False
2    False
dtype: bool

Environment overview (please complete the following information)

  • nightly package in a docker container


ayushdg added the bug label May 16, 2024
rjzamora commented

Here is a reproducer for the problematic cudf code that dask.dataframe is attempting to execute:

import cudf
import numpy as np

ser = cudf.Series([1,2,3])
values = [1, 5]
values = np.fromiter(values, dtype=object)
ser.isin(values)

cc @galipremsagar @mroeschke - In case you know of some recent cudf changes that would make this a problem.
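
For readers without a GPU, the host-side array from the reproducer above can be inspected with NumPy alone. This is only a sketch of what `np.fromiter(..., dtype=object)` produces; it assumes NumPy 1.23 or newer (where `fromiter` accepts `object` dtype) and does not exercise any cudf code:

```python
import numpy as np

values = np.fromiter([1, 5], dtype=object)

# The array is tagged as object dtype...
print(values.dtype)  # object

# ...even though every element is a plain Python int.
print({type(v).__name__ for v in values})  # {'int'}
```

So the array passed to `isin` carries the `object` dtype even though its contents are homogeneous integers.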

galipremsagar self-assigned this May 16, 2024
rapids-bot bot pushed a commit that referenced this issue May 17, 2024
Fixes: #15768 

A host array can have `object` dtype yet contain values of a single homogeneous type. Column constructors still cannot support such an array because `cudf` doesn't have a true `object` type, so this PR introduces a workaround for this problem in the `isin` API.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #15771
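
The general idea in the commit message above (an `object` array whose values are actually homogeneous) can be sketched on the host with NumPy. This is not the code from #15771: `normalize_object_array` is a hypothetical helper written for illustration, and it assumes a 1-D array of scalars.

```python
import numpy as np


def normalize_object_array(values):
    """Hypothetical helper: re-infer a concrete dtype for an object array.

    Returns the input unchanged if it is not object-typed, is empty,
    or holds elements of more than one Python type.
    """
    if values.dtype != object:
        return values
    # Collect the Python types of the elements (assumes a 1-D array of scalars).
    element_types = {type(v) for v in values}
    if len(element_types) != 1:
        return values
    # Rebuilding from a plain list lets NumPy infer the dtype from the elements.
    return np.array(values.tolist())


homogeneous = np.array([1, 5], dtype=object)
print(normalize_object_array(homogeneous).dtype.kind)  # i

mixed = np.array([1, "a"], dtype=object)
print(normalize_object_array(mixed).dtype)  # object
```

An array normalized this way has a concrete integer dtype, which is the kind of input a column constructor can accept; the mixed case is deliberately left as `object` so the caller can still raise a clear error.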