-
Notifications
You must be signed in to change notification settings - Fork 852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Push some as_column arrow logic to ColumnBase.from_arrow #15738
Conversation
if isinstance(arbitrary, pa.NullArray): | ||
if dtype is not None: | ||
# Cast the column to the `dtype` if specified. | ||
new_dtype = dtype | ||
elif len(arbitrary) == 0: | ||
# If the column is empty, it has to be | ||
# a `str` dtype. | ||
new_dtype = cudf.dtype("str") | ||
else: | ||
# If the null column is not empty, it has to | ||
# be of `object` dtype. | ||
new_dtype = cudf.dtype(arbitrary.type.to_pandas_dtype()) | ||
|
||
if cudf.get_option( | ||
"mode.pandas_compatible" | ||
) and new_dtype == cudf.dtype("O"): | ||
# We internally raise if we do `astype("object")`, hence | ||
# need to cast to `str` since this is safe to do so because | ||
# it is a null-array. | ||
new_dtype = "str" | ||
|
||
col = col.astype(new_dtype) | ||
elif dtype is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we need to add handling for this into from_arrow
? The other removals here all have a corresponding branch in from_arrow
. Is this now being handled by the new three lines you added above that are doing pa.types.is_null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we need to add handling for this into from_arrow?
I had kept this here since from_arrow
doesn't do any explicit casting with dtype
. But, are pyarrow.null
type arrays ever expected to be represented as is in cudf? Otherwise, it would make sense to convert null to str in from_arrow
Is this now being handled by the new three lines you added above that are doing pa.types.is_null?
Yes, I think this casting was here when the default "empty" type was float pre-pandas 2.0 support and we need this to be string sometimes. Now we can collapse all these conditions to always cast to string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think pyarrow.null has like one possible code path in all of the cudf test suite where it gets used 😅 so I think you can safely ignore it as long as tests are currently passing. Eventually we'll need to deal with this more thoroughly in #12477
/merge |
Description
as_column
andColumnBase.from_arrow
have similar checks for handlingpa.Array
objects so consolidating them toColumnBase.from_arrow
asas_column
calls to that eventually.Checklist