Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expected hasNulls behavior for #428 #429

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Kopilov
Copy link
Contributor

@Kopilov Kopilov commented Jul 22, 2023

Expected hasNulls behavior for #428

@Jolanrensen
Copy link
Collaborator

Since we're not sure about the performance impact this would have and based on the philosophy of DF I propose the following solution:

  1. Deprecate the original hasNulls getter in favor of a new isNullable getter.
  2. In the release afterwards introduce the hasNulls implementation of this PR

That way, parts of the library that need performance can use isNullable and hasNulls can be used for a more thorough check. @koperagen wdyt?

@koperagen
Copy link
Collaborator

koperagen commented Aug 18, 2023

What i have in mind: in most places where DataColumn is created inside the library, hasNulls is already known and used as a nullability flag for column type. For example,

outputData[col][row] = leftColumn[leftRow].also { if (it == null) hasNulls[col] = true }



I want to use this information when available. Now this information is stored as a nullability flag in the type.

It works everywhere except arrow, there's NullabilityOptions.Widening that makes column without null values (hasNulls = false) to have nullable type (isNullable = true). Here #78 i said it's needed for generating schema from samples, but maybe there are others use cases. @Kopilov interesting what you think about it)

Assuming it's the only such place, i suggest to see if it's feasible to change something in the arrow module without hurting existing use cases. If not, then maybe make hasNulls an optional constructor parameter for DataColumn and pass this parameter whenever possible

@Kopilov
Copy link
Contributor Author

Kopilov commented Aug 21, 2023

@koperagen thank you for the question)

It works everywhere except arrow, there's NullabilityOptions.Widening that makes column without null values (hasNulls = false) to have nullable type (isNullable = true).

Yes but NullabilityOptions.Infer (Use only actual data, set DataColumn.hasNulls to true if and only if there are null values in the column) is still default behavior. Using NullabilityOptions.Widening is just as available as using explicit type in #428 expected/actual example.

Assuming it's the only such place, i suggest to see if it's feasible to change something in the arrow module without hurting existing use cases.

In arrow reading I do not see any problems (if you do, please, open the issue with unexpected behavior example). In arrow writing we can use custom thorough check instead of hasNulls to make saving nullable without nulls as not nullable to work correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants