Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle NaN when pushing filter for JDBC connector #21923

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Praveen2112
Copy link
Member

@Praveen2112 Praveen2112 commented May 10, 2024

Description

Fixes #21922
Some JDBC based sources like PotgreSql, Oracle, Ignite etc treats NaN values as equal,
and greater than all non-NaN values. This is different from Trino's behaviour where NaN values
are not equal, and they are not greater than or lesser than non-NaN values which results in in-correct
results when they are pushed down to the underlying datasource. So we push down additional condition
to ensure that NaN values are not considered in comparison operations.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 10, 2024
@Praveen2112 Praveen2112 force-pushed the praveen/nan_for_pg branch 5 times, most recently from 8e50017 to ded8441 Compare May 12, 2024 02:34
@Praveen2112 Praveen2112 added tests:all Run all tests snowflake Snowflake connector labels May 12, 2024
@Praveen2112 Praveen2112 marked this pull request as ready for review May 12, 2024 06:01
@Praveen2112
Copy link
Member Author

cc: @findepi I attempted a fix for it Please share your feedback for the same.

@Override
protected String toPredicate(JdbcClient client, ConnectorSession session, JdbcColumnHandle column, JdbcTypeHandle jdbcType, Type type, WriteFunction writeFunction, String operator, Object value, Consumer<QueryParameter> accumulator)
{
if ((type == REAL || type == DOUBLE) && (operator.equals(">") || operator.equals(">="))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about <, <=, =, !=?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this check of other operators as they work as expected - in case of Postgres NaN is placed about Infinity so we need to add them only for > or >= operator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a code comment.

when you write it down say "in PostgreSQL ..." in a generic class, you'll realize that you're exploiting a (common) implementation detail.

  • what does the spec say about double and NaN ordering? (i think the spec omits existence of NaNs, but i may be wrong)
  • even if the spec said something definitive, implementation vary (for example, Trino and PostgreSQL have different behavior). Which means the behavior should be implemented in connector-specific manner.

{
if ((type == REAL || type == DOUBLE) && (operator.equals(">") || operator.equals(">="))) {
accumulator.accept(new QueryParameter(jdbcType, type, Optional.of(value)));
return format("((%s %s %s) AND (%s <> 'NaN'))", client.quoted(column.getColumnName()), operator, writeFunction.getBindExpression(), client.quoted(column.getColumnName()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'NaN' as a varchar?
are you sure this is portable SQL?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the tests 'NaN' is not parsed as a Varchar but considered as a NaN representation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is used along with double/real columns I think it is coerced automatically and the tests also proves the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may work for PostgreSQL and may also work for some other databases, but I doubt this to be a really portable behavior (for example, Trino does not support comparing double values with 'NaN' literal, does it?) and this class is meant to be portable.

public void testSpecialValueOnApproximateNumericColumn()
{
assertThatThrownBy(super::testSpecialValueOnApproximateNumericColumn)
.hasMessageContaining("'NaN' is not a valid numeric or approximate numeric value");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks wrong. why is it OK to skip this test for mysql?

@@ -246,6 +246,69 @@ public void testVarcharCharComparison()
}
}

@Test
@Override
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why override? document

.map("$less_than_or_equal(left: exact_numeric_type, right: exact_numeric_type)").to("left <= right")
.map("$greater_than(left: exact_numeric_type, right: exact_numeric_type)").to("left > right")
.map("$greater_than_or_equal(left: exact_numeric_type, right: exact_numeric_type)").to("left >= right")
.map("$less_than(left: approx_numeric_type, right: approx_numeric_type)").to("((left < right) AND right <> 'NaN')")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

braces around left < right are redundant
braces around whole replaced expression are also redundant

(braces are inserted automatically by rewrite engine)

@@ -69,6 +71,11 @@ protected String toPredicate(JdbcClient client, ConnectorSession session, JdbcCo
return format("%s %s %s COLLATE \"C\"", client.quoted(column.getColumnName()), operator, writeFunction.getBindExpression());
}

if ((type == REAL || type == DOUBLE) && (operator.equals(">") || operator.equals(">="))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about other comparison operators?

@Praveen2112 Praveen2112 force-pushed the praveen/nan_for_pg branch 8 times, most recently from 1df9d6e to 0e5acfc Compare May 18, 2024 01:23
@Praveen2112 Praveen2112 changed the title Handle NaN when pushing filter for Postgres connector Handle NaN when pushing filter for JDBC connector May 18, 2024
@Praveen2112 Praveen2112 force-pushed the praveen/nan_for_pg branch 2 times, most recently from 22a1cf4 to 86a9211 Compare May 18, 2024 04:05
For equals and not equals rewrite we capture them as GenericRewrite expression and usage of
RewriteComparison rule is redundant here.
Some JDBC based sources like PotgreSql, Oracle, Ignite etc treats NaN values as equal,
and greater than all non-NaN values. This is different from Trino's behaviour where NaN values
are not equal, and they are not greater than or lesser than non-NaN values which results in in-correct
results when they are pushed down to the underlying datasource. So we push down additional condition
to ensure that NaN values are not considered in comparison operations.
@Praveen2112
Copy link
Member Author

@findepi Thanks for the review. AC

@findepi
Copy link
Member

findepi commented May 21, 2024

First two commits lgtm. You can merge them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed snowflake Snowflake connector tests:all Run all tests
Development

Successfully merging this pull request may close these issues.

Correctness issue for predicate pushdown on approx numeric column in Postgres
2 participants