-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[native] Add floating point aggregate tests with NaN #21447
base: master
Are you sure you want to change the base?
Conversation
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg); | ||
assertQueryError("SELECT * FROM (VALUES nan(), nan(), nan()) GROUP BY 1", notEqualRowsErrorMsg); | ||
assertQueryError("SELECT a, b, c FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1, 2, 3", notEqualRowsErrorMsg); | ||
assertQuery("SELECT a FROM (VALUES (ARRAY[nan(), 2e0, 3e0]), (ARRAY[nan(), 2e0, 3e0])) t(a) GROUP BY a"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to move this query ahead as its the only one related to the previous fix you made.
{ | ||
// Fix Velox to get these tests passed. | ||
String notEqualRowsErrorMsg = "*.not equal.*"; | ||
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also try an aggregation requiring Sorted inputs like say
SELECT a, array_agg(a ORDER BY a) FROM (VALUES (0.0e0), (0.0e0)) x (a) GROUP BY 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this along with NaN values in the table as well.
This also fails as expected.
String notEqualRowsErrorMsg = "*.not equal.*"; | ||
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg); | ||
assertQueryError("SELECT * FROM (VALUES nan(), nan(), nan()) GROUP BY 1", notEqualRowsErrorMsg); | ||
assertQueryError("SELECT a, b, c FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1, 2, 3", notEqualRowsErrorMsg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its interesting this didn't work either. Is this because of GROUP BY 1, 2, 3 ? If we do just GROUP BY 1 does this work ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values clause is turned into a three column table of normalized (simple) values. So you get the failure in the equals for the NaN.
When changed to GROUP BY 1 the query fails with
Runtime line 1:11: 'b' must be an aggregate expression or appear in GROUP BY clause
because the other columns in the projection have no aggregation defined either via group by, or sum etc.
Running
SELECT a, SUM(b), SUM(c) FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1
results in the error as expected.
I tried various things to have the ROW as values in a single column but it somehow always either results in two columns or three columns.
Technically, this should have worked based on the syntax.
SELECT a FROM (VALUES (ROW(nan(), 1, 2)), (ROW(nan(), 1, 2)) ) t(a) GROUP BY 1
Maybe I'm missing something. This complains that the table has 3 columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@czentgr : I see. I expected this to be a single column with ROW types as well. I guess right now the ROW usage is redundant.
|
||
protected void assertQueryError(QueryRunner queryRunner, Session session, @Language("SQL") String sql, @Language("RegExp") String expectedMessageRegExp) | ||
{ | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you have the try block here ?
Does assertQueryError(queryRunner, session, sql, expectedMessageRegExp); not work ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new type of function that specifically checks if the AssertionError
is thrown by the QueryRunner
. This indicates a failure on the JUnit test assertions. There was no such function in the QueryAssertions
class and I opted to add it here.
I'm going to swap the definitions so it is easier to see.
The function without the session argument is a helper which calls this function.
...execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeAggregations.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review!
|
||
protected void assertQueryError(QueryRunner queryRunner, Session session, @Language("SQL") String sql, @Language("RegExp") String expectedMessageRegExp) | ||
{ | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new type of function that specifically checks if the AssertionError
is thrown by the QueryRunner
. This indicates a failure on the JUnit test assertions. There was no such function in the QueryAssertions
class and I opted to add it here.
I'm going to swap the definitions so it is easier to see.
The function without the session argument is a helper which calls this function.
String notEqualRowsErrorMsg = "*.not equal.*"; | ||
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg); | ||
assertQueryError("SELECT * FROM (VALUES nan(), nan(), nan()) GROUP BY 1", notEqualRowsErrorMsg); | ||
assertQueryError("SELECT a, b, c FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1, 2, 3", notEqualRowsErrorMsg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values clause is turned into a three column table of normalized (simple) values. So you get the failure in the equals for the NaN.
When changed to GROUP BY 1 the query fails with
Runtime line 1:11: 'b' must be an aggregate expression or appear in GROUP BY clause
because the other columns in the projection have no aggregation defined either via group by, or sum etc.
Running
SELECT a, SUM(b), SUM(c) FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1
results in the error as expected.
I tried various things to have the ROW as values in a single column but it somehow always either results in two columns or three columns.
Technically, this should have worked based on the syntax.
SELECT a FROM (VALUES (ROW(nan(), 1, 2)), (ROW(nan(), 1, 2)) ) t(a) GROUP BY 1
Maybe I'm missing something. This complains that the table has 3 columns.
{ | ||
// Fix Velox to get these tests passed. | ||
String notEqualRowsErrorMsg = "*.not equal.*"; | ||
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this along with NaN values in the table as well.
This also fails as expected.
f8fa95a
to
c899829
Compare
@czentgr can we revisit this since facebookincubator/velox#7780 is merged? |
This tests various NaN cases which do not all work. The commit only actually fixed one of them. The outstanding is that Java and Prestissimo/Velox have different behavior and even the Java behavior is not consistent. There is work to fix this and make it consistent. |
@aditi-pandit need your input on this, thx! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @czentgr.
I'm fine to add these tests with queryAssertError.
String notEqualRowsErrorMsg = "*.not equal.*"; | ||
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg); | ||
assertQueryError("SELECT * FROM (VALUES nan(), nan(), nan()) GROUP BY 1", notEqualRowsErrorMsg); | ||
assertQueryError("SELECT a, b, c FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1, 2, 3", notEqualRowsErrorMsg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@czentgr : I see. I expected this to be a single column with ROW types as well. I guess right now the ROW usage is redundant.
@czentgr : Please can you rebase your code. I can merge these tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @czentgr
@tdcmeehan, @yingsu00 : Please can you help review the changes in AbstractTestQueryFramework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@czentgr Looks good, pending the more detailed expected error message in the comment.
{ | ||
// Fix Velox to get these tests passed. | ||
// See https://github.com/prestodb/presto/issues/20283 | ||
String notEqualRowsErrorMsg = "*.not equal.*"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just one nit: will you be able to paste the longer version of the expected error message in the comment?
Description
This PR is adding E2E tests to test floating point NaN values in the various structs and to ensure the RowContainer in Velox works properly.
Motivation and Context
There was and still is an issue handling NaN values in Velox due to the fact NaN is a special value and the C++ behavior does not match the desired database behavior.
Impact
N/A
Test Plan
Adding new tests.
Contributor checklist
Release Notes