Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native] Add floating point aggregate tests with NaN #21447

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

czentgr
Copy link
Contributor

@czentgr czentgr commented Nov 28, 2023

Description

This PR is adding E2E tests to test floating point NaN values in the various structs and to ensure the RowContainer in Velox works properly.

Motivation and Context

There was and still is an issue handling NaN values in Velox due to the fact NaN is a special value and the C++ behavior does not match the desired database behavior.

Impact

N/A

Test Plan

Adding new tests.

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@czentgr czentgr marked this pull request as ready for review November 28, 2023 15:07
@czentgr czentgr requested review from a team as code owners November 28, 2023 15:07
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg);
assertQueryError("SELECT * FROM (VALUES nan(), nan(), nan()) GROUP BY 1", notEqualRowsErrorMsg);
assertQueryError("SELECT a, b, c FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1, 2, 3", notEqualRowsErrorMsg);
assertQuery("SELECT a FROM (VALUES (ARRAY[nan(), 2e0, 3e0]), (ARRAY[nan(), 2e0, 3e0])) t(a) GROUP BY a");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to move this query ahead as its the only one related to the previous fix you made.

{
// Fix Velox to get these tests passed.
String notEqualRowsErrorMsg = "*.not equal.*";
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also try an aggregation requiring Sorted inputs like say
SELECT a, array_agg(a ORDER BY a) FROM (VALUES (0.0e0), (0.0e0)) x (a) GROUP BY 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this along with NaN values in the table as well.
This also fails as expected.

String notEqualRowsErrorMsg = "*.not equal.*";
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg);
assertQueryError("SELECT * FROM (VALUES nan(), nan(), nan()) GROUP BY 1", notEqualRowsErrorMsg);
assertQueryError("SELECT a, b, c FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1, 2, 3", notEqualRowsErrorMsg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its interesting this didn't work either. Is this because of GROUP BY 1, 2, 3 ? If we do just GROUP BY 1 does this work ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values clause is turned into a three column table of normalized (simple) values. So you get the failure in the equals for the NaN.

When changed to GROUP BY 1 the query fails with

Runtime line 1:11: 'b' must be an aggregate expression or appear in GROUP BY clause

because the other columns in the projection have no aggregation defined either via group by, or sum etc.
Running

SELECT a, SUM(b), SUM(c) FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1

results in the error as expected.

I tried various things to have the ROW as values in a single column but it somehow always either results in two columns or three columns.

Technically, this should have worked based on the syntax.

SELECT a FROM (VALUES (ROW(nan(), 1, 2)), (ROW(nan(), 1, 2)) ) t(a) GROUP BY 1

Maybe I'm missing something. This complains that the table has 3 columns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@czentgr : I see. I expected this to be a single column with ROW types as well. I guess right now the ROW usage is redundant.


protected void assertQueryError(QueryRunner queryRunner, Session session, @Language("SQL") String sql, @Language("RegExp") String expectedMessageRegExp)
{
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have the try block here ?

Does assertQueryError(queryRunner, session, sql, expectedMessageRegExp); not work ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new type of function that specifically checks if the AssertionError is thrown by the QueryRunner. This indicates a failure on the JUnit test assertions. There was no such function in the QueryAssertions class and I opted to add it here.
I'm going to swap the definitions so it is easier to see.

The function without the session argument is a helper which calls this function.

@aditi-pandit aditi-pandit changed the title [native] Add floating point aggregate NaN tests [native] Add floating point aggregate tests with NaN Nov 28, 2023
Copy link
Contributor Author

@czentgr czentgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review!


protected void assertQueryError(QueryRunner queryRunner, Session session, @Language("SQL") String sql, @Language("RegExp") String expectedMessageRegExp)
{
try {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new type of function that specifically checks if the AssertionError is thrown by the QueryRunner. This indicates a failure on the JUnit test assertions. There was no such function in the QueryAssertions class and I opted to add it here.
I'm going to swap the definitions so it is easier to see.

The function without the session argument is a helper which calls this function.

String notEqualRowsErrorMsg = "*.not equal.*";
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg);
assertQueryError("SELECT * FROM (VALUES nan(), nan(), nan()) GROUP BY 1", notEqualRowsErrorMsg);
assertQueryError("SELECT a, b, c FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1, 2, 3", notEqualRowsErrorMsg);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values clause is turned into a three column table of normalized (simple) values. So you get the failure in the equals for the NaN.

When changed to GROUP BY 1 the query fails with

Runtime line 1:11: 'b' must be an aggregate expression or appear in GROUP BY clause

because the other columns in the projection have no aggregation defined either via group by, or sum etc.
Running

SELECT a, SUM(b), SUM(c) FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1

results in the error as expected.

I tried various things to have the ROW as values in a single column but it somehow always either results in two columns or three columns.

Technically, this should have worked based on the syntax.

SELECT a FROM (VALUES (ROW(nan(), 1, 2)), (ROW(nan(), 1, 2)) ) t(a) GROUP BY 1

Maybe I'm missing something. This complains that the table has 3 columns.

{
// Fix Velox to get these tests passed.
String notEqualRowsErrorMsg = "*.not equal.*";
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this along with NaN values in the table as well.
This also fails as expected.

mknegi
mknegi previously approved these changes Nov 29, 2023
@yzhang1991
Copy link

@czentgr can we revisit this since facebookincubator/velox#7780 is merged?
cc @aditi-pandit

@czentgr
Copy link
Contributor Author

czentgr commented May 3, 2024

This tests various NaN cases which do not all work. The commit only actually fixed one of them.
We can add this with the queryAssertError for the failing queries if we agree.

The outstanding is that Java and Prestissimo/Velox have different behavior and even the Java behavior is not consistent. There is work to fix this and make it consistent.

@yzhang1991
Copy link

@aditi-pandit need your input on this, thx!

aditi-pandit
aditi-pandit previously approved these changes May 9, 2024
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @czentgr.

I'm fine to add these tests with queryAssertError.

String notEqualRowsErrorMsg = "*.not equal.*";
assertQueryError("SELECT DISTINCT a/a FROM (VALUES (0.0e0), (0.0e0)) x (a)", notEqualRowsErrorMsg);
assertQueryError("SELECT * FROM (VALUES nan(), nan(), nan()) GROUP BY 1", notEqualRowsErrorMsg);
assertQueryError("SELECT a, b, c FROM (VALUES ROW(nan(), 1, 2), ROW(nan(), 1, 2)) t(a, b, c) GROUP BY 1, 2, 3", notEqualRowsErrorMsg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@czentgr : I see. I expected this to be a single column with ROW types as well. I guess right now the ROW usage is redundant.

@aditi-pandit
Copy link
Contributor

@czentgr : Please can you rebase your code.

I can merge these tests.

Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @czentgr

@aditi-pandit
Copy link
Contributor

@tdcmeehan, @yingsu00 : Please can you help review the changes in AbstractTestQueryFramework.

Copy link
Contributor

@yingsu00 yingsu00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@czentgr Looks good, pending the more detailed expected error message in the comment.

{
// Fix Velox to get these tests passed.
// See https://github.com/prestodb/presto/issues/20283
String notEqualRowsErrorMsg = "*.not equal.*";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one nit: will you be able to paste the longer version of the expected error message in the comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants