feat(transpiler): handle different hex behavior for dialects #3463

viplazylmht · 2024-05-12T05:01:57Z

Fixes #3460

Default the HEX function of all dialects, including the SQLGlot dialect, to uppercase output.
Set HEX_LOWERCASE to True in a dialect whose HEX function produces lowercase output.
Simplify LOWER and UPPER expressions wrapping HEX depending on the HEX_LOWERCASE configuration.
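A minimal sketch of the canonicalization described above, using a toy AST rather than SQLGlot's real classes. The dataclasses `Column`, `Hex`, `LowerHex`, `Lower`, `Upper` and the helpers `parse_hex_call` and `simplify_case` are hypothetical. The only ideas taken from this PR are that a per-dialect `HEX_LOWERCASE` flag decides which node a native HEX call becomes, and that a LOWER or UPPER wrapped around a hex call can be folded into the node type.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy AST nodes (not sqlglot.expressions classes).
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Hex:
    """Hex encoding with uppercase output."""
    this: "Node"


@dataclass(frozen=True)
class LowerHex:
    """Hex encoding with lowercase output."""
    this: "Node"


@dataclass(frozen=True)
class Lower:
    this: "Node"


@dataclass(frozen=True)
class Upper:
    this: "Node"


Node = Union[Column, Hex, LowerHex, Lower, Upper]


def parse_hex_call(arg: Node, hex_lowercase: bool) -> Node:
    """Turn a dialect's native hex call into a node whose type records the case.

    `hex_lowercase` stands in for the dialect's HEX_LOWERCASE setting.
    """
    return LowerHex(arg) if hex_lowercase else Hex(arg)


def simplify_case(node: Node) -> Node:
    """Fold LOWER(...)/UPPER(...) around a hex node into the node type."""
    if isinstance(node, Lower) and isinstance(node.this, (Hex, LowerHex)):
        return LowerHex(node.this.this)
    if isinstance(node, Upper) and isinstance(node.this, (Hex, LowerHex)):
        return Hex(node.this.this)
    return node  # anything else is left untouched


# LOWER(HEX(x)) parsed from an uppercase dialect becomes a single LowerHex node.
print(simplify_case(Lower(parse_hex_call(Column("x"), hex_lowercase=False))))
# LowerHex(this=Column(name='x'))
```

Encoding the case in the node type means a later generator only has to look at the class to know which output case the source query expected.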

sqlglot/dataframe/sql/functions.py

sqlglot/dialects/bigquery.py

tests/dataframe/unit/test_functions.py

georgesittas · 2024-05-12T12:41:57Z

Thanks for the PR, @viplazylmht. I'll review more carefully tomorrow.

Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>

sqlglot/dialects/bigquery.py

sqlglot/dialects/dialect.py

sqlglot/dialects/presto.py

sqlglot/expressions.py

georgesittas

Thanks for the quick iterations here, @viplazylmht.

PR LGTM; I left a few final comments. CC @tobymao or @VaggelisD, can you please take a look?

One question I had: what do other dialects do? Does this PR cover all of them? We should be careful not to break existing SQL.

sqlglot/dialects/presto.py

sqlglot/parser.py

tests/dialects/test_bigquery.py

VaggelisD · 2024-05-14T17:33:09Z

Thanks for the PR!

Wouldn't it be preferable to encode the upper/lower case as an argument and check it when transpiling to other dialects? For example, if dialect A defaults to uppercase, it would create a node like exp.Hex(this=..., upper=True). When transpiling to dialect B, we can compare B's flag with the node's arg to decide whether to wrap the result.

If there are no other differences between exp.Hex and exp.LowerHex, I think this would deduplicate the code while also helping dialects such as Snowflake that parameterize this behavior anyway.
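A hypothetical sketch of the argument-based alternative: a single `Hex` node carries an `upper` flag set from the source dialect's default, and the generator compares it with the target dialect's default to decide whether to wrap. `Hex`, `parse_hex`, and `generate_hex` are invented here for illustration and are not SQLGlot's `exp.Hex` or its generator methods; the function name is fixed to `HEX` for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hex:
    """Single hex node; `upper` records the case produced in the source dialect."""
    this: str  # the argument, already rendered as SQL for simplicity
    upper: bool = True


def parse_hex(arg_sql: str, source_hex_lowercase: bool) -> Hex:
    # Dialect A stamps its own default case onto the node.
    return Hex(this=arg_sql, upper=not source_hex_lowercase)


def generate_hex(node: Hex, target_hex_lowercase: bool) -> str:
    call = f"HEX({node.this})"
    target_upper = not target_hex_lowercase
    if node.upper == target_upper:
        return call  # dialect B already produces the requested case
    # Otherwise wrap so the output case matches the source dialect.
    return f"UPPER({call})" if node.upper else f"LOWER({call})"


# Lowercase source dialect -> uppercase target dialect: the result must be lowered.
print(generate_hex(parse_hex("x", source_hex_lowercase=True), target_hex_lowercase=False))
# LOWER(HEX(x))
```

A boolean like `upper` would also map naturally onto dialects that expose the case as a function argument, which is the Snowflake point raised in the comment.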

georgesittas · 2024-05-14T17:47:51Z

> Wouldn't it be preferable to encode the upper/lower as an argument and check that when transpiling to other dialects?

Depends on what you're looking to optimize. I think either approach is fine, but I'm leaning towards this one. It lets you easily instance-check Hex expressions without first having to ensure an object is an expression and then look up its args. It also avoids the boolean function arg complexity that we'd need to add and maintain, because Generator.sql would choke on upper.
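To make the instance-check argument concrete, here is a hypothetical comparison of the two representations. `Expression`, `Hex`, `LowerHex`, and the `upper` arg are illustrative stand-ins with a minimal `args` dict, not SQLGlot's classes.

```python
from typing import Any, Dict


class Expression:
    """Hypothetical minimal AST node that stores its children/flags in `args`."""

    def __init__(self, **args: Any) -> None:
        self.args: Dict[str, Any] = args


class Hex(Expression):
    """Uppercase hex in the node-type design; either case in the arg-based design."""


class LowerHex(Expression):
    """Lowercase hex in the node-type design."""


def is_lowercase_hex_by_type(obj: object) -> bool:
    # Node-type design: a single isinstance check works on any object.
    return isinstance(obj, LowerHex)


def is_lowercase_hex_by_arg(obj: object) -> bool:
    # Arg-based design: first make sure this is a Hex node, then read its flag.
    # A missing `upper` arg is treated as the uppercase default.
    return isinstance(obj, Hex) and not obj.args.get("upper", True)


print(is_lowercase_hex_by_type(LowerHex(this="x")))
print(is_lowercase_hex_by_arg(Hex(this="x", upper=False)))
```

Both checks answer the same question; the difference is only whether the case lives in the class or in an argument that every consumer must remember to read.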

Another reason I'd like to skip this pivot is to avoid scope creep. There's already some inconsistency in the AST between the multiple-expression-types and additional-args patterns, so this seems like a topic we could discuss separately if we want to be consistent in what we do.

VaggelisD · 2024-05-14T18:06:32Z

Agreed about the inconsistency part; it would probably be best to solidify those patterns so that they can be enforced independently of the reviewer.

In cases like this, where the "payload" is not that complex and doesn't greatly alter the semantics, I'm more in favor of adding it as an arg. There are more checks associated with it, but otherwise you'd have to maintain a new node across all parsers/generators, and to search for it you'd have to look for multiple node types, etc.

@viplazylmht Can you please mention the dialects that have been covered by this PR?

viplazylmht · 2024-05-15T03:00:24Z

Thanks for the review, @georgesittas and @VaggelisD.

Currently, this PR covers bigquery, presto, trino, clickhouse, hive, spark, and spark2.

Snowflake should work because its HEX function defaults to uppercase unless the case is passed as an argument.
Databricks should work because it uses Spark behind the scenes.

The remaining dialects supported by SQLGlot are untested with this PR.

georgesittas · 2024-05-15T12:11:43Z

Sounds good, thanks for checking. These are the remaining ones:

DuckDB HEX -> upper
MySQL HEX -> upper
Redshift TO_HEX -> lower
SQLite HEX -> upper

I'll take this to the finish line.
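Using only the four behaviors listed above, the per-target generation step can be sketched as a lookup plus an optional wrap. The dictionaries `NATIVE_HEX_FUNCTION` and `NATIVE_HEX_IS_LOWER` and the helper `transpile_hex` are hypothetical and operate on SQL strings; SQLGlot itself works on expression trees and, per this PR, configures the case through the HEX_LOWERCASE dialect setting.

```python
# Native hex function and its output case per dialect, taken from the list above.
# Keys are informal labels, not necessarily SQLGlot's dialect names.
NATIVE_HEX_FUNCTION = {
    "duckdb": "HEX",
    "mysql": "HEX",
    "redshift": "TO_HEX",
    "sqlite": "HEX",
}
NATIVE_HEX_IS_LOWER = {
    "duckdb": False,
    "mysql": False,
    "redshift": True,
    "sqlite": False,
}


def transpile_hex(arg_sql: str, source: str, target: str) -> str:
    """Rewrite a native hex call from `source` for `target`, preserving output case."""
    call = f"{NATIVE_HEX_FUNCTION[target]}({arg_sql})"
    want_lower = NATIVE_HEX_IS_LOWER[source]
    if want_lower == NATIVE_HEX_IS_LOWER[target]:
        return call  # same native case, no wrapper needed
    return f"LOWER({call})" if want_lower else f"UPPER({call})"


for src, dst in [("redshift", "mysql"), ("mysql", "redshift"), ("duckdb", "sqlite")]:
    print(f"{src} -> {dst}: {transpile_hex('x', src, dst)}")
# redshift -> mysql: LOWER(HEX(x))
# mysql -> redshift: UPPER(TO_HEX(x))
# duckdb -> sqlite: HEX(x)
```

The wrapper is emitted only when source and target disagree, so transpiling between two uppercase dialects leaves the call unchanged.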

feat(transpiler): handle different hex behavior for dialects

ac4bc7e

georgesittas reviewed May 12, 2024

sqlglot/dataframe/sql/functions.py (outdated, resolved)

georgesittas reviewed May 12, 2024

sqlglot/dialects/bigquery.py (outdated, resolved)

georgesittas reviewed May 12, 2024

tests/dataframe/unit/test_functions.py (outdated, resolved)

Update sqlglot/dialects/bigquery.py

d1c9b02

Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>

georgesittas reviewed May 14, 2024

remove UpperHex, add LowerHex

9402b5d

georgesittas approved these changes May 14, 2024

sqlglot/dialects/presto.py (resolved)

sqlglot/parser.py (outdated, resolved)

sqlglot/parser.py (outdated, resolved)

tests/dialects/test_bigquery.py (outdated, resolved)

clean

2e18afa

georgesittas merged commit 2433993 into tobymao:main May 15, 2024
6 checks passed
