Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(transpiler): handle different hex behavior for dialects #3463

Merged

Conversation

viplazylmht
Copy link
Contributor

Fixes #3460

  • Default hex function of all dialects, including SQLGlot dialect, to uppercase.
  • Set HEX_LOWERCASE to True in the dialect if it produces lowercase output.
  • Simplify LOWER and UPPER expression depending on HEX_LOWERCASE configuration.

@georgesittas
Copy link
Collaborator

Thanks for the PR @viplazylmht, I'll review more carefully tomorrow

Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
sqlglot/dialects/bigquery.py Outdated Show resolved Hide resolved
sqlglot/dialects/dialect.py Outdated Show resolved Hide resolved
sqlglot/dialects/dialect.py Outdated Show resolved Hide resolved
sqlglot/dialects/presto.py Outdated Show resolved Hide resolved
sqlglot/expressions.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick iterations here, @viplazylmht.

PR LGTM, left a few final comments. CC @tobymao or @VaggelisD can you please take a look?

One question I had is what do other dialects do? Does this PR cover all of them? We should be careful not to break existing SQL.

sqlglot/dialects/presto.py Show resolved Hide resolved
sqlglot/parser.py Outdated Show resolved Hide resolved
sqlglot/parser.py Outdated Show resolved Hide resolved
tests/dialects/test_bigquery.py Outdated Show resolved Hide resolved
@VaggelisD
Copy link
Collaborator

VaggelisD commented May 14, 2024

Thanks for the PR!

Wouldn't it be preferable to encode the upper/lower as an argument and check that when transpiling to other dialects? E.g. if dialect A defaults to upper, then it will create a node like exp.Hex(this=..., upper=True); If we want to transpile to a dialect B, we can compare B's flag with the node arg to decide whether to wrap the result. .

If there are no other differences between exp.Hex and exp.Lowerhex, I think this will deduplicate the code while helping dialects such as Snowflake that parameterize this behavior anyways

@georgesittas
Copy link
Collaborator

georgesittas commented May 14, 2024

Wouldn't it be preferable to encode the upper/lower as an argument and check that when transpiling to other dialects? E.g. if dialect A defaults to upper, then it will create a node like exp.Hex(this=..., upper=True); If we want to transpile to a dialect B, we can compare B's flag with the node arg to decide whether to wrap the result. .

Depends on what you're looking to optimize. I think either approach is fine, but I'm leaning towards this one. It allows you to easily instance check Hex expressions, without having to first ensure an object is an expression, and then lookup its args. It also avoids the boolean func arg complexity that we'd need to add and maintain, because Generator.sql would choke on upper.

Another reason I'd like to avoid this pivot is to avoid scope creep. There's some level of inconsistency already on the multiple-expression-types vs additional-args pattern in the AST, so this seems like a topic we could discuss separately if we want to be consistent in what we do.

@VaggelisD
Copy link
Collaborator

VaggelisD commented May 14, 2024

Agreed about the inconsistency part, it would probably be best to solidify those patterns so that they can be enforced independently of the reviewer.

In such cases where the "payload" is not that complex and doesn't greatly alter the semantics, I'm more in favor of adding it as an arg, there's more checks associated to it but otherwise you'd also maintain a new node across all parsers/generators, to search for it you'd have to find multiple instances etc.

@viplazylmht Can you please mention the dialects that have been covered by this PR?

@viplazylmht
Copy link
Contributor Author

viplazylmht commented May 15, 2024

Thanks for the review, @georgesittas , @VaggelisD.

Currently, this PR cover: bigquery, presto, trino, clickhouse, hive, spark, spark2.

  • Snowflake should work because the default behavior of HEX function is upper, unless they pass the case to the arg.
  • Databricks should work because the behind of scene is spark.

This PR is untested with the remaining dialects, which is supported by SQLGlot.

@georgesittas
Copy link
Collaborator

Sounds good, thanks for checking. These are the remaining ones:

  • Ducked HEX —> upper
  • MySQL HEX —> upper
  • Redshift TO_HEX—> lower
  • SQLite HEX —> upper

I'll take this to the finish line

@georgesittas georgesittas merged commit 2433993 into tobymao:main May 15, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Behavior of HEX function is different between dialects
4 participants