Behavior of HEX function is different between dialects #3460
Comments
Thanks for the reports / PRs @viplazylmht, these make sense. Perhaps we could only add a single
@georgesittas Thanks, let me finish this
I found that the hive dialect is uppercase too. Spark inherits from spark2, which inherits from hive, so all three behave the same. Most dialects may therefore be uppercase.

0: jdbc:hive2://localhost:10000/> select hex('Hello');
INFO : Compiling command(queryId=hive_20240512035403_4f5122ef-45ac-4a13-a743-99d0b69736a3): select hex('Hello')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20240512035403_4f5122ef-45ac-4a13-a743-99d0b69736a3); Time taken: 0.151 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20240512035403_4f5122ef-45ac-4a13-a743-99d0b69736a3): select hex('Hello')
INFO : Completed executing command(queryId=hive_20240512035403_4f5122ef-45ac-4a13-a743-99d0b69736a3); Time taken: 0.001 seconds
+-------------+
| _c0 |
+-------------+
| 48656C6C6F |
+-------------+
1 row selected (0.176 seconds)
0: jdbc:hive2://localhost:10000/>
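For reference, the Hive output above can be reproduced in plain Python, whose own `bytes.hex()` follows the lowercase convention (like BigQuery), while Hive/Spark/Trino return the uppercase form:

```python
# Python's bytes.hex() is lowercase, matching dialects like BigQuery;
# uppercasing it reproduces the Hive result shown above.
encoded = "Hello".encode().hex()
print(encoded)          # lowercase convention: 48656c6c6f
print(encoded.upper())  # uppercase convention: 48656C6C6F, matches Hive
```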
Problem:
The HEX function encodes bytes or a string into a hexadecimal representation, but the result varies between dialects due to the case of the output.
For example:
in BigQuery, the result is lowercase;
in Presto/Trino, it is uppercase;
in ClickHouse, it is uppercase;
in Spark, it is uppercase, but hex(md5_digest(x)) produces a lowercase result.
In Snowflake, the default behavior is uppercase, but we can pass an additional argument to control the case: https://docs.snowflake.com/en/sql-reference/functions/hex_encode#arguments
And so on...
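Snowflake's optional case argument can be emulated in a few lines of Python. A minimal sketch — the `hex_encode` name and argument convention mirror Snowflake's function as described above, but this is an illustration, not its implementation:

```python
def hex_encode(data: bytes, case: int = 1) -> str:
    """Snowflake-style HEX_ENCODE sketch (illustrative only):
    case=1 -> uppercase (Snowflake's default), case=0 -> lowercase."""
    h = data.hex()
    return h.upper() if case == 1 else h
```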
So, as a transpiler tool, should we preserve the behavior of this function while transpiling?
Proposed solution, if needed:
We can add a new exp.UpperHex (or exp.LowerHex, depending on the case of the default exp.Hex).
We can also add a flag to the Dialect class that marks whether the target dialect's HEX function returns uppercase or lowercase; this lets us safely simplify the SQL while parsing or generating it.
For example, if the read dialect is BigQuery (lowercase) and the write dialect is Trino (uppercase), the following steps happen when transpiling the SQL:
select lower(to_hex(col_x))   -- input (BigQuery)
exp.Select(exp.Hex(col_x))    -- parsed AST
SELECT LOWER(TO_HEX(col_x))   -- output (Trino)
If we transpile
select upper(to_hex(col_x))
from BigQuery to Trino, it works like this:
exp.Select(exp.UpperHex(col_x))   -- parsed AST
SELECT TO_HEX(col_x)              -- output (Trino; UPPER is redundant there)
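The steps above can be sketched in Python. Names like `HEX_IS_UPPERCASE` and `node_is_upper` are hypothetical illustrations of the proposed flag, not actual sqlglot API, and the target function name is assumed to be TO_HEX for simplicity:

```python
# Hypothetical sketch: each dialect declares the case its native HEX
# function returns, and the generator wraps the call in LOWER/UPPER
# only when the expression's case differs from the target's native case.

class Dialect:
    HEX_IS_UPPERCASE = False  # assumed default: lowercase, like BigQuery

class BigQuery(Dialect):
    HEX_IS_UPPERCASE = False

class Trino(Dialect):
    HEX_IS_UPPERCASE = True

def generate_hex(column: str, node_is_upper: bool, target: type) -> str:
    """Emit TO_HEX(col), wrapped only when the target's native case
    differs from the case the expression node requires."""
    sql = f"TO_HEX({column})"
    if node_is_upper and not target.HEX_IS_UPPERCASE:
        return f"UPPER({sql})"
    if not node_is_upper and target.HEX_IS_UPPERCASE:
        return f"LOWER({sql})"
    return sql  # native case already matches; no wrapper needed
```

With this, the two transpile examples above fall out directly: a lowercase hex node targeting Trino gets a LOWER wrapper, while an uppercase hex node targeting Trino needs no wrapper at all.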