HiveCreateTableStatement output for CLUSTERED BY ( ..) INTO n BUCKETS does not conform to the syntax rules #5853

Closed
niegl opened this issue Apr 19, 2024 · 0 comments

niegl commented Apr 19, 2024

dbtype: hive
druid version: all versions
error sql: CREATE TABLE db.route(
od_id string COMMENT 'OD',
data_dt string COMMENT 'data date')
CLUSTERED BY (
od_id)
INTO 8 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
error info: No error is raised, but the output of HiveCreateTableStatement does not conform to the syntax rules.

According to Hive's CREATE TABLE syntax:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
...

Given the statement:
CREATE TABLE db.route(
od_id string COMMENT 'OD',
data_dt string COMMENT 'data date')
CLUSTERED BY (
od_id)
INTO 8 BUCKETS
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
Druid parses it into a HiveCreateTableStatement, but its output is:
CREATE TABLE db.route(
od_id string COMMENT 'OD',
data_dt string COMMENT 'data date')
CLUSTERED BY (
od_id)
ROW FORMAT SERDE
INTO 8 BUCKETS
...
';
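
The misplaced clause can be detected with a plain string check on the generated SQL. The following is a minimal sketch and not part of Druid: `HiveClauseOrderCheck` and `bucketsClauseInPlace` are hypothetical names, and the check is a naive keyword scan that could be fooled by string literals containing these keywords.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Druid class): checks where INTO n BUCKETS sits
// relative to CLUSTERED BY and ROW FORMAT in a generated Hive DDL string.
final class HiveClauseOrderCheck {
    private static final Pattern BUCKETS = Pattern.compile("INTO \\d+ BUCKETS");

    static boolean bucketsClauseInPlace(String sql) {
        // Normalize case and whitespace so line breaks do not matter.
        String s = sql.toUpperCase(Locale.ROOT).replaceAll("\\s+", " ");
        int clustered = s.indexOf("CLUSTERED BY");
        if (clustered < 0) {
            return true; // no CLUSTERED BY clause, nothing to check
        }
        Matcher m = BUCKETS.matcher(s);
        if (!m.find()) {
            return false; // CLUSTERED BY without INTO n BUCKETS
        }
        int into = m.start();
        int rowFormat = s.indexOf("ROW FORMAT");
        // INTO n BUCKETS must follow CLUSTERED BY and precede ROW FORMAT, if any.
        return into > clustered && (rowFormat < 0 || into < rowFormat);
    }
}
```

Applied to the input statement above, the check passes; applied to the output shown, where INTO 8 BUCKETS follows ROW FORMAT SERDE, it fails.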

The problematic code is in the following part of SQLASTOutputVisitor->printCreateTable(HiveCreateTableStatement x, boolean printSelect):
List clusteredBy = x.getClusteredBy();
if (clusteredBy.size() > 0) {
println();
print0(ucase ? "CLUSTERED BY (" : "clustered by (");
printAndAccept(clusteredBy, ",");
print(')');
}

The bucket clause is not printed together with CLUSTERED BY; instead, it is emitted later, after other clauses:
int buckets = x.getBuckets();
if (buckets > 0) {
println();
print0(ucase ? "INTO " : "into ");
print(buckets);
print0(ucase ? " BUCKETS" : " buckets");
}
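
One possible shape for a fix is to emit the bucket count as part of the CLUSTERED BY clause instead of in a separate, later step. The sketch below is standalone and builds a string rather than calling the visitor's print helpers. The class name, the method name, and the `sortedBy` parameter (taken from the grammar quoted earlier) are assumptions for illustration; this is not Druid's actual code, nor necessarily the change that was committed.

```java
import java.util.List;

// Hypothetical standalone sketch (not Druid code): renders CLUSTERED BY,
// the optional SORTED BY, and INTO n BUCKETS as one contiguous clause.
final class ClusteredByClauseSketch {
    static String render(List<String> clusteredBy, List<String> sortedBy,
                         int buckets, boolean ucase) {
        if (clusteredBy.isEmpty()) {
            return ""; // Hive only allows INTO n BUCKETS inside CLUSTERED BY
        }
        StringBuilder out = new StringBuilder();
        out.append(ucase ? "CLUSTERED BY (" : "clustered by (");
        out.append(String.join(", ", clusteredBy));
        out.append(')');
        if (!sortedBy.isEmpty()) {
            out.append('\n');
            out.append(ucase ? "SORTED BY (" : "sorted by (");
            out.append(String.join(", ", sortedBy));
            out.append(')');
        }
        if (buckets > 0) {
            // Printed here, before any ROW FORMAT / STORED AS output.
            out.append('\n');
            out.append(ucase ? "INTO " : "into ");
            out.append(buckets);
            out.append(ucase ? " BUCKETS" : " buckets");
        }
        return out.toString();
    }
}
```

With `clusteredBy = ["od_id"]`, no sort columns, and `buckets = 8`, this yields the CLUSTERED BY line followed directly by INTO 8 BUCKETS, matching the order required by the Hive grammar.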

lizongbo added a commit to lizongbo/druid that referenced this issue May 4, 2024
lizongbo added a commit that referenced this issue May 4, 2024
Optimize the logic for generating Hive SQL #5853
@lizongbo lizongbo closed this as completed May 4, 2024