[BUG]: Full table scan scenario check condition #107

Open
ketingli1 opened this issue Sep 19, 2023 · 8 comments
Labels
feature optimize Improve the code or documentation

Comments

@ketingli1

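In the large-table scan scenario, the logic that decides whether a SparkPlanInfo node is a scan node is `node.getNodeName().startsWith("Scan")`.

However, the actual scan node names do not start with `Scan`. Is this logic wrong? For example, the scan node in this plan fragment is named `HiveTableScan`: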
```json
"children": [{
    "children": [],
    "metadata": {},
    "metrics": [{
        "accumulatorId": 21,
        "metricType": "sum",
        "nodeName": "number of output rows"
    }],
    "nodeName": "HiveTableScan",
    "simpleString": "HiveTableScan [id#6L, plan_type#10], HiveTableRelation dev.data_skew_table_partitioned, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#6L, task_pod_name#7, project_id#8L, create_time#9, plan_type#10, deleted#11], [ds#12]"
}],
```
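Here is a minimal sketch of why the prefix test misses this node and one way to broaden it. `ScanNodeMatcher` and `isScanNode` are hypothetical names and not the project's actual code. The sketch assumes that Spark's file-source scans are named with a leading `Scan` (e.g. `Scan parquet db.tbl`), while the Hive scan uses the `HiveTableScan` name seen in the event log above. It requires Java 9+ for `List.of`.

```java
import java.util.List;

// Hypothetical helper: decides whether a SparkPlanInfo node name denotes a table scan.
final class ScanNodeMatcher {

    // Assumed prefixes of scan operators: "Scan" for names like "Scan parquet db.tbl",
    // "HiveTableScan" for the Hive scan node reported in this issue.
    private static final List<String> SCAN_PREFIXES = List.of("Scan", "HiveTableScan");

    static boolean isScanNode(String nodeName) {
        if (nodeName == null) {
            return false;
        }
        for (String prefix : SCAN_PREFIXES) {
            if (nodeName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The current check only matches the first name.
        System.out.println("Scan parquet dev.t".startsWith("Scan")); // true
        System.out.println("HiveTableScan".startsWith("Scan"));      // false
        // The broadened check matches both and still rejects other operators.
        System.out.println(isScanNode("Scan parquet dev.t"));        // true
        System.out.println(isScanNode("HiveTableScan"));             // true
        System.out.println(isScanNode("Filter"));                    // false
    }
}
```

An explicit prefix list is used instead of `nodeName.contains("Scan")`. With `contains`, a cached-relation node such as `InMemoryTableScan` would also match, even though it does not read the underlying table.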

@zebozhuang
Collaborator

zebozhuang commented Sep 19, 2023


Could you provide a more complete event log, or a screenshot of the DAG graph?

@ketingli1
Author

ketingli1 commented Sep 19, 2023

Spark event log file:
application_1693822918548_1324.log

@zebozhuang
Collaborator


We'll take a look.

@ketingli1
Author

Hi, is there any conclusion yet?

@zebozhuang
Collaborator

zebozhuang commented Sep 20, 2023


This should be a different kind of table-scan node that is not covered yet, and it should be supported. Could you open a pull request to add this logic?

@zebozhuang zebozhuang added optimize Improve the code or documentation feature labels Sep 20, 2023
@zebozhuang
Collaborator


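Also, sorry: yesterday I used the wrong package when parsing the JSON, so I only saw the result late. Was this job submitted via spark-submit? Judging from the plan, the logic that parses the table out of the plan may need a small adjustment.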
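This is a minimal sketch of pulling the table name out of a `HiveTableScan` node's `simpleString`. `HiveScanTableExtractor` and `tableName` are hypothetical names. The regex assumes the format seen in this issue's event log, `HiveTableRelation <db>.<table>, <serde>, ...`. Other Spark versions may render the relation differently (for example with brackets or backticks), so check the pattern against real logs before relying on it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "<db>.<table>" from a HiveTableScan simpleString.
final class HiveScanTableExtractor {

    // Assumed format: "... HiveTableRelation dev.some_table, org.apache...SerDe, ..."
    // Captures everything after "HiveTableRelation " up to the next comma or whitespace.
    private static final Pattern HIVE_RELATION =
            Pattern.compile("HiveTableRelation\\s+([^,\\s]+)");

    static Optional<String> tableName(String simpleString) {
        if (simpleString == null) {
            return Optional.empty();
        }
        Matcher m = HIVE_RELATION.matcher(simpleString);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "HiveTableScan [id#6L, plan_type#10], HiveTableRelation dev.data_skew_table_partitioned, "
                + "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#6L, task_pod_name#7], [ds#12]";
        System.out.println(tableName(s).orElse("<none>")); // dev.data_skew_table_partitioned
    }
}
```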

@zebozhuang zebozhuang self-assigned this Sep 21, 2023
@ketingli1
Author

Yes, it was submitted with spark-submit. Do you still need me to open a pull request?

@zebozhuang
Collaborator


Yes, please. It would be very welcome.

@zebozhuang zebozhuang changed the title from 大表扫描场景判断逻辑 ("Large-table scan scenario check logic") to [BUG]: Full table scan scenario check condition Sep 27, 2023
Projects
None yet
Development

No branches or pull requests

2 participants