Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the unnecessary ser/de by substrait in standalone mode #3520

Open
MichaelScofield opened this issue Mar 14, 2024 · 3 comments · May be fixed by #3812
Open

Remove the unnecessary ser/de by substrait in standalone mode #3520

MichaelScofield opened this issue Mar 14, 2024 · 3 comments · May be fixed by #3812

Comments

@MichaelScofield
Copy link
Collaborator

What type of enhancement is this?

Performance

What does the enhancement do?

Currently the standalone use the same query engine just like distributed, which rewrites the logical plan and ser/de it by substrait. This is not required for standalone, and can bring some performance penalty.

Implementation challenges

No response

@waynexia
Copy link
Member

waynexia commented Mar 21, 2024

I'm negative about this ticket. This is what we chose not to do. It would make the code hard to maintain.

Notice we don't actually have a "standalone" for a long time. Only datanode, frontend and metasrv that are composed in different ways.

The "penalty" you are referring to should be small or doesn't exist -- at least at the query level. Even at the plan phase, where a sub-plan would be optimized again, the penalty is relatively small. Because those parts of plan are not optimized in frontend, so there is no thing like duplicate work. As for the substrait codec part, the overhead looks acceptable.

@MichaelScofield
Copy link
Collaborator Author

I'm not going to create a new query engine here, just trying to get rid of the unnecessary ser/de by substrait in standalone mode. It's like a ”surgical" refactor: just make standalone datanode manager called by LogicalPlan directly, without ser/de to bytes. Other things stay unchanged, including the query engine in region server. So, as to the optimization, the LogicalPlan still can be optimized in region server as always.

What I really want to achieve is to make "explain analyze" to work again, otherwise it's difficult to troubleshoot any query performance issues.

@MichaelScofield MichaelScofield changed the title Standalone can use the "standalone" query engine Remove the unnecessary ser/de by substrait in standalone mode Mar 21, 2024
@waynexia
Copy link
Member

👍 That makes much sense.

As for the second part, there is a related issue #2374. I may continue on it if @NiwakaDev is busy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants