Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-5569][VL] Hide child WriteFilesExec from VeloxColumnarWriteFilesExec on UI #5698

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zhztheplayer
Copy link
Member

@zhztheplayer zhztheplayer commented May 11, 2024

Fixes #5569 and #5880

Before:

image

After:

image

Copy link

#5569

@ulysses-you
Copy link
Contributor

Please, do not make things complex... I strongly suggest revert previous pr and just inherit case class for now.

@zhztheplayer
Copy link
Member Author

Please, do not make things complex... I strongly suggest revert previous pr and just inherit case class for now.

I don't agree. Please let me know if any Scala project with good code quality has case-class inheritance.

To me doing case-class inheritance makes thing complex. I know Scala doesn't do well for extensibility of case-class, I don't like this limitation either. But if it's the principle of Scala, I'd follow it.

@zhztheplayer zhztheplayer marked this pull request as ready for review May 11, 2024 02:29
@JkSelf
Copy link
Contributor

JkSelf commented May 11, 2024

@zhztheplayer @ulysses-you
Thank you very much for the optimization and suggestions. Indeed, extending a case class in Scala is not considered good practice. However, introducing such significant changes for this fix could complicate future code maintenance. I believe the root of the issue lies with vanilla Spark, and there should be an abstraction of the WriteFilesExec class to facilitate extension. I see that a similar abstraction has already been done with BaseAggregateExec in the current vanilla Spark. Could we possibly submit a PR to the Spark community to address this issue?

@zhztheplayer
Copy link
Member Author

zhztheplayer commented May 11, 2024

I believe @ulysses-you has submitted this. I knew that should definitely help on this issue however the old Spark versions will take long time to leave from Gluten. So any "temporary" workaround on the issue could become somewhat long-term solution in Gluten. That's a reason why I think we should not rely on workarounds...

@ulysses-you
Copy link
Contributor

My point is that if there is an actually issue with case class, I'm fine to change it since it is a fix, otherwise just add a todo and leave it.

I searched Spark code repo, and really find a case... is it valid for you @zhztheplayer ?

Yes, Spark master branch has a trait for this but older version did not.

@zhztheplayer
Copy link
Member Author

zhztheplayer commented May 11, 2024

I searched Spark code repo, and really find a case... is it valid for you @zhztheplayer ?

The case is not really case-class inheritance we are discussing here. It just creates an alias from AlwaysTrue$ to val INSTANCE = new AlwaysTrue(). Case-class's default members, say equals, hashCode, apply, unapply are still valid and correct.

Having said that I am personally not into the approach either. It's better to be AlwaysTrue.INSTANCE or something.

My point is that if there is an actually issue with case class

Once we get some, it could be very hard to trouble-shoot. Say Spark's subquery / exchange reuse rely on case-class equality, and test cases could rely on case-class's unapply to check plan.

I remember we used to suffer from one or several similar case-class inheritance issues from very early stage of Gluten (probably Gazelle, I can't recall it), and the issues really costed us effort to debug and fix. If we don't keep cleaning similar code, I highly doubt we will soon face another issue again.

@Yohahaha
Copy link
Contributor

Please, do not make things complex... I strongly suggest revert previous pr and just inherit case class for now.

I agree, each code sync between community and out internal repo takes lots time, and many new code structure/naming need to learn...
#5880

@zhztheplayer
Copy link
Member Author

zhztheplayer commented May 28, 2024

I agree, each code sync between community and out internal repo takes lots time

No real refactors can be done if an OSS takes the rebase effort of forked repository into account. We can design some APIs / SPIs and maintain the backward compatibility of them with some well-designed out-of-box tests. But it's not possible to maintain backward compatibility for forked repos. We can't predict how forked code is accessing the base code.

Regarding the case class issue, I am actually a little surprised that we have been having so much discussions about it. I recommend one to take some time to do some Googles to understand why it's so harmful, if having questions about that refactor.

As a developer / committer of Gluten I don't like to be too nitpicking either except that the case-class issue is really the one that each developer is supposed to be aware of. Especially when that kind of code is written against a regular Spark operator (WriteFilesExec), which makes the risk increased by 10x.

You can refer to this code example to see how Scala is doing bad on the extending of case classes.

Also, you guys could go through some old commits in Gazelle Plugin (the ancestor of project Gluten), which could perfectly tell you a story how we used to be suffering from case class inheritances:

  1. oap-project/gazelle_plugin@54a2fb1
  2. oap-project/gazelle_plugin@613d315
  3. oap-project/gazelle_plugin@9104398

fixup

fixup

fixup

fixup

fixup

fixup

fixup

fixup

UI
Copy link

Run Gluten Clickhouse CI

@ulysses-you
Copy link
Contributor

@zhztheplayer OK. I get your point and agree actually. But I think it still overkill for the VeloxColumnarWriteFilesExec case. I will send a discusstion mail to dev@gluten.apache.org to see other folks voice and decide where we go. We are working together and we should reach an agreement.

@FelixYBW
Copy link
Contributor

@zhztheplayer @ulysses-you Thank you very much for the optimization and suggestions. Indeed, extending a case class in Scala is not considered good practice. However, introducing such significant changes for this fix could complicate future code maintenance. I believe the root of the issue lies with vanilla Spark, and there should be an abstraction of the WriteFilesExec class to facilitate extension. I see that a similar abstraction has already been done with BaseAggregateExec in the current vanilla Spark. Could we possibly submit a PR to the Spark community to address this issue?

Can we submit a PR to vanilla Spark to fix this? Looks it's the right thing to do in long term.

@ulysses-you
Copy link
Contributor

@FelixYBW yea, it has landed at Spark 4.0.0, but it did not help the older Spark version we support..

@FelixYBW
Copy link
Contributor

@FelixYBW yea, it has landed at Spark 4.0.0, but it did not help the older Spark version we support..

Yes, it makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VL] Hide child WriteFilesExec from VeloxColumnarWriteFilesExec on UI
5 participants