[GLUTEN-5569][VL] Hide child WriteFilesExec from VeloxColumnarWriteFilesExec on UI #5698

zhztheplayer · 2024-05-11T02:02:54Z

Fixes #5569 and #5880

Before:

After:

github-actions · 2024-05-11T02:03:09Z

#5569

ulysses-you · 2024-05-11T02:15:14Z

Please, do not make things complex... I strongly suggest reverting the previous PR and just inheriting the case class for now.

zhztheplayer · 2024-05-11T02:22:55Z

> Please, do not make things complex... I strongly suggest reverting the previous PR and just inheriting the case class for now.

I don't agree. Please let me know if any Scala project with good code quality has case-class inheritance.

To me, case-class inheritance is what makes things complex. I know Scala doesn't handle case-class extensibility well, and I don't like this limitation either. But if it's a principle of Scala, I'd follow it.

JkSelf · 2024-05-11T03:09:41Z

@zhztheplayer @ulysses-you
Thank you very much for the optimization and suggestions. Indeed, extending a case class in Scala is not considered good practice. However, introducing such significant changes for this fix could complicate future code maintenance. I believe the root of the issue lies with vanilla Spark, and there should be an abstraction of the WriteFilesExec class to facilitate extension. I see that a similar abstraction has already been done with BaseAggregateExec in the current vanilla Spark. Could we possibly submit a PR to the Spark community to address this issue?
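
The kind of abstraction suggested here can be sketched roughly as follows. This is only an illustration: `BaseWrite`, `RowWrite`, and `ColumnarWrite` are hypothetical names and do not correspond to the actual Spark or Gluten classes. The idea is that shared behavior sits in a trait, while each operator remains its own final case class.

```scala
// Hypothetical sketch; none of these types exist in Spark or Gluten.
// Shared behavior lives in a trait instead of in a concrete case class.
trait BaseWrite {
  def path: String
  def summary: String = s"write to $path"
}

// Each operator is its own final case class, so the compiler-generated
// equals, hashCode, copy and unapply cover exactly that class's fields.
final case class RowWrite(path: String) extends BaseWrite
final case class ColumnarWrite(path: String, codec: String) extends BaseWrite

object TraitAbstraction {
  val row: BaseWrite = RowWrite("/tmp/out")
  val columnar: BaseWrite = ColumnarWrite("/tmp/out", "zstd")

  // Code that only needs the shared contract can work against the trait.
  def plan(w: BaseWrite): String = w match {
    case ColumnarWrite(p, codec) => s"native write to $p ($codec)"
    case other                   => other.summary
  }

  def main(args: Array[String]): Unit = {
    println(row == columnar) // false: different classes, different fields
    println(plan(row))       // write to /tmp/out
    println(plan(columnar))  // native write to /tmp/out (zstd)
  }
}
```

With this shape, neither operator inherits generated members from the other, which is the property a shared base trait would be expected to provide.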

zhztheplayer · 2024-05-11T03:22:01Z

I believe @ulysses-you has submitted this. I know that should definitely help with this issue; however, the old Spark versions will take a long time to be dropped from Gluten. So any "temporary" workaround for the issue could become a somewhat long-term solution in Gluten. That's one reason why I think we should not rely on workarounds...

ulysses-you · 2024-05-11T03:41:40Z

My point is that if there is an actual issue with the case class, I'm fine with changing it since it is a fix; otherwise just add a TODO and leave it.

I searched the Spark code repo and did find a case... is it valid for you, @zhztheplayer?

Yes, the Spark master branch has a trait for this, but older versions do not.

zhztheplayer · 2024-05-11T04:11:17Z

> I searched the Spark code repo and did find a case... is it valid for you, @zhztheplayer?

The case is not really the case-class inheritance we are discussing here. It just creates an alias from AlwaysTrue$ to val INSTANCE = new AlwaysTrue(). The case class's default members, say equals, hashCode, apply, unapply, are still valid and correct.

Having said that, I am personally not into that approach either. It would be better as AlwaysTrue.INSTANCE or something.

> My point is that if there is an actual issue with the case class

Once we get one, it could be very hard to troubleshoot. Say, Spark's subquery / exchange reuse relies on case-class equality, and test cases could rely on a case class's unapply to check plans.
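
A minimal sketch of the equality concern, using hypothetical stand-in classes (`Write`, `NativeWrite`) rather than real Spark or Gluten operators: a class that extends a case class inherits the generated equals and hashCode, which only look at the parent's fields.

```scala
// Hypothetical stand-ins for a plan node and its subclass;
// these are not Spark or Gluten classes.
case class Write(path: String)

// Extends the case class and adds state that the inherited
// equals/hashCode know nothing about.
class NativeWrite(path: String, val codec: String) extends Write(path)

object EqualityPitfall {
  val a = new NativeWrite("/tmp/out", "zstd")
  val b = new NativeWrite("/tmp/out", "snappy")
  val plain = Write("/tmp/out")

  def main(args: Array[String]): Unit = {
    println(a == b)                       // true, although the codecs differ
    println(plain == a)                   // true, although the classes differ
    println(Set[Write](a, b, plain).size) // 1
  }
}
```

Any logic that deduplicates or reuses nodes by equality would treat these three values as the same node, which is the kind of silent mismatch that is hard to trace back.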

I remember we used to suffer from one or several similar case-class inheritance issues in the very early stages of Gluten (probably in Gazelle, I can't recall exactly), and those issues really cost us effort to debug and fix. If we don't keep cleaning up similar code, I strongly suspect we will soon face another such issue.

Yohahaha · 2024-05-27T09:48:17Z

> Please, do not make things complex... I strongly suggest reverting the previous PR and just inheriting the case class for now.

I agree. Each code sync between the community and our internal repo takes a lot of time, and there are many new code structures and names to learn...
#5880

zhztheplayer · 2024-05-28T05:03:03Z

> I agree. Each code sync between the community and our internal repo takes a lot of time

No real refactoring can be done if an OSS project has to take the rebase effort of forked repositories into account. We can design some APIs / SPIs and maintain their backward compatibility with some well-designed out-of-the-box tests. But it's not possible to maintain backward compatibility for forked repos, because we can't predict how forked code accesses the base code.

Regarding the case-class issue, I am actually a little surprised that we have been having so much discussion about it. If anyone has questions about that refactor, I recommend taking some time to search the web to understand why it's so harmful.

As a developer / committer of Gluten, I don't like to be too nitpicky either, but the case-class issue is really one that every developer is supposed to be aware of. This is especially true when that kind of code is written against a regular Spark operator (WriteFilesExec), which increases the risk by 10x.

You can refer to this code example to see how badly Scala handles extending case classes.
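
The linked example is not reproduced in this thread. As a rough stand-in, here is a small sketch with hypothetical classes (`Scan`, `NativeScan`) showing two more generated members that misbehave under inheritance: copy, which silently drops the subclass, and the extractor used in pattern matching, which cannot tell the subclass from the parent.

```scala
// Hypothetical stand-ins, not Spark or Gluten classes.
case class Scan(table: String)

// A plain class extending the case class, carrying one extra field.
class NativeScan(table: String, val pushdown: Boolean) extends Scan(table)

object CopyAndMatchPitfall {
  val nativeScan: Scan = new NativeScan("t1", pushdown = true)

  // copy is generated for Scan only, so the result is a plain Scan
  // and the pushdown flag is gone.
  val copied: Scan = nativeScan.copy(table = "t2")

  // The generated extractor accepts any Scan, so this branch also
  // fires for NativeScan.
  def describe(plan: Scan): String = plan match {
    case Scan(t) => s"vanilla scan of $t"
  }

  def main(args: Array[String]): Unit = {
    println(copied.isInstanceOf[NativeScan]) // false
    println(describe(nativeScan))            // vanilla scan of t1
    println(nativeScan)                      // Scan(t1)
  }
}
```

Even toString reports the parent's name, so the subclass is invisible in printed output unless every generated member is overridden by hand.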

Also, you could go through some old commits in Gazelle Plugin (the ancestor of the Gluten project), which tell the story of how we used to suffer from case-class inheritance:


github-actions · 2024-05-30T05:15:54Z

Run Gluten Clickhouse CI

ulysses-you · 2024-05-31T03:22:19Z

@zhztheplayer OK. I get your point and actually agree. But I think it is still overkill for the VeloxColumnarWriteFilesExec case. I will send a discussion mail to dev@gluten.apache.org to hear other folks' voices and decide where we go. We are working together and we should reach an agreement.

FelixYBW · 2024-05-31T04:53:09Z

> @zhztheplayer @ulysses-you Thank you very much for the optimization and suggestions. Indeed, extending a case class in Scala is not considered good practice. However, introducing such significant changes for this fix could complicate future code maintenance. I believe the root of the issue lies with vanilla Spark, and there should be an abstraction of the WriteFilesExec class to facilitate extension. I see that a similar abstraction has already been done with BaseAggregateExec in the current vanilla Spark. Could we possibly submit a PR to the Spark community to address this issue?

Can we submit a PR to vanilla Spark to fix this? It looks like the right thing to do in the long term.

ulysses-you · 2024-05-31T06:31:19Z

@FelixYBW Yes, it has landed in Spark 4.0.0, but it does not help the older Spark versions we support.

FelixYBW · 2024-05-31T07:06:19Z

> @FelixYBW Yes, it has landed in Spark 4.0.0, but it does not help the older Spark versions we support.

Yes, it makes sense.

zhztheplayer marked this pull request as ready for review May 11, 2024 02:29

zhztheplayer mentioned this pull request May 13, 2024

[VL]: Fix VeloxColumnarWriteFilesExec withNewChildren doesn't replace the dummy child #5726

Merged

zhztheplayer force-pushed the wip-ui-write branch from a13565e to 611bdfd Compare May 23, 2024 01:13

Yohahaha mentioned this pull request May 27, 2024

[VL] Noisy wrong fallback message after case-class refactor #5880

Open

zhztheplayer added 2 commits May 30, 2024 11:14

Update VeloxColumnarWriteFilesExec.scala

32a7f8b

fixup fixup fixup fixup fixup fixup fixup fixup UI

fixup

14cef0f

zhztheplayer force-pushed the wip-ui-write branch from c53af5b to 14cef0f Compare May 30, 2024 05:15
