[FEATURE] Cross-building via Mill #540

Open
nightscape opened this issue Feb 27, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@nightscape

nightscape commented Feb 27, 2024

Is your feature request related to a problem? Please describe.
Currently, Deequ only gets published for a small subset of combinations of Scala version ✖️ Spark version.
People are running into issues, or trying to get PRs with version changes merged (which of course would hurt people using different version combinations):

Describe the solution you'd like
I'm the maintainer of spark-excel, a Spark library for reading and writing Excel files.
I'm using Mill to build the library against an extensive set of combinations of Spark and Scala versions.
The corresponding build definition currently has ~160 lines of code, which is less than half of Deequ's pom.xml.

Li Haoyi, the inventor of Mill, has published a very interesting blog post on why he developed Mill.
From a user perspective, Mill is nice because

  • it has a very simple and familiar mental model (just traits, objects and defs with a special T class doing all the magic)
  • it allows writing custom tasks with a minimal amount of overhead
  • it compiles code faster than most other solutions
  • it has cross-building built in, not only for Scala versions, but along any dimension you want.
  • it is actively maintained
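
To make the cross-building point concrete, here is a minimal sketch of a two-dimensional cross build. It assumes Mill 0.11-style cross modules (`Cross`, `Cross.Module2`); the module name `deequ`, the version numbers, and the single Spark dependency are illustrative assumptions, not Deequ's or spark-excel's actual build.

```scala
// build.sc (sketch, assuming Mill 0.11-style cross modules)
import mill._, scalalib._

// Illustrative versions only; not Deequ's actual support matrix.
val scalaVersions = Seq("2.12.18", "2.13.12")
val sparkVersions = Seq("3.3.4", "3.5.0")

// One (Scala version, Spark version) pair per module instance.
val crossMatrix = for {
  sv <- scalaVersions
  sp <- sparkVersions
} yield (sv, sp)

// `deequ` is a hypothetical module name.
object deequ extends Cross[DeequModule](crossMatrix)

trait DeequModule extends Cross.Module2[String, String] with ScalaModule {
  def scalaVersion = crossValue   // first cross dimension
  def sparkVersion = crossValue2  // second cross dimension

  // Spark is usually provided by the cluster, so it is compile-only here.
  def compileIvyDeps = Agg(
    ivy"org.apache.spark::spark-sql:${sparkVersion}"
  )
}
```

With a definition along these lines, `mill __.compile` should compile every combination in the matrix, and combinations that do not exist upstream can be filtered out of `crossMatrix` before it is passed to `Cross`.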

Would it be conceivable to give Mill a try for cross-building Deequ to a wider set of Scala and Spark versions?
If so, what constraints would have to be met?
I would be willing to create a PR for this if there's a realistic chance to get it merged.

Describe alternatives you've considered
The net.alchim31.maven scala-maven-plugin does not support cross-building directly. There are alternatives and extensions though, e.g.

@nightscape nightscape added the enhancement New feature or request label Feb 27, 2024
@hygt

hygt commented Feb 29, 2024

I have an internal fork at work that is essentially doing this for a small matrix of Spark and Scala versions. It would be great if this was done upstream.

I don't have a strong opinion on Mill vs sbt, but I also feel Maven is clearly inadequate here (and in the Spark ecosystem at large).

@nightscape
Author

Both SBT and Mill work much better for Scala projects than Maven does.
SBT is ok for simple things, or if you invest a lot of time into understanding its underlying model.
Mill just works out of the box, and even though I have only spent about ¼ of the time with it compared to SBT, I already consider myself more proficient and productive with it than I ever was with SBT.

@hygt

hygt commented Feb 29, 2024

I use both, but sbt builds are still better supported by IntelliJ IDEA, and the toolchain is easier to bootstrap in a corporate environment with proxies and Maven mirrors. This point is important to people who aren't that deeply invested in the Scala ecosystem, and this is why Maven is so popular around Spark despite being objectively the wrong tool for the job here.

Deequ is a fairly simple project to build (my current build.sbt is 35 LoC). Even if you add cross Scala/Spark version support, the Mill build won't be significantly simpler.
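
For comparison, a rough sketch of what cross Scala/Spark support can look like in sbt. This is not the internal build.sbt mentioned above; the project name, the version numbers, and the `spark.version` system property are assumptions made for illustration.

```scala
// build.sbt (sketch; names and versions are illustrative assumptions)

// Spark version is chosen per invocation through a hypothetical system property.
val sparkVersion = sys.props.getOrElse("spark.version", "3.5.0")

lazy val root = (project in file("."))
  .settings(
    name := "deequ-sketch",
    scalaVersion := "2.12.18",
    // Scala versions that `+`-prefixed commands iterate over.
    crossScalaVersions := Seq("2.12.18", "2.13.12"),
    // Spark is usually provided by the cluster at runtime.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % Provided
  )
```

Under these assumptions, `sbt -Dspark.version=3.3.4 +compile` would compile against Spark 3.3.4 for each listed Scala version. The Spark dimension is handled outside sbt here, one invocation per Spark version, whereas Mill can model it as a second cross axis.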

@nightscape
Author

@hygt SBT is definitely fine as well and a big step forward from Maven!
Would you mind opening a PR with your SBT implementation?

@rdsharma26
Contributor

@nightscape @hygt Thanks for the helpful information on Mill and SBT. We did some work last year to move our build to SBT, but we did not create a PR for it.

If possible, as a first step, can we have the SBT implementation side by side with the Maven implementation? That way, we can get SBT builds integrated into the project quickly and we can fall back to the Maven build for deploying.

3 participants