Is mvnd "safe" for parallel builds where "mvn" alone is not? #896

Open
jimklimov opened this issue Oct 21, 2023 · 9 comments

Comments

@jimklimov

Related context:

Generally, for using mvnd on the CI:

  • If you expect faster builds thanks to building Maven modules in parallel, then you may consider using stock Maven's -T option. But be warned that it may lead to issues caused by the fact that stock Maven (as of 3.8.2) does not prevent concurrent writes to the same file in the local Maven repo. BTW mvnd 0.6.0 suffers from this problem as well. <...>

Originally posted by @ppalaga in #498 (comment)

My question is rather this: we have CI jobs running mvn tests under the same user account on the same agent, whether for fully independent jobs or for different parallel stages of the same job (spawning a burst of tests for different components, etc.).

This approach occasionally failed with unexpected errors. We assume these were caused by several Maven processes corrupting the same local repository when their downloads happened to run simultaneously and fetch the same files. We came up with a few possible solutions, but have no idea whether any of them are viable (hence the question):

  • switch to mvnd, hoping it would handle parallel download requests to the same local repository sanely (even if it serializes them internally) and transparently for the mvn "client" calls;
  • use locking (e.g. Jenkins lockable resources) to ensure sequential runs, and split the Maven operations: first call mvn validate before each longer build/test as the sequential operation (assuming this would fetch all needed files), then run the actual mvn test/compile/package/etc. as the parallel operations, relying on the files fetched safely beforehand.
    • With this, I'm not sure that a later download session from another Maven process would NOT try to clean up files in the local repository that it does not need for the build it is currently handling.
  • use separate Maven local repositories for each independent operation. This raises a few concerns, however:

Any insights would be most welcome :)
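The second option above (serialize a prefetch step, then run the real builds in parallel) does not strictly need Jenkins: any shared lock file on the agent can do the serializing. Below is a minimal sketch. It assumes a POSIX agent, because `fcntl.flock` is not available on Windows, and it uses hypothetical `prefetch`/`build` callables that would wrap the actual Maven invocations. Whether a given Maven call really fetches everything the later steps need is exactly the open question here. The sketch only shows the serialization.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def exclusive_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    # Open (or create) the lock file; its content is irrelevant, only the lock matters.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def prefetch_then_build(lock_path: str,
                        prefetch: Callable[[], None],
                        build: Callable[[], None]) -> None:
    """Run `prefetch` serialized across processes, then `build` without the lock.

    Hypothetical helper: in CI, `prefetch` would wrap the dependency-fetching
    Maven call and `build` the actual test/package call.
    """
    with exclusive_lock(lock_path):
        prefetch()
    build()  # runs concurrently with other jobs' builds
```

The lock covers only the short prefetch phase, so long test runs still overlap. The design only helps if nothing in the build phase downloads again.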

@cstamas
Member

cstamas commented Oct 21, 2023

First, please forget the "built-in" parallel builder and use this instead: https://github.com/takari/takari-smart-builder (the same one mvnd uses).

Second, have you tried Maven 3.9.x (preferably the latest version)? For a start, you could use file locking (if the local repository is on a local file system).

@jimklimov
Copy link
Author

Thanks for the suggestions. I checked, and the workers used Maven 3.8.6... So could switching to 3.9.x just fix the situation while keeping all those build calls as independent as they are now?

@cstamas
Member

cstamas commented Oct 21, 2023

Maven 3.9 introduced "locks" for the local repository, aimed at solving exactly that: shared access to the local repository from multiple processes... So it would be best to try it out (hopefully on a sound file system like ext4 or similar, with no Windows in the picture).
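The idea behind such locks can be shown outside Maven. This is a conceptual sketch only, not the resolver's implementation. Each artifact path gets its own lock file, and the first process to take the lock downloads the artifact. The file then becomes visible through an atomic rename, so a second process either waits or finds the finished file. It assumes a POSIX file system with working `flock` (in line with the ext4 remark above). `fetch_once`, the `.locks` directory name and the `download` callable are hypothetical.

```python
import fcntl
import os
import tempfile
from typing import Callable


def fetch_once(repo_dir: str, rel_path: str, download: Callable[[], bytes]) -> str:
    """Store rel_path under repo_dir at most once, safe against concurrent writers.

    Conceptual sketch: one lock file per artifact path, plus an atomic rename,
    so readers never observe a half-written file.
    """
    target = os.path.join(repo_dir, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    lock_dir = os.path.join(repo_dir, ".locks")  # hypothetical lock location
    os.makedirs(lock_dir, exist_ok=True)
    lock_name = rel_path.replace(os.sep, "_") + ".lock"  # one lock per artifact
    fd = os.open(os.path.join(lock_dir, lock_name), os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        if not os.path.exists(target):  # another process may have finished first
            data = download()
            # Write to a temp file in the same directory, then rename atomically.
            tmp_fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target))
            try:
                with os.fdopen(tmp_fd, "wb") as tmp:
                    tmp.write(data)
                os.replace(tmp_path, target)
            except BaseException:
                if os.path.exists(tmp_path):
                    os.unlink(tmp_path)
                raise
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
    return target
```

Locking per artifact rather than per repository lets unrelated downloads proceed in parallel, and only requests for the same file wait on each other.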

@jimklimov
Copy link
Author

Oh, that funny moment when the huge internet looks like a small village: https://www.mail-archive.com/users@maven.apache.org/msg144072.html

Configuration should be as easy as setting “aether.syncContext.named.factory” to “file-lock”

So I'm looking for the best way, among several possibilities, to pass aether.syncContext.named.factory=file-lock from CI to Maven without confusing any other (pre-3.9.x) tool versions along the way :)
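One way to avoid confusing older tools is to add the flags only when the Maven in use is 3.9.0 or newer. Below is a minimal sketch that assumes the CI step can capture the output of `mvn --version`, whose first line starts with "Apache Maven X.Y.Z". The property values are the ones quoted in this thread and should be double-checked against the resolver configuration reference. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Resolver locking properties as quoted in this thread (verify before relying on them).
LOCKING_PROPS = [
    "-Daether.syncContext.named.factory=file-lock",
    "-Daether.syncContext.named.nameMapper=file-gav",
]


def parse_maven_version(version_output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `mvn --version` output, or None."""
    m = re.search(r"Apache Maven (\d+)\.(\d+)\.(\d+)", version_output)
    if m is None:
        return None
    return (int(m.group(1)), int(m.group(2)), int(m.group(3)))


def locking_flags_for(version_output: str) -> List[str]:
    """Return the locking flags only for Maven 3.9.0 or newer, else an empty list."""
    version = parse_maven_version(version_output)
    if version is not None and version >= (3, 9, 0):
        return list(LOCKING_PROPS)
    return []
```

Unparseable output falls back to "no flags", which errs on the side of leaving unknown tools alone.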

@cstamas
Member

cstamas commented Oct 21, 2023

Create (and check into SCM) a file in the project like this:

.mvn/maven.config

with the contents

-Daether.syncContext.named.factory=file-lock
-Daether.syncContext.named.nameMapper=file-gav

PS: please double-check the above; it is from the top of my head.

https://maven.apache.org/configure.html#mvn-maven-config-file
https://maven.apache.org/resolver/configuration.html

@jimklimov
Author

jimklimov commented Oct 21, 2023

Thanks for the options; my first attempt missed the "file-gav" part ;)

Putting them into SCM as part of the components' source seems a bit like overkill, though: the components should be buildable anywhere (and perhaps with different environment settings), right?

For posterity and my own back-tracking: for now I'll explore the MAVEN_OPTS environment variable instead, so that each Jenkins worker can set whatever is relevant there...
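The mvn launcher script hands MAVEN_OPTS to the JVM, so per-worker `-D` settings can live there instead of in `.mvn/maven.config`. Below is a minimal sketch of merging such properties into an existing MAVEN_OPTS value without duplicating keys. It assumes the resolver picks up these settings from JVM system properties, which is worth verifying on one worker first. `merge_maven_opts` is a hypothetical helper.

```python
import shlex
from typing import Dict


def merge_maven_opts(existing: str, props: Dict[str, str]) -> str:
    """Add -Dkey=value entries to a MAVEN_OPTS string, replacing older values of the same keys."""
    tokens = shlex.split(existing)
    # Drop any previous definition of the keys we are about to set.
    kept = [t for t in tokens
            if not any(t == f"-D{k}" or t.startswith(f"-D{k}=") for k in props)]
    kept += [f"-D{k}={v}" for k, v in props.items()]
    return " ".join(shlex.quote(t) for t in kept)
```

Because the function is idempotent, a worker's setup step can run it on every job without MAVEN_OPTS growing.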

@jimklimov
Author

While at it, I am trying to wrap my head around the split local repository feature (https://maven.apache.org/resolver/local-repository.html#split-local-repository). Is there some trick that would let several CI builds share downloaded third-party artifacts, while storing and using artifacts from designated "our" namespaces separately and without conflict?

I saw suggestions such as -Daether.enhancedLocalRepository.localPrefix=$PROJECT/$BRANCH, but that seems to relate more to where mvn install puts artifacts. Things like mvn test happen inside the build workspace and do not write the built code or test binaries into the local repository on their own, right?
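The per-branch prefix mentioned above can be derived from CI variables. Below is a minimal sketch that assumes the property names from the split-local-repository page linked above (`aether.enhancedLocalRepository.split` and `...localPrefix`). As suspected in the question, it also assumes the prefix only affects locally installed artifacts, while downloaded ones stay shared. `split_repo_flags` is hypothetical, and replacing "/" in branch names is a design choice, not a requirement.

```python
from typing import List


def split_repo_flags(project: str, branch: str) -> List[str]:
    """CLI flags enabling the split local repository with a per-project/branch prefix.

    Property names follow the resolver's split-local-repository docs; verify
    them against the resolver version actually in use.
    """
    # Branch names like "feature/x" would otherwise add an extra directory level.
    safe_branch = branch.replace("/", "_")
    return [
        "-Daether.enhancedLocalRepository.split=true",
        f"-Daether.enhancedLocalRepository.localPrefix={project}/{safe_branch}",
    ]
```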

@jimklimov
Author

And circling back to this repository's topic with some new knowledge in mind: does current mvnd benefit from the new concurrency-safe resolver the way Maven 3.9.x does?

A large part of my team members' question boils down to whether we can replace the original Maven with mvnd+mvn front-end at a finger-snap, by putting different tools into the PATH without changing much else in the pipelines, and whether this would bring any efficiency benefits.

This idea to use mvnd originally came up while brainstorming a one-off case: an SBOM processing script that does a lot of analytics over mvn help:effective-pom output (each call takes about 3 seconds to generate the XML, which adds up to painful run times for the hundreds of components in a deliverable bundle).
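The SBOM case boils down to one help:effective-pom call per component. Below is a minimal sketch of such a driver (hypothetical, not the script from this thread). It uses the plugin's `-Doutput=` parameter to write each result to a file. The executable is a parameter, so the same loop can be timed with plain mvn and with mvnd, assuming whichever one is chosen is on the PATH. Any speed-up would come from mvnd reusing a warm daemon between calls. The ~3 seconds per call quoted above is the baseline to compare against, not a measured result here.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def effective_pom_command(tool: str, pom: Path, out_file: Path) -> List[str]:
    """Command line that writes the effective POM of `pom` to `out_file`.

    `tool` is "mvn" or "mvnd"; mvnd is meant to accept the same arguments.
    """
    return [tool, "-B", "-q", "-f", str(pom),
            "help:effective-pom", f"-Doutput={out_file}"]


def dump_effective_poms(tool: str, poms: Sequence[Path], out_dir: Path) -> List[Path]:
    """Run help:effective-pom for each POM in turn; returns the files written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, pom in enumerate(poms):
        out_file = out_dir / f"effective-{i}.xml"
        subprocess.run(effective_pom_command(tool, pom, out_file), check=True)
        written.append(out_file)
    return written
```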

@cstamas
Member

cstamas commented Oct 22, 2023

Split local repository is a new feature, but be aware that in Maven 3 land not all plugins "play nice" with it; see https://issues.apache.org/jira/browse/MNG-7706. In short, if you use Maven 3.9.x and see no "plugin validation warnings" (see https://maven.apache.org/guides/plugins/validation/index.html), you should be pretty much okay (though still with no 100% guarantee). It is best to try it locally first, so "lab testing" is what I'd recommend.

Moreover, as the split local repository is known to be a bit "mind-boggling", I'd recommend even more strongly to play with it locally (on a dev workstation) first, and only when everything is in place and working, apply it to CI.

mvnd := mvn 3.9.x (or 4, depending on the mvnd edition) + resolver 1.9.x + smart builder + concurrent logging + resident daemon. So in short, yes: mvnd knows everything mvn 3.9.x knows, plus much more. In mvnd, file locking is enabled by default (because the resident daemon processes share the same local repository).

Split repo for that use case is next on my roadmap, but it will only happen in Resolver 2.0 (so Maven 4 final), not in the Maven 3.9.x lifecycle. Currently, "split" can only split as documented: by origin remote repository, and by cached vs installed. Its current goal was "branched development" (one local repository shared by several Maven processes building different branches of the same project).
