Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check out subversion modules in parallel to speed up checkout #110

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

kutzi
Copy link
Member

@kutzi kutzi commented Feb 5, 2015

This change lets the plugin check out modules in parallel.
We have a project which has a big number of SVN modules and checking out/updating the modules alone took > 2 minutes.
Checking out with 4 threads in parallel brought the time down to 30 secs.

We have been running this code on our Jenkins for > 2 years now without any issues. I've recently rebased the changes on top op subversion-plugin 2.5

@daniel-beck
Copy link
Member

How does this affect checking out into overlapping directories, e.g. one module to ., the other to foo?

Thread count should be configurable (System property to be set on master would suffice IMO), and this code needs to make sure to work correctly with one thread.

@jenkinsadmin
Copy link
Member

Thank you for a pull request! Please check this document for how the Jenkins project handles pull requests

@kutzi
Copy link
Member Author

kutzi commented Feb 5, 2015

@daniel-beck good point about overlapping directories. I guess the behaviour would be unpredictable in worst case. I'm already checking for exact duplicates and issue a warning. Will try to do the same for overlaps, but I'm pretty sure that could become arbitrary complex if someone is using relative paths.

@kutzi
Copy link
Member Author

kutzi commented Feb 5, 2015

@daniel-beck what do you mean by: "this code needs to make sure to work correctly with one thread"?
Do you see any problems in that area? I think I accounted for that with the CurrentThreadExecutorService

@kutzi
Copy link
Member Author

kutzi commented Feb 13, 2015

@daniel-beck I've implemented your suggested changes

@daniel-beck
Copy link
Member

I expect the most common case to be as I described with one of them being ., so that case should be handled. I certainly use that in numerous projects due to how projects are laid out in Subversion.

I'd also prefer if that kind of configuration wasn't effectively deprecated.

@kutzi
Copy link
Member Author

kutzi commented Feb 14, 2015

I'm not sure, I yet understand what you mean by the common case, but I guess it's something like this:

  • you check out your main svn module to .
  • you check out an auxiliary module (maybe some environment templates) to a subfolder of this (e.g. env)
    Right?

In that case it should be safe to check out in parallel - given the 'sane case' that the main module doesn't contains an env folder by this self.
However, we cannot assume that each user has such a 'sane case'.
It the main module would also contain the env folder, the behaviour would be pretty much unpredictable on paralle checkout.

I can see various solutions to this:

  1. detect overlaps in the local dirs and serialize checkout only for this modules: bad, because adds lot of complexity. Also very probable to not detect 100% of this cases
  2. lets users decide - new option to checkout in paralle: bad because of new option. Also users probably wouldn't understand full consequences of checking out in parallel
  3. detect overlaps and check out only with 1 thread for such jobs: probably unnecessary slows down checkout for those jobs and also probably will not detect 100% of cases
  4. keep it as is and check out all jobs single-threaded

I'd prefer to implement option 3. If there are problems because of not detected corner cases, users could still use the system property to use single-threaded check out.

What do you think?

@kutzi
Copy link
Member Author

kutzi commented Mar 6, 2015

@daniel-beck do you have any feedback?

@daniel-beck
Copy link
Member

@kutzi I have not had the time to try this PR in a staging environment, and am still on 1.565.3 and 2.4.x in production.

I wonder though, shouldn't it be fairly easy to determine which checkouts need to be serialized based on different levels in the folder hierarchy? For example, . needs to be independent of others while foo and bar can be done in parallel, as can foo/bar/baz and qux. foo and foo/bar/baz though cannot (or should not, conservatively).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants