Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve submodule operation performance #2655

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

CosynPa
Copy link
Contributor

@CosynPa CosynPa commented Dec 2, 2018

Fixes #1833 and #2651.

Better wait after #2654 is merged, then rebase on it.

@tmspzz
Copy link
Member

tmspzz commented Dec 4, 2018

You can rebase

@CosynPa
Copy link
Contributor Author

CosynPa commented Dec 7, 2018

Rebased.

@tmspzz
Copy link
Member

tmspzz commented Dec 7, 2018

I don't work with submodules enough to feel qualified to review the change. I think @mdiep should be reviewing here.

@CosynPa Ad high level description of the problem and the solution would certainly sleep up the review.

@CosynPa
Copy link
Contributor Author

CosynPa commented Dec 7, 2018

The problem is, as is pointed out in the two issues, the carthage checkout operation will try cloning submodules of dependencies, which takes a lot of time.

There are two functions that will clone or fetch submodules, one is cloneRepository which will be called when the --use-submodules option is not used, the other is checkoutSubmodule which will be called both with or without the --use-submodules option.

My solution is that first clone or fetch submodules to the local cache directory. Then redirect the slow network operations to the local cache. For checkoutSubmodule, which will do a git submodule update --init --recursive, the solution is a little bit tricky. I first change the .gitmodules file, so that the URLs of the submodules are pointing to the local cache repository, then do the git submodule update --init, and change the URLs back.

I also serialized heavy repository write operations, so cloning the same repository twice and checking out the same repository twice can be avoided.

@mdiep
Copy link
Member

mdiep commented Dec 23, 2018

I'll try to get to this ASAP. I'm on vacation right now, but I'll see if I can find some time to review this.

@stale
Copy link

stale bot commented Jan 22, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 22, 2019
@stale stale bot removed the stale label Jan 22, 2019
@@ -153,6 +153,7 @@ public func contentsOfFileInRepository(_ repositoryFileURL: URL, _ path: String,
public func checkoutRepositoryToDirectory(
_ repositoryFileURL: URL,
_ workingDirectoryURL: URL,
force: Bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this added/needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original checkoutRepositoryToDirectory function was a forced checkout, and in checkoutSubmodule there was a non-forced checkout. So to reuse the checkoutRepositoryToDirectory function, I added the force parameter.


/// Clones the given submodule into the working directory of its parent
/// repository, but without any Git metadata.
public func cloneSubmoduleInWorkingDirectory(_ submodule: Submodule, _ workingDirectoryURL: URL) -> SignalProducer<(), CarthageError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these functions should move here:

  1. These are generic functions that shouldn't really be tied to the Project
  2. Because these were moved, the diff for this PR is really large and almost impossible to review

Can you move them back to the Git file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions were moved because the new implementation depends on repositoryFileURL which is a private function in Project.swift.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then that should be passed in as a parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about cloneOrFetchDependency? These functions actually rely on cloneOrFetchDependency which uses cloneOrFetchQueue and _projectEventsObserver which are Project properties. I think some kind of queue is definitely needed. So also pass the queue in as a parameter?

}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This global state makes me very uncomfortable. I don't think it's a good idea to introduce a static var with this. It leads to code that's hard to follow and modify.

We should use another method to enforce exclusivity. I think it makes sense to do something like this:

  1. Fetch all the submodules from the repository cache, in parallel but with limited concurrency.
  2. Perform the actual checkouts, passing in a queue or doing them one at a time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there are already a lot of queues, but they are instance properties of a Project. So I just need to move the global variable to the Project class?

}

public func updateSubmodules(_ superprojectDirectoryURL: URL) -> SignalProducer<(), CarthageError> {
func changeGitmodulesUrls(toCacheURL: Bool) -> SignalProducer<(), CarthageError> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary. The original code in addSubmoduleToRepository already handled doing the equivalent of git submodule update --init from the repository cache.

If your intent is to make that logic recursive, I think you can do that following the existing method without modifying .gitmodules.

@mdiep
Copy link
Member

mdiep commented Jan 24, 2019

First off: Thanks for the PR!

I think the overall idea—make sure we're using the repository cache for the submodules of dependencies—is very good. That would be a very nice contribution.

But I'm not comfortable with the current approach. I think it needs to be reworked to (1) be more reviewable and (2) avoid changing .gitmodules.

I also serialized heavy repository write operations, so cloning the same repository twice and checking out the same repository twice can be avoided.

Is this necessary for the primary change described above? If not, let's leave it for a followup PR. Bundling changes together makes PRs harder, and thus slower, to review—which is worse experience for everyone.

@CosynPa
Copy link
Contributor Author

CosynPa commented May 26, 2019

I'd like to separate this PR into two. The first is #2795.

@CosynPa
Copy link
Contributor Author

CosynPa commented Aug 4, 2019

I've separated this PR into three: #2795, #2832 and #2842.

@1ec5
Copy link

1ec5 commented Jan 9, 2020

Looks like both #2795 and #2832 have been merged, but #2842 got closed as stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Checkout spends a lot of time cloning submodules
4 participants