New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve submodule operation performance #2655
base: master
Are you sure you want to change the base?
Conversation
5e9c436
to
824648d
Compare
You can rebase |
…ation for the recursive submodule cloning
…sitory are left for later commits
824648d
to
abd72ca
Compare
Rebased. |
The problem is, as is pointed out in the two issues, the carthage checkout operation will try cloning submodules of dependencies, which takes a lot of time. There are two functions that will clone or fetch submodules, one is My solution is that first clone or fetch submodules to the local cache directory. Then redirect the slow network operations to the local cache. For I also serialized heavy repository write operations, so cloning the same repository twice and checking out the same repository twice can be avoided. |
I'll try to get to this ASAP. I'm on vacation right now, but I'll see if I can find some time to review this. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@@ -153,6 +153,7 @@ public func contentsOfFileInRepository(_ repositoryFileURL: URL, _ path: String, | |||
public func checkoutRepositoryToDirectory( | |||
_ repositoryFileURL: URL, | |||
_ workingDirectoryURL: URL, | |||
force: Bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this added/needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original checkoutRepositoryToDirectory
function was a forced checkout, and in checkoutSubmodule
there was a non-forced checkout. So to reuse the checkoutRepositoryToDirectory
function, I added the force parameter.
|
||
/// Clones the given submodule into the working directory of its parent | ||
/// repository, but without any Git metadata. | ||
public func cloneSubmoduleInWorkingDirectory(_ submodule: Submodule, _ workingDirectoryURL: URL) -> SignalProducer<(), CarthageError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these functions should move here:
- These are generic functions that shouldn't really be tied to the
Project
- Because these were moved, the diff for this PR is really large and almost impossible to review
Can you move them back to the Git file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions were moved because the new implementation depends on repositoryFileURL
which is a private function in Project.swift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then that should be passed in as a parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about cloneOrFetchDependency
? These functions actually rely on cloneOrFetchDependency
which uses cloneOrFetchQueue
and _projectEventsObserver
which are Project
properties. I think some kind of queue is definitely needed. So also pass the queue in as a parameter?
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This global state makes me very uncomfortable. I don't think it's a good idea to introduce a static var
with this. It leads to code that's hard to follow and modify.
We should use another method to enforce exclusivity. I think it makes sense to do something like this:
- Fetch all the submodules from the repository cache, in parallel but with limited concurrency.
- Perform the actual checkouts, passing in a queue or doing them one at a time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see there are already a lot of queues, but they are instance properties of a Project. So I just need to move the global variable to the Project class?
} | ||
|
||
public func updateSubmodules(_ superprojectDirectoryURL: URL) -> SignalProducer<(), CarthageError> { | ||
func changeGitmodulesUrls(toCacheURL: Bool) -> SignalProducer<(), CarthageError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary. The original code in addSubmoduleToRepository
already handled doing the equivalent of git submodule update --init
from the repository cache.
If your intent is to make that logic recursive, I think you can do that following the existing method without modifying .gitmodules
.
First off: Thanks for the PR! I think the overall idea—make sure we're using the repository cache for the submodules of dependencies—is very good. That would be a very nice contribution. But I'm not comfortable with the current approach. I think it needs to be reworked to (1) be more reviewable and (2) avoid changing
Is this necessary for the primary change described above? If not, let's leave it for a followup PR. Bundling changes together makes PRs harder, and thus slower, to review—which is worse experience for everyone. |
I'd like to separate this PR into two. The first is #2795. |
Fixes #1833 and #2651.
Better wait after #2654 is merged, then rebase on it.