-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Install files in parallel #1256
base: main
Are you sure you want to change the base?
Conversation
src/vcpkg/commands.install.cpp
Outdated
#ifdef _WIN32 | ||
std::sort(std::execution::par_unseq, output.begin(), output.end()); | ||
#else | ||
std::sort(output.begin(), output.end()); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open for suggestions on how to implement parallel sorting for all platforms
src/vcpkg/commands.install.cpp
Outdated
fs.remove_all(target, IgnoreErrors{}); | ||
fs.remove(target, IgnoreErrors{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no difference for regular files between remove_all
and remove
(provided that vcpkg's implementation is identical to the STL): https://en.cppreference.com/w/cpp/filesystem/remove
…/vcpkg-tool into parallel-file-install
@BillyONeal Unit tests on OSX failed here. It seems to work again but I can consistently reproduce the failure on my local machine: The test case |
Btw you can build the format target locally to format all code. |
Sorry! |
…/vcpkg-tool into parallel-file-install
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in general happy with attempts to parallelize things but overall I think you need to be much more explicit in proving that everything you call in a parallel context is actually safe to do. Given that the first run of e2e tests experienced a failure and I found races in a cursory review makes me fear that unknown races remain without such a proof.
This means if you want to proceed here I think you need to show strong thread safety requirements of every API you're calling rather than taking large blocks of existing code and wrapping them in parallel constructs.
} | ||
} | ||
|
||
fs.create_directories(listfile.parent_path(), VCPKG_LINE_INFO); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This races with other file operations you're doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? The list file is not in the same folder where the installed files are. And the "destination directory" is created beforehand. There is no race condition for creating /installed
because it already exists (line 163). The listfiles go in /installed/vcpkg/info
and the installed files will go in installed/triplet
.
|
||
parallel_transform(files, output.begin(), [&](auto&& file) -> PathAndType { | ||
std::error_code ec; | ||
auto status = fs.symlink_status(file, ec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly suspect the only thing you're gaining from parallelism here is parallelizing this symlink_status
and this would be much clearer if you only parallelized that rather than trying to make invalid states that need to be chopped off etc.
If you parallelize just this part you could use plain Util::erase_remove_if
for the invalid ones. It would also have the advantage of not trying to parallelize println_warning
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I want to get rid of invalid states I'd have to use parallel_for_each
but this would require a lock on when inserting things into the destination vector. The only thing I can move out of this is the validation of the file type.
if (ec) | ||
{ | ||
msg::println_error(msgInstallFailed, msg::path = target, msg::error_msg = ec.message()); | ||
Debug::println("Install from packages to installed: Fallback to copy " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug::println
is not thread safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is other parallelized code where this also happens: in cmd_execute_and_capture_output_parallel
we call cmd_execute_and_stream_data
which calls Debug::print()
.
I think the main issue with parallelizing here is printing. Basically, whenever the program can print anything, needs to be guarded by a mutex which is IMO not possible from the outside. Therefore, we need to make printing inherently thread safe. |
Depending on how many files to install, this change results in a >= 4x speed up. I only tested relatively small ports but I expect even higher speed ups for bigger ports.