Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ninja should have a way to trigger compiles through IPC instead of CreateProcess #1638

Closed
randomascii opened this issue Aug 23, 2019 · 6 comments

Comments

@randomascii
Copy link
Contributor

CreateProcess on Windows is slow at the best of times, and Chrome developers often find themselves not in the best of times. Some process creation/destruction issues that have been dealt with over the last few years in the Chrome build include:

  1. Chrome builds hit UserCrit lock contention during process destruction causing build slowdowns and significant mouse stuttering during goma builds due to a Windows regression. Fixed in Windows, fix eventually propagated to all relevant Windows 10 builds (https://randomascii.wordpress.com/2017/07/09/24-core-cpu-and-i-cant-move-my-mouse/)
  2. Chrome builds hit UserCrit lock contention during process destruction causing build slowdowns and significant mouse stuttering during goma builds due to gomacc.exe indirectly pulling in gdi32.dll. Fixed in gomacc by avoiding gdi32.dll.
  3. Anti-malware software caused an ~11 ms regression in CreateProcess making it impossible for a full Chrome build to be faster than ~ten minutes, and slowing all builds. Fixed by disabling this anti-malware software.
  4. At least two variations on problem number 3 including other anti-malware software that is either not correctly disabled or still has overhead (MsMpEng.exe) even when disabled.
  5. The high process creation rate in Chrome's build triggered bugs in SCCM that caused zombie processes that eventually led to up to 32 GB of lost memory (https://randomascii.wordpress.com/2018/02/11/zombie-processes-are-eating-your-memory/)
  6. Enabling App Verifier for gomacc.exe to investigate occasional heap corruption hit an O(n^2) performance issue in log-file creation (https://randomascii.wordpress.com/2018/10/15/making-windows-slower-part-2-process-creation/)

Other process creation/destruction issues that didn't affect Chrome builds include:

  1. Chromium's large test binaries hit an O(n^2) CFG performance bug hit in Windows which caused unit_tests.exe to take approximately 5x as long as it should on Windows 10. Fixed by disabling CFG in test binaries (https://randomascii.wordpress.com/2019/04/21/on2-in-createprocess/)
  2. clang-cl unit tests hit UserCrit lock contention during process destruction causing the tests to take 5x as long as they should on Windows 10 and causing significant mouse stuttering. Fixed by avoiding pulling in gdi32.dll.

Currently all of these issues are mitigated, with the exception of MsMpEng.exe causing non-zero overhead even when everything is in excluded folders. However there is ongoing effort needed to keep all Chrome developers' machines properly configured, and some loss of security because of the many exclusions needed to maintain decent build performance. There is also ongoing effort to detect and investigate these issues.

At some point it is no longer worth tilting at the CreateProcess windmill and instead avoid calling CreateProcess so frequently. ninja.exe could be taught how to use IPC to communicate with a local server process that would manage a pool of compile processes (goma and/or clang-cl) in order to avoid 99% of CreateProcess calls during a build.

Doing this would reduce the CPU cost of CreateProcess, would remove CreateProcess from the serialized critical path in ninja, would allow more security monitoring to be enabled with fewer exclusions, and would save us from the costs (investigation time and build time) of future regressions.

Alternately ninja could multi-thread CreateProcess but this is not as complete a solution.

Switching from CreateProcess to IPC in ninja is not clearly the right thing to do, but it is worth discussing. A prototype has been created so test results should be shared here.

@jonesmz

This comment was marked as abuse.

@nico
Copy link
Collaborator

nico commented Aug 25, 2019

Not surprisingly, I strongly disagree with this. Ninja is a simple build system that assumes that the build environment is sane. There are other build systems with different designs that might assume hostile build environments, but ninja is not that system.

It sounds like ninja's current design help identify lots of bugs that were worth fixing. I consider this a feature.

@nico nico closed this as completed Aug 25, 2019
@jonesmz

This comment was marked as abuse.

@jonesmz

This comment was marked as abuse.

@randomascii
Copy link
Contributor Author

My understanding is that a colleague has a prototype for ninja-with-IPC.

My main concern about closing this without more discussion is that it misses an opportunity to discuss the security tradeoffs. Most of the ways that we have mitigated the CreateProcess costs/serialization are by disabling various security checks. I'm not an expert on the value of those checks but it could be instructive to have those who are experts weigh in on the benefits of being able to enable them.

@jonesmz

This comment was marked as abuse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants