rdeps clauses in query are not executed in parallel #21731

Open
guw opened this issue Mar 19, 2024 · 3 comments
Labels
team-Performance (Issues for Performance teams), type: feature request

Comments

@guw
Contributor

guw commented Mar 19, 2024

Description of the feature request:

I would like to be able to perform multiple bazel query operations in parallel without using a different output_base.

Which category does this issue belong to?

Performance

What underlying problem are you trying to solve with this feature?

I need to query for rdeps but I need to exclude references to the package itself. Therefore I am using the following query:

rdeps( //..., //foo, 1) except //foo/...

I have N packages, so I need to run this query N times. It would be great if I could do this in parallel.
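For illustration, here is a minimal sketch of the "run it N times" approach described above. The helper names (`rdeps_query`, `run_sequentially`) are hypothetical, and it assumes `bazel` is on the `PATH` and that each package directory name matches its default target. The queries run one after another because they share a single output base.

```python
import subprocess


def rdeps_query(package):
    """Build the per-package query from the issue: direct reverse
    dependencies of //<package>, minus targets inside the package itself."""
    return f"rdeps(//..., //{package}, 1) except //{package}/..."


def run_sequentially(packages):
    """Run one `bazel query` per package, one after another.

    Assumption: `bazel` is on PATH and the current directory is inside
    the workspace. Returns a dict mapping package -> list of labels.
    """
    results = {}
    for package in packages:
        proc = subprocess.run(
            ["bazel", "query", rdeps_query(package)],
            capture_output=True,
            text=True,
            check=True,
        )
        results[package] = proc.stdout.splitlines()
    return results
```

Calling `run_sequentially(["foo", "bar"])` would issue two separate queries, which keeps the results attributed to each package at the cost of N server round trips.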

I also tried a combined query:

(rdeps( //..., //foo, 1) except //foo/...) 
+
(rdeps( //..., //bar, 1) except //bar/...)
+
..

However, that does not seem to trigger any performance optimization within Bazel, i.e. all (rdeps ...) clauses seem to be executed sequentially, one after another.
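The combined query above can be generated mechanically. The following is only a sketch with a hypothetical helper name (`combined_query`); it joins one parenthesized clause per package with the query language's `+` (union) operator.

```python
def combined_query(packages):
    """Join one '(rdeps(...) except ...)' clause per package with '+'.

    Assumption: each entry is a package path such as "foo", and the
    universe is the whole workspace (//...), as in the issue.
    """
    clauses = [
        f"(rdeps(//..., //{p}, 1) except //{p}/...)" for p in packages
    ]
    return " + ".join(clauses)


if __name__ == "__main__":
    print(combined_query(["foo", "bar"]))
```

Note that the union merges all results into one set, so the output of a combined query no longer says which package each target depends on.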

Which operating system are you running Bazel on?

macOS 14.4

What is the output of bazel info release?

7.0.1

@meisterT
Member

This is basically a subset of #532.

It certainly would be easier to implement, but it is still very much non-trivial. These are some old notes on the general problem.

  • BlazeModule and BlazeRuntime are not thread-safe.
  • The lock in BlazeCommandDispatcher needs to be moved to BlazeWorkspace (if thread-safe).
  • State needs to be encapsulated in Skyframe execution.
  • Supporting concurrent query would be easier since the state is smaller.
  • We have to be careful how we compare states.
  • Profiler is not thread-safe; we would need a map of threads to profilers.
  • Same for logger.
  • If we want to take a simpler first step and support help running concurrently with other commands, we need to:
    • Disable profiling for help.
    • Make BlazeRuntime thread-safe.
    • Remove workspace state from BlazeRuntime.
  • Of course it doesn't work with batch mode - which should be removed anyway.
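To make the "map of threads to profilers" note concrete, here is a rough sketch of the idea in Python. It is not Bazel code: `Profiler`, `ProfilerRegistry`, and `run_commands` are hypothetical stand-ins, and the real profiler is a Java class with a very different interface.

```python
import threading


class Profiler:
    """Hypothetical stand-in for a profiler that is not thread-safe."""

    def __init__(self):
        self.events = []

    def record(self, name):
        self.events.append(name)


class ProfilerRegistry:
    """Hands each thread its own Profiler so they never share state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_thread = {}

    def current(self):
        tid = threading.get_ident()
        with self._lock:  # the lock guards only the map, not the profilers
            if tid not in self._by_thread:
                self._by_thread[tid] = Profiler()
            return self._by_thread[tid]

    def snapshot(self):
        with self._lock:
            return {tid: list(p.events) for tid, p in self._by_thread.items()}


def run_commands(names):
    """Run one thread per command name; each records into its own profiler."""
    registry = ProfilerRegistry()
    barrier = threading.Barrier(len(names))

    def worker(name):
        barrier.wait()  # all threads are alive here, so their idents differ
        registry.current().record(name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return registry.snapshot()
```

The same pattern would apply to the logger mentioned in the notes. Keying by thread is the simplest choice for a sketch; a real design might key by command invocation instead, since thread identifiers can be reused after a thread exits.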

@lberki or @ulfjack might have some ideas / comments on feasibility.

@lberki
Contributor

lberki commented Mar 19, 2024

As per my #532 (comment), I'll close this bug as a duplicate of #532. Running multiple build commands in parallel would entail more work than doing so with query, but the latter is already an enormous task, so I don't think it makes sense to separate the two.

As much as I'd like to do this, I don't think we are going to get to this in the foreseeable future :(

@lberki closed this as completed Mar 19, 2024
@lberki
Contributor

lberki commented Mar 19, 2024

That said, the rdeps() clauses could be executed in parallel, so on second thought, I'll reopen this issue and reframe it as "please make bazel query faster".
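As a conceptual illustration of why the clauses could run in parallel: each `rdeps(...) except ...` clause depends only on its own package, so the clauses of a union are independent of each other. The sketch below uses a hypothetical toy reverse-dependency map and made-up labels; it does not reflect how Bazel evaluates queries internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph: target -> targets that depend on it directly.
REVERSE_DEPS = {
    "//foo:foo": ["//foo:foo_test", "//app:main"],
    "//bar:bar": ["//bar:bar_test", "//app:main", "//tools:gen"],
}


def clause(package):
    """Toy version of: rdeps(//..., //<package>, 1) except //<package>/..."""
    target = f"//{package}:{package}"
    direct = set(REVERSE_DEPS.get(target, []))
    own = (f"//{package}:", f"//{package}/")
    # Drop anything that lives inside the package itself.
    return {t for t in direct if not t.startswith(own)}


def evaluate_union(packages, workers=4):
    """Evaluate every clause concurrently, then union the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return set().union(*pool.map(clause, packages))


if __name__ == "__main__":
    print(sorted(evaluate_union(["foo", "bar"])))
```

Because no clause reads another clause's result, the only synchronization point is the final union.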

@lberki reopened this Mar 19, 2024
@lberki changed the title from "Parallel query execution / evaluation" to "rdeps clauses in query are not executed in parallel" Mar 19, 2024

6 participants