Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF-5374 Improve comment headers for multiplanner/ workloads #1213

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

dpercy
Copy link
Contributor

@dpercy dpercy commented May 7, 2024

Jira Ticket: PERF-5374

Whats Changed

Mostly I updated the first sentence about each test's goal. In some cases I also added some more detail.

In one case, 'NoResults.yml', I realized that the original description was wrong because I hadn't thought through what would actually happen. The results at full-results.ipynb are consistent with the new description.

Patch Testing Results

I have not tested this, because I'm only changing comments.

Mostly I updated the first sentence about each test's goal. In some
cases I also added some more detail.

In one case, 'NoResults.yml', I realized that the original description
was wrong because I hadn't thought through what would actually happen.
The results at [full-results.ipynb](https://github.com/10gen/product-perf-experimentations/blob/master/investigations/PERF-5121-compare-multiplanners/full-results.ipynb)
are consistent with the new description.
@dpercy dpercy requested a review from a team as a code owner May 7, 2024 18:58
@dpercy dpercy requested review from jimoleary and dstorch May 7, 2024 18:58
Copy link
Contributor

@jimoleary jimoleary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good I've left some small suggestions and I think you'll need to regenerate the docs before you can merge this.

classic should have better latency and throughput than SBE, and the combination of classic
planner + SBE execution (PM-3591) to perform about as well as classic.

TODO(CR) Storch noticed the selectivities don't make sense here: the data is too small since we carve up the same total size among many tenants.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe create a ticket and note it here so this TODO is tracked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tracked by PERF-5358. I actually started typing a patch for it this morning. I guess we should coordinate regarding which change goes in first.

limit.
The goal of this test is to exercise the case in multiplanning where all competing plans are bad.

As in 'Simple.yml' we create 64 indexes and run a query that makes all of them eligible, so we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isn't it 63 indexes ? It doesn't look like you're actually using the _id index (and it's created automatically).

Maybe reuse the later phrasing:

We create as many indexes as possible, and run a query that makes all of them eligible

Copy link
Contributor

@dstorch dstorch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on @dpercy. I left a few comments on the specifics.

One overall comment: A lot of the workload descriptions were written in the context of evaluating classic+classic against classic+sbe ("mix") against sbe+sbe. But the pure SBE case is no longer tested and we seem on track for deleting it after shipping 8.0. That means that for most of their lifetime, these workloads will just stand on their own as scenarios to test the performance of multi-planning and will no longer serve the original purpose of comparing the SBE multiplanner against the classic multi-planner. For this purpose, I think it's useful to (at least for the most part) scrub the workload descriptions of commentary related to the behaviors of the SBE runtime planners. What do you think?

src/workloads/query/multiplanner/BlockingSort.yml Outdated Show resolved Hide resolved
src/workloads/query/multiplanner/BlockingSort.yml Outdated Show resolved Hide resolved
get as many competing plans as possible. We also add a sort stage on an unindexed field,
ensuring that every plan is a blocking plan. Because all plans are blocking and return as many
documents as possible, multiplanning will hit "max works" instead of EOF of numToReturn.
This maximizes the overhead of multiplanning on both classic and SBE.

We expect classic to have better latency and throughput than SBE on this workload,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I feel like this comment will quickly grow stale. It will be talking about some old SBE runtime planners that will have long been deleted from the code base. So I think it makes sense to "get ahead of the curve" and just delete this?

classic should have better latency and throughput than SBE, and the combination of classic
planner + SBE execution (PM-3591) to perform about as well as classic.

TODO(CR) Storch noticed the selectivities don't make sense here: the data is too small since we carve up the same total size among many tenants.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tracked by PERF-5358. I actually started typing a patch for it this morning. I guess we should coordinate regarding which change goes in first.

Comment on lines 9 to 10
This workload is similar to 'Simple.yml' except for the collection being clustered.
Maybe we expect the larger record IDs to make fetching more expensive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. So I guess we should state explicitly that there is no predicate on _id so none of the plans can actually use a bounded collection scan, taking advantage of the clustering key.

But I'm actually still a little unsure about what the point of this workload is after reading this description. I guess the idea is that all the indexes are bigger because the _id values they have are bigger? But presumably since there is still just one very selective predicate, it doesn't matter whether the cost model picks up on this fact, since the plan choice is still dominated by CE -- are to be more explicit, dominated by choosing the plan which examines the fewest index keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure how valuable this workload is and would be open to removing it. The clustered index isn't going to affect the planner's choice of plan, or the number of works, but maybe having large record-ids means that each work takes more wall-clock time.

I'd be open to removing this workload, especially given that we have UseClusteredIndex.yml.

available.

If the selectivity value is small enough (less than 0.5), the optimal plan is to employ a
blocking plan by scanning a segment of empty data and conducting a blocking-sort operation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "empty data" a typo here? I don't understand what that means.

If the selectivity value is small enough (less than 0.5), the optimal plan is to employ a
blocking plan by scanning a segment of empty data and conducting a blocking-sort operation,
whereas the other plans' index provides the right sort order, but requires a full scan, and
every document is rejected after the FETCH stage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, every document is rejected? Unless I'm misreading the workload it looked like some of the data will match.

src/workloads/query/multiplanner/UseClusteredIndex.yml Outdated Show resolved Hide resolved
src/workloads/query/multiplanner/VariedSelectivity.yml Outdated Show resolved Hide resolved
src/workloads/query/multiplanner/VariedSelectivity.yml Outdated Show resolved Hide resolved
dpercy and others added 7 commits May 10, 2024 13:44
Co-authored-by: David Storch <dstorch@users.noreply.github.com>
Co-authored-by: David Storch <dstorch@users.noreply.github.com>
Co-authored-by: David Storch <dstorch@users.noreply.github.com>
Co-authored-by: David Storch <dstorch@users.noreply.github.com>
Co-authored-by: David Storch <dstorch@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants