PERF-5374 Improve comment headers for multiplanner/ workloads #1213

dpercy · 2024-05-07T18:58:03Z

Jira Ticket: PERF-5374

What's Changed

Mostly I updated the first sentence about each test's goal. In some cases I also added some more detail.

In one case, 'NoResults.yml', I realized that the original description was wrong because I hadn't thought through what would actually happen. The results at [full-results.ipynb](https://github.com/10gen/product-perf-experimentations/blob/master/investigations/PERF-5121-compare-multiplanners/full-results.ipynb) are consistent with the new description.

Patch Testing Results

I have not tested this, because I'm only changing comments.

jimoleary

Overall looks good. I've left some small suggestions, and I think you'll need to regenerate the docs before you can merge this.

jimoleary · 2024-05-08T11:10:34Z

src/workloads/query/multiplanner/CompoundIndexes.yml

+  classic should have better latency and throughput than SBE, and the combination of classic
+  planner + SBE execution (PM-3591) to perform about as well as classic.
+
+  TODO(CR) Storch noticed the selectivities don't make sense here: the data is too small since we carve up the same total size among many tenants.


Maybe create a ticket and note it here so this TODO is tracked.

jimoleary · 2024-05-08T11:18:35Z

src/workloads/query/multiplanner/NoGoodPlan.yml

-  limit.
+  The goal of this test is to exercise the case in multiplanning where all competing plans are bad.
+
+  As in 'Simple.yml' we create 64 indexes and run a query that makes all of them eligible, so we


nit: isn't it 63 indexes? It doesn't look like you're actually using the _id index (and it's created automatically).

Maybe reuse the later phrasing:

We create as many indexes as possible, and run a query that makes all of them eligible
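The arithmetic behind this nit can be sketched as follows. This is a minimal illustration, assuming MongoDB's documented limit of 64 indexes per collection and that the automatically created _id index counts toward it; the constant and function names are hypothetical and do not come from the workload files.

```python
# Assumption: MongoDB allows at most 64 indexes per collection,
# and the automatically created _id index counts toward that limit.
MAX_INDEXES_PER_COLLECTION = 64


def max_user_created_indexes(has_automatic_id_index: bool = True) -> int:
    """Return how many indexes a workload can create explicitly."""
    reserved = 1 if has_automatic_id_index else 0
    return MAX_INDEXES_PER_COLLECTION - reserved


if __name__ == "__main__":
    print(max_user_created_indexes())
```

Under that assumption, a regular collection leaves room for 63 explicitly created indexes, which is why the phrasing "as many indexes as possible" avoids committing the comment to a specific number.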

dstorch

Thanks for taking this on @dpercy. I left a few comments on the specifics.

One overall comment: A lot of the workload descriptions were written in the context of evaluating classic+classic against classic+sbe ("mix") against sbe+sbe. But the pure SBE case is no longer tested and we seem on track for deleting it after shipping 8.0. That means that for most of their lifetime, these workloads will just stand on their own as scenarios to test the performance of multiplanning and will no longer serve the original purpose of comparing the SBE multiplanner against the classic multiplanner. For this reason, I think it's useful to (at least for the most part) scrub the workload descriptions of commentary related to the behaviors of the SBE runtime planners. What do you think?

dstorch · 2024-05-08T14:42:04Z

src/workloads/query/multiplanner/BlockingSort.yml

+  get as many competing plans as possible. We also add a sort stage on an unindexed field,
+  ensuring that every plan is a blocking plan. Because all plans are blocking and return as many
+  documents as possible, multiplanning will hit "max works" instead of EOF of numToReturn.
+  This maximizes the overhead of multiplanning on both classic and SBE.

  We expect classic to have better latency and throughput than SBE on this workload,


Again, I feel like this comment will quickly grow stale. It will be talking about some old SBE runtime planners that will have long been deleted from the code base. So I think it makes sense to "get ahead of the curve" and just delete this?

dstorch · 2024-05-08T14:46:08Z

src/workloads/query/multiplanner/CompoundIndexes.yml

+  classic should have better latency and throughput than SBE, and the combination of classic
+  planner + SBE execution (PM-3591) to perform about as well as classic.
+
+  TODO(CR) Storch noticed the selectivities don't make sense here: the data is too small since we carve up the same total size among many tenants.


This is tracked by PERF-5358. I actually started typing a patch for it this morning. I guess we should coordinate regarding which change goes in first.

dstorch · 2024-05-08T14:51:23Z

src/workloads/query/multiplanner/ClusteredCollection.yml

+  This workload is similar to 'Simple.yml' except for the collection being clustered.
+  Maybe we expect the larger record IDs to make fetching more expensive.


Hmm. So I guess we should state explicitly that there is no predicate on _id so none of the plans can actually use a bounded collection scan, taking advantage of the clustering key.

But I'm actually still a little unsure about what the point of this workload is after reading this description. I guess the idea is that all the indexes are bigger because the _id values they have are bigger? But presumably since there is still just one very selective predicate, it doesn't matter whether the cost model picks up on this fact, since the plan choice is still dominated by CE -- or, to be more explicit, dominated by choosing the plan which examines the fewest index keys.

I'm also not sure how valuable this workload is and would be open to removing it. The clustered index isn't going to affect the planner's choice of plan, or the number of works, but maybe having large record-ids means that each work takes more wall-clock time.

I'd be open to removing this workload, especially given that we have UseClusteredIndex.yml.

dstorch · 2024-05-08T20:31:45Z

src/workloads/query/multiplanner/NonBlockingVsBlocking.yml

+  available.
+
+  If the selectivity value is small enough (less than 0.5), the optimal plan is to employ a
+  blocking plan by scanning a segment of empty data and conducting a blocking-sort operation,


Is "empty data" a typo here? I don't understand what that means.

dstorch · 2024-05-08T20:32:48Z

src/workloads/query/multiplanner/NonBlockingVsBlocking.yml

+  If the selectivity value is small enough (less than 0.5), the optimal plan is to employ a
+  blocking plan by scanning a segment of empty data and conducting a blocking-sort operation,
+  whereas the other plans' index provides the right sort order, but requires a full scan, and
+  every document is rejected after the FETCH stage.


Wait, every document is rejected? Unless I'm misreading the workload it looked like some of the data will match.

src/workloads/query/multiplanner/UseClusteredIndex.yml

src/workloads/query/multiplanner/VariedSelectivity.yml

Co-authored-by: David Storch <dstorch@users.noreply.github.com>

dpercy requested a review from a team as a code owner May 7, 2024 18:58

dpercy requested review from jimoleary and dstorch May 7, 2024 18:58

dpercy assigned dstorch May 7, 2024

jimoleary reviewed May 8, 2024

dstorch requested changes May 8, 2024

dstorch mentioned this pull request May 9, 2024

PERF-5358 Improve CompoundIndexes.yml workload #1214

Merged

dpercy and others added 7 commits May 10, 2024 13:44

typo of -> or

9ac18f3

Co-authored-by: David Storch <dstorch@users.noreply.github.com>

no need to split on "classic and SBE"

727c8f5

Co-authored-by: David Storch <dstorch@users.noreply.github.com>

typo "prepence" -> presence

3966b55

Co-authored-by: David Storch <dstorch@users.noreply.github.com>

don't split on "choice of multiplanner"

608dbb0

Co-authored-by: David Storch <dstorch@users.noreply.github.com>

collectionSize one word

46a82aa

Co-authored-by: David Storch <dstorch@users.noreply.github.com>

Merge branch 'master' into PERF-5374-comments

d813797

state explicitly ClusteredCollection doesn't do a clustered scan

bb2a774
