SERVER-81279 Create Genny Workloads for UpdateMany #1100

brettnawrocki · 2024-01-05T20:39:40Z

Whats Changed:
Adds performance workloads for various cases involving updateMany in sharded clusters.

Patch testing results:
https://spruce.mongodb.com/version/6597130356234348f74ef21a/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

dalyd · 2024-01-05T22:00:32Z

This passed through my inbox, but is not assigned to me. So two quick thoughts:
I think you probably want product-performance instead of performance-infrastructure on this. @mongodb/performance

Also, could you add Keywords along with description and owner?

brettnawrocki · 2024-01-05T23:16:22Z

@dalyd Sorry for the mixup! I read somewhere to add performance infrastructure for genny PRs, but in retrospect I suppose that means core genny code and not genny workloads. 😅

I can add keywords to the workloads as well. Thanks for taking a glance.

thessem · 2024-01-08T04:21:49Z

Sorry about the confusion @brettnawrocki, I've got an eye on using DEVPROD-3142 to simplify the "who should review this" question with a CODEOWNERS file, once I get a little time to look at code quality infrastructure.

jimoleary

I haven't started looking at the actual workloads yet but I'm concerned about everything in the templates directory.

Do we really want this in the repo? And if we are going to add template generation to genny I don't think that the patch command is how it should be implemented.

And I should add for full transparancy that I added some templating support in the past here.

jimoleary

I've added some general comments to the yml.base file but overall the workload looks ok.

Do you have any idea how noisy the new workload is? We would generally run a new workload 3 to 5 times to determine how noisy it is.

jimoleary · 2024-01-08T11:27:22Z

src/workloads/sharding/templates/multi_updates/MultiUpdates.yml.base

+Owner: "@mongodb/sharding"
+Description: |
+  This workload does the following:
+    Create an unsharded collection.


suggestion: either Create a sharded collection with a hashed monotonic int64 shard key. OR Insert 50k documents of around 20kB each using the monotonic loader. .

The monotonic incrementing nature of _id is important w.r.t to the update Filter.

jimoleary · 2024-01-08T11:28:28Z

src/workloads/sharding/templates/multi_updates/MultiUpdates.yml.base

+SchemaVersion: 2018-07-01
+Owner: "@mongodb/sharding"
+Description: |
+  This workload does the following:


nit: could we move this point up to the top or change the intro to:

This workload does the following (with PauseMigrationsDuringMultiUpdates enabled): (and update the relevant patches so that the comment is correct for the other versions).

src/workloads/sharding/templates/multi_updates/MultiUpdates.yml.base

jimoleary · 2024-01-08T11:29:57Z

src/workloads/sharding/templates/multi_updates/MultiUpdates.yml.base

+  Collection: &Collection coll
+  Namespace: &Namespace test.coll
+
+  ApproxDocumentSize: &ApproxDocumentSize 20_000


nit: I think just DocumentSize is fine.

jimoleary · 2024-01-08T11:31:45Z

src/workloads/sharding/templates/multi_updates/MultiUpdates.yml.base

+    Operations:
+    - OperationName: updateMany
+      OperationCommand:
+        Filter: { _id: 0 }


Question: could we half the number of workloads by executing the Filter: {} and Filter: { _id: 0 } in every workload? Essentially add a new phase for Filter: { _id: 0 } to every one.

Although I do appreciate that you are doing exactly what the ticket asks for.

jimoleary · 2024-01-08T11:31:52Z

src/workloads/sharding/templates/multi_updates/MultiUpdates.yml.base

+    Operations:
+    - OperationName: updateMany
+      OperationCommand:
+        Filter: { _id: 0 }


Question: you could we half the number of workloads by executing the Filter: {} and Filter: { _id: 0 } in every workload? Essentially add a new phase for Filter: { _id: 0 } to every one.

Although I do appreciate that you are doing exactly what the ticket asks for.

Can you add a comment to highlight the differences for each version?

E.g: Filter: { _id: 0 } # updateMany on a single document. and Filter: { } # updateMany on all documents.

I took the first suggestion and did both kinds of updateManys in separate phases per workload. It does make sense that it's better to reuse the setup if we can for both variants. I didn't add the comments since I think it's clear enough now the purpose of each filter based on the name of the metrics.

brettnawrocki · 2024-01-11T17:27:24Z

I haven't started looking at the actual workloads yet but I'm concerned about everything in the templates directory.
Do we really want this in the repo? And if we are going to add template generation to genny I don't think that the patch command is how it should be implemented.
And I should add for full transparancy that I added some templating support in the past here.

I expected this to be somewhat controversial, though as you mention there is some existing precedent for "bring your own code generation" for other workloads in the repo. I see DEVPROD-3567 was filed partially in reaction to this PR. I agree that using patch is not ideal and probably shouldn't be standardized, but for smaller scale usage like this, I think it has the advantage of being flexible and already well understood by the average engineer.

My hope is that a better alternative can be proposed (and implemented if necessary), but that this can be used in the meantime.

Do you have any idea how noisy the new workload is? We would generally run a new workload 3 to 5 times to determine how noisy it is.

The cluster parameter in this workload isn't implemented yet (so enabling it does literally nothing). From the first patch, each workload is run effectively twice and I saw ~5% difference between them.

I planned to run another patch and then rerun the workload a few times to get better numbers, but my patch failed I believe due to BF-31402.

jimoleary

added a suggestion about parameterization rather than code generation.

jimoleary · 2024-01-12T15:52:56Z

src/workloads/sharding/templates/multi_updates/MultiUpdates.yml.base

@@ -15,7 +26,7 @@ GlobalDefaults:
  Collection: &Collection coll
  Namespace: &Namespace test.coll

-  ApproxDocumentSize: &ApproxDocumentSize 20_000
+  DocumentSize: &DocumentSize 20_000


Would it be possible to parameterized this workload and get rid of the code generation?

Some loadconfig examples of this are:

ClusteredCollection

ClusteredCollectionLargeRecordIds

Essentially you would move this file into src/phases/multi_updates and then pass inNops for the pauseMigrationsDuringMultiUpdates and ShardCollection operations as needed. The operations could be parameterized something like the queries in

genny/src/workloads/query/SSBColumnStoreIndex.yml

Line 435 in ca423c1

OperationCommand: {^Parameter: {Name: "OperationCommand", Default: *Q1_1Query}}

brettnawrocki · 2024-01-18T19:59:53Z

New patch build: https://spruce.mongodb.com/version/65a85231a4cf47ebaffecbdd/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC
I also ran an additional patch to get a larger sample size: https://spruce.mongodb.com/version/65a9504b5623434a2afaad70/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

Results on Linux Shard Lite (all feature flags) 2022-11

	Unsharded, Update One	Unsharded, Update All	Sharded, Update One	Sharded, Update All
Trial 1	130502	125	128031	201
Trial 2	124107	117	130082	198
Trial 3	132235	118	126664	198
Trial 4	128496	121	118581	195
Min	124107	117	118581	195
Max	132235	125	130082	201
Diff	6.55%	6.84%	9.70%	3.08%

jimoleary

Thanks for the major refactor.

I hope it wasn't too onerous but the workload looks much cleaner now.

jimoleary · 2024-01-19T11:16:11Z

src/phases/sharding/multi_updates/MultiUpdatesTemplate.yml

+      updateMany will update all of the 50k documents per command in the first
+      phase and only a single document per command in the second phase.
+
+  Two parameters are accepted to alter the behavior of this workload:


Thanks for adding the documentation!

brettnawrocki · 2024-01-19T16:56:17Z

Thanks for the major refactor.

I hope it wasn't too onerous but the workload looks much cleaner now.

It was no problem, and as I mentioned before, I expected the code generator to be controversial. 😅 I think the end result is definitely more maintainable than the original iteration.

The way LoadConfig and OnlyActiveInPhases can be combined to run some blocks conditionally was not obvious to me, so thanks for pointing that out. I do kind of wish there was some special value for NopInPhasesUpTo that meant "all other phases" instead of needing to pass that in (and potentially modify it in a number of places if another phase is added). Similarly, I wish I could have made the parameters e.g. shardCollection: true|false instead of needing to pass in [] or [1] to indicate the phases active, which again could change if the structure of the workload was updated.

alicedoherty · 2024-01-22T12:45:08Z

Hey @brettnawrocki, just taking over from Jim on behalf of the performance team. We’d definitely encourage you to open a DEVPROD ticket (with the Assigned Teams field set to “Performance Infrastructure”) if you think having an “all other phases” feature in Genny would be valuable. Also, we have an OnlyRunInInstance: sharded feature that can be used to conditionally shard a collection (the downside of that is until DEVPROD-2049 is resolved you’ll need to exclude your workloads from dry-runs here).

None of this is blocking and I see Jim’s already approved this PR, but just adding this comment in case it adds any value!

SERVER-81279 Create Genny Workloads for UpdateMany

ead69f6

brettnawrocki requested review from kshitijng, krutishah139 and a team January 5, 2024 20:39

brettnawrocki requested review from a team and removed request for a team January 5, 2024 23:08

jimoleary reviewed Jan 8, 2024

View reviewed changes

Address feedback

2bf250a

brettnawrocki requested a review from jimoleary January 11, 2024 17:57

jimoleary reviewed Jan 12, 2024

View reviewed changes

Replace code generation with parameterized templates.

a37393c

brettnawrocki requested a review from jimoleary January 18, 2024 20:00

jimoleary approved these changes Jan 19, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SERVER-81279 Create Genny Workloads for UpdateMany #1100

SERVER-81279 Create Genny Workloads for UpdateMany #1100

brettnawrocki commented Jan 5, 2024 •

edited

dalyd commented Jan 5, 2024

brettnawrocki commented Jan 5, 2024 •

edited

thessem commented Jan 8, 2024

jimoleary left a comment •

edited

jimoleary left a comment

jimoleary Jan 8, 2024

brettnawrocki Jan 11, 2024

jimoleary Jan 8, 2024

brettnawrocki Jan 11, 2024

jimoleary Jan 8, 2024

brettnawrocki Jan 11, 2024

jimoleary Jan 8, 2024

jimoleary Jan 8, 2024

jimoleary Jan 8, 2024

brettnawrocki Jan 11, 2024

brettnawrocki commented Jan 11, 2024

jimoleary left a comment

jimoleary Jan 12, 2024

brettnawrocki Jan 18, 2024

brettnawrocki commented Jan 18, 2024

jimoleary left a comment

jimoleary Jan 19, 2024

brettnawrocki commented Jan 19, 2024

alicedoherty commented Jan 22, 2024

SERVER-81279 Create Genny Workloads for UpdateMany #1100

Are you sure you want to change the base?

SERVER-81279 Create Genny Workloads for UpdateMany #1100

Conversation

brettnawrocki commented Jan 5, 2024 • edited

dalyd commented Jan 5, 2024

brettnawrocki commented Jan 5, 2024 • edited

thessem commented Jan 8, 2024

jimoleary left a comment • edited

Choose a reason for hiding this comment

jimoleary left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brettnawrocki commented Jan 11, 2024

jimoleary left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brettnawrocki commented Jan 18, 2024

jimoleary left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brettnawrocki commented Jan 19, 2024

alicedoherty commented Jan 22, 2024

brettnawrocki commented Jan 5, 2024 •

edited

brettnawrocki commented Jan 5, 2024 •

edited

jimoleary left a comment •

edited