perf(seed): run rebuilds in parallel, add perf logs #4334

ThatOneBro · 2024-04-08T01:55:53Z

So far, this has been shown to reduce average startup time when seeding a fresh database on my local machine by 35-45 seconds. I expect to see a noticeable speedup in CI too.

// Before - everything in serial -- Duration: 65_000 ms
{"level":"INFO","timestamp":"2024-04-07T21:51:38.706Z","msg":"Starting rebuilds","name":"Starting rebuilds","entryType":"mark","startTime":9461.975291013718,"duration":0,"detail":null}
{"level":"INFO","timestamp":"2024-04-07T21:52:43.675Z","msg":"Finished with rebuilds","name":"Rebuild time","entryType":"measure","startTime":9461.975291013718,"duration":64968.646834015846}

// After - everything in parallel -- Duration: 29_000 ms
{"level":"INFO","timestamp":"2024-04-07T21:49:29.494Z","msg":"Starting rebuilds","name":"Starting rebuilds","entryType":"mark","startTime":8685.879249930382,"duration":0,"detail":null}
{"level":"INFO","timestamp":"2024-04-07T21:49:58.171Z","msg":"Finished with rebuilds","name":"Rebuild time","entryType":"measure","startTime":8685.879249930382,"duration":28676.561459064484}
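The `entryType`, `startTime`, and `duration` fields in the logs above look like serialized User Timing entries. As a rough sketch of how such a mark/measure pair could be produced (assuming the global `performance` object available in Node 16+; `fakeRebuild` and `timedRebuilds` are hypothetical names, not the PR's actual code):

```typescript
// Hypothetical stand-in for a real rebuild step; it just waits for `ms` milliseconds.
async function fakeRebuild(ms: number): Promise<void> {
  await new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Records a mark before the rebuilds and a measure after them,
// logging both as JSON lines shaped like the ones in this PR description.
async function timedRebuilds(): Promise<PerformanceMeasure> {
  const mark = performance.mark('Starting rebuilds');
  console.log(JSON.stringify({ level: 'INFO', msg: 'Starting rebuilds', ...mark.toJSON() }));

  // Start both rebuilds at once and wait for all of them to finish.
  await Promise.all([fakeRebuild(20), fakeRebuild(30)]);

  // The measure spans from the named mark until now.
  const measure = performance.measure('Rebuild time', 'Starting rebuilds');
  console.log(JSON.stringify({ level: 'INFO', msg: 'Finished with rebuilds', ...measure.toJSON() }));
  return measure;
}
```

Calling `timedRebuilds()` should print one line for the mark and one for the measure, with `duration` on the measure covering the whole `Promise.all`.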

TODO

~~- [x] Ensure low load when rebuilding (test on staging)~~

- Get seed test running in parallel with main test suite
- Run serial version when rebuilding in prod, use parallel version on fresh database

codyebberson

Nice. It'll make CPUs heat up, but definitely worth the experiment.

codyebberson · 2024-04-08T16:33:55Z

packages/server/src/seeds/searchparameters.ts

  for (const filename of SEARCH_PARAMETER_BUNDLE_FILES) {
    for (const entry of readJson(filename).entry as BundleEntry[]) {
-      await createParameter(systemRepo, entry.resource as SearchParameter);
+      promises.push(createParameter(systemRepo, entry.resource as SearchParameter));


lol, curious how Postgres handles this

We're capped at 10 concurrent connections by the pool. I considered raising the cap to see what happens, but it seemed to run fine on my local machine as is.

codyebberson

Thanks for doing this investigation.

~~How much time savings do we get from this?~~

I'd like to understand the full picture. This increases the complexity of the test setup (extra instance of Postgres, double seed logic, etc.). I just want to be sure that complexity is justified.

Scratch that, I see, 30-40 seconds, ok, that is substantial.

Yeah, let's push forward with this 👍

codyebberson · 2024-04-23T21:38:06Z

packages/server/seed-tests/seed.test.ts

Why move this file? Strong preference for keeping all .ts files in src/

It was initially due to weird glob matching not properly excluding the seed tests from the main line of tests... it was cleaner to just separate them since they are somewhat logically separate...

I put them back though; I found a sort of hack around it with some ** paths that works for some reason. I made a note about it in the file.

codyebberson · 2024-04-23T21:42:12Z

packages/server/src/seeds/searchparameters.ts

+        promises.push(createParameter(systemRepo, entry.resource as SearchParameter));
+      }
+    }
+    await Promise.all(promises);


Consider always building the array, and then using the parallel option in the last step

    if (finalOptions.parallel) {
      await Promise.all(promises);
    } else {
      for (const promise of promises) {
        await promise;
      }
    }

That way we don't need to worry about the logic getting out of sync.

Good point 👍

Actually, this won't work. Filling the array in the serial case doesn't execute the promises serially; it only waits for them serially. Each promise starts running as soon as it is created, so we have to create it inside the loop where it is awaited to avoid unintentionally parallelizing them.
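To illustrate the point, here is a small sketch that records when each task starts and ends. `task` is a hypothetical stand-in for `createParameter`, and the two function names are made up for this example:

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for createParameter: logs its start and end.
async function task(id: number, log: string[]): Promise<void> {
  log.push(`start ${id}`);
  await sleep(10);
  log.push(`end ${id}`);
}

// Builds the array first, then awaits one promise at a time.
// All tasks have already started by the time the loop runs, so the work overlaps.
async function awaitPrebuiltSerially(): Promise<string[]> {
  const log: string[] = [];
  const promises = [1, 2, 3].map((id) => task(id, log));
  for (const promise of promises) {
    await promise;
  }
  return log;
}

// Creates each promise inside the loop that awaits it, so tasks run one after another.
async function createAndAwaitInLoop(): Promise<string[]> {
  const log: string[] = [];
  for (const id of [1, 2, 3]) {
    await task(id, log);
  }
  return log;
}
```

In the first function all three `start` entries are logged before any `end` entry; in the second, each `start` is immediately followed by its own `end`. One way to keep a single array for both modes would be to store functions that return promises (for example `() => task(id, log)`) and only call them at the point of awaiting.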

sonarcloud · 2024-05-11T02:55:42Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
88.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

mattlong

LGTM. Just a couple comments to consider.

mattlong · 2024-05-13T19:05:41Z

packages/server/src/config.ts

@@ -261,7 +261,7 @@ function addDefaults(config: MedplumServerConfig): MedplumServerConfig {
  config.accurateCountThreshold = config.accurateCountThreshold ?? 1000000;
  config.defaultBotRuntimeVersion = config.defaultBotRuntimeVersion ?? 'awslambda';
  config.defaultProjectFeatures = config.defaultProjectFeatures ?? [];
-  config.emailProvider = config.emailProvider || (config.smtp ? 'smtp' : 'awsses');
+  config.emailProvider = config.emailProvider ?? (config.smtp ? 'smtp' : 'awsses');


Would || actually be preferable here to guard against config.emailProvider === ''?
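For reference, a minimal sketch of how the two operators differ on an empty string. `DemoConfig` is a simplified, hypothetical shape, not the real `MedplumServerConfig`:

```typescript
// Hypothetical, simplified config shape used only for this comparison.
interface DemoConfig {
  emailProvider?: string;
  smtp?: { host: string };
}

// `||` falls back on any falsy value, including the empty string.
function providerWithOr(config: DemoConfig): string {
  return config.emailProvider || (config.smtp ? 'smtp' : 'awsses');
}

// `??` falls back only on null or undefined, so an empty string is kept.
function providerWithNullish(config: DemoConfig): string {
  return config.emailProvider ?? (config.smtp ? 'smtp' : 'awsses');
}

console.log(JSON.stringify(providerWithOr({ emailProvider: '' }))); // "awsses"
console.log(JSON.stringify(providerWithNullish({ emailProvider: '' }))); // ""
console.log(JSON.stringify(providerWithNullish({}))); // "awsses"
```

So if an empty `emailProvider` can reach `addDefaults` (for example from an unset environment variable), the `??` version would leave it empty rather than picking a default.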

mattlong · 2024-05-13T19:31:42Z

packages/server/src/seeds/searchparameters.ts

+        promises.push(createParameter(systemRepo, entry.resource as SearchParameter));
+      }
+    }
+    await Promise.all(promises);


I don't know if it's worth introducing a new dependency, but p-limit seems like a pretty elegant and simple solution to abstract away the max concurrency that we want to allow and avoid the if/else code paths.
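As a rough sketch of the idea without adding a dependency, a small hand-rolled limiter could cover both modes with one code path. This is not p-limit itself; `runWithConcurrency` is a hypothetical helper, and it takes functions rather than promises so that nothing starts before the limiter calls it:

```typescript
// Runs the given task functions with at most `concurrency` in flight at once.
// Results are returned in the same order as the input tasks.
async function runWithConcurrency<T>(
  tasks: (() => Promise<T>)[],
  concurrency: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Never start more workers than tasks, and always start at least one.
  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With this shape, `runWithConcurrency(tasks, 1)` behaves like the serial path and `runWithConcurrency(tasks, Infinity)` like `Promise.all`, while a value such as 10 would match the pool's connection cap mentioned earlier in the thread.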

ThatOneBro requested a review from a team as a code owner April 8, 2024 01:55

github-actions bot assigned ThatOneBro Apr 8, 2024

ThatOneBro mentioned this pull request Apr 8, 2024

wip(server): compress migrations into single file for seeding fresh DB #4333

Draft

codyebberson approved these changes Apr 8, 2024

ThatOneBro added this to the April 30th, 2024 milestone Apr 9, 2024

ThatOneBro marked this pull request as draft April 9, 2024 06:06

ThatOneBro force-pushed the derrick-parallelize-db-seed branch from 8a48b05 to 53c8e88 Compare April 9, 2024 06:11

ThatOneBro force-pushed the derrick-parallelize-db-seed branch from 53c8e88 to 000a017 Compare April 23, 2024 00:05

ThatOneBro marked this pull request as ready for review April 23, 2024 00:05

ThatOneBro force-pushed the derrick-parallelize-db-seed branch from 607e74f to 40a0941 Compare April 23, 2024 07:08

codyebberson reviewed Apr 23, 2024

ThatOneBro marked this pull request as draft April 25, 2024 00:17

ThatOneBro added 23 commits May 10, 2024 18:03

test(seed): split out seed tests, test in parallel to main tests

1c424bc

cleanup(compose): rm stray service

ba9eaa8

cleanup(package.json): rm stray package

f98f105

cleanup(tsconfig): add seed-tests dir to tsconfig

6ca8cd1

fix(test.sh): make parallel tests fail together

340ee80

fix(actions/build): fix postgres_seed port

e5a9888

test(actions/build): see if hardcoding port works

fb90a1a

cleanup(coverage): coverage/seed/{parallel,serial}

b4dfc20

cleanup(server/test): don't call docker-compose in test cmd

ec0746b

test(coverage): add back coverage of src to seed tests

10bc983

test(coverage): add exclusions too

472be29

fix(tests/seed): do minimal setup in seed tests to actually test seeding

f99b615

cleanup(sonar): rm unnecessary ass, use ?? operator

7e18ad8

refactor(seed): always build promise arr first

dfe2307

refactor(seed): move seed tests back into src, filter them

d71b8cd

docs(server/jest): add comment about weird glob behavior

8369583

docs(testing): note running test:seed:parallel on fresh install

5bdd757

revert(seed): restore separate rebuild paths

c13e908

test(seed): make serial seed test run migrations before test

743060f

chore(package-lock): fix needless updates

8080800

chore(package-lock): actually fix needless updates

ed4a256

test(seed): stop needlessly busting cache

436298b

test(seed): always bust cache

8ad3add

ThatOneBro force-pushed the derrick-parallelize-db-seed branch from e3b7b1b to 8ad3add Compare May 11, 2024 01:15

tweak(jest): set the default test timeout lower

8ef8b4b

ThatOneBro force-pushed the derrick-parallelize-db-seed branch from 0168a8c to 8ef8b4b Compare May 11, 2024 02:41

mattlong approved these changes May 13, 2024
