Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(seed): run rebuilds in parallel, add perf logs #4334

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

ThatOneBro
Copy link
Member

@ThatOneBro ThatOneBro commented Apr 8, 2024

So far, this has shown to reduce average startup time when seeding a fresh database on my local machine by 35-45 seconds. I expect to see a noticeable speedup in CI too.

// Before - everything in serial -- Duration: 65_000 ms
{"level":"INFO","timestamp":"2024-04-07T21:51:38.706Z","msg":"Starting rebuilds","name":"Starting rebuilds","entryType":"mark","startTime":9461.975291013718,"duration":0,"detail":null}
{"level":"INFO","timestamp":"2024-04-07T21:52:43.675Z","msg":"Finished with rebuilds","name":"Rebuild time","entryType":"measure","startTime":9461.975291013718,"duration":64968.646834015846}

// After - everything in parallel -- Duration: 29_000 ms
{"level":"INFO","timestamp":"2024-04-07T21:49:29.494Z","msg":"Starting rebuilds","name":"Starting rebuilds","entryType":"mark","startTime":8685.879249930382,"duration":0,"detail":null}
{"level":"INFO","timestamp":"2024-04-07T21:49:58.171Z","msg":"Finished with rebuilds","name":"Rebuild time","entryType":"measure","startTime":8685.879249930382,"duration":28676.561459064484}

TODO

- [x] Ensure low load on when rebuilding (test on staging)

  • Get seed test running in parallel with main test suite
  • Run serial version when rebuilding in prod, use parallel version on fresh database

@ThatOneBro ThatOneBro requested a review from a team as a code owner April 8, 2024 01:55
Copy link

vercel bot commented Apr 8, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
medplum-provider ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 11, 2024 2:59am
3 Ignored Deployments
Name Status Preview Comments Updated (UTC)
medplum-app ⬜️ Ignored (Inspect) Visit Preview May 11, 2024 2:59am
medplum-storybook ⬜️ Ignored (Inspect) Visit Preview May 11, 2024 2:59am
medplum-www ⬜️ Ignored (Inspect) Visit Preview May 11, 2024 2:59am

Copy link
Member

@codyebberson codyebberson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. It'll make CPU's heat up, but definitely worth the experiment.

for (const filename of SEARCH_PARAMETER_BUNDLE_FILES) {
for (const entry of readJson(filename).entry as BundleEntry[]) {
await createParameter(systemRepo, entry.resource as SearchParameter);
promises.push(createParameter(systemRepo, entry.resource as SearchParameter));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, curious how Postgres handles this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're capped at 10 concurrent connections with the pool, I considered trying to up it and see what happens, but seemed to run fine on my local machine as is.

Copy link
Member

@codyebberson codyebberson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this investigation.

How much time savings do we get from this?

I'd like to understand full picture. This increases the complexity of test setup (extra instance of postgres, double seed
logic, etc). I just want to be sure that complexity is justified.

Scratch that, I see, 30-40 seconds, ok, that is substantial.

Yeah, let's push forward with this 👍

packages/server/package.json Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this file? Strong preference for keeping all .ts files in src/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was initially due to weird glob matching not excluding the seed tests properly from main line of tests... it was cleaner to just separate them since they are somewhat logically separate...

I put them back though, found a sort of hack around it with some ** paths that works for some reason. Made a note about it in the file

promises.push(createParameter(systemRepo, entry.resource as SearchParameter));
}
}
await Promise.all(promises);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider always building the array, and then using the parallel option in the last step

if (finalOptions.parallel) {
  await Promise.all(promises);
} else {
  for (const promise of promises) {
    await promise;
  }
}

That way we don't need to worry about the logic getting out of sync.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this won't work. Filling the array in the serial case doesn't actually execute the promises serially, only waits for them serially. We have to create the promise in the loop where it is awaited in order to not unintentionally parallelize them

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's worth introducing a new dependency, but p-limit seems like a pretty elegant and simple solution to abstract away the max concurrency that we want to allow and avoid the if/else code paths.

packages/server/src/seeds/searchparameters.ts Outdated Show resolved Hide resolved
packages/server/tsconfig.json Outdated Show resolved Hide resolved
@ThatOneBro ThatOneBro force-pushed the derrick-parallelize-db-seed branch from 0168a8c to 8ef8b4b Compare May 11, 2024 02:41
Copy link

sonarcloud bot commented May 11, 2024

Copy link
Member

@mattlong mattlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a couple comments to consider.

@@ -261,7 +261,7 @@ function addDefaults(config: MedplumServerConfig): MedplumServerConfig {
config.accurateCountThreshold = config.accurateCountThreshold ?? 1000000;
config.defaultBotRuntimeVersion = config.defaultBotRuntimeVersion ?? 'awslambda';
config.defaultProjectFeatures = config.defaultProjectFeatures ?? [];
config.emailProvider = config.emailProvider || (config.smtp ? 'smtp' : 'awsses');
config.emailProvider = config.emailProvider ?? (config.smtp ? 'smtp' : 'awsses');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would || actually be preferable here to guard against config.emailProvider === ''?

promises.push(createParameter(systemRepo, entry.resource as SearchParameter));
}
}
await Promise.all(promises);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's worth introducing a new dependency, but p-limit seems like a pretty elegant and simple solution to abstract away the max concurrency that we want to allow and avoid the if/else code paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants