Refactor API data-access pattern to only load what is necessary; use prepared statements #12

ohaibbq · 2024-04-11T20:16:48Z

This PR greatly improves the performance of the emulator API endpoints.

We no longer load the entire BigQuery project (jobs, datasets, tables, etc.) on each request; each endpoint now loads only the records it needs. Loading the whole project was slow for a number of reasons, as outlined in goccy/bigquery-emulator#294.
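
To make the difference concrete, here is a minimal Go sketch that contrasts the two access patterns. It is not the emulator's code. `metadataStore`, `loadProject`, and `findTable` are hypothetical names, a map stands in for the SQLite-backed metadata repository, and the model is simplified to tables only. It counts how many metadata rows each pattern touches to answer a request for a single table.

```go
package main

import "fmt"

// tableKey identifies one table by its full BigQuery path.
type tableKey struct {
	Project, Dataset, Table string
}

// metadataStore is a hypothetical, in-memory stand-in for the emulator's
// metadata repository. rowsRead counts the metadata rows each access touches.
type metadataStore struct {
	tables   map[tableKey]string // value: serialized table metadata
	rowsRead int
}

// newStore builds a store holding n tables (view_0 .. view_{n-1}) in one dataset.
func newStore(project, dataset string, n int) *metadataStore {
	s := &metadataStore{tables: make(map[tableKey]string, n)}
	for i := 0; i < n; i++ {
		k := tableKey{project, dataset, fmt.Sprintf("view_%d", i)}
		s.tables[k] = fmt.Sprintf(`{"id": %q}`, k.Table)
	}
	return s
}

// loadProject mimics the old pattern: read every table row belonging to the
// project, even when the request only needs one of them.
func (s *metadataStore) loadProject(project string) map[tableKey]string {
	out := make(map[tableKey]string)
	for k, v := range s.tables {
		if k.Project == project {
			s.rowsRead++
			out[k] = v
		}
	}
	return out
}

// findTable mimics the new pattern: a single keyed lookup for the one table
// the request asks for.
func (s *metadataStore) findTable(k tableKey) (string, bool) {
	s.rowsRead++
	v, ok := s.tables[k]
	return v, ok
}

func main() {
	s := newStore("recidiviz-bq-emulator-project", "validation_views", 1400)
	want := tableKey{"recidiviz-bq-emulator-project", "validation_views", "view_42"}

	// Old pattern: load everything, then pick one table out of it.
	all := s.loadProject(want.Project)
	_ = all[want]
	fmt.Println("load whole project, rows read:", s.rowsRead) // 1400

	// New pattern: fetch just the requested table.
	s.rowsRead = 0
	meta, ok := s.findTable(want)
	fmt.Println("targeted lookup, rows read:", s.rowsRead, ok, meta)
}
```

In a real SQLite-backed repository the targeted lookup would presumably be a single keyed `SELECT`, so its cost no longer grows with the number of datasets and tables in the project.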

We also now issue unformatted SQLite queries: the metadata repository's queries are passed to SQLite as written instead of going through go-zetasqlite's query rewriting. Previously, go-zetasqlite would rewrite these queries to use functions like `zetasqlite_equals` instead of operators like `=`, which forced SQLite into full table scans where it could otherwise have used more efficient query plans, such as index lookups.
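
A minimal Go sketch of the two query shapes. The table name `tables`, the key columns, and the exact `zetasqlite_equals(col, ?)` call form are illustrative assumptions, not the emulator's actual schema or go-zetasqlite's exact output. The point is that SQLite can satisfy `col = ?` from an index on `col`, but cannot use an ordinary index when the column is an argument to a user-defined function.

```go
package main

import (
	"fmt"
	"strings"
)

// equalsPredicate renders a comparison SQLite can satisfy with an index:
// a bare column compared to a bound parameter with "=".
func equalsPredicate(col string) string {
	return fmt.Sprintf("%s = ?", col)
}

// wrappedPredicate renders a rewritten comparison (illustrative form) where the
// column is passed to a user-defined function; SQLite cannot use an ordinary
// index for this, so it has to scan the table.
func wrappedPredicate(col string) string {
	return fmt.Sprintf("zetasqlite_equals(%s, ?)", col)
}

// buildLookup joins one predicate per key column into a parameterized SELECT.
// Only identifiers are interpolated with Sprintf (here they come from constants,
// never from user input); values are always bound through "?" placeholders.
func buildLookup(table string, keyCols []string, pred func(string) string) string {
	conds := make([]string, len(keyCols))
	for i, c := range keyCols {
		conds[i] = pred(c)
	}
	return fmt.Sprintf("SELECT metadata FROM %s WHERE %s", table, strings.Join(conds, " AND "))
}

func main() {
	keys := []string{"projectID", "datasetID", "id"} // hypothetical column names
	fmt.Println(buildLookup("tables", keys, equalsPredicate))
	fmt.Println(buildLookup("tables", keys, wrappedPredicate))
}
```

With `database/sql`, the plain-`=` string could be passed once to `(*sql.DB).PrepareContext` and the resulting `*sql.Stmt` reused through `QueryRowContext(ctx, projectID, datasetID, tableID)`. Whether the emulator prepares and caches its statements exactly this way is an assumption of this sketch.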

Many endpoints now take hundreds of microseconds to return instead of 100+ ms, and table creation takes roughly 10 ms (13.4 ms in the sample below), whereas before we were seeing upwards of 300 ms. Sample timings from the emulator's request log:

POST /bigquery/v2/projects/recidiviz-bq-emulator-project/jobs   {"query": "prettyPrint=false"}
POST /bigquery/v2/projects/recidiviz-bq-emulator-project/jobs took 16.821342ms
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views       {"query": "prettyPrint=false"}
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views took 287.595µs
POST /bigquery/v2/projects/recidiviz-bq-emulator-project/jobs   {"query": "prettyPrint=false"}
POST /bigquery/v2/projects/recidiviz-bq-emulator-project/jobs took 19.376201ms
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/queries/c0150a30-f7c6-4bbf-b9e3-a00dd494f14d    {"query": "maxResults=0&location=US&prettyPrint=false"}
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/queries/c0150a30-f7c6-4bbf-b9e3-a00dd494f14d took 791.9µs
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/queries/d1e45499-8eb2-4417-bf5d-af0c1266503a    {"query": "maxResults=0&location=US&prettyPrint=false"}
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/queries/d1e45499-8eb2-4417-bf5d-af0c1266503a took 645.539µs
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views/tables/outliers_staff_count_percent_change_materialized       {"query": "prettyPrint=false"}
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views/tables/outliers_staff_count_percent_change_materialized took 1.660463ms
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views/tables/current_supervision_staff_missing_district_materialized        {"query": "prettyPrint=false"}
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views/tables/current_supervision_staff_missing_district_materialized took 1.354469ms
POST /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views/tables       {"query": "prettyPrint=false"}
POST /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views/tables took 13.399834ms
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views       {"query": "prettyPrint=false"}
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views took 298.365µs
POST /bigquery/v2/projects/recidiviz-bq-emulator-project/jobs   {"query": "prettyPrint=false"}
POST /bigquery/v2/projects/recidiviz-bq-emulator-project/jobs took 16.414274ms
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/queries/55f3fc1a-05c2-41a5-88dd-0f9f592f4a1c    {"query": "maxResults=0&location=US&prettyPrint=false"}
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/queries/55f3fc1a-05c2-41a5-88dd-0f9f592f4a1c took 458.882µs
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views/tables/outliers_staff_count_percent_change_errors_materialized        {"query": "prettyPrint=false"}
GET /bigquery/v2/projects/recidiviz-bq-emulator-project/datasets/validation_views/tables/outliers_staff_count_percent_change_errors_materialized took 1.560456ms

We can now run `recidiviz.tools.deploy.deploy_empty_test_views`, which creates and materializes 1,400 views, against the emulator in about 35 seconds.
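
For scale, the ~35 second run averages out to about 25 ms of wall-clock time per view. This is an average over the whole run, not a per-request latency.

```math
\frac{35\ \text{s}}{1400\ \text{views}} = \frac{35000\ \text{ms}}{1400\ \text{views}} = 25\ \text{ms/view}
```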


ohaibbq mentioned this pull request Apr 12, 2024

Improve emulator performance for large projects goccy/bigquery-emulator#294 (open)

ohaibbq added 4 commits April 12, 2024 10:18

Refactor API data-access pattern to only load what is necessary; use prepared statements

2f0ed28

use sprintf

2eb3d58

fix mutex deadlock + tests

240e32d

rebase / merge; use existing transaction when one is open

5ba6165

ohaibbq force-pushed the dan/api-performance branch from 862fc36 to 5ba6165 April 12, 2024 18:17

remove unused perf timer

bc3e29e

ohaibbq requested review from a team, ageiduschek, colincadams, ethan-oro, emilyemilyemilyemilyemilyemily and recidinick April 12, 2024 18:27

re-use transaction when deleting dataset

5dc4e1d

ageiduschek approved these changes Apr 16, 2024


ohaibbq merged commit 61539ca into candidate/rb20240328 Apr 17, 2024
8 checks passed
