
Add index performance script #2042

Merged
merged 1 commit into sigstore:main on Mar 19, 2024

Conversation

cmurphy
Contributor

@cmurphy cmurphy commented Mar 13, 2024

Add terraform configuration and scripts to set up rekor standalone on GCP, perform a series of insert and search operations, use Prometheus to gather metrics, and plot the results with gnuplot.
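At a high level the intended flow is: stand up the infrastructure with terraform, run the insert/search workload against each backend, and plot the gathered metrics. A rough sketch, with illustrative script names that are not necessarily the ones added in this PR:

# Hypothetical end-to-end flow; actual file names may differ.
terraform init && terraform apply      # stand up rekor and its index backends on GCP
./run-index-perf.sh mysql              # insert fake data and time searches against MySQL
./run-index-perf.sh redis              # repeat against Redis
gnuplot index-latency.gp               # plot the Prometheus-gathered latency numbers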

The scripts added here are for comparing mysql and redis as index storage backends. Other types of performance measurement scripts could be added here in the future.

To get a realistic sense of query speed for searches, a large data set is needed. Rather than using the rekor API to insert real data, fake data is generated and uploaded directly to the backend before searching it.
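As a minimal sketch, seeding redis directly could look like the snippet below, assuming the index maps each search key (an email or artifact digest) to a list of entry UUIDs; REDIS_HOST, the key layout, and the entry count are illustrative rather than taken from this PR:

# Hypothetical bulk-load of fake index entries straight into redis.
# A real script would likely pipeline these for speed.
for i in $(seq 1 100000); do
  uuid=$(openssl rand -hex 32)    # fake 64-hex-character entry UUID
  redis-cli -h "$REDIS_HOST" LPUSH "user${i}@example.com" "$uuid" >/dev/null
done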

Different types of searches are performed: searches where there should be many results, searches where there should be few results, and searches where there should be no results. The goal is not to compare the latency of these different searches, but to take the overall average to compare across backends.
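For instance, one search case could be timed with hyperfine against rekor's index search endpoint (/api/v1/index/retrieve) roughly as follows; REKOR_URL and the payload are illustrative, and the actual script may structure its measurements differently:

# Hypothetical timing of a single "many results" search case.
hyperfine --warmup 3 --runs 50 --export-json search-many-results.json \
  "curl -s -X POST -H 'Content-Type: application/json' \
     -d '{\"email\": \"heavy-user@example.com\"}' \
     $REKOR_URL/api/v1/index/retrieve"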

Depends on sigstore/scaffolding#1036

Summary

Release Note

Documentation

@cmurphy
Contributor Author

cmurphy commented Mar 13, 2024

Example output: https://gist.github.com/cmurphy/71683066aa94084e6795cd406fd1eab4

I've not added any workflows to run this script regularly, but it could be useful in the future to run it automatically to track performance trends. It would not be appropriate to gate on it, since an individual performance run could be better or worse than another based on a wide variety of factors.

I'm using this to do work on the public instance and it was suggested that it could be useful to include in this repo. I'm also fine with keeping it separate if this seems too niche.


codecov bot commented Mar 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 48.93%. Comparing base (488eb97) to head (73135f3).
Report is 62 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2042       +/-   ##
===========================================
- Coverage   66.46%   48.93%   -17.53%     
===========================================
  Files          92       80       -12     
  Lines        9258     6641     -2617     
===========================================
- Hits         6153     3250     -2903     
- Misses       2359     2987      +628     
+ Partials      746      404      -342     
Flag        Coverage Δ
e2etests    ?
unittests   48.93% <ø> (+1.25%) ⬆️

Flags with carried forward coverage won't be shown.


Contributor

@haydentherapper haydentherapper left a comment


This looks great! I think it's fine to include this example of how to run it on GCP in this repo. The only question is whether we should try to keep the various dependencies up to date.

setup_bastion() {
echo "Configuring the bastion..."
sudo apt install kubernetes-client google-cloud-sdk-gke-gcloud-auth-plugin git redis-tools gnuplot prometheus minisign -y
which hyperfine >/dev/null || ( wget -O /tmp/hyperfine_1.16.1_amd64.deb https://github.com/sharkdp/hyperfine/releases/download/v1.16.1/hyperfine_1.16.1_amd64.deb && sudo dpkg -i /tmp/hyperfine_1.16.1_amd64.deb )
Contributor


Can we download the latest version of each of these? There's some risk of breaking changes to the script, but that should be detectable at runtime.
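For hyperfine, for example, that could look roughly like the following (a sketch that assumes curl and jq are available on the bastion):

# Hypothetical alternative to pinning v1.16.1: resolve the newest release tag first.
tag=$(curl -s https://api.github.com/repos/sharkdp/hyperfine/releases/latest | jq -r .tag_name)
ver=${tag#v}
wget -O "/tmp/hyperfine_${ver}_amd64.deb" \
  "https://github.com/sharkdp/hyperfine/releases/download/${tag}/hyperfine_${ver}_amd64.deb"
sudo dpkg -i "/tmp/hyperfine_${ver}_amd64.deb"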

Contributor Author


done

project_id = var.project
cluster_name = "rekor"

attestation_bucket = "cmurphy-sigstore-attestations"
Contributor


Can you remove the attestation bucket, since it isn't required until in-toto types are in use?

instance_name = module.bastion.name
zone = module.bastion.zone
members = [
"serviceAccount:ga-206@colleenmurphy-testing-410318.iam.gserviceaccount.com",
Contributor


These should be made variables, or the docs should at least mention that they need to be updated.

Contributor Author


Oops, did not mean to leave this in here

@@ -0,0 +1,374 @@
#!/bin/bash -e
Member


can we use

#!/usr/bin/env bash

set -o errexit

Member


to make more explicit
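Put together, the suggested header would be roughly:

#!/usr/bin/env bash
set -o errexit   # same behaviour as `bash -e`, stated explicitly in the script body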

Contributor Author


done

Add terraform configuration and scripts to set up rekor standalone on
GCP, perform a series of insert and search operations, use Prometheus to
gather metrics, and plot the results with gnuplot.

The scripts added here are for comparing mysql and redis as index
storage backends. Other types of performance measurement scripts could
be added here in the future.

To get a realistic sense of query speed for searches, a large data set
is needed. Rather than using the rekor API to insert real data, fake
data is generated and uploaded directly to the backend before searching
it.

Different types of searches are performed: searches where there should
be many results, searches where there should be few results, and
searches where there should be no results. The goal is not to compare
the latency of these different searches, but to take the overall average
to compare across backends.

Signed-off-by: Colleen Murphy <colleenmurphy@google.com>
@cmurphy cmurphy marked this pull request as ready for review March 19, 2024 22:41
@cmurphy cmurphy requested a review from a team as a code owner March 19, 2024 22:41
@haydentherapper haydentherapper enabled auto-merge (squash) March 19, 2024 22:42
@haydentherapper haydentherapper merged commit f57b0b9 into sigstore:main Mar 19, 2024
14 checks passed
@github-actions github-actions bot added this to the v1.2.2 milestone Mar 19, 2024