Use S3 for historical queries instead of DynamoDB #3839

Open
bboreham opened this issue Mar 20, 2021 · 0 comments
Labels: estimate/weeks (It will take more than 7 8-hour work days to implement), performance (Excessive resource usage and latency; usually a bug or chore)

@bboreham
Collaborator

Scope has an optional multitenant mode, where reports are saved to S3 and indexed in DynamoDB.

Once #3783 is done, live rendering will no longer read from the store, so store queries will be under far less time pressure. I think we can drop the DynamoDB index and just use an S3 'list' API call to find objects.
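For illustration only: an S3 ListObjectsV2 request with its `Prefix` parameter set returns just the keys that start with that prefix, in UTF-8 binary (lexicographic) order. The toy Go function below models that behaviour over an in-memory slice of keys. `listPrefix` is a hypothetical stand-in, not Scope code or an AWS SDK call, and the example keys assume a tenant/date/hour layout purely to show prefix selection.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listPrefix is a hypothetical in-memory stand-in for an S3 ListObjectsV2
// call with Prefix set: it keeps every key starting with prefix and sorts
// them byte-wise, which matches S3's UTF-8 binary key ordering.
func listPrefix(keys []string, prefix string) []string {
	var out []string
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative keys only; not Scope's current path format.
	keys := []string{
		"tenant-a/2019-04-01/13/1554123659000000000",
		"tenant-b/2019-04-01/13/1554123601000000000",
		"tenant-a/2019-04-01/12/1554120000000000000",
		"tenant-a/2019-04-01/13/1554123600273225527",
	}
	// Selects only tenant-a's reports from the 13:00 UTC hour.
	for _, k := range listPrefix(keys, "tenant-a/2019-04-01/13/") {
		fmt.Println(k)
	}
}
```

A real ListObjectsV2 response holds at most 1,000 keys, so a querier would page through larger buckets using the continuation token.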

However, we will need to change the object path-name to include the time in its prefix, because an S3 list call can only select objects by key prefix.
Current paths look like s3://bucket-name/00002140a76ed46df4956c4af4004160/1554123600273225527, where the first part is an MD5 hash of the tenant ID and hour number, and the second part is the Unix timestamp in nanoseconds.

Steps to complete:

  1. change the S3 object pathname so the prefix is tenant/date/hour (or maybe finer-grained).
  2. change the querier to list reports within a prefix time-bucket using S3 rather than DynamoDB.
  3. add a switch-over date so the querier uses the DynamoDB index before that date and S3 list after it.
  4. stop collectors writing to DynamoDB.