
[RFC] Maintain different resource indexes #829

Closed
jameshadfield opened this issue Apr 15, 2024 · 21 comments
Labels
enhancement New feature or request

Comments

@jameshadfield
Member

background

Review apps and {dev, canary, production} servers use the same resource index file:

"RESOURCE_INDEX": "s3://nextstrain-inventories/resources.json.gz"

The resource index is generated on a schedule using a node.js program from the master branch and uploading the result:

aws s3 cp resources.json.gz s3://nextstrain-inventories/resources.json.gz

AWS lifecycles expire the indexes after a month.

out of sync errors

Because everything references the same index file, our typical development and deployment cycle breaks whenever it requires changes to the resource indexer. Using Avian flu restructure as an example, the changes in that PR to the indexer will result in "flu/..." datasets being removed and "avian-flu/..." and "seasonal-flu/..." datasets being added. The server & frontend code are updated accordingly. Upon merge the following would happen:

  • The resource indexer would (within 24h) rebuild using the newly merged code, then:
      • next.nextstrain.org would work well;
      • nextstrain.org would no longer be able to access flu/...@version URLs, and the /pathogens page would show avian-flu and seasonal-flu datasets which would all 404.

A similar story exists for review apps, unless you do something like this

I didn't anticipate we would be making so many structural changes to the index. But we are, so we should improve the system.

proposed solution

Whenever changes to the indexer are made we increment some identifier¹ in the index filename, so it looks like resources_mk1.json.gz or similar, and adjust the config.json values accordingly so the server pulls in the appropriate file. IAM policies would be amended so that we can access the index for all identifiers. Lifecycle rules would likewise be adjusted to expire all index files after a month.
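As a rough sketch of the rename step (the `_mk1` suffix and exact commands are assumptions, not settled decisions), the scheduled job would publish under the identifier-suffixed key and config.json would be bumped to match, e.g. `"RESOURCE_INDEX": "s3://nextstrain-inventories/resources_mk1.json.gz"`:

```bash
# Hypothetical sketch: publish the index under an identifier-suffixed key.
# INDEX_ID would be bumped whenever the indexer's output format changes.
# IAM policies and lifecycle rules would then match resources_*.json.gz
# rather than the single resources.json.gz key.
INDEX_ID="mk1"
aws s3 cp resources.json.gz "s3://nextstrain-inventories/resources_${INDEX_ID}.json.gz"
```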

For review apps, if you triggered an index rebuild (GitHub actions or locally) then the app would work. The index would become stale, but that seems fine for review purposes.

For merges into master, the new index file would now be rebuilt daily, and so next.nextstrain.org would work well. Prior to promotion, nextstrain.org would continue to work, but the index it refers to would no longer be kept up to date, which isn't great.

Are there better ideas? It would be great to work through this with others.

¹ Or generate a random hex, which would better guard against collisions.

jameshadfield added the enhancement label on Apr 15, 2024
@joverlee521
Contributor

Can we just have two separate resource indexes for canary and production?

We can update the index-resources.yml workflow to grab the commit hash for canary and production from Heroku and to checkout the repo at each specific commit ref.

Then upload to s3://nextstrain-inventories/resources_canary.json.gz and s3://nextstrain-inventories/resources_production.json.gz and provide the custom URL to the Heroku apps via the RESOURCE_INDEX environment variable.
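A minimal sketch of what those per-app steps might look like (the Heroku API fields, jq filters, and S3 key names here are my assumptions about one way to do it, not what the real workflow does):

```bash
# Hypothetical sketch of building one index per Heroku app.
set -euo pipefail

declare -A apps=( [canary]="nextstrain-canary" [production]="nextstrain-server" )

for env in "${!apps[@]}"; do
  app="${apps[$env]}"

  # Look up the commit currently deployed to this Heroku app
  # (latest release -> slug -> commit, via the Platform API;
  # exact fields should be checked against Heroku's docs).
  slug_id="$(curl -fsS "https://api.heroku.com/apps/${app}/releases" \
      -H "Accept: application/vnd.heroku+json; version=3" \
      -H "Authorization: Bearer ${HEROKU_API_TOKEN}" \
      -H "Range: version ..; order=desc, max=1" \
    | jq -r '.[0].slug.id')"
  commit="$(curl -fsS "https://api.heroku.com/apps/${app}/slugs/${slug_id}" \
      -H "Accept: application/vnd.heroku+json; version=3" \
      -H "Authorization: Bearer ${HEROKU_API_TOKEN}" \
    | jq -r '.commit')"

  # Build the index from the same code the app is running, then publish it
  # to that app's own key.
  git checkout --quiet "$commit"
  node resourceIndexer/main.js   # assumed to write resources.json.gz
  aws s3 cp resources.json.gz "s3://nextstrain-inventories/resources_${env}.json.gz"

  # One-off: point the app at its own index.
  # heroku config:set --app "$app" RESOURCE_INDEX="s3://nextstrain-inventories/resources_${env}.json.gz"
done
```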

@jameshadfield
Member Author

We can update the index-resources.yml workflow to grab the commit hash for canary and production from Heroku and to checkout the repo at each specific commit ref.

That's a nice idea, I didn't think of that. Running the indexer is cheap (in $$$ and time), so I don't have a problem constantly regenerating it twice.

@joverlee521
Contributor

Cool! I'll try updating the indexer workflow today.

@jameshadfield
Member Author

Thanks! Note that with this plan there'll be a period where things are out of sync between promotion and the next scheduled re-indexing (plus up to another hour before the servers update their index). Probably not a deal breaker, but something to be aware of.

@joverlee521
Contributor

joverlee521 commented Apr 15, 2024

Note that with this plan there'll be a period where things are out of sync between promotion and the next scheduled re-indexing (plus up to another hour before the servers update their index).

OH! I got a little lost here, so listing out all the steps that need to work in coordination.

Normal updates:

  1. Nightly updates to the resources.json index on S3 via scheduled index-resources.yml.
  2. Every hour the server checks whether the resources.json index ETag has changed and updates its resources if needed.

Canary deployment:

  1. Nightly update to resources.json
  2. Deploy to canary, with changes that affect resources.json
  3. Out-of-sync between canary and resources.json until the next nightly update.
  4. Potentially another hour until canary server updates resources internally.

To lessen [3], we can have the CI workflow trigger the index-resources.yml workflow after deploying to Heroku.

Promote to production:

  1. Nightly update to resources.json
  2. Promote canary to production, with changes that affect resources.json
  3. Out-of-sync between production and resources.json until next nightly update.
  4. Potentially another hour until production server updates resources internally.

To lessen [3], we can create a new GH Action workflow to promote canary to production that triggers the index-resources.yml workflow after promotion.
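For example (assuming the deploy/promotion job has a GitHub token able to dispatch workflows, e.g. GH_TOKEN in the job environment), the last step could be something like:

```bash
# Hypothetical final step of the deploy or promotion workflow: trigger an
# immediate re-index instead of waiting for the nightly schedule.
gh workflow run index-resources.yml --ref master
```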

@jameshadfield
Member Author

jameshadfield commented Apr 16, 2024

Right. Put another way, #830 couples the server instance with a particular index file. So when the server changes we are out-of-sync until the referenced index file is updated (using the resource indexer code from the same commit as the server), and this update is picked up by the server.

My original proposal coupled the server code with a particular index file, so promotions / deployments don't get out of sync, but it's not clear there's a simple way to keep the index files referenced by the code running on various servers up-to-date.

Trade-offs, I guess. Maybe there's a third way? Maybe we don't need to do this for the flu URL changes if we can convince ourselves that it's working and immediately promote?

@joverlee521
Contributor

Maybe we don't need to do this for the flu URL changes if we can convince ourselves that it's working and immediately promote?

Yeah, we don't want this to block the avian-flu changes, but would like to solve this issue in the long term.

@joverlee521
Contributor

joverlee521 commented Apr 16, 2024

Hmm, would it be too much to set up custom Heroku webhooks that would trigger the index-resources.yml workflow?

I'm imagining that we would create Heroku webhooks for the nextstrain-server and nextstrain-canary apps that are triggered by app release. The webhooks would ping the GitHub API for creating a workflow dispatch event (This would have to somehow include a GitHub token in the Authorization header...). So then each release of the app would trigger a rebuild of the index.

(I'm not entirely sure this is possible, just spitballing ideas...)

Edit: Nvm, not possible to have Heroku webhooks directly ping the GitHub API. Testing ran into errors:

Request forbidden by administrative rules. Please make sure your request has a User-Agent header (https://docs.github.com/en/rest/overview/resources-in-the-rest-api#user-agent-required). Check https://developer.github.com for other possible causes.
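For reference, the request the webhook would have needed to send is roughly the standard workflow-dispatch call below, which needs a specific JSON body plus Authorization and User-Agent headers that a plain Heroku webhook delivery doesn't supply (repository path and ref are illustrative):

```bash
# The workflow-dispatch API call a webhook receiver would need to make.
curl -fsS -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "User-Agent: nextstrain-reindex-hook" \
  -d '{"ref": "master"}' \
  https://api.github.com/repos/nextstrain/nextstrain.org/actions/workflows/index-resources.yml/dispatches
```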

@joverlee521
Contributor

joverlee521 commented Apr 17, 2024

Hmm, we could potentially use the Heroku release phase to run the commands to index the resource right before a release.

Although, this would require adding AWS credentials to the Heroku config vars and updating the nextstrain server AWS IAM policy in order to be able to upload the resources.json to S3.

@joverlee521
Contributor

Reading through the Heroku release phase docs makes me think this is exactly what we need here for building on top of #830.

We can write a bash script that runs node resourceIndexer/main.js and aws s3 cp ... and interpolates APP_NAME into the S3 path. We then run this script both in the index-resources.yml workflow and as the release command on Heroku. That way the index is always rebuilt for the app right before release for both nextstrain-canary and nextstrain-server.
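A rough sketch of that script (assuming the indexer writes resources.json.gz to the working directory, as the scheduled job does, and that the S3 key naming is per app):

```bash
#!/usr/bin/env bash
# build-resource-index.sh -- hypothetical sketch of the shared script.
# APP_NAME would come from Heroku's dyno metadata during release, or be
# passed explicitly by the index-resources.yml workflow.
set -euo pipefail

: "${APP_NAME:?APP_NAME must be set (e.g. nextstrain-canary, nextstrain-server)}"

node resourceIndexer/main.js   # assumed to write resources.json.gz
aws s3 cp resources.json.gz "s3://nextstrain-inventories/resources_${APP_NAME}.json.gz"
```

On Heroku it would run via something like a `release: ./build-resource-index.sh` line in the Procfile.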

In the case of testing apps, maybe we save the resources.json locally and set the RESOURCE_INDEX to point to the local JSON.

@jameshadfield
Member Author

We discussed this issue in a dev meeting today and (I think!) decided on a path to start taking here. We're going to do this independently of #811.

We'll use an identifier¹ for each resource index configuration and bump it any time we make changes to the code which require a different index configuration². To handle the generation of the different index configurations in use concurrently (e.g. canary vs production servers), we'll amend #830 to produce these by querying Heroku to find out which commits the servers are running and then running the resource indexer from those commits³.

For review apps we'll explore having the review app generate an index⁴ locally upon start up and use that. It won't be kept up-to-date during the lifetime of an individual review app deploy, but this seems just fine to me.

@joverlee521 @tsibley is this about right?

¹ I'm trying to avoid version here, as the indexes are stored with S3 versioning which is a different concept.

² And document what kinds of changes fall into this category!

³ Probably not quite this simple: we should check whether each app is actually using a different resource indexer ID so we don't regenerate the same index twice; 95% (99%?) of the time they'll be the same. See the sketch after these footnotes.

⁴ This'll require IAM changes as the server IAM users only have permission to read the index, not the inventories.
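A sketch of that check, reusing the Heroku commit lookup idea from earlier in this thread (the helper name and destination key naming are hypothetical):

```bash
# Hypothetical: skip re-running the indexer when canary and production are
# deployed from the same commit.
declare -A built   # commit -> path of the index already built from it

for app in nextstrain-canary nextstrain-server; do
  commit="$(deployed_commit "$app")"   # hypothetical helper wrapping the Heroku API lookup

  if [[ -z "${built[$commit]:-}" ]]; then
    git checkout --quiet "$commit"
    node resourceIndexer/main.js       # assumed to write resources.json.gz
    built[$commit]="resources-${commit}.json.gz"
    mv resources.json.gz "${built[$commit]}"
  fi

  # Both apps can share the same generated file; only the destination differs.
  aws s3 cp "${built[$commit]}" "s3://nextstrain-inventories/resources_${app}.json.gz"
done
```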

@tsibley
Member

tsibley commented Apr 18, 2024

Yep. The identifier might be a monotonically increasing integer or it might be the commit id of the deployed code. (We discussed both but didn't decide.)

For review apps we'll explore having the review app generate an index⁴ locally upon start up and use that.

⁴ This'll require IAM changes as the server IAM users only have permission to read the index, not the inventories.

We could avoid the opening up of those permissions by shifting the index generation to a step in CI (which happens on GitHub Actions before Heroku build/release) instead of a step during Heroku build/release. (And this also applies to the non-review app deploys.)

@tsibley
Member

tsibley commented Apr 18, 2024

(Also, I see now that several of the things I suggested on our call today were previously proposed here as well by @joverlee521 and @jameshadfield, but I hadn't caught up on this conversation yet!)

@joverlee521
Contributor

We could avoid the opening up of those permissions by shifting the index generation to a step in CI (which happens on GitHub Actions before Heroku build/release) instead of a step during Heroku build/release. (And this also applies to the non-review app deploys.)

Hmm, I'll need to talk through how this works for review apps. The review apps are currently automatically deployed by Heroku, so they don't go through the build/release in our CI GH Action workflow. Would we stop the automatic review apps and use our own CI workflow to create/deploy them?

@tsibley
Member

tsibley commented Apr 19, 2024

Yes, Heroku manages the deploy of the review apps, but I thought they waited for GitHub CI to pass before deploying? Or am I wrong?

@tsibley
Member

tsibley commented Apr 19, 2024

Ahh, they don't currently but maybe will if we tick this ticky box? (or is that only for Heroku CI?)

[screenshot]

@tsibley
Member

tsibley commented Apr 19, 2024

Hrumph.

[screenshot]

@joverlee521
Contributor

I thought they waited for GitHub CI to pass before deploying?

Hmm, even if they did wait on GitHub CI, how would we configure the review app build to include the locally generated index? Or do you mean to upload a temp index to S3 that gets used by the review app?

@tsibley
Member

tsibley commented Apr 19, 2024

Redemption?

[screenshot]

@tsibley
Member

tsibley commented Apr 19, 2024

how would we configure the review app build to include the locally generated index? Or do you mean to upload a temp index to S3 that gets used by the review app?

Yes, upload to S3 (e.g. by commit id or similar) and the review app uses it (e.g. by commit id) automatically.
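A sketch of what that could look like as a CI step (GITHUB_SHA is the commit id GitHub Actions provides; how the review app learns its own commit at runtime is left open here):

```bash
# Hypothetical CI step: build and upload an index keyed by the commit id.
# The review app's RESOURCE_INDEX would then be derived from the same id,
# e.g. s3://nextstrain-inventories/resources_<deployed commit>.json.gz.
node resourceIndexer/main.js   # assumed to write resources.json.gz
aws s3 cp resources.json.gz "s3://nextstrain-inventories/resources_${GITHUB_SHA}.json.gz"
```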

joverlee521 mentioned this issue Apr 26, 2024
@joverlee521
Contributor

Closing this since we are now using separate indexes for nextstrain-server and nextstrain-canary, after merging #841.
Tracking several improvement ideas for the resource indexes in new issues (#853, #854, #867).
