
[RFC] Maintain different resource indexes #829

Closed
jameshadfield opened this issue Apr 15, 2024 · 21 comments
Labels
enhancement New feature or request

Comments

@jameshadfield
Member

background

Review apps and {dev, canary, production} servers use the same resource index file:

"RESOURCE_INDEX": "s3://nextstrain-inventories/resources.json.gz"

The resource index is generated on a schedule using a node.js program from the master branch and uploading the result:

aws s3 cp resources.json.gz s3://nextstrain-inventories/resources.json.gz

AWS lifecycles expire the indexes after a month.

out of sync errors

Because everything references the same index file, our typical development and deployment cycle breaks whenever it requires changes to the resource indexer. Using Avian flu restructure as an example, the changes in that PR to the indexer will result in "flu/..." datasets being removed and "avian-flu/..." and "seasonal-flu/..." datasets being added. The server & frontend code are updated accordingly. Upon merge the following would happen:

  • The resource indexer would (within 24h) rebuild using the newly merged code, then:
      • next.nextstrain.org would work well;
      • nextstrain.org would no longer be able to access flu/...@version URLs, and the /pathogens page would show avian-flu and seasonal-flu datasets which would all 404.

A similar story exists for review apps, unless you do something like this

I didn't anticipate we would be making so many structural changes to the index. But we are, so we should improve the system.

proposed solution

Whenever changes to the indexer are made we increment some identifier¹ in the index filename, so it looks like resources_mk1.json.gz or similar, and adjust the config.json values accordingly so the server pulls in the appropriate file. IAM policies would be amended so that we can access the index for all identifiers. Lifecycle rules would likewise be adjusted to expire all index files after a month.
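As a rough sketch of the rename step (the `_mk1` suffix and exact commands are assumptions, not settled decisions), the scheduled job would publish under the identifier-suffixed key and config.json would be bumped to match, e.g. `"RESOURCE_INDEX": "s3://nextstrain-inventories/resources_mk1.json.gz"`:

```bash
# Hypothetical sketch: publish the index under an identifier-suffixed key.
# INDEX_ID would be bumped whenever the indexer's output format changes.
# IAM policies and lifecycle rules would then match resources_*.json.gz
# rather than the single resources.json.gz key.
INDEX_ID="mk1"
aws s3 cp resources.json.gz "s3://nextstrain-inventories/resources_${INDEX_ID}.json.gz"
```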

For review apps, if you triggered an index rebuild (GitHub actions or locally) then the app would work. The index would become stale, but that seems fine for review purposes.

For merges into master, the new index file would now be rebuilt daily, and so next.nextstrain.org would work well. Prior to promotion, nextstrain.org would continue to work, but the index it refers to would no longer be kept up to date, which isn't great.

Are there better ideas? It would be great to work through this with others.

¹ Or generate a random hex, which would better guard against collisions.

jameshadfield added the enhancement label on Apr 15, 2024
@joverlee521
Contributor

Can we just have two separate resource indexes for canary and production?

We can update the index-resources.yml workflow to grab the commit hash for canary and production from Heroku and to checkout the repo at each specific commit ref.

Then upload to s3://nextstrain-inventories/resources_canary.json.gz and s3://nextstrain-inventories/resources_production.json.gz and provide the custom URL to the Heroku apps via the RESOURCE_INDEX environment variable.
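A minimal sketch of what those per-app steps might look like (the Heroku API fields, jq filters, and S3 key names here are my assumptions about one way to do it, not what the real workflow does):

```bash
# Hypothetical sketch of building one index per Heroku app.
set -euo pipefail

declare -A apps=( [canary]="nextstrain-canary" [production]="nextstrain-server" )

for env in "${!apps[@]}"; do
  app="${apps[$env]}"

  # Look up the commit currently deployed to this Heroku app
  # (latest release -> slug -> commit, via the Platform API;
  # exact fields should be checked against Heroku's docs).
  slug_id="$(curl -fsS "https://api.heroku.com/apps/${app}/releases" \
      -H "Accept: application/vnd.heroku+json; version=3" \
      -H "Authorization: Bearer ${HEROKU_API_TOKEN}" \
      -H "Range: version ..; order=desc, max=1" \
    | jq -r '.[0].slug.id')"
  commit="$(curl -fsS "https://api.heroku.com/apps/${app}/slugs/${slug_id}" \
      -H "Accept: application/vnd.heroku+json; version=3" \
      -H "Authorization: Bearer ${HEROKU_API_TOKEN}" \
    | jq -r '.commit')"

  # Build the index from the same code the app is running, then publish it
  # to that app's own key.
  git checkout --quiet "$commit"
  node resourceIndexer/main.js   # assumed to write resources.json.gz
  aws s3 cp resources.json.gz "s3://nextstrain-inventories/resources_${env}.json.gz"

  # One-off: point the app at its own index.
  # heroku config:set --app "$app" RESOURCE_INDEX="s3://nextstrain-inventories/resources_${env}.json.gz"
done
```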

@jameshadfield
Member Author

We can update the index-resources.yml workflow to grab the commit hash for canary and production from Heroku and to checkout the repo at each specific commit ref.

That's a nice idea, I didn't think of that. Running the indexer is cheap (in $$$ and time), so I don't have a problem constantly regenerating it twice.

@joverlee521
Contributor

Cool! I'll try updating the indexer workflow today.

@jameshadfield
Member Author

Thanks! Note that with this plan there'll be a period where things are out of sync between promotion and the next scheduled re-indexing (plus up to another hour before the servers update their index). Probably not a deal breaker, but something to be aware of.

@joverlee521
Contributor

joverlee521 commented Apr 15, 2024

Note that with this plan there'll be a period where things are out of sync between promotion and the next scheduled re-indexing (plus up to another hour before the servers update their index).

OH! I got a little lost here, so listing out all the steps that need to work in coordination.

Normal updates:

  1. Nightly updates to the resources.json index on S3 via scheduled index-resources.yml.
  2. Every hour the server checks whether the resources.json index ETag has changed and updates its resources if needed.

Canary deployment:

  1. Nightly update to resources.json
  2. Deploy to canary, with changes that affect resources.json
  3. Out-of-sync between canary and resources.json until the next nightly update.
  4. Potentially another hour until canary server updates resources internally.

To lessen [3], we can have the CI workflow trigger the index-resources.yml workflow after deploying to Heroku.

Promote to production:

  1. Nightly update to resources.json
  2. Promote canary to production, with changes that affect resources.json
  3. Out-of-sync between production and resources.json until next nightly update.
  4. Potentially another hour until production server updates resources internally.

To lessen [3], we can create a new GH Action workflow to promote canary to production that triggers the index-resources.yml workflow after promotion.
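For example (assuming the deploy/promotion job has a GitHub token able to dispatch workflows, e.g. GH_TOKEN in the job environment), the last step could be something like:

```bash
# Hypothetical final step of the deploy or promotion workflow: trigger an
# immediate re-index instead of waiting for the nightly schedule.
gh workflow run index-resources.yml --ref master
```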

@jameshadfield
Member Author

jameshadfield commented Apr 16, 2024

Right. Put another way, #830 couples the server instance with a particular index file. So when the server changes we are out-of-sync until the referenced index file is updated (using the resource indexer code from the same commit as the server), and this update is picked up by the server.

My original proposal coupled the server code with a particular index file, so promotions / deployments don't get out of sync, but it's not clear there's a simple way to keep the index files referenced by the code running on various servers up-to-date.

Trade-offs, I guess. Maybe there's a third way? Maybe we don't need to do this for the flu URL changes if we can convince ourselves that it's working and immediately promote?

@joverlee521
Contributor

Maybe we don't need to do this for the flu URL changes if we can convince ourselves that it's working and immediately promote?

Yeah, we don't want this to block the avian-flu changes, but would like to solve this issue in the long term.

@joverlee521
Contributor

joverlee521 commented Apr 16, 2024

Hmm, would it be too much to set up custom Heroku webhooks that would trigger the index-resources.yml workflow?

I'm imagining that we would create Heroku webhooks for the nextstrain-server and nextstrain-canary apps that are triggered by app release. The webhooks would ping the GitHub API for creating a workflow dispatch event (This would have to somehow include a GitHub token in the Authorization header...). So then each release of the app would trigger a rebuild of the index.

(I'm not entirely sure this is possible, just spitballing ideas...)

Edit: Nvm, not possible to have Heroku webhooks directly ping the GitHub API. Testing ran into errors:

Request forbidden by administrative rules. Please make sure your request has a User-Agent header (https://docs.github.com/en/rest/overview/resources-in-the-rest-api#user-agent-required). Check https://developer.github.com for other possible causes.
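For reference, the request the webhook would have needed to send is roughly the standard workflow-dispatch call below, which needs a specific JSON body plus Authorization and User-Agent headers that a plain Heroku webhook delivery doesn't supply (repository path and ref are illustrative):

```bash
# The workflow-dispatch API call a webhook receiver would need to make.
curl -fsS -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  -H "User-Agent: nextstrain-reindex-hook" \
  -d '{"ref": "master"}' \
  https://api.github.com/repos/nextstrain/nextstrain.org/actions/workflows/index-resources.yml/dispatches
```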

@joverlee521
Contributor

joverlee521 commented Apr 17, 2024

Hmm, we could potentially use the Heroku release phase to run the commands to index the resource right before a release.

Although, this would require adding AWS credentials to the Heroku config vars and updating the nextstrain server AWS IAM policy in order to be able to upload the resources.json to S3.

@joverlee521
Contributor

Reading through the Heroku release phase docs makes me think this is exactly what we need here for building on top of #830.

We can write a bash script that runs node resourceIndexer/main.js and aws s3 cp ... and interpolates APP_NAME into the S3 path. We then run this script both in the index-resources.yml workflow and as the release command on Heroku. That way the index is always rebuilt for the app right before release for both nextstrain-canary and nextstrain-server.
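A rough sketch of that script (assuming the indexer writes resources.json.gz to the working directory, as the scheduled job does, and that the S3 key naming is per app):

```bash
#!/usr/bin/env bash
# build-resource-index.sh -- hypothetical sketch of the shared script.
# APP_NAME would come from Heroku's dyno metadata during release, or be
# passed explicitly by the index-resources.yml workflow.
set -euo pipefail

: "${APP_NAME:?APP_NAME must be set (e.g. nextstrain-canary, nextstrain-server)}"

node resourceIndexer/main.js   # assumed to write resources.json.gz
aws s3 cp resources.json.gz "s3://nextstrain-inventories/resources_${APP_NAME}.json.gz"
```

On Heroku it would run via something like a `release: ./build-resource-index.sh` line in the Procfile.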

In the case of testing apps, maybe we save the resources.json locally and set the RESOURCE_INDEX to point to the local JSON.

@jameshadfield
Member Author

We discussed this issue in a dev meeting today and (I think!) decided on a path to start taking here. We're going to do this independently of #811.

We'll use an identifier¹ for each resource index configuration and bump it any time we make changes to the code which require a different index configuration². To handle the generation of the different index configurations in use concurrently (e.g. canary vs production servers), we'll amend #830 to produce these by querying Heroku to find out which commits the servers are running and then running the resource indexer from those commits³.

For review apps we'll explore having the review app generate an index⁴ locally upon start up and use that. It won't be kept up-to-date during the lifetime of an individual review app deploy, but this seems just fine to me.

@joverlee521 @tsibley is this about right?

¹ I'm trying to avoid version here, as the indexes are stored with S3 versioning which is a different concept.

² And document what kinds of changes fall into this category!

³ Probably not quite this simple: we should check whether each app is actually using a different resource indexer ID so we don't regenerate the same index twice; 95% (99%?) of the time they'll be the same. See the sketch after these footnotes.

⁴ This'll require IAM changes as the server IAM users only have permission to read the index, not the inventories.
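A sketch of that check, reusing the Heroku commit lookup idea from earlier in this thread (the helper name and destination key naming are hypothetical):

```bash
# Hypothetical: skip re-running the indexer when canary and production are
# deployed from the same commit.
declare -A built   # commit -> path of the index already built from it

for app in nextstrain-canary nextstrain-server; do
  commit="$(deployed_commit "$app")"   # hypothetical helper wrapping the Heroku API lookup

  if [[ -z "${built[$commit]:-}" ]]; then
    git checkout --quiet "$commit"
    node resourceIndexer/main.js       # assumed to write resources.json.gz
    built[$commit]="resources-${commit}.json.gz"
    mv resources.json.gz "${built[$commit]}"
  fi

  # Both apps can share the same generated file; only the destination differs.
  aws s3 cp "${built[$commit]}" "s3://nextstrain-inventories/resources_${app}.json.gz"
done
```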

@tsibley
Member

tsibley commented Apr 18, 2024

Yep. The identifier might be a monotonically increasing integer or it might be the commit id of the deployed code. (We discussed both but didn't decide.)

For review apps we'll explore having the review app generate an index⁴ locally upon start up and use that.

⁴ This'll require IAM changes as the server IAM users only have permission to read the index, not the inventories.

We could avoid the opening up of those permissions by shifting the index generation to a step in CI (which happens on GitHub Actions before Heroku build/release) instead of a step during Heroku build/release. (And this also applies to the non-review app deploys.)

@tsibley
Member

tsibley commented Apr 18, 2024

(Also, I see now that several of the things I suggested on our call today were previously proposed here as well by @joverlee521 and @jameshadfield, but I hadn't caught up on this conversation yet!)

@joverlee521
Contributor

We could avoid the opening up of those permissions by shifting the index generation to a step in CI (which happens on GitHub Actions before Heroku build/release) instead of a step during Heroku build/release. (And this also applies to the non-review app deploys.)

Hmm, I'll need to talk through how this works for review apps. The review apps are currently automatically deployed by Heroku, so they don't go through the build/release in our CI GH Action workflow. Would we stop the automatic review apps and use our own CI workflow to create/deploy them?

@tsibley
Member

tsibley commented Apr 19, 2024

Yes, Heroku manages the deploy of the review apps, but I thought they waited for GitHub CI to pass before deploying? Or am I wrong?

@tsibley
Member

tsibley commented Apr 19, 2024

Ahh, they don't currently but maybe will if we tick this ticky box? (or is that only for Heroku CI?)

[screenshot]

@tsibley
Member

tsibley commented Apr 19, 2024

Hrumph.

[screenshot]

@joverlee521
Contributor

I thought they waited for GitHub CI to pass before deploying?

Hmm, even if they did wait on GitHub CI, how would we configure the review app build to include the locally generated index? Or do you mean to upload a temp index to S3 that gets used by the review app?

@tsibley
Member

tsibley commented Apr 19, 2024

Redemption?

[screenshot]

@tsibley
Member

tsibley commented Apr 19, 2024

how would we configure the review app build to include the locally generated index? Or do you mean to upload a temp index to S3 that gets used by the review app?

Yes, upload to S3 (e.g. by commit id or similar) and the review app uses it (e.g. by commit id) automatically.
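A sketch of what that could look like as a CI step (GITHUB_SHA is the commit id GitHub Actions provides; how the review app learns its own commit at runtime is left open here):

```bash
# Hypothetical CI step: build and upload an index keyed by the commit id.
# The review app's RESOURCE_INDEX would then be derived from the same id,
# e.g. s3://nextstrain-inventories/resources_<deployed commit>.json.gz.
node resourceIndexer/main.js   # assumed to write resources.json.gz
aws s3 cp resources.json.gz "s3://nextstrain-inventories/resources_${GITHUB_SHA}.json.gz"
```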

joverlee521 mentioned this issue Apr 26, 2024
@joverlee521
Contributor

Closing this since we are now using separate indexes for nextstrain-server and nextstrain-canary, after merging #841.
Tracking several improvement ideas for the resource indexes in new issues (#853, #854, #867).
