Recently, I've noticed that promoting canary to production prevents nextstrain.org from loading for a short but noticeable amount of time.
With the latest promotion of 24ba9ee (nextstrain-server v894 → v895), I paid extra attention to this. Here is a breakdown of the time it took to load https://nextstrain.org in a web browser in two scenarios. The requests took ~30 seconds and were initiated about 10 seconds after the promotion completed successfully, meaning the total downtime was around 40 seconds.
The issue title says "local" downtime because I'm not sure if it's just my connection or if everyone can observe this.
I've noticed this and believe it's due to how Heroku's routing layer switches things over a bit early when cutting between the old dynos and new dynos. I wouldn't call it downtime, though. There's a short period of time when new requests will queue up waiting for the new dyno to be ready and take longer to get a response, but no requests should fail.
I haven't looked into minimizing that time; slug size might be a factor, as might our code's own startup time. I also wonder if we could have Heroku's routing layer hold off on directing requests to the new dyno until after an app-level health check passes (as opposed to the dyno-level health check it seems to use now).
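For illustration, here is a minimal sketch of what the app side of such an app-level health check could look like. It is framework-agnostic and hypothetical: the names `ReadinessGate` and `healthCheck`, and the idea of serving it at a path like `/healthz`, are assumptions, not part of the nextstrain.org codebase. Whether Heroku's router can be made to wait on an endpoint like this is exactly the open question above; the sketch only covers reporting "not ready" until the app's own startup work is done.

```typescript
// Hypothetical sketch: an app-level readiness check, independent of any web framework.
// None of these names come from nextstrain.org; they are illustrative only.

type HealthResponse = { status: number; body: string };

class ReadinessGate {
  private ready = false;

  // Call once the app's own startup work (e.g. loading config, warming caches) has finished.
  markReady(): void {
    this.ready = true;
  }

  isReady(): boolean {
    return this.ready;
  }
}

// Answer a health-check request: 503 while the app is still starting, 200 once it is ready.
function healthCheck(gate: ReadinessGate): HealthResponse {
  return gate.isReady()
    ? { status: 200, body: "ok" }
    : { status: 503, body: "starting" };
}
```

In an Express app this might be mounted with something like `app.get("/healthz", (req, res) => { const r = healthCheck(gate); res.status(r.status).send(r.body); })`, with `gate.markReady()` called at the end of startup. Keeping the readiness state separate from the route handler makes it easy to flip from wherever startup actually finishes.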
tsibley changed the title from "~30s local downtime when promoting canary to production" to "~30s of request queuing when promoting canary to production" on Jan 10, 2024.