Address Heroku timeout errors #2296

Open
easherma opened this issue Mar 17, 2021 · 1 comment

Comments

easherma commented Mar 17, 2021

As a Streetmix user, I want to visit the site without the app timing out.

A persistent share of our users keep experiencing timeouts and other connection drops.

This is a small percentage, but if the number of users scales, it's quite possible this percentage will trend in the wrong direction.

We've tried a couple things so far:

  1. Optimizing queries and adding indexes to make requests complete faster
  2. Upgrading dynos and the database to improve the performance of the app

It was important to rule those things out, but they have not totally solved the problem.

What else is on the list of potential causes and fixes (in order of likelihood):

  1. Add Express middleware to time out requests at a lower threshold than Heroku's 30 seconds (https://blog.heroku.com/timeout-quickly). Quoting https://help.heroku.com/AXOSFIXN/why-am-i-getting-h12-request-timeout-errors-in-nodejs:
With H12 - Request Timeout errors, we generally see this pattern where one long-running action starts hogging the queue which in turn affects any subsequent requests.

Our router will drop a long-running request after 30 seconds, but the dyno behind it will continue processing the request until completion. Our router is unaware of it, though, so it'll dispatch new requests to that busy dyno. This effect tends to compound, and you'll eventually see H12 errors even for unrelated URLs, such as static assets. H13 errors are similar in what causes them, but are primarily related to concurrent web servers.

If your app is using ExpressJS, you will also want to install something like timeout, which will ensure that a long running request is dropped at the dyno-level as well. Specifically, timeout raise a Response timeout exception when that happens.

With that in place, the compound effect is less likely to occur, but long-running actions still need to be addressed. 
  2. Put expensive or longer-running requests (like a street update) into a background worker queue (https://devcenter.heroku.com/articles/node-redis-workers)
  3. Improve concurrency of the app (https://devcenter.heroku.com/articles/node-concurrency)
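
To make the background-worker idea concrete, here is a minimal sketch of the "accept now, process later" pattern. It is an in-memory stand-in only: the `JobQueue` class, its method names, and the job payload shape are all hypothetical, and a real deployment would presumably use a Redis-backed queue as in the linked Heroku article so that jobs survive dyno restarts.

```javascript
// In-memory illustration only; jobs are lost if the process restarts.
class JobQueue {
  constructor(handler) {
    this.handler = handler; // async function that performs the slow work
    this.pending = [];      // jobs waiting to run
    this.jobs = new Map();  // id -> { status, result?, error? }
    this.running = false;
    this.nextId = 1;
  }

  // Called from the request handler: record the job and return immediately.
  enqueue(payload) {
    const id = String(this.nextId++);
    this.jobs.set(id, { status: 'queued' });
    this.pending.push({ id, payload });
    setImmediate(() => this.drain()); // process after the response is sent
    return id;
  }

  // Lets a client poll for the outcome of a job.
  status(id) {
    return this.jobs.get(id);
  }

  // Runs queued jobs one at a time until the queue is empty.
  async drain() {
    if (this.running) return;
    this.running = true;
    while (this.pending.length > 0) {
      const { id, payload } = this.pending.shift();
      this.jobs.set(id, { status: 'active' });
      try {
        const result = await this.handler(payload);
        this.jobs.set(id, { status: 'completed', result });
      } catch (err) {
        this.jobs.set(id, { status: 'failed', error: err.message });
      }
    }
    this.running = false;
  }
}
```

A route handler would then call `enqueue(...)`, respond with `202 Accepted` and the job id, and let the client poll `status(id)`, so the web request itself finishes in milliseconds regardless of how long the update takes.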

After more research, I'm feeling more convinced that the first one on this list is a likely culprit, mainly because our total number of users and the complexity of their requests is still pretty low in the grand scheme of things.
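
Even if concurrency is the least likely culprit, it is cheap to sketch. The example below uses Node's built-in `cluster` module to run several worker processes per dyno. It assumes a `WEB_CONCURRENCY` environment variable is available on the dyno; `workerCount` and `startCluster` are hypothetical names, and the linked Heroku article may recommend a different helper library.

```javascript
const cluster = require('cluster');
const os = require('os');

// Decide how many worker processes to run.
// Assumption: WEB_CONCURRENCY is set in the environment; otherwise use CPU count.
function workerCount(env = process.env, cpuCount = os.cpus().length || 1) {
  const requested = Number.parseInt(env.WEB_CONCURRENCY, 10);
  return Number.isInteger(requested) && requested > 0 ? requested : cpuCount;
}

// Fork `count` workers from the primary process; each worker runs `startWorker`.
function startCluster(startWorker, count = workerCount()) {
  // `isPrimary` exists on newer Node versions, `isMaster` on older ones.
  if (cluster.isPrimary || cluster.isMaster) {
    for (let i = 0; i < count; i++) cluster.fork();
    cluster.on('exit', (worker, code) => {
      console.error(`worker ${worker.process.pid} exited (${code}); restarting`);
      cluster.fork();
    });
  } else {
    startWorker();
  }
}

// Usage (not executed here):
// startCluster(() => app.listen(process.env.PORT));
```

This only helps if requests are CPU-bound or block the event loop; it does not shorten any individual slow request.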

Implementing timeout middleware

expressjs/express#3330 <--- may be quick and possible to do just with express
http://expressjs.com/en/resources/middleware/timeout.html <--- middleware example
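
As a starting point for discussion, here is a dependency-free sketch of what such a middleware could look like. It is not the API of the packaged middleware linked above: `requestTimeout`, `haltOnTimeout`, the `req.timedOut` flag, and the 25-second default are all assumptions chosen so the dyno gives up before Heroku's 30-second router limit.

```javascript
// Express-compatible middleware factory (hypothetical names and defaults).
function requestTimeout(ms = 25000) {
  return function (req, res, next) {
    const timer = setTimeout(() => {
      req.timedOut = true; // flag for downstream handlers to check
      if (!res.headersSent) {
        res.statusCode = 503;
        res.setHeader('Content-Type', 'application/json');
        res.end(JSON.stringify({ error: 'Request timed out' }));
      }
    }, ms);

    // Cancel the timer once the response finishes or the client disconnects.
    const clear = () => clearTimeout(timer);
    res.on('finish', clear);
    res.on('close', clear);

    next();
  };
}

// Place between slow steps so a timed-out request stops doing further work.
function haltOnTimeout(req, res, next) {
  if (!req.timedOut) next();
}

// Usage (not executed here):
// app.use(requestTimeout());
// app.use(loadStreet, haltOnTimeout, renderStreet); // hypothetical handlers
```

Note that this only frees the client and the router sooner. Any database query already in flight keeps running, so the slow actions themselves still need to be found and fixed.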

Replicating the issue is hard without simulating a bunch of users. We could do something with https://locust.io/ on staging to try to replicate and test the issue without affecting production.
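
Before committing to a tool, a rough idea of what simulating users means can be sketched with Node's standard library alone. The helper names (`timedGet`, `runLoad`), the option names, and the thresholds are hypothetical; a dedicated load-testing tool would give better reporting.

```javascript
const http = require('http');
const https = require('https');

// GET a URL, rejecting on network errors, 5xx responses, or a client-side timeout.
function timedGet(url, timeoutMs = 30000) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    const req = client.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the body
      res.on('end', () => {
        if (res.statusCode < 500) resolve(res.statusCode);
        else reject(new Error(`HTTP ${res.statusCode}`));
      });
    });
    req.on('timeout', () => req.destroy(new Error('client timeout')));
    req.on('error', reject);
  });
}

// Send `total` requests with at most `concurrency` in flight at once.
async function runLoad(sendRequest, { total, concurrency }) {
  const results = { ok: 0, failed: 0, latenciesMs: [] };
  let started = 0;

  async function simulatedUser() {
    while (started < total) {
      started++;
      const t0 = Date.now();
      try {
        await sendRequest();
        results.ok++;
      } catch (err) {
        results.failed++;
      }
      results.latenciesMs.push(Date.now() - t0);
    }
  }

  await Promise.all(Array.from({ length: concurrency }, simulatedUser));
  return results;
}

// Usage (not executed here; the staging URL is a placeholder):
// runLoad(() => timedGet('https://staging.example.com/'), { total: 200, concurrency: 20 })
//   .then((r) => console.log(r.ok, r.failed, Math.max(...r.latenciesMs)));
```

Comparing the failure count and worst-case latency before and after adding timeout middleware on staging would show whether the change actually reduces the compounding effect.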

@easherma
Collaborator Author

https://www.npmjs.com/package/loadtest or something similar could be used for load tests
