Address Heroku timeout errors #2296

easherma · 2021-03-17T17:01:40Z

As a Streetmix user, I want to visit the site without the app timing out.

A small but consistent percentage of users keeps experiencing timeouts and other dropped connections.

The percentage is small, but if the number of users scales, it's quite possible this percentage will trend in the wrong direction.

We've tried a couple things so far:

Optimizing queries, adding indexes to make requests complete faster
Upgrading dynos and the database to improve the app's performance

It was important to rule those things out, but they have not totally solved the problem.
Other potential causes and fixes, in order of likelihood:

Adding Express middleware to time out requests at a shorter threshold than Heroku's 30-second router timeout (https://blog.heroku.com/timeout-quickly, https://help.heroku.com/AXOSFIXN/why-am-i-getting-h12-request-timeout-errors-in-nodejs)

From Heroku's help article on H12 errors: "With H12 - Request Timeout errors, we generally see this pattern where one long-running action starts hogging the queue which in turn affects any subsequent requests.

Our router will drop a long-running request after 30 seconds, but the dyno behind it will continue processing the request until completion. Our router is unaware of it, though, so it'll dispatch new requests to that busy dyno. This effect tends to compound, and you'll eventually see H12 errors even for unrelated URLs, such as static assets. H13 errors are similar in what causes them, but are primarily related to concurrent web servers.

If your app is using ExpressJS, you will also want to install something like timeout, which will ensure that a long running request is dropped at the dyno-level as well. Specifically, timeout raises a Response timeout exception when that happens.

With that in place, the compound effect is less likely to occur, but long-running actions still need to be addressed."

Putting expensive or longer-running requests (like a street update) into a background worker queue (https://devcenter.heroku.com/articles/node-redis-workers)
Improving the app's concurrency (https://devcenter.heroku.com/articles/node-concurrency)
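A minimal sketch of the background-queue idea, assuming a hypothetical `JobQueue` class and `handleStreetUpdate` handler (neither exists in Streetmix): the web request only records the job and answers `202 Accepted` with a job id, and a worker runs the slow part later. The Heroku article linked above uses Redis and a separate worker dyno; this stand-in keeps everything in memory in one process so the pattern can run on its own.

```javascript
// Hypothetical in-process stand-in for a Redis-backed job queue.
// In a real Heroku setup the queue would live in Redis and a separate
// worker dyno would call processNext(); here both sides share one process.
class JobQueue {
  constructor() {
    this.pending = []; // job ids waiting to run, oldest first
    this.jobs = new Map(); // id -> { status, payload, result, error }
    this.nextId = 1;
  }

  // Called from the web handler: record the job and return immediately.
  enqueue(payload) {
    const id = String(this.nextId++);
    this.jobs.set(id, { status: 'queued', payload });
    this.pending.push(id);
    return id;
  }

  status(id) {
    const job = this.jobs.get(id);
    return job ? job.status : 'unknown';
  }

  // Called by the worker: run the oldest queued job; false if none queued.
  async processNext(handler) {
    const id = this.pending.shift();
    if (id === undefined) return false;
    const job = this.jobs.get(id);
    job.status = 'running';
    try {
      job.result = await handler(job.payload);
      job.status = 'done';
    } catch (err) {
      job.error = err;
      job.status = 'failed';
    }
    return true;
  }
}

// Hypothetical web-side handler: enqueue the street update instead of
// doing the work inside the request, and answer with 202 Accepted.
function handleStreetUpdate(queue, street) {
  const jobId = queue.enqueue({ type: 'street-update', street });
  return { statusCode: 202, body: { jobId } };
}
```

The client would then poll for the job's status (for example through a hypothetical `GET /jobs/:id` route) instead of holding one request open past the 30-second router limit.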

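After more research, I'm more convinced that the first item on this list (the lack of a dyno-level timeout, which lets one slow request back up everything behind it) is the likely culprit. Our total number of users and the complexity of their requests are still fairly low in the grand scheme of things, so raw load seems less likely to be the problem than a few long-running requests compounding.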
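A toy model of the compounding effect from the Heroku excerpt, under loudly simplified assumptions: one dyno handles one request at a time (as when a long synchronous task blocks Node's event loop), the router gives up after 30 seconds, and a dyno-level timeout really stops work on the request. The request mix (one 60-second request followed by ten 1-second requests) is made up for illustration, not measured from Streetmix.

```javascript
// Simplified model: one dyno, one request at a time, times in seconds.
// ROUTER_TIMEOUT matches Heroku's 30 s router limit.
const ROUTER_TIMEOUT = 30;

function simulate(requests, dynoTimeout = Infinity) {
  let dynoFreeAt = 0; // when the dyno finishes the work it already has
  return requests.map(({ arrival, work }) => {
    const start = Math.max(arrival, dynoFreeAt);
    // Assumption: a dyno-level timeout stops work after dynoTimeout seconds.
    const busy = Math.min(work, dynoTimeout);
    dynoFreeAt = start + busy;
    const latency = dynoFreeAt - arrival;
    if (latency > ROUTER_TIMEOUT) return 'H12'; // router already gave up
    if (work > dynoTimeout) return 'dyno-timeout'; // cut off by the app itself
    return 'ok';
  });
}

function countOutcomes(outcomes) {
  const counts = {};
  for (const o of outcomes) counts[o] = (counts[o] || 0) + 1;
  return counts;
}

// Hypothetical traffic: one 60 s request at t=0, then 1 s requests every 2 s.
const requests = [{ arrival: 0, work: 60 }];
for (let t = 1; t < 20; t += 2) requests.push({ arrival: t, work: 1 });

console.log(countOutcomes(simulate(requests))); // → { H12: 11 }
console.log(countOutcomes(simulate(requests, 10))); // → { 'dyno-timeout': 1, ok: 10 }
```

In this model, without a dyno-level timeout every short request queued behind the slow one also exceeds 30 seconds; with a 10-second cutoff, only the slow request fails.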

Implementing timeout middleware

expressjs/express#3330 <--- may be quick to do with Express alone
http://expressjs.com/en/resources/middleware/timeout.html <--- middleware example
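A dependency-free sketch of what such middleware does, using only Node's `http` module so it runs without installing Express. `timeoutMiddleware`, `slowHandler`, and `createDemoServer` are hypothetical names. The middleware uses Express's `(req, res, next)` signature and, like connect-timeout as its documentation describes, sets `req.timedout` rather than stopping the handler, so slow handlers must check that flag before writing a response.

```javascript
const http = require('http');

// Hypothetical middleware modeled loosely on connect-timeout: after `ms`
// milliseconds, answer 503 and flag the request as timed out.
function timeoutMiddleware(ms) {
  return function (req, res, next) {
    req.timedout = false;
    const timer = setTimeout(() => {
      req.timedout = true;
      if (!res.headersSent) {
        res.statusCode = 503;
        res.end('Response timeout');
      }
    }, ms);
    // Cancel the timer once the response finishes or the client disconnects.
    res.on('finish', () => clearTimeout(timer));
    res.on('close', () => clearTimeout(timer));
    next();
  };
}

// Hypothetical handler that simulates slow async work of `workMs` ms.
function slowHandler(workMs) {
  return (req, res) => {
    setTimeout(() => {
      if (req.timedout) return; // client already got a 503; skip the response
      res.end('done');
    }, workMs);
  };
}

// Wire the middleware in front of the handler without Express.
function createDemoServer(timeoutMs, workMs) {
  const withTimeout = timeoutMiddleware(timeoutMs);
  const handler = slowHandler(workMs);
  return http.createServer((req, res) => withTimeout(req, res, () => handler(req, res)));
}
```

In the real app the middleware would be mounted with `app.use(...)` ahead of the routes, with a threshold comfortably below Heroku's 30 seconds.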

Replicating the issue is hard without simulating a bunch of users. We could run something like https://locust.io/ against staging to replicate and test the issue without affecting production.


easherma · 2021-03-17T18:26:34Z

https://www.npmjs.com/package/loadtest or something similar could be used for load tests
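For a quick check without adding a package, the same kind of measurement can be sketched with Node's built-in `http` module. `requestOnce`, `loadTest`, and `summarize` are hypothetical helpers, not the loadtest package's API; the sketch sends a fixed number of GET requests with bounded concurrency and reports the non-200 count and the 95th-percentile latency. It should only be pointed at staging, never production.

```javascript
const http = require('http');

// Send one GET and resolve with its status code and latency in ms.
function requestOnce(url) {
  return new Promise((resolve) => {
    const started = Date.now();
    const req = http.get(url, (res) => {
      res.resume(); // discard the body; only status and timing matter here
      res.on('end', () => resolve({ status: res.statusCode, ms: Date.now() - started }));
    });
    req.on('error', (err) => resolve({ status: 'error', error: err.code, ms: Date.now() - started }));
  });
}

// Send `total` requests, keeping at most `concurrency` in flight at once.
async function loadTest(url, total, concurrency) {
  const results = [];
  let sent = 0;
  async function worker() {
    while (sent < total) {
      sent++; // safe: check-and-increment runs synchronously
      results.push(await requestOnce(url));
    }
  }
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return results;
}

// Count non-200 responses and compute nearest-rank p95 latency.
function summarize(results) {
  if (results.length === 0) return { count: 0, errors: 0, p95: null };
  const latencies = results.map((r) => r.ms).sort((a, b) => a - b);
  const p95 = latencies[Math.ceil(latencies.length * 0.95) - 1];
  const errors = results.filter((r) => r.status !== 200).length;
  return { count: results.length, errors, p95 };
}
```

Non-200 responses, such as 503s from a dyno-level timeout or from Heroku's router on an H12, count as errors in the summary.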

easherma added the enhancement label Mar 17, 2021
