This repository has been archived by the owner on Jun 10, 2020. It is now read-only.

Production and Staging have been experiencing downtime #740

Open
gbinal opened this issue Nov 15, 2017 · 2 comments

Comments

@gbinal
Member

gbinal commented Nov 15, 2017

In recent months, roughly 3-4 times a month, either production or staging has gone down. When it happens, the site returns a 500 error with a message such as: 404 Not Found: Requested route ('pulse.app.cloud.gov') does not exist.

The New Relic alerts are catching it, and a simple restart of the app on cloud.gov brings it back up, but this is obviously not a good state to be in. Unfortunately, it has been happening more often in the last week.

We're investigating the causes, but some initial ideas for solving it include:

  • bumping up the memory
  • adding cache-busting (e.g. a query string after the ?) after each deploy, to help test the effects of a deploy
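
To illustrate the cache-busting idea, here is a minimal sketch in Python. It assumes the goal is to append a per-deploy version value to a URL's query string so that caches treat the post-deploy URL as new. The function name `cache_busted`, the parameter name `v`, the version string, and the example URL are all hypothetical and are not taken from this project's code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, version: str, param: str = "v") -> str:
    """Return `url` with `param=version` added to (or replaced in) its query string."""
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any earlier cache-busting value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, version))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL and version string (e.g. a deploy date or commit hash).
print(cache_busted("https://example.gov/static/main.css", "20171115"))
```

Requesting the rewritten URL right after a deploy should bypass any cached copy keyed on the old URL, which makes it easier to tell whether a deploy actually changed what is being served.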
@micahsaul
Contributor

Looking at New Relic, it seems like all of the errors I'm seeing are related to someone probing us for vulns. The question, though, is why that would cause the whole site to crash.

@konklone
Contributor

I would recommend we bump up the memory.

3 participants