Request to location API failed (502) and Homebridge crashed entirely #1160

Open
tbaur opened this issue Feb 17, 2023 · 14 comments
Labels
bug Something isn't working

Comments

@tbaur

tbaur commented Feb 17, 2023

Bug Report

Describe the Bug

An API call hit a 502, and it crashed Homebridge entirely. Logs below. This happened 3-4 times last night. Clearly the Ring API was having issues, but that shouldn't bring down Homebridge altogether.

Screenshots/Logs

[2/17/2023, 4:54:44 AM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXX failed with status 502. Response body:

<title>502 Bad Gateway</title>

502 Bad Gateway

[2/17/2023, 4:54:44 AM] HTTPError: Response code 502 (Bad Gateway)
at Request.<anonymous> (/usr/local/lib/node_modules/homebridge-ring/node_modules/got/dist/source/as-promise/index.js:118:42)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
[2/17/2023, 4:54:44 AM] Got SIGTERM, shutting down Homebridge...

Environment

OSX and current versions of node, npm, homebridge, and homebridge-ring

@tbaur tbaur added the bug Something isn't working label Feb 17, 2023
@tbaur tbaur changed the title from "Request to location API failed (503) and Homebridge crashed entirely" to "Request to location API failed (502) and Homebridge crashed entirely" Feb 17, 2023
@tbaur
Author

tbaur commented Feb 23, 2023

More of the same crashing.

[2/23/2023, 7:27:27 AM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXX failed with status 400. Response body: envproxy unknown host

[2/23/2023, 7:27:27 AM] HTTPError: Response code 400 (Bad Request)
at Request.<anonymous> (/usr/local/lib/node_modules/homebridge-ring/node_modules/got/dist/source/as-promise/index.js:118:42)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
[2/23/2023, 7:27:27 AM] Got SIGTERM, shutting down Homebridge...

@tbaur
Author

tbaur commented Feb 23, 2023

And then, after repeated failures, crashes, and Homebridge restarts, the API starts rejecting requests outright, causing even more churn.

[2/23/2023, 7:27:40 AM] [Ring] HTTPError: Response code 429 (Too Many Requests)

@dgreif
Owner

dgreif commented May 13, 2023

@tbaur sorry I didn't see this issue sooner. The 429 is a rate-limiting response, likely because you restarted many times which caused a large number of requests. Are you still experiencing the 502 and 400 errors which originally caused you trouble?

@tbaur
Author

tbaur commented May 13, 2023

Yeah, I get the 429. The handling of the 502/400s is what crashes Homebridge, though, repeatedly, and with all the automatic service restarts it then hits the 429 and doesn't recover well. It's unclear why 502/400 responses should crash the whole service. Some improvements in handling upstream API failures would likely help.

@dgreif
Owner

dgreif commented May 13, 2023

Thanks for the update. In your second comment, I see envproxy unknown host as the error. Do you have some sort of proxy or VPN set up? I haven't seen errors like this before, so I'm guessing it's something related to your network.

@tbaur
Author

tbaur commented May 13, 2023

There is no proxy or VPN in front of Homebridge on my side -- envproxy unknown host is in the response body. I'd guess that app.ring.com is a front-end API gateway for backend services. My original point in raising this issue wasn't that app.ring.com had a problem -- it was the handling of the API call errors: they shouldn't crash Homebridge entirely. A 502 server-side error triggers the crash, then a restart/crash loop begins, and things go downhill with various 4xx codes.

@tbaur
Author

tbaur commented May 15, 2023

Here is another from this morning. A server-side error, and the plugin crashed Homebridge.

[5/15/2023, 9:34:56 AM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXXX failed with status 503. Response body: {"errors":[4007]}
[5/15/2023, 9:34:56 AM] HTTPError: Response code 503 (Service Unavailable)
at Request.<anonymous> (/usr/local/lib/node_modules/homebridge-ring/node_modules/got/dist/source/as-promise/index.js:118:42)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
[5/15/2023, 9:34:56 AM] Got SIGTERM, shutting down Homebridge...

@dgreif
Owner

dgreif commented May 16, 2023

@tbaur just to make sure I understand your intent with this issue - is the ring plugin generally working for you, but occasionally crashing or crash-looping if an unexpected server error is encountered?

@tbaur
Author

tbaur commented May 27, 2023

[5/26/2023, 3:16:50 PM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXXX failed with status 503. Response body: {"errors":[4007]}
[5/26/2023, 3:16:50 PM] HTTPError: Response code 503 (Service Unavailable)
at Request.<anonymous> (/usr/local/lib/node_modules/homebridge-ring/node_modules/got/dist/source/as-promise/index.js:118:42)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
[5/26/2023, 3:16:50 PM] Got SIGTERM, shutting down Homebridge...

==> restart and crash again:

[5/26/2023, 3:27:19 PM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXXX failed with status 500. Response body:
[5/26/2023, 3:27:19 PM] HTTPError: Response code 500 (Server Error)
at Request.<anonymous> (/usr/local/lib/node_modules/homebridge-ring/node_modules/got/dist/source/as-promise/index.js:118:42)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
[5/26/2023, 3:27:19 PM] Got SIGTERM, shutting down Homebridge...

@tbaur
Author

tbaur commented May 27, 2023

Yes, the plugin works, until server-side errors from Ring's API crash Homebridge.

@tsightler
Collaborator

So this has been out here a long time, and it doesn't seem to happen often enough to be critical, but I had a little time tonight and did some digging. I think this is simply because the request() function in rest-client.ts looks for specific status codes (401, 404, 504) and retries them, but throws the error for any status code that is not explicitly handled. However, as far as I can see, none of the callers of request() actually attempts to catch an error, so the unhandled promise rejection leads to the node process exiting, since that has been the default for modern node versions since v16 (maybe v15 during development). Prior to that, this was probably less of an issue.
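
Roughly, the failure mode looks like the following minimal sketch. This is not the actual rest-client.ts code -- fetchJson is a hypothetical stand-in built on the global fetch rather than got, and the retry delay is illustrative:

```ts
// Minimal sketch of the pattern described above: retry a few known status
// codes, rethrow everything else, and never catch anywhere up the chain.
async function fetchJson(url: string): Promise<unknown> {
  const response = await fetch(url)
  if (response.ok) {
    return response.json()
  }
  if ([401, 404, 504].includes(response.status)) {
    // explicitly handled codes: wait, then retry
    await new Promise((resolve) => setTimeout(resolve, 5000))
    return fetchJson(url)
  }
  // anything else (502, 503, 429, ...) escapes as a rejection
  throw new Error(`Request failed with status ${response.status}`)
}

// No caller catches, so a 502 becomes an unhandled rejection, and on
// Node v15+ an unhandled rejection terminates the process by default.
void fetchJson('https://app.ring.com/api/v1/mode/location/some-location-id')
```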

I don't see this in ring-mqtt because I use process.on('unhandledRejection') to catch such events for a few other "normal" cases, such as an invalid token, and to make the error messages slightly cleaner by not outputting the stack trace (otherwise users open cases about them). It also has the side benefit of catching these crashing cases. However, I don't think this approach is appropriate for a library.
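
For reference, that safety net amounts to something like this (the handler body here is just a sketch of the idea, not the actual ring-mqtt code):

```ts
// Catch rejections globally: log a clean one-line message instead of a
// stack trace, and keep the process alive. Fine for an app, not a library.
process.on('unhandledRejection', (reason) => {
  const message = reason instanceof Error ? reason.message : String(reason)
  console.error(`Unhandled rejection (continuing): ${message}`)
})
```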

I believe the proper fix is to implement error handling via a catch for every call to request(), since it is certainly possible that any given request may actually fail. However, in ring-client-api alone there are around 70 callers of request(), and certainly some library users call request() directly for some things as well (I have at least a half-dozen such calls in ring-mqtt).
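
Per-caller handling would look something like this -- getLocationMode, the request() stub, and the fallback behavior are all hypothetical, just to show the shape:

```ts
// Hypothetical stand-in for rest-client's request(); the real signature
// and retry behavior differ.
const request = async (url: string): Promise<unknown> => {
  const response = await fetch(url)
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`)
  }
  return response.json()
}

// Each caller catches and decides how to degrade, instead of letting the
// rejection escape and kill the process.
async function getLocationMode(locationId: string): Promise<unknown> {
  try {
    return await request(`https://app.ring.com/api/v1/mode/location/${locationId}`)
  } catch (e) {
    console.error(`Failed to fetch mode for location ${locationId}:`, e)
    return undefined // fall back to last-known state
  }
}
```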

I don't have time to fix all of these right now, so that leaves a few alternatives. The working assumption of most of the code has always been that a call to request() will retry automatically until it succeeds, so my lean is to handle all 500-level responses the same as a gateway timeout.

This still leaves the 400 errors. In general I don't think "Bad Request" errors should be retried, so I'm not sure what to do with them. Maybe just log them but don't throw? I don't think that behavior would be any worse than what the code did on Node v14 and earlier, when it generated promise rejections but the node process didn't exit.
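
Concretely, the combined behavior would be roughly the following -- again a sketch over a fetch-based stand-in, with an illustrative name and retry delay:

```ts
// Retry every 5xx the same way the existing gateway-timeout path works;
// log 4xx responses and resolve instead of rejecting.
async function requestWithRecovery(url: string): Promise<unknown> {
  const response = await fetch(url)
  if (response.ok) {
    return response.json()
  }
  if (response.status >= 500) {
    // server-side failure: treat like a gateway timeout and retry
    console.error(`Request to ${url} failed with ${response.status}; retrying in 20s`)
    await new Promise((resolve) => setTimeout(resolve, 20000))
    return requestWithRecovery(url)
  }
  // 4xx: not retryable, but not worth crashing the whole process either
  console.error(`Request to ${url} failed with ${response.status}; not retrying`)
  return undefined
}
```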

Fully open to other suggestions -- is there some cleaner option I'm missing?

@tbaur
Author

tbaur commented Nov 5, 2023 via email

@tsightler
Collaborator

To be fair, it crashes now mostly because Node changed behavior: unhandled rejections have exited the process by default since sometime after Node 14. You can always run node with --unhandled-rejections=warn, which restores the behavior from when this code was originally written years ago.
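
For example (the path to the homebridge binary will vary by install):

```sh
# Warn on unhandled rejections instead of exiting, as Node did before the default changed
node --unhandled-rejections=warn /usr/local/bin/homebridge
```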

Not that I'm disagreeing with you, but I just don't have time to hunt down every consumer and put in proper error handling. I'm currently testing simply logging the error instead of throwing it.

@tbaur
Author

tbaur commented Nov 5, 2023 via email
