Rocket Production Freezes #1302

Closed · migeruj opened this issue May 18, 2020 · 11 comments
Labels: triage (A bug report being investigated)

Comments

migeruj commented May 18, 2020

Hi, I have a web application built with Rocket that uses Tera templates for its views and runs in a Docker container. The site gets visits from bots that probe non-existent paths.

The problem is that after enough of these bad requests, the page stops responding. Requests leave the load balancer correctly, but Rocket appears to be stuck in a frozen state. The only way to get it running again is to restart the container.

What can I do to diagnose this situation?

Bug Report

The container keeps running but no longer receives requests.

  1. Rocket version: 0.4.4

  2. Operating system: Ubuntu 18.04 (in Docker)

  3. A brief description of the bug:
    * The container freezes and requests time out.
    * It happens after 3-4 weeks of working perfectly.
    * If there were an error, I would expect the container to restart via its restart policy, but it keeps running.

  4. How the bug was uncovered:
    I checked that it wasn't a load balancer problem, and I have ruled out CPU/RAM bottlenecks on the instances and memory leaks in the container. Rocket just freezes: the request reaches the load balancer, but Rocket returns nothing.

  5. Ideas, if any, about what Rocket is doing incorrectly:
    I honestly don't know whether the mistake is on our side or in Rocket.

Almost all of the requests logged before the freeze come from bots.

Log output

api_1  | 2020-04-23T19:18:37.818725094Z Error: No matching routes for GET /Temporary_Listen_Addresses/SMSSERVICE.
api_1  | 2020-04-23T19:18:37.818803995Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-23T20:11:59.500589442Z Error: No matching routes for GET /WSMAN.
api_1  | 2020-04-23T20:11:59.500636777Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-23T23:27:13.273145283Z Error: No matching routes for GET /TP/public/index.php.
api_1  | 2020-04-23T23:27:13.273184826Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-23T23:32:56.145179224Z Error: Bad incoming request: Bad URI: http://5.188.210.101/echo.php
api_1  | 2020-04-23T23:32:56.145219176Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:11:53.508018913Z Error: No matching routes for GET /portal/redlion.
api_1  | 2020-04-24T00:11:53.508057164Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T00:24:41.238074199Z Error: Bad incoming request: Bad URI: http://example.com/
api_1  | 2020-04-24T00:24:41.238109009Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:24:41.381671784Z Error: Bad incoming request: Bad URI: http://169.254.169.254/
api_1  | 2020-04-24T00:24:41.381711416Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:24:41.528650834Z Error: Bad incoming request: Bad URI: http://[::ffff:a9fe:a9fe]/
api_1  | 2020-04-24T00:24:41.528688774Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:24:41.674086207Z Error: Bad incoming request: Bad URI: http://169.254.169.254/latest/dynamic/instance-identity/document
api_1  | 2020-04-24T00:24:41.674121807Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:24:41.817321373Z Error: Bad incoming request: Bad URI: http://[::ffff:a9fe:a9fe]/latest/dynamic/instance-identity/document
api_1  | 2020-04-24T00:24:41.817359389Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:24:42.399037419Z Error: No matching routes for GET /latest/dynamic/instance-identity/document.
api_1  | 2020-04-24T00:24:42.399073070Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T00:24:42.546013807Z Error: No matching routes for GET /latest/dynamic/instance-identity/document.
api_1  | 2020-04-24T00:24:42.546050861Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T00:24:42.760916422Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T00:24:42.760950229Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:24:42.973010419Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T00:24:42.973050291Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:24:43.186413272Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T00:24:43.186452027Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:24:43.399692990Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T00:24:43.399730736Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T00:24:43.611414377Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T00:24:43.611452236Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T01:48:40.252481998Z Error: No matching routes for GET /cgi-bin/luci.
api_1  | 2020-04-24T01:48:40.252520193Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T01:48:40.539299041Z Error: No matching routes for GET /dana-na/auth/url_default/welcome.cgi.
api_1  | 2020-04-24T01:48:40.539332985Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T01:48:40.827414845Z Error: No matching routes for GET /remote/login?lang=en.
api_1  | 2020-04-24T01:48:40.827449687Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T01:48:41.433252234Z Error: No matching routes for GET /index.asp.
api_1  | 2020-04-24T01:48:41.433290759Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T01:48:41.721679841Z Error: No matching routes for GET /htmlV/welcomeMain.htm.
api_1  | 2020-04-24T01:48:41.721715821Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:49:49.742270868Z Error: No matching routes for GET /cgi-bin/luci.
api_1  | 2020-04-24T02:49:49.742311572Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:49:49.769932246Z Error: No matching routes for GET /dana-na/auth/url_default/welcome.cgi.
api_1  | 2020-04-24T02:49:49.769967969Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:49:49.798014990Z Error: No matching routes for GET /remote/login?lang=en.
api_1  | 2020-04-24T02:49:49.798068455Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:49:49.853705374Z Error: No matching routes for GET /index.asp.
api_1  | 2020-04-24T02:49:49.853742238Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:49:49.881635155Z Error: No matching routes for GET /htmlV/welcomeMain.htm.
api_1  | 2020-04-24T02:49:49.881666922Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:56:02.505563337Z Error: No matching routes for GET /cgi-bin/luci.
api_1  | 2020-04-24T02:56:02.505603887Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:56:02.799481995Z Error: No matching routes for GET /dana-na/auth/url_default/welcome.cgi.
api_1  | 2020-04-24T02:56:02.799518708Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:56:03.090173426Z Error: No matching routes for GET /remote/login?lang=en.
api_1  | 2020-04-24T02:56:03.090207670Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:56:03.695323071Z Error: No matching routes for GET /index.asp.
api_1  | 2020-04-24T02:56:03.695359770Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T02:56:03.981693180Z Error: No matching routes for GET /htmlV/welcomeMain.htm.
api_1  | 2020-04-24T02:56:03.981729290Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T03:05:40.786861834Z Error: No matching routes for GET /manager/html text/html.
api_1  | 2020-04-24T03:05:40.786900043Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T10:53:03.678421811Z Error: No matching routes for GET /cgi-bin/luci.
api_1  | 2020-04-24T10:53:03.678465553Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T10:53:03.970592075Z Error: No matching routes for GET /dana-na/auth/url_default/welcome.cgi.
api_1  | 2020-04-24T10:53:03.970628562Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T10:53:04.268385904Z Error: No matching routes for GET /remote/login?lang=en.
api_1  | 2020-04-24T10:53:04.268422113Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T10:53:04.845509977Z Error: No matching routes for GET /index.asp.
api_1  | 2020-04-24T10:53:04.845547278Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T10:53:05.121395138Z Error: No matching routes for GET /htmlV/welcomeMain.htm.
api_1  | 2020-04-24T10:53:05.121428756Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T12:09:51.327432945Z Error: No matching routes for GET /cgi-bin/luci.
api_1  | 2020-04-24T12:09:51.327472780Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T12:09:51.546282594Z Error: No matching routes for GET /dana-na/auth/url_default/welcome.cgi.
api_1  | 2020-04-24T12:09:51.546320877Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T12:09:51.766403073Z Error: No matching routes for GET /remote/login?lang=en.
api_1  | 2020-04-24T12:09:51.766440605Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T12:09:52.206691741Z Error: No matching routes for GET /index.asp.
api_1  | 2020-04-24T12:09:52.206734176Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T12:09:52.431231268Z Error: No matching routes for GET /htmlV/welcomeMain.htm.
api_1  | 2020-04-24T12:09:52.431277042Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T12:43:39.480709898Z Error: Bad incoming request: Bad URI: http://example.com/
api_1  | 2020-04-24T12:43:39.480747888Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T12:43:39.626060868Z Error: Bad incoming request: Bad URI: http://169.254.169.254/
api_1  | 2020-04-24T12:43:39.626098763Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T12:43:39.770140596Z Error: Bad incoming request: Bad URI: http://[::ffff:a9fe:a9fe]/
api_1  | 2020-04-24T12:43:39.770174078Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T12:43:39.915718434Z Error: Bad incoming request: Bad URI: http://169.254.169.254/latest/dynamic/instance-identity/document
api_1  | 2020-04-24T12:43:39.915753521Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T12:43:40.060469574Z Error: Bad incoming request: Bad URI: http://[::ffff:a9fe:a9fe]/latest/dynamic/instance-identity/document
api_1  | 2020-04-24T12:43:40.060520470Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T12:43:40.634409731Z Error: No matching routes for GET /latest/dynamic/instance-identity/document.
api_1  | 2020-04-24T12:43:40.634446051Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T12:43:40.780140916Z Error: No matching routes for GET /latest/dynamic/instance-identity/document.
api_1  | 2020-04-24T12:43:40.780175365Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T12:43:40.991753274Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T12:43:40.991786581Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T12:43:41.206594222Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T12:43:41.206628238Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T12:43:41.420324211Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T12:43:41.420358578Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T12:43:41.635173835Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T12:43:41.635209482Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T12:43:41.851130554Z Error: Bad incoming request: Bad URI: 3.133.46.235:80
api_1  | 2020-04-24T12:43:41.851163583Z Warning: Responding with 400 Bad Request catcher.
api_1  | 2020-04-24T14:13:23.071754563Z Error: No matching routes for GET /hudson.
api_1  | 2020-04-24T14:13:23.071792325Z Warning: Responding with 404 Not Found catcher.
api_1  | 2020-04-24T15:40:48.977368189Z Error: No matching routes for GET /solr/.
api_1  | 2020-04-24T15:40:48.977406849Z Warning: Responding with 404 Not Found catcher.
WesleyAC commented May 19, 2020

I'm seeing this same symptom. When I strace the stuck process, I see it's blocked in a futex wait:

# strace -p 20814
strace: Process 20814 attached
futex(0x7f83442f59d0, FUTEX_WAIT, 20815, NULL

I consistently see this after running for a few days. The repo is https://github.com/wesleyac/hanabi.

# curl -vvv localhost:8000
* Expire in 0 ms for 6 (transfer 0x563213403f50)
* Expire in 1 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 1 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 1 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 1 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 0 ms for 1 (transfer 0x563213403f50)
* Expire in 1 ms for 1 (transfer 0x563213403f50)
* Expire in 2 ms for 1 (transfer 0x563213403f50)
* Expire in 1 ms for 1 (transfer 0x563213403f50)
* Expire in 1 ms for 1 (transfer 0x563213403f50)
* Expire in 1 ms for 1 (transfer 0x563213403f50)
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x563213403f50)
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.64.0
> Accept: */*
> 

jebrosen (Collaborator) commented

First, try disabling keep-alive (in Rocket.toml or with ROCKET_KEEPALIVE=0 in the environment) and see if that makes a difference. There are some bugs with keep-alive in hyper 0.10 that may be related to this, and it's one of the easiest switches to flip.
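
For reference, here is a minimal sketch of the programmatic equivalent in Rocket 0.4, in case you build your configuration with `rocket::custom` rather than Rocket.toml or environment variables. The route, address, and port below are placeholders, not taken from either application:

```rust
#![feature(proc_macro_hygiene, decl_macro)]
#[macro_use] extern crate rocket;

use rocket::config::{Config, Environment};

#[get("/")]
fn index() -> &'static str {
    "ok"
}

fn main() {
    // keep_alive(0) disables HTTP keep-alive, mirroring ROCKET_KEEPALIVE=0
    // in the environment or `keep_alive = 0` in Rocket.toml.
    let config = Config::build(Environment::Production)
        .address("0.0.0.0")
        .port(8000)
        .keep_alive(0)
        .finalize()
        .expect("valid config");

    rocket::custom(config).mount("/", routes![index]).launch();
}
```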

jebrosen added the triage (A bug report being investigated) label on May 20, 2020
migeruj (Author) commented May 20, 2020

Thanks, I changed that variable. We will see how it behaves and I will leave feedback here.

WesleyAC commented

This appears to have fixed the issue for me, although I should probably wait another week or so to say for sure.

jebrosen (Collaborator) commented

If disabling keep-alive is a fix, there are two main possibilities that come to mind:

  • Some clients are not actually disconnecting. Given more time, those clients might disconnect and free up a thread for another response. This is a problem in Rocket, but more of a deficiency than an actual bug. If the load balancer in question acts as a client and proxies connections, this could also be a bug, deficiency, or awkward behavior in the load balancer.
  • Or, this is a bug or deficiency in hyper 0.10, or in the way Rocket uses it, and the migration to hyper 0.13 (Tracking Issue for Async I/O migration #1065) will fix, alleviate, or change the issue.

WesleyAC commented

FWIW, my environment is Rocket running on a single worker thread behind nginx as a reverse proxy. If there's a way for me to get more info on what's happening or to test these hypotheses, let me know. The fact that I have a single worker thread could explain why I was seeing this more often than @migeruj.

jebrosen (Collaborator) commented

One worker thread can only serve a single request at a time, which could easily explain seeing it more often - especially if a handler is waiting to send/receive data.
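
To make that concrete, here is a hypothetical sketch (not taken from either application) of why a single-worker Rocket 0.4 server starves: any handler that blocks, or any keep-alive connection a client never closes, occupies the only worker thread, and every other request queues behind it:

```rust
#![feature(proc_macro_hygiene, decl_macro)]
#[macro_use] extern crate rocket;

use std::{thread, time::Duration};

// Stand-in for a handler waiting on a slow client or upstream service.
#[get("/slow")]
fn slow() -> &'static str {
    thread::sleep(Duration::from_secs(30));
    "done"
}

#[get("/fast")]
fn fast() -> &'static str {
    "ok"
}

fn main() {
    // With ROCKET_WORKERS=1, a request to /slow (or an idle keep-alive
    // connection) ties up the lone thread, so /fast times out for everyone
    // else until the thread is released.
    rocket::ignite().mount("/", routes![slow, fast]).launch();
}
```

Raising the worker count, as in @migeruj's 8-worker setup, only raises the threshold before all threads are tied up; it does not remove the starvation.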

migeruj (Author) commented May 26, 2020

In my case I am also using nginx as a reverse proxy, so I am going to check the logs and run load tests. I have 8 workers, which would explain why I only see this situation weeks apart.

cars10 commented Jul 10, 2020

I am experiencing the same issue. My setup:

  • Rocket application running inside a Docker container (managed by docker-compose)
  • Accessible via an nginx reverse proxy
  • Before freezing, the logs are full of bots and crawlers

It ran for about 4 weeks before freezing; restarting the Docker container fixed it.
I will try adding ROCKET_KEEPALIVE=0 to see if that makes any difference.

igalic commented Jul 10, 2020

Every now and then I get a notification on this issue and hope there's a solution already.

Alas, there still isn't.

So today I was musing about whether we could put Rocket behind a WAF like ModSecurity, when I realised:

A properly configured proxy server should answer half of these requests with 400 itself, instead of sending them to the backend!

While that doesn't solve the issue, it does leave Rocket with more threads to answer real requests, instead of freezing up on automated attacks that aren't even targeting your app!

SergioBenitez (Member) commented

My guess is that this is simply #1434 with poor keep-alive behavior and a buggy hyper. Disabling keep-alive and placing Rocket behind a proxy that times out long-lived connections should resolve concerns.
