
CRITICAL WORKER TIMEOUT when running Flask app #1801

Closed
bigunyak opened this issue Jun 5, 2018 · 84 comments

@bigunyak

bigunyak commented Jun 5, 2018

It seems there have already been several reports related to the [CRITICAL] WORKER TIMEOUT error, but it just keeps popping up. Here is my issue.

I'm running this Flask hello world application.

from flask import Flask
application = Flask(__name__)

@application.route('/')
def hello_world():
    return 'Hello, World!'

The gunicorn command is this one:

gunicorn -b 0.0.0.0:5000 --log-level=debug hello

And this is the console output:

[2018-06-05 14:56:21 +0200] [11229] [INFO] Starting gunicorn 19.8.1
[2018-06-05 14:56:21 +0200] [11229] [DEBUG] Arbiter booted
[2018-06-05 14:56:21 +0200] [11229] [INFO] Listening at: http://0.0.0.0:5000 (11229)
[2018-06-05 14:56:21 +0200] [11229] [INFO] Using worker: sync
[2018-06-05 14:56:21 +0200] [11232] [INFO] Booting worker with pid: 11232
[2018-06-05 14:56:21 +0200] [11229] [DEBUG] 1 workers
[2018-06-05 14:56:32 +0200] [11232] [DEBUG] GET /
[2018-06-05 14:56:57 +0200] [11232] [DEBUG] Closing connection. 
[2018-06-05 14:57:16 +0200] [11232] [DEBUG] GET /
[2018-06-05 14:57:47 +0200] [11229] [CRITICAL] WORKER TIMEOUT (pid:11232)
[2018-06-05 14:57:47 +0200] [11232] [INFO] Worker exiting (pid: 11232)
[2018-06-05 14:57:47 +0200] [11324] [INFO] Booting worker with pid: 11324

Can you please explain clearly why I get the error, and whether it's expected in this example? How do I fix it? Or, if it's expected behavior, why a critical error then?

@tilgovi
Collaborator

tilgovi commented Jun 5, 2018

The error is not expected, but there is nothing from your example that shows why it happens. Tell us more about your environment.

  • What client is used to connect to Gunicorn?
  • What reverse proxy, if any, is used to connect to Gunicorn?

@bigunyak
Author

bigunyak commented Jun 6, 2018

Tell us more about your environment.

  • What client is used to connect to Gunicorn?
    => I just send requests from Chromium: http://localhost:5000/
  • What reverse proxy, if any, is used to connect to Gunicorn?
    => No proxy, just Gunicorn + Flask

I've just reproduced the problem on a completely fresh setup, here are the steps:

mkdir gunicorn
cd gunicorn/
pipenv --python 3.6
pipenv install flask
pipenv install gunicorn
vim hello.py
pipenv shell
gunicorn -b 0.0.0.0:5000 --log-level=debug hello

hello.py is exactly the same Flask application as I posted in the initial report.
Below is the complete log.

~$ gunicorn -b 0.0.0.0:5000 --log-level=debug hello
[2018-06-06 09:16:21 +0200] [19829] [DEBUG] Current configuration:
  config: None
  bind: ['0.0.0.0:5000']
  backlog: 2048
  workers: 1
  worker_class: sync
  threads: 1
  worker_connections: 1000
  max_requests: 0
  max_requests_jitter: 0
  timeout: 30
  graceful_timeout: 30
  keepalive: 2
  limit_request_line: 4094
  limit_request_fields: 100
  limit_request_field_size: 8190
  reload: False
  reload_engine: auto
  reload_extra_files: []
  spew: False
  check_config: False
  preload_app: False
  sendfile: None
  reuse_port: False
  chdir: /home/dima/work/gunicorn
  daemon: False
  raw_env: []
  pidfile: None
  worker_tmp_dir: None
  user: 1000
  group: 985
  umask: 0
  initgroups: False
  tmp_upload_dir: None
  secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
  forwarded_allow_ips: ['127.0.0.1']
  accesslog: None
  disable_redirect_access_to_syslog: False
  access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
  errorlog: -
  loglevel: debug
  capture_output: False
  logger_class: gunicorn.glogging.Logger
  logconfig: None
  logconfig_dict: {}
  syslog_addr: udp://localhost:514
  syslog: False
  syslog_prefix: None
  syslog_facility: user
  enable_stdio_inheritance: False
  statsd_host: None
  statsd_prefix: 
  proc_name: None
  default_proc_name: hello
  pythonpath: None
  paste: None
  on_starting: <function OnStarting.on_starting at 0x7f9757112d08>
  on_reload: <function OnReload.on_reload at 0x7f9757112e18>
  when_ready: <function WhenReady.when_ready at 0x7f9757112f28>
  pre_fork: <function Prefork.pre_fork at 0x7f9756c230d0>
  post_fork: <function Postfork.post_fork at 0x7f9756c231e0>
  post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f9756c232f0>
  worker_int: <function WorkerInt.worker_int at 0x7f9756c23400>
  worker_abort: <function WorkerAbort.worker_abort at 0x7f9756c23510>
  pre_exec: <function PreExec.pre_exec at 0x7f9756c23620>
  pre_request: <function PreRequest.pre_request at 0x7f9756c23730>
  post_request: <function PostRequest.post_request at 0x7f9756c237b8>
  child_exit: <function ChildExit.child_exit at 0x7f9756c238c8>
  worker_exit: <function WorkerExit.worker_exit at 0x7f9756c239d8>
  nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f9756c23ae8>
  on_exit: <function OnExit.on_exit at 0x7f9756c23bf8>
  proxy_protocol: False
  proxy_allow_ips: ['127.0.0.1']
  keyfile: None
  certfile: None
  ssl_version: 2
  cert_reqs: 0
  ca_certs: None
  suppress_ragged_eofs: True
  do_handshake_on_connect: False
  ciphers: TLSv1
  raw_paste_global_conf: []
[2018-06-06 09:16:21 +0200] [19829] [INFO] Starting gunicorn 19.8.1
[2018-06-06 09:16:21 +0200] [19829] [DEBUG] Arbiter booted
[2018-06-06 09:16:21 +0200] [19829] [INFO] Listening at: http://0.0.0.0:5000 (19829)
[2018-06-06 09:16:21 +0200] [19829] [INFO] Using worker: sync
[2018-06-06 09:16:21 +0200] [19832] [INFO] Booting worker with pid: 19832
[2018-06-06 09:16:22 +0200] [19829] [DEBUG] 1 workers
[2018-06-06 09:16:48 +0200] [19832] [DEBUG] GET /
[2018-06-06 09:17:19 +0200] [19829] [CRITICAL] WORKER TIMEOUT (pid:19832)
[2018-06-06 09:17:19 +0200] [19832] [INFO] Worker exiting (pid: 19832)
[2018-06-06 09:17:19 +0200] [19872] [INFO] Booting worker with pid: 19872
^C[2018-06-06 09:17:26 +0200] [19829] [INFO] Handling signal: int
[2018-06-06 09:17:26 +0200] [19872] [INFO] Worker exiting (pid: 19872)
[2018-06-06 09:17:26 +0200] [19829] [INFO] Shutting down: Master
~$ pip list
Package      Version
------------ -------
click        6.7    
Flask        1.0.2  
gunicorn     19.8.1 
itsdangerous 0.24   
Jinja2       2.10   
MarkupSafe   1.0    
pip          10.0.1 
setuptools   39.2.0 
Werkzeug     0.14.1 
wheel        0.31.1

@yunstanford
Contributor

yunstanford commented Jun 8, 2018

@bigunyak I think it's because of the default timeout; your worker has been silent for more than 30s: http://docs.gunicorn.org/en/stable/settings.html#timeout

From your log,

[2018-06-05 14:57:16 +0200] [11232] [DEBUG] GET /
[2018-06-05 14:57:47 +0200] [11229] [CRITICAL] WORKER TIMEOUT (pid:11232)
[2018-06-06 09:16:48 +0200] [19832] [DEBUG] GET /
[2018-06-06 09:17:19 +0200] [19829] [CRITICAL] WORKER TIMEOUT (pid:19832)
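
If a longer window is genuinely needed, the limit can be raised with --timeout on the command line or in a config file. A minimal sketch (the value is illustrative, not a recommendation):

# gunicorn.conf.py -- run with: gunicorn -c gunicorn.conf.py hello
timeout = 120   # seconds a worker may stay silent before the arbiter kills it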

@yunstanford
Contributor

I'm seeing the same thing: workers are timing out even when serving no requests, with the sync worker.
In that sense, the critical-level log is quite confusing.

Using the gevent worker could solve this.

@bigunyak
Author

bigunyak commented Jun 8, 2018

In that sense, the critical-level log is quite confusing.

Exactly, that was my original question: if it's expected behavior, why a critical error then?
It would also be nice to get some background on why workers need to be restarted; maybe this could be added to the Design document.

@berkerpeksag
Collaborator

I'm seeing this as well (reproduced by using examples/test.py with gunicorn test:app -b localhost:9595 --log-level=debug --timeout=5) and I agree that the critical level is a bit confusing. I'd be OK with changing it to debug level. @benoitc @tilgovi what do you think?

@yunstanford
Contributor

I'm thinking info level might be a little bit better.

@neocolor

I had the same problem with MSYS2 on Win10 but finally solved it.

In notify() of ...\gunicorn\workers\workertmp.py, os.fchmod is used originally, but it does not work in MSYS. Instead of os.fchmod, I used os.utime. The code follows; I think it could work on all platforms.

    def notify(self):
        try:
            self.spinner = (self.spinner + 1) % 2
            # PLATFORM comes from the module (sys.platform in workertmp.py).
            if PLATFORM.startswith('MSYS'):
                # os.fchmod does not refresh the file under MSYS; touch the
                # timestamps instead so the arbiter still sees a heartbeat.
                os.utime(self._tmp.fileno(), None)
            else:
                os.fchmod(self._tmp.fileno(), self.spinner)
        except AttributeError:
            # python < 2.6
            self._tmp.truncate(0)
            os.write(self._tmp.fileno(), b"X")
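
For context, the arbiter side of this heartbeat can be sketched roughly as follows (paraphrased from gunicorn's Arbiter.murder_workers and WorkerTmp.last_update; not a verbatim copy of the source):

import os
import time

def worker_is_stale(tmp_file, timeout):
    # notify() refreshes the temp file's ctime (via fchmod, or utime above);
    # if the ctime is older than the timeout, the arbiter kills the worker
    # and logs [CRITICAL] WORKER TIMEOUT.
    return time.time() - os.fstat(tmp_file.fileno()).st_ctime > timeout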

@benoitc
Owner

benoitc commented Jun 20, 2018

@berkerpeksag I wouldn't expect the worker to exit because no requests happen. This error should only happen if the worker has been kept busy for a time greater than the timeout, so the error is critical. IMO we should improve the documentation to provide more use cases and responses to such errors.

If the error still happens when the worker is not kept busy, then there is something else happening and we probably have a bug.
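
To illustrate the "kept busy" case: with the sync worker, any handler that blocks for longer than the timeout reproduces the error, because a blocked worker cannot notify the arbiter. A minimal sketch (the /slow route is hypothetical, not taken from the reports above):

import time

from flask import Flask

application = Flask(__name__)

@application.route('/slow')
def slow():
    time.sleep(60)   # blocks longer than the default 30s timeout
    return 'done'    # never reached; the arbiter kills the worker first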

@kozlek

kozlek commented Jul 6, 2018

[EDIT]
Same bug for me.
With Django 1.10 / gunicorn 19.6.0 / Python 2.7.15 in python2.7-alpine on Debian 8.8 and stock kernel 3.16, all was working fine.
After updating to Django 1.11 and gunicorn 19.8.1, the workers keep failing at boot with [CRITICAL WORKER TIMEOUT].
Downgrading gunicorn to 19.6.0 doesn't fix the problem.
We updated the host kernel to 4.9.0 (it was scheduled), and the workers successfully booted without errors.
But:

  • we use 4 workers
  • after exactly 4 calls to the Django app, the next calls time out, showing [CRITICAL WORKER TIMEOUT] in the logs
  • the linux top command shows 4 gunicorn processes stuck with pretty high CPU consumption.

We're going to try gunicorn with gevent to see if we're able to get our app back online.

@kozlek

kozlek commented Jul 6, 2018

Using gunicorn with gevent didn't fix the bug.

@baeriswyld

Any update on this issue?

@tilgovi
Collaborator

tilgovi commented Aug 3, 2018

It looks like @neocolor identified a real bug under MSYS. It might deserve a separate issue.

@bigunyak what platform are you running under? I have tried to reproduce with the simple example and I cannot do it following exactly the steps outlined above. This agrees with my experience running multiple applications in production on multiple frameworks. The worker notification system has not changed recently, to my knowledge. My platform is Python 3.7 on MacOS 10.13.6, but I run Gunicorn in production with sync workers for several applications on Python 2.7 and 3.6.5 and only see worker timeouts when there are legitimately long requests that block the workers.

For @Tberdy: what happens if you try to set --worker-tmp-dir to somewhere outside the tmpfs filesystem? I am just wondering if maybe alpine or docker or the combination interferes somehow with how workers notify the arbiter.
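
For example, in a config file (a sketch; it assumes a tmpfs mount such as /dev/shm is available in the container):

# gunicorn.conf.py
worker_tmp_dir = "/dev/shm"   # keep heartbeat files on tmpfs, off slow/blocking disk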

@tilgovi
Collaborator

tilgovi commented Aug 3, 2018

See also #1388 for Docker related tmpfs issues.

@timoj58

timoj58 commented Aug 7, 2018

I have this issue too.

@cjmacharia

I have this issue too. gunicorn sync was working perfectly well until last night, but then it started reporting worker timeouts [CRITICAL]. Using gevent seems to solve my issue, but I'd really like to know why this happened.

@benoitc
Owner

benoitc commented Aug 9, 2018

@timoj58 @cjmash can you provide more detail about the issue? How are you running gunicorn (in a VM? options ...), on which filesystem, which OS? Anything that could help to reproduce would be very helpful :)

@cjmacharia

cjmacharia commented Aug 10, 2018

@benoitc I am running gunicorn to start my Django project on Kubernetes. My gunicorn arguments are --bind=$port --workers=7 --timeout=1200 --log-level=debug --access-logfile=- --error-logfile=-

The errors I get from the logs (newest entries first):

E  [2018-08-09 21:47:56 +0000] [14] [INFO] Booting worker with pid: 14
E  [2018-08-09 21:47:56 +0000] [12] [INFO] Booting worker with pid: 12
E  [2018-08-09 21:47:56 +0000] [1] [DEBUG] 7 workers
E  [2018-08-09 21:47:56 +0000] [11] [INFO] Booting worker with pid: 11
E  [2018-08-09 21:47:55 +0000] [10] [INFO] Booting worker with pid: 10
E  [2018-08-09 21:47:55 +0000] [9] [INFO] Booting worker with pid: 9
E  [2018-08-09 21:47:55 +0000] [8] [INFO] Booting worker with pid: 8
E  [2018-08-09 21:47:55 +0000] [1] [INFO] Using worker: sync
E  [2018-08-09 21:47:55 +0000] [1] [INFO] Listening at: http://0.0.0.0:4040 (1)
E  [2018-08-09 21:47:55 +0000] [1] [DEBUG] Arbiter booted
E  [2018-08-09 21:47:55 +0000] [1] [INFO] Starting gunicorn 19.7.1
E    raw_paste_global_conf: []
E    ciphers: TLSv1
E    do_handshake_on_connect: False
E    suppress_ragged_eofs: True
E    ca_certs: None
E    cert_reqs: 0
E    ssl_version: 2
E    certfile: None
E    keyfile: None
E    proxy_allow_ips: ['127.0.0.1']
E    proxy_protocol: False
E    on_exit: <function OnExit.on_exit at 0x7f9612aad7b8>
E    nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f9612aad620>
E    worker_exit: <function WorkerExit.worker_exit at 0x7f9612aad488>
E    child_exit: <function ChildExit.child_exit at 0x7f9612aad2f0>
E    post_request: <function PostRequest.post_request at 0x7f9612aad158>
E    pre_request: <function PreRequest.pre_request at 0x7f9612aad048>
E    pre_exec: <function PreExec.pre_exec at 0x7f9612aa8e18>
E    worker_abort: <function WorkerAbort.worker_abort at 0x7f9612aa8c80>
E    worker_int: <function WorkerInt.worker_int at 0x7f9612aa8ae8>
E    post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f9612aa8950>
E    post_fork: <function Postfork.post_fork at 0x7f9612aa87b8>
E    pre_fork: <function Prefork.pre_fork at 0x7f9612aa8620>
E    when_ready: <function WhenReady.when_ready at 0x7f9612aa8488>
E    on_reload: <function OnReload.on_reload at 0x7f9612aa82f0>
E    on_starting: <function OnStarting.on_starting at 0x7f9612aa8158>
E    paste: None
E    pythonpath: None
E    default_proc_name: art.wsgi
E    proc_name: None
E    statsd_prefix: 
E    statsd_host: None
E    enable_stdio_inheritance: False
E    syslog_facility: user
E    syslog_prefix: None
E    syslog: False
E    syslog_addr: udp://localhost:514
E    logconfig: None
E    logger_class: gunicorn.glogging.Logger
E    capture_output: False
E    loglevel: debug
E    errorlog: -
E    access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
E    accesslog: -
E    forwarded_allow_ips: ['127.0.0.1']
E    secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
E    tmp_upload_dir: None
E    initgroups: False
E    umask: 0
E    group: 0
E    user: 0
E    worker_tmp_dir: None
E    pidfile: None
E    raw_env: []
E    daemon: False
E    chdir: /usr/src/app
E    sendfile: None
E    preload_app: False
E    check_config: False
E    spew: False
E    reload_engine: auto
E    reload: False
E    limit_request_field_size: 8190
E    limit_request_fields: 100
E    limit_request_line: 4094
E    keepalive: 2
E    graceful_timeout: 30
E    timeout: 1200
E    max_requests_jitter: 0
E    max_requests: 0
E    worker_connections: 1000
E    threads: 1
E    worker_class: sync
E    workers: 7
E    backlog: 2048
E    bind: ['0.0.0.0:4040']
E    config: None
E  [2018-08-09 21:47:55 +0000] [1] [DEBUG] Current configuration:
I    No migrations to apply.
I  Running migrations:
I    Apply all migrations: admin, auth, contenttypes, core, dashboard, jet, oauth2_provider, sessions
I  Operations to perform:
E  [2018-08-09 21:47:20 +0000] [13] [INFO] Worker exiting (pid: 13)
E    os.path.dirname(os.path.dirname(__file__)), '.env'))
E  /usr/src/app/art/wsgi.py:19: UserWarning: Not reading /usr/src/app/.env - it doesn't exist.
E  [2018-08-09 21:47:00 +0000] [21] [INFO] Booting worker with pid: 21
E  [2018-08-09 21:47:00 +0000] [1] [INFO] Handling signal: term
E  [2018-08-09 21:46:35 +0000] [12] [INFO] Booting worker with pid: 12
E  [2018-08-09 21:46:34 +0000] [13] [INFO] Booting worker with pid: 13
E  [2018-08-09 21:46:34 +0000] [11] [INFO] Booting worker with pid: 11
E  [2018-08-09 21:46:34 +0000] [1] [DEBUG] 7 workers
E  [2018-08-09 21:46:34 +0000] [10] [INFO] Booting worker with pid: 10
E  [2018-08-09 21:46:34 +0000] [9] [INFO] Booting worker with pid: 9
E  [2018-08-09 21:46:34 +0000] [8] [INFO] Booting worker with pid: 8
E  [2018-08-09 21:46:34 +0000] [7] [INFO] Booting worker with pid: 7
E  [2018-08-09 21:46:34 +0000] [1] [INFO] Using worker: sync
E  [2018-08-09 21:46:34 +0000] [1] [INFO] Listening at: http://0.0.0.0:4040 (1)
E  [2018-08-09 21:46:34 +0000] [1] [DEBUG] Arbiter booted
E  [2018-08-09 21:46:34 +0000] [1] [INFO] Starting gunicorn 19.7.1
E    raw_paste_global_conf: []
E    ciphers: TLSv1
E    do_handshake_on_connect: False
E    suppress_ragged_eofs: True
E    ca_certs: None
E    cert_reqs: 0
E    ssl_version: 2
E    certfile: None
E    keyfile: None
E    proxy_allow_ips: ['127.0.0.1']
E    proxy_protocol: False
E    on_exit: <function OnExit.on_exit at 0x7f07f2cfe7b8>
E    nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f07f2cfe620>
E    worker_exit: <function WorkerExit.worker_exit at 0x7f07f2cfe488>
E    child_exit: <function ChildExit.child_exit at 0x7f07f2cfe2f0>
E    post_request: <function PostRequest.post_request at 0x7f07f2cfe158>
E    pre_request: <function PreRequest.pre_request at 0x7f07f2cfe048>
E    pre_exec: <function PreExec.pre_exec at 0x7f07f2cf9e18>
E    worker_abort: <function WorkerAbort.worker_abort at 0x7f07f2cf9c80>
E    worker_int: <function WorkerInt.worker_int at 0x7f07f2cf9ae8>
E    post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f07f2cf9950>
E    post_fork: <function Postfork.post_fork at 0x7f07f2cf97b8>
E    pre_fork: <function Prefork.pre_fork at 0x7f07f2cf9620>
E    when_ready: <function WhenReady.when_ready at 0x7f07f2cf9488>
E    on_reload: <function OnReload.on_reload at 0x7f07f2cf92f0>
E    on_starting: <function OnStarting.on_starting at 0x7f07f2cf9158>
E    paste: None
E    pythonpath: None
E    default_proc_name: art.wsgi
E    proc_name: None
E    statsd_prefix: 
E    statsd_host: None
E    enable_stdio_inheritance: False
E    syslog_facility: user
E    syslog_prefix: None
E    syslog: False
E    syslog_addr: udp://localhost:514
E    logconfig: None
E    logger_class: gunicorn.glogging.Logger
E    capture_output: False
E    loglevel: debug
E    errorlog: -
E    access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
E    accesslog: -
E    forwarded_allow_ips: ['127.0.0.1']
E    secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
E    tmp_upload_dir: None
E    initgroups: False
E    umask: 0
E    group: 0
E    user: 0
E    worker_tmp_dir: None
E    pidfile: None
E    raw_env: []
E    daemon: False
E    chdir: /usr/src/app
E    sendfile: None
E    preload_app: False
E    check_config: False
E    spew: False
E    reload_engine: auto
E    reload: False
E    limit_request_field_size: 8190
E    limit_request_fields: 100
E    limit_request_line: 4094
E    keepalive: 2
E    graceful_timeout: 30
E    timeout: 1200
E    max_requests_jitter: 0
E    max_requests: 0
E    worker_connections: 1000
E    threads: 1
E    worker_class: sync
E    workers: 7
E    backlog: 2048
E    bind: ['0.0.0.0:4040']
E    config: None
E  [2018-08-09 21:46:34 +0000] [1] [DEBUG] Current configuration:

@bigunyak
Author

bigunyak commented Aug 10, 2018

I struggled a bit to reproduce the problem this time but it's still there in the latest gunicorn version 19.9.0.
The steps to reproduce it are exactly the same as I described in this post.
My system is Arch Linux x86_64 GNU/Linux (kernel 4.17.2-1-ARCH), Python 3.6.5
Here is a fresh log trace.

~$ gunicorn -b 0.0.0.0:5000 --log-level=debug hello
[2018-08-10 09:48:40 +0200] [23114] [DEBUG] Current configuration:
  config: None
  bind: ['0.0.0.0:5000']
  backlog: 2048
  workers: 1
  worker_class: sync
  threads: 1
  worker_connections: 1000
  max_requests: 0
  max_requests_jitter: 0
  timeout: 30
  graceful_timeout: 30
  keepalive: 2
  limit_request_line: 4094
  limit_request_fields: 100
  limit_request_field_size: 8190
  reload: False
  reload_engine: auto
  reload_extra_files: []
  spew: False
  check_config: False
  preload_app: False
  sendfile: None
  reuse_port: False
  chdir: /home/dima/lerning/python/modules/gunicorn
  daemon: False
  raw_env: []
  pidfile: None
  worker_tmp_dir: None
  user: 1000
  group: 985
  umask: 0
  initgroups: False
  tmp_upload_dir: None
  secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
  forwarded_allow_ips: ['127.0.0.1']
  accesslog: None
  disable_redirect_access_to_syslog: False
  access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
  errorlog: -
  loglevel: debug
  capture_output: False
  logger_class: gunicorn.glogging.Logger
  logconfig: None
  logconfig_dict: {}
  syslog_addr: udp://localhost:514
  syslog: False
  syslog_prefix: None
  syslog_facility: user
  enable_stdio_inheritance: False
  statsd_host: None
  statsd_prefix: 
  proc_name: None
  default_proc_name: hello
  pythonpath: None
  paste: None
  on_starting: <function OnStarting.on_starting at 0x7f5f8b9dad08>
  on_reload: <function OnReload.on_reload at 0x7f5f8b9dae18>
  when_ready: <function WhenReady.when_ready at 0x7f5f8b9daf28>
  pre_fork: <function Prefork.pre_fork at 0x7f5f8b4ec0d0>
  post_fork: <function Postfork.post_fork at 0x7f5f8b4ec1e0>
  post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f5f8b4ec2f0>
  worker_int: <function WorkerInt.worker_int at 0x7f5f8b4ec400>
  worker_abort: <function WorkerAbort.worker_abort at 0x7f5f8b4ec510>
  pre_exec: <function PreExec.pre_exec at 0x7f5f8b4ec620>
  pre_request: <function PreRequest.pre_request at 0x7f5f8b4ec730>
  post_request: <function PostRequest.post_request at 0x7f5f8b4ec7b8>
  child_exit: <function ChildExit.child_exit at 0x7f5f8b4ec8c8>
  worker_exit: <function WorkerExit.worker_exit at 0x7f5f8b4ec9d8>
  nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f5f8b4ecae8>
  on_exit: <function OnExit.on_exit at 0x7f5f8b4ecbf8>
  proxy_protocol: False
  proxy_allow_ips: ['127.0.0.1']
  keyfile: None
  certfile: None
  ssl_version: 2
  cert_reqs: 0
  ca_certs: None
  suppress_ragged_eofs: True
  do_handshake_on_connect: False
  ciphers: TLSv1
  raw_paste_global_conf: []
[2018-08-10 09:48:40 +0200] [23114] [INFO] Starting gunicorn 19.9.0
[2018-08-10 09:48:40 +0200] [23114] [DEBUG] Arbiter booted
[2018-08-10 09:48:40 +0200] [23114] [INFO] Listening at: http://0.0.0.0:5000 (23114)
[2018-08-10 09:48:40 +0200] [23114] [INFO] Using worker: sync
[2018-08-10 09:48:40 +0200] [23117] [INFO] Booting worker with pid: 23117
[2018-08-10 09:48:40 +0200] [23114] [DEBUG] 1 workers
[2018-08-10 09:48:45 +0200] [23117] [DEBUG] GET /
[2018-08-10 09:48:54 +0200] [23117] [DEBUG] GET /
[2018-08-10 09:49:00 +0200] [23117] [DEBUG] GET /
[2018-08-10 09:49:18 +0200] [23117] [DEBUG] Closing connection. 
[2018-08-10 09:49:18 +0200] [23117] [DEBUG] GET /
[2018-08-10 09:49:23 +0200] [23117] [DEBUG] GET /
[2018-08-10 09:49:37 +0200] [23117] [DEBUG] Closing connection. 
[2018-08-10 09:49:37 +0200] [23117] [DEBUG] GET /
[2018-08-10 09:49:50 +0200] [23117] [DEBUG] Closing connection. 
[2018-08-10 09:51:11 +0200] [23117] [DEBUG] GET /
[2018-08-10 09:51:13 +0200] [23117] [DEBUG] GET /
[2018-08-10 09:51:43 +0200] [23114] [CRITICAL] WORKER TIMEOUT (pid:23117)
[2018-08-10 09:51:43 +0200] [23117] [INFO] Worker exiting (pid: 23117)
[2018-08-10 09:51:44 +0200] [23229] [INFO] Booting worker with pid: 23229

As before, to test it I was just hitting http://0.0.0.0:5000/ in Chromium.
Below is the content of Pipfile and Pipfile.lock for you to see the exact environment.

  • Pipfile
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
flask = "*"
gunicorn = "*"

[dev-packages]

[requires]
python_version = "3.6"
  • Pipfile.lock
{
    "_meta": {
        "hash": {
            "sha256": "81cb5d5f0b11719d8d9c5ec9cc683fdcf959c652fda256d5552a82d0f459a99c"
        },
        "pipfile-spec": 6,
        "requires": {
            "python_version": "3.6"
        },
        "sources": [
            {
                "name": "pypi",
                "url": "https://pypi.org/simple",
                "verify_ssl": true
            }
        ]
    },
    "default": {
        "click": {
            "hashes": [
                "sha256:29f99fc6125fbc931b758dc053b3114e55c77a6e4c6c3a2674a2dc986016381d",
                "sha256:f15516df478d5a56180fbf80e68f206010e6d160fc39fa508b65e035fd75130b"
            ],
            "version": "==6.7"
        },
        "flask": {
            "hashes": [
                "sha256:2271c0070dbcb5275fad4a82e29f23ab92682dc45f9dfbc22c02ba9b9322ce48",
                "sha256:a080b744b7e345ccfcbc77954861cb05b3c63786e93f2b3875e0913d44b43f05"
            ],
            "index": "pypi",
            "version": "==1.0.2"
        },
        "gunicorn": {
            "hashes": [
                "sha256:aa8e0b40b4157b36a5df5e599f45c9c76d6af43845ba3b3b0efe2c70473c2471",
                "sha256:fa2662097c66f920f53f70621c6c58ca4a3c4d3434205e608e121b5b3b71f4f3"
            ],
            "index": "pypi",
            "version": "==19.9.0"
        },
        "itsdangerous": {
            "hashes": [
                "sha256:cbb3fcf8d3e33df861709ecaf89d9e6629cff0a217bc2848f1b41cd30d360519"
            ],
            "version": "==0.24"
        },
        "jinja2": {
            "hashes": [
                "sha256:74c935a1b8bb9a3947c50a54766a969d4846290e1e788ea44c1392163723c3bd",
                "sha256:f84be1bb0040caca4cea721fcbbbbd61f9be9464ca236387158b0feea01914a4"
            ],
            "version": "==2.10"
        },
        "markupsafe": {
            "hashes": [
                "sha256:a6be69091dac236ea9c6bc7d012beab42010fa914c459791d627dad4910eb665"
            ],
            "version": "==1.0"
        },
        "werkzeug": {
            "hashes": [
                "sha256:c3fd7a7d41976d9f44db327260e263132466836cef6f91512889ed60ad26557c",
                "sha256:d5da73735293558eb1651ee2fddc4d0dedcfa06538b8813a2e20011583c9e49b"
            ],
            "version": "==0.14.1"
        }
    },
    "develop": {}
}

@jdeltoft

jdeltoft commented Aug 15, 2018

Just FYI, I'm also seeing this failure very regularly with:

nohup gunicorn -w 8 --access-logfile=- --bind 127.0.0.1:5000 wsgi &

I just have a wsgi.py that has:

from chart_api import application
if __name__ == "__main__":
    application.run()

Let me know if there is anything you want me to try or experiment with, or if there are specifics in the logs you want me to check. I'm running Flask on a GCP VM.

@timoj58

timoj58 commented Aug 15, 2018

Sorry for the late reply. I am running it as:

gunicorn --log-file=/home/ubuntu/log/gunicorn.log predictor_api:app -b localhost:5000 &

I did use the gevent setting etc., but I have changed the design of what I required this for in order to work around the issue, hence the basic setting above (which also failed, but this is to be expected given no gevent).

Python version 3.6
env: aws tensorflow_p36 optimized (ubuntu)
nginx sits in front of gunicorn, which is executing a Flask app.

flask version 1.0.2
nginx version 1.10.3
gunicorn version 19.9.0

I did change the nginx timeouts as well, in case they may have caused it.

@tiru1930

Facing the same problem with the gunicorn server:

gunicorn ApplicationServer:app -b 0.0.0.0:6001 -w 8 --threads 4 --backlog 2048 \
--timeout 120 --graceful-timeout 60 --access-logfile logs/access.log \
--error-logfile logs/error.log --log-level=info

Flask==0.12.1
gunicorn==19.7.1

When I start the server with the above command, my system freezes for some time, worker pids keep booting, and the server does not accept a single request, even though I set the timeout to 120 seconds.

@anilpai

anilpai commented Nov 5, 2018

Any update on this issue? I have the same issue:

[CRITICAL] WORKER TIMEOUT

@javabrett
Collaborator

Wondering whether anyone has successfully reproduced this in a Docker image.

@SuiTissue

Also seeing this when trying to implement datadog's ddtrace-run worker on an existing application starting with gunicorn -k gevent --threads 4.

Funny trace of a SystemExit I've never seen before too...
[2018-11-07 11:11:50 +0000] [15987] [INFO] Booting worker with pid: 15987
[2018-11-07 11:11:50 +0000] [15977] [DEBUG] 1 workers
[2018-11-07 11:12:20 +0000] [15977] [CRITICAL] WORKER TIMEOUT (pid:15987)
Exception SystemExit: 1 in <bound method LibevLoop._loop_will_run of <cassandra.io.libevreactor.LibevLoop object at 0x7f3cb66d4a50>> ignored

@anilpai

anilpai commented Nov 9, 2018

I was able to resolve this issue by matching the number of workers and the number of threads.

I had set workers = (2 * cpu_count) + 1 and did not set threads.

Once I changed threads = workers, everything started working fine. Leaving this here in case it helps someone.

This is how it looks now

# Assumes `app` is the Flask/WSGI application object defined elsewhere in the module.
import multiprocessing

from gunicorn.app.wsgiapp import WSGIApplication

def run(host='0.0.0.0', port=8080, workers=1 + (multiprocessing.cpu_count() * 2)):
    """Run the app with Gunicorn."""

    if app.debug:
        app.run(host, int(port), use_reloader=False)
    else:
        gunicorn = WSGIApplication()
        gunicorn.load_wsgiapp = lambda: app
        gunicorn.cfg.set('bind', '%s:%s' % (host, port))
        gunicorn.cfg.set('workers', workers)
        gunicorn.cfg.set('threads', workers)
        gunicorn.cfg.set('pidfile', None)
        gunicorn.cfg.set('worker_class', 'sync')
        gunicorn.cfg.set('keepalive', 10)
        gunicorn.cfg.set('accesslog', '-')
        gunicorn.cfg.set('errorlog', '-')
        gunicorn.cfg.set('reload', True)
        gunicorn.chdir()
        gunicorn.run()

@clintonb

I encountered this issue running Django with a single Docker container on AWS Elastic Beanstalk. I resolved the issue by fixing my security groups to ensure my EC2 instance could talk to my RDS instance. I recognize this may not be the solution for 99% of folks on this issue, but I'm leaving this note to help others avoid wasting hours falling down this rabbit hole.

@Justice4Joffrey

Justice4Joffrey commented May 1, 2020

I had a similar issue to this. It turns out I had an error in the entrypoint to my application. From debugging, it seemed that I was essentially launching a Flask app from gunicorn, whose workers subsequently enter an infinite connection loop which times out every 30s.

I'm sure that this doesn't affect all users above, but may well affect some.

In my module/wsgi.py file, which I'm running with gunicorn module.wsgi, I had:

application = my_create_app_function()
application.run(host="0.0.0.0")

Whereas I should've had:

application = my_create_app_function()
if __name__ == "__main__":
    application.run(host="0.0.0.0")

Essentially, you don't want to call application.run() when using gunicorn. Under gunicorn, __name__ won't be "__main__", but it will be when the module is run directly with Flask, so you can still debug locally.

I couldn't find a reference to this in the gunicorn docs, but could imagine it being a common error case, so maybe some warning is necessary.

@JurajMa

JurajMa commented Jun 15, 2020

This is still occurring. Adding --preload to the Gunicorn call fixed the issue for me.
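
If --preload helps, it is likely because preload_app imports the application once in the master before forking, so a slow application import no longer eats into each worker's timeout window. A sketch:

# gunicorn.conf.py
preload_app = True   # load the app in the master before forking workers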

@leonbrag

Is this bug still not fixed? I am observing this exact behavior.

Gunicorn starts like this in systemd:

[Service]
PIDFile = /run/gunicorn.pid
WorkingDirectory = /home/pi/pyTest
ExecStart=/usr/local/bin/gunicorn  app:app  -b 0.0.0.0:80 --pid /run/gunicorn.pid
RuntimeDirectory=/home/pi/pyTest
Restart=always
KillSignal=SIGQUIT
Type=notify
StandardError=syslog
NotifyAccess=all
User=root
Group=root
ExecReload = /bin/kill -s HUP $MAINPID
ExecStop = /bin/kill -s TERM $MAINPID
ExecStopPost = /bin/rm -rf /run/gunicorn
PrivateTmp = true

Worker process constantly times out and restarts:

Jul 10 15:19:20 raspberryVM gunicorn[10941]: [2020-07-10 15:19:20 -0700] [10941] [CRITICAL] WORKER TIMEOUT (pid:10944)
Jul 10 15:19:20 raspberryVM gunicorn[10941]: [2020-07-10 15:19:20 -0700] [10944] [INFO] Worker exiting (pid: 10944)
Jul 10 15:20:15 raspberryVM gunicorn[10941]: [2020-07-10 15:20:15 -0700] [10985] [INFO] Booting worker with pid: 10985

app.py is a trivial Flask app.

Is this issue closed as Won't Fix?

@midhuntp

midhuntp commented Aug 6, 2020

I was also having the same issue.

After debugging, I was able to find that while gunicorn starts the Django app, one of the dependencies (in my case, an external DB connection) was taking longer than expected, which made the gunicorn worker time out.

When I resolved the connection issue, the timeout issue was also resolved.

@asnisarenko

@leonbrag
This is likely NOT a gunicorn bug. See my comment above in the thread. It's a side effect of browsers sending empty "predicted" TCP connections, and of running gunicorn with only a few sync workers without protection from empty TCP connections.

@asnisarenko

@leonbrag
You should be more specific about what problem you are trying to solve.

The problem discussed in this thread happens in a dev environment, and the easiest solution is either to add more sync workers or to use threaded workers.

If you want to avoid this issue in a production setup, you can use gevent workers, or you can put nginx in front of gunicorn.
Some PaaS offerings already put nginx in front of your Docker container, so you don't have to worry about it. Again, the solution depends on the context and the details.

This is a good read:
https://www.brianstorti.com/the-role-of-a-reverse-proxy-to-protect-your-application-against-slow-clients/
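
A sketch of the worker-side mitigations mentioned above (values are illustrative; gevent requires the gevent package to be installed):

# gunicorn.conf.py
workers = 4
threads = 2                 # threads > 1 switches to the threaded (gthread) worker
# worker_class = "gevent"   # alternatively, async workers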

@AlejandroRodriguezP

web: gunicorn --workers=3 app:app --timeout 200 --log-file -

I fixed my problem by increasing the --timeout.

@ivictbor

ivictbor commented Nov 22, 2020

See also #1388 for Docker related tmpfs issues.

Oh, thanks a lot Randall, I forgot to add --worker-tmp-dir /dev/shm to the gunicorn arguments when I was running gunicorn in Docker.

BTW, will 64 MB be enough for the gunicorn cache?

@attajutt

attajutt commented Nov 27, 2020

gunicorn app:app --timeout 1000
Or
gunicorn app:app --preload

Worked for me... I prefer the timeout one.

@ivictbor

ivictbor commented Nov 27, 2020

Strange, I added --worker-tmp-dir /dev/shm but am still receiving:

[2020-11-27 21:01:42 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:17)

To make sure /dev/shm is a ramfs, I benchmarked it:

[benchmark screenshot omitted]

The params are as follows:

    command: /bin/bash -c "cd /code/ && pipenv run gunicorn --worker-tmp-dir /dev/shm conf.wsgi:application --bind 0.0.0.0:8022 --workers 5 --worker-connections=1000"

PS: I am using PyPy

@attajutt timeout is nice, but you risk the gunicorn master process detecting a hang in your worker process only after 1000 seconds, and you will miss a lot of requests. It will also be hard to detect if only one of several workers hangs. I would not use 1000, at least.

@attajutt

@ivictbor thanks for letting me know. 1000 is just for reference. Nevertheless, I got the app rolling; once it's loaded, it runs perfectly fine.

@Subrata15

I got this error too, and after several attempts I found that the problem is probably caused by:

  1. Nginx configuration
  2. Gunicorn/Uwsgi

If you deploy your app in a cloud like GAE, it will not surface any hint of the error.
You can try to surface the error using the solution in this case: https://stackoverflow.com/questions/38012797/google-app-engine-502-bad-gateway-with-nodejs

If a 502 Bad Gateway is raised, there are probably two possibilities:

  1. gunicorn isn't running
  2. gunicorn got timeout

The complete solution is explained here: https://www.datadoghq.com/blog/nginx-502-bad-gateway-errors-gunicorn/

Hope that helps anyone getting a [CRITICAL] WORKER TIMEOUT error.

@coltonbh

Adding another possibility for those who find this thread...

This can also be caused by Docker-imposed resource constraints that are too low for your web application. For example, I had the following constraints:

services:
  web_app:
    image: blah-blah
    deploy:
      resources:
        limits:
          cpus: "0.25"
          memory: 128M

and these were evidently too low for gunicorn, so I constantly got the [CRITICAL] WORKER TIMEOUT error until I removed the constraints.

@dplutcho

--timeout=1000 worked for me. The issue was a low-CPU GCP machine; it worked fine on my local machine with the default timeout.

@ghost

ghost commented Aug 11, 2021

gunicorn app:app --timeout 1000

You're great. It was the solution for me. Thanks very much.

@ashwath007

gunicorn app:app --timeout 3000 worked for me ✌️
