Sockets #1913

Closed
andjedani opened this issue Nov 5, 2018 · 87 comments · Fixed by #2277
Labels
help wanted Open for everyone. You do not need permission to work on these. May need familiarity with codebase. Investigation

Comments

@andjedani

Service is running on a kubernetes pod, and out of nowhere and without any specific causes, it happens off and on:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
[2018-11-04 17:57:55 +0330] [31] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
@javabrett
Collaborator

Is there anything interesting upstream of gunicorn in your pod, like a reverse proxy or nginx?

@andjedani
Author

No, no proxies, no nginx

@gforcada

gforcada commented Nov 7, 2018

I'm having the exact same problem 😕

Upstream we have HAProxy. In its HTTP log format, the session state at disconnection (see http://cbonte.github.io/haproxy-dconv/1.8/configuration.html#8.5) is logged as CH-- for these errors, meaning:

  • C : the TCP session was unexpectedly aborted by the client.
  • H : the proxy was waiting for complete, valid response HEADERS from the server (HTTP only).

So, if I understand that correctly, the client closed the connection while gunicorn was still sending the response.
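
The two flags quoted above can be read mechanically from the log field. This is a minimal sketch, assuming the state string arrives as a four-character value such as CH--. It covers only the two codes mentioned in this comment, and `decode_session_state` and its lookup tables are hypothetical names, not part of HAProxy or gunicorn.

```python
# Hypothetical helper: decodes only the two codes quoted in this thread.
# The complete tables are in the HAProxy documentation linked above.
TERMINATION_CAUSE = {
    "C": "the TCP session was unexpectedly aborted by the client",
}
SESSION_STATE = {
    "H": "the proxy was waiting for complete, valid response HEADERS from the server",
}


def decode_session_state(flags):
    """Return (cause, state) descriptions for a state string such as 'CH--'."""
    if len(flags) < 2:
        raise ValueError("expected at least two characters, got %r" % (flags,))
    # First character: what ended the session. Second: where the session was.
    cause = TERMINATION_CAUSE.get(flags[0], "code %r not covered here" % flags[0])
    state = SESSION_STATE.get(flags[1], "code %r not covered here" % flags[1])
    return cause, state


if __name__ == "__main__":
    for description in decode_session_state("CH--"):
        print(description)
```

Any code outside these two falls through to a "not covered here" message rather than a guess.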

@javabrett
Collaborator

Any clues what makes the client abort? Was it waiting a long time for gunicorn to send complete response headers?

@gforcada

gforcada commented Nov 7, 2018

@javabrett it does not seem like that. In the few log messages I looked up, the requests are mostly for images or other assets, so they should not take much time.

The client might have closed the browser, or taken some other action that abruptly closes the connection? 🤔

@benoitc
Owner

benoitc commented Nov 7, 2018

@gforcada are you using the proxy protocol with haproxy?

@gforcada

gforcada commented Nov 7, 2018

@benoitc not that I'm aware of

@benlucchesi

anyone get anywhere with this? I don't have much to contribute except the exact same error.

My configuration consists of a load balancer that terminates SSL and forwards requests to a Django app running in a Docker container. I'm not sure what the LB is implemented with; it's a Digital Ocean product.

I'm fairly certain it's related to the load balancer, because I have the same app running in another container that isn't behind an LB and it's never had this problem.

Any ideas on the root cause and how to prevent?

@tilgovi
Collaborator

tilgovi commented Jan 22, 2019

I wonder if there's any action here. If this is a regular client disconnect, we could maybe silence the error and maybe log a disconnect in the access log, but otherwise I'm not sure what to do.

@tamasgal

I just had the same error which crashed our monitoring webserver:

[2019-06-10 11:38:25 +0200] [27989] [CRITICAL] WORKER TIMEOUT (pid:17906)
[2019-06-10 11:38:25 +0200] [17906] [INFO] Worker exiting (pid: 17906)
[2019-06-10 11:38:25 +0200] [17924] [INFO] Booting worker with pid: 17924
[2019-06-10 11:38:37 +0200] [17922] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/home/off1user/.pyenv/versions/3.6.1/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
[2019-06-10 11:38:47 +0200] [27989] [CRITICAL] WORKER TIMEOUT (pid:17920)
[2019-06-10 11:38:47 +0200] [17920] [INFO] Worker exiting (pid: 17920)

@helltone

I had the same with pod running docker image dpage/pgadmin4:4.2

OSError: [Errno 107] Socket not connected
[2019-06-14 12:20:32 +0000] [77] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/gthread.py", line 274, in handle
    req = six.next(conn.parser)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()

@jacek-jablonski

Looks very similar to: #2070

@bboe

bboe commented Jul 15, 2019

I'm getting this error occasionally on hosted Google Cloud Run. Below is a simplified version of our container definition:

FROM ubuntu:18.04

ENV APP_HOME /app
WORKDIR $APP_HOME

RUN apt-get update \
  && apt-get install --no-install-recommends -y python3 python3-pip \
  && rm -rf /var/lib/apt/lists/*

RUN pip3 install --compile --no-cache-dir --upgrade pip setuptools

RUN mkdir invoice_processing && \
    pip install --compile --disable-pip-version-check --no-cache-dir flask gunicorn

COPY app.py ./
CMD exec gunicorn --bind :$PORT --workers 1 --threads 1 app:app

Stackdriver shows the following stacktrace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/base.py", line 134, in init_process
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/sync.py", line 124, in run
    self.run_for_one(timeout)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/sync.py", line 68, in run_for_one
    self.accept(listener)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/sync.py", line 27, in accept
    client, addr = listener.accept()
  File "/usr/lib/python3.6/socket.py", line 205, in accept
    fd, addr = self._accept()
OSError: [Errno 107] Transport endpoint is not connected

@GAEfan

GAEfan commented Jul 30, 2019

Same issue as OP here. Using Google Cloud Platform, Python 3.7, gunicorn 19.9.0

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/env/lib/python3.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
 timestamp:  "2019-07-30T15:23:55.435130Z"

@hcaihao

hcaihao commented Aug 15, 2019

I'm having the exact same problem 😕

@James-D-Wood

Exact same problem as GAEfan. Running a Flask app with Python 3.7 in App Engine Standard Env.

@skipikash

Same issue here

@lfelipedeoliveira

I'm having the same issue running Django app with Python 3.7 in Google App Engine.

Traceback (most recent call last):
  File "/env/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
    req = six.next(parser)
  File "/env/lib/python3.7/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 181, in __init__
    super(Request, self).__init__(cfg, unreader)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 230, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/env/lib/python3.7/site-packages/gunicorn/http/message.py", line 74, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

@NixBiks

NixBiks commented Sep 11, 2019

Same issue running GAE python 3.7 gunicorn and fastapi/uvicorn.

@socialtelligence

Same issue Google Cloud Run

@benoitc
Owner

benoitc commented Oct 4, 2019

which kind of request are we talking about?

@tc64

tc64 commented Oct 8, 2019

same issue in google app engine. POST request. happens inconsistently. Flask app. @benoitc please let me know what info would be useful and i can post.

@ValentinMoullet

Same issue as well, Google App Engine, POST request too, Flask app. It seemed to have started when I changed to a custom entrypoint code instead of letting the default one. Custom entrypoint is the following (in Google App Engine you set it inside an app.yaml file):

gunicorn -b :$PORT --timeout 1200 server.main:app

Default entrypoint is not setting anything (don't know what is used as default entrypoint though).

Not sure if it started because of that, but I noticed this when I made this change (among other changes).

@seenureddy

seenureddy commented Oct 16, 2019

No, no proxies, no nginx

I was using gunicorn without nginx. I was getting the same issue. My setup is running on Openshift.
gunicorn --chdir /src/app wsgi:application --bind 0.0.0.0:8000 --workers 4 --timeout 180 -k gevent

https://stackoverflow.com/questions/58389201/gunicorn-is-failing-with-oserror-errno-107-transport-endpoint-is-not-connecte

@benoitc
Owner

benoitc commented Oct 16, 2019

the question stands

Same issue as well, Google App Engine, POST request too, Flask app. It seemed to have started when I changed to a custom entrypoint code instead of letting the default one. Custom entrypoint is the following (in Google App Engine you set it inside an app.yaml file):

gunicorn -b :$PORT --timeout 1200 server.main:app

Default entrypoint is not setting anything (don't know what is used as default entrypoint though).

Not sure if it started because of that, but I noticed this when I made this change (among other changes).

what do you mean by entry point? can you post a debug log and the way the request is done? (raw http would help)

@cmin764

cmin764 commented Oct 19, 2019

the question stand

Same issue as well, Google App Engine, POST request too, Flask app. It seemed to have started when I changed to a custom entrypoint code instead of letting the default one. Custom entrypoint is the following (in Google App Engine you set it inside an app.yaml file):
gunicorn -b :$PORT --timeout 1200 server.main:app
Default entrypoint is not setting anything (don't know what is used as default entrypoint though).
Not sure if it started because of that, but I noticed this when I made this change (among other changes).

what do you mean by entry point? can you post a debug log and the way the request is done? (raw http would help)

I think he's referring to the fact that you're explicitly specifying the app path the gunicorn should import-find and run, like the server.main:app in his example.

L.E.: Maybe the updated example over here helps: https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/standard_python37/hello_world (so basically you have to let the service handle how the server should be started)

@thinkjrs

Firstly, @benoitc THANK YOU. Your work is awesome.

I'm also experiencing this same issue on Google Cloud Run w/gunicorn. I'm posting what I have, though it's likely not unique, perusing the above. I'm running a Flask app with Gunicorn as the server (and no proxy) in a Docker container.

The traceback (from GC console):

  File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 104, in init_process
    super(ThreadWorker, self).init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.run()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 211, in run
    callback(key.fileobj)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 127, in accept
    sock, client = listener.accept()
  File "/usr/local/lib/python3.7/socket.py", line 212, in accept
    fd, addr = self._accept()
OSError: [Errno 107] Transport endpoint is not connected

And Google's parsed output of the above:

OSError: [Errno 107] Transport endpoint is not connected
at accept (/usr/local/lib/python3.7/socket.py:212)
at accept (/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py:127)
at run (/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py:211)
at init_process (/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py:134)
at init_process (/usr/local/lib/python3.7/site-packages/gunicorn/workers/gthread.py:104)
at spawn_worker (/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py:583)

If there is anything else I can provide or do to help here, please let me know.

@tilgovi
Collaborator

tilgovi commented Oct 27, 2019

A PR would be welcome to handle ENOTCONN gracefully for all the workers. Please post here if you start working on this and I would be happy to review a PR. I'm sure some on this thread would be happy to help test a branch.
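
For the accept() tracebacks posted earlier in the thread, graceful handling could look roughly like the sketch below. It is not the code from any gunicorn PR: `accept_or_skip` and `TRANSIENT_ACCEPT_ERRORS` are hypothetical names, and the set of errno values treated as harmless is an assumption that a real worker would have to review.

```python
import errno
import socket

# Errors treated as "this connection went away, try the next one".
# ENOTCONN is the case discussed in this issue; the other entries are an
# assumption about what an accept loop would usually tolerate.
TRANSIENT_ACCEPT_ERRORS = (
    errno.EAGAIN,
    errno.EWOULDBLOCK,
    errno.ECONNABORTED,
    errno.ENOTCONN,
)


def accept_or_skip(listener):
    """Accept one connection, or return None if there is nothing usable."""
    try:
        return listener.accept()
    except OSError as exc:
        if exc.errno in TRANSIENT_ACCEPT_ERRORS:
            return None
        # Anything else is unexpected and should still surface.
        raise


if __name__ == "__main__":
    # A non-blocking listener with no pending connection raises EAGAIN,
    # which exercises the "skip" path without needing a client.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.setblocking(False)
        print(accept_or_skip(listener))
```

The point of returning None instead of raising is that the worker loop can continue to the next connection rather than exit.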

@WhyNotHugo

In my case, I've discovered that any HEAD request emits this.

I'm using Django behind gunicorn, and I suspect that the application wants to write a response body (it shouldn't), but I haven't confirmed that yet.

@SaschaHeyer

same behavior

@tilgovi
Collaborator

tilgovi commented Jul 19, 2020

I think this might be fixed by #2277

kbussell pushed a commit to RecycledMedia/gunicorn that referenced this issue Jul 30, 2020
A couple of socket operations can fail with ENOTCONN error if the
other side of the connection is not connected anymore. In that case,
let's not crash the whole worker and give a chance to accept new
connections.

In my case, the operation that sometimes fails is a "getpeername()",
which was introduced in b07532b
(v19.8.0). Someone in benoitc#1913
mentioned that v19.7.1 was working fine so it matches.

Fixes benoitc#1913
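
The getpeername() failure described in this commit message can be illustrated in isolation. The following is only a sketch of the idea, not the code from #2277: `peer_address` is a hypothetical helper that returns None when the socket reports ENOTCONN and re-raises anything else.

```python
import errno
import socket


def peer_address(sock):
    """Return sock.getpeername(), or None if the socket is not connected."""
    try:
        return sock.getpeername()
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            # The other side is gone (or was never there); the caller
            # decides what to log instead of crashing.
            return None
        raise


if __name__ == "__main__":
    # A TCP socket that was never connected is a simple way to trigger
    # ENOTCONN on getpeername() without involving a real client.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as unconnected:
        print(peer_address(unconnected))
```

A caller would then need a fallback for the None case, for example skipping whatever check needed the remote address.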
kbussell pushed a commit to RecycledMedia/gunicorn that referenced this issue Jul 30, 2020
benoitc#2277 was branched off of
master. I cherry-picked the PR's commit on top of the 20.0.4 tag of the
main repo (and updated this commit message) for a custom build

Do not raise and crash worker on ENOTCONN error

FinnStutzenstein pushed a commit to FinnStutzenstein/gunicorn that referenced this issue Sep 1, 2020
@eavive

eavive commented Sep 17, 2020

In my case, Ansible's wait_for module is the cause.

I use Ansible to deploy a gunicorn + flask server (specifically Python 3.6.12, gunicorn 19.9.0, Flask 1.4.1).

After starting the service, I use the wait_for module to make sure the service is up and running.
This module probably breaks the connection immediately after it validates that the service is up (without waiting for gunicorn to respond), and thus gunicorn raises this error.

I guess other monitoring systems do the same.
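
To try this against a test server, a connect-and-drop probe can be sketched as follows. It is an assumption about how such port checks behave, not Ansible's implementation, and `probe_port` and its `abort` flag are hypothetical. Whether the server then sees ENOTCONN depends on timing and platform, so the sketch only reports whether the TCP connection succeeded.

```python
import socket
import struct


def probe_port(host, port, timeout=1.0, abort=False):
    """Connect to host:port and close at once, without sending a request.

    Returns True if the TCP connection was established, False otherwise.
    With abort=True the socket is closed with SO_LINGER set to zero, which
    asks the OS to reset the connection instead of closing it gracefully.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False
    try:
        if abort:
            # l_onoff=1, l_linger=0: close() discards the connection at once.
            sock.setsockopt(
                socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
            )
    finally:
        sock.close()
    return True
```

Running `probe_port("127.0.0.1", 8000, abort=True)` in a loop against a local gunicorn (the port is a placeholder) is one way to check whether a given version still logs the error.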

@aswzen

aswzen commented Sep 30, 2020

I got the same error.
We currently have heavy traffic (100-1000 TPS), and some requests fail randomly.

Python 3.8
Flask
Gunicorn

With the Dockerfile below:

FROM python:3-slim

RUN apt-get update && apt-get -y install gcc

ENV PYTHONUNBUFFERED True

COPY . /app
 
WORKDIR /app/src

RUN pip install Flask requests gunicorn
RUN pip install -U flask-cors
RUN pip install requests
RUN pip install DateTime
RUN pip install timedelta

RUN chmod 444 app.py

CMD exec gunicorn -b :443 --workers 5 --threads 8 --timeout 10 app:app --reload

Any solution?

@kamyar

kamyar commented Dec 14, 2020

Are there any updates on this?
It seems there are multiple PRs to fix it. Do we have a timeline for releasing them?

@yehjames

yehjames commented Dec 28, 2020

Hi @tilgovi
Do we have a timeline for releasing this new version? It seems the Gunicorn package has not been updated for a long time...

@benoitc
Owner

benoitc commented Dec 29, 2020 via email

@satels

satels commented Dec 30, 2020

?

@benoitc
Owner

benoitc commented Dec 30, 2020 via email

@yehjames

yehjames commented Jan 4, 2021

Thanks. I am wondering whether there is any update on the pip package?

@benoitc
Owner

benoitc commented Jan 4, 2021

@yehjames is master working for you? A release is planned for today. Any feedback on how master works on different platforms is welcome.

@arun-rangarajan

@benoitc Any update on this? Using 20.0.4 in production and implemented the change suggested by @asantoni (as a monkey-patch) to avoid frequent crashes. But Veracode static code scan doesn't like the patch, so trying to fix it now. Thank you!

@tilgovi
Collaborator

tilgovi commented Jan 18, 2021

We'll work to get a release out as soon as we can. We cannot promise a day, but we're working to figure out what remains for this release and to improve the release management for the future.

@tilgovi
Collaborator

tilgovi commented Jan 18, 2021

Please use GitHub's "Watch" feature for the repository and watch for releases if you want to be notified.

@RicHincapie

Hi. I am having the same issue with HAProxy + Gunicorn + Django.

My HAProxy backend loses almost all its servers because health checks go unanswered, and the Gunicorn logs are flooded with:

[2021-07-23 18:16:27 -0500] [13] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/gunicorn/workers/sync.py", line 133, in handle
    req = next(parser)
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/parser.py", line 41, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/message.py", line 186, in __init__
    super().__init__(cfg, unreader)
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/message.py", line 53, in __init__
    unused = self.parse(self.unreader)
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/message.py", line 235, in parse
    self.headers = self.parse_headers(data[:idx])
  File "/usr/local/lib/python3.9/site-packages/gunicorn/http/message.py", line 73, in parse_headers
    remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected

I am working with gunicorn==20.0.4, Django==3.1.5, HA-Proxy version 2.2.11-1ppa1~bionic

Any clue on how to proceed?

This is in TCP mode, without SSL, under Locust stress testing.

@krishnamanchikalapudi

Someone pls share the solution on this issue

@JordanP
Contributor

JordanP commented Aug 6, 2021

@krishnamanchikalapudi @ricarhincapie please upgrade to the latest release of Gunicorn :)
