
port mapping disappear after 'intensive' use #8817

Open
superbob opened this issue Oct 28, 2014 · 16 comments

Labels
area/networking exp/expert kind/bug version/1.3

Comments

@superbob

I have a rabbitmq-server docker container created from a self-made image (Dockerfile) that exposes 3 ports: 4369, 5672, 15672.
I want to use it from a client application that runs a lot of tests (opening and closing many connections in a short time span).
When I use it through my client application, it works correctly at first, but after a short time (20-30 s) my client application starts receiving only "Connection Refused" errors and one of the port mappings disappears.
The initial docker run command I used was:

docker run -i -p 5672:5672 -p 15672:15672 -p 4369:4369 --name="rabbitmq-server" -t rabbitmq-server

I created the container some months ago and it is not running continuously,
so I start it every day with a docker start rabbitmq-server command.
After starting it, I see the ports mapped on my host interface with netstat:

$ sudo netstat -nlp | grep docker-proxy                                                                                                                                         [10:55:52]
tcp        0      0 :::15672                :::*                    LISTEN      26875/docker-proxy  
tcp        0      0 :::5672                 :::*                    LISTEN      26892/docker-proxy  
tcp        0      0 :::4369                 :::*                    LISTEN      26884/docker-proxy  

(I filtered the output to show only the interesting ports.)
After the "Connection Refused" errors start, the 5672 port mapping is missing:

$ sudo netstat -nlp | grep docker-proxy                                                                                                                                         [10:59:02]
tcp        0      0 :::15672                :::*                    LISTEN      26875/docker-proxy  
tcp        0      0 :::4369                 :::*                    LISTEN      26884/docker-proxy  

(I filtered the output to show only the interesting ports.)
Despite this, the container works fine: the other ports work correctly and the web interface (port 15672) is reachable.
Connecting directly to the container IP also works:

$ telnet 172.17.0.5 5672                                                                                                                                                        [11:05:21]
Trying 172.17.0.5...
Connected to 172.17.0.5.
Escape character is '^]'.

It is only the port mapping that disappears.

Here's a capture showing the throughput from RabbitMQ:
[screenshot: RabbitMQ message throughput graph]
It started consuming at 10:58:35 and there was a breakdown at 10:58:55, which corresponds to the moment I started receiving "Connection Refused" errors. At its best it was consuming more than 100 msg/s.

It might be related to: #8022 and/or #8428

Host info:

$ docker info                                                                                                                                                                   [11:10:46]
Containers: 4
Images: 74
Storage Driver: devicemapper
 Pool Name: docker-8:2-10226587-pool
 Pool Blocksize: 65.54 kB
 Data file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 8.267 GB
 Data Space Total: 107.4 GB
 Metadata Space Used: 7.504 MB
 Metadata Space Total: 2.147 GB
 Library Version: 1.03.01 (2011-10-15)
Execution Driver: native-0.2
Kernel Version: 3.11.10-21-default
Operating System: openSUSE 13.1 (Bottle) (x86_64) (containerized)

$ docker version                                                                                                                                                                [11:07:31]
Client version: 1.3.0
Client API version: 1.15
Go version (client): go1.3.1
Git commit (client): c78088f
OS/Arch (client): linux/amd64
Server version: 1.3.0
Server API version: 1.15
Go version (server): go1.3.1
Git commit (server): c78088f
@LK4D4
Contributor

LK4D4 commented Oct 28, 2014

Seems like docker-proxy is just dying. We probably fixed this in master.

@superbob
Author

superbob commented Nov 4, 2014

Same issue in 1.3.1

$ docker version
Client version: 1.3.1
Client API version: 1.15
Go version (client): go1.3.3
Git commit (client): 4e9bbfa
OS/Arch (client): linux/amd64
Server version: 1.3.1
Server API version: 1.15
Go version (server): go1.3.3
Git commit (server): 4e9bbfa

@superbob
Author

superbob commented Dec 5, 2014

I changed the way my client application uses the rabbitmq-server docker container so that it is less "intensive".

Now I can no longer reproduce the problem, but I can't tell whether the issue is still there.

I don't know if this issue should be closed.

@gdm85
Contributor

gdm85 commented Jan 2, 2015

I have an issue that can be modeled around this problem too; I'll try to write a test for the situation you described.

@superbob
Author

superbob commented Jan 5, 2015

Thank you @gdm85

@jessfraz
Contributor

Can you check if it is fixed for 1.5?

@leverly

leverly commented Feb 28, 2015

It is still not fixed for 1.5.

@thaJeztah
Member

@cpuguy83 probably needs label "bug" too?

@spf13 added kind/bug /system/networking exp/expert and removed /system/networking exp/expert labels Mar 21, 2015
@stpkys

stpkys commented Oct 6, 2015

Still a problem: docker-proxy dies silently under a high number of concurrent connections.

Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 786b29d/1.7.1
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d/1.7.1
OS/Arch (server): linux/amd64

Increasing the open files limit helped (ulimit -n 65535), but it would be great if docker-proxy logged this somehow.
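
For reference, one way to make that limit stick on a systemd host is a drop-in for the daemon unit (just a sketch, assuming the service is named docker.service; adjust the value to taste):

# Raise the open-file limit for the Docker daemon and everything it spawns,
# which should include the docker-proxy processes.
sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/limits.conf
[Service]
LimitNOFILE=65535
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker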

@BlackGlory

It can still be reproduced on 19.03.11 and 19.03.12.

docker run -d -p 8080:80 --name my-nginx nginx:1.18

netstat -ntulp | grep 8080
# tcp6    0     0    :::8080     :::*     LISTEN  6237/docker-proxy

# at the beginning of this benchmark, my-nginx can still be accessed through localhost:8080
wrk -t12 -c1000 -d30s http://localhost:8080

netstat -ntulp | grep 8080
# empty

wrk -t12 -c1000 -d30s http://localhost:8080
# unable to connect to localhost:8080 Connection refused

# get my-nginx's ip
docker inspect my-nginx
# my-nginx can be accessed through container_ip:80
wrk -t12 -c1000 -d30s http://172.17.0.2:80

The Stack Overflow question:
https://stackoverflow.com/questions/64014595/the-docker-container-loses-port-forwarding-after-running-benchmarks

@thaJeztah
Member

thaJeztah commented Sep 23, 2020

Thanks for that additional information, @BlackGlory. From that output, it seems like the docker-proxy process for the container is gone.

I wonder if the system was under memory pressure and, because of that, the kernel's OOM killer kicked in and killed the docker-proxy process for that container.

Looking at that process;

docker run -d --name foo -p 8070:80 nginx:alpine

# (I only have a single container running on this test machine)
pidof docker-proxy
4607

cat /proc/4607/oom_adj
-8

I see that docker-proxy has a default oom-score-adj of -8. Although that is slightly adjusted to make it less likely to be killed (the default is 0), and it is a lower score than the container itself (which doesn't adjust its score by default), it's still possible that, if the system is under memory pressure, the kernel OOM killer killed the proxy.
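
For comparison, a quick way to look at the container's own score (a rough check, assuming the container from above is still running as foo):

# PID 1 of the container, as seen from the host; oom_score_adj should print 0
# unless --oom-score-adj was passed at run time.
cat /proc/$(docker inspect --format '{{.State.Pid}}' foo)/oom_score_adj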

I haven't checked yet where the OOM-score-adj for docker-proxy is set, but I did check whether the -8 score is adjusted if the container itself is configured to have a lower oom-score-adj;

# remove the container
docker rm -f foo

# create a new container with a negative OOM-score-adjust
docker run -d --name foo -p 8070:80 --oom-score-adj=-200 nginx:alpine


pidof docker-proxy
5122

cat /proc/5122/oom_adj
-8

So from that, this doesn't appear to be the case. Perhaps we should adjust the OOM-score-adj to be relative to the container's score, so that (more likely) either both are killed (container including the proxy) or both are kept up; otherwise the container keeps running in a somewhat defunct state (ports not accessible).
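
As a stopgap, one could also push the running proxies further away from the OOM killer by hand (a rough sketch; the -500 value is arbitrary and it does not persist for proxies started later):

# Lower the OOM score of every running docker-proxy; valid range is -1000..1000.
for pid in $(pidof docker-proxy); do
    echo -500 | sudo tee /proc/$pid/oom_score_adj
done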

Note that I think the docker-proxy is only needed to facilitate hairpin connections from the host itself. If you're on a modern distro, you may be able to run without it and configure the docker daemon not to spin up docker-proxy processes for each container.
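
For example (a sketch, not a recommendation: it changes how published ports behave for hairpin/localhost traffic, and it assumes a systemd-managed daemon and no existing /etc/docker/daemon.json to merge with):

# Disable the userland proxy so ports are published via iptables/NAT only.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "userland-proxy": false
}
EOF
sudo systemctl restart docker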

@thaJeztah
Member

Also see #14856 w.r.t. the docker-proxy, and #5618, which is a kernel bug (but should be fixed in recent kernels) that prevented us from disabling it by default.

@BlackGlory if you're consistently able to reproduce the issue on your test-system, would you be able to check if the process was killed by the kernel's OOM killer? You should be able to find log-entries for this in your system log; https://stackoverflow.com/a/15953500/1811501

dmesg | egrep -i 'killed process'

or

grep -i 'killed process' /var/log/messages

@BlackGlory

BlackGlory commented Sep 23, 2020

@thaJeztah It doesn't seem to be about the kernel's OOM killer.

dmesg | egrep -i 'killed process'
# empty
grep -i 'killed process' /var/log/syslog
# empty

@cjdcordeiro

cjdcordeiro commented Jan 8, 2021

+1 - also having this issue on an RPi 3B+, with Docker 18.09.1.

Also, FYI, in case someone wants to work around this, and given that the docker-proxy process is restarted alongside a container restart:

  1. define a restart policy for the container (e.g. on-failure)
  2. install and add tini as the entrypoint of the container (e.g. ENTRYPOINT ["/sbin/tini", "--"])
  3. set a healthcheck. Assuming you're running some API server on host port XYZ, your HEALTHCHECK command would look something like curl -f http://$(route -n | grep 'UG[ \t]' | awk '{print $2}'):XYZ 2>&1 || (kill $(pgrep tini) && exit 1). The $(route ...) part infers the gateway IP, so the curl request goes directly to the host, where the container's service should be published on port XYZ. If docker-proxy is down, the curl gets "Connection refused", the healthcheck command kills the container's main process, and the container restarts (because of 1.), which spawns a new docker-proxy process. A sketch of this setup as docker run flags follows the list.
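
Here is that workaround condensed into docker run flags (just a sketch: my-api-image, the service on port 8080, and the presence of tini, curl and pgrep inside the image are all assumptions; 172.17.0.1, the default bridge gateway, stands in for the route/awk lookup above):

# Restart policy + tini entrypoint + healthcheck that probes the published
# host port through the gateway; if docker-proxy is gone, curl fails and the
# healthcheck kills PID 1 (tini), so the container exits and gets restarted.
docker run -d --name my-api \
  --restart on-failure \
  -p 8080:8080 \
  --entrypoint /sbin/tini \
  --health-cmd 'curl -f http://172.17.0.1:8080 || (kill $(pgrep tini) && exit 1)' \
  --health-interval 30s \
  --health-retries 1 \
  my-api-image -- my-api-server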

@vasi26ro

vasi26ro commented Sep 1, 2022

+1 - also having the same issue with Docker version 20.10.14, build a224086349, installed via snap.
After switching to Docker installed with apt, this behavior disappeared.
