Bug: bind address/port already in use #457

Open
spinanicky opened this issue Sep 21, 2023 · 14 comments
@spinanicky

spinanicky commented Sep 21, 2023

We are running ShinyProxy 3.0.2 and have tried several approaches to fixing this issue (including restarting both Docker and ShinyProxy before a large group signs on).

Long story short, we keep getting errors saying the port is already in use. It is hit and miss: sometimes we have 60+ users connected without problems, and sometimes we only get to 10 before it starts throwing this error.

{"message":"driver failed programming external connectivity on endpoint gracious_albattani (e3e2e1d06079f881534875f477dd98b25fb7c016f685d56ff777fc397e819cd3): Error starting userland proxy: listen tcp4 0.0.0.0:40018: bind: address already in use"}

EDIT: we are seeing two different errors:
Bind for 0.0.0.0:[0-9]+ failed: port is already allocated
listen tcp4 0.0.0.0:[0-9]+: bind: address already in use

There are no containers listed in ShinyProxy or Docker as running on that port.

server@prod-ae-spvm:~/shinyproxy-etc$ sudo lsof -n -i :40018

COMMAND      PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
docker-pr 138817 root   44u  IPv4 639804      0t0  TCP 172.17.0.1:40018->172.17.0.51:3838 (ESTABLISHED)

server@prod-ae-spvm:~/shinyproxy-etc$ kill -9 138817

Yet even after killing the process, the error persists with the same port "being occupied". We are seeing this happen over and over again, and I believe it is tied to what others have reported here as well:

https://support.openanalytics.eu/t/error-500-port-is-already-allocated/604


2023-09-21 12:40:03.916  INFO 149714 --- [pool-1-thread-4] e.o.containerproxy.service.ProxyService  : [user=EMAIL@gmail.com proxyId=8c3743f3-ad4e-4e5d-b2a7-968af1bb194f specId=grouping_summaries] Starting proxy
2023-09-21 12:40:03.973  INFO 149714 --- [pool-1-thread-4] c.s.docker.client.DefaultDockerClient    : Starting container with Id: a42a78e6fe2084a11ad90b657343f000987560564103edbcde1d29b49f624d86
2023-09-21 12:40:04.273  WARN 149714 --- [pool-1-thread-4] e.o.containerproxy.service.ProxyService  : [user=EMAIL@gmail.com proxyId=8c3743f3-ad4e-4e5d-b2a7-968af1bb194f specId=grouping_summaries] Proxy failed to start

eu.openanalytics.containerproxy.ProxyFailedToStartException: Container with index 0 failed to start
	at eu.openanalytics.containerproxy.backend.AbstractContainerBackend.startProxy(AbstractContainerBackend.java:116) ~[containerproxy-1.0.2.jar!/:1.0.2]
	at eu.openanalytics.containerproxy.service.ProxyService.startOrResumeProxy(ProxyService.java:455) [containerproxy-1.0.2.jar!/:1.0.2]
	at eu.openanalytics.containerproxy.service.ProxyService.lambda$startProxy$4(ProxyService.java:279) [containerproxy-1.0.2.jar!/:1.0.2]
	at eu.openanalytics.containerproxy.service.ProxyService.lambda$action$11(ProxyService.java:599) [containerproxy-1.0.2.jar!/:1.0.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_382]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_382]
	at java.lang.Thread.run(Thread.java:750) ~[na:1.8.0_382]
Caused by: eu.openanalytics.containerproxy.ContainerFailedToStartException: Docker container failed to start
	at eu.openanalytics.containerproxy.backend.docker.DockerEngineBackend.startContainer(DockerEngineBackend.java:153) ~[containerproxy-1.0.2.jar!/:1.0.2]
	at eu.openanalytics.containerproxy.backend.AbstractContainerBackend.startProxy(AbstractContainerBackend.java:108) ~[containerproxy-1.0.2.jar!/:1.0.2]
	... 8 common frames omitted
Caused by: com.spotify.docker.client.exceptions.DockerRequestException: Request error: POST http://localhost:2375/containers/a42a78e6fe2084a11ad90b657343f000987560564103edbcde1d29b49f624d86/start: 500, body: {"message":"driver failed programming external connectivity on endpoint gracious_albattani (e3e2e1d06079f881534875f477dd98b25fb7c016f685d56ff777fc397e819cd3): Error starting userland proxy: listen tcp4 0.0.0.0:40018: bind: address already in use"}

	at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2840) ~[docker-client-8.16.0.jar!/:8.16.0]
	at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2734) ~[docker-client-8.16.0.jar!/:8.16.0]
	at com.spotify.docker.client.DefaultDockerClient.containerAction(DefaultDockerClient.java:783) ~[docker-client-8.16.0.jar!/:8.16.0]
	at com.spotify.docker.client.DefaultDockerClient.containerAction(DefaultDockerClient.java:768) ~[docker-client-8.16.0.jar!/:8.16.0]
	at com.spotify.docker.client.DefaultDockerClient.startContainer(DefaultDockerClient.java:763) ~[docker-client-8.16.0.jar!/:8.16.0]
	at eu.openanalytics.containerproxy.backend.docker.DockerEngineBackend.startContainer(DockerEngineBackend.java:147) ~[containerproxy-1.0.2.jar!/:1.0.2]
	... 9 common frames omitted
Caused by: javax.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
	at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1107) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:894) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.JerseyInvocation.access$700(JerseyInvocation.java:98) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.JerseyInvocation$5.completed(JerseyInvocation.java:851) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime.processResponse(ClientRuntime.java:197) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime.access$300(ClientRuntime.java:74) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime$2$1$1.run(ClientRuntime.java:167) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:303) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime$2$1.response(ClientRuntime.java:165) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.apache.connector.ApacheConnector$1.run(ApacheConnector.java:498) ~[jersey-apache-connector-2.26-b03.jar!/:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_382]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_382]
	at jersey.repackaged.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299) ~[jersey-guava-2.26-b03.jar!/:na]
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) ~[na:1.8.0_382]
	at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:50) ~[jersey-guava-2.26-b03.jar!/:na]
	at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37) ~[jersey-guava-2.26-b03.jar!/:na]
	at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:494) ~[jersey-apache-connector-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime$2.run(ClientRuntime.java:181) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:303) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime$3.run(ClientRuntime.java:209) ~[jersey-client-2.26-b03.jar!/:na]
	... 5 common frames omitted

I created a script as an interim solution to try to rapidly kill hung containers (note: you have to update the path to your ShinyProxy log on line 10):
https://gist.github.com/spinanicky/d82763292cacf357b26fd2dfcf11836c
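
For anyone who does not want to open the gist, the idea is roughly this (a minimal Python sketch of the same approach, not the gist itself; the log path, the error regex, and the lsof/kill calls are assumptions for illustration):

```python
#!/usr/bin/env python3
"""Rough sketch: tail the ShinyProxy log, spot the bind error, and kill
whatever process still holds the affected port. Illustration only."""
import re
import subprocess
import time

LOG_PATH = "/etc/shinyproxy/logs/shinyproxy.log"  # assumed location, adjust for your install
BIND_ERROR = re.compile(r"0\.0\.0\.0:(\d+): bind: address already in use")

def pids_holding(port):
    # `lsof -t` prints only PIDs, one per line; empty output means nothing holds the port
    result = subprocess.run(["lsof", "-t", "-n", "-i", f":{port}"],
                            capture_output=True, text=True)
    return result.stdout.split()

def main():
    with open(LOG_PATH) as log:
        log.seek(0, 2)                      # jump to the end of the file, like `tail -f`
        while True:
            line = log.readline()
            if not line:
                time.sleep(1)
                continue
            match = BIND_ERROR.search(line)
            if not match:
                continue
            port = match.group(1)
            for pid in pids_holding(port):
                print(f"killing PID {pid} still holding port {port}")
                subprocess.run(["kill", "-9", pid])

if __name__ == "__main__":
    main()
```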

@ntyndall

ntyndall commented Oct 26, 2023

Hey there,

I am experiencing the exact same behaviour running ShinyProxy 2.6.1. It seems to occur randomly, and currently we just have to restart Docker to get things working again.

Thanks!

@essemenov

Interesting case!
What Docker version is installed on the server?

@spinanicky
Author

Interesting case! What Docker version is installed on the server?

24.0.7

@LEDfan LEDfan added the bug label Jan 2, 2024
@LEDfan
Member

LEDfan commented Jan 2, 2024

Hi,
We'll investigate this issue so that we can hopefully reproduce the problem and fix it. I'll provide an update once we have more information.

@LEDfan
Member

LEDfan commented Jan 25, 2024

I'm unable to reproduce this issue. It seems like the docker-proxy process keeps running even after the container has already been stopped. Could you share the following information?

  • operating system (including version)
  • are you using a local VM, or some cloud provider?
  • resources of the ShinyProxy server (memory/cpu)
  • your application.yml
  • can you check the logs of the Docker daemon?

Hopefully with this information I'm able to set up a similar environment and reproduce the problem.

@spinanicky
Author

  • Proxmox 8.0.2 -> Ubuntu 23.10 VM (note that the host (Proxmox) uses ZFS). I have been meaning to test whether this might be the cause, as it also prevents us from running containers rather than VMs.
  • It sits on two dedicated hosts (we give users the option to connect to either URL given the failures described herein):
  • A) 8 x Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz + 64 GB RAM
  • B) 12 x Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz + 128 GB RAM
  • The ShinyProxy VM has 30 GB RAM + 2 vCPUs on server A and 60 GB RAM + 8 vCPUs on server B
  • I will share the application.yml privately
  • Is there a specific command you want me to run for the logs? It's an 800 MB file :)

@LEDfan
Member

LEDfan commented Jan 25, 2024

Thank you for the details. The issue with ZFS sounds interesting; I could try to use a ZFS host. At least it seems your server has enough resources, so I don't think we need to look in that direction.

W.r.t. the application.yml, you can strip any secrets and URLs; I just want to make sure there are no unexpected things in it (e.g. low-level Spring settings).
For the Docker logs, it would be useful if you could trigger the problem and then run sudo journalctl -u docker -n300 to get the last 300 log lines.

@spinanicky
Author

Yes, resources were the first thing I looked at when I saw the CPU graphs in netdata - which is why server B has 8 vCPUs - yet that still didn't alleviate the issue. The port range start was changed to 40000 because I originally thought that got rid of the issue when we were getting the same error on the 20000-range ports.

server: #used to rewrite the keycloak redirect url to include https
  forward-headers-strategy: native

logging:
  file:
    name: /etc/shinyproxy/logs/shinyproxy.log
  # level:
  #   eu:
  #     openanalytics:
  #       containerproxy:
  #         auth: DEBUG



proxy:
  title: ABCD
  hide-navbar: true
  landing-page: /
  favicon-path: favicon.ico
  heartbeat-rate: 10000
  heartbeat-timeout: 60000
  container-wait-time: 60000
  default-proxy-max-lifetime: 240
  port: 8080
  admin-groups: [ABCD]

  usage-stats-url: jdbc:mysql://ABCD
  # usage-stats-username: 
  # usage-stats-password: 

  authentication: openid
  openid:
    auth-url: ABCD
    token-url: ABCD
    jwks-url: ABCD
    userinfo-url: ABCD
    client-id: ABCD #client_id: client -> settings -> client_id
    client-secret: ABCD
    logout-url: ABCD
    username-attribute: email
    roles-claim: groups

  docker:
    cert-path: /home/none
    url: http://localhost:2375
    port-range-start: 40000
    port-range-max: 40100
  # Applications english
  specs:
  - id: basics
    display-name: Basics
    description:  Learn how to get started with coding in R using public health examples
    container-cmd: ["R", "-e", "learnr::run_tutorial('basics', package = 'introexercises', shiny_args = list(launch.browser = 0, port = 3838, host = '0.0.0.0'))"]
    container-image: ABCD/intro_course:latest
    access-groups: [ABCD, ABCD, ABCD]

Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.000895718Z" level=debug msg="Calling GET /v1.43/info"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.013002186Z" level=debug msg="Calling GET /v1.43/images/json"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.017248361Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22healthy%22%3Atrue%7D%7D"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.017641243Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22unhealthy%22%3Atrue%7D%7D"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.017975165Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22starting%22%3Atrue%7D%7D"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.018386194Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22none%22%3Atrue%7D%7D"

@LEDfan LEDfan added this to the Next milestone May 7, 2024
@LEDfan
Member

LEDfan commented May 7, 2024

Hi @spinanicky

I was not able to reproduce this consistently, but I have seen it happen a few times at random as well. As mentioned above, it seems the docker-proxy process keeps running a bit longer than usual, but eventually stops as well. Therefore, ShinyProxy 3.1.0 now waits (at least) 10 seconds before re-using a port that was previously used by a container. It would be nice if you could test the 3.1.0 release and let us know whether this fixes the issue for you.
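
For context, the change can be pictured roughly like this (a minimal Python sketch of the general idea, not ShinyProxy's actual implementation; the 10-second cool-down is taken from the description above):

```python
import time

class PortAllocator:
    """Sketch: only hand a port out again after a cool-down has elapsed
    since it was released, so a lingering docker-proxy can exit first."""

    def __init__(self, start, end, cooldown=10.0):
        self.ports = range(start, end + 1)
        self.in_use = set()
        self.released_at = {}          # port -> time it was last released
        self.cooldown = cooldown

    def acquire(self):
        now = time.monotonic()
        for port in self.ports:
            if port in self.in_use:
                continue
            if now - self.released_at.get(port, float("-inf")) < self.cooldown:
                continue               # released too recently, skip it for now
            self.in_use.add(port)
            return port
        raise RuntimeError("no port available in range")

    def release(self, port):
        self.in_use.discard(port)
        self.released_at[port] = time.monotonic()
```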

@spinanicky
Author

@LEDfan you have been very active - my email is flooded with your name :-)

I will have a look over the weekend, but don't let this hold you up with a release.

Thank you for testing.

@yaroslav-gwit

yaroslav-gwit commented May 8, 2024

Thank you for pushing the update. We've got the same issue with our installation, so +1 to this. I'll upgrade now and see if it fixes it.

On a side note, you could request port 0 from the OS: Linux and FreeBSD will give you back a random, unused network port, though I have never tried this on Windows. You could then capture that number programmatically and use it, instead of working with the baseline/hard-coded port (2000 + 1 per service) and playing a guessing game.
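
For illustration, the port-0 trick looks like this in Python (a sketch of the idea only; ShinyProxy itself allocates ports from the configured port range, as in the application.yml above):

```python
import socket

# Bind to port 0 and let the kernel pick a free ephemeral port.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 0))
port = sock.getsockname()[1]   # the OS-assigned port number
print(f"got free port {port}")
sock.close()
# Caveat: another process can grab the port between close() and handing it
# to Docker, so a small race window remains.
```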

@spinanicky
Author

@LEDfan is this normal? [screenshot]

I am also seeing in the lsof output of established connections that ShinyProxy seems to be accepting both IPv4 and IPv6 connections and forwarding them to ports. I am going to try disabling IPv6 on the Docker network and see if that does anything.

@spinanicky
Author

@LEDfan - the bug persists with the new release.

driver failed programming external connectivity on endpoint sp-container-d968fae8-6410-4d94-b924-5a96fd0248cc-0 (24782c3abc75b009fd8d49afe09cb6201da9901bc1cb50cf85d15be3bb8e4493): Error starting userland proxy: listen tcp4 0.0.0.0:40002: bind: address already in use

@LEDfan
Member

LEDfan commented May 16, 2024

Hi @spinanicky, that is unfortunate; I'll continue looking into it.
When you notice the issue, do you perhaps have (some) containers that failed to start? I recently observed the issue when I started an app just after another app/instance had failed to start.

@LEDfan LEDfan removed this from the 3.1.0 milestone May 16, 2024