Bug: bind address/port already in use #457

Open
spinanicky opened this issue Sep 21, 2023 · 14 comments
@spinanicky

spinanicky commented Sep 21, 2023

We are running ShinyProxy 3.0.2 and have tried several approaches to fixing this issue (including restarting both Docker and ShinyProxy before a large group signs on).

Long story short, we keep getting errors saying the port is already in use. It is hit and miss: sometimes we have 60+ users connected without problems, and sometimes we only get to 10 before it starts throwing this error.

{"message":"driver failed programming external connectivity on endpoint gracious_albattani (e3e2e1d06079f881534875f477dd98b25fb7c016f685d56ff777fc397e819cd3): Error starting userland proxy: listen tcp4 0.0.0.0:40018: bind: address already in use"}

EDIT: we are seeing two different errors:
Bind for 0.0.0.0:[0-9]+ failed: port is already allocated
listen tcp4 0.0.0.0:[0-9]+: bind: address already in use

There are no containers listed in ShinyProxy or Docker as running on that port.

server@prod-ae-spvm:~/shinyproxy-etc$ sudo lsof -n -i :40018

COMMAND      PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
docker-pr 138817 root   44u  IPv4 639804      0t0  TCP 172.17.0.1:40018->172.17.0.51:3838 (ESTABLISHED)

server@prod-ae-spvm:~/shinyproxy-etc$ kill -9 138817

Yet even after killing the process, the error persists with the same port "being occupied". We are seeing this happen over and over again, and I believe it is tied to what others have reported here as well:

https://support.openanalytics.eu/t/error-500-port-is-already-allocated/604


2023-09-21 12:40:03.916  INFO 149714 --- [pool-1-thread-4] e.o.containerproxy.service.ProxyService  : [user=EMAIL@gmail.com proxyId=8c3743f3-ad4e-4e5d-b2a7-968af1bb194f specId=grouping_summaries] Starting proxy
2023-09-21 12:40:03.973  INFO 149714 --- [pool-1-thread-4] c.s.docker.client.DefaultDockerClient    : Starting container with Id: a42a78e6fe2084a11ad90b657343f000987560564103edbcde1d29b49f624d86
2023-09-21 12:40:04.273  WARN 149714 --- [pool-1-thread-4] e.o.containerproxy.service.ProxyService  : [user=EMAIL@gmail.com proxyId=8c3743f3-ad4e-4e5d-b2a7-968af1bb194f specId=grouping_summaries] Proxy failed to start

eu.openanalytics.containerproxy.ProxyFailedToStartException: Container with index 0 failed to start
	at eu.openanalytics.containerproxy.backend.AbstractContainerBackend.startProxy(AbstractContainerBackend.java:116) ~[containerproxy-1.0.2.jar!/:1.0.2]
	at eu.openanalytics.containerproxy.service.ProxyService.startOrResumeProxy(ProxyService.java:455) [containerproxy-1.0.2.jar!/:1.0.2]
	at eu.openanalytics.containerproxy.service.ProxyService.lambda$startProxy$4(ProxyService.java:279) [containerproxy-1.0.2.jar!/:1.0.2]
	at eu.openanalytics.containerproxy.service.ProxyService.lambda$action$11(ProxyService.java:599) [containerproxy-1.0.2.jar!/:1.0.2]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_382]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_382]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_382]
	at java.lang.Thread.run(Thread.java:750) ~[na:1.8.0_382]
Caused by: eu.openanalytics.containerproxy.ContainerFailedToStartException: Docker container failed to start
	at eu.openanalytics.containerproxy.backend.docker.DockerEngineBackend.startContainer(DockerEngineBackend.java:153) ~[containerproxy-1.0.2.jar!/:1.0.2]
	at eu.openanalytics.containerproxy.backend.AbstractContainerBackend.startProxy(AbstractContainerBackend.java:108) ~[containerproxy-1.0.2.jar!/:1.0.2]
	... 8 common frames omitted
Caused by: com.spotify.docker.client.exceptions.DockerRequestException: Request error: POST http://localhost:2375/containers/a42a78e6fe2084a11ad90b657343f000987560564103edbcde1d29b49f624d86/start: 500, body: {"message":"driver failed programming external connectivity on endpoint gracious_albattani (e3e2e1d06079f881534875f477dd98b25fb7c016f685d56ff777fc397e819cd3): Error starting userland proxy: listen tcp4 0.0.0.0:40018: bind: address already in use"}

	at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2840) ~[docker-client-8.16.0.jar!/:8.16.0]
	at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2734) ~[docker-client-8.16.0.jar!/:8.16.0]
	at com.spotify.docker.client.DefaultDockerClient.containerAction(DefaultDockerClient.java:783) ~[docker-client-8.16.0.jar!/:8.16.0]
	at com.spotify.docker.client.DefaultDockerClient.containerAction(DefaultDockerClient.java:768) ~[docker-client-8.16.0.jar!/:8.16.0]
	at com.spotify.docker.client.DefaultDockerClient.startContainer(DefaultDockerClient.java:763) ~[docker-client-8.16.0.jar!/:8.16.0]
	at eu.openanalytics.containerproxy.backend.docker.DockerEngineBackend.startContainer(DockerEngineBackend.java:147) ~[containerproxy-1.0.2.jar!/:1.0.2]
	... 9 common frames omitted
Caused by: javax.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
	at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1107) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:894) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.JerseyInvocation.access$700(JerseyInvocation.java:98) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.JerseyInvocation$5.completed(JerseyInvocation.java:851) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime.processResponse(ClientRuntime.java:197) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime.access$300(ClientRuntime.java:74) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime$2$1$1.run(ClientRuntime.java:167) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:303) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime$2$1.response(ClientRuntime.java:165) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.apache.connector.ApacheConnector$1.run(ApacheConnector.java:498) ~[jersey-apache-connector-2.26-b03.jar!/:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_382]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_382]
	at jersey.repackaged.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299) ~[jersey-guava-2.26-b03.jar!/:na]
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) ~[na:1.8.0_382]
	at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:50) ~[jersey-guava-2.26-b03.jar!/:na]
	at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37) ~[jersey-guava-2.26-b03.jar!/:na]
	at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:494) ~[jersey-apache-connector-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime$2.run(ClientRuntime.java:181) ~[jersey-client-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:303) ~[jersey-common-2.26-b03.jar!/:na]
	at org.glassfish.jersey.client.ClientRuntime$3.run(ClientRuntime.java:209) ~[jersey-client-2.26-b03.jar!/:na]
	... 5 common frames omitted

I created a script as an interim solution to try to rapidly kill hung containers (note: you have to update the path to your ShinyProxy log on line 10):
https://gist.github.com/spinanicky/d82763292cacf357b26fd2dfcf11836c
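
For anyone who does not want to open the gist, the idea is roughly this (a minimal Python sketch of the same approach, not the gist itself; the log path, the error regex, and the lsof/kill calls are assumptions for illustration):

```python
#!/usr/bin/env python3
"""Rough sketch: tail the ShinyProxy log, spot the bind error, and kill
whatever process still holds the affected port. Illustration only."""
import re
import subprocess
import time

LOG_PATH = "/etc/shinyproxy/logs/shinyproxy.log"  # assumed location, adjust for your install
BIND_ERROR = re.compile(r"0\.0\.0\.0:(\d+): bind: address already in use")

def pids_holding(port):
    # `lsof -t` prints only PIDs, one per line; empty output means nothing holds the port
    result = subprocess.run(["lsof", "-t", "-n", "-i", f":{port}"],
                            capture_output=True, text=True)
    return result.stdout.split()

def main():
    with open(LOG_PATH) as log:
        log.seek(0, 2)                      # jump to the end of the file, like `tail -f`
        while True:
            line = log.readline()
            if not line:
                time.sleep(1)
                continue
            match = BIND_ERROR.search(line)
            if not match:
                continue
            port = match.group(1)
            for pid in pids_holding(port):
                print(f"killing PID {pid} still holding port {port}")
                subprocess.run(["kill", "-9", pid])

if __name__ == "__main__":
    main()
```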

@ntyndall

ntyndall commented Oct 26, 2023

Hey there,

I am experiencing the exact same behaviour running ShinyProxy 2.6.1. It seems to occur randomly, and currently we just have to restart Docker to get things working again.

Thanks!

@essemenov

Interesting case!
What Docker version is installed on the server?

@spinanicky
Author

Interesting case! What Docker version is installed on the server?

24.0.7

@LEDfan LEDfan added the bug label Jan 2, 2024
@LEDfan
Member

LEDfan commented Jan 2, 2024

Hi,
We'll investigate this issue so that we can hopefully reproduce the problem and fix it. I'll provide an update once we have more information.

@LEDfan
Member

LEDfan commented Jan 25, 2024

I'm unable to reproduce this issue. It seems like the docker-proxy process keeps running even after the container has already been stopped. Could you share the following information?

  • operating system (including version)
  • are you using a local VM, or some cloud provider?
  • resources of the ShinyProxy server (memory/cpu)
  • your application.yml
  • can you check the logs of the Docker daemon?

Hopefully with this information I'm able to set up a similar environment and reproduce the problem.

@spinanicky
Author

  • Proxmox 8.0.2 -> Ubuntu 23.10 VM (note that the host (Proxmox) uses ZFS). I have been meaning to test whether this might be the cause, as it also prevents us from running containers rather than VMs.
  • It sits on two dedicated hosts (we give users the option to connect to either URL given the failures described herein):
  • A) 8 x Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz + 64 GB RAM
  • B) 12 x Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz + 128 GB RAM
  • The ShinyProxy VM has 30 GB RAM + 2 vCPUs on server A and 60 GB RAM + 8 vCPUs on server B
  • I will share the application.yml privately
  • Is there a specific command you want me to run for the logs? It's an 800 MB file :)

@LEDfan
Member

LEDfan commented Jan 25, 2024

Thank you for the details. The issue with ZFS sounds interesting; I could try to use a ZFS host. At least it seems your server has enough resources, so I don't think we need to look in that direction.

W.r.t. the application.yml, you can strip any secrets and URLs; I just want to make sure there are no unexpected things in it (e.g. low-level Spring settings).
For the Docker logs, it would be useful if you could trigger the problem and then run sudo journalctl -u docker -n300 to get the last 300 log lines.

@spinanicky
Author

Yes, resources were the first thing I looked at when I saw the CPU graphs in netdata - which is why server B has 8 vCPUs - yet that still didn't alleviate the issue. The port range start was changed to 40000 because I originally thought that got rid of the issue when we were getting the same error on the 20000-range ports.

server: #used to rewrite the keycloak redirect url to include https
  forward-headers-strategy: native

logging:
  file:
    name: /etc/shinyproxy/logs/shinyproxy.log
  # level:
  #   eu:
  #     openanalytics:
  #       containerproxy:
  #         auth: DEBUG



proxy:
  title: ABCD
  hide-navbar: true
  landing-page: /
  favicon-path: favicon.ico
  heartbeat-rate: 10000
  heartbeat-timeout: 60000
  container-wait-time: 60000
  default-proxy-max-lifetime: 240
  port: 8080
  admin-groups: [ABCD]

  usage-stats-url: jdbc:mysql://ABCD
  # usage-stats-username: 
  # usage-stats-password: 

  authentication: openid
  openid:
    auth-url: ABCD
    token-url: ABCD
    jwks-url: ABCD
    userinfo-url: ABCD
    client-id: ABCD #client_id: client -> settings -> client_id
    client-secret: ABCD
    logout-url: ABCD
    username-attribute: email
    roles-claim: groups

  docker:
    cert-path: /home/none
    url: http://localhost:2375
    port-range-start: 40000
    port-range-max: 40100
  # Applications english
  specs:
  - id: basics
    display-name: Basics
    description:  Learn how to get started with coding in R using public health examples
    container-cmd: ["R", "-e", "learnr::run_tutorial('basics', package = 'introexercises', shiny_args = list(launch.browser = 0, port = 3838, host = '0.0.0.0'))"]
    container-image: ABCD/intro_course:latest
    access-groups: [ABCD, ABCD, ABCD]

Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.000895718Z" level=debug msg="Calling GET /v1.43/info"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.013002186Z" level=debug msg="Calling GET /v1.43/images/json"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.017248361Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22healthy%22%3Atrue%7D%7D"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.017641243Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22unhealthy%22%3Atrue%7D%7D"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.017975165Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22starting%22%3Atrue%7D%7D"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.018386194Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22none%22%3Atrue%7D%7D"

@LEDfan LEDfan added this to the Next milestone May 7, 2024
@LEDfan
Member

LEDfan commented May 7, 2024

Hi @spinanicky

I was not able to reproduce this consistently, but I have seen it happen a few times at random as well. As mentioned above, it seems the docker-proxy process keeps running a bit longer than usual, but eventually stops as well. Therefore, ShinyProxy 3.1.0 now waits (at least) 10 seconds before re-using a port that was previously used by a container. It would be nice if you could test the 3.1.0 release and let us know whether this fixes the issue for you.
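
For context, the change can be pictured roughly like this (a minimal Python sketch of the general idea, not ShinyProxy's actual implementation; the 10-second cool-down is taken from the description above):

```python
import time

class PortAllocator:
    """Sketch: only hand a port out again after a cool-down has elapsed
    since it was released, so a lingering docker-proxy can exit first."""

    def __init__(self, start, end, cooldown=10.0):
        self.ports = range(start, end + 1)
        self.in_use = set()
        self.released_at = {}          # port -> time it was last released
        self.cooldown = cooldown

    def acquire(self):
        now = time.monotonic()
        for port in self.ports:
            if port in self.in_use:
                continue
            if now - self.released_at.get(port, float("-inf")) < self.cooldown:
                continue               # released too recently, skip it for now
            self.in_use.add(port)
            return port
        raise RuntimeError("no port available in range")

    def release(self, port):
        self.in_use.discard(port)
        self.released_at[port] = time.monotonic()
```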

@spinanicky
Author

@LEDfan you have been very active - my email is flooded with your name :-)

I will have a look over the weekend, but don't let this hold you up with a release.

Thank you for testing.

@yaroslav-gwit

yaroslav-gwit commented May 8, 2024

Thank you for pushing the update. We've got the same issue with our installation, so +1 to this. I'll upgrade now and see if it fixes it.

On a side note, you could request port 0 from the OS: Linux and FreeBSD will give you back a random, unused network port, though I have never tried this on Windows. You could then capture that number programmatically and use it, instead of working with the baseline/hard-coded port (2000 + 1 per service) and playing a guessing game.
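
For illustration, the port-0 trick looks like this in Python (a sketch of the idea only; ShinyProxy itself allocates ports from the configured port range, as in the application.yml above):

```python
import socket

# Bind to port 0 and let the kernel pick a free ephemeral port.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 0))
port = sock.getsockname()[1]   # the OS-assigned port number
print(f"got free port {port}")
sock.close()
# Caveat: another process can grab the port between close() and handing it
# to Docker, so a small race window remains.
```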

@spinanicky
Author

@LEDfan is this normal? [screenshot]

I am also seeing in the lsof output of established connections that ShinyProxy seems to be accepting both IPv4 and IPv6 connections and forwarding them to ports. I am going to try disabling IPv6 on the Docker network and see if that does anything.

@spinanicky
Author

@LEDfan - the bug persists with the new release.

driver failed programming external connectivity on endpoint sp-container-d968fae8-6410-4d94-b924-5a96fd0248cc-0 (24782c3abc75b009fd8d49afe09cb6201da9901bc1cb50cf85d15be3bb8e4493): Error starting userland proxy: listen tcp4 0.0.0.0:40002: bind: address already in use

@LEDfan
Member

LEDfan commented May 16, 2024

Hi @spinanicky, that is unfortunate; I'll continue looking into it.
When you notice the issue, do you perhaps have (some) containers that failed to start? I recently observed the issue when I started an app just after another app/instance had failed to start.

@LEDfan LEDfan removed this from the 3.1.0 milestone May 16, 2024