send() call can start to send an infinite amount of 0 size UDP packets #8566

Closed
reddwarf69 opened this issue Feb 20, 2023 · 8 comments · Fixed by #10428
Labels
area: networking Issue related to networking no-auto-close type: bug Something isn't working

Comments

@reddwarf69

Description

With the latest release, 20230214, a simple loop such as

while(1) {
  ssize_t size = send(sock, buff, 1400, 0);
}

starts by sending 1400-byte packets, but at some random point the send() call blocks forever and an endless stream of zero-size packets is sent instead.
If an ICMP "Destination unreachable" packet is received, it goes back to normal.

Steps to reproduce

Standard configuration

$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "runsc": {
            "path": "/usr/bin/runsc"
        }
    }
}

Build these two programs:
Receiver

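#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/socket.h>

int main(int argc, char *argv[]) {
  if (argc != 2) {
    fprintf(stderr, "usage: %s <port>\n", argv[0]);
    return 1;
  }
  in_port_t port = atoi(argv[1]);

  int sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (sock < 0) {
    perror("socket");
    return 1;
  }

  // Listen on every local address.
  struct sockaddr_in sock_address = {
    .sin_family = AF_INET,
    .sin_port = htons(port),
    .sin_addr.s_addr = htonl(INADDR_ANY)
  };
  if (bind(sock, (struct sockaddr*)&sock_address, sizeof(sock_address)) < 0) {
    perror("bind");
    return 1;
  }

  char buff[1400];
  while(1) {
    // For UDP, a return value of 0 means a zero-length datagram arrived.
    ssize_t size = recv(sock, buff, 1400, 0);
    printf("Received %zd\n", size);
  }
}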

Sender

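#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/socket.h>

int main(int argc, char *argv[]) {
  if (argc != 3) {
    fprintf(stderr, "usage: %s <ipv4-address> <port>\n", argv[0]);
    return 1;
  }

  struct in_addr addr;
  if (inet_pton(AF_INET, argv[1], &addr) != 1) {
    fprintf(stderr, "invalid IPv4 address: %s\n", argv[1]);
    return 1;
  }

  in_port_t port = atoi(argv[2]);

  int sock = socket(AF_INET, SOCK_DGRAM, 0);
  if (sock < 0) {
    perror("socket");
    return 1;
  }

  struct sockaddr_in peer_address = {
    .sin_family = AF_INET,
    .sin_port = htons(port),
    .sin_addr = addr
  };
  if (connect(sock, (struct sockaddr*)&peer_address, sizeof(peer_address)) < 0) {
    perror("connect");
    return 1;
  }

  // The payload contents do not matter for the repro; zero-fill the buffer.
  char buff[1400] = {0};
  while(1) {
    ssize_t size = send(sock, buff, 1400, 0);
    printf("Sent %zd\n", size);
  }
}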

  • Create a network for the test: docker network create gvisor_test
  • Run the receiver: docker run --network=gvisor_test --rm -v $PWD:/test_binaries:ro ubuntu:22.04 /test_binaries/receiver 8001
  • Run the sender, passing the receiver's address on that network (172.18.0.2 here): docker run --network=gvisor_test --rm --runtime=runsc -v $PWD:/test_binaries:ro ubuntu:22.04 /test_binaries/sender 172.18.0.2 8001

After a while the receiver goes from printing "Received 1400" to "Received 0" (the zero-size packets are also visible in tcpdump), and the sender stops printing anything because it is blocked in the send() call.
If you kill the receiver, the sender goes back to printing "Sent 1400".
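
To pin down the moment the receiver flips from "Received 1400" to "Received 0", it may help to tally recv() results instead of reading the raw output. The helper below is only a sketch: `recv_stats` and `recv_stats_note` are hypothetical names that are not part of the repro. It relies on the fact that, on a UDP socket, recv() returning 0 means a zero-length datagram was delivered.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical counters for the receiver loop; not part of the original repro. */
struct recv_stats {
    unsigned long data_datagrams;   /* recv() returned > 0 */
    unsigned long empty_datagrams;  /* recv() returned 0: zero-length datagram */
    unsigned long errors;           /* recv() returned < 0 */
};

/* Records one recv() return value.
 * Returns true only for the first zero-length datagram seen. */
static bool recv_stats_note(struct recv_stats *s, ssize_t n) {
    if (n < 0) {
        s->errors++;
        return false;
    }
    if (n == 0) {
        s->empty_datagrams++;
        return s->empty_datagrams == 1;
    }
    s->data_datagrams++;
    return false;
}
```

In the receiver loop, calling `recv_stats_note(&stats, size)` after each recv() and printing `stats.data_datagrams` when it returns true would show how many full datagrams arrived before the first empty one.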

runsc version

runsc version release-20230214.0
spec: 1.0.2-dev

docker version (if using docker)

Client:
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.17.3
 Git commit:        20.10.12-0ubuntu4
 Built:             Mon Mar  7 17:10:06 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.3
  Git commit:       20.10.12-0ubuntu4
  Built:            Mon Mar  7 15:57:50 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.9-0ubuntu3.1
  GitCommit:        
 runc:
  Version:          1.1.0-0ubuntu1.1
  GitCommit:        
 docker-init:
  Version:          0.19.0
  GitCommit:

uname

Linux skcristian-XPS-13-9370 5.19.0-32-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 30 17:03:34 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

@reddwarf69 reddwarf69 added the type: bug Something isn't working label Feb 20, 2023
@reddwarf69
Author

FWIW the issue disappears when using the host network stack (https://gvisor.dev/docs/user_guide/networking/#network-passthrough).

@kevinGC kevinGC self-assigned this Feb 20, 2023
@nlacasse
Collaborator

Thank you for the detailed bug report and repro. We are looking into this now.

@kevinGC
Collaborator

kevinGC commented Feb 21, 2023

I'm having trouble reproducing this. Quick questions:

  • How often does this fail? I.e. when you run the above commands, roughly what % of the time do they fail?
  • How long does it typically take to fail? Want to be sure I'm running it long enough.
  • Is there anything else that seems to affect the chance of failure? Just anything that might help with repro.

@reddwarf69
Author

It usually takes a bit over one minute. By 2 minutes it reproduces 100% of the time.
It may depend on the system load; I'm not sure. Running Wireshark at the same time may help, but I'm not sure about that either.

However, my only sample is a single laptop. I have not tried to reproduce it on other machines.

@github-actions

A friendly reminder that this issue had no activity for 120 days.

@github-actions github-actions bot added the stale-issue This issue has not been updated in 120 days. label Sep 13, 2023
@avagin avagin added area: networking Issue related to networking and removed stale-issue This issue has not been updated in 120 days. labels Sep 13, 2023
@avagin
Collaborator

avagin commented Sep 13, 2023

I am able to reproduce the issue on my workstation.

  • run the sender program in a gvisor docker container:
    $ docker run --runtime runsc --rm -v /tmp/test:/mnt alpine /mnt/s 192.168.9.1 9999

  • execute tcpdump from another terminal:
    $ sudo tcpdump -i docker0
    11:18:49.633185 IP 192.168.9.2.49256 > 192.168.9.1.9999: UDP, length 0
    11:18:49.633193 IP 192.168.9.2.49256 > 192.168.9.1.9999: UDP, length 0
    11:18:49.633201 IP 192.168.9.2.49256 > 192.168.9.1.9999: UDP, length 0
    11:18:49.633209 IP 192.168.9.2.49256 > 192.168.9.1.9999: UDP, length 0
    11:18:49.633216 IP 192.168.9.2.49256 > 192.168.9.1.9999: UDP, length 0

  • execute the receiver program
    $ strace /tmp/test/r 9999
    recvfrom(3, "", 1400, 0, NULL, NULL) = 0
    write(1, "Received 0\n", 11Received 0
    ) = 11
    recvfrom(3, "", 1400, 0, NULL, NULL) = 0
    write(1, "Received 0\n", 11Received 0
    ) = 11
    ....

The sender stops printing any messages while the receiver is running.

@avagin
Collaborator

avagin commented Sep 13, 2023

prepareForWrite reads the data from p and saves it in udpPacketInfo:
https://cs.opensource.google/gvisor/gvisor/+/master:pkg/tcpip/transport/udp/endpoint.go;l=440
After that, TryNewPacketBuffer can return nil, in which case udp.endpoint.write returns tcpip.ErrWouldBlock:
https://cs.opensource.google/gvisor/gvisor/+/master:pkg/tcpip/transport/udp/endpoint.go;l=480

When that happens, the data already held in udpPacketInfo is lost.
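
The ordering described above can be modelled in a few lines. This is a simplified C sketch of the failure pattern, not gVisor code: `payload_reader`, `payload_read`, `buggy_write`, and `WOULD_BLOCK` are hypothetical stand-ins for p, the read in prepareForWrite, udp.endpoint.write, and tcpip.ErrWouldBlock. It assumes, as the analysis suggests, that reading the payload consumes it.

```c
#include <stddef.h>
#include <string.h>

#define WOULD_BLOCK (-1L)

/* Hypothetical stand-in for the payload source: reading consumes the data,
 * so a second read after a full read yields nothing. */
struct payload_reader {
    const char *data;
    size_t len;
    size_t off;
};

/* Copies up to cap unread bytes into dst and advances the reader. */
static size_t payload_read(struct payload_reader *p, char *dst, size_t cap) {
    size_t n = p->len - p->off;
    if (n > cap) {
        n = cap;
    }
    memcpy(dst, p->data + p->off, n);
    p->off += n;
    return n;
}

/* Problematic ordering: the payload is drained first, and only then does the
 * write discover there is no room. Returns the datagram length that would be
 * sent, or WOULD_BLOCK. */
static long buggy_write(struct payload_reader *p, int send_buffer_full) {
    char staged[2048];
    size_t n = payload_read(p, staged, sizeof(staged));
    if (send_buffer_full) {
        return WOULD_BLOCK; /* the staged bytes are dropped here */
    }
    return (long)n;
}
```

With a 1400-byte payload, a first call that hits a full send buffer returns `WOULD_BLOCK` and leaves the reader empty, so a retry with the same reader produces a datagram of length 0. That matches the zero-size packets seen in tcpdump, under the assumptions stated above.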

@avagin
Collaborator

avagin commented Sep 13, 2023

I think this issue was introduced by bb36c43.

@ghananigans ghananigans removed their assignment Oct 10, 2023
avagin added a commit to avagin/gvisor that referenced this issue May 10, 2024
The caller will wait for the endpoint to become writable and try again.

Fixes google#8566

Signed-off-by: Andrei Vagin <avagin@google.com>
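
For a retry by the caller to carry the original bytes, the payload must still be unread when the would-block result is returned. The sketch below shows one ordering with that property. It is an illustration only, not the actual patch: `payload_src`, `payload_src_read`, `checked_write`, and `WRITE_WOULD_BLOCK` are hypothetical names, and the real change may be structured differently.

```c
#include <stddef.h>
#include <string.h>

#define WRITE_WOULD_BLOCK (-1L)

/* Hypothetical consumable payload source, as in a write path where reading
 * the payload advances an offset. */
struct payload_src {
    const char *data;
    size_t len;
    size_t off;
};

/* Copies up to cap unread bytes into dst and advances the source. */
static size_t payload_src_read(struct payload_src *p, char *dst, size_t cap) {
    size_t n = p->len - p->off;
    if (n > cap) {
        n = cap;
    }
    memcpy(dst, p->data + p->off, n);
    p->off += n;
    return n;
}

/* Checks for room before touching the payload, so a would-block result leaves
 * the source intact for the retry. Returns the datagram length that would be
 * sent, or WRITE_WOULD_BLOCK. */
static long checked_write(struct payload_src *p, int send_buffer_full) {
    char staged[2048];
    if (send_buffer_full) {
        return WRITE_WOULD_BLOCK; /* nothing has been consumed yet */
    }
    return (long)payload_src_read(p, staged, sizeof(staged));
}
```

Here a blocked first attempt leaves `off` at 0, and the retry sends all 1400 bytes rather than an empty datagram.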
copybara-service bot pushed a commit that referenced this issue May 10, 2024
The caller will wait for the endpoint to become writable and try again.

Fixes #8566

FUTURE_COPYBARA_INTEGRATE_REVIEW=#10428 from avagin:udp-payloader c8758b4
PiperOrigin-RevId: 632619219
copybara-service bot pushed a commit that referenced this issue May 11, 2024
The caller will wait for the endpoint to become writable and try again.

Fixes #8566

FUTURE_COPYBARA_INTEGRATE_REVIEW=#10428 from avagin:udp-payloader b6461c0
PiperOrigin-RevId: 632619219
6 participants