Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: net_context: zsock_connect can error out due to duplicate ports #72035

Closed
JordanYates opened this issue Apr 27, 2024 · 5 comments · Fixed by #72595
Closed

net: net_context: zsock_connect can error out due to duplicate ports #72035

JordanYates opened this issue Apr 27, 2024 · 5 comments · Fixed by #72595
Assignees
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@JordanYates
Copy link
Collaborator

Describe the bug

check_used_port can return false when the port is actually in use, leading to find_available_port returning an unavailable port.
This can be reliably triggered when sys_rand32_get is derived from the system clock, but can occur with a 1/2**16 chance in normal operation.

The initial net_context_get from zsock_socket assigns a random port to the net context:

addr->sin_port =
find_available_port(&contexts[i], (struct sockaddr *)addr);

net_context_connect from zsock_connect then generates another random port inside bind_default:

addr4.sin_port =
find_available_port(context,
(struct sockaddr *)&addr4);

A duplicate port number at this stage is not detected because the context sin_addr pointer is currently NULL, which skips further checks here:
if (net_sin_ptr(&contexts[i].local)->sin_addr == NULL ||

Inside net_context_bind (called from bind_default), the sin_addr pointer is finally set (using the first generated port):

net_sin_ptr(&context->local)->sin_addr = ptr;

And the second generated port is then checked again:
ret = check_used_port(iface,
context->proto,
addr4->sin_port,
addr,
net_context_is_reuseaddr_set(context),
net_context_is_reuseport_set(context));

At this point check_used_port returns an error, and sock_connect errors out.

Expected behavior
It should not be possible for duplicate port numbers to be generated inside zsock_connect that cause an error without the user being able to intervene.

Environment (please complete the following information):

  • Zephyr v3.6 (Cannot see any meaningful changes on main)
@JordanYates JordanYates added bug The issue is a bug, or the PR is fixing a bug area: Networking labels Apr 27, 2024
@JordanYates
Copy link
Collaborator Author

I'm not sure whether the bug is in the implementation of check_used_port (i.e. the NULL check is not sufficient), or in one of the calling functions (context sin_addr pointer should be set earlier)

@aescolar aescolar added the priority: medium Medium impact/importance bug label Apr 30, 2024
@jukkar
Copy link
Member

jukkar commented May 10, 2024

@JordanYates I am trying to replicate this but have so far failed. Do you have a setup / test app I could use?

What is baffling is that here in check_used_port() before the NULL check you mentioned,

if (!(net_context_get_proto(&contexts[i]) == proto &&
net_sin((struct sockaddr *)&
contexts[i].local)->sin_port == local_port)) {

we check whether there is a local port (that is set by net_context_get()) already set. So I just wonder why we would get the duplicate port selected.

@jukkar
Copy link
Member

jukkar commented May 10, 2024

Are you using IPv4-to-IPv6 mapping CONFIG_NET_IPV4_MAPPING_TO_IPV6 as that could explain this issue?

@JordanYates
Copy link
Collaborator Author

@JordanYates I am trying to replicate this but have so far failed. Do you have a setup / test app I could use?

What is baffling is that here in check_used_port() before the NULL check you mentioned,

if (!(net_context_get_proto(&contexts[i]) == proto &&
net_sin((struct sockaddr *)&
contexts[i].local)->sin_port == local_port)) {

we check whether there is a local port (that is set by net_context_get()) already set. So I just wonder why we would get the duplicate port selected.

I can reliably trigger this in my local code by changing this line:

local_port = sys_rand16_get() | 0x8000;

to this:

local_port = k_uptime_get() / 2 | 0x8000;

Which in theory should just make the function loop for ~2 msec.

Are you using IPv4-to-IPv6 mapping CONFIG_NET_IPV4_MAPPING_TO_IPV6 as that could explain this issue?

This option is not enabled (CONFIG_IPV6 as a whole is disabled). I will try and generate a reproduction in NCS using a nRF7002DK.

@JordanYates
Copy link
Collaborator Author

NCS version

(.zephyr_venv) jordan@TAURUS:~/code/ncs/nrf$ git log --oneline
3190fa573 (HEAD, tag: v2.6.0) version: update VERSION to 2.6.0

Diff:

(.zephyr_venv) jordan@TAURUS:~/code/ncs/zephyr$ git diff
diff --git a/subsys/net/ip/net_context.c b/subsys/net/ip/net_context.c
index a628297f650..1247c97d7dd 100644
--- a/subsys/net/ip/net_context.c
+++ b/subsys/net/ip/net_context.c
@@ -277,7 +277,7 @@ static uint16_t find_available_port(struct net_context *context,
        uint16_t local_port;

        do {
-               local_port = sys_rand32_get() | 0x8000;
+               local_port = k_uptime_get() / 2 | 0x8000;
        } while (check_used_port(NULL, net_context_get_proto(context),
                                 htons(local_port), addr, false, false) == -EEXIST);

(.zephyr_venv) jordan@TAURUS:~/code/ncs/nrf$ git diff
diff --git a/samples/wifi/sta/prj.conf b/samples/wifi/sta/prj.conf
index 0c9f56023..ac8a4398b 100644
--- a/samples/wifi/sta/prj.conf
+++ b/samples/wifi/sta/prj.conf
@@ -84,3 +84,7 @@ CONFIG_NET_CONFIG_MY_IPV4_GW="192.168.1.1"
 # design in net_mgmt. So, use a higher timeout for a crowded
 # environment.
 CONFIG_NET_MGMT_EVENT_QUEUE_TIMEOUT=5000
+
+CONFIG_STA_KEY_MGMT_WPA2=y
+CONFIG_STA_SAMPLE_SSID="********"
+CONFIG_STA_SAMPLE_PASSWORD="********"
+CONFIG_SNTP=y
diff --git a/samples/wifi/sta/src/main.c b/samples/wifi/sta/src/main.c
index 83ccd4984..7819fe072 100644
--- a/samples/wifi/sta/src/main.c
+++ b/samples/wifi/sta/src/main.c
@@ -285,6 +285,60 @@ int bytes_from_str(const char *str, uint8_t *bytes, size_t bytes_len)
        return 0;
 }

+#include <zephyr/net/socket.h>
+#include <zephyr/net/sntp.h>
+
+int on_connection(void)
+{
+       struct zsock_addrinfo hints = {
+               .ai_family = AF_INET,
+               .ai_socktype = SOCK_DGRAM,
+       };
+       struct zsock_addrinfo *res = NULL;
+       struct sockaddr_in *sockaddr;
+       struct sockaddr remote;
+       socklen_t remote_len;
+       int rc;
+
+       while(true) {
+               k_sleep(K_SECONDS(2));
+               /* Get IP address from DNS */
+               rc = zsock_getaddrinfo("pool.ntp.org", NULL, &hints, &res);
+               if (rc < 0) {
+                       zsock_freeaddrinfo(res);
+                       continue;
+               }
+               remote = *res->ai_addr;
+               remote_len = res->ai_addrlen;
+               zsock_freeaddrinfo(res);
+
+               sockaddr = (struct sockaddr_in *)&remote;
+               sockaddr->sin_port = htons(123);
+               uint8_t *a = sockaddr->sin_addr.s4_addr;
+               LOG_INF("pool.ntp.org -> %d.%d.%d.%d", a[0], a[1], a[2], a[3]);
+
+               struct sntp_time sntp_time;
+               struct sntp_ctx ctx;
+
+               rc = sntp_init(&ctx, &remote, remote_len);
+               if (rc < 0) {
+                       LOG_ERR("Failed to init ctx (%d)", rc);
+                       continue;
+               }
+
+               LOG_INF("Sending request...");
+               rc = sntp_query(&ctx, 4 * MSEC_PER_SEC, &sntp_time);
+               sntp_close(&ctx);
+               if (rc < 0) {
+                       LOG_ERR("SNTP query failed (%d)", rc);
+                       continue;
+               }
+               LOG_INF("Unix time: %llu", sntp_time.seconds);
+               break;
+       }
+       return 0;
+}
+
 int main(void)
 {
        memset(&context, 0, sizeof(context));
@@ -349,6 +403,7 @@ int main(void)

                if (context.connected) {
                        cmd_wifi_status();
+                       on_connection();
                        k_sleep(K_FOREVER);
                }
        }
(.zephyr_venv) jordan@TAURUS:~/code/ncs$ west build -b nrf7002dk_nrf5340_cpuapp nrf/samples/wifi/sta/ -p
(.zephyr_venv) jordan@TAURUS:~/code/ncs$ west flash --erase

Console:

[00:00:07.486,999] <inf> sta: ==================
[00:00:07.487,030] <inf> sta: State: SCANNING
[00:00:07.787,170] <inf> sta: ==================
[00:00:07.787,200] <inf> sta: State: SCANNING
[00:00:08.087,341] <inf> sta: ==================
[00:00:08.087,371] <inf> sta: State: AUTHENTICATING
[00:00:08.214,477] <inf> sta: Connected
[00:00:08.251,098] <inf> net_dhcpv4: Received: 192.168.20.23
[00:00:08.251,220] <inf> net_config: IPv4 address: 192.168.20.23
[00:00:08.251,220] <inf> net_config: Lease time: 86400 seconds
[00:00:08.251,281] <inf> net_config: Subnet: 255.255.255.0
[00:00:08.251,312] <inf> net_config: Router: 192.168.20.1
[00:00:08.251,403] <inf> sta: DHCP IP address: 192.168.20.23
[00:00:08.266,754] <inf> net_config: IPv6 address: fe80::f6ce:36ff:fe00:2350
[00:00:08.392,425] <inf> sta: ==================
[00:00:08.392,456] <inf> sta: State: COMPLETED
[00:00:08.392,486] <inf> sta: Interface Mode: STATION
[00:00:08.392,486] <inf> sta: Link Mode: WIFI 5 (802.11ac/VHT)
[00:00:08.392,517] <inf> sta: SSID: ********
[00:00:08.392,547] <inf> sta: BSSID: 18:F1:45:85:EC:23
[00:00:08.392,578] <inf> sta: Band: 5GHz
[00:00:08.392,578] <inf> sta: Channel: 132
[00:00:08.392,608] <inf> sta: Security: WPA2-PSK
[00:00:08.392,608] <inf> sta: MFP: Optional
[00:00:08.392,639] <inf> sta: RSSI: -68
[00:00:10.415,405] <inf> sta: pool.ntp.org -> 139.180.160.82
[00:00:10.415,466] <err> net_ctx: Port 37975 is in use!
[00:00:10.415,527] <err> net_sntp: Cannot connect to UDP remote : 112
[00:00:10.415,557] <err> sta: Failed to init ctx (-112)
[00:00:12.430,236] <inf> sta: pool.ntp.org -> 139.180.160.82
[00:00:12.430,297] <err> net_ctx: Port 38983 is in use!
[00:00:12.430,358] <err> net_sntp: Cannot connect to UDP remote : 112
[00:00:12.430,358] <err> sta: Failed to init ctx (-112)

jukkar added a commit to jukkar/zephyr that referenced this issue May 10, 2024
There is no need to check our own context when going through
the used ports in the system. This prevents error when binding
in some corner cases.

Fixes zephyrproject-rtos#72035

Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
nashif pushed a commit that referenced this issue May 14, 2024
There is no need to check our own context when going through
the used ports in the system. This prevents error when binding
in some corner cases.

Fixes #72035

Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
ycsin pushed a commit to ycsin/zephyr that referenced this issue May 17, 2024
There is no need to check our own context when going through
the used ports in the system. This prevents error when binding
in some corner cases.

Fixes zephyrproject-rtos#72035

Signed-off-by: Jukka Rissanen <jukka.rissanen@nordicsemi.no>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants