A question about unrecognized connections #84

Open
themighty1 opened this issue Nov 2, 2020 · 10 comments
@themighty1

@gustavo-iniguez-goya, I remember reading in other github issues that you've done a lot of research into this area of trying to pin down where the unrecognized connections come from.

Have you seen connections which the first time around are not found via netlink nor via tcp netstat, but if you loop again they will be found via netlink or netstat?
Sort of like a delay in the kernel to update its netlink tables or something?

Or has it always been the case that if the connection is not found on the first netlink/netstat iteration, it means that's the end of it?

Just throwing some ideas around. Maybe you already know these things, so it's quicker to ask you than to test whether it is the case.

@gustavo-iniguez-goya
Owner

Have you seen connections which the first time around are not found via netlink nor via tcp netstat, but if you loop again they will be found via netlink or netstat?

Not that I can remember. Those connections could be forwarded connections traversing the box, but we wouldn't intercept them anyway, because we only intercept NEW and RELATED connections. Also, broadcast/multicast connections are a bit special (they differ a bit as seen by iptables and netlink).

There are at least two situations, though, where this can happen: when the system comes back from suspend, or when you connect to or disconnect from a wifi network. Usually the open connections are in an invalid state, and the processes start closing/reestablishing them. Adding the state RELATED helped identify some of these connections.

Or has it always been the case that if the connection is not found on the first netlink/netstat iteration, it means that's the end of it?

If the PID of the process that created the connection is not found on the first iteration, it could mean that the PID is in reality a TID, i.e. the connection was opened by a thread of the process.

As we don't parse the /proc/<PID>/task/ directory, we don't look there for the inode of the connection, so we find neither the cmdline nor the PID. This is the main reason for many "unknown connections": it's very costly to parse all TIDs.
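
For reference, a minimal illustrative sketch (not opensnitch's actual code) of that per-thread lookup: it walks /proc/<PID>/task/<TID>/fd and readlink()s every descriptor until one points at "socket:[inode]". It also shows why the lookup is costly: every fd of every thread of every process has to be inspected.

// findinode.c - illustrative sketch: resolve a socket inode to a PID/TID
// by walking /proc/<PID>/task/<TID>/fd.
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if any fd under fd_dir links to sock_target ("socket:[inode]"). */
static int scan_fd_dir(const char *fd_dir, const char *sock_target)
{
    DIR *d = opendir(fd_dir);
    if (!d)
        return 0;
    struct dirent *e;
    char path[PATH_MAX], link[PATH_MAX];
    int found = 0;
    while (!found && (e = readdir(d)) != NULL) {
        snprintf(path, sizeof(path), "%s/%s", fd_dir, e->d_name);
        ssize_t n = readlink(path, link, sizeof(link) - 1);
        if (n <= 0)
            continue;
        link[n] = '\0';
        found = (strcmp(link, sock_target) == 0);
    }
    closedir(d);
    return found;
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <socket-inode>\n", argv[0]);
        return 1;
    }
    char target[64];
    snprintf(target, sizeof(target), "socket:[%s]", argv[1]);

    DIR *proc = opendir("/proc");
    struct dirent *p;
    while (proc && (p = readdir(proc)) != NULL) {
        if (p->d_name[0] < '0' || p->d_name[0] > '9')
            continue;                      /* only numeric PID entries */

        char taskdir[PATH_MAX];
        snprintf(taskdir, sizeof(taskdir), "/proc/%s/task", p->d_name);
        DIR *tasks = opendir(taskdir);
        if (!tasks)
            continue;
        struct dirent *t;
        while ((t = readdir(tasks)) != NULL) {
            if (t->d_name[0] < '0' || t->d_name[0] > '9')
                continue;                  /* only numeric TID entries */
            char fddir[PATH_MAX];
            snprintf(fddir, sizeof(fddir), "/proc/%s/task/%s/fd",
                     p->d_name, t->d_name);
            if (scan_fd_dir(fddir, target)) {
                printf("inode %s -> PID %s (TID %s)\n",
                       argv[1], p->d_name, t->d_name);
                closedir(tasks);
                closedir(proc);
                return 0;
            }
        }
        closedir(tasks);
    }
    if (proc)
        closedir(proc);
    fprintf(stderr, "inode %s not found\n", argv[1]);
    return 1;
}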

Another case is when the process is a forked child, for example those launched by systemd. As far as I can remember, in this case the reported PID was that of systemd.

@gustavo-iniguez-goya
Owner

Regarding the last case: for example, fwupdmgr launched by systemd (the fwupd-refresh systemd service) is not detected (using the proc method):

[2020-11-02 22:09:31]  DBG  new connection tcp => 55732:192.168.1.101 -> 151.101.122.49:443 uid: %!(EXTRA uint32=1000)
[2020-11-02 22:09:31]  DBG  [0/1] outgoing connection: 55732:192.168.1.101 -> 151.101.122.49:443 || netlink response: 55732:192.168.1.101 -> 151.101.122.49:443 inode: 50975687 - loopback: false multicast: false unspecified: false linklocalunicast: false ifaceLocalMulticast: false GlobalUni: true 
[2020-11-02 22:09:32]  DBG  new pid lookup took%!(EXTRA int=-1, time.Duration=731.053526ms)

netlink correctly dumps the inode of the connection, but the PID is not found. Using audit is more likely to succeed.

If you launch it manually (fwupdmgr refresh --force), then yes, it's detected.

I'm wondering if it has anything to do with the systemd sandboxing options.

@themighty1
Author

Thanks for the insights, that clarified a lot.
I added a loop to the opensnitch code to look up netlink/netstat again.
Alas, the code looped forever - no inodes were found.
This happens to very few connections when I add a new torrent to Transmission.

DBG new connection tcp => 45017: -> :27359 uid: %!(EXTRA uint32=4294967295)
DBG netlink socket error: Warning, no message nor error from netlink - 45017: -> :27359
DBG Searching for tcp6 netstat entry instead of tcp
DBG <== no inodes found, applying default action.

Have you seen this before? What do you think the cause may be?
I know that Transmission may act as a server, but in my case this was an outgoing connection from my IP to a destination IP.

@gustavo-iniguez-goya
Owner

Yeah, with Transmission it's fairly common to see those messages.

I have no idea really, but it could be failed connection attempts:

20468 1604450963.852980 connect(24, {sa_family=AF_INET, sin_port=htons(53998), sin_addr=inet_addr("87.173.23.163")}, 16) = -1 EINPROGRESS (Operation in progress)
20468 1604450963.854008 setsockopt(24, SOL_IP, IP_TOS, [0], 4) = 0
20468 1604450963.857135 close(24)       = 0

Maybe it happens so fast that by the time we query for it, the kernel has already deleted the entry.

I wrote some words regarding these issues here: #10 (comment) and here: #10 (comment).

@gustavo-iniguez-goya
Owner

Some more examples for future reference:

  • Outgoing connection, kernel netlink entry found, inode found, PID not found:
    - 188.64.117.35 -> 51413 (Transmission's source port, i.e. it seems to be a reply/incoming connection to Transmission)
[2020-11-04 10:35:31]  DBG  new connection tcp => 45327:192.168.1.101 -> 188.64.117.35:51413 uid: %!(EXTRA uint32=1000)
[2020-11-04 10:35:31]  DBG  [0/1] outgoing connection: 45327:192.168.1.101 -> 188.64.117.35:51413 || netlink response: 45327:192.168.1.101 -> 188.64.117.35:51413 inode: 1606906 - loopback: false multicast: false unspecified: false linklocalunicast: false ifaceLocalMulticast: false GlobalUni: true 
pkt.queue:  0
[2020-11-04 10:35:32]  DBG  new pid lookup took%!(EXTRA int=-1, time.Duration=678.414889ms)
[2020-11-04 10:35:32]  IMP  Added new rule: allow if dest.ip is '188.64.117.35'
[2020-11-04 10:35:32]  DBG  ✔  -> 188.64.117.35:51413 (allow-30s-simple-1886411735)
  • Connection as seen by the conntrack module (/proc/net/nf_conntrack):
    ipv4 2 tcp 6 117 SYN_SENT src=192.168.1.101 dst=188.64.117.35 sport=45327 dport=51413 [UNREPLIED] src=188.64.117.35 dst=192.168.1.101 sport=51413 dport=45327 mark=0
  • tcpdump:
 644	14.864217873	192.168.1.101	188.64.117.35	UDP	72	51413 → 51413 Len=30
 1382	17.847181084	192.168.1.101	188.64.117.35	UDP	72	51413 → 51413 Len=30
 18961	24.594680726	192.168.1.101	188.64.117.35	TCP	74	45327 → 51413 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=350883110 TSecr=0 WS=1024

So in this case, we should be able to find the PID. More than the PID not being found, what intrigues me here is why auditd does not detect it - or rather, it's probably detecting it, but for some reason we're not parsing the event correctly. I should probably analyze the auditd logs as well.

@themighty1
Author

Just want to report that I ran an endless loop dumping all TCP connections via netlink with NLM_F_DUMP and all TCP states (mask 0xfff) while adding a new torrent to Transmission.

Only for ~1/5 of the unknown connections would I find the source port of the unknown connection in my netlink dump.
I did this as a sanity check. At least now I have some confidence that Transmission's quick connect/close is only reflected in netlink momentarily: by the time opensnitch sends a request to netlink, the entry is no longer there.

Unfortunately, netlink doesn't provide a way to subscribe to new events for inet sockets; we can only poll it periodically.
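
For reference, a dump like the one described above boils down to a single NLM_F_DUMP request over NETLINK_SOCK_DIAG with idiag_states set to 0xfff. A minimal sketch (error handling omitted; not the opensnitch code, just an illustration of the interface):

// dump_tcp.c - sketch: dump all TCP sockets (every state, mask 0xfff)
// via NETLINK_SOCK_DIAG, printing ports, inode, uid and state.
#include <arpa/inet.h>
#include <linux/inet_diag.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);

    struct {
        struct nlmsghdr nlh;
        struct inet_diag_req_v2 req;
    } msg = {
        .nlh = {
            .nlmsg_len   = sizeof(msg),
            .nlmsg_type  = SOCK_DIAG_BY_FAMILY,
            .nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
        },
        .req = {
            .sdiag_family   = AF_INET,
            .sdiag_protocol = IPPROTO_TCP,
            .idiag_states   = 0xfff,   /* all TCP states */
        },
    };
    send(fd, &msg, sizeof(msg), 0);

    char buf[32768];
    for (;;) {
        int len = recv(fd, buf, sizeof(buf), 0);
        if (len <= 0)
            break;
        for (struct nlmsghdr *h = (struct nlmsghdr *)buf; NLMSG_OK(h, len);
             h = NLMSG_NEXT(h, len)) {
            if (h->nlmsg_type == NLMSG_DONE || h->nlmsg_type == NLMSG_ERROR) {
                close(fd);
                return 0;
            }
            struct inet_diag_msg *d = NLMSG_DATA(h);
            printf("sport=%u dport=%u inode=%u uid=%u state=%u\n",
                   ntohs(d->id.idiag_sport), ntohs(d->id.idiag_dport),
                   d->idiag_inode, d->idiag_uid, d->idiag_state);
        }
    }
    close(fd);
    return 0;
}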

@gustavo-iniguez-goya
Owner

The key here would be to use eBPF if it's available: https://github.com/iovisor/bcc/blob/master/tools/tcplife.py

There's a fork that integrated it into opensnitch; maybe we can reuse it. On the other hand, ideally we would use XDP to block connections, but it's a fairly new feature and it's not available on many systems.

@gustavo-iniguez-goya
Owner

connections which the first time around are not found via netlink nor via tcp netstat

I have no idea really, but it could be failed connection attempts:

Correct, in particular non-blocking failed connection attempts. Sometimes the connection will be discarded because it's not found via netlink, and other times it will pass all the checks until it fails to retrieve the PID of the process.

With this example you can reproduce the issue (port and IP captured by sniffing Transmission traffic):

// gcc cclient.c -o cclient
//
// Opens a non-blocking TCP socket, starts a connect() (which returns
// immediately with EINPROGRESS) and closes the socket right away.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define PORT 28979

int main(int argc, char *argv[])
{
    int sockfd;
    struct sockaddr_in their_addr;

    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
        perror("socket()");
        exit(1);
    }
    printf("Client socket() OK...\n");

    /* Make the socket non-blocking, so connect() returns immediately. */
    if (fcntl(sockfd, F_SETFL, O_NONBLOCK) < 0)
        perror("fcntl(O_NONBLOCK) failed");

    memset(&their_addr, 0, sizeof(their_addr));
    their_addr.sin_family = AF_INET;
    their_addr.sin_port = htons(PORT);
    inet_aton("5.180.62.91", &their_addr.sin_addr);

    /* On a non-blocking socket, connect() returns -1 with EINPROGRESS
     * while the handshake is still in flight - that's not an error. */
    if (connect(sockfd, (struct sockaddr *)&their_addr, sizeof(their_addr)) == -1
        && errno != EINPROGRESS) {
        perror("connect() error");
        exit(1);
    }
    printf("Client connect() is OK (in progress)...\n");

    /* Close immediately, before the handshake completes. */
    close(sockfd);
    return 0;
}

@themighty1
Author

Nice, thank you.

@gustavo-iniguez-goya
Owner

gustavo-iniguez-goya commented Dec 8, 2020

Regarding this problem, I've modified the ftrace monitor method to hook tcp/tcp_destroy_sock and sock/inet_sock_set_state instead of sched/sched_process_exec and sched/sched_process_fork.

The benefit of doing this is that we only cache and intercept PIDs that have created network activity, instead of caching every single process execution on the system. If we wanted to monitor whenever a new process is launched, we should do it via netlink (PROC_EVENT_EXEC, PROC_EVENT_FORK, PROC_EVENT_EXIT), so as not to rely on debugfs.
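
For the record, subscribing to those events goes through the netlink proc connector. A rough sketch of what that would look like (needs root, error handling omitted; not part of opensnitch, just an illustration of the API):

// proc_events.c - sketch: subscribe to exec/fork/exit events through the
// netlink proc connector.
#include <linux/cn_proc.h>
#include <linux/connector.h>
#include <linux/netlink.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);

    struct sockaddr_nl addr = {
        .nl_family = AF_NETLINK,
        .nl_groups = CN_IDX_PROC,
        .nl_pid    = getpid(),
    };
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* Ask the kernel to start multicasting process events. */
    struct {
        struct nlmsghdr nlh;
        struct cn_msg cn;
        enum proc_cn_mcast_op op;
    } __attribute__((packed)) sub = {
        .nlh = { .nlmsg_len = sizeof(sub), .nlmsg_type = NLMSG_DONE,
                 .nlmsg_pid = getpid() },
        .cn  = { .id = { .idx = CN_IDX_PROC, .val = CN_VAL_PROC },
                 .len = sizeof(enum proc_cn_mcast_op) },
        .op  = PROC_CN_MCAST_LISTEN,
    };
    send(fd, &sub, sizeof(sub), 0);

    char buf[4096];
    for (;;) {
        if (recv(fd, buf, sizeof(buf), 0) <= 0)
            break;
        struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
        struct cn_msg *cn = NLMSG_DATA(nlh);
        struct proc_event *ev = (struct proc_event *)cn->data;

        switch (ev->what) {
        case PROC_EVENT_EXEC:
            printf("exec: pid %d\n", ev->event_data.exec.process_pid);
            break;
        case PROC_EVENT_FORK:
            printf("fork: parent %d -> child %d\n",
                   ev->event_data.fork.parent_pid,
                   ev->event_data.fork.child_pid);
            break;
        case PROC_EVENT_EXIT:
            printf("exit: pid %d\n", ev->event_data.exit.process_pid);
            break;
        default:
            break;
        }
    }
    close(fd);
    return 0;
}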

On the other hand, inet_sock_set_state logs the source/destination ports and IPs of new connections along with the PID of the process, so we can match new outgoing connections with this data:

new outgoing connection:
192.168.1.134:51413 -> 47.188.48.32:57949

inet_sock_set_state:
ADD: pid:22825 inet_sock_set_state -> map[daddr:47.188.48.32 daddrv6:::ffff:47.188.48.32 dport:57949 family:AF_INET oldstate:TCP_CLOSE protocol:IPPROTO_TCP saddr:192.168.1.134 saddrv6:::ffff:192.168.1.134 sport:51413] Key: 192.168.1.134:51413 47.188.48.32:57949

It's not bulletproof. Sometimes the source port is 0 (probably when the connection fails to establish), so the new outgoing connection doesn't match. But still, it seems to work far better than the current method.
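
For future reference, the raw events this method consumes can also be watched by hand by enabling the tracepoint under tracefs. A minimal sketch (needs root; it assumes tracefs is mounted at /sys/kernel/debug/tracing, which varies between distributions):

// trace_sock_state.c - sketch: enable the sock/inet_sock_set_state
// tracepoint and stream its events from the ftrace pipe.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define TRACING "/sys/kernel/debug/tracing"

int main(void)
{
    /* Turn the tracepoint on. */
    int en = open(TRACING "/events/sock/inet_sock_set_state/enable", O_WRONLY);
    if (en < 0 || write(en, "1", 1) != 1) {
        perror("enable tracepoint");
        return 1;
    }
    close(en);

    /* Each event line carries pid, protocol, old/new state, saddr/daddr and
     * sport/dport - the data that gets matched against new outgoing connections. */
    FILE *pipe = fopen(TRACING "/trace_pipe", "r");
    if (!pipe) {
        perror("open trace_pipe");
        return 1;
    }
    char line[1024];
    while (fgets(line, sizeof(line), pipe))
        fputs(line, stdout);

    fclose(pipe);
    return 0;
}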
