Some TCP keepalives corrupt the extracted data streams #253

Open
kjgrahn opened this issue Jun 18, 2023 · 4 comments

@kjgrahn

kjgrahn commented Jun 18, 2023

This bug report is unfortunately vague, but it might interest you anyway since it's a pure TCP segment assembly problem, having nothing to do with HTTP et cetera.

At work we asked for and got pcap files from a customer. My plan was to use tcpflow to extract the TCP streams for further processing, but when I did that, I discovered corruption of single octets here and there.

The TCP connection used SO_KEEPALIVE heavily, with maybe 5 seconds between probes. I also have reason to believe one peer was running some BSD derivative (because the customer calls these machines "SOMETHING-BSD").

What I think happened was:

  • The peer used the old-fashioned TCP keepalive mechanism mentioned in Stevens' books[1], where the last acked octet is retransmitted as a 1-octet segment.
  • The peer chose to send a random octet in the probe, since the real one was already acked and forgotten.
  • tcpflow chose this later (random) octet instead of the first (acked) one, and thus didn't record the same stream as an application would see. For every segment followed by a keepalive probe, the last octet was mangled.
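
The suspected failure mode can be illustrated with a toy reassembler. This is a minimal sketch and not tcpflow's actual code: the `reassemble` function, its `first_wins` flag, and the sample segments are all hypothetical, and sequence numbers are simplified to offsets from the start of the stream.

```python
def reassemble(segments, first_wins=True):
    """Rebuild a byte stream from (relative_seq, payload) pairs in arrival order.

    Hypothetical toy model: assumes no gaps and no sequence wraparound.
    """
    stream = {}
    for seq, payload in segments:
        for offset, byte in enumerate(payload):
            pos = seq + offset
            if first_wins and pos in stream:
                # This octet was already received (and acked), so a real
                # receiver would discard the retransmitted copy.
                continue
            stream[pos] = byte
    return bytes(stream[pos] for pos in sorted(stream))


segments = [
    (0, b"HELLO"),  # real data, acked by the receiver
    (4, b"\x9c"),   # keepalive probe: re-sends position 4 with a garbage octet
    (5, b"WORLD"),  # more real data
]

print(reassemble(segments, first_wins=False))  # b'HELL\x9cWORLD'
print(reassemble(segments, first_wins=True))   # b'HELLOWORLD'
```

With the last-wins policy the final octet of the first segment is mangled, which matches the symptom described; with first-wins the output is what the receiving application would have seen.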

I understand you'd like an example pcap file, but I cannot distribute the data. I spent some time at home trying to reproduce this with OpenBSD, but it seems not to have this variant of keepalives. Linux of course doesn't. I suppose the customer used either NetBSD or FreeBSD, possibly an ancient release.

Another sad fact is that I used an ancient tcpflow: the one in RHEL7, so I guess it would have been 1.4.5. I see keepalive support was added before that, in tcpflow-1.4.0beta1-129-g9915ef4. I could have used tcpflow from Ubuntu 22 and maybe I did, but I cannot easily find out now (this all happened in April).

[1] Quoting Stevens (TCP/IP Illustrated vol 1, p 335):

Some older implementations based on 4.2BSD do not respond to
these keepalive probes unless the segment contains data. Some
systems can be configured to send one garbage byte of data in the
probe to elicit a response. The garbage byte causes no harm,
because it's not the expected byte (it's a byte that the receiver
has previously received and acknowledged) so it's thrown away by
the receiver. Other systems [...]
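
Going by that description, a probe could be recognized by its sequence number alone: it carries zero or one octet and starts one octet before the next sequence number expected from that sender. The check below is a hedged sketch of that idea; the function name, parameters, and flag constants are assumptions made for illustration, not anything taken from tcpflow.

```python
SYN, RST, FIN = 0x02, 0x04, 0x01  # TCP flag bits


def is_keepalive_probe(seq, payload_len, next_expected_seq, flags=0):
    """Return True if a segment looks like a keepalive probe.

    Hypothetical helper: `next_expected_seq` is the sequence number the
    receiver expects next from this sender. Arithmetic is modulo 2**32 so
    the check still works across sequence-number wraparound.
    """
    if flags & (SYN | RST | FIN):
        return False
    one_before = (next_expected_seq - 1) % 2**32
    return payload_len in (0, 1) and seq % 2**32 == one_before


print(is_keepalive_probe(1004, 1, 1005))  # garbage-byte probe
print(is_keepalive_probe(1005, 5, 1005))  # ordinary in-order data
```

A reassembler that applied such a check could drop the probe's payload instead of writing it into the stream.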

@simsong
Owner

simsong commented Jul 2, 2023

Hi. I understand that you cannot post the pcap from your confidential data. However, perhaps you can use the system in question to create a PCAP file that exhibits the problem? If you cannot use the system in question, perhaps you could spin up a RHEL7 system somewhere? I simply cannot debug this without a pcap file that demonstrates the problem.

Thanks.

@kjgrahn
Author

kjgrahn commented Jul 3, 2023

I'll see what I can do, but it won't be easy. It's not RHEL7 that I need mainly, but a system with that 4.2BSD quirk in its TCP implementation and I don't know which ones have it ...
I was hoping you'd immediately see the bug (remember how you reasoned about overlapping segments) but I wouldn't start changing that without test data, either ...
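
One possible way around needing a system with the quirk is to hand-craft a capture containing a garbage-byte probe. The script below is an untested-against-tcpflow sketch that uses only the Python standard library; the addresses, ports, sequence numbers, payloads, and file name are all made up, and whether the resulting file actually triggers the corruption is an open question.

```python
import struct

SYN, PSH, ACK = 0x02, 0x08, 0x10
A, B = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])  # made-up client and server


def checksum(data: bytes) -> int:
    """Internet checksum: ones' complement of the ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def frame(src, dst, sport, dport, seq, ack, flags, payload=b""):
    """Build one Ethernet/IPv4/TCP frame with valid checksums."""
    tcp_len = 20 + len(payload)
    tcp = struct.pack("!HHIIHHHH", sport, dport, seq, ack,
                      (5 << 12) | flags, 65535, 0, 0)
    pseudo = src + dst + struct.pack("!BBH", 0, 6, tcp_len)
    tcp = tcp[:16] + struct.pack("!H", checksum(pseudo + tcp + payload)) + tcp[18:]
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + tcp_len, 0, 0, 64, 6, 0, src, dst)
    ip = ip[:10] + struct.pack("!H", checksum(ip)) + ip[12:]
    # The same dummy MAC addresses are used in both directions for brevity.
    eth = b"\x02\x00\x00\x00\x00\x02" + b"\x02\x00\x00\x00\x00\x01" + b"\x08\x00"
    return eth + ip + tcp + payload


def build_frames():
    """Return (timestamp_seconds, frame) pairs for a short conversation."""
    def c2s(seq, ack, flags, data=b""):
        return frame(A, B, 40000, 5000, seq, ack, flags, data)

    def s2c(seq, ack, flags, data=b""):
        return frame(B, A, 5000, 40000, seq, ack, flags, data)

    return [
        (0, c2s(1000, 0, SYN)),
        (0, s2c(5000, 1001, SYN | ACK)),
        (0, c2s(1001, 5001, ACK)),
        (1, c2s(1001, 5001, PSH | ACK, b"HELLO")),
        (1, s2c(5001, 1006, ACK)),
        (6, c2s(1005, 5001, ACK, b"\x9c")),  # probe: last acked position, garbage octet
        (6, s2c(5001, 1006, ACK)),
        (7, c2s(1006, 5001, PSH | ACK, b"WORLD")),
        (7, s2c(5001, 1011, ACK)),
    ]


def pcap_bytes(frames):
    """Serialize frames as a classic little-endian pcap file (linktype Ethernet)."""
    out = [struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)]
    for usec, (sec, data) in enumerate(frames):
        out.append(struct.pack("<IIII", sec, usec, len(data), len(data)))
        out.append(data)
    return b"".join(out)


if __name__ == "__main__":
    with open("keepalive.pcap", "wb") as f:
        f.write(pcap_bytes(build_frames()))
```

Running something like `tcpflow -r keepalive.pcap -o out/` on the result should show whether the client-to-server flow comes out as `HELLOWORLD` or with the `O` replaced by the garbage octet.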

@kjgrahn kjgrahn closed this as completed Jul 3, 2023
@kjgrahn
Author

kjgrahn commented Jul 3, 2023

Sorry, I'm not familiar with this issue tracker, and didn't intend to close the issue. Please reopen.

@simsong simsong reopened this Jul 3, 2023
@simsong
Owner

simsong commented Jul 3, 2023

> I'll see what I can do, but it won't be easy. It's not RHEL7 that I need mainly, but a system with that 4.2BSD quirk in its TCP implementation and I don't know which ones have it ... I was hoping you'd immediately see the bug (remember how you reasoned about overlapping segments) but I wouldn't start changing that without test data, either ...

By policy, I won't make changes without having test data so that the bug and then the fix can both be validated.
