Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing memory leaks when a simultaneous close is happening #489

Merged
merged 11 commits into from Jul 27, 2022

Conversation

TheLortex
Copy link
Member

When the client and the server want to close a connection at the same time, they both send a FIN.

I have identified situations where mirage-tcpip might keep the pcb forever, stuck in closing states, causing a memory leak.
This might be linked to #488.

This PR adds two tests exposing the behavior.

Test 1: close_ack_scenario

Both client and server sends a FIN, then they ACK each other's FIN. The state transition for the server is:

  • Established - Send_fin(1) -> Fin_wait_1(1): server FIN is sent
  • Fin_wait_1(1) - Recv_ack(1) -> Fin_wait_1(1): previous packet is acked
  • Fin_wait_1(1) - Recv_fin -> Closing(1): client FIN is received
  • Closing(1) - Recv_ack(2) -> Time_wait: server FIN is acked

The problem is that the connection state would stay in Time_wait forever.
This is fixed by commit Add timewait timeout for Closing -> Time_wait transition.

Test 2: close_reset_scenario

Now, we assume the client doesn't ackowledge the FIN. The connection is then in a Closing(_) state.

I have encountered a situation where it would send a RST instead with a valid but not expected sequence number. In that case, check_valid_segment returns ChallengeAck and writes into q.rx_data but because we are in a closing state that mailbox variable is not read anymore (as in flow.ml Rx.thread would have returned). So the challenge ack is not sent and the connection stay in this zombie state.

The fix for this is to use the send_ack mailbox variable that can be used to send an empty ack even if the rx thread is stopped:
enable challenge ack even after FIN

Untested: What if the client stops communicating ?

In this Closing(_) state, we are waiting for the client to ACK the server's FIN. If nothing happens, the retransmission mechanism should re-send the FIN. Instead, the retransmission timer is currently stopped if the connection is in the Closing(_) state

Add retransmit timer when in a Closing state

Disclaimer

TCP is hard ! I would be very happy if someone can review these changes and make sure nothing is broken with these patches.

@haesbaert
Copy link
Member

This is a great find ! 🥇
I'm back from 🌴 next week and will have a look

@balrajsingh
Copy link
Member

balrajsingh commented Jul 1, 2022 via email

@ansiwen
Copy link

ansiwen commented Jul 2, 2022

Thanks folks for looking into this and thanks a lot @TheLortex for the great work so far!

Some context: This work resulted from a memory leak that we only can observe with an unikernel running as a Muen subject (=VM). As a kvm instance I can't reproduce the issue. Also it only shows up when the requests (from curl in this case) are directly done on the network interface connected to the server hardware Muen is running on. If the request comes from another routed network (from a container for example) it also doesn't show up. The sad news is: If I didn't do something wrong, this fix still doesn't fix our issue. There seems to be more luring. Edit: this patch fixes this! But still there could be more lurking. So it would be great if you folks could look for something more that might be leaking.

A detail that might be relevant, is that the clock provided by the Muen kernel has a reduced resolution for security reasons. Not sure this is really relevant, but just to keep it in mind.

Here some info from our internal ticket:

I created pcap files and debug logs for both cases. As you can see, the debug logs show, that in the leaking case (right side), the pcb entry is not removed:
Bildschirmfoto 2022-06-27 um 16 11 52

At the same time the pcap traces show that in the leaking case only one RST package is sent, in the non-leaking case there are four, but I'm not sure if this is relevant:

without leak:
trace-no-leak
with leak:
trace-leak

Here the pcap files with 1000 requests for both cases (with and without leak):
leak-info-1000-without-leak.pcap.zip
leak-info-1000-with-leak.pcap.zip

EDIT: I can also move this to a new issue, if you prefer to merge this PR as is.

@haesbaert
Copy link
Member

haesbaert commented Jul 4, 2022

If I read this right, in

let tstr s (i:action) =

It is always an error to transition to TIME_WAIT purely, without actually arming the timeout.
Looking at Stevens I also don't see any transition to TIME_WAIT where the timeout shouldn't be armed.

@haesbaert haesbaert self-requested a review July 4, 2022 16:29
Copy link
Member

@haesbaert haesbaert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a partial review with some questions, I still need to understand the ACK dance, but I'll do it tomorrow.

src/tcp/state.ml Outdated Show resolved Hide resolved
src/tcp/state.ml Outdated Show resolved Hide resolved
src/tcp/flow.mli Outdated Show resolved Hide resolved
src/tcp/segment.ml Outdated Show resolved Hide resolved
@ansiwen
Copy link

ansiwen commented Jul 6, 2022

Not sure if that detail is important, but please note, that the last of several RSTs in the non-leaking case has another Seq value. There is no RST with Seq=805 in the leaking case. Maybe it's broken in both cases, but the additional RST "rescues" the situation?

@TheLortex
Copy link
Member Author

Yes exactly. RST with Seq=804 is supposed to trigger a challenge ACK, as it is not the next expected Seq number, as the previous FIN also uses Seq=804. That's why only RST with Seq=805 is handled.

@hannesm
Copy link
Member

hannesm commented Jul 7, 2022

I have not looked into the details of this PR, but @ansiwen reported there's still a leak: is the pcap the same as before this patch, or is there a difference?

@ansiwen
Copy link

ansiwen commented Jul 7, 2022

Not sure about pcap, would need to create it, but at least the log output was 100% identical.

@ansiwen
Copy link

ansiwen commented Jul 18, 2022

A short heads-up that this fix does indeed fix the leak we are observing. Will update with more details tomorrow.

@ansiwen
Copy link

ansiwen commented Jul 19, 2022

Because mirage ignored the pin in config.ml I didn't actually try the patched version before. I pinned now directly in opam and then it finally worked!

Here is how the debug log looks like now (right) compared to before (left):
Bildschirmfoto 2022-07-19 um 17 15 46

The network trace looks like this now:
Bildschirmfoto 2022-07-19 um 16 24 29

leak-info-1000-baremetal-with-fix4.pcap.zip

if Lwt_mvar.is_empty q.rx_data
then Lwt_mvar.put q.rx_data (Some [], Some Sequence.zero)
if Lwt_mvar.is_empty q.send_ack
then Lwt_mvar.put q.send_ack Sequence.zero
else Lwt.return_unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly, the original Ack.t stored in Flow.flow (aka Flow.pcb) is not reachable through Segment.Rx.t, which is a shame, maybe ack should have been stored deeper.

So it begs the question why is your send_ack not a Ack.t, so instead of pushing something directly on a mailbox you could do the same Ack.pushack dance that the Rx.thread do, something like:

  let send_challenge_ack q =
    (* TODO:  rfc5961 ACK Throttling *)
    Ack.pushack q.ack Sequence.zero

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the Ack module. But indeed it looks like it's doing similar !

I have added a commit that uses the pushack function as suggested.
Now the Segment.Rx module is functorized against the Ack.M signature.

Copy link
Member

@haesbaert haesbaert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good to me, I just hope my suggestion didn't break the fix somehow :).
Thanks for the work, this is a great find !

@haesbaert
Copy link
Member

Can we get another OK for this ?

@TheLortex
Copy link
Member Author

@hannesm or @balrajsingh would one you be available to review the patch ? We're now confident that it fixes Nitrokey's issue.

@hannesm
Copy link
Member

hannesm commented Jul 26, 2022

Code looks fine (though I'm still not an expert at this TCP/IP stack), @ansiwen reports it solves the issue. Would you mind to write an entry for CHANGES and cut a release @TheLortex (I can as well do the release if you like).

@hannesm
Copy link
Member

hannesm commented Jul 26, 2022

Question for @ansiwen: did you test 1813264 or which commit?

@TheLortex
Copy link
Member Author

Would you mind to write an entry for CHANGES and cut a release @TheLortex (I can as well do the release if you like).

Great, I'll take care of that.

@ansiwen tested on 1bd6270, and normally the additional commit 1813264 does not change the behavior of the stack

@balrajsingh
Copy link
Member

I mainly looked to see if the send_challenge_ack change breaks anything else - this is the central change as I understand it, please let me know if not so.

I think that the fix is good. It is specific to the challenge ACK situation only, when no data needs to be pushed up to the application. Your test is also convincing. I too agree that the change can be merged.

And @TheLortex, yes TCP is hard. So glad that you are looking at it in such detail. Many thanks!

@ansiwen
Copy link

ansiwen commented Jul 26, 2022

@ansiwen tested on 1bd6270, and normally the additional commit 1813264 does not change the behavior of the stack

No, actually I tested still on b9b88d9, I will test tomorrow again with the latest commit.

@TheLortex
Copy link
Member Author

I have added a changelog entry and I am ready to cut a patch release today when @ansiwen confirms that the latest commit still works !

CHANGES.md Outdated Show resolved Hide resolved
@ansiwen
Copy link

ansiwen commented Jul 27, 2022

I can confirm that 1813264 still fixes the issue.

@dinosaure
Copy link
Member

Let's merge and cut a release so! Thanks all for your work.

@dinosaure dinosaure merged commit efbebdf into mirage:main Jul 27, 2022
dinosaure added a commit to dinosaure/opam-repository that referenced this pull request Jul 27, 2022
CHANGES:

* TCP: fix memory leaks on connection close in three scenarios (mirage/mirage-tcpip#489 @TheLortex)
  - simultanous close: set up the timewait timer in the `Closing(1) - Recv_ack(2) -> Time_wait`
    state transition
  - client sends a RST instead of a FIN: enable sending a challenge ACK even when the reception
    thread is stopped
  - client doesn't ACK server's FIN: enable the retransmit timer in the `Closing(_)` state
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants