Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TNFS TCP connection not reconnecting when timed out by client #725

Open
tschak909 opened this issue May 2, 2024 · 3 comments
Open

TNFS TCP connection not reconnecting when timed out by client #725

tschak909 opened this issue May 2, 2024 · 3 comments
Assignees

Comments

@tschak909
Copy link
Collaborator

The TNFS client is not detecting that the TCP connection has been closed, and attempts to use it.

TNFSD also needs checking.

-Thom @trekawek

@tschak909 tschak909 self-assigned this May 2, 2024
@trekawek
Copy link
Contributor

trekawek commented May 3, 2024

tnfsd offers a built-in mechanism for removing expired sessions. An expired session is the one where the last contact was made in more than SESSION_TIMEOUT, by default 6h. The session is removed then, but the TCP connection is not disconnected, so client may think it's still connected.

Also, mechanism for removing old sessions runs when the new client is connected, so it may give impression that new clients are removing the old ones.

This probably can be reproduced by:

  1. Decreasing the SESSION_TIMEOUT e.g.: to 60s, so the issue is easier to reproduce.
  2. Connect to tnfsd with FujiNet, wait for 60s (client 1).
  3. Connect to tnfsd with another FujiNet or Python client (client 2).

This should allow to confirm whether client 1 and/or client 2 behaves as expected or not.

Once this is reproduced, we can:

  1. Disconnect TCP when the session expires.
  2. Validate that FujiNet tnfs behaves correctly when it's disconnected by tnfsd after such timeout. Correct behaviour is reconnecting with a new session id.

@trekawek
Copy link
Contributor

trekawek commented May 5, 2024

I investigated the problem a little bit further and now I think the cause lies elsewhere.

The session timeout works fine, it shouldn't disconnect TCP sockets, because a single socket may handle multiple sessions. In fact, we should disable it for the TCP connections completely, because if a connection is alive, so should be the session. This mechanism makes sense for the UDP only, where we don't know if the session is alive or not if it's idle. I created FujiNetWIFI/tnfsd@98ea15a to address that, but it's not the root cause of the problem.

There are 2 factors causing problems around timeouts and reconnects:

  • TNFS server doesn't allow to restore sessions once TCP connection is lost and restored,
  • TNFS client in FujiNet is not able to handle invalid session error coming from the server (this is a general problem for UDP and TCP).

TNFS server

If the TCP connection is disconnected, TNFS server will remove all related sessions. As a result, if the same client connects again and tries to use the same session id, they'll receive invalid session error (and most likely they won't know how to handle it, see the section about the client).

In the FujiNetWIFI/tnfsd#1 I changed the behaviour of the server - after that it will no longer remove the sessions on TCP disconnection, but only unset the socket FD.

Breaking TCP connections can be tested by staring tnfsd on a non-standard port 16385 and putting https://github.com/Shopify/toxiproxy between tnfsd and FujiNet client:

tnfsd -p 16385 &
toxiproxy-server &
toxiproxy-cli create --listen 0.0.0.0:16384 --upstream localhost:16385 tnfs

With this command, toxiproxy will automatically redirect traffic from its own 16384 port to tnfsd 16385. Now we can bring the connection down/up:

toxiproxy-cli toggle tnfs

or introduce latency:

toxiproxy-cli toxic add -t latency -n tnfs_latency -a latency=100 -a jitter=50 tnfs

After the PR above, TNFS client is able to get it session back, even after TCP connection is lost.

TNFS client

TNFS client currently is not able to handle invalid session error coming from the server. Apart from the disconnected TCP socket (resulting in the removed sessions, as above), this may happen if the server is restarted or the session expires after 6h (see my previous comment), regardless of the used protocol: TCP or UDP.

The code responsible for handling such cases exists, but now it's disabled for the FujiNet hardware (it's only active on PC):

#ifndef ESP_PLATFORM
// TODO review session recovery
// Check for invalid (expired) session
else if (pkt.payload[0] == TNFS_RESULT_INVALID_HANDLE \
&& pkt.command != TNFS_CMD_MOUNT \
&& pkt.command != TNFS_CMD_UNMOUNT)

It tries to create a new session if the old one is invalid. I tested it and it's able to handle the lost session gracefully. It's enough to e.g.: keep browsing the TNFS directory tree. Unfortunately, handles to all opened files are lost, as they are kept server-side in the session data.

I created #727 to enable this feature. While testing it, I found another minor problem in the tnfsd, which won't return any response if the provided FD is invalid. It's fixed in the FujiNetWIFI/tnfsd@c1a3597.

Also, b772f99 adds support for _udp. prefix. It can be useful for the testing purposes.

@trekawek
Copy link
Contributor

trekawek commented May 6, 2024

I added one final commit 874f0be to the #727. It starts a periodic timer for every mounted TNFS filesystem and sends a keep-alive message every 60s.

This way:

  • the UDP sessions won't be expired on the tnfsd even if they are idle for >6h, if only the FujiNet is running,
  • the TCP sessions which has been disconnected due to network conditions and then idle for >6h, won't be expired, because the keep-alive will reconnect the TCP socket and recover the session server-side.

Long story short: this should allow mounting an ATR image and leave the computer overnight. The next morning the mounted ATR should be still usable.

The only case where it wouldn't be usable is the tnfsd server restart.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants