
Stateful connections #2

Open
scd31 opened this issue Oct 31, 2020 · 17 comments

Comments

scd31 commented Oct 31, 2020

Are there any plans to add stateful connections to this library? I noticed they aren't on the roadmap. Alternatively, if I implemented stateful connections, would a PR be accepted?

Awesome project, by the way!

thombles (Owner) commented Nov 1, 2020

Hi there, thanks for your interest in this somewhat neglected project! :) While I'm not personally very excited by stateful AX.25 connections, it's definitely in scope for this crate and it would be nice to have a platform-agnostic implementation in the Rust ecosystem. If you want to have a go writing it I'll certainly review it for you and merge it in.

If you go down that path I just want to confirm some basic architectural things. I always figured that a higher level protocol could be implemented by wrapping the existing frame-level tnc::Tnc inside something else that provides a different kind of connect/send/receive. In your view, would that be a sensible division of responsibilities for this task? Are there any additional features needed on the lower-level object (maybe other KISS params?) to make the connected mode work? I'm willing to help with that kind of preparation if it lets you focus on the actual protocol.

Secondly, for simplicity (and low performance requirements) I used a very simple blocking interface. Tnc is Clone, so multiple I/O tasks are handled simply with multiple threads. How do you think you would communicate async events, like "connection attempt failed", or "the message you sent was acknowledged by the other side"? I have no strong feelings either way about whether you use a blocking interface or an async one, but I would like to know if the Tnc needs an async version.

So no pressure. I'm unlikely to do this work. Let me know if you're tackling it and I'll do what I can to clear your path, but otherwise these notes can stay here for anybody who wants to pick it up.

scd31 (Author) commented Nov 1, 2020

Here's roughly what I was thinking:

The Tnc struct would have 2 additional methods - one to accept a connection, and one to attempt a connection. accept_connection would return a Connection struct, while attempt_connection would return Option<Connection>, which would be None if a connection could not be established.

The Connection struct would have a copy of the Tnc that established the connection. It would also contain some state machines to keep track of the connection, and a queue containing inbound messages. There would be methods for disconnecting, sending, and receiving (read: pulling a message off the queue). I think it makes sense for the disconnect method to consume the Connection.

By default, the Connection cannot be shared across threads. However, the user would be free to wrap it in an Rc<RefCell<_>> to allow this, at a slight performance penalty. Considering how fast computers are vs the speed of AX.25, I don't think this penalty will be an issue.

Tnc would not be modified outside of the two methods described above. I personally don't see any use in having an async version - I think it makes sense when you want to maximize throughput vs CPU, but in this case the radio will always be the bottleneck.
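Roughly, the shape I'm picturing is something like this (all names here are placeholders, nothing final, and the state-machine internals are elided):

```rust
use std::collections::VecDeque;

// Placeholder stand-ins for the crate's real types.
pub struct Tnc;
pub struct Address;

/// A stateful AX.25 connection.
pub struct Connection {
    tnc: Tnc,                   // copy of the Tnc that established us
    inbound: VecDeque<Vec<u8>>, // queue of received information fields
    // ...plus the connection state machine (sequence numbers, timers)
}

impl Tnc {
    /// Block until a remote station connects to us.
    pub fn accept_connection(&self) -> Connection {
        unimplemented!()
    }

    /// Try to connect to a remote station; None if it could not be established.
    pub fn attempt_connection(&self, _remote: Address) -> Option<Connection> {
        unimplemented!()
    }
}

impl Connection {
    /// Queue data for transmission on this connection.
    pub fn send(&mut self, _data: &[u8]) {
        unimplemented!()
    }

    /// Pull the next received message off the inbound queue.
    pub fn receive(&mut self) -> Option<Vec<u8>> {
        self.inbound.pop_front()
    }

    /// Disconnecting consumes the Connection.
    pub fn disconnect(self) {
        unimplemented!()
    }
}
```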

How does that sound to you?

thombles (Owner) commented Nov 1, 2020

Cool, that sounds okay. Just two thoughts:

There would be methods for disconnecting, sending, and receiving (read: pulling a message off the queue).

While the connection is operational it needs to do certain tasks spontaneously (more strictly, on a timer) like retransmissions and ACKs. Would you do this on a background thread, or would the user of the Connection be required to call a receive() or run() method in a loop, to ensure it can do this other protocol bookkeeping work at the same time?

accept_connection would return a Connection struct, while attempt_connection would return Option<Connection>, which would be None if a connection could not be established.

I think both of these will need to be a Result type since they are both fallible. e.g. if we lose the TCP connection to our KISS TNC then our attempt to accept a connection will fail; similarly if we fail to initiate a connection it would be good to explain why with an Error type showing "timed out" vs "connection rejected by remote peer" or some other cause.
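Something like this is what I have in mind, purely as a sketch (ConnectError and its variants are made-up names, not anything in the crate yet):

```rust
use std::io;

// Placeholder stand-ins so the sketch is self-contained.
pub struct Tnc;
pub struct Address;
pub struct Connection;

/// Hypothetical error type covering the failure causes above.
#[derive(Debug)]
pub enum ConnectError {
    /// Lost the serial/TCP link to the KISS TNC.
    TncIo(io::Error),
    /// The remote station never answered our connection request.
    Timeout,
    /// The remote station explicitly refused the connection.
    Rejected,
}

impl Tnc {
    pub fn accept_connection(&self) -> Result<Connection, ConnectError> {
        unimplemented!()
    }

    pub fn attempt_connection(&self, _remote: Address) -> Result<Connection, ConnectError> {
        unimplemented!()
    }
}
```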

In any case I'm pretty comfortable with what you're describing. If you want to run anything past me please do but otherwise I'm happy to wait and see what you come up with!

scd31 (Author) commented Nov 1, 2020

I thought about it some more, and the Tnc struct should hold a Vec<Rc<RefCell<Connection>>>. I think this is needed so that the Tnc struct knows how to delegate received messages to the different connections. It also means the Tnc struct can handle ACKs and retransmit automatically.

Your logic makes sense and I agree a Result type is a better idea. I think the Err type should be an enum with all the different possible errors - thoughts? I'd think there are only a few (an error from the serial/networking/etc. layer, and a timeout from the peer).

thombles (Owner) commented Nov 1, 2020

I thought about it some more, and the Tnc struct should hold a Vec<Rc<RefCell<Connection>>>. I think this is needed so that the Tnc struct knows how to delegate received messages to the different connections. It also means the Tnc struct can handle ACKs and retransmit automatically.

Ah you raise an interesting point. The simple Tnc-clone approach doesn't help at all when there are multiple consumers of the incoming frames. Let's imagine an example application that might want to use this library—a packet radio conversation GUI, where you can have multiple tabs for simultaneous connections to remote stations, plus a monitoring tab that shows every single received frame, whether or not it's part of a connection.

Option 1: Totally distinct Tnc and Connection

It would be possible to write an app using Tnc how it is. The app developer spawns a thread which calls the blocking Tnc::receive_frame() in a loop. For each frame received, the app sends one copy over to the monitoring tab. Then it loops over each Connection and calls Connection::handle_incoming(frame). The Connection would ignore any frames that are not actually relevant to it (source and destination don't match). Having fanned out all the data, it loops and calls receive_frame() again.

Each Connection would internally contain a copy of the Tnc so that it can send outgoing messages whenever it wants. Each Connection would also spawn its own thread that it uses for retransmission timers, etc.

Overall this option is a bit clunky from the app developer's perspective, but modular and powerful.
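As a rough sketch of what the app developer's receive loop would look like under this option (all types and methods here are stand-ins, not existing API):

```rust
use std::sync::mpsc::Sender;

// Stand-in types so the sketch is self-contained.
#[derive(Clone)]
pub struct Ax25Frame;
pub struct Tnc;
pub struct Connection;

impl Tnc {
    /// Blocking receive, as in the current frame-level API.
    pub fn receive_frame(&self) -> std::io::Result<Ax25Frame> {
        unimplemented!()
    }
}

impl Connection {
    /// Ignores frames whose addresses don't match this connection.
    pub fn handle_incoming(&mut self, _frame: &Ax25Frame) {
        unimplemented!()
    }
}

/// The fan-out loop the app developer runs on a dedicated thread.
fn receive_loop(tnc: Tnc, monitor: Sender<Ax25Frame>, connections: &mut [Connection]) {
    loop {
        let frame = match tnc.receive_frame() {
            Ok(f) => f,
            Err(_) => break, // lost the TNC; give up
        };
        // One copy for the monitoring tab...
        let _ = monitor.send(frame.clone());
        // ...and offer the frame to every open connection.
        for conn in connections.iter_mut() {
            conn.handle_incoming(&frame);
        }
    }
}
```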

Option 2: Bring the thread into the Tnc

Perhaps it would be nice for the crate to be more integrated. We could replace Tnc::receive_frame() with a new method Tnc::incoming(), which returns a Receiver<Ax25Frame>. You could call this multiple times to register more and more channels that want to receive incoming data. As soon as the Tnc is opened, it creates its own background thread which is calling an internal version of receive_frame in a loop. For each subscriber it would send a copy of the Ax25Frame into the corresponding Sender.

Our hypothetical app now becomes a bit simpler. We open the Tnc, call incoming(), and give that Receiver to our monitoring tab. That's now taken care of. Then each time we open a Connection, when it stores the Tnc inside it internally calls incoming(). Since the message is delivered on a background thread, the app developer doesn't have to do anything to pass messages around—they can focus purely on sending and receiving user messages, by interacting with the Connection.
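A sketch of what the subscription mechanics could look like, with stand-in types and a dummy receive loop (nothing here is existing API):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Clone)]
pub struct Ax25Frame;

/// Stand-in Tnc that keeps a shared list of subscribers.
#[derive(Clone)]
pub struct Tnc {
    subscribers: Arc<Mutex<Vec<Sender<Ax25Frame>>>>,
}

impl Tnc {
    /// Register another channel that will receive a copy of every frame.
    pub fn incoming(&self) -> Receiver<Ax25Frame> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Started once when the Tnc is opened.
    fn spawn_receive_thread(&self) {
        let subs = Arc::clone(&self.subscribers);
        thread::spawn(move || loop {
            // In the real thing this would block on the KISS interface;
            // here a dummy frame stands in for receive_frame().
            let frame = Ax25Frame;
            // Fan out one copy per subscriber, dropping any whose Receiver is gone.
            subs.lock()
                .unwrap()
                .retain(|tx| tx.send(frame.clone()).is_ok());
        });
    }
}
```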

Option 3: Connection as a part of Tnc

In the options so far, Tnc doesn't know anything about ongoing connections. It just shuffles frames around. Alternatively it could literally own all of the Connection instances. This has parts of both the previous options—the Tnc would have a thread which is doing receive_frame(), and then it would loop over all its active Connection objects and call Connection::handle_incoming(frame), or possibly optimise it based on the src/dest.

How do we handle the monitoring tab in our example app? I guess we would need to change the external interface to incoming(), so that it can co-exist with the Connection traffic.

More interestingly, how do we actually send and receive user messages on the Connection, when they are actually contained by the Tnc? Somehow we need to expose them, and these types become tangled together.

Also I don't think the Tnc can help too much with the protocol bookkeeping. Every connection is going to have its own timeouts and it will probably be easier if they just have their own threads and do it independently.

Conclusion

Either option 1 or 2 makes the most sense to me. What you said about the Tnc holding a Vec of Connection is more like option 3, and while it seems possible, it's not obvious to me what the API would actually look like.

Your logic makes sense and I agree a Result type is a better idea. I think the Err type should be an enum with all the different possible errors - thoughts? I'd think there are only a few (an error from the serial/networking/etc. layer, and a timeout from the peer).

Yep, exactly. That would be most appropriate for a library like this one. I already have a couple of such enums, though looking at it now I should move to thiserror. I think it's become a more popular choice for deriving Error types like these.
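For example, such an enum could be derived along these lines (a sketch only; the variant names are placeholders):

```rust
use thiserror::Error;

/// Hypothetical connection error, derived with thiserror.
#[derive(Debug, Error)]
pub enum ConnectError {
    #[error("I/O error talking to the TNC: {0}")]
    TncIo(#[from] std::io::Error),
    #[error("the remote station did not respond before the retry limit")]
    Timeout,
    #[error("the remote station refused the connection")]
    Rejected,
}
```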

scd31 (Author) commented Nov 1, 2020

Personally, I still feel that option 3 is the most elegant. To me, it seems like option 2 and option 3 are similar - option 2 just holds a bunch of receivers, instead of a bunch of connections.

With option 3, the Connection would be wrapped in a RefCell, which we could return to the user to use as they please. In fact, we could even define a new type, Connection, which would be equivalent to Rc<RefCell<TncConnection>>, where TncConnection is the struct we were previously calling Connection. This also has the advantage of making the Connection usable across threads.

scd31 (Author) commented Nov 1, 2020

Slight correction - if we want to allow the Tnc to be usable across threads, we'll need to replace Rc<RefCell<_>> with Arc<Mutex<_>>.
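In other words, roughly (TncConnection being the inner struct described above, still hypothetical):

```rust
use std::sync::{Arc, Mutex};

/// The inner struct previously called Connection: the state machine, timers,
/// and inbound queue for one AX.25 connection.
pub struct TncConnection {
    // fields elided
}

/// The handle handed to the user; cheap to clone and safe to send between threads.
pub type Connection = Arc<Mutex<TncConnection>>;
```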

thombles (Owner) commented Nov 1, 2020

Fair enough. I guess I have a preference for actors/channels over shared ownership but if you're confident in it then let's go with it. If we're doing that, we should make sure that dropping the Connection (the new definition you mentioned) results in a full cleanup. It would be annoying if the TncConnection persisted inside Tnc after the user was done with it. And yeah Arc<Mutex<_>> seems good to me.

As an aside, it would be awesome if the inner connection state machine was unit testable, i.e. messages can be loaded in and out with explicit timestamps, rather than using Instant::now() and real delays. I'm not sure if you were already planning to do that. I don't consider it critical, but if you have the bandwidth for it, it would get a big thumbs up from me!
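What I mean is something along these lines, purely as a sketch of the idea (the names and timer handling here are invented for illustration):

```rust
use std::time::{Duration, Instant};

/// Actions the state machine can ask its owner to carry out.
pub enum Action {
    Retransmit(u8), // resend the I frame with this sequence number
    SendAck,
}

pub struct ConnectionStateMachine {
    t1_expires_at: Option<Instant>,
}

impl ConnectionStateMachine {
    /// Driven with an explicit "now" so tests can simulate delays instantly.
    pub fn tick(&mut self, now: Instant) -> Vec<Action> {
        let mut actions = Vec::new();
        if let Some(deadline) = self.t1_expires_at {
            if now >= deadline {
                actions.push(Action::Retransmit(0));
                self.t1_expires_at = Some(now + Duration::from_secs(10));
            }
        }
        actions
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn t1_expiry_triggers_retransmission() {
        let start = Instant::now();
        let mut sm = ConnectionStateMachine {
            t1_expires_at: Some(start + Duration::from_secs(10)),
        };
        // No real waiting: just hand in a later timestamp.
        assert!(sm.tick(start).is_empty());
        assert!(!sm.tick(start + Duration::from_secs(11)).is_empty());
    }
}
```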

scd31 (Author) commented Nov 1, 2020

I thought about it some more, and I actually think option 2 is better. I think my idea of having an array of mutexes could cause problems if the user also wanted to use a mutex - I think a deadlock condition could possibly arise. I think that would also make it decently unit-testable out of the box; in unit tests, a custom sender/receiver pair could be created, and the receiver could be passed into the connection.

scd31 (Author) commented Nov 8, 2020

Quick question about AX.25 that I hope you can answer (I'm running into this while implementing stateful connections).

Are UI frames and I frames the same thing, but UI frames are used when the data can fit into a single frame? Otherwise, I assume, it's broken up into multiple I frames.

thombles (Owner) commented Nov 8, 2020

Are UI frames and I frames the same thing, but UI frames are used when the data can fit into a single frame? Otherwise, I assume, it's broken up into multiple I frames.

I haven't studied the specs for a while but that doesn't sound right to me. The key feature of an I frame (a numbered one) is that it participates in the flow control and ack/retransmission process. You want this for all your information frames - it's the whole point of using a connection. A UI frame can be sent during a connection but it can also be sent outside one. See 4.3.3.6 in the AX.25 v2.2 spec.

Fragmentation of longer messages than fit in a frame is, I would argue, not our concern. The old AX.25 v2.0 spec I believe says nothing on the matter. The v2.2 one mentioned earlier defines an abstract interface for a segmenter/reassembler, but in practice you can do whatever fragmentation you want at the "application" (or sub-application) level.

The only exception is that we should expose whether there is a limitation on the maximum length of an information frame defined in the TNC we are using. For example you can set such a limit in axports in Linux and I believe the message will fail to transmit if you exceed this limit. I don't think this directly affects what you're doing - the only way you'll end up with a super-long line is if you write an application that writes a super-long information frame.

(Minor aside: the frame format as currently implemented is based on the v2.0 spec. In an ideal world it will be upgraded to support the full range of v2.2 frames but those stations are so rare that it would be far, far more useful to get v2.0 connections working first.)

scd31 (Author) commented Nov 8, 2020

Ah, okay. That makes sense.

Fragmentation of longer messages than fit in a frame is, I would argue, not our concern. The old AX.25 v2.0 spec I believe says nothing on the matter. The v2.2 one mentioned earlier defines an abstract interface for a segmenter/reassembler, but in practice you can do whatever fragmentation you want at the "application" (or sub-application) level.

I think it would be nice if Connection had methods to send and receive byte vecs directly. The more that's abstracted away, the better, imo. For this reason, I think it would be necessary for the Connection to be able to segment and reassemble I frames. Let me know if this makes sense or if you'd prefer something else.

thombles (Owner) commented Nov 8, 2020

I think it would be nice if Connection had methods to send and receive byte vecs directly.

I agree, but I don't think I'm agreeing in quite the same way. :) I want to make three small background points first to show what I'm talking about.

Frame sizes

The spec allows an information field in a single frame to be up to 256 octets. However, many operators choose to run at a smaller maximum of 80-120 octets. These shorter frames have less chance of being destroyed by noise or interference, so you get better throughput overall and longer ranges. The app developer should have the ability to control this. However, sometimes the <256 limitation is external to the app (see #6).

Lack of segmentation

To my knowledge, AX.25 has no convention for splitting and reassembling a message that doesn't fit in a single I field. All the connection-mode AX.25 applications I have used are simply text, like a BBS. In this situation it doesn't matter where the frames are split because it just outputs the bytes to a terminal, one frame after another. The screen is just one giant ever-expanding "message".

This means that if you come up with a segmentation protocol, you can't use it to talk to anybody else except your own application, since no other stations will understand it. Since it is not part of AX.25, I don't think that functionality belongs in the library. (But we can make it easy to incorporate this behaviour! More on that in a moment.)

Consistent frames, not streams

TCP doesn't provide message boundaries. If you write an app that sends the following strings into a TCP connection:

"hello"
"i am a fish"

...then it is most likely that the other side will receive "helloi am a fish". Apps will conventionally solve this problem with some form of delimiter, like \n.

AX.25 does not work like that. Frames are guaranteed to be passed from end to end, with information fields which will not be split into pieces or merged together. That means if you want to, you can use frames as implicit delimiters. You could send the two messages above, and the receiver would get "hello" and "i am a fish" separately, without the need for a \n (or an \r, as AX.25 convention goes).

What to do?

With all of this in mind, I suggest a three-level API for Connection.

  1. Connection itself sends and receives Vec<u8>, which correspond directly to information fields in the content of frames. This will allow someone to write an application that needs control over distinct frames.
  2. Implement an io::Read and io::Write adapter which wraps the Connection and makes it look like a byte stream. Behind the scenes the adapter is using the API from (1) to send and receive frames, but to the Rust user it looks like a standard stream, exactly like a TCP connection or a file. I think it's justified to include this in the library because:
    • it's a natural way to work with many traditional packet applications, like a BBS, where the message boundaries are not really significant
    • it's a building block that will enable the level below, without locking the application into a particular protocol
  3. With the standard Read and Write traits in hand, you can bolt on virtually any Rust library you want to do things like streaming compression, fragmentation, serialisation/deserialisation of JSON/CBOR, etc. This is an application-specific concern, as it should be.
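For level (2), the adapter could be shaped roughly like this (ConnectionStream and the frame-level methods it calls are hypothetical, just to show the idea):

```rust
use std::collections::VecDeque;
use std::io::{self, Read, Write};

/// Placeholder for the frame-level connection from level (1).
pub struct Connection;

impl Connection {
    pub fn send_frame(&mut self, _info: &[u8]) -> io::Result<()> { unimplemented!() }
    pub fn recv_frame(&mut self) -> io::Result<Vec<u8>> { unimplemented!() }
}

/// Wraps a Connection so it behaves like a byte stream (level 2).
pub struct ConnectionStream {
    conn: Connection,
    /// Bytes received but not yet handed to the caller.
    leftover: VecDeque<u8>,
    /// Largest information field we are willing to transmit.
    max_info_len: usize,
}

impl Read for ConnectionStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.leftover.is_empty() {
            // Block for the next frame and buffer its information field.
            self.leftover.extend(self.conn.recv_frame()?);
        }
        let n = buf.len().min(self.leftover.len());
        for (i, byte) in self.leftover.drain(..n).enumerate() {
            buf[i] = byte;
        }
        Ok(n)
    }
}

impl Write for ConnectionStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Send at most one frame's worth of data per call.
        let n = buf.len().min(self.max_info_len);
        self.conn.send_frame(&buf[..n])?;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

With Read and Write in place, level (3) falls out for free: any existing Rust library that works over a byte stream can sit on top.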

scd31 (Author) commented Nov 9, 2020

That makes perfect sense to me. Thanks for clarifying.

scd31 (Author) commented Nov 10, 2020

Sorry for all the questions, but do you know how the timers T1, T2, and T3 are supposed to work? When they're stopped, should they reset?

thombles (Owner) commented Nov 10, 2020

In the spec have a look at section 6.7.1. If you haven't done it before, I'd suggest you get a connection to your local BBS working using existing software and spend some time analysing the traffic with a tool like axlisten or Dire Wolf (the latter only needs an audio input and it will print out detailed information about the packets). Once you've watched a TNC connecting, retransmitting etc. it's all pretty intuitive, but it's hard to see how it comes together if you're just reading the spec.

scd31 (Author) commented Nov 10, 2020

Okay, thanks. I'm out of town this week but I'll give that a shot once I'm home.
