Update EndpointIn to allow arbitrary size writes #2799

simpkins · 2024-04-10T00:51:25Z

This updates the EndpointIn::write() API to support writes larger than the endpoint's maximum packet size. A new EndpointInSinglePacket helper trait was added to support driver implementations that only allow writing a single packet at a time. At the moment all of the provided drivers are still using this implementation, but they could be updated to write more data at once in the future.

This has not yet changed any code in embassy-usb to send larger packets. I wasn't sure if it would be better to bump the
embassy-usb-driver version number before doing that, since this is a meaningful change, and we would want to ensure
that larger-sized writes aren't sent to older driver implementations that do not support this.

simpkins · 2024-04-10T02:18:29Z

One caveat here: I haven't tested the embassy-stm32, embassy-nrf, or embassy-rp implementations on actual hardware. That said, the unit test should hopefully provide some confidence that it works correctly.

I also have a corresponding change almost ready for the read() APIs, but figured I would send the write() change out for comments first.

One possible concern about this API change: if a write() or read() operation is dropped (say, if it was abandoned because of a timeout), the caller can no longer really reason about how much data may have been sent or received before the operation was dropped. Unless this is an isochronous endpoint, the user would presumably need to reset the bus after a dropped operation to get back to a known state. I'm not sure if people feel like this problem matters in practice, and if callers would want to attempt to resume using an endpoint without resetting it even after a timeout. Even today, it feels somewhat burdensome to require that usb driver implementations guarantee no data loss after a dropped read(): it means that driver implementations must always have enough buffer space of their own to store read data between a dropped read() and the next time the user calls read() to get this data. This just happens to be more feasible to do today with at most 1 packet outstanding, and harder if the read could be many packets long.

This updates the EndpointIn::write() API to support writes larger than the endpoint's maximum packet size. A new EndpointInSinglePacket helper trait was added to support driver implementations that only allow writing a single packet at a time. At the moment all of the provided drivers are still using this implementation, but they could be updated to write more data at once in the future.

plaes · 2024-04-10T12:11:42Z

embassy-usb-driver/src/lib.rs

+    /// Write a single packet to the endpoint.
+    ///
+    /// The length of buf is guaranteed to not be more than the endpoint's maximum packet size.
+    async fn write_one_packet(&mut self, buf: &[u8]) -> Result<(), EndpointError>;


Why not call this just write_packet? Or if you want to add some emphasis, then write_single_packet.

Dirbaio · 2024-04-10T13:20:53Z

Since this is a breaking change to embassy-usb-driver which is somewhat disruptive, i'd prefer to wait until at least one in-tree driver implements the write() API (and ideally demonstrates a perf improvement as a result). Otherwise if we merge this now we're taking the downside of the break with no upside.

Also, I wonder if instead we could

achieve the increased performance with the current API, for example implementing double-buffering in the drivers.
achieve the better user-friendliness with improvements in embassy-usb only, for example EndpointIn/EndpointOut wrappers that do the chunking for you, but still call the current "write packet / read packet" API.

Dirbaio · 2024-04-10T13:30:59Z

embassy-usb-driver/src/lib.rs

+    /// sequence of packets, where all but the last packet is guaranteed to be the maximum packet
+    /// size.
+    ///
+    /// A short packet always signifies the end of a USB transfer.  If you have a large transfer


I think this is a bit awkward. I understand the goal is to support "streaming" transfers, but it ends up being awkward for both the streaming and non-streaming case:

For non-streaming, caller still has to check if the data size was accidentally a multiple of 64, and then do a zero-length write in that case.

For streaming, caller has to ensure all chunks except the last are multiples of 64. Chances are whatever's generating the streamed data is not generating it in clean chunks of 64, so the caller has to write logic to buffer the last "half-full" 64byte packet.

solutions

Not support streaming at all. We have the high-level write_transfer which writes one entire transfer, and the low-level write_packet which writes one packet.

Support streaming with an is_last: bool param.

Maybe not supporting streaming wouldn't be such a big of a deal?

This is just how the USB spec is defined. This PR does not change the behavior here.

Even with the current write() API, users need to manually call write() with a 0-length buffer if they want to signal an early end of a transfer and they happen to have just sent a full packet.

Note that in many cases this doesn't really matter, because the endpoints have indicated the transfer length some other way (e.g., in the wLength field for control transfers, or in the HID report descriptor for HID transfers). If the other side knows the transfer length to expect the explicit 0-length packet to indicate the end of the transfer isn't needed. An explicit short packet is only needed to signify the end of the transfer when the other side isn't really expecting this or otherwise doesn't know the transfer length.

Support streaming with an is_last: bool param.

I would be happy to add an extra parameter that asks the device to send an extra zero-length packet after the end of the transfer, if the transfer is a multiple of the max packet size.

I wouldn't call it is_last though, since this isn't always needed at the end of a transfer. Maybe send_zlp or indicate_early_end?

This is just how the USB spec is defined. This PR does not change the behavior here.

The spec defines (assuming 64b max packet size):

A "packet" is a unit containing 0..=64 bytes

A "transfer" is a sequence of 64byte packets, followed by one <64byte packet (zero length if necessary).

My point is this function is neither "write packet" nor "write transfer", it's "something in between". It does some of the chunking of transfers into packets for you, but it leaves some of the logic to the caller. It's a leaky abstraction.

It'd be cleaner if we had functions for "write packet" (does no chunking for you) and "write transfer" (does ALL chunking for you).

There's this edge case of when the transferred data length is exactly equal to the "expected transfer size" by the host, and is a multiple of 64, you don't have to send the ZLP. I'm not sure if it hurts to send it anyway in that case, though. It feels like we can always send it and that's it?

I don't believe it is safe to simply always send the ZLP: I think this would incorrectly be treated as 2 separate transfers in some cases, with the ZLP being treated as a separate transfer of its own.

In particular, I don't think your definition of a transfer is exactly accurate: a transfer doesn't have to end in a short packet. It can end in a maximum-size packet if the sender and receiver both agree on and understand the transfer length ahead of time. The USB 2.0 spec just defines a transfer as "One or more bus transactions to move information between a software client and its function", and it basically seems to allow functions to use higher-level protocol knowledge to define transfer boundaries. The lower level driver code therefore (unfortunately) cannot really know the start and end of a transfer without having higher level knowledge of the semantics of the data being sent.

For example, consider the HID specification. The device defines the various possible reports it can send in its HID descriptor. Therefore the host knows all possible reports that could be sent, and the maximum length of any report that the device might try to send. If the longest report is 128 bytes, the host knows that if the device sends 2 64 byte packets this must be the end of the transfer, and that the next packet can't still be part of the same report transfer.

From section 8.4 of the HID 1.11 spec:

Only one report is allowed in a single USB transfer.

...

All reports except the longest which exceed wMaxPacketSize for the endpoint
must terminate with a short packet. The longest report does not require a short
packet terminator.

Due to this protocol behavior, I think the higher level code is always going to have to pass in some indication of whether or not their transfer needs to be ended with a short packet or not. The USB driver itself simply doesn't have enough knowledge of the data being sent to be able to decide this on its own.

All that said, I take your point about this API being somewhat in between "write one packet" and "write one transfer". I do think this is still a useful API to provide to higher level code, as it allows them to send larger amounts of data at once while still splitting transfers into chunks if they desire. Having a simple "write_transfer()" API to write a full transfer makes sense to me, but I think it still will require the caller to pass in a flag to specify whether the transfer needs to explicitly end in a short packet or not. Note that at that point, the "write_transfer()" behavior is basically the same as this current API (apart from the addition of this extra flag).

Let me also send out a PR with my code for API definition for streaming read behavior as well, since I suspect you will probably also have feedback on it too. It has similar slightly interesting behavior around whether caller requested to read data that is a multiple of the maximum packet size or not. Thanks also for taking the time to review and provide feedback on these PRs.

This commit has my streaming read API proposal: simpkins@20812ee

I can include that commit in this PR if you prefer. Given that both of these are breaking driver API changes, I would imagine that, assuming these changes are approved at all in the first place, it would probably be reasonable to merge them both together, along with the embassy-stm32 implementations for them.

do you have a source confirming you MUST NOT send the ZLP? that HID quote says "does not require", not "must not send".

Also, just to be clear: specifically in the case of HID it might be okay to always send an extra ZLP after a transfer with the longest report size. Presumably the host would just treat this as an extra 0-length transfer and would just ignore it. This doesn't really feel like a desirable thing to do though, and it seems like this isn't necessarily guaranteed to be a safe thing to do in general for other arbitrary protocol classes.

do you have a source confirming you MUST NOT send the ZLP? that HID quote says "does not require", not "must not send".

I think my earlier reply raced with yours, since I wrote it before I saw yours, but I think this is entirely up to the higher level protocol definition. For HID I think it would be okay to send an extra ZLP, since the host would likely just ignore the extra empty transfer. However, based on the USB spec I think it would be entirely valid for a vendor-defined protocol to have different (unexpected) behavior on receipt of an extra empty packet, since this could be treated as a separate transfer.

The underlying problem here is simply that knowing the USB transfer boundaries requires higher level knowledge of the data contents and semantics, and this can't be known at the embassy-usb-driver level. If you receive 2 full packets followed by 1 short packet, it's entirely up to the higher level code to decide if this is 1 transfer, 2 transfers, or 3 transfers. At the driver level we can identify that a short packet always means the end of a transfer, but we can't tell if a full packet is the end of a transfer or not.

I'll update this PR to add 2 separate methods to EndpointIn: one that just does a simple write of the data (what this current method does), and one that writes the data plus ensures it ends with a short packet (basically, call the simple write, followed by a ZLP if the length was a multiple of the MPS). I think it's maybe slightly nicer to have 2 separate methods rather than 1 method with a parameter, but I'm happy to just have 1 method with a parameter if you prefer.
For the HID example, the HID class code would presumably want to use the simple write call when writing the longest report (which is always true if there is only a single HID report defined), and would want to use the ZLP version when writing shorter reports.

simpkins · 2024-04-10T15:51:46Z

Since this is a breaking change to embassy-usb-driver which is somewhat disruptive, i'd prefer to wait until at least one in-tree driver implements the write() API (and ideally demonstrates a perf improvement as a result). Otherwise if we merge this now we're taking the downside of the break with no upside.

Sure, that makes sense. I'm happy to contribute an implementation for embassy-stm32 and wait before that is done for this to go in. I did just get a black pill board and have that working now, so I can do some testing on stm32. I have code for esp32 that supports streaming transfers, and I should be able to port that over.

simpkins · 2024-04-11T04:37:54Z

This update incorporates some of the review feedback:

I renamed write_one_packet() to write_packet()
To mitigate some of the breaking changes in the embassy-usb-driver API, I left the write() API the same, and added this new behavior as a separate write_data() API. (write() now has a default implementation that calls write_data(), so it supports larger writes, but any existing code that calls it with a single packet at a time should still behave exactly the same.) That said, this is still a breaking change for embassy-usb-driver implementations (since they now have to implement write_data()), but it avoids breaking the API for users of the API in embassy_usb
The new write_data() API accepts the data, plus a force_short_packet parameter to indicate if the caller wishes the implementation to send a zero-length packet at the end of the transfer if the data is a multiple of the max packet size.

I also have an implementation for embassy-stm32 mostly ready, over here: simpkins@867c7c1
However, I've only done pretty cursory testing of it so far (I mainly just hacked up one of the usb_serial examples to always send back large chunks of data on any input, rather than echoing the input data).

Dominaezzz · 2024-04-14T13:05:27Z

the caller can no longer really reason about how much data may have been sent or received before the operation was dropped

Can the method just return a count of the sent packets or bytes?

Is EndpointError::BufferOverflow still needed after this PR?

simpkins · 2024-04-19T01:44:22Z

Can the method just return a count of the sent packets or bytes?

No, unfortunately it can't: if the operation is dropped there is no opportunity to return anything at all.

Is EndpointError::BufferOverflow still needed after this PR?

It is still needed in the receive case. Even if we update the read APIs to allow reading more than a single packet worth of data at once, if the caller provides a buffer whose length is not a multiple of the max packet size, we can encounter a BufferOverflow situation if the final packet contains more data than the caller expected.

Dirbaio · 2024-04-26T20:33:33Z

marking as draft until these concerns are solved.

simpkins force-pushed the usb_xfer branch from b31b248 to aa87007 Compare April 10, 2024 01:10

simpkins force-pushed the usb_xfer branch from aa87007 to 17c1c13 Compare April 10, 2024 05:00

plaes reviewed Apr 10, 2024

View reviewed changes

Dirbaio reviewed Apr 10, 2024

View reviewed changes

Code review feedback for EndpointIn streaming writes

2dab366

Dirbaio marked this pull request as draft April 26, 2024 20:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update EndpointIn to allow arbitrary size writes #2799

Update EndpointIn to allow arbitrary size writes #2799

simpkins commented Apr 10, 2024

simpkins commented Apr 10, 2024

plaes Apr 10, 2024 •

edited

Dirbaio commented Apr 10, 2024 •

edited

Dirbaio Apr 10, 2024

simpkins Apr 10, 2024

Dirbaio Apr 10, 2024

simpkins Apr 10, 2024

simpkins Apr 10, 2024

Dirbaio Apr 10, 2024

simpkins Apr 10, 2024

simpkins Apr 10, 2024

simpkins commented Apr 10, 2024

simpkins commented Apr 11, 2024

Dominaezzz commented Apr 14, 2024

simpkins commented Apr 19, 2024

Dirbaio commented Apr 26, 2024

Update EndpointIn to allow arbitrary size writes #2799

Are you sure you want to change the base?

Update EndpointIn to allow arbitrary size writes #2799

Conversation

simpkins commented Apr 10, 2024

simpkins commented Apr 10, 2024

plaes Apr 10, 2024 • edited

Choose a reason for hiding this comment

Dirbaio commented Apr 10, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

simpkins commented Apr 10, 2024

simpkins commented Apr 11, 2024

Dominaezzz commented Apr 14, 2024

simpkins commented Apr 19, 2024

Dirbaio commented Apr 26, 2024

plaes Apr 10, 2024 •

edited

Dirbaio commented Apr 10, 2024 •

edited