Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve telemetry throughput #2095

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

Conversation

qqqlab
Copy link
Contributor

@qqqlab qqqlab commented Feb 22, 2023

This PR improves the telemetry throughput of extended header crsf packets. (A race condition was fixed in a separate PR)

Current Situation

9 slots are defined for holding incoming CRSF packets. 7 slots are used for specific standard header CRSF packets, two are used for extended CRSF packets. The RXhandleUARTin() method writes packets to these slots. A scheduler loops over the slots, if a slot with data is found, it is locked, transmitted OTA, and then unlocked to receive new CRSF data.

In worst case condition throughput is only 50% what is possible. This happens when CRSF packets arrive a fraction faster than they are sent OTA, now the slot is still locked and the CRSF message is dropped. The OTA transmitter sits idle until the next CRSF packet arrives.

Pull Request

Modify AppendTelemetryPackage() as follows:

  • Currently the slots are effectively 1 position LIFO buffers with overwrite. This is fine for standard CRSF telemetry packets, where we are interested in the latest packet only. However, for extended packets we are interested in all packets. Make the 2 extended slots act as a 2 position FIFO buffer without overwrite. By double buffering, we can receive the next packet in the buffer whilst the previous packet is transmitted OTA.

  • Refactor code: first handle non-OTA, then OTA for standard CRSF, then OTA for anything not handled yet.

  • Remove the return value of the method, it is never used.

Test Results

Tested by tormenting the ELRS receiver by firing 100 extended CRSF packages per second at it. Each CRSF packet is 20 bytes long.

Maximum possible throughput with Telemetry ratio 1:2

Mode crsf/s  byte/s 
500    62     1250
333F   83     1665
100F   25      500
 50    12      250

crsf/s = number of 20 byte crsf packets OTA per second
byte/s = number of OTA bytes per second

Results with ELRS 3.2.0:

Mode crsf/s  byte/s    telemetry lost
500    21     420      YES
333F   37     740      NO
100F    0       0      YES
 50     0       0      YES

Results after this PR:

Mode crsf/s  byte/s     telemetry lost
500     52    1040      NO
333F    60    1200      NO
100F    18     360      NO
 50      5     100      NO

Tools

The Python script was used to enulate a FC and send crsf packets to the ELRS receiver. The Lua script was used to analyse the received data on an EdgeTX radio.
https://github.com/qqqlab/ExpressLRS/blob/pr_attachments/CRSF-Telem.py
https://github.com/qqqlab/ExpressLRS/blob/pr_attachments/CRSF-Debug.lua

@JyeSmith
Copy link
Member

Nice! Ill leave this for the big brains to review. My only comment is if you would class this as a bug fix and for the maintenance branch?

@qqqlab
Copy link
Contributor Author

qqqlab commented Feb 23, 2023

Hi Jye, thanks for your feedback. I'm not an expert on the git administrative side of ELRS, but would qualify this both as bug fix and maintenance branch.

My main goal is to get bi-directional telemetry working smoothly and with maximum throughput in both directions. The bug fix alone made some improvement, but adding the double buffer strategy really did the trick.

The bug fix alone would be: replace if (targetFound) with if (targetFound && !payloadTypes[targetIndex].locked) at line 301.

Copy link
Contributor

@schugabe schugabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix for the potential race condition is a good find and should be added.
But the remaining changes have some issues: The logic for the ardupilot response is needed - I wrote tests for all the cases and the tests are failing now.

The solution for better throughput would be to keep the slots for all known tlm frames and use a dynamic queue for everything else of the extended frames. Two slots is simply not enough for all the data that is possibly sent.

@qqqlab
Copy link
Contributor Author

qqqlab commented Mar 1, 2023

Sorry, I do not agree. The datalink has limited throughput, if the FC sends more info than can be handled by the link then data will be dropped regardless of how many slots or queue positions are available. Adding a longer queue might even work in the opposite direction as it increases latency. It would more sense to me to add a flow control mechanism instead, i.e. to inform the FC about the queue state.

In the current design no specific slots are assigned for specific extended CRSF messages. The two available slots are both used for multiple types of messages.

One could argue for adding a separate slot for "status text" in the new design. I did not do it, because I think this is sufficiently handled by the queue.

Yes, one test is failing now, but that is a direct consequence of the proposed design change. It is intended behavior.

@schugabe
Copy link
Contributor

schugabe commented Mar 1, 2023

Sorry, I do not agree. The datalink has limited throughput, if the FC sends more info than can be handled by the link then data will be dropped regardless of how many slots or queue positions are available. Adding a longer queue might even work in the opposite direction as it increases latency. It would more sense to me to add a flow control mechanism instead, i.e. to inform the FC about the queue state.

We don't have a "slow down" crsf message type so it's not possible to implement that right now but we are trying to work out a way crsf can be extended: tbs-fpv/freedomtx#26

In the current design no specific slots are assigned for specific extended CRSF messages. The two available slots are both used for multiple types of messages.

One could argue for adding a separate slot for "status text" in the new design. I did not do it, because I think this is sufficiently handled by the queue.
Yes, one test is failing now, but that is a direct consequence of the proposed design change. It is intended behavior.

Merging code that breaks working features is not an option for me and if calling it a design descision does not change that.

My suggestion: create a new PR that fixes the race condition. The remaining changes need more discussion and/or new crsf frames.

Copy link
Collaborator

@wvarty wvarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The physical code changes look ok to me, so I'll pre-approve, with the condition that @schugabe's design concerns are addressed, and leave the final approval to him.

@qqqlab
Copy link
Contributor Author

qqqlab commented Mar 2, 2023

Thanks for your feedback.

@schugabe Please, let's discuss this further.

My initial thought was to keep the existing code as-is, and just add a fifo buffer to improve throughput. But the existing code for extended crsf did not really make sense to me. For example, the code mentions "reserve slot for status text", but the same slot is also used for msp and any non explicitly handled crsf packets.

Also, I asked myself the rather philosophical question: should elrs keep an ever increasing list of crsf messages and dictate priorities for them, or should elrs be net-neutral, treat all messages equally, and just provide the best possible link to the communicating parties?

I like the second option better.

Based on this I submitted a more radical PR, making sure that the PR still satisfies the requirements for extended crsf as outlined in the existing code. .

If it is a hard requirement that all current test pass, I can add code back in to make that happen. But before I spend time on that I would like to get feedback from you if there is an interest in proceeding with this PR.

@wvarty
Copy link
Collaborator

wvarty commented Mar 4, 2023

@qqqlab just an FYI, we are aiming to cut a minor bugfix release towards the end of March, and hope to have all fixes merged to the 3.x.x-maintenance branch before the 9th March to leave time for running the test plan.
It might be worth splitting out the bugfix in this PR to a new PR that targets 3.x.x-maintenance, and leave the enhancements here for discussion once @schugabe gets a chance to respond?

@qqqlab qqqlab changed the title Fix race condition in Telemetry and improve throughput Improve telemetry throughput Mar 7, 2023
@JyeSmith
Copy link
Member

just giving this a little bump @qqqlab and @schugabe.

Copy link
Collaborator

@pkendall64 pkendall64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see why the unit test is failing now.
The previous code would only use the 2 slots for FC originated messages, now it's for any of the non-singled out message types. So it's reducing the special casing that was hard-coded.

So either we are happy to have 2 slots for all types and change the unit test or we change the PR to use 2 slots only for FC originated messages and one slot otherwise.

Personally I'm in favour of simpler is better and change the unit test.

# Conflicts:
#	src/lib/Telemetry/telemetry.cpp
for (int8_t i = 0; i < payloadTypesCount - 2; i++)
// first try slot payloadTypesCount - 2, so that the OTA packets are transmitted in the same order as they are received
targetIndex = payloadTypesCount - 2;
targetFound = !payloadTypes[targetIndex].updated;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to make it compatible to the current behaviour we can change this line to

targetFound = !payloadTypes[targetIndex].updated || !(header->orig_addr == CRSF_ADDRESS_FLIGHT_CONTROLLER || (header->type == CRSF_FRAMETYPE_ARDUPILOT_RESP && package[CRSF_TELEMETRY_TYPE_INDEX + 1] == CRSF_AP_CUSTOM_TELEM_STATUS_TEXT));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've added the fix, to see if the test passes now.

But I'm not 100% sure that this is the best way to go. It looks to me that after this fix we're back to single buffering most messages, and thus back to the old situation slowing things down artificially - the exact opposite of what this PR tries to achieve.

I'm working on the streaming PR now, but time permitting I'll run the throughput tests with this fix in place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, did the throughput testing, and as expected, throughput drops back to old levels....

I've added a new commit and changed the line 265 to:

targetFound = !payloadTypes[targetIndex].updated || header->type == CRSF_FRAMETYPE_PARAMETER_READ;

in order to artificially satisfy test_function_store_unknown_type_two_slots. Now telemetry throughput is improved again.

What does this test test? It tests that for random messages only one slot is used, which is the exact opposite of what this PR tries to achieve.

So, I still think this test should be removed, or changed to expect 2 updated messages.

void test_function_store_unknown_type_two_slots(void)
{
    telemetry.ResetState();
    uint8_t unknownSequence[] = {0xEC,0x04,CRSF_FRAMETYPE_PARAMETER_READ,0x62,0x6c,85};
    int length = sizeof(unknownSequence);
    int sentLength = sendData(unknownSequence, length);
    TEST_ASSERT_EQUAL(length, sentLength);

    uint8_t* data;
    uint8_t receivedLength;
    telemetry.GetNextPayload(&receivedLength, &data);

    sentLength = sendData(unknownSequence, length);
    TEST_ASSERT_EQUAL(length, sentLength);

    TEST_ASSERT_EQUAL(1, telemetry.UpdatedPayloadCount());
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats why I gave the options 😄, and said that personally I'd go for the simple option i.e. fix the unit test, and make it check for 2 messages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really want a special case for the ardupilot text message being only a single slot, then we should just do that and allow all others to be double slot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pkendall64 I missed your hint about changing the unit tests - still was under the impression that changing the unit tests was a no go :-)

Changed back things back to the original PR code, and changed the unit test to check for 2 messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really want a special case for the ardupilot text message being only a single slot, then we should just do that and allow all others to be double slot.

We need the ardupilot logic exactly as it is implemented otherwise the yapuu script stops working and I added the test to check all the cases we need to cover. So changing the tests would probably break compability with stuff that is working. So to really confirm this PR does not break anything it needs to be tested with yapuu/bf/kiss lua scripts + the parameter get/set stuff in our lua script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an additional slot for StatusText.

I've been using this PR code (without additional StatusText slot) with Ardupilot & Yaapu and status text worked. StatusText could be missed when flooding the receiver with other CRSF messages. However the original code has the same issue, as it stores StatusText in slot targetIndex = payloadTypesCount - 1 which is also used for other CRSF messages.

So, I've added a separate StatusText slot, to handle this case separately and always keep the last StatusText buffered to be sent OTA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schugabe Please let me know if the addition I made eases your concerns, or do you require further changes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overflow was possible but yapuu tuned the sending rate to prevent issues. Did you test this PR with the yapuu script at 50hz update rate + 1:2 tlm?

Copy link
Contributor

@schugabe schugabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments to the code.

Basically we are back to the old logic and need 64 bytes more buffer space only for ardupilot text messages.

For msp responses we did already use both slots so I guess the large speed gains only happen for all other traffic like ardupilot and in this case the logic for the text messages would limit the reachable speed even more.

If everybode else if fine with throwing more ram at tlm buffer space it's fine with me but I'd say we need a followup PR to use the third slot for everything else if it's not a ardupilot fc.

}
if (header->type == CRSF_FRAMETYPE_DEVICE_PING && package[CRSF_TELEMETRY_TYPE_INDEX + 1] == CRSF_ADDRESS_CRSF_RECEIVER)
if (header->type == CRSF_FRAMETYPE_DEVICE_PING && header->dest_addr == CRSF_ADDRESS_CRSF_RECEIVER)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the conditon (header->type == CRSF_FRAMETYPE_DEVICE_PING && (header->dest_addr == 0 || header->dest_addr == CRSF_ADDRESS_CRSF_RECEIVER)) since it's also possible to send a ping to all devices

{
const crsf_header_t *header = (crsf_header_t *) package;
const crsf_ext_header_t *header = (crsf_ext_header_t *) package;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only messages in the frame range 0x28-0x96 have ext headers and the classic tlm frames are not ext frames. While nothing bad happens in this code with this cast it's not always correct. That is the reason why I had the if (header->type >= CRSF_FRAMETYPE_DEVICE_PING) before I did interpret it as ext frame.

for (int8_t i = 0; i < payloadTypesCount - 2; i++)
// first try slot payloadTypesCount - 2, so that the OTA packets are transmitted in the same order as they are received
targetIndex = payloadTypesCount - 2;
targetFound = !payloadTypes[targetIndex].updated;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overflow was possible but yapuu tuned the sending rate to prevent issues. Did you test this PR with the yapuu script at 50hz update rate + 1:2 tlm?

@qqqlab
Copy link
Contributor Author

qqqlab commented Apr 29, 2023

@schugabe

Basically we are back to the old logic and need 64 bytes more buffer space only for ardupilot text messages.

Thanks for your feedback. Apparently I fail to clearly explain what this PR is about. This is not about the special cases (standard CRSF header GPS, SPEED, etc., or extended header CRSF status text etc.) but about any other CRSF message.

With the special cases we're interested in the most recent message, so it is fine to use a single buffer, so that we receive the most recent message and potentially drop the ones that came before. But with the rest we're interested to get each and every message. By using a single buffer for these messages, only 50% of the OTA capacity is used in worse case. Using a double buffer fixes this deficiency.

See example below: a CRSF message arrives every 4 transmission slots, and takes 5 slots to transmit OTA. With single buffering only 3 out of 6 messages are received OTA. With double buffering 4.8 of 6 messages are received, because while a message is being transmitted OTA the next incoming message is buffered in the second buffer.

incoming:   1---2---3---4---5---6---  
OTA single: 11111---33333---55555---
OTA double: 111112222233333444446666
V3.2.0:     11112---33334---55556---

('-' is idle transmission slot, '1' to '6' is msg transmission slot)

With single buffering, when the incoming data rate increases from 100% to 101% of OTA link capacity, OTA data rate suddenly drops from 100% to 50% and 51% of data is lost. This makes things difficult to diagnose. With double buffering the situation is more predictable, the OTA rate stays fully loaded at 100% and only 1% of incoming data is lost.

Note that it is very easy to overload the link, incoming data flows at 400kbps, whereas the OTA capacity is less than 16kbps.

Regarding current use cases:

V3.2.1 includes PR #2103 which fixes a race condition. For illustrative purposes the situation with V3.2.0 is also depicted above, now 0 out of 6 messages are received because the messages get corrupted by the next incoming message which overwrites the single buffer as it is being transmitted OTA.

Prior to V3.2.1 all extended CRSF messages (including all Ardupilot messages) were suffering from this race condition. Apparently everybody (including myself) was happy with the situation, as apparently nobody complained about missing messages. I only noticed the deficiency when I tried to send more data over the link.

Maybe others also noticed this, but instead spending the time to investigate, they switched to another datalink solution.

So, it appears to me that in the current use cases the CRSF messages are sufficiently throttled by the FC not to trigger the race condition. This also means that switching to double buffering will have no negative impact on these use cases.

Regarding the StatusText special case:

If an additional 64 bytes of RAM is unacceptable for low-RAM targets, we can either reverse the last commit, or use it only for plenty-RAM ESP targets.

@pkendall64
Copy link
Collaborator

I think we can live with the extra 64 bytes TBH. This PR just needs the conflicts fixed again and the PING mentioned above fixed.

@pkendall64 pkendall64 requested a review from schugabe May 8, 2023 03:44
@pkendall64 pkendall64 dismissed schugabe’s stale review May 8, 2023 03:44

All changes should be addressed

@AlessandroAU
Copy link
Member

@qqqlab We have a very active private developer channel on our discord server, It would be great for you to join and discuss ideas in real-time. Come join and tag Jye, Wez, PK or myself and we will add you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants