Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDC-NCM poor performance for known reasons #2068

Closed
1 task done
rgrr opened this issue May 19, 2023 · 41 comments
Closed
1 task done

CDC-NCM poor performance for known reasons #2068

rgrr opened this issue May 19, 2023 · 41 comments
Labels

Comments

@rgrr
Copy link
Contributor

rgrr commented May 19, 2023

Operating System

Linux

Board

Raspi Pico RP2040

Firmware

My probe project (https://github.com/rgrr/yapicoprobe/tree/feature/systemview-rndis) uses lwIP / TinyUSB to provide a network interface for SystemView (https://www.segger.com/products/development-tools/systemview/).

The probe is running on a Pico (RP2040), the connected target board is an nRF52840. The nRF52840 generates the SystemView events (https://github.com/rgrr/playground/tree/feature/xray/tools/SystemView), the events are collected from the target via RTT by the RP2040. SystemView itself is connected via TCP to the RP2040 and collects the generated data.

For the underlying Ethernet over USB protocol one can select between ECM/RNDIS/NCM in CMakeLists.txt of the probe firmware.

The probe is using the raw lwIP interface.

For performance testing there is the example iperf server contained in the firmware.

What happened ?

if one selects ECM/RNDIS as underlying protocol, the event throughput is around 30000 events/s,
if one selects NCM, the event throughput is < 1000 events/s.

It seems that network traffic is stuttering, i.e. data transfer (traced in the sent-event) pauses and then transfer bursts are happening.

Data is fed from the application to lwIP/TinyUSB the same in all three cases.

What is surprising is, that iperf throughput shows almost the same values for all three cases. So I'm wondering what's the difference between the iperf server and my SystemView server.

Is it because SystemView traffic is almost unidirectional and NCM needs somehow a trigger to start transfer?

How to reproduce ?

Complicated... and I'm actually not sure if this is a bug in TinyUSB or a configuration/handling problem on my side.

To reproduce one needs a Pico and a target board. The required firmware/setup is described above.

I'm really willing to dig deep into TinyUSB, so I'm actually looking for some pointers where to go.

Debug Log as txt file (LOG/CFG_TUSB_DEBUG=2)

On the target (nRF52840) the same firmware is running. The generated events lead to a data volume of < 20KByte/s.

ECM case

ECM transfers the data continuously, every ~170ms data transfer is happening.

[08:45:40.913804 5.864342] 0.000 (  0) - sysview_sent(00000000,2000C8B4,32) 2
[08:45:40.989980 0.076176] 0.076 ( 76) - sysview_sent(00000000,2000C8B4,429) 3
[08:45:40.992022 0.002043] 0.078 (  2) - sysview_sent(00000000,2000C8B4,827) 3
[08:45:40.992721 0.000699] 0.079 (  1) - sysview_sent(00000000,2000C8B4,16) 3
[08:45:41.153832 0.161110] 0.240 (161) - sysview_sent(00000000,2000C8B4,1460) 3
[08:45:41.155885 0.002054] 0.242 (  2) - sysview_sent(00000000,2000C8B4,1460) 3
[08:45:41.156673 0.000789] 0.243 (  1) - sysview_sent(00000000,2000C8B4,25) 3
[08:45:41.162192 0.005517] 0.249 (  6) - sysview_sent(00000000,2000C8B4,40) 3
[08:45:41.323352 0.161159] 0.410 (161) - sysview_sent(00000000,2000C8B4,1460) 3
[08:45:41.325516 0.002167] 0.412 (  2) - sysview_sent(00000000,2000C8B4,1460) 3
[08:45:41.327332 0.001815] 0.413 (  1) - sysview_sent(00000000,2000C8B4,262) 3
[08:45:41.328095 0.000762] 0.414 (  1) - sysview_sent(00000000,2000C8B4,8) 3
[08:45:41.490842 0.162746] 0.577 (163) - sysview_sent(00000000,2000C8B4,1460) 3
[08:45:41.492775 0.001934] 0.579 (  2) - sysview_sent(00000000,2000C8B4,1460) 3
[08:45:41.493777 0.001003] 0.580 (  1) - sysview_sent(00000000,2000C8B4,42) 3
[08:45:41.656139 0.162362] 0.742 (162) - sysview_sent(00000000,2000C8B4,2920) 3
[08:45:41.657225 0.001086] 0.743 (  1) - sysview_sent(00000000,2000C8B4,26) 3
[08:45:41.662299 0.005075] 0.749 (  6) - sysview_sent(00000000,2000C8B4,40) 3
[08:45:41.825894 0.163594] 0.912 (163) - sysview_sent(00000000,2000C8B4,2920) 3
[08:45:41.826684 0.000791] 0.913 (  1) - sysview_sent(00000000,2000C8B4,25) 3
[08:45:41.988900 0.162215] 1.075 (162) - sysview_sent(00000000,2000C8B4,2920) 3
[08:45:41.989710 0.000812] 1.076 (  1) - sysview_sent(00000000,2000C8B4,25) 3
[08:45:42.151838 0.162127] 1.238 (162) - sysview_sent(00000000,2000C8B4,2920) 3

NCM case

Here are long pauses between data transfer bursts

[08:51:15.898487 0.162371] 4.928 (162) - sysview_sent(00000000,2000C8C4,16) 3
[08:51:16.226914 0.328427] 5.256 (328) - sysview_sent(00000000,2000C8C4,5840) 3
[08:51:16.230771 0.003859] 5.261 (  5) - sysview_sent(00000000,2000C8C4,1460) 3
[08:51:16.233346 0.002576] 5.263 (  2) - sysview_sent(00000000,2000C8C4,1460) 3
[08:51:16.235905 0.002559] 5.266 (  3) - sysview_sent(00000000,2000C8C4,148) 3
[08:51:16.237971 0.002065] 5.268 (  2) - sysview_sent(00000000,2000C8C4,8) 3
[08:51:16.240273 0.002303] 5.270 (  2) - sysview_sent(00000000,2000C8C4,24) 3
[08:51:16.400926 0.160650] 5.431 (161) - sysview_sent(00000000,2000C8C4,1460) 3
[08:51:16.403073 0.002149] 5.433 (  2) - sysview_sent(00000000,2000C8C4,1460) 3
[08:51:16.405683 0.002610] 5.436 (  3) - sysview_sent(00000000,2000C8C4,278) 3
[08:51:16.407704 0.002023] 5.438 (  2) - sysview_sent(00000000,2000C8C4,8) 3
[08:51:16.410156 0.002451] 5.440 (  2) - sysview_sent(00000000,2000C8C4,24) 3
[08:51:16.570395 0.160230] 5.600 (160) - sysview_sent(00000000,2000C8C4,16) 3
[08:51:17.224815 0.654425] 6.254 (654) - sysview_sent(00000000,2000C8C4,5840) 3
[08:51:17.229041 0.004229] 6.259 (  5) - sysview_sent(00000000,2000C8C4,1460) 3
[08:51:17.231074 0.002033] 6.261 (  2) - sysview_sent(00000000,2000C8C4,1460) 3
[08:51:17.233662 0.002589] 6.263 (  2) - sysview_sent(00000000,2000C8C4,1460) 3
[08:51:17.237919 0.004257] 6.268 (  5) - sysview_sent(00000000,2000C8C4,1456) 3
[08:51:17.240700 0.002782] 6.271 (  3) - sysview_sent(00000000,2000C8C4,501) 3
[08:51:17.243056 0.002355] 6.273 (  2) - sysview_sent(00000000,2000C8C4,84) 3
[08:51:17.405442 0.162384] 6.435 (162) - sysview_sent(00000000,2000C8C4,16) 3
[08:51:17.727212 0.321770] 6.756 (321) - sysview_sent(00000000,2000C8C4,5840) 3
[08:51:17.731345 0.004136] 6.761 (  5) - sysview_sent(00000000,2000C8C4,1460) 3
[08:51:17.733566 0.002221] 6.763 (  2) - sysview_sent(00000000,2000C8C4,1460) 3
[08:51:17.735923 0.002357] 6.766 (  3) - sysview_sent(00000000,2000C8C4,100) 3

Screenshots

No response

I have checked existing issues, dicussion and documentation

  • I confirm I have checked existing issues, dicussion and documentation.
@rgrr rgrr added the Bug 🐞 label May 19, 2023
@joshikeval3131
Copy link

Hello rgrr,

The same thing has been observed on my platform. RNDIS performance is fairly reasonable, But NCM is 10 times slower then RNDIS.
My Prime suspect is the way NTB is handled

Also there is a bug in the implementation which I filed sometime ago but still haven't got any response yet.

#2010

@rgrr
Copy link
Contributor Author

rgrr commented Jun 6, 2023

Hello joshikeval3131,

to be honest, I'm more or less a novice to (Tiny)USB from developer side. Do you have some pointers where and how to start debugging the NCM code? Of course I have found ncm_device.c et al, but even adding some debug output still generates more confusion than clarity.

@rgrr
Copy link
Contributor Author

rgrr commented Jun 27, 2023

New evidence for the NCM problem.

I have done some tests with the iperf server provided in the lwIP examples and recorded with Wireshark.

Once I have done the tests with ECM and once with NCM. iperf command line on client side (Linux) is

for MSS in 90 93 100 150 200 255 256 300 400 500 511 512 600 700 800 900 1000 1100 1200 1300 1400 1450 1459 1460 1500; do iperf -c 192.168.14.1 -e -i 1 -l 22 -M $MSS; sleep 10; done

Wiresharks statistics graphs are as follows.

Good case is with ECM, although with MSS=1460/1500 I find it suspicious:

iperf-ecm

Bad case is with NCM, small MSS are generating a lot of errors within the stream:

iperf-ncm
For analysis I have attached a sequence with ECM/NCM iperf-90-93.zip

Does anyone has any pointers where to search (ok, I know it is most likely in ncm_device)

@rgrr
Copy link
Contributor Author

rgrr commented Jun 30, 2023

Hmmm, does anybody read this?

I want to make a proposal: the ncm_device driver seems to be very buggy, e.g. it operates on incoming frames while the previous one is still processed, also it does no ZLP insertion where it should insert one. These bugs are the cause for the above described hickups.

I have implemented a simple version of the driver, which allows only one ethernet frame on reception/transmission.

This would put the focus on reliability, not so much on performance.

If one of the maintainers agree, I will file a PR on this.

If one wants to check the code, it can be found here: https://github.com/rgrr/yapicoprobe/blob/feature/systemview-ncm-debugging/src/net/tinyusb/ncm_device_simple.c

@rgrr
Copy link
Contributor Author

rgrr commented Jul 2, 2023

Current image with ncm_device_simple...

ncm_device_simple

@rgrr rgrr changed the title CDC-NCM poor performance for unknown reasons CDC-NCM poor performance for known reasons Aug 2, 2023
@joshikeval3131
Copy link

Hmmm, does anybody read this?

I want to make a proposal: the ncm_device driver seems to be very buggy, e.g. it operates on incoming frames while the previous one is still processed, also it does no ZLP insertion where it should insert one. These bugs are the cause for the above described hickups.

I have implemented a simple version of the driver, which allows only one ethernet frame on reception/transmission.

This would put the focus on reliability, not so much on performance.

If one of the maintainers agree, I will file a PR on this.

If one wants to check the code, it can be found here: https://github.com/rgrr/yapicoprobe/blob/feature/systemview-ncm-debugging/src/net/tinyusb/ncm_device_simple.c

@joshikeval3131
Copy link

Hello rgrr,

Thanks for finding out these bugs I'll test this on my device and let you know.

@joshikeval3131
Copy link

Hmmm, does anybody read this?

I want to make a proposal: the ncm_device driver seems to be very buggy, e.g. it operates on incoming frames while the previous one is still processed, also it does no ZLP insertion where it should insert one. These bugs are the cause for the above described hickups.

I have implemented a simple version of the driver, which allows only one ethernet frame on reception/transmission.

This would put the focus on reliability, not so much on performance.

If one of the maintainers agree, I will file a PR on this.

If one wants to check the code, it can be found here: https://github.com/rgrr/yapicoprobe/blob/feature/systemview-ncm-debugging/src/net/tinyusb/ncm_device_simple.c

Thanks for this analysis

@rgrr
Copy link
Contributor Author

rgrr commented Aug 23, 2023

Note, that there is a pull request #2227 about this. I have done a complete rewrite in the meantime.

@rgrr
Copy link
Contributor Author

rgrr commented Aug 23, 2023

and some more images

@joshikeval3131
Copy link

joshikeval3131 commented Aug 24, 2023

rgrr,

Kudos and thanks for writing a very clear concise documentation.

I was able to see that the NCM bandwidth increased by a factor or 2 😄 .

However I observed that as the segment size kept increasing the bandwidth was decreasing.
For segment size > 800 my iperf application crashed.

Currently trying to understand and analyze your changes in the driver.

@rgrr
Copy link
Contributor Author

rgrr commented Aug 24, 2023

...
However I observed that as the segment size kept increasing the bandwidth was decreasing. For segment size > 800 my iperf application crashed.
...

Do you have some kind of available test application? What is the exact iperf command line?

@joshikeval3131
Copy link

Yup I am using standard lwip webserver example from TinyUSB and iperf application from lwip-contrib/examples/lwiperf

I am executing following command.
for MSS in 100 200 400 800 1200 1450 1500; do iperf -c 192.168.7.1 -e -i 1 -M $MSS -l 8192 -P 1; sleep 2; done

Here are the logs that I am getting.

Iperf logs

WARNING: attempt to set TCP maximum segment size to 100, but got 536
------------------------------------------------------------
Client connecting to 192.168.7.1, TCP port 5001 with pid 848162
Write buffer size: 8.00 KByte
TCP window size: 75.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.7.2 port 55872 connected with 192.168.7.1 port 5001 (ct=1.51 ms)
[ ID] Interval            Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT        NetPwr
[  3] 0.0000-1.0000 sec  2.03 MBytes  17.0 Mbits/sec  260/0          0        2K/640 us  3328.00
[  3] 1.0000-2.0000 sec  2.01 MBytes  16.8 Mbits/sec  257/0          0        2K/613 us  3434.49
[  3] 2.0000-3.0000 sec  2.05 MBytes  17.2 Mbits/sec  262/0          0        2K/685 us  3133.29
[  3] 3.0000-4.0000 sec  1.98 MBytes  16.6 Mbits/sec  253/0          0        2K/650 us  3188.58
[  3] 4.0000-5.0000 sec  2.04 MBytes  17.1 Mbits/sec  261/0          0        2K/653 us  3274.29
[  3] 5.0000-6.0000 sec  2.03 MBytes  17.0 Mbits/sec  260/0          0        2K/664 us  3207.71
[  3] 6.0000-7.0000 sec  1.97 MBytes  16.5 Mbits/sec  252/0          0        2K/598 us  3452.15
[  3] 7.0000-8.0000 sec  2.04 MBytes  17.1 Mbits/sec  261/0          0        2K/666 us  3210.38
[  3] 8.0000-9.0000 sec  1.98 MBytes  16.6 Mbits/sec  253/0          0        2K/655 us  3164.24
[  3] 9.0000-10.0000 sec  2.00 MBytes  16.8 Mbits/sec  256/0          0        2K/695 us  3017.48
[  3] 0.0000-10.0029 sec  20.1 MBytes  16.9 Mbits/sec  2575/0          0       -1K/651 us  3239.38
WARNING: attempt to set TCP maximum segment size to 200, but got 536
------------------------------------------------------------
Client connecting to 192.168.7.1, TCP port 5001 with pid 848196
Write buffer size: 8.00 KByte
TCP window size: 40.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.7.2 port 34966 connected with 192.168.7.1 port 5001 (ct=1.67 ms)
[ ID] Interval            Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT        NetPwr
[  3] 0.0000-1.0000 sec  2.51 MBytes  21.0 Mbits/sec  321/0          0        3K/506 us  5196.90
[  3] 1.0000-2.0000 sec  2.53 MBytes  21.2 Mbits/sec  324/0          0        3K/458 us  5795.21
[  3] 2.0000-3.0000 sec  2.38 MBytes  19.9 Mbits/sec  304/0          0        3K/472 us  5276.20
[  3] 3.0000-4.0000 sec  3.05 MBytes  25.6 Mbits/sec  390/0          0        3K/455 us  7021.71
[  3] 4.0000-5.0000 sec  1.98 MBytes  16.6 Mbits/sec  253/0          0        3K/448 us  4626.29
[  3] 5.0000-6.0000 sec   636 KBytes  5.21 Mbits/sec  81/0          0        3K/690 us  943.73
[  3] 6.0000-7.0000 sec  0.00 Bytes  0.00 bits/sec  0/2          3        0K/657 us  0.00
[  3] 7.0000-8.0000 sec  0.00 Bytes  0.00 bits/sec  0/2          0        0K/657 us  0.00
[  3] 8.0000-9.0000 sec  0.00 Bytes  0.00 bits/sec  0/2          1        0K/657 us  0.00
[  3] 9.0000-10.0000 sec  0.00 Bytes  0.00 bits/sec  0/2          0        0K/657 us  0.00
[  3] 0.0000-10.4118 sec  13.1 MBytes  10.5 Mbits/sec  1673/8          4       -1K/565 us  2327.65
WARNING: attempt to set TCP maximum segment size to 400, but got 536
connect failed: Operation now in progress
WARNING: attempt to set TCP maximum segment size to 800, but got 536

TinyUSB NCM Driver logs


(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?

@rgrr
Copy link
Contributor Author

rgrr commented Aug 25, 2023

ooopssss... and the example is running with ECM/RNDIS?

"request blocked" is not bad, but the "VALIDATION FAILED" messages are fatal. Possible reasons that come to my mind:

  • bad glue code (but if ECM/RNDIS are working this should not be the case)
  • memory alignment problem of the buffer (when I'm counting correctly, the ncm_interface buffers could land at an 16bit boundary if the struct is packed)
  • buffer is not valid (but why?)

What I also do not understand is "WARNING: attempt to set TCP maximum segment size to 100, but got 536". That also indicates that something is really wrong.

@rgrr
Copy link
Contributor Author

rgrr commented Aug 27, 2023

It is possible, that I have created a dependency of the driver with my glue code to TinyUSB. The suspicious part looks like:

static void context_tinyusb_linkoutput(void *param)
/**
 * Put \a xmt_buff into TinyUSB.
 *
 * Context: TinyUSB
 */
{
    if ( !tud_network_can_xmit(xmt_buff_len)) {
        //printf("context_tinyusb_linkoutput: sleep\n");
        vTaskDelay(pdMS_TO_TICKS(5));

        taskDISABLE_INTERRUPTS();
        usbd_defer_func(context_tinyusb_linkoutput, NULL, false);    // put yourself at end of TinyUSB event queue
        taskENABLE_INTERRUPTS();
    }
    else {
        tud_network_xmit(xmt_buff, xmt_buff_len);
    }
}   // context_tinyusb_linkoutput



static err_t linkoutput_fn(struct netif *netif, struct pbuf *p)
/**
 * called by lwIP to transmit data to TinyUSB
 *
 * Context: lwIP
 */
{
    if ( !tud_ready()) {
        return ERR_USE;
    }

    if (xmt_buff_len != 0) {
        return ERR_USE;
    }

    // copy data into temp buffer
    xmt_buff_len = pbuf_copy_partial(p, xmt_buff, p->tot_len, 0);
    assert(xmt_buff_len < sizeof(xmt_buff));

    taskDISABLE_INTERRUPTS();
    usbd_defer_func(context_tinyusb_linkoutput, NULL, false);
    taskENABLE_INTERRUPTS();

    return ERR_OK;
}   // linkoutput_fn

This glue code is driving me nuts, because actually there is no legal (and correct) way to make a function call in TinyUSB context. The above is my best bet. But as said: perhaps the first usbd_xfer_func() perhaps introduces something unintended.

I will look deeper into that.

@rgrr
Copy link
Contributor Author

rgrr commented Aug 27, 2023

Ok, first check shows, that with the lwip-webserver example it is running as well. Little bit slow, but anyway.

You can find the sources here: https://github.com/rgrr/tinyusb/tree/feature/standard-webserver/examples/device/net_lwip_webserver

@joshikeval3131
Copy link

joshikeval3131 commented Aug 28, 2023

rgrr,

I have confirmed that , lwip-webserver is also running fine on my platform.

"WARNING: attempt to set TCP maximum segment size to 100, but got 536". That also indicates that something is really wrong.

I suspect that this definitely indicates some problem. Also it works for smaller segment sizes but as I increase the segment size the bandwidth decreases and later point driver just crashes.

:(
I need to thoroughly understand the code changes that you made and the specific problems it is fixing in the original ncm driver.
After this only I will be able to better deduce what is causing this problem. Is it the TinyUSB ? or class driver ? Or my device specific dcd.c implementation.

@rgrr
Copy link
Contributor Author

rgrr commented Aug 29, 2023

:( I need to thoroughly understand the code changes that you made and the specific problems it is fixing in the original ncm driver. After this only I will be able to better deduce what is causing this problem. Is it the TinyUSB ? or class driver ? Or my device specific dcd.c implementation.

Please consider ncm_device as a rewrite. I have dropped more or less the original and started fresh but used the original as a template for some things.

Problem with the original ncm_device was, that it reused internal buffers already for reception while transmission of its contents to lwIP was still running.

@joshikeval3131
Copy link

joshikeval3131 commented Aug 30, 2023

I have taken a reference of glue logic from
https://github.com/rgrr/yapicoprobe/blob/feature/minor-cleanup/src/net/net_glue.c

Implemented the same.

As I increased the TinyUSB buffer size, the test case worked for all segment size. 😃

iperf command used:

for LEN in 40000 16000 1400 1024 800 200 80 22 8 2 1; do for MSS in 90 93 100 150 200 255 256 300 400 500 511 512 600 700 800 900 1000 1100 1200 1300 1400 1450 1459 1460 1500; do iperf -c 192.168.14.1 -e -i 1 -l $LEN -M $MSS; sleep 2; done; done

Wireshark analysis

NCM_NORTOS_IPERF_LOGS

Also please find the logs attached.

Observation -:

  • The bandwidth decreases as we increase the segment size.
  • At higher segment size USB Driver + stack is hitting some bottleneck. Not sure what is that can it be,is it because of bare metal implementation. ? 🤔
  • TCP window size doesn't really affect the performance.
  • peak performance is around 150 - 200 Bytes segment size.

Note-: I am using the NCM driver in NO OS setting.
ncm_iperf_log_file.txt

@rgrr
Copy link
Contributor Author

rgrr commented Aug 30, 2023

Actually I have no idea what the bottleneck could be. Have you checked the wireshark log if you see the expected? Perhaps your iperf behaves not as you want.

Your iperf output:

Client connecting to 192.168.7.1, TCP port 5001 with pid 1792756
Write buffer size:  100 Byte
TCP window size: 40.0 KByte (default)

Mine (2.1.9):

Client connecting to 192.168.14.1, TCP port 5001 with pid 18915 (1 flows)
Write buffer size: 40000 Byte
MSS req size 600 bytes (per TCP_MAXSEG)
TOS set to 0x0 (Nagle on)
TCP window size: 16.0 KByte (default)

For debugging purposes: could you please change

   INFO_OUT("(II) tud_network_can_xmit: request blocked\n");

so that it is printing. I had the case under windows, that after several days, the driver seemed to be in a state were it did output the above message very often.
Don't know what the reason was, perhaps the glue code which actually is not clean in switching to TinyUSB context. Currently I do not know about a way to do this reliably.

@joshikeval3131
Copy link

Actually I have no idea what the bottleneck could be. Have you checked the wireshark log if you see the expected? Perhaps your iperf behaves not as you want.

Tried with iperf 2.1.9 same result

  • According to analysis, this version of ncm driver is definitely much better than previous version.
    Can you explain what is the expectation with glue code logic ? So that if possible we can think in that direction.

Also with little clean up #2227 should be okay to merge 👍

Do you see any other blockers ?
And also do you see any scope of improvement upon this ?

@rgrr
Copy link
Contributor Author

rgrr commented Aug 31, 2023

Expectations with glue logic: there should be an official call to put an event into the TinyUSB event queue so that one get into the context of TinyUSB. lwIP API offers tcpip_try_callback() or tcpip_callback(). Currently one has to use usbd_defer_func() which is actually TinyUSB private and also does just block USB interrupts.

Clean up: will follow soon.

Currently I cannot see any further improvement. Perhaps wider use of the net_device will open some new wishes/issues.

@rgrr
Copy link
Contributor Author

rgrr commented Aug 31, 2023

One point for improvement could be to change the webserver example to use NCM by default.

This would improve compatibility over the several OSes, because NCM is supported by Linux, MacOS (AFAIK), Win11 (AFAIK) and Win10 (with minor driver installation).

With ECM or RNDIS I had no luck under Win10, at least on my machine.

@rgrr
Copy link
Contributor Author

rgrr commented Aug 31, 2023

One more thing (out of topic): how about adding iperf to the webserver example to get a reference example for performance measurements?

@rgrr
Copy link
Contributor Author

rgrr commented Sep 1, 2023

Out of curiosity: have you ever compared with ECM/RNDIS? And have you done a bidirectional test with e.g.
for MSS in 100 800 1450; do iperf -c 192.168.14.1 -e -i 1 -M $MSS -l 8192 -r; sleep 2; done?

((to me) surprisingly server-to-client does not work for small packets)

@Keval-Joshi-TI
Copy link

This would improve compatibility over the several OSes, because NCM is supported by Linux, MacOS (AFAIK), Win11 (AFAIK) and Win10 (with minor driver installation).

As far as I know RNDIS is windows standard offering and Linux also supports RNDIS by default. Where as in NCM extra driver installation is required. Thus it is okay to keep RNDIS as default.

: how about adding iperf to the webserver example to get a reference example for performance measurements?

lwip Iperf will be a good to have example. 👍 I think we should add it.

have you ever compared with ECM/RNDIS? And have you done a bidirectional test with e.g.
for MSS in 100 800 1450; do iperf -c 192.168.14.1 -e -i 1 -M $MSS -l 8192 -r; sleep 2; done?

No haven't done Bidirectional test. When I tried I showed following error message.

 ------------------------------------------------------------
Server listening on TCP port 5001 with pid 2735107
Read buffer size: 8.00 KByte (Dist bin width=1.00 KByte)
MSS req size 100 bytes (per TCP_MAXSEG)
TCP window size:  128 KByte (default)
------------------------------------------------------------
ERROR: expected reverse connect did not occur

I suspect that in default lwip iperf example there is no RX logic. Can you confirm the same ?
Can you point me towards RX capable iperf example if you have any ?

Thanks

@rgrr
Copy link
Contributor Author

rgrr commented Sep 4, 2023

yes, that is a valid point: RNDIS is correctly detected by Win10. But there are some drawbacks: RNDIS changes somehow routing of my Linux desktop (Debian) and also RNDIS works only under Win10 if it is the only class of the USB device. So actually not a generic solution.

Nevertheless as an example RNDIS should do it.

But: those examples aren't just examples. To my understanding they are also some kind of compilation test. So if all examples go for RNDIS, the NCM device will not be compile tested which to my understanding is "bad".

Concerning iperf: I'm observing the same behavior but just for MSS=100. MSS=800,1450 are working. So actually the iperf example is capable of doing transmission. But still there must be somewhere a bug in the iperf example.

I will rework my PRs #2227 and #2242 accordingly.

@rgrr
Copy link
Contributor Author

rgrr commented Sep 5, 2023

@Keval-Joshi-TI : "funny" thing, if you limit in lwiperf_tcp_client_send_more() (in lwiperf.c) the txlen_max to 700 (instead of TCP_MSS) then iperf -c 192.168.14.1 -e -i 1 -M 100 -l 8192 -r succeeds.

Don't know yet, if this indicates a bug in lwiperf, lwIP, TinyUSB or the device driver (but ECM shows the same behavior).

Comparing ECM / NCM shows that NCM is factor 4 better in direction device -> host.

PS: actual problem is, that tcp_output() does not create a lwiperf_tcp_client_sent() callback

@Keval-Joshi-TI
Copy link

Keval-Joshi-TI commented Sep 5, 2023

@rgrr : I am stuck at more basic problem.

Even with your suggested change int the lwip example DEV-> HOST transfer is not happening in my case :(

------------------------------------------------------------
Client connecting to 192.168.7.1, TCP port 5001 with pid 3025046 (1 flows)
Write buffer size: 8192 Byte
MSS req size 100 bytes (per TCP_MAXSEG)
TOS set to 0x0 (Nagle on)
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Server listening on TCP port 5001 with pid 3025046
Read buffer size: 8.00 KByte (Dist bin width=1.00 KByte)
MSS req size 100 bytes (per TCP_MAXSEG)
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 192.168.7.2%usb0 port 57712 connected with 192.168.7.1 port 5001 (sock=3) (icwnd/mss/irtt=0/100/138) (ct=0.20 ms) on 2023-09-05 14:11:30.762 (IST)
[ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT(var)        NetPwr
[  1] 0.00-1.00 sec  1.08 MBytes  9.04 Mbits/sec  138/0         0        2K/1021(137) us  1107
[  1] 1.00-2.00 sec  1.06 MBytes  8.91 Mbits/sec  136/0         0        2K/1040(141) us  1071
[  1] 2.00-3.00 sec  1.06 MBytes  8.91 Mbits/sec  136/0         0        2K/1033(146) us  1079
[  1] 3.00-4.00 sec  1.06 MBytes  8.91 Mbits/sec  136/0         0        2K/979(124) us  1138
[  1] 4.00-5.00 sec  1.07 MBytes  8.98 Mbits/sec  137/0         0        2K/993(204) us  1130
[  1] 5.00-6.00 sec  1.06 MBytes  8.91 Mbits/sec  136/0         0        2K/995(132) us  1120
[  1] 6.00-7.00 sec  1.05 MBytes  8.85 Mbits/sec  135/0         0        2K/1003(137) us  1103
[  1] 7.00-8.00 sec  1.06 MBytes  8.91 Mbits/sec  136/0         0        2K/933(131) us  1194
[  1] 8.00-9.00 sec  1.06 MBytes  8.91 Mbits/sec  136/0         0        2K/999(136) us  1115
[  1] 9.00-10.00 sec  1.06 MBytes  8.91 Mbits/sec  136/0         0        2K/1043(139) us  1068
[  1] 0.00-10.02 sec  10.6 MBytes  8.92 Mbits/sec  1363/0         0        2K/1021(133) us  1092
ERROR: expected reverse connect did not occur

-> Moreover I am trying to run lwip + tinyUSB in freeRTOS i.e LWIP with NO_SYS = 0 where lwip task has lower priority.
Somehow lwip shows timeout when I try NCM data transfer. I took your lwipopts for reference from https://github.com/rgrr/yapicoprobe/blob/feature/minor-cleanup/include/lwipopts.h. Somehow this combination is not functional for me as of now.

Quick question on glue logic:

-> The part where you call tcpip_try_callback_block which essentially schedules a function call which puts received ethernet packet into lwip context , looks all okay to me.

-> But the one where you are calling the defer function looks little bit missplaced. In baremetal scenario I tried with and without defer function, its identical for me.
i.e the approach taken by default lwip_webserver example.

Can you please why we need to put context_tinyusb_linkoutput call in the tud task event queue ?.
Can lwip directly handle it when it calls linkoutput_fn ?

@rgrr
Copy link
Contributor Author

rgrr commented Sep 5, 2023

Perhaps our lwIP configuration is different. Essential parts of mine:

#define TCP_MSS                                (1500 - 20 - 20)    // MTU minus header sizes (best value til now)
#define TCP_SND_BUF                            (8 * TCP_MSS)       //   good tuning

#define TCP_SND_QUEUELEN                       16
#define TCP_SNDQUEUELOWAT                      (TCP_SND_QUEUELEN / 2)
#define MEMP_NUM_TCP_SEG                       32

#define MEM_SIZE                               40000

Concerning the glue logic: this is also a little bit historically grown. I struggled around with lwIP, NCM, FreeRTOS and did not get it flying. So I made a lot of testing. One of them was to assure that the NCM device is always called from TinyUSB context. After many tests and attempts to repair NCM, I decided to rewrite it. First approach was ncm_device_simple in my repo which handles only one datagram in each direction. Compatibility even with Win10 was ways better, because with NCM Win10 allows other USB classes in parallel, with RNDIS it does not. Second step was to reimplement NCM device.

Special case in the glue logic is context_tinyusb_linkoutput(): the idea is to put the deferred call at the end of the TinyUSB queue, so not using a simple while loop. But I made a quick test and it seems (for the iperf testcase) that the while loop is also ok:

    while ( !tud_network_can_xmit(xmt_buff_len)) {
        //printf("context_tinyusb_linkoutput: sleep\n");
        vTaskDelay(pdMS_TO_TICKS(5));
    }
    tud_network_xmit(xmt_buff, xmt_buff_len);

But I did not test if this context switching is actually required. For me this is the cleaner way although the glue logic looks a little bit more cumbersome.

@rgrr
Copy link
Contributor Author

rgrr commented Sep 7, 2023

gosh... sometimes it is more complicated... I have to correct myself: the above loop does not work, the one with the usbd_defer_func() is the correct one. And this is also true for ECM/RNDIS: they do fine with the defer loop but not with the simple loop from above. I will investigate this further if there is time, but I guess the simple loop blocks TinyUSB while it actually wants something to do. The problems appears for ECM immediately, for NCM (because of better buffering) for very small "-M" in the iperf invocation.

Concerning lwiperf: the problem with the sender operations lies within the condition of the do-while-loop in lwiperf_tcp_client_send_more(). The condition txlen >= (TCP_MSS / 2) is too limiting, because for -M 100 the call to tcp_write() does not succeed and thus there is no data in the buffer which stops transmission.

Replacing the above condition with txlen != 0 solves this and lwiperf sends data correctly.

@joshikeval3131: Could you please test this as well?

@Keval-Joshi-TI
Copy link

Keval-Joshi-TI commented Sep 7, 2023

@rgrr :

Concerning lwiperf: the problem with the sender operations lies within the condition of the do-while-loop in lwiperf_tcp_client_send_more(). The condition txlen >= (TCP_MSS / 2) is too limiting, because for -M 100 the call to tcp_write() does not succeed and thus there is no data in the buffer which stops transmission.
Replacing the above condition with txlen != 0 solves this and lwiperf sends data correctly.

Even with above mentioned change rx is not working in my case.
:(
Getting following error.

ERROR: expected reverse connect did not occur

Let me check my lwip-stack version

Updates on NCM /RNDIS + LWIP FreeRTOS migration.

Converted my reference application to work with FreeRTOS.
& Surprisingly I am getting better results with RNDIS on FreeRTOS compared to baremetal.

At max TX iperf hit 49Mbps bandwidth.

------------------------------------------------------------
Client connecting to 192.168.7.1, TCP port 5001 with pid 3470278 (1 flows)
Write buffer size: 8192 Byte
MSS req size 1450 bytes (per TCP_MAXSEG)
TOS set to 0x0 (Nagle on)
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.7.2%usb0 port 60214 connected with 192.168.7.1 port 5001 (sock=3) (icwnd/mss/irtt=14/1450/169) (ct=0.23 ms) on 2023-09-07 16:17:05.693 (IST)
[ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT(var)        NetPwr
[  1] 0.00-1.00 sec  5.84 MBytes  49.0 Mbits/sec  747/0         0       14K/956(7) us  6401
[  1] 1.00-2.00 sec  5.80 MBytes  48.7 Mbits/sec  743/0         0       14K/953(6) us  6387
[  1] 2.00-3.00 sec  5.78 MBytes  48.5 Mbits/sec  740/0         0       14K/989(74) us  6130
[  1] 3.00-4.00 sec  5.80 MBytes  48.6 Mbits/sec  742/0         0       14K/953(4) us  6378
[  1] 4.00-5.00 sec  5.80 MBytes  48.6 Mbits/sec  742/0         0       14K/1013(111) us  6000
[  1] 5.00-6.00 sec  5.80 MBytes  48.6 Mbits/sec  742/0         0       14K/958(7) us  6345

However the story so far with NCM is not good. With OS migration I can see some performance increase for smaller packet size i.e MSS = 100, 200. For MSS=400 & above the NCM driver crashes.
with following debug message.

(II) tud_network_can_xmit: request blocked

@Keval-Joshi-TI
Copy link

@rgrr :

NCM driver worked for me for all segment sizes.

When lwip_task priority == tud_task priority it works well and as expected.

1106_713_1

I am trying to get iperf -r working on my platform.

@rgrr
Copy link
Contributor Author

rgrr commented Sep 7, 2023

My device is a Raspi Pico (RP2040), i.e. dual-core Cortex-M0. TinyUSB has higher priority than lwIP. I will test, if it matters which process has higher priority (both processes are limited to one core).

lwiperf: I inserted several printf() for debugging. They pointed very fast to the loop in lwiperf_tcp_client_send_more().

What is your current glue code? Especially if you are using TinyUSB I would use mine ;-)

@kripton
Copy link
Contributor

kripton commented Sep 7, 2023

Just since I'm also using the NCM device and didn't experience any problem with my RP2040-based project up to now: @rgrr: You are using an RTOS? I'm going bare-metal. In case you want to do measurements using my setup (you just need to enable iperf but you know how to do that ;): https://github.com/OpenLightingProject/rp2040-dmxsun

@rgrr
Copy link
Contributor Author

rgrr commented Sep 7, 2023

@kripton : does it mean that you are using the new NCM device?

It did not do any tests with your project although I like your work. I stopped when "make" asked for npm :-/

Nevertheless your glue code points me into the direction why the simple loop from above did not work: the call to tud_task() was missing.

Corrected version looks like this:

static void context_tinyusb_linkoutput(void *param)
/**
 * Put \a xmt_buff into TinyUSB.
 *
 * Context: TinyUSB
 */
{
    while ( !tud_network_can_xmit(xmt_buff_len)) {
        printf("context_tinyusb_linkoutput: sleep\n");
        vTaskDelay(pdMS_TO_TICKS(1));
        tud_task();
    }
    tud_network_xmit(xmt_buff, xmt_buff_len);
}   // context_tinyusb_linkoutput

@kripton
Copy link
Contributor

kripton commented Sep 7, 2023

@kripton : does it mean that you are using the new NCM device?

No, i'm using the original, first version and I did notnface any problems with it.

It did not do any tests with your project although I like your work. I stopped when "make" asked for npm :-/

True, dmxsun is compiling a React web app, Sorry, I know it causes additional effort.

Nevertheless your glue code points me into the direction why the simple loop from above did not work: the call to tud_task() was missing.

So does it solve your problems with the original NCM device or your new one?

@rgrr
Copy link
Contributor Author

rgrr commented Sep 7, 2023

@kripton : does it mean that you are using the new NCM device?

No, i'm using the original, first version and I did notnface any problems with it.

You will stumble into problems if the transmitted packets are coming fast and in different sizes. My initial application was SystemView.

Nevertheless your glue code points me into the direction why the simple loop from above did not work: the call to tud_task() was missing.

So does it solve your problems with the original NCM device or your new one?

I'm actually not the one having problems with it ;-) I'm just trying to track and understand the problems @joshikeval3131 has. The simple loop was just a reaction to his complaint about an additional usb_defer call. Unfortunately wrong in the first attempt.

@rgrr
Copy link
Contributor Author

rgrr commented Sep 8, 2023

@Keval-Joshi-TI

@rgrr :

NCM driver worked for me for all segment sizes.

When lwip_task priority == tud_task priority it works well and as expected.

Made some tests with priority: for my device tud_task priority > lwip_task priority is mandatory. Otherwise especially target->host transfer collapses.

@Keval-Joshi-TI
Copy link

Keval-Joshi-TI commented Sep 8, 2023

@rgrr:

Thanks for conformation on this

Iperf Rx is finally functional for me
On my Linux machine due to some firewall issue iperf -r was not working

I am getting good results with the changes you suggested in lwiperf_tcp_client_send_more()
i.e ( txlen != 0 ). :D

@kripton:

In my case so far I am getting much better performance with the new driver compare to older one.
In bare metal I can see significant difference and in FreeRTOS there is notable difference.

Note -:

  • In both the case I am using the glue logic which @rgrr has proposed.

  • TCP- TX & RX bandwidth is around 50Mbps.

@rgrr
Copy link
Contributor Author

rgrr commented May 15, 2024

This has been closed by #2227

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants